C 语言辅导、B-tree数据结构代码辅导讲解
- 首页 >> C/C++编程CIS*4430 (Winter 2018): Assignment Three
Instructor: F. Song
Due on Mar. 23, 2018 by midnight
Part I. Paper-and-Pencil Questions
1 (a) A PARTS file with Part# as the hash key includes records with the following Part# values:
2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208.
The file uses eight buckets, numbered 0 to 7. Each bucket takes one disk block and holds two
records. Load these records into the file in the given order, using the hash function h(KEY) =
KEY mod 8. Calculate the average number of disk accesses for a random retrieval on Part#.
(b) Load the above records into an extendible hashing structure. Show the process by recording
the changes to the bucket and directory structures, as well as the global and local depths. The
hash function is h(KEY) = KEY mod 32, and the binary hashed values are extracted in the
reversing order of a number, i.e., from the least significant bit to the most significant bit.
2 (a) Linear hashing is capable of expanding a file without the need of reorganization. The
method stores records directly into buckets and may use two hashing functions at a time to map a
record to an existing bucket. Assuming that a primary bucket can hold two records and an
overflow bucket can hold only one, build a linear hashing structure for the records with the
following key values:
32, 43, 76, 44, 29,51, 35, 18, 62, 17, 26, 53, and 80.
Suppose that there are initially two empty primary buckets and the hashing function is: H0= KEY
mod 2. Later on when the structure is expanded, you may use H1 = KEY mod 4, H2 = KEY mod
8, and so on. The expansion of the structure is controlled by the upper bound of space utilization
of 80%. Every time a record is added, the space utilization should be computed. If it is over
80%, you need to split an existing bucket in a linear order from left to right. You can compute
space utilization using the following formula:
(Total # of records, including the overflowed)
(Total # of primary buckets) * (Primary bucket size)
(b) Increase the primary bucket size to 4 and redo the above exercise. Explain why the retrieval
performance of linear hashing improves as the primary bucket size increases.
2
Part II. Programming Exercise
3. Continue the exercise in Assignment Two by implementing a delete command to the interface
of the B-tree program. Once again, each B-tree page contains 512 bytes, with fixed-length fields
for a key and a byte-offset, and the data file consists of variable-length records, each of which is
started with a length indicator and followed with a sequence of fields separated by delimiter “|”.
Note that for the purpose of deletion and storage management, the values of “flag for index
modification” and “aval-list” in the header records will be used in this assignment.
When deleting a key value in the B-tree structure, follow these standard conventions as
discussed in the lecture: (i) When deleting a key in an internal node, swap it with the immediate
successor on a leaf page before actually deleting it, and (ii) Always try redistribution before
concatenation, and in both cases, always try the left sibling before the right sibling. Please refer
to the pseudocode in the slide “Deletion Process” in the lecture notes on “11-B-Trees” for the
steps needed for implementing this command.
Note that for both B-tree and data files, the deleted spaces need to be chained together so
that they can be reused later on. This may further affect your implementation on the “insert”
command in order to reuse the deleted spaces. For simplicity, you can assume the first-fit
strategy for both B-tree and data files, that is, you can simply maintain the avail-lists as stack
structures.
In addition to the deletion command, you are also required to add the following
improvements:
(a) To assist debugging and marking, the command “ShowMeta” should not only display the
contents of the header records for both B-tree and data files, but also show the list of available
spaces in the avail-lists for both files.
(b) Coordinate the changes between data and B-tree files. Every time a record is added to or
deleted from the data file, set the status flag in the header record of the data file to be 1, and every
time the corresponding B-tree page(s) is modified and stored back to the disk, reset the status flag
to be 0.
Submission Requirements: The same as described in Assignment One.