C 语言辅导、B-tree数据结构代码辅导讲解

2018.04.25 - 首页 >> C/C++编程

CIS*4430 (Winter 2018): Assignment Three

Instructor: F. Song

Due on Mar. 23, 2018 by midnight

Part I. Paper-and-Pencil Questions

1 (a) A PARTS file with Part# as the hash key includes records with the following Part# values:

2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208.

The file uses eight buckets, numbered 0 to 7. Each bucket takes one disk block and holds two

records. Load these records into the file in the given order, using the hash function h(KEY) =

KEY mod 8. Calculate the average number of disk accesses for a random retrieval on Part#.

(b) Load the above records into an extendible hashing structure. Show the process by recording

the changes to the bucket and directory structures, as well as the global and local depths. The

hash function is h(KEY) = KEY mod 32, and the binary hashed values are extracted in the

reversing order of a number, i.e., from the least significant bit to the most significant bit.

2 (a) Linear hashing is capable of expanding a file without the need of reorganization. The

method stores records directly into buckets and may use two hashing functions at a time to map a

record to an existing bucket. Assuming that a primary bucket can hold two records and an

overflow bucket can hold only one, build a linear hashing structure for the records with the

following key values:

32, 43, 76, 44, 29,51, 35, 18, 62, 17, 26, 53, and 80.

Suppose that there are initially two empty primary buckets and the hashing function is: H0= KEY

mod 2. Later on when the structure is expanded, you may use H1 = KEY mod 4, H2 = KEY mod

8, and so on. The expansion of the structure is controlled by the upper bound of space utilization

of 80%. Every time a record is added, the space utilization should be computed. If it is over

80%, you need to split an existing bucket in a linear order from left to right. You can compute

space utilization using the following formula:

(Total # of records, including the overflowed)

(Total # of primary buckets) * (Primary bucket size)

(b) Increase the primary bucket size to 4 and redo the above exercise. Explain why the retrieval

performance of linear hashing improves as the primary bucket size increases.

Part II. Programming Exercise

3. Continue the exercise in Assignment Two by implementing a delete command to the interface

of the B-tree program. Once again, each B-tree page contains 512 bytes, with fixed-length fields

for a key and a byte-offset, and the data file consists of variable-length records, each of which is

started with a length indicator and followed with a sequence of fields separated by delimiter “|”.

Note that for the purpose of deletion and storage management, the values of “flag for index

modification” and “aval-list” in the header records will be used in this assignment.

When deleting a key value in the B-tree structure, follow these standard conventions as

discussed in the lecture: (i) When deleting a key in an internal node, swap it with the immediate

successor on a leaf page before actually deleting it, and (ii) Always try redistribution before

concatenation, and in both cases, always try the left sibling before the right sibling. Please refer

to the pseudocode in the slide “Deletion Process” in the lecture notes on “11-B-Trees” for the

steps needed for implementing this command.

Note that for both B-tree and data files, the deleted spaces need to be chained together so

that they can be reused later on. This may further affect your implementation on the “insert”

command in order to reuse the deleted spaces. For simplicity, you can assume the first-fit

strategy for both B-tree and data files, that is, you can simply maintain the avail-lists as stack

structures.

In addition to the deletion command, you are also required to add the following

improvements:

(a) To assist debugging and marking, the command “ShowMeta” should not only display the

contents of the header records for both B-tree and data files, but also show the list of available

spaces in the avail-lists for both files.

(b) Coordinate the changes between data and B-tree files. Every time a record is added to or

deleted from the data file, set the status flag in the header record of the data file to be 1, and every

time the corresponding B-tree page(s) is modified and stored back to the disk, reset the status flag

to be 0.

Submission Requirements: The same as described in Assignment One.