辅导CISC-235、讲解Python/Java编程、辅导C/C++设计、讲解HOTNCU留学生

2018.11.07 - 首页 >> Python编程

CISC-235

Fall 2018

Assignment 2

A certain university has decided to get in on the blockbuster film game by creating a set of

inter-connected movies which will be collectively called the HOTNCU (Harvard of the North

Cinematic Universe). Each movie will focus on the gripping adventures of one or more

super-heroes who happen to be students, staff or faculty at the mysterious University Q,

situated in the far-away and not-cold-at-all land of Notario. (The University Administration

is very proud of having come up with this clever disguise for the actual setting of the movies.)

The projected number of movies in the series will be at least 3000 and not more than 4000.

Each movie under development has been assigned a project code name to preserve secrecy.

Each code name is an 8-letter English word. A sample set of code names is provided in the

file HOTNCU_potential_codenames_2018F.txt.

You have been assigned the task of creating a data structure that can

contain 4000 items or more

support insert and search operations

provide access to each item with a maximum of 3 steps (that is, the maximum number

of table addresses examined during an insert operation can be no more than 3). For

more information on this, see below.

Your hard-earned data structures expertise has convinced you that neither a sorted array nor

a binary tree can meet this requirement, so you have settled on using a hash table.

The HOTNCU Project Director was previously a Computer Science professor and she has

taken an interest in your project. She has already decided that you are required to use some

form of open addressing. She is aware that your table will need to be > 4000 in size but she

wants you to try to minimize it.

She wants you to explore at least two forms of open addressing: quadratic probing and

double hashing. For each method she wants you to experiment with different hashing

functions and details of the collision resolution methods to determine a table size that lets you

achieve the required performance standard. See below for a discussion of how to compute

the necessary information.

Part 1:

Formulate an hypothesis regarding which of quadratic probing or double hashing will

achieve the required performance standard with smaller table sizes.

Part 2:

Decide how you will convert the code names into usable key values. This may involve

converting each code name to an integer, or simply treating each code name as a bit string.

You will also find a wealth of ideas on the Internet. Whatever method you decide on, explain

why you chose it and remember to cite your source if it is not your own creation.

You may wish to take advantage of the fact that all the code names have exactly 8 characters.

Part 3:

Implement a hash table where collisions are resolved by quadratic probing.

Use an hashing function of your own choice. You must implement the algorithm

yourself. Using downloaded code from external sources is not permitted – but writing your

own code based on a published algorithm is fine (remember to cite your sources). You may

wish to experiment with different functions to minimize the number of collisions.

Try at least three combinations of and :

2) (remember that you will have to convert the result to an integer),

3) other combinations of your choice.

For each combination, find a table size that lets you achieve the requirement on maximum

probe sequence length. Use experimentation to get close to the minimum table size that

satisfies the requirement.

For experimentation, I suggest that you draw many random samples of 4000 strings from the

provided text file, and run your hashing algorithms on the samples with a range of table sizes

to determine which sizes meet the performance requirement.

A table size must be rejected if there is any code name in the sample for which the insert

operation exceeds the maximum number of steps.

Part 4:

Repeat Part 3 but with Double Hashing instead of Quadratic Probing. Try at least three

combinations of and . For each combination, find a table size that achieves the

required performance.

You are free to choose any hashing functions you like for and , but as with Part 3

you must implement them yourself.

Part 5:

Write a report according the format specified for this course, including an analysis of your

experimental results and whether or not they support your hypothesis.

Computing the Number of Steps in an Insertion

Every time your program looks at the content of a table address, that counts as a step. So if

you are inserting a value and you try addresses 717, 5, and 2083 before finally inserting the

value in address 3006, that counts as four steps.

Logistics:

You may complete the programming part of this assignment in Python, Java, C or C++.

You must submit your source code, properly formatted and documented You must also

submit a PDF file containing your report. All files must contain your name and student

number, and must contain the following statement: “I confirm that this submission is my own

work and is consistent with the Queen's regulations on Academic Integrity.” Combine your

files into a .zip archive for uploading.

You are required to work individually on this assignment. You may discuss the problem in

general terms with others and brainstorm ideas, but you may not share code. This includes

letting others read your code or your written conclusions.

The due date for this assignment is 11:59 PM, November 16, 2018.