辅导CISC-235、讲解Python/Java编程、辅导C/C++设计、讲解HOTNCU留学生
- 首页 >> Python编程CISC-235
Fall 2018
Assignment 2
A certain university has decided to get in on the blockbuster film game by creating a set of
inter-connected movies which will be collectively called the HOTNCU (Harvard of the North
Cinematic Universe). Each movie will focus on the gripping adventures of one or more
super-heroes who happen to be students, staff or faculty at the mysterious University Q,
situated in the far-away and not-cold-at-all land of Notario. (The University Administration
is very proud of having come up with this clever disguise for the actual setting of the movies.)
The projected number of movies in the series will be at least 3000 and not more than 4000.
Each movie under development has been assigned a project code name to preserve secrecy.
Each code name is an 8-letter English word. A sample set of code names is provided in the
file HOTNCU_potential_codenames_2018F.txt.
You have been assigned the task of creating a data structure that can
contain 4000 items or more
support insert and search operations
provide access to each item with a maximum of 3 steps (that is, the maximum number
of table addresses examined during an insert operation can be no more than 3). For
more information on this, see below.
Your hard-earned data structures expertise has convinced you that neither a sorted array nor
a binary tree can meet this requirement, so you have settled on using a hash table.
The HOTNCU Project Director was previously a Computer Science professor and she has
taken an interest in your project. She has already decided that you are required to use some
form of open addressing. She is aware that your table will need to be > 4000 in size but she
wants you to try to minimize it.
She wants you to explore at least two forms of open addressing: quadratic probing and
double hashing. For each method she wants you to experiment with different hashing
functions and details of the collision resolution methods to determine a table size that lets you
achieve the required performance standard. See below for a discussion of how to compute
the necessary information.
Part 1:
Formulate an hypothesis regarding which of quadratic probing or double hashing will
achieve the required performance standard with smaller table sizes.
Part 2:
Decide how you will convert the code names into usable key values. This may involve
converting each code name to an integer, or simply treating each code name as a bit string.
You will also find a wealth of ideas on the Internet. Whatever method you decide on, explain
why you chose it and remember to cite your source if it is not your own creation.
You may wish to take advantage of the fact that all the code names have exactly 8 characters.
Part 3:
Implement a hash table where collisions are resolved by quadratic probing.
Use an hashing function of your own choice. You must implement the algorithm
yourself. Using downloaded code from external sources is not permitted – but writing your
own code based on a published algorithm is fine (remember to cite your sources). You may
wish to experiment with different functions to minimize the number of collisions.
Try at least three combinations of and :
1)
2) (remember that you will have to convert the result to an integer),
3) other combinations of your choice.
For each combination, find a table size that lets you achieve the requirement on maximum
probe sequence length. Use experimentation to get close to the minimum table size that
satisfies the requirement.
For experimentation, I suggest that you draw many random samples of 4000 strings from the
provided text file, and run your hashing algorithms on the samples with a range of table sizes
to determine which sizes meet the performance requirement.
A table size must be rejected if there is any code name in the sample for which the insert
operation exceeds the maximum number of steps.
Part 4:
Repeat Part 3 but with Double Hashing instead of Quadratic Probing. Try at least three
combinations of and . For each combination, find a table size that achieves the
required performance.
You are free to choose any hashing functions you like for and , but as with Part 3
you must implement them yourself.
Part 5:
Write a report according the format specified for this course, including an analysis of your
experimental results and whether or not they support your hypothesis.
Computing the Number of Steps in an Insertion
Every time your program looks at the content of a table address, that counts as a step. So if
you are inserting a value and you try addresses 717, 5, and 2083 before finally inserting the
value in address 3006, that counts as four steps.
Logistics:
You may complete the programming part of this assignment in Python, Java, C or C++.
You must submit your source code, properly formatted and documented You must also
submit a PDF file containing your report. All files must contain your name and student
number, and must contain the following statement: “I confirm that this submission is my own
work and is consistent with the Queen's regulations on Academic Integrity.” Combine your
files into a .zip archive for uploading.
You are required to work individually on this assignment. You may discuss the problem in
general terms with others and brainstorm ideas, but you may not share code. This includes
letting others read your code or your written conclusions.
The due date for this assignment is 11:59 PM, November 16, 2018.