辅导DuckDuckGo、讲解Java/C++编程、辅导search engine、讲解Java
- 首页 >> Java编程1. Problem Statement
A new search engine “DuckDuckGo” is implementing a system to count the most popular
keywords used in their search engine. They want to know what the most popular keywords are
at any given time. You are required to undertake that implementation. Keywords will be given
from an input file together with their frequencies. You need to use a max priority structure to
find the most popular keywords.
You must use the following data structures for the implementation.
Max Fibonacci heap: to keep track of the frequencies of keywords
Hash table: keywords should be used as keys for the hash table and value is the pointer to the
corresponding node in the Fibonacci heap.
You will need to perform the increase key operation many times as keywords appear in the
input keywords stream. You are only required to implement the Fibonacci heap operations
that are required to accomplish this task.
2. Guidelines
Do not use complex data structures provided by programming languages. You have to
implement Max-Fibonacci heap data structure by your own using primitive data structures such
as pointers. But if you want you can use a library implementation for the hash table.
Your implementation should be your own. You have to work by yourself for this assignment
(discussion is allowed). Program code will be checked and you may have negative result if it has
reasonable doubt, this is a very easy assignment and should not take you more than 3 hours.
You may implement this assignment in Java or C++. Your program must be compatible with the
compilers and run on the following environment:
You must write a makefile document which creates an executable file named keywordcounter.
3. Input and Output Requirements
Your program is required to take the input file name as an argument. Following is an example of
a program that read from a file named file_name.
For C++
$./ keywordcounter file_name
For Java
java keywordcounter file_name
3.1. Input format
Keywords appear one per each line in the input file and starts with $ sign. In each line, an integer will
appear after the keyword and that is the count (frequency) of the keyword (There is a space
between keyword and the integer). You need to increment the keyword by that count. Queries will
also appear in the input file and once a query (for most popular keywords) appears, you should
append the output to the output file. Query will be an integer number ( ) without $ sign in the
beginning. You should write the top most keyword to the output file. When “stop” (without $ sign)
appears in the input stream, program should end. Following is an example of an input file.
$facebook 5
$youtube 3
$facebook 10
$amazon 2
$gmail 4
$weather 2
$facebook 6
$youtube 8
$ebay 2
$news 2
$facebook 12
$youtube 11
$amazon 6
3
$facebook 12
$amazon 2
$stop 3
$playing 4
$gmail 15
$drawing 3
$ebay 12
$netflix 6
$cnn 5
5
stop
3.2. Output format
Once a query appears, you need to write down the most popular keywords to the output file in
descending order. Output for a query should be comma separated list without any new lines.
Once the output for a query is finished you should put a new line to write the output for the
next query. You should produce all the outputs in the output file named “output_file.txt”.
Following is the output file for the above input file.
facebook,youtube,amazon
facebook,youtube,gmail,ebay,amazon
3.3. Assumptions
Input keyword can be any arbitrary string including alphanumeric characters and special
symbols.
The count/frequency can only be a positive number greater than 0.
No spaces in the keywords.
For two keywords to be same, whole word should match. i.e. #youtube and #youtube_music are
two different keywords.
One query has only one integer.
If there are more than one keyword with similar frequencies, you can print them in any order. Query integer ≤ 20. i.e. you need to find at most 20 popular keywords.
Input file may consist of more than 1 million keywords.