辅导DuckDuckGo、讲解Java/C++编程、辅导search engine、讲解Java

2018.11.13 - 首页 >> Java编程

1. Problem Statement

A new search engine “DuckDuckGo” is implementing a system to count the most popular

keywords used in their search engine. They want to know what the most popular keywords are

at any given time. You are required to undertake that implementation. Keywords will be given

from an input file together with their frequencies. You need to use a max priority structure to

find the most popular keywords.

You must use the following data structures for the implementation.

Max Fibonacci heap: to keep track of the frequencies of keywords

Hash table: keywords should be used as keys for the hash table and value is the pointer to the

corresponding node in the Fibonacci heap.

You will need to perform the increase key operation many times as keywords appear in the

input keywords stream. You are only required to implement the Fibonacci heap operations

that are required to accomplish this task.

2. Guidelines

Do not use complex data structures provided by programming languages. You have to

implement Max-Fibonacci heap data structure by your own using primitive data structures such

as pointers. But if you want you can use a library implementation for the hash table.

Your implementation should be your own. You have to work by yourself for this assignment

(discussion is allowed). Program code will be checked and you may have negative result if it has

reasonable doubt, this is a very easy assignment and should not take you more than 3 hours.

You may implement this assignment in Java or C++. Your program must be compatible with the

compilers and run on the following environment:

You must write a makefile document which creates an executable file named keywordcounter.

3. Input and Output Requirements

Your program is required to take the input file name as an argument. Following is an example of

a program that read from a file named file_name.

For C++

$./ keywordcounter file_name

For Java

java keywordcounter file_name

3.1. Input format

Keywords appear one per each line in the input file and starts with $ sign. In each line, an integer will

appear after the keyword and that is the count (frequency) of the keyword (There is a space

between keyword and the integer). You need to increment the keyword by that count. Queries will

also appear in the input file and once a query (for most popular keywords) appears, you should

append the output to the output file. Query will be an integer number ( ) without $ sign in the

beginning. You should write the top most keyword to the output file. When “stop” (without $ sign)

appears in the input stream, program should end. Following is an example of an input file.

$facebook 5

$youtube 3

$facebook 10

$amazon 2

$gmail 4

$weather 2

$facebook 6

$youtube 8

$ebay 2

$news 2

$facebook 12

$youtube 11

$amazon 6

$facebook 12

$amazon 2

$stop 3

$playing 4

$gmail 15

$drawing 3

$ebay 12

$netflix 6

$cnn 5

stop

3.2. Output format

Once a query appears, you need to write down the most popular keywords to the output file in

descending order. Output for a query should be comma separated list without any new lines.

Once the output for a query is finished you should put a new line to write the output for the

next query. You should produce all the outputs in the output file named “output_file.txt”.

Following is the output file for the above input file.

facebook,youtube,amazon

facebook,youtube,gmail,ebay,amazon

3.3. Assumptions

Input keyword can be any arbitrary string including alphanumeric characters and special

symbols.

The count/frequency can only be a positive number greater than 0.

No spaces in the keywords.

For two keywords to be same, whole word should match. i.e. #youtube and #youtube_music are

two different keywords.

One query has only one integer.

If there are more than one keyword with similar frequencies, you can print them in any order. Query integer ≤ 20. i.e. you need to find at most 20 popular keywords.

Input file may consist of more than 1 million keywords.