C Assignment 留学生辅导讲解、讲解C Assignment 程序、辅导C/C++程序
- 首页 >> C/C++编程Programming Assignment 3 (PA3) - Spam Checker
Due Date: Wednesday, August 29th @ 11:59 pm
Getting Started Detailed Overview Unit Testing Extra Credit
Example Input Functions to be Written README Questions Turnin Summary
Assignment Overview
In this assignment you are writing a two part interactive program meant to check if an email is in a list of spam
addresses or not. The first part will read in email data files and populate hash tables with these email
addresses, then write out these populated tables into an output file. The second part will allow for user
interaction with the data. It will first read in the tables that were written to the output file from the first part and
will then allow the user to enter email addresses and check if they are spam. If the email address is found in all
of the tables then there is a high chance that the email is a spam email. In our program, if the email address is
found in all of the tables, then it is definitely a spam. This program is inspired by Bloom Filtering, which you
may want to read about to fully understand what you are trying to accomplish.
The educational purpose of this programming assignment is to provide you with more practice with dynamic
memory allocation, more complex file I/O, hashtables and separate-chaining hashtables.
We will try not to do too much hand-holding in this PA. You have some experience now and you need to learn
how to look up information yourself. You will not be given nearly as much help in the upper-division classes,
so now is the time to start doing more on your own.
Grading
● README: 10 points - See README Requirements here and questions below
○ http://cseweb.ucsd.edu/~ricko/CSE30READMEGuidelines.pdf
● Compiling: 5 points - Using our Makefile; no warnings. If what you turn in does not compile with the
given Makefile, you will receive 0 points for this assignment. NO EXCEPTIONS!
● Style: 10 points - See Style Requirements here
○ http://cseweb.ucsd.edu/~ricko/CSE30StyleGuidelines.pdf
● Correctness: 75 points
○ Make sure you have all files tracked in Git.
● Extra Credit: 5 points - View Extra Credit section for more information.
● Wrong Language: You will lose 10 points for each module in the wrong language, C vs. Assembly or
vice versa.
NOTE: If what you turn in does not compile with given Makefile, you will receive 0 points for this assignment.
Getting Started
Follow these steps to acquire the starter files and prepare your Git repository.
Gathering Starter Files:
The first step is to gather all the appropriate files for this assignment. Connect to pi-cluster via ssh.
$ ssh cs30yyz@pi-cluster.ucsd.edu
Create and enter the pa3 working directory.
$ mkdir ~/pa3
$ cd ~/pa3
Copy the starter files from the public directory.
$ cp -r ~/../public/pa3StarterFiles/* ~/pa3/
Copy over the following file from PA2. You are responsible for making sure these functions have good style
and work correctly as they are required for this assignment.
$ cp ~/pa2/checkRange.s ~/pa3
Starter files provided:
pa3.h
test.h
Spam emails provided:
/emails/emails_<num>
pa3Strings.h
testrevHash.c
/emails/emails_all
pa3Globals.c
Makefile
*num means the number of spam email addresses in the file
** You can assume no duplicates in the email list
Preparing Git Repository:
You are required to use Git with this and all future programming assignments. Refer to the PA0 writeup for how
to set up your local git repository.
Example Input
Two sample stripped executables provided for you to try and compare your output against are available in the
public directory. Note that you cannot copy them to your own directory; you can only run them using the
following commands (where you will also pass in the command line arguments):
$ ~/../public/pa3testCreate
$ ~/../public/pa3testSearch
NOTE:
1. The output of your program MUST match exactly as it appears in the pa3testCreate and
pa3testSearch output. You need to pay attention to everything in the output, from the order of the
error messages to the small things like extra newlines at the end (or beginning, or middle, or
everywhere)!
2. We are not giving you any sample outputs, instead you are provided some example inputs.
You are responsible for trying out all functionality of the program; the list of example inputs is
not exhaustive or complete. It is important that you fully understand how the program works
and you test your final solution thoroughly against the executable.
Back to Top
Usage for create:
Usage: ./create -i dataFile -o outFile [-s size] [-h]
-i/--infile dataFile -- The file containing the data
-o/--output outFile -- The file to output hash tables to
-s/--size size -- The size of the hash tables.
If not specified defaults to 5000.
-h/--help -- Displays this long usage message
Example input that has error output:
cs30yyz@pi-cluster-001:pa3$ ./create
cs30yyz@pi-cluster-001:pa3$ ./create -o -s
cs30yyz@pi-cluster-001:pa3$ ./create -s 1000000
cs30yyz@pi-cluster-001:pa3$ ./create -i emails/emails_1000 -s 10000
cs30yyz@pi-cluster-001:pa3$ ./search
cs30yyz@pi-cluster-001:pa3$ ./search -i outfile -o
Usage for search:
Usage: ./search -i inFile [-h]
-i inFile -- The file containing the hash tables to search
-h -- Displays this long usage message
Example input that has normal output:
cs30yyz@pi-cluster-001:pa3$ ./create -i emails/emails_10000 -s 10000 -o outfile
cs30yyz@pi-cluster-001:pa3$ ./create -i emails/emails_10000 -o outfile
cs30yyz@pi-cluster-001:pa3$ ./search -i outfile -h
cs30yyz@pi-cluster-001:pa3$ ./search -i outfile
Note: An easy way to check your output against the sample executable is by using diff:
cs30yyz@pi-cluster-001:pa3$ ./create -i emails/emails_10 -o myOutfile
cs30yyz@pi-cluster-001:pa3$ ~/../public/pa3testCreate -i emails/emails_10 -o refOutfile
cs30yyz@pi-cluster-001:pa3$ diff -s myOutfile refOutfile
Detailed Overview
The function prototypes for the various C and Assembly functions are as follows.
C routines:
int main( int argc, char * argv[] ); (in the file create.c)
int main( int argc, char * argv[] ); (in the file search.c)
void launchUserQuery(table_t * htbl, table_t * rtbl, table_t * eotbl);
int checkTables( char * str, table_t * htbl, table_t * rtbl, table_t * eotbl );
int evenOddHash( char * str );
void freeLinkedList( linkedList_t * head );
void * retrieveData( int hashVal, table_t * table );
void populateTables( table_t * htbl, table_t * rtbl, table_t * eotbl,
Back to Top
FILE * dataFile );
void insertData( void ** head, char * str );
void readTables( FILE * inFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
void writeTables( FILE * outFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
Assembly routines:
int hash( char * str );
long checkRange( long value, long minRange, long maxRange );
int revHash( char * str );
Process Overview:
The following is an explanation of the main tasks of the assignment, and how the individual functions work
together to form the whole program. There are two diagrams for this program since there are two separate
main functions.
The first portion of this assignment is create, which will be used to read data and produce hash tables that get
written to a file.
Create workflow:
1. Use getopt_long() to parse all the command line arguments passed into the program. Do all the
required error checking.
2. Allocate space for the three different hash tables and begin populating the tables with data inside the
file specified by the user. *Any time during memory allocation, if an error occurs, free all allocated
memory and exit program
3. Write the populated tables into an output file which could be used to run search later.
Back to Top
The second portion of this assignment is search, which allows the user to check if an email is likely to be spam
or not.
Search workflow:
1. Use getopt() to parse all the command line arguments passed into the program. Do all the required
error checking.
2. Read in the different hash tables stored in the specified input file and recreate those tables in memory.
3. Begin the interactive user query mode that allows user to search the tables. If user specified -x for stats
mode, at the end of the query the total stats and collisions found will be printed. [Three collisions means
that there is a higher chance of the email being spam]
4. When the user ends the search with ctrl-d, free all allocated memory and exit the program.
Back to Top
Explanation about the hashtables
There are three hashtables:
● table 1(htbl) - Use hash() as the hash function. The void * at index i should be set to 1 if some
item hashed to index i
● table 2(rtbl) - Use revhash() as the hash function. The void * at index i should be set to 1 if
some item hashed to index i
● table 3(eotbl) - Use evenOddHash() as the hash function. This is a separate-chaining hash
table. When an item is hashed to index i, it should be pushed to the linked list pointed by the pointer
at index i
Back to Top
Memory Diagram of tables 1 and 2:
Memory Diagram of table 3 (linked list):
Functions to be Written
Listed below are all the modules to be written.
hash.s
int hash( char * str );
This function will be used to create the hash key of a string. This function creates an integer hash key from
str with the following hashing algorithm. Your task is to translate this algorithm into assembly. Use the global
variables defined in pa3Globals.c to access these constants in assembly. The return value can be negative
because of overflow.
int hash = HASH_START_VAL;
int strLen = strlen(str);
for( int i = 0; i < strLen; i++ ) {
hash = hash * HASH_PRIME + str[i];
Back to Top
}
return hash;
Return Value: The hash key of str.
revHash.s
int revHash( char * str );
This function is similar to hash.s, the only difference is that to compute the hash key, it iterates through the
string in reverse order. So, instead of starting at the beginning of the string, you will start at the end and work
your way forward.
Return Value: The hash key of str.
evenOddHash.c
int evenOddHash( char * str );
This function is also similar to hash.s, the only difference is that to compute the hash key, it iterates through the
string first by by even and then by odd indices. So, loop through the even indices in the string first (i.e. str[0]
then str[2], etc.), then iterate over all odd indices (i.e. str[1] then str[3], etc.).
Return Value: The hash key of str.
insertData.c
void insertData( void ** head, char * str );
This function takes in the head of a linked list and an element to insert (a string) and then pushes the element
onto the front of the linked list. To implement this function, you need to allocate new memory for the string, so
that the list does not need to rely on the memory already allocated for the string passed in as an argument.
In order to insert a new element in the list first allocate space for the new head of the list (a struct of type
linkedList_t), then allocate space for a copy of the string that was passed in. After that, put the allocated
string with a copy of str into the newly allocated head, and finally attach the new head to the original list.
Reasons for error:
● Error with dynamically allocating memory. Free previously allocated memory, then call perror(),
passing in the macro MEM_ERR defined in pa3.h as the parameter. (perror() will print the correct
error message for you) Then return immediately.
populateTables.c
void populateTables( table_t * htbl, table_t * rtbl, table_t * eotbl,
FILE * dataFile );
This function is in charge of populating 3 hash tables by reading in the dataFile, grabbing each line through
fgets(), and computing the hash value from each of the 3 hash functions. When reading in line by line, use
strchr() to find the newline character in the resulting buffer from fgets(), and replace it with a null
character. Next, in order to allow our strings to be case-insensitive, use tolower() on every character of
Back to Top
each string to fully convert all characters in it to lower-case. Following this, we will generate the three hashes
for each string by calling hash(), revHash(), and evenOddHash(). Note that the hashkey returned by
each of these hash functions will not necessarily be an index within the size of the hash table. In order to
guarantee that the indices are within the size of the hashtable and are positive, you MUST do the following for
each hash key returned by the hashes, and use the result as the index into the corresponding hashtable:
idx = ((hash_key % table_size) + table_size) % table_size.
After calculating all 3 hash indices, index into the hash tables htbl and rtbl and set the entry to 1. The third
hash table contains a linked list at each hash bucket, so when an element hashes to that location, the string is
pushed to the front of that linked list, and we do so by calling insertData().
writeTables.c
void writeTables( FILE * outFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
This function serializes the three hash tables and writes them to outFile so that they can later be read and
deserialized back into hashtables in their original format (note that the ordering of elements in each linked list
does not matter).
Contents of the tables should be written to outFile in the following order:
1. Write the size of the tables. You only need to write one number here since all three of the them have
the same size.
2. Write all entries of htbl. They are pointed by member tbl in table_t and each entry is a void
*. Use fwrite() (man -s3 fwrite) to write to outFile.
3. Write all entries of rtbl. They are pointed by member tbl in table_t and each entry is a void
*. Use fwrite().
4. Write the strings in each bucket of eotbl. This is slightly more complicated than the three writes
above, which can be accomplished with calls to fwrite. You can use fputs() (man -s3 fputs)
to help you write each string. Notice that fputs writes the string without its terminating null byte (‘\0’), so
you should write a ‘\0’ after each string using fputc()(man -s3 fputc). To represent the end of a
bucket, we want to write an extra null byte (‘\0’).
Example:
The following eotbl should be serialized as:
a@spam.me\0b@spam.me\0\0c@hack.co\0\0\0
Back to Top
How to test:
In testwriteTables.c, you should set up the hashtables for testing and a FILE stream to write to.
To check the correctness of the output, you can use the following commands to dump the content of the file:
od -c filename (octal dump that print printable characters)
xxd filename (hex dump)
Check the man pages for these commands to get more information!
create.c
int main( int argc, char * argv[] );
This is the main driver for the create portion of your program. Its main tasks are to parse the command line
arguments, build the hashtables of email data, and serialize the hashtables to a file.
Parsing command line arguments:
For this assignment, we want the program to support both short and long options, so we will be using
getopt_long() to parse the flags described in the long usage statement. Check the man page (man -s3
getopt_long) for more details and examples of how to use getopt_long(). Make sure to use the
constants for argument parsing defined in pa3.h. The implementation details for the flags are listed below:
Required
/Optional
Short
Flag
Long Flag Required
Argument
How to Handle
O -h --help N/A Print out the long usage to stdout and return indicating
success.
O -s --size table size Convert the table size from a string to a long (using
strtol()) and save the converted value for later use.
Back to Top
R -i --infile infile name Try to open the file of this name
R -o --output output file name Try to open the file of this name
In the case where getopt_long() detects an invalid flag or missing flag argument, it will automatically print
an error message for you. If this happens, you also need to print the short usage and return indicating failure.
Note that infile and output flags are required, so you need to check if either of them is missing. Size flag is
optional. DEFAULT_SIZE (pa3.h) should be used if no size flag entered.
If all flags are parsed without any errors, make sure you also check for extra arguments after getopt_long()
completes.
Build the hashtables and write to file:
Dynamically allocate zero-filled memory for the three hashtables. Remember to check if any error happens
during dynamic memory allocation. If no error occurs, call populateTables(). After the hashtables get
populated, call writeTables() to write to the file.
Free up memory:
Now you are done with the hashtables and are about to exit. Remember to free all the memory you
dynamically allocated. The function freeLinkedList() should be helpful here.
Reasons for error:
In this file, as soon as an error is detected, print the appropriate error message(s), short usage
SHORT_CREATE_USAGE (see pa3Strings.h), and return indicating failure.
The following are error messages (numbers indicate the order of error checking) you need to print for each
case, make sure you don’t forget to print the short usage in each case!
1. Command line argument error
○ Size (make sure the error checking is the following order)
■ Size is not a valid number: print the error message (INVALID_NUM)
■ Size is a number too large to be converted to a long: use snprintf() and perror()
to print out the error message (TOO_LARGE_NUM)
■ Size is not within [MIN_SIZE, MAX_SIZE] (pa3.h) inclusive. (checkRange())
○ Infile
■ Error encountered when opening the file. If this happens, call perror() with the
FILTER_ERR, and immediately return. (Don’t forget to set errno to 0 before calling
fopen())
○ Output
■ Error encountered when opening the file. If this happens, call perror() with the
WTABLE_FILE_ERR, and immediately return. (Don’t forget to set errno to 0 before
calling fopen())
○ Any missing flag argument or invalid flag
■ getopt_long() prints out the error message for you
2. Missing either of infile flag or output flag
○ Print ARG_ERR
Back to Top
3. extra arguments after getopt_long() completes
○ Print EXTRA_ARGS
4. Dynamic memory allocation failed
○ Free whatever you have dynamically allocated in this function
○ Call perror() with the custom MEM_ERR
Return Value: EXIT_SUCCESS on success and EXIT_FAILURE on any error.
search.c
int main( int argc, char * argv[] );
This function performs the functionality for the search program. It starts by parsing the command line
arguments, handling any errors encountered in the process; then it creates the hash tables and fills them by
reading from the input file. Finally it performs an interactive search and prints stats if needed.
Now let’s take a closer look at each step of the program.
1. Parse command line arguments using getopt: we need to handle 2 possible options
a. INFILE_FLAG: use fopen to open the specified input file. If an error occurs opening the file,
print RTABLE_FILE_ERROR using perror, and then print SHORT_SEARCH_USAGE.
b. HELP_FLAG: simply print LONG_SEARCH_USAGE.
c. If we didn't see an input file, print error message MISSING_INFILE, and
SHORT_SEARCH_USAGE, then immediately return.
d. If we saw extra args, print error message EXTRA_ARGS, and SHORT_SEARCH_USAGE, then
immediately return.
2. Create the 3 tables, and then fill them by reading from the specified input file (call readTables())
3. Perform interactive search by calling launchUserQuery()
4. Finally, don't forget to free all allocated memory.
Reasons for error:
● Error calling fopen
● No input file specified
● Extra arguments for the input flags
Return Value: 1 if an error occurs; 0 otherwise.
launchUserQuery.c
void launchUserQuery(table_t * htbl, table_t * rtbl, table_t * eotbl);
This function handles user interaction to search the tables and prints out the result for each search.
First, start by printing out a prompt for the user to input the string. Then, use fgets() in a while loop to read
the user input line by line.
● Once we read in a string, we want to find any newline character also read in along the string, and
replace it with a NUL character.
● If the user did not enter any string (simply pressed ‘enter’), then we want to immediately reprompt, and
skip the search.
Back to Top
● Before performing the search, we also need to pre-process the string by lowercasing every character.
(Hint: you might find tolower() helpful)
● Finally we want to check the string in three tables by calling checkTables().
○ If checkTables() returns the macro IN_TABLE, print FOUND_IN_ALL with the current word
as the argument.
○ If the string is found in at least one table, print FOUND_IN_SOME with corresponding arguments.
○ If the string in not found in any of the tables, print WORD_NOT_FOUND.
● At the end of each fgets() iteration, reprompt the user for the next input.
Lastly, print a newline before exiting the function.
checkTables.c
int checkTables( char * str, table_t * htbl, table_t * rtbl, table_t * eotbl );
This function reports the number of tables the input string was seen in.
Perform the check in each table by calling retrieveData(). Check htbl first, if it is found in htbl then
check rtbl, if it is found in rtbl then check if it is found in eotbl. Increment the count every time the string
is found in a table. If the string is found in eotbl, then you must also look inside the linked list attached to its
eotbl index. If the input string itself is also found in the linked list, return IN_TABLE instead of the count.
Return Value: Number of tables the string was found in, or IN_TABLE if it is found in all three tables and the
linked list attached to the 3rd table
freeLinkedList.c
void freeLinkedList( linkedList_t * head );
This function frees all elements in the input linked list. To free a linked list, iterate through the list starting from
the head, first free the value of the node, then the node itself, until we reach the end of the list.
retrieveData.c
void * retrieveData( int hashVal, table_t * table );
This function takes a hash value and a table pointer and returns the element at the relevant index of the table.
To calculate the index, mod the hash value by the size of the table, add table size to the result, and then mod
the entire result by the table size again (the same formula from populateTables.c). f
Return Value: The element at the relevant index of the table.
readTables.c
void readTables( FILE * inFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
Back to Top
Read from outFile and populate the three hashtables serialized in writeTables(). You can assume
the outFile is a file of the same format produced by writeTables(). The hashtables should have the
same structure as original ones, but the order of elements within a linked list does not matter.
Refer to writeTables() for how the hashtables are serialized to the file.
You should read in the following order:
● Read the size of the hashtables
● Read the htbl contents
● Read the rtbl contents
● Read eotbl contents
Here is one way to implement reading the third table eotbl from the file:
● Keep a relatively big buffer (size should be greater than max word size) and read the file in blocks of
the size of the buffer with fread()
○ If there is no incomplete string left from last read into the buffer, you can try reading in a whole
buffer size of data
○ If there is an incomplete string from last read, the incomplete string should be move to the front
of the buffer (more details on this in the next bullet point). When you read from file into the buffer
again, you should append to the incomplete string that is currently at the beginning of the buffer
● Process data in the buffer you just read
○ If you encounter a ‘\0’,check if it is the end of a bucket in eotbl or the end of a string
■ If it is the end of a bucket, go to the next bucket
■ If it is the end of a string, push the string to the linked list of the current bucket
(insertData())
○ If there is a string that is at the end of the buffer but not terminated, move the incomplete string
to the front of the buffer so that next time when you read data from the file into the buffer, you
can start after the end of the incomplete string. You can use memmove()(man -s3
memmove) to accomplish this.
Reasons for error:
● Error with dynamically allocating memory.
○ Free previously allocated memory for the hashtables. Call perror(), passing in the macro
MEM_ERR defined in pa3.h as the parameter. (perror() will print the correct error message
for you) And then return.
Return Value: None (The three hashtables should be populated correctly if no error occurred)
Unit Testing
You are provided with only one basic unit test file for testrevHash(. This file only has minimal test cases
and is only meant to give you an idea of how to write your own unit test files. You must write unit test files
for some of the functions (as listed below), as well as add several of your own thorough test cases to
all of the unit test files. You need to add as many unit tests as necessary to cover all possible use
cases of each function. You will lose points if you don’t do this! You are responsible for making sure you
thoroughly test your functions. Make sure you think about boundary cases, special cases, general cases,
extreme limits, error cases, etc. as appropriate for each function. The Makefile includes the rules for compiling
Back to Top
these tests. Keep in mind that your unit tests will not build until all required files for that specific unit test have
been written. These test files will be collected and count as part of the correctness of your program.
Unit tests you need to complete:
testinsertData.c
testhash.c
testrevHash.c
testevenOddHash.c
testpopulateTables.c
testwriteTables.c
To compile:
$ make testrevHash
To run:
$ ./testrevHash
(Replace “testrevHash” with the appropriate file names to
compile and run the other unit tests)
README Questions
1. [AI] Your friend is stressed and asks you for your CSE 30 PA code. How do you convince them to act
with integrity?
2. [Unix] You are trying to open your pa3Strings.h file. You have already typed “vim pa3S”. Which single
key can you press to autocomplete the command to “vim pa3Strings.h”?
3. [Vim] How do you change the color scheme in vim? Give the command you would use to change the
color scheme to “desert”.
Extra Credit
There are 5 points total for extra credit on this assignment.
● [5 Points] Multiple Email Search
Multiple Email Search
Do not begin this extra credit until your non extra credit assignment is completed and you have tested it
thoroughly. You will need to modify your launchUserQuery() to accept multiple emails entered at the same
time. There will be no changes to how the non extra credit program operates, you will be making a separate
searchEC program. To begin, make a copy of your launchUserQuery.c file. Do not make changes to any
other files (also do not worry about the usage message, keep it the same as the normal program).
$ cp ~/pa3/launchUserQuery.c ~/pa3/launchUserQueryEC.c
To make the extra credit:
cs30yyz@pi-cluster-001:pa3$ make searchEC
There is a public executable for you to test with, use your normal create to make the infile as this has not
changed.
cs30yyz@pi-cluster-001:pa3$ ~/../public/pa3testSearchEC -i infile
launchUserQueryEC.c
void launchUserQuery(table_t * htbl, table_t * rtbl, table_t * eotbl);
Back to Top
First note that you should NOT change the name of the function, only the name of the file is different. You will
need to make a few changes to this function in order to properly deal with multiple emails entered at the same
time. As an example, this is what the program output should look like with multiple emails entered:
Enter a word: hijack@spam.me jello grad@hack.co
hijack@spam.me is SPAM!
jello is not SPAM!
grad@hack.co is SPAM!
Enter a word:
Thus, you will need to so some form of looping over the emails entered. Make sure that your new functionality
is displaying the same stats as in the normal program.
Be sure to test well with the public executable to ensure your output is correct. For simplicity, you are
guaranteed that each email will be separated by a single space.
Turnin Summary
Log in to gradescope.com using your @ucsd.edu email address, select the CSE 30 course, assignment PA3,
and upload the following files. Your file names must match the below *exactly* otherwise our Makefile will not
find your files.
Due Date: Wednesday night, August 29th @ 11:59 pm
Files required for the Final Turn-in:
hash.s
revHash.s
checkRange.s
testinsertData.c
testhash.c
testrevHash.c
testevenOddHash.c
testpopulateTables.c
testwriteTables.c
launchUserQuery.c
checkTables.c
create.c
evenOddHash.c
freeLinkedList.c
retrieveData.c
pa3Globals.c
populateTables.c
insertData.c
readTables.c
search.c
writeTables.c
pa3.h
pa3Strings.h
test.h
Makefile
README
Extra Credit Files:
launchUserQueryEC.c
If there is anything in these procedures which needs clarifying, please feel free to ask any tutor, the instructor,
or post on the Piazza Discussion Board.
Back to Top