CSC 230 Project 4
Movie Watch List Manager
For this project, you get to write a program that will help you manage a list of movies that you would like to watch. It will
maintain a database of available movies, read in at program start-up. By entering commands for the program, the user can view
the entire movie database, or just the movies from a particular range of years or those with a title containing a given string. The
user can choose movies to add to their watch list and later remove them if they change their mind.
The sample execution below shows how you can run the program. The bold text is input typed by the user. Here, we're telling it
to read a movie database from the input file, list-d.txt. We ask it to output the entire database (11 movies in this case), and
then we ask it to list just the movies with a year between 1990 and 1999. Then we ask it to display the watch list (which is
empty), so we add a few movies to our watch list and display the list again. Finally, we remove one movie from the list, take
another look at the list, and enter the quit command to terminate the program.
$ ./movies list-d.txt
cmd> database
database
ID Title Year Len
4511 Aladdin 1992 90
4772 Alice in Wonderland 1951 75
18145 Cinderella 1950 74
61360 Mulan 1998 88
70281 Pinocchio 1940 88
70767 Pocahontas 1995 81
82766 Snow White and the Seven Dwarfs 1937 83
99053 The Lion King 1994 88
111278 Toy Story 1995 81
111279 Toy Story 2 1999 92
111280 Toy Story 3 2010 103
cmd> year 1990 1999
year 1990 1999
ID Title Year Len
4511 Aladdin 1992 90
99053 The Lion King 1994 88
70767 Pocahontas 1995 81
111278 Toy Story 1995 81
61360 Mulan 1998 88
111279 Toy Story 2 1999 92
cmd> list
list
List is empty
cmd> add 99053
add 99053
cmd> add 61360
add 61360
cmd> add 111278
add 111278
cmd> list
list
ID Title Year Len
99053 The Lion King 1994 88
61360 Mulan 1998 88
111278 Toy Story 1995 81
cmd> remove 61360
remove 61360
cmd> list
list
ID Title Year Len
99053 The Lion King 1994 88
111278 Toy Story 1995 81
cmd> quit
quit
As with recent projects, you'll be developing this one using git for revision control. You should be able to just unpack the starter
into the p4 directory of your cloned repo to get started. See the Getting Started section for instructions.
This project supports a number of our course objectives. See the Learning Outcomes section for a list.
The project uses a list of movies based on the title.basics.tsv.gz file previously retrieved from the IMDb (Internet Movie
Database) website.
Rules for Project 4
You get to complete this project individually. If you're unsure what's permitted, you can have a look at the academic integrity
guidelines in the course syllabus.
In the design section, you'll see some instructions for how your implementation is expected to work. Be sure you follow these
rules. It's not enough to just turn in a working program; your program has to follow the design constraints we've asked you to
follow. For this assignment, we're putting some constraints on the functions you'll need to define, the data structures you'll use
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 2/12
and how you're going to organize your code into components. Still, you will have lots of opportunities to design parts of your
solution and to create additional functions to simplify your implementation.
Requirements
This section says what your program is supposed to be able to do, and what it should do when something goes wrong.
Program Execution
The movies program expects one or more filenames on the command line. Each of these files should contain a list of movies that
the program can read into its database at startup. If the program is run with invalid command-line arguments (e.g., no filenames
given on the command line), it should print the following usage message to standard error and exit with a status of 1.
usage: movies*
If the program can't open one of the given files for reading, it should print the following message to standard error and exit with
a status of 1. Here, filename is the name of the file given on the command line. The program should report the first filename on
the command line that it can't successfully open (i.e., if there are multiple filenames on the command line that can't be opened,
it just needs to report this error for the first one that can't be opened).
Can't open file: filename
Movie List Format
At program start-up, the movies program reads in a database of movies. On the command line, it is given one or more filenames
for files containing movie lists, stored in a particular format. Each line of a movie list file describes one movie. A movie
description consists of five fields, with tab characters (ASCII 0x09) separating the fields. The first field is an integer ID unique to
the given movie. The next field is a title for the movie (a string). The next field is an integer year of when the movie was released.
The next field is an integer length of the movie in minutes and the last field is a string listing various genres for the movie. The
list of genres is a comma-separated list of strings. None of the fields will contain a tab character, and none of them will be empty.
Figure: Format of a line of a Movie List
Your program will only use the genre field if you're doing the extra credit part of the assignment. Otherwise, your program can
just skip over this field as it reads in a movie list. The possible genres are Action, Adventure, Animation, Biography, Comedy,
Crime, Documentary, Drama, Family, Fantasy, History, Horror, Musical, Mystery, Romance, Sci-Fi, Sport, Thriller, War, and
Western.
Some of the title fields are fairly long, but you will only need to store the first 38 characters of the title. This is described in the
"Movie List Output" section below.
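As a rough illustration of the line format described above, here is a minimal sketch of parsing one movie description. The parseMovie() helper, the Movie field names and the TITLE_MAX constant are assumptions made for this example, not part of the required design. Note that sscanf() treats a tab in the format string as "any whitespace", so this sketch is more lenient than a strict tab-by-tab check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TITLE_MAX 38   // assumed name for the 38-character title limit

// Assumed layout; the handout only says there is a field for each value.
typedef struct {
  int id;
  char title[ TITLE_MAX + 1 ];
  int year;
  int length;
} Movie;

// Hypothetical helper: parse one line of a movie list into *m.
// Returns false if a field is missing or a number can't be parsed.
static bool parseMovie( char const *line, Movie *m )
{
  // Read the ID and remember where parsing stopped.
  int used = 0;
  if ( sscanf( line, "%d%n", &m->id, &used ) != 1 || line[ used ] != '\t' )
    return false;

  // The title runs up to the next tab and must not be empty.
  char const *title = line + used + 1;
  char const *tab = strchr( title, '\t' );
  if ( tab == NULL || tab == title )
    return false;

  // Keep at most the first TITLE_MAX characters of the title.
  size_t len = (size_t)( tab - title );
  if ( len > TITLE_MAX )
    len = TITLE_MAX;
  memcpy( m->title, title, len );
  m->title[ len ] = '\0';

  // Year and length, then at least one character of the genre field.
  char genreStart;
  if ( sscanf( tab + 1, "%d\t%d\t%c", &m->year, &m->length, &genreStart ) != 3 )
    return false;
  return true;
}
```

A caller could report "Invalid movie list: filename" whenever a helper like this returns false.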
The program should process the movie list files in the same order they are given on the command line. Within each movie list, it
should process movies in order from the first line to the last line of the file. The order for processing these files matters for error
reporting. If there is something wrong with a movie list, the program should report the first error it encounters.
A movie list file can contain any number of movie descriptions, one per line. If the format of the movie list is invalid (e.g., if a line
is missing one of the expected fields or if one of the numeric fields can't be parsed as a number), then it should print the
following message to standard error and exit with a status of 1. Here, filename is the name of the file containing the bad movie
description.
Invalid movie list: filename
Every movie should have a unique numeric ID. If the program encounters more than one movie with the same ID (even if other
fields like the title or year are different), it should print the following message to standard error and exit with a status of 1. Here,
ID is the movie ID that occurred more than once. The program should detect duplicate IDs, whether they occur within the same
movie list file or across two different movie lists.
Duplicate movie id: ID
Watch List
As the user interacts with the movies program, they can select movies from the database to add to their watch list, a subset of
movies the user plans to watch. The database is the set of all movies available, and the watch list is the subset of the database
that the user has selected.
It's possible for the watch list to be empty (it's empty when the program starts up). It's managed like a set, so it can't contain
more than one of the same movie.
Movie List Output
A few user commands are used to list movies, either from the database or from the watch list. The output format for these
reports is mostly the same. It consists of a header describing each of the four fields (like the example shown below). After the
header, the report lists one movie per line. Each movie is reported as a movie ID in a 6-character field, a movie title in a 38
character field, a movie year in a 4 character field, and finally a movie length given in a 3 character field. For the year and the
movie length, the widths of 4 and 3 are minimum field widths, so it's possible to have a year with more than four digits or a
length with more than three digits. For cases like these, the columns may not line up properly. Each of these fields is right-aligned,
with a single space separating adjacent fields.
    ID                                  Title Year Len
  8466                                 Avatar 2009 162
 84694 Star Trek VI: The Undiscovered Country 1991 110
 84702     Star Wars: Episode IV - A New Hope 1977 121
 94055 The Englishman Who Went Up a Hill Bu.. 1995  99
108082                       The Wizard of Oz 1982  78
For movie titles that are too long to fit in their field width, you will print as much of the title as you can, and then print two
periods instead of the last two characters of the field, to indicate that the whole title was too long to fit. You can see this in the
"The Englishman Who Went Up a Hill But Came Down a Mountain" title above. Here, we print just the first 36 characters of the
title, then print two periods at the end, making the overall length exactly 38 characters.
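The field widths and the title truncation rule above could be expressed with printf-style formatting. The following is only a sketch: formatMovie() and TITLE_WIDTH are hypothetical names, and the function takes the full title as a parameter rather than assuming anything about how titles are stored.

```c
#include <stdio.h>
#include <string.h>

#define TITLE_WIDTH 38   // assumed name for the title field width

// Hypothetical helper: format one report line into buf (of size n).
static void formatMovie( char *buf, size_t n, int id, char const *title,
                         int year, int length )
{
  char shown[ TITLE_WIDTH + 1 ];
  if ( strlen( title ) > TITLE_WIDTH ) {
    // Too long: keep the first 36 characters and end with two periods.
    snprintf( shown, sizeof( shown ), "%.*s..", TITLE_WIDTH - 2, title );
  } else {
    snprintf( shown, sizeof( shown ), "%s", title );
  }

  // The widths are minimums; every field is right-aligned, one space apart.
  snprintf( buf, n, "%6d %*s %4d %3d", id, TITLE_WIDTH, shown, year, length );
}
```

Because %4d and %3d only set minimum widths, a five-digit year or a four-digit length still prints in full and simply pushes the columns out of line, which matches the behavior described above.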
User Commands
After start-up, the movies program reads commands typed in by the user. Each command is given as a single line of user input.
For each command, the program will prompt the user with the following prompt. There's a space after the greater-than sign, but
you probably can't see it in this web page.
cmd>
After the user enters a command, the program will echo that command back to the user on the next output line. This is mostly to
help with debugging your programs. If we're capturing program output to a file, then things typed by the user don't go to the
output file (user inputs show up on the terminal, but they're not part of the program's output). By echoing each command, our
output files will include a copy of each command the user typed, making it easier to see what the program was asked to do. So,
for example, if the user typed in a command like the following, the program would echo a copy of the command on the next line:
cmd> year 1990 1999
year 1990 1999
The user can type any of 7 (or 8) available commands: database, year, title, add, remove, list and quit. There is also a
genre command that can be implemented for extra credit. These commands are described below. Each valid command starts
with one of the keywords listed above. For some commands, the keyword must be followed by one or more parameters. There
may be one or more whitespace characters at the start of the command, between the keyword and the parameters, between
parameters or at the end of the command. Any non-whitespace characters on a line following a valid command (and its
parameters, if any) may be ignored by the program.
If the user enters an invalid command, the program should print the following message to standard output (not standard error),
ignore the command and prompt the user for another command. Invalid commands would be those that start with something
other than the 7 (or 8 for extra credit) keywords listed above, or if the command's parameters weren't correct.
Invalid command
After the first prompt, the program should print a blank line before prompting the user for another command. This is shown in
the sample execution at the start of this project description. It's just to provide a little separation between the output for
consecutive commands.
The program should terminate when it is given the quit command or when it reaches the end-of-file on standard input. In the
case of the quit command, the program should echo the command back to the user (like all the other commands). In the case of
end-of-file, there's no command to echo, so the program should just terminate.
Database command
If the user enters the database command, the program should print out all the movies in the entire database in the format
described in the "Movie List Output" section above. Movies should be sorted by their ID field, least to greatest.
If there are no movies in the database (this could happen if the program was given an empty movie list file to read), the
program should print the following message to standard output and then prompt for another command.
No matching movies
Year command
The year command requires two integer parameters, a low value and a high value. It lists movies from the database with a year
at least as high as the low value and no higher than the high value. Output should be given in the format described in the "Movie
List Output" section above, ordered by year, from low to high. Movies with the same year should be sorted by ID.
For example, the user could enter the following year command, with the following response from the program. Notice that the
two movies with a year of 1995 are ordered by ID number.
cmd> year 1990 1999
year 1990 1999
ID Title Year Len
4511 Aladdin 1992 90
99053 The Lion King 1994 88
70767 Pocahontas 1995 81
111278 Toy Story 1995 81
61360 Mulan 1998 88
111279 Toy Story 2 1999 92
The year command would be invalid if it was missing a parameter, or one of its parameters couldn't be parsed as an integer
value, or if its first parameter was greater than its second parameter. If the range of years doesn't contain any movies, the
program should print a line to standard output saying "No matching movies", like in the following example:
cmd> year 1978 1979
year 1978 1979
No matching movies
Title command
For this command, the user can enter a single-token string. The program will find all movies in the database that contain that
string as a substring in their title field. It will print out just these movies from the database in the format described in the "Movie
List Output" section above, with movies whose title contains or is equal to the given string listed in order of ID. A movie's title
field matches the given string even if the string is just a substring of a longer word in the movie's title field. For example, if
the user entered "title and", then it could match a movie that had the word "and" in its title field, or one that had the word
"Wonderland" in its title field. For example,
cmd> title and
title and
ID Title Year Len
4772 Alice in Wonderland 1951 75
82766 Snow White and the Seven Dwarfs 1937 83
If there are no matching movies in the database, the title command should print the "No matching movies" message to standard
output, just like the database and year commands.
A title command would be invalid if it didn't have a string after the title keyword.
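The substring rule for the title command maps directly onto the standard strstr() function. A minimal sketch follows; titleMatches() is a hypothetical helper name. Keep in mind that strstr() is case-sensitive, so "and" would not match a title containing only "And".

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true if str occurs anywhere inside title.
static bool titleMatches( char const *title, char const *str )
{
  // strstr() returns NULL when str is not found in title.
  return strstr( title, str ) != NULL;
}
```

With this check, "and" matches both "Alice in Wonderland" and "Snow White and the Seven Dwarfs", but not "Aladdin".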
Genre command
The genre command is for extra credit. For this command, the program will need to store the list of genres for each movie (the
last field on each line in a movie list). The user can enter the genre keyword, followed by a single word (a sequence of non-whitespace
characters). The program will find all movies in the database that contain that word as a substring in their genre
field. It will print out just these movies from the database in the format described in the "Movie List Output" section above, with
movies that match the given word listed in order of ID. A movie's genre field matches the given word even if the word is just a
substring of a longer word in the movie's genre field. For example, if the user entered "genre mat", then it could match a movie
that had the word "Animation" in its genre field.
If there are no matching movies in the database, the genre command should print the "No matching movies" message to
standard output, just like the database and year commands.
A genre command would be invalid if it didn't have a word after the genre keyword.
Add command
The add command is for adding movies from the database to the watch list. Movies added to the watch list are added at the end,
and adding a movie to the watch list doesn't remove it from the database; it just puts that movie on the watch list. The add
command expects a movie ID as a parameter. So, for example, the following command would add the movie with an ID of 42 to
the watch list.
add 42
An add command would be invalid if there wasn't an integer after the add keyword. If the integer doesn't match a movie ID from
the database, the program should print the following to standard output (where ID is the ID the user asked to add).
Movie ID is not in the database
If the user gives the ID of a movie that's already on the watch list, the program should print the following message to standard
output (where ID is the ID the user asked to add):
Movie ID is already on the watch list
Remove command
The remove command is for removing movies from the watch list. As a parameter, it expects the ID of the movie to be removed.
It removes that movie from the watch list, and the remaining movies stay in the same order. So, for example, the following
command would remove the movie with an ID of 42 from the watch list.
remove 42
If the remove command isn't given a valid integer as a parameter, then it is an invalid command. If the parameter is a valid
integer but doesn't match the ID of a movie on the watch list, then it should print the following message to standard output,
where ID is the ID of the movie the user asked to remove.
Movie ID is not on the watch list
List command
The list command shows the movies on the watch list in the same format described in the "Movie List Output" section above.
Notice that the watch list is ordered based on the order movies were added (not sorted by ID).
For example, running the list command might look like the following.
cmd> list
list
ID Title Year Len
70356 Pirates of the Caribbean: The Curse .. 2003 143
59720 Mission: Impossible 1996 110
91003 The Bourne Identity 2002 119
30675 Ferris Bueller's Day Off 1986 103
If the watch list is empty, the program should print a line to standard output saying "List is empty".
So, for example, you might get the following response from a list command:
cmd> list
list
List is empty
Quit command and termination
The quit command doesn't take any parameters. It should terminate the program. It's entered like the following:
cmd> quit
The program should also terminate successfully if it reaches the end-of-file on standard input while it's trying to read the next
command.
Design
Program Organization
Your implementation will be organized into three components. The input component will help with reading input from the movie
list files and from the user. The database component will contain code for implementing movies and the database. The movies
component will contain main, code to read in user commands and the implementation for the watch list.
Figure: Components and Dependency Structure
The input and database components will each have a header file, so other components can use types and functions defined by
these components. The figure above shows the dependency structure of the project. The database component can use code
provided by input and the main movies component can use code provided by both input and database.
Movie and Database Representation
This project is a good chance to get some experience using structs, dynamic memory allocation and resizable arrays. Each movie
will be represented by a struct with a field for each of the four values associated with a movie. The title field will be a string, and
the ID, year, and length can be stored as ints. The title field just needs to be able to store a string of up to 38 characters.
Although lots of titles are longer than this, the output of the program never reports more than 38 characters for a title, so you
won't need to store more than the first 38 characters.
Figure: Movie Representation
The database will be represented by its own struct, containing fields to store a resizable array of pointers to movies. Each movie
will be stored in a block of dynamically allocated memory. Inside the Database struct, you will use a resizable array of pointers to
movies to keep up with all the movies in the database. The count and capacity fields are for maintaining the resizable array, for
keeping up with how many movies are in the database and for detecting when you run out of capacity and need to grow the
array. Your resizable array should start with an initial capacity of 5, and it should double in capacity whenever the array needs to
be enlarged.
Figure: Database Representation
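One possible shape for these structs and the growth rule is sketched below. The count and capacity fields come from the description above; the list field name, the INITIAL_CAPACITY and TITLE_MAX constants and the addMovie() helper are assumptions for illustration. Checks for failed allocations are left out to keep the sketch short.

```c
#include <stdlib.h>

#define INITIAL_CAPACITY 5   // starting capacity required by the design
#define TITLE_MAX 38         // assumed name for the title limit

typedef struct {
  int id;
  char title[ TITLE_MAX + 1 ];
  int year;
  int length;
} Movie;

typedef struct {
  Movie **list;    // resizable array of pointers to movies (assumed name)
  int count;       // number of movies currently stored
  int capacity;    // number of pointers the array can hold
} Database;

Database *makeDatabase( void )
{
  Database *dat = malloc( sizeof( Database ) );
  dat->count = 0;
  dat->capacity = INITIAL_CAPACITY;
  dat->list = malloc( dat->capacity * sizeof( Movie * ) );
  return dat;
}

// Hypothetical helper: append a movie, doubling the array when it is full.
static void addMovie( Database *dat, Movie *m )
{
  if ( dat->count >= dat->capacity ) {
    dat->capacity *= 2;
    dat->list = realloc( dat->list, dat->capacity * sizeof( Movie * ) );
  }
  dat->list[ dat->count++ ] = m;
}

void freeDatabase( Database *dat )
{
  // Free each movie, then the array of pointers, then the struct itself.
  for ( int i = 0; i < dat->count; i++ )
    free( dat->list[ i ] );
  free( dat->list );
  free( dat );
}
```

Storing pointers rather than whole structs keeps each array element small, so growing or sorting the array only moves pointers around.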
Extra Credit Design and Implementation
If you do the extra credit part of this project, each movie will need to store a string of genre keywords read from the movie list
files. The genres for a movie may be a long string, so we're not going to store it inside the movie struct. Instead, the string will
be stored in another block of dynamically allocated memory, and the movie struct will just keep a pointer to this string. That way,
the genre string can be exactly as long as it needs to be to hold whatever genre string is given in the movie list input.
Figure: Movie Representation with Genres
Watch List Representation
You will represent the watch list as a resizable array in the top-level movies component. Like the database, this array should
start with an initial capacity of 5 and it should double in size whenever it needs to grow.
You can represent your watch list however you want to. If you want, you can store it inside a struct, like we're doing with the
Database, or you can just use some global variables inside the movies component to keep up with the watch list. If you do
choose to use some global variables for the watch list, be sure to mark them as static. This will prevent possible name collisions
with symbols defined elsewhere in your program.
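If you go the static-global route, the watch list might look something like the sketch below. The variable names and the watchAdd() and watchRemove() helpers are hypothetical, and allocation failures are not checked. The array holds pointers to movies owned by the database, so removing an entry does not free the movie.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
  int id;
  char title[ 39 ];
  int year;
  int length;
} Movie;

// Watch list state, private to this component (assumed names).
static Movie **watchList = NULL;
static int watchCount = 0;
static int watchCapacity = 0;

// Hypothetical helper: append m; returns false if it is already listed.
static bool watchAdd( Movie *m )
{
  for ( int i = 0; i < watchCount; i++ )
    if ( watchList[ i ]->id == m->id )
      return false;

  if ( watchCount >= watchCapacity ) {
    // Start at a capacity of 5, then double each time the array fills up.
    watchCapacity = watchCapacity == 0 ? 5 : watchCapacity * 2;
    watchList = realloc( watchList, watchCapacity * sizeof( Movie * ) );
  }
  watchList[ watchCount++ ] = m;
  return true;
}

// Hypothetical helper: remove by ID; returns false if it is not listed.
static bool watchRemove( int id )
{
  for ( int i = 0; i < watchCount; i++ ) {
    if ( watchList[ i ]->id == id ) {
      // Shift later entries down so the remaining order is preserved.
      for ( int j = i; j + 1 < watchCount; j++ )
        watchList[ j ] = watchList[ j + 1 ];
      watchCount--;
      return true;
    }
  }
  return false;
}
```

The boolean results give the caller what it needs to print the "already on the watch list" and "not on the watch list" messages.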
Expected Functions
As part of your implementation, you will define and use the following functions. You can define more if you want to. Just try to
put them in a component that's suitable for whatever they do and remember to mark them as static where you can (i.e., if
they're not used outside the component where they're defined).
Your input component only needs to have one function.
char *readLine( FILE *fp )
This function reads a single line of input from the given file and returns it as a string inside a block of dynamically allocated
memory. You can use this function to read commands from the user and to read movie descriptions from a movie list file.
Inside the function, you should implement a resizable array to read in a line of text that could be arbitrarily large. If there's
no more input to read, this function should return NULL. Since this function returns a pointer to dynamically allocated
memory, some other code will be responsible for eventually freeing that memory (to avoid a memory leak).
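A minimal sketch of how readLine() might be written is shown below. The starting capacity of 16 is an arbitrary choice for this example, and allocation failures are not checked. This version also returns a final line that is not terminated by a newline, which is one reasonable reading of the description above.

```c
#include <stdio.h>
#include <stdlib.h>

char *readLine( FILE *fp )
{
  int capacity = 16;   // arbitrary starting size for this sketch
  int len = 0;
  char *buf = malloc( capacity );

  int ch;
  while ( ( ch = fgetc( fp ) ) != EOF && ch != '\n' ) {
    // Grow the buffer, always leaving room for the null terminator.
    if ( len + 1 >= capacity ) {
      capacity *= 2;
      buf = realloc( buf, capacity );
    }
    buf[ len++ ] = ch;
  }

  // Nothing was read before end-of-file: there is no more input.
  if ( ch == EOF && len == 0 ) {
    free( buf );
    return NULL;
  }

  buf[ len ] = '\0';
  return buf;
}
```

The caller owns the returned string and should free() it once the line has been processed.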
Your database component should have the following 7 (or 8) functions.
Database *makeDatabase()
This function dynamically allocates storage for the database, initializes its fields (to store a resizable array) and returns a
pointer to it.
void freeDatabase( Database *dat )
This function frees the memory used to store the database, including freeing space for all the movies, freeing the resizable
array of movie pointers and freeing space for the database struct itself.
void readDatabase( Database *dat, char const *filename )
This function reads all the movies from a movie list file with the given name. It makes an instance of the Movie struct for
each one and stores a pointer to that movie in the resizable array.
void listAll( Database *dat )
This function lists all the movies in the database, sorted by ID number. The movies component can call this in response to
the user entering the database command.
void listYear( Database *dat, int min, int max )
This function lists all the movies with a year between the given min and max values (inclusive). Your movies component can
call this when the user enters the year command. In the output, movies should be sorted by year, and by ID if they have
the same year.
void listTitle( Database *dat, char const *title )
This function lists all the movies where the given title string occurs in the movie's title field. In the output, the movies
should be listed in order by ID. For this function (and the extra credit listGenre() function), you may find the strstr()
function useful for finding a short string inside a larger one. We will talk about this function briefly in class, but, if you want
to use it, you may need to do some reading on your own. You'll find it on page 620 of your textbook, or, if you're on a Linux
machine, you can just type man strstr at the shell prompt to look at the online documentation.
void listGenre( Database *dat, char const *genre )
You only need this function if you're doing the extra credit. It reports all movies where the given genre string occurs in the
movie's genres field. In the output, the movies should be listed in order by ID.
void listDatabase( Database *dat, bool (*test)( Movie const *movie, void const *data ), void const *data )
This is a static function in the database component. It is used by the listAll(), listYear(), listTitle(), and listGenre() functions
to actually report the list of movies in the right format. In addition to a pointer to the database, this function also takes a
pointer to a function (test) and a pointer to an arbitrary block of data (data) to let the caller tell the function which
particular movies it should print out. This is described in more detail in the "Selecting Movies to Report" section below.
Your movies component will contain main() and any other functions you need to parse command-line arguments and user
commands.
Function Visibility
Any functions that are needed by a different component should be prototyped (and commented) in the header. Functions that
don't need to be used by a different component should not be prototyped in the header, and should be marked static (given
internal linkage), so they won't be visible to any other part of the program. This is like making the function an implementation
detail of its component, something we could change if we wanted without affecting other parts of the program.
Sorting the Database
You'll use the standard library qsort() function to sort movies, either by ID or by year (and ID). Using qsort() will make the
sorting easier (and probably more efficient), but you have to help out qsort() by providing a pointer to a comparison function. We
have some examples of this in the material from lecture 12, in the slides and in the sort.c example program.
To use qsort() you'll need to think about a few things. As usual, you'll need to write your own comparison function, one that
takes two (const) void pointers, but knows that they're really pointers to two elements of the array inside the Database struct.
So, your comparison function will need to cast these void pointers to pointers of the right type before it can start looking at the
fields of the Movie objects they point to. Remember that the comparison function gets pointers to two array elements (not copies
of the values in two array elements, pointers to the elements). So, for example, since the array is full of pointers to Movie
structs, your comparison function will get two pointers to pointers to Movie instances. You have to define your comparison so it
takes two void pointer parameters, but, internally, your comparison function will know that these are really pointers to pointers
to Movies. After casting the void pointer parameters to these more specific types, you can access the fields of the Movies in
order to compare them.
You have to sort the database two different ways (for the year command vs for the database and title commands), so you will
need to implement two different comparison functions for sorting the movies in the database.
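The two comparison functions could look roughly like this. The names compareID() and compareYear() and the Movie layout are assumptions for this sketch. The important detail is the double level of indirection: each void pointer really points to an array element, which is itself a pointer to a Movie.

```c
#include <stdlib.h>

typedef struct {
  int id;
  char title[ 39 ];
  int year;
  int length;
} Movie;

// Order by ID, least to greatest.
static int compareID( void const *aptr, void const *bptr )
{
  // Each parameter points to an array element holding a Movie pointer.
  Movie const *a = *(Movie * const *) aptr;
  Movie const *b = *(Movie * const *) bptr;
  return ( a->id > b->id ) - ( a->id < b->id );
}

// Order by year, breaking ties by ID.
static int compareYear( void const *aptr, void const *bptr )
{
  Movie const *a = *(Movie * const *) aptr;
  Movie const *b = *(Movie * const *) bptr;
  if ( a->year != b->year )
    return ( a->year > b->year ) - ( a->year < b->year );
  return ( a->id > b->id ) - ( a->id < b->id );
}
```

Assuming the array of movie pointers is called list and holds count elements, a call such as qsort( list, count, sizeof( list[ 0 ] ), compareYear ) would put it in the order needed by the year command. Comparing with > and < instead of subtracting avoids any risk of integer overflow.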
Parsing User Commands
We're reading user input one line at a time. After we get a string containing a command, we will need to look inside this string to
figure out what command the user typed. The sscanf() function will make it easy to do this. It works much like scanf() or
fscanf(), but it parses input from a string instead of from a file. We'll cover this function in lecture 18, but you may want to look
at the material for lecture 18 early (it should already be posted) so you can get started on the project earlier.
Remember, unlike reading from a file, sscanf() doesn't automatically resume parsing from where it left off on the last call. For
example, you couldn't call something like sscanf( str, "%d", &x ); to read an int then call sscanf( str, "%d", &y ); again
to read the next int. If you gave sscanf() the same string in two successive calls, it would just start parsing at the start of the
string each time. If you want to parse multiple values out of the same string, you can do it all at once, like
sscanf( str, "%d%d", &x, &y );, or you could advance the pointer on successive calls to sscanf(), as in
sscanf( str + offset, "%d", &y );. If you need to do this, the %n conversion specifier we covered in lecture 9 can be helpful.
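As an illustration of the offset technique, here is a sketch of parsing the year command in two steps. The parseYear() helper and the 15-character keyword buffer are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: true if line is a valid year command.
static bool parseYear( char const *line, int *low, int *high )
{
  // Read the keyword; %n records how many characters were consumed.
  char keyword[ 16 ];
  int offset = 0;
  if ( sscanf( line, "%15s%n", keyword, &offset ) != 1 ||
       strcmp( keyword, "year" ) != 0 )
    return false;

  // Continue parsing just past the keyword.
  if ( sscanf( line + offset, "%d%d", low, high ) != 2 )
    return false;

  // The low value may not exceed the high value.
  return *low <= *high;
}
```

The %s and %d conversions skip leading whitespace on their own, so extra spaces before the keyword or between the parameters are handled without any special code.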
Selecting Movies to Report
The listDatabase() function can be used to print any selected movies from the database, so, it can be used by the four
functions listAll(), listYear(), listTitle(), and listGenre() to print out the needed subset of the database. How
does it know which movies to report? Internally, it will call the provided test function for each Movie in the database. If the test
function returns true, listDatabase() should print that Movie; otherwise, it shouldn't. This lets client code use a single interface to
print any subset of the Movies. The client code just needs to provide a pointer to a function listDatabase() can use to decide what
to print and what not to print. For example, to perform the database command, you can pass in a pointer to a test function that
always returns true. To print Movies in a range of years, you can pass in a pointer to a function that checks the year and returns
true if the Movie's year is in the range. To do this, we'll need to use the data parameter to listDatabase().
Notice that listDatabase() takes a void pointer data parameter, and the test function also takes a void pointer data parameter.
This parameter is a mechanism for providing extra information the test function needs in order to do its job, like the range of
years needed for the year command. When you call listDatabase(), you can pass in a pointer to anything you want as the data
parameter (even NULL, if you don't need this parameter). The listDatabase() function should remember this parameter and will
give this same pointer to the test function every time it calls it. This gives you a way to supply a pointer to anything your test
function needs to answer the question, "Should we print this Movie?". Inside each of your test functions, you will just need to
convert the data value from a void pointer back to whatever it really points to before it can use it. This is like how the
comparison function used by qsort has to convert its void pointer parameters to a more specific type before it can use them (but
the type it's converting to will be different here).
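The mechanism described above can be sketched as follows. The YearRange struct, the always() and inRange() test functions, the list field name and the header format string are assumptions for illustration. Sorting is left out here, so the caller is assumed to have sorted the array already.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
  int id;
  char title[ 39 ];
  int year;
  int length;
} Movie;

typedef struct {
  Movie **list;
  int count;
  int capacity;
} Database;

// Hypothetical struct carrying the extra data the year test needs.
typedef struct {
  int min;
  int max;
} YearRange;

// Test function that accepts every movie; data is unused.
static bool always( Movie const *movie, void const *data )
{
  (void) movie;
  (void) data;
  return true;
}

// Test function for the year command; data points to a YearRange.
static bool inRange( Movie const *movie, void const *data )
{
  YearRange const *range = data;
  return movie->year >= range->min && movie->year <= range->max;
}

static void listDatabase( Database *dat,
                          bool (*test)( Movie const *movie, void const *data ),
                          void const *data )
{
  int printed = 0;
  for ( int i = 0; i < dat->count; i++ ) {
    // Hand the same data pointer to the test function on every call.
    if ( test( dat->list[ i ], data ) ) {
      if ( printed == 0 )
        printf( "%6s %38s %4s %3s\n", "ID", "Title", "Year", "Len" );
      Movie const *m = dat->list[ i ];
      printf( "%6d %38s %4d %3d\n", m->id, m->title, m->year, m->length );
      printed++;
    }
  }
  if ( printed == 0 )
    printf( "No matching movies\n" );
}

void listAll( Database *dat )
{
  listDatabase( dat, always, NULL );
}

void listYear( Database *dat, int min, int max )
{
  YearRange range = { min, max };
  listDatabase( dat, inRange, &range );
}
```

Because the range lives in a local variable of listYear() and listDatabase() finishes using it before listYear() returns, passing its address as the data parameter is safe.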
For example, to implement year 1990 1999 command, our test fun
guidelines in the course syllabus.
In the design section, you'll see some instructions for how your implementation is expected to work. Be sure you follow these
rules. It's not enough to just turn in a working program; your program has to follow the design constraints we've asked you to
follow. For this assignment, we're putting some constraints on the functions you'll need to define, the data structures you'll use
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 2/12
and how you're going to organize your code into components. Still, you will have lots of opportunities to design parts of your
solution and to create additional functions to simplify your implementation.
Requirements
This section says what your program is supposed to be able to do, and what it should do when something goes wrong.
Program Execution
The movies program expects one or more filenames on the command line. Each of these files should contain a list of movies that
the program can read into its database at startup. If the program is run with invalid command-line arguments (e.g., no filenames
given on the command line), it should print the following usage message to standard error and exit with a status of 1.
usage: movies
If the program can't open one of the given files for reading, it should print the following message to standard error and exit with
a status of 1. Here, filename is the name of the file given on the command line. The program should report the first filename on
the command line that it can't successfully open (i.e., if there are multiple filenames on the command line that can't be opened,
it just needs to report this error for the first one that can't be opened).
Can't open file: filename
Movie List Format
At program start-up, the movies program reads in a database of movies. On the command line, it is given one or more filenames
for files containing movie lists, stored in a particular format. Each line of a movie list file describes one movie. A movie
description consists of five fields, with tab characters (ASCII 0x09) separating the fields. The first field is an integer ID unique to
the given movie. The next field is a title for the movie (a string). The next field is a integer year of when the movie was released.
The next field is an integer length of the movie in minutes and the last field is a string listing various genres for the movie. The
list of genres is a comma-separated list of strings. None of the fields will contain a tab character, and none of them will be empty.
Format of a line of a Movie List
Your program will only use the genre field if you're doing the extra credit part of the assignment. Otherwise, your program can
just skip over this field as it reads in a movie list. The possible genres are Action, Adventure, Animation, Biography, Comedy,
Crime, Documentary, Drama, Family, Fantasy, History, Horror, Musical, Mystery, Romance, Sci-Fi, Sport, Thriller, War, and
Western.
Some of the title fields are fairly long, but you will only need to store the first 38 characters of the title. This is described in the
Movie Listing section below.
The program should process the movie list files in the same order they are given on the command line. Within each movie list, it
should process movies in order from the first line to the last line of the file. The order for processing these files matters for error
reporting. If there is something wrong with a movie list, the program should report the first error it encounters.
A movie list file can contain any number of movie descriptions, one per line. If the format of the movie list is invalid (e.g., if a line
is missing one of the expected fields or if one of the numeric fields can't be parsed as a number), then it should print the
following message to standard error and exit with a status of 1. Here, filename is the name of the file containing the bad movie
description.
Invalid movie list: filename
Every movie should have a unique numeric ID. If the program encounters more than one movie with the same ID (even if other
fields like the title or year are different), it should print the following message to standard error and exit with a status of 1. Here,
ID is the movie ID that occurred more than once. The program should detect duplicate IDs, whether they occur within the same
movie list file or across two different movie lists.
Duplicate movie id: ID
Watch List
As the user interacts with the movies program, they can select movies from the database to add to their watch list, a subset of
movies the user plans to watch. The database is the set of all movies available, and the watch list is the subset of the database
that the user has selected.
It's possible for the watch list to be empty (it's empty when the program starts up). It's managed like a set, so it can't contain
more than one of the same move.
Movie List Output
A few user commands are used to list movies, either from the database or from the watch list. The output format for these
reports is mostly the same. It consists of a header describing each of the four fields (like the example shown below). After the
header, the report lists one movie per line. Each movie is reported as a movie ID in a 6-character field, a movie title in a 38
character field, a movie year in a 4 character field, and finally a movie length given in a 3 character field. For the year and the
movie length, the widths of 4 and 3 are minimum field widths, so it's possible to have a year with more than four digits or a
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 3/12
length with more than three digits. For cases like these, the columns may not line up properly. Each of these fields is rightaligned
and has a single space separating them.
ID Title Year Len
8466 Avatar 2009 162
84694 Star Trek VI: The Undiscovered Country 1991 110
84702 Star Wars: Episode IV - A New Hope 1977 121
94055 The Englishman Who Went Up a Hill Bu.. 1995 99
108082 The Wizard of Oz 1982 78
For movie titles that are too long to fit in their field width, you will print as much of the title as you can, and then print two
periods instead of the last two characters of the field, to indicate that the whole title was too long to fit. You can see this in the
"The Englishman Who Went Up a Hill But Came Down a Mountain" title above. Here, we print just the first 36 characters of the
title, then print two periods at the end, making the overall length exactly 38 characters.
User Commands
After start-up, the movies program reads commands typed in by the user. Each command is given as single line of user input.
For each command, the program will prompt the user with the following prompt. There's a space after the greater-than sign, but
you probably can't see it in this web page.
cmd>
After the user enters a command, the program will echo that command back to the user on the next output line. This is mostly to
help with debugging your programs. If we're capturing program output to a file, then things typed by the user don't go to the
output file (user inputs show up on the terminal, but they're not part of the program's output). By echoing each command, our
output files will include a copy of each command the user typed, making it easier to see what the program was asked to do. So,
for example, if the user typed in a command like the following, the program would echo a copy of the command on the next line:
cmd> year 1990 1999
year 1990 1999
The user can type any of 7 (or 8) available commands: database, year, title, add, remove, list and quit. There is also a
genre command that can be implemented for extra credit. These commands are described below. Each valid command starts
with one of the keywords listed above. For some commands, the keyword must be followed by one or more parameters. There
may be one or more whitespace characters at the start of the command, between the keyword and the parameters, between
parameters or at the end of the command. Any non-whitespace characters on a line following a valid command (and it's
parameters, if any) may be ignored by the program.
If the user enters an invalid command, the program should print the following message to standard output (not standard error),
ignore the command and prompt the user for another command. Invalid commands would be those that start with something
other than the 7 (or 8 for extra credit) keywords listed above, or if the command's parameters weren't correct.
Invalid command
After the first prompt, the program should print a blank line before prompting the user for another command. This is shown in
the sample execution at the start of this project description. It's just to provide a little separation between the output for
consecutive commands.
The program should terminate when it is given the quit command or when it reaches the end-of-file on standard input. In the
case of the quit command, the program should echo the command back to the user (like all the other commands). In the case of
end-of-file, there's no command to echo, so the program should just terminate.
Database command
If the user enters the database command, the program should print out all the movies in the entire database in the format
described in the "Movie List Output" section above. Movies should be sorted by their ID field, least to greatest.
If there are no movies in the database (this could happen if the program was given an empty movie list file to read), the
program should print the following message to standard output and then prompt for another command.
No matching movies
Year command
The year command requires two integer parameters, a low value and a high value. It lists movies from the database with a year
at least as high as the low value and no higher than the high value. Output should be given in the format described in the "Movie
List Output" section above, ordered by year, from low to high. Movies with the same year should be sorted by ID.
For example, the user could enter the following year command, with the following response from the program. Notice that the
two movies with a year of 1995 are ordered by ID number.
cmd> year 1990 1999
year 1990 1999
ID Title Year Len
4511 Aladdin 1992 90
99053 The Lion King 1994 88
70767 Pocahontas 1995 81
111278 Toy Story 1995 81
61360 Mulan 1998 88
111279 Toy Story 2 1999 92
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 4/12
The year command would be invalid if it was missing a parameter, or one of its parameters couldn't be parsed as an integer
value, or if its first parameter was greater than its second parameter. If the range of years doesn't contain any movies, the
program should print a line to standard output saying "No matching movies", like in the following example:
cmd> year 1978 1979
year 1978 1979
No matching movies
Title command
For this command, the user can enter a single-token string. The program will find all movies in the database that contain that
string as a substring in their title field. It will print out just these movies from the database in the format described in the "Movie
List Output" section above, with movies whose title contains or is equal to the given string listed in order of ID. A movie's title
field matches the given string even if the string is just a substring of a longer word in the movie's subject field. For example, if
the user entered "title and", then it could match a movie that had the word "and" in its title field, or one that had the word
"Wonderland" in its title field. For example,
cmd> title and
title and
ID Title Year Len
4772 Alice in Wonderland 1951 75
82766 Snow White and the Seven Dwarfs 1937 83
If there are no matching movies in the database, the title command should print the "No matching movies" message to standard
output, just like the database and year commands.
A title command would be invalid if it didn't have a string after the title keyword.
Genre command
The genre command is for extra credit. For this command, the program will need to store the list of genres for each movie (the
last field on each line in a movie list). The user can enter the genre keyword, followed by a single word (a sequence of nonwhitespace
characters). The program will find all movies in the database that contain that word as a substring in their genre
field. It will print out just these movies from the database in the format described in the "Movie List Output" section above, with
movies that match the given word listed in order of ID. A movie's genre field matches the given word even if the word is just a
substring of a longer word in the movie's genre field. For example, if the user entered "genre mat", then it could match a movie
that had the word the word "Animation" in its genre field.
If there are no matching movies in the database, the genre command should print the "No matching movies" message to
standard output, just like the database and year commands.
A genre command would be invalid if it didn't have a word after the genre keyword.
Add command
The add command is for adding movies from the database to the watch list. Movies added to the watch list are added at the end,
and adding a movie to the watch list doesn't remove it from the database; it just puts that movie on the watch list. The add
command expects a movie ID as a parameter. So, for example, the following command would add the movie with an ID of 42 to
the watch list.
add 42
An add command would be invalid if there wasn't an integer after the add keyword. If the integer doesn't match a movie ID from
the database, the program should print the following to standard output (where ID is the ID the user asked to add).
Movie ID is not in the database
If the user gives the ID of a movie that's already on the watch list, the program should print the following message to standard
output (where ID is the ID the user asked to add):
Movie ID is already on the watch list
Remove command
The remove command is for removing movies from the watch list. As a parameter, it expects the ID of the movie to be removed.
It removes that movie from the watch list, and the remaining movies stay in the same order. So, for example, the following
command would remove the movie with an ID of 42 from the watch list.
remove 42
If the remove command isn't given a valid integer as a parameter, then it is an invalid command. If the parameter is a valid
integer but doesn't match the ID of a movie on the watch list, then it should print the following message to standard output,
where ID is the ID of the movie the user asked to remove.
Movie ID is not on the watch list
List command
The list command shows the movies on the watch list in the same format described in the "Movie List Format" section above.
Notice that the watch list is ordered based on the order movies were added (not sorted by ID).
For example, running the list command might look like the following.
cmd> list
list
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 5/12
ID Title Year Len
70356 Pirates of the Caribbean: The Curse .. 2003 143
59720 Mission: Impossible 1996 110
91003 The Bourne Identity 2002 119
30675 Ferris Bueller's Day Off 1986 103
If the watch list is empty, the program should print a line to standard output saying "List is empty".
So, for example, you might get the following response from a list command:
cmd> list
list
List is empty
Quit command and termination
The quit command doesn't take any parameters. It should terminate the program. It's entered like the following:
cmd> quit
The program should also terminate successfully if it reaches the end-of-file on standard input while it's trying to read the next
command.
Design
Program Organization
Your implementation will be organized into three components. The input component will help with reading input from the movie
list files and from the user. The database component will contain code for implementing movies and the database. The movies
component will contain main, code to read in user commands and the implementation for the watch list.
Components and Dependency Structure
The input and database components will each have a header file, so other components can use types and functions defined by
these components. The figure above shows the dependency structure of the project. The database component can use code
provided by input and the main movies component can use code provided by both input and database.
Movie and Database Representation
This project is a good chance to get some experience using structs, dynamic memory allocation and resizable arrays. Each movie
will be represented by a struct with a field for each of the four values associated with a movie. The title field will be a string, and
the ID, year, and length can be stored as ints. The title field just needs to be able to store a string of up to 38 characters.
Although lots of titles are longer than this, the output of the program never reports more than 38 characters for a title, so you
won't need to store more than the first 38 characters.
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 6/12
Movie Representation
The database will be represented by its own struct, containing fields to store a resizable array of pointers to movies. Each movie
will be stored in a block of dynamically allocated memory. Inside the Database struct, you will use a resizable array of pointers to
movies to keep up with all the movies in the database. The count and capacity fields are for maintaining the resizable array, for
keeping up with how many movies are in the database and for detecting when you run out of capacity and need to grow the
array. Your resizable array should start with an initial capacity of 5, and it should double in capacity whenever the array needs to
be enlarged.
Database Representation
Extra Credit Design and Implementation
If you do the extra credit part of this project, each movie will need to store a string of genre keywords read from the movie list
files. The genres for a movie may be a long string, so we're not going to store it inside the movie struct. Instead, the string will
be stored in another block of dynamically allocated memory, and the movie struct will just keep a pointer to this string. That way,
the genre string can be exactly as long as it needs to be to hold whatever genre string is given in the movie list input.
Movie Representation with Genres
Watch List Representation
You will represent the watch list as a resizable array in the top-level movies component. Like the database, this array should
start with an initial capacity of 5 and it should double in size whenever it needs to grow.
You can represent your watch list however you want to. If you want, you can store it inside a struct, like we're doing with the
Database, or you can just use some global variables inside the movies component to keep up with the watch list. If you do
choose to use some global variables for the watch list, be sure to mark them as static. This will prevent possible name collisions
with symbols defined elsewhere in your program.
Expected Functions
As part of your implementation, you will define and use the following functions. You can define more if you want to. Just try to
put them in a component that's suitable for whatever they do and remember to mark them as static where you can (i.e., if
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 7/12
they're not used outside the component where they're defined).
Your input component only needs to have one function.
char *readLine( FILE *fp )
This function reads a single line of input from the given file and returns it as a string inside a block of dynamically allocated
memory. You can use this function to read commands from the user and to read movie descriptions from a movie list file.
Inside the function, you should implement a resizable array to read in a line of text that could be arbitrarily large. If there's
no more input to read, this function should return NULL. Since this function returns a pointer to dynamically allocated
memory, some other code will be responsible for eventually freeing that memory (to avoid a memory leak).
Your database component should have the following 7 (or 8) functions.
Database *makeDatabase()
This function dynamically allocates storage for the database, initializes its fields (to store a resizable array) and returns a
pointer to it.
void freeDatabase( Database *dat )
This function frees the memory used to store the database, including freeing space for all the movies, freeing the resizable
array of movie pointers and freeing space for the database struct itself.
void readDatabase( Database *dat, char const *filename )
This function reads all the movies from a movie list file with the given name. It makes an instance of the Movie struct for
each one and stores a pointer to that movie in the resizable array
void listAll( Database *dat )
This function lists all the movies in the database, sorted by ID number. The movies component can call this in response to
the user entering the database command.
void listYear( Database *dat, int min, int max );
This function lists all the movies with a year between the given min and max values (inclusive). Your movies component can
call this when the user enters the year command. In the output, movies should be sorted by year, and by ID if they have
the same year.
void listTitle( Database *dat, char const *title )
This function lists all the movies where the given title string occurs in the movie's title field. In the output, the movies
should be listed in order by ID. For this function (and the extra credit listGenre() function), you may find the strstr()
function useful for finding a short string inside a larger one. We will talk about this function briefly in class, but, if you want
to use it, you may need to do some reading on your own. You'll find it on page 620 of your textbook, or, if you're on a Linux
machine, you can just type man strstr at the shell prompt to look at the online documentation.
void listGenre( Database *dat, char const *genre )
You only need this function if you're doing the extra credit. It reports all movies where the given genre string occurs in the
movie's genres field. In the output, the movies should be listed in order by ID.
void listDatabase( Database *dat, bool (*test)( Movie const *movie, void const *data ), void const *data )
This is a static function in the database component. It is used by the listAll(), listYear(), listTitle(), and listGenre() functions
to actually report the list of movies in the right format. In addition to a pointer to the database, this function also takes a
pointer to a function (test) and a pointer to an arbitrary block of data (data) to let the caller tell the function which
particular movies it should print out. This is described in more detail in the "Selecting Movies to Report" section below.
Your movies component will contain main() and any other functions you need to parse command-line arguments and user
commands.
Function Visibility
Any functions that are needed by a different component should be prototyped (and commented) in the header. Functions that
don't need to be used by a different component should not be prototyped in the header, and should be marked static (given
internal linkage), so they won't be visible to any other part of the program. This is like making the function an implementation
detail of its component, something we could change if we wanted without affecting other parts of the program.
Sorting the Database
You'll use the standard library qsort() function to sort movies, either by ID or by year (and ID). Using qsort() will make the
sorting easier (and probably more efficient), but you have to help out qsort() by providing a pointer to a comparison function. We
have some examples of this in the material from lecture 12, in the slides and in the sort.c example program.
To use qsort() you'll need to think about a few things. As usual, you'll need to write your own comparison function, one that
takes two (const) void pointers, but knows that they're really pointers to two elements of the array inside the Database struct.
So, your comparison function will need to cast these void pointers to pointers of the right type before it can start looking at the
fields of the Movie objects they point to. Remember that the comparison function gets pointers to two array elements (not copies
of the values in two array elements, pointers to the elements). So, for example, since the array is full of pointers to Movie
structs, your comparison function will get two pointers to pointers to Movie instances. You have to define your comparison so it
takes two void pointer parameters, but, internally, your comparison function will know that these are really pointers to pointers
to Movies. After casting the void pointers parameters to these more specific types, you can access the fields of the Movies in
order to compare them.
You have to sort the database two different ways (for the year command vs for the database and title commands), so you will
need to implement two different comparison functions for sorting the movies in the database.
2021/3/23 CSC230 Project 4
https://www.csc2.ncsu.edu/courses/csc230/proj/p4/p4.html 8/12
Parsing User Commands
We're reading user input one line at a time. After we get a string containing a command, we will need to look inside this string to
figure out what command the user typed. The sscanf() function will make it easy to do this. It works much like scanf() or
fscanf(), but it parses input from a string instead of from a file. We'll cover this function in lecture 18, but you may want to look
at the material for lecture 18 early (it should already be posted) so you can get started on the project earlier.
Remember, unlike reading from a file, sscanf() doesn't automaticall resume parsing from where it left off on the last call. For
example, you couldn't call something like sscanf( str, "%d", &x ); to read an int then call sscanf( str, "%d", &y ); again
to read the next int. If you gave sscanf() the same string in two successive calls, it would just start parsing at the start of the
string each time. If you want to parse multiple values out of the same string, you can do it all at once, like
sscanf( str, "%d%d", &x, &y );, or you could advance the pointer on successive scalls to sscanf(), as in
sscanf( str + offset, "%d", &y );. If you need to do this, the %n conversion specifier we covered in lecture 9 can be helpful.
Selecting Movies to Report
The listDatabase() function can be used to print any selected movies from the database, so, it can be used by the four
functions void listAll(), void listYear(), void listTitle(), and void listGenre() to print out the needed subset of the database. How
does it know which movies to report? Internally, it will call the provided test function for each Movie in the database. If the test
function returns true, listDatabase() should print that Movie; otherwise, it shouldn't. This lets client code use a single interface to
print any subset of the Movies. The client code just needs to provide a pointer to a function listDatabase() can use to decide what
to print and what not to print. For example, to perform the database command, you can pass in a pointer to a test function that
always returns true. To print Movies in a range of years, you can pass in a pointer to a function that checks the year and returns
true if the Movie's year is in the range. To do this, we'll need to use the data parameter to listDatabase().
Notice that listDatabase() takes a void pointer data parameter, and the test function also takes a void pointer data parameter.
This parameter is a mechanism for providing extra information the test function needs in order to do its job, like the range of
years needed for the year command. When you call listDatabase(), you can pass in a pointer to anything you want as the data
parameter (even NULL, if you don't need this parameter). The listDatabase() function should remember this parameter and will
give this same pointer to the test function every time it calls it. This gives you a way to supply a pointer to anything your test
function needs to answer the question, "Should we print this Movie?". Inside each of your test functions, you will just need to
convert the data value from a void pointer back to whatever it really points to before it can use it. This is like how the
comparison function used by qsort has to convert its void pointer parameters to a more specific type before it can use them (but
the type its converting to will be different here).
For example, to implement year 1990 1999 command, our test fun