Project Description
In this assignment you will implement two versions of a tokeniser that
with minor changes could be used to complete variations of projects 6,
10 and 11 in the nand2tetris course. A detailed description of the
requirements is given below. The executable program, tokeniser, will
read text from standard input and produce a list of all tokens in the text
on standard output.
SVN Repository
You must create a directory in your svn repository named
<year>/<semester>/cs/assignment1. This directory must only contain the
following files and directories - the web submission system will check
this:
• Makefile - this file is used by make to compile
your tokeniser program - do not modify this file.
• tokeniser.cpp - C++ source file containing
the next_token() function.
• my*.cpp - C++ source files with names that start with my
• my*.h - C++ include files with names that start with my
• lib - this directory contains precompiled programs and components
- do not modify this directory.
• includes - this directory contains .h files for precompiled classes -
do not modify this directory.
• tests - this directory contains sample test data; it can also be used to
store any extra files you need for testing
Note: if the lib/tokens.o file does not get added to your svn repository,
you will need to add it explicitly using:
% svn add lib/tokens.o
Submission and Marking Scheme
This assignment has two entries in the web submission
system, named Assignment 1 - Milestone
Submissions and Assignment 1 - Final Submissions. The assessment
is based on "Assessment of Programming Assignments".
Assignment 1 - Milestone Submissions: due 11:55pm Tuesday of week 7
The marks awarded by the web submission system for the milestone
submission contribute up to 20% of your marks for assignment
1. Your milestone submission mark, after the application of late penalties,
will be posted to the myuni gradebook when the assignment marking is
complete.
Your programs must be written in C++ and will be tested using the set
of test files that are attached below. Although a wide range of tests may
be run, including a number of secret tests, marks will only be recorded
for those tests that test the milestone token definitions shown below. Your
programs will be compiled using the Makefile included in the zip file
attached below. Any .h or .cpp files that you create, in addition to the
skeletons provided, must have names that start with my.
The Makefile will use all of the my*.cpp files in your svn directory as part
of the tokeniser program that it compiles.
Assignment 1 - Final Submissions: due 11:55pm Friday of week 7
The marks awarded for the final submission contribute up to 80% of your
marks for assignment 1.
Your final submission mark will be the geometric mean of the marks
awarded by the web submission system, a mark for your logbook and a
mark for your code. It will be limited to 20% more than the marks
awarded by the web submission system. See "Assessment - Mark
Calculations" for examples of how the marks are combined. Your final
submission mark, after the application of late penalties, will be posted to
the myuni gradebook when the assignment marking is complete.
Automatic Marking
The automatic marking will compile and test your tokeniser program in
exactly the same way as for the milestone submission. The difference is
that marks will be recorded for all of the tests including
the secret tests. Note: if your program fails any of these secret tests
you will not receive any feedback about these secret tests, even if you
ask!
Logbook Marking
Important: the logbook must have entries for all work in this assignment,
including your milestone submissions. See "Assessment - Logbook
Review" for details of how your logbook will be assessed.
Code Review Marking
For each of your programming assignments you are expected to submit
well written code. See "Assessment - Code Review" for details of how
your code will be assessed.
Tokenisers
Background
The primary task of any language translator is to work out the
structure and meaning of an input in a given language so that an
appropriate translation can be output in another language. Think of
this in terms of a natural language such as English. When you attempt to
read a sentence you do not spend your time worrying about what
characters there are, how much space is between the letters or where
lines are broken. What you do is consider the words and attempt to
derive structure and meaning from their order and arrangement into
English language sentences, paragraphs, sections, chapters etc. In the
same way, when we attempt to write translators from assembly language,
virtual machine language or a programming language into another form,
we attempt to focus on things like keywords, identifiers, operators and
logical structures rather than individual characters.
The role of a tokeniser is to take the input text and break it up into tokens
(words in natural language) so that the assembler or compiler using it only
needs to concern itself with higher level structure and meaning. This
division of labor is reflected in most programming language definitions in
that they usually have a separate syntax definition for tokens and another
for structures formed from the tokens.
The focus of this assignment is writing a tokeniser to recognise tokens
that conform to a specific set of rules. The set of tokens may or may not
correspond to a particular language because a tokeniser is a fairly generic
tool. After completing this assignment we will assume that you know how
to write a tokeniser and we will provide you with a working tokeniser to use in
each of the remaining programming assignments. This will permit you to
take the later assignments much further than would be otherwise possible
in the limited time available.
Writing Your Program
You are required to complete the implementation of the C++
file tokeniser.cpp which is used to compile the program tokeniser. You
will implement a function, next_token(), that will read text character by
character using the static function nextch(), and return the next
recognised token in the input. The tokens that must be recognised in the
milestone and final submissions are specified in separate tables below.
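As an illustration of reading character by character, the following minimal sketch keeps a single lookahead character and advances it one step at a time. The names input_text, input_pos, ch, demo_nextch(), start_reading() and skip_whitespace() are all hypothetical. In particular, demo_nextch() is only a stand-in that reads from a string; the real nextch() is supplied with the assignment and its signature is not shown here, so it may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <string>

std::string input_text;       // hypothetical: the text being tokenised
std::size_t input_pos = 0;    // index of the next unread character
int ch = ' ';                 // current lookahead character, EOF at end of input

// Stand-in for the supplied nextch(): move the lookahead on by one character.
void demo_nextch()
{
    if (input_pos < input_text.size()) {
        ch = static_cast<unsigned char>(input_text[input_pos++]);
    } else {
        ch = EOF;
    }
}

// Restart reading at the beginning of `text`.
void start_reading(const std::string& text)
{
    input_text = text;
    input_pos = 0;
    demo_nextch();
}

// Skip the characters that may separate tokens.
void skip_whitespace()
{
    while (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n') demo_nextch();
}

// Usage example: after skipping, the lookahead holds the first token character.
void demo()
{
    start_reading(" \t\r\n x");
    skip_whitespace();
    assert(ch == 'x');
    demo_nextch();
    assert(ch == EOF);
}
```

Keeping exactly one character of lookahead means each token scanner can decide what to do by inspecting ch, and always leaves ch holding the first character after the token it recognised.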
If you wish to write any of your code in separate .cpp or .h files, the
names of the additional files must all start with my. All files
matching my*.cpp will be automatically included when compiling
the tokeniser program. Your programs will be compiled using
the Makefile in the zip file attached below using the command:
% make
Note: Do not modify the Makefile or the subdirectories
includes and lib. They will be replaced during testing by the
web submission system.
Testing Your Program
For each file in the tests directory, the output of the tokeniser program
must match the corresponding .tokens output file. You must not produce
any output of your own. You can test your program against all of the
supplied tests using the command:
% make test
The testing will not show you any program output, just whether each
test passed or failed. If you want to see the actual output, the
commands used to run the tests are shown in double quotes ("). Simply
copy the commands and paste them into your shell.
The web submission system will test your program in exactly this way.
The key difference between your testing and the web submission testing
is that the web submission system has some secret tests that it will use.
If you want to try additional tests, just create some new files in
the tests sub-directory and generate the correct outputs using the
command:
% make test-new
This will increase the number of tests that will be run in the future.
Milestone Tokens
Your milestone submission will be marked using tests that require the
correct recognition of the following tokens:
Notes:
• all input must be read using the function nextch()
• if the end of input is reached, return the token EOI.
• if a character is found that cannot be part of a token and is not a
space " ", tab "\t", carriage return "\r" or newline "\n", return the
token BAD.
• letter, digit19 and digit are never returned as token classes
• all tokens must be contiguous characters in the input
• when searching for the start of the next token all spaces
and newlines are ignored
• in a definition the or operator | separates alternative components
Token              Definition                                        Example
IDENTIFIER         ::= letter ( letter | digit )*                    _he82mUch
INT                ::= '0' | ( digit19 digit* )                      17
SYMBOL             ::= ';' | ':' | '!' | ',' | '.' | '=' | '{' |     ;
                       '}' | '(' | ')' | '[' | ']' | '@'
Additional Rules   Definition                                        Example Text
letter             ::= 'a'-'z' | 'A'-'Z' | '_'                       "C"
digit19            ::= '1'-'9'                                       "1"
digit              ::= '0'-'9'                                       "0"
• in a definition the round brackets ( ) which are not inside single
quotes are for grouping components of a token
• in a definition the square brackets [ ] indicate that the
enclosed components may appear 0 or 1 times
• in a definition the star character * indicates that the preceding
component of a token may appear 0 or more times
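The IDENTIFIER, INT and SYMBOL definitions, together with the EOI and BAD rules in the notes, can be sketched as a small scanner. This is only a sketch under stated assumptions: Kind, Token and scan_token() are hypothetical names, the real token type comes from the provided includes directory, and the sketch reads from a string purely to stay self-contained, whereas your submission must read all input through nextch().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token kinds; the assignment's own token type is not shown here.
enum class Kind { IDENTIFIER, INT, SYMBOL, EOI, BAD };

struct Token {
    Kind kind;
    std::string spelling;
};

// Character classes taken from the letter, digit19 and digit rules.
bool is_letter(char c)  { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }
bool is_digit(char c)   { return c >= '0' && c <= '9'; }
bool is_digit19(char c) { return c >= '1' && c <= '9'; }
bool is_symbol(char c)  { return std::string(";:!,.={}()[]@").find(c) != std::string::npos; }
bool is_space(char c)   { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }

// Scan one token from `text`, starting at index `pos` and advancing `pos`
// past the token. Reading from a string is an assumption of this sketch.
Token scan_token(const std::string& text, std::size_t& pos)
{
    while (pos < text.size() && is_space(text[pos])) ++pos;
    if (pos == text.size()) return {Kind::EOI, ""};

    std::size_t start = pos;
    char c = text[pos++];
    if (is_letter(c)) {                        // IDENTIFIER ::= letter ( letter | digit )*
        while (pos < text.size() && (is_letter(text[pos]) || is_digit(text[pos]))) ++pos;
        return {Kind::IDENTIFIER, text.substr(start, pos - start)};
    }
    if (c == '0') return {Kind::INT, "0"};     // INT ::= '0' | ...
    if (is_digit19(c)) {                       // ... | ( digit19 digit* )
        while (pos < text.size() && is_digit(text[pos])) ++pos;
        return {Kind::INT, text.substr(start, pos - start)};
    }
    if (is_symbol(c)) return {Kind::SYMBOL, std::string(1, c)};
    return {Kind::BAD, std::string(1, c)};     // not part of any token, not whitespace
}

// Usage example on the example texts from the table.
void demo()
{
    std::string text = "_he82mUch 17 ;";
    std::size_t pos = 0;
    Token t = scan_token(text, pos);
    assert(t.kind == Kind::IDENTIFIER && t.spelling == "_he82mUch");
    t = scan_token(text, pos);
    assert(t.kind == Kind::INT && t.spelling == "17");
    t = scan_token(text, pos);
    assert(t.kind == Kind::SYMBOL && t.spelling == ";");
    assert(scan_token(text, pos).kind == Kind::EOI);
}
```

Under this sketch's reading of the INT rule, a leading '0' is always a complete token, so the input 007 is scanned as three separate INT tokens.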
Final Submission Tokens
Your final submission will be marked using tests that require the correct
recognition of the following tokens in addition to the milestone tokens
listed above.
Additional Notes:
• this tokeniser does not ignore comments; it returns them as tokens
• single-line comments start with "//" and finish at the next newline
character "\n"
• multi-line comments start with "/*" and continue until the first "*/",
the shortest multi-line comment is "/**/"
• keyword tokens are only to be recognised by interpreting an
identifier token
• use the string_to_token() function to check if an identifier is
actually a keyword
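The comment and keyword rules in the notes above can be sketched as follows. Both functions are hypothetical illustrations: is_keyword() is only a stand-in for the supplied string_to_token(), whose signature is not shown in this document, and scan_comment() works on a string rather than through nextch() to keep the sketch self-contained. Returning an empty string for an unterminated comment is also an assumption; the notes do not say how that case should be reported.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Stand-in for string_to_token(): is this identifier spelling a keyword?
bool is_keyword(const std::string& spelling)
{
    static const std::set<std::string> keywords =
        {"if", "while", "else", "class", "int", "string"};
    return keywords.count(spelling) > 0;
}

// If a comment starts at `pos` (which must not exceed text.size()), return
// the matched text; otherwise, or if the comment is never closed, return "".
std::string scan_comment(const std::string& text, std::size_t pos)
{
    if (text.compare(pos, 2, "//") == 0) {
        // single-line comment: runs up to and including the next newline
        std::size_t end = text.find('\n', pos + 2);
        return end == std::string::npos ? "" : text.substr(pos, end + 1 - pos);
    }
    if (text.compare(pos, 2, "/*") == 0) {
        // multi-line comment: runs up to and including the first "*/"
        std::size_t end = text.find("*/", pos + 2);
        return end == std::string::npos ? "" : text.substr(pos, end + 2 - pos);
    }
    return "";
}

// Usage example covering the cases named in the notes.
void demo()
{
    assert(is_keyword("while"));
    assert(!is_keyword("whilst"));
    assert(scan_comment("// oneliner\nx", 0) == "// oneliner\n");
    assert(scan_comment("/**/", 0) == "/**/");              // shortest multi-line comment
    assert(scan_comment("/* a */ b */", 0) == "/* a */");   // stops at the first "*/"
    assert(scan_comment("/*/", 0).empty());                 // "/*/" is not closed
}
```

The search for the closing "*/" starts two characters after the opening "/*", which is why "/*/" is not accepted as a complete comment while "/**/" is.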
Token              Definition                                          Example
DOUBLE             ::= ( '0' | ( digit19 digit* ) ) [ '.' digit* ]     17.05
KEYWORD            ::= 'if' | 'while' | 'else' | 'class' | 'int' |     if
                       'string'
ONELINECOMMENT     ::= '//' ( any character except newline ) newline   // oneliner
MULTI-LINE COMMENT ::= '/*' ( any characters up to the first '*/' )    /* hello */
                       '*/'
Tests
In addition to the test files in the zip file attached below, we will use a
number of secret tests that may contain illegal characters or character
combinations that may defeat your tokenisers. Note: these tests
are secret; if your programs fail any of these secret tests you will
not receive any feedback about these secret tests, even if you ask!
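One way to prepare for unusual character combinations is to check boundary cases against the token definitions by hand. The sketch below does this for numbers, using the definition DOUBLE ::= ( '0' | ( digit19 digit* ) ) [ '.' digit* ]. The names Number and scan_number() are hypothetical, and treating a match without a '.' as an INT rather than a DOUBLE is an assumption of this sketch, not something stated in the definitions. It says nothing about what the secret tests actually contain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of scanning a number at some position in the text.
struct Number {
    std::size_t length;   // characters matched, 0 if no number starts here
    bool is_double;       // true if a '.' was part of the match
};

bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Match ( '0' | ( digit19 digit* ) ) [ '.' digit* ] starting at `pos`.
Number scan_number(const std::string& text, std::size_t pos)
{
    std::size_t i = pos;
    if (i >= text.size() || !is_digit(text[i])) return {0, false};
    bool leading_zero = (text[i] == '0');
    ++i;
    if (!leading_zero) {                                   // digit19 digit*
        while (i < text.size() && is_digit(text[i])) ++i;
    }
    if (i == text.size() || text[i] != '.') return {i - pos, false};
    ++i;                                                   // consume '.'
    while (i < text.size() && is_digit(text[i])) ++i;      // digit* may be empty
    return {i - pos, true};
}

// Usage example: boundary cases worked out from the definition.
void demo()
{
    Number n = scan_number("17.05", 0);
    assert(n.length == 5 && n.is_double);
    n = scan_number("17.", 0);            // no digits are required after '.'
    assert(n.length == 3 && n.is_double);
    n = scan_number("007", 0);            // a leading '0' ends the number
    assert(n.length == 1 && !n.is_double);
    n = scan_number(".5", 0);             // a number cannot start with '.'
    assert(n.length == 0);
}
```

Writing such cases down before coding makes it easier to add matching files to the tests directory and compare your tokeniser's output against what the definitions require.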