辅导C/C++编程、C/C++程序调试、svn辅导留学生、讲解C/C++设计
- 首页 >> C/C++编程Project Description
In this assignment you will complete a variation of project 6 in the
nand2tetris course. A detailed description of Nand2Tetris Project
6 tailored to this course is shown below. In this assignment the
assembler will be written as two separate programs. The
executable program, parser, will read a Hack Assembly Language
program from standard input and produce an abstract syntax
tree on standard output. The executable program, translator, will
read the abstract syntax tree and assemble a machine code
representation of the original Hack Assembly Language program.
The assembled code will be formatted as sixteen zeros or ones per
line and it will be written to standard output.
SVN Repository
You must create a directory in your svn repository named: <year>/
<semester>/cs/assignment2. This directory must only contain the
following files and directories - the web submission system will
check this:
? Makefile - this file is used by make to compile your
programs - do not modify this file.
? parser.cpp - C++ source file
? translator.cpp - C++ source file
? my*.cpp C++ source files with names that start with my
? my*.h C++ include files with names that start with my
? bin - this directory contains precompiled programs and
scripts - do not modify this directory.
? lib - this directory contains precompiled components - do not
modify this directory.
? includes - this directory contains .h files for precompiled
classes - do not modify this directory.
? tests - this directory contains test data, you can add your
own tests here
Note: if the lib/lib.a file does not get added to your svn repository
you will need to explicitly add it using:
% svn add lib/lib.a
Submission and Marking Scheme
This assignment has two assignments in the web submission
system named: Assignment 2 - Milestone
Submissions and Assignment 2 - Final Submissions. The
assessment is based on "Assessment of Programming
Assignments".
Assignment 2 - Milestone Submissions: due 11:55pm
Tuesday of week 9
The marks awarded by the web submission system for the
milestone submission contribute up to 20% of your marks for
assignment 2. Your milestone submission mark, after the
application of late penalties, will be posted to the myuni gradebook
when the assignment marking is complete.
Your programs must be written in C++ and will be tested using
Hack Assembly Language programs that that may or may not be
syntactically correct. Although a wide range of tests may be run,
including a number of secret tests, marks will only be recorded for
those tests that are syntactically correct and do not use
symbols. Your programs will be compiled using
the Makefile included in the zip file attached below.
The parser program will be compiled using the file parser.cpp and
the translator program will be compiled using the
file translator.cpp file. In both cases any .cpp files with names
starting with my will be also included together with the
precompiled library functions.
Assignment 2 - Final Submissions: due 11:55pm Friday of
week 9
The marks awarded for the final submission contribute up to 80%
of your marks for assignment 2.
Your final submission mark will be the geometric mean of the
marks awarded by the web submission system, a mark for your
logbook and a mark for your code. It will be limited to 20% more
than the marks awarded by the web submission system.
See "Assessment - Mark Calculations" for examples of how the
marks are combined. Your final submission mark, after the
application of late penalties, will be posted to the myuni gradebook
when the assignment marking is complete.
Automatic Marking
The automatic marking will compile and test both of your programs
in exactly the same way as for the milestone submission. The
difference is that marks will be recorded for all of the
tests including the secret tests. Note: if your programs fail any of
these secret tests you will not receive any feedback about
these secret tests, even if you ask!
Logbook Marking
Important: the logbook must have entries for all work in this
assignment, including your milestone submissions. See
"Assessment - Logbook Review" for details of how your logbook
will be assessed.
Code Review Marking
For each of your programming assignments you are expected to
submit well written code. See "Assessment - Code Review" for
details of how your code will be assessed.
Nand2Tetris Project 6: The
Assembler
Background
Low-level machine programs are rarely written by humans.
Typically, they are generated by compilers. Yet humans can inspect
the translated code and learn important lessons about how to write
their high-level programs better, in a way that avoids low-level
pitfalls and exploits the underlying hardware better. One of the key
players in this translation process is the assembler -- a program
designed to translate code written in a symbolic machine language
into code written in binary machine language.
This project marks an exciting landmark in our Nand to Tetris
odyssey: it deals with building the first rung up the software
hierarchy, which will eventually end up in the construction of a
compiler for a Java/C++ like high-level language. The relevant
reading for this project is Chapter 6. Some of the useful tools
available include, the Hack Assembler, the CPU Emulator and
working versions of the two programs, bin/workingparser
and bin/working-translator.
Objective
The Hack assembler is a relatively simple program however, so that
you can gain experience with the tools used in other workshops
and assignments, you will build your assembler from three parts.
This will involve using a precompiled tokeniser for the Hack
assembly language to implement a parser that recognises labels,
A-instructions and C-instructions using tokens returned by the
tokeniser. The parser will construct a tree representation of the
program that the translator will walk over in order to assemble the
final machine code.
If you wish to create additional source files that can be used by
your programs the .cpp and .h files must have names that start
with my. All my*.cpp files will be compiled as part of both
you parser and translator programs.
Compiling and Running Your Programs
You programs can be compiled with one of the following three
commands. To compile both:
% make
To compile just parser:
% make parser
To compile just translator:
% make translator
To test your programs you can use the following command:
% make
Testing student assembler-m against hack files
Checking "./assembler-m < tests/AddL.asm | diff - tests/AddL.hack" - test failed
...
The test scripts do not show the program outputs, just passed or
failed, but they do show you the commands being used to run
each test. You can copy-paste these commands if you want to run
a particular test yourself and see all of the output.
If you wish to add new tests, create appropriately named files in
the tests directory and then use the command:
% make test-add
Note: Do not modify the Makefile or the subdirectories
includes, bin and lib. They will be replaced during
testing by the web submission system.
The Tokeniser
You must use the precompiled tokeniser functions provided by the
library in the zip file attached below. The tokeniser functions are
described in the files includes/a-tok.h and include/tokeniser.h.
This tokeniser returns the followings tokens:
Token
Class Definition Examp
le Token Spelli
ng
ak_addr
ess
::
= '@' ( name | number ) @hello ak_name "hello
"
ak_label ::
= (' name ')' (END) ak_label "END
"
ak_dest ::
= 'MD' | 'AM' | 'AD' | 'AMD' Am ak_dest_
AM "AM"
ak_regis
ter
::
= 'A' | 'M' | 'D' a ak_regist
er_A "A"
ak_alu_
op
::
=
'0' | '1' | '-1' | '!D' | '!A' | '-D' |
'-A' | 'D+1' | 'A+1' | 'D-1' |
'A-1' | 'D+A' | 'D-A' |'A-D'|
'D&A' | 'D|A' | '!M' | '-M' |
'M+1' | 'M-1' | 'D+M' | 'DM'
| 'M-D' | 'D&M' | 'D|M'
1 ak_alu_1 "1"
Notes:
? if an error occurs or the end of input is reached, the
token ak_eoi is returned
?
? it is an error to find a character that cannot be part of token or
is not a space " ", tab "\t", carriage return "\r" or newline "\n"
? single line comments that start with "//" and finish at the next
newline character "\n"
? when searching for the start of the next token all spaces, tabs,
carriage returns, newlines and comments are ignored
? newlines are not significant to the tokeniser so more than one
instruction can be on the same line or an instruction could be
split over multiple lines
? name, number, letter and digit are never returned as token
classes
? letters in a name are case sensitive, all other occurrences of
letters are converted to uppercase
ak_jump ::
=
'JMP' | 'JLT' | 'JLE' | 'JGT' |
'JGE' | 'JEQ' |'JNE' JGT ak_jgt "JGT"
ak_assi
gn
::
= =' = ak_assig
n
"="
ak_semi ::
= ;' ; ak_semi ";"
ak_null ::
= NULL' nuLl ak_null "NUL
L"
Additio
nal
Rules
Definition Example Text
name
::
= letter ( letter | digit )* "_he:82m.Uch$"
number ::
= digit digit* "99"
letter ::
= 'a'-'z' | 'A'-'Z' | '$' | '_' | ':' | '.' "$"
digit ::
= 0'-'9' "1"
? in a definition the or operator | separates
alternative components
? in a definition the round brackets ( ) which are not inside
single quotes are for grouping components of token
? in a definition the star character * indicates that the preceding
component of a token may appear 0 or more times
The Assembler Parser
The following table describes the structure of an assembly
language program that must be translated into machine
code. Note: there are no references to lines in these rules. All
whitespace, including newline characters, and comments will have
been removed by the precompiled tokeniser. Therefore these rules
are defined purely in terms of the token classes in the previous
table.
Notes:
? in a definition eof indicates the end of the input
? in a definition the or operator | separates
alternative components
? in a definition the round brackets ( ) are for grouping
components
? in a definition the square brackets [ ] which indicates that the
enclosed components may appear 0 or once
? in a definition the star character * indicates that the preceding
component may appear 0 or more times
Abstract Syntax Tree
An abstract syntax tree is just a tree like data structure that holds a
representation of a program. In this case the parser program must
Term Definition
program ::= ( ak_label | ak_address | C-instr )* ak_eoi
C-instr ::= [destination] aluop [ajump]
destination ::= ( ak_null | ak_dest | ak_register ) ak_semi
aluop ::= ak_register | ak_alu_op
ajump ::= ak_semi ( ak_null | ak_jump )
construct an object of class an_program_node and append all
labels or instructions to it. Each tree class can only be manipulated
by pointers and has a typedef for a pointer to the class so
the * operator does not need to be used very often. Each class for
which you are allowed to create objects also has a create function,
for example to create an an_program_node you would call the
function an_program_node_create(). To accommodate situations
where pointers of different classes can be used, each class has a
function to convert the pointer type after checking that it is
appropriate. For example, a pointer to an an_c_instruction can be
used anywhere a pointer to an an_instruction is required and the
conversion function is named to_an_instruction(). If the checks
fails your program will exit immediately.
Parser
The skeleton parser.cpp file has most of the parsing structure
already written, you just need to complete the object creation and
parsing of labels and individual instructions. The main() function of
the parser program prints out an XML representation of the
abstract syntax tree using the precompiled library
function an_print_as_xml(). You do not need to know how to write
out XML. XML is produced by the parser because it is easy to read
by humans.
If your parser encounters any errors, it must exit immediately and
have not produced any output. Examples of errors include, multiple
definitions of a label, an A-instruction with a number that is too
large, or a C-instruction with multiple destination components,
multiple jump components or no ALU component. In this two
program structure, the parser will not be able to directly detect the
first two kinds of errors. Multiple definitions of a label will not be
detectable until the translator walks the abstract syntax tree and a
number that is too large will cause the tokeniser to report that it is
at the end of the input. The other errors can be detected by the
parser and should result in an immediate exit with no output.
The iobuffer.h include file gives details of functions you can use to
manage when output is produced by your programs.
Translator
The skeleton translator.cpp file includes a loop that walks over the
abstract syntax tree produced by the parser.
The translator program uses the precompiled
function an_parse_xml(), to read the output of
the parser program. You do not need to know how to read in and
interpret XML. The later workshops and assignments will use this
technique to allow their compilers to be written as separate
programs.
One advantage of this two program approach is that walking over
the tree more than once is really simple. This is important in the
translator when it comes to resolving symbols. When a program
includes an A-instruction such as @somename you cannot tell
if somename is a label or a variable until the entire program has
been inspected because the label definition may come later. This
will be true for jumps to a later part of a program. We solve this
problem by walking over the abstract syntax tree and record all the
labels that we can find. This is Pass 1. When we walk over the
abstract syntax tree a second time, Pass 2, every label name will
already have been defined. Therefore, all undefined names found in
Pass 2 must be variable names.
If your translator encounters any errors, it must exit immediately
and have not produced any output. It may be interesting to reflect
on why we did not include missing label definitions amongst the list
of errors to be detected.
Lookup Tables
Your translator program will need a way of generating the correct
set of bits for each part of a C-instruction. For example, "JMP"
needs to be mapped to "111". To do this you will need some sort
of lookup table to record these mappings. The translator program
also needs a lookup table so that it can implement a symbol table
to record label addresses and the memory locations for
variables. You should use the precompiled symbol table classes
described in the includes/symbols.h file to create any lookup
tables that you require. The includes/symbols.h file includes
examples of how to use these lookup tables.
You will need several lookup tables, one for the symbol table and
one for each component of a C-instruction. Although most parts of
a C-instruction are unique, the M, A and D registers need to be
mapped to different sets of bits depending on whether they are
used as an alu operation or a destination. For example, "A" as a
destination would be "100" but as an alu operation it would be
"0110000".
Test Programs
In addition to the tests in tests directory, we will use a number
of secret tests that may contain illegal tokens and / or multiple
labels and instructions on the same line and / or syntactic
errors. Note: these tests are secret, if your programs fail any of
these secret tests you will not receive any feedback about
these secret tests, even if you ask!
The Pong program supplied in the tests directory was written in
the Java-like high-level Jack language and translated into the Hack
assembly language by the Jack compiler and a Hack Virtual
Machine translator. (The Virtual Machine, Jack and the Jack
compiler are described in Chapters 7-8, 9 and 10-11, respectively).
Although the original Jack program is only about 300 lines of Jack
code, the executable Pong code is naturally much longer. Running
this interactive program in the supplied CPU Emulator is a slow
affair, so don't expect a high-powered Pong game. This slowness
is actually a virtue, since it enables your eye to track the graphical
behavior of the program. And don't worry! as we continue to build
the software platform in the next few projects, Pong and and other
games will run much faster.