辅导C/C++编程、C/C++程序调试、svn辅导留学生、讲解C/C++设计

2018.09.25 - 首页 >> C/C++编程

Project Description

In this assignment you will complete a variation of project 6 in the

nand2tetris course. A detailed description of Nand2Tetris Project

6 tailored to this course is shown below. In this assignment the

assembler will be written as two separate programs. The

executable program, parser, will read a Hack Assembly Language

program from standard input and produce an abstract syntax

tree on standard output. The executable program, translator, will

read the abstract syntax tree and assemble a machine code

representation of the original Hack Assembly Language program.

The assembled code will be formatted as sixteen zeros or ones per

line and it will be written to standard output.

SVN Repository

You must create a directory in your svn repository named: <year>/

<semester>/cs/assignment2. This directory must only contain the

following files and directories - the web submission system will

check this:

? Makefile - this file is used by make to compile your

programs - do not modify this file.

? parser.cpp - C++ source file

? translator.cpp - C++ source file

? my*.cpp C++ source files with names that start with my

? my*.h C++ include files with names that start with my

? bin - this directory contains precompiled programs and

scripts - do not modify this directory.

? lib - this directory contains precompiled components - do not

modify this directory.

? includes - this directory contains .h files for precompiled

classes - do not modify this directory.

? tests - this directory contains test data, you can add your

own tests here

Note: if the lib/lib.a file does not get added to your svn repository

you will need to explicitly add it using:

% svn add lib/lib.a

Submission and Marking Scheme

This assignment has two assignments in the web submission

system named: Assignment 2 - Milestone

Submissions and Assignment 2 - Final Submissions. The

assessment is based on "Assessment of Programming

Assignments".

Assignment 2 - Milestone Submissions: due 11:55pm

Tuesday of week 9

The marks awarded by the web submission system for the

milestone submission contribute up to 20% of your marks for

assignment 2. Your milestone submission mark, after the

application of late penalties, will be posted to the myuni gradebook

when the assignment marking is complete.

Your programs must be written in C++ and will be tested using

Hack Assembly Language programs that that may or may not be

syntactically correct. Although a wide range of tests may be run,

including a number of secret tests, marks will only be recorded for

those tests that are syntactically correct and do not use

symbols. Your programs will be compiled using

the Makefile included in the zip file attached below.

The parser program will be compiled using the file parser.cpp and

the translator program will be compiled using the

file translator.cpp file. In both cases any .cpp files with names

starting with my will be also included together with the

precompiled library functions.

Assignment 2 - Final Submissions: due 11:55pm Friday of

week 9

The marks awarded for the final submission contribute up to 80%

of your marks for assignment 2.

Your final submission mark will be the geometric mean of the

marks awarded by the web submission system, a mark for your

logbook and a mark for your code. It will be limited to 20% more

than the marks awarded by the web submission system.

See "Assessment - Mark Calculations" for examples of how the

marks are combined. Your final submission mark, after the

application of late penalties, will be posted to the myuni gradebook

when the assignment marking is complete.

Automatic Marking

The automatic marking will compile and test both of your programs

in exactly the same way as for the milestone submission. The

difference is that marks will be recorded for all of the

tests including the secret tests. Note: if your programs fail any of

these secret tests you will not receive any feedback about

these secret tests, even if you ask!

Logbook Marking

Important: the logbook must have entries for all work in this

assignment, including your milestone submissions. See

"Assessment - Logbook Review" for details of how your logbook

will be assessed.

Code Review Marking

For each of your programming assignments you are expected to

submit well written code. See "Assessment - Code Review" for

details of how your code will be assessed.

Nand2Tetris Project 6: The

Assembler

Background

Low-level machine programs are rarely written by humans.

Typically, they are generated by compilers. Yet humans can inspect

the translated code and learn important lessons about how to write

their high-level programs better, in a way that avoids low-level

pitfalls and exploits the underlying hardware better. One of the key

players in this translation process is the assembler -- a program

designed to translate code written in a symbolic machine language

into code written in binary machine language.

This project marks an exciting landmark in our Nand to Tetris

odyssey: it deals with building the first rung up the software

hierarchy, which will eventually end up in the construction of a

compiler for a Java/C++ like high-level language. The relevant

reading for this project is Chapter 6. Some of the useful tools

available include, the Hack Assembler, the CPU Emulator and

working versions of the two programs, bin/workingparser

and bin/working-translator.

Objective

The Hack assembler is a relatively simple program however, so that

you can gain experience with the tools used in other workshops

and assignments, you will build your assembler from three parts.

This will involve using a precompiled tokeniser for the Hack

assembly language to implement a parser that recognises labels,

A-instructions and C-instructions using tokens returned by the

tokeniser. The parser will construct a tree representation of the

program that the translator will walk over in order to assemble the

final machine code.

If you wish to create additional source files that can be used by

your programs the .cpp and .h files must have names that start

with my. All my*.cpp files will be compiled as part of both

you parser and translator programs.

Compiling and Running Your Programs

You programs can be compiled with one of the following three

commands. To compile both:

% make

To compile just parser:

% make parser

To compile just translator:

% make translator

To test your programs you can use the following command:

% make

Testing student assembler-m against hack files

Checking "./assembler-m < tests/AddL.asm | diff - tests/AddL.hack" - test failed

...

The test scripts do not show the program outputs, just passed or

failed, but they do show you the commands being used to run

each test. You can copy-paste these commands if you want to run

a particular test yourself and see all of the output.

If you wish to add new tests, create appropriately named files in

the tests directory and then use the command:

% make test-add

Note: Do not modify the Makefile or the subdirectories

includes, bin and lib. They will be replaced during

testing by the web submission system.

The Tokeniser

You must use the precompiled tokeniser functions provided by the

library in the zip file attached below. The tokeniser functions are

described in the files includes/a-tok.h and include/tokeniser.h.

This tokeniser returns the followings tokens:

Token

Class Definition Examp

le Token Spelli

ak_addr

ess

= '@' ( name | number ) @hello ak_name "hello

ak_label ::

= (' name ')' (END) ak_label "END

ak_dest ::

= 'MD' | 'AM' | 'AD' | 'AMD' Am ak_dest_

AM "AM"

ak_regis

ter

= 'A' | 'M' | 'D' a ak_regist

er_A "A"

ak_alu_

'0' | '1' | '-1' | '!D' | '!A' | '-D' |

'-A' | 'D+1' | 'A+1' | 'D-1' |

'A-1' | 'D+A' | 'D-A' |'A-D'|

'D&A' | 'D|A' | '!M' | '-M' |

'M+1' | 'M-1' | 'D+M' | 'DM'

| 'M-D' | 'D&M' | 'D|M'

1 ak_alu_1 "1"

Notes:

? if an error occurs or the end of input is reached, the

token ak_eoi is returned

? it is an error to find a character that cannot be part of token or

is not a space " ", tab "\t", carriage return "\r" or newline "\n"

? single line comments that start with "//" and finish at the next

newline character "\n"

? when searching for the start of the next token all spaces, tabs,

carriage returns, newlines and comments are ignored

? newlines are not significant to the tokeniser so more than one

instruction can be on the same line or an instruction could be

split over multiple lines

? name, number, letter and digit are never returned as token

classes

? letters in a name are case sensitive, all other occurrences of

letters are converted to uppercase

ak_jump ::

'JMP' | 'JLT' | 'JLE' | 'JGT' |

'JGE' | 'JEQ' |'JNE' JGT ak_jgt "JGT"

ak_assi

= =' = ak_assig

"="

ak_semi ::

= ;' ; ak_semi ";"

ak_null ::

= NULL' nuLl ak_null "NUL

Additio

nal

Rules

Definition Example Text

name

= letter ( letter | digit )* "_he:82m.Uch$"

number ::

= digit digit* "99"

letter ::

= 'a'-'z' | 'A'-'Z' | '$' | '_' | ':' | '.' "$"

digit ::

= 0'-'9' "1"

? in a definition the or operator | separates

alternative components

? in a definition the round brackets ( ) which are not inside

single quotes are for grouping components of token

? in a definition the star character * indicates that the preceding

component of a token may appear 0 or more times

The Assembler Parser

The following table describes the structure of an assembly

language program that must be translated into machine

code. Note: there are no references to lines in these rules. All

whitespace, including newline characters, and comments will have

been removed by the precompiled tokeniser. Therefore these rules

are defined purely in terms of the token classes in the previous

table.

Notes:

? in a definition eof indicates the end of the input

? in a definition the or operator | separates

alternative components

? in a definition the round brackets ( ) are for grouping

components

? in a definition the square brackets [ ] which indicates that the

enclosed components may appear 0 or once

? in a definition the star character * indicates that the preceding

component may appear 0 or more times

Abstract Syntax Tree

An abstract syntax tree is just a tree like data structure that holds a

representation of a program. In this case the parser program must

Term Definition

program ::= ( ak_label | ak_address | C-instr )* ak_eoi

C-instr ::= [destination] aluop [ajump]

destination ::= ( ak_null | ak_dest | ak_register ) ak_semi

aluop ::= ak_register | ak_alu_op

ajump ::= ak_semi ( ak_null | ak_jump )

construct an object of class an_program_node and append all

labels or instructions to it. Each tree class can only be manipulated

by pointers and has a typedef for a pointer to the class so

the * operator does not need to be used very often. Each class for

which you are allowed to create objects also has a create function,

for example to create an an_program_node you would call the

function an_program_node_create(). To accommodate situations

where pointers of different classes can be used, each class has a

function to convert the pointer type after checking that it is

appropriate. For example, a pointer to an an_c_instruction can be

used anywhere a pointer to an an_instruction is required and the

conversion function is named to_an_instruction(). If the checks

fails your program will exit immediately.

Parser

The skeleton parser.cpp file has most of the parsing structure

already written, you just need to complete the object creation and

parsing of labels and individual instructions. The main() function of

the parser program prints out an XML representation of the

abstract syntax tree using the precompiled library

function an_print_as_xml(). You do not need to know how to write

out XML. XML is produced by the parser because it is easy to read

by humans.

If your parser encounters any errors, it must exit immediately and

have not produced any output. Examples of errors include, multiple

definitions of a label, an A-instruction with a number that is too

large, or a C-instruction with multiple destination components,

multiple jump components or no ALU component. In this two

program structure, the parser will not be able to directly detect the

first two kinds of errors. Multiple definitions of a label will not be

detectable until the translator walks the abstract syntax tree and a

number that is too large will cause the tokeniser to report that it is

at the end of the input. The other errors can be detected by the

parser and should result in an immediate exit with no output.

The iobuffer.h include file gives details of functions you can use to

manage when output is produced by your programs.

Translator

The skeleton translator.cpp file includes a loop that walks over the

abstract syntax tree produced by the parser.

The translator program uses the precompiled

function an_parse_xml(), to read the output of

the parser program. You do not need to know how to read in and

interpret XML. The later workshops and assignments will use this

technique to allow their compilers to be written as separate

programs.

One advantage of this two program approach is that walking over

the tree more than once is really simple. This is important in the

translator when it comes to resolving symbols. When a program

includes an A-instruction such as @somename you cannot tell

if somename is a label or a variable until the entire program has

been inspected because the label definition may come later. This

will be true for jumps to a later part of a program. We solve this

problem by walking over the abstract syntax tree and record all the

labels that we can find. This is Pass 1. When we walk over the

abstract syntax tree a second time, Pass 2, every label name will

already have been defined. Therefore, all undefined names found in

Pass 2 must be variable names.

If your translator encounters any errors, it must exit immediately

and have not produced any output. It may be interesting to reflect

on why we did not include missing label definitions amongst the list

of errors to be detected.

Lookup Tables

Your translator program will need a way of generating the correct

set of bits for each part of a C-instruction. For example, "JMP"

needs to be mapped to "111". To do this you will need some sort

of lookup table to record these mappings. The translator program

also needs a lookup table so that it can implement a symbol table

to record label addresses and the memory locations for

variables. You should use the precompiled symbol table classes

described in the includes/symbols.h file to create any lookup

tables that you require. The includes/symbols.h file includes

examples of how to use these lookup tables.

You will need several lookup tables, one for the symbol table and

one for each component of a C-instruction. Although most parts of

a C-instruction are unique, the M, A and D registers need to be

mapped to different sets of bits depending on whether they are

used as an alu operation or a destination. For example, "A" as a

destination would be "100" but as an alu operation it would be

"0110000".

Test Programs

In addition to the tests in tests directory, we will use a number

of secret tests that may contain illegal tokens and / or multiple

labels and instructions on the same line and / or syntactic

errors. Note: these tests are secret, if your programs fail any of

these secret tests you will not receive any feedback about

these secret tests, even if you ask!

The Pong program supplied in the tests directory was written in

the Java-like high-level Jack language and translated into the Hack

assembly language by the Jack compiler and a Hack Virtual

Machine translator. (The Virtual Machine, Jack and the Jack

compiler are described in Chapters 7-8, 9 and 10-11, respectively).

Although the original Jack program is only about 300 lines of Jack

code, the executable Pong code is naturally much longer. Running

this interactive program in the supplied CPU Emulator is a slow

affair, so don't expect a high-powered Pong game. This slowness

is actually a virtue, since it enables your eye to track the graphical

behavior of the program. And don't worry! as we continue to build

the software platform in the next few projects, Pong and and other

games will run much faster.