P–Code留学生辅导、讲解C++编程、辅导arithmetic expression、辅导C/C++语言
- 首页 >> C/C++编程Homework 3 : Compiling Simple Expressions to
P–Code
Zhiyao Liang
MUST FIT
zyliang@must.edu.mo
Monday 17 December 2018
Abstract
The task of this assignment is to implement a compiler that can
evaluate simple arithmetic expressions. The focus is on the code generation
part. [2] [1].
1 Preparation
You need to have some programming environment to edit, compile and run
C programs.
1
The concepts of stack should be understood. The picture above describes
a stack, which is found from the web at https://www.cs.cmu.edu/
~adamchik/15-121/lectures/Stacks%20and%20Queues/pix/stack.bmp
Besides the textbook [2], especially chapter 8, there are several documents
that are helpful for this homework:
In the folder slides at the ftp site of this course:
– The PowerPoint file for Code Generation. This file presents main
ideas in chapter 8 of the textbook. Especially, some pictures of
parse trees and the flow chart of translating while–statement and
if–statement are shown.
– p-code.pdf and p-code.txt. The two files have the same content,
which describes (a subset of) P-code instructions, which explanations,
that are relevant to this project.
In the folder ExpS--to--P, which is provided with this homework, there
are several files .
– The .exp files are the code of ExpS programs.
– The .p files are the P-code programs translated from the corresponding
.exp files. P-code program for array.exp is not provided.
gencode.c which is provided for this homework. This file combines
the code provided in Chapter 8 of the textbook. This file describes a
general skeleton of a code generator.
2 P-code Machine
The runtime environment of a P–code virtual machine (PVM) can be implemented
as follows:
There are 4 pieces of data structures:
code area: It is an array of P–code instructions. There are two ways
that instructions can be loaded into the code area:
1. All together: A sequence of P–code instructions can be loaded all
together to the code area from an existing P–code program file.
2. On the fly: P–code instruction will be appended one-by-one to
the instruction array by some language translator.
2
When a PVM is running, the current instruction’s address (an index
in the array) is remembered. By default, current = current + 1, after
the current instruction is executed. A special case is a jumping
instruction: when the current is a jump instruction to some label, and
the corresponding label instruction is at address j, then current will
be changed to j+1.
Once an instruction is added to the code area, it should not be removed
or changed.
data area: It is an array of integers (since we only require integer
operations). The values of all variables will be saved in this area. The
address of a variable x is the index in the array where x’s value is
saved.
symbol table: It is a data structure that supports the following operations:
– lookup(x): Given a name (a string) return its address in the data
area. If x does not exists, some special signal is returned. The
names include variable names and label names. The value of a
label is the corresponding label instruction address in the code
area.
– insert(x): Allocate a new slot in the data area for the value of
the name x (x is not initialized at the moment). Nothing is done
if x is already existing. The address of x should be registered
in the symbol table so that it can be looked up. Therefore, the
symbol table remembers the next available address in the data
area, and the size of the data area. When a name x is used by
an instruction (lod or lda), if x is new (lookup(x) cannot find it),
x is inserted into the symbol table. A special kind of name is a
label. The value of a label is the address of the corresponding
label defining instruction in the code area.
stack: A stack is needed by the P-code instructions. (Read the description
of P-code instructions p-code.pdf). It is an array of integer.
A stack remembers:
– the stack top, which is an index in the array, it is the address
where the latest data item is saved. Data are added or removed
from the stack at the top. Initially, when the stack is empty, the
top could be -1.
3
– the size of the stack: the maximum number of data items of the
stack, which can be implemented as the size of the array.
The stack supports 2 operations:
– pop(): When the stack is not empty, return the data (integer) at
the top, and top = top ? 1. When the stack is empty, nothing is
popped, and some error message is printed out.
– push(c): When the stack is not full, top = top + 1, and c is
saved at the top in the data area. When the stack is already full,
nothing is pushed, and some error message is printed.
There are two ways that a source program can be translated to and
executed as P-code instructions.
Compiled. A source program is translated to a P-code program, and
then the P-code program is loaded and executed.
Interpreted. A statement of the source program is translated to a
sequence of P-code instructions, then, this sequence of P–code instructions
are executed. Especially, for a structured statement: Compound–
statement, If–statement, or While–statement, the PVM will wait until
the whole structured statement is provided and then start to run the
structured statement.
There are two different ways to print results on the screen when a PVM
is running.
Verbose mode: After each expression or assignment statement, the
value of the expression or the RHS of the assignment is printed. The
calculator described in homework 1 could be considered in the verbose
mode.
Silent mode: Only the write instruction (wri) will print an integer on
the screen. Otherwise, nothing is printed.
A programmer can decide which mode to choose.
2.1 P–code program
A P–code program is a sequence of P–code instructions. A semicolon ; will
start a comment that stop at the end of line. Detailed descriptions of P–code
instructions are in the textbook and the file p-code.pdf. A P–code program
will stop when there is no more instruction to run, or the halting instruction
(stp) is executed.
4
3 The enriched ExprS language
The language Expr–Simple (ExpS) is defined the same as in homework 1; in
order to make the language more powerful and let the code generation task
more interesting, we allow additional features in ExpS, as follows:
Comparison operators are allowed: >, <, >=, <=, ==, ! =. Therefore,
6 more token types are added for them.
While–statement is allowed, and the token while is considered a reserved
word.
If–statement is allowed, and the two tokens if and else are considered
as reserved words.
Curly braces, { and }, are allowed.
A simple statement (assignment or expression statement) will end at
a semicolon or newline.
EOF token is not required. If we require that every ExpS program
ends with a quit, then we don’t need the EOF token.
A programmer has the freedom to decide the token types.
3.1 Updated lexical rules of ExprS
Correspondingly, the updated lexical rules to define tokens of ExpS are
described by regular–expressions as follows in Table 3.1. This table is the
same as we saw in homework 1, but with additional token types.
More descriptions of the tokens are in the document for homework 1.
3.2 Updated syntax rules of ExpS
The updated grammar that allows while and if statements are shown in Table
3.2. For the tokens that are not ID or numbers or keywords, their string
literals, instead of their token types, are directly shown in the grammar
for simplicity of presentation. For two items in the grammar (variable or
constant) X and Y , we use space in stead of · (which is used in homework
1) to describe the concatenation of X and Y ; i.e, X Y means the same as
X · Y .
Note that the grammar rules in Table 3.2 is not ideal for writing a
recursive–descendent parser, since it does not represent the precedence and
5
ASSIGN : =
P LUS : +
MINUS :
T IMES :
DIV ID : /
LP AREN : (
LCUR : {
RCUR : }
RP AREN : )
EQ : ==
NEQ : ! =
GT : >
LT : <
GT E : >=
LT E : <=
Digit : 0 . . . 9
N atural : Digit · Digit?
NUMBER : N atural | M inus · N atural
Letter : a . . . z | A . . . Z
ID : Letter · Letter?
QUIT : quit
W HILE : while
IF : if
ELSE : else
RET URN : '\n '
SEMI : ;
Table 1: the lexical rules of tokens
associativity of operators in the expressions. You can adjust the grammar
according to your consideration.
4 Tasks to do
1. Implement a PVM (virtual machine of P–code). When a PVM is
running, it will execute the instructions that are in its memory of code
area.
2. Write a language translator that can translate ExpS statements to
P–code instructions. There are two choices to design the translator.
6
program → statementList QUIT
statementList → statement | statement statementList
statement → assignmentSt | expressionSt | whileSt |
ifSt | compoundSt
assignment → ID = expression
assignmentSt → assignment ;
expression → NUMBER | ID | ( expression ) |
expression operator expression
expressionSt → expression ;
operator → + | ? | ? | / |
== | ! = | > | < | >= | <=
whileSt → W HILE · ( expression ) statement
ifSt → IF ( expression ) statement |
IF ( expression ) statement ELSE statement
compoundSt → { statementList }
Table 2: Grammar of the ExpS language
An interpreter: An ExpS statement is translated to some P–code
instructions, then these instructions are executed, then continue
to process the next ExpS statement. The calculator of homework1
can be implemented as an interpreter.
A compiler: An ExpS program (a sequence of statements) are
translated all together to a sequence of P–code instructions, then
these instructions are executed.
You are only required to implement one (interpreter of compiler).
Bonus points will be earned if you can do the following parts
You program can work both as a compiler or interpreter.
Support arrays of the ExpS language. You may modify the grammar of
ExpS in Table 3.2 for arrays. Hint: the grammar listed in cgencode.c
will be helpful.
Here is some recommended strategy of doing this homework.
Continue to use the data structure of homework 1.
Translate simple expressions first, i.e. only the assignment and arithmetic
statements of ExpS. Then try the if and while statements.
7
If your homework 1 work is not perfect, you can focus on the code
generator part in this homework, i.e, the code generator can assume
some parse tree is already in the memory.
5 Submitting your homework
Due time: 12 January 2018 Saturday, 7:00pm.
At most 3 students can form a group to do this homework together.
At the top of the every text file that you changed or created (maybe
just the .c and .h files), record your name (if mutiple students forma
group, provide all the names) and date as comments.
Please attach all of your source code and document files to the email,
possibly zip them into a single file. Any information about how to run
and understand your code (a readme file) will be helpful.
Attention: Do not attach any binary files (.o, .obj, .exe, .out), since
google will consider them virus. Just send the source code and textual
documents.
Send your email to:
zyliangstu@gmail.com
The title (subject) of your email should be like:
[name][hmkNum info]
Name can be in Chinese or English (Pingying). For example, if a
student wants to submit homework 5, whose name Shan Li, then the
title of the email should be:
[Li, Shan][hmk3 code-generator]
If it is a group work, include all members’ name in the email title.
8
References
[1] Andrew W. Appel. Modern Compiler Implementation in C: Basic Techniques.
Cambridge University Press, New York, NY, USA, 1997.
[2] Kenneth C. Louden. Compiler Construction Principles and Practice.
PWS Publishing Company, 1997. ISBN 0-534-93972-4
http://www.cs.sjsu.edu/~louden/cmptext/.