P–Code留学生辅导、讲解C++编程、辅导arithmetic expression、辅导C/C++语言

- 首页 >> C/C++编程

Homework 3 : Compiling Simple Expressions to

P–Code

Zhiyao Liang

MUST FIT

zyliang@must.edu.mo

Monday 17 December 2018

Abstract

The task of this assignment is to implement a compiler that can

evaluate simple arithmetic expressions. The focus is on the code generation

part. [2] [1].

1 Preparation

You need to have some programming environment to edit, compile and run

C programs.

1

The concepts of stack should be understood. The picture above describes

a stack, which is found from the web at https://www.cs.cmu.edu/

~adamchik/15-121/lectures/Stacks%20and%20Queues/pix/stack.bmp

Besides the textbook [2], especially chapter 8, there are several documents

that are helpful for this homework:

In the folder slides at the ftp site of this course:

– The PowerPoint file for Code Generation. This file presents main

ideas in chapter 8 of the textbook. Especially, some pictures of

parse trees and the flow chart of translating while–statement and

if–statement are shown.

– p-code.pdf and p-code.txt. The two files have the same content,

which describes (a subset of) P-code instructions, which explanations,

that are relevant to this project.

In the folder ExpS--to--P, which is provided with this homework, there

are several files .

– The .exp files are the code of ExpS programs.

– The .p files are the P-code programs translated from the corresponding

.exp files. P-code program for array.exp is not provided.

gencode.c which is provided for this homework. This file combines

the code provided in Chapter 8 of the textbook. This file describes a

general skeleton of a code generator.

2 P-code Machine

The runtime environment of a P–code virtual machine (PVM) can be implemented

as follows:

There are 4 pieces of data structures:

code area: It is an array of P–code instructions. There are two ways

that instructions can be loaded into the code area:

1. All together: A sequence of P–code instructions can be loaded all

together to the code area from an existing P–code program file.

2. On the fly: P–code instruction will be appended one-by-one to

the instruction array by some language translator.

2

When a PVM is running, the current instruction’s address (an index

in the array) is remembered. By default, current = current + 1, after

the current instruction is executed. A special case is a jumping

instruction: when the current is a jump instruction to some label, and

the corresponding label instruction is at address j, then current will

be changed to j+1.

Once an instruction is added to the code area, it should not be removed

or changed.

data area: It is an array of integers (since we only require integer

operations). The values of all variables will be saved in this area. The

address of a variable x is the index in the array where x’s value is

saved.

symbol table: It is a data structure that supports the following operations:

– lookup(x): Given a name (a string) return its address in the data

area. If x does not exists, some special signal is returned. The

names include variable names and label names. The value of a

label is the corresponding label instruction address in the code

area.

– insert(x): Allocate a new slot in the data area for the value of

the name x (x is not initialized at the moment). Nothing is done

if x is already existing. The address of x should be registered

in the symbol table so that it can be looked up. Therefore, the

symbol table remembers the next available address in the data

area, and the size of the data area. When a name x is used by

an instruction (lod or lda), if x is new (lookup(x) cannot find it),

x is inserted into the symbol table. A special kind of name is a

label. The value of a label is the address of the corresponding

label defining instruction in the code area.

stack: A stack is needed by the P-code instructions. (Read the description

of P-code instructions p-code.pdf). It is an array of integer.

A stack remembers:

– the stack top, which is an index in the array, it is the address

where the latest data item is saved. Data are added or removed

from the stack at the top. Initially, when the stack is empty, the

top could be -1.

3

– the size of the stack: the maximum number of data items of the

stack, which can be implemented as the size of the array.

The stack supports 2 operations:

– pop(): When the stack is not empty, return the data (integer) at

the top, and top = top ? 1. When the stack is empty, nothing is

popped, and some error message is printed out.

– push(c): When the stack is not full, top = top + 1, and c is

saved at the top in the data area. When the stack is already full,

nothing is pushed, and some error message is printed.

There are two ways that a source program can be translated to and

executed as P-code instructions.

Compiled. A source program is translated to a P-code program, and

then the P-code program is loaded and executed.

Interpreted. A statement of the source program is translated to a

sequence of P-code instructions, then, this sequence of P–code instructions

are executed. Especially, for a structured statement: Compound–

statement, If–statement, or While–statement, the PVM will wait until

the whole structured statement is provided and then start to run the

structured statement.

There are two different ways to print results on the screen when a PVM

is running.

Verbose mode: After each expression or assignment statement, the

value of the expression or the RHS of the assignment is printed. The

calculator described in homework 1 could be considered in the verbose

mode.

Silent mode: Only the write instruction (wri) will print an integer on

the screen. Otherwise, nothing is printed.

A programmer can decide which mode to choose.

2.1 P–code program

A P–code program is a sequence of P–code instructions. A semicolon ; will

start a comment that stop at the end of line. Detailed descriptions of P–code

instructions are in the textbook and the file p-code.pdf. A P–code program

will stop when there is no more instruction to run, or the halting instruction

(stp) is executed.

4

3 The enriched ExprS language

The language Expr–Simple (ExpS) is defined the same as in homework 1; in

order to make the language more powerful and let the code generation task

more interesting, we allow additional features in ExpS, as follows:

Comparison operators are allowed: >, <, >=, <=, ==, ! =. Therefore,

6 more token types are added for them.

While–statement is allowed, and the token while is considered a reserved

word.

If–statement is allowed, and the two tokens if and else are considered

as reserved words.

Curly braces, { and }, are allowed.

A simple statement (assignment or expression statement) will end at

a semicolon or newline.

EOF token is not required. If we require that every ExpS program

ends with a quit, then we don’t need the EOF token.

A programmer has the freedom to decide the token types.

3.1 Updated lexical rules of ExprS

Correspondingly, the updated lexical rules to define tokens of ExpS are

described by regular–expressions as follows in Table 3.1. This table is the

same as we saw in homework 1, but with additional token types.

More descriptions of the tokens are in the document for homework 1.

3.2 Updated syntax rules of ExpS

The updated grammar that allows while and if statements are shown in Table

3.2. For the tokens that are not ID or numbers or keywords, their string

literals, instead of their token types, are directly shown in the grammar

for simplicity of presentation. For two items in the grammar (variable or

constant) X and Y , we use space in stead of · (which is used in homework

1) to describe the concatenation of X and Y ; i.e, X Y means the same as

X · Y .

Note that the grammar rules in Table 3.2 is not ideal for writing a

recursive–descendent parser, since it does not represent the precedence and

5

ASSIGN : =

P LUS : +

MINUS :

T IMES :

DIV ID : /

LP AREN : (

LCUR : {

RCUR : }

RP AREN : )

EQ : ==

NEQ : ! =

GT : >

LT : <

GT E : >=

LT E : <=

Digit : 0 . . . 9

N atural : Digit · Digit?

NUMBER : N atural | M inus · N atural

Letter : a . . . z | A . . . Z

ID : Letter · Letter?

QUIT : quit

W HILE : while

IF : if

ELSE : else

RET URN : '\n '

SEMI : ;

Table 1: the lexical rules of tokens

associativity of operators in the expressions. You can adjust the grammar

according to your consideration.

4 Tasks to do

1. Implement a PVM (virtual machine of P–code). When a PVM is

running, it will execute the instructions that are in its memory of code

area.

2. Write a language translator that can translate ExpS statements to

P–code instructions. There are two choices to design the translator.

6

program → statementList QUIT

statementList → statement | statement statementList

statement → assignmentSt | expressionSt | whileSt |

ifSt | compoundSt

assignment → ID = expression

assignmentSt → assignment ;

expression → NUMBER | ID | ( expression ) |

expression operator expression

expressionSt → expression ;

operator → + | ? | ? | / |

== | ! = | > | < | >= | <=

whileSt → W HILE · ( expression ) statement

ifSt → IF ( expression ) statement |

IF ( expression ) statement ELSE statement

compoundSt → { statementList }

Table 2: Grammar of the ExpS language

An interpreter: An ExpS statement is translated to some P–code

instructions, then these instructions are executed, then continue

to process the next ExpS statement. The calculator of homework1

can be implemented as an interpreter.

A compiler: An ExpS program (a sequence of statements) are

translated all together to a sequence of P–code instructions, then

these instructions are executed.

You are only required to implement one (interpreter of compiler).

Bonus points will be earned if you can do the following parts

You program can work both as a compiler or interpreter.

Support arrays of the ExpS language. You may modify the grammar of

ExpS in Table 3.2 for arrays. Hint: the grammar listed in cgencode.c

will be helpful.

Here is some recommended strategy of doing this homework.

Continue to use the data structure of homework 1.

Translate simple expressions first, i.e. only the assignment and arithmetic

statements of ExpS. Then try the if and while statements.

7

If your homework 1 work is not perfect, you can focus on the code

generator part in this homework, i.e, the code generator can assume

some parse tree is already in the memory.

5 Submitting your homework

Due time: 12 January 2018 Saturday, 7:00pm.

At most 3 students can form a group to do this homework together.

At the top of the every text file that you changed or created (maybe

just the .c and .h files), record your name (if mutiple students forma

group, provide all the names) and date as comments.

Please attach all of your source code and document files to the email,

possibly zip them into a single file. Any information about how to run

and understand your code (a readme file) will be helpful.

Attention: Do not attach any binary files (.o, .obj, .exe, .out), since

google will consider them virus. Just send the source code and textual

documents.

Send your email to:

zyliangstu@gmail.com

The title (subject) of your email should be like:

[name][hmkNum info]

Name can be in Chinese or English (Pingying). For example, if a

student wants to submit homework 5, whose name Shan Li, then the

title of the email should be:

[Li, Shan][hmk3 code-generator]

If it is a group work, include all members’ name in the email title.

8

References

[1] Andrew W. Appel. Modern Compiler Implementation in C: Basic Techniques.

Cambridge University Press, New York, NY, USA, 1997.

[2] Kenneth C. Louden. Compiler Construction Principles and Practice.

PWS Publishing Company, 1997. ISBN 0-534-93972-4

http://www.cs.sjsu.edu/~louden/cmptext/.


站长地图