CS 415: Compilers
Project 2/3: A Pascal Subset Front-End and Code Generator
Due dates: Friday, April 13 (Front-End), and Monday, April 30 (Code Generator)
Pascal Subset
Using flex and bison, you are to write a parser and code generator for a subset of Pascal.
Programs consist of single blocks. There are no procedure
declarations. Base types are limited to integer and boolean.
Composit types are limited to single dimensional arrays of base types,
indexed by integers.
Only the following statements are included:
while-do, if-then, if-then-else, assignment, write, break, and compound.
Operators are restricted
to arithmetic, logical, and relational.
Executing a break statement causes the program to jump out of the
immediately nested while loop and go (jump) to the
statement following that loop. The grammar that we are using
for this language is as follows:
| start |
::= |
program ID ; block . |
| block |
::= |
variables cmpdstmt |
| variables
| ::=
| var vardcls | empty string
|
| vardcls
| ::=
| vardcls vardcl ; | vardcl ;
|
| vardcl
| ::=
| IDlist : type
|
| IDlist
| ::=
| IDlist , ID | ID
|
| type
| ::=
| array [ integer_constant .. integer_constant ] of stype
|
| | | | stype
|
| stype
| ::=
| integer | boolean
|
| stmtlist
| ::=
| stmtlist ; stmt | stmt
|
| stmt
| ::=
| ifstmt | astmt | wstmt |
cmpdstmt | writestmt | breakstmt
|
| wstmt
| ::=
| while condexp do stmt
|
| ifstmt
| ::=
| ifhead then stmt else stmt | ifhead then stmt
|
| ifhead
| ::=
| if condexp
|
| cmpdstmt
| ::=
| begin stmtlist end
|
| writestmt
| ::=
| writeln ( exp )
|
| breakstmt
| ::=
| break
|
| astmt
| ::=
| lvalue := exp
|
| exp
| ::=
| rvalue | exp + exp | exp - exp | exp * exp
|
|
| |
| exp and exp | exp or exp | exp
exor exp | not exp
|
|
| |
| ( exp ) | constant
|
| condexp
| ::=
| exp != exp | exp == exp | exp <
exp | exp <= exp | ID |
boolean_constant
|
| lvalue
| ::=
| ID | ID [ exp ]
|
| rvalue
| ::=
| ID | ID [ exp ]
|
| constant
| ::=
| integer_constant | boolean_constant
|
| boolean_constant
| ::=
| true | false
|
Front-End: Project Description
As part of this project, you will
- Augment the scanner to recognize the key words (tokens) while
, do , break , and boolean ,
- Write the parser and detect and report at least the following
syntax errors by inserting appropriate error productions :
- "\n***Error: illegal variable declaration\n"
- "\n***Error: illegal statement\n"
- "\n***Error: illegal expression\n"
- "\n***Error: illegal conditional expression\n"
A reference parser for this project (without semantic
analysis) will be make available as an executable
on the undergraduate network. We hope to post this reference
parser during the second week of the project.
Note: Reporting the same error occurrence
multiple times is okay.
The grammar given above is ambiguous. However, we used features of bison
to deal with operator precedence and associativity to
resolve the ambiguity.
- Augment your parser to perform semantic analysis, in particular type checking.
This will include the implementation a single-level symbol table.
Your type checker has to detect and report at least the
following semantic errors:
- Declarations
"\n***Error: duplicate declaration of %s\n"
- Array declaration
\n***Error: lower bound exceeds upper bound\n"
- If stmt
"\n***Error: exp in if stmt must be boolean\n"
- While stmt:
"\n***Error: exp in while stmt must be boolean\n"
- Writeln stmt (Note: accepts only integer values
"\n***Error: illegal type for write\n"
- Assignment stmt
"\n***Error: assignment types do not match\n"
"\n***Error: assignment to whole array\n"
- Array subscript operation id[exp] in exp
"\n***Error: subscript exp not type integer\n"
"\n***Error: id %s is not an array\n"
- break statement
"\n***Error: break statement outside of a while loop \n"
- Identifier
"\n***Error: undeclared identifier %s\n"
- Arithmetic, logical, and relational operators in exp
"\n***Error: types of operands for operation %s do not match\n";
Specific Details
Names are case sensitive .
Efficiency is not the main concern of this project. You may a
simple unordered list to implement your symbol table. If you design
your symbol table interface well, it should be simple to replace the
underlying table implementation with a hashing scheme later.
Logical operators (and, or, exor, not) require boolean arguments. The
arithmetic and relational operators (e.g., +) require integer arguments.
Integer and boolean arguments cannot be mixed.
The equality/inequality operators (==, !=) apply to any type, as long as
the two argument types match.
Provided Code
The following code is provided as a starting point for your
project. You may modify any of these files.
- declaration of types and attribute operations:
attr.h and attr.c
There is nothing useful yet in these files.
- declaration of symbol table and symbol table operations:
symtab.h and symtab.c
There is nothing useful yet in these files.
- Scanner: scan.l (flex)
- Parser: parse.y (bison)
- Makefile
- demo-simple , a program
that can be successfully parsed with the partial parser that can
be generated from the provided code.
In order to test your compiler, we will provide test cases.
The following is just a tentative list:
- Parser and syntax error testing (more codes will be posted later):
demo-simple; this test case actually
works with the dummy parser provided;
- Semantic error / type error testing:
You can generate an executable called parser by
typing make . The parser expects the input on
stdin, i.e., you can call the parser on an input program as
follows: parser < demo-simple .
Due Date
The front-end due at 11:59 pm on April 13,
2007--that is, one minute before midnight.
If you have specific, overriding,
personal reasons why these dates are unreasonable, you should discuss
them directly with the instructor before the deadline.
Submission Procedure
We will use the web-based handin facility.
You need submit the following files: scan.l, attr.h, attr.c, symtab.h,
symtab.c, parse.y, ReadMe . The ReadMe file contains may contain
comments that you want the grader to know about.
It is your reponsibility to verify
that your files have been submitted correctly .
Grading Criteria
The project will be mainly graded on functionality. You will receive
no credit for the entire project if your
front-end code does not compile on remus .
We may use automatic testing tools, so make sure that your error messages are
exactly as described above.
Code Generator: Project Description
You will write a syntax directed translation scheme that will generate
ILOC code for the above language. You may test the correctness of your
generated ILOC code by running it on the ILOC simulator provided as
part of project 1 (~uli/cs415/spring07/proj1 on the
undergraduate cluster.)
Code Shape Requirements
-
Your code should use the register-register model that exposes
the maximal opportunities for register allocation. In other words,
each new value should reside in a separate virtual register, if
possible. The
function NextRegister will return a new (fresh) register number.
There are two versions of this project, the required version , and the
advanced version .
- required version: Every read and write of a scalar variable
leads to a memory reference. In other words, scalar variables are
treated the same way as array variables, i.e., are assumed to be
potentially aliased.
- advanced version: A read and write of a scalar variable
within a single basic block may lead to a
reference to a corresponding virtual register. The same
scalar variable may be assigned to different virtual registers
across different basic blocks. The first
reference to a scalar variable in a basic block will result
in generating a LOAD or STORE instruction. After that, the scalar variable "lives"
in its virtual register, and the virtual register needs to
be written back to the memory location of its scalar
variable at the end of the basic block if
the variable has been modified. Assignments to scalar variables that
live in registers are implemented as register-register
transfers (use the i2i ILOC instruction ).
There is no extra credit for this project, just the
enjoyment of doing it :-)
-
All variables are statically allocated, i.e., there is no need for
activation records. The static area starts at
memory location 1024. Addresses above are reserved for register
spilling. This convention will allow you to run your project 1
register allocator on each basic block of your generated code.
The register r0
should contain the starting address (namely 1024) of the static area.
-
You may only use the following ILOC instructions. All these
instructions are implemented in our ILOC simulator.
- no operation: nop .
- arithemtic: add, sub, mult .
- logical: and, or, xor .
- memory: load, loadI, loadAO, loadAI, store, storeAO, storeAI .
- control flow: br, cbr, cmp_LT, cmp_LE, cmp_EQ, cmp_NE .
- register transfer: i2i .
- I/O: output .
Pleas see files instrutil.h and instrutil.c for
details about the required formats. Procedures emit,
NextRegister, and NextLabel are also defined within these files.
li>
You may want to generate nop instructions as targets of
branches and conditional branches, e.g., L1: nop .
-
Boolean values are represented as 4 bytes entities with 0 == FALSE
and 1 == TRUE. Therefore, regular load and store instruction can be used.
-
The evaluation of an exp will always result in an integer
value, while the evaluation of an condexp will always
result in a boolean (0 or 1) value. An ILOC cmp_ instruction
writes a boolean value into its target register.
-
The function NextLabel will generate a new (fresh) label each time
it is called.
Provided Code
You are expected to extend your
solution for the first part of the project (Front End) for the
code generation part. However, you may assume that your input
programs for this project do not have any parsing or static
semantics errors (e.g., no type errors).
You will need to modify files parse.y , attr.h ,
attr.c , symtab.h , symtab.c , and Makefile
.
- Parser/Code Generator: parse.y
Here is where most of your code will go, as in the first part
of the project. Use calls to procedure emit to generate code.
This will write the resulting ILOC code into file iloc.out.
Please modify your own version of parse.y from part1 of
this project accordingly .
- instrutil.h and instrutil.c . *** DO NOT MODIFY ***
- You may modify your own Makefile of the first part of the
project to include the files instrutil.h and
instrutil.c , or use Makefile
In order to test your compiler, we will provide some test cases.
The test cases will be posted soon.
We will provide a sample solution codegen-sol
of the required version that will run on the undergraduate
cluster. You can use this compiler to run on your own
sample codes. This compiler also emits comments. Your compiler
is not required to emit any comments, but it may be useful
for you to do so. You should use the provided procedure
emitComment for this purpose.
Due Date
The code generator is due at 11:59 pm on April 30,
2007--that is, one minute before midnight.
If you have specific, overriding,
personal reasons why these dates are unreasonable, you should discuss
them directly with the instructor before the deadline.
Submission Procedure
Again, we will use the web-based handin facility.
You need submit the following files: attr.h, attr.c, symtab.h,
symtab.c, parse.y, ReadMe . The ReadMe file contains may contain
comments that you want the grader to know about.
It is your reponsibility to verify
that your files have been submitted correctly .
Grading Criteria
As the Front End, the this project will be mainly graded on functionality. You will receive
no credit for the entire project if your
code generator does not compile on remus .