Programming in WLP4

Parts of a compiler

From the lecture notes. Very important.

Recognition: yes/no S ∈ L(G)
Parsing: proving S ∈ L(G) by providing a derivation

CFG is a 4-tuple

  1. N: finite set of non-terminals
  2. T: finite set of terminals
  3. S: Start symbol, S ∈ N
  4. P: finite set of production rules

Definition of derivation

A sequence a0, a1, a2, ..., an
where a0 is the start symbol, an is a string of terminals, and
ai ⇒ ai+1 for each i (one production is applied at each step).
Example of a derivation:
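As a small illustration, here is a Python sketch that replays a derivation for a toy grammar (the grammar and the helper are assumptions for illustration, not from the notes):

```python
# Toy grammar (assumed): S -> a S b | ab
# A derivation is the sequence of sentential forms a0, a1, ..., an,
# where each step rewrites one non-terminal using a production.
def derive(steps, start="S"):
    """Replay (non-terminal, replacement) rewrites, always rewriting
    the leftmost occurrence of the non-terminal."""
    forms = [start]
    for sym, repl in steps:
        forms.append(forms[-1].replace(sym, repl, 1))
    return forms

# S => aSb => aabb
print(derive([("S", "aSb"), ("S", "ab")]))  # ['S', 'aSb', 'aabb']
```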

Derivation and Parse Tree

  1. A derivation specifies a unique parse tree.
  2. A parse tree has many derivations (you can start from the left subtree or the right subtree, for example).
  3. An input string can have many derivations.

Leftmost/Rightmost derivation

Always expand the leftmost/rightmost non-terminal.

A parse tree has a unique leftmost/rightmost derivation!

Ambiguous grammar

A grammar is ambiguous if there exists a string in the language that has multiple parse trees, or equivalently, multiple derivations even when using the same derivation style (i.e. multiple leftmost derivations, or multiple rightmost derivations).
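For instance, the classic grammar below (an assumed textbook example, not from the notes) is ambiguous: the string a + a + a has two different parse trees, and hence two different leftmost derivations.

```
E → E + E | a

E ⇒ E + E ⇒ a + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a    (tree: a + (a + a))
E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a  (tree: (a + a) + a)
```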


Augmenting a Grammar

Nomair's Handout (TopDownParsingHandout.pdf)


Top Down Parsing

Key words: First, Follow, Nullable, Predict table, LL(1)

Algorithm

Nomair's Handout (TopDownParsingHandout.pdf)

Computing the Predict table given First, Follow and Nullable

Nomair's Handout (TopDownParsingHandout.pdf)
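The construction can be sketched as follows; the grammar representation and names here are assumptions, not the handout's notation. A rule A → α is predicted on terminal t when t ∈ First(α), or when α is nullable and t ∈ Follow(A):

```python
def predict_table(rules, first, follow, nullable):
    """rules: list of (A, rhs-list). Returns {(A, terminal): [rules...]}.
    The grammar is LL(1) iff every cell holds at most one rule."""
    def seq_first(rhs):
        # FIRST of a sentential form: union FIRST of symbols until
        # one of them is not nullable.
        out, all_nullable = set(), True
        for sym in rhs:
            out |= first.get(sym, {sym})  # a terminal's FIRST is itself
            if sym not in nullable:
                all_nullable = False
                break
        return out, all_nullable

    table = {}
    for A, rhs in rules:
        f, eps = seq_first(rhs)
        keys = f | (follow[A] if eps else set())
        for t in keys:
            table.setdefault((A, t), []).append((A, rhs))
    return table

# Toy LL(1) grammar: S -> a S | b
rules = [("S", ["a", "S"]), ("S", ["b"])]
first = {"S": {"a", "b"}}
follow = {"S": set()}
nullable = set()
tbl = predict_table(rules, first, follow, nullable)
print(tbl[("S", "a")])  # the single rule S -> a S
```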

LL(1)

In the name LL(1), the first L stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one symbol of lookahead at each step to make parsing decisions.

When is a grammar LL(1)

From Anthony Zhang's note

From Lecture Note:

  1. A non-terminal that is nullable should not have the same terminal in both its first and follow sets. Example of a grammar that is not LL(1):
  2. There should be only one way to derive ε from a nullable symbol. Example of a grammar that is not LL(1):
  3. The same non-terminal should not generate the same first terminal in more than one way. Example of a grammar that is not LL(1): in this case, we have both
    E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T
    E ⇒ T ⇒ F ⇒ a
  4. Left-recursive grammars are never LL(1). Example of a grammar that is not LL(1): in this case, we have both
    E ⇒ T + E ⇒ a + E
    E ⇒ T ⇒ a

Factoring a non-LL(1) Grammar

Post on Piazza
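A common fix is left factoring: pull the shared prefix of two rules into one rule and move the differing suffixes into a new non-terminal. A typical example (assumed, not from the Piazza post):

```
Not LL(1), because both E-rules start with T:
    E → T + E
    E → T

Left-factored equivalent (E' is a new non-terminal):
    E  → T E'
    E' → + E
    E' → ε
```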


Bottom Up Parsing

Key words: Shift, Reduce, DFA, Stack, Conflicts

Algorithm

Nomair's Handout (BottomUpParsingHandout.pdf)
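As a rough illustration of the shift/reduce idea (not the handout's DFA-driven algorithm; the grammar and the reduce-by-first-matching-rule policy are simplifying assumptions):

```python
# Toy shift-reduce parser for the grammar S -> S a | a.
# A real LR parser consults the DFA to decide when to reduce; here we
# simply reduce whenever the top of the stack matches a rule's
# right-hand side, trying rules in the listed order.
def shift_reduce(tokens):
    rules = [("S", ["S", "a"]), ("S", ["a"])]
    stack, tokens = [], list(tokens)
    while True:
        for lhs, rhs in rules:
            if stack[-len(rhs):] == rhs:   # reduce: pop rhs, push lhs
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if not tokens:
                break
            stack.append(tokens.pop(0))    # shift the next input token
    return stack

print(shift_reduce(["a", "a", "a"]))  # ['S']  -- input accepted
```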

LR(0) DFA

Nomair's Handout (BottomUpParsingHandout.pdf)

Can have conflicts (shift-reduce, reduce-reduce), which can often be resolved by using SLR(1).

SLR(1) DFA

Nomair's Handout (BottomUpParsingHandout.pdf)

LR(1) --- Not in the list


Context Sensitive Analysis

Introduction

Key words: Symbol table Type errors

Multiple variables/procedures

Missing variable/procedure declarations

Type Checking

Type Rules


Code Generation

Key words: $29 Offset Table

Variables, Frame Pointer, Using the stack to store temporaries

;prologue
lis $4
.word 4
sub $29, $30, $4
;set up variables
sw $1, 0($29) ;push parameter 1;
sw $2, -4($29) ;push parameter 2;

Need to use an offset table to remember the offset of each variable.
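A minimal sketch of such a table, assuming the layout used in the prologue above (first variable at offset 0 from $29, each subsequent one 4 bytes lower; the variable names are made up):

```python
# Map each variable name to its offset from $29.
def build_offset_table(variables):
    return {name: -4 * i for i, name in enumerate(variables)}

offsets = build_offset_table(["param1", "param2", "x"])
# To load x, the code generator would emit:  lw $3, -8($29)
print(offsets)  # {'param1': 0, 'param2': -4, 'x': -8}
```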

Note: Don't use beq or bne to jump to far-away labels, because the branch offset is a 16-bit immediate and label addresses can be too far away! Use jr instead, see below.

bne $3, $0, 3   ; if $3 != 0, skip the next 3 instructions
lis $10
.word endLoop   ; otherwise jump to endLoop (a far "beq $3, $0, endLoop")
jr $10

Runtime Environments

Dealing with pointers

When doing comparisons on pointers, use sltu instead of slt, since addresses are unsigned.

Calling Procedures


Compiler Optimizations

Constant Folding

; before: expr 1 + 2
code(1)
push($3)
code(2)
pop($8)
add $3, $8, $3

; after folding: expr 1 + 2 is replaced by the constant 3
lis $3
.word 3

Constant Propagation

Replace the use of a variable with a constant value

; before: int x = 1; ... return x + x
lw $3, -12($29)
push($3)
lw $3, -12($29)
pop($8)
add $3, $8, $3

; after propagation: x + x becomes 1 + 1, which folds to 2
lis $3
.word 2

Common Subexpression Elimination

; before: a + a
code(a)
push($3)
code(a)
pop($8)
add $3, $8, $3

; after: a + a, evaluating code(a) only once
code(a)
add $3, $3, $3

; before: (a + b) * (a + b)
code(a + b)
push($3)
code(a + b)
pop($8)
mult $3, $8
mflo $3

; after: (a + b) * (a + b), evaluating code(a + b) only once
code(a + b)
mult $3, $3
mflo $3

Dead Code Elimination

// before
a = 10;
if (a < 20) {
	...
} else {
	...
}

// after
a = 10;
if (10 < 20) {	// constant propagation: condition is always true
	...
} else {
	// else branch removed: it can never execute
}

Register Allocation

In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers.

In general terms: keep the variables that are used most often in registers, so they can be accessed quickly.

Method Inlining

Replace a function call with the body of the called function.

Useful when the callee is short. Doing this saves us from saving and restoring registers across the call.

Tail Recursion


int fib(int n, int a){
	if(n == 0) return a;
	else return fib(n - 1, n * a); // tail recursion
}

Reusing the current stack frame for the recursive call.
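In effect, the tail call becomes a loop that rebinds the parameters. A Python sketch of the transformed procedure (the compiler does this at the assembly level by overwriting the current frame rather than pushing a new one):

```python
def fib(n, a):
    # The tail call fib(n - 1, n * a) becomes: update the parameters
    # in place and jump back to the top of the body.
    while True:
        if n == 0:
            return a
        n, a = n - 1, n * a

print(fib(3, 1))  # 6, same result as the recursive version
```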


Memory Management

Internal Fragmentation vs External Fragmentation


Free List Algorithm

Maintain a list of pointers to free blocks.

When allocating x bytes, (x + 4) bytes are actually reserved, where the first four bytes store the integer x, i.e. the size of the block.

Since we allocate an extra word to store the size, when deleting c we look at address c[-1] (one word before c) to find the size and free the whole (x + 4)-byte block.

Fragmentation may happen and we have three strategies:

  1. First fit.
  2. Best fit (leaves the smallest hole).
  3. Worst fit (leaves the biggest hole).
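The three strategies differ only in which big-enough free block they pick. A sketch, with the free list represented as (address, size) pairs (a representation assumed here for illustration):

```python
def choose_block(free_list, size, strategy):
    """Pick a free block of at least `size` bytes, or None if none fits."""
    candidates = [b for b in free_list if b[1] >= size]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                        # first big-enough block
    if strategy == "best":
        return min(candidates, key=lambda b: b[1])  # leaves the smallest hole
    if strategy == "worst":
        return max(candidates, key=lambda b: b[1])  # leaves the biggest hole
    raise ValueError(strategy)

free = [(0, 16), (32, 8), (64, 100)]
print(choose_block(free, 8, "first"))  # (0, 16)
print(choose_block(free, 8, "best"))   # (32, 8)
print(choose_block(free, 8, "worst"))  # (64, 100)
```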


Binary Buddy Algorithm

Creates internal fragmentation.

Allocates blocks whose sizes are powers of 2.

For example, when allocating 20 words, we need (20 + 1) = 21 words (same as the free list algorithm: one extra word for the size). Instead of allocating 21 words, we allocate 32 words, since 32 = 2^5 is the next power of 2.
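The rounding step can be sketched as follows (the function name is made up):

```python
def buddy_block_size(words):
    """Round the request (payload + 1 bookkeeping word) up to a power of 2.
    20 words -> 21 -> 32, wasting 11 words (internal fragmentation)."""
    need = words + 1        # one extra word for the size header
    size = 1
    while size < need:
        size *= 2
    return size

print(buddy_block_size(20))  # 32
```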

Book-keeping in Binary Buddy Algorithm

Note: Merging is recursive!


Pre-midterm

Loader

Handout

Regular Language

A language is regular if it can be recognized by a DFA (equivalently, by an NFA, or described by a regular expression).

ε-NFA to NFA

  1. Whenever a chain of ε-transitions is followed by a transition on symbol a, add a new direct transition on a.
  2. If a chain of ε-transitions ends at an accepting state, make all states in the chain accepting.
  3. Remove all ε-transitions.
  4. Remove all unreachable states.

NFA → DFA

Example from the notes
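The subset construction behind that example can be sketched as follows; the NFA used here is an assumed toy (it accepts strings over {a, b} ending in "ab"), not the one from the note:

```python
# Subset construction: each DFA state is a frozenset of NFA states.
def nfa_to_dfa(nfa_delta, start, accept, alphabet):
    start_set = frozenset([start])
    states, work, delta = {start_set}, [start_set], {}
    while work:
        S = work.pop()
        for c in alphabet:
            # the set of NFA states reachable from S on symbol c
            T = frozenset(q2 for q in S for q2 in nfa_delta.get((q, c), ()))
            delta[(S, c)] = T
            if T not in states:
                states.add(T)
                work.append(T)
    # a DFA state accepts if it contains any accepting NFA state
    dfa_accept = {S for S in states if S & accept}
    return states, delta, dfa_accept

nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
states, delta, acc = nfa_to_dfa(nfa, "q0", {"q2"}, "ab")
print(len(states))  # 3 DFA states
```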

Maximal Munch Algorithm

  1. Start with a DFA for the language of tokens.
  2. Run the DFA on the input until you get stuck.
    1. If you are at an accepting state, good.
    2. Otherwise, backtrack the DFA and rewind the input to the last accepting state seen.
    3. If no accepting state was seen, output an error.
  3. Emit the token and reset the DFA to the start state.

Note: the maximal munch algorithm, also called the greedy algorithm, is deterministic, but sometimes it fails to find a tokenization even when one exists.
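A simplified sketch of maximal munch (it rescans the input instead of running and backtracking a real DFA, and the two token classes are assumed toy ones): an input like "ifx" lexes as one ID token rather than IF followed by ID.

```python
def maximal_munch(s, accepts):
    """accepts(lexeme) -> token kind or None. Greedily take the longest match."""
    tokens, i = [], 0
    while i < len(s):
        last = None
        for j in range(i + 1, len(s) + 1):  # run forward as far as possible
            kind = accepts(s[i:j])
            if kind is not None:
                last = (j, kind)            # remember last accepting position
        if last is None:
            raise ValueError("no token at position %d" % i)
        j, kind = last
        tokens.append((kind, s[i:j]))       # emit token...
        i = j                               # ...and restart from here
    return tokens

def accepts(lex):
    if lex == "if":                         # keyword wins over ID at equal length
        return "IF"
    if lex.isalpha() and lex.islower():     # toy identifiers: [a-z]+
        return "ID"
    return None

print(maximal_munch("ifx", accepts))  # [('ID', 'ifx')]
```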