To define a syntax-directed translation so that the translation of an input is the corresponding AST, we first need operations that create AST nodes. Let's use java code, and assume that we have the following class hierarchy:
class ExpNode { }
class IntLitNode extends ExpNode {
public IntLitNode(int val) {...}
}
class PlusNode extends ExpNode {
public PlusNode( ExpNode e1, ExpNode e2 ) { ... }
}
class TimesNode extends ExpNode {
public TimesNode( ExpNode e1, ExpNode e2 ) { ... }
}
Now we can define a syntax-directed translation for simple arithmetic expressions, so that the translation of an expression is its AST:
CFG Translation rules
=== =================
exp -> exp + term exp1.trans = new PlusNode(exp2.trans, term.trans)
exp -> term exp.trans = term.trans
term -> term * factor term1.trans = new TimesNode(term2.trans, factor.trans)
term -> factor term.trans = factor.trans
factor -> INTLITERAL factor.trans = new IntLitNode(INTLITERAL.value)
factor -> ( exp ) factor.trans = exp.trans
Syntax-Directed Translation and Predictive Parsing
Now we consider how to implement a syntax-directed translation using a predictive parser. It is not obvious how to do this, since the predictive parser works by building the parse tree top-down, while the syntax-directed translation needs to be computed bottom-up. Of course, we could design the parser to actually build the parse tree (top-down), then use the translation rules to build the translation (bottom-up). However, that would not be very efficient.
Instead, we avoid explicitly building the parse tree by giving the parser a second stack called the semantic stack:
* The semantic stack holds nonterminals' translations; when the parse is finished, it will hold just one value: the translation of the root nonterminal (which is the translation of the whole input).
* Values are pushed onto the semantic stack (and popped off) by adding actions to the grammar rules. The action for one rule must:
o Pop the translations of all right-hand-side nonterminals.
o Compute and push the translation of the left-hand-side nonterminal.
* The actions themselves are represented by action numbers, which become part of the right-hand sides of the grammar rules. They are pushed onto the (normal) stack along with the terminal and nonterminal symbols. When an action number is the top-of-stack symbol, it is popped and the action is carried out.
So what actually happens is that the action for a grammar rule X -> Y1 Y2 ... Yn is pushed onto the (normal) stack when the derivation step X -> Y1 Y2 ... Yn is made, but the action is not actually performed until complete derivations for all of the Y's have been carried out.
Example: Counting Parentheses
For example, consider the following syntax-directed translation for the language of balanced parentheses and square brackets. The translation of a string in the language is the number of parenthesis pairs in the string.
CFG Translation Rules
=== =================
exp -> epsilon exp.trans = 0
-> ( exp ) exp1.trans = exp2.trans + 1
-> [ exp ] exp1.trans = exp2.trans
The first step is to replace the translation rules with translation actions. Each action must:
* Pop all right-hand-side nonterminals' translations from the semantic stack.
* Compute and push the left-hand-side nonterminal's translation.
Here are the translation actions:
CFG Translation Actions
=== ===================
exp -> epsilon push(0);
-> ( exp ) exp2Trans = pop(); push( exp2Trans + 1 );
-> [ exp ] exp2Trans = pop(); push( exp2Trans );
Next, each action is represented by a unique action number, and those action numbers become part of the grammar rules:
CFG with Actions
================
exp -> epsilon #1
-> ( exp ) #2
-> [ exp ] #3
#1: push(0);
#2: exp2Trans = pop(); push( exp2Trans + 1 );
#3: exp2Trans = pop(); push( exp2Trans );
Note that since action #3 just pushes exactly what is popped, that action is redundant, and it is not necessary to have any action associated with the third grammar rule. Here's a picture that illustrates what happens when the input "([])" is parsed (assuming that we have removed action #3):
input so far stack semantic stack action
------------ ----- -------------- ------
( exp EOF pop, push "( exp ) #2"
( (exp) #2 EOF pop, scan
([ exp) #2 EOF pop, push "[ exp ]"
([ [exp] ) #2 EOF pop, scan
([] exp] ) #2 EOF pop, push epsilon #1
([] #1 ] ) #2 EOF pop, do action
([] ] ) #2 EOF 0 pop, scan
([]) ) #2 EOF 0 pop, scan
([]) EOF #2 EOF 0 pop, do action
([]) EOF EOF 1 pop, scan
([]) EOF empty stack: input accepted!
translation of input = 1
In the example above, there is no grammar rule with more than one nonterminal on the right-hand side. If there were, the translation action for that rule would have to do one pop for each right-hand-side nonterminal. For example, suppose we are using a grammar that includes the rule: methodBody -> { varDecls stmts }, and that the syntax-directed translation is counting the number of declarations and statements in each method body (so the translation of varDecls is the number of derived declarations, the translation of stmts is the number of derived statements, and the translation of methodBody is the number of derived declarations and statements).
CFG Rule: methodBody -> { varDecls stmts }
Translation Rule: methodBody.trans = varDecls.trans + stmts.trans
Translation Action: stmtsTrans = pop(); declsTrans = pop();
push( stmtsTrans + declsTrans );
CFG rule with Action: methodBody -> { varDecls stmts } #1
#1: stmtsTrans = pop();
declsTrans = pop();
push( stmtsTrans + declsTrans );
Note that the right-hand-side nonterminals' translations are popped from the semantic stack right-to-left. That is because the predictive parser does a leftmost derivation, so the varDecls nonterminal gets "expanded" first; i.e., its parse tree is created before the parse tree for the stmts nonterminal. This means that the actions that create the translation of the varDecls nonterminal are performed first, and thus its translation is pushed onto the semantic stack first.
Another issue that has not been illustrated yet arises when a left-hand-side nonterminal's translation depends on the value of a right-hand-side terminal. In that case, it is important to put the action number before that terminal symbol when incorporating actions into grammar rules. This is because a terminal symbol's value is available during the parse only when it is the "current token". For example, if the translation of an arithmetic expression is the value of the expression:
CFG Rule: factor -> INTLITERAL
Translation Rule: factor.trans = INTLITERAL.value
Translation Action: push( INTLITERAL.value )
CFG rule with Action: factor -> #1 INTLITERAL // action BEFORE terminal
#1: push( currToken.value )