The nature of the instruction set of the target machine determines the difficulty of instruction selection. The uniformity and completeness of the instruction set are important factors. If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling.
Instruction speeds and machine idioms are other important factors. If we do not care about the efficiency of the target program, instruction selection is straightforward. For each type of three- address statement we can design a code skeleton that outlines the target code to be generated for that construct.
For example, every three address statement of the form x := y + z, where x, y, and z are statically allocated, can be translated into the code sequence
MOV y, R0 /* load y into register R0 */
ADD z, R0 /* add z to R0 */
MOV R0, x /* store R0 into x */
Unfortunately, this kind of statement � by - statement code generation often produces poor code. For example, the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth statement is redundant, and so is the third if �a� is not subsequently used.
The quality of the generated code is determined by its speed and size.
A target machine with a rich instruction set may provide several ways of implementing a given operation. Since the cost differences between different implementations may be significant, a naive translation of the intermediate code may lead to correct, but unacceptably inefficient target code. For example if the target machine has an �increment� instruction (INC), then the three address statement a := a+1 may be implemented more efficiently by the single instruction INC a, rather than by a more obvious sequence that loads a into a register, add one to the register, and then stores the result back into a.
MOV a, R0
ADD #1,R0
MOV R0, a
Instruction speeds are needed to design good code sequence but unfortunately, accurate timing information is often difficult to obtain. Deciding which machine code sequence is best for a given three address construct may also require knowledge about the context in which that construct appears.
REGISTER ALLOCATION
Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore, efficient utilization of register is particularly important in generating good code. The use of registers is often subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register values. Mathematically, the problem is NP-complete. The problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register usage conventions be observed.
Certain machines require register pairs (an even and next odd numbered register) for some operands and results. For example, in the IBM System/370 machines integer multiplication and integer division involve register pairs. The multiplication instruction is of the form
M x, y
where x, is the multiplicand, is the even register of an even/odd register pair.
The multiplicand value is taken from the odd register pair. The multiplier y is a single register. The product occupies the entire even/odd register pair.
The division instruction is of the form
D x, y
where the 64-bit dividend occupies an even/odd register pair whose even register is x; y represents the divisor. After division, the even register holds the remainder and the odd register the quotient.
Now consider the two three address code sequences (a) and (b) in which the only difference is the operator in the second statement. The shortest assembly sequence for (a) and (b) are given in(c).
Ri stands for register i. L, ST and A stand for load, store and add respectively. The optimal choice for the register into which �a� is to be loaded depends on what will ultimately happen to e.
t := a + b t := a + b
t := t * c t := t + c
t := t / d t := t / d
(a) (b)
fig. 2 Two three address code sequences
L R1, a L R0, a
A R1, b A R0, b
M R0, c A R0, c
D R0, d SRDA R0, 32
ST R1, t D R0, d
ST R1, t
(a) (b)
fig.3 Optimal machine code sequence
CHOICE OF EVALUATION ORDER
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others. Picking a best order is another difficult, NP-complete problem. Initially, we shall avoid the problem by generating code for the three -address statements in the order in which they have been produced by the intermediate code generator.
APPROCHES TO CODE GENERATION
The most important criterion for a code generator is that it produce correct code. Correctness takes on special significance because of the number of special cases that code generator must face. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.