The PDP11 IMP Compiler

Peter S. Robertson

 

There have been a few references on the history site to my IMP compiler for the PDP11, so I thought it might be useful to give an overview of how it managed to produce some quite interesting code sequences.

 

This compiler built on ideas I had played with in a Data General Nova compiler I wrote for the University of Aarhus (Denmark) in 1975. The Nova had a strange instruction set where all sorts of sub-functions could be combined into a single instruction. I remember describing the incremental process by which the compiler generated very tight sequences and being dismayed to be told that a compiler could never have produced such code: in short, that I was a liar. I hasten to add that this was a person from neither Edinburgh nor Aarhus. Even though, so far as I know, neither the Nova nor the PDP11 IMP compilers exist today, enough people who have seen the output from PDP11 IMP survive to silence such disbelievers. I suspect that many others have experienced the same sort of closed-minded disbelief from the ignorant and opinionated, but at the time I found it surprising and painful.

One of the things I enjoyed about working with compilers was the way in which simple optimisations would interact and combine, occasionally generating surprising code: the first example here illustrates that. The second example is the process in reverse: look at the code, spot an optimisation, implement it, look again and now see the chance for a further optimisation.

This latter is what Ian Young, Graham Toal, Chris Whitfield and I did while brainstorming Ackermann’s Function one evening at the Lattice Logic offices in Albany Lane. One of us would come up with a suggestion to improve the code and I would tear my hair out (I had more then) thinking of simple ways to make the compiler do it. Chris remembers me jumping round the machine room in glee when I worked out the trick for giving a function multiple entry points. This was what finally reduced the resulting code size below that of the documented “optimal” hand coding.

I should make it clear that, while Ackermann was the inspiration for developing these optimisations, the compiler had no specific knowledge of that particular function; all of the techniques mentioned led to improvements in many programs, not least the compiler itself.

All my IMP compilers (except the IMP Interpreter and an IBM 370 batch compiler) eventually worked in three passes. The first pass, based on an extended, and severely overstretched, variant of Hamish Dewar’s parser, did all the error checking and output a machine-independent intermediate code. I find it amusing that this code, created in 1975, is still in use today, unchanged 27 years after being written. The second pass processed the intermediate code and generated two streams of output: a sequence of code fragments and a sequence of directives. The second pass always tried to generate the best code possible. Unlike many compilers of that period it did not generate “the obvious sequences” and then try to massage them into more sensible code. The peephole optimisations were all reserved for the call and jump knitting, managed by the third pass. Directives were things like <procedure start>, <procedure end>, <label>, <jump>, <call>, <dump bytes>, and so on. The third pass built a list of these directives, manipulated them, and finally used the resulting list to read the code fragments and combine them into the output object file. The directives were passed over repeatedly, stopping after a pass had failed to make any changes.
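
In outline, the third pass behaved like the sketch below (Python, with invented names; each rule takes a directive list and returns a possibly rewritten one - none of this is the compiler's own code):

def run_pass3(directives, rules):
    # Keep re-running the rewrite rules over the directive list until a
    # complete pass makes no changes.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            rewritten = rule(directives)
            if rewritten != directives:     # this rule found something to improve
                directives = rewritten
                changed = True
    return directives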

One idea behind this scheme was to delay as long as possible making decisions so that the maximum amount of information was available when they ultimately had to be taken. For example, all a program’s labels, jumps and calls were known before any code addresses had been committed, so making use of optimally-sized branch instructions was fairly simple.
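
On the PDP11 a conditional branch can only reach about 127 words either side of the instruction; anything further away needs a longer form, for example an inverted branch over a JMP. Because every label and jump is known before any addresses are fixed, branch sizing can be done with the usual relaxation loop. A rough sketch, with invented structures (Python, illustrative only), is:

SHORT, LONG = 2, 6              # bytes: Bcc/BR  versus  inverted Bcc over a JMP

def size_branches(items):
    # items: ("label", name) or ("jump", target) or ("code", length_in_bytes)
    size = {i: SHORT for i, item in enumerate(items) if item[0] == "jump"}
    while True:
        addr, where, label_at = 0, {}, {}
        for i, item in enumerate(items):            # lay the code out with current sizes
            where[i] = addr
            if item[0] == "label":
                label_at[item[1]] = addr
            elif item[0] == "jump":
                addr += size[i]
            else:
                addr += item[1]
        grown = False
        for i, item in enumerate(items):            # grow any short branch that cannot reach
            if item[0] == "jump" and size[i] == SHORT:
                offset = label_at[item[1]] - (where[i] + 2)
                if not -256 <= offset <= 254:       # 8-bit word offset from the updated PC
                    size[i], grown = LONG, True
        if not grown:
            return size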

The first example comes from the second pass of the PDP11 compiler itself. I remember looking through a dump of the code the compiler generated when compiling itself; I was always on the lookout for opportunities to optimise code. I spotted a bug and spent quite a while tracking it down, only to discover it wasn’t a fault at all but some amazing code.

 

%routine Move(%record(exp)%name Lhs, Rhs)
   %if Rhs_Oper=0 %then Simple(Lhs, Rhs) -
                  %else Complex(Lhs, Rhs)
%end

 

This routine was called to assign something (Rhs) to a destination (Lhs). Rhs would either be a simple object (constant, variable…) or an expression; Rhs_Oper was zero for simple objects and non-zero for expressions.

The compiler’s coding conventions were that integer function results were passed back in r0 and the first two integer-like parameters were passed in registers r1 and r0 respectively.

The information sent to the third pass was something like this:

 

Directives                        Fragments      Notes
<start Move>
<label Move>
<dump 2>                          TST 8(r0)      1
<branch not equal L1>
<call Simple>
<branch L2>                                      3
<label L1>
<call Complex>
<label L2>                                       3
<return>                                         3
<end, frame=0>                                   2

 

 

Notes

 

  1. The second pass has detected that the parameters to the call of Move are simply passed on to Simple or Complex; the only hard code it generates is the test on Rhs_Oper.
  2. The <end> directive indicates that the function needs no stack frame and so no entry or exit sequences are necessary.
  3. A branch to a return is converted into a return. Label L2 is now unused and is removed. This results in:

 

Directives                        Fragments      Notes
<start Move>
<label Move>
<dump 2>                          TST 8(r0)
<branch not equal L1>
<call Simple>                                    4
<return>
<label L1>
<call Complex>                                   4
<return>
<end, frame=0>

 

 

 

  4. As there is no exit sequence, these two directives will normally generate the code: <JSR PC, LAB> <RTS PC>. The directives can be replaced by <branch LAB>, as the return from this call will also pop the current return address off the stack and jump back directly, giving:

 

Directives                        Fragments      Notes
<start Move>
<label Move>
<dump 2>                          TST 8(r0)
<branch not equal L1>                            5
<branch Simple>
<label L1>
<branch Complex>
<end, frame=0>

 

 

 

  5. This is a conditional branch around an unconditional branch. An inverted conditional branch to Simple can replace the sequence. Label L1 is now unused and can be removed:

 

Directives                        Fragments      Notes
<start Move>
<label Move>
<dump 2>                          TST 8(r0)
<branch equal Simple>
<branch Complex>                                 6
<end, frame=0>

 

 

 

  6. Now the bit that initially confused me. The final instruction in the function is a branch to another function. As part of the compiler’s shuffling of procedures to minimise branch sizes, it decided to locate the code for Move immediately above the code for Complex. The branch to Complex then turned into a branch to the next instruction, and so was removed. The final generated code was the two instructions:

 

TST 8(r0)
BEQ Simple

 

It looked as if the compiler had forgotten to put in the jump to Complex, especially as the assembly language listing kept the source statements in their original order. You had to look carefully at the instruction addresses to see that the routine had been moved.

I find this fascinating as it is an example where the now-common technique of expanding small procedures in-line would have led to considerably worse code than was actually generated.
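
Spelled out as code, the rewrites described in notes 3 to 5 look something like the Python sketch below; the representation and names are invented for illustration, and the real compiler worked on its own directive structures.

def branch_to_return(d):
    # <branch Ln> where Ln labels a <return>  ==>  <return>               (note 3)
    out = []
    for op, arg in d:
        if op == "branch" and ("label", arg) in d:
            i = d.index(("label", arg))
            if i + 1 < len(d) and d[i + 1][0] == "return":
                out.append(("return", None))
                continue
        out.append((op, arg))
    return out

def remove_unused_labels(d):
    # drop any <label> that nothing branches to, calls, or names as an entry
    used = {arg for op, arg in d if op in ("branch", "bne", "beq", "call", "start")}
    return [(op, arg) for op, arg in d if op != "label" or arg in used]

def call_then_return(d):
    # <call X> <return>  ==>  <branch X>; X's own RTS does the returning   (note 4)
    out, i = [], 0
    while i < len(d):
        if d[i][0] == "call" and i + 1 < len(d) and d[i + 1][0] == "return":
            out.append(("branch", d[i][1]))
            i += 2
        else:
            out.append(d[i])
            i += 1
    return out

def branch_over_branch(d):
    # <bne Ln> <branch X> <label Ln>  ==>  <beq X>                         (note 5)
    out, i = [], 0
    while i < len(d):
        if (i + 2 < len(d) and d[i][0] == "bne"
                and d[i + 1][0] == "branch" and d[i + 2] == ("label", d[i][1])):
            out.append(("beq", d[i + 1][1]))
            i += 3
        else:
            out.append(d[i])
            i += 1
    return out

move = [("start", "Move"), ("label", "Move"), ("code", "TST 8(r0)"),
        ("bne", "L1"), ("call", "Simple"), ("branch", "L2"), ("label", "L1"),
        ("call", "Complex"), ("label", "L2"), ("return", None), ("end", "frame=0")]

for rule in (branch_to_return, remove_unused_labels, call_then_return, branch_over_branch):
    move = rule(move)       # the real pass re-applies its rules until nothing changes

# move is now: start, label Move, TST 8(r0), beq Simple, branch Complex, end.
# Placing Move immediately above Complex (note 6) then deletes that final branch,
# leaving just TST 8(r0) / BEQ Simple.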

 

 

On to Ackermann. The IMP version of this function is as follows:

 

%integerfn Ackermann(%integer m, n)
   %result = n+1 %if m = 0
   %result = Ackermann(m-1, 1) %if n = 0
   %result = Ackermann(m-1, Ackermann(m, n-1))
%end

 

The key to generating good code for this function is to remove any entry and exit sequences. The problem comes with the parameter m, which needs to be preserved across the nested call of Ackermann (third %result statement). It turns out that there is a fairly simple technique to handle this. The second pass lets the third pass sort out whether the parameter needs to be stored and where. The choice is simple: if the parameter is loaded from memory no more than once, it is a candidate for being pushed onto the stack and popped back when needed; otherwise it is better stored in the local stack frame during the function prologue.
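
Expressed as a small Python sketch (the names are invented here, not the compiler's, and the frame slot is just a placeholder), the decision the third pass makes for each <store Pn>/<load Pn> pair is roughly:

def resolve_store_load(reg, loads_from_memory, frame_slot="<frame slot>"):
    # One memory load:  keep the value on the stack across the call
    #                   (two single-word instructions, no stack frame needed).
    if loads_from_memory <= 1:
        return {"store": f"MOV {reg}, -(SP)", "load": f"MOV (SP)+, {reg}"}
    # Several loads:    claim a stack frame, store the register once in the
    #                   prologue, ignore the <store> directive, and satisfy
    #                   each <load> from the frame slot.
    return {"prologue": f"MOV {reg}, {frame_slot}",
            "store": "",
            "load": f"MOV {frame_slot}, {reg}"}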

In symbolic form, the two output streams generated by the second pass for this function were:

 

Directives                        Fragments      Notes
<start Ackermann>
<label Ackermann>
<dump 2>                          TST r1
<branch not equal L1>
<dump 2>                          INC r0
<return>
<label L1>
<dump 2>                          TST r0
<branch not equal L2>
<call Ackermann, r1-1, r0+1>                     5, 6
<return>
<label L2>
<store P2>                                       3
<call Ackermann, r0-1>                           6
<load P2: r1>                                    2
<call Ackermann, r1-1>                           6
<return>
<end, frame=0, store P1: use=1>                  1, 4

 

 

Notes

 

  1. By the time the <end> directive is generated, the compiler knows much about the function. Here it notes that parameter 1 (m) needs to be stored in the local stack frame, which has not yet been used. The compiler has also observed that the value of n is always in r0 when needed; sadly the nested call of Ackermann forces parameter 1 (m) to be stored.
  2. By this point the compiler has avoided storing parameter m. Now it discovers that its value needs to be loaded but hasn’t a clue where the third pass will put it, so it asks pass 3 to load it into r1.
  3. Here the second pass has converted the argument “m-1” into “r1-1” and knows that r1 is the original value of parameter m. The upcoming function call will corrupt r1 and so it must be stored. The question is where? If there is only one use of the value, the stack can be used (two 16-bit instructions); if there are more, it’s better to stuff the parameter into the local stack frame at the start of the function, ignore the <store> directive, and make <load> pick up the parameter from the local frame.
  4. Now the third pass knows that the parameter P2 (m) is only read from memory once and so can use push (note 3) and pop (note 2) instructions to preserve the register. At last the basic entry and exit sequence can be determined: no stack frame is needed, so no entry and exit code is required. <return> can generate “RTS PC”.
  5. The r0+1 comes from a simple observation. The obvious way to load the constant 1 into a register is to use “MOV #1, r0”. However, the sequence “CLR r0; INC r0” is the same size and takes the same time to execute, but as we have just tested r0 we know that it must be zero, so the “CLR r0” can be dropped leaving “INC r0”.
  6. These calls indicate that the given registers are to be modified before the procedure is entered. On seeing the first such call, the third pass decides to change the obvious sequence:

 

Ackermann:
          ...
          DEC r1
          JSR PC, Ackermann

 

into the following sequence, which is the same size and takes the same time to execute:

 

X1:       DEC r1
Ackermann:
          ...
          JSR PC, X1

 

Remember that the third pass is responsible for all code addressing and the function entry and exit sequences; it is a simple matter for it to infiltrate extra code in front of the main entry point.

 

The same algorithm then adds another entry point to deal with incrementing r0. Nothing can be done with the next call to Ackermann, but the third call can use the X1 entry point.
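
In sketch form, adding an entry point is just a matter of splicing a label and an <insert> directive in front of the function's own label. The Python below is illustrative only; the variable and label names are invented:

def add_entry_point(directives, func, new_label, adjust):
    # Plant "new_label: adjust" immediately in front of func's entry label, so
    # a call that needs "adjust" doing first can JSR to new_label and fall
    # through into func.
    at = directives.index(("label", func))
    directives[at:at] = [("label", new_label), ("insert", adjust)]

ackermann = [("label", "Ackermann"), ("dump", "TST r1"), ("bne", "L1")]  # front of the function
add_entry_point(ackermann, "Ackermann", "X1", "DEC r1")   # <call Ackermann, r1-1>        ->  <call X1>
add_entry_point(ackermann, "X1", "X2", "INC r0")          # <call Ackermann, r1-1, r0+1>  ->  <call X2>
# ackermann now begins: X2: INC r0 / X1: DEC r1 / Ackermann: TST r1 ...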

At this stage, the situation is as follows:

 

Directives                        Fragments      Notes
<start Ackermann>
<label X2>
<insert 2: INC r0>
<label X1>
<insert 2: DEC r1>
<label Ackermann>
<dump 2>                          TST r1
<branch not equal L1>
<dump 2>                          INC r0
<return>
<label L1>
<dump 2>                          TST r0
<branch not equal L2>
<call X2>                                        7
<return>
<label L2>
<store P2>
<call Ackermann, r0-1>
<load P2: r1>
<call X1>                                        7
<return>
<end, frame=0, store P1: use=1>

 

 

 

  7. These two directives will normally generate the sequence: <JSR PC, LAB> <RTS PC>. The directives can be replaced by <branch LAB> as the return from this call will also pop the current return address off the stack and jump back directly.

This reduces the function to:

 

Directives                        Fragments      Notes
<start Ackermann>
<label X2>
<insert 2: INC r0>
<label X1>
<insert 2: DEC r1>
<label Ackermann>
<dump 2>                          TST r1
<branch not equal L1>
<dump 2>                          INC r0
<return>
<label L1>
<dump 2>                          TST r0
<branch not equal L2>                            8
<branch X2>
<label L2>
<store P2>
<call Ackermann, r0-1>
<load P2: r1>
<branch X1>
<end, frame=0, store P1: use=1>

 

 

 

  8. Here we have a conditional branch round an unconditional branch to X2. This can be converted into an inverted conditional branch to X2.
  9. Now the third pass has finished optimising the structure of the function and can generate the final sequence:

 

 

X2:        INC r0
X1:        DEC r1
Ackermann: TST r1
           BNE L1
           INC r0
           RTS PC
L1:        TST r0
           BEQ X2
           MOV r1, -(SP)
           DEC r0
           JSR PC, Ackermann
           MOV (SP)+, r1
           BR  X1
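
One way to convince yourself that these thirteen instructions really compute Ackermann's function is to transliterate them: in the sketch below (Python, with invented function names), r0 and r1 become variables and the tail calls through X1 and X2 become a loop.

def ackermann_generated(m, n):
    # r1 holds m, r0 holds n and then the result
    r1, r0 = m, n
    while True:                               # BR X1 and the fall-throughs become a loop
        if r1 == 0:                           # Ackermann: TST r1 / BNE L1
            return r0 + 1                     #            INC r0 / RTS PC
        if r0 == 0:                           # L1:        TST r0 / BEQ X2
            r0 += 1                           # X2:        INC r0
        else:
            saved_m = r1                      #            MOV r1, -(SP)
            r0 -= 1                           #            DEC r0
            r0 = ackermann_generated(r1, r0)  #            JSR PC, Ackermann
            r1 = saved_m                      #            MOV (SP)+, r1, then BR X1
        r1 -= 1                               # X1:        DEC r1, falling into Ackermann

def ackermann_reference(m, n):                # the textbook definition, for comparison
    if m == 0: return n + 1
    if n == 0: return ackermann_reference(m - 1, 1)
    return ackermann_reference(m - 1, ackermann_reference(m, n - 1))

assert all(ackermann_generated(m, n) == ackermann_reference(m, n)
           for m in range(4) for n in range(5))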