There have been a few references on the history site to my IMP compiler for the PDP11, so I thought it might be useful to give an overview of how it managed to produce some quite interesting code sequences.
This compiler built on ideas I had played with in a Data General Nova compiler I wrote for the University of Aarhus (Denmark) in 1975. The Nova had a strange instruction set where all sorts of sub-functions could be combined into a single instruction. I remember describing the incremental process by which the compiler generated very tight sequences and being dismayed to be told that a compiler could never have produced such code: in short, that I was a liar. I hasten to add that this was a person from neither Edinburgh nor Aarhus. Even though, so far as I know, neither the Nova nor the PDP11 IMP compilers exist today, enough people who have seen the output from PDP11 IMP survive to silence such disbelievers. I suspect that many others have experienced the same sort of closed-minded disbelief from the ignorant and opinionated, but at the time I found it surprising and painful.
One of the things I enjoyed about working with compilers was the way in which simple optimisations would interact and combine, occasionally generating surprising code: the first example here illustrates that. The second example is the process in reverse: look at the code, spot an optimisation, implement it, look again and now see the chance for a further optimisation.
This latter is what Ian Young, Graham Toal, Chris Whitfield and I did while brainstorming Ackermann’s Function one evening at the Lattice Logic offices in Albany Lane. One of us would come up with a suggestion to improve the code and I would tear my hair out (I had more then) thinking of simple ways to make the compiler do it. Chris remembers me jumping round the machine room in glee when I worked out the trick for giving a function multiple entry points. This was what finally reduced the resulting code size below that of the documented “optimal” hand coding.
I should make it clear that, while Ackermann was the inspiration for developing these optimisations, the compiler had no specific knowledge of that particular function; all of the techniques mentioned led to improvements in many programs, not the least the compiler itself.
All my IMP compilers (except the IMP Interpreter and an IBM 370 batch compiler) eventually worked in three passes. The first pass, based on an extended, and severely overstretched, variant of Hamish Dewar’s parser, did all the error checking and output a machine-independent intermediate code. I find it amusing that this code, created in 1975, is still in use today, unchanged 27 years after being written. The second pass processed the intermediate code and generated two streams of output: a sequence of code fragments and a sequence of directives. The second pass always tried to generate the best code possible. Unlike many compilers of that period it did not generate “the obvious sequences” and then try to massage them into more sensible code. The peephole optimisations were all reserved for the call and jump knitting, managed by the third pass. Directives were things like <procedure start>, <procedure end>, <label>, <jump>, <call>, <dump bytes>, and so on. The third pass built a list of these directives, manipulated them, and finally used the resulting list to read the code fragments and combine them into the output object file. The directives were passed over repeatedly, stopping after a pass had failed to make any changes.
One idea behind this scheme was to delay as long as possible making decisions so that the maximum amount of information was available when they ultimately had to be taken. For example, all a program’s labels, jumps and calls were known before any code addresses had been committed, so making use of optimally-sized branch instructions was fairly simple.
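To make the branch-sizing idea concrete: a PDP-11 conditional branch holds a signed 8-bit word offset, so a branch whose target lies out of range must instead become an inverted short branch over a full-width jump. With every label and jump known before any addresses are committed, sizes can be settled by iterating to a fixpoint. The following is a minimal sketch, not the compiler's actual algorithm; the encoding and byte sizes are illustrative.

```python
# Sketch of branch-size relaxation: assume every branch is short, then
# repeatedly grow any branch whose target is out of range.  Growing one
# branch can push others out of range, so iterate until a pass changes
# nothing.  Sizes are illustrative: 2 bytes for a short conditional
# branch, 6 for an inverted branch over an absolute JMP.
SHORT, LONG = 2, 6

def relax_branches(items):
    """items: list of ('code', size_in_bytes) or ('branch', target_index)."""
    sizes = {i: SHORT for i, it in enumerate(items) if it[0] == 'branch'}
    changed = True
    while changed:
        changed = False
        # compute every item's address under the current size guesses
        addr, pos = [], 0
        for i, it in enumerate(items):
            addr.append(pos)
            pos += sizes[i] if it[0] == 'branch' else it[1]
        addr.append(pos)
        for i, it in enumerate(items):
            if it[0] == 'branch' and sizes[i] == SHORT:
                # a PDP-11 branch offset is a signed word count measured
                # from the word after the branch instruction
                disp = (addr[it[1]] - (addr[i] + 2)) // 2
                if not -128 <= disp <= 127:
                    sizes[i] = LONG
                    changed = True
    return sizes
```

A branch over 300 bytes of code is forced long, while one over 10 bytes stays short; note that the loop must re-run after each growth, since lengthening one branch can move another's target out of reach.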
The first example comes from the second pass of the PDP11 compiler itself. I remember looking through a dump of the code the compiler generated when compiling itself; I was always on the lookout for opportunities to optimise code. I spotted a bug and spent quite a while tracking it down, only to discover it wasn’t a fault at all but some amazing code.
    %routine Move(%record(exp)%name Lhs, Rhs)
      %if Rhs_Oper = 0 %then Simple(Lhs, Rhs) %else Complex(Lhs, Rhs)
    %end
This routine was called to assign something (Rhs) to a destination (Lhs). Rhs would either be a simple object (constant, variable…) or an expression; Rhs_Oper was zero for simple objects and non-zero for expressions.
The compiler’s coding conventions were that integer function results were passed back in r0 and the first two integer-like parameters were passed in registers r1 and r0 respectively.
The information sent to the third pass was something like this:
| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Move&gt; | | |
| &lt;label Move&gt; | | |
| &lt;dump 2&gt; | TST 8(r0) | 1 |
| &lt;branch not equal L1&gt; | | |
| &lt;call Simple&gt; | | |
| &lt;branch L2&gt; | | 3 |
| &lt;label L1&gt; | | |
| &lt;call Complex&gt; | | |
| &lt;label L2&gt; | | 3 |
| &lt;return&gt; | | 3 |
| &lt;end, frame=0&gt; | | 2 |
The third pass first replaces each branch whose destination is a return with a return of its own, after which the label L2 is unreferenced and disappears:

| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Move&gt; | | |
| &lt;label Move&gt; | | |
| &lt;dump 2&gt; | TST 8(r0) | |
| &lt;branch not equal L1&gt; | | |
| &lt;call Simple&gt; | | 4 |
| &lt;return&gt; | | |
| &lt;label L1&gt; | | |
| &lt;call Complex&gt; | | 4 |
| &lt;return&gt; | | |
| &lt;end, frame=0&gt; | | |
With a return now immediately following each call, every call-plus-return collapses into a simple branch:

| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Move&gt; | | |
| &lt;label Move&gt; | | |
| &lt;dump 2&gt; | TST 8(r0) | |
| &lt;branch not equal L1&gt; | | 5 |
| &lt;branch Simple&gt; | | |
| &lt;label L1&gt; | | |
| &lt;branch Complex&gt; | | |
| &lt;end, frame=0&gt; | | |
Finally, the conditional branch over an unconditional branch is inverted, and the now-unreferenced label L1 disappears:

| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Move&gt; | | |
| &lt;label Move&gt; | | |
| &lt;dump 2&gt; | TST 8(r0) | |
| &lt;branch equal Simple&gt; | | |
| &lt;branch Complex&gt; | | 6 |
| &lt;end, frame=0&gt; | | |
The final code was:

    TST 8(r0)
    BEQ Simple
It looked as if the compiler had forgotten to put in the jump to Complex, especially as the assembly language listing kept the source statements in their original order. You had to look carefully at the instruction addresses to see that the routine had been moved.
I find this fascinating as it is an example where the now-common technique of expanding small procedures in-line would have led to considerably worse code than was actually generated.
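The sequence of rewrites shown in the tables can be sketched as a handful of peephole rules over the directive list, re-applied until a complete pass makes no change. This is a hypothetical reconstruction; the encoding and rule set here are mine, not the compiler's (a `bif` tuple stands for a conditional branch, `entry` for a routine's entry label).

```python
# Hypothetical sketch of the third pass's call and jump knitting.  Each
# rule rewrites a short window of the directive list; the whole list is
# re-scanned until a full pass changes nothing.
INV = {'eq': 'ne', 'ne': 'eq'}   # condition inversion for 'bif'

def optimise(ds):
    changed = True
    while changed:
        changed = False
        # the directive that immediately follows each label, and the set
        # of labels still referenced anywhere in the list
        follows = {d[1]: ds[i + 1] for i, d in enumerate(ds)
                   if d[0] == 'label' and i + 1 < len(ds)}
        refs = {d[-1] for d in ds if d[0] in ('branch', 'bif', 'call')}
        out, i = [], 0
        while i < len(ds):
            a = ds[i]
            b = ds[i + 1] if i + 1 < len(ds) else None
            c = ds[i + 2] if i + 2 < len(ds) else None
            if a[0] == 'branch' and follows.get(a[1]) == ('return',):
                out.append(('return',)); i += 1        # branch-to-return
            elif a[0] == 'call' and b == ('return',):
                out.append(('branch', a[1])); i += 2   # call+return -> branch
            elif a[0] == 'branch' and b == ('label', a[1]):
                i += 1                                 # branch to next line
            elif (a[0] == 'bif' and b is not None and b[0] == 'branch'
                  and c == ('label', a[2])):
                out.append(('bif', INV[a[1]], b[1]))   # invert skip-branch
                i += 2
            elif a[0] == 'label' and a[1] not in refs:
                i += 1                                 # unreferenced label
            else:
                out.append(a); i += 1
                continue
            changed = True
        ds = out
    return ds
```

Fed the directive stream for Move, these rules converge on exactly the final table: the test, a branch-if-equal to Simple, and a branch to Complex.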
On to Ackermann. The IMP version of this function is as follows:
    %integerfn Ackermann(%integer m, n)
      %result = n+1 %if m = 0
      %result = Ackermann(m-1, 1) %if n = 0
      %result = Ackermann(m-1, Ackermann(m, n-1))
    %end
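For readers who don't know IMP, `%result = … %if cond` returns the value when the condition holds; a direct transliteration into Python is:

```python
def ackermann(m, n):
    # Direct transliteration of the IMP version above.
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))
```

The function is tiny but explosively recursive, which is why it was a popular benchmark for procedure-call overhead.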
The key to generating good code for this function is to remove any entry and exit sequences. The problem comes with the parameter m, which needs to be preserved across the nested call of Ackermann (third %result statement). It turns out that there is a fairly simple technique to handle this. The second pass lets the third pass sort out whether the parameter needs to be stored and where. The choice is simple: if the parameter is reloaded from memory more than once, it is stored during the function prologue; if it is reloaded no more than once, it is a candidate for being pushed onto the stack and popped back when needed, so the function needs no frame at all.
In symbolic form, the two output streams generated by the second pass for this function were:
| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Ackermann&gt; | | |
| &lt;label Ackermann&gt; | | |
| &lt;dump 2&gt; | TST r1 | |
| &lt;branch not equal L1&gt; | | |
| &lt;dump 2&gt; | INC r0 | |
| &lt;return&gt; | | |
| &lt;label L1&gt; | | |
| &lt;dump 2&gt; | TST r0 | |
| &lt;branch not equal L2&gt; | | |
| &lt;call Ackermann, r1-1, r0+1&gt; | | 5, 6 |
| &lt;return&gt; | | |
| &lt;label L2&gt; | | |
| &lt;store P2&gt; | | 3 |
| &lt;call Ackermann, r0-1&gt; | | 6 |
| &lt;load P2: r1&gt; | | 2 |
| &lt;call Ackermann, r1-1&gt; | | 6 |
| &lt;return&gt; | | |
| &lt;end, frame=0, store P1: use=1&gt; | | 1, 4 |
The multiple-entry-point trick converts a call preceded by an adjustment of a parameter register, such as:

    Ackermann:
        …
        DEC r1
        JSR PC, Ackermann
into the following sequence, which is the same size and takes the same time to execute:
    X1: DEC r1
    Ackermann:
        …
        JSR PC, X1
Remember that the third pass is responsible for all code addressing and the function entry and exit sequences; it is a simple matter for it to infiltrate extra code in front of the main entry point.
The same algorithm then adds another entry point to deal with incrementing r0. Nothing can be done with the next call to Ackermann, but the third call can use the X1 entry point.
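The same rewrite can be sketched as one more rule over the directive stream: a call preceded by a single adjusting instruction is redirected to a synthesized entry point, and because each new stub is placed in front of the earlier ones, an entry can fall through the whole chain into the routine proper. The sketch below is hypothetical; it assumes the adjustment adjacent to the call is absorbed first, and only stacks a new entry on the current front of the routine so that fall-through stays correct.

```python
# Hypothetical sketch of the multiple-entry-point trick.  body is a
# flat list of ('insn', text) and ('call', target) directives; the
# returned stubs are placed immediately before the routine's entry.
def add_entry_points(routine, body):
    head = routine        # label currently at the very front of the routine
    entries = {}          # (adjusting insn, call target) -> entry label
    stubs = []
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(body):
            d = body[i]
            nxt = body[i + 1] if i + 1 < len(body) else None
            key = ((d[1], nxt[1])
                   if d[0] == 'insn' and nxt is not None and nxt[0] == 'call'
                   else None)
            if key is not None and key in entries:
                # an identical adjust+call pair reuses the earlier entry
                out.append(('call', entries[key])); i += 2; changed = True
            elif key is not None and key[1] == head:
                # synthesize a new entry point falling through the rest
                label = 'X%d' % (len(entries) + 1)
                entries[key] = label
                stubs = [('label', label), ('insn', d[1])] + stubs
                head = label
                out.append(('call', label)); i += 2; changed = True
            else:
                out.append(d); i += 1
        body = out
    return stubs, body
```

On a stream shaped like Ackermann's calls, two entry points emerge: the first absorbs `DEC r1`, and a second pass stacks `INC r0` in front of it, exactly the X1/X2 layout shown below.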
At this stage, the situation is as follows:
| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Ackermann&gt; | | |
| &lt;label X2&gt; | | |
| &lt;insert 2: INC r0&gt; | | |
| &lt;label X1&gt; | | |
| &lt;insert 2: DEC r1&gt; | | |
| &lt;label Ackermann&gt; | | |
| &lt;dump 2&gt; | TST r1 | |
| &lt;branch not equal L1&gt; | | |
| &lt;dump 2&gt; | INC r0 | |
| &lt;return&gt; | | |
| &lt;label L1&gt; | | |
| &lt;dump 2&gt; | TST r0 | |
| &lt;branch not equal L2&gt; | | |
| &lt;call X2&gt; | | 7 |
| &lt;return&gt; | | |
| &lt;label L2&gt; | | |
| &lt;store P2&gt; | | |
| &lt;call Ackermann, r0-1&gt; | | |
| &lt;load P2: r1&gt; | | |
| &lt;call X1&gt; | | 7 |
| &lt;return&gt; | | |
| &lt;end, frame=0, store P1: use=1&gt; | | |
This reduces the function to:
| Directives | Fragments | Notes |
| --- | --- | --- |
| &lt;start Ackermann&gt; | | |
| &lt;label X2&gt; | | |
| &lt;insert 2: INC r0&gt; | | |
| &lt;label X1&gt; | | |
| &lt;insert 2: DEC r1&gt; | | |
| &lt;label Ackermann&gt; | | |
| &lt;dump 2&gt; | TST r1 | |
| &lt;branch not equal L1&gt; | | |
| &lt;dump 2&gt; | INC r0 | |
| &lt;return&gt; | | |
| &lt;label L1&gt; | | |
| &lt;dump 2&gt; | TST r0 | |
| &lt;branch not equal L2&gt; | | 8 |
| &lt;branch X2&gt; | | |
| &lt;label L2&gt; | | |
| &lt;store P2&gt; | | |
| &lt;call Ackermann, r0-1&gt; | | |
| &lt;load P2: r1&gt; | | |
| &lt;branch X1&gt; | | |
| &lt;end, frame=0, store P1: use=1&gt; | | |
The final code was:

    X2: INC r0
    X1: DEC r1
    Ackermann:
        TST r1
        BNE L1
        INC r0
        RTS PC
    L1: TST r0
        BEQ X2
        MOV r1, -(SP)
        DEC r0
        JSR PC, Ackermann
        MOV (SP)+, r1
        BR X1