This document is a transcription of
these scans.
NB: The typesetting system and the printer used for the original document
used a monospaced font, but with fractional width spacing between words to provide fully justified paragraphs of text.
Since this spacing is difficult to reproduce and represent using HTML/ASCII,
the justification is approximated using full sized spaces instead.
Introduction
SKIMP is a very simple language designed with the teaching of compilers in
mind. Although simple, it possesses many of the structural features of
full-scale languages, for example the subroutine and parameter structure, making
it a worthwhile non-trivial vehicle for teaching. Many useful features of
full-scale languages are missing but can bp incorporated with very little
difficulty. This is quite intentional. The reasons are firstly that SKIMP is a
sufficiently rich language for the compiler to be written in SKIMP itself with
little inconvenience (the SKIMP MKI compiler was written in SKIMP) and secondly
that incorporation of the missing features provides a very suitable form of
student exercise. Although it would be desirable for students to write a
complete compiler of their own, this is a considerable task which they would
probably not have the time to complete. The task of extending a simple language
such as SKIMP for which a compiler already exists is much more feasible and
still allows students to get to grips with the compiling process and so
understand it more fully.
The name SKIMP arises fron the language's origin. Essentially it is a much
smaller version of the language IMP, an ALGOL-60 type language developed in
Edinburgh for systems implementation and general use.
Since SKIMP is intended for teaching rather than production programming, it
has not been necessary for the compiler to produce object code for a particular
real machine. Instead, the opportunity has been taken to invent a machine whose
order code does not have the inconsistencies and exceptional cases that all
actual machines seen to have. The compiler produces symbolic code for this
invented machine together with a certain amount of optional monitoring of the
compilation.
This document describes the SKIMP language, the object machine and details of
the method of compilation. Examples of compiler output and suggested extensions
to SKIMP suitable for student exercises are given in appendices. To enable
students to test their modified compilers an assembler/interpreter for the
object code is available. The use of this is also described in an appendix.
1
The internal value is formed by packing the (ASCII) values of the characters
right-justified in 8-bit fields of a word. A quote can become one of the
characters of the value by repeating it. For example:
'JIM''S' ''''
Input Lexical Processing
Spaces are deleted on input everywhere except when they appear between
quotes. Before deletion, however, they are significant in terminating the
keyword '%' shift mode. Text between quotes remains intact with the exception
that a repeated quote is regarded as a single quote. Newlines can therefore
appear between quotes. For example, to test for the value of a newline
character:
%IF I = '
' %THEN %RETURN
This is equivalent to:
%IF I = 10 %THEN %RETURN
since the ASCII value of a newline character is 10.
SKIMP statements are separated by either ';' or a newline. Thus several
statements can appear on one line of text. For example:
X = 1 ; Y = 2 ; Z = 3
In order to represent a single statement on more than one line intermediate
newline characters can be inserted when preceded by the continuation keyword %C.
For example:
%IF X = 1 %THEN %C
A = B
This is equivalent to:
%IF X = 1 %THEN A = B
Program Structure
A SKIMP program starts with the statement:
%BEGIN
and ends with the statement:
%ENDOFPROGRAM
Inside the program, routines and functions may appear in sequence or nested
as required. A routine takes the form:
3
%ROUTINE RT
.....
%END
and a function takes the form:
%INTEGERFN FN
.....
%END
The only difference between routines and functions is that functions produce a
value as their result and routines do not. The structure of a program could
take the following form, for example:
%BEGIN
%ROUTINE A
.....
%END
%INTEGERFN B
%ROUTINE C
.....
%END
.....
%END
.....
%ENDOFPROGRAM
Details of routines and functions and their parameters are given later.
Declarations
Names of variables and arrays denoting storage objects are declared at the
start of a routine or function. Execution of declaration statements has the
affect of setting aside storage for those variables or arrays. When the routine
or function in which the declaration occurred is returned from, that storage is
deleted so that it may be re-used for future declarations. When routines and
functions are entered recursively new storage is set aside for the variables and
arrays declared there without the storage for the old incarnation of the routine
or functions being lost. This becomes accessible again when the recursive
activation is left. A stack is used to accomplish this. The only objects SKIMP
manipulates are integer valued.
All names must be declared before they can be used. Scalar declarations take
the form:
%INTEGER <NAME>, <NAME>, <NAME>, . . .
Arrays are declared:
%INTEGERARRAY <NAME>, <NAME>, . . .( <EXPR> : <EXPR> )
where the phrase <EXPR> is used to denote the presence of an arithmetical
expression. For example:
%INTEGER I, J, K
sets aside three locations and names them 'I', 'J' and 'K'. The expressions in
4
the array declaration are the lower and upper bounds on the index of the arrays.
They may be either constants, for example:
%INTEGERARRAY A ( I : 10 )
or expressions which are evaluated at run-time, for example:
%INTEGERARRAY B, C ( -1 : N+1 )
Only one-dimensional arrays are defined in SKIMP (but see possible extensions in
Appendix 2).
Note that the first statement of a routine or function has the effect of
declaring its name. Hence, this must appear before any calls on the names.
This allows self-recursive calls but not cross-recursive calls between two
routines at the same nesting level. (SKIMP does not have routine and function
SPECs as in IMP).
Scope of Variables
When a variable or array is declared, that name can be used anywhere in the
routine or function in which the declaration appears, but not outside it. If
the routine or function has nested routines or functions the name can also be
used in these (as 'global' to them). For example:
%ROUTINE A
%INTEGER I
%ROUTINE B
%INTEGER J
....
%END
%ROUTINE C
%INTEGER K
%ROUTINE D
....
%END
....
%END
....
%END
The name 'I' can be used anywhere in routines 'A', 'B', 'C' and 'D'. The name
'J' can be used only in routine 'B' and the name 'K' can be used in routines 'C'
and 'D'.
Although, clearly, all the names of variables declared in a single routine or
function must be distinct, the same names may be declared in different routines
to refer to different storage objects. When a name is redeclared in a nested
routine or function, the storage for the global object of the same name becomes
inaccessible whilst within that routine or function. The name will always refer
to the most 'local' declaration.
Arithmetic Expressions
An arithmetic expression consists of a seauence of operands separated by
operators, possibly preceded by a unary operator (*, -, \) . Operands can be
5
array elements, constants, calls on functions or bracketted sub-expressions.
The operators are:
+ : addition
- : subtraction
* : MultIplication
/ : division (rounded)
** : exponentiation
<< : logical shift left
>> : logical shift right
& : logical AND
| : logical OR
|| : logical Exclusive OR
\ : logical NOT
Their precedences are:
\ : highest
** << >>
* / &
+ - | || : lowest
The precedence is left to right for operators of equal precedence. Thus:
1 / J * K
is equivalent to:
( 1 / J ) * K
Array element operands take the form:
<NAME> ( <EXPR> )
For example:
A ( 1 )
B ( 2 * A ( J ) )
Assignment
Variables and array elements are assigned new values using statements of the
form :
<NAME> = <EXPR>
<NAME> ( <EXPR> ) = <EXPR>
For example:
I = 2
A ( I * J ) = ( K + L ) / 2
6
Routines and Functions
A routine is called by executing a statement consisting of the name of the
routine. For example:
%ROUTINE RT
....
%END
would be called by the statement:
RT
Similarly, the call of a function appears as an operand of an expression. For
example:
%INTEGERFN FN
....
%END
might be called by:
I = FN
Routines and functions can be called recursively to any depth. The exit from a
routine is specified by the statement:
%RETURN
This is also implied by executing the %END statement of a routine, The exit
from a function must give in addition a result for the function:
%RESULT = <EXPR>
For example:
%RESULT = I * J - K
Execution of the %END statement of a function is invalid. A routine or function
can only be entered by an appropriate call as described above. If a routine or
function is encountered in the normal flow of control of a program at run-time
it is skipped around.
Routine and Function Parameters
Parameters of the following types can be passed to routines and functions:
%INTEGER
%INTEGERNAME
%INTEGERARRAYNAME
The first is a 'value' type parameter and the other two are 'reference' types.
For the %INTEGER type, the value of an expression in the routine or function
call is calculated and passed to the routine or function as the initial value of
the parameter. The initialised parameter can be regarded thereafter as a local
7
variable. For the %INTEGERNAME type, a reference to some storage object outside
the routine or function is passed and this object will be referred to whenever
the parameter is referenced within the routine or function. Hence the only
admissible forms of parameter in the call are either the name of a variable or
the name of an array element. Similarly for the %INTEGERARRAYNAME type, a
reference to an array is passed and the only admissible form of 'actual'
parameter is the name of an array. For example:
%INTEGER A,B,C
%INTEGERARRAY M,N (1:10)
%ROUTINE JIM(%INTEGER I,%INTEGERNAME J, %INTEGERARRAYNAME K)
....
%END
JIM (A+B,C,M)
JIM (A,M(B),N)
Jumps
SKIMP labels are numerical, For example:
123: 987:
To jump to these labels the instructions would be:
->123 ->987
Labels are local to a routine or function and jumps can only take place within
the routine or function and not either to an outer textual level or to a nested
inner level. Labels are not declared and jumps may precede or follow the
occurrence of the corresponding label without restriction.
Conditional Statements
The general form of the conditional statement is:
%IF <COND> %THEN <INSTN> %ELSE <INSTR>
where <COND> represents the condition to be tested and <INSTR> represents a
restricted class of instructions described below. If the condition is true the
first of these instructions is executed, otherwise the second. The %ELSE clause
can be omitted in which case if the condition is false execution proceeds to the
following statement.
A <COND> is formed by one or more simple relations of the form:
<EXPR> <COMP> <EXPR>
where <COMP> represents one of the following comparators:
= * <= < >= >
For example:
I < J + K
These simple relations can be combined using the keywords %AND and %OR. For
8
example:
I = J %AND K = 0
No precedence is defined between %AND and %OR. Instead, brackets must be used.
A single level of bracketting may only contain a sequence of simple comparisons
connected with either %AND or %OR but not both. For example:
I = 0 %AND ( J = 0 %OR K = 0 %OR L = 0 )
The relations are evaluated left to right but only up to the point where the
truth or falsity of the whole condition can be determined.
An <INSTR> may be any of the following types of statement:
assignment
routine call
jump
%RETURN
%RESULT=<EXPR>
%STOP
For example:
%IF I = J %THEN %RETURN
%IF K > 0 %THEN K = 0 %ELSE K = -1
In addition, groups of statements can take the place of an <INSTR> and be made
conditional. The grouping is made by preceding the first statement by the
statement:
%START
and following the last one by the statement:
%FINISH
For example:
%IF J = K %THEN L = 0 %ELSE %START
%IF J < K %THEN %START
L = 1
M = 2
%FINISH %ELSE %START
L = 2
M = 1
%FINISH
%FINISH
%STARTs and %FINISHes may be nested to any depth but must balance within any
routine or function.
Comments
Comment statements start with the exclamation character '|'. Thereafter, all
text is ignored up to the next statement separator i.e. semi-colon or newline.
For example:
9
! this is a comment
I = 0 ;! so is this ; J = 0
Input-Output
The standard IMP set of input-output routine names are built into SKIMP.
These are:
%ROUTINE READ SYMBOL(%INTEGERNAME X)
Read a single symbol from the current input stream into X.
%INTEGERFN NEXT SYMBOL
Return as the result the next single symbol which will be read.
%ROUTINE SKIP SYMBOL
Skip over the next single symbol in the input.
%ROUTINE PRINT SYMBOL(%INTEGER X)
Output the single symbol X.
%ROUTINE SPACE.
Output a space symbol.
%ROUTINE SPACES(%INTEGER X)
Output X spaces.
%ROUTINE NEWLINE
Output a newline symbol.
%ROUTINE NEWLINES(%INTEGER X)
Output X newlines.
%ROUTINE NEWPAGE
Output a newpage symbol.
%ROUTINE READ(%INTEGERNAME X)
Read a decimal number into X skipping preceding spaces and newlines.
%ROUTINE WRITE(%INTEGER X, Y)
Output the decimal number X of Y digits.
10
Chapter 2
The Object Machine.
The SKIMP compiler produces object code output for an invented machine. The
output takes the form of what would be this machine's assembly code rather than
a binary form so that it can easily be inspected by students. Since the machine
is only an invented one and does not exist, the object code cannot be run in the
ordinary way, but an assembler/interpreter for this code is available which
allows the output to be tested.
No attempt has been made to define the machine completely. Only those
features necessary and relevant for the compiler are defined. Interrupt
structure and input-output instructions, for example, are not defined.
The machine has a number of registers which can either be used as
accumulators or as index registers. In this respect, it is similar to machines
such as IBM 370, DEC 10 etc.. It is not a true multi-accumulator machine,
however, in that no register to register arithmetic operations are defined. The
instruction format is shown below:
operation , register , base , displacement
For example:
LOAD, ACC, DR1, 2
Instructions operate between the register and an effective storage address
formed from the sum of the contents of the base register and the displacement.
The machine is word-addressed and assumed to have sufficient storage and
registers for the purposes of the compiler. Although not strictly necessary, it
is also useful to define a word size and the instruction word layout. A 32-bit
word is envisaged with the layout:
+----------------+------------+--------+------------------+
| operation | register | base | displacement |
+----------------+------------+--------+------------------+
(8) (4) (4) (16)
This provides for 16 registers and a displacement of up to 65535. All sixteen
registers, 0-15, can be used in the register field of the instruction, but only
registers 1-15 in the base field allowing a zero in the base field to indicate
no indexing in the effective address.
The compiler object code output does not in practice use register numbers.
Instead, it uses register mnemonics for convenience of inspection. The
mnemonics used by the compiler are:
ACC, STP, COT, WK, DR0, DR1, DR2, DR3,.....
Their individual use is explained in later chapters.
11
The operations defined for this machine are:
LOAD : Load the register with the contents of the word at the effective
storage address.
LDA : Load the register with the effective storage address.
ADD : Add the contents of the word at the effective storage address to the
register.
SUB : Subtract the contents of the word at the effective storage address
from the register.
MLT : Multiply the register by the contents of the word at the effective
storage address.
DIV : Divide the register by the contents of the word at the effective
storage address.
EXP : Raise the register to the power given by the contents of the word at
the effective storage address.
STR : Store the contents of the register in the location at the effective
storage address.
NEG : Negate the contents of the register (the effective address is
ignored).
NOT : Perform the logical NOT on the register (the effective address is
ignored).
SHL : Shift the register logically left by the amount given by the contents
of the word at the effective storage address (zeros shifted in from
the right, bits shifted out to the left lost).
SHR : Shift the register logically right by the amount given by the
contents of the word at the effective storage address (zeros shifted
in from the right, bits shifted out to the right lost).
AND : Perform the logical AND between the register and the contents of the
word at the effective storage address.
OR : Perform the logical OR between the register and the contents of the
word at the effective storage address.
XOR : Perform the logical Exclusive OR between the register and the
contents of the word at the effective storage address.
BAL : Unconditional branch to the effective address and the address of the
next instruction stored in the specified register.
B : Unconditional branch to the effective address (any register present
is ignored).
BZ : Branch to the effective address if the contents of the register are
zero.
BNZ : Branch to the effective address if the contents of the register are
not zero.
12
BG : Branch to the effective address if the contents of the register are
greater than zero.
BNG : Branch to the effective address if the contents of the register are
not greater than i.e. less than or equal to, zero.
BL : Branch to the effective address if the contents of the register are
less than zero
BNL : Branch to the effective address if the contents of the register are
not less than i.e. grater than or equal to, zero
STOP : Stop execution.
In addition to these instructions, the compiler also outputs an assembler
directive 'FILL' in the same instruction format. This is used to direct the
assembler to fill in previously assembled holes in the object code with
appropriate values. This is intended primarily for dealing with forward jumps
and is described in detail in later chapters.
13
For example:
<DIGIT> = '0','1','2','3'.'4','5','6','7','8','9';
This would define a phrase <DIGIT> as any of the digits 0 to 9. Characters such
as these digits are enclosed within single quotes as shown. Groups of
characters can also appear between quotes if required. Commas separate the
alternative forms the phrase may take and a semi-colon terminates the
definition.
Keywords are enclosed within double quote characters. For example:
<PROC> = "ROUTINE" , "INTEGERFN" ;
Note that the '%' character can be omitted. When it is not clear what
constitutes a single keyword, for instance %INTEGERARRAYNAME, it may be divided
up as convenient. For example, the forms:
"INTEGERARRAYNAME"
"INTEGER" "ARRAY" "NAME"
are both valid and equivalent.
A decimal number might be defined by the phrase <N> as follows:
<N> = <DIGIT> <N> , <DIGIT> ;
That is, a number is either a digit followed by a number or is a single digit.
This illustrates how phrases can be defined recursively.
It is also possible to have a null alternative i.e. one with no items. For
example, if it was desired to define a decimal number optionally preceded by a
plus or minus sign, a phrase <SIGN> could be defined:
<SIGN> = '+' , '-' , ;
Then the required definition could be:
<SN> = <SIGN> <N> ;
In other words, the third alternative form of <SIGN>, namely nothing, can
precede the <N>. A phrase definition with only one alternative such as this is
rather redundant but nevertheless quite valid. The particular syntax analysis
scheme used in the SKIMP compiler imposes certain restrictions on the forms
phrase definitions may take but discussion of these is postponed until the
scheme itself has been described.
The syntax definitions for SKIMP are shown in figure 3.2. The types of
statement in SKIMP are defined by the phrase <STATEMENT> and the remaining
definitions follow from this. There are two phrases built into the compiler and
which therefore do not need to be defined. These are <NAME> and <CONST>. They
represent identifiers and constants respectively. The reason for them being
built-in is that identifiers and constants are recognised at the lexical
analysis stage and hence do not require definitions for use by the syntax
analyser.
15
<STATEMENT>= <INSTR>,
"IF"<COND>"THEN"<INSTR><ELSE>,
<CONST>':'<STATEMENT>,
"FINISH"<ELSE>,
"INTEGER"<ARRAY>,
<PROC> <NAME><FORMAL>,
"END"<OFPROG>,
"BEGIN";
<INSTR> = <NAME><ACTUAL><ASSIGN>,
'->'<CONST>,
"START",
"RETURN",
"RESULT"'='<EXPR>,
"STOP";
<EXPR> = <UNARY><OPERAND><EXPRREST>;
<UNARY> = '-','\','+', ;
<OPERAND> = <NAME><ACTUAL>,<CONST>,'('<EXPR>')';
<EXPRREST> = <OP><OPERAND><EXPRREST>, ;
<OP> = '<<','>>','&','||','|','**','/','*','+','-';
<COND> = <TEST><CONDREST>;
<TEST> = <EXPR><COMP><EXPR>,'('<COND>')';
<COMP> = '=','#','<=','<','>=','>';
<CONDREST> = "AND"<TEST><ANDCOND>,"OR"<TEST><ORCOND>, ;
<ANDCOND> = "AND"<TEST><ANDCOND>, ;
<ORCOND> = "OR"<TEST><ORCOND>, ;
<ELSE> = "ELSE"<INSTR>, ;
<ACTUAL> = '('<EXPR><EXPRS>')', ;
<EXPRS> = ','<EXPR><EXPRS>, ;
<ASSIGN> = '='<EXPR>, ;
<ARRAY> = "ARRAY"<NAME><NAMES>'('<EXPR>':'<EXPR>')',<NAME><NAMES>;
<NAMES> = ','<NAME><NAMES>, ;
<PROC> = "ROUTINE","INTEGERFN";
<FORMAL> = '('"INTEGER"<FORM><NAME><NAMES><FORMALS>')', ;
<FORM> = "ARRAYNAME","NAME", ;
<FORMALS> = ','"INTEGER"<FORM><NAME><NAMES><FORMALS>, ;
<OFPROG> = "OFPROGRAM", ;
figure 3.2
Syntax Reduction
When the syntax definitions are read in, they are reduced to a numerical form
to suit the syntax analyser. The keywords are also picked out and built up into
a dictionary for the benefit of the lexical analyser. This dictionary is a tree
structure of cells which each represent an individual keyletter. When two or
more keywords start with the same keyletters, the same cells are used for the
letters in common. For example, the keywords %START and %STOP would be
represented by the structure shown in figure 3.3.
16
the compiler will halt without attempting to compile the source.
Implementation Details
The routines which implement everything described in this chapter are
contained in the file 'SKIMPA'. The routine which reads in the syntax
definitions is named 'READ PS' and it takes the definitions from a file named
'SKIMPPS'. The resultant reduced form is stored in the array named 'PS'. The
positions in this where phrase definitions start are given by the array 'PP'
which is indexed by phrase identification number. The information given in
figures 3.2, 3.5 and 3.6 is output to a file named 'SKIMPPSL'.
The keyword dictionary is formed from cells which are records of the record
array named 'KD'. Their format is:
%RECORDFORMAT KDF(%INTEGER L,N,A,B)
'L' represents the key letter, 'N' the identification number at a leaf cell and
'A' and 'B' the two links used in the tree structuring. The creation of the
dictionary is slightly involved since keywords in the dictionary have to be the
'lowest common denominator' of the keywords in the syntax description. This
means that keywords may have to be split up if a new keyword is encountered
which is a prefix of an existing one. A local string array 'KA' of keywords is
used to aid this process.
The main control loop of the compiler which is illustrated in figure 3.1 is
implemented in the main routine named 'SKIMP'. The box 'generate code'
corresponds to routine 'STATEMENT' in file 'SKIMPB'. Subroutines used by this
routine appear in the other files 'SKIMPC', SKIMPD' and 'SKIMPE'. The general
strategy in processing statements is to break them down into component syntatic
parts and process each of these simpler parts individually. At the highest
level, each alternative form in the definition of phrase <STATEMENT> is
processsed separately by executing a switch on the alternative number in the
analysis record of the statement (see next chapter). This is also done at the
lower hierarchical levels.
19
is already in the dictionary then its existing identification number is used.
If it is not already in, it is inserted and a new identification number
generated for it. The output to the lexical array consists of the number 256
(the phrase number of <NAME>) followed by the identification number of the name.
The identification numbers are the positions at which the names are stored in
a 'hash table'. The hash table operates as follows. A calculation is performed
on the characters of the name itself to produce a 'hash index'. If this
position is not already occupied, the name is inserted at that point. If the
position is already occupied the name stored there may or may not be the same as
the new name being looked up. If it is the same then no further action is
required. If it is not, a 'clash' has occurred which must be resolved.
Numerous methods of doing this have been invented but the simplest possible one
is used here. This is to scan forward cyclically inspecting each successive
position in turn. If the new name is encountered in any of these positions,
again no further action is required. If, however, an unoccupied position is
encountered then the new name cannot be in already and so can be inserted there.
The requirements of the hash calculation are that it should be fast and that
as far as possible different names should produce different indexes. This is,
of course, unattainable in general since there are many more potential names
than indexes but a fairly good attempt can be made. The scheme used in this
compiler packs character values into a word, divides this word by a prime number
just less than the size of the hash table and takes the remainder as the hash
index.
Constants start with either a digit or a single quote character. The lexical
action in this case is to evaluate the constant and to insert the number 257
(for phrase <CONST> followed by the value into the lexical array. For
character constants the value consists of the individual character values packed
into 8-bit fields of a word right justified.
Characters which are not part of keywords, names or constants are passed
through to the lexical array unchanged. Since characters have values less than
128, the type of a lexical item can be determined from the range it lies in.
Syntax Analysis
The compiler uses a top-down recursive descent method of syntax analysis
which is directed by the table of reduced syntax definitions described earlier.
This analyses the output of the lexical phase with respect to the phrase
<STATEMENT>. The algorithm is implemented as a routine which considers just one
phrase definition. Where literal characters or keywords appear as items in the
definition they are compared directly with the lexical information and where a
sub-phrase appears as an item, the routine is called recursively to match it.
The technique involved in matching a phrase is essentially a search in that
each alternative in the phrase definition is considered in turn. Similarly, the
items in an alternative are considered one after another. An alternative
matches if all the items in it have been matched. If any item in the definition
does not match, the whole alternative in which that item appears is abandoned
and the algorithm moves on to try the next alternative. A flow diagram of the
algorithm is given in figure 4.1. Figure 4.2 shows the stages in the analysis
of the statement:
%INTEGER I,J
21
operand of an expression) phrase <OPERAND> will have been recognised twice.
This could have been avoided by writing the definition of <EXPR> in the
following way:
<EXPR> = <OPERAND><REST> ;
<REST> = <OP><OPERAND><REST> , ;
Definitions with only one alternative are, strictly, unnecessary and result
in the inefficiency of an extra recursive call, but are occasionally used for
the sake of clarity. <EXPR> and <COND> are examples in the SKIMP syntax.
Efficiency is also partly the reason for recognising names and constants at
the lexical stage. If their recognition were left to this algorithm, two
phrases such as:
<LETTER> = 'A' , 'B' , 'C', . . . . 'Z' ;
<DIGIT> = '0' , '1' , '2' , . . . . '9' ;
would probably have to be introduced. Since the algorithm searches through the
alternatives, the recognition of letters and digits would be very slow.
Also in view of the searching, it will clearly be beneficial to put the most
commonly occurring forms of alternative first in the definition of a phrase. In
particular, <STATEMENT> and <INSTR> are ordered in what is hopefully a sensible
way with this consideration in mind.
Analysis Records
During the course of syntax analysis, a record is kept of which phrases have
been recognised. This information is passed to the code generation phase of the
compiler and directs its course of action. The record consists of a
linearisation of the analysis tree corresponding to the statement recognised.
For instance, the analysis tree corresponding to the example above would take
the form shown in figure 4.3.
<STATEMENT> / 5
| |
| +-------+
| |
"INTEGER" <ARRAY> /2
| |
| +----------+
| |
<NAME> <NAMES> / 1
| | | |
| +----+ | +------+
| | | |
'I' ',' <NAME> <NAMES> /2
| |
| |
'J' null
figure 4.3
The number after the each phrase name in the tree is the number of the
24
alternative in its definition which was matched. These numbers and the pointers
emanating from each phrase comprise the information stored in the linearised
form of analysis record. For each phrase, the record takes the form shown in
figure 4.4:
+--------------------+-----------+-----------+-----------
| alternative number | pointer 1 | pointer 2 | . . . .
+--------------------+-----------+-----------+-----------
figure 4.4
The information is pruned to a certain extent, however. There are only pointers
for the phrase items matched. All reference to keywords, literals and null
alternatives is omitted since their presence can be inferred from the
alternative numbers. Phrases <NAME> and <CONST> are treated specially since
they were recognised at the lexical stage. To be consistent with other phrases,
both are given an alternative number matched of 1. In addition, for <NAME>, the
following word is set to contain the identification number of the name and for
<CONST>, the following word is set to the value of the constant. The analysis
record for the axample is therefore as shown in figure 4.5:
+--------------> +--------------->
+--> +---|--> | +----|---> |
| | | | | | | | | |
+---+---+---+---+---+---+-------+---+----+----+---+-------+---+
| 5 | 3 | 2 | 6 | 8 | 1 | ID(I) | 1 | 11 | 13 | 1 | ID(J) | 2 |
+---+---+---+---+---+---+-------+---+----+----+---+-------+---+
figure 4.5
The phrase names themselves also do not appear in the record as can be seen
above. The pointers are just indexes into the array in which the record is
stored. To aid clarity, when the analysis record is printed out by the compiler
(the ANAL option) the positions at which phrases start are indicated by inserts
such as:
(1/STATEMENT) (3/ARRAY) (8/NAMES)
Implementation Details
Line reconstruction and lexical analysis are implemented together in the one
routine which is named 'READ STATEMENT' and which appears in file 'SKIMPA'.
Subsidiary to this are routines which deal with each kind of lexical object.
These are named 'KEYWORD', 'NAME' and 'CONST'. Reconstructed text is placed in
an array named 'TT' and this is lexically reduced into an array named 'T'. The
name dictionary which is created during lexical processing comprises an array
named 'NAMEDLINK' and a byte array named 'NAMED'. The former is used as the
hash table and the entries in it point to positions in the latter array where
the characters of the names are stored as concatenated strings. For example,
figure 4.6:
25
+----+-+----+-+----+-+------+-+--------+
| | | | | | | | | |
+----+++----+++----+++------+++--------+
| | | |
+---------+ +----|------+ +--+
| | +----+ |
| | | |
+-+-----------+---------+-------+-----------------
| 5 H A R R Y 4 D I C K 3 T O M 4 F R E D . . . .
+-------------------------------------------------
figure 4.6
The syntax analysis routine is named 'COMPARE', also in file 'SKIMPA', and is
a function which returns the value 1 for success and 0 for failure. It takes as
its single parameter the identification number of the phrase to be matched in
this call. Hence, it is initially called with the number of phrase <STATEMENT>.
Since <STATEMENT> is assumed always to be the first phrase definition to be read
in, this number is 258. Phrase numbers 256 and 257, <NAME> and <CONST>
respectively, are picked out on entry to the routine and treated as literals
since they have already been recognised. The analysis record is placed in an
array named 'A'.
Processing Analysis Records
The basis of the processing rests on knowing the starting position in the
analysis record of each phrase which is of concern. The values in these
positions indicate the alternative which was actually recognised by the
comparison routine and it is this information which directs the code generation.
The form of analysis record facilitates finding out where the phrases start.
These are simply the values of the pointers which follow each alernative number.
Consider, for example, the phrase structure associated with the declaration of
variables:
<STATEMENT> = . . . . . ."INTEGER"<ARRAY>, . . . .
<ARRAY> = "ARRAY"<NAME><NAMES>'('<EXPR>':'<EXPR>')',<NAME><NAMES>;
The analysis record for:
%INTEGER I, J, K
would be:
5 3 2 6 8 1 ID(I) 1 11 13 1 1D(J) 1 16 18 1 ID(K) 2
The compiler fairly consistently uses names for the positions of where phrases
start in the analysis record which consist of the name of the phrase with 'P'
(for 'pointer') appended. In the case of phrase <STATEMENT> the variable is
'STATEMENTP' and it is the value in the analysis record at that point which
causes the main switch in routine 'STATEMENT'. For the positions of phrases
<ARRAY>, <NAME> and <NAMES>, the variables are 'ARRAYP', 'NAMEP' and 'NAMESP'
respectively. The following IMP code shows how the values for these are set up
and the declaration of scalars is handled:
26
ARRAYP=A(STATEMENTP+1)
NAMEP=A(ARRAYP+1)
NAMESP=A(ARRAYP+2)
%IF A(ARRAYP)=2 %START
%CYCLE ;| for each name declared
PUSHTAG(A<NAMEP*1), etc. )
NEXTRAD(LRVEL)= etc.
%IF A(NAMESP)=2 %THEN %EXIT
NAMEP=A(NAMESP+1)
NAMESP=A(NAMESP+2)
%REPEAT
%FINISH %ELSE %START
| array declarations
. . . .
The routine call on 'PUSHTAG' and the following line deal with the declaration
of each name. This is described in more detail in chapter 6. Each time round
the loop the variables 'NAMEP' and 'NAMESP' are updated for the next name and
the exit from the loop takes place when the null alternative of phrase <NAMES>,
alternative 2, is encountered.
27
-----+------------------+-------------+-------------+-----
| storage for A | for B | for C |
-----+------------------+-------------+-------------+-----
For calls in a different order, say 'A' called 'C', 'C' called 'D' and 'D'
called 'B', the picture would be:
-----+-----------+-----------+-----------+-----------+-----
| A | C | D | B |
-----+-----------+-----------+-----------+-----------+-----
Similarly recursive calls, say 'B' called 'A', 'A' called 'A', 'A' called 'C'
and 'C' called 'B':
--+-----------+------------+-----------+-----------+------------+--
| B (1) | A (1) | A (2) | C | B (2) |
--+-----------+------------+-----------+-----------+------------+--
Note that completely new storage areas are allocated for recursive calls. A
result of this 'allocate when called' scheme is that the storage for routine is
liable to start anywhere on the stack and not in a fixed place. The compiler
cannot therefore assign addresses to the variables as it comes across them in
declarations at compile-time. However, it can assign them addresses relative to
the start of the storage area for that routine and this it does. To access a
variable at run-time therefore requires both this 'relative address' and the
address of the start of the appropriate local name space. These start addresses
can be kept in registers thus allowing the 'base register and displacement' form
in a single object instruction to be used to access variables.
A register need not be allocated to each routine in the program, however.
Only those which are currently active i.e. called, and which contain variables
in scope at the current point of execution of the program, need have
corresponding registers. Since only one routine can be in scope at each textual
level i.e. routine nesting depth, only one register per textual level is needed.
This set of registers is known as a 'Display'. For machines with no or only a
few index registers, a display can be kept in store rather than registers but
then extra overheads are involved whenever variables are accessed. A typical
situation is that in which there are sufficient registers for one to be
permanently allocated to the most recent local name space and perhaps another to
be used for all global accesses. The loss of efficiency here is usually fairly
acceptable. No such difficulties arise with the SKIMP machine.
Clearly, the display will need to be updated on routine entry and exit.
Consider the following section of program with two routines at the same textual
level:
%ROUTINE A
. . .
%END
%ROUTINE B
. . .
%END
When 'A' is called, the stack situation will be:
29
--------+-------------------+-------------------------
| A |
--------+-------------------+-------------------------
|
DR-->
where 'DR' is the register allocated to point at the start of 'A's local name
space. When 'B' is called from 'A', the variables in 'A' are no longer in scope
i.e. are inaccessible, so 'DR' can be re-used to point to 'B's local name space:
---+------------------+-------------------+-------------
| A | B |
---+------------------+-------------------+-------------
|
DR-->
When 'B' returns, 'A's variables must again become accessible. Hence, the
previous value of 'DR' must be preserved. This will be true for all the
registers of the display. Their values must be preserved when they are updated
so that they can eventually be restored. The stack can also be used for this
purpose. The object code the SKIMP compiler produces puts preserved display
values in the first location of each local name space followed by the return
address to the calling routine. Storage for variables starts after that.
This discussion leads to the following scheme for updating the display. The
display is represented by the registers 'DR1', 'DR2', 'DR3', etc. and the
register which is used to point to the start of unallocated storage on the stack
is named 'STP', standing for STack Pointer. The call of a routine or function
is made by means of an instruction of the form;
BAL, WK, , (start address)
which leaves the return address in a 'work' register named 'WK'. The code at
the point of entry to a routine is:
STR, DR(level of routine), STP, 0
LDA, DR(level of routine), STP, 0
STR, WK, STP, 1
ADD, STP, , (storage area size)
At a point of exit i.e. %RETURN, %RESULT=, or %END, the code is:
LDA, STP, DR(level of routine), 0
LOAD, DR(level of routine), STP, 0
LOAD, WK, STP, 1
B, , WK, 0
Array Addressing
As mentioned previously, variables are allocated locations relative to the
start of a local name space as they are declared. A problem arises for arrays
since they may have bounds which are evaluated at run-time which means that the
amount of space they need is not known at compile-time. If the space for arrays
and variables were allocated in the order of declaration, variables declared
after an array would not have a fixed relative address. The solution is to
30
allocate the space for arrays after that for the variables as shown in the
diagram:
--+--------+-------------+-----------------+------------------+--
| old DR | return addr | static area | dynamic area |
--+--------+-------------+-----------------+------------------+--
|
DR-->
Each array is now allocated a single location in the static area and this points
to the place in the dynamic area where the storage for the array is located.
For example;
%INTEGER I, J
%INTEGERARRAY A, B( ? : ? )
%INTEGER K, L
+------------------------->
+-----|-------------> |
| | | |
-+--+--+-----+-----+--+--+--+--+-----+-----+-----------+-----------+-
| | | I | J | | | K | L | A | B |
-+--+--+-----+-----+-----+-----+-----+-----+-----------+-----------+-
| |
<--DR STP-->
When register 'STP' os incremented at routine entry, it is just incremented
past the static storage area. The object code output for an array declaration
allocates space beyond 'STP' and moves 'STP' up as appropriate for the size of
the array. Variables in the static area have an address:
DR(textual level of declaration) + relative address of variable
which allows them to be accessed in one instruction. Array elements on the
other hand can only be accessed indirectly via the pointer. If this points to
the position of the possibly hypothetical zero element of the array rather than
the first actual location, the address of an element such as:
A( expression )
will be:
[ DR(level) + rel. addr. of pointer ] + value of expression
where the square brackets indicate 'contents of'.
In addition to incrementing 'STP' the object code for an array declaration
must also fill in this zero address. For the general case of an array
declaration:
%INTEGERARRAY A,B,....( lower : upper )
the form of the object code is:
31
code to evaluate 'lower' in 'ACC'
STR, ACC, DR(level), temp1
code to evaluate 'upper' in 'ACC'
LDA, ACC, ACC, 1
STR, ACC, DR(level), temp2
SUB, STP, DR(level), temp1
STR, STP, DR(level), rel. addr. of 'A's pointer
ADD, STP. DR(level), temp2
SUB. STP, DR(level), temp1
STR, STP. DR(level), rel. addr. of 'B's pointer
ADD, STP. DR(level), temp2
. . . . .
'temp1' and 'temp2' are temporary working storage locations set aside in the
local name space to hold the values of 'lower' and 'upper+1' respectively.
(Similar working storage is also used for partial results when evaluating
expressions.) For each array declared the value 'STP-lower' before 'STP' is
incremented gives the zero element address and that plus 'upper+1' gives the
final value of 'STP' having allowed space of length 'upper-lower+1' for the
array.
Parameter Passing
Parameters are regarded as initialised local variables of routines in SKIMP.
Passing parameters for a routine call therefore consists of setting these
initial values. This is quite straightforward since the the parameters will
have the first relative addresses to be allocated in the static area of the
routine's local name space. Before the entry code has been executed, the start
of the future local name space will pointed to by the current value of 'STP'.
This allows the values of the parameters to be stored using addresses relative
to 'STP' before the 'BAL' is executed.
The three types of parameter are each treated differently. For %INTEGER
value type parameters, the expression in the call is evaluated in register 'ACC'
and then stored in the appropriate place. For %INTEGERNAME type parameters, the
address of the variable or array element is evaluated in register 'ACC' and then
stored. Thereafter, the parameter has to be accessed indirectly via this
address. For %INTEGERARRAYNAME type parameters, the word containing the zero
element address of the array in the call is copied to the parameter location.
This allows accesses to array name parameter elements to be handled by the
compiler in exactly the same way as those to ordinary array elements.
To illustrate the scheme, consider the stack when routine 'R' in the
following program is called:
%BEGIN
%ROUTINE R(%INTEGER X, %INTEGERNAME Y,%INTEGRRARRAYNAME Z)
. . . . .
%END
%INTEGERARRAY A ( 0 : ? )
%INTEGER I, J, K
R(I+J, K, A)
. . . . .
32
+------------------->+<-------------------------------+
| <-|--------------------------+ |
| | | | |
-+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+--
| | | | I | J | K | A | | | X | Y | Z |
-+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+--
| (A0) | (I+J) (A0)|
<--DR1 DR2--> STP-->
The object code for such a call would be:
LOAD, ACC, DR1, 3 \ I+J
ADD, ACC, DR1, 4 /
STR, ACC, STP, 2 -->
LDA, ACC, DR1, 5 \ name
STR, ACC, STP, 3 /
LOAD, ACC, DR1, 2 \ array name
STR, ACC, STP, 4 /
BAL, WK, , (start address of R)
If a variable of %INTEGERNAME type i.e. a parameter, is passed as the actual
parameter for another %INTEGERNAME parameter, the word containing the reference
to the original actual parameter is copied to the position for the new parameter
thereby maintaining just one level of indirection to reach the required address.
Implementation Details
Declarations of both scalar variables and arrays are processed as alternative
5 of phrase <STATEMENT> in file 'SKIMPB'. The routine 'PUSHTAG' which records
details of declarations for future use is described in the next chapter.
Relative addresses in the static area of any local name space are allocated
sequentially. However, routines may be nested and may appear in the middle of
the variable declarations of the enclosing routine. Hence, the next relative
address to be allocated must be retained for each textual level up to the
current level to allow for this. These values are stored in the array 'NEXTRAD'
which has an element for each possible textual level. Since the compiler
variable 'LEVEL' always contains the value of the current textual level, the
assignment of relative addresses always uses 'NEXTRAD(LEVEL)'.
Routine 'GET WORK' allocates locations in the local name space for temporary
storage of partial results, in this instance in connection with array
declarations. They are handled in exactly the same way as scalar declarations.
Routine entry and exit sequences are dumped by the routines 'ENTER' and 'DUMP
RETURN' which appear in file 'SKIMPD'. Note that 'ENTER' also handles the
'%BEGIN' statement since this is similar in many respects.
33
+--------+--------+----------+---------+----------------------+
| form | type | params | level | address |
+--------+--------+----------+---------+----------------------+
(4) (4) (4) (4) (16)
As an example, consider the program:
%BEGIN
%INTEGER I,J
%ROUTINE R(%INTEGERNAME K)
%INTEGERARRAY A(1: 10)
.......
%END
.......
%ENDOFPROGRAM
The tag words for these names would be:
+---+---+---+---+-----------+
I | 0 | 1 | 0 | 1 | 2 |
+---+---+---+---+-----------+
+---+---+---+---+-----------+
J | 0 | 1 | 0 | 1 | 3 |
+---+---+---+---+-----------+
+---+---+---+---+-----------+
R | 4 | 0 | 1 | 1 | (addr) |
+---+---+---+---+-----------+
+---+---+---+---+-----------+
K | 1 | 1 | 0 | 2 | 2 |
+---+---+---+---+-----------+
+---+---+---+---+-----------+
A | 2 | 1 | 0 | 2 | 3 |
+---+---+---+---+-----------+
Values of tags can be inspected in the output from the compiler by using the
option '/TAGS' when calling the compiler. They are printed out (in hexadecimal)
at the %END of every routine and function and at %ENDOFPROGRAM for the names
that were locally declared. An example of this can be found in appendix 1.
The packing of the tag word places some limits on the programs the compiler
can compile, for instance only up to fifteen routine parameters and only up to
fifteen textual levels, but these are not serious limits for this compiler. (In
fact, less than fifteen textual levels are permitted due to lack of registers.)
This word of information has to be made available whenever the name it
corresponds to is used. Since names are identified by a dictionary position
i.e. hash position, a further array is used in parallel. This contains pointers
to the list cells which contain the tag information. This is be illustrated in
figure 6.1 for the preceding example again.
35
%BEGIN
%INTEGER I
%ROUTINE R
%INTEGERARRAY I(1:10)
......
%END
......
%ENDOFPROGRAM
The tags for this whilst the compiler is compiling routine 'R' will be:
| | |
+---+ +---+---+ +-+-+-+-+---+---+ +-+-+-+-+---+---+
| I |<-----| | |---->|2|1|0|2| 2 | |---->|0|1|0|1| 2 | |
+---+ +---+---+ +-+-+-+-+---+---+ +-+-+-+-+---+---+
| | | | |
2nd declaration 1st declaration
If when a call is popped up, nothing remains on the list i.e. this was not a
redeclaration, then the space in the name dictionary for that name can also be
recovered. This is the case since the dictionary works on a last-in-first-out
basis, a consequence of the nested structure of SKIMP programs. The question
arises as to how the compiler knows which tags to pop up when an %END statement
is reached. An obvious method would be to search through the dictionary for
tags having the appropriate textual level in the level field. A more efficient
method, however, which avoids searching, is to remember in a list the ones which
have to be popped up. Each cell in the list contains the dictionary position of
a name, so that this list can be scanned down at the %END statement and a
'popup' performed for every position mentioned in it (plus extra 'popups' for
routine parameter tag cells). Because of the nesting of routines in SKIMP a
separate list has to be maintained for each textual level. An array with one
position per textual level contains pointers to the heads of these lists as
shown in figure 6.3:
+---+ +---+---+ +---+---+ +---+---+
| |----------->| | |--->| | |--->| | |
+---+ +---+---+ +---+---+ +---+---+
| |-------+
+---+ | +---+---+
| |---+ +--->| | |
+---+ | +---+---+
| | |
| | | +---+---+ +---+---+
| | +------->| | |--->| | |
+---+---+ +---+---+
figure 6.3
Implementation Details
Lists of calls are frequently used by the compiler to maintain current status
information during compilation. Some are used as pushdown popup stacks but not
37
all. A list cell in this implementation consists of two words, one an
information word and the other a link word:
+---------------+---------------+
| information | link |
+---------------+---------------+
Information words are taken from an array named 'TAG' and link words from an
array named 'LINK'. Initially, all these pairs of words are formed into an
'Available Space List' or 'Free List' and thereafter cells are taken from this
list as required and returned to it when no longer needed. The variable
'TAGASL' always points to the head of the available space list:
+---+---+ +---+---+ +---+---+
TAGASL--->| | |--->| | |--->| | |---> . . .
+---+---+ +---+---+ +---+---+
By convention, the link of the last call in any list is set to zero. Links to
other list calls take the form of the index of the words in the 'TAG' and 'LINK'
arrays. These are therefore declared from 1 upwards to avoid conflict. Both of
the arrays are initialised in file 'SKIMPA' to set up the tags and links for
built-in routine names such as 'READ SYMBOL' etc. (The name dictionary is also
initialised for the same reason.)
A function 'NEWTAG', in file 'SKIMPD', takes the cell indicated by 'TAGASL'
from the available space list whenever a cell is required. 'TAGASL' is then
moved down the list to the next one. If ever 'TAGASL' is zero, the compiler
stops with a suitable monitor. The result of the 'NEWTAG' function is the index
of the cell in 'TAG' and 'LINK'. A common sequence of instructions might take
the form:
CELL = NEWTAG
TAG (CELL) = ....
LINK (CELL) = ....
A function named 'RETURNTAG',also in file 'SKIMPD', has the reverse effect.
It takes a single parameter which is the index of the cell to be returned to the
available space list. The result of the function is the value in the 'LINK'
word of the cell returned. This is purely a matter of programming convenience.
The two parallel arrays which form the hash dictionary of names and pointers
to tags are named 'NAMEDLINK' and 'TAGLINK' and point to the 'NAMED' and
'TAG/LINK' arrays respectively.
Routines 'PUSHTAG' and 'POPTAGS', in file 'SKIMPD', push and pop tags from
'TAGLINK' as required. 'PUSHTAG' is called with the identification number of the
name whose tag is being pushed together with it's form, type, etc. as
parameters. A subsidiary function of this routine is to check whether the same
name has already been declared at this level and monitor the fault if so.
'POPTAGS' is called at the %END of each routin e or function to pop all the tags
for names declared at that level, monitoring them if the '/TAGS' option was set.
The array which contains pointers to lists of names declared at each level is
named 'NAMELIST'.
38
Chapter 7
Compilation of Expressions
Although all the registers in the object machine can be used as accumulators
the machine is regarded, for the sake of simplicity, as a single-accummulator
machine and all expressions are evaluated in the one register 'ACC'. This
avoids the necessity of having register to register operations which would be
needed for a true multi-accumulator machine and avoids the extra complexity this
introduces for expression compilation.
A 'tree' algorithm is used in the SKIMP compiler. Many other methods are
possible but this one produces reasonably good object code without too many
redundant 'LOAD' and 'STR' operations. It is also straightforward to
comprehend. Expressions are regarded as trees with operators at the nodes and
operands as leaves. For example, the expression:
I * J - K / (L + M)
has the tree form shown in figure 7.1.
-
|
+-----+-----+
| |
* /
| |
+--+--+ +--+--+
| | | |
I J K +
|
+--+--+
| |
L M
figure 7.1
The algorithm examines the 'shape' of the tree in order to decide the best
order of evaluation of the expression. The commutativity of the operators, that
is, whether 'X?Y' is the same as 'Y?X' or not, is taken into account and also
whether the sub-trees of operator nodes are leaves or not. When the sub-tree is
a leaf, that operand is capable of being immediately loaded into 'ACC' or can
operate immediately on 'ACC'.
When the sub-tree is not a leaf i.e. a further tree, the algorithm works
recursively to compile it. The algorithm can be tabulated for the various
combinations possible, the whole thing being regarded as 'evaluate tree' whose
object is to dump code which puts the result in 'ACC'. The table is shown in
figure 7.2.
This algorithm would not be acceptable if the order of evaluation was defined
as part of the language, for instance strictly left to right. As an example,
the object code is improved by evaluating the second operand first for
non-commutative operators. To illustrate the working of the algorithm, consider
the expression which was used above:
I * J - K / (L + M)
39
+--------------+------------+----------------+-------------------------+
| left branch | operator | right branch | actions |
+--------------+------------+----------------+-------------------------+
| leaf 1 | all | leaf 2 | LOAD,ACC,leaf 1 |
| | | | (operation),ACC,leaf 2 |
+--------------+------------+----------------+-------------------------+
| tree | all | leaf | evaluate tree |
| | | | (operation),ACC,leaf |
+--------------+------------+----------------+-------------------------+
| leaf | comm. | tree | evaluate tree |
| | | | (operation),ACC,leaf |
+--------------+------------+----------------+-------------------------+
| leaf | non-comm. | tree | evaluate tree |
| | | | STR,ACC,(work) |
| | | | LOAD,ACC,leaf |
| | | | (operation),ACC,(work) |
+--------------+------------+----------------+-------------------------+
| tree 1 | all | tree 2 | evaluate tree 2 |
| | | | STR,ACC,(work) |
| | | | evaluate tree 1 |
| | | | (operation),ACC,(work) |
+--------------+------------+----------------+-------------------------+
figure 7.2
The steps, with indentation for recursive depth are shown in below. The
'work' locations used for partial results are in the local name space for the
current routine as was described in chapter 5 in connection with array
declarations.
evaluate tree [ I*J-K/(L+M) ]
tree [ I*J ] all [ - ] tree [ K/(L+M) ]
evaluate tree [ K/(L*M) ]
leaf [ K ] non-comm [ / ] tree [ L+M ]
evaluate tree [ L+M ]
leaf [ L ] all [ + ] leaf [ M ]
LOAD,ACC, L
ADD,ACC, M
STR,ACC, work 1
LOAD,ACC, K
DIV,ACC, work 1
STR,ACC, work 1
evaluate tree [ I*J ]
leaf [ I ] all [ * ] leaf [ J ]
LOAD,ACC, I
MLT,ACC, J
SUB,ACC, work 2
The case of array elements and function calls as operands must be considered
in more detail. These cannot be regarded as leaves of the tree since they
involve further computation which needs register 'ACC'. They are therefore
treated as trees from the point of view of the algorithm. Array element
accessing puts the value of the element in 'ACC' and the result of functions is
also left in 'ACC'. This satisfies the requirements of 'evaluate tree'.
Variables of type %INTEGERNAME i.e. parameters, can still be regarded as
leaves by making use of another index register, 'WK', for the indirect
40
reference. Constants are leaves of trees and, in general, are accessed from the
table of constants following the object code using register 'COT' as the base.
The use of a location in the constant table is avoided if a 'LOAD' of the
constant is required and the value is less than 65536. This is because the
'LDA' instruction can be used with the index register set to zero. For
example,an instruction to 'LOAD' the value 1234 would be:
LDA, ACC, , 1234
The tree representation is that of Reverse Polish form together with
pointers. In Reverse Polish notation the operator follows the operands. In
general,
operand 1 operator operand 2
becomes:
operand 1 operand 2 operator
and the same transformation takes place for the operands when they are
expressions. For example, the infix expression:
I * (J + K) / L - M
becomes:
I J K + * L / M -
The extra pointers that were mentioned come after each operator and
correspond to the branches of the tree. In general the form is:
<-------------------------+
<--------------|----------------------+ |
| | | |
+-----------+ +-----------+ +----------+---+---+
| operand 1 | | operand 2 | | operator | | |
+-----------+ +-----------+ +----------+---+---+
When an operand is a further sub-expression, the pointer points to the
operator in it's repesentation. Otherwise, the pointer points to the first word
of a pair of words which describe the operand. The first of the pair is a type
number and the second is consequent on the value of the first. The type values
are: ^operators
|
-1 : function call
-2 : array element
leaves/ -3 : scalar variable
\ -4 : constant
They are negative to distinguish them from values which denote operators and
hence sub-trees. The values for operators are their alternative numbers from
phrase <OP> (with two additional values beyond for the unary operators '-' and
'\') and are thus always positive. Values greater than or equal to -2 cause the
algorithm to be called recursively, a requirement which was mentioned above.
For types -1 and -2, the second word of the pair contains an analysis record
pointer which will allow the function call or array element accession object
code to be generated when it is required. For type -3, the second word is the
41
tag of the variable and for type -4, the second word is the value of the
constant. As an example, consider the expression:
I + 12345
The representation of this is:
<------------------------+
<--------------|-------------------+ |
| | | |
+------+--------+------+-------+-----+-----+-----+
| -3 | tag(I) | -4 | 12345 | 9 | 1 | 3 |
+------+--------+------+-------+-----+-----+-----+
An example to show sub-trees might be:
(I + 99) * (J - 88)
The representation of this is:
<--------------------+
<-------------------------------|----------------+ |
<-------|-------+ <--------|--------+ | |
<---------|--------|---+ | <---------|---------|---+ | | |
| | | | | | | | | | | |
+---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+
|-3 |tag(I)|-4 | 99 | 9 | 1 | 3 |-3 |tag(J)|-4 | 88 | 10 | 8 | 10 | 8 | 5 | 12 |
+---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+
The technique used to convert infix expressions to this form combines a
method of converting them to Reverse Polish with a scheme of pseudo-evaluation
to calculate the values of the pointers. The conversion to Reverse Polish is
shown diagrammatically in figure 7.3. A pushdown stack is used for operators to
take account of their precedence. This delays their storage until the
appropriate place in the output. A new stack is declared at each recursive
level to deal with bracketted sub-expressions. The conversion of the
expression:
I + J * (K - L) / M
would proceed as shown in figure 7.4. The dotted box indicates the recursive
call to deal with '(K-L)'.
When operators and operands are 'stored' according to the flow diagram, a
pseudo-evaluation takes place. To explain what is meant by 'pseudo'-evaluation.
first consider an 'evaluation' of a reverse Polish expression. A pushdown stack
is used to retain operand values and partial results during the evaluation.
Proceeding from left to right, whenever an operand is encountered it's value is
pushed down. Whenever a (binary) operator is encountered, the top two items on
the stack are popped and the operation between them is performed. The result is
then pushed back onto the stack. (For unary operators, just one item is popped
from the stack). Proceeding in this way. when the whole of the reverse Polish
string has been dealt with, the value of the whole expression is what remains in
the stack. For example:
10 + 8 * (6 - 4) / 2
42
which in reverse Polish is:
10 8 6 4 - * 2 / +
The stages of evaluation in a pushdown stack are
| | | | | | | | | | | | | | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | | | | | 4 | | | | | | | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | | | 6 | | 6 | | 2 | | | | 2 | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | 8 | | 8 | | 8 | | 8 | | 16 | | 16 | | 8 | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
| 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 18 |
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
In a 'pseudo'-evaluation, it is not the values of operands that are pushed down
but some other property. In this case, it is positions in the representation of
the tree. In place of evaluation when operators occur, the positions of
operands are popped up and it is these values which are appended to the tree
representation. The position of the operator is then pushed down in their
place. Consider the example used above:
(I + 99) * (J - 88)
which in reverse Polish is:
I 99 + J 88 - *
The successive contents of the pseudo-evaluation stack will be:
| | | | | | | | | | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | | | | | | | | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | | | | | | | 10 | | | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+
| | | 3 | | | | 8 | | 8 | | 12 | | |
+----+ +----+ +----+ +----+ +----+ +----+ +----+
| 1 | | 1 | | 5 | | 5 | | 5 | | 5 | | 15 |
+----+ +----+ +----+ +----+ +----+ +----+ +----+
The representations of the trees corresponding to all the expressions in a
program can be monitored by using the parameter '/EXPR' when calling the
compiler. Examples can be found in appendix 1.
Implementation Details
The routine concerned with the compilation of expressions is named 'EXPR' and
can be found in file 'SKIMPE'. Within 'EXPR', subroutine 'TO TREE' produces the
linear tree representation using a further subroutine 'PS EVAL' for the
pseudo-evaluation part. It also has the subsidiary function of checking that
all the names used in the expression have been declared and are of an
appropriate type, for instance, not routine names.
The tree code generating algorithm is implemented by routine 'EVALUATE' which
makes use of a subroutine 'OPN' to dump primitive operations i.e. routine calls.
array and variable accesses and constant handling.
44
A minor complication of routine 'EXPR' deals with the expressions involved
with simple comparisons in conditional statements. This is because a comparison
such as:
expr1 <COMP> expr2
is compiled as an evaluation of 'expr1 - expr2' and then the value tested
against zero. An external flag named 'CONDFLAG' is set in these circumstances
and its effect is form a tree with branches representing each expression the
topmost node being a '-'. The algorithm then proceeds as normal.
45
+ - - - - - - - - - -+
| +----------+ |
| | sc 1 | |
| +----------+ |
| ->FALSE if false |
| +----------+ |
| | sc 2 | |
| +----------+ |
| ->TRUE if true |
| +----------+ |
| | sc 3 | |
| +----------+ |
+ - - - - - - - - - -+
->FALSE if false
TRUE: +--------------------+
| instr 1 |
+--------------------+
->NEXT
FALSE: +--------------------+
| instr 2 |
+--------------------+
NEXT:
The code in this second case should be of the form:
+----------------+
| sc 1 |
+----------------+
->FALSE if false
+----------------+
| sc 2 |
+----------------+
->FALSE if false
+----------------+
| sc 3 |
+----------------+
->FALSE if false
+----------------+
| instr 1 |
+----------------+
->NEXT
FALSE: +----------------+
| instr 2 |
+----------------+
NEXT:
Next consider:
%IF sc1 %OR sc2 %OR sc3 %THEN instr 1 %ELSE instr 2
This should produce the code in the diagram below. It will be apparent that if
an %AND follows the simple comparison, the jump is on false and if an %OR
follows, the jump is on true. To make this rule consistent throughout, since
the jump following the last simple comparison ts always on false we can imagine
an %AND following the last simple comparison:
%IF ( condition ) %AND %THEN instr 1 %ELSE instr 2
47
+----------------+
| sc 1 |
+----------------+
->TRUE if true
+----------------+
| sc 2 |
+----------------+
->TRUE if true
+----------------+
| sc 3 |
+----------------+
->FALSE if false
TRUE: +----------------+
| instr 1 |
+----------------+
->NEXT
FALSE: +----------------+
| instr 2 |
+----------------+
NEXT:
The next observation concerns the destination of the jump. This is not
always to one of the unconditional instructions. It can be to a later simple
comparison. For example:
%IF ((sc1 %AND sc2) %OR sc3) %AND sc4 %THEN instr 1 %ELSE instr 2
This should produce code of the form:
+----------------+
| sc 1 |
+----------------+
->FALSE1 if false
+----------------+
| sc 2 |
+----------------+
->TRUE if true
FALSE1:+----------------+
| sc 3 |
+----------------+
->FALSE2 if false
TRUE: +----------------+
| sc 4 |
+----------------+
->FALSE2 if false
+----------------+
| instr 1 |
+----------------+
->NEXT
FALSE2:+----------------+
| instr 2 |
+----------------+
NEXT:
For a simple comparison followed by an %AND, the jump is to the simple
comparison following the next %OR at an outer bracketting level. For a simple
comparison followed by an %OR, the jump is to the simple comparison following
48
the next %AND at an outer bracketting level. This can again be made consistent
if the conditional statement is regarded as of the form:
%IF ( ( condition ) %AND %THEN instr 1 ) %OR %ELSE instr 2
To demonstrate this, we could redraw the example above showing bracketting
levels with each simple comparison (or instruction) at the level of the
following %AND or %OR:
sc1 %AND
|
| false sc2 %OR
| |
| |true sc3 %AND
| | | |
+-------------|-------> |false sc4 %AND
| | | |
+-----------|--------> |
| |false instr1 %OR
| |
+------------+---------------------->instr2
A further example might be:
%IF (sc1 %OR sc2 %OR sc3) %AND (sc4 % OR sc5) %THEN instr1 %ELSE instr2
The organisation required is:
sc1 %OR cs2 %OR sc4 %OR
| | | |
|true |true sc3 %AND | | true
| | | | |
| | |false | | sc5 %AND
| | | | | |
+---------+-----------|-------> | |false
| | |
| +--------|-------->instr1 %OR
| |
+--------------------+-------------------->instr2
In compiling conditional statements, the plan of action is to make several
passes over the condition, working out the structure required step by step and
finally, in a last pass, generating the object code. Five parallel arrays are
used, each with one position per simple comparison. The first pass traverses
the analysis record and locates the positions of all the simple comparisons.
Simultaneously, the nesting levels and truth or falsity of the branches is
determined and also stored. The next pass goes over this information to
determine the destinations of the branches using the rules of thumb described
above. For those simple comparisons which are the destination of branches, the
next pass allocates 'private' labels to which branches can be compiled using the
normal methods used elsewhere in the compiler (see chapter 9). Private labels
are just values allocated upwards from 10000, which allows them to be
distinguished from ordinary labels. For the sake of consistency, the
instructions made conditional are also allocated positions in the arrays and are
regarded as being at nesting levels -1 and -2. The rules can then be applied
uniformly throughout.
When the instruction made conditional is itself a branch, the object is
optimised slightly to allow branches to be made direct to the required place
49
from within the condition instead of having a branch to a branch. For example:
%IF sc1 %OR sc2 %OR sc3 %THEN ->100 %ELSE instr
The code produced takes the form:
+-------------+
| sc 1 |
+-------------+
->100 if true
+-------------+
| sc 2 |
+-------------+
->100 if true
+-------------+
| sc 3 |
+-------------+
->100 if true
+-------------+
| instr |
+-------------+
This is easily achieved by using the required label instead of a private label
to label the conditional branch instruction in the branch array. The only other
modification is to ensure that the branch on the previous i.e.last, simple
comparison is on true rather than false and to the label specified. This should
be clearer when the implementation details are discussed.
The contents of the five arrays can be displayed during compilation by using
the parameter '/COND'. Examples will be found in appendix 1.
%START-%FINISH Groups
When either of the instructions made conditional is a %START-%FINISH group,
the destinations of the final branches are not immediately known. This can only
be done after the %FINISH has appeared. Information therefore has to be
preserved during the compilation of the group. Since the group may itself
contain further conditional statements and hence nested %START-%FINISH groups, a
pushdown list is used for the purpose. Furthermore, since routine bodies may
also appear anywhere and be nested, a list will potentially be required for each
textual level of the program. An array of headcells to lists is used as
previously to achieve this:
+----+ +---+---+ +---+---+
| |------>| | |---->| | |
+----+ +---+---+ +---+---+
| |---+
+----+ | +---+---+
| | +---| | |
+----+ +---+---+
| |
Two items of information need to be stored for each group. Firstly, a marker
which distinguishes between groups before and those after %ELSE keywords and
secondly, a label number assigned to the destination of branches around the
group. This will be a private label.
50
Implementation Details
The code generation for conditions is done by integerfn 'COND' in file
'SKIMPC'. This is called from the section of 'STATEMENT' in file 'SKIMPB' which
deals with conditional statements. It is at this latter place that the code for
the instructions made conditional is generated by calling routine 'INSTR'. In
order to deal with the branch instruction optimisation, two parameters in
addition to the parameter which indicates where the condition starts in the
analysis record are passed in. These give the label numbers of the branches or
-1 if either of then is a non-branch instruction. These are then used instead
of private labels as will be seen below. The result of this function is the
private label number of the label to be put in front of the %ELSE clause.
%START-%FINISH groups are dealt with after the condition code has been
generated and also where %FINISH statements are processed. Two subroutines are
used to maintain the pushdown lists. These are 'PUSHSTART' and 'POPSTART' and
can be found in file 'SKIMPD'.
The five parallel arrays within function 'COND' are named 'TESTP', 'LEVEL',
'ANDOR', 'BRANCH' and 'LABEL'. They hold. respectively, the analysis record
positions of the simple comparisons (or 'tests'), the nesting levels of the
simple comparisons, the truth or falsity of the jumps(1 for false and 2 for
true), the destination of the jumps (in terms of column position in the table)
and the labels assigned to simple comparisons. For example, consider:
%IF (sc1 %OR sc2 %OR sc3) %AND (sc4 %OR sc5) %THEN instr1 %ELSE instr2
The contents of the arrays will be:
1 2 3 4 5 6 7
TESTP ap1 ap2 ap3 ap4 ap5
LEVEL 1 1 0 1 -1 -2
ANDOR 2 2 1 2 1 2
BRANCH 4 4 7 6 7
LABEL -1 -1 -1 10000 -1 10002 10001
where the 'ap?' represent analysis record positions. The first pass over the
statement is performed by the subroutine 'PROCESSCOND' and fills in rows
'TESTP', 'LEVEL' and 'ANDOR'. It works recursively following the recursive
definition of conditions. Subsequently, two extra columns are filled in with -1
and -2 for the pseudo-%AND and the pseudo-%OR. On the next pass the row
'BRANCH'is filled in and following that labels are assigned as necessary. (The
-1 indicates that no jump takes place to this simple comparison and hence no
label is required. The final pass generates the object code by calling 'EXPR'
with the 'CONDFLAG' set as described in the previous chapter to evaluate LHS
minus RHS, and dumping the appropriate intermediate jump instructions.
Another example may help to clarify the picture further . Consider the case:
%IF sc1 %OR ((sc2 %OR sc3) %AND sc4 %AND sc5) %THEH instr1 %ELSE instr2
The structure of the statement is:
51
sc2 %OR
|
|true sc3 %AND sc4 %AND
| | | |
sc1 %OR | |false | |false
| | | | |
| true +-----------|------> | sc5 %AND
| | | |
| | | |false
| | | |
+----------------------|----------|-----------|-------->instr1 %OR
| | |
+----------+-----------+---------------------->instr2
The table for this statement will be:
1 2 3 4 5 6 7
TESTP ap1 ap2 ap3 ap4 ap5
LEVEL 0 2 1 1 -1 -2
ANDOR 2 2 1 1 1 2
BRANCH 6 4 7 7 7
LABEL -1 -1 -1 10004 -1 10003 10005
As a final example, consider a statement containing conditional branches:
%IF sc1 %OR sc2 %OR sc3 %THEN ->88 %ELSE ->99
The table will be:
1 2 3 4 5
TESTP ap1 ap2 ap3
LEVEL 0 0 -1 -2
ANDOR 2 2 2 2
BRANCH 4 4 4
LABEL -1 -1 -1 88 99
52
Chapter 9
Labels and Jumps
As the SKIMP compiler is a one-pass compiler, jumps to labels not yet
encountered cannot be completely generated. Backward jumps are, of course, no
problem. For the forward jumps, all that can be done is to leave a hole in the
branch instruction dumped which can be filled in at some later stage. In this
case, the assembler which forms the first stage of the 'SKIMPAI' interpreter
system has the job to do. In order for this to be possible, information must be
recorded somewhere concerning the positions of the holes to be filled in and
what to fill them in with. The method adopted is to use the holes themselves as
links in lists which relate to each label and for the compiler to dump 'FILL'
assembler directives which are really the headcells to these lists and which
also include the value to be filled in. The 'FILL' directives are dumped
whenever a label which has been previously referenced is encountered. An
example should make this clear. Consider the program:
. . . .
->88
. . . .
->99
. . . .
->99
. . . .
99: . . . .
. . . .
The object code dumped will take the form:
. . . . . .
+------+
12$ B,,,| 0 |
+------+
. . . . . .
+------+
34$ B,,,| 12 |
+------+
. . . . . .
+------+
56$ B,,,| 34 |
+------+
As labels are local to each level of routine nesting, once again separate
lists must be used for each level. Each cell of a list in this case contains
three items of information, the label number, a flag and an address:
+---------------+------+-----------+
| label | flag | address |
+---------------+------+-----------+
(16) (1) (15)
53
The flag indicates whether the address is that of the latest forward reference
to the label i.e. the current top of the forward reference list, or the actual
address of the label which will be needed for any backward references. The
value 1 indicates the former and 0 the latter. When additional forward
references appear therefore, the address is pushed down 'through' this cell.
A cell of this form is created whenever a label is encountered for the first
time - either as a label or as a jump. The list for the current textual level
is first searched and a new cell added if no existing cell refers to that label.
The flag allows two vital checks to be made. Firstly that the label has not
occurred before (as a label) and secondly at the end of the routine that the
label has in fact appeared.
Routine calls use the 'BAL' instruction and are always backward references
since SKIMP does not allow routine %SPECs. The address of the routine body is,
in other words, already known when any routine calls are made. SKIMP allows
routines to be placed anywhere in a program. In particular at the head of
surrounding routines. Jumps around these routine bodies must therefore be
dumped. These are forward references but are simple in that only one backward
reference will occur. They are therefore treated specially without having to
invent a private label.
Calls on the built-in input-output routines are also treated specially.
Since these routines are implemented in the 'SKIMPAI' assembler/interpreter, the
'BAL' instructions dumped contain the marker 'EXT' and a number which refers to
the routine required. Were external references part of the language, the names
themselves would have to appear in the object file and likewise for any external
variables in the program. A linker would then resolve the references between
programs.
Implementation Details
The routines which accomplish the functions decrlbed above are named 'FILL
LABEL' and 'FILL BRANCH', in file 'SKIMPD'. A function 'GET LABEL' extracts a
label from the analysis record and checks that it is valid i.e.less than 10000.
'POP LABELS' pops up the cells and checks for missing labels at %END (and
%ENDOFPROGRAM).
54
LEX, ANAL example
SKIMP COMPILER MKII
FILE: TESTL
OPTIONS: LEX ANAL
%begin
130
(1/STATEMENT) 8
0$ LDA,COT,,0
1$ LDA,DR1,,0
2$ LDA,STP,DR1,0
%integerfn r
136 134 256 82
(1/STATEMENT) 6 5 6 8 (5/PROC) 2 (6/NAME) 1
82 (8/FORMAL)
3$ B,,,0
4$ STR,DR2,STP,0
5$ LDA,DR2,STP,0
6$ STR,WK,STP,1
7$ LDA,STP,STP,0
%integer i,j,k
136 256 73 44 256 74 44 256 75
(1/STATEMENT) 5 3 (3/ARRAY) 2 6 8 (6/NAME) 1
73 (8/NAMES) 1 11 13 (11/NAME) 1 74 (13/NAMES) 1
16 18 (16/NAME) 1 75 (18/NAMES) 2
i=j+k
256 73 61 256 74 43 256 75
(1/STATEMENT) 1 3 (3/INSTR) 1 7 9 10 (7/NAME) 1
73 (9/ACTUAL) 2 (10/ASSIGN) 1 12 (12/EXPR) 1
16 17 23 (16/UNARY) 4 (17/OPERAND) 1 20 22
(20/NAME) 1 74 (22/ACTUAL) 2 (23/EXPRREST) 1 27
28 34 (27/OP) 9 (28/OPERAND) 1 31 33 (31/NAME) 1
75 (33/ACTUAL) 2 (34/EXPRREST) 2
8$ LOAD,ACC,DR2,3
9$ ADD,ACC,DR2,4
10$ STR,ACC,DR2,2
%if i>1234 %then %stop
135 256 73 62 257 1234 145 144
56
(1/STATEMENT) 2 5 36 37 (5/COND) 1 8 35 (8/TEST) 1
12 24 25 (12/EXPR) 1 16 17 23 (16/UNARY) 4
(17/OPERAND) 1 20 22 (20/NAME) 1 73 (22/ACTUAL) 2
(23/EXPRREST) 2 (24/COMP) 6 (25/EXPR) 1 29 30
34 (29/UNARY) 4 (30/OPERAND) 2 32 (32/CONST) 1
1234 (34/EXPRREST) 2 (35/CONDREST) 3 (36/INSTR) 6
(37/ELSE) 2
11$ LOAD,ACC,DR2,2
12$ SUB,ACC,COT,0
13$ BNG,ACC,,0
14$ STOP,,,0
15$ FILL,10000,13,15
%results=i+4321
141 61 256 73 43 257 4321
(I/STATEMENT) 1 3 (3/INSTR) 5 5 (5/EXPR) 1
9 10 16 (9/UNARY) 4 (10/OPERAND) 1 13 15 (13/NAME) 1
73 (15/ACTUAL) 2 (16/EXPRREST) 1 20 21 25 (20/OP) 9
(21/OPERAND) 2 23 (23/CONST) 1 4321 (25/EXPRREST) 2
15$ LOAD,ACC,DR2,2
16$ ADD,ACC,COT,1
17$ LDA,STP,DR2,0
18$ LOAD,DR2,STP,0
19$ LOAD,WK,STP,1
20$ B,,WK,0
%end
131
(1/STATEMENT) 7 3 (3/OFPROG) 2
21$ FILL,ALLOC,7,5
21$ STOP,,,0
22$ FILL,SKIP,3,22
%endofprogram
131 139
(1/STATEMENT) 7 3 (3/OFPROG) 1
22$ FILL,ALLOC,2,2
22$ STOP,,,0
23$ FILL,COT,0,23
23$ CONST...1234
24$ CONST,,,4321
25$ FILL,STACK,1,25
$ 0 FAULTS IN PROGRAM
57
EXPR example
SKIMP COMPILER MKII
FILE: TESTE
OPTIONS: EXPR
%begin
0$ LDA,COT,,0
1$ LDA,DR1,,0
2$ LDA,STP,DR1,0
%integer i,j,k,l
%integerarray a(1:10)
-4* 1
3$ LDA,ACC,,1
4$ STR,ACC,DR1,6
-4* 10
5$ LDA,ACC,,10
6$ LDA,ACC,ACC,1
7$ STR,ACC,DR1,7
8$ SUB,STP,DR1,6
9$ STR,STP,DR1,8
10$ ADD,STP,DR1,7
i=j+k
+
-3 01010003 -3 01010004 9* 1 3
11$ LOAD,ACC,DR1,3
12$ ADD,ACC,DR1,4
13$ STR,ACC,DR1,2
a(j+k)=i*l-j*k
+
-3 01010002 -3 01010005 8 1 3 -3 01010003 -3 01010004
8 8 10 10* 5 12
* -
14$ LOAD,ACC,DR1,3
15$ MLT,ACC,DR1,4
16$ STR,ACC,DR1,6
17$ LOAD,ACC,DR1,2
18$ MLT,ACC,DR1,5
19$ SUB,ACC,DR1,6
20$ STR,ACC,DR1,6
+
-3 01010003 -3 01010004 9* 1 3
21$ LOAD,ACC,DR1,3
22$ ADD,ACC,DR1,4
23$ ADD,ACC,DR1,8
24$ LOAD,WK,DR1,6
25$ STR,WK,ACC,0
58
i=i*(j+k)/(l-i**2)
+ *
-3 01010002 -3 01010003 -3 01010004 9 3 5 8 1 7
-3 01010005 -3 01010002 -4 2 6 15 17 10 13 19 7*
10 22 ** -
26$ LOAD,ACC,DR1,2
27$ EXP,ACC,COT,0
28$ STR,ACC,DR1,6
29$ LOAD,ACC,DR1,5
30$ SUB,ACC,DR1,6
31$ STR,ACC,DR1,6
32$ LOAD,ACC,DR1,3
33$ ADD,ACC,DR1,4
34$ MLT,ACC,DR1,2
35$ DIV,ACC,DR1,6
36$ STR,ACC,DR1,2
i=a(j)+a(k)
+
-2 17 -2 43 9* 1 3
-3* 01010004
37$ LOAD,ACC,DR1,4
38$ ADD,ACC,DR1,8
39$ LOAD,ACC,ACC,0
40$ STR,ACC,DR1,6
-3* 01010003
41$ LOAD,ACC,DR1,3
42$ ADD,ACC,DR1,8
43$ LOAD,ACC,ACC,0
44$ ADD,ACC,DR1,6
45$ STR,ACC,DR1,2
%endofprogram
46$ FILL,ALLOC,2,9
46$ STOP,,,0
47$ FILL,COT,0,47
47$ CONST,,,2
48$ FILL,STACK,1,48
$ 0 FAULTS IN PROGRAM
59
TAGS Example
SKIMP COMPILER MKII
FILE: TESTT
OPTIONS: TAGS
%begin
0$ LDA,COT,,0
1$ LDA,DR1,,0
2$ LDA,STP,DR1,0
%routine a(%integer i,j,k)
3$ B,,,0
4$ STR,DR2,STP,0
5$ LDA,DR2,STP,0
6$ STR,WK,STP,1
7$ LDA,STP,STP,0
%end
8$ FILL,ALLOC,7,5
75 K 01020004
74 J 01020003
73 I 01020002
8$ LDA,STP,DR2,0
9$ LOAD,DR2,STP,0
10$ LOAD,WK,STP,1
11$ B,,WK,0
12$ FILL,SKIP,3,12
%integerfn b(%integername l)
12$ B,,,0
13$ STR,DR2,STP,0
14$ LDA,DR2,STP,0
15$ STR,WK,STP,1
16$ LDA,STP,STP,0
%routine c(%integerarrayname m,n)
17$ B,,,0
18$ STR,DR3,STP,0
19$ LDA,DR3,STP,0
20$ STR,WK,STP,1
21$ LDA,STP,STP,0
%end
22$ FILL,ALLOC,21,4
78 N 31130003
77 M 31130002
62
22$ LDA,STP,DR3,0
23$ LOAD,DR3,STP,0
24$ LOAD,WK,STP,1
25$ B,,WK,0
26$ FILL,SKIP,17,26
%end
26$ FILL,ALLOC,16,3
67 C 40220012
31130002
31130003
76 L 11020002
26$ STOP,,,0
27$ FILL,SKIP,12,27
%integer i,j
%endofprogram
27$ FILL,ALLOC,2,4
74 J 01010003
73 I 01010002
66 B 4111000D
11020002
65 A 40310004
01020002
01020003
01020004
27$ STOP,,,0
28$ FILL,COT,0,28
28$ FILL,STACK,1,28
$ 0 FAULTS IN PROGRAM
63
Basic Example
SKIMP COMPILER MKII
FILE: HANOI
OPTIONS:
%begin
0$ LDA,COT,,0
1$ LDA,DR1,,0
2$ LDA,STP,DR1,0
%routine hanoi(%integer n,p1,p2)
3$ B,,,0
4$ STR,DR2,STP,0
5$ LDA,DR2,STP,0
6$ STR,WK,STP,1
7$ LDA,STP,STP,0
%if n>0 %then %start
8$ LOAD,ACC,DR2,2
9$ BNG,ACC,,0
hanoi(n-1,p1,6-p1-p2)
10$ LOAD,ACC,DR2,2
11$ SUB,ACC,COT,0
12$ STR,ACC,STP,2
13$ LOAD,ACC,DR2,3
14$ STR,ACC,STP,3
15$ LDA,ACC,,6
16$ SUB,ACC,DR2,3
17$ SUB,ACC,DR2,4
18$ STR,ACC,STP,4
19$ BAL,UK,,4
write(p1,1) ;
20$ LOAD,ACC,DR2,3
21$ STR,ACC,STP,2
22$ LDA,ACC,,1
23$ STR,ACC,STP,3
24$ BAL,WK,EXT,11
write(p2,1) ;
25$ LOAD,ACC,DR2,4
26$ STR,ACC,STP,2
27$ LDA,ACC,,1
28$ STR,ACC,STP,3
29$ BAL,HK,EXT,11
newline
30$ BAL,WK,EXT,7
64
hanoi(n-1,6-p1-p2,p2)
31$ LOAD,ACC,DR2,2
32$ SUB,ACC,COT,O
33$ STR,ACC,STP,2
31$ LDA,ACC,,6
35$ SUB,ACC,DR2,3
36$ SUB,ACC,DR2,4
37$ STR,ACC,STP,3
38$ LOAD,ACC,DR2,4
39$ STR,ACC,STP,4
40$ BAL,WK,,4
%finish
11$ FILL,10000,9,41
%end
41$ FILL,ALLOC,7,5
41$ LDA,STP,DR2,0
42$ LOAD,DR2,STP,0
43$ LOAD,WK,STP,1
44$ B,,WK,0
45$ FILL,SKIP,3,45
%integer a,b,c
1:read(a)
45$ LDA,ACC,DR1,2
46$ STR,ACC,STP,2
47$ BAL,WK,EXT,10
%if a=0 %then %stop
48$ LOAD,ACC,DR1,2
49$ BNZ,ACC,,0
50$ STOP,,,0
51$ FILL,10001,49,51
read(b) ;
51$ LDA,ACC,DR1,3
52$ STR,ACC,STP,2
53$ BAL,WK,EXT,10
read(c)
54$ LDA,ACC,DR1,4
55$ STR,ACC,STP,2
56$ BAL,WK,EXT,10
65
hanoi(a,b,c)
57$ LOAD,ACC,DR1,2
58$ STH,ACC,STP,2
59$ LOAD,ACC,DR1,3
60$ STR,ACC,STP,3
61$ LOAD,ACC,DR1,4
62$ STR,ACC,STP,4
63$ BAL,WK,,4
->1
64$ B,,,45
%endofprogram
65$ FILL,ALLOC,2,5
65$ STOP,,,0
66$ FILL,COT,0,66
66$ CONST,,,1
67$ FILL,STACK,1,67
$ 0 FAULTS IN PROGRAM
66
1. a. Identifier labels instead of numerical labels:
LOOP: ->NEXT
NEXT: ->LOOP
b. Switches:
%SWITCH TYPE(1:9)
TYPE (I): ->TYPE(I*J)
2. a. Indefinite loops:
%CYCLE
. . . .
%REPEAT
b. Incremental loops:
%FOR I=1,1,N %CYCLE
. . . .
%REPEAT
c. 'While' loops:
FRED %WHILE I>0
%WHILE I>0 %CYCLE
. . . .
%REPEAT
d. 'Until' loops:
FRED %UNTIL I>0
%CYCLE
. . . .
%REPEAT %UNTIL I>0
e. %EXIT and %CONTINUE statements.
3. a. Reversed order conditional statements:
I=J*K %IF I=0
b. 'Unless' conditions:
%UNLESS I=J %THEN K=L
c. 'else-if' clauses:
%IF I=0 %THEN J=0 %ELSE %IF K=0 %THEN L=0
d. Multi-sided comparisons:
%IF A<B<C<D %THEN E=F
68
4. a. Multi-dimensional arrays:
%INTEGERARRAY A(1:9,-1:1),B,C(1:N,1:M*N,0:1)
b. Array bound checking:
5. a. Own variables:
%OWNINTEGER I,J=123,K=456
b. Own arrays:
%OWNINTEGERARRAY L(1:10)=1,2,3.4,5,6(5)
c. Constant variables and arrays:
%CONSTINTEGER I=1234
%CONSTINTEGERARRAY J(1:3)=2,7,4
6. Real variables:
%REAL X,Y,Z
%REALARRAY A(1:N)
%REALFN RF(%REAL X,%REALNAME Y,%REALARRAYNAME Z)
SKIMPAI has, built-in, the real operations:
ADDF, SUBF, MLTF, DIVF, EXPF, NEGF, FLT
7. String variables:
%STRING(15) S
%STRING(7)%ARRAY T(1:10)
%STRING(99)%FN SF(%STRING(3) S)
S=S."FRED".SF("JOE").T(I)
8. Records:
%RECORDFORMAT RF(%INTEGER I,J,%RECORD(RF)%NAME K)
%RECORD(RF) R
R_I=R_J+1
X=R_K_K_I
9. a. INTERDATA object code
b. PDP-11 object code
c. VAX 11-780 object code
69
The object file output from the SKIMP compiler forms the input to the
assembler/interpreter. The first stage identifies statements to be assembled by
means of the '$' symbol following code addresses. It is the remainder of such
lines that are regarded as significant. Any other lines such as compiler
monitoring are ignored. This also allows interpreter directives to be inserted
in the program by using SKIMP comments as shown:
! $ . . . .
The interpreter directives available are:
MONITOR
TRON
TROFF
The first of these, 'MONITOR', when encountered during interpretation prints out
the contents of the registers and the stack. Since this may generate a
substantial amount of output, this facility should be used sparingly. The
second and third turn on and off respectively a trace facility which prints out
the value of the program counter for each instruction interpreted. This can
also be turned on by means of a '/TR' parameter in the call of 'SKIMPAI'.
Tracing may also generate large amounts of output| For example:
! $ MONITOR
The interpreter will not attempt to interpret programs which failed to
compile correctly. Furthermore, the system makes various checks during
interpretation in an endeavour to ensure that the program has not gone out of
control. These include unassigned variable checking, that data addresses lie
within the stack or constant table and that program counter addresses lie within
the code area. A limit of 10000 instructions executed is also imposed to catch
loops.
To demonstrate the use of these facilities, consider the following program:
%BEGIN
%INTEGERARRAY A(1:10)
%INTEGER I
! $ TRON
I=1
1: A(I)=I*I
I=I+1
%IF I<=10 %THEN ->1
! $ TROFF
! $ MONITOR
%ENDOFPROGRAM
The output from the compiler and the interpreter are shown below:
71
SKIMP COMPILER MKII
FILE: TEST
OPTIONS:
%BEGIN
0$ LDA,COT,,0
1$ LDA,DR1,,0
2$ LDA,STP,DR1,0
%INTEGERARRAY A(1:10)
3$ LDA,ACC,,1
4$ STR,ACC,DR1,2
5$ LDA,ACC,,10
6$ LDA,ACC,ACC,1
7$ STR,ACC,DR1,3
8$ SUB,STP,DR1,2
9$ STR,STP,DR1,4
10$ ADD,STP,DR1,3
%INTEGER I
! $ TRON
I=1
11$ LDA,ACC,,1
12$ STR,ACC,DR1,5
1: A(I)=I*I
13$ LOAD,ACC,DR1,5
14$ MLT,ACC,DR1,5
15$ STR,ACC,DR1,2
16$ LOAD,ACC,DR1,5
17$ ADD,ACC,DR1,4
18$ LOAD,WK,DR1,2
19$ STR,WK,ACC,0
I=I+1
20$ LOAD,ACC,DR1,5
21$ ADD,ACC,COT,0
22$ STR,ACC,DR1,5
%IF I<=10 %THEN ->1
23$ LOAD,ACC,DR1,5
24$ SUB,ACC,COT,1
25$ BHG,ACC,,13
72
! $ TROFF
! $ MONITOR
%ENDOFPROGRAM
26$ FILL,ALLOC,2,6
26$ STOP,,,0
27$ FILL,COT,0,27
27$ CONST,,,1
28$ CONST,,,10
29$ FILL,STACK,1,29
$ 0 FAULTS IN PROGRAM
11$ 12$ 13$ 11$ 15$ 16$ 17$ 18$ 19$ 20$
21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$
18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$
15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$
25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$
22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$
19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$
16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$
13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$
23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$
20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$
17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$
14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$
24$ 25$
COT 27
DR1 29
STP 45
ACC 1
WK 100
29$ ? ? 100 11 34 11 1 4 9 16 25 36 49 64 81
100 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
? ? ?
STOPPED AT 26$, 143 INSTRUCTIONS EXECUTED
73