head 1.2; access; symbols; locks gtoal:1.2; strict; comment @# @; 1.2 date 2010.02.11.22.57.44; author gtoal; state Exp; branches; next 1.1; 1.1 date 2010.02.11.22.57.17; author gtoal; state Exp; branches; next ; desc @Skimp Mk II manual @ 1.2 log @2 typos @ text @
This document is a transcription of these scans.

NB: The typesetting system and the printer used for the original document used a monospaced font, but with fractional width spacing between words to provide fully justified paragraphs of text.

Since this spacing is difficult to reproduce and represent using HTML/ASCII, the justification is approximated using full sized spaces instead.


















                                  SKIMP  MK II


                                   D. J. Rees












































                                  Introduction

   SKIMP is a  very simple language designed with  the teaching of compilers  in
mind.   Although  simple,  it  possesses many  of  the  structural  features  of
full-scale languages, for example the subroutine and parameter structure, making
it a  worthwhile non-trivial  vehicle for  teaching.   Many  useful features  of
full-scale  languages are  missing  but can  bp  incorporated with  very  little
difficulty.  This is quite intentional.  The reasons are firstly that SKIMP is a
sufficiently rich language for the  compiler to be written in SKIMP  itself with
little inconvenience (the SKIMP MKI compiler was written  in SKIMP) and secondly
that  incorporation of the missing  features provides  a  very suitable form  of
student exercise.   Although  it  would be  desirable for  students to  write  a
complete compiler of  their own,  this is a considerable  task which  they would
probably not have the time to complete.  The task of extending a simple language
such  as SKIMP for which  a compiler  already exists is  much more feasible  and
still  allows  students to get  to  grips  with  the compiling  process  and  so
understand it more fully.

   The name SKIMP arises fron the  language's origin.  Essentially it is a  much
smaller  version of the  language IMP, an  ALGOL-60 type  language developed  in
Edinburgh for systems implementation and general use.

   Since SKIMP  is intended for teaching rather than  production programming, it
has not been necessary for the compiler to produce object  code for a particular
real machine.  Instead, the opportunity has been taken to invent a machine whose
order  code does not  have the inconsistencies  and exceptional  cases that  all
actual machines seen  to have.   The compiler  produces symbolic  code for  this
invented machine together  with a certain amount  of optional monitoring of  the
compilation.

   This document describes the SKIMP language, the object machine and details of
the method of compilation.  Examples of compiler output and suggested extensions
to SKIMP  suitable for student  exercises are given  in appendices.   To enable
students  to test  their modified  compilers  an assembler/interpreter  for the
object code is available.  The use of this is also described in an appendix.




























                                       1


                                    Chapter 1

                               The SKIMP Language

   A SKIMP program consists of a sequence of keywords, identifiers and constants
together with arithmetical operators and various separator characters.


Keywords


   A keyword consists of a sequence  of upper  or lower case letters preceded by
the character '%'.  For example:

                 %ROUTINE         %END          %THEN

The '%' character is regarded as a shift character in that all following letters
up to the  first non-letter are shifted to keyword mode.   This implies  that no
spaces  or other  non-letter characters may  be inserted  into the  middle  of a
keyword.  The '%' can, however, be repeated after a space.  Thus,

                %INTEGERNAME      and     %INTEGER %NAME

are valid and equivalent, but

                             %INTEGER NAME

is invalid.


Identifiers


   An identifier consists of  a  string of letters and digits of any length the
first character of which must be a letter.  For example,

                       SKIMP    I     J2     X1A

In descriptions of SKIMP syntax below, the phrase <NAME>  is used to denote the
presence of an identifier.


Constants


   Constants appear  as operands  in arithmetic expressions  and may  be of  two
forms, decimal and  character.  Both represent  integer valued quantities  since
SKIMP only has integer valued variables and performs only integer arithmetic.  A
decimal constant is represented by a sequence of digits.  For example:

                         123     98765       4

No decimal points or powers of 10 are allowed.

   Strings  of up  to four  characters can  be  represented by  surrounding  the
characters with quotes.  For example:

                     'FRED'       '123*'       'X'



                                       2
The internal  value is formed  by  packing the (ASCII) values  of the characters
right-justified in 8-bit  fields of  a word.   A  quote can  become one  of  the
characters of the value by repeating it.  For example:

                           'JIM''S'        ''''


Input Lexical Processing


   Spaces  are deleted  on  input everywhere  except  when they  appear  between
quotes.   Before deletion,  however, they  are  significant in  terminating  the
keyword '%' shift mode.   Text between quotes remains intact with the  exception
that a repeated  quote is regarded as  a single quote.   Newlines can  therefore
appear between  quotes.   For  example,  to test  for  the value  of  a  newline
character:

                            %IF  I = '
                            '  %THEN  %RETURN

This is equivalent to:

                       %IF  I = 10  %THEN  %RETURN

since the ASCII value of a newline character is 10.

   SKIMP statements are  separated by either  ';' or  a newline.   Thus  several
statements can appear on one line of text.  For example:

                       X =  1 ;  Y = 2  ;  Z = 3

In  order to represent  a single statement  on more  than one line  intermediate
newline characters can be inserted when preceded by the continuation keyword %C.
For example:

                         %IF  X = 1  %THEN  %C
                         A = B

This is equivalent to:

                         %IF  X = 1  %THEN  A = B


Program Structure


   A SKIMP program starts with the statement:

                                     %BEGIN

and ends with the statement:

                                 %ENDOFPROGRAM

   Inside the program, routines and  functions may  appear in sequence or nested
as required.  A routine takes the form:







                                        3
                             %ROUTINE  RT
                             .....
                             %END

and a function takes the form:

                             %INTEGERFN  FN
                             .....
                             %END

The only difference between  routines and functions is that  functions produce a
value as their  result and routines  do not.   The structure of a  program could
take the following form, for example:

                           %BEGIN
                              %ROUTINE  A
                              .....
                              %END
                              %INTEGERFN B
                                 %ROUTINE  C
                                 .....
                                 %END
                              .....
                              %END
                           .....
                           %ENDOFPROGRAM

Details of routines and functions and their parameters are given later.


Declarations


   Names of  variables  and arrays denoting storage objects are declared at  the
start of a  routine or function.   Execution of  declaration statements has  the
affect of setting aside storage for those variables or arrays.  When the routine
or function in which the declaration  occurred is returned from, that storage is
deleted so that  it may be re-used for future declarations.   When routines  and
functions are entered recursively new storage is set aside for the variables and
arrays declared there without the storage for the old incarnation of the routine
or functions  being lost.   This  becomes accessible  again when  the  recursive
activation is left.  A stack is used to accomplish this.  The only objects SKIMP
manipulates are integer valued.

   All names must be declared before they can be used.  Scalar declarations take
the form:

                     %INTEGER <NAME>, <NAME>, <NAME>, . . .

Arrays are declared:

              %INTEGERARRAY <NAME>, <NAME>, . . .( <EXPR> : <EXPR> )

where  the phrase  <EXPR> is  used to  denote  the presence  of an  arithmetical
expression.  For example:

                               %INTEGER  I, J, K

sets aside three locations and names them  'I', 'J' and 'K'.  The expressions in




                                       4
the array declaration are the lower and upper bounds on the index of the arrays.
They may be either constants, for example:

                           %INTEGERARRAY A ( I : 10 )

or expressions which are evaluated at run-time, for example:

                         %INTEGERARRAY B, C ( -1 : N+1 )

Only one-dimensional arrays are defined in SKIMP (but see possible extensions in
Appendix 2).

   Note that  the first statement  of a routine  or function  has the effect  of
declaring its name.   Hence,  this must appear  before any calls  on the  names.
This  allows self-recursive  calls but  not  cross-recursive  calls between  two
routines at the  same nesting level. (SKIMP does  not have routine and  function
SPECs as in IMP).


Scope of Variables


   When a variable or array is declared, that name  can be used anywhere in the
routine or function in  which the declaration appears, but not  outside it.  If
the routine or  function has nested routines or functions the  name can also be
used in these (as 'global' to them).  For example:

                            %ROUTINE  A
                            %INTEGER I
                               %ROUTINE B
                               %INTEGER J
                               ....
                               %END
                               %ROUTINE C
                               %INTEGER K
                                  %ROUTINE D
                                  ....
                                  %END
                              ....
                              %END
                           ....
                           %END

The name 'I' can be used anywhere in routines  'A', 'B', 'C' and 'D'.   The name
'J' can be used only in routine 'B' and the name 'K' can be used in routines 'C'
and 'D'.

   Although, clearly, all the names of variables declared in a single routine or
function must be distinct, the same names may be declared  in different routines
to refer to different  storage objects.  When a  name is redeclared  in a nested
routine or function, the storage for the global  object of the same name becomes
inaccessible whilst within that routine or function.  The name will always refer
to the most 'local' declaration.


Arithmetic Expressions


   An arithmetic  expression consists  of a  seauence of  operands separated by
operators, possibly preceded by a  unary operator (*, -, \)  .  Operands can be



                                        5
array elements,  constants, calls  on functions  or bracketted  sub-expressions.
The operators are:

                  +       :       addition
                  -       :       subtraction
                  *       :       MultIplication
                  /       :       division (rounded)
                  **      :       exponentiation
                  <<      :       logical shift left
                  >>      :       logical shift right
                  &       :       logical AND
                  |       :       logical OR
                  ||      :       logical Exclusive OR
                  \       :       logical NOT

Their precedences are:

                        \             :       highest
                   **   <<    >>
                   *    /     &
                  +   -   |   ||      :       lowest

The precedence is left to right for operators of equal precedence. Thus:

                                    1 / J * K

is equivalent to:
                                 ( 1 / J ) * K

Array element operands take the form:

                                <NAME> ( <EXPR> )

For example:

                                     A ( 1 )
                                B ( 2 * A ( J ) )


Assignment


   Variables and array elements are assigned new values using statements of the
form :

                                 <NAME> = <EXPR>
                           <NAME> ( <EXPR> ) = <EXPR>

For example:
                                     I  =   2
                          A ( I * J ) = ( K + L ) / 2












                                        6
Routines and Functions


   A routine  is called by executing a statement  consisting of the name  of the
routine.  For example:

                             %ROUTINE  RT
                             ....
                             %END

would be called by the statement:

                                       RT

Similarly, the call of a function appears as an  operand of an  expression.  For
example:

                              %INTEGERFN   FN
                              ....
                              %END

might be called by:

                                         I = FN

Routines and functions can be called recursively to any  depth.  The exit from a
routine is specified by the statement:

                                    %RETURN

This is also  implied by executing  the %END statement of a  routine,  The exit
from a function must give in addition a result for the function:

                                %RESULT = <EXPR>

For example:

                              %RESULT = I * J - K

Execution of the %END statement of a function is invalid.  A routine or function
can only be entered by an appropriate call as described  above.  If a routine or
function is encountered in the normal flow of  control of a  program at run-time
it is skipped around.


Routine and Function Parameters


   Parameters of the following types can be passed to routines and functions:

                                    %INTEGER
                                  %INTEGERNAME
                                %INTEGERARRAYNAME

The first is a 'value' type  parameter and the other two are  'reference' types.
For the  %INTEGER type, the  value of an expression  in the routine or  function
call is calculated and passed to the routine or function as the initial value of
the parameter.  The initialised parameter can be regarded thereafter as a  local





                                       7
variable.  For the %INTEGERNAME type, a reference to some storage object outside
the routine or function is passed and this  object will be  referred to whenever
the parameter is  referenced within the  routine  or  function.  Hence  the only
admissible forms of parameter in the call  are either  the name of a variable or
the name of  an array  element.   Similarly for  the %INTEGERARRAYNAME   type, a
reference to  an  array is  passed and  the  only  admissible form  of  'actual'
parameter is the name of an array. For  example:

       %INTEGER A,B,C
       %INTEGERARRAY M,N (1:10)
       %ROUTINE JIM(%INTEGER I,%INTEGERNAME J, %INTEGERARRAYNAME  K)
       ....
       %END
       JIM (A+B,C,M)
       JIM (A,M(B),N)


Jumps


   SKIMP labels are numerical,  For example:

                                123:        987:

To jump to these labels the instructions would be:

                              ->123          ->987

Labels are local to a  routine or function and jumps can only take place  within
the routine or function and not either to an outer textual level or to a  nested
inner level.   Labels  are not  declared and  jumps may  precede or  follow  the
occurrence of the corresponding label without restriction.


Conditional Statements


   The general  form of the conditional statement is:

                      %IF <COND> %THEN <INSTN> %ELSE <INSTR>

where  <COND> represents the condition to  be tested and  <INSTR> represents  a
restricted class of instructions described below.  If the condition is true the
first of these instructions is executed, otherwise the second.  The %ELSE clause
can be omitted in which case if the condition is false execution proceeds to the
following statement.

   A <COND> is formed by one or more simple relations of the form:

                              <EXPR> <COMP> <EXPR>

where <COMP> represents one of the following comparators:

                            =   *   <=    <   >=   >

For example:

                                   I < J + K

These simple relations can  be combined using the  keywords  %AND and %OR.   For



                                       8
example:

                                I = J  %AND  K = 0

No precedence is defined between  %AND and %OR.  Instead, brackets must be used.
A single level of bracketting  may only contain a sequence of simple comparisons
connected with either %AND or %OR but not both. For example:

                 I = 0  %AND  ( J = 0  %OR  K = 0  %OR  L  = 0 )

The relations are evaluated  left to right  but  only up to the point  where the
truth or falsity of the whole condition can be determined.

   An <INSTR> may be any of the following types of statement:

                          assignment
                          routine call
                          jump
                          %RETURN
                          %RESULT=<EXPR>
                          %STOP

For example:

                            %IF  I  = J  %THEN  %RETURN
                       %IF  K > 0  %THEN  K = 0   %ELSE  K = -1

In addition, groups of statements can  take the place of an <INSTR> and be  made
conditional.   The grouping  is made  by preceding  the first  statement by  the
statement:

                                     %START

and following the last one by the statement:

                                    %FINISH

For example:

                    %IF  J = K %THEN L = 0 %ELSE %START
                       %IF J < K %THEN %START
                         L = 1
                         M = 2
                       %FINISH  %ELSE %START
                         L = 2
                         M = 1
                       %FINISH
                    %FINISH

%STARTs and  %FINISHes may be nested  to any depth but  must balance within any
routine or function.


Comments


   Comment statements start with the exclamation character '|'. Thereafter, all
text is ignored up to the next  statement separator i.e. semi-colon or newline.
For example:




                                        9
           ! this is a comment
   I = 0      ;! so is this ;   J = 0


Input-Output


   The standard IMP set of  input-output routine  names are  built into SKIMP.
These are:

  %ROUTINE READ SYMBOL(%INTEGERNAME X)
    Read a single symbol from the current  input stream into X.

  %INTEGERFN NEXT SYMBOL
    Return as the result the next single symbol which will be read.

  %ROUTINE SKIP SYMBOL
    Skip over the next single symbol in the input.

  %ROUTINE PRINT SYMBOL(%INTEGER X)
    Output the single symbol X.

  %ROUTINE SPACE.
    Output a space symbol.

  %ROUTINE SPACES(%INTEGER X)
     Output X spaces.

  %ROUTINE NEWLINE
    Output a newline symbol.

  %ROUTINE NEWLINES(%INTEGER X)
    Output X newlines.

  %ROUTINE NEWPAGE
    Output a newpage symbol.

  %ROUTINE READ(%INTEGERNAME X)
    Read a decimal number into X skipping preceding spaces and newlines.

  %ROUTINE WRITE(%INTEGER X, Y)
    Output the decimal number X of Y digits.





















                                        10


                                   Chapter 2

                              The Object Machine.


   The SKIMP compiler produces object code output for an invented  machine.  The
output takes the form of what would be this machine's assembly  code rather than
a binary form so that it can easily be inspected by students.  Since the machine
is only an invented one and does not exist, the object code cannot be run in the
ordinary way,  but an  assembler/interpreter for  this code  is available  which
allows the output to be tested.

   No attempt  has been  made to  define the  machine  completely.   Only  those
features necessary  and  relevant for  the  compiler  are  defined.    Interrupt
structure and input-output instructions, for example, are not defined.

   The  machine  has  a  number  of  registers  which  can  either  be  used  as
accumulators or as index registers.  In this respect,  it is similar to machines
such as  IBM 370, DEC  10 etc..   It is  not a  true multi-accumulator  machine,
however, in that no register to register arithmetic operations are defined.  The
instruction format is shown below:

                   operation , register , base , displacement

For example:

                                LOAD, ACC, DR1, 2


   Instructions operate  between the register and  an  effective storage address
formed from the sum of the contents  of the  base register and the displacement.
The  machine  is word-addressed  and  assumed  to  have  sufficient  storage and
registers for the purposes of the compiler.  Although not strictly necessary, it
is also useful to define a word size and the instruction word layout.  A  32-bit
word is envisaged with the layout:

           +----------------+------------+--------+------------------+
           |   operation    |  register  |  base  |   displacement   |
           +----------------+------------+--------+------------------+
                   (8)           (4)        (4)          (16)

This provides for 16 registers and a displacement of  up to 65535.  All  sixteen
registers, 0-15, can be used in the register field of the  instruction, but only
registers 1-15 in the base field allowing  a zero in the base field to  indicate
no indexing in the effective address.

   The  compiler object code  output does not  in practice use register numbers.
Instead,  it  uses  register  mnemonics  for  convenience  of  inspection.   The
mnemonics used by the compiler are:

                  ACC, STP, COT, WK, DR0, DR1, DR2, DR3,.....

Their individual use is explained in later chapters.








                                        11
   The operations defined for this machine are:

LOAD   :   Load  the register with  the contents  of the word  at the  effective
           storage address.

LDA    :   Load the register with the effective storage address.

ADD    :   Add the contents of the word  at the effective storage address to the
           register.

SUB    :   Subtract the  contents of  the word at the effective storage  address
           from the register.

MLT    :   Multiply the  register by the contents  of the word at the  effective
           storage address.

DIV    :   Divide  the register by  the contents of  the word  at the  effective
           storage address.

EXP    :   Raise the register to the power given  by the contents of the word at
           the effective storage address.

STR    :   Store the  contents of the register in  the location at the effective
           storage address.

NEG    :   Negate  the  contents of  the  register  (the  effective  address  is
           ignored).

NOT    :   Perform the  logical NOT on  the register  (the effective  address is
           ignored).

SHL    :   Shift the register logically left by the amount given by the contents
           of the word at the  effective  storage address (zeros shifted in from
           the right, bits shifted out to the left lost).

SHR    :   Shift  the  register logically  right  by  the amount  given  by  the
           contents of the word at the effective  storage address (zeros shifted
           in from the right, bits shifted out to the right lost).

AND    :   Perform the logical AND  between the register and the contents of the
           word at the effective storage address.

OR     :   Perform the logical OR between the  register  and the contents of the
           word at the effective storage address.

XOR    :   Perform  the logical  Exclusive  OR  between  the  register  and  the
           contents of the word at the effective storage address.

BAL    :   Unconditional branch to the effective  address and the address of the
           next instruction stored in the specified register.

B      :   Unconditional branch to the effective  address  (any register present
           is ignored).

BZ     :   Branch to the effective address if the  contents of  the register are
           zero.

BNZ    :   Branch to the effective address if the  contents of  the register are
           not zero.




                                       12
BG     :   Branch to the effective address if  the contents of the  register are
           greater than zero.

BNG    :   Branch to the effective address if  the contents of the  register are
           not greater than i.e. less than or equal to, zero.

BL     :   Branch to the effective address if  the contents of the  register are
           less than zero

BNL    :   Branch to  the effective address if the contents of the  register are
           not less than i.e. grater than or equal to, zero

STOP   :   Stop execution.

   In addition to  these instructions, the compiler  also  outputs an  assembler
directive 'FILL' in  the same instruction format.   This is  used to direct  the
assembler  to  fill  in previously  assembled  holes  in the  object  code  with
appropriate values.  This  is intended primarily for dealing  with forward jumps
and is described in detail in later chapters.












































                                       13
                                   Chapter 3

                      The Structure of the SKIMP Compiler


   The SKIMP compiler consists of a set of external routines written in IMP.  It
is a 1-pass system in that the compiler makes one pass over the  source text and
generates the object code statement by statement.   Each  statement is processed
separately and the processing consists of four phases. These are:

                      1  :  line reconstruction
                      2  :  lexical analysis
                      3  :  syntax analysis
                      4  :  code generation

The flow of control through these phases is shown in figure 3.1.

                           +--------------------+
          +--------------->|  line reconstruct  |
          |                |   next statement   |
          |                +----------+---------+
          |                           |
          |                +----------+---------+
          |                |  analyse  lexical  |
          |                |      structure     |
          |                +----------+---------+
          |                           |
          |                +----------+---------+
          |                |  analyse syntactic |
          |                |      structure     |
          |                +----------+---------+
          |                           |
          |                 +---------+--------+
          |                 |  valid syntax ?  |
          |                 +-+--------------+-+
          |                   | no        yes|
          |                   |              |
          |    +--------------+---+      +---+-----------+
          |<---| print "SYNTAX ?" |      | generate code |
          |    +------------------+      +-------+-------+
          |                                      |
          |                           +----------+---------+
          |                           |  end of program ?  |
          |                           +-+---------------+--+
          |                             |no          yes|
          +-----------------------------+               |


                                 figure 3.1


Syntax Definitions


   For convenience  when continually modifying  and developing  the  compiler, a
description of  the syntax of  statements is read  in from  a  file each  time a
program is compiled.  The syntax takes the form of phrase structure definitions.






                                       14
For example:
               <DIGIT> = '0','1','2','3'.'4','5','6','7','8','9';

This would define a phrase <DIGIT> as any of the digits 0 to 9.  Characters such
as these  digits  are  enclosed within  single  quotes  as  shown.    Groups  of
characters can  also appear between  quotes if  required.   Commas separate  the
alternative  forms  the  phrase  may  take  and  a   semi-colon  terminates  the
definition.

   Keywords are enclosed within double quote characters.  For example:

                         <PROC> = "ROUTINE" , "INTEGERFN" ;

Note that  the  '%' character  can be  omitted.   When  it  is  not  clear  what
constitutes a single keyword,  for instance %INTEGERARRAYNAME, it may be divided
up as convenient.  For example, the forms:

                               "INTEGERARRAYNAME"
                            "INTEGER" "ARRAY" "NAME"

are both valid and equivalent.

    A decimal number might be defined by the phrase <N> as follows:

                          <N> = <DIGIT> <N> , <DIGIT> ;

That is, a number is either a digit followed by a  number or is a single  digit.
This illustrates how phrases can be defined recursively.

   It is also possible to have a null alternative i.e.  one with no  items.  For
example, if it was desired to define a  decimal number optionally  preceded by a
plus or minus sign, a phrase <SIGN> could be defined:

                             <SIGN> = '+' , '-' , ;

Then the required definition could be:

                                <SN> = <SIGN> <N> ;

In  other words,  the third  alternative form  of  <SIGN>, namely  nothing,  can
precede the <N>.  A phrase definition with only one alternative such as this  is
rather redundant but nevertheless quite  valid.  The particular syntax  analysis
scheme used  in the SKIMP  compiler  imposes certain restrictions  on the  forms
phrase definitions  may take  but discussion  of these  is postponed  until  the
scheme itself has been described.

   The syntax definitions  for SKIMP  are shown  in figure  3.2.   The types  of
statement  in SKIMP are  defined by  the phrase  <STATEMENT> and  the  remaining
definitions follow from this.  There are two phrases built into the compiler and
which therefore do not need to be defined.  These are <NAME> and <CONST>.   They
represent identifiers and  constants respectively.   The reason  for them  being
built-in  is  that identifiers  and  constants  are recognised  at  the  lexical
analysis  stage and hence  do  not require  definitions for  use by  the  syntax
analyser.









                                        15
 <STATEMENT>= <INSTR>,
              "IF"<COND>"THEN"<INSTR><ELSE>,
              <CONST>':'<STATEMENT>,
              "FINISH"<ELSE>,
              "INTEGER"<ARRAY>,
              <PROC> <NAME><FORMAL>,
              "END"<OFPROG>,
              "BEGIN";
 <INSTR>    = <NAME><ACTUAL><ASSIGN>,
              '->'<CONST>,
              "START",
              "RETURN",
              "RESULT"'='<EXPR>,
              "STOP";
 <EXPR>     = <UNARY><OPERAND><EXPRREST>;
 <UNARY>    = '-','\','+', ;
 <OPERAND>  = <NAME><ACTUAL>,<CONST>,'('<EXPR>')';
 <EXPRREST> = <OP><OPERAND><EXPRREST>,   ;
 <OP>       = '<<','>>','&','||','|','**','/','*','+','-';
 <COND>     = <TEST><CONDREST>;
 <TEST>     = <EXPR><COMP><EXPR>,'('<COND>')';
 <COMP>     = '=','#','<=','<','>=','>';
 <CONDREST> = "AND"<TEST><ANDCOND>,"OR"<TEST><ORCOND>, ;
 <ANDCOND>  = "AND"<TEST><ANDCOND>, ;
 <ORCOND>   = "OR"<TEST><ORCOND>, ;
 <ELSE>     = "ELSE"<INSTR>, ;
 <ACTUAL>   = '('<EXPR><EXPRS>')', ;
 <EXPRS>    = ','<EXPR><EXPRS>, ;
 <ASSIGN>   = '='<EXPR>, ;
 <ARRAY>    = "ARRAY"<NAME><NAMES>'('<EXPR>':'<EXPR>')',<NAME><NAMES>;
 <NAMES>    = ','<NAME><NAMES>, ;
 <PROC>     = "ROUTINE","INTEGERFN";
 <FORMAL>   = '('"INTEGER"<FORM><NAME><NAMES><FORMALS>')', ;
 <FORM>     = "ARRAYNAME","NAME", ;
 <FORMALS>  = ','"INTEGER"<FORM><NAME><NAMES><FORMALS>, ;
 <OFPROG>   = "OFPROGRAM", ;


                            figure 3.2


Syntax Reduction


   When the syntax definitions are read in, they are reduced to a numerical form
to suit the syntax analyser.  The keywords are also picked out and built up into
a dictionary for the benefit of the lexical analyser.  This dictionary is a tree
structure of cells which each  represent an individual  keyletter.   When two or
more keywords start with the  same keyletters, the same  cells are  used for the
letters in  common.   For  example,  the  keywords  %START  and  %STOP would  be
represented by the structure shown in figure 3.3.












                                        16
            +---+---+---+
            | S |   |   |
            +---+-+-+---+
                  |
                +-+-+---+---+
                | T |   |   |
                +---+-+-+---+
                      |
                    +-+-+---+---+     +---+---+---+
                    | A |   |   |---->| O |   |   |
                    +---+-+-+---+     +---+-+-+---+
                          |                 |
                        +-+-+---+---+      +-+-+---+---+
                        | R |   |   |      | P |   |   |
                        +---+-+-+---+      +---+---+---+
                              |
                            +-+-+---+---+
                            | T |   |   |
                            +---+-+-+---+


                                    figure 3.3

As can be seen, the  letters S and T share  the  same cells.   Each cell has two
links in addition.  The first links together cells containing successive letters
in keywords and the second links together  cells containing alternative letters.
Thus, the cell  containing O  is  linked from the cell  containing A through the
second link since A and  O are the  alternative letters which  may follow T.  In
addition to the three  items already  described,  ther is a further  item in the
cells at the leaves of the tree  which  contains an identification number of the
keyword which terminates there.  These numbers are assigned as the dictionary is
created and range  from  128 upwards  in order  to distinguish them  from normal
characters at the  lexical  analysis  stage.    To improve  dictionary searching
efficiency, the first letter  of each keyword is used  to index into an array of
26 cells.  Tree searching only takes place thereafter.

   Just as keywords  are assigned identification numbers,  so  are phrase names.
Phrases <NAME> and  <CONST> are assigned the numbers  256  and 257 respectively,
and other phrases  the numbers  258 upwards.    These  numbers  are used  in the
reduced numerical form of the syntax definitions.  In  general, the reduced form
of a definition is as shown in figure 3.4:

      +----------------------->  +---------------------->  +-- - - -->
      |                        | |                       | |          |
    +-+-+-----+---------------++-++-----+---------------++-++-- - --+-+-+
    |   | NT1 | alternative 1 |   | NT2 | alternative 2 |   |       | 0 |
    +---+-----+---------------+---+-----+---------------+---+-- - --+---+


                                   figure 3.4

The zero signifies the end of the definition.  The contents of  the alternatives
fields in the diagram  correspond one for one with the  items in  the definition
read in.  They are either the numerical equivalents of the literal characters or
keyword or phrase  identification numbers.   NT1, NT2  etc. are  the  number  of
phrase items in that alternative.  This is of relevance to  the syntax analyser.
The definitions reduced in this way are  stored one after another in an array as
shown in figure 3.5.





                                       17
     1      4   1 259  11   3 135 260 145 259 261 16   2 257  58 258  20
    17      1 133 261  24   1 136 262 29   3 263 256 264  33   1 131 265
    33     36   0 130   0  42   3 256 266 267 47   1  45  62 257  50   0
    49    143  53   0 140  58   1 141 61 268  61   0 144   0  67   3 269
    65    270 271   0  71   0 45  74   0  92  77   0  43  79   0   0  84
    81      2 256 266  87   1 257 92   1  40 268  41   0  98   3 272 270
    97    271 100   0   0 105   0 60  60 109   0  62  62 112   0  38 116
   113      0  33  33 119   0 33 123   0  42  42 126   0  47 129   0  42
   129    132   0  43 135   0 45   0  140  2 273 274   0 146   3 268 275
   145    268 151   1  40 260 41   0  155  0  61 158   0  35 162   0  60
   161     61 165   0  60 169   0 62  61 172   0  62   0 178   2 128 273
   177    276 183   2 138 273 277 185  0   0 191   2 128 273 276 193   0
   193      0 199   2 138 273 277 201  0   0 206   1 132 259 208   0   0
   209    215   2  40 268 278 41 217   0   0 223   2  44 268 278 225   0
   225      0 230   1  61 268 232  0   0 243   4 129 256 279  40 268  58
   241    268  41 247   2 256 279  0  253  2  44 256 279 255   0   0 259
   257      0 142 263   0 136 134  0  273  4  40 136 280 256 279 281  41
   273    275   0   0 280   0 129 137 283  0 137 285   0   0 294   4  44
   289    136 280 256 279 281 296  0   0 300   0 139 302   0   0


                                figure 3.5

The identification numbers of the keywords and phrases corresponding to this are
shown in figure 3.6.  This figure also gives the starting index positions in the
array of each of the phrase definitions.

                    Keywords            Phrases

                    128  AND            256     0   NAME
                    129  ARRAY          257     0   CONST
                    130  BEGIN          258     1   STATEMENT
                    131  END            259    37   INSTR
                    132  ELSE           260   136   COND
                    133  FINISH         261   202   ELSE
                    134  FN             262   233   ARRAY
                    135  IF             263   256   PROC
                    136  INTEGER        264   264   FORMAL
                    137  NAME           265   297   OFPROG
                    138  OR             266   209   ACTUAL
                    139  OFPROGRAM      267   226   ASSIGN
                    140  RETURN         268    62   EXPR
                    141  RESULT         269    68   UNARY
                    142  ROUTINE        270    80   OPERAND
                    143  START          271    93   EXPRREST
                    144  STOP           272   101   OP
                    145  THEN           273   141   TEST
                                        274   173   CONDREST
                                        275   152   COMP
                                        276   186   ANDCOND
                                        277   194   ORCOND
                                        278   218   EXPRS
                                        279   248   NAMES
                                        280   276   FORM
                                        281   286   FORMALS


                                  figure 3.6

If any faults are discovered in the syntax definitions by the reduction  routine



                                       18
the compiler will halt without attempting to compile the source.

Implementation Details


   The  routines  which  implement everything  described  in  this  chapter  are
contained in  the  file  'SKIMPA'.   The  routine  which  reads  in  the  syntax
definitions is named  'READ PS' and it takes the  definitions from a file  named
'SKIMPPS'.  The resultant reduced form is stored  in the array named 'PS'.   The
positions in  this where phrase definitions  start are given  by the array  'PP'
which is indexed  by  phrase identification number.   The  information given  in
figures 3.2, 3.5 and 3.6 is output to a file named 'SKIMPPSL'.

   The keyword dictionary is formed  from cells which are records of the  record
array named 'KD'.  Their format is:

                       %RECORDFORMAT KDF(%INTEGER L,N,A,B)

'L' represents the key letter,  'N' the identification number at a leaf cell and
'A' and  'B'  the two links used in the  tree structuring.   The creation of the
dictionary is  slightly involved since keywords in the dictionary have to be the
'lowest common denominator'  of  the keywords  in the syntax  description.  This
means that  keywords  may have to  be split up  if a new  keyword is encountered
which is a prefix of an existing one.  A local string array  'KA' of keywords is
used to aid this process.


   The main control  loop of the compiler which is illustrated in figure 3.1  is
implemented in  the  main  routine  named  'SKIMP'.   The  box  'generate  code'
corresponds to routine  'STATEMENT'  in file 'SKIMPB'.  Subroutines used by this
routine appear in the other files  'SKIMPC',  SKIMPD' and 'SKIMPE'.  The general
strategy in processing statements is to break them down  into component syntatic
parts and process each of  these  simpler  parts  individually.   At the highest
level,  each  alternative  form in  the  definition  of  phrase  <STATEMENT>  is
processsed separately by executing a  switch  on  the  alternative number in the
analysis record of the statement  (see next  chapter).  This is also done at the
lower hierarchical levels.


























                                       19
                                    Chapter 4

                Line Reconstruction, Lexical and Syntax Analysis


Line Reconstruction


   Line reconstruction is the first  phase of processing of  the raw source text
and is a simple cleaning up operation.  Its functions are:

                1.  to mark keyword letters
                2   to remove spaces
                3.  to concatenate continuation lines
                4.  to discard comments


   Since  the  use of  '%'  as a  prefix  can  lead  to  many  different ways of
representing  keywords (c.f. %INTEGERARRAYNAME above),  this variability must be
removed before they can conveniently  be  processed  any  further.  This is done
during line  reconstruction by deleting the '%' and adding the value 128 to each
keyletter.

   The only significance  of  spaces  is  in terminating keywords.  Hence, after
keyletters have  been  marked  spaces  are discarded.  All symbols within quotes
(and this can include spaces and newlines) are left intact, however.

   One statement at a time is read from the  source  file  i.e.  up to the first
semi-colon or newline.  Any  number of lines  may be concatenated using the '%C'
at the end of a line subject  only to  a  limitation of 256 characters in total.
This is dealt with  by  looking at the previous  character whenever a newline is
encountered.  If it is the value 'C'+128 i.e. a '%'-shifted 'C', then it and the
newline are discarded and processing continues with the next line.  If the first
character  of  a  statement is a  '!' the statement is regarded as a comment and
discarded.


Lexical Analysis


   The  cleaned  up  text  received  from  the  line  reconstruct  phase is next
processed to find lexical objects.  These are:

                        1.   keywords
                        2.   names
                        3.   constants

   The start of any lexical object is determined by inspecting each character in
turn.  A value greater than  128 indicates the start of a keyword.  The adjacent
characters also greater than  128 are  concatenated  into a string which is then
looked up in the  keyword  dictionary  described previously.  The identification
number thus produced (ranging from  128 upwards) is stored in the lexical output
array in place of the  keyletters.  Potentially  more  than  one  identification
number can result from a  compound keyword.  In this case the look up routine is
called repeatedly until all the keyletters have been identified.

   If a  character  value lies between 'A' and 'Z', it and the following letters
and digits comprise a name which is then looked up in a dictionary.  If the name





                                        20
is  already  in  the dictionary then its existing identification number is used.
If  it  is  not  already  in, it  is  inserted  and  a new identification number
generated  for  it. The  output  to the lexical array consists of the number 256
(the phrase number of <NAME>) followed by the identification number of the name.

   The identification numbers are the positions at which the names are stored in
a 'hash table'.  The hash table operates as follows.  A calculation is performed
on the  characters  of  the  name  itself  to produce  a  'hash index'.  If this
position  is  not  already occupied, the name is inserted at that point.  If the
position is already occupied the name stored there may or may not be the same as
the new  name being looked up.   If  it  is  the  same then no further action is
required.  If  it  is  not, a  'clash'  has  occurred  which  must  be resolved.
Numerous  methods of doing this have been invented but the simplest possible one
is used here.  This is to scan  forward  cyclically  inspecting  each successive
position in turn.  If the new name  is  encountered  in  any of these positions,
again no further  action is required.   If,  however,  an unoccupied position is
encountered then the new name cannot be in already and so can be inserted there.

   The requirements of the hash  calculation are that it should be fast and that
as far as possible different  names  should produce different indexes.  This is,
of course,  unattainable  in general  since  there are many more potential names
than indexes but a fairly  good  attempt  can be made.   The scheme used in this
compiler packs character values into a word, divides this word by a prime number
just less than the size  of  the  hash table and takes the remainder as the hash
index.

   Constants start with either a digit or a single quote character.  The lexical
action in this case is to  evaluate  the  constant  and to insert the number 257
(for phrase  <CONST>  followed  by  the  value  into  the  lexical  array.   For
character constants the value consists of the individual character values packed
into 8-bit fields of a word right justified.

   Characters which are not part  of  keywords,  names  or  constants are passed
through to the lexical array  unchanged.  Since characters have values less than
128, the type of a lexical item can be determined from the range it lies in.


Syntax Analysis


   The compiler  uses  a  top-down  recursive  descent method of syntax analysis
which is directed  by the table of reduced syntax definitions described earlier.
This  analyses  the  output  of  the  lexical  phase  with respect to the phrase
<STATEMENT>.  The algorithm is implemented as a routine which considers just one
phrase definition.  Where literal characters  or keywords appear as items in the
definition they are compared directly  with  the lexical information and where a
sub-phrase appears as an item, the routine is called recursively to match it.

   The technique involved in matching a phrase is  essentially  a search in that
each alternative in the phrase definition is considered in turn.  Similarly, the
items in  an  alternative  are  considered  one  after  another.  An alternative
matches if all the items in it have been matched.  If any item in the definition
does not match,  the  whole  alternative in which that item appears is abandoned
and the algorithm moves on to  try  the next alternative.  A flow diagram of the
algorithm is given in  figure  4.1.  Figure 4.2 shows the stages in the analysis
of the statement:

                             %INTEGER I,J





                                       21
                                 |
   compare phrase                |
+ - - - - - - - - - - - - - - - -|- - - - - - - - - - - +
|                                |                      |
                         +---------------+
|    +------------------>|    another    |--------------|-------->
     |                   | alternative ? |  no             failure
|    |                   +---------------+              |
     |                       yes |
|    |                           |                      |
     |                     +-----------+
|    |    +--------------->|  another  |----------------|-------->
     |    |                |  item ?   |  no               success
|    |    |                +-----------+                |
     |    |                  yes |
|    |    |                      |                      |
     |    |              +---------------+
|    |    |              |    phrase ?   |              |
     |    |              +---------------+
|    |    |              s |           | no             |
     |    |                |           |
|    |    |     + - - - - - -+       +------------+     |
     |    |     |   compare  |       |    item    |
|    |    |     |   phrase   |       |   match ?  |     |
     |    |     + - - - - - -+       +------------+
|    |    |  success |  | failure     yes |  | no       |
     |    |          |  |                 |  |
|    |    +<---------+--|-----------------+  |          |
     |                  |                    |
|    +<-----------------+--------------------+          |

+ - - - - - - - - - - - - - - - - - - - - - - - - - - - +

                         figure 4.1

   The  restrictions  upon  the  form  phrase  definitions  may  take which were
mentioned  in  the  previous  chapter  arise  from  this  comparison  algorithm.
Consider the definition:

                         <N>  = <N><DIGIT> , <DIGIT> ;

If the analyser were to use this a recursive loop would result since <N> appears
as the first item in the first alternative.  Similarly, consider the definition:

                         <N> = <DIGIT> , <DIGIT><N> ;

In this case, the analyser would only  be able to recognise single digit numbers
since the  first  alternative  would  always  prove  successful  and  the second
alternative would never be tried.   Situations of  these  kinds must be avoided.
Fortunately this usually presents no difficulties.

   From the efficiency point of view, it is also desirable to avoid other forms
of definition. An example might be:

                  <EXPR> = <OPERAND><OP><EXPR> , <OPERAND> ;

The   inefficiency  results   from  <OPERAND>  being  the  first  item in  both
alternatives. Clearly, whenever the second alternative is recognised ( the last





                                        22
                   compare <STATEMENT>
                   1st alternative
                      1st item = <INSTR>

                         compare <INSTR>
                         1st alternative
                            1st item = <NAME> / no match
                         2nd alternative
                            1st item = '-' / no match
                         3rd alternative
                            1st item = "START" / no match
                         4th alternative
                            1st item = "RETURN" / no match
                         5th alternative
                            1st item = "RESULT" / no match
                         6th alternative
                            1st item = "STOP" / no match
                         failure to match <INSTR>

                   2nd alternative
                      1st item = "IF" / no match
                   3rd alternative
                      1st item = <CONST> / no match
                   4th alternative
                      1st item = "FINISH" / no match
                   5th alternative
                      1st item = "INTEGER" / match
                      2nd item = <ARRAY>

                         compare <ARRAY>
                         1st alternative
                            1st item = "ARRAY" / no match
                         2nd alternative
                            1st item = <NAME> / match (I)
                            2nd item = <NAMES>

                               compare <NAMES>
                               1st alternative
                                  1st item = ',' / match
                                  2nd item = <NAME> / match (J)
                                  3rd item = <NAMES>

                                     compare <NAMES>
                                     1st alternative
                                        1st item = ',' / no match
                                     2nd alternative
                                        null item / match
                                     successful match of <NAMES>

                                  no more items
                                successful match of <NAMES>

                            no more items
                         successful match of <ARRAY>

                      no more items
                   successful match of <STATEMENT>

                              figure 4.2




                                        23
operand of an  expression)  phrase  <OPERAND> will  have  been recognised twice.
This  could  have  been  avoided  by  writing  the  definition of <EXPR> in the
following way:

                    <EXPR> = <OPERAND><REST> ;
                    <REST> = <OP><OPERAND><REST> , ;

   Definitions with  only  one alternative are, strictly, unnecessary and result
in the inefficiency of an extra recursive call,  but  are  occasionally used for
the sake of clarity. <EXPR> and <COND> are examples in the SKIMP syntax.

   Efficiency is also  partly  the reason for recognising names and constants at
the lexical stage.   If  their  recognition  were  left  to  this algorithm, two
phrases such as:

               <LETTER> = 'A' , 'B' , 'C', . . . . 'Z' ;
               <DIGIT> = '0' , '1' , '2' , . . . . '9' ;

would probably have to  be introduced.  Since the algorithm searches through the
alternatives, the recognition of letters and digits would be very slow.

   Also in view of the searching,  it will clearly be beneficial to put the most
commonly occurring forms of alternative first in the definition of a phrase.  In
particular, <STATEMENT> and <INSTR>  are ordered in what is hopefully a sensible
way with this consideration in mind.


Analysis Records


   During the course of syntax analysis,  a record is kept of which phrases have
been recognised.  This information is passed to the code generation phase of the
compiler  and  directs  its  course  of  action.   The   record  consists  of  a
linearisation of the analysis tree  corresponding  to  the statement recognised.
For instance,  the analysis tree corresponding to  the  example above would take
the form shown in figure 4.3.

  <STATEMENT> / 5
    |     |
    |     +-------+
    |             |
"INTEGER"      <ARRAY> /2
                |   |
                |   +----------+
                |              |
              <NAME>        <NAMES> / 1
                |            | | |
                |       +----+ | +------+
                |       |      |        |
              'I'      ','   <NAME>  <NAMES> /2
                               |        |
                               |        |
                              'J'      null


                    figure 4.3

The  number  after  the  each  phrase  name  in  the  tree is the number of  the





                                        24
alternative in its definition which was matched.  These numbers and the pointers
emanating  from each phrase  comprise the  information stored in  the linearised
form of analysis record.  For each phrase,  the record takes  the  form shown in
figure 4.4:

         +--------------------+-----------+-----------+-----------
         | alternative number | pointer 1 | pointer 2 |  . . . .
         +--------------------+-----------+-----------+-----------

                                 figure 4.4

The information is pruned to a certain extent, however.  There are only pointers
for the  phrase  items  matched.  All  reference  to keywords, literals and null
alternatives  is  omitted  since   their  presence  can  be  inferred  from  the
alternative numbers.   Phrases  <NAME>  and  <CONST> are treated specially since
they were recognised at the lexical stage.  To be consistent with other phrases,
both are given an alternative number matched of 1.  In addition, for <NAME>, the
following word is set to contain  the identification  number of the name and for
<CONST>, the following word is set to the value of  the  constant.  The analysis
record for the axample is therefore as shown in figure 4.5:

                          +-------------->         +--------------->
              +-->    +---|-->            |   +----|--->            |
              |   |   |   |   |           |   |    |    |           |
        +---+---+---+---+---+---+-------+---+----+----+---+-------+---+
        | 5 | 3 | 2 | 6 | 8 | 1 | ID(I) | 1 | 11 | 13 | 1 | ID(J) | 2 |
        +---+---+---+---+---+---+-------+---+----+----+---+-------+---+

                                 figure 4.5

The phrase names themselves also do not appear in  the  record  as  can  be seen
above.  The pointers are  just  indexes  into  the  array in which the record is
stored.  To aid clarity, when the analysis record is printed out by the compiler
(the ANAL option)  the positions at which phrases start are indicated by inserts
such as:

              (1/STATEMENT) (3/ARRAY) (8/NAMES)


Implementation Details


   Line reconstruction  and lexical analysis are implemented together in the one
routine which  is  named  'READ STATEMENT'  and which appears in  file 'SKIMPA'.
Subsidiary to this  are  routines  which  deal with each kind of lexical object.
These are named  'KEYWORD', 'NAME' and 'CONST'.  Reconstructed text is placed in
an array named  'TT' and this is lexically reduced into an array named 'T'.  The
name dictionary which is  created  during  lexical processing comprises an array
named 'NAMEDLINK'  and a byte  array  named  'NAMED'.  The former is used as the
hash table and the entries in it point to positions  in  the  latter array where
the characters of  the  names  are stored as concatenated strings.  For example,
figure 4.6:











                                     25
      +----+-+----+-+----+-+------+-+--------+
      |    | |    | |    | |      | |        |
      +----+++----+++----+++------+++--------+
            |      |      |        |
  +---------+ +----|------+     +--+
  |           |    +----+       |
  |           |         |       |
+-+-----------+---------+-------+-----------------
| 5 H A R R Y 4 D I C K 3 T O M 4 F R E D . . . .
+-------------------------------------------------

                  figure 4.6

   The syntax analysis routine is named 'COMPARE', also in file 'SKIMPA', and is
a function which returns the value 1 for success and 0 for failure.  It takes as
its single parameter the  identification number  of  the phrase to be matched in
this call.  Hence, it is initially called with the number of phrase <STATEMENT>.
Since <STATEMENT> is assumed always to be the first phrase definition to be read
in, this number is  258.   Phrase  numbers  256  and  257,  <NAME>  and  <CONST>
respectively,  are  picked  out  on entry to the routine and treated as literals
since they have  already been recognised.  The analysis  record  is placed in an
array named 'A'.


Processing Analysis Records


   The basis  of  the  processing  rests on knowing the starting position in the
analysis record of  each  phrase  which  is  of concern.  The  values  in  these
positions  indicate  the  alternative  which  was  actually  recognised  by  the
comparison routine and it is this information which directs the code generation.
The  form of analysis record facilitates  finding  out  where the phrases start.
These are simply the values of the pointers which follow each alernative number.
Consider,  for example,  the phrase structure associated with the declaration of
variables:

<STATEMENT> = . . . . . ."INTEGER"<ARRAY>, . . . .
<ARRAY> = "ARRAY"<NAME><NAMES>'('<EXPR>':'<EXPR>')',<NAME><NAMES>;

The analysis record for:

                            %INTEGER I, J, K

would be:

      5 3 2 6 8 1 ID(I) 1 11 13 1 1D(J) 1 16 18 1 ID(K) 2

The compiler fairly  consistently uses  names for the positions of where phrases
start in the analysis record  which  consist  of the name of the phrase with 'P'
(for 'pointer')  appended.  In  the  case  of phrase <STATEMENT> the variable is
'STATEMENTP'  and  it  is  the  value in the analysis record at that point which
causes the main  switch  in  routine  'STATEMENT'.  For the positions of phrases
<ARRAY>, <NAME>  and  <NAMES>,  the variables are 'ARRAYP', 'NAMEP' and 'NAMESP'
respectively.  The following IMP code shows how  the values for these are set up
and the declaration of scalars is handled:








                                       26
                     ARRAYP=A(STATEMENTP+1)
                     NAMEP=A(ARRAYP+1)
                     NAMESP=A(ARRAYP+2)
                     %IF A(ARRAYP)=2 %START
                       %CYCLE    ;| for each name declared
                         PUSHTAG(A<NAMEP*1), etc. )
                         NEXTRAD(LRVEL)= etc.
                         %IF A(NAMESP)=2 %THEN %EXIT
                         NAMEP=A(NAMESP+1)
                         NAMESP=A(NAMESP+2)
                       %REPEAT
                     %FINISH %ELSE %START
                       | array declarations
                       . . . .

The routine call on 'PUSHTAG' and the following line  deal with the  declaration
of each name.  This is described  in  more detail in chapter 6.  Each time round
the loop the variables 'NAMEP' and 'NAMESP' are updated for the  next  name  and
the exit from the loop takes place  when the null alternative of phrase <NAMES>,
alternative 2, is encountered.











































                                       27
                                   Chapter 5

                              Storage Organisation


Layout of Store


   The compiler produces object code which assumes the layout of store  shown in
figure 5.1.  The object  code  itself  starts  at  address zero and the table of
constants and  run-time  stack  follow consecutively.  If  it  was  so  desired,
however, the code could easily be made relocatable and the two other areas could
be located elsewhere, for instance in separate  segments  on a machine with such
an architecture.

                        0   +------------------+
                        |   |                  |
                        v   |                  |
                            |   Object Code    |
                            |                  |
                            |                  |
                            +------------------+
                            |                  |
                            |                  |
                            |    Table of      |
                            |   Constants      |
                            |                  |
                            |                  |
                            +------------------+
                            |                  |
                            |                  |
                            |     Run-time     |
                            |      Stack       |
                            |                  |
                            |                  |
                            +------------------+

                                 figure 5.1

   The entry point to the  object  code is at address zero.  The  code dumped at
this position sets the register 'COT' to the base of  the  table of contants and
also sets the first  display  register, 'DR1', to the start  of  the stack  (see
below).  Accesses to data are always made via index registers.


Run-time Storage Allocation and Addressing


   The block structure  of  SKIMP  leads naturally to the use of a stack for the
allocation of  storage.   In  particular,  the  possibly  recursive  calling  of
routines and functions is most  conveniently  dealt with using a stack.  Storage
for local variables  and  arrays  is  allocated when the routine  or function is
entered and  deallocated  when  it is left.  These areas  are sometimes known as
'local name spaces'.  Return addresses can also be preserved on the stack.

   Consider a number of  routines 'A', 'B', 'C' etc. and  the storage allocation
for variables and arrays in those routines.  Suppose routine 'A' is called which
then calls 'B' and which then calls 'C'.  A picture of the stack would be:





                                        28
          -----+------------------+-------------+-------------+-----
               | storage for A    |    for B    |    for C    |
          -----+------------------+-------------+-------------+-----

For calls in  a different  order, say  'A'  called 'C',  'C' called 'D'  and 'D'
called 'B', the picture would be:

          -----+-----------+-----------+-----------+-----------+-----
               |     A     |     C     |     D     |     B     |
          -----+-----------+-----------+-----------+-----------+-----

Similarly recursive calls, say 'B'  called 'A',  'A' called 'A',  'A' called 'C'
and 'C' called 'B':

     --+-----------+------------+-----------+-----------+------------+--
       |   B (1)   |    A (1)   |   A (2)   |     C     |    B (2)   |
     --+-----------+------------+-----------+-----------+------------+--

Note that  completely new  storage areas  are  allocated for recursive calls.  A
result of this 'allocate when called'  scheme is that the storage for routine is
liable to start  anywhere on the stack  and not in a fixed  place.  The compiler
cannot  therefore  assign  addresses to the variables as it comes across them in
declarations at compile-time.  However, it can assign them addresses relative to
the start of  the  storage area  for that routine and this it does.  To access a
variable at run-time  therefore  requires  both this  'relative address' and the
address of the start of the appropriate local name space.  These start addresses
can be kept in registers thus allowing the 'base register and displacement' form
in a single object instruction to be used to access variables.

   A register  need not be allocated  to each  routine  in the program, however.
Only those which are currently  active i.e.  called, and which contain variables
in  scope  at  the  current  point  of  execution  of  the  program,  need  have
corresponding registers.  Since only one routine can be in scope at each textual
level i.e. routine nesting depth, only one register per textual level is needed.
This set of registers is known as a 'Display'.  For  machines  with no or only a
few  index registers, a  display  can be kept in store rather than registers but
then extra  overheads are involved whenever  variables  are accessed.  A typical
situation  is  that  in  which  there  are  sufficient  registers  for one to be
permanently allocated to the most recent local name space and perhaps another to
be used for all global accesses.  The loss of efficiency  here is usually fairly
acceptable.  No such difficulties arise with the SKIMP machine.

   Clearly, the  display  will  need  to be updated  on routine  entry and exit.
Consider the following  section of program with two routines at the same textual
level:

                                 %ROUTINE A
                                 . . .
                                 %END
                                 %ROUTINE B
                                 . . .
                                 %END

When 'A' is called, the stack situation will be:









                                     29
               --------+-------------------+-------------------------
                       |         A         |
               --------+-------------------+-------------------------
                       |
                 DR-->

where 'DR' is the register  allocated to point  at the  start of 'A's local name
space.  When 'B' is called from 'A', the variables in 'A' are no longer in scope
i.e. are inaccessible, so 'DR' can be re-used to point to 'B's local name space:

             ---+------------------+-------------------+-------------
                |         A        |          B        |
             ---+------------------+-------------------+-------------
                                   |
                              DR-->

When 'B'  returns, 'A's  variables  must  again  become  accessible.  Hence, the
previous  value  of 'DR'  must  be  preserved.  This  will  be true  for all the
registers of the display.  Their values must be preserved  when they are updated
so that they can  eventually  be restored.  The stack  can also be used for this
purpose.  The object  code the SKIMP compiler  produces  puts preserved  display
values  in the first  location  of each local  name space followed by the return
address to the calling routine.  Storage for variables starts after that.

   This discussion leads to the following  scheme for updating the display.  The
display is  represented  by the registers  'DR1',  'DR2',  'DR3',  etc. and  the
register which is used to point to the start of unallocated storage on the stack
is named 'STP', standing  for STack Pointer.  The call of a routine  or function
is made by means of an instruction of the form;

                       BAL,    WK,    ,   (start address)

which leaves the return  address in a 'work'  register  named 'WK'.  The code at
the point of entry to a routine is:

                   STR,   DR(level of routine),   STP,   0
                   LDA,   DR(level of routine),   STP,   0
                   STR,   WK,    STP,   1
                   ADD,   STP,  ,       (storage area size)

At a point of exit i.e. %RETURN, %RESULT=, or %END, the code is:

                   LDA,    STP,   DR(level of routine), 0
                   LOAD,  DR(level of routine),    STP, 0
                   LOAD,  WK,   STP,   1
                   B,     ,     WK,    0


Array Addressing


   As mentioned  previously, variables  are allocated  locations relative to the
start of a local  name space as they are declared.  A problem  arises for arrays
since they may have bounds which are evaluated at run-time which  means that the
amount of space they need is not known at compile-time.  If the space for arrays
and variables  were  allocated in the  order of declaration, variables  declared
after  an array  would  not have a fixed  relative  address.  The solution is to






                                        30
allocate  the space  for arrays  after  that for the  variables  as shown in the
diagram:

         --+--------+-------------+-----------------+------------------+--
           | old DR | return addr |   static area   |   dynamic area   |
         --+--------+-------------+-----------------+------------------+--
           |
      DR-->

Each array is now allocated a single location in the static area and this points
to the place in the  dynamic  area  where  the storage for the array is located.
For example;

                       %INTEGER   I, J
                       %INTEGERARRAY   A, B( ? : ? )
                       %INTEGER  K, L

                               +------------------------->
                         +-----|------------->            |
                         |     |              |           |
   -+--+--+-----+-----+--+--+--+--+-----+-----+-----------+-----------+-
    |  |  |  I  |  J  |     |     |  K  |  L  |     A     |     B     |
   -+--+--+-----+-----+-----+-----+-----+-----+-----------+-----------+-
    |                                                                 |
     <--DR                                                      STP-->


   When register 'STP' os incremented at routine  entry, it is just  incremented
past the static storage area.  The object  code  output for an array declaration
allocates space  beyond 'STP' and moves 'STP' up as appropriate  for the size of
the array.  Variables in the static area have an address:

          DR(textual level of declaration) + relative address of variable

which  allows  them to be accessed  in one  instruction.  Array  elements on the
other hand can only be accessed  indirectly  via the pointer.  If this points to
the position of the possibly  hypothetical zero element of the array rather than
the first actual location, the address of an element such as:

                                 A( expression )

will be:

          [ DR(level) + rel. addr. of pointer ] + value of expression

where the square brackets indicate 'contents of'.

   In addition  to incrementing  'STP' the object  code for an array declaration
must also  fill in  this  zero  address.  For  the  general  case  of  an  array
declaration:

                   %INTEGERARRAY     A,B,....( lower : upper )

the form of the object code is:









                                        31
               code to evaluate 'lower' in 'ACC'
               STR,   ACC,   DR(level),   temp1
               code to evaluate 'upper' in 'ACC'
               LDA,   ACC,   ACC,         1
               STR,   ACC,   DR(level),   temp2
               SUB,   STP,   DR(level),   temp1
               STR,   STP,   DR(level),   rel. addr. of 'A's pointer
               ADD,   STP.   DR(level),   temp2
               SUB.   STP,   DR(level),   temp1
               STR,   STP.   DR(level),   rel. addr. of 'B's pointer
               ADD,   STP.   DR(level),   temp2
               . . . . .

'temp1' and 'temp2' are temporary  working  storage  locations  set aside in the
local name  space to hold the  values of  'lower'  and  'upper+1'  respectively.
(Similar  working  storage is also used  for  partial  results  when  evaluating
expressions.) For each  array  declared  the value  'STP-lower'  before 'STP' is
incremented  gives the zero  element  address and that  plus 'upper+1' gives the
final value of 'STP' having  allowed  space  of length  'upper-lower+1'  for the
array.


Parameter Passing


   Parameters are regarded as initialised  local variables of routines in SKIMP.
Passing  parameters  for  a routine  call  therefore  consists of setting  these
initial  values.  This is quite  straightforward  since the the  parameters will
have the first  relative  addresses to be  allocated  in the static  area of the
routine's  local name space.  Before the entry code has been executed, the start
of the future  local name space will  pointed  to by the current value of 'STP'.
This allows  the values of the  parameters to be stored using addresses relative
to 'STP' before the 'BAL' is executed.


   The three  types of parameter  are each  treated  differently.   For %INTEGER
value type parameters, the expression in the call is evaluated in register 'ACC'
and then stored in the appropriate place.  For %INTEGERNAME type parameters, the
address of the variable or array element is evaluated in register 'ACC' and then
stored.  Thereafter,  the  parameter  has  to  be  accessed  indirectly via this
address.  For %INTEGERARRAYNAME  type  parameters, the word  containing the zero
element  address  of the array  in the call is copied to the parameter location.
This  allows  accesses  to array name  parameter  elements  to be handled by the
compiler in exactly the same way as those to ordinary array elements.


   To illustrate  the  scheme,  consider  the  stack when  routine  'R'  in  the
following program is called:

            %BEGIN
            %ROUTINE R(%INTEGER X, %INTEGERNAME Y,%INTEGRRARRAYNAME Z)
            . . . . .
            %END
            %INTEGERARRAY A ( 0 : ? )
            %INTEGER I, J, K
            R(I+J, K, A)
            . . . . .






                                        32
               +------------------->+<-------------------------------+
               |                  <-|--------------------------+     |
               |                 |  |                          |     |
     -+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+--
      |  |  |     |  I  |  J  |  K  |     A     |  |  |  X  |  Y  |  Z  |
     -+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+--
      |      (A0)                               |      (I+J)        (A0)|
       <--DR1                             DR2-->                  STP-->

The object code for such a call would be:

                  LOAD,   ACC,   DR1,   3 \  I+J
                  ADD,    ACC,   DR1,   4 /
                  STR,    ACC,   STP,   2 -->
                  LDA,    ACC,   DR1,   5 \  name
                  STR,    ACC,   STP,   3 /
                  LOAD,   ACC,   DR1,   2 \  array name
                  STR,    ACC,   STP,   4 /
                  BAL,    WK,    ,      (start address of R)

   If a variable of %INTEGERNAME  type i.e. a parameter, is passed as the actual
parameter for another  %INTEGERNAME parameter, the word containing the reference
to the original actual parameter is copied to the position for the new parameter
thereby maintaining just one level of indirection to reach the required address.


Implementation Details


   Declarations of both scalar variables and arrays are processed as alternative
5 of phrase  <STATEMENT> in file 'SKIMPB'.  The routine  'PUSHTAG' which records
details of  declarations  for future  use  is  described  in  the  next chapter.
Relative  addresses in the static  area of any  local  name  space are allocated
sequentially.  However, routines  may be nested and may appear  in the middle of
the variable  declarations  of the enclosing routine.   Hence, the next relative
address  to be  allocated  must be  retained  for each  textual  level up to the
current level to allow for this.  These values are stored in the array 'NEXTRAD'
which  has an element  for each  possible  textual  level.   Since the  compiler
variable  'LEVEL' always  contains  the value of the current  textual level, the
assignment of relative addresses always uses 'NEXTRAD(LEVEL)'.

   Routine 'GET WORK' allocates  locations in the local name space for temporary
storage  of  partial  results,  in  this  instance  in  connection   with  array
declarations.  They are handled in exactly the same way as scalar declarations.

   Routine entry and exit sequences are dumped by the routines 'ENTER' and 'DUMP
RETURN' which  appear in file 'SKIMPD'.   Note  that  'ENTER'  also  handles the
'%BEGIN' statement since this is similar in many respects.















                                       33


                                    Chapter 6

                             Identifier Tag Handling


   All identifiers in SKIMP  either have to be declared  or have to be specified
before  they  can be used.  This is to  enable  the  compiler  to  generate  the
appropriate  code  when  the  identifier is later encountered.  In practice this
means that the compiler must preserve information about each identifier as it is
declared or specified and this will be referred back to each time the identifier
is used.  This  information  on each identifier  is often  called a 'tag'. There
are several different items of information which are packed together to form the
tag which are:

                 1.     Form of identifier
                 2.     Type of identifier
                 3.     Number of parameters (or dimensions)
                 4.     Textual level of declaration
                 5.     Relative address or start address

   1.  The  form of an identifier  is intended to differentiate  between  scalar
variables  and  arrays, value  and  reference  types of  parameter, and data and
program objects.

   2.  The type of an identifier  specifies  whether it is %INTEGER in nature or
not. (In extensions to SKIMP the type would distinguish between %INTEGER, %REAL,
%COMPLEX, %STRING etc.)

   The various  forms and types in SKIMP as at present implemented are given the
code values shown below:

                                          form       type

                %INTEGER                    0          1
                %INTEGERNAME                1          1
                %INTEGERARRAY               2          1
                %INTEGERARRAYNAME           3          1
                %ROUTINE                    4          0
                %INTEGERFN                  4          1


   3.  If SKIMP  implemented  arrays  of more than one  dimension, the number of
dimensions  would have to be saved.  As this is not the case this information is
redundant.  The corresponding  information for routines and functions is needed,
however.  This is the number  of  parameters  they take.  Information  about the
parameters occupies further tag words using a scheme described later.

   4.  The textual level of declaration of each name must be saved since this is
used to select the appropriate display base register for addressing purposes.

   5.  To complete the information  needed for addressing  variables and arrays,
the relative  addresses assigned must be saved.  For routines  and functions the
address of their  entry points must be stored so that the correct address can be
output for the BAL instructions.

   These items are packed into a single word in the following format:





                                        34
        +--------+--------+----------+---------+----------------------+
        |  form  |  type  |  params  |  level  |       address        |
        +--------+--------+----------+---------+----------------------+
            (4)      (4)       (4)       (4)            (16)

As an example, consider the program:

                       %BEGIN
                       %INTEGER I,J
                           %ROUTINE R(%INTEGERNAME K)
                           %INTEGERARRAY   A(1: 10)
                           .......
                           %END
                       .......
                       %ENDOFPROGRAM

The tag words for these names would be:

                         +---+---+---+---+-----------+
                    I    | 0 | 1 | 0 | 1 |     2     |
                         +---+---+---+---+-----------+

                         +---+---+---+---+-----------+
                    J    | 0 | 1 | 0 | 1 |     3     |
                         +---+---+---+---+-----------+

                         +---+---+---+---+-----------+
                    R    | 4 | 0 | 1 | 1 |   (addr)  |
                         +---+---+---+---+-----------+

                         +---+---+---+---+-----------+
                    K    | 1 | 1 | 0 | 2 |     2     |
                         +---+---+---+---+-----------+

                         +---+---+---+---+-----------+
                    A    | 2 | 1 | 0 | 2 |     3     |
                         +---+---+---+---+-----------+


   Values of tags can be inspected  in the output from the compiler by using the
option '/TAGS' when calling the compiler.  They are printed out (in hexadecimal)
at the %END of every routine  and  function and at  %ENDOFPROGRAM  for the names
that were locally declared.  An example of this can be found in appendix 1.

   The packing  of the tag word places  some limits on the programs the compiler
can compile, for  instance only up to fifteen  routine parameters and only up to
fifteen  textual levels, but these are not serious limits for this compiler. (In
fact, less than fifteen  textual levels are permitted due to lack of registers.)
This  word  of  information  has to be  made  available  whenever  the  name  it
corresponds  to is used.  Since  names are identified  by a dictionary  position
i.e. hash position, a further array is used in parallel.  This contains pointers
to the list cells which contain  the tag information.  This is be illustrated in
figure 6.1 for the preceding example again.










                                       35
                        +---+---+                 TAG              LINK
                        |   |   |                  v                v
                        +---+---+        +---+---+---+---+------++------+
            +-----------|   |   |------->| 2 | 1 | 0 | 2 |   3  ||   0  |
            |           +---+---+        +---+---+---+---+------++------+
            |           |   |   |
    +---+   |           +---+---+        +---+---+---+---+------++------+
    | I |<--|-----------|   |   |------->| 0 | 1 | 0 | 1 |   2  ||   0  |
    +---+   |           +---+---+        +---+---+---+---+------++------+
    | J |<--|-------+   |   |   |
    +---+   |       |   +---+---+        +---+---+---+---+------++------+
    | R |<--|---+   +---|   |   |------->| 0 | 1 | 0 | 1 |   3  ||   0  |
    +---+   |   |       +---+---+        +---+---+---+---+------++------+
    | K |<--|---|---+   |   |   |
    +---+   |   |   |   +---+---+        +---+---+---+---+------++------+
    | A |<--+   |   +---|   |   |------->| 1 | 1 | 0 | 2 |   2  ||   0  |
    +---+       |       +---+---+        +---+---+---+---+------++------+
    |   |       |       |   |   |
    |   |       |       +---+---+        +---+---+---+---+------++------+
                +-------|   |   |------->| 4 | 0 | 1 | 1 |(addr)||      |--->
                        +---+---+        +---+---+---+---+------++------+    |
                        |   |   |    <---------------------------------------+
                        |   |   |   |    +---+---+---+---+------++------+
                        +---+---+   +--->| 1 | 1 | 0 | 2 |   2  ||   0  |
                                         +---+---+---+---+------++------+

                                 figure 6.1

   The diagram also shows how the  information on parameters is stored.  This is
in further  list cells beyond that for the name of the routine.  In general, the
structure  is as shown in figure 6.2.  Note that  there is now a duplication  of
the tag information  on the  parameter, firstly, in the chain of cells after the
tag for the name of the routine  and secondly in the dictionary position for the
parameter   name.   However, this  second  occurrence  is  only  present  during
compilation of the routine i.e. when the parameter name is in scope, whereas the
first occurrence is present  whenever the routine name is in scope i.e. whenever
parameter information is needed to compile the routine calls.

       |   |   |
       +---+---+      +------+--+    +-------+--+    +-------+--+
       |   |   |----->| name |  |--->| par 1 |  |--->| par 2 |  |---> . . .
       +---+---+      +------+--+    +-------+--+    +-------+--+
       |   |   |
                                figure 6.2

   Only  those  names  which  are in  scope  at the position  in the program the
compiler is working should have a tag cell.  A tag cell is therefore pushed down
whenever a name is declared and popped up again (together with all the parameter
tag  cells for routine  names)  when it is 'undeclared'  i.e. at the  end of the
routine  or  function  in  which  it  was declared.  This  also allows  for  the
redeclaration of the same name at an inner nested routine level.  The tag at the
head of the pushdown list is thus always the one for the latest declaration.  If
the redeclaration  is a routine  name, of course, the original tag is beyond all
the parameter information cells in the list.

   Consider the following program:







                                        36
                            %BEGIN
                            %INTEGER I
                               %ROUTINE R
                               %INTEGERARRAY I(1:10)
                               ......
                               %END
                            ......
                            %ENDOFPROGRAM

   The tags for this whilst the compiler is compiling routine 'R' will be:

                 |   |   |
      +---+      +---+---+     +-+-+-+-+---+---+     +-+-+-+-+---+---+
      | I |<-----|   |   |---->|2|1|0|2| 2 |   |---->|0|1|0|1| 2 |   |
      +---+      +---+---+     +-+-+-+-+---+---+     +-+-+-+-+---+---+
      |   |      |   |   |
                                2nd declaration       1st declaration

   If when a call is popped up, nothing  remains on the list i.e. this was not a
redeclaration, then the space  in the name  dictionary for that name can also be
recovered.  This is the case since  the dictionary  works on a last-in-first-out
basis, a consequence  of the nested  structure  of SKIMP programs.  The question
arises as to how the compiler  knows which tags to pop up when an %END statement
is reached.   An  obvious  method would be to search  through the dictionary for
tags having the  appropriate textual level in the level field.  A more efficient
method, however, which avoids searching, is to remember in a list the ones which
have to be popped up.  Each cell in the list contains the dictionary position of
a name, so that  this  list  can be  scanned  down  at the %END  statement and a
'popup'  performed  for every position  mentioned in it (plus extra 'popups' for
routine parameter tag cells).   Because  of the  nesting  of routines in SKIMP a
separate  list has to be  maintained  for each textual level.  An array with one
position  per textual  level  contains  pointers to the heads  of these lists as
shown in figure 6.3:

              +---+            +---+---+    +---+---+    +---+---+
              |   |----------->|   |   |--->|   |   |--->|   |   |
              +---+            +---+---+    +---+---+    +---+---+
              |   |-------+
              +---+       |    +---+---+
              |   |---+   +--->|   |   |
              +---+   |        +---+---+
              |   |   |
              |   |   |        +---+---+    +---+---+
              |   |   +------->|   |   |--->|   |   |
                               +---+---+    +---+---+

                                 figure 6.3


Implementation Details



   Lists of calls are frequently used by the compiler to maintain current status
information  during compilation.  Some are used as pushdown popup stacks but not








                                        37
all.   A list cell  in  this  implementation  consists  of  two  words,  one  an
information word and the other a link word:

                       +---------------+---------------+
                       |  information  |      link     |
                       +---------------+---------------+

Information  words are taken from  an array  named 'TAG' and link  words from an
array  named 'LINK'.   Initially, all these  pairs of words  are formed  into an
'Available  Space List' or 'Free List' and thereafter  cells are taken from this
list as  required  and  returned  to  it when  no longer  needed.  The  variable
'TAGASL' always points to the head of the available space list:

                       +---+---+    +---+---+    +---+---+
             TAGASL--->|   |   |--->|   |   |--->|   |   |---> . . .
                       +---+---+    +---+---+    +---+---+


By convention, the  link of the last  call in any list is set to zero.  Links to
other list calls take the form of the index of the words in the 'TAG' and 'LINK'
arrays. These are  therefore  declared from 1 upwards to avoid conflict. Both of
the  arrays are  initialised  in file 'SKIMPA'  to set up the tags and links for
built-in  routine  names such as 'READ SYMBOL' etc. (The name dictionary is also
initialised for the same reason.)

   A function  'NEWTAG', in file 'SKIMPD', takes  the cell indicated by 'TAGASL'
from the  available  space  list  whenever a cell is  required. 'TAGASL' is then
moved down the  list to the next one.  If  ever 'TAGASL'  is zero, the  compiler
stops with a suitable monitor.  The result of the 'NEWTAG' function is the index
of the cell in 'TAG' and 'LINK'. A common  sequence of  instructions  might take
the form:

                           CELL = NEWTAG
                           TAG (CELL) = ....
                           LINK (CELL) = ....


   A function  named 'RETURNTAG',also in file 'SKIMPD', has  the reverse effect.
It takes a single parameter which is the index of the cell to be returned to the
available space list.  The result  of the  function  is the  value in the 'LINK'
word of the cell returned.  This is purely a matter of programming convenience.

   The two parallel  arrays which form the hash dictionary of names and pointers
to tags are named  'NAMEDLINK'  and  'TAGLINK'  and  point  to the  'NAMED'  and
'TAG/LINK' arrays respectively.

   Routines  'PUSHTAG' and 'POPTAGS', in file 'SKIMPD', push  and pop  tags from
'TAGLINK' as required. 'PUSHTAG' is called with the identification number of the
name whose  tag  is  being  pushed  together  with  it's  form,  type,  etc.  as
parameters.  A subsidiary  function of this routine is to check whether the same
name has  already  been  declared  at this level  and  monitor  the fault if so.
'POPTAGS' is called at the %END of each routin e or function to pop all the tags
for names declared at that level, monitoring them if the '/TAGS' option was set.
The  array which contains  pointers to lists of names  declared at each level is
named 'NAMELIST'.








                                        38
                                     Chapter 7

                              Compilation of Expressions


   Although all the registers in the object  machine can be used as accumulators
the machine  is regarded, for the  sake  of simplicity, as a single-accummulator
machine  and all  expressions  are  evaluated  in the one register  'ACC'.  This
avoids  the necessity  of having  register to register operations which would be
needed for a true multi-accumulator machine and avoids the extra complexity this
introduces for expression compilation.

   A 'tree'  algorithm  is used  in the SKIMP compiler.  Many  other methods are
possible  but  this  one produces reasonably  good  object code without too many
redundant  'LOAD'  and  'STR'  operations.    It   is  also  straightforward  to
comprehend.  Expressions  are regarded  as trees with operators at the nodes and
operands as leaves.  For example, the expression:

                              I  * J - K / (L + M)

has the tree form shown in figure 7.1.

                                    -
                                    |
                              +-----+-----+
                              |           |
                              *           /
                              |           |
                           +--+--+     +--+--+
                           |     |     |     |
                           I     J     K     +
                                             |
                                          +--+--+
                                          |     |
                                          L     M

                                figure 7.1


   The algorithm  examines  the 'shape' of the tree  in order to decide the best
order of evaluation of the expression.  The commutativity of the operators, that
is, whether  'X?Y' is the  same as 'Y?X' or not, is taken  into account and also
whether the sub-trees of operator nodes are leaves or not.  When the sub-tree is
a leaf, that  operand is capable  of being immediately  loaded into 'ACC' or can
operate immediately on 'ACC'.

   When  the  sub-tree is not a  leaf i.e. a further  tree, the algorithm  works
recursively  to  compile  it.  The  algorithm  can be  tabulated for the various
combinations  possible, the whole  thing being regarded as 'evaluate tree' whose
object is to dump code which puts  the result in 'ACC'.  The  table  is shown in
figure 7.2.

   This algorithm would not be acceptable if the order of evaluation was defined
as  part of the language, for  instance  strictly left to right.  As an example,
the  object  code  is  improved  by  evaluating the  second  operand  first  for
non-commutative operators.  To illustrate the working of the algorithm, consider
the expression which was used above:

                              I * J - K / (L + M)




                                       39
   +--------------+------------+----------------+-------------------------+
   | left branch  |  operator  |  right branch  |         actions         |
   +--------------+------------+----------------+-------------------------+
   |    leaf 1    |    all     |     leaf 2     |  LOAD,ACC,leaf 1        |
   |              |            |                |  (operation),ACC,leaf 2 |
   +--------------+------------+----------------+-------------------------+
   |     tree     |    all     |      leaf      |  evaluate tree          |
   |              |            |                |  (operation),ACC,leaf   |
   +--------------+------------+----------------+-------------------------+
   |     leaf     |    comm.   |      tree      |  evaluate tree          |
   |              |            |                |  (operation),ACC,leaf   |
   +--------------+------------+----------------+-------------------------+
   |     leaf     |  non-comm. |      tree      |  evaluate tree          |
   |              |            |                |  STR,ACC,(work)         |
   |              |            |                |  LOAD,ACC,leaf          |
   |              |            |                |  (operation),ACC,(work) |
   +--------------+------------+----------------+-------------------------+
   |    tree 1    |     all    |     tree 2     |  evaluate tree 2        |
   |              |            |                |  STR,ACC,(work)         |
   |              |            |                |  evaluate tree 1        |
   |              |            |                |  (operation),ACC,(work) |
   +--------------+------------+----------------+-------------------------+

                                 figure 7.2

   The steps, with indentation  for  recursive  depth are shown in  below.   The
'work'  locations used for partial  results are in the local  name space for the
current  routine  as  was  described  in  chapter 5  in  connection  with  array
declarations.

              evaluate tree [ I*J-K/(L+M) ]
                 tree [ I*J ]   all [ - ]   tree [  K/(L+M) ]
                 evaluate tree [ K/(L*M) ]
                 leaf [ K ]   non-comm [ / ]   tree [ L+M ]
                 evaluate tree [ L+M ]
                    leaf [ L ]   all [ + ]   leaf [ M ]
                    LOAD,ACC, L
                    ADD,ACC, M
                    STR,ACC, work 1
                    LOAD,ACC, K
                    DIV,ACC, work 1
                 STR,ACC, work 1
                 evaluate tree [ I*J ]
                    leaf [ I ]   all [ * ]   leaf [ J ]
                    LOAD,ACC, I
                    MLT,ACC, J
                 SUB,ACC, work 2


   The case of array  elements and function calls as operands must be considered
in more detail.  These  cannot be regarded  as leaves  of the  tree  since  they
involve  further  computation  which  needs  register 'ACC'.  They are therefore
treated  as  trees  from  the  point  of view of the  algorithm.   Array element
accessing  puts the value of the element in 'ACC' and the result of functions is
also left in 'ACC'.  This satisfies the requirements of 'evaluate tree'.

   Variables  of  type  %INTEGERNAME  i.e. parameters, can  still be regarded as
leaves  by  making  use  of  another  index  register,  'WK',  for  the indirect





                                        40
reference.  Constants are leaves of trees and, in general, are accessed from the
table of constants  following  the object code using register 'COT' as the base.
The use of a  location  in  the constant  table  is  avoided  if a 'LOAD' of the
constant  is  required  and  the value is less than 65536.   This is because the
'LDA'  instruction  can  be  used  with the index  register  set  to  zero.  For
example,an instruction to 'LOAD' the value 1234 would be:

                                 LDA, ACC, , 1234


   The  tree  representation  is  that  of Reverse  Polish  form  together  with
pointers.  In  Reverse  Polish  notation  the operator follows the operands.  In
general,

                          operand 1 operator operand 2

becomes:
                          operand 1 operand 2 operator

and  the  same  transformation  takes  place  for  the  operands  when  they are
expressions.  For example, the infix expression:

                              I * (J + K) / L - M

becomes:

                              I J K + * L / M -

   The  extra  pointers  that  were  mentioned  come  after  each  operator  and
correspond to the branches of the tree.  In general the form is:

                                      <-------------------------+
                      <--------------|----------------------+   |
                     |               |                      |   |
               +-----------+   +-----------+   +----------+---+---+
               | operand 1 |   | operand 2 |   | operator |   |   |
               +-----------+   +-----------+   +----------+---+---+

   When  an  operand  is a  further  sub-expression, the  pointer  points to the
operator in it's repesentation.  Otherwise, the pointer points to the first word
of a pair of words which  describe the operand.  The first of the pair is a type
number and the second is  consequent on the value of the first.  The type values
are:                  ^operators
                      |
                     -1     :       function call
                     -2     :       array element
             leaves/ -3     :       scalar variable
                   \ -4     :       constant

They are negative  to distinguish  them from  values which denote  operators and
hence sub-trees.  The  values for  operators  are their alternative numbers from
phrase <OP> (with two  additional  values beyond for the unary operators '-' and
'\') and are thus always positive.  Values greater than or equal to -2 cause the
algorithm to be called recursively, a  requirement  which  was  mentioned above.
For  types -1 and -2, the second  word of the pair  contains an analysis  record
pointer  which  will  allow the function call or array  element accession object
code to be generated when it is required.  For type -3, the  second  word is the






                                        41
tag  of  the  variable  and  for  type  -4, the  second word is the value of the
constant. As an example, consider the expression:

                                   I + 12345

The representation of this is:

                                     <------------------------+
                     <--------------|-------------------+     |
                    |               |                   |     |
                +------+--------+------+-------+-----+-----+-----+
                |  -3  | tag(I) |  -4  | 12345 |  9  |  1  |  3  |
                +------+--------+------+-------+-----+-----+-----+

An example to show sub-trees might be:

                               (I + 99) * (J - 88)

The representation of this is:

                                                        <--------------------+
                       <-------------------------------|----------------+    |
              <-------|-------+               <--------|--------+       |    |
   <---------|--------|---+   |    <---------|---------|---+    |       |    |
  |          |        |   |   |   |          |         |   |    |       |    |
+---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+
|-3 |tag(I)|-4 | 99 | 9 | 1 | 3 |-3 |tag(J)|-4 | 88 | 10 | 8 | 10 | 8 | 5 | 12 |
+---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+


   The  technique  used to convert  infix  expressions  to this  form combines a
method  of  converting them to Reverse Polish with a scheme of pseudo-evaluation
to calculate  the values of the  pointers. The conversion  to Reverse  Polish is
shown diagrammatically in figure 7.3.  A pushdown stack is used for operators to
take  account  of  their  precedence.   This  delays  their  storage  until  the
appropriate  place  in  the  output.   A new stack is declared at each recursive
level  to  deal  with  bracketted  sub-expressions.     The  conversion  of  the
expression:

                           I + J * (K - L) / M

would  proceed as shown in figure 7.4.  The  dotted  box indicates the recursive
call to deal with '(K-L)'.

   When  operators and  operands are 'stored'  according  to the flow diagram, a
pseudo-evaluation  takes place.  To explain what is meant by 'pseudo'-evaluation.
first consider an 'evaluation' of a reverse Polish expression.  A pushdown stack
is  used  to retain  operand  values and partial  results during the evaluation.
Proceeding  from left to right, whenever an operand is encountered it's value is
pushed down.  Whenever a (binary) operator  is encountered, the top two items on
the stack are popped and the operation between them is performed.  The result is
then  pushed back onto the stack.  (For unary operators, just one item is popped
from the stack).  Proceeding  in this way. when the whole of the reverse  Polish
string has been dealt with, the value of the whole expression is what remains in
the stack. For example:

                               10 + 8 * (6 - 4) / 2






                                        42
   to reverse Polish              |
+ - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - +
                                  |
|                       +---------+-------+                           |
               <--------| unary operator ?|
|             |     yes +-----------------+                           |
        +----------+              | no
|   +-->| stack op |------------->|                                   |
    |   +----------+     +--------+--------+     + - - - - - -+
|   |                    | next operand a  |---->| to reverse |       |
    |                    | sub-expression ?| yes |   Polish   |
|   |                    +--------+--------+     + - - - - - -+       |
    |                             | no                 |
|   |                     +-------+-------+            |              |
    |                     | store operand |            |
|   |                     +-------+-------+            |              |
    |                             |<-------------------+
|   |                   +---------+---------+    +-----------------+  |
    |                   | another operator ?|--->| unstack & store |----->
|   |                   +---------+---------+ no | operators left  |  |
    |                             | yes          +-----------------+
|   |                  +----------+----------+                        |
    +------------------| new operator higher |<----+
|                  yes |     precedence ?    |     |                  |
                       +----------+----------+     |
|                                 | no             |                  |
                         +--------+--------+       |
|                        | unstack & store |-------+                  |
                         |    operator     |
|                        +-----------------+                          |

+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

                                figure 7.3

       input   op stack     action                   cumulative output

         I      empty       store  I                       I
         +        +         stack  +                       I
         J        +         store  J                       IJ
         *        +*        stack  *                       IJ
       + - - - - - - - - - - - - - - - - - - - - +
       | K      empty       store  K             |         IJK
       | -        -         stack  -             |         IJK
       | L        -         store  L             |         IJKL
       |        empty       unstack & store  -   |         IJKL-
       + - - - - - - - - - - - - - - - - - - - - +
         /        +         unstack & store  *             IJKL-*
                  +/        stack /
         M        +         store  M                       IJKL-*M
                  +         unstack & store  /             IJKL-*M/
                empty       unstack & store  +             IJKL-*M/+

                               figure 7.4









                                        43
which in reverse Polish is:

                               10 8 6 4 - * 2 / +

The stages of evaluation in a pushdown stack are

     |    |  |    |  |    |  |    |  |    |  |    |  |    |  |    |  |    |
     +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
     |    |  |    |  |    |  |  4 |  |    |  |    |  |    |  |    |  |    |
     +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
     |    |  |    |  |  6 |  |  6 |  |  2 |  |    |  |  2 |  |    |  |    |
     +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
     |    |  |  8 |  |  8 |  |  8 |  |  8 |  | 16 |  | 16 |  |  8 |  |    |
     +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
     | 10 |  | 10 |  | 10 |  | 10 |  | 10 |  | 10 |  | 10 |  | 10 |  | 18 |
     +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+

In a 'pseudo'-evaluation, it is not the values of operands  that are pushed down
but some other property.  In this case, it is positions in the representation of
the tree.  In  place  of  evaluation  when  operators  occur,  the  positions of
operands  are popped  up and it is these  values which are appended  to the tree
representation.  The  position  of  the  operator  is  then pushed down in their
place.  Consider the example used above:

                              (I + 99) * (J - 88)

which in reverse Polish is:

                                 I 99 + J 88 - *

The successive contents of the pseudo-evaluation stack will be:

             |    |  |    |  |    |  |    |  |    |  |    |  |    |
             +----+  +----+  +----+  +----+  +----+  +----+  +----+
             |    |  |    |  |    |  |    |  |    |  |    |  |    |
             +----+  +----+  +----+  +----+  +----+  +----+  +----+
             |    |  |    |  |    |  |    |  | 10 |  |    |  |    |
             +----+  +----+  +----+  +----+  +----+  +----+  +----+
             |    |  |  3 |  |    |  |  8 |  |  8 |  | 12 |  |    |
             +----+  +----+  +----+  +----+  +----+  +----+  +----+
             |  1 |  |  1 |  |  5 |  |  5 |  |  5 |  |  5 |  | 15 |
             +----+  +----+  +----+  +----+  +----+  +----+  +----+

The  representations  of the trees  corresponding  to all  the  expressions in a
program  can  be  monitored  by  using  the  parameter '/EXPR' when  calling the
compiler.  Examples can be found in appendix 1.


Implementation Details


   The routine concerned with the compilation of expressions is named 'EXPR' and
can be found in file 'SKIMPE'.  Within 'EXPR', subroutine 'TO TREE' produces the
linear  tree  representation  using  a  further  subroutine  'PS EVAL'  for  the
pseudo-evaluation part.   It also has the subsidiary  function  of checking that
all  the  names  used  in  the  expression  have  been  declared  and  are of an
appropriate type, for instance, not routine names.

   The tree code generating algorithm is implemented by routine 'EVALUATE' which
makes use of a subroutine 'OPN' to dump primitive operations i.e. routine calls.
array and variable accesses and constant handling.


                                        44
   A minor  complication  of routine  'EXPR' deals with the expressions involved
with simple comparisons in conditional statements.  This is because a comparison
such as:

                              expr1  <COMP>  expr2

is  compiled  as  an  evaluation  of  'expr1 - expr2'  and then the value tested
against  zero.  An  external flag named 'CONDFLAG' is set in these circumstances
and its  effect is form a tree with  branches  representing  each expression the
topmost node being a '-'.  The algorithm then proceeds as normal.





















































                                        45


                                     Chapter 8

                              Conditional Statements


   The general form of conditional statement is:

          %IF  condition  %THEN  instruction 1  %ELSE  instruction 2

where the 'instructions' include the case of a group of statements surrounded by
%START and %FINISH.  In compiling this, the aim is to produce object code of the
form:

                            +---------------+
                            |   condition   |
                            +---------------+
                             ->FALSE if false
                            +---------------+
                            | instruction 1 |
                            +---------------+
                                 ->NEXT
                     FALSE: +---------------+
                            | instruction 2 |
                            +---------------+
                     NEXT:

where  the  box  containing  'condition'  implies object code instructions which
evaluate  the truth or falsity of the condition for subsequent testing. When the
%ELSE clause is absent, this simplifies to the form:

                            +---------------+
                            |   condition   |
                            +---------------+
                             ->FALSE if false
                            +---------------+
                            | instruction 1 |
                            +---------------+
                     FALSE:

In addition, since the truth or falsity of the  whole condition can sometimes be
determined   without  evaluating  all  the  simple  comparisons,  there  may  be
intermediate  jumps  out  from  the  middle  of the condition  to the consequent
instructions. For example, consider;

        %IF sc1 %AND (sc2 %OR sc3) %THEN instr 1 %ELSE instr 2

This would produce code of the form shown in the next diagram.  Analagously with
arithmetic  expressions,  however,  the  language  may  define  conditions to be
evaluated  'in toto'  as  Boolean  expressions which would preclude jumping out.
Fortunately, SKIMP does not require this.

   A  number  of   observations   can  be  made  with  a  view  to  producing  a
straightforward  simple  compilation  algorithm.  The first concerns whether the
jump  following  each  simple  comparison in the condition  should be on true or
false. Consider the case:

        %IF sc1 %AND sc2 %AND sc3 %THEN instr 1 %ELSE instr 2




                                        46
                           + - - - - - - - - - -+
                           |    +----------+    |
                           |    |   sc 1   |    |
                           |    +----------+    |
                           |  ->FALSE if false  |
                           |    +----------+    |
                           |    |   sc 2   |    |
                           |    +----------+    |
                           |   ->TRUE if true   |
                           |    +----------+    |
                           |    |   sc 3   |    |
                           |    +----------+    |
                           + - - - - - - - - - -+
                              ->FALSE if false
                    TRUE:  +--------------------+
                           |      instr 1       |
                           +--------------------+
                                  ->NEXT
                    FALSE: +--------------------+
                           |      instr 2       |
                           +--------------------+
                    NEXT:

The code in this second case should be of the form:

                           +----------------+
                           |      sc 1      |
                           +----------------+
                            ->FALSE if false
                           +----------------+
                           |      sc 2      |
                           +----------------+
                            ->FALSE if false
                           +----------------+
                           |      sc 3      |
                           +----------------+
                            ->FALSE if false
                           +----------------+
                           |    instr 1     |
                           +----------------+
                                 ->NEXT
                    FALSE: +----------------+
                           |    instr 2     |
                           +----------------+
                    NEXT:

Next consider:

          %IF  sc1  %OR  sc2  %OR  sc3  %THEN  instr 1  %ELSE  instr 2

This should produce the code in the diagram  below.  It will be apparent that if
an  %AND  follows  the  simple  comparison, the  jump  is on false and if an %OR
follows, the  jump  is on true.  To make this rule  consistent throughout, since
the jump following  the last simple comparison ts always on false we can imagine
an %AND following the last simple comparison:

               %IF ( condition ) %AND %THEN instr 1 %ELSE instr 2






                                        47
                           +----------------+
                           |      sc 1      |
                           +----------------+
                             ->TRUE if true
                           +----------------+
                           |      sc 2      |
                           +----------------+
                             ->TRUE if true
                           +----------------+
                           |      sc 3      |
                           +----------------+
                            ->FALSE if false
                    TRUE:  +----------------+
                           |    instr 1     |
                           +----------------+
                                 ->NEXT
                    FALSE: +----------------+
                           |    instr 2     |
                           +----------------+
                    NEXT:


   The  next  observation  concerns  the  destination  of the jump.  This is not
always to one of the  unconditional  instructions.  It can be to a later  simple
comparison. For example:

       %IF ((sc1 %AND sc2) %OR sc3) %AND sc4 %THEN instr 1 %ELSE instr 2

This should produce code of the form:

                           +----------------+
                           |      sc 1      |
                           +----------------+
                           ->FALSE1 if false
                           +----------------+
                           |      sc 2      |
                           +----------------+
                             ->TRUE if true
                    FALSE1:+----------------+
                           |      sc 3      |
                           +----------------+
                           ->FALSE2 if false
                    TRUE:  +----------------+
                           |      sc 4      |
                           +----------------+
                           ->FALSE2 if false
                           +----------------+
                           |    instr 1     |
                           +----------------+
                                 ->NEXT
                    FALSE2:+----------------+
                           |    instr 2     |
                           +----------------+
                    NEXT:

   For a simple  comparison  followed  by  an  %AND, the  jump  is to the simple
comparison  following the next %OR at an outer  bracketting level.  For a simple
comparison followed by an %OR, the jump is to the  simple  comparison  following





                                        48
the next  %AND at an outer bracketting level.  This can again be made consistent
if the conditional statement is regarded as of the form:

           %IF ( ( condition ) %AND %THEN instr 1 ) %OR %ELSE instr 2

To  demonstrate  this, we  could  redraw  the  example above showing bracketting
levels  with  each  simple  comparison (or  instruction) at  the  level  of  the
following %AND or %OR:

    sc1 %AND
       |
       | false    sc2 %OR
       |             |
       |             |true    sc3 %AND
       |             |        |  |
       +-------------|------->   |false    sc4 %AND
                     |           |         |  |
                     +-----------|-------->   |
                                 |            |false    instr1 %OR
                                 |            |
                                 +------------+---------------------->instr2

A further example might be:

     %IF (sc1 %OR sc2 %OR sc3) %AND (sc4 % OR sc5) %THEN instr1 %ELSE instr2

The organisation required is:

    sc1 %OR   cs2 %OR                 sc4 %OR
       |         |                    |  |
       |true     |true    sc3 %AND    |  | true
       |         |           |        |  |
       |         |           |false   |  |        sc5 %AND
       |         |           |        |  |        |
       +---------+-----------|------->   |        |false
                             |           |        |
                             |           +--------|-------->instr1 %OR
                             |                    |
                             +--------------------+-------------------->instr2


   In compiling  conditional  statements, the  plan of action is to make several
passes over the  condition, working out the structure  required step by step and
finally, in a last  pass, generating  the object code.  Five parallel arrays are
used, each with one  position  per simple comparison.  The first  pass traverses
the analysis  record  and locates  the positions  of all the simple comparisons.
Simultaneously, the nesting  levels  and  truth  or  falsity  of the branches is
determined  and  also  stored.  The  next  pass  goes over  this  information to
determine  the destinations  of the branches  using the rules of thumb described
above.  For those simple comparisons  which are the destination of branches, the
next pass allocates 'private' labels to which branches can be compiled using the
normal  methods used  elsewhere in the compiler (see chapter 9).  Private labels
are  just  values  allocated  upwards  from  10000,  which  allows  them  to  be
distinguished  from  ordinary  labels.    For  the  sake  of  consistency,  the
instructions made conditional are also allocated positions in the arrays and are
regarded as being  at nesting  levels -1 and -2.  The rules  can then be applied
uniformly throughout.

   When  the  instruction  made  conditional  is  itself a branch, the object is
optimised  slightly  to allow  branches  to be made direct to the required place



                                        49
from within the condition instead of having a branch to a branch. For example:

                %IF sc1 %OR sc2 %OR sc3 %THEN ->100 %ELSE instr

The code produced takes the form:

                           +-------------+
                           |     sc 1    |
                           +-------------+
                             ->100 if true
                           +-------------+
                           |     sc 2    |
                           +-------------+
                             ->100 if true
                           +-------------+
                           |     sc 3    |
                           +-------------+
                             ->100 if true
                           +-------------+
                           |    instr    |
                           +-------------+

This is easily  achieved  by using the required label instead of a private label
to label the conditional branch instruction in the branch array.  The only other
modification  is to ensure  that the branch  on  the  previous  i.e.last, simple
comparison is on true rather than false and to the label specified.  This should
be clearer when the implementation details are discussed.

   The contents of the five arrays can be displayed  during compilation by using
the parameter '/COND'.  Examples will be found in appendix 1.


%START-%FINISH Groups


   When either of the instructions  made  conditional is a %START-%FINISH group,
the destinations of the final branches are not immediately known.  This can only
be  done  after  the  %FINISH  has  appeared.  Information  therefore  has to be
preserved  during  the  compilation  of the  group.   Since the group may itself
contain further conditional statements and hence nested %START-%FINISH groups, a
pushdown  list is used for the purpose.  Furthermore, since  routine  bodies may
also appear anywhere and be nested, a list will potentially be required for each
textual  level  of  the  program.  An  array  of  headcells  to lists is used as
previously to achieve this:

                       +----+       +---+---+     +---+---+
                       |    |------>|   |   |---->|   |   |
                       +----+       +---+---+     +---+---+
                       |    |---+
                       +----+   |   +---+---+
                       |    |   +---|   |   |
                       +----+       +---+---+
                       |    |

Two  items of  information  need to be stored for each group.  Firstly, a marker
which  distinguishes  between groups before and those after  %ELSE  keywords and
secondly, a label  number  assigned  to the destination  of branches  around the
group.  This will be a private label.





                                        50
Implementation Details


   The code  generation  for  conditions  is done  by  integerfn 'COND'  in file
'SKIMPC'.  This is called from the section of 'STATEMENT' in file 'SKIMPB' which
deals with conditional statements.  It is at this latter place that the code for
the instructions  made  conditional is generated by calling routine 'INSTR'.  In
order to deal with  the  branch  instruction  optimisation,  two  parameters  in
addition  to  the  parameter  which  indicates where the condition starts in the
analysis  record  are passed in. These give the label numbers of the branches or
-1 if either  of then is a non-branch  instruction.  These are then used instead
of  private  labels  as  will be seen below.  The result of this function is the
private label number of the label to be put in front of the %ELSE clause.


   %START-%FINISH  groups  are  dealt  with  after  the  condition code has been
generated and also where  %FINISH statements are processed.  Two subroutines are
used to maintain  the  pushdown lists.  These are 'PUSHSTART' and 'POPSTART' and
can be found in file 'SKIMPD'.

   The five  parallel  arrays within function 'COND' are named 'TESTP', 'LEVEL',
'ANDOR', 'BRANCH' and 'LABEL'.   They  hold. respectively, the  analysis  record
positions  of the  simple  comparisons  (or 'tests'), the nesting  levels of the
simple  comparisons, the  truth or falsity  of  the  jumps(1 for false and 2 for
true), the destination  of the jumps (in terms of column  position in the table)
and the labels assigned to simple comparisons.  For example, consider:

     %IF (sc1 %OR sc2 %OR sc3) %AND (sc4 %OR sc5) %THEN instr1 %ELSE instr2

The contents of the arrays will be:

                         1     2     3     4     5     6     7
            TESTP      ap1   ap2   ap3   ap4   ap5
            LEVEL        1     1     0     1    -1    -2
            ANDOR        2     2     1     2     1     2
           BRANCH        4     4     7     6     7
            LABEL       -1    -1    -1 10000    -1 10002 10001

where the 'ap?' represent  analysis  record  positions.  The first pass over the
statement  is  performed  by the  subroutine  'PROCESSCOND'  and  fills  in rows
'TESTP', 'LEVEL' and 'ANDOR'.   It works  recursively  following  the  recursive
definition of conditions.  Subsequently, two extra columns are filled in with -1
and -2 for  the  pseudo-%AND and the  pseudo-%OR.   On  the  next  pass  the row
'BRANCH'is  filled in and  following that labels are assigned as necessary. (The
-1 indicates  that no jump  takes place to this simple  comparison  and hence no
label is required.  The  final pass  generates the object code by calling 'EXPR'
with the 'CONDFLAG' set as described  in the  previous  chapter  to evaluate LHS
minus RHS, and dumping the appropriate intermediate jump instructions.

   Another example may help to clarify the picture further .  Consider the case:

     %IF sc1 %OR ((sc2 %OR sc3) %AND sc4 %AND sc5) %THEH instr1 %ELSE instr2

The structure of the statement is:









                                        51
           sc2 %OR
              |
              |true    sc3 %AND   sc4 %AND
              |           |       |  |
sc1 %OR       |           |false  |  |false
   |          |           |       |  |
   | true     +-----------|------>   |        sc5 %AND
   |                      |          |           |
   |                      |          |           |false
   |                      |          |           |
   +----------------------|----------|-----------|-------->instr1 %OR
                          |          |           |
                          +----------+-----------+---------------------->instr2


The table for this statement will be:

                         1     2     3     4     5     6     7
            TESTP      ap1   ap2   ap3   ap4   ap5
            LEVEL        0     2     1     1    -1    -2
            ANDOR        2     2     1     1     1     2
            BRANCH       6     4     7     7     7
            LABEL       -1    -1    -1 10004    -1 10003 10005

As a final example, consider a statement containing conditional branches:

                 %IF sc1 %OR sc2 %OR sc3 %THEN ->88 %ELSE ->99

The table will be:
                         1     2     3     4     5
            TESTP      ap1   ap2   ap3
            LEVEL        0     0    -1    -2
            ANDOR        2     2     2     2
           BRANCH        4     4     4
            LABEL       -1    -1    -1    88    99




























                                        52


                                     Chapter 9

                                  Labels and Jumps


   As  the  SKIMP  compiler  is  a one-pass  compiler, jumps  to labels  not yet
encountered  cannot be completely  generated.  Backward jumps are, of course, no
problem.  For the forward jumps, all that can be done is to leave  a hole in the
branch  instruction  dumped which can be filled in at some later stage.  In this
case, the  assembler  which  forms  the first stage of the 'SKIMPAI' interpreter
system has the job to do.  In order for this to be possible, information must be
recorded  somewhere  concerning  the positions  of the holes to be filled in and
what to fill them in with.  The method adopted is to use the holes themselves as
links  in lists  which relate to each label and for the  compiler to dump 'FILL'
assembler  directives  which are  really the  headcells to these lists and which
also  include  the  value  to  be  filled  in.  The 'FILL' directives are dumped
whenever  a  label  which  has  been  previously  referenced is encountered.  An
example should make this clear. Consider the program:

                                   . . . .
                                   ->88
                                   . . . .
                                   ->99
                                   . . . .
                                   ->99
                                   . . . .
                               99: . . . .
                                   . . . .

The object code dumped will take the form:

                       . .     . . . .
                                   +------+
                       12$     B,,,|   0  |
                                   +------+
                       . .     . . . .
                                   +------+
                       34$     B,,,|  12  |
                                   +------+
                       . .     . . . .
                                   +------+
                       56$     B,,,|  34  |
                                   +------+

   As labels are local to each  level of routine  nesting, once  again  separate
lists  must  be used for each level.  Each  cell of a list in this case contains
three items of information, the label number, a flag and an address:

                    +---------------+------+-----------+
                    |     label     | flag |  address  |
                    +---------------+------+-----------+
                           (16)        (1)      (15)









                                        53
The flag indicates whether  the address is that of the latest  forward reference
to the label  i.e. the current top of the forward  reference list, or the actual
address of the label  which  will be needed for any  backward  references.   The
value 1  indicates  the  former and  0  the  latter.   When  additional  forward
references appear therefore, the address is pushed down 'through' this cell.

   A cell of this form  is created whenever a label is encountered for the first
time - either as a label or as a jump.  The list  for the current  textual level
is first  searched and a new cell added if no existing cell refers to that label.
The flag allows  two vital  checks  to be made.  Firstly  that the label has not
occurred  before (as a label) and  secondly  at the end  of the routine that the
label has in fact appeared.

   Routine  calls  use the 'BAL' instruction  and are always backward references
since  SKIMP does not allow routine %SPECs.  The address of the routine body is,
in other  words, already  known when any routine  calls are made.   SKIMP allows
routines  to  be placed  anywhere  in a program.  In  particular  at the head of
surrounding routines.   Jumps  around  these  routine  bodies  must therefore be
dumped.  These  are forward  references but are simple in that only one backward
reference will occur.  They  are  therefore  treated specially without having to
invent a private label.

   Calls  on  the  built-in  input-output  routines  are also treated specially.
Since these routines are implemented in the 'SKIMPAI' assembler/interpreter, the
'BAL' instructions  dumped contain the marker 'EXT' and a number which refers to
the routine required.  Were external  references part of the language, the names
themselves would have to appear in the object file and likewise for any external
variables  in the program.  A linker  would then  resolve the references between
programs.


Implementation Details


   The routines  which  accomplish  the functions decrlbed above are named 'FILL
LABEL' and  'FILL BRANCH', in file 'SKIMPD'.  A function  'GET LABEL' extracts a
label from the analysis  record and checks that it is valid i.e.less than 10000.
'POP LABELS' pops  up  the  cells  and  checks  for  missing labels at %END (and
%ENDOFPROGRAM).
























                                        54










                                     Appendix 1

                           Examples of SKIMP Object Code Files


















































                                        55
                                  LEX, ANAL example

                                 SKIMP COMPILER MKII

FILE: TESTL
OPTIONS: LEX ANAL


%begin

  130

  (1/STATEMENT)    8

    0$          LDA,COT,,0
    1$          LDA,DR1,,0
    2$          LDA,STP,DR1,0

%integerfn r

  136  134  256   82

  (1/STATEMENT)    6    5    6    8  (5/PROC)    2  (6/NAME)    1
   82  (8/FORMAL)

    3$          B,,,0
    4$          STR,DR2,STP,0
    5$          LDA,DR2,STP,0
    6$          STR,WK,STP,1
    7$          LDA,STP,STP,0


%integer i,j,k

  136  256   73   44  256   74   44  256   75

  (1/STATEMENT)    5    3  (3/ARRAY)    2    6    8  (6/NAME)    1
   73  (8/NAMES)    1   11   13  (11/NAME)    1   74  (13/NAMES)   1
    16  18  (16/NAME)    1   75   (18/NAMES)   2



i=j+k

  256   73   61   256   74   43  256   75

  (1/STATEMENT)     1    3  (3/INSTR)    1    7    9   10  (7/NAME)    1
   73  (9/ACTUAL)    2   (10/ASSIGN)    1   12  (12/EXPR)    1
   16   17   23  (16/UNARY)     4  (17/OPERAND)     1  20   22
  (20/NAME)    1   74  (22/ACTUAL)     2  (23/EXPRREST)    1    27
   28   34  (27/OP)    9   (28/OPERAND)    1   31   33  (31/NAME)    1
   75  (33/ACTUAL)    2   (34/EXPRREST)    2

    8$          LOAD,ACC,DR2,3
    9$          ADD,ACC,DR2,4
   10$          STR,ACC,DR2,2

%if i>1234 %then %stop

  135  256   73   62  257 1234  145  144



                                        56
  (1/STATEMENT)    2    5   36   37 (5/COND)    1    8   35 (8/TEST)    1
   12   24   25 (12/EXPR)    1   16   17   23  (16/UNARY)    4
  (17/OPERAND)    1   20   22 (20/NAME)    1   73 (22/ACTUAL)    2
  (23/EXPRREST)    2 (24/COMP)    6 (25/EXPR)    1   29   30
   34 (29/UNARY)    4 (30/OPERAND)    2   32 (32/CONST)    1
 1234 (34/EXPRREST)    2 (35/CONDREST)    3 (36/INSTR)    6
  (37/ELSE)    2

   11$          LOAD,ACC,DR2,2
   12$          SUB,ACC,COT,0
   13$          BNG,ACC,,0
   14$          STOP,,,0
   15$          FILL,10000,13,15


%results=i+4321

  141    61  256   73   43 257 4321

  (I/STATEMENT)     1    3 (3/INSTR)    5    5 (5/EXPR)    1
   9   10   16 (9/UNARY)    4 (10/OPERAND)    1   13   15 (13/NAME)    1
  73 (15/ACTUAL)    2 (16/EXPRREST)    1   20   21   25 (20/OP)    9
 (21/OPERAND)    2   23 (23/CONST)   1 4321 (25/EXPRREST)    2


   15$          LOAD,ACC,DR2,2
   16$          ADD,ACC,COT,1
   17$          LDA,STP,DR2,0
   18$          LOAD,DR2,STP,0
   19$          LOAD,WK,STP,1
   20$          B,,WK,0


%end

  131

  (1/STATEMENT)    7    3 (3/OFPROG)    2

   21$          FILL,ALLOC,7,5
   21$          STOP,,,0
   22$          FILL,SKIP,3,22


%endofprogram

  131  139

 (1/STATEMENT)    7    3 (3/OFPROG)    1

   22$          FILL,ALLOC,2,2
   22$          STOP,,,0
   23$          FILL,COT,0,23
   23$          CONST...1234
   24$          CONST,,,4321
   25$          FILL,STACK,1,25

     $   0 FAULTS IN PROGRAM





                                        57
                    EXPR example

                    SKIMP COMPILER MKII

FILE: TESTE
OPTIONS: EXPR

%begin
    0$          LDA,COT,,0
    1$          LDA,DR1,,0
    2$          LDA,STP,DR1,0


%integer i,j,k,l

%integerarray a(1:10)

   -4*    1

    3$          LDA,ACC,,1
    4$          STR,ACC,DR1,6

   -4*   10

    5$          LDA,ACC,,10
    6$          LDA,ACC,ACC,1
    7$          STR,ACC,DR1,7
    8$          SUB,STP,DR1,6
    9$          STR,STP,DR1,8
   10$          ADD,STP,DR1,7


i=j+k
                                  +
   -3   01010003  -3  01010004   9*     1    3

   11$          LOAD,ACC,DR1,3
   12$          ADD,ACC,DR1,4
   13$          STR,ACC,DR1,2


a(j+k)=i*l-j*k
                                 +
   -3   01010002  -3  01010005   8      1    3  -3   01010003  -3 01010004
    8     8   10  10*    5   12
    *              -
   14$          LOAD,ACC,DR1,3
   15$          MLT,ACC,DR1,4
   16$          STR,ACC,DR1,6
   17$          LOAD,ACC,DR1,2
   18$          MLT,ACC,DR1,5
   19$          SUB,ACC,DR1,6
   20$          STR,ACC,DR1,6
                                  +
   -3   01010003  -3   01010004  9*     1    3

   21$          LOAD,ACC,DR1,3
   22$          ADD,ACC,DR1,4
   23$          ADD,ACC,DR1,8
   24$          LOAD,WK,DR1,6
   25$          STR,WK,ACC,0


                                        58
i=i*(j+k)/(l-i**2)
                                                 +              *
   -3   01010002  -3   01010003  -3   01010004   9     3    5   8    1    7
   -3   01010005  -3   01010002  -4     2    6  15    17   10  13   19    7*
   10    22                                  **             -

   26$          LOAD,ACC,DR1,2
   27$          EXP,ACC,COT,0
   28$          STR,ACC,DR1,6
   29$          LOAD,ACC,DR1,5
   30$          SUB,ACC,DR1,6
   31$          STR,ACC,DR1,6
   32$          LOAD,ACC,DR1,3
   33$          ADD,ACC,DR1,4
   34$          MLT,ACC,DR1,2
   35$          DIV,ACC,DR1,6
   36$          STR,ACC,DR1,2


i=a(j)+a(k)
                       +
  -2     17  -2   43   9*    1    3

  -3*    01010004

   37$          LOAD,ACC,DR1,4
   38$          ADD,ACC,DR1,8
   39$          LOAD,ACC,ACC,0
   40$          STR,ACC,DR1,6

   -3*  01010003

   41$          LOAD,ACC,DR1,3
   42$          ADD,ACC,DR1,8
   43$          LOAD,ACC,ACC,0
   44$          ADD,ACC,DR1,6
   45$          STR,ACC,DR1,2

%endofprogram
   46$          FILL,ALLOC,2,9
   46$          STOP,,,0
   47$          FILL,COT,0,47
   47$          CONST,,,2
   48$          FILL,STACK,1,48

     $   0 FAULTS IN PROGRAM

















                                        59
                    TAGS Example

                    SKIMP COMPILER MKII

FILE: TESTT
OPTIONS: TAGS


%begin
    0$          LDA,COT,,0
    1$          LDA,DR1,,0
    2$          LDA,STP,DR1,0


%routine a(%integer i,j,k)
    3$         B,,,0
    4$         STR,DR2,STP,0
    5$         LDA,DR2,STP,0
    6$         STR,WK,STP,1
    7$         LDA,STP,STP,0


%end
    8$         FILL,ALLOC,7,5

75    K        01020004
74    J        01020003
73    I        01020002

    8$         LDA,STP,DR2,0
    9$         LOAD,DR2,STP,0
   10$         LOAD,WK,STP,1
   11$         B,,WK,0
   12$         FILL,SKIP,3,12


%integerfn b(%integername l)
   12$          B,,,0
   13$          STR,DR2,STP,0
   14$          LDA,DR2,STP,0
   15$          STR,WK,STP,1
   16$          LDA,STP,STP,0


%routine c(%integerarrayname m,n)
   17$          B,,,0
   18$          STR,DR3,STP,0
   19$          LDA,DR3,STP,0
   20$          STR,WK,STP,1
   21$          LDA,STP,STP,0


%end
   22$         FILL,ALLOC,21,4

78    N        31130003
77    M        31130002






                                        62
   22$          LDA,STP,DR3,0
   23$          LOAD,DR3,STP,0
   24$          LOAD,WK,STP,1
   25$          B,,WK,0
   26$          FILL,SKIP,17,26


%end
   26$          FILL,ALLOC,16,3

67    C         40220012
                    31130002
                    31130003
76    L         11020002

   26$          STOP,,,0
   27$          FILL,SKIP,12,27


%integer i,j


%endofprogram
   27$          FILL,ALLOC,2,4

74    J         01010003
73    I         01010002
66    B         4111000D
                    11020002
65    A         40310004
                    01020002
                    01020003
                    01020004

   27$          STOP,,,0
   28$          FILL,COT,0,28
   28$          FILL,STACK,1,28

     $   0 FAULTS IN PROGRAM
























                                        63
                    Basic Example

                    SKIMP COMPILER MKII

FILE: HANOI
OPTIONS:

%begin
    0$          LDA,COT,,0
    1$          LDA,DR1,,0
    2$          LDA,STP,DR1,0


%routine hanoi(%integer n,p1,p2)
    3$          B,,,0
    4$          STR,DR2,STP,0
    5$          LDA,DR2,STP,0
    6$          STR,WK,STP,1
    7$          LDA,STP,STP,0


%if n>0 %then %start
    8$          LOAD,ACC,DR2,2
    9$          BNG,ACC,,0


hanoi(n-1,p1,6-p1-p2)
   10$          LOAD,ACC,DR2,2
   11$          SUB,ACC,COT,0
   12$          STR,ACC,STP,2
   13$          LOAD,ACC,DR2,3
   14$          STR,ACC,STP,3
   15$          LDA,ACC,,6
   16$          SUB,ACC,DR2,3
   17$          SUB,ACC,DR2,4
   18$          STR,ACC,STP,4
   19$          BAL,UK,,4


write(p1,1) ;
   20$          LOAD,ACC,DR2,3
   21$          STR,ACC,STP,2
   22$          LDA,ACC,,1
   23$          STR,ACC,STP,3
   24$          BAL,WK,EXT,11


write(p2,1) ;
   25$          LOAD,ACC,DR2,4
   26$          STR,ACC,STP,2
   27$          LDA,ACC,,1
   28$          STR,ACC,STP,3
   29$          BAL,HK,EXT,11


newline
   30$          BAL,WK,EXT,7






                                        64
hanoi(n-1,6-p1-p2,p2)
   31$          LOAD,ACC,DR2,2
   32$          SUB,ACC,COT,O
   33$          STR,ACC,STP,2
   31$          LDA,ACC,,6
   35$          SUB,ACC,DR2,3
   36$          SUB,ACC,DR2,4
   37$          STR,ACC,STP,3
   38$          LOAD,ACC,DR2,4
   39$          STR,ACC,STP,4
   40$          BAL,WK,,4


%finish
   11$          FILL,10000,9,41


%end
   41$          FILL,ALLOC,7,5
   41$          LDA,STP,DR2,0
   42$          LOAD,DR2,STP,0
   43$          LOAD,WK,STP,1
   44$          B,,WK,0
   45$          FILL,SKIP,3,45


%integer a,b,c


1:read(a)
   45$          LDA,ACC,DR1,2
   46$          STR,ACC,STP,2
   47$          BAL,WK,EXT,10


%if a=0 %then %stop
   48$          LOAD,ACC,DR1,2
   49$          BNZ,ACC,,0
   50$          STOP,,,0
   51$          FILL,10001,49,51


read(b) ;
   51$          LDA,ACC,DR1,3
   52$          STR,ACC,STP,2
   53$          BAL,WK,EXT,10


read(c)
   54$          LDA,ACC,DR1,4
   55$          STR,ACC,STP,2
   56$          BAL,WK,EXT,10











                                        65
hanoi(a,b,c)
   57$          LOAD,ACC,DR1,2
   58$          STH,ACC,STP,2
   59$          LOAD,ACC,DR1,3
   60$          STR,ACC,STP,3
   61$          LOAD,ACC,DR1,4
   62$          STR,ACC,STP,4
   63$          BAL,WK,,4


->1
   64$          B,,,45


%endofprogram
   65$          FILL,ALLOC,2,5
   65$          STOP,,,0
   66$          FILL,COT,0,66
   66$          CONST,,,1
   67$          FILL,STACK,1,67

    $    0 FAULTS IN PROGRAM









































                                        66













                                    Appendix 2

                            Possible Extensions to SKIMP















































                                        67
1.  a.  Identifier labels instead of numerical labels:

                  LOOP: ->NEXT
                  NEXT: ->LOOP

    b.  Switches:

                  %SWITCH TYPE(1:9)
                  TYPE (I): ->TYPE(I*J)

2.  a.  Indefinite loops:

                  %CYCLE
                  . . . .
                  %REPEAT

    b.  Incremental loops:

                  %FOR I=1,1,N %CYCLE
                  . . . .
                  %REPEAT

    c.  'While' loops:

                  FRED %WHILE I>0
                  %WHILE I>0 %CYCLE
                  . . . .
                  %REPEAT

    d.  'Until' loops:

                  FRED %UNTIL I>0
                  %CYCLE
                  . . . .
                  %REPEAT %UNTIL I>0

    e.  %EXIT and %CONTINUE statements.

3.  a.  Reversed order conditional statements:

                  I=J*K %IF I=0

    b.  'Unless' conditions:

                  %UNLESS I=J %THEN K=L

    c.  'else-if' clauses:

                  %IF I=0 %THEN J=0 %ELSE %IF K=0 %THEN L=0

    d.  Multi-sided comparisons:

                  %IF A<B<C<D %THEN E=F










                                        68
4.  a.  Multi-dimensional arrays:

                  %INTEGERARRAY A(1:9,-1:1),B,C(1:N,1:M*N,0:1)

    b.  Array bound checking:

5.  a.  Own variables:

                 %OWNINTEGER I,J=123,K=456

    b.  Own arrays:
                 %OWNINTEGERARRAY L(1:10)=1,2,3.4,5,6(5)

    c.  Constant variables and arrays:

                 %CONSTINTEGER I=1234
                 %CONSTINTEGERARRAY J(1:3)=2,7,4

6.  Real variables:

                 %REAL X,Y,Z
                 %REALARRAY A(1:N)
                 %REALFN RF(%REAL X,%REALNAME Y,%REALARRAYNAME Z)

    SKIMPAI has, built-in, the real operations:

                 ADDF, SUBF, MLTF, DIVF, EXPF, NEGF, FLT

7.  String variables:

                 %STRING(15) S
                 %STRING(7)%ARRAY T(1:10)
                 %STRING(99)%FN SF(%STRING(3) S)
                 S=S."FRED".SF("JOE").T(I)

8.  Records:
                 %RECORDFORMAT RF(%INTEGER I,J,%RECORD(RF)%NAME K)
                 %RECORD(RF) R
                 R_I=R_J+1
                 X=R_K_K_I

9.  a.  INTERDATA object code

    b.  PDP-11 object code

    c.  VAX 11-780 object code

















                                        69















                                     Appendix 3

                          The SKIMP Assembler/Interpreter













































                                        70
   The  object  file  output  from  the  SKIMP  compiler  forms the input to the
assembler/interpreter.  The first stage identifies statements to be assembled by
means of the '$' symbol  following code addresses.  It is the remainder  of such
lines  that are  regarded  as  significant.  Any other  lines  such  as compiler
monitoring  are ignored.  This also allows interpreter directives to be inserted
in the program by using SKIMP comments as shown:

                                  ! $ . . . .

The interpreter directives available are:

                                  MONITOR
                                  TRON
                                  TROFF

The first of these, 'MONITOR', when encountered during interpretation prints out
the  contents  of  the  registers  and  the  stack.  Since  this  may generate a
substantial  amount  of output, this  facility  should  be used  sparingly.  The
second  and third turn on and off respectively a trace facility which prints out
the value of the program  counter  for  each instruction  interpreted.  This can
also  be  turned  on  by  means  of  a '/TR' parameter in the call of 'SKIMPAI'.
Tracing may also generate large amounts of output|  For example:

                                   ! $ MONITOR

   The interpreter  will  not  attempt  to interpret  programs  which  failed to
compile  correctly.   Furthermore,  the  system   makes  various  checks  during
interpretation  in an endeavour  to ensure  that the program has not gone out of
control.  These  include  unassigned  variable checking, that data addresses lie
within the stack or constant table and that program counter addresses lie within
the code area.  A limit  of 10000 instructions executed is also imposed to catch
loops.

   To demonstrate the use of these facilities, consider the following program:

                              %BEGIN
                              %INTEGERARRAY A(1:10)
                              %INTEGER I
                              ! $ TRON
                              I=1
                           1: A(I)=I*I
                              I=I+1
                              %IF I<=10 %THEN ->1
                              ! $ TROFF
                              ! $ MONITOR
                              %ENDOFPROGRAM

The output from the compiler and the interpreter are shown below:















                                        71
                    SKIMP COMPILER MKII

FILE: TEST
OPTIONS:


%BEGIN
    0$          LDA,COT,,0
    1$          LDA,DR1,,0
    2$          LDA,STP,DR1,0


%INTEGERARRAY A(1:10)
    3$          LDA,ACC,,1
    4$          STR,ACC,DR1,2
    5$          LDA,ACC,,10
    6$          LDA,ACC,ACC,1
    7$          STR,ACC,DR1,3
    8$          SUB,STP,DR1,2
    9$          STR,STP,DR1,4
   10$          ADD,STP,DR1,3


%INTEGER I


! $ TRON
I=1
   11$          LDA,ACC,,1
   12$          STR,ACC,DR1,5


1: A(I)=I*I
   13$          LOAD,ACC,DR1,5
   14$          MLT,ACC,DR1,5
   15$          STR,ACC,DR1,2
   16$          LOAD,ACC,DR1,5
   17$          ADD,ACC,DR1,4
   18$          LOAD,WK,DR1,2
   19$          STR,WK,ACC,0


I=I+1
   20$          LOAD,ACC,DR1,5
   21$          ADD,ACC,COT,0
   22$          STR,ACC,DR1,5


%IF I<=10 %THEN ->1
   23$          LOAD,ACC,DR1,5
   24$          SUB,ACC,COT,1
   25$          BHG,ACC,,13











                                        72
! $ TROFF
! $ MONITOR
%ENDOFPROGRAM
   26$          FILL,ALLOC,2,6
   26$          STOP,,,0
   27$          FILL,COT,0,27
   27$          CONST,,,1
   28$          CONST,,,10
   29$          FILL,STACK,1,29

     $   0 FAULTS IN PROGRAM



   11$   12$   13$   11$   15$   16$   17$   18$   19$   20$
   21$   22$   23$   24$   25$   13$   14$   15$   16$   17$
   18$   19$   20$   21$   22$   23$   24$   25$   13$   14$
   15$   16$   17$   18$   19$   20$   21$   22$   23$   24$
   25$   13$   14$   15$   16$   17$   18$   19$   20$   21$
   22$   23$   24$   25$   13$   14$   15$   16$   17$   18$
   19$   20$   21$   22$   23$   24$   25$   13$   14$   15$
   16$   17$   18$   19$   20$   21$   22$   23$   24$   25$
   13$   14$   15$   16$   17$   18$   19$   20$   21$   22$
   23$   24$   25$   13$   14$   15$   16$   17$   18$   19$
   20$   21$   22$   23$   24$   25$   13$   14$   15$   16$
   17$   18$   19$   20$   21$   22$   23$   24$   25$   13$
   14$   15$   16$   17$   18$   19$   20$   21$   22$   23$
   24$   25$
COT 27
DR1 29
STP 45
ACC 1
WK 100
 29$  ?   ? 100  11  34  11   1   4   9  16  25  36  49  64  81
 100  ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?
   ?  ?   ?

STOPPED AT 26$, 143 INSTRUCTIONS EXECUTED








                                        73
Last updated on 2005-Nov-21 14:15:30 by bfoley@@compsoc.nuigalway.ie
@ 1.1 log @Initial revision @ text @d1094 1 a1094 1 <OP> = '<<','>>','&','||','|','**'.,'/','*','+','-'; d1120 1 a1120 1 When the syntax definitions are read in. they are reduced to a numerical form @