CS4:COMP_LN2

                 Algol for compiler writers
                 ==========================

     Algol is the base from which IMP and Pascal were  derived.    Much  of  the
language should be familiar to you,  especially when written in the IMP notation
with percent sign to  designate  the  keywords.    The  complete  definition  is
contained in the modified report and this must be studied.   A less daunting but
less complete view can be obtained by reading the  ERCC  Algol  Maunual.    This
lecture  will  highlight some of the areas that are different from IMP or Pascal
including the three which cause greatest difficulty to compiler writers.

     The basic elements of  the  language  are  very  similar  to  IMP  and  are
described  in  sections  1  &  2 of the Report.   Note that upper and lower case
letters are distinct in identifiers and that the syntax  of  real  constants  is
more fussy than IMP.  The syntax of strings looks confusing but is only a way of
allowing string constants to be broken up with editing characters.   However the
syntax  of comments is complicated and they are difficult to remove in a lexical
pass for reasons that are not obvious. Further, the delimiter for "end" comments
are different from the delimiter for normal comments. The equivalence definition
of comments means that comments within strings are not comments and must not  be
touched,    so  any  lexical  scanner which is trying to remove comments must be
pretty smart.(Programs 9-15 of the test suite probe deeply into this area).  The
extended Paramater delimiter (3.2.1) is really a comment:-

            WRITE(a)the next figure gives the no of places:(5);

is exactly equivalent to:-

            WRITE(a,5);

     Variables and dynamic arrays are exactly like IMP in their declaration  and
use.

     Algol  expressions  come  in  3  varieties:-  Arithmetic,     Boolean   and
Designational.   Arithmetic expressions are very similar to Imp.   They have the
same operators and precedence rules although exponentiation  has  some  detailed
differences.  Functions can be incorporated just like Imp, as can array elements
although these latter have their subscripts enclosed in square brackets so  that
they are syntactically different from functions. The only added value that Algol
expressions have is the conditional term:-

            X:=X+(%if A>B %then A %else B);

     The concept is simple although it is difficult to compile on 370 since  the
the  different  parts  end  up in different registers.   Note the way the syntax
prevents any ambiguity  in  meaning.    The  use  of  "expression"  and  "simple
expression" mean that an "if" can follow an "else" but not a "Then".   This device
is used in a number of similar constructs.

     Boolean  expressions  are  also  familiar  to  Pascal   Users   and   quite
straightforward, even though there are no operations on 370 to support "implies"
or "equiv". Note that each Boolean Operator has an individual precedence so that
"ands"  and  "ors"  can  be mixed after an "if" without the brackets required by
IMP.

     Designational expressions are peculiar to Algol.  They are simply a rule to
produce a label.   In practice,  almost all designation expressions are either a
simple label or a switch element.   Apart from their novelty,  DEs add little to
the language and have been dropped from all Algol derivatives. After all:-

            %goto (%if A<B %then LAB1 %else LAB2);

is little different from:-

            %if A<B %then %goto A %else %goto B;

     The real problem with designational expressions is in switches.   An  Algol
switch  is  not a list of labels but a list of designational expressions.   This
will be considered later.

Compound Statements and Blocks
------------------------------

     An Algol program is a seqence of statements separated by  semicolons  which
are normally executed in sequence ("goto"s etc excepted).   It is fundamental to
Algol that a compound statement or Block can be  supplied  everywhere  a  simple
statement  is  acceptable  .   A Dummy statement is also a valid basic statement
(4.4.1).   The idea of a dummy statement is simple enough but  it  can  trip  up
Parsers  and  makes  some  Algol constructs look rather odd.   Since there is no
"unless" in Algol  some  programmers  obtain  the  effect  by  writing  a  dummy
statement after the "then":-

            %if A#0 %then %else X:=X+1;

     A compound statement is a series of statements (any one  of  which  can  of
course be a compound statement or Block) grouped together by %begin and %end.  A
Block introduces a new level of naming and consists of a %begin followed by  one
or more declarations and then one or more statments.  The crucial distinction is
that a Block has the "begin" followed by a declaration.  A block must be entered
via  the  "begin" but may be left by a non-local "goto".   Labels like variables
within a Block are local to the Block.  Labels in a compound statement are local
to the enclosing Block.

     A further quirk which confuses novices is that %BEGIN and %END are regarded
as  separators in their own right and do not need semicolons after the %BEGIN or
before a %END. If a semicolon is written, the effect is to add a dummy statement
which  normally  has no effect.   HOWEVER,   if a semicolon is written after the
%BEGIN of an intended block,   the effect of the semicolon is to turn the  Block
into  a compound statement since there is no declaration after the %BEGIN.   Any
declarations are then faulted by the compiler.

Assignment Statements
---------------------

     The Algol assignment statement is similar to Pascal or IMP and is likely to
provide  few  problems.    Note however that Algol allows one to assign reals to
integers the transformation being via the IMP INT function. Multiple assignments
are also permitted and well-defined (4.2.3).

The FOR Statement
-----------------

     The  Algol  FOR  statement  has  mind-bending  generality  and  provides  a
challenge  to  compiler  writers.    There is always a controlled variable and a
controlled staement which may of course be a compound statement or block.    The
FOR statement is an Algol statement but not a basic statement so that it can not
appear after a "then". This is because of the dangling "else" ambiguity. However
a  special  piece of syntax in the definition of conditions (4.5.1) allows a FOR
to follow a "then" in a restricted condition which has no "else".

The basic control statement has three forms:-

      The first is a string of expressions (often constants) :-

            %for I:=1,3,5,7,11,13 %do S;

      The second is the IMP "step"-"until" :-

            %for i:= 1 %step 1 %until 10 %do s;

      The third is a %while loop
 
            %for I:I+1 %while REM>0.0001 %do S;

    However these can be combined without limit:-

       %for i:= 1,3, 10 %step 1 %until 15,20 %step 1 %until 25 %do s;

     To compile these forms it is normally  necessary  to  turn  the  controlled
statement  into a parameterless procedure and call it.   Thus the equivalent IMP
for the horrendous FOR just quoted is:-

         %Routine p
         s
         %end
         i=1; p;
         i=3; p
         %for i= 10,1,15 %cycle
            p
         %repeat
         %for i= 20,1,25 %cycle
            p
        %repeat
 

     The overhead of transforming the  controlled  statement  into  a  procedure
causes a substantial runtime penalty but is usually unavoidable.  Where there is
only a single "step"-"until" element the procedurisation should be  omitted  and
the controlled statement put in-line.  The exact meaning of the various elements
is very clearly stated in section 4.6.4.   Note that Algol allows the controlled
variable to be a REAL variable!.


Switches
--------

     The Algol switch is simply a list of designational expressions.  THe bounds
are assumed to start at 1 and are obtained from the context. eg:-

           %switch SW:=LAB1,LAB2,%if x<y %then LABOK %else FAIL[ERR];

     Thus this  switch  is  assumed  to  have  bounds  of  1:3.    This  complex
construction  is  beyond the simple indexed branch provided as switch support by
the PUT routines and must be compiled into the following IMP

           %switch HIDDEN(1:3)
           HIDDEN(1): ->LAB1
           HIDDEN(2): ->LAB2
           HIDDEN(3): %if X<Y %then -> LABOK %else ->FAIL(ERR)

     This structure with a simple indexed branch leading to a series of  "goto"s
can  cope with the problem of the labels being at different textual levels.   It
can also cope with complex desgnational expressions.   However it is inefficient
for  the  common  case  where  the switch is composed only of labels and all the
labels are at the same level as the switch.   In this latter case,   an IMP-type
switch will suffice.  To enable the compiler to distinguish these cases and also
to deal with forward references to  labels  outside  the  current  block,    the
compiler must know all the labels currently in scope. Some early Algols made the
User declare all labels at the head of each block by an unauthorised  additional
statement of the form:-

           %label LAB1,LAB2;

     Nowadays this is frowned upon!   Compiler writers  are  supposed  to  build
parsers that action each label as it is found.   At the Block end,   the list of
Labels is collected and shuffled into the Block heading as an extra declaration.


Procedures
----------

     Procedures and parameter passing form the crux of the problem of  compiling
Algol - every aspect of the declaration and call has far-reaching consequences.

     An Algol procedure consists of a heading which is in several  parts  and  a
body of a single statement which is usually,  but not necessarily,  a block. For
example :-

         %procedure EX(A,B); %value A;
              %integer A: %real B;
              %begin %integer X,Y;
                   ............
              %end;

     The value part may be omitted, but if given must follow the parameters. The
value  list can only include parameters and any parameters not in the value list
are assumed to be called by name.  The specification list can have as many parts
as  necessary  but  can  only  specify  parameters - any local declarations must
follow a "begin".

     It is specifically stated (5.4.3) that the parameters may be re-declared in
the body hence strictly the above procedure occupies two textual levels. This is
a nonsense however and most  compilers  treat  procedure......end  as  a  single
entity.

     Function procedures are like Pascal - the  keyword  procedure  is  prefixed
with <type> and the "result" is assigned to the procedure name within the body -
the dual scope of the name is a nuisance.

     Procedures in Algol are declared by putting the body at  the  head  of  the
block like any other declarations. This has consequences. Consider the following
fragment :-

          %begin
               %integer A,B,C;
               ..........
          %begin
               %integer X,Y,Z;
               %procedure P(I); %integer I;
                 %begin
                   .......
                  A:=I;    %comment ?;
                  %end
                  %real A,Q,R;
                  .........

     The variable A is declared in the outer block and  in  the  same  block  as
procedure  P.    The local one is thus the relevant variable for the assignment.
This fragment represents "use before declaration" which is banned in array bound
expressions (5.2.4.2) but not generally. Some compilers would fault this program
- otherwise the compiler must shuffle the declarations so that procedures follow
all other declarations. The trouble with faulting use before declaration is that
it prevents two procedures (or,   less importantly,   two switches) from calling
each other. Most Algol experts insist on cross recursion being available even if
use before declaration is banned.

     The above discussions illustrate the enormous advantages of IMP's  %routine
%spec  statement  and  one  solution  to Algol's problems is to extend the label
noting mechanism (described above) to note procedures and construct -  invisible
to the programmer - procedure specs. The complications this causes to the parser
is amply repaid in simplification the sematic phases.

     There is one final problem with Algol procedures which concerns  procedures
as parameters. Consider the following fragment :-

            %procedure P(Q);  %procedure Q;
                %begin     %integer X,Y;
                   ..........
                   Q(X,Y).
                %end;

     This is not compileable since it is not clear if X and Y are to  be  passed
to Q by name or value.  If separate compilation of procedures is banned it ought
to be possible for the compiler to find a call of P and extract this information
from  the  procedure passed as the actual parameter.   This is hazardous.   Most
compilers demand a specific comment after a formal procedure specification:-

             %procedure P(Q);
                 %procedure Q; %comment(I,J): %value I,J:
                                 %real I: %integer J;

     Note this comment  exactly  apes  the  normal  specification  of  an  Algol
procedure  except that colons rather than semicolons separate the entities.   If
the comment does not comply exactly with the required format it is  taken  as  a
text comment.

Parameter Passing
-----------------

     Algol supports a variety of parameters - it is one of the richest languages
in this respect.   It is possible to pass integers,   reals,  Booleans,  labels,
integer arrays,   real arrays and Boolean arrays by name or value.  Additionally
procedures  switches  and strings ("which do not have values" (4.7.5.4)!) may be
passed by name.

     Variables by value,   arrays by name,   and procedures are similar  to  the
corresponding facilities in IMP or Pascal.   Arrays by value involve copying the
entire array and access information.   Note just as,  say,  1 is a valid real by
value  so  arrays  by  value may have an implied type change which involves type
changing each element of the array as it is copied!

     Labels and switches are simple in concept  -  access  to  them  is  just  a
variant of the non-local "goto".  However,  it is possible to pass a label several
times into a place where it would not be accessible by the  normal  scope  rules
and  then jump to it.   To allow this means taking great care with the design of
the object program's stack control.

     The unique element of Algol is in its call by name which  is  specified  as
being by textual substitution (!);   this makes available a range of programming
techniques of great complexity eg:-

       %real %procedure SUM(I,VAL,N);%value N;
           %integer I,N; %real VAL;
           %begin %real TOT;
                 TOT:=0;
                 %for I=1 %step 1 %until N %do
                       TOT:=TOT + VAL;
                 SUM:=TOT;
           %end;

     This procedure would be pointless in  most  languages  simply  computing  a
product by repeated addition. However, when called as :-

              SUM(I,A(I),10);

the effect of textual substitution makes the controlled statement become :-

              TOT:=TOT + A(I);

and the array is summed.

Other calls might be :-

              SUM(I,A(I,J,K),N);
              SUM(I,A(I)+B(I)+C(I),10)

to sum the column of a matrix, or to sum several arrays.

Note that the effect of textual substitution on assignment :-

              PARAM:= something;

is valid when a variable is passed becoming :-

              A(I,J):=something;

but invalid if A(I)+B(I) is passed as

              A(I)+B(I):=something;

is syntactically invalid.

     Strict interpretation of textual substitution means dynamic type  changing.
Since Algol allows assignment of reals to integers and integers to reals and the
parameter  may  be  real  or  integer,    that  leaves  a  four-way  matrix   of
possibilities.  Normally compilers treat a variable supplied as actual parameter
to be an expression except when it is of the same type as the formal  parameter.
This reduces the complexity of the problem.

     From the above examples it should be clear that an Algol name parameter  is
in fact an IMP map (if a variable is passed) or an IMP function. Thus SUM in IMP
is:-

       %realfn SUM(%integername I,%realmap VAL, %integer N)
           %real TOT; TOT=0;
               %for I=1,1,10 %cycle
                   TOT=TOT+VAL
               %repeat
               %result=TOT
       %end

and the call

       -> SKIP
       %realmap THUNKS
          %result==A(I)
          %end
       SKIP: SUM(I,THUNKS,10)

the more complex call:-

       -> SKIP                         !            ->SKIP
       %realfn NOT ASS THUNKS          !           %realmap   NOTASSTHINKS
          %RESULT=A(I)+B(I)+C(I)       !           %real TEMP
          %end                         !           TEMP=A(I)+B(I)+C(I)
       SKIP: SUM(I,NOTASSTHUNKS,10)    !           %result==TEMP
                                       !           %end
                                       !           SKIP:SUM(I,NOTASSTHUNKS,10)

     However,   since an expression and a variable may be  passed  to  the  same
procedure,  it is inconvenient to have to pass either a function or a map to the
same parameter.   It is much simpler if the second IMP example is recoded as  on
the right. This is more convenient to handle since a map is passed in all cases.
However, if an attempt to assign to the parameter is made it will overwrite TEMP
harmlessly.   A separate method of checking for this must be devised.  Note that
when a simple variable (I) is passed by name the effect is the  same  as  if  it
were  passed  by reference.   The called procedure will expect a map and this is
what is normally passed.

     Since a procedure parameter or thunks will only execute correctly if it can
access  its  global variables it follows that calling one of these entities must
involve some environment restoration. Thus to pass a label to an Algol procedure
one passes as parameter the following paramaterless procedure:-

          %procedure  PASSGOTO;
              %goto    ACTUAL LABEL;

     Note the degenerate case of a simple statement as body!   Similarly  for  a
switch:-

          %procedure  PASSSW(VAL); %value VAL; %integer VAL
             %goto    ACTUAL SWITCH[VAL];

     Thus are Algol parameters mastered :-  if  anything  is  difficult  pass  a
procedure and call it when the parameter is accessed.

Environmental Block
-------------------

     The Algol report talks as if the environmental block is compiled with  each
program  but  this  is a grossly inefficient way of doing it.   In fact a set of
separately compiled procedures is used.   A suitable set is available on  Amdahl
as:-

             ERCC07:STUENU0IS   (source)
       or    ERCC07:STUENU0IY   (object)

FACIENDA:
--------

  1) Look at the Algol Test suite  in  a  partitioned  file  ERCC07:ALGTESTS  on
     Amdahl.

     ALGTESTS ABLURB has a brief description of each test.   Choose  a  suitable
     example  (eg AT97) and code the equivalent IMP using a map.   Check you get
     the right answer and then examine the object code.

  2) Look at test AT93 to see how difficult call by name can become.

   3) Algol is available on EMAS (but not EMAS-A) by extending  your  searchlist
      to include ERCLIB.ALGOL.   You can thus try writing and running a few very
      simple programs.

   4) The EMAS Algol compiler does not  distinguish  between  Labels  passed  by
      value  and  Labels  passed by name.   This is a defect but it has never in
      practice inconvenienced anyone.   The whole test suite executes correctly.
      Can you write a 10 line program to illustrate this defect?  A can of Lager
      is offered for the most elegant solution!