CS4:COMP_LN2 Algol for compiler writers ========================== Algol is the base from which IMP and Pascal were derived. Much of the language should be familiar to you, especially when written in the IMP notation with percent sign to designate the keywords. The complete definition is contained in the modified report and this must be studied. A less daunting but less complete view can be obtained by reading the ERCC Algol Maunual. This lecture will highlight some of the areas that are different from IMP or Pascal including the three which cause greatest difficulty to compiler writers. The basic elements of the language are very similar to IMP and are described in sections 1 & 2 of the Report. Note that upper and lower case letters are distinct in identifiers and that the syntax of real constants is more fussy than IMP. The syntax of strings looks confusing but is only a way of allowing string constants to be broken up with editing characters. However the syntax of comments is complicated and they are difficult to remove in a lexical pass for reasons that are not obvious. Further, the delimiter for "end" comments are different from the delimiter for normal comments. The equivalence definition of comments means that comments within strings are not comments and must not be touched, so any lexical scanner which is trying to remove comments must be pretty smart.(Programs 9-15 of the test suite probe deeply into this area). The extended Paramater delimiter (3.2.1) is really a comment:- WRITE(a)the next figure gives the no of places:(5); is exactly equivalent to:- WRITE(a,5); Variables and dynamic arrays are exactly like IMP in their declaration and use. Algol expressions come in 3 varieties:- Arithmetic, Boolean and Designational. Arithmetic expressions are very similar to Imp. They have the same operators and precedence rules although exponentiation has some detailed differences. Functions can be incorporated just like Imp, as can array elements although these latter have their subscripts enclosed in square brackets so that they are syntactically different from functions. The only added value that Algol expressions have is the conditional term:- X:=X+(%if A>B %then A %else B); The concept is simple although it is difficult to compile on 370 since the the different parts end up in different registers. Note the way the syntax prevents any ambiguity in meaning. The use of "expression" and "simple expression" mean that an "if" can follow an "else" but not a "Then". This device is used in a number of similar constructs. Boolean expressions are also familiar to Pascal Users and quite straightforward, even though there are no operations on 370 to support "implies" or "equiv". Note that each Boolean Operator has an individual precedence so that "ands" and "ors" can be mixed after an "if" without the brackets required by IMP. Designational expressions are peculiar to Algol. They are simply a rule to produce a label. In practice, almost all designation expressions are either a simple label or a switch element. Apart from their novelty, DEs add little to the language and have been dropped from all Algol derivatives. After all:- %goto (%if A0.0001 %do S; However these can be combined without limit:- %for i:= 1,3, 10 %step 1 %until 15,20 %step 1 %until 25 %do s; To compile these forms it is normally necessary to turn the controlled statement into a parameterless procedure and call it. Thus the equivalent IMP for the horrendous FOR just quoted is:- %Routine p s %end i=1; p; i=3; p %for i= 10,1,15 %cycle p %repeat %for i= 20,1,25 %cycle p %repeat The overhead of transforming the controlled statement into a procedure causes a substantial runtime penalty but is usually unavoidable. Where there is only a single "step"-"until" element the procedurisation should be omitted and the controlled statement put in-line. The exact meaning of the various elements is very clearly stated in section 4.6.4. Note that Algol allows the controlled variable to be a REAL variable!. Switches -------- The Algol switch is simply a list of designational expressions. THe bounds are assumed to start at 1 and are obtained from the context. eg:- %switch SW:=LAB1,LAB2,%if xLAB1 HIDDEN(2): ->LAB2 HIDDEN(3): %if X LABOK %else ->FAIL(ERR) This structure with a simple indexed branch leading to a series of "goto"s can cope with the problem of the labels being at different textual levels. It can also cope with complex desgnational expressions. However it is inefficient for the common case where the switch is composed only of labels and all the labels are at the same level as the switch. In this latter case, an IMP-type switch will suffice. To enable the compiler to distinguish these cases and also to deal with forward references to labels outside the current block, the compiler must know all the labels currently in scope. Some early Algols made the User declare all labels at the head of each block by an unauthorised additional statement of the form:- %label LAB1,LAB2; Nowadays this is frowned upon! Compiler writers are supposed to build parsers that action each label as it is found. At the Block end, the list of Labels is collected and shuffled into the Block heading as an extra declaration. Procedures ---------- Procedures and parameter passing form the crux of the problem of compiling Algol - every aspect of the declaration and call has far-reaching consequences. An Algol procedure consists of a heading which is in several parts and a body of a single statement which is usually, but not necessarily, a block. For example :- %procedure EX(A,B); %value A; %integer A: %real B; %begin %integer X,Y; ............ %end; The value part may be omitted, but if given must follow the parameters. The value list can only include parameters and any parameters not in the value list are assumed to be called by name. The specification list can have as many parts as necessary but can only specify parameters - any local declarations must follow a "begin". It is specifically stated (5.4.3) that the parameters may be re-declared in the body hence strictly the above procedure occupies two textual levels. This is a nonsense however and most compilers treat procedure......end as a single entity. Function procedures are like Pascal - the keyword procedure is prefixed with and the "result" is assigned to the procedure name within the body - the dual scope of the name is a nuisance. Procedures in Algol are declared by putting the body at the head of the block like any other declarations. This has consequences. Consider the following fragment :- %begin %integer A,B,C; .......... %begin %integer X,Y,Z; %procedure P(I); %integer I; %begin ....... A:=I; %comment ?; %end %real A,Q,R; ......... The variable A is declared in the outer block and in the same block as procedure P. The local one is thus the relevant variable for the assignment. This fragment represents "use before declaration" which is banned in array bound expressions (5.2.4.2) but not generally. Some compilers would fault this program - otherwise the compiler must shuffle the declarations so that procedures follow all other declarations. The trouble with faulting use before declaration is that it prevents two procedures (or, less importantly, two switches) from calling each other. Most Algol experts insist on cross recursion being available even if use before declaration is banned. The above discussions illustrate the enormous advantages of IMP's %routine %spec statement and one solution to Algol's problems is to extend the label noting mechanism (described above) to note procedures and construct - invisible to the programmer - procedure specs. The complications this causes to the parser is amply repaid in simplification the sematic phases. There is one final problem with Algol procedures which concerns procedures as parameters. Consider the following fragment :- %procedure P(Q); %procedure Q; %begin %integer X,Y; .......... Q(X,Y). %end; This is not compileable since it is not clear if X and Y are to be passed to Q by name or value. If separate compilation of procedures is banned it ought to be possible for the compiler to find a call of P and extract this information from the procedure passed as the actual parameter. This is hazardous. Most compilers demand a specific comment after a formal procedure specification:- %procedure P(Q); %procedure Q; %comment(I,J): %value I,J: %real I: %integer J; Note this comment exactly apes the normal specification of an Algol procedure except that colons rather than semicolons separate the entities. If the comment does not comply exactly with the required format it is taken as a text comment. Parameter Passing ----------------- Algol supports a variety of parameters - it is one of the richest languages in this respect. It is possible to pass integers, reals, Booleans, labels, integer arrays, real arrays and Boolean arrays by name or value. Additionally procedures switches and strings ("which do not have values" (4.7.5.4)!) may be passed by name. Variables by value, arrays by name, and procedures are similar to the corresponding facilities in IMP or Pascal. Arrays by value involve copying the entire array and access information. Note just as, say, 1 is a valid real by value so arrays by value may have an implied type change which involves type changing each element of the array as it is copied! Labels and switches are simple in concept - access to them is just a variant of the non-local "goto". However, it is possible to pass a label several times into a place where it would not be accessible by the normal scope rules and then jump to it. To allow this means taking great care with the design of the object program's stack control. The unique element of Algol is in its call by name which is specified as being by textual substitution (!); this makes available a range of programming techniques of great complexity eg:- %real %procedure SUM(I,VAL,N);%value N; %integer I,N; %real VAL; %begin %real TOT; TOT:=0; %for I=1 %step 1 %until N %do TOT:=TOT + VAL; SUM:=TOT; %end; This procedure would be pointless in most languages simply computing a product by repeated addition. However, when called as :- SUM(I,A(I),10); the effect of textual substitution makes the controlled statement become :- TOT:=TOT + A(I); and the array is summed. Other calls might be :- SUM(I,A(I,J,K),N); SUM(I,A(I)+B(I)+C(I),10) to sum the column of a matrix, or to sum several arrays. Note that the effect of textual substitution on assignment :- PARAM:= something; is valid when a variable is passed becoming :- A(I,J):=something; but invalid if A(I)+B(I) is passed as A(I)+B(I):=something; is syntactically invalid. Strict interpretation of textual substitution means dynamic type changing. Since Algol allows assignment of reals to integers and integers to reals and the parameter may be real or integer, that leaves a four-way matrix of possibilities. Normally compilers treat a variable supplied as actual parameter to be an expression except when it is of the same type as the formal parameter. This reduces the complexity of the problem. From the above examples it should be clear that an Algol name parameter is in fact an IMP map (if a variable is passed) or an IMP function. Thus SUM in IMP is:- %realfn SUM(%integername I,%realmap VAL, %integer N) %real TOT; TOT=0; %for I=1,1,10 %cycle TOT=TOT+VAL %repeat %result=TOT %end and the call -> SKIP %realmap THUNKS %result==A(I) %end SKIP: SUM(I,THUNKS,10) the more complex call:- -> SKIP ! ->SKIP %realfn NOT ASS THUNKS ! %realmap NOTASSTHINKS %RESULT=A(I)+B(I)+C(I) ! %real TEMP %end ! TEMP=A(I)+B(I)+C(I) SKIP: SUM(I,NOTASSTHUNKS,10) ! %result==TEMP ! %end ! SKIP:SUM(I,NOTASSTHUNKS,10) However, since an expression and a variable may be passed to the same procedure, it is inconvenient to have to pass either a function or a map to the same parameter. It is much simpler if the second IMP example is recoded as on the right. This is more convenient to handle since a map is passed in all cases. However, if an attempt to assign to the parameter is made it will overwrite TEMP harmlessly. A separate method of checking for this must be devised. Note that when a simple variable (I) is passed by name the effect is the same as if it were passed by reference. The called procedure will expect a map and this is what is normally passed. Since a procedure parameter or thunks will only execute correctly if it can access its global variables it follows that calling one of these entities must involve some environment restoration. Thus to pass a label to an Algol procedure one passes as parameter the following paramaterless procedure:- %procedure PASSGOTO; %goto ACTUAL LABEL; Note the degenerate case of a simple statement as body! Similarly for a switch:- %procedure PASSSW(VAL); %value VAL; %integer VAL %goto ACTUAL SWITCH[VAL]; Thus are Algol parameters mastered :- if anything is difficult pass a procedure and call it when the parameter is accessed. Environmental Block ------------------- The Algol report talks as if the environmental block is compiled with each program but this is a grossly inefficient way of doing it. In fact a set of separately compiled procedures is used. A suitable set is available on Amdahl as:- ERCC07:STUENU0IS (source) or ERCC07:STUENU0IY (object) FACIENDA: -------- 1) Look at the Algol Test suite in a partitioned file ERCC07:ALGTESTS on Amdahl. ALGTESTS ABLURB has a brief description of each test. Choose a suitable example (eg AT97) and code the equivalent IMP using a map. Check you get the right answer and then examine the object code. 2) Look at test AT93 to see how difficult call by name can become. 3) Algol is available on EMAS (but not EMAS-A) by extending your searchlist to include ERCLIB.ALGOL. You can thus try writing and running a few very simple programs. 4) The EMAS Algol compiler does not distinguish between Labels passed by value and Labels passed by name. This is a defect but it has never in practice inconvenienced anyone. The whole test suite executes correctly. Can you write a 10 line program to illustrate this defect? A can of Lager is offered for the most elegant solution!