SKIMP MK II D. J. Rees Introduction SKIMP is a very simple language designed with the teaching of compilers in mind. Although simple, it possesses many of the structural features of full-scale languages, for example the subroutine and parameter structure, making it a worthwhile non-trivial vehicle for teaching. Many useful features of full-scale languages are missing but can bp incorporated with very little difficulty. This is quite intentional. The reasons are firstly that SKIMP is a sufficiently rich language for the compiler to be written in SKIMP itself with little inconvenience (the SKIMP MKI compiler was written in SKIMP) and secondly that incorporation of the missing features provides a very suitable form of student exercise. Although it would be desirable for students to write a complete compiler of their own, this is a considerable task which they would probably not have the time to complete. The task of extending a simple language such as SKIMP for which a compiler already exists is much more feasible and still allows students to get to grips with the compiling process and so understand it more fully. The name SKIMP arises fron the language's origin. Essentially it is a much smaller version of the language IMP, an ALGOL-60 type language developed in Edinburgh for systems implementation and general use. Since SKIMP is intended for teaching rather than production programming, it has not been necessary for the compiler to produce object code for a particular real machine. Instead, the opportunity has been taken to invent a machine whose order code does not have the inconsistencies and exceptional cases that all actual machines seen to have. The compiler produces symbolic code for this invented machine together with a certain amount of optional monitoring of the compilation. This document describes the SKIMP language, the object machine and details of the method of compilation. Examples of compiler output and suggested extensions to SKIMP suitable for student exercises are given in appendices. To enable students to test their modified compilers an assembler/interpreter for the object code is available. The use of this is also described in an appendix. 1 Chapter 1 The SKIMP Language A SKIMP program consists of a sequence of keywords, identifiers and constants together with arithmetical operators and various separator characters. Keywords A keyword consists of a sequence of upper or lower case letters preceded by the character '%'. For example: %ROUTINE %END %THEN The '%' character is regarded as a shift character in that all following letters up to the first non-letter are shifted to keyword mode. This implies that no spaces or other non-letter characters may be inserted into the middle of a keyword. The '%' can, however, be repeated after a space. Thus, %INTEGERNAME and %INTEGER %NAME are valid and equivalent, but %INTEGER NAME is invalid. Identifiers An identifier consists of a string of letters and digits of any length the first character of which must be a letter. For example, SKIMP I J2 X1A In descriptions of SKIMP syntax below, the phrase is used to denote the presence of an identifier. Constants Constants appear as operands in arithmetic expressions and may be of two forms, decimal and character. Both represent integer valued quantities since SKIMP only has integer valued variables and performs only integer arithmetic. A decimal constant is represented by a sequence of digits. For example: 123 98765 4 No decimal points or powers of 10 are allowed. Strings of up to four characters can be represented by surrounding the characters with quotes. For example: 'FRED' '123*' 'X' 2 The internal value is formed by packing the (ASCII) values of the characters right-justified in 8-bit fields of a word. A quote can become one of the characters of the value by repeating it. For example: 'JIM''S' '''' Input Lexical Processing Spaces are deleted on input everywhere except when they appear between quotes. Before deletion, however, they are significant in terminating the keyword '%' shift mode. Text between quotes remains intact with the exception that a repeated quote is regarded as a single quote. Newlines can therefore appear between quotes. For example, to test for the value of a newline character: %IF I = ' ' %THEN %RETURN This is equivalent to: %IF I = 10 %THEN %RETURN since the ASCII value of a newline character is 10. SKIMP statements are separated by either ';' or a newline. Thus several statements can appear on one line of text. For example: X = 1 ; Y = 2 ; Z = 3 In order to represent a single statement on more than one line intermediate newline characters can be inserted when preceded by the continuation keyword %C. For example: %IF X = 1 %THEN %C A = B This is equivalent to: %IF X = 1 %THEN A = B Program Structure A SKIMP program starts with the statement: %BEGIN and ends with the statement: %ENDOFPROGRAM Inside the program, routines and functions may appear in sequence or nested as required. A routine takes the form: 3 %ROUTINE RT ..... %END and a function takes the form: %INTEGERFN FN ..... %END The only difference between routines and functions is that functions produce a value as their result and routines do not. The structure of a progran could take the following form, for example: %BEGIN %ROUTINE A ..... %END %INTEGERFN B %ROUTINE C ..... %END ..... %END ..... %ENDOFPROGRAM Details of routines and functions and their parameters are given later. Declarations Names of variables anrd arrays denoting storage objects are declared at the start of a routine or function. Execution of declaration statements has the affect of setting aside storage for those variables or arrays. When the routine or function in which the declaration occurred is returned from, that storage is deleted so that it may be re-used for future declarations. When routines and functions are entered recursively new storage is set aside for the variables and arrays declared there without the storage for the old incarnation of the routine or functions being lost. This becomes accessible again when the recursive activation is left. A stack is used to accomplish this. The only objects SKIMP manipulates are integer valued. All names must be declared before they can be used. Scalar declarations take the form: %INTEGER , , , . . . Arrays are declared: %INTEGERARRAY , , . . .( : ) where the phrase is used to denote the presence of an arithmetical expression. For example: %INTEGER I, J, K sets aside three locations and names them 'I', 'J' and 'K'. The expressions in 4 the array declaration are the lower and upper bounds on the index of the arrays. They may be either constants, for example: %INTEGERARRAY A ( I : 10 ) or expressions which are evaluated at run-time, for example: %INTEGERARRAY B, C ( -1 : N+1 ) Only one-dimensional arrays are defined in SKIMP (but see possible extensions in Appendix 2). Note that the first statement of a routine or function has the effect of declaring its name. Hence, this must appear before any calls on the names. This allows self-recursive calls but not cross-recursive calls between two routines at the same nesting level. (SKIMP does not have routine and function SPECs as in IMP). Scope of Variables When a variable or array is declared, that name can be used anywhere in the routine or function in which the declaration appears, but not outside it. If the routine or function has nested routines or functions the name can also be used in these (as 'global' to them). For example: %ROUTINE A %INTEGER I %ROUTINE B %INTEGER J .... %END %ROUTINE C %INTEGER K %ROUTINE D .... %END .... %END .... %END The name 'I' can be used anywhere in routines 'A', 'B', 'C' and 'D'. The name 'J' can be used only in routine 'B' and the name 'K' can be used in routines 'C' and 'D'. Although, clearly, all the names of variables declared in a single routine or function must be distinct, the same names may be declared in different routines to refer to different storage objects. When a name is redeclared in a nested routine or function, the storage for the global object of the same name becomes inaccessible whilst within that routine or function. The name will always refer to the most 'local' declaration. Arithmetic Expressions An arithmetic expression consists of a seauence of operands separated by operators, possibly preceded by a unary operator (*, -, \) . Operands can be 5 array elements, constants, calls on functions or bracketted sub-expressions. The operators are: + : addition - : subtraction * : MultIplication / : division (rounded) ** : exponentiation << : logical shift left >> : logical shift right & : logical AND | : logical OR || : logical Exclusive OR \ : logical NOT Their precedences are: \ : highest ** << >> * / & + - | || : lowest The precedence is left to right for operators of equal precedence. Thus: 1 / J * K is equivalent to: ( 1 / J ) * K Array element operands take the form: ( ) For example: A ( 1 ) B ( 2 * A ( J ) ) Assignment Variables and array elements are assigned new values using statements of the form : = ( ) = For example: I = 2 A ( I * J ) = ( K + L ) / 2 6 Routines and Functions A routine is called by executing a statement consisting of the name of the routine. For example: %ROUTINE RT .... %END would be called by the statement: RT Similarly, the call of a function appears as an operand of an expression. For example: %INTEGERFN FN .... %END might be called by: I = FN Routines and functions can be called recursively to any depth. The exit from a routine is specified by the statement: %RETURN This is also implied by executing the %END statement of a routine, The exit from a function must give in addition a result for the function: %RESULT = For example: %RESULT = I * J - K Execution of the %END statement of a function is invalid. A routine or function can only be entered by an appropriate call as described above. If a routine or function is encountered in the normal flow of control of a program at run-time it is skipped around. Routine and Function Parameters Parameters of the following types can be passed to routines and functions: %INTEGER %INTEGERNAME %INTEGERARRAYNAME The first is a 'value' type parameter and the other two are 'reference' types. For the %INTEGER type, the value of an expression in the routine or function call is calculated and passed to the routine or function as the initial value of the parameter. The initialised parameter can be regarded thereafter as a local 7 variable. For the %INTEGERNAME type, a reference to some storage object outside the routine or function is passed and this object will be referred to whenever the parameter is referenced within the routine or function. Hence the only admissible forms of parameter in the call are either the name of a variable or the name of an array element. Similarly for the %INTEGERARRAYNAME type, a reference to an array is passed and the only admissible form of 'actual' parameter is the name of an array. For example: %INTEGER A,B,C %INTEGERARRAY M,N (1:10) %ROUTINE JIM(%INTEGER I,%INTEGERNAME J, %INTEGERARRAYNAME K) .... %END JIM (A+B,C,M) JIM (A,M(B),N) Jumps SKIMP labels are numerical, For example: 123: 987: To jump to these labels the instructions would be: ->123 ->987 Labels are local to a routine or function and jumps can only take place within the routine or function and not either to an outer textual level or to a nested inner level. Labels are not declared and jumps may precede or follow the occurrence of the corresponding label without restriction. Conditional Statements The general form of the conditional statement is: %IF %THEN %ELSE where represents the condition to be tested and represents a restricted class of instructions described below. If the condition is true the first of these instructions is executed, otherwise the second. The %ELSE clause can be omitted in which case if the condition is false execution proceeds to the following statement. A is formed by one or more simple relations of the form: where represents one of the following comparators: = * <= < >= > For example: I < J + K These simple relations can be combined using the keywords %AND and %OR. For 8 example: I = J %AND K = 0 No precedence is defined between %AND and %OR. Instead, brackets must be used. A single level of bracketting may only contain a sequence of simple comparisons connected with either %AND or %OR but not both. For example: I = 0 %AND ( J = 0 %OR K = 0 %OR L = 0 ) The relations are evaluated left to right but only up to the point where the truth or falsity of the whole condition can be determined. An may be any of the following types of statement: assignment routine call jump %RETURN %RESULT= %STOP For example: %IF I = J %THEN %RETURN %IF K > 0 %THEN K = 0 %ELSE K = -1 In addition, groups of statements can take the place of an and be made conditional. The grouping is made by preceding the first statement by the statement: %START and following the last one by the statement: %FINISH For example: %IF J = K %THEN L = 0 %ELSE %START %IF J < K %THEN %START L = 1 M = 2 %FINISH %ELSE %START L = 2 M = 1 %FINISH %FINISH %STARTs and %FINISHes may be nested to any depth but must balance within any routine or function. Comments Comment statements start with the exclamation character '|'. Thereafter, all text is ignored up to the next statement separator i.e. semi-colon or newline. For example: 9 ! this is a comment I = 0 ;! so is this ; J = 0 Input-Output The standard IMP set of input-output routine names are built into SKIMP. These are: %ROUTINE READ SYMBOL(%INTEGERNAME X) Read a single symbol from the current input stream into X. %INTEGERFN NEXT SYMBOL Return as the result the next single symbol which will be read. %ROUTINE SKIP SYMBOL Skip over the next single symbol in the input. %ROUTINE PRINT SYMBOL(%INTEGER X) Output the single symbol X. %ROUTINE SPACE. Output a space symbol. %ROUTINE SPACES(%INTEGER X) Output X spaces. %ROUTINE NEWLINE Output a newline symbol. %ROUTINE NEWLINES(%INTEGER X) Output X newlines. %ROUTINE NEWPAGE Output a newpage symbol. %ROUTINE READ(%INTEGERNAME X) Read a decimal number into X skipping preceding spaces and newlines. %ROUTINE WRITE(%INTEGER X, Y) Output the decimal number X of Y digits. 10 Chapter 2 The Object Machine. The SKIMP compiler produces object code output for an invented machine. The output takes the form of what would be this machine's assembly code rather than a binary form so that it can easily be inspected by students. Since the machine is only an invented one and does not exist, the object code cannot be run in the ordinary way, but an assembler/interpreter for this code is available which allows the output to be tested. No attempt has been made to define the machine completely. Only those features necessary and relevant for the compiler are defined. Interrupt structure and input-output instructions, for example, are not defined. The machine has a number of registers which can either be used as accumulators or as index registers. In this respect, it is similar to machines such as IBM 370, DEC 10 etc.. It is not a true multi-accumulator machine, however, in that no register to register arithmetic operations are defined. The instruction format is shown below: operation , register , base , displacement For example: LOAD, ACC, DR1, 2 Instructions operate between the register and an effective storage address formed from the sum of the contents of the base register and the displacement. The machine is word-addressed and assumed to have sufficient storage and registers for the purposes of the compiler. Although not strictly necessary, it is also useful to define a word size and the instruction word layout. A 32-bit word is envisaged with the layout: +----------------+------------+--------+------------------+ | operation | register | base | displacement | +----------------+------------+--------+------------------+ (8) (4) (4) (16) This provides for 16 registers and a displacement of up to 65535. All sixteen registers, 0-15, can be used in the register field of the instruction, but only registers 1-15 in the base field allowing a zero in the base field to indicate no indexing in the effective address. The compiler object code output does not in practice use register numbers. Instead, it uses register mnemonics for convenience of inspection. The mnemonics used by the compiler are: ACC, STP, COT, WK, DR0, DR1, DR2, DR3,..... Their individual use is explained in later chapters. 11 The operations defined for this machine are: LOAD : Load the register with the contents of the word at the effective storage address. LDA : Load the register with the effective storage address. ADD : Add the contents of the word at the effective storage address to the register. SUB : Subtract the contents of the word at the effective storage address from the register. MLT : Multiply the register by the contents of the word at the effective storage address. DIV : Divide the register by the contents of the word at the effective storage address. EXP : Raise the register to the power given by the contents of the word at the effective storage address. STR : Store the contents of the register in the location at the effective storage address. NEG : Negate the contents of the register (the effective address is ignored). NOT : Perform the logical NOT on the register (the effective address is ignored). SHL : Shift the register logically left by the amount given by the contents of the word at the effective storage address (zeros shifted in from the right, bits shifted out to the left lost). SHR : Shift the register logically right by the amount given by the contents of the word at the effective storage address (zeros shifted in from the right, bits shifted out to the right lost). AND : Perform the logical AND between the register and the contents of the word at the effective storage address. OR : Perform the logical OR between the register and the contents of the word at the effective storage address. XOR : Perform the logical Exclusive OR between the register and the contents of the word at the effective storage address. BAL : Unconditional branch to the effective address and the address of the next instruction stored in the specified register. B : Unconditional branch to the effective address (any register present is ignored). BZ : Branch to the effective address if the contents of the register are zero. BNZ : Branch to the effective address if the contents of the register are not zero. 12 BG : Branch to the effective address if the contents of the register are greater than zero. BNG : Branch to the effective address if the contents of the register are not greater than i.e. less than or equal to, zero. BL : Branch to the effective address if the contents of the register are less than zero BNL : Branch to the effective address if the contents of the register are not less than i.e. grater than or equal to, zero STOP : Stop execution. In addition to these instructions, the compiler also outputs an assembler directive 'FILL' in the same instruction format. This is used to direct the assembler to fill in previously assembled holes in the object code with appropriate values. This is intended primarily for dealing with forward jumps and is described in detail in later chapters. 13 Chapter 3 The Structure of the SKIMP Compiler The SKIMP compiler consists of a set of external routines written in IMP. It is a 1-pass system in that the compiler makes one pass over the source text and generates the object code statement by statement. Each statement is processed separately and the processing consists of four phases. These are: 1 : line reconstruction 2 : lexical analysis 3 : syntax analysis 4 : code generation The flow of control through these phases is shown in figure 3.1. +--------------------+ +--------------->| line reconstruct | | | next statement | | +----------+---------+ | | | +----------+---------+ | | analyse lexical | | | structure | | +----------+---------+ | | | +----------+---------+ | | analyse syntactic | | | structure | | +----------+---------+ | | | +---------+--------+ | | valid syntax ? | | +-+--------------+-+ | | no yes| | | | | +--------------+---+ +---+-----------+ |<---| print "SYNTAX ?" | | generate code | | +------------------+ +-------+-------+ | | | +----------+---------+ | | end of program ? | | +-+---------------+--+ | |no yes| +-----------------------------+ | figure 3.1 Syntax Definitions For convenience when continually modifying and developing the compiler, a description of the syntax of statements is read in from a file each time a program is compiled. The syntax takes the form of phrase structure definitions. 14 For example: = '0','1','2','3'.'4','5','6','7','8','9'; This would define a phrase as any of the digits 0 to 9. Characters such as these digits are enclosed within single quotes as shown. Groups of characters can also appear between quotes if required. Commas separate the alternative forms the phrase may take and a semi-colon terminates the definition. Keywords are enclosed within double quote characters. For example: = "ROUTINE" , "INTEGERFN" ; Note that the '%' character can be omitted. When it is not clear what constitutes a single keyword, for instance %INTEGERARRAYNAME, it may be divided up as convenient. For example, the forms: "INTEGERARRAYNAME" "INTEGER" "ARRAY" "NAME" are both valid and equivalent. A decimal number might be defined by the phrase as follows: = , ; That is, a number is either a digit followed by a number or is a single digit. This illustrates how phrases can be defined recursively. It is also possible to have a null alternative i.e. one with no items. For example, if it was desired to define a decimal number optionally preceded by a plus or minus sign, a phrase could be defined: = '+' , '-' , ; Then the required definition could be: = ; In other words, the third alternative form of , namely nothing, can precede the . A phrase definition with only one alternative such as this is rather redundant but nevertheless quite valid. The particular syntax analysis scheme used in the SKIMP compiler imposes certain restrictions on the forms phrase definitions May take but discussion of these is postponed until the scheme itself has been described. The syntax definitions for SKIMP are shown in figure 3.2. The types of statement in SKIMP are defined by the phrase and the remaining definitions follow from this. There are two phrases built into the compiler and which therefore do not need to be defined. These are and . They represent identifiers and constants respectively. The reason for them being built-in is that identifiers and constants are recognised at the lexical analysis stage and hence do not require definitions for use by the syntax analyser. 15 = , "IF""THEN", ':', "FINISH", "INTEGER", , "END", "BEGIN"; = , '->', "START", "RETURN", "RESULT"'=', "STOP"; = ; = '-','\','+', ; = ,,'('')'; = , ; = '<<','>>','&','||','|','**'.,'/','*','+','-'; = ; = ,'('')'; = '=','#','<=','<','>=','>'; = "AND","OR", ; = "AND", ; = "OR", ; = "ELSE", ; = '('')', ; = ',', ; = '=', ; = "ARRAY"'('':'')',; = ',', ; = "ROUTINE","INTEGERFN"; = '('"INTEGER"
')', ; = "ARRAYNAME","NAME", ; = ','"INTEGER", ; = "OFPROGRAM", ; figure 3.2 Syntax Reduction When the syntax definitions are read in. they are reduced to a numerical form to suit the syntax analyser. The keywords are also picked out and built up into a dictionary for the benefit of the lexical analyser. This dictionary is a tree structure of cells which each represent an individual keyletter. When two or more keywords start with the same keyletters, the same cells are used for the letters in common. For example, the keywords %START and %STOP would be represented by the structure shown in figure 3.3. 16 +---+---+---+ | S | | | +---+-+-+---+ | +-+-+---+---+ | T | | | +---+-+-+---+ | +-+-+---+---+ +---+---+---+ | A | | |---->| O | | | +---+-+-+---+ +---+-+-+---+ | | +-+-+---+---+ +-+-+---+---+ | R | | | | P | | | +---+-+-+---+ +---+---+---+ | +-+-+---+---+ | T | | | +---+-+-+---+ figure 3.3 As can be seen, the letters S and T share the same cells. Each cell has two links in addition. The first links together cells containing successive letters in keywords and the second links together cells containing alternative letters. Thus, the cell containing O is linked from the cell containing A through the second link since A and O are the alternative letters which may follow T. In addition to the three items already described, ther is a further item in the cells at the leaves of the tree which contains an identification number of the keyword which terminates there. These numbers are assigned as the dictionary is created and range from 128 upwards in order to distinguish them from normal characters at the lexical analysis stage. To improve dictionary searching efficiency, the first letter of each keyword is used to index into an array of 26 cells. Tree searching only takes place thereafter. Just as keywords are assigned identification numbers, so are phrase names. Phrases and are assigned the numbers 256 and 257 respectively, and other phrases the numbers 258 upwards. These numbers are used in the reduced numerical form of the syntax definitions. In general, the reduced form of a definition is as shown in figure 3.4: +-----------------------> +----------------------> +-- - - --> | | | | | | +-+-+-----+---------------++-++-----+---------------++-++-- - --+-+-+ | | NT1 | alternative 1 | | NT2 | alternative 2 | | | 0 | +---+-----+---------------+---+-----+---------------+---+-- - --+---+ figure 3.4 The zero signifies the end of the definition. The contents of the alternatives fields in the diagram correspond one for one with the items in the definition read in. They are either the numerical equivalents of the literal characters or keyword or phrase identification numbers. NT1, NT2 etc. are the number of phrase items in that alternative. This is of relevance to the syntax analyser. The definitions reduced in this way are stored one after another in an array as shown in figure 3.5. 17 1 4 1 259 11 3 135 260 145 259 261 16 2 257 58 258 20 17 1 133 261 24 1 136 262 29 3 263 256 264 33 1 131 265 33 36 0 130 0 42 3 256 266 267 47 1 45 62 257 50 0 49 143 53 0 140 58 1 141 61 268 61 0 144 0 67 3 269 65 270 271 0 71 0 45 74 0 92 77 0 43 79 0 0 84 81 2 256 266 87 1 257 92 1 40 268 41 0 98 3 272 270 97 271 100 0 0 105 0 60 60 109 0 62 62 112 0 38 116 113 0 33 33 119 0 33 123 0 42 42 126 0 47 129 0 42 129 132 0 43 135 0 45 0 140 2 273 274 0 146 3 268 275 145 268 151 1 40 260 41 0 155 0 61 158 0 35 162 0 60 161 61 165 0 60 169 0 62 61 172 0 62 0 178 2 128 273 177 276 183 2 138 273 277 185 0 0 191 2 128 273 276 193 0 193 0 199 2 138 273 277 201 0 0 206 1 132 259 208 0 0 209 215 2 40 268 278 41 217 0 0 223 2 44 268 278 225 0 225 0 230 1 61 268 232 0 0 243 4 129 256 279 40 268 58 241 268 41 247 2 256 279 0 253 2 44 256 279 255 0 0 259 257 0 142 263 0 136 134 0 273 4 40 136 280 256 279 281 41 273 275 0 0 280 0 129 137 283 0 137 285 0 0 294 4 44 289 136 280 256 279 281 296 0 0 300 0 139 302 0 0 figure 3.5 The identification numbers of the keywords and phrases corresponding to this are shown in figure 3.6. This figure also gives the starting index positions in the array of each of the phrase definitions. Keywords Phrases 128 AND 256 0 NAME 129 ARRAY 257 0 CONST 130 BEGIN 258 1 STATEMENT 131 END 259 37 INSTR 132 ELSE 260 136 COND 133 FINISH 261 202 ELSE 134 FN 262 233 ARRAY 135 IF 263 256 PROC 136 INTEGER 264 264 FORMAL 137 NAME 265 297 OFPROG 138 OR 266 209 ACTUAL 139 OFPROGRAM 267 226 ASSIGN 140 RETURN 268 62 EXPR 141 RESULT 269 68 UNARY 142 ROUTINE 270 80 OPERAND 143 START 271 93 EXPRREST 144 STOP 272 101 OP 145 THEN 273 141 TEST 274 173 CONDREST 275 152 COMP 276 186 ANDCOND 277 194 ORCOND 278 218 EXPRS 279 248 NAMES 280 276 FORM 281 286 FORMALS figure 3.6 If any faults are discovered in the syntax definitions by the reduction routine 18 the compiler will halt without attempting to compile the source. Implementation Details The routines which implement everything described in this chapter are contained in the file 'SKIMPA'. The routine which reads in the syntax definitions is named 'READ PS' and it takes the definitions from a file named 'SKIMPPS'. The resultant reduced form is stored in the array named 'PS'. The positions in this where phrase definitions start are given by the array 'PP' which is indexed by phrase identification number. The information given in figures 3.2, 3.5 and 3.6 is output to a file named 'SKIMPPSL'. The keyword dictionary is formed from cells which are records of the record array named 'KD'. Their format is: %RECORDFORMAT KDF(%INTEGER L,N,A,B) 'L' represents the key letter, 'N' the identification number at a leaf cell and 'A' and 'B' the two links used in the tree structuring. The creation of the dictionary is slightly involved since keywords in the dictionary have to be the 'lowest common denominator' of the keywords in the syntax description. This means that keywords may have to be split up if a new keyword is encountered which is a prefix of an existing one. A local string array 'KA' of keywords is used to aid this process. The main control loop of the compiler which is illustrated in figure 3.1 is implemented in the main routine named 'SKIMP'. The box 'generate code' corresponds to routine 'STATEMENT' in file 'SKIMPB'. Subroutines used by this routine appear in the other files 'SKIMPC', SKIMPD' and 'SKIMPE'. The general strategy in processing statements is to break them down into component syntatic parts and process each of these simpler parts individually. At the highest level, each alternative form in the definition of phrase is processsed separately by executing a switch on the alternative number in the analysis record of the statement (see next chapter). This is also done at the lower hierarchical levels. 19 Chapter 4 Line Reconstruction, Lexical and Syntax Analysis Line Reconstruction Line reconstruction is the first phase of processing of the raw source text and is a simple cleaning up operation. Its functions are: 1. to mark keyword letters 2 to remove spaces 3. to concatenate continuation lines 4. to discard comments Since the use of '%' as a prefix can lead to many different ways of representing keywords (c.f. %INTEGERARRAYNAME above), this variability must be removed before they can conveniently be processed any further. This is done during line reconstruction by deleting the '%' and adding the value 128 to each keyletter. The only significance of spaces is in terminating keywords. Hence, after keyletters have been marked spaces are discarded. All symbols within quotes (and this can include spaces and newlines) are left intact, however. One statement at a time is read from the source file i.e. up to the first semi-colon or newline. Any number of lines may be concatenated using the '%C' at the end of a line subject only to a limitation of 256 characters in total. This is dealt with by looking at the previous character whenever a newline is encountered. If it is the value 'C'+128 i.e. a '%'-shifted 'C', then it and the newline are discarded and processing continues with the next line. If the first character of a statement is a '!' the statement is regarded as a comment and discarded. Lexical Analysis The cleaned up text received from the line reconstruct phase is next processed to find lexical objects. These are: 1. keywords 2. names 3. constants The start of any lexical object is determined by inspecting each character in turn. A value greater than 128 indicates the start of a keyword. The adjacent characters also greater than 128 are concatenated into a string which is then looked up in the keyword dictionary described previously. The identification number thus produced (ranging from 128 upwards) is stored in the lexical output array in place of the keyletters. Potentially more than one identification number can result from a compound keyword. In this case the look up routine is called repeatedly until all the keyletters have been identified. If a character value lies between 'A' and 'Z', it and the following letters and digits comprise a name which is then looked up in a dictionary. If the name 20 is already in the dictionary then its existing identification number is used. If it is not already in, it is inserted and a new identification number generated for it. The output to the lexical array consists of the number 256 (the phrase number of ) followed by the identification number of the name. The identification numbers are the positions at which the names are stored in a 'hash table'. The hash table operates as follows. A calculation is performed on the characters of the name itself to produce a 'hash index'. If this position is not already occupied, the name is inserted at that point. If the position is already occupied the name stored there may or may not be the same as the new name being looked up. If it is the same then no further action is required. If it is not, a 'clash' has occurred which must be resolved. Numerous methods of doing this have been invented but the simplest possible one is used here. This is to scan forward cyclically inspecting each successive position in turn. If the new name is encountered in any of these positions, again no further action is required. If, however, an unoccupied position is encountered then the new name cannot be in already and so can be inserted there. The requirements of the hash calculation are that it should be fast and that as far as possible different names should produce different indexes. This is, of course, unattainable in general since there are many more potential names than indexes but a fairly good attempt can be made. The scheme used in this compiler packs character values into a word, divides this word by a prime number just less than the size of the hash table and takes the remainder as the hash index. Constants start with either a digit or a single quote character. The lexical action in this case is to evaluate the constant and to insert the number 257 (for phrase followed by the value into the lexical array. For character constants the value consists of the individual character values packed into 8-bit fields of a word right justified. Characters which are not part of keywords, names or constants are passed through to the lexical array unchanged. Since characters have values less than 128, the type of a lexical item can be determined from the range it lies in. Syntax Analysis The compiler uses a top-down recursive descent method of syntax analysis which is directed by the table of reduced syntax definitions described earlier. This analyses the output of the lexical phase with respect to the phrase . The algorithm is implemented as a routine which considers just one phrase definition. Where literal characters or keywords appear as items in the definition they are compared directly with the lexical information and where a sub-phrase appears as an item, the routine is called recursively to match it. The technique involved in matching a phrase is essentially a search in that each alternative in the phrase definition is considered in turn. Similarly, the items in an alternative are considered one after another. An alternative matches if all the items in it have been matched. If any item in the definition does not match, the whole alternative in which that item appears is abandoned and the algorithm moves on to try the next alternative. A flow diagram of the algorithm is given in figure 4.1. Figure 4.2 shows the stages in the analysis of the statement: %INTEGER I,J 21 | compare phrase | + - - - - - - - - - - - - - - - -|- - - - - - - - - - - + | | | +---------------+ | +------------------>| another |--------------|--------> | | alternative ? | no failure | | +---------------+ | | yes | | | | | | +-----------+ | | +--------------->| another |----------------|--------> | | | item ? | no success | | | +-----------+ | | | yes | | | | | | | | +---------------+ | | | | phrase ? | | | | +---------------+ | | | s | | no | | | | | | | | + - - - - - -+ +------------+ | | | | compare | | item | | | | | phrase | | match ? | | | | + - - - - - -+ +------------+ | | | success | | failure yes | | no | | | | | | | | | +<---------+--|-----------------+ | | | | | | +<-----------------+--------------------+ | + - - - - - - - - - - - - - - - - - - - - - - - - - - - + figure 4.1 The restrictions upon the form phrase definitions may take which were mentioned in the previous chapter arise from this comparison algorithm. Consider the definition: = , ; If the analyser were to use this a recursive loop would result since appears as the first item in the first alternative. Similarly, consider the definition: = , ; In this case, the analyser would only be able to recognise single digit numbers since the first alternative would always prove successful and the second alternative would never be tried. Situations of these kinds must be avoided. Fortunately this usually presents no difficulties. From the efficiency point of view, it is also desirable to avoid other forms of definition. An example might be: = , ; The inefficiency results from being the first item in both alternatives. Clearly, whenever the second alternative is recognised ( the last 22 compare 1st alternative 1st item = compare 1st alternative 1st item = / no match 2nd alternative 1st item = '-' / no match 3rd alternative 1st item = "START" / no match 4th alternative 1st item = "RETURN" / no match 5th alternative 1st item = "RESULT" / no match 6th alternative 1st item = "STOP" / no match failure to match 2nd alternative 1st item = "IF" / no match 3rd alternative 1st item = / no match 4th alternative 1st item = "FINISH" / no match 5th alternative 1st item = "INTEGER" / match 2nd item = compare 1st alternative 1st item = "ARRAY" / no match 2nd alternative 1st item = / match (I) 2nd item = compare 1st alternative 1st item = ',' / match 2nd item = / match (J) 3rd item = compare 1st alternative 1st item = ',' / no match 2nd alternative null item / match successful match of no more items successful match of no more items successful match of no more items successful match of figure 4.2 23 operand of an expression) phrase will have been recognised twice. This could have been avoided by writing the definition of in the following way: = ; = , ; Definitions with only one alternative are, strictly, unnecessary and result in the inefficiency of an extra recursive call, but are occasionally used for the sake of clarity. and are examples in the SKIMP syntax. Efficiency is also partly the reason for recognising names and constants at the lexical stage. If their recognition were left to this algorithm, two phrases such as: = 'A' , 'B' , 'C', . . . . 'Z' ; = '0' , '1' , '2' , . . . . '9' ; would probably have to be introduced. Since the algorithm searches through the alternatives, the recognition of letters and digits would be very slow. Also in view of the searching, it will clearly be beneficial to put the most commonly occurring forms of alternative first in the definition of a phrase. In particular, and are ordered in what is hopefully a sensible way with this consideration in mind. Analysis Records During the course of syntax analysis, a record is kept of which phrases have been recognised. This information is passed to the code generation phase of the compiler and directs its course of action. The record consists of a linearisation of the analysis tree corresponding to the statement recognised. For instance, the analysis tree corresponding to the example above would take the form shown in figure 4.3. / 5 | | | +-------+ | | "INTEGER" /2 | | | +----------+ | | / 1 | | | | | +----+ | +------+ | | | | 'I' ',' /2 | | | | 'J' null figure 4.3 The number after the each phrase name in the tree is the number of the 24 alternative in its definition which was matched. These numbers and the pointers emanating from each phrase comprise the information stored in the linearised form of analysis record. For each phrase, the record takes the form shown in figure 4.4: +--------------------+-----------+-----------+----------- | alternative number | pointer 1 | pointer 2 | . . . . +--------------------+-----------+-----------+----------- figure 4.4 The information is pruned to a certain extent, however. There are only pointers for the phrase items matched. All reference to keywords, literals and null alternatives is omitted since their presence can be inferred from the alternative numbers. Phrases and are treated specially since they were recognised at the lexical stage. To be consistent with other phrases, both are given an alternative number matched of 1. In addition, for , the following word is set to contain the identification number of the name and for , the following word is set to the value of the constant. The analysis record for the axample is therefore as shown in figure 4.5: +--------------> +---------------> +--> +---|--> | +----|---> | | | | | | | | | | | +---+---+---+---+---+---+-------+---+----+----+---+-------+---+ | 5 | 3 | 2 | 6 | 8 | 1 | ID(I) | 1 | 11 | 13 | 1 | ID(J) | 2 | +---+---+---+---+---+---+-------+---+----+----+---+-------+---+ figure 4.5 The phrase names themselves also do not appear in the record as can be seen above. The pointers are just indexes into the array in which the record is stored. To aid clarity, when the analysis record is printed out by the compiler (the ANAL option) the positions at which phrases start are indicated by inserts such as: (1/STATEMENT) (3/ARRAY) (8/NAMES) Implementation Details Line reconstruction and lexical analysis are implemented together in the one routine which is named 'READ STATEMENT' and which appears in file 'SKIMPA'. Subsidiary to this are routines which deal with each kind of lexical object. These are named 'KEYWORD', 'NAME' and 'CONST'. Reconstructed text is placed in an array named 'TT' and this is lexically reduced into an array named 'T'. The name dictionary which is created during lexical processing comprises an array named 'NAMEDLINK' and a byte array named 'NAMED'. The former is used as the hash table and the entries in it point to positions in the latter array where the characters of the names are stored as concatenated strings. For example, figure 4.6: 25 +----+-+----+-+----+-+------+-+--------+ | | | | | | | | | | +----+++----+++----+++------+++--------+ | | | | +---------+ +----|------+ +--+ | | +----+ | | | | | +-+-----------+---------+-------+----------------- | 5 H A R R Y 4 D I C K 3 T O M 4 F R E D . . . . +------------------------------------------------- figure 4.6 The syntax analysis routine is named 'COMPARE', also in file 'SKIMPA', and is a function which returns the value 1 for success and 0 for failure. It takes as its single parameter the identification number of the phrase to be matched in this call. Hence, it is initially called with the number of phrase . Since is assumed always to be the first phrase definition to be read in, this number is 258. Phrase numbers 256 and 257, and respectively, are picked out on entry to the routine and treated as literals since they have already been recognised. The analysis record is placed in an array named 'A'. Processing Analysis Records The basis of the processing rests on knowing the starting position in the analysis record of each phrase which is of concern. The values in these positions indicate the alternative which was actually recognised by the comparison routine and it is this information which directs the code generation. The form of analysis record facilitates finding out where the phrases start. These are simply the values of the pointers which follow each alernative number. Consider, for example, the phrase structure associated with the declaration of variables: = . . . . . ."INTEGER", . . . . = "ARRAY"'('':'')',; The analysis record for: %INTEGER I, J, K would be: 5 3 2 6 8 1 ID(I) 1 11 13 1 1D(J) 1 16 18 1 ID(K) 2 The compiler fairly consistently uses names for the positions of where phrases start in the analysis record which consist of the name of the phrase with 'P' (for 'pointer') appended. In the case of phrase the variable is 'STATEMENTP' and it is the value in the analysis record at that point which causes the main switch in routine 'STATEMENT'. For the positions of phrases , and , the variables are 'ARRAYP', 'NAMEP' and 'NAMESP' respectively. The following IMP code shows how the values for these are set up and the declaration of scalars is handled: 26 ARRAYP=A(STATEMENTP+1) NAMEP=A(ARRAYP+1) NAMESP=A(ARRAYP+2) %IF A(ARRAYP)=2 %START %CYCLE ;| for each name declared PUSHTAG(A, alternative 2, is encountered. 27 Chapter 5 Storage Organisation Layout of Store The compiler produces object code which assumes the layout of store shown in figure 5.1. The object code itself starts at address zero and the table of constants and run-time stack follow consecutively. If it was so desired, however, the code could easily be made relocatable and the two other areas could be located elsewhere, for instance in separate segments on a machine with such an architecture. 0 +------------------+ | | | v | | | Object Code | | | | | +------------------+ | | | | | Table of | | Constants | | | | | +------------------+ | | | | | Run-time | | Stack | | | | | +------------------+ figure 5.1 The entry point to the object code is at address zero. The code dumped at this position sets the register 'COT' to the base of the table of contants and also sets the first display register, 'DR1', to the start of the stack (see below). Accesses to data are always made via index registers. Run-time Storage Allocation and Addressing The block structure of SKIMP leads naturally to the use of a stack for the allocation of storage. In particular, the possibly recursive calling of routines and functions is most conveniently dealt with using a stack. Storage for local variables and arrays is allocated when the routine or function is entered and deallocated when it is left. These areas are sometimes known as 'local name spaces'. Return addresses can also be preserved on the stack. Consider a number of routines 'A', 'B', 'C' etc. and the storage allocation for variables and arrays in those routines. Suppose routine 'A' is called which then calls 'B' and which then calls 'C'. A picture of the stack would be: 28 -----+------------------+-------------+-------------+----- | storage for A | for B | for C | -----+------------------+-------------+-------------+----- For calls in a different order, say 'A' called 'C', 'C' called 'D' and 'D' called 'B', the picture would be: -----+-----------+-----------+-----------+-----------+----- | A | C | D | B | -----+-----------+-----------+-----------+-----------+----- Similarly recursive calls, say 'B' called 'A', 'A' called 'A', 'A' called 'C' and 'C' called 'B': --+-----------+------------+-----------+-----------+------------+-- | B (1) | A (1) | A (2) | C | B (2) | --+-----------+------------+-----------+-----------+------------+-- Note that completely new storage areas are allocated for recursive calls. A result of this 'allocate when called' scheme is that the storage for routine is liable to start anywhere on the stack and not in a fixed place. The compiler cannot therefore assign addresses to the variables as it comes across them in declarations at compile-time. However, it can assign them addresses relative to the start of the storage area for that routine and this it does. To access a variable at run-time therefore requires both this 'relative address' and the address of the start of the appropriate local name space. These start addresses can be kept in registers thus allowing the 'base register and displacement' form in a single object instruction to be used to access variables. A register need not be allocated to each routine in the program, however. Only those which are currently active i.e. called, and which contain variables in scope at the current point of execution of the program, need have corresponding registers. Since only one routine can be in scope at each textual level i.e. routine nesting depth, only one register per textual level is needed. This set of registers is known as a 'Display'. For machines with no or only a few index registers, a display can be kept in store rather than registers but then extra overheads are involved whenever variables are accessed. A typical situation is that in which there are sufficient registers for one to be permanently allocated to the most recent local name space and perhaps another to be used for all global accesses. The loss of efficiency here is usually fairly acceptable. No such difficulties arise with the SKIMP machine. Clearly, the display will need to be updated on routine entry and exit. Consider the following section of program with two routines at the same textual level: %ROUTINE A . . . %END %ROUTINE B . . . %END When 'A' is called, the stack situation will be: 29 --------+-------------------+------------------------- | A | --------+-------------------+------------------------- | DR--> where 'DR' is the register allocated to point at the start of 'A's local name space. When 'B' is called from 'A', the variables in 'A' are no longer in scope i.e. are inaccessible, so 'DR' can be re-used to point to 'B's local name space: ---+------------------+-------------------+------------- | A | B | ---+------------------+-------------------+------------- | DR--> When 'B' returns, 'A's variables must again become accessible. Hence, the previous value of 'DR' must be preserved. This will be true for all the registers of the display. Their values must be preserved when they are updated so that they can eventually be restored. The stack can also be used for this purpose. The object code the SKIMP compiler produces puts preserved display values in the first location of each local name space followed by the return address to the calling routine. Storage for variables starts after that. This discussion leads to the following scheme for updating the display. The display is represented by the registers 'DR1', 'DR2', 'DR3', etc. and the register which is used to point to the start of unallocated storage on the stack is named 'STP', standing for STack Pointer. The call of a routine or function is made by means of an instruction of the form; BAL, WK, , (start address) which leaves the return address in a 'work' register named 'WK'. The code at the point of entry to a routine is: STR, DR(level of routine), STP, 0 LDA, DR(level of routine), STP, 0 STR, WK, STP, 1 ADD, STP, , (storage area size) At a point of exit i.e. %RETURN, %RESULT=, or %END, the code is: LDA, STP, DR(level of routine), 0 LOAD, DR(level of routine), STP, 0 LOAD, WK, STP, 1 B, , WK, 0 Array Addressing As mentioned previously, variables are allocated locations relative to the start of a local name space as they are declared. A problem arises for arrays since they may have bounds which are evaluated at run-time which means that the amount of space they need is not known at compile-time. If the space for arrays and variables were allocated in the order of declaration, variables declared after an array would not have a fixed relative address. The solution is to 30 allocate the space for arrays after that for the variables as shown in the diagram: --+--------+-------------+-----------------+------------------+-- | old DR | return addr | static area | dynamic area | --+--------+-------------+-----------------+------------------+-- | DR--> Each array is now allocated a single location in the static area and this points to the place in the dynamic area where the storage for the array is located. For example; %INTEGER I, J %INTEGERARRAY A, B( ? : ? ) %INTEGER K, L +-------------------------> +-----|-------------> | | | | | -+--+--+-----+-----+--+--+--+--+-----+-----+-----------+-----------+- | | | I | J | | | K | L | A | B | -+--+--+-----+-----+-----+-----+-----+-----+-----------+-----------+- | | <--DR STP--> When register 'STP' os incremented at routine entry, it is just incremented past the static storage area. The object code output for an array declaration allocates space beyond 'STP' and moves 'STP' up as appropriate for the size of the array. Variables in the static area have an address: DR(textual level of declaration) + relative address of variable which allows them to be accessed in one instruction. Array elements on the other hand can only be accessed indirectly via the pointer. If this points to the position of the possibly hypothetical zero element of the array rather than the first actual location, the address of an element such as: A( expression ) will be: [ DR(level) + rel. addr. of pointer ] + value of expression where the square brackets indicate 'contents of'. In addition to incrementing 'STP' the object code for an array declaration must also fill in this zero address. For the general case of an array declaration: %INTEGERARRAY A,B,....( lower : upper ) the form of the object code is: 31 code to evaluate 'lower' in 'ACC' STR, ACC, DR(level), temp1 code to evaluate 'upper' in 'ACC' LDA, ACC, ACC, 1 STR, ACC, DR(level), temp2 SUB, STP, DR(level), temp1 STR, STP, DR(level), rel. addr. of 'A's pointer ADD, STP. DR(level), temp2 SUB. STP, DR(level), temp1 STR, STP. DR(level), rel. addr. of 'B's pointer ADD, STP. DR(level), temp2 . . . . . 'temp1' and 'temp2' are temporary working storage locations set aside in the local name space to hold the values of 'lower' and 'upper+1' respectively. (Similar working storage is also used for partial results when evaluating expressions.) For each array declared the value 'STP-lower' before 'STP' is incremented gives the zero element address and that plus 'upper+1' gives the final value of 'STP' having allowed space of length 'upper-lower+1' for the array. Parameter Passing Parameters are regarded as initialised local variables of routines in SKIMP. Passing parameters for a routine call therefore consists of setting these initial values. This is quite straightforward since the the parameters will have the first relative addresses to be allocated in the static area of the routine's local name space. Before the entry code has been executed, the start of the future local name space will pointed to by the current value of 'STP'. This allows the values of the parameters to be stored using addresses relative to 'STP' before the 'BAL' is executed. The three types of parameter are each treated differently. For %INTEGER value type parameters, the expression in the call is evaluated in register 'ACC' and then stored in the appropriate place. For %INTEGERNAME type parameters, the address of the variable or array element is evaluated in register 'ACC' and then stored. Thereafter, the parameter has to be accessed indirectly via this address. For %INTEGERARRAYNAME type parameters, the word containing the zero element address of the array in the call is copied to the parameter location. This allows accesses to array name parameter elements to be handled by the compiler in exactly the same way as those to ordinary array elements. To illustrate the scheme, consider the stack when routine 'R' in the following program is called: %BEGIN %ROUTINE R(%INTEGER X, %INTEGERNAME Y,%INTEGRRARRAYNAME Z) . . . . . %END %INTEGERARRAY A ( 0 : ? ) %INTEGER I, J, K R(I+J, K, A) . . . . . 32 +------------------->+<-------------------------------+ | <-|--------------------------+ | | | | | | -+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+-- | | | | I | J | K | A | | | X | Y | Z | -+--+--+-----+-----+-----+-----+-----------+--+--+-----+-----+-----+-- | (A0) | (I+J) (A0)| <--DR1 DR2--> STP--> The object code for such a call would be: LOAD, ACC, DR1, 3 \ I+J ADD, ACC, DR1, 4 / STR, ACC, STP, 2 --> LDA, ACC, DR1, 5 \ name STR, ACC, STP, 3 / LOAD, ACC, DR1, 2 \ array name STR, ACC, STP, 4 / BAL, WK, , (start address of R) If a variable of %INTEGERNAME type i.e. a parameter, is passed as the actual parameter for another %INTEGERNAME parameter, the word containing the reference to the original actual parameter is copied to the position for the new parameter thereby maintaining just one level of indirection to reach the required address. Implementation Details Declarations of both scalar variables and arrays are processed as alternative 5 of phrase in file 'SKIMPB'. The routine 'PUSHTAG' which records details of declarations for future use is described in the next chapter. Relative addresses in the static area of any local name space are allocated sequentially. However, routines may be nested and may appear in the middle of the variable declarations of the enclosing routine. Hence, the next relative address to be allocated must be retained for each textual level up to the current level to allow for this. These values are stored in the array 'NEXTRAD' which has an element for each possible textual level. Since the compiler variable 'LEVEL' always contains the value of the current textual level, the assignment of relative addresses always uses 'NEXTRAD(LEVEL)'. Routine 'GET WORK' allocates locations in the local name space for temporary storage of partial results, in this instance in connection with array declarations. They are handled in exactly the same way as scalar declarations. Routine entry and exit sequences are dumped by the routines 'ENTER' and 'DUMP RETURN' which appear in file 'SKIMPD'. Note that 'ENTER' also handles the '%BEGIN' statement since this is similar in many respects. 33 Chapter 6 Identifier Tag Handling All identifiers in SKIMP either have to be declared or have to be specified before they can be used. This is to enable the compiler to generate the appropriate code when the identifier is later encountered. In practice this means that the compiler must preserve information about each identifier as it is declared or specified and this will be referred back to each time the identifier is used. This information on each identifier is often called a 'tag'. There are several different items of information which are packed together to form the tag which are: 1. Form of identifier 2. Type of identifier 3. Number of parameters (or dimensions) 4. Textual level of declaration 5. Relative address or start address 1. The form of an identifier is intended to differentiate between scalar variables and arrays, value and reference types of parameter, and data and program objects. 2. The type of an identifier specifies whether it is %INTEGER in nature or not. (In extensions to SKIMP the type would distinguish between %INTEGER, %REAL, %COMPLEX, %STRING etc.) The various forms and types in SKIMP as at present implemented are given the code values shown below: form type %INTEGER 0 1 %INTEGERNAME 1 1 %INTEGERARRAY 2 1 %INTEGERARRAYNAME 3 1 %ROUTINE 4 0 %INTEGERFN 4 1 3. If SKIMP implemented arrays of more than one dimension, the number of dimensions would have to be saved. As this is not the case this information is redundant. The corresponding information for routines and functions is needed, however. This is the number of parameters they take. Information about the parameters occupies further tag words using a scheme described later. 4. The textual level of declaration of each name must be saved since this is used to select the appropriate display base register for addressing purposes. 5. To complete the information needed for addressing variables and arrays, the relative addresses assigned must be saved. For routines and functions the address of their entry points must be stored so that the correct address can be output for the BAL instructions. These items are packed into a single word in the following format: 34 +--------+--------+----------+---------+----------------------+ | form | type | params | level | address | +--------+--------+----------+---------+----------------------+ (4) (4) (4) (4) (16) As an example, consider the program: %BEGIN %INTEGER I,J %ROUTINE R(%INTEGERNAME K) %INTEGERARRAY A(1: 10) ....... %END ....... %ENDOFPROGRAM The tag words for these names would be: +---+---+---+---+-----------+ I | 0 | 1 | 0 | 1 | 2 | +---+---+---+---+-----------+ +---+---+---+---+-----------+ J | 0 | 1 | 0 | 1 | 3 | +---+---+---+---+-----------+ +---+---+---+---+-----------+ R | 4 | 0 | 1 | 1 | (addr) | +---+---+---+---+-----------+ +---+---+---+---+-----------+ K | 1 | 1 | 0 | 2 | 2 | +---+---+---+---+-----------+ +---+---+---+---+-----------+ A | 2 | 1 | 0 | 2 | 3 | +---+---+---+---+-----------+ Values of tags can be inspected in the output from the compiler by using the option '/TAGS' when calling the compiler. They are printed out (in hexadecimal) at the %END of every routine and function and at %ENDOFPROGRAM for the names that were locally declared. An example of this can be found in appendix 1. The packing of the tag word places some limits on the programs the compiler can compile, for instance only up to fifteen routine parameters and only up to fifteen textual levels, but these are not serious limits for this compiler. (In fact, less than fifteen textual levels are permitted due to lack of registers.) This word of information has to be made available whenever the name it corresponds to is used. Since names are identified by a dictionary position i.e. hash position, a further array is used in parallel. This contains pointers to the list cells which contain the tag information. This is be illustrated in figure 6.1 for the preceding example again. 35 +---+---+ TAG LINK | | | v v +---+---+ +---+---+---+---+------++------+ +-----------| | |------->| 2 | 1 | 0 | 2 | 3 || 0 | | +---+---+ +---+---+---+---+------++------+ | | | | +---+ | +---+---+ +---+---+---+---+------++------+ | I |<--|-----------| | |------->| 0 | 1 | 0 | 1 | 2 || 0 | +---+ | +---+---+ +---+---+---+---+------++------+ | J |<--|-------+ | | | +---+ | | +---+---+ +---+---+---+---+------++------+ | R |<--|---+ +---| | |------->| 0 | 1 | 0 | 1 | 3 || 0 | +---+ | | +---+---+ +---+---+---+---+------++------+ | K |<--|---|---+ | | | +---+ | | | +---+---+ +---+---+---+---+------++------+ | A |<--+ | +---| | |------->| 1 | 1 | 0 | 2 | 2 || 0 | +---+ | +---+---+ +---+---+---+---+------++------+ | | | | | | | | | +---+---+ +---+---+---+---+------++------+ +-------| | |------->| 4 | 0 | 1 | 1 |(addr)|| |---> +---+---+ +---+---+---+---+------++------+ | | | | <---------------------------------------+ | | | | +---+---+---+---+------++------+ +---+---+ +--->| 1 | 1 | 0 | 2 | 2 || 0 | +---+---+---+---+------++------+ figure 6.1 The diagram also shows how the information on parameters is stored. This is in further list cells beyond that for the name of the routine. In general, the structure is as shown in figure 6.2. Note that there is now a duplication of the tag information on the parameter, firstly, in the chain of cells after the tag for the name of the routine and secondly in the dictionary position for the parameter name. However, this second occurrence is only present during compilation of the routine i.e. when the parameter name is in scope, whereas the first occurrence is present whenever the routine name is in scope i.e. whenever parameter information is needed to compile the routine calls. | | | +---+---+ +------+--+ +-------+--+ +-------+--+ | | |----->| name | |--->| par 1 | |--->| par 2 | |---> . . . +---+---+ +------+--+ +-------+--+ +-------+--+ | | | figure 6.2 Only those names which are in scope at the position in the program the compiler is working should have a tag cell. A tag cell is therefore pushed down whenever a name is declared and popped up again (together with all the parameter tag cells for routine names) when it is 'undeclared' i.e. at the end of the routine or function in which it was declared. This also allows for the redeclaration of the same name at an inner nested routine level. The tag at the head of the pushdown list is thus always the one for the latest declaration. If the redeclaration is a routine name, of course, the original tag is beyond all the parameter information cells in the list. Consider the following program: 36 %BEGIN %INTEGER I %ROUTINE R %INTEGERARRAY I(1:10) ...... %END ...... %ENDOFPROGRAM The tags for this whilst the compiler is compiling routine 'R' will be: | | | +---+ +---+---+ +-+-+-+-+---+---+ +-+-+-+-+---+---+ | I |<-----| | |---->|2|1|0|2| 2 | |---->|0|1|0|1| 2 | | +---+ +---+---+ +-+-+-+-+---+---+ +-+-+-+-+---+---+ | | | | | 2nd declaration 1st declaration If when a call is popped up, nothing remains on the list i.e. this was not a redeclaration, then the space in the name dictionary for that name can also be recovered. This is the case since the dictionary works on a last-in-first-out basis, a consequence of the nested structure of SKIMP programs. The question arises as to how the compiler knows which tags to pop up when an %END statement is reached. An obvious method would be to search through the dictionary for tags having the appropriate textual level in the level field. A more efficient method, however, which avoids searching, is to remember in a list the ones which have to be popped up. Each cell in the list contains the dictionary position of a name, so that this list can be scanned down at the %END statement and a 'popup' performed for every position mentioned in it (plus extra 'popups' for routine parameter tag cells). Because of the nesting of routines in SKIMP a separate list has to be maintained for each textual level. An array with one position per textual level contains pointers to the heads of these lists as shown in figure 6.3: +---+ +---+---+ +---+---+ +---+---+ | |----------->| | |--->| | |--->| | | +---+ +---+---+ +---+---+ +---+---+ | |-------+ +---+ | +---+---+ | |---+ +--->| | | +---+ | +---+---+ | | | | | | +---+---+ +---+---+ | | +------->| | |--->| | | +---+---+ +---+---+ figure 6.3 Implementation Details Lists of calls are frequently used by the compiler to maintain current status information during compilation. Some are used as pushdown popup stacks but not 37 all. A list cell in this implementation consists of two words, one an information word and the other a link word: +---------------+---------------+ | information | link | +---------------+---------------+ Information words are taken from an array named 'TAG' and link words from an array named 'LINK'. Initially, all these pairs of words are formed into an 'Available Space List' or 'Free List' and thereafter cells are taken from this list as required and returned to it when no longer needed. The variable 'TAGASL' always points to the head of the available space list: +---+---+ +---+---+ +---+---+ TAGASL--->| | |--->| | |--->| | |---> . . . +---+---+ +---+---+ +---+---+ By convention, the link of the last call in any list is set to zero. Links to other list calls take the form of the index of the words in the 'TAG' and 'LINK' arrays. These are therefore declared from 1 upwards to avoid conflict. Both of the arrays are initialised in file 'SKIMPA' to set up the tags and links for built-in routine names such as 'READ SYMBOL' etc. (The name dictionary is also initialised for the same reason.) A function 'NEWTAG', in file 'SKIMPD', takes the cell indicated by 'TAGASL' from the available space list whenever a cell is required. 'TAGASL' is then moved down the list to the next one. If ever 'TAGASL' is zero, the compiler stops with a suitable monitor. The result of the 'NEWTAG' function is the index of the cell in 'TAG' and 'LINK'. A common sequence of instructions might take the form: CELL = NEWTAG TAG (CELL) = .... LINK (CELL) = .... A function named 'RETURNTAG',also in file 'SKIMPD', has the reverse effect. It takes a single parameter which is the index of the cell to be returned to the available space list. The result of the function is the value in the 'LINK' word of the cell returned. This is purely a matter of programming convenience. The two parallel arrays which form the hash dictionary of names and pointers to tags are named 'NAMEDLINK' and 'TAGLINK' and point to the 'NAMED' and 'TAG/LINK' arrays respectively. Routines 'PUSHTAG' and 'POPTAGS', in file 'SKIMPD', push and pop tags from 'TAGLINK' as required. 'PUSHTAG' is called with the identification number of the name whose tag is being pushed together with it's form, type, etc. as parameters. A subsidiary function of this routine is to check whether the same name has already been declared at this level and monitor the fault if so. 'POPTAGS' is called at the %END of each routin e or function to pop all the tags for names declared at that level, monitoring them if the '/TAGS' option was set. The array which contains pointers to lists of names declared at each level is named 'NAMELIST'. 38 Chapter 7 Compilation of Expressions Although all the registers in the object machine can be used as accumulators the machine is regarded, for the sake of simplicity, as a single-accummulator machine and all expressions are evaluated in the one register 'ACC'. This avoids the necessity of having register to register operations which would be needed for a true multi-accumulator machine and avoids the extra complexity this introduces for expression compilation. A 'tree' algorithm is used in the SKIMP compiler. Many other methods are possible but this one produces reasonably good object code without too many redundant 'LOAD' and 'STR' operations. It is also straightforward to comprehend. Expressions are regarded as trees with operators at the nodes and operands as leaves. For example, the expression: I * J - K / (L + M) has the tree form shown in figure 7.1. - | +-----+-----+ | | * / | | +--+--+ +--+--+ | | | | I J K + | +--+--+ | | L M figure 7.1 The algorithm examines the 'shape' of the tree in order to decide the best order of evaluation of the expression. The commutativity of the operators, that is, whether 'X?Y' is the same as 'Y?X' or not, is taken into account and also whether the sub-trees of operator nodes are leaves or not. When the sub-tree is a leaf, that operand is capable of being immediately loaded into 'ACC' or can operate immediately on 'ACC'. When the sub-tree is not a leaf i.e. a further tree, the algorithm works recursively to compile it. The algorithm can be tabulated for the various combinations possible, the whole thing being regarded as 'evaluate tree' whose object is to dump code which puts the result in 'ACC'. The table is shown in figure 7.2. This algorithm would not be acceptable if the order of evaluation was defined as part of the language, for instance strictly left to right. As an example, the object code is improved by evaluating the second operand first for non-commutative operators. To illustrate the working of the algorithm, consider the expression which was used above: I * J - K / (L + M) 39 +--------------+------------+----------------+-------------------------+ | left branch | operator | right branch | actions | +--------------+------------+----------------+-------------------------+ | leaf 1 | all | leaf 2 | LOAD,ACC,leaf 1 | | | | | (operation),ACC,leaf 2 | +--------------+------------+----------------+-------------------------+ | tree | all | leaf | evaluate tree | | | | | (operation),ACC,leaf | +--------------+------------+----------------+-------------------------+ | leaf | comm. | tree | evaluate tree | | | | | (operation),ACC,leaf | +--------------+------------+----------------+-------------------------+ | leaf | non-comm. | tree | evaluate tree | | | | | STR,ACC,(work) | | | | | LOAD,ACC,leaf | | | | | (operation),ACC,(work) | +--------------+------------+----------------+-------------------------+ | tree 1 | all | tree 2 | evaluate tree 2 | | | | | STR,ACC,(work) | | | | | evaluate tree 1 | | | | | (operation),ACC,(work) | +--------------+------------+----------------+-------------------------+ figure 7.2 The steps, with indentation for recursive depth are shown in below. The 'work' locations used for partial results are in the local name space for the current routine as was described in chapter 5 in connection with array declarations. evaluate tree [ I*J-K/(L+M) ] tree [ I*J ] all [ - ] tree [ K/(L+M) ] evaluate tree [ K/(L*M) ] leaf [ K ] non-comm [ / ] tree [ L+M ] evaluate tree [ L+M ] leaf [ L ] all [ + ] leaf [ M ] LOAD,ACC, L ADD,ACC, M STR,ACC, work 1 LOAD,ACC, K DIV,ACC, work 1 STR,ACC, work 1 evaluate tree [ I*J ] leaf [ I ] all [ * ] leaf [ J ] LOAD,ACC, I MLT,ACC, J SUB,ACC, work 2 The case of array elements and function calls as operands must be considered in more detail. These cannot be regarded as leaves of the tree since they involve further computation which needs register 'ACC'. They are therefore treated as trees from the point of view of the algorithm. Array element accessing puts the value of the element in 'ACC' and the result of functions is also left in 'ACC'. This satisfies the requirements of 'evaluate tree'. Variables of type %INTEGERNAME i.e. parameters, can still be regarded as leaves by making use of another index register, 'WK', for the indirect 40 reference. Constants are leaves of trees and, in general, are accessed from the table of constants following the object code using register 'COT' as the base. The use of a location in the constant table is avoided if a 'LOAD' of the constant is required and the value is less than 65536. This is because the 'LDA' instruction can be used with the index register set to zero. For example,an instruction to 'LOAD' the value 1234 would be: LDA, ACC, , 1234 The tree representation is that of Reverse Polish form together with pointers. In Reverse Polish notation the operator follows the operands. In general, operand 1 operator operand 2 becomes: operand 1 operand 2 operator and the same transformation takes place for the operands when they are expressions. For example, the infix expression: I * (J + K) / L - M becomes: I J K + * L / M - The extra pointers that were mentioned come after each operator and correspond to the branches of the tree. In general the form is: <-------------------------+ <--------------|----------------------+ | | | | | +-----------+ +-----------+ +----------+---+---+ | operand 1 | | operand 2 | | operator | | | +-----------+ +-----------+ +----------+---+---+ When an operand is a further sub-expression, the pointer points to the operator in it's repesentation. Otherwise, the pointer points to the first word of a pair of words which describe the operand. The first of the pair is a type number and the second is consequent on the value of the first. The type values are: ^operators | -1 : function call -2 : array element leaves/ -3 : scalar variable \ -4 : constant They are negative to distinguish them from values which denote operators and hence sub-trees. The values for operators are their alternative numbers from phrase (with two additional values beyond for the unary operators '-' and '\') and are thus always positive. Values greater than or equal to -2 cause the algorithm to be called recursively, a requirement which was mentioned above. For types -1 and -2, the second word of the pair contains an analysis record pointer which will allow the function call or array element accession object code to be generated when it is required. For type -3, the second word is the 41 tag of the variable and for type -4, the second word is the value of the constant. As an example, consider the expression: I + 12345 The representation of this is: <------------------------+ <--------------|-------------------+ | | | | | +------+--------+------+-------+-----+-----+-----+ | -3 | tag(I) | -4 | 12345 | 9 | 1 | 3 | +------+--------+------+-------+-----+-----+-----+ An example to show sub-trees might be: (I + 99) * (J - 88) The representation of this is: <--------------------+ <-------------------------------|----------------+ | <-------|-------+ <--------|--------+ | | <---------|--------|---+ | <---------|---------|---+ | | | | | | | | | | | | | | | +---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+ |-3 |tag(I)|-4 | 99 | 9 | 1 | 3 |-3 |tag(J)|-4 | 88 | 10 | 8 | 10 | 8 | 5 | 12 | +---+------+---+----+---+---+---+---+------+---+----+----+---+----+---+---+----+ The technique used to convert infix expressions to this form combines a method of converting them to Reverse Polish with a scheme of pseudo-evaluation to calculate the values of the pointers. The conversion to Reverse Polish is shown diagrammatically in figure 7.3. A pushdown stack is used for operators to take account of their precedence. This delays their storage until the appropriate place in the output. A new stack is declared at each recursive level to deal with bracketted sub-expressions. The conversion of the expression: I + J * (K - L) / M would proceed as shown in figure 7.4. The dotted box indicates the recursive call to deal with '(K-L)'. When operators and operands are 'stored' according to the flow diagram, a pseudo-evaluation takes place. To explain what is meant by 'pseudo'-evaluation. first consider an 'evaluation' of a reverse Polish expression. A pushdown stack is used to retain operand values and partial results during the evaluation. Proceeding from left to right, whenever an operand is encountered it's value is pushed down. Whenever a (binary) operator is encountered, the top two items on the stack are popped and the operation between them is performed. The result is then pushed back onto the stack. (For unary operators, just one item is popped from the stack). Proceeding in this way. when the whole of the reverse Polish string has been dealt with, the value of the whole expression is what remains in the stack. For example: 10 + 8 * (6 - 4) / 2 42 to reverse Polish | + - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - + | | +---------+-------+ | <--------| unary operator ?| | | yes +-----------------+ | +----------+ | no | +-->| stack op |------------->| | | +----------+ +--------+--------+ + - - - - - -+ | | | next operand a |---->| to reverse | | | | sub-expression ?| yes | Polish | | | +--------+--------+ + - - - - - -+ | | | no | | | +-------+-------+ | | | | store operand | | | | +-------+-------+ | | | |<-------------------+ | | +---------+---------+ +-----------------+ | | | another operator ?|--->| unstack & store |-----> | | +---------+---------+ no | operators left | | | | yes +-----------------+ | | +----------+----------+ | +------------------| new operator higher |<----+ | yes | precedence ? | | | +----------+----------+ | | | no | | +--------+--------+ | | | unstack & store |-------+ | | operator | | +-----------------+ | + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + figure 7.3 input op stack action cumulative output I empty store I I + + stack + I J + store J IJ * +* stack * IJ + - - - - - - - - - - - - - - - - - - - - + | K empty store K | IJK | - - stack - | IJK | L - store L | IJKL | empty unstack & store - | IJKL- + - - - - - - - - - - - - - - - - - - - - + / + unstack & store * IJKL-* +/ stack / M + store M IJKL-*M + unstack & store / IJKL-*M/ empty unstack & store + IJKL-*M/+ figure 7.4 43 which in reverse Polish is: 10 8 6 4 - * 2 / + The stages of evaluation in a pushdown stack are | | | | | | | | | | | | | | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | | | | | 4 | | | | | | | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | | | 6 | | 6 | | 2 | | | | 2 | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | 8 | | 8 | | 8 | | 8 | | 16 | | 16 | | 8 | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 10 | | 18 | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ In a 'pseudo'-evaluation, it is not the values of operands that are pushed down but some other property. In this case, it is positions in the representation of the tree. In place of evaluation when operators occur, the positions of operands are popped up and it is these values which are appended to the tree representation. The position of the operator is then pushed down in their place. Consider the example used above: (I + 99) * (J - 88) which in reverse Polish is: I 99 + J 88 - * The successive contents of the pseudo-evaluation stack will be: | | | | | | | | | | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | | | | | | | | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | | | | | | | 10 | | | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | | 3 | | | | 8 | | 8 | | 12 | | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ | 1 | | 1 | | 5 | | 5 | | 5 | | 5 | | 15 | +----+ +----+ +----+ +----+ +----+ +----+ +----+ The representations of the trees corresponding to all the expressions in a program can be monitored by using the parameter '/EXPR' when calling the compiler. Examples can be found in appendix 1. Implementation Details The routine concerned with the compilation of expressions is named 'EXPR' and can be found in file 'SKIMPE'. Within 'EXPR', subroutine 'TO TREE' produces the linear tree representation using a further subroutine 'PS EVAL' for the pseudo-evaluation part. It also has the subsidiary function of checking that all the names used in the expression have been declared and are of an appropriate type, for instance, not routine names. The tree code generating algorithm is implemented by routine 'EVALUATE' which makes use of a subroutine 'OPN' to dump primitive operations i.e. routine calls. array and variable accesses and constant handling. 44 A minor complication of routine 'EXPR' deals with the expressions involved with simple comparisons in conditional statements. This is because a comparison such as: expr1 expr2 is compiled as an evaluation of 'expr1 - expr2' and then the value tested against zero. An external flag named 'CONDFLAG' is set in these circumstances and its effect is form a tree with branches representing each expression the topmost node being a '-'. The algorithm then proceeds as normal. 45 Chapter 8 Conditional Statements The general form of conditional statement is: %IF condition %THEN instruction 1 %ELSE instruction 2 where the 'instructions' include the case of a group of statements surrounded by %START and %FINISH. In compiling this, the aim is to produce object code of the form: +---------------+ | condition | +---------------+ ->FALSE if false +---------------+ | instruction 1 | +---------------+ ->NEXT FALSE: +---------------+ | instruction 2 | +---------------+ NEXT: where the box containing 'condition' implies object code instructions which evaluate the truth or falsity of the condition for subsequent testing. When the %ELSE clause is absent, this simplifies to the form: +---------------+ | condition | +---------------+ ->FALSE if false +---------------+ | instruction 1 | +---------------+ FALSE: In addition, since the truth or falsity of the whole condition can sometimes be determined without evaluating all the simple comparisons, there may be intermediate jumps out from the middle of the condition to the consequent instructions. For example, consider; %IF sc1 %AND (sc2 %OR sc3) %THEN instr 1 %ELSE instr 2 This would produce code of the form shown in the next diagram. Analagously with arithmetic expressions, however, the language may define conditions to be evaluated 'in toto' as Boolean expressions which would preclude jumping out. Fortunately, SKIMP does not require this. A number of observations can be made with a view to producing a straightforward simple compilation algorithm. The first concerns whether the jump following each simple comparison in the condition should be on true or false. Consider the case: %IF sc1 %AND sc2 %AND sc3 %THEN instr 1 %ELSE instr 2 46 + - - - - - - - - - -+ | +----------+ | | | sc 1 | | | +----------+ | | ->FALSE if false | | +----------+ | | | sc 2 | | | +----------+ | | ->TRUE if true | | +----------+ | | | sc 3 | | | +----------+ | + - - - - - - - - - -+ ->FALSE if false TRUE: +--------------------+ | instr 1 | +--------------------+ ->NEXT FALSE: +--------------------+ | instr 2 | +--------------------+ NEXT: The code in this second case should be of the form: +----------------+ | sc 1 | +----------------+ ->FALSE if false +----------------+ | sc 2 | +----------------+ ->FALSE if false +----------------+ | sc 3 | +----------------+ ->FALSE if false +----------------+ | instr 1 | +----------------+ ->NEXT FALSE: +----------------+ | instr 2 | +----------------+ NEXT: Next consider: %IF sc1 %OR sc2 %OR sc3 %THEN instr 1 %ELSE instr 2 This should produce the code in the diagram below. It will be apparent that if an %AND follows the simple comparison, the jump is on false and if an %OR follows, the jump is on true. To make this rule consistent throughout, since the jump following the last simple comparison ts always on false we can imagine an %AND following the last simple comparison: %IF ( condition ) %AND %THEN instr 1 %ELSE instr 2 47 +----------------+ | sc 1 | +----------------+ ->TRUE if true +----------------+ | sc 2 | +----------------+ ->TRUE if true +----------------+ | sc 3 | +----------------+ ->FALSE if false TRUE: +----------------+ | instr 1 | +----------------+ ->NEXT FALSE: +----------------+ | instr 2 | +----------------+ NEXT: The next observation concerns the destination of the jump. This is not always to one of the unconditional instructions. It can be to a later simple comparison. For example: %IF ((sc1 %AND sc2) %OR sc3) %AND sc4 %THEN instr 1 %ELSE instr 2 This should produce code of the form: +----------------+ | sc 1 | +----------------+ ->FALSE1 if false +----------------+ | sc 2 | +----------------+ ->TRUE if true FALSE1:+----------------+ | sc 3 | +----------------+ ->FALSE2 if false TRUE: +----------------+ | sc 4 | +----------------+ ->FALSE2 if false +----------------+ | instr 1 | +----------------+ ->NEXT FALSE2:+----------------+ | instr 2 | +----------------+ NEXT: For a simple comparison followed by an %AND, the jump is to the simple comparison following the next %OR at an outer bracketting level. For a simple comparison followed by an %OR, the jump is to the simple comparison following 48 the next %AND at an outer bracketting level. This can again be made consistent if the conditional statement is regarded as of the form: %IF ( ( condition ) %AND %THEN instr 1 ) %OR %ELSE instr 2 To demonstrate this, we could redraw the example above showing bracketting levels with each simple comparison (or instruction) at the level of the following %AND or %OR: sc1 %AND | | false sc2 %OR | | | |true sc3 %AND | | | | +-------------|-------> |false sc4 %AND | | | | +-----------|--------> | | |false instr1 %OR | | +------------+---------------------->instr2 A further example might be: %IF (sc1 %OR sc2 %OR sc3) %AND (sc4 % OR sc5) %THEN instr1 %ELSE instr2 The organisation required is: sc1 %OR cs2 %OR sc4 %OR | | | | |true |true sc3 %AND | | true | | | | | | | |false | | sc5 %AND | | | | | | +---------+-----------|-------> | |false | | | | +--------|-------->instr1 %OR | | +--------------------+-------------------->instr2 In compiling conditional statements, the plan of action is to make several passes over the condition, working out the structure required step by step and finally, in a last pass, generating the object code. Five parallel arrays are used, each with one position per simple comparison. The first pass traverses the analysis record and locates the positions of all the simple comparisons. Simultaneously, the nesting levels and truth or falsity of the branches is determined and also stored. The next pass goes over this information to determine the destinations of the branches using the rules of thumb described above. For those simple comparisons which are the destination of branches, the next pass allocates 'private' labels to which branches can be compiled using the normal methods used elsewhere in the compiler (see chapter 9). Private labels are just values allocated upwards from 10000, which allows them to be distinguished from ordinary labels. For the sake of consistency, the instructions made conditional are also allocated positions in the arrays and are regarded as being at nesting levels -1 and -2. The rules can then be applied uniformly throughout. When the instruction made conditional is itself a branch, the object is optimised slightly to allow branches to be made direct to the required place 49 from within the condition instead of having a branch to a branch. For example: %IF sc1 %OR sc2 %OR sc3 %THEN ->100 %ELSE instr The code produced takes the form: +-------------+ | sc 1 | +-------------+ ->100 if true +-------------+ | sc 2 | +-------------+ ->100 if true +-------------+ | sc 3 | +-------------+ ->100 if true +-------------+ | instr | +-------------+ This is easily achieved by using the required label instead of a private label to label the conditional branch instruction in the branch array. The only other modification is to ensure that the branch on the previous i.e.last, simple comparison is on true rather than false and to the label specified. This should be clearer when the implementation details are discussed. The contents of the five arrays can be displayed during compilation by using the parameter '/COND'. Examples will be found in appendix 1. %START-%FINISH Groups When either of the instructions made conditional is a %START-%FINISH group, the destinations of the final branches are not immediately known. This can only be done after the %FINISH has appeared. Information therefore has to be preserved during the compilation of the group. Since the group may itself contain further conditional statements and hence nested %START-%FINISH groups, a pushdown list is used for the purpose. Furthermore, since routine bodies may also appear anywhere and be nested, a list will potentially be required for each textual level of the program. An array of headcells to lists is used as previously to achieve this: +----+ +---+---+ +---+---+ | |------>| | |---->| | | +----+ +---+---+ +---+---+ | |---+ +----+ | +---+---+ | | +---| | | +----+ +---+---+ | | Two items of information need to be stored for each group. Firstly, a marker which distinguishes between groups before and those after %ELSE keywords and secondly, a label number assigned to the destination of branches around the group. This will be a private label. 50 Implementation Details The code generation for conditions is done by integerfn 'COND' in file 'SKIMPC'. This is called from the section of 'STATEMENT' in file 'SKIMPB' which deals with conditional statements. It is at this latter place that the code for the instructions made conditional is generated by calling routine 'INSTR'. In order to deal with the branch instruction optimisation, two parameters in addition to the parameter which indicates where the condition starts in the analysis record are passed in. These give the label numbers of the branches or -1 if either of then is a non-branch instruction. These are then used instead of private labels as will be seen below. The result of this function is the private label number of the label to be put in front of the %ELSE clause. %START-%FINISH groups are dealt with after the condition code has been generated and also where %FINISH statements are processed. Two subroutines are used to maintain the pushdown lists. These are 'PUSHSTART' and 'POPSTART' and can be found in file 'SKIMPD'. The five parallel arrays within function 'COND' are named 'TESTP', 'LEVEL', 'ANDOR', 'BRANCH' and 'LABEL'. They hold. respectively, the analysis record positions of the simple comparisons (or 'tests'), the nesting levels of the simple comparisons, the truth or falsity of the jumps(1 for false and 2 for true), the destination of the jumps (in terms of column position in the table) and the labels assigned to simple comparisons. For example, consider: %IF (sc1 %OR sc2 %OR sc3) %AND (sc4 %OR sc5) %THEN instr1 %ELSE instr2 The contents of the arrays will be: 1 2 3 4 5 6 7 TESTP ap1 ap2 ap3 ap4 ap5 LEVEL 1 1 0 1 -1 -2 ANDOR 2 2 1 2 1 2 BRANCH 4 4 7 6 7 LABEL -1 -1 -1 10000 -1 10002 10001 where the 'ap?' represent analysis record positions. The first pass over the statement is performed by the subroutine 'PROCESSCOND' and fills in rows 'TESTP', 'LEVEL' and 'ANDOR'. It works recursively following the recursive definition of conditions. Subsequently, two extra columns are filled in with -1 and -2 for the pseudo-%AND and the pseudo-%OR. On the next pass the row 'BRANCH'is filled in and following that labels are assigned as necessary. (The -1 indicates that no jump takes place to this simple comparison and hence no label is required. The final pass generates the object code by calling 'EXPR' with the 'CONDFLAG' set as described in the previous chapter to evaluate LHS minus RHS, and dumping the appropriate intermediate jump instructions. Another example may help to clarify the picture further . Consider the case: %IF sc1 %OR ((sc2 %OR sc3) %AND sc4 %AND sc5) %THEH instr1 %ELSE instr2 The structure of the statement is: 51 sc2 %OR | |true sc3 %AND sc4 %AND | | | | sc1 %OR | |false | |false | | | | | | true +-----------|------> | sc5 %AND | | | | | | | |false | | | | +----------------------|----------|-----------|-------->instr1 %OR | | | +----------+-----------+---------------------->instr2 The table for this statement will be: 1 2 3 4 5 6 7 TESTP ap1 ap2 ap3 ap4 ap5 LEVEL 0 2 1 1 -1 -2 ANDOR 2 2 1 1 1 2 BRANCH 6 4 7 7 7 LABEL -1 -1 -1 10004 -1 10003 10005 As a final example, consider a statement containing conditional branches: %IF sc1 %OR sc2 %OR sc3 %THEN ->88 %ELSE ->99 The table will be: 1 2 3 4 5 TESTP ap1 ap2 ap3 LEVEL 0 0 -1 -2 ANDOR 2 2 2 2 BRANCH 4 4 4 LABEL -1 -1 -1 88 99 52 Chapter 9 Labels and Jumps As the SKIMP compiler is a one-pass compiler, jumps to labels not yet encountered cannot be completely generated. Backward jumps are, of course, no problem. For the forward jumps, all that can be done is to leave a hole in the branch instruction dumped which can be filled in at some later stage. In this case, the assembler which forms the first stage of the 'SKIMPAI' interpreter system has the job to do. In order for this to be possible, information must be recorded somewhere concerning the positions of the holes to be filled in and what to fill them in with. The method adopted is to use the holes themselves as links in lists which relate to each label and for the compiler to dump 'FILL' assembler directives which are really the headcells to these lists and which also include the value to be filled in. The 'FILL' directives are dumped whenever a label which has been previously referenced is encountered. An example should make this clear. Consider the program: . . . . ->88 . . . . ->99 . . . . ->99 . . . . 99: . . . . . . . . The object code dumped will take the form: . . . . . . +------+ 12$ B,,,| 0 | +------+ . . . . . . +------+ 34$ B,,,| 12 | +------+ . . . . . . +------+ 56$ B,,,| 34 | +------+ As labels are local to each level of routine nesting, once again separate lists must be used for each level. Each cell of a list in this case contains three items of information, the label number, a flag and an address: +---------------+------+-----------+ | label | flag | address | +---------------+------+-----------+ (16) (1) (15) 53 The flag indicates whether the address is that of the latest forward reference to the label i.e. the current top of the forward reference list, or the actual address of the label which will be needed for any backward references. The value 1 indicates the former and 0 the latter. When additional forward references appear therefore, the address is pushed down 'through' this cell. A cell of this form is created whenever a label is encountered for the first time - either as a label or as a jump. The list for the current textual level is first searched and a new cell added if no existing cell refers to that label. The flag allows two vital checks to be made. Firstly that the label has not occurred before (as a label) and secondly at the end of the routine that the label has in fact appeared. Routine calls use the 'BAL' instruction and are always backward references since SKIMP does not allow routine %SPECs. The address of the routine body is, in other words, already known when any routine calls are made. SKIMP allows routines to be placed anywhere in a program. In particular at the head of surrounding routines. Jumps around these routine bodies must therefore be dumped. These are forward references but are simple in that only one backward reference will occur. They are therefore treated specially without having to invent a private label. Calls on the built-in input-output routines are also treated specially. Since these routines are implemented in the 'SKIMPAI' assembler/interpreter, the 'BAL' instructions dumped contain the marker 'EXT' and a number which refers to the routine required. Were external references part of the language, the names themselves would have to appear in the object file and likewise for any external variables in the program. A linker would then resolve the references between programs. Implementation Details The routines which accomplish the functions decrlbed above are named 'FILL LABEL' and 'FILL BRANCH', in file 'SKIMPD'. A function 'GET LABEL' extracts a label from the analysis record and checks that it is valid i.e.less than 10000. 'POP LABELS' pops up the cells and checks for missing labels at %END (and %ENDOFPROGRAM). 54 Appendix 1 Examples of SKIMP Object Code Files 55 LEX, ANAL example SKIMP COMPILER MKII FILE: TESTL OPTIONS: LEX ANAL %begin 130 (1/STATEMENT) 8 0$ LDA,COT,,0 1$ LDA,DR1,,0 2$ LDA,STP,DR1,0 %integerfn r 136 134 256 82 (1/STATEMENT) 6 5 6 8 (5/PROC) 2 (6/NAME) 1 82 (8/FORMAL) 3$ B,,,0 4$ STR,DR2,STP,0 5$ LDA,DR2,STP,0 6$ STR,WK,STP,1 7$ LDA,STP,STP,0 %integer i,j,k 136 256 73 44 256 74 44 256 75 (1/STATEMENT) 5 3 (3/ARRAY) 2 6 8 (6/NAME) 1 73 (8/NAMES) 1 11 13 (11/NAME) 1 74 (13/NAMES) 1 16 18 (16/NAME) 1 75 (18/NAMES) 2 i=j+k 256 73 61 256 74 43 256 75 (1/STATEMENT) 1 3 (3/INSTR) 1 7 9 10 (7/NAME) 1 73 (9/ACTUAL) 2 (10/ASSIGN) 1 12 (12/EXPR) 1 16 17 23 (16/UNARY) 4 (17/OPERAND) 1 20 22 (20/NAME) 1 74 (22/ACTUAL) 2 (23/EXPRREST) 1 27 28 34 (27/OP) 9 (28/OPERAND) 1 31 33 (31/NAME) 1 75 (33/ACTUAL) 2 (34/EXPRREST) 2 8$ LOAD,ACC,DR2,3 9$ ADD,ACC,DR2,4 10$ STR,ACC,DR2,2 %if i>1234 %then %stop 135 256 73 62 257 1234 145 144 56 (1/STATEMENT) 2 5 36 37 (5/COND) 1 8 35 (8/TEST) 1 12 24 25 (12/EXPR) 1 16 17 23 (16/UNARY) 4 (17/OPERAND) 1 20 22 (20/NAME) 1 73 (22/ACTUAL) 2 (23/EXPRREST) 2 (24/COMP) 6 (25/EXPR) 1 29 30 34 (29/UNARY) 4 (30/OPERAND) 2 32 (32/CONST) 1 1234 (34/EXPRREST) 2 (35/CONDREST) 3 (36/INSTR) 6 (37/ELSE) 2 11$ LOAD,ACC,DR2,2 12$ SUB,ACC,COT,0 13$ BNG,ACC,,0 14$ STOP,,,0 15$ FILL,10000,13,15 %results=i+4321 141 61 256 73 43 257 4321 (I/STATEMENT) 1 3 (3/INSTR) 5 5 (5/EXPR) 1 9 10 16 (9/UNARY) 4 (10/OPERAND) 1 13 15 (13/NAME) 1 73 (15/ACTUAL) 2 (16/EXPRREST) 1 20 21 25 (20/OP) 9 (21/OPERAND) 2 23 (23/CONST) 1 4321 (25/EXPRREST) 2 15$ LOAD,ACC,DR2,2 16$ ADD,ACC,COT,1 17$ LDA,STP,DR2,0 18$ LOAD,DR2,STP,0 19$ LOAD,WK,STP,1 20$ B,,WK,0 %end 131 (1/STATEMENT) 7 3 (3/OFPROG) 2 21$ FILL,ALLOC,7,5 21$ STOP,,,0 22$ FILL,SKIP,3,22 %endofprogram 131 139 (1/STATEMENT) 7 3 (3/OFPROG) 1 22$ FILL,ALLOC,2,2 22$ STOP,,,0 23$ FILL,COT,0,23 23$ CONST...1234 24$ CONST,,,4321 25$ FILL,STACK,1,25 $ 0 FAULTS IN PROGRAM 57 EXPR example SKIMP COMPILER MKII FILE: TESTE OPTIONS: EXPR %begin 0$ LDA,COT,,0 1$ LDA,DR1,,0 2$ LDA,STP,DR1,0 %integer i,j,k,l %integerarray a(1:10) -4* 1 3$ LDA,ACC,,1 4$ STR,ACC,DR1,6 -4* 10 5$ LDA,ACC,,10 6$ LDA,ACC,ACC,1 7$ STR,ACC,DR1,7 8$ SUB,STP,DR1,6 9$ STR,STP,DR1,8 10$ ADD,STP,DR1,7 i=j+k + -3 01010003 -3 01010004 9* 1 3 11$ LOAD,ACC,DR1,3 12$ ADD,ACC,DR1,4 13$ STR,ACC,DR1,2 a(j+k)=i*l-j*k + -3 01010002 -3 01010005 8 1 3 -3 01010003 -3 01010004 8 8 10 10* 5 12 * - 14$ LOAD,ACC,DR1,3 15$ MLT,ACC,DR1,4 16$ STR,ACC,DR1,6 17$ LOAD,ACC,DR1,2 18$ MLT,ACC,DR1,5 19$ SUB,ACC,DR1,6 20$ STR,ACC,DR1,6 + -3 01010003 -3 01010004 9* 1 3 21$ LOAD,ACC,DR1,3 22$ ADD,ACC,DR1,4 23$ ADD,ACC,DR1,8 24$ LOAD,WK,DR1,6 25$ STR,WK,ACC,0 58 i=i*(j+k)/(l-i**2) + * -3 01010002 -3 01010003 -3 01010004 9 3 5 8 1 7 -3 01010005 -3 01010002 -4 2 6 15 17 10 13 19 7* 10 22 ** - 26$ LOAD,ACC,DR1,2 27$ EXP,ACC,COT,0 28$ STR,ACC,DR1,6 29$ LOAD,ACC,DR1,5 30$ SUB,ACC,DR1,6 31$ STR,ACC,DR1,6 32$ LOAD,ACC,DR1,3 33$ ADD,ACC,DR1,4 34$ MLT,ACC,DR1,2 35$ DIV,ACC,DR1,6 36$ STR,ACC,DR1,2 i=a(j)+a(k) + -2 17 -2 43 9* 1 3 -3* 01010004 37$ LOAD,ACC,DR1,4 38$ ADD,ACC,DR1,8 39$ LOAD,ACC,ACC,0 40$ STR,ACC,DR1,6 -3* 01010003 41$ LOAD,ACC,DR1,3 42$ ADD,ACC,DR1,8 43$ LOAD,ACC,ACC,0 44$ ADD,ACC,DR1,6 45$ STR,ACC,DR1,2 %endofprogram 46$ FILL,ALLOC,2,9 46$ STOP,,,0 47$ FILL,COT,0,47 47$ CONST,,,2 48$ FILL,STACK,1,48 $ 0 FAULTS IN PROGRAM 59 TAGS Example SKIMP COMPILER MKII FILE: TESTT OPTIONS: TAGS %begin 0$ LDA,COT,,0 1$ LDA,DR1,,0 2$ LDA,STP,DR1,0 %routine a(%integer i,j,k) 3$ B,,,0 4$ STR,DR2,STP,0 5$ LDA,DR2,STP,0 6$ STR,WK,STP,1 7$ LDA,STP,STP,0 %end 8$ FILL,ALLOC,7,5 75 K 01020004 74 J 01020003 73 I 01020002 8$ LDA,STP,DR2,0 9$ LOAD,DR2,STP,0 10$ LOAD,WK,STP,1 11$ B,,WK,0 12$ FILL,SKIP,3,12 %integerfn b(%integername l) 12$ B,,,0 13$ STR,DR2,STP,0 14$ LDA,DR2,STP,0 15$ STR,WK,STP,1 16$ LDA,STP,STP,0 %routine c(%integerarrayname m,n) 17$ B,,,0 18$ STR,DR3,STP,0 19$ LDA,DR3,STP,0 20$ STR,WK,STP,1 21$ LDA,STP,STP,0 %end 22$ FILL,ALLOC,21,4 78 N 31130003 77 M 31130002 62 22$ LDA,STP,DR3,0 23$ LOAD,DR3,STP,0 24$ LOAD,WK,STP,1 25$ B,,WK,0 26$ FILL,SKIP,17,26 %end 26$ FILL,ALLOC,16,3 67 C 40220012 31130002 31130003 76 L 11020002 26$ STOP,,,0 27$ FILL,SKIP,12,27 %integer i,j %endofprogram 27$ FILL,ALLOC,2,4 74 J 01010003 73 I 01010002 66 B 4111000D 11020002 65 A 40310004 01020002 01020003 01020004 27$ STOP,,,0 28$ FILL,COT,0,28 28$ FILL,STACK,1,28 $ 0 FAULTS IN PROGRAM 63 Basic Example SKIMP COMPILER MKII FILE: HANOI OPTIONS: %begin 0$ LDA,COT,,0 1$ LDA,DR1,,0 2$ LDA,STP,DR1,0 %routine hanoi(%integer n,p1,p2) 3$ B,,,0 4$ STR,DR2,STP,0 5$ LDA,DR2,STP,0 6$ STR,WK,STP,1 7$ LDA,STP,STP,0 %if n>0 %then %start 8$ LOAD,ACC,DR2,2 9$ BNG,ACC,,0 hanoi(n-1,p1,6-p1-p2) 10$ LOAD,ACC,DR2,2 11$ SUB,ACC,COT,0 12$ STR,ACC,STP,2 13$ LOAD,ACC,DR2,3 14$ STR,ACC,STP,3 15$ LDA,ACC,,6 16$ SUB,ACC,DR2,3 17$ SUB,ACC,DR2,4 18$ STR,ACC,STP,4 19$ BAL,UK,,4 write(p1,1) ; 20$ LOAD,ACC,DR2,3 21$ STR,ACC,STP,2 22$ LDA,ACC,,1 23$ STR,ACC,STP,3 24$ BAL,WK,EXT,11 write(p2,1) ; 25$ LOAD,ACC,DR2,4 26$ STR,ACC,STP,2 27$ LDA,ACC,,1 28$ STR,ACC,STP,3 29$ BAL,HK,EXT,11 newline 30$ BAL,WK,EXT,7 64 hanoi(n-1,6-p1-p2,p2) 31$ LOAD,ACC,DR2,2 32$ SUB,ACC,COT,O 33$ STR,ACC,STP,2 31$ LDA,ACC,,6 35$ SUB,ACC,DR2,3 36$ SUB,ACC,DR2,4 37$ STR,ACC,STP,3 38$ LOAD,ACC,DR2,4 39$ STR,ACC,STP,4 40$ BAL,WK,,4 %finish 11$ FILL,10000,9,41 %end 41$ FILL,ALLOC,7,5 41$ LDA,STP,DR2,0 42$ LOAD,DR2,STP,0 43$ LOAD,WK,STP,1 44$ B,,WK,0 45$ FILL,SKIP,3,45 %integer a,b,c 1:read(a) 45$ LDA,ACC,DR1,2 46$ STR,ACC,STP,2 47$ BAL,WK,EXT,10 %if a=0 %then %stop 48$ LOAD,ACC,DR1,2 49$ BNZ,ACC,,0 50$ STOP,,,0 51$ FILL,10001,49,51 read(b) ; 51$ LDA,ACC,DR1,3 52$ STR,ACC,STP,2 53$ BAL,WK,EXT,10 read(c) 54$ LDA,ACC,DR1,4 55$ STR,ACC,STP,2 56$ BAL,WK,EXT,10 65 hanoi(a,b,c) 57$ LOAD,ACC,DR1,2 58$ STH,ACC,STP,2 59$ LOAD,ACC,DR1,3 60$ STR,ACC,STP,3 61$ LOAD,ACC,DR1,4 62$ STR,ACC,STP,4 63$ BAL,WK,,4 ->1 64$ B,,,45 %endofprogram 65$ FILL,ALLOC,2,5 65$ STOP,,,0 66$ FILL,COT,0,66 66$ CONST,,,1 67$ FILL,STACK,1,67 $ 0 FAULTS IN PROGRAM 66 Appendix 2 Possible Extensions to SKIMP 67 1. a. Identifier labels instead of numerical labels: LOOP: ->NEXT NEXT: ->LOOP b. Switches: %SWITCH TYPE(1:9) TYPE (I): ->TYPE(I*J) 2. a. Indefinite loops: %CYCLE . . . . %REPEAT b. Incremental loops: %FOR I=1,1,N %CYCLE . . . . %REPEAT c. 'While' loops: FRED %WHILE I>0 %WHILE I>0 %CYCLE . . . . %REPEAT d. 'Until' loops: FRED %UNTIL I>0 %CYCLE . . . . %REPEAT %UNTIL I>0 e. %EXIT and %CONTINUE statements. 3. a. Reversed order conditional statements: I=J*K %IF I=0 b. 'Unless' conditions: %UNLESS I=J %THEN K=L c. 'else-if' clauses: %IF I=0 %THEN J=0 %ELSE %IF K=0 %THEN L=0 d. Multi-sided comparisons: %IF A1 ! $ TROFF ! $ MONITOR %ENDOFPROGRAM The output from the compiler and the interpreter are shown below: 71 SKIMP COMPILER MKII FILE: TEST OPTIONS: %BEGIN 0$ LDA,COT,,0 1$ LDA,DR1,,0 2$ LDA,STP,DR1,0 %INTEGERARRAY A(1:10) 3$ LDA,ACC,,1 4$ STR,ACC,DR1,2 5$ LDA,ACC,,10 6$ LDA,ACC,ACC,1 7$ STR,ACC,DR1,3 8$ SUB,STP,DR1,2 9$ STR,STP,DR1,4 10$ ADD,STP,DR1,3 %INTEGER I ! $ TRON I=1 11$ LDA,ACC,,1 12$ STR,ACC,DR1,5 1: A(I)=I*I 13$ LOAD,ACC,DR1,5 14$ MLT,ACC,DR1,5 15$ STR,ACC,DR1,2 16$ LOAD,ACC,DR1,5 17$ ADD,ACC,DR1,4 18$ LOAD,WK,DR1,2 19$ STR,WK,ACC,0 I=I+1 20$ LOAD,ACC,DR1,5 21$ ADD,ACC,COT,0 22$ STR,ACC,DR1,5 %IF I<=10 %THEN ->1 23$ LOAD,ACC,DR1,5 24$ SUB,ACC,COT,1 25$ BHG,ACC,,13 72 ! $ TROFF ! $ MONITOR %ENDOFPROGRAM 26$ FILL,ALLOC,2,6 26$ STOP,,,0 27$ FILL,COT,0,27 27$ CONST,,,1 28$ CONST,,,10 29$ FILL,STACK,1,29 $ 0 FAULTS IN PROGRAM 11$ 12$ 13$ 11$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ 13$ 14$ 15$ 16$ 17$ 18$ 19$ 20$ 21$ 22$ 23$ 24$ 25$ COT 27 DR1 29 STP 45 ACC 1 WK 100 29$ ? ? 100 11 34 11 1 4 9 16 25 36 49 64 81 100 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? STOPPED AT 26$, 143 INSTRUCTIONS EXECUTED 73