\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Low-level compiler for 8086}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in

\section{Preamble}

LC (Low-level compiler) for 8086 - User Notes

\section{Introduction}

This is a very simple compiler, written with the aim of getting a working
system together in a short space of time. The ``low'' in low-level does not
mean that the compiler gives access to low-level facilities such as machine
code (no such facilities are provided), but rather that the language is
relatively low-level, offering far fewer features than a conventional
high-level language. The simplicity of the compiler made it possible for it
to be written and debugged in only a few weeks.

Although it lacks many desirable features, it was designed to make it easy
to port Imp programs without going to the effort of writing an Imp compiler.
Imp programs are easy to translate into LC, provided they do not make too
extensive use of records or pointers.

The compiler was aimed at the 8086; consequently the only data types
available are the byte and the 16-bit word. Records and pointers are not
supported, nor are strings, although there is a rudimentary facility for
moving string constants into byte arrays. Arrays are supported, but can only
be one dimensional and constant bounded.

Recursive procedures are supported, but they may not be nested. No
distinction is made between routines and functions; instead the RETURN
statement takes an optional expression parameter. External linkage for
procedures is available, so separately compiled modules may be combined with
each other or with modules produced with the assembler.

Both local and global (own) variables are available; however, they are
distinguished not syntactically (for example by using the keyword OWN), but
by context (variables declared within procedures are local, those declared
outwith procedures are own).
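
As an illustration of the distinction by context just described, the
following fragment is a minimal sketch only. It assumes the declaration,
procedure and comment syntax described later in these notes, and the names
COUNT, BUMP, N and TEMP are hypothetical.

\small\tt
\begin{verbatim}
WORD COUNT          | declared outwith any procedure, so own (global)

PROC BUMP N
   WORD TEMP        | declared within BUMP, so local to it
   TEMP = COUNT+N
   COUNT = TEMP
END
\end{verbatim}
\rm \normalsize
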
Local variables live in the stack frame of the procedure containing them;
global variables are allocated space in the data segment. Global variables
may be given initial values.

The keywords of the language are not specially marked (as with '\%' in Imp);
they are just reserved words (as in Pascal). All words, whether keywords or
user-chosen names (tags) of procedures, variables, labels, or constants,
consist of alphanumeric characters, and may not contain spaces. Space is in
fact used as a sort of all-purpose separator; it is used, for example, to
separate a procedure name from its parameters, and parameters from each
other, in a procedure declaration. Parentheses are used only for array
indexing and to over-ride precedence in arithmetic expressions (precedence
is otherwise strictly left to right).

Comments are accepted in two forms. Either they begin with '$|$' and occupy
the remainder of the line, or they begin with '\{' and extend to the next
'\}', even if this is on another line.

\section{Declarations}

Constants are declared using the keyword CONST followed by one or more
constructs of the form TAG = expression on the same line. For example,
CONST NL=10 SPECIAL='*' BELL=7

Variables are declared using either of the keywords BYTE or WORD, followed
by one or more constructs of the following forms:

\small\tt
\begin{verbatim}
BYTE X                  Just declares a scalar variable.
BYTE X (1:19)           Declares an array.
WORD X = 37             Declares and initialises a scalar.
WORD X (m:n) = a b c    Declares and initialises an array.
\end{verbatim}
\rm \normalsize

Labels are declared in the form TAG: as one might expect.

Procedures are declared using the keyword PROC, followed by a tag which is
the name of the procedure. This is followed by zero or more further tags
which are the procedure's parameters, which are all assumed to be of type
WORD, and passed by value. In procedure declarations a quoted string may
appear optionally after the procedure tag.
This identifies a procedure as being external (exportable). For example,
PROC PSYM "PRINTSYMBOL" SYM declares an external procedure PRINTSYMBOL with
a single parameter SYM, which will be referred to in the rest of this
program using the tag PSYM. External procedures defined in other modules may
be imported (the equivalent of an Imp \%externalroutinespec) by using SPEC
instead of PROC, and by omitting the parameter list.

Note that when a procedure is called, no check is made to see whether the
correct number of parameters has been passed. This is one of the things that
were left out in order to make the compiler easy to write. The programmer
must therefore take care that the number of parameters passed to a procedure
matches the number of parameters that procedure expects, especially since
parameters are pushed by the calling code, and removed from the stack by the
called procedure.

Arithmetic operators are the usual set of '+', '-', '*', '\&', '$<$$<$',
'$>$$>$', with '/' for integer division, '\%' for remainder, '!' for logical
or, and '$\backslash$' for logical exclusive or.

Comparators are the familiar '$<$', '$<$=', '$>$', '$>$=', '=', with '\#'
for not equal. In addition '[', '[=', ']', and ']=' are available and denote
unsigned comparisons.

Apart from declarations, most statements will be assignment statements or
procedure calls. In addition, to eliminate the need for excessive labels,
program structuring is available using the keywords IF, ELSE, FINISH, CYCLE,
REPEAT, END. Notice there is no THEN. END marks the end of a procedure or of
the whole program.

IF, which is always followed by a comparison condition, may either begin a
statement (in which case it expects to match up with a future FINISH), or it
may appear after a simple statement (in which case it applies to that
statement only). The two examples shown here are equivalent.
\small\tt
\begin{verbatim}
Z = Z+1 IF X=Y

IF X=Y
   Z = Z+1
FINISH
\end{verbatim}
\rm \normalsize

CYCLE expects to match up with a future REPEAT, which may, however, be
conditional (e.g. REPEAT IF X$<$=4).

ELSE either appears as a complete statement on its own, or may be followed
by an IF clause. For example:

\small\tt
\begin{verbatim}
IF X=0
   ...
ELSE IF X=1
   ...
ELSE
   ...
FINISH
\end{verbatim}
\rm \normalsize

Where labels have to be used, the LC equivalent of Imp's ``-$>$'' or
Pascal's ``GOTO'' is ``JUMP''.

A programmer expecting to cross-call between LC and assembler should be
aware of the calling conventions, which are as follows. First, the
parameters, if any, are pushed, in the order they appear. Then the procedure
is called using an inter-segment indirect call. Parameters are removed from
the stack when the called procedure returns, because it does so using the
XRET $<$n$>$ instruction. The called module, if it has a data segment it
wishes to use, must set DS up for itself, and preserve the previous contents
of DS.

LC procedures normally begin with the code sequence

\small\tt
\begin{verbatim}
    PUSH DS                          preserve caller's DS
    PUSH BP                          preserve caller's BP
    MOV BP,SP                        set up local frame base
    SUB SP,size_of_local_variables   allocate space
    MOV AX,data_segment              make own variables accessible
    MOV DS,AX
    XOR AX,AX                        (to do with event trapping)
    PUSH AX
\end{verbatim}
\rm \normalsize

The code sequence for returning from a procedure is

\small\tt
\begin{verbatim}
    MOV SP,BP                  point SP at caller's BP/DS
    POP BP                     restore caller's registers
    POP DS
    XRET size_of_parameters    return and remove parameters
\end{verbatim}
\rm \normalsize

\vspace{.75in}
assem:lc.doc printed on 14/03/89 at 15.27
\newpage
\tableofcontents
\end{document}