\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Low-level compiler for 8086}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in
\section{Preamble}
LC (Low-level compiler) for 8086 - User Notes

\section{Introduction}

This is a very simple compiler which was written with the aim of getting a
working system together in a short space of time. The ``low'' in low-level
does not mean that the compiler gives access to low-level facilities such as
machine code (no such facilities are provided), but rather that the language
is relatively low-level, offering far fewer features than a conventional
high-level language. The simplicity of the compiler made it possible for it
to be written and debugged in only a few weeks.

Although it lacks many desirable features, it was designed to make it easy
to port Imp programs without going to the effort of writing an Imp
compiler. It is easy to translate Imp programs into LC, provided they do
not make extensive use of records or pointers.

The compiler was aimed at the 8086; consequently the only data types
available are the byte and the 16-bit word. Records and pointers are
not supported, nor are strings, although there is a rudimentary facility
for moving string constants into byte arrays. Arrays are supported, but
must be one-dimensional and have constant bounds. Recursive procedures
are supported, but procedures may not be nested. No distinction is made
between routines and functions; instead the RETURN statement takes an
optional expression parameter. External linkage for procedures is available, so
separately compiled modules may be combined with each other or with modules
produced with the assembler. Both local and global (own) variables are
available; however, they are distinguished not syntactically (for example by
using the keyword OWN), but by context (variables declared within procedures
are local, those declared outwith procedures are own). Local variables
live in the stack frame of the procedure containing them; global variables
are allocated space in the data segment. Global variables may be given
initial values.
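
As a rough illustration of the distinction by context, the following sketch
declares one own variable and one local variable. The tags COUNT, BUMP, N and
T are invented for the example, and the declaration and comment forms used
are those described elsewhere in these notes.

\small\tt \begin{verbatim}
WORD COUNT = 0        | declared outwith any procedure: own, initialised

PROC BUMP N
  WORD T              | declared within a procedure: local, in the stack frame
  T = COUNT+N
  COUNT = T
  RETURN T            | RETURN with an expression, so BUMP acts as a function
END
\end{verbatim}\rm  \normalsize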

The keywords of the language are not specially marked (as with '\%' in Imp);
they are just reserved words (as in Pascal). All words, whether keywords
or user chosen names (tags) of procedures, variables, labels, or constants,
consist of alphanumeric characters, and may not contain spaces. Space is in
fact used as a sort of all-purpose separator; it is used, for example, to
separate a procedure name from its parameters, and parameters from each
other, in a procedure declaration. Parentheses are used only for array indexing
and to over-ride precedence in arithmetic expressions (precedence is
otherwise strictly left to right).
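
The effect of strict left-to-right evaluation is easiest to see with a small
worked case. The variable X is invented for the example.

\small\tt \begin{verbatim}
X = 2+3*4       | evaluated as (2+3)*4, giving 20
X = 2+(3*4)     | parentheses restore the conventional reading, giving 14
\end{verbatim}\rm  \normalsize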

Comments are accepted in two forms. Either they begin with '$|$' and
occupy the remainder of the line, or they begin with '\{' and extend to
the next '\}', even if this is on another line.
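
A minimal sketch of the two comment forms follows. The assignment is only
there to give the first comment something to follow.

\small\tt \begin{verbatim}
X = X+1     | this comment runs to the end of the line
{ this comment
  continues on a second line }
\end{verbatim}\rm  \normalsize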


\section{Declarations}

Constants are declared using the keyword CONST followed by one or more
constructs of the form TAG = expression on the same line. For example,
{\tt CONST NL=10 SPECIAL='*' BELL=7} declares three constants.

Variables are declared using either of the keywords BYTE or WORD, followed
by one or more constructs of the following forms:

\small\tt \begin{verbatim}BYTE X                  Just declares a scalar variable.
BYTE X (1:19)           Declares an array.
WORD X = 37             Declares and initialises a scalar.
WORD X (m:n) = a b c    Declares and initialises an array.
\end{verbatim}\rm  \normalsize 
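
With concrete tags and values filled in, a set of declarations might look
like the sketch below. All of the tags are invented for the example, and it
is assumed that an array initialiser lists one value per element.

\small\tt \begin{verbatim}
CONST LINELEN=80 NL=10
BYTE FLAG
BYTE LINE (1:80)
WORD TOTAL = 0
WORD SQUARES (0:3) = 0 1 4 9
\end{verbatim}\rm  \normalsize
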
Labels are declared in the form TAG: as one might expect.

Procedures are declared using the keyword PROC, followed by a tag which
is the name of the procedure. This is followed by zero or more further
tags which are the procedure's parameters, which are all assumed to be
of type WORD, and passed by value.

In procedure declarations a quoted string may appear optionally after the
procedure tag. This identifies a procedure as being external (exportable).
For example, PROC PSYM "PRINTSYMBOL" SYM declares an external procedure
PRINTSYMBOL with a single parameter SYM, which will be referred to in the
rest of this program using the tag PSYM. External procedures defined in
other modules may be imported (the equivalent of an Imp \%externalroutinespec)
by using SPEC instead of PROC, and by omitting the parameter list.
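
The export and import forms described above can be sketched as a pair of
modules. The SPEC line follows the rule given (SPEC in place of PROC,
parameter list omitted). The call in the second module assumes that a
procedure call is written as the tag followed by its space-separated
parameters.

\small\tt \begin{verbatim}
{ first module: defines and exports the procedure }
PROC PSYM "PRINTSYMBOL" SYM
  ...
END

{ second module: imports it, giving no parameter list }
SPEC PSYM "PRINTSYMBOL"
PSYM '*'
\end{verbatim}\rm  \normalsize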

Note that when a procedure is called, no check is made to see whether the
correct number of parameters has been passed. This is one of the things
that were left out in order to make the compiler easy to write. The
programmer must take care that the number of parameters passed to a
procedure matches the number of parameters that procedure expects,
especially since parameters are pushed by the calling
code, and removed from the stack by the called procedure.
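
To see why a mismatch is dangerous, consider the sketch below. The procedure
ADD2 is invented for the example, and the call is assumed to be written as
the tag followed by its parameters.

\small\tt \begin{verbatim}
PROC ADD2 A B
  RETURN A+B
END

ADD2 5          | accepted by the compiler, but only one word is pushed
\end{verbatim}\rm  \normalsize

Since each parameter is a 16-bit word, ADD2 removes 4 bytes of parameters
when it returns, while the caller pushed only 2. The stack pointer is then
left 2 bytes away from where the caller expects it.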

Arithmetic operators are the usual set of '+', '-', '*', '\&', '$<$$<$', '$>$$>$',
with '/' for integer division, '\%' for remainder, '!' for logical or, and
'$\backslash$' for logical exclusive or.
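
A few worked values may help with the less familiar symbols. The variable X
is invented for the example, and the results shown are for small positive
operands.

\small\tt \begin{verbatim}
X = 12&10       | and:           1100 & 1010 = 1000, giving 8
X = 12!10       | or:            1100 ! 1010 = 1110, giving 14
X = 12\10       | exclusive or:  1100 \ 1010 = 0110, giving 6
X = 1<<4        | shift left, giving 16
X = 64>>2       | shift right, giving 16
X = 17/5        | integer division, giving 3
X = 17%5        | remainder, giving 2
\end{verbatim}\rm  \normalsize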

Comparators are the familiar '$<$', '$<$=', '$>$', '$>$=', '=', with '\#' for
not equal. In addition '[', '[=', ']', and ']=' are available and
denote unsigned comparisons.
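
The difference between the two families shows up when the top bit of a word
is set. The sketch below assumes that '[' is the unsigned counterpart of
'$<$' and that the ordinary comparators treat a word as a signed value. X and
Y are invented tags, and the conditional form of statement is described
later in these notes.

\small\tt \begin{verbatim}
| suppose the word X holds the bit pattern FFFF (hex)
Y = 1 IF X<1    | signed: FFFF is -1, so the condition holds
Y = 2 IF X[1    | unsigned: FFFF is 65535, so the condition fails
\end{verbatim}\rm  \normalsize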

Apart from declarations, most statements will be assignment statements
or procedure calls. In addition, to eliminate the need for excessive
labels, program structuring is available using the keywords IF, ELSE,
FINISH, CYCLE, REPEAT, END. Notice there is no THEN.

END marks the end of a procedure or of the whole program.

IF, which is always followed by a comparison condition, may either begin
a statement (in which case it expects to match up with a future FINISH),
or it may appear after a simple statement (in which case it applies to
that statement only). The two examples shown here are equivalent.

\small\tt \begin{verbatim}Z = Z+1 IF X=Y
IF X=Y
  Z = Z+1
FINISH
\end{verbatim}\rm  \normalsize 
CYCLE expects to match up with a future REPEAT, which may, however, be
conditional (e.g. REPEAT IF X$<$=4).
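
A loop with a conditional REPEAT might be sketched as follows. It is assumed
here that REPEAT IF jumps back to the matching CYCLE for as long as the
condition holds. SUM and I are invented tags.

\small\tt \begin{verbatim}
SUM = 0
I = 0
CYCLE
  SUM = SUM+I
  I = I+1
REPEAT IF I<=4  | under the assumption above, the loop exits with SUM = 10
\end{verbatim}\rm  \normalsize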

ELSE either appears as a complete statement on its own, or may be
followed by an IF clause. For example:

\small\tt \begin{verbatim}IF X=0
  ...
ELSE IF X=1
  ...
ELSE
  ...
FINISH
\end{verbatim}\rm  \normalsize 
Where labels have to be used, the LC equivalent of Imp's ``-$>$'' or Pascal's
``GOTO'' is ``JUMP''.

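A small sketch of a label and a jump to it is given below. It assumes that
JUMP counts as a simple statement and so may carry a trailing IF. The tags
DONE, X and Y are invented for the example.

\small\tt \begin{verbatim}
  JUMP DONE IF X=0    | skip the division when X is zero
  Y = Y/X
DONE:
\end{verbatim}\rm  \normalsize
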
A programmer expecting to cross-call between LC and assembler should be
aware of the calling conventions, which are as follows. First, the
parameters, if any, are pushed, in the order they appear. Then the
procedure is called using an inter-segment indirect call. Parameters
are removed from the stack when the called procedure returns, because
it does so using the XRET $<$n$>$ instruction. The called module, if it
has a data segment it wishes to use, must set DS up for itself, and
preserve the previous contents of DS. LC procedures normally begin
with the code sequence

\small\tt \begin{verbatim}  PUSH DS                          preserve caller's DS
  PUSH BP                          preserve caller's BP
  MOV BP,SP                        set up local frame base
  SUB SP,size_of_local_variables   allocate space
  MOV AX,data_segment              make own variables accessible
  MOV DS,AX
  XOR AX,AX                        (to do with event trapping)
  PUSH AX
\end{verbatim}\rm  \normalsize 
The code sequence for returning from a procedure is

\small\tt \begin{verbatim}  MOV SP,BP                        point SP at caller's BP/DS
  POP BP                           restore caller's registers
  POP DS
  XRET size_of_parameters          return and remove parameters
\end{verbatim}\rm  \normalsize 
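
For assembler programmers it may help to see the stack frame that these two
sequences imply. The layout below is worked out from the entry sequence
rather than taken from the compiler itself. It assumes that the
inter-segment call pushes a four-byte return address (segment, then offset)
in the usual 8086 manner, that there are n parameters of one word each, and
that size stands for the size of the local variables in bytes.

\small\tt \begin{verbatim}
  address          contents
  BP+8+2*(n-1)     first parameter (pushed first)
  ...
  BP+8             last parameter (pushed last)
  BP+6             return address, segment part
  BP+4             return address, offset part
  BP+2             caller's DS
  BP+0             caller's BP
  BP-size .. BP-1  local variables
  BP-size-2        zero word pushed for event trapping
\end{verbatim}\rm  \normalsize

For a procedure with two parameters, for instance, the first is found at
BP+10 and the second at BP+8, and the procedure returns with XRET 4.
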
\vspace{.75in} assem:lc.doc printed on 14/03/89 at 15.27

\newpage
\tableofcontents
\end{document}