\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Cross-Assembler for 8086}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in
\section{Preamble}
ASS86 User Notes

\section{Introduction}

ASS86 is a simple assembler for the Intel 8086 microprocessor, and is also
suitable for the 8088, 80186, and 80286. It is a two-pass assembler with no
macro facility. The syntax for the specification of most opcodes is the same
as that used by Intel's own assembler (ASM86), but there are a number of
deviations, which are documented here.

The assembler is available on the APMs and on VAX/VMS. On the APMs, the
command used to invoke the assembler is "ASSEM:ASS86", which takes as its
only parameter the name of the file to be assembled. Two optional boolean
qualifiers determine whether the object and listing files are to be
generated. The default qualifier values are -OBJ and -NOLIST. The source
file is assumed to have extension ".86", the object file ".IOB", and the
listing file ".LIS". Examples:

ASSEM:ASS86 FRED -LIST

assembles FRED.86, with object to FRED.IOB and listing to FRED.LIS.

ASSEM:ASS86 TEST -NOOBJ

assembles TEST.86 without producing object or listing; only errors are
reported to the console.


\section{General notes}

Assembler statements usually appear one per line, though certain lines may
contain more than one statement, separated by ';'.

Comments may appear either as complete lines or to the right of any
statement. They begin with '/' and occupy the rest of the line.

Tags (names entered into the assembler's dictionary) consist of alphanumeric
characters, but must start with a letter. Upper and lower case letters are
not distinguished. Although tags may be longer, only the first six
characters are significant. Tags must be unique, which means that
user-defined labels must not have the same name as any of the pre-defined
tags (register names, opcodes, or directives).

Tags may be pre-defined by the programmer, using statements of the form
"TAG = EXPRESSION", where the expression is either a constant, a register,
or a memory reference.

Labels consist of a tag followed by ':'. A label may appear either on a
line by itself, or in front of an instruction.
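
The rules above can be illustrated by the following fragment. It is a sketch
rather than a tested program: the tag names are invented for the example, and
it is assumed that this particular line accepts two statements separated by
';' and that spaces may be used freely for alignment.

\small\tt \begin{verbatim}
COUNT = 10             / a constant tag
INDEX = SI             / a register tag
TOTAL = W(4)           / a memory-reference tag
/ A comment occupying a complete line.
START:                 / a label on a line by itself
AGAIN: ADD AX,TOTAL    / a label in front of an instruction
INC INDEX ; DEC CX     / two statements on one line (assumed allowed)
JNZ AGAIN
/ COUNTER1 and COUNTER2 would be the same tag (only COUNTE counts).
/ A label called MOV, AX, or END would clash with a pre-defined tag.
\end{verbatim}\rm  \normalsize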


\section{Segmentation}

The assembler generates object output in two "areas", normally associated
with the code and data segments. It maintains a separate location counter
for each of these two areas.


\section{Operands}

The operands for most instructions other than jumps are either constants,
registers, or memory references.

Constants may be literals or expressions. Literals are signed decimal
numbers; the radix may be over-ridden using '\_'-notation, so that hex 3FC
is written as 16\_3FC. Expressions combine literals and constant tags with
the operators plus (+), minus (-), and (\&), or (!),
exclusive-or ($\backslash$), left-shift ($<$$<$), and right-shift ($>$$>$).
Operator precedence is strictly left to right, and cannot be over-ridden
with brackets, because brackets imply indexing.
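
Because evaluation is strictly left to right, some expressions give results
that differ from those of languages with conventional precedence. The
following definitions are a sketch with invented tag names; the values in the
comments were worked out by hand from the rules above, not taken from an
assembler listing.

\small\tt \begin{verbatim}
PORT   = 16_3FC        / hex 3FC, i.e. decimal 1020
LOBYTE = PORT&16_FF    / 3FC and FF = hex FC, i.e. decimal 252
FLIP   = 16_FF\16_0F   / FF exclusive-or 0F = hex F0, i.e. decimal 240
X      = 4!1<<2        / (4 or 1) shifted left by 2 = 20,
                       / not 4 or (1 shifted left by 2) = 4
Y      = 2+3<<4>>1     / ((2+3) shifted left 4) shifted right 1 = 40
\end{verbatim}\rm  \normalsize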

Registers are the eight-bit registers AL, CL, DL, BL, AH, CH, DH, BH, and
the sixteen-bit registers AX, CX, DX, BX, SP, BP, SI, DI, and the segment
registers ES, CS, SS, DS.

Memory operands may be directly addressable locations in memory, such as
those defined by a label or an absolute offset, or they may be indexed.

To reference locations at an absolute memory offset, the pre-defined tags B
(for byte) and W (for word) may be used. For example, B(37) denotes the
byte at offset 37 from the beginning of the data segment, and W(0) denotes
the 16-bit word at offset 0 in the data segment.

Indexed memory operands use conventional bracket notation. So 23(BP) refers
to the location 23 bytes beyond the point in the stack segment to which the
BP register points. Note that a tag carries a built-in property which
marks it as a byte or as a word. With 23(BP) the
assembler does not know whether a byte or word is meant, but B(23+BP) and
W(23+BP) leave no doubt. Doubt may be resolved by context. For example,
the instruction "MOV AX,23(BP)" would move the contents of the WORD at
23(BP) into AX, because AX is a word register, whereas "MOV 23(BP),CH" would
move the contents of the CH register into the BYTE at 23(BP), because CH is
a byte register. Where doubt remains, for example in "INC 23(BP)", the
assembler assumes WORD mode.
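
The cases described above are collected in the following fragment, which
uses only operand forms already mentioned in this section.

\small\tt \begin{verbatim}
MOV AL,B(37)           / byte at offset 37 in the data segment
MOV AX,W(0)            / word at offset 0 in the data segment
MOV AX,23(BP)          / word, because AX is a word register
MOV 23(BP),CH          / byte, because CH is a byte register
INC 23(BP)             / no context, so word mode is assumed
INC B(23+BP)           / explicitly a byte
INC W(23+BP)           / explicitly a word
\end{verbatim}\rm  \normalsize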

Memory operands involving indexing with more than one register are specified
in the obvious fashion, e.g. 4(BX+SI). It makes no difference which register
comes first or whether the offset is inside or outside the brackets, so
W(SI+4+BX) refers to the same location as 4(BX+SI).

Memory operands live by default in whichever segment the processor uses for
the addressing mode in question, e.g. the stack segment for indexing
involving the BP register. This may be over-ridden by including a segment
register inside the parentheses, as in MOV AX,64(BP+ES), which forces the
assembler to put a "segment override prefix" in front of the instruction
being generated.
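
As a brief illustration, the fragment below sets the multi-register and
segment-override forms side by side. The remarks about default segments
reflect the normal behaviour of the 8086 rather than anything specific to
this assembler.

\small\tt \begin{verbatim}
MOV AX,4(BX+SI)        / word at BX+SI+4, default segment DS
MOV AX,W(SI+4+BX)      / the same location, written differently
MOV AX,64(BP)          / indexing with BP: default segment SS
MOV AX,64(BP+ES)       / same offset, but taken from segment ES;
                       / an override prefix precedes the MOV
\end{verbatim}\rm  \normalsize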


\section{Assembler directives}

\small\tt \begin{verbatim}CODE          This causes subsequent code to be generated into the code area.
DATA          This causes subsequent code to be generated into the data area.
LOC n         This sets the location counter of the current area to n.
              Initially both location counters begin at 0.
              Other assemblers call this ORG.
EXTERNAL xyz  This defines an external symbol "xyz" to the linker as being
              the current value of the location counter, in the current area.
SPEC xyz      This reserves two bytes in the current area, which at link/load
              time will be filled in with (the offset part of) the value of
              external symbol "xyz".
SEGSPEC xyz   This, similarly, reserves two bytes in the current area, which
              will be filled in with the segment part of the value of "xyz".
LIST n        This controls generation of the assembler listing.
              0 turns listing off, 1 turns it on.  By default it is on.
BYTE \/       These two directives are the equivalent of what other assemblers
WORD /\       call DC.B and DC.W; they evaluate a constant expression and
              plant its value in-line in the current area.  Alternatively,
              a list of constants separated by commas may be given, or
              a quoted string.  In the latter case the string must begin
              with a double quote, and continues to the end of the line
              or the next double quote, whichever comes first.  What is
              planted are the characters in the string, without any length
              prefix or null suffix.
ENDMODULE     This causes the assembler to generate a module separator
              linker directive into the object file.  It should only
              be used when assembling several independent modules in
              a single file.
END           This is normally the last statement in any source file.
              It causes the assembler to stop.
\end{verbatim}\rm  \normalsize 
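
The directives might be combined as in the following sketch of a small
module. It is illustrative only: the tags MSG, TABLE, ENTRY, and OTHER are
invented, OTHER is assumed to be defined with EXTERNAL in some other module,
and it is assumed that a label may precede a BYTE or WORD directive and that
statements may be indented.

\small\tt \begin{verbatim}
        DATA               / output now goes to the data area
        LOC 16_100         / data location counter becomes hex 100
MSG:    BYTE "Hello"       / 5 bytes, no length prefix or null suffix
TABLE:  WORD 1,2,16_1234   / three words, 6 bytes
        CODE               / output now goes to the code area
        EXTERNAL ENTRY     / ENTRY = current code location
        MOV AX,TABLE       / load the first word of TABLE
        RET
        SPEC OTHER         / 2 bytes: offset part of OTHER
        SEGSPEC OTHER      / 2 bytes: segment part of OTHER
        END
\end{verbatim}\rm  \normalsize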

\section{Deviations from standard instruction mnemonics}

Because Intel's ASM86 assembler associates more information with operands
than does this simpler assembler, a different mechanism is used here to
distinguish between the within-segment and inter-segment variants of the
jump, call, and return instructions, and between the direct and indirect
variants of the jump and call instructions.

The mnemonic JMP is used for the unconditional short jump, which, like
the conditional jumps, generates two bytes of code, and is restricted to
a reach of between 128 bytes back and 127 bytes forward from the next
instruction. The mnemonic JUMP is used for the long within-segment jump,
with a reach of -32768 to +32767 bytes. The operand of JMP or JUMP
should always be a label within the code area.

An indirect jump within the current segment is generated by the JUMPI
mnemonic, the operand of which is the address of a 2-byte memory location
containing the offset of the jump destination. To perform an inter-segment
jump we use XJUMP, the operand for which takes the special form xxx:yyy
(two constant expressions separated by a colon). The xxx part denotes the
segment part, the yyy the offset part of the jump's destination. Finally,
XJUMPI is used for an indirect inter-segment jump, i.e. the operand
designates a 4-byte memory location containing the jump destination (offset
part first).

To recap, the JUMP instruction is made INDIRECT by suffixing its opcode tag
with "I", and is made INTERSEGMENT by prefixing "X" to it.
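
The five forms are shown together in the following sketch. The labels are
invented and assumed to be defined elsewhere: NEARBY and FAROFF in the code
area, VECTOR as a 2-byte location, and FARPTR as a 4-byte location. The
segment and offset given to XJUMP are arbitrary example values. As a worked
example of the short reach, a JMP assembled at code offset hex 100 is
followed by the next instruction at hex 102, so it can reach targets from
hex 82 (102 minus 128) to hex 181 (102 plus 127).

\small\tt \begin{verbatim}
JMP NEARBY             / short jump, 2 bytes, reach -128..+127
JUMP FAROFF            / within-segment jump, reach -32768..+32767
JUMPI VECTOR           / indirect: VECTOR holds the 2-byte offset
XJUMP 16_1000:16_20    / inter-segment: segment 1000, offset 20 (hex)
XJUMPI FARPTR          / indirect inter-segment: FARPTR holds 4 bytes,
                       / offset part first, then segment part
\end{verbatim}\rm  \normalsize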

Similarly we have the CALL, CALLI, XCALL, and XCALLI variants of the
subroutine call instruction, and the RET and XRET variants of the subroutine
return instruction. The return instructions take an optional parameter, a
constant which is to be added to SP after the return address has been
popped, in order to remove the parameters which the caller had pushed.
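
A minimal sketch of a within-segment call with two word parameters follows.
The routine name ADDUP is invented, and for brevity the routine does not
preserve BP. The stack offsets in the comments assume a 2-byte return
address; after XCALL or XCALLI the return address occupies 4 bytes, so the
parameters would lie 2 bytes further up and the routine would end with
XRET 4.

\small\tt \begin{verbatim}
/ caller
PUSH AX                / first parameter
PUSH BX                / second parameter
CALL ADDUP             / pushes the 2-byte return address
/ on return the sum is in AX and both parameters are gone

/ routine
ADDUP: MOV BP,SP       / 0(BP) is now the return address
MOV AX,2(BP)           / second parameter (pushed last)
ADD AX,4(BP)           / first parameter
RET 4                  / pop return address, then add 4 to SP
\end{verbatim}\rm  \normalsize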


Documentation dated 20/10/86, RWT.
\vspace{.75in} assem:ass86.doc printed on 14/03/89 at 17.12

\newpage
\tableofcontents
\end{document}