\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Cross-Assembler for 8086}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in

\section{Preamble}

ASS86 User Notes

\section{Introduction}

ASS86 is a simple assembler for the Intel 8086 microprocessor, and is also suitable for the 8088, 80186, and 80286. It is a two-pass assembler with no macro facility. The syntax for the specification of most opcodes is the same as that used by Intel's own assembler (ASM86), but there are a number of deviations, which are documented here.

The assembler is available on the APMs and on VAX/VMS. On the APMs, the command used to invoke the assembler is "ASSEM:ASS86", which takes as its only parameter the name of the file to be assembled. Two optional boolean qualifiers determine whether the object and listing files are to be generated. The default qualifier values are -OBJ and -NOLIST. The source file is assumed to have extension ".86", the object file ".IOB", and the listing file ".LIS".

Examples:

ASSEM:ASS86 FRED -LIST

assembles FRED.86, with object to FRED.IOB and listing to FRED.LIS.

ASSEM:ASS86 TEST -NOOBJ

assembles TEST.86 without producing object or listing; only errors are reported to the console.

\section{General notes}

Assembler statements usually appear one per line, though certain lines may contain more than one statement, separated by ';'.

Comments may appear either as complete lines or to the right of any statement. They begin with '/' and occupy the rest of the line.

Tags (names entered into the assembler's dictionary) consist of alphanumeric characters, but must start with a letter. Upper and lower case letters are not distinguished. Although tags may be longer, only the first six characters are significant. Tags must be unique, which means that user-defined labels must not have the same name as any of the pre-defined tags (register names, opcodes, or directives).
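
The tag rules above can be illustrated with a short Python sketch. It is not part of ASS86: the names normalise\_tag and is\_reserved and the abbreviated PREDEFINED set are hypothetical, and the sketch assumes (these notes do not say so explicitly) that tags are compared on their first six characters after folding to upper case.

\small
\begin{verbatim}
# Illustrative sketch only; not taken from the ASS86 sources.
# A hypothetical subset of the pre-defined tags named in these notes.
PREDEFINED = {"AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
              "MOV", "INC", "JMP", "JUMP",
              "CODE", "DATA", "LOC", "END"}

SIGNIFICANT = 6  # only the first six characters are significant


def normalise_tag(tag):
    """Return the form under which a tag would be compared."""
    # Tags are alphanumeric and must start with a letter.
    if not (tag.isascii() and tag.isalnum() and tag[0].isalpha()):
        raise ValueError("not a valid tag: %r" % tag)
    # Case is not distinguished; characters past the sixth
    # are ignored.
    return tag.upper()[:SIGNIFICANT]


def is_reserved(tag):
    """True if a user tag would clash with a pre-defined one."""
    return normalise_tag(tag) in PREDEFINED
\end{verbatim}
\normalsize

Under these assumptions COUNTER1 and counter2 both reduce to COUNTE and would be treated as the same tag, while a label named ax would clash with the register AX.
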
Tags may be pre-defined by the programmer, using statements of the form "TAG = EXPRESSION", where the expression is either a constant, a register, or a memory reference.

Labels consist of a tag followed by ':'. A label may appear either on a line by itself, or in front of an instruction.

\section{Segmentation}

The assembler generates object output in two "areas", normally associated with the code and data segments. It maintains a separate location counter for each of these two areas.

\section{Operands}

The operands for most instructions other than jumps are either constants, registers, or memory references.

Constants may be literals (signed decimal numbers; the radix may be over-ridden using '\_'-notation, so that hex 3FC would be expressed as 16\_3FC), or expressions involving literals and constant tags, combined with the operators plus (+), minus (-), and (\&), or (!), exclusive-or ($\backslash$), left-shift ($<$$<$), and right-shift ($>$$>$). Operator precedence is strictly left to right, and cannot be over-ridden with brackets, because brackets imply indexing.

Registers are the eight-bit registers AL, CL, DL, BL, AH, CH, DH, BH, the sixteen-bit registers AX, CX, DX, BX, SP, BP, SI, DI, and the segment registers ES, CS, SS, DS.

Memory operands may be directly addressable locations in memory, such as those defined by a label or an absolute offset, or they may be indexed. To reference locations at an absolute memory offset, the pre-defined tags B (for byte) and W (for word) may be used. For example, B(37) denotes the byte at offset 37 from the beginning of the data segment, and W(0) denotes the word (i.e. 16-bit word) at offset 0 in the data segment.

Indexed memory operands use conventional bracket notation. So 23(BP) refers to the location offset 23 bytes from wherever in the stack segment the BP register points to. Note that a tag has a built-in property which identifies it as a byte (as distinct from a word).
With 23(BP) the assembler does not know whether a byte or word is meant, but B(23+BP) and W(23+BP) leave no doubt. Doubt may be resolved by context. For example, the instruction "MOV AX,23(BP)" would move the contents of the WORD at 23(BP) into AX, because AX is a word register, whereas "MOV 23(BP),CH" would move the contents of the CH register into the BYTE at 23(BP), because CH is a byte register. Where doubt remains, for example in "INC 23(BP)", the assembler assumes WORD mode.

Memory operands involving indexing with more than one register are specified in the obvious fashion, e.g. 4(BX+SI). It makes no difference which comes first or whether the offset is inside or outside the brackets, so W(SI+4+BX) means the same as 4(BX+SI).

Memory operands always live in the default segment the processor uses, i.e. in the stack segment for indexing involving the BP register, etc. This may be over-ridden by including a segment register inside the parentheses. This forces the assembler to put a "segment override prefix" instruction in front of the instruction being generated, e.g. MOV AX,64(BP+ES).

\section{Assembler directives}

\small\tt
\begin{verbatim}
CODE          This causes subsequent code to be generated into the code
              area.
DATA          This causes subsequent code to be generated into the data
              area.
LOC n         This sets the location counter of the current area to n.
              Initially both location counters begin at 0.  Other
              assemblers call this ORG.
EXTERNAL xyz  This defines an external symbol "xyz" to the linker as
              being the current value of the location counter, in the
              current area.
SPEC xyz      This reserves two bytes in the current area, which at
              link/load time will be filled in with (the offset part
              of) the value of external symbol "xyz".
SEGSPEC xyz   This, similarly, reserves two bytes in the current area,
              which will be filled in with the segment part of the
              value of "xyz".
LIST n        This controls generation of the assembler listing.
              0 turns listing off, 1 turns it on.  By default it is on.
BYTE \/       These two directives are the equivalent of what other
WORD /\       assemblers call DC.B and DC.W; they evaluate a constant
              expression and plant its value in-line in the current
              area.  Alternatively, a list of constants separated by
              commas may be given, or a quoted string.  In the latter
              case the string must begin with a double quote, and
              continues to the end of the line or the next double
              quote, whichever comes first.  What is planted are the
              characters in the string, without any length prefix or
              null suffix.
ENDMODULE     This causes the assembler to generate a module separator
              linker directive into the object file.  It should only
              be used when assembling several independent modules in
              a single file.
END           This is normally the last statement in any source file.
              It causes the assembler to stop.
\end{verbatim}\rm \normalsize

\section{Deviations from standard instruction mnemonics}

Because Intel's ASM86 assembler associates more information with operands than does this simpler assembler, a different mechanism is used here to distinguish between the within-segment and inter-segment variants of the jump, call, and return instructions, and between the direct and indirect variants of the jump and call instructions.

The mnemonic JMP is used for the unconditional short jump, which, like the conditional jumps, generates two bytes of code, and is restricted to a reach of between 128 bytes back and 127 bytes forward from the next instruction. The mnemonic JUMP is used for the long within-segment jump, with a reach of -32768 to +32767 bytes. The operand for JMP and JUMP should always be a label within the code area.

An indirect jump within the current segment is generated by the JUMPI mnemonic, the operand of which is the address of a 2-byte memory location containing the offset of the jump destination.

To perform an inter-segment jump we use XJUMP, the operand for which takes the special form xxx:yyy (two constant expressions separated by a colon).
The xxx part denotes the segment part, the yyy the offset part of the jump's destination. Finally, XJUMPI is used for an indirect inter-segment jump, i.e. the operand designates a 4-byte memory location containing the jump destination (offset part first).

To recap, the JUMP instruction is made INDIRECT by suffixing its opcode tag with "I", and is made INTERSEGMENT by prefixing "X" to it.

Similarly we have the CALL, CALLI, XCALL, and XCALLI variants of the subroutine call instruction, and the RET and XRET variants of the subroutine return instruction. The return instructions take an optional parameter, a constant which is to be added to SP after the return address has been popped, in order to remove the parameters which the caller had pushed.

Documentation dated 20/10/86, RWT.

\vspace{.75in}

assem:ass86.doc printed on 14/03/89 at 17.12

\newpage
\tableofcontents
\end{document}