\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Cross-Assembler for Acorn ARM}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in

\section{Preamble}

Cross-Assembler for the ARM (Acorn RISC Machine) Processor

\section{USER NOTES}

The APM command "ASSEM:ARM HARRY" will assemble the sequence of statements held in file HARRY.ARM, and generate two output files. Object code (in pure binary) is sent to HARRY.OBJ, and a listing is sent to HARRY.LIS.

The assembler recognises all the mnemonics listed in the "Instruction Set" section of VLSI Technology Inc's VL86C010 data sheet as well as the locally defined assembler directives described below.

Each line in the source file may be no longer than 255 characters, and will contain either a machine instruction, a directive, or neither. In addition each line may be labelled or may contain a comment (or both). The part of a line to the right of any unquoted semicolon is treated as a comment and ignored. Labels, where they occur, must begin in column 1, and be separated from the statement proper by one or more spaces (or by a colon and any number of spaces). Non-labelled statements must have a space in column 1.

Labels and other names, used to define constant values or fixed addresses, may be defined by the programmer. Apart from '\_' they must contain alphanumeric characters only; upper and lower case are not distinguished. There is no restriction on the length of names. Names must be distinct from all the predefined mnemonics, which include all the opcodes and the condition code, shift, and block transfer addressing mode specifiers. The name "*" may be used to denote the address of the current instruction. The names R0-R15, PC, and LINK are predefined for convenience only (i.e. they are eligible for redefinition if need be), and have the values 0-15, 15, and 14 respectively.

Literals are either quoted characters or numbers.
Numbers are interpreted in decimal radix unless overridden using '\_'-notation. So 2\_101111, 8\_57, 16\_2F, and 47 are all the same thing. An 'H' suffix is accepted as an alternative to the '16\_' prefix, but the first character must still be in the range 0-9, so if a number begins with a letter then a leading 0 must be added. For example, 2FH is the same as 16\_2F, and 16\_C37 would be written as 0C37H. Literals may be signed using '-' (minus) or '$\backslash$' (not).

Quoted constants consist of up to four ASCII characters enclosed in single quote marks. Quote marks themselves may be quoted by doubling them up. Where fewer than four characters appear, they are right-aligned. So 'C' is the same as 16\_43, 'a b' is 16\_612062, and '''!''''' is 16\_27212727.

In most contexts constant expressions, to be evaluated at assembly time, are allowed, in which all the usual integer operations are supported (+, -, *, /, \% (remainder), \&, ! (or), !! (exclusive or), $<$$<$, $>$$>$). These operations are performed strictly left-to-right, i.e. the usual precedence rules do not apply. No bracketing is allowed.

\section{Assembler directives}

The directive "=" equates a name to a constant expression. The name begins in column 1, as if it were a label. It is followed (possibly after a few spaces) by an equals sign, which in turn is followed by the expression.

The directive "ORG" is used to specify an alternative base address for the code being assembled. At most one such directive may appear in the source file, and it must appear before any statement which generates code (because the object code, being pure binary, cannot contain switching directives). The effect of the ORG statement is to determine the values generated for the offset fields in instructions using PC-relative addressing. In the absence of an ORG directive the code is assumed to start at location zero.

The directive "DATA" is used to plant in-line data.
The data operand is a full 32-bit word (because code must always be word-aligned). Text string data may be generated with the aid of the multi-character quoted constants described above, possibly in combination with constant expressions. For example, the two strings "on" and "mat" may be generated using 'no'$<$$<$8+2 and 'tam'$<$$<$8+3 for length-prefixed right-aligned form, bearing in mind that the ARM treats byte 0 in memory as the low-order byte of word 0 (the opposite of the 68000 convention).

The "END" directive marks the end of the source file.

\section{Summary of instruction mnemonics (or their components) and operand syntax}

All ARM instructions are conditional (i.e. they are executed only if the condition code bits in the PSR satisfy the condition specified in the COND field of the instruction). The instruction mnemonics are formed by concatenating the basic opcode mnemonic, the condition mnemonic (if not supplied, "always" is assumed), and some instruction-specific option flags. The condition mnemonics known to the assembler are those listed in the data sheet (EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL, NV), plus HS and LO (which are alternatives for CS and CC respectively).

\subsection{"BRANCH" instructions}

The opcodes B (branch) and BL (branch and link, i.e. call) may, like all other instruction mnemonics, occur on their own or together with a condition specifier. B is the same as BAL, and BL is the same as BLAL. The opcode is followed by an operand which is a constant expression, usually a label, giving the address of the instruction to be executed next.

\subsection{"DATA PROCESSING" instructions}

These are as listed in the data sheet (AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN). The opcode may be followed by the option letter S or P. There are two operands (for compare and move instructions) or three (for the rest).
The first and middle operands are always registers; the first of these (except in compare instructions) is always the destination. The last operand may be a register (possibly shifted) or a literal. General operand syntax, including the shift code mnemonics (ASL, LSL, LSR, ASR, ROR, RRX), is as in the data sheet. Examples:

\small\tt
\begin{verbatim}
      MVNEQ R0,#16_FF000      ; set R0 to 16_FFF00FFF if "zero" is true.
      ADDS  R1,R1,R1,LSL #4   ; set R1 to R1+(R1<<4), i.e. multiply by 17,
                              ; and set the condition codes.
\end{verbatim}\rm
\normalsize

\subsection{"MULTIPLY" instructions}

MUL (multiply) and MLA (multiply and accumulate) are accepted, and take three (MUL) or four (MLA) operands, which are all register numbers.

\small\tt
\begin{verbatim}
      MULS  R1,R2,R3      ; sets R1 to R2*R3, and affects the condition codes
      MLA   R1,R2,R3,R4   ; sets R1 to R2*R3+R4
\end{verbatim}\rm
\normalsize

The assembler checks that the restrictions (that the first register must not be the PC or the same as the second) are complied with. Where it is required to replace the contents of R1 with R1*R2, you must write

\small\tt
\begin{verbatim}
      MUL R1,R2,R1    instead of    MUL R1,R1,R2
\end{verbatim}\rm
\normalsize

and indeed in this case the assembler does the switch automatically and issues a warning.

\subsection{"SINGLE DATA TRANSFER" instructions}

The basic STR or LDR opcode may be followed by the letters B (indicating byte rather than word transfer) and/or T (see data sheet). The opcode is then followed by two operands: the first is the register to be stored or loaded, and the second specifies the address of the operand in memory. The same syntax as in the data sheet is used to specify the various addressing modes:

\small\tt
\begin{verbatim}
      LDRVC R5,TEMP      ; (TEMP is a label) If the overflow flag is clear,
                         ; then load the contents of location TEMP into R5.
                         ; The assembler will convert TEMP into an offset
                         ; form using PC as the base register.
      STR   R0,[R1]      ; Store the contents of R0 in the memory location
                         ; pointed at by R1.
      ...,[R1,#8]        ; Specifies the location 8 bytes on from where
                         ; R1 points.
          [R1,#8]!       ; As above, but replaces R1 with R1+8 at the
                         ; same time.
          [R1,R2]        ; Refers to the location whose address is the
                         ; sum of the contents of registers 1 and 2.
          [R1,R2]!       ; As above and also replaces R1 with R1+R2.
          [R1,R2,LSL #1] ; Here the address used is R1+R2*2.
          [R1],#12       ; Specifies the location R1 points at, and
                         ; afterwards adds 12 to R1.
          [R1],-R2,LSL #2 ; Specifies the location R1 points at, and
                         ; afterwards subtracts R2*4 from R1.
\end{verbatim}\rm
\normalsize

Where an immediate offset is used (as in \#8 or \#12 above), this must be in the range -4095 to +4095 (or, unless byte transfers are involved, and assuming the contents of the base register are divisible by four, in the range -4092 to +4092). In the case of a label, this must be within 4092 bytes of the instruction after the next.

\subsection{"BLOCK DATA TRANSFER" (i.e. multiple register transfer) instructions}

STM and LDM, followed by one of the eight addressing mode specifiers (IB, IA, DB, DA, ED, FD, EA, FA), form the opcode for these instructions. There are two operands. The first is the register containing the base address for the transfer; the second is either a single register (although in this case it would probably be more appropriate to use a STR or LDR instruction instead) or a list of registers enclosed in curly brackets. Within the list, registers are separated by commas (for enumeration) or dashes (for sequencing), so R2-R5 and R5-R2 both mean the same as R2,R3,R4,R5, or as R5,R2,R4,R3. The first operand may be followed by '!' if the base register is to be incremented or decremented by the size of the transfer (it is normally unchanged otherwise). The register list may be followed by '$\hat{ }${}' to force updating of the PSR flags or a user bank transfer.

\subsection{"SOFTWARE INTERRUPT" instruction}

The SWI opcode (with condition specifier if appropriate) takes a single literal operand.
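
By way of illustration, SWI statements might be written as in the sketch below. It assumes that the operand is given as a bare literal with no '\#' prefix, following the data sheet's syntax; the numeric values are arbitrary and are not taken from any particular system.

\small\tt
\begin{verbatim}
      SWI   16_11       ; always executed (SWI is the same as SWIAL);
                        ; 16_11 is an arbitrary, purely illustrative value
      SWINE 0           ; executed only if "zero" is false
\end{verbatim}\rm
\normalsize
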
\subsection{"COPROCESSOR" instructions}

These are not implemented.

\section{Coda}

Documentation dated 07/11/88

\vspace{.75in}
assem:arm.doc printed on 14/03/89 at 15.27

\newpage
\tableofcontents
\end{document}