A High Level Assembler For the ICL 2900


                        Philip A. F. Hartley


                     Computer Science 4 Project
                               May 1976


                           CHAPTER 1.


                          Introduction.


        The goal of the  project  was  to  write  a  high  level
     assembler  for  the  ICL  2900  series  of computers. This
     problem immediately separated into two distinct subproblems:

               1. Gaining an understanding of the machine.
               2. Choosing the form of the language.

        The 2970 had been used for development work over the last
     year by ERCC, so there was a good deal of first hand
     experience of the machine available. The first problem,
     therefore, could be overcome by reading the machine manuals
     and then talking to people with experience of the machine in
     order to clear up the points which needed clarification.

        The choice of HAL as the structure of the language was
     arrived at quickly. Once HAL had been selected, the problem
     arose of where to start: whether to start from scratch, or to
     take an already existing HAL, determine its machine dependent
     parts and alter or replace these to produce 2900 code. EMAS
     was chosen as the development machine on which to implement
     HAL-2900, using the latter of the two techniques.

        It is important to see this project in a greater context
     than just a fourth year programming exercise. This was the
     first project undertaken in the department which involved
     direct contact with the machine structure. It was therefore
     felt that the project could play a large part in helping
     later projects come to grips with the machine at an earlier
     stage than was possible in this project, where progress was
     hampered by the lack of adequate documentation.

                            CHAPTER 2.


                      Approach to the 2900.


        In order to write a good compiler or high level assembler
     for a machine, the implementor must have an extremely good
     knowledge of the machine, both in terms of its order code and
     of the overall structure of the system. A considerable part
     of the project was spent trying to understand the machine by
     going through the manufacturer's documentation. It was quite
     evident that none of the documentation gave an adequate
     'middle of the road' view of the system - it was either too
     glossy or extremely detailed and virtually unreadable. The
     experience of the EMAS project team was that the
     manufacturer's documentation was not quite good enough; some
     details were confused, ambiguous or even omitted.

        It was for these reasons, both for the project's own sake
     and to aid the dissemination of information about the
     machine, that Appendix A was written: a detailed, but not
     exhaustive, description and discussion of the 2970 system. It
     was intended to give the reasonably experienced computer
     scientist an introduction to the main concepts of the system
     without getting bogged down in too much detail. But it was
     also designed as a start to documentation which would
     complement that of the manufacturer, inasmuch as it would
     contain clarification of details of the system which were not
     adequately covered elsewhere.
                            CHAPTER 3.


       The Approach to the High Level Assembler for the 2900.


                             General.

        A high level assembler is a programming language which
     provides the program structuring - and often data structuring
     - facilities one expects to find in a high level language,
     but which allows direct access to all machine facilities,
     including all machine instructions, registers and memory. The
     first high level assembler was N. Wirth's PL360 [1] for the
     IBM 360/370. It was originally written as a tool to aid the
     implementation of an ALGOL 60 compiler, but it soon became
     evident that this type of language bridged the rather wide
     gap between assemblers and high level languages.

        The advantage of high level assemblers over normal
     assemblers is, as Bell and Wichmann say in [3], "that they
     are an easy method of exploiting the hardware without
     becoming entangled in machine code". The number of mistakes
     eliminated by being able to write arbitrary expressions,
     instead of having to generate the instructions by hand, shows
     how great these advantages are.

        The advantage over high level languages is precisely that
     one can get direct access to machine facilities. Techniques
     used to access data or program can be tailored to the
     particular application, rather than being very general as is
     necessary in a compiler. Certainly, on a machine like the
     Interdata 70/74, by equivalencing a variable name with a
     register, 2 bytes per access to that variable can be saved
     over a program in which the variable was equivalenced to a
     memory location (that is, the difference between a
     register-register form and a register-indexed form of
     instruction). Therefore, unless the compiler can do global
     optimisation, a difficult and expensive procedure, as much as
     2 bytes per access can be gained in the high level assembler
     over the high level language. If a variable is accessed 100
     times, then up to 200 bytes can be saved on the size of the
     code - a very important factor when programming on machines
     with a small memory.
               A high level assembler for the 2900?

        But why a high level assembler for the 2900? The 2900 is
     not a multi-register machine, so the advantage of assigning a
     variable name to a register quoted above does not apply.
     There are some short forms (e.g. the LNB+n addressing mode),
     but the main advantages are in the areas where the available
     high level languages are deficient. For example, in IMP [4]
     there is no direct method of manipulating 64 bit integer
     quantities, and the escape descriptor for dynamic data
     structuring cannot be used to full advantage.

        At the moment, the alternative to programming in a high
     level language on the 2900 is assembler. There is one
     tremendous disadvantage of working at this level on the 2900:
     the primitive level interface is not guaranteed to remain the
     same. This makes the use of such techniques impracticable for
     all but the smallest applications, even though system VME/K
     was written in MAPLE/STAPLE.


                      Choice of the language.

        There were three constraints which the language had to
     satisfy in order to maximise the advantages stated above:

        1. The language should not enforce any method of accessing
           data or program.
        2. The language should allow easy access to all machine
           types and functions.
        3. It should be possible to produce code as efficient as
           that produced by hand.

        HAL languages, designed by H. Dewar of the Department of
     Computer Science, are a family of assemblers whose high level
     features are basically the same on all machines but whose low
     level features are tailored to the particular machine for
     which the HAL is designed. The high level features include
     block structure (used only to restrict the scope of macros
     and variable names), assignment statements, conditional
     statements, controlled loops, a macro scheme and a statement
     for equivalencing variable names to registers, memory
     locations or constants. The language allows for the explicit
     positioning of code and data.
        One of the main features of the language is the separation
     of the program into two main sections: a declarative section
     and a control section. Most of the control section can be
     machine independent, operating on names which are bound to
     machine resources in the declarative section. This means that
     the binding of a variable name in the declarative section can
     be changed from a store location to a register, or from a
     long form to a short form, for example, without altering the
     control section. The language had to be extended to cater for
     the machine types, namely descriptors, which do not exist on
     any other machine for which HAL is implemented.

        HAL was chosen because it satisfied the above constraints
     and because of the great deal of programming experience I
     already had with HAL70, and the small amount with HAL7502. It
     is very important, I feel, to be familiar with the language
     which is being implemented.


                    Method of implementation.

        Once HAL was chosen, a decision had to be made: to start
     from scratch or to convert an already existing HAL
     implementation to HAL2900. This was an extremely difficult
     decision to make. The first method would have involved a
     great deal of 're-inventing the wheel' type of work.
     Therefore the second technique was chosen, taking the chance
     that the program might have had to be rejected, so wasting a
     great deal of time. HAL70 was examined, since it was by far
     the most established HAL language in use. It is quite a large
     and complicated program and took a long time to understand.
     It was written for an 8K pdp15 and is therefore necessarily
     very tightly written, which made it even more difficult to
     understand.


         A brief description of the machine independent HAL.

        The main function of the machine independent part of the
     HAL program is to convert source statements into a more
     convenient semi-reverse polish form (not true reverse polish,
     as will be seen later). This includes the tasks of reducing
     names to an internal form and of dictionary handling.
     This section also handles conditional statements (converting
     them to a set of assembler labels and jump directives), the
     macro scheme, conditional assembly, the definition and
     redefinition of variable names, block structure, the setting
     of the assembler location counter, the production of the
     listing file and error reports.

        The machine dependent section generates the object file
     and evaluates expressions and conditions, planting code where
     necessary, generating jumps and planting data. The two
     sections are not totally independent, especially in the area
     of assembler jumps, where they become slightly entangled in
     order to optimise the short form of the jump instruction.
     Some machine dependent features to do with descriptors appear
     in the main reverse polish conversion routine, but these will
     be discussed later.


                        Choice of machine.

        On which machine was the language to be implemented? EMAS
     was chosen for several reasons. Firstly, the 2900 is a 32 bit
     machine and the assembler would have to manipulate values of
     up to 32 bits in length; EMAS was the only readily available
     machine on which this could be done easily. More importantly,
     there were extensive facilities for the generation and
     manipulation of 2900 object files available on EMAS. Other
     reasons for choosing EMAS were that IMP on the Interdatas was
     not fully debugged, that the operating system was not
     designed for general purpose use anyhow, and that the
     machines were not reliable enough at the start of the
     project. The PDP15, with only 8K words of store, was not big
     enough - although HAL70 and HAL7502 both fit into this.
                            CHAPTER 4.


                     Implementation details.


        For reference, most of the machine dependent section of
     the assembler is contained in routine ASSEMBLE, starting at
     line 244 in appendix D.

        The internal representation of a tag is by three 16 bit
     values and one 32 bit value:

        +--------+--------+     +--------+----------------+
        |  tag1  |  tag2  |     |  type  |      val       |
        +--------+--------+     +--------+----------------+
            16       16             16           32

        TAG1 represents the first 3 characters of a tag (in base
     36 format) and TAG2 holds characters 4 - 6. A tag is a
     sequence of letters followed by a sequence of digits, of
     which only the first 6 characters are significant.

        The type part is interpreted as follows:

        bit  0        unused
        bit  1        machine instruction
        bit  2        macro
        bit  3        forward reference
        bit  4        undefined tag
        bit  5        relocatable
        bit  6        pseudo register
        bit  7        register
        bit  8        memory reference
        bits 9 - 11   type of reference
                        =0 => W(REG+VAL)
                        =1 => L(REG+VAL)
                        =2 => B(REG+VAL)
                        =3 => IS(VAL), or IS(B) if REG = B
                        =4 => @(REG+VAL)
                        =5 => @DR(REG+VAL)
                        =6 => @(REG+VAL)(B)
        bits 12 - 15  index register (REG, above)

        The registers are represented by:

        LNB = 1, XNB = 2, PC = 3, SSN = 4, TOS = 5,
        DR = 6, B = 8, SF = 9, ACC = 10

     Zero in the index register field implies that no index
     register is being used.


                            Examples.

        In the following examples, and throughout the rest of the
     report, ( , ) represents a (TYPE, VAL) pair, LMASK = x'10'
     (i.e. 1 in the 'type of reference' field), PSREG = x'200'
     (i.e. 'pseudo register' set to 1) and REGISTER = x'100' (i.e.
     'register' set to 1). The expression on the left hand side is
     what actually appears in the source; the expression on the
     right hand side is its internal representation.

        lnb       = (register, lnb)

        w(lnb+4)  = (lnb, 1)        / LNB is offset by a number of
                                    / words, so a conversion is done
                                    / from the offset stated in the
                                    / source (which is always bytes)
                                    / to words.

        l(pc+16)  = (lmask+pc, 8)   / again the conversion, but to
                                    / half words this time.
        w(r0+16)  = (psreg+0, 16)   / no conversion for pseudo
                                    / registers until the actual
                                    / evaluation of the pseudo
                                    / register is required.

        An assembly time expression (to which a variable name can
     be equivalenced, for example) is therefore one whose result
     can be represented by a (TYPE, VAL) pair without generating
     any code.

        An operand in the reverse polish form is represented by a
     (TYPE, VAL) pair; TAG1 and TAG2 can be discarded, since they
     are only required for dictionary look-up and these operations
     have been completed by the time the reverse polish is
     generated. An operator is represented by a negative number
     (the switch ASS in ASSEMBLE is indexed by the operator
     value). For example, given the following definitions:

        $def a = w(lnb+4), b = w(lnb+8)
        $def c = l(ssn+32)

     (i.e. define A, B and C to have the values of the assembly
     time expressions on the right hand sides of the appropriate
     equals signs), the reverse polish form generated for the
     source statement:

        c = a+b+3

     (i.e. assign to the memory location or register synonymed by
     C the value of the expression formed by taking the contents
     of the memory location, register or constant synonymed by A,
     adding the contents of the memory location, register or
     constant synonymed by B, and then adding 3) would be:

        LMASK+SSN, 8, store, LNB, 1, LNB, 2, add, 0, 3, add, store

     where 'store', 'load' and 'add' are the operators in the
     reverse polish.

        Routine ASSEMBLE is then called and the expression
     starting at the position indicated is evaluated. When the
     'load' operation is met, the pointer is returned to the start
     of the expression, and when the 'store' operation is reached,
     the evaluation is terminated. When an operation is met in the
     reverse polish, the appropriate procedure is invoked by
     jumping through the switch ASS (in routine ASSEMBLE), indexed
     by the operation value.
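        By way of illustration, the dispatch over the reverse
     polish stream can be modelled very crudely as follows. This
     is a sketch in a modern notation, not the IMP of the actual
     assembler: the operator codes, the 'mem'/'const' tags and the
     toy numeric memory are invented for illustration, and the
     destination-first layout and the 'load' pointer-return
     mechanism described above are omitted.

```python
# Crude model of walking a reverse polish stream: operand duplets
# push a value on a stack, operators are negative codes which
# select an action (mirroring the switch ASS in routine ASSEMBLE).
# All encodings here are illustrative assumptions.

ADD, STORE = -1, -2          # hypothetical operator codes

def evaluate(stream, memory):
    """Walk the stream; duplets push, operators pop and combine."""
    stack = []
    for item in stream:
        if isinstance(item, tuple):          # an operand duplet
            kind, val = item
            stack.append(memory[val] if kind == "mem" else val)
        elif item == ADD:                    # binary op: pop two, push one
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif item == STORE:                  # terminate the evaluation
            return stack.pop()
    return stack.pop()

# c = a+b+3, where A and B are memory cells holding 10 and 20:
memory = {4: 10, 8: 20}
stream = [("mem", 4), ("mem", 8), ADD, ("const", 3), ADD, STORE]
result = evaluate(stream, memory)            # 10 + 20 + 3 = 33
```

     The real assembler generates code rather than computing a
     value, but the shape of the dispatch is the same: operands
     are duplets, operators index a switch.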
     For binary operations, the left operand is delivered in
     (TYPE1, VAL1) and the right operand in (TYPE, VAL), and the
     result is expected to be returned in (TYPE, VAL). A unary
     operation is invoked with its operand in (TYPE, VAL) and the
     result returned in the same.


                           Descriptors.

        The descriptor provides a method of accessing a regular,
     usually static, set of data items. The descriptor itself is a
     64 bit quantity. The first (most significant) 32 bits define
     the type of the data being accessed (1, 8, 32 or 64 bit
     length, for example), whether any modifier applied is to be
     scaled, and an upper bound on the size of the modifier. The
     second (least significant) 32 bits contain the address of the
     first item of the set.

        The notation chosen for descriptors was @DESC for
     non-modified descriptors and @DESC(MOD) for modified
     descriptors. This was chosen as being the most consistent
     with the syntax of other indirect forms in HAL, e.g. W(ADDR),
     B(ADDR), etc. The symbol '@' could then be treated as an
     operation - unary for non-modified descriptors and binary for
     descriptors to which modification is applied. The particular
     operation to be invoked is worked out from context. This is
     quite straightforward but not trivial since, in the modified
     case, the '@' is being treated as a prefixed binary operator,
     whereas all other binary operators are infix. Of course, the
     treatment of unary '@' is quite consistent with unary minus
     (-) and not (\).

        The technique for 'evaluation' of descriptors is to try to
     fit the descriptor reference to one of the available
     addressing modes and, if this is not possible, to load the DR
     (a register which holds a descriptor) with the descriptor.
     The modifier, if any, is loaded into the B register unless it
     fits one of the modifier addressing modes which is consistent
     with the descriptor reference.
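        The two-halves layout described above can be modelled
     crudely as follows (a sketch in a modern notation; the field
     values used are invented for illustration, and the real type
     word packs several subfields - length, scaling, bound - which
     are not modelled separately here):

```python
# Model of the 64 bit descriptor: the most significant 32 bits
# describe the data, the least significant 32 bits hold the byte
# address of the first item of the set.  Field contents here are
# illustrative assumptions only.

def make_descriptor(type_word, address):
    """Pack a 32 bit type word and a 32 bit address into 64 bits."""
    assert 0 <= type_word < 2**32 and 0 <= address < 2**32
    return (type_word << 32) | address

def split_descriptor(desc):
    """Recover the (type_word, address) pair from a descriptor."""
    return desc >> 32, desc & 0xFFFFFFFF

d = make_descriptor(0x00000020, 0x0001F000)   # invented values
assert split_descriptor(d) == (0x20, 0x1F000)
```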
     For example, @(L(LNB+4))(W(LNB+12)) would be coerced to
     @DR(W(LNB+12)), where DR gets L(LNB+4); and @(L(XNB+40))(L+M)
     would be coerced to @(L(XNB+40))(B), where B gets L+M, given
     that L+M will not fit one of the modifier addressing modes.


                            Registers.

        Each register on the 2900 is meant to serve a particular
     purpose. For instance, the XNB is meant for use as an index
     register and no computation can be performed on it. The
     accumulator is meant to hold intermediate results of
     expressions and for performing computation, and it cannot be
     used as an index register. The B register is semi-flexible in
     that a limited amount of computation can be performed with it
     (on 32 bit quantities) and it has a route into the stack,
     although its primary function is to hold modifiers.

        Because registers have such set functions, and a
     particular register has to be used when its function is
     required, it was difficult to see how the concept of the
     temporary register specification (the '$TEMP' directive)
     would fit into the HAL2900 scheme. After all, the stack and
     the accumulator can be used to hold partial results of
     expressions, so that explicit specification of temporary
     registers is unnecessary. The only register it appeared
     necessary to let the user claim or release was the B
     register, since there are a number of instructions which
     allow it to be used as a cycle control variable. This,
     however, turns out to be a rather short sighted view. There
     are situations, especially with XNB, where the user may want
     to load a register with a particular value and be sure that
     the register will not be used later by the assembler for its
     own purposes. If the assembler does need to use the register,
     then it should flag the occasion by generating an error
     message.

        A problem with index registers is that offsets are of
     different lengths depending on the register in use.
     For example, offsets from the LNB are taken by the hardware
     to mean a number of words; offsets from the PC are taken to
     be a number of half-words (i.e. 2 bytes). But when a register
     is stored, if it can be, the value stored is a byte address.
     The assembler therefore assumes that all offsets it finds are
     a number of bytes and does the appropriate conversions. The
     only cases in which this is not true are when it detects
     SF = sf+(....) or LNB = sf-(....), in which case it uses the
     instructions ASF and RALN (Adjust Stack Front and Raise Local
     Name Base), which assume an operand which is a number of
     words.


                  The evaluation of expressions.

        At first sight the 2900 is ideal for the evaluation of
     expressions. It has a stack, the top item of which (top of
     stack - TOS) can be accessed by a primary addressing mode,
     and an accumulator (the real top of stack) which can be
     loaded, stacked and loaded, or stored using any of the
     primary addressing modes. On top of this, the B register can
     be used to a limited extent as an accumulator, to perform
     addition, subtraction (but not reverse subtraction) and
     multiplication on 32 bit quantities. It is better to use the
     B register when possible, since its arithmetic is faster than
     that of the main accumulator.

        The drawbacks come in the way the multiple length
     accumulator is handled. Operations cannot be performed
     directly between a 64 bit accumulator and 32 bit twos
     complement values addressed directly or indirectly (i.e. via
     a descriptor). In order to convert a 32 bit value to a 64 bit
     value with sign extension, the value must first either be
     loaded into a 32 bit accumulator and the accumulator size
     then changed to 64 bits by a direct modification of the
     program status register, or it can be loaded directly into a
     64 bit accumulator via a descriptor. In either case the
     result is a 32 bit value extended on the left to 64 bits with
     zeroes. The sign extension must be done by program: a left
     shift logical of 32 bits followed by a right shift arithmetic
     of 32 bits.
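        The shift sequence just described can be checked in a
     modern notation (a sketch only; the real operation is two
     machine instructions on the 64 bit accumulator):

```python
# A 32 bit value arrives zero-extended in a 64 bit accumulator.
# A left shift logical of 32 followed by a right shift arithmetic
# of 32 propagates the sign bit through the upper half.

MASK64 = (1 << 64) - 1

def sign_extend_32_to_64(acc):
    """acc holds a 32 bit value zero-extended to 64 bits."""
    acc = (acc << 32) & MASK64                    # shift left logical 32
    if acc & (1 << 63):                           # shift right arithmetic 32:
        acc = (acc >> 32) | (MASK64 ^ ((1 << 32) - 1))  # sign bit set
    else:
        acc >>= 32                                # sign bit clear
    return acc

assert sign_extend_32_to_64(0x00000001) == 0x0000000000000001
assert sign_extend_32_to_64(0xFFFFFFFF) == 0xFFFFFFFFFFFFFFFF   # i.e. -1
```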
        There are other cases where the accumulator causes
     problems, mostly to do with conversion, but they do not
     concern this project.

        The assembler, therefore, had to handle expressions
     consisting of operands potentially of different lengths and
     had to decide what conversions should be done. There is no
     way at compile time, however, of telling the length of
     indirectly addressed data items. There were two choices here:
     either a new notation is introduced to specify the length of
     each item, or some blanket way of describing the lengths of
     all indirectly addressed items is employed. The latter method
     was chosen. The former method would certainly be the most
     flexible, but a notation to describe all types of descriptors
     would perhaps have been cumbersome, and the latter method
     seemed to cover the majority of cases. So the '$ACC'
     directive was introduced. If the directive '$ACC 32' is
     given, then all indirectly addressed items up to the next
     '$ACC' directive are assumed to be 32 bit twos complement
     values; similarly for '$ACC 64'.

        The problems which now existed were when to use the B
     register as an accumulator and when to coerce data to another
     length. The choice of accumulator is made at the start of an
     expression. The B register is chosen when '$ACC 32' is in
     force and all operations involved in the expression can be
     performed on it. This seemed to have only one disadvantage:
     there is no reverse subtraction on the B register, and the
     occurrence of this operation cannot be detected by a simple
     scan of the source text (as can be done for other non-B
     register operations). So when this operation is required in
     the evaluation of an expression, it is converted to a negate
     and add.

        The length to which all operands should be converted was
     taken to be that indicated by the current '$ACC' directive.
     This seemed like the most desirable length at the time, but
     perhaps the length of the LHS (if it existed) would have been
     a better choice.
        Conversion at present is always done when a variable is
     required as an operand. There is, however, quite a
     considerable amount of code required to do a conversion, so
     perhaps it would have been better to convert only when
     necessary: deferring the conversion when two operands of the
     same length are combined in a binary operation, and
     converting only when the operands are of dissimilar length,
     for example. This gives the advantage of doing CONVERT(J+K+L)
     rather than CONVERT(J)+CONVERT(K)+CONVERT(L).

        After having written some test programs and examined the
     code produced, a better scheme for evaluating expressions
     comes to mind. Take, for example, the expression:

        dest = @desc(mod)+w(addr)

     The destination of the expression involved in DESC is the DR;
     therefore its operands should be coerced to 64 bits without
     sign extension. MOD will, unless some other 'less strong'
     addressing mode suffices, be destined for the B register;
     therefore its operands should be treated as signed 32 bit
     twos complement quantities and coerced to this length, if
     necessary. Similarly for ADDR, except that the possible final
     destination will be XNB. The whole expression should then be
     coerced to the length of DEST. This is difficult with the
     present structure, since a binary operation sees only two
     operands; it knows nothing about the environment of the
     expression. It could be done by inserting more information
     into the reverse polish to indicate what kind of expression
     is coming up, so that the appropriate accumulator and
     coercion length choices can be made. There are still problems
     with expressions which have no destination - conditions, for
     example - and for which no sensible guess can be made about
     coercion length. Perhaps the '$ACC' directive may be the
     choice here.


                           Assignments.

        Assignments on the 2900 are not as straightforward as on
     multi-register machines.
     The difficulty comes because different registers and memory
     locations are accessed by different instructions, whereas on
     multi-register machines they are accessed more uniformly.
     There are special cases: ACC = dr has to be detected in order
     to use CYD (copy the descriptor register into the
     accumulator); and B and TOS are treated as registers but,
     unlike the other registers, can be accessed directly by a
     primary addressing mode.


                        Pseudo registers.

        On multi-register machines like the IBM 360/370 and the
     Interdata 70/74 it is possible to access a number of
     separately addressable areas by assigning a register to point
     to the base of each area. Elements of each area can then be
     accessed by specifying an offset from the appropriate base
     register. The 2900 does not have multiple index registers;
     there is only one such register, the XNB. A facility is
     therefore provided in the assembler to allow a number of
     memory locations to act like index registers, the XNB being
     loaded with the appropriate value when required. For this
     purpose, sixteen pseudo registers are provided, R0 - R15.

        A pseudo register can be equivalenced in the '$DEF'
     statement to any expression capable of assembly time
     evaluation. The pseudo register can then be used in place of
     a real index register in both assembly time and run time
     expressions. This effectively allows the user to equivalence
     a variable name with an expression which has an extra level
     of indirection (not via a descriptor this time), for example
     an expression like W(W(LNB+4)+16), in the following manner:

        $def r0 = w(lnb+4)
        $def k = w(r0+16)

     When the variable name K is referenced, the XNB is
     automatically loaded with W(LNB+4) and the (TYPE, VAL) pair
     altered to represent W(XNB+16).

        There is a rather obvious generalisation of this, but it
     would probably create more problems than it would solve. For
     instance, when does one actually 'evaluate' the object and
     load the appropriate registers?
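        The rewriting which is implemented - loading XNB and
     substituting it into the operand duplet - can be sketched as
     follows (a model in a modern notation; the tuple encodings,
     the binding table and the emitted pseudo-instruction are
     assumptions for illustration, not the assembler's real
     structures):

```python
# When an operand's index register field names a pseudo register,
# emit a load of XNB with the pseudo register's binding and
# rewrite the operand to be indexed off XNB instead.

XNB = 2                                   # real index register number
pseudo_bindings = {0: ("W", 1, 4)}        # $def r0 = w(lnb+4)

def resolve(operand, code):
    """operand is (mode, index_reg, offset); rewrite pseudo regs."""
    mode, reg, offset = operand
    if reg in pseudo_bindings:
        code.append(("LOAD_XNB", pseudo_bindings[reg]))   # XNB = w(lnb+4)
        return (mode, XNB, offset)        # now W(XNB+offset)
    return operand                        # a real register: unchanged

code = []
k = resolve(("W", 0, 16), code)           # k = w(r0+16) becomes w(xnb+16)
```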
     The method which is implemented is rather clumsy, but it
     covers the majority of cases in which indirection is
     required.


                          Segmentation.

        The 2900 is a segmented machine. That is, logical program
     and data sections can be separated by placing them in
     different segments. It is worthwhile discussing here how much
     the loader - or rather the program which generates a file for
     the loader - can affect the way the assembler operates.

        The system routine LPUT is the program which generates
     ERCC 2900 object files on EMAS. A program passes fragments of
     information to LPUT and it assembles them into a format
     suitable for other programs to use as input in order to
     produce proper loader files. It is, however, primarily
     designed for use by compilers, and the restrictions which
     this brings are far reaching. For instance, in the process of
     compiling an IMP program, information for a limited number of
     segments is generated: the code segment, the GLA segment for
     static data, the procedure linkage table for external
     procedure linking, and another segment for storing diagnostic
     information. LPUT is tailored to this sort of use; it is not
     intended to produce general object files containing
     information for an arbitrary number of segments. If LPUT is
     used, therefore, there is a restricted number of segments
     available, of which only two are really useful: the code and
     GLA segments.

        It would obviously be better to use a more generalised
     system. This could be done by generating loader information
     directly, but there seems to be such a plethora of ICL loader
     formats that this might lead to unacceptable inflexibility.
     The chosen loader format might become available on only one
     system, for instance, or the specification of the format,
     like the primitive level interface, might change. It was
     felt, therefore, that the only way of producing a working
     system, no matter how limited, was to use the facilities
     already available.
        So, at the moment, code can only be dumped in two
     segments, the code and GLA segments, although this does not
     preclude the linking in of other segments at run time. The
     mechanism for depositing values in either segment is as
     follows: when the assembler is entered, it is set up as
     currently dumping in the code segment. To change to the GLA
     segment, the current location counter is captured (by the
     $DEF var = * facility) and then changed (by the '$LOC'
     directive) to a value containing a pseudo register component.
     The assembler assumes that when the location counter has a
     pseudo register component, it is required to dump code in the
     GLA segment. This could be generalised by specifying two
     expressions in the '$LOC' directive: the segment required and
     the location counter within that segment. Some segments would
     have predefined values - 0 for the code segment and 1 for the
     GLA segment, say.


                          Optimisation.

        The implementors of some languages (of PL11 [2], for
     example) feel that the code produced by the assembler should
     be as 'clear' as possible; that is, it should not try to hide
     anything from the programmer, and a simple inspection of the
     source program should be enough to 'guess' what code will be
     produced. They therefore feel that optimisation should only
     be done in rare, but well defined, cases, since it leads to
     obscure code being produced. Certainly, the assembler should
     not use any memory locations unless they are specifically set
     aside by the programmer, but optimisation in terms of
     remembering the contents of registers through basic blocks
     (i.e. sections of code with one entry point and one exit)
     plays a large part in satisfying the third constraint of the
     language mentioned in chapter three: that the code produced
     by the assembler should be as efficient as that produced by
     hand. The main optimisation done, in fact, is to remember the
     contents of registers.
     This can be done by having a (TYPE, VAL) pair associated with
     each register whose contents are to be recorded. When the
     register is loaded, the duplet describing the item being
     loaded is deposited in the relevant (TYPE, VAL) pair of the
     register. The problem here is that there are some (TYPE, VAL)
     forms which cannot be accessed directly - for example, the
     value of the contents of an index register plus an offset (an
     immediate form in Interdata terms), or an access to a byte
     which has to be loaded via a descriptor. But by the time the
     code has been generated to produce a form which can be
     directly referenced, the original (TYPE, VAL) pair has been
     lost. This is a difficult problem to get round and it has not
     really been solved, but the assembler still catches a lot of
     important cases.

        The condition code, and what set it, is also remembered,
     but this is relatively minor since, unlike on the Interdata,
     the loading of a variable into a register does not affect the
     condition code. So the only time this optimisation will
     produce a result is when the same comparison is done twice
     within a short enough space that the register involved does
     not get corrupted.

        Register contents are forgotten whenever a value is
     written to store, unless it is the contents of the register
     itself which are being written or the register contains a
     constant. A better way would be to forget all memory
     references held in registers if the store is done via a
     descriptor but, if the store is direct, to forget only
     references to that same location in other registers. This is
     not done at the moment because it would be too clumsy with
     the present set up - another area for improvement.


                         Other features.

        All other HALs, so far, have generated 'stand alone'
     programs; that is, there were no facilities for generating
     external references or entry points.
     For the 2900 implementation this is impracticable, since
     HAL2900 programs will almost always want to interface with
     other entities in the system, for example to do I/O. Two
     directives have been introduced for this purpose, '$EXT' and
     '$ENT'. Both are followed by a name: '$EXT' instructs the
     loader to deposit in the succeeding two locations a
     descriptor via which the object referred to by the given name
     can be accessed, and '$ENT' informs the loader that the
     succeeding two locations will contain a descriptor defining
     an entry point which will be externally referred to by the
     given name.


                  Current state of the program.

        The assembler is now available on EMAS for general use.
     Programs have been run on the New Range Simulator on EMAS
     but not, as yet, on the real machine, although this should
     not cause any problems.
                            CHAPTER 5.


                           Conclusions.


        Chapter 3 discusses in detail what the advantages of
     programming in high level assemblers might be, but after
     programming a little in HAL2900 it becomes clear that very
     little can be gained over high level languages on this
     machine. This is connected with the fact that the
     architecture of the machine enforces quite strict techniques
     on the programmer - of routine calling, for example. Very
     little is saved in terms of code produced, and is it really
     worth sacrificing excellent diagnostics for what is basically
     just a reduction in the awkwardness of handling certain
     items? I think not.

        But the prime object of the project was not to produce a
     production system, although I wanted to get something
     working, but rather to learn how to write such a system,
     perhaps producing a platform for a usable system on the way.
     I certainly feel I have provided the latter. I have learned a
     lot about the process of compilation, the problems of
     evaluating expressions at this level, the problems of
     optimisation, and how to manage a large program. The project
     would have been worthwhile just to gain this experience.
                            CHAPTER 6.


                References and acknowledgements.


     [1] Wirth, N. PL360 - A Programming Language for the IBM 360.
         JACM, Vol 15 (1968), p37.

     [2] Russell, R. PL11 - A Programming Language for the PDP11.
         CERN report #74-24.

     [3] Bell and Wichmann. PL516 - A Programming Language for the
         Honeywell DDP516. Software, Vol 1 (1971), p61.

     [4] Stevens, P. D. IMP Programming Language and Compiler.
         Computer Journal, Vol 17 (1965), #3.

        Many thanks go to my supervisor, Nick Shelness, without
     whose help, advice, good ideas, and general bringing down to
     earth, I would not have finished the project. Thanks also to
     H. Dewar, who suggested the project in the first place (in a
     slightly altered form) and without whose amazing program I
     could not possibly have got as far. Thanks to Jeff Tansley
     for manuals and advice, and to P. Stevens, G. Millard, R.
     Wickham and others in ERCC and the RCO who gave me advice.