Extended object file format for EMAS 370


General

The format described in these notes extends the EMAS 2900 object file format to allow greater flexibility and compatibility with other ERCC Compiler Group compilers produced for other operating systems. This should increase the commonality of code in all ERCC Compiler Group products, improving maintenance and portability.

At the level of the Put interface (see Description of EMAS 370 Put Code Generating Interface), which is the common code generating interface for all the ERCC Compiler Group's compilers on EMAS 370, the extended set of areas already appears to exist, being mapped onto the old restricted set. The purpose of these notes is to allow other object file manipulating software, particularly the loader, linker and Modify, to use the extended set, and to allow these areas to exist physically in an object file with an extended definition.

The new set of areas is almost a superset of the EMAS 2900 definition, except that the old common area (area 6) is removed and area 6 is used as the new Diagnostic Table area, with read only, shareable properties. As ERCC's Fortran compilers no longer make use of the old single common, using instead multiple commons numbered from 11 upwards, this presents no conflict with existing compiler needs.

A further proposed change, which is not a simple extension, is the use of a more compact representation of common areas. Although all other areas would still be laid out fully, common areas would now be held as a list of initialisation records. This should not impose a significant loading overhead, but would greatly reduce the size of unloaded object files for some large Fortran programs. The types of area defined by such records are extended beyond common areas, allowing, at first, additional static areas to be defined. The possibility of additional areas with other properties is left open for future development.



Area summary

Area ! EMAS 2900               ! EMAS 370
-----------------------------------------------------------------
1    ! Code                    ! Code
-----------------------------------------------------------------
2    ! GLA-Read/write/relocate ! GLA-Read/write/relocate
-----------------------------------------------------------------
3    ! PLT - Never used        ! Unused
-----------------------------------------------------------------
4    ! SST - Read only         ! SST - Read only
-----------------------------------------------------------------
5    ! UST - Read/write/relocate ! UST - Read/write/relocate
-----------------------------------------------------------------
6    ! Common                  ! Diagnostics - Read only
-----------------------------------------------------------------
7    ! Init Stack              ! Statics - Read/write/relocate
-----------------------------------------------------------------
8    ! Not used                ! IO Tables - Read/write/relocate
-----------------------------------------------------------------
9    ! Not used                ! Zero UST - Read/write
-----------------------------------------------------------------
10   ! Multiple common 1       ! Constants - Read only
-----------------------------------------------------------------
11+  ! Multiple commons        ! Compiler defined areas



Table 1


Area properties

Code
Unchanged from EMAS 2900. Now used almost exclusively for machine instructions, as PC relative addressing is not supported by 370 architectures. Read only, shareable.

GLA
Unchanged from EMAS 2900. Should only be needed for linkage, as extra areas are provided for statics. IO Tables also allow separation of some linkages. Read, write, unshared, relocatable, initialisable.

SST
Unchanged from EMAS 2900. Should have a more specialised role, as extra areas are provided for constants and diagnostic records. Read only, shareable.

UST
Unchanged from EMAS 2900. Should have a more specialised role, as extra areas are provided for statics and zeroed statics. Read, write, shareable, relocatable.

Diagnostics
New read only area. Removes diagnostic records from SST, reducing run time paging. Allows interleaved generation of diagnostics and constants without such overheads.

Statics
New read/write area. Currently relocatable. Unshared. Initialisable.

IO Tables
New second GLA, for specialised purposes. Read, write, relocatable, unshared.

Zero UST
Zeroed static area. Read, write, unshared, unrelocatable.

Constants
New read only area. Shareable.

Compiler Defined Areas

Areas 11 upwards are definable by individual compilers. This extends the EMAS 2900 use of them for multiple commons. Their properties are defined in records whose format is given below. These replace the records on list LDATA(11) (single word references) of EMAS 2900 object files. These extended areas may be laid out or merely described in their records. The area definition records may contain pointers to a laid out area, assume complete initialisation to zero or the unassigned pattern, or be multiply initialised by records in the list headed by LDATA(13).

The format of an area definition record is

(%integer Link, Area, Len, Props, Disp, %string(31) Name)

where
Link is the link to the next in the list.
Area is the number ( >=11 ) given to this area.
Len is the length in bytes of this area.
Props defines the properties, including initialisation, of this area.

    Bit value          Property defined
    1<<0               Blank common - no initialisation
    1<<1               Named common
    1<<2               Local (static) area
    1<<3  - 1<<7       Reserved for future use
    1<<8               Zero filled
    1<<9               Unassigned pattern filled
    1<<10              Multiple initialisation
    1<<11              Laid out
    1<<12 - 1<<31      Reserved for future use

Disp is the offset from the start of the file of any laid out area.
Name is the name of this area, where appropriate.
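As an illustration of how a loader or utility might interpret the Props word of an area definition record, the following is a minimal sketch in Python. The bit assignments are taken from the table above; the dictionary name, the function name and the handling of reserved bits are assumptions made for this example and are not part of the format.

```python
# Bit assignments as listed for Props in an area definition record.
# PROPS_BITS and describe_props are hypothetical names for this sketch.
PROPS_BITS = {
    1 << 0: "blank common",
    1 << 1: "named common",
    1 << 2: "local (static) area",
    1 << 8: "zero filled",
    1 << 9: "unassigned pattern filled",
    1 << 10: "multiple initialisation",
    1 << 11: "laid out",
}


def describe_props(props):
    """Return (names of defined bits that are set, mask of reserved bits set)."""
    props &= 0xFFFFFFFF  # treat Props as a 32-bit word
    names = [name for bit, name in PROPS_BITS.items() if props & bit]
    known = 0
    for bit in PROPS_BITS:
        known |= bit
    reserved = props & ~known
    return names, reserved


# A named common that is to be zero filled on loading.
names, reserved = describe_props((1 << 1) | (1 << 8))
print(names, hex(reserved))
```

Returning the reserved bits separately lets a caller decide whether to reject or ignore a record that uses bits the notes leave for future use.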



Physical layout of object files

The current layout of loaded object files, with new areas mapped onto old, is shown in Figure 1. This shows the order as all read only areas, followed by all read/write areas, followed by all common areas. A new layout is shown in Figure 2. This is the order in which the areas would be loaded. It makes Diagnostic Tables a completely separate area, not fused to the other read only areas, bringing the more frequently accessed read only areas closer together and so reducing paging. Diagnostics are normally accessed only following a program error or a %monitor.

The unloaded object file

It is anticipated that all object files on EMAS 370 will be generated through the Put interface. LPUT will be maintained unchanged for utilities which have yet to make the change to Put, but the extended format will only be available through Put. This will require a means of distinguishing the old and new formats, and the object file map is extended accordingly.

The general form of Put generated object files, prior to loading, is shown in Figure 3. The file header contains the locations of the object file map and the LDATA table. These in turn contain the locations of the other parts of the file. The LDATA lists are linked lists containing linkage and initialisation details, plus the history records for the file. The headers of these lists are held in the LDATA table.

The file header

The file header is unchanged from EMAS 2900. It consists of eight words, used as follows:

    Word   Use
    0      Offset of end of data from start of file.
    1      Offset of start of data from start of file, usually 32.
    2      Physical file size.
    3      File type = 1 for object file.
    4      Sum check, not used by Put.
    5      Packed date and time last changed.
    6      Offset of LDATA from start of file.
    7      Offset of object file map from start of file.
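The eight word header can be read with a few lines of Python, sketched below. It assumes 32-bit words held most significant byte first, as on 370 hardware; the field names and the function name are invented for this example, and the demonstration header is made up rather than taken from a real file.

```python
import struct
from collections import namedtuple

# Field names are hypothetical labels for words 0 to 7 of the header.
FileHeader = namedtuple(
    "FileHeader",
    "data_end data_start file_size file_type sum_check "
    "packed_datetime ldata_offset map_offset",
)


def parse_header(buf):
    """Unpack the eight header words (assumed 32-bit, big-endian)."""
    return FileHeader(*struct.unpack_from(">8I", buf, 0))


def is_object_file(header):
    """Word 3 holds the file type, which is 1 for an object file."""
    return header.file_type == 1


# A made-up header: data occupies bytes 32..255 of a 4096 byte file.
demo = struct.pack(">8I", 256, 32, 4096, 1, 0, 0, 64, 32)
hdr = parse_header(demo)
print(hdr.ldata_offset, hdr.map_offset, is_object_file(hdr))
```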

The file map

The object file map is extended by four entries from EMAS 2900, to allow old and new formats to be distinguished. The new format is:

(%integer N, %record(AreaFmt)%array Area(1:N))

where N is the number of areas defined (7 for old format, 11 for new) and Area is a table of records, one for each area. The new format uses entry 11 to define the diagnostic table area, i.e. area 6, to avoid confusion with old uses of the area 6 common.

Each record in Area has the following format:

(%integer Start, Len, Props)

where
Start is the offset of the start of the area from the start of the file,
Len is the length of the area,
Props defines the properties of the area. Currently the most significant bit in Props defines a non-shareable area.
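A minimal Python sketch of reading the object file map and telling the old format from the new by the value of N follows. It assumes 32-bit big-endian words and that the Area records follow N contiguously; the function names and the dictionary layout are hypothetical, and the demonstration map is invented.

```python
import struct

NON_SHAREABLE = 0x80000000  # most significant bit of Props


def parse_file_map(buf, map_offset):
    """Return (format name, {entry number: area record}) for a file map."""
    (n,) = struct.unpack_from(">i", buf, map_offset)
    entries = {}
    for i in range(1, n + 1):
        pos = map_offset + 4 + (i - 1) * 12  # three words per record
        start, length, props = struct.unpack_from(">3I", buf, pos)
        entries[i] = {
            "start": start,
            "len": length,
            "props": props,
            "unshared": bool(props & NON_SHAREABLE),
        }
    fmt = {7: "old", 11: "new"}.get(n, "unknown")
    return fmt, entries


def diagnostics_entry(fmt, entries):
    """In the new format, map entry 11 describes area 6 (diagnostics)."""
    return entries[11] if fmt == "new" else None


def build_demo_map():
    # Invented data: 11 entries, with entry 2 (GLA) marked non-shareable.
    words = [11]
    for i in range(1, 12):
        words += [i * 0x100, 0x40, NON_SHAREABLE if i == 2 else 0]
    return struct.pack(">%dI" % len(words), *words)


fmt, entries = parse_file_map(build_demo_map(), 0)
print(fmt, entries[2]["unshared"], hex(diagnostics_entry(fmt, entries)["start"]))
```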

LDATA

The LDATA table is a sequence of integers with the following meanings:

    LDATA  Meaning
    0      The number of entries following, initially 14.
    1      Offset of start of list of procedure entry records.
    2      Total number of entries and references.
    3      Total number of relocations.
    4      Offset from start of file of list of data entries.
    5      Load address of code for bound files.
    6      Load address of gla for bound files.
    7      Offset from start of file of list of static procedure references.
    8      Offset from start of file of list of dynamic procedure references.
    9      Offset from start of file of list of data references.
    10     Load address of initialised stack for bound files.
    11     Offset from start of file of list of compiler defined areas.
    12     Offset from start of file of history records.
    13     Offset from start of file of list of multiple initialisation records.
    14     Offset from start of file of list of relocation requests.

Note that the use of LDATA(11) is changed from EMAS 2900, since single word references were only needed for VME compatibility. Similarly LDATA(15) is no longer present, as OMF diagnostics have no meaning on EMAS 370.
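The table lends itself to a simple reader, sketched here in Python. It assumes 32-bit big-endian words, with LDATA(0) giving the count of the entries that follow it; the names LIST_HEADS, parse_ldata and list_heads are hypothetical, and the demonstration table holds invented offsets.

```python
import struct

# LDATA entries that hold offsets of lists or records within the file.
LIST_HEADS = {
    1: "procedure entries",
    4: "data entries",
    7: "static procedure references",
    8: "dynamic procedure references",
    9: "data references",
    11: "compiler defined areas",
    12: "history records",
    13: "multiple initialisation records",
    14: "relocation requests",
}


def parse_ldata(buf, ldata_offset):
    """Return {index: value} for LDATA(1) to LDATA(count)."""
    (count,) = struct.unpack_from(">i", buf, ldata_offset)
    values = struct.unpack_from(">%di" % count, buf, ldata_offset + 4)
    return {i + 1: v for i, v in enumerate(values)}


def list_heads(ldata):
    """Pick out the entries that are list heads, keyed by description."""
    return {name: ldata[i] for i, name in LIST_HEADS.items() if i in ldata}


# Invented table: 14 entries with values 104, 108, ... 156.
demo = struct.pack(">15i", 14, *[100 + 4 * i for i in range(1, 15)])
ldata = parse_ldata(demo, 0)
heads = list_heads(ldata)
print(len(ldata), heads["compiler defined areas"])
```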

LDATA list record types

In the record formats given, all displacements are in bytes and are relative to the start of the area specified in that record. The links are relative to the start of the object file. All strings have a maximum length of 31 characters and are held with their actual length, rounded up to a word boundary.

Procedure entries - listhead LDATA(1)

The format of each record is

(%integer Link, CodeOffset, GlaOffset, EPOffset, ParamW, %string(31) Iden)

where
CodeOffset is the offset of the head of code from the head of code for the module.
GlaOffset is the offset of the head of GLA from the head of GLA for the module.
EPOffset is the offset of the entry point from the head of code given in CodeOffset. Where the most significant bit is set this is a main entry.
ParamW is (number of parameters<<16)!(bytesize of parameters). If ParamW is -1, this indicates no parameter checking.
Iden is the name of the entry point.

Data entries - listhead LDATA(4)

The format of each record is

(%integer Link, Disp, Len, Area, %string(31) Iden)

where
Disp is the offset of the entry from the start of Area.
Len is the (minimum) length of the data item.
Area is the area containing the item.
Iden is the name of the item.

Procedure references - listhead LDATA(7) and LDATA(8)

The format of each record is

(%integer Link, RefLoc, %string(31) Iden)

where
RefLoc is (Area<<24)!RefAd.
RefAd is the displacement from the start of Area of a procedure reference which is to be filled in by the loader.
Iden is the name of the procedure being referenced.

Data references - listhead LDATA(9)

The format of each record is

(%integer Link, RefArray, Len, %string(31) Iden)

where
RefArray is the offset from the start of the file of a record of the form (%integer N, %integerarray RefLoc(1:N)). Each item in RefLoc has the form (Area<<24)!RefAd, where RefAd is the offset from the start of Area of a word which is to have the address of data item Iden added to it by the loader. Area is the area containing the references.
Len is the expected, minimum, length of the item.
Iden is the name of the item.

Compiler defined areas - listhead LDATA(11)

Areas 11 upwards are definable by individual compilers. Their properties are defined in records whose format is given below. These extended areas may be laid out or merely described in their records. The area definition records may contain pointers to a laid out area, assume complete initialisation to zero or the unassigned pattern, or be multiply initialised by records in the list headed by LDATA(13).

The format of each record is

(%integer Link, Area, Len, Props, Disp, %string(31) Name)

where
Link is the link to the next in the list.
Area is the number ( >=11 ) given to this area.
Len is the length in bytes of this area.
Props defines the properties, including initialisation, of this area.

    Bit value          Property defined
    1<<0               Blank common
    1<<1               Named common
    1<<2               Local (static) area
    1<<3  - 1<<7       Reserved for future use
    1<<8               Zero filled
    1<<9               Unassigned pattern filled
    1<<10              Multiple initialisation
    1<<11              Laid out
    1<<12 - 1<<31      Reserved for future use

Disp is the offset from the start of the file of any laid out area.
Name is the name of this area, where appropriate.

Object file history - listhead LDATA(12)

Currently history records are not linked, but form a contiguous sequence. Each sub-type has a fixed or determinable length and chaining is by incrementing an offset by this length. It might be sensible to consider chaining these records as for all other types. Currently they are defined as the following sub-types.

    0 - end of history records    (%byteinteger Type)
    1 - source file name          (%byteinteger Type, %string(*) S)
    2 - parms                     (%byteinteger Type, %longinteger Parms)
    3 - start of linked object    (%byteinteger Type)
    4 - object file name          (%byteinteger Type, %string(*) S)
    5 - date linked               (%byteinteger Type, %integer PackedDate)
    6 - date compiled             (%byteinteger Type, %integer PackedDate)
    7 - end of linked object      (%byteinteger Type)
    8 - general text              (%byteinteger Type, %string(*) Text)
    9 - compiler                  (%byteinteger Type, %string(*) Utility)

In addition it has proved desirable to be able to identify included source files and the depth and structure of inclusion. Thus a new record might be included.

    10 - included source file     (%byteinteger Type, Depth, %string(*) S)

Initialisation data - listhead LDATA(13)

The format of each record is

(%integer Link, Area, Disp, Len, Rep, Addr)

where
Area is the area to be initialised (1-10 or specified by an LDATA(11) record).
Disp is the offset within Area where the initialisation is to start.
Len is the length of initialisation data.
Rep is the number of copies to be added consecutively.
Addr is the offset from the start of the file of the initialisation data or (where Len is one) a value with which Rep bytes are to be filled.

Relocation request blocks - listhead LDATA(14)

The format of each record is

(%integer Link, N, %recordarray Relocs(1:N))

where N is the number of relocation requests in this block. Each record in Relocs has the following format

(%integer AreaLoc, BaseLoc)

where
AreaLoc is (Area<<24)!AreaDisp.
BaseLoc is (Base<<24)!BaseDisp.
Area identifies the area containing the item to be relocated.
AreaDisp is the offset of a word within Area to be relocated.
Base is an area whose base will be used in the relocation.
BaseDisp is an offset within Base which will be used in the relocation.

Note that BaseDisp is always zero currently, because of the limitations of LPUT. This will no longer be guaranteed by Put, which will use BaseDisp to avoid work for itself.
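Two of the encodings above recur across several record types: the packed words of the form (Area<<24)!Disp used by RefLoc, AreaLoc and BaseLoc (where ! is the IMP "or" operator), and the initialisation records on the LDATA(13) list. The following Python sketch shows one way they might be decoded and applied. All function names are hypothetical. It is assumed that, when Len is one, the fill value is the low order byte of Addr, and that an initialisation which overruns the area is an error; neither point is stated in these notes.

```python
def unpack_loc(word):
    """Split a packed (Area<<24)!Disp word into (area, disp)."""
    word &= 0xFFFFFFFF
    return word >> 24, word & 0xFFFFFF


def unpack_paramw(paramw):
    """Split ParamW into (parameter count, byte size); None if unchecked."""
    if paramw == -1:
        return None  # -1 indicates no parameter checking
    word = paramw & 0xFFFFFFFF
    return word >> 16, word & 0xFFFF


def apply_init(area, file_bytes, disp, length, rep, addr):
    """Apply one initialisation record to 'area', a bytearray."""
    if length == 1:
        # Assumption: Addr itself carries the fill byte in its low 8 bits.
        data = bytes([addr & 0xFF]) * rep
    else:
        # Addr is a file offset; copy Len bytes Rep times consecutively.
        data = bytes(file_bytes[addr:addr + length]) * rep
    if disp + len(data) > len(area):
        raise ValueError("initialisation runs past the end of the area")
    area[disp:disp + len(data)] = data


# Invented example: an 8 byte area and a file holding "AB" at offset 2.
area = bytearray(8)
file_bytes = b"\x00\x00AB"
apply_init(area, file_bytes, 0, 2, 2, 2)     # two copies of "AB"
apply_init(area, file_bytes, 4, 1, 4, 0xFF)  # four bytes of X'FF'
print(bytes(area), unpack_loc((2 << 24) | 0x10), unpack_paramw((3 << 16) | 12))
```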

Implementation

It is proposed that the definitions given in this set of notes form the basis for the object file formats in EMAS 370, as released for a user service. The changes are mostly trivial and should reduce work for most software affected. The major changes will involve extending the loader to cope with additional area definitions and more general user defined areas. For other software, notably Link and Modify, the changes are simpler. The abandonment of LPUT and the move to Put should be largely a matter of spec changes. Analyse will also need to be slightly extended.

Rob Pooley, March 25th 1985