\documentstyle[a4,12pt]{article}
\begin{document}
\author{Frank Cringle}
\title{APM C Compiler -- Version 2}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in

\section{Preamble}

APM C Compiler {\hspace{0.5 in}} (Version 2) {\hspace{2.1 in}} Frank Cringle

The Bell Labs portable C compiler, with M68000 code generator added at MIT, is available on the APM. This compiler conforms to ``The C Programming Language'' by Kernighan \& Ritchie, with the addition of structure assignment, the enumerated type and in-line assembly. A large proportion of the UNIX(tm) run-time library exists, and porting of programs from UNIX to the APM and vice versa is possible for programs which do not rely on multiple processes.

\section{Preparation}

Before running any programs compiled with the V2 C compiler, the run-time library must be installed, and preferably preloaded. Add the following commands to your login.com file or, if you have a personal machine, to its custom:xx.com file:

{\hspace*{0.2 in}} preload nc:libc.mob \\
{\hspace*{0.2 in}} install nc:libc.mob

If your programs use transcendental functions or graphics, install and preload the corresponding libraries too:

{\hspace*{0.2 in}} preload nc:libc.mob,nc:libm.mob,nc:libg.mob \\
{\hspace*{0.2 in}} install nc:libc.mob,nc:libm.mob,nc:libg.mob

\section{Command line parsing}

The C compiler automatically generates a call to the run-time function \_START before executing the main() function of a C program. This does a unix shell style parse of the command line, and builds parameters argc and argv for the main() function. The name of the command itself is not available under the current APM operating system, so argv[0] is always set to "main".
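
As a minimal sketch of how a program might allow for this, the function below skips argv[0] and counts the remaining arguments that begin with a hyphen. The name count\_options and the leading-hyphen convention are assumptions made for illustration only. The code is written in ANSI C for readability; a compiler which accepts only the Kernighan \& Ritchie dialect would need old-style parameter declarations.
\small\tt
\begin{verbatim}
#include <stddef.h>

/* Hypothetical helper: count the arguments that start with '-'.
 * argv[0] is skipped because it is always "main" on the APM and so
 * says nothing about how the program was invoked. */
int count_options(int argc, char **argv)
{
    int i, n = 0;

    for (i = 1; i < argc; i++) {
        if (argv[i] != NULL && argv[i][0] == '-')
            n++;
    }
    return n;
}
\end{verbatim}\rm
\normalsize

For a command line with the arguments -O prog.c -c, \_START would be expected to pass argc = 4, and count\_options would then return 2.
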
The following features are provided:

\subsection{I/O redirection}

{\hspace*{0.2 in}} $<$ file {\hspace{0.4 in}} open file as stdin \\
{\hspace*{0.2 in}} $>$ file {\hspace{0.4 in}} open file as stdout \\
{\hspace*{0.2 in}} $>$$>$ file {\hspace{0.3 in}} open file as stdout in append mode

Any of the above may be preceded by a digit (0-9), in which case the file descriptor corresponding to the digit is used instead of 0 (stdin) or 1 (stdout) - e.g. 2$>$ errors redirects the error stream stderr to file errors.

\subsection{Single character escape}

The sequence $\backslash$c is replaced with a single character as follows -
\small\tt
\begin{verbatim}
     c      character
     n      newline       10
     t      tab            9
     b      backspace      8
     r      return        13
     f      formfeed      12
     v      vertical tab  11
     other  c
\end{verbatim}\rm
\normalsize

\subsection{Wild card expansion}

Any blank-delimited string containing the characters ?, * or [ is treated as a filename template and is expanded to a sorted list of corresponding filenames -
\small\tt
\begin{verbatim}
     ?      matches one character
     *      matches any number (including 0) of characters
     [...]  matches any one of the characters enclosed. A range of
            characters may be specified with a hyphen, e.g. a-z.
\end{verbatim}\rm
\normalsize

Matching does not take place in the directory part of filenames.

\subsection{Symbol substitution}

A \$ followed by an optionally bracketed (with \{\}) string of letters and digits is replaced with its value if the string was previously defined using symbol=value.

{\hspace*{0.2 in}} examples:

{\hspace*{0.4 in}} \} symb=fred \\
{\hspace*{0.4 in}} \} echo hallo\$\{symb\}die \\
{\hspace*{0.4 in}} hallofreddie \\
{\hspace*{0.4 in}} \} flags=-O -c \\
{\hspace*{0.4 in}} \} cc68 \$flags *.c

\subsection{Quoting}

Strings separated with space or tab are considered to be separate tokens, unless quoted using matching pairs of ' or ", in which case the whole quoted string is treated as one token.
No wild card expansion is done inside quoted strings, and symbol substitution and single character escapes are also suppressed if the quote character is '.

\section{Compiling and linking C programs}

A Unix-style cc command is available for the new C compiler. Its options correspond fairly closely to the original. The command is nc:cc68 with the following options:
\small\tt
\begin{verbatim}
     -o   name of output file
     -c   Compile named files to .mob, but do not run the linker
     -p   Generate profiling code (count function calls)
     -pt  Generate profiling code (accumulate cputime per function)
     -O   Run peephole optimiser
     -l   Maintain source line number in register d5
     -L   Generate trap #15 instruction before each source line
          for tracing
     -S   Compile named files to .a68
     -P   Run the preprocessor and leave macro-expanded source in .i
     -E   As -P, but output to stdout
     -Dx  Mark preprocessor symbol x as defined.
     -Ux  Mark preprocessor symbol x as undefined.
     -Ix  Add directory x to the list of directories to search
          for #includes.
\end{verbatim}\rm
\normalsize

The options are followed by a list of source files. The files with .c extensions are compiled, those with .a68 extensions are just assembled. By default (unless -c is given), all the resulting .mob files are linked to produce a file named a.mob (or as specified after -o).

Examples:

{\hspace*{0.2 in}} Compile a simple one-module program with no external data: \\
{\hspace*{0.4 in}} nc:cc68 -c -O prog.c \\
{\hspace*{0.2 in}} This produces prog.mob which can be run without further linking.

{\hspace*{0.2 in}} Compile and link a one-module program which does refer to uninitialised \\
{\hspace*{0.2 in}} external data: \\
{\hspace*{0.4 in}} nc:cc68 -O -o prog.mob prog.c \\
{\hspace*{0.2 in}} If -o prog.mob is omitted, the output is put in a.mob.

{\hspace*{0.2 in}} Compile and link all .c programs in a directory: \\
{\hspace*{0.4 in}} cc68 -O -o prog.mob *.c

{\hspace*{0.2 in}} Compile a mixture of C and assembler sources: \\
{\hspace*{0.4 in}} cc68 -O prog1.c prog2.c prog3.a68

\subsection{Details of the compile and link phases}

\subsubsection{cpp - preprocessor}

The preprocessor expands '\#' directives in the source file, and produces an output acceptable to the compiler. It is run automatically by cc68, or can be invoked directly:
\small\tt
\begin{verbatim}
     nc:cpp file.c file.i

     Flags: -C           do not delete comments
            -Dname=val   define name, as if by #define
            -Dname       define name=1
            -Idirectory  search directory for #include files
            -P           do not insert line directives (#line 12, foo.c)
                         in the output
            -R           allow macro recursion
            -Uname       remove any built-in definition of name
\end{verbatim}\rm
\normalsize

The symbols apm and mc68000 are predefined in this version of the preprocessor, so machine dependent code can be expressed as:

{\hspace*{0.5 in}} \#ifdef apm \\
{\hspace*{0.5 in}} /* do one thing */ \\
{\hspace*{0.5 in}} \#else \\
{\hspace*{0.5 in}} /* do another */ \\
{\hspace*{0.5 in}} \#endif

Use \#ifdef apm for apm dependencies (e.g. filenames) and \#ifdef mc68000 for architecture dependencies (e.g. byte sex).

\subsubsection{c68 - C compiler}

The compiler reads a 'pure' C source file (after preprocessing) and produces an assembler file suitable for the MIT assembler a68.
\small\tt
\begin{verbatim}
     nc:c68 file.i file.a68

     Flags: -l   generate code to maintain the line number in register d5
                 (displayed in run-time error messages).
            -XL  generate line number traps, so the program can be traced
                 using the software front panel.
            -XP  generate profiling code (count function calls)
            -XT  generate code to count cputime per function.
\end{verbatim}\rm
\normalsize

Assembler instructions can be included in the source file using the asm(..) directive.
Example:

{\hspace*{0.5 in}} asm("trap \#15"); asm(".word 999"); {\hspace{0.3 in}} /* line trap 999 for sfp */

The instructions must correspond to MIT's idea of the M68000 op-codes - see C:A68.DOC.

\subsubsection{o68 - optimiser}

The optimiser endeavours to reduce the size of an assembler file. This applies not only to the code section, but also involves removing redundant symbol information, which would otherwise slow down the assembler. There is considerable latitude for improvement in the raw compiled code, so use of the optimiser is highly recommended.

{\hspace*{0.5 in}} nc:o68 infile outfile

\subsubsection{a68 - assembler}

The assembler processes the output of the compiler, or user written assembler programs, and produces an APM-style object module file.

{\hspace*{0.5 in}} nc:a68 file {\hspace{0.9 in}} -- assemble file.a68 to file.mob

Details of the instruction formats accepted by the assembler can be found in C:A68.DOC. Changes and additions made for version 2 include:

{\hspace*{0.2 in}} New control instructions:
\small\tt
\begin{verbatim}
     .insrt "filename"  (include text of named file. If it is not in
                         the current directory, C: is tried)
     .if value          (if operand is non-zero, assemble following)
     .elif value        ( else if this operand is non-zero ... )
     .else              ( else .... )
     .endif             (end of conditional )
\end{verbatim}\rm
\normalsize

{\hspace*{0.2 in}} New operand formats:
\small\tt
\begin{verbatim}
     ?symbol            ( = 0 if symbol is defined in .text section,
                          else 1 )
     [symbol,register]  ( equivalent to .pc@(symbol-.-2) if symbol is
                          defined in the .text section, otherwise
                          register@(symbol-_dbeg), where _dbeg is the
                          beginning of the .data section)
\end{verbatim}\rm
\normalsize

{\hspace*{0.2 in}} New data declarations:
\small\tt
\begin{verbatim}
     .vect "name",.extdata,size  These cause space to be reserved for an
     .vect "name",.sysproc       import vector of appropriate type, and an
     .vect "name",.extproc       appropriate entry in the import list of the
     .vect "name",.dynproc       .mob file.
                                 If the type is .extdata, the optional value
                                 'size' gives the minimum required size of
                                 the object referenced.
\end{verbatim}\rm
\normalsize

\subsubsection{Clink - the C linker}

\subsubsection{When do I need to clink ?}

Clink is similar to the standard apm link program. It combines a number of .mob files, resolving cross-references among them. There are two reasons why the linker is language dependent. Firstly, the apm object module format was not specified with pre-linking in mind. The header contains insufficient information to locate the initialised data within the file, in order to fix import vectors. Secondly, the apm loader does not support the concept of common data blocks, which are referenced but not defined in any module of a program, and should be allocated space by the loader.

The standard linker analyses the reset code to find the initialised data, but this only works for products of H-series compilers. Clink expects to find a 'secondary header' at the beginning of the code section, which specifies the location and size of the initialised data. This is provided by a68, so any .mob file produced using cc68 will have one.

If the initial value of a data import vector is greater than zero, this is taken to signify a reference to a common block, of size at least equal to the value. All references to the same symbol among the files being linked are resolved to the same address within the final data area, either to the defining instance if there is one (more than one definition is an error), or to an address beyond the final location of the initialised data. The data size requirement of the output module is adjusted accordingly, and code is added to the reset routine to clear this common area before the program is run.

Any program which references common data, even if it consists of only one module, will have to be processed by clink, or messages of the form Cannot find $<$symbol$>$ will be encountered on loading.
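
The allocation rule for symbols which have no defining instance can be modelled roughly as follows. This is only a sketch of the rule as described here, not clink's actual code: the structure and function names are invented for illustration, defining instances are not handled, and no alignment padding is modelled. It is written in ANSI C for readability.
\small\tt
\begin{verbatim}
#include <string.h>

#define MAX_COMMON 64          /* arbitrary limit for this sketch */

struct common_ref {            /* one reference found in a .mob file */
    const char *name;          /* symbol name                        */
    long size;                 /* initial value of the import vector */
};

struct common_block {          /* one allocated common block         */
    const char *name;
    long size;                 /* largest size requested by any ref  */
    long addr;                 /* offset within the final data area  */
};

/* Merge references to the same symbol, keep the largest size asked
 * for, and place the blocks one after another starting at init_end
 * (the end of the initialised data).  Returns the new data size, or
 * -1 if there are more than MAX_COMMON distinct symbols.  *nblocks
 * receives the number of entries written to blocks[]. */
long allocate_common(const struct common_ref *refs, int nrefs,
                     long init_end,
                     struct common_block *blocks, int *nblocks)
{
    int i, j, n = 0;
    long next = init_end;

    for (i = 0; i < nrefs; i++) {
        if (refs[i].size <= 0)
            continue;                 /* not a common reference */
        for (j = 0; j < n; j++)
            if (strcmp(blocks[j].name, refs[i].name) == 0)
                break;
        if (j == n) {                 /* first sight of this symbol */
            if (n == MAX_COMMON)
                return -1;
            blocks[n].name = refs[i].name;
            blocks[n].size = 0;
            n++;
        }
        if (refs[i].size > blocks[j].size)
            blocks[j].size = refs[i].size;
    }
    for (j = 0; j < n; j++) {         /* assign addresses in order */
        blocks[j].addr = next;
        next += blocks[j].size;
    }
    *nblocks = n;
    return next;
}
\end{verbatim}\rm
\normalsize

In this model, two references to a symbol a with sizes 4 and 8 and one reference to a symbol array with size 100, with initialised data ending at offset 1000, place a at 1000 (size 8) and array at 1008, giving a data size of 1108.
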
Common references result from external declarations of the form

{\hspace*{0.2 in}} int a; {\hspace{1.5 in}} /* better to declare static or initialise */

or \\
{\hspace*{0.2 in}} struct \{ ... complicated ... \} array[9999]; /* leave as is and clink */

These values may be declared and initialised in another module, in which case the reference is resolved in the normal way, either by clink or dynamically on loading if clink is not used. But if there is no initialising declaration, the reference must be resolved statically using clink.

The code required to access the variable is more efficient if it is declared with storage class static or, if it is really referenced in other modules, given an initial value. However, this means the value is allocated space in the initialised data image, so large structures or arrays are best left to be allocated by clink.

\subsubsection{Parameters}
\small\tt
\begin{verbatim}
     -c           suppresses allocation of common data blocks
     -e filename  file contains a list of symbols to be excluded from
                  the export list
     -f filename  file contains a list of .mob file names to be linked.
                  The names would normally be placed on the command line.
     -i filename  file contains a list of symbols to be included in the
                  export list
     -o filename  output is written to file. Default is a.mob.
     -v           a list of the modules linked with the position and size
                  of their code and data in the output module is
                  produced (verbose)
\end{verbatim}\rm
\normalsize

The -e or -i filename can be '-', in which case stdin is used. A line with a single '/' terminates a list of names. The filename can also be *, in which case all the relevant names are selected.

Input files can be either .mob files (the usual case) or archives (produced by ar) containing a number of .mob files.
\\
{\hspace*{0.2 in}} Examples:

{\hspace*{0.4 in}} \} clink -e$\backslash$* -o prog.mob prog.mob \\
{\hspace*{0.4 in}} \} clink -i - -v -o lib.mob mod1.mob mod2.mob mod3.mob \\
{\hspace*{0.4 in}} mod1entry mod1func mod2data \\
{\hspace*{0.4 in}} /

\subsubsection{Filename extensions}
\small\tt
\begin{verbatim}
     .h    a C include (header) file
     .c    a C source file
     .i    a preprocessed C source file
     .a68  an assembler source file
     .mob  an apm object module
\end{verbatim}\rm
\normalsize

\section{Execution profiling}

Execution profiles can be generated using the compiler flags -p and -pt. With -p each call of a function is counted, and the totals are displayed when the program terminates. With -pt the cputime (in milliseconds) spent in each function is accumulated and displayed on program termination. The results are written to the terminal, or into a file if the symbol pro\_file gives a filename.

Cputime totals are only correct if a function is exited normally (via a return statement or through the end of the function). If exit is via a signal or longjmp, the time is incremented by 1 ms.

Example:
\small\tt
\begin{verbatim}
       854 Proc_2()
       587 Func_3()
      2498 Proc_6()
      2613 Proc_3()
     11167 Proc_1()
      5151 Proc_8()
      1854 Proc_7()
      1790 Func_1()
      4510 Func_2()
       741 Proc_4()
       583 Proc_5()
     42538 main()
\end{verbatim}\rm
\normalsize

The results can be processed using cutils:sort -nr pro\_file

\section{Stack trace-back on error}

Stack trace-back shows which functions were active when the program hit an error. Example:

{\hspace*{0.5 in}} abort(8F624B) line 300 \\
{\hspace*{0.5 in}} Proc7() + 36 \\
{\hspace*{0.5 in}} Proc0() line 178 \\
{\hspace*{0.5 in}} main(1, 8F8B68) line 117

The line number is displayed if line numbers were requested at compile time (by the -l flag). If a line number is not available the offset from the start of the function is shown instead. Function parameters are shown in hex.

The symbol cdebug controls trace-back.
Its value is the sum of the following options:

{\hspace*{0.6 in}} 1 trace-back if stopped by CTRL-Y \\
{\hspace*{0.6 in}} 2 trace-back if program returns non-zero status \\
{\hspace*{0.6 in}} 4 trace-back if program returns zero status \\
{\hspace*{0.6 in}} 8 show function parameters

The default value is cdebug=8. The trace-back is displayed on the terminal, and then written into the file 'stacktrace' in the current directory.

\section{Libraries}

Run-time support, unix system calls and standard subroutines are available in nc:libc.mob, which should always be installed to run C programs. Transcendental maths functions are in nc:libm.mob, and links to Fred's graphics routines are in nc:libg.mob.

\subsection{System calls}

Simulations of the UNIX system calls available on the APM:
\small\tt
\begin{verbatim}access(2)  existence and executability are equated to mode 4 (read)
acct(2)    not available
alarm(2)   fully implemented
brk(2)     not available - use sbrk()
chdir(2)   change filestore default directory
chmod(2)   set owner and world protection on filestore file
chown(2)   not available
close(2)   fully implemented
creat(2)   mode interpreted as per chmod(2)
dup(2)     fully implemented
exec(2)    fully implemented
exit(2)    fully implemented
fork(2)    dummy routine
getpid(2)  returns filestore user number
getuid(2)  dummy routines
indir(2)   not implemented
ioctl(2)   dummy routine, returns 0 for a tty, -1 otherwise
kill(2)    dummy routine
link(2)    dummy routine - rename(n1, n2) provides alternative
lock(2)    not implemented
lseek(2)   seek past end of file not supported
mknod(2)   not implemented
mount(2)   not implemented
mpx(2)     not implemented
nice(2)    dummy routine
open(2)    ":" is interpreted as stream 0 (usually console),
           other names are passed to the filestore
pause(2)   fully implemented
phys(2)    not implemented
pipe(2)    dummy routine
pkon(2)    not implemented
profil(2)  dummy routine, alternative provided (see profiling)
ptrace(2)  not implemented
read(2)    fully implemented
setuid(2)  dummy routine
signal(2)
           partially implemented
stat(2)    st_mode and st_size implemented. All times set to FS timestamp
stime(2)   not implemented
sync(2)    not implemented
time(2)    dummy routine
times(2)   tms_utime set to 50 * seconds since APM boot, others 0
umask(2)   dummy routine
unlink(2)  interpreted as delete file
utime(2)   not implemented
wait(2)    dummy routine
write(2)   fully implemented
\end{verbatim}\rm
\normalsize

\subsection{C Subroutines}

The C subroutines, as documented in section 3 of the UNIX manuals, have been compiled from UNIX source code, so they should be complete. Any which rely on unavailable system calls or UNIX-specialities, e.g. the password file, will of course not work.

\subsection{Maths routines}

The standard UNIX maths library for calculating transcendental functions is available in nc:libm.mob

\section{Utilities}

\subsection{xr - cross-referencer}

This utility takes a list of object files and libraries, and analyses the interdependencies of symbol reference and definition among them.
\small\tt
\begin{verbatim}
     nc:xr {flags} files

     Flags: -f       three column output consisting of filename,
                     symbols defined, symbols referenced.
            default  five column output of symbol name, symbol type,
                     value, defining file, referencing files.
\end{verbatim}\rm
\normalsize

\vspace{.75in}
c:v2.hlp printed on 05/04/89 at 21.04

\newpage
\tableofcontents
\end{document}