On the Design of Editors for Small Computers
C.H. Whitfield

M.Sc

University of Edinburgh

May 1972

Abstract

This thesis deals with various aspects of the design of interactive context editor programs, special consideration being given to the design of the command language so as to increase the rate at which beginning users become familiar with it.

A description of an editor program is given, with special emphasis on the organisation and implementation of the command decoding mechanism as a simple macro-assembler, this facilitating the provision of a command definition feature and also permitting the complete separation of the decoding phase from the execution of the command.

This separation can be exploited when moving the program from one operating environment to another, as the editing routines of necessity make assumptions about this environment whereas the decoder is much less affected.

Contents

1. Introduction
2. On-Line Text Editors: the State of the Art
3. On Humans
4. Command Language Design
5. Program Design
6. Implementation
7. Conclusions
8. Appendices:
(1) User Manual
(2) Program Listing

Introduction

The last five years have seen a sharp rise in the number of computer users in Universities and similar establishments; these new users come from a wide range of specialities and their immediate concern is with what a computer may be able to do for them, not with how it does it.

A parallel development has been the small but growing proportion of computer time which is available on-line, that is, immediately available through some kind of typewriter terminal rather than on a batch basis. On-line access may be provided either by a small machine with a single user or by a large machine with a time-sharing operating system such as Michigan's MTS or Edinburgh's EMAS.

Whether large or small, the computer system now includes the terminal(s) through which it is accessed, and these constitute a potential bottleneck. The cost-effectiveness of such a system depends not only on the efficiency, in machine terms, of the programs being executed, but also on the real-time effectiveness of its users at their terminals; for small machines with a single user, this latter is the only point worth considering, as there are very few applications indeed in which small single-user computers are CPU bound for more than a small fraction of the real time for which they are in use.

All on-line systems with any pretence of being general-purpose now offer an interactive text editor as a basic utility, this one program probably requiring the most frequent responses from more users than any other program on the system. It has the capacity to cause its users much grief if they make mistakes which it fails to detect or cannot cope with, and is frequently very badly designed from an ergonomic point of view.

This thesis is a study of these editor programs with suggestions as to how the breed might be improved. The problem is approached from two directions: firstly, what features in the command language are likely to improve the performance of its human users and secondly, how should the program be implemented to optimise its use of machine resources.

In the course of this study, it became clear that the differences between designing these programs for large, time-shared as opposed to small, single-user computers were not so great and could largely be avoided. I shall not therefore, despite the title, restrict myself to small machines when it seems inappropriate to do so.

The editor which is subsequently described has evolved from three others, written one each by Mr Alan Freeman, myself and Mr Hamish Dewar, in that order. Acknowledgement is due to these sources, especially the last, who wrote an IMP compiler which was invaluable in developing these programs; also to Professor Sidney Michaelson, who was always available as a source of information and encouragement.

On-Line Text Editors: the State of the Art

Any interactive program is, so far as its users are concerned, defined completely by its command language and real-time performance; any inaccessible features might just as well not be there, and commands which are accepted but not honoured represent errors in the implementation. Consequently, a simple way to survey the features available in a range of interactive editors is merely to examine their command languages.

The original batch system 'amender' is a device without any explicit commands: all 'lines', corresponding directly with card images, are numbered, and replacement of a line is specified by repunching a card with the same sequence number as the one it is to replace. The use of sequence numbering in place of counting newlines has the highly desirable effect that insertion and deletion of lines does not cause the numbering of all subsequent lines to change, and so obviates the need to get a clean listing every time an alteration is made. The first interactive editors were simply developments of this sort of program, modified to make them easier to use from a typewriter terminal, the biggest single step being the introduction of context-dependent commands such as 'find this string', 'change this string to something else' and so on. Editors of this sort are currently distributed by almost all computer manufacturers as 'context editors' and advertised as a selling point in the machine's software package.
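The idea can be illustrated by a few lines of Python; this is purely a sketch, and the representation of deletion cards and all names here are invented rather than taken from any particular amender. A replacement card simply reuses the sequence number of the line it supersedes, so the numbering of the rest of the file is untouched.

    def amend(master, amendments):
        # master and amendments are lists of (sequence number, text) pairs;
        # a later card with the same sequence number supersedes the earlier
        # line, and text of None is used here to stand for a deletion card.
        file = dict(master)
        for seqno, text in amendments:
            if text is None:
                file.pop(seqno, None)
            else:
                file[seqno] = text
        return [text for seqno, text in sorted(file.items())]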

A typical example is the editor distributed by Digital Equipment Corpn (DEC) as a component of the Advanced Software System for its PDP9-15 family of computers.

This editor accepts and executes commands from the user one at a time; it is not possible to concatenate multiple commands into a single command string. While the manufacturers describe it as 'a powerful context-editing program ... (which) is most frequently used to modify MACRO and FORTRAN source programs, but (it) can also be used to edit any symbolic text', the commands are very definitely oriented towards editing fixed- rather than free-form text, even in free-format programming languages such as ALGOL.

For example, the commands include two search instructions, one of which is intended purely for finding labels in the assembly language of the machine, and the insert mode for typing in long passages of text is switched on/off by a blank line. It is thus impossible to create sequences of blank lines in the output file with this editor, and indeed, its read-in routine removes any such sequences present on input to the program.

A weak feature is that all commands take effect from the start of a line; thus specifying sufficient context to make alterations towards the end of a line can be tricky, even in programming languages; in free-form text it would be much more so.

The editor really operates in block mode, where 'pages' are loaded into and written from a block buffer in response to explicit user commands; any line currently in the buffer can be accessed. To simplify its use for quick simple alterations, a line-by-line mode can be set in which a single line buffer is automatically read and written by any command which causes movement to the next line in the file, with the resulting major restriction that backwards movement is not permitted except to go right back to the beginning. Many of the most useful features (the ability to create intermediate backup copies, to rename files when clashes occur, to concatenate files and so on) are very heavily dependent on particular features of the operating system.

A version of the program is distributed which additionally displays the current state of that part of the file being operated upon, on a CRT display; command requests and diagnostic information are still via the typewriter.

Editors of this sort have the obvious, and certainly not to be underrated, attraction that the command structure is very simple indeed and that the beginning user is unlikely to destroy his files inadvertently; even if he does, the ability to create and replace backup copies is built into the editor and so more likely to be used than otherwise. Against this, the program imposes restrictions on the format of the output files which can be generated; it is not possible to circumvent the restriction on multiple newlines except by producing lines containing only, say, a space character, which is always liable to be removed as redundant by some other piece of software.

The fact that the command language is simple is undoubtedly desirable to begin with, but its inflexibility and inability to extend with the user's skill are likely to exasperate the more proficient user; to learn to use an editor of this sort is to go up a blind alley, since there are no facilities whatsoever for concatenating or repeating commands and therefore a very definite upper bound on what can be achieved. Additionally, the inability to concatenate commands prevents the user from segmenting any particular editing job into pieces which fit his individual preconceptions of what is to be done; by so dictating both the user's tactics and strategy, this lack of concatenation may in fact make the command language more difficult to use, even for beginners, and so be a more serious omission than it might at first sight appear.

In sharp contrast to the above is a program called TECO, also marketed by DEC but with their PDP-10 computer. There can be little doubt that this is one of the most powerful editors available anywhere, but equally that its command language is one of the most rebarbative and its error messages the most unhelpful.

TECO's command language contains many of the features, such as arithmetic, iteration, conditional branching, concatenation of commands, of a general purpose programming language and it is purely from this that its power derives; the primitive operations offered are truly primitive and some mnemonics hardly qualify as such - notably 'R' is used for 'left' and 'L' for 'right'.

A rudimentary command definition facility is available as TECO can be made to execute any arbitrary piece of text to which it has access as a command string; it is therefore possible to save a piece of text for later use, which may well be as a command string. To facilitate this sort of thing, thirty-six 'Q-registers' are available as workspace: these can hold either numerical values or text strings; a pushdown stack is also provided along with 'push' and 'pop' operations.

It is a matter of common observation that people will learn to use any command language, no matter how bad, if they are exposed to it for long enough; the main weakness, and especially for casual users, of a system such as TECO is that no bridge exists between its command structure and that of the 'safe' PDP-10 editor, LINED, which is similar to, but a little more primitive than, the PDP9-15 editor described above.

So far we have been concerned purely with program editors whose main function is to make changes to the body of program text or simple data files; even free-format computer languages such as ALGOL are very stylised indeed when compared with natural language free-form text as represented by the manuscripts of books, theses or even computer manuals. Rice and van Dam[3] describe various editors designed to meet this need, and especially to provide features such that the user can structure his file hierarchically and inspect and make alterations to it at whatever level he feels is useful at a particular time: either as words, sentences, pages or even 'hyperpages' of any arbitrary size. The main difficulty seems to be that the demand for such systems is currently rather limited and that those which have been built quite understandably tend to exploit expensive hardware to its limits, thus making them specific to a particular configuration; the use of a light pen to issue commands, rather than the more conventional and widely available typewriter terminal is an example of this.

An argument could be advanced for using CRT display devices purely for output, as in the PDP9-15 editor described earlier, on the grounds that it is less demanding of the display device and so potentially less costly. Further, hard copy of the commands issued during an editing session remains available should it subsequently be required; the volume of this should be substantially reduced by the removal of the need to print out lines of the file here and there as editing proceeds. Further reductions in this direction could be achieved by improved command languages which require the typing of fewer command lines to obtain a given effect. Developments in this direction, rather than the elimination of the typewriter device altogether, mean that the CRT display becomes a desirable option which increases the performance of an editor, rather than a necessity without which it will not work at all.

A possible weakness of the free-form editors described in [3], with the exception of Stanford Research Institute's NLS, is that they either use non-printing format control characters or represent the desired format of the text quite literally, with the result that the files so produced may make demands which cannot be met by some software systems. A more basic objection is that if format information is conveyed implicitly rather than by explicit markers, there are likely to be ambiguities which require some knowledge, not only of the grammatical structure of the language, but even of the semantic content of the text in question, for their resolution. For example, in the absence of any explicit cues, is a newline followed by six spaces a new paragraph or merely an indentation? If these ambiguities cannot be satisfactorily resolved, then the systematic alteration of the output format, merely by changing a few parameters, is impossible. With regard to prevailing file system standards, it is perhaps significant that the authors of [3] say that their article was produced with the aid of a variety of free-form editing systems, then go on to bemoan the fact that it had to be completely retyped because of file incompatibility between two different systems apparently running on the same computer.

My own feeling is that free-form text editing using wide-screen CRT's interactively with a light pen or cursor is unlikely to gain wide acceptance, because of the cost and specialised nature of the hardware required and facilities offered, and because it involves learning to use a new program of rather restricted applicability; this is especially so when similar effects can be achieved in many cases with much cheaper and more widely available equipment.

This document was produced using the editor subsequently described herein to modify files stored in a canonical form in which all format control information is represented symbolically, as are underlining and upper-case letters, and whose own physical layout bears no necessary relationship to that of the output; this form may be generated automatically from original input which contains the required layout information represented literally, and any errors resulting from ambiguity in the input then removed. The canonical form for any natural language text is slightly more compact, bit for bit, than the original as it requires a smaller alphabet. Proof reading was done with the aid of a CRT display and the formatting program[4] which also generated the final copy. This formatter also produces cleaned-up versions of its input files, so there was no need when editing to make changes in a tidy manner; this saved much effort on my part. The whole system runs on an 8K PDP9 with DECtape I/O; as was pointed out by the authors of [3], the ability to treat a file as a hierarchical structure is invaluable, and I would regard this as the system's main lack, but one which it would unfortunately require a disk to remedy. Nonetheless it produces acceptable typeset copy with a very small hardware outlay compared to the free-form editors they describe.

To summarise, a wide range of editors are commercially available, most of which are primarily program editors and are often line-by-line, one-command-at-a-time affairs like the PDP9-15 editor described at the start of this section. A notable exception to this is TECO which implements something similar to a multi-register computer's assembly language except that it has string operations instead of floating point arithmetic. The former is easy to use, the latter easy to have disastrous accidents with.

The free form editors described in [3] with one exception make what, according to the standards currently prevailing, must be regarded as unrealistic demands on the capabilities of the file storage systems likely to be generally available. Unless large numbers of people require that the standards are higher, these needs are unlikely to be met generally and these editors would remain, as at present, dependent on particular file storage and character code conventions which may cause trouble in some other operating environment.

I think the biggest single hardware advance would be the wider availability and use of CRT's purely as output devices to display the relevant portion of a file continually, this being updated as changes are made. This, together with improved command languages could substantially reduce the bulk of hard copy generated by an editing session without eliminating it as an aid to memory. An added advantage is that editors designed for this sort of environment work perfectly well when the CRT is not available; the same is not true of editors designed to use the display interactively with a light-pen. In a multi-access environment where the system may be entered from remote terminals, this last point is not to be taken lightly.

References:

1. PDP9 Context Editor User Manual (DEC-15-YWZA-D)
Digital Equipment Corpn., Maynard, Massachusetts.

2. TECO: Text Editor and Corrector User Manual (DEC-10-ETEB-D)
Digital Equipment Corpn., Maynard, Massachusetts.

3. ACM Computing Surveys, Vol 3, Number 3, September 1971
Andries van Dam and David E. Rice
On-Line Text Editing: A Survey.

4. Edinburgh University Computer Science Department Internal Documentation.
C.H. Whitfield, August 1971
FORMAT: A free-form text formatting program.

On Humans

It should be obvious that the real performance of any highly interactive computer program, such as an editor, is going to depend heavily on the behaviour of its users sitting at their consoles: if they make large numbers of errors or have difficulties with the command language when these could be reduced by changes to the program, then this must be regarded as a failing of the program and not of its users. It is therefore perfectly valid to consider aspects of human behaviour in designing the command language of such programs and so appropriate to review at this point what is currently known about human performance in the relevant areas.

We shall be mainly concerned with the learning of speed-skills which, fortunately, because of its obvious commercial significance, is a very well researched area of psychology. To allow self-styled 'hard', as opposed to 'soft', scientists to read on, I would point out that we are only interested in how some hypothetical average human performs and not in why, so acceptance of results does not imply acceptance of any particular theory to account for the results. Nonetheless, I feel bound to say that E.R. Crossman[1] in particular has produced a mathematically formulated and adequately parameterised positive feedback model which predicts experimental results very well for a wide range of speed-skills.

Any system utility such as an editor will have a large number of users, whether singly on a small machine, or many-at-a-time on a large, time-shared one; it is therefore valid to concern ourselves with some hypothetical average user rather than with individuals. The behaviour of this user can then be quantified as can any other component of the system and individual differences ignored. It is conceivable that a very wide range of abilities might cause problems arising from the conflicting requirements of the two ends of the continuum; such difficulties cannot, by definition, be resolved with a single command language so the best we can do is to make the 'accept' band in the middle as wide as possible by making the command language flexible and able to appear differently in different circumstances.

Despite beliefs to the contrary, most of our behaviour is automatic and unconscious, this in fact being essential if we are to operate at a viable rate. The acquisition of a speed-skill involves quite simply the automation of large, invariant chunks of the activity; thus, speed is gained at the expense of flexibility. Subjectively, in the early stages of learning, execution of the activity is slow, tiring and variable from one cycle to the next. As larger and larger sections of the skill are automated, performance tends by discrete jumps towards the final state in which the highly overlearned skill is largely involuntary and very inflexible, acquiring almost the character of a reflex. It is resistant to conscious intervention either running its course as usual or breaking down completely; consider for example reciting the alphabet backwards or interchangeably driving cars with three- and four-speed gearboxes.

More formally, in the initial stages of learning, and for a wide range of tasks, humans process information at approximately ten bits/second, and errors increase with speed. Much conscious intervention is required, with consequent fatigue. In contrast to this, highly overlearned speed-skills may be executed at apparent speeds of around forty bits/second and the relationship between errors and speed breaks down. Note especially that in this state, reducing speed may well increase errors, as may conscious interference with the unconscious performance of the skill; performance is much more consistent and consequently more predictable. Also, Bartlett[2] has reported that it is only the higher-level control functions, and not the invariant elements, which degrade with fatigue; taken with the increased performance possible, this provides two reasons for wishing our user to automate his skill as quickly as possible. Everyday skills such as using natural languages, playing ball games, operating machines and so on all include large components which are automated to the degree characterised above.

An obvious way to make a command language easy to learn is for it to exploit the already-learned components of other skills, but it must be emphasised that this can very easily rebound. If a spurious similarity exists between an already-learned skill and one which is to be acquired, learning will be made much more difficult and indeed intermittent errors may persist almost indefinitely. This is the basis of the so-called interference effects which will be referred to later.

It can be shown[3] that humans work best at a constant stress level and that, if permitted to do so, they adjust their work rate accordingly to very near the optimum. If made to work faster they make a disproportionate number of errors; if made to work more slowly, they lose concentration and work even slower.

The optimum value referred to above is that rate at which correct responses are produced at the highest rate; it is not such that no errors are made. What appears to happen is that self-paced activities, such as using a text editor, settle down to some work rate at which the error rate is non-zero but acceptably low. Since 'acceptable' is in this context defined by the consequences of making such an error, we can expect robustness, or lack of it, in the command decoder to affect the average work rate independently of, and in addition to, the structure of the command language.

Using a text editor can, as a first approximation, be segmented into deciding what alteration is required and then making it. These further sub-divide into, firstly, deciding what logical change is necessary and how this change can be effected by alterations to the body of the text, and secondly, converting the static concept of the effect required into a sequence of imperatives to achieve this effect, then converting these into real editor commands.

    ---------      -------------      -------------      ----------
   ! specify !    ! static form !    !  sequence   !    !   real   !
   ! logical ! -> !  of text    ! -> !     of      ! -> !  editor  !
   ! change  !    ! alterations !    ! imperatives !    ! commands !
    ---------      -------------      -------------      ----------

       (a)              (b)                (c)               (d)

                               Fig  3-1

Step a-b must of necessity lie with the user and may involve considerable intellectual effort; I contend that the better the command language the more of (b), (c) and (d) will be taken over by the command decoder and thus involve minimal effort on the part of the user. Where commands cannot be concatenated, the user has to organise tactics as well as strategy and is responsible for b-c and c-d. Permitting command concatenation relieves him of the tactics c-d.

By mimicking the form of the natural language imperative we can gain a considerable advantage in the number of commands which the user can hold in his short-term memory merely because the gross structure, if not the detail, is familiar. A further gain can be had by automating step b-c where appropriate; this can be done by using a static prepositional form to qualify an imperative, thus disguising two commands as a little more than one. For example:

Find 'FRED' Insert 'JIM' (1)
Before 'FRED' Insert 'JIM' (2)

Examples (1) and (2) might very well be commands having identical effects; the suggestion is that (2) is better than (1) because it is derived from one natural language imperative sentence rather than two. G.A.Miller[4] has shown that short term memory capacity is generally greater for more structured objects than for less structured ones; an obvious example is the use of hexadecimal and octal numbering schemes to describe strings of binary digits.

G.A. Miller again[5] has shown that transformation times for natural language constructions may be quite appreciable; he quotes for example 1.5 seconds to convert an active simple sentence to its passive form. The effort involved in converting one imperative modified by a prepositional clause into two imperatives is unlikely to be trivial, and effort leads to fatigue.

Improvements in this direction will have their biggest effect on the time required by new users to reach some arbitrary level of proficiency with a command language but it seems highly probable that even with long term users, some effect will be detectable even after long periods of fairly intensive use; studies of speed-skills such as are required on assembly lines have shown periods of anything up to two or even three years before performance stopped improving.

It was suggested earlier that the more of an activity that can be learned as a speed-skill the better, and that such learning proceeds more quickly when other already-learned skills can be exploited; consequently the natural English imperative form was mimicked. The final stage is to use mnemonic commands to reduce the amount of typing required without, hopefully, losing the parallel with the already-learned background skill possessed by most of the users, that is natural English. This is possible because, despite the views of Lewis Carroll and Humpty Dumpty, and whether we mean it to or not, any symbol to which we have been exposed acquires associations of one sort or another. In the restricted context of, in this case, single-letter mnemonics, text editors and English speakers, many of these associations will be stable within the population and can therefore be exploited.

In the light of previous remarks made about learning speed-skills it should be clear that attempts to use associations like 'L' for 'right' and 'R' for 'left' will produce severe interference effects which will be very difficult to extinguish. We must especially beware of superficially rational arguments to justify the use of patently bad mnemonics. The previous example might be rationalised by something of the form: 'we have a command 'L' which means go forward so many Lines and another 'R' which means Retreat so many lines; what could be more rational, consistent and elegant than that the degenerate form 'L0', abbreviated to 'L', should mean go to the right-hand end, and similarly 'R' to the left-hand end, of the current line?'

Note that interference is a two-way dynamic process; here the natural language interferes with the artificial one and vice versa, any period without exposure to one increasing the relative strength of the other. The loser, again particularly for casual users, in any deal of this sort is bound to be the artificial language.

Another point of lesser importance but well worth considering is the possible effect of operant conditioning on our terminal user's performance. While it is no doubt unpleasant to see ourselves as sharing aspects of behaviour with white laboratory rats, it is nonetheless true that we condition just as effectively. Anyone who doubts this might ask himself what the attraction of fruit machines and bingo is. The basic requirement for operant conditioning to occur is quite simply that some activity becomes temporally connected with something that the subject regards as pleasant and that continued activity continues to produce the desired effect. The most durable conditioning is produced when the rewards are randomly distributed in time and proportionate to the effort expended but sometimes do not materialise ('random ratio reinforcement'). I would suggest on this basis that we need not worry about lack of motivation in our console users.

I make no apologies for this exceedingly sketchy section; the contents can be found in any second year experimental psychology text though in some cases the theoretical models offered are weak. This section is important because whatever the failings of current psychological theorising in some areas, the effects the theories seek to model are perfectly real and will not go away if we simply ignore them.

Since they are real and do affect the potential performance of the user/interactive program composite to a marked degree, it is perfectly proper that they should be considered in designing the user interface for such a program.

References
1. Ergonomics, Vol 2 1959, pp163-166
E.R.F.W. Crossman
A Theory of the Acquisition of Speed-skill +

2. Proceedings of the Royal Society, B, vol 131, 1943, pp248-57. (Ferrier Lecture)
F.C. Bartlett
Fatigue Following Highly Skilled Work +

3. J. Exp. Psych., 1966, 71, 849-857
P.M. Fitts
Cognitive Aspects of Information Processing:
III. Set for Speed vs. Accuracy

4. Psychological Review 1956, 63, 81-97
G.A. Miller
The magical number seven plus or minus two: Some limits on our capacity for processing information.

5. American Psychologist, 1962, 17, 748-762
G.A. Miller
Some Psychological Studies of Grammar.

+ Reprinted in:
Penguin Modern Psychology Readings: Skills
David Legge(Ed.)

Command Language Design

An editor is simply a text manipulation program which operates on an input file according to instructions given it by its user; compared to even a simple macro generator, its string structuring facilities are minimal, and string operators and operands are in any case provided by many modern high-level programming languages. The point of all this is that in designing a command language, while we may wish it to be powerful and flexible, it must be borne in mind that there are jobs which it could do that are more conveniently done either with a macro generator or by writing, compiling and running a short program in some suitable high-level programming language, and that consequently the decision as to where the line should finally be drawn is by its very nature an arbitrary one.

In descending order of importance, I see the significant requirements to be:

1. The command language must be a single language: that is, there must be a continuous gradation from the simplest commands used by the least experienced user to the most complicated ones used by the most proficient.

2. The basic command structure and simple commands must be very easy to learn; initial impressions tend to stick.

3. The command language must be as redundant as is reasonably possible to make it less likely that errors will go undetected.

4. There should be a command definition capability: (1) almost requires iteration, command concatenation, conditional branching and so on, thus making it possible in principle to build up very complicated command sequences. Most of this power is not practically available if no macro facility is provided and such commands have to be typed from scratch every time they are used.

All command languages have two attributes in terms of which they can be evaluated: these might be characterised as power and utility, the former being easily defined in a negative way by whatever cannot be specified in the particular language. The latter is more elusive and must be defined in terms of our hypothetical user's performance over a set task using different languages; note that this carries the implicit assumption that enough is common to the members of the sampling groups for stable comparisons to be obtained. By mimicking the form of the natural language imperative in the native language of the members of the groups I would hope to meet this requirement.

It must be realised that most editing is simple and that an editor without simple, obvious, 'safe' commands is pretty well useless; any editor ever written as such, even TECO, is a puny device compared to a powerful macro generator and there is no point in building editors which provide power without utility. Note especially that languages which provide economical, succinct descriptions of events are almost invariably obscure to the general population because they make many assumptions about the background knowledge of those they are intended for; such languages are totally unsuited to the off-the-cuff production of non-optimal command strings by the unspecialised users of a system utility. They also tend to be less redundant, thus making it more likely that typing errors will go undetected.

A feature required rather infrequently but one about which a disproportionate amount is heard because of the difficulties it causes with most editors when the need does arise, is the ability to segment a file into an arbitrary number of pieces and then reassemble them in a different order. The real solution to this is the hierarchical file structure mentioned earlier but this is a comparatively expensive feature to implement on conventional linear storage devices and certainly not justifiable in a general purpose program editor. Furthermore, a language suitable for describing simple search/delete/insert style operations on a linear file is not suited to describing restructuring operations on a tree-like structure; what I am saying is quite simply that to include such a feature in a general purpose editor is to include a group of commands which bear little relation to the others: this is both in direct contravention of (1) and represents a continuing overhead for a little used facility. If a distinct language is required to describe these operations and they are infrequently used, they might as well be in a separate program.

The language finally settled on provides for concatenation of commands, iteration either for a specified number of repetitions or until some condition is met or a combination of the two, specification of alternative command sequences to be tried successively until one of them succeeds or they all fail, and definition of new commands with user chosen names in terms of the commands already available.

Acceptable command strings belong to the class:

   <command string> = <sequence>,
                      '!' <n> <query>,
                      '[' <name> '=' <element>* ']',
                      <null>;

where:

   <sequence> = <element> (',' <element>)*;
   <element>  = <item> (<item>)*;
   <item>     = <escape> <n> '(' <sequence> ')' <query>,
                <escape> <name> <query>,
                <null>;

and:

   <name>       = <basic name> '(' ('+','-','T')4? ')',
                  <basic name>;
   <escape>     = '\',
                  <null>;
   <query>      = '?',
                  <null>;
   <basic name> = '$' <letter>*,
                  <letter>;
   <letter>     = any character but newline.

<n> is a positive integer taken modulo some system-defined large value v of the order of 10,000, a zero result being converted to +1. '*' is used as a shorthand for 'k?' where k = v+1.

Text strings are delimited by '/', ':' or '.', redundant delimiters between consecutive text strings being elided. The special symbol '-' can be used to stand for the last text string typed at any point in a command string.

Within command definitions, '#n', 1 <= n <= 4, refers to the nth parameter and matches with the nth type specifier.

Commands defined in terms of other commands are not bound at definition time but at call time; it is not therefore generally possible to check new commands when they are defined.

It is not possible to concatenate command definitions or the special re-execute command, '!'<n><query>.

Note that the language contains context-sensitive elements not explicitly mentioned above: for example, parameters must match commands; '-' means '-1' as a number but 'the previous text parameter' if decoded as a text string; for non-definition commands, command names must be defined; and so on.
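As an illustration of the treatment of counts, the following sketch (in Python; the function names are invented and nothing here is taken from the program listing) normalises a count and rewrites '*' into the 'k?' form described above; only the '/' text delimiter is handled.

    V = 10000                       # the system-defined large value v

    def normalise_count(n):
        # counts are taken modulo V, a zero result becoming +1
        n = n % V
        return n if n != 0 else 1

    def expand_star(command_string):
        # one way of realising the shorthand: rewrite '*' outside text
        # strings as the very large count V + 1 followed by '?'
        out, in_text = [], False
        for c in command_string:
            if c == '/':
                in_text = not in_text
            if c == '*' and not in_text:
                out.append(str(V + 1) + '?')
            else:
                out.append(c)
        return ''.join(out)

    print(expand_star('(F/XXX/ C-/YYY/)*'))    # (F/XXX/ C-/YYY/)10001?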

Examples of syntactically correct strings follow, assuming of course that the command names used are defined; see the user manual in Appendix 1 for more information about semantics.

1. Change 'fred' to 'jim' and print the line.
C /FRED/JIM/ P

2. Find 'xxx' and change it to 'yyy'
F/XXX/ C-/YYY/

3. Do (2) throughout the whole file.
(F/XXX/ C-/YYY/)*

4. Define a command to perform this global substitute.
[$GLOB(TT)=(F#1 C-#2)* ]

5. Call new command defined in (4).
$GLOB /XXX/YYY/

6. Close off the new file, terminate editor run.
$CLOSE

7. Possible definition for the 'find' command(F) used above: move right one character position(R) or, failing that, move(M) to the next line (so that consecutive 'find's locate consecutive occurrences of the target string), try to set pointer 'before'(B) target string on this line then move to next line; do this until 'B' succeeds.
[f(t)=(r,m)(\b#1m)*]


The desirable properties of this language I would list as:

(a) The simple forms can be made very similar to English imperative forms: examples (1), (2) and (6) above.

(b) Straightforward repetition is a very simple variation on the non-repeated form: examples (2) and (3).

(c) User defined commands are called in exactly the same way as any others and the definition of them is a straightforward variation on the immediately-executed form: examples (3), (4) and (5).

(d) Although by definition, complicated commands are complicated, the 'find' command in example (7) is a straightforward extension of the forms used in the preceding six examples: we are dealing with a single language, not a collection of odds-and-ends.

(e) Despite (a), it can be decoded by an LR(0) parser; the fact that no backtracking is ever required makes it cheaper to implement than otherwise.

The undesirable properties of the language stem from its very generality and are all connected with the production of diagnostic information after errors have been detected.

(f) The production of error messages is complicated by the absence of any necessary connection between the command string typed by the user and the one actually executed.

(g) Ideally, error trapping should be such that after any command fails, the editor is restored to the state it was in before execution of that command was attempted. Unfortunately, this is not feasible in practice unless the editor is running on a virtual memory system, as it requires random access to the whole of the file being modified; even then it would be exceedingly expensive. Given that we cannot link error recovery directly to the command structure, the application of some arbitrary set of conventions is complicated by (f) as, to the decoder, any command is just a string of characters.

(h) Since we have implemented a programming language complete with looping and conditional branching instructions, there is no decode-time guarantee obtainable that any particular command will terminate in a finite time: detection of some non-terminating loops is possible in special cases but not in general; the use of a large finite value to represent '*' is dictated by this.

(i) Because of its generality, the language includes many unusual forms which might be considered sufficiently so to be regarded as typing errors: for example, because of the presence of the alternative feature (',') and the null alternative, strings ending in a comma are accepted, and their gross behaviour is identical to that of the same strings without the trailing comma. Notably, a string consisting only of commas is accepted and executed though it does nothing. Any decisions as to which, if any, of these to forbid must be essentially arbitrary.

Program Design

From the outset, the intention was to make the program as easily transferable and modifiable as possible; the two unfortunately go together, since an efficient implementation of a program of this sort, as opposed to one with simpler i/o requirements and less dependence on character manipulation operations, of necessity makes assumptions about the installation on which it is to run: primarily these are word-length, byte-oriented addressing or not, virtual memory or not and, if not, how much main store and what sort of fast backing store if any, and lastly whether it is a single-user machine. Unfortunately, all these considerations affect strategy rather than tactics, so it is unlikely that much automatic assistance will be available. I have therefore attempted to isolate the various parts of the program so that they can be altered individually without repercussions elsewhere.

Especially important is that the command decoder must be completely separate from the editor proper since it is this latter part which will be affected most by changes in the operating environment. This separation is in fact forced upon us by the requirement for a command definition facility and the immediate consequence that it becomes unacceptably inefficient to interpret the user's command string directly at execution-time, the alternative being to generate some other representation of it.

The program falls quite naturally into a self-contained command decoder, which converts command strings into a pseudo-code, and a simple sequencing mechanism which executes that pseudo-code by calling the required editing operations in the appropriate order; the editing routines are thus self-contained, largely separated from one another internally, and completely separated from the decoding and sequencing mechanisms.

All editing operations are defined on a single line-buffer with the conversion between the line-buffer and the format of line-records held in a bulk buffer being performed by a pair of complementary routines: the bulk buffer format can thus be changed, for example to suit computers with differing word-lengths and/or byte-oriented addressing without causing difficulties elsewhere. This is important with programs such as editors where the data buffer space required is likely to be quite large; altering the packing density of these line records provides a very simple way of trading speed for storage to suit the operating environment.
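The idea can be sketched as follows (in Python, with invented names; the real routines, and the packing density, are of course machine-dependent). Here two 8-bit characters are packed to a word, and changing this density is the simple speed/storage trade referred to above.

    def pack(line):
        # convert a list of character codes into packed words plus a length
        words = []
        for i in range(0, len(line), 2):
            hi = line[i]
            lo = line[i + 1] if i + 1 < len(line) else 0
            words.append((hi << 8) | lo)
        return len(line), words

    def unpack(length, words):
        # the inverse conversion; only lines being examined need be unpacked
        line = []
        for w in words:
            line.extend([(w >> 8) & 0xFF, w & 0xFF])
        return line[:length]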

The command language consists of a number of fixed special symbols, '(', ')', '*' and so on, together with an indefinitely large and variable number of command names; command names always precede any parameters they may require and multi-letter names are explicitly marked: it is thus an LR(0) language and is easily decoded by a specialised, but simple, keyword macro scheme which assigns a fixed significance to the 'special symbols' and looks the variable elements up in a macro definition table. Extensions to the language are made simply by making additions to this table.

To remove the possibility of recursive loops in what is after all a command decoder, not a general purpose macrogenerator, any particular macro only has access, when being expanded, to those definitions which were already present when it itself was defined: it can thus never call itself, either directly or via some arbitrarily long chain of intermediate calls. An immediate consequence of this restriction is that when a command is redefined, it must quite literally be replaced rather than deleted and a new definition generated, otherwise the possibility arises that other macros which accessed it prior to redefinition might not subsequently be able to do so; a secondary but nonetheless highly desirable spin-off is that the effective length of the macro table on the (n+1)th call must be less than it was on the nth and consequently that, as we go further into the parse tree and the number of nodes and derived searches increases, the mean length of these searches decreases, with a consequent saving in execution time.
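The visibility rule can be sketched as follows (in Python, with invented names; the real table lives in array B and is searched from the most recent entry downwards).

    class MacroTable:
        def __init__(self):
            self.entries = []                 # (name, body) pairs, oldest first

        def define(self, name, body):
            # redefinition replaces the entry in place, so macros defined
            # earlier, which could already see the name, can still reach it
            for i, (n, _) in enumerate(self.entries):
                if n == name:
                    self.entries[i] = (name, body)
                    return
            self.entries.append((name, body))

        def lookup(self, name, limit=None):
            # 'limit' is the position of the macro doing the expanding; it may
            # only see entries that existed when it was defined, so it can
            # never find itself.  A command typed directly by the user sees
            # the whole table.
            region = self.entries if limit is None else self.entries[:limit]
            for i in range(len(region) - 1, -1, -1):
                if region[i][0] == name:
                    return i, region[i][1]
            return None

Each nested expansion passes on the position at which its own definition was found, so the searchable region shrinks at every level, which is the shortening of the searches referred to above.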

While macro substitution can convert one string of input symbols to another, the result is still a string of characters; to permit the complete separation of the decoder and editor functions it is desirable to transliterate the characters into numerical codes which are acted upon by the sequencing mechanism and editing routines. In the interests of increased execution-time efficiency, by avoiding unnecessary scanning, it is desirable that the decoder should be able to calculate what are effectively absolute addresses and this is duly done.

Addresses are planted by chaining together all points where a reference is required then replacing these list pointers by the value of the current location counter at some suitable time: no symbolic labelling mechanism as such is required. This is possible because all jumps, except those connected directly with the use of parentheses, are forward jumps.
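A sketch of the chaining scheme, in Python and with the direction of growth simplified (the real code is planted downwards in B), might read:

    code = []              # the planted pseudo-code, grown upwards here
    chain_head = -1        # head of the chain of points awaiting an address

    def plant(word):
        code.append(word)

    def plant_forward_ref():
        # plant a placeholder whose value is the previous link of the chain
        global chain_head
        plant(chain_head)
        chain_head = len(code) - 1

    def fix_up():
        # the target has been reached: walk the chain, replacing every link
        # by the current value of the location counter
        global chain_head
        here = len(code)
        p = chain_head
        while p != -1:
            nxt = code[p]
            code[p] = here
            p = nxt
        chain_head = -1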

This decoder thus has most of the essential features of a simple macro-assembler plus a number of features which are required to process the special characters, '(', ')' and so on, which were mentioned earlier. In practice, and unlike many general purpose macrogenerators, this decoder appears to be very resistant to mistreatment due largely to the presence of rigid checking of all parameters and the absence of features like definitions within other macro calls and recursive calling sequences. The absence of these latter two means that the environment is static throughout the decoding of any particular command string, thus greatly reducing the complexity of 'peculiar' cases that may arise; they can all be diagnosed in terms of parameter mismatching and/or the absence of a macro name from the appropriate part of the definition table. The protection of the basic system macros and checking of parameters at each level of calling means that diagnostics can be printed which reference objects that the user knows of. It goes without saying that all these restrictions reduce the power of the decoder but I consider the trade-off of power for robustness a good one for reasons made clear in the preceding sections.

Some traditional macrogenerator features could be put in, for example the generation of sequential integers, but such features, in the context of the whole editor, would be much better implemented as editing operations than put in the command decoder.

The problem mentioned in the last section of defining the state of the editor after an execution-time failure is dealt with by specifying that any 'within line' command leaves everything as it was prior to the attempted execution while multi-line commands in general do not: the resetting of pointers is implemented in the editing routines themselves for the small number (3) of operations involved. While the basic commands conform to this standard, it must be clear that macros can easily be defined that do not: suppose we have a primitive command 'R n' which moves n character positions to the right within the current line and fails if this is not possible - consider the difference between 'R5' and '(R1)5' on failure.

The notion of a unitary command is useful, as one which either succeeds completely or, alternatively, fails completely leaving the state of the editor unchanged: all primitives except 'move' are unitary as defined, but this can clearly be changed by techniques such as the above. It seems highly desirable that all commands should be unitary, and a recovery mechanism linked to the scope-defining parentheses would give this. In the above, 'failure' means, of course, genuine unforeseen failure as opposed to situations which were anticipated and allowed for with '*' or '?'.
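The distinction can be made concrete with a small sketch (Python, invented names): the primitive tests the whole request before moving, whereas the looped form commits itself one step at a time.

    def R(state, n):
        # move n character positions right within the line; fail if impossible
        if state['pos'] + n > state['line_length']:
            return False                 # unitary: nothing has changed
        state['pos'] += n
        return True

    def repeat(state, n, body):
        # a crude model of '(R1)n': stop at the first failure,
        # keeping whatever moves have already been made
        for _ in range(n):
            if not body(state):
                return False
        return True

    s = {'pos': 0, 'line_length': 3}
    print(R(dict(s), 5))                 # False, and the pointer is still at 0
    t = dict(s)
    repeat(t, 5, lambda st: R(st, 1))
    print(t['pos'])                      # 3: the partial moves remain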

The peripherals attached to a small computer are likely to be much more variable than those on a large machine; the small machine may have paper tape only, magnetic tape or possibly a disk: the amount of core store available is likely to vary by anything up to a factor of six. In particular, these factors affect how requests to move backward through the file are dealt with: do we rely on holding as much as possible in main store and deny access to anything that has been written out or do we write and read back again intermediate files, only copying to the final output at the end of the editing session? The former method can give much higher performance for reasonably short files but imposes rigid restraints on the size and so the complexity of the editor program as enough store must be saved to give an adequately large buffer. The alternative method will be slower because of the file copying required which may be a critical point unless a disk is available. The editor on which this thesis is based operates in this latter way. Note that the above problem only exists when comparatively slow bulk storage is available: with paper-tape only there is no real choice in the matter while with a disk, the difficulty disappears.

A program listing of this editor appears as Appendix 2: the intention when writing the version which appears was to make the program easy to follow while still containing all salient features. I must emphasise that the implementation whose listing appears was not intended to be and is not efficient in any operational sense: notably, it uses whole machine words to store 8-bit character values and contains a bulk move routine and the line-record pack and unpack routines written in a high-level language.

Implementation

Implementation of this editor requires the implementation of the editor program per se and also the specification of a standard command set in terms of definitions which reference primitive operations defined in that program. Since the definition facilities are described in the user manual appearing in Appendix 1, all that need be said about this latter topic is that it is largely a matter of convenience which commands are defined as primitives and which as macros: the constraints which do emerge are concerned with the state of the editor's various pointers after an error condition has been trapped, a point which has just been dealt with in the previous section.

In the following section, numbers in square brackets refer to line numbers in the program listing in Appendix 2.

Command Decoder

The command decoder[91:547] requires access to two main data areas, array 'ST' which is used to hold a stack, and array 'B' which contains both the macro definition table and decoder output. This output consists of numerical function codes with associated references to literals, strings and numerical values, and the literals themselves. The literals are stored from the bottom of B upwards, the macro table and the interpretable code from the top downwards in that order. New commands can only be defined from the basic command level so there is no risk of these latter two getting mixed up. Storing text literals from the bottom up gives a marginal advantage over the other way around because they can then be stored immediately even though their final length is not at that point known. It has the slightly bizarre result that the 'code' has to be planted and executed in the reverse direction to what might seem more usual.

    ------------------------------------------------------------
   !  -> -> ->!             !<-  <-  <-  <-   !<-  <-  <-  <-   !
   !          !             !  interpretable  !                 !
   ! literals !    free     !                 !   macro defns   !
   !          !             !      code       !                 !
   !          !             !                 !                 !
    ------------------------------------------------------------
               W           X                   Y                 Z

                  (arrows show direction of growth)

                             Fig 6-1

'W' points to the first free cell above the literals, 'X' to the first free cell below the interpretable code, 'Y' to the last cell occupied by the macro table and 'Z' to the first(non-existent) cell after the table as in Fig 6-1 above.
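The discipline can be sketched as follows (Python, invented names; the macro definitions, which occupy the region between Y and Z, are omitted for brevity).

    class Store:
        def __init__(self, size):
            self.b = [0] * size
            self.w = 0               # first free cell above the literals
            self.x = size - 1        # first free cell below the code
            # (the macro definitions would sit above the code, nearer the top)

        def free(self):
            return self.x - self.w + 1        # cells W..X are the free region

        def plant_literal(self, chars):
            if self.free() < len(chars):
                raise MemoryError('no room left in B')
            start = self.w                    # reference planted in the code
            for c in chars:
                self.b[self.w] = c
                self.w += 1
            return start

        def plant_code(self, word):
            if self.free() < 1:
                raise MemoryError('no room left in B')
            self.b[self.x] = word
            self.x -= 1                       # the code grows downwards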

The macro table consists of at least as many entries as there are commands defined with the most recently defined command being represented by the entry starting at B(Y). Each entry consists of a macro name and body each preceded by a length cell which, since these objects are stored consecutively, gives the required offset to the next length cell.

In this implementation, and in the interests of simplicity, the macro table entries, except for the length cells, contain one seven-bit character per machine word. A first obvious step to improve the use of storage is to pack up these characters in some suitable way. If the target machine is byte-oriented, all cells, including the length cells, will fit into eight bits; if the machine word-length is a multiple of six, six-bit characters might be more appropriate, in which case it will be necessary to use a shift character if more than the restricted IBM card punch character set is to be catered for. In either case, alterations are required only in the routines which access the table entries and in the command definition routine at [494].

             ------------------------------------------
            !        !           !        !            !
            !  q - p !           !  r - q !            !
            !        !   macro   !        !   macro    !
            !--------!           !--------!            !
            !        !   name    !  8-bit ! definition !
            !  free  !           !  code  !            !
            !        !           !  word  !            !
             ------------------------------------------
                 P                    Q                  R

                            Fig 6-2

The length cell for the macro body currently holds both the length information and an eight-bit code word specifying the parameter types required: in a byte-oriented implementation it will be necessary to put this into a separate cell thus decreasing the maximum permissible definition length by one, requiring changes at [497] and [286]. The general layout in the current implementation is as shown above in Fig 6-2 but note that the length cell associated with the name holds a protection bit which is tested before redefinition; macro names, all else being equal, are thus restricted to being half the length of bodies.
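A sketch of the packed length cell (Python; the field widths are illustrative, not those of the listing):

    TYPE_BITS = 8
    LENGTH_BITS = 8                     # limits a definition to 255 cells

    def make_body_header(length, type_code):
        # pack the body length and the parameter-type code into one word
        assert 0 <= length < (1 << LENGTH_BITS)
        assert 0 <= type_code < (1 << TYPE_BITS)
        return (type_code << LENGTH_BITS) | length

    def body_length(header):
        return header & ((1 << LENGTH_BITS) - 1)

    def param_types(header):
        return (header >> LENGTH_BITS) & ((1 << TYPE_BITS) - 1)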

The decoder is the recursive routine EXPAND[120], which operates by calling itself to expand references to other macros within the string currently being processed; within each level there is a need to keep track of the balanced parentheses which delimit the scope of loops in the command string. In this implementation this is done by a separate stack kept in 'ST', the stack pointer being passed as a parameter on each call of EXPAND; this arrangement is unfortunately forced in IMP by the necessity of passing a variable number of parameters with each call on EXPAND, which is done by loading them onto this private stack before each recursive call. Writing in machine code, a single stack is all that is required: since parentheses must be balanced within each macro definition, to permit localisation of bracketing errors, we obtain from the strict nesting of calls and parentheses the desirable side-effect that a single pushdown acceptor suffices.

Routines PUSHBR[151] and POPBR[167] keep track of the number of left and right parentheses encountered, POPBR also planting code to set the execution-time loop and branch conditions associated with each bracket pair. An extra bracket pair is inserted around a command [210] if it is a command typed in by the user, that is the first non-recursive call on EXPAND, or if an unparenthesised command was qualified by '\' or '?'. This simplifies implementation of the execution portion of the program by converting statements of the form '\M5' into '\(M5)'.
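Two fragments (Python, invented names) illustrate the point: the first is the balance check which allows bracketing errors to be localised to the definition containing them, and the second shows only the textual effect of the implicit bracket pair.

    def check_brackets(definition):
        # parentheses must balance within each macro definition
        depth = 0
        for c in definition:
            if c == '(':
                depth += 1
            elif c == ')':
                depth -= 1
                if depth < 0:
                    return False
        return depth == 0

    def add_implicit_brackets(qualifier, body):
        # purely the textual effect: '\M5' is treated as '\(M5)'
        return qualifier + '(' + body + ')'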

Sequencing Mechanism

[553:627]

This is simply a pushdown automaton controlled by numerical codes stored in 'B' by the decoder; it dispatches control to the various editing routines, tests the result (succeed/fail) on return, and maintains loop counts for all partially executed repetitions. Array 'ST' is used to hold the single stack needed to implement the pushdown.

Failure conditions are handled by associating an error word with each alternative execution path: should a failure occur, this word is decoded at ERROR EXIT[611] and appropriate action taken.
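The shape of the sequencer, though not its actual codes, can be conveyed by a C sketch along the following lines; the opcode values, the frame limit and the two external routines 'call_routine' and 'error_exit' are stand-ins invented for the illustration.

    enum { OP_CALL, OP_LOOP, OP_ENDLOOP, OP_STOP };   /* illustrative codes only  */

    struct frame { int start; int count; };           /* one per open bracket     */

    extern int  call_routine(int op);    /* dispatch an editing routine; returns
                                            non-zero on success                   */
    extern void error_exit(int errword); /* decode the error word for this path   */

    void sequence(const int *b)          /* 'b' holds the codes planted by the
                                            decoder                               */
    {
        struct frame st[32];             /* the single pushdown stack             */
        int sp = 0, pc = 0;

        for (;;) {
            switch (b[pc]) {
            case OP_STOP:
                return;
            case OP_LOOP:                              /* open a repetition       */
                st[sp].start = pc + 2;
                st[sp].count = b[pc + 1];              /* repetition count        */
                sp++; pc += 2;
                break;
            case OP_ENDLOOP:                           /* repeat or fall out      */
                if (--st[sp - 1].count > 0)
                    pc = st[sp - 1].start;
                else { sp--; pc++; }
                break;
            case OP_CALL:                              /* one editing primitive   */
                if (!call_routine(b[pc + 1])) {
                    error_exit(b[pc + 2]);
                    return;
                }
                pc += 3;
                break;
            }
        }
    }

The handling of '?', '*' and alternatives is omitted from the sketch; in the program these are expressed through the planted loop and branch conditions and the error words described above.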

Editing Routines

There are two parts to the editor proper: a buffer control routine, MOVE TO[829], and the editing routines, which are defined on a single line buffer. A pair of complementary routines, PACK[664] and UNPACK[709], convert between the format of the bulk buffer and that of the line buffer, where all editing is done. Both these buffers are held in array AA, with all text above the current line at one end, towards AMIN, that below it towards the other end, and the line buffer in the gap between. This scheme has the highly desirable property of concentrating all activity in AA into a small area and should reduce the amount of page turning in virtual memory systems.
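The following C sketch shows how an insertion into the line buffer interacts with the free space beside it; all of the names are invented for the illustration and do not correspond to the listing.

    /* Insert character c at index cp of the line buffer, which ends at
       aa[*lb_end - 1]; the packed text below the current line begins at
       lower_start.  Fails when the free space is exhausted, in which
       case MOVE TO must be called to tidy the buffer.                    */
    int insert_char(int aa[], int *lb_end, int lower_start, int c, int cp)
    {
        int i;
        if (*lb_end >= lower_start)
            return 0;
        for (i = *lb_end; i > cp; i--)   /* shuffle the tail of the line up */
            aa[i] = aa[i - 1];
        aa[cp] = c;
        (*lb_end)++;
        return 1;
    }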

The contents of the bulk buffer are held as line records, each with a header word containing the offset to the header word of the next record, backwards if above the current line, forwards if below. It is thus possible to move backwards or forwards a number of lines without having to unpack the intervening ones. A consequence is that multi-line text searches are slower with this scheme, relative to a straight move, than they might be with some others. Note also that of the two conversion routines, PACK and UNPACK, it is more important that UNPACK should be efficient, because lines must always be unpacked before they can be examined but need only be repacked if they are changed; in particular, text searches involve unpacking.
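In C the skipping of packed records reduces to following the chain of header words; a sketch, assuming the offsets are held as positive distances and with invented names:

    /* Step over n packed records below the current line, returning the
       index of the new header word.                                       */
    int skip_down(const int aa[], int p, int n)
    {
        while (n-- > 0)
            p += aa[p];     /* the header holds the offset to the next header */
        return p;
    }

    /* The mirror image for records above the current line, whose headers
       chain backwards towards AMIN, replaces the addition by p -= aa[p].  */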

In the interests of a simple implementation, all file i/o is via the line buffer, so that the only path to and from the bulk buffer is via PACK/UNPACK. While this may seem highly desirable in that it means there need be only one copy each of the pack and unpack routines, it in fact causes problems mentioned later. In a byte-oriented environment, where the transformation between the bulk and line buffer formats is a trivial one, this becomes an unnecessary restriction which would be better done away with. In particular, MOVE TO would be much more efficient if it had direct access to the bulk buffer without going through this path.

MOVE TO guarantees to position the 'window' held in the bulk buffer in such a way as to minimise access time, tape winding and so on: it must therefore be very heavily influenced by the environment in which the program is to run. It also guarantees to return the buffer in good order with spare space left for insertions; a call to move to the current line is thus an easy way for the rest of the editor to make space in the buffer. The routine has to cope with two main problems: firstly, that access may be required to a line which was, but no longer is, in the buffer and secondly, that the buffer is full and there is nowhere to make an insertion. The latter is quite likely to turn into the first problem, as it is desirable to move the window in something larger than single-line increments if the backing store is comparatively slow, say magnetic tape or a slow disk; this may result in the point at which insertions were being made becoming inaccessible. The routine implemented here is a simple heuristic program which aims to over-react slightly to whatever caused the problem to arise, in the hope that it will take longer to recur. Also attempted, not very successfully so far, is the reduction of a series of single-line 'move' commands into one longer one, mainly to increase the efficiency of 'F'. This last problem is brought on entirely by the command structure; the editor proper executes a sequence of commands completely in the dark as to what is to happen next.

The editing routines start at [919] and, as they are all defined on a single line buffer, are simple and should be self-explanatory. 'Insert' suffers from the problem alluded to above, that the only i/o path to and from the bulk buffer is through the line buffer: if 'insert' runs out of space, the line buffer is needed to clear the condition, which unfortunately involves repacking the current line first and so losing the position of CP within it. A solution of sorts would be to ensure that enough space is available at all times to insert a single newline and redefine the 'insert' macro so that, if the primitive failed for lack of space, the macro inserted a newline and then moved backwards one line to cause MOVE TO to tidy the buffer up and make space; deleting a newline, in fact the one just inserted, would then restore CP and enable us to continue. This no doubt illustrates the power of the command language but seems a little messy otherwise. The obvious solution in a byte-oriented environment is to access the buffer directly.
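Under the assumption that room for one newline is always reserved, the redefinition might read something like the following; this is offered only as an illustration of the technique just described and is not part of the standard command set (note too that, as written, the alternative is also attempted if the primitive insertion fails for some reason other than lack of space):

[I(T)=I#1,(IN M- DN I#1)]

Here the alternative inserts a newline to the left of CP, moves back a line so that MOVE TO tidies the buffer and makes space, deletes the newline just inserted (restoring CP), and retries the insertion. Since the standard commands cannot be redefined from the console, such a definition would have to replace the standard one in the pre-loaded file of definitions.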

Besides the insertion of packing routines, if this is called for by the machine structure, large speed gains can be had by recoding the bulk move routine MOVE[67] to exploit the characteristics of whatever machine the program is to run on. Smaller gains can be had in a large number of places where consecutive store locations are accessed, notably in string matching and in stacking and unstacking variables; compiler-generated code tends to be bad in this respect because it uses the CPU to perform trivial address arithmetic which can often be done directly by the machine hardware, and in programs of this sort, where these operations are important, the cumulative saving is substantial. These gains are worth having even on a small single-user machine because they can usually be traded for storage, thus adding to the buffer size. Note that, in general, while speed gains are fairly easily had by recoding critical routines, reduction of the code size involves extensive recoding, either automatically or by hand. Editors are a special case here because they invariably have a large data buffer, only a small part of which is accessible at any instant, and large amounts of space can be saved by packing this more densely, the problem being that such packing is a time-consuming business; the situation thus arises that a large amount of (data) space can be saved by recoding a small part of the program to make dense packing acceptably fast. The two-level buffer used here is designed, among other things, to facilitate this.
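For instance, a hand-coded MOVE can step pointers, or use the machine's block-move or auto-increment addressing, instead of performing explicit subscript arithmetic on every cycle; the contrast, shown in C purely as an illustration:

    /* Word-by-word copy with explicit subscript arithmetic, roughly the
       shape a straightforward compilation of a subscripted loop produces. */
    void move_indexed(int dst[], const int src[], int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = src[i];
    }

    /* The same copy with stepped pointers, which maps directly onto
       auto-increment addressing where the hardware provides it.           */
    void move_stepped(int *dst, const int *src, int n)
    {
        while (n-- > 0)
            *dst++ = *src++;
    }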

A point worth mentioning, but no more, is that this implementation does not as it stands deal with the full ISO character set; in particular, all lower-case letters are converted to upper case. The main saving is in execution time, as otherwise it would be desirable for the search routine to match strings irrespective of case differences; this would represent a continuous overhead and, since the file system into which the intermediate copies are written cannot cope with the full set anyway, the feature was left out. Clearly, with the command structure which exists, it would be very easy to put this feature in, together with the necessary commands to enable and disable the case equivalencing, should it be required.
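Were the feature wanted, the change amounts to folding case at the single point where two characters are compared during string matching; a sketch in C, assuming seven-bit ISO codes:

    /* Compare two characters as if both were upper case. */
    int same_char(int a, int b)
    {
        if (a >= 'a' && a <= 'z') a -= 'a' - 'A';
        if (b >= 'a' && b <= 'z') b -= 'a' - 'A';
        return a == b;
    }

The continuous overhead referred to above is the extra pair of tests on every character comparison.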

Conclusions

The current state of the art would appear to be such that 'editor' usually means 'program editor', and these programs tend to contain features, for example throwing away multiple newlines, which would be unacceptable if free-form text were processed directly; they also tend to have 'closed-ended' command languages which impose fairly severe restrictions on what can be done with them. There are one or two exceptions to these generalisations, notably TECO, but these all go too far the other way, sacrificing utility for power and consequently demanding quite a high degree of expertise on the part of their users. For casual users especially, this lack of a single command language serving both ends is a nuisance and appears to be completely unnecessary.

In addition to the structure of the languages, command mnemonics are badly chosen in many cases, again disproportionately increasing the difficulties of the casual users since bad mnemonics are essentially those which are forgotten most quickly.

Two sets of mnemonics can be compared in terms of the extent to which each interferes with the other: this was not done because there is no point in doing it with specially made-up mnemonics, which could be arranged to deliver whatever result was required. Doing it with editor command mnemonics requires two fairly large groups, thirty at least, of casual users. These were not immediately available, though with the more widespread use of EMAS this should be possible in a year or two. The problem lies in obtaining unspecialised users; as was noted elsewhere, people will learn any set of mnemonics if forced to use it for long enough, but their subsequent behaviour in this small area could hardly be said to be typical of the population at large.

The editor described here has the command decoder isolated from the editor proper, the decoder and sequencing mechanism making no reference to editing at all. In a paged, virtual-memory system, keyboard response time could probably be improved by providing a table-driven command decoder as a system utility since then hopefully the code representing the decoder would be more heavily used and so less likely to cause page delays. My main doubts here were whether or not a macro scheme could be made sufficiently robust: it seems that it can.

The gross structure of an editor program is very little affected by the environment, except perhaps where the only i/o is paper tape, but its performance may be very greatly enhanced by modifications which make it fit this environment better; these modifications will be at the level of strategy rather than machine-level tactics, and little help will be available from automatic optimisers. It is therefore highly desirable that editor programs should be modular, to facilitate making such changes. Trading CPU capacity for storage is particularly easy with editors because of the presence of large data buffers whose packing density can be increased for the price of the processor time to do it.

A fairly large number of special-purpose editing systems are available for modifying free-form text. At present, most i/o systems are designed primarily for dealing with text files representing program source texts and data, and may not be capable of meeting more stringent requirements. I would expect this, together with the fact that many of these editors depend on CRT displays, to limit their appeal.

The feeling I get from looking at a number of the editors on the market is that there are too many of them, this largely stemming from the fact that they seem to be either safe but very restrictive or powerful at the expense of being impossible for a beginning user to start with; the editor offered here is intended to fill this gap.

Appendix 1: EDIT5B User Manual




Basic Structure

This document describes a context editor: it is primarily a program editor but can genuinely be used on free-form text, subject to the restriction that the output file, if not the input, will contain lines terminated by a newline character, each no longer than some implementation-defined maximum, and that the last character in the output file will therefore always be a newline. No special free-form formatting facilities, such as page generation or right-justification of lines, are included.

All commands consist of an alphabetic command name followed optionally by up to four parameters, each of which may be either a text string or a number: numbers may be omitted, in which case they take the default value of +1; no such default is available for text strings. Spaces are ignored except where necessary to separate consecutive numerical parameters or to mark the end of a multi-letter command name. Multi-letter names must be preceded by '$' to avoid confusion with consecutive parameterless single-letter commands.

The special symbol '-' may be used at any point in a command string to stand for the last text parameter typed; 'N' must be used to stand for a quoted newline. Interpreted as a number, '-' alone means -1.

In principle, any number of commands may be concatenated to form a single command string whose elements are executed serially exactly as if they were typed in one at a time; execution of the whole string stops at that point if any element fails. Where a repeated component can be isolated, unnecessary typing can be avoided by repetition of that component, thus '(XYZ)3' is exactly equivalent to 'XYZXYZXYZ'. If it is not clear how many times a sub-string should be repeated, the number can be followed by '?': thus '(XYZ)3?' specifies that any number of repetitions up to three, zero included, is a correct execution of the command. '*' can be used as short-hand for a very large number followed by '?', thus for practical purposes '(XYZ)*' is a command to execute 'XYZ' repeatedly until it fails.

Two or more sub-strings may be specified as alternatives to be tried in order until one of them succeeds; if none succeed, then the whole alternative list has failed. For example, 'ABC,PQR,XYZ' specifies a command which succeeds if any of its three component parts succeeds and fails only if they all fail. The main use of this feature is in repeated commands as above; for example we might type '(ABC,PQR,XYZ)*' which will continue to execute until all three components fail on the same repetition. Execution of the alternative list starts at the left hand end on each repetition irrespective of which of its components was executed on the previous one.

The conditional branching in the previous paragraph depends on a decision as to whether a particular element of a command failed or succeeded; the sensing of these conditions can be inverted by preceding any command with a '\', whereupon the branch will occur when the command succeeds rather than, as above, when it fails. More precisely, the branch condition for any command derives from comparing its result (succeed/fail) with its immediately enclosing environment, which may itself be inverted: thus '(x)' and '\(\x)' are equivalent, as are '\(x)' and '(\x)'.

New commands may be defined with names chosen by the user, the effect when they are subsequently called being exactly as if the command itself had been typed out each time. These may be parameterised in the same way as standard commands. The standard commands are defined by a text file read in each time the program is loaded and can thus be changed very easily merely by editing that file; this should be borne in mind when reading the rest of this document.

The primitive operations and the standard commands all take effect relative to a character pointer (CP) which initially is at the start of the input file. Also used subsequently is the notion of the current line, this being the line of the file in which CP currently resides.

Text Referencing Commands



F 'text string'
Find the next occurrence of 'text string', searching sequentially through the input file from top to bottom and left to right starting at the next character but one after CP. By skipping one character we ensure that successive calls on 'F' locate successive occurrences of 'text string'. 'F' is the only text referencing command whose scope is greater than the current line.

A 'text string'
Set CP after the text string specified.

B 'text string'
Set CP before the text string specified.

C 'text1'text2'
Change 'text1' to 'text2'.

D 'text string'
Delete the text string specified.

I 'text string'
Insert the specified text string to the left of CP.

U 'text string'
Delete from CP up to but excluding the specified text string.

V 'text string'
Verify that the text string specified lies immediately to the right of CP: fail unless it does.

Note

1. '/', ':' and '.' may be used as string quotes in addition to the more conventional ''' used above; these others are more conveniently placed on most keyboards.

2. All the above commands except 'F' fail, leaving CP unchanged, unless the text string specified lies between CP and the end of the current line.

Commands with Numerical Parameters

E +n
Erase n characters to the right of CP.

G +n
Insert n lines from the user console above the current line. CP is moved back to the start of the current line. Typing ':' on a line by itself causes a switch back to command mode at any time, irrespective of n. When in command mode the prompt character is '>'; when in 'insert' mode after a 'G' command, it is ':'.

K +n
Kill n lines including the current line. CP is set to the start of the line following the last one removed.

L +n
Move CP n character positions left.

R +n
Move CP n character positions right.

M ±n
Move backwards (-) or forwards (+) n lines in the file.

P +n
Print n lines on the user console starting with the current line; the current position of CP is marked with an up-arrow unless it is at the very beginning of a line. If n is greater than one, CP is moved to the start of the last line printed.

Other Commands



$CLOSE
Close off the output file and terminate this run of the editor program.

$GS +n
Secondary input version of 'G +n'; lines come from a secondary input file rather than the console.

$REWIND
Rewind the secondary input file referenced by the previous command.

! +n
Execute the last command string typed n more times.

Examples(1)

To aid description in the following examples, a line is at some point taken to be line N and other lines are thereafter referred to as N+1, N+2 etc. This numbering is fixed at the time N is assigned; for example, N+1 and N+3 are consecutive lines if N+2 is removed.

>P 5
take current line as N; print lines N to N+4 inclusive on the console and leave CP set to the start of N+4.

>MP
take the current line to be N; set CP to the start of N+1 then print N+1.

>F'3:' M-2 K3 M- P2
search through the file for the next occurrence of '3:'; take the current line as line N. Set CP to the start of N-2; destroy N-2, N-1, and N, leaving CP set to the start of N+1. Go back one line, setting CP to the start of N-3; print lines N-3 and N+1, which are now contiguous, finally leaving CP at the start of N+1.

Examples(2)


1) Print the current line and every tenth line thereafter for the next fifty lines:
>P( M10 P )5

2) Change '1' to '2' in the fifth source statement on a line of IMP program text:
>( A';' )4 C'1'2'

3) Change calls on 'PRINT' to calls on 'WRITE' throughout an IMP program. The latter has two, not three, parameters.
>M-*( F'PRINT(' C-'WRITE(' A',' B- U')' P )*
In this last example, note the use of:
a) 'M-*' to make sure that CP really is at the top of the file before starting.
b) 'P' to print out any line that is altered.
c) 'PRINT(' rather than just 'PRINT' so that for example, 'PRINT SYMBOL' won't be changed.
d) '-' to save retyping parameters.
e) Spaces to make the command string more readable and easier to type correctly.

Note that the string search does not ignore spaces, so that although, for example, 'PRINT(' and 'PRINT (' have the same meaning in IMP, they are different strings of characters and will be treated as such by the Editor.

Command Definitions

Any command string which can be typed directly into the editor can be given a name and so added to the command set available.

The format of a command definition is:
[ name ( parameter types ) = definition body ]
For example, the 'change' command can be defined:
[C(TT)=D#1I#2]
where:
'C' is the name of a new command which has two text parameters (TT) to which '#1' and '#2' refer.

Up to four parameters are permitted in any one definition and their types may be text string (T), positive integer (+) and positive or negative integer (-).

Restrictions are:
1. No name may be longer than 126 characters, nor any definition body longer than 254.
2. Names must be defined before they are referred to elsewhere though it is quite permissible subsequently to change the body associated with a name.
3. '-' as a text parameter cannot be used within a command definition.

Definition is in fact destructive re-definition, a new command name being created only when that name is not already available; attempts to dynamically change the standard pre-loaded commands will be monitored. These can be changed by editing a short text file of definitions which is read every time the editor is loaded.

Note that commands are bound at call-time not definition-time and that consequently changing a command A which is referenced by another, B, will alter the effect of B. This can be exploited when building up large commands by permitting parts of them to be changed without retyping completely.
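For example, assuming 'X' and 'Y' are names not already in use:

[X=P]
[Y=M X]

'Y' now moves forward a line and prints it; if 'X' is later redefined as, say, [X=P2], a subsequent call of 'Y' prints two lines, even though 'Y' itself has not been retyped.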

It would perhaps be as well to point out here that, while this editor implements a fairly powerful command language, long editor commands are grossly inefficient compared with a compiler-implemented string manipulation program written to achieve the same effect, and restraint should be exercised in the use made of them; this rider of course does not apply if the program is run on a small single-user computer.

A further caution is that, as commands are not bound until call-time, they cannot be checked at definition-time. This does not affect the standard commands because, once loaded, they cannot be altered or renamed on that run of the editor; new user-defined macros, however, should be checked out carefully, as would be done with any other program.

A list of the definitions of the standard commands follows:

[G(++)=G#2#1]
[C(+)=Z#1]
[$CLOSE=C1]
[$REWIND=C3]
[A(T)=A#1]
[B(T)=B#1]
[C(TT)=D#1I#2]
[D(T)=D#1]
[E(+)=E#1]
[I(T)=I#1]
[M(-)=M#1]
[P(+)=P#1]
[S(-)=S#1]
[U(T)=U#1]
[V(T)=V#1]

[$GS(+)=M-?G#1 2]
[G(+)=M-?G#1]
[R(+)=S#1]
[L(+)=S-#1]
[K(+)=L*(E*DN)#1]
[F(T)=(R,M)(\B#1M)*]
*

Note the re-use of identifiers C and G to avoid having 'unofficial' names accessible to the console as would happen if distinct intermediate names were used: this is permissible only in the pre-loaded definitions and is unambiguous because they cannot be redefined.

The extra level of macro calling in these two definitions is needed because parameters cannot be converted when referencing a primitive operation directly; the only acceptable parameter form in this context is '#n' where 1 <= n <= 4.

The primitive operations correspond to those identifiers which are not defined in the above table; their names are fixed.

The '*' at the end of the list above switches the editor into command mode after loading the standard commands at the start of an editor run.

Appendix 2




This Appendix contains a listing of the editor program; the implementation is intended to show how the algorithm works and is certainly not efficient either in CPU terms or in its use of storage.

See section 6 for suggestions as to how the program may be efficiently implemented.