This Thesis deals with various aspects of the design of interactive context editor programs, special consideration being given to the design of the command language so as to increase the rate with which beginning users become familiar with it.
A description of an Editor program is given, with special emphasis on the organisation and implementation of the command decoding mechanism as a simple macro-assembler; this facilitates the provision of a command definition feature and also permits the complete separation of the decoding phase from the execution of the command.
This separation can be exploited when moving the program from one
operating environment to another as the editing routines of necessity
make assumptions about this environment whereas the decoder is much less
affected.
1. Introduction
2. On-Line Text Editors: the State of the Art
3. On Humans
4. Command Language Design
5. Program Design
6. Implementation
7. Conclusions
8. Appendices:
(1) User Manual
(2) Program Listing
The last five years have seen a sharp rise in the numbers of computer users in Universities and similar establishments; these new users come from a wide range of specialities and their immediate concern is with what a computer may be able to do for them, not with how it does it.
A parallel development has been the small but growing proportion of computer time which is available on-line, that is, immediately available through some kind of typewriter terminal rather than on a batch basis. On-line access may be provided either by a small machine with a single user or by a large machine with a time-sharing operating system such as Michigan's mts or Edinburgh's emas.
Whether large or small, the computer system now includes the terminal(s) through which it is accessed and these constitute a potential bottleneck. The cost-effectiveness of such a system depends not only on the efficiency of the programs being executed in machine terms, but also on the real-time effectiveness of its users at their terminals; for small machines with a single user, this latter is the only point worth considering as there are very few applications indeed in which small single user computers are cpu bound for more than a small fraction of the real time for which they are in use.
All on-line systems with any pretence of being general-purpose now offer an interactive text editor as a basic utility, this one program probably requiring the most frequent responses from more users than any other program on the system. It has the capacity to cause its users much grief if they make mistakes which it fails to detect or cannot cope with, and is frequently very badly designed from an ergonomic point of view.
This thesis is a study of these editor programs with suggestions as to how the breed might be improved. The problem is approached from two directions: firstly, what features in the command language are likely to improve the performance of its human users and secondly, how should the program be implemented to optimise its use of machine resources.
In the course of this study, it became clear that the differences between designing these programs for large, time-shared as opposed to small, single-user computers were not so great and could largely be avoided. I shall not therefore, despite the title, restrict myself to small machines when it seems inappropriate to do so.
The editor which is subsequently described has evolved from three others, written one each by Mr Alan Freeman, myself and Mr Hamish Dewar in that order. Acknowledgement is due to these sources, especially the latter who wrote an imp compiler which was invaluable in developing these programs; also Professor Sidney Michaelson who was always available as a source of information and encouragement.
Any interactive program is, so far as its users are concerned, defined completely by its command language and real-time performance; any inaccessible features might just as well not be there and commands which are accepted but not honoured represent errors in the implementation. Consequently, a simple way to look at the features available in a range of, in this case interactive editors, is merely to examine their command languages.
The original batch system 'amender' is a device without any explicit commands; all 'lines', corresponding directly with card images, are numbered and replacement of a line is specified by repunching a card with the same sequence number as the one it is to replace. The use of sequence numbering in place of counting newlines has the highly desirable effect that insertion and deletion of lines does not cause the numbering of all subsequent lines to change, and so obviates the need to get a clean listing every time an alteration is made. The first interactive editors were simply developments of this sort of program, modified to make them easier to use from a typewriter terminal, the biggest single step being the introduction of context dependent commands such as 'find this string', 'change this string to something else' and so on. Editors of this sort are currently distributed by almost all computer manufacturers as 'context editors' and advertised as a selling point in the machine's software package.
A typical example is the editor distributed by Digital Equipment Corpn (DEC) as a component of the Advanced Software System for its PDP9-15 family of computers.
This editor accepts and executes commands from the user one at a time; it is not possible to concatenate multiple commands into a single command string. While the manufacturers describe it as 'a powerful context-editing program .....(which) is most frequently used to modify MACRO and FORTRAN source programs, but (it) can also be used to edit any symbolic text', the commands are very definitely oriented towards editing fixed- rather than free-form text, even in free-format programming languages such as ALGOL.
For example, the commands include two search instructions, one of which is intended purely for finding labels in the assembly language of the machine, and the insert mode for typing in long passages of text is switched on/off by a blank line. It is thus impossible to create sequences of blank lines in the output file with this editor, and indeed, its read-in routine removes any such sequences present on input to the program.
A weak feature is that all commands take effect from the start of a line; thus specifying sufficient context to make alterations towards the end of a line can be tricky, even in programming languages; in free-form text it would be much more so.
The editor really operates in block mode where 'pages' are loaded into and written from a block buffer in response to explicit user commands; any line currently in the buffer can be accessed. To simplify its use for quick simple alterations, a line-by-line mode can be set in which a single line buffer is automatically read and written by any command which causes movement to the next line in the file with the resulting major restriction that backwards movement is not permitted except to go right back to the beginning. Many of the most useful features, the ability to create intermediate backup copies, to rename files when clashes occur, to concatenate files and so on are very heavily dependent on particular features of the operating system.
A version of the program is distributed which additionally displays the current state of that part of the file being operated upon, on a CRT display; command requests and diagnostic information are still via the typewriter.
Editors of this sort have the obvious, and certainly not to be underrated, attraction that the command structure is very simple indeed and that the beginning user is unlikely to destroy his files inadvertently; even if he does, the ability to create and replace backup copies is built into the editor and so more likely to be used than otherwise. Against this, the program imposes restrictions on the format of the output files which can be generated; it is not possible to circumvent the restriction on multiple newlines except by producing lines containing only say a space character which is always liable to be removed as redundant by some other piece of software.
The fact that the command language is simple is undoubtedly desirable to begin with, but its inflexibility and inability to extend with the user's skill are likely to exasperate the more proficient user; to learn to use an editor of this sort is to go up a blind alley since there are no facilities whatsoever for concatenating or repeating commands and therefore a very definite upper bound on what can be achieved. Additionally, the inability to concatenate commands prevents the user from segmenting any particular editing job into pieces which fit his individual preconceptions of what is to be done; by so dictating both the user's tactics and strategy, this lack of concatenation may in fact make the command language more difficult to use, even for beginners, and so be a more serious omission than it might at first sight appear.
In sharp contrast to the above is a program called TECO, also marketed by DEC but with their PDP-10 computer. There can be little doubt that this is one of the most powerful editors available anywhere but equally that its command language is one of the most rebarbative and its error messages the most unhelpful.
TECO's command language contains many of the features, such as arithmetic, iteration, conditional branching, concatenation of commands, of a general purpose programming language and it is purely from this that its power derives; the primitive operations offered are truly primitive and some mnemonics hardly qualify as such - notably 'R' is used for 'left' and 'L' for 'right'.
A rudimentary command definition facility is available as TECO can be made to execute any arbitrary piece of text to which it has access as a command string; it is therefore possible to save a piece of text for later use, which may well be as a command string. To facilitate this sort of thing, thirty-six 'Q-registers' are available as workspace: these can hold either numerical values or text strings; a pushdown stack is also provided along with 'push' and 'pop' operations.
It is a matter of common observation that people will learn to use any command language, no matter how bad, if they are exposed to it for long enough; the main weakness, and especially for casual users, of a system such as TECO is that no bridge exists between its command structure and that of the 'safe' PDP-10 editor, LINED, which is similar to, but a little more primitive than, the PDP9-15 editor described above.
So far we have been concerned purely with program editors whose main function is to make changes to the body of program text or simple data files; even free-format computer languages such as ALGOL are very stylised indeed when compared with natural language free-form text as represented by the manuscripts of books, theses or even computer manuals. Rice and van Dam[3] describe various editors designed to meet this need, and especially to provide features such that the user can structure his file hierarchically and inspect and make alterations to it at whatever level he feels is useful at a particular time: either as words, sentences, pages or even 'hyperpages' of any arbitrary size. The main difficulty seems to be that the demand for such systems is currently rather limited and that those which have been built quite understandably tend to exploit expensive hardware to its limits, thus making them specific to a particular configuration; the use of a light pen to issue commands, rather than the more conventional and widely available typewriter terminal is an example of this.
An argument could be advanced for using CRT display devices purely for output, as in the PDP9-15 editor described earlier, on the grounds that it is less demanding of the display device and so potentially less costly. Further, hard copy of the commands issued during an editing session remains available should this subsequently be required; the volume of this copy should be substantially reduced by removing the need to print out lines of the file here and there as editing proceeds. Further reductions in this direction could be achieved by improved command languages which require the typing of fewer command lines to obtain a given effect. Developments in this direction, rather than eliminating the typewriter device altogether, mean that the CRT display becomes a desirable option which increases the performance of an editor rather than a necessity without which it will not work at all.
A possible weakness of the free-form editors described in [3], with the exception of Stanford Research Institute's NLS, is that they either use non-printing format control characters or represent the desired format of the text quite literally, with the result that the files so produced may make demands which cannot be met by some software systems. A more basic objection is that if format information is conveyed implicitly rather than by explicit markers, there are likely to be ambiguities which require some knowledge, not only of the grammatical structure of the language, but even of the semantic content of the text in question, for their resolution. For example, in the absence of any explicit cues, is a newline followed by six spaces a new paragraph or merely an indentation? If these ambiguities cannot be satisfactorily resolved, then the systematic alteration of the output format, merely by changing a few parameters, is impossible. With regard to prevailing file system standards, it is perhaps significant that the authors of [3] say that their article was produced with the aid of a variety of free-form editing systems, then go on to bemoan the fact that it had to be completely retyped because of file incompatibility between two different systems apparently running on the same computer.
My own feeling is that free form text editing using wide screen CRT's interactively with a light pen or cursor is unlikely to gain wide acceptance because of the cost and specialised nature of the hardware required and facilities offered, and because it involves learning to use a new program of rather restricted applicability; this is especially so when similar effects can be achieved in many cases with much cheaper and more widely available equipment. This document was produced using the editor subsequently described herein to modify files stored in a canonical form in which all format control information is represented symbolically, as are underlining and uppercase letters, and whose own physical layout bears no necessary relationship to that of the output; this form may be generated automatically from original input which contains the required layout information represented literally, and any errors resulting from ambiguity in the input then removed. The canonical form for any natural language text is, bit for bit, slightly more compact than the original as it requires a smaller alphabet. Proof reading was done with the aid of a CRT display and the formatting program[4] which also generated the final copy. This formatter also produces cleaned-up versions of its input files so there was no need when editing to make changes in a tidy manner; this saved much effort on my part. The whole system runs on an 8K PDP9 with DECtape I/O; as was pointed out by the authors of [3], the ability to treat a file as a hierarchical structure is invaluable and I would regard this as the system's main lack, but one which it would unfortunately require a disk to remedy. Nonetheless it produces acceptable typeset copy with a very small hardware outlay compared to the free-form editors they describe.
To summarise, a wide range of editors are commercially available, most of which are primarily program editors and are often line-by-line, one-command-at-a-time affairs like the PDP9-15 editor described at the start of this section. A notable exception to this is TECO which implements something similar to a multi-register computer's assembly language except that it has string operations instead of floating point arithmetic. The former is easy to use, the latter easy to have disastrous accidents with.
The free form editors described in [3] with one exception make what, according to the standards currently prevailing, must be regarded as unrealistic demands on the capabilities of the file storage systems likely to be generally available. Unless large numbers of people demand higher standards, these needs are unlikely to be met generally, and these editors will remain, as at present, dependent on particular file storage and character code conventions which may cause trouble in some other operating environment.
I think the biggest single hardware advance would be the wider
availability and use of CRT's purely as output devices to display the
relevant portion of a file continually, this being updated as changes
are made. This, together with improved command languages could
substantially reduce the bulk of hard copy generated by an editing
session without eliminating it as an aid to memory. An added advantage
is that editors designed for this sort of environment work perfectly
well when the CRT is not available; the same is not true of editors
designed to use the display interactively with a light-pen. In a
multi-access environment where the system may be entered from remote
terminals, this last point is not to be taken lightly.
References:
1. PDP9 Context Editor User Manual (DEC-15-YWZA-D)
Digital Equipment Corpn., Maynard, Massachusetts.
2. TECO: Text Editor and Corrector User Manual (DEC-10-ETEB-D)
Digital Equipment Corpn., Maynard, Massachusetts.
3. ACM Computing Surveys, Vol 3, Number 3, September 1971
Andries van Dam and David E. Rice
On-Line Text Editing: A Survey.
4. Edinburgh University Computer Science Department Internal Documentation.
C.H. Whitfield, August 1971
FORMAT: A free-form text formatting program.
It should be obvious that the real performance of any highly interactive computer program, such as an editor, is going to depend heavily on the behaviour of its users sitting at their consoles: if they make large numbers of errors or have difficulties with the command language when these could be reduced by changes to the program, then this must be regarded as a failing of the program and not of its users. It is therefore perfectly valid to consider aspects of human behaviour in designing the command language of such programs and so appropriate to review at this point what is currently known about human performance in the relevant areas.
We shall be mainly concerned with the learning of speed-skills which fortunately, because of its obvious commercial significance, is a very well researched area of psychology. To allow self-styled 'hard', as opposed to 'soft' scientists to read on, I would point out that we are only interested in how some hypothetical average human performs and not in why, so acceptance of results does not imply acceptance of any particular theory to account for the results. Nonetheless, I feel bound to say that E.R.Crossman[1] in particular has produced a mathematically formulated and adequately parameterised positive feedback model which predicts experimental results very well for a wide range of speed-skills.
Any system utility such as an editor will have a large number of users, whether singly on a small machine, or many-at-a-time on a large, time-shared one; it is therefore valid to concern ourselves with some hypothetical average user rather than with individuals. The behaviour of this user can then be quantified as can any other component of the system and individual differences ignored. It is conceivable that a very wide range of abilities might cause problems arising from the conflicting requirements of the two ends of the continuum; such difficulties cannot, by definition, be resolved with a single command language so the best we can do is to make the 'accept' band in the middle as wide as possible by making the command language flexible and able to appear differently in different circumstances.
Despite beliefs to the contrary, most of our behaviour is automatic and unconscious, this in fact being essential if we are to operate at a viable rate. The acquisition of a speed-skill involves quite simply the automation of large, invariant chunks of the activity; thus, speed is gained at the expense of flexibility. Subjectively, in the early stages of learning, execution of the activity is slow, tiring and variable from one cycle to the next. As larger and larger sections of the skill are automated, performance tends by discrete jumps towards the final state in which the highly overlearned skill is largely involuntary and very inflexible, acquiring almost the character of a reflex. It is resistant to conscious intervention either running its course as usual or breaking down completely; consider for example reciting the alphabet backwards or interchangeably driving cars with three- and four-speed gearboxes.
More formally, in the initial stages of learning and for a wide range of tasks, humans process information at approximately ten bits/second and errors increase with speed. Much conscious intervention is required, with consequent fatigue. In contrast to this, highly overlearned speed skills may be executed at apparent speeds of around forty bits/second and the relationship between errors and speed breaks down. Note especially that in this state, reducing speed may well increase errors, as may conscious interference with the unconscious performance of the skill; performance is much more consistent and consequently more predictable. Also, Bartlett[2] has reported that the invariant elements do not degrade with fatigue but only the higher level control functions, which, taken with the increased performance possible, provides two reasons for wishing our user to automate his skill as quickly as possible. Everyday skills such as using natural languages, playing ball games, operating machines and so on all include large components which are automated to the degree characterised above.
An obvious way to make a command language easy to learn is for it to exploit the already-learned components of other skills, but it must be emphasised that this can very easily rebound. If a spurious similarity exists between an already-learned skill and one which is to be acquired, learning will be made much more difficult and indeed intermittent errors may persist almost indefinitely. This is the basis of the so-called interference effects which will be referred to later.
It can be shown[3] that humans work best at a constant stress level and that, if permitted to do so, they adjust their work rate accordingly to very near the optimum. If made to work faster they make a disproportionate number of errors; if made to work more slowly, they lose concentration and work even slower.
The optimum value referred to above is that rate at which correct responses are produced at the highest rate; it is not such that no errors are made. What appears to happen is that self-paced activities, such as using a text editor, settle down to some work rate at which the error rate is non-zero but acceptably low. Since 'acceptable' is in this context defined by the consequences of making such an error, we can expect robustness, or lack of it, in the command decoder to affect the average work rate independently of, and in addition to, the structure of the command language.
Using a text editor can as a first approximation be segmented into
deciding what alteration is required then making it. These further
sub-divide into, firstly, deciding what logical change is necessary and
how this change can be effected by alterations to the body of the text,
and secondly, into converting the static concept of the effect
required into a sequence of commands to achieve this effect, then
converting this into editor commands.
 ---------      -------------      -------------      ----------
! specify !    ! static form !    ! sequence    !    ! real     !
! logical ! -> ! of text     ! -> ! of          ! -> ! editor   !
! change  !    ! alterations !    ! imperatives !    ! commands !
 ---------      -------------      -------------      ----------
    (a)              (b)                 (c)               (d)

                               Fig 3-1
Step a-b must of necessity lie with the user and may involve considerable intellectual effort; I contend that the better the command language the more of (b), (c) and (d) will be taken over by the command decoder and thus involve minimal effort on the part of the user. Where commands cannot be concatenated, the user has to organise tactics as well as strategy and is responsible for b-c and c-d. Permitting command concatenation relieves him of the tactics c-d.
By mimicking the form of the natural language imperative we can
gain a considerable advantage in the number of commands which the user
can hold in his short-term memory merely because the gross structure, if
not the detail, is familiar. A further gain can be had by automating
step b-c where appropriate; this can be done by using a static
prepositional form to qualify an imperative, thus disguising two
commands as a little more than one. For example:
Find 'FRED' Insert 'JIM' (1)
Before 'FRED' Insert 'JIM' (2)
Examples (1) and (2) might very well be commands having identical effects; the suggestion is that (2) is better than (1) because it is derived from one natural language imperative sentence rather than two. G.A.Miller[4] has shown that short term memory capacity is generally greater for more structured objects than for less structured ones; an obvious example is the use of hexadecimal and octal numbering schemes to describe strings of binary digits.
G.A.Miller again[5] has shown that transformation times for natural language constructions may be quite appreciable; he quotes for example 1.5 seconds to convert an active simple sentence to its passive form. The effort involved in converting one imperative modified by a prepositional clause into two imperatives is unlikely to be trivial, and effort leads to fatigue.
Improvements in this direction will have their biggest effect on the time required by new users to reach some arbitrary level of proficiency with a command language but it seems highly probable that even with long term users, some effect will be detectable even after long periods of fairly intensive use; studies of speed-skills such as are required on assembly lines have shown periods of anything up to two or even three years before performance stopped improving.
It was suggested earlier that the more of an activity could be learned as a speed-skill the better, and that such learning would proceed more quickly when other already-learned skills could be exploited; consequently the natural English imperative form was mimicked. The final stage is to use mnemonic commands to reduce the amount of typing required without, hopefully, losing the parallel with the already learned background skill possessed by most of the users, that is, natural English. This is possible because, despite the views of Lewis Carroll and Humpty Dumpty, and whether we mean it to or not, any symbol to which we have been exposed acquires associations of one sort or another. In the restricted context of, in this case, single letter mnemonics, text editors and English speakers, many of these associations will be stable within the population and can therefore be exploited.
In the light of previous remarks made about learning speed-skills it should be clear that attempts to use associations like 'L' for 'right' and 'R' for 'left' will produce severe interference effects which will be very difficult to extinguish. We must especially beware of superficially rational arguments to justify the use of patently bad mnemonics. The previous example might be rationalised by something of the form: 'we have a command 'L' which means go forward so many Lines and another 'R' which means Retreat so many lines; what could be more rational, consistent and elegant than that the degenerate form 'L0', abbreviated to 'L', should mean go to the right-hand end, and similarly 'R' to the left-hand end, of the current line?'
Note that interference is a two-way dynamic process; here the natural language interferes with the artificial one and vice versa, any period without exposure to one increasing the relative strength of the other. The loser, again particularly for casual users, in any deal of this sort is bound to be the artificial language.
Another point of lesser importance but well worth considering is the possible effect of operant conditioning on our terminal user's performance. While it is no doubt unpleasant to see ourselves as sharing aspects of behaviour with white laboratory rats, it is nonetheless true that we condition just as effectively. Anyone who doubts this might ask himself what the attraction of fruit machines and bingo is. The basic requirement for operant conditioning to occur is quite simply that some activity becomes temporally connected with something that the subject regards as pleasant and that continued activity continues to produce the desired effect. The most durable conditioning is produced when the rewards are randomly distributed in time and proportionate to the effort expended but sometimes do not materialise ('random ratio reinforcement'). I would suggest on this basis that we need not worry about lack of motivation in our console users.
I make no apologies for this exceedingly sketchy section; the contents can be found in any second year experimental psychology text though in some cases the theoretical models offered are weak. This section is important because whatever the failings of current psychological theorising in some areas, the effects the theories seek to model are perfectly real and will not go away if we simply ignore them.
Since they are real and do affect the potential performance
of the user/interactive program composite to a marked degree, it is
perfectly proper that they should be considered in designing the user
interface for such a program.
References
1. Ergonomics, Vol 2 1959, pp163-166
E.R.F.W. Crossman
A Theory of the Acquisition of Speed-skill +
2. Proceedings of the Royal Society, B, vol 131, 1943, pp248-57. (Ferrier Lecture)
F.C. Bartlett
Fatigue Following Highly Skilled Work +
3. J. Exp. Psych., 1966, 71, 849-857
P.M. Fitts
Cognitive Aspects of Information Processing:
III. Set for Speed vs. Accuracy
4. Psychological Review 1956, 63, 81-97
G.A. Miller
The magical number seven plus or minus two: Some limits on our capacity for processing information.
5. American Psychologist, 1962, 17, 748-762
G.A. Miller
Some Psychological Studies of Grammar.
+ Reprinted in:
Penguin Modern Psychology Readings: Skills
David Legge(Ed.)
An editor is simply a text manipulation program which operates on an input file according to instructions given it by its user; compared to even a simple macro generator, its string structuring facilities are minimal, and string operators and operands are in any case provided by many modern high-level programming languages. The point of all this is that in designing a command language, while we may wish it to be powerful and flexible, it must be borne in mind that there are jobs which it could do that are more conveniently done either with a macro generator or by writing, compiling and running a short program in some suitable high-level programming language, and that consequently the decision as to where the line should finally be drawn is by its very nature an arbitrary one.
In descending order of importance, I see the significant
requirements to be:
1. The command language must be a single language: that is, there
must be a continuous gradation from the simplest commands used by the
least experienced user to the most complicated ones used by the most
proficient.
2. The basic command structure and simple commands must be very
easy to learn; initial impressions tend to stick.
3. The command language must be as redundant as is reasonably
possible to make it less likely that errors will go undetected.
4. There should be a command definition capability: (1) almost
requires iteration, command concatenation, conditional branching and
so on, thus making it possible in principle to build up very
complicated command sequences. Most of this power is not practically
available if no macro facility is provided and such commands have to
be typed from scratch every time they are used.
All command languages have two attributes in terms of which they can be evaluated: these might be characterised as power and utility, the former being easily defined in a negative way by whatever cannot be specified in the particular language. The latter is more elusive and must be defined in terms of our hypothetical user's performance over a set task using different languages; note that this carries the implicit assumption that enough is common within the sampling groups for stable comparisons to be obtained. By mimicking the form of the natural language imperative in the native language of the members of the groups I would hope to meet this requirement.
It must be realised that most editing is simple and that an editor without simple, obvious, 'safe' commands is pretty well useless; any editor ever written as such, even TECO, is a puny device compared to a powerful macro generator and there is no point in building editors which provide power without utility. Note especially that languages which provide economical, succinct descriptions of events are almost invariably obscure to the general population because they make many assumptions about the background knowledge of those they are intended for; such languages are totally unsuited to the off-the-cuff production of non-optimal command strings by the unspecialised users of a system utility. They also tend to be less redundant, thus making it more likely that typing errors will go undetected.
A feature required rather infrequently but one about which a disproportionate amount is heard because of the difficulties it causes with most editors when the need does arise, is the ability to segment a file into an arbitrary number of pieces and then reassemble them in a different order. The real solution to this is the hierarchical file structure mentioned earlier but this is a comparatively expensive feature to implement on conventional linear storage devices and certainly not justifiable in a general purpose program editor. Furthermore, a language suitable for describing simple search/delete/insert style operations on a linear file is not suited to describing restructuring operations on a tree-like structure; what I am saying is quite simply that to include such a feature in a general purpose editor is to include a group of commands which bear little relation to the others: this is both in direct contravention of (1) and represents a continuing overhead for a little used facility. If a distinct language is required to describe these operations and they are infrequently used, they might as well be in a separate program.
The language finally settled on provides for concatenation of
commands, iteration either for a specified number of repetitions or
until some condition is met or a combination of the two, specification
of alternative command sequences to be tried successively until one of
them succeeds or they all fail, and definition of new commands with user
chosen names in terms of the commands already available.
Acceptable command strings belong to the class:

<command string>    = '!' <n> <query>,
                      '[' <definition> '=' <command>* ']',
                      <sequence>;
where:
<sequence>          = <alternative> (',' <alternative>)*;
<alternative>       = <command> (<command>)*;
<command>           = '(' <sequence> ')' <count>,
                      <qualified command>,
                      <simple command>;
and:
<definition>        = <name> '(' ('+','-','T')4? ')',
                      <name>;
<qualified command> = '\' <command>,
                      '?' <command>,
                      <command>;
<name>              = '$' <character>*,
                      <character>;
<character>         = any character but newline.
Text strings are delimited by '/', ':' or '.', redundant delimiters
between consecutive text strings being elided. The special symbol '-'
can be used to stand for the last text string typed at any point in a
command string.
Within command definitions, '#n', 1 <= n <= 4, refers to the nth parameter and matches with the nth type specifier.
Commands defined in terms of other commands are not bound at
definition time but at call time; it is not therefore generally possible
to check new commands when they are defined.
It is not possible to concatenate command definitions or the
special re-execute command, '!'<n><query>.
Note that the language contains context-sensitive elements not explicitly mentioned above; for example, parameters must match commands, '-' means '-1' as a number and 'the previous text parameter' if decoded as a text string, for non-definition commands, command names must be defined and so on.
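By way of illustration only, the following minimal sketch in C (the editor itself is written in IMP, and all names here are invented) shows how a decoder might resolve the context-sensitive symbol '-' according to the type of parameter it expects:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical parameter types: 'N' expects a number, 'T' expects text. */
    static char last_text[64] = "";   /* most recent text parameter typed */

    static void decode_param(char type, const char *token)
    {
        if (strcmp(token, "-") == 0) {
            if (type == 'N')
                printf("number parameter: -1\n");
            else
                printf("text parameter (previous): \"%s\"\n", last_text);
            return;
        }
        if (type == 'T') {
            strncpy(last_text, token, sizeof last_text - 1);
            printf("text parameter: \"%s\"\n", token);
        } else {
            printf("number parameter: %s\n", token);
        }
    }

    int main(void)
    {
        decode_param('T', "FRED");   /* F/FRED/                       */
        decode_param('T', "-");      /* C-/JIM/ : '-' re-uses "FRED"  */
        decode_param('N', "-");      /* as a count, '-' means -1      */
        return 0;
    }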
Examples of syntactically correct strings follow; see the user
manual in Appendix 1 for more information about semantics.
The following strings are syntactically acceptable, assuming of
course that the command names used are defined.
1. Change 'fred' to 'jim' and print the line.
C /FRED/JIM/ P
2. Find 'xxx' and change it to 'yyy'
F/XXX/ C-/YYY/
3. Do (2) throughout the whole file.
(F/XXX/ C-/YYY/)*
4. Define a command to perform this global substitute.
[$GLOB(TT)=(F#1 C-#2)* ]
5. Call new command defined in (4).
$GLOB /XXX/YYY/
6. Close off the new file, terminate editor run.
$CLOSE
7. Possible definition for the 'find' command(F) used above: move right one character position(R) or,
failing that, move(M) to the next line (so that consecutive 'find's
locate consecutive occurrences of the target string), try to set
pointer 'before'(B) target string on this line then move to next
line; do this until 'B' succeeds.
[f(t)=(r,m)(\b#1m)*]
The desirable properties of this language I would list as:
(a) The simple forms can be made very similar to English
imperative forms: examples (1), (2) and (6) above.
(b) Straightforward repetition is a very simple variation on the
non-repeated form: examples (2) and (3)
(c) User defined commands are called in exactly the same ways as
any others and the definition of them is a straightforward variation
on the immediately-executed form: examples (3), (4) and (5).
(d) Although by definition, complicated commands are complicated,
the 'find' command in example (7) is a straightforward extension of
the forms used in the preceding six examples: we are dealing with a
single language, not a collection of odds-and-ends.
(e) Despite (a), it can be decoded by an LR(0) parser; the
fact that no backtracking is ever required makes it cheaper to
implement than otherwise.
The undesirable properties of the language stem from its very
generality and are all connected with the production of diagnostic
information after errors have been detected.

From the outset, the intention was to make the program as easily
transferable and modifiable as possible, these two unfortunately going
together as an efficient implementation of a program of this sort, as
opposed to one with simpler i/o requirements and less dependent on
character manipulation operations, of necessity makes assumptions about
the installation on which it is to run: primarily these are word-length,
byte-oriented addressing or not, virtual memory or not and if not, how
much main store and what sort of fast backing store if any, and lastly,
is it a single user machine. Unfortunately, all these considerations
affect strategy rather than tactics so it is unlikely that much
automatic assistance will be available. I have therefore attempted to
isolate the various parts of the program so that they can be altered
individually without repercussions elsewhere.
Especially important is that the command decoder must be completely
separate from the editor proper since it is this latter part which will
be affected most by changes in the operating environment. This
separation is in fact forced upon us by the requirement for a command
definition facility and the immediate consequence that it becomes
unacceptably inefficient to interpret the user's command string directly
at execution-time, the alternative being to generate some other
representation of it.
The program falls quite naturally into a self-contained command
decoder which converts command strings into a pseudo-code which is
executed by a simple sequencing mechanism which calls the required
editing operations in the appropriate order; editing routines are thus
self-contained and their internal operation is largely separated from
each other and completely from the decoding and sequencing mechanisms.
All editing operations are defined on a single line-buffer with the
conversion between the line-buffer and the format of line-records held
in a bulk buffer being performed by a pair of complementary routines:
the bulk buffer format can thus be changed, for example to suit
computers with differing word-lengths and/or byte-oriented addressing
without causing difficulties elsewhere. This is important with programs
such as editors where the data buffer space required is likely to be
quite large; altering the packing density of these line records provides
a very simple way of trading speed for storage to suit the operating
environment.
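By way of illustration, the following C sketch shows one possible pair of complementary routines of this kind; the two-characters-per-word record format is assumed purely for the example and is not the format of the listing, which stores one character per word:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Line buffer: one character per int.  Bulk record (illustrative): a   */
    /* length word followed by words each holding two 8-bit characters.     */

    static int pack(const int *line, int len, uint16_t *rec)
    {
        int w = 1;
        rec[0] = (uint16_t)len;                 /* header: character count  */
        for (int i = 0; i < len; i += 2) {
            uint16_t word = (uint16_t)(line[i] << 8);
            if (i + 1 < len) word |= (uint16_t)(line[i + 1] & 0xFF);
            rec[w++] = word;
        }
        return w;                               /* words used, incl. header */
    }

    static int unpack(const uint16_t *rec, int *line)
    {
        int len = rec[0];
        for (int i = 0; i < len; i++) {
            uint16_t word = rec[1 + i / 2];
            line[i] = (i % 2 == 0) ? (word >> 8) : (word & 0xFF);
        }
        return len;
    }

    int main(void)
    {
        const char *text = "HELLO WORLD";
        int line[64], out[64];
        uint16_t rec[64];

        for (int i = 0; text[i]; i++) line[i] = text[i];
        int words = pack(line, (int)strlen(text), rec);
        int len   = unpack(rec, out);

        printf("%d characters packed into %d words: ", len, words);
        for (int i = 0; i < len; i++) putchar(out[i]);
        putchar('\n');
        return 0;
    }

Altering only this pair of routines changes the packing density, which is exactly the speed/storage trade referred to above.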
The command language consists of a number of fixed special symbols,
'(', ')', '*' and so on, together with an indefinitely large and
variable number of command names; command names always precede any
parameters they may require and multi-letter names are explicitly
marked: it is thus an LR(0) language and is easily decoded by a
specialised, but simple, keyword macro scheme which assigns a fixed
significance to the 'special symbols' and looks the variable elements up
in a macro definition table. Extensions to the language are made simply
by making additions to this table.
To remove the possibility of recursive loops in what is after all
a command decoder not a general purpose macrogenerator, any particular
macro only has access, when being expanded, to those definitions which
were already present when it itself was defined: it can thus never call
itself, either directly or via some arbitrarily long chain of
intermediate calls. An immediate consequence of this restriction is
that when a command is redefined, it must quite literally be replaced
rather than deleted and a new definition generated, otherwise the
possibility arises that other macros which accessed it prior to
redefinition might not subsequently be able to do so; a secondary but
nonetheless highly desirable spin-off is that the effective length of
the macro table on the (n+1)th call must be less than it was on the
nth and consequently, that as we go further into the parse tree and the
number of nodes and derived searches increases, the mean length of these
searches decreases with a consequent saving in execution time.
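A minimal sketch in C of this restriction (table contents and names invented; the real table layout is described later): each entry may only be looked up in the part of the table that existed when it was defined, so the search bound shrinks at every level of expansion and no chain of calls can ever recurse.

    #include <stdio.h>
    #include <string.h>

    struct macro { const char *name; const char *body; };

    static struct macro table[] = {
        { "P", "print-line" },            /* primitive stand-ins       */
        { "F", "search" },
        { "G", "F P" },                   /* may call F and P only     */
        { "H", "G G" },                   /* may call F, P and G       */
    };

    /* Search only the part of the table visible to the caller. */
    static int lookup(const char *name, int limit)
    {
        for (int i = limit - 1; i >= 0; i--)
            if (strcmp(table[i].name, name) == 0)
                return i;
        return -1;
    }

    static void expand(const char *body, int limit)
    {
        while (*body) {
            char tok[32];
            int n = 0;
            while (*body == ' ') body++;
            while (*body && *body != ' ' && n < 31) tok[n++] = *body++;
            tok[n] = '\0';
            if (n == 0) break;
            int i = lookup(tok, limit);
            if (i >= 0)
                expand(table[i].body, i);   /* bound shrinks to i: no recursion */
            else
                printf("emit: %s\n", tok);  /* not a macro: emit as it stands   */
        }
    }

    int main(void)
    {
        expand("H", 4);     /* expand the most recent definition */
        return 0;
    }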
While macro substitution can convert one string of input symbols to
another, the result is still a string of characters; to permit the
complete separation of the decoder and editor functions it is desirable
to transliterate the characters into numerical codes which are acted
upon by the sequencing mechanism and editing routines. In the interests
of increased execution-time efficiency, by avoiding unnecessary
scanning, it is desirable that the decoder should be able to calculate
what are effectively absolute addresses and this is duly done.
Addresses are planted by chaining together all points where a
reference is required then replacing these list pointers by the value of
the current location counter at some suitable time: no symbolic
labelling mechanism as such is required. This is possible because all
jumps, except those connected directly with the use of parentheses, are
forward jumps.
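The following C fragment sketches the technique; for clarity the code here is planted upwards from location zero, whereas the listing plants its code in the opposite direction, and the cell layout is invented for the example.

    #include <stdio.h>

    /* Forward references to an as-yet-unknown address are chained through   */
    /* the operand cells themselves; when the address becomes known the chain*/
    /* is walked and each cell overwritten with the location counter.        */

    static int code[32];
    static int lc = 0;            /* location counter */

    /* Plant a jump whose destination is not yet known; return new chain head. */
    static int emit_fwd_jump(int chain)
    {
        code[lc] = chain;         /* cell temporarily holds the previous link */
        return lc++;              /* this cell is now the head of the chain   */
    }

    /* The destination is now known: fix every cell on the chain. */
    static void fix_chain(int chain)
    {
        while (chain >= 0) {
            int next = code[chain];
            code[chain] = lc;     /* real target address */
            chain = next;
        }
    }

    int main(void)
    {
        int chain = -1;                   /* empty chain                     */

        chain = emit_fwd_jump(chain);     /* e.g. one exit from a loop       */
        code[lc++] = 100;                 /* some other planted code         */
        chain = emit_fwd_jump(chain);     /* another exit from the same loop */
        code[lc++] = 200;

        fix_chain(chain);                 /* loop end reached: lc is the target */

        for (int i = 0; i < lc; i++)
            printf("code[%d] = %d\n", i, code[i]);
        return 0;
    }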
This decoder thus has most of the essential features of a simple
macro-assembler plus a number of features which are required to process
the special characters, '(', ')' and so on, which were mentioned
earlier. In practice, and unlike many general purpose macrogenerators,
this decoder appears to be very resistant to mistreatment due largely to
the presence of rigid checking of all parameters and the absence of
features like definitions within other macro calls and recursive calling
sequences. The absence of these latter two means that the environment
is static throughout the decoding of any particular command string, thus
greatly reducing the complexity of 'peculiar' cases that may arise; they
can all be diagnosed in terms of parameter mismatching and/or the
absence of a macro name from the appropriate part of the definition
table. The protection of the basic system macros and checking of
parameters at each level of calling means that diagnostics can be
printed which reference objects that the user knows of. It goes without
saying that all these restrictions reduce the power of the decoder but
I consider the trade-off of power for robustness a good one for reasons
made clear in the preceding sections.
Some traditional macrogenerator features could be put in, for
example the generation of sequential integers, but such features, in the
context of the whole editor, would be much better implemented as editing
operations than put in the command decoder.
The problem mentioned in the last section of defining the state of
the editor after an execution-time failure is dealt with by specifying
that any 'within line' command leaves everything as it was prior to the
attempted execution while multi-line commands in general do not: the
resetting of pointers is implemented in the editing routines themselves
for the small number(3) of operations involved. While the basic
commands conform to this standard, it must be clear that macros can
easily be defined that do not: suppose we have a primitive command 'R
n' which moves n character positions to the right within the current
line and fails if this is not possible - consider the difference between
'R5' and '(R1)5' on failure.
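A small C sketch of the distinction (the buffer details are invented): the unitary primitive checks before it moves, so on failure the character pointer is untouched, whereas the iterated form leaves it wherever the last successful step put it.

    #include <stdio.h>

    static int linelen = 7;               /* length of the current line        */
    static int cp = 5;                    /* character pointer within the line */

    /* Unitary form: 'R n' moves the full n positions or not at all. */
    static int r_unitary(int n)
    {
        if (cp + n > linelen) return 0;   /* fail, cp unchanged */
        cp += n;
        return 1;
    }

    /* '(R1)n': n repetitions of 'R 1'; on failure cp stays where it got to. */
    static int r_iterated(int n)
    {
        for (int i = 0; i < n; i++)
            if (!r_unitary(1)) return 0;
        return 1;
    }

    int main(void)
    {
        printf("R5    : %s, cp = %d\n", r_unitary(5)  ? "succeeds" : "fails", cp);
        cp = 5;
        printf("(R1)5 : %s, cp = %d\n", r_iterated(5) ? "succeeds" : "fails", cp);
        return 0;
    }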
The notion of a unitary command is useful as one which either
succeeds completely or alternatively, fails completely leaving the state
of the editor unchanged: all primitives except 'move' are unitary as
defined but this can clearly be changed by techniques as above. It
seems highly desirable that all commands should be unitary and a
recovery mechanism linked to the scope-defining parentheses would give
this. In the above, 'failure' means, of course, genuine unforeseen
failure as opposed to situations which were anticipated and allowed for
with '*' or '?'.
The peripherals attached to a small computer are likely to be much
more variable than those on a large machine; the small machine may have
paper tape only, magnetic tape or possibly a disk: the amount of core
store available is likely to vary by anything up to a factor of six. In
particular, these factors affect how requests to move backward through
the file are dealt with: do we rely on holding as much as possible in
main store and deny access to anything that has been written out or do
we write and read back again intermediate files, only copying to the
final output at the end of the editing session? The former method can
give much higher performance for reasonably short files but imposes
rigid restraints on the size and so the complexity of the editor program
as enough store must be saved to give an adequately large buffer. The
alternative method will be slower because of the file copying required
which may be a critical point unless a disk is available. The editor on
which this thesis is based operates in this latter way. Note that the
above problem only exists when comparatively slow bulk storage is
available: with paper-tape only there is no real choice in the matter
while with a disk, the difficulty disappears.
A program listing of this editor appears as Appendix 2: the
intention when writing the version which appears was to make the program
easy to follow while still containing all salient features. I must
emphasise that the implementation whose listing appears was not intended
to be and is not efficient in any operational sense: notably, it uses
whole machine words to store 8-bit character values and contains a bulk
move routine and the line-record pack and unpack routines written in a
high-level language.
Implementation of this editor requires the implementation of the
editor program per se and also the specification of a standard command
set in terms of definitions which reference primitive operations defined
in that program. Since the definition facilities are described in the
user manual appearing in Appendix 1, all that requires to be said about
this latter topic is that it is largely a matter of convenience which
commands are defined as primitives and which are macros: the constraints
which do emerge are concerned with the state of the editor's various
pointers after an error condition has been trapped, which point has just
been dealt with in the previous section.
In the following section, numbers in square brackets refer to line
numbers in the program listing in Appendix 2.

The command decoder[91:547] requires access to two main data areas,
array 'ST' which is used to hold a stack, and array 'B' which
contains both the macro definition table and decoder output. This
output consists of numerical function codes with associated references
to literals, strings and numerical values, and the literals themselves.
The literals are stored from the bottom of B upwards, the macro table
and the interpretable code from the top downwards in that order. New
commands can only be defined from the basic command level so there is no
risk of these latter two getting mixed up. Storing text literals from
the bottom up gives a marginal advantage over the other way around
because they can then be stored immediately even though their final
length is not at that point known. It has the slightly bizarre result
that the 'code' has to be planted and executed in the reverse direction
to what might seem more usual.

'W' points to the first free cell above the literals, 'X' to the
first free cell below the interpretable code, 'Y' to the last cell
occupied by the macro table and 'Z' to the first(non-existent) cell
after the table as in Fig 6-1 above.

The macro table consists of at least as many entries as there are
commands defined with the most recently defined command being
represented by the entry starting at B(Y). Each entry consists of a
macro name and body each preceded by a length cell which, since these
objects are stored consecutively, gives the required offset to the next
length cell.
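A minimal sketch in C of such a search (entry contents invented; the real table also grows downwards from the top of B and carries a parameter-type code and protection bit, omitted here):

    #include <stdio.h>

    /* Each entry: [name length] [name chars...] [body length] [body chars...] */
    /* One character per cell; the length cells give the offset to the next.   */

    static int B[] = {
        1, 'P',            10, 'p','r','i','n','t','-','l','i','n','e',
        5, '$','G','L','O','B', 12, '(','F','#','1',' ','C','-','#','2',')','*',' ',
    };
    static int entries = 2;

    static int match(int at, const char *name)
    {
        int len = B[at];
        int i;
        for (i = 0; i < len; i++)
            if (name[i] == '\0' || B[at + 1 + i] != name[i]) return 0;
        return name[i] == '\0';
    }

    /* Walk the table entry by entry using the length cells. */
    static int find(const char *name)
    {
        int at = 0;
        for (int e = 0; e < entries; e++) {
            int namelen = B[at];
            int bodyat  = at + 1 + namelen;
            if (match(at, name)) return bodyat;   /* length cell of the body */
            at = bodyat + 1 + B[bodyat];          /* skip body: next entry   */
        }
        return -1;
    }

    int main(void)
    {
        int bodyat = find("$GLOB");
        if (bodyat >= 0) {
            printf("body: ");
            for (int i = 0; i < B[bodyat]; i++) putchar(B[bodyat + 1 + i]);
            putchar('\n');
        }
        return 0;
    }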
In this implementation and in the interests of simplicity, the
macro table entries, except for the length cells, contain one seven-bit
character in a machine word. A first obvious step to improve the use of
storage is to pack up these characters in some suitable way. If the
target machine is byte-oriented, all cells, including the length cells
will fit into eight bits; if the machine word-length is a multiple of
six, six-bit characters might be more appropriate when it will be
necessary to use a shift character if more than the restricted IBM
card punch character set is to be catered for. In either case,
alterations are required only in

The length cell for the macro body currently holds both the length
information and an eight-bit code word specifying the parameter types
required: in a byte-oriented implementation it will be necessary to put
this into a separate cell thus decreasing the maximum permissible
definition length by one, requiring changes at [497] and [286]. The
general layout in the current implementation is as shown above in Fig
6-2 but note that the length cell associated with the name holds a
protection bit which is tested before redefinition; macro names, all
else being equal, are thus restricted to being half the length of
bodies.
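Purely as an illustration of holding two fields in one cell, the following C fragment packs and unpacks a length and an eight-bit type code; the field positions and widths are assumptions for the example, not those of the listing.

    #include <stdio.h>

    #define TYPE_SHIFT   8
    #define LEN_MASK     0xFF          /* length in the low 8 bits (assumed) */
    #define PROTECT_BIT  0x100         /* protection bit for names (assumed) */

    static int make_body_cell(int typecode, int length)
    {
        return (typecode << TYPE_SHIFT) | (length & LEN_MASK);
    }

    int main(void)
    {
        int cell = make_body_cell(0x03 /* say, two text parameters */, 12);

        printf("length    = %d\n", cell & LEN_MASK);
        printf("type code = 0x%02X\n", (cell >> TYPE_SHIFT) & 0xFF);

        int namecell = 5 | PROTECT_BIT;          /* protected name, length 5 */
        printf("protected = %s\n", (namecell & PROTECT_BIT) ? "yes" : "no");
        return 0;
    }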
The decoder is the recursive routine EXPAND[120] which operates
by calling itself to expand references to other macros within the
string currently being processed; within each level there is a need to
keep track of the balanced parentheses which delimit the scope of loops
in the command string; in this implementation this is done by a separate
stack kept in 'ST' and the stack pointer passed as a parameter of
each call of 'EXPAND'. This is unfortunately required in IMP by
the necessity of passing a variable number of parameters with each call
on EXPAND, this being done by loading them onto this private stack
before each recursive call. When writing in machine code, a single stack is
all that is required: since parentheses must be balanced within each
macro definition to permit localisation of bracketing errors, we obtain
the desirable side-effect from the strict nesting of calls and
parentheses that a single pushdown acceptor suffices.
Routines PUSHBR[151] and POPBR[167] keep track of the number
of left and right parentheses encountered, POPBR also planting code
to set the execution time loop and branch conditions associated with
each bracket pair. An extra bracket pair is inserted around a command
[210] if it is either a command typed in by the user - that is the first
non-recursive call on EXPAND, or if an unparenthesised command was
qualified by '\' or '?'. This simplifies implementation of the
execution portion of the program by converting statements of the form
'\M5' into '\(M5)'.

The sequencing mechanism itself is simply a pushdown automaton controlled by numerical codes
stored in 'B' by the decoder; it controls the dispatching of control to
the various editing routines and tests the result (succeed/fail) on
return, also maintaining loop counts for all partially executed
repetitions. Array 'ST' is used to hold a single stack needed to
implement the pushdown.
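A much simplified C sketch of such a sequencer (the opcode names, encoding and failure handling are invented for the example; the real pseudo-code is richer and is executed in the reverse direction): it dispatches numerical codes to stand-in editing routines, keeps loop counts on a single stack, and takes a failure exit when a routine reports failure.

    #include <stdio.h>

    enum { OP_CALL, OP_LOOP, OP_END, OP_STOP };

    /* Stand-in editing routines: return 1 on success, 0 on failure. */
    static int calls = 0;
    static int op_right(void) { return ++calls <= 2; }   /* fails the 3rd time */
    static int op_print(void) { printf("print line\n"); return 1; }
    static int (*routine[])(void) = { op_right, op_print };

    int main(void)
    {
        /* (R P)*  --  repeat "right then print" until R fails, then stop.   */
        int code[] = {
            OP_LOOP, 1000,            /* '(' with a generous repetition limit */
            OP_CALL, 0, 9,            /* call R; on failure jump to code[9]   */
            OP_CALL, 1, 9,            /* call P; on failure jump to code[9]   */
            OP_END,                   /* ')' of the loop                      */
            OP_STOP,
        };
        int st[32], sp = 0;           /* single stack: count, loop start      */
        int pc = 0;

        for (;;) {
            switch (code[pc]) {
            case OP_LOOP:
                st[sp++] = code[pc + 1];      /* remaining repetitions */
                st[sp++] = pc + 2;            /* start of loop body    */
                pc += 2;
                break;
            case OP_CALL:
                if (routine[code[pc + 1]]())
                    pc += 3;
                else {
                    pc = code[pc + 2];        /* failure exit          */
                    sp = 0;                   /* abandon open loops    */
                }
                break;
            case OP_END:
                if (--st[sp - 2] > 0)
                    pc = st[sp - 1];          /* go round again        */
                else {
                    sp -= 2;
                    pc += 1;
                }
                break;
            case OP_STOP:
                printf("stopped at %d\n", pc);
                return 0;
            }
        }
    }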
The handling of failure conditions is dealt with by associating an
error word with each alternative execution path: should a failure occur,
this word is decoded at ERROR EXIT[611] and appropriate action taken.
There are two parts to the editor proper: a buffer control routine,
MOVE TO[829] and the editing routines defined on a single line
buffer. A pair of complementary routines, PACK[664] and
UNPACK[709] convert between the format of the bulk buffer and that of
the line buffer where all editing is done. Both these buffers are held
in array AA with all text above the current line at one end, towards
AMIN, and that below, towards the other end, with the line buffer in
the gap between. This scheme has the highly desirable property of
concentrating all activity in AA into a small area and should reduce
the amount of page turning in virtual memory systems.
The contents of the bulk buffer are in line records each with a
header word containing the offset to the header word of the next record,
backwards if above the current line, forwards if below. It is thus
possible to move backwards or forwards a number of lines without having
to unpack the intervening ones. A consequence of this is that,
relative to a straight move, multi-line text searches are slower with
this scheme than they might be with some others. Also, note that
of the two conversion routines, PACK and UNPACK, it is more
important that UNPACK should be efficient because lines must always
be unpacked before they can be examined but only need to be repacked if
they are changed; in particular, text searches involve unpacking.
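A minimal C sketch of hopping along such records without unpacking them (the record layout shown is illustrative only, covers the forward direction alone, and stores one character per cell):

    #include <stdio.h>

    /* Records below the current line: each starts with a header word giving */
    /* the offset to the next header, followed by the characters of the line.*/

    static int buf[] = {
        4, 'O','N','E',
        4, 'T','W','O',
        6, 'T','H','R','E','E',
        0,                          /* end of buffer */
    };

    /* Hop n line records forward without unpacking the intervening ones. */
    static int skip_lines(int at, int n)
    {
        while (n-- > 0 && buf[at] != 0)
            at += buf[at];
        return at;
    }

    int main(void)
    {
        int at = skip_lines(0, 2);          /* header of the third line */
        printf("third line: ");
        for (int i = 1; i < buf[at]; i++)
            putchar(buf[at + i]);
        putchar('\n');
        return 0;
    }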
In the interests of a simple implementation, all file i/o is via
the line buffer so that the only path to/from the bulk buffer is via
PACK/UNPACK. While this may seem highly desirable in that it means
there need only be one copy each of the pack and unpack routines it in
fact causes problems mentioned later. In a byte-oriented environment
where the transformation between the bulk and line buffer formats is a
trivial one, the above becomes an unnecessary restriction which would be
better done away with. In particular, MOVE TO would be much more
efficient if it had direct access to the bulk buffer without going
through this path.
MOVE TO guarantees to position the 'window' held in the bulk
buffer in such a way as to minimise access time, tape winding and so on:
it must therefore be very heavily influenced by the environment in which
the program is to run. It also guarantees to return the buffer in good
order with spare space left for insertions thus a call to move to the
current line is an easy way for the rest of the editor to make space in
the buffer. The routine has to cope with two main problems: firstly
that access may be required to a line which was, but no longer is, in
the buffer and secondly, that the buffer is full and there is nowhere to
make an insertion. The latter is quite likely to turn into the first
problem as it is desirable to move the window in something larger than single
line increments if the backing store is comparatively slow, say magnetic
tape or a slow disk; this may result in the point at which insertions
were being made becoming inaccessible. The routine implemented here is
a simple heuristic program which aims to over-react slightly to whatever
caused the problem to arise so that hopefully it will take longer next
time to recur. Also attempted, not very successfully so far, is the
reduction of a series of single line 'move' commands into one longer one
mainly to increase the efficiency of 'F'. This last problem is brought
on entirely by the command structure; the editor proper is executing a
sequence of commands completely in the dark as to what is to happen
next.
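By way of illustration only, the over-reaction amounts to something like the following; the figures are invented and the listing uses its own heuristic.

    /* When the window must be moved, move it a little further than the */
    /* minimum so that the same condition takes longer to recur.        */
    static int window_shift(int lines_needed)
    {
        return lines_needed + lines_needed / 2 + 4;   /* modest over-reaction */
    }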
The editing routines start at [919] and, as they are all defined on a single line buffer, are simple and should be self-explanatory. 'Insert' suffers from the problem alluded to above, that the only i/o path to and from the bulk buffer is through the line buffer: if 'insert' runs out of space, the line buffer is needed to clear the condition, which unfortunately involves repacking the current line first and so losing the position of CP within it. A solution of sorts would be to ensure that enough space is available at all times to insert a single newline, and to redefine the 'insert' macro so that if the primitive failed for lack of space the macro inserted a newline and then moved backwards one line to cause MOVE TO to tidy the buffer up and make space; deleting a newline, in fact the one just inserted, would then restore CP and enable us to continue. This no doubt illustrates the power of the command language but seems a little messy otherwise.
The obvious solution in a byte-oriented environment is to access the
buffer directly.
Besides the insertion of packing routines, if these are called for by the machine structure, large speed gains can be had by recoding the bulk
move routine MOVE[67] to exploit the characteristics of whatever
machine the program is to run on. Smaller gains can be had in a large
number of places where consecutive store locations are accessed, notably
in string matching and stacking and unstacking variables; compiler
generated code tends to be bad in this respect because it uses the
CPU to perform trivial address arithmetic which can often be done
directly by the machine hardware. In programs of this sort where these
operations are important, substantial speed gains can be obtained.
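The point may be illustrated with a present-day example: the element-by-element loop a compiler typically produces, against the machine's block move reached here through memmove; the names are illustrative.

    #include <string.h>

    static void move_words_slow(int *dst, const int *src, int n)
    {
        for (int i = 0; i < n; i++)   /* address arithmetic done by the CPU */
            dst[i] = src[i];
    }

    static void move_words_fast(int *dst, const int *src, int n)
    {
        /* one call to the hardware block move; memmove also copes with   */
        /* overlapping regions, which a bulk move within one buffer needs */
        memmove(dst, src, (size_t)n * sizeof *src);
    }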
These gains are worth having even in a small single user machine
because they can usually be traded for storage, thus adding to the
buffer-size. Note that in general while speed gains are fairly easily
had by recoding critical routines, reduction of the code size involves
extensive recoding, either automatically or by hand. Editors are a special case here because they invariably have a large data buffer, only a small part of which is accessible at any instant, and large amounts of space can be saved by packing this buffer more densely; the problem is that dense packing is a time-consuming business. The situation thus arises that a large amount of (data) space can be saved by recoding a small part of the program to make dense packing acceptably fast. The two-level buffer used here is designed, among other things, to facilitate this.
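A sketch of the kind of denser packing meant, four 8-bit characters to a 32-bit word; only this small routine then needs recoding for speed. The layout is illustrative and is not the packing used in the listing.

    #include <stdint.h>

    static void pack4(const unsigned char *chars, uint32_t *words, int nchars)
    {
        for (int i = 0; i < nchars; i++) {
            int shift = (i & 3) * 8;          /* byte position within the word */
            if (shift == 0)
                words[i >> 2] = 0;            /* starting a new word           */
            words[i >> 2] |= (uint32_t)chars[i] << shift;
        }
    }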
A point worth mentioning, but no more, is that this implementation does not as it stands deal with the full ISO character set; in particular, all lower case letters are converted to upper case. The main saving is in execution time: otherwise it would be desirable for the search routine to match strings irrespective of case differences, which would represent a continuous overhead; and since the file system into which the intermediate copies are written cannot cope with the full set anyway, the feature was left out. Clearly, with the command structure which exists it is very easy to put this feature in, together with the necessary commands to enable and disable the case equivalencing, should it be required.
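Were it wanted, the case-equivalent match itself is simple enough; a sketch in C follows (the listing's search routine is organised differently).

    #include <ctype.h>

    /* Does the pattern match the text at this position, ignoring case? */
    static int match_ignoring_case(const char *text, const char *pattern)
    {
        while (*pattern != '\0') {
            if (toupper((unsigned char)*text) != toupper((unsigned char)*pattern))
                return 0;
            text++;
            pattern++;
        }
        return 1;
    }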
The current state of the art would appear to be such that 'editor'
usually means 'program editor' and that these devices tend to contain
features, for example throwing away multiple newlines, which would be
unacceptable if processing free-form text directly; they also tend to
have 'closed-ended' command languages which pose fairly severe
restrictions on what can be done with them. There are one or two
exceptions to these generalisations, notably TECO, but these all go
too far the other way, sacrificing utility for power and consequently
demanding quite a high degree of expertise on the part of their users.
For casual users especially, this lack of a single command language
will prove to be a nuisance and appears to be completely unnecessary.
In addition to the structure of the languages, command mnemonics
are badly chosen in many cases, again disproportionately increasing the
difficulties of the casual users since bad mnemonics are essentially
those which are forgotten most quickly.
Two sets of mnemonics can be compared in terms of the extent to which each interferes with the other: this was not done because there is no point in doing it with specially made-up mnemonics, which could be arranged to deliver whatever result was required. Doing it with editor command mnemonics requires two fairly large groups, thirty at least, of casual users. These were not immediately available, though
with the more widespread use of EMAS this should be possible in a
year or two. The problem lies in obtaining unspecialised users; as
was noted elsewhere, people will learn any set of mnemonics if forced to
use it for long enough but their subsequent behaviour in this small area
could hardly be said to be typical of the population at large.
The editor described here has the command decoder isolated from the
editor proper, the decoder and sequencing mechanism making no reference
to editing at all. In a paged, virtual-memory system, keyboard response
time could probably be improved by providing a table-driven command
decoder as a system utility since then hopefully the code representing
the decoder would be more heavily used and so less likely to cause page
delays. My main doubts here were whether or not a macro scheme could be
made sufficiently robust: it seems that it can.
The gross structure of an editor program is very little affected by
the environment, except perhaps for paper tape only i/o, but its
performance may be very greatly enhanced by modifications which make it
fit this environment better; these modifications will be at the level of
strategy rather than machine-level tactics and little help will be
available from automatic optimisers. It is therefore highly desirable
that editor programs should be modular to facilitate making such
changes. Trading CPU capacity for storage is particularly easy with editors because of the presence of large data buffers whose packing density can be increased for the price of the processor time to do it.
A fairly large number of special purpose editing systems are
available for modifying free-form text. At the current time most i/o
systems are designed primarily for dealing with text files representing
program source texts and data, and may not be capable of meeting more
stringent requirements. I would expect this, and the fact that many of these editors depend on CRT displays, to limit their appeal.
The feeling I get from looking at a number of the editors on the
market is that there are too many of them, this largely stemming from
the fact that they seem to be either safe but very restrictive or
powerful at the expense of being impossible for a beginning user to
start with; the editor offered here is intended to fill this gap.
This document describes a context editor: it is primarily a program
editor but can genuinely be used on free-form text subject to the
restriction that the output file, if not the input, will contain
lines terminated by a newline character, each being no longer than
some implementation-defined maximum, and that the last character in the
output file will therefore always be a newline. No special free-form formatting facilities, such as page generation or right-justification of lines, are included.
All commands consist of an alphabetic command name followed
optionally by up to four parameters, each of which may be either a text
string or a number: numbers may be omitted when they take the default
value of +1; no such default is available for text strings. Spaces are
ignored except where necessary to separate consecutive numerical
parameters or mark the end of a multi-letter command name. Multi-letter
names must be preceded by '$' to avoid confusion with consecutive
parameterless single letter commands.
The special symbol '-' may be used to stand for the last text
parameter typed at any point in a command string; 'N' must be used to
stand for quoted newline. If interpreted as a number, '-' alone means
-1.
In principle, any number of commands may be concatenated to form a
single command string whose elements are executed serially exactly as if
they were typed in one at a time; execution of the whole string stops at
that point if any element fails. Where a repeated component can be
isolated, unnecessary typing can be avoided by repetition of that
component, thus '(XYZ)3' is exactly equivalent to 'XYZXYZXYZ'. If
it is not clear how many times a sub-string should be repeated, the
number can be followed by '?': thus '(XYZ)3?' specifies that any
number of repetitions up to three, zero included, is a correct execution
of the command. '*' can be used as short-hand for a very large number
followed by '?', thus for practical purposes '(XYZ)*' is a command to
execute 'XYZ' repeatedly until it fails.
Two or more sub-strings may be specified as alternatives to be
tried in order until one of them succeeds; if none succeed, then the
whole alternative list has failed. For example, 'ABC,PQR,XYZ'
specifies a command which succeeds if any of its three component parts
succeeds and fails only if they all fail. The main use of this
feature is in repeated commands as above; for example we might type
'(ABC,PQR,XYZ)*' which will continue to execute until all three
components fail on the same repetition. Execution of the alternative
list starts at the left hand end on each repetition irrespective of
which of its components was executed on the previous one.
The conditional branching in the previous paragraph is dependent on
a decision as to whether a particular element of a command failed or
succeeded; the sensing of these conditions can be inverted by
preceding any command by a '\', whereupon the branch will occur when the command succeeds rather than, as above, when it fails. More precisely,
the branch condition for any command derives from comparing its result
(succeed/fail) with its immediately enclosing environment which may also
be inverted: thus '(x)' and '\(\x)' are equivalent as are '\(x)' and
'(\x)'.
New commands may be defined with names chosen by the user, the
effect when they are subsequently called being exactly as if the command
itself had been typed out each time. These may be parameterised in the
same way as standard commands. The standard commands are defined by a
text file read in each time the program is loaded and can thus be
changed very easily merely by editing that file; this should be borne in
mind when reading the rest of this document.
The primitive operations and the standard commands all take effect
relative to a character pointer (CP) which initially is at the
start of the input file. Also used subsequently is the notion of the
current line, this being the line of the file in which CP
currently resides.
To aid description, in the following examples a line is at some point taken to be line N and other lines are thereafter referred to as N+1, N+2 and so on. This numbering is fixed at the time N is assigned; for example, N+1 and N+3 are consecutive lines if N+2 is removed.
Note that the string search does not ignore spaces so that
although for example, 'PRINT(' and 'PRINT (' have the same meaning
in IMP, they are different strings of characters and will be treated
as such by the Editor.
Command Definitions
Any command string which can be typed directly into the editor can
be given a name and so added to the command set available.
The format of a command definition is:
[ name ( parameter types ) = definition body ]
Up to four parameters are permitted in any one definition and their types may be text string (T), positive integer (+) and positive or negative integer (-). For example, the 'change' command can be defined:
[C(TT)=D#1I#2]
where 'C' is the name of a new command which has two text parameters (TT) to which '#1' and '#2' refer.
Restrictions are:
1. No name must be longer than 126 and no definition body longer than 254 characters.
2. Names must be defined before they are referred to elsewhere, though it is quite permissible subsequently to change the body associated with a name.
3. '-' as a text parameter cannot be used within a command definition.
Definition is in fact destructive re-definition, a new command name
being created only when that name is not already available; attempts to
dynamically change the standard pre-loaded commands will be monitored.
These can be changed by editing a short text file of definitions which
is read every time the editor is loaded.
Note that commands are bound at call-time not definition-time and
that consequently changing a command A which is referenced by
another, B, will alter the effect of B. This can be exploited
when building up large commands by permitting parts of them to be changed without retyping them completely.
It would perhaps be as well to point out here that, while this editor implements a fairly powerful command language, long editor commands are grossly inefficient compared with a compiled string-manipulation program written to achieve the same effect, and restraint should be exercised in the use made of the language; this rider of course does not apply if the program is run on a small single-user computer.
A further caution is that, as commands are not bound until call-time, they cannot be checked at definition-time. This does not affect the standard commands because, once they are loaded, they cannot be altered or renamed on that run of the editor, but new user-defined macros should be checked out carefully, as would be done with any other program.
A list of the definitions of the standard commands follows:
[G(++)=G#2#1]
[C(+)=Z#1]
[$CLOSE=C1]
[$REWIND=C3]
[A(T)=A#1]
[B(T)=B#1]
[C(TT)=D#1I#2]
[D(T)=D#1]
[E(+)=E#1]
[I(T)=I#1]
[M(-)=M#1]
[P(+)=P#1]
[S(-)=S#1]
[U(T)=U#1]
[V(T)=V#1]
[$GS(+)=M-?G#1 2]
[G(+)=M-?G#1]
[R(+)=S#1]
[L(+)=S-#1]
[K(+)=L*(E*DN)#1]
[F(T)=(R,M)(\B#1M)*]
*
Note the re-use of identifiers C and G to avoid having
'unofficial' names accessible to the console as would happen if distinct
intermediate names were used: this is permissible only in the pre-loaded
definitions and is unambiguous because they cannot be redefined.
The extra level of macro calling in these two definitions is needed
because parameters cannot be converted when referencing a primitive
operation directly; the only acceptable parameter form in this context
is '#n' where 1 <= n <= 4.
The primitive operations correspond to those identifiers which are
not defined in the above table; their names are fixed.
The '*' at the end of the list above switches the editor into
command mode after loading the standard commands at the start of an
editor run.
This Appendix contains a listing of the editor program; the
implementation is intended to show how the algorithm works and is
certainly not efficient either in CPU terms or in its use of storage.
See section 6 for suggestions as to how the program may be
efficiently implemented.
(f) The production of error messages is complicated by the
absence of any necessary connection between the command string typed
by the user and the one actually executed.
(g) Ideally, error trapping should be such that after any
(h) Since we have implemented a programming language complete
with looping and conditional branching instructions, there is no
decode-time guarantee obtainable that any particular command will
terminate in a finite time: detection of some non-terminating loops is
possible in special cases but not in general; the use of a large
finite value to represent '*' is dictated by this.
(i) Because of its generality, the language includes many unusual
forms which might be considered sufficiently so to be regarded as
typing errors: for example, because of the presence of the alternative
feature (',') and the null
Program Design
Implementation
Command Decoder
------------------------------------------------------------
!  -> -> ->  !          !  <- <- <- <-   !  <- <- <- <-    !
!  literals  !   free   ! interpretable  !   macro defns   !
!            !          !      code      !                 !
------------------------------------------------------------
W            X          Y                Z
             (arrows show direction of growth)
                          Fig 6-1
------------------------------------------------------
!  q - p  !         !  r - q  !                      !
!         !  macro  !         !        macro         !
!---------!         !---------!                      !
!  free   !  name   !  8-bit  !      definition      !
!         !         !  code   !                      !
!         !         !  word   !                      !
------------------------------------------------------
P                   Q                                R
                       Fig 6-2
Sequencing Mechanism [553:627]
Editing Routines
Conclusions
Appendix 2: EDIT5B User Manual
Basic Structure
Text Referencing Commands
F 'text string'
Find the next occurrence of 'text string', searching sequentially
through the input file from top to bottom and left to right starting at
the next character but one after CP. By skipping one character we
ensure that successive calls on 'F' locate successive occurrences of
'text string'. 'F' is the only text referencing command whose scope
is greater than the current line.
A 'text string'
Set CP after the text string specified.
B 'text string'
Set CP before the text string specified.
C 'text1'text2'
Change 'text1' to 'text2'.
D 'text string'
Delete the text string specified.
I 'text string'
Insert the specified text string to the left of CP.
U 'text string'
Delete from CP up to but excluding the specified text string.
V 'text string'
Verify that the text string specified lies immediately to the right of
CP: fail unless it does.
Note
1. '/', ':' and '.' may be used as string quotes in addition to
the more conventional ''' used above; these others are more
conveniently placed on most keyboards.
2. All the above commands except 'F' fail, leaving CP
unchanged, unless the text string specified lies between CP and the
end of the current line.
Commands with Numerical Parameters
E +n
Erase _n characters to right of CP.
G +n
Insert _n lines from the user console above the current line. CP is
moved back to the start of the current line. Typing ':' on a line by
itself causes a switch back to command mode at any time and irrespective
of _n. When in command mode the prompt character is '>'; when in 'insert' mode after a 'G' command, it is ':'.
K +n
Kill _n lines including the current line. CP is set to the start of
the line following the last one removed.
L +n
Move CP _n character positions left.
R +n
Move CP _n character positions right.
M ±n
Move backwards(-) or forwards(+) _n lines in the file.
P +n
Print _n lines on the user console starting with the current line; the
current position of CP is marked with an up-arrow unless it is at the
very beginning of a line. If _n is greater than one, CP is moved
to the start of the last line printed.
Other Commands
$CLOSE
Close off the output file and terminate this run of the editor
program.
$GS +n
Secondary input version of 'G +n'; lines come from a secondary input
file rather than the console.
$REWIND
Rewind the secondary input file referenced by the previous
command.
! +n
Execute the last command string typed _n more times.
Examples(1)
>P 5
take the current line as N; print lines N to N+4 inclusive on the console and leave CP set to the start of N+4.
>MP
take the current line to be N; set CP to the start of N+1, then print N+1.
>F'3:' M-2 K3 M- P2
search through the file for the next occurrence of '3:'; take the current line as line N. Set CP to the start of N-2; destroy N-2, N-1, and N, leaving CP set to the start of N+1. Go back one line, setting CP to the start of N-3; print lines N-3 and N+1, which are now contiguous, finally leaving CP set to the start of N+1.
Examples(2)
1) Print the current line and every tenth line thereafter for the
next fifty lines:
>P( M10 P )5
2) Change '1' to '2' in the fifth source statement on a line of
IMP program text:
>( A';' )4 C'1'2'
3) Change calls on 'PRINT' to calls on 'WRITE' throughout
an IMP program. The latter has two, not three parameters.
>M-*( F'PRINT(' C-'WRITE(' A',' B- U')' P )*
In this last example, note the use of:
a) 'M-*' to make sure that CP really is at the top of the file
before starting.
b) 'P' to print out any line that is altered.
c) 'PRINT(' rather than just 'PRINT' so that for example,
'PRINT SYMBOL' won't be changed.
d) '-' to save retyping parameters.
e) Spaces to make the command string more readable and easier to
type correctly.