2. SOME THEORY It is well known that the syntax of a context-free language may be described by a set of phrases expressed as BNF productions. Alternatively, the phrases may be described by regular expressions. This is equivalent but more powerful. For example the language:- -> -> | + | - -> | May be written:- L -> exp (( + , - ) exp)* exp -> var , const The constructs used to form the regular expression are:- L -> A B which describes A and B following sequentially. L -> A , B describing A and B as alternatives, and, L -> A* describing A repeated zero or more times. An important property of regular expressions for communications applications is that any regular expression may also be represented as a state transition diagram. The equivalent representations for the basic constructs are given below:- L -> A B S --> A --> B --> F S --> A ---_ L -> A , B ! V ---> B --> F -- ! ! V / L -> A* S --> A --> F This property allows a finite state automaton (FSA) to be described by a regular expression. S and F are start and finish states respectively of the FSA. Alternatively a state transition diagram may be represented by a regular expression. (I'm not sure if this has been proved for the general case). In order to describe a context free language it is necessary to suppose that on encountering a non-terminal in the regular expression the state of the current FSA is saved on a stack and that a new FSA is entered to recognise the sub-phrase. When this new FSA terminates the saved state is popped from the stack and a return to the calling FSA is made. If the sub-phrase was recognised successfully then the non-terminal is recognised and the calling FSA continues to the next sequential term of the regular expression. If the sub-phrase failed to be recognised then the calling FSA must try an alternative. A finite state automation augmented by a stack of states in this way is called a push-down atomaton or PDA. For compiler applications PDA's can be used as syntax checkers, with each atom of the regular expresions programmed to recognise an atom of the input language. A complication is that ambiguities in the language may require alternative non-terminals to be recognised in parallel by concurrent FSA's. In this case the language and the PDA are called non-deterministic. For communications applications a deterministic FSA or PDA is normally suitable but it is useful to generalise the atoms of the regular expression to perform an arbitrary action. This action is viewed as an arbitrary predicate guarding entry to an arbitrary action procedure. If the guard is passed the action is carried out and the FSA makes its transition to the appropriate new state. If the guard fails then the FSA proceeds to the next alternative. It is interesting to note that deterministic PDA's have the same sort of constructs as structured programming languages. Eg sequential atoms correspond to sequential imperative statements, alternative atoms to %if ... %then ... %else ... constructs, and repetitive atoms to %while ... %cycle ... %repeat constructs. Non-terminals correspond to procedure calls; nothing corresponds to passing parameters to a procedure.