% File:		parser.doc
% Author:	Peter Ross
% Updated:	7 June 1985

This describes how to use the (very) simple chart parser written in Prolog
in the file 'parser'.

The program expects data in the form of Prolog clauses:

[1] rule(Tag, Category, ExpansionList).
	The 'Tag' argument is just anbitrary Prolog term used to group
	together a set of rule/3 and lexical/3 clauses. The parser needs
	to be told this tag in order to know which set of these clauses
	to use as data. Since it is arbitrary, you could use a compound
	term to allow you to selectively include rules, at no extra cost.
	The 'Category' argument identifies a syntactic category, e.g.
	noun or sentence.
	The 'ExpansionList' is a list giving one possible expansion of that
	category into sub-categories.

[2] lexical(Tag, Word, CategoryList).
	The 'Tag' argument is as above.
	The 'Word' is a word that may legitimately appear in the input.
	The 'CategoryList' is a list of the possible syntactic categories
	which that word might fit, e.g. 'fly' might be a noun or verb.

[3] initial_category(Tag, Category).
	This states what the highest level category is - it should be unique.

[4] strategy(Tag, Strategy).
	This should instantiate Strategy to either 'td' for top-down or
	'bu' for bottom-up. The parser does not check it for validity.

[5] policy(Tag, Policy).
	This should instantiate Policy to either 'df' for depth-first or
	'bf' for breadth-first. The parser does not check it for validity.

[6] ersatz_category(Tag, DummyCategoryName).
	By default, the category name 'user' is specially used by the system.
	It supposes that there is an extra rule of the form
		rule(Tag, user, [TopLevelCategory])
	when doing top-down parsing. This extra caegory will of course appear
	in the final chart, and should help in the later analysis of it. If
	you are really attached to the category name 'user' but want to do
	some top-down parsing, then you can persuade the system to use a
	different name instead by including a clause of this form.

IMPORTANT: The rule/3 and lexical/3 predicates are only used during the
	   initialisation stage, when the transformed rules are constructed
	   and when the initial chart is created.

There are three important data structures manipulated by the program:

[a] edge(Category, FoundList, NeedsList, StartVertex, EndVertex).
	This describes an edge in the chart. If you don't know what this
	means, read Winograd's book "Language As A Cognitive Process, vol 1"
	or the articles by Henry Thompson and Graeme Ritchie in "Artificial
	Intelligence: Tools, Techniques and Applications" by O'Shea and
	Eisenstadt.
	There are three odd details. Items on the 'FoundList' are of the form
		Category = VertexNumber
	A word from the input will appear in the 'FoundList' as 
		word(Word) = VertexNumber
	The 'FoundList' is ordered so that the most recently found category is 
	first.

[b] the chart: ActiveEdgeList + InactiveEdgeList.
	The edges are segregated into two lists for convenience.
	The parser returns such a chart when it finishes.

[c] the agenda: CandidateList - Hole.
	This is a difference list (a list with a variable at the end - 'Hole'
	is the variable). The items in the list are of the form
		ActiveEdge + InactiveEdge
	and are candidates (guaranteed successful, in fact) for an application 
	of the fundamental rule of chart parsing.
	The default monitoring system prints out agendas.

The program has two top level predicates. They are:

[i]   parse(Tag, WordList, MaxVertex, Chart).
	The Tag and WordList must be given. The MaxVertex and Chart must be
	variables. When the parse is done, MaxVertex will be instantiated to
	the largest vertex number (the lowest is 0), and Chart will be
	instantiated to the final chart. When doing top-down parsing, the
	parser adds one ersatz rule of the form 
		rule(Tag,user,[TopMostCategory])
	- there will be edges for the ersatz category 'user' to help you to 
	examine the chart afterwards for successful parsings. You can 
	substitute what you want for 'user' - see above. This predicate does 
	some transformations on the rule/3 clauses supplied by the user. The 
	transformations are all done by the predicate invert_rules(Tag).

[ii]  make_chart(Tag, WordList, MaxVertex, Chart).
	This is exactly like parse/4 above, but assumes that the rule
	transormations have already been done.

When a parse has been done and a final chart has been produced, you may
want to add to the word list and extend the chart, for example if using the
parser for plan recognition (as far as I know, no-one has yet used a chart
parser for this purpose - this is part of my own research work). You can
do so by the following means: call
	initial_setup(Tag, Strategy, Policy, NewWordList, 
		      MinVertex, NewMaxVertex,
		      OldChart, NewInitialChart,
		      AgendaOfYourChoice, NewAgenda),
	chart(Tag, Strategy, Policy, NewInitialChart, NewAgenda, FinalChart)
This will assume that more words are added between MinVertex and NewMaxVertex,
and will add to the OldChart and AgendaOfYourChoice you gave it, to create
a new FinalChart. Since all the information needed about the presence of
any previous set of words (presumably ending at MinVertex) will be contained
in the OldChart you got out of parsing that previous word sequence, this will
incrementally add to the chart.
	
The implementation of the fundamental rule is explicit within the code, so
that it is easy to change for your own purposes.


There is a simple monitoring mechanism. The predicates
	watch(Tag)	nowatch(Tag)
turn it on or off for the specified rule set. If on, it reports when the
rule transformations are complete, and whenever an item on the agenda is
processed, it tries to call the predicate
	user_mon(Tag,Strategy,Policy,OldChart,OldAgenda,NewChart,NewAgenda)
If this fails, it uses a default strategy of printing out the new chart and
agenda and waiting for a <cr> before continuing.

There is a simple test rig, called by the predicate
	test(Tag).
It makes use of a trivial utility 
	print_chart(Chart) 
to print out the active and inactive edges after the parse has finished.
The lists of edges are sorted before being printed, into the standard order
of Prolog terms.

A sample rule set can be found in the file 'sample'.