% File: parser.doc % Author: Peter Ross % Updated: 7 June 1985 This describes how to use the (very) simple chart parser written in Prolog in the file 'parser'. The program expects data in the form of Prolog clauses: [1] rule(Tag, Category, ExpansionList). The 'Tag' argument is just anbitrary Prolog term used to group together a set of rule/3 and lexical/3 clauses. The parser needs to be told this tag in order to know which set of these clauses to use as data. Since it is arbitrary, you could use a compound term to allow you to selectively include rules, at no extra cost. The 'Category' argument identifies a syntactic category, e.g. noun or sentence. The 'ExpansionList' is a list giving one possible expansion of that category into sub-categories. [2] lexical(Tag, Word, CategoryList). The 'Tag' argument is as above. The 'Word' is a word that may legitimately appear in the input. The 'CategoryList' is a list of the possible syntactic categories which that word might fit, e.g. 'fly' might be a noun or verb. [3] initial_category(Tag, Category). This states what the highest level category is - it should be unique. [4] strategy(Tag, Strategy). This should instantiate Strategy to either 'td' for top-down or 'bu' for bottom-up. The parser does not check it for validity. [5] policy(Tag, Policy). This should instantiate Policy to either 'df' for depth-first or 'bf' for breadth-first. The parser does not check it for validity. [6] ersatz_category(Tag, DummyCategoryName). By default, the category name 'user' is specially used by the system. It supposes that there is an extra rule of the form rule(Tag, user, [TopLevelCategory]) when doing top-down parsing. This extra caegory will of course appear in the final chart, and should help in the later analysis of it. If you are really attached to the category name 'user' but want to do some top-down parsing, then you can persuade the system to use a different name instead by including a clause of this form. IMPORTANT: The rule/3 and lexical/3 predicates are only used during the initialisation stage, when the transformed rules are constructed and when the initial chart is created. There are three important data structures manipulated by the program: [a] edge(Category, FoundList, NeedsList, StartVertex, EndVertex). This describes an edge in the chart. If you don't know what this means, read Winograd's book "Language As A Cognitive Process, vol 1" or the articles by Henry Thompson and Graeme Ritchie in "Artificial Intelligence: Tools, Techniques and Applications" by O'Shea and Eisenstadt. There are three odd details. Items on the 'FoundList' are of the form Category = VertexNumber A word from the input will appear in the 'FoundList' as word(Word) = VertexNumber The 'FoundList' is ordered so that the most recently found category is first. [b] the chart: ActiveEdgeList + InactiveEdgeList. The edges are segregated into two lists for convenience. The parser returns such a chart when it finishes. [c] the agenda: CandidateList - Hole. This is a difference list (a list with a variable at the end - 'Hole' is the variable). The items in the list are of the form ActiveEdge + InactiveEdge and are candidates (guaranteed successful, in fact) for an application of the fundamental rule of chart parsing. The default monitoring system prints out agendas. The program has two top level predicates. They are: [i] parse(Tag, WordList, MaxVertex, Chart). The Tag and WordList must be given. The MaxVertex and Chart must be variables. When the parse is done, MaxVertex will be instantiated to the largest vertex number (the lowest is 0), and Chart will be instantiated to the final chart. When doing top-down parsing, the parser adds one ersatz rule of the form rule(Tag,user,[TopMostCategory]) - there will be edges for the ersatz category 'user' to help you to examine the chart afterwards for successful parsings. You can substitute what you want for 'user' - see above. This predicate does some transformations on the rule/3 clauses supplied by the user. The transformations are all done by the predicate invert_rules(Tag). [ii] make_chart(Tag, WordList, MaxVertex, Chart). This is exactly like parse/4 above, but assumes that the rule transormations have already been done. When a parse has been done and a final chart has been produced, you may want to add to the word list and extend the chart, for example if using the parser for plan recognition (as far as I know, no-one has yet used a chart parser for this purpose - this is part of my own research work). You can do so by the following means: call initial_setup(Tag, Strategy, Policy, NewWordList, MinVertex, NewMaxVertex, OldChart, NewInitialChart, AgendaOfYourChoice, NewAgenda), chart(Tag, Strategy, Policy, NewInitialChart, NewAgenda, FinalChart) This will assume that more words are added between MinVertex and NewMaxVertex, and will add to the OldChart and AgendaOfYourChoice you gave it, to create a new FinalChart. Since all the information needed about the presence of any previous set of words (presumably ending at MinVertex) will be contained in the OldChart you got out of parsing that previous word sequence, this will incrementally add to the chart. The implementation of the fundamental rule is explicit within the code, so that it is easy to change for your own purposes. There is a simple monitoring mechanism. The predicates watch(Tag) nowatch(Tag) turn it on or off for the specified rule set. If on, it reports when the rule transformations are complete, and whenever an item on the agenda is processed, it tries to call the predicate user_mon(Tag,Strategy,Policy,OldChart,OldAgenda,NewChart,NewAgenda) If this fails, it uses a default strategy of printing out the new chart and agenda and waiting for a before continuing. There is a simple test rig, called by the predicate test(Tag). It makes use of a trivial utility print_chart(Chart) to print out the active and inactive edges after the parse has finished. The lists of edges are sorted before being printed, into the standard order of Prolog terms. A sample rule set can be found in the file 'sample'.