CHAPTER 5
DESIGN AND USE OF A COMPUTER PROGRAM TO INCORPORATE THE EDITCOST AND PHONCODE CORRECTORS

5.1. Introduction

The editcost and phoncode spelling correctors mentioned in chapter 4 have been developed: their design and use are outlined in this chapter. They are described in detail in chapter 7. The editcost corrector has been incorporated in an interactive spelling checking program, together with facilities for adding further words to the editcost dictionary (='addword') and for dictionary reference (='lookup'). This program was used by a group of children from the Reading Unit, as described in the second study (study 2), chapter 6. Whilst the phoncode program has not been used directly by the children, it has been tested on the full corpus of their errors (see chapter 8).

Text editors for use by children exist: one such editor ('Walter') is described in chapter 6. Assumptions about its use by children from the Reading Unit were tested in the first study (study 1) described in the same chapter. This text editor was designed and implemented by Sharples (Sharples, 1984) and used by him in a study of children's creative writing skills. It was chosen for use in study 1 for the following reasons:

1. children in the Sharples study had not had difficulty using it;
2. it fulfilled the necessary requirements of allowing the child to interactively create and alter text;
3. it was available for use on a machine that the children could access easily.

It is not suggested that Walter was the 'ideal text editor', however: other more suitable editors may exist or could be developed, e.g.
screen editors with multiple windows, perhaps. These will not be considered further here.

In this chapter, a description is given of how both the spelling correctors and other facilities could be incorporated in a larger program[1], satisfying the educational and technical design constraints imposed in chapter 4. A text editor, based on that used by Sharples, forms the shell of this program. The means by which these constraints are satisfied are summarised at the end of this chapter. The content and structure of the dictionaries used in study 2 are described, together with an example of the dictionary. A brief discussion of error detection and correction follows, together with short descriptions of the editcost and phoncode correction programs and other existing facilities.

5.2. Example of a hypothetical session

A hypothetical session is described in this section, based on an actual session taken from the second study. This example is constructed from:

- a protocol collected in the second study, where the editcost program, addword and lookup facilities were used;
- data from the first study where the text editor, Walter, was used.

The phoncode correction program is also incorporated in this example session.

During any one session, the child's writing is based on a particular topic. The topic for each session will have been decided at the end of the preceding session. It will relate to a project that the child will work on over a number of sessions.
The stimulus for writing will take one of a number of forms: it might be an interview, a demonstration, 'horror story' swopping, or 'last night's football match'. The teacher/investigator will discuss with the children what they might write.

The conventions used below for distinguishing between text displayed by the computer and text input by the user (the child) are:

    an example of text displayed by the computer
    an example of text input by the user

[1] This larger program does not currently exist. Walter was written in Pop-2, whereas the spelling correctors and other programs were written in Pascal. Given a version of Walter in Pascal, however, it would not be difficult to implement this imagined program.

The dictionaries relating to the chosen topic are set up before the start of the session.

    %gospell
    type in filenames as requested - type 'no' to stop
    name of file to be used peterossdict
    another file? generaldict
    another file? no
    dictionaries set up

Each child then goes to a terminal and 'logs-on':

    Please type your first name and then press the RETURN button
    Steven
    Hello, Steven

If the teacher is aware of any words that the child might need, that are not already in his dictionary, she adds them to the dictionary for the session:

    w: addword
    What word do you want to add : turtle
    Give a meaning or example : a small robot used in turtle geometry to draw shapes
    w:

The child may then start to write his new composition, giving the appropriate command for adding text to the text editor:

    w: new
    story:logo is done by Peter Ross.
    story:logo is about the turtel
    story:
    w:

w: is the command level prompt for the editor; story: is the text prompt; a carriage return (RETURN) causes the cursor to move to the next line. A second RETURN immediately after the 'story:' prompt causes a return to command level. The story may then be continued, or edited, or a spelling checked.

    w: check
    What word do you want to check : turtel
    Wait a minute while I check it.
    It could be turtle true turned terrible
    Type the word that you want (or 'no' if it is not there) :turtle
    I will change it.

The checked word is changed to the correction, the story (so far) retyped on the screen, and the cursor placed at the end of the text. The child can continue with his story:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: bugy
    story:
    w: check
    What word do you want to check :
    Wait a minute while I check it.
    It could be buggy boy by body
    Type the word that you want (or 'no' if it is not there) : buggy
    I will change it.

In this case, if the child types only RETURN when asked which word to check, the program assumes that the last word typed is to be checked. It is automatically replaced:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: commputer. it costs $350
    story:
    w: check
    What word do you want to check : commputer
    Wait a minute while I check it.
    It could be computer compute computes computing
    Type the word that you want (or 'no' if it is not there) : computer
    I will change it.

Earlier words in the text may also be checked and replaced.

Words may also be changed without using the spelling checker:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can make shapes.
    story:
    w: change
    old words: make
    new words: draw
    old words:

The first occurrence of 'old words' in the story is changed to whatever is put after 'new words'. More changes may be typed after the next 'old words' prompt. Alternatively, RETURN returns the child to the story, with the changes made:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in matham
    story:
    w: check
    What word do you want to check : mathamatics
    Wait a minute while I check it.
    It could be mathematics materials material manipulate
    Type the word that you want (or 'no' if it is not there) : mathematics

In this case the checked word ('mathamatics') does not match any word in the text (only 'matham' was typed). It cannot be changed automatically. The user is prompted:

    Is it the last word that you want changed : yes

If the user types 'yes' or 'y' the last word typed will be deleted and the word selected from the options offered will be substituted:

    I will change it.

and the story is retyped:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics

If the user types 'no' (or any response other than 'yes' or 'y') then he is given the choice of appending the selected option to the text (after the last typed word), or of deleting some other word in the text and inserting the option in its place:

    Do you want to add this word : no
    Do you want to change a word : yes
    Which word : matham

If a successful match is made the word is changed. If no match is made, or if the response to 'change a word' was not yes, then no change is made. The story is retyped on the screen and the composition continues.

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics. There is
    story: another type of turtle for the BBC
    story: which use fisher tecnic
    story:
    w: check
    What word do you want to check : tecnic
    Wait a minute while I check it.
    It could be technic teaching taking think
    Type the word that you want (or 'no' if it is not there) : techic
    That is not one of the words - try copying it again : technic
    I will change it.

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics. There is
    story: another type of turtle for the BBC
    story: which use fisher technic

If the correction is miscopied, i.e. it does not match any of the options, the user is asked to retype it. If retyped correctly, the program proceeds. If there is still no match, no change is made: a message is printed and the user is returned to the text.

    Sorry, I still cannot change it.
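
The copy-and-retry rule for choosing among the offered options can be sketched as follows. This is a minimal Python illustration, not the Pascal code of the thesis: the function name `pick_correction` and the use of a list of typed replies in place of the terminal are assumptions made for the example. It covers only the copying step; a reply of 'no' or 'lookup' leads to the further dialogue described in the rest of this section.

```python
def pick_correction(options, replies):
    """Return the option the user copied, or None if no change is to be made.

    options: the words offered by the corrector.
    replies: what the user types, in order (a hypothetical stand-in for the terminal).
    """
    typed = iter(replies)
    first = next(typed, "no").strip().lower()
    if first == "no":
        return None              # the wanted word is not among the options
    if first in options:
        return first             # copied correctly at the first attempt
    # "That is not one of the words - try copying it again"
    second = next(typed, "").strip().lower()
    if second in options:
        return second
    return None                  # "Sorry, I still cannot change it."


offered = ["technic", "teaching", "taking", "think"]
chosen = pick_correction(offered, ["techic", "technic"])
```

With the replies from the 'tecnic' example, the first attempt ('techic') is rejected and the second ('technic') is accepted.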
Sometimes, when a word is checked, the required correction may be missing from those offered:

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics. There is
    story: another type of turtle for the BBC
    story: which use fisher technic.
    story: The govement has bought
    story:
    w: check
    What word do you want to check : bought
    Wait a minute while I check it.
    It could be brought bough both but
    Type the word that you want (or 'no' if it is not there) : no
    Is it one of these: boat

The reason that the correction is not offered is either that it is not in the dictionary or that the editcost program fails to find it. Further options are offered by the phonetic coding algorithm. One of these may be accepted, in the same way that the previous options were accepted:

    Is it one of these: boat : yes
    Type the word that you want (or 'no' if it is not there) : boat
    I'll change it.

If the correct word is still not offered (as in this case, where the checked word is spelt correctly), the user types 'no' again and is asked to try respelling the word:

    Type the word that you want (or 'no' if it is not there) :no
    Think hard about the spelling - type it again to check it (or type 'no' if you think that the word is missing) :

If a word is retyped, then it is checked as before. If 'no' is typed the user is asked if he wants a word added:

    : no
    Do you want a word added to the dictionary : yes
    Ask your teacher to help you.

If a response other than 'no' or 'n' is typed the addword procedure is invoked, as before. Otherwise, the user is returned to the text.

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics. There is
    story: another type of turtle for the BBC
    story: which use fisher technic
    story: The govement has bought lots of
    story: turtles for schools. The turtle
    story: works by mottor
    story:

    w: check
    What word do you want to check : mottor
    Wait a minute while I check it.
    It could be motor mother more not
    Type the word that you want (or 'no' if it is not there) : lookup

    Which word do you want to look up in the dictionary : motor
    motor = a machine to make things move
    Is this the word that you want : yes
    I'll change it.

The user may delay continuation of the correction, and look up the definition of a word in the dictionary, using the 'lookup' command. If the user types 'yes' when asked if this is the word he wanted, the program continues as if he had selected this word as the correct option. Otherwise, the response is taken as a 'no' to the 'type the word you want - no if it's not there'.

    story: logo is done by Peter Ross.
    story: logo is about the turtle a little
    story: buggy which is controlled by a
    story: computer. it costs $350.
    story: The turtle can draw shapes. It
    story: can be in mathematics. There is
    story: another type of turtle for the BBC
    story: which use fisher technic
    story: The govement has bought lots of
    story: turtles for schools. The turtle
    story: works by motors controlling wheels.
    story: The B.B.C Buggy cost $60.
    story:
    w: save
    name of file : logo
    save finished
    w:

When the story is complete it can be saved in a named file in the user's area.
It can be retrieved again with the command 'recall':

    w: recall
    name of file : logo

The story will be printed on the display. The cursor will be positioned at the end of the text. More text may be added to the story. The text may be printed out (on a lineprinter) with the command 'print'.

At the end of the session the user types 'goodbye'. The user is prompted for any story that has not been saved since last editing:

    w: goodbye
    Do you wish to save your last story: : y
    name of file : turtle
    save finished
    Goodbye, Steven

The following sections relate to the interactive program incorporating editcost, addword, and lookup, and to the phoncode corrector, and also provide the basis for the program illustrated above.

5.3. Dictionaries

For each session each child works from a 'session dictionary' (=sessiondict) in his working area. It is possible to read any number of dictionary files into the session dictionary at the start of a session. In general, the files used are:

1. a general dictionary file (=generaldict)
2. a dictionary of vocabulary for the particular topic of that session, e.g. horrordict is a dictionary of 'horror story' vocabulary.

If, as a result of discussion of the topic, it is apparent that there are additional words that will be required, these can be added. It is possible to add further words at any later point in the session. These words are not, however, automatically added to the permanent dictionary file, only to the temporary sessiondict. The addword facility could be altered to enable permanent storing of the added dictionary words.
Provision for phonetic coding of the added words would have to be made.

5.3.1. Size and content

Examples of some of the dictionary files used and their sizes are as follows:

    File           Topic                               Approx. No. of words
    generaldict    frequently used words               700
    myselfdict     physical description of person      171
    footballdict   football match review               221
    horrordict     horror stories                      173
    islandict      description of a desert island      239
    turtledict     using the logo turtle               82
    turtle2dict    further vocabulary for logo         250
    perqdict       using the perq computer (used in    67
                   conjunction with turtle2dict)
    photodict      developing a film                   105

Some dictionary files were specifically compiled for writing up interviews with members of the Artificial Intelligence department:

    File           Topic                               Approx. No. of words
    davewysedict   technical drawing/                  266
                   mechanical engineering
    patamblerdict  robotics                            230
    peterossdict   using logo in education             185
    mikeshdict     Open University distance            171
                   learning

In some sessions two topic dictionaries might be used, either for related vocabulary, or in cases where the child is finishing one story and about to start another. On average, the dictionary used in any one session will contain between 750 and 1000 words.

In addition, a definition or example is provided for each word in the dictionary. See figure 5-1 for sections from the generaldict and turtle2dict.

5.3.2. Dictionary structure

Each word is represented in two dictionaries. In the pattern-matching ('editcost') dictionary each word is read in and stored as a string, with pointers both to a string representing a definition or example, and to the next word. Words are indexed by 'first character': a number of special first characters are defined (see chapter 7 for details).
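
The editcost dictionary layout just described (a word string, a pointer to its definition, a pointer to the next word, and an index by first character) can be pictured with a short Python sketch. The names `Entry` and `SessionDict` are hypothetical, and the assumption that each first character heads its own chain of entries is one possible reading of 'indexed by first character'. The 'special first characters' of chapter 7 are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    word: str
    definition: str
    next: Optional["Entry"] = None   # link to the next word in the same chain


class SessionDict:
    """Hypothetical in-memory session dictionary, indexed by first character."""

    def __init__(self):
        self.index: Dict[str, Entry] = {}

    def add(self, word, definition):
        word = word.lower()
        if not word:
            raise ValueError("cannot add an empty word")
        # A new entry is pushed on the front of the chain for its first character.
        self.index[word[0]] = Entry(word, definition, self.index.get(word[0]))

    def lookup(self, word):
        word = word.lower()
        entry = self.index.get(word[:1])
        while entry is not None:     # follow the 'next word' links
            if entry.word == word:
                return entry.definition
            entry = entry.next
        return None                  # not in the dictionary


sessiondict = SessionDict()
sessiondict.add("turtle", "a small robot")
sessiondict.add("true", "not false")
```

Here `sessiondict.lookup("turtle")` follows the chain for 't' and returns the stored definition, while a word that is absent, such as 'stik', gives `None`.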

In the phonetic coding dictionary, a file of words and codes for their phonetic representation is converted into a tree structure: each node is a phoneme and each daughter is the next phoneme in some word. Thus, words represented by three phonemes will match to nodes at three levels of the tree, the final node including a pointer to the string (word) that the phonemes represent (see chapter 7).

5.4. Definition of 'a word' and affixes

A working definition of a word is 'a sequence of alphabetic characters delimited by spaces, linefeeds or punctuation'.

These characters may be upper or lower case, 'a' to 'z'. Apostrophes, hyphens and digits in words are not regarded as characters. Contractions such as 'doesn't' and 'I'll' are not included in the dictionary. All input is converted to lower case for matching: all words in the dictionary are in lower case, and so all 'options for corrections' are given in lower case. This includes corrections of proper names. No affix checking or stripping algorithm is used.

Sections from turtle2dict

    define      say what it means, "define a procedure for drawing a square"
    dome        round cover, like the top half of a ball
    drops       lets fall, "the goalie drops the ball and kicks it"
    ......
    letters     what words are made up of, "there are 5 letters in chair"
    logo        a language used for computing
    ......
    turtle      a small robot, "you can draw shapes with the logo turtle"
    create      make something that has never been made before
    database    a collection of data or information in a computer
    degrees     measure of amount of turning

Sections from the generaldict

    different   not the same, "she wears different shoes each day"
    discovered  found, "I discovered gold in the box"
    .......
    glad        pleased, happy, "I am glad that I brought my umbrella"
    go          leave, "go away, go home"
    goes        leaves, "he goes home every weekend"
    gone        left, "all the people had gone and the place was deserted"
    good        not bad; nice, fine, "it was a good film, I enjoyed it"
    ......
    new         not old, just made, "is that a new jacket, I have not seen it before"
    news        information, "have you heard the news, we won"
    next        one after, "and the next in the queue is me"
    night       darkness, not day, "it was a dark and wet night"

Figure 5-1: Example sections from the dictionary

5.5. Error Detection

No spelling error detection process was used in the interactive checking program in study 2 or in the example session above. The user offered the word to be checked: he had to decide for himself which words were possible misspellings. In 'checking a word' corrections from the dictionary are offered, including the input word itself if it matches a dictionary word.

The program could be extended to include a facility to 'check a passage'. All words in the passage that are not found in the dictionary would be queried as misspellings. For example, in the text above, the word 'govement' is neither checked nor corrected. If the whole passage were checked, this 'word' (misspelling) would be highlighted in some way, to indicate to the user that it should be checked. Options for the correction might be offered automatically. In checking the passage, the definition of 'a word' would have to be made more precise. Digits appearing as part of a word could be queried, other digits in the text ignored.
The punctuation characters ,.+:-?!"() together with spaces and linefeeds could be taken as delimiters. Words containing other non-alphabetic characters or apostrophes could be queried as errors.

5.6. Error Correction

5.6.1. Syntax

As was discussed in chapter 3, whilst it is desirable to use syntactic information in spelling error detection and correction there are many difficulties in doing so. Attempts were made to find solutions to these at an early stage in this project.

The first 'possible solution' considered was to take some existing natural language parsing program and adapt it. If regularities could be found in the children's grammar, and these conformed (to a large extent) to the grammar used in the parsing program, then the grammar in the parser could be extended to parse the children's text. Defining the grammar for the children's text, however, proved impossible! It was also realised that, as a substantial proportion of the words would be misspelt, the parser would have to be capable of dealing with ill-formed input. This is not a trivial problem, and is in fact a separate area of natural language research (Fass and Wilks, 1983).

A different approach was considered: instead of attempting to parse a complete sentence, the immediate context alone could be considered. Given any word that is labelled as a part of speech, the likelihood of any other part of speech occurring adjacent to it can be calculated.
Thus, given two adjacent words with ambiguous part of speech labels the most likely combination of labels for that position can be predicted. Additionally, if it could be assumed that a small number of frequently occurring, correctly spelt, words exist - for example a set of function words such as 'a, the, at, and, of, from, for...' - then these words (in conjunction with the likelihoods of adjacent labels) could be used to construct 'templates' for sequences of labels for parts of sentences. So, for any word not found in the dictionary a misspelling would be assumed. All function words found in the text would be assumed to be correct. Templates would be matched to word sequences. A 'correction' label would be assigned to each misspelling according to the prediction of its most likely label, taken from matching to the template. This correction label would then be used in selection of candidates for corrections, in addition to other information. Words misspelt as other words would be detected if their 'possible labels' did not match the most likely label assigned for that position in the template. Again, this 'most likely label' might be used in correction.

However, the assumption that the set of function words will always be spelt correctly cannot be made. Additionally, a great deal of work would be required to calculate the likelihood of words occurring in relative positions: it would be a thesis project in itself to provide a type-labelling system for ill-formed and incorrect input.
Consequently, work on this particular problem was not followed up in this thesis. Atwell (Atwell, 1983) is currently working on this problem.

5.6.2. Semantics

No work was carried out, in this thesis, on semantic analysis. The use of semantic information in spelling correction is the subject of Fass's 1983 MSc thesis (Fass, 1983), and his current PhD thesis (Fass, 1984) (see chapter 3).

There is a semantic constraint placed on words considered as correction candidates, in that words not appearing in the topic related or general dictionaries are not offered, i.e. words offered are mostly those relating to the topic of interest. Additionally, the facility for referencing the definitions of words ('lookup') provides some semantic information as an aid to the user.

5.7. The editcost program

The editcost program takes the word to be corrected (prompted after the command 'check' in the example) input by the user, and the sessiondict. Sections of the dictionary are selected for comparison with the input word (=inpw): these words form the shortlist. The selection is based on the initial letters of inpw and its length. The shortlisting function is described in section 7.2.2. Each word on the shortlist is compared with inpw. For each word, the cost of editing inpw to match it is calculated. The cost is dependent upon the edit operation and the particular letters involved. Detail of the calculation of editcosts is given in chapter 7.
The four words from the dictionary with least editcost are offered to the user, in ascending order of cost, as options for the correct spelling of the input word (see the example in section 5.2).

To some extent the decision to offer four options is an arbitrary one. Certain constraints, however, do limit the number of words to be offered to the user. Firstly, it is not possible to guarantee that the closest match to the checked word will always be the required word: for the most bizarre spellings (taken out of context) only telepathy could guarantee accurate correction! Further information about unrecognizable words could be obtained from the context or perhaps by recognition of the spoken word. This is not possible here. Educationally, it is not wholly undesirable that the program should fail to correct these words: the child is left with some incentive to spell the word 'as close to correct as possible'.

Secondly, the greater the number of options offered, the more likely it is that the correction will be amongst them. However, the larger the number of words offered, the more difficult the task of selection will be for the child. With a greater number the child will be more likely to become confused, or be unwilling to read on until he finds the correct one.

It is necessary to select the minimum number of words to be offered such that the correction is likely to be amongst them. In the initial development of the editcost algorithm (before actual use with the children) varying numbers of words were offered.
Offering more than four words did not appear to greatly increase the frequency with which the correction was included in the options. The assumption is made that, if the word is in the dictionary and the checked word is a reasonable enough approximation[2] to it, the four closest matches should include it. Chapter 8 gives detail of the program's success in this.

[2] 'Reasonable enough approximation' here means that the spelling could normally be recognised, by a competent speller, as the intended word.

5.8. The phoncode program

The phonetic coding program takes the input word, inpw, and a dictionary and selects those words in the dictionary that might be considered phonetically equivalent to the inpw. Using a table of grapheme-phoneme correspondences, the inpw is split into all combinations of graphemes with their corresponding phonemes: it is effectively being parsed to generate all phoneme sequences, according to the grapheme-phoneme grammar. A sequence of phonemes, generated by the grammar, will be referred to as a 'phoneme sentence'. The dictionary is represented as a tree of phonemes. The inpw 'phoneme sentences' are matched to the phonemes in the tree. If a path in the tree matches a phoneme sentence, and that path represents a word, then the inpw is considered to be a possible phonetic equivalent to that word from the dictionary. All matches are found by exhaustive search and are offered to the user (the 'four option' condition applies only to the editcost program). See section 7.3 for more detail of the phoncode program.
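
The matching of phoneme sentences against the dictionary tree can be illustrated with a small Python sketch. Everything specific in it is an assumption made for the example: the grapheme-phoneme table `TABLE`, the phoneme symbols, the phonetic codes in `CODED_WORDS` and the function names are invented, and the real table and codes are those of chapter 7. The sketch follows tree paths while it parses inpw, rather than first listing every phoneme sentence; this finds the same set of matches.

```python
def build_tree(coded_words):
    """Build a phoneme tree from {word: tuple of phonemes}.

    Each node is a dict keyed by the next phoneme; the key None holds the
    word(s) whose phoneme sequence ends at that node.
    """
    root = {}
    for word, phonemes in coded_words.items():
        node = root
        for phoneme in phonemes:
            node = node.setdefault(phoneme, {})
        node.setdefault(None, []).append(word)
    return root


def phonetic_matches(inpw, table, root):
    """Exhaustively parse inpw into graphemes, following matching tree paths."""
    inpw = inpw.lower()
    longest = max(len(grapheme) for grapheme in table)
    found = set()

    def walk(pos, node):
        if pos == len(inpw):
            found.update(node.get(None, []))   # a complete path that is a word
            return
        for size in range(1, min(longest, len(inpw) - pos) + 1):
            grapheme = inpw[pos:pos + size]
            for phoneme in table.get(grapheme, []):
                child = node.get(phoneme)
                if child is not None:
                    walk(pos + size, child)

    walk(0, root)
    return sorted(found)


# Toy data, invented for illustration only.
TABLE = {"b": ["b"], "m": ["m"], "t": ["t"], "tt": ["t"], "r": ["r"],
         "o": ["o", "O"], "oa": ["O"], "or": ["@"]}
CODED_WORDS = {"motor": ("m", "O", "t", "@"), "boat": ("b", "O", "t")}
tree = build_tree(CODED_WORDS)
matches = phonetic_matches("mottor", TABLE, tree)
```

With this toy table, 'mottor' can be parsed as m + o + tt + or, which follows the path for 'motor' in the tree, so 'motor' is returned as a phonetic match.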
5.9. Other facilities

5.9.1. The lookup facility

The lookup procedure takes a word from the user, prompted when the command 'lookup' is given, and accesses the editcost dictionary. It looks for the word in the dictionary (see figure 5-1) and prints out its stored definition.

    w:lookup
    Which word do you want to look up in the dictionary?
    w:true
    true = not false, "is it true that you have a job"

    w:lookup
    Which word do you want to look up in the dictionary?
    w:stick
    stick = fix; piece of wood, "stick that poster on the wall"

    w:lookup
    Which word do you want to look up in the dictionary?
    w:turtle
    turtle = a small robot, "you can draw shapes with the logo turtle"

If the word to be looked up is incorrectly spelt, or if it is not in the dictionary, then a message will be printed:

    w:lookup
    Which word do you want to look up in the dictionary?
    w:stik
    stik = is not in the dictionary

    w:lookup
    Which word do you want to look up in the dictionary?
    w:specifications
    specifications = is not in the dictionary

Both 'lookup' and 'look up' are accepted as commands.

5.9.2. The addword facility

If, as a result of discussion of the writing topic, it is found that there are words that the child might wish to use but that are not in the dictionary, they can be added with the addword procedure. A definition is also added. Additionally, if the child finds that a word he wishes to check is not in the dictionary it may be added, using addword. The child is encouraged to ask the teacher/investigator to add the word, to prevent misspelt words being added to the dictionary.
(Unless the teacher/investigator is able to phonetically code the word, it can only be added to the editcost program dictionary.)

    w:add word
    What word do you want to add? specification
    Give a meaning or example how you specify something

    w:addword
    What word do you want to add? possible
    Give a meaning or example can be done

The procedure accepts both 'addword' and 'add word' as commands.

5.10. Relating the design to the requirements

The design of the program presented in this chapter will be related to the program requirements, given in chapter 4, section 4.6.

The correction programs are not restricted to fixed word lists or predefined dictations, though they are limited by the topic and general dictionaries forming the session dictionary. The addword facility permits further extension, however. The child can use the corrector to check any word he wishes whilst writing. It is used interactively. A word can be corrected at any point in the writing process: the child has control of the tool.

If the correctors were incorporated in a text editor then the child would be able to generate and alter text with ease, assuming he can use the text editor. This assumption is tested in study 1, described in chapter 6.

Neither the editcost nor the phoncode program is restricted to dealing with single error misspellings. There is no assumption of "first letter correct", though there is some restriction on which alternatives are permitted in the editcost program. Both programs are able to correct short word errors.
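The point that the editcost approach is not limited to single-error misspellings, and that the four lowest-cost words are offered in ascending order of cost, can be sketched as follows. This is only an illustration under stated assumptions: the actual editcost function is described in chapter 7, so a plain unit-cost edit distance (insertion, deletion, substitution each costing 1) stands in for it here, and the names edit_cost, closest_options and SAMPLE_DICTIONARY are hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative word list only; not the session dictionary used in study 2.
SAMPLE_DICTIONARY: List[str] = [
    "true", "stick", "turtle", "possible", "specification",
    "stock", "sick", "trick",
]


def edit_cost(misspelling: str, word: str) -> int:
    """Unit-cost edit distance, a stand-in for the thesis's editcost.

    Any number of insertions, deletions and substitutions is allowed,
    so multi-error misspellings are handled like single-error ones.
    """
    prev = list(range(len(word) + 1))  # cost of building word[:j] from ""
    for i, a in enumerate(misspelling, start=1):
        curr = [i]  # cost of deleting the first i letters of the misspelling
        for j, b in enumerate(word, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def closest_options(inpw: str, dictionary: Sequence[str],
                    n_options: int = 4) -> List[Tuple[int, str]]:
    """Return the n_options lowest-cost words, in ascending order of cost."""
    scored = sorted((edit_cost(inpw, word), word) for word in dictionary)
    return scored[:n_options]
```

For the input 'stik' and the sample list above, closest_options offers 'stick' (cost 1), then 'sick' and 'stock' (cost 2), then 'trick' (cost 3). Ties are broken alphabetically in this sketch, which is a simplification rather than a documented property of the editcost program.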
In the editcost program, the sequence of edit operations used to transform the misspelling into the correction is stored. It could easily be used to reconstruct the error. Similarly, grapheme-phoneme correspondences used by the phoncode program can be recorded and the error reproduced. This could provide information about the phoneme-grapheme correspondences being used by the child. It can also be used in classifying pairs of words as 'phonetically equivalent'.

The programs' success in providing the correction, assuming it is in the dictionary, is considered in chapter 8. It was considered important that the correction should not be rejected at any stage in the process of selection of candidates.

The editcost program permits any number of candidate corrections to be offered to the user, although only four are currently offered, whilst the phoncode program only gives (a small number of) exact matches: therefore, not too many "final corrections" will be offered to the user.

A number of questions relating to the programs need considering: questions relating to assumptions about the way in which the children will use the programs are addressed in chapter 6. In chapter 8 the performance of the correction programs is assessed.

TABLE OF CONTENTS

5. Design and use of a computer program to incorporate the editcost and phoncode correctors
  5.1. Introduction
  5.2. Example of a hypothetical session
  5.3. Dictionaries
    5.3.1. Size and content
    5.3.2. Dictionary structure
  5.4. Definition of 'a word' and affixes
  5.5. Error Detection
  5.6. Error Correction
    5.6.1. Syntax
    5.6.2. Semantics
  5.7. The editcost program
  5.8. The phoncode program
  5.9. Other facilities
    5.9.1. The lookup facility
    5.9.2. The addword facility
  5.10. Relating the design to the requirements

LIST OF FIGURES

Figure 5-1: Example sections from the dictionary