@part(discuss, root "thesis.mss")
@chapter(Performance)
@label(perform)

@section(Introduction)
@label(perform-intro)

In this chapter the performance of the editcost and phoncode programs, in correcting the errors made by the children, is assessed. If the programs are able to provide correction of the errors, then this provides evidence that:
@begin(alphabetize)
there are regularities in the children's errors;

information relating to these regularities can be used by the programs to reconstruct the correction from the error.
@end(alphabetize)
Where there is failure to correct an error, this can be attributed to one or more of the following:
@begin(alphabetize)
the errors are not sufficiently regular;

the programs do not have sufficient information about the regularities of the errors, i.e. the grammar or weightings are incomplete or incorrect;

the algorithm fails: sufficient information about existing regularities may be supplied to the program, but there is still a failure to reconstruct the correction.
@end(alphabetize)
Relating to these possible sources of error, the following question is also considered:
@begin(itemize)
Is a human judge able to perceive regularities in the errors, and would he/she then be able to provide corrections?
@end(itemize)
The editcost and phoncode programs are each considered in relation to the following questions:
@begin(enumerate)
Does the program succeed in correcting the errors made?

If there is a failure, is it due to:
@begin(alphabetize)
the errors being irregular,

the program data being insufficient or incorrect,

the methods of analysis being unsuitable?
@end(alphabetize)

When the programs succeed, what does this tell us about
@begin(alphabetize)
the individual children;

the methods of correction?
@end(alphabetize)
@end(enumerate)

@newpage
@section(Performance of the editcost program)
@label(perform-editc)

The performance of the editcost program was initially assessed on two sets of data:
@begin(enumerate)
The words used with the editcost program in study 2 (S2);

The complete set of errors made in studies 1 and 2 (S1, S2).
@end(enumerate)

@subsection(Testing editcost in use - Study 2)
@label(peditc-inuse)

The editcost program was used in study 2 (S2), as described in chapter @ref(assumptions). Each child used the program whenever he wished to check the spelling of a word (the input word). In some cases the word that was checked was correctly spelt; in other cases it was misspelt. The input word was compared with the set of words shortlisted from the dictionary. The dictionary consisted of the words in the generaldict, plus the topic dictionary words for the particular session. The four words with the lowest minimum repair cost were found and offered as possible corrections. Whenever a word was checked, the outcome could be categorised in one of three ways:
@begin(romanize)
the correction for the input word was both in the dictionary and offered as a possible correction;

the correction was in the dictionary, but was not offered as a possible correction;

the correction was not in the dictionary.
@end(romanize)
The frequency of occurrence of each category (i, ii, iii), for each group of children taking part in S2, is given in figure @ref(pedit-one). Group 1 comprises FR, DV, TE and DR; group 2, DI, MA, GR and ST.
@begin(figure)
@begin(verbatim)
                  i              ii             iii          iv
              correction     correction     correction
             in dictionary  in dictionary   not in the     total
              and offered    not offered    dictionary

Group 1           229             27             55         311
Group 2            77              3             39         119
Both groups       306             30             94         430
@end(verbatim)
@caption(Editcost in use: outcomes of checking)
@tag(pedit-one)
@end(figure)
These results can be re-expressed as percentages.
@begin(verbatim)
Percentage correction offered, overall: i/total
     group 1     group 2     both groups
      73.6%       64.7%        71.2%

Percentage correction offered, when in the dictionary: i/(i+ii)
     group 1     group 2     both groups
      89.5%       96.3%        91.1%

and from this, percentage correction @b(not) offered when in the
dictionary: ii/(i+ii)
     group 1     group 2     both groups
      10.5%        3.7%         8.9%

Percentage of corrections not in the dictionary: iii/iv
     group 1     group 2     both groups
      17.7%       32.8%        21.9%
@end(verbatim)
From these results it can be seen that the program was able to offer the correction for a large percentage (>90%) of the words checked, assuming that they were in the dictionary. The correction algorithm was more successful for group 2 than for group 1 (96% vs. 89% of corrections offered). However, group 2 attempted to check the spelling of a larger percentage of words that were not in the dictionary. These results may also be considered for individual children, as in figures @ref(pedit-offered) and @ref(pedit-notoff).
@begin(figure)
@begin(verbatim)
             i         i/iv       i/(i+ii)          iv
           number    % of the   % of those in   total number
          corrected    total    the dictionary     checked
Group 1
   FR        69       71.9%         92%              96
   DV        50       82%          87.7%             61
   TE        72       74.2%        90%               97
   DR        38       67.7%        86.4%             57
Group 2
   DI        10       66.7%        100%              15
   ST        20       62.5%        100%              32
   MA        27       65.9%        96.4%             41
   GR        20       64.5%        90.9%             31

Total       306                                     430
@end(verbatim)
@caption(Editcost in use: individual results - correction offered)
@tag(pedit-offered)
@end(figure)
@begin(figure)
@begin(verbatim)
                ii                    iii                 iv
          correction in         correction not          total
          the dictionary      in the dictionary         number
         freq   % of total    freq   % of total         checked
Group 1
   FR      6       6.3%        21      21.9%              96
   DV      7      11.5%         4       6.6%              61
   TE      8       8.2%        17      17.5%              97
   DR      6      10.5%        13      22.8%              57
Group 2
   DI      -       0%           5      33.3%              15
   ST      -       0%          12      37.5%              32
   MA      1       2.4%        13      31.7%              41
   GR      2       6.5%         9      29%                31

Total     30                   94                        430
@end(verbatim)
@caption(Editcost in use: individual results - correction not offered)
@tag(pedit-notoff)
@end(figure)
Results for groups 1 and 2 were compared using the Mann-Whitney U test (one-tailed).
This test was also used to assess differences in performance of groups 1 and 2 in the first study, and for all other group comparisons in this chapter. From figure @ref(pedit-offered) it can be seen that group 2 showed a higher percentage correction of words in the dictionary than group 1 (p<0.05). The number of corrections offered, taken as a percentage of the total number of words checked, was higher for group 1 (p<0.02). Within groups there is little difference in the percentage of errors corrected (the range being <10%). Between groups there is less than 15% difference between the highest (100% for ST and DI) and the lowest (86.4% for DR). For the majority of cases where the correction was not offered (c. 75%), the correction was not in the dictionary (see figure @ref(pedit-notoff)). The exception to this was the errors made by DV; however, he showed the highest correction success rate overall. Group 1 had a significantly greater percentage of errors that were in the dictionary and not corrected (p<0.02) than group 2; group 2 showed a greater percentage of errors for which the correction was not in the dictionary (p<0.02).

The possible corrections were ordered by cost, the lowest cost being offered first. The intended word, if it was included in the possible corrections, could be the first word offered (off(1)) or in the second, third or fourth position (off(2/3/4)). The corrections offered were categorised according to whether they were off(1) or off(2/3/4). For each group the percentage of first words offered was:
@begin(verbatim,group)
              Group 1     Group 2     Both groups
off(1)         76.4%       97.4%        81.7%
off(2/3/4)     23.6%        2.6%        18.3%
@end(verbatim)
For group 1 the intended correction was the first word offered in three-quarters of cases. For group 2 it was off(1) in more than 97% of cases. Overall, in more than four-fifths of cases the intended correction was offered as the possible correction with the least cost repair.
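The costing and ranking just described can be sketched as follows. This is a minimal illustration in Python, not the thesis's actual program: it uses uniform unit costs, whereas the editcost program used weightings derived from the observed frequencies of the children's edit operations, and the function and parameter names here are invented for the example.

```python
# Sketch of a minimum repair cost (a weighted edit distance, computed
# by dynamic programming), and of offering the four lowest-cost
# dictionary words as possible corrections. Uniform weights are
# illustrative only; the real program weighted operations by observed
# error frequencies.

def edit_cost(error, word, ins=1.0, dele=1.0, sub=1.0):
    """Minimum cost of repairing `error` into `word`."""
    m, n = len(error), len(word)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if error[i - 1] == word[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # delete a letter
                          d[i][j - 1] + ins,        # insert a letter
                          d[i - 1][j - 1] + cost)   # match or substitute
    return d[m][n]

def offer_corrections(error, shortlist, k=4):
    """The k shortlisted words with the lowest minimum repair cost,
    lowest cost (the off(1) position) first."""
    return sorted(shortlist, key=lambda w: edit_cost(error, w))[:k]
```

A call such as offer_corrections("hoted", shortlist) returns the four candidate spellings with the least repair cost under these weights, the first of which corresponds to the off(1) position.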
On a number of occasions, if a word was checked and the correction not offered, the child was encouraged to re-check it with a different spelling: "one closer to the correct word". These rechecks are included in the above categories, according to their outcomes. For group 1, 21 of the 27 words (in category ii) were rechecked with a different spelling. For 19 of these, the correction was found and offered. For group 2, for all 3 words in category ii, the correction was offered when rechecked. So, for the combined groups, of the 30 words for which the correction was not offered, 24 were rechecked with different spellings; 22 of these rechecked words produced the required spelling. When the required word was not in the dictionary, the investigator could be asked to add it. The initial spelling could then be rechecked. Twenty-six words were added and rechecked, 11 from group 1 and 15 from group 2. With the exception of 1 word from group 2, the corrections were offered for all added and rechecked words. The words that were not corrected successfully by the algorithm are discussed in more detail at the end of this section, in subsection @ref(pedit-disc12).

@subsection(Testing on the corpus containing Study 1 and Study 2 errors)
@label(pedit-test12)

The editcost program was tested on the corpus of errors made by the children in both studies. These errors included those checked with the editcost program (S2), those made when writing (S2), and those made when typing (S1 and S2). Chapter @ref(assumptions) gives details of the two studies. The dictionary used was set up specifically for testing. Whilst the dictionary that had been used in each of the S2 sessions contained 750 to 1000 words, the testing dictionary contained more than 2000 words. It comprised the general dictionary, plus all the topic dictionaries and all the corrections of errors (with duplicates removed).
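The assembly of such a testing dictionary can be sketched as follows; a minimal Python illustration with hypothetical word lists, showing the merge with duplicates removed.

```python
# Sketch of building the testing dictionary: the general dictionary,
# plus all topic dictionaries, plus all corrections of errors, with
# duplicates removed. Function name and data are illustrative.

def build_testing_dictionary(general, topic_dicts, corrections):
    words = set(general)              # general dictionary words
    for topic in topic_dicts:         # one word list per topic
        words.update(topic)
    words.update(corrections)         # corrections of the errors made
    return sorted(words)              # the set removes duplicates
```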
Each error was checked using the editcost program, and the five dictionary words with the lowest minimum cost repair were recorded. The reason for recording the fifth word was to test whether the performance would be substantially improved if it were included in the possible corrections. In only two cases in S1, and 15 cases in S2, was the correction the fifth option. This represents 2% of the total number of errors. In these results a correction offered as the fifth option is not counted as a success. For each child the following information was recorded:
@begin(alphabetize)
the number of errors for which the correction was offered;

the percentage of errors for which the correction was offered;

the number of corrections that were offered as first option (off(1));

the percentage of corrections that were offered as first option;

the total number of errors made.
@end(alphabetize)
The results of testing the errors made in S1 are given in figure @ref(pedit-ps1).
@begin(figure)
@begin(verbatim)
            a.          b.          c.         d.         e.
          number        %         number    % of a.      total
         corrected   corrected    off(1)     off(1)     number
Group 1
   GQ       15        100%          13       86.6%        15
   JM       38        100%          33       86.8%        38
   MW       29         82.9%        24       82.8%        35
Group 2
   LB       24         92.3%        22       91.7%        26
   NM       30         90.9%        28       93.3%        33
   CM       46         76.7%        34       73.9%        60
   SS       18         64.3%        14       77.8%        28

Group 1     82         93.2%        70       85.4%        88
total
Group 2    118         80.3%        98       83.1%       147
total
Both       200         85.1%       168       84%         235
groups
@end(verbatim)
@caption(Editcost tested on Study 1 errors)
@tag(pedit-ps1)
@end(figure)
The program offered the correction for 85% of errors, over both groups. 93.2% of errors made by group 1 were corrected, whilst 80.3% of corrections were offered for group 2 (the difference is not significant). It was least successful for CM and SS, offering only 64% of corrections in the case of SS (the reasons for this failure are discussed in section @ref(pres-indiv)). It was most successful for GQ and JM, providing 100% correction. In 84% of cases where the correction was offered it was the first option, i.e. it had the lowest edit cost.
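The recording and tallying just described can be sketched as follows; a minimal Python illustration with a hypothetical data layout (each error paired with the ranked candidate list recorded for it), not the actual testing procedure.

```python
# Sketch of tallying the test results: for each error the five
# lowest-cost candidates were recorded; a correction counts as
# offered only if it is among the first four, and as off(1) if it
# has the lowest cost of all. Data layout is hypothetical.

def tally(results):
    """results: list of (correction, candidates) pairs, where
    `candidates` is the recorded list, lowest cost first."""
    offered = sum(1 for corr, cands in results if corr in cands[:4])
    off1 = sum(1 for corr, cands in results if cands and cands[0] == corr)
    total = len(results)
    return {"offered": offered,                      # column a.
            "pct_offered": 100.0 * offered / total,  # column b.
            "off1": off1,                            # column c.
            "total": total}                          # column e.
```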
Note that the program weightings were based on the frequency of errors made by this group, and therefore a high percentage of corrections offered was to be expected. The results of testing the errors made in S2 are given in figure @ref(pedit-ps2).
@begin(figure)
@begin(verbatim)
            a.          b.          c.         d.         e.
          number        %         number    % of a.      total
         corrected   corrected    off(1)     off(1)     number
Group 1
   FR      103         83.7%        81       78.6%       123
   DV       65         66.3%        39       60%          98
   TE      106         80.9%        66       62.3%       131
   DR       55         63.2%        41       74.5%        87
Group 2
   GR       45         81.2%        34       61.8%        55
   DI       21         95.5%        19       86.4%        22
   MA       39         92.9%        33       78.6%        42
   ST       39         92.9%        33       78.6%        42

Group 1    329         74.9%       227       69%         439
total
Group 2    144         89.4%       119       82.6%       161
total
Both       473         78.8%       346       73.2%       600
groups
@end(verbatim)
@caption(Editcost tested on Study 2 errors)
@tag(pedit-ps2)
@end(figure)
The same information is given as for S1. Corrections were offered for nearly 79% of errors, over both groups. The first option offered was the correction in 73% of errors overall.

@subsection(Errors which the editcost program failed to correct)
@label(pedit-disc12)

The errors for which the editcost program did not offer corrections will now be considered, and reasons for this failure discussed. The sets of errors on which the program failed are given in figures @ref(errors-useS2) (use by S2), @ref(errors-editS1) (testing on S1 errors), @ref(errors-editS21) (testing on S2, group 1 errors) and @ref(errors-editS22) (testing on S2, group 2 errors).
@newpage
@begin(fullpagefigure)
@begin(verbatim)
FR                       DV
eyes irs                 brown blounm
eyes ias                 hair hear
head hard                of ove
saw sore                 buried beray
saw sour                 of ovre
about ubout              dalgleish dugle

TE
magazine magen           gold goib
                         DR
through thr              strachan stacking
through thro             strachan cracking
called golld             instructions inchuns
bunny bune               instructions chuns
any ene                  turtle trener
conservative cunjnc      turtle turend
conservative sevter

GR                       MA
computer ucnputer        perq pirck
computer unconputer

DI and ST - no uncorrected errors
@end(verbatim)
@caption(Using editcost - Study 2: errors for which correction not offered)
@tag(errors-useS2)

@begin(verbatim)
MW                       NM
won one                  paw po
threw through            change caing
a are                    we wer
change gh

CM
wrote nrote              called colde
wrote krote              commercial commrs

LB
university ynusty        quarry qorie
can came                 fool full
draw droy

SS
that ther                new neea
recall tecall            had hat
change calde             draw john
night nairt              make mosea
the whe                  make msea
haunted hoted            time the
through thro             get cedt
came cane                television tahgfring
hear haes                tv talhfi
horror horey

GQ and JM - no uncorrected errors
@end(verbatim)
@caption(Testing editcost - Study 1: errors for which correction not offered)
@tag(errors-editS1)
@end(fullpagefigure)
@begin(fullpagefigure)
@begin(verbatim)
Group 1
FR                       TE
hair hera, hare          weight wait
eyes irs, ias            gold goib
head hard, herd          through thro, thr
silver isilver           came gam
saw sore, sour           called golld
where warh               come conn
of fo                    seen cn, cen
piece pees, peces,       motor moterdf
  peesc                  dangerously bandrie
about ubout              bunny bune
showed sods, shodes,     conservative cunjnc, sevter
  sodes                  plastic plasek
wyse wizes               of over
put pit                  work wrk
any ane

DV
would wob                eyes liss, isse
we wie                   brown bloum, blounm
had thad                 hair hear
just tust                bye bi
walked workt             island illing
took tike                of over, ove,
dark barck                 ov, ovre
buried beray

DR
ghost goss               have haft, half
dalgleish dugle          light like
have uve                 the then
for of                   check pellrs
goodbye boodbye          packed park, par, part,
magazine magen             parck, pakt
interview intovue        won win
soldiers soildde         strachan stacking,
plastic plaiked            cracking
stairs stared, stare     nicholas nickris, nickis
down dame                turtle turned, trener,
pictures pieces            turend
could cood               stadium stamun
bit bid                  brazil brasur
talk tock                picture ping
drill drule              instructions inchins, incruns,
more mor                   chuns, inchuns
photo fot, front         robot romdt, rodet,
white withe                rodert, roder, romdert
dead beb                 dog bog
@end(verbatim)
@caption(Testing editcost - Study 2, group 1: errors for which correction not offered)
@tag(errors-editS21)
@end(fullpagefigure)
@begin(figure)
@begin(verbatim)
Group 2
GR                       DI
called could             specifications spec
straight strat           try trie

MA
who how                  perq purk, pirck
uses yous                alternatives alteration
a and                    put pit

ST
any ena                  tune chune
computer unconputer,     procedures prgrame
  ucnputer               so sow
@end(verbatim)
@caption(Testing editcost - Study 2, group 2: errors for which correction not offered)
@tag(errors-editS22)
@end(figure)
An error for which the editcost program does not offer the correction will be referred to as a @b(non-corrected error). The set of non-corrected errors resulting from the use of the program in S2 is a subset of those resulting from testing all the S2 errors, and so this subset will not be considered separately. Non-correction of an error indicates the inability of the program to reconstruct the correction from the error. This could be due to:
@begin(enumerate)
errors being so irregular that the correction cannot be inferred;

program data being incomplete or incorrect, that is:
@begin(alphabetize)
omission of the correction in the shortlisting process;

the weightings used being inappropriate;

the costing function being inappropriate;
@end(alphabetize)

the description of the errors in terms of format (and hence analysis in terms of edit operations) being inadequate.
@end(enumerate)
The second and third of these possible causes of failure will be considered first.
Inclusion of a dictionary word in the shortlist, for consideration by the costing algorithm, was dependent upon the length and first character(s) of the word. In a number of cases the desired correction was omitted from the shortlist. Non-correction of the misspelling is attributable to a failure in shortlisting for:
@begin(verbatim)
 9 out of  35 non-corrected errors in S1
24 out of 127 non-corrected errors in S2
33 out of 162 non-corrected errors in total
@end(verbatim)
If further alternatives were permitted for first-letter confusions, more corrections could be included in the shortlist. For example, the alternatives a for u (=a/u), e/i, g/b, wh/ho, t/ch and h/th would reduce the omissions from the shortlist by 6. Additionally, if a difference of 4 characters were permitted between word and error, for words of less than 10 characters, then a further 5 words would be shortlisted. However, the program does succeed in providing the correction for 85.1% of S1 errors and for 78.8% of S2 errors when tested, and for 91.1% of S2 errors checked (for which the correction is available) when the program is in use (see subsection @ref(peditc-inuse)): more than 80% of errors tested overall. For a large number of errors, therefore, it seems that their description in terms of format, the assignment of weightings and the calculation of costs are sufficient to enable reconstruction. It may be that some of the spellings are so bizarre that they conform to no apparent pattern: the correction will not be recognizable from the error. To test this, the set of non-corrected errors (for S1 and S2) was given to an independent judge for correction. The judge was asked to write what he thought would be the correction for each misspelling alongside it; to mark with a tick any word that he thought was spelt correctly (i.e. words misspelt as other words would be marked); and to mark with a cross any word for which he could suggest no correction.
Having corrected or marked all the words presented, the judge was then told that, in fact, all the words were misspellings. He was then asked to write alongside each ticked word (the apparently correct words) what he thought the spelling could be, knowing that it was not the word given. The judge's corrections were then compared with the intended corrections, and all discrepancies noted. If the judge had succeeded in correcting all the errors where the editcost program failed, this would suggest that improvements to the program were needed. On the other hand, if the judge failed to correct the majority of errors (i.e. they were unrecognizable), then this would indicate too little consistency, or a lack of identifiable pattern, in the errors made. That more than 80% of errors were successfully corrected by the program indicates that there is an identifiable pattern in the majority of errors. It might be argued that the judge could fail to correct the errors because of unfamiliarity with the vocabulary used by the children in the two studies. This was overcome by using the same judge who had already seen all sets of error-correction pairs (see subsection @ref(pphon-disc12)). This meant that the judge had seen all the errors before, with their corrections, though in a different order (errors were presented in a random order). He was also reminded of the topics dealt with in the children's writing. Despite this, he failed to recognize a substantial number of errors, though he did indicate that his previous experience had slightly influenced the corrections offered.
Outcomes of the comparison of the judge's corrections with the intended corrections are classed as follows:
@begin(enumerate)
the correction provided by the judge was the intended word (=C);

no correction could be suggested (=NC);

the wrong correction was suggested (=WC);

the misspelling was taken as the correct spelling of another word initially, but was later reconsidered and classed in one of the above categories (=IC, INC, IWC).
@end(enumerate)
A summary of the results is given, for each group, in figure @ref(boog-editsum). The total frequencies for the categories C, NC and WC are given. Included are those errors initially thought to be correct (the frequencies of which are given in brackets, for each category).
@begin(figure)
@begin(verbatim)
                            No          Wrong
            Correction   Correction   Correction     Total
                C            NC           WC
Study 1
  Group 1      2(1)         2(0)         2(2)         6(3)
  Group 2      5(2)         9(1)        15(1)        29(4)
  S1 total     7(3)        11(1)        17(3)        35(7)
Study 2
  Group 1     29(6)        38(11)       43(7)       110(24)
  Group 2     12(2)         3(3)         2(0)        17(5)
  S2 total    41(8)        41(14)       45(7)       127(29)

Total         48(11)       52(15)       62(10)      162(36)
@end(verbatim)
@caption(Comparison of judge's corrections with intended corrections - summary)
@tag(boog-editsum)
@end(figure)
Results are also given, for each child, in figures @ref(boog-edit1) and @ref(boog-edit2).
@begin(fullpagefigure)
@begin(verbatim)
           C     IC     NC    INC     WC    IWC    Total
Group 1
  GQ       -      -      -     -       -     -       0
  JM       -      -      -     -       -     -       0
  MW       1      1      2     -       -     2       6
Group 2
  LB       -      1      -     -       1     -       2
  NM       -      -      1     -       2     -       3
  CM       3      -      3     -       7     1      14
  SS       -      1      4     1       4     -      10

Group 1    1      1      2     0       0     2       6
total
Group 2    3      2      8     1      14     1      29
total
Both       4      3     10     1      14     3      35
groups
@end(verbatim)
@caption(Comparison of judge's corrections with intended corrections - Study 1)
@tag(boog-edit1)

@begin(verbatim)
           C     IC     NC    INC     WC    IWC    Total
Group 1
  FR       7      2      3     -       4     4      20
  DV       7      1      5     5      12     3      33
  TE       6      1     10     1       7     -      25
  DR       3      2      9     5      13     -      32
Group 2
  GR       5      2      -     2       1     -      10
  DI       1      -      -     -       -     -       1
  MA       2      -      -     1       -     -       3
  ST       2      -      -     -       1     -       3

Group 1   23      6     27    11      36     7     110
total
Group 2   10      2      0     3       2     0      17
total
Both      33      8     27    14      38     7     127
groups
@end(verbatim)
@caption(Comparison of judge's corrections with intended corrections - Study 2)
@tag(boog-edit2)
@end(fullpagefigure)
The judge corrected 29.6% of the non-corrected errors. He failed to offer a correction for 32.1% of the errors and offered alternatives for 38.3%. Of those corrected, 11 had initially been believed to be alternative words, spelt correctly, and were left uncorrected. At the first attempt, therefore, only 22.8% of errors were successfully corrected. Overall the judge failed to correct 70.4% of errors. Thus for 70.4% of the errors that the program failed to correct, the human judge also failed to identify the correction, despite knowing that all the words were errors and having previously seen the error/correction pairs. Additionally, of the 33 words that the program failed to shortlist, 20 presented difficulty to the judge. The judge experienced particular difficulty with the errors made by CM and SS (S1, group 2) and by DV, TE and DR (S2, group 1): he failed to correct between 72% and 90% of them. This suggests that these errors were in some way unrecognizable.
Summarising the results for the editcost program overall:
@begin(itemize)
the program succeeded in correcting
@begin(alphabetize)
85.1% of the errors made in Study 1, when tested;

78.8% of the errors made in Study 2, when tested;

91.1% of the errors made in Study 2 (for which the correction was available) when the program was in use;

80.6% of the errors tested (a + b) overall.
@end(alphabetize)

of those it failed to correct (162 errors)
@begin(romanize)
48 were corrected by the judge (therefore attributable to failure on the part of the program), accounting for 5.7% of errors overall;

114 were not corrected by the judge (therefore attributable to insufficient regularities shown in the errors), accounting for 13.7% overall.
@end(romanize)
@end(itemize)

@newpage
@section(Performance of the phoncode program)
@label(perform-phonc)

@subsection(Testing on Study 1 and Study 2 errors)
@label(pphon-test12)

The performance of the phoncode program was assessed on the sets of errors made in S1 and S2. The same testing dictionary was used for testing both the phoncode and editcost programs. The dictionary was coded phonemically for testing with the phoncode program (see chapter @ref(detail), section @ref(dict-phon)). Each error was input to the phoncode program. Words offered by the program as `phonetic equivalents' were recorded, and it was noted whether or not the correction for the error was included in these words. The following information was obtained:
@begin(alphabetize)
the number and percentage of errors for which the correction is included in the words offered by the program;

the number and percentage of errors for which the correction is not offered;

the total number of errors made.
@end(alphabetize)
The results of testing the errors in S1 are given in figure @ref(pphon-ps1).
@begin(figure)
@begin(verbatim)
              a.              b.              c.
          correction      correction not    total
          included in     included in      number of
         words offered   words offered      errors
          freq     %      freq     %
Group 1
   GQ      10    66.7%      5    33.3%        15
   JM      31    81.6%      7    18.4%        38
   MW      21    60.0%     14    40.0%        35
Group 2
   LB      23    88.5%      3    11.5%        26
   NM      21    63.7%     12    36.3%        33
   CM      30    50.0%     30    50.0%        60
   SS       8    28.6%     20    71.4%        28

Group 1    62    70.5%     26    29.5%        88
Group 2    82    55.8%     65    44.2%       147
1 & 2     144    61.3%     91    38.7%       235
@end(verbatim)
@caption(Phoncode tested on Study 1 errors)
@tag(pphon-ps1)
@end(figure)
The percentage of errors for which the correction is included in the words offered, for all children, is 61.3%. The overall percentage for group 1 is higher than that for group 2, though the difference is not statistically significant. The lowest percentage offered is 28.6% for SS (more than 20% lower than for any other child). CM is next lowest with 50% corrected. MW, NM and GQ all fall in the 60 to 67% range. The highest percentage corrections are for LB and JM, with 88.5% and 81.6% respectively. Information is given for each child, and for each group of children. The results of testing the errors made in S2 are given in figure @ref(pphon-ps2). The same information is provided for this group.
@begin(figure)
@begin(verbatim)
              a.              b.              c.
          correction      correction not    total
          included in     included in      number of
         words offered   words offered      errors
          freq     %      freq     %
Group 1
   FR      78    63.4%     45    36.6%       123
   DV      43    43.8%     55    56.2%        98
   TE      71    54.2%     60    45.8%       131
   DR      30    34.5%     57    65.5%        87
Group 2
   GR      35    63.6%     20    36.4%        55
   DI      16    72.7%      6    27.3%        22
   MA      33    78.6%      9    21.4%        42
   ST      29    69.0%     13    31.0%        42

Group 1   222    50.5%    217    49.5%       439
Group 2   113    70.2%     48    29.8%       161
1 & 2     335    55.9%    265    44.1%       600
@end(verbatim)
@caption(Phoncode tested on Study 2 errors)
@tag(pphon-ps2)
@end(figure)
The overall percentage correction for both groups is 55.9%. The group 2 children all have higher percentage corrections than the group 1 children: the group 2 total is 70.2%, while that for group 1 is 50.5% (p<0.02).
The percentage corrected, for all children, ranges from 34.5% to 78.6%, distributed fairly evenly through the whole range.

@newpage
@subsection(Errors which the phoncode program failed to correct)
@label(pphon-disc12)

Of the 835 misspellings made overall, the phoncode program failed to correct 356 (42.6%). This failure may be attributed to one or more of the following:
@begin(enumerate)
the misspellings and corrections were not "phonetically equivalent";

the program failed to find the "phonetically equivalent" correction for the misspelling, due to:
@begin(romanize)
the phoneme-grapheme grammar being incorrect or incomplete;

the segmentation algorithm being incorrect;

the words being incorrectly coded in the phonetically coded dictionary.
@end(romanize)
@end(enumerate)
In order to determine which of the misspellings might be considered phonetic and which non-phonetic, a judge was used to classify them. This was the same person who was later used to judge the errors that the editcost program failed to correct (see subsection @ref(pedit-disc12)). The judge was a male Scottish teacher, with a knowledge of linguistics. He was very familiar with the dialect used by the children in the two studies. The judge was given the complete set of misspellings and corrections, for both sets of children. He was asked to look at each misspelling/correction pair and to decide whether or not the two could be considered phonetically equivalent: if both were read aloud, would they be indistinguishable? After a practice on a set of 'misspellings' and 'corrections' taken from Cohen (1984), the definition was further refined to "both spellings being interpreted as the same word by a local native speaker, when read aloud; the pronunciation of misspellings to be determined by the common pronunciation of graphemes in different contexts". The judge, therefore, was permitted to consider the same misspelling as having more than one pronunciation.
Each error was marked by the judge as either phonetic or non-phonetic. The results of this classification and those of the phoncode program were compared, and the outcomes classified in the following categories:
@begin(alphabetize)
correction included in words offered and error judged to be 'phonetic' (C/Ph) = agreement;

correction included in words offered and error judged to be 'non-phonetic' (C/NPh) = disagreement;

correction not included in words offered and error judged to be 'phonetic' (NC/Ph) = disagreement;

correction not included in words offered and error judged to be 'non-phonetic' (NC/NPh) = agreement;

total number of errors.
@end(alphabetize)
Results of the comparison of the judge's classification and the program performance are given in figures @ref(phoncomp-ps1) and @ref(phoncomp-ps2).
@begin(figure)
@begin(verbatim)
            a         b         c         d          e
           C/Ph      C/NPh     NC/Ph     NC/NPh     total
          (% of     (% of     (% of     (% of     number of
          total)    total)    total)    total)     errors
Group 1
   GQ        6         4         0         5         15
           (40%)    (26.7%)    (0%)     (33.3%)
   JM       22         9         0         7         38
          (57.9%)   (23.7%)    (0%)     (18.4%)
   MW       16         5         1        13         35
          (45.7%)   (14.3%)   (2.9%)    (37.1%)
Group 2
   LB       14         9         0         3         26
          (53.9%)   (34.6%)    (0%)     (11.5%)
   NM       17         4         1        11         33
          (51.6%)   (12.1%)    (3%)     (33.3%)
   CM       15        15         4        26         60
           (25%)     (25%)     (6.7%)   (43.3%)
   SS        2         6         2        18         28
           (7.1%)   (21.5%)    (7.1%)   (64.3%)

Group 1     44        18         1        25         88
total      (50%)    (20.5%)    (1.1%)   (28.4%)
Group 2     48        34         7        58        147
total     (32.7%)   (23.1%)    (4.8%)   (39.4%)
Both        92        52         8        83        235
groups    (39.2%)   (22.1%)    (3.4%)   (35.3%)
@end(verbatim)
@caption(Comparison of errors corrected by the Phoncode program with those judged to be 'phonetic' - Study 1)
@tag(phoncomp-ps1)
@end(figure)
@begin(figure)
@begin(verbatim)
            a         b         c         d          e
           C/Ph      C/NPh     NC/Ph     NC/NPh     total
          (% of     (% of     (% of     (% of     number of
          total)    total)    total)    total)     errors
Group 1
   FR       52        26        12        33        123
          (42.3%)   (21.1%)    (9.8%)   (26.8%)
   DV       31        12         3        52         98
          (31.6%)   (12.2%)    (3.1%)   (53.1%)
   TE       42        29         7        53        131
          (32.1%)   (22.1%)    (5.3%)   (40.5%)
   DR       20        10         4        53         87
           (23%)    (11.5%)    (4.6%)   (60.9%)
Group 2
   GR       25        10         5        15         55
          (45.4%)   (18.2%)    (9.1%)   (27.3%)
   DI       14         2         1         5         22
          (63.6%)    (9.1%)    (4.6%)   (22.7%)
   MA       28         5         0         9         42
          (66.7%)   (11.9%)    (0%)     (21.4%)
   ST       20         9         3        10         42
          (47.6%)   (21.4%)    (7.2%)   (23.8%)

Group 1    145        77        26       191        439
total      (33%)    (17.5%)    (6%)     (43.5%)
Group 2     87        26         9        39        161
total      (54%)    (16.2%)    (5.6%)   (24.2%)
Both       232       103        35       230        600
groups    (38.7%)   (17.2%)    (5.8%)   (38.3%)
@end(verbatim)
@caption(Comparison of errors corrected by the Phoncode program with those judged to be 'phonetic' - Study 2)
@tag(phoncomp-ps2)
@end(figure)
For Study 1, group 1, the agreement between the program and the judge is 78.4% (= a + d = 50% + 28.4%), and for group 2 it is 72.1% (= 32.7% + 39.4%): that is, 74.5% (= 39.2% + 35.3%) overall. Groups 1 and 2 showed no significant differences when compared in any of the categories (a), (b), (c), (d). Most disagreement between judge and program occurred in the C/NPh category (22.1%): misspellings classed as non-phonetic by the judge were corrected by the program. Only 1.1% of group 1 errors and 4.8% of group 2 errors (3.4%, or 8 errors, overall) were classed as phonetic but not corrected. Of the errors made, 39.2% were both classed as phonetic (by the judge) and corrected (by the phoncode program). For Study 2, 77% agreement is shown between judge and program (group 1 - 76.5%; group 2 - 78.2%). Groups 1 and 2 differed in the frequency of errors classed in categories (a) and (d): group 2 had more errors classed as phonetic and corrected than group 1 (p<0.02) and fewer non-phonetic and non-corrected errors (p<0.05). No significant differences were shown between the two groups in the categories for which judge and program disagreed. Over the two groups 5.8% of errors were classed as phonetic but not corrected (group 1 - 6%; group 2 - 5.6%). In all, 38.7% of errors were classed as phonetic and corrected, with a further 17.2% corrected but classed as non-phonetic. The combined figures for both studies give 76.3% agreement between the program and the judge.
38.8% of errors were judged to be phonetic and were corrected, with a further 18.6% corrected (but judged to be non-phonetic). 37.5% were judged to be non-phonetic and were not corrected by the phoncode program. Only 5.1% were judged to be phonetic but not corrected. The reasons for the failure of the phoncode program are now considered. The program was not designed to correct non-phonetic errors; thus a large percentage of the misspellings (37.5%) were classed as non-phonetic and were not corrected. There were 43 misspellings, judged to be phonetic, which the program failed to correct (NC/Ph). These are listed in figure @ref(phonboth). @begin(figure) @begin(verbatim) Study 1 MW NM won one sounds souns SS CM get cedt picture picher easter eastr buttons butns buttons buttns castle castl Study 2 FR TE blood plood police plec treasure tresher seen cn diamonds dimens dangerously dangersly jewels jouls ireland irlnd using yoosing thatcher thacher magazine magzine work wrk magazine magzeen if ifh computer compyooter DV how howe goals gois put pit score scorre chemical cemikle picture pichur plans plandes DR GR took toog university univesty picture picher boxes boxs kitchen kitshen used yoosed has his alphabet alphapet ST put pit government goverment DI programmes progames designed designned three theree @end(verbatim) @caption(Errors judged to be phonetic, but not corrected: S1 and S2) @tag(phonboth) @end(figure) As stated above, the failure may be attributed to an incomplete or incorrect grammar; incorrect segmentation; or incorrect coding of the dictionary. The difficulties of segmentation and coding are discussed in chapter @ref(detail), subsections @ref(phon-graph) and @ref(dict-phon). Examples of segmentation@foot(A segmentation error is one where the misspelling is split into graphemes in such a way that it cannot be matched to the phoneme string representing the correction.)
errors are: @begin(verbatim,group) y = /y/ u = /ju/ d = /d/ d = /d/ oo = /u/ s = /z/ e = /I/ e = /I/ s = /z/ i = /I/ s = /z/ s = /z/ i = /I/ ng = /ng/ i = /aI/ i = /aI/ ng = /ng/ g = ? gn = /n/ nn = /n/ ed = /d/ ed = /d/ @end(verbatim) @begin(verbatim,group) k = /k/ k = /k/ s = /s/ s = /s/ i = /I/ i = /I/ c = /k/ c = /k/ t = /t/ tch = /ch/ o = /o:/ o_e = /o:/ sh = /sh/ e = /E/ rr = /r/ r = /r/ e = /E/ n = /n/ e = ? n = /n/ @end(verbatim) The phoneme-grapheme grammar failed to provide matches in a number of cases, though for some of them their classification as 'phonetic' errors might be disputed. Examples of these are: @begin(verbatim) get cedt blood plood put pit took toog has his alphabet alphapet @end(verbatim) Other classes of errors that presented difficulties include: @begin(alphabetize) omitted schwa, particularly before n and l @* e.g. buttons buttns police plec other omitted vowels @* e.g. boxes boxs chemical cemikle errors involving 'r' @* e.g. easter eastr picture picher consonant confusions, particularly involving 'd', 't', 'ch' @* e.g. get cedt picture picher consonant omissions, particularly d after n @* e.g. sounds souns diamonds dimens @end(alphabetize) The other set of misspellings that judge and program disagreed on were those judged as non-phonetic, but corrected by the phoncode program. A large number of these were vowel confusions accepted as equivalent by the phoncode grammar but rejected by the judge. 
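The notion of a segmentation failure can be illustrated with a toy matcher. The grammar fragment below is illustrative only (it is not the grammar used by the phoncode program): the correct spelling 'using' can be segmented to match its phoneme string, but the misspelling 'yoosing' cannot, because no grapheme covering the initial 'y' spells /ju/ in the fragment.

```python
# A toy grapheme-to-phoneme matcher illustrating a segmentation failure.
# The grammar fragment and phoneme symbols are illustrative only.
GRAMMAR = {                      # grapheme -> phonemes it may spell
    "u": {"/u/", "/ju/"}, "oo": {"/u/"}, "y": {"/y/", "/aI/"},
    "s": {"/s/", "/z/"}, "i": {"/I/", "/aI/"}, "n": {"/n/"},
    "ng": {"/ng/"}, "g": {"/g/"},
}

def matches(spelling, phonemes):
    """True if spelling can be split into graphemes spelling phonemes."""
    if not spelling and not phonemes:
        return True
    for size in (2, 1):          # try digraphs before single letters
        grapheme = spelling[:size]
        if phonemes and grapheme in GRAMMAR and phonemes[0] in GRAMMAR[grapheme]:
            if matches(spelling[size:], phonemes[1:]):
                return True
    return False

target = ["/ju/", "/z/", "/I/", "/ng/"]   # phoneme string for 'using'
print(matches("using", target))           # True: u/s/i/ng all match
print(matches("yoosing", target))         # False: nothing here spells /ju/
```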
Additionally, other classes of errors accepted by the program, but considered 'non-phonetic', were: @begin(itemize) errors involving 'r' (and vowel); final 'e' (omitted and added); transpositions, in particular 'ed/de' and 'es/se' and vowels; incorrectly doubled or singled consonants, in particular 'n' before 'g' or 't', and 'l' before 'k' or 'd'; errors involving 'h' (usually silent). @end(itemize) For some of these misspellings, the alteration of a grapheme from 'tied' to 'untied' would enable them to be corrected, and matched to their phonetic equivalent. For a number of others, in particular those involving omission of an unstressed vowel, the program would need to be altered to take them into account. Summarising the results for the phoncode program overall: @begin(itemize) the program succeeded in correcting @begin(alphabetize) 61.3% of the errors tested from Study 1; 55.9% of the errors tested from Study 2; 57.4% of errors tested overall. @end(alphabetize) of those it failed to correct (356 errors) @begin(romanize) 43 were judged to be phonetic (therefore attributable to failure on the part of the program), accounting for 5.1% of errors overall; 313 were judged to be non-phonetic (37.5% of errors overall); @end(romanize) additionally, 38.8% of misspellings were both judged to be phonetic and corrected by the phoncode program. @end(itemize) @newpage @section(Results for combined programs) @label(pres-combined) @comment[ - performance, and how phoncode improved results - words they both failed to get - best to use both?] The results of testing the performance of the two programs, on the sets of misspellings from the two studies, were combined. There was a large amount of overlap between the corrections. The results for each program and for the combined programs are given in figures @ref(editphon-ps1) and @ref(editphon-ps2).
@begin(figure) @begin(verbatim)
          a          b          c          d          e
       corrected  corrected  corrected  corrected   total
          by         by         by         by      number of
       editcost   phoncode   neither    combined    errors
Group 1
GQ        15         10         0          15         15
       (100%)     (66.7%)    (0%)       (100%)
JM        38         31         0          38         38
       (100%)     (81.6%)    (0%)       (100%)
MW        29         21         5          30         35
       (82.9%)    (60%)      (14.3%)    (85.7%)
Group 2
LB        24         23         0          26         26
       (92.3%)    (88.5%)    (0%)       (100%)
NM        30         21         2          31         33
       (90.9%)    (63.6%)    (6.1%)     (93.9%)
CM        46         30        10          50         60
       (76.7%)    (50%)      (16.7%)    (83.3%)
SS        18          8         9          19         28
       (64.3%)    (28.6%)    (32.1%)    (67.9%)
Group 1   82         62         5          83         88
total  (93.2%)    (70.5%)    (5.7%)     (94.3%)
Group 2  118         82        21         126        147
total  (80.3%)    (55.8%)    (14.3%)    (85.7%)
Both     200        144        26         209        235
groups (85.1%)    (61.3%)    (11.1%)    (88.9%)
@end(verbatim) @caption(Comparison of errors corrected by Editcost and by Phoncode programs - Study 1) @tag(editphon-ps1) @end(figure) For each child, for each group, the following information is given: @begin(alphabetize) the number and percentage of errors corrected by the editcost program; the number and percentage of errors corrected by the phoncode program; the number and percentage of errors corrected by neither program; the number and percentage of errors corrected by either of the two programs; the total number of errors made.
@end(alphabetize) @begin(figure) @begin(verbatim)
          a          b          c          d          e
       corrected  corrected  corrected  corrected   total
          by         by         by         by      number of
       editcost   phoncode   neither    combined    errors
Group 1
FR       103         78        10         113        123
       (83.7%)    (63.4%)    (8.1%)     (91.9%)
DV        65         43        22          76         98
       (66.3%)    (43.8%)    (22.4%)    (77.6%)
TE       106         71        13         118        131
       (80.9%)    (54.2%)    (9.9%)     (90.1%)
DR        55         30        26          61         87
       (63.2%)    (34.5%)    (29.9%)    (70.1%)
Group 2
GR        45         35         6          49         55
       (81.8%)    (63.6%)    (10.9%)    (89.1%)
DI        21         16         1          21         22
       (95.5%)    (72.7%)    (4.5%)     (95.5%)
MA        39         33         1          41         42
       (92.9%)    (78.6%)    (2.4%)     (97.6%)
ST        39         29         2          40         42
       (92.9%)    (69%)      (4.8%)     (95.2%)
Group 1  329        222        71         368        439
total  (74.9%)    (50.5%)    (16.2%)    (83.8%)
Group 2  144        113        10         151        161
total  (89.4%)    (70.2%)    (6.2%)     (93.8%)
Both     473        335        81         519        600
groups (78.8%)    (55.9%)    (13.5%)    (86.5%)
@end(verbatim) @caption(Comparison of errors corrected by Editcost and by Phoncode programs - Study 2) @tag(editphon-ps2) @end(figure) For Study 1, the percentage correction for the combined programs is 88.9%. Of the 35 errors that the editcost program failed to correct, 9 were corrected by the phoncode program. The remaining 26 that neither program corrected include some that were neither corrected by the judge (in testing editcost) nor judged to be phonetic. Group 1 show a higher percentage correction in all categories than group 2, though none of the differences are significant. By combining the two programs the number of errors corrected is increased for most children. GQ and JM are the exceptions, with 100% correction using the editcost program alone. The increases vary from one additional correction (MW, NM, SS), to two (LB), to four (CM). For Study 2, the combined programs correct 86.5% of misspellings. The phoncode program corrects 46 misspellings that the editcost program does not, leaving 81 misspellings not corrected by either program.
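Combining the two programs' results amounts to taking a set union over the errors each corrects. A sketch, using CM's counts from the Study 1 table (the error identifiers themselves are invented for illustration):

```python
# The combined result is a set union over the errors each program corrects.
# Counts follow CM's row in the Study 1 table; error ids are invented.
errors = {f"e{i}" for i in range(60)}            # CM made 60 errors
by_editcost = {f"e{i}" for i in range(46)}       # 46 corrected by editcost
by_phoncode = {f"e{i}" for i in range(20, 50)}   # 30 corrected by phoncode

combined = by_editcost | by_phoncode             # corrected by either program
neither = errors - combined                      # corrected by neither
print(len(combined), len(neither))               # 50 10, as in the table
```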
Group 2 show higher percentage corrections than group 1 for the individual programs (p<0.05 for editcost and p<0.02 for phoncode) but no significant differences for the combined programs. Improvements in the number of misspellings corrected vary from 0 (DI), 1 (ST) to 11 (DV), 12 (TE). The overall percentage correction by the combined programs is 87.2%. @comment[Of the 107 misspellings which were not corrected, XX were neither corrected by the judge nor considered to be phonetic.] @newpage @section(Results for individual children) @label(pres-indiv) In this section the performance of the spelling correction programs in relation to individual children is considered. The relationships between a number of measures were found by correlation of the rankings of individual children on performance measures. It was hypothesised that the children who made the most 'regular' errors, i.e. those who produced the fewest bizarre spellings, would also be those for whom the editcost and phoncode correctors would be most successful. Additionally, the errors that they made would be judged to be 'phonetic'. The children making the most 'regular' errors were those who were perceived as having the least difficulty. The children were ranked (roughly and subjectively, it should be noted) in terms of their spelling ability. This ranking was based on observation by the investigator and discussion with the Reading Unit teacher. For S1, the rough rankings, in order of decreasing ability, were: @verbatim( GQ; JM; LB; MW; NM; CM; SS ) For S2 the rough rankings were: @verbatim( MA and DI; ST; GR; FR; DV; DR; TE ) The hypotheses tested were: @begin(enumerate) success of correction by the editcost and the phoncode programs would correlate; children whose errors were judged to be phonetic would also show greatest success with the phoncode program; the children ranked as most able would be those for whom the programs were most successful and whose errors were judged to be phonetic.
@end(enumerate) For the children in each group, the relationships between the following measures were found using the Spearman Rank correlation coefficient. @begin(alphabetize) percentage correction by the editcost program (in testing); percentage of corrections that were off(1); percentage correction by the phoncode program; percentage of errors judged to be phonetic; percentage improvement of editcost results when both programs' results are combined. @end(alphabetize) The perceived rankings of the children's general spelling ability were not statistically correlated with these measures as they were considered to be too subjective and crude. They are, however, considered in relation to the results of these correlations. Measure b) was included to test whether there was any relationship between the degree of success of the editcost program (where off(1) indicated greatest success) and the other measures. Measure e) was included to further test the relationship between the editcost and phoncode programs' results. For all measures, percentages were of the total number of errors made by each child (except (b), which was a percentage of (a)). Significant correlations were found between a number of measures. These will be summarised and then discussed.
For Study 1 @begin(verbatim)
 - correlation between a) and c) = .88  ( p < 0.05 )
                       b) and d) = .76  ( p < 0.05 )
                       c) and d) = .75  ( p < 0.05 )
@end(verbatim) For Study 2 @begin(verbatim)
 - correlation between a) and c) = .93  ( p < 0.01 )
                       c) and d) = .98  ( p < 0.01 )
                       a) and d) = .97  ( p < 0.01 )

 - correlation between b) and e) = -.76 ( p < 0.05 )
                       b) and a) = .82  ( p < 0.05 )
                       b) and c) = .68  ( p < 0.05 )
                       b) and d) = .71  ( p < 0.05 )
                       e) and a) = -.72 ( p < 0.05 )
                       e) and c) = -.81 ( p < 0.05 )
                       e) and d) = -.74 ( p < 0.05 )
@end(verbatim) For the children in Study 1, success of the editcost and phoncode programs correlated; as did success of the phoncode program and the percentage of errors judged to be phonetic, and the percentage of corrections offered as the first editcost option and the percentage judged to be phonetic. Stronger correlations are shown for Study 2: performance of the phoncode and editcost programs and the percentage of errors judged phonetic all correlate. Additionally, the percentage of errors offered as first option correlated negatively with the percentage improvement made by the phoncode program when both programs' results were combined: both of these correlate (the latter, negatively) with the three strongly correlated measures above. Therefore, in general it can be said that for those children for whom the editcost program is successful, the phoncode program will also be successful. A large part of the failure of the editcost program can be attributed to unrecognisable errors; these children also make the fewest such errors. The correlation between phoncode performance and judgement of phonetic errors suggests that those children for whom the phoncode program is most successful make the fewest non-phonetic errors. These relations are shown most strongly in the Study 2 children; a strong direct correlation is also shown between performance of the editcost program and the percentage of phonetic errors.
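The Spearman coefficient used here can, in the absence of ties, be computed directly from the rank differences: rho = 1 - 6*sum(d^2)/(n(n^2-1)). A sketch with illustrative measure vectors for seven children (these are not the actual thesis data):

```python
def ranks(values):
    """Rank positions 1..n, smallest value gets rank 1 (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rho by the no-ties formula 1 - 6*sum(d^2)/(n*(n^2-1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative tie-free vectors for seven children (not the thesis data):
editcost_pct = [100, 98, 83, 92, 91, 77, 63]
phoncode_pct = [67, 82, 60, 88, 64, 50, 29]
print(round(spearman(editcost_pct, phoncode_pct), 2))   # 0.86
```

With tied values (such as two children both at 100%), average ranks would be needed; the thesis data would require that refinement.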
For these children, the correlations also suggest that those with the most errors offered as first options also make the most phonetic errors and the fewest non-phonetic errors. The negative correlation between measures e) and a) is to be expected: the more successful the editcost program is, the less scope there will be for improvement. The editcost program incorporates some information relating to phonetic equivalence of words (e.g. most likely substitutions are phonetically similar), hence the high correlation between measures a) and c) is also not surprising. Considering the individual performance rankings of the children, firstly for Study 1: group 1 were described as the more able students (see notes on children in appendix @ref(app-assum)), and group 2 as the "hopeless cases" (with LB as an addition to this group). From the performance rankings, GQ, JM and LB generally come out as the top group, with MW and NM as the middle group (except for the percentage of errors judged phonetic, where NM and GQ swap groups), and CM and SS as the least able, with the worst results for all measures. These rankings fit very well with the perceived abilities of the children. For Study 2, the performance rankings are even clearer: best ranked are DI, MA and ST, then FR and GR (where group 1 - "moderately able", and group 2 - "very bright", overlap), and finally TE, DV and DR. Again, there is a good fit between rankings and perceived abilities, with the exception of TE, who performs better than would be expected. In relation to the theoretical discussion of the stages of failure in the spelling process, various inferences may be made on the basis of these findings. Children who have the least difficulties are more likely to be failing at a later stage in the process than those who make a large number of bizarre and irregular spelling errors. If the former succeed at the 'selection of plausible graphemes' stage, but fail at the third stage, their errors will be phonetic.
They are more likely to be using correspondences from the phoncode grammar and hence their errors will be corrected by the phoncode program. Their errors are occurring in the selection of orthographically correct plausible graphemes: information relating to the format is used by the editcost program to correct these successfully. It is expected that both editcost and phoncode programs will successfully correct the errors made by these children. In terms of absolute success, it can be seen from the results that the editcost program is clearly more successful. It is designed to cope with both phonetic and non-phonetic errors: hence its higher rate of success. Where there is failure at the first or second stage, that is, the graphemes selected to represent the speech sounds are not plausible, we would expect the phoncode program to fail. We would also expect a lower rate of phonetic errors. The editcost program is able to 'pick up' some of these non-phonetic errors: some are too irregular, however, and cannot be fitted into any general description of errors. Those children perceived as 'better spellers' showed more regularity in their errors, made fewer non-phonetic errors and were more likely to have their errors corrected successfully by both the editcost and phoncode programs. They were considered to be failing to select the correct grapheme from the plausible graphemes generated. The children perceived as least able showed more irregular errors and more non-phonetic errors. The editcost program was more successful for them than the phoncode program. Neither were as successful with these children as with the better spellers. Their failings occur at the first or second stage in the spelling process; that is, in the segmenting of the word into phonemes, or in the selection of plausible graphemes to represent each phoneme. Inferences cannot be drawn from these results to judge at which of the first two stages the failing is occurring. 
It might be inferred from these findings that success in correction by the phoncode program implies that a phonological strategy is being used by the child. Conversely, success in correction by the editcost program could be taken to suggest that a visuo-orthographic strategy is being employed. If this argument is accepted, the implication would be that those children for whom both programs are successful used both phonological and visuo-orthographic strategies in spelling. Following from this, it could be argued that the children for whom the phoncode program is comparatively less successful use predominantly visuo-orthographic strategies. There are no clear conclusions that can be drawn from the evidence presented here, however, for two reasons: @begin(enumerate) the editcost program incorporates a certain amount of phonological information in relation to likely errors: therefore, the success of the editcost program and the failure of the phoncode program do not necessarily imply that a phonological strategy is not being used; it is very difficult to assess "comparatively less successful": whilst the rankings on editcost and phoncode performance correlate highly, the absolute differences between percentages appear to bear little relation to these rankings. @end(enumerate) One conclusion that may be drawn is that the more able children appear to use both strategies with more success than the less able children. @newpage @section(Testing the programs on independent data) @label(frith-testing) The editcost and phoncode programs were also tested on data from an external source. The data comprised a corpus of misspellings of thirty words produced by 202 ten-year-old children in a dictation test. The children were a random sample selected from a group of 15,000 children in English and Welsh schools. The data was made available to Roger Mitton (Birkbeck College, London) by Dr. Uta Frith (MRC Cognitive Development Unit, London).
A copy of the corpus of misspellings was provided for testing in this thesis. The number of misspellings in the corpus is 2482. Of these, 1364 are unique: the rest are the same misspelling made by more than one child. The set of unique misspellings will be referred to here as 'errors excluding repeats', whilst the full corpus will be referred to as 'errors including repeats'. The set of errors excluding repeats was used with the editcost and phoncode programs. The testing dictionary was that referred to elsewhere in this thesis (section @ref(pedit-test12)), with the addition of those of the thirty dictated words that were not already included. Results are given for each of the thirty words: figure @ref(frith-exclcop) shows the results for the errors, excluding repeats; figure @ref(frith-inclcop) shows correction of errors including repeats. @begin(figure) @begin(verbatim)
              a          b         c         d        e
           corrected  corrected corrected corrected  total
              by         by       by        by       number
           editcost   phoncode  either    either     of
Words      (off(1))   (number)            (%)        errors
often        26 (19)     15       26      89.7%       29
visited      41 (33)      8       41      91.1%       45
aunt         14 (9)       4       14      66.7%       21
magnificent  83 (78)     21       83      82.2%      101
house         8 (5)       3        8      88.9%        9
opposite     56 (44)     30       58      79.4%       73
gallery      51 (26)     18       51      81%         63
remember     37 (31)      9       37      90.2%       41
splendid     33 (29)      9       33      58.9%       56
purple       24 (18)     12       25      75.8%       33
curtains     39 (32)     24       39      79.6%       49
wrote        13 (8)       7       14      56%         25
poetry       62 (46)     24       63      78.8%       80
problem      35 (30)      8       35      83.3%       42
understand   24 (20)      5       24      82.8%       29
latest       32 (27)     10       32      71.1%       45
poems        28 (23)      9       29      74.4%       39
wanted       10 (5)       4       11      52.4%       21
laugh        18 (9)       9       23      62.2%       37
pretend      45 (37)     10       45      81.8%       55
really       29 (19)     15       29      70.7%       41
special      74 (53)     32       74      85.1%       87
refreshment  53 (48)     16       53      81.5%       65
there         5 (2)       4        5      71.4%        7
blue          5 (4)       3        5      71.4%        7
juice        18 (14)     23       32      69.6%       46
cake          9 (6)       2       11      84.6%       13
biscuits     63 (55)     16       63      80.8%       78
stomach      62 (44)     44       67      79.8%       84
contented    37 (33)      9       37      86%         43
Total      1034 (807)   403     1067      78.2%     1364
@end(verbatim) @caption(Testing of the editcost and
phoncode programs on independent data - excluding repeats) @tag(frith-exclcop) @end(figure) @begin(figure) @begin(verbatim)
              a          b         c         d        e
           corrected  corrected corrected corrected  total
              by         by       by        by       number
           editcost   phoncode  either    either     of
Words      (off(1))   (number)            (%)        errors
often        51 (42)     32       51      91.1%       56
visited      93 (78)     24       93      93.9%       99
aunt         71 (63)     43       71      87.7%       81
magnificent 136 (131)    59      136      88.3%      154
house        14 (11)      9       14      93.3%       15
opposite    125 (109)    97      132      88%        150
gallery     101 (70)     58      101      88.6%      114
remember     90 (84)     13       90      94.7%       95
splendid    102 (96)     61      102      81.6%      125
purple       41 (35)     27       42      84%         50
curtains     76 (66)     57       76      88.4%       86
wrote        63 (55)     60       64      78%         82
poetry       91 (73)     37       92      84.4%      109
problem      66 (60)     15       66      90.4%       73
understand   29 (25)      5       29      85.3%       34
latest       47 (42)     14       47      77%         61
poems        64 (58)     19       65      86.7%       75
wanted       28 (17)     16       29      70.8%       41
laugh        28 (18)     30       50      76.9%       65
pretend      81 (69)     26       81      88%         92
really      110 (99)     94      110      90.2%      122
special     108 (87)     52      108      89.3%      121
refreshment  77 (72)     28       77      86.5%       89
there        19 (12)     19       19      54.3%       35
blue         11 (10)      3       11      84.6%       13
juice        54 (50)     64       74      83.1%       89
cake         13 (10)      5       18      90%         20
biscuits    126 (113)    65      126      89.4%      141
stomach     108 (77)     86      116      84.7%      137
contented    52 (47)     16       52      89.7%       58
Total      2075 (1779) 1134     2142      86.3%     2482
@end(verbatim) @caption(Testing of the editcost and phoncode programs on independent data - including repeats) @tag(frith-inclcop) @end(figure) Results of testing are given in the following categories: @begin(alphabetize) the number of errors for which the correction was offered by the editcost program (and the number for which it was the first word offered); the number of errors for which the correction was offered by the phoncode program; the number of errors for which the correction was offered by either of the two programs; the percentage of the total number of errors for which the correction was offered, by either program (c/e); the total number of errors.
@end(alphabetize) For 78.2% of the unique errors, and for 86.3% of the total number of errors, the correction is offered by either the editcost or the phoncode program. Of the errors corrected by the editcost program, 85.7% are offered as the first option, i.e. the least-cost repair, representing 71.7% of the total number of errors. As with the children in the two studies, the editcost program was more successful than the phoncode program. Because a large number of the errors made by the children would probably not be classed as phonetic, this was to be expected. Some failure could be attributed to the program, however. The words that the correctors failed on are not analysed in detail, though the discussion of failure in relation to the two studies is of relevance (see section @ref(pphon-disc12)). The phoncode program provided little improvement over the editcost program, except for the words 'laugh' and 'juice'. The combined programs failed to achieve 70% correction on unique misspellings of 'aunt', 'wrote', 'wanted', 'laugh', 'juice' and 'splendid'. There is an improvement in performance when repeated errors are included: those errors that the programs succeed in correcting are those that are most often repeated (the exception being the misspellings of 'there'). For 7 of the 30 words, 90% or more of the misspellings were corrected. Mitton had previously tested two other spelling correction algorithms with this data @cite(mitton84). He found that 42% of errors (including repeats) would be included as candidates when classed as single edit misspellings (i.e. one edit operation required to correct the error). Depending upon the size and the content of the dictionary, there may be many other candidates. The errors were also coded using the soundex code. For 64% of errors the coding matched for error and correction. Again, many other candidates may also match. Combining the results of the two algorithms, the correction was found to be in the candidate list for 72.9% of errors.
For the editcost program alone, the percentage of errors corrected was 83.6%; for more than 85% of these, the first word offered was the correction (71.7% of the total). The correction programs, therefore, though designed for use with children with spelling difficulties, could also be used by other children. @newpage @section(Summary) @label(perform-summ) The results presented in this chapter show that the spelling correction programs, developed in this study, were successful in correcting the errors made by children with learning difficulties in spelling. The editcost program was the more successful of the two. As it was designed to deal with both phonetic and non-phonetic errors, whereas the phoncode program was designed to deal with phonetic errors, this was to be expected. The editcost program succeeded in offering corrections for more than 80% of the errors made in the two studies. The phoncode program succeeded in offering corrections for 57.4% of errors tested. Of those the phoncode program failed to correct (42.6%), 37.5% were judged not to be phonetic. In combination the two programs provided corrections for 87.2% of errors over both studies. The success of the programs is restricted by the requirement that the correction be in the dictionary: if it is not in the dictionary, it cannot be offered to the user. The programs were also tested on independent data and found to be successful: 78.2% of unique errors made were corrected by the combined programs; 86.3% of errors in the corpus (including repeats) were corrected. For 71.7% of the complete corpus, the intended correction was the first word offered by the editcost program. This compares favourably with other algorithms tested on the same data. The program would, therefore, be suitable for use by children with no specific difficulties. Evidence is provided that there are regularities in the errors made by children with spelling disabilities.
In testing the editcost program, 80.6% of errors were successfully corrected. Of the failures, those amounting to 13.7% of errors were attributed to there being insufficient regularity to enable correction, and those amounting to 5.7% of errors were attributed to failure on the part of the program. Thus, for 86.3% of errors there was sufficient regularity in the misspelling to permit correction of the error. In considering the results of the phoncode program, 57.4% of errors were corrected overall, and a further 5.1% were attributed to failure on the part of the program. 37.5% of errors, therefore, were assessed as being non-phonetic - that is, the phoneme-grapheme correspondences on which they were based did not conform to the grammar provided. However, 62.5% of misspellings did conform to the grammar. It is argued that there are regular phoneme-grapheme correspondences in the children's spellings, and that there are also additional regularities in the orthography (as indicated by the additional corrections made by the editcost program). That the programs succeed in correcting a large proportion of the errors made also demonstrates that these regularities can be used by the programs to reconstruct the corrections from the errors. The information incorporated in the programs, based on the description of the errors in terms of format, general classes of characters and rules, and phoneme-grapheme correspondences, enables successful debugging of the error to provide the correction. The description of errors in these terms is also, to a large extent, validated by the results.