.pg 5.1 2-D silhouette vision
This section describes how various sorts of information can be extracted
from a simple silhouette image, i.e. an image consisting essentially of a
rectangular array of values, each either 0 (representing a point that is
not part of the image of some object) or 1 (representing a point that
is). Consider, for example, this image of the letter 'T':
.nf
.sp 1
00000000000000000000000000
00000011111111111100000000
00000011111111111100000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000000000000000000000
.sp 1
.fi
It conveniently fits the frame of the image and is aligned nicely with
the sides, which shows that it is an artificial example. The 234 values
that make up the image can be held much more compactly in
.ul 1
run-length coded
form: a series of integers giving the number of cells in a left-to-right,
top-to-bottom scan between each 1/0 or 0/1 transition. For example,
starting at the top left corner, there are 26+6 zeros, then 12 ones, then
8+6 zeros, then 12 ones, then 8+10 zeros, and so on. The run-length coded
version of the above image is therefore 32, 12, 14, 12, 18, 4, 22, 4, 22,
4, 22, 4, 22, 4, 38. To decode this you need to know the top left value
(0 or 1) and that the image is 26 cells wide. Nevertheless the 234
original values have been replaced by 15 (plus perhaps another two at the
start giving the necessary extra information). The 234 original values
fit into 30 8-bit bytes, so if each run length is held in one byte,
run-length coding has saved over 40% in space.
.ti 4
It is possible to extract various useful pieces of information from a
silhouette image. Since the image is only 2-D, however, it is assumed
that the viewed object has constant thickness in the unperceived
dimension - an object for which this is true is commonly called 2.5-D
("two-and-a-half dimensional").
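The coding and decoding just described can be sketched in Python (a
minimal illustration; the function names are ours, not part of any
standard library):

```python
# Run-length coding of a binary silhouette image, as described above.
# The image is scanned left-to-right, top-to-bottom; each integer in the
# code is the length of a run of identical values.

def run_length_encode(image):
    """Return (top_left_value, width, runs) for a list of equal-width rows."""
    flat = [v for row in image for v in row]
    runs = [1]
    for prev, cur in zip(flat, flat[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return flat[0], len(image[0]), runs

def run_length_decode(top_left_value, width, runs):
    """Rebuild the rows of the image from the coded form."""
    flat, value = [], top_left_value
    for n in runs:
        flat.extend([value] * n)
        value = 1 - value                    # runs alternate between 0s and 1s
    return [flat[i:i + width] for i in range(0, len(flat), width)]

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

first, width, runs = run_length_encode(T)
print(runs)   # [32, 12, 14, 12, 18, 4, 22, 4, 22, 4, 22, 4, 22, 4, 38]
assert run_length_decode(first, width, runs) == T
```

The decoder needs only the top-left value and the width, exactly the two
extra items of information mentioned above.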
It is frequently possible for a vision system to rely on this assumption
even when it is false, either because the variation in thickness is small
enough not to matter much, or because the system can be taught to treat
what appear to be different objects (actually one 3-D object viewed in
different stable states) in the same way.
.ti 4
The simplest useful information that can be extracted from the image is
the coordinates of the centre of area. The x-coordinate is simply the
average of the x-coordinates of the points in the image of the object;
the y-coordinate is found in the same way. For example, assuming that the
origin is at the centre of the first pixel (at the top left), that the X
axis runs across and that the Y axis runs downward, the x-coordinate of
the centre of area of the image of the 'T' above is
.nf
.sp 1
2*6 + 2*7 + 2*8 + 2*9 + 7*10 + 7*11 + 7*12 + 7*13 + 2*14 + 2*15 + 2*16 + 2*17
-----------------------------------------------------------------------------
 2  +  2  +  2  +  2  +  7   +  7   +  7   +  7   +  2   +  2   +  2   +  2

    = 506/44 = 11.5   (the centreline of the image, as expected!)

(note that the denominator in the fraction is just the number of pixels
in the object's image, i.e. a measure of its 'mass')

and similarly the y-coordinate is 136/44 = 3.090+
.sp 1
.fi
It is possible, and faster, to evaluate the various summations in these
two calculations as the image is scanned for the first time. The
coordinates of the centre of area are very useful to know, because they
are two of the coordinates of the centre of mass if the object is 2.5-D.
When picking the object up it is best to have the centre of mass between
the fingers, in order to ensure that the object does not pivot as it is
transported.
.ti 4
An estimate of the orientation of the object can also be calculated. The
orientation is defined thus: imagine a straight line through the centre
of area, and take the sum of the squares of the distances of every
object image point from this line. The orientation of the line that
makes this quantity a minimum is the orientation of the object.
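The single-scan evaluation of the summations can be sketched like this
(a minimal illustration; the function name is ours):

```python
# Centre of area of a silhouette image, accumulated in one scan:
# sum of x-coordinates, sum of y-coordinates, and the pixel count
# ('mass') are all gathered in the same pass over the image.
# Origin at the centre of the top-left pixel; X runs across, Y downward.

def centre_of_area(image):
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == 1:
                sum_x += x
                sum_y += y
                count += 1
    return sum_x / count, sum_y / count

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

print(centre_of_area(T))   # x = 506/44 = 11.5, y = 136/44 = 3.0909...
```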
Some simple coordinate geometry will show how to determine this.
.sp 1
.in +4
Define the position of the line by its perpendicular distance (p) from
the origin, and the angle (a) that the perpendicular makes with the X
axis. Any point on the line satisfies the equation
.nf
x cos(a) + y sin(a) = p
.fi
[To see this, note that OF=x, so OA = x cos(a). Also AP = FB = y sin(a).]
Now consider any point (x1,y1) in the plane. The line which is parallel
to the first one and passes through (x1,y1) will be distant
x1 cos(a) + y1 sin(a) from the origin. Thus the point will be distant
x1 cos(a) + y1 sin(a) - p from the first line. Therefore the orientation
angle a is the value that minimises the expression
.nf
sum ( x1 cos(a) + y1 sin(a) - p )^2
.fi
where the sum is taken over all the points (x1,y1) in the object's
image. If the centre of area is at (x0,y0) then simple calculus will
show that the angle a satisfies the equation
.nf
              2 sum (x1-x0)(y1-y0)
tan(2a) = -----------------------------
          sum (x1-x0)^2 - sum (y1-y0)^2
.fi
and these summations can be calculated fairly easily.
.in -4
.ti 4
Provided that only objects known to the system ever appear in the image,
it is possible for the system to distinguish between them without having
to form a full description of the shape of each object. For example, it
might be possible to categorise each object by using one or more special
features of its shape, such as the length of the perimeter of the image,
the area of the image, the area and/or location of the largest/smallest
hole in the image, the directions of the maxima or minima of the radius
of the image, and so on. Rather than have a programmer specify the
criteria to use, it is often more convenient to show all the objects to
the vision system during a 'teach' session, and have the controlling
computer decide what set of criteria is suitable to distinguish reliably
between the objects.
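The orientation estimate can be sketched as follows (a minimal
illustration; the function name is ours). Note that both a and a+90
degrees satisfy the tan(2a) equation, one giving the minimum and one the
maximum of the sum of squared distances, so the sketch checks both
candidates explicitly:

```python
import math

# Orientation of a silhouette, from the relation derived above:
#   tan(2a) = 2*sum((x-x0)(y-y0)) / (sum((x-x0)^2) - sum((y-y0)^2))
# where (x0,y0) is the centre of area and a is the angle that the
# perpendicular from the origin to the best-fit line makes with the X axis.

def orientation(image):
    pts = [(x, y) for y, row in enumerate(image)
                  for x, v in enumerate(row) if v == 1]
    n = len(pts)
    x0 = sum(x for x, _ in pts) / n          # centre of area
    y0 = sum(y for _, y in pts) / n
    sxy = sum((x - x0) * (y - y0) for x, y in pts)
    sxx_syy = sum((x - x0) ** 2 - (y - y0) ** 2 for x, y in pts)
    a = 0.5 * math.atan2(2 * sxy, sxx_syy)   # one solution of tan(2a) = ...

    def cost(ang):                           # sum of squared distances
        c, s = math.cos(ang), math.sin(ang)
        return sum(((x - x0) * c + (y - y0) * s) ** 2 for x, y in pts)

    return min(a, a + math.pi / 2, key=cost)

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

# For the 'T' image a comes out as 90 degrees: the perpendicular is
# vertical, so the minimising line itself is horizontal, as expected
# for a shape that is wider than it is tall.
print(math.degrees(orientation(T)))
```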
.pg 5.1.1 A digression: Hough transforms
Various methods have been described in [5] for deciding where the edges
are in a grey level image. However, those methods are not especially
useful for robotic vision tasks; they form part of models of human
visual processing, and are not especially efficient. One reasonably
efficient though non-human method uses what are known as
.ul 1
Hough transforms.
Consider a point on the edge of the image at (x,y). As was shown above,
any line through this point has equation
.nf
x cos(a) + y sin(a) = p
.fi
and it is p and a that actually determine the line. The angle a could
be anywhere between 0 and 360 degrees, and p could lie between 0 and the
largest value that would let the line appear somewhere in the image,
namely the length of the diagonal of the rectangular image (remember, p
is the closest that the line gets to the origin; thus if p is too big,
the line never appears in the image at all).
.ti 4
Imagine keeping x and y fixed and varying a and p. If the points (p,a)
are plotted, they lie on a sine curve (which is "the Hough transform of
(x,y)") within the rectangle 0 < a < 360, 0 < p < diagonal. There is one
such sine curve for each edge point (x,y) in the original image. Now
each straight edge in the original image is part of one line, determined
by one particular pair (p,a), and the Hough transforms of all the edge
points on that edge will pass through that pair. Thus it should be
possible to find the equations of the edges by finding those points
(p,a) where many of the Hough transforms intersect, or nearly do so. The
easiest way to do this is to divide the rectangle 0 < a < 360,
0 < p < diagonal into many small rectangles, and count the number of
times that a Hough transform passes through each. The rectangles crossed
by many of the transforms probably (though not certainly) correspond to
an edge in the original image.
.ti 4
This method of finding the equations of edges is surprisingly efficient,
although it is rather sensitive to the choice of size of the small
rectangles. One noteworthy advantage of the method is that it is
untroubled by edges which are partially obscured - the two or more
visible parts of such an edge are NOT treated as separate entities in
the finding process, since their edge points all vote for the same pair
(p,a).