.pg 5.1 2-D silhouette vision
This section describes how various sorts of information can be extracted
from a simple silhouette image, i.e. an image consisting essentially of a
rectangular array of values, each either 0 (representing a point that is
not part of the image of some object) or 1 (representing a point that
is). Consider, for example, this image of the letter 'T':
.nf
.sp 1
00000000000000000000000000
00000011111111111100000000
00000011111111111100000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000001111000000000000
00000000000000000000000000
.sp 1
.fi
It conveniently fits the frame of the image and is aligned nicely with
the sides, which shows that it is an artificial example. The 234 values
that make up the image can be held much more compactly in
.ul 1
run-length coded
form: a series of integers giving the number of cells in a left-to-right,
top-to-bottom scan between each 1/0 or 0/1 transition. For example,
starting at the top left corner, there are 26+6 zeros, then 12 ones, then
8+6 zeros, then 12 ones, then 8+10 zeros, and so on. The run-length coded
version of the above image is therefore 32, 12, 14, 12, 18, 4, 22, 4, 22,
4, 22, 4, 22, 4, 38. To decode this you need to know the top left value
(0 or 1) and that the image is 26 cells wide. Nevertheless the 234
original values have been replaced by 15 (plus perhaps another two at the
start giving the necessary extra information). The 234 original values
fit into 30 8-bit bytes, so if each run length is held in one byte,
run-length coding has saved over 40% in space.
.ti 4
It is possible to extract various useful pieces of information from a
silhouette image. Since the image is only 2-D, however, it is assumed
that the viewed object has constant thickness in the unperceived
dimension - an object for which this is true is commonly called 2.5-D
("two-and-a-half dimensional").
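The coding and decoding just described can be sketched in Python (a
minimal illustration; the function names are ours, not part of any
standard library):

```python
# Run-length coding of a binary silhouette image, as described above.
# The image is scanned left-to-right, top-to-bottom; each integer in the
# code is the length of a run of identical values.

def run_length_encode(image):
    """Return (top_left_value, width, runs) for a list of equal-width rows."""
    flat = [v for row in image for v in row]
    runs = [1]
    for prev, cur in zip(flat, flat[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return flat[0], len(image[0]), runs

def run_length_decode(top_left_value, width, runs):
    """Rebuild the rows of the image from the coded form."""
    flat, value = [], top_left_value
    for n in runs:
        flat.extend([value] * n)
        value = 1 - value                    # runs alternate between 0s and 1s
    return [flat[i:i + width] for i in range(0, len(flat), width)]

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

first, width, runs = run_length_encode(T)
print(runs)   # [32, 12, 14, 12, 18, 4, 22, 4, 22, 4, 22, 4, 22, 4, 38]
assert run_length_decode(first, width, runs) == T
```

The decoder needs only the top-left value and the width, exactly the two
extra items of information mentioned above.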
It is frequently possible for a vision system to rely on this assumption
even when it is false, either because the variation in thickness is small
enough not to matter much, or because the system can be taught to treat
what appear to be different objects (actually one 3-D object viewed in
different stable states) in the same way.
.ti 4
The simplest useful information that can be extracted from the image is
the coordinates of the centre of area. The x-coordinate is simply the
average of the x-coordinates of the points in the image of the object;
the y-coordinate is found in the same way. For example, assuming that the
origin is at the centre of the first pixel (at the top left), that the X
axis runs across and that the Y axis runs downward, the x-coordinate of
the centre of area of the image of the 'T' above is
.nf
.sp 1
2*6 + 2*7 + 2*8 + 2*9 + 7*10 + 7*11 + 7*12 + 7*13 + 2*14 + 2*15 + 2*16 + 2*17
-----------------------------------------------------------------------------
 2  +  2  +  2  +  2  +  7   +  7   +  7   +  7   +  2   +  2   +  2   +  2

    = 506/44 = 11.5   (the centreline of the image, as expected!)

(note that the denominator in the fraction is just the number of pixels
in the object's image, i.e. a measure of its 'mass')

and similarly the y-coordinate is 136/44 = 3.090+
.sp 1
.fi
It is possible, and faster, to evaluate the various summations in these
two calculations as the image is scanned for the first time. The
coordinates of the centre of area are very useful to know, because they
are two of the coordinates of the centre of mass if the object is 2.5-D.
When picking the object up it is best to have the centre of mass between
the fingers, in order to ensure that the object does not pivot as it is
transported.
.ti 4
An estimate of the orientation of the object can also be calculated. The
orientation is defined thus: imagine a straight line through the centre
of area, and take the sum of the squares of the distances of every
object image point from this line. The orientation of the line that
makes this quantity a minimum is the orientation of the object.
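The single-scan evaluation of the summations can be sketched like this
(a minimal illustration; the function name is ours):

```python
# Centre of area of a silhouette image, accumulated in one scan:
# sum of x-coordinates, sum of y-coordinates, and the pixel count
# ('mass') are all gathered in the same pass over the image.
# Origin at the centre of the top-left pixel; X runs across, Y downward.

def centre_of_area(image):
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == 1:
                sum_x += x
                sum_y += y
                count += 1
    return sum_x / count, sum_y / count

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

print(centre_of_area(T))   # x = 506/44 = 11.5, y = 136/44 = 3.0909...
```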
Some simple coordinate geometry will show how to determine this.
.sp 1
.in +4
Define the position of the line by its perpendicular distance (p) from
the origin, and the angle (a) that the perpendicular makes with the X
axis. Any point on the line satisfies the equation
.nf
x cos(a) + y sin(a) = p
.fi
[To see this, note that OF=x, so OA = x cos(a). Also AP = FB = y sin(a).]
Now consider any point (x1,y1) in the plane. The line which is parallel
to the first one and passes through (x1,y1) will be distant
x1 cos(a) + y1 sin(a) from the origin. Thus the point will be distant
x1 cos(a) + y1 sin(a) - p from the first line. Therefore the orientation
angle a is the value that minimises the expression
.nf
sum ( x1 cos(a) + y1 sin(a) - p )^2
.fi
where the sum is taken over all the points (x1,y1) in the object's
image. If the centre of area is at (x0,y0) then simple calculus will
show that the angle a satisfies the equation
.nf
              2 sum (x1-x0)(y1-y0)
tan(2a) = -----------------------------
          sum (x1-x0)^2 - sum (y1-y0)^2
.fi
and these summations can be calculated fairly easily.
.in -4
.ti 4
Provided that only objects known to the system ever appear in the image,
it is possible for the system to distinguish between them without having
to form a full description of the shape of each object. For example, it
might be possible to categorise each object by using one or more special
features of its shape, such as the length of the perimeter of the image,
the area of the image, the area and/or location of the largest/smallest
hole in the image, the directions of the maxima or minima of the radius
of the image, and so on. Rather than have a programmer specify the
criteria to use, it is often more convenient to show all the objects to
the vision system during a 'teach' session, and have the controlling
computer decide what set of criteria is suitable to distinguish reliably
between the objects.
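The orientation estimate can be sketched as follows (a minimal
illustration; the function name is ours). Note that both a and a+90
degrees satisfy the tan(2a) equation, one giving the minimum and one the
maximum of the sum of squared distances, so the sketch checks both
candidates explicitly:

```python
import math

# Orientation of a silhouette, from the relation derived above:
#   tan(2a) = 2*sum((x-x0)(y-y0)) / (sum((x-x0)^2) - sum((y-y0)^2))
# where (x0,y0) is the centre of area and a is the angle that the
# perpendicular from the origin to the best-fit line makes with the X axis.

def orientation(image):
    pts = [(x, y) for y, row in enumerate(image)
                  for x, v in enumerate(row) if v == 1]
    n = len(pts)
    x0 = sum(x for x, _ in pts) / n          # centre of area
    y0 = sum(y for _, y in pts) / n
    sxy = sum((x - x0) * (y - y0) for x, y in pts)
    sxx_syy = sum((x - x0) ** 2 - (y - y0) ** 2 for x, y in pts)
    a = 0.5 * math.atan2(2 * sxy, sxx_syy)   # one solution of tan(2a) = ...

    def cost(ang):                           # sum of squared distances
        c, s = math.cos(ang), math.sin(ang)
        return sum(((x - x0) * c + (y - y0) * s) ** 2 for x, y in pts)

    return min(a, a + math.pi / 2, key=cost)

T = [[int(c) for c in row] for row in [
    "00000000000000000000000000",
    "00000011111111111100000000",
    "00000011111111111100000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000001111000000000000",
    "00000000000000000000000000",
]]

# For the 'T' image a comes out as 90 degrees: the perpendicular is
# vertical, so the minimising line itself is horizontal, as expected
# for a shape that is wider than it is tall.
print(math.degrees(orientation(T)))
```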
.pg 5.1.1 A digression: Hough transforms
Various methods have been described in [5] for deciding where the edges
are in a grey level image. However, those methods are not especially
useful for robotic vision tasks; they form part of models of human
visual processing, and are not especially efficient. One reasonably
efficient though non-human method uses what are known as
.ul 1
Hough transforms.
Consider a point on the edge of the image at (x,y). As was shown above,
any line through this point has equation
.nf
x cos(a) + y sin(a) = p
.fi
and it is p and a that actually determine the line. The angle a could
be anywhere between 0 and 360 degrees, and p could lie between 0 and the
largest value that would let the line appear somewhere in the image,
namely the length of the diagonal of the rectangular image (remember, p
is the closest that the line gets to the origin; thus if p is too big,
the line never appears in the image at all).
.ti 4
Imagine keeping x and y fixed and varying a and p. If the points (p,a)
are plotted, they lie on a sine curve (which is "the Hough transform of
(x,y)") within the rectangle 0 < a < 360, 0 < p < diagonal. There is one
such sine curve for each edge point (x,y) in the original image. Now
each straight edge in the original image is part of one line, determined
by one particular pair (p,a), and the Hough transforms of all the edge
points on that edge will pass through that pair. Thus it should be
possible to find the equations of the edges by finding those points
(p,a) where many of the Hough transforms intersect, or nearly do so. The
easiest way to do this is to divide the rectangle 0 < a < 360,
0 < p < diagonal into many small rectangles, and count the number of
times that a Hough transform passes through each. The rectangles crossed
by many of the transforms probably (though not certainly) correspond to
an edge in the original image.
.ti 4
This method of finding the equations of edges is surprisingly efficient,
although it is rather sensitive to the choice of size of the small
rectangles. One noteworthy advantage of the method is that it is
untroubled by edges which are partially obscured - the two or more
visible parts of such an edge are NOT treated as separate entities in
the finding process, since their edge points all vote for the same pair
(p,a).