.pg
5.4 3-D visual processing
Three-dimensional perception would clearly be very useful to a robot - for instance, it would then be possible for it to tell whether parts were properly aligned with each other before trying to fit them together. There are essentially two ways of acquiring the range information on which to base 3-D vision. The first is to measure the time of flight of some kind of wave - sound, light or radar - travelling from a known transmitting point, reflected by the scene, to a known receiving point. Various problems make this unsuitable for robotics, unfortunately:
.nf
a) if the transmitted wave spreads, it is hard to be sure where the
   reflection is coming from; if it does not spread, then the wave must be
   scanned over the scene, and this takes time
b) sound travels at around 330 m/s in air. Waiting for the reflection from
   an object only 1.5 m away wastes around 0.01 secs, so the time needed to
   build an adequate image is uneconomically large
c) light and other electromagnetic radiation travels at nearly
   300,000,000 m/s in air. It would therefore be necessary to measure a
   time difference of only 0.00000001 secs merely to detect the reflection,
   and considerably smaller times to calculate the range with any accuracy.
   While this might be technologically feasible, it is certainly not an
   economic proposition.
.fi
The second way is to use triangulation in some form. The idea that naturally comes to mind is to have two cameras providing images of a scene from slightly different viewpoints. Knowing the locations, directions of view and physical characteristics of the cameras, it would only be necessary to compare the positions of the two images of any visible point in order to calculate its location in space. However, there are problems. If the cameras are fixed, the viewable volume is fixed; if they are movable, they are another source of error in the calculations. Also, even under regulated lighting conditions, it is not simple to decide which points in one image match which points in the other, since the grey levels of the pixels in the two images that correspond to one physical point will almost certainly differ. This is because the intensity of the light reflected by the objects varies with the direction of view. The problem is even worse if the scene happens to include curved surfaces!
.ti 4
One way to reduce these problems is to replace one of the cameras with a projector throwing a plane of light onto the scene. This arrangement is called a 'striper'; it has been used successfully in various applications, including some involving cylindrical, spherical and conical surfaces.
.pg
5.4.1 The 'striper'
The general arrangement is as shown in the diagram:
.sp 14
The projected plane of light produces a line of light across the scene; provided that the lighting conditions are right, only this line of light will register in the camera image:
.sp 10
The plane of light can be moved across the whole scene by moving the camera, the projector or the scene itself. The succession of images produces a contour map of the scene:
.sp 10
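Each stripe image can, in principle, be converted into three-dimensional coordinates by triangulation: a bright pixel defines a ray out through the camera lens, and intersecting that ray with the known plane of light gives the surface point that reflected it. The short sketch below illustrates the calculation, assuming a simple pinhole camera at the origin looking along the z axis with focal length f, and a plane of light whose equation is known in the camera's coordinate frame; the function name, coordinate conventions and numerical values are illustrative assumptions, not details taken from the text above.
.nf
def stripe_pixel_to_3d(u, v, f, plane):
    # Triangulate one bright stripe pixel into a 3-D point, assuming a
    # pinhole camera at the origin looking along +z with focal length f
    # (u, v measured from the optical axis), and a plane of light
    # a*x + b*y + c*z = d expressed in the camera's coordinate frame.
    a, b, c, d = plane
    # The pixel (u, v) back-projects to the ray t*(u/f, v/f, 1), t > 0.
    rx, ry, rz = u / f, v / f, 1.0
    denom = a * rx + b * ry + c * rz
    if abs(denom) < 1e-9:
        return None                      # ray (nearly) parallel to the plane
    t = d / denom
    if t <= 0:
        return None                      # intersection behind the camera
    return (t * rx, t * ry, t * rz)      # (x, y, z) in camera coordinates

# Illustrative values: plane of light x + z = 1, focal length 1.
print(stripe_pixel_to_3d(0.2, -0.1, 1.0, (1.0, 0.0, 1.0, 1.0)))
# prints approximately (0.1667, -0.0833, 0.8333)
.fi
Repeating this for every bright pixel in every image of the succession yields the set of 3-D points that make up the contour map.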
The main problem is to decide which points in one image correspond to which points in the next. It is fairly easy to match the discontinuities in the image of the line of light, and to match up the straight segments. The curved segments can be matched by decomposing them into bits, for example by finding parallel chords of equal length in the two images (a short program sketch of this idea is given at the end of the section):
.sp 10
Things can go wrong: for example, if the plane of light happens to be parallel to the axis of a cylinder, then the contour map for the cylinder will be a set of parallel but not equally spaced straight lines, and a little thought should convince you that this is ambiguous. However, in most cases it is possible to have the computer deduce where the edges and surfaces are from the information in the contour map, and thereby build up its own internal representation of the viewed three-dimensional objects. A more detailed account can be found in [7].
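The chord-matching step mentioned above can be sketched as follows. The sketch assumes that each stripe image has already been reduced to an ordered list of (x, y) pixel coordinates along a curved segment; it then looks for chords of a chosen length in each curve whose directions are nearly parallel, and takes the endpoints of such chords to correspond. The function names, tolerances and the brute-force search are illustrative assumptions, not the method described in [7].
.nf
import math

def chords_of_length(curve, length, tol=0.5):
    # Return index pairs (i, j) on a sampled curve whose endpoints are
    # separated by approximately 'length', i.e. chords of that length.
    chords = []
    for i in range(len(curve)):
        for j in range(i + 1, len(curve)):
            if abs(math.dist(curve[i], curve[j]) - length) < tol:
                chords.append((i, j))
                break                # keep only the first chord from point i
    return chords

def match_by_parallel_chords(curve1, curve2, length, angle_tol=0.05):
    # Pair up points on two stripe-image curves: a chord in one image is
    # matched with an equal-length chord in the other whose direction is
    # (nearly) parallel, and their endpoints are taken to correspond.
    matches = []
    for i1, j1 in chords_of_length(curve1, length):
        a1 = math.atan2(curve1[j1][1] - curve1[i1][1],
                        curve1[j1][0] - curve1[i1][0])
        for i2, j2 in chords_of_length(curve2, length):
            a2 = math.atan2(curve2[j2][1] - curve2[i2][1],
                            curve2[j2][0] - curve2[i2][0])
            diff = abs(a1 - a2) % math.pi      # parallel regardless of sense
            if diff < angle_tol or diff > math.pi - angle_tol:
                matches.append(((i1, i2), (j1, j2)))
                break
        # chords with no parallel partner are left unmatched
    return matches
.fi
Once corresponding points have been found, the successive contours can be linked, and this is the information from which the computer deduces where the edges and surfaces are.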