Biederman's recognition-by-components theory

Biederman (1987, 1990) put forward a theory of object recognition extending that of Marr and Nishihara (1978). The central assumption of his recognition-by-components theory is that objects consist of basic shapes or components known as "geons" (geometric ions). Examples of geons are blocks, cylinders, spheres, arcs, and wedges. According to Biederman (1987), there are about 36 different geons. This may seem suspiciously few to provide descriptions of all the objects we can recognise and identify. However, we can identify enormous numbers of spoken English words even though there are only about 44 phonemes in the English language. The reason is that these phonemes can be arranged in almost endless different orders. The same is true of geons. Part of the reason for the richness of the object descriptions provided by geons stems from the different possible spatial relationships among them. For example, a cup can be described by an arc connected to the side of a cylinder, and a bucket can be described by the same two geons, but with the arc connected to the top of the cylinder.
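The cup and bucket example can be written down as a toy structural description. This is only an illustrative sketch: the `Relation` class, the field names, and the labels "side" and "top" are assumptions made for the example, not notation used by Biederman.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical record of how one geon is attached to another."""
    part: str    # geon being attached
    anchor: str  # geon it is attached to
    where: str   # attachment site on the anchor geon


# Each toy object is a pair: (set of geons, spatial relation between them).
CUP = (frozenset({"arc", "cylinder"}), Relation("arc", "cylinder", "side"))
BUCKET = (frozenset({"arc", "cylinder"}), Relation("arc", "cylinder", "top"))


def same_geons(a, b):
    """True if two descriptions use exactly the same geons."""
    return a[0] == b[0]


def same_object(a, b):
    """True only if both the geons and their spatial relation agree."""
    return a == b


print(same_geons(CUP, BUCKET))   # same two geons
print(same_object(CUP, BUCKET))  # but a different spatial relation
```

The point of the sketch is that the geon inventory alone does not separate the two objects; the spatial relation does.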

In order to understand recognition-by-components theory more fully, refer to Figure 4.5. The stage we have discussed so far is that of the determination of the components or geons of a visual object and their relationships. When this information is available, it is matched with stored object representations or structural models containing information about the nature of the relevant geons, their orientations, sizes, and so on. In general terms, the identification of any given visual object is determined by whichever stored object representation provides the best fit with the component- or geon-based information obtained from the visual object.
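The idea of choosing the stored representation with the best fit can be illustrated with a minimal sketch. The theory as described here does not specify a fit measure, so the set-overlap (Jaccard) score below is an assumption, as are the feature labels and the three toy models.

```python
def fit(observed, stored):
    """Overlap between two feature sets (an assumed measure of fit)."""
    union = observed | stored
    return len(observed & stored) / len(union) if union else 0.0


def identify(observed, models):
    """Return the name of the stored model that fits the input best."""
    return max(models, key=lambda name: fit(observed, models[name]))


# Hypothetical stored structural models: geons plus one spatial relation.
MODELS = {
    "cup": {("geon", "cylinder"), ("geon", "arc"), ("arc-on-cylinder", "side")},
    "bucket": {("geon", "cylinder"), ("geon", "arc"), ("arc-on-cylinder", "top")},
    "box": {("geon", "block")},
}

full_view = {("geon", "cylinder"), ("geon", "arc"), ("arc-on-cylinder", "side")}
partial_view = {("geon", "arc"), ("arc-on-cylinder", "top")}  # cylinder not detected

print(identify(full_view, MODELS))
print(identify(partial_view, MODELS))
```

With these toy models the partial input is still assigned to the bucket, because the bucket model shares more of its features than any other model does.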

As can be seen in Figure 4.5, only part of Biederman's theory has been presented so far. What has been omitted is any analysis of how an object's components or geons are determined. The first step is edge extraction, which was described by Biederman (1987, p. 117) in the following way: "[There is] an early edge extraction stage, responsive to differences in surface characteristics namely, luminance, texture, or colour, provides a line drawing description of the object."

The next step is to decide how a visual object should be segmented to establish the number of parts or components of which it consists. Biederman (1987) agreed with Marr and Nishihara (1978) that the concave parts of an object's contour are of particular value in accomplishing the task of segmenting the visual image into parts.
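To make the role of concavities concrete, here is a small sketch that finds the concave vertices of a closed polygonal outline, which are the candidate points at which a contour might be split into parts. It uses a standard cross-product test from computational geometry; treating the contour as a simple polygon is a simplifying assumption, and this is not offered as the mechanism proposed in the theory.

```python
def concave_vertices(polygon):
    """Return indices of concave vertices of a simple closed polygon.

    polygon: list of (x, y) vertices in order, either winding direction.
    """
    n = len(polygon)
    # Twice the signed area (shoelace formula): positive if counter-clockwise.
    area2 = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    orientation = 1 if area2 > 0 else -1
    result = []
    for i in range(n):
        x0, y0 = polygon[i - 1]
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Cross product of the incoming and outgoing edges at vertex i.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        # A turn against the polygon's overall winding marks a concavity.
        if cross * orientation < 0:
            result.append(i)
    return result


# An L-shaped outline has one concave corner, at (1, 1).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(concave_vertices(l_shape))
print(concave_vertices(square))
```

The L-shape could plausibly be divided into two block-like parts at its single concave corner, whereas the square has no concavity and so offers no such division point.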

The other major element is to decide which edge information from an object possesses the important characteristic of remaining invariant across different viewing angles. According to Biederman (1987), there are five such invariant properties of edges:

• Curvature: points on a curve.

• Parallel: sets of points in parallel.

• Co-termination: edges terminating at a common point.

• Symmetry: versus asymmetry.

• Co-linearity: points in a straight line.

According to the theory, the components or geons of a visual object are constructed from these invariant properties. Thus, for example, a cylinder has curved edges and two parallel edges connecting the curved edges, whereas a brick has three parallel edges and no curved edges. Biederman (1987, p. 116) argued that the five properties:

have the desirable properties that they are invariant over changes in orientation and can be determined from just a few points on each edge. Consequently, they allow a primitive [component or geon] to be extracted with great tolerance for variations of viewpoint, occlusion [obstruction], and noise.
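The claim that such properties "can be determined from just a few points on each edge" can be illustrated with simple geometric tests on two-dimensional image coordinates. The function names and the tolerance value are assumptions made for this sketch; it shows the geometry only and is not a procedure taken from Biederman (1987).

```python
def collinear(p, q, r, tol=1e-9):
    """True if three image points lie on one straight line."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) <= tol


def parallel(edge1, edge2, tol=1e-9):
    """True if two straight edges, each given by two points, are parallel."""
    (a, b), (c, d) = edge1, edge2
    cross = (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])
    return abs(cross) <= tol


def coterminate(edge1, edge2):
    """True if two edges share an end point."""
    return bool(set(edge1) & set(edge2))


print(collinear((0, 0), (1, 1), (2, 2)))
print(parallel(((0, 0), (0, 2)), ((1, 0), (1, 2))))
print(coterminate(((0, 0), (1, 0)), ((1, 0), (1, 1))))
```

Each test needs only two or three points per edge, which is why partial occlusion of an edge need not prevent the property from being detected.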

An important part of Biederman's theory with respect to the invariant properties is what he called the "non-accidental" principle. According to this principle, regularities in the visual image reflect actual (or non-accidental) regularities in the world rather than depending on accidental characteristics of a given viewpoint. Thus, for example, it is assumed that a two-dimensional symmetry in the visual image indicates symmetry in the three-dimensional object. Use of the non-accidental principle helps object recognition, but occasionally leads to error. For example, a straight line in a visual image usually reflects a straight edge in the world, but it might not (e.g., a bicycle viewed end-on).

Some visual illusions can be explained by assuming that we use the non-accidental principle. For example, consider the Ames distorted room (described in Chapter 2). It is actually of a most peculiar shape, but when viewed from a particular point it gives rise to the same retinal image as a conventional rectangular room. Of particular relevance here, misleading properties such as symmetry and parallelism can be derived from the visual image of the Ames room, and may underlie the illusion.

Biederman's (1987) theory makes it clear how objects can be recognised in normal viewing conditions. However, we can generally recognise objects when the conditions are sub-optimal (e.g., an intervening object obscures part of the target object). According to Biederman (1987), there are various reasons why we are able to achieve object recognition in such conditions:

• The invariant properties (e.g., curvature; parallel lines) can still be detected even when only parts of edges can be seen.

• Provided that the concavities of a contour are visible, there are mechanisms allowing the missing parts of a contour to be restored.

• There is normally a considerable amount of redundant information available for recognising complex objects, and so they can still be identified when some of the geons or components are missing (e.g., a giraffe could be identified from its neck even if its legs were hidden from view).

Any adequate theory of object recognition needs to address the binding problem. A version of this problem arises when we are presented with several objects at the same time, and have to decide which features or geons belong to which objects. An attempt to solve this problem was made by Hummel and Biederman (1992), who proposed a connectionist model of Biederman's (1987) geon theory. This model is a seven-layer connectionist network taking as its input a line drawing of an object and producing as its output a unit representing its identity. According to Ellis and Humphreys (1999, p. 157), "The binding mechanism they employ, depends on synchrony in the activation of units in the network. In crude terms, units whose activation varies together are bound together, therefore so are the features they represent." More specifically, units that typically belong to the same object are connected by fast enabling links, which help to ensure that related units are all activated at the same time.

Hummel and Biederman (1992) carried out various simulation studies with their connectionist model, and showed that it provided an efficient and accurate mechanism for binding. However, it is not necessarily the case that people solve the binding problem in a similar way.
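The core idea of binding by synchrony, that units whose activation varies together are treated as belonging together, can be shown with a toy example. This is not the Hummel and Biederman (1992) network: the unit names, the activation values, the correlation measure, and the threshold are all assumptions chosen to illustrate the principle.

```python
from statistics import mean


def correlation(a, b):
    """Pearson correlation between two equal-length activation series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def bind(activations, threshold=0.9):
    """Group units whose activation over time is highly correlated."""
    groups = []
    for name, series in activations.items():
        for group in groups:
            # Compare against the first member of each existing group.
            if correlation(series, activations[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical units: two fire on even time steps, two on odd time steps.
activations = {
    "cone_curved_edge": [1, 0, 1, 0, 1, 0],
    "cone_vertex": [1, 0, 1, 0, 1, 0],
    "brick_straight_edge": [0, 1, 0, 1, 0, 1],
    "brick_parallel_edges": [0, 1, 0, 1, 0, 1],
}
print(bind(activations))
```

Units firing in phase end up in the same group, and units firing out of phase are kept apart, which is the sense in which synchrony could signal which features belong to which object.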

Experimental evidence

A study by Biederman, Ju, and Clapper (1985) was designed to test the notion that complex objects can be detected even when some of the components or geons are missing. Line drawings of complex objects having six or nine components were presented briefly. Even when only three or four of their components were present, participants displayed about 90% accuracy in identifying the objects.

Biederman (1987) discussed one of his studies in which participants were presented with degraded line drawings of objects (see Figure 4.6). Object recognition was much harder to achieve when parts of the contour providing information about concavities were omitted than when other parts of the contour were deleted. This confirms the notion that information about concavities is important for object recognition.

According to Biederman's theory, object recognition depends on edge information rather than on surface information (e.g., colour). To test this, participants were presented with line drawings or full-colour photographs of common objects for between 50 and 100 ms (Biederman, 1987). Performance was comparable with the two types of stimuli: mean identification times were 11 ms faster with the coloured objects, but the error rate was slightly higher. Even objects for which colour would seem to be important (e.g., bananas) showed no benefit from being presented in colour.

Joseph and Proffitt (1996) pointed out that many studies have found that colour does help object recognition, especially for objects (e.g., cherries) having a characteristic colour. They replicated this finding. They also found that colour knowledge can be more important than colour perception in object recognition. For example, their participants took a relatively long time to decide that an orange-coloured asparagus was not celery, because the stored colours for asparagus and celery are very similar. Somewhat surprisingly,

Figure 4.6: Intact figures (left-hand side), with degraded line drawings either preserving (middle column) or not preserving (far-right column) parts of the contour providing information about concavities. Adapted from Biederman (1987).
