
Fuzzy Sets and Systems 158 (2007) 295–311
www.elsevier.com/locate/fss

Bio-mimetic learning from images using imprecise expert information

Jonathan Rossiter a,∗, Toshiharu Mukai b

a Artificial Intelligence Research Group, Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, UK
b Bio-mimetic Control Research Center, RIKEN (The Institute of Physical and Chemical Research), 2271-130, Anagahora, Shimoshidami, Moriyama-ku, Nagoya 463-0003, Japan

Available online 20 October 2006

Abstract

We present a method for training a cross-product granular model with uncertain image data provided by domain experts. This image data is generated by a process of vague image tagging where experts label regions in the image using vague and general shapes. This is possible through a number of observations of, and assumptions about, human behaviour and the human visual system. We focus on the human tendency to concentrate on one central region of interest at a time and from this characteristic we define an applicability function across each tagged shape. We present bio-mimetic justification for our choice of applicability function and show examples of the vague tagging process and machine learning with this tagged data using a cross-product granule learner. Illustrated applications include medical decision making from radiological images and guided training of robots in hazardous environments.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Bio-mimetic; Cross-product fuzzy set; Image classification; Vague regions of interest

1. Introduction

In this paper we examine the general problem of rapidly extracting useful information from experts in the application domains of image classification and uncertain machine learning. We present a schema for representing vague image information and we present a method for building uncertain models of class definitions from this representation. In this way we have a method for rapidly extracting information from experts and learning image classifications from this information. The rapidity of this process, especially from the point of view of the expert, enables a more natural and free conveyance of image-related information from the expert to the machine learner. We are thus attempting to free the expert from detailed hand-labelling of images at the pixel level and we let them make more imprecise statements in the nature of "this general area shows good examples of class X."

If we consider the problem of fusing expert knowledge with learnt knowledge we can take either the a priori or the a posteriori approach to including the expert knowledge. We might show the a posteriori approach as in Fig. 1a where sensor signals are recorded in a database of examples and a model is learnt from this database. It is only then that expert input is used to modify and refine the model. The alternative is to build the model a priori using information derived from interviews with experts, in the general manner in which expert systems are constructed, and then modify this base model through machine learning. Both of these methods are unsatisfactory where experts are pressed for time, money and patience.

∗ Corresponding author. E-mail addresses: [email protected] (J. Rossiter), [email protected] (T. Mukai).

0165-0114/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2006.10.012


Fig. 1. Learning with experts: (a) posterior fusion; (b) prior projection.


We propose a simpler method of fusing expert information and training data through the process of a prior projection from expert actions into training data [14,15]. This is shown simply in Fig. 1b. Here the actions of the expert in a specific domain are projected into the same problem space as the sensor-derived data.

As we will discuss in the following sections, there are suitable and unsuitable application domains for this projection process. Also there are some important restrictions and assumptions that can or must be made in order for this process to be useful and productive. In this paper we take a bio-mimetic approach to this problem in the specific domains of image analysis and machine vision. Bio-mimetics (the study and mimicry of human and animal behaviour in artificial systems) can add to our understanding of, and confidence in, the combined learning process.

From the image processing point of view the domain expert can be thought of as a translator of raw images into suitable training data. The general problem of selecting suitable training sites from an image is a core problem in image processing and is becoming increasingly important as multimedia applications generate more image data. This training site selection problem is not to be confused with the problem of feature subset selection. Unsupervised image subset selection is used extensively in automatic image annotation [11,12,24]. In this paper, though, we consider only supervised subset selection by experts.

The paper is organised into the following sections. First, we present the motivations and background to this method of rapidly extracting vague but useful information from experts. Then we present the application of this method in the image classification domain with reference to a simple bio-mimetic assumption of human behaviour. Next, we define suitable applicability functions which are necessary for this projection process. We make a further bio-mimetic observation, this time in specific aspects of the human visual system, and use this as the basis for defining a bio-mimetic applicability function. Next, we explain the cross-product granular learning model [2] we use for uncertain learning. Finally, we present some examples of the use of this method in image analysis and machine vision and we discuss the comparison of this method to other segmentation and classification algorithms.

2. Motivations and background

Let us take as one of our motivations the application of medical image analysis. In hospitals it is the job of radiologists to assess medical images including those from X-ray, CT and MRI scanners. This is a time consuming and repetitive task. For a high-resolution chest CT scan there may be in the order of 100 separate scans to view [10]. Problems of time, cost, and lack of trained radiologists are easily appreciated, but further problems become apparent as radiologists are overloaded. The problem of satisfaction of search [17], for example, is a distinctly human problem that arises in such applications. The problem here is that a radiologist may focus on the first abnormality they find in a series of scans and assume this is the cause of the patient's symptoms, possibly missing more serious abnormalities later in the scan series. The goal of automatic image analysis in this domain is to relieve all these problems.

In moving toward the goal of a tool for autonomous medical image analysis we need to take into account the expert radiologist's knowledge, skills and experience. Traditionally we would ask an expert to laboriously build a labelled data set of radiological images and use this to train a machine learner. The sheer cost and effort of this may mean that in reality only a small training data set is built. Not only that, but one problem encountered in medical imaging is that images are often very unclear. The boundary between one organ and another, or between diseased and healthy tissue, is not discrete. This means that experts are often frustrated in having to define this artificial discrete boundary when preparing labelled data.



When one discusses with radiological experts how they reach their diagnosis they often refer to the radiological images in imprecise terms. Indeed, when the radiologists prepare their final reports they use such terms as "moderate disease", "toward the posterior", "suggests" and so on. This in itself hints that learning from pixel-level hand-labelled images is insufficient to capture the diagnosis process. Ideally we would want to learn about the diagnosis process directly from radiological reports (e.g. through the MedLee system [9]), from medical images and from interviews with radiologists themselves. Given the vagueness inherent in this process and even in the final diagnosis (e.g. "the patient is quite likely to be suffering from...") we naturally focus on a modelling with words approach. In this paper we focus on the image acquisition stage and how this can be used in a modelling with words granular framework.

2.1. Information from images

Consider Fig. 2a which shows a typical CT scan of the chest [16]. Evident in the lung fields are a number of dark regions corresponding to the voids characteristic of the disease emphysema. If pressed for a detailed hand classification a radiological expert may produce an image similar to Fig. 2b showing emphysema as black and other lung tissue in white. Clearly, given the non-distinct borders between the two classes of tissue, even this classification is overly precise.

Fig. 2. Emphysema.

Fig. 3. Original image.


Fig. 4. Labelled regions: (a) crisp labelling; (b) imprecise labelling.

Indeed, we might say that a blurred boundary would be advantageous even when hand classifying images. That approach would be closer to a linguistic interpretation such as "there is extensive disease in the left extremity."

To this end we propose a method for rapidly labelling such images with vague class labels. In reality, the labels are not vague, rather the information in the tagged region is subject to an applicability assumption that constrains the vagueness of the information imparted by any one tagged region.

As a good example consider the images in Figs. 3 and 4. Fig. 3 shows a photograph of lilies in a pond. Fig. 4a shows the same photograph with some hand labelled regions corresponding to the classes "flower" and "leaf." Note that these have been labelled by tracing around the appropriate regions at the pixel level. This is somewhat time consuming. Fig. 4b shows the same photograph again, but this time with three circular or elliptical shapes, each of which has an associated class label. In this third figure we have rapidly labelled regions that are "generally good examples of class X."

This rapid, but vague, labelling of images is the core of the proposed learning mechanism.

3. Vague image tagging

In general, we can say that any image labelling operation will generate a set of labelled image regions, {LIR}. A labelled image region is defined as the pair:

LIR = 〈R, C〉, (1)

where R contains all the pixels in the region of interest and C is a class label. Now we can define our vague labelled image region LIRv as the following triple:

LIRv = 〈Rv, C, A〉, (2)

where Rv contains all the pixels in a geometric region in the original image defined by some mutually agreed upon and generally simple geometric shape, C is a class label, and A is an applicability function defined over Rv. Thus, we obtain a set of vague labelled regions {LIRv} which is the vague version of the previously defined set of crisp labelled regions {LIR}. We discuss the form of this applicability function A in the next section.

A single LIRv can be thought of as corresponding to a linguistic granule of information of the form "this region is a good example of class C." This representation is thus compatible with the general paradigm of modelling with words [13].
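To make this representation concrete, the pair (1) and triple (2) might be encoded as follows. This is a minimal sketch in Python; the type names and the explicit pixel-set encoding are our own assumptions, not the paper's implementation (which used Java and C).

```python
# Sketch of the labelled-region representations of Eqs. (1) and (2).
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinates


@dataclass(frozen=True)
class LIR:
    """Crisp labelled image region: LIR = <R, C>."""
    region: FrozenSet[Pixel]  # R: all pixels in the region of interest
    label: str                # C: class label


@dataclass(frozen=True)
class LIRv:
    """Vague labelled image region: LIRv = <Rv, C, A>."""
    region: FrozenSet[Pixel]                        # Rv: pixels of a simple geometric shape
    label: str                                      # C: class label
    applicability: Callable[[float, float], float]  # A: (x, y) -> [0, 1]
```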

Key to the interpretation of this information and the maintenance of consistency between the artificial system and the domain expert is the applicability function A. This is a function defined over the space of Rv (i.e. the two-dimensional image space of (x, y)) with a range [0, 1]. In other words:

A : (x, y) → [0, 1]. (3)

We call this function "applicability" because it defines how applicable any smaller region within Rv is to the concept described by the label C. In this way we can say that image information within any particular LIRv has a degree of vagueness associated with it. The question now naturally emerges: what function to use for A? We address this question in the next section.


Fig. 5. A typical applicability function A(x, y) overlaid upon a vague image region.

Fig. 6. Example applicability functions.


3.1. Bio-mimetic behaviour assumption

Let us recall Fig. 4a. Here the assumption was that within any labelled region all image information is equally applicable. We can immediately see that this is identical to an LIRv with a unit applicability function. That is, every part of the image region contained in LIRv contributes equally to the concept described by the class label C, also contained in LIRv. While this is appropriate for the regions in Fig. 4a this assumption of uniform applicability is wholly unsuitable for the regions defined in Fig. 4b. Here a much more suitable assumption might be, "information is more important (or more applicable) near the center of the region and less important (or less applicable) near the edges." This is a simple summary of a natural behaviour characteristic of humans. When assessing contiguous image regions humans generally weight information near the center above information at the edges. If we were to define an applicability function inspired by this behaviour characteristic we would in effect be making a bio-mimetic assumption of human behaviour.

Fig. 5 shows a typical applicability function overlaid upon a vague image region. While A may be defined in terms of (x, y) it seems more natural to define this function in terms of d, the distance from the center of the region of interest. In other words:

A : d → [0, 1]. (4)

Fig. 6a and b show some example applicability functions defined over d. By defining different applicability functions we can vary focusing bias toward or away from the center of the vague image region. In both these examples it is clear that applicability is maximal (i.e. A(d) = 1) at the center of the image region.
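As a minimal sketch of how such a function is evaluated in practice, an image point is first reduced to a normalised distance d from the region centre, as in (4), and a monotonically decreasing A is then applied. The linear ramp and the circular region below are illustrative assumptions only.

```python
# Evaluate an applicability function A(d) over a circular tagged region.
import math


def applicability_linear(d: float) -> float:
    """A(d) = 1 - d for normalised d in [0, 1]; maximal (A = 1) at the centre."""
    return max(0.0, 1.0 - d)


def applicability_at(x, y, cx, cy, radius):
    """Map an image point (x, y) to A(d), d being distance from the centre (cx, cy)."""
    d = math.hypot(x - cx, y - cy) / radius  # normalise so the region edge is d = 1
    return applicability_linear(min(d, 1.0))


# e.g. a region centred at (100, 80) with radius 40:
print(applicability_at(100, 80, 100, 80, 40))  # 1.0 at the centre
print(applicability_at(130, 80, 100, 80, 40))  # 0.25 toward the edge
```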

During the image tagging phase it is important that the expert and the machine learner are consistent in their understanding of the applicability function A. Generally, all parties will use an agreed-upon standard applicability function.


3.2. Generating the feature value data set

Having collected a number of vague image regions LIRv from the domain expert and having agreed a suitable applicability function with the same expert we can now induce a model based on words. In this paper we use a form of cross-product (or Cartesian-product) fuzzy set to model each class. In order to build these fuzzy sets we need to derive image feature values from each LIRv. Typical features are those found in conventional image analysis such as colour, texture and so on.

This feature extraction process serves two purposes: firstly, it reduces the size of the database since a tagged region of some thousands of pixels may be replaced by a much smaller number of image feature values, and secondly, it gives us the opportunity to emphasise particular image-related features, such as texture, in the machine learning process.

As with many image processing tasks we face a trade-off between low-level and high-level features. Low-level features contain a lot of raw information but require large computing resources and may be difficult to interpret. High-level features, on the other hand, generalise well and are easier to interpret but lose some of the detailed information content. In semi-critical applications such as medical diagnosis tools we must select a set of image features which not only matches the problem at hand but also enables transparent interpretation of the learnt model.

3.3. Feature selection

For any tagged region the number of image features that can be calculated is limited by the size and shape of the region. For example, to calculate the feature mean red colour we need at least one pixel and at most all pixels. On the other hand, to calculate a value for the feature "snake-skin texture" using Gabor filters [3], or some such, will require a much larger contiguous area, say 50 × 50 pixels. So for a tagged region with a contiguous square area of 25 × 25 pixels the mean red colour can be calculated easily but the snake-skin feature value cannot. Clearly, the main constraints by which we must select features are the minimum and maximum area needed to calculate an individual feature value.

In general, for any image feature F taken from our chosen feature set {F1, . . . , Fn} there is an associated minimum and maximum area within the bounds of which a value for F can be calculated. We denote these lower and upper area bounds Z−(F) and Z+(F), respectively. The problem remains therefore not just to select features that are meaningful to the problem at hand but also to select suitable features where a large enough number of values can be calculated for each tagged region.

Let us assume that the area of a tagged region Rv is Z(Rv) (that is, Z is the function size_of) and this is large enough to generate at least one value of feature F, i.e. Z−(F) ≤ Z(Rv) ≤ Z+(F). We can sub-divide region Rv into m sub-regions, {Rvi | i ∈ N, i ∈ {1, . . . , m}}, each of size Z(Rvi), where Z−(F) ≤ Z(Rvi) ≤ Z(Rv). We can now calculate m values for our chosen feature, one for each sub-region Rvi.

Finally, we calculate a single applicability value, a, for each sub-region based on the distance of the centre of the sub-region from the centre of the tagged region. This is shown in Fig. 7a. In this way we generate a database consisting of tuples of the following form:

T = 〈{f1, . . . , fn}, C, a〉, (5)

where each fi is a value calculated for feature Fi, C is the class label and a is the associated applicability value for this sub-region.

We can interpret each of these tuples as a smaller information granule than was originally specified by the expert, with the following meaning: "the sub-region with feature values {f1, . . . , fn} is a good example of class C, with applicability a."
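The generation of these tuples might be sketched as follows, with a rectangular bounding box standing in for the tagged shape, square grid cells for the sub-regions, and feature extractors passed in as callables (e.g. mean intensity over a rectangle). None of these specific choices are prescribed by the paper.

```python
# Sketch: sub-divide a tagged region and emit tuples T = <{f1,...,fn}, C, a>.
import math


def subdivide(bbox, cell):
    """Split a bounding box (x0, y0, x1, y1) into square cells of side `cell`."""
    x0, y0, x1, y1 = bbox
    for y in range(y0, y1, cell):
        for x in range(x0, x1, cell):
            yield (x, y, min(x + cell, x1), min(y + cell, y1))


def make_tuples(bbox, centre, radius, label, image, features, A, cell=16):
    """Yield one training tuple per sub-region of the tagged region."""
    cx, cy = centre
    for sub in subdivide(bbox, cell):
        sx = (sub[0] + sub[2]) / 2.0  # sub-region centre
        sy = (sub[1] + sub[3]) / 2.0
        d = math.hypot(sx - cx, sy - cy) / radius
        a = A(min(d, 1.0))                          # applicability of this sub-region
        values = [f(image, sub) for f in features]  # one value per feature F_i
        yield (values, label, a)
```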

3.3.1. Conflicting feature sizes

Where the selected feature set contains features with greatly differing values for the area lower bound Z−(F) we must compromise on the sub-region size Z(Rvi). In practice we may take some mean or weighted average of Z−(F) in order to select Z(Rvi). Having selected a compromise value for Z(Rvi) we may also be faced with the problem of



how to calculate a value for feature F where Z(Rvi) < Z−(F) or where Z+(F) < Z(Rvi). We therefore must deal with the following two cases:

(1) Z(Rvi) < Z−(F): In this case we must extend the sub-region into the surrounding pixel area and calculate a value for F using a region of size Z−(F). This is shown in Fig. 7b.
(2) Z+(F) < Z(Rvi): In this case we calculate as many values for the feature as possible by breaking up Rvi into small regions of size Z+(F) and then take the mean of the resulting feature values. This is shown in Fig. 7c.

Fig. 7. Calculating feature values: (a) overlaying applicability; (b) Z(Rvi) < Z−(F); (c) Z+(F) < Z(Rvi).

There may be cases where it is unacceptable to compromise on the area used to calculate a particular feature. In other words there is no way in which a feature can be satisfactorily calculated for a region Z(Rvi). For these cases we include the tuple in the database but mark a missing value for that feature. That is, for that particular granule we have complete uncertainty about that particular feature value.

4. Cross-product granular learning with applicability functions

We will now briefly review the granular learning method we are using to learn class models from this database of uncertain training instances. More information on the cross-product granular fuzzy set and learning algorithms for their generation can be found in [19,2,18]. The advantage of using this method is that it overcomes problems of decomposition error (e.g. as suffered by naïve Bayes classifiers) whilst at the same time creating a model that can be readily interpreted. We can translate a cross-product fuzzy model into linguistic terms in order to give the experts greater insight into the problem domain. The experts also have the chance to compare the learnt model with their own internal understanding of the problem.

Induction with cross-product fuzzy sets involves the building of a single cross-product fuzzy set GM for each problem class. First, a single cross-product fuzzy set Gj is defined for each tuple in the training database T and then these are aggregated to generate the cross-product fuzzy sets for each problem class. In this paper we combine all point-derived fuzzy sets Gj which have the same associated class by a simple summation and normalisation process. In other words, the aggregation operation used to turn cross-product fuzzy sets Gj derived from training instances into the model GM is simply:

GM = N(∑j Gj), (6)

where N is a fuzzy set normalising operation.

4.1. Cross-product fuzzy sets in brief

For the chosen set of image processing features {F1, . . . , Fn}, each member has the associated set of q fuzzy sets {f(i,1), . . . , f(i,q)} defined over it, where f(i,j) is the jth fuzzy set defined over the ith feature Fi. Each of these fuzzy sets has a label associated with it. Thus, l(f(i,j)) is the label associated with fuzzy set f(i,j) and Li is the set of all labels belonging to fuzzy sets defined over feature Fi. The number q of feature fuzzy sets may be decided by either considering the complexity and computability requirements of the cross-product fuzzy sets or by considering the linguistic interpretability of the resulting model. In other words, a large value for q gives a more accurate model but computational requirements will be larger and the final model will be more difficult to summarise.



A cross-product granule fuzzy set is defined over the cross-product label space K where K is generated by cross-product concatenation of fuzzy set labels across all image features. Thus,

K = ×{Li | i = 1, . . . , n}, (7)

where × denotes the Cartesian product set operation. So for q fuzzy sets defined over n features, K will have q^n elements.

A cross-product granule fuzzy set is thus a discrete fuzzy set G over the universe of cross-product labels K:

G = {(k : m) | k ∈ K, m ∈ [0, 1]}. (8)
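In code, the label universe K of (7) and a discrete cross-product fuzzy set over it, as in (8), might be built as follows. The paper does not fix how the feature-level memberships of a data point combine into the membership of a cross-product label, so a product t-norm is assumed here.

```python
# Sketch: build K = x{L_i} and a point-derived cross-product fuzzy set G.
from itertools import product


def label_universe(labels_per_feature):
    """K: all concatenations of one fuzzy-set label per feature (q^n elements)."""
    return list(product(*labels_per_feature))


def point_fuzzy_set(feature_values, memberships):
    """G = {(k : m)}: one membership in [0, 1] per cross-product label k.

    memberships[i][label] is a callable giving the membership of a value in
    the fuzzy set `label` defined over feature F_i."""
    labels_per_feature = [sorted(m) for m in memberships]
    G = {}
    for k in product(*labels_per_feature):
        m = 1.0
        for i, lab in enumerate(k):
            m *= memberships[i][lab](feature_values[i])  # assumed product t-norm
        G[k] = m
    return G
```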

4.2. Training with applicability values

With the introduction of applicability values in the training database we modify (6) thus:

GM = N(∑j Gj × aj), (9)

where N is a fuzzy set normalising operation and × is the algebraic product.
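A sketch of this weighted aggregation is shown below, assuming the normalising operation N rescales memberships so that the largest becomes 1 (the paper does not define N explicitly). Passing aj = 1 for every granule recovers the unweighted aggregation of (6).

```python
# Sketch of Eq. (9): a_j-weighted summation of per-tuple fuzzy sets, then N.
def learn_class_model(training_granules):
    """training_granules: iterable of (G_j, a_j) pairs, each G_j a dict over K."""
    GM = {}
    for Gj, aj in training_granules:
        for k, m in Gj.items():
            GM[k] = GM.get(k, 0.0) + m * aj           # sum of weighted memberships
    peak = max(GM.values(), default=0.0)
    if peak > 0.0:
        GM = {k: m / peak for k, m in GM.items()}     # N: rescale into [0, 1]
    return GM
```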

4.3. Testing with unseen data

Classification of an unseen data tuple involves converting the tuple into a point-derived cross-product fuzzy set [18] and matching it to each of the class fuzzy sets. The winning class is the class whose cross-product fuzzy set matches the point-derived fuzzy set most closely. In this paper we calculate the support for a test point p with equivalent cross-product fuzzy set Gp being of a certain class C as

SC(p) = (∑k∈K MGp(k) × MGMC(k)) / |K|, (10)

where MGp(k) is the membership degree of element k in cross-product fuzzy set Gp, itself derived from test point p, and MGMC(k) is the membership of k in cross-product fuzzy set GMC which represents class C. This operation is simply the normalised sum of all pairs of memberships with the same focal element. There are a number of alternative similarity measures for comparing fuzzy sets, such as mass assignment based semantic unification [1].
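Equation (10) and the winner-takes-all decision reduce to a few lines over the dictionary-over-K representation sketched above; this is again our own illustrative encoding.

```python
# Sketch of Eq. (10) and classification by maximum support.
def support(Gp, GMC, K):
    """S_C(p): normalised sum of paired memberships over the label universe K."""
    return sum(Gp.get(k, 0.0) * GMC.get(k, 0.0) for k in K) / len(K)


def classify(Gp, class_models, K):
    """Return the class whose cross-product fuzzy set best matches G_p."""
    return max(class_models, key=lambda C: support(Gp, class_models[C], K))
```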

5. An illustrated example: detecting disease from lung scans

In this example we attempt to classify regions in a lung CT scan into the classes "healthy", "unhealthy" and "body" (see [10,16] for background). We took a 512 × 512 pixel CT scan image and split it into the left and right halves as shown in Figs. 8a and b. We then tagged example regions in only the left image, concentrating on severe emphysema (voids in the lung volume). This gives a training data set T of 806 tuples, each including an applicability value as shown in (5). This training data set was then used to build a cross-product fuzzy set model of lung disease. We then classified all 4096 regions (including untagged regions) in both the left and right lung images using this learnt model. We learnt for two cases: tagging with linear graded applicability and tagging without any applicability. In this example the sub-region size Z(Rvi) was calculated to give nine sub-regions per average tagged region. Three fuzzy sets, {small, medium, large}, were defined on each member of the image feature set {mean intensity, variance of intensity}. When learning without applicability information we simply weight all tuples in the training set equally.

Fig. 8c shows the vague tagged regions. Fig. 9a and b show the actual regions of severe disease in the left and right lungs, respectively. Fig. 9c shows an example of classifier output. For clarity these results were reproduced as region block diagrams in Fig. 10. Fig. 10a shows the results of classifying the left lung without applicability, that is, where all information from the expert is given equal weighting. Fig. 10b shows the results of classifying the same lung with applicability, that is, where an applicability function is applied across each tagged region as described in Section 3. Clearly, the applicability results (Fig. 10b) most closely match the actual classifications given in Fig. 9a.


Fig. 8. Lung CT scan: (a) left; (b) right; (c) tagged regions.

Fig. 9. Actual severe disease (black): (a) left; (b) right; (c) typical raw results image.

Fig. 10. Classification results: (a) left, without applicability; (b) left, with applicability; (c) right, without applicability; (d) right, with applicability.



Table 1
Classification results for lung disease (% misclassified)

                                           Training data (%)   Test data (%)
I    Whole image: No applicability              21.8               20.5
II   Whole image: Applicability                 16.2                8.5
III  Lung region only: No applicability         29.6               30.8
IV   Lung region only: Applicability            17.6                9.8

Fig. 11. The human eye and retina.

Classifying the unseen right lung image yielded the classification results in Fig. 10c (without applicability information) and Fig. 10d (with applicability). Table 1 shows the breakdown of results for this problem. It is clear again that the graded applicability function improves classification results markedly. Note also the improvement from training data to test data when using applicability information. In these cases the applicability function is clearly helping to capture more of the class definition, as defined by the expert, and the test data contains clearly differentiable regions that can be easily matched to the learnt class definitions.

While these results are not conclusive they provide firm evidence for the efficacy of this method, both in terms of fast image tagging by experts and for learning with applicability functions using cross-product fuzzy sets.

6. Inspiration from the human eye

We now turn once again to the form of applicability function A. Recall that the expert is trying to label the image by drawing rough areas of interest and associating each one with a class label. It is now helpful to examine how the human visual system may impact upon both the shape and the size of the region chosen by the expert and also the applicability function we select.

Fig. 11 shows a simple diagram of the human eye and it illustrates how light from an observed scene enters the eye and strikes the retina. The retina itself is not homogeneous but is made of two types of light sensitive cells, the cones and the rods. Cones are active most in bright conditions and have high colour sensitivity while rods are better at much lower light levels but are less colour sensitive. The distributions of cones and rods are non-uniform, with many more cones present at the center 5◦ of the retina in a region called the fovea. Moving away from the fovea the concentration of cones rapidly falls and rods dominate, as shown in Fig. 12 [4,22].


Fig. 12. Distribution of light sensitive cells in the human retina.

6.1. Fovea-like functions

While Fig. 12 shows the densities of both cones and rods, we must be a little careful in our interpretation of this figure. For our typical application (daylight/bright classification of images and scenes) the cones (and hence the fovea) contribute most. Yet the rods will also contribute to some degree. We can combine the contributing factors of both cones and rods into a rough class of applicability function. In this paper we assume the function is smooth and monotonic with maximum at the centre of the region of interest and minimum at the edge.

For simplicity we consider two families of fovea-like functions:

• Trapezoidal functions: A trapezoidal function defined across the distance from the centre of the expert-defined region to the edge requires only one parameter 0 ≤ v ≤ D (where D is the radius of the tagged region) in order to define an infinite family of functions from uniform to triangular. The advantage of this function type is in the simplicity of calculations in practical implementation. The disadvantage is that such a grossly linear function will not closely resemble the cell distribution of Fig. 12.

• Gaussian-like functions: Here we approximate the centre-focus of the fovea with a simple Gaussian with variance σ. Note again that we are not trying to exactly match the distributions of cones in the fovea, rather the centre-weighting, or focus, of the retina. The Gaussian-like function can be optionally normalised such that A(0) = 1 and/or A(D) = 0 where D is the radius of the tagged region. Both families are sketched after this list.
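Both families can be sketched over the normalised distance d in [0, 1], i.e. with the region radius D rescaled to 1 (relative scaling); the exact parameterisations below are our own reading of the descriptions above.

```python
# Sketch of the two fovea-like applicability families over normalised d.
import math


def trapezoid(d: float, v: float) -> float:
    """Flat at 1 up to the shoulder v, then a linear fall to 0 at the edge.
    v = 1 gives the uniform function; v = 0 gives the triangular one."""
    if d <= v:
        return 1.0
    return max(0.0, (1.0 - d) / (1.0 - v)) if v < 1.0 else 0.0


def gaussian(d: float, sigma: float) -> float:
    """Gaussian-like centre weighting, normalised so that A(0) = 1."""
    return math.exp(-0.5 * (d / sigma) ** 2)


def gaussian_zero_edge(d: float, sigma: float) -> float:
    """Variant additionally normalised so that A(1) = 0 at the region edge."""
    edge = math.exp(-0.5 / sigma ** 2)
    return (gaussian(d, sigma) - edge) / (1.0 - edge)
```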

Now let us consider the problem of spatial scaling when applying the applicability function to the image region. Whilst the expert is able to define and label image regions of any size or shape the human fovea itself is fixed in size. We therefore may be tempted to change the spatial scaling of our applicability functions from a relative scale (that is, in proportion to the size of each individual image region) to an absolute scale (that is, in proportion to the size of the complete scene). In the context of this paper both of these scalings have their own benefits. Fixed or absolute scaling of the applicability function most closely matches the physiology of the fovea, but may distort the information contained in either very small or very large image regions. Scaling relative to the region geometry, on the other hand, is more likely to preserve the information from both large and small regions, but the function then becomes less like the physiology of the human fovea. The difference between these scaling methods is shown in Fig. 13.

If we consider saccades and eye movements, the relative scaling scheme becomes more attractive. This is because the human visual focus can roam over a reasonably large region and this seems to simulate a larger, more general, applicability function than just the fovea alone. Thus the applicability function becomes one which is inspired by both the fovea and the internal regioning ability of the higher level visual perception in the brain.

7. Further illustrated examples

Figs. 14 and 15 show uncertain labelling with fovea-like applicability functions and the granular machine learner in operation.


Fig. 13. Applicability function scaling: (a) relative scaling; (b) absolute scaling.

Fig. 14. Robot vision example.

In Fig. 14 the application is the navigation of an autonomous robot in a new hazardous environment. Although this image shows a safe (office) environment, the intention is to operate the robot remotely and for an operator (domain expert) to train the robot into semi-autonomous operation. This form of remote on-line training is required of rescue robots that cannot be completely trained to deal with unknown hazardous environments in advance.

The image in Fig. 14a is rapidly labelled by the expert using rough geometric regions into the four classes body, clothes, wall, and ceiling as shown in Fig. 14b. Fig. 14c is used in the testing of the model and shows regions corresponding to the four classes. Fig. 14d shows the results from testing the learnt model on this image when it was trained using a uniform applicability function, Fig. 14e shows results after training with a relative-scale Gaussian function with σ = 0.3, and Fig. 14f shows results with an absolute-scale Gaussian with σ = 0.4 defined over 5% of the image. For these tests three fuzzy sets {small, medium, large} were defined on each of the image feature set {red, green, blue, mean intensity, variance of intensity}. Percentage values in both Figs. 14 and 15 indicate the percentage of pixels correctly classified.


Fig. 15. Diagnosing disease from lung scans.

Fig. 15 shows results from training the learner to detect emphysema disease in the same CT scan of the lung as used in Fig. 8 but now with just two classes. Here the class healthy is labelled with black regions and the class diseased is labelled with white regions, as shown in Fig. 15b. Fig. 15c is used in the testing of the model and shows regions corresponding to the two classes. For these tests three fuzzy sets {small, medium, large} were defined on each of the image feature set {mean intensity, variance of intensity}. Fig. 15d shows results from training where the applicability function was unity, Fig. 15e shows results using a trapezoid applicability function with parameter v = 0, and Fig. 15f shows results from training using a relative-scale Gaussian with σ = 0.2.

It is clear from both the above examples that a graded applicability function improves classification accuracy above a unity function (Figs. 14d and 15d). What is less clear is the impact of the type of graded function on results. With these relatively small image sizes (480 × 480 and 512 × 512 pixels, respectively) and the small number of image features used, both the simple trapezoid and the low-variance Gaussians performed similarly well. It is expected that the difference between these functions will increase as more complex problems are tackled and as the quantity and complexity of image features are increased. What is more clear from our tests is that the relative-scale applicability function is much more convenient to use since we do not need to determine a suitable absolute scale over which to define the applicability function.

8. Comparisons to other methods

In this section we now consider how the proposed bio-mimetic learning method can be compared to other methods. In this paper we are concerned with learning rapidly from experts and for comparison to other methods we must break this down into two parts:

• How to segment and label an image in order to extract training data.
• How to learn from uncertain training data.

Let us now consider these two parts in turn.

8.1. Segmentation and labelling

In Section 2.1 we showed how experts can naturally label rough regions in an image and in Section 3 we showed how we can extract uncertain training data from these images. Unfortunately conventional image processing techniques force the expert into a different labelling paradigm, almost exclusively based on machine segmentation. In these approaches an image is automatically segmented and the expert is then required to label the image segments. A slight variation demands that the expert pinpoint the centres of key areas and a segmentation algorithm then segments the image based on these pinpoints, for example using geodesic active contours [6] or flow fields [20]. The disadvantages of these segmentation methods include:





• The segmentation produced does not correspond to the segmentation that the expert would define, given enough time. This in itself is bad enough, but a further problem is that the automatic segmentation imposes an unnatural skew on the way the expert subsequently labels the classes.

• Segmentation boundaries are crisp. No account is taken of uncertainty in labelling. Thus experts are forced into crisply labelling regions that they would not naturally describe using crisp terms. This introduces bias and error in the training data.

As an illustration of the problem consider the previous images in Figs. 2a, 3 and 14a. We now segment these images automatically using two algorithms. Firstly, we segment using the Blobworld algorithm [5] and results are shown in Fig. 16. Secondly, we segment using the JSEG algorithm [8] with results shown in Fig. 17.

Fig. 16. Blobworld segmentation results.
Fig. 17. JSEG segmentation results.

It is clear from both these algorithms that the segmentation boundaries do not coincide with the boundaries defined by the crisp classes shown in Figs. 2b and 14c, nor do they in fact agree with each other. It is also clear that since these segmentations cross the crisp boundaries in 2b and 14b these segmentations will impose a labelling bias and will "lead" the expert. Giving the expert such prior boundaries will greatly affect their ability to correctly label the most suitable training data. Further to this is the inevitable frustration the expert will experience when they seek to label some image region that is, for example, only partially covered by two separate regions.

Regardless of the automatic segmentation method used, the problem remains that the expert is thinking of a segmentation based on some problem domain features which, unless we are very lucky, will not exactly correspond to the features used in automatic segmentation. Where our proposed method succeeds is in representing the segmentation and classification information supplied only by the experts.

One interesting segmentation method that is worthy of further study is described in [23] and takes into account the depth of field of regions in the image. Combining such a bio-mimetic segmentation method and the proposed applicability function with eye-tracking and focus-tracking sensors could help automate training in applications, such as robot navigation, where depth information is important.


Table 2
Classification comparison (% correctly classified)

Cross-product granule learner, 5 fuzzy sets per feature    52.3%
SVM trained using applicability value as weight value      70.3%
SVM trained with no weight values                          71.3%

Fig. 18. Classification error images, white = misclassified: (a) cross-product granule learner; (b) weighted SVM; (c) unweighted SVM.

8.2. Learning from uncertain data

Let us now consider alternative mechanisms for learning from uncertain data. In this paper we consider learning from a set of tuples:

T = 〈{f1, . . . , fn}, C, a〉,

where a is an applicability value. We have proposed that the cross-product granular learning algorithm is a good algorithm to learn from this uncertain data in order to generate linguistic models.

If we temporarily ignore the need for an easily interpretable linguistic model we can consider alternative machine learners such as neural networks and support vector machines (SVMs). Let us consider the database T of 34² = 1156 labelled tuples, each including an applicability value, generated using the expert labelled regions from Fig. 14a and b. We now perform learning on these cases, with features {red, green, blue, mean intensity, variance of intensity} and relative applicability with σ = 0.3; the results are shown in Table 2. To implement the SVM we used the libsvm library [7] with additional support for weighted data by M.-W. Chang and H.-T. Lin.
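As an illustration of this baseline, a per-sample-weighted SVM can also be reproduced with scikit-learn's SVC, which accepts sample weights at fit time; this is an assumed stand-in for the modified libsvm build used in the paper, not the original setup.

```python
# Sketch: train an SVM with the applicability value of each tuple as its weight.
from sklearn.svm import SVC


def train_weighted_svm(X, y, applicability):
    """X: n x 5 matrix of (red, green, blue, mean intensity, intensity variance);
    y: class labels; applicability: one a value per training tuple."""
    clf = SVC(kernel="rbf")
    clf.fit(X, y, sample_weight=applicability)  # weight each tuple by its a value
    return clf
```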

Corresponding classification error regions within the image are also shown in Fig. 18. Here misclassified regions are white.

Clearly the SVM performs better at point-by-point classification. On the other hand, it is most difficult to interpret the trained SVM model. In the expert domains we are discussing in this paper such a black box model, unlike the proposed granular learner, provides no linguistically interpretable information. Where linguistic interpretability is not required a combination of uncertain labelling of images (as described in Section 3) and a black box learner (such as a weighted SVM) may well provide the most appropriate solutions. Where linguistic interpretability is required the granular learner is far more appropriate.

It should, of course, be noted that the complexity of the basic granular learner is exponential in the number of feature fuzzy sets. Thus for computability we have limited the above tests to five image features and five fuzzy sets per feature. This gives a maximal cross-product label universe of 5^5 = 3125 members. Accuracy naturally improves with an increased number of fuzzy sets (see [19,18] for more discussion on cross-product fuzzy set complexity) but computational requirements are substantially greater. With the basic, un-pruned, granular fuzzy sets used in these tests, and the current implementation, we restrict the cross-product label universe to this approximate size. This yields a complete learning system implemented in Java (for the image labelling front end) and C (for the granular learning algorithm) that is a practical system for generating uncertain computing-with-words models of expert problem domains. Future work will concentrate on improving the accuracy of the granular learning method whilst maintaining computability and speed of learning.


9. Conclusions

We have presented a method for experts to rapidly generate classification data sets from images. The general principle involves defining rough regions of interest that are linked to a previously agreed applicability function. This applicability is defined through some simple bio-mimetic assumptions and observations and is the enabling factor in this learning process. Cross-product granular models are generated from this vague information and unseen images can then be classified. Although not as empirically accurate as the compared SVM the cross-product fuzzy model has a crucial characteristic: it can be readily interpreted using linguistic terms in line with the general principle of modelling with words. Application examples include medical image diagnosis and robot navigation.

Although the general principle has been shown here, improvements can be made along a number of fronts. For example, a more refined analysis of human vision may yield a more suitable applicability function. Also a more extensive set of image features may be needed for practical application. Since this work emphasises the human-like aspects of vision and behaviour, image features along the lines of those presented in [21] may be most suitable. In this way a classifier may generate a richer linguistic report on its decisions such as, "this region appears to be diseased because it is quite dark and has a spiky shape."

It is important to also note that comparisons to other image classification methods may not be wholly appropriate. Even so, we have attempted to compare this approach to other segmentation and learning algorithms. The goal in this research is the rapid extraction of expert information and the consequent learning from this information, and other methods typically take no account of this. In other words, the benefit of this method in terms of reduced load on the experts suggests greater gains than can be seen in purely empirical results.

Acknowledgements

The first author holds a Royal Society Research Fellowship and this research has been made possible by the generous support of the Royal Society.

References

[1] J.F. Baldwin, J. Lawry, T.P. Martin, Efficient algorithms for semantic unification, in: Proc. IPMU, 1996.
[2] J.F. Baldwin, T.P. Martin, J.G. Shanahan, Modelling with words using Cartesian granule features, in: Proc. Internat. Conf. on Fuzzy Systems, FuzzIEEE 1997, Barcelona, Spain, 1997, pp. 1295–1300.
[3] A.C. Bovik, Analysis of multichannel narrow-band filters for image texture segmentation, IEEE Trans. Signal Process. 39 (9) (1991) 2025–2043.
[4] P. Buser, M. Imbert, Vision, MIT Press, New York, 1992.
[5] C. Carson, S. Belongie, H. Greenspan, J. Malik, Blobworld: image segmentation using expectation-maximization and its applications to image querying, IEEE Trans. Pattern Anal. Mach. Intell. 24 (8) (2002) 1026–1038.
[6] V. Caselles, R. Kimmel, G. Sapiro, Geodesic active contours, Internat. J. Comput. Vision (1997) 61–79.
[7] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.
[8] Y. Deng, B.S. Manjunath, Unsupervised segmentation of color-texture regions in images and video, IEEE Trans. Pattern Anal. Mach. Intell. 23 (8) (2001) 800–810.
[9] C. Friedman, P.O. Alderson, J.H. Austin, J.J. Cimino, S.B. Johnson, A general natural language text processor for clinical radiology, J. Amer. Med. Inform. Assoc. 1 (2) (1994) 161–174.
[10] A. Horwood, S.J. Hogan, P.R. Goddard, J.M. Rossiter, Computer-assisted diagnosis of CT pulmonary images, Applied Nonlinear Mathematics Technical Report #200109, University of Bristol.
[11] J. Jeon, V. Lavrenko, R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models, in: SIGIR'03, Toronto, Canada, July 28–August 1, 2003.
[12] J. Jiwoon, R. Manmatha, Using maximum entropy for automatic image annotation, in: Proc. Third Internat. Conf. on Image and Video Retrieval, 2004, pp. 24–32.
[13] J.M. Rossiter, Humanist computing: modelling with words, concepts, and behaviours, in: J. Lawry, J.G. Shanahan, A.L. Ralescu (Eds.), Modelling with Words: Learning, Fusion, and Reasoning within a Formal Linguistic Representation Framework, Lecture Notes in Computer Science, Vol. 2873, Springer, Berlin, 2003.
[14] J. Rossiter, T. Mukai, The rapid elicitation of knowledge about images using fuzzy information granules, in: Proc. Internat. Conf. on Fuzzy Systems, FuzzIEEE 2004, Budapest, August 2004.
[15] J.M. Rossiter, T. Mukai, Learning from uncertain image data using granular fuzzy sets and bio-mimetic applicability functions, in: Proc. Fourth Conf. of the European Society of Fuzzy Logic and Technology, Eusflat 2005, 2005, pp. 1070–1075.
[16] J.M. Rossiter, T. Mukai, P.R. Goddard, Intelligent sensor fusion in uncertain taxonomical hierarchies, in: Proc. 2002 UK Workshop on Computational Intelligence, pp. 155–161.
[17] S. Samuels, H.L. Kundel, C.F. Nodine, L.C. Toto, Mechanism of satisfaction of search: eye position recordings in the reading of chest radiographs, Radiology 194 (1995) 895–902.
[18] J.G. Shanahan, Soft Computing for Knowledge Discovery: Introducing Cartesian Granule Features, Kluwer Academic Publishers, Dordrecht, 2000.
[19] J.G. Shanahan, J.F. Baldwin, T.P. Martin, Knowledge discovery using Cartesian granule features with applications, in: Proc. Internat. Conf. of the North American Fuzzy Information Processing Society, NAFIPS 1999, New York, 1999, pp. 228–232.
[20] B. Sumengen, B.S. Manjunath, C. Kenney, Image segmentation using curve evolution and flow fields, in: Proc. IEEE Internat. Conf. on Image Processing (ICIP), Rochester, NY, USA, September 2002.
[21] H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception, IEEE Trans. Systems Man Cybernet. 8 (6) (1978) 460–473.
[22] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, New York, 1996.
[23] J.Z. Wang, J. Li, R.M. Gray, G. Wiederhold, Unsupervised multiresolution segmentation for images with low depth of field, IEEE Trans. Pattern Anal. Mach. Intell. 23 (1) (2001).
[24] C. Yang, M. Dong, J. Hua, Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning, in: IEEE CVPR 2006, New York City, NY, 2006.