
Journal of Plankton Research Vol.18 no.3 pp.393-410, 1996

Phytoplankton recognition using parametric discriminants

Helen McCall, Isabel Bravo¹, J.Alistair Lindley and Beatriz Reguera¹

Plymouth Marine Laboratory, Prospect Place, Plymouth PL1 3DH, UK and ¹Instituto Español de Oceanografía, Cabo Estay-Canido, 36280 Vigo, Spain

Abstract A comparison was made between the use of linear and quadratic discriminant functions for classifying phytoplankton specimens of the genera Dinophysis and Ceratium by means of a general morphometric function. The class distributions were found to fit quadratic boundaries better than linear boundaries. A nine-species quadratic discriminant classified within 95% confidence intervals. Morphological variants not used in the calibration were all correctly identified, although control species unknown to the model were poorly rejected. An accuracy of 99% was obtained for separating three morphological variants of Dinophysis acuminata. Digital filters were developed to extract the morphometric function directly from photomicrograph images, and to present the data as an orientation-independent feature vector. Using this feature vector, a quadratic discriminant classified test data from 14 species of the genera Dinophysis, Ceratium and Ornithocercus with an accuracy of 83%, with 37% of the error due to two similarly shaped species of Dinophysis overlapping.

Introduction

The purpose of this investigation was to examine the problem of automated visual identification of marine plankton from a conventional statistical and mathematical viewpoint, for the development of feature extraction and pattern recognition techniques for use in marine environmental research. The investigation was designed to explore the multivariate relationships between a range of taxonomic classes of marine dinoflagellates based on their respective morphologies, and to explore the use of conventional image-analysis techniques to extract the pertinent morphological features for an automated classifier.

There is a growing requirement worldwide for efficient and inexpensive systems for monitoring phytoplankton (Lembeye et al., 1993; Reguera et al., 1993) to facilitate effective control of fishery harvesting.

Marine phytoplankton and protozooplankton have traditionally been identified from visual descriptions of their morphology, as illustrated by simple line drawings of their shapes in published keys (Sournia, 1967; Dodge, 1982). When conventional published keys are used, a major component of the information derived from the line drawings is the overall outline shape of the organism. For this project, the concept of a general morphometric function was developed by ignoring the descriptions of fine taxon-specific detail and taking the overall outline shape alone. Such a function is equally applicable to all rigid-bodied plankton organisms and allows direct, quantitative comparison between them.

Method

Plankton samples were obtained from the archives of Plymouth Marine Laboratory, Centro Oceanográfico de Vigo, the Sir Alister Hardy Foundation for Ocean Science and the Forth River Purification Board, covering a range of locations in the North East Atlantic, including the Azores, the Galician rías, the North Sea and the shelf seas to the west of Britain (see Table I). Specimens were photographed using 35 mm cameras attached to inverted microscopes. A range of magnifications was used: 10 × 10, 10 × 20, 10 × 40 and 10 × 63. Each set of optics at each magnification was calibrated by photographing a micrometre scale.

© Oxford University Press

A database containing 3805 photomicrographs of marine phytoplankton and zooplankton was indexed, and the specimens were classified and cross-checked against published keys (Jorgenson, 1911; Kofoid and Campbell, 1929; Schiller, 1933; Sournia, 1967; Dodge, 1982; Larsen and Moestrup, 1992).

The use of a description of the overall outline shape as a general morphometric function for classification was first established using a set of manual measurements to describe the function.

Manual measurements were taken of 1025 specimens of the genera Dinophysis and Ceratium, comprising nine main species of interest (Figure 1): Ceratium tripos, Ceratium horridum, Ceratium longipes, Ceratium arcticum, Ceratium azoricum, Dinophysis acuta, Dinophysis acuminata, Dinophysis rotundata and Dinophysis sacculus. There were also nine controls, similar in form to the nine main species: two morphological variants, Ceratium tripos var. pulchellum and Ceratium horridum var. buceros, and seven other species, Ceratium declinatum, Ceratium symmetricum, Dinophysis dens, Dinophysis pulchellum, Dinophysis norvegica, Dinophysis caudata and Dinophysis tripos.

The measurements were taken from photomicrograph prints using vernier calipers. The four parameters measured (total length, maximum width, girdle width and height of the maximum width) were chosen to give a description of the overall shape of the theca (see Figure 2). The parameters were distances between points around the outline shape, and corresponded to lengths and widths that might reasonably be assumed to be approximately normally distributed within a population, allowing the use of a parametric discriminant. The external list structures found on Dinophysis spp. were omitted from the manual measurements of the overall shape, to remove variance due to physical damage and to the partial absence of lists in newly divided cells. To provide a fixed reference for orientation, the girdle was held as close to horizontal as possible, with the width measurements taken parallel to the girdle and the length and height measurements at right angles to it. For convenience in making these measurements, the conventional orientation of Ceratium spp. figures was inverted, making the apical horn point towards the base. This inversion allowed species such as Ceratium furca and Ceratium lineatum to be measured in the same way without the height of maximum width taking a negative value. For similar reasons, where the antapical horns extended beyond the extent of the apical horn, the base was taken to be the end of the antapical horns. In both cases, the base was the lowest point when the girdle was horizontal and the apical horn pointed downwards.

Table I. Equipment, location and date of samples used from Plymouth Marine Laboratory archive

Equipment | Location | Date
Continuous plankton recorder | 48°59'N 07°22'W | 20/08/89
Continuous plankton recorder | 42°31'N 69°07'W | 12/05/91
Continuous plankton recorder | 48°48'N 50°58'W | 18/11/91
Longhurst-Hardy plankton recorder | 31°00'N 29°44'W | 12/05/81
Longhurst-Hardy plankton recorder | 59°09'N 18°45'W | 30/08/81
Longhurst-Hardy plankton recorder | 51°00'N 13°00'W region | 06/05/86
Longhurst-Hardy plankton recorder | 54°00'N 06°09'E | 11/06/89
Water bottles | Azores area | 13/03/92
Forth River Purification Board samples | |
Plankton net | 56°00'N 03°00'W region | 1992

Fig. 1. Calibration species from the genera Ceratium, Ornithocercus and Dinophysis: (a) C.horridum, (b) C.tripos, (c) C.lineatum, (d) C.pentagonum, (e) C.furca, (f) C.arcticum, (g) C.longipes, (h) C.azoricum, (i) C.fusus, (j) O.steinii, (k) D.acuta, (l) D.rotundata, (m) D.acuminata, (n) D.sacculus.


Fig. 2. Manual morphometric measurements: (a) Ceratium spp.; (b) Dinophysis spp.

A selection of micrometre-scale photomicrographs taken at the appropriate magnifications (10 × 10, 10 × 20, 10 × 40, 10 × 63) on the same microscopes was also measured, and all measurements were converted accordingly to micrometres.
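The conversion to micrometres amounts to one scale factor per set of optics. A minimal sketch, assuming each factor is obtained by dividing a known length of the photographed micrometre scale by its caliper reading on the print; the function names and the 100 µm / 40 mm example are hypothetical, not values from this study:

```python
def um_per_print_mm(scale_length_um, scale_length_on_print_mm):
    """Micrometres represented by 1 mm on a print, from a photographed
    micrometre scale of known length at the same magnification."""
    return scale_length_um / scale_length_on_print_mm


def to_micrometres(readings_mm, factor):
    """Convert caliper readings (mm on the print) to micrometres."""
    return {name: value * factor for name, value in readings_mm.items()}


# Hypothetical example: a 100 um stretch of the scale measures 40 mm on the print.
factor = um_per_print_mm(100.0, 40.0)  # 2.5 um per print mm
print(to_micrometres({"total_length": 20.0, "girdle_width": 7.0}, factor))
# {'total_length': 50.0, 'girdle_width': 17.5}
```

One factor is needed for each combination of microscope and magnification, since each set of optics was calibrated separately.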

A further set of measurements was taken from 124 individuals comprising three morphological variants of D.acuminata found in samples from the Galician rías (Bravo et al., 1993). Three parameters were measured (see Figure 3): the maximum length from the girdle plane to the thecal base; the width of the theca from the most convex part of the dorsal edge to the ventral end, parallel to the girdle; and the convexity of the dorsal edge, measured as the angle α formed between the line MC and the girdle plane, where M is a point on the dorsal edge 6 µm below the girdle plane and C is the centre of the closest circumference to the dorsal edge.
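The angle α can be computed from the coordinates of M and C once the girdle plane is aligned with the horizontal. A minimal sketch, assuming (x, y) coordinates in micrometres with the girdle plane along the x axis; the function name and the example coordinates are hypothetical, and locating C (the fitted circumference) is not shown:

```python
import math


def dorsal_convexity_angle(m, c):
    """Angle (degrees) between line MC and the girdle plane.

    m, c: (x, y) coordinates in micrometres, with the specimen oriented so that
    the girdle plane is parallel to the x axis (an assumption of this sketch).
    """
    dx = c[0] - m[0]
    dy = c[1] - m[1]
    # Absolute components give the acute angle to the horizontal (0 to 90 degrees).
    return math.degrees(math.atan2(abs(dy), abs(dx)))


# Hypothetical coordinates: M on the dorsal edge 6 um below the girdle plane (y = 0),
# C the centre of the fitted circumference.
M = (20.0, -6.0)
C = (10.0, -16.0)
print(round(dorsal_convexity_angle(M, C), 1))  # 45.0
```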

General morphometric functions were extracted as simple outline shapes (Figure 4) from 256 × 256 pixel, 256-level greyscale digital images (Figure 5) of 14 species from the three genera Ceratium, Dinophysis and Ornithocercus, using adaptations of conventional digital filters (Gonzalez and Woods, 1992). The analytical algorithms were developed in the C language on a UNIX workstation.

The initial constraints set for extraction of the outlines were that the organism should be located in the centre of the image, and that it should be in focus, without portions being obscured by other objects or cut off by the edge of the image.

Fig. 3. Angular measurements (Dinophysis acuminata).

Fig. 4. Extracted outlines (scaled in pixels/100 µm): Dinophysis sacculus, specimen V118F12 (151 pixels/100 µm); Ceratium horridum, P18F23 (52 pixels/100 µm); Ceratium tripos, P4F16 (52 pixels/100 µm); Dinophysis acuta, P4F17 (110 pixels/100 µm).

Initial attempts were made to extract outlines using the Sobel edge-detection filter (Gonzalez and Woods, 1992). This filter differentiates the image, giving a horizontal and a vertical gradient component together with a directional component. The horizontal and vertical components were combined as the root sum of squares of corresponding elements to give a simplified Sobel gradient, and the gradient was thresholded to produce a binarized image of the edges present in the original image. This produced easily recognizable outlines of the organisms, but the outlines were rather incomplete. This made it impractical to separate the outline of the organism from the edges of background detritus or of the organism's internal structures, because any attempt to use a filling algorithm simply bled through the breaks in the outline edges.
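The simplified Sobel gradient described above can be sketched in pure Python as follows. This is an illustration, not the authors' C code: it uses the standard 3 × 3 Sobel kernels, leaves border pixels at zero, and the threshold of 500 is an arbitrary value for the toy image (the paper does not state its threshold). Applying the kernels without flipping them changes only the sign of each component, not the root sum of squares.

```python
import math

# Standard 3 x 3 Sobel operators: horizontal (x) and vertical (y) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Apply a 3 x 3 kernel to interior pixels (border pixels are left at 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
    return out


def sobel_edge_map(image, threshold):
    """Binary edge map: root sum of squares of the two Sobel components, thresholded."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [
        [1 if math.hypot(x, y) > threshold else 0 for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# A 5 x 5 greyscale image with a vertical step edge (dark left, bright right).
image = [[0, 0, 255, 255, 255] for _ in range(5)]
for row in sobel_edge_map(image, threshold=500):
    print(row)
```

On a real micrograph the same map also contains edges from detritus and internal structures, and gaps in the organism's own outline, which is the difficulty the next step addresses.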

Fig. 5. Feature extraction from an image of C.longipes (specimen P70F08): (a) original image; (b) Sobel-derived edge gradients; (c) region filled by dilation; (d) thinned by erosion; (e) extracted outline.

To overcome the difficulty of separating a gradient-based outline from surrounding detritus (Figure 5) and other organisms, the algorithm was adapted to separate out a region with a high density of edge gradients. The Sobel horizontal and vertical operators were applied, and the two resulting gradients were individually thresholded, binarized and then combined with a Boolean AND of their corresponding elements (Figure 5b). A 3 × 3 dilating filter (Gonzalez and Woods, 1992) was then applied to homogenize the region of high gradient density (Figure 5c), followed by an eroding filter to remove the external thickening of the region caused by the dilation. The result was an image comprising filled regions corresponding to the various objects in the field (Figure 5d).
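The region-filling steps (separate thresholds, Boolean AND, 3 × 3 dilation, then erosion) are sketched below on binary masks. Dilation followed by erosion is the morphological closing operation, which bridges small gaps between edge pixels. Assumptions of this sketch: the eroding filter is also 3 × 3 (the paper gives the size only for the dilating filter), neighbourhoods are truncated at the image border, the function names are hypothetical, and the gradient arrays gx and gy are taken as given inputs.

```python
def threshold_and(gx, gy, threshold):
    """Binarize |gx| and |gy| separately, then combine them with a Boolean AND."""
    return [
        [1 if abs(a) > threshold and abs(b) > threshold else 0 for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


def _morph3x3(mask, keep):
    """Apply `keep` (any or all) to each pixel's in-image 3 x 3 neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            int(keep(mask[rr][cc]
                     for rr in range(max(r - 1, 0), min(r + 2, rows))
                     for cc in range(max(c - 1, 0), min(c + 2, cols))))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def dilate3x3(mask):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is set."""
    return _morph3x3(mask, any)


def erode3x3(mask):
    """Keep a pixel only if every pixel in its (in-image) 3 x 3 neighbourhood is set."""
    return _morph3x3(mask, all)


# Two edge pixels separated by a one-pixel gap: closing bridges the gap.
mask = [[0] * 7 for _ in range(7)]
mask[3][2] = mask[3][4] = 1
closed = erode3x3(dilate3x3(mask))
print(closed[3])  # [0, 0, 1, 1, 1, 0, 0]
```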

The organism was then separated from the surrounding objects by selecting the region of filling closest to the centre of the image and following a coherent track of filling until the edge of the object was found. To check that this was the true edge of the object and not that of an unfilled void within the organism's outline, a coherent track was then followed to the perimeter of the image. If the perimeter of the image was not reached, the position on the edge of the object was assumed not to be the true edge, and a new track across filled space was started from that point.
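The test for a true outer edge can be expressed as a reachability question: does the unfilled space next to a candidate edge connect to the image perimeter? The paper follows "coherent tracks" of pixels; the sketch below substitutes a breadth-first flood fill (our choice) that answers the same question, and picks the filled pixel nearest the image centre as the starting region. The function names and the 4-connectivity rule are assumptions.

```python
from collections import deque


def nearest_filled_to_centre(mask):
    """Filled pixel (row, col) closest to the image centre (ties: first in raster order)."""
    rows, cols = len(mask), len(mask[0])
    cr, cc = (rows - 1) / 2, (cols - 1) / 2
    filled = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    return min(filled, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)


def reaches_perimeter(mask, start):
    """True if the unfilled pixels 4-connected to `start` touch the image border.

    False means `start` lies in an enclosed void inside a filled region,
    so the edge next to it is not the object's true outer edge.
    """
    rows, cols = len(mask), len(mask[0])
    if mask[start[0]][start[1]]:
        raise ValueError("start must be an unfilled (background) pixel")
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return True
        # Not on the border, so all four neighbours are inside the image.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
mask[3][3] = 0  # an unfilled void inside the object
print(nearest_filled_to_centre(mask))  # (2, 3)
print(reaches_perimeter(mask, (3, 3)))  # False: void, not the true edge
print(reaches_perimeter(mask, (3, 0)))  # True: outside the object
```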

Having identified a locus on the true edge of the organism, a contour-following algorithm traced around the edge of the organism, recording the loci in a linked list of coordinates. Thus, the outline shape (Figures 4 and 5e), or morphometric function, was reduced to an ordered, one-dimensional list of coordinate vectors.
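Contour following of this kind is commonly done by Moore-neighbour tracing, which the sketch below uses; the paper does not name its algorithm, so this is a stand-in. Assumptions: the mask holds only the selected object (1 = filled) and has at least one filled pixel, tracing starts from the first filled pixel in raster order rather than from the locus found via the image centre, and tracing stops when the first move is about to repeat. A Python list stands in for the linked list of coordinates.

```python
# Moore neighbourhood offsets (row, col) in clockwise order, starting at west.
DIRS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_outline(mask):
    """Moore-neighbour contour following; returns boundary loci in order."""
    rows, cols = len(mask), len(mask[0])

    def filled(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    def step(current, back):
        # Scan clockwise from the background neighbour `back` to the next filled pixel.
        r, c = current
        for k in range(1, 9):
            d = (back + k) % 8
            nr, nc = r + DIRS[d][0], c + DIRS[d][1]
            if filled(nr, nc):
                # The previously scanned (background) neighbour becomes the new backtrack.
                pr, pc = r + DIRS[(d - 1) % 8][0], c + DIRS[(d - 1) % 8][1]
                return (nr, nc), DIRS.index((pr - nr, pc - nc))
        return None, back  # isolated single pixel

    start = next((r, c) for r in range(rows) for c in range(cols) if mask[r][c])
    second, back = step(start, 0)  # west of the raster-first pixel is background
    if second is None:
        return [start]
    outline, current = [start], second
    for _ in range(8 * rows * cols):  # safety bound on the number of steps
        nxt, new_back = step(current, back)
        if current == start and nxt == second:
            return outline  # the first move is about to repeat: contour closed
        outline.append(current)
        current, back = nxt, new_back
    raise RuntimeError("contour did not close")


block = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
print(trace_outline(block))
# [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1)]
```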

The constraints set on the extraction of morphometric data from the outlines were that all measurements should be independent of orientation and position, and should all be in the same units. Recognizing the orientation of an organism, or of a defined locus on it, would logically require the organism to have been identified first. A standard unit of measurement, in this case the micrometre, was set to prevent artifacts arising from differing scales of units from affecting the shape of the class distributions in the multivariate analyses.


The length of the perimeter of the organism was measured by triangulating between adjacent loci. The area of the organism was approximated as a count of pixels within the outline, and its square root was taken to give a parameter in the standard unit, micrometres.
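Both parameters follow directly from the traced loci and the filled mask. A minimal sketch, assuming "triangulating between adjacent loci" means summing straight-line distances between consecutive boundary pixels (closing the loop), and that um_per_pixel comes from the image calibration (e.g. 100/52 µm per pixel for an image at 52 pixels/100 µm, as in Figure 4); the names are hypothetical.

```python
import math


def perimeter_um(outline, um_per_pixel):
    """Perimeter as the sum of straight-line steps between adjacent loci (closed loop)."""
    steps = zip(outline, outline[1:] + outline[:1])
    return sum(math.hypot(r2 - r1, c2 - c1) for (r1, c1), (r2, c2) in steps) * um_per_pixel


def sqrt_area_um(mask, um_per_pixel):
    """Square root of the area, with the area approximated by a pixel count.

    sqrt(n_pixels * um_per_pixel**2) == sqrt(n_pixels) * um_per_pixel, so the result is in um.
    """
    n_pixels = sum(1 for row in mask for v in row if v)
    return math.sqrt(n_pixels) * um_per_pixel


# Boundary of a 3 x 3 block of pixels, as returned by a contour tracer.
outline = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1)]
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
print(perimeter_um(outline, 2.0), sqrt_area_um(mask, 2.0))  # 16.0 6.0
```

Taking the square root puts the area on the same linear scale as the other four parameters, which matters because all five are combined in one covariance matrix.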

A histogram of distances between each pair of loci was computed, sampling at 4 µm intervals. From this, the maximum length was derived as the highest interval in which data were recorded. A simple description of the shape of the histogram was provided by its mean and SD.

The feature vector comprised five morphometric parameters in micrometres: perimeter, square root of area, maximum length, and the mean and SD of the histogram of distances between loci.
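The histogram features can be sketched as below. Pairwise distances are unchanged by rotating or translating the outline, which is what makes these features orientation- and position-independent. How the paper reduced the histogram to a single maximum length, mean and SD is not fully specified; this sketch assumes the maximum length is the upper edge of the highest occupied 4 µm bin and computes the mean and SD from bin centres weighted by counts. Coordinates are assumed already converted to micrometres, and the names are hypothetical.

```python
import math
from itertools import combinations


def distance_histogram(outline_um, bin_um=4.0):
    """Counts of pairwise distances between loci, in bins of width bin_um."""
    counts = {}
    for (x1, y1), (x2, y2) in combinations(outline_um, 2):
        b = int(math.hypot(x2 - x1, y2 - y1) // bin_um)
        counts[b] = counts.get(b, 0) + 1
    return counts


def histogram_features(counts, bin_um=4.0):
    """(maximum length, mean, SD) of the histogram, using bin centres."""
    n = sum(counts.values())
    centre = {b: (b + 0.5) * bin_um for b in counts}
    mean = sum(centre[b] * k for b, k in counts.items()) / n
    var = sum(k * (centre[b] - mean) ** 2 for b, k in counts.items()) / n
    max_length = (max(counts) + 1) * bin_um  # upper edge of the highest occupied bin
    return max_length, mean, math.sqrt(var)


def feature_vector(perimeter, sqrt_area, outline_um, bin_um=4.0):
    """The five-parameter feature vector (all in um)."""
    max_length, mean, sd = histogram_features(distance_histogram(outline_um, bin_um), bin_um)
    return [perimeter, sqrt_area, max_length, mean, sd]


square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]  # 10 um square
print(distance_histogram(square))  # {2: 4, 3: 2}
print(feature_vector(40.0, 10.0, square))
```

The number of pairs grows with the square of the number of loci, which is manageable for single outlines of a few hundred points.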

The manual and outline data were split into Model calibration and Test datasets using the random function provided in the C language standard library.
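A random split of this kind might look as follows. This sketch uses Python's seeded random.Random rather than the C library generator the authors used, and the 50% test fraction is an assumption; the paper does not state the proportion assigned to each dataset.

```python
import random


def split_model_test(specimens, test_fraction=0.5, seed=None):
    """Randomly assign each specimen to the Model (calibration) or Test dataset."""
    rng = random.Random(seed)  # seeded generator for a reproducible split
    model, test = [], []
    for specimen in specimens:
        (test if rng.random() < test_fraction else model).append(specimen)
    return model, test


model, test = split_model_test(list(range(20)), test_fraction=0.5, seed=1)
print(len(model), len(test))
```

Seeding makes the split reproducible, so the same Model and Test sets can be regenerated when the analysis is rerun.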

Parametric discriminant functions (Phillips et al., 1973; Statistical Analysis Systems Institute, 1985) were employed to investigate and model the separation between the morphological classes, and to demonstrate the application of such a model to the statistical classification of individual specimens. The discriminant functions used are supervised procedures, requiring the classes to be predefined, as opposed to cluster analyses, which are unsupervised procedures needing no prior knowledge of the classes. These discriminant functions assume a multivariate normal distribution within each class. The quadratic discriminant function is based on the Mahalanobis distance between a specimen and each class mean, weighted by that class's own within-class covariance matrix. The linear discriminant assumes the covariance matrices to be equal and pools them, so that distances from every class mean are measured with a single common covariance matrix, analogous to Euclidean distances.
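For reference, the generalized squared distance and posterior probability documented for SAS PROC DISCRIM (the SAS Institute software cited above) can be written as follows, with m_t, S_t and n_t the mean, covariance matrix and size of class t, n the total number of specimens and c the number of classes. The notation is ours, and whether the 1985 release used exactly this form is an assumption:

```latex
\begin{aligned}
D_t^2(\mathbf{x}) &= (\mathbf{x}-\mathbf{m}_t)^{\top}\,\mathbf{V}_t^{-1}\,(\mathbf{x}-\mathbf{m}_t) + g_1(t) + g_2(t)\\[4pt]
\text{quadratic:}\quad \mathbf{V}_t &= \mathbf{S}_t, \qquad g_1(t) = \ln\lvert\mathbf{S}_t\rvert\\
\text{linear:}\quad \mathbf{V}_t &= \mathbf{S}_p = \frac{1}{n-c}\sum_{u=1}^{c}(n_u-1)\,\mathbf{S}_u, \qquad g_1(t) = 0\\
g_2(t) &= 0 \quad \text{(equal prior probabilities)}\\[4pt]
P(t \mid \mathbf{x}) &= \frac{\exp\!\left(-\tfrac{1}{2} D_t^2(\mathbf{x})\right)}{\sum_{u=1}^{c}\exp\!\left(-\tfrac{1}{2} D_u^2(\mathbf{x})\right)}
\end{aligned}
```

The ln|S_t| term is the quantity listed for each class in the last column of Table II. In the linear form every class is measured with the single pooled matrix S_p, whose log-determinant (18.03) is far larger than those of the Dinophysis classes.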

The prior probabilities were assumed equal for the discriminant analyses, because no other assumption could be made concerning expected class sizes in natural populations.
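A minimal pure-Python sketch of both discriminants with equal priors is given below. It follows the SAS-style form (Mahalanobis distance to each class mean, plus ln|S_t| in the quadratic case) and converts distances to posterior probabilities; it is an illustration with hypothetical names and toy data, not the SAS implementation used in the study.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Within-class sample covariance matrix (divisor n - 1)."""
    m, n = mean_vector(rows), len(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def inverse_and_logdet(a):
    """Gauss-Jordan inversion with partial pivoting; also returns ln|det(a)|."""
    p = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(p)] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        logdet += math.log(abs(pv))  # |det| is the product of the pivots
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug], logdet


def fit(classes, pooled=False):
    """classes: {name: [feature vectors]} -> {name: (mean, inverse covariance, g1)}."""
    stats = {t: (mean_vector(x), covariance(x), len(x)) for t, x in classes.items()}
    if pooled:  # linear discriminant: one pooled covariance matrix, g1 = 0
        n, c = sum(s[2] for s in stats.values()), len(stats)
        p = len(next(iter(stats.values()))[0])
        sp = [[sum((nt - 1) * st[i][j] for _, st, nt in stats.values()) / (n - c)
               for j in range(p)] for i in range(p)]
        inv, _ = inverse_and_logdet(sp)
        return {t: (m, inv, 0.0) for t, (m, _, _) in stats.items()}
    model = {}
    for t, (m, st, _) in stats.items():  # quadratic: each class keeps its own matrix
        inv, logdet = inverse_and_logdet(st)
        model[t] = (m, inv, logdet)
    return model


def posteriors(model, x):
    """Posterior probabilities with equal priors: P(t|x) proportional to exp(-D_t^2 / 2)."""
    d2 = {}
    for t, (m, inv, g1) in model.items():
        diff = [xi - mi for xi, mi in zip(x, m)]
        d2[t] = sum(diff[i] * inv[i][j] * diff[j]
                    for i in range(len(diff)) for j in range(len(diff))) + g1
    lowest = min(d2.values())  # shift before exponentiating, for numerical stability
    w = {t: math.exp(-0.5 * (v - lowest)) for t, v in d2.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}


A = [[0, 0], [1, 0], [0, 1], [1, 1]]
B = [[10, 10], [11, 10], [10, 11], [11, 11]]
qda = fit({"A": A, "B": B})
lda = fit({"A": A, "B": B}, pooled=True)
print(posteriors(qda, [5.5, 5.5]), posteriors(lda, [1.0, 0.5]))
```

Because equal priors add the same constant to every class, they cancel in the normalization, which is why no prior term appears in posteriors().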

A pilot study on classifying four species of Dinophysis using total length, maximum width and girdle width suggested that the boundaries between classes were linear, so a linear discriminant was attempted first.

Results

The results of the manual measurements are shown in Table II. It can be seen from the means and SDs that, for each of the four parameters measured, there are species which overlap within the SDs of the class distributions: for example, all the Ceratium spp. except C.tripos overlap with each other on total length, and D.acuminata overlaps with D.rotundata on total length. All such overlap is within genus, with no overlap between the two genera. The most conservative parameter measured can be seen to be girdle width, for which the SDs are the lowest as a proportion of the means within classes, and the total population's SD is the lowest as a proportion of the total mean. This parameter also shows the least overlap, with only C.arcticum and C.longipes overlapping within the SDs from their means. However, the total population's SD for girdle width is sufficiently low (i.e. the class means are sufficiently close to each other) for the tails of the individual class distributions to overlap with their neighbours in the frequency histogram (Figure 6). Therefore, the classification is a multivariate problem.

Table II. Manual morphometric calibration data: within-class statistics (µm)

Taxonomic class | Total length mean | Total length SD | Girdle width mean | Girdle width SD | Maximum width mean | Maximum width SD | Height of maximum width mean | Height of maximum width SD | Natural log of the determinant of the covariance matrix
C.arcticum | 191.64 | 32.66 | 58.10 | 3.42 | 455.78 | 58.81 | 92.62 | 37.57 | 22.82
C.azoricum | 137.39 | 21.64 | 49.22 | 3.39 | 92.10 | 12.39 | 85.49 | 16.99 | 13.59
C.horridum | 204.34 | 29.26 | 41.39 | 3.21 | 174.62 | 22.48 | 135.69 | 26.34 | 20.65
C.longipes | 173.46 | 17.55 | 57.94 | 3.37 | 262.04 | 51.67 | 59.27 | 20.86 | 21.57
C.tripos | 272.49 | 19.27 | 73.23 | 5.81 | 211.51 | 24.62 | 144.93 | 31.98 | 20.17
D.acuminata | 52.29 | 3.87 | 17.29 | 2.78 | 36.36 | 4.22 | 27.01 | 3.29 | 8.35
D.acuta | 75.20 | 4.77 | 24.97 | 2.06 | 51.52 | 2.23 | 28.49 | 2.44 | 7.85
D.rotundata | 48.37 | 2.59 | 30.75 | 2.27 | 42.96 | 2.31 | 27.91 | 2.45 | 5.27
D.sacculus | 43.35 | 2.05 | 8.79 | 1.25 | 22.20 | 2.31 | 23.61 | 2.43 | 4.00
Total | 99.75 | 70.85 | 30.54 | 18.54 | 100.08 | 114.01 | 51.13 | 43.37 | pooled = 18.03

Fig. 6. Distribution of girdle widths in the genera Ceratium and Dinophysis. [Frequency histogram against girdle width (µm), one series per species: C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus.]

The most variable classes can be seen to be C.arcticum and C.longipes, with D.acuminata being the most variable of the genus Dinophysis. This is also reflected in the within-class covariance matrices, as shown by the natural logs of their determinants in Table II.

A linear discriminant function assuming a normal distribution (see Table III) showed relatively high Mahalanobis distances between most of the nine species; only D.acuminata lay very close to both D.sacculus and D.acuta, and D.acuta lay rather close to D.rotundata.

Classifications derived from the posterior probabilities of class membership on resubstitution of the Test dataset onto the linear discriminant model (see Table IV) show very good separation, with <5% classification error overall. This strongly suggests that the within-class distributions are indeed normal. However, an examination of the erroneous classifications shows some serious anomalies: the classification table (Table IV) shows a C.horridum reclassified as a D.acuta, and three C.azoricum reclassified as D.rotundata. Each of these specimens had measurements towards the limits of the distribution for its correct class, but each of its parameters was totally different from those of the class derived from its posterior probabilities. Despite these parameters being very distant from those of the resultant class, the anomalous errors mostly had posterior probabilities >0.99, making it difficult to reject these specimens by thresholding within set confidence intervals.

An examination of the within-class covariance matrices, as depicted by the natural logs of their determinants in Table II, shows that the covariance matrices for the classes within genera are reasonably similar, and so might be assumed to give linear boundaries within genus. However, the two genera fall into two separate groupings, and the natural log of the determinant of the pooled covariance matrix (18.03) is within the range for Ceratium spp. (13.59 to 22.82) but very much higher than those for Dinophysis spp. (4.00 to 8.35). Therefore, on a linear model with pooled covariance matrices, the distances between any of the Dinophysis class means and any one specimen are artificially and dramatically foreshortened by the grossly inappropriate magnitude of the assumed variation within the class distribution.

The experiment was repeated using a quadratic discriminant function, in which each class is modelled with its own covariance matrix. This gave a total error rate of 0.03 for classification of the Test dataset. The trans-generic anomalies seen among misclassifications in the linear model did not recur with the quadratic model (Table V).

The average posterior probabilities for class membership of the Test dataset (Table V) show that, with one exception, all the classes obtained mean posterior probabilities >0.99 for correct classifications and mean posterior probabilities <0.95 for misclassifications. This suggests that much of the misclassification would be rejected if a threshold were set at 95% confidence intervals. The exception was C.azoricum, with a mean posterior probability of 0.9834 for correct classification. The apparent anomaly of a C.azoricum (specimen P21F25) being misclassified as a C.horridum with a posterior probability of 0.9982 was found to be due to a specimen at the extreme limits of the range of C.azoricum as described previously (Sournia, 1967). Specimen P21F25 had a girdle width of 41.42 µm, which is at the mean value for C.horridum, and its antapical horns were at a very obtuse angle and particularly short, giving measurements of maximum width, height of maximum width and total length with ratios similar to those found in C.horridum. This specimen was not classed as taxonomically poor during the initial preparation of the index because it corresponded well to one of the extreme examples described by Sournia. However, it is highly atypical of the species range as a whole.

Table III. Distances between nine species in linear discriminant of manual morphometric data

 | C.arcticum | C.azoricum | C.horridum | C.longipes | C.tripos | D.acuminata | D.acuta | D.rotundata | D.sacculus
C.arcticum | 0 | 298.27 | 215.89 | 94.82 | 284.69 | 480.39 | 408.69 | 404.39 | 574.23
C.azoricum | | 0 | 51.78 | 78.34 | 134.39 | 152.52 | 94.96 | 68.00 | 235.78
C.horridum | | | 0 | 71.39 | 139.55 | 158.20 | 106.83 | 126.72 | 220.08
C.longipes | | | | 0 | 103.72 | 269.06 | 191.14 | 184.00 | 363.57
C.tripos | | | | | 0 | 519.92 | 399.82 | 382.45 | 651.29
D.acuminata | | | | | | 0 | 8.99 | 29.87 | 9.82
D.acuta | | | | | | | 0 | 17.04 | 34.65
D.rotundata | | | | | | | | 0 | 72.42
D.sacculus | | | | | | | | | 0

Table IV. Classification summary from linear discriminant of manual morphometric data (rows: from taxa; columns: into taxa)

From taxa | C.arcticum | C.azoricum | C.horridum | C.longipes | C.tripos | D.acuminata | D.acuta | D.rotundata | D.sacculus | Total | Error rate
C.arcticum | 37 | | | 1 | | | | | | 38 | 0.0263
C.azoricum | | 26 | | | | | | 3 | | 29 | 0.1034
C.horridum | | | 22 | 1 | | | 1 | | | 24 | 0.0833
C.longipes | | | | 23 | | | | | | 23 | 0.0000
C.tripos | | | | | 37 | | | | | 37 | 0.0000
D.acuminata | | | | | | 72 | 2 | | 8 | 82 | 0.1220
D.acuta | | | | | | 2 | 87 | 1 | | 90 | 0.0333
D.rotundata | | | | | | 1 | | 52 | | 53 | 0.0189
D.sacculus | | | | | | | | | 90 | 90 | 0.0000
Total | 37 | 26 | 22 | 25 | 37 | 75 | 90 | 56 | 98 | 466 | 0.0430

Table V. Mean posterior probabilities for class membership from quadratic discriminant of manual measurements

Rows (from taxa) and columns (into taxa): C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus; Total.
Cell values in source order (row and column alignment not recoverable from the scan): 0.9947, 0.9281, 0.9834, 0.9982, 1.0000, 0.9930, 0.9834, 0.9999, 0.6852, 0.9058, 0.9453, 0.9971, 0.9428, 0.9726, 1.0000, 1.0000, 0.9966, 0.8434, 0.9948, 0.9997, 0.9997, 1.0000, 1.0000, 0.9916, 0.9916.

Repeating the quadratic discriminant experiment with an acceptance threshold set within 95% confidence intervals (Table VI) gave a 3.22% rejection overall, resulting in a 60% reduction in misclassifications and just a 2% reduction in the number of correct classifications. Only four specimens out of the 466 in the Test dataset were wrongly classified, giving <1% misclassification.
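Thresholded classification of this kind can be sketched as follows: a specimen is accepted only if its highest posterior probability reaches the threshold; otherwise it is rejected to "no class". The >= comparison, the function names and the example posteriors are assumptions for illustration; they are not values from Table VI.

```python
def classify_with_threshold(posterior, threshold=0.95):
    """Most probable class, or None (rejected) if its posterior is below the threshold."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None


def tally(results):
    """results: iterable of (true_class, assigned_class_or_None)."""
    counts = {"correct": 0, "wrong": 0, "rejected": 0}
    for true_class, assigned in results:
        if assigned is None:
            counts["rejected"] += 1
        elif assigned == true_class:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts


# Hypothetical posteriors for three test specimens.
results = [
    ("C.tripos", classify_with_threshold({"C.tripos": 0.999, "C.longipes": 0.001})),
    ("C.azoricum", classify_with_threshold({"C.azoricum": 0.60, "C.horridum": 0.40})),
    ("D.acuta", classify_with_threshold({"D.acuminata": 0.97, "D.acuta": 0.03})),
]
print(tally(results))  # {'correct': 1, 'wrong': 1, 'rejected': 1}
print(round(100 * 4 / 466, 2))  # 0.86
```

The last line checks the quoted misclassification rate: four errors among 466 test specimens is about 0.86%. As the third specimen shows, a confident but wrong posterior passes any single threshold, which is why the anomalies with posteriors >0.99 are hard to reject.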

Nine control classes that were not present in the Model calibration dataset were included in the Test dataset for the thresholded quadratic discriminant (Table VI) to test the response to forms unknown to the model. The morphological variants C.horridum var. buceros and C.tripos var. pulchellum were all correctly identified as C.horridum and C.tripos, respectively. The other two control classes of Ceratium species (C.declinatum and C.symmetricum), with their own distinct morphologies, achieved only a 10% rejection overall, the remainder being misclassified as other Ceratium species. Dinophysis pulchellum, which has the same shape as D.rotundata but is smaller overall, was classified as D.rotundata. Dinophysis dens, which may be a gamete of D.acuta (Moita and Sampayo, 1993) but is very similar in form to D.acuminata, was classified as D.acuminata. Dinophysis norvegica was classified as the very similar D.acuta.

The major anomaly in the classification of the controls was the misclassification of most of the D.caudata and D.tripos specimens as C.horridum. These two Dinophysis species are very like one another and are closest in form to an over-large D.acuta (as which one specimen was classified). The measured parameters for these specimens in no way resembled those of C.horridum, in either magnitude or ratios. The only anomaly observable in the measured parameters for these two control classes was that each measurement was very close to the mean value of the corresponding parameter for the total population (Table II). This would have placed these specimens in a tight cluster at the centre of the total inter-class distribution.

To investigate this anomaly further, the experiment was repeated using pooled covariance matrices, so that the distances would be analogous to Euclidean distances. This resulted in all the D.caudata and D.tripos being classified as D.acuta, the closest class on Euclidean distance. It might be assumed, then, that this anomaly was due to the very high covariances for C.horridum (Table II), coupled with the fact that, on the most conservative feature (girdle width), C.horridum is the closest of the high-covariance classes to the D.caudata mean of 26.44 µm, the minimum C.horridum measurement being 34.30 µm. This anomaly might also indicate that the distribution of C.horridum departs from a multivariate normal distribution.


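Table VI. Manual morphometric test data classified within 95% confidence intervals

Rows (from taxa): C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus; controls C.horridum var. buceros, C.tripos var. pulchellum, C.declinatum, C.symmetricum, D.caudata, D.dens, D.norvegica, D.pulchellum, D.tripos.
Columns (into taxa): C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus, No class; Total; Error rate.

Total specimens per from-class: C.arcticum 38, C.azoricum 29, C.horridum 24, C.longipes 23, C.tripos 37, D.acuminata 82, D.acuta 90, D.rotundata 53, D.sacculus 90; C.horridum var. buceros 18, C.tripos var. pulchellum 1, C.declinatum 2, C.symmetricum 18, D.caudata 9, D.dens 6, D.norvegica 1, D.pulchellum 2, D.tripos 1. Overall: 524 specimens, error rate 0.05.

Remaining cell values in source order, grouped as printed under the column headings (row and column alignment not recoverable from the scan): C.arcticum to C.horridum: 36, 24, 1, 23, 18, 2, 38, 1, 36, 27, 53, 0.05, 0.17, 0.04; C.longipes: 1, 22, 2, 9, 34, 0.04; C.tripos: 34, 1, 4, 39, 0.08; D.acuminata and D.acuta: 81, 6, 87, 0.01, 90, 1, 1, 92, 0.00; No class, D.rotundata and D.sacculus: 2, 3, 1, 1, 1, 1, 53, 84, 6, 2, 2, 55, 84, 17, 0.00, 0.07.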


The three morphotypes of D.acuminata found in the Rías Bajas (Bravo et al., 1995, Figures 5-7), when classified with the quadratic discriminant, gave the results shown in Table VII. The total error rate for classification of the test data was 0.01, using only three parameters to describe the shape of the theca. This demonstrates that a very low-resolution function of the shape can clearly separate very similar morphotypes. However, it might also point to multimodality within species classes, caused by the presence of distinct morphotypes, as one cause of the 5% error still remaining in Table VI.

The calibration data from the Model dataset of automatically extracted outlines (Figure 4) are shown in Table VIII. There is significant overlap between classes within the SDs of the means for each of the five parameters, showing separation of classes to be a multivariate problem. The SDs are mostly of the order of 10% of the means. Some exceptions, such as the perimeter measurements for C.horridum, may be due to particles of detritus adhering to the specimens, roughening the outlines and so increasing the variation between individuals; other exceptions reflect naturally high variation in particular features, such as the maximum length of C.arcticum, which corresponds to the highly variable distance between the points of the antapical horns. The data for D.acuminata and D.rotundata overlie closely, within the SDs of their means, for each of the five parameters, showing that this feature vector contains insufficient information to separate these two classes as represented. There is no overlap between the three genera.

The classification summary from a quadratic discriminant of 14 species in three genera (Table IX) gave an accuracy of 83%, with the major contributions to the error coming from the overlap between D.acuminata and D.rotundata, along with the overlap between C.longipes and C.tripos, which had Mahalanobis distances very similar in value to the natural logs of the determinants of their respective covariance matrices.

The average posterior probabilities for class membership for the outline data (Table X) show that much of the misclassification due to peripheral overlap might be rejected by setting thresholds for acceptance, although the low mean posterior probabilities for correct classifications of C.longipes and C.tripos suggest that individual class thresholds might be more appropriate than a single pooled value if rejection of correct classifications is to be minimized.
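Threshold-based rejection can be sketched from the generalized squared distances D²_k of a specimen to each class. Under the usual normal-theory relation with equal priors, the posterior probability is p(k|x) = exp(-D²_k/2) / Σ_j exp(-D²_j/2). The per-class thresholds, distances and function names below are hypothetical, chosen only to show how a lower threshold for a class such as C.longipes keeps a correct but low-posterior assignment that a single pooled threshold of 0.9 would reject.

```python
import math

def posteriors(gen_sq_dist):
    """Posterior probabilities from generalized squared distances (equal priors).

    p(k | x) = exp(-D_k^2 / 2) / sum_j exp(-D_j^2 / 2). The smallest distance is
    subtracted first so that the exponentials cannot all underflow to zero.
    """
    d_min = min(gen_sq_dist.values())
    weights = {k: math.exp(-0.5 * (d - d_min)) for k, d in gen_sq_dist.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def classify_with_rejection(gen_sq_dist, thresholds, default_threshold=0.9):
    """Return (class, posterior), or (None, posterior) if the winning class's
    posterior falls below that class's own acceptance threshold."""
    post = posteriors(gen_sq_dist)
    best = max(post, key=post.get)
    if post[best] < thresholds.get(best, default_threshold):
        return None, post[best]
    return best, post[best]

# Hypothetical thresholds: lower for classes whose correct classifications
# tend to have low posteriors, and a single default for all other classes.
thresholds = {"C.longipes": 0.6, "C.tripos": 0.6}
distances = {"C.longipes": 3.0, "C.tripos": 4.0}  # hypothetical D^2 values

print(classify_with_rejection(distances, thresholds))  # accepted as C.longipes
print(classify_with_rejection(distances, {}))          # rejected under a pooled 0.9
```

Here the winning posterior is about 0.62, so the per-class threshold accepts the specimen while the pooled threshold rejects it.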

Table VII. Classification score for D.acuminata morphotypes test data

                 Into class
From class       Class F1   Class F2   Class F3   Total   Error rate
Class F1            17          0          0        17      0.000
Class F2             0         14          0        14      0.000
Class F3             1          0         29        30      0.033
Total               18         14         29        61      0.011
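The error rates in Table VII can be checked with a short sketch. It assumes that each class's rate is the proportion of that class's test specimens assigned elsewhere, and that the overall rate is the equal-prior mean of the class rates; this assumption reproduces the reported 0.011, whereas the raw proportion 1/61 would give 0.016. The function name is hypothetical; the counts are transcribed from Table VII.

```python
def error_rates(confusion):
    """Per-class and equal-prior overall error rates.

    confusion[f][i] is the number of test specimens from class f classified
    into class i; classes missing from a row count as zero.
    """
    rates = {}
    for true_class, row in confusion.items():
        total = sum(row.values())
        correct = row.get(true_class, 0)
        rates[true_class] = (total - correct) / total
    overall = sum(rates.values()) / len(rates)  # equal prior for every class
    return rates, overall

# Counts from Table VII (outer keys: from class; inner keys: into class).
table_vii = {
    "F1": {"F1": 17},
    "F2": {"F2": 14},
    "F3": {"F1": 1, "F3": 29},
}
rates, overall = error_rates(table_vii)
print({k: round(v, 3) for k, v in rates.items()}, round(overall, 3))
# {'F1': 0.0, 'F2': 0.0, 'F3': 0.033} 0.011
```

For Table IX the two conventions happen to agree (about 0.17 either way), so the distinction matters mainly for small, unbalanced test sets such as this one.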


Table IX. Classification summary for test data from automatically extracted outlines

From taxa         Into class A-N (non-zero cells, in column order)   Total
A: C.arcticum     22 2 1 6                                           31
B: C.azoricum     30                                                 30
C: C.furca        2 27 1 2                                           32
D: C.fusus        7 28 1                                             36
E: C.horridum     19 2 1                                             22
F: C.lineatum     25 1                                               26
G: C.longipes     2 1 14 8                                           25
H: C.pentagonum   2 31 1                                             34
I: C.tripos       2 3 12                                             17
J: D.acuminata    65 33 5                                            103
K: D.acuta        1 2 103 2                                          108
L: D.rotundata    5 30                                               35
M: D.sacculus     1 4 66                                             71
N: O.steinii      1 15                                               16

                A     B     C     D     E     F     G     H     I     J     K     L     M     N   Total
Total          26    32    38    28    23    26    21    34    26    76   103    63    71    19     586
Error rate   0.29  0.00  0.16  0.22  0.14  0.04  0.44  0.09  0.29  0.37  0.05  0.14  0.07  0.06    0.17

Table X. Mean posterior probabilities for class membership from outline data

From taxa         Into class A-N (values for the non-zero cells of Table IX, in the same order)
A: C.arcticum     0.94 0.76 0.93 0.68
B: C.azoricum     0.96
C: C.furca        0.75 0.99 0.98 0.77
D: C.fusus        0.89 0.95 0.70
E: C.horridum     0.91 0.89 1.00
F: C.lineatum     1.00 0.93
G: C.longipes     0.70 0.44 0.84 0.70
H: C.pentagonum   0.97 0.94 0.58
I: C.tripos       0.81 0.95 0.90
J: D.acuminata    0.73 0.61 0.80
K: D.acuta        0.79 0.90 0.97 0.84
L: D.rotundata    0.74 0.72
M: D.sacculus     0.63 0.62 0.96
N: O.steinii      0.61 0.99

                A     B     C     D     E     F     G     H     I     J     K     L     M     N
Total        0.91  0.95  0.96  0.95  0.88  0.98  0.86  0.93  0.79  0.73  0.97  0.67  0.94  0.95


Conclusions

The results from the manual morphometric study demonstrate that a very low-resolution description of the outline shape of these rigid-bodied organisms is sufficient to provide reliable separation within 95% confidence intervals. However, the poor rejection of unmodelled classes with similar morphologies to those modelled (Table VI) suggests that a more detailed description of the outline should be furnished to prevent similar classes, unknown to the model, from being misclassified.

The results of the classification of three morphotypes of D.acuminata (Table VII) show that the species distribution may approximate a multimodal multivariate normal distribution, with one mode per morphotype. Many dinoflagellate species have several morphotypes, which might cause a similar multimodality.

Many of the dinoflagellate species have recognized morphological variants (Sournia, 1967), although in some cases, such as C.tripos var. pulchellum (Lopez, 1955), a constant progression in form can be found between the typical forms and the variants. Lopez found a constant progression in all of the measured features between C.tripos var. pulchellum (Schrod.) and C.tripos var. atlanticum Ostenf., but also found that these two variants formed distinct bimodal classes on frequency distribution histograms, with the intermediate forms being very rare.

If such morphological variants form distinct modal classes, then the performance of this method of discrimination might be improved by defining separate classes for them in the calibration.

In the extraction of outlines from images, a number of attempts were made to obtain a homogeneous filled region corresponding to the organism. Attempts using simple high-pass and high-boost filters to differentiate the image resulted in very indistinct margins to the region of interest, as did the use of a Boolean OR to combine the two Sobel gradients. The use of a Boolean AND appeared to give a cleaner outline by rejecting pixels which were not set in both the horizontal and vertical components.
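The Boolean combination of the two Sobel components can be sketched as below, using the standard 3 x 3 Sobel kernels on a grey-level image held as a list of rows. The threshold, the two test images and the function names are hypothetical, and the sketch omits any smoothing or region filling the authors may have applied. It shows the mechanism only: with AND a pixel survives only if both components exceed the threshold, so on a straight vertical edge, where only the horizontal component responds, AND clears pixels that OR keeps.

```python
import operator

def sobel_components(img):
    """Absolute horizontal and vertical Sobel responses for interior pixels.

    img is a list of equal-length rows of grey levels; border pixels stay 0.
    """
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        top, mid, bot = img[r - 1], img[r], img[r + 1]
        for c in range(1, w - 1):
            # Kernel [-1 0 1; -2 0 2; -1 0 1]: change from left to right.
            gx[r][c] = abs((top[c + 1] + 2 * mid[c + 1] + bot[c + 1])
                           - (top[c - 1] + 2 * mid[c - 1] + bot[c - 1]))
            # Kernel [-1 -2 -1; 0 0 0; 1 2 1]: change from top to bottom.
            gy[r][c] = abs((bot[c - 1] + 2 * bot[c] + bot[c + 1])
                           - (top[c - 1] + 2 * top[c] + top[c + 1]))
    return gx, gy

def edge_mask(img, threshold, combine="and"):
    """Threshold each Sobel component, then combine the two binary maps
    with Boolean AND (combine="and") or Boolean OR (any other value)."""
    gx, gy = sobel_components(img)
    op = operator.and_ if combine == "and" else operator.or_
    return [[int(op(gx[r][c] >= threshold, gy[r][c] >= threshold))
             for c in range(len(img[0]))]
            for r in range(len(img))]

# Hypothetical 7 x 7 test images: a bright square with straight edges,
# and a diagonal step edge.
square = [[10 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)] for r in range(7)]
diagonal = [[10 if r + c >= 6 else 0 for c in range(7)] for r in range(7)]
print(edge_mask(square, 20, "or")[3][1], edge_mask(square, 20, "and")[3][1])  # 1 0
print(edge_mask(diagonal, 20, "and")[3][3])  # 1
```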

The major problems encountered with this outlining process were, firstly, the obscuration of portions of the specimens by overlying debris and by the narrow effective field of view of the digitizing equipment used, and secondly, the very low contrast and variable focusing of the light-field illuminated photomicrographs, which caused a significant number of the specimens to require adjustment of the binarizing threshold to produce a recognizable outline.

The overall contrast could easily be improved by the use of darkfield illumination to enhance the specimen perimeter. Alternatively, the dynamic range of the images could be analysed to enable a suitable contrast-enhancing filter to be applied prior to the gradient filtering. The flat field presented by the background in micrographs, due to the very shallow depth of field of the objective lens, makes this form of feature extraction much more feasible than in general-purpose image analysis.
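One simple contrast-enhancing filter of the kind suggested is a linear stretch between two percentiles of the grey-level distribution, applied before gradient filtering. The sketch below is an assumption rather than the authors' method; the function name, the percentile defaults (2% and 98%) and the 0-255 output range are hypothetical choices, and the percentile uses a simple nearest-rank index.

```python
def contrast_stretch(img, low_pct=2.0, high_pct=98.0, out_max=255):
    """Linear contrast stretch between two grey-level percentiles.

    Levels at or below the low percentile map to 0, levels at or above the
    high percentile map to out_max, and levels in between are scaled linearly.
    """
    values = sorted(v for row in img for v in row)
    n = len(values)
    lo = values[min(n - 1, int(n * low_pct / 100.0))]
    hi = values[min(n - 1, int(n * high_pct / 100.0))]
    if hi <= lo:  # flat image: no dynamic range to stretch
        return [row[:] for row in img]
    scale = out_max / (hi - lo)
    return [[max(0, min(out_max, round((v - lo) * scale))) for v in row]
            for row in img]

# Hypothetical low-contrast patch occupying grey levels 100-110 only.
patch = [[100, 102, 104], [106, 108, 110]]
print(contrast_stretch(patch, low_pct=0, high_pct=100))
# [[0, 51, 102], [153, 204, 255]]
```

Clipping at percentiles rather than at the absolute minimum and maximum keeps a few very dark debris pixels or bright specks from dominating the stretch.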

The images from which the outlines were extracted were at five different magnifications. This will have caused a degree of error in the measurements due to fractal effects (Kaandorp, 1994), since the measured length of an irregular outline depends on the resolution at which it is sampled. These errors might be minimized by using separate models for each magnification.


The perimeter of the organisms was measured as a fractal dimension, and by varying the sampling width for this measurement a better description of the overall roughness of shape might be obtained.
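Varying the sampling width is the idea behind the Richardson (divider) plot: the outline length L is measured at several step lengths s, and the fractal dimension is estimated as D = 1 - b, where b is the slope of log L against log s. The sketch below approximates the divider walk by taking every k-th point of a densely sampled closed outline, which is an assumption and not necessarily the procedure used in this study; the function names, strides and test outlines are hypothetical.

```python
import math

def perimeter_at_stride(points, k):
    """Length of the closed outline through every k-th point, and its mean step."""
    pts = points[::k]
    steps = [math.dist(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts))]
    length = sum(steps)
    return length, length / len(steps)

def divider_dimension(points, strides=(1, 2, 4, 8, 16)):
    """Estimate D = 1 - slope of log(length) on log(step) by least squares."""
    xs, ys = [], []
    for k in strides:
        length, step = perimeter_at_stride(points, k)
        xs.append(math.log(step))
        ys.append(math.log(length))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 - slope

# Hypothetical outlines: a smooth unit circle and the same circle with a
# small alternating radial zig-zag (+/-0.2%) between neighbouring points.
n = 1024
circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
          for i in range(n)]
rough = [((1.002 if i % 2 else 0.998) * x, (1.002 if i % 2 else 0.998) * y)
         for i, (x, y) in enumerate(circle)]
print(round(divider_dimension(circle), 3), round(divider_dimension(rough), 3))
```

The smooth circle gives D close to 1, while the zig-zagged outline gives a larger D because its measured length grows as the step shrinks.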

The classification summary in Table IX shows a fair amount of peripheral overlap between class distributions, which suggests that the description of the outline should provide more detail on the finer points of its curvature. This might be achieved either by attempting to differentiate the curvature of the outline, which has previously proved impractical due to the diversity of orders of curvature (Yarranton, 1967), or by attempting an approximation using fractal dimensioning, or else by providing a more detailed description of the shape of the histogram of distances between loci.
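As an illustration of the last option, the sketch below assumes that the loci are points sampled along the outline and that the histogram covers all pairwise distances, scaled by the largest distance; the function names, bin counts and test shape are hypothetical. Because pairwise distances are unchanged by rotation and translation, the histogram is orientation-independent, and finer binning or a few summary moments are two possible ways of describing its shape in more detail.

```python
import math
from itertools import combinations

def distance_histogram(points, n_bins, max_dist=None):
    """Counts of all pairwise distances between outline points in n_bins equal bins.

    Distances between points do not change under rotation or translation, so
    the histogram does not depend on the orientation of the specimen.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    top = max_dist if max_dist is not None else max(dists)
    counts = [0] * n_bins
    for d in dists:
        counts[min(n_bins - 1, int(d / top * n_bins))] += 1
    return counts

def histogram_moments(counts):
    """Mean, standard deviation and skewness of bin position, weighted by counts."""
    total = sum(counts)
    mean = sum(i * c for i, c in enumerate(counts)) / total
    var = sum(c * (i - mean) ** 2 for i, c in enumerate(counts)) / total
    sd = math.sqrt(var)
    skew = (sum(c * (i - mean) ** 3 for i, c in enumerate(counts)) / total / sd ** 3
            if sd > 0 else 0.0)
    return mean, sd, skew

# Hypothetical outline: a 2 x 1 rectangle and the same shape rotated by 90 degrees.
rect = [(0, 0), (2, 0), (2, 1), (0, 1)]
rotated = [(-y, x) for x, y in rect]
print(distance_histogram(rect, 4), distance_histogram(rotated, 4))
# [0, 2, 0, 4] [0, 2, 0, 4]
```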

Acknowledgements

We thank the Sir Alistair Hardy Foundation for Ocean Science and the Forth River Purification Board for making plankton samples available to us, and the University of Plymouth for the use of digitizing equipment. We received valuable advice from K.R.Clarke and M.Carr on multivariate statistics, from G.Moore on image processing and from D.Harbour on dinoflagellate taxonomy. This work was partly funded by a CEC MAST 2 contribution to a project coordinated by Mr P.Culverhouse of the University of Plymouth Centre for Intelligent Systems, and also by the NERC Technology Initiative. Mr R.Williams was responsible for Plymouth Marine Laboratory involvement in the MAST 2 project and contributed to the photography of the specimens.

References

Bravo,I., Reguera,B. and Fraga,S. (1995) Description of different morphotypes of Dinophysis acuminata complex in the Galician Rias Baixas in 1991. In Lassus,P., Arzul,G., Erard-Le Denn,D., Gentian,P. and Marcaillou-Le Baut,C. (eds), Harmful Marine Algal Blooms. Lavoisier, Paris, pp. 21-26.

Dodge,J.D. (1982) Marine Dinoflagellates of the British Isles. HMSO, London, 303 pp.

Gonzalez,R.C. and Woods,R.E. (1992) Digital Image Processing. Addison-Wesley, Reading, MA, 716 pp.

Jorgensen,E. (1911) Die Ceratien. Eine kurze Monographie der Gattung Ceratium Schrank. Int. Rev. Ges. Hydrobiol. Hydrogr., 4 (Biol. Suppl., 2 Ser.), 1-124.

Kaandorp,J.A. (1994) Fractal Modelling: Growth and Form in Biology. Springer-Verlag, Berlin, 208 pp.

Kofoid,C.A. and Campbell,A.S. (1929) A conspectus of the marine and fresh water ciliata belonging to the suborder Tintinnoinea, with descriptions of new species principally from the Agassiz expedition to the Eastern Tropical Pacific 1904-1905. Univ. Calif. Publ. Zool., 34, 1-403.

Larsen,J. and Moestrup,Ø. (1992) Potentially toxic phytoplankton. ICES Ident. Leafl. Plankton, No. 180, 12 pp.

Lembeye,G., Yasumoto,T., Zhao,J. and Fernandez,R. (1993) DSP outbreaks in Chilean fiords. In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 525-529.

Lopez,J. (1955) Variación alométrica en Ceratium tripos. Invest. Pesq., 2, 131-159.

McCall,H. and Lindley,J.A. (1994) Automated visual identification of marine plankton using multivariate analysis. In Culverhouse,P. (ed.), 24 Month Progress Report Year End September 1994. Contract MAS2-CT92-0015. CEC, Brussels, 28 pp.

Moita,M.T. and de M.Sampayo,M.A. (1993) Are there cysts in the genus Dinophysis? In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 153-157.

Phillips,B.F., Campbell,N.A. and Wilson,B.R. (1973) A multivariate study of geographic variation in the whelk Dicathais. J. Exp. Mar. Biol. Ecol., 11, 27-69.

Reguera,B., Bravo,I., Marcaillou-Le Baut,C. and Masselin,P. (1993) Monitoring of Dinophysis spp. and vertical distribution of okadaic acid on mussel rafts in Ría de Pontevedra (NW Spain). In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 553-558.

SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 1. SAS Institute Inc., Cary, NC, 943 pp.

Schiller,J. (1933) Dinoflagellatae. In Rabenhorst,L. (ed.), Kryptogamen-Flora 10. 1. Teil. Akademische Verlagsgesellschaft, Leipzig, 617 pp.

Sournia,A. (1967) Le genre Ceratium (Péridinien planctonique) dans le Canal de Mozambique. Contribution à une révision mondiale. Vie Milieu Sér. A Biol. Mar., 18 (Issue 2-3A), 375-500.

Yarranton,G.A. (1967) Parameters for use in distinguishing populations of Euceratium Gran. Bull. Mar. Ecol., 6, 147-158.

Received on March 10, 1995; accepted on November 6, 1995
