Journal of Plankton Research Vol.18 no.3 pp.393-410, 1996
Phytoplankton recognition using parametric discriminants
Helen McCall, Isabel Bravo1, J.Alistair Lindley and Beatriz Reguera1
Plymouth Marine Laboratory, Prospect Place, Plymouth PL1 3DH, UK and 1Instituto Español de Oceanografía, Cabo Estay-Canido, 36280 Vigo, Spain
Abstract A comparison was made between the use of linear and quadratic discriminant functions for classifying phytoplankton specimens of the genera Dinophysis and Ceratium by means of a general morphometric function. The class distributions were found to fit quadratic boundaries better than linear boundaries. A nine-species quadratic discriminant classified within 95% confidence intervals. Morphological variants not used in the calibration were all correctly identified, although control species unknown to the model were poorly rejected. An accuracy of 99% was obtained for separating three morphological variants of Dinophysis acuminata. Digital filters were developed to extract the morphometric function directly from photomicrograph images, and present the data as an orientation-independent feature vector. Using this feature vector, a quadratic discriminant classified test data from 14 species of the genera Dinophysis, Ceratium and Ornithocercus with an accuracy of 83%, with 37% of the error due to two similarly shaped species of Dinophysis overlapping.
Introduction
The purpose of this investigation was to examine the problem of automated visual identification of marine plankton, from a conventional statistical and mathematical viewpoint, for the development of feature extraction and pattern recognition techniques for use in marine environmental research. The investigation was designed to explore the multivariate relationships between a range of taxonomic classes of marine dinoflagellates based on their respective morphologies, and to explore the use of conventional image-analysis techniques to extract the pertinent morphological features for an automated classifier.
There is a growing requirement worldwide for efficient and inexpensive systems for monitoring phytoplankton (Lembeye et al., 1993; Reguera et al., 1993) to facilitate effective control of fishery harvesting.
Marine phytoplankton and protozooplankton have traditionally been identified by visual descriptions of their morphology, as illustrated by simple line drawings of their respective shapes in published keys (Sournia, 1967; Dodge, 1982). When using conventional published keys, a major component of the information derived from the line drawings is the overall outline shape of the organism. By ignoring the descriptions of fine detail of taxa-specific features, and taking the overall outline shape alone, the concept was developed for this project of a general morphometric function, equally applicable to all rigid-bodied plankton organisms, allowing direct and quantitative comparison.
Method
Plankton samples were obtained from the archives of Plymouth Marine Laboratory, Centro Oceanographico de Vigo, the Sir Alistair Hardy Foundation for Ocean Science and the Forth River Purification Board, covering a range of locations in the North East Atlantic, including the Azores, Galician rias, North Sea and the shelf seas to the west of Britain (see Table I). Specimens were photographed using 35 mm cameras attached to inverting microscopes. A range of magnifications was used: 10 x 10, 10 x 20, 10 x 40 and 10 x 63. Each set of optics at each magnification was calibrated by photographing a micrometre scale.
© Oxford University Press
A database containing 3805 photomicrographs of marine phytoplankton and zooplankton was indexed, and the specimens classified and cross-checked against published keys (Jorgenson, 1911; Kofoid and Campbell, 1929; Schiller, 1933; Sournia, 1967; Dodge, 1982; Larsen and Moestrup, 1992).
The use of a description of the overall outline shape as a general morphometric function for classification was first established using a set of manual measurements to describe the function.
Manual measurements were taken of 1025 specimens of the genera Dinophysis and Ceratium, comprising nine main species of interest (Figure 1): Ceratium tripos, Ceratium horridum, Ceratium longipes, Ceratium arcticum, Ceratium azoricum, Dinophysis acuta, Dinophysis acuminata, Dinophysis rotundata and Dinophysis sacculus; plus nine controls comprising two morphological variants, Ceratium tripos var. pulchellum and Ceratium horridum var. buceros, and seven other species: Ceratium declinatum, Ceratium symmetricum, Dinophysis dens, Dinophysis pulchellum, Dinophysis norvegica, Dinophysis caudata and Dinophysis tripos, similar in form to the nine main species.
The measurements were taken from photomicrograph prints using vernier calipers. The four parameters measured (total length, maximum width, girdle width and height of the maximum width) were chosen to give a description of the overall shape of the theca (see Figure 2). The parameters measured constituted distances between points around the outline shape, and corresponded to lengths and widths which might reasonably be assumed to approximate to a normal distribution in a population to allow for the use of a parametric discriminant. The external list structures found on Dinophysis spp. were omitted from the overall function of the shape for the manual measurements to remove problems of variance due to physical damage, and partial absence in newly divided cells. To provide a fixed reference for orientation, the girdle was taken to be as close to horizontal as possible, with the width measurements taken to be in parallel to the girdle, and the length and height measurements at right angles to the girdle. For convenience in making these measurements, the conventional orientation of Ceratium spp. figures was inverted, making the apical horn point towards the base. This inversion was to allow species such as Ceratium furca and Ceratium lineatum to be measured likewise without the height of maximum width giving a negative value. For similar reasons, where the antapical horns extended beyond the extent of the apical horn, the base was taken to be the end of the antapical horns. The base was, in both cases, the lowest point when the girdle was horizontal and the apical horn pointed downwards.

Table I. Equipment, location and date of samples used from Plymouth Marine Laboratory archive

Equipment                          Location                 Date
Continuous plankton recorder       48°59'N 07°22'W          20/08/89
Continuous plankton recorder       42°31'N 69°07'W          12/05/91
Continuous plankton recorder       48°48'N 50°58'W          18/11/91
Longhurst-Hardy plankton recorder  31°00'N 29°44'W          12/05/81
Longhurst-Hardy plankton recorder  59°09'N 18°45'W          30/08/81
Longhurst-Hardy plankton recorder  51°00'N 13°00'W region   06/05/86
Longhurst-Hardy plankton recorder  54°00'N 06°09'E          11/06/89
Water bottles                      Azores area              13/03/92
Forth River Purification Board samples
Plankton net                       56°00'N 03°00'W region   1992

Fig. 1. Calibration species from the genera Ceratium, Ornithocercus and Dinophysis: (a) C.horridum, (b) C.tripos, (c) C.lineatum, (d) C.pentagonum, (e) C.furca, (f) C.arcticum, (g) C.longipes, (h) C.azoricum, (i) C.fusus, (j) O.steinii, (k) D.acuta, (l) D.rotundata, (m) D.acuminata, (n) D.sacculus.
Fig. 2. Manual morphometric measurements: (a) Ceratium spp.; (b) Dinophysis spp.
A selection of micrometre scale photomicrographs taken at the appropriate magnifications (10 x 10, 10 x 20, 10 x 40, 10 x 63) on the same microscopes were also measured, and all measurements scaled accordingly to micrometres.
A further set of measurements was taken from 124 individuals comprising three morphological variants of D.acuminata found in samples from the Galician rias (Bravo et al., 1993). Three parameters were measured (see Figure 3): maximum length from the girdle plane to the thecal base, width of the theca from the most convex part of the dorsal edge to the ventral end and parallel to the girdle, and the convexity of the dorsal edge measured as the angle α formed between the line MC and the girdle plane, where M is a point on the dorsal edge 6 µm below the girdle plane and C is the centre of the closest circumference to the dorsal edge.
General morphometric functions were extracted as simple outline shapes (Figure 4) from the 256 x 256 pixel, 256 greyscale digital images (Figure 5) of 14 species from the three genera Ceratium, Dinophysis and Ornithocercus, using adaptations of conventional digital filters (Gonzalez and Woods, 1992). The analytical algorithms were developed in the C language on a UNIX workstation.
The initial constraints set for extraction of the outlines were that the organism should be located in the centre of the image, and that the organism should be in focus without portions being obscured by other objects or cut off by the edge of the image.
Fig. 3. Angular measurements (Dinophysis acuminata).
Fig. 4. Extracted outlines (scaled in pixels/100 µm): V118F12 Dinophysis sacculus, 151 pixels/100 µm; P18F23 Ceratium horridum, 52 pixels/100 µm; P4F16 Ceratium tripos, 52 pixels/100 µm; P4F17 Dinophysis acuta, 110 pixels/100 µm.
Initial attempts were made to extract outlines using the Sobel edge detect filter (Gonzalez and Woods, 1992). This filter differentiates the image, resulting in a horizontal and a vertical gradient component, along with a directional component. The horizontal and vertical components were combined as a root sum of squares of the corresponding elements, to provide a simplified Sobel gradient, and the gradient thresholded to produce a binarized image of the edges present in the original image. This produced easily recognizable outlines of the organisms, but the outlines were rather incomplete, making it impractical to attempt separating the outline of the organism from the background detritus edges, or the edges formed by the internal structures of the organism, because any attempt to use a filling algorithm simply bled through the breaks in the outline edges.
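The simplified Sobel gradient described above can be sketched as follows. This is a minimal illustration in Python rather than the C code used in the study; the function names `apply3x3` and `sobel_edges`, the treatment of border pixels and the threshold value are assumptions made for the example.

```python
import math

# 3 x 3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply3x3(image, kernel):
    """Apply a 3 x 3 kernel to interior pixels; border pixels are left at 0 (assumption)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

def sobel_edges(image, threshold):
    """Root sum of squares of the two gradient components, then binarize."""
    gx = apply3x3(image, SOBEL_X)
    gy = apply3x3(image, SOBEL_Y)
    return [[1 if math.hypot(gx[y][x], gy[y][x]) > threshold else 0
             for x in range(len(image[0]))] for y in range(len(image))]

# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_edges(step, threshold=200)
```

On the step image the two columns either side of the intensity step are marked as edge pixels, while the flat regions are not.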
Fig. 5. Feature extraction from image of C.longipes (specimen P70F08): (a) original image; (b) Sobel-derived edge gradients; (c) region filled by dilation; (d) thinned by erosion; (e) extracted outline.
To overcome the difficulties in separating a gradient-based outline from surrounding detritus (Figure 5) and other organisms, the algorithm was adapted to separate out a region of high density of edge gradients. The Sobel horizontal and vertical operators were applied, and the two resulting gradients were individually thresholded, binarized and then combined with a Boolean AND of their corresponding elements (Figure 5b). A 3 x 3 dilating filter (Gonzalez and Woods, 1992) was then applied to homogenize the region of high density of gradients (Figure 5c), followed by an eroding filter to remove the external thickening of the region caused by the dilation. The result was an image comprising filled regions corresponding to the various objects in the field (Figure 5d).
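The combination of the two binarized gradients, followed by dilation and erosion, might be sketched as below. The helper names (`binary_and`, `dilate3`, `erode3`) and the handling of the image border (neighbourhoods are simply truncated) are assumptions for illustration; the masks are hypothetical.

```python
def binary_and(a, b):
    """Boolean AND of corresponding elements of two binary masks."""
    return [[p & q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def _neighbourhood(mask, y, x):
    """Values in the 3 x 3 window around (y, x), truncated at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]

def dilate3(mask):
    """Set a pixel if any pixel in its 3 x 3 window is set."""
    return [[1 if any(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def erode3(mask):
    """Keep a pixel only if every pixel in its 3 x 3 window is set."""
    return [[1 if all(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def empty(n):
    return [[0] * n for _ in range(n)]

# Two hypothetical thresholded gradient masks that agree on two nearby pixels.
gx_mask, gy_mask = empty(7), empty(7)
for y, x in [(3, 2), (3, 4), (1, 1)]:
    gx_mask[y][x] = 1
for y, x in [(3, 2), (3, 4), (5, 5)]:
    gy_mask[y][x] = 1

combined = binary_and(gx_mask, gy_mask)   # only (3, 2) and (3, 4) survive
filled = erode3(dilate3(combined))        # the gap at (3, 3) is closed
```

Pixels set in only one of the two masks are rejected by the AND, and the dilation followed by erosion bridges the one-pixel gap without leaving the region thickened.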
The organism was then separated from the surrounding objects by selecting the region of filling closest to the centre of the image and following a coherent track of filling until the edge of the object was found. To check that this was the true edge of the object and not that of an unfilled void within the organism's outline, a coherent track was then followed to the perimeter of the image. If the perimeter of the image was not found, then the position on the edge of the object was assumed not to be the true edge and a new track across filled space started from that point.
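Selection of the filled region closest to the image centre can be illustrated with a breadth-first flood fill. This is a swapped-in technique, not the coherent-track search described above, and it omits the check for unfilled voids; the function name `central_region` and the use of 4-connectivity are assumptions.

```python
from collections import deque

def central_region(mask):
    """Return the set of (row, col) pixels connected to the filled pixel nearest the centre."""
    h, w = len(mask), len(mask[0])
    cy, cx = h // 2, w // 2
    filled = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not filled:
        return set()
    # Seed: the filled pixel with the smallest squared distance to the centre.
    seed = min(filled, key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2)
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in region:
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# A central two-pixel object plus one pixel of debris in the corner.
mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
organism = central_region(mask)
```

The corner pixel is not connected to the central region and is therefore excluded.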
Having identified a locus on the true edge of the organism, a contour-following algorithm traced around the edge of the organism, recording the loci in a linked list of coordinates. Thus, the outline shape (Figures 5e and 4) or morphometric function was reduced to a one-dimensional set of vectors.
The constraints set on the extraction of morphometric data from the outlines were that all measurements should be orientation and position independent, and should all be in the same units of measurement. Recognition of the orientation of an organism, or of a defined locus on an organism, would logically require an initial identification of the organism. Setting a standard unit of measurement, in this case the micrometre, was to prevent any artifacts due to the differential scale of units from affecting the shape of the class distributions in the multivariate analyses.
The length of the perimeter of the organism was measured by triangulating between adjacent loci. The area of the organism was approximated as a count of pixels within the outline. The square root of the area in micrometres was calculated to provide a parameter in the standard units.
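A minimal sketch of these two parameters follows. It assumes that "triangulating between adjacent loci" amounts to summing the straight-line distances between consecutive outline points of a closed outline, and that the calibration is available as pixels per 100 µm (as in the Figure 4 scales); the function names are hypothetical.

```python
import math

def perimeter_um(loci, pixels_per_100um):
    """Sum of distances between consecutive outline loci (closed outline), in micrometres."""
    um_per_pixel = 100.0 / pixels_per_100um
    total = 0.0
    for (x0, y0), (x1, y1) in zip(loci, loci[1:] + loci[:1]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total * um_per_pixel

def sqrt_area_um(pixel_count, pixels_per_100um):
    """Square root of the pixel-count area, converted to micrometres."""
    um_per_pixel = 100.0 / pixels_per_100um
    return math.sqrt(pixel_count) * um_per_pixel

# A 10 x 10 pixel square at a hypothetical scale of 50 pixels/100 um (2 um per pixel).
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
perimeter = perimeter_um(square, pixels_per_100um=50)   # 40 pixels -> 80 um
root_area = sqrt_area_um(100, pixels_per_100um=50)      # 10 pixels -> 20 um
```

Taking the square root of the area keeps the parameter in the same linear units as the perimeter, which is the stated reason for the transformation.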
A histogram of distances between each pair of loci was computed, sampling at 4 µm intervals. From this the maximum length was derived as the highest interval where data were recorded. A simple description of the shape of the histogram was provided by the mean and SD of the histogram.
The feature vector comprised the five morphometric parameters in micrometres: perimeter, square root of area, maximum length, and mean and SD of the histogram of distances between loci.
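The three histogram-derived parameters can be sketched as below. The use of bin centres for the mean and SD, the population form of the SD, and the upper edge of the highest occupied bin as the maximum length are assumptions; the text does not specify these details, and the function names are hypothetical.

```python
import math
from itertools import combinations

def distance_histogram(loci_um, bin_width=4.0):
    """Count pairwise distances between outline loci in bins of bin_width micrometres."""
    counts = {}
    for (x0, y0), (x1, y1) in combinations(loci_um, 2):
        index = int(math.hypot(x1 - x0, y1 - y0) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return counts

def histogram_features(counts, bin_width=4.0):
    """Return (maximum length, mean, SD) of the binned distances, in micrometres."""
    n = sum(counts.values())
    centres = {i: (i + 0.5) * bin_width for i in counts}
    mean = sum(centres[i] * c for i, c in counts.items()) / n
    variance = sum(c * (centres[i] - mean) ** 2 for i, c in counts.items()) / n
    max_length = (max(counts) + 1) * bin_width  # upper edge of highest occupied bin
    return max_length, mean, math.sqrt(variance)

# Three loci forming a 6-8-10 right triangle (coordinates already in micrometres).
loci = [(0.0, 0.0), (8.0, 0.0), (0.0, 6.0)]
counts = distance_histogram(loci)
max_length, mean, sd = histogram_features(counts)
```

Because only distances between pairs of loci are used, the three values do not change when the outline is rotated or translated, which is the orientation and position independence required of the feature vector.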
The manual and outline data were split into Model calibration and Test datasets using the random function provided in the C language standard library.
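A random split of this kind might look as follows in Python. The 50/50 proportion, the fixed seed and the function name `split_dataset` are assumptions for illustration; the proportion actually used is not stated.

```python
import random

def split_dataset(specimens, calibration_fraction=0.5, seed=0):
    """Shuffle a copy of the specimens and cut it into calibration and test parts."""
    rng = random.Random(seed)          # seeded so the split is repeatable
    shuffled = list(specimens)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]

calibration, test = split_dataset(range(10))
```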
Parametric discriminant functions (Phillips et al., 1973; Statistical Analysis Systems Institute, 1985) were employed to investigate and model the separation between the morphological classes, and to demonstrate the application of such a model to statistical classification of individual specimens. The discriminant functions utilized are analogous to supervised procedures requiring the classes to be pre-defined, as opposed to cluster analyses which are analogous to unsupervised procedures needing no prior knowledge of the classes. These discriminant functions assume a multivariate normal distribution. The quadratic discriminant function is based upon Mahalanobis distances between class means, weighted by the within-class covariance matrices. The linear discriminant assumes covariance matrices to be equal and pools the covariance matrices to produce distances between class means which are analogous to Euclidean distances.
The prior probabilities were assumed equal for the discriminant analyses, because no other assumptions could be made concerning expected class sizes in natural populations.
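The difference between the two discriminants can be illustrated with a two-parameter toy example. The sketch assumes the usual normal-theory form with equal priors: a squared Mahalanobis distance to each class mean, with the natural log of the determinant of the class covariance matrix added in the quadratic case, and posterior probabilities proportional to exp(-0.5 x score). The class names, means, covariances and specimen are hypothetical and are not taken from the study data.

```python
import math

def invert_2x2(cov):
    """Inverse and determinant of a 2 x 2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def mahalanobis_sq(x, mean, cov):
    inverse, _ = invert_2x2(cov)
    d = [x[0] - mean[0], x[1] - mean[1]]
    return sum(d[i] * inverse[i][j] * d[j] for i in range(2) for j in range(2))

def linear_scores(x, means, pooled_cov):
    """One pooled covariance matrix for every class."""
    return {k: mahalanobis_sq(x, m, pooled_cov) for k, m in means.items()}

def quadratic_scores(x, means, covs):
    """Each class keeps its own covariance matrix; ln|cov| penalizes diffuse classes."""
    return {k: mahalanobis_sq(x, means[k], covs[k]) + math.log(invert_2x2(covs[k])[1])
            for k in means}

def posteriors(scores):
    """Posterior probabilities under equal priors."""
    best = min(scores.values())
    weights = {k: math.exp(-0.5 * (s - best)) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical classes: one large and variable, one small and tightly clustered.
means = {"large": (200.0, 60.0), "small": (50.0, 17.0)}
covs = {"large": [[900.0, 0.0], [0.0, 16.0]], "small": [[16.0, 0.0], [0.0, 9.0]]}
pooled = [[(covs["large"][i][j] + covs["small"][i][j]) / 2 for j in range(2)]
          for i in range(2)]

specimen = (120.0, 35.0)   # far outside the spread of the small class
linear = linear_scores(specimen, means, pooled)
quadratic = quadratic_scores(specimen, means, covs)
linear_class = min(linear, key=linear.get)
quadratic_class = min(quadratic, key=quadratic.get)
linear_posterior = posteriors(linear)[linear_class]
```

With the pooled matrix the specimen is assigned to the small class with a posterior probability above 0.99, even though it lies many SDs from that class on its own scale; with separate matrices it is assigned to the large class.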
A pilot study on classifying four species of Dinophysis using total length, maximum width and girdle width suggested that the boundaries between classes were linear, and so a linear discriminant was initially attempted.
Results
The results of the manual measurements are shown in Table II. It can be seen from the means and SDs shown that for each of the four parameters measured there are species which overlap within the SDs of the class distributions, e.g. all the Ceratium spp. except C.tripos overlap with each other on total length, and D.acuminata overlaps with D.rotundata on total length. All such overlap is within genus, with no overlap between the two genera. The most conservative parameter measured can be seen to be girdle width, where the SDs are the lowest as a proportion of the means within classes, and the total population's SD is the lowest as a proportion of the total mean. There is also the least amount of overlap shown on this parameter, with only C.arcticum and C.longipes overlapping within the SDs from their means. However, the total population's SD for girdle width is sufficiently low (i.e. the class means are sufficiently close to each other) for the tails of the individual class distributions to overlap with their neighbours in the frequency histogram (Figure 6). Therefore, the classification is a multivariate problem.

Table II. Manual morphometric calibration data, within-class statistics in micrometres

                 Total length (µm)  Girdle width (µm)  Maximum width (µm)  Height of maximum width  Natural log of the determinant
Taxonomic class  Mean     SD        Mean     SD        Mean     SD         Mean     SD              of the covariance matrix
C.arcticum       191.64   32.66     58.10    3.42      455.78   58.81      92.62    37.57           22.82
C.azoricum       137.39   21.64     49.22    3.39      92.10    12.39      85.49    16.99           13.59
C.horridum       204.34   29.26     41.39    3.21      174.62   22.48      135.69   26.34           20.65
C.longipes       173.46   17.55     57.94    3.37      262.04   51.67      59.27    20.86           21.57
C.tripos         272.49   19.27     73.23    5.81      211.51   24.62      144.93   31.98           20.17
D.acuminata      52.29    3.87      17.29    2.78      36.36    4.22       27.01    3.29            8.35
D.acuta          75.20    4.77      24.97    2.06      51.52    2.23       28.49    2.44            7.85
D.rotundata      48.37    2.59      30.75    2.27      42.96    2.31       27.91    2.45            5.27
D.sacculus       43.35    2.05      8.79     1.25      22.20    2.31       23.61    2.43            4.00
Total            99.75    70.85     30.54    18.54     100.08   114.01     51.13    43.37           pooled = 18.03

Fig. 6. Distribution of girdle widths in the genera Ceratium and Dinophysis. [Frequency histogram of girdle width (µm), one series per species: C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus.]
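The statement that girdle width has the lowest total SD as a proportion of the total mean can be checked directly from the Total row of Table II. The short sketch below is only a worked calculation; the dictionary and variable names are illustrative.

```python
# (mean, SD) of the total population for each manual parameter, from Table II.
table2_totals = {
    "total length": (99.75, 70.85),
    "girdle width": (30.54, 18.54),
    "maximum width": (100.08, 114.01),
    "height of maximum width": (51.13, 43.37),
}

# SD as a proportion of the mean (coefficient of variation).
cv = {name: sd / mean for name, (mean, sd) in table2_totals.items()}
most_conservative = min(cv, key=cv.get)
```

Girdle width gives the smallest ratio (about 0.61), against about 0.71 for total length, 0.85 for height of maximum width and 1.14 for maximum width.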
The most variable classes can be seen to be C.arcticum and C.longipes with D.acuminata being the most variable of the genus Dinophysis. This is also reflected in the within-class covariance matrices, as shown by the natural logs of their determinants in Table II.
A linear discriminant function assuming a normal distribution (see Table III) showed relatively high Mahalanobis distances between most of the nine species, with just D.acuminata showing itself to be relatively very close to both D.sacculus and D.acuta, and D.acuta being rather close to D.rotundata.
Classifications derived from the posterior probabilities of class membership on resubstitution of the Test dataset onto the linear discriminant model (see Table IV) show very good separation with <5% classification error overall. This strongly suggests that the within-class distributions are indeed normal. However, an examination of the erroneous classifications shows some serious anomalies: the classification table (Table IV) shows a C.horridum reclassified as a D.acuta, and three C.azoricum reclassified as D.rotundata. Each of these specimens had measurements which were towards the limits of the distributions for their correct classes, but each of their parameters was totally different from those of the classes derived from their posterior probabilities. Despite these parameters being very distant from those of the resultant class, the anomalous errors mostly had posterior probabilities >0.99, making it difficult to reject these specimens by thresholding within set confidence intervals.
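The overall error rate can be reproduced from the class totals and the numbers correctly classified, as read from Table IV. The sketch assumes that, with equal priors, the overall rate is the unweighted mean of the per-class error rates; this interpretation is an assumption, although it agrees with the tabulated total of 0.0430.

```python
# Test dataset sizes and correctly classified counts for the nine species
# (C.arcticum ... D.sacculus), as read from Table IV.
totals = [38, 29, 24, 23, 37, 82, 90, 53, 90]
correct = [37, 26, 22, 23, 37, 72, 87, 52, 90]

class_rates = [(t - c) / t for t, c in zip(totals, correct)]

# Equal priors: every class contributes equally, whatever its size.
equal_prior_error = sum(class_rates) / len(class_rates)

# For comparison: the fraction of all specimens misclassified.
pooled_error = (sum(totals) - sum(correct)) / sum(totals)
```

Both figures are below 5%; the equal-prior mean is about 0.0430 and the pooled fraction about 0.0429 (20 of 466 specimens).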
An examination of the within-class covariance matrices, as depicted by the natural logs of the determinants of the covariance matrices in Table II, shows that the covariance matrices for the classes within genera are reasonably similar, and so might be assumed to have linear boundaries within genus. However, the two genera fall into two separate groupings and the pooled covariance matrix gives a natural log of its determinant which is within the range for Ceratium spp., but very much higher than those for Dinophysis spp. Therefore, on a linear model with pooled covariance matrices, the distances between any of the Dinophysis class means and any one specimen are artificially and dramatically foreshortened by the grossly inappropriate magnitude of the assumed variation within the class distribution.
The experiment was repeated using a quadratic discriminant function where each class is modelled on its individual covariance matrix. This gave a total error rate of 0.03 for classification of the Test dataset. The trans-genera anomalies shown for misclassifications in the linear model were not repeated with the quadratic model (Table V).
The average posterior probabilities for class membership of the Test dataset (Table V) show that, with one exception, all the classes obtained mean posterior probabilities >0.99 for correct classifications and mean posterior probabilities <0.95 for misclassifications. This suggests that much of the misclassification should be rejected if a threshold was set at 95% confidence intervals. The exception was C.azoricum with a mean posterior probability of 0.9834 for correct classification. The apparent anomaly of a C.azoricum (specimen P21F25) being misclassified as a C.horridum with a posterior probability of 0.9982 was found to be for a specimen at the extreme limits of the range of C.azoricum as described previously (Sournia, 1967). Specimen P21F25 had a girdle width of 41.42 µm, which is at the mean value for C.horridum, and the antapical horns were at a very obtuse angle and were particularly short, which led to measurements for maximum width, height of maximum width and total length which had ratios similar to those found in C.horridum. This specimen was not classed as taxonomically poor during the initial preparation of the index because it corresponded well to one of the extreme examples described by Sournia. However, it is highly atypical for the species range as a whole.

Table III. Distances between nine species in linear discriminant of manual morphometric data

             C.arcticum  C.azoricum  C.horridum  C.longipes  C.tripos    D.acuminata D.acuta     D.rotundata D.sacculus
C.arcticum   0           298.27      215.89      94.82       284.69      480.39      408.69      404.39      574.23
C.azoricum   -           0           51.78       78.34       134.39      152.52      94.96       68.00       235.78
C.horridum   -           -           0           71.39       139.55      158.20      106.83      126.72      220.08
C.longipes   -           -           -           0           103.72      269.06      191.14      184.00      363.57
C.tripos     -           -           -           -           0           519.92      399.82      382.45      651.29
D.acuminata  -           -           -           -           -           0           8.99        29.87       9.82
D.acuta      -           -           -           -           -           -           0           17.04       34.65
D.rotundata  -           -           -           -           -           -           -           0           72.42
D.sacculus   -           -           -           -           -           -           -           -           0

Table IV. Classification summary from linear discriminant of manual morphometric data

From taxa    Into taxa
             C.arcticum  C.azoricum  C.horridum  C.longipes  C.tripos    D.acuminata D.acuta     D.rotundata D.sacculus  Total
C.arcticum   37          -           -           1           -           -           -           -           -           38
C.azoricum   -           26          -           -           -           -           -           3           -           29
C.horridum   -           -           22          1           -           -           1           -           -           24
C.longipes   -           -           -           23          -           -           -           -           -           23
C.tripos     -           -           -           -           37          -           -           -           -           37
D.acuminata  -           -           -           -           -           72          2           -           8           82
D.acuta      -           -           -           -           -           2           87          1           -           90
D.rotundata  -           -           -           -           -           1           -           52          -           53
D.sacculus   -           -           -           -           -           -           -           -           90          90
Total        37          26          22          25          37          75          90          56          98          466
Error rate   0.0263      0.1034      0.0833      0.0         0.0         0.122       0.0333      0.0189      0.0         0.0430

Table V. Mean posterior probabilities for class membership from quadratic discriminant of manual measurements
[Rows: From taxa C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus, Total. Columns: Into taxa C.arcticum, C.azoricum, C.horridum, C.longipes, C.tripos, D.acuminata, D.acuta, D.rotundata, D.sacculus. Cell positions are not recoverable from the rotated table; the values, in source order, are: 0.9947, 0.9281, 0.9834, 0.9982, 1.0000, 0.9930, 0.9834, 0.9999, 0.6852, 0.9058, 0.9453, 0.9971, 0.9428, 0.9726, 1.0000, 1.0000, 0.9966, 0.8434, 0.9948, 0.9997, 0.9997, 1.0000, 1.0000, 0.9916, 0.9916.]
Repeating the quadratic discriminant experiment with an acceptance threshold set within 95% confidence intervals (Table VI) gave a 3.22% rejection overall, resulting in a 60% reduction in misclassifications and just a 2% reduction in the number of correct classifications. Only four specimens out of the 466 in the Test dataset were wrongly classified, giving <1% misclassification.
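An acceptance threshold of this kind can be sketched as a simple rule on the largest posterior probability. The sketch assumes that "within 95% confidence intervals" corresponds to accepting a classification only when its posterior probability is at least 0.95; the function name and the example posterior values are hypothetical, apart from the 0.9982 quoted for specimen P21F25.

```python
def classify_with_rejection(posteriors, threshold=0.95):
    """Return the most probable class, or 'No class' if it falls below the threshold."""
    label = max(posteriors, key=posteriors.get)
    return label if posteriors[label] >= threshold else "No class"

# A confident (though wrong) assignment survives the threshold ...
accepted = classify_with_rejection({"C.azoricum": 0.0018, "C.horridum": 0.9982})
# ... whereas a less certain assignment is rejected.
rejected = classify_with_rejection({"C.arcticum": 0.0719, "C.longipes": 0.9281})

# 3.22% of the 466 test specimens corresponds to 15 rejected specimens.
n_rejected = round(0.0322 * 466)
```

The first case shows why thresholding cannot remove a misclassification such as P21F25: its posterior probability is well above the acceptance level.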
Nine control classes which were not present in the Model calibration dataset were included in the Test dataset for the thresholded quadratic discriminant (Table VI) to test the reactions to forms unknown to the model. The morphological variants C.horridum var. buceros and C.tripos var. pulchellum were all correctly identified as C.horridum and C.tripos, respectively. The other two control classes of Ceratium species (C.declinatum and C.symmetricum) with their own distinct morphologies only achieved a 10% rejection overall with the remainder being misclassified as other Ceratium species. Dinophysis pulchellum, which is the same shape as D.rotundata but smaller in overall size, was classified as D.rotundata. Dinophysis dens, which may be a gamete of D.acuta (Moita and Sampayo, 1993), but is very similar in form to D.acuminata, was classified as D.acuminata. Dinophysis norvegica was classified as the very similar D.acuta.
The major anomaly in the classification of the controls was the misclassification of most of the D.caudata and the D.tripos as C.horridum. These two Dinophysis species are very like one another and are closest in form to an over-large D.acuta (which one specimen was classified as). The measured parameters for these specimens in no way resembled those of the C.horridum in either magnitude or ratios. The only anomaly observable in the measured parameters for these two control classes was that each measurement was very close to the mean value for the corresponding parameter in the total inter-class means shown in Table II. This would have placed these specimens in a tight cluster at the centre of the total inter-class distributions.
To investigate this anomaly further, the experiment was repeated using pooled covariance matrices so that the distances would be analogous to Euclidean distances. This resulted in all the D.caudata and the D.tripos being classified as D.acuta, the closest class on Euclidean distance. It might be assumed then that this anomaly was due to the very high covariances for C.horridum (Table II), coupled with the fact that on the most conservative feature (girdle width) C.horridum is the closest of the high-covariance classes to the D.caudata mean of 26.44 µm, with the minimum C.horridum measurement being 34.30 µm. This anomaly might also point to the distribution of C.horridum departing from a multivariate normal distribution.

Table VI. Manual morphometric test data classified within 95% confidence intervals

From taxa                 Into taxa
                          C.arcticum  C.azoricum  C.horridum  C.longipes  C.tripos    D.acuminata D.acuta     D.rotundata D.sacculus  No class    Total
C.arcticum                36          -           -           -           -           -           -           -           -           2           38
C.azoricum                -           24          1           1           -           -           -           -           -           3           29
C.horridum                -           -           23          -           -           -           -           -           -           1           24
C.longipes                -           -           -           22          -           -           -           -           -           1           23
C.tripos                  -           -           -           2           34          -           -           -           -           1           37
D.acuminata               -           -           -           -           -           81          -           -           -           1           82
D.acuta                   -           -           -           -           -           -           90          -           -           -           90
D.rotundata               -           -           -           -           -           -           -           53          -           -           53
D.sacculus                -           -           -           -           -           -           -           -           84          6           90
Controls
C.horridum var. buceros   -           -           18          -           -           -           -           -           -           -           18
C.tripos var. pulchellum  -           -           -           -           1           -           -           -           -           -           1
C.declinatum              -           -           2           -           -           -           -           -           -           -           2
C.symmetricum             -           3           -           9           4           -           -           -           -           2           18
D.caudata                 -           -           8           -           -           -           1           -           -           -           9
D.dens                    -           -           -           -           -           6           -           -           -           -           6
D.norvegica               -           -           -           -           -           -           1           -           -           -           1
D.pulchellum              -           -           -           -           -           -           -           2           -           -           2
D.tripos                  -           -           1           -           -           -           -           -           -           -           1
Total                     36          27          53          34          39          87          92          55          84          17          524
Error rate                0.05        0.17        0.04        0.04        0.08        0.01        0.00        0.00        0.07        -           0.05

The three morphotypes of D.acuminata found in the Rías Bajas (Bravo et al., 1995; Figures 5-7), when classified with the quadratic discriminant, gave the results shown in Table VII. The total error rate for classification of the test data was 0.01, using only three parameters to describe the shape of the theca. This demonstrates that a very low resolution function of the shape can clearly separate very similar morphotypes. However, it might also point to multimodality within species classes, caused by the presence of distinct morphotypes, being one cause of the 5% error still remaining in Table VI.
The calibration data from the Model dataset of automatically extracted outlines (Figure 4) are shown in Table VIII. There is significant overlap between classes within the SDs of the means for each of the five parameters, showing separation of classes to be a multivariate problem. The SDs are mostly of the order of 10% of the means. Some exceptions, such as the perimeter measurements for C.horridum, may be due to the adherence of particles of detritus causing the outlines to be roughened, so increasing the variation between individuals; other exceptions are due to the naturally high variation in particular features, such as the maximum length of C.arcticum, which corresponds to the highly variable distance between the points of the antapical horns. The data for D.acuminata and D.rotundata can be seen to overlie closely, within the SDs of their means, for each of the five parameters, showing that this feature vector contains insufficient information to separate these two classes as represented. There is no overlap between the three genera.
The classification summary from a quadratic discriminant of 14 species in three genera (Table IX) gave an accuracy of 83%, with the major contributions to the error coming from the overlap between D.acuminata and D.rotundata, along with overlap between C.longipes and C.tripos, which had Mahalanobis distances very similar in value to the natural logs of the determinants of their respective covariance matrices.
The average posterior probabilities for class membership for the outline data (Table X) show that much of the misclassification due to peripheral overlap might be rejected by setting thresholds for acceptance, although the low mean posterior probabilities for correct classifications of C.longipes and C.tripos suggest that individual class thresholds might be more appropriate than a single pooled value if rejection of correct classifications is to be minimized.
Table VII. Classification score for D.acuminata morphotypes test data

From class   Into class
             Class F1    Class F2    Class F3    Total
Class F1     17          -           -           17
Class F2     -           14          -           14
Class F3     1           -           29          30
Total        18          14          29          61
Error rate   0.000       0.000       0.033       0.011

Table IX. Classification summary for test data from automatically extracted outlines

From taxa         Into class (A to N; non-zero counts, left to right)   Total
A: C.arcticum     22 2 1 6                                              31
B: C.azoricum     30                                                    30
C: C.furca        2 27 1 2                                              32
D: C.fusus        7 28 1                                                36
E: C.horridum     19 2 1                                                22
F: C.lineatum     25 1                                                  26
G: C.longipes     2 1 14 8                                              25
H: C.pentagonum   2 31 1                                                34
I: C.tripos       2 3 12                                                17
J: D.acuminata    65 33 5                                               103
K: D.acuta        1 2 103 2                                             108
L: D.rotundata    5 30                                                  35
M: D.sacculus     1 4 66                                                71
N: O.steinii      1 15                                                  16

                  A     B     C     D     E     F     G     H     I     J     K     L     M     N     Total
Total             26    32    38    28    23    26    21    34    26    76    103   63    71    19    586
Error rate        0.29  0.00  0.16  0.22  0.14  0.04  0.44  0.09  0.29  0.37  0.05  0.14  0.07  0.06  0.17

Table X. Mean posterior probabilities for class membership from outline data

From taxa         Into class (A to N; recorded values, left to right)
A: C.arcticum     0.94 0.76 0.93 0.68
B: C.azoricum     0.96
C: C.furca        0.75 0.99 0.98 0.77
D: C.fusus        0.89 0.95 0.70
E: C.horridum     0.91 0.89 1.00
F: C.lineatum     1.00 0.93
G: C.longipes     0.70 0.44 0.84 0.70
H: C.pentagonum   0.97 0.94 0.58
I: C.tripos       0.81 0.95 0.90
J: D.acuminata    0.73 0.61 0.80
K: D.acuta        0.79 0.90 0.97 0.84
L: D.rotundata    0.74 0.72
M: D.sacculus     0.63 0.62 0.96
N: O.steinii      0.61 0.99

                  A     B     C     D     E     F     G     H     I     J     K     L     M     N
Total             0.91  0.95  0.96  0.95  0.88  0.98  0.86  0.93  0.79  0.73  0.97  0.67  0.94  0.95

Conclusions
The results from the manual morphometric study demonstrate that a very low resolution description of the outline shape of these rigid-bodied organisms is sufficient to provide reliable separation within 95% confidence intervals. However, the poor rejection of unmodelled classes with similar morphologies to those modelled (Table VI) suggests that a more detailed description of the outline should be furnished to prevent similar classes, unknown to the model, being misclassified.
The results of the classification of three morphotypes of D.acuminata (Table VII) show that the species distribution may approximate to a multimodal multivariate normal. Many dinoflagellate species have several morphotypes, which might cause a similar multimodality.
Many of the dinoflagellate species have recognized morphological variants (Sournia, 1967), although in some cases, such as C.tripos var. pulchellum (Lopez, 1955), a constant progression in form can be found between the typical forms and the variants. Lopez found a constant progression in all of the measured features between C.tripos var. pulchellum (Schrod.) and C.tripos var. atlanticum Ostenf., but also found that these two variants formed distinct bimodal classes on frequency distribution histograms, with the intermediate forms being very rare.
If such morphological variants form distinct modal classes, then the performance of this method of discrimination might be improved by defining separate classes for them in the calibration.
In the extraction of outlines from images, a number of attempts were made to obtain a homogeneous filled region corresponding to the organism. Attempts using simple high-pass and high-boost filters to differentiate the image resulted in very indistinct margins to the region of interest, as did the use of a Boolean OR to combine the two Sobel gradients. The use of a Boolean AND appeared to give a cleaner outline by rejecting pixels which were not set in both horizontal and vertical components.
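The difference between the two Boolean combinations can be illustrated with a small sketch in Python. It is not the filter chain used in the study: it assumes that each Sobel response is binarized against a single threshold before the two masks are combined, and the function names, the synthetic disc image and the threshold value are hypothetical.

```python
def sobel_gradients(img):
    """Return (gx, gy) Sobel responses for a 2-D list of grey levels.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx[r][c] = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                        - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            # Vertical gradient: lower row minus upper row.
            gy[r][c] = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                        - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
    return gx, gy


def combine_and(gx, gy, threshold):
    """Keep a pixel only if both gradient components exceed the threshold."""
    return [[abs(x) >= threshold and abs(y) >= threshold for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def combine_or(gx, gy, threshold):
    """Keep a pixel if either gradient component exceeds the threshold."""
    return [[abs(x) >= threshold or abs(y) >= threshold for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def disc_image(size=21, radius=7, inside=60, outside=200):
    """Synthetic lightfield-style image: a dark disc on a light background."""
    mid = size // 2
    return [[inside if (r - mid) ** 2 + (c - mid) ** 2 <= radius ** 2 else outside
             for c in range(size)] for r in range(size)]


gx, gy = sobel_gradients(disc_image())
and_mask = combine_and(gx, gy, threshold=100)
or_mask = combine_or(gx, gy, threshold=100)
n_and = sum(v for row in and_mask for v in row)
n_or = sum(v for row in or_mask for v in row)
print(n_and, n_or)
```

In this sketch the AND mask is always a subset of the OR mask. It drops pixels where only one component responds, such as the exact top, bottom, left and right of the disc, so the thinner outline comes at the cost of gaps on edges that are purely horizontal or vertical.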
The major problems encountered with this outlining process were firstly the obscuration of portions of the specimens due to overlying debris and due to the narrow effective field of view of the digitizing equipment used, and secondly the very low contrast and variable focusing of the lightfield illuminated photomicrographs, which caused a significant number of the specimens to require adjustment of the binarizing threshold to produce a recognizable outline.
The overall contrast could be easily improved by the use of darkfield illumination to enhance the specimen perimeter. Alternatively, the dynamic range of the images could be analysed to enable a suitable contrast-enhancing filter to be applied prior to the gradient filtering. The flat field presented by the background in micrographs, due to the very shallow depth of field of the objective lens, makes this form of feature extraction much more feasible than in general purpose image analysis.
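One simple way of using the dynamic range of an image is a percentile-based linear contrast stretch applied before the gradient filters. The Python sketch below is an assumed example of such a filter, not one that was tested in this study. The function name, the percentile limits and the synthetic low-contrast image are hypothetical.

```python
def stretch_contrast(img, low_pct=0.02, high_pct=0.98, out_max=255):
    """Linearly rescale grey levels so that the chosen percentiles span 0..out_max.

    Grey levels outside the percentile limits are clipped.
    """
    values = sorted(v for row in img for v in row)
    lo = values[int(low_pct * (len(values) - 1))]
    hi = values[int(high_pct * (len(values) - 1))]
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in img]
    scale = out_max / (hi - lo)
    return [[min(out_max, max(0, round((v - lo) * scale))) for v in row]
            for row in img]


# Synthetic low-contrast image occupying only grey levels 110 to 140.
low_contrast = [[110 + (r + c) % 31 for c in range(16)] for r in range(16)]
stretched = stretch_contrast(low_contrast)

flat_in = [v for row in low_contrast for v in row]
flat_out = [v for row in stretched for v in row]
print(min(flat_in), max(flat_in), min(flat_out), max(flat_out))
```

After the stretch a fixed binarizing threshold covers a comparable part of the grey-level range in every image, which might reduce the need for per-specimen adjustment.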
The images from which the outlines were extracted were at five different magnifications. This will have caused a degree of error in the measurements due to fractal effects (Kaandorp, 1994). These errors might be minimized by using separate models for each magnification.
The perimeter of the organisms was measured as a fractal dimension, and by varying the sampling width for this measurement a better description of the overall roughness of shape might be obtained.
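The effect of the sampling width on a measured perimeter can be sketched with a divider (Richardson) estimate, in which the outline length L measured with a ruler of width w is assumed to scale as L proportional to w^(1 - D). The Python code below is an illustrative assumption and not necessarily the measurement procedure used here. The function names, the ruler widths and the synthetic smooth and zigzag outlines are hypothetical.

```python
import math


def divider_perimeter(points, ruler):
    """Approximate length of a closed outline walked with a given ruler width."""
    anchor = points[0]
    length = 0.0
    for p in points[1:]:
        d = math.hypot(p[0] - anchor[0], p[1] - anchor[1])
        if d >= ruler:
            # Step to the first vertex at least one ruler width away.
            length += d
            anchor = p
    # Close the outline back to the starting vertex.
    return length + math.hypot(points[0][0] - anchor[0], points[0][1] - anchor[1])


def fractal_dimension(points, rulers):
    """Estimate D from the least-squares slope of log(length) on log(ruler)."""
    xs = [math.log(r) for r in rulers]
    ys = [math.log(divider_perimeter(points, r)) for r in rulers]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 - slope


def outline(n=512, roughness=0.0):
    """Unit circle whose radius alternates by +/- roughness from vertex to vertex."""
    pts = []
    for i in range(n):
        r = 1.0 + (roughness if i % 2 == 0 else -roughness)
        t = 2 * math.pi * i / n
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts


rulers = (0.05, 0.1, 0.2, 0.4, 0.8)
d_smooth = fractal_dimension(outline(), rulers)
d_rough = fractal_dimension(outline(roughness=0.08), rulers)
print(d_smooth, d_rough)
```

A smooth outline gives an estimate close to 1 because its measured length hardly changes with the ruler width, whereas the zigzag outline gives a clearly larger value. The zigzag is not a true fractal, so its estimate only indicates roughness over the range of widths sampled.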
The classification summary in Table IX shows a fair amount of peripheral overlap between class distributions which suggests that the description of the outline should provide more detail on the finer points of the curvature of the outline. This might be achieved either by attempting to differentiate the curvature of the outline, which has previously proved impractical due to the diversity of orders of curvature (Yarranton, 1967) or by attempting an approximation using fractal dimensioning, or else by providing a more detailed description of the shape of the histogram of distances between loci.
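A histogram of distances between loci on the outline is independent of orientation because rotating the outline leaves every pairwise distance unchanged. The Python sketch below illustrates this property under stated assumptions: distances are divided by the longest distance, which also removes the effect of magnification, and the number of bins, the function names and the elliptical test outline are hypothetical rather than taken from the study.

```python
import math


def distance_histogram(points, bins=8):
    """Normalized histogram of all pairwise distances between outline loci."""
    dists = [math.hypot(ax - bx, ay - by)
             for i, (ax, ay) in enumerate(points)
             for (bx, by) in points[i + 1:]]
    longest = max(dists)
    counts = [0] * bins
    for d in dists:
        # Scale to 0..1 by the longest distance; the longest falls in the last bin.
        counts[min(int(d / longest * bins), bins - 1)] += 1
    return [c / len(dists) for c in counts]


def rotate(points, angle):
    """Rotate all loci about the origin by the given angle in radians."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [(x * ca - y * sa, x * sa + y * ca) for x, y in points]


# Elliptical test outline with 40 loci.
ellipse = [(3.0 * math.cos(2 * math.pi * i / 40 + 0.1),
            1.0 * math.sin(2 * math.pi * i / 40 + 0.1)) for i in range(40)]

h_original = distance_histogram(ellipse)
h_rotated = distance_histogram(rotate(ellipse, 0.7))
difference = sum(abs(a - b) for a, b in zip(h_original, h_rotated))
print(h_original)
print(difference)
```

Increasing the number of bins gives a more detailed description of the shape of the histogram, at the cost of a longer feature vector and fewer distances per bin.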
Acknowledgements
We thank the Sir Alistair Hardy Foundation for Ocean Science and the Forth River Purification Board for making plankton samples available to us, and the University of Plymouth for the use of digitizing equipment. We received valuable advice from K.R.Clarke and M.Carr on multivariate statistics, from G.Moore on image processing and D.Harbour on dinoflagellate taxonomy. This work was partly funded by a CEC MAST 2 contribution to a project coordinated by Mr P.Culverhouse of the University of Plymouth Centre for Intelligent Systems, and also by the NERC Technology Initiative. Mr R.Williams was responsible for Plymouth Marine Laboratory involvement in the MAST 2 project and contributed to the photography of the specimens.
References
Bravo,I., Reguera,B. and Fraga,S. (1995) Description of different morphotypes of Dinophysis acuminata complex in the Galician Rias Baixas in 1991. In Lassus,P., Arzul,G., Erard-Le Denn,D., Gentien,P. and Marcaillou-Le Baut,C. (eds), Harmful Marine Algal Blooms. Lavoisier, Paris, pp. 21-26.
Dodge,J.D. (1982) Marine Dinoflagellates of the British Isles. HMSO, London, 303 pp.
Gonzalez,R.C. and Woods,R.E. (1992) Digital Image Processing. Addison-Wesley, Reading, MA, 716 pp.
Jorgensen,E. (1911) Die Ceratien. Eine kurze Monographie der Gattung Ceratium Schrank. Int. Rev. Ges. Hydrobiol. Hydrogr., 4 (Biol. Suppl., T Ser.), 1-124.
Kaandorp,J.A. (1994) Fractal Modelling: Growth and Form in Biology. Springer-Verlag, Berlin, 208 pp.
Kofoid,C.A. and Campbell,A.S. (1929) A conspectus of the marine and fresh water ciliata belonging to the suborder Tintinnoinea, with descriptions of new species principally from the 'Agassiz' expedition to the Eastern Tropical Pacific 1904-1905. Univ. Calif. Publ. Zool., 34, 1-403.
Larsen,J. and Moestrup,Ø. (1992) Potentially toxic phytoplankton. Ident. Leafl. Plankton. ICES, No. 180, 12 pp.
Lembeye,G., Yasumoto,T., Zhao,J. and Fernandez,R. (1993) DSP outbreaks in Chilean fiords. In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 525-529.
Lopez,J. (1955) Variacion alometrica en Ceratium tripos. Invest. Pesq., 2, 131-159.
McCall,H. and Lindley,J.A. (1994) Automated visual identification of marine plankton using multivariate analysis. In Culverhouse,P. (ed.), 24 Month Progress Report Year End September 1994. Contract MAS2-CT92-0015. CEC, Brussels, 28 pp.
Moita,M.T. and de M.Sampayo,M.A. (1993) Are there cysts in the genus Dinophysis? In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 153-157.
Phillips,B.F., Campbell,N.A. and Wilson,B.R. (1973) A multivariate study of geographic variation in the whelk Dicathais. J. Exp. Mar. Biol. Ecol., 11, 27-69.
Reguera,B., Bravo,I., Marcaillou-Le Baut,C. and Masselin,P. (1993) Monitoring of Dinophysis spp. and vertical distribution of okadaic acid on mussel rafts in Ria de Pontevedra (NW Spain). In Smayda,T.J. and Shimizu,Y. (eds), Toxic Phytoplankton Blooms in the Sea. Elsevier, Amsterdam, pp. 553-558.
SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 1. SAS Institute Inc., Cary, NC, 943 pp.
Schiller,J. (1933) Dinoflagellatae. In Rabenhorst,L. (ed.), Kryptogamen-Flora 10. I. Teil. Akademische Verlagsgesellschaft, Leipzig, 617 pp.
Sournia,A. (1967) Le Genre Ceratium (Peridinium Planctonique) dans le Canal de Mozambique. Contribution a une Revision Mondiale. Vie Milieu Ser. A Biol. Mar., 18 (Issue 2-3A), 375-500.
Yarranton,G.A. (1967) Parameters for use in distinguishing populations of Euceratium Gran. Bull. Mar. Ecol., 6, 147-158.
Received on March 10, 1995; accepted on November 6, 1995