Protein crystallization image analysis ICCBM-2013

Madhav Sigdel Computer Science PhD Student University of Alabama in Huntsville 14th International Conference on the Crystallization of Biological Macromolecules 9/27/2012


TRANSCRIPT

Page 1: Protein crystallization image analysis   ICCBM-2013

Madhav Sigdel

Computer Science PhD Student

University of Alabama in Huntsville

14th International Conference on the Crystallization of Biological Macromolecules

9/27/2012

Page 2: Protein crystallization image analysis   ICCBM-2013

Overview

Protein crystallography

Protein crystallization

Crystallization trials

Scoring of crystallization trials

Image acquisition

Image classification

Page 3: Protein crystallization image analysis   ICCBM-2013

Protein Image Samples

Image 1 Image 2 Image 3

Image 4 Image 5 Image 6

Page 4: Protein crystallization image analysis   ICCBM-2013

Protein Crystallization Phases (Hampton Research)

1. Clear Drop
2. Phase Separation
3. Regular Granular Precipitate
4. Birefringent Precipitate or Microcrystals
5. Posettes and Spherulites
6. Needle Crystals (1D Growth)
7. Plate Crystals (2D Growth)
8. Single Crystals (3D Growth < 0.2 mm)
9. Single Crystals (3D Growth > 0.2 mm)

Page 5: Protein crystallization image analysis   ICCBM-2013

General Approach

Apply image processing techniques to extract features
Apply data mining techniques for classification
Image processing
  Region of interest (drop boundary) detection
  Implementation of complex algorithms for edge detection
    Hough transform
    Canny edge detection
  Geometric and texture features
Distributed computing to speed up the process
  Feature extraction is computationally expensive

Page 6: Protein crystallization image analysis   ICCBM-2013

Related Works

Grouped by number of categories
Binary classification - [Xioqung 2004], [Takahashi 2005], [Ming 2008], [Roy Liu 2008]
  Distinguishes between the crystal and non-crystal classes only
Multiclass classification - [Kanako Saitoh 2006], [Christian A 2010]
  Reported accuracy is very low for some classes
A variety of classification methods have been applied

Page 7: Protein crystallization image analysis   ICCBM-2013

Our Approach

Low-cost, in-house assembled system for image acquisition
Trace fluorescent labeling of protein
Application of intensity and simple geometric features for image processing
Classification into 3 categories
  Non-crystals
  Likely leads
  Crystals

Page 8: Protein crystallization image analysis   ICCBM-2013

Image Acquisition System

~30 minutes to collect images from 3-celled 96-well plate (288 images)

Page 9: Protein crystallization image analysis   ICCBM-2013

Image Categories

Image category: Grouping of Hampton categories

Non-crystals
  1. Clear Drop
  2. Phase Separation
  3. Regular Granular Precipitate

Likely leads
  4. Birefringent Precipitate or Microcrystals
  *. Unclear bright regions

Crystals
  5. Posettes or Spherulites
  6. Needles (1D Growth)
  7. Plates (2D Growth)
  8. Single Crystals (3D Growth < 0.2 mm)
  9. Single Crystals (3D Growth > 0.2 mm)
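The grouping above can be written down as a small lookup. This is only an illustrative sketch: the function name `group_hampton_score` and the category strings are hypothetical, and the unnumbered "unclear bright regions" case is not covered because it has no Hampton score.

```python
# Hypothetical helper: map a Hampton score (1-9) to one of the three
# image categories used in this talk. Names are illustrative only.
NON_CRYSTALS = "Non-crystals"
LIKELY_LEADS = "Likely leads"
CRYSTALS = "Crystals"


def group_hampton_score(score):
    """Return the 3-way category for a Hampton score between 1 and 9."""
    if 1 <= score <= 3:
        return NON_CRYSTALS   # clear drop, phase separation, granular precipitate
    if score == 4:
        return LIKELY_LEADS   # birefringent precipitate or microcrystals
    if 5 <= score <= 9:
        return CRYSTALS       # posettes/spherulites, needles, plates, single crystals
    raise ValueError("Hampton score must be between 1 and 9")
```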

Page 10: Protein crystallization image analysis   ICCBM-2013

Non-crystal Images

Clear drops Regular precipitates

Page 11: Protein crystallization image analysis   ICCBM-2013

Likely Leads

Granular precipitate / Microcrystals

Unclear bright regions

Page 12: Protein crystallization image analysis   ICCBM-2013

Crystals

Page 13: Protein crystallization image analysis   ICCBM-2013

Image Preprocessing

Image size reduction
Median filter
Thresholding techniques
  Otsu threshold: select the threshold intensity which maximizes inter-class variance and minimizes intra-class variance
  Dynamic thresholding I: select the 90th percentile intensity of the green component as the threshold
  Dynamic thresholding II: select the maximum intensity of the green component as the threshold
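The three thresholding rules above can be sketched in plain Python. This is a minimal illustration, not the system's actual code: it assumes 8-bit intensities stored in flat lists, uses a nearest-rank percentile, and all function names are hypothetical.

```python
# Minimal sketch of the three threshold selectors, assuming 8-bit
# intensities in flat Python lists. Function names are hypothetical.

def green_channel(rgb_pixels):
    """Extract the green component from a list of (r, g, b) tuples."""
    return [g for _, g, _ in rgb_pixels]


def otsu_threshold(pixels, levels=256):
    """Return t maximizing between-class variance (class 0: p <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # no bright pixels left
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance.
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def percentile_threshold(values, q=90):
    """Nearest-rank q-th percentile (q is an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = (q * len(ordered) + 99) // 100   # integer ceiling of q*n/100
    return ordered[max(rank - 1, 0)]


def max_threshold(values):
    """Dynamic thresholding II: the maximum intensity."""
    return max(values)
```

Maximizing the between-class variance is equivalent to minimizing the within-class variance, because the two sum to the total variance of the image.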

Page 14: Protein crystallization image analysis   ICCBM-2013

Otsu Threshold

Image 1   Image 2   Image 3

Image 4 = Otsu(Image 1)   Image 5 = Otsu(Image 2)   Image 6 = Otsu(Image 3)

Page 15: Protein crystallization image analysis   ICCBM-2013

Thresholding Techniques Comparison

Image 1: Original image

Image 2: Otsu thresholding

Image 3: 90th percentile threshold

Image 4: Max green threshold

Page 16: Protein crystallization image analysis   ICCBM-2013

Intensity Features

Image 1: Original image resized (Img1)

Image 2: Thresholded image (Img2)

Image 3: Img1 AND Img2

Image 4: Img1 AND complement of Img2 (background region in the original image)

Page 17: Protein crystallization image analysis   ICCBM-2013

Intensity Features

Threshold intensity (τ)
Bright pixel count (n)
Average intensity in bright region (μ_f)
Standard deviation of intensity in bright region (σ_f)
Average intensity in dark region (μ_b)
Standard deviation of intensity in dark region (σ_b)
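A minimal sketch of how these six features could be computed from a grayscale image and its thresholded mask. The function name, the dictionary keys, the use of the population standard deviation, and the convention of returning 0.0 for an empty region are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical sketch: compute the six intensity features from a flat list
# of intensities and a same-length boolean mask (True = bright pixel).

def intensity_features(pixels, mask, threshold):
    bright = [p for p, m in zip(pixels, mask) if m]
    dark = [p for p, m in zip(pixels, mask) if not m]

    def stats(values):
        # Assumption: an empty region yields zero mean and deviation.
        if not values:
            return 0.0, 0.0
        return mean(values), pstdev(values)

    mu_f, sigma_f = stats(bright)
    mu_b, sigma_b = stats(dark)
    return {
        "threshold": threshold,        # threshold intensity
        "bright_count": len(bright),   # bright pixel count n
        "mu_f": mu_f,                  # mean intensity, bright region
        "sigma_f": sigma_f,            # std. deviation, bright region
        "mu_b": mu_b,                  # mean intensity, dark region
        "sigma_b": sigma_b,            # std. deviation, dark region
    }
```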

Page 18: Protein crystallization image analysis   ICCBM-2013

Region/Blob Features

Image 1: Original image

Image 2 = Binary(Image 1)

Image 3 = Skeleton(Image 2)

Image 4: Connected regions shown in different colors

Largest Blob (R1) R2 R3 R4

Extracted blobs

Page 19: Protein crystallization image analysis   ICCBM-2013

Region/Blob Features

Number of blobs
Let R1 denote the largest blob
  Area(R1)
  Boundary pixel count in R1
  Fullness: number of white pixels in R1 / Area(R1)
  Measure of boundary smoothness of R1
  Variance of boundary smoothness of R1
  Measure of symmetry of R1 along the X and Y axes
Let R2, R3, R4 and R5 denote the 4 largest blobs excluding R1
  Average area
  Average fullness
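Some of the blob features listed above can be sketched with a plain connected-component pass. This is an illustration under stated assumptions, not the talk's implementation: blobs are 8-connected, Area(R1) is taken to be the bounding-box area (so that fullness is at most 1), and a boundary pixel is one with a 4-neighbor outside the blob. The smoothness and symmetry measures are left out because the slides do not define them. All names are hypothetical.

```python
from collections import deque

# Hypothetical sketch: label blobs in a binary image (list of rows of 0/1)
# and compute a few features per blob.

def find_blobs(binary):
    """Return blobs as lists of (row, col), largest first (8-connectivity)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    blobs.sort(key=len, reverse=True)
    return blobs


def blob_features(blob):
    """Pixel count, bounding-box area, boundary pixel count, fullness."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    # Assumption: Area(R) is the bounding-box area of the blob.
    area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    members = set(blob)
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    boundary = sum(
        1 for (y, x) in blob
        if any((y + dy, x + dx) not in members for dy, dx in steps)
    )
    return {
        "pixels": len(blob),
        "area": area,
        "boundary_pixels": boundary,
        "fullness": len(blob) / area,
    }
```

With these helpers, the number of blobs is `len(find_blobs(image))`, R1 is the first element of the returned list, and the averages over R2 to R5 follow from applying `blob_features` to the next four elements.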

Page 20: Protein crystallization image analysis   ICCBM-2013

Dataset

Category        No. of images   Percentage
Non-crystals    1514            67.3%
Likely leads    404             18.0%
Crystals        332             14.8%
Total images    2250

Page 21: Protein crystallization image analysis   ICCBM-2013

Experimental Results

Confusion matrix (rows: actual class; columns: observed class)

Actual class     Non-crystals   Likely leads   Crystals   Actual total   Accuracy
Non-crystals     1467           43             4          1514           96.9%
Likely leads     42             317            45         404            78.5%
Crystals         7              68             257        332            77.4%
Observed total   1516           428            306        2250           90.7%

Classifier: Multilayer Perceptron Neural Network

Testing: 10-fold cross validation
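The accuracy column can be recomputed from the counts in the confusion matrix. The short sketch below does this; the function names are hypothetical and the numbers are copied from the slide.

```python
# Rows are actual classes, columns are observed (predicted) classes, in the
# order: non-crystals, likely leads, crystals. Counts are from the slide.
CONFUSION = [
    [1467, 43, 4],
    [42, 317, 45],
    [7, 68, 257],
]


def per_class_accuracy(matrix):
    """Fraction of each actual class that was assigned the correct class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified images divided by all images."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct / total


def observed_totals(matrix):
    """Column sums: how many images were assigned to each class."""
    return [sum(col) for col in zip(*matrix)]
```

The diagonal holds 1467 + 317 + 257 = 2041 correctly classified images out of 2250, which gives the reported 90.7% overall accuracy.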

Page 22: Protein crystallization image analysis   ICCBM-2013

Other Classification Techniques

Max class ensemble method
  Uses multiple classifiers with different feature combinations
  Assigned class is the maximum predicted class among all the classifiers
  Decreases false negatives but increases false positives
Exhaustive binary classifiers
  Solves the multiclass problem using all possible binary classifiers
  For 3 classes, number of binary classifiers = 6
Overall accuracy around 82%
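A minimal sketch of the two ideas above, under stated assumptions. For the max class rule, the classes are assumed to be ordered non-crystals < likely leads < crystals, so the most crystal-like prediction wins. For the exhaustive binary scheme, the slides do not say how the six classifiers are formed; one reading that yields six for three classes is every non-empty proper subset of classes versus the rest, and that reading is what the second function enumerates. All names are hypothetical.

```python
from itertools import combinations

# Assumed class ordering for the max class rule (illustrative).
CLASS_RANK = {"Non-crystals": 0, "Likely leads": 1, "Crystals": 2}


def max_class_vote(predictions):
    """Return the highest-ranked class predicted by any classifier."""
    return max(predictions, key=lambda label: CLASS_RANK[label])


def subset_vs_rest_splits(classes):
    """Enumerate (subset, rest) pairs for every non-empty proper subset.

    Assumption: this is one way to arrive at 6 binary problems for
    3 classes; the slides do not define the scheme.
    """
    splits = []
    for size in range(1, len(classes)):
        for subset in combinations(classes, size):
            rest = tuple(c for c in classes if c not in subset)
            splits.append((subset, rest))
    return splits
```

Because a single optimistic classifier is enough to raise the assigned class, this rule misses fewer crystals at the cost of more false alarms, which matches the trade-off stated above.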

Page 23: Protein crystallization image analysis   ICCBM-2013

Future Work

Classify the crystals according to crystal morphology
Track temporal evolution of the crystals
Extract other relevant image features and improve accuracy

Page 24: Protein crystallization image analysis   ICCBM-2013

Summary

Intensity is shown to be a simple yet useful search parameter for identifying crystals
Efficient image processing (3 sec/image)
Classification into 3 categories: non-crystals, likely crystals and clear crystals
Accuracy comparable with other systems

Page 25: Protein crystallization image analysis   ICCBM-2013

Acknowledgement

Coworkers
  Salma Begum
  Marc L Pusey
  Ramazan Aygun
iExpressGenes inc.
