protein crystallization image analysis iccbm-2013
Madhav Sigdel
Computer Science PhD Student
University of Alabama in Huntsville
14th International Conference on the Crystallization of Biological Macromolecules
9/27/2012
Overview
Protein crystallography
Protein crystallization
Crystallization trials
Scoring of crystallization trials
Image acquisition
Image classification
Protein Image Samples
Image 1 Image 2 Image 3
Image 4 Image 5 Image 6
Protein Crystallization Phases (Hampton Research)
1. Clear Drop
2. Phase Separation
3. Regular Granular Precipitate
4. Birefringent Precipitate or Microcrystals
5. Rosettes and Spherulites
6. Needle Crystals (1D Growth)
7. Plate Crystals (2D Growth)
8. Single Crystals (3D Growth < 0.2 mm)
9. Single Crystals (3D Growth > 0.2 mm)
General Approach
Apply image processing techniques to extract features
Apply data mining techniques for classification
Image processing
Region of interest (drop boundary) detection
Implementation of complex algorithms for edge detection
Hough transform
Canny edge detection
Geometric and texture features
Distributed computing to speed up the process
Feature extraction is computationally expensive
Related Work
Grouped by number of categories
Binary classification – [Xioqung 2004], [Takahashi 2005], [Ming 2008], [Roy Liu 2008]
Distinguishes between crystal and non-crystal classes only
Multiclass classification – [Kanako Saitoh 2006], [Christian A 2010]
Reported accuracy is very low for some classes
A variety of classification methods have been applied
Our Approach
Low-cost, in-house assembled system for image acquisition
Trace fluorescent labeling of protein
Use of intensity and simple geometric features for image processing
Classification into 3 categories
Non-crystals
Likely leads
Crystals
Image Acquisition System
~30 minutes to collect images from 3-celled 96-well plate (288 images)
Image Categories
Image category: grouping of Hampton categories
Non-crystals: 1. Clear Drop; 2. Phase Separation; 3. Regular Granular Precipitate
Likely leads: 4. Birefringent Precipitate or Microcrystals; *. Unclear bright regions
Crystals: 5. Rosettes or Spherulites; 6. Needles (1D Growth); 7. Plates (2D Growth); 8. Single Crystals (3D Growth < 0.2 mm); 9. Single Crystals (3D Growth > 0.2 mm)
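The grouping above can be sketched as a small lookup table. This is an illustrative sketch only: the names `HAMPTON_TO_CATEGORY` and `category_for_score` are hypothetical, and the unnumbered "unclear bright regions" case is left out because it has no Hampton score.

```python
# Hypothetical names; only the score-to-category grouping comes from the slide.
NON_CRYSTAL = "Non-crystals"
LIKELY_LEAD = "Likely leads"
CRYSTAL = "Crystals"

HAMPTON_TO_CATEGORY = {
    1: NON_CRYSTAL,  # Clear drop
    2: NON_CRYSTAL,  # Phase separation
    3: NON_CRYSTAL,  # Regular granular precipitate
    4: LIKELY_LEAD,  # Birefringent precipitate or microcrystals
    5: CRYSTAL,      # Rosettes or spherulites
    6: CRYSTAL,      # Needles (1D growth)
    7: CRYSTAL,      # Plates (2D growth)
    8: CRYSTAL,      # Single crystals (3D growth < 0.2 mm)
    9: CRYSTAL,      # Single crystals (3D growth > 0.2 mm)
}


def category_for_score(score: int) -> str:
    """Map a Hampton score (1-9) to one of the three image categories."""
    if score not in HAMPTON_TO_CATEGORY:
        raise ValueError(f"Hampton score must be 1-9, got {score!r}")
    return HAMPTON_TO_CATEGORY[score]
```

For example, `category_for_score(4)` gives the "Likely leads" category, while scores 5 through 9 all collapse into "Crystals".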
Non-crystal Images
Clear drops Regular precipitates
Likely Leads
Granular precipitate / Microcrystals
Unclear bright regions
Crystals
Image Preprocessing
Image size reduction
Median filter
Thresholding techniques
Otsu threshold – select the threshold intensity that maximizes inter-class variance and minimizes intra-class variance
Dynamic thresholding I – select the 90th percentile intensity of the green component as the threshold
Dynamic thresholding II – select the maximum intensity of the green component as the threshold
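The three thresholding rules can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the presented system's code: intensities are taken to be integers in 0..255 passed as a flat list, the percentile uses the nearest-rank definition, and a pixel counts as bright when its intensity is at or above the threshold. All function names are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Classes are "below t" and "at or above t" (an assumed convention).
    Maximizing between-class variance also minimizes within-class variance.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    below = 0       # number of pixels with intensity < t
    sum_below = 0   # sum of their intensities
    for t in range(1, levels):
        below += hist[t - 1]
        sum_below += (t - 1) * hist[t - 1]
        above = total - below
        if below == 0 or above == 0:
            continue
        mean_below = sum_below / below
        mean_above = (sum_all - sum_below) / above
        # Proportional to the between-class variance.
        score = below * above * (mean_below - mean_above) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def percentile_threshold(green, percent=90):
    """Dynamic thresholding I: nearest-rank percentile of the green channel."""
    ordered = sorted(green)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def max_threshold(green):
    """Dynamic thresholding II: maximum green intensity."""
    return max(green)


def binarize(values, threshold):
    """Mark pixels at or above the threshold as bright (1), others as dark (0)."""
    return [1 if v >= threshold else 0 for v in values]
```

With the maximum-intensity rule only the very brightest pixels survive binarization, while the 90th percentile rule always keeps roughly the top tenth of the pixels, whatever the image content.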
Otsu Threshold
Image 1 Image 2 Image 3
Image 4 = Otsu(Image 1)
Image 5 = Otsu(Image 2)
Image 6 = Otsu(Image 3)
Thresholding Techniques Comparison
Image 1: Original image
Image 2: Otsu thresholding
Image 3: 90th percentile threshold
Image 4: Max green threshold
Intensity Features
Background region in the original image
Image 1: Original image resized (Img1) Image 2: Thresholded image (Img2)
Image 3: Img1 AND Img2 Image 4: Img1 AND (Img2)^c, where (Img2)^c is the complement of Img2
Intensity features
Threshold intensity
Bright pixel count (n)
Average intensity in bright region (μ_f)
Standard deviation of intensity in bright region (σ_f)
Average intensity in dark region (μ_b)
Standard deviation of intensity in dark region (σ_b)
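A minimal sketch of how these six intensity features could be computed from a grayscale image and a threshold. The function name `intensity_features` and the dictionary keys are hypothetical; it assumes bright pixels are those at or above the threshold, uses the population standard deviation, and reports 0.0 for an empty region.

```python
from statistics import fmean, pstdev


def intensity_features(gray, threshold):
    """Compute the intensity features for a flat list of gray values.

    Assumptions (not stated in the slides): bright means >= threshold,
    standard deviation is the population form, empty regions yield 0.0.
    """
    bright = [v for v in gray if v >= threshold]
    dark = [v for v in gray if v < threshold]

    def mean(region):
        return fmean(region) if region else 0.0

    def std(region):
        return pstdev(region) if region else 0.0

    return {
        "threshold": threshold,
        "bright_count": len(bright),   # n
        "bright_mean": mean(bright),   # average intensity, bright region
        "bright_std": std(bright),     # standard deviation, bright region
        "dark_mean": mean(dark),       # average intensity, dark region
        "dark_std": std(dark),         # standard deviation, dark region
    }
```

The bright and dark lists here play the role of the two masked images (Img1 AND Img2, and Img1 AND the complement of Img2).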
Region/Blob Features
Image 1: Original image
Image 2 = Binary(Image 1)
Image 3 = Skeleton(Image 2)
Image 4: Connected regions shown in different colors
Largest Blob (R1) R2 R3 R4
Extracted blobs
Region/Blob Features
Number of blobs
Let R1 denote the largest blob
Area(R1)
Boundary pixel count in R1
Fullness – number of white pixels in R1 / Area(R1)
Measure of boundary smoothness of R1
Variance of boundary smoothness of R1
Measure of symmetry of R1 along the X and Y axes
Let R2, R3, R4 and R5 be the 4 largest blobs excluding R1
Average area
Average fullness
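Some of the blob features above can be sketched with a simple connected-component search. This is an illustrative sketch, not the presented implementation: it assumes 4-connectivity, takes Area(R) to mean the bounding-box area of the blob (so that fullness is a ratio between 0 and 1), and counts a pixel as boundary when any of its 4 neighbours lies outside the blob. The smoothness and symmetry measures are omitted because the slides do not define them. All names are hypothetical.

```python
from collections import deque


def _neighbors4(y, x):
    return ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))


def find_blobs(binary):
    """Return connected regions of 1-pixels, largest first (4-connectivity)."""
    if not binary:
        return []
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in _neighbors4(y, x):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    blobs.sort(key=len, reverse=True)
    return blobs


def bbox_area(blob):
    """Bounding-box area; used here as Area(R), which is an assumption."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)


def fullness(blob):
    """White pixel count divided by the (assumed) bounding-box area."""
    return len(blob) / bbox_area(blob)


def boundary_pixel_count(blob):
    """Count blob pixels with at least one 4-neighbour outside the blob."""
    members = set(blob)
    return sum(
        1 for y, x in blob
        if any(n not in members for n in _neighbors4(y, x))
    )


def blob_features(binary):
    blobs = find_blobs(binary)
    if not blobs:
        return {"blob_count": 0}
    r1, others = blobs[0], blobs[1:5]  # R1, then up to four next-largest blobs
    return {
        "blob_count": len(blobs),
        "r1_area": bbox_area(r1),
        "r1_boundary_pixels": boundary_pixel_count(r1),
        "r1_fullness": fullness(r1),
        "others_avg_area": (sum(map(bbox_area, others)) / len(others)) if others else 0.0,
        "others_avg_fullness": (sum(map(fullness, others)) / len(others)) if others else 0.0,
    }
```

A filled rectangle has fullness 1.0 under this definition, while a thin diagonal needle fills only a small fraction of its bounding box.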
Dataset
Category        No. of images   Percentage
Non-crystals    1514            67.3%
Likely leads    404             18.0%
Crystals        332             14.8%
Total images    2250
Experimental Results
Confusion matrix (rows: actual class; columns: observed class)
Actual class      Non-crystals   Likely leads   Crystals   Actual total   Accuracy
Non-crystals      1467           43             4          1514           96.9%
Likely leads      42             317            45         404            78.5%
Crystals          7              68             257        332            77.4%
Observed total    1516           428            306        2250           90.7%
Classifier – Multilayer Perceptron Neural Network
Testing – 10-fold cross validation
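The accuracy column follows directly from the confusion matrix: each per-class accuracy is the diagonal entry divided by the row (actual) total, and the overall accuracy is the sum of the diagonal divided by the number of images. A short sketch that recomputes these figures from the counts above (the function names are illustrative):

```python
CLASSES = ["Non-crystals", "Likely leads", "Crystals"]

# Rows are actual classes, columns are observed (predicted) classes.
CONFUSION = [
    [1467, 43, 4],
    [42, 317, 45],
    [7, 68, 257],
]


def per_class_accuracy(matrix):
    """Fraction of each actual class that was assigned to the correct class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified images divided by all images."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def observed_totals(matrix):
    """Column sums: how many images were assigned to each class."""
    return [sum(col) for col in zip(*matrix)]


for name, acc in zip(CLASSES, per_class_accuracy(CONFUSION)):
    print(f"{name}: {100 * acc:.1f}%")
print(f"Overall: {100 * overall_accuracy(CONFUSION):.1f}%")
```

The overall figure works out to 2041 correct out of 2250 images.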
Other Classification Techniques
Max class ensemble method
Uses multiple classifiers with different feature combinations
Assigned class is the maximum predicted class across all the classifiers
Decreases false negatives but increases false positives
Exhaustive binary classifiers
Solves the multiclass problem using all possible binary classifiers
For 3 classes – number of binary classifiers = 6
Overall accuracy around 82%
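The max class ensemble rule can be sketched in a few lines. This sketch assumes that "maximum" refers to the ordering non-crystals < likely leads < crystals, which the slides do not state explicitly; the names `CLASS_ORDER` and `max_class` are hypothetical.

```python
# Assumed ordering of the three categories, from least to most crystal-like.
CLASS_ORDER = ["Non-crystals", "Likely leads", "Crystals"]
RANK = {name: i for i, name in enumerate(CLASS_ORDER)}


def max_class(predictions):
    """Return the highest-ranked class predicted by any classifier."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return max(predictions, key=RANK.__getitem__)
```

Under this rule a single classifier voting "Crystals" is enough to label the image a crystal, which is consistent with fewer missed crystals (false negatives) at the cost of more false positives.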
Future Work
Classify the crystals according to crystal morphology
Track the temporal evolution of the crystals
Extract other relevant image features and improve accuracy
Summary
Intensity is shown to be a simple but useful search parameter to identify crystals
Efficient image processing (3 sec/image)
Classification into 3 categories – non-crystals, likely crystals and clear crystals
Accuracy comparable to other systems
Acknowledgement
Coworkers
Salma Begum
Marc L Pusey
Ramazan Aygun
iExpressGenes inc.