An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC
Grace Dasovich
Robert Kim
Midterm Presentation
August 21 2009
Outline
• Related Work
• Data
• Modeling Approach and Results
  – Similarity Measures
  – Artificial Neural Network
  – Multivariate Linear Regression
• Conclusions
• Future Work
Related Work
• Computer-Aided Diagnosis (CADx) based on low-level image features
  – Armato et al. developed a linear discriminant classifier using features of lung nodules
  – Need to find the relationship between the image features and radiologists' ratings
Related Work
• Image features and the semantic ratings
  – Lung Interpretations
• Barb et al. developed the Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)
• Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings
• Samala et al. used several combinations of image features and the radiologists' ratings to classify nodules
Related Work
– Similarity
  • Li et al. investigated four different methods to compute similarity measures for lung nodules
    – Feature-based
    – Pixel-value difference
    – Cross-correlation
    – ANN
Data: Materials
• LIDC Dataset
• 149 Unique Nodules
  – One slice per nodule: the slice with the largest nodule area
• 9 Semantic Characteristics
  – Calcification and Internal Structure had little variation, so they were not used, leaving 7 characteristics
• 64 Content Features
  – Shape, size, intensity, and texture
Similarity Measures
• Cosine Similarity
• Jeffrey Divergence
• Euclidean Distance
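The slide names the three measures without defining them. Below is a minimal Python sketch using their standard definitions. It assumes cosine similarity is taken between two vectors (for example, two nodules' semantic ratings), Jeffrey divergence between two histograms (for example, the distribution of radiologists' ratings for one characteristic), and Euclidean distance between two content-feature vectors; the exact inputs used in the study are not stated on the slides, and the function names are hypothetical. The Jeffrey divergence follows the form commonly used in image retrieval: d_J(H, K) = Σ_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)], with m_i = (h_i + k_i) / 2.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def jeffrey_divergence(h: Sequence[float], k: Sequence[float]) -> float:
    """Symmetric Jeffrey divergence between two histograms, each normalized to sum to 1."""
    sh, sk = sum(h), sum(k)
    total = 0.0
    for a, b in zip(h, k):
        p, q = a / sh, b / sk
        m = (p + q) / 2  # bin of the mean histogram
        # Empty bins contribute 0 (the limit of x * log(x) as x -> 0).
        if p > 0:
            total += p * math.log(p / m)
        if q > 0:
            total += q * math.log(q / m)
    return total


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Identical histograms give a Jeffrey divergence of 0, and two histograms with no bins in common give 2 ln 2 (about 1.386); cosine similarity is 1 for proportional vectors and 0 for orthogonal ones.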
Similarity Measures
[Figure: Cosine Similarity (y-axis) vs. Euclidean Distance (x-axis)]
Similarity Measures
[Figure: Jeffrey Divergence (y-axis) vs. Euclidean Distance (x-axis)]
Similarity Measures
• Computed feature distance measures
Methods
• Two three-layer ANNs
  – Input (64 neurons), hidden layer (5 neurons), output (1)
  – Input (64 neurons), hidden layer (5 neurons), output (7)
• Input = 64 feature distances
• Output = semantic similarity or difference in semantic ratings
• Hyperbolic tangent activation function, backpropagation algorithm, 200 iterations
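A minimal plain-Python sketch of the networks described above (64 inputs, 5 tanh hidden units, 1 or 7 outputs, trained by backpropagation for 200 iterations). Several details are assumptions not given on the slide: each input is the absolute difference of one normalized content feature between the two nodules (the hypothetical `feature_distances` helper), the output units also use tanh, the loss is squared error, weights are updated after every pair, one "iteration" means one pass over the training pairs, and the learning rate of 0.05 is arbitrary. `TanhMLP` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple


def feature_distances(f_i: Sequence[float], f_j: Sequence[float]) -> List[float]:
    """Hypothetical input encoding: |f_i - f_j| for each normalized content feature."""
    return [abs(a - b) for a, b in zip(f_i, f_j)]


class TanhMLP:
    """Three-layer network: n_in inputs -> n_hidden tanh units -> n_out tanh outputs."""

    def __init__(self, n_in: int = 64, n_hidden: int = 5, n_out: int = 1, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        # Each weight row ends with a bias weight whose input is a constant 1.0.
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        xb = list(x) + [1.0]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = hidden + [1.0]
        out = [math.tanh(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return hidden, out

    def predict(self, x: Sequence[float]) -> List[float]:
        return self.forward(x)[1]

    def train(self, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]],
              iterations: int = 200, lr: float = 0.05) -> None:
        """Online backpropagation of squared error; one iteration = one pass over all pairs."""
        for _ in range(iterations):
            for x, y in zip(xs, ys):
                hidden, out = self.forward(x)
                # Output deltas: (prediction - target) * tanh'(net), where tanh' = 1 - out^2.
                d_out = [(o - t) * (1.0 - o * o) for o, t in zip(out, y)]
                # Hidden deltas, propagated back through w2 before w2 is updated.
                d_hid = [(1.0 - h * h) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                         for j, h in enumerate(hidden)]
                hb = hidden + [1.0]
                for k, d in enumerate(d_out):
                    for j, v in enumerate(hb):
                        self.w2[k][j] -= lr * d * v
                xb = list(x) + [1.0]
                for j, d in enumerate(d_hid):
                    for i, v in enumerate(xb):
                        self.w1[j][i] -= lr * d * v
```

For two nodules with normalized feature vectors `f_i` and `f_j`, `net.predict(feature_distances(f_i, f_j))` returns the network's estimate for that pair, one value per output unit. Because tanh outputs lie in (-1, 1), targets outside that range (such as unscaled Jeffrey divergence values) would need rescaling first.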
Methods
• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
Methods
• ANN with seven outputs
  – 640 random pairs from all 109 nodules
Methods
• Leave-one-out method
  – Cosine similarity, Jeffrey divergence, or difference in semantic ratings used as teaching data
  – An ANN is trained on the entire dataset minus one image pair
  – The pair left out is used for testing
  – The correlation between the radiologists' similarity and the ANN output is calculated
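The leave-one-out procedure above can be sketched as a generic loop: train on every pair except one, predict the held-out pair, repeat for every pair, then correlate the held-out predictions with the radiologists' similarity values. This is a minimal sketch, not the study's code: `fit` is a hypothetical callback that trains any model (for example, an ANN) on the training pairs and returns a prediction function, and Pearson's r is assumed to be the correlation reported.

```python
import math
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(
    inputs: List[Sequence[float]],
    targets: List[float],
    fit: Callable[[List[Sequence[float]], List[float]], Predictor],
) -> Tuple[List[float], float]:
    """Hold out each pair in turn, train on the rest, and correlate held-out predictions with targets."""
    predictions = []
    for i in range(len(inputs)):
        # Training set = every pair except pair i; pair i is used only for testing.
        predict = fit(inputs[:i] + inputs[i + 1:], targets[:i] + targets[i + 1:])
        predictions.append(predict(inputs[i]))
    return predictions, pearson_r(predictions, targets)
```

With 640 pairs this trains 640 separate models, so a fast `fit` matters; any model exposing a train-then-predict step can be plugged in.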
Results
• ANN using 640 random pairs
Results
• ANN using 231 pairs with malignancy rating > 3
Results
• ANN using 496 pairs with area > 122 mm²
Results
• ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)
[Figure: Target (y-axis) vs. Output (x-axis)]
Results
• ANN using 640 random pairs and the Jeffrey divergence with seven semantic ratings
Methods
• Normalization of Features
  – Min-Max Technique
  – Z-Score Technique
• Pair Selection
  – Looked for matches between the k most similar images by semantic similarity and the k most similar images by content similarity
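The two normalization techniques and the top-k matching step can be sketched as follows. The normalization formulas are the standard ones: min-max maps each feature to [0, 1], and z-score subtracts the mean and divides by the standard deviation (the population standard deviation here, which is an assumption). For pair selection, the sketch assumes each image has a row of semantic distances and a row of content distances to every other image, with smaller meaning more similar (a similarity such as cosine would first be converted, for example to 1 - similarity), and it keeps the neighbors that appear in both top-k lists. The exact matching rule used in the study is not given on the slide, and all function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Set


def min_max(column: Sequence[float]) -> List[float]:
    """Rescale one feature column linearly to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def z_score(column: Sequence[float]) -> List[float]:
    """Center one feature column at 0 with unit (population) standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [0.0 if sd == 0 else (v - mu) / sd for v in column]


def top_k(distances: Sequence[float], query: int, k: int) -> Set[int]:
    """Indices of the k images nearest to `query` (smaller distance = more similar)."""
    others = sorted((j for j in range(len(distances)) if j != query),
                    key=lambda j: distances[j])
    return set(others[:k])


def matched_neighbors(semantic: Sequence[Sequence[float]],
                      content: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """For each image, the neighbors found in both its semantic and its content top-k lists."""
    return {q: top_k(semantic[q], q, k) & top_k(content[q], q, k)
            for q in range(len(semantic))}
```

Each normalization function is applied to one feature at a time (a column across all nodules), so features with large raw ranges, such as area, do not dominate the content distances.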
Methods
• Multivariate Regression Analysis
  – Select features with the highest correlation coefficients
  – Feature distance measures
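A minimal sketch of the regression step: rank the feature-distance columns by their absolute Pearson correlation with the semantic similarity, keep the top `n_features`, and fit ordinary least squares with an intercept, reporting R². The ranking rule, the number of features kept, and the `transform` hook (so that d, d², or exp(d) of each feature distance can be compared) are assumptions made for illustration; the slides do not give these details, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_selected_regression(
    dists: List[List[float]],
    target: List[float],
    n_features: int,
    transform: Callable[[float], float] = lambda d: d,
) -> Tuple[List[int], List[float], float]:
    """Return (chosen column indices, [intercept, coef_1, ...], R^2 on the fitted data)."""
    x = [[transform(v) for v in row] for row in dists]
    n_cols = len(x[0])
    columns = [[row[j] for row in x] for j in range(n_cols)]
    # Rank candidate features by |r| with the target and keep the strongest ones.
    ranked = sorted(range(n_cols), key=lambda j: abs(pearson_r(columns[j], target)),
                    reverse=True)
    chosen = ranked[:n_features]
    design = [[1.0] + [row[j] for j in chosen] for row in x]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in design]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return chosen, beta, 1.0 - ss_res / ss_tot
```

Passing `transform=lambda d: d ** 2` or `transform=math.exp` fits the same model on d² or exp(d). Note that R² here is measured on the same pairs used to select the features and fit the coefficients.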
Methods
• Nodule Analysis
  – Determine differences between selected and non-selected nodules
  – Define requirements for our model
Results
[Figure: Correlation vs. Threshold; Number of Pairs vs. Threshold (threshold from 0 to 20)]
Results
Measure   d(i, j)   d²(i, j)   exp(d(i, j))
Cosine    0.871     0.849      0.866
Jeffrey   0.647     0.633      0.608
Results
Correlation Coefficient   Feature
0.1175                    Equivalent Diameter
0.1085                    Energy (Haralick)
0.0823                    Gabor Mean 135_05
0.0647                    Convex Area
0.0467                    Gabor STD 135_04
0.0322                    Min Intensity BG
0.0295                    Markov 4
0.0280                    Variance (Haralick)
0.0265                    Gabor STD 45_05
0.0238                    SD Intensity
R² = 0.871
Results
[Figure: Semantic (y-axis) vs. Content (x-axis)]
Results
[Figure: one panel per semantic characteristic, ratings 1 to 5: Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture; legend: 79 Nodules vs. 70 Nodules]
Results
[Figure: one panel per feature: Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, SD Intensity; legend: 79 nodules vs. 70 nodules]
Results
[Figure: panels A to D; legend: 79 Nodules vs. 70 Nodules]
A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety
Conclusions: Preliminary Issues
• The ANN is also not yet sufficient to predict semantic similarity from content
  – Best correlation: 0.438
  – Malignancy correlation: 0.521
  – Jeffrey divergence performed better, unlike in the linear model
• A semantic gap still exists
Conclusions
• Our linear model applies to a specific type of nodule
  – Characteristics: High malignancy, high texture, low lobulation, and low spiculation
  – Features: Larger diameter, greater intensity
• Linear models are not sufficient for determining similarity
  – R² of 0.871 with the chosen nodules
Future Work
• Reduce variability among radiologists
  – Use only nodules on which the radiologists agree
• Find the best combination of content features
  – 64 may be too many
  – Currently only using 2D features
Future Work
• Different semantic distance measures
  – Some ratings are ordinal, but Jeffrey divergence is designed for categorical data
• Different methods of machine learning
  – Incorporate radiologists' feedback into training
  – Ensemble of classifiers
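The point that some ratings are ordinal can be illustrated with a small example. Jeffrey divergence compares histogram bins one by one, so it cannot tell a one-step disagreement from a four-step one; a distance that accumulates differences along the ordered rating scale can. The one-dimensional earth mover's distance below is used purely as an illustration (the slides do not name a replacement measure), and both functions are hypothetical sketches over 5-bin rating histograms.

```python
import math
from typing import Sequence


def jeffrey_divergence(h: Sequence[float], k: Sequence[float]) -> float:
    """Bin-by-bin symmetric divergence; ignores how far apart the bins are."""
    sh, sk = sum(h), sum(k)
    total = 0.0
    for a, b in zip(h, k):
        p, q = a / sh, b / sk
        m = (p + q) / 2
        if p > 0:
            total += p * math.log(p / m)
        if q > 0:
            total += q * math.log(q / m)
    return total


def ordinal_emd(h: Sequence[float], k: Sequence[float]) -> float:
    """1D earth mover's distance over ordered rating bins, one unit between adjacent bins."""
    sh, sk = sum(h), sum(k)
    cumulative_gap = 0.0
    total = 0.0
    for a, b in zip(h, k):
        # Mass that still has to be moved past this bin boundary.
        cumulative_gap += a / sh - b / sk
        total += abs(cumulative_gap)
    return total
```

For ratings concentrated at 1 versus 2, and at 1 versus 5 (histograms [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], and [0, 0, 0, 0, 1]), the Jeffrey divergence is 2 ln 2 in both cases, while the earth mover's distance is 1 and 4, respectively.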
Thanks for Listening
Any Questions?