project 3 mushrooms
Data Mining a Mushroom Dataset
Raymond Borges, Jarilyn Hernandez
Project III
Outline: Background, Introduction, Hypotheses, Methodology, Results, Conclusions, Future Work
Background: Previous Work
Hypothetical examples of 23 species from Agaricus and Lepiota families
Class attribute: Edibility
Edible (4,208) 51.8%
Poisonous (3,916) 48.2%
The Mushroom Dataset
Data Set Characteristics: Multivariate
Number of Instances: 8124
Area: Life
Attribute Characteristics: Categorical
Number of Attributes: 22
Date Donated: 1987
Benchmark ruleset
1. Odor = not (almond or anise or none) (120 poisonous cases missed, 98.52% accuracy)
2. Spore-print-color = green (48 cases missed, 99.41% accuracy)
3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy)
4. Habitat = leaves and cap-color = white, or
4. Population = clustered and cap-color = white (100% accuracy)
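The benchmark rules above can be sketched as a single predicate. This is a minimal illustration, not the original authors' code: it assumes the rules are alternative tests for the poisonous class (as the "poisonous cases missed" counts suggest), and the dictionary keys and the function name `is_poisonous` are hypothetical.

```python
# Odors that rule 1 treats as NOT indicating a poisonous mushroom.
SAFE_ODORS = {"almond", "anise", "none"}


def is_poisonous(m):
    """Return True if any benchmark rule fires for mushroom record `m`.

    `m` is a dict with hypothetical keys named after the slide's attributes.
    """
    # Rule 1: odor is anything other than almond, anise, or none.
    if m["odor"] not in SAFE_ODORS:
        return True
    # Rule 2: green spore print.
    if m["spore_print_color"] == "green":
        return True
    # Rule 3: no odor, scaly stalk below the ring, stalk above the ring not brown.
    if (m["odor"] == "none"
            and m["stalk_surface_below_ring"] == "scaly"
            and m["stalk_color_above_ring"] != "brown"):
        return True
    # Rule 4 (both variants): white cap found on leaves or growing clustered.
    if m["cap_color"] == "white" and (
            m["habitat"] == "leaves" or m["population"] == "clustered"):
        return True
    return False


# Illustrative records (made up, not taken from the dataset).
base = {
    "odor": "almond",
    "spore_print_color": "brown",
    "stalk_surface_below_ring": "smooth",
    "stalk_color_above_ring": "white",
    "cap_color": "brown",
    "habitat": "woods",
    "population": "scattered",
}
foul = dict(base, odor="foul")
white_clustered = dict(base, odor="none", cap_color="white", population="clustered")
```

With these records, `is_poisonous(base)` is False, while `foul` is caught by rule 1 and `white_clustered` by rule 4.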
The Mushroom Dataset: 22 Attributes
18 Visually on Mushroom
4 Others: 1 Habitat, 1 Population, 1 Bruises, 1 Odor
[Figure: Accuracy with 1R. Bar chart, one bar per attribute, axis 0% to 100%. Attributes as extracted: Veil Color, Ring Number, Stalk Shape, Cap Shape, Cap Surface, Cap Color, Stalk Root, Habitat, Stalk Color Above, Stalk Color Below, Population, Stalk Surface Above, Stalk Surface Below, Ring Type. Bar values as extracted: 52%, 54%, 55%, 56%, 58%, 59%, 65%, 69%, 71%, 71%, 72%, 77%, 77%, 77%.]
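For readers unfamiliar with 1R: it builds a one-attribute rule by mapping each value of that attribute to its majority class, and scores the attribute by how many instances that rule gets right. The sketch below is a minimal illustration on made-up rows, not the tool run behind the chart; the function name `one_r_accuracy` and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def one_r_accuracy(rows, attribute, target="class"):
    """Training-set accuracy of the 1R rule built on a single attribute."""
    # Count class labels separately for each value of the attribute.
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[target]] += 1
    # Each value predicts its majority class, so the number of correct
    # predictions is the sum of the majority counts.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(rows)


# Toy rows (hypothetical, not from the mushroom dataset).
toy = [
    {"odor": "foul", "cap_shape": "convex", "class": "poisonous"},
    {"odor": "foul", "cap_shape": "flat", "class": "poisonous"},
    {"odor": "none", "cap_shape": "convex", "class": "edible"},
    {"odor": "none", "cap_shape": "flat", "class": "edible"},
    {"odor": "almond", "cap_shape": "convex", "class": "edible"},
    {"odor": "none", "cap_shape": "flat", "class": "poisonous"},
]

# Rank attributes by their single-attribute accuracy, best first.
ranking = sorted(
    ["odor", "cap_shape"],
    key=lambda a: one_r_accuracy(toy, a),
    reverse=True,
)
```

On the toy rows, `odor` gets 5 of 6 instances right and `cap_shape` 4 of 6, so `odor` ranks first.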
Visual Attribute ruleset: Only 4 attributes (100% accuracy)
1. Stalk surface above ring = not silky and ring number = not one (79% accuracy, JRIP)
2. Population = not clustered (80% accuracy, J48)
Once retrieved, test these two rules:
3. Odor = not bad (98% accuracy, J48)
4. Spore print color = not green (100% accuracy, J48)
Results
Odor and spore color may be the best attributes statistically, but not in the field
Focused on visual-cue attributes, e.g. habitat, population, cap and stalk
Obtained a more practical classification
Introduction
Project III
Introduction: Taking into account human perception
Based on:
Lighting conditions
Mushroom stage in lifecycle
Humidity
Seasons
Human senses?
Other unknown factors…
Introduction: Some attributes are difficult to discern
Textures, shapes, or colors like: Brown, Chocolate, Buff, Cinnamon
Hypotheses
1. Complex attributes = Higher error probability
2. Human senses + external factors = Big impact
So…
Ruleset will change to approach reality
Some attributes will fare much better than others
Methodology
Methodology
Collect survey responses:
1. Evaluate species in different conditions
2. Measure overall accuracy
3. Weight attributes based on survey performance
Methodology part 1
Take 3 mushroom species: Agaricus abruptibulbus, Agaricus augustus, Lepiota rubrotincta
Place under 2 distinct sets of conditions
Methodology part 2
5 questions per species in each condition
Abruptibulbus under conditions X
Abruptibulbus under conditions Y
Augustus under conditions X
Augustus under conditions Y
Rubrotincta under conditions X
Rubrotincta under conditions Y
Methodology part 3
Design Tutorial (SurveyMonkey.com)
Design Website (Weebly.com)
Get people to take survey (hardest part)
Designed: Flyers, Poster boards, Business cards
Survey at Mountainlair
Methodology part 4
Calculate survey test scores
Calculate species' accuracy variation
Calculate attributes' accuracy variation
Calculate attribute weights
Use data mining tools to find best ruleset
Weighting Methodology
Weight attributes by favoring those with the highest survey accuracy and the lowest variation in responses
Determine accuracy variation by measuring Euclidean distance for correct answers for each specimen in setting A and setting B
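The variation measure can be sketched as follows. This is one possible reading, not the authors' stated formula: it assumes that for a given attribute there is one count of correct answers per specimen in each setting, and that the variation is the Euclidean distance between the two resulting vectors. The function name and the counts are hypothetical.

```python
import math


def accuracy_variation(correct_a, correct_b):
    """Euclidean distance between per-specimen correct-answer counts.

    correct_a[i] and correct_b[i] are the numbers of correct answers for
    specimen i in setting A and setting B (assumed layout).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both settings must cover the same specimens")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(correct_a, correct_b)))


# Hypothetical counts for one attribute across three specimens.
setting_a = [30, 20, 25]
setting_b = [27, 16, 25]
variation = accuracy_variation(setting_a, setting_b)  # sqrt(3**2 + 4**2 + 0**2)
```

Here `variation` is 5.0; an attribute answered identically in both settings would score 0, which is the "lowest variation" case the weighting favors.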
Results
Overall Survey Results
30 questions per survey
15 attributes measured
37 completed surveys
1,110 answered questions
Overall Survey Grades
Highest was 24 out of 30 correct answers
A: 0, B: 1, C: 7, D: 8, F: 14
Results
[Figure: Survey Accuracy per Attribute. Bar chart, axis 0.00% to 100.00%. Attributes as extracted: cap shape, stalk shape, gill size, veil color, stalk root, stalk color above ring, cap surface, ring number, stalk surface above ring, gill color, stalk color below ring, ring type, stalk surface below ring, gill spacing, cap color. Bar values are not recoverable from the extraction.]
[Figure: Attribute Accuracy and Attribute Variation per attribute. Bar chart, axis 0 to 100, two series. Attributes as extracted: gill color, stalk surface above ring, ring type, stalk surface below ring, gill size, stalk color below ring, stalk color above ring, stalk root, gill spacing, cap color, cap surface, cap shape, stalk shape, ring number, veil color. First series as extracted: 13.5, 2.7, 5.4, 5.4, 13.5, 10.8, 64.9, 21.7, 16.2, 5.4, 5.4, 48.7, 18.9, 5.4, 10.8. Second series as extracted: 63.55, 63.55, 73, 78.4, 36.45, 67.6, 59.45, 45.95, 78.4, 81.1, 32.45, 32.45, 33.75, 59.5, 37.8.]
[Figure: Weighted Attributes. Bar chart of weight per attribute, axis 0 to 100. Attributes as extracted: cap color, stalk surface below ring, ring type, gill spacing, stalk surface above ring, stalk color below ring, ring number, gill color, stalk root, veil color, gill size, cap surface, stalk shape, stalk color above ring, cap shape. Weights as extracted: 76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55, 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7.]
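[Figure: J48 Tree, 99.6% classification, 3 attributes: Odor, Spore print color, Stalk surface below ring. Legend: E = Edible, P = Poisonous. Odor branches: almond, creosote, foul, anise, spicy, fishy, musty, none, pungent. Spore print color branches: black, brown, buff, chocolate, green, orange, yellow, white, purple. Stalk surface below ring branches: smooth, silky, fibrous, scaly. The assignment of E/P leaves to branches is not recoverable from the extraction.]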
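[Figure: J48 Tree, 99.9% classification, 4 attributes: Odor, Spore print color, Stalk surface below ring, Ring type. Legend: E = Edible, P = Poisonous. Odor branches: almond, creosote, foul, anise, spicy, fishy, musty, none, pungent. Spore print color branches: black, brown, buff, chocolate, green, orange, yellow, white, purple. Stalk surface below ring branches: smooth, fibrous, scaly, silky. Ring type branches: pendant, cobwebby, evanescent, flaring, large, none, zone, sheathing. The assignment of E/P leaves to branches is not recoverable from the extraction.]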
[Figure: Attribute Accuracy. Accuracy (axis 0 to 90) plotted against Complexity (axis 0 to 13).]
Conclusions
Conclusion
Hypothesis 1: Complex attributes = Higher error probability. False
The more complex the attribute, the more accurately it was actually identified
Fat spheres = Complex attributes, Height = Survey accuracy
Conclusion
Hypothesis 2: Human senses + external factors = Big impact. True
24% change in correctly identifying attributes due to ambient environment conditions
1.2 questions answered incorrectly out of 5 due to ambient environments of mushrooms
Future Work
Evaluate mushroom expertise for increase in mushroom attribute identification accuracy
Measure spore print color and odor in surveys?
Questions?