project 3 mushrooms

Data Mining a Mushroom Dataset Raymond Borges Jarilyn Hernandez Project III

Upload: cs-ncstate

Post on 06-May-2015

TRANSCRIPT

Page 1: Project 3 mushrooms

Data Mining a Mushroom Dataset
Raymond Borges, Jarilyn Hernandez

Project III

Page 2: Project 3 mushrooms

Outline: Background, Introduction, Hypotheses, Methodology, Results, Conclusions, Future Work

Page 3: Project 3 mushrooms

Background: Previous Work

Page 4: Project 3 mushrooms

The Mushroom Dataset

Hypothetical examples of 23 species from the Agaricus and Lepiota families

Class attribute: Edibility
Edible: 4,208 (51.8%)
Poisonous: 3,916 (48.2%)

Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987
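The class percentages follow directly from the two counts on this slide. A minimal sketch of that arithmetic (the dictionary keys and function name are illustrative, not part of the dataset):

```python
# Class counts as given on the slide.
counts = {"edible": 4208, "poisonous": 3916}
total = sum(counts.values())  # should equal the 8124 instances listed


def class_shares(counts):
    """Return each class's share of all instances, in percent, to one decimal."""
    n = sum(counts.values())
    return {label: round(100 * c / n, 1) for label, c in counts.items()}


shares = class_shares(counts)
print(total, shares)
```

The near 52/48 split means a classifier that always answers "edible" would already be right about half the time, which is the baseline the later accuracy figures should be read against.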

Page 5: Project 3 mushrooms

Benchmark ruleset
1. Odor = not (almond or anise or none) (120 poisonous cases missed, 98.52% accuracy)
2. Spore-print-color = green (48 cases missed, 99.41% accuracy)
3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy)
4. Habitat = leaves and cap-color = white, or (alternative rule 4) population = clustered and cap-color = white (100% accuracy)
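The benchmark rules read as a disjunction: a mushroom is flagged poisonous if any rule fires, and each accuracy figure is the share of the 8124 instances still classified correctly after the rules so far. A rough sketch under those assumptions follows; the dictionary keys and value spellings are hypothetical stand-ins for however the attributes are encoded.

```python
TOTAL = 8124  # instances in the dataset


def benchmark_poisonous(m, use_alt_rule4=False):
    """Return True if any benchmark rule flags mushroom `m` (a dict) as poisonous."""
    rule1 = m["odor"] not in {"almond", "anise", "none"}
    rule2 = m["spore-print-color"] == "green"
    rule3 = (m["odor"] == "none"
             and m["stalk-surface-below-ring"] == "scaly"
             and m["stalk-color-above-ring"] != "brown")
    if use_alt_rule4:
        rule4 = m["population"] == "clustered" and m["cap-color"] == "white"
    else:
        rule4 = m["habitat"] == "leaves" and m["cap-color"] == "white"
    return rule1 or rule2 or rule3 or rule4


def accuracy_after_missing(missed, total=TOTAL):
    """Accuracy in percent when `missed` poisonous cases remain unflagged."""
    return round(100 * (1 - missed / total), 2)


# Hypothetical specimens, for illustration only.
foul = {"odor": "foul", "spore-print-color": "brown",
        "stalk-surface-below-ring": "smooth", "stalk-color-above-ring": "white",
        "habitat": "woods", "population": "several", "cap-color": "brown"}
plain = dict(foul, odor="none")

print(benchmark_poisonous(foul), benchmark_poisonous(plain))
print([accuracy_after_missing(k) for k in (120, 48, 8, 0)])
```

The second print reproduces the four accuracy figures quoted with the rules, which supports reading them as cumulative.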

Page 6: Project 3 mushrooms

The Mushroom Dataset: 22 Attributes
18 visually on the mushroom
4 others: 1 habitat, 1 population, 1 bruises, 1 odor

Page 7: Project 3 mushrooms

Accuracy with 1R
[Bar chart, axis 0% to 100%]
Veil Color: 52%
Ring Number: 54%
Stalk Shape: 55%
Cap Shape: 56%
Cap Surface: 58%
Cap Color: 59%
Stalk Root: 65%
Habitat: 69%
Stalk Color Above: 71%
Stalk Color Below: 71%
Population: 72%
Stalk Surface Above: 77%
Stalk Surface Below: 77%
Ring Type: 77%
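For readers unfamiliar with 1R: it builds a one-attribute rule by mapping each value of that attribute to the majority class among the instances with that value, then reports how many instances the rule gets right. A small pure-Python sketch of the idea follows (not Weka's implementation; the toy rows are hypothetical and only show the mechanics).

```python
from collections import Counter, defaultdict


def one_r(rows, attribute, target):
    """Build a 1R rule for `attribute`: each value predicts its majority class.

    Returns (rule, accuracy on `rows`).
    """
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attribute]][row[target]] += 1
    rule = {value: classes.most_common(1)[0][0] for value, classes in by_value.items()}
    correct = sum(classes[rule[value]] for value, classes in by_value.items())
    return rule, correct / len(rows)


# Hypothetical toy data: "e" = edible, "p" = poisonous.
rows = [
    {"ring-type": "pendant", "veil-color": "white", "class": "e"},
    {"ring-type": "pendant", "veil-color": "white", "class": "e"},
    {"ring-type": "pendant", "veil-color": "white", "class": "p"},
    {"ring-type": "large", "veil-color": "white", "class": "p"},
    {"ring-type": "large", "veil-color": "white", "class": "p"},
    {"ring-type": "evanescent", "veil-color": "brown", "class": "e"},
]

ring_rule, ring_acc = one_r(rows, "ring-type", "class")
veil_rule, veil_acc = one_r(rows, "veil-color", "class")
print(ring_rule, ring_acc)
print(veil_rule, veil_acc)
```

Running the same function once per attribute and sorting by accuracy gives a ranking of the kind shown in the chart.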

Page 8: Project 3 mushrooms

Visual Attribute ruleset: only 4 attributes (100% accuracy)
1. Stalk surface above ring = not silky and ring number = not one (79% accuracy, JRIP)
2. Population = not clustered (80% accuracy, J48)

Once the mushroom is retrieved, test these two rules:
3. Odor = not bad (98% accuracy, J48)
4. Spore print color = not green (100%, J48)

Page 9: Project 3 mushrooms

Results
Odor and spore color may be the best attributes statistically, but not in the field
Focused on visual-cue attributes, e.g. habitat, population, cap and stalk
Obtained a more practical classification

Page 10: Project 3 mushrooms

Introduction: Project III

Page 11: Project 3 mushrooms

Introduction: Taking into account human perception
Based on:
Lighting conditions
Mushroom stage in lifecycle
Humidity
Seasons
Human senses?
Other unknown factors…

Page 12: Project 3 mushrooms

Introduction: Some attributes are difficult to discern
Textures, shapes, or colors like: Brown, Chocolate, Buff, Cinnamon

Page 13: Project 3 mushrooms

Hypotheses
1. Complex attributes = Higher error probability
2. Human senses + external factors = Big impact

So…
The ruleset will change to approach reality
Some attributes will fare much better than others

Page 14: Project 3 mushrooms

Methodology

Page 15: Project 3 mushrooms

Methodology
Collect survey responses:

1. Evaluate species in different conditions

2. Measure overall accuracy

3. Weight attributes based on survey performance

Page 16: Project 3 mushrooms

Methodology part 1
Take 3 mushroom species:
Agaricus abruptibulbus
Agaricus augustus
Lepiota rubrotincta

Place each under 2 distinct sets of conditions

Page 17: Project 3 mushrooms

Methodology part 2
5 questions per species in each condition

Abruptibulbus under conditions X
Abruptibulbus under conditions Y
Augustus under conditions X
Augustus under conditions Y
Rubrotincta under conditions X
Rubrotincta under conditions Y
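The survey layout above can be enumerated as a small sketch: three species, two conditions, five questions in each combination. The variable names are illustrative; the total of 30 matches the number of questions per survey reported in the results.

```python
from itertools import product

SPECIES = ["Agaricus abruptibulbus", "Agaricus augustus", "Lepiota rubrotincta"]
CONDITIONS = ["X", "Y"]
QUESTIONS_PER_SLOT = 5

# One slot per (species, condition) pair, then five numbered questions per slot.
slots = list(product(SPECIES, CONDITIONS))
questions = [(species, condition, number)
             for species, condition in slots
             for number in range(1, QUESTIONS_PER_SLOT + 1)]

print(len(slots), len(questions))
```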

Page 18: Project 3 mushrooms

Methodology part 3
Design tutorial (SurveyMonkey.com)
Design website (Weebly.com)
Get people to take the survey (hardest part):
Designed flyers
Poster boards
Business cards

Page 19: Project 3 mushrooms
Page 20: Project 3 mushrooms

Survey at Mountainlair

Page 21: Project 3 mushrooms

Survey at Mountainlair

Page 22: Project 3 mushrooms

Methodology part 4
Calculate survey test scores
Calculate species' accuracy variation
Calculate attributes' accuracy variation
Calculate attribute weights
Use data mining tools to find the best ruleset

Page 23: Project 3 mushrooms

Weighting Methodology
Attributes are weighted so that those with the highest survey accuracy and the lowest variation in responses rank first

Accuracy variation is determined by measuring the Euclidean distance between the correct answers for each specimen in setting A and in setting B
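The slides do not spell out the computation, so the following is only one plausible reading: express the correct answers per specimen as a percentage in each setting, take the mean as the attribute's accuracy, and take the Euclidean distance between the two settings as its variation. When an attribute is asked about a single specimen, that distance reduces to the absolute difference. The function names and counts below are hypothetical.

```python
import math


def percent_correct(correct, respondents):
    """Share of respondents who answered correctly, in percent."""
    return 100 * correct / respondents


def accuracy_and_variation(correct_a, correct_b, respondents):
    """correct_a / correct_b: per-specimen counts of correct answers in setting A / B.

    Returns (mean accuracy in percent, Euclidean distance between the settings).
    """
    pct_a = [percent_correct(c, respondents) for c in correct_a]
    pct_b = [percent_correct(c, respondents) for c in correct_b]
    accuracy = sum(pct_a + pct_b) / (len(pct_a) + len(pct_b))
    variation = math.dist(pct_a, pct_b)
    return accuracy, variation


# Hypothetical counts: 34 of 37 correct in setting A, 10 of 37 in setting B.
acc, var = accuracy_and_variation([34], [10], 37)
print(round(acc, 2), round(var, 2))
```

A large distance means the same attribute was judged very differently once the conditions changed, which is what the weighting is meant to penalize.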

Page 24: Project 3 mushrooms

Results

Page 25: Project 3 mushrooms

Overall Survey Results
30 questions per survey
15 attributes measured
37 completed surveys
1,110 answered questions

Overall survey grades:
A: 0
B: 1
C: 7
D: 8
F: 14

Highest was 24 out of 30 correct answers
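The headline numbers can be checked with a few lines. The grading scale is not stated on the slide; the sketch below assumes the common 90/80/70/60 percent cutoffs, which is an assumption and may differ from the scale the authors used.

```python
def letter_grade(correct, total=30):
    """Map a raw score to a letter using assumed 90/80/70/60 percent cutoffs."""
    pct = 100 * correct / total
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if pct >= cutoff:
            return letter
    return "F"


answered = 37 * 30       # completed surveys times questions per survey
best = letter_grade(24)  # the highest score reported

print(answered, best)
```

Under that assumed scale the top score of 24 out of 30 is exactly 80%, a B.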

Page 26: Project 3 mushrooms

Results

[Bar chart: Survey Accuracy per Attribute, axis 0% to 100%. Attributes shown: cap shape, stalk shape, gill size, veil color, stalk root, stalk color above ring, cap surface, ring number, stalk surface above ring, gill color, stalk color below ring, ring type, stalk surface below ring, gill spacing, cap color]

Page 27: Project 3 mushrooms

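Attribute Accuracy and Attribute Variation
[Bar chart, axis 0 to 100]
gill color: accuracy 63.55, variation 13.5
stalk surface above ring: accuracy 63.55, variation 2.7
ring type: accuracy 73, variation 5.4
stalk surface below ring: accuracy 78.4, variation 5.4
gill size: accuracy 36.45, variation 13.5
stalk color below ring: accuracy 67.6, variation 10.8
stalk color above ring: accuracy 59.45, variation 64.9
stalk root: accuracy 45.95, variation 21.7
gill spacing: accuracy 78.4, variation 16.2
cap color: accuracy 81.1, variation 5.4
cap surface: accuracy 32.45, variation 5.4
cap shape: accuracy 32.45, variation 48.7
stalk shape: accuracy 33.75, variation 18.9
ring number: accuracy 59.5, variation 5.4
veil color: accuracy 37.8, variation 10.8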

Page 28: Project 3 mushrooms

Weighted Attributes
[Bar chart, y-axis: Weight, 0 to 100]
cap color: 76.7
stalk surface below ring: 74.2
ring type: 69
gill spacing: 65.7
stalk surface above ring: 61.8
stalk color below ring: 60.3
ring number: 56.3
gill color: 55
stalk root: 36
veil color: 33.7
gill size: 31.5
cap surface: 30.7
stalk shape: 27.4
stalk color above ring: 20.9
cap shape: 16.7
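The slides do not state how the weights were computed. One formula that appears to reproduce them is accuracy discounted by the variation fraction, weight = accuracy × (1 − variation / 100). This is an inference from the numbers, not the authors' stated method. The sketch below pairs the accuracy and variation values read from the Attribute Accuracy / Attribute Variation chart with the weights from this chart and checks how closely they agree.

```python
# (accuracy %, variation %) per attribute, as read from the accuracy/variation chart.
SURVEY = {
    "cap color": (81.1, 5.4),
    "stalk surface below ring": (78.4, 5.4),
    "ring type": (73, 5.4),
    "gill spacing": (78.4, 16.2),
    "stalk surface above ring": (63.55, 2.7),
    "stalk color below ring": (67.6, 10.8),
    "ring number": (59.5, 5.4),
    "gill color": (63.55, 13.5),
    "stalk root": (45.95, 21.7),
    "veil color": (37.8, 10.8),
    "gill size": (36.45, 13.5),
    "cap surface": (32.45, 5.4),
    "stalk shape": (33.75, 18.9),
    "stalk color above ring": (59.45, 64.9),
    "cap shape": (32.45, 48.7),
}

# Weights as printed on the Weighted Attributes chart, in chart order.
SLIDE_WEIGHTS = {
    "cap color": 76.7, "stalk surface below ring": 74.2, "ring type": 69,
    "gill spacing": 65.7, "stalk surface above ring": 61.8,
    "stalk color below ring": 60.3, "ring number": 56.3, "gill color": 55,
    "stalk root": 36, "veil color": 33.7, "gill size": 31.5,
    "cap surface": 30.7, "stalk shape": 27.4,
    "stalk color above ring": 20.9, "cap shape": 16.7,
}


def weight(accuracy, variation):
    """Assumed formula: accuracy discounted by the variation fraction."""
    return accuracy * (1 - variation / 100)


ranked = sorted(SURVEY, key=lambda a: weight(*SURVEY[a]), reverse=True)
max_gap = max(abs(weight(*SURVEY[a]) - SLIDE_WEIGHTS[a]) for a in SURVEY)
print(ranked[:3], round(max_gap, 3))
```

Under this assumption the ranking matches the chart order and every weight agrees to within rounding, so an attribute such as stalk color above ring drops sharply despite middling accuracy because its answers varied so much between settings.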

Page 29: Project 3 mushrooms

J48 Tree: 99.6% classification, 3 attributes
E = Edible, P = Poisonous
[Tree diagram. Splits: Odor (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy); Spore print color (black, brown, buff, chocolate, green, orange, purple, white, yellow); Stalk surface below ring (fibrous, scaly, silky, smooth). Each leaf is labeled E or P.]

Page 30: Project 3 mushrooms

J48 Tree: 99.9% classification, 4 attributes
E = Edible, P = Poisonous
[Tree diagram. Splits: Odor (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy); Spore print color (black, brown, buff, chocolate, green, orange, purple, white, yellow); Stalk surface below ring (fibrous, scaly, silky, smooth); Ring type (cobwebby, evanescent, flaring, large, none, pendant, sheathing, zone). Each leaf is labeled E or P.]
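J48 is Weka's implementation of the C4.5 decision tree learner, which chooses each split by gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution. The sketch below shows that criterion in plain Python on hypothetical toy rows; it is not Weka's code and omits pruning and the other refinements C4.5 applies.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by its split information."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: odor separates the classes perfectly, cap color not at all.
rows = [
    {"odor": "foul", "cap-color": "white", "class": "p"},
    {"odor": "foul", "cap-color": "brown", "class": "p"},
    {"odor": "none", "cap-color": "white", "class": "e"},
    {"odor": "none", "cap-color": "brown", "class": "e"},
]

odor_ratio = gain_ratio(rows, "odor", "class")
cap_ratio = gain_ratio(rows, "cap-color", "class")
print(odor_ratio, cap_ratio)
```

An attribute that cleanly separates the classes, as odor does here, scores highest and becomes the root, which is consistent with Odor being the first split listed for both trees.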

Page 31: Project 3 mushrooms

[Chart: Attribute Accuracy. x-axis: Complexity (0 to 13); y-axis: Accuracy (0 to 90)]

Page 32: Project 3 mushrooms

Conclusions

Page 33: Project 3 mushrooms

Conclusion
Hypothesis 1 (Complex attributes = Higher error probability): False

Respondents were actually more accurate the more complex the attribute

Chart legend: fat spheres = complex attributes, height = survey accuracy

Page 34: Project 3 mushrooms

Conclusion
Hypothesis 2 (Human senses + external factors = Big impact): True

24% change in correctly identifying attributes due to ambient environmental conditions

1.2 questions out of 5 answered incorrectly due to the mushrooms' ambient environment

Page 35: Project 3 mushrooms

Future Work
Evaluate whether mushroom expertise increases attribute identification accuracy

Measure spore print color and odor in surveys?

Page 36: Project 3 mushrooms

Questions?