USING A HIERARCHY IN PLACE OF FLAT DATA FOR SEQUENTIAL
PATTERN MINING
Matt Ramsey
100063967
Supervisor: Jan Stanek
Minor Thesis Presentation
PRESENTATION STRUCTURE
- Introduction
- Concepts
- Research Question
- Methodology
- Results
- Conclusion
- Discussion
INTRODUCTION
What the project is: performing sequential pattern mining on patient prescription data using the ATC drug hierarchy
Some important concepts:
- Sequential pattern mining
- ATC drug hierarchy
CONCEPT 1 - SEQUENTIAL PATTERN MINING
Sequential pattern mining tries to find relationships between occurrences of sequential events, to determine whether the occurrences follow any specific order (Zhao, Q & Bhowmick, S 2003).
CONCEPT 3 - ATC CODES
Anatomical Therapeutic Chemical (ATC): a drug classification that creates distinct groups of drugs
E.g. Propicillin (J01CE03)
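As a rough illustration of how one ATC code carries all five hierarchy levels, the sketch below truncates a full code to a chosen level. The prefix lengths (1, 3, 4, 5 and 7 characters) follow the standard ATC layout; the function and constant names are made up for this example and are not from the thesis.

```python
# Prefix lengths of the five ATC levels, e.g. J | J01 | J01C | J01CE | J01CE03
ATC_LEVEL_LENGTHS = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_at_level(code: str, level: int) -> str:
    """Truncate a full 7-character ATC code to the given hierarchy level.

    Hypothetical helper: the name and signature are assumptions.
    """
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return code[:ATC_LEVEL_LENGTHS[level]]


# Walk Propicillin's code from the most specific level to the most general.
for level in range(5, 0, -1):
    print(level, atc_at_level("J01CE03", level))
```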
RESEARCH QUESTION
How will the use of a hierarchy affect the success of the sequential pattern mining?
THEORY
The theory: there will be more patterns, but they may experience “contamination” and “dilution”
Contamination
- Occurs when a level of the hierarchy that is too high is used: it contains too many unrelated subgroups
- E.g. if we have a pattern at level 2 and we move up to pattern mine at level 1, we still have that pattern, but the meaning might be lost
- The pattern is contaminated
Dilution
- Occurs when a level of the hierarchy that is too low is used, meaning the information becomes too specific
- E.g. at level 5 of the ATC hierarchy there may be too many possible states (prescriptions) for strong patterns to emerge
- The pattern is diluted by the level of detail
METHODOLOGY – PRE-PROCESSING
- Remove prescriptions with no drug recorded
- Calculate Duration = Packsize * (Repeats + 1) / Dose
- Convert drug names to ATC codes

Raw prescription record:

| Patient Key | Provider Key | Date | Script number | Drug Name | Dose | Packsize | Repeats |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A2rJFF8mDe | Amrr6MjVUS | 22/04/2008 | A006507 | Propicillin | 1 daily | 20 | 2 |

Pre-processed record:

| Patient Key | Date | Drug Code | Duration (days) |
| --- | --- | --- | --- |
| A2rJFF8mDe | 22/04/2008 | J01CE03 | 60 |

Stored as:
^Prescription(1)="A2rJFF8mDe,Amrr6MjVUS,22/04/2008,A006507,Propicillin,1 daily,20,2,Tablets,....."
^Prescription(1,"ATC Code")="J01CE03"
^Prescription(1,"Patient ID")="A2rJFF8mDe"
^Prescription(1,"Prescription Duration")=60
^Prescription(1,"Script Date")="22/04/2008"
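The duration formula can be checked against the example record with a short sketch. It assumes the dose string always begins with the number of units taken per day (as in "1 daily"); how the thesis actually parsed doses is not stated, and the function names are invented for illustration.

```python
import re


def parse_daily_dose(dose: str) -> float:
    """Read the leading number from a dose string such as '1 daily'.

    Assumption: the dose always starts with the units taken per day.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)", dose)
    if not match:
        raise ValueError(f"cannot parse dose {dose!r}")
    return float(match.group(1))


def prescription_duration(pack_size: int, repeats: int, dose: str) -> float:
    """Duration in days = Packsize * (Repeats + 1) / Dose."""
    return pack_size * (repeats + 1) / parse_daily_dose(dose)


# Example record from the slide: pack size 20, 2 repeats, 1 daily.
print(prescription_duration(20, 2, "1 daily"))  # 60.0
```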
METHODOLOGY – SEQUENTIAL PATTERN MINING
- There are many sequential pattern mining algorithms, but the majority are overly complex or specific to multi-dimensional data, time-sensitive data, etc.
- We want a simple method for a proof of concept
- There are some problems with existing “simple” algorithms such as AprioriAll (Agrawal & Srikant 1995) and PrefixSpan (Pei et al. 2004)
METHODOLOGY – PROBLEM 1: PATTERNS WITHIN SEQUENCES
Patterns: A -> B
SPM with Min Support: 3
METHODOLOGY – PROBLEM 1: PATTERNS WITHIN SEQUENCES
Patterns: ---
SPM with Min Support: 3
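The diagrams for these two slides did not survive, so the sketch below is only one reading of the problem: classic support counts each sequence at most once, so a pattern repeated inside a single sequence never reaches the threshold. The data and function names are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern's items appear in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each later item must be found after the previous one.
    return all(item in remaining for item in pattern)


def sequence_support(sequences, pattern):
    """Classic support: how many sequences contain the pattern at least once."""
    return sum(contains_in_order(seq, pattern) for seq in sequences)


# Hypothetical data: A -> B across three sequences vs. three times in one.
across = [["A", "B"], ["A", "B"], ["A", "B"]]
within = [["A", "B", "A", "B", "A", "B"]]

print(sequence_support(across, ["A", "B"]))  # 3, meets Min Support 3
print(sequence_support(within, ["A", "B"]))  # 1, pattern is missed
```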
METHODOLOGY – PROBLEM 2: TRANSITIVE PATTERNS
SPM with Min Support: 3
Patterns: A -> C
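A possible illustration of the transitive effect, assuming it arises because subsequence matching allows gaps: in three hypothetical sequences A, B, C the pair A -> C reaches the support threshold even though C never directly follows A. The slide's original figure is missing, so the data here is invented.

```python
def gapped_pairs(sequence):
    """All ordered pairs (x, y) where x occurs before y, gaps allowed."""
    return {
        (sequence[i], sequence[j])
        for i in range(len(sequence))
        for j in range(i + 1, len(sequence))
    }


def adjacent_pairs(sequence):
    """Only pairs (x, y) where y directly follows x."""
    return set(zip(sequence, sequence[1:]))


# Hypothetical data: the same three-event sequence for three patients.
sequences = [["A", "B", "C"] for _ in range(3)]

gapped_support = sum(("A", "C") in gapped_pairs(s) for s in sequences)
adjacent_support = sum(("A", "C") in adjacent_pairs(s) for s in sequences)

print(gapped_support)    # 3: A -> C is reported as a pattern
print(adjacent_support)  # 0: C never directly follows A
```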
METHODOLOGY – ADDRESSING PROBLEMS
1. Link data with the same Patient Key
2. Break each patient’s sequence up into individual 2-drug sequences
- No transitive effect, because there are no possible gaps
- No missing patterns within patients, as all prescriptions are treated as their own sequence
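The two steps might look like the following sketch. It assumes a "2-drug sequence" is a pair of consecutive prescriptions for the same patient, ordered by script date; the record layout and function name are illustrative and not taken from the thesis.

```python
from collections import defaultdict
from datetime import datetime


def two_drug_sequences(prescriptions):
    """Split each patient's history into consecutive 2-drug sequences.

    prescriptions: iterable of (patient_key, 'dd/mm/yyyy' date, atc_code).
    Returns a list of (patient_key, first_code, second_code).
    """
    # Step 1: link records that share a Patient Key.
    by_patient = defaultdict(list)
    for patient, date, code in prescriptions:
        by_patient[patient].append((datetime.strptime(date, "%d/%m/%Y"), code))

    # Step 2: order by date and pair each prescription with the next one.
    pairs = []
    for patient, scripts in by_patient.items():
        scripts.sort(key=lambda script: script[0])
        codes = [code for _, code in scripts]
        pairs.extend((patient, a, b) for a, b in zip(codes, codes[1:]))
    return pairs


# Hypothetical records, deliberately out of date order.
records = [
    ("p1", "01/03/2008", "A10BA02"),
    ("p1", "01/01/2008", "N02BE01"),
    ("p1", "01/02/2008", "A10BB09"),
    ("p2", "15/05/2008", "C09AA01"),
    ("p2", "20/06/2008", "C10AA01"),
]
for pair in two_drug_sequences(records):
    print(pair)
```

Because only neighbouring prescriptions are paired, a pair such as N02BE01 -> A10BA02 for p1 is never produced.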
METHODOLOGY – PATTERN MINING
- Use 2 support thresholds
  - Minimum Support
  - Minimum Patients
- Pattern mine iteratively:
  - Mine for patterns in level 5 (flat data)
  - Modify the ATC codes (e.g. A01AA01 -> A01AA)
  - Mine for patterns in level 4
  - Modify the ATC codes (e.g. A01AA -> A01A)
  - Etc.
- Reflect on the strength and relevance of the patterns gained
  - How strongly supported the patterns are
  - How meaningful they are
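A minimal sketch of the iterative loop for 2-item patterns, under stated assumptions. The slides do not say how the two thresholds combine; the example output in this deck lists patterns that meet only one of them, so this sketch keeps a pattern when either threshold is met. The function names, input layout and sample pairs are hypothetical.

```python
from collections import defaultdict

# Characters kept at each ATC level (level 5 is the full, flat code).
ATC_PREFIX = {5: 7, 4: 5, 3: 4, 2: 3, 1: 1}


def mine_level(pairs, level, min_support, min_patients):
    """Count 2-item patterns at one level; return {pattern: (times, patients)}.

    pairs: iterable of (patient, first_code, second_code) with full ATC codes.
    Assumption: a pattern is kept if it meets EITHER threshold.
    """
    keep = ATC_PREFIX[level]
    occurrences = defaultdict(int)
    patients = defaultdict(set)
    for patient, first, second in pairs:
        pattern = (first[:keep], second[:keep])
        occurrences[pattern] += 1
        patients[pattern].add(patient)
    return {
        pattern: (occurrences[pattern], len(patients[pattern]))
        for pattern in occurrences
        if occurrences[pattern] >= min_support
        or len(patients[pattern]) >= min_patients
    }


def mine_all_levels(pairs, min_support, min_patients):
    """Mine level 5 first, then re-mine with the codes shortened level by level."""
    return {
        level: mine_level(pairs, level, min_support, min_patients)
        for level in range(5, 0, -1)
    }


# Illustrative pairs only, loosely mirroring counts from the example output.
pairs = (
    [(3, "N05CD07", "N02BE01")] * 8
    + [(1, "A10BA02", "A10BB09")]
    + [(4, "A10BA02", "A10BB09")] * 2
    + [(13, "A10BA02", "A10BB09")] * 2
    + [(2, "C09AA01", "C10AA01")]
)
results = mine_all_levels(pairs, min_support=6, min_patients=3)
for level, patterns in results.items():
    print(level, patterns)
```

Longer patterns (3 items and up) would need an extra step to chain the 2-item patterns together, which is not shown here.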
EXAMPLE PATTERNS WITH MINSUP = 6, MINPAT = 3
LEVEL 5 : 2 ITEM PATTERNS
PATTERN: Temazepam (N05CD07) -> Paracetamol (N02BE01)
PATTERN OCCURS 8 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 49) (3, 52) (3, 56) (3, 58) (3, 65) (3, 67) (3, 80) (3, 100)
PATTERN: Metformin (A10BA02) -> Gliclazide (A10BB09)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 64) (4, 1) (4, 3) (13, 45) (13, 49)
....
LEVEL 4 : 2 ITEM PATTERNS
PATTERN: Biguanides (A10BA) -> Sulfonamides, urea derivatives (A10BB)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 64) (4, 1) (4, 3) (13, 45) (13, 49)
....
RESULTS FOR MINSUP = 6, MINPAT = 3
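| Level | Pattern Length | Unique Patterns | Total Patterns |
| --- | --- | --- | --- |
| 5 | 2 | 8 | 55 |
| 4 | 2 | 11 | 73 |
| 3 | 2 | 15 | 109 |
| 3 | 3 | 1 | 6 |
| 2 | 2 | 20 | 152 |
| 2 | 3 | 1 | 6 |
| 1 | 2 | 31 | 430 |
| 1 | 3 | 25 | 211 |
| 1 | 4 | 11 | 73 |
| 1 | 5 | 2 | 17 |
| 1 | 6 | 1 | 7 |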
Example code at each level of the hierarchy: A01AA01 (level 5), A01AA (level 4), A01A (level 3), A01 (level 2), A (level 1)
INTERESTING PATTERNS DISCOVERED
Only emerges at level 4, and is one of the strongest patterns
Gets diluted at level 5
PATTERN: ACE inhibitors, plain (C09AA) -> HMG CoA reductase inhibitors (C10AA)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
PATTERN: OTHER ANALGESICS AND ANTIPYRETICS (N02B) -> HYPNOTICS AND SEDATIVES (N05C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
Identified as unusual by supervisor Jan Stanek
PATTERN: Metformin (A10BA02) -> Gliclazide (A10BB09)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
PATTERN: Biguanides (A10BA) -> Sulfonamides, urea derivatives (A10BB)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
PATTERN: ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B)
PATTERN OCCURS 19 TIMES IN TOTAL OF 3 PATIENTS
Very vague pattern, is present at lower levels
By higher level, loses its meaning; becomes contaminated
ACHIEVEMENTS
- Created a modified algorithm
  - Finds patterns within sequences as well as across sequences
  - Uses 2 support thresholds
- Discovered more rules than if performing pattern mining on flat data
- Assessed the impact of using a hierarchy for sequential pattern mining
  - YES, using a hierarchy DOES further pattern mining
  - BUT it is up to an expert in the field to assess whether the extra patterns are useful or not
CONCLUSION
- Unique research
  - Using a hierarchy to enhance pattern mining
  - Finding patterns within sequences as well as across all sequences
  - Small data set required
- This research shows the potential importance of using a hierarchy to enhance data mining
- Forms the basis for further research
REFERENCES
Agrawal, R & Srikant, R 1995, 'Mining Sequential Patterns', paper presented at the Eleventh International Conference on Data Engineering.
Pei, J, Han, J, Dayal, U, Mortazavi-Asl, B, Wang, J, Pinto, H, Chen, Q & Hsu, M 2004, 'Mining sequential patterns by pattern-growth: The prefixspan approach', IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11.
Zhao, Q & Bhowmick, S 2003, 'Sequential pattern mining: A survey', Technical Report, CAIS, Nanyang Technological University, Singapore, pp. 1–26.
DISCUSSION