![Page 1: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/1.jpg)
PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems
IEEE Big Data 2014
![Page 2: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Motivation
• Model Structure
• Progressive Learning
• Use Cases
– Automate MS Annotation (Multi-label Classification)
– Latent Semantic Discovery
• Conclusion
![Page 3: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/3.jpg)
Introduction
• A probabilistic graphical model (PGM) consists of a structural model and a set of conditional probabilities.
• Graphical models can be classified into two major categories:
– (1) directed graphical models (Bayesian networks)
– (2) undirected graphical models (Markov Random Fields)
![Page 4: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/4.jpg)
Motivation
[Figure: mass spectrometry hierarchy with levels MS1, MS2 and MS3; node labels GOG1, GOG2, … and Frag1, Frag2, …]
1300 * 2,979,334 = 3,873,134,200
![Page 5: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/5.jpg)
Model Structure
[Figure: three-level graph with count-weighted edges. MS1 level: GOG1, GOG2. MS2 level: F1 to F6. MS3 level: F7 to F11.]
P(GOG1 | F1, F3, F7) = P(GOG1|F1) * P(GOG1|F3) * P(F3|F7) = 50/50 * 20/60 * 10/25 = 2/15 ≈ 0.133
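The product above can be reproduced with a small count-based sketch. It assumes that each conditional probability is the count on an edge divided by the total count on all edges entering the child node, which is consistent with the ratios 50/50, 20/60 and 10/25 on the slide. The class `CountGraph`, its method names, and the parent `F4` used to fill out the denominator for F7 are hypothetical; they are not taken from the presentation.

```python
from collections import defaultdict
from fractions import Fraction


class CountGraph:
    """Edge-count store: P(parent | child) = count(parent, child) / total count into child."""

    def __init__(self):
        self.edge_count = defaultdict(int)   # (parent, child) -> co-occurrence count
        self.child_total = defaultdict(int)  # child -> sum of counts over all its parents

    def observe(self, parent, child, n=1):
        """Record n co-occurrences of child under parent."""
        self.edge_count[(parent, child)] += n
        self.child_total[child] += n

    def cond_prob(self, parent, child):
        """Return P(parent | child) as an exact fraction (0 if child was never seen)."""
        total = self.child_total.get(child, 0)
        if total == 0:
            return Fraction(0)
        return Fraction(self.edge_count.get((parent, child), 0), total)

    def score(self, edges):
        """Multiply P(parent | child) over a list of (parent, child) edges."""
        result = Fraction(1)
        for parent, child in edges:
            result *= self.cond_prob(parent, child)
        return result


g = CountGraph()
g.observe("GOG1", "F1", 50)  # F1 only seen under GOG1 -> 50/50
g.observe("GOG1", "F3", 20)  # F3 seen 20 times under GOG1 ...
g.observe("GOG2", "F3", 40)  # ... and 40 times under GOG2 (assumed split) -> 20/60
g.observe("F3", "F7", 10)    # F7 seen 10 times under F3 ...
g.observe("F4", "F7", 15)    # ... and 15 times under a hypothetical parent F4 -> 10/25

slide_score = g.score([("GOG1", "F1"), ("GOG1", "F3"), ("F3", "F7")])
print(slide_score)  # 2/15
```

Exact fractions are used here only so the result can be compared directly with the hand calculation; floating-point values would work equally well at scale.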
![Page 6: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/6.jpg)
Progressive Learning
• This learning technique is attractive in the big data age for the following reasons:
– Training the model does not require processing all data upfront.
– It can learn from new data without re-including the previous training data.
– Training can be distributed instead of running as one long session.
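The three properties above follow naturally if the model's parameters are plain co-occurrence counts, as the count ratios on the Model Structure slide suggest. The sketch below illustrates this under that assumption; the function `learn` and the sample batches are hypothetical and only serve to show that incremental and sharded training yield the same counts as a single pass.

```python
from collections import Counter


def learn(batch):
    """Count (parent, child) co-occurrences in one batch of observations."""
    counts = Counter()
    for parent, child in batch:
        counts[(parent, child)] += 1
    return counts


# Two hypothetical batches of (parent, child) observations.
batch_1 = [("GOG1", "F1"), ("GOG1", "F3"), ("GOG2", "F3")]
batch_2 = [("GOG2", "F3"), ("GOG1", "F1")]

# Progressive: fold a new batch into an existing model without revisiting batch_1.
model = learn(batch_1)
model.update(learn(batch_2))  # Counter.update adds counts rather than replacing them

# Distributed: train on shards independently, then sum the partial counts.
merged = learn(batch_1) + learn(batch_2)

# Both routes give the same counts as training on all data in one session.
all_at_once = learn(batch_1 + batch_2)
assert model == merged == all_at_once
```

Because addition of counts is associative and commutative, the order in which batches or shards arrive does not affect the final model in this sketch.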
![Page 7: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/7.jpg)
Automate MS Annotation (Multi-label Classification)
• Data set includes:

| Item | Count |
| --- | --- |
| Scan | 1974 |
| Peak | 266571 |
| Edges | 10743 |
| Root | 450 |
| MS2 Fragment Node | 5983 |
| MS3 Fragment Node | 201 |
![Page 8: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/8.jpg)
Results
![Page 9: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/9.jpg)
Results
![Page 10: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/10.jpg)
Results
![Page 11: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/11.jpg)
![Page 12: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/12.jpg)
Latent Semantic Discovery
[Figure: two-level graph with count-weighted edges. Upper level: Java Developer, .NET Developer, Nurse, Health Care. Lower level: Java, J2EE, C#, Care giver, RN, Senior Home.]
P(Java, J2EE | Java Developer) = P(Java|Java Developer) * P(J2EE|Java Developer) = 5/7 * 10/10 = 5/7
P(Java, C# | Java Dev, .NET Dev) = P(Java|Java Dev) * P(Java|.NET Dev) * P(C#|Java Dev) * P(C#|.NET Dev)
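Both formulas above follow one pattern: multiply P(term | class) over every term and class pair. A minimal sketch of that pattern is shown below. The function `joint_prob` and the lookup table `cond` are hypothetical names; the values 5/7 and 10/10 come from the slide, while the remaining three probabilities are placeholder values invented purely to make the second formula computable.

```python
from fractions import Fraction


def joint_prob(terms, classes, cond):
    """Multiply P(term | class) over all (term, class) pairs; missing pairs count as 0."""
    result = Fraction(1)
    for term in terms:
        for cls in classes:
            result *= cond.get((term, cls), Fraction(0))
    return result


cond = {
    ("Java", "Java Developer"): Fraction(5, 7),    # from the slide
    ("J2EE", "Java Developer"): Fraction(10, 10),  # from the slide
    ("Java", ".NET Developer"): Fraction(2, 7),    # hypothetical placeholder
    ("C#", "Java Developer"): Fraction(1, 10),     # hypothetical placeholder
    ("C#", ".NET Developer"): Fraction(9, 10),     # hypothetical placeholder
}

# First formula: two terms, one class.
single = joint_prob(["Java", "J2EE"], ["Java Developer"], cond)
print(single)  # 5/7

# Second formula: two terms, two classes (result depends on the placeholder values).
multi = joint_prob(["Java", "C#"], ["Java Developer", ".NET Developer"], cond)
print(multi)
```

Treating a missing pair as probability 0 is a simplification of this sketch; a production system would more likely apply smoothing so that one unseen pair does not zero out the whole product.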
![Page 13: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/13.jpg)
Results
![Page 14: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/14.jpg)
Conclusion
• We propose an efficient and scalable probabilistic graphical model for massive hierarchical data (PGMHD).
• We successfully applied PGMHD to the bioinformatics domain to automatically classify and annotate high-throughput mass spectrometry data.
• We successfully applied the model to large-scale latent semantic discovery, using 1.6 billion search log entries provided by CareerBuilder.com within a Hadoop MapReduce framework.
![Page 15: PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014](https://reader036.vdocuments.mx/reader036/viewer/2022081605/5a4d1ad07f8b9ab05997122b/html5/thumbnails/15.jpg)
Questions