Use of graphs and taxonomic classifications to analyze content relationships among courseware
Márcio de Carvalho Saraiva and Claudia Bauzer Medeiros
Institute of Computing, UNICAMP
Background and Motivation
It should be easy to understand how different materials are related.
Ouyang and Zhu, 2007
Relationships:
● Authorship
● Date
● Location
● Visual
● Topics
● etc.
Related Work
Educational Data Mining (Pereira, 2014)
Recognition of data relationships (Sathiyamurthy et al., 2012)
Analysis of relationships using graph databases (Cavoto et al., 2015)
Integration of multimedia data (Santanchè et al., 2014)
Object metadata
Architecture with hierarchies
Limitations of these approaches:
● Analysis on a single level
● Not related to education
● Only one kind of data
● Need for semantic annotations
● Need for training sets
Goal
Allow the integration of different types of educational material, highlighting relationships among content.
Proposal
CIMAL: Courseware Integration under Multiple relations to Assist Learning
Scenario: a student with access to courseware from sources 1 to N asks:
“I'm having trouble with "Big Data" in the course "X" taught by teacher "Y". What other material could help me understand this topic?”
Proposal
Step A - Extraction of elements of interest
Step B - Intermediate Representation Instantiation
Step C - Intermediate Representation Analysis
Step D - Courseware access
Step A: courseware (input) → Extractor (DDEx; Java + YouTube API) → elements of interest
Proposal - Step A - Extraction of elements of interest
Example courseware: "Classification Algorithms" (slides) and "Introduction to Databases" (video)
Elements of interest extracted from slides: slide titles, highlighted concepts, slide comments, and descriptions of figures and tables.
Elements of interest extracted from videos: transcript excerpts in 30-second segments, e.g.:
0:00-0:30 “...Databases are important...”
0:31-1:00 “...everybody needs to know SQL...”
1:01-1:30 “...the DBMS is a computer software application...”
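The 30-second windows in the video example can be reproduced with a short grouping step. The Python sketch below assumes the transcript is already available as (start second, text) caption pairs, for example from captions retrieved through the YouTube API; the functions `segment_transcript` and `clock` are hypothetical names and are not part of DDEx.

```python
from collections import defaultdict


def clock(seconds):
    """Format a second count as m:ss, e.g. 61 -> '1:01'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def segment_transcript(captions, window=30):
    """Group (start_second, text) caption pairs into fixed-length windows.

    Labels follow the slide: 0:00-0:30, 0:31-1:00, 1:01-1:30, ...
    Windows that contain no caption are simply omitted.
    """
    buckets = defaultdict(list)
    for start, text in captions:
        # Second 30 still belongs to the first window; second 31 opens the next.
        buckets[max(0, (start - 1) // window)].append(text)

    segments = []
    for index in sorted(buckets):
        low = 0 if index == 0 else index * window + 1
        high = (index + 1) * window
        segments.append((f"{clock(low)}-{clock(high)}", " ".join(buckets[index])))
    return segments


captions = [
    (5, "Databases are important"),
    (40, "everybody needs to know SQL"),
    (70, "the DBMS is a computer software application"),
]
for label, text in segment_transcript(captions):
    print(label, text)
```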
Proposal - Step B - Intermediate Representation Instantiation
Step A: courseware (input) → Metadata and Text Extractor → elements of interest
Step B: elements of interest → Intermediate Graph Representation Builder (shadows-as-graphs builder) → graph-based representation (shadows as graphs)
Each courseware item is represented by a shadow (Mota and Medeiros, 2013) built from the extractor output: author, discipline, text, date, and set of relevant concepts.
Example shadow of the courseware "Introduction to Databases" (video):
● Author: Prof. Saraiva
● Discipline: Advanced Databases
● Text: Lorem ipsum dolor sit amet, consectetur adipiscing elit...
● Date: 10/11/2015
● Set of relevant concepts: SQL, Databases, DBMS
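Since shadows are handled as graphs, one way to picture a shadow is as a small property graph with one node for the courseware and one node per extracted element. The Python sketch below is an illustration only: the class `ShadowGraph`, the function `add_shadow`, and the edge labels (`HAS_AUTHOR`, `IN_DISCIPLINE`, `HAS_DATE`, `MENTIONS`) are hypothetical names, not the schema used by CIMAL or defined by Mota and Medeiros (2013).

```python
from dataclasses import dataclass, field


@dataclass
class ShadowGraph:
    """Tiny property graph: nodes keyed by (kind, name), edges as triples."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, kind, name, **props):
        node_id = (kind, name)
        self.nodes.setdefault(node_id, props)  # reuse the node if it already exists
        return node_id

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))


def add_shadow(graph, title, media, author, discipline, date, concepts):
    """Attach one courseware item and its extracted elements to the graph."""
    item = graph.add_node("Courseware", title, media=media)
    graph.add_edge(item, "HAS_AUTHOR", graph.add_node("Author", author))
    graph.add_edge(item, "IN_DISCIPLINE", graph.add_node("Discipline", discipline))
    graph.add_edge(item, "HAS_DATE", graph.add_node("Date", date))
    for concept in concepts:
        graph.add_edge(item, "MENTIONS", graph.add_node("Concept", concept))
    return item


g = ShadowGraph()
add_shadow(g, "Introduction to Databases", "video", "Prof. Saraiva",
           "Advanced Databases", "10/11/2015", ["SQL", "Databases", "DBMS"])
print(len(g.nodes), len(g.edges))  # 7 6
```

Because nodes are keyed by kind and name, a second shadow with the same author or concept reuses the existing node, which is what connects different materials in the graph.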
Proposal - Step C - Intermediate Representation Analysis
Step A: courseware (input) → Metadata and Text Extractor → elements of interest
Step B: Intermediate Graph Representation Builder → graph-based representation (shadows as graphs)
Step C: Combiner (taxonomy + topics from external sources → enriched taxonomy); Classifier (→ classification of representations, i.e., classification of shadows); Relationships Analyzer (→ information about relations); implemented with Java + Lucene APIs and a graph database (Neo4J)
Step D - Courseware access
The ACM Computing Classification System (CCS), excerpt:
A. General and reference
B. Hardware
C. Theory of computation
D. Information systems
   1. Information retrieval
   2. Data management systems
      1. Query languages
      2. Middleware for databases
      3. Information integration
   3. World Wide Web
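Because the CCS is a tree, a topic can be identified by its path from the root, and two topics can be compared at several levels through their deepest common ancestor. The Python sketch below encodes only the excerpt above as nested dictionaries; the names `CCS_EXCERPT`, `path_to`, and `lowest_common_ancestor` are hypothetical, and loading the full CCS is left out.

```python
# Excerpt of the ACM CCS as nested dicts: topic name -> subtree of child topics.
CCS_EXCERPT = {
    "General and reference": {},
    "Hardware": {},
    "Theory of computation": {},
    "Information systems": {
        "Information retrieval": {},
        "Data management systems": {
            "Query languages": {},
            "Middleware for databases": {},
            "Information integration": {},
        },
        "World Wide Web": {},
    },
}


def path_to(tree, target, prefix=()):
    """Return the root-to-node path of `target` as a tuple, or None if absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(children, target, path)
        if found:
            return found
    return None


def lowest_common_ancestor(tree, a, b):
    """Deepest topic on both root paths of `a` and `b` (None if none is shared)."""
    pa, pb = path_to(tree, a), path_to(tree, b)
    if not pa or not pb:
        return None
    shared = None
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared = x
    return shared


print(path_to(CCS_EXCERPT, "Query languages"))
# ('Information systems', 'Data management systems', 'Query languages')
print(lowest_common_ancestor(CCS_EXCERPT, "Query languages", "Information integration"))
# Data management systems
```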
Which topics does the shadow of "Introduction to Databases" (video) cover? (Author: Prof. Saraiva; Discipline: Advanced Databases; Date: 10/11/2015; Concepts: SQL, Database, DBMS...)
Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007; Apache Lucene, 2014) compares the text of "Introduction to Databases" (video) against N Wikipedia pages, yielding weighted concepts such as:
● 80% SQL
● 20% Depth-first search
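ESA represents a text as a weighted vector of Wikipedia concepts: each article is a concept, the words of each article are weighted with TF-IDF in an inverted index, and a text's vector is obtained by summing the concept vectors of its words. The Python sketch below applies that idea to three invented stand-in "articles"; the corpus, the tokenizer, the use of raw word counts on the query side, and the normalization into shares (to mirror the percentages above) are simplifying assumptions, whereas the cited setup indexes Wikipedia pages with Apache Lucene.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_esa_index(articles):
    """Inverted index: word -> {concept: TF-IDF weight of the word in that article}."""
    term_counts = {concept: Counter(tokenize(body)) for concept, body in articles.items()}
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    index = defaultdict(dict)
    for concept, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(len(articles) / doc_freq[word])
            if idf > 0:  # a word present in every article carries no signal
                index[word][concept] = (1 + math.log(tf)) * idf
    return index


def esa_vector(text, index):
    """Sum the concept vectors of the text's words; return shares that add up to 1."""
    scores = Counter()
    for word, tf in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += tf * weight
    total = sum(scores.values())
    return {concept: s / total for concept, s in scores.most_common()} if total else {}


# Toy stand-ins for Wikipedia articles (invented text, not real pages).
articles = {
    "SQL": "SQL is a query language for relational databases managed by a DBMS",
    "Depth-first search": "depth first search is a graph search algorithm that explores a tree",
    "Sorting algorithm": "a sorting algorithm puts elements of a list in order",
}
transcript = ("Databases are important; everybody needs to know SQL. "
              "The DBMS is a computer software application, and we search the data.")
print(esa_vector(transcript, build_esa_index(articles)))  # SQL gets the largest share
```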
Result: the shadow of "Introduction to Databases" (video) (Prof. Saraiva; Advanced Databases; 10/11/2015; SQL, Database, DBMS...) is classified under the topics Information Systems > Data management systems > Query languages.
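The Combiner's enriched taxonomy is what connects ESA concepts to CCS topics. As a hedged sketch, assume the enrichment links each CCS leaf to a few Wikipedia concepts (the mapping `ENRICHED` below is invented for illustration); a shadow can then be assigned to the leaf whose linked concepts collect the most ESA weight, and its full topic path recovered by walking up a hypothetical `PARENT` table.

```python
# Hypothetical enrichment produced by the Combiner: CCS leaf -> linked Wikipedia concepts.
ENRICHED = {
    "Query languages": {"SQL", "Relational algebra"},
    "Information integration": {"Data integration", "Entity linking"},
    "Middleware for databases": {"Database connectivity"},
}

# Parent links for the CCS excerpt (child topic -> parent topic).
PARENT = {
    "Query languages": "Data management systems",
    "Middleware for databases": "Data management systems",
    "Information integration": "Data management systems",
    "Data management systems": "Information systems",
    "Information retrieval": "Information systems",
    "World Wide Web": "Information systems",
}


def classify(esa_scores, enriched=ENRICHED, parent=PARENT, threshold=0.0):
    """Return the root-to-leaf topic path whose leaf best matches the ESA scores."""
    best_leaf, best_score = None, threshold
    for leaf, concepts in enriched.items():
        score = sum(esa_scores.get(c, 0.0) for c in concepts)
        if score > best_score:
            best_leaf, best_score = leaf, score
    if best_leaf is None:
        return []  # no leaf collected any weight above the threshold
    path = [best_leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


print(classify({"SQL": 0.8, "Depth-first search": 0.2}))
# ['Information systems', 'Data management systems', 'Query languages']
```

Concepts that no leaf claims, such as "Depth-first search" in this toy mapping, simply do not contribute.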
Relating "Introduction to Databases" (video) and "Classification Algorithms" (slides) through the CCS topics Information Systems, Data management systems, Query languages, and Information Integration.
Relating "Introduction to Databases" (video) and "Databases I" (video) through the CCS topics Information Systems, Data management systems, and Query languages.
Resulting relationship graph among the courseware "Introduction to Databases", "Classification Algorithms", "Databases I", and "Data Mining".
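One way to derive a relationship graph among courseware from their classifications is to link two items whenever their CCS paths share a prefix, weighting the link by the depth of the deepest shared topic. The Python sketch below does exactly that; the scoring rule, the `min_depth` cut-off, and the classification of each item (in particular "Data Mining", placed under the CCS path Information systems > Information systems applications > Data mining) are assumptions for illustration.

```python
from itertools import combinations


def shared_depth(path_a, path_b):
    """Number of leading topics two root-to-leaf CCS paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def relate(classified, min_depth=1):
    """Weighted edges between courseware whose topic paths share a prefix.

    Edge weight = depth of the deepest shared topic (deeper = more specific).
    """
    edges = {}
    for a, b in combinations(sorted(classified), 2):
        depth = shared_depth(classified[a], classified[b])
        if depth >= min_depth:
            edges[(a, b)] = depth
    return edges


IS, DMS = "Information systems", "Data management systems"
classified = {  # hypothetical classifier output
    "Introduction to Databases": [IS, DMS, "Query languages"],
    "Databases I": [IS, DMS, "Query languages"],
    "Classification Algorithms": [IS, DMS, "Information integration"],
    "Data Mining": [IS, "Information systems applications", "Data mining"],
}
for pair, weight in relate(classified, min_depth=2).items():
    print(pair, weight)
```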
Graph analysis on the relationship graph: cliques, i.e., groups of courseware that are all pairwise related.
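Cliques in the relationship graph identify groups of courseware in which every pair is related. A standard way to list all maximal cliques is the Bron-Kerbosch algorithm with pivoting; the Python sketch below implements it over a hypothetical edge list between the four courseware items, since the actual edges are not recoverable from the slide.

```python
def maximal_cliques(edges):
    """All maximal cliques of an undirected graph (Bron-Kerbosch with pivoting)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: vertices already explored.
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


edges = [  # hypothetical relations between the courseware items
    ("Introduction to Databases", "Databases I"),
    ("Introduction to Databases", "Classification Algorithms"),
    ("Databases I", "Classification Algorithms"),
    ("Classification Algorithms", "Data Mining"),
]
for clique in maximal_cliques(edges):
    print(sorted(clique))
```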
Graph analysis on the relationship graph: shortest-path graph to "Data Mining" (edge weights 3, 2, 2, and 1).
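A shortest-path graph to "Data Mining" can be computed with Dijkstra's algorithm run from "Data Mining" itself, keeping for every other item the next hop toward it. In the Python sketch below, the assignment of the weights 3, 2, 2, and 1 to particular edges is an assumption, since it cannot be recovered from the slide, and lower weights are treated as closer relations.

```python
import heapq


def shortest_path_graph(target, weighted_edges):
    """Dijkstra from `target` on an undirected graph with non-negative weights.

    Returns {node: (distance, next_hop)}; following next hops from any node
    leads to `target` along a shortest path.
    """
    adj = {}
    for a, b, w in weighted_edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    best = {target: (0, None)}
    heap = [(0, target)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node][0]:
            continue  # stale entry: a shorter distance was already found
        for neighbour, w in adj.get(node, []):
            candidate = dist + w
            if neighbour not in best or candidate < best[neighbour][0]:
                best[neighbour] = (candidate, node)
                heapq.heappush(heap, (candidate, neighbour))
    return best


def route(node, tree):
    """Follow next hops from `node` until the target is reached."""
    path = [node]
    while tree[path[-1]][1] is not None:
        path.append(tree[path[-1]][1])
    return path


edges = [  # hypothetical placement of the slide's weights 3, 2, 2, 1
    ("Introduction to Databases", "Classification Algorithms", 3),
    ("Introduction to Databases", "Databases I", 2),
    ("Databases I", "Classification Algorithms", 2),
    ("Classification Algorithms", "Data Mining", 1),
]
tree = shortest_path_graph("Data Mining", edges)
print(route("Introduction to Databases", tree))
# ['Introduction to Databases', 'Classification Algorithms', 'Data Mining']
```

If the weights instead express similarity (higher means more related), they would need to be converted into distances, for example 1/weight, before running Dijkstra.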
Graph analysis on the relationship graph: centrality of each courseware item.
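The slide does not state which centrality measure is used. As one simple option, normalized degree centrality (the number of neighbours divided by n - 1) can be computed as in the Python sketch below, over a hypothetical edge list between the four courseware items; betweenness or closeness centrality are common alternatives.

```python
def degree_centrality(edges):
    """Normalized degree centrality: number of distinct neighbours / (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


edges = [  # hypothetical relations between the courseware items
    ("Introduction to Databases", "Databases I"),
    ("Introduction to Databases", "Classification Algorithms"),
    ("Databases I", "Classification Algorithms"),
    ("Classification Algorithms", "Data Mining"),
]
scores = degree_centrality(edges)
print(max(scores, key=scores.get))  # Classification Algorithms
```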
Proposal
Step A - Extraction of elements of interest: courseware (input) → Extractor (DDEx) → elements of interest (shadows)
Step B - Intermediate Representation Instantiation: Intermediate Graph Representation Builder (graph builder) → graph-based representation
Step C - Intermediate Representation Analysis: Combiner (taxonomy + topics from external sources → enriched taxonomy), Classifier (→ classification of representations), Relationships Analyzer (→ information about relations); Java + Lucene APIs, graph database (Neo4J)
Step D - Courseware access: a Graph Interface (Java + 2graph API) receives queries over the graph database (Neo4J) and outputs graph-based representations, information about relations, and classification
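In Step D the Graph Interface answers queries over the graph database. The multilevel lookup behind a question such as the student's can be sketched in memory: return the courseware whose CCS path passes through the queried topic, closest matches first. The Python sketch below is not the Neo4J/2graph implementation; the function `related_material` and the classifications are hypothetical, and "Data management systems" stands in for the query topic because "Big Data" is not in the CCS excerpt shown earlier.

```python
def related_material(query_topic, classified, exclude=()):
    """Courseware whose CCS path passes through `query_topic`.

    Items classified at the topic itself come first, then those classified
    further below it; ties are broken alphabetically.
    """
    hits = []
    for title, path in classified.items():
        if title in exclude or query_topic not in path:
            continue
        distance = len(path) - 1 - path.index(query_topic)  # levels below the topic
        hits.append((distance, title))
    return [title for _, title in sorted(hits)]


IS, DMS = "Information systems", "Data management systems"
classified = {  # hypothetical classifier output
    "Introduction to Databases": [IS, DMS, "Query languages"],
    "Databases I": [IS, DMS],
    "Classification Algorithms": [IS, DMS, "Information integration"],
    "Data Mining": [IS, "Information systems applications", "Data mining"],
}
print(related_material(DMS, classified, exclude={"Introduction to Databases"}))
# ['Databases I', 'Classification Algorithms']
```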
Preliminary conclusions
Expected contributions:
● A framework for integrating different courseware, highlighting relationships among topics;
○ It requires neither tags nor training sets;
● Analysis of multilevel relationships through graphs and a taxonomy;
● Adaptation of the ESA algorithm to classify courseware topics using intrinsic features.
References
● Changuel, S., Labroche, N., and Bouchon-Meunier, B. (2015). Resources sequencing using automatic prerequisite–outcome annotation. ACM Trans. Intell. Syst. Technol., 6(1):6:1–6:30.
● Gabrilovich, E. and Markovitch, S. (2007). Computing semantic relatedness using wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI’07, pages 1606–1611, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
● Mishra, S., Gorai, A., Oberoi, T., and Ghosh, H. (2010). Efficient Visualization of Content and Contextual Information of an Online Multimedia Digital Library for Effective Browsing. 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 257–260.
● Mota, M. S. and Medeiros, C. B. (2013). Introducing shadows: Flexible document representation and annotation on the web. ICDE Workshops, pages 13–18.
● Ouyang, Y. and Zhu, M. (2007). eLORM: Learning object relationship mining based repository. Proceedings - The 9th IEEE International Conference on E-Commerce Technology; The 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services, CEC/EEE 2007, pages 691–698.
● Pereira, B. (2014). Entity Linking with Multiple Knowledge Bases: An Ontology Modularization Approach. In The Semantic Web - ISWC 2014, pages 513–520. Springer International Publishing.
● Santanchè, A., Longo, J. S. C., Jomier, G., Zam, M., and Medeiros, C. B. (2014). Multifocus research and geospatial data - anthropocentric concerns. JIDM - Journal of Information and Data Management, 5(2):146–160.
● Sathiyamurthy, K., Geetha, T. V., and Senthilvelan, M. (2012). An approach towards dynamic assembling of learning objects. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics, ICACCI ’12, pages 1193–1198, New York, NY, USA. ACM.
● Shirakawa, M., Nakayama, K., Hara, T., and Nishio, S. (2015). Wikipedia-based semantic similarity measurements for noisy short texts using extended naive bayes. IEEE Trans. Emerging Topics Comput., 3(2):205–219.
● Tong, Y., Cao, C. C., Zhang, C. J., Li, Y., and Chen, L. (2014). CrowdCleaner: Data cleaning for multi-version data on the web via crowdsourcing. 2014 IEEE 30th International Conference on Data Engineering, pages 1182–1185.
Acknowledgements
● Laboratory of Information Systems - Unicamp
● Work partially financed by CAPES, FAPESP/Cepid in Computational Engineering and Sciences (2013/08293-7), FAPESP-PRONEX (eScience project), INCT in Web Science (CNPq 557.128/2009-9), and individual grants from CAPES and CNPq.
Thanks!