Use of graphs and taxonomic classifications to analyze content relationships among courseware

Márcio de Carvalho Saraiva and Claudia Bauzer Medeiros

Institute of Computing, UNICAMP



Background and Motivation

Slides

Videos


More than 1600 items about "databases"

Changuel et al., 2015


It should be easy to understand how different materials are related.

Ouyang and Zhu, 2007


Relationships:
● Authorship
● Date
● Location
● Visual
● Topics
● etc.


Related Work


Educational

Data Mining (Pereira, 2014)

Recognition of data relationships

(Sathiyamurthy et al. 2012)

Analysis of relationships using graph databases

(Cavoto et al. 2015)

Integration of multimedia data

(Santanchè et al. 2014)

Objects metadata

Architecture with hierarchies

● Analysis on a single level

● not related to education

● one kind of data

● semantic annotations

● training sets

Goal

Allow the integration of different types of educational material, highlighting relationships among content.


Proposal

CIMAL

"I'm having trouble with "Big Data" in discipline "X", taught by teacher "Y". What other material could help me understand this topic?"

Sources 1 to N

Student

CIMAL: Courseware Integration under Multiple relations to Assist Learning


Proposal - Step A - Extraction of elements of interest

Pipeline:
● Step A - Extraction of elements of interest
● Step B - Intermediate Representation Instantiation
● Step C - Intermediate Representation Analysis
● Step D - Courseware access

In Step A, the Extractor (DDEx, Java + Youtube API) takes courseware as input and produces elements of interest.

Classification Algorithms

Introduction to Databases


Commented slides, highlighted concepts, slide titles, descriptions from figures and tables, ...


Data Science

Data Mining

Classification


0:00 - 0:30: “...Databases are important...”

0:31 - 1:00: “...everybody needs to know SQL...”

1:01 - 1:30: “...the DBMS is a computer software application...”
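
One way to obtain time-coded text like the segments above is to group caption entries into fixed 30-second windows. The following is a minimal Python sketch under stated assumptions, not the DDEx extractor itself (which the slides describe as Java + Youtube API). The function `window_captions`, the `(start_seconds, text)` caption format, and the sample captions are all hypothetical.

```python
def window_captions(captions, window=30):
    """Group (start_seconds, text) captions into windows of `window` seconds.

    Returns a list of (window_start, window_end, joined_text) in time order.
    """
    windows = {}
    for start_seconds, text in sorted(captions):
        index = start_seconds // window  # which window this caption falls in
        windows.setdefault(index, []).append(text)
    return [
        (i * window, (i + 1) * window, " ".join(texts))
        for i, texts in sorted(windows.items())
    ]


# Hypothetical caption entries: (start time in seconds, spoken text).
captions = [
    (5, "Databases are important"),
    (40, "everybody needs to know SQL"),
    (70, "the DBMS is a computer software"),
    (80, "application"),
]
segments = window_captions(captions)
for start, end, text in segments:
    print(f"{start}s-{end}s: {text}")
```

Fixed windows are only one possible choice; segmenting by scene or by sentence boundaries would produce different elements of interest.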

[Architecture diagram, Step B: the Metadata and Text Extractor takes courseware as input and produces elements of interest; the Intermediate Graph Representation Builder (Shadows as graphs Builder) turns them into a Graph-based Representation (Shadows as graphs). Steps shown: A - Extraction of elements of interest; B - Intermediate Representation Instantiation; C - Intermediate Representation Analysis; D - Courseware access.]

Proposal - Step B - Intermediate Representation Instantiation

Extractor output (Mota and Medeiros, 2013):
● Author
● Discipline
● Text
● Date
● Set of relevant concepts

Courseware: Introduction to Databases (video)
● Author: Prof. Saraiva
● Discipline: Advanced Databases
● Text: Lorem ipsum dolor sit amet, consectetur adipiscing elit...
● Date: 10/11/2015
● Set of relevant concepts: SQL, Databases, DBMS
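
The instantiation in Step B can be pictured as turning the extracted elements into nodes and labeled edges around a courseware node. The sketch below is an illustrative Python approximation, not the authors' builder (the slides mention Java and Neo4J). The function `shadow_to_graph` and the edge labels `author`, `discipline`, `date` and `concept` are assumptions made for this example.

```python
def shadow_to_graph(title, elements):
    """Return (nodes, edges) for one courseware item.

    Each element of interest becomes a node linked to the courseware
    node by an edge labeled with the element kind.
    """
    nodes = {title}
    edges = []
    for kind, value in elements.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            nodes.add(v)
            edges.append((title, kind, v))
    return nodes, edges


# Values taken from the example slide; the dictionary keys are hypothetical.
elements = {
    "author": "Prof. Saraiva",
    "discipline": "Advanced Databases",
    "date": "10/11/2015",
    "concept": ["SQL", "Databases", "DBMS"],
}
nodes, edges = shadow_to_graph("Introduction to Databases (video)", elements)
for source, label, target in edges:
    print(f"({source}) -[{label}]-> ({target})")
```

Because two courseware items that share an author or a concept point to the same node, relationships between them become paths in the resulting graph.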

Proposal - Step C - Intermediate Representation Analysis

[Architecture diagram, Step C. Components: Metadata and Text Extractor, Intermediate Graph Representation Builder, Classifier, Combiner, Relationships Analyzer. Data: courseware (input), elements of interest, Graph-based Representation (Shadows as graphs), Taxonomy, Topics, external sources, Enriched Taxonomy, Classification of Representations (Classification of Shadows), Information about Relations. Technologies: Java + Lucene APIs, Graph Database (Neo4J).]

The ACM Computing Classification System (CCS)

● A: General and reference
● B: Hardware
● C: Theory of computation
● D: Information systems
    ○ 1: Information retrieval
    ○ 2: Data management systems
        ● 1: Query languages
        ● 2: Middleware for databases
        ● 3: Information integration
    ○ 3: World Wide Web
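
To make the multilevel structure concrete, the CCS fragment shown on the slide can be held as a nested dictionary and searched for the path leading to a given category. This is a small Python sketch for illustration only; the names `CCS_FRAGMENT` and `path_to` are hypothetical, and only the categories visible on the slide are included.

```python
# Fragment of the ACM CCS as it appears on the slide (not the full taxonomy).
CCS_FRAGMENT = {
    "General and reference": {},
    "Hardware": {},
    "Theory of computation": {},
    "Information systems": {
        "Information retrieval": {},
        "Data management systems": {
            "Query languages": {},
            "Middleware for databases": {},
            "Information integration": {},
        },
        "World Wide Web": {},
    },
}


def path_to(taxonomy, target, prefix=()):
    """Depth-first search for `target`; return the path from the top level."""
    for name, children in taxonomy.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = path_to(children, target, path)
        if found:
            return found
    return None


print(path_to(CCS_FRAGMENT, "Query languages"))
# ['Information systems', 'Data management systems', 'Query languages']
```

The returned path is what allows a courseware item to be related to others at several levels, from a specific category up to a broad area.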


Courseware: Introduction to Databases (video)
● Author: Prof. Saraiva
● Discipline: Advanced Databases
● Text: Lorem ipsum dolor sit amet, consectetur adipiscing elit...
● Date: 10/11/2015
● Set of relevant concepts: SQL, Database, DBMS...
● Topics: ???

ESA compares the text of "Introduction to Databases (video)" with N wikipages and returns the most related concepts, for example: 80% SQL, 20% Depth-first search.

Gabrilovich and Markovitch, 2007; Apache Lucene, 2014
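
The idea behind ESA (Explicit Semantic Analysis) is to describe a text by how strongly its words are associated with each concept article. The sketch below is a heavily simplified, ESA-style illustration in Python, not the adaptation proposed in the slides (which relies on Java and Lucene). The function names, the smoothed inverse document frequency, and the two one-line "articles" are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def esa_weights(text, concept_articles):
    """Return {concept: share} for how related `text` is to each concept.

    concept_articles maps a concept name to the text of its article.
    """
    docs = {c: Counter(tokenize(a)) for c, a in concept_articles.items()}
    n_docs = len(docs)
    # Document frequency: number of articles in which each word appears.
    df = Counter(word for counts in docs.values() for word in counts)
    text_counts = Counter(tokenize(text))
    scores = {}
    for concept, counts in docs.items():
        score = 0.0
        for word, tf_text in text_counts.items():
            if word in counts:
                idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
                score += tf_text * counts[word] * idf
        scores[concept] = score
    total = sum(scores.values())
    return {c: (s / total if total else 0.0) for c, s in scores.items()}


# Hypothetical stand-ins for Wikipedia articles.
articles = {
    "SQL": "SQL is a query language for relational databases and tables.",
    "Depth-first search": "Depth-first search is an algorithm for traversing a graph.",
}
weights = esa_weights("Databases are important, everybody needs to know SQL", articles)
for concept, share in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{share:.0%} {concept}")
```

With real articles the shares would be spread over many concepts; the highest-ranked concepts are the candidates to be matched against taxonomy categories.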

Courseware "Introduction to Databases (video)" → ESA → 80% SQL → Query languages

Courseware: Introduction to Databases (video)
● Author: Prof. Saraiva
● Discipline: Advanced Databases
● Text: Lorem ipsum dolor sit amet, consectetur adipiscing elit...
● Date: 10/11/2015
● Set of relevant concepts: SQL, Database, DBMS...
● Topics: Information Systems > Data management systems > Query languages


[Graph: the courseware "Introduction to Databases (video)" and "Classification Algorithms (slides)" attached to the taxonomy. Categories shown: Information Systems > Data management systems > Query languages; Information Integration.]


[Graph: the courseware "Introduction to Databases (video)" and "Databases I (video)" attached to the taxonomy. Categories shown: Information Systems > Data management systems > Query languages.]


Graph of courseware: Introduction to Databases, Classification Algorithms, Databases I, Data Mining

Analyses on this graph:
● Clique
● Shortest path to "Data Mining" (edge weights shown: 3, 2, 2, 1)
● Centrality
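
The three analyses named on these slides (clique, shortest path, centrality) can be tried on a toy version of the courseware graph. The Python sketch below is only an illustration: the slides do not say which edge carries which weight, so the edge assignment in `EDGES` is an assumption, and the helper names are hypothetical. In the proposal itself such analyses would run against the graph database.

```python
import heapq
from itertools import combinations

# Hypothetical assignment of the weights 3, 2, 2, 1 to edges.
EDGES = {
    ("Introduction to Databases", "Databases I"): 1,
    ("Introduction to Databases", "Classification Algorithms"): 3,
    ("Databases I", "Data Mining"): 2,
    ("Classification Algorithms", "Data Mining"): 2,
}


def adjacency(edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    adj = {}
    for (a, b), weight in edges.items():
        adj.setdefault(a, {})[b] = weight
        adj.setdefault(b, {})[a] = weight
    return adj


def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns (total weight, path) or None."""
    queue = [(0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in adj[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


def is_clique(adj, nodes):
    """True if every pair of the given nodes is directly connected."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    others = max(len(adj) - 1, 1)
    return {node: len(nbrs) / others for node, nbrs in adj.items()}


adj = adjacency(EDGES)
print(shortest_path(adj, "Introduction to Databases", "Data Mining"))
print(is_clique(adj, ["Introduction to Databases", "Databases I"]))
print(degree_centrality(adj))
```

Under these assumed weights, the cheapest route from "Introduction to Databases" to "Data Mining" goes through "Databases I", which suggests that item as intermediate material.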

Proposal

● Step A - Extraction of elements of interest: Extractor (DDEx); input: courseware; output: elements of interest
● Step B - Intermediate Representation Instantiation: Intermediate Graph Representation Builder (graph builder); output: Graph-based Representation (Shadows)
● Step C - Intermediate Representation Analysis: Classifier, Combiner (Taxonomy, Topics, external sources, Enriched Taxonomy), Relationships Analyzer; outputs: Classification of Representations, Information about Relations
● Step D - Courseware access: Graph Interface; input: query; output: graph-based representations, information about relations and classification
● Technologies: Java + Lucene APIs, Java + 2graph API, Graph Database (Neo4J)
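
For Step D - Courseware access, a query such as the student's question could be answered by ranking other courseware according to how much of their taxonomy classification they share with the item at hand. The Python sketch below illustrates that idea under stated assumptions; it is not the Graph Interface from the slides. The function names and the classification paths assigned to "Databases I (video)" and "Classification Algorithms (slides)" are hypothetical.

```python
def shared_depth(path_a, path_b):
    """Number of leading taxonomy levels two classification paths share."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def related_courseware(item, classifications):
    """Rank other items by how deep their taxonomy paths agree with `item`."""
    base = classifications[item]
    ranked = [
        (shared_depth(base, path), other)
        for other, path in classifications.items()
        if other != item
    ]
    ranked.sort(key=lambda pair: -pair[0])  # stable: ties keep input order
    return [other for depth, other in ranked if depth > 0]


# Hypothetical classifications produced by Step C.
classifications = {
    "Introduction to Databases (video)": [
        "Information systems", "Data management systems", "Query languages"],
    "Classification Algorithms (slides)": [
        "Information systems", "Data management systems", "Information integration"],
    "Databases I (video)": [
        "Information systems", "Data management systems", "Query languages"],
}
print(related_courseware("Introduction to Databases (video)", classifications))
```

Items classified under the same leaf category come first, followed by items that only share a broader ancestor.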

Preliminary conclusions

Expected contributions:

● A framework for integrating different courseware, highlighting relationships among topics;

○ Tags and training sets are not required;

● Analysis of multilevel relationships through graphs and taxonomies;

● Adaptation of the ESA algorithm to classify courseware topics using intrinsic features.


References

● Changuel, S., Labroche, N., and Bouchon-Meunier, B. (2015). Resources sequencing using automatic prerequisite–outcome annotation. ACM Trans. Intell. Syst. Technol., 6(1):6:1–6:30.

● Gabrilovich, E. and Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI’07, pages 1606–1611, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

● Mishra, S., Gorai, A., Oberoi, T., and Ghosh, H. (2010). Efficient Visualization of Content and Contextual Information of an Online Multimedia Digital Library for Effective Browsing. 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 257–260.

● Mota, M. S. and Medeiros, C. B. (2013). Introducing shadows: Flexible document representation and annotation on the web. ICDE Workshops, pages 13–18.

● Ouyang, Y. and Zhu, M. (2007). eLORM: Learning object relationship mining based repository. Proceedings - The 9th IEEE International Conference on E-Commerce Technology; The 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services, CEC/EEE 2007, pages 691–698.


● Pereira, B. (2014). Entity Linking with Multiple Knowledge Bases: An Ontology Modularization Approach. In The Semantic Web - ISWC 2014, pages 513–520. Springer International Publishing.

● Santanchè, A., Longo, J. S. C., Jomier, G., Zam, M., and Medeiros, C. B. (2014). Multifocus research and geospatial data - anthropocentric concerns. JIDM - Journal of Information and Data Management, 5(2):146–160.

● Sathiyamurthy, K., Geetha, T. V., and Senthilvelan, M. (2012). An approach towards dynamic assembling of learning objects. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics, ICACCI ’12, pages 1193–1198, New York, NY, USA. ACM.

● Shirakawa, M., Nakayama, K., Hara, T., and Nishio, S. (2015). Wikipedia-based semantic similarity measurements for noisy short texts using extended naive bayes. IEEE Trans. Emerging Topics Comput., 3(2):205–219.

● Tong, Y., Cao, C. C., Zhang, C. J., Li, Y., and Chen, L. (2014). CrowdCleaner: Data cleaning for multi-version data on the web via crowdsourcing. 2014 IEEE 30th International Conference on Data Engineering, pages 1182–1185.


Acknowledgements

● Laboratory of Information Systems - Unicamp

● Work partially financed by CAPES, FAPESP/Cepid in Computational Engineering and Sciences (2013/08293-7), FAPESP-PRONEX (eScience project), INCT in Web Science (CNPq 557.128/2009-9), and individual grants from CAPES and CNPq.


Thanks!

Use of graphs and taxonomic classifications to analyze content relationships among courseware

[email protected]

Institute of Computing, UNICAMP