
  • 1/109

    VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages

    Jeevan Joishi [[email protected]]

    MTech Research Associate, Software Analytics Research Lab (SARL)

    www.software-analytics.in

  • 2/109

    MTech Thesis Evaluation Committee Members

    Thesis Adviser: Prof. Ashish Sureka, Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens Corporate Research and Technology; Faculty In-charge, Software Analytics Research Lab (SARL)

    External Examiner: Dr. Radha Krishna Pisipati, Principal Research Scientist at Infosys Technologies Limited

    Internal Examiner: Prof. Sandip Aine, Faculty Member at IIIT-Delhi

  • 3/109

    Outline

    1. Research Motivation and Aim

    2. Related Work and Novel Research Contributions

    3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS

    4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented

    5. Experimental Dataset

    6. Performance Comparison

    7. Conclusion

    8. Limitations

    9. References

    Vishleshan

    Research Motivation and Aim

  • 5/109

    Introduction to NoSQL

    Why NoSQL?

    The global population accessing the internet has increased tremendously.

    Most applications are hosted on the cloud and need to support users 24 hours a day, 365 days a year.

    Fig 1: Scale of internet usage. Figure taken from [17].

  • 6/109

    Why NoSQL?

    Data is captured in huge volumes and consists of both structured and unstructured data.

    The amount of data is growing rapidly, and the nature of the data is changing as well.

    Fig 2: Growth of data. Figure taken from [17].

  • 7/109

    Why NoSQL?

    What is wrong with relational databases?

    Nothing!

    Relational databases employ a one-size-fits-all philosophy for storage.

    Relational databases are used when strong consistency is a must.

    Relational databases can create problems when it is time to scale.

  • 8/109

    Why NoSQL?

    Social media sites such as Facebook and Twitter, with their large data needs, have grown explosively.

    They had to capture and process very large volumes of data in ways that were difficult to handle with a traditional RDBMS.

    Traditional databases are designed to scale up. We required a database that can scale out.

    When relational applications become successful, usage goes up. Joins are inherent in an RDBMS and can become very slow!

    Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.

  • 9/109

    Why NoSQL?

    We require a technology that scales out rather than scaling up!

    Scale up: add more processors and memory.

    Scale out: add more servers.

    Fig 3: Scale-up vs. Scale-out. Figure taken from [18].

  • 10/109

    NoSQL Database

    Hence, NoSQL ("Not Only SQL") databases were introduced.

    They are non-relational data stores.

    They do not require a fixed table schema.

    They do not strictly follow the ACID properties of databases; instead they focus on CAP (Consistency, Availability, Partition Tolerance).

    Examples include column stores, graph databases, and document stores.

  • 11/109

    RDBMS vs. NoSQL

    Scale up vs. Scale out

    Normalization vs. De-normalization

    ACID vs. CAP

    Schema vs. Schema-less

    Structured Data vs. Unstructured Data.

  • 12/109

    Row Oriented vs. Graph Oriented

    Table 1: An RDBMS table.

    Record No  Name            Address             City       State
    01         Jeevan Joishi   Uniworld Apartment  Bangalore  Karnataka
    02         Kunal Gupta     15th Cross Road     Kanpur     Uttar Pradesh
    03         Priyanka Verma  Sector-7            Jind       Haryana
    04         Nidhi Agarwal   JJ colony           Bhiwani    Haryana

    Fig 4: A Graph model. Figure taken from [19].

  • 13/109

    Row Oriented vs. Graph Oriented

    In a row-oriented database, the whole record must be read in order to read specific attributes.

    Joins in relational databases are compute-intensive tasks.

    Graph databases, however, can read individual values based on nodes, relationships, or properties.

    Graph databases avoid joins by traversing relationships using index-free adjacency.

  • 14/109

    Row Oriented vs. Graph Oriented

    Fig. 5: Relationships in Relational databases. Figure taken from [20].

  • 16/109

    Row Oriented vs. Graph Oriented

    Fig. 6: Relationships in Graph databases. Figure taken from [20].

  • 17/109

    Row Oriented vs. Graph Oriented

    Non-native vs. Native Graph Processing

    Fig 7: Native graph processing using index-free adjacency.

    Fig 8: Non-native graph processing using a global lookup index.

  • 18/109

    Process Mining

    Process mining is the analysis of a process using event log data.

    One of the key aspects is to study the social structure of the organization using event logs.

    Fig 9: Types of Process Mining Techniques

  • 19/109

    Process Mining

    Process mining focuses on analysing a process using the data present in event logs.

    Each event in an event log records the details of an activity.

    Each event is associated with a case identifier (CaseID).

    Each event has a timestamp.

    Each event has an activity that is being performed.

    Each event has an actor that handles the event.

    Additionally, each event may include a unique identifier.
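    The event attributes listed above can be sketched as a small record type. This is only an illustration: the class name, the field names and the sample values are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One row of an event log, with the attributes listed above."""
    case_id: str                    # case identifier (CaseID)
    activity: str                   # activity being performed
    actor: str                      # actor that handles the event
    timestamp: datetime             # when the event occurred
    event_id: Optional[str] = None  # optional unique identifier


# Hypothetical sample values, for illustration only.
example = Event(case_id="1", activity="A", actor="Nidhi",
                timestamp=datetime(2014, 1, 1, 9, 0), event_id="e1")
```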

  • 20/109

    Process Mining

    Fig. 10: An example Event Log.

  • 31/109

    Process Mining

    Three types of process mining techniques:

    1. Process Discovery

    2. Process Conformance

    3. Process Enhancement

    Three types of process mining perspectives:

    1. Control Flow Perspective

    2. Organizational Perspective

    3. Case Perspective

  • 32/109

    Similar Task Algorithm

    The Similar Task algorithm identifies actors who perform similar activities, from the organizational perspective.

    It considers the activities that actors perform, irrespective of cases.

    It is based on the notion that people doing similar things have a stronger relation than people doing different things.

  • 33/109

    Similar Task Algorithm

    Table 2: Sample Event Log

    Case Identifier  Activity Identifier  Actor
    1                A                    Nidhi
    2                A                    Nidhi
    2                C                    Kunal
    1                B                    Priyanka
    3                A                    Pooja
    1                C                    Nidhi
    3                D                    Kunal
    3                B                    Priyanka
    2                B                    Pooja
    2                D                    Astha
    1                D                    Astha

    Table 3: Actor-Activity Matrix

              A  B  C  D
    Nidhi     2  0  1  0
    Kunal     0  0  1  1
    Priyanka  0  2  0  0
    Pooja     1  1  0  0
    Astha     0  0  0  2
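    As a minimal sketch (plain Python, not the thesis implementation), the actor-activity matrix can be counted directly from the sample event log. The function and variable names are illustrative assumptions. Counting the log this way gives Kunal one occurrence each of activities C and D.

```python
from collections import Counter, defaultdict

# Sample event log: (case identifier, activity identifier, actor).
EVENT_LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]


def actor_activity_matrix(log):
    """Count how often each actor performs each activity, ignoring cases."""
    counts = defaultdict(Counter)
    for _case, activity, actor in log:
        counts[actor][activity] += 1
    activities = sorted({activity for _case, activity, _actor in log})
    matrix = {actor: [c[a] for a in activities] for actor, c in counts.items()}
    return matrix, activities


matrix, activities = actor_activity_matrix(EVENT_LOG)
```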

  • 34/109

    Similar Task Algorithm

    Given two attribute vectors A and B, the cosine similarity is given by

    \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}

    Figure taken from [21].

    Table 4: Cosine Similarity Values

              Nidhi  Kunal  Priyanka  Pooja  Astha
    Nidhi     ---    0.32   0.00      0.63   0.00
    Kunal     0.32   ---    0.00      0.00   0.70
    Priyanka  0.00   0.00   ---       0.70   0.00
    Pooja     0.63   0.00   0.70      ---    0.00
    Astha     0.00   0.70   0.00      0.00   ---
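    The cosine similarity values can be checked with a short sketch. The vectors below are the actor-activity counts over activities (A, B, C, D) obtained from the sample event log; the function names are illustrative assumptions, not part of the thesis code.

```python
import math

# Actor-activity vectors over activities (A, B, C, D), counted from the sample log.
VECTORS = {
    "Nidhi": [2, 0, 1, 0],
    "Kunal": [0, 0, 1, 1],
    "Priyanka": [0, 2, 0, 0],
    "Pooja": [1, 1, 0, 0],
    "Astha": [0, 0, 0, 2],
}


def cosine_similarity(a, b):
    """Return (a . b) / (|a| |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_table(vectors):
    """Cosine similarity for every ordered pair of distinct actors."""
    names = list(vectors)
    return {(p, q): cosine_similarity(vectors[p], vectors[q])
            for p in names for q in names if p != q}


table = similarity_table(VECTORS)
```

    The computed values agree with the similarity table up to rounding; for example, Kunal and Astha give about 0.707, which appears in the table as 0.70.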

  • 35/109

    Similar Task Algorithm at a glance!

  • 36/109

    Sub Contract Algorithm

    The Sub Contract algorithm focuses on how work moves among performers.

    The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i.

    The relations between individuals are case dependent.

  • 37/109

    Sub Contract Algorithm

    Table 5: Sample Event Log

    Case Identifier  Activity Identifier  Actor
    1                A                    Nidhi
    2                A                    Nidhi
    2                C                    Kunal
    1                B                    Priyanka
    3                A                    Pooja
    1                C                    Nidhi
    3                D                    Kunal
    3                B                    Priyanka
    2                B                    Pooja
    2                D                    Astha
    1                D                    Astha

    Table 6: Organized Event Log

    Case Identifier  Activity Identifier  Actor
    1                A                    Nidhi
    1                B                    Priyanka
    1                C                    Nidhi
    1                D                    Astha
    2                A                    Nidhi
    2                C                    Kunal
    2                B                    Pooja
    2                D                    Astha
    3                A                    Pooja
    3                D                    Kunal
    3                B                    Priyanka
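    Organizing the event log amounts to grouping events by case while keeping their original order within each case. A minimal sketch, assuming the sample log is already listed in order of occurrence (it carries no timestamps), relies on Python's stable sort; the function name is hypothetical.

```python
# Sample event log: (case identifier, activity identifier, actor).
EVENT_LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]


def organize_by_case(log):
    """Group events by case identifier, preserving order within each case.

    sorted() is stable, so events of the same case keep their relative order.
    """
    return sorted(log, key=lambda event: event[0])


organized = organize_by_case(EVENT_LOG)
```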

  • 41/109

    Sub Contract Algorithm

    normal = 4.0

    Table 7: Sub Contraction Values before Normalization

              Nidhi  Kunal  Priyanka  Pooja  Astha
    Nidhi     0.00   0.00   1.00      0.00   0.00
    Kunal     0.00   0.00   0.00      0.00   0.00
    Priyanka  0.00   0.00   0.00      0.00   0.00
    Pooja     0.00   0.00   0.00      0.00   0.00
    Astha     0.00   0.00   0.00      0.00   0.00

    Table 8: Sub Contraction Values after Normalization

              Nidhi  Kunal  Priyanka  Pooja  Astha
    Nidhi     0.00   0.00   0.25      0.00   0.00
    Kunal     0.00   0.00   0.00      0.00   0.00
    Priyanka  0.00   0.00   0.00      0.00   0.00
    Pooja     0.00   0.00   0.00      0.00   0.00
    Astha     0.00   0.00   0.00      0.00   0.00
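    The sub-contraction counts can be sketched as follows. Two assumptions are made here: "in between" is read as directly in between (three consecutive events i, j, i within one case), and the value normal = 4.0 is taken from the slide as an input rather than derived. The function names are illustrative.

```python
from collections import Counter

# Organized event log: actors in order of occurrence within each case.
TRACES = {
    1: ["Nidhi", "Priyanka", "Nidhi", "Astha"],
    2: ["Nidhi", "Kunal", "Pooja", "Astha"],
    3: ["Pooja", "Kunal", "Priyanka"],
}


def subcontract_counts(traces):
    """Count (i, j) pairs where j acts directly between two activities of i."""
    counts = Counter()
    for actors in traces.values():
        for first, middle, last in zip(actors, actors[1:], actors[2:]):
            if first == last and middle != first:
                counts[(first, middle)] += 1
    return counts


def normalize(counts, normal):
    """Divide every raw count by the normalization constant."""
    return {pair: value / normal for pair, value in counts.items()}


counts = subcontract_counts(TRACES)
# normal = 4.0 is the value shown on the slide; it is not computed here.
normalized = normalize(counts, normal=4.0)
```

    On the sample log only the pair (Nidhi, Priyanka) is found, with a raw count of 1 and a normalized value of 0.25.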

  • 48/109

    Sub Contract Algorithm at a glance I

  • 49/109

    Sub Contract Algorithm at a glance II

  • 50/109

    Research Motivation and Aim

    Query languages provide the most standard way to interact with the database.

    We try to implement process mining algorithms using database query languages to the extent possible, so that our application is tightly coupled to the database.

    Our work lies at the intersection of Process Mining and NoSQL databases.

  • 51/109

    Research Aim

    To investigate the intersection of Process Mining and Graph Database(s) for detecting social and hierarchical structures.

    To understand application needs that can be modelled into this new domain.

    To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL.

    To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j.

    To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.

  • 53/109

    Related Work and Novel Research Contributions

    Implementation of Mining Algorithms in Relational Databases

    Ordonez et al. [5]: Implemented the k-means clustering algorithm in SQL.

    Clustered large datasets in an RDBMS.

    Defined suitable tables, indexed them, and wrote suitable queries for clustering purposes.

    Ordonez et al. [6]: Extended their own work in [5].

    Efficient implementation of the EM algorithm to perform clustering in very large datasets.

  • 54/109

    Implementation of Mining Algorithms in Relational Databases

    Berzal et al. [7]: Implemented tree-based association rule mining to discover interesting patterns in relational databases.

    Sattler et al. [8]: Applied data mining techniques on a decision tree and classifier.

    Tight coupling of data mining and database systems.

  • 55/109

    Implementation of Mining Algorithms in Graph Databases

    Wang et al. [9]: Studied structural pattern mining for large disk-based graph databases.

    They presented a novel ADI index structure and efficient algorithms for mining frequent patterns.

    Wang et al. [10]: Presented techniques to obtain scalable mining in graph databases.

  • 56/109

    Implementation of Mining Algorithms in Graph Databases

    Huan et al. [11]: Presented a novel technique to mine maximal frequent sub-graphs in graph databases.

    Ozaki et al. [12]: Came up with the hyper-clique pattern in graph databases.

    Used the hyper-clique pattern to detect highly correlated sub-graphs.

  • 57/109

    Performance Comparison of Mining Algorithms in Relational and Graph Databases

    Vicknair et al. [13]: Compared the performance of relational and graph databases for data provenance systems.

    McColl et al. [14]: Evaluated the performance of a series of open-source graph databases.

    Used various graph algorithms on a graph setup consisting of 256 million nodes.

  • 58/109

    Performance Comparison of Mining Algorithms in Relational and Graph Databases

    Ciglan et al. [15]: Benchmarked graph databases over graph traversal algorithms.

    Macko et al. [16]: Presented PIG, a performance introspection framework for graph databases.

    PIG provided tools and mechanisms to understand the performance of a graph database.

  • 59/109

    Novel Research Contributions

    While there has been work on implementing data mining algorithms in relational and graph databases, we are:

    First to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the row-oriented database MySQL using SQL.

    First to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the graph-oriented database Neo4j using CYPHER.

    We also benchmark the performance of organizational mining algorithms (Similar-Task and Sub-Contract) on MySQL and Neo4j.

  • 60/109

    Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS

  • 61/109

    Similar Task Algorithm: Steps

    Implementation of the Similar-Task algorithm in SQL can be divided into four (4) broad tasks:

    1. Declare and iterate a cursor to select distinct tasks.

    2. Create a table to store the result.

    3. Fetch the actors vector and calculate cosine similarity.

    4. Write the results to the result table.
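    The slides describe a MySQL implementation built from cursors and dynamically prepared statements, whose code is not reproduced in this transcript. As a rough illustration only, the same computation can be sketched in a set-based style using Python's built-in sqlite3 module in place of MySQL. The table and column names (event_log, actor_activity, n) are hypothetical, the data is the sample event log, and the square root is taken in Python because SQLite's math functions are not always compiled in.

```python
import math
import sqlite3

# Sample event log rows: (case_id, activity, actor).
ROWS = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (case_id INTEGER, activity TEXT, actor TEXT)")
conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", ROWS)

# Actor-activity counts, one row per (actor, activity) pair that occurs.
conn.execute("""
    CREATE TABLE actor_activity AS
    SELECT actor, activity, COUNT(*) AS n
    FROM event_log
    GROUP BY actor, activity
""")

# Dot product for each unordered pair of actors sharing at least one activity.
dots = conn.execute("""
    SELECT a.actor, b.actor, SUM(a.n * b.n)
    FROM actor_activity AS a
    JOIN actor_activity AS b
      ON a.activity = b.activity AND a.actor < b.actor
    GROUP BY a.actor, b.actor
""").fetchall()

# Squared vector length per actor.
norms = dict(conn.execute(
    "SELECT actor, SUM(n * n) FROM actor_activity GROUP BY actor"))

similarity = {(p, q): dot / math.sqrt(norms[p] * norms[q]) for p, q, dot in dots}
```

    Pairs with no shared activity do not appear in the join result, which corresponds to a similarity of zero.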

  • 62/109

    Define and iterate cursor

    Declare a cursor to select distinct tasks from the table.

    Open the cursor and loop through the results it returns.

  • 63/109

    Declare table to store results

    Dynamically create a table with the specified table name.

    Prepare an SQL statement from the query and execute it.

  • 64/109

    Fetch actors vector and calculate Cosine-Similarity I

    Prepare a query to insert into the table.

    Define variables to store values for the cosine-similarity calculation.

  • 65/109

    Fetch actors vector and calculate Cosine-Similarity II

    Inside the cursor loop, collect distinct tasks from the tables for the required calculation.

  • 66/109

    Fetch actors vector and calculate Cosine-Similarity III

    Append parts of the cosine-similarity calculation to the SQL query.

  • 67/109

    Update Final Results I

    Declare a cursor to get all distinct teams.

    Iterate through the cursor to get the distinct teams.

  • 68/109

    Update Final Results II

    Form a query for creating a table that takes the distinct teams as columns.

    Inside the cursor loop, append the distinct teams as columns of the table.

  • 69/109

    Update Final Results III

    Form a query for inserting values into the resultant table.

    Inside the cursor loop, assign similarity values to the respective columns (matching teams).

  • 70/109

    Sub-Contract Algorithm: Steps

    The Sub-Contract algorithm implementation can be studied under four (4) broad categories:

    1. Create a table to store results.

    2. Find distinct case identifiers.

    3. Update normal and find sub-contraction within each case.

    4. Normalize the result.
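    The MySQL procedures themselves are not reproduced in this transcript. As an illustrative sketch only, the sub-contraction search can be expressed as a self-join, again using Python's sqlite3 module in place of MySQL. Several things here are assumptions: the seq column (a hypothetical per-case ordering of events), the reading of "in between" as directly in between, and the value of normal, which is passed in (4.0, as on the earlier slide) rather than computed.

```python
import sqlite3

# (case_id, seq, activity, actor); seq is a hypothetical per-case ordering column.
ROWS = [
    (1, 1, "A", "Nidhi"), (1, 2, "B", "Priyanka"),
    (1, 3, "C", "Nidhi"), (1, 4, "D", "Astha"),
    (2, 1, "A", "Nidhi"), (2, 2, "C", "Kunal"),
    (2, 3, "B", "Pooja"), (2, 4, "D", "Astha"),
    (3, 1, "A", "Pooja"), (3, 2, "D", "Kunal"), (3, 3, "B", "Priyanka"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log (case_id INTEGER, seq INTEGER, activity TEXT, actor TEXT)")
conn.executemany("INSERT INTO event_log VALUES (?, ?, ?, ?)", ROWS)

# Find triples (e1, e2, e3) of consecutive events in one case where the first
# and third actor match and the middle actor differs. Only cases with at
# least three events are considered.
QUERY = """
    SELECT e1.actor, e2.actor, COUNT(*)
    FROM event_log AS e1
    JOIN event_log AS e2 ON e2.case_id = e1.case_id AND e2.seq = e1.seq + 1
    JOIN event_log AS e3 ON e3.case_id = e1.case_id AND e3.seq = e1.seq + 2
    WHERE e1.actor = e3.actor
      AND e1.actor <> e2.actor
      AND e1.case_id IN (
          SELECT case_id FROM event_log GROUP BY case_id HAVING COUNT(*) >= 3
      )
    GROUP BY e1.actor, e2.actor
"""


def subcontract_table(connection, normal):
    """Return normalized sub-contraction values keyed by (actor i, actor j)."""
    return {(i, j): count / normal for i, j, count in connection.execute(QUERY)}


result = subcontract_table(conn, normal=4.0)
```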

  • 71/109

    Create table to store results I

    Declare a cursor to select distinct actors.

    Iterate through the cursor to collect the distinct actors.

  • 72/109

    Create table to store results II

    Form a query to create a table.

    Inside the cursor loop, append each distinct actor as part of the query.

  • 73/109

    Find distinct case identifiers

    Declare a cursor to select distinct case identifiers with count >= 3.

    Iterate through the cursor. For each distinct case identifier, call procedure ExecuteCase.

  • 74/109

    Update normal and find sub-contraction I

    Update normal.

  • 75/109

    Update normal and find sub-contraction II

    Declare a cursor to find sub-contracting actors.

  • 76/109

    Update normal and find sub-contraction III

    Iterate through the cursor to find the IDs of the actors.

  • 77/109

    Update normal and find sub-contraction IV

    Declare a cursor to find sub-contracting actors.

    Iterate through the cursor to find the IDs of the sub-contracting actors.

  • 78/109

    Update normal and find sub-contraction V

    For any pair of sub-contracting actors, insert or update the sub-contract value between them.

  • 79/109

    Normalize the result

    Declare a cursor to select the distinct actors that formed the columns of the result table.

    For each column, form an update query and normalize it by normal.

  • 80/109

    Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented

  • 81/109

    Similar Task Algorithm: Steps

    Implementation of the Similar Task algorithm in CYPHER consists mainly of two (2) broad functions:

    1. Load data with actor and activity nodes being unique.

    2. Calculate cosine similarity between actors.

  • 82/109

    Load actor and activity nodes uniquely

    Load data directly from the data file. Make unique nodes for each actor and activity.

  • 83/109

    Calculate Cosine Similarity

    Match common activities between actors and calculate similarity.
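    The Cypher queries are not reproduced in this transcript. To illustrate the idea of matching common activities by traversal rather than by join, the sketch below uses plain Python dictionaries as a stand-in for a graph of unique actor and activity nodes. The edge list (with a count stored on each actor-to-activity edge) is a hypothetical modelling choice built from the sample event log, and the function name is an assumption.

```python
import math
from collections import defaultdict

# (actor, activity, count) edges built from the sample event log.
EDGES = [
    ("Nidhi", "A", 2), ("Nidhi", "C", 1),
    ("Kunal", "C", 1), ("Kunal", "D", 1),
    ("Priyanka", "B", 2),
    ("Pooja", "A", 1), ("Pooja", "B", 1),
    ("Astha", "D", 2),
]

# Each node keeps direct references to its neighbours (adjacency in both directions).
actor_to_activity = defaultdict(dict)
activity_to_actor = defaultdict(dict)
for actor, activity, n in EDGES:
    actor_to_activity[actor][activity] = n
    activity_to_actor[activity][actor] = n


def vector_length(actor):
    """Euclidean length of the actor's activity-count vector."""
    return math.sqrt(sum(n * n for n in actor_to_activity[actor].values()))


def similar_actors(actor):
    """Walk actor -> activity -> other actor, accumulating dot products."""
    dots = defaultdict(int)
    for activity, n in actor_to_activity[actor].items():
        for other, m in activity_to_actor[activity].items():
            if other != actor:
                dots[other] += n * m
    return {other: dot / (vector_length(actor) * vector_length(other))
            for other, dot in dots.items()}


nidhi = similar_actors("Nidhi")
```

    Only actors reachable through a shared activity are visited, so actors with zero similarity never appear in the result.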

  • 84/109

    Sub Contract Algorithm: Steps

    Implementation of the Sub Contract algorithm in CYPHER consists mainly of four (4) broad functions:

    1. Identify sub-contracting actors within each case.

    2. Collect unique names and make new nodes for each of them.

    3. Set sub-contraction strength between unique actor nodes.

    4. Calculate normal and normalize the sub-contraction value.

  • 85/109

    Identify sub-contracting actors

    Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.

  • 86/109

    Collect unique names and create unique actor nodes

    Collect unique actor names.

    Make a new UNIQUEACTOR node for each distinct actor name found.

  • 87/109

    Set sub-contraction strength between unique actors

    For all sub-contracting actors, determine the strength of sub-contraction between the actors.

  • 88/109

    Calculate normal and normalize the result

    Calculate normal.

    Normalize the sub-contraction strength between actors.

  • 89/109

    Experimental Dataset

  • 90/109

    Experimental Dataset

    We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments.

    The log contains events from an incident and problem management system of Rabobank Group ICT.

    It contains data about managing requests from Rabobank Group ICT.

    It contains a total of 466737 records.

  • 91/109

    Dataset Details

    Fig. 11: Sample Event Log from MySQL.

  • 92/109

    Performance Comparison

  • 93/109

    Load Time

    Similar Task Algorithm

    Table 9: Data Load Time

    Dataset size    Load Time (msec)
                    MySQL    Neo4j
    65,000          2467     3413
    101,000         2875     3362
    219,500         5966     4354
    300,000         5850     5877
    466,737         7819     6875

    Fig. 12: Load Time
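
    The load times in Table 9 can be compared row by row with a few lines of Python. This is only a reading aid over the reported numbers; the names `load_time_ms` and `faster_system` are hypothetical.

```python
# Load times in msec, copied from Table 9: dataset size -> (MySQL, Neo4j).
load_time_ms = {
    65_000: (2467, 3413),
    101_000: (2875, 3362),
    219_500: (5966, 4354),
    300_000: (5850, 5877),
    466_737: (7819, 6875),
}


def faster_system(mysql_ms, neo4j_ms):
    """Return the name of the system with the lower time, or 'tie'."""
    if mysql_ms == neo4j_ms:
        return "tie"
    return "MySQL" if mysql_ms < neo4j_ms else "Neo4j"


winners = {size: faster_system(*times) for size, times in load_time_ms.items()}
for size, name in winners.items():
    mysql_ms, neo4j_ms = load_time_ms[size]
    print(f"{size:>7}: {name} (difference {abs(mysql_ms - neo4j_ms)} msec)")
```

    On these numbers MySQL loads faster at 65,000, 101,000 and 300,000 records, while Neo4j loads faster at 219,500 and 466,737 records.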

  • 94/109

    Execution Time I

    Similar Task Algorithm

    Table 10: Execution Time of Step-8 & Step-9

    Dataset Size    Execution Time (msec)
                    Step-8              Step-9
                    MySQL    Neo4j      MySQL    Neo4j
    65,000          225      9616       2467     2403
    101,000         372      11700      2875     2925
    219,500         713      14655      5966     3664
    300,000         903      29520      5850     7380
    466,737         1403     48891      7819     12223
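
    To make the gap between the two systems in Table 10 easier to read, the ratio of Neo4j time to MySQL time can be computed per step. The sketch below only restates the reported numbers; `exec_time_ms` and `slowdown` are hypothetical names.

```python
# Execution times in msec, copied from Table 10.
# dataset size -> (Step-8 MySQL, Step-8 Neo4j, Step-9 MySQL, Step-9 Neo4j)
exec_time_ms = {
    65_000: (225, 9616, 2467, 2403),
    101_000: (372, 11700, 2875, 2925),
    219_500: (713, 14655, 5966, 3664),
    300_000: (903, 29520, 5850, 7380),
    466_737: (1403, 48891, 7819, 12223),
}


def slowdown(neo4j_ms, mysql_ms):
    """Ratio of Neo4j time to MySQL time; above 1 means Neo4j was slower."""
    return neo4j_ms / mysql_ms


step8_ratio = {}
step9_ratio = {}
for size, (s8_mysql, s8_neo4j, s9_mysql, s9_neo4j) in exec_time_ms.items():
    step8_ratio[size] = slowdown(s8_neo4j, s8_mysql)
    step9_ratio[size] = slowdown(s9_neo4j, s9_mysql)
    print(f"{size:>7}: Step-8 x{step8_ratio[size]:.1f}, Step-9 x{step9_ratio[size]:.2f}")
```

    For Step-8 the ratio stays above 20 at every dataset size, whereas for Step-9 it stays between roughly 0.6 and 1.6.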

  • 95/109

    Execution Time II

    Similar Task Algorithm

    Fig. 13: Execution Time of Step-8 & Step-9

  • 96/109

    Disk Usage in MySQL I

    Similar Task Algorithm

    Table 11: Disk Space Usage in MySQL.

    Tables      Dataset Size
                65000      101000     219500      300000      466737
    Dataset     3686400    5783552    11026432    15220736    21544960
    OTMatrix    65536      65536      65536       81920       81920
    InitSim     1589248    1589248    1589248     3686400     3686400
    FinalSim    229376     262144     278528      491520      1589248

  • 97/109

    Disk Usage in MySQL II

    Similar Task Algorithm

    Fig 14: Disk Space Usage in MySQL.

  • 98/109

    Disk Usage in Neo4j I

    Similar Task Algorithm

    Table 12: Disk Space Usage in Neo4j.

    Graph Elements    Dataset Size
                      65000      101000    219500    300000     466737
    Nodes             2820       2910      3075      3990       4215
    Relationships     770040     414315    479663    8568809    983227
    Properties        1033856    563873    651203    1155011    1323439

  • 99/109

    Disk Usage in Neo4j II

    Similar Task Algorithm

    Fig. 14: Disk Space Usage in Neo4j.

  • 100/109

    Load Time

    Sub Contract Algorithm

    Table 13: Load Time

    Dataset size    Load Time (msec)
                    MySQL    Neo4j
    65,000          6575     9567
    101,000         8390     10476
    219,500         14279    14873
    300,000         26437    25435
    466,737         43712    38234

    Fig. 15: Load Time

  • 101/109

    Execution Time in MySQL I

    Sub Contract Algorithm

    Table 14: Execution Time for 4 main steps in MySQL.

    Dataset Size    Execution Time (msec)
                    Update Normal    Sub-Contract Detection    Update Result    Normalize Result
    65,000          32               11712                     8296             16
    101,000         32               11782                     8138             16
    219,500         35               11713                     7940             17
    300,000         70               11736                     8094             17
    466,737         73               11747                     7754             20

  • 102/109

    Execution Time in MySQL II

    Sub Contract Algorithm

    Fig 16: Execution Time for 4 main steps in MySQL.

  • 103/109

    Execution Time in Neo4j I

    Sub Contract Algorithm

    Table 15: Execution Time for 4 main steps in Neo4j

    Dataset Size    Execution Time (msec)
                    Update Normal    Sub-Contract Detection    Update Result    Normalize Result
    65,000          118              1542                      2077             5
    101,000         140              1707                      2773             5
    219,500         202              2534                      2369             6
    300,000         336              3442                      5261             9
    466,737         560              4149                      5334             9
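
    Summing the four steps gives a single end-to-end figure per system, which makes the MySQL and Neo4j execution-time tables for the Sub Contract Algorithm directly comparable. The sketch below only adds up the reported values; the variable names are hypothetical.

```python
# Execution times in msec for the four steps, in the order
# (Update Normal, Sub-Contract Detection, Update Result, Normalize Result).
mysql_steps_ms = {  # copied from Table 14
    65_000: (32, 11712, 8296, 16),
    101_000: (32, 11782, 8138, 16),
    219_500: (35, 11713, 7940, 17),
    300_000: (70, 11736, 8094, 17),
    466_737: (73, 11747, 7754, 20),
}
neo4j_steps_ms = {  # copied from the Neo4j execution-time table
    65_000: (118, 1542, 2077, 5),
    101_000: (140, 1707, 2773, 5),
    219_500: (202, 2534, 2369, 6),
    300_000: (336, 3442, 5261, 9),
    466_737: (560, 4149, 5334, 9),
}

# Total time per dataset size for each system.
mysql_total = {size: sum(steps) for size, steps in mysql_steps_ms.items()}
neo4j_total = {size: sum(steps) for size, steps in neo4j_steps_ms.items()}

for size in mysql_total:
    print(f"{size:>7}: MySQL {mysql_total[size]} msec, Neo4j {neo4j_total[size]} msec")
```

    On the reported numbers the MySQL total stays close to 20,000 msec at every dataset size, while the Neo4j total grows from 3742 msec to 10052 msec and remains below the MySQL total throughout.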

  • 104/109

    Execution Time in Neo4j II

    Sub Contract Algorithm

    Fig. 17: Execution Time for 4 main steps in Neo4j.

  • 105/109

    Disk Space Usage in MySQL I

    Sub Contract Algorithm

    Table 15: Disk Space Usage in MySQL

    Tables           Dataset Size
                     65000      101000     219500      300000      466737
    Dataset          4734976    6832128    13123584    18366464    27836416
    OrganisedData    4734976    6832128    13123584    18366464    27836416
    ResultMatrix     1589248    1589248    1589248     1589248     1589248

  • 106/109

    Disk Space Usage in MySQL II

    Sub Contract Algorithm

    Fig 17: Disk Space Usage in MySQL

  • 107/109

    Disk Space Usage in Neo4j I

    Sub Contract Algorithm

    Table 16: Disk Space Usage for graph elements in Neo4j.

    Graph Elements    Dataset Size
                      65000        101000       219500       300000       466737
    Nodes             982212       1523732      3360798      4598454      7190330
    Relationships     153477921    183955761    285778449    375437997    490033038
    Properties        384189475    461537287    719874720    942665404    1238579332

  • 108/109

    Disk Space Usage in Neo4j II

    Sub Contract Algorithm

    Fig. 18: Disk Space Usage for graph elements in Neo4j.

  • 110/109

    Conclusion

    Neo4j performs better when it comes to loading the larger datasets (Tables 9 and 13).

    Read operations in MySQL are comparatively faster for a single-node setup.

    Neo4j gives much improved performance whenever relationships are of prime importance.

    Write performance varied greatly in both cases. For smaller datasets, MySQL performs better, whereas for larger datasets, Neo4j gives improved performance.

  • 112/109

    Limitations

    Different sizes of a single dataset were used.

    A single-node setup of each database was used.

    Only two organizational mining metrics were used.

  • 113/109

    Future Work

    Apply the algorithms to larger datasets.

    Create a multi-node Neo4j setup and implement the algorithms on it.

    Implement and study the impact of process enhancement and recommendation systems.

    Experiment with more relational and graph-oriented databases.
