Graph and Timeseries Databases
Databases 2 (VU) (706.711 / 707.030)
Roman Kern
Institute of Interactive Systems and Data Science, Technical University Graz
2018-10-22
Roman Kern (ISDS, TU Graz) Graph/TimeDBs 2018-10-22 1 / 30
Graph Databases
Motivation and Basics of Graph Databases
Introduction - Graph Databases
What is a graph database?
Data storage optimised for graph data structures
▶ i.e., efficient storage and access
Scales gracefully with the amount of data
Additional index for look-ups
Introduction - Graph Databases
Why should graph databases work?
Networks usually have certain properties
▶ Small-world phenomenon
▶ … even in big networks, only a few hops are required on average to reach even distant nodes
Access to data follows certain patterns
▶ Locality of reference
▶ … operations are focused on certain areas
Introduction - Graph Databases
Why should one not use a graph database?
… if all the data is updated at once
▶ E.g. an operation applied to all nodes
… if the query cannot easily be expressed as a graph traversal operation
▶ E.g. a lot of random access, or aggregate functions
Introduction
Graph database vs. relational database
In principle, a graph database can be implemented
▶ … via a relational database
▶ … using joins or consecutive queries
But relational databases are not optimised for such graph models
▶ … i.e., lots of sparse (semi-empty) rows
Additionally, relational databases are not designed
▶ … for changes in the schema, e.g. dynamic types of relations
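As a minimal sketch of the "via a relational database, using joins" point above, the following stores edges in a single table and answers a two-hop question with a self-join. The table name `edge`, the helper names and the sample people are assumptions made for illustration only; every further hop would need another join (or a consecutive query).

```python
import sqlite3


def build_edge_table(edges):
    """Create an in-memory relational table holding one row per directed edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def two_hops(conn, start):
    """Return the nodes reachable from `start` in exactly two hops.

    The self-join plays the role of the traversal: e1 is the first hop,
    e2 the second hop.
    """
    rows = conn.execute(
        "SELECT e2.dst FROM edge AS e1 "
        "JOIN edge AS e2 ON e1.dst = e2.src "
        "WHERE e1.src = ? ORDER BY e2.dst",
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_edge_table([("alice", "bob"), ("bob", "carol"), ("carol", "dave")])
print(two_hops(conn, "alice"))
```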
Introduction
Figure: Comparison of import time for 2 graph databases vs. a relational DB
Introduction - Graph Databases
Main concepts
Many contemporary graph databases are based on property graphs
▶ i.e., each node and each edge is associated with a set of key/value pairs
▶ … where edges are directed (and often carry a label)
Often support ACID properties
▶ i.e., each modification takes place within a transaction
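The property graph idea can be sketched in a few lines of Python. This is an illustrative in-memory structure, not how any particular graph database stores its data; the class names `Node`, `Edge` and `PropertyGraph` and the sample data are assumptions, and transactions (ACID) are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)  # key/value pairs of the node


@dataclass
class Edge:
    source: str   # edges are directed: source -> target
    target: str
    label: str    # e.g. "OFFICER_OF"
    properties: dict = field(default_factory=dict)  # key/value pairs of the edge


class PropertyGraph:
    def __init__(self):
        self.nodes = {}       # node_id -> Node
        self.out_edges = {}   # node_id -> list of outgoing Edge objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[source].append(Edge(source, target, label, properties))

    def neighbours(self, node_id, label=None):
        """Targets of outgoing edges, optionally restricted to one label."""
        return [e.target for e in self.out_edges[node_id]
                if label is None or e.label == label]


g = PropertyGraph()
g.add_node("p1", kind="officer", name="Jane Doe")
g.add_node("c1", kind="company", name="Example Ltd")
g.add_edge("p1", "c1", "OFFICER_OF", since=2010)
print(g.neighbours("p1", label="OFFICER_OF"))
```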
Introduction - Graph Databases
Query types for graph databases
Lookup of nodes
Traversal of a graph
▶ Start at a node
▶ … continue following edges
▶ … until a stopping criterion has been reached
▶ Breadth-first vs. depth-first
Path finding
▶ Find a path between two nodes (e.g. Dijkstra, A*)
Path matching
▶ Matching patterns in the graph
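Two of the query types above, traversal and path finding, can be sketched on a plain adjacency dictionary. The graph representation (node -> {neighbour: edge weight}), the function names and the use of a maximum depth as the stopping criterion are assumptions chosen for illustration; a real graph database would expose these operations through its own query language or API.

```python
import heapq
from collections import deque


def bfs(adjacency, start, max_depth):
    """Breadth-first traversal that stops expanding at `max_depth` hops."""
    depth = {start: 0}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        if depth[node] == max_depth:   # stopping criterion reached
            continue
        for neighbour in adjacency.get(node, {}):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return order


def dijkstra(adjacency, source, target):
    """Cheapest path from source to target as (cost, path), or None."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue                   # stale heap entry
        for neighbour, weight in adjacency.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


graph = {"a": {"b": 1, "c": 4}, "b": {"c": 1, "d": 5}, "c": {"d": 1}, "d": {}}
print(bfs(graph, "a", max_depth=1))
print(dijkstra(graph, "a", "d"))
```

A depth-first variant would replace the queue with a stack; Dijkstra assumes non-negative edge weights.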
Modelling of Graph Databases
How to represent the data and how to model
Modelling of Graph Databases
Approach
What should a graph database schema look like?
▶ i.e., how is the data represented as nodes and edges
Many different ways of modelling
▶ Graphs are a very flexible data structure
▶ … capable of capturing many domain models
In many cases a direct mapping is possible
▶ Domain model and graph model
Need to review the model
▶ Validate that the graph is suited for the queries being used
▶ E.g. don’t mix entities with relations
Modelling of Graph Databases
How to model for graph databases
Application
Practical aspects of graph databases
Application of Graph Databases
Main software tools for graph databases
Neo4j
OrientDB
TitanDB
… and many others
Application of Graph Databases
Panama Papers - Intro
Leaked documents from a firm in Panama (2.6TB of data)
▶ … about offshore activities
Journalists (around the world) were working on analysing the data
▶ … a graph database was the back-end of these activities
▶ Neo4j (plus Apache Solr and Tika)
Application of Graph Databases
Panama Papers - Steps
Populate the database
▶ Analyse the documents
⋆ e.g., entity extraction (detect names)
▶ → entities in the graph (entity types: company, officer, client, address, …)
▶ Extract meta-data of documents
▶ → properties for nodes
Detect relationships
▶ e.g. using the connection of sender/receiver of e-mails
▶ → connections between the nodes
Refinement of graph
▶ Manual work conducted by the journalists
▶ More entity types, e.g. money flow, document types
Application of Graph Databases
https://neo4j.com/blog/analyzing-panama-papers-neo4j/
Time Series Databases
Motivation and Basics of Time Series Databases
Introduction Time Series Databases
What is a time series database?
Data storage optimised for temporal data
▶ “Endless” stream of incoming data
▶ … challenge for traditional databases
Often accompanied by
▶ Tools to acquire time series data
▶ Tools to visualise and analyse such data
Introduction Time Series Databases
Typical characteristics of time series databases
Fast write/append operations
Slow update/delete operations
Scales well to huge amounts of data
Retention policy
▶ i.e., to forget old data
Access restrictions
Provide (SQL-like) query languages
▶ Optimised for time range queries
▶ Specialised queries for aggregates
Often rely on other storage mechanisms
▶ e.g., key-value store, wide-column storage
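Three of the characteristics above (fast appends, a retention policy, and time range queries with aggregates) can be illustrated with a small in-memory sketch. The class name `Series`, its methods and the integer timestamps are assumptions for illustration; real time series databases implement these ideas on disk and at a much larger scale.

```python
import bisect


class Series:
    """Append-only series with a retention window and range aggregates."""

    def __init__(self, retention):
        self.retention = retention   # how far back (in time units) data is kept
        self.timestamps = []         # kept sorted because writes are append-only
        self.values = []

    def append(self, timestamp, value):
        # Appending at the end is cheap; out-of-order writes are rejected here,
        # which mirrors why updates are the slow path in such systems.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(timestamp)
        self.values.append(value)
        # Retention policy: forget everything older than the retention window.
        cutoff = bisect.bisect_left(self.timestamps, timestamp - self.retention)
        del self.timestamps[:cutoff]
        del self.values[:cutoff]

    def range_mean(self, start, end):
        """Mean of all values with start <= timestamp <= end, or None."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        window = self.values[lo:hi]
        return sum(window) / len(window) if window else None


cpu = Series(retention=10)
for ts, value in [(0, 1.0), (5, 3.0), (12, 5.0)]:
    cpu.append(ts, value)
print(cpu.timestamps, cpu.range_mean(0, 12))
```

Because the timestamps stay sorted, a range query only needs two binary searches to locate its boundaries.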
Introduction Time Series Databases
Sources of time series
Observations
▶ Environmental, e.g., weather data, CO2
Economy
▶ e.g., stock exchange data
Sensor data
▶ e.g., human sensor data
Log files
…
Introduction Time Series Databases
Types of time series databases
Limited type of payload
▶ E.g. limited to just timestamp + number
▶ → least amount of memory needed
Flexible payload
▶ Allows for richer representation
▶ E.g. timestamp + document
Wide tables
▶ Each row consists of many columns
▶ … often hundreds of columns
▶ → sparse rows
Modelling of Time Series Databases
How to represent the data and how to model
Modelling of Time Series Databases
Approach
Often based on single samples (observations)
▶ Univariate, bivariate, multivariate
⋆ E.g. sensor readouts of multiple sensors (temperature, air pressure)
▶ Example: Measurement consists of
⋆ Timestamp, metric name, value, list of filters
⋆ E.g. 10:32, cpu-usage, 0.87, host=example.com, cpu=01
Flat file
▶ Generic vs. specific
▶ Store the name of the time series with each observation (generic)
⋆ Needed in case of dynamic systems
⋆ e.g. different sensors become available or disappear
▶ Have dedicated time series (specific)
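The measurement example and the generic vs. specific distinction above might look as follows in code. The `Measurement` class and the `to_dedicated_series` helper are hypothetical names; only the first sample row is taken from the slide, the second is made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One observation in the generic layout: the series name travels with it."""
    timestamp: str
    metric: str
    value: float
    filters: tuple = ()   # e.g. (("host", "example.com"), ("cpu", "01"))


def to_dedicated_series(observations):
    """Regroup generic observations into one dedicated series per metric/filters."""
    series = defaultdict(list)
    for obs in observations:
        series[(obs.metric, obs.filters)].append((obs.timestamp, obs.value))
    return dict(series)


filters = (("host", "example.com"), ("cpu", "01"))
generic = [
    Measurement("10:32", "cpu-usage", 0.87, filters),
    Measurement("10:33", "cpu-usage", 0.91, filters),  # made-up second sample
]
print(to_dedicated_series(generic))
```

The generic layout copes with sensors appearing and disappearing, at the cost of repeating the metric name and filters in every observation.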
Modelling of Time Series Databases
Approach
Windowed storage
▶ Each row represents a time window
▶ Columns for a more fine-grained resolution
⋆ Typically between 100 and 1000 observations per row
▶ Alternatively, multiple observations are stored in a single column
⋆ Using a custom (compressed) format
Special case: temporal and spatial data
▶ Requires specialised look-up methods
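A rough sketch of the windowed storage layout described above: observations are grouped into rows keyed by the start of their time window, and the offset within the window acts as the column. The function name `to_windowed_rows`, the integer timestamps and the window size of 60 are assumptions for illustration.

```python
def to_windowed_rows(observations, window_size):
    """Group (timestamp, value) pairs into rows, one row per time window.

    Returns {window_start: {offset_in_window: value}}; the inner dictionary
    stands in for the columns of a wide row.
    """
    rows = {}
    for timestamp, value in observations:
        window_start = timestamp - timestamp % window_size
        rows.setdefault(window_start, {})[timestamp - window_start] = value
    return rows


samples = [(0, 1.0), (30, 2.0), (61, 3.0)]
print(to_windowed_rows(samples, window_size=60))
```

A time range query then only has to touch the rows whose window overlaps the requested range, rather than one row per observation.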
Example for Time Series Databases
Practical Aspects of Time Series Databases
Time Series Databases Example
TICK Stack
Collection of tools:
▶ Telegraf: server agent for collecting and reporting metrics (stream or batch processing) to write data into the DB
▶ InfluxDB: the time series database component
▶ Chronograf: graphing and visualisation frontend for exploration
▶ Kapacitor: data processing engine, can process stream and batch data
https://www.influxdata.com/wp-content/themes/influx/images/TICK-Stack.png
Time Series Databases Example
InfluxDB Features
Tags
▶ Tags are indexed
▶ Store commonly queried meta-data
▶ If “GROUP BY” should be used on the data
Fields
▶ Fields are not indexed
▶ Everything that should not be stored as string
▶ If aggregation functions should be used on the data (COUNT, MAX, PERCENTILE, CUMSUM)
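To make the tag/field split concrete, the sketch below formats one point in InfluxDB's line protocol, where tags follow the measurement name and fields come after a space. The helper `to_line_protocol` is hypothetical (it is not part of any InfluxDB client library), it handles only numeric fields written as floats, and it assumes that names and values contain no commas, spaces or equals signs, so no escaping is performed. The sample values reuse the cpu-usage measurement from the modelling slide; the timestamp is made up.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Simplified on purpose: no escaping, float fields only.
    """
    # Tags: indexed metadata, written as strings; sorted by key for a stable output.
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    # Fields: the actual (non-indexed) values that aggregations run over.
    field_part = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "cpu-usage",
    tags={"host": "example.com", "cpu": "01"},
    fields={"value": 0.87},
    timestamp_ns=1540204320000000000,   # made-up nanosecond timestamp
)
print(line)
```

Putting `host` and `cpu` into tags makes them available for GROUP BY, while the numeric `value` stays a field so that aggregation functions can be applied to it.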
Time Series Databases Example
Figure: Screenshot of example data stored in InfluxDB
The End
Next: Map/Reduce