
Testing the In-Memory Column Store for in-database physics analysis

Dr. Maaike Limper


Maaike Limper - CERN 2

About CERN

CERN - European Laboratory for Particle Physics

Support the research activities of 10 000 scientists from 110+ nationalities

Largest machine in the world, the Large Hadron Collider: 27 km, 6000+ superconducting magnets

Four main experiments: ATLAS, ALICE, CMS, LHCb

17/6/2014


Higgs Boson discovery

4 July 2012: Scientists from ATLAS and CMS present the Higgs discovery result

Plots of the invariant mass of photon pairs produced at the LHC show a significant bump around 125 GeV …

› Operation of the Large Hadron Collider and its experiments relies on Oracle databases: conditions data, metadata, logging & monitoring data, …

› … but the data points in these plots did not come out of a database


CERN openlab

“CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community” http://openlab.web.cern.ch


My project: “Test the possibility of using the Oracle database for physics analysis”


In-database physics analysis

Higgs decay to 2 photons candidate: event display from the ATLAS experiment


In-database physics analysis

Physics Analysis database

› Separate physics-objects are stored in separate tables

› Each physics-object is described by hundreds of variables: wide tables!

[Plot with peaks labelled J/ψ and ψ(3686)]

Analysis queries

› Predicate filtering to quickly apply object quality-criteria

› Each analysis-specific query uses unique combination of columns
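The two points above can be sketched as a small query. This is only an illustration, assuming a hypothetical "electron" table with made-up columns (event_id, pt, eta, phi, is_tight, n_hits) and using SQLite from the Python standard library as a stand-in for the Oracle database used in the study. A real physics-object table would have hundreds of columns, and the quality criteria here are invented for the example.

```python
import sqlite3

# Hypothetical schema: a stand-in for a wide physics-object table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE electron ("
    " event_id INTEGER, pt REAL, eta REAL, phi REAL,"
    " is_tight INTEGER, n_hits INTEGER)"
)
rows = [
    (1, 45.2, 0.31, 1.20, 1, 12),
    (1, 12.7, -1.90, -2.40, 0, 7),
    (2, 33.8, 2.70, 0.55, 1, 11),
    (3, 28.1, -0.85, 2.95, 1, 10),
]
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?, ?)", rows)


def select_good_electrons(conn, min_pt, max_abs_eta):
    """Return (event_id, pt) for electrons passing the quality criteria.

    The WHERE clause is the predicate filter; the SELECT list names only
    the columns this particular analysis needs.
    """
    query = (
        "SELECT event_id, pt FROM electron "
        "WHERE is_tight = 1 AND pt > ? AND ABS(eta) < ? "
        "ORDER BY event_id"
    )
    return conn.execute(query, (min_pt, max_abs_eta)).fetchall()


good = select_good_electrons(conn, min_pt=25.0, max_abs_eta=2.5)
print(good)
```

A different analysis would filter on and select a different subset of columns from the same table, which is why no fixed set of indexes fits every query.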


The problem

› Analysis query performance is typically limited by I/O reads
  Full table scans over tables with many columns, while only a few columns are used for each specific analysis

› The combination of columns is unique for each query
  Can’t index every column!


In-Memory Column Store

Oracle’s In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns


› Profit from fast In-Memory reads

› Read only columns relevant for the specific analysis query
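A rough back-of-the-envelope sketch of why reading only the relevant columns matters for wide tables. The row count, column counts and value size below are hypothetical and are not taken from the study. The sketch also ignores compression and storage overheads.

```python
def bytes_read_row_store(n_rows, n_columns, bytes_per_value):
    """A full table scan over row-oriented storage reads every column of every row."""
    return n_rows * n_columns * bytes_per_value


def bytes_read_column_store(n_rows, n_columns_used, bytes_per_value):
    """A columnar scan reads only the columns the query refers to."""
    return n_rows * n_columns_used * bytes_per_value


# Hypothetical numbers: 1 million rows, 300 columns of 4-byte values,
# of which one analysis query uses 5.
n_rows = 1_000_000
row_bytes = bytes_read_row_store(n_rows, n_columns=300, bytes_per_value=4)
col_bytes = bytes_read_column_store(n_rows, n_columns_used=5, bytes_per_value=4)
reduction = row_bytes / col_bytes

print(f"row store:    {row_bytes / 1e6:.0f} MB read")
print(f"column store: {col_bytes / 1e6:.0f} MB read")
print(f"reduction:    {reduction:.0f}x")
```

Under these assumptions the reduction factor is simply the ratio of total columns to columns used, so the benefit grows with the width of the table.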


Compression rates

› COMPRESS FOR QUERY vs CAPACITY HIGH
  “electron”: typical physics-object data, a mixture of int, float and double
  “Event Filter”: only booleans (mostly false), best compression
  “Missing Energy”: table with floats and doubles, worst compression

Table name          Compression ratio, IMC capacity high    Compression ratio, IMC query
“electron”          3.52                                     1.97
“Event Filter”      63.46                                    22.13
“Missing Energy”    1.7                                      1.2


The average compression ratio of the dataset is 2.1 with query compression and 3.6 with capacity high, close to the “electron” values because physics-objects represent the bulk of the data
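To give some intuition for why a column of mostly-false booleans compresses so much better than a column of floating-point values, here is a minimal sketch using run-length encoding. The slides do not say which encodings the In-Memory Column Store applies, so run-length encoding is only a stand-in, and the data and the ratio definition below are made up for the illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def compression_ratio(values):
    """Original value count divided by stored items (two per run)."""
    runs = run_length_encode(values)
    return len(values) / (2 * len(runs))


# Hypothetical "Event Filter"-like column: 1000 flags, only three are true.
flags = [False] * 1000
for index in (100, 500, 900):
    flags[index] = True

# Hypothetical "Missing Energy"-like column: 1000 distinct float values.
energies = [float(i) + 0.5 for i in range(1000)]

flag_ratio = compression_ratio(flags)
energy_ratio = compression_ratio(energies)
print(f"flags:    {flag_ratio:.1f}")
print(f"energies: {energy_ratio:.1f}")
```

Long runs of identical values collapse into a handful of pairs, while distinct values give this particular scheme nothing to exploit. That matches the qualitative ordering in the table, although the actual ratios there come from Oracle's own compression and not from this toy encoding.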


Simple query performance

› Comparing “read from disk” vs IMC time: 1000x faster

› Comparing “read from buffer cache” vs IMC time: 40x faster

Note: 2x more memory is needed to put the data in the buffer cache than to place it in the In-Memory Column Store!


Complex query performance

› Comparing “read from disk” vs IMC time: 70x faster

› Comparing “read from buffer cache” vs IMC time: 7x faster


With IMC it takes only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations


Conclusion

IMC’s STAR-story:

› Situation: In-database physics analysis is limited by I/O

› Task: Remove I/O bottleneck for any query using any combination of columns in a table

› Action: Use Oracle’s In-Memory Column Store
  Take advantage of fast reads from cache
  Columnar compression increases the size of the data that fits in memory
  Access only relevant columns and use predicate pruning to further reduce I/O

› Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*

*While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis.
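The predicate pruning mentioned under "Action" can be pictured with a generic min/max sketch: each chunk of a column keeps its minimum and maximum, and a scan skips any chunk whose range cannot satisfy the predicate. This is an illustration of the general technique under assumed chunk sizes and made-up values, not a description of Oracle's implementation.

```python
def build_chunks(values, chunk_size):
    """Split a column into chunks and record each chunk's min and max."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        data = values[start:start + chunk_size]
        chunks.append({"min": min(data), "max": max(data), "data": data})
    return chunks


def scan_greater_than(chunks, threshold):
    """Return values above threshold and the number of chunks actually read."""
    matches = []
    chunks_scanned = 0
    for chunk in chunks:
        if chunk["max"] <= threshold:
            continue  # no value in this chunk can pass the predicate
        chunks_scanned += 1
        matches.extend(v for v in chunk["data"] if v > threshold)
    return matches, chunks_scanned


# Hypothetical column of transverse-momentum values, split into chunks of four.
pt_values = [5.0, 8.0, 3.0, 7.0, 40.0, 12.0, 9.0, 55.0, 2.0, 4.0, 6.0, 1.0]
chunks = build_chunks(pt_values, chunk_size=4)
matches, chunks_scanned = scan_greater_than(chunks, threshold=25.0)
print(matches, chunks_scanned)
```

In this toy column only one of the three chunks has to be read for the predicate pt > 25, since the other two are ruled out by their maximum alone.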