TRANSCRIPT
Non-Traditional Databases
Reading
1. Scientific data management at the Johns Hopkins Institute for Data Intensive Engineering and Science, Yanif Ahmad, Randal Burns, Michael Kazhdan, Charles Meneveau, Alex Szalay, Andreas Terzis, February 2011, SIGMOD Record, Volume 39, Issue 3, http://dl.acm.org/citation.cfm?id=1942776.1942782&coll=DL&dl=ACM&CFID=66206057&CFTOKEN=48992457
2. Migrating a (large) science database to the cloud, Ani Thakar, Alex Szalay, June 2010, HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, http://dl.acm.org/citation.cfm?id=1851539&bnc=1
Farkas CSCE 824 - Spring 2011
Reading
3. M. Stonebraker, U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, in ICDE '05: Proceedings of the 21st International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, 2005, http://www.computer.org/portal/web/csdl/abs/proceedings/icde/2005/2285/00/22850002abs.htm
Traditional Database Management Systems
Focus on business data management
Provide uniform capabilities regardless of the data characteristics
Need: capabilities to meet new application requirements
Examples of New Needs
Stream data processing
Large-scale scientific databases
Data warehousing
Streaming Data
Sensor-based applications
  – Real-time systems: sophisticated alerting, location-based services
  – Historical data
Financial applications
  – Support applications, such as electronic trading, legal compliance, real-time market analysis, etc.
Performance requirements
Performance: SDMS vs. RDMS
Empirical results (see reference paper #3)
Issues:
  – Inbound processing model
  – Correct primitives for stream processing (aggregates, "timeout," "slack")
  – Seamless integration of DBMS processing with application processing (client-server vs. embedded applications)
  – Transactional behavior (weaker notion of recovery, tolerance, no ACID requirements)
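
The stream primitives named above (aggregates, "timeout," "slack") can be illustrated with a small sketch. This is not the design from reference paper #3. The class name WindowedAverage and the exact meanings given to slack and timeout here are assumptions made for illustration. Slack is taken to be how far behind the newest timestamp a tuple may arrive and still be counted, and timeout is how long after a window's end the window is emitted even if no new tuple arrives.

```python
from collections import defaultdict


class WindowedAverage:
    """Tumbling-window average over inbound (timestamp, value) tuples.

    Hypothetical sketch: `slack` and `timeout` follow the simplified
    definitions given in the lead-in, not any specific engine's semantics.
    """

    def __init__(self, width, slack=0, timeout=0):
        self.width = width
        self.slack = slack
        self.timeout = timeout
        self.max_ts = None               # newest timestamp seen so far
        self.sums = defaultdict(float)   # window start -> running sum
        self.counts = defaultdict(int)   # window start -> tuple count

    def push(self, ts, value):
        """Process one tuple as it arrives; return the windows it closes."""
        if self.max_ts is not None and ts < self.max_ts - self.slack:
            return []                    # arrived later than slack allows: dropped
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        start = (ts // self.width) * self.width
        self.sums[start] += value
        self.counts[start] += 1
        return self._close(self.max_ts - self.slack)

    def tick(self, now):
        """Timer callback: emit windows that ended at least `timeout` ago."""
        return self._close(now - self.timeout)

    def _close(self, watermark):
        closed = []
        for start in sorted(self.sums):  # sorted() copies the keys, so popping is safe
            if start + self.width <= watermark:
                total = self.sums.pop(start)
                closed.append((start, total / self.counts.pop(start)))
        return closed


agg = WindowedAverage(width=10, slack=2, timeout=5)
agg.push(1, 10.0)
agg.push(11, 30.0)
agg.push(9, 20.0)            # out of order, but within slack: still counted
print(agg.push(12, 50.0))    # closes window [0, 10)
print(agg.tick(now=25))      # timeout closes window [10, 20)
```

Tuples are processed as they arrive and are never stored first, which is the inbound processing model the slide contrasts with a traditional store-then-query DBMS.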
Security for Streaming Data?
What is the difference between the security needs of streaming vs. traditional (e.g., relational) data?
How to enforce security?
  – Security punctuation
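
One way to picture security punctuation is as policy metadata embedded in the stream itself, where each punctuation governs the data tuples that follow it. The sketch below is a deliberately simplified illustration. The tuple shapes ("sp", roles) and ("data", payload) and the function name filter_stream are assumptions, not a published format.

```python
def filter_stream(stream, reader_role):
    """Yield only the data tuples that reader_role is allowed to see.

    Hypothetical encoding: ("sp", roles) is a security punctuation that
    replaces the current policy; ("data", payload) is an ordinary tuple.
    """
    allowed = set()                  # default deny until a punctuation arrives
    for kind, body in stream:
        if kind == "sp":
            allowed = set(body)      # policy applies to the tuples that follow
        elif kind == "data" and reader_role in allowed:
            yield body


stream = [
    ("data", "t0"),
    ("sp", {"doctor", "nurse"}),
    ("data", "t1"),
    ("sp", {"doctor"}),
    ("data", "t2"),
]
print(list(filter_stream(stream, "nurse")))
print(list(filter_stream(stream, "doctor")))
```

Unlike a relational table, where access rules are stored in a catalog and checked per query, the policy here travels with the data and can change mid-stream.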
Scientific Databases
Massive amount of data
Heterogeneous data
  – Sensor data, satellite, scientific simulation data, etc.
Goal: better understanding of physical phenomena
  – Genomic database, geological exploration, astronomy, etc.
Scientific Databases
Need efficient analysis and querying capabilities
  – Multi-dimensional indexing (e.g., genomic sequence indexing)
  – Specific applications (e.g., visualization of seismic data)
  – Specific aggregations (e.g., data mining for biological correlation)
  – Efficient data archiving, staging, lineage, and error propagation techniques
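
As a rough illustration of multi-dimensional indexing, the sketch below builds a uniform grid over 2-D points and answers a rectangular range query by visiting only the overlapping cells. A grid is just one of many possible structures and is not the one used by any of the referenced systems. The class name GridIndex and the sample coordinates are assumptions.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points (a hypothetical, minimal spatial index)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (cell_x, cell_y) -> [(x, y, payload)]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, payload):
        self.cells[self._cell(x, y)].append((x, y, payload))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return payloads of all points inside the closed rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells while querying
                for x, y, payload in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(payload)
        return hits


index = GridIndex(cell_size=10)
index.insert(1, 1, "a")
index.insert(15, 15, "b")
index.insert(35, 2, "c")
index.insert(12, 18, "d")
print(sorted(index.range_query(10, 10, 20, 20)))
```

The query inspects only the cells that overlap the rectangle instead of scanning every point, which is the property that matters for massive scientific data sets.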
Example Scientific Data Management
Reference #1
Basic research:
1. formation of hypotheses and theories
2. designing experiments for their validation
3. collecting data by experimentation
4. analyzing data to guide new insights for further research
Scientific Computing
Steps 3 and 4 are data intensive
Need to improve computational power
  – Parallel processing
  – Grid and supercomputers
  – Special application logic
  – Preservation of scientific data
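
The parallel-processing idea for the data-intensive analysis step can be sketched as a partition, partial aggregate, combine pattern. The function names and chunking scheme below are assumptions. Threads are used only to keep the sketch self-contained. For CPU-bound work in CPython, a process pool or a cluster framework would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-partition work: return (count, sum) for one chunk."""
    return (len(chunk), sum(chunk))


def parallel_mean(values, n_chunks=4):
    """Mean of `values`, computed from independent partial aggregates."""
    if not values:
        raise ValueError("values must be non-empty")
    size = -(-len(values) // n_chunks)   # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(partial_stats, chunks))
    count = sum(c for c, _ in parts)
    total = sum(s for _, s in parts)
    return total / count


print(parallel_mean(list(range(1, 101))))
```

Because each chunk is summarized independently, the same pattern carries over when the chunks live on different machines of a grid or cluster.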
Current Technologies and Scientific Databases
Reference #2: How to migrate a large-scale scientific database to a cloud environment?
Difficult engineering process
Limited capabilities of database user
Based on commercial cloud
Data Warehousing
Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
  – Data mart (single subject area)
  – Enterprise data warehouse (integrated data marts)
  – Metadata
Data Warehousing
Difference between OLTP and OLAP
Data management: updates, indexing, dependencies, etc.
OLAP: needs read-optimized storage
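
One common form of read-optimized storage is a column-oriented layout, shown here as an illustrative sketch and not as the only option. The sample orders table and the function names are made up for the example. An OLTP-style row layout keeps each record together, which suits single-record updates. A column layout keeps each attribute together, so an analytical aggregate reads only the attributes it needs.

```python
# Row layout: one dict per record (hypothetical sample data).
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 40.0},
]


def to_columns(rows):
    """Pivot a row layout into a column layout: attribute -> list of values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_by_region(columns):
    """OLAP-style aggregate that reads only the two columns it needs."""
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


columns = to_columns(rows)
print(total_by_region(columns))
```

The aggregate never reads order_id, whereas a row layout would have to read every full record to answer the same query.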
Next Class
Geographical Databases