Sensor Data Warehouse Design March 14 2005 MySQL Conf



Page 1: Sensor Data Warehouse Design March 14 2005 MySQL Conf

MySQL Users Conf. 04-19-2005

MIT Lincoln Laboratory

Real-Time Sensor Data Warehouse Architecture Using MySQL Database

Jacob Nikom

MIT Lincoln Laboratory

The MySQL Users Conference 2005

19 April 2005

This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002. Opinions, interpretations, recommendations and conclusions are those of the author and are not necessarily endorsed by the United States Government.

Page 2: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Outline

• Introduction

• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)

• Designing ROCC DMA using CIF architecture

• Summary

Page 3: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Outline

• Introduction

– Reagan Test Site (RTS) and its instrumentation

– What is RTS Operations Coordination Center (ROCC)?

– ROCC primary operations

– ROCC logical component block diagram

– ROCC modernization

– New ROCC Data Management Architecture

• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)

• Designing ROCC DMA based on CIF architecture

• Summary

Page 4: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Reagan Test Site (RTS) and its Instrumentation

• The Reagan Test Site (RTS) range instrumentation

– Multiple RF sensors collecting data in several regions of the electromagnetic spectrum

– Multiple optical sensors collecting objects’ metrics and spectral characteristics

– Telemetry systems capable of tracking multiple targets

– Mobile and fixed ground safety instrumentation

Page 5: Sensor Data Warehouse Design March 14 2005 MySQL Conf


What is RTS Operations Coordination Center (ROCC)?

[Figure: Current DMA. Elements: Sensors; Network; Flat Files; Data Analysis Algorithms; Decision Algorithms; Displays.]

• RTS instrumentation is controlled by the ROCC

• ROCC primary operations

– Executes the prepared scenario for the acquisition session

– Manages the data flow from multiple sensors

– Processes the acquired data

– Provides operator displays to track and predict the path of space objects

– Stores the acquired data for later analysis and reporting

– Facilitates training and simulation of performed activities

Page 6: Sensor Data Warehouse Design March 14 2005 MySQL Conf


What kind of system is ROCC?

• Control is the process of making a system variable adhere to a particular value, called the reference value

• A system designed to follow a changing reference is called a tracking control system

[Figure: Feedback control system block diagram. Forward path: the comparator subtracts the feedback signal b(t) from the reference input r(t) to form the error signal e(t); the controller turns e(t) into the actuating signal m(t), which drives the plant to produce the controlled variable c(t). Feedback path: the feedback processor converts c(t) into b(t).]

ROCC is a tracking control system following the predefined reference input
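The loop in the block diagram can be illustrated with a few lines of simulation. This is only a minimal sketch, not ROCC code: the proportional controller, the gain value, the integrating plant, and the identity feedback processor are all assumptions chosen to keep the example short.

```python
def simulate_tracking(reference, gain=0.5):
    """Simulate a discrete-time tracking loop.

    reference: sequence of reference inputs r(t).
    Returns the list of controlled-variable values c(t), one per step.
    """
    c = 0.0                # controlled variable c(t), starts at rest
    history = []
    for r in reference:
        b = c              # feedback processor (assumed identity): b(t) = c(t)
        e = r - b          # comparator: error signal e(t) = r(t) - b(t)
        m = gain * e       # controller (assumed proportional): actuating signal m(t)
        c = c + m          # plant (assumed integrator): accumulates m(t)
        history.append(c)
    return history


if __name__ == "__main__":
    # With a constant reference the output approaches the reference value.
    print(simulate_tracking([10.0] * 20)[-1])
```

With a gain between 0 and 1 the error shrinks by a constant factor at every step, which is the behavior a tracking system needs when the reference changes slowly compared with the loop rate.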

Page 7: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Current ROCC DMA Block Diagram

[Figure: Current ROCC DMA block diagram. Blocks: Planning; Reference Data; Data Plant; Sensors; Simulation; Automatic Real-Time Processing & Analysis; Manual Processing & Analysis; Displays; Voice; Operators; Tracking; Fusion; Classification; Identification; Trajectory Estimation; Output Data; Report: Data analysis. Loop: Tactical decision control loop.]

• ROCC controls the data acquisition, analysis and distribution processes

• Maximizes the quality of delivered data over specified time

Page 8: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Modernization

• Obsolete system hardware

– Old central processors and boards are no longer supported

– Not enough computational power to perform new tasks

– Old components and interfaces are incompatible with modern technology

• Aging system software

– Centralized monolithic architecture

– Flat files for storing data

– Use of old procedural languages

– Alphanumeric displays

• Modernized system

– Industry standard 32/64-bit Xeon or Opteron servers

– Software vendor independence: Linux and Java

– Database-based storage

– Distributed architecture using publish/subscribe paradigm

– Graphical user interface for visualization tools

– Targeted dataflow rates: 5 MB/s (sustained), 10 MB/s (peak)

– Data accumulation rate: 1 TB/year
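The targeted rates and the yearly accumulation figure can be related by simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and, purely for illustration, that the whole yearly volume arrives at a single constant rate; the slides state neither assumption.

```python
MB = 10**6   # assumed decimal megabyte
TB = 10**12  # assumed decimal terabyte

SUSTAINED_RATE = 5 * MB    # bytes per second, from the slide
PEAK_RATE = 10 * MB        # bytes per second, from the slide
YEARLY_VOLUME = 1 * TB     # bytes per year, from the slide


def acquisition_hours(volume_bytes, rate_bytes_per_s):
    """Hours of continuous acquisition needed to produce volume_bytes."""
    return volume_bytes / rate_bytes_per_s / 3600.0


hours_sustained = acquisition_hours(YEARLY_VOLUME, SUSTAINED_RATE)  # about 55.6 hours
hours_peak = acquisition_hours(YEARLY_VOLUME, PEAK_RATE)            # about 27.8 hours

if __name__ == "__main__":
    print(f"{hours_sustained:.1f} h at sustained rate, {hours_peak:.1f} h at peak rate")
```

Under these assumptions, 1 TB/year corresponds to only a few dozen hours of full-rate acquisition per year, which is consistent with data arriving in bursts during acquisition sessions rather than continuously.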

Page 9: Sensor Data Warehouse Design March 14 2005 MySQL Conf


New Data Management Architecture

• ROCC data management challenges

– Support powerful high-precision instrumentation with almost real-time response

– Support intensive and costly data collection process involving many human operators with high level of reliability

– Support data analysis leading to changes in data acquisition environment

– Be adequate for the wide range of transaction types – from simple real-time record reads and inserts to complex multidimensional analytical queries

– Manage combination of streaming data with traditional structures

– Provide request management, configuration management and data quality management capabilities

• Search for a new data management architecture

– The new system represents a conceptual change from the old architecture

– Instrumentation and control software traditionally concentrates on algorithm development and lacks a good data architecture

– Need for a framework supporting the “analysis – decision – execution” paradigm

– Enterprise software is a leading implementer of distributed architecture and the publish/subscribe paradigm
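The publish/subscribe paradigm mentioned above decouples data producers from data consumers: a producer publishes to a topic without knowing who listens. A minimal in-process sketch follows; the `Broker` class, the topic names, and the message fields are hypothetical and only stand in for whatever middleware a real system would use.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-process message broker keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    stored, displayed = [], []
    broker.subscribe("track", stored.append)      # e.g. a storage consumer
    broker.subscribe("track", displayed.append)   # e.g. a display consumer
    broker.publish("track", {"track_id": 1, "range_m": 1000.0})
    print(len(stored), len(displayed))
```

Adding a new consumer only requires another `subscribe` call; the publisher is not modified.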

Page 10: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Outline

• Introduction

• Corporate Information Factory (CIF) for Data Management Architecture

– What is Corporate Information Factory (CIF)?

– CIF data flow diagram

– CIF data

– CIF layers

– CIF logical component block diagram

• Designing ROCC data management architecture using CIF architecture

• Summary

Page 11: Sensor Data Warehouse Design March 14 2005 MySQL Conf


What is Corporate Information Factory (CIF)? *

• Information ecosystem is a model of corporate information processing

– “CIF is the physical embodiment of the notion of an information ecosystem”

• CIF consists of the following components

– External world

– Applications

– An integration and transformation layer (I & T layer)

– An operational data store (ODS)

– A data warehouse (DW) with current and historical detailed data

– Data mart(s)

– An internet and intranet

– A metadata repository

– An exploration and data mining warehouse

– Alternative (secondary) storage

– Decision support system (DSS)

• CIF approach could be used for modeling information processing in any organization (“forest vs. trees” view)

* “Corporate Information Factory”, by W.H. Inmon, Claudia Imhoff, Ryan Sousa. Wiley; 2nd edition (December 18, 2000)

Page 12: Sensor Data Warehouse Design March 14 2005 MySQL Conf


CIF Data Flow Diagram

[Figure: CIF data flow diagram. Layers: Application layer; Operational layer; Warehouse layer; Report & Analysis layer. Elements: External world (Enterprise transactions, Internet, Enterprise Resource Planning (ERP), External data); Data acquisition; Integration & Transform layer; Reference data; Historical reference data; CRM (tx), eComm (tx), ERP (tx), BI (tx); ODS; Operational reports; DW; Primary storage management; Metadata management; Row detailed data; Alternative storage; Data delivery; Exploration warehouse; Data mining warehouse; Statistical analysis; DSS applications; Data marts (Finance, Sales, Marketing, Accounting); eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt).]

CRM = Customer Relationship Management

BI = Business Intelligence

Page 13: Sensor Data Warehouse Design March 14 2005 MySQL Conf


CIF Data

• External data

– Data is defined outside of corporation. Could have erroneous, redundant or unnecessary items

– Data format is defined outside of corporation. Reformatting could be required

• Reference data

– Allows standardizing on commonly used names for important and frequently used information

– Allows consistent interpretation of corporate data across different departments

– Could be aliases for common and frequently referenced names

• Historical data

– Volume of data: the longer the history, the more data

– Usefulness of data: recent data is more useful than older data

– Granularity of data: older data is likely to be used at the summary level

[Figure: Data placed along the corporate timeline (ancient history, recent history, most current activity, immediate future), with the DW, ODS, and applications shown against it.]

Page 14: Sensor Data Warehouse Design March 14 2005 MySQL Conf


CIF Layers

• Application layer

– Interacting directly with end user

– Gathering detailed transaction data

– Auditing and adjusting data

– Editing data

• Integration and transformation layer

– Combining non-integrated data from multiple applications

– Transforming external data into corporate data

– Creating appropriate metadata

– Mathematical transformation

– Reformatting and resequencing


Page 15: Sensor Data Warehouse Design March 14 2005 MySQL Conf


CIF Layers (Continued)

• Operational layer

– Subject-oriented

– Integrated

– Volatile

– Current-valued

– Detailed

– Normalized

• Warehouse layer

– Subject-oriented

– Integrated

– Nonvolatile

– Time-variant

– Comprised of both summary and detailed data

– Summary data optimized for Report & Analysis queries

– Normalized and de-normalized data

• Report & Analysis layer

– Statistical analysis

– Exploration reporting

– Data mining reporting

– DSS analysis and reporting

– Finance

– Sales

– Marketing

– Accounting

[Figure labels: ODS; Data Warehouse; eComm (rpt); CRM (rpt); ERP (rpt); BI (rpt); Statistics.]

Page 16: Sensor Data Warehouse Design March 14 2005 MySQL Conf


CIF Logical Component Block Diagram

[Figure: CIF logical component block diagram. Blocks: Corporate Goals; Reference Data; Data Plant; Applications; Operational Data Store; Real-time DSS; Data Warehouse; Long-term DSS; Output Data; Corporate Report. Loops: Tactical decision control loop; Strategic decision control loop.]

• The system controls the corporation's resources using real-time and long-term DSS

• Maximizes the expected profit of the corporation over a specified time

Page 17: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Outline

• Introduction

• Corporate Information Factory (CIF) for Data Management Architecture (DMA)

• Designing ROCC DMA using CIF architecture

– ROCC data flow diagram

– ROCC data

– ROCC layers

– ROCC logical component block diagram

– Database selection

– Three dangers of database design

• Summary

Page 18: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Data Flow Diagram

[Figure: ROCC data flow diagram. Layers: External world; Data acquisition; Integration & Transform layer; Operational layer; Warehouse layer; Report & Analysis layer. Elements: Sensor control data; Multicast middleware; four RIBs; Reference data; Operational data; ODS; DSS applications (Best Choice, Smoother, Data Fusion, Classifier); Quick Look reports; Short-term reporting & analysis; DW; Archived data; Secondary storage; Long-term reporting & analysis; Space data marts; Planning; Post overview BET; Impact; Bias modeling; Data mining warehouse.]

RIB = ROCC Interface Box

Page 19: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Data

• External data

– Data is defined outside of ROCC. Could have erroneous, redundant, or unnecessary items

– Data format is defined outside of ROCC. Reformatting or object conversion could be required

• Reference data

– Comprise geophysics models and constants necessary for external data interpretation

– Comprise common locations, sensor names, names of computers and programs

– Comprise the user names, passwords, access rights and privileges

• Historical data

– Operational data being migrated to the warehouse become historical data

– Detailed historical data are used to produce summarized historical data

– Historical data are only inserted, never updated

• Planning data

– Comprise configuration data for the sensors’ acquisition procedures

– Comprise ROCC software components’ configuration data (XML format)

– Comprise data to plan specific activities to acquire space objects’ coordinates

Page 20: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Layers

• External world

– Simultaneous output from multiple sensors up to 10 MB/s

– Capable of producing data autonomously

– Capable of working under the guidance of DSS applications

– Produces data as streams with considerable output rates

• Integration and transformation layer

Plays a vitally important role in reconciling the incoming external data content and format with the internal data requirements

– Converts incoming data into appropriate Java objects

– Creates necessary metadata

– Mathematical transformation

– Reformatting and resequencing

[Figure labels: Feedback from DSS applications; four RIBs.]
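The steps of the integration and transformation layer (object conversion, metadata creation, mathematical transformation, resequencing) can be sketched as follows. The slides describe conversion into Java objects; this sketch uses Python for brevity, and the raw record layout, the field names, the units, and the `Measurement` class are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical internal object: SI units, angles in radians."""
    sensor: str
    time_s: float
    range_m: float
    az_rad: float
    el_rad: float


def transform(raw_records):
    """Convert raw sensor records into internal objects plus metadata.

    raw_records: iterable of dicts with keys 'sensor', 't' (seconds),
    'range_km', 'az_deg', 'el_deg' (an assumed external format).
    """
    objects = [
        Measurement(
            sensor=rec["sensor"],
            time_s=float(rec["t"]),
            range_m=float(rec["range_km"]) * 1000.0,     # km -> m
            az_rad=math.radians(float(rec["az_deg"])),   # deg -> rad
            el_rad=math.radians(float(rec["el_deg"])),
        )
        for rec in raw_records
    ]
    objects.sort(key=lambda m: m.time_s)                 # resequencing by time
    metadata = {
        "count": len(objects),
        "sensors": sorted({m.sensor for m in objects}),
        "units": {"range": "m", "angle": "rad", "time": "s"},
    }
    return objects, metadata
```

Keeping unit conversion and ordering in one place means that everything downstream can rely on a single coordinate and unit convention.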

Page 21: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Layers (continued)

• Operational Layer

– Subject-oriented

Focusing on basic transaction processing. Inserts and reads the streams of integrated and transformed sensor data

• Tracks, IDs, control blocks, etc.

– Integrated

Physical unification and cohesiveness

• Uniform key structures

• Table naming conventions

• Common physical units and coordinate systems

• Data layouts and metadata

– Volatile

ODS data could be updated (replaced) as a normal part of processing. After the acquisition session is done, the data are moved to the DW

– Current-valued

ODS data values are related to the current event (current acquisition session). For the next mission the ODS will be updated and its content will be moved to the DW (data migration)

– Detailed

ODS contains inserted values of the published sensor objects and is not expected to hold summary data

– Normalized

ODS contains normalized data

– Decision Support System Applications

Make real-time operational decisions such as ID assignment, sensor allocation, etc.

[Figure labels: ODS; DSS applications (Best Choice, Smoother, Data Fusion, Classifier).]
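The volatile, current-valued behavior of the ODS and the migration of a finished session into the DW can be sketched as below. This is an illustration under stated assumptions, not the ROCC schema: SQLite (from the Python standard library) stands in for MySQL, and the table names, column names, and `session_id` tagging are hypothetical.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables: the ODS holds only the current session,
    # the DW keeps every past session tagged with a session_id.
    conn.executescript(
        """
        CREATE TABLE ods_track (track_id INTEGER, time_s REAL, range_m REAL);
        CREATE TABLE dw_track (session_id INTEGER, track_id INTEGER,
                               time_s REAL, range_m REAL);
        """
    )


def migrate_session(conn, session_id):
    """Copy all ODS rows into the DW, then empty the ODS; return rows moved."""
    with conn:  # commit on success, roll back if any statement fails
        moved = conn.execute("SELECT COUNT(*) FROM ods_track").fetchone()[0]
        conn.execute(
            "INSERT INTO dw_track (session_id, track_id, time_s, range_m) "
            "SELECT ?, track_id, time_s, range_m FROM ods_track",
            (session_id,),
        )
        conn.execute("DELETE FROM ods_track")
    return moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO ods_track VALUES (?, ?, ?)",
        [(1, 0.0, 1000.0), (1, 0.1, 990.0), (2, 0.0, 5000.0)],
    )
    # Volatile: an ODS value may be replaced during the session.
    conn.execute(
        "UPDATE ods_track SET range_m = 995.0 WHERE track_id = 1 AND time_s = 0.1"
    )
    print(migrate_session(conn, session_id=1))
```

Running the copy and the delete in one transaction keeps a failed migration from leaving the session half in the ODS and half in the DW.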

Page 22: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC ODS Specifics

• Data streams of objects

– Streams of measurements usually don’t have very complex structures

– Object-relational mapping is straightforward and not computationally intensive

• Indices

– High-speed insertion does not allow the use of indices

– The relatively small size of the ODS makes it possible to work without indices

– Indices do exist in the DW

• Real-time DSS feedback

– Could control the sensors, which in turn influences the input data

– Typical analytical applications assume that the data producer does not change during the query

[Figure: Elements: Primary System (ODS); Secondary System (ODS); DW; Archive System; Network connections between them.]

Additional benefits

• Necessary operations could be performed during the copying

• Two operational databases could be used in parallel right after the acquisition

• Fault-tolerance (primary and secondary ODS)

Page 23: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Layers (continued)

• Historical (data warehouse) layer

– Subject-oriented

Organized like ODS around major ROCC entities, but focused on the modeling and analysis of data

– Integrated

Data migrated into DW from ODS are integrated with the rest of DW data

– Time-variant

Every datum in the data warehouse is identified with a particular time period. All summarized data are correct only for the period with which the corresponding detailed data are identified

– Non-volatile

There are no updates in the warehouse, only inserts. The past cannot be changed, only expanded

– Comprised of both summary and detailed data

Once detailed data from the ODS are migrated into the DW, they become part of history. In addition to the detailed historical data, the DW contains summary data, which are pre-calculated to reduce analytical query times

– ROCC DW specifics

The ROCC DW does not use a multidimensional data model yet, only summarized tables
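Pre-calculated summary tables of the kind described above can be sketched as follows. SQLite again stands in for MySQL, and the table layout, the column names, and the choice of per-track statistics are hypothetical. The warehouse is treated as insert-only, and each summary row carries the time span of the detailed data it was computed from.

```python
import sqlite3


def create_dw_schema(conn):
    # Hypothetical detailed and summary tables in the warehouse.
    conn.executescript(
        """
        CREATE TABLE dw_track (session_id INTEGER, track_id INTEGER,
                               time_s REAL, range_m REAL);
        CREATE TABLE dw_track_summary (session_id INTEGER, track_id INTEGER,
                                       n_points INTEGER, t_start REAL,
                                       t_end REAL, mean_range_m REAL);
        """
    )


def summarize_session(conn, session_id):
    """Insert one summary row per track of an already migrated session."""
    with conn:
        conn.execute(
            "INSERT INTO dw_track_summary "
            "SELECT session_id, track_id, COUNT(*), MIN(time_s), "
            "MAX(time_s), AVG(range_m) "
            "FROM dw_track WHERE session_id = ? "
            "GROUP BY session_id, track_id",
            (session_id,),
        )


def mean_range(conn, session_id, track_id):
    """Analytical query answered from the summary table, not the detail rows."""
    row = conn.execute(
        "SELECT mean_range_m FROM dw_track_summary "
        "WHERE session_id = ? AND track_id = ?",
        (session_id, track_id),
    ).fetchone()
    return None if row is None else row[0]
```

An analytical query such as `mean_range(conn, 1, 1)` then reads a single summary row instead of scanning every detailed measurement for that track.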

Page 24: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Layers (continued)

• Analysis and Reporting layer

Continuous automatic monitoring of sensor metric performance

Example: Angle Bias Modeling using ROCC Data Warehouse

What is Angle Bias Modeling? Creation of a mathematical model to describe differences between reported and actual antenna pointing positions

[Figure: Raw pointing information is adjusted using biases (Δ) to give corrected pointing information. Labels: Sensor data collection; Sensor Control System; RIB; Storing sensor data streams; ODS; Real-time queries; Data migration; Data Warehouse; Analytical queries; Bias Modeling Application; Bias model coefficients.]

Page 25: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Angle Bias Modeling using ROCC Data Warehouse

Organization of Sensor-Specific Summary Track Data in the Warehouse

Column groups: Observed Data; Truth Data (time-aligned and in sensor coordinates); Residual Data

Columns: Time, Range, Az, El, Iono Corr, Tropo Corr, SNR, Range, Az, El, Delta Rng, Delta Az, SNR, Source

Bias Modeling Application Data Flow

[Figure: Elements: Data Warehouse; Observed Data; Atmospheric Data; Truth Data; Generate Residuals; Residual Data; Multivariate Regression; Bias Model Analytic Equation; Bias Model Coefficients; Report; Sensor Control System. Loop: Strategic decision control loop.]
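The residual and regression steps of the bias modeling data flow can be illustrated with a short sketch. The slides do not give the analytic form of the bias model, so the model used here (a constant plus sine and cosine terms in azimuth) is purely hypothetical, as are the function names. Ordinary least squares over several regressors stands in for the multivariate regression step, and only the Python standard library is used.

```python
import math


def residuals(observed, truth):
    """Observed minus truth for time-aligned sequences."""
    return [o - t for o, t in zip(observed, truth)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_bias_model(az_rad, delta_az):
    """Least-squares fit of the assumed model
    delta_az ~ a0 + a1*sin(az) + a2*cos(az); returns [a0, a1, a2]."""
    rows = [[1.0, math.sin(a), math.cos(a)] for a in az_rad]
    p = len(rows[0])
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, delta_az)) for i in range(p)]
    return solve_linear(xtx, xty)
```

In the data flow above, coefficients fitted this way would be the output passed on to the report and to the sensor control system; a production implementation would more likely rely on a numerical library than on hand-written elimination.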

Page 26: Sensor Data Warehouse Design March 14 2005 MySQL Conf


ROCC Logical Component Block Diagram

[Figure: ROCC logical component block diagram. Blocks: Planning; Reference Data; Data Plant; Sensors; Simulation; Operational Data Store; Tactical real-time DSS (Tracking, Fusion, Classification, Identification, Trajectory Estimation); Displays; Voice; Operators; Data Warehouse; Strategic long-term DSS (Bias Modeling, Sensor Comparison, Operators); Output Data; Report; Data Analysis. Loops: Tactical decision control loop; Strategic decision control loop.]

• ROCC controls the RTS resources using tactical and strategic DSS

• Maximizes the quality of collected data over specified time

Page 27: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Database Selection

Comparison criteria (qualitative values)

Criterion                 | MySQL    | Oracle   | DB2 (IBM) | SQL Server (Microsoft) | PostgreSQL
Speed                     | High     | High     | High      | High                   | Low
Sophistication            | Moderate | High     | High      | High                   | High
Reliability               | High     | High     | High      | Moderate               | Low
Administration simplicity | High     | Low      | Low       | Moderate               | High
Standardization           | High     | Moderate | Moderate  | Moderate               | Moderate
Savings                   | High     | Low      | Low       | Low                    | High

• The same server should work adequately for both ODS and DW

• Deficiency in sophistication could be mitigated by custom programming

Page 28: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Three dangers of ROCC DMA design

• “Balkanization” of data

– Different groups of data have different designs

– Attempt to fit data definitions into the requirements of the existing tool

– In the long run, increases the maintenance cost

• Dialectism

– Usage of specific database dialects

– Deviation from existing SQL standards

– Locks the user in to a specific vendor

• “Dirty” repository design

– Part of the data is stored in the database, another (closely related) part is stored in the file system

– Duplication of data between database and file system

– Increases the maintenance cost

Page 29: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Outline

• Introduction

• Corporate Information Factory (CIF) for Data Management Architecture

• Designing ROCC data management architecture using CIF Architecture

• Summary

Page 30: Sensor Data Warehouse Design March 14 2005 MySQL Conf


Summary

• Modernization of the ROCC calls for a new type of data management architecture

– New high-performance hardware

– Significant increase of generated and managed volumes of data

– Introduction of new services

• CIF satisfies the requirements

– Designed to support large scale information system

– Effectively manages different types of information queries

– Provides flexibility in distributing data between multiple producers and consumers

• ODS and DW represent two types of repositories for information request

– ODS supports near real-time storage requirements and targeted, low granular queries

– DW is used for complex queries against summary-level data

• ODS and DW are parts of different control loops

– ODS provides information for tactical decisions about near real-time data acquisition

– DW delivers feedback for strategic decisions leading to system improvements

• MySQL is a good fit for ODS and DW databases

– Good performance for fast queries in ODS

– Capable of storing large amount of data in DW

– Simple installation and licensing allow many independent servers to run inside one system being used as ODS, DW, data marts, etc.

– Excellent Java support allows seamless integration with the rest of the software