data and metadata challenges

1
www.walsaip.uprm.edu W ALSAIP W ALSAIP Supported By A IP G roup G roup A utom ated In fo rm a tio n Processing 2 NSF Grant CNS-0540592 This work centers on the design and development of a Java-based XML information representation (XIR) tool for the coupling/binding representation of data and metadata entities associated with physical sensors pertaining to environmental surveillance monitoring (ESM) applications. Metadata, defined in general as data that describe data, is associated with each sensor-signal-data through a binding/coupling registry process using Extensible Markup Language (XML) format. The concept of sensor data availability in ESM is decomposed into three specific requirements for the XIR system: let users get to information in a remote manner, get access to data as soon as it is required, and enable a uniform interpretation of data among heterogeneous data sources and data destinations. Data and Metadata Challenges Data and Metadata [1] Manetti Luca, Terribilini Andrea, Knecht Alfredo, “Autonomous Remote Monitoring System for Landslides”, SPIE’s 9th Annual International Symposium on Smart Structures and Materials, 2002, San Diego, CA. [2] Nativi Stefano, Giuli Dino, Innocenti Emilio Bugli, “Interoperability for Multimedia Systems to Support Decision-Makers in the Environment Sector” IEEE International Conference On Multimedia Computing and Systems, Volume 2, June 1999, Pages 338-342 [3] Dong-Jun Won, Il-Yop Chung, Joong-Moon Kim, Seung-Il Moon, Jang-Cheol Seo, Jong-Woong Choe, D Won, II-Yop Chung, J. Kim, S. Moon, J. Seo, J. Choe, “Development of Power Quality Monitoring System with Central Processing Scheme”, Power Engineering Society Summer Meeting, IEEE, South Korea, pp. 915-919 vol.2, 21-25 July 2002. [4] MANTIS Project (MultimodAl Networks of In-situ Sensors): http://mantis.cs.colorado.edu/index.php/tiki-index.php Abstract 1 WALSAIP Conceptual Model 2 References 7 Ongoing Work 6 Problem Formulation 3 Proposed Solution 4 Automated XML Schema Representations for Sensor-based Information Processing Systems Data Representation Systems Signal Conditioning System Raw Data Servers Computational Signal Processing Systems Signal Data Post-processing Pre-processing Stage Post-processing Stage Processing Stage INTERNET Computed Data Servers Information Rendering Systems Sensor Array Structures Signal Data – all readings collected directly from sensors. Metadata – data that describes data. Metadata is crucial to provide researchers a concrete idea of the real conditions in which data was collected. Metadata is a determinant of how the environment influenced the measurement in case of abnormal findings. There is a need for proper characterization of binding/coupling relationships between data and metadata files to improve information content analysis. Data should be interoperable across heterogeneous users with different data architectures, storage systems, and platforms. A mechanism should be design to make data readable and understandable across heterogeneous users in automated information processing systems. Lack of support for dynamic metadata management. Systems need to incorporate information from “human sensors”. Study Mechanisms for data and metadata acquisition Study correspondences of data and metadata files Figure 1. WALSAIP Conceptual Model Figure 3. Example: NERR System Data/Metadata Data + Metadata + Processing Decision Making! Figure 2. Decision Making Input Figure 4. Shannon’s Theory and XML Processing AUTOMATE! U ser Informati on Source XML Coder Communicatio n Channel Rx Receiver Tx Transmitter XML Decoder U ser Informatio n Destinatio n Hazards: Jamming Interferenc e Power Failure Etc…. Data Meta Data XML XML Data Meta Data Design and implementation of the Information Representation Tool (XIR) tool using Java, XML, and FTP technologies for encapsulation of data and metadata files (proposed as format for information content exchange) in automated information processing systems. Enable user to develop “stencils” in order to customize “XML tags” during encapsulation. Information theoretic measures are used to study how the extensible markup language (XML) may serve as a means for integrating symbols and meaning (semiotics and semantics parts), from metadata, with signals and structure (syntactic part) from sensor-based raw signal-data. Applying engineering techniques for solution design. Generating source code to implement a proposed solution instantiation. Identifying potential test cases to perform functional verification test after coding. Integrating a proposed solution to the WALSAIP architecture. Implementation Effort 5 Analysis of current metadata management and storage formats in order to provide encapsulation support. Analysis of data/metadata consumer modules within WALSAIP architecture to ensure compatibility and integration. Generation algorithms to acquire plain text values non text data such as images and acoustic signals. Engineering of algorithms to gather critical metadata directly from data. For example image dimensions and format. Figure 6. Data and Metadata Encapsulation Example Figure 5. Shannon’s Theory Approach to Information Flow Study Proposed solution contemplates dynamic metadata management. Enable data and metadata enhancement with user observations. Context awareness aids in the detection, estimation, and classification of sensor-based signals acquired from ESM for the assessment and proper management of Earth’s geophysical, environmental, and ecological issues. N ote: Static Local Inform ation Input Server Application Output Input 1 K D 1 K M Internet K D K M K K K M M M ~ 1 K K K D D D ~ 1 - - Internet K D ~ k M ~ <?xml version="1.0" ?> - <encapsulation> - <metadata> - <research> <researchName>Wide Area Large Scale Automated Information Processing</researchName> <department>Department of Electrical and Computer Engineering</department> <intitution>University of Puerto Rico at Mayaguez</intitution> <phone>787-832-4040</phone> <contact>Domingo Rodriguez</contact> <email>[email protected]</email> </research> - <sensingInfo> <initialDate>2006-07-05</initialDate> <initialTime>22:23:00.14</initialTime> <endingDate>2006-07-06</endingDate> <endingTime>22:23:00.14</endingTime> <nodeID>0</nodeID> <samplingRate>138</samplingRate> <type>humidity</type> </sensingInfo> </metadata> <data>65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 </data> </encapsulation> Luz V. Acabá-Cuevas – M.S. Student Prof. Domingo Rodríguez – Advisor AIP Group, ECE Department University of Puerto Rico Email: [email protected] Mayagüez Campus

Upload: miach

Post on 29-Jan-2016

22 views

Category:

Documents


0 download

DESCRIPTION

3. 2. Problem Formulation. WALSAIP Conceptual Model. Study Mechanisms for data and metadata acquisition. Data + Metadata + Processing Decision Making!. Study correspondences of data and metadata files. Raw Data Servers. Computed Data Servers. INTERNET. Data Representation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data and Metadata Challenges

www.walsaip.uprm.eduWALSAIP

WALSAIPSupported By

A I PGroupGroupA u t o m a t e dI n f o r m a t i o nP r o c e s s i n g

2

NSF Grant  CNS-0540592

This work centers on the design and development of a Java-based XML information representation (XIR) tool for the coupling/binding representation of data and metadata entities associated with physical sensors pertaining to environmental surveillance monitoring (ESM) applications. Metadata, defined in general as data that describe data, is associated with each sensor-signal-data through a binding/coupling registry process using Extensible Markup Language (XML) format. The concept of sensor data availability in ESM is decomposed into three specific requirements for the XIR system: let users get to information in a remote manner, get access to data as soon as it is required, and enable a uniform interpretation of data among heterogeneous data sources and data destinations.

Data and Metadata Challenges

Data and Metadata

[1] Manetti Luca, Terribilini Andrea, Knecht Alfredo, “Autonomous Remote Monitoring System for Landslides”, SPIE’s 9th Annual International Symposium on Smart Structures and Materials, 2002, San Diego, CA.

[2] Nativi Stefano, Giuli Dino, Innocenti Emilio Bugli, “Interoperability for Multimedia Systems to Support Decision-Makers in the Environment Sector” IEEE International Conference On Multimedia Computing and Systems, Volume 2, June 1999, Pages 338-342

[3] Dong-Jun Won, Il-Yop Chung,   Joong-Moon Kim,  Seung-Il Moon,   Jang-Cheol Seo,  Jong-Woong Choe, D Won, II-Yop Chung, J. Kim, S. Moon, J. Seo, J. Choe, “Development of Power Quality Monitoring System with Central Processing Scheme”, Power Engineering Society Summer Meeting, IEEE, South Korea, pp. 915-919 vol.2, 21-25 July 2002.

[4] MANTIS Project (MultimodAl Networks of In-situ Sensors): http://mantis.cs.colorado.edu/index.php/tiki-index.php

Abstract1

WALSAIP Conceptual Model2

References7

Ongoing Work6

Problem Formulation3

Proposed Solution4

Automated XML Schema Representations for Sensor-based Information Processing Systems

Data Representation Systems

Signal Conditioning System

Raw Data Servers

Computational Signal Processing Systems

Signal Data Post-processing

Pre-processing Stage Post-processing StageProcessing Stage

INTERNETComputed Data

Servers

Information Rendering

Systems

Sensor Array Structures

Signal Data – all readings collected directly from sensors.

Metadata – data that describes data. Metadata is crucial to provide researchers a concrete idea of the real conditions in which data was collected. Metadata is a determinant of how the environment influenced the measurement in case of abnormal findings.

There is a need for proper characterization of binding/coupling relationships between data and metadata files to improve information content analysis.

Data should be interoperable across heterogeneous users with different data architectures, storage systems, and platforms.

A mechanism should be design to make data readable and understandable across heterogeneous users in automated information processing systems.

Lack of support for dynamic metadata management.

Systems need to incorporate information from “human sensors”.

Study Mechanisms for data and metadata acquisition

Study correspondences of data and metadata files

Figure 1. WALSAIP Conceptual Model

Figure 3. Example: NERR System Data/Metadata

Data + Metadata + Processing

Decision Making!

Figure 2. Decision Making Input

Figure 4. Shannon’s Theory and XML Processing

AUTOMATE!

User

Information Source

XMLCoder

Communication

Channel

RxReceiver

TxTransmitter

XMLDecoder

User

Information Destination

Hazards:

JammingInterferencePower FailureEtc….

DataMetaData

XML

XMLData

MetaData

Design and implementation of the Information Representation Tool (XIR) tool using Java, XML, and FTP technologies for encapsulation of data and metadata files (proposed as format for information content exchange) in automated information processing systems.

Enable user to develop “stencils” in order to customize “XML tags” during encapsulation.

Information theoretic measures are used to study how the extensible markup language (XML) may serve as a means for integrating symbols and meaning (semiotics and semantics parts), from metadata, with signals and structure (syntactic part) from sensor-based raw signal-data.

Applying engineering techniques for solution design.Generating source code to implement a proposed solution instantiation.Identifying potential test cases to perform functional verification test after coding.Integrating a proposed solution to the WALSAIP architecture.

Implementation Effort5

Analysis of current metadata management and storage formats in order to provide encapsulation support. Analysis of data/metadata consumer modules within WALSAIP architecture to ensure compatibility and integration.Generation algorithms to acquire plain text values non text data such as images and acoustic signals.Engineering of algorithms to gather critical metadata directly from data. For example image dimensions and format.

Figure 6. Data and Metadata Encapsulation Example

Figure 5. Shannon’s Theory Approach to Information Flow Study

Proposed solution contemplates dynamic metadata management.

Enable data and metadata enhancement with user observations.

Context awareness aids in the detection, estimation, and classification of sensor-based signals acquired from ESM for the assessment and proper management of Earth’s geophysical, environmental, and ecological issues.

Note:

Static Local Information

Input

Server Application

Output

Inp

ut

1KD 1KM

Internet

KD KM

KKK MMM

~

1

KKK DDD

~

1

-

--

Internet

KD~

kM~

<?xml version="1.0" ?> - <encapsulation>- <metadata>- <research> <researchName>Wide Area Large Scale Automated Information Processing</researchName> <department>Department of Electrical and Computer Engineering</department> <intitution>University of Puerto Rico at Mayaguez</intitution> <phone>787-832-4040</phone> <contact>Domingo Rodriguez</contact> <email>[email protected]</email> </research>- <sensingInfo> <initialDate>2006-07-05</initialDate> <initialTime>22:23:00.14</initialTime> <endingDate>2006-07-06</endingDate> <endingTime>22:23:00.14</endingTime> <nodeID>0</nodeID> <samplingRate>138</samplingRate> <type>humidity</type> </sensingInfo> </metadata> <data>65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 </data> </encapsulation>

Luz V. Acabá-Cuevas – M.S. Student Prof. Domingo Rodríguez – AdvisorAIP Group, ECE Department University of Puerto Rico Email: [email protected] Mayagüez Campus