
Source file: tw.rpi.edu/media/2011/12/10/a7cd/AGU_2011_Poster-Ping-v5.pptx (10/12/2011)

Next Generation Environmental Informatics as exemplified by the Tetherless World Semantic Water Quality Portal

Ping Wang¹, Jin Guang Zheng¹, Linyun Fu¹, Evan W. Patton¹, Timothy Lebo¹, Li Ding¹, Joanne S. Luciano¹, and Deborah L. McGuinness¹

(¹ Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180, United States)

Poster: IN31B-1438

Glossary:
EPA – U.S. Environmental Protection Agency
MPN – Most Probable Number
PML 2 – Proof Markup Language (PML) version 2
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
USGS – United States Geological Survey

Motivation

In late 2009 in Bristol County, RI, E. coli contaminated the public water supply, resulting in illnesses in the population, particularly among young children. Residents requested information concerning when the contamination began, how it happened, and what measures were being taken to monitor and prevent future occurrences. That event reflected the increasing demand for direct and transparent access to ecological and environmental information, and inspired the Semantic Water Quality Portal (SemantAqua) project.

Next Generation Environmental Informatics

Starting with the domain of water quality, we are investigating a general framework called SemantEco that can support dynamic environmental informatics portals via semantically-enabled approaches, including:

• capture of the semantics of domain knowledge using a family of modular, simple OWL2 ontologies,
• integration of environmental monitoring and regulation data from multiple sources following Linked Data principles,
• preservation of provenance metadata using the Proof Markup Language (PML) version 2,
• inference of environmental pollution events using OWL2 inference.

Combined with distributed sensor networks and incremental OWL2 classification, this work could provide a scaffold for deploying near real-time reporting of pollution events in communities.

SemantAqua Workflow

Location-based Information Retrieval

Users input a ZIP Code™ to identify the area for their search. SemantAqua uses GeoNames to look up additional information, e.g. city and state, and generates a location-based query over the USGS and EPA datasets. The mobile interface also takes advantage of the W3C Geolocation API to find polluted sites near the user.
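The lookup-then-query step described above can be sketched roughly as follows. This is a minimal illustration, not the portal's code: the `ex:` namespace, the property names, and both function names are assumptions made for the example, and the only external service referenced is the public GeoNames postal code search endpoint.

```python
from urllib.parse import urlencode

# Hypothetical namespace: the portal's actual ontology IRIs are not given in this poster.
EX = "http://example.org/water#"


def geonames_lookup_url(zip_code: str, username: str) -> str:
    """Build a GeoNames postal code search URL for a five-digit U.S. ZIP code."""
    if not (len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"not a five-digit ZIP code: {zip_code!r}")
    params = {"postalcode": zip_code, "country": "US", "username": username}
    return "http://api.geonames.org/postalCodeSearchJSON?" + urlencode(params)


def location_query(state_code: str, county: str) -> str:
    """Build a SPARQL query for measurement sites in one county (assumed schema)."""
    # Reject quotes and backslashes so the values cannot break out of the literals.
    for value in (state_code, county):
        if '"' in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?site ?lat ?long WHERE {\n"
        "  ?site a ex:MeasurementSite ;\n"
        f'        ex:stateCode "{state_code}" ;\n'
        f'        ex:county "{county}" ;\n'
        "        ex:latitude ?lat ;\n"
        "        ex:longitude ?long .\n"
        "}"
    )


if __name__ == "__main__":
    print(geonames_lookup_url("02809", "demo"))
    print(location_query("RI", "Bristol"))
```

In a real deployment the state and county would come from the GeoNames response rather than being passed in by hand, and the query would use the vocabulary of the actual water ontology.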

Enabling Context-Sensitive Actions

To help users take an active role in monitoring water quality where they live, SemantAqua attempts to identify useful links where users can report problems with their local water supplies. Currently, the portal supports reporting to the EPA and to some state departments responsible for environmental preservation and protection (e.g. the California Department of Fish and Game). Work to identify the appropriate links to external authorities that accept reports within their jurisdictions is still ongoing.

Provenance-based Query

SemantAqua captures provenance information during the data integration stages and encodes it in the Proof Markup Language (PML) version 2 Provenance Interlingua. The provenance information is used to support provenance-based queries. For example, the system allows users to select and inspect data source information, so they can choose to rely only on data from sources they trust. This will be particularly important as the portal expands to include other, more varied sources of data (see Future Work).
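The idea of relying only on trusted sources can be illustrated with a small sketch. It assumes that each integrated record carries the name of its originating source as recovered from the provenance metadata; the `Measurement` type, its fields, and the sample values are hypothetical and do not reflect the PML 2 encoding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    site: str
    characteristic: str
    value: float
    source: str  # originating organization, assumed to come from provenance metadata


def from_trusted_sources(measurements, trusted):
    """Keep only the measurements whose recorded source is in the trusted set."""
    trusted = set(trusted)
    return [m for m in measurements if m.source in trusted]


if __name__ == "__main__":
    # Illustrative records only; these are not values from the portal.
    records = [
        Measurement("site-1", "lead", 0.02, "EPA"),
        Measurement("site-2", "lead", 0.01, "USGS"),
        Measurement("site-3", "lead", 0.05, "community-upload"),
    ]
    for m in from_trusted_sources(records, {"EPA", "USGS"}):
        print(m.site, m.source)
```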

Using Ontologies as Facets

Regulations are encoded as ontologies, which makes each ontology a potential view of the world. Users can select from a number of different regulation ontologies to classify the data, allowing them to see differences between state regulations and the federal regulations set forth by the EPA. In addition, type information from the water ontology, which describes the different types of measurement sites and their polluted counterparts, gives the user some control over what information is displayed on the map.
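To make the "regulation as a view" idea concrete, the sketch below classifies the same measurement under two different regulations. It replaces OWL2 classification with a plain threshold comparison, and every name and limit in it is an illustrative assumption rather than an actual federal or state rule.

```python
from dataclasses import dataclass, field


@dataclass
class Regulation:
    name: str
    # Maps a characteristic name to its maximum allowed value (illustrative units).
    limits: dict = field(default_factory=dict)


def is_polluted(regulation: Regulation, characteristic: str, value: float) -> bool:
    """Return True if the value exceeds the regulation's limit for the characteristic."""
    limit = regulation.limits.get(characteristic)
    # A characteristic the regulation does not cover is never flagged.
    return limit is not None and value > limit


if __name__ == "__main__":
    # Hypothetical limits chosen only to show that classifications can differ.
    federal = Regulation("federal (illustrative)", {"contaminant-x": 10.0})
    state = Regulation("state (illustrative)", {"contaminant-x": 5.0})
    for regulation in (federal, state):
        print(regulation.name, is_polluted(regulation, "contaminant-x", 7.0))
```

Switching the selected regulation changes which sites count as polluted without changing the underlying measurements, which is the behaviour the facet exposes.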

More Customized Queries

The Characteristic, Health Concern and Time Frame facets enable the user to further customize his/her query. The user can issue the queries that are most relevant to his/her interests:

• What sites/facilities in this area are polluted with these specific contaminants, e.g. fecal coliform, lead?
• What polluted sites/facilities are contaminated with pollutants that could cause the following symptoms or health problems, e.g. diarrhea?
• What sites/facilities were polluted in the past two years?
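The three facets can be thought of as optional filters over the identified pollution events. The following sketch assumes an in-memory list of events; the `PollutionEvent` type, its fields, and the sample data are hypothetical, and the portal itself would answer these questions by querying its triple store.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PollutionEvent:
    site: str
    characteristic: str
    health_effects: frozenset
    measured_on: date


def filter_events(events, characteristic=None, health_concern=None, since=None):
    """Apply whichever facets are set; a facet left as None does not filter."""
    result = []
    for event in events:
        if characteristic is not None and event.characteristic != characteristic:
            continue
        if health_concern is not None and health_concern not in event.health_effects:
            continue
        if since is not None and event.measured_on < since:
            continue
        result.append(event)
    return result


if __name__ == "__main__":
    # Illustrative events only; these are not records from the portal.
    events = [
        PollutionEvent("site-1", "fecal coliform", frozenset({"diarrhea"}), date(2010, 5, 1)),
        PollutionEvent("site-2", "lead", frozenset({"kidney damage"}), date(2008, 1, 15)),
    ]
    print([e.site for e in filter_events(events, health_concern="diarrhea")])
    print([e.site for e in filter_events(events, since=date(2009, 1, 1))])
```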

Data Presentation

Different icons are used to differentiate polluted sites from clean sites. Clicking on one of these polluted sites will display a popup window that provides more details about the pollution events: names of contaminants, measured values, limit values, time of measurement, and health effects.

[Figure: SemantAqua workflow diagram. Stage labels: Archive, CSV2RDF4LOD Direct, CSV2RDF4LOD Enhance, Publish, Reason, Visualize. Edge labels: archive, derive.]

Connecting to Health Issues

To help citizens investigate the health impacts of water pollution, SemantAqua links water quality data to some known health considerations. We have generated an initial ontology describing potential health impacts of overexposure to contaminants. Initial content came from the EPA. For example, exposure to E. coli results in abdominal cramping and diarrhea, and if left untreated can result in high blood pressure and kidney damage. This health information is presented to the user together with the pollution details (see Data Presentation) and is also used to customize information retrieval (see More Customized Queries).

Time Series Visualization

The time series visualization retrieves water quality data related to a selected water site or facility by querying the triple store and displays the data as a time series. The user selects a particular permit for a facility, the characteristic of the water, and the test type (if any) associated with that particular characteristic. For the EPA data there are up to five different test types that take measurements in different ways and compute the limits differently: Quantity Average, Quantity Max, Concentration Min, Concentration Average, Concentration Max. The example visualization shows the quality of the water released by the Southeast Water Pollution Control Plant located in San Francisco. The plot shows the enterococci measurements in green and the regulation-defined limit in blue. Three severe violations (in red) occurred during 2009 and 2010. Access to such information can help citizens be more informed and make requests to the state administrator to improve the handling of the water at the local facilities.
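Marking violations in such a series amounts to comparing each measurement with the limit for the selected test type. The sketch below is an assumption-laden illustration: the enum and function names are invented for the example, the sample values are not the plant's data, and it assumes that a "Min" test type is violated when the value falls below the limit while the other types are violated when the value exceeds it.

```python
from datetime import date
from enum import Enum


class EpaTestType(Enum):
    QUANTITY_AVERAGE = "Quantity Average"
    QUANTITY_MAX = "Quantity Max"
    CONCENTRATION_MIN = "Concentration Min"
    CONCENTRATION_AVERAGE = "Concentration Average"
    CONCENTRATION_MAX = "Concentration Max"


def find_violations(series, limit, test_type):
    """Return the (date, value) points in the series that break the limit."""
    if test_type is EpaTestType.CONCENTRATION_MIN:
        # Assumption: a minimum-type limit is broken by values below it.
        return [(day, value) for day, value in series if value < limit]
    return [(day, value) for day, value in series if value > limit]


if __name__ == "__main__":
    # Hypothetical measurements and limit, used only to exercise the function.
    series = [
        (date(2009, 3, 1), 20.0),
        (date(2009, 9, 1), 150.0),
        (date(2010, 2, 1), 30.0),
        (date(2010, 6, 1), 210.0),
    ]
    for day, value in find_violations(series, 100.0, EpaTestType.CONCENTRATION_MAX):
        print(day.isoformat(), value)
```

A plotting layer could then draw the returned points in a separate color on top of the measurement and limit lines.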

Future Work

Currently, data for twenty-seven of the fifty states have been encoded in RDF using the SemantEco and SemantAqua ontologies, and work continues on converting the remaining states. The current portal contains the regulatory information of four of the fifty states. An effort is underway to encode additional regulatory information from different states, and to identify which states simply defer to the EPA on particular pollutants, since the EPA regulations have already been encoded. In addition, work on linking contaminants to external resources such as DBpedia, and to symptom and health effect information from sources such as WebMD, will provide the data needed to answer the more interesting questions regarding the health impacts of pollution. We have also initiated work on linking to reporting systems at the federal and state levels so that users can report potential issues in their neighborhoods, thus making this portal a helpful tool for enacting environmental change. Lastly, we plan to augment the portal to generate data reports of users' query results, which could contain the query specification, the identified pollution events, the relevant converted and source data, and the provenance of these data. These data reports can be useful when users report their findings to authorities or environmental organizations.


Visit our project page at: http://tw.rpi.edu/web/project/SemantAQUA

Try it out: http://aquarius.tw.rpi.edu/projects/semantaqua/