An Ontology-Based Approach to Building BNs for the Weather Forecasting Domain Tali Boneh Ann Nicholson, Kevin Korb (Monash University) John Bally (Bureau of Meteorology) Monash Bayesian Reasoning Workshop April 2006


An Ontology-Based Approach to Building BNs for the Weather Forecasting Domain

Tali Boneh

Ann Nicholson, Kevin Korb

(Monash University)

John Bally

(Bureau of Meteorology)

Monash Bayesian Reasoning Workshop

April 2006

The weather forecasting domain

• The Australian Bureau of Meteorology (Bureau) is the national meteorological authority of Australia.

• Its role is to observe and understand Australian weather and climate and provide weather services.

• A service is defined by its clients.

• The output of a service is products, namely weather reports in a variety of formats (text and graphics) using several delivery media such as newspapers, radio and the internet.

Traditional weather forecasting process

Forecasters:

• examine a large amount of data from different sources and in different formats;

• analyse and integrate these data to generate the weather products/reports;

• use several tools (Decision Support Systems), as well as their own judgment, to enable integration of information and make diagnosis-prediction decisions;

• create the weather products by typing text into formatted forms using a specialised text editor.

Characteristics of Decision Support Systems

Aim of DSS: to digitally and graphically display data for forecasters

• The graphical representation enables forecasters to manually interact with the data, adjust it, change, integrate and create new data when necessary

• The manual interaction enables forecasters to digitally and graphically represent their thoughts

• The digital representation enables automated text (product) generation

Requirements and limitations - I

• The information contains uncertainty

– incomplete knowledge

– missing data

– uncertainty in observation

– uncertainty in guidance

• Data quality is low, data are imprecise.

• It is not always clear how to combine the information

– how to weigh the different bits of information

– how to incorporate historical data

– how to include forecasters’ knowledge

Requirements and limitations - II

• Existing DSS focus only on data storage, graphical user interface, and automated products generation

• More advanced meteorological decisions are left to the subjective judgement, experience, knowledge and character of the forecaster:

– how to derive weather elements based on others

– how to manipulate forecast data

– how to integrate data from different sources

The final representation is still subjective.

Requirements and limitations - III

• The domain is highly complex and involves different dimensions

– e.g. elements, locations, time issues (time of the year, time of the day, and lead times)

• The domain evolves and changes rapidly with better understanding of the atmosphere, better technology and better Numerical Weather Prediction models

→ A rapid development of decision support systems is required

Requirements and limitations - IV

• In some cases it is desirable to implement more than one technology using the same data

→ Approaches to DSSs should support multiple technologies (e.g. BNs, ANNs, rule-based)

New DSS approach

• Integration of information that can capture complex meteorological concepts in ways that match the forecaster's knowledge.

• DSSs that

– can derive forecast weather elements based on local or synoptic-scale information

– modify forecast data using complex meteorological concepts while ensuring weather element consistency

– avoid comprehensive modelling and implementation

– deal with separate small decision steps.

Current new tools: 'state of the art'

• The Australian Thunderstorm Interactive Forecast System (TIFS)

– the DSS is incorporated in the software as code

– knowledge is not explicit and some may be lost

• The “Graphical Forecasting Editor” (GFE) of the National Oceanic and Atmospheric Administration's National Weather Service

– includes a framework, called Smart Tools (based on Python)

– lets forecasters write their own tools

– can be documented at any level the forecaster finds appropriate

– code can be verified and become available to all forecasters

– code is kept in a central repository with its documentation

– procedures can be created and become operational quickly.

Disadvantages of current tools

• Most of these DSSs are rule-based

→ the tools do not appropriately deal with uncertainty

• Knowledge is not explicit: captured directly into a coding language

– representation from which the domain knowledge is not easily recognisable and may be lost as a result of the modelling decisions taken and the representation itself

→ the knowledge cannot be easily shared and reused

Possible Resolution: dealing with uncertainty

• Bayesian Network technology deals with uncertainty, missing data and poor data quality.

• Probability theory is one of the scientific ways of dealing with reasoning under uncertainty.

• Applying formal statistics can yield better results, compared with subjective judgment.

• The final output of the process is objective and is based on a solid mathematical basis.

Problems with Knowledge Engineering BNs

Capturing the knowledge directly into a BN may result in:

• a representation from which the domain knowledge is not easily recognisable

– example: raw information may need to be processed before it can be provided to the network. The details will be buried as code in the implementation

• loss of information as a result of modelling decisions.

– example: omitting a variable from the network for efficiency reasons. The variable could be useful if different technologies are to be implemented

Ontology-based approach

[Diagram: boxes labelled Knowledge Base, Bayesian Networks, Other Technology (twice) and Data, with the link labelled "semi-automated construction"]

Ontology-based approach

• To overcome potential disadvantages:

– knowledge should be represented in a form that enables re-use and sharing across software and people

– need a knowledge-level model that is independent of particular computer languages

• The concept of constructing small steps of DSSs requires that the domain expert should be able to develop their own networks.

→ need to support the forecaster in constructing BNs

A consensual conceptualisation of a domain, made so that knowledge can be shared and re-used, is called an ontology.

Ontology

• In Philosophy: a systematic explanation of being

• In knowledge engineering: a formal, explicit specification of a shared conceptualisation

• Ontologies aim to capture consensual knowledge in a generic way, for the purpose of re-use and sharing across machines and people.

Ontology design

• Declarative Knowledge

– knowledge about what objects, states and relations are in the domain

– concepts: wind, temperature, fog

• Procedural Knowledge

– knowledge about how to find relevant facts and make inferences

– how to predict: wind, temperature, fog

[Diagram: Ontology, with boxes for Declarative Knowledge and Procedural Knowledge]

Forecasting Ontology – declarative knowledge

• Weather services and products

– Service: aviation, disaster, marine, public

– Product: airport briefing, synoptic situation, recent events, media statement

• Weather data sources: NWPs, radar, satellite, tracker, guidance

• Weather phenomenon/information

– weather elements: wind, temperature, fog, thunderstorm, inversion

– tools additional information: tracker length

– other environment information: time issues

• Database schema

Forecasting Ontology – procedural knowledge

• procedure

– rule based

– Bayesian network

– decision theory

– neural network

• procedure working data

• output relation

• algorithm

– value description

– description-description

– general algorithm

Knowledge elicitation

• Bayesian Network

– input variables

– output variables

– type of connection between input and output (predictor/environment/guidance/network refinements)

– working data for learning the probabilities

– multiple working data describing the inputs and outputs at runtime

Semi-automated Construction of BN

• Extraction from ontology

– inputs, outputs variables → BN nodes

– Direction of arcs:

Predictors – from output to input (sensors)

Environment – from input to output (background factor)

Guidance – from output to input (sensor)

• Refinement of structure

– more arcs can go from the environment to the predictors

– Intermediate variables to reduce size of CPTs

– CPTs (from data, from experts, combination)

• Updating ontology
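The extraction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_arcs`, the dictionary layout and the node names are hypothetical (the names are borrowed from the fog case study in this talk), and the ontology is reduced to a mapping from each input variable to its connection type.

```python
# Hypothetical sketch: derive BN arcs from the connection type recorded
# in the ontology for each input variable. Not the authors' implementation.

# Arc direction per connection type, as listed on the slide:
#   predictor / guidance: output -> input (the input behaves like a sensor)
#   environment:          input -> output (the input is a background factor)
OUTPUT_TO_INPUT = {"predictor", "guidance"}
INPUT_TO_OUTPUT = {"environment"}


def build_arcs(output, inputs):
    """Return a list of (parent, child) arcs for one output variable.

    `inputs` maps each input variable name to its connection type.
    """
    arcs = []
    for name, role in inputs.items():
        if role in OUTPUT_TO_INPUT:
            arcs.append((output, name))
        elif role in INPUT_TO_OUTPUT:
            arcs.append((name, output))
        else:
            raise ValueError(f"unknown connection type: {role!r}")
    return arcs


# Illustrative inputs; the dict layout is an assumption.
fog_inputs = {
    "SternParkyn": "guidance",
    "Regano": "guidance",
    "Moisture": "predictor",
    "PressureGradient": "predictor",
    "Rainfall": "environment",
    "Month": "environment",
}

arcs = build_arcs("Fog", fog_inputs)
for parent, child in arcs:
    print(f"{parent} -> {child}")
```

The refinement step (extra arcs from environment to predictors, intermediate variables, CPTs) is left out of this sketch because it depends on expert input and data.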

Case study – forecasting fog

• Different types of variables

– guidance: Stern-Parkyn, Regano

– meteorological variables: weather elements (Moisture, Pressure Gradient and Lapse Rate) and environment variables, i.e. background factors (Rainfall, Month)

Possible BN structure(s) can be constructed from this knowledge.

Fog – Ontology fragment

[Diagram: ontology fragment for fog. Output: Fog Y/N Prob. Nodes shown: Pressure Gradient 3pm at Bendigo, Wonthaggi, East Sale and Hamilton (e.g. pYWON-pYBDG), Combined Pressure Gradient 3pm, Moisture 6/9pm, Predicted Rainfall Amount Y/N 9am-9am, Month, Stern/Parkyn, Regano, Actual 6/9pm Data, Actual MSLP Data. Links are labelled Predictor, Environment or Guidance.]

Incremental prototyping development model

• Construction in steps

– guidance only

– meteorology only

– combined network

Bayesian Network: fog – guidance only

[Network diagram; node distributions as displayed (%)]

SternParkyn: 0 to 1: 46.9; 1 to 2: 15.1; 2 to 5: 17.5; 5 to 10: 9.78; 10 to 15: 4.12; 15 to 30: 4.38; 30 to 100: 2.27 (4.79 ± 11)

ReganoLatest: Vfav: 13.4; fav: 16.0; unfav: 70.7

Fog: fog: 3.23; nofog: 96.8

Bayesian Network: fog – meteorology only

[Network diagram; node distributions as displayed (%)]

Month: January: 8.76; February: 7.99; March: 8.76; April: 8.48; May: 8.76; June: 8.48; July: 8.76; August: 8.36; September: 7.78; October: 8.04; November: 7.78; December: 8.04

LengthOfNight: Nov to Jan: 24.6; Feb and Oct: 16.0; March and Sept: 16.5; Apr and Aug: 16.8; May to July: 26.0

RainNoRain: 0 to 4.5: 91.6; >= 4.5: 8.41

LapseRate9pmCont: < 2.05: 26.5; 2.05 to 2.75: 18.1; 2.75 to 3.25: 17.6; >= 3.25: 37.8 (2.74 ± 0.75)

Gradient: Vfav: 33.0; fav: 19.3; unfav: 47.6

Moisture: Vfav: 23.5; fav: 19.7; unfav: 56.8

Fog: fog: 3.71; nofog: 96.3

Bayesian Network: fog – combined

[Network diagram; nodes grouped under the labels Environment, WeatherGuidance and Meteorology; node distributions as displayed (%)]

Month: January: 8.76; February: 7.99; March: 8.76; April: 8.48; May: 8.76; June: 8.48; July: 8.76; August: 8.36; September: 7.78; October: 8.04; November: 7.78; December: 8.04

LengthOfNight: Nov to Jan: 24.6; Feb and Oct: 16.0; March and Sept: 16.5; Apr and Aug: 16.8; May to July: 26.0

RainNoRain: 0 to 4.5: 91.6; >= 4.5: 8.41

LapseRate9pmCont: < 2.05: 26.5; 2.05 to 2.75: 18.1; 2.75 to 3.25: 17.6; >= 3.25: 37.8 (2.74 ± 0.75)

Gradient: Vfav: 33.0; fav: 19.3; unfav: 47.6

Moisture: Vfav: 23.5; fav: 19.7; unfav: 56.8

ReganoLatest: Vfav: 13.6; fav: 16.0; unfav: 70.4

SternParkyn: 0 to 1: 51.0; 1 to 2: 14.3; 2 to 5: 16.7; 5 to 10: 8.39; 10 to 15: 3.60; 15 to 30: 3.89; 30 to 100: 2.12 (4.39 ± 11)

Fog: fog: 3.71; nofog: 96.3

ROC curve evaluation

• Receiver Operating Characteristic (ROC) curves

• P(true positive) vs. P(false positive)

• Area under curve (AUC) is global measure

– perfect test: AUC = 1

• Can be used to find optimal cutoff values
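As a rough illustration of the points above, the following Python sketch builds ROC points from forecast probabilities and computes the AUC with the trapezoidal rule. The function names and the toy scores are assumptions made for the example; they are not taken from the talk or from any Bureau tool.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    `scores` are forecast probabilities, `labels` are 1 (event) or 0 (no event).
    One point is produced per distinct cutoff, from (0, 0) up to (1, 1).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Cases with the same score cross the cutoff together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, purely illustrative.
scores = [0.9, 0.7, 0.6, 0.3]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))  # 0.75
```

A perfectly separating forecast gives AUC = 1, and a forecast that assigns every case the same probability gives 0.5.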

Bureau Evaluation Measures

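• POD (True Positive Rate)

POD = True Positive / (True Positive + False Negative), where True Positive + False Negative = number of fog events

• False Positive Rate

False Positive Rate = False Positive / (False Positive + True Negative), where False Positive + True Negative = number of no-fog events

• False Positive Ratio (FAR)

FAR = False Positive / (False Positive + True Positive), where False Positive + True Positive = number of times fog was forecast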
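The three measures can be computed directly from the counts of a 2x2 contingency table. The sketch below assumes a hypothetical helper `bureau_measures` and made-up counts; it does not reproduce any figures from the talk.

```python
def bureau_measures(tp, fp, fn, tn):
    """Compute POD, false positive rate and FAR from contingency counts.

    tp: fog forecast and fog observed
    fp: fog forecast, no fog observed
    fn: no fog forecast, fog observed
    tn: no fog forecast, no fog observed
    """
    return {
        "POD": tp / (tp + fn),  # share of fog events that were forecast
        "FPR": fp / (fp + tn),  # share of no-fog events with a fog forecast
        "FAR": fp / (fp + tp),  # share of fog forecasts that were wrong
    }


# Hypothetical counts, for illustration only.
measures = bureau_measures(tp=30, fp=90, fn=10, tn=870)
print(measures)
```

Note that the false positive rate and FAR share a numerator but not a denominator, so with a rare event such as fog a forecast can have a low false positive rate and still a high FAR.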

Evaluation

• Stratified 10-fold cross-validation used

• Dataset randomly divided into 90% (training) and 10% (validation) fractions

– separately for fog and no-fog cases

• Process repeated for 3 networks
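A minimal sketch of the stratified splitting described above, using only the Python standard library. The function names, the seed and the class sizes are assumptions for the example; the talk does not say which software performed the split.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign case indices to k folds, shuffling each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the cases of this class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_validation_splits(labels, k=10, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical dataset: 40 fog cases and 960 no-fog cases.
labels = ["fog"] * 40 + ["nofog"] * 960
for training, validation in train_validation_splits(labels):
    n_fog = sum(labels[i] == "fog" for i in validation)
    print(len(training), len(validation), n_fog)
```

Splitting each class separately keeps the proportion of rare fog cases roughly the same in every validation fraction.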

Results: ROC evaluation of the three networks

[Figure: ROC curves for the fog network. x-axis: FARate (0 to 100), y-axis: POD (0 to 100)]

• Guidance Only: AUC 0.833

• Met Only: AUC 0.916

• Combined: AUC 0.928

ROC evaluation of the Melbourne Network

[Figure: ROC curves, 3pm forecast. x-axis: FARate (0 to 100), y-axis: POD (0 to 100)]

• 3pm TAF: 0.857

• 3pm Combined: 0.917

ROC evaluation of the Melbourne Network

[Figure: ROC curves, 9pm forecast. x-axis: FARate (0 to 100), y-axis: POD (0 to 100)]

• 9pm TAF: 0.866

• 9pm Combined: 0.928

POD & FAR – operations versus network

Forecast                  Operational POD (%)   Operational FAR (%)   Network POD (%)   Network FAR (%)
3pm TAF                   56                    73                    65                77
3pm TAF and Code Grey     87                    90                    95                90
9pm TAF                   67                    73                    71                76
9pm TAF and Code Grey     87                    90                    95                89

1% cutoff was used for Code Grey

20% cutoff was used for TAF
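One way to read the two cutoffs is as thresholds on the network's fog probability. The sketch below is an assumption about how they might be applied: the function name, the category labels and the choice of inclusive comparisons (>=) are not stated in the talk.

```python
CODE_GREY_CUTOFF = 0.01  # 1% cutoff for Code Grey (from the slide)
TAF_CUTOFF = 0.20        # 20% cutoff for TAF (from the slide)


def forecast_category(p_fog):
    """Map a fog probability to a forecast category (hypothetical rule)."""
    if p_fog >= TAF_CUTOFF:
        return "TAF"
    if p_fog >= CODE_GREY_CUTOFF:
        return "Code Grey"
    return "No fog"


for p in (0.005, 0.05, 0.30):
    print(p, forecast_category(p))
```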

Ontology preferences

Outcome   Forecast issued                                Preference
Fog       Say No Fog                                     -20
Fog       Say Code Grey - less than 5% chance of fog     10
Fog       Say Code Grey - 5% chance of fog               16
Fog       Say Code Grey - 10% chance of fog              18
Fog       Say Code Grey - 20% chance of fog              19
Fog       Say TAF – Prob Fog                             20
No Fog    Say No Fog                                     2
No Fog    Say Code Grey - less than 5% chance of fog     -1
No Fog    Say Code Grey - 5% chance of fog               -2
No Fog    Say Code Grey - 10% chance of fog              -3
No Fog    Say Code Grey - 20% chance of fog              -4
No Fog    Say TAF – Prob Fog                             -5
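Given preferences like these, a forecast can be chosen by maximising expected utility for a given fog probability. The sketch below uses the preference values from the slide; the dictionary layout, the function names and the example probabilities are assumptions, and the code is not claimed to reproduce the utilities of the decision network in the talk.

```python
# Preference values from the slide: (utility if fog, utility if no fog).
PREFERENCES = {
    "Say No Fog": (-20, 2),
    "Code Grey <5%": (10, -1),
    "Code Grey 5%": (16, -2),
    "Code Grey 10%": (18, -3),
    "Code Grey 20%": (19, -4),
    "TAF Prob Fog": (20, -5),
}


def expected_utilities(p_fog):
    """Expected utility of each forecast for a given probability of fog."""
    return {
        action: p_fog * u_fog + (1.0 - p_fog) * u_nofog
        for action, (u_fog, u_nofog) in PREFERENCES.items()
    }


def best_action(p_fog):
    """Forecast with the highest expected utility."""
    utilities = expected_utilities(p_fog)
    return max(utilities, key=utilities.get)


# Hypothetical fog probabilities.
for p in (0.05, 0.20, 0.90):
    print(p, best_action(p))
```

With these values, low fog probabilities favour saying no fog, intermediate ones favour a Code Grey, and high ones favour putting fog on the TAF.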

Ontology (POD – FAR)

Forecast outcome   Model POD   Model FAR
No fog             100         97
<5% Code Grey      95-89       90
5% Code Grey       88-84       85-83
10% Code Grey      81-73       82-80
20% Code Grey      76-70       79-78
Fog on TAF         71-65       77-76

Fog decision network

[Decision network diagram with utility node U; node distributions as displayed (%)]

Month: January: 8.76; February: 7.99; March: 8.76; April: 8.48; May: 8.76; June: 8.48; July: 8.76; August: 8.36; September: 7.78; October: 8.04; November: 7.78; December: 8.04

LengthOfNight: Nov to Jan: 24.6; Feb and Oct: 16.0; March and Sept: 16.5; Apr and Aug: 16.8; May to July: 26.0

RainNoRain: 0 to 4.5: 91.6; >= 4.5: 8.41

Fog: fog: 3.71; nofog: 96.3

LapseRate9pmCont: < 2.05: 26.5; 2.05 to 2.75: 18.1; 2.75 to 3.25: 17.6; >= 3.25: 37.8 (2.74 ± 0.75)

Moisture: Vfav: 23.5; fav: 19.7; unfav: 56.8

Gradient: Vfav: 33.0; fav: 19.3; unfav: 47.6

SternParkyn: 0 to 1: 51.0; 1 to 2: 14.3; 2 to 5: 16.7; 5 to 10: 8.39; 10 to 15: 3.60; 15 to 30: 3.89; 30 to 100: 2.12 (4.39 ± 11)

Decision (value shown per option): saynofog: 0.67775; CodeGreyLessThen5: 1.31930; CodeGrey5: 1.40469; CodeGrey10: 1.35843; CodeGrey20: 1.28933; sayfog: 1.20796

Conclusions

• Small fragments of Bayesian Networks are beneficial in the forecasting domain

• The incremental development model supports the acceptance of the Bayesian Networks

• The ontology was found to be useful

– for the explicit representation of all elicited knowledge including background information (variables, discretisation, arcs and probabilities)

– for sharing information between domain experts and the knowledge engineer

– as a guide for further elicitation

– in supporting the domain experts in the construction of a Bayesian Network

Future Work

• Further development of the ontology

• More research on how to determine preferences

• Other forecasting case studies

– Thunderstorms

• Testing

• Implementation issues
