6 retrieval - shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6...

17
6 Retrieval The state transition system model provides a natural basis for retrieval of answers in response to various queries about the presence and movements of occupants within the smart environment. In this chapter, we first formulate a data model based upon an occupancy relation with a real-valued probability attribute. This relation captures the state information after recognition and reasoning have been performed and a set of valid occupants have been determined. We then show how to formulate a few spatio- temporal queries. We first formulate queries using the well-known SQL database query language focusing on the computation of probabilities, an aspect that is novel to our model. It is well-known that an SQL-like language cannot formulate general transitive closures, and hence we explore the formulation of more complex queries in a constraint-based extension of logic programs, called CLP(R), which permits reasoning over real-valued variables and arithmetic operations. Our objective in this chapter is to outline the basic approach to query formulation rather than to provide a comprehensive coverage of all possible types of queries. We also formulate in this chapter the precision and recall metrics in a query- dependent manner, since the performance of a smart environment is ultimately determined by how well it can respond to the queries that the end-user is interested in. We provide examples for calculation of query-dependent performance metrics based on experimental results from a simulation run. We conclude the chapter with an overview of event processing in probabilistic databases. 6.1 Data Model We begin with a presentation of the data model underlying our query language. While the data model is basically relational in nature, it departs from the standard relational model in the use of a real-valued attribute for the probability. 97

Upload: others

Post on 26-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

6 Retrieval

The state transition system model provides a natural basis for retrieval of answers in

response to various queries about the presence and movements of occupants within

the smart environment. In this chapter, we first formulate a data model based upon

an occupancy relation with a real-valued probability attribute. This relation captures

the state information after recognition and reasoning have been performed and a set of

valid occupants have been determined. We then show how to formulate a few spatio-

temporal queries. We first formulate queries using the well-known SQL database

query language focusing on the computation of probabilities, an aspect that is novel

to our model. It is well-known that an SQL-like language cannot formulate general

transitive closures, and hence we explore the formulation of more complex queries

in a constraint-based extension of logic programs, called CLP(R), which permits

reasoning over real-valued variables and arithmetic operations. Our objective in this

chapter is to outline the basic approach to query formulation rather than to provide

a comprehensive coverage of all possible types of queries.

We also formulate in this chapter the precision and recall metrics in a query-

dependent manner, since the performance of a smart environment is ultimately

determined by how well it can respond to the queries that the end-user is interested

in. We provide examples for calculation of query-dependent performance metrics

based on experimental results from a simulation run. We conclude the chapter with

an overview of event processing in probabilistic databases.

6.1 Data Model

We begin with a presentation of the data model underlying our query language. While

the data model is basically relational in nature, it departs from the standard relational

model in the use of a real-valued attribute for the probability.

97

Page 2: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

Definition 6.1: State Relation: Given an environment with occupants O =

o1 . . . on, zones Z = z1 . . . zm, and biometric events occurring at distinct increasing

times T = t1 . . . tx, the states of the smart environment can be represented as a

relation

state(time, occupant, zone, probability)

where occupant ∈ O, zone ∈ Z, and time ∈ T , where T ⊆ {00:00, 00:01, . . . , 23:59},

a discrete totally ordered set of time units. The attribute probability ∈ R, the set of

real numbers, and is functionally dependent on the other three attributes. The state

relation satisfies the following integrity constraint:

∀t∀i Σ{p : (∃z) state(t, oi, z, p)} = 1

For simplicity, we assume that time is discretized as a totally ordered set of hour-

minute points over a 24-hour period, although it can be extended to cover multiple

days, months, and years in a straightforward way. We use the state relation in order

to represent the result after recognition and reasoning have been completed.

Occ Entr. Office Lounge Ext.o1 0.08 0.23 0.67 0.02o2 0.25 0.45 0.20 0.10o3 0.12 0.62 0.20 0.06o4 0.05 0.11 0.20 0.64o5 0.78 0.13 0.06 0.03GT o5 o2,o3 o1 o4

(a) State s8 at 10:15 am

Occ Entr. Office Lounge Ext.o1 0.07 0.21 0.70 0.02o2 0.11 0.20 0.64 0.05o3 0.08 0.39 0.49 0.04o4 0.05 0.11 0.20 0.64o5 0.78 0.13 0.06 0.03GT o5 o3 o1,o2 o4

(b) State s9 at 10:20 am

Table 6.1: Sample States

Table 6.1 shows a fragment of two consecutive states in the smart environment, and

the state relation based on these states is shown in table 6.2. Note that the tuples in

the state relation shown correspond only to those time units at which the biometric

events occur. At each such time t, the state is represented by a set of m × n tuples

corresponding to all possible occupant-zone pairs.

98

Page 3: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

State Relationsstate(10:15, o1, entrance, 0.08)state(10:15, o1, office, 0.23)state(10:15, o1, lounge, 0.67)state(10:15, o1, external, 0.02)

...state(10:15, o5, entrance, 0.78)

...state(10:15, o5, external, 0.03)

(a) State Relation for s8

State Relationsstate(10:20, o1, entrance, 0.07)state(10:20,o1, office, 0.21)state(10:20, o1, lounge, 0.70)state(10:20, o1, external, 0.02)

...state(10:20, o5, entrance, 0.78)

...state(10:20, o5, external, 0.03)

(b) State Relation s9

Table 6.2: Sample State Relations

As discussed in the previous chapter, a person is considered as a valid occupant only

if his track is valid. We therefore define an reduced state relation, state′, from the

state relation by selecting those tuples that correspond to valid occupants. Using this

relation we define an occupancy relation, as follows.

Definition 6.2: Occupancy Relation: Given a smart environment with O =

o1 . . . on, Z = z1 . . . zm, and valid occupants V ⊆ O, we define:

∀t∀o∀z[state′(t, o, z, pr) ← state(t, o, z, pr) ∧ o ∈ V ]

Assuming an increasing time sequence T = t1 . . . tx, we define

∀i ∀o∀z[occupancy(ti, t′i+1, o, z, pr) ← state′(ti, o, z, pr) ∧ 1 ≤ i ≤ (x− 1)]

where t′ is the time unit that precedes t by one minute.

Occupancy Relationsoccupancy(10:15, 10:19, o1, entrance, 0.08)occupancy(10:15, 10:19, o1, office, 0.23)occupancy(10:15, 10:19, o1, lounge, 0.67)occupancy(10:15, 10:19, o1, external, 0.02)

...occupancy(10:15, 10:19, o5, entrance, 0.78)

...occupancy(10:15, 10:19, o5, external, 0.03)

Table 6.3: Sample Occupancy Relation

99

Page 4: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

Table 6.3 is a snapshot of the occupancy relation based on two consecutive states

shown in table 6.1. Like in state relation, for each time interval [ti, t′i+1], the occupancy

consists of a set of m× n tuples corresponding to all possible person-zone pairs. The

occupancy relation also satisfies the integrity constraint:

∀t∀i Σ{p : (∃z) occupancy(ti, t′i+1, oi, z, p)} = 1

6.2 Query Formulation

We first use the well-known database query language SQL (Structured Query

Language) for writing queries and then discuss the use of CLP (constraint logic

programming) for more complex queries. Our focus will be on queries involving

the computation of probabilities, as this is the novel part of the work.

6.2.1 Structured Query Language

Below is the syntax of the most basic form of SQL queries [73]:

SELECT attributesFROM relationsWHERE condition

We will use the

occupancy(from, to, person, zone, prob)

relation as the basis for formulating queries. The tuples of the relations that satisfy

the condition are selected and the relevant attributes are returned as the result. The

condition is typically a conjunction of simpler tests that serve as a basis for tuple

selection. There are numerous extensions to the basic syntax outlined above, in order

to perform aggregate operations, grouping, ordering, etc.

Query 1. What is the probability that occupant o3 was in the lounge at 3:00 pm?

Answer: For this query, it is necessary to check the occupancy relation for the

presence of o3 in the lounge at a time interval that contains 3:00 pm. If o3 was

detected to be in the lounge, at most one tuple in occupancy will satisfy the query;

100

Page 5: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

otherwise the query returns without any answer. For simplicity, we assume overloaded

comparison operators <=, <, =, etc., that are defined on time values.

SELECT probFROM occupancyWHERE person = o3 and

zone = lounge andfrom <= 15:00 <= to

Query 2. What is the probability that occupant o5 was in the lounge or office at

3:00 pm?

Answer: The probability of an occupant being in one of two zones at any given time

is the sum of the probabilities of the occupant in these zones at that time.

SELECT SUM(prob)FROM occupancyWHERE person = o5 and

zone in (lounge, office)from <= 15:00 <= to

Query 3. What is the probability that occupant o7 was in the lounge during 10:00

am to 11:00 am?

Answer: The probability that an occupant was in the lounge at a given time is the 1

- sum of the probabilities that he was in one of the other zones at this time (since the

sum of the probabilities across all zones = 1 at any given time). Since there could be

multiple sub-intervals within 10:00 am to 11:00 am during which o7 was in the lounge

(with different probabilities), the answer to the query is 1 minus the product of the

probabilities that he was not in the lounge during every such sub-interval.

(1 - PROD(SELECT (1 - SUM(prob)) as sumprobFROM occupancyWHERE person = o7 and

zone 6= lounge and10:00 <= from and to <= 11:00

GROUP BY from))

101

Page 6: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

6.2.2 Constraint Logic Programming

Logic programs offer a more expressive query paradigm than SQL because they

permit the formulation of general recursive queries. SQL offers the guarantee that

all queries will terminate, an important requirement for a database query language.

The subset of Horn clauses called Datalog, which is essentially Horn clauses without

function symbols [20], also has the strong termination property and has been studied

extensively in the literature. Since our underlying data model uses a real-valued

attribute for probability along with operations for comparison and arithmetic, it

is more natural to adopt the paradigm of constraint logic programming over reals,

CLP(R) [61], rather than Horn clauses. Essentially, CLP(R) extends Horn clauses by

generalizing unification to constraint satisfaction. Typically, CLP(R) systems provide

solvers for linear equalities and inequalities; non-linear equations and inequations are

deferred until one or more variables become bound and they become linear. They

also support aggregation predicates, such as min, max, sum, count, etc., and we will

make use of such operations in our formulations as well.

A CLP(R) program is a collection of rules, each of which may be one of two forms:

p(t̄)

p(t̄) : − p1(t̄1) . . . pk(t̄k)

where each p is a user-defined predicate and each p1 . . . pk may be user-defined or may

be one of a pre-defined set of builtin constraint predicates, such as ≤, ge, etc. The

terms t̄ and t̄i for 1 ≤ i ≤ k include ordinary terms as in Horn clauses as well as

terms composed from real numbers, variables, and the usual arithmetic operators.

As CLP(R) is not a well-known paradigm, we first show how to formulate queries 1-3

in the language, and then illustrate more complex queries. A CLP(R) program for

query 1 would be:

query1(Prob) :-occupancy(From, To, o3, lounge, Prob),From <= 15:00, To => 15:00.

102

Page 7: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

Here we are taking the liberty to use the <= and => operators for comparing time

units.

Query 2 can be expressed as follows, where sum(P, G, A) is an aggregation predicate

which sums up all instances of P that satisfy the goal G and places the result in A.

The predicate member is the standard list membership predicate.

query2(Prob) :-sum(P,q2(P),Prob),

q2(Prob) :-occupancy(From, To, o5, Z, Prob),From <= 15:00, To => 15:00,member(Z, [lounge, office]).

The formulation of query 3 makes use of the sum and prod aggregation predicates.

The effect of the ’GROUP BY from’ clause in the SQL query is realized by not making

the variable F in the goal q3b(F,P) an existential variable. In this manner, the

predicate q3a(Prob) defines the probabilities for occupant o7 not being in the lounge

at every time interval from 10:00 to 11:00 am.

query3(1-Prob) :-prod(P, q3a(P), Prob).

q3a(1-S) :-sum(P, q3b(F,P), S).

q3b(From,Prob) :-occupancy(From, To, o7, Z, Prob),Z 6= lounge,10:00 <= From, 11:00 => To.

Query 4. What is the probability that o3 and o5 met in the lounge today? Assume

that “met” means “being in the same zone at the same time”.

Answer: The probability that o3 and o5 met in the lounge today is 1 minus the

probability that o3 and o5 did not meet in the lounge. For any interval, the probability

of not having met in the lounge during this interval is 1 minus product of the

103

Page 8: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

probabilities of their being in the lounge during this interval – predicate q4 returns

this probability for every overlapping interval.

query4(1-Prob) :-prod(P, q4(P), Prob).

q4(1 - Prob1*Prob2) :-occupancy(From1, To1, o3, lounge, Prob1),occupancy(From2, To2, o5, lounge, Prob2),overlaps(From1,To1, From2, To2)

overlaps(F1,T1,F2,T2) :-F1 <= F2, F2 <= T1.

overlaps(F1,T1,F2,T2) :-F2 <= F1, F1 <= T2.

Query 5. What is the longest contiguous duration during which occupant o1 was in

the office?

Answer: We define the contiguous occupancy in a zone recursively and use this

definition in order to define the longest duration.

query5(Duration) :-setof(F,occupancy(F,_,o1,office,_), FromSet),max(D,q5(FromSet,D), Duration).

q5(FS,D) :-member(F,FS),contiguous(F,T,o1,office),D = T - F.

contiguous(F,T,P,Z) :-occupancy(F,T1,P,Z, _),T2 = T1 + 0:01,contiguous(T2,T,P,Z).

contiguous(F,T,P,Z) :-\+ occupancy(F,_,P,Z,_),T = F - 0:01.

It is straightforward to extend the above definition of contiguous so that the

average probability during this period is also included. Other extensions include

the incorporation of distances between adjacent zones and spatial queries that make

104

Page 9: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

use of this distance information. As can be seen from the above formulations and

possible extensions, the CLP(R) paradigm is a powerful paradigm for probabilistic

spatio-temporal queries.

Query 6. If occupant o5 was known to be in the lounge at 3:00 pm with certainty,

what is the probability that o3 was in the lounge at that time?

Answer: Such a query cannot be answered without re-initializing the state and

occupancy relations. The knowledge that o5 was in the lounge at 3:00 pm with

certainty means that o5 should also have been present with certainty in the zone

preceding his entrance to the lounge. Inductively, we can say that o5 was present with

certainty in all zones in his track leading up to entrance to the lounge at or preceding

3:00 pm. Thus, we can define a revised event sequence in which the probability

of o5 is 1.0 for each event in his track leading up to this lounge at 3:00 pm, and

the probabilities of all other occupants for each such event are zero (0). The state

transitions are carried out with the revised event sequence in order to determine a

revised set of states and valid occupants, and thereby a revised occupancy relation.

Now, query 1 can be invoked in order to determine the probability of o3 in the lounge

at 3:00 pm.

The above query shows the close interconnection between retrieval, reasoning,

and recognition. A ‘what if’ retrieval query that declares knowledge about an event

causes the redefinition of one or more biometric events, thereby triggering the state

transition module to compute a new set of states, thereby triggering the reasoning

module to determine a new set of valid occupants, which the retrieval module uses

to determine a new occupancy relation for answering the query.

6.3 Query-dependent Performance Metrics

The query-dependent characterization involves evaluating the performance of the

model from an information retrieval perspective based on its ability to answer spatio-

temporal queries about the environment and its occupants. While query-independent

is holistic and involves performance characterization at a system level, the scope of

105

Page 10: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

query-dependent performance characterization is restricted to the spatio-temporal

dimensions that are either explicit or implicit from the query of interest. At a very

narrow level, the window of interest for evaluating the performance may only concern

an occupant’s presence in a particular zone at a specific point in time to a more broad

level that may concern the entire sequence of states of the state transition model.

The performance metrics of any given query of interest are defined in terms of the

ground truth, which is a set of true answers associated with the query. The nature of

the response set may vary depending on the type of query posed and may comprise

of basic entities or attributes of the smart environment such as occupants, zones,

probabilities of presence, time of occurrence or derived attributes such as duration of

presence, tracks, etc., which are based on relations that can be defined as part of the

data model.

Definition 6.3: Ground Truth: Given n occupants O={o1 . . . on} and an event

sequence e1 . . . ex, then the ground truth, GT , is a sequence oi1 . . . oix where each

index i1 . . . ix lies in the range 1 . . . n.

The ground truth basically states which person was the true occupant for each

biomeric event. We first define occupancy-based precision and recall, as follows.

Definition 6.4: Precision-Recall (Occupants): Given an environment with m

zones, n occupants O = {o1 . . . on}, an event sequence E = e1 . . . ex, and ground truth

GT . For an occupancy-based query Q, suppose Relo is the set of relevant occupants

that satisfies the query as per GT , and Reto is the set of retrieved occupants as per

the data model and occupancy relation. Then, precisiono = |Reto ∩ Relo| / |Reto|

and recallo = |Reto ∩ Relo| / |Relo|.

This definition can be extended to queries that determine durations or time intervals.

Essentially, each time interval < t1, t2 > can be regarded as a discrete set of time

points {t1, t1 + 0 : 01, . . . t2}. This leads to the following definition.

106

Page 11: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

Definition 6.5: Precision-Recall (Time): Given an environment with m zones,

n occupants O = {o1 . . . on}, an event sequence E = e1 . . . ex, and ground truth GT .

For an time-based query Q, suppose Relt is the set of relevant time units that satisfy

the query as per GT , and Rett is the set of retrieved time units as per the data model

and occupancy relation. Then, precisiont = |Rett ∩ Relt| / |Rett| and recallt =

|Rett ∩ Relt| / |Relt|

We discuss the evaluation of performance metrics for a few select queries. The queries

are based on data obtained from the simulator for a simulation run with n = 15, σ

= 1.0 (100%) and varying θ.

Query 7. Who all came to work today before 9:00 am?

Answer: Table 6.4a shows a listing of all the relevant events matching the criteria

of the query - ground truth events corresponding to detection of occupants at the

entrance before 9: 00 am. Table 6.4b shows the set of results matching the criteria

of the query, retrieved by the smart environment at varying levels of recognition

threshold θ.

Time Zone Occ Prob8:25 entrance o12 0.158:38 entrance o1 0.358:40 entrance o8 0.718:44 entrance o14 0.398:58 entrance o4 0.58

(a) Relevant Events for Query 7

θ Occ0 o12, o1, o8, o38,o40.2 o12, o1, o8, o38,o40.4 o12, o1, o8,o40.6 o8≥0.8 -

(b) Retrieved Results for Query 7

Table 6.4: Processing Query 7

Table 6.5 shows the calculation of precision and recall metrics based on definition 6.4.

At lower values of θ, a larger number of available matches (including spurious

track events) are retrieved, as the number of estimated tracks are abnormally high

(discussed in section 5.4.2). Many of these estimated tracks are spurious and hence

cause the inclusion of the spurious track events in the retrieved results. As recognition

threshold increases, the number of estimated valid tracks converges to the genuine

107

Page 12: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

number of occupants, effectively reducing the extent of spurious events. At θ

= 0.4 (which corresponds to an optimal recognition threshold at a system wide

level based on experiments in section 5.4.2), we observe a maximum precision of

1. Beyond the optimal recognition threshold, the number of estimated valid tracks

reduces drastically, and in this process most of the spurious tracks and some of the

valid occupant tracks get filtered. Thus the retrieval results are affected, effectively

reducing only the recall values. Beyond θ = 0.8, the system is not able to retrieve

any results, leading to a null value for precision and recall.

θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |

Recall =|Ret∩Rel ||Rel |

0.0 5 5 4 0.80 0.800.2 5 5 4 0.80 0.800.4 4 5 4 1 0.800.6 1 5 1 1 0.20≥0.8 0 5 0 0 0

Table 6.5: Query-based Performance Metrics - Query 7

Query 8. Who all were present in the cafeteria between 10:00 am - 11:00 am?

Answer: Table 6.6a shows a listing of all the relevant events matching the criteria

of the query - ground truth events corresponding to detection of occupants at the

cafeteria between 10:00 am - 11:00 am.

Time Zone Occ Prob10:03 cafeteria o12 0.2710:05 cafeteria o8 0.1910:08 cafeteria o4 0.5010:26 cafeteria o2 0.2510:29 cafeteria o15 0.2110:29 cafeteria o6 0.7110:54 cafeteria o3 0.76

(a) Relevant Events for Query 8

θ Occ0 o12, o34,o4, o25, o15, o6, o30.2 o12, o34,o4, o25, o15, o6, o30.4 o12, o4, o15, o6, o3≥0.6 -

(b) Retrieved Results for Query 8

Table 6.6: Processing Query 8

Table 6.6b shows the set of results matching the criteria of the query, retrieved by

the smart environment at varying levels of recognition threshold θ. Table 6.7 shows

108

Page 13: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

the calculation of precision and recall metrics based on the definition 6.4 using the

retrieved and relevant results. Like in query 7, we observe that precision becomes

maximum at recognition threshold θ = 0.4 and drops to zero thereafter (from θ =

0.6).

θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |

Recall =|Ret∩Rel ||Rel |

0.0 7 7 5 0.71 0.710.2 7 7 5 0.71 0.710.4 5 7 5 1 0.710.6 0 7 0 0 0

Table 6.7: Query-based Performance Metrics - Query 8

Query 9. How long was occupant o10 present in the mail room today?

Answer: Table 6.8a shows a listing of all the relevant events matching the criteria

of the query - ground truth events corresponding to detection of occupant o10 in the

mail room and the total duration of time o10 spent in the mail room.

Time Interval Zone Duration9:32 - 9:34 mail 211:11- 11:14 mail 315:29 - 15:32 mail 3

(a) Relevant Track for o10

θ Interval Duration0 9.32 - 9.37 50.2 9.32 - 9.37 50.4 9.32 - 9.37 50.6 9.32 - 9.37 5≥0.8 0 0(b) Retrieved Results for Query 9

Table 6.8: Processing Query 9

Table 6.8b shows the set of results matching the criteria of the query, retrieved by the

smart environment at varying levels of recognition threshold θ. The events at time

stamps 11:11 and 15:29 of occupant o10 are misidentified by the system, effectively

ruling out their retrieval. The duration of presence associated with the event at 9:32 is

overestimated as the subsequent event was misidentified leading to an overestimation

of duration of presence. Table 6.9 shows the calculation of precision and recall metrics

109

Page 14: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

based on definition 6.5. Precision and recall values are observed to be low and

remains constant across varying recognition threshold upto θ = 0.6, and drops to

zero thereafter (from θ = 0.8).

θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |

Recall =|Ret∩Rel ||Rel |

0.0 5 8 2 0.40 0.250.2 5 8 2 0.40 0.250.4 5 8 2 0.40 0.250.6 5 8 2 0.40 0.25≥0.8 0 8 0 0 0

Table 6.9: Query-based Performance Metrics - Query 9

Figure 6.1: Query-based Precision Recall: Before Reasoning

Figure 6.1 shows the precision recall plot for the three queries discussed above. It is

observed that performance metrics of duration based queries are more susceptible to

the detrimental effects of error arising out of misidentified events. The performance

metrics of queries 1 and 2, which are based on a set of occupants and their tracks,

exhibit moderately high values of precision and recall upto the optimal recognition

threshold, and in some cases slightly beyond the optimal point for the precision metric.

On applying spatio-temporal reasoning (figure 6.2), query 9 shows a marked

improvement in the precision as a missing exit event is inserted into the track of

occupant o10 at time stamp 9:34, after the mail room event of 9:32, thereby raising

the precision to 1.

110

Page 15: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

Figure 6.2: Query-based Precision Recall: Impact of Reasoning

Observing the behavior across these queries, we can conclude that the optimal

behavior of query dependent performance metrics, closely follows the optimal behavior

of query independent metrics, which was found to occur at recognition threshold θ =

0.4 (from section 5.4.2)

6.4 Related Work

Traditionally, database management systems have been used in handling precise data

in applications such as payroll and inventory. The approach to queries in the domain

of databases has been different from the information retrieval arena. SQL queries in

databases are associated with a rich structure and a precise semantics that facilitates

formulation of complex queries at a user level and complex optimizations at the system

level. However, users are expected to have a detailed knowledge of the database to

formulate queries successfully. On the other hand, query formulation in information

retrieval (IR) is based a set of keywords which makes this process easy for casual

users. Additionally, IR queries provide two important features, otherwise missing in

databases: ranking of results and inclusion of uncertain matches, i.e., the results may

include documents may not match all the keywords in the query.

In domains such as supply chain management [123], financial services [36], business

activity monitoring [66], elder care [78, 96], and various pervasive computing scenarios

111

Page 16: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

[33, 83], applications are required to extract complex events (mostly user-specified)

from data streams of low-level atomic events. Over the years, a broad range of

applications that need to manage large and imprecise data sets in domains such as

sensor networks [37], information extraction [47] and business intelligence [18] have

emerged. The imprecision in data can be due to factors such as measurement errors

in sensor networks [65, 70], or the inherent ambiguity of natural language-text in

information extraction [47] or due to the nature of inference or prediction algorithms

[33, 78, 96] . Imprecision could be a conscious decision in business intelligence driven

by the goal of minimizing the cost of data cleaning or in privacy applications where

it is artificially generated to hide sensitive attributes of individuals so that the data

may be published. The conventional database management systems are incapable of

handling large volumes of imprecise data associated with an increasing number of new

applications, as imprecision is modeled in a probabilistic manner. The existing rich

query languages coupled with some of the event detection engines such as Cayuga

[36], SASE [128] or SnoopIB [1] are capable of extracting sophisticated patterns from

event streams, though these languages require the data to be precise.

A probabilistic database management system (ProbDMS) [9, 31, 29] stores large

volumes of probabilistic data and supports complex queries in addition to the

standard features supported by conventional database management systems. The

major challenge in a ProbDMS is that it needs to exhibit scalability with increase

in data volume, like conventional database management systems, as well as perform

probabilistic inference, a challenging problem using Artificial Intelligence techniques.

To address this issue, researchers have exploited the special form of probabilistic

inference that occurs during query evaluation on relational probabilistic data leading

to techniques such as: lineage-based representations [9], safe plans [30], algorithms for

top-k queries [99], and representations of views over probabilistic data [100]. Recent

work [101, 28, 64] on probabilistic data streams has investigated queries of varying

complexity. Extensions to SQL with provision for uncertain matches and ranked

results have been proposed [3, 59, 89], though with certain restrictions.

112

Page 17: 6 Retrieval - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6 Retrieval Thestatetransitionsystemmodelprovidesanaturalbasisforretrievalofanswersin

There has been considerable research spanning two decades in the area of temporal

query languages [113]. Likewise, spatial databases also have a rich history, especially

in its application to geographic information systems (GIS) [34, 48, 112]. Probabilistic

versions of temporal and spatial query languages have been a studied in references

[35, 91]. A spatio-temporal data model and query language capable of handling time-

dependent geometries, including those changing continuously that describe moving

objects is discussed in reference [46].

113