6 retrieval - shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/2368/15/15_chapter 6.pdf · 6...
TRANSCRIPT
6 Retrieval
The state transition system model provides a natural basis for retrieval of answers in
response to various queries about the presence and movements of occupants within
the smart environment. In this chapter, we first formulate a data model based upon
an occupancy relation with a real-valued probability attribute. This relation captures
the state information after recognition and reasoning have been performed and a set of
valid occupants have been determined. We then show how to formulate a few spatio-
temporal queries. We first formulate queries using the well-known SQL database
query language focusing on the computation of probabilities, an aspect that is novel
to our model. It is well-known that an SQL-like language cannot formulate general
transitive closures, and hence we explore the formulation of more complex queries
in a constraint-based extension of logic programs, called CLP(R), which permits
reasoning over real-valued variables and arithmetic operations. Our objective in this
chapter is to outline the basic approach to query formulation rather than to provide
a comprehensive coverage of all possible types of queries.
We also formulate in this chapter the precision and recall metrics in a query-
dependent manner, since the performance of a smart environment is ultimately
determined by how well it can respond to the queries that the end-user is interested
in. We provide examples for calculation of query-dependent performance metrics
based on experimental results from a simulation run. We conclude the chapter with
an overview of event processing in probabilistic databases.
6.1 Data Model
We begin with a presentation of the data model underlying our query language. While
the data model is basically relational in nature, it departs from the standard relational
model in the use of a real-valued attribute for the probability.
97
Definition 6.1: State Relation: Given an environment with occupants O =
o1 . . . on, zones Z = z1 . . . zm, and biometric events occurring at distinct increasing
times T = t1 . . . tx, the states of the smart environment can be represented as a
relation
state(time, occupant, zone, probability)
where occupant ∈ O, zone ∈ Z, and time ∈ T , where T ⊆ {00:00, 00:01, . . . , 23:59},
a discrete totally ordered set of time units. The attribute probability ∈ R, the set of
real numbers, and is functionally dependent on the other three attributes. The state
relation satisfies the following integrity constraint:
∀t∀i Σ{p : (∃z) state(t, oi, z, p)} = 1
For simplicity, we assume that time is discretized as a totally ordered set of hour-
minute points over a 24-hour period, although it can be extended to cover multiple
days, months, and years in a straightforward way. We use the state relation in order
to represent the result after recognition and reasoning have been completed.
Occ Entr. Office Lounge Ext.o1 0.08 0.23 0.67 0.02o2 0.25 0.45 0.20 0.10o3 0.12 0.62 0.20 0.06o4 0.05 0.11 0.20 0.64o5 0.78 0.13 0.06 0.03GT o5 o2,o3 o1 o4
(a) State s8 at 10:15 am
Occ Entr. Office Lounge Ext.o1 0.07 0.21 0.70 0.02o2 0.11 0.20 0.64 0.05o3 0.08 0.39 0.49 0.04o4 0.05 0.11 0.20 0.64o5 0.78 0.13 0.06 0.03GT o5 o3 o1,o2 o4
(b) State s9 at 10:20 am
Table 6.1: Sample States
Table 6.1 shows a fragment of two consecutive states in the smart environment, and
the state relation based on these states is shown in table 6.2. Note that the tuples in
the state relation shown correspond only to those time units at which the biometric
events occur. At each such time t, the state is represented by a set of m × n tuples
corresponding to all possible occupant-zone pairs.
98
State Relationsstate(10:15, o1, entrance, 0.08)state(10:15, o1, office, 0.23)state(10:15, o1, lounge, 0.67)state(10:15, o1, external, 0.02)
...state(10:15, o5, entrance, 0.78)
...state(10:15, o5, external, 0.03)
(a) State Relation for s8
State Relationsstate(10:20, o1, entrance, 0.07)state(10:20,o1, office, 0.21)state(10:20, o1, lounge, 0.70)state(10:20, o1, external, 0.02)
...state(10:20, o5, entrance, 0.78)
...state(10:20, o5, external, 0.03)
(b) State Relation s9
Table 6.2: Sample State Relations
As discussed in the previous chapter, a person is considered as a valid occupant only
if his track is valid. We therefore define an reduced state relation, state′, from the
state relation by selecting those tuples that correspond to valid occupants. Using this
relation we define an occupancy relation, as follows.
Definition 6.2: Occupancy Relation: Given a smart environment with O =
o1 . . . on, Z = z1 . . . zm, and valid occupants V ⊆ O, we define:
∀t∀o∀z[state′(t, o, z, pr) ← state(t, o, z, pr) ∧ o ∈ V ]
Assuming an increasing time sequence T = t1 . . . tx, we define
∀i ∀o∀z[occupancy(ti, t′i+1, o, z, pr) ← state′(ti, o, z, pr) ∧ 1 ≤ i ≤ (x− 1)]
where t′ is the time unit that precedes t by one minute.
Occupancy Relationsoccupancy(10:15, 10:19, o1, entrance, 0.08)occupancy(10:15, 10:19, o1, office, 0.23)occupancy(10:15, 10:19, o1, lounge, 0.67)occupancy(10:15, 10:19, o1, external, 0.02)
...occupancy(10:15, 10:19, o5, entrance, 0.78)
...occupancy(10:15, 10:19, o5, external, 0.03)
Table 6.3: Sample Occupancy Relation
99
Table 6.3 is a snapshot of the occupancy relation based on two consecutive states
shown in table 6.1. Like in state relation, for each time interval [ti, t′i+1], the occupancy
consists of a set of m× n tuples corresponding to all possible person-zone pairs. The
occupancy relation also satisfies the integrity constraint:
∀t∀i Σ{p : (∃z) occupancy(ti, t′i+1, oi, z, p)} = 1
6.2 Query Formulation
We first use the well-known database query language SQL (Structured Query
Language) for writing queries and then discuss the use of CLP (constraint logic
programming) for more complex queries. Our focus will be on queries involving
the computation of probabilities, as this is the novel part of the work.
6.2.1 Structured Query Language
Below is the syntax of the most basic form of SQL queries [73]:
SELECT attributesFROM relationsWHERE condition
We will use the
occupancy(from, to, person, zone, prob)
relation as the basis for formulating queries. The tuples of the relations that satisfy
the condition are selected and the relevant attributes are returned as the result. The
condition is typically a conjunction of simpler tests that serve as a basis for tuple
selection. There are numerous extensions to the basic syntax outlined above, in order
to perform aggregate operations, grouping, ordering, etc.
Query 1. What is the probability that occupant o3 was in the lounge at 3:00 pm?
Answer: For this query, it is necessary to check the occupancy relation for the
presence of o3 in the lounge at a time interval that contains 3:00 pm. If o3 was
detected to be in the lounge, at most one tuple in occupancy will satisfy the query;
100
otherwise the query returns without any answer. For simplicity, we assume overloaded
comparison operators <=, <, =, etc., that are defined on time values.
SELECT probFROM occupancyWHERE person = o3 and
zone = lounge andfrom <= 15:00 <= to
Query 2. What is the probability that occupant o5 was in the lounge or office at
3:00 pm?
Answer: The probability of an occupant being in one of two zones at any given time
is the sum of the probabilities of the occupant in these zones at that time.
SELECT SUM(prob)FROM occupancyWHERE person = o5 and
zone in (lounge, office)from <= 15:00 <= to
Query 3. What is the probability that occupant o7 was in the lounge during 10:00
am to 11:00 am?
Answer: The probability that an occupant was in the lounge at a given time is the 1
- sum of the probabilities that he was in one of the other zones at this time (since the
sum of the probabilities across all zones = 1 at any given time). Since there could be
multiple sub-intervals within 10:00 am to 11:00 am during which o7 was in the lounge
(with different probabilities), the answer to the query is 1 minus the product of the
probabilities that he was not in the lounge during every such sub-interval.
(1 - PROD(SELECT (1 - SUM(prob)) as sumprobFROM occupancyWHERE person = o7 and
zone 6= lounge and10:00 <= from and to <= 11:00
GROUP BY from))
101
6.2.2 Constraint Logic Programming
Logic programs offer a more expressive query paradigm than SQL because they
permit the formulation of general recursive queries. SQL offers the guarantee that
all queries will terminate, an important requirement for a database query language.
The subset of Horn clauses called Datalog, which is essentially Horn clauses without
function symbols [20], also has the strong termination property and has been studied
extensively in the literature. Since our underlying data model uses a real-valued
attribute for probability along with operations for comparison and arithmetic, it
is more natural to adopt the paradigm of constraint logic programming over reals,
CLP(R) [61], rather than Horn clauses. Essentially, CLP(R) extends Horn clauses by
generalizing unification to constraint satisfaction. Typically, CLP(R) systems provide
solvers for linear equalities and inequalities; non-linear equations and inequations are
deferred until one or more variables become bound and they become linear. They
also support aggregation predicates, such as min, max, sum, count, etc., and we will
make use of such operations in our formulations as well.
A CLP(R) program is a collection of rules, each of which may be one of two forms:
p(t̄)
p(t̄) : − p1(t̄1) . . . pk(t̄k)
where each p is a user-defined predicate and each p1 . . . pk may be user-defined or may
be one of a pre-defined set of builtin constraint predicates, such as ≤, ge, etc. The
terms t̄ and t̄i for 1 ≤ i ≤ k include ordinary terms as in Horn clauses as well as
terms composed from real numbers, variables, and the usual arithmetic operators.
As CLP(R) is not a well-known paradigm, we first show how to formulate queries 1-3
in the language, and then illustrate more complex queries. A CLP(R) program for
query 1 would be:
query1(Prob) :-occupancy(From, To, o3, lounge, Prob),From <= 15:00, To => 15:00.
102
Here we are taking the liberty to use the <= and => operators for comparing time
units.
Query 2 can be expressed as follows, where sum(P, G, A) is an aggregation predicate
which sums up all instances of P that satisfy the goal G and places the result in A.
The predicate member is the standard list membership predicate.
query2(Prob) :-sum(P,q2(P),Prob),
q2(Prob) :-occupancy(From, To, o5, Z, Prob),From <= 15:00, To => 15:00,member(Z, [lounge, office]).
The formulation of query 3 makes use of the sum and prod aggregation predicates.
The effect of the ’GROUP BY from’ clause in the SQL query is realized by not making
the variable F in the goal q3b(F,P) an existential variable. In this manner, the
predicate q3a(Prob) defines the probabilities for occupant o7 not being in the lounge
at every time interval from 10:00 to 11:00 am.
query3(1-Prob) :-prod(P, q3a(P), Prob).
q3a(1-S) :-sum(P, q3b(F,P), S).
q3b(From,Prob) :-occupancy(From, To, o7, Z, Prob),Z 6= lounge,10:00 <= From, 11:00 => To.
Query 4. What is the probability that o3 and o5 met in the lounge today? Assume
that “met” means “being in the same zone at the same time”.
Answer: The probability that o3 and o5 met in the lounge today is 1 minus the
probability that o3 and o5 did not meet in the lounge. For any interval, the probability
of not having met in the lounge during this interval is 1 minus product of the
103
probabilities of their being in the lounge during this interval – predicate q4 returns
this probability for every overlapping interval.
query4(1-Prob) :-prod(P, q4(P), Prob).
q4(1 - Prob1*Prob2) :-occupancy(From1, To1, o3, lounge, Prob1),occupancy(From2, To2, o5, lounge, Prob2),overlaps(From1,To1, From2, To2)
overlaps(F1,T1,F2,T2) :-F1 <= F2, F2 <= T1.
overlaps(F1,T1,F2,T2) :-F2 <= F1, F1 <= T2.
Query 5. What is the longest contiguous duration during which occupant o1 was in
the office?
Answer: We define the contiguous occupancy in a zone recursively and use this
definition in order to define the longest duration.
query5(Duration) :-setof(F,occupancy(F,_,o1,office,_), FromSet),max(D,q5(FromSet,D), Duration).
q5(FS,D) :-member(F,FS),contiguous(F,T,o1,office),D = T - F.
contiguous(F,T,P,Z) :-occupancy(F,T1,P,Z, _),T2 = T1 + 0:01,contiguous(T2,T,P,Z).
contiguous(F,T,P,Z) :-\+ occupancy(F,_,P,Z,_),T = F - 0:01.
It is straightforward to extend the above definition of contiguous so that the
average probability during this period is also included. Other extensions include
the incorporation of distances between adjacent zones and spatial queries that make
104
use of this distance information. As can be seen from the above formulations and
possible extensions, the CLP(R) paradigm is a powerful paradigm for probabilistic
spatio-temporal queries.
Query 6. If occupant o5 was known to be in the lounge at 3:00 pm with certainty,
what is the probability that o3 was in the lounge at that time?
Answer: Such a query cannot be answered without re-initializing the state and
occupancy relations. The knowledge that o5 was in the lounge at 3:00 pm with
certainty means that o5 should also have been present with certainty in the zone
preceding his entrance to the lounge. Inductively, we can say that o5 was present with
certainty in all zones in his track leading up to entrance to the lounge at or preceding
3:00 pm. Thus, we can define a revised event sequence in which the probability
of o5 is 1.0 for each event in his track leading up to this lounge at 3:00 pm, and
the probabilities of all other occupants for each such event are zero (0). The state
transitions are carried out with the revised event sequence in order to determine a
revised set of states and valid occupants, and thereby a revised occupancy relation.
Now, query 1 can be invoked in order to determine the probability of o3 in the lounge
at 3:00 pm.
The above query shows the close interconnection between retrieval, reasoning,
and recognition. A ‘what if’ retrieval query that declares knowledge about an event
causes the redefinition of one or more biometric events, thereby triggering the state
transition module to compute a new set of states, thereby triggering the reasoning
module to determine a new set of valid occupants, which the retrieval module uses
to determine a new occupancy relation for answering the query.
6.3 Query-dependent Performance Metrics
The query-dependent characterization involves evaluating the performance of the
model from an information retrieval perspective based on its ability to answer spatio-
temporal queries about the environment and its occupants. While query-independent
is holistic and involves performance characterization at a system level, the scope of
105
query-dependent performance characterization is restricted to the spatio-temporal
dimensions that are either explicit or implicit from the query of interest. At a very
narrow level, the window of interest for evaluating the performance may only concern
an occupant’s presence in a particular zone at a specific point in time to a more broad
level that may concern the entire sequence of states of the state transition model.
The performance metrics of any given query of interest are defined in terms of the
ground truth, which is a set of true answers associated with the query. The nature of
the response set may vary depending on the type of query posed and may comprise
of basic entities or attributes of the smart environment such as occupants, zones,
probabilities of presence, time of occurrence or derived attributes such as duration of
presence, tracks, etc., which are based on relations that can be defined as part of the
data model.
Definition 6.3: Ground Truth: Given n occupants O={o1 . . . on} and an event
sequence e1 . . . ex, then the ground truth, GT , is a sequence oi1 . . . oix where each
index i1 . . . ix lies in the range 1 . . . n.
The ground truth basically states which person was the true occupant for each
biomeric event. We first define occupancy-based precision and recall, as follows.
Definition 6.4: Precision-Recall (Occupants): Given an environment with m
zones, n occupants O = {o1 . . . on}, an event sequence E = e1 . . . ex, and ground truth
GT . For an occupancy-based query Q, suppose Relo is the set of relevant occupants
that satisfies the query as per GT , and Reto is the set of retrieved occupants as per
the data model and occupancy relation. Then, precisiono = |Reto ∩ Relo| / |Reto|
and recallo = |Reto ∩ Relo| / |Relo|.
This definition can be extended to queries that determine durations or time intervals.
Essentially, each time interval < t1, t2 > can be regarded as a discrete set of time
points {t1, t1 + 0 : 01, . . . t2}. This leads to the following definition.
106
Definition 6.5: Precision-Recall (Time): Given an environment with m zones,
n occupants O = {o1 . . . on}, an event sequence E = e1 . . . ex, and ground truth GT .
For an time-based query Q, suppose Relt is the set of relevant time units that satisfy
the query as per GT , and Rett is the set of retrieved time units as per the data model
and occupancy relation. Then, precisiont = |Rett ∩ Relt| / |Rett| and recallt =
|Rett ∩ Relt| / |Relt|
We discuss the evaluation of performance metrics for a few select queries. The queries
are based on data obtained from the simulator for a simulation run with n = 15, σ
= 1.0 (100%) and varying θ.
Query 7. Who all came to work today before 9:00 am?
Answer: Table 6.4a shows a listing of all the relevant events matching the criteria
of the query - ground truth events corresponding to detection of occupants at the
entrance before 9: 00 am. Table 6.4b shows the set of results matching the criteria
of the query, retrieved by the smart environment at varying levels of recognition
threshold θ.
Time Zone Occ Prob8:25 entrance o12 0.158:38 entrance o1 0.358:40 entrance o8 0.718:44 entrance o14 0.398:58 entrance o4 0.58
(a) Relevant Events for Query 7
θ Occ0 o12, o1, o8, o38,o40.2 o12, o1, o8, o38,o40.4 o12, o1, o8,o40.6 o8≥0.8 -
(b) Retrieved Results for Query 7
Table 6.4: Processing Query 7
Table 6.5 shows the calculation of precision and recall metrics based on definition 6.4.
At lower values of θ, a larger number of available matches (including spurious
track events) are retrieved, as the number of estimated tracks are abnormally high
(discussed in section 5.4.2). Many of these estimated tracks are spurious and hence
cause the inclusion of the spurious track events in the retrieved results. As recognition
threshold increases, the number of estimated valid tracks converges to the genuine
107
number of occupants, effectively reducing the extent of spurious events. At θ
= 0.4 (which corresponds to an optimal recognition threshold at a system wide
level based on experiments in section 5.4.2), we observe a maximum precision of
1. Beyond the optimal recognition threshold, the number of estimated valid tracks
reduces drastically, and in this process most of the spurious tracks and some of the
valid occupant tracks get filtered. Thus the retrieval results are affected, effectively
reducing only the recall values. Beyond θ = 0.8, the system is not able to retrieve
any results, leading to a null value for precision and recall.
θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |
Recall =|Ret∩Rel ||Rel |
0.0 5 5 4 0.80 0.800.2 5 5 4 0.80 0.800.4 4 5 4 1 0.800.6 1 5 1 1 0.20≥0.8 0 5 0 0 0
Table 6.5: Query-based Performance Metrics - Query 7
Query 8. Who all were present in the cafeteria between 10:00 am - 11:00 am?
Answer: Table 6.6a shows a listing of all the relevant events matching the criteria
of the query - ground truth events corresponding to detection of occupants at the
cafeteria between 10:00 am - 11:00 am.
Time Zone Occ Prob10:03 cafeteria o12 0.2710:05 cafeteria o8 0.1910:08 cafeteria o4 0.5010:26 cafeteria o2 0.2510:29 cafeteria o15 0.2110:29 cafeteria o6 0.7110:54 cafeteria o3 0.76
(a) Relevant Events for Query 8
θ Occ0 o12, o34,o4, o25, o15, o6, o30.2 o12, o34,o4, o25, o15, o6, o30.4 o12, o4, o15, o6, o3≥0.6 -
(b) Retrieved Results for Query 8
Table 6.6: Processing Query 8
Table 6.6b shows the set of results matching the criteria of the query, retrieved by
the smart environment at varying levels of recognition threshold θ. Table 6.7 shows
108
the calculation of precision and recall metrics based on the definition 6.4 using the
retrieved and relevant results. Like in query 7, we observe that precision becomes
maximum at recognition threshold θ = 0.4 and drops to zero thereafter (from θ =
0.6).
θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |
Recall =|Ret∩Rel ||Rel |
0.0 7 7 5 0.71 0.710.2 7 7 5 0.71 0.710.4 5 7 5 1 0.710.6 0 7 0 0 0
Table 6.7: Query-based Performance Metrics - Query 8
Query 9. How long was occupant o10 present in the mail room today?
Answer: Table 6.8a shows a listing of all the relevant events matching the criteria
of the query - ground truth events corresponding to detection of occupant o10 in the
mail room and the total duration of time o10 spent in the mail room.
Time Interval Zone Duration9:32 - 9:34 mail 211:11- 11:14 mail 315:29 - 15:32 mail 3
(a) Relevant Track for o10
θ Interval Duration0 9.32 - 9.37 50.2 9.32 - 9.37 50.4 9.32 - 9.37 50.6 9.32 - 9.37 5≥0.8 0 0(b) Retrieved Results for Query 9
Table 6.8: Processing Query 9
Table 6.8b shows the set of results matching the criteria of the query, retrieved by the
smart environment at varying levels of recognition threshold θ. The events at time
stamps 11:11 and 15:29 of occupant o10 are misidentified by the system, effectively
ruling out their retrieval. The duration of presence associated with the event at 9:32 is
overestimated as the subsequent event was misidentified leading to an overestimation
of duration of presence. Table 6.9 shows the calculation of precision and recall metrics
109
based on definition 6.5. Precision and recall values are observed to be low and
remains constant across varying recognition threshold upto θ = 0.6, and drops to
zero thereafter (from θ = 0.8).
θ |Ret | |Rel | |Ret ∩ Rel | Precision =|Ret∩Rel ||Ret |
Recall =|Ret∩Rel ||Rel |
0.0 5 8 2 0.40 0.250.2 5 8 2 0.40 0.250.4 5 8 2 0.40 0.250.6 5 8 2 0.40 0.25≥0.8 0 8 0 0 0
Table 6.9: Query-based Performance Metrics - Query 9
Figure 6.1: Query-based Precision Recall: Before Reasoning
Figure 6.1 shows the precision recall plot for the three queries discussed above. It is
observed that performance metrics of duration based queries are more susceptible to
the detrimental effects of error arising out of misidentified events. The performance
metrics of queries 1 and 2, which are based on a set of occupants and their tracks,
exhibit moderately high values of precision and recall upto the optimal recognition
threshold, and in some cases slightly beyond the optimal point for the precision metric.
On applying spatio-temporal reasoning (figure 6.2), query 9 shows a marked
improvement in the precision as a missing exit event is inserted into the track of
occupant o10 at time stamp 9:34, after the mail room event of 9:32, thereby raising
the precision to 1.
110
Figure 6.2: Query-based Precision Recall: Impact of Reasoning
Observing the behavior across these queries, we can conclude that the optimal
behavior of query dependent performance metrics, closely follows the optimal behavior
of query independent metrics, which was found to occur at recognition threshold θ =
0.4 (from section 5.4.2)
6.4 Related Work
Traditionally, database management systems have been used in handling precise data
in applications such as payroll and inventory. The approach to queries in the domain
of databases has been different from the information retrieval arena. SQL queries in
databases are associated with a rich structure and a precise semantics that facilitates
formulation of complex queries at a user level and complex optimizations at the system
level. However, users are expected to have a detailed knowledge of the database to
formulate queries successfully. On the other hand, query formulation in information
retrieval (IR) is based a set of keywords which makes this process easy for casual
users. Additionally, IR queries provide two important features, otherwise missing in
databases: ranking of results and inclusion of uncertain matches, i.e., the results may
include documents may not match all the keywords in the query.
In domains such as supply chain management [123], financial services [36], business
activity monitoring [66], elder care [78, 96], and various pervasive computing scenarios
111
[33, 83], applications are required to extract complex events (mostly user-specified)
from data streams of low-level atomic events. Over the years, a broad range of
applications that need to manage large and imprecise data sets in domains such as
sensor networks [37], information extraction [47] and business intelligence [18] have
emerged. The imprecision in data can be due to factors such as measurement errors
in sensor networks [65, 70], or the inherent ambiguity of natural language-text in
information extraction [47] or due to the nature of inference or prediction algorithms
[33, 78, 96] . Imprecision could be a conscious decision in business intelligence driven
by the goal of minimizing the cost of data cleaning or in privacy applications where
it is artificially generated to hide sensitive attributes of individuals so that the data
may be published. The conventional database management systems are incapable of
handling large volumes of imprecise data associated with an increasing number of new
applications, as imprecision is modeled in a probabilistic manner. The existing rich
query languages coupled with some of the event detection engines such as Cayuga
[36], SASE [128] or SnoopIB [1] are capable of extracting sophisticated patterns from
event streams, though these languages require the data to be precise.
A probabilistic database management system (ProbDMS) [9, 31, 29] stores large
volumes of probabilistic data and supports complex queries in addition to the
standard features supported by conventional database management systems. The
major challenge in a ProbDMS is that it needs to exhibit scalability with increase
in data volume, like conventional database management systems, as well as perform
probabilistic inference, a challenging problem using Artificial Intelligence techniques.
To address this issue, researchers have exploited the special form of probabilistic
inference that occurs during query evaluation on relational probabilistic data leading
to techniques such as: lineage-based representations [9], safe plans [30], algorithms for
top-k queries [99], and representations of views over probabilistic data [100]. Recent
work [101, 28, 64] on probabilistic data streams has investigated queries of varying
complexity. Extensions to SQL with provision for uncertain matches and ranked
results have been proposed [3, 59, 89], though with certain restrictions.
112
There has been considerable research spanning two decades in the area of temporal
query languages [113]. Likewise, spatial databases also have a rich history, especially
in its application to geographic information systems (GIS) [34, 48, 112]. Probabilistic
versions of temporal and spatial query languages have been a studied in references
[35, 91]. A spatio-temporal data model and query language capable of handling time-
dependent geometries, including those changing continuously that describe moving
objects is discussed in reference [46].
113