a data model for fuzzy linguistic databases with flexible querying

11
A Data Model for Fuzzy Linguistic Databases with Flexible Querying Van Hung Le, Fei Liu, and Hongen Lu Department of Computer Science and Computer Engineering La Trobe University, Bundoora, VIC 3086, Australia [email protected], [email protected], [email protected] Abstract. Information to be stored in databases is often fuzzy. Two important issues in research in this field are the representation of fuzzy in- formation in a database and the provision of flexibility in database query- ing, especially via including linguistic terms in human-oriented queries and returning results with matching degrees. Fuzzy linguistic logic pro- gramming (FLLP), where truth values are linguistic, and hedges can be used as unary connectives in formulae, is introduced to facilitate the representation and reasoning with linguistically-expressed human knowl- edge. This paper presents a data model based on FLLP called fuzzy linguistic Datalog for fuzzy linguistic databases with flexible querying. Keywords: Fuzzy database, querying, fuzzy logic programming, hedge algebra, linguistic value, linguistic hedge, Datalog. 1 Introduction Humans mostly use words to characterise and to assess objects and phenomena in the real world. Thus, it is a natural demand for formalisms that can represent and reason with human knowledge expressed in linguistic terms. FLLP [1] is such a formalism. In FLLP, each fact or rule (a many-valued implication) is associated with a linguistic truth value, e.g., Very True or Little False, taken from a linear hedge algebra (HA) of a linguistic variable Truth [2,3], and linguistic hedges (modifiers), e.g., Very and Little, can be used as unary connectives in formulae. The latter is motivated by the fact that humans often use hedges to state different levels of emphasis, e.g., very close and quite close. Deductive databases combine logic programming and relational databases to construct systems that are powerful (e.g., can handle recursive queries), still fast and able to deal with very large volumes of data (e.g., utilizing set-oriented processing instead of one tuple at a time). In this work, we develop an extension of Datalog [4] called fuzzy linguistic Datalog (FLDL) by means of FLLP. Features of FLDL are: (i) It enables to find answers to queries over a fuzzy linguistic database (FLDB) using a fuzzy linguistic knowledge base (FLKB) represented by an FLDL program in which all components (except usual connectives) can be expressed in linguistic terms; (ii) Results are returned with a comparative linguistic truth value and thus can be ranked accordingly. A. Nicholson and X. Li (Eds.): AI 2009, LNAI 5866, pp. 495–505, 2009. c Springer-Verlag Berlin Heidelberg 2009

Upload: humg

Post on 29-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

A Data Model for Fuzzy Linguistic Databases

with Flexible Querying

Van Hung Le, Fei Liu, and Hongen Lu

Department of Computer Science and Computer EngineeringLa Trobe University, Bundoora, VIC 3086, Australia

[email protected], [email protected], [email protected]

Abstract. Information to be stored in databases is often fuzzy. Twoimportant issues in research in this field are the representation of fuzzy in-formation in a database and the provision of flexibility in database query-ing, especially via including linguistic terms in human-oriented queriesand returning results with matching degrees. Fuzzy linguistic logic pro-gramming (FLLP), where truth values are linguistic, and hedges canbe used as unary connectives in formulae, is introduced to facilitate therepresentation and reasoning with linguistically-expressed human knowl-edge. This paper presents a data model based on FLLP called fuzzylinguistic Datalog for fuzzy linguistic databases with flexible querying.

Keywords: Fuzzy database, querying, fuzzy logic programming, hedgealgebra, linguistic value, linguistic hedge, Datalog.

1 Introduction

Humans mostly use words to characterise and to assess objects and phenomenain the real world. Thus, it is a natural demand for formalisms that can representand reason with human knowledge expressed in linguistic terms. FLLP [1] is sucha formalism. In FLLP, each fact or rule (a many-valued implication) is associatedwith a linguistic truth value, e.g., Very True or Little False, taken from a linearhedge algebra (HA) of a linguistic variable Truth [2,3], and linguistic hedges(modifiers), e.g., Very and Little, can be used as unary connectives in formulae.The latter is motivated by the fact that humans often use hedges to state differentlevels of emphasis, e.g., very close and quite close.

Deductive databases combine logic programming and relational databases toconstruct systems that are powerful (e.g., can handle recursive queries), stillfast and able to deal with very large volumes of data (e.g., utilizing set-orientedprocessing instead of one tuple at a time). In this work, we develop an extensionof Datalog [4] called fuzzy linguistic Datalog (FLDL) by means of FLLP. Featuresof FLDL are: (i) It enables to find answers to queries over a fuzzy linguisticdatabase (FLDB) using a fuzzy linguistic knowledge base (FLKB) representedby an FLDL program in which all components (except usual connectives) canbe expressed in linguistic terms; (ii) Results are returned with a comparativelinguistic truth value and thus can be ranked accordingly.

A. Nicholson and X. Li (Eds.): AI 2009, LNAI 5866, pp. 495–505, 2009.c© Springer-Verlag Berlin Heidelberg 2009

496 V.H. Le, F. Liu, and H. Lu

The paper is organised as follows: Section 2 gives an overview of FLLPwhile Section 3 presents FLDL; Section 4 gives discussions and concludes thepaper.

2 Preliminaries

2.1 Linguistic Truth Domains and Operations

Values of the linguistic variable Truth, e.g., True, VeryTrue, VeryLittleFalse, canbe regarded to be generated from a set of primary terms G = {False, T rue}using hedges from a set H = {V ery, Little, ...} as unary operations. There existsa natural ordering among these terms, with a ≤ b meaning that a indicates adegree of truth not greater than b, e.g., True < V eryT rue. Hence, the termdomain is a partially ordered set and can be characterised by an HA X =(X, G, H,≤), where X is a term set, and ≤ is the semantically ordering relationon X [2,3]. Hedges either increase or decrease the meaning of terms they modify,i.e., ∀h ∈ H, ∀x ∈ X, either hx ≥ x or hx ≤ x. The fact that a hedge h modifiesterms more than or equal to another hedge k, i.e., ∀x ∈ X , hx ≤ kx ≤ x orx ≤ kx ≤ hx, is denoted by h ≥ k. The primary terms False ≤ True aredenoted by c− and c+, respectively. For an HA X = (X, {c−, c+}, H,≤), Hcan be divided into disjoint subsets H+ and H− defined by H+ = {h|hc+ >c+}, H− = {h|hc+ < c+}. An HA is said to be linear if both H+ and H− arelinearly ordered. It is shown that the term domain X of a linear HA is alsolinearly ordered. An l-limited HA, where l is a positive integer, is a linear HA inwhich every term has a length of at most l + 1. A linguistic truth domain is afinite and linearly ordered set X = X ∪ {0, W, 1}, where X is the term domainof an l -limited HA, and 0 (AbsolutelyFalse), W (the middle truth value), and1 (AbsolutelyTrue) are the least, the neutral and the greatest elements of X ,respectively [1]. Operations are defined on X as follows: (i) Conjunction: x ∧ y= min(x, y); (ii) Disjunction: x ∨ y = max(x, y); (iii) An inverse mapping of ahedge: the idea is that if we modify a predicate by a hedge h, its truth valuewill be changed by the inverse mapping of h, denoted h−, e.g., if young(john) =V eryT rue, then V ery young(john) = V ery−(V eryT rue) = True; (iv) Many-valued modus ponens states that from (B, α) and (A←i B, β) (i.e., truth valuesof B and the rule A ←i B are at least α and β, respectively), one obtains(A, Ci(α, β)), where Ci is the t-norm, whose residuum is the truth function ofthe implication←i, evaluating the modus ponens [5]. Note that our rules can useany of the �Lukasiewicz and Godel implications. Given a linguistic truth domainX consisting of v0 ≤ v1 ≤ ... ≤ vn, where v0 = 0, vn = 1, �Lukasiewicz and Godelt-norms are defined as:

CL(vi, vj) ={

vi+j−n if i + j − n > 0v0 otherwise

CG(vi, vj) = min(vi, vj)

A Data Model for FLDB with Flexible Querying 497

2.2 Fuzzy Linguistic Logic Programming

Language. The language is a many-sorted predicate language without func-tion symbols. Let A denote the set of all attributes. For each sort of variablesA ∈ A, there is a set CA of constant symbols, which are names of elements inthe domain of A. Connectives can be: conjunctions ∧ (also called Godel) and∧L (�Lukasiewicz); the disjunction ∨; implications ←L (�Lukasiewicz) and ←G

(Godel); and hedges as unary connectives. For any connective c different fromhedges, its truth function is denoted by c•, and for a hedge connective h, itstruth function is its inverse mapping h−. The only quantifier allowed is ∀.

A term is either a constant or a variable. An atom (or atomic formula) is of theform p(t1, ..., tn), where p is an n-ary predicate symbol, and t1, ..., tn are termsof corresponding attributes. A body formula is defined inductively as follows: (i)An atom is a body formula; (ii) If B1 and B2 are body formulae, then so are∧(B1, B2), ∨(B1, B2) and hB1, where h is a hedge. A rule is a graded implication(A← B.r), where A is an atom called rule head, B is a body formula called rulebody, and r is a truth value different from 0; (A ← B) is called the logical partof the rule. A fact is a graded atom (A.b), where A is an atom called the logicalpart of the fact, and b is a truth value different from 0. A fuzzy linguistic logicprogram (program, for short) is a finite set of rules and facts such that there areno two rules (facts) having the same logical part, but different truth values. Aprogram P can be represented as a partial mapping P : Formulae → X \ {0},where the domain of P , denoted dom(P ), is finite and consists only of logicalparts of rules and facts, and X is a linguistic truth domain; for a rule (A← B.r)(resp. fact (A.b)), P (A← B) = r (resp. P (A) = b). A query is an atom ?A.

Example 1. We take the linguistic truth domain of a 2-limited HA X = (X, {F,T }, {V, M, Q, L},≤), where V, M, Q, L, F and T stand for Very, More, Quite,Little, False and True, i.e., X = {0,VVF,MVF,VF,QVF,LVF,VMF, MMF, MF,QMF,LMF,F,VQF,MQF,QF,QQF,LQF,LLF,QLF,LF,MLF,VLF,W,VLT,MLT,LT, QLT, LLT,LQT,QQT,QT,MQT,VQT,T,LMT,QMT,MT,MMT,VMT, LVT,QVT,VT,MVT,VVT, 1} (truth values are in an ascending order). Assume thatwe have the following piece of knowledge: (i) “A car is considered good if itis quite comfortable and consumes very less fuel” is VeryTrue; (ii) “A Toyotais comfortable” is QuiteTrue; (iii) “A Toyota consumes less fuel” is MoreTrue;(iv) “A BMW is comfortable” is VeryTrue; (v) “A BMW consumes less fuel” isQuiteTrue. The knowledge can be represented by the following program:

(good(X)←G ∧(Q comfort(X), V less fuel(X)).V T )(comfort(toyota).QT )

(less fuel(toyota).MT )(comfort(bmw).V T )

(less fuel(bmw).QT )

It can be seen that since it is difficult to give precise numerical assessments ofthe criteria, the use of qualitative assessments is more realistic and appropriate.

498 V.H. Le, F. Liu, and H. Lu

We assume the underlying language of a program P is defined by constants andpredicate symbols appearing in P . Thus, we can refer to the Herbrand base ofP , which consists of all ground atoms, by BP [6].

Declarative Semantics. Given a program P , let X be the linguistic truth do-main; a fuzzy linguistic Herbrand interpretation (interpretation, for short) f isa mapping f : BP → X. Interpretation f can be extended to all formulae, de-noted f , as follows: (i) f(A) = f(A), if A is a ground atom; (ii) f(c(B1, B2)) =c•(f(B1), f(B2)), where B1, B2 are ground formulae, and c is a binary connec-tive; (iii) f(hB) = h−(f(B)), where B is a ground body formula, and h isa hedge; (iv) f(ϕ) = f(∀ϕ) = infϑ{f(ϕϑ)|ϕϑ is a ground instance of ϕ}. Aninterpretation f is a model of P if for all ϕ ∈ dom(P ), f(ϕ) ≥ P (ϕ).

Given a program P , let X be the linguistic truth domain. A pair (x; θ), wherex ∈ X , and θ is a substitution, is called a correct answer for P and a query ?Aif for every model f of P , we have f(Aθ) ≥ x.

Fixpoint Semantics. Let P be a program. An immediate consequence operatorTP is defined as: for an interpretation f and every ground atom A, TP (f)(A) =max{sup{Ci(f(B), r) : (A←i B.r) is a ground instance of a rule in P}, sup{b :(A.b) is a ground instance of a fact in P}}. It is shown in [1] that the LeastHerbrand model of the program P is exactly the least fixpoint of TP and can beobtained by finitely iterating TP from the bottom interpretation, mapping everyground atom into 0.

3 Fuzzy Linguistic Datalog

According to [4], a data model is a mathematical formalism with two parts: (i)A notation for describing data, and (ii) A set of operations used to manipulatethat data. Furthermore, model-theoretic, proof-theoretic, fixpoint semantics andtheir relationship are considered as important parts of a formal data model.

3.1 Language

Our FLDL is an extension of Datalog [4] without negation and possibly withrecursion in the same spirit as the one in [7]. The underlying mathematicalmodel of data for FLDL is the notion of fuzzy linguistic relation (fuzzy relation,for short): a fuzzy predicate r(A1, ..., An) is interpreted as a fuzzy relation R :CA1 × ... × CAn → X and is represented in the form of a (crisp) relation withthe relation scheme R(A1, ..., An, TV ), where X is a linguistic truth domain andis the domain of the truth-value attribute TV . Thus, our FLDB is a relationaldatabase in which a truth-value attribute is added to every relation to store alinguistic truth value for each tuple. The relations are in the set-of-lists sense,i.e., components appear in a fixed order, and reference to a column is only byits position among the arguments of a given predicate symbol [4]. All notionsare the same as those in FLLP. Moreover, we also have some restrictions onlogic programs as in the classical case. A rule is said to be safe if every variable

A Data Model for FLDB with Flexible Querying 499

occurring in the head also occurs in the body. An FLDL program consists of finitesafe rules and facts. A predicate appearing in logical parts of facts is called anextensional database (EDB) predicate, whose relation is stored in the databaseand called EDB relation, while one defined by rules is called an intensionaldatabase (IDB) predicate, whose relation is called IDB relation, but not both.

3.2 Model-Theoretic Semantics

Let P be an FLDL program; we denote the schema of P by sch(P ). For aninterpretation f of P , the fact that f(r(a1, ..., an)) = α, where r(a1, ..., an) isa ground atom, is denoted by a tuple (a1, ..., an, α) in the relation R for thepredicate r. Hence, f can be considered as a database instance over sch(P ). Asin the classical case [4,8], the semantics of P is the least model of P .

3.3 Fuzzy Linguistic Relational Algebra

We extend a monotone subset, consisting of Cartesian product, equijoin, pro-jection and union, of relational algebra [4] for the case of our relations andcreate a new operation called hedge-modification. We call the collection of theseoperations and the classical selection fuzzy linguistic relational algebra (FLRA).

Cartesian Product. Predicates can be combined by conjunctions or disjunc-tions in rule bodies, thus we have two kinds of Cartesian product called conjunc-tion and disjunction Cartesian product. Let R and S be fuzzy relations of arityk1 and k2, respectively. The conjunction (resp. disjunction) Cartesian productof R and S, denoted R×∧ S or ×∧(R, S) (resp. R×∨ S or ×∨(R, S)), is the setof all possible (k1 + k2 − 1)-tuples of which the first k1 − 1 and the next k2 − 1components are from a tuple in R and a tuple in S excluding the truth val-ues, respectively, and the new truth value is ∧•(TVr, TVs) (resp. ∨•(TVr, TVs)),where TVr, TVs are the truth values of the tuples in R and S, respectively.

Equijoin. We also have two kinds of equijoin called conjunction and disjunctionequijoin. The conjunction (resp. disjunction) equijoin of R and S on column iand j, written R ��∧$i=$j S or ��∧$i=$j (R, S) (resp. R ��∨$i=$j S or ��∨$i=$j (R, S)),is those tuples in the conjunction (resp. disjunction) product of R and S suchthat the ith component of R equals the jth component of S.

Hedge-Modification. Let R be the relation for a predicate r. The relationfor formula kr, where k is a hedge, is computed by a hedge-modification of R,denoted Hk(R), as follows: for every tuple in R with a truth value α, there isthe same tuple in Hk(R) except that the truth value is k−(α).

Projection. Given a relation for the body of a rule, a projection is used toobtain the relation for the IDB predicate in the head. Due to semantics of ourrules, the truth value of each tuple in the projected relation is computed usingthe expression C( , ρ), where C is the t-norm corresponding to the implicationused in the rule, ρ is the truth value of the rule, and the first argument of C is the

500 V.H. Le, F. Liu, and H. Lu

truth value of the corresponding tuple in the body relation. More precisely, if R

is a relation of arity k, we let ΠC( ,ρ)i1,i2,...,im

(R), where the ij’s are distinct integersin the range 1 to k−1, denote the projection of R w.r.t. C( , ρ) onto componentsi1, i2, ..., im, i.e., the set of (m + 1)-tuples (a1, ..., am, α) such that there is somek-tuple (b1, ..., bk−1, β) in R for which aj = bij for j = 1, ..., m, and α = C(β, ρ).

Union. For the case there is more than one rule with the same IDB predicate intheir heads, the relation for the predicate is the union of all projected relationsof such rules. The union of relations R1, ..., Rn of the same arity k + 1, denoted⋃n

i=1 Ri, is the set of tuples such that for all tuples (a1, ..., ak, αi) in Ri, there isone and only one tuple (a1, ..., ak, max{αi}) in

⋃ni=1 Ri.

Remark 1. Clearly, it would not be efficient if we store all possible tuples in theEDB relations; instead we want to store only tuples with non-zero truth values,called non-zero tuples, i.e., those have actual meaning, and tuples not appearingin the database are regarded to have a truth value 0. Nevertheless, there aresituations where we have to store all possible tuples in some EDB relations.Our language allows rule bodies to be built using the disjunction. Thus, we canhave a non-zero tuple in the relation for a disjunction formula even if one of thetwo tuples in the relations for its disjuncts is missing. However, by disjunctionCartesian product or equijoin, the absence of a tuple in a relation for one of thedisjuncts will lead to the absence of the tuple in the relation for the formula.Thus, to ensure that all non-zero tuples will not be lost, all relations for thedisjuncts need to consist of all possible tuples that can be formed out of constants(of corresponding sorts of variables) appearing in the program such that tupleswhich are not explicitly associated with a truth value will have a truth value 0.Moreover, in order for the relations for the operands of a disjunction to consistof all possible tuples, relations for all predicates occurring in the operands mustalso consist of all their possible tuples; for the relation of a predicate in the headof a rule to consist of all possible tuples, all relations for predicates in its bodyalso contain all their possible tuples. In summary, we can say that the relation fora predicate involved in a disjunction must contain all possible tuples; conversely,the relation just needs to consist of only non-zero tuples.

3.4 Translation of FLDL Rules into FLRA Expressions

From the proof-theoretic point of view, we would like to compute relations forIDB predicates from relations for EDB predicates using rules. To that end, wefirst translate every rule in an FLDL program into an FLRA expression whichyields a relation for the IDB predicate in the head of the rule. A translationalgorithm to do this is adapted from the algorithm for the classical case in [9]. Thealgorithm requires the following: (a) Function corr(i) returns, for any variablevi in the head of the rule, the index j of the first occurrence of the same variablein the body; (b) Function const(w) returns true if argument w is a constant,and false otherwise; (c) Procedure newvar(x) returns a new variable name in x;(d) Let L be a string, and x, y symbols. Then L < x, y > (resp., L[x, y]) denotes

A Data Model for FLDB with Flexible Querying 501

a new string obtained by replacing the first occurrence (resp., all occurrences)of x in L with y; (e) An artificial predicate eq(x, y) whose relation EQ containsone tuple (c, c, 1) for each constant c appearing in the program. EQ is used toexpress the equality of two components in the crisp sense, i.e., the truth valuesof such tuples are 1.

TRANSLATION ALGORITHMINPUT: A rule r : (H ←i B.ρ), where H is a predicate p(v1, ..., vn), and B is

a formula of predicates q1, ..., qm whose arguments uj are indexed consecutively.During the execution, B denotes whatever follows the symbol ← except ρ.

OUTPUT: An FLRA expression of relations Q1, ..., Qm.BEGIN T (r) END

RULE 1: T (r)BEGIN

IF ∃i : const(vi) THEN /*case (a)*/BEGIN newvar(x); RETURN T (H < vi, x >← ∧(B, eq(x, vi)).ρ) END

ELSEIF ∃i, j : vi = vj , i < j THEN /*case (b)*/BEGIN newvar(x); RETURN T (H < vi, x >← ∧(B, eq(x, vj)).ρ) END

ELSE RETURN ΠCi( ,ρ)corr(1),...,corr(n)T

′(B)END

RULE 2: T ′(B)BEGIN

IF ∃i : const(ui) THEN /*case (a)*/BEGIN newvar(x); RETURN σ$i=ui

T ′(B < ui, x >) ENDELSEIF ∃i, j : ui = uj , i < j THEN /*case (b)*/

BEGIN newvar(x); RETURN σ$i=$jT′(B < ui, x >) END

ELSEBEGIN B[∧,×∧]; B[∨,×∨]; B[h,Hh]; B[eq(...), EQ];

FOR i:= 1 TO m DO B[qi(...), Qi]; RETURN BEND

END

After the application of rule T ′, the selection conditions are replaced by thoseof relations and join conditions as follows: (i) For each condition $i = a, wefind relation Q that contains the component of position i. Let l be the totalnumber of components of all relations preceding Q; we put a selection condition$j = a for Q, where j = i − l. (ii) For each condition $i = $j, we find theinnermost expression, which is either a relation, a product ×c, or a join ��c,where c ∈ {∧,∨}, that contains both components of positions i and j. Let lbe the total number of components of all relations preceding the expression; weput i′ = i − l, j′ = j − l, and: (ii.1) If the expression is a relation Q, we put aselection condition $i′ = $j′ for Q. (ii.2) If it is a product E = ×c(E1, E2) or ajoin E =��c

F (E1, E2), the components of positions i and j are contained in E1

and E2, respectively (otherwise, E is not the innermost expression containingboth components). Let j′′ = j′−l1, where l1 is the total number of components ofall relations in E1; we replace E by a join ��c

i′=j′′ (E1, E2) or ��cF∧i′=j′′ (E1, E2).

502 V.H. Le, F. Liu, and H. Lu

Example 2. Given a rule (p(X, X, Z)← ∨(r(X, Y ),∧(s(Y, a, Z), V q(X, Z))).ρ),by Rule T , case (b), we have (1) in the following. Then, by T , last recursive call,we have (2). After that, by T ′, cases (a) and (b), we have (3). Then, by T ′, lastrecursive call, we have (4). After pushing the selections into relation selectionsand join conditions, we have (5).

T (p(V1, X, Z)← ∧(∨(r(X, Y ),∧(s(Y, a, Z), V q(X, Z))), eq(V1, X)).ρ) (1)

ΠC( ,ρ)8,1,5 T ′(∧(∨(r(X, Y ),∧(s(Y, a, Z), V q(X, Z))), eq(V1, X))) (2)

ΠC( ,ρ)8,1,5 σ$1=$6∧$2=$3∧$4=a∧$5=$7∧$6=$9T

′(∧(∨(r(V2 , V3),∧(s(Y, V4, V5),V q(V6, Z))), eq(V 1, X))) (3)

ΠC( ,ρ)8,1,5 σ$1=$6∧$2=$3∧$4=a∧$5=$7∧$6=$9(×∧(×∨(R,×∧(S,HV (Q))), EQ)) (4)

ΠC( ,ρ)8,1,5 (��∧6=2 (��∨1=4∧2=1 (R, ��∧3=2 (σ$2=aS,HV (Q))), EQ)) (5)

The translated expression of a rule r with an IDB predicate p in its head willbe denoted by E(p, r). Since p can appear in the heads of more than one rule,the relation for p is the union of all such translated expressions, denoted E(p) =⋃

r E(p, r). The collection of all E(p) is denoted by E .

3.5 Equivalence between E and TP

We show the equivalence between E and TP in the sense that for every IDBpredicate p, if there exists a tuple (a1, ..., an, α) in the relation produced byE(p), the truth value of the ground atom p(a1, ..., an) computed by the immediateconsequence operator TP (f) is also α, otherwise, the truth value of p(a1, ..., an)is 0, where f is the interpretation corresponding to current values of relations.

Consider a rule r : (A ← B.ρ) of a program P with predicate p in the head.For each subformula Bj of B, including itself, we denote the subexpression ofE(p, r) corresponding to Bj by E(Bj), and the concatenation of arguments of allpredicates appearing in Bj by A(Bj). Moreover, we denote the part of E(p, r)excluding the projection Π , which is a combination of E(B) and occurrences ofthe relation EQ in the form of conjunction joins or products, by Eb(r).

Example 3. Consider the rule r in Example 2; if B1 = ∧(s(Y, a, Z), V q(X, Z)),E(B1) =��∧3=2 (σ$2=aS,HV (Q)) and A(B1) = (Y, a, Z, X, Z); Eb(r) =��∧6=2

(��∨1=4∧2=1 (R, ��∧3=2 (σ$2=aS,HV (Q))), EQ).

Let Q1, ..., Qm be the relations that have been already computed for all pred-icates q1, ..., qm in B, and f an interpretation such that f(qi(bi

1, ..., biki

)) = βi

if there is a tuple (bi1, ..., b

iki

, βi) ∈ Qi, and f(qi(bi1, ..., b

iki

)) = 0, otherwise. Foreach ground instance (A′ ← B′.ρ) of r, let θ be the ground substitution suchthat A′ = Aθ and B′ = Bθ. Also, let B′

j = Bjθ for all j. It can be seenthat the substitution θ identifies at most one tuple (bj

1, ..., bjnj

, βj), which cor-responds to B′

j , in each E(Bj). More precisely, (bj1, ..., b

jnj

) is obtained fromA(Bj) by replacing each occurrence of a variable by its substitute value in θ.We denote the subset of E(Bj) corresponding to B′

j by E(B′j); thus, E(B′

j) is

A Data Model for FLDB with Flexible Querying 503

either empty or a tuple (bj1, ..., b

jnj

, βj). It can be proved by induction on thestructure of B′

j that for all j, if there exists such a tuple, then βj = f(B′j),

otherwise f(B′j) = 0. In particular, if E(B′) = {(b1, ..., bk, β)}, then β = f(B′);

otherwise, E(B′) is empty and f(B′) = 0. Moreover, if E(B′) = {(b1, ..., bk, β)},since there is only one tuple (c, c, 1) in EQ for each constant c, there is ex-actly one tuple in Eb(r) which is formed by the tuple in E(B′) and occur-rences of EQ via conjunction joins or products (with one component of EQbeing restricted to a constant due to Cases (a) of Rule T and T ′). Because∧•(β, 1) = β, the truth value of the tuple in Eb(r) is also β. By projection, thetruth value of the tuple corresponding to A′ in E(p, r) is C(β, ρ) = C(f(B′), ρ).Otherwise, E(B′) is empty, and there is no tuple for A′ in E(p, r). Finally,if there exist tuples for A′ in E(p, r)’s, the truth value of the tuple for A′ inE(p) is max{C(f(B′), ρ)|(A′ ← B′.ρ) is a ground instance of a rule in P}. Onthe other hand, since an IDB predicate cannot be an EDB predicate simultane-ously, TP (f)(A′) = max{C(f(B′), ρ)|(A′ ← B′.ρ) is a ground instance of a rulein P}. In the case there is no tuple for A′ in E(p), for all B′, E(B′) is empty,and f(B′) = 0; hence, TP (f)(A′) = 0.

3.6 Fixpoint Semantics

Due to the equivalence between E and TP (f), the semantics of the program Pcan be obtained by repeatedly iterating the expressions in E , obtained from therules in P , from a set of the relations for the EDB predicates. More concretely,we can write E(p) as E(p, R1, ..., Rk, P1, ..., Pm), where Ri’s are all EDB relationsincluding EQ, and Pi’s are all IDB relations. The semantics of program P is theleast fixpoint of the equations Pi = E(pi, R1, ..., Rk, P1, ..., Pm), for i = 1 . . .m.Similar to [4], we have a naive evaluation algorithm:

INPUT: A collection of FLRA expressions E obtained from rules by the trans-lation algorithm, and lists of IDB and EDB relations P1, ..., Pm and R1, ..., Rk.

OUTPUT: The least fixpoint of equations Pi = E(pi, R1, ..., Rk, P1, ..., Pm).

BEGINFOR i := 1 TO k DO

IF ri involves a disjunction THENRi := Ri+ a set of other possible tuples with a truth value 0;

FOR i := 1 TO m DOIF pi involves a disjunction THEN

Pi := a set of all possible tuples with a truth value 0ELSE Pi := ∅;

REPEAT FOR i := 1 TO m DO Qi := Pi; /*save old values of Pi*/FOR i := 1 TO m DO Pi := E(pi, R1, ..., Rk, Q1, ..., Qm)

UNTIL Pi = Qi for all i = 1 . . .m;END

504 V.H. Le, F. Liu, and H. Lu

For example, given the program in Example 1, applying the above algorithms,we obtain tuples {(toyota, QT ), (bmw, V LT )} in the relation for predicate good(with Q−(QT ) = T, V −(MT ) = QT, Q−(V T ) = V V T, V −(QT ) = V LT ).

Since the least model of a program P can be obtained by finitely iterating TP

from the bottom interpretation, the following theorem follows immediately.

Theorem 1. Every query over an FLKB represented by an FLDL program canbe exactly computed by finitely iterating the expressions in E, obtained from therules in the program, from a set of relations for the EDB predicates.

4 Discussions and Conclusion

We can utilize some optimization techniques such as incremental evaluation andmagic-sets for evaluation of FLDL as for the classical case [4]. Nevertheless, in-cremental tuples can be used for only relations of the predicates that do notinvolve any disjunction. The procedural semantics of FLLP can be used to findanswers to queries w.r.t. a threshold as discussed in [1]. However, it may notterminate for recursive programs. In this paper, we have presented a data modelfor fuzzy data in which every tuple has a linguistic truth value; the data can bestored in a crisp relational database with an extra truth-value attribute addedto every relation. Queries to the database are made over an FLKB expressed byan FLDL program. We define FLRA as the set of operations to manipulate thedata. Logical rules whose heads have the same IDB predicate can be convertedinto an expression of FLRA which yields a relation for the predicate. The seman-tics of the FLDL program can be obtained by finitely iterating the expressionsfrom the database. Concerning related works, the many-valued logic extensionof Datalog in [7] can also handle fuzzy similarity and enable threshold compu-tation. The probabilistic Datalog in [10] uses a magic sets method as the basicevaluation strategy for modularly stratified programs. The method transformsa probabilistic Datalog program into a set of relational algebra equations, thenthe fixpoint of these equations is computed.

References

1. Le, V.H., Liu, F., Tran, D.K.: Fuzzy linguistic logic programming and its applica-tions. Theory and Practice of Logic Programming 9, 309–341 (2009)

2. Nguyen, C.H., Wechler, W.: Hedge algebras: An algebraic approach to structureof sets of linguistic truth values. Fuzzy Sets and Systems 35, 281–293 (1990)

3. Nguyen, C.H., Wechler, W.: Extended hedge algebras and their application to fuzzylogic. Fuzzy Sets and Systems 52, 259–281 (1992)

4. Ullman, J.D.: Principles of database and knowledge-base systems, vol. I, II. Com-puter Science Press, Inc., New York (1988,1990)

5. Hajek, P.: Metamathematics of Fuzzy Logic. Kluwer, Dordrecht (1998)6. Lloyd, J.W.: Foundations of logic programming. Springer, Berlin (1987)7. Pokorny, J., Vojtas, P.: A data model for flexible querying. In: Caplinskas, A.,

Eder, J. (eds.) ADBIS 2001. LNCS, vol. 2151, pp. 280–293. Springer, Heidelberg(2001)

A Data Model for FLDB with Flexible Querying 505

8. Abiteboul, S., Hull, R., Vianu, V.: Foundations of databases. Addison-Wesley, USA(1995)

9. Ceri, S., Gottlob, G., Tanca, L.: Logic programming and databases. Springer, Berlin(1990)

10. Fuhr, N.: Probabilistic datalog: implementing logical information retrieval for ad-vanced applications. J. Am. Soc. Inf. Sci. 51(2), 95–110 (2000)