[ieee 2010 international conference on artificial intelligence and computational intelligence (aici)...

A Distributed Concept Lattices Vertical Union Method

Feng Ma, Jiankun Yu, Zhiyong Zeng, Ye Tao, Tao Feng School of Computer and Information

Yunnan University of Finance and Economics Kunming , China

E-MAIL:[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract- The concept lattice, the core data structure of formal concept analysis, has high time complexity to construct, which has hindered its further application in data mining. This paper develops a union method that first vertically divides the formal context among distributed stations, constructs the concept sub-lattices independently, and then unions them together. The validity and completeness of the method are proved in theory and illustrated by experiment. Compared with the traditional algorithm, which constructs the concept lattice directly from a single formal context, the method markedly improves time performance.

Keywords- formal context; data mining; concept lattice

I. INTRODUCTION

Since the German professor Wille proposed formal concept analysis in 1982[1], the concept lattice has attracted a lot of attention as the core data structure of formal concept analysis, and has been widely used in knowledge discovery[2][3], software engineering[4]-[6] and many other fields. However, because a concept lattice is complete, constructing it often has high time complexity. Resolving this problem has been one of the important research areas for the further application of concept lattices in data mining. Many researchers have proposed their own approaches, such as [7][8][9], and obtained satisfactory results on a single station. But when the data is larger than one station can handle, the performance of these methods may not be as good as expected.

With the development of Internet technology, distributed data mining has emerged as one of the important solutions to this kind of problem[10]. In a distributed data mining system, data partition and union are two important steps[11]. A good data partition method helps improve the quality of the mining results, and a reasonable union technique simplifies the mining process. In this paper, an incremental concept lattice construction method[7] is used to build concept sub-lattices from the partitioned data sets independently. A reasonable union method is then needed to combine all these concept sub-lattices, and the final result must be valid for the whole data. The union method is therefore the key point of this research.

This paper is organized as follows: Section 2 gives some basic concepts of concept lattices, Section 3 discusses the Distributed Concept Lattice Vertical Union Method, Section 4 presents the experimental results and their analysis, and Section 5 gives the conclusion.

II. BASIC CONCEPT OF CONCEPT LATTICE

For a given data set, the formal context is defined in formal concept analysis as a triple K=(G,M,I)[1], where G is the set of objects, M is the set of attributes, and I is the binary relation between G and M. For an object g∈G and an attribute m∈M, gIm means that object g has attribute m.

Definition 2.1: A pair (A, B) derived from a formal context is called a basic concept (or simply a concept) if A⊆G, B⊆M, A=g(B) and B=f(A), where f(A)={ m∈M | ∀g∈A, gIm } is the set of attributes shared by all objects in A, and g(B)={ g∈G | ∀m∈B, gIm } is the set of objects that have all attributes in B. A is called the extension of the concept, and B its intension.
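As an illustration of Definition 2.1, the following minimal Python sketch implements the two derivation operators and a concept test on the small context that the paper later uses in Table I. The dictionary layout and the names `CONTEXT`, `ATTRIBUTES` and `is_concept` are assumptions made for this example (`f` and `g` simply mirror the definition); they are not part of the original method.

```python
# Formal context of Table I, written as object -> set of attributes (assumed layout).
CONTEXT = {
    1: {"a", "b", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b", "d"},
    5: {"a"},
}
ATTRIBUTES = {"a", "b", "c", "d"}


def f(objects, context=CONTEXT, attributes=ATTRIBUTES):
    """Attributes shared by every object in `objects` (all attributes if it is empty)."""
    shared = set(attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def g(attrs, context=CONTEXT):
    """Objects that have every attribute in `attrs` (all objects if it is empty)."""
    wanted = set(attrs)
    return {obj for obj, own in context.items() if wanted <= own}


def is_concept(extension, intension):
    """(A, B) is a concept exactly when A = g(B) and B = f(A)."""
    return g(intension) == set(extension) and f(extension) == set(intension)


print(is_concept({1, 4}, {"a", "b", "d"}))  # a concept of the whole context
print(is_concept({1}, {"a", "b", "d"}))     # not a concept: object 4 shares the same attributes
```

The pair ({1}, {a,b,d}) fails the test on the whole context although it is a concept of the sub-context that contains only objects 1 and 2, which is why sub-lattices have to be merged rather than simply collected.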

Definition 2.2: For concepts C1=(A1, B1) and C2=(A2, B2), define C1≤C2 if and only if A1⊆A2. C1 is then called a sub-concept of C2, and C2 a super-concept of C1. If C1≤C2, C1≠C2, and there is no concept C other than C1 and C2 such that C1≤C≤C2, the relation between C1 and C2 is called the immediate sub-super concept relation, denoted as C1<C2.

Definition 2.3: The lattice induced by all the concepts of a formal context K and their sub-super partial order is called the concept lattice, denoted as L(K).
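The order of Definition 2.2 and the immediate sub-super pairs that form the edges of the lattice diagram can be sketched as follows. The example uses the four concepts of L(K1) for objects 1 and 2 of Table I; the helper names `fs`, `is_subconcept` and `immediate_pairs` are hypothetical, and the quadratic search is chosen for clarity rather than speed.

```python
def fs(*items):
    """Shorthand for building a frozenset."""
    return frozenset(items)


# Concepts of L(K1), each written as an (extension, intension) pair.
TOP = (fs(1, 2), fs("a"))
C12 = (fs(1), fs("a", "b", "d"))
C13 = (fs(2), fs("a", "c"))
BOTTOM = (fs(), fs("a", "b", "c", "d"))
L_K1 = [TOP, C12, C13, BOTTOM]


def is_subconcept(c1, c2):
    """C1 <= C2 if and only if the extension of C1 is contained in that of C2."""
    return c1[0] <= c2[0]


def immediate_pairs(concepts):
    """All (sub, super) pairs that have no third concept strictly in between."""
    pairs = []
    for low in concepts:
        for high in concepts:
            if low[0] < high[0]:  # strictly smaller extension
                if not any(low[0] < mid[0] < high[0] for mid in concepts):
                    pairs.append((low, high))
    return pairs


for low, high in immediate_pairs(L_K1):
    print(sorted(low[0]), "->", sorted(high[0]))
```

The pair (bottom, top) is not reported because #12 and #13 lie strictly between them.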

Definition 2.4: If the formal context K is divided into several smaller parts K1, K2, …, Kn with K1∪K2∪…∪Kn=K, the concept lattice induced from Ki, i=1,2,…,n, is called a concept sub-lattice.

III. DISTRIBUTED CONCEPT LATTICE VERTICAL UNION METHOD

As mentioned before, data partition is one of the important steps of distributed data mining, and the way the formal context is divided largely determines the efficiency and effectiveness of the union method. Since a formal context can be partitioned either horizontally or vertically, the union of concept sub-lattices can likewise be divided into horizontal and vertical union. This paper discusses the vertical union of concept sub-lattices: the formal context is first divided into several sets with the same attribute field but different object fields, the concept sub-lattices are then constructed independently with the incremental concept formation algorithm of [7], and finally DCLVUM, the Distributed Concept Lattice Vertical Union Method, is used to union them together.
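The partition step described above can be sketched in Python as below. The function name `split_by_objects` and the choice of contiguous, near-equal blocks of objects are assumptions for illustration only; the paper does not prescribe how objects are assigned to stations, only that every part keeps the full attribute field.

```python
def split_by_objects(context, p):
    """Split an object -> attribute-set mapping into p parts, object by object.

    Every part keeps the same attribute field; only the objects differ.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    objects = sorted(context)
    base, extra = divmod(len(objects), p)
    parts, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more object
        chunk = objects[start:start + size]
        parts.append({obj: set(context[obj]) for obj in chunk})
        start += size
    return parts


context = {
    1: {"a", "b", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b", "d"},
    5: {"a"},
}
k1, k2 = split_by_objects(context, 2)
print(sorted(k1), sorted(k2))
```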

This paper is sponsored by: "Research on Vague sets similarity measure and clustering algorithm", No.2009CD076; "Parallel data mining model research in decision support system" , No.2007G079M; "Financial data model research under distributed data mining", No.YC10A003

2010 International Conference on Artificial Intelligence and Computational Intelligence

978-0-7695-4225-6/10 $26.00 © 2010 IEEE

DOI 10.1109/AICI.2010.336


A. Description of DCLVUM

After the formal context has been partitioned and the concept sub-lattices constructed, the key point is the union method. For convenience, we give some definitions before describing the union method in detail.

Definition 3.1.1: For a concept C=(A,B), the number of attributes in B is called its intension quantity, denoted as |B|.

Definition 3.1.2: During the union of L(K1) with L(Ki), if C1=(A1,B1)∈L(K1), Ci=(Ai,Bi)∈L(Ki), and B1 ⊆ Bi, then C1 is changed into C1’=(A1 ∪ Ai, B1), and C1’ is called the updated concept of Ci.

Definition 3.1.3: During the union of L(K1) with L(Ki), if C1=(A1,B1)∈L(K1), Ci=(Ai,Bi)∈L(Ki), and B1 ⊄ Bi, Bi ⊄ B1, B1 ≠ Bi, then a new concept C12’=(A1 ∪ Ai, B1 ∩ Bi) is added to L(K1), and it is called the new added concept of Ci.
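The two rules of Definitions 3.1.2 and 3.1.3 can be sketched for a single pair of concepts as follows. The function name `combine` and its return convention are hypothetical. As an assumption, every case in which B1 ⊆ Bi fails is treated as producing a new concept, which follows the Else branch of the CLVUA pseudocode given in this section rather than the narrower wording of Definition 3.1.3.

```python
def fs(*items):
    """Shorthand for building a frozenset."""
    return frozenset(items)


def combine(c1, ci):
    """Combine C1 from L(K1) with Ci from L(Ki).

    Returns ("updated", concept) when B1 is a subset of Bi (Definition 3.1.2),
    otherwise ("new", concept) with the intersected intension (Definition 3.1.3).
    """
    a1, b1 = c1
    ai, bi = ci
    if b1 <= bi:
        return "updated", (a1 | ai, b1)
    return "new", (a1 | ai, b1 & bi)


c11 = (fs(1, 2), fs("a"))   # concept #11 of L(K1) in the paper's illustration
c22 = (fs(3, 4), fs("b"))   # concept #22 of L(K2)
c23 = (fs(4, 5), fs("a"))   # concept #23 of L(K2)

print(combine(c11, c22))  # new concept with an empty intension
print(combine(c11, c23))  # #11 is updated with the objects of #23
```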

Assume that K is the whole formal context and has been divided among p stations, giving K1,K2,…,Kp with K=K1 ∪ K2 ∪ … ∪ Kp. L(Ki) is the concept sub-lattice built from Ki, Cij stands for the j-th concept of L(Ki), and |L(Ki)| denotes the number of concepts in L(Ki).

The DCLVUM is described as follows:

DCLVUM (Distributed Concept Lattice Vertical Union Method)
Input: L(K1), L(K2), …, L(Kp)
Output: L(K1) ∪ L(K2) ∪ … ∪ L(Kp)
Begin
    For (i=2; i<=p; i++)
        Order all the concepts in L(Ki) by their intension quantity
        For (j=1; j<=|L(Ki)|; j++)
            Take concept Cij, and use CLVUA to union Cij with L(K1)
        EndFor
    EndFor
End

The CLVUA, Concept Lattice Vertical Union Algorithm, is an algorithm that unions a single concept into an existing concept lattice. Its details are as follows.

CLVUA (Concept Lattice Vertical Union Algorithm)
Input: L(K1), Cij=(Aij,Bij)
Output: updated concept sub-lattice L’(K1)
Begin

    Order all the concepts in L(K1) by their intension quantity
    For (m=1; m<=|L(K1)|; m++)
        // UorNAflag[C1m]=true means that the concept C1m=(A1m,B1m) is an updated or new added concept
        If UorNAflag[C1m]=true then
            Continue
        Endif
        If B1m ⊆ Bij then
            Update C1m to C1m’=(A1m ∪ Aij, B1m)
            Set UorNAflag[C1m]=true;
        Else
            Generate new concept Cmj=(A1m ∪ Aij, B1m ∩ Bij)
            If exist C1’=(A1’,B1’)∈L(K1), B1’=B1m ∩ Bij then
                Update C1’ to C1’’=(A1’ ∪ A1m ∪ Aij, B1’)
                Set UorNAflag[C1’]=true;
            Else
                Add Cmj into L(K1);
                Set UorNAflag[Cmj]=true;
                Add new edge Cmj->C1m;
                Add Cij into L(K1);
                Add new edge Cmj->Cij;
                Set UorNAflag[Cij]=true;
                If exist C1nf=(A1nf,B1nf)∈L(K1), and B1nf ⊆ B1m ∩ Bij then
                    Add new edge C1nf->Cmj;
                Endif
                If exist C1if=(A1if,B1if)∈L(K1), and B1if ⊆ Bij then
                    Add new edge C1if->Cij;
                Endif
                If exist C1ic=(A1ic,B1ic), and Bij ⊆ B1ic then
                    Add new edge Cij->C1ic;
                Endif
            Endif
        Endif
    Endfor
End

Proposition 3.1.1: When the concepts in L(K1) and L(Ki) are both ordered by their intension quantity, let C1∈L(K1), Ci∈L(Ki), Cj∈L(Ki), and |Bi|≤|Bj|. If C1 is the updated or new added concept of Ci, then Cj does not have to be compared with C1 during the union of L(K1) with L(Ki).

Proof: Assume that C1m∈L(K1), Ci∈L(Ki), and that C1 is the result of combining C1m and Ci. If C1 is the new added concept, then C1=(A1m ∪ Ai, B1m ∩ Bi). If C1 is the updated concept of Ci, then B1m ⊆ Bi and C1=(A1m ∪ Ai, B1m)=(A1m ∪ Ai, B1m ∩ Bi). So we take C1=(A1m ∪ Ai, B1m ∩ Bi) to cover both the new added and the updated case. Since |Bi|≤|Bj|, if Cj≤Ci, then comparing Cj with C1 would generate a new concept C1j=(A1m ∪ Ai ∪ Aj, B1m ∩ Bi ∩ Bj), but (A1m ∪ Ai ∪ Aj, B1m ∩ Bi ∩ Bj)=(A1m ∪ Ai, B1m ∩ Bi)=C1. If Cj≤Ci does not hold, let Bij=Bi ∩ Bj. According to the definition of a concept, a concept Cij=(Ai ∪ Aj, Bij) must exist in L(Ki), with |Bij|≤|Bi|. Comparing Cj with C1 would generate a new concept C1j=(A1m ∪ Ai ∪ Aj, B1m ∩ Bi ∩ Bj). However, since |Bij|≤|Bi|, the union of Cij with L(K1) has already generated the concept C1ij=(A1m ∪ Ai ∪ Aj, B1m ∩ Bi ∩ Bj)=C1j. So if C1 is the updated or new added concept of Ci and |Bi|≤|Bj|, then Cj does not have to be compared with C1.
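To make the effect of the union concrete, here is a deliberately simplified Python sketch. It is not the authors' CLVUA: it applies the pairwise rule (A1 ∪ Ai, B1 ∩ Bi) to every pair of concepts and merges results that share an intension, and it omits the intension-quantity ordering, the UorNAflag skipping justified by Proposition 3.1.1, and all edge maintenance. The name `vertical_union` and the set-of-pairs representation are assumptions for this example; the two inputs are the sub-lattices L(K1) and L(K2) of the paper's illustration.

```python
def fs(*items):
    """Shorthand for building a frozenset."""
    return frozenset(items)


def vertical_union(lattice_1, lattice_i):
    """Merge two sets of (extension, intension) concepts built on disjoint objects."""
    merged = {}  # intension -> accumulated extension
    for a1, b1 in lattice_1:
        for ai, bi in lattice_i:
            intension = b1 & bi
            merged[intension] = merged.get(intension, frozenset()) | a1 | ai
    return {(extension, intension) for intension, extension in merged.items()}


ALL = fs("a", "b", "c", "d")
L_K1 = {(fs(1, 2), fs("a")), (fs(1), fs("a", "b", "d")), (fs(2), fs("a", "c")), (fs(), ALL)}
L_K2 = {
    (fs(3, 4, 5), fs()), (fs(3, 4), fs("b")), (fs(4, 5), fs("a")),
    (fs(3), fs("b", "c")), (fs(4), fs("a", "b", "d")), (fs(), ALL),
}

for extension, intension in sorted(vertical_union(L_K1, L_K2), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extension), sorted(intension))
```

On this example the sketch yields eight concepts. It compares every pair, so it performs more comparisons than the ordered, flag-based procedure described in the text.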

B. Validity and Completeness of DCLVUM

In model union, an important criterion is whether the final model is valid and complete, that is, whether the final model fits the original whole data and whether it represents all the information in the original whole data. Take two stations as an example: the whole formal context K is divided into K1 and K2, K=K1 ∪ K2, and L(K1), L(K2) are the concept sub-lattices built from K1 and K2. If we can prove L(K1) ∪ L(K2) ⇔ L(K), then the final model of DCLVUM is valid and complete.

Assume that O(K1), O(K2) and O(K) respectively denote all the objects in K1, K2 and K, and that A(K1), A(K2) and A(K) respectively denote all the attributes in K1, K2 and K.

Proposition 3.2.1: ∀ C ∈ L(K1) ∪ L(K2) ， then C∈L(K).

Proof: As O(K1) ∪ O(K2)=O(K) and A(K1) ∪ A(K2)=A(K), for every C=(A, B) ∈ L(K1) ∪ L(K2) we have A ⊆ O(K), B ⊆ A(K), A=g(B) and B=f(A). According to Definition 2.1, C∈L(K).

Proposition 3.2.2: ∀ C’ ∈ L(K)， then C’ ∈ L(K1)∪L(K2).

Proof: For every C’=(A’, B’)∈L(K), let A’=A1 ∪ A2 with A1 ⊆ O(K1) and A2 ⊆ O(K2). If A1 ≠ φ and A2=φ, then C’ ∈ L(K1), so according to DCLVUM, C’ ∈ L(K1) ∪ L(K2). If A1=φ and A2 ≠ φ, then C’ ∈ L(K2). When C’ is unioned with L(K1), since C’∈L(K), there is no concept C1i=(A1i, B1i)∈L(K1) such that B’ ⊆ B1i. According to DCLVUM, C’ will be added into L(K1), so C’ ∈ L(K1) ∪ L(K2). If A1 ≠ φ and A2 ≠ φ, let C1’=(A1, B1’) ∈ L(K1) and C2’=(A2, B2’) ∈ L(K2). During the union, the concept C12=(A1 ∪ A2, B1’ ∩ B2’)=(A’, B1’ ∩ B2’) is generated when C2’ is compared with C1’. If B1’ ∩ B2’ ≠ B’, there would be two concepts with the same extension but different intensions. This conflicts with Definition 2.1, so it is impossible. In other words, B1’ ∩ B2’=B’, so C12=(A’, B’)=C’ ∈ L(K1) ∪ L(K2).

So the concepts in L(K1) ∪ L(K2) are exactly the same as those in L(K). According to Definition 2.3, we can infer that if two concept lattices have the same concepts, they must have the same structure. That is to say, L(K1) ∪ L(K2) ⇔ L(K).

C. Illustration of DCLVUM

We now take the formal context in Table I as K to describe the process of DCLVUM and to illustrate its validity and completeness. To make it easy to follow, we take the first and second records as K1 and the remaining records as K2. L(K1) and L(K2) are shown in Figure 1 and Figure 2.

TABLE I. FORMAL CONTEXT K

Obj \ Attr    A    B    C    D
1             1    1    0    1
2             1    0    1    0
3             0    1    1    0
4             1    1    0    1
5             1    0    0    0
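For reference, the complete lattice L(K) of Table I can be computed directly with a short Python sketch. This is a basic intersection-closure procedure, not Godin's incremental algorithm from [7]; the function name `concept_lattice` and the data layout are assumptions for illustration. Its output can be checked against the final lattice of the illustration.

```python
def concept_lattice(context, attributes):
    """Return all concepts of a context as (extension, intension) frozenset pairs."""
    # Every intension is an intersection of object attribute sets; the full
    # attribute set is the intension of the empty extension.
    intensions = {frozenset(attributes)}
    for own in context.values():
        intensions |= {frozenset(own) & b for b in intensions}
    concepts = set()
    for b in intensions:
        extension = frozenset(obj for obj, own in context.items() if b <= own)
        concepts.add((extension, b))
    return concepts


TABLE_I = {
    1: {"a", "b", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b", "d"},
    5: {"a"},
}

lattice = concept_lattice(TABLE_I, {"a", "b", "c", "d"})
for extension, intension in sorted(lattice, key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extension), sorted(intension))
```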

Figure 1. Concept sub-lattice L(K1). Concepts: #11 ({1,2}, {a}); #12 ({1}, {a,b,d}); #13 ({2}, {a,c}); #14 (φ, {a,b,c,d}).

Figure 2. Concept sub-lattice L(K2). Concepts: #21 ({3,4,5}, φ); #22 ({3,4}, {b}); #23 ({4,5}, {a}); #24 ({3}, {b,c}); #25 ({4}, {a,b,d}); #26 (φ, {a,b,c,d}).

Here, B’(#21) stands for the attribute set of concept #21, and A’(#21) for its object set. We first order the concepts in L(K2) by their intension quantity, then take the first concept of L(K2), #21, and compare it with the concepts in L(K1) one by one. According to DCLVUM, when #21 is compared with #11, since B’(#11) ⊆ B’(#21) does not hold, a new concept (A’(#21) ∪ A’(#11), B’(#21) ∩ B’(#11))=({1,2,3,4,5}, φ) is generated and put into L(K1); we call it #11NA and treat it as the immediate super-concept of #11. As B’(#21)=B’(#11NA), #21 does not need to be added to L(K1). L(K1) is thus updated as shown in Figure 3.

Figure 3. The result of #21 union with L(K1). Concepts: #11NA ({1,2,3,4,5}, φ); #11 ({1,2}, {a}); #12 ({1}, {a,b,d}); #13 ({2}, {a,c}); #14 (φ, {a,b,c,d}).

Then we take the second concept of L(K2), #22, and union it with L(K1). As #11NA is the new added concept of #21, #22 does not need to be compared with it. Comparing #22 with #11 gives (A’(#22) ∪ A’(#11), B’(#22) ∩ B’(#11))=({1,2,3,4}, φ)=#12NA. As B’(#12NA)=B’(#11NA), according to DCLVUM, #11NA needs to be updated to (A’(#12NA) ∪ A’(#11NA), B’(#12NA)). Since A’(#12NA) ⊆ A’(#11NA), #11NA remains the same after the update. Comparing #22 with #12 generates a new concept (A’(#22) ∪ A’(#12), B’(#22) ∩ B’(#12))=({1,3,4}, b)=#22NA. Since there is no concept C ∈ L(K1) with B’(C)=B’(#22NA), #22NA is added into L(K1) as the immediate super-concept of #12. As B’(#11NA) ⊆ B’(#22NA), the immediate super-concept of #22NA is #11NA. For #22, as B’(#22)=B’(#22NA), #22 does not have to be added to L(K1). After #22 has been compared with the rest of the concepts in L(K1), L(K1) is as shown in Figure 4. When all the concepts of L(K2) have been unioned with L(K1), the final L(K1) is as shown in Figure 5.

Figure 4. The result of #22 union with L(K1). Concepts: #11NA ({1,2,3,4,5}, φ); #11 ({1,2}, {a}); #22NA ({1,3,4}, {b}); #12 ({1}, {a,b,d}); #13 ({2}, {a,c}); #14 (φ, {a,b,c,d}).

Figure 5. The result of L(K2) union with L(K1). Concepts: #11NA ({1,2,3,4,5}, φ); #11U ({1,2,4,5}, {a}); #22NA ({1,3,4}, {b}); #34NA ({2,3}, {c}); #12U ({1,4}, {a,b,d}); #13 ({2}, {a,c}); #24 ({3}, {b,c}); #14 (φ, {a,b,c,d}).

To compare L(K1) ∪ L(K2) with L(K), we construct L(K) and show it in Figure 6. It is easy to see that L(K1) ∪ L(K2) ⇔ L(K).

Figure 6. Concept lattice L(K). Concepts: #1 ({1,2,3,4,5}, φ); #2 ({1,3,4}, {b}); #3 ({1,2,4,5}, {a}); #4 ({2,3}, {c}); #5 ({1,4}, {a,b,d}); #6 ({2}, {a,c}); #7 ({3}, {b,c}); #8 (φ, {a,b,c,d}).

D. Time performance of DCLVUM

Now let us look at the time performance of DCLVUM. Assume that K has n records, divided equally among p stations, so that every station has n/p records. For every station, the maximum number of concepts in its concept sub-lattice is $2^{n/p}$. As mentioned before, when L(K2) is unioned with L(K1), the concepts of L(K2) do not have to be compared with the updated or new added concepts, so the maximum number of comparisons for L(K1) ∪ L(K2) is $2^{n/p} \cdot 2^{n/p}$. When L(K3) is unioned with (L(K1) ∪ L(K2))=L’(K1), L(K1) ∪ L(K2) is equivalent to the concept lattice of K1 ∪ K2, which has 2n/p records. So the maximum number of concepts in L(K1) ∪ L(K2) is $2^{2n/p}$, and the maximum number of comparisons for the union of L(K3) with (L(K1) ∪ L(K2)) is $2^{2n/p} \cdot 2^{n/p}$. As a result, the number of comparisons over all p stations would be

$$2^{n/p} \cdot 2^{n/p} + 2^{2n/p} \cdot 2^{n/p} + \cdots + 2^{(p-1)n/p} \cdot 2^{n/p} = 2^{2n/p}\,(2^{p} - 2).$$

So we can approximately say that the time performance of DCLVUM is $O(2^{(2n/p)+p})$.

For a traditional concept lattice construction method, take the Godin algorithm as an example. If a concept lattice L(M) is built upon a formal context of m records, its maximum number of concepts is also $2^{m}$. When a new record is to be unioned with it, the record has to be compared with every concept in L(M), that is, $2^{m}$ times. The new record may also generate new concepts during the union, as many as $2^{m}$ in the worst case. Furthermore, these new concepts need to be compared with the existing concepts in L(M) to check the sub-super relationship, which takes another

$$2^{m} + (2^{m} + 1) + \cdots + (2^{m} + 2^{m} - 1) = 2^{m} \cdot 2^{m} + \frac{1 + (2^{m} - 1)}{2} \cdot (2^{m} - 1)$$

comparisons. As a result, constructing a concept lattice from a formal context of n records with the traditional method needs

$$\sum_{m=1}^{n} \left[ 2^{m} + 2^{m} \cdot 2^{m} + \frac{1 + (2^{m} - 1)}{2} \cdot (2^{m} - 1) \right] = 2^{2n+1} + 2^{n} - 3$$

comparisons. So the time performance of the traditional method can be approximately regarded as $O(2^{2n})$. Compared with the traditional method, the acceleration of DCLVUM would be

$$T = \frac{O(2^{2n})}{O(2^{(2n/p)+p})}.$$
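The worst-case counts in this subsection can be evaluated numerically with the small Python sketch below. It takes the expressions as stated in the text: the per-record sum for the traditional method with its closed form 2^(2n+1) + 2^n - 3, and the quoted DCLVUM figure 2^(2n/p)(2^p - 2). The function names are hypothetical, and `dclvum_bound` assumes that p divides n. These are worst-case counts of comparisons, not measured running times.

```python
import math


def traditional_comparisons(n):
    """Sum over m = 1..n of the per-record worst-case comparison count from the text."""
    total = 0
    for m in range(1, n + 1):
        c = 2 ** m  # worst-case number of concepts for m records
        total += c + c * c + (1 + (c - 1)) * (c - 1) // 2
    return total


def traditional_closed_form(n):
    """Closed form quoted in the text: 2^(2n+1) + 2^n - 3."""
    return 2 ** (2 * n + 1) + 2 ** n - 3


def dclvum_bound(n, p):
    """Worst-case union count quoted in the text: 2^(2n/p) * (2^p - 2); assumes p divides n."""
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    return 2 ** (2 * n // p) * (2 ** p - 2)


def speedup_log2(n, p):
    """Base-2 logarithm of the ratio between the two worst-case counts."""
    return math.log2(traditional_closed_form(n)) - math.log2(dclvum_bound(n, p))


print(traditional_comparisons(5) == traditional_closed_form(5))
print(dclvum_bound(12, 3), round(speedup_log2(12, 3), 2))
```

Working with the logarithm of the ratio keeps the numbers printable even for the record counts used in the experiments, since Python integers do not overflow and `math.log2` accepts arbitrarily large integers.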

IV. EXPERIMENTATION AND ANALYSIS

A. Experimentation

The experimental environment consists of a 100M Ethernet and 6 computers, each with a P4 2.1G CPU and 512M memory. The operating system is Ubuntu Linux, and DCLVUM was implemented in C with MPI. Ten groups of formal contexts were used in the experiment. The number of attributes in each formal context was restricted to 80, and the objects (records) were generated randomly. The number of objects started from 400, grew by 400 each time, and stopped at 4000.

The time cost of the traditional algorithm (Godin algorithm) on every formal context was also recorded for the convenience of analysis. We divided each formal context among 2 stations, used DCLVUM to union the concept sub-lattices, and recorded the time cost as DCLVUM_2. In the same way, we recorded the results for 3 to 6 stations as DCLVUM_3 to DCLVUM_6. The experimental results are shown in Figure 7.

Figure 7. The time performance of DCLVUM

B. Analysis

From Figure 7, we can see that the time performance of the traditional algorithm is very sensitive to the number of records. Compared with it, DCLVUM has better time performance on the same formal context, and the more stations, the better.

But once the number of stations reaches a certain level, the improvement grows slowly: DCLVUM_6 is just a little better than DCLVUM_5. That is because the running time of DCLVUM includes two parts: constructing the concept sub-lattices on the distributed stations, and unioning all the concept sub-lattices together. The former part saves time because each station's formal context is smaller than the whole one. But as more and more stations are introduced, the latter part costs more and more time. If the latter part grows faster than the former part shrinks, the overall improvement of DCLVUM grows slowly. However, it is still better than the traditional method.

V. CONCLUSION

This paper proposed a Distributed Concept Lattice Vertical Union Method, which can effectively union the concept sub-lattices built from the formal contexts on distributed stations, where the formal context on each station has the same attribute field but a different object field. The validity and completeness of the method have been proved in theory, and it has been implemented in a real distributed environment. The experimental results show a satisfactory improvement in time performance over the traditional concept lattice construction method.

REFERENCES

[1] R. Wille. Restructuring lattice theory: An approach based on hierarchies of concepts[A]. In: I. Rival (ed.), Ordered Sets[C]. Dordrecht: Reidel, 1982: 445-470.

[2] Hu Xuegang. Model research of knowledge discovery in database[D]. Hefei: Hefei University of Technology, Information School, 2000.

[3] Xie Zhipeng. Research of knowledge discovery based on concept lattice model[D]. Hefei: Hefei University of Technology, Information School, 2001.

[4] Xie Zhipeng, Liu Zongtian. Concept lattice and association rule discovery[J]. Journal of Computer Research & Development, 2000, 37(12): 1415-1421.

[5] Xie Zhipeng, Liu Zongtian. Research on classifier based on lattice structure[A]. Proc. Conference on Intelligent Information Processing, 16th World Computer Congress, Beijing, China[C]. 2000: 333-338.

[6] Godin R, Mili H, Mineau G W. Design of class hierarchies based on concept (Galois) lattices[J]. Theory and Application of Object Systems, 1998, 4(2): 117-134.

[7] Godin R, Missaoui R, Alaoui H. Incremental concept formation algorithms based on Galois (concept) lattices[J]. Computational Intelligence, 1995, 11(2): 246-267.

[8] Li Yun, Liu Zongtian. Attribute-based incremental formation algorithm of concept lattice[J]. Mini-Micro System, 2004, 25(10): 1768-1771.

[9] Xie Zhipeng, Liu Zongtian. A fast incremental algorithm for building concept lattice[J]. Chinese Journal of Computers, 2002, 25(5): 490-496.

[10] Vicent Dho, Beat Wuthrich. Distributed Mining of Classification Rules. Knowledge and Information, pp. 1-30, Apr. 2002.

[11] Ali Mesbah. Data Mining and Parallel/Distribute Processing. Parallel computation, Feb. 2002.

[12] Li Yun, Liu Zongtian. The horizontal union algorithm of multiple concept lattice. Acta Electronica Sinica, Vol. 32, No. 11, pp. 1849-1854, Nov. 2004.

[Figure 7 plot: vertical axis "Time (second)", 0 to 4500; horizontal axis ticks 400 to 4000 in steps of 400; series: Godin, DCLVUM_2, DCLVUM_3, DCLVUM_4, DCLVUM_5, DCLVUM_6.]
