
Page 1: Distributed Query Processing using different Semijoin operations


Distributed Query Processing using different Semijoin operations.

Presented By: Jamal Uddin Ahamed

Friday, March 12, 2004


Presentation Outline:

1. Overview.

2. Semijoin operation.

3. Different semijoin operations.

a. 2-way semijoin.

b. Hash semijoin.

c. Domain-specific semijoin.

d. Composite semijoin.

4. References.

5. Questions and answers.


1.1 What is a distributed database system?

A distributed database system is characterized by the distribution of its system components: hardware, control and data. For this research, a distributed system is a collection of independent computers interconnected via point-to-point communication lines.


1.2 Node characteristics:

Each computer, known as a node in the network, has a processing capability and a data storage capability, and is capable of operating autonomously in the system.

Each node contains a version of a distributed DBMS.


1.3 What is distributed query processing?

The retrieval of data from different sites in a network is known as distributed query processing.


1.4 Phases of distributed query processing with a semijoin operator.

1. Initial local processing (selections and projections are processed at each site).

2. Semijoin processing (a semijoin program is derived from the remaining join operations and executed to reduce the size of the relations in a cost-effective way).

3. Final processing (all relations involved are transmitted to the final site and all joins are performed there).
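The three phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as lists of dicts, the helper names (`local_processing`, `semijoin`, `final_join`) and the selection predicate are hypothetical, and the sample values are illustrative.

```python
# Relations are modelled as lists of dicts (an assumption made for this sketch).

def local_processing(relation, predicate, attrs):
    """Phase 1: apply a selection and a projection at the relation's own site."""
    return [{a: row[a] for a in attrs} for row in relation if predicate(row)]

def semijoin(r_j, r_i, attr):
    """Phase 2: reduce r_j using the projection of r_i on attr."""
    shipped = {row[attr] for row in r_i}  # values that would be sent over the network
    return [row for row in r_j if row[attr] in shipped]

def final_join(r_i, r_j, attr):
    """Phase 3: join the (reduced) relations at the final site."""
    return [{**x, **y} for x in r_i for y in r_j if x[attr] == y[attr]]

# Illustrative data: R1 lives at site 1, R2 at site 2.
r1_site1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2_site2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1 = local_processing(r1_site1, lambda row: row["B"] >= 5, ["A", "B"])
r2 = local_processing(r2_site2, lambda row: True, ["A", "C"])
r2_reduced = semijoin(r2, r1, "A")
result = final_join(r1, r2_reduced, "A")
print(result)
```

Only the reduced `r2_reduced` would have to travel to the final site in phase 3, which is the point of running the semijoin program first.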


2.1 Semijoin:

A semijoin from Ri to Rj on attribute A is denoted Rj ⋉ Ri. It is used to reduce the data transmission cost.

Computing steps:

1) Project Ri on attribute A (Ri[A]) and ship this projection (a semijoin projection) from the site of Ri to the site of Rj;

2) Reduce Rj to Rj’ by eliminating the tuples whose value of attribute A does not match any value in Ri[A].
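The two computing steps map directly onto two small functions. The sketch below is a minimal illustration, assuming relations are lists of dicts; the function names are hypothetical, and the sample relations reuse the values of the example in 2.2.

```python
def semijoin_projection(r_i, attr):
    """Step 1: project Ri on attr; this is what gets shipped to the site of Rj."""
    return sorted({row[attr] for row in r_i})

def reduce_relation(r_j, attr, projection):
    """Step 2: keep only the tuples of Rj whose attr value appears in the projection."""
    allowed = set(projection)
    return [row for row in r_j if row[attr] in allowed]

r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]   # at the site of R1
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]   # at the site of R2

shipped = semijoin_projection(r1, "A")          # sent from site 1 to site 2
r2_reduced = reduce_relation(r2, "A", shipped)  # computed at site 2
print(shipped, r2_reduced)
```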


2.2 Example:

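Example (semijoin s from R1 to R2 on attribute A):

Site 1, R1:
A  B
1  4
2  5
3  6

Site 2, R2:
A  C
3  7
4  8
5  9

Projection at Site 1: R1[A] = {1, 2, 3}, shipped to Site 2: Ship(3).

Reduce at Site 2: R2’ = {(3, 7)}.

R2’ is shipped to qs: Ship(2), instead of Ship(6) for the unreduced R2.

Benefit B(s) = 6 - 2 = 4

Cost C(s) = 3

Cost effectiveness D(s) = B(s) - C(s) = 1 > 0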
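The arithmetic of this example can be reproduced with a short script. It is a sketch under one assumption that the slide's numbers suggest but do not state: size is counted as the number of attribute values shipped (tuples times attributes). The helper name `size` is hypothetical.

```python
def size(relation):
    """Size as the number of attribute values (assumed unit of transmission cost)."""
    return sum(len(t) for t in relation)

r1 = [(1, 4), (2, 5), (3, 6)]   # R1(A, B) at site 1
r2 = [(3, 7), (4, 8), (5, 9)]   # R2(A, C) at site 2

r1_a = {t[0] for t in r1}                        # semijoin projection R1[A]
r2_reduced = [t for t in r2 if t[0] in r1_a]     # R2' after the reduction

cost = len(r1_a)                                 # C(s): shipping R1[A]
benefit = size(r2) - size(r2_reduced)            # B(s): saving when shipping R2' instead of R2
net = benefit - cost                             # D(s)
print(cost, benefit, net)  # 3 4 1
```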


3.a.1 Definition of 2-way semijoin.

2-way semijoin: an extended version of the semijoin.

Definition: A 2-way semijoin (t) of Ri and Rj on attribute A is denoted

Ri <-A-> Rj = { Ri -A-> Rj, Rj -A-> Ri }

that is, the semijoin from Ri to Rj together with the semijoin from Rj to Ri, both on A.

So t reduces Ri and Rj to Ri’ and Rj’ respectively.


3.a.2 Properties of 2-way semijoin.

Computing steps:

1) Send Ri[A] from site i to site j;

2) Reduce Rj to Rj’ by eliminating the tuples whose value of attribute A does not match any value in Ri[A], and at the same time partition Ri[A] into Ri[A]m (the values that match one of Rj[A]) and Ri[A]nm (Ri[A] - Ri[A]m);

3) Send the smaller of Ri[A]m and Ri[A]nm back to site i;

4) Reduce Ri to Ri’ using Ri[A]m (or Ri[A]nm).

Evaluation:

– Benefit: B(t) = [S(Ri) - S(Ri’)] + [S(Rj) - S(Rj’)]

– Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]

– If the benefit exceeds the cost (D(t) = B(t) - C(t) > 0), then t is called a cost-effective 2-way semijoin.
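The four steps and the evaluation formulas can be sketched as follows. This is an illustrative sketch, not the authors' implementation: relations are lists of dicts, S(.) is assumed to count attribute values, and the function names are hypothetical. The sample relations are the ones used in the semijoin example of section 2.2.

```python
def size(relation):
    """S(R), assumed here to be the number of attribute values in R."""
    return sum(len(row) for row in relation)

def two_way_semijoin(r_i, r_j, attr):
    proj_i = {row[attr] for row in r_i}                         # step 1: Ri[A] sent to site j
    proj_j = {row[attr] for row in r_j}
    r_j_reduced = [row for row in r_j if row[attr] in proj_i]   # step 2: reduce Rj
    matched = proj_i & proj_j                                   # Ri[A]m
    unmatched = proj_i - matched                                # Ri[A]nm
    if len(matched) <= len(unmatched):                          # step 3: send the smaller part back
        sent_back = matched
        r_i_reduced = [row for row in r_i if row[attr] in matched]        # step 4, using Ri[A]m
    else:
        sent_back = unmatched
        r_i_reduced = [row for row in r_i if row[attr] not in unmatched]  # step 4, using Ri[A]nm
    cost = len(proj_i) + len(sent_back)                                   # C(t)
    benefit = (size(r_i) - size(r_i_reduced)) + (size(r_j) - size(r_j_reduced))  # B(t)
    return r_i_reduced, r_j_reduced, benefit, cost

r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]
r1_reduced, r2_reduced, benefit, cost = two_way_semijoin(r1, r2, "A")
print(r1_reduced, r2_reduced, benefit, cost)
```

With these values the sketch gives B(t) = 8 and C(t) = 4, so D(t) = 4 > 0 and the 2-way semijoin is cost-effective under the assumed size measure.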


3.a.3 2-way semijoin example.

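Site 1, R1:
A  B
1  4
2  5
3  6

Site 2, R2:
A  C
3  7
4  8
5  9

1) Projection at Site 1: R1[A] = {1, 2, 3}, sent to Site 2: Ship(3).

2) Reduce at Site 2: R2’ = {(3, 7)}. Partition: R1[A]m = {3}, R1[A]nm = {1, 2}.

3) R1[A]m, the smaller part, is sent back to Site 1: Ship(1).

4) Reduce at Site 1: R1’ = {(3, 6)}.

R1’ and R2’ are then shipped to qs: Ship(2) and Ship(2).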


3.a.4 Semijoin vs. 2-way semijoin.

– The 2-way semijoin is an extended version of the semijoin.

– It has more reduction power than the semijoin.

– The 2-way semijoin propagates reduction effects further than the semijoin does.


3.b.1 Hash-semijoin operator.

Main idea: use a search filter that represents the semijoin projection as a small bit array.

Definition:

The hash-semijoin of Ri and Rj is denoted Rj ∝ Ri. It is computed as follows:

– The semijoin projection of Ri is represented as a bit array;

– This bit array is shipped to the site of Rj;

– Finally, the tuples of Rj are screened by the search filter.
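A minimal sketch of the search filter follows, assuming an 8-bit array, bit positions counted from 1, and the identity hash H(x) = x; the function names are hypothetical and the data are the values of the example in 3.b.2. In general a hash function can map a non-matching value onto a bit that is already set, so the filter may let some non-matching tuples through; with the identity hash and these values that does not happen.

```python
def build_bit_array(values, n_bits, h):
    """Represent the semijoin projection as a bit array (bit h(v) is set, counted from 1)."""
    bits = [0] * n_bits
    for v in values:
        bits[(h(v) - 1) % n_bits] = 1
    return bits

def screen(relation, attr, bits, h):
    """Keep the tuples of Rj whose attr value hashes to a set bit."""
    n_bits = len(bits)
    return [row for row in relation if bits[(h(row[attr]) - 1) % n_bits] == 1]

def h(x):
    """Identity hash, as in the example slide."""
    return x

r1 = [{"S#": 1, "Name": "Cindy"}, {"S#": 3, "Name": "Jemal"},
      {"S#": 4, "Name": "Sunny"}, {"S#": 8, "Name": "Maggie"}]
r2 = [{"S#": 2, "Phone": 222}, {"S#": 3, "Phone": 333}, {"S#": 4, "Phone": 444},
      {"S#": 5, "Phone": 555}, {"S#": 6, "Phone": 666}]

bits = build_bit_array({row["S#"] for row in r1}, 8, h)   # built at the site of R1
bit_string = "".join(str(b) for b in bits)
r2_reduced = screen(r2, "S#", bits, h)                    # applied at the site of R2
print(bit_string, r2_reduced)
```

Only the 8 bits cross the network here, rather than the four S# values of the projection.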


3.b.2 hash semijoin example.

R1:
S#  Name
1   Cindy
3   Jemal
4   Sunny
8   Maggie

Projection: S#(R1) = {1, 3, 4, 8}

Hash function: H(x) = x

Bit array B = 10110001 (bits 1, 3, 4 and 8 are set)

Ship(Bij) to the site of Rj (here R2).

R2:
S#  Phone
2   222
3   333
4   444
5   555
6   666

Reduce: R2’ = {(3, 333), (4, 444)}


3.b.3 Semijoin Vs Hash Semijoin.

• Advantages:

– The hash-semijoin is more cost-effective than the semijoin.

– The search filter in the hash-semijoin achieves considerable savings in the cost of a semijoin operation.

• Limitations:

– Only works on an execution tree.

– Closely tied to the hash functions used.


3.c.1 What is a horizontally partitioned database?

A distributed database system is horizontally partitioned (or fragmented) if its relations can be split horizontally into several disjoint sets of tuples, which are called horizontal fragments.


3.c.2 Horizontally partitioned database system (example).

EMP:
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01
109   jabbar   12
110   sampson  14
141   chang    16

EMP1: 1 ≤ D-no ≤ 10
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01

EMP2: 11 ≤ D-no ≤ 20
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16


3.c.3 Horizontally partitioned database system (properties).

A fragmented relation Ri can be reconstructed by performing a union operation on all its fragments:

Ri = ∪k Rik

There is a commutative rule between the binary operations join and union for fragmented relations: a join between two fragmented relations R1 and R2 is equivalent to a union over the joins between each fragment of R1 and each fragment of R2.

Mathematically: (∪k R1k) [A=B] (∪m R2m) = ∪k,m (R1k [A=B] R2m)
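The rule can be checked on a small case. The sketch below is only an illustration: relations are sets of tuples, the fragment contents are made-up values, and `join` is a hypothetical helper that joins on given column positions.

```python
from itertools import product

def join(r, s, a, b):
    """Equijoin of r and s on column a of r and column b of s (tuples are concatenated)."""
    return {x + y for x in r for y in s if x[a] == y[b]}

# Illustrative fragments of two relations R1 and R2; column 0 is the join attribute.
r1_fragments = [{(1, "x"), (2, "y")}, {(3, "z")}]
r2_fragments = [{(2, "p")}, {(3, "q"), (1, "r")}]

r1 = set().union(*r1_fragments)   # R1 = union of its fragments
r2 = set().union(*r2_fragments)   # R2 = union of its fragments

join_of_unions = join(r1, r2, 0, 0)

union_of_joins = set()
for f1, f2 in product(r1_fragments, r2_fragments):   # every pair (R1k, R2m)
    union_of_joins |= join(f1, f2, 0, 0)

print(join_of_unions == union_of_joins)  # True
```

Note that every pair of fragments has to be considered: the matches for a tuple in one fragment of R1 may sit in any fragment of R2.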


3.c.4 Why can’t we use a regular semijoin between two fragments to reduce the size of the fragments?

Consider a join Ri [A=B] Rj between two fragmented relations Ri and Rj. We want to reduce the size of Rik, a fragment of Ri, by a semijoin before it is sent to the final processing site. We cannot perform the semijoin

Rik ⋉ Rjm (on A = B)

between Rik and a single fragment Rjm of Rj without considering the other fragments of Rj, because the join operation dictates that no tuple of a relation can be eliminated before it has been compared with all tuples of the other joining relation that may contribute to the join.


Example:

EMP1: 1 ≤ D-no ≤ 10
E-no  E-name   D-no
101   johnson  01
103   jordan   03
135   erving   01

EMP2: 11 ≤ D-no ≤ 20
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16

sal: 101 ≤ E-no ≤ 105
E-no  Sal   D-no
101   1000  12
102   2000  03
105   3000  11

sal: 105 < E-no ≤ 110
E-no  Sal   D-no
107   1000  12
107   2000  03
110   3000  11

D-no: 01, 03, 12, 14, 16


3.c.5 Definition of Domain Specific Semijoin.

The domain-specific semijoin operation Rik (A=B] Rjm, where A and B are the joining attributes and Rik, Rjm are two fragments of the joining relations Ri and Rj respectively, is defined as follows:

Rik (A=B] Rjm = { r | r ∈ Rik ; r.A ∈ Rjm[B] ∪ (Dom[Rj.B] - Dom[Rjm.B]) }

where Rik is the restricted fragment and Rjm is the restricting fragment. We also call Ri the restricted relation and Rj the restricting relation of the domain-specific semijoin.
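The definition translates almost literally into code. The sketch below is a hedged illustration: relations are lists of dicts, the function name is hypothetical, and domains are passed in explicitly as ranges. It restricts the first sal fragment by EMP2 on D-no, using the values of the preceding example and assuming Dom[EMP.D-no] = 1..20 and Dom[EMP2.D-no] = 11..20, as the fragment ranges suggest.

```python
def domain_specific_semijoin(r_ik, a, r_jm, b, dom_rj_b, dom_rjm_b):
    """Keep r in Rik if r.A is in Rjm[B] or lies outside the domain covered by Rjm."""
    allowed = {row[b] for row in r_jm} | (set(dom_rj_b) - set(dom_rjm_b))
    return [row for row in r_ik if row[a] in allowed]

sal1 = [{"E-no": 101, "Sal": 1000, "D-no": 12},
        {"E-no": 102, "Sal": 2000, "D-no": 3},
        {"E-no": 105, "Sal": 3000, "D-no": 11}]
emp2 = [{"E-no": 109, "E-name": "jabbar", "D-no": 12},
        {"E-no": 110, "E-name": "sampson", "D-no": 14},
        {"E-no": 141, "E-name": "chang", "D-no": 16}]

dom_emp = range(1, 21)     # assumed Dom[EMP.D-no]
dom_emp2 = range(11, 21)   # assumed Dom[EMP2.D-no]

sal1_reduced = domain_specific_semijoin(sal1, "D-no", emp2, "D-no", dom_emp, dom_emp2)
print([row["E-no"] for row in sal1_reduced])  # [101, 102]
```

The tuple with D-no 11 is dropped because 11 falls in EMP2's domain but no EMP2 tuple has it. The tuple with D-no 03 is kept even though EMP2 has no match, since its partner may be in EMP1.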


3.d.1 Definition of Composite Semijoin.

Composite semijoin: a semijoin in which the projection and the transmission involve multiple columns (attributes).


3.d.2 Example of Composite Semijoin.

R1:
A1  A2  Non-join Attr
1   aa  -
1   bb  -
2   cc  -
3   cc  -

R2:
A1  A2  Non-join Attr
1   cc  -
1   aa  -
2   bb  -
3   bb  -

Result of the composite semijoin on (A1, A2):
A1  A2  Non-join Attr
1   aa  -

No False loop!!
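The effect in this example can be reproduced with a short sketch that compares two single-attribute semijoins with one composite semijoin. It assumes relations are lists of dicts (the non-join attribute is omitted) and that the first table on the slide is R1 and the second is R2; the function name is hypothetical.

```python
def semijoin(r_j, r_i, attrs):
    """Reduce r_j by r_i on the listed attributes (one attribute, or several for a composite semijoin)."""
    shipped = {tuple(row[a] for a in attrs) for row in r_i}   # projection that is transmitted
    return [row for row in r_j if tuple(row[a] for a in attrs) in shipped]

r1 = [{"A1": 1, "A2": "aa"}, {"A1": 1, "A2": "bb"},
      {"A1": 2, "A2": "cc"}, {"A1": 3, "A2": "cc"}]
r2 = [{"A1": 1, "A2": "cc"}, {"A1": 1, "A2": "aa"},
      {"A1": 2, "A2": "bb"}, {"A1": 3, "A2": "bb"}]

# Two separate single-attribute semijoins: first on A1, then on A2.
single = semijoin(semijoin(r1, r2, ["A1"]), r2, ["A2"])

# One composite semijoin on the pair (A1, A2).
composite = semijoin(r1, r2, ["A1", "A2"])

print(len(single), composite)
```

With these values the single-attribute semijoins remove nothing from R1, because every A1 value and every A2 value of R1 also occurs somewhere in R2, while the composite semijoin leaves only the tuple (1, aa).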


3.d.3 Semijoin Vs Composite Semijoin.

Using composite semijoins in a query processing algorithm is likely to result in a substantial RT reduction.

Composite semijoins should not always be used. If a composite semijoin results in a greater RT, do not use it.

A strategy that considers composite semijoins is at least as good as one that does not.


References:

1. Using 2-way semijoin in distributed query processing. By Hyunchul Kang and Nick Roussopoulos.

2. Improving distributed query processing by hash-semijoins. By Judy Tseng and Arbee Chen.

3. Domain Specific Semijoin: A new operation for distributed query processing. By Jason Chen and Victor Li.

4. Composite Semijoin in distributed query processing. By William Perrizo and Chun Chen.


Comments & Questions?

Thank You!