join optimization in hive

Join Optimization in Hive

Liyin Tang

Outline

• Map Join Optimization
– Previous Common Join and Map Join
– Optimized Map Join
– JDBM
– Performance Evaluation

• Convert Join to Map Join Automatically
– How it works
– Performance Evaluation

Common Join

[Diagram: Task A followed by a Common Join Task. Mappers read Table X and Table Y, and their output goes through the Shuffle to the Reducer.]
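The common (shuffle-based) join pictured on this slide can be sketched in a few lines of Python. This is only a minimal illustration, assuming each table is a list of (key, value) pairs; the function name `common_join` and the "x"/"y" tags are hypothetical and are not taken from Hive.

```python
from collections import defaultdict


def common_join(table_x, table_y):
    """Reduce-side join: tag rows in the map phase, group by key, join per key."""
    # Map phase: emit (key, (tag, value)) for every row of both tables.
    mapped = [(k, ("x", v)) for k, v in table_x]
    mapped += [(k, ("y", v)) for k, v in table_y]

    # Shuffle phase: bring all tagged rows with the same join key together.
    shuffled = defaultdict(list)
    for key, tagged in mapped:
        shuffled[key].append(tagged)

    # Reduce phase: for each key, pair every x row with every y row.
    joined = []
    for key in sorted(shuffled):
        xs = [v for tag, v in shuffled[key] if tag == "x"]
        ys = [v for tag, v in shuffled[key] if tag == "y"]
        joined.extend((key, xv, yv) for xv in xs for yv in ys)
    return joined


rows_x = [(1, "x1"), (2, "x2"), (3, "x3")]
rows_y = [(1, "y1"), (3, "y3a"), (3, "y3b")]
result = common_join(rows_x, rows_y)
```

Every row of both tables passes through the shuffle step here, which is the cost a map join tries to avoid.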

Previous Map Join

[Diagram: Task A, MapJoin Task, Task C. Each Mapper in the MapJoin Task processes records of the Big Table Data and reads the Small Table Data.]

Optimized Map Join

[Diagram: Task A, MapReduce Local Task, MapJoin Task, Task C. The MapReduce Local Task reads the Small Table Data and writes HashTable Files, then uploads the files to the Distributed Cache (DC). Each Mapper in the MapJoin Task processes records of the Big Table Data and uses the HashTable Files from the Distributed Cache.]
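The flow on this slide can be sketched as follows, assuming a local temporary file stands in for the Distributed Cache and `pickle` stands in for the hashtable file format. The function names `local_task_build_hashtable` and `mapper_join` are hypothetical and only illustrate the idea: the small table is hashed once, and every mapper loads the resulting file instead of re-reading the small table.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def local_task_build_hashtable(small_table, path):
    """Local task: hash the small table once and dump it to a file."""
    table = defaultdict(list)
    for key, value in small_table:
        table[key].append(value)
    with open(path, "wb") as f:
        pickle.dump(dict(table), f)


def mapper_join(big_table_split, hashtable_path):
    """Mapper: load the hashtable file, then probe it for each big-table record."""
    with open(hashtable_path, "rb") as f:
        table = pickle.load(f)
    for key, value in big_table_split:
        for small_value in table.get(key, []):
            yield (key, value, small_value)


small = [(1, "s1"), (2, "s2")]
big_splits = [[(1, "b1"), (3, "b3")], [(2, "b2"), (1, "b4")]]

with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory plays the role of the Distributed Cache here.
    ht_file = os.path.join(tmp, "small_table.hashtable")
    local_task_build_hashtable(small, ht_file)
    joined = [row for split in big_splits for row in mapper_join(split, ht_file)]
```

No shuffle or reducer is involved: each split of the big table is joined independently inside its mapper.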

JDBM

• JDBM is too heavyweight for Map Join
– Takes more than 70% of CPU time
– Generates very large files
• No need to use a persistent hashtable for map join

Performance Evaluation I

Small Table | Big Table | Join Condition | Average Previous Map Join Execution Time | Average New Optimized Map Join Execution Time | Performance Improvement
75 K rows; 383K file size | 130 M rows; 3.5G file size | 1 join key, 2 join values | 1032 sec | 79 sec | +1206%
500 K rows; 2.6M file size | 130 M rows; 3.5G file size | | 3991 sec | 144 sec | +2671%
| 16.7 B rows; 459 G file size | | 4801 sec | 325 sec | +1377%
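The improvement figures above appear consistent with the ratio of the two execution times minus one, expressed as a percentage. That formula is an inference from the reported numbers, not something the slides state; the sketch below checks it against the three rows, allowing for a one-point rounding difference.

```python
def improvement_percent(previous_sec, optimized_sec):
    """Speedup expressed as a percentage gain: (previous / optimized - 1) * 100."""
    return (previous_sec / optimized_sec - 1.0) * 100.0


# (previous map join seconds, optimized map join seconds, reported improvement %)
measurements = [(1032, 79, 1206), (3991, 144, 2671), (4801, 325, 1377)]
computed = [improvement_percent(prev, opt) for prev, opt, _ in measurements]

# True when every computed value is within one percentage point of the slide.
matches_slide = all(
    abs(value - reported) < 1.0
    for value, (_, _, reported) in zip(computed, measurements)
)
```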

Converting Common Join into Map Join

[Diagram: Previous Execution Flow: Task A, CommonJoinTask, Task C. Optimized Execution Flow: Task A, Conditional Task, Task C. The Conditional Task holds the candidate paths a, b, c, ...: MapJoinLocalTask followed by MapJoinTask (one pair per candidate), and CommonJoinTask.]

Compile Time

SELECT * FROM SRC1 x JOIN SRC2 y
ON x.key = y.key;

[Diagram: Task A, Conditional Task, Task C. The Conditional Task contains a CommonJoinTask and two MapJoinLocalTask / MapJoinTask pairs: one assumes table x is the big table, the other assumes table y is the big table.]

Execution Time

[Diagram: Task A, Conditional Task, Task C. If Table X is the big table, the matching MapJoinLocalTask and MapJoinTask run. If both tables are too big for map join, the CommonJoinTask runs.]

SELECT * FROM SRC1 x JOIN SRC2 y
ON x.key = y.key;
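The run-time decision made by the Conditional Task can be sketched as below. This is a simplified illustration, not Hive's actual resolver: the function name `resolve_conditional_task`, the byte-size inputs, and the 25 MB limit are all assumptions chosen for the example.

```python
def resolve_conditional_task(table_sizes, small_table_limit):
    """Pick the path to run: ("mapjoin", big_table_alias) or ("commonjoin", None).

    table_sizes maps a table alias to its input size in bytes.
    """
    best = None
    for candidate_big in table_sizes:
        # Every table other than the candidate big table must fit in memory.
        small_total = sum(
            size for alias, size in table_sizes.items() if alias != candidate_big
        )
        if small_total <= small_table_limit:
            if best is None or small_total < best[1]:
                best = (candidate_big, small_total)
    if best is None:
        # Both tables are too big for a map join.
        return ("commonjoin", None)
    return ("mapjoin", best[0])


LIMIT = 25 * 1000 * 1000  # hypothetical small-table limit in bytes

x_is_big = resolve_conditional_task({"x": 3_500_000_000, "y": 383_000}, LIMIT)
both_big = resolve_conditional_task({"x": 3_500_000_000, "y": 459_000_000_000}, LIMIT)
```

Because every candidate plan is generated at compile time, the decision at execution time reduces to a size check like this one.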

Backup Task

[Diagram: Task A, Conditional Task, Task C. When the MapJoin LocalTask hits the memory bound, the CommonJoinTask is run as a backup task in place of the MapJoinTask.]
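The backup behaviour can be sketched as a try/fallback pattern. The sketch assumes the memory bound is approximated by a maximum number of hashtable entries. `MemoryBoundExceeded`, `run_with_backup` and the other names are hypothetical, and the nested-loop `common_join` is only a stand-in for the shuffle-based join.

```python
class MemoryBoundExceeded(Exception):
    """Raised when the small table does not fit within the assumed bound."""


def build_hashtable(small_table, max_entries):
    table = {}
    for count, (key, value) in enumerate(small_table, start=1):
        if count > max_entries:
            raise MemoryBoundExceeded(f"more than {max_entries} entries")
        table.setdefault(key, []).append(value)
    return table


def map_join(big_table, small_table, max_entries):
    table = build_hashtable(small_table, max_entries)
    return [(k, bv, sv) for k, bv in big_table for sv in table.get(k, [])]


def common_join(big_table, small_table):
    # Stand-in for the shuffle-based join; produces the same rows.
    return [(k, bv, sv) for k, bv in big_table for k2, sv in small_table if k == k2]


def run_with_backup(big_table, small_table, max_entries):
    """Try the map join first; fall back to the common join if the bound is hit."""
    try:
        return "mapjoin", map_join(big_table, small_table, max_entries)
    except MemoryBoundExceeded:
        return "commonjoin", common_join(big_table, small_table)


big = [(1, "b1"), (2, "b2")]
small = [(1, "s1"), (2, "s2"), (3, "s3")]
fits = run_with_backup(big, small, max_entries=10)
falls_back = run_with_backup(big, small, max_entries=2)
```

Both paths return the same joined rows; only the execution strategy differs.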

Performance Bottleneck

• Distributed Cache is the potential performance bottleneck
– A large hashtable file will slow down the propagation of Distributed Cache
– Mappers are waiting for the hashtable files from Distributed Cache
• Compress and archive all the hashtable files into a tar file.
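Compressing and archiving the hashtable files can be sketched with Python's standard `tarfile` module. The file names, the gzip compression choice, and the helper names `archive_hashtables` and `read_archive` are assumptions made for this illustration; the slides only say that the files are compressed and archived into a tar file.

```python
import os
import tarfile
import tempfile


def archive_hashtables(hashtable_paths, archive_path):
    """Bundle all hashtable files into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in hashtable_paths:
            tar.add(path, arcname=os.path.basename(path))


def read_archive(archive_path):
    """Mapper side: read every archived hashtable file back into memory."""
    contents = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            contents[member.name] = tar.extractfile(member).read()
    return contents


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"hashtable_{i}.bin")
        with open(path, "wb") as f:
            f.write(b"key=value;" * 1000)  # dummy, highly compressible payload
        paths.append(path)

    archive = os.path.join(tmp, "hashtables.tar.gz")
    archive_hashtables(paths, archive)

    raw_size = sum(os.path.getsize(p) for p in paths)
    archived_size = os.path.getsize(archive)
    restored = read_archive(archive)
```

A single smaller archive means fewer bytes to propagate to each mapper, at the cost of a decompression step before the join starts.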

Compress and Archive

[Diagram: Task A, MapReduce Local Task, MapJoin Task, Task C. The MapReduce Local Task turns the Small Table Data into HashTable Files, which are compressed and archived before going to the Distributed Cache. Each Mapper in the MapJoin Task processes records of the Big Table Data and uses the HashTable Files.]

Performance Evaluation II

Small Table | Big Table | Join Condition | Average Join Execution Time Without Compression | Average Join Execution Time With Compression | Performance Improvement
| | | 106 sec | 73 sec | +45%
| | | 129 sec | 106 sec | +21%
| | | 441 sec | 326 sec | +35%
| | | 326 sec | 251 sec | +30%
1M rows; 10M file size | | | 495 sec | 266 sec | +86%
| | | 425 sec | 255 sec | +67%

Performance Evaluation III

Small Table | Big Table | Join Condition | Previous Common Join | Optimized Common Join | Performance Improvement
| | | 169 sec | 79 sec | +114%
| | | 246 sec | 144 sec | +71%
| | | 511 sec | 325 sec | +57%
| | | 502 sec | 305 sec | +64%
| | | 653 sec | 248 sec | +163%
| | | 1117 sec | 536 sec | +108%

Future Work

• Audit how many joins will be converted into map joins in the cluster.

• Set hashtable file replica number based on the number of Mappers

• Tune the limit of small table data size by sampling

• Increase the in-memory hashtable capacity.

Thank you

Liyin Tang
