presto

Presto [email protected] Thursday, 17 April, 14

Upload: chen-chun

Post on 25-Jun-2015

Page 2: Presto

Content

• Background

• Architecture

• Key points for low query latency

• What we do

• Reference

Page 3: Presto

Background

• 300+PB data stored in Hadoop/HDFS-based clusters

• Running more queries and getting results faster improves the productivity of analysts, data scientists, and engineers

• MapReduce and Hive are designed for large-scale, reliable computation rather than low-latency interactive queries

• External projects were either too nascent or did not meet our requirements for flexibility and scale

Page 4: Presto

Architecture

Page 5: Presto

Key points for low latency

• In memory parallel computing

• Pipeline

• Data local computation

• Data cache

• Dynamically compile part of the plan to bytecode

• Careful use of memory and data structure

• BlinkDB-like approximate queries

• Traditional SQL optimizations

• GC control

Page 6: Presto

Compile flow

Page 7: Presto

In memory parallel computing

select c1.rank, count(*)
from dim.city c1 join dim.city c2 on c1.id = c2.id
where c1.id > 10
group by c1.rank
limit 10;

Page 8: Presto

In memory parallel computing

Page 9: Presto

In memory parallel computing

Page 10: Presto

In memory parallel computing

• PlanDistribution=Source
  – InputSplit[] splits = inputFormat.getSplits(jobConf, 0);

• PlanDistribution=Hash
  – Hash shuffle
  – Fixed workers
  – query.initial-hash-partitions
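
A minimal sketch of what a hash shuffle over a fixed set of workers implies: each row is routed to a partition chosen by hashing its key, so all rows with the same key end up on the same worker. The class and method names are hypothetical and are not Presto's API; the partition count stands in for query.initial-hash-partitions, and the hash function is simply Java's hashCode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a hash shuffle; not Presto's actual classes.
final class HashShuffleSketch {
    // Maps a key to a partition index in [0, partitionCount).
    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Distributes keys into a fixed number of buckets, one bucket per worker.
    static List<List<Integer>> shuffle(List<Integer> keys, int partitionCount) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Integer key : keys) {
            partitions.get(partitionFor(key, partitionCount)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // 4 is an assumed value playing the role of query.initial-hash-partitions.
        System.out.println(shuffle(Arrays.asList(1, 2, 3, 4, 5, 6, 1, 2), 4));
    }
}
```

Because the partition depends only on the key, a join or group-by key such as c1.id in the earlier query is always processed by a single worker.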

Page 11: Presto

Pipeline - TaskExecutor

• SplitRunner thread count: task.shard.max-threads = availableProcessors() * 4
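
A minimal sketch of sizing a runner pool with the default from this slide. It only illustrates the arithmetic and a fixed-size pool; the class name and the dummy "splits" are hypothetical, and Presto's real TaskExecutor does considerably more than a plain thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not Presto's TaskExecutor.
final class SplitRunnerPoolSketch {
    // Default from the slide: four runner threads per available processor.
    static int defaultMaxThreads() {
        return Runtime.getRuntime().availableProcessors() * 4;
    }

    // Runs splitCount dummy splits on a fixed-size pool and returns how many completed.
    static int runSplits(int splitCount) {
        ExecutorService pool = Executors.newFixedThreadPool(defaultMaxThreads());
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < splitCount; i++) {
            // Each "split" here just bumps a counter; real splits would process pages.
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```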

Page 12: Presto

Pipeline - Operator process flow

• Page (max page size: 1 MB, max rows: 16 * 1024)
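
A minimal sketch of how the two page limits interact, assuming a page is handed to the next operator as soon as either limit is reached. That rule, the class name, and the per-row size accounting are assumptions for illustration; this is not Presto's PageBuilder.

```java
// Hypothetical illustration of the two page limits; not Presto's PageBuilder.
final class PageLimitSketch {
    static final int MAX_PAGE_SIZE_BYTES = 1024 * 1024; // 1 MB
    static final int MAX_ROWS = 16 * 1024;

    private int rowCount;
    private long sizeInBytes;

    // Records one appended row of the given encoded size.
    void appendRow(int rowSizeInBytes) {
        rowCount++;
        sizeInBytes += rowSizeInBytes;
    }

    // Assumed rule: the page is full once either limit is reached.
    boolean isFull() {
        return rowCount >= MAX_ROWS || sizeInBytes >= MAX_PAGE_SIZE_BYTES;
    }

    // Counts how many rows of a fixed size are appended before the page reports full.
    static int rowsUntilFull(int rowSizeInBytes) {
        PageLimitSketch page = new PageLimitSketch();
        int rows = 0;
        while (!page.isFull()) {
            page.appendRow(rowSizeInBytes);
            rows++;
        }
        return rows;
    }
}
```

Under this assumption, narrow rows hit the row limit first and wide rows hit the size limit first.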

Page 13: Presto

Pipeline - ExchangeOperator

Page 14: Presto

Data local computation

• Select acceptable nodes (at least 10 nodes by default)
  – Nodes that have the same address as the split's data
  – If not enough, add nodes in the same rack
  – If not enough, randomly select nodes in other racks

• Select the node with the smallest number of assignments (pending tasks)
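
The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the Node class, its fields, and the method names are hypothetical, and the rack of a node is taken as a given string rather than resolved from a network topology. It is not Presto's scheduler code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration of locality-aware node selection.
final class NodeSelectionSketch {
    static final class Node {
        final String address;
        final String rack;
        final int pendingTasks;

        Node(String address, String rack, int pendingTasks) {
            this.address = address;
            this.rack = rack;
            this.pendingTasks = pendingTasks;
        }
    }

    // Step 1: collect candidates, widening locality until minCandidates is reached.
    static List<Node> candidates(List<Node> nodes, String splitAddress, String splitRack,
                                 int minCandidates, Random random) {
        Set<Node> chosen = new LinkedHashSet<>();
        for (Node n : nodes) {                       // nodes at the split's address
            if (n.address.equals(splitAddress)) chosen.add(n);
        }
        for (Node n : nodes) {                       // then nodes in the same rack
            if (chosen.size() >= minCandidates) break;
            if (n.rack.equals(splitRack)) chosen.add(n);
        }
        List<Node> others = new ArrayList<>(nodes);  // then the rest, in random order
        Collections.shuffle(others, random);
        for (Node n : others) {
            if (chosen.size() >= minCandidates) break;
            chosen.add(n);                           // already chosen nodes are ignored by the set
        }
        return new ArrayList<>(chosen);
    }

    // Step 2: among the candidates, pick the node with the fewest pending tasks.
    static Node pick(List<Node> candidates) {
        return Collections.min(candidates, Comparator.comparingInt((Node n) -> n.pendingTasks));
    }
}
```

The candidate minimum (10 by default on the slide) trades locality for load balance: a larger value lets a busy local node lose to an idle remote one.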

Page 15: Presto

Data cache

• Google Guava LoadingCache

• Cached objects
  – Hive metadata: database, table, partition
  – Bytecode classes: FilterAndProjectOperatorFactoryFactory, ScanFilterAndProjectOperatorFactoryFactory
  – Functions
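
A minimal sketch of the load-on-miss behaviour that a loading cache provides. To stay dependency-free it uses the JDK's ConcurrentHashMap.computeIfAbsent as a stand-in, not Guava's LoadingCache itself, so it has no eviction or expiry; the class name and the loader used in the test are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// JDK-only stand-in for a loading cache; Guava's LoadingCache additionally
// supports size limits and expiry, which this sketch does not.
final class LoadingCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    LoadingCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on the first request for a key.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the loader actually ran (cache misses).
    int loadCount() {
        return loads.get();
    }
}
```

For the cached objects listed above, the expensive loader would be a metastore lookup or a bytecode compilation, each paid once per key.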

Page 16: Presto

Dynamically compile the plan to bytecode

• Presto dynamically compiles FilterAndProjectOperator and ScanFilterAndProjectOperator to bytecode, which lets the JIT optimize it and generate native machine code

• How much does it speed things up?

• ScanFilterAndProjectOperator

Page 17: Presto

Careful use of memory and data structures

• Slice
  – Unsafe#copyMemory
  – 20% ~ 30% speed up for ORCFile write performance

• ThreadLocalRandom
  – ThreadLocal seed instead of AtomicLong
  – 100% speed up

• ListenableFuture
  – Async callback
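
A minimal sketch of the ThreadLocalRandom point: java.util.Random keeps its seed in an AtomicLong that all callers update, while ThreadLocalRandom.current() returns a generator owned by the calling thread, so there is no shared seed to contend on. The class and method names are hypothetical, and the sketch does not measure the speed up quoted above.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of per-thread random number generation.
final class ThreadLocalRandomSketch {
    // Draws `count` values in [0, bound) from the calling thread's own generator.
    static long sumOfDraws(int count, int bound) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            // current() must be called on the thread that uses the generator.
            sum += ThreadLocalRandom.current().nextInt(bound);
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each thread draws independently, without touching a shared seed.
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> System.out.println(sumOfDraws(1000, 10)));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```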

Page 18: Presto

Approximate queries

• approx_avg, approx_distinct, approx_percentile

• +50% speed up

Page 19: Presto

Traditional SQL optimizations

• ImplementSampleAsFilter
• LimitPushDown
• MaterializeSamplePullUp
• MergeProjections
• PredicatePushDown
• PruneRedundantProjections
• PruneUnreferencedOutputs
• SetFlatteningOptimizer
• SimplifyExpressions
• UnaliasSymbolReferences

Page 20: Presto

GC control

• A JDK 1.7 bug
  – When the code cache fills up, there is a chance that the JIT stops compiling bytecode to native code.
  – By forcing classes to unload from the perm gen, we let the code cache evictor make room before the cache fills up.

• System.gc()

Page 21: Presto

What we do

• Support Kerberos authentication

• Implicit type coercion

• Support reading LZO-compressed tables

• Implement useful functions

• Fix planning issue when using DISTINCT aggregations in HAVING clause

• https://github.com/MTDATA/presto/commits/mt-0.60

Page 22: Presto

Reference

• http://prestodb.io/

• https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920

• http://www.slideshare.net/zhusx/presto-overview?from_search=1

• http://www.slideshare.net/frsyuki/hadoop-source-code-reading-15-in-japan-presto

Page 23: Presto

Thanks
