
Presto Summit SF 2019

Martin Traverso, Dain Sundstrom, David Phillips

Presto Software Foundation

“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”

“It is dedicated to preserving the vision of high quality, performant, and dependable software.”

“Ensuring the project remains open, collaborative and independent for decades to come”

Presto Community

• Github: https://github.com/prestosql

• Website: https://prestosql.io

• Blog: https://prestosql.io/blog

• Twitter: @prestosql

• Slack: prestosql.slack.com

Since the Launch…

• Launched on January 31, 2019

• 16 releases (~1 per week)

• 1300+ commits

• 200k lines changed

• 650+ pull requests closed

• 50+ contributors

• 170 weekly active members on Slack

Contributors

kokosing

raunaqmorarka

pgagnon

MiguelWeezardo

MarvinCai

Praveen2112

chancez

hustnn

kasiafi

sopel39

stagraqubole

yui-knk

Yaliang

dain

11xor6

Lewuathe

garvit-gupta

VicoWu

qqibrow

findepi

pettyjamesm

martint

electrum

vincentpoon

wyukawa

guyco33

bill-warshaw

vkorukanti

anusudarsan

dilipkasana

sshardool

linxingyuan1102

luohao

zhenxiao

rzeyde-varada

takezoe

kabunchi

ryanrupp

ilfrin

ChethanUK

ebyhr

xumingming

Recent Improvements (since the launch)

ORC Performance

Semijoin

Performance

• S3 network bandwidth/latency for Parquet and ORC

• ZSTD and LZ4 for ORC/Parquet

• Skip redundant ORDER BY

• ORDER BY + LIMIT with OUTER JOIN

• IN (SELECT DISTINCT …)

• JOIN involving coercions and inline tables

• Spilling

• Coming soon: UNNEST improvements

• … and more

ROW subscript operator

WITH t(r) AS (
    VALUES ROW(ROW(1, 'a')), ROW(ROW(2, 'b')))
SELECT r[1], r[2] FROM t

r :: row(? smallint, ? varchar(1))

Access field by ordinal

Visualize plan structure

Clearer subplan schema

SELECT max(totalprice)
FROM (
    SELECT totalprice FROM orders ORDER BY orderkey)

Warn on redundant ORDER BY

Pushdown

• Limit

• TableSample

• Filter (simple range predicates)

• Projection (column and ROW field dereference)

• Coming soon:

  • Generalized projections and filters

  • Aggregation

  • Join

https://github.com/prestosql/presto/issues/18
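Pushdown lets a connector evaluate work such as a limit or a simple range filter itself, so fewer rows cross the boundary between the data source and the engine. The Python sketch below models that idea under stated assumptions: `InMemorySource`, `RangeFilter`, and `query_without_pushdown` are hypothetical names for illustration, not part of the Presto SPI, and the sketch only counts rows handed from source to engine.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional

Row = Dict[str, int]


@dataclass
class RangeFilter:
    """Hypothetical simple range predicate: low <= row[column] <= high (bounds optional)."""
    column: str
    low: Optional[int] = None
    high: Optional[int] = None

    def matches(self, row: Row) -> bool:
        value = row[self.column]
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


class InMemorySource:
    """Hypothetical data source that can apply a pushed-down filter and limit itself."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.rows_returned = 0  # rows handed from the source to the engine

    def scan(self, range_filter: Optional[RangeFilter] = None,
             limit: Optional[int] = None) -> Iterator[Row]:
        produced = 0
        for row in self.rows:
            if limit is not None and produced >= limit:
                return  # pushed-down limit reached: stop reading early
            if range_filter is not None and not range_filter.matches(row):
                continue  # pushed-down filter: the row never reaches the engine
            produced += 1
            self.rows_returned += 1
            yield row


def query_without_pushdown(source: InMemorySource, range_filter: RangeFilter,
                           limit: int) -> List[Row]:
    """The engine filters and limits; the source returns every row it reads."""
    result: List[Row] = []
    if limit <= 0:
        return result
    for row in source.scan():
        if range_filter.matches(row):
            result.append(row)
            if len(result) >= limit:
                break
    return result


if __name__ == "__main__":
    predicate = RangeFilter("k", low=10, high=19)
    plain = InMemorySource([{"k": i} for i in range(100)])
    pushed = InMemorySource([{"k": i} for i in range(100)])
    print(query_without_pushdown(plain, predicate, 5) == list(pushed.scan(predicate, 5)))  # True
    print(plain.rows_returned)   # 15
    print(pushed.rows_returned)  # 5
```

Both strategies return the same rows here; the difference is only in how many rows the source hands over. Note that the filter is applied before the limit in both paths, matching the SQL rule that LIMIT applies to the already-filtered result.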

New Plugins

• Elasticsearch connector

• Apache Phoenix connector

• Apache Ranger

• https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin

Other Improvements

• Docker image

• Spill-to-disk improvements

• CLI output formats

• UUID type and functions

• format(), combinations() functions

• ORC bloom filters (non-legacy)

• Connector-provided view definitions

• More type mappings for various connectors

• … and more!

• FETCH FIRST … WITH TIES syntax

• OFFSET syntax

• COMMENT ON <table> IS …

• [LEFT/RIGHT/FULL] JOIN LATERAL (…) ON …

• Pass-through security (client-provided credentials)

• Kerberos security improvements

• Role-based security

• Secure query results in client API

• Current user security mode for views
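The new OFFSET and FETCH FIRST … WITH TIES syntax follows the SQL standard: after ordering, OFFSET skips a number of rows, and WITH TIES extends the fetched rows with any following rows whose ORDER BY key equals that of the last fetched row. The Python sketch below models those semantics over an in-memory list; `offset_fetch` is a hypothetical helper for illustration, not how Presto implements the clauses.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def offset_fetch(rows: Sequence[T], key: Callable[[T], object], offset: int = 0,
                 count: Optional[int] = None, with_ties: bool = False) -> List[T]:
    """Model: ORDER BY key OFFSET offset FETCH FIRST count ROWS {ONLY | WITH TIES}."""
    ordered = sorted(rows, key=key)   # ORDER BY (Python's sort is stable)
    remaining = ordered[offset:]      # OFFSET: skip the first `offset` ordered rows
    if count is None:                 # no FETCH clause: return everything after the offset
        return remaining
    result = remaining[:count]        # FETCH FIRST count ROWS ONLY
    if with_ties and result:
        last_key = key(result[-1])
        # WITH TIES: keep taking rows whose sort key equals the last fetched row's key
        for row in remaining[count:]:
            if key(row) != last_key:
                break
            result.append(row)
    return result


if __name__ == "__main__":
    scores = [("a", 90), ("b", 85), ("c", 85), ("d", 70), ("e", 85)]
    by_score_desc = lambda r: -r[1]  # ORDER BY score DESC
    print(offset_fetch(scores, by_score_desc, count=2))
    # [('a', 90), ('b', 85)]
    print(offset_fetch(scores, by_score_desc, count=2, with_ties=True))
    # [('a', 90), ('b', 85), ('c', 85), ('e', 85)]
```

A stable sort keeps tied rows in input order here; in SQL the order among ties is unspecified, so only the set of rows returned with WITH TIES is meaningful.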

Roadmap

• Dynamic

• Real-world priorities and requirements

• What volunteers work on

• Not a wish list

• https://github.com/prestosql/presto/labels/roadmap

Core Engine

• Case-sensitive identifiers

• Timestamp semantics

• Dynamic filtering

• Dynamically-resolved functions

• SQL-defined functions (CREATE FUNCTION)

• Operator fusion and late materialization

Connectors

• Iceberg (in progress)

• Kinesis (in progress)

• Druid

• Pinot

• ClickHouse

Infrastructure

• Coordinator High Availability

• Spot instances

• Kubernetes

Getting Involved

• Join Slack

• https://prestosql.io/community.html

• #troubleshooting channel

• File issues/bugs:

• https://github.com/prestosql/presto

• Write blog posts

• https://prestosql.io/blog
