olap at toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · challenge •...
TRANSCRIPT
![Page 1: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/1.jpg)
OLAP at [email protected]
![Page 2: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/2.jpg)
Challenge• 14000+ Hive Table
• 20+ billion rows for Top 10 daily partitions (300+ billion rows for Top 1)
• Data Security
• Variety of complex HQLs
• 50+ HS2 instances running on Marathon with Consul (trouble shooting)
![Page 3: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/3.jpg)
Case study
![Page 4: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/4.jpg)
Case study
![Page 5: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/5.jpg)
Case study
![Page 6: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/6.jpg)
Case study
• Operator Priority: and > or
• A and B or C <> A and (B or C)
• PartitionPruner#compactExpr
![Page 7: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/7.jpg)
Open Source Projects
• Apache Hive (HMS + HS2)
• Apache Spark (SQL)
• Presto
• Apache Kylin (Data Cube)
• Apache Sentry (Authorization)
![Page 8: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/8.jpg)
Architecture Overview
Sentry HMS HDFS YARN
HS2 Spark SQL
QueryEditor Priest TEA
Presto Kylin QAP
Other Tools
![Page 9: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/9.jpg)
What we did?
• Introduce Presto
• rewrite HQL to ANSI SQL
• deployed on YARN by slider
• HiveMPP
![Page 10: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/10.jpg)
What we did?
![Page 11: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/11.jpg)
What we did?• Query Analysis Platform (QAP)
• Caching HMS & HDFS RPC to speed up HQL semantic analysis.
• Extract Query Cost Features. (Cardinality Estimation)
• Predict elapsed time for every HQL query. (decision tree regressor)
![Page 12: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/12.jpg)
What we did?
• HMS & HS2 as a Service
• We have 50+ HS2 instances
• Emit metrics for Every HS2 RPC call.
![Page 13: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/13.jpg)
What we did?
Metrics for Every HS2 RPC call
![Page 14: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/14.jpg)
What we did?
• Introduce Apache Sentry
• Integrated into our people system.
• Work at HS2/HMS/CLI as an authorization plugin.
• Hook: preAnalyze && postAnalyze
![Page 15: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/15.jpg)
What we did?
![Page 16: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/16.jpg)
What we did?
• Category query error log
• Client or Server ?
• parse / validate / execute ?
![Page 17: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/17.jpg)
What we did?
![Page 18: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/18.jpg)
What we did?
![Page 19: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/19.jpg)
What we did?• Introduce Apache Kylin to speed up multi-
dimensional analytics.
• improve cuboid spanning algorithm
• CuboidJob is triggered by chronos task
• auto resume CuboidJob
• popcnt & lzcnt
![Page 20: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/20.jpg)
What we did?
![Page 21: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/21.jpg)
Future work
• Integrate Hive & Spark SQL
• Identify and refuse bad query
• Auto suggestion for HQL
• DevOps improvement
![Page 22: OLAP at Toutiao 杨朝中bos.itdks.com › 0400dd4f77854580b1cdb6c7226ca088.pdf · Challenge • 14000+ Hive Table • 20+ billion rows for Top 10 daily partitions (300+ billion rows](https://reader036.vdocuments.mx/reader036/viewer/2022062921/5f0346e67e708231d4086a72/html5/thumbnails/22.jpg)
We are hiring!
https://job.toutiao.com