Building Multi-Petabyte Data Warehouses with ClickHouse
Alexander Zaitsev
LifeStreet, Altinity
Percona Live Dublin, 2017
Altinity

Who am I
• Graduated from Moscow State University in 1999
• Software engineer since 1997
• Developed distributed systems since 2002
• Focused on high performance analytics since 2007
• Director of Engineering at LifeStreet
• Co-founder of Altinity

• AdTech company (ad exchange, ad server, RTB, DMP etc.) since 2006
• 10,000,000,000+ events/day
• 2K/event
• 3 months retention (90-120 days)
10B * 2K * [90-120] = [1.8-2.4] PB
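The sizing estimate above can be checked with a few lines of Python. This is only a sketch: it assumes "2K" means 2,000 bytes per event and that PB is the decimal petabyte (10^15 bytes), neither of which the slide states, and the names are illustrative.

```python
# Back-of-the-envelope raw data volume: events/day * bytes/event * retention days.
# Assumptions (not stated on the slide): 2K = 2,000 bytes, 1 PB = 10**15 bytes.
EVENTS_PER_DAY = 10_000_000_000
BYTES_PER_EVENT = 2_000
PETABYTE = 10**15


def raw_volume_pb(retention_days: int) -> float:
    """Return the uncompressed data volume in decimal petabytes."""
    return EVENTS_PER_DAY * BYTES_PER_EVENT * retention_days / PETABYTE


for days in (90, 120):
    print(f"{days} days -> {raw_volume_pb(days):.1f} PB")
```

Under the same assumptions, one day of events comes to 20 TB.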

• Tried/used/evaluated:
– MySQL (TokuDB, ShardQuery)
– InfiniDB
– MonetDB
– InfoBright EE
– Paraccel (now RedShift)
– Oracle
– Greenplum
– Snowflake DB
– Vertica
ClickHouse

Flashback: ClickHouse at 08/2016
• 1-2 months in Open Source
• Internal Yandex product – no other installations
• No support, roadmap, communicated plans
• 3 official devs
• A number of visible limitations (and many invisible)
• Stories of other doomed open-sourced DBs

Develop production system with “that”?

ClickHouse is/was missing:
• Transactions
• Constraints
• Consistency
• UPDATE/DELETE
• NULLs (added few months ago)
• Milliseconds
• Implicit type conversions
• Standard SQL support
• Partitioning by any column (date only)
• Enterprise operation tools

SQL developers' reaction:

But we tried and succeeded

Before you go:
✓ Confirm your use case
✓ Check benchmarks
✓ Run your own
✓ Consider limitations, not features
✓ Make a POC

Migration problem: basic things do not fit

Main Challenges
• Efficient schema
– Use ClickHouse's best features
– Work around limitations
• Reliable data ingestion
• Sharding and replication
• Client interfaces

LifeStreet Use Case
• Publisher/Advertiser performance
• Campaign/Creative performance prediction
• Realtime algorithmic bidding
• DMP

LifeStreet Requirements
• Load 10B rows/day, 500 dimensions/row
• Ad-hoc reports on 3 months of data
• Low data and query latency
• High Availability

Multi-Dimensional Analysis
N-dimensional cube
M-dimensional projection
slice
OLAP query: aggregation + filter + group by
Range filter
Query result
Disclaimer: averages lie
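The "aggregation + filter + group by" shape of an OLAP query can be sketched in plain Python. The fact rows, column names and function below are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, country, campaign, impressions).
facts = [
    (1, "US", "a", 100),
    (1, "DE", "a", 40),
    (2, "US", "b", 70),
    (3, "US", "a", 30),
    (3, "FR", "b", 20),
]


def olap_query(rows, day_from, day_to):
    """Sum impressions per country for day_from <= day <= day_to."""
    totals = defaultdict(int)
    for day, country, _campaign, imps in rows:
        if day_from <= day <= day_to:   # range filter: selects a slice of the cube
            totals[country] += imps     # group by one dimension, aggregate the metric
    return dict(totals)


print(olap_query(facts, 1, 2))
```

Grouping by country while ignoring campaign corresponds to projecting the cube onto fewer dimensions.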

Typical schema: “star”
• Facts
• Dimensions
• Metrics
• Projections

Star Schema Approach
De-normalized: dimensions in a fact table
• Single table, simple
• Simple queries, no joins
• Data cannot be changed
• Less efficient storage
• Less efficient queries
Normalized: dimension keys in a fact table, separate dimension tables
• Multiple tables
• More complex queries with joins
• Data in dimension tables can be changed
• Efficient storage
• More efficient queries

Normalized schema: traditional approach - joins
• Limited support in ClickHouse (1 level, cascade sub-selects for multiple)
• Dimension tables are not updatable

Dictionaries - ClickHouse dimensions approach
• Lookup service: key -> value
• Supports different external sources (files, databases etc.)
• Refreshable

Dictionaries. Example
SELECT country_name, sum(imps) FROM T ANY INNER JOIN dim_geo USING (geo_key) GROUP BY country_name;
vs
SELECT dictGetString('dim_geo', 'country_name', geo_key) country_name, sum(imps) FROM T GROUP BY country_name;

Dictionaries. Configuration
<dictionary>
<name></name>
<source>…</source>
<lifetime>...</lifetime>
<layout>…</layout>
<structure>
<id>...</id>
<attribute>...</attribute>
<attribute>...</attribute>
...
</structure>
</dictionary>

Dictionaries. Sources
• file
• mysql table
• clickhouse table
• odbc data source
• executable script
• http service

Dictionaries. Layouts
• flat
• hashed
• cache
• complex_key_hashed
• range_hashed

Dictionaries. range_hashed
• ‘Effective Dated’ queries
<layout>
<range_hashed/>
</layout>
<structure>
<id>
<name>id</name>
</id>
<range_min>
<name>start_date</name>
</range_min>
<range_max>
<name>end_date</name>
</range_max>
dictGetFloat32('srv_ad_serving_costs','ad_imps_cpm',toUInt64(0),event_day)
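What an "effective dated" lookup does can be sketched in Python as a hash map from id to a list of date ranges. This is an illustration only, not ClickHouse's implementation: the rows, the values and the function names are hypothetical, and treating both range bounds as inclusive is an assumption.

```python
from datetime import date

# Hypothetical effective-dated rows: (id, start_date, end_date, cpm).
COST_ROWS = [
    (0, date(2017, 1, 1), date(2017, 6, 30), 0.10),
    (0, date(2017, 7, 1), date(2017, 12, 31), 0.12),
]


def build_range_dict(rows):
    """Group the date ranges by id: a hash map of range lists."""
    index = {}
    for key, start, end, value in rows:
        index.setdefault(key, []).append((start, end, value))
    return index


def range_lookup(index, key, day, default=0.0):
    """Return the value whose [start, end] range contains day, else default."""
    for start, end, value in index.get(key, []):
        if start <= day <= end:  # assumption: both bounds are inclusive
            return value
    return default


costs = build_range_dict(COST_ROWS)
print(range_lookup(costs, 0, date(2017, 9, 1)))
```

The lookup takes the same two arguments as the dictGetFloat32 call above: an id and the date for which the value should be effective.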

Dictionaries. Update values
• By timer (default)
• Automatic for MySQL MyISAM
• Using ‘invalidate_query’
• Manually touching config file
• Warning: N dict * M nodes = N * M DB connections

Dictionaries. Restrictions
• ‘Normal’ keys are only UInt64
• No on demand update (added in Sep 2017, 1.1.54289)
• Every cluster node has its own copy
• XML config (DDL would be better)

Dictionaries vs. Tables
+ No JOINs
+ Updatable
+ Always in memory for flat/hash (faster)
- Not a part of the schema
- Somewhat inconvenient syntax
Tables
• Engines
• Sharding/Distribution
• Replication

Engine = ?
• In memory:
– Memory
– Buffer
– Join
– Set
• On disk:
– Log, TinyLog
– MergeTree family
• Interface:
– Distributed
– Merge
– Dictionary
• Special purpose:
– View
– Materialized View
– Null

MergeTree
• What is ‘merge’
• PK sorting and index
• Date partitioning
• Query performance
Block 1   Block 2
Merged block
PK index
See details at: https://medium.com/@f1yegor/clickhouse-primary-keys-2cf2a45d7324
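The "Block 1 + Block 2 -> Merged block" picture, with a sparse primary-key index on top, can be sketched as follows. This is a toy model under stated assumptions: blocks are lists of (key, value) rows already sorted by key, and the index granularity of 2 is chosen for illustration and is not a ClickHouse setting.

```python
import heapq

INDEX_GRANULARITY = 2  # tiny for illustration; an assumption, not a ClickHouse default


def merge_blocks(*blocks):
    """Merge blocks that are each already sorted by primary key."""
    return list(heapq.merge(*blocks))


def sparse_index(block, granularity=INDEX_GRANULARITY):
    """Keep the key of every granularity-th row, together with its row offset."""
    return [(block[i][0], i) for i in range(0, len(block), granularity)]


block1 = [(1, "a"), (4, "d"), (7, "g")]
block2 = [(2, "b"), (3, "c"), (9, "i")]
merged = merge_blocks(block1, block2)
print(merged)
print(sparse_index(merged))
```

Because the merged block stays sorted, a query on a key range only needs to read the rows between two neighbouring index marks.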

MergeTree family
[Replicated] + [Replacing | Collapsing | Summing | Aggregating | Graphite] + MergeTree

Data Load
• Multiple formats are supported, including CSV, TSV, JSONs, native binary
• Error handling
• Simple Transformations
• Load locally (better) or distributed (possible)
• Temp tables help
• Replicated tables help with de-dup
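One way to picture insert de-duplication is a table that remembers a digest of every block it has accepted and silently skips a retry of the same block. The slide does not describe the mechanism, so the class below is a hypothetical sketch of the idea, not ClickHouse code.

```python
import hashlib


class DedupTable:
    """Toy table that ignores an inserted block it has already seen."""

    def __init__(self):
        self.rows = []
        self._seen_blocks = set()

    def insert_block(self, block):
        """Append the block unless an identical block was inserted before."""
        digest = hashlib.sha256(repr(block).encode("utf-8")).hexdigest()
        if digest in self._seen_blocks:
            return False  # retry of the same block: skip it
        self._seen_blocks.add(digest)
        self.rows.extend(block)
        return True


table = DedupTable()
batch = [(1, "a"), (2, "b")]
table.insert_block(batch)
table.insert_block(batch)  # e.g. a loader retry after a timeout
print(len(table.rows))
```

With this behaviour a loader can retry a failed batch without checking whether the first attempt actually landed.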

The power of Materialized Views
• MV is a table, i.e. engine, replication etc.
• Updated synchronously
• Summing/AggregatingMergeTree – consistent aggregation
• Alters are problematic
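"Updated synchronously" means the aggregate is maintained as part of the insert itself, so the view never lags behind the fact table. A minimal Python sketch of that idea, with a hypothetical class and columns (day, country, imps):

```python
class FactTableWithMV:
    """Toy fact table whose summing view is updated inside insert()."""

    def __init__(self):
        self.facts = []      # raw rows: (day, country, imps)
        self.mv_totals = {}  # view: (day, country) -> summed imps

    def insert(self, rows):
        for day, country, imps in rows:
            self.facts.append((day, country, imps))
            key = (day, country)
            # The view is maintained as part of the same insert call.
            self.mv_totals[key] = self.mv_totals.get(key, 0) + imps


fact_table = FactTableWithMV()
fact_table.insert([(1, "US", 10), (1, "US", 5), (1, "DE", 3)])
print(fact_table.mv_totals)
```

In this toy model the sums are applied immediately; it does not try to model how or when a summing engine collapses rows on disk.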

Data Load Diagram
[Diagram of a ClickHouse node. Labels: Log Files, Realtime producers, INSERT, Temp tables (local), Buffer tables (local), Buffer flush, Fact tables (shard), MV, SummingMergeTree (shard), MySQL, Dictionaries]

Updates and deletes
• Dictionaries are refreshable
• Replacing and Collapsing merge trees
– eventual updates
– SELECT … FINAL
• Partitions
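The "eventual updates" idea behind a replacing engine can be sketched as: an update is just a newer row for the same key, and a FINAL-style read keeps only the newest row per key. The row layout (key, version, value) and the function name are assumptions made for this illustration.

```python
def select_final(rows):
    """Return one value per key, keeping the row with the highest version."""
    latest = {}
    for key, version, value in rows:
        if key not in latest or version >= latest[key][0]:
            latest[key] = (version, value)
    return {key: value for key, (_version, value) in latest.items()}


# Hypothetical rows: (key, version, value). An "update" is just a newer insert.
rows = [(1, 1, "draft"), (2, 1, "new"), (1, 2, "published")]
print(select_final(rows))
```

Until older rows are physically removed, a plain read can still see both versions of key 1, which is why the de-duplicating read is a separate step.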

Sharding and Replication
• Sharding and Distribution => Performance
– Fact tables and MVs – distributed over multiple shards
– Dimension tables and dicts – replicated at every node (local joins and filters)
• Replication => Reliability
– 2-3 replicas per shard
– Cross DC

Distributed Query
SELECT foo FROM distributed_table GROUP BY col1
Server 1, 2 or 3
SELECT foo FROM local_table GROUP BY col1
• Server 1
SELECT foo FROM local_table GROUP BY col1
• Server 2
SELECT foo FROM local_table GROUP BY col1
• Server 3
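The scatter-gather pattern on this slide can be sketched in Python: the initiating server sends the same local query to every shard and merges the partial results. The shard contents are hypothetical, and sum(value) stands in for the unspecified "foo" aggregate.

```python
from collections import Counter

# Hypothetical shards: each holds its own local rows of (col1, value).
SHARDS = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def local_query(rows):
    """Partial aggregation on one shard: sum(value) GROUP BY col1."""
    partial = Counter()
    for col1, value in rows:
        partial[col1] += value
    return partial


def distributed_query(shards):
    """Initiator: run the local query on every shard and merge the partials."""
    total = Counter()
    for rows in shards:
        total.update(local_query(rows))  # Counter.update adds the partial sums
    return dict(total)


print(distributed_query(SHARDS))
```

Merging partial sums works because addition is associative; an aggregate such as an exact distinct count would need a richer partial state than a single number.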

Replication
• Per table topology configuration:
– Dimension tables – replicate to any node
– Fact tables – replicate to mirror replica
• ZooKeeper to communicate the state
– State: what blocks/parts to replicate
• Asynchronous => faster and reliable enough
• Synchronous => slower
• Isolate query to replica
• Replication queues

SQL
• Supports basic SQL syntax
• Non-standard JOINs implementation:
– 1 level only
– ANY vs ALL
– only USING
• Aliasing everywhere
• Array and nested data types, lambda-expressions, ARRAY JOIN
• GLOBAL IN, GLOBAL JOIN
• Approximate queries
• Some analytical functions

Hardware and Deployment
• Load is CPU intensive => more cores
• Query is disk intensive => faster disks
• 10-12 SATA RAID10
– SAS/SSD => x2 performance for x2 price for x0.5 capacity
• 10 TB/server seems optimal
• ZooKeeper – keep in one DC for fast quorum
• Remote DCs work badly (e.g. East and West coast in US)

Main Challenges Revisited
• Design efficient schema
– Use ClickHouse's best features
– Work around limitations
• Design sharding and replication
• Reliable data ingestion
• Client interfaces

Migration project timelines
• August 2016: POC
• October 2016: first test runs
• December 2016: production scale data load:
– 10-50B events/day, 20TB data/day
– 12 x 2 servers with 12 x 4TB RAID10
• March 2017: Client API ready, starting migration
– 30+ client types, 20 req/s query load
• May 2017: extension to 20 x 3 servers
• June 2017: migration completed!
– 2-2.5PB uncompressed data

Few examples
:) select count(*) from dw.ad8_fact_event where access_day=today()-1;

SELECT count(*)
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)

┌────count()─┐
│ 7585106796 │
└────────────┘

1 rows in set. Elapsed: 0.503 sec. Processed 12.78 billion rows, 25.57 GB (25.41 billion rows/s., 50.82 GB/s.)

:) select dictGetString('dim_country', 'country_code', toUInt64(country_key)) country_code, count(*) cnt from dw.ad8_fact_event where access_day=today()-1 group by country_code order by cnt desc limit 5;

SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    count(*) AS cnt
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)
GROUP BY country_code
ORDER BY cnt DESC
LIMIT 5

┌─country_code─┬────────cnt─┐
│ US           │ 2159011287 │
│ MX           │  448561730 │
│ FR           │  433144172 │
│ GB           │  352344184 │
│ DE           │  336479374 │
└──────────────┴────────────┘

5 rows in set. Elapsed: 2.478 sec. Processed 12.78 billion rows, 55.91 GB (5.16 billion rows/s., 22.57 GB/s.)

:) SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    sum(cnt) AS cnt
FROM
(
    SELECT country_key, count(*) AS cnt
    FROM dw.ad8_fact_event
    WHERE access_day = (today() - 1)
    GROUP BY country_key
    ORDER BY cnt DESC
    LIMIT 5
)
GROUP BY country_code
ORDER BY cnt DESC

┌─country_code─┬────────cnt─┐
│ US           │ 2159011287 │
│ MX           │  448561730 │
│ FR           │  433144172 │
│ GB           │  352344184 │
│ DE           │  336479374 │
└──────────────┴────────────┘

5 rows in set. Elapsed: 1.471 sec. Processed 12.80 billion rows, 55.94 GB (8.70 billion rows/s., 38.02 GB/s.)

:) SELECT
    countDistinct(name) AS num_cols,
    formatReadableSize(sum(data_compressed_bytes) AS c) AS comp,
    formatReadableSize(sum(data_uncompressed_bytes) AS r) AS raw,
    c / r AS comp_ratio
FROM lf.columns
WHERE table = 'ad8_fact_event_shard'

┌─num_cols─┬─comp───────┬─raw──────┬──────────comp_ratio─┐
│      308 │ 325.98 TiB │ 4.71 PiB │ 0.06757640834769944 │
└──────────┴────────────┴──────────┴─────────────────────┘

1 rows in set. Elapsed: 0.289 sec. Processed 281.46 thousand rows, 33.92 MB (973.22 thousand rows/s., 117.28 MB/s.)
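The reported comp_ratio can be cross-checked from the two rounded sizes in the output. This sketch assumes 1 PiB = 1024 TiB; because the printed sizes are rounded, the result only approximates the exact ratio shown in the table.

```python
TIB_PER_PIB = 1024

compressed_tib = 325.98        # "comp" column, in TiB
raw_tib = 4.71 * TIB_PER_PIB   # "raw" column, converted from PiB to TiB

ratio = compressed_tib / raw_tib
print(f"comp_ratio ~ {ratio:.4f}, i.e. about {1 / ratio:.1f}x smaller on disk")
```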

ClickHouse at fall 2017
• 1+ year Open Source
• 100+ prod installs worldwide
• Public changelogs, roadmap, and plans
• 5+2 Yandex devs, few community contributors
• Active community, blogs, case studies
• A lot of features added by community requests
• Support by Altinity

So now it is much easier

ClickHouse and MySQL
• MySQL is widespread but weak for analytics
– TokuDB, InfiniDB somewhat help
• ClickHouse is best in analytics
How to combine?
Imagine
MySQL flexibility at ClickHouse speed?
Dreams….

ClickHouse with MySQL
• ProxySQL to access ClickHouse data via MySQL protocol (more at the next session)
• Binlogs integration to load MySQL data in ClickHouse in realtime (in progress)
[Diagram labels: MySQL, CH, ProxySQL, binlog consumer]

ClickHouse instead of MySQL
• Web logs analytics
• Monitoring data collection and analysis
– Percona’s PMM
– Infinidat InfiniMetrics
• Other time series apps
• ..and more!