© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

SEIZE THE DATA. 2015

VERTICA PERFORMANCE TUNING

Practical Lessons

Curtis Bennett, Vertica Professional Services

August 10, 2015

Logical / Physical Modeling

Denormalize!

It's the Data Model, Stupid!

• Do not offload your relational, third-normal-form model straight into Vertica. Bad idea.

• Get rid of any 1:1 relationships that exist in your row-oriented database only because a table got too wide; merge them back into a single table

• It may be advantageous to separate out "intelligent" fields such as credit cards or phone numbers to increase the amount of RLE

− where phone_number = '123-456-7890'

becomes

− where area_code = '123' and phone_number = '123-456-7890'
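The split above might look like the following in practice. This is only a sketch: the customers table, the customers_by_phone projection, and the column sizes are illustrative assumptions, not taken from the talk.

```sql
-- Hypothetical table: the area code lives in its own low-cardinality column.
CREATE TABLE customers (
    customer_id  INT NOT NULL,
    area_code    CHAR(3),
    phone_number CHAR(12)
);

-- Sorting on area_code first produces long runs of equal values,
-- which is what RLE needs to compress well.
CREATE PROJECTION customers_by_phone (
    area_code ENCODING RLE,
    phone_number,
    customer_id
) AS
SELECT area_code, phone_number, customer_id
FROM customers
ORDER BY area_code, phone_number
SEGMENTED BY HASH(customer_id) ALL NODES;

-- The extra predicate lets the scan narrow to one area-code run
-- before the full phone number is compared.
SELECT customer_id
FROM customers
WHERE area_code = '123'
  AND phone_number = '123-456-7890';
```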

Partitioning

Benefits of Partitioning include:

• Ability to drop partitions easily

− No delete vectors

− Very fast

• Ignore huge chunks of irrelevant information, known as "Partition Pruning"

• Take advantage of some powerful functions:

− SWAP_PARTITIONS_BETWEEN_TABLES

− MOVE_PARTITIONS_TO_TABLE

• Increased optimizer parallelism

Partitioning by dates works best. Avoid complex partition keys such as modulus values.

Finding something by knowing where it is not
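A minimal sketch of date-based partitioning follows. The sales and sales_archive tables are hypothetical, and DROP_PARTITION is the function name from Vertica releases of this era (later releases may use a different name), so check the documentation for your version.

```sql
-- Hypothetical fact table partitioned by a year-month value derived from the date.
CREATE TABLE sales (
    sale_id   INT NOT NULL,
    sale_date DATE NOT NULL,
    amount    NUMERIC(12,2)
)
PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

-- A date-range predicate lets the optimizer skip partitions outside the range.
SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2015-07-01' AND '2015-07-31';

-- Move July 2015 into a hypothetical archive table with the same definition.
SELECT MOVE_PARTITIONS_TO_TABLE('sales', '201507', '201507', 'sales_archive');

-- Drop an old month outright; no delete vectors are created.
SELECT DROP_PARTITION('sales', 201401);
```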

Pre-Join Projections

There are some very good use-cases

Pre-Join projections can solve the problem of inefficient GROUP BY operations when the GROUP BY list spans tables and you need a single set of ordered columns to enable a pipelined GROUP BY (shown as GROUPBY PIPELINED in the query plan)

Caveats:

• Slight load penalty

• Enforces referential integrity

• Cascades deletes (due to RI)

Live Aggregate Projections

• Can replace the creation of aggregation tables

• Projection which is maintained and aggregated on the fly as data is loaded

• Supports:

− COUNT

− MAX

− MIN

− SUM

− Combinations of any of the above

• Restrictions apply, check the documentation

• See also Top-K projections and Projections with Expressions
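As an illustration of the idea, a live aggregate projection could be sketched as below. The clicks table and its columns are assumptions made for the example, and the restrictions mentioned above (for instance on the anchor table and its existing projections) still apply.

```sql
-- Hypothetical anchor table of click events.
CREATE TABLE clicks (
    user_id       INT,
    page_id       INT NOT NULL,
    dwell_seconds INT
);

-- Live aggregate projection: per-page totals are maintained as rows are loaded,
-- so no separate aggregation table has to be rebuilt.
CREATE PROJECTION clicks_agg AS
SELECT page_id,
       COUNT(*)           AS num_clicks,
       SUM(dwell_seconds) AS total_dwell
FROM clicks
GROUP BY page_id;

-- A query with the same grouping can be answered from the pre-aggregated data.
SELECT page_id, COUNT(*) AS num_clicks
FROM clicks
GROUP BY page_id;
```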

Some More Physical Model Tidbits

• Speaking of Referential Integrity, don't forget to create the Primary Key and Foreign Key constraints on your tables

− Prevents the optimizer from making a poor decision by flipping an Inner/Outer join

• Don't use data types that are larger than necessary

− NUMBER defaults to NUMBER(38), which consumes 3 binary words. NUMBER(37) consumes 2 binary words and thus would be slightly faster

− CHAR(1) is not as efficient as a BOOLEAN type

− Don't bloat your VARCHARs. If VARCHAR(200) is sufficient, don't make it VARCHAR(1000) just to be safe; you'll add excess processing time to your queries, as much as a 20% overhead

• Joining on INT types is WAY faster than joining on large CHAR values.
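The tidbits above could be applied roughly as follows. Table and column names are hypothetical and the sizes are placeholders to be fitted to real data.

```sql
-- Hypothetical dimension table with a declared primary key.
CREATE TABLE dim_customer (
    customer_id INT NOT NULL PRIMARY KEY,
    is_active   BOOLEAN,         -- rather than CHAR(1) holding 'Y'/'N'
    name        VARCHAR(200)     -- sized to the data, not VARCHAR(1000) "to be safe"
);

-- Hypothetical fact table: the foreign key tells the optimizer which side
-- of the join is the dimension, and the join column is an INT.
CREATE TABLE fact_orders (
    order_id    INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES dim_customer (customer_id),
    amount      NUMERIC(12,2)    -- explicit precision instead of the NUMBER default
);
```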

PROJECTIONS

Replication vs. Segmentation

• Replicate dimensions, segment facts

• In a large cluster (somewhere north of 20 nodes), segment almost everything

• If speed is of paramount importance, replicate

− A single node cluster is faster than a 3 node cluster, but obviously doesn't scale

• Define your segmentation keys with simplicity and consistency in mind

− The key should be unique, or nearly unique

− The segmentation value should be consistently applied in order to facilitate local joins as often as possible

Remember that if left to its own devices, Vertica will choose to segment by default.
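A short sketch of the two choices, using hypothetical store_dim and sales_fact tables:

```sql
CREATE TABLE store_dim  (store_id INT NOT NULL, region VARCHAR(32));
CREATE TABLE sales_fact (sale_id INT NOT NULL, store_id INT NOT NULL, amount NUMERIC(12,2));

-- Small dimension: replicated, so every node holds a full copy
-- and joins to it never leave the node.
CREATE PROJECTION store_dim_rep AS
SELECT store_id, region
FROM store_dim
ORDER BY store_id
UNSEGMENTED ALL NODES;

-- Large fact: segmented, with rows spread across nodes by a hash
-- of a unique (or nearly unique) key.
CREATE PROJECTION sales_fact_seg AS
SELECT sale_id, store_id, amount
FROM sales_fact
ORDER BY store_id, sale_id
SEGMENTED BY HASH(sale_id) ALL NODES;
```

Two large tables that are frequently joined would normally be segmented on the same join key so that matching rows land on the same node.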

The ORDER BY Clause

Two primary methods:

• ORDER BY low cardinality to high cardinality, ending with the primary key

• Promotes great RLE and generates good compression and performance

• ORDER BY for predicate-based lookups

− Lead the sort order with the columns used in predicates

− Predicate columns first, then join columns: super fast
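The two methods might be sketched as two projections on one hypothetical events table. The cardinality comments are assumptions about the example data.

```sql
CREATE TABLE events (
    event_id   INT NOT NULL,    -- primary key, highest cardinality
    event_type VARCHAR(16),     -- a handful of distinct values
    country    CHAR(2),         -- a few hundred distinct values
    event_date DATE
);

-- Method 1: sort from low to high cardinality, ending with the key.
CREATE PROJECTION events_rle AS
SELECT event_type, country, event_date, event_id
FROM events
ORDER BY event_type, country, event_date, event_id
SEGMENTED BY HASH(event_id) ALL NODES;

-- Method 2: lead with the column most often used in WHERE clauses.
CREATE PROJECTION events_by_date AS
SELECT event_date, event_type, country, event_id
FROM events
ORDER BY event_date, event_type
SEGMENTED BY HASH(event_id) ALL NODES;
```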

Don't Overlook Encoding!

Encoding & Compression

• Vertica supports nearly a dozen kinds of column encoding

• A powerful feature of the columnar architecture

• Having good compression can result in tremendous performance gains

• Don't be afraid to experiment with different encoding types if performance is critical

• Sometimes AUTO actually works really well

• Let the Database Designer decide which encoding to use

• Familiarize yourself with the function: DESIGNER_DESIGN_PROJECTION_ENCODINGS()

Database Designer & Correlations

Pro Tip:

Check out the function ANALYZE_CORRELATIONS()

In order to generate correlations, you must run Database Designer manually through API calls directly in VSQL.

See the following functions to get started:

• DESIGNER_CREATE_DESIGN

• DESIGNER_ADD_DESIGN_TABLES

• DESIGNER_SET_ANALYZE_CORRELATIONS_MODE

• DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY

Correlation statistics replace regular statistics, so be careful not to create regular statistics on your table if correlated statistics are more optimal
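A rough sketch of how the four calls might be chained is shown below. The design name, table pattern, and output paths are made up for illustration, and the argument lists and the meaning of the mode value are recalled from the 7.x documentation rather than stated in the talk, so verify them against your version before running. Note that the last call may deploy the design by default.

```sql
-- Create an empty design workspace (hypothetical name).
SELECT DESIGNER_CREATE_DESIGN('corr_design');

-- Add every table in the public schema to the design.
SELECT DESIGNER_ADD_DESIGN_TABLES('corr_design', 'public.*');

-- Ask the designer to analyze column correlations
-- (mode 3 is assumed here to mean "analyze all").
SELECT DESIGNER_SET_ANALYZE_CORRELATIONS_MODE('corr_design', 3);

-- Populate the design and write the proposed DDL and deployment scripts.
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
    'corr_design',
    '/tmp/corr_design_projections.sql',
    '/tmp/corr_design_deploy.sql'
);
```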

Miscellaneous

• Speaking of Database Designer - USE IT!

• Feel free to experiment with projection design if performance is critical

• Replace the table name with the projection name in the query to test different projection designs

• Projections that are no longer used should be removed -> fewer projections = fewer choices = slightly faster queries

Probably 90% of all performance-related problems are solvable with good projection design

EXPLAIN your queries. If performance is key, avoid HASH JOINS and HASH GROUP BYs.

Avoid RESEGMENTATION and BROADCAST at all cost!

Don't forget to update statistics!
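The checks above can be tried with statements along these lines. The table and projection are hypothetical; on a K-safe cluster the deployed projection name may carry a suffix, so look it up in the catalog first.

```sql
CREATE TABLE sales_fact (sale_id INT NOT NULL, store_id INT NOT NULL, amount NUMERIC(12,2));

CREATE PROJECTION sales_fact_by_store AS
SELECT store_id, sale_id, amount
FROM sales_fact
ORDER BY store_id
SEGMENTED BY HASH(sale_id) ALL NODES;

-- Read the plan and look for HASH joins or group-bys, RESEGMENT and BROADCAST.
EXPLAIN
SELECT store_id, SUM(amount)
FROM sales_fact
GROUP BY store_id;

-- Refresh optimizer statistics for the table.
SELECT ANALYZE_STATISTICS('sales_fact');

-- Name the projection instead of the table to test a specific design.
SELECT store_id, SUM(amount)
FROM sales_fact_by_store
GROUP BY store_id;
```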

QUERIES

What NOT to do

• Avoid the use of IN() clauses to produce a set of keys values in a sub-query for the use in an outer query

• Avoid UNION statements inside a subquery

• Don’t select more than you need - in a columnar database, there is a cost associated with selecting lots of columns

• Don't go crazy with analytics when a simple aggregate will do

• Avoid inequality or negation predicates: !=, <>, >=, <= are all inefficient

• LIKE and ILIKE are slow. If possible, avoid % at the beginning of the string, e.g., use query ilike 'select%' instead of query ilike '%select%'

• Avoid OR, if possible

• Replace GROUP BY 1,2,3 or ORDER BY 1,2,3 with the actual column names, especially in production code
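Two of these points can be shown as before-and-after rewrites. The orders and vip_customers tables are hypothetical, and the join form returns the same rows as the IN form only when vip_customers.customer_id is unique.

```sql
CREATE TABLE orders        (order_id INT NOT NULL, customer_id INT NOT NULL, status VARCHAR(16));
CREATE TABLE vip_customers (customer_id INT NOT NULL);

-- Instead of feeding keys to the outer query through IN():
--   SELECT o.order_id FROM orders o
--   WHERE o.customer_id IN (SELECT customer_id FROM vip_customers);
-- express the same filter as a join.
SELECT o.order_id
FROM orders o
JOIN vip_customers v ON v.customer_id = o.customer_id;

-- Instead of GROUP BY 1, 2 / ORDER BY 1, 2, name the columns
-- and select only the columns that are needed.
SELECT o.customer_id, o.status, COUNT(*) AS order_count
FROM orders o
GROUP BY o.customer_id, o.status
ORDER BY o.customer_id, o.status;
```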

Query Experimentation

• Try WITH CLAUSE materialization

− SELECT add_vertica_options('OPT', 'ENABLE_WITH_CLAUSE_MATERIALIZATION');

− SELECT clr_vertica_options('OPT', 'ENABLE_WITH_CLAUSE_MATERIALIZATION');

• Play with the Syntactic Optimizer

− SELECT /* +syntactic_optimizer */ col1, col2 from table1 …

• Try adding an ORDER BY clause to your sub-query, especially if it lets the outer query get a MERGE JOIN; it may be worth the cost of the sort

• Break the query up into sub-queries - Vertica can only choose one projection for each query, so having multiple sub-queries in a SQL can sometimes provide the optimizer with additional options

• Familiarize yourself with the analytic functions - perhaps you've coded something brute force that an analytic can solve more elegantly
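As one example of the last point, "latest order per customer" is often coded as a join back to a per-customer MAX, where an analytic function does it in a single pass. The orders table here is hypothetical. The two forms differ on ties: the join returns every row sharing the latest timestamp, while ROW_NUMBER keeps exactly one.

```sql
CREATE TABLE orders (
    order_id    INT NOT NULL,
    customer_id INT NOT NULL,
    order_ts    TIMESTAMP NOT NULL
);

-- Brute force: aggregate, then join back to the same table.
SELECT o.customer_id, o.order_id
FROM orders o
JOIN (
    SELECT customer_id, MAX(order_ts) AS max_ts
    FROM orders
    GROUP BY customer_id
) m ON m.customer_id = o.customer_id
   AND m.max_ts = o.order_ts;

-- Analytic alternative: rank rows within each customer and keep the first.
SELECT customer_id, order_id
FROM (
    SELECT customer_id,
           order_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
    FROM orders
) ranked
WHERE rn = 1;
```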

Pinned Projections

Pro Tip:

• CREATE GLOBAL TEMP TABLE foo(i int, j int) NO PROJECTION;

• CREATE PROJECTION foo_p(i, j) AS SELECT i, j FROM foo ORDER BY j PINNED;

Creates a temp table ONLY on the initiator node

Useful for staging tables and working tables

VERY fast

Temporary tables are faster than regular ones, whether they are Pinned or not.

INTERNALS

Resource Manager

Take advantage of the Resource Manager

• Increase the PlannedConcurrency in order to decrease budgeted RAM

− Different Resource Pools may have different footprints

− Compare the counter_name values for 'memory allocated' and 'memory reserved' in execution_engine_profiles

• Lower ExecutionParallelism

− If hyper-threading is enabled, the defaults here are very high because they are based on logical core counts, not physical

− Even if hyper-threading is off, it should usually be about 2/3rds of the default

− Take advantage of Cascading Pools

− If a query is important, raise the Priority to HIGH
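A sketch of these settings on a pair of hypothetical pools follows. The pool names and every numeric value are placeholders, not recommendations from the talk. The HIGH setting is expressed here through the RUNTIMEPRIORITY parameter, which is an assumption about which priority the slide means.

```sql
-- Hypothetical secondary pool that long-running queries cascade into.
CREATE RESOURCE POOL reports_long_pool
    PLANNEDCONCURRENCY 4;

-- Hypothetical primary pool: a higher PLANNEDCONCURRENCY gives each query
-- a smaller memory budget; EXECUTIONPARALLELISM is set below the default.
CREATE RESOURCE POOL reports_pool
    PLANNEDCONCURRENCY 16
    EXECUTIONPARALLELISM 8
    RUNTIMEPRIORITY HIGH
    RUNTIMECAP '1 minute'
    CASCADE TO reports_long_pool;

-- Compare reserved and allocated memory for profiled queries.
SELECT node_name, counter_name, MAX(counter_value) AS max_bytes
FROM v_monitor.execution_engine_profiles
WHERE counter_name IN ('memory reserved (bytes)', 'memory allocated (bytes)')
GROUP BY node_name, counter_name;
```

If reserved memory is consistently far above allocated memory, the per-query budget is larger than the workload needs, which is the case where raising PlannedConcurrency helps.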

Problems can adversely affect performance

System Health

• Check for projection skew - projections should be evenly distributed - remember that Vertica is only as fast as the slowest node

• Make sure statistics have been analyzed and are current

• Remove all the delete vectors - they can have a profound negative impact on system performance

• Check ROS fragmentation - improper loading methodologies can create ROS fragmentation, which can create performance problems

• Make sure SEQUENCE cache refill sizes are reasonable - the default is 250,000 - setting it lower can create excessive catalog locking

• When in doubt, leave it to the professionals - a Vertica HealthCheck evaluates over 130 different audit points. If we can't find your bottleneck, no one can!
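Several of these checks can be approximated with queries against the monitoring tables. This is a starting point under the assumption that the v_monitor column names below match your version; it is not a substitute for the HealthCheck mentioned above.

```sql
-- Projection skew: for segmented projections, compare the smallest and
-- largest per-node row counts.
SELECT projection_schema, projection_name,
       MIN(row_count) AS min_rows,
       MAX(row_count) AS max_rows
FROM v_monitor.projection_storage
GROUP BY projection_schema, projection_name
ORDER BY MAX(row_count) - MIN(row_count) DESC
LIMIT 20;

-- Delete vectors: which projections carry the most deleted rows.
SELECT schema_name, projection_name, SUM(deleted_row_count) AS deleted_rows
FROM v_monitor.delete_vectors
GROUP BY schema_name, projection_name
ORDER BY deleted_rows DESC
LIMIT 20;

-- ROS fragmentation: projections with the most ROS containers per node.
SELECT node_name, projection_schema, projection_name, ros_count
FROM v_monitor.projection_storage
ORDER BY ros_count DESC
LIMIT 20;
```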

Catalog

Catalog bloat can be a real problem

• Keep the number of objects to a minimum

• Empty tables and unused tables should be removed

− Tables that have no projections

• Unused projections should be removed

− Check the projection_usage table

• Partitions should be kept to a reasonable number

• Keep delete vectors in check

• Large clusters should segment everything

• Turn on Catalog Compression
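One way to look for removal candidates is to list projections that never appear in projection_usage, as sketched below. The usage history only covers what the system has retained, so an absent projection is a candidate to investigate, not proof that it is unused.

```sql
-- Projections with no recorded usage in the retained history.
SELECT p.projection_schema, p.projection_name
FROM v_catalog.projections p
LEFT JOIN (
    SELECT DISTINCT projection_id
    FROM v_monitor.projection_usage
) u ON u.projection_id = p.projection_id
WHERE u.projection_id IS NULL
ORDER BY p.projection_schema, p.projection_name;
```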

Useful Knobs

Pro Tip:

• NewEEGroupBySmallMemMB - set to 16

− Increases RAM allocated for Group Bys. Improves performance

• MaxOptMemMB - increase to 200

− Amount of memory to allocate to optimizer. Some large queries require more and fail

• GlobalEEProfiling - set to 0

− Turn off Global Execution Engine Profiling - too costly

If you have lots of RAM, try increasing:

• NewEEROSSubdivisionRows

• MaxDesiredEEBlockSize

• GBHashMemCapMB
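In releases of this era, knobs like these were typically changed with SET_CONFIG_PARAMETER, roughly as below. The values are the ones quoted above. Some of these parameters are undocumented, may not appear in the monitoring table, and newer releases may prefer a different statement for setting them, so treat this as a sketch and test on a non-production system first.

```sql
-- Apply the three settings suggested above.
SELECT SET_CONFIG_PARAMETER('NewEEGroupBySmallMemMB', 16);
SELECT SET_CONFIG_PARAMETER('MaxOptMemMB', 200);
SELECT SET_CONFIG_PARAMETER('GlobalEEProfiling', 0);

-- Confirm what is in effect and what the defaults were.
SELECT parameter_name, current_value, default_value
FROM v_monitor.configuration_parameters
WHERE parameter_name IN ('NewEEGroupBySmallMemMB', 'MaxOptMemMB', 'GlobalEEProfiling');
```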

HARDWARE

Disks

Should have as many disks as physical cores

10k RPM SAS at least.

SSDs are fast, but very expensive

Catalog and Data on separate mounts

Avoid LVM - not fully supported

EXT4 filesystem

Increase ReadAhead to 4096 or 8192

When in doubt, throw more hardware at it

Memory & CPU

Memory:

• 8GB per physical core recommended

• Identical across servers

CPU:

• Greater than 2500 MHz

• At least 8 cores

• Frequency Scaling disabled

Network

10Gbit network preferred

Bonded

Private - keep your Vertica cluster isolated

Large clusters might require control nodes

QUESTIONS?

Please attend our Q&A with HP Big Data experts today

Marina Ballroom, Lobby level

10:15 am - 10:30 am

12:00 pm - 1:00 pm

2:30 pm - 3:00 pm

4:30 pm - 5:00 pm
