PostgreSQL and JDBC: striving for high performance

Upload: vladimir-sitnikov

Post on 16-Apr-2017

Page 1: PostgreSQL and JDBC: striving for high performance

© 2016 NetCracker Technology Corporation Confidential

PostgreSQL and JDBC: striving for top performance

Vladimir Sitnikov, PgConf 2016

Page 2: PostgreSQL and JDBC: striving for high performance

About me

• Vladimir Sitnikov, @VladimirSitnikv
• Performance architect at NetCracker
• 10 years of experience with Java/SQL
• PgJDBC committer

Page 3: PostgreSQL and JDBC: striving for high performance

Explain (analyze, buffers) PostgreSQL and JDBC

• Data fetch
• Data upload
• Performance
• Pitfalls

Page 4: PostgreSQL and JDBC: striving for high performance

Intro

Fetch of a single row via primary key lookup takes 20ms. Localhost. Database is fully cached

A. Just fine
B. It should be 1sec
C. Kidding? Aim is 1ms!
D. 100us

Page 5: PostgreSQL and JDBC: striving for high performance

Lots of small queries are a problem

Suppose a single query takes 10ms, then 100 of them would take a whole second *

* Your Captain

Page 6: PostgreSQL and JDBC: striving for high performance

PostgreSQL frontend-backend protocol

• Simple query
  • 'Q' + length + query_text
• Extended query
  • Parse, Bind, Execute commands
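
The simple-query framing above can be sketched in a few lines of Java. This is an illustration only, not PgJDBC code: the class name SimpleQueryMessage is made up here, and UTF-8 client encoding is an assumption. In the wire format the length field counts itself, the query text, and a trailing NUL byte, but not the leading 'Q'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch: frames SQL text as a simple-query ('Q') message. Hypothetical helper, not PgJDBC code. */
class SimpleQueryMessage {
    static byte[] encode(String sql) {
        // Assumption: the client encoding is UTF-8.
        byte[] text = sql.getBytes(StandardCharsets.UTF_8);
        // The length covers the 4 length bytes, the text, and the trailing NUL.
        int length = 4 + text.length + 1;
        ByteBuffer buf = ByteBuffer.allocate(1 + length);
        buf.put((byte) 'Q');   // message type
        buf.putInt(length);    // ByteBuffer is big-endian by default
        buf.put(text);         // query_text
        buf.put((byte) 0);     // NUL terminator
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] msg = encode("SELECT 1");
        System.out.println(msg.length + " bytes, type=" + (char) msg[0]); // 14 bytes, type=Q
    }
}
```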

Page 7: PostgreSQL and JDBC: striving for high performance

PostgreSQL frontend-backend protocol

Super extended query
https://github.com/pgjdbc/pgjdbc/pull/478
backend protocol wanted features

Page 8: PostgreSQL and JDBC: striving for high performance

PostgreSQL frontend-backend protocol

Simple query
• Works well for one-time queries
• Does not support binary transfer

Page 9: PostgreSQL and JDBC: striving for high performance

PostgreSQL frontend-backend protocol

Extended query
• Eliminates planning time
• Supports binary transfer

Page 10: PostgreSQL and JDBC: striving for high performance

PreparedStatement

Connection con = ...;
PreparedStatement ps = con.prepareStatement("SELECT...");
...
ps.close();

Page 12: PostgreSQL and JDBC: striving for high performance

Smoker’s approach to PostgreSQL

PARSE S_1 as ...; // con.prepareStatement
BIND/EXEC
DEALLOCATE // ps.close()
PARSE S_2 as ...;
BIND/EXEC
DEALLOCATE // ps.close()

Page 13: PostgreSQL and JDBC: striving for high performance

Healthy approach to PostgreSQL

PARSE S_1 as ...;
BIND/EXEC
BIND/EXEC
BIND/EXEC
BIND/EXEC
BIND/EXEC
...
DEALLOCATE

Page 14: PostgreSQL and JDBC: striving for high performance

Healthy approach to PostgreSQL

PARSE S_1 as ...; // once in a lifetime
BIND/EXEC // REST call
BIND/EXEC
BIND/EXEC // one more REST call
BIND/EXEC
BIND/EXEC
...
DEALLOCATE // “never” is the best
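
On the JDBC side, the healthy pattern roughly corresponds to preparing a statement once and binding new values for each execution. The sketch below is a minimal illustration under stated assumptions: the class name ReuseStatement, the table users, and its columns are hypothetical. The statement is still closed when the method returns, so this gives one PARSE per call rather than per row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ReuseStatement {
    // Hypothetical table and column names, for illustration only.
    static final String SQL = "SELECT name FROM users WHERE id = ?";

    static List<String> lookupNames(Connection con, int[] ids) throws SQLException {
        List<String> names = new ArrayList<>();
        // Prepared once, outside the loop: a single PARSE for all lookups.
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            for (int id : ids) {
                ps.setInt(1, id);                          // BIND
                try (ResultSet rs = ps.executeQuery()) {   // EXEC
                    if (rs.next()) {
                        names.add(rs.getString(1));
                    }
                }
            }
        }
        return names;
    }
}
```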

Page 15: PostgreSQL and JDBC: striving for high performance

Happiness closes no statements

Conclusion №1: in order to get top performance, you should not close statements
ps = con.prepareStatement(...);
ps.executeQuery();
ps = con.prepareStatement(...);
ps.executeQuery();
...

Page 16: PostgreSQL and JDBC: striving for high performance

Happiness closes no statements

Conclusion №1: in order to get top performance, you should not close statements

ps = con.prepare...
ps.executeQuery();
ps = con.prepare...
ps.executeQuery();
...

Page 17: PostgreSQL and JDBC: striving for high performance

Unclosed statements in practice

@Benchmark
public Statement leakStatement() {
  return con.createStatement();
}

pgjdbc < 9.4.1202, -Xmx128m, OracleJDK 1.8u40
# Warmup Iteration 1: 1147,070 ns/op
# Warmup Iteration 2: 12101,537 ns/op
# Warmup Iteration 3: 90825,971 ns/op
# Warmup Iteration 4: <failure>
java.lang.OutOfMemoryError: GC overhead limit exceeded

Page 18: PostgreSQL and JDBC: striving for high performance

Unclosed statements in practice

@Benchmark
public Statement leakStatement() {
  return con.createStatement();
}

pgjdbc >= 9.4.1202, -Xmx128m, OracleJDK 1.8u40
# Warmup Iteration 1: 30 ns/op
# Warmup Iteration 2: 27 ns/op
# Warmup Iteration 3: 30 ns/op
...

Page 19: PostgreSQL and JDBC: striving for high performance

Statements in practice

• In practice, applications always close their statements
• PostgreSQL has no shared query cache
• Nobody wants to spend excessive time on planning

Page 20: PostgreSQL and JDBC: striving for high performance

Server-prepared statements

What can we do about it?
• Wrap all the queries in PL/pgSQL
  • It helps, but we had a huge number of SQL queries
• Teach JDBC to cache queries

Page 21: PostgreSQL and JDBC: striving for high performance

Query cache in PgJDBC

• Query cache was implemented in 9.4.1202 (2015-08-27), see https://github.com/pgjdbc/pgjdbc/pull/319
• It is transparent to the application
• We did not bother considering PL/pgSQL again
• Server-prepare is activated after 5 executions (prepareThreshold)
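
To make the idea concrete, here is a toy sketch of a client-side cache keyed by SQL text that counts executions and reports when a statement crosses a prepare threshold. It is not PgJDBC's implementation: the class StatementCacheSketch, the LRU eviction policy, and the exact point at which the threshold triggers are all assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not how PgJDBC actually implements its query cache. */
class StatementCacheSketch {
    static final int PREPARE_THRESHOLD = 5; // mirrors the number on the slide

    private final int capacity;
    private final Map<String, Integer> executions;

    StatementCacheSketch(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes LinkedHashMap track least-recently-used entries.
        this.executions = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > StatementCacheSketch.this.capacity;
            }
        };
    }

    /** Records one execution; returns true once the SQL text qualifies for server-prepare. */
    boolean execute(String sql) {
        int count = executions.merge(sql, 1, Integer::sum);
        return count >= PREPARE_THRESHOLD;
    }
}
```

Because the key is the SQL text, the application can keep calling prepareStatement and close as usual, which is what makes such a cache transparent.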

Page 22: PostgreSQL and JDBC: striving for high performance

Where are the numbers?

• Of course, planning time depends on the query complexity
• We observed 20ms+ planning time for OLTP queries: 10KiB query, 170 lines of explain
• Result is ~0ms

Page 23: PostgreSQL and JDBC: striving for high performance

Overheads

Page 24: PostgreSQL and JDBC: striving for high performance

Generated queries are bad

• If a query is generated
  • It results in a brand new java.lang.String object
  • Thus you have to recompute its hashCode

Page 25: PostgreSQL and JDBC: striving for high performance

Parameter types

If the type of a bind value changes, you have to recreate the server-prepared statement
ps.setInt(1, 42);
...
ps.setNull(1, Types.VARCHAR);

Page 26: PostgreSQL and JDBC: striving for high performance

Parameter types

If the type of a bind value changes, you have to recreate the server-prepared statement
ps.setInt(1, 42);
...
ps.setNull(1, Types.VARCHAR);

It leads to DEALLOCATE PREPARE

Page 27: PostgreSQL and JDBC: striving for high performance

Keep data type the same

Conclusion №1
• Even NULL values should be properly typed
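
One way to keep the bind type stable is a small helper that always declares the same SQL type, whether or not the value is present. The class and method names below (TypedNull, setNullableInt) are hypothetical, and the sketch assumes the column is an integer.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

class TypedNull {
    /** Binds an optional int so that NULL carries the same INTEGER type as real values. */
    static void setNullableInt(PreparedStatement ps, int index, Integer value) throws SQLException {
        if (value == null) {
            ps.setNull(index, Types.INTEGER); // not Types.VARCHAR: keep the type unchanged
        } else {
            ps.setInt(index, value);
        }
    }
}
```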

Page 28: PostgreSQL and JDBC: striving for high performance

Unexpected degradation

If using prepared statements, the response time gets 5,000 times slower. How’s that possible?

A. Bug
B. Feature
C. Feature
D. Bug

Page 29: PostgreSQL and JDBC: striving for high performance

Unexpected degradation

https://gist.github.com/vlsi -> 01_plan_flipper.sql

select *
  from plan_flipper -- <- table
 where skewed = 0 -- 1M rows
   and non_skewed = 42 -- 20 rows

Page 30: PostgreSQL and JDBC: striving for high performance

Unexpected degradation

https://gist.github.com/vlsi -> 01_plan_flipper.sql
0.1ms 1st execution
0.05ms 2nd execution
0.05ms 3rd execution
0.05ms 4th execution
0.05ms 5th execution
250 ms 6th execution

Page 32: PostgreSQL and JDBC: striving for high performance

Unexpected degradation

• Who is to blame?
  • PostgreSQL switches to a generic plan after 5 executions of a server-prepared statement
• What can we do about it?
  • Add +0, OFFSET 0, and so on
  • Pay attention to plan validation
  • Discuss the phenomenon on pgsql-hackers

Page 33: PostgreSQL and JDBC: striving for high performance

Unexpected degradation

https://gist.github.com/vlsi -> 01_plan_flipper.sql
We just use +0 to forbid the index on a bad column
select *
  from plan_flipper
 where skewed+0 = 0 -- ~ /*+no_index*/
   and non_skewed = 42

Page 34: PostgreSQL and JDBC: striving for high performance

Explain explain explain explain

The rule of 6 explains:
prepare x(int) as select ...;
explain analyze execute x(42); -- 1ms
explain analyze execute x(42); -- 1ms
explain analyze execute x(42); -- 1ms
explain analyze execute x(42); -- 1ms
explain analyze execute x(42); -- 1ms
explain analyze execute x(42); -- 10 sec

Page 35: PostgreSQL and JDBC: striving for high performance

Bugs everywhere

Page 36: PostgreSQL and JDBC: striving for high performance

Decision problem

There’s a schema A with table X, and a schema B with table X. What is the result of select * from X?

A.X    C. Error
B.X    D. All of the above

Page 37: PostgreSQL and JDBC: striving for high performance

Search_path

There’s a schema A with table X, and a schema B with table X. What is the result of select * from X?
• search_path determines the schema used
• Server-prepared statements are not prepared for search_path changes: crazy things might happen

Page 38: PostgreSQL and JDBC: striving for high performance

Search_path can go wrong

• 9.1 will just use old OIDs and execute the “previous” query
• 9.2-9.5 might fail with a “cached plan must not change result type” error

Page 40: PostgreSQL and JDBC: striving for high performance

To fetch or not to fetch

You are to fetch 1M rows, 1KiB each, with -Xmx128m
while (resultSet.next())
  resultSet.getString(1);

A. No problem C. Must use LIMIT/OFFSET

B. OutOfMemory D. autoCommit(false)

Page 41: PostgreSQL and JDBC: striving for high performance

To fetch or not to fetch

• PgJDBC fetches all rows by default
• To fetch in batches, you need Statement.setFetchSize and connection.setAutoCommit(false)
• Default value is configurable via defaultRowFetchSize (9.4.1202+)
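
The two requirements above can be combined as in the following sketch. The class and method names (BatchedFetch, readAll) are hypothetical, and error handling is kept minimal; the point is only the order of calls: disable auto-commit, set the fetch size, then iterate.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class BatchedFetch {
    /** Reads every row of the query, asking the driver for fetchSize rows at a time. */
    static long readAll(Connection con, String sql, int fetchSize) throws SQLException {
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false);           // fetching in batches needs an open transaction
        try (Statement st = con.createStatement()) {
            st.setFetchSize(fetchSize);     // rows per round trip instead of "all rows"
            long rows = 0;
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    rs.getString(1);
                    rows++;
                }
            }
            con.commit();
            return rows;
        } finally {
            con.setAutoCommit(previous);    // restore the caller's setting
        }
    }
}
```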

Page 42: PostgreSQL and JDBC: striving for high performance

fetchSize vs fetch time

[Chart: fetch time in ms (lower is faster) by fetchSize, 2000 rows of "select int4, int4, int4, int4"]
fetchSize 10: 6.48 ms
fetchSize 50: 2.28 ms
fetchSize 100: 1.76 ms
fetchSize 1000: 1.04 ms
fetchSize 2000: 0.97 ms

Page 43: PostgreSQL and JDBC: striving for high performance

FetchSize is good for stability

Conclusion №2:
• For stability & performance reasons set defaultRowFetchSize >= 100

Page 44: PostgreSQL and JDBC: striving for high performance

Data upload

For data uploads, use
• INSERT() VALUES()
• INSERT() SELECT ?, ?, ?
• INSERT() VALUES() executeBatch
• INSERT() VALUES(), (), () executeBatch
• COPY

Page 45: PostgreSQL and JDBC: striving for high performance

Healthy batch INSERT

PARSE S_1 as ...;
BIND/EXEC
BIND/EXEC
BIND/EXEC
BIND/EXEC
BIND/EXEC
...
DEALLOCATE

Page 46: PostgreSQL and JDBC: striving for high performance

TCP strikes back

JDBC is busy sending queries, thus it has not started fetching responses yet

DB cannot fetch more queries since it is busy sending responses

Page 47: PostgreSQL and JDBC: striving for high performance

Batch INSERT in real life

PARSE S_1 as ...;
BIND/EXEC
BIND/EXEC
SYNC // flush & wait for the response
BIND/EXEC
BIND/EXEC
SYNC // flush & wait for the response
...

Page 48: PostgreSQL and JDBC: striving for high performance

TCP deadlock avoidance

• PgJDBC adds SYNC to your nice batch operations
• The more SYNCs, the slower it performs

Page 49: PostgreSQL and JDBC: striving for high performance

Horror stories

A single line patch makes insert batch 10 times faster:

https://github.com/pgjdbc/pgjdbc/pull/380

- static int QUERY_FORCE_DESCRIBE_PORTAL = 128;
+ static int QUERY_FORCE_DESCRIBE_PORTAL = 512;
...
// 128 has already been used
  static int QUERY_DISALLOW_BATCHING = 128;
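
Why a shared value matters can be shown with a tiny bit-flag sketch. The two flag names come from the patch above; the _OLD and _NEW suffixes, the class FlagCollision, and the batchingDisallowed check are hypothetical and only illustrate how a flag test cannot tell apart two flags that use the same bit. Presumably the driver then treated "describe portal" requests as "batching disallowed".

```java
class FlagCollision {
    static final int QUERY_DISALLOW_BATCHING = 128;
    static final int QUERY_FORCE_DESCRIBE_PORTAL_OLD = 128; // value before the patch
    static final int QUERY_FORCE_DESCRIBE_PORTAL_NEW = 512; // value after the patch

    /** Typical bit test: true if the "disallow batching" bit is set in flags. */
    static boolean batchingDisallowed(int flags) {
        return (flags & QUERY_DISALLOW_BATCHING) != 0;
    }

    public static void main(String[] args) {
        // With the colliding value, asking for "describe portal" also disables batching.
        System.out.println(batchingDisallowed(QUERY_FORCE_DESCRIBE_PORTAL_OLD)); // true
        System.out.println(batchingDisallowed(QUERY_FORCE_DESCRIBE_PORTAL_NEW)); // false
    }
}
```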

Page 50: PostgreSQL and JDBC: striving for high performance

Trust but always measure

• Java 1.8u40+
• Core i7 2.6GHz
• Java Microbenchmark Harness (JMH)
• PostgreSQL 9.5

Page 51: PostgreSQL and JDBC: striving for high performance

Queries under test: INSERT

pgjdbc/ubenchmark/InsertBatch.java

insert into batch_perf_test(a, b, c) values(?, ?, ?)

Page 53: PostgreSQL and JDBC: striving for high performance

Queries under test: INSERT

pgjdbc/ubenchmark/InsertBatch.java

insert into batch_perf_test(a, b, c) values (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), ...;
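
A multi-row statement like the one above is usually generated rather than typed by hand. The sketch below shows one possible generator; MultiValuesInsert and build are hypothetical names, and the table and column names are inserted as-is, so they are assumed to be trusted identifiers.

```java
import java.util.Collections;

class MultiValuesInsert {
    /** Builds "insert into t(a, b) values (?, ?), (?, ?), ..." for the given row count. */
    static String build(String table, String[] columns, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be positive");
        }
        // One "(?, ?, ...)" group per row, one "?" per column.
        String tuple = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "insert into " + table + "(" + String.join(", ", columns) + ") values "
                + String.join(", ", Collections.nCopies(rows, tuple));
    }

    public static void main(String[] args) {
        // Prints: insert into batch_perf_test(a, b, c) values (?, ?, ?), (?, ?, ?)
        System.out.println(build("batch_perf_test", new String[] {"a", "b", "c"}, 2));
    }
}
```

Since generated queries produce new String objects each time, it seems reasonable to build the text once per row count and reuse it.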

Page 54: PostgreSQL and JDBC: striving for high performance

Queries under test: COPY

pgjdbc/ubenchmark/InsertBatch.java

COPY batch_perf_test FROM STDIN
1 s1 1
2 s2 2
3 s3 3
...
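
The rows that follow COPY ... FROM STDIN are plain text. As a rough sketch, the helper below turns in-memory rows into COPY's default text format, assuming tab-separated columns, one row per line, and \N for NULL. CopyTextFormat is a hypothetical name, and the escaping covers only backslash, tab, newline, and carriage return. The resulting string would typically be streamed to the server through PgJDBC's CopyManager.

```java
import java.util.List;

class CopyTextFormat {
    /** Escapes one value for COPY text format; null becomes \N. */
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        return value.replace("\\", "\\\\")   // backslash first, so later escapes stay intact
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Joins columns with tabs and ends each row with a newline. */
    static String toCopyText(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(escape(row[i]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```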

Page 55: PostgreSQL and JDBC: striving for high performance

Queries under test: hand-made structs

pgjdbc/ubenchmark/InsertBatch.java

insert into batch_perf_test select * from unnest('{"(1,s1,1)","(2,s2,2)", "(3,s3,3)"}'::batch_perf_test[])

Page 56: PostgreSQL and JDBC: striving for high performance

You’d better use batch, your C.O.

[Chart: insert time in ms (lower is faster) by the number of inserted rows (16, 128, 1024) for Insert, Batch, Struct, and Copy; columns int4, varchar, int4]

Page 57: PostgreSQL and JDBC: striving for high performance

COPY is good

[Chart: insert time in ms (lower is faster) by the number of inserted rows (16, 128, 1024) for Batch, Struct, and Copy; columns int4, varchar, int4]

Page 58: PostgreSQL and JDBC: striving for high performance

COPY is bad for small batches

[Chart: time in ms (lower is faster) to insert 1024 rows by batch size in rows (1, 4, 8, 16, 128) for Batch, Struct, and Copy]

Page 59: PostgreSQL and JDBC: striving for high performance

Final thoughts

• PreparedStatement is our hero
• Remember to EXPLAIN ANALYZE at least six times, a blue moon is a plus
• Don’t forget +0 and OFFSET 0

Page 60: PostgreSQL and JDBC: striving for high performance

About me

• Vladimir Sitnikov, @VladimirSitnikv
• Performance architect at NetCracker
• 10 years of experience with Java/SQL
• PgJDBC committer

Page 61: PostgreSQL and JDBC: striving for high performance

Questions?

Vladimir Sitnikov, PgConf 2016