High-Performance JDBC (Voxxed Bucharest 2016)
High-Performance JDBC
Vlad Mihalcea
About me
• @Hibernate Developer
• vladmihalcea.com
• @vlad_mihalcea
• vladmihalcea
Performance Facts
“More than half of application performance bottlenecks originate in the database”
AppDynamics - http://www.appdynamics.com/database/
Data access layers
Poor man’s JDBC
• High response time
• Low throughput
Photo by Amit Patel CC BY 2.0 https://www.flickr.com/photos/amitp/6069412747/
State of the art JDBC
• Low response time
• High throughput
Photo by zoetnet CC BY 2.0 https://www.flickr.com/photos/zoetnet/14288129197/
Response time
• connection acquisition time
• statements submission time
• statements execution time
• result set fetching time
• idle time prior to releasing the database connection
$T = t_{acq} + t_{req} + t_{exec} + t_{res} + t_{idle}$
Connection management
Connection acquisition overhead
Metric  DB_A (ms)  DB_B (ms)  DB_C (ms)  DB_D (ms)  HikariCP (ms)
min        11.174      5.441     24.468      0.860       0.001230
max       129.400     26.110     74.634     74.313       1.014051
mean       13.829      6.477     28.910      1.590       0.003458
p99        20.432      9.944     54.952      3.022       0.010263
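The table does not say how these samples were collected. One possible way to gather comparable numbers is sketched below, assuming an already configured javax.sql.DataSource. The class and method names (AcquisitionTimer, acquireNanos, percentile) are hypothetical, and nearest-rank is only one of several percentile definitions.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.DataSource;

public class AcquisitionTimer {

    // Times one getConnection() call. With a pooled DataSource, close()
    // hands the connection back to the pool instead of closing the socket.
    static long acquireNanos(DataSource dataSource) throws SQLException {
        long start = System.nanoTime();
        try (Connection connection = dataSource.getConnection()) {
            return System.nanoTime() - start; // evaluated before close() runs
        }
    }

    // Nearest-rank percentile, e.g. p = 99 for the p99 row of the table.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up sample values in milliseconds, only to exercise percentile()
        double[] millis = {3, 1, 2, 5, 4, 6, 7, 8, 9, 10};
        System.out.println(percentile(millis, 50)); // 5.0
        System.out.println(percentile(millis, 99)); // 10.0
    }
}
```

In a real run, acquireNanos would be called many times (after a warm-up phase) and the collected values fed to percentile.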
Connection pooling
• Logical vs physical connections
• Lease vs create
• Release vs close
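The lease/create and release/close distinction can be illustrated with a deliberately tiny pool. This is a toy sketch, not how HikariCP or any other pool is implemented: PhysicalConnection stands in for a real database connection, and there is no size limit, validation, or timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyPool {

    // Stands in for an expensive physical database connection.
    static final class PhysicalConnection {
        final int id;
        PhysicalConnection(int id) { this.id = id; }
    }

    private final Deque<PhysicalConnection> idle = new ArrayDeque<>();
    private int created;

    // Lease: reuse an idle physical connection; create one only if none is idle.
    synchronized PhysicalConnection lease() {
        PhysicalConnection connection = idle.pollFirst();
        if (connection == null) {
            connection = new PhysicalConnection(++created);
        }
        return connection;
    }

    // Release: hand the physical connection back instead of closing it.
    synchronized void release(PhysicalConnection connection) {
        idle.offerFirst(connection);
    }

    synchronized int createdCount() { return created; }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        for (int i = 0; i < 3; i++) {
            pool.release(pool.lease()); // logical connection opened and closed
        }
        System.out.println("physical connections created: " + pool.createdCount()); // 1
    }
}
```

Three sequential logical connections share a single physical one; only overlapping leases force the pool to create more.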
Connection pool sizing
FlexyPool – supported connection pools
• Java EE
• Bitronix / Atomikos
• Apache DBCP / DBCP2
• C3P0
• HikariCP
• Tomcat CP
• Vibur DBCP
https://github.com/vladmihalcea/flexy-pool
FlexyPool – collected metrics
• concurrent connections histogram
• concurrent connection requests histogram
• connection acquisition time histogram
• connection lease time histogram
• maximum pool size histogram
• retry attempts histogram
https://github.com/vladmihalcea/flexy-pool
FlexyPool – Concurrent connection requests
[Chart: connection requests (max, mean, p50, p95, p99) over sample time (index × 15 s)]
FlexyPool – Pool size growth
[Chart: maximum pool size (max, mean, p50, p95, p99) over sample time (index × 15 s)]
FlexyPool – Connection acquisition time
[Chart: connection acquisition time in ms (max, mean, p50, p95, p99) over sample time (index × 15 s)]
FlexyPool – Connection lease time
[Chart: connection lease time in ms (max, mean, p50, p95, p99) over sample time (index × 15 s)]
Statement Batching
statement.addBatch(
"INSERT INTO post (title, version, id) " +
"VALUES ('Post no. 1', 0, 1)");
statement.addBatch(
"INSERT INTO post_comment (post_id, review, version, id) " +
"VALUES (1, 'Post comment 1.1', 0, 1)");
int[] updateCounts = statement.executeBatch();
Statement Batching (5k rows)
[Chart: time (ms) vs. batch size (1 to 1000) for DB_A, DB_B, DB_C and DB_D]
Oracle Statement batching
• For Statement and CallableStatement, the Oracle JDBC driver doesn't actually support batching: each statement is executed separately.
MySQL Statement batching
• By default, the MySQL JDBC driver doesn't send the batched statements in a single request.
• The rewriteBatchedStatements connection property concatenates all batched statements into a single String buffer, so the whole batch goes out in one request.
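Enabling the rewrite is a matter of connection configuration. A minimal sketch, assuming a hypothetical host and schema; only rewriteBatchedStatements comes from the slide, and it can be given either as a URL parameter or as a connection property.

```java
import java.util.Properties;

public class MySqlBatchConfig {

    // Hypothetical host and schema names; rewriteBatchedStatements is a Connector/J property.
    static String jdbcUrl(String host, String schema) {
        return "jdbc:mysql://" + host + ":3306/" + schema + "?rewriteBatchedStatements=true";
    }

    // The same setting supplied as a connection property instead of a URL parameter.
    static Properties connectionProperties() {
        Properties properties = new Properties();
        properties.setProperty("rewriteBatchedStatements", "true");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", "test"));
    }
}
```

For PreparedStatement batches of INSERTs, Connector/J goes further and rewrites them into a single multi-row VALUES statement.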
Batch PreparedStatements
PreparedStatement postStatement = connection.prepareStatement(
"INSERT INTO post (title, version, id) VALUES (?, ?, ?)");
postStatement.setString(1, String.format("Post no. %1$d", 1));
postStatement.setInt(2, 0);
postStatement.setLong(3, 1);
postStatement.addBatch();
postStatement.setString(1, String.format("Post no. %1$d", 2));
postStatement.setInt(2, 0);
postStatement.setLong(3, 2);
postStatement.addBatch();
int[] updateCounts = postStatement.executeBatch();
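The example above sends two rows in one batch. For larger loads, like the 5k-row runs benchmarked in this section, a common pattern is to flush the batch every N rows. A sketch, assuming the same post table; PostBatchInsert, insertPosts, rowCount and batchSize are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PostBatchInsert {

    static final String INSERT_POST =
        "INSERT INTO post (title, version, id) VALUES (?, ?, ?)";

    // Queues one row per iteration and sends the batch every batchSize rows.
    // Returns the number of executeBatch() calls (one round trip each when
    // the driver supports batching).
    static int insertPosts(PreparedStatement statement, int rowCount, int batchSize)
            throws SQLException {
        int executions = 0;
        for (int id = 1; id <= rowCount; id++) {
            statement.setString(1, String.format("Post no. %1$d", id));
            statement.setInt(2, 0);
            statement.setLong(3, id);
            statement.addBatch();
            if (id % batchSize == 0) {
                statement.executeBatch();
                executions++;
            }
        }
        if (rowCount % batchSize != 0) {
            statement.executeBatch(); // flush the last, partially filled batch
            executions++;
        }
        return executions;
    }

    // Convenience wrapper that prepares (and closes) the statement itself.
    static int insertPosts(Connection connection, int rowCount, int batchSize)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_POST)) {
            return insertPosts(statement, rowCount, batchSize);
        }
    }
}
```

With 5,000 rows and a batch size of 30, this issues 167 executeBatch() calls (166 full batches plus one of 20 rows). The caller is expected to run it inside a transaction with auto-commit disabled.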
Batch PreparedStatements
• SQL Injection Prevention
• Better performance
• Hibernate can batch statements automatically
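In Hibernate, batching is switched on through configuration rather than code changes. A minimal sketch using standard Hibernate setting names; the batch size of 30 and the HibernateBatchingSettings class are illustrative assumptions.

```java
import java.util.Properties;

public class HibernateBatchingSettings {

    static Properties batchingProperties(int batchSize) {
        Properties properties = new Properties();
        // Group up to batchSize statements into one JDBC batch
        properties.setProperty("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        // Sort INSERTs and UPDATEs by entity so consecutive statements can share a batch
        properties.setProperty("hibernate.order_inserts", "true");
        properties.setProperty("hibernate.order_updates", "true");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(batchingProperties(30));
    }
}
```

The resulting map could be passed, for example, to Persistence.createEntityManagerFactory(persistenceUnitName, properties). Note that Hibernate cannot batch INSERTs for entities whose identifiers use the IDENTITY generator.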
Insert PreparedStatement batching (5k rows)
[Chart: time (ms) vs. batch size (1 to 1000) for DB_A, DB_B, DB_C and DB_D]
Update PreparedStatement batching (5k rows)
[Chart: time (ms) vs. batch size (1 to 1000) for DB_A, DB_B, DB_C and DB_D]
Delete PreparedStatement batching (5k rows)
[Chart: time (ms) vs. batch size (1 to 1000) for DB_A, DB_B, DB_C and DB_D]
Statement caching
Statement caching gain (one-minute interval)
Database system  No caching throughput (SPM)  Caching throughput (SPM)  Percentage gain
DB_A             419,833                      507,286                   20.83%
DB_B             194,837                      303,100                   55.56%
DB_C             116,708                      166,443                   42.61%
DB_D             15,522                       15,550                    0.18%
(SPM = statements per minute)
Oracle server-side statement caching
• Hard parse
• Soft parse
• Bind peeking
• Adaptive cursor sharing (since 11g)
SQL Server server-side statement caching
• Execution plan cache
• Parameter sniffing
• Force recompile
SELECT *
FROM task
WHERE status = ?
OPTION(RECOMPILE);
PostgreSQL server-side statement caching
• Prior to 9.2 – execution plans are cached when the statement is prepared
• Since 9.2 – optimization and planning are deferred until execution, when the bind parameter values are known
• The prepareThreshold connection property
MySQL server-side statement caching
• No execution plan cache
• Since Connector/J 5.0.5, PreparedStatements are emulated on the client side by default
• To activate server-side prepared statements:
• useServerPrepStmts
• cachePrepStmts
Client-side statement caching
• Recycling Statement, PreparedStatement or CallableStatement objects
• Reusing database cursors
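The idea behind a client-side cache can be shown with a small per-connection LRU map keyed by SQL text. This is an illustrative toy, not how any particular driver implements its cache; StatementCache, get and onEvict are hypothetical names, and S stands for the cached statement type (e.g. PreparedStatement).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

public class StatementCache<S> {

    private final LinkedHashMap<String, S> cache;

    StatementCache(int maxSize, Consumer<S> onEvict) {
        // accessOrder = true keeps the least recently used entry first
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                if (size() > maxSize) {
                    onEvict.accept(eldest.getValue()); // e.g. close the evicted statement
                    return true;
                }
                return false;
            }
        };
    }

    // Reuses the cached statement for this SQL, or prepares a new one on a miss.
    S get(String sql, Function<String, S> prepare) {
        return cache.computeIfAbsent(sql, prepare);
    }

    int size() { return cache.size(); }
}
```

On a hit the driver can skip re-parsing and reuse the already allocated statement object; the eviction callback is where a real cache would close the underlying statement and its cursor.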
Oracle implicit client-side statement caching
• Connection-level cache
• PreparedStatement and CallableStatement only
connectionProperties.put(
"oracle.jdbc.implicitStatementCacheSize",
Integer.toString(cacheSize)
);
dataSource.setConnectionProperties(
connectionProperties
);
Oracle implicit client-side statement caching
• Can be disabled on a per-statement basis
if (statement.isPoolable()) {
statement.setPoolable(false);
}
Oracle explicit client-side statement caching
• Caches both metadata and execution state with data
OracleConnection oracleConnection =
(OracleConnection) connection;
oracleConnection.setExplicitCachingEnabled(true);
oracleConnection.setStatementCacheSize(cacheSize);
Oracle explicit client-side statement caching
• Vendor-specific API
PreparedStatement statement = oracleConnection.
getStatementWithKey(SELECT_POST_KEY);
if (statement == null)
statement = connection.prepareStatement(SELECT_POST);
try {
statement.setInt(1, 10);
statement.execute();
} finally {
((OraclePreparedStatement) statement).
closeWithKey(SELECT_POST_KEY);
}
SQL Server client-side statement caching
• Microsoft JDBC Driver 4.2: disableStatementPooling connection property
• jTDS 1.3.1 – JDBC 3.0
JtdsDataSource jtdsDataSource =
(JtdsDataSource) dataSource;
jtdsDataSource.setMaxStatements(cacheSize);
PostgreSQL client-side statement caching
• PostgreSQL JDBC Driver 9.4-1202 makes client-side statement caching connection-bound instead of statement-bound
• Configurable:
• preparedStatementCacheQueries (default is 256)
• preparedStatementCacheSizeMiB (default is 5 MiB)
• Statement.setPoolable(false) is not supported
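Both the server-side switch (prepareThreshold) and the client-side cache limits are PgJDBC connection properties. A minimal sketch; the values shown are the driver defaults (5 for prepareThreshold, the cache defaults as listed on this slide), and PgJdbcStatementSettings is a hypothetical name.

```java
import java.util.Properties;

public class PgJdbcStatementSettings {

    static Properties statementCachingProperties() {
        Properties properties = new Properties();
        // Executions of the same PreparedStatement before PgJDBC switches
        // to a named server-side prepared statement
        properties.setProperty("prepareThreshold", "5");
        // Client-side cache limits: number of cached queries and total size in MiB
        properties.setProperty("preparedStatementCacheQueries", "256");
        properties.setProperty("preparedStatementCacheSizeMiB", "5");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(statementCachingProperties());
    }
}
```

These properties, together with user and password, could then be passed to DriverManager.getConnection(url, properties).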
MySQL client-side statement caching
• Configurable:
• cachePrepStmts (default is false), also required for server-side statement caching
• prepStmtCacheSize (default is 25)
• prepStmtCacheSqlLimit (default is 256)
• Statement.setPoolable(false) works for client-side statements only
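All of these Connector/J settings can be combined in the JDBC URL. A sketch, assuming a hypothetical host and schema; the cache size of 250 and SQL limit of 2048 are arbitrary example values, not recommendations from the talk.

```java
public class MySqlStatementCachingUrl {

    // Hypothetical host/schema; the four parameters are Connector/J connection properties.
    static String jdbcUrl(String host, String schema) {
        return "jdbc:mysql://" + host + ":3306/" + schema
            + "?useServerPrepStmts=true"      // use real server-side prepared statements
            + "&cachePrepStmts=true"          // enable the statement cache (default false)
            + "&prepStmtCacheSize=250"        // example value, default is 25
            + "&prepStmtCacheSqlLimit=2048";  // example value, default is 256
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", "test"));
    }
}
```

Dropping useServerPrepStmts=true keeps the client-side emulation while still caching the parsed statements.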
ResultSet fetch size
• ResultSet - application-level cursor
statement.setFetchSize(fetchSize);
Oracle ResultSet fetch size
• Default fetch size is 10
• Oracle 10g and 11g JDBC drivers preallocate memory for the maximum possible ResultSet row size
• VARCHAR2(4000) – allocates 8000 bytes (even for 1 character)
• Memory buffers are recycled only when using Statement caching
• Oracle 12c allocates memory on demand
• VARCHAR2(4000) – 15 bytes + the actual row column size
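The preallocation cost grows with the fetch size. A rough back-of-the-envelope estimate, assuming two bytes per declared character as in the VARCHAR2(4000) to 8000 bytes example and ignoring all other column types; the class and method names are hypothetical and this is not the driver's actual sizing formula.

```java
public class OracleFetchBufferEstimate {

    // Rough estimate: 2 bytes per declared VARCHAR2 character, per row, times fetch size.
    static long preallocatedBytes(int fetchSize, int... varchar2Lengths) {
        long bytesPerRow = 0;
        for (int length : varchar2Lengths) {
            bytesPerRow += 2L * length;
        }
        return bytesPerRow * fetchSize;
    }

    public static void main(String[] args) {
        System.out.println(preallocatedBytes(10, 4000));   // 80000
        System.out.println(preallocatedBytes(1000, 4000)); // 8000000
    }
}
```

Under this assumption, raising the fetch size from the default of 10 to 1000 grows the buffer for a single VARCHAR2(4000) column from about 80 KB to about 8 MB per statement on the 10g/11g drivers.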
SQL Server ResultSet fetch size
• Adaptive buffering
• Only for the default read-only and forward-only ResultSet
• Updatable cursors use fixed data blocks
PostgreSQL ResultSet fetch size
• Fetch all – one database roundtrip
• Custom fetch size – database cursor
MySQL ResultSet fetch size
• Fetch all – one database roundtrip
• Streaming – only one record at a time
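Getting incremental fetching from these two drivers takes slightly different settings. A sketch, assuming the driver is identified by DatabaseMetaData.getDatabaseProductName(); IncrementalFetch, fetchSizeFor and countRows are hypothetical names. PgJDBC only uses a cursor outside auto-commit mode with a forward-only ResultSet, while Connector/J streams rows only when the fetch size is Integer.MIN_VALUE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class IncrementalFetch {

    // Picks the fetch size that enables incremental fetching for a given driver.
    static int fetchSizeFor(String databaseProductName, int requested) {
        if ("MySQL".equalsIgnoreCase(databaseProductName)) {
            // Connector/J streams rows one at a time only for this special value
            return Integer.MIN_VALUE;
        }
        return requested;
    }

    static long countRows(Connection connection, String sql, int requested) throws SQLException {
        String product = connection.getMetaData().getDatabaseProductName();
        if ("PostgreSQL".equalsIgnoreCase(product)) {
            // PgJDBC uses a database cursor only when auto-commit is off
            connection.setAutoCommit(false);
        }
        try (PreparedStatement statement = connection.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            statement.setFetchSize(fetchSizeFor(product, requested));
            long rows = 0;
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    rows++;
                }
            }
            return rows;
        }
    }
}
```

Because countRows may switch off auto-commit, the caller is expected to commit or roll back afterwards.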
ResultSet fetch size (10k rows)
[Chart: time (ms) vs. fetch size (1, 10, 100, 1000, 10000) for DB_A, DB_B, DB_C and DB_D]
ResultSet size
• Avoid fetching data that is not required
• Hibernate generates the vendor-specific SQL syntax for limiting the ResultSet size
SQL:2008 ResultSet size limit
• Oracle 12c, SQL Server 2012 and PostgreSQL 8.4
SELECT
pc.id AS pc_id, p.title AS p_title
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
ORDER BY pc_id
OFFSET ? ROWS
FETCH FIRST (?) ROWS ONLY;
Oracle ResultSet size limit
SELECT *
FROM (
SELECT
pc.id AS pc_id, p.title AS p_title
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
ORDER BY pc_id
)
WHERE ROWNUM <= ?
SQL Server ResultSet size limit
SELECT
TOP (?) pc.id AS pc_id, p.title AS p_title
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
ORDER BY pc_id
PostgreSQL and MySQL ResultSet size limit
SELECT
pc.id AS pc_id, p.title AS p_title
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
ORDER BY pc_id
LIMIT ?
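The four variants above differ only in where the limit goes, which is what a persistence framework's dialect abstracts away. A toy sketch, not Hibernate's Dialect API: LimitClause and Vendor are hypothetical, each variant binds exactly one ? for the row count, and the SQL:2008 form uses a fixed OFFSET 0 because SQL Server requires an OFFSET before FETCH.

```java
public class LimitClause {

    enum Vendor { SQL_2008, ORACLE_11G, SQL_SERVER, POSTGRESQL_OR_MYSQL }

    // Appends or wraps a row-limit clause; the single ? is bound to the max row count.
    static String limit(String sql, Vendor vendor) {
        switch (vendor) {
            case SQL_2008:
                return sql + " OFFSET 0 ROWS FETCH FIRST (?) ROWS ONLY";
            case ORACLE_11G:
                return "SELECT * FROM (" + sql + ") WHERE ROWNUM <= ?";
            case SQL_SERVER:
                return sql.replaceFirst("(?i)^SELECT\\s+", "SELECT TOP (?) ");
            case POSTGRESQL_OR_MYSQL:
                return sql + " LIMIT ?";
            default:
                throw new IllegalArgumentException("Unknown vendor: " + vendor);
        }
    }

    public static void main(String[] args) {
        String sql = "SELECT id FROM post ORDER BY id";
        for (Vendor vendor : Vendor.values()) {
            System.out.println(vendor + ": " + limit(sql, vendor));
        }
    }
}
```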
Statement max rows
• Vendor-independent syntax
• Might not influence the execution plan
• According to the documentation:
“If the limit is exceeded, the excess rows are silently dropped.”
statement.setMaxRows(maxRows);
Max size: 1 million vs 100 rows
[Chart: time (ms) for fetch all, fetch max rows and fetch limit, for DB_A, DB_B, DB_C and DB_D]
Fetching too many columns
• Fetching all columns (typical of ORM tools)
SELECT *
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
INNER JOIN post_details pd ON p.id = pd.id
Fetching too many columns
• Fetching a custom SQL projection
SELECT pc.version
FROM post_comment pc
INNER JOIN post p ON p.id = pc.post_id
INNER JOIN post_details pd ON p.id = pd.id
Fetching too many columns performance impact
[Chart: time (ms) for all columns vs. custom projection, for DB_A, DB_B, DB_C and DB_D]
Processing Logic
• Hibernate defers connection acquisition
• Release connection as soon as possible
Questions and Answers
• Response time
• Connection management
• Batch updates
• Statement caching
• ResultSet fetching
• https://leanpub.com/high-performance-java-persistence