FOREIGN DATA WRAPPER ENHANCEMENTS
June 17, 2015
PostgreSQL Developers Unconference, Clustering Track
Shigeru HANADA, Etsuro Fujita
Who we are
• Shigeru HANADA
  • From Tokyo, Japan
  • Working on FDW since 2010
  • Implemented the initial FDW API and postgres_fdw
• Etsuro Fujita
  • From Tokyo, Japan
  • Working on Postgres for 10 years
  • Interested in FDW enhancements
Agenda
• Past enhancements proposed for 9.5
  • Inheritance support (Committed)
  • Join push-down (Committed)
  • Join push-down for postgres_fdw (Returned with feedback)
  • Update push-down (Returned with feedback)
  • Possible remote query optimization in 9.5
• Ideas for further enhancement
  • Sort push-down
  • Aggregate push-down
  • More aggressive join push-down
• Discussions
PAST ENHANCEMENTS PROPOSED FOR 9.5
Inheritance support
• Outline
  • Allow foreign tables to participate in inheritance trees
  • Provides a way to implement sharding
• Example
postgres=# explain verbose select * from parent ;
                                QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..270.00 rows=2001 width=4)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=4)
         Output: parent.a
   ->  Foreign Scan on public.ft1  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft1.a
         Remote SQL: SELECT a FROM public.t1
   ->  Foreign Scan on public.ft2  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft2.a
         Remote SQL: SELECT a FROM public.t2
(9 rows)
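The Append node returns the rows of each child in turn, whether the child is the local parent or a foreign table. The following Python sketch models only that behavior and makes several assumptions. The `LocalScan`, `ForeignScan`, and `append` names are hypothetical. The "remote" rows are in-memory lists standing in for a remote server.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class LocalScan:
    """Stand-in for the Seq Scan on the local parent table."""
    rows: List[int]

    def scan(self) -> Iterator[int]:
        yield from self.rows


@dataclass
class ForeignScan:
    """Stand-in for a Foreign Scan on one shard (a child foreign table)."""
    remote_table: str          # e.g. "public.t1" on the remote server
    remote_rows: List[int]     # hypothetical data held by that server
    sent_sql: List[str] = field(default_factory=list)

    def scan(self) -> Iterator[int]:
        # A real FDW would send this query over its connection; here we only record it.
        self.sent_sql.append(f"SELECT a FROM {self.remote_table}")
        yield from self.remote_rows


def append(children: List[Union[LocalScan, ForeignScan]]) -> Iterator[int]:
    """Append node: return all rows of each child, one child after another."""
    for child in children:
        yield from child.scan()


if __name__ == "__main__":
    shards = [LocalScan([]), ForeignScan("public.t1", [1, 2]), ForeignScan("public.t2", [3])]
    print(list(append(shards)))  # [1, 2, 3]
```

Each shard is just another child of the parent, so in this model adding a shard means adding another ForeignScan to the list.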
Update push-down
• Outline
  • Send the whole UPDATE/DELETE statement to the remote server when it has the same semantics there
• Example
Current:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Output: (a + 1), ctid
         Remote SQL: SELECT a, ctid FROM public.foo WHERE ((a > 10)) FOR UPDATE
(5 rows)

Patched:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   ->  Foreign Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
(3 rows)
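The patched plan is only valid when every expression in SET and WHERE means the same thing on the remote server. The following Python sketch makes that decision under two assumptions. It uses a toy expression tree. It also relies on a hypothetical whitelist (`SHIPPABLE_OPS`, `SHIPPABLE_FUNCS`) of operators and functions assumed to behave identically remotely. It illustrates the idea and is not postgres_fdw's actual code.

```python
from typing import Optional

# Toy expression tree (hypothetical):
#   ("col", name) | ("const", int) | ("op", symbol, left, right) | ("func", name, arg)
Expr = tuple

# Assumption: operators/functions known to have identical semantics remotely.
SHIPPABLE_OPS = {"+", "-", "*", "=", "<", ">", "<=", ">="}
SHIPPABLE_FUNCS = {"abs"}


def is_shippable(e: Expr) -> bool:
    """True if every node of the expression may be evaluated on the remote side."""
    kind = e[0]
    if kind in ("col", "const"):
        return True
    if kind == "op":
        return e[1] in SHIPPABLE_OPS and is_shippable(e[2]) and is_shippable(e[3])
    if kind == "func":
        return e[1] in SHIPPABLE_FUNCS and is_shippable(e[2])
    return False


def deparse(e: Expr) -> str:
    """Turn a shippable expression back into SQL text."""
    kind = e[0]
    if kind == "col":
        return e[1]
    if kind == "const":
        return str(e[1])  # integer constants only in this sketch
    if kind == "op":
        return f"({deparse(e[2])} {e[1]} {deparse(e[3])})"
    if kind == "func":
        return f"{e[1]}({deparse(e[2])})"
    raise ValueError(f"unknown node kind: {kind!r}")


def plan_update(table: str, column: str, new_value: Expr, where: Expr) -> Optional[str]:
    """Return a pushed-down UPDATE, or None to fall back to the
    SELECT ... FOR UPDATE + UPDATE ... WHERE ctid = $1 plan."""
    if not (is_shippable(new_value) and is_shippable(where)):
        return None
    return f"UPDATE {table} SET {column} = {deparse(new_value)} WHERE ({deparse(where)})"


if __name__ == "__main__":
    a_plus_1 = ("op", "+", ("col", "a"), ("const", 1))
    a_gt_10 = ("op", ">", ("col", "a"), ("const", 10))
    print(plan_update("public.foo", "a", a_plus_1, a_gt_10))
    # UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
```

When `plan_update` returns `None`, for example because SET calls a function missing from the whitelist, the FDW would keep the current two-step plan.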
Update push-down, cont.
• Issues
  • FDW APIs for update push-down: should they be called from nodeModifyTable.c or from nodeForeignscan.c?
  • Update push-down for an update on a join, e.g. "UPDATE foo ... FROM bar ..." where both foo and bar are remote
• Further enhancements
  • INSERT/UPSERT push-down
Join push-down
• Outline
  • Join foreign tables on the remote side when it is safe to do so
• Example
fdw=# EXPLAIN (VERBOSE) SELECT tbalance FROM pgbench_branches b JOIN pgbench_tellers t USING(bid);
 QUERY PLAN
--------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=4)
   Output: t.tbalance
   Relations: (public.pgbench_branches b) INNER JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT r.a1 FROM (SELECT l.a9 FROM (SELECT bid a9 FROM public.pgbench_branches) l) l (a1) INNER JOIN (SELECT r.a11, r.a10 FROM (SELECT bid a10, tbalance a11 FROM public.pgbench_tellers) r) r (a1, a2) ON ((l.a1 = r.a2))
(4 rows)
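Whether a join is "safe" to push down comes down to a few checks. The following Python sketch assumes one plausible set:

- both tables are on the same foreign server;
- they use the same user mapping;
- the join type has direct SQL JOIN syntax;
- the join clauses use only shippable operators.

`ForeignRel`, `join_is_safe`, and the operator whitelist are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: operators that behave identically on the remote server.
SHIPPABLE_OPS = {"=", "<", ">", "<=", ">="}
# Join types that map directly onto SQL JOIN syntax.
SQL_JOIN_TYPES = {"INNER", "LEFT", "RIGHT", "FULL"}


@dataclass(frozen=True)
class ForeignRel:
    name: str
    server: str        # foreign server holding the table
    user_mapping: str  # user mapping (remote credentials) used to access it


def join_is_safe(outer: ForeignRel, inner: ForeignRel, join_type: str,
                 join_quals: List[Tuple[str, str, str]]) -> bool:
    """join_quals holds (left_column, operator, right_column) triples."""
    if outer.server != inner.server:
        return False  # the two tables live on different servers
    if outer.user_mapping != inner.user_mapping:
        return False  # the two scans would need different remote credentials
    if join_type not in SQL_JOIN_TYPES:
        return False  # e.g. a semi or anti join has no plain JOIN keyword
    return all(op in SHIPPABLE_OPS for _, op, _ in join_quals)
```

In the pgbench example above, the USING(bid) clause becomes an equality join clause in an INNER JOIN. So if both tables share a server and user mapping, every check passes.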
Join push-down, cont.
• Issues
  • Implement the join APIs in postgres_fdw
    • Centralize deparsing of remote queries
  • Should the join query be generated from the parse tree rather than from planner information?
  • A generic SQL deparser would make it easier to port join push-down to FDWs for other RDBMSs
Possible remote query optimization in 9.5
• When we run the following query:
SELECT c.grade, max(s.score) max_score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
 WHERE c.subject = 'Math'
 GROUP BY c.grade
HAVING max(s.score) > 50
 ORDER BY c.grade DESC;
(“scores” and “classes” are foreign tables.)
• Generated remote query:
SELECT c.grade, s.score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
 WHERE c.subject = 'Math'
 ORDER BY c.grade DESC;
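We can push down the join, the WHERE clause, and the ORDER BY clause of the query, as in the generated remote query; GROUP BY and HAVING are still evaluated locally.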
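The split between the remote query and the local plan can be sketched in Python. The sketch assumes a pre-analyzed, hypothetical `Query` description rather than PostgreSQL's parse tree. The join, WHERE, and ORDER BY go into the remote SQL, while GROUP BY and HAVING stay local.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Query:
    """Hypothetical, pre-analyzed description of a query over foreign tables."""
    remote_targets: List[str]   # columns the local plan needs from the remote side
    from_clause: str            # join of foreign tables, assumed safe to push down
    where: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    having: Optional[str] = None
    order_by: List[str] = field(default_factory=list)


def split_query(q: Query) -> Tuple[str, List[str]]:
    """Return (remote SQL, steps still evaluated locally)."""
    remote = f"SELECT {', '.join(q.remote_targets)} FROM {q.from_clause}"
    if q.where:
        remote += f" WHERE {q.where}"
    if q.order_by:
        remote += f" ORDER BY {', '.join(q.order_by)}"
    local: List[str] = []
    if q.group_by:
        local.append(f"GROUP BY {', '.join(q.group_by)}")
    if q.having:
        local.append(f"HAVING {q.having}")
    return remote, local
```

Pushing down ORDER BY c.grade DESC would also deliver rows to the local GroupAggregate already sorted by the grouping key. The local Sort node could then be dropped, provided the planner can trust the remote ordering.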
Possible remote query optimization in 9.5
postgres=# EXPLAIN SELECT c.grade, max(s.score) max_score
postgres-#   FROM scores s LEFT JOIN classes c
postgres-#     ON c.class_id = s.class_id
postgres-#  WHERE c.subject= 'Math'
postgres-#  GROUP BY c.grade
postgres-# HAVING max(s.score) > 50
postgres-#  ORDER BY c.grade DESC;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=27.92..27.94 rows=1 width=8)
   Group Key: c.grade
   Filter: (max(s.score) > 50)
   ->  Sort  (cost=27.92..27.92 rows=1 width=8)
         Sort Key: c.grade DESC
         ->  Hash Join  (cost=20.18..27.91 rows=1 width=8)
               Hash Cond: (s.class_id = c.class_id)
               ->  Seq Scan on scores s  (cost=0.00..6.98 rows=198 width=8)
               ->  Hash  (cost=20.12..20.12 rows=4 width=8)
                     ->  Seq Scan on classes c  (cost=0.00..20.12 rows=4 width=8)
                           Filter: (subject = 'Math'::text)
(11 rows)
IDEAS FOR FURTHER ENHANCEMENT
Ideas for further enhancement
• Sort push-down
• Aggregate push-down
• More aggressive join push-down
• 2PC support (out of scope for this session)
  • Will be discussed in Ashutosh’s session on June 19
Sort push-down
• Outline
  • Mark a ForeignScan as sorted
• Efficacy
  • Avoid an unnecessary sort on the local side
  • Use a ForeignScan directly as an input to a MergeJoin
• How to implement
  • Add an extra ForeignPath with pathkeys
  • Estimate the costs of the pre-sorted path
  • Sort the result of the foreign scan
    • by adding ORDER BY, in RDBMS FDWs
    • by choosing a pre-sorted file, in file-based FDWs
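Adding a pre-sorted ForeignPath means the planner compares it with an unsorted ForeignPath plus a local Sort. The following Python sketch uses a simplified cost model, and both cost terms are assumptions:

- The local sort follows roughly the shape of PostgreSQL's in-memory estimate, 2 * cpu_operator_cost * N * log2(N).
- The remote sort is a hypothetical fixed multiplier, `SORT_MULTIPLIER`, applied to the scan cost.

`Path` and `cheapest_sorted_path` are illustrative names, not planner APIs.

```python
import math
from dataclasses import dataclass
from typing import Tuple

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost
SORT_MULTIPLIER = 1.05      # assumption: remote sorting costs 5% more than a plain scan


@dataclass
class Path:
    description: str
    pathkeys: Tuple[str, ...]
    total_cost: float


def local_sort_cost(rows: float) -> float:
    """Roughly the shape of PostgreSQL's in-memory sort estimate."""
    rows = max(rows, 2.0)
    return 2.0 * CPU_OPERATOR_COST * rows * math.log2(rows)


def cheapest_sorted_path(scan_cost: float, rows: float, pathkeys: Tuple[str, ...]) -> Path:
    """Compare 'Sort over an unsorted ForeignScan' with a pre-sorted ForeignScan."""
    local = Path("Sort -> Foreign Scan", pathkeys, scan_cost + local_sort_cost(rows))
    remote = Path("Foreign Scan with remote ORDER BY", pathkeys, scan_cost * SORT_MULTIPLIER)
    return min(local, remote, key=lambda p: p.total_cost)
```

The pre-sorted path carries pathkeys, so the planner could also feed it straight into a MergeJoin, which is the second efficacy point above.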
Sort push-down
• Issues
  • How can we limit the candidate sort keys?
    • Avoid a brute-force approach
    • Introduce FOREIGN INDEX to represent generic remote indexes?
    • Introduce FDW-specific catalogs?
    • Extract key information from ORDER BY, JOIN, and GROUP BY?
  • How can we ensure that ordering semantics are identical on both sides?
    • Even between two PostgreSQL servers, we have collation issues.
    • Is it OK to leave this to DBAs?
    • Limiting sort keys to non-character data types seems a reasonable first cut.
  • Can we use pre-sorted join results as a sorted path?
    • A MergeJoin at the root of the remote plan means the result is sorted by the join key, but we cannot be certain of that even if we run EXPLAIN before the query.
  • Any ideas?
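The following Python sketch combines two of the ideas above. It takes sort-key candidates only from ORDER BY, JOIN, and GROUP BY clauses, and keeps only columns whose types have no collation. `NON_COLLATABLE_TYPES` and `candidate_sort_keys` are hypothetical names, and the type list is an illustrative assumption.

```python
from typing import Dict, List

# Assumption: types whose sort order does not depend on collation.
NON_COLLATABLE_TYPES = {"smallint", "integer", "bigint", "numeric", "date", "timestamp"}


def candidate_sort_keys(order_by: List[str], join_keys: List[str],
                        group_by: List[str], column_types: Dict[str, str]) -> List[str]:
    """Collect sort-key candidates from ORDER BY, JOIN and GROUP BY only
    (no brute-force enumeration), keeping non-character columns."""
    keys: List[str] = []
    for col in [*order_by, *join_keys, *group_by]:
        if col not in keys and column_types.get(col) in NON_COLLATABLE_TYPES:
            keys.append(col)
    return keys
```

A text column such as `subject` is dropped because its ordering depends on collation, which may differ between servers.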
Aggregate push-down
• Outline
  • Replace an Aggregate/GroupAggregate/HashAggregate plan node with a ForeignScan that produces aggregated results
• Efficacy
  • Reduce the amount of data transferred
  • Offload aggregation overhead to the remote side
• How to implement
  • A new FDW API for hooking aggregation
  • Implement the API in each FDW
Aggregate push-down
• Issues
  • GROUP BY requires identical semantics for the grouping keys on both sides.
    • This is similar to the issue with sort push-down.
  • How can we map local functions to remote ones?
    • ROUTINE MAPPING is defined in the SQL standard (SQL/MED), but it does not seem well designed.
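The following Python sketch applies the two checks discussed above. First, grouping keys must have non-character types, the same first-cut rule as for sort push-down. Second, each aggregate must appear in a local-to-remote mapping table. `ROUTINE_MAP` is a hypothetical stand-in for something like ROUTINE MAPPING, not an existing PostgreSQL catalog. `push_down_aggregate` is likewise illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical local -> remote aggregate mapping, in the spirit of ROUTINE MAPPING.
ROUTINE_MAP: Dict[str, str] = {"count": "count", "sum": "sum", "min": "min", "max": "max"}
# Assumption: grouping on these types cannot be affected by collation differences.
NON_COLLATABLE_TYPES = {"smallint", "integer", "bigint", "numeric", "date"}


def push_down_aggregate(table: str, group_keys: List[str],
                        aggregates: List[Tuple[str, str]],
                        column_types: Dict[str, str]) -> Optional[str]:
    """aggregates holds (function, column) pairs; returns remote SQL or None."""
    if any(column_types.get(k) not in NON_COLLATABLE_TYPES for k in group_keys):
        return None  # grouping semantics might differ on the remote side
    remote_aggs = []
    for func, col in aggregates:
        remote_func = ROUTINE_MAP.get(func)
        if remote_func is None:
            return None  # no known remote equivalent: aggregate locally instead
        remote_aggs.append(f"{remote_func}({col})")
    sql = f"SELECT {', '.join([*group_keys, *remote_aggs])} FROM {table}"
    if group_keys:
        sql += f" GROUP BY {', '.join(group_keys)}"
    return sql
```

If this returns SQL, the FDW would replace the local aggregate node with a ForeignScan that returns one row per group instead of every input row.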
More aggressive join push-down
• Outline
  • Send local data to the remote side and join it there, in one of the following ways:
    • a VALUES expression in the FROM clause
    • per-table replication, using logical replication, Slony-I, etc.
• Efficacy
  • Reduce the amount of data transferred from remote to local
  • Limited to cases where a small local table is joined with a huge remote table and the join produces a small result
More aggressive join push-down
• How to implement
  • Replace the reference to a small local table with VALUES()
  • Use a remote replicated table as an alternative
• Issues
  • How can we construct the VALUES() expression?
  • How can we know whether a table is replicated on the remote side?
SELECT *
  FROM huge_remote_table h
  JOIN (VALUES (1, 'foo'), (2, 'bar')) AS s (id, name)
    ON s.id = h.id;
(The VALUES list is generated by scanning the small local table.)
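Constructing the VALUES() expression mostly means turning each local row into a list of correctly quoted SQL literals. The following Python sketch handles integers, strings, and NULL only. It assumes the remote server runs with standard_conforming_strings on, so only single quotes need doubling. `sql_literal`, `values_expr`, `remote_join_sql`, and the `h.id` join column are hypothetical.

```python
from typing import Iterable, Sequence, Union

Value = Union[int, str, None]


def sql_literal(v: Value) -> str:
    """Render a value as a SQL literal; only None, int and str are handled."""
    if v is None:
        return "NULL"
    if isinstance(v, int) and not isinstance(v, bool):
        return str(v)
    if isinstance(v, str):
        # Assumes standard_conforming_strings = on: double embedded single quotes.
        return "'" + v.replace("'", "''") + "'"
    raise TypeError(f"unsupported value type: {type(v).__name__}")


def values_expr(rows: Iterable[Sequence[Value]], alias: str, columns: Sequence[str]) -> str:
    """Build '(VALUES (...), (...)) AS alias (col, ...)' from local rows."""
    tuples = ["(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows]
    if not tuples:
        raise ValueError("VALUES needs at least one row")
    return f"(VALUES {', '.join(tuples)}) AS {alias} ({', '.join(columns)})"


def remote_join_sql(remote_table: str, rows: Iterable[Sequence[Value]],
                    columns: Sequence[str], key: str) -> str:
    """Hypothetical remote query joining the remote table with shipped local rows."""
    return (f"SELECT * FROM {remote_table} h "
            f"JOIN {values_expr(rows, 's', columns)} ON s.{key} = h.{key}")
```

For example, `values_expr([(1, "foo"), (2, "bar")], "s", ["id", "name"])` yields the VALUES list in the query above. This approach only pays off when the local table is small, since every row is inlined into the remote query text.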
DISCUSSIONS