fdb: a query engine for factorised relational databases · 2017-03-22 · fdb: a query engine for...

FDB: A Query Engine for Factorised RelationalDatabases

Nurzhan Bakibayev, Dan Olteanu, and Jakub Zavodny — Oxford CS

Christan [email protected]

University of Florida

November 1, 2013

cgrant (UF) FDB: Factorised RDBMS November 1, 2013 1 / 39

Contents

1 Intro

2 F-Representation

3 F-Trees

4 Query EvaluationNormalisation OperatorSwap OperatorCartesian Product OperatorSelection Operator

Merge Selection OperatorAbsorb Selection OperatorSelection with Constant Operator

Projection Operator

5 Query OptimizationCost of an F-PlanExhaustive SearchGreedy Heuristic

6 ExperimentsExperiment #1Experiment #2Experiment #2Experiment #3Experiment #4

7 Conclusion8 Thanks


Intro

Intro

A succinct representation for large relations.

It is used in Google’s F1 data management system for ML.

Used for large incomplete/uncertain relations.

FDB can speed up expected queries.

This paper discusses basics FDB system.


Intro

TL;DR

Given a query: Q1 = Order ./item Store ./location Disp

And a result:

We can represent it as this:

Or a tree:

In exponentially -space and +speed for SPJ queries.


Intro

Contents

1 Intro

2 F-Representation

3 F-Trees

4 Query EvaluationNormalisation OperatorSwap OperatorCartesian Product OperatorSelection OperatorProjection Operator

5 Query Optimization

Cost of an F-PlanExhaustive SearchGreedy Heuristic




Intro

Example Schema


F-Representation

Factorized Representations

Relational algebra expressions.

Constructed with singleton relations 〈v〉, the union ∪ and productoperators ×.

exponentially more succinct than relations they encode.

fast (constant-delay) enumeration of tuples


F-Representation

Factorized RepresentationsDefinition 1

A factorised representation E , or f-representation for short, over a set S ofattributes and domain D is a relational algebra expression of the form

∅, empty relation over S;

〈〉, the relation consisting of the nullary tuplem if S = ∅;〈A : a〉, the unary relation with a single tuple with value a, if S = {A}and a is a value in the domain D;

(E ), where E is an f-representation over S;

E1 ∪ · · · ∪ En, where each Ei is an f-representation over S;

E1 × · · · × En, where each Ei is an f-representation over Si and S isthe disjoint union over all Si .


F-Representation

Factorized RepresentationsSingleton

〈A : a〉 is a singleton

|E | the size is the number of singletons

A single system may have many different f-representations.

The tuples of a given f-representation can be enumerated with O(|S|)additional space and delay.

Factorized Representation Compact Factorized Representation


F-Trees

F-TreesDefinition

An f-tree for short, over a schema S of attributes is an unordered rootedforest with each node labelled by a non-empty subset of S such that each

attribute of S labels exactly one node.

Q1 F-Tree Q1 Alternate F-Tree

Tree created in O(|S| · |R| · log |R|).

We can go back to R in O(|S| · |R|).


F-Trees

F-TreesQueries

Given a query Q = πPσϕ(R1 × · · · × Rn)

Given a query Q, an f-tree T is an f-tree of Q if and only if it satisfiesthe path constraint.

all dependent attributes can only label nodes along a sameroot-to-leaf path.

Nodes of a join are only dependent if they are in the P list or thesame Ri .


F-Trees

F-TreesSize Bounds

Let sT be the maximal root-to-leaf path edge cover T .

For and database D and f-tree T the result size is at mostO(|P| · |D|s(T ))

If we let s(Q) be the minimal s(T ) over all f-trees.

|P| · |D|s(Q) � Q(D) — This can mean asymptotically smaller,exponential size and time savings


Query Evaluation

Contents

1 Intro

2 F-Representation

3 F-Trees







Query Evaluation

F-plans

Any select-project-join query can be evaluated by a sequentialcomposition of operators called an f-plan.

f-trees uniquely identify f-representations → operators are in treeform.

The time complexity of each f-plan operator is O(|T |2N logN), whereN is the sum of input and output f-representations and T is the inputf-tree.


Query Evaluation Normalisation Operator

Normalisation Operator1

push-up operator ψBnormalization η(T )

A tree is normalized if the push up operator can not be applied.

The push-up operator is applied bottom up.

ψB η(T ) — Normalization

1All other operators expect normalized inputscgrant (UF) FDB: Factorised RDBMS November 1, 2013 15 / 39

Query Evaluation Swap Operator

Swap Operator XA,B

Exchanges B with its parent A and promotes children.


Query Evaluation Cartesian Product Operator

Cartesian Product Operator

T ′ = T∞ × T∈


Query Evaluation Selection Operator

Merge Selection OperatorµA,B

If A and B are siblings

µ( , ) =



Absorb Selection OperatorαA,B

If A is an ancestor of B

Example:



Selection with Constant OperatorσAθc

Filters out all the values in 〈A : a〉 where a¬θc


Query Evaluation Projection Operator

Projection Operatorπ−A

Removes a node from the tree. Only permitted if A is a leaf node or withother attributes.


Query Optimization

Contents

1 Intro

2 F-Representation

3 F-Trees







Query Optimization

Query Optimization

Query optimization has two objectives:

minimizing the cost of computing a factorised query result;

minimizing the size of this output representation.


Query Optimization

Query Optimization

Optimize the f-tree then the f-representation.

Products and Selections are always evaluated first (cheapest).

Selection (A = B) can only be done if they are on the same path orsiblings


Query Optimization Cost of an F-Plan

Cost of an F-Plan

use asymptotic bounds of s(T ).

with an evaluation time ofO(|Ds(f ) · log |D|), where s(f ) = max(s(T0, . . . s(Tk )).

, or, use cardinality of estimate from intermediate f-trees usingRDBMS style techniques


Query Optimization Exhaustive Search

Exhaustive Search

Enumerate plans and the one with the shortest path (Dijkstra)between Tinit to Tfinal is least cost.

Search space is O(n4n) with n − p attributes in the projection list


Query Optimization Greedy Heuristic

Greedy Heuristic

Only restructures nodes that appear in selections and projections.

In projection ordering chooses the least cost at each step.

Computing the tree take O(|T |2).


Experiments

Contents

1 Intro

2 F-Representation

3 F-Trees







Experiments

Implementation

Compare SQLite and PostgreSQL with FDB and RDB

for SQLite and PostgreSQL tried to disable expensive writing (try tomake it in-memory)

Implemented in C++

Optimal f-representation computed with multi-way merge-sort joinalgorithm.


Experiments

Experiment Design

Generate random data for queries

Repeat a number of times for optimizations time, execution time,representation sizes and f-plan quality

Tuples generated independently with a uniform or Zipf distribution.

Queries are equality joins


Experiments Experiment #1

Experiment #1

Average time for optimising a query on flat data.

A = 40 Attributes, R = 1, . . . , 8 relations, K = 1, . . . , 9 selections



Experiment #2



Experiment #3

Compare performance of FDB, RDB, SQLite and PostgreSQLResults size Evaluation Time



Experiment #3Compare performance of FDB, RDB, SQLite and PostgreSQL.

Each tuple size decreases results size by 20Results size Evaluation time



Experiment #3Compare performance of FDB, RDB, SQLite and PostgreSQL.

In QA is a many-many query. This is highly compressible so the factorizedresult is small.



Experiment #4Experiment over Factorized data

Results size Evaluation time


Conclusion

Conclusion

Factorizing Relational Data prevents the explosion of data size forlarge joins.

This paper described techniques and experiments to decrease this sizeand improving query performance.

For MLNs and other graph DBs that require many joins this cansignificantly improve query performance.


Thanks

Thank you

Questions?


fdb: a query engine for factorised relational databases · 2017-03-22 · fdb: a query engine for...

Documents