query languages for document stores
© 2013 triAGENS GmbH | 2013-06-18
Query Languages for Document Stores
2013-06-18
Jan Steemann
me
I'm a software developer working at triAGENS GmbH, CGN, on a document store
Documents
documents are self-contained, aggregate data structures...
...consisting of named and typed attributes, which can be nested / hierarchical
documents can be used to model complex business objects
Example order document
{
  "id": "abc10022",
  "date": "20130426",
  "customer": {
    "id": "c199023",
    "name": "acme corp."
  },
  "items": [
    {
      "id": "p123",
      "quantity": 1,
      "price": 25.13
    }
  ]
}
Document stores
document stores are databases specialised in handling documents
they've been around for a while and got really popular with the NoSQL buzz (CouchDB, MongoDB, ...)
Why use Document Stores?
Saving programming language data
document stores allow saving a programming language object as a whole
your programming language object becomes a document in the database, without the need for much transformation
compare this to saving data in a relational database...
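A minimal sketch of what "saving an object as a whole" can look like. The `store`, `save` and `load` names are hypothetical and stand in for a real document store, which would offer its own API; the order is the example document from this deck.

```javascript
// Hypothetical in-memory "document store": a Map from document id to JSON text.
const store = new Map();

function save(doc) {
  // The whole object is serialised as one document, no mapping layer involved.
  store.set(doc.id, JSON.stringify(doc));
}

function load(id) {
  const json = store.get(id);
  return json === undefined ? null : JSON.parse(json);
}

const order = {
  id: "abc10022",
  date: "20130426",
  customer: { id: "c199023", name: "acme corp." },
  items: [{ id: "p123", quantity: 1, price: 25.13 }]
};

save(order);
const loaded = load("abc10022");
console.log(loaded.customer.name); // the nested structure survives the round trip
```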
Persistence the relational way
orders
id  date        customer
1   2013-04-20  c1
2   2013-04-21  c2
3   2013-04-21  c1
4   2013-04-22  c3
customers
id  name
c1  acme corp.
c2  sample.com
c3  abc co.
orderitems
order  item  price  quantity
1      1     23.25  1
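To illustrate the extra work the relational layout implies, here is a rough sketch that rebuilds one nested order object from flat rows. The rows mirror the sample tables on this slide; `assembleOrder` is a hypothetical helper, not part of any library, and a real application would typically use SQL joins or an object-relational mapper instead.

```javascript
// Flat rows as they might come back from the three tables.
const orders = [
  { id: 1, date: "2013-04-20", customer: "c1" },
  { id: 2, date: "2013-04-21", customer: "c2" },
  { id: 3, date: "2013-04-21", customer: "c1" },
  { id: 4, date: "2013-04-22", customer: "c3" }
];
const customers = [
  { id: "c1", name: "acme corp." },
  { id: "c2", name: "sample.com" },
  { id: "c3", name: "abc co." }
];
const orderitems = [{ order: 1, item: 1, price: 23.25, quantity: 1 }];

// Hypothetical helper: stitch the rows of one order back into a nested object.
function assembleOrder(orderId) {
  const row = orders.find((o) => o.id === orderId);
  if (!row) return null;
  const customer = customers.find((c) => c.id === row.customer);
  const items = orderitems
    .filter((i) => i.order === orderId)
    .map(({ item, price, quantity }) => ({ id: item, price, quantity }));
  return { id: row.id, date: row.date, customer, items };
}

console.log(JSON.stringify(assembleOrder(1), null, 2));
```

With a document store, the nested object would be stored and fetched as it is, so this reassembly step disappears.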
Benefits of document stores
no impedance mismatch, no complex object-relational mapping, no normalisation requirements
querying documents is often easier and faster than querying highly normalised relational data
Schema-less
in document stores, there is no "table"-schema as in the relational world
each document can have different attributes
there is no such thing as ALTER TABLE
that's why document stores are called schema-less or schema-free
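A small sketch of what schema-less means in practice, using a plain array as a stand-in for a collection. The `users` data is made up for illustration.

```javascript
// Documents in the same "collection" may carry different attributes.
const users = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob", age: 42 },
  { id: 3, name: "carol", address: { city: "Cologne" } }
];

// Queries have to cope with attributes that are missing in some documents.
const withAge = users.filter((u) => u.age !== undefined);
const cities = users.map((u) => (u.address ? u.address.city : null));

console.log(withAge.length, cities);
```

Adding a new attribute needs no schema change; only the code reading the documents has to handle its absence in older ones.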
Querying Document Stores
Querying by document id is easy
every document store allows querying a single document at a time
accessing documents by their unique ids is almost always dead-simple
Complex queries?
what if you want to run complex queries (e.g. projections, filters, aggregations, transformations, joins, ...)?
let's check the available options in some of the popular document stores
CouchDB: map-reduce
querying by anything other than the document key / id requires writing a view
views are JavaScript functions that are stored inside the database
views are populated by incremental map-reduce
map-reduce
the map function is applied to each document (that changed)
map can filter out non-matching documents or emit modified or unmodified versions of them
emitted documents can optionally be passed into a reduce function
reduce is called with groups of similar documents and can thus perform aggregation
CouchDB map-reduce example
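// map is called for each (changed) document and emits one row per order item
map = function (doc) {
  var i, n = doc.orderItems.length;
  for (i = 0; i < n; ++i) {
    emit(doc.orderItems[i], 1);
  }
};

// reduce counts the emitted rows per key; on rereduce it sums up partial counts
reduce = function (keys, values, rereduce) {
  if (rereduce) {
    return sum(values);
  }
  return values.length;
};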
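To get a feeling for how map and reduce interact, the view can be simulated outside the database. The sketch below is a simplification under stated assumptions: `runView` is a hypothetical helper, `emit` is passed in as a parameter (CouchDB provides it as a global), and incremental updates and the rereduce step are left out.

```javascript
// Hypothetical stand-in for a view: run map over all documents,
// group the emitted rows by key, then reduce each group.
function runView(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  for (const doc of docs) {
    map(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, { key, values: [] });
      groups.get(k).values.push(value);
    });
  }
  const result = [];
  for (const { key, values } of groups.values()) {
    result.push({ key, value: reduce(values) });
  }
  return result;
}

// Made-up sample documents; order items are reduced to plain ids here.
const docs = [
  { orderItems: ["p1", "p2"] },
  { orderItems: ["p1"] },
  { orderItems: [] }
];

// Same idea as the view above: count how often each item was ordered.
const counts = runView(
  docs,
  (doc, emit) => doc.orderItems.forEach((item) => emit(item, 1)),
  (values) => values.length
);

console.log(counts);
```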
map-reduce
map-reduce is generic and powerful
provides a programming language
need to create views for everything that is queried
access to a single "table" at a time (no cross-"table" views)
a bit clumsy for ad-hoc exploratory queries
MongoDB: find()
ad-hoc queries in MongoDB are much easier
can directly apply filters on collections, which makes it easy to find specific documents:
mongo> db.orders.find({
  "customer": { "id": "c1", "name": "acme corp." }
});
MongoDB: complex filters
can filter on any document attribute or sub-attribute
indexes will automatically be used if present
nesting filters allows complex queries
quite flexible and powerful, but tends to be hard to use and read for more complex queries
MongoDB: complex filtering
mongo> db.users.find({
  "$or": [
    { "active": true },
    { "age": { "$gte": 40 } }
  ]
});
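How such a nested filter is evaluated can be sketched in a few lines. This is not MongoDB's implementation: `matches` is a hypothetical, heavily simplified matcher that only knows plain equality, `$gte` and `$or`, and the sample users are made up.

```javascript
// Hypothetical matcher supporting only equality, "$gte" and "$or".
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (field === "$or") {
      // "$or" holds if at least one of the nested filters matches
      return cond.some((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object" && "$gte" in cond) {
      return doc[field] >= cond.$gte;
    }
    return doc[field] === cond;
  });
}

const users = [
  { name: "a", active: true, age: 25 },
  { name: "b", active: false, age: 45 },
  { name: "c", active: false, age: 30 }
];

const filter = { $or: [{ active: true }, { age: { $gte: 40 } }] };
const found = users.filter((u) => matches(u, filter)).map((u) => u.name);

console.log(found);
```

The filter is itself a document, which is what makes nesting possible and what makes larger filters hard to read.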
MongoDB: more options
can also use JavaScript functions for filtering, or JavaScript map-reduce
several aggregation functions are also provided
neither option allows running cross-"table" queries
Why not use a Query Language?
Query languages
a good query language should
allow writing both simple and complex queries, without having to switch the methodology
provide the required features for filtering, aggregation, joining etc.
hide the database internals
SQL
in the relational world, there is one accepted general-purpose query language: SQL
it is quite well-known and mature:
35+ years of experience
many developers and established tools around it
standardised (but mind the "dialects"!)
SQL in document stores?
SQL is good at handling relational data
not good at handling multi-valued or hierarchical attributes, which are common in documents
(too) powerful: SQL provides features many document stores intentionally lack (e.g. joins, transactions)
SQL has not been adopted by document stores yet
Query Languages for Document Stores
XQuery?
XQuery is a query and programming language
targeted mainly at processing XML data
can process hierarchical data
very powerful and extensible
W3C recommendation
XQuery
XQuery has found most adoption in the area of XML processing
today people want to use JSON, not XML
XQuery not available in popular document stores
ArangoDB Query Language (AQL)
ArangoDB provides AQL, a query language made for JSON document processing
it allows running complex queries on documents, including joins and aggregation
language syntax was inspired by XQuery and provides similar concepts such as FOR, LET, RETURN, ...
the language integrates JSON "naturally"
AQL example
FOR order IN orders
  FILTER order.status == "processed"
  LET itemsValue = SUM((
    FOR item IN order.items
      FILTER item.status == "confirmed"
      RETURN item.price * item.quantity
  ))
  FILTER itemsValue >= 500
  RETURN {
    "items" : order.items,
    "itemsValue" : itemsValue,
    "itemsCount" : LENGTH(order.items)
  }
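For readers who think in application code, the query above roughly corresponds to the following JavaScript over an in-memory array. This is only an analogy under assumptions: the sample `orders` are made up, and the sketch says nothing about how ArangoDB executes AQL.

```javascript
// Made-up sample data with the attributes the query refers to.
const orders = [
  { status: "processed", items: [
    { status: "confirmed", price: 300, quantity: 2 },
    { status: "cancelled", price: 999, quantity: 1 }
  ] },
  { status: "processed", items: [
    { status: "confirmed", price: 100, quantity: 1 }
  ] },
  { status: "open", items: [
    { status: "confirmed", price: 800, quantity: 1 }
  ] }
];

const result = orders
  // FILTER order.status == "processed"
  .filter((order) => order.status === "processed")
  // LET itemsValue = SUM(...) over the confirmed items
  .map((order) => ({
    order,
    itemsValue: order.items
      .filter((item) => item.status === "confirmed")
      .reduce((sum, item) => sum + item.price * item.quantity, 0)
  }))
  // FILTER itemsValue >= 500
  .filter(({ itemsValue }) => itemsValue >= 500)
  // RETURN { ... }
  .map(({ order, itemsValue }) => ({
    items: order.items,
    itemsValue,
    itemsCount: order.items.length
  }));

console.log(result);
```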
AQL: some features
queries can combine data from multiple "tables"
this allows joins using any document attributes or sub-attributes
indexes will be used if present
AQL: join example
FOR user IN users
  FILTER user.id == 1234
  RETURN {
    "user" : user,
    "posts" : (
      FOR post IN blogPosts
        FILTER post.userId == user.id && post.date >= '20130613'
        RETURN post
    )
  }
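The join can be read as a nested loop over two collections. A rough JavaScript analogy over in-memory arrays, with made-up sample data:

```javascript
const users = [
  { id: 1234, name: "jan" },
  { id: 5678, name: "other" }
];
const blogPosts = [
  { userId: 1234, date: "20130612", title: "old" },
  { userId: 1234, date: "20130614", title: "new" },
  { userId: 5678, date: "20130615", title: "someone else" }
];

const result = users
  .filter((user) => user.id === 1234)
  .map((user) => ({
    user,
    // Sub-query: all recent posts of this user. Dates are strings in
    // YYYYMMDD form, so string comparison orders them chronologically.
    posts: blogPosts.filter(
      (post) => post.userId === user.id && post.date >= "20130613"
    )
  }));

console.log(JSON.stringify(result, null, 2));
```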
AQL: additional features
AQL provides basic functionality to query graphs, too
the language can be extended with user-defined JavaScript functions
JSONiq
JSONiq is a data processing and query language for handling JSON data
it is based on XQuery, thus provides the same FLWOR expressions: FOR, LET, WHERE, ORDER, ...
JSON is integrated "naturally"
most of the XML handling is removed
JSONiq: example
for $order in collection("orders")
where $order.customer.id eq "abc123"
return { customer : $order.customer, items : $order.items }
JSONiq: join example
for $post in collection("posts")
let $postId := $post.id
for $comment in collection("comments")
where $comment.postId eq $postId
group by $postId
order by count($comment) descending
return { id : $postId, comments : count($comment) }
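What the join with grouping computes can be sketched in JavaScript as follows. The sample data is made up, post ids are assumed to be unique, and the sketch is an analogy, not a description of how a JSONiq engine evaluates the query.

```javascript
const posts = [{ id: "p1" }, { id: "p2" }, { id: "p3" }];
const comments = [
  { postId: "p1" },
  { postId: "p2" },
  { postId: "p2" },
  { postId: "x" } // refers to no known post
];

// Join: only comments that belong to an existing post are counted.
const postIds = new Set(posts.map((post) => post.id));
const counts = new Map();
for (const comment of comments) {
  if (!postIds.has(comment.postId)) continue;
  counts.set(comment.postId, (counts.get(comment.postId) || 0) + 1);
}

// Group results, ordered by number of comments, descending.
const result = [...counts.entries()]
  .map(([id, n]) => ({ id, comments: n }))
  .sort((a, b) => b.comments - a.comments);

console.log(result);
```

Posts without any comment do not show up in the result, because the join only produces rows for matching post/comment pairs.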
JSONiq
JSONiq is a generic, database-agnostic language
it can be extended with user-defined XQuery functions
JSONiq is currently not implemented inside any document database...
JSONiq
...but it can be used via a service (at 28.io)
the service provides the JSONiq query language and implements functionality not provided by a specific database
such features are implemented client-side, e.g. joins for MongoDB
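A client-side join could, for example, be done as a hash join over two result sets that were fetched separately. The following is a generic sketch under that assumption; `hashJoin` is a hypothetical helper and does not show how 28.io actually implements joins.

```javascript
// Hypothetical client-side hash join: index one side by its join key,
// then probe the index once for every document of the other side.
function hashJoin(left, right, leftKey, rightKey) {
  const index = new Map();
  for (const r of right) {
    const k = r[rightKey];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  }
  const out = [];
  for (const l of left) {
    for (const r of index.get(l[leftKey]) || []) {
      out.push({ left: l, right: r });
    }
  }
  return out;
}

// Made-up result sets, as they might come back from two separate queries.
const orders = [
  { id: 1, customer: "c1" },
  { id: 2, customer: "c2" },
  { id: 3, customer: "c9" } // no matching customer
];
const customers = [
  { id: "c1", name: "acme corp." },
  { id: "c2", name: "sample.com" }
];

const joined = hashJoin(orders, customers, "customer", "id");
console.log(joined.map((row) => [row.left.id, row.right.name]));
```

The price of doing this on the client is that both result sets have to be transferred and held in memory first.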
Summary
today's document stores provide different, proprietary mechanisms for querying data
there is currently no standard query mechanism for document stores as there is in the relational world (SQL)
Summary
you CAN use query languages in document stores today, e.g. AQL and JSONiq
if you like the idea, give them a try, provide feedback and contribute!