schema less table & dynamic schema

Schema-less table & Dynamic Schema

Davide Mauri

dmauri@solidq.com

@mauridb

Davide Mauri

• Microsoft SQL Server MVP

• Has worked with SQL Server since 6.5 and on BI since 2003

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, High-Performance Data Warehousing, BI, Big Data

• President of UGISS (Italian SQL Server UG)

• Regular Speaker @ SQL Server events

• Consulting & Training, Mentor @ SolidQ

• E-mail: dmauri@solidq.com

• Twitter: @mauridb

• Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx

Agenda

• Schema, Schemaless & Implicit Schemas

• Possible solutions

• Conclusion

Schema

• “A priori” definition of data structures

• Allows data to be inserted if and only if it is compatible with the schema

• E.g.: RDBMS table, XML Schema, class, struct
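
The "if and only if it is compatible" rule can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for any RDBMS; the `customer` table, its columns and the `try_insert` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Explicit schema: column names, NOT NULL and CHECK rules are declared up front.
conn.execute(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (typeof(age) = 'integer' AND age >= 0)
    )
    """
)

def try_insert(row):
    """Return True if the schema accepts the row, False if it rejects it."""
    try:
        conn.execute("INSERT INTO customer (id, name, age) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((1, "Alice", 30))        # compatible with the schema
rejected_null = try_insert((2, None, 41))      # violates NOT NULL
rejected_type = try_insert((3, "Bob", "old"))  # violates the CHECK on age
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Only the first row reaches the table; the other two are refused by the database itself, not by application code.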

Schemaless (?)

• No definition at all of the data you expect to have.

• Unstructured data.

• For example: text files, binary files

• with no metadata and no position-based format

• In one word: chaos

Implicit Schema

• In reality a schema always exists, albeit implicit

• Otherwise it would be impossible to handle data

Implicit Schema

Any data that doesn't fit this implicit schema will not be manipulated properly, leading to errors.

(Schemaless data structures, Martin Fowler)

• Flexibility

• Easy to manage

• actually, almost no management at all

• Easy to extend

• Just add a new element and you’re done

• Easy to use

• No mismatch between OOP and other models

• Schema information is hidden somewhere

• Scattered all across the codebase

• It’s really difficult to keep the chaos that can emerge under control

• For example, two different elements that contain the same information

• CustomerName and Customer_Name

• You still need to have a sort of «First Normal Form» in order to avoid inconsistency and code inefficiencies

• It’s really difficult to define and maintain integrity constraints

• Data Integrity is a value that must be preserved!

• Otherwise we’ll have data, not information

• XML Schemas were born for that specific reason

• Without Data Integrity, the process of extracting information from data becomes

• Difficult

• Expensive

• Untrustworthy
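
The `CustomerName` / `Customer_Name` problem can be sketched in a few lines of Python. The records, the `customer_names` reader and the `REQUIRED_KEYS` set are hypothetical; the point is only that the schema lives implicitly in the reader code until someone writes it down.

```python
# Two "schemaless" records describing the same kind of entity.
# The second one spells the customer-name key differently.
records = [
    {"CustomerName": "Alice", "City": "Milan"},
    {"Customer_Name": "Bob", "City": "Rome"},
]

def customer_names(rows):
    """Reader code: it silently assumes every record has a 'CustomerName' key."""
    return [row.get("CustomerName") for row in rows]

names = customer_names(records)  # Bob's name is lost and no error is raised

# Writing the schema down turns the silent loss into a visible violation.
REQUIRED_KEYS = {"CustomerName", "City"}

def violations(rows):
    """Return (index, missing_keys, unexpected_keys) for each non-conforming record."""
    found = []
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - set(row)
        unexpected = set(row) - REQUIRED_KEYS
        if missing or unexpected:
            found.append((i, sorted(missing), sorted(unexpected)))
    return found

problems = violations(records)
```

Without the explicit check, the second record is simply read as having no name, which is exactly the kind of inconsistency that makes the resulting information hard to trust.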

Words of Wisdom

«Schemaless => implicit schema = bad. Prefer an explicit schema»

(Schemaless data structures, Martin Fowler)

But if we need it anyway?

• What if my use case is one that perfectly fits the need for an implicit schema?

• The only possible solutions are the so-called «No-SQL» databases

• Document Database or Key-Value store?

• How can I integrate it into an already existing database?

• Integration does not come for free!

Schemaless & RDBMS

• (Usually) they are at opposite extremes

• Still, it is a very common request

• CRM, eCommerce, ERPs…

• Schemaless is used not only for pure data persistence

Solution within an RDBMS

• «Custom» columns

• Custom1, Custom2

• In-Table Data Structures

• BLOB, XML, JSON, «Complex» columns

• Entity-Attribute-Value Models

«Custom» Columns

• A problem until SQL Server 2008

• Space is still used by fixed-length columns even if they contain NULL

• With SQL Server 2008 the «Sparse Column» feature comes to help

• Helps to make the schema easily modifiable, even in the presence of existing data

• Changes to the schema must still be done with «ALTER TABLE»
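
A minimal sketch of the «custom column» approach follows, again using Python's `sqlite3` as a stand-in. SQLite has no sparse columns, so this only shows the `ALTER TABLE` step on a table that already holds data; on SQL Server the new column definition would additionally carry the `SPARSE` keyword. The `product` table and the `add_custom_column` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO product (id, name) VALUES (1, 'Laptop')")

def add_custom_column(table, column, sql_type):
    """Extend the schema at run time. Identifiers cannot be bound as
    parameters, so they are validated before being placed in the statement."""
    for identifier in (table, column, sql_type):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")

add_custom_column("product", "display_size", "REAL")

# Existing rows are untouched and read NULL in the new column.
existing = conn.execute("SELECT name, display_size FROM product").fetchall()

conn.execute("INSERT INTO product (id, name, display_size) VALUES (2, 'Tablet', 10.1)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
```

The schema stays explicit: every custom attribute is a real, typed column that the database knows about.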

«Custom» Columns

• Sparse Columns

• Are columns in every respect

• Optionally you can have *all* the Sparse Columns returned as a single XML column

• «Column Set»

• Make development easier

• Take no space when not used

• But use more space when used

Demo

Dynamic Schema & Sparse Columns

In-Table Data Structures

• Complete support for XML

• XPath/XQuery

• XML Index

• Performance «Good Enough»

• But not optimal (compared with the equivalent relational approach)

• Uses a lot of space

• XML sometimes needs some help to boost performance

• It would be nice to be able to «promote» elements to turn them into real columns

• Must be done manually using one of

• Triggers

• Stored Procedure

• Data Access Layer

• Service Broker
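
As an illustration of the Data Access Layer option, here is a minimal sketch in Python with `sqlite3` and `xml.etree.ElementTree`. The `product` table, the `<cpu>` element and the `save_product` function are assumptions made for the example, not part of the original demo material.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# 'attributes' holds the flexible XML document; 'cpu' is the promoted element.
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, attributes TEXT NOT NULL, cpu TEXT)"
)
conn.execute("CREATE INDEX ix_product_cpu ON product (cpu)")

def save_product(product_id, attributes_xml):
    """Data-access-layer write: store the document and keep the promoted
    column in sync with the <cpu> element (None when the element is absent)."""
    root = ET.fromstring(attributes_xml)
    cpu = root.findtext("cpu")
    conn.execute(
        "INSERT INTO product (id, attributes, cpu) VALUES (?, ?, ?)",
        (product_id, attributes_xml, cpu),
    )

save_product(1, "<attributes><cpu>i7</cpu><display>15.4</display></attributes>")
save_product(2, "<attributes><cpu>i5</cpu></attributes>")
save_product(3, "<attributes><color>red</color></attributes>")

# The search uses the ordinary indexed column, with no XML parsing at query time.
i7_ids = [r[0] for r in conn.execute(
    "SELECT id FROM product WHERE cpu = ? ORDER BY id", ("i7",)
)]
missing_cpu = conn.execute("SELECT cpu FROM product WHERE id = 3").fetchone()[0]
```

The trade-off is that every write path has to go through `save_product`; a trigger or stored procedure would enforce the same rule inside the database instead.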

• JSON support is still missing in SQL Server

• But other databases like PostgreSQL already have it…

• …so we can expect to see it coming to the MS platform too

• Right now one solution is to use SQLCLR

• Solutions available on the web:

• http://www.sqlservercentral.com/articles/SQLCLR/74160/

• http://www.json4sql.com/examples.html

• There is also a pure T-SQL solution

• https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

• Blob is an option if you just need to do persistence

• Blobs can be stored in different ways

• «Classic» blob inside SQL Server pages & extents

• Blob in a filestream

• Blob in a filetable

Demo

Dynamic Schema & In-Table Data Structures

Entity-Attribute-Values

• Old and very common technique to store attribute-value pairs

• Some well-known examples: Wordpress

• Works on any RDBMS

• No «special» features required

• There’s a huge debate around it

• http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_mo

• But until SQL Server 2005 there was no true alternative

• Offers maximum flexibility

• No real control over data types.

• Options to deal with data types

• All strings

• SQL Variant

• One-Column-Per-Type

• Complex query pattern for «AND» predicates between attributes

• «Return all the entities that have CPU=i7 and Display=15.4"»

• Queries require the implementation of a relational operator not implemented in common RDBMSs

• «Relational Division»

• Documented and well explained in theory

• It is quite easy to implement. Follow the theory + add some pepper to boost performance
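
The One-Column-Per-Type option and the «AND» predicate problem can be sketched together. This is a minimal example in Python with `sqlite3`; the `entity_attribute` table and its columns are hypothetical, and the `GROUP BY ... HAVING COUNT(*)` form is only one common way to write the query, not necessarily the one used in the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One-column-per-type EAV: each row fills exactly one of the typed value columns.
conn.execute(
    """
    CREATE TABLE entity_attribute (
        entity_id  INTEGER NOT NULL,
        attribute  TEXT    NOT NULL,
        text_value TEXT,
        num_value  REAL,
        PRIMARY KEY (entity_id, attribute),
        CHECK ((text_value IS NULL) <> (num_value IS NULL))
    )
    """
)
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?, ?, ?)",
    [
        (1, "CPU", "i7", None), (1, "Display", None, 15.4),
        (2, "CPU", "i7", None), (2, "Display", None, 13.3),
        (3, "CPU", "i5", None), (3, "Display", None, 15.4),
    ],
)

# "CPU = i7 AND Display = 15.4": each condition matches a different row, so
# a plain WHERE ... AND ... on a single row can never be true. Instead, match
# either condition and keep the entities that satisfied both of them.
matches = conn.execute(
    """
    SELECT entity_id
    FROM entity_attribute
    WHERE (attribute = 'CPU' AND text_value = 'i7')
       OR (attribute = 'Display' AND num_value = 15.4)
    GROUP BY entity_id
    HAVING COUNT(*) = 2
    ORDER BY entity_id
    """
).fetchall()
```

The count in the `HAVING` clause must equal the number of conditions, and it relies on the primary key guaranteeing at most one row per entity and attribute.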

Relational Division

• Let’s get back to theory a little bit, in order to see the problem from a more open perspective:

[Figure: relational division of a Dividend by a Divisor, giving a Result and a Remainder]

Relational Division

• How do we implement the division?

• Thanks to Codd and the relational theory we already have the solution

Relational Division

• Thanks to relational algebra we know that the division is expressed as

• Generate all possible pairings

• Remove existing pairings

• (Now we’ve found all pairings that are NOT answers)

• Remove the non-answers from the dividend
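
The three steps above can be sketched as a single query. The example below is a minimal illustration in Python with `sqlite3`, assuming hypothetical `dividend` and `divisor` tables; it writes the removals with `EXCEPT`, which is one possible reading of the steps rather than the exact code from the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dividend (entity_id INTEGER, attribute TEXT, value TEXT);
    CREATE TABLE divisor  (attribute TEXT, value TEXT);

    INSERT INTO dividend VALUES
        (1, 'CPU', 'i7'), (1, 'Display', '15.4'), (1, 'Color', 'black'),
        (2, 'CPU', 'i7'), (2, 'Display', '13.3'),
        (3, 'CPU', 'i5'), (3, 'Display', '15.4');

    INSERT INTO divisor VALUES ('CPU', 'i7'), ('Display', '15.4');
    """
)

DIVISION = """
    -- Step 3: remove the non-answers from the dividend.
    SELECT DISTINCT entity_id FROM dividend
    EXCEPT
    SELECT entity_id FROM (
        -- Step 1: generate all possible (entity, divisor row) pairings.
        SELECT e.entity_id, d.attribute, d.value
        FROM (SELECT DISTINCT entity_id FROM dividend) AS e
        CROSS JOIN divisor AS d
        -- Step 2: remove the pairings that exist; what is left are the
        -- pairings an entity is missing, so it cannot be an answer.
        EXCEPT
        SELECT entity_id, attribute, value FROM dividend
    ) AS missing
    ORDER BY entity_id
"""
result = conn.execute(DIVISION).fetchall()
```

Entity 1 is the only one that owns every pairing in the divisor; its extra `Color` attribute does not matter, which is the behaviour of division with a remainder.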

Demo

Dynamic Schema & EAV

Conclusions

• It works!

• Performance is more than good

• Choose the solution that best fits your use case

• Search for attributes only?

• Persistence only?

• Search for attributes & values?

• Performance read, write, read/write?

Conclusions

• Use it if and only if it is really needed

• Always remember the «Words of Wisdom»

• If you can, define and use a schema.

• It may seem «not cool» and convoluted, but in the long term it is the best solution.

• *data* *must* *be* *turned* *into* *information*

• Sooner or later

• Without metadata (a schema) it’s really really really hard!

Questions?

Thanks!

• If you want to rate this session on my SpeakerScore page:

• www.speakerscore.com

• Feedback Key: TZQL

Demo Material

• Can be found here

• http://1drv.ms/1Av5mb5

• Everything is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

• http://creativecommons.org/licenses/by-nc-sa/4.0/
