NoSQL and Data Modeling for Data Modelers
Big Data, NoSQL & Data Modeling
10 Tips for Data Modeling Success on Modern Data Projects
Karen Lopez, InfoAdvisors
www.datamodel.com
Data Models – Traditional Process
Conceptual (Data) Model
Logical Data Model
Physical Data Model(s): OLTP, MART
Aug 2014©InfoAdvisors - infoadvisors.com
Relational
Data models started with relational modeling, so they look like relational database structures.
But….
That doesn’t mean they can’t be used to model data that goes into a non-relational format.
All that formatting happens at build OR consumption time, not requirements time.
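As a rough illustration of that point, the sketch below keeps one logical entity definition and renders it two ways at build time: once as a relational CREATE TABLE statement and once as a JSON document. The Customer entity, its attributes, the type mapping, and the helper names are all hypothetical; none of them come from the slides.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    logical_type: str  # only "string" and "integer" are handled in this sketch
    required: bool = True


# Hypothetical logical model: an entity name plus its attributes.
CUSTOMER = (
    "Customer",
    [
        Attribute("customer_id", "integer"),
        Attribute("name", "string"),
        Attribute("postal_code", "string", required=False),
    ],
)

# Assumed mapping from logical types to one relational dialect's types.
SQL_TYPES = {"string": "VARCHAR(100)", "integer": "INTEGER"}


def to_relational_ddl(entity):
    """Render the logical entity as a CREATE TABLE statement."""
    name, attrs = entity
    cols = []
    for attr in attrs:
        null_clause = " NOT NULL" if attr.required else ""
        cols.append(f"  {attr.name} {SQL_TYPES[attr.logical_type]}{null_clause}")
    return f"CREATE TABLE {name} (\n" + ",\n".join(cols) + "\n);"


def to_document(entity, values):
    """Render one instance of the same entity as a JSON document."""
    name, attrs = entity
    missing = [a.name for a in attrs if a.required and a.name not in values]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    doc = {a.name: values[a.name] for a in attrs if a.name in values}
    return json.dumps({"type": name, **doc}, sort_keys=True)


ddl = to_relational_ddl(CUSTOMER)
document = to_document(CUSTOMER, {"customer_id": 1, "name": "Ada"})
```

The requirements (which facts exist, which are mandatory) live in one place; only the two rendering functions know anything about the target format.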
The Big Data Story
Lots of data
Coming at us fast
Lots of variety in format & quality
We want all the data
Highly available
“It’s web scale”
What do we really mean by scale?
Bringing computing to the data
Massively parallel processing
Cheap, commodity hardware, but lots of it
Optimized for Query/Reads/Questions/Telling stories
We’ve been down this road before…
Traditional transactional applications
Reporting-optimized tables/structures
Data Warehouse / Dimensional Modeling
Spectrum: Highly normalized to highly denormalized
[Diagram: ETL, EDW, Data Mart, Data Mart]
[Diagram: Hadoop, ETL, EDW, Analytics Mart, Data Mart]
NoSQL, Not Only SQL
Relational
Graph
Columnar/Column Family
Key Value
Document Databases
Others
Sample Hive Statement
CREATE EXTERNAL TABLE TaxRebateUsage (
  state string,
  zipcode string,
  agi_class int,
  n1 int,
  mars2 int,
  prep int,
  n2 int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
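The Hive statement above attaches column names and types to a comma-delimited text file without rewriting the file. A rough Python analogue of that idea is sketched below: the column list is copied from the statement, while the two sample rows and the read_rows helper are made up for illustration. Values that do not fit the declared type become None, which mirrors how Hive generally returns NULL for such fields in delimited text tables.

```python
# Column names and types taken from the Hive statement:
# Hive "string" is read as str, Hive "int" as int.
SCHEMA = [
    ("state", str),
    ("zipcode", str),
    ("agi_class", int),
    ("n1", int),
    ("mars2", int),
    ("prep", int),
    ("n2", int),
]

# Hypothetical file contents; the second row has a bad value in column n1.
SAMPLE = "AL,35004,1,1530,470,770,2710\nAL,35005,2,abc,300,600,2100\n"


def read_rows(text):
    """Apply SCHEMA to each comma-delimited line at read time."""
    for line in text.splitlines():
        fields = line.split(",")
        row = {}
        for (name, cast), value in zip(SCHEMA, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                row[name] = None  # value does not match the declared type
        yield row


rows = list(read_rows(SAMPLE))
```

The text file itself never changes; a different SCHEMA list could be applied to the same bytes by a different reader.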
Sample JSON/MongoDB Notation
Sample FoundationDB Statement
Sample Cassandra Statement
Sample Vertica Statement
Sample Neo4j Statement
Those weren’t SCHEMALESS….
They had data facts, which had meanings. And sometimes expected formats, precisions, and types.
In the NoSQL world, we don’t necessarily apply those at write time; we apply them at read time.
SCHEMALESS really is MULTIPLE SCHEMAs (Polyschematic) or VARYING SCHEMAs.
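To make "polyschematic" concrete, here is a minimal sketch in which three stored documents each use a different shape, and the reader applies one expected schema when it loads them. The documents, the field names, and the read_customer helper are hypothetical and only illustrate the schema-on-read idea.

```python
import json

# Hypothetical stored documents: same kind of fact, three different shapes.
RAW_DOCUMENTS = [
    '{"name": "Ada", "zip": "35004"}',
    '{"full_name": "Grace", "postal_code": 10001}',
    '{"name": "Linus"}',
]


def read_customer(line):
    """Map whichever shape was written onto the shape this reader expects."""
    doc = json.loads(line)
    name = doc.get("name", doc.get("full_name"))
    postal = doc.get("zip", doc.get("postal_code"))
    return {
        "name": name,
        # Normalize to text so numeric and string postal codes compare equally.
        "postal_code": None if postal is None else str(postal),
    }


customers = [read_customer(line) for line in RAW_DOCUMENTS]
```

Nothing was rejected at write time, but the reader still had to know which facts to expect and what they mean, and that knowledge is what a data model records.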
The Big Data Big Lies
Schemaless
• Schema on Read, not Schema on Write
• Polyschematic
Big
• New data stories
• New technologies
• Not just volume
10 Tips for Modeling in a Hybrid World
1. Models require a modeler
2. Data modeling tools are essential
3. There are many types of data models: know which ones you need
4. Modeling does not have to happen at the same time in every project. It should happen at the right time
5. Modeling is not just schema design. Think outside the boxes and lines
10 Tips for Modeling in a Hybrid World
6. A data model is much more than a diagram
7. You will need training.
8. Team members may not understand modeling. They will need training
9. NoSQL is not one thing. Learn many patterns
10. Modern data architectures are likely hybrid solutions. You can’t just support one part.
What does this mean for data modelers?
There will be jobs for traditional, ERD, relational modelers….
….just like there are still jobs for RPG and COBOL programmers
All data has a data story. Many data stories.
A good modeler is an architect at heart – finding the right solution for the data story.
Business Intelligence Journal
Look for the September 2014 issue article on Modern Data Architectures
Thank You!
www.infoadvisors.com
www.datamodel.com
www.dataversity.net
community.embarcadero.com
#TEAMDATA