SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Post on 16-Apr-2017


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rick Houlihan, Principal TPM – DBS NoSQL

26 July 2016

From SQL to NoSQL: Best Practices for Migrating from RDBMS to DynamoDB

Agenda

• Evolution of Data Processing
• Why NoSQL
• Key DynamoDB concepts
• SQL to NoSQL Data Modeling

Timeline of Database Technology

[Figure: data pressure over time]

Data Volume Since 2010

[Figure: data volume, historical vs. current]

90% of stored data generated in last 2 years

1 Terabyte of data in 2010 equals 6.5 Petabytes today

Linear correlation between data pressure and technical innovation

SQL is not built for this

Technology Adoption and the Hype Curve

Why NoSQL?

SQL | NoSQL
Optimized for storage | Optimized for compute
Normalized/relational | Denormalized/hierarchical
Ad hoc queries | Instantiated views
Scale vertically | Scale horizontally
Good for OLAP | Built for OLTP at scale

Amazon DynamoDB

• Fully Managed NoSQL
• Document or Key-Value
• Scales to Any Workload
• Fast and Consistent
• Access Control
• Event Driven Programming

Table

[Figure: a table contains items; each item contains attributes]

Partition Key
• Mandatory
• Key-value access pattern
• Determines data distribution

Sort Key
• Optional
• Model 1:N relationships
• Enables rich query capabilities

Query capabilities: all items for key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values

Partition Keys

• Partition Key uniquely identifies an item
• Partition Key is used for building an unordered hash index
• Allows table to be partitioned for scale

Item | Hash
Id = 1, Name = Jim | Hash (1) = 7B
Id = 2, Name = Andy, Dept = Eng | Hash (2) = 48
Id = 3, Name = Kim, Dept = Ops | Hash (3) = CD

[Figure: key space 00 to FF split into the ranges 00-54, 55-A9 and AA-FF, one per partition; each item is placed by the hash of its partition key]
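How a hashed partition key selects a partition can be sketched roughly as follows. This is only an illustration: DynamoDB's real hash function is internal, so MD5 stands in for it here, and the three ranges assume the key-space figure reads as 00-54, 55-A9 and AA-FF. The names `key_hash` and `partition_for` are made up for this sketch.

```python
import hashlib

# Inclusive upper bound of each partition's slice of the 00..FF key space
# (assumed from the figure: 00-54, 55-A9, AA-FF).
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def key_hash(partition_key) -> int:
    """Map a partition key to one byte (0x00..0xFF). MD5 is a stand-in only."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return digest[0]


def partition_for(hash_value: int) -> int:
    """Return the 1-based number of the partition whose range holds hash_value."""
    for number, upper in enumerate(PARTITION_UPPER_BOUNDS, start=1):
        if hash_value <= upper:
            return number
    raise ValueError("hash value outside the 00..FF key space")


# Place the three example items using the hash values given on the slide.
for item_id, slide_hash in [(1, 0x7B), (2, 0x48), (3, 0xCD)]:
    print(f"Id = {item_id}: hash {slide_hash:02X} -> partition {partition_for(slide_hash)}")
```

Because the hash output is effectively unordered, neighbouring Id values land in unrelated partitions, which is what spreads the load.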

Partition:Sort Key

• Partition:Sort Key uses two attributes together to uniquely identify an item
• Within unordered hash index, data is arranged by the sort key
• No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes

Hash | Items (ordered by sort key)
Hash (2) = 48 | Customer# = 2, Order# = 10, Item = Pen; Customer# = 2, Order# = 11, Item = Shoes
Hash (1) = 7B | Customer# = 1, Order# = 10, Item = Toy; Customer# = 1, Order# = 11, Item = Boots
Hash (3) = CD | Customer# = 3, Order# = 10, Item = Book; Customer# = 3, Order# = 11, Item = Paper

[Figure: key space 00:0 to FF:∞ split across Partition 1, Partition 2 and Partition 3, with boundaries at 54:∞ / 55 and A9:∞ / AA]
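The arrangement described above (items grouped by partition key, ordered by sort key within the group) can be modelled with a small in-memory structure. This is a toy sketch, not how DynamoDB stores data; the class name `ItemCollections` and its methods are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ItemCollections:
    """Toy model: items grouped by partition key and kept ordered by sort key."""

    def __init__(self):
        self._rows = defaultdict(list)  # partition key -> sorted [(sort key, item)]

    def put(self, partition_key, sort_key, item):
        rows = self._rows[partition_key]
        keys = [key for key, _ in rows]
        pos = bisect_left(keys, sort_key)
        if pos < len(rows) and keys[pos] == sort_key:
            rows[pos] = (sort_key, item)      # same composite key: replace the item
        else:
            rows.insert(pos, (sort_key, item))

    def query(self, partition_key, low=None, high=None):
        """Return items of one partition key with low <= sort key <= high, in order."""
        rows = self._rows.get(partition_key, [])
        keys = [key for key, _ in rows]
        start = 0 if low is None else bisect_left(keys, low)
        stop = len(rows) if high is None else bisect_right(keys, high)
        return [item for _, item in rows[start:stop]]


orders = ItemCollections()
for customer, order, item in [(2, 11, "Shoes"), (2, 10, "Pen"), (1, 10, "Toy"),
                              (1, 11, "Boots"), (3, 10, "Book"), (3, 11, "Paper")]:
    orders.put(customer, order, item)

print(orders.query(2))           # every order of customer 2, ordered by Order#
print(orders.query(1, low=11))   # orders of customer 1 with Order# >= 11
```

Range conditions only ever apply inside one partition key, which is why the sort key is the natural place to model 1:N relationships.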

Partitions are three-way replicated

[Figure: each partition (Partition 1, Partition 2, ... Partition N) is stored on Replica 1, Replica 2 and Replica 3; the same example items (Id = 1, Name = Jim; Id = 2, Name = Andy, Dept = Eng; Id = 3, Name = Kim, Dept = Ops) appear on every replica]

Indexes

Local secondary index (LSI)

• Alternate sort key attribute
• Index is local to a partition key

Table: A1 (partition), A2 (sort), A3, A4, A5

LSIs:

Projection | Index attributes
KEYS_ONLY | A1 (partition), A3 (sort), A2 (item key)
INCLUDE A3 | A1 (partition), A4 (sort), A2 (item key), A3 (projected)
ALL | A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)

10 GB max per partition key, i.e. LSIs limit the # of range keys!

Global secondary index (GSI)

• Alternate partition and/or sort key
• Index is across all partition keys
• Use composite sort keys for compound indexes

Table: A1 (partition), A2, A3, A4, A5

GSIs:

Projection | Index attributes
INCLUDE A3 | A5 (partition), A4 (sort), A1 (item key), A3 (projected)
ALL | A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
KEYS_ONLY | A2 (partition), A1 (item key)

RCUs/WCUs provisioned separately for GSIs

Online indexing
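The three projection types shown for LSIs and GSIs differ only in which attributes are copied into the index. A minimal sketch, assuming every attribute is present on the item; the function `index_entry` and its parameters are illustrative names, not part of any AWS API.

```python
def index_entry(item, index_keys, table_keys, projection="KEYS_ONLY", include=()):
    """Return the attributes an index entry would carry for one table item."""
    if projection == "ALL":
        return dict(item)                     # every attribute is copied
    if projection not in ("KEYS_ONLY", "INCLUDE"):
        raise ValueError(f"unknown projection: {projection}")
    # Index keys first, then the table's own key(s) so the item can be found again.
    names = list(index_keys) + [k for k in table_keys if k not in index_keys]
    if projection == "INCLUDE":
        names += [a for a in include if a not in names]
    return {name: item[name] for name in names}


item = {"A1": "p", "A2": "s", "A3": 3, "A4": 4, "A5": 5}

# GSI keyed on A5 (partition) and A4 (sort) over a table keyed on A1, INCLUDE A3.
print(index_entry(item, ["A5", "A4"], ["A1"], "INCLUDE", ["A3"]))

# LSI keyed on A1 (partition) and A3 (sort) over a table keyed on A1 and A2.
print(index_entry(item, ["A1", "A3"], ["A1", "A2"]))
```

Narrower projections keep the index smaller and cheaper to write, at the cost of a second read when a query needs an attribute that was not projected.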

Migrating to NoSQL

Planning → Data Analysis → Data Modeling → Testing → Migration

Data Analysis Phase

Analyzing both the RDBMS schema and the application's access patterns is key to understanding how workloads will perform on a NoSQL database platform.

Data Analysis Phase

RDBMS Source Data Analysis
Key data attributes:
• Read/Write Velocity
• Data Partitioning
• Key Cardinality

Access Pattern of the Application
Examples:
• Write/Read Workloads
• Relational Structures
• Aggregation Dimensions

Data Modeling

1:1 relationships or key-values

• Use a table or GSI with an alternate partition key
• Use GetItem or BatchGetItem API

Example: Given an SSN or license number, get attributes

Users Table
Partition key | Attributes
SSN = 123-45-6789 | Email = johndoe@nowhere.com, License = TDL25478134
SSN = 987-65-4321 | Email = maryfowler@somewhere.com, License = TDL78309234

Users-License-GSI
Partition key | Attributes
License = TDL78309234 | Email = maryfowler@somewhere.com, SSN = 987-65-4321
License = TDL25478134 | Email = johndoe@nowhere.com, SSN = 123-45-6789
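The 1:1 pattern amounts to two key-value views over the same data. The sketch below models the table and the index as plain dictionaries; in the real service the index is maintained by DynamoDB, so `build_license_index` and `get_item` are stand-ins for illustration only.

```python
# The Users Table from the example, keyed by SSN (its partition key).
users_table = {
    "123-45-6789": {"Email": "johndoe@nowhere.com", "License": "TDL25478134"},
    "987-65-4321": {"Email": "maryfowler@somewhere.com", "License": "TDL78309234"},
}


def build_license_index(table):
    """Derive a License -> attributes view, as an index keyed on License would hold."""
    index = {}
    for ssn, attrs in table.items():
        index[attrs["License"]] = {"Email": attrs["Email"], "SSN": ssn}
    return index


def get_item(view, key):
    """Key-value read: one key in, at most one item out."""
    return view.get(key)


license_index = build_license_index(users_table)
print(get_item(users_table, "123-45-6789"))      # lookup by SSN
print(get_item(license_index, "TDL78309234"))    # lookup by license number
```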

1:N relationships or parent-children

• Use a table or GSI with partition and sort key
• Use Query API

Example: Given a device, find all readings between epoch X, Y

Device-measurements
Partition Key | Sort key | Attributes
DeviceId = 1 | epoch = 5513A97C | Temperature = 30, pressure = 90
DeviceId = 1 | epoch = 5513A9DB | Temperature = 30, pressure = 90
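A Query for this example combines an equality condition on the partition key with a range condition on the sort key. The sketch below only builds the request parameters, in the shape used by the low-level DynamoDB Query API. It assumes `DeviceId` is stored as a number and `epoch` as a fixed-width uppercase hex string (so string order matches time order); neither type is stated on the slide, and `build_readings_query` is a hypothetical helper.

```python
def build_readings_query(device_id: int, epoch_x: str, epoch_y: str) -> dict:
    """Build Query parameters for all readings of one device with X <= epoch <= Y."""
    return {
        "TableName": "Device-measurements",
        # Equality on the partition key, BETWEEN on the sort key.
        "KeyConditionExpression": "#dev = :dev AND #ts BETWEEN :x AND :y",
        # Placeholders avoid any clash between attribute names and reserved words.
        "ExpressionAttributeNames": {"#dev": "DeviceId", "#ts": "epoch"},
        "ExpressionAttributeValues": {
            ":dev": {"N": str(device_id)},   # assumed numeric attribute
            ":x": {"S": epoch_x},            # assumed string attribute
            ":y": {"S": epoch_y},
        },
    }


params = build_readings_query(1, "5513A97C", "5513A9DB")
print(params["KeyConditionExpression"])
```

With boto3 installed and credentials configured, the dictionary could be passed on as `boto3.client("dynamodb").query(**params)`; that call is left out here so the sketch runs without AWS access.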

N:M relationships

• Use a table and GSI with partition and sort key elements switched
• Use Query API

Example: Given a user, find all games. Or given a game, find all users.

User-Games-Table
Partition Key | Sort key
UserId = bob | GameId = Game1
UserId = fred | GameId = Game2
UserId = bob | GameId = Game3

Game-Users-GSI
Partition Key | Sort key
GameId = Game1 | UserId = bob
GameId = Game2 | UserId = fred
GameId = Game3 | UserId = bob
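The switched-key index can be pictured as the same pairs grouped the other way round. A small in-memory sketch; `build_view` is a hypothetical helper, and in DynamoDB the inverted view would be a GSI kept up to date by the service rather than rebuilt by hand.

```python
from collections import defaultdict

# (UserId, GameId) pairs from the User-Games-Table example.
user_games = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]


def build_view(pairs, swap=False):
    """Group sort keys under partition keys; swap=True models the inverted index."""
    view = defaultdict(list)
    for user_id, game_id in pairs:
        partition, sort = (game_id, user_id) if swap else (user_id, game_id)
        view[partition].append(sort)
    return {partition: sorted(sorts) for partition, sorts in view.items()}


table_view = build_view(user_games)             # UserId -> GameIds
gsi_view = build_view(user_games, swap=True)    # GameId -> UserIds

print(table_view["bob"])    # given a user, find all games
print(gsi_view["Game2"])    # given a game, find all users
```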

Hierarchical Data

Tiered relational data structures

It’s all about aggregations…

• Document Management
• Process Control
• Social Network
• Data Trees
• IT Monitoring

How OLTP Apps Use Data

Mostly Hierarchical Structures

• Entity Driven Workflows
• Data Spread Across Tables
• Requires Complex Queries
• Primary driver for ACID

Hierarchical Structures as Item Collections…

• Use composite sort key to define a hierarchy
• Highly selective result sets with sort queries
• Index anything, scales to any size

Primary Key: ProductID, type

Items:

ProductID | type | Attributes
1 | bookID | title = Ringworld; author = Larry Niven; genre = Science Fiction; publisher = Ballantine; datePublished = Oct-70; ISBN = 0-345-02046-4
2 | albumID | title = Dark Side of the Moon; artist = Pink Floyd; genre = Progressive Rock; label = Harvest; studio = Abbey Road; released = 3/1/73; producer = Pink Floyd
2 | albumID:trackID | title = Speak to Me; length = 1:30; music = Mason; vocals = Instrumental
2 | albumID:trackID | title = Breathe; length = 2:43; music = Waters, Gilmour, Wright; vocals = Gilmour
2 | albumID:trackID | title = On the Run; length = 3:30; music = Gilmour, Waters; vocals = Instrumental
3 | movieID | title = Idiocracy; genre = Scifi Comedy; writer = Mike Judge; producer = 20th Century Fox
3 | movieID:actorID | name = Luke Wilson; character = Joe Bowers; image = img2.jpg
3 | movieID:actorID | name = Maya Rudolph; character = Rita; image = img3.jpg
3 | movieID:actorID | name = Dax Shepard; character = Frito Pendejo; image = img1.jpg
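With a composite sort key, one level of the hierarchy can be selected by prefix. The sketch below imitates a "begins with" query in memory; the concrete key values (`album1`, `album1:track1`, and so on) are invented for illustration, since the slide only shows the placeholder names albumID and trackID.

```python
items = [
    # (ProductID, type) is the primary key; the ID values are made up.
    {"ProductID": 2, "type": "album1", "title": "Dark Side of the Moon"},
    {"ProductID": 2, "type": "album1:track1", "title": "Speak to Me"},
    {"ProductID": 2, "type": "album1:track2", "title": "Breathe"},
    {"ProductID": 2, "type": "album1:track3", "title": "On the Run"},
    {"ProductID": 3, "type": "movie1", "title": "Idiocracy"},
    {"ProductID": 3, "type": "movie1:actor1", "name": "Luke Wilson"},
]


def query_begins_with(rows, product_id, prefix=""):
    """Return items of one partition whose sort key starts with prefix, in key order."""
    matches = [row for row in rows
               if row["ProductID"] == product_id and row["type"].startswith(prefix)]
    return sorted(matches, key=lambda row: row["type"])


# The whole item collection for product 2 (album plus tracks).
print([row["type"] for row in query_begins_with(items, 2)])

# Only the tracks of album1.
print([row["title"] for row in query_begins_with(items, 2, "album1:")])
```

The same table therefore answers both "give me everything about this product" and "give me only its children" without a join.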

… or as Documents (JSON)

• JSON data types (M, L, BOOL, NULL)
• Index root attributes
• Fully Atomic Updates

Primary Key: ProductID

Items:

ProductID | Attributes
1 | id = bookID; title = Ringworld; author = Larry Niven; genre = Science Fiction; publisher = Ballantine; datePublished = Oct-70; ISBN = 0-345-02046-4
2 | id = albumID; title = Dark Side of the Moon; artist = Pink Floyd; genre = Progressive Rock; Attributes = document below

{ label: "Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd",
  tracks: [
    { title: "Speak to Me", length: "1:30", music: "Mason", vocals: "Instrumental" },
    { title: "Breathe", length: "2:43", music: "Waters, Gilmour, Wright", vocals: "Gilmour" },
    { title: "On the Run", length: "3:30", music: "Gilmour, Waters", vocals: "Instrumental" }
  ] }

3 | id = movieID; title = Idiocracy; genre = Scifi Comedy; writer = Mike Judge; Attributes = document below

{ producer: "20th Century Fox",
  actors: [
    { name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg" },
    { name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg" },
    { name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg" }
  ] }

Scaling bottlenecks

[Figure: voters write to a Votes Table spread across Partition 1 ... Partition K ... Partition M ... Partition N, each with 1000 WCUs; votes for Candidate A and Candidate B arrive at 50,000/sec and 70,000/sec]

Provision 200,000 WCUs

Write sharding

[Figure: each voter writes to one of the sharded keys Candidate A_1 ... Candidate A_8 and Candidate B_1 ... Candidate B_8 in the Votes Table]

UpdateItem: “CandidateA_” + rand(0, 10)
ADD 1 to Votes
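The write-sharding step (pick a random suffix, then add 1 to that key's counter) can be simulated in memory as below. The shard count and the 1-based suffix range are assumptions, since the slide's pseudocode and figure use slightly different ranges, and the `Counter` only stands in for the Votes Table.

```python
import random
from collections import Counter

SHARD_COUNT = 10          # assumption: number of shard suffixes per candidate

votes_table = Counter()   # stand-in for the Votes Table: sharded key -> Votes


def sharded_key(candidate: str, rng=random) -> str:
    """Pick one of SHARD_COUNT keys for a candidate, e.g. 'CandidateA_7'."""
    return f"{candidate}_{rng.randint(1, SHARD_COUNT)}"


def cast_vote(candidate: str, rng=random) -> str:
    """Model an UpdateItem that does 'ADD 1 to Votes' on a randomly chosen shard."""
    key = sharded_key(candidate, rng)
    votes_table[key] += 1
    return key


for _ in range(1000):
    cast_vote("CandidateA")

# The 1000 writes are now spread over up to SHARD_COUNT different keys.
print(len(votes_table), sum(votes_table.values()))
```

Spreading one logical counter over many keys lets the writes land on many partitions instead of queuing up behind a single hot key.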

Shard aggregation

[Figure: a periodic process reads the shards Candidate A_1 ... Candidate A_8 from the Votes Table, then writes the result back, e.g. Candidate A Total: 2.5M]

Periodic Process
1. Sum
2. Store
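The two steps of the periodic process (sum the shards, store the total) are sketched below over a plain dictionary. The function names and the sample counts are illustrative; a real job would read the shard items from the table and write the total back as its own item.

```python
def aggregate_votes(votes_by_shard: dict, candidate: str) -> int:
    """Step 1 (Sum): add up the Votes of every shard key of one candidate."""
    prefix = f"{candidate}_"
    return sum(count for key, count in votes_by_shard.items()
               if key.startswith(prefix))


def store_total(totals: dict, votes_by_shard: dict, candidate: str) -> None:
    """Step 2 (Store): record the summed total under the unsharded candidate key."""
    totals[candidate] = aggregate_votes(votes_by_shard, candidate)


# Made-up shard counters for illustration.
shards = {"CandidateA_1": 120, "CandidateA_2": 95, "CandidateB_1": 210}
totals = {}
for name in ("CandidateA", "CandidateB"):
    store_total(totals, shards, name)
print(totals)
```

Readers then fetch a single total item per candidate, so read cost stays flat no matter how many shards absorb the writes.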

Write Sharded Indexing

• Use for GSIs with high volume aggregations
• Common when low cardinality attributes must be indexed
• Scales to any size workload

Primary Key: ProductID, type

Items:

ProductID | type | Attributes
1 | bookID | title = Ringworld; author = Larry Niven; genre = Science Fiction:1; publisher = Ballantine:50; datePublished = Oct-70; ISBN = 0-345-02046-4
2 | albumID | title = Dark Side of the Moon; artist = Pink Floyd; genre = Progressive Rock:99; label = Harvest; studio = Abbey Road; released = 3/1/73; producer = Pink Floyd:2
2 | albumID:trackID | title = Speak to Me; length = 1:30; music = Mason; vocals = Instrumental
2 | albumID:trackID | title = Breathe; length = 2:43; music = Waters, Gilmour, Wright; vocals = Gilmour
2 | albumID:trackID | title = On the Run; length = 3:30; music = Gilmour, Waters; vocals = Instrumental
3 | movieID | title = Idiocracy; genre = Scifi Comedy:42; writer = Mike Judge; producer = 20th Century Fox:17
3 | movieID:actorID | name = Luke Wilson; character = Joe Bowers; image = img2.jpg
3 | movieID:actorID | name = Maya Rudolph; character = Rita; image = img3.jpg
3 | movieID:actorID | name = Dax Shepard; character = Frito Pendejo; image = img1.jpg
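The suffixed values in the table (for example a genre stored as "Science Fiction:1") suggest the following pattern: add a random shard suffix when writing the indexed attribute, then read every suffix and merge when querying. This is a sketch under that reading; the shard range of 100 and the helper names are assumptions, and the dictionary stands in for a GSI keyed on the suffixed value.

```python
import random

SHARDS = 100   # assumption: suffixes 0..99; the slide shows values such as :1, :42, :99


def shard_value(value: str, rng=random) -> str:
    """Append a random shard suffix to a low-cardinality attribute value."""
    return f"{value}:{rng.randrange(SHARDS)}"


def query_all_shards(index: dict, value: str) -> list:
    """Read every shard of one value from the index and merge the results."""
    merged = []
    for shard in range(SHARDS):
        merged.extend(index.get(f"{value}:{shard}", []))
    return merged


# Stand-in for a GSI keyed on the sharded genre; values are item titles.
genre_index = {
    "Science Fiction:1": ["Ringworld"],
    "Progressive Rock:99": ["Dark Side of the Moon"],
    "Scifi Comedy:42": ["Idiocracy"],
}

print(shard_value("Science Fiction"))                     # e.g. "Science Fiction:37"
print(query_all_shards(genre_index, "Science Fiction"))   # merged across all shards
```

The trade-off is more read requests per query in exchange for index writes that no longer pile onto a single partition key.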

Conclusion

Keys to Success

• Select a workload that is a good fit for NoSQL
• Understand the source data and access patterns
• Test thoroughly and often
• Plan on an iterative migration process

Thank you!
