Django Meetup: Django Multicolumn Joins

Jeremy Tillman, Software Engineer, Hearsay Social (@hssengineering)

Upload: hearsay-social

Post on 23-Jan-2015

DESCRIPTION

A presentation shared by Hearsay Social software engineer Jeremy Tillman.

TRANSCRIPT

Page 1: Django Meetup: Django Multicolumn Joins

Django Meetup: Django Multicolumn Joins

Jeremy Tillman
Software Engineer, Hearsay Social

@hssengineering

Page 2: Django Meetup: Django Multicolumn Joins

Django Multicolumn Joins | © 2012 Hearsay Social 2

About Me

• Joined Hearsay Social May 2012 as Software Engineering Generalist

• Computer Engineer BA, Purdue University

• 3 years @ Microsoft working on versions of Windows Server

• 9 years of database experience: Access, SQL Server, MySQL

• Loves Sea Turtles!

Page 3: Django Meetup: Django Multicolumn Joins

Why do we want multicolumn joins?

Page 4: Django Meetup: Django Multicolumn Joins

Django First App: Poll example

    class Poll(models.Model):
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')

    class Choice(models.Model):
        poll = models.ForeignKey(Poll)
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)

Page 5: Django Meetup: Django Multicolumn Joins

What if we stored Polls for X number of customers?

    class Customer(models.Model):
        name = models.CharField(max_length=100)

        class Meta:
            ordering = ('name',)

    class Poll(models.Model):
        customer = models.ForeignKey(Customer)
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')

    class Choice(models.Model):
        poll = models.ForeignKey(Poll)
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)

    CREATE TABLE customer(
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL);

    CREATE TABLE poll(
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        customer_id INT NOT NULL,
        question VARCHAR(200) NOT NULL,
        pub_date DATETIME NOT NULL,
        INDEX idx_customer (customer_id));

    CREATE TABLE choice(
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        poll_id INT NOT NULL,
        choice_text VARCHAR(200),
        votes INT NOT NULL DEFAULT 0,
        INDEX idx_poll (poll_id));

Page 6: Django Meetup: Django Multicolumn Joins

How is our data being stored?

    CREATE TABLE choice(
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        poll_id INT NOT NULL,
        choice_text VARCHAR(200),
        votes INT NOT NULL DEFAULT 0,
        INDEX idx_poll (poll_id));

    id          poll_id  choice_text       votes
    1           1        Ham               5
    2           7        Aries             8
    3           2        Elephant          9
    …           …        …                 …
    23,564,149  1        All of the above  2
    23,564,150  74       Sea turtle        7

Page 7: Django Meetup: Django Multicolumn Joins

Data locality part 1: Scope by poll

    CREATE TABLE choice(
        id INT NOT NULL,
        poll_id INT NOT NULL,
        choice_text VARCHAR(200),
        votes INT NOT NULL DEFAULT 0,
        PRIMARY KEY (poll_id, id));

    id          poll_id  choice_text       votes
    1           1        Ham               5
    1,562       1        Turkey            46
    23,564,149  1        All of the above  2
    …           …        …                 …
    18,242,234  74       Jelly fish        0
    23,564,150  74       Sea turtle        7

Page 8: Django Meetup: Django Multicolumn Joins

Data locality part 2: Scope by customer

    CREATE TABLE choice(
        id INT NOT NULL,
        customer_id INT NOT NULL,
        poll_id INT NOT NULL,
        choice_text VARCHAR(200),
        votes INT NOT NULL DEFAULT 0,
        PRIMARY KEY (customer_id, poll_id, id));

    id          poll_id  customer_id  choice_text       votes
    1           1        1            Ham               5
    1,562       1        1            Turkey            46
    23,564,149  1        1            All of the above  2
    18,242,234  74       1            Jelly fish        0
    23,564,150  74       1            Sea turtle        7
    …           …        …            …                 …
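The locality argument can be illustrated with a toy model. This is only a sketch: it assumes the storage engine keeps rows in primary-key order (as a clustered index does), the row values are made up, and `storage_positions` is a hypothetical helper, not part of any library.

```python
# Toy model of a clustered index: rows are stored in primary-key order.
# The row values below are invented for illustration.
rows = [
    # (id, customer_id, poll_id, choice_text)
    (1, 1, 1, "Ham"),
    (2, 2, 7, "Aries"),
    (3, 3, 2, "Elephant"),
    (4, 1, 1, "Turkey"),
    (5, 2, 7, "Taurus"),
    (6, 1, 74, "Sea turtle"),
]


def storage_positions(rows, key, customer_id):
    """Return the positions of one customer's rows when stored in `key` order."""
    ordered = sorted(rows, key=key)
    return [pos for pos, row in enumerate(ordered) if row[1] == customer_id]


# PRIMARY KEY (id): customer 1's rows are interleaved with other customers.
by_id = storage_positions(rows, key=lambda r: r[0], customer_id=1)

# PRIMARY KEY (customer_id, poll_id, id): customer 1's rows are contiguous.
by_composite = storage_positions(
    rows, key=lambda r: (r[1], r[2], r[0]), customer_id=1
)

print(by_id)         # [0, 3, 5]
print(by_composite)  # [0, 1, 2]
```

With the composite key, reading every choice for one customer touches one contiguous range instead of scattered positions.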

Page 9: Django Meetup: Django Multicolumn Joins

Representation in Django Models

    class Customer(models.Model):
        name = models.CharField(max_length=100)

        class Meta:
            ordering = ('name',)

    class Poll(models.Model):
        customer = models.ForeignKey(Customer)
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')

    class Choice(models.Model):
        customer = models.ForeignKey(Customer)
        poll = models.ForeignKey(Poll)
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)

Page 10: Django Meetup: Django Multicolumn Joins

Customer Load/Data Balance

    customer_id  id
    1            1
    2            2
    3            3
    4            4

Page 11: Django Meetup: Django Multicolumn Joins

Customer Load/Data Balance: Split Customers

    customer_id  id
    3            3
    3            5
    4            4
    4            6

    customer_id  id
    1            1
    1            5
    2            2
    2            6

Page 12: Django Meetup: Django Multicolumn Joins

Add DB and Balance Load: id collision

    customer_id  id
    3            3
    3            5

    customer_id  id
    1            1
    1            5

    customer_id  id
    2            2
    2            6
    4            4
    4            6
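The collision can be reproduced in a few lines. This is a sketch under the assumption that customers 2 and 4 are moved onto one database, as in the last table above; `merge` is a hypothetical helper written for this illustration.

```python
# Rows as (customer_id, id); ids are only unique within a customer.
db_a = [(2, 2), (2, 6)]
db_b = [(4, 4), (4, 6)]


def merge(shards, key):
    """Merge shards into one table keyed by key(row) and report collisions."""
    table, collisions = {}, []
    for shard in shards:
        for row in shard:
            k = key(row)
            if k in table:
                collisions.append((table[k], row))
            table[k] = row
    return table, collisions


# Keyed on id alone, the two rows with id 6 collide.
_, by_id = merge([db_a, db_b], key=lambda row: row[1])

# Keyed on (customer_id, id), every row stays unique.
_, by_pair = merge([db_a, db_b], key=lambda row: row)

print(by_id)    # [((2, 6), (4, 6))]
print(by_pair)  # []
```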

Page 13: Django Meetup: Django Multicolumn Joins

Queries: Find all choices for a poll?

Poll

    customer_id  id  question
    1            1   What's your seat pref.?
    1            2   Are you married?
    2            1   Gender?
    2            2   Did you have fun?

Choice

    customer_id  poll_id  id  choice_text
    1            1        1   Window
    1            1        2   Aisle
    1            2        1   Yes
    1            2        2   No
    2            1        1   Male
    2            1        2   Female
    2            2        1   Yes

Page 14: Django Meetup: Django Multicolumn Joins

Queries: Find all choices for a poll?

Attempt 1) Using related set

    target_poll.choice_set.all()

or

    Choice.objects.filter(poll=target_poll)

    SELECT * FROM choice WHERE poll_id = 1;

[Poll and Choice sample tables, as on Page 13]

Page 15: Django Meetup: Django Multicolumn Joins

Queries: Find all choices for a poll?

Attempt 2) Adding an F expression

    target_poll.choice_set.filter(customer=F('poll__customer'))

or

    Choice.objects.filter(poll=target_poll,
                          customer=F('poll__customer'))

    SELECT c.* FROM choice c
    INNER JOIN poll p ON c.poll_id = p.id
    WHERE c.poll_id = 1
      AND c.customer_id = p.customer_id;

[Poll and Choice sample tables, as on Page 13]
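Running the SQL from Attempts 1 and 2 against the sample rows shows why neither is enough. The sketch below uses Python's built-in `sqlite3` rather than MySQL, and the table definitions are simplified assumptions based on the sample tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll (customer_id INT, id INT, question TEXT,
                       PRIMARY KEY (customer_id, id));
    CREATE TABLE choice (customer_id INT, poll_id INT, id INT, choice_text TEXT,
                         PRIMARY KEY (customer_id, poll_id, id));
    INSERT INTO poll VALUES
        (1, 1, 'What''s your seat pref.?'), (1, 2, 'Are you married?'),
        (2, 1, 'Gender?'), (2, 2, 'Did you have fun?');
    INSERT INTO choice VALUES
        (1, 1, 1, 'Window'), (1, 1, 2, 'Aisle'),
        (1, 2, 1, 'Yes'), (1, 2, 2, 'No'),
        (2, 1, 1, 'Male'), (2, 1, 2, 'Female'), (2, 2, 1, 'Yes');
""")

# Attempt 1: filter on poll_id only.
attempt_1 = conn.execute(
    "SELECT customer_id, choice_text FROM choice "
    "WHERE poll_id = 1 ORDER BY customer_id, id"
).fetchall()

# Attempt 2: join on poll id, then require matching customers.
attempt_2 = conn.execute(
    "SELECT c.customer_id, c.choice_text FROM choice c "
    "INNER JOIN poll p ON c.poll_id = p.id "
    "WHERE c.poll_id = 1 AND c.customer_id = p.customer_id "
    "ORDER BY c.customer_id, c.id"
).fetchall()

print(attempt_1)  # [(1, 'Window'), (1, 'Aisle'), (2, 'Male'), (2, 'Female')]
print(attempt_2)  # same rows: both customers' polls with id 1 match
```

Both queries return choices for customer 1 and customer 2, because poll id 1 exists once per customer.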

Page 16: Django Meetup: Django Multicolumn Joins

Queries: Find all choices for a poll?

Attempt 3) Filter explicitly

    target_poll.choice_set.filter(customer=target_poll.customer)

or

    Choice.objects.filter(poll=target_poll, customer=target_poll.customer)

    SELECT * FROM choice
    WHERE poll_id = 1
      AND customer_id = 2;

[Poll and Choice sample tables, as on Page 13]
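For comparison, the explicit filter from Attempt 3 returns only the target poll's choices. As before, this is a sketch on `sqlite3` with a simplified table; it assumes the target poll belongs to customer 2 and has id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE choice (customer_id INT, poll_id INT, id INT, choice_text TEXT)"
)
conn.executemany(
    "INSERT INTO choice VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "Window"), (1, 1, 2, "Aisle"),
        (1, 2, 1, "Yes"), (1, 2, 2, "No"),
        (2, 1, 1, "Male"), (2, 1, 2, "Female"), (2, 2, 1, "Yes"),
    ],
)

# Both columns of the composite key are constrained.
rows = conn.execute(
    "SELECT choice_text FROM choice "
    "WHERE poll_id = ? AND customer_id = ? ORDER BY id",
    (1, 2),
).fetchall()

print(rows)  # [('Male',), ('Female',)]
```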

Page 17: Django Meetup: Django Multicolumn Joins

Field Assignment

    quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')

    quantity_poll = Poll.objects.create(id=1, customer=quantity_inn,
                                        question='What size bed do you prefer?')

    choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)

    choice1.customer_id  # ??????

    choice1.customer = quantity_poll.customer  # Repetitive

Page 18: Django Meetup: Django Multicolumn Joins

What do we do?

Page 19: Django Meetup: Django Multicolumn Joins

Solution via Django 1.6

class ForeignObject(othermodel, from_fields, to_fields[, **options])

where:

from django.db.models import ForeignObject

Page 20: Django Meetup: Django Multicolumn Joins

ForeignObject Usage

    class ForeignModel(models.Model):
        id1 = models.IntegerField()
        id2 = models.IntegerField()

    class ReferencingModel(models.Model):
        om_id1 = models.IntegerField()
        om_id2 = models.IntegerField()
        om = ForeignObject(ForeignModel,
                           from_fields=('om_id1', 'om_id2'),
                           to_fields=('id1', 'id2'))

Page 21: Django Meetup: Django Multicolumn Joins

Conversion from ForeignKey to ForeignObject

Before:

    class Choice(models.Model):
        customer = models.ForeignKey(Customer)
        poll = models.ForeignKey(Poll)
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)

After:

    class Choice(models.Model):
        customer = models.ForeignKey(Customer)
        poll_id = models.IntegerField()
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)
        poll = models.ForeignObject(Poll,
                                    from_fields=('customer', 'poll_id'),
                                    to_fields=('customer', 'id'))
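Conceptually, the paired `from_fields` and `to_fields` describe a join on several columns at once. The sketch below is not Django's implementation: `join_condition` is a hypothetical helper, and it works on column names (`customer_id`, `poll_id`) rather than model field names.

```python
def join_condition(from_columns, to_columns, from_alias="c", to_alias="p"):
    """Pair up columns positionally and build a multicolumn ON clause."""
    if len(from_columns) != len(to_columns):
        raise ValueError("from_columns and to_columns must have the same length")
    return " AND ".join(
        f"{from_alias}.{src} = {to_alias}.{dst}"
        for src, dst in zip(from_columns, to_columns)
    )


on_clause = join_condition(("customer_id", "poll_id"), ("customer_id", "id"))
print(on_clause)  # c.customer_id = p.customer_id AND c.poll_id = p.id
```

The pairing is positional, which is why the two tuples must list their columns in the same order.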

Page 22: Django Meetup: Django Multicolumn Joins

Queries with ForeignObject

Attempt 1) Using related set

    target_poll.choice_set.all()

    SELECT * FROM choice
    WHERE poll_id = 1
      AND customer_id = 2;

[Poll and Choice sample tables, as on Page 13]

Page 23: Django Meetup: Django Multicolumn Joins

Queries with ForeignObject

Attempt 2) Manually stated

    Choice.objects.filter(poll=target_poll)

    SELECT * FROM choice
    WHERE poll_id = 1
      AND customer_id = 2;

[Poll and Choice sample tables, as on Page 13]

Page 24: Django Meetup: Django Multicolumn Joins

Queries with ForeignObject

Attempt 3) Manually stated w/tuple

    Choice.objects.filter(poll=(2, 1))

    SELECT * FROM choice
    WHERE poll_id = 1
      AND customer_id = 2;

[Poll and Choice sample tables, as on Page 13]

Page 25: Django Meetup: Django Multicolumn Joins

Field Assignment with ForeignObject

    quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')

    quantity_poll = Poll.objects.create(id=1, customer=quantity_inn,
                                        question='What size bed do you prefer?')

    choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)

    choice1.customer_id
    >> 15

    choice1.customer = quantity_poll.customer  # Not needed
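The assignment behaviour can be mimicked in plain Python with a descriptor. This is a conceptual sketch only, not how Django implements it: `CompositeRef` and the stripped-down `Poll` and `Choice` classes are hypothetical stand-ins for the models above.

```python
class CompositeRef:
    """Hypothetical descriptor: assigning the related object copies its keys."""

    def __init__(self, from_attrs, to_attrs):
        self.from_attrs = from_attrs
        self.to_attrs = to_attrs

    def __set_name__(self, owner, name):
        self.cache_name = "_" + name + "_cache"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.cache_name, None)

    def __set__(self, instance, related):
        # Remember the related object and copy each key column across.
        setattr(instance, self.cache_name, related)
        for src, dst in zip(self.from_attrs, self.to_attrs):
            setattr(instance, src, getattr(related, dst))


class Poll:
    def __init__(self, id, customer_id):
        self.id = id
        self.customer_id = customer_id


class Choice:
    poll = CompositeRef(from_attrs=("customer_id", "poll_id"),
                        to_attrs=("customer_id", "id"))

    def __init__(self, id, choice_text, poll):
        self.id = id
        self.choice_text = choice_text
        self.poll = poll  # also fills in customer_id and poll_id


quantity_poll = Poll(id=1, customer_id=15)
choice1 = Choice(id=1, choice_text="King", poll=quantity_poll)
print(choice1.customer_id, choice1.poll_id)  # 15 1
```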

Page 26: Django Meetup: Django Multicolumn Joins

“With great power comes great responsibility”

Page 27: Django Meetup: Django Multicolumn Joins

Tuple ordering matters

    Choice.objects.filter(poll=(1, 2))

    SELECT * FROM choice
    WHERE poll_id = 2
      AND customer_id = 1;

    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))

[Poll and Choice sample tables, as on Page 13]
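The ordering rule comes down to pairing tuple positions with the `from_fields` order. A minimal sketch, assuming the columns behind `from_fields=('customer', 'poll_id')` are `customer_id` and `poll_id`; `tuple_to_conditions` is a hypothetical helper.

```python
FROM_COLUMNS = ("customer_id", "poll_id")  # same order as from_fields


def tuple_to_conditions(value, from_columns=FROM_COLUMNS):
    """Map a filter tuple onto columns by position."""
    if len(value) != len(from_columns):
        raise ValueError("tuple length must match the number of from columns")
    return dict(zip(from_columns, value))


print(tuple_to_conditions((2, 1)))  # {'customer_id': 2, 'poll_id': 1}
print(tuple_to_conditions((1, 2)))  # {'customer_id': 1, 'poll_id': 2}
```

Swapping the two values silently selects a different customer's poll, so the tuple must follow the declared field order.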

Page 28: Django Meetup: Django Multicolumn Joins

IN Operator

    Choice.objects.filter(poll__in=[(2, 1), (2, 2)])

    SELECT * FROM choice
    WHERE (poll_id = 1 AND customer_id = 2)
       OR (poll_id = 2 AND customer_id = 2);

    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))

[Poll and Choice sample tables, as on Page 13]
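The OR-expanded form can be generated mechanically from a list of tuples. The sketch below builds the clause with a hypothetical `or_expanded_in` helper and runs it on `sqlite3`; it illustrates the shape of the SQL and is not Django's query compiler.

```python
import sqlite3


def or_expanded_in(columns, tuples):
    """Build '(a = ? AND b = ?) OR ...' plus its flat parameter list."""
    if not tuples:
        raise ValueError("at least one tuple is required")
    group = "(" + " AND ".join(f"{col} = ?" for col in columns) + ")"
    where = " OR ".join(group for _ in tuples)
    params = [value for tup in tuples for value in tup]
    return where, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE choice (customer_id INT, poll_id INT, id INT, choice_text TEXT)"
)
conn.executemany(
    "INSERT INTO choice VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "Window"), (1, 1, 2, "Aisle"),
        (1, 2, 1, "Yes"), (1, 2, 2, "No"),
        (2, 1, 1, "Male"), (2, 1, 2, "Female"), (2, 2, 1, "Yes"),
    ],
)

where, params = or_expanded_in(("customer_id", "poll_id"), [(2, 1), (2, 2)])
rows = conn.execute(
    f"SELECT choice_text FROM choice WHERE {where} ORDER BY poll_id, id",
    params,
).fetchall()

print(where)
print(rows)  # [('Male',), ('Female',), ('Yes',)]
```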

Page 29: Django Meetup: Django Multicolumn Joins

IN Operator w/queryset

    Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))

    SELECT c.* FROM choice c
    WHERE EXISTS (SELECT p.customer_id, p.id
                  FROM poll p
                  WHERE p.customer_id = 2
                    AND p.customer_id = c.customer_id
                    AND p.id = c.poll_id);

    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))

[Poll and Choice sample tables, as on Page 13]

Page 30: Django Meetup: Django Multicolumn Joins

IN Operator with MySQL

    Choice.objects.filter(poll__in=[(2, 1), (2, 2)])

    SELECT * FROM choice
    WHERE (poll_id, customer_id) IN ((1, 2), (2, 2));

    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))

[Poll and Choice sample tables, as on Page 13]

Page 31: Django Meetup: Django Multicolumn Joins

IN Operator w/queryset & MySQL

    Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))

    SELECT c.* FROM choice c
    WHERE (c.customer_id, c.poll_id) IN (SELECT p.customer_id, p.id
                                         FROM poll p
                                         WHERE p.customer_id = 2);

    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))

[Poll and Choice sample tables, as on Page 13]

Page 32: Django Meetup: Django Multicolumn Joins

ForeignKey vs ForeignObject

What's the difference?

ForeignKey is a ForeignObject

pseudo def:

    ForeignObject(OtherModel,
                  from_fields=('self',),
                  to_fields=(OtherModel._meta.pk.name,))

Page 33: Django Meetup: Django Multicolumn Joins

ForeignKey usage: Order By Example

    Poll.objects.order_by('customer')

    class Customer(models.Model):
        name = models.CharField(max_length=100)

        class Meta:
            ordering = ('name',)

    class Poll(models.Model):
        customer = models.ForeignKey(Customer)
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')

Page 34: Django Meetup: Django Multicolumn Joins

ForeignKey usage: Order By Example

    Poll.objects.order_by('customer')

    SELECT p.* FROM poll p
    INNER JOIN customer c ON p.customer_id = c.id
    ORDER BY c.name ASC;

[Customer and Poll models, as on Page 33]

Page 35: Django Meetup: Django Multicolumn Joins

ForeignKey usage: Order By Example

    Poll.objects.order_by('customer_id')  # 'customer_id' is an alias for 'customer'

    SELECT p.* FROM poll p
    INNER JOIN customer c ON p.customer_id = c.id
    ORDER BY c.name ASC;

[Customer and Poll models, as on Page 33]

Page 36: Django Meetup: Django Multicolumn Joins

ForeignKey usage: Order By Example

    Poll.objects.order_by('customer__id')

    SELECT p.* FROM poll p
    INNER JOIN customer c ON p.customer_id = c.id
    ORDER BY p.customer_id ASC;

[Customer and Poll models, as on Page 33]

Page 37: Django Meetup: Django Multicolumn Joins

ForeignKey usage: Order By Example

    Poll.objects.order_by('customer_id')

    SELECT * FROM poll
    ORDER BY customer_id ASC;

    class Customer(models.Model):
        name = models.CharField(max_length=100)

        class Meta:
            ordering = ('name',)

    class Poll(models.Model):
        customer_id = models.IntegerField()
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')
        customer = models.ForeignObject(Customer,
                                        from_fields=('customer_id',),
                                        to_fields=('id',))
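The two orderings are not interchangeable: sorting by the joined customer name and sorting by the raw `customer_id` column can return rows in different orders. A sketch on `sqlite3`, with made-up customer names and questions chosen so the orders differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE poll (id INTEGER PRIMARY KEY, customer_id INT NOT NULL,
                       question TEXT NOT NULL);
    INSERT INTO customer VALUES (1, 'Zenith Hotels'), (2, 'Acme Air');
    INSERT INTO poll VALUES (1, 1, 'Bed size?'), (2, 2, 'Seat preference?');
""")

# Join plus ORDER BY c.name: follows Customer.Meta.ordering.
by_name = [row[0] for row in conn.execute(
    "SELECT p.id FROM poll p "
    "INNER JOIN customer c ON p.customer_id = c.id "
    "ORDER BY c.name ASC"
)]

# No join: orders on the local column only.
by_customer_id = [row[0] for row in conn.execute(
    "SELECT id FROM poll ORDER BY customer_id ASC"
)]

print(by_name)         # [2, 1]
print(by_customer_id)  # [1, 2]
```

The second query also avoids touching the customer table at all.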

Page 38: Django Meetup: Django Multicolumn Joins

Still more fun stuff

• ForeignObject.get_extra_descriptor_filter

• ForeignObject.get_extra_restriction

• More to come

Page 39: Django Meetup: Django Multicolumn Joins

Dig for more information:

• ForeignObject source
  • django/db/models/fields/related.py

• V1 Version of Patch (Based on Django 1.4)
  • https://github.com/jtillman/django/tree/MultiColumnJoin

• Blog post to come
  • Hearsay Social Blog (http://engineering.hearsaysocial.com/)

Page 40: Django Meetup: Django Multicolumn Joins

Questions?