ajax world2008 eric farrar
It’s 11 p.m. Do you know where your queries are?
Eric Farrar, Sybase iAnywhere
Outline
What are ORMs and Active Records?
Tradeoffs
Playing Nice with your Database
  Managing Indexes
  Eager Loading and Client-Side Joins
  Lazy Loading
Conclusion
Object-Relational Mapper
Systems to bridge the gap between object-oriented languages and relational databases
Inherently difficult:
  Normalization (splitting data across tables)
  Databases can only store scalar values
  Add an extra layer of abstraction
class Employee < ActiveRecord::Base
  belongs_to :office
end
class Office < ActiveRecord::Base
  has_one :employee
end
Active Record Pattern
The ‘meat’ of an ORM that handles the CRUD work
Allows regular objects to be treated as persistent objects
Ideally, totally abstracts all database interaction
my_office = Office.new()
my_office.number = 123
me = Employee.new
me.name = 'Eric Farrar'
me.office = my_office
Examples of ORMs/Active Records
LINQ (Language Integrated Query)
Hibernate / NHibernate
Django
Ruby on Rails (ActiveRecord)
Many more…
For our purposes, we will use Rails’ ActiveRecord for the examples
Trade-offs
Advantages
  Easy to learn
  Simplifies database creation and management
  No context switching between languages
  You don’t need to know about the database
Disadvantages
  Performance suffers (up to 50% slower)
  Often uses lowest-common-denominator solution
  Concurrency semantics often very difficult
  You don’t need to know about the database
Managing Indexes
Indexes are used to make things quick to look up
  phone book vs. reverse look-up
Indexes should be present on anything you will search for
Searching for non-indexed properties will result in a full table scan
By default, indexes are usually only put on primary keys
Lack of indexes often will not appear during development
  Result will be a gradual slowdown (as data volume increases) as opposed to avalanche failure
Why not put an index on everything?
Multi-column indexes vs. single column indexes
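The phone book comparison can be sketched in plain Ruby. This is only an illustration under assumptions: the Entry struct, the generated names and numbers, and the Hash standing in for an index are all hypothetical, and a real database index is a more involved structure.

```ruby
# Hypothetical in-memory "table"; names and numbers are made up for illustration.
Entry = Struct.new(:name, :phone)

entries = (1..50_000).map { |i| Entry.new("Person #{i}", format("555-%05d", i)) }

# No index on phone: a reverse look-up has to examine rows one by one,
# which is what a full table scan does.
def scan_by_phone(entries, phone)
  comparisons = 0
  match = entries.find do |entry|
    comparisons += 1
    entry.phone == phone
  end
  [match, comparisons]
end

# A Hash keyed on phone stands in for an index on that column.
phone_index = {}
entries.each { |entry| phone_index[entry.phone] = entry }

scanned, comparisons = scan_by_phone(entries, "555-50000")
indexed = phone_index["555-50000"]

puts "scan examined #{comparisons} rows"
puts "index lookup found #{indexed.name} directly"
```

The scan cost grows with the number of rows, which is why the slowdown tends to show up only once production data has accumulated. The Hash also hints at why indexing everything is not free: every insert or update has to maintain each index as well as the table.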
Client-Side Join
Objects are usually ‘related’ to each other
  belongs_to
  has_one
  has_many
  has_and_belongs_to_many
ORMs use these relationships to allow object traversal
  ex. me.office
Assuming 10000 employees, how many queries will this code produce?
Employee.find(:all).each do |e|
  puts e.office.number
end
“Man, this is heavy!”
Answer: 10001
Why? The application is doing the work of joining the data, not the database. This is called a ‘client-side’ join
This is solved by giving a hint to the ORM and the database that you intend to use the ‘office’ property
This pattern is called eager loading
Employee.find(:all).each do |e|   # <-- 1 query here
  puts e.office.number            # <-- 10,000 queries here
end
Employee.find(:all, :include => :office).each do |e|
  puts e.office.number
end
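To make the 10001 figure concrete, here is a rough plain-Ruby simulation that only counts queries. FakeDatabase and its methods are hypothetical stand-ins, not ActiveRecord, and the batched second query is just one way an ORM might satisfy the hint (a JOIN is another).

```ruby
# FakeDatabase is a hypothetical stand-in that only counts queries.
class FakeDatabase
  attr_reader :query_count

  def initialize(employee_count)
    @query_count = 0
    @offices = {}
    @employees = []
    (1..employee_count).each do |id|
      @offices[id] = { id: id, number: 100 + id }
      @employees << { id: id, name: "Employee #{id}", office_id: id }
    end
  end

  # Stands in for: SELECT * FROM employees
  def all_employees
    @query_count += 1
    @employees
  end

  # Stands in for: SELECT * FROM offices WHERE id = ?
  def office_by_id(id)
    @query_count += 1
    @offices[id]
  end

  # Stands in for: SELECT * FROM offices WHERE id IN (...)
  def offices_by_ids(ids)
    @query_count += 1
    ids.map { |id| @offices[id] }
  end
end

# Client-side join: one query for the employees, then one per employee.
lazy_db = FakeDatabase.new(10_000)
lazy_numbers = lazy_db.all_employees.map do |e|
  lazy_db.office_by_id(e[:office_id])[:number]
end

# Eager loading: fetch every needed office in a single extra query.
eager_db = FakeDatabase.new(10_000)
employees = eager_db.all_employees
offices = eager_db.offices_by_ids(employees.map { |e| e[:office_id] })
office_for = offices.each_with_object({}) { |o, h| h[o[:id]] = o }
eager_numbers = employees.map { |e| office_for[e[:office_id]][:number] }

puts "client-side join: #{lazy_db.query_count} queries"
puts "eager loading: #{eager_db.query_count} queries"
```

Both approaches produce the same office numbers; only the number of round trips to the database differs.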
Inviting the Database to the Party
Eager loading solves the N+1 problem, but it is still only halfway there
In ORMs, the relations are defined inside the object models
The ORM may know that Employees and Offices are related, but the database doesn’t know that
The database will obediently execute the query, but don’t expect it to do anything clever
Modern query optimizers will use every statistic available when determining query paths
Keeping them ignorant will result in bare-bones optimization
Lazy Loading
Eager loading deals with the case where you want more than your class includes
What if you want less?
Suppose your Employee class includes a picture field that is a high resolution bitmap (~ 3 MB)
The previous query will actually return the picture in order to fully populate the object
This innocent code will naively return > 30 GB of data
Employee.find(:all).each do |e|
  puts e.name
end
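The 30 GB figure follows from simple arithmetic. A quick check, assuming the 10000 employees from the earlier example and counting only the picture column (the variable names are illustrative):

```ruby
employee_count = 10_000 # figure used in the earlier example
picture_mb = 3          # approximate size of one bitmap, per the slide

total_mb = employee_count * picture_mb
total_gb = total_mb / 1000

puts "#{total_mb} MB, roughly #{total_gb} GB, before counting any other columns"
```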
Be Lazy
Instead, lazily load your object properties
Accessing e.picture will work by issuing another database query
This simple example ignores potential problems with concurrency
  Use locking
Employee.find(:all, :select => ["name"]).each do |e|
  puts e.name
end
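The idea of loading a property only on first access can be sketched in plain Ruby. PictureStore and LazyEmployee are hypothetical names used for illustration; this shows the general lazy-loading pattern, not how ActiveRecord handles :select internally.

```ruby
# Hypothetical store that counts how often the large column is read.
class PictureStore
  attr_reader :reads

  def initialize
    @reads = 0
  end

  def fetch_picture(employee_id)
    @reads += 1
    "bitmap-bytes-for-#{employee_id}" # stand-in for a ~3 MB bitmap
  end
end

class LazyEmployee
  attr_reader :id, :name

  def initialize(id, name, store)
    @id = id
    @name = name
    @store = store
  end

  # Fetched on first access, then kept in memory for later calls.
  def picture
    @picture ||= @store.fetch_picture(@id)
  end
end

store = PictureStore.new
staff = (1..10_000).map { |id| LazyEmployee.new(id, "Employee #{id}", store) }

names = staff.map(&:name)
reads_after_names = store.reads # no pictures fetched yet

staff.first.picture
staff.first.picture # second call reuses the cached value

puts "picture reads after listing names: #{reads_after_names}"
puts "picture reads after two accesses: #{store.reads}"
```

Listing every name touches no picture data, and the one picture that is requested costs a single extra read. The cached copy can go stale if another client updates the row, which is the concurrency problem the slide warns about.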
Conclusions
ORMs and Active Records can provide large productivity advantages, typically at the expense of performance
ORMs should never be seen as an alternative to learning about databases (although they can be a good introduction)
At times, you will likely need to drop down to the database level (profiling, etc.) to diagnose problems
Ideally, a programmer using an ORM will always consider how their code will actually look once it hits the database
  Similarities to a C compiler
You should be able to answer “Yes!” to the question, “Do you know where your queries are?”