how to avoid hanging yourself with rails

Post on 18-May-2015

DESCRIPTION

Presentation given to Toronto Rails Project Night, performance tips for ActiveRecord usage

TRANSCRIPT

How to avoid hanging yourself with Rails

Using ActiveRecord right the first time

work.rowanhick.com

1

Discussion tonight

• Intended for new Rails Developers

• People that think Rails is slow

• Focus on simple steps to improve common :has_many performance problems

• Short - 15mins

• All links/references up on http://work.rowanhick.com tomorrow

2

About me

• New Zealander (not Australian)

• Product Development Mgr for a startup in Toronto

• Full time with Rails for 2 years

• Previously PHP/MySQL for 4 years

• Before that, 6 years in QA/BA/PM roles at an enterprise CAD/CAM software development company

3

Disclaimer

• For the sake of brevity and understanding, the SQL shown here is cut down to “pseudo SQL”

• This is not an exhaustive in-depth analysis, just meant as a heads up

• Times were done using ApacheBench through mongrel in production mode

• ab -n 1000 http://127.0.0.1/orders/test_xxxx

4

ActiveRecord lets you get in trouble far too quickly.

• Super easy syntax comes at a cost.

    @orders = Order.find(:all)
    @orders.each do |order|
      puts order.customer.name
      puts order.customer.country.name
    end

✴Congratulations, you just overloaded your DB with (total number of Orders x 2) unnecessary SQL calls

5

What happened there?

• One query to get the orders

    @orders = Order.find(:all)
    “SELECT * FROM orders”

• For every item in the orders collection

    customer.name:
    “SELECT * FROM customers WHERE id = x”

    customer.country.name:
    “SELECT * FROM countries WHERE id = y”
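The query count above can be checked with a small pure-Ruby simulation. This is only a sketch: FakeDb, build_db and lazy_query_count are hypothetical names, the rows are made up, and no real database or ActiveRecord is involved. It records one simulated statement per lookup, the way lazy association loading does.

```ruby
# Pure-Ruby simulation of lazy association loading (no Rails, no database).
# FakeDb and the sample rows are hypothetical, for illustration only.
class FakeDb
  attr_reader :queries

  def initialize(tables)
    @tables = tables   # { table_name => { id => row_hash } }
    @queries = []      # every simulated SQL statement, in order
  end

  # Simulates "SELECT * FROM table"
  def all(table)
    @queries << "SELECT * FROM #{table}"
    @tables.fetch(table).values
  end

  # Simulates "SELECT * FROM table WHERE id = x"
  def find(table, id)
    @queries << "SELECT * FROM #{table} WHERE id = #{id}"
    @tables.fetch(table).fetch(id)
  end
end

def build_db(order_count)
  orders = {}
  (1..order_count).each { |i| orders[i] = { :id => i, :customer_id => 1 } }
  FakeDb.new(
    "orders"    => orders,
    "customers" => { 1 => { :id => 1, :name => "Acme", :country_id => 1 } },
    "countries" => { 1 => { :id => 1, :name => "Canada" } }
  )
end

# Walks the orders the way the loop in the talk does and returns the query count.
def lazy_query_count(order_count)
  db = build_db(order_count)
  db.all("orders").each do |order|
    customer = db.find("customers", order[:customer_id])  # order.customer
    db.find("countries", customer[:country_id])           # order.customer.country
  end
  db.queries.size
end

puts lazy_query_count(100)   # 1 query for the orders + 2 per order = 201
```

The count grows linearly with the number of orders, which is why the problem stays invisible on a development database with a handful of rows.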

6

Systemic Problem in Web development

I’ve seen:

- 15 Second page reloads

- 10000 queries per page

“<insert name here> language performs really poorly, we’re going to get it redeveloped in <insert new language here>”

7

A typical root cause

• Failure to build application with *real* data

• i.e. “It worked fine on my machine”, but the developer never loaded up 100,000 records to see what would happen

• Using Rake tasks to build realistic data sets

• Test, test, test

• tail -f log/development.log
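Watching the log by eye works, but a rough tally makes repeated loads obvious. The sketch below assumes log lines shaped like the "Order Load (0.000837) SELECT ..." excerpt shown later in this talk; other Rails versions format these lines differently, so the regex is an assumption. tally_loads is a hypothetical helper and the sample lines are made up.

```ruby
# Tally "<Model> Load (seconds)" lines from a chunk of development.log text.
# The line format is an assumption based on the log excerpt in this talk.
LOAD_LINE = /(\w+) Load \((\d+\.\d+)\)/

def tally_loads(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = LOAD_LINE.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Made-up sample lines standing in for a real log file.
sample = <<LOG
Order Load (0.000837) SELECT * FROM `orders` LIMIT 10
Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id = 2106)
Customer Load (0.000412) SELECT * FROM `customers` WHERE (customers.id = 2018)
LOG

# Most frequently loaded models first.
tally_loads(sample).sort_by { |_, n| -n }.each do |model, n|
  puts "#{model}: #{n}"
end
```

A model whose load count tracks the number of rows on the page is a likely candidate for eager loading.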

8

Faker to the rescue

• In lib/tasks/xchain.rake

    namespace :xchain do
      desc "Load fake customers"
      task :load_customers => :environment do
        require 'faker'
        # Remove customers created by a previous run
        Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy }
        300.times do
          c = Customer.new
          c.status_id = rand(3) + 1
          c.country_id = rand(243) + 1
          c.name = Faker::Company.name
          c.alternate_name = Faker::Company.name
          c.phone = Faker::PhoneNumber.phone_number
          # Prefix the email so fake rows can be found and removed later
          c.email = "XCHAIN_" + Faker::Internet.email
          c.save
        end
      end
    end

$ rake xchain:load_customers

9

Eager loading

• By using :include in finds, you create SQL joins

• Pull all required records in one query

    find(:all, :include => [ :customer, :order_lines ])

✓ order.customer, order.order_lines

find(:all, :include => [ { :customer => :country }, :order_lines ])

✓ order.customer order.customer.country order.order_lines
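To make the nesting rules concrete, here is a small sketch that expands an :include spec into the association paths it covers. include_paths is a hypothetical helper written for this illustration; it is not part of ActiveRecord and does not touch a database.

```ruby
# Expand a nested :include spec into the association paths it preloads.
# include_paths is a hypothetical helper, not an ActiveRecord method.
def include_paths(spec, prefix = nil)
  case spec
  when Symbol
    # A bare symbol names one association under the current prefix.
    [[prefix, spec].compact.join(".")]
  when Array
    # An array lists sibling associations.
    spec.map { |item| include_paths(item, prefix) }.flatten
  when Hash
    # A hash means "load the key, then load the value through it".
    spec.map do |parent, children|
      parent_path = [prefix, parent].compact.join(".")
      [parent_path] + include_paths(children, parent_path)
    end.flatten
  else
    raise ArgumentError, "unsupported include spec: #{spec.inspect}"
  end
end

puts include_paths([:customer, :order_lines]).inspect
puts include_paths([{ :customer => :country }, :order_lines]).inspect
```

The second call lists customer, customer.country and order_lines, matching the check marks on the slide.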

10

Improvement

• Let’s start optimising ...

    @orders = Order.find(:all, :include => {:customer => :country} )

• Resulting SQL ...

    “SELECT orders.*, customers.*, countries.* FROM orders
     LEFT JOIN customers ON ( customers.id = orders.customer_id )
     LEFT JOIN countries ON ( countries.id = customers.country_id )”

✓ 7.70 req/s 1.4x faster

11

Select only what you need

• Using the :select parameter in the find options, you can limit the columns you are requesting back from the database

• No point grabbing all columns if you only want :id and :name

    Order.find(:all, :select => 'orders.id, orders.name')

12

The last slide was very important

• Not using selects is *okay* provided you have very small columns, and never any binary, or large text data

• You can suddenly saturate your DB connection.

• Imagine our Orders table had an Invoice column on it storing a pdf of the invoice...
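A back-of-the-envelope calculation shows how quickly that adds up. The 64,000 row count is the order count used later in this talk; the 100 bytes for id and name and the 200 KB per invoice PDF are assumptions chosen purely for illustration.

```ruby
# Rough estimate of bytes pulled over the DB connection for one find(:all).
ROWS          = 64_000       # order count used later in the talk
SLIM_ROW_SIZE = 100          # ASSUMED bytes for id + name
PDF_SIZE      = 200 * 1024   # ASSUMED bytes per stored invoice PDF

def transfer_bytes(rows, bytes_per_row)
  rows * bytes_per_row
end

slim = transfer_bytes(ROWS, SLIM_ROW_SIZE)             # :select => 'orders.id, orders.name'
full = transfer_bytes(ROWS, SLIM_ROW_SIZE + PDF_SIZE)  # SELECT * including the PDF column

puts format("selected columns: %.1f MiB", slim / 1024.0**2)
puts format("all columns:      %.1f MiB", full / 1024.0**2)
puts "ratio: #{full / slim}x"
```

Under these assumptions the unrestricted query moves roughly two thousand times more data, none of which the page displays.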

13

Oops

• Can’t show a benchmark

• :select and :include don’t work together! The finder reverts to selecting all columns

• For a long time the core team has not accepted patches to make it work

• One little sentence in ActiveRecord rdoc “Because eager loading generates the SELECT statement too, the :select option is ignored.”

14

‘mrj’ to the rescue

• http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb

• Monkey patch to fix select/include problem

• Produces much more efficient SQL

15

Updated finder

• Now :select and :include playing nice:

    @orders = Order.find(:all,
      :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name',
      :include => [{:customer[:name] => :country[:name]}, :order_status[:name]],
      :conditions => conditions,
      :order => 'order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC')

✓15.15 req/s 2.88x faster

16

r8672 change

• http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/

• The following uses new improved association load (12 req/s)

@orders = Order.find(:all, :include => [{:customer => :country}, :order_status] )

• The following does not

@orders = Order.find(:all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order' )

17

r8672 output...

• Here’s the SQL

Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10

Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))

Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))

OrderStatus Load (0.000291) SELECT * FROM `order_statuses` WHERE (order_statuses.id IN (10))
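The pattern in that log, one query per association keyed by the collected ids, can be sketched in plain Ruby. preload_sql is a hypothetical helper and the order rows are made up; the real implementation inside Rails is more involved than this.

```ruby
# Build the follow-up "WHERE id IN (...)" statement for one association.
# preload_sql is a hypothetical helper; rows are plain hashes, not ActiveRecord objects.
def preload_sql(table, rows, foreign_key)
  # Collect each foreign key once, skipping rows without one.
  ids = rows.map { |row| row[foreign_key] }.compact.uniq
  "SELECT * FROM #{table} WHERE (#{table}.id IN (#{ids.join(',')}))"
end

# Made-up result of the first query (SELECT * FROM orders).
orders = [
  { :id => 1, :customer_id => 2106, :order_status_id => 10 },
  { :id => 2, :customer_id => 2018, :order_status_id => 10 },
  { :id => 3, :customer_id => 2106, :order_status_id => 10 }
]

queries = [
  "SELECT * FROM orders",
  preload_sql("customers", orders, :customer_id),
  preload_sql("order_statuses", orders, :order_status_id)
]

puts queries
```

The number of statements depends on the number of associations, not on the number of orders, which is what separates this from the lazy loading shown earlier.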

18

But I want more

• Okay, this still isn’t blazing fast. I’m building the next killr web2.0 app

• Forget about associations and just load it via SQL. Depending on the application, this makes a huge difference

• Concentrate on commonly used pages

19

Catch 22

• Hard coding SQL is the fastest solution

• No construction of SQL, no generation of ActiveRecord associated classes

• If your DB changes, you have to update SQL

‣ Keep SQL with models where possible

20

It ain’t pretty.. but it’s fast

• Find by SQL

    class Order < ActiveRecord::Base
      def self.find_current_orders
        find_by_sql("SELECT orders.id, orders.created_at,
                       customers.name as customer_name,
                       countries.name as country_name,
                       order_statuses.name as status_name
                     FROM orders
                     LEFT OUTER JOIN `customers` ON `customers`.id = `orders`.customer_id
                     LEFT OUTER JOIN `countries` ON `countries`.id = `customers`.country_id
                     LEFT OUTER JOIN `order_statuses` ON `order_statuses`.id = `orders`.order_status_id
                     WHERE order_status_id < 100
                     ORDER BY order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC")
      end
    end

• 28.90 req/s ( 5.49x faster )

21

And the results

Finder                           Throughput    Speedup
find(:all)                        5.26 req/s   (baseline)
find(:all, :include)              7.70 req/s   1.4x
find(:all, :select, :include)    15.15 req/s   2.88x
find_by_sql()                    28.90 req/s   5.49x
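The speedup figures are simply each throughput divided by the find(:all) baseline. A short sketch of that arithmetic, using the req/s numbers from the slides (the speedup helper and the per-request milliseconds are additions for illustration):

```ruby
# Throughput figures from the slides, in requests per second.
BASELINE = 5.26
RESULTS = [
  ["find(:all)",                    5.26],
  ["find(:all, :include)",          7.70],
  ["find(:all, :select, :include)", 15.15],
  ["find_by_sql()",                 28.90]
]

# Speedup relative to the baseline, rounded to two decimals.
def speedup(req_per_sec, baseline = BASELINE)
  (req_per_sec / baseline).round(2)
end

RESULTS.each do |label, req_per_sec|
  ms_per_request = 1000.0 / req_per_sec
  puts format("%-30s %6.2f req/s  %7.1f ms/req  %.2fx",
              label, req_per_sec, ms_per_request, speedup(req_per_sec))
end
```

Rounded to two decimals the :include case comes out at 1.46x, which the slide shows as 1.4x.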

22

Don’t forget indexes

• 64000 orders

    OrderStatus.find(:all).each { |os| puts os.orders.count }

• Avg 0.61 req/s no indexes

• EXPLAIN your SQL

    ALTER TABLE `xchain_test`.`orders` ADD INDEX order_status_idx(`order_status_id`);

• Avg 23 req/s after index (37x improvement)
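Why the index helps can be shown with a toy model: without one, every count has to examine every row; with one, the matching rows are found directly. This is a rough analogy only. A Ruby Hash stands in for the index (MySQL would typically use a B-tree), OrderRow is a hypothetical struct, and the spread over 10 statuses is an assumption.

```ruby
# Toy model of a full table scan versus an indexed lookup.
OrderRow = Struct.new(:id, :order_status_id)

# 64,000 rows spread evenly over 10 statuses (the status count is ASSUMED).
ORDERS = (1..64_000).map { |id| OrderRow.new(id, id % 10 + 1) }

# Without an index: every row is examined for each status.
def count_with_scan(orders, status_id)
  examined = 0
  matches = 0
  orders.each do |order|
    examined += 1
    matches += 1 if order.order_status_id == status_id
  end
  [matches, examined]
end

# With an index: build the lookup structure once, then jump to the matching rows.
def build_index(orders)
  orders.group_by { |order| order.order_status_id }
end

index = build_index(ORDERS)
matches, examined = count_with_scan(ORDERS, 3)

puts "scan:  #{matches} matches after examining #{examined} rows"
puts "index: #{index.fetch(3, []).size} matches via direct lookup"
```

Both approaches return the same count; the scan just pays for all 64,000 rows on every status, which is the cost the ALTER TABLE statement removes.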

23

Avoid .count

• It’s damned slow

    OrderStatus.find(:all).each { |os| puts os.orders.count }

• Add column orders_count + update code

    OrderStatus.find(:all).each { |os| puts os.orders_count }

✓34 req/s vs 108 req/s (3x faster)
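The "update code" part means keeping orders_count in step whenever an order is created or destroyed. Rails can maintain such a column through the :counter_cache option on belongs_to; the sketch below only shows the bookkeeping idea in plain Ruby. StatusRecord and its methods are hypothetical stand-ins, not ActiveRecord code.

```ruby
# Plain-Ruby stand-in for an order_statuses row with a cached orders_count column.
class StatusRecord
  attr_reader :name, :orders_count

  def initialize(name)
    @name = name
    @orders = []
    @orders_count = 0   # the cached counter column
  end

  # Creating an order bumps the cached counter.
  def add_order(order_id)
    @orders << order_id
    @orders_count += 1
  end

  # Destroying an order decrements it, but only if the order existed.
  def remove_order(order_id)
    @orders_count -= 1 if @orders.delete(order_id)
  end

  # Stand-in for os.orders.count: walks every order on each call.
  def counted_orders
    @orders.inject(0) { |total, _| total + 1 }
  end
end

status = StatusRecord.new("Shipped")
(1..5).each { |id| status.add_order(id) }
status.remove_order(2)

puts "#{status.name}: cached=#{status.orders_count} counted=#{status.counted_orders}"
```

Reading the cached value costs nothing extra per status, at the price of a small write on every create and destroy.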

24

For the speed freaks

• Merb - http://merbivore.com

• 38.56 req/s - 7x performance improvement

• Nearly identical code

• Blazingly fast

25

The End

work.rowanhick.com

26
