top 10 perl performance tips
Post on 10-May-2015
Top 10 Perl Performance Tips
Perrin Harkins
We Also Walk Dogs
Devel::NYTProf
Ground Rules
● Make a repeatable test to measure progress with
  ○ Sometimes turns up surprises
● Use a profiler (Devel::NYTProf) to find where the time is going
  ○ Don't flail and waste time optimizing the wrong things!
● Try to weigh the cost of developer time vs buying more hardware
  ○ Optimization is crack for developers, hard to know when to stop
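A repeatable test can be as small as a script built on the core Benchmark module. The sketch below is only an illustration: the test string and the two sub names are made up for this example, and the numbers it prints will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a fixed string, so every run measures the same work.
my $haystack = ('abc ' x 1000) . 'needle' . (' xyz' x 1000);

# Two candidate implementations of the same check.
sub with_index { return index($haystack, 'needle') >= 0 ? 1 : 0 }
sub with_regex { return $haystack =~ /needle/ ? 1 : 0 }

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    index => \&with_index,
    regex => \&with_regex,
});
```

Keeping the script around lets you rerun it after every change and compare like with like.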
1. The Big Picture
● The biggest gains usually come from changing your high-level approach
  ○ Is there a more efficient algorithm?
  ○ Can you restructure to reduce duplicated effort?
● Sometimes you just need to tune your SQL
● A boatload of RAM hides a multitude of sins
● The bottleneck is usually I/O
  ○ Files
  ○ Database
  ○ Network
  ○ Batch I/O often makes a huge difference
2. Use DBI Efficiently
● Can make a huge difference in tight loops with many small queries
● connect_cached() avoids connection overhead
  ○ Or use your favorite connection cache, but beware overuse of ping()
● prepare_cached() avoids object creation and server-side prepare overhead
● Use bind parameters to reuse SQL statements instead of creating new ones
2. Use DBI Efficiently
● Use bind_columns() in a fetch() loop for the most efficient retrieval
  ○ Less copying is faster
  ○ Alternatively, fetchrow_arrayref()
● prepare() followed by many execute() calls is faster than repeated do() calls
2. Use DBI Efficiently
● Turn off AutoCommit for batch changes
  ○ Committing every thousand rows or so saves work for your database
● Use your database's bulk loader when possible
  ○ Writing rows to CSV and using MySQL's LOAD DATA INFILE crushes the fastest DBI code
  ○ 10X speedup is not unusual
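Several of the DBI tips so far fit in one short script: a cached prepared statement with bind parameters, a batch of inserts inside a single transaction, and a fetch loop using DBI's bind_columns method. This is a sketch under assumptions: it uses an in-memory SQLite database through DBD::SQLite, a made-up table called pets, and it skips the database work entirely if DBI or DBD::SQLite is not installed.

```perl
use strict;
use warnings;

# Assumption: DBI and DBD::SQLite are available; otherwise the sketch does nothing.
my $have_dbi = eval { require DBI; require DBD::SQLite; 1 };

my @rows;
if ($have_dbi) {
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)');  # hypothetical table

    # Batch change: suspend AutoCommit, reuse one statement, commit once.
    $dbh->begin_work;
    my $ins = $dbh->prepare_cached('INSERT INTO pets (id, name) VALUES (?, ?)');
    $ins->execute($_, "dog$_") for 1 .. 3;
    $dbh->commit;

    # Fetch loop: bound variables are refilled on each fetch(), with no per-row copying.
    my $sel = $dbh->prepare_cached(
        'SELECT id, name FROM pets WHERE id >= ? ORDER BY id');
    $sel->execute(2);
    my ($id, $name);
    $sel->bind_columns(\$id, \$name);
    while ($sel->fetch) {
        push @rows, "$id:$name";
    }
    $dbh->disconnect;
}
print join(',', @rows), "\n";
```

With a real batch you would commit every thousand rows or so rather than once at the end.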
2. Use DBI Efficiently
● Use ORMs wisely
  ○ Consider using straight DBI for the most performance-sensitive sections
    ■ Removing a layer means fewer method calls and faster code
  ○ Write report queries by hand if they seem slow
    ■ Optimizer hints and choices about SQL variations are beyond the scope of ORMs but make a huge difference for this kind of query
3. Choose the Fastest Hash Storage
● memcached is not the fastest option for a local cache
  ○ BerkeleyDB (not DB_File!) and Cache::FastMmap are about twice as fast
● CHI abstracts the storage layer
  ○ Useful if you think network strategy may change later
3. Choose the Fastest Hash Storage
Cache                       Get time   Set time   Run time
CHI::Driver::Memory         0.03ms     0.05ms     0.35s
BerkeleyDB                  0.05ms     0.17ms     0.57s
Cache::FastMmap             0.06ms     0.09ms     0.62s
CHI::Driver::File           0.10ms     0.26ms     1.11s
Cache::Memcached::Fast      0.12ms     0.15ms     1.23s
Memcached::libmemcached     0.14ms     0.16ms     1.40s
CHI::Driver::DBI SQLite     0.11ms     1.94ms     2.05s
Cache::Memcached            0.29ms     0.21ms     2.88s
CHI::Driver::DBI MySQL      0.45ms     0.33ms     4.41s
4. Generate Code and Compile to a Subroutine
● This is how most templating tools work.
● Remove the cost of things that won't change for a while
  ○ Skip re-parsing templates
  ○ Skip large groups of conditionals
  ○ Choose architecture-specific code

    my %subs;
    my $thing = 'world';    # example value; interpolated once, when the code is generated
    my $code  = qq{print "Hello $thing\n";};
    $subs{'hello'} = eval "sub { $code }" or die $@;
    $subs{'hello'}->();
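To illustrate the "skip large groups of conditionals" point, here is a minimal sketch that checks a set of options once, while generating the code, so the compiled sub contains only the enabled steps. The option names and the $clean sub are hypothetical, chosen just for this example.

```perl
use strict;
use warnings;

# Hypothetical options, decided once at generation time rather than on every call.
my %opt = (trim => 1, lowercase => 1, strip_digits => 0);

# Collect source text for the enabled steps only.
my @steps;
push @steps, '$s =~ s/^\s+|\s+$//g;' if $opt{trim};
push @steps, '$s = lc $s;'           if $opt{lowercase};
push @steps, '$s =~ tr/0-9//d;'      if $opt{strip_digits};

# Compile the steps into a single anonymous sub.
my $src   = 'sub { my $s = shift; ' . join(' ', @steps) . ' return $s; }';
my $clean = eval $src or die "compile failed: $@";

print $clean->("  Hello World 42  "), "\n";   # prints "hello world 42"
```

The generated sub never tests %opt, so calls pay nothing for the disabled branches.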
5. Sling Text Efficiently
● Slurp files when possible.
    my $text = do { local $/; <$fh> };
● Seems obvious, but I still see people doing this:
    my @lines = <$fh>;
    my $text = join('', @lines);
● Consider memory with huge files.
5. Sling Text Efficiently
● Use a "sliding window" to search very large files.
  ○ Too big to slurp, but line-by-line is slow.
  ○ Chunks of 8K or 16K are much faster, but require bookkeeping code.
  ○ http://www.perlmonks.org/?node_id=128925
● Use the cheapest string tests you can get away with.
  ○ index() beats a regex when you just want to know if a string contains another string
● Use a fast CSV parser
  ○ Text::CSV_XS is much faster than the regexes you copied from that web page.
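The bookkeeping for a sliding-window search might look like the sketch below. It is not the code from the PerlMonks node; find_in_file is a hypothetical helper written for this example. It reads fixed-size chunks, uses index() for the test, and keeps the last length(needle) - 1 bytes of each buffer so a match that straddles two chunks is still found.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the byte offset of the first occurrence of $needle in the file, or -1.
sub find_in_file {
    my ($path, $needle, $chunk_size) = @_;
    $chunk_size ||= 16 * 1024;
    my $overlap = length($needle) - 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf    = '';
    my $offset = 0;    # file offset of the first byte currently held in $buf

    while (read($fh, my $chunk, $chunk_size)) {
        $buf .= $chunk;
        my $pos = index($buf, $needle);
        return $offset + $pos if $pos >= 0;

        # Keep only the tail that could begin a match spanning two chunks.
        if (length($buf) > $overlap) {
            my $drop = length($buf) - $overlap;
            $offset += $drop;
            substr($buf, 0, $drop, '');
        }
    }
    return -1;
}

# Demo with a tiny chunk size so the needle crosses a chunk boundary.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ('x' x 13) . 'needle' . ('y' x 20);
close $tmp_fh;

print find_in_file($tmp_path, 'needle', 8), "\n";    # prints 13
```

In real use the chunk size would be the 8K or 16K mentioned above; the 8-byte chunk here only exists to exercise the overlap logic.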
6. Replace LWP With Something Faster
● LWP is amazing, but modules built on C libraries tend to be faster.
  ○ LWP::Curl
  ○ HTTP::Lite
  ○ Maybe HTTP::Async for parallel

LWP           32.8/s
HTTP::Async   64.5/s
HTTP::Lite    200/s
LWP::Curl     1000/s
7. Use a Fast Serializer
● Data::Dumper is great for debugging, but slow for serialization.
● JSON::XS is the new speed king, and is human-readable and cross-language.
● Storable handles more and is second-best in speed.
7. Use a Fast Serializer
YAML 84.7/s
XML::Simple 800/s
Data::Dumper 2143/s
FreezeThaw 2635/s
YAML::Syck 4307/s
JSON::Syck 4654/s
Storable 9774/s
JSON::XS 41473/s
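For reference, a round trip through the two fastest options looks roughly like this. One assumption to flag: the sketch loads the core JSON::PP module as a stand-in so it runs without extra installs; JSON::XS exports the same encode_json and decode_json functions, so swapping the use line should be the only change needed. The sample data is made up.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use JSON::PP qw(encode_json decode_json);   # stand-in; JSON::XS exports the same two functions

# Hypothetical data structure to serialize.
my $data = { name => 'Rex', age => 4, tricks => ['sit', 'roll over'] };

# Storable: compact binary format, Perl-only.
my $frozen = nfreeze($data);
my $copy   = thaw($frozen);

# JSON: text format, readable by humans and by other languages.
my $json      = encode_json($data);
my $from_json = decode_json($json);

print "$copy->{tricks}[1]\n";     # prints "roll over"
print "$from_json->{name}\n";     # prints "Rex"
```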
8. Avoid Startup Costs
● Use a daemon to run code persistently
  ○ Skip the costs of compiling
  ○ Cache data
  ○ Open connections ahead of time
● mod_perl, FastCGI, Plack, etc. for web
● PPerl for command-line
  ○ Or hit your web server with lwp-request
9. Sometimes You Have to Get Crazy
● Use the @_ array directly to avoid copying

    sub add_to_sql {
        my $sqlbase = shift;    # hashref
        my ($name, $value) = @_;
        if ($value) {
            push(@{ $sqlbase->{'names'} },  $name);
            push(@{ $sqlbase->{'values'} }, $value);
        }
        return $sqlbase;
    }
9. Sometimes You Have to Get Crazy
    sub add_to_sql {
        # takes 3 params: hashref, name, and value
        return if not $_[2];
        push(@{ $_[0]->{'names'} },  $_[1]);
        push(@{ $_[0]->{'values'} }, $_[2]);
    }

● 40% faster than original
● More than 40% harder to read
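The reason this works is that the elements of @_ are aliases for the caller's arguments, not copies. A small sketch, with a hypothetical trim_in_place sub, makes the aliasing visible:

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's variable, so editing it edits the original.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

my $s = "  padded  ";
trim_in_place($s);
print "[$s]\n";    # prints "[padded]"
```

Copying with my ($x) = @_ is what breaks the alias, and that copy is the cost the faster version avoids.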
10. Consider Compiling Your Own Perl
● Compiling without threads can be good for a free 15% or so.
● No code changes needed!
● Has maintenance costs.
Resources
Tim Bunce's Advanced DBI slides:
http://www.slideshare.net/Tim.Bunce/dbi-advanced-tutorial-2007
Also see Tim's NYTProf slides:
http://www.slideshare.net/Tim.Bunce/develnytprof-v4-at-oscon-201007
man perlperf
Programming Perl appendix on performance
Thank you!
Slides will be available on the conference website
Avoid tie()
● Slower than method calls!
● PITA to debug too.
Use a Fast Sort
● For sorting on derived keys, consider a GRT sort.
  ○ Faster than Schwartzian Transform
  ○ Use Sort::Maker to build it.
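For readers who have not met it, a GRT (Guttman-Rosler Transform) packs the derived key into a fixed-width string prefix, sorts with the plain built-in sort (no comparison block), then strips the prefix. The hand-written sketch below sorts words by length and is only an illustration; the word list is made up, and Sort::Maker would generate something more general.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi apple);    # hypothetical input

# GRT: prefix each word with its length as a 4-byte big-endian integer,
# sort the strings with the default comparison, then drop the prefix.
my @grt =
    map  { substr($_, 4) }
    sort
    map  { pack('N', length($_)) . $_ }
    @words;

# Schwartzian Transform producing the same order, for comparison.
my @st =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ length($_), $_ ] }
    @words;

print "@grt\n";    # prints "fig kiwi pear apple banana"
```

The big-endian 'N' format matters: it makes byte-wise string order agree with numeric order of the key.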