![Page 1: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/1.jpg)
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 12, 2014 | Las Vegas
Performance Profiling in Production: Analyzing Web Requests at Scale Using MapReduce and Storm
Zach Musgrave, Yelp
![Page 2: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/2.jpg)
Roadmap
1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities
![Page 3: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/3.jpg)
In a magical world, far far away…
• Our apps never break
• Our apps never slow down
• Developers think about scalability
• All external services run in O(1)
• All bugs are known a priori
![Page 4: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/4.jpg)
In the world we live in…
• Accidents happen
• Developers are people
• Developers make mistakes
• Those mistakes can make it to… production!
![Page 7: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/7.jpg)
One Fateful Day…
Your code makes me sad :(
Crap crap crap
Holy crap
That’s like a 25% bump
Is this even our fault?
Who did this???
Are we timing out?
What’s the user impact?
Holy crap
![Page 8: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/8.jpg)
![Page 9: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/9.jpg)
![Page 10: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/10.jpg)
Enter… the profiler…
• Generate deterministic statistics
  – How many times is a method called?
  – How long is that method’s runtime?
  – What’s that method’s name/module?
  – How much total runtime is devoted?
• It’s easy to use ad hoc:
  – python -m cProfile myscript.py
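The same statistics can be collected from inside a program rather than from the command line. The following is a minimal Python 3 sketch using only the standard-library `cProfile` and `pstats` modules; the `fib`, `profile_call`, and `report` names are made up for illustration and are not part of the talk.

```python
import cProfile
import io
import pstats


def fib(n):
    """Deliberately slow recursion so the profile has something to show."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_call(func, *args):
    """Run func under cProfile; return (result, profiler)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    return result, profiler


def report(profiler, limit=5):
    """Render the top `limit` entries, sorted by cumulative time, as text."""
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


result, profiler = profile_call(fib, 15)
print(result)
print(report(profiler))
```

The report lists call counts, total time, and cumulative time per function, which are the same columns the `pstats` browser shows.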
![Page 11: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/11.jpg)
Raw Profile Output

    ztm@dev7-devb:~$ python -m pstats some-filename-goes-here.profile
    Welcome to the profile statistics browser.
    % sort cumulative
    % callees 10
       Ordered by: cumulative time
       List reduced from 34239 to 10 due to restriction <10>

                                   ncalls  tottime   cumtime
    wsgi/app.py:134(classic_yelp_routing)
                                     1271    0.043     0.208  web/common.py:126(handler_context)
                                     1271    0.102  1790.071  web/wsgi.py:365(execute_request)
    web/gatekeeper/check.py:226(_handle)
                                     1320    0.037     0.277  visit_captcha.py:58(is_captcha_uri)
                                     1318    0.036  1804.759  emergency_captcha.py:121(__call__)
                                     1318    0.021     0.043  gatekeeper/check.py:223(_should_log_request_timing)
    web/emergency_captcha.py:90(_handle)
                                     1318    0.016     0.183  visit_captcha.py:58(is_captcha_uri)
                                     1312    0.030  1804.016  web/accesscookies/app.py:118(__call__)
                                     1317    0.020     0.369  web/emergency_captcha.py:67(_should_display_captcha)
    web/accesscookies/app.py:118(__call__)
                                     1311    0.020  1802.502  pagelet/app.py:37(app)
                                     1312    0.038     0.311  web/accesscookies/app.py:151(should_handle)
                                     1312    0.020     0.058  web/wsgi.py:55(__init__)
    pagelet/app.py:37(app)
                                     1312    0.030  1803.569  .../pyramid/router.py:242(__call__)
                                       41    0.000     0.009  core/ips.py:48(is_internal_ip)
                                     1312    0.003     0.003  {method 'get' of 'dict' objects}
![Page 12: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/12.jpg)
Diff Based on Call Count (n~1,000)

    ztm@dev7-devb:~$ diff_pstats -s calls several_months_ago.profile recently.profile
    SORTING BY DELTA IN calls
                                                       BEFORE  AFTER  DELTA
    yelp/util/request_bucketer/bucketer.py:<lambda>       485   3284   2798
    ...site-packages/staticconf/proxy.py:method          1967   3524   1557
    yelp/util/experiments.py:<genexpr>                    231   1620   1389
    ...site-packages/simplejson/encoder.py:iterencode       0   1189   1189
    yelp/core/encapsulation.py:__new__                      0   1062   1062

Diff Based on Cumulative Runtime (n~1,000)

    ztm@dev7-devb:~$ diff_pstats -s cum several_months_ago.profile recently.profile
    SORTING BY DELTA IN cum
                                                         BEFORE     AFTER     DELTA
    yelp/wsgi/tweens.py:tween                          1.352045  5.487169  4.135124
    yelp/web/gatekeeper/check.py:_handle               0.000000  1.378666  1.378666
    yelp/web/emergency_captcha.py:_handle              0.000000  1.378226  1.378226
    yelp/util/cheetah/filters.py:markup_filter         0.000000  0.101759  0.101759
    yelp/logic/decorators.py:wrapper                   0.233577  0.321657  0.088080
    yelp/logic/experiments.py:experiments_for_yuv      0.034188  0.120480  0.086293
    yelp/util/request_bucketer/bucketer.py:get_bucket  0.049993  0.135661  0.085668
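The `diff_pstats` tool itself is not shown in the slides. As a rough illustration of the idea, the sketch below compares call counts between two `pstats.Stats` objects using only the standard library. The function names (`call_counts`, `diff_calls`) and the toy workload are assumptions, not the actual tool.

```python
import cProfile
import pstats


def call_counts(stats):
    """Map 'file:function' to total call count for a pstats.Stats object."""
    counts = {}
    # Each value is (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, _lineno, funcname), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        key = "%s:%s" % (filename, funcname)
        counts[key] = counts.get(key, 0) + ncalls
    return counts


def diff_calls(before, after):
    """Return (name, before, after, delta) rows, largest delta first."""
    b, a = call_counts(before), call_counts(after)
    rows = [(name, b.get(name, 0), a.get(name, 0), a.get(name, 0) - b.get(name, 0))
            for name in set(b) | set(a)]
    return sorted(rows, key=lambda row: row[3], reverse=True)


def helper():
    return 1


def workload(n):
    total = 0
    for _ in range(n):
        total += helper()
    return total


def profile(n):
    profiler = cProfile.Profile()
    profiler.runcall(workload, n)
    return pstats.Stats(profiler)


rows = diff_calls(profile(10), profile(50))
for name, old, new, delta in rows[:3]:
    print("%-60s %6d %6d %6d" % (name, old, new, delta))
```

Sorting by cumulative time instead would use the fourth element of each stats tuple in place of the call count.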
![Page 13: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/13.jpg)
BUT HOW DOES THIS WORK IN PRODUCTION?!?!
Hi! I’m Daurius, the profiling hedgehog!
![Page 14: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/14.jpg)
![Page 15: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/15.jpg)
Get Your Data!
• Make a context manager
  – Wrap your app in it at a high level
  – Return a profiling context… sometimes
• Make a place to put your profiles
  – We use a distributed logging system, Scribe
  – You can also save them to a local disk
  – As long as they eventually go to the cloud!
• Add your logging stream to your profiles!
  – Then you can search for attributes
![Page 16: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/16.jpg)
System Diagram
[Figure: labeled components are the Internet (end user requests), Yelp DCs (East Coast), Yelp DCs (West Coast), one Scribe aggregator per coast, an "Upload Scribe to Amazon S3" step per coast, S3, and real-time analysis via log tailing. Caption: All your profiling and logging data in one place!]
![Page 17: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/17.jpg)
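Webapp Context

    import cProfile
    from pstats import Stats

    # encode_stats, ranger, clog, and write_ranger_line are helpers
    # that are not shown on the slide.

    class CProfileScribeContext(object):
        """ Context: on exit, save cProfile to Scribe log. """
        scribe_category = "cprofile"

        def __enter__(self):
            # start collecting profile data for this request
            self.profiler = cProfile.Profile()
            self.profiler.enable()

        def __exit__(self, *args):
            # stop profiling and snapshot the collected stats
            self.profiler.create_stats()
            write_out = {
                "cprofile": encode_stats(Stats(self.profiler)),
                "ranger": ranger.request_info
            }
            # one log line per request: profile plus request metadata
            clog.log_line(
                self.scribe_category,
                write_ranger_line(write_out)
            )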
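The slide leaves `encode_stats` and the Scribe write undefined. One way this could work is sketched below, assuming the profile is serialized with `marshal` plus base64 and the Scribe log is replaced by a plain Python list. `ProfileToSink`, `_StatsHolder`, and both codec functions are hypothetical stand-ins, not Yelp's implementation.

```python
import base64
import cProfile
import json
import marshal
import pstats


def encode_stats(stats):
    """Serialize a pstats.Stats object to an ASCII string (marshal + base64)."""
    return base64.b64encode(marshal.dumps(stats.stats)).decode("ascii")


class _StatsHolder(object):
    """Adapter: pstats.Stats accepts any object with create_stats() and .stats."""

    def __init__(self, stats):
        self.stats = stats

    def create_stats(self):
        pass


def decode_stats(encoded):
    """Rebuild a pstats.Stats object from the string made by encode_stats."""
    raw = marshal.loads(base64.b64decode(encoded))
    return pstats.Stats(_StatsHolder(raw))


class ProfileToSink(object):
    """Profile the with-block; on exit append one JSON line to `sink`."""

    def __init__(self, sink, request_info):
        self.sink = sink
        self.request_info = request_info

    def __enter__(self):
        self.profiler = cProfile.Profile()
        self.profiler.enable()
        return self

    def __exit__(self, *exc_info):
        self.profiler.disable()
        self.sink.append(json.dumps({
            "cprofile": encode_stats(pstats.Stats(self.profiler)),
            "ranger": self.request_info,
        }))
        return False  # never swallow exceptions from the request


log_lines = []
with ProfileToSink(log_lines, {"servlet": "home"}):
    sorted(range(1000), reverse=True)

record = json.loads(log_lines[0])
restored = decode_stats(record["cprofile"])
print(restored.total_calls)
```

Keeping the profile and the request metadata in the same log line is what later makes each profile searchable by request attributes.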
![Page 20: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/20.jpg)
Webapp Context Manager

    import random

    class CProfileContextManager(object):

        def should_profile(self, servlet):
            """ Sample using the probability for a specific servlet. """
            cprof_prob = get_config(servlet, DEFAULT)
            return random.random() < cprof_prob

        def get_manager(self, request):
            """ Return a context manager for the request. """
            # NOTE: the slide does not show how servlet is derived from request
            if config.enabled and self.should_profile(servlet):
                return CProfileScribeContext()
            return CProfileNoOp()
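The "profile sometimes" pattern can be sketched in a self-contained way as follows. This is an illustration only: `NoOpContext`, `ProfilingContext`, and `SamplingManager` are hypothetical names, and the random source is injectable so the sampling decision can be tested deterministically.

```python
import cProfile
import random


class NoOpContext(object):
    """Does nothing; returned for the vast majority of requests."""

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


class ProfilingContext(object):
    """Profiles the with-block; a real version would also log the stats."""

    def __enter__(self):
        self.profiler = cProfile.Profile()
        self.profiler.enable()
        return self

    def __exit__(self, *exc_info):
        self.profiler.disable()
        return False


class SamplingManager(object):
    def __init__(self, probability, enabled=True, rng=random.random):
        self.probability = probability
        self.enabled = enabled
        self.rng = rng  # callable returning a float in [0, 1)

    def get_manager(self):
        """Return a profiling context with the configured probability."""
        if self.enabled and self.rng() < self.probability:
            return ProfilingContext()
        return NoOpContext()


# Roughly 1% of 10,000 simulated requests get a profiling context.
manager = SamplingManager(0.01, rng=random.Random(0).random)
sampled = sum(isinstance(manager.get_manager(), ProfilingContext)
              for _ in range(10000))
print(sampled)
```

Because both branches return a context manager, the calling code can always write `with manager.get_manager():` without checking which one it received.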
![Page 23: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/23.jpg)
Per-Servlet Configuration
• Consider maintaining a config!
  – Default percentage of requests to profile
  – Override for specific servlets
• Useful for unusual/rarely loaded flows
• Reload dynamically with PyStaticConf

    cprofile:
        enabled: True
        probability:
            default: 0.000X
            servlets:
                - home: 0.002X
                - biz_details: 0.001X
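A default-plus-override lookup over a config shaped like the one above might look like the sketch below. The config is written as a plain Python dict rather than loaded through PyStaticConf, the numeric values are placeholders (the slide redacts the real digits with X), and `probability_for` is a hypothetical helper.

```python
# Placeholder values: the real probabilities are redacted on the slide.
CONFIG = {
    "cprofile": {
        "enabled": True,
        "probability": {
            "default": 0.0001,
            "servlets": [
                {"home": 0.002},
                {"biz_details": 0.001},
            ],
        },
    }
}


def probability_for(servlet, config=CONFIG):
    """Return the sampling probability for a servlet, or 0.0 if disabled."""
    section = config["cprofile"]
    if not section.get("enabled", False):
        return 0.0
    probability = section["probability"]
    # Flatten the list of single-key mappings into one override table.
    overrides = {}
    for entry in probability.get("servlets", []):
        overrides.update(entry)
    return overrides.get(servlet, probability["default"])


print(probability_for("home"))
print(probability_for("user_profile"))
```

A servlet without an override falls back to the default, so rarely loaded flows can be given a higher rate without raising the overall profiling load.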
![Page 24: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/24.jpg)
![Page 25: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/25.jpg)
Usability is KEY!
• Having ~150,000 of anything per day is HARD!
• You need to be able to search, sort, and filter
• You need to do this quickly
  – Or it gets stale: aim for less than one day of latency
  – Stale data isn’t (usually) useful!
I’m a classy ‘hog… I wanna be FRESH!
![Page 26: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/26.jpg)
Enter… Amazon EMR!
![Page 27: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/27.jpg)
Why Amazon EMR?
• EMR lets you run MapReduce jobs in the cloud
  – How big a cluster? As big as you want!
• EMR spins up on demand, too
• It’s super easy to use with Python!
  – Yelp maintains MRJob
Mr. Job and I are best buddies!
![Page 28: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/28.jpg)
Save Discrete Profile, Logging Files
• Process lines of Scribe logs into correct formats
  – Perfectly parallel: each line is independent!
• Save into Amazon S3
  – One file for each request’s cProfile
  – One file for each request’s logging data
• Analyze logging data for searchable parameters
  – Each parameter can be computed in parallel!
![Page 29: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/29.jpg)
Parameters Yelp Cares About
• WHO: Is the user logged in?
• WHAT: Which page did the user access?
  – site (main, mobile, api, biz site)
  – servlet (home, biz_details, user_profile)
  – action (submit, first load, refresh)
• WHERE: Which data center?
• WHEN: 2014-10-01 T 13:10:53
• HOW: HTTP request (GET, POST, PUT)
• HOW LONG: over/under 1 second response
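Each of these parameters can be turned into a searchable `name/value` tag. The sketch below shows one way to derive such tags from a request's logging record. The record field names (`user_id`, `response_seconds`, and so on) and the `over_1s` tag name are assumptions for illustration; only the `name/value` tag shape comes from the slides.

```python
def make_tags(request_info):
    """Derive searchable 'name/value' tags from one request's logging data."""
    return [
        "loggedin/%s" % (request_info.get("user_id") is not None),
        "site/%s" % request_info["site"],
        "servlet/%s" % request_info["servlet"],
        "action/%s" % request_info["action"],
        "datacenter/%s" % request_info["datacenter"],
        "http_method/%s" % request_info["http_method"],
        "over_1s/%s" % (request_info["response_seconds"] > 1.0),
    ]


example = {
    "user_id": None,
    "site": "main",
    "servlet": "home",
    "action": "first_load",
    "datacenter": "sfo",
    "http_method": "GET",
    "response_seconds": 1.7,
}
for tag in make_tags(example):
    print(tag)
```

In this sketch the WHEN parameter is left out of the tags on the assumption that the date is already encoded in the storage path.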
![Page 30: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/30.jpg)
Save Discrete Profile, Logging Files

    import marshal
    from mrjob.job import MRJob

    class MRScribeTagCprofile(MRJob):

        def mapper(self, _, line):
            # convert text into dict; convert JSON to a pstats object
            request = process_ranger_line(line)
            pstats = decode_stats(request["cprofile"])

            # ex: logs/cprofile-discrete/2014/10/01/00:01:34-3fc2d016d8accaf4
            save_path = get_basekeyname(request)

            # save pstats and logging info to Amazon S3
            bucket.set_object_gz(save_path + ".profile.gz",
                                 marshal.dumps(pstats.stats))
            bucket.set_object_gz(save_path + ".ranger.gz",
                                 write_ranger_line(request["ranger"]))

            # key examples: datacenter/sfo ; loggedin/True ; servlet/home
            for key in make_all_matching_tags(request):
                yield key, save_path
![Page 35: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/35.jpg)
Save Discrete Profile, Logging Files

    class MRScribeTagCprofile(MRJob):

        def reducer(self, tag_key, matching_paths):
            # ex: logs/cprofile-discrete/2014/10/01/tags/datacenter/sfo
            tag_path = tag_path_for(tag_key)

            # get old list of matching values
            tag_list = bucket.get_object_gz(tag_path).split("\n")

            # matching_paths is a one-shot iterator, so materialize it once
            new_paths = list(matching_paths)

            # update matching values
            tag_list.extend(new_paths)
            tag_contents = "\n".join(tag_list)

            # upload tag file w/ new matching web requests
            bucket.set_object_gz(tag_path, tag_contents)

            # output the number of paths per tag we added
            yield tag_key, len(new_paths)
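To see how the mapper and reducer cooperate to build a tag index, the flow can be simulated in memory without EMR or S3. In the sketch below a plain dict stands in for the per-tag files in S3, and the request records and paths are invented sample data, so this only illustrates the data flow, not the real job.

```python
from collections import defaultdict


def mapper(request):
    """Emit (tag, path) for every tag the request matches."""
    for tag in request["tags"]:
        yield tag, request["path"]


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(tag, paths, index):
    """Append new paths to the tag's entry; return (tag, number added)."""
    new_paths = list(paths)
    index.setdefault(tag, []).extend(new_paths)
    return tag, len(new_paths)


# Invented sample data: two requests profiled on the same day.
requests = [
    {"path": "2014/10/01/00:01:34-aaaa", "tags": ["datacenter/sfo", "servlet/home"]},
    {"path": "2014/10/01/00:02:10-bbbb", "tags": ["datacenter/sfo", "servlet/biz_details"]},
]

tag_index = {}  # stands in for the tag files stored in S3
pairs = [pair for request in requests for pair in mapper(request)]
counts = dict(reducer(tag, paths, tag_index)
              for tag, paths in shuffle(pairs).items())
print(counts)
print(tag_index["datacenter/sfo"])
```

Looking up a tag then returns every profile path that matches it, which is what makes the stored profiles searchable.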
![Page 40: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/40.jpg)
![Page 41: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/41.jpg)
Aggregate into Multiple Requests
• Any single profile doesn’t tell the whole story
  – If you pick one at random…
  – There’s no guarantee it’ll show the badness
• Create aggregate profiles
  – Usually one per day, for each set of parameters
  – Compare daily aggregates to see the big picture
It’s hard to see the hedgehogs for the trees!
![Page 42: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/42.jpg)
Aggregate into Multiple Requests

    import marshal
    import pstats
    from mrjob.job import MRJob

    class MRCprofileCombine(MRJob):

        def mapper(self, _, pathname):
            # download the logging info and process it
            ranger_raw = bucket.get_object_gz(pathname + ".ranger.gz")
            ranger_data = process_ranger_line(ranger_raw)

            # download the cProfile info and process it
            # NOTE: the standard-library pstats.Stats constructor takes a
            # filename or a profiler-like object, so loading a raw stats
            # dict this way may need a small adapter
            profile_raw = bucket.get_object_gz(pathname + ".profile.gz")
            stats = pstats.Stats(marshal.loads(profile_raw))

            # key examples: datacenter/sfo ; loggedin/True ; servlet/home
            tags = make_all_matching_tags(ranger_data)

            # generate all 7-ary, ... 1-ary, 0-ary matching paths
            # 3-ary example: http_method.GET,servlet.biz_details,site.main
            for path in batch_process_paths(tags):
                yield path, {
                    "ranger": [ranger_data],
                    "cprofile": encode_stats(stats),
                }
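The body of `batch_process_paths` is not shown. One plausible reading of "all 7-ary, ... 1-ary, 0-ary matching paths" is every subset of a request's tags, which `itertools.combinations` can enumerate. The sketch below follows that reading; the function name `batch_paths` and the use of an empty string for the 0-ary path are assumptions.

```python
from itertools import combinations


def batch_paths(tags):
    """Yield every subset of tags as a sorted, comma-joined path.

    'servlet/home' becomes 'servlet.home'; the 0-ary path is the empty
    string and stands for "all requests".
    """
    parts = sorted(tag.replace("/", ".", 1) for tag in tags)
    for size in range(len(parts), -1, -1):
        for combo in combinations(parts, size):
            yield ",".join(combo)


paths = list(batch_paths(["site/main", "servlet/biz_details", "http_method/GET"]))
for path in paths:
    print(repr(path))
```

With n tags this yields 2**n paths per request (128 for seven tags), so each profile is counted toward every aggregate it belongs to.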
![Page 47: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/47.jpg)
![Page 48: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/48.jpg)
Aggregate into Multiple Requests
• Generating all batch paths is messy
  – First version looked like this…
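The mapper comments describe generating every 0-ary through 7-ary tag path for a profile. A minimal sketch of that idea, assuming tags are `key.value` strings joined by commas in sorted order (as in the 3-ary example on the mapper slide), could use `itertools.combinations`. The name `all_tag_paths` is hypothetical and is not the talk's `batch_process_paths`.

```python
from itertools import combinations


def all_tag_paths(tags):
    """Yield one comma-joined path per subset of tags, from 0-ary up to n-ary.

    `tags` maps attribute names to values, e.g. {"servlet": "biz_details"}.
    This is an illustrative stand-in, not the implementation from the talk.
    """
    # Sort so that the same subset always produces the same path string.
    parts = sorted("%s.%s" % (key, value) for key, value in tags.items())
    for size in range(len(parts) + 1):
        for subset in combinations(parts, size):
            yield ",".join(subset)


paths = list(all_tag_paths({
    "http_method": "GET",
    "servlet": "biz_details",
    "site": "main",
}))
```

Three attributes give 2^3 = 8 paths, including the empty 0-ary path. Seven attributes would give 2^7 = 128 paths per profile under the same scheme.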
![Page 49: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/49.jpg)
Aggregate into Multiple Requests

```python
class MRCprofileCombine(MRJob):

    def reducer(self, path_key, entries):
        combo_pstats = None
        combo_ranger = []

        # Loop over every set of profiles (1 or >1) given
        for entry in entries:
            # Add cProfile data together
            if combo_pstats:
                combo_pstats.add(decode_stats(entry["cprofile"]))
            else:
                combo_pstats = decode_stats(entry["cprofile"])
            # Add logging data together
            combo_ranger.append(entry["ranger"])

        # See next slide
```
![Page 53: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/53.jpg)
Aggregate into Multiple Requests

```python
class MRCprofileCombine(MRJob):

    def reducer(self, path_key, entries):
        # See previous slide

        # ex: data/cprofile-processed/batch/2014/10/01/
        #     date.2014-10-01,http_method.GET,servlet.biz_details,site.main
        pathname = batch_path(combo_ranger)

        # save combined cprofile and logging data
        bucket.set_object_gz(pathname + ".profile.gz",
                             marshal.dumps(combo_pstats.stats)
                             )
        bucket.set_object_gz(pathname + ".ranger.gz",
                             write_ranger_line(combo_ranger)
                             )
        yield pathname, len(combo_ranger)
```
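The reducer relies on `pstats.Stats.add` to merge profiles. As a small self-contained illustration of that merge step, the sketch below profiles a toy function twice with the standard library and combines the results. The functions `helper`, `handle_request`, `profile_call`, and `call_count` are hypothetical stand-ins, not code from the talk.

```python
import cProfile
import pstats


def helper():
    return None


def handle_request(n):
    # Toy "request handler" that calls helper() n times.
    for _ in range(n):
        helper()


def profile_call(func, *args):
    """Run func under cProfile and return its statistics."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    return pstats.Stats(profiler)


def call_count(stats, function_name):
    """Total number of calls recorded for functions with this name."""
    return sum(
        nc
        for (_file, _line, name), (_cc, nc, _tt, _ct, _callers)
        in stats.stats.items()
        if name == function_name
    )


combined = profile_call(handle_request, 3)
combined.add(profile_call(handle_request, 5))  # merge the second profile in
helper_calls = call_count(combined, "helper")
```

After the merge, call counts and timings for the same function are summed, which is what makes one combined profile per tag path meaningful.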
![Page 56: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/56.jpg)
System Diagram Redux
[Diagram; labels recovered from the slide:]
• Web Request: "Profile Me Maybe?"
• Scribe to S3
• Nightly MRJob: Upload and tag
• Nightly MRJob: Aggregate records
• Ad-hoc MRJob: N-day aggregate
• Amazon S3: discrete profiles, logs; per-attribute tags
• Amazon S3: combined profiles, logs; per-attribute tags
• E-mail notify
• EMR (shown three times)
• Profilistic service: "Hi, Daurius!"
![Page 57: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/57.jpg)
Aggregate into Multiple Requests
• We have, for every possible combination:
  – A combined set of profile statistics
  – A combined set of logging data
• Ex: examining {servlet: biz_details}
  – user logged in; long run
  – DC: east; user logged in; long run
  – DC: east; HTTP: POST; user logged in; long run
  – DC: east; HTTP: POST; site: main; logged in; long run
![Page 58: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/58.jpg)
Diff Based on Call Count (n~1,000)

```
ztm@dev7-devb:~$ diff_pstats -s calls several_months_ago.profile recently.profile
SORTING BY DELTA IN calls                              BEFORE   AFTER   DELTA
yelp/util/request_bucketer/bucketer.py:<lambda>           485    3284    2798
...site-packages/staticconf/proxy.py:method              1967    3524    1557
yelp/util/experiments.py:<genexpr>                        231    1620    1389
...site-packages/simplejson/encoder.py:iterencode           0    1189    1189
yelp/core/encapsulation.py:__new__                          0    1062    1062
```

Diff Based on Cumulative Runtime (n~1,000)

```
ztm@dev7-devb:~$ diff_pstats -s cum several_months_ago.profile recently.profile
SORTING BY DELTA IN cum                                  BEFORE     AFTER     DELTA
yelp/wsgi/tweens.py:tween                              1.352045  5.487169  4.135124
yelp/web/gatekeeper/check.py:_handle                   0.000000  1.378666  1.378666
yelp/web/emergency_captcha.py:_handle                  0.000000  1.378226  1.378226
yelp/util/cheetah/filters.py:markup_filter             0.000000  0.101759  0.101759
yelp/logic/decorators.py:wrapper                       0.233577  0.321657  0.088080
yelp/logic/experiments.py:experiments_for_yuv          0.034188  0.120480  0.086293
yelp/util/request_bucketer/bucketer.py:get_bucket      0.049993  0.135661  0.085668
```
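The `diff_pstats` tool itself is not shown in the slides. A rough sketch of how such a diff could be computed from two `pstats.Stats` objects is given below; `diff_stats`, `profile_call`, and the toy functions are hypothetical names, and the field indices follow the standard `pstats` tuple layout `(primitive calls, total calls, total time, cumulative time, callers)`.

```python
import cProfile
import pstats

# Index of each sortable field inside a pstats entry.
FIELD_INDEX = {"calls": 1, "tottime": 2, "cum": 3}
EMPTY_ENTRY = (0, 0, 0.0, 0.0, {})


def diff_stats(before, after, field="calls"):
    """Return rows (name, before, after, delta), largest delta first."""
    index = FIELD_INDEX[field]
    rows = []
    for key in set(before.stats) | set(after.stats):
        old = before.stats.get(key, EMPTY_ENTRY)[index]
        new = after.stats.get(key, EMPTY_ENTRY)[index]
        filename, _lineno, funcname = key
        rows.append(("%s:%s" % (filename, funcname), old, new, new - old))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def helper():
    return None


def handle_request(n):
    for _ in range(n):
        helper()


def profile_call(func, *args):
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    return pstats.Stats(profiler)


rows = diff_stats(profile_call(handle_request, 3),
                  profile_call(handle_request, 10),
                  field="calls")
helper_row = [row for row in rows if row[0].endswith(":helper")][0]
```

Functions that are absent from one profile are treated as zero, which matches the `0` and `0.000000` BEFORE values in the output above.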
![Page 59: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/59.jpg)
Storage Considerations, Per Day
• 152,814 discrete profile/log records
• 40,537 aggregate combinations (0-ary to 7-ary)
• 386,707 total files created
• 62.25 GB storage space used (all gzipped)
  – 40.99 GB on aggregate profiles (without logs)
  – 21.28 GB on individual profiles/logs
![Page 60: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/60.jpg)
Performance Considerations
![Page 61: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/61.jpg)
Performance Considerations
• Amazon Elastic MapReduce: all units of work should take equal time
• This is not the case for our aggregations!
  – 60% of aggregations: 10 or fewer profiles
  – 95% of aggregations: 1,000 or fewer profiles
  – 8 aggregations: over 100,000 profiles
![Page 62: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/62.jpg)
Ooh! It’s my time to shine!
Remember Ease of Use? …Remember Daurius?
![Page 63: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/63.jpg)
![Page 64: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/64.jpg)
![Page 65: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/65.jpg)
![Page 66: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/66.jpg)
![Page 67: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/67.jpg)
Roadmap
1. Why profile your code? 2. Create and analyze profiles 3. Acquire profiles from your webapp 4. Search and sort profiles 5. Aggregate similar profiles together 6. Search, sort, aggregate in real time 7. Future work, extensions, and possibilities
![Page 68: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/68.jpg)
Enter… the Storm!
• Apache Storm
  – Real-time distributed computation platform
  – Directed graph of processing steps (tuples)
• Spouts: sources of data, like Scribe!
• Bolts: processors of data, like MRJob!
• Groupings: define how tuples move between…
![Page 69: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/69.jpg)
Pyleus: A Python Framework for Storm Topologies
• Pyleus: Yelp’s super new Python Storm bindings
  – Now open sourced! http://pyleus.org
• Build topologies in Python
  – Declaratively describe structure in YAML
• Respects requirements.txt
• Compose a topology from Python packaged components!
![Page 70: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/70.jpg)
Sample Pyleus Topology

```yaml
name: profilistic
workers: 3
topology:

    - spout:
        name: cprofile-sfo
        module: yelp_pyleus_util.scribe_spout
        options:
            scribe_host: 10.10.10.10
            stream: cprofile

    - spout:
        name: cprofile-iad
        module: yelp_pyleus_util.scribe_spout
        options:
            scribe_host: 10.20.10.10
            stream: cprofile
```
![Page 74: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/74.jpg)
Sample Pyleus Topology

```yaml
    - bolt:  # equivalent of first mapper
        name: process-ranger
        module: profilistic.storm.process_ranger
        groupings:
            - shuffle_grouping: cprofile-sfo
            - shuffle_grouping: cprofile-iad

    - bolt:  # equiv. of first reducer, plus an S3 cache
        name: update-tag
        module: profilistic.storm.update_tag
        tasks: 6
        parallelism_hint: 3
        groupings:
            - fields_grouping:
                component: process-ranger
                fields:
                    - tag
```
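The topology uses a shuffle grouping into `process-ranger` and a fields grouping on `tag` into `update-tag`. The toy sketch below illustrates why that matters: with a fields grouping, every tuple carrying the same tag reaches the same task, so one task can own the running aggregate for that tag. This is only a simulation with hypothetical function names. Storm's real shuffle grouping distributes tuples randomly, and its hashing is not `crc32`.

```python
import zlib


def shuffle_task(counter, num_tasks):
    """Round-robin stand-in for a shuffle grouping: any task may get any tuple."""
    return counter % num_tasks


def fields_task(tag, num_tasks):
    """Stand-in for a fields grouping: equal tags always map to the same task."""
    return zlib.crc32(tag.encode("utf-8")) % num_tasks


NUM_TASKS = 6  # matches `tasks: 6` on the update-tag bolt

tags = ["servlet.home", "datacenter.sfo", "servlet.home",
        "loggedin.True", "servlet.home"]

# The same incoming stream, routed both ways.
shuffled = [shuffle_task(i, NUM_TASKS) for i in range(len(tags))]
by_field = [fields_task(tag, NUM_TASKS) for tag in tags]

# Every "servlet.home" tuple lands on a single task under fields grouping.
home_tasks = {task for tag, task in zip(tags, by_field) if tag == "servlet.home"}
```

This mirrors the MapReduce analogy in the YAML comments: the fields grouping plays the role of the shuffle-by-key step that precedes a reducer.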
![Page 77: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/77.jpg)
Sample Pyleus Bolt

```python
class MyFirstBolt(pyleus.storm.SimpleBolt):

    def initialize(self):
        # set up any persistent config resources
        staticconf.YamlConfiguration( ... )
        self.bucket = s3.get_bucket( ... )

    def process_tuple(self, tup):
        key, value = tup
        # do stuff here!

        new_tup = (new_key, new_value)
        self.emit(new_tup)

if __name__ == '__main__':
    MyFirstBolt.run()
```
![Page 81: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/81.jpg)
Profilistic in Pyleus
• Profiles used to be one day delayed
  – Or, in emergencies, an ad hoc midday batch run
• Now, ~10 minutes after bad performance…
• You can investigate!
![Page 82: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/82.jpg)
Roadmap
1. Why profile your code? 2. Create and analyze profiles 3. Acquire profiles from your webapp 4. Search and sort profiles 5. Aggregate similar profiles together 6. Search, sort, aggregate in real time 7. Future work, extensions, and possibilities
![Page 83: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/83.jpg)
Future Work
1. Active monitoring
• For every new aggregation created each day
  – Pull the same aggregation from 1 day, 1 week ago
  – DIFF them!
  – If the delta is too big, send an alert or an e-mail
• Easy add-on to end of Pyleus topology
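One way the "if the delta is too big" check could look is sketched below. It compares per-function cumulative times from two aggregations and flags large relative increases. The function name `regressions`, the 50% threshold, and the minimum-time cutoff are all assumptions for illustration. The sample numbers are rounded from the cumulative-runtime diff shown earlier.

```python
def regressions(previous, current, threshold=0.5, min_seconds=0.01):
    """Return (name, before, after) rows for functions that got much slower.

    A function is flagged when it is new, or when its cumulative time grew
    by more than `threshold` as a fraction of the earlier value.
    """
    flagged = []
    for name, after in current.items():
        before = previous.get(name, 0.0)
        if after < min_seconds:
            continue  # ignore functions too cheap to matter
        if before == 0.0 or (after - before) / before > threshold:
            flagged.append((name, before, after))
    # Largest absolute slowdown first.
    return sorted(flagged, key=lambda row: row[2] - row[1], reverse=True)


last_week = {"tweens.py:tween": 1.35, "decorators.py:wrapper": 0.23}
today = {
    "tweens.py:tween": 5.49,
    "decorators.py:wrapper": 0.32,
    "check.py:_handle": 1.38,
}
alerts = regressions(last_week, today)
```

A non-empty `alerts` list would be the trigger for the alert or e-mail step; how that notification is sent is left out here.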
![Page 84: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/84.jpg)
Future Work
2. Visualization within the webapp
• Already possible ad hoc: graphviz files to PDF
• Most recent Yelp hackathon (F’14): Someone built this!
![Page 85: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/85.jpg)
How do I… DIY?
1. Wrap the webapp in a context manager
2. Save profiles into the cloud
3. Tag profiles with attributes
4. Combine profiles based on attributes
5. Build quick-’n-dirty internal app to search/filter
6. Refactor it all into Storm?
7. Give the hedgehog a hug!
I believe in you! ♥
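For step 1, a minimal sketch of a sampling profiler context manager built on the standard library might look like this. The name `maybe_profile`, the `sample_rate` parameter, and the `sink` callback are hypothetical. In a real deployment the sink would ship the bytes to a log pipeline or to cloud storage (step 2) rather than append them to a list.

```python
import cProfile
import contextlib
import marshal
import random


@contextlib.contextmanager
def maybe_profile(sample_rate, sink):
    """Profile the wrapped block for roughly `sample_rate` of all requests.

    `sink` is called with the marshalled cProfile stats once the block ends.
    """
    if random.random() >= sample_rate:
        yield None  # this request was not sampled; run it unprofiled
        return
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        profiler.disable()
        profiler.create_stats()  # populates profiler.stats
        sink(marshal.dumps(profiler.stats))


collected = []

# sample_rate=1.0 profiles every request; 0.0 profiles none.
with maybe_profile(1.0, collected.append):
    sum(range(1000))

with maybe_profile(0.0, collected.append):
    sum(range(1000))

stats = marshal.loads(collected[0])
```

Marshalled stats in this form can be read back with `marshal.loads`, which matches how the mapper earlier in the deck decodes stored profiles.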
![Page 86: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/86.jpg)
Yelp Dataset Challenge
Academic dataset from Phoenix, Las Vegas, Madison, Waterloo and Edinburgh!
● 1,125,458 Reviews
● 42,153 Businesses
  ○ 320,002 Business attributes
● 403,210 Tips
● 252,898 Users
  ○ 955,999 Edge social graph
● 31,617 Checkin Sets
Your academic project, research and/or visualizations submitted by December 31, 2014 = $5,000 prize + $1,000 for publication + $500 for presenting*
yelp.com/dataset_challenge
*See full terms on website
![Page 87: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/87.jpg)
Thanks for listening!
Don’t be a stranger! [email protected]
Python MapReduce package: http://mrjob.org
Python Storm package: http://pyleus.org
![Page 88: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014](https://reader033.vdocuments.mx/reader033/viewer/2022060121/559446c01a28ab02738b467f/html5/thumbnails/88.jpg)
Please give us your feedback on this presentation
Join the conversation on Twitter with #reinvent
BDT402