Latency VS Everything
PHP Optimisation & Performance Meetup
at CCM Benchmark, June 9, 2016
Ori Pekelman
I am @OriPekelman everywhere (Twitter/LinkedIn/GitHub)
Sorry, I am going to do the slides in English.
All of this comes from a blog post I am currently writing.
Latency VS Everything
You know the fallacies of distributed computing, right? Allow me, if you please, to add something to the mix.
This is not troll bait, and I truly hope for a civilized conversation, but I will posit from the get-go a provocative statement:
Everything can be traded-off against latency. Given infinite latency you can achieve any desired quality of a distributed system. (duh)
In this talk we will discuss two forms of latencies, the first is the one we usually think about: Run-Time Latency.
Mommy, Daddy, where do execution time latencies come from?
1. Badly implemented tight loops (you've been a bad boy)
2. Tight loops integrating slow IO (mommy told you about co-locating data and processing)
3. The physical limitations on the rotation of magnetic platters
4. The speed of light and entropy
5. Anything you can't parallelize (you have multiple cores, dammit)
As you can guess it is mostly about #6. And you can't do anything about #5.
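To make the "slow IO in a tight loop" item concrete, here is a minimal sketch. `FakeStore`, its `get`/`get_many` methods and the 0.5 ms round-trip cost are assumptions made up for illustration; the store only counts round trips instead of touching a network.

```python
# Assumed cost of one same-datacenter round trip, in nanoseconds.
ROUND_TRIP_NS = 500_000


class FakeStore:
    """Hypothetical remote store that only counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def get(self, key):
        # One round trip per key.
        self.round_trips += 1
        return self.rows[key]

    def get_many(self, keys):
        # One round trip for the whole batch.
        self.round_trips += 1
        return [self.rows[k] for k in keys]


def total_looped(store, keys):
    # Slow IO inside a tight loop: the round trips add up.
    return sum(store.get(k) for k in keys)


def total_batched(store, keys):
    # Same result, but the IO is hoisted out of the loop.
    return sum(store.get_many(keys))


def simulated_latency_ms(store):
    return store.round_trips * ROUND_TRIP_NS / 1_000_000


keys = list(range(1000))
looped_store = FakeStore((k, k * 2) for k in keys)
batched_store = FakeStore((k, k * 2) for k in keys)

looped_total = total_looped(looped_store, keys)
batched_total = total_batched(batched_store, keys)
print(simulated_latency_ms(looped_store), simulated_latency_ms(batched_store))
```

Under these assumptions both functions return the same total, but the looped version pays for 1,000 round trips where the batched version pays for one.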
L1 cache reference .................................... 0.5 ns
Branch mispredict ....................................... 5 ns
L2 cache reference ...................................... 7 ns
Mutex lock/unlock ...................................... 25 ns
Main memory reference ................................. 100 ns
Compress 1K bytes with Zippy ........................ 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network .................. 20,000 ns = 20 µs
SSD random read ................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................ 250,000 ns = 250 µs
Round trip within same datacenter ................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................ 1,000,000 ns = 1 ms
Disk seek ...................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ............... 20,000,000 ns = 20 ms
Send packet CA->India->CA ..................... 250,000,000 ns = 250 ms
Getting Coffee ............................ 300,000,000,000 ns = 300,000 ms
Refactoring slow code .................. 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster ......... 500,000,000,000,000 ns = 500,000,000 ms
Discover you need a new DB .......... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Integrating new DB to your code ..... 8,500,000,000,000,000 ns = 8,500,000,000 ms
Migrating Production cluster ....... 15,000,000,000,000,000 ns = 15,000,000,000 ms
Stuff Hacker News Says Every Developer Should Know
This part usually gets left out.
Remember:
1. Optimize what is slow.
2. You don't need to optimize the coffee thing. On the contrary, take a longer break, and please don't do a tight loop on coffee.
In the real world, when thinking about distributed systems, we are usually more interested in execution time. It's all about achieving a desired state of world under some threshold, some time-out (People are so anxious).
The thing is that in order to shorten latencies in the real world, in execution time, you are going to have to spend code time (and coffee time), which is many orders of magnitude greater.
And usually works at much higher granularity (human brains can only be very poorly sharded, and sharding people may be illegal where you live, check with a local legal expert for advice, IANAL). These are hard problems.
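One way to put numbers on that trade-off is a back-of-the-envelope break-even count. The sketch below is only an illustration: the refactoring cost is the "Refactoring slow code" figure from the latency table, while the 250 ms saved per request (one CA->India->CA round trip avoided) and the idea of simply summing latency across requests are assumptions, not something the talk claims.

```python
import math


def break_even_requests(code_time_ms, saved_ms_per_request):
    """Requests needed before the saved execution time equals the code time spent."""
    if saved_ms_per_request <= 0:
        raise ValueError("the change must save some time per request")
    return math.ceil(code_time_ms / saved_ms_per_request)


REFACTORING_MS = 50_000_000  # "Refactoring slow code" row of the table
SAVED_MS = 250               # assumed: one intercontinental round trip avoided

requests_needed = break_even_requests(REFACTORING_MS, SAVED_MS)
print(requests_needed)
```

With these inputs the refactoring pays for itself after 200,000 requests, which is why it matters to spend code time only on what is actually slow.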
Execution time vs Code time
You can create, quite easily, a system that has constant read-time as long as you accept stale caches.
If you accept those stale caches you can also create a system where writes are mostly constant-time (as long as writes are not required to guarantee perfect consistency and partition tolerance).
It's enough to implement strict CQRS. It's enough to say "All writes get logged, but, very improbably, may not succeed in the sense I have promised you they would".
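A toy sketch of that idea follows, under loud assumptions: `TinyCQRS` and its method names are invented for illustration, everything lives in one process, and "constant time" here means the amortized, average-case cost of a Python list append and a dict lookup. A real system would run the projection step asynchronously.

```python
class TinyCQRS:
    """Hypothetical in-memory split between a write log and a read model."""

    def __init__(self):
        self.log = []         # append-only log of accepted writes
        self.read_model = {}  # denormalized view, allowed to be stale
        self.applied = 0      # how many log entries the view has seen

    def write(self, key, value):
        # Command side: log the write and return, never touch the view.
        self.log.append((key, value))
        return len(self.log)

    def read(self, key, default=None):
        # Query side: a single lookup in a possibly stale view.
        return self.read_model.get(key, default)

    def project(self, max_events=None):
        # Catch the view up with the log (would run in the background).
        end = len(self.log)
        if max_events is not None:
            end = min(end, self.applied + max_events)
        for key, value in self.log[self.applied:end]:
            self.read_model[key] = value
        self.applied = end
        return end


store = TinyCQRS()
store.write("stock:42", 10)
stale = store.read("stock:42")  # the view has not caught up yet
store.project()
fresh = store.read("stock:42")
print(stale, fresh)
```

The read issued right after the write sees the old state; that staleness is the price paid for keeping both paths cheap.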
Constant response time systems
Resolving code-time latency is mostly about not re-solving solved problems (there are so many unsolved ones still left to tackle).
It's hidden behind many layers, but, yes, you, oh lowly developer of something that should be simple, just a small web application: you are tasked every day with resolving these hard theoretical questions, every time anyone says something is "slow".
Solving them is all about not solving them. It is not about optimizing a tight loop to use the L1 cache better.
It is about implementing patterns, using the frameworks and leveraging infrastructure elements to do that.
Solved Problems: This specific thing is about slow IO in a loop.
Because "slow" is always just that: a desired state of the world, with some guarantees of eventual consistency and acceptable levels of latency. Between a MySQL database and the browser rendering a page, between a payment gateway and a bank, between two players frantically hitting their keyboards and an imaginary beast, just eaten or having just eaten.
It’s always about the granularity of your cache, the staleness you can accept on that side… and the minimal required time you can get an async write to finish.
The latter usually being simply a function of how well you can parallelize workers.
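Both knobs can be sketched in a few lines. The names `StaleCache` and `drain_time_ms` are hypothetical, the clock is passed in explicitly to keep the example deterministic, and the drain estimate assumes writes that are fully independent and equally slow, which real workloads rarely are.

```python
import math


class StaleCache:
    """Hypothetical cache that serves entries up to max_age_s seconds old."""

    def __init__(self, loader, max_age_s):
        self.loader = loader
        self.max_age_s = max_age_s
        self.entries = {}  # key -> (value, loaded_at)

    def get(self, key, now):
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] <= self.max_age_s:
            return hit[0]  # stale, but within the accepted bound
        value = self.loader(key)
        self.entries[key] = (value, now)
        return value


def drain_time_ms(pending_writes, write_ms, workers):
    """Idealized time to finish queued writes with parallel workers."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return math.ceil(pending_writes / workers) * write_ms


loads = []


def load(key):
    # Stand-in for the slow source of truth; returns a version number.
    loads.append(key)
    return len(loads)


cache = StaleCache(load, max_age_s=60)
first = cache.get("page", now=0)    # miss: loads version 1
second = cache.get("page", now=30)  # hit: still version 1
third = cache.get("page", now=61)   # too old: loads version 2

print(drain_time_ms(1000, 20, 1), drain_time_ms(1000, 20, 8))
```

In this idealized model, going from one worker to eight takes the queue of 1,000 writes from 20,000 ms down to 2,500 ms.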
If you could have a system that gives you a perfect clone of production for every single pull request and automatically notifies you whenever there is any form of regression,
and then a system that allows you to pinpoint immediately and precisely the pain point … so you only optimize the tight loops that matter… well, that would solve that.
Even Tight Loops Are a Solved Problem
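The kind of check such a system automates can be sketched generically. This is not the API of any real product: `find_regressions`, the endpoint names, the timings and the 10% tolerance are all made up for illustration.

```python
def find_regressions(baseline_ms, candidate_ms, tolerance=0.10):
    """List endpoints that got slower than the baseline by more than tolerance.

    Returns (name, before, after) tuples, worst slowdown first.
    """
    found = []
    for name, before in baseline_ms.items():
        after = candidate_ms.get(name)
        if after is not None and after > before * (1 + tolerance):
            found.append((name, before, after))
    # Sort by slowdown ratio so the biggest pain point comes first.
    found.sort(key=lambda item: item[2] / item[1], reverse=True)
    return found


# Hypothetical timings: production baseline vs the pull-request clone.
baseline = {"/": 40.0, "/search": 120.0, "/checkout": 200.0}
candidate = {"/": 41.0, "/search": 300.0, "/checkout": 260.0}

regressions = find_regressions(baseline, candidate)
for name, before, after in regressions:
    print(f"{name}: {before} ms -> {after} ms")
```

Only the endpoints that regressed past the tolerance are reported, which is what lets you skip the tight loops that do not matter.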
(Open Parenthesis….
Platform.sh + blackfire.io offer that capability :)
...close Parenthesis)
Constant response time systems are a solved problem. But you will have to produce a bunch of code to make that happen in your use case. Some solved problems still require code (sometimes a lot of it).
Some can be simply and elegantly abstracted away. Anything that can be resolved at the infrastructure level should be. And everything at the infrastructure level can and should be automated.
L1 cache reference .................................... 0.5 ns
Branch mispredict ....................................... 5 ns
L2 cache reference ...................................... 7 ns
Mutex lock/unlock ...................................... 25 ns
Main memory reference ................................. 100 ns
Compress 1K bytes with Zippy ........................ 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network .................. 20,000 ns = 20 µs
SSD random read ................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................ 250,000 ns = 250 µs
Round trip within same datacenter ................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................ 1,000,000 ns = 1 ms
Disk seek ...................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ............... 20,000,000 ns = 20 ms
Send packet CA->India->CA ..................... 250,000,000 ns = 250 ms
Getting Coffee ............................ 300,000,000,000 ns = 300,000 ms
Refactoring slow code .................. 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster ......... 500,000,000,000,000 ns = 500,000,000 ms
Discover you need a new DB .......... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Implementing CQRS ................... 8,500,000,000,000,000 ns = 8,500,000,000 ms
Migrating Production cluster ....... 15,000,000,000,000,000 ns = 15,000,000,000 ms
You do this.
Because platform.sh can take this.
L1 cache reference .................................... 0.5 ns
Branch mispredict ....................................... 5 ns
L2 cache reference ...................................... 7 ns
Mutex lock/unlock ...................................... 25 ns
Main memory reference ................................. 100 ns
Compress 1K bytes with Zippy ........................ 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network .................. 20,000 ns = 20 µs
SSD random read ................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................ 250,000 ns = 250 µs
Round trip within same datacenter ................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................ 1,000,000 ns = 1 ms
Disk seek ...................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ............... 20,000,000 ns = 20 ms
Send packet CA->India->CA ..................... 250,000,000 ns = 250 ms
Getting Coffee ............................ 300,000,000,000 ns = 300,000 ms
Refactoring slow code .................. 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster .............. 50,000,000,000 ns = 50,000 ms
Discover you need a new DB .......... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Implementing CQRS ....................... 5,000,000,000,000 ns = 5,000,000 ms
Migrating Production cluster ............... 50,000,000,000 ns = 50,000 ms
And make it into this!
Complete development to production lifecycle
Opinionated but flexible, integrates with any toolchain, any workflow
Git driven infrastructure orchestration
Automated no-risk deployments
On-the-fly cloning of production into staging clusters in less than a minute
Zero admin chores: it’s not DevOps, it’s NoOps
Dynamic infrastructures, High-Availability, Elastic Scaling and managed integrated caches are solved problems; so is automated performance testing.
Don’t solve solved problems.
Go implement CQRS.
Because we can do this.
Enterprise grade production
Best PHP PaaS out there. Powers Magento Cloud. Default Symfony deployment option.
Multi-Cloud, highly available multi-datacenter PaaS with zero-downtime scaling and 99.99% SLAs
Entire infrastructure management - web servers, databases, search-engines, caches, message queues…
Secure, stable, scalable horizontally and vertically
Fine-grain access controls for each environment
Platform.sh : built for better productivity
Unlimited concurrent staging environments eliminate QA bottlenecks and allow for continuous deployment.
Testing each feature in perfect isolation is how agile was supposed to be and, for the first time, can be
Fast on-boarding of new developers increases flexibility and empowers remote work
20-40% better developer productivity
90% Less Ops/DevOps effort
40% faster User Acceptance Testing
Second Generation PaaS Built on bleeding edge technologies
Powered by a high-density micro-container grid
Unique consensus based orchestration layer
Unique cluster cloning technology
Unique git-powered service topology technology
Replicated redundant storage grid
High availability network overlay
Micro-container architecture
Platform.sh is a second-generation PaaS
Batteries included
Unlike all other PaaS systems, no add-ons required: internally manages MySQL, Postgres, MongoDB, Solr, ElasticSearch, Redis, RabbitMQ and more (included in the price).
Built for scalable modern web apps
Full stack infrastructure management with micro-services support and managed CDN
Integrated
Fully automatable by third party tools on every aspect
Ori Pekelman, Product Marketing & Evangelist
Fred Plais, CEO
Damien Tournoud, CTO
Sylvie Georgeault, CFO
Kieron Sambrook-Smith, Chief Commercial Officer
Doug Goldberg, VP Sales, NA
Rob Douglass, VP Customer Success
Management team
Headquartered in Paris, with staff on the East Coast, West Coast, Canada, France, Germany, UK and more.
Subscriptions from $10 to $50k per month
Global 24/7/365 support
Customers in 104+ countries
2,000+ customers, strong acceleration in Q4 2015
Comprehensive Offering Since Q3 2014
Key growth metrics
Commercially successful
Many global brands with multi-year contracts, thousands of self-service clients
Testimonials
“Platform.sh has reduced our hosting costs by over 60% but with a faster customer experience through a cutting edge hosting stack. It has become the cornerstone of our product development lifecycle, saving time and money at every step. I just can’t imagine working without it.”
Peter Ward, Reiss, a leading UK fashion brand
Strategic Symfony partnership
Horizon 2020 Winner
Top EU Innovation Grant: €2m
Best Horizontal Cloud Platform EuroCloud 2015
“European cloud leadership is being born before our very own eyes”, JDN
“Britain’s next $ Billion company”, Silicon Valley Comes to UK
Strategic Atlassian partnership
Awards & Recognition
VC backed award winning startup with a global reach
Our Product Offering
Self-service Hosting: $10 to $300 / month
Web Agency Plan, Partners (Symfony, Atlassian)
Enterprise Grade hosting: $800 to $15k / month
Multi-Cloud: Amazon EC2, UpCloud, Orange Business Services, On Premise (VMWare or OpenStack)
Managed Private Cloud Region: > $15k / month
White-Label offering with automated single tenant SaaS
All of the product offerings are based on a single technical stack
Focus On Mass market PHP: Drupal, Symfony, Magento, WordPress
Soft launch on NodeJS
Roadmap
We have runtimes ready for Java, Ruby and Python (.Net is in the works)
More cloud targets (Azure, Google in discussions)
Work on more on-premise targets
THANKS.