latency vs everything

Latency VS Everything

PHP Optimisation & Performance Meetup

at CCM Benchmark, June 9, 2016

Ori Pekelman

I am @OriPekelman everywhere (Twitter/LinkedIn/GitHub)

Sorry, I am going to do the slides in English.

All of this comes from a blog post I am currently writing.


You know the fallacies of distributed computing, right? Allow me, if you please, to add something to the mix.

This is not troll bait, and I truly hope for a civilized conversation, but I will posit from the get-go a provocative statement:

Everything can be traded-off against latency. Given infinite latency you can achieve any desired quality of a distributed system. (duh)



In this talk we will discuss two forms of latency. The first is the one we usually think about: Run-Time Latency.

Mommy, Daddy, where do execution time latencies come from?

1. Badly implemented tight loops (you've been a bad boy)
2. Tight loops integrating slow IO (mommy told you about co-locating data and processing)
3. The physical limitations on the rotation of magnetic platters
4. The speed of light and entropy
5. Anything you can't parallelize (you have multiple cores, dammit)

As you can guess it is mostly about #6. And you can't do anything about #5.
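Item 2 in the list above (tight loops integrating slow IO) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `FakeDatabase` is a hypothetical stand-in that only counts round trips, and the 0.5 ms cost per round trip is the "round trip within same datacenter" figure from the latency table in this talk.

```python
class FakeDatabase:
    """Hypothetical stand-in: counts round trips instead of doing real IO."""

    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def fetch_one(self, key):
        # One network round trip per key.
        self.round_trips += 1
        return self._rows[key]

    def fetch_many(self, keys):
        # One network round trip for the whole batch.
        self.round_trips += 1
        return [self._rows[key] for key in keys]


ROUND_TRIP_MS = 0.5  # assumed cost of one same-datacenter round trip

db = FakeDatabase({i: f"user-{i}" for i in range(1000)})

# Slow IO inside a tight loop: 1000 round trips.
slow = [db.fetch_one(i) for i in range(1000)]
slow_trips = db.round_trips

# Same data fetched in one batch: 1 round trip.
db.round_trips = 0
fast = db.fetch_many(range(1000))
fast_trips = db.round_trips

slow_ms = slow_trips * ROUND_TRIP_MS
fast_ms = fast_trips * ROUND_TRIP_MS
```

Both variants return the same rows; only the number of round trips differs, and under the assumed cost that is the whole difference between roughly half a second and half a millisecond.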

L1 cache reference ..................................... 0.5 ns
Branch mispredict ........................................ 5 ns
L2 cache reference ....................................... 7 ns
Mutex lock/unlock ....................................... 25 ns
Main memory reference .................................. 100 ns
Compress 1K bytes with Zippy ......................... 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network ................... 20,000 ns = 20 µs
SSD random read .................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................. 250,000 ns = 250 µs
Round trip within same datacenter .................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................. 1,000,000 ns = 1 ms
Disk seek ....................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ................ 20,000,000 ns = 20 ms
Send packet CA->India->CA ...................... 250,000,000 ns = 250 ms
Getting Coffee ............................. 300,000,000,000 ns = 300,000 ms
Refactoring slow code ................... 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster .......... 500,000,000,000,000 ns = 500,000,000 ms
Discover you need a new DB ........... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Integrating new DB to your code ...... 8,500,000,000,000,000 ns = 8,500,000,000 ms
Migrating Production cluster ........ 15,000,000,000,000,000 ns = 15,000,000,000 ms

Stuff Hacker News Says Every Developer Should Know


This part usually gets left out.

Remember:

1. Optimize what is slow.
2. You don't need to optimize the coffee thing; on the contrary, take a longer break, and please don't do a tight loop on coffee.

In the real world, when thinking about distributed systems, we are usually more interested in execution time. It's all about achieving a desired state of the world under some threshold, some time-out (people are so anxious).

The thing is that in order to shorten latencies in the real world, in execution time, you are going to have to spend code time (and coffee time), which is many orders of magnitude greater.

And code time usually works at a much coarser granularity (human brains can only be very poorly sharded, and sharding people may be illegal where you live; check with a local legal expert for advice, IANAL). These are hard problems.

Execution time vs Code time
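The trade-off between execution time and code time can be made concrete with a little arithmetic on the table's own numbers. The sketch below is only an illustration: the function name `break_even_requests` is hypothetical, and the scenario (a refactoring that saves exactly one disk seek per request) is an assumption, not something measured.

```python
REFACTORING_NS = 50_000_000_000_000  # "Refactoring slow code" row of the table
DISK_SEEK_NS = 10_000_000            # "Disk seek" row of the table


def break_even_requests(code_time_ns, saved_ns_per_request):
    """Requests needed before the run time saved equals the code time spent."""
    # Integer ceiling division, so a partial request counts as a full one.
    return -(-code_time_ns // saved_ns_per_request)


# Assumed scenario: the refactoring removes one disk seek from every request.
requests_needed = break_even_requests(REFACTORING_NS, DISK_SEEK_NS)
```

Under these assumptions the refactoring pays for itself after five million requests, which is one way to decide whether a given piece of slow code is worth the code time.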

You can create, quite easily, a system that has constant read-time as long as you accept stale caches.

If you accept those stale caches you can also create a system where writes are mostly constant-time (as long as writes are not required to guarantee perfect consistency and partition tolerance).

It's enough to implement strict CQRS. It's enough to say: "All writes get logged, but may not succeed in the very improbable sense I have promised you they would."

Constant response time systems
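A minimal Python sketch of such a constant response time system, assuming a single process and an in-memory store. The class `StaleCacheStore` and its method names are hypothetical; a real CQRS setup would use a durable log and separate read and write services.

```python
from collections import deque


class StaleCacheStore:
    """Reads hit a possibly stale snapshot; writes are only appended to a log."""

    def __init__(self):
        self._read_model = {}      # snapshot served to readers, may be stale
        self._write_log = deque()  # accepted commands that are not applied yet

    def read(self, key, default=None):
        # Constant-time lookup that never waits for pending writes.
        return self._read_model.get(key, default)

    def write(self, key, value):
        # Constant-time append: the write is logged, not applied.
        self._write_log.append((key, value))

    def apply_pending(self):
        # Meant to run later in a background worker; returns the number applied.
        applied = 0
        while self._write_log:
            key, value = self._write_log.popleft()
            self._read_model[key] = value
            applied += 1
        return applied


store = StaleCacheStore()
store.write("price", 10)
stale = store.read("price")   # the write is logged but not visible yet
applied = store.apply_pending()
fresh = store.read("price")   # visible once the log has been applied
```

The read between `write` and `apply_pending` returns the old state, which is exactly the staleness the slide asks you to accept in exchange for constant-time reads and writes.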

Reducing code-time latency is mostly about not solving solved problems (there are so many unsolved ones left to do).

It's hidden behind many layers, but, yes, you, oh lowly developer of something that should be simple, just a small web application: you are tasked every day with resolving these hard theoretical questions, every time anyone says something is "slow".

Solving them is all about not solving them. It is not about optimizing a tight loop to use the L1 cache better.

It is about implementing patterns, using the frameworks and leveraging infrastructure elements to do that.


Solved Problems: This specific thing is about slow IO in a loop.

Because "slow" is always just that: a desired state of the world, with some guarantees of eventual consistency and acceptable levels of latency. Between a MySQL database and the browser rendering a page, between a payment gateway and a bank, between two players frantically hitting their keyboards and an imaginary beast, just eaten, or having just eaten.

It’s always about the granularity of your cache, the staleness you can accept on that side… and the minimal required time you can get an async write to finish.

The latter usually being simply a function of how well you can parallelize workers.
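The last point, that the time for async writes to finish depends on how well the workers parallelize, can be sketched with a simple model. It assumes every write is independent and takes the same time, which real workloads rarely satisfy; `drain_time_ms` and its parameters are hypothetical names.

```python
import math


def drain_time_ms(pending_writes, workers, write_latency_ms):
    """Estimated time to finish all pending writes with a pool of equal workers."""
    if workers < 1:
        raise ValueError("at least one worker is required")
    # Each worker handles one write at a time, so the writes finish in rounds.
    rounds = math.ceil(pending_writes / workers)
    return rounds * write_latency_ms


one_worker = drain_time_ms(100, workers=1, write_latency_ms=10)
ten_workers = drain_time_ms(100, workers=10, write_latency_ms=10)
uneven = drain_time_ms(101, workers=10, write_latency_ms=10)
```

In this model ten workers cut the drain time by a factor of ten, and a single extra write costs one more full round.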


If you could have a system that gives you a perfect clone of production for every single pull request and automatically notifies you whenever there is any form of regression.

Then have a system that allows you to pinpoint the pain point immediately and precisely … so you only optimize the tight loops that matter… well, that would solve that.

Even Tight Loops Are a Solved Problem
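As a rough illustration of the automatic regression notification described above, here is a minimal Python sketch that compares timings from a pull-request environment against a production baseline. It is not the API of any real profiling or hosting tool: `find_regressions`, the endpoint names, the timings and the 10% tolerance are all assumptions.

```python
def find_regressions(baseline_ms, candidate_ms, tolerance=0.10):
    """Return endpoints whose timing grew by more than `tolerance` (a fraction)."""
    regressions = {}
    for name, before in baseline_ms.items():
        after = candidate_ms.get(name)
        if after is None:
            continue  # endpoint not measured in the candidate build
        if after > before * (1 + tolerance):
            regressions[name] = (before, after)
    return regressions


# Hypothetical timings: production baseline vs. a pull-request clone.
baseline = {"/home": 100.0, "/cart": 200.0}
pull_request = {"/home": 105.0, "/cart": 260.0}

regressions = find_regressions(baseline, pull_request)
```

Only `/cart` is reported here, since `/home` stays within the assumed tolerance; that is the point of the slide, you only look at the tight loops that matter.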

(Open Parenthesis….


Platform.sh + blackfire.io offer that capability :)

...close Parenthesis)

CQRS is a solved problem. But you will have to produce a bunch of code to make that happen in your use case. Some solved problems still require code (sometimes much of it).

Some can be simply and elegantly abstracted away. Anything that can be resolved on the infrastructure level should be. And everything on the infrastructure level can and should be automated.


L1 cache reference ..................................... 0.5 ns
Branch mispredict ........................................ 5 ns
L2 cache reference ....................................... 7 ns
Mutex lock/unlock ....................................... 25 ns
Main memory reference .................................. 100 ns
Compress 1K bytes with Zippy ......................... 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network ................... 20,000 ns = 20 µs
SSD random read .................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................. 250,000 ns = 250 µs
Round trip within same datacenter .................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................. 1,000,000 ns = 1 ms
Disk seek ....................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ................ 20,000,000 ns = 20 ms
Send packet CA->India->CA ...................... 250,000,000 ns = 250 ms
Getting Coffee ............................. 300,000,000,000 ns = 300,000 ms
Refactoring slow code ................... 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster .......... 500,000,000,000,000 ns = 500,000,000 ms
Discover you need a new DB ........... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Implementing CQRS .................... 8,500,000,000,000,000 ns = 8,500,000,000 ms
Migrating Production cluster ........ 15,000,000,000,000,000 ns = 15,000,000,000 ms

You do this.


Because platform.sh can take this.

L1 cache reference ..................................... 0.5 ns
Branch mispredict ........................................ 5 ns
L2 cache reference ....................................... 7 ns
Mutex lock/unlock ....................................... 25 ns
Main memory reference .................................. 100 ns
Compress 1K bytes with Zippy ......................... 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network ................... 20,000 ns = 20 µs
SSD random read .................................... 150,000 ns = 150 µs
Read 1 MB sequentially from memory ................. 250,000 ns = 250 µs
Round trip within same datacenter .................. 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ................. 1,000,000 ns = 1 ms
Disk seek ....................................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk ................ 20,000,000 ns = 20 ms
Send packet CA->India->CA ...................... 250,000,000 ns = 250 ms
Getting Coffee ............................. 300,000,000,000 ns = 300,000 ms
Refactoring slow code ................... 50,000,000,000,000 ns = 50,000,000 ms
Setting up a new test cluster ............... 50,000,000,000 ns = 50,000 ms
Discover you need a new DB ........... 1,500,000,000,000,000 ns = 1,500,000,000 ms
Implementing CQRS ........................ 5,000,000,000,000 ns = 5,000,000 ms
Migrating Production cluster ................ 50,000,000,000 ns = 50,000 ms

And make it into this!

Complete development to production lifecycle

Opinionated but flexible, integrates with any toolchain, any workflow

Git driven infrastructure orchestration

Automated no-risk deployments

On-the-fly cloning of production into staging clusters in less than a minute

Zero admin chores: it's not DevOps, it's NoOps

Dynamic infrastructures, high availability, elastic scaling and managed integrated caches are solved problems. So is automated performance testing.

Don’t solve solved problems.

Go implement CQRS.

Because we can do this.

Enterprise grade production

Best PHP PaaS out there. Powers Magento Cloud. Default Symfony deployment option.

Multi-Cloud, highly available multi-datacenter PaaS with zero-downtime scaling and 99.99% SLAs

Entire infrastructure management - web servers, databases, search-engines, caches, message queues…

Secure, stable, scalable horizontally and vertically

Fine-grain access controls for each environment

Platform.sh : built for better productivity

Unlimited concurrent staging environments eliminate QA bottlenecks and allow for continuous deployments.

Testing each feature in perfect isolation is how agile was supposed to be and, for the first time, can be.

Fast on-boarding of new developers increases flexibility and empowers remote work

20-40% better developer productivity

90% Less Ops/DevOps effort

40% faster User Acceptance Testing

Second Generation PaaS: built on bleeding-edge technologies

Powered by a high-density micro-container grid

Unique consensus based orchestration layer

Unique cluster cloning technology

Unique git-powered service topology technology

Replicated redundant storage grid

High availability network overlay

Micro-container architecture

Platform.sh is a second generation PaaS

Batteries included: Unlike all other PaaS systems, no add-ons required: internally manages MySQL, Postgres, MongoDB, Solr, ElasticSearch, Redis, RabbitMQ and more (included in the price).

Built for scalable modern web apps: Full stack infrastructure management with micro-services support and managed CDN

Integrated: Fully automatable by third party tools on every aspect

Ori Pekelman, Product Marketing & Evangelist

Fred Plais, CEO

Damien Tournoud, CTO

Sylvie Georgeault, CFO

Kieron Sambrook-Smith, Chief Commercial Officer

Doug Goldberg, VP Sales, NA

Rob Douglass, VP Customer Success

Management team

Headquartered in Paris, with staff on the East Coast, West Coast, Canada, France, Germany, UK and more.

Subscriptions from $10 to $50k per month

Global 24/7/365 support

Customers in 104+ countries

2,000+ customers, strong acceleration in Q4 2015

Comprehensive offering since Q3 2014

Key growth metrics

Commercially successful: many global brands with multi-year contracts, thousands of self-service clients

Testimonials

“Platform.sh has reduced our hosting costs by over 60% but with a faster customer experience through a cutting edge hosting stack. It has become the cornerstone of our product development lifecycle, saving time and money at every step. I just can’t imagine working without it.”

Peter Ward, Reiss, a leading UK fashion brand

Strategic Symfony partnership

Horizon 2020 Winner: Top EU Innovation Grant, €2m

Best Horizontal Cloud Platform EuroCloud 2015

“European cloud leadership is being born before our very own eyes”, JDN

“Britain’s next $ Billion company”, Silicon Valley Comes to UK

Strategic Atlassian partnership

Awards & Recognition

VC-backed, award-winning startup with a global reach

https://platform.sh/2015/06/european-horizon-2020-grant/



https://platform.sh/press/2015/06/23/european-cloud-leadership-born/




Our Product Offering

Self-service Hosting: $10 to $300 / month

Web Agency Plan, Partners (Symfony, Atlassian)

Enterprise Grade hosting: $800 to $15k / month

Multi-Cloud: Amazon EC2, UpCloud, Orange Business Services, On Premise (VMWare or OpenStack)

Managed Private Cloud Region: > $15k / month

White-Label offering with automated single tenant SaaS

All of the product offerings are based on a single technical stack

Focus On Mass market PHP: Drupal, Symfony, Magento, WordPress

Soft launch on NodeJS

Roadmap

We have runtimes ready for Java, Ruby and Python (.Net is in the works)

More cloud targets (Azure, Google in discussions)

Work on more on-premise targets

THANKS.
