massively scalable applications - techferry

Massively Scalable Applications Deepansh Malik

Upload: techferry

Post on 15-Jul-2015


TRANSCRIPT

Massively Scalable Applications

Deepansh Malik

Introductions

www.techferry.com /TechFerry /@techferry

Deepansh Malik, CEO at TechFerry

@DeepanshMalik

https://in.linkedin.com/in/deepanshmalik

TechFerry: Analytics, IT Innovation, R&D Company

Specialization in

o Growth Analytics
o HealthCare Analytics
o Massively Scalable Applications and Rich UI

Massively Scalable Applications

Benchmark: 1 Million TRX per second

1 Million Requests per second

1 Million Messages per second

1 Million DB Transactions per second

1 Million/sec = 1 Billion TRX in 17 minutes

= 86.4 Billion TRX a day
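The benchmark arithmetic above can be checked with a few lines. This is only a worked calculation; the variable names are illustrative.

```python
# Worked arithmetic for the 1 million transactions-per-second benchmark.
RATE_PER_SECOND = 1_000_000

seconds_to_one_billion = 1_000_000_000 / RATE_PER_SECOND  # 1000 seconds
minutes_to_one_billion = seconds_to_one_billion / 60       # about 16.7 minutes
transactions_per_day = RATE_PER_SECOND * 24 * 60 * 60      # 86.4 billion

print(f"{minutes_to_one_billion:.1f} minutes to reach 1 billion")
print(f"{transactions_per_day / 1e9:.1f} billion per day")
```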

Scale out or Scale up?

Scale out -> Add more hardware.

1 CPU Core = 1000 requests/sec

To massively scale (1 million requests/second), we need 1000 cores: 50 machines with 20 cores each.
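The scale-out sizing can be sketched as a small calculation. The function name `machines_needed` and its parameters are hypothetical, and the model assumes throughput grows linearly with core count, which real systems rarely achieve.

```python
import math

def machines_needed(target_rps, rps_per_core, cores_per_machine):
    """Return (cores, machines) for a target load, assuming linear scaling."""
    cores = math.ceil(target_rps / rps_per_core)
    machines = math.ceil(cores / cores_per_machine)
    return cores, machines

# Figures from the slide: 1000 requests/sec per core, 20 cores per machine.
cores, machines = machines_needed(1_000_000, 1000, 20)
print(cores, machines)  # 1000 50
```

Under the same assumption, raising per-core throughput tenfold (`machines_needed(1_000_000, 10_000, 20)`) cuts the estimate to 100 cores on 5 machines, which is why per-core efficiency matters before adding hardware.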

Good idea or stupid idea? Costs??

Scale up?

Can one machine scale to a million transactions per second?

The Answer is YES.

Our commodity hardware is very powerful.

What is the bottleneck then? What do we need to save tons of money being

wasted in scaling out?

Let us begin

Architecting Massively Scalable Apps

Computing Spectrum

Symmetric Multi Processing

A single problem or a single task (e.g. a DB query) takes 2 milliseconds on one core.

Can I use two cores and complete this single task in 1 ms?
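One way to reason about this question is Amdahl's law, which the slides do not mention by name: only the part of a task that can run in parallel gets faster with more cores. A minimal sketch, where `parallel_fraction` is an assumed input and not a measured value:

```python
def time_on_cores(single_core_ms, parallel_fraction, cores):
    """Estimated run time when only part of the task can be parallelised."""
    serial_ms = single_core_ms * (1.0 - parallel_fraction)
    parallel_ms = single_core_ms * parallel_fraction / cores
    return serial_ms + parallel_ms

# A fully parallelisable 2 ms task reaches 1 ms on two cores.
print(time_on_cores(2.0, 1.0, 2))  # 1.0
# If 10% of the task is inherently serial, two cores give about 1.1 ms.
print(time_on_cores(2.0, 0.9, 2))
```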

Distributed Computing

Distribute load on multiple machines.

Make sure there are no bottlenecks or single points of failure.

Can we achieve End to End Distribution, from

messaging to processing to databases?

Concurrent Programming

One CPU core currently handles 1000 trx/sec.

Can one core handle 1000 trx in a millisecond

instead? That is 1M trx/sec.

Can we remove context-switching overheads and synchronous I/O idling?
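To illustrate the idea of avoiding idle waits on I/O, here is a minimal sketch using Python's `asyncio`. The 10 ms `asyncio.sleep` stands in for a database or network call, and `handle_request` and `serve` are hypothetical names. It shows overlapping waits on a single thread; it is not a benchmark and says nothing about real per-core throughput.

```python
import asyncio
import time

async def handle_request(request_id):
    # Simulated non-blocking I/O wait; the event loop runs other requests meanwhile.
    await asyncio.sleep(0.01)
    return request_id

async def serve(n):
    # Start all requests and wait for them together on one thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(1000))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.3f}s on one thread")
```

Handled one after another with blocking waits, the same 1000 requests would spend at least 10 seconds sleeping.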

Parallel Programming

● Throw more CPU cores at different tasks.

Distributed Computing

Distribute workload between two or more computing devices or machines

connected by some type of network.

● For example, clustered architecture with multiple machines

However, in real-life web applications, we need to distribute workload across

● application servers,

● database servers,

● real-time computation or analytics.
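A common building block for distributing workload is to route each key to a node by hashing it. The sketch below is an assumption-laden illustration: the node names and `pick_node` are made up, and it is not how any particular tool mentioned in this deck partitions data.

```python
import hashlib

def pick_node(key, nodes):
    """Map a key to a node deterministically by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical machine names
assignments = [pick_node(f"user-{i}", nodes) for i in range(9000)]
counts = {node: assignments.count(node) for node in nodes}
print(counts)  # roughly 3000 keys per node
```

Plain modulo hashing remaps most keys when the number of nodes changes; consistent hashing is a common alternative when nodes join and leave often.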

Distributed Computing

Distributed Storage

Distributed Messaging

Distributed Analytics

(Real Time and Batch)

Traditional vs New

Spot the bottleneck node / single point of failure.

Traditional: Load Balancer (L), Master DB (M) | New: ??

[Figure: Traditional vs New architecture. Labels on the traditional side: load-balancing app servers; master-slave DB architecture.]

Distributed Computing - Tools

➔ Distributed Messaging

◆ Apache Kafka, RabbitMQ, Apache ActiveMQ

◆ A detailed comparison from LinkedIn is available at

http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

➔ Distributed Analytics

◆ Apache Storm (Real Time), Apache Spark (Batch)

➔ Distributed Storage

◆ Cassandra

Use Cases:

Highly Suitable for Real Time analytics of High Velocity Big Data

Machine to Machine (M2M) or Internet of Things (IoT)

M2M, IoT and real time analytics

https://www.linkedin.com/pulse/20141203105632-40354099-m2m-iot-and-real-time-analytics

Concurrent Programming

Concurrent programming is a form of computing in which several computations execute during overlapping time periods (concurrently) instead of sequentially.

In practice, it is software code that facilitates the performance of multiple computing tasks at the same time.
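The difference between sequential execution and overlapping time periods can be sketched with a thread pool. The 50 ms sleep is a stand-in for waiting on I/O, and `task` is a hypothetical workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(n):
    time.sleep(0.05)  # stand-in for waiting on I/O
    return n * n

inputs = [1, 2, 3, 4]

# Sequential: each task starts only after the previous one finishes.
start = time.perf_counter()
sequential = [task(n) for n in inputs]
sequential_s = time.perf_counter() - start

# Concurrent: the four waits overlap in time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    overlapped = list(pool.map(task, inputs))
overlapped_s = time.perf_counter() - start

print(f"sequential {sequential_s:.2f}s, overlapped {overlapped_s:.2f}s")
```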

Architectural Concepts

Events, Threads or Actors?

Asynchronous Programming

Functional Programming

Concurrent Programming

Events vs Threads, Actors

NodeJS vs J2EE

Performance comparison of multithreaded, synchronous technology using Spring/Hibernate

VS

event-based, single-process, asynchronous technology using NodeJS.

Independent Research Report from TechFerry Innovation Lab

http://www.techferry.com/eBooks/NodeJS-vs-J2EE-Stack.html

Asynchronous Programming

End to end asynchronous programming

Non-blocking callbacks, not just at the application layer but also at the UI or database layers.

Pick asynchronous programming at application,

database or UI layer based on your use-case.
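A non-blocking callback can be sketched as follows: the caller registers a function to run when the result is ready and carries on instead of waiting. `slow_query` is a hypothetical stand-in for a database call, and the thread pool is just one way to run it in the background.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_query(user_id):
    time.sleep(0.05)  # stand-in for a slow database round trip
    return {"user_id": user_id, "name": f"user-{user_id}"}

received = []

def on_result(future):
    # Runs when the query finishes; the submitting code never blocks on it.
    received.append(future.result())

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(slow_query, 42)
    future.add_done_callback(on_result)  # returns immediately
    print("request submitted, caller is free to do other work")

# Leaving the with-block waits for the worker, so the callback has run by now.
print(received)
```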

Functional Programming

A programming paradigm, a style of building the structure and elements of computer programs, that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data.

Routines can easily be moved to a different CPU core.
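A minimal sketch of this style in Python, using immutable records and pure functions. The `Order` type and the discount rule are invented for illustration.

```python
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)  # instances cannot be modified after creation
class Order:
    item: str
    quantity: int
    unit_price: float

def with_discount(order, percent):
    """Pure function: returns a new Order and leaves the input untouched."""
    return replace(order, unit_price=order.unit_price * (1 - percent / 100))

def total(orders):
    """Pure function: the result depends only on the arguments."""
    return reduce(lambda acc, o: acc + o.quantity * o.unit_price, orders, 0.0)

orders = (Order("disk", 2, 100.0), Order("ram", 4, 50.0))
discounted = tuple(with_discount(o, 10) for o in orders)

print(total(orders))      # 400.0
print(orders[0].unit_price)  # 100.0, the original is unchanged
```

Because `with_discount` and `total` touch no shared state, each call could run on any core without locks.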

Scala/Akka Actors
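The actor idea can be sketched without Akka: an actor owns its state, receives messages through a mailbox, and processes them one at a time, so no locks are needed around the state. This is a simplified Python illustration with a hypothetical `CounterActor`; it is not Akka's API.

```python
import queue
import threading

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        # Senders never touch the state; they only enqueue messages.
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

actor = CounterActor()

def send_many():
    for _ in range(1000):
        actor.send("increment")

senders = [threading.Thread(target=send_many) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()

reply = queue.Queue()
actor.send("get", reply)
final_count = reply.get(timeout=5)
actor.send("stop")
print(final_count)  # 4000
```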

Innovation Labs @ TechFerry

Symmetric Multi Processing

Symmetric Multi Processing (SMP) is the processing of programs by multiple

processors that share a common operating system and memory.

The processors share memory and the I/O bus or data path.

A single copy of the operating system is in charge of all the processors.

Asymmetric vs Symmetric

Asymmetric Multiprocessing

The different CPUs take on different jobs.

Symmetric Multi Processing (SMP)

All CPUs run in parallel, doing the same job.

The CPUs share the same memory.

Contact Information

+1 408-337-6607

[email protected]

www.techferry.com

/techferry /@techferry

Thank You