How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014

HOW TO MANAGE LARGE AMOUNTS OF DATA WITH ITERATEE BY @WAXZCE – QUENTIN ADAM SCALA DAYS 2014 BERLIN

Upload: quentin-adam

Post on 08-May-2015

DESCRIPTION

How do you manage a large file over HTTP? Is it possible to process a data stream while you are handling an HTTP POST request? How can you stay relaxed when managing a data stream (i.e. without a while(true) hack or blocking I/O)? This talk is about Play! iteratees and how to use them in a real project: What is an iteratee? Why is it useful? How do you use it? How do you manage it? Is it clean? Is your code readable or not? http://scaladays.org/#schedule/How-to-manage-large-amounts-of-data-with-Iteratee

TRANSCRIPT

Page 1: How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014

HOW TO MANAGE LARGE AMOUNTS OF DATA WITH ITERATEE

BY @WAXZCE – QUENTIN ADAM

SCALA DAYS 2014 BERLIN

MY DAY TO DAY WORK : CLEVER CLOUD, MAKE YOUR APP RUN ALL THE TIME

KEEP YOUR APPS ONLINE. MADE WITH NODE.JS, SCALA, JAVA, RUBY, PHP, PYTHON, GO…

AND LEARN A LOT OF THINGS ABOUT YOUR CODE, APPS, AND GOOD/BAD DESIGN…

WHY ARE STREAMS SO TRENDY THESE DAYS?

2 MAIN REASONS

DATA IS GETTING BIGGER AND BIGGER

WE NEED TO ACT WHILE THE DATA TRANSFER IS OCCURRING

&

WE ARE NOW USING LOTS OF PERMANENT CONNECTIONS

EXAMPLE

WHAT IS INSIDE AN HTTP REQUEST?

• Verb: the action

• Resource: the object of the action

• Headers: the context of the action

• Body (optional): the data

IN MANY CASES THE WHOLE REQUEST IS MANIPULATED IN MEMORY

EXAMPLE

UPLOADING A 20 GB FILE IN A POST REQUEST AND STORING IT

IT’S A BAD IDEA TO PUT THE BODY PART IN MEMORY

CREATE A TEMP FILE TO STORE THE DATA

Client uploads the file

Create a temp file

Send the file to a backend

Then answer the client

The file is transferred twice

NOT SO GOOD

Client uploads the file

Stream it directly to the backend

Then answer the client

LET’S ACT ON STREAM!

CLASSIC JAVA STREAM MANAGEMENT

• Buffers

• Buffer management

• Buffer exception

• Buffer overflow

• Low performance if not buffered

• Not modular

• Thread blocking

• Code is ugly

• No back pressure

• Error handling is bad

• I/O management and business code are mixed
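For reference, the classic style criticized in the list above looks roughly like the loop below. This is a minimal sketch written in Python rather than Java; `copy_blocking` and its parameters are hypothetical names chosen for illustration and do not come from the talk.

```python
import io

def copy_blocking(src, dst, buffer_size=8192):
    """Copy src to dst with a hand-managed buffer loop (the classic style)."""
    total = 0
    while True:                        # the "while(true)" loop from the description
        buf = src.read(buffer_size)    # blocks the calling thread until data is available
        if not buf:                    # an empty read signals end of stream
            break
        dst.write(buf)                 # I/O handling and business logic share one loop
        total += len(buf)
    return total

# Usage with in-memory streams standing in for a socket and a backend.
source = io.BytesIO(b"x" * 20000)
sink = io.BytesIO()
copied = copy_blocking(source, sink)
```

Nothing in this loop lets the reader signal the writer to slow down, and any extra processing has to be written inside the loop body, which is the coupling the list complains about.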

I CAN’T SLEEP AT NIGHT

WE HAVE TO FIND A BETTER WAY

THAT’S WHY WE NEED ITERATEE

DATA STRUCTURES TO EXPRESS DATA STREAM MANAGEMENT

ITERATEE: HOW TO MANAGE A STREAM

Like a recipe

Consume the data

ENUMERATOR: DATA STREAM

Produce the data

ENUMERATEE

Set of tools to do cool things with Iteratee and Enumerator
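The three roles can be illustrated with plain functions. The sketch below is written in Python and does not use Play's API: an iteratee is reduced to an `(initial, step)` pair, and `enumerator`, `map_enumeratee` and `total_length` are hypothetical names chosen for this example.

```python
def enumerator(source):
    """Producer: returns a function that pushes every element of `source` into a consumer."""
    def run(iteratee):
        state, step = iteratee
        for element in source:
            state = step(state, element)   # hand one element at a time to the consumer
        return state
    return run

def map_enumeratee(fn):
    """Adapter: applies `fn` to each element before the wrapped consumer sees it."""
    def adapt(iteratee):
        initial, step = iteratee
        return (initial, lambda state, element: step(state, fn(element)))
    return adapt

# Consumer: the "recipe" that says how to fold numbers into a running total.
total_length = (0, lambda acc, n: acc + n)

# Produce byte chunks, turn each chunk into its length, sum the lengths.
result = enumerator([b"ab", b"cde"])(map_enumeratee(len)(total_length))
```

The consumer never sees the whole stream at once, and the adapter changes what the consumer receives without touching either the producer or the consumer.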

SIMPLE EXAMPLE TO START

AN ENUMERATOR

AN ITERATEE

INPUTS

• EOF

• Empty

• El

ITERATEE STATES

• Done

• Error

• Cont
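The three inputs and the three states can be modeled as small immutable data types. The following is a sketch in Python, not Play's Scala definitions: the class names mirror the slide labels, while `sum_iteratee` and `feed` are hypothetical helpers added for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Inputs an iteratee can receive.
@dataclass(frozen=True)
class El:
    value: Any                 # one element (chunk) of the stream

@dataclass(frozen=True)
class Empty:
    pass                       # nothing to consume right now

@dataclass(frozen=True)
class EOF:
    pass                       # the stream has ended

# States an iteratee can be in.
@dataclass(frozen=True)
class Done:
    result: Any                # a final value has been computed

@dataclass(frozen=True)
class Error:
    message: str               # consumption failed

@dataclass(frozen=True)
class Cont:
    k: Callable[[Any], Any]    # wants more input: call k(input) to get the next state

def sum_iteratee(total=0):
    """Hypothetical iteratee that sums integers until EOF."""
    def k(inp):
        if isinstance(inp, El):
            return sum_iteratee(total + inp.value)
        if isinstance(inp, Empty):
            return sum_iteratee(total)
        return Done(total)     # EOF: produce the result
    return Cont(k)

def feed(state, inputs):
    """Push inputs into an iteratee for as long as it is in the Cont state."""
    for inp in inputs:
        if not isinstance(state, Cont):
            break
        state = state.k(inp)
    return state

final = feed(sum_iteratee(), [El(1), Empty(), El(2), EOF()])
```

Each step returns a new state instead of mutating the old one, so the consumer decides after every input whether it is finished, has failed, or wants more.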

AN ITERATEE

AN ITERATEE TO MANAGE THE STREAM

AN ITERATEE

ITERATEE COMPOSITION

PURE FUNCTIONAL

TYPE SAFE

COMPOSITION
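One way to picture composition is to combine two consumers so that a single pass over the stream feeds both. The sketch below is in Python and is only an illustration of the idea, not the combinators shown in the talk: a consumer is a pure `(initial, step, finish)` triple, and `zip_folds`, `run`, `count_bytes` and `count_chunks` are hypothetical names.

```python
from functools import reduce

def zip_folds(a, b):
    """Combine two consumers into one that feeds every element to both."""
    a_init, a_step, a_finish = a
    b_init, b_step, b_finish = b
    return (
        (a_init, b_init),
        lambda state, x: (a_step(state[0], x), b_step(state[1], x)),
        lambda state: (a_finish(state[0]), b_finish(state[1])),
    )

def run(fold, source):
    """Drive a consumer over a source, one element at a time."""
    initial, step, finish = fold
    return finish(reduce(step, source, initial))

# Two independent consumers, each unaware of the other.
count_bytes = (0, lambda n, chunk: n + len(chunk), lambda n: n)
count_chunks = (0, lambda n, chunk: n + 1, lambda n: n)

both = run(zip_folds(count_bytes, count_chunks), [b"ab", b"cde"])
```

Because the consumers are plain values with no side effects, combining them does not require changing either one.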

SO WE WILL JUST MANAGE THE BODY STREAM

JUST DO NOT REWRITE THE HTTP PARSER

CREATE TEMP FILE

Built in on play with

BODY PARSERS

REQUEST HEADERS -> ITERATEE[ARRAY[BYTE], EITHER[SIMPLERESULT, ?]]
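The shape of that signature can be sketched as a function that receives the request headers and returns a consumer of byte chunks, which finishes with either an early HTTP response or a parsed value. The code below is Python, not Play's `BodyParser`: `text_length_parser`, the `("left", ...)` / `("right", ...)` tuples and the 415 status are assumptions made for illustration.

```python
def text_length_parser(headers):
    """Hypothetical body parser: look at the headers, then return a chunk consumer."""
    content_type = headers.get("Content-Type", "")

    def consume(chunks):
        if not content_type.startswith("text/"):
            # "Left": answer immediately with an HTTP status, the body is never read.
            return ("left", 415)
        size = 0
        for chunk in chunks:       # chunks arrive one at a time
            size += len(chunk)     # nothing is accumulated in memory
        return ("right", size)     # "Right": the parsed value handed to the action

    return consume

accepted = text_length_parser({"Content-Type": "text/plain"})(iter([b"ab", b"c"]))
rejected = text_length_parser({"Content-Type": "image/png"})(iter([b"x"]))
```

The decision to reject can be taken from the headers alone, before a single byte of the body has been consumed.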

GET FILE AND CALCULATE HASH FROM CHUNKS
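Computing a hash chunk by chunk means the file never has to be held in memory as a whole. A minimal sketch in Python using the standard `hashlib` module; the slide does not say which hash function the demo uses, so SHA-256 is an assumption, and `hash_chunks` is a hypothetical name.

```python
import hashlib

def hash_chunks(chunks):
    """Feed each chunk into the digest as it arrives and return the final hex digest."""
    digest = hashlib.sha256()      # assumption: the talk does not name the hash function
    for chunk in chunks:
        digest.update(chunk)       # only the current chunk is held in memory
    return digest.hexdigest()

streamed = hash_chunks(iter([b"ab", b"c"]))
whole = hashlib.sha256(b"abc").hexdigest()
```

Here `streamed` and `whole` are the same digest, because updating a hash incrementally is equivalent to hashing the concatenated data.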

DEMO

I WANT TO USE FORMS

SO PARTHANDLER ALLOWS YOU TO COMPOSE A BODYPARSER

ITERATEE <3 COMPOSITION

NOW WE CAN ACCESS STREAMS

DATA STORAGE

I HATE FILE SYSTEMS

DO NOT USE THE FILE SYSTEM AS A DATASTORE

File systems are POSIX compliant

• POSIX is ACID

• POSIX is powerful but is a bottleneck

• The file system is the nightmare of ops

• The file system creates coupling (host provider/OS/language)

• A SPOF-free multi-tenant file system is a unicorn

STORE THE FILE IN CHUNKS INSIDE A K/V STORE

• Key = hash of the chunk

STORE THE LIST OF CHUNKS AS METADATA

PROCESS

File is posted by the browser

Stream is normalized into chunks

Each chunk is stored asynchronously in the K/V store

List of chunk hashes is stored with the file id

No overhead for the data store
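The process above can be sketched end to end with a dictionary standing in for the K/V store. This is a simplified Python illustration, not the demo code: it stores chunks synchronously rather than asynchronously, SHA-256 and the tiny chunk size are assumptions, and `normalize`, `store_file` and `load_file` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 4   # assumption: deliberately tiny so the example is easy to follow

def normalize(chunks, size):
    """Re-slice incoming chunks of arbitrary length into fixed-size chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while len(buffer) >= size:
            yield buffer[:size]
            buffer = buffer[size:]
    if buffer:
        yield buffer                 # last, possibly shorter, chunk

def store_file(file_id, chunks, kv, meta, size=CHUNK_SIZE):
    """Store each normalized chunk under its hash, then the list of hashes under the file id."""
    keys = []
    for piece in normalize(chunks, size):
        key = hashlib.sha256(piece).hexdigest()   # key = hash of the chunk
        kv[key] = piece                           # identical chunks share one entry
        keys.append(key)
    meta[file_id] = keys                          # metadata: ordered list of chunk hashes
    return keys

def load_file(file_id, kv, meta):
    """Stream the file back by following the stored list of chunk hashes."""
    for key in meta[file_id]:
        yield kv[key]

kv, meta = {}, {}
keys = store_file("upload-1", [b"abc", b"defgh", b"i"], kv, meta)
restored = b"".join(load_file("upload-1", kv, meta))
```

Because the key is derived from the content, writing the same chunk twice overwrites an identical value, which is one reading of the "no overhead" point on the slide.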

NEW DEMO

• Play 2 + iteratee

• Datastore is Riak

ARE YOU CONVINCED TO USE STREAMS?

DATA STREAMING PROGRAMMING IS NOW TRENDING

THX TO @CLEMENTD – CLEMENT DELAFARGUE

I’m @waxzce on twitter

I’m the CEO of

A PaaS provider, give it a try ;-)

THX FOR LISTENING & QUESTIONS TIME

Coupon for Clever Cloud trial :

berlin_scala_days