sparking pandas: an experiment

SPARKING PANDAS: AN EXPERIMENT PyConOtto - Florence '17 Francesco Bruni brunifrancesco

Upload: francesco-bruni

Post on 21-Apr-2017


TRANSCRIPT

Page 1: Sparking pandas: an experiment

SPARKING PANDAS: AN EXPERIMENT

PyConOtto - Florence '17

Francesco Bruni

brunifrancesco

Page 2: Sparking pandas: an experiment

WHO I AM

MSc in Telecommunication Engineering

Functional pythonista

Currently working with geo data

Page 3: Sparking pandas: an experiment

OUTLINE

Why Sparking Pandas

Functional data processing pipelines

A real world application

Conclusions

Page 4: Sparking pandas: an experiment

WHY SPARKING PANDAS

What if your data don't fit into memory?

Page 5: Sparking pandas: an experiment

APACHE SPARK: THE COMPONENTS

Page 6: Sparking pandas: an experiment

APACHE SPARK: THE ARCHITECTURE

Page 7: Sparking pandas: an experiment

FUNCTIONAL DATA PROCESSING PIPELINES

Higher-order functions

Immutable data

Lazy evaluation
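The three properties above can be illustrated without Spark. The following is a minimal plain-Python sketch, not code from the talk: the `Reading` record, the `pipeline` helper, and the `readings` source are hypothetical names introduced only for illustration.

```python
from functools import reduce
from itertools import islice
from typing import Callable, Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """Immutable record: fields cannot be reassigned after creation."""
    sensor: str
    value: float


def pipeline(*stages: Callable[[Iterable], Iterable]) -> Callable[[Iterable], Iterable]:
    """Higher-order function: takes stage functions and returns their composition."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def readings() -> Iterator[Reading]:
    """Hypothetical endless source; each value is produced only on demand."""
    n = 0
    while True:
        yield Reading(sensor=f"s{n % 3}", value=float(n))
        n += 1


# Each stage is lazy: filter and map return iterators, so nothing runs yet.
process = pipeline(
    lambda rs: filter(lambda r: r.sensor == "s0", rs),
    # _replace builds a new record instead of mutating the old one.
    lambda rs: map(lambda r: r._replace(value=r.value * 2), rs),
)

lazy_result = process(readings())            # no element computed so far
first_three = list(islice(lazy_result, 3))   # evaluation is triggered here
```

Because the source is endless, the pipeline can only work if evaluation is deferred until `islice` asks for elements. Spark applies the same idea at cluster scale: transformations are recorded and only executed when an action requests a result.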

Page 8: Sparking pandas: an experiment

THE EXPERIMENT

The scenario

Containerized application

Page 9: Sparking pandas: an experiment

THE SCENARIO

Page 10: Sparking pandas: an experiment

CONTAINERIZED APPLICATION

Containerized components

Constrained memory nodes

docker-composed ecosystem

Page 11: Sparking pandas: an experiment

HANDS ON CODE

Apache Spark basics

Linear regression

Near real time processing with Apache Kafka
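The hands-on code itself is not part of this transcript. As a rough illustration of why linear regression suits a distributed engine, the plain-Python sketch below fits a line by computing partial sums per partition and combining them, which is the map/reduce pattern a cluster would run across worker nodes. It does not use Spark's API; `partition_stats`, `combine`, `fit`, and the sample data are all assumptions made for the example.

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def partition_stats(points: Iterable[Point]) -> Tuple[float, ...]:
    """Map step: reduce one partition to (n, sum_x, sum_y, sum_xy, sum_xx)."""
    n, sx, sy, sxy, sxx = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return n, sx, sy, sxy, sxx


def combine(a: Sequence[float], b: Sequence[float]) -> Tuple[float, ...]:
    """Reduce step: partial sums from two partitions simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit(stats: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = slope * x + intercept."""
    n, sx, sy, sxy, sxx = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data split into two partitions; the points lie on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]

slope, intercept = fit(reduce(combine, map(partition_stats, partitions)))
```

Each partition is summarized independently, so no single node ever needs the whole dataset in memory, only five numbers per partition travel to the node that computes the final fit.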

Page 12: Sparking pandas: an experiment

CONCLUSIONS

Complex structure

Worth the effort with a lot of data

Worker nodes should be distributed

Keep exploring :)

Page 13: Sparking pandas: an experiment

QUESTIONS?

brunifrancesco

https://github.com/brunifrancesco/docker-spark