AI&BigData Club, speaker Dmitry Spodarets: "What...
What tools are data scientists using?
Dmitry Spodarets
AI&BigData Club
Who am I
Dmitry Spodarets
• Founder and CEO at FlyElephant
• PhD candidate at Odessa National University
• Lecturer at Odessa Polytechnic University
• Organizer of technical conferences about AI, BigData, HPC, JS, Web Technologies …
FlyElephant
We automate Data Science and Engineering Simulation
and help teams to work efficiently.
Computing resources
Ready-computing infrastructure
Collaboration & Sharing
Fast Deployment
Expert Community
Data Science Tools Survey
Datasets
[Chart: dataset sizes by range: less than 1 MB, 1.1 to 10 MB, 11 to 100 MB, 101 MB to 1 GB, 1.1 to 10 GB, 11 to 100 GB, 101 GB to 1 TB, 1.1 to 10 TB, 11 to 100 TB, 101 TB to 1 PB, 1.1 to 10 PB, 11 to 100 PB, over 100 PB; value axis from 0 to 70]
Tools for collecting data
Python 45
R 26
Spark 18
SQL 15
Excel 13
Kafka 11
Pandas 10
custom 8
Hadoop 5
Numpy 5
SAS 5
Tools for storing data
PostgreSQL 37
csv 31
MySQL 21
Hadoop 16
Excel 15
HDFS 15
Mongodb 15
My Server 12
Oracle 11
Hive 8
Programming languages
Python 151
R 88
SQL 37
Java 32
Scala 22
bash 17
C++ 17
JavaScript 15
C# 13
vba 8
C 6
Libraries
Pandas 88
Numpy 68
scikit-learn 48
scipy 26
dplyr 20
matplotlib 20
ggplot2 15
keras 14
SPARK 13
xgboost 13
Tensorflow 12
Tools for the visualization of data
matplotlib 66
seaborn 33
ggplot2 26
Excel 22
Tableau 22
R 19
ggplot 14
plotly 13
bokeh 12
d3 11
Cloud services
aws 77
none 41
azure 25
google 24
digital ocean 9
OpenStack 7
Watson 1
Computing power
NVIDIA DGX-1 Deep Learning Supercomputer: 170/3 TFLOPS (GPU FP16 / CPU FP32)
Intel Xeon Phi processor: ~3 TeraFLOPS
NVIDIA Tesla P100: ~5 TeraFLOPS
FPGA