[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

Upload: typesafeinc

Post on 14-Jul-2015

Page 1: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

APACHE SPARK: PREPARING FOR THE NEXT WAVE OF REACTIVE BIG DATA

Page 2: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

APACHE SPARK SURVEY 2015 - QUICK SNAPSHOT

RESPONDENTS

74% Developers, 8% Data Scientists, 7% C-level execs

TOP 3 INDUSTRIES

Telecoms, Banks, Retail

TOP 3 LANGUAGES USED WITH SPARK

88% Scala, 44% Java, 22% Python

13% are running Spark in production

31% are evaluating Spark now

20% are planning to use Spark in 2015

82% of users chose Spark to replace MapReduce

78% of users need faster processing of larger data sets

67% of users need Spark for event stream processing

62% of users load data into Spark with Hadoop DFS

54% of users run Spark standalone

Page 3: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

JOB TYPE/ROLE

74% Developer

7.5% Data Scientist

6.5% C-Level Executive

3.5% Software Architect

3.5% DevOps

1% Business Analyst

6.5% Other

INDUSTRY FOCUS

16% Telecommunications / Networks

12% Banking / Finance

11% Retail

10% Software / Technology

9% Advertising

5% Consulting

4% Healthcare / Insurance

33% Other (including Biotechnology/Chemistry, Machinery, Education, Government and Utilities, and other sectors)

Page 4: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

INFRASTRUCTURE TECHNOLOGIES IN USE

53% Amazon EC2

34% Docker

22% Cloudera CDH

16% Ansible

14% Mesos

13% OpenStack

12% Apache.org Builds of Hadoop

10% Hortonworks HDP

10% Heroku

8% Google Compute Engine

7% CoreOS

7% MapR Hadoop Distribution

6% Microsoft Azure

5% Marathon

4% Kubernetes

2% Aurora

11% Other XaaS

Page 5: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

CURRENT RELATIONSHIP WITH SPARK

Evaluating Spark now

Currently using in production

Evaluated, not planning to use

Evaluated, will use in 2016 or later

Um, what’s Spark?

Planning to use in 2015

31%, 28%, 20%, 13%, 6%, 2%

Page 6: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

BUSINESS GOALS IN MIND

78% Fast Batch Processing of Large Data Sets

60% Support for Event Stream Processing

56% Fast Data Queries in Real Time

55% Improved Programmer Productivity

Page 7: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

SPARK FEATURES/MODULES IN DEMAND

25%, 59%, 65%, 82%, 51%

Core API as a Replacement for MapReduce

Streaming Library (Spark Streaming)

Machine Learning Library (MLlib)

Integrated SQL (SparkSQL)

Graph Algorithms Library (GraphX)

Page 8: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

DATA PROCESSING WITH SPARK

39%, 41%, 46%, 46%, 59%, 61%

Read or Write Data to One or More Databases

Static Reports

SQL Queries and Business Intelligence

Write Data to Hadoop Distributed File System (HDFS)

Ad-hoc Queries and Reporting

ETL Data from External Sources

67% Event Stream Processing

71%, 65%, 40%

Use Spark as Part of a Larger Data Pipeline

Extract Information from Data Sooner Rather than Later

Automate Decision Making at Runtime

Page 9: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

WHICH LANGUAGES ARE IMPORTANT TO YOUR SPARK INSTALLATION?

1st: Scala 88%

2nd: Java 44%

3rd: Python 22%

Honorable mentions: R, Clojure, Groovy, Ruby & Go

Page 10: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

HOW DO YOU LOAD DATA INTO SPARK?

62% Hadoop Distributed File System (HDFS)

46% Databases

41% Apache Kafka

29% Amazon S3

18% Other Services (e.g. over socket connection)

12% Other*

*Including: Apache Cassandra, Amazon Kinesis and Apache HBase

Page 11: [Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

Typesafe (Twitter: @Typesafe) is dedicated to helping developers build Reactive applications on the JVM. Backed by Greylock Partners, Shasta Ventures, Bain Capital Ventures and Juniper Networks, Typesafe is headquartered in San Francisco with offices in Switzerland and Sweden. To start building Reactive applications today, download Typesafe Activator.

© 2015 Typesafe

Hello, Apache Spark! Typesafe Activator template for devs

Get the FULL report (PDF)
