nyc web perf-final-july-23

July 23, 2015 “Let’s turn Real User Data into a Science!” Dan Boutin – Senior Product Evangelist

Upload: dan-boutin

Post on 15-Aug-2015



mPulse

What’s a Beacon?

www.w3.org/TR/Beacon

Total Beacons Collected since 6/2013:~ 85 Billion

Run rate: over 3B per week and growing

Projected: ~175B by 1/1/16
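A quick way to sanity-check the projection is to extend the current total at the stated run rate. The sketch below assumes the ~85 billion total is as of the talk date (July 23, 2015), which the slide does not state; the function names are illustrative only.

```python
from datetime import date


def weeks_between(start: date, end: date) -> float:
    """Number of weeks (possibly fractional) between two dates."""
    return (end - start).days / 7


def linear_projection(current_total: float, weekly_rate: float,
                      start: date, end: date) -> float:
    """Total at `end` if the weekly rate stays flat from `start`."""
    return current_total + weekly_rate * weeks_between(start, end)


def required_weekly_rate(current_total: float, target_total: float,
                         start: date, end: date) -> float:
    """Average weekly rate needed to reach `target_total` by `end`."""
    return (target_total - current_total) / weeks_between(start, end)


AS_OF = date(2015, 7, 23)       # assumption: the ~85B figure is as of the talk date
TARGET_DATE = date(2016, 1, 1)  # projection date from the slide

flat_total = linear_projection(85e9, 3e9, AS_OF, TARGET_DATE)
needed_rate = required_weekly_rate(85e9, 175e9, AS_OF, TARGET_DATE)

print(f"Flat 3B/week projection: {flat_total / 1e9:.1f}B")
print(f"Average rate needed for 175B: {needed_rate / 1e9:.2f}B per week")
```

Under those assumptions, a flat 3B per week reaches only about 154B by January 1, 2016, so the ~175B projection implies an average of roughly 3.9B per week, which fits a run rate that is still growing.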

Big Data Challenges

Data scientists spend too much time ‘data wrangling’

“Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”

NY Times – August 17th, 2014

Big Data Challenges

Building a data science platform is very difficult

Infrastructure

•Choosing big data technologies and setting up a cluster can easily take 9 months or more

Data Pipeline

•Building a high performing big data schema requires specialized skills

•Extracting, transforming, and loading data (data wrangling) is an enormous time sink and a poor use of data scientists’ time

Analysis and Workflow

•Figuring out how you can ask questions of the data and how to visualize the results takes time that data scientists should be using to generate actionable insights from their studies

Trade-Offs

Julia Language & iJulia Notebook UI

Julia is a rising star in scientific programming

• Processing speed

• Support for parallel processing

• Compatibility with 400+ prebuilt statistical packages

• A large and growing number of visualization libraries

Trade-Offs

Why Julia?

R vs Python vs Julia

• Modern compiler technology

• Data connectivity

• Package ecosystem

• Functional programming constructs

• Integration with Python, C, C++, R, …

© 2014 SOASTA. All rights reserved. April 15, 2023 8

Trade-Offs

•Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.

•Columnar Database

•Extremely fast query times

•Attractive Economics

Hadoop vs BigQuery vs Redshift vs …

Capability – managed Big Data up to 2 petabytes

Cloud Economics – $1,000 per TB per month
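To put the quoted price in context, the monthly bill scales linearly with stored data. The sketch below uses only the $1,000 per TB per month figure from the slide; treating 2 petabytes as 2,000 TB (decimal units) is an assumption, and the function name is illustrative.

```python
PRICE_PER_TB_MONTH_USD = 1_000  # figure quoted on the slide


def monthly_cost_usd(terabytes: float) -> float:
    """Estimated monthly cost at the quoted flat per-terabyte price."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * PRICE_PER_TB_MONTH_USD


# Assumption: 1 PB = 1,000 TB, so the 2 PB ceiling is 2,000 TB.
for tb in (1, 50, 2_000):
    print(f"{tb:>5} TB -> ${monthly_cost_usd(tb):,.0f} per month")
```

At the 2 PB ceiling this works out to about $2 million per month, so the economics are attractive mainly because cost tracks the data actually stored.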

Why Redshift?


Now Let’s Talk Architecture

Data Science Workbench

Data science without the data wrangling, and much more

Infrastructure

Data Pipeline

Analysis and Workflow

• Data Science Workbench comes with the state-of-the-art technology you need to analyze your customer experiences

• All of the real user beacon data is loaded into Data Science Workbench into a highly optimized schema ready for analysis

• Data science is done with Julia, a remarkably fast, in-memory solution for analyzing huge data sets

• Access to an ever growing library of analysis functions and visualizations based on SOASTA’s and our customers’ expertise
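As an illustration of the kind of analysis function the last bullet refers to, the sketch below summarizes page load times by median and 95th percentile, which are usually more informative than the mean for skewed timing data. It is written in Python rather than Julia, and the function name and sample values are made up for the example; it is not one of SOASTA's library functions.

```python
from statistics import median, quantiles


def summarize_load_times(load_times_ms):
    """Return the median and 95th percentile of a list of load times (ms)."""
    if len(load_times_ms) < 2:
        raise ValueError("need at least two measurements")
    # quantiles(..., n=100) returns the 99 cut points between percentiles;
    # index 94 is the 95th percentile.
    cuts = quantiles(load_times_ms, n=100, method="inclusive")
    return {"median": median(load_times_ms), "p95": cuts[94]}


# Made-up sample: most pages load quickly, a few are very slow.
sample_ms = [820, 950, 1100, 1300, 1500, 2100, 4800, 9500]
summary = summarize_load_times(sample_ms)
print(f"median: {summary['median']} ms, p95: {summary['p95']:.0f} ms")
```

The long tail in the sample pulls the 95th percentile far above the median, which is the pattern real user measurements typically show.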


The Result!

• Every customer beacon is unpacked, transformed and loaded nightly by SOASTA into a SOASTA-designed schema in Amazon Redshift. This process is designed, supplied and supported by SOASTA

• Amazon Redshift is an extremely inexpensive and powerful BIG DATA database that can scale to almost 2 Petabytes in size. Amazon estimates compute and storage costs of $1,000/TB/month for our implementation

• An online, interactive interface for exploring, discovering and developing, based on the Julia scientific programming language developed at MIT and the iJulia Notebook UI

• SOASTA developed Functions & Statistical Models
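The nightly "unpacked, transformed and loaded" step in the first bullet can be pictured as turning each beacon's query string into a flat row ready for a columnar table. This is only a sketch: the parameter names `u` and `t_done` and the output columns are assumptions chosen for illustration, not SOASTA's actual beacon format or Redshift schema.

```python
from urllib.parse import parse_qs


def unpack_beacon(query_string: str) -> dict:
    """Turn one beacon query string into a flat row (hypothetical columns)."""
    params = parse_qs(query_string)  # also decodes percent-encoding

    def first(name):
        values = params.get(name)
        return values[0] if values else None

    raw_load = first("t_done")  # assumed name for total load time in ms
    return {
        "url": first("u"),      # assumed name for the page URL
        "load_time_ms": int(raw_load) if raw_load and raw_load.isdigit() else None,
    }


# Made-up beacon payload for illustration.
row = unpack_beacon("u=https%3A%2F%2Fexample.com%2Fcart&t_done=1234")
print(row)
```

Malformed or missing values become `None` rather than raising, so one bad beacon does not abort a nightly batch; whether the real pipeline behaves that way is not stated in the talk.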

Well, let’s see it!

procedure Traffic is

   type Airplane_ID is range 1..10;             -- 10 airplanes

   task type Airplane (ID: Airplane_ID);        -- task representing airplanes, with ID as initialisation parameter
   type Airplane_Access is access Airplane;     -- reference type to Airplane

   protected type Runway is                     -- the shared runway (protected to allow concurrent access)
      entry Assign_Aircraft (ID: Airplane_ID);  -- all entries are guaranteed mutually exclusive
      entry Cleared_Runway (ID: Airplane_ID);
      entry Wait_For_Clear;
   private
      Clear: Boolean := True;                   -- protected private data - generally more than just a flag...
   end Runway;
   type Runway_Access is access all Runway;

Trivia Time!

@DanBoutinSOASTA

1983

1995


Thank You!

Dan Boutin – Senior Product Evangelist

[email protected]

Mobile (404) 304-9529

@DanBoutinSOASTA
