Data Science in the Enterprise

Misha Lisovich [email protected]

Upload: mikhail-lisovich

Post on 19-Aug-2015

Outline
1. Data Science in the enterprise
2. Interviews!

Enterprise Data Science
● What does a data scientist do?
● How do you make the most impact?

The Process

1. Productize
- Compelling data products
- Innovation pipeline

2. Ruggedize
- Toolchain: RStudio, devtools, GitHub, Travis CI, Docker
- Strong testing
- Production-ready architecture

3. Assimilate
- Command line tools
- Make it into HTTP APIs
- Make it into Docker containers

Step 1: Productize

Internal Products:
- Ad-hoc Analyses
- Internal Dashboards
- Automated reports
- Rapid Prototyping

External Products:
- End-user data products
- Backend services

1. Dashboards

- Business Intelligence
- Internal Tools
- Data & Job Monitoring

2. Automated Reports

.Rmd -> html


3. Rapid Prototyping

4. Backend Services

Batch Data Processing (ETL)

R APIs

5. End-user Products

Step 2: Ruggedize

1. Create reproducible architecture
2. Set up strong testing & CI
3. Separate Production and Dev
4. Set up monitoring & reporting

Case Study: HB Architecture

- RStudio
- Containerized Architecture
- Continuous Integration
- Multiple Environments
- Notifications/Monitoring

Data Architecture

elasticsearch:
  image: elasticsearch

shiny-server:
  image: shiny
  ports:
    - "443:443"
  links:
    - elasticsearch

etl:
  image: etl
  volumes:
    - .:/data

etl-data:
  image: etl-data

[Diagram: Docker Compose + Containers. Components shown: Web, rAPI, Shiny Server, Elastic, ETL, ETL data, SQL, S3]

RStudio Server

Environments

Production
- www.dataproduct.com
- internal-dashboards.com
- Components: ETL, Shiny Server, Elastic, data volume, SQL, S3

Staging
- staging-www.dataproduct.com
- staging-internal-dashboards.com
- Components: ETL, Shiny Server, Elastic, data volume, SQL, S3

Continuous Integration

[Diagram: GitHub, Travis CI, commit, Success!, latest-stable tag. Production and Staging each pull latest-stable]
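
The flow on this slide can be sketched in a few lines. This is a Python illustration only, not the deck's actual Travis CI configuration. It assumes the `latest-stable` tag moves only when a build succeeds, which the slide suggests but does not spell out. The `Registry` class, the `ci_build` and `pull_latest_stable` functions, and the commit ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """In-memory stand-in for an image registry: maps tag name -> commit id."""
    tags: dict = field(default_factory=dict)


def ci_build(registry: Registry, commit: str, tests_pass: bool) -> bool:
    """Simulate one CI run. Assumption: the tag moves only on success."""
    if tests_pass:
        registry.tags["latest-stable"] = commit
    return tests_pass


def pull_latest_stable(registry: Registry):
    """What an environment would deploy: whatever latest-stable points at."""
    return registry.tags.get("latest-stable")


registry = Registry()
ci_build(registry, "abc123", tests_pass=True)   # passing build: tag moves
ci_build(registry, "def456", tests_pass=False)  # failing build: tag stays put
deployed = pull_latest_stable(registry)
print(deployed)
```

Because environments only ever pull `latest-stable`, a failing commit never reaches Staging or Production in this sketch.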

Docker Registry/Rolling Back

[Diagram, two panels. "Changes Deployed to Prod": ETL and data volume, Save Versioned Image to Docker Registry. "Danger! Need to Rollback!": Load Older Image from Docker Registry for ETL and data volume]
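
The save-then-rollback idea on this slide can be sketched as plain bookkeeping. This is a Python illustration under stated assumptions, not the Docker registry API. It assumes every deploy saves a versioned image and that a rollback loads the image saved just before the running one. The `Deployment` class and the `etl:1` style version names are hypothetical.

```python
class Deployment:
    """Tracks saved image versions for one service and which one is running."""

    def __init__(self):
        self.saved = []      # versions saved to the registry, oldest first
        self.running = None  # version currently deployed

    def deploy(self, version: str) -> None:
        """Save a versioned image, then run it."""
        self.saved.append(version)
        self.running = version

    def rollback(self) -> str:
        """Load the image saved immediately before the running one."""
        if self.running is None:
            raise RuntimeError("nothing is deployed")
        index = self.saved.index(self.running)
        if index == 0:
            raise RuntimeError("no older image to roll back to")
        self.running = self.saved[index - 1]
        return self.running


etl = Deployment()
etl.deploy("etl:1")
etl.deploy("etl:2")          # changes deployed to prod
restored = etl.rollback()    # danger, roll back to the older image
print(restored)
```

Older versions stay in `saved` after a rollback, so the newer image remains available for inspection or a later redeploy.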

Step 3: Assimilate!

(i.e., be kind to your devs)

Assimilate (contd)
- HTTP APIs
  - OpenCPU, rapier
- Docker containers
  - Rocker
- Command line tools
  - Rscript, littler, docopt
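
As a rough illustration of the "command line tools" route, here is a Python analogue of wrapping an analysis function in a CLI. The deck itself names R tooling (Rscript, littler, docopt), which this sketch does not use. The `summarize` function and its output format are hypothetical.

```python
import argparse
import statistics


def summarize(values):
    """The 'analysis': summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main(argv=None):
    """Parse arguments, run the analysis, print tab-separated results."""
    parser = argparse.ArgumentParser(description="Summarize a list of numbers.")
    parser.add_argument("values", nargs="+", type=float, help="numbers to summarize")
    args = parser.parse_args(argv)
    for key, value in summarize(args.values).items():
        print(f"{key}\t{value}")
    return 0


# Demonstration call with explicit arguments. In a real script this would be
# sys.exit(main()) under an `if __name__ == "__main__":` guard.
exit_code = main(["1", "2", "6"])
```

Keeping the analysis in a plain function and the argument parsing in `main` means developers can call the same logic from a shell, a scheduler, or other code.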

Interviewing
● What we want
  ○ Problem solving ability
  ○ Typical question: how would you approach a problem we are currently working on?
● What others want
  ○ It depends! :)
  ○ Orgs with DS will ask you the standard DS questions

Interviewing

Orgs without DS will ask you about
1. Search / recommendations
2. SQL
3. Data engineering (how would you use Hadoop to …?)
4. Software development

Orgs without DS will evaluate you as a
1. Subject matter expert (search/recs)
2. Software engineer
3. DBA / SQL analyst
4. Product manager

Be prepared! (study ‘Cracking the Interview’ or similar)

Thank you!

[email protected]