applied ai tech talk: how to setup a data science dept
Tech Talks: How to Setup a Data Science Business Function Jun 2015 www.applied.ai
How to Setup a Data Science Business Function
Applied AI Tech Talk
● We are data scientists:
○ variously quants, statisticians, actuaries & machine learning types
● We are consultants:
○ we do complex data analysis, predictive modelling, etc.
○ and we also help with the soft stuff...
… enabling companies to learn from their data in a sustainable way
This is a totally biased talk
Like any collaborative business effort involving research & development, a data science function should be built carefully in order to enable the best expertise and technologies.
- Me, ~2 weeks ago
http://blog.applied.ai/how-to-build-a-data-science-business-function/
How to Setup a Data Science Business Function a.k.a.
Making in-house Data Science sustainable
Data Science is a broad discipline
● Including, for example:
○ one-off scenario-specific modelling exercises
○ on-line predictive modelling of user actions
○ regular analysis of campaigns and customer discovery
… and a significant amount of data acquisition, preparation, storage, etc.
● To be sustainable and minimise risk, we need to combine:
○ great people
○ advanced maths
○ scientific experimentation
○ software engineering
○ high-quality data
○ solid business practices
○ communication
The most important thing is communication
https://www.quora.com/How-could-the-Data-Science-Venn-Diagram-be-improved
Four main areas to cover:
1. Setting up and sizing the team
2. Defining and operating projects
3. Systemising the data pipeline and analyses
4. Ensuring effective communication
… to help us make in-house Data Science sustainable
1. Setting up and sizing the team
Data Scientists need a lot of skills!
● The practitioner will use a wide variety of tools to:
○ acquire, manipulate, store and access data efficiently
○ design surveys and scientific experiments to test hypotheses
○ undertake statistically valid analyses
○ implement high-quality, optimised predictive models
○ derive and communicate actionable insights
… requiring diverse skills covering database management, software engineering, statistical analysis, machine learning, graphic design, ethics, social responsibility, domain knowledge and communication.
1. Setting up and sizing the team
Don’t believe in unicorns
● But the days of hiring a single, unicorn-like, 'full-stack' data scientist are pretty much gone, and probably never really existed.
1. Setting up and sizing the team
Start with a small, focused team
The team needs to be small, agile and focused:
● 2-6 data scientists is ample
● they should be proven generalists, team players and pragmatists
● they should be able to cope with vague requirements, messy data and high failure rates
“The first hire(s) should help get three things ready: your data; a clear problem to be solved; and a process to evaluate the business impact of any new solution.”
- Simon Chan, Forbes, April 2015 http://www.forbes.com/sites/theyec/2015/04/30/how-to-do-your-first-data-science-hire-right/
2. Defining and operating projects
Any piece of research or development likely to last more than a few days and/or involve more than one person should have:
● A primary sponsor and a project leader
● A well-defined (SMART) goal and a written spec
● Progress meetings to validate and update the plan, with full and frank communication between major stakeholders
● Knowledge sharing upon completion
● Optionally, a basic RACI matrix and a risks & issues register.
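To make the RACI suggestion concrete, here is a minimal Python sketch of a RACI matrix kept as plain data, with a check for the common convention that every task has exactly one Accountable person and at least one Responsible person. The task names, people and the `raci_problems` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical RACI matrix: task -> {person or role: RACI letter}.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
RACI: dict[str, dict[str, str]] = {
    "define success criteria": {
        "sponsor": "A",
        "project_lead": "R",
        "data_scientist": "C",
        "marketing": "I",
    },
    "prepare training data": {
        "project_lead": "A",
        "data_scientist": "R",
        "data_engineer": "R",
    },
    # Deliberately incomplete: nobody is Accountable for deployment.
    "deploy model": {
        "project_lead": "R",
        "data_engineer": "R",
    },
}


def raci_problems(matrix: dict[str, dict[str, str]]) -> list[str]:
    """List tasks that break the usual RACI conventions."""
    problems: list[str] = []
    for task, assignments in matrix.items():
        roles = list(assignments.values())
        # Any letter outside R/A/C/I is a data-entry mistake.
        unknown = sorted(set(roles) - set("RACI"))
        if unknown:
            problems.append(f"{task}: unknown role(s) {unknown}")
        # Exactly one person should be Accountable for each task.
        accountable = roles.count("A")
        if accountable != 1:
            problems.append(f"{task}: expected exactly one 'A', found {accountable}")
        # Someone has to actually do the work.
        if "R" not in roles:
            problems.append(f"{task}: nobody is Responsible")
    return problems


if __name__ == "__main__":
    for problem in raci_problems(RACI):
        print(problem)
    # prints: deploy model: expected exactly one 'A', found 0
```

Keeping the matrix as data rather than as a one-off slide makes it easy to re-check whenever people or tasks change.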
3. Systemising the data pipeline and analyses
Automate good workflows and deal with technical debt:
● Understand and map the data 'pipeline'
● Stop when the models are good enough, i.e. when further gains no longer justify the extra effort and complexity
● Encourage a systematic, shared approach to the creation of all machine learning tools and analyses, with:
○ proper source control and documentation
○ code reviews & 'lunch and learn' seminar sessions
○ regular refactoring of algorithms, applications and data preparation scripts where appropriate.
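Mapping the data 'pipeline' can start as nothing more than a dependency graph. The minimal Python sketch below, assuming Python 3.9+ for the standard-library `graphlib` module, records which stage depends on which, derives a valid run order, and lists every stage affected when an upstream source changes. The stage names and the `run_order` / `downstream_of` helpers are hypothetical; production pipelines are often run by a dedicated workflow tool, but the underlying idea is the same.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline map: each stage -> the set of stages it depends on.
PIPELINE: dict[str, set[str]] = {
    "ingest_crm": set(),
    "ingest_web_logs": set(),
    "clean_crm": {"ingest_crm"},
    "clean_web_logs": {"ingest_web_logs"},
    "join_customers": {"clean_crm", "clean_web_logs"},
    "build_features": {"join_customers"},
    "train_model": {"build_features"},
    "campaign_report": {"join_customers"},
}


def run_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order (dependencies first).

    Raises graphlib.CycleError (a ValueError subclass) on a circular dependency.
    """
    return list(TopologicalSorter(pipeline).static_order())


def downstream_of(pipeline: dict[str, set[str]], stage: str) -> set[str]:
    """Return every stage that directly or indirectly depends on `stage`."""
    affected: set[str] = set()
    frontier = [stage]
    while frontier:
        current = frontier.pop()
        for name, deps in pipeline.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(run_order(PIPELINE))
    print(sorted(downstream_of(PIPELINE, "ingest_crm")))
    # prints: ['build_features', 'campaign_report', 'clean_crm', 'join_customers', 'train_model']
```

Because `static_order()` refuses to order a graph with a cycle, simply running it is a cheap sanity check that the pipeline has been mapped consistently.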
4. Ensuring effective communication
Strong communication within and outside the team is vital, helping to ensure that projects stay on track and issues are spotted early:
● Daily stand-up meetings (<10 mins), sharing immediate activities & issues
● An up-to-date communal task schedule, e.g. using the Kanban method
● Simplified and centralised comms tech; move written discussions away from email and towards wikis, message boards, and group chats such as Slack
● Try to allow data scientists / software engineers the time & space to get into a productive flow state without meetings and interruptions.
In review
● Start with a small team of capable generalists and work hard to define the business problems and success criteria, set timescales, and understand & access the available data
● Allow for and embrace failure; give data scientists time and space to research and experiment
● Specialise when necessary, automate where possible and embed the work into an ongoing cycle of development, maintenance and support.
● Require a corporate sponsor with clout and encourage strong communication within the team and with the rest of the business
http://blog.applied.ai/how-to-build-a-data-science-business-function/
Applied AI is a data science consultancy
We provide data-driven insights and solutions using applied artificial intelligence
www.applied.ai
Thank You
Any questions?