big data + data science startup focus points
BIG DATA & DATA SCIENCE START-UP FOCUS POINTS
+ BUSINESS AND TECHNOLOGY REFERENCE ARCHITECTURE
@TomZorde
I HAVE AN IDEA FOR A DATA SCIENCE START-UP
• Use these slides to focus conversation
• What stage are you at?
• What is the problem you’re trying to solve?
• What type of business model would work?
• Tools? – A rapidly evolving space.
• Reference Architecture helps identify what level of the stack we’re talking about.
AREAS OF EARLY FOCUS
SEED STAGE - Research & Development
1. Research & Define Concept, business model, internal & sourced capabilities
2. Define customer value proposition and identify target market
ANGEL – Business Planning & Product Development
3. Identify services and products required and evaluate gaps for go-to-market readiness
4. Source funding partner to build minimum viable product and get commitment for round 2 funding
5. Assemble team and build MVP prototype exceeding expectations
ROUND 1/ SERIES A FUNDING – Commercially operational
ROUND 2 / SERIES B FUNDING – Fully Operational
ROUND 3 / SERIES C FUNDING – Expansion
IPO/ ACQUISITION
BUSINESS PLANNING & DEVELOPMENT - LOGICAL STEPS
1. Full business needs and information requirements analysis. Business drivers:
• Revenue generation? Cost reduction? Customer retention? Compliance?
• Process Improvement? Fraud detection? Analytics? Dashboard?
• Solving a tough problem? Retiring/replacing assets, technologies and systems?
2. Technology Evaluation and Selection
• Define requirements and objective first
• Evaluate a variety of technology stacks – develop an evaluation framework first
3. Board Support for Start-up Resources
4. Prototyping, Discovery, and Planning
• Rent infrastructure in the cloud – VMware, AWS, MS Azure and others
• Use spare hardware and network bandwidth
• Assessment, proposal, and project/program plan for next steps
• Start small and keep delivering
5. Architecture Design, Estimation, Business Case
6. Obtain funding and executive sponsorships, owners, etc.
7. SDLC, don’t forget Hardware, Security, Testing, Data governance etc.
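The "develop a framework first" advice in step 2 can be sketched as a weighted scoring matrix. This is only an illustration: the criteria names, weights, stack names, and ratings below are hypothetical placeholders, not taken from the slides, and should be replaced with the requirements and objectives defined for the actual start-up.

```python
# Hypothetical criteria and weights (assumptions for illustration only).
WEIGHTS = {
    "fit_to_requirements": 0.4,
    "maturity": 0.2,
    "skills_available": 0.2,
    "cost": 0.2,
}


def score_stack(ratings, weights=WEIGHTS):
    """Return the weighted average of the ratings for one candidate stack."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_stacks(candidates, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [(name, score_stack(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder ratings on a 1-5 scale for two made-up candidates.
candidates = {
    "stack_a": {"fit_to_requirements": 5, "maturity": 3, "skills_available": 2, "cost": 4},
    "stack_b": {"fit_to_requirements": 3, "maturity": 5, "skills_available": 4, "cost": 3},
}

ranking = rank_stacks(candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Agreeing on the criteria and weights before looking at any vendor keeps the evaluation tied to the business drivers from step 1 rather than to whichever tool is most familiar.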
FORESEEABLE CHALLENGES
Business urgency, time to market pressures
• A Big Data/Data Science start-up needs careful planning
• Big Data needs infrastructure, software stacks, people, and a start-up plan
Lack of Big Data Resources, Lack of Sponsorships (except in some companies)
• Big Data is complex and requires multiple skill sets (mostly new to many companies) – Infrastructure, Administration, Security, Programming, Testing, etc.
• Skepticism about Big Data
Integration with Existing Technologies and Systems
• Cannot develop isolated big data solutions
• Integration with existing systems will be a top challenge (requires both sides to do additional work)
Open Source: Stability, Maturity, and Security
INFORMATION AS A PRODUCT/SERVICE
TYPES OF RELEVANT BUSINESS MODELS
Differentiation
New Services
Customer Experience
Contextual Relevance
Brokering
Raw Data
Benchmarking
Analysis and Insight (Meta Data)
Delivery
Market Place
Facilitator
Advertising
REFERENCE ARCHITECTURE
Decisions & Insight
Analytics & Discovery
Data Access and Distribution
Data Collection & Organisation
Infrastructure Platform
Monitoring, Alerts, Tools, Security, Governance
• The technology stack is rapidly evolving with all traditional as well as new vendors providing offerings
• Open source tools remain at the foundation layers.
• Different use cases will require different technology tools.
REFERENCE ARCHITECTURE
Decisions & Insight
• IBM Watson
• Industry Specific
Analytics & Discovery
• SAP Business Objects
• IBM Cognos
• SAS Analytics
• Dell Statistica
• Oracle Hyperion
• Microsoft BI
• KNIME
• Pentaho
• Informatica
REFERENCE ARCHITECTURE
Data Access and Distribution
• Document: MongoDB, CouchDB
• Graph: Neo4j, Titan
• Key Value Pair: Riak, Redis
• Columnar: Cassandra, HBase
• Search: Lucene, Solr, Elasticsearch
Monitoring, Alerts, Tools, Security, Governance:
• Hadoop: Apache, Cloudera, Hortonworks, MapR, IBM
• SQL Mapping: Hive
• Big Data Transformation: Pig
• Hadoop Load: Sqoop
• Realtime-ETL: Storm
• Cluster Computing: Apache Spark
• Languages: Python, Java, R, Scala
REFERENCE ARCHITECTURE
Data Collection & Organisation (Batch & Real-Time)
• Hadoop
• Hadoop MapReduce
• Mahout
Infrastructure Platform
• AWS
• Azure
• Mortar
• Google BigQuery
• Qubole
• Dell
• HP
• IBM
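To make the point that the reference architecture helps identify which level of the stack a conversation is about, the layers above can be written down as a simple lookup. This is a minimal sketch: the layer names and tools are a subset copied from the slides, while the `REFERENCE_ARCHITECTURE` structure and the `layer_of` helper are hypothetical names introduced only for illustration.

```python
# Layers ordered top to bottom, each with a subset of the tools listed on the slides.
REFERENCE_ARCHITECTURE = [
    ("Decisions & Insight", ["IBM Watson"]),
    ("Analytics & Discovery", ["SAS Analytics", "KNIME", "Pentaho"]),
    ("Data Access and Distribution", ["MongoDB", "Neo4j", "Redis", "Cassandra", "Solr"]),
    ("Data Collection & Organisation", ["Hadoop", "Hadoop MapReduce", "Mahout"]),
    ("Infrastructure Platform", ["AWS", "Azure", "Google BigQuery"]),
]


def layer_of(tool):
    """Return the layer a tool is listed under, or None if it is not listed."""
    wanted = tool.casefold()
    for layer, tools in REFERENCE_ARCHITECTURE:
        if any(t.casefold() == wanted for t in tools):
            return layer
    return None


for tool in ["neo4j", "Mahout", "AWS"]:
    print(f"{tool}: {layer_of(tool)}")
```

A tool missing from the table returns `None`, which is a prompt to decide where it belongs before comparing it with alternatives at the same layer.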
Thank you