proposal digipolis acpaas - vub course published.pdf · myths and misconception debunked....
TRANSCRIPT
VUB Lecture
March 2019
Topics
• Introduction
• Getting into the industry
• Academy
• Audit report
• Proposal NLP
• What I wish I had known
• Start up work
• Ethics
• Research
My path
• 2010: Bachelor in Physics, ULB.
• 2012: Master in Astrophysics and Space Sciences, ULg
• 2017: PhD in Oceanography, GHER, ULg
• Machine Learning Engineer at Omina Technologies
Skills: Numerical Modelling, Data Assimilation, Artificial Intelligence
Master
• Astrophysics and Space Sciences:
• Astronomy
• Celestial Mechanics
• General Relativity
• Atmosphere
• Optics
• Physical Oceanography
• Numerical methods and programming
Master Thesis
• Discovered applications of numerical modelling in Oceanography.
• Numerical modeling of the 2011 Tōhoku tsunami and its propagation in the Pacific ocean.
• Programming in MATLAB, Fortran and Bash.
• Handling data, working on a supercomputer, …
PhD
• GeoHydrodynamics and Environment Research
• Supervision of Dr. Alexander Barth
• Research, and some side activities:
• EGU
• Calvi
• Workshops
PhD
• Thesis: Constraining model biases in a global general circulation model with ensemble data assimilation methods
Looking for a job
• Questions:
• What do I want to do?
• What can I do?
• Where to find opportunities?
Deterministic Modelling
Data Assimilation
Artificial Intelligence
Build your network !
Current career
• Research and development in artificial intelligence
• Consultancy at other companies:
• Telecommunications
• Banking
• Marine technologies
• Transportation
• Home automation
• …
• Developing products and services
• Talks at conferences and AI schools
To the industry
We give you the power to turn your big data into smart data
We give you the knowledge to take important decisions at lightning speed
We give you the ability to learn from the past and predict the future
Omina Leadership
MANAGEMENT TEAM
• Tom Vervust, COO
• Rachel Alexander, CEO
• Lynn Westall, VP Bus Dev
ADVISORY BOARD
• Johan Loeckx, Head of VUB AI Laboratory
• Freddy Van Den Wyngaert, Former CIO AGFA Gevaert
• Pol Hauspie, Former CEO Lernout & Hauspie
• Luke Petershmidt, Entrepreneur & Creator of Bakugan
Omina Technologies
Omina Consultancy
Omina Core®
Full Stack Machine Learning Software Platform
Our powerful and unique ML software has been designed from the ground up to be
easily tailored for companies across a wide set of verticals and their diverse use
cases.
Scientific Research
Research and development partnership with the VUB for semi-generic toolbox
development.
End-to-End AI Solution Development
End-to-end development and support of AI solutions, customized for a wide range of industries.
Omina Academy
Security & Legal
Data security, legal and privacy issues when dealing with data. How to set up AI experiments and implementations.
HR
Will AI replace all the humans in your company? What might be the impact on your organization? How do you prepare for it?
What is AI
ABC of AI. Myths and misconceptions debunked. Difference between AI and automation. How could it benefit your business? What is our approach?
Data is the Key
Is your data valuable or not? Understand the importance of data and learn how to optimize your data now.
Governance
How to implement AI in your company according to your governance. Best practices & case studies that worked.
Best Practices
Ready to step into the driver's seat? How did other companies successfully start innovating?
Management Report
Omina Academy
December 2018
AI Globally
Understanding AI
❑ The general customer's picture of AI comes from science fiction: Terminator, I, Robot, Black Mirror, …
▪ Intelligence/Consciousness.
▪ General/Narrow intelligence.
▪ It will take over jobs.
▪ Doomsday is coming.
▪ Magically solve all problems.
❑ Consequences:
▪ Lack of credibility, Overselling.
▪ Marketing tool: AI for selling.
▪ Hypes and AI Winters.
https://wallpaper-gallery.net/single/i-robot-wallpaper-18.html
[Figure: diagram relating AI, Classical Machine Learning, NN and Deep Learning]
Deep Learning (Algorithm)
[Figure: neural network with Input Layer, Hidden 1, Hidden 2 and Output Layer; output labels 0 to 9 and the example digit 4; legend: Fully Activated, Partially Activated, No Activation]
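As a rough illustration of the network in the figure, the sketch below runs one forward pass through a tiny fully connected network with two hidden layers. The layer sizes, the weights, and the choice of ReLU and softmax activations are assumptions made for this example; none of them come from the slide.

```python
import math

def relu(values):
    # Hidden units are "not activated" at 0 and increasingly activated above 0
    return [max(0.0, v) for v in values]

def softmax(values):
    # Turn raw output scores into probabilities that sum to 1
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """layers is a list of (weights, biases); ReLU on hidden layers, softmax on the output."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))

# Hypothetical toy network: 2 inputs -> 2 hidden -> 2 hidden -> 3 output classes
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
]
probs = forward([2.0, 1.0], layers)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

The class with the highest probability plays the role of the most strongly activated output unit in the figure. A digit classifier would use ten output units instead of three.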
Mathematics: In Relation to SVM
❑ Support Vector Machine Algorithm:
▪ Find the largest geometric margin between the labeled data points.
❑ Convex optimization problem:
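▪ $\max_{\hat{\gamma}, w, b} \hat{\gamma}$, with $\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$
Subject to the constraint: $y^{(i)} \left( w^T x^{(i)} + b \right) \geq \hat{\gamma}$
❑ Solved using the generalized Lagrangian:
▪ $L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$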
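To make the geometric margin concrete, the sketch below evaluates the per-point margin for a fixed, hand-picked hyperplane and takes the smallest one. The data points and the values of `w` and `b` are hypothetical. The code only evaluates the margin; it does not solve the optimization problem.

```python
import math

def geometric_margins(points, labels, w, b):
    # Per-point margin: y * ((w / ||w||)^T x + b / ||w||)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
            for x, y in zip(points, labels)]

# Hypothetical, linearly separable toy data with labels in {-1, +1}
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 1.0), -2.0

margins = geometric_margins(points, labels, w, b)
margin = min(margins)  # the margin of the classifier is the smallest per-point margin
print(margin)
```

Rescaling `w` and `b` by the same positive factor leaves this value unchanged, which is why the problem can be normalized before it is handed to a solver.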
Credit Risk Classifier
Train the model with major classification algorithms
A decision tree classifier is a simple and widely used classification technique. The key objective of the learning algorithm is to build a predictive model that accurately predicts the class labels of previously unseen records.
A decision tree classifier organizes a series of test questions and conditions in a tree structure.
Model 2: Decision Tree
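A single test question of such a tree can be sketched as follows: the code searches for the threshold on one feature that gives the lowest weighted Gini impurity. The feature (a debt-to-income ratio), the data, and the use of Gini impurity as the split criterion are assumptions chosen for illustration, not details from the slide.

```python
def gini(labels):
    # Gini impurity: 0.0 when all labels in the group are identical
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' test."""
    best = (None, float("inf"))
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: debt-to-income ratio and whether the loan defaulted
ratios = [0.10, 0.20, 0.25, 0.40, 0.55, 0.60]
defaulted = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(ratios, defaulted)
print(threshold, impurity)
```

A full tree repeats this search recursively on each side of the split until the groups are pure or a depth limit is reached.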
Management Report
Data Science Audit
December 2018
AGENDA
Management Report Meeting
Presentation of the content of the Management Report:
1. Capacity report
2. List of possible use cases from the interviews
3. Benchmark of the competition and current AI activities
4. Strategic roadmap to achieve AI transition
5. State of the art developments in AI in marine technologies
Skill workshop: to be planned in January 2019
AUDIT OBJECTIVES
• Provide a general overview of AI trends in the market
• Access publicly available data regarding competitors
• Semi-structured interviews will inform the capacity report
• Survey of currently used models and technologies will also inform the integration of ML and AI technologies
• Strategic roadmap will be informed by both capacity and benchmarking reports
• Skills workshop aims to build capacities in ML and AI technologies
• Data deep dive will provide strategic direction to tackle concrete use cases
PROJECT SCOPE
• Benchmark report of the current state-of-the-art developments in AI and ML in the business domain
• Capacity report analysing current capacity with regard to AI and ML technologies
• Strategic roadmap for integrating AI and ML technologies into current and future offerings
• Bundled management report
• Tailored skills workshop presenting findings of capacity and roadmap reports
• Deep dive into use case
Capacity Report: Strengths
• Core development teams
• Interest in AI possibilities
• Excellent practical programming knowledge
• Company-wide positive feeling towards AI
• Strong theoretical background for AI
Capacity Report: Weaknesses
Lack of general AI knowledge
• General concepts and domains
• Lack of examples on past projects
• No link between currently used tools and AI equivalent
Limited Python experience
• Mainly MATLAB and Excel formulas
• No knowledge about AI libraries
Project Specificities
• Stringent guidelines and rules to follow
• Licensed software
Niche Market Player
• No repetitive work
• Often unique projects
• Difficulty to identify « quick wins »
Research and Education
• Difficulties to keep up-to-date with latest developments
• Lack of knowledge around AI applications in respective fields
List of Use cases
Team 1
• Bathymetry for sea-bed migration
• Fibre optic cables protection monitoring
• Soil thermo-conductivity investigation
• Darwin platform for world-wide windfarm data
Team 2
• Deterministic model replacement
• Local data-driven forecasts
Team 3
• Overtopping Neural Networks
• Coastal design parameter response
• Material properties parameter response
• Empirical processes formula replacement
Team 4
• Image recognition on satellite data
• Insight in complex systems
• Complex model local replacement
Team 5
• Breakwater structure flow investigation
• Historical dike failures stability modelling
Team 6
• Flood risk estimation
Team 7
• Missing data filling from time series
• Image processing for data extraction
Competition Benchmark
Estimation of AI capacity
• Financial capabilities
• History and length of AI activity
• Historical interest in AI
• AI technology implementation
• AI skills
• Partnership or in-house skills
• Active recruitment of AI experts
Leaders
Deltares
Royal HaskoningDHV
Artellia
Atkins
Followers
Antea
Cowi
Sumaqua
Sweco
VITO
Latecomers
Intertec
Arcadis
DHI
Egis
HKV
Outsiders
Hydroscan
Witteveen + Bos
ECOSHAPE
SBE
Strategic Roadmap: Classic approach
Execute pilot program
• Build up familiarity
• Get stakeholders' interest
• Get support
• Find a project with business value
• Avoid difficulties
Extend in-house capabilities with AI team
• Continue on momentum
• Define cohesive company strategy
• Share acquired knowledge
Acquire broad AI training
• Broaden common knowledge
• Provide courses and training to relevant people
• Brainstorm ideas
Develop an AI strategy
• Consolidation of AI approach
• Obtain competitive edge
• New/better product
• More customers
• More data
• Support company shift towards AI
Develop communication
• Effect on stakeholders
• Care for guidelines and rules
• Customer care
Strategic Roadmap: Market approach
AI suggested approach:
• Develop unique AI solution where no classic approach is possible
• Combine AI and domain knowledge into AI leaders in every department
• Learn from academia exploration
AI implementation obstacles:
• Uniqueness of every project
• Lack of data from previous projects
• Domain specific knowledge requires high innovation potential
Competitor 1
• Has built a long history of AI activity in research
• Strong AI backgrounds and capabilities in-house
Competitor 2
• Partnership with data science company
• Acquisition of AI company
• Combination of domain knowledge and AI experts in-house
Competitor 3
• Development of specific products
• Partnership with Oracle and IBM
Strategic Roadmap: Solution
Bringing in knowledge
Identifying technologies
Approaching customers
Bringing and building knowledge
Recruitment of 1-2 AI experts:
• Support for proposal writing
• Transfer knowledge
• Identification of previous use-cases
• Implementation support
Training of department volunteers:
• MOOC
• Self-education and research
Focus efforts on knowledge transfers
Approaching customers
Identification of potential innovation partners
Development of new products
Strategic Roadmap: Technologies
Objectives:
• Complex process insight
• Model replacement
• Local forecasts
Neural Networks:
• CNN
• RNN
• Deep Learning
Model knowledge:
• SVM
• RF
• K-means
General knowledge:
• Bias/variance trade-off
• Parameter tuning
• Condition of applicability
To avoid:
• Meta-Learning
• Genetic Algorithms
• GANs
Proposal NLP Engine
Digipolis - A001021
November 2018
Requirements
• Divide texts into sentences, words and parts of words.
• Identify semantic relations between entities, words and sentences.
• Tag the content of documents.
• Identify similarities between words, sentences and documents.
• Identify items such as places, names and dates.
• Perform sentiment analysis on the author of the message and estimate the objectivity of a sentence.
• Build a corpus around content domains.
• Learn from feedback.
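The first requirement, splitting text into sentences and words, can be sketched with regular expressions from the Python standard library. This is a naive stand-in for illustration only: the splitting rules below are assumptions, and a production engine would rely on a dedicated NLP toolkit that handles abbreviations, numbers and language-specific cases.

```python
import re

def split_sentences(text):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    # Words are runs of letters, digits or underscores; punctuation is dropped
    return re.findall(r"\w+", sentence)

text = "Marie vroeg zich af. Waar is de balpen?"
sentences = split_sentences(text)
words = [split_words(s) for s in sentences]
print(sentences)
print(words)
```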
Requirements
• Use case 1: Portal of the city of Antwerp. Orders are placed through the portal. The solution must be robust and link misspelled words and semantically related words, e.g. "pencil", "pensil" or "pen".
• Use case 2: Following social media, in particular Twitter (with Twitter feeds; I guess an engine already exists). City events get feedback from citizens on Twitter. The solution must be able to retrieve the citizen's attitude towards a particular event from tweets.
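For the misspelling part of use case 1, a minimal sketch with `difflib` from the Python standard library is given below. The catalogue contents and the `suggest` helper are hypothetical. Plain string similarity only covers spelling variants such as "pensil"; linking semantically related words such as "pen" and "pencil" would need a different technique, for example a thesaurus or word embeddings.

```python
import difflib

# Hypothetical product catalogue for the ordering portal
CATALOGUE = ["pencil", "pen", "paper", "stapler"]

def suggest(word, catalogue=CATALOGUE, cutoff=0.6):
    """Return the closest catalogue entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), catalogue, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("pensil"))
print(suggest("xyz"))
```

The `cutoff` value controls how tolerant the matching is: a higher value returns fewer but safer suggestions.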
Operational Requirements
• Support: Support of the delivered product, including support costs.
• Infrastructure: On premise or in the cloud; a description of the platform used to deliver the solution is required. Digipolis also requires the minimum capacity of storage, memory, CPU, networking, …, and which operational constraints are crucial.
• Availability: The solution is to be delivered as a generic platform that can tackle both given use cases. The ultimate use of the solution will be business critical. Omina will need to provide assurances on the availability of the solution.
• Logging and monitoring: Omina will need to describe the planned built-in logging and monitoring, which must be operational when the product is delivered and used.
Proposal Rating
Price model: 25%
• Budget compared to deliverable
• Maintenance cost after delivery.
• Marketing: How is the product available on the market? Open source, license, support, SaaS?
Concept: 25%
• Availability, quality, concept
• Design principle: API, microservice, …
• Architecture, components?
• Wireframes, POC?
Development plan: 25%
• Agile principles
• Transparent sprint planning and monthly or bimonthly releases.
Technical aspects: 25%
• Use of future-proof technologies.
• Re-use of ACPaaS components.
• Reusability of the solution.
• Innovation of the solution.
Step 1: Connect to the Digipolis NLP interpreter engine
Upload your Document(s)
NLP ENGINE
Step 2: Choose the desired output
Select your service
SELECT LANGUAGE
+
SERVICE
Step 3: Upload the document(s) to be analyzed
Upload your Document(s)
UPLOAD DOCUMENT(S)
Step 4: Request is sent to the NLP engine
Select your service
REQUESTED
Step 5: Receive result from the NLP Engine
Token number | Token | Lemma | Morphological segmentation | PoS tag | Confidence in the PoS tag | Named entity type | Base | | Type of dependency
1 | Marie | Marie | [Marie] | SPEC (deeleigen) | 1.000000 | B-PER | B-NP | 1 | Su
2 | Vroeg | Vragen | [vraag] | Ww(pv, verl, ev) | 0.532544 | 0 | B-VP | 0 | ROOT
3 | Zich | Zich | [zich] | VNW (refl, pron, obl, red, 3, getal) | 0.999740 | 0 | B-NP | 2 | Se
4 | Af | Af | [af] | VZ [fin] | 0.996853 | 0 | 0 | 2 | Svp
RESULTS
Possible procurement platform wireframe
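Bestelformulier (order form)
Naam (name): Jan Janssen
Departement (department): WZC De goede deugt
Type bestelling (order type): Kantoormateriaal (office supplies)
Product: Stylo
Bestellen (order) / Annuleren (cancel)
Bedoelde u (did you mean): Woonzorgcentrum De Goede Deugd
Bedoelde u (did you mean): Balpen (ballpoint pen)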
Technical Architecture: Via User Interface
START
1. I log onto the NLP User Interface.
2. I want one or more documents processed by the NLP Engine.
3. I choose: a) in which language the documents are written; b) which domain the documents belong to.
4. I choose which actions are to be performed on the documents: a) Item ID; b) Semantic Extraction; c) Content Extraction; d) etc.
5. An API is called: the document and chosen parameters are passed in the JSON.
6. JSON is passed to the pre-processor. This is then passed to the STEM processor. The output is a normalized document upon which NLP actions can be performed.
7. NLP microservices are activated based on the parameters in the JSON.
Main Micro Services: Sentiment Analysis, Domain Association, Semantic Extraction, Content Extraction, Item Identification, Spelling Check, Document Translation, XXX (Additional Services)
8. NLP Engine: 1) Documents are saved in a database; 2) Corpus (per domain) is updated.
9. An API is called passing the information requested in JSON format back to the end user.
10. JSON is translated to a user-friendly output for the user.
The NLP Engine is a combination of the following open source technologies customized for the use cases of Digipolis
- FROG
- LAMachine
- NLTK
- Python Libraries
MongoDB
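The request flow described for the architecture (a JSON request with the document and chosen parameters, pre-processing, then activation of the microservices named in the JSON) can be sketched as below. All field names (`language`, `document`, `services`) and both placeholder services are hypothetical; the slides do not specify the JSON schema, and real services would wrap the listed NLP tools.

```python
import json

# Hypothetical placeholder services standing in for the real microservices
def item_identification(text):
    # Crude stand-in: treat capitalized tokens as candidate items
    return [w for w in text.split() if w[:1].isupper()]

def word_count(text):
    return len(text.split())

SERVICES = {
    "item_identification": item_identification,
    "word_count": word_count,
}

def handle_request(raw_json):
    request = json.loads(raw_json)
    # Stand-in for the pre-processor: collapse repeated whitespace
    text = " ".join(request["document"].split())
    results = {}
    for name in request["services"]:
        if name not in SERVICES:
            results[name] = {"error": "unknown service"}
        else:
            results[name] = SERVICES[name](text)
    # The answer goes back to the caller as JSON
    return json.dumps({"language": request.get("language"), "results": results})

request = json.dumps({
    "language": "nl",
    "document": "Marie  vroeg zich af",
    "services": ["item_identification", "word_count"],
})
response = json.loads(handle_request(request))
print(response)
```

Keeping the services in a dictionary keyed by name means an additional service can be registered without touching the dispatch logic.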
Project Plan
[Gantt chart, November 2018 to February 2019 (W46 to W9). Sprint rows: Sprint 1.1 and 1.2, Sprint 2.1 and 2.2, Sprints 3-9, Sprint 10.1 and 10.2, Sprint 11. Chart labels: Kick-off, Data Deep Dive (1.1, 1.2), Infrastructure, Pre-Processing (2.1, 2.2), Item Module, Corpus, Front-end Input (10.1), Front-end Output (10.2), Test, Train, Deploy, Improvements. Module stages: Requirements, Implementation, Test, Deploy.]
Project Plan - Phases & Milestones
Phases: Kick-Off, Data Deep Dive, Develop Pre-Processing Module, Development Core Modules, Infrastructure Set-up
Sprints: Sprint 1.1 and 1.2, Sprint 2.1 and 2.2, Sprint 3-9
Activities:
• Mobilize the project team
• Align sprint planning & approach
• Workshops with end-users to understand system requirements (inputs & outputs)
• Data exploration to evaluate complexity
• Understand existing environment
• Identify infrastructure requirements
• Connecting databases
• Set up / adjust the IT landscape
• Decomposing and transforming data into usable formats for micro services
• Process to clean data
• Collect requirements from SMEs, implement solution, test and validate with stakeholders
• Service integration into ACPaaS engine
Deliverables:
• Data Analysis Report
• Infrastructure blueprint & workable IT environment
• Cleansed Data
• Implemented micro services
Roles involved: Sponsor, Subject Matter Experts, DBA, Project Manager, Data Engineer, Machine Learning Engineer, Software Developer
Project Plan - Phases & Milestones
Phases: Development Corpus, Front End Developments, Test, Train & Deploy, Support, Continuous Improvements
Sprints: Sprint 3-9, Sprint 10.1 and 10.2, Sprint 11
Activities:
• Collect requirements from SMEs, implement solution, test and validate with stakeholders
• Service integration into ACPaaS engine
• Extend existing ACPaaS user interface
• Create output visualization for end users
• Consolidation of micro services
• All-round demo
• Extend functionality with additional features (tbd)
• Preparation of test cases
• Integration testing
• User acceptance testing
• Performance testing
• Review technical documentation for handover
• Create user guidelines
• Go-live & hypercare
• Handover to Digipolis
• Collect customer feedback
• Continuous improvements
• Maintenance & support
Deliverables:
• Implemented Corpus module
• End user application interface
• Improved micro services & test cases
• Tested, documented and deployed solution
• Supported solution
Roles involved: Sponsor, Subject Matter Experts, DBA, Project Manager, Data Engineer, Machine Learning Engineer, Software Developer
What I wish I had known
November 2018
General Work Guidelines
• Know yourself and know what you do not want.
• Be pragmatic in work.
• Communication is key:
• Clients
• Business
• Colleagues
• Real situations are never optimal.
AI Guidelines
• Keep learning and stay ahead:
• One apple a day keeps the doctor away; one paper a day keeps the doctor away.
• Learn « hard » skills:
• Software engineering
• Big Data
• Learn « soft » skills:
• Presentation of work
• Project management
• Work methods: Waterfall, agile, scrum, lean, six sigma, kanban, PMI/PMBOK
• Learn about tools:
• Jira, Realtimeboard, Slack, …
Start-up environment
November 2018
Working at a start-up
• Freedom
• Innovation
• Growth opportunities
• Variety of work
• Actually building something from scratch
• Chaotic environment: Lack of explicit rules
• Lots of responsibilities
• Many different skills to learn
Ethics in AI
November 2018
Ethics
Be responsible.
Omina Core
November 2018
Omina Core
• Core idea: Make AI accessible to SMEs
• Semi-generic toolbox of modules
• Research partnership with VUB
Data Assimilation
November 2018
Model and observations
Data Assimilation Methods
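As a minimal illustration of what a data assimilation step does, the sketch below combines a model forecast (the background) with one observation of the same scalar quantity, weighting each by its error variance. This is the textbook scalar analysis update, shown here as an assumption-laden example; it is not necessarily one of the methods covered in the lecture, and the numbers are hypothetical.

```python
def analysis(background, background_var, observation, observation_var):
    """Scalar analysis step: blend forecast and observation by their error variances."""
    # Gain close to 1 trusts the observation, close to 0 trusts the model
    gain = background_var / (background_var + observation_var)
    state = background + gain * (observation - background)
    variance = (1.0 - gain) * background_var
    return state, variance

# Hypothetical numbers: model says 10.0, observation says 12.0, equal uncertainty
state, variance = analysis(10.0, 4.0, 12.0, 4.0)
print(state, variance)
```

With equal variances the analysis lands halfway between model and observation, and its error variance is smaller than either input.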
4DVar and Neural Networks