© 2014 Wendelin Project et al. – CC SA-NC
Wendelin Exanalytics2020 Big Data with MariaDB
2014-04-03 – Santa Clara
www.wendelin.io
Agenda
● Our background: ERP5
● Our future: Wendelin Exanalytics
● Our challenge: out-of-core
ERP5
● MariaDB, NEO, Python, ERP5
● Web, Workflow, HR, Document Management, Supply Chain, Finance, MRP, CRM
● Customisation
● Fine Grain Security, Full Traceability, Scalability
● Flexibility, Rapid prototyping, Zope TTW on steroids
● Banking, Aerospace, Health, Chemical, Government, NGO, Cloud Computing, Consulting, Mechanical
● Online contribution for 3rd parties, To-do lists, Notifications
● Careers and assignments, Payroll, Projects
TerraSAR-X Satellite
Accessible to Airbus partners and distributors
Interfaces with DLR (German Space Agency)
« With ERP5, our partners all over the world can access our infrastructure and order online with complete security. » Ralf Duering
Management of sales and production of images
Compliant with ESA standard (ECSS)
SANEF Group
« Web has become our primary sales channel. » Frédéric Charlier
Online sales and customer relation for ETC Tolling
120.000 new customers / year
51.000 invoices / hour
7.000.000 contacts / year
250 users
Implemented in 4 months
Open Source ERP/CRM for S&P 100
Agenda
● Our background: ERP5
● Our future: Wendelin Exanalytics
● Our challenges with MariaDB
Take the Best Analytics scikit-learn.org
Made by Great Mathematicians
http://en.wikipedia.org/wiki/Fields_Medal
Wendelin Werner
Add Distributed Storage neoppod.org NEO
Add Elastic PaaS erp5.com
# Initialize data
data_size = 1000000
server_count = 1000
chunk_size = data_size // server_count
data = array(data_size)
# Process data in parallel on each server (Map Reduce, Batch, etc.)
for server in range(server_count):
    data.activate().process(server * chunk_size, chunk_size)
PaaS
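The chunking idea on this slide can be tried on a single machine. The sketch below is only an illustration under stated assumptions: a thread pool stands in for the remote servers, and `process` and `process_in_chunks` are hypothetical names, not part of the ERP5 API shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def process(data, offset, size):
    """Hypothetical per-chunk job: sum one slice of the data."""
    return sum(data[offset:offset + size])


def process_in_chunks(data, worker_count):
    """Split data into at most worker_count chunks and process them in parallel."""
    if not data:
        return 0
    # Ceiling division so that the last chunk picks up any remainder.
    chunk_size = -(-len(data) // worker_count)
    offsets = range(0, len(data), chunk_size)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partials = pool.map(lambda off: process(data, off, chunk_size), offsets)
        # Reduce step: combine the per-chunk results.
        return sum(partials)


data = list(range(1000))
total = process_in_chunks(data, worker_count=10)
```

Each worker only sees an offset and a size, which is the same contract as `process(server * chunk_size, chunk_size)` on the slide.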
And Multicloud Deployment slapos.org
MMC Rus
Wendelin Exanalytics Core 100% open source
● NEO: Distributed Storage
● SlapOS: Multicloud Deployment
● Scikit Learn: Data Analytics
● ERP5: Elastic PaaS
● Multi Data Center
● 100% Python
Wendelin User Interface renderjs.org
Wendelin Options 100% open source
● Scikit Learn
● NEO
● Pandas: Time sequence processing (DataPad / JP Morgan)
● Numba / Parakeet: JIT compiler / type inference (Continuum / DARPA)
● Blaze: Full out-of-core arrays (Continuum / DARPA)
● Fluentd: Realtime log collection (Treasure Data / Amazon)
● NLTK: Natural Language Toolkit (U. Texas / Chalmers)
● 100% Python
Wendelin Applications
● Intrusion detection
● Fraud detection
● Business and economic forecasting
● Marketing
● Media analysis
● Public security
● Brain Computer Interface
● Internet Of Things
Business Model: German Style No VC
Nexedi (WendelinCo)
Scikit Learn
Big Data System User
Extension 1
Big Data System Supplier
Extension 2
100% open source hardware
100%
1 - 10% proprietary
Agenda
● Our background: ERP5
● Our future: Wendelin Exanalytics
● Our challenge: out-of-core
Out-of-core arrays
# Numpy
np.ndarray(shape=(2,2), dtype=float, order='F')
# Out-of-core data
np.ndarray(shape=(1e18,2), dtype=float, order='F')
1 Exabyte
# Full out-of-core
np.ndarray(shape=(1e9,2e9), dtype=float, order='F')
1 Exabyte
Best out-of-core topology depends on the algorithm and array geometry
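A small-scale way to experiment with out-of-core access is NumPy's `np.memmap`, which maps a file on disk and loads pages only when they are touched. This is a sketch under assumptions, not the approach planned for Wendelin: the file name, the tiny shape and the `column_sums` helper are all hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np


def column_sums(path, shape, block_rows):
    """Sum each column of an on-disk array, reading block_rows rows at a time."""
    # mode='r' maps the file read-only; data is paged in on access.
    arr = np.memmap(path, dtype=np.float64, mode='r', shape=shape, order='F')
    totals = np.zeros(shape[1])
    for start in range(0, shape[0], block_rows):
        totals += arr[start:start + block_rows].sum(axis=0)
    return totals


shape = (1000, 2)  # tiny stand-in for the shapes on the slide
path = os.path.join(tempfile.mkdtemp(), "array.dat")

# Create the backing file and fill it, then flush and release the mapping.
out = np.memmap(path, dtype=np.float64, mode='w+', shape=shape, order='F')
out[:, 0] = np.arange(shape[0])
out[:, 1] = 1.0
out.flush()
del out

sums = column_sums(path, shape, block_rows=128)
```

With `order='F'` each column is contiguous on disk, so a block of rows touches one region per column. A different algorithm or array geometry may favour a different layout, which is the point the slide makes.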
neo.ndarray out-of-core data
[Figure: neo.ndarray stores the flat sequence 1 2 3 4 5 6 7 8 9 10 11 12 as a 3 × 4 array with rows 1 2 3 4, 5 6 7 8 and 9 10 11 12]
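One way to picture an array whose data lives outside local memory is to keep each row as a separate block in a key-value store. The sketch below is purely illustrative: `ChunkedArray` is a hypothetical class, a plain dict stands in for the remote store, and nothing here describes how neo.ndarray is actually designed.

```python
class ChunkedArray:
    """Hypothetical 2-D array whose rows are stored as separate blocks."""

    def __init__(self, rows, cols, store=None):
        self.rows, self.cols = rows, cols
        # A dict stands in for the remote object store in this sketch.
        self.store = {} if store is None else store

    def _check(self, i, j):
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError((i, j))

    def __setitem__(self, index, value):
        i, j = index
        self._check(i, j)
        # Blocks are created lazily, so untouched rows cost nothing.
        block = self.store.setdefault(i, [0] * self.cols)
        block[j] = value

    def __getitem__(self, index):
        i, j = index
        self._check(i, j)
        block = self.store.get(i)
        return 0 if block is None else block[j]


a = ChunkedArray(3, 4)
for n in range(12):
    a[n // 4, n % 4] = n + 1  # flat position n maps to (row, column)
```

Only the block that holds the requested element has to be fetched, so the working set stays small even when the full array does not fit in memory.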
NEO Overview
● neoctl: State access, Command control
● Master: OID & TID allocation, Synchronisation, Load balancing
● Storage: Object data, Transaction data, Partition table
● Application: ZODB, neo.client
● Admin: State archival, Command proxy
● Legend: Data, Control
Object retrieval
Retrieve x : hash(x._p_oid)
S1 S2 S3
?
Node table:
  S1  IP:PORT  ?
  S2  IP:PORT  Connected
  S3  IP:PORT  ?
Partition table:
  Partition  Node    State
  0          S1, S3
  1          S2, S3
  ...        ...     ...
Variable
Variable
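The lookup on this slide can be sketched as two table lookups: hash the object id to a partition, then find the storage nodes that hold it. Everything below is an assumption made for illustration: the table contents mirror the slide, `zlib.crc32` is a stand-in hash, and the function names are hypothetical rather than NEO's API.

```python
import zlib

# Hypothetical tables mirroring the slide: two partitions over three nodes.
PARTITION_TABLE = {0: ["S1", "S3"], 1: ["S2", "S3"]}
NODE_TABLE = {
    "S1": {"address": "IP:PORT", "connected": False},
    "S2": {"address": "IP:PORT", "connected": True},
    "S3": {"address": "IP:PORT", "connected": False},
}


def partition_of(oid, partition_count):
    """Map an object id (bytes) to a partition number."""
    # crc32 is a stand-in hash; unlike hash(), it is stable across runs.
    return zlib.crc32(oid) % partition_count


def order_nodes(candidates, node_table):
    """Put already-connected nodes first, keeping the original order otherwise."""
    return sorted(candidates, key=lambda name: not node_table[name]["connected"])


def nodes_for(oid, partition_table, node_table):
    """Return the storage nodes to ask for oid, best candidate first."""
    partition = partition_of(oid, len(partition_table))
    return order_nodes(partition_table[partition], node_table)


oid = (42).to_bytes(8, "big")  # ZODB object ids are 8-byte strings
targets = nodes_for(oid, PARTITION_TABLE, NODE_TABLE)
```

Because every partition is listed on two nodes, a client can fall back to the second candidate when the first one is unreachable.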
Roadmap
● Q2 2014: neo.ndarray
● Q3 2014: developer release of Wendelin
● Q4 2014: neo.ndarray with simple optimizations
● Q1 2015: MariaDB embedded
● Q2 2015: coloured caches
● Q3 2015: coloured caches with C client cache
● Q4 2015: Go storage
Challenges
● Reduce latency → embedded MariaDB?
● Reduce SQL overhead → precompile queries?
● Reduce copies → BLOB protocol?
● Accelerate storage → C++? Go?
● Optimize cache → coloured caching