How did HADOOP make the NATIXIS PACK more efficient?
A short story by Front Office, PnL, Risks and Finance
Pierre Alexandre PAUTRAT, Cyril MONTAGNON, Emmanuel VAIE
Dataworks Summit Munich 2017, April 6th 2017, #DWS17



Introduction of each speaker: PAP, Cyril, Emmanuel

We are going to tell you about our experience at Natixis of the big Hadoop shift, from our legacy IT architecture to a new data-centric one. How did we structure and design the change process? How did we convince the different IT actors to embrace the change? And how did we avoid conflicts?
Pierre Alexandre will cover the history of the project, the way the infrastructure was built, and security and cluster governance. Cyril will talk about technologies: our successes and our ambitions. I'll provide some final points and try to make the journey as pleasant as possible!

Let us play the match again

1. How did we meet Hadoop?
2. A game phase: the Front Office, PnL, Risks and Finance departments
3. Hakas and Openings
4. Take Away
5. Q&A

The first point deals with history: pioneering. The second is about consolidation. The third deals with our feedback. The fourth is a synthesis. And the last point is for you.

How did we meet Hadoop?

Experiments done by two separate departments in 2014:
- Credit card analysis POC and site creation for Marketing (internal anonymized test only): Hadoop, Elasticsearch, Kibana
- NoSQL persistence for simulated profits and losses by the Market Risks Department: HBase
Informal exchanges between the Front Office and IT Risks by the end of 2014

How did we meet Hadoop?

- Big Data Thursday: an open meeting with a positive mood!
- June 2015: we built a first platform, secured as a production platform should be, to host our DEV.
- Target: go live with a PROD platform in summer 2016, provided the pilot projects went well and sharing a platform was acceptable to everybody (FO and IT Risks).
- January 2016: the first project results accelerated the decision to move forward, especially for regulatory hot topics.

Our Technical Journey

[Timeline figure, from the second half of 2014 to the production version at Natixis in March 2017. Labels shown: stack, HDFS, HBase, Hive, Kerberos, quick BI, data integration, Sqoop, Ranger, workflows, Ambari, training, backup and continuity plan, Python, standards, schedulers, Spark, backups, R, data governance, Atlas, WhereHows, Collibra, billing, platform governance committee, production platform, sponsor, Hadoop committers, Scala, Java, Spark ML, Phoenix, Kafka, integration with authorization, 2.5.3, LLAP, Zeppelin, automatic SSO, Indexima, Anaconda, archiving cluster.]

A game phase: the Front Office, PnL, Risks and Finance departments ecosystem

What events could have made us consider Hadoop as a viable solution?

Regulatory evolutions:
- RIM (Regulatory Initial Margin): the initial amount for your loan!
- FRTB (Fundamental Review of the Trading Book): the new vision of Market Risk for the ECB

Why?
- The data volumes
- The intrinsic immutability of HDFS: the data is never lost
- The ability to scale out horizontally

A game phase: the Front Office, PnL, Risks and Finance departments ecosystem

More efficient:
- Stop processing data sequentially
- Do not waste time transferring data from one NAS to another
- Go beyond the limits of the (usual) monolithic, centralized systems

The additional features:
- Process data where it is, in a common and secured place -> HDFS
- Precise and secured synchronization -> Kafka
- NoSQL persistence versus standard SQL -> HBase
- Connecting the Big Data universe to the Big Compute paradigm
- Added value: making Golden Sources available on the cluster

A game phase: the Front Office, PnL, Risks and Finance departments ecosystem

As a matter of fact, our different departments collaborate in the following way:

[Diagram of the data lake shared by the four departments]
- Front Office: positions, market data, Big Compute
- PnL: PnL certification
- Risks: risk scenario, compute, sensitivities certification
- Finance: regulatory, provision, accountancy

Present the data lake.

HAKA

What were the key points of success: weekly meetings

Exchange: Big Data Thursday
- If you are interested and want to know more: welcome on board!
- Diversity improves knowledge
- Our infrastructure team is on board and curious by nature
- Open your minds, exchange with others, contribute to Hadoop!
- Inspiration from the web champions (GAFA)
- With the banking industry
- Try to optimise the architecture during this meeting through guided debate
- An iterative way, a progressive way

"To boldly go where no man has gone before"

HAKA

Minimum Viable Solution
- Try your own solution as early as possible
- Proceed iteratively; work on the DEV cluster with real data at real size
- Enjoy the wave between DEV and PROD
- Find a Minimum Viable Solution for each project
- A reference, a starter kit: publish everything on the Enterprise Social Network
- An integrated BI solution (with a Big Data cluster) is crucial: Indexima
- Demonstrate use cases to build platform legitimacy
- A Machine Learning enabled platform
- With flagship successes in the community

Done: more than 40 POCs and projects, 10 of them in production

Openings: Technologies

Openings: the tools sorted by needs
- Infrastructure and security helpers:
  - Ambari: setup comfort
  - Ranger KMS, Ranger: security
  - Ambari Metrics: monitoring
- ETL and stored-procedure-like processes, data appenders: HDFS DFS copy, WebHDFS; Hive, high latency (if not using LLAP)
- Low latency, version control and NoSQL container: HBase and Phoenix

Openings: Age of discovery

Hive, what else?

For: data scientists, operatives, POCs

Pros:
- Best learning curve
- JDBC compatible

Cons:
- Not easy to test
- Not iterative
- Hard to maintain
- Latency
- Not really ACID
- API is not friendly (UDF)

Openings: Age of discovery

Hive use cases
- Explore data
- I am comfortable with SQL
- Business is pushing hard to produce results

Openings: Age of reason

OK, now I want a computation engine for developers: Spark!

Pros:
- Reduced latency
- Iterative jobs
- Easy to test
- Easy to maintain
- Friendly API
- Large community

Cons:
- Slow learning curve (Scala)
- Memory greedy
- Evolving very quickly

Openings: Age of reason

Spark use cases
- Iterative computations (cache data!)
- Streaming data
- I want to test my code
- Machine learning
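The "cache data!" advice for iterative computations can be illustrated with a small, self-contained Python sketch. It does not use the Spark API: `LazyDataset`, `load_positions` and `iterative_mean` are hypothetical names invented for this example, and the loader merely stands in for an expensive read such as a scan of files on HDFS. The point is only that a lazily evaluated dataset is re-read on every pass of a loop unless it is explicitly cached.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not the Spark API)."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader
        self._cached: Optional[List[float]] = None
        self._use_cache = False
        self.load_count = 0  # how many times the expensive load actually ran

    def cache(self) -> "LazyDataset":
        """Ask for the data to be kept in memory after the first load."""
        self._use_cache = True
        return self

    def collect(self) -> List[float]:
        """Return the data, reloading it unless a cached copy exists."""
        if self._cached is not None:
            return self._cached
        self.load_count += 1
        data = self._loader()
        if self._use_cache:
            self._cached = data
        return data


def iterative_mean(dataset: LazyDataset, iterations: int) -> float:
    """Gradient-descent style loop: every pass reads the whole dataset."""
    estimate = 0.0
    for _ in range(iterations):
        values = dataset.collect()
        gradient = sum(estimate - v for v in values) / len(values)
        estimate -= 0.5 * gradient
    return estimate


def load_positions() -> List[float]:
    # Assumption: stands in for an expensive read of the source data.
    return [1.0, 2.0, 3.0, 4.0]


uncached = LazyDataset(load_positions)
cached = LazyDataset(load_positions).cache()
iterative_mean(uncached, iterations=10)
iterative_mean(cached, iterations=10)
print(uncached.load_count, cached.load_count)  # prints "10 1"
```

Ten iterations trigger ten loads without caching and a single load with it, which is the reason caching matters for iterative jobs.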

Openings: Age of reason

Another tool to read and write data very fast: HBase!

Use cases: logs, time series

Pros:
- Very fast
- Latency
- REST API
- TTL

Cons:
- Just a distributed multimap
- Data model less flexible (than Hive)
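To make "just a distributed multimap" and "TTL" more concrete, here is a minimal in-memory Python sketch of that kind of data model: sorted row keys, columns holding timestamped versions, and reads that ignore expired cells. It is an illustration under stated assumptions, not the HBase client API; `ToyTable`, its methods and the `app#sequence` row-key layout are all hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """In-memory sketch of a sorted multimap with cell versions and a TTL.

    Hypothetical illustration only: this is not the HBase client API.
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._row_keys: List[str] = []  # kept sorted so range scans are cheap
        # row key -> column -> list of (timestamp, value), oldest first
        self._rows: Dict[str, Dict[str, List[Tuple[float, str]]]] = {}

    def put(self, row: str, column: str, value: str, ts: float) -> None:
        """Add a new version of a cell; older versions are kept."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda cell: cell[0])

    def get(self, row: str, column: str, now: float) -> Optional[str]:
        """Return the newest value that has not expired, or None."""
        versions = self._rows.get(row, {}).get(column, [])
        live = [(ts, v) for ts, v in versions if now - ts < self.ttl_seconds]
        return live[-1][1] if live else None

    def scan(self, start: str, stop: str) -> List[str]:
        """Row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        return self._row_keys[lo:hi]


# Log-style usage: the row key is "<application>#<sequence>" (an assumed layout).
logs = ToyTable(ttl_seconds=60)
logs.put("app1#0001", "msg", "started", ts=0)
logs.put("app1#0002", "msg", "running", ts=30)
logs.put("app2#0001", "msg", "started", ts=40)
logs.put("app1#0002", "msg", "finished", ts=50)

print(logs.scan("app1#", "app1#~"))         # ['app1#0001', 'app1#0002']
print(logs.get("app1#0002", "msg", now=55))  # finished
print(logs.get("app1#0001", "msg", now=90))  # None (older than the TTL)
```

Because rows are only reachable by key or key range, the row-key design carries most of the modelling effort, which is one way to read the "less flexible than Hive" remark.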

Openings: Technologies used NOW

- Inter-application messaging: Kafka
- Database import: Sqoop
- Data science and prototyping: Zeppelin with Livy
- BI and reporting: Indexima (10 B records in 10 msec!), SQL Server and Polypath
- Data governance: Atlas, Collibra

Take Aways

- Associate
- Dynamic, iterative, positive-mood weekly meetings
- Manage your projects as a community
- Minimum Viable Solutions, and iterate
- Integrated BI solution: an open window on the big data
- DEV cluster as a PROD cluster: Kerberos is key
- Hadoop providers: get them involved in your project!
- In our case: thank you, Hortonworks, for your involvement!

SPEAKERS
M. Pierre Alexandre PAUTRAT, www.linkedin.com/in/pierrealexandrepautrat/
M. Cyril MONTAGNON
M. Emmanuel VAIE, www.linkedin.com/in/emmanuelvaie

ADDRESS
NATIXIS
30, avenue Pierre Mendès France, 75013 Paris, France
www.natixis.com
