agility for big data
![Page 1: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/1.jpg)
Agility for Big Data
My journey applying an Agile method to Big Data applications
![Page 2: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/2.jpg)
Who I am
![Page 3: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/3.jpg)
What is the hardest part about bringing agility to your big data applications?
![Page 4: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/4.jpg)
“The more data you give the business, the more questions they will ask”
Jose Carlos Eiras
Served as CIO at Kraft Foods, Philip Morris, General Motors and DHL
![Page 5: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/5.jpg)
Reporting over Workable Software
![Page 6: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/6.jpg)
Reporting over Workable Software
• Problems experienced
• Customers don’t know what they want until they see it
• Very long feedback cycles because of the wait for quality data
• Developing workable software is much more expensive than generating a report manually
• Workable software without data to use is even more expensive
• Switching cost between tasks is high, but the switching cost between projects is even higher
• Releasing a feature to all users results in more questions coming in, either because of data quality or for other valid reasons
• Very low product success rate, lots of resources wasted and low team spirit
![Page 7: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/7.jpg)
Reporting over Workable Software
• Solutions
• Focus on a very specific customer group and generate reports for them
• Collect data that targets a very specific customer group, like: parents in Box Hill area who work in IT
• Manually generated reports
• Data quality easier to control over a small amount of data
• Deliver reports to end users in the most cost-effective way, e.g. face to face, by email, or through open source BI tools
• Get feedback and test hypotheses
• Focus on a subset of data while discovering the value of existing data
• Apply new methodology to a subset of data in a much more effective way
• Data quality easier to control on a subset of data
• Focus on one customer and get feedback from the client
• Test hypotheses
![Page 8: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/8.jpg)
Reporting over Workable Software
• Solutions
• Data Freedom - Empowering people (example - data scientists exploring data values)
• Provide an SQL-like interface for users to easily access the data
• Provide a semantic schema so that users can easily find the right data
• Document your data where necessary to help other people understand, decipher and use it
• Provide easy-to-use report design tools for accessing data, such as Pentaho or JasperReports
• Provide easy-to-use scheduling tools such as Oozie, or general BI tools
• As a mindset, developers should support other people in freely exploring data in whatever ways they like
• Where data must be accessed through developers, those developers should think about what stops other users from accessing the data themselves
• Add safeguards to prevent cluster overloading
• The overall result will be to increase the speed of feedback - dramatically
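One way to picture the "safeguard" bullet alongside the SQL-like interface is a small guard that every ad hoc query passes through before it reaches the cluster. This is only a sketch under assumptions: the function name `guard_query`, the `MAX_ROWS` cap, and the rule "SELECT only, always limited" are illustrative and do not come from the talk.

```python
import re

MAX_ROWS = 10_000  # hypothetical cap on rows returned to an ad hoc user


def guard_query(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Return a version of `sql` that is safer to submit for ad hoc exploration."""
    statement = sql.strip().rstrip(";").strip()

    # Only read-only exploration is allowed through this entry point.
    if not re.match(r"select\b", statement, flags=re.IGNORECASE):
        raise ValueError("only SELECT statements are allowed for ad hoc access")

    # Look for a trailing LIMIT clause; add one or clamp it to the cap.
    match = re.search(r"\blimit\s+(\d+)\s*$", statement, flags=re.IGNORECASE)
    if match is None:
        return f"{statement} LIMIT {max_rows}"
    if int(match.group(1)) > max_rows:
        return statement[: match.start()] + f"LIMIT {max_rows}"
    return statement


print(guard_query("SELECT * FROM visits;"))
```

A real deployment would more likely rely on the query engine's own quotas and queues; the point is only that a cheap check at the entry point lets people explore freely without one query taking the cluster down.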
![Page 9: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/9.jpg)
Reporting over Workable Software
• More to try
• Automated data quality control
• Explore different ways for the customer service team to address data quality issues
• Sampling data for product discovery programs
• Explore ways to test a hypothesis in an even quicker manner – example: customer centric data collection and reporting
• Explore a wider scale of data freedom through web service
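The "sampling data for product discovery" idea can be sketched as a deterministic, customer-level sample: hash the customer identifier into one of 100 buckets and keep the lowest ones. The names `in_sample`, `sample`, and the `customer_id` field are assumptions made for illustration, not something the slides specify.

```python
import hashlib


def in_sample(customer_id: str, percent: int) -> bool:
    """Deterministically decide whether a customer falls in a `percent`% sample."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def sample(records, percent, key="customer_id"):
    """Keep every record that belongs to a sampled customer."""
    return [r for r in records if in_sample(str(r[key]), percent)]


records = [
    {"customer_id": "c1", "page": "/home"},
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
]
subset = sample(records, percent=10)
```

Because the decision depends only on the identifier, the same customers are selected on every run and all of a customer's records stay together, which keeps a small discovery data set consistent from one report to the next.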
![Page 10: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/10.jpg)
Continuous Delivery
• Continuous delivery, where to start?
• Problems: legacy systems, low unit test coverage, low functional/integration test coverage, no acceptance testing, not enough testing data, and so on…
• Start with an easy problem so that it is achievable and will help to build team trust
• Must have – testing data and integration testing suites
![Page 11: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/11.jpg)
Continuous Delivery
• Build pipeline: dev box → build → daily build server → alpha → beta → production
• Testing data: you will never cover all scenarios, so what do you do? Use hybrid data fixtures that combine manually produced data, generated data, and data from production
• Versioning Data
• Keep data as clean as code; refactor your data often
• Backward and forward compatibility
• Vertical slicing of stories, architecture and teams
• NoSQL database engines
• Start continuous delivery for some components NOW and learn from it
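The "versioning data" and "backward and forward compatibility" bullets can be illustrated with a tolerant reader: new code fills in defaults for fields that older records lack, and ignores fields written by a newer version that it does not know yet. The field names, the defaults, and the function `read_record` below are hypothetical.

```python
# Fields this version of the code understands, with defaults for older records.
# All names and values here are illustrative assumptions.
KNOWN_FIELDS = {
    "customer_id": None,
    "country": "unknown",  # added in a later schema version
    "visits": 0,           # added in a later schema version
}


def read_record(raw: dict) -> dict:
    """Read a stored record written by an older or a newer schema version."""
    record = dict(KNOWN_FIELDS)       # backward compatible: start from defaults
    for name in KNOWN_FIELDS:
        if name in raw:
            record[name] = raw[name]  # take every field we understand
    return record                     # forward compatible: unknown fields are skipped


old = read_record({"customer_id": "c1"})
new = read_record({"customer_id": "c2", "country": "AU", "visits": 3, "segment": "it-parents"})
```

With a reader like this, a component can be deployed before or after the data it reads has been migrated, which is what makes it practical to put single components on continuous delivery first.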
![Page 12: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/12.jpg)
Deployment != Release
• Separate deployment from release
• Tips
• Data batch toggles
• Feature toggles
• Customer/Country/Region releases
• Manually generated report area
• Don’t forget about “exclusive” toggles
• Leave release up to the production manager. They release and they organize press releases.
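As a rough sketch of how deployment and release can be separated, the code below ships a feature switched off and lets someone outside the development team turn it on, optionally for selected regions or customers only. The `Toggle` class, its fields, and the region codes are assumptions for illustration; the talk does not prescribe an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Toggle:
    """A release switch for an already deployed feature (hypothetical design)."""
    enabled: bool = False                        # deployed but not released by default
    regions: set = field(default_factory=set)    # empty set means no region restriction
    customers: set = field(default_factory=set)  # empty set means no customer restriction

    def is_on(self, customer: str, region: str) -> bool:
        if not self.enabled:
            return False
        if self.regions and region not in self.regions:
            return False
        if self.customers and customer not in self.customers:
            return False
        return True


# The code is live everywhere, but only released to one region.
new_report = Toggle(enabled=True, regions={"AU"})
print(new_report.is_on(customer="c1", region="AU"))  # True
print(new_report.is_on(customer="c1", region="NZ"))  # False
```

A data batch toggle could follow the same pattern, with the set holding the identifiers of batches that have been approved for display.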
![Page 13: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/13.jpg)
Q&A
What is the hardest part about bringing agility to your big data applications?
![Page 14: Agility for big data](https://reader034.vdocuments.mx/reader034/viewer/2022050620/53ed58338d7f7289708b5bf4/html5/thumbnails/14.jpg)
My Personal Information
• LinkedIn Profile: http://au.linkedin.com/pub/charlie-cheng/24/92/978/
• Twitter: @charlie_cheng
Are you looking for training and finding it hard to select the right course?
We are running a customer discovery program on it at StudyIsFun.
Please contact me at [email protected] if you are interested.