Big Data Architecture Workshop - Cloudera


DATASHEET

The Cloudera Big Data Architecture Workshop (BDAW) is a 3-day learning event that addresses advanced big data architecture topics. BDAW brings together technical contributors into a group setting to design and architect solutions to a challenging business problem. The workshop addresses big data architecture problems in general, and then applies them to the design of a challenging system.

Throughout the highly interactive workshop, participants apply concepts to real-world examples, resulting in detailed, synergistic discussions. Participants learn techniques for architecting big data systems, not only from Cloudera’s experience but also from the experiences of fellow participants.

More specifically, BDAW addresses advanced big data architecture topics, including data formats; transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; security and privacy; minimizing the risk of an unsound architecture; and technology selection.

To gain the most from the workshop, participants should have working knowledge of technologies such as HDFS, Spark, MapReduce, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there will not be any programming activities.

The workshop will be divided into small groups to discuss the problems and develop solutions. Each group will select a spokesperson who will present the group’s findings to the workshop. There will not be any programming labs, but we will have solutions implemented and deployed in the cloud for demos during the workshop.

“Cloudera has not only prepared us for success today, but has also trained us to face and prevail over our big data challenges in the future by using Hadoop.” – Persado

Big Data Architecture Workshop


© 2017 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

Cloudera_BigData_Architecture_Wrkshp_DS_101

About Cloudera

Cloudera delivers the modern platform for machine learning and advanced analytics built on the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems by efficiently capturing, storing, processing and analyzing vast amounts of data.

Learn more at cloudera.com

Course details:

1. Introduction

2. Workshop Application Use Cases

• Oz Metropolitan

• Architectural questions

• Team activity: Analyze Metroz Application Use Cases

3. Application Vertical Slice

• Definition

• Minimizing risk of an unsound architecture

• Selecting a vertical slice

• Team activity: Identify an initial vertical slice for Metroz

4. Application Processing

• Real time, near real time processing

• Batch processing

• Data access patterns

• Delivery and processing guarantees

• Machine Learning pipelines

• Team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify Machine Learning pipelines

5. Application Data

• Three V’s of Big Data

• Data Lifecycle

• Data Formats

• Transforming Data

• Team activity: Metroz Data Requirements

6. Scalable Applications

• Scale up, scale out, scale to X

• Determining if an application will scale

• Poll: scalable airport terminal designs

• Hadoop and Spark Scalability

• Team activity: Scaling Metroz

7. Fault Tolerant Distributed Systems

• Principles

• Transparency

• Hardware vs. Software redundancy

• Tolerating disasters

• Stateless functional fault tolerance

• Stateful fault tolerance

• Replication and group consistency

• Fault tolerance in Spark and MapReduce

• Application tolerance for failures

• Team activity: Identify Metroz component failures and requirements

8. Security and Privacy

• Principles

• Privacy

• Threats

• Technologies

• Team activity: identify threats and security mechanisms in Metroz

9. Deployment

• Cluster sizing and evolution

• On-premise vs. Cloud

• Edge computing

• Team activity: select deployment for Metroz

10. Technology Selection

• HDFS

• HBase

• Kudu

• Relational Database Management Systems

• MapReduce

• Spark, including streaming, Spark SQL and Spark ML

• Hive

• Impala

• Cloudera Search

• Data Sets and Formats

• Team activity: technologies relevant to Metroz

11. Software Architecture

• Architecture artifacts

• One platform or multiple, lambda architecture

• Team activity: produce high level architecture, selected technologies, revisit vertical slice

• Vertical Slice demonstration

12. Wrap Up

201709