Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster & Amazon Web...
DESCRIPTION
Learn about the only solution to instantly provision a full-featured ETL environment running on AWS for less than your Sunday newspaper!
TRANSCRIPT
Big Data Goes Airborne
Jorge A. Lopez, Director Product Marketing, Syncsort
Chris Keyser, Partner Solution Architect, Amazon Web Services
Agenda
1. The Cloud as a Data Platform
2. Addressing Data Processing Challenges with Ironcluster & AWS
3. DEMO
4. Closing Comments + Q&A
Why Are Customers Adopting Cloud and AWS?
1. Cost savings through economies of scale
2. Trade capital expense for variable expense
3. Agility, Speed to market & Flexibility
4. Global in minutes
5. Don’t have to guess on capacity
6. Security and Compliance
AWS Global Infrastructure
10 Regions
26 Availability Zones
51 Edge Locations
The Good News Is that Cloud Isn’t an ‘All or Nothing’ Choice
On-Premises Resources
Cloud Resources
Integration
Corporate Data Centers
Integrating Your On-Premises, AWS and SaaS Infrastructure
Applications on premise
App Migration/Archiving
Hybrid Data Warehouse / BI
Active Directory
Network Configuration
Corporate Data Centers
Users & Access Rules (IAM)
Your Private Network (VPC)
Your On-Premises Data Center
AWS Direct Connect
Your Cloud Data Center
Applications on AWS
Data Warehouse/BI
Managed Databases
AWS Provides Broad and Deep Services
Regions, Availability Zones, Content Delivery POPs
Compute: EC2, Auto Scaling, Elastic Load Balancer
Storage: S3, EBS, Glacier, Storage Gateway, Import/Export
Databases: RDS (MySQL, PostgreSQL, Oracle, SQL Server), DynamoDB, ElastiCache
Networking: VPC, Direct Connect, Route 53
Analytics: Data Pipeline, Redshift, EMR, Kinesis
Application Services: SWF, SNS, SQS, CloudSearch, SES, AppStream, CloudFront
WorkSpaces
Management & Administration: IAM, CloudWatch, CloudTrail, APIs and SDKs, Management Console, Cloud HSM, Command Line Interface
Containers & Deployment: Elastic Beanstalk for Java, Node.js, Python, Ruby, PHP and .Net; OpsWorks; CloudFormation
Ecosystem: Technology Partners, Consulting Partners, AWS Marketplace
Support, Certification, Training, Professional Services
Amazon EC2 - Broad Selection of Compute Instance Families
M3 (General purpose): m3.2xlarge, 8 vCPU, 30 GB RAM, 160 GB SSD
C3 (Compute optimized): c3.8xlarge, 32 vCPU, 60 GB RAM, 720 GB SSD
R3 (Memory optimized): r3.8xlarge, 32 vCPU, 244 GB RAM, 720 GB SSD
I2 (Storage and IO optimized): i2.8xlarge, 32 vCPU, 244 GB RAM, 6.4 TB SSD
HS1 (Storage and IO optimized): hs1.8xlarge, 16 vCPU, 117 GB RAM, 48 TB HDD
G2 (GPU enabled): g2.2xlarge, 8 vCPU, 15 GB RAM, 1536 CUDA cores, 4 GB Video RAM
AWS as a Data Platform
EC2, EBS, Instance Storage
Redshift, RDS (SQL stores)
EMR (Hadoop)
DynamoDB (NoSQL)
Kinesis (stream)
CloudSearch (search)
S3, CloudFront, Glacier (storage services)
DBA
Data
Velocity: Millisecond, Second, Minute, Hour, Day
Variety: Structured, Unstructured, Text, Binary
Volume: Gigabytes, Terabytes, Petabytes
Amazon EMR - Hadoop Tuned for AWS
Master instance group
Task instance group
Core instance group
HDFS
Amazon S3
Amazon Redshift
Amazon DynamoDB
Amazon Redshift - Petabyte Scale Data Warehouse
Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Backup and restore via S3
– Parallel load from S3, EMR, or DynamoDB
HW optimized for data processing
– DW1: 2TB – 1.6PB Magnetic
– DW2: 160GB – 256TB SSD
Diagram: SQL Clients/BI Tools (JDBC/ODBC); Leader Node; Compute Nodes, each 128GB RAM, 16TB disk, 16 cores; 10 GigE (HPC); Ingestion, Backup, Restore via Amazon S3 / DynamoDB / SSH
The Data Processing Challenge
Innovative Cloud Solutions
Ironcluster ETL, Amazon EC2 Edition
COLLECT, PROCESS & DISTRIBUTE DATA AT DISRUPTIVE SCALE & COST
• Blazingly FAST, infinitely SCALABLE
• EASY to use graphical user interface
• Self-tuning engine for SMART data integration
• The capacity you need, when YOU need it
• Instantly provision with single-click access
Ironcluster Hadoop ETL for Amazon EMR
Now FREE in the AWS Marketplace!
Only pure-play ETL app available on the AWS Marketplace
Ironcluster – Enterprise-grade ETL in 3 Easy Steps
1. Go to AWS Marketplace & Select Your Ironcluster Instance
2. Spin up Ironcluster & Start Developing
3. Done? Spin Down Ironcluster
Got Big Data? – Enter Hadoop with Ironcluster Hadoop ETL
Get Your Hadoop Cluster
• Procure
• Setup
• Configure
• Deploy
Now… How do I get productive quickly?
• Many use cases (Where do I start?)
• Disparate tools (or BYOL)
• Lots of manual coding
• Expensive, hard-to-find skills
Outcomes: High Costs + Slow Results
Vs.
Now… Get right to work!
Fully Productive in Days + No Brainer Cost
Syncsort Ironcluster: Hadoop ETL for Amazon EMR
Blazingly Fast, Easy to Use Hadoop ETL on Amazon EMR
• Develop MapReduce ETL jobs graphically
• Create sophisticated data flows in no time, with a library of Use Case Accelerators
• Avoid the coding nightmare without compromising on performance
• Develop once, reuse many times
• Leverage all your data, including Amazon Redshift & S3 sources/targets
• Scale infinitely with a disruptively low, “no brainer” price
It’s FREE!!
It’s All About Discovering New Insights
An End-to-End Approach to Data Processing & Visualization
Create data extracts in seconds with just a click in Ironcluster!
Vast Variety of Data Sources
Access your data from virtually any source including Social, Redshift, S3, XML, and more
Process w/ Ironcluster in AWS
• Fastest & lightweight run-time ETL engine
• Deploy with or without Hadoop
• Comprehensive library of transformations
TDEs at blazing speed
• Directly create TDE files or objects to load Tableau
• Cut latency
• No pre-requisite software to install
Visualize w/ Tableau
• Combined power of Hadoop & AWS
• Faster queries
• All enterprise data
• Advanced analytics
Ironcluster Tableau Connector
Lower Your Cost & Optimize Cloud Computing on Any AWS Platform
Redshift: Transform data, then load to Redshift for reporting and advanced analytics
S3: Stream log data from S3, aggregate for insight into web user behavior, stream back to S3
RDS: Translate data from MySQL, Oracle, Microsoft SQL Server, or PostgreSQL
DynamoDB: Join large data volumes & load to DynamoDB for mobile, gaming and ad apps
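To make the S3 use case above concrete, here is a minimal Python sketch of the "aggregate for insight into web user behavior" step. It is not Ironcluster code and does not call any AWS API: it assumes the log lines have already been read from S3, and the three-field line format (timestamp, user ID, URL) and the function name aggregate_page_views are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable


def aggregate_page_views(log_lines: Iterable[str]) -> Counter:
    """Count page views per user.

    Assumes each line has the hypothetical form 'timestamp user_id url',
    separated by whitespace. Lines that do not match are skipped.
    """
    views: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed records instead of failing the whole job
        _timestamp, user_id, _url = parts
        views[user_id] += 1
    return views


if __name__ == "__main__":
    # Illustrative sample data, not taken from the webcast.
    sample_lines = [
        "2014-06-01T10:00:00 alice /home",
        "2014-06-01T10:00:05 bob /pricing",
        "2014-06-01T10:01:00 alice /pricing",
        "not-a-valid-record",
    ]
    for user, count in aggregate_page_views(sample_lines).most_common():
        print(user, count)
```

In a real pipeline the aggregated counts would be written back to S3 or loaded into Redshift, as the slide describes; that step is left out here because it depends on the tooling in use.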
Speed & Efficiency
Throughput
Up to 75% lower Processing Time & Cost*
*Users of the new Ironcluster ETL for EC2 can experience up to a 75% reduction in processing time and total cost of ownership when compared to legacy ETL approaches and tools. Based on Syncsort benchmarking and POCs.
The Possibilities Are Endless
Sort & aggregate massive data volumes generated by mobile devices to improve customer satisfaction
Develop & run complex market risk models on big datasets with Ironcluster in Amazon EMR
Leverage Use Case Accelerators to quickly deploy click-stream and web log analysis applications in AWS
Pre-process PB of data from sensors and research new algorithms to support quality assurance
Visit Us @ The Amazon Web Services Marketplace
Try Ironcluster ETL FREE for 30 Days! www.syncsort.com/IronclusterEC2
Got Big Data? Get Ironcluster Hadoop ETL for Amazon EMR FREE! www.syncsort.com/IronclusterEMR
Watch this Webcast On-Demand - Including a Product Demonstration! http://bit.ly/1zYh9er