Big Data in the Cloud with Informatica Cloud and Amazon Redshift
DESCRIPTION
Data warehousing costs have been continually rising with the explosion of Big Data. To help you explore the most cost-effective data warehousing techniques, learn from the cloud experts at Amazon and Informatica. Learn more: http://www.informaticacloud.com/amazon-redshift

Amazon Redshift is a petabyte-scale, cloud-based data warehouse that lets you provision multiple database nodes on demand and offload raw data from on-premises databases for more cost-effective data warehousing. Getting this data into Redshift is easy with Informatica Cloud. In this interactive webinar, you’ll learn:
- How Amazon Redshift is changing the economics of data warehousing
- Why Big Data integration and management is a strategic imperative within enterprises
- How cloud integration makes cloud data warehousing even more cost-effective

At Informatica, our goal is to unlock your information potential. Join us and our featured guest speakers from Amazon for this interactive webinar.

TRANSCRIPT
![Page 1: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/1.jpg)
Big Data in the Cloud with Informatica Cloud and Amazon Redshift
Rahul Pathak, Amazon Redshift Product Management
Nicolas Brisoux, Informatica Cloud Platform Adoption
Darren Cunningham, Informatica Cloud Marketing
@infacloud #redshift
![Page 2: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/2.jpg)
Today’s Agenda
• Informatica and Amazon Strategic Partnership
• Amazon Redshift Overview
• Informatica Cloud Redshift Connector
• Demonstration
• Discussion
• Next Steps
![Page 3: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/3.jpg)
Informatica: The Information Management Leader
B2B Data Exchange
Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration.
Cloud Data Integration
Enterprise Data Integration
Complex Event Processing
Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint.
Ultra Messaging
In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.
Data Quality
Master Data Management
Application ILM
![Page 4: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/4.jpg)
Informatica Cloud: our fastest-growing product line
Today’s Focus: Cloud Data Integration
![Page 5: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/5.jpg)
Informatica Cloud and Amazon Redshift: Enabling cost-effective data warehousing
• Redshift Connector pre-release announced in February
• General availability this month (August)
InformaticaCloud.com/Amazon-Redshift
![Page 7: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/7.jpg)
AWS Database Services
Amazon RDS
Fully managed SQL database service for OLTP workloads
Amazon DynamoDB
Fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
Amazon Redshift
Fully managed, fast, and powerful petabyte-scale data warehouse service
Amazon ElastiCache
Fully managed, Memcached-compliant in-memory caching service
![Page 8: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/8.jpg)
We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift
![Page 9: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/9.jpg)
Data warehousing done the AWS way
• Pay as you go, no upfront costs
• Fast, cheap, easy to use
• SQL
• Easy to provision
![Page 10: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/10.jpg)
Common Customer Use Cases
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
Customer segments: Traditional Enterprise DW, Companies with Big Data, SaaS Companies
![Page 11: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/11.jpg)
Progress Since Launch on Feb 14, 2013
• Fastest growing service in AWS history
• Well over 1,000 customers; adding over 100 per week
• Obtained SOC 1 and SOC 2 certifications, with more in progress
• Deployed in US East (N. Virginia), US West (Oregon), EU (Ireland) and Asia Pacific (Tokyo)
• Additional global regions coming soon
![Page 12: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/12.jpg)
Amazon Redshift Customers
• 5x – 20x reduction in query times; 4x cost reduction over Hive
• 20x – 40x reduction in query times
• Nokia: 50% reduction in costs, 2x improvement in query times
![Page 13: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/13.jpg)
Amazon Redshift Customer: bit.ly
“When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.”
- Sean O’Connor, Engineer at bit.ly
Bit.ly provides social link-sharing analytics, managing over 300 million shortened links and 5 billion clicks each month
![Page 14: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/14.jpg)
Amazon Redshift Customer: HasOffers
“Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.”
- Niek Sanders, VP of Engineering, HasOffers
HasOffers records and reports billions of desktop and mobile interactions for performance marketers
![Page 15: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/15.jpg)
Amazon Redshift Customer: Infor
“This is the formula for fast and broad adoption, where customers can get consistent, accurate, and useful data fast - in weeks not months or years.”
- Ali Shadman, SVP, Business Cloud & Upgrades, Infor
Infor is the world’s third-largest ERP vendor, serving over 70,000 customers in 194 countries
![Page 16: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/16.jpg)
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With row storage, you do unnecessary I/O
• To get the total amount, you have to read every column of every row
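To make the row-versus-column point concrete, here is a minimal Python sketch using the four-row table above. It counts how many values each layout touches to total the Amount column. Treating "values read" as a proxy for I/O is a simplifying assumption; this is not a model of Redshift's storage engine.

```python
# Toy comparison of row vs. column layouts for SUM(Amount).
# "Values read" is a simplified stand-in for disk I/O (an assumption).

COLUMNS = ("ID", "Age", "State", "Amount")
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
AMOUNT = COLUMNS.index("Amount")


def sum_amount_row_store(rows):
    """Row layout: each whole row is read just to reach its Amount field."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[AMOUNT]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_column_store(columns):
    """Column layout: only the Amount column is read."""
    amounts = columns["Amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(ROWS))                      # (1250, 16)
print(sum_amount_column_store(to_column_store(ROWS)))  # (1250, 4)
```

Both layouts return the same total (1,250), but the column layout touches a quarter of the values.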
![Page 17: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/17.jpg)
Amazon Redshift dramatically reduces I/O
• With column storage, you only read the data you need
![Page 18: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/18.jpg)
Amazon Redshift dramatically reduces I/O
• Columnar compression saves space & reduces I/O
• Amazon Redshift analyzes and compresses your data
```
analyze compression listing;

  Table  |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
```
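The `delta` and `delta32k` encodings suggested above store each value as a difference from the one before it, which stays small for sorted keys such as `listid`. The Python sketch below illustrates only the idea. The 4-byte raw integers and 1-byte deltas are assumptions chosen to mirror how AWS describes DELTA; this is not Redshift's on-disk format.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded):
    """A running sum of the deltas restores the original values."""
    return list(accumulate(encoded))


def delta_size_bytes(values):
    """Approximate size: 4 bytes for the first value plus 1 byte per delta.

    Assumes every delta fits in a signed byte (-128..127); otherwise a wider
    encoding (in the spirit of delta32k) would be needed.
    """
    deltas = delta_encode(values)[1:]
    if any(not -128 <= d <= 127 for d in deltas):
        raise ValueError("a delta does not fit in one byte")
    return 4 + len(deltas)


listids = list(range(1, 1001))    # hypothetical sorted IDs
print(4 * len(listids))           # raw 4-byte integers: 4000 bytes
print(delta_size_bytes(listids))  # 1003 bytes
```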
![Page 19: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/19.jpg)
Amazon Redshift dramatically reduces I/O
• Track the minimum and maximum values for each block
• Skip over blocks that don’t contain the data needed for a given query
• Minimize unnecessary I/O
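Here is a minimal Python sketch of the zone-map idea: keep each block's minimum and maximum, then skip any block whose range cannot overlap the query's range. The block layout and the date-key values are hypothetical. Skipping works best when the column is sorted, so that each block covers a narrow range.

```python
def build_zone_map(blocks):
    """Record the minimum and maximum value of every block."""
    return [(min(block), max(block)) for block in blocks]


def scan_range(blocks, zone_map, low, high):
    """Return values in [low, high] and how many blocks had to be read."""
    matches, blocks_read = [], 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map proves no value in this block can match
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


# Hypothetical sorted date-key column split into four blocks.
blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
zone_map = build_zone_map(blocks)
print(scan_range(blocks, zone_map, 5, 7))  # ([5, 6, 7], 2)
```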
![Page 20: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/20.jpg)
Amazon Redshift dramatically reduces I/O
• Use direct-attached storage to maximize throughput
• Hardware optimized for high-performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
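One way to see why large blocks help is to count how many reads a full column scan needs. This sketch assumes 1 MB blocks, the block size AWS documents for Redshift, versus 8 KB pages, PostgreSQL's default. The 10 GB column is hypothetical, and the sketch ignores compression and caching.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def blocks_to_scan(column_bytes, block_bytes):
    """Number of blocks a full scan must read (ceiling division)."""
    return -(-column_bytes // block_bytes)


column_bytes = 10 * GiB  # hypothetical column size
print(blocks_to_scan(column_bytes, 1 * MiB))  # 10240 reads
print(blocks_to_scan(column_bytes, 8 * KiB))  # 1310720 reads
```

Fewer, larger reads suit spinning disks, where each seek is expensive relative to sequential transfer.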
![Page 21: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/21.jpg)
Amazon Redshift architecture
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Local, columnar storage
  – Execute queries in parallel
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
Diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC. The leader node and compute nodes (each 128 GB RAM, 16 TB disk, 16 cores) are linked by 10 GigE (HPC). Ingestion, backup, and restore go through Amazon S3.
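The split of work between the leader node and the compute nodes can be sketched as a scatter-gather aggregation. This is a minimal Python illustration under stated assumptions: threads stand in for compute nodes, rows are spread round-robin, and the query is a simple SUM. It does not reflect how Redshift actually plans or distributes queries.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count):
    """Round-robin rows across compute nodes (a hypothetical distribution)."""
    return [rows[i::node_count] for i in range(node_count)]


def compute_node_partial_sum(local_rows):
    """Each compute node aggregates only the rows on its own local storage."""
    return sum(local_rows)


def leader_node_sum(partitions):
    """The leader fans the query out to every node and combines the partial sums."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(compute_node_partial_sum, partitions))
    return sum(partials)


partitions = distribute(list(range(1, 101)), 3)
print(leader_node_sum(partitions))  # 5050
```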
![Page 22: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/22.jpg)
Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed user storage
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2
![Page 23: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/23.jpg)
Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (32 TB – 1.6 PB)
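The capacity ranges above follow from multiplying per-node storage by node count. Here is a small Python sketch that encodes the slide's limits. The dictionary and function names are hypothetical, and only the HS1.XL single-node option is modeled, because that is the only single-node option the slide shows.

```python
# (compressed storage per node in TB, valid multi-node cluster sizes), per the slide.
NODE_TYPES = {
    "HS1.XL": (2, range(2, 33)),
    "HS1.8XL": (16, range(2, 101)),
}


def cluster_storage_tb(node_type, nodes):
    """Total compressed user storage for a cluster of the given shape."""
    per_node_tb, cluster_sizes = NODE_TYPES[node_type]
    if node_type == "HS1.XL" and nodes == 1:
        return per_node_tb  # the single-node (2 TB) option
    if nodes not in cluster_sizes:
        raise ValueError(
            f"{node_type} clusters have {cluster_sizes.start}-{cluster_sizes.stop - 1} nodes"
        )
    return per_node_tb * nodes


print(cluster_storage_tb("HS1.XL", 32))    # 64 TB
print(cluster_storage_tb("HS1.8XL", 100))  # 1600 TB, i.e. 1.6 PB
```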
![Page 24: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/24.jpg)
Amazon Redshift is priced to let you analyze all your data
• Simple pricing: number of nodes × cost per hour
• No charge for the leader node
• No upfront costs
• Pay as you go

| HS1.XL single node | Price per hour | Effective hourly price per TB | Effective annual price per TB |
|---|---|---|---|
| On-Demand | $0.850 | $0.425 | $3,723 |
| 1-Year Reservation | $0.500 | $0.250 | $2,190 |
| 3-Year Reservation | $0.228 | $0.114 | $999 |
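The per-TB figures in the pricing table can be reproduced by dividing the node's hourly price by the 2 TB an HS1.XL provides, then multiplying by 8,760 hours per year. Here is a small Python sketch of that arithmetic. It assumes the reservation prices shown are already effective hourly rates, and it uses hypothetical function names.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
TB_PER_XL_NODE = 2


def effective_price_per_tb(price_per_node_hour, tb_per_node=TB_PER_XL_NODE):
    """Return (hourly price per TB, annual price per TB) for one node."""
    hourly = price_per_node_hour / tb_per_node
    return hourly, hourly * HOURS_PER_YEAR


def cluster_hourly_cost(nodes, price_per_node_hour):
    """Number of nodes x cost per hour; the leader node is not billed."""
    return nodes * price_per_node_hour


for label, price in [("On-Demand", 0.850), ("1-Year", 0.500), ("3-Year", 0.228)]:
    hourly, annual = effective_price_per_tb(price)
    print(f"{label}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

The 3-year figure works out to $998.64, which the slide rounds to $999.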
![Page 25: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/25.jpg)
Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
Slides not intended for redistribution.
![Page 26: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/26.jpg)
Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
  – AES-256; hardware accelerated
  – All blocks on disks and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
Diagram: the cluster runs inside the customer VPC. SQL clients/BI tools connect to the leader node over JDBC/ODBC. The compute nodes (each 128 GB RAM, 16 TB disk, 16 cores) sit in an internal security group, are linked by 10 GigE (HPC), and use Amazon S3 / Amazon DynamoDB for ingestion, backup, and restore.
![Page 27: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/27.jpg)
Amazon Redshift continuously backs up your data and recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
  – Designed for eleven nines (99.999999999%) of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
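"Incremental" means each backup copies only the data that changed since the previous one. Here is a minimal Python sketch of that idea, assuming blocks are compared by SHA-256 digest. The slide does not describe how Redshift tracks changed blocks, so its actual mechanism may differ.

```python
import hashlib


def block_digest(block: bytes) -> str:
    """Fingerprint a block's contents."""
    return hashlib.sha256(block).hexdigest()


def incremental_snapshot(blocks, previous_digests):
    """Return (indexes of blocks to copy to backup storage, digests for next time).

    A block is copied if it is new or its content changed since the last snapshot.
    """
    current = [block_digest(b) for b in blocks]
    changed = [
        i for i, digest in enumerate(current)
        if i >= len(previous_digests) or previous_digests[i] != digest
    ]
    return changed, current


first, digests = incremental_snapshot([b"a", b"b", b"c"], [])
second, _ = incremental_snapshot([b"a", b"B", b"c", b"d"], digests)
print(first)   # [0, 1, 2]
print(second)  # [1, 3]
```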
![Page 28: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/28.jpg)
Amazon Redshift works with your existing analysis tools
More coming soon…
JDBC/ODBC
Amazon Redshift
Connect using drivers from PostgreSQL.org
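Because Amazon Redshift accepts PostgreSQL drivers, connection strings follow the PostgreSQL formats. This sketch builds a JDBC URL for the PostgreSQL JDBC driver and a libpq keyword string for clients such as psql. The endpoint, database, and user names are hypothetical. Port 5439 is Redshift's default.

```python
REDSHIFT_DEFAULT_PORT = 5439


def jdbc_url(host, database, port=REDSHIFT_DEFAULT_PORT):
    """URL for the PostgreSQL JDBC driver, requesting SSL."""
    return f"jdbc:postgresql://{host}:{port}/{database}?ssl=true"


def libpq_conninfo(host, database, user, port=REDSHIFT_DEFAULT_PORT):
    """Keyword/value connection string understood by libpq-based clients such as psql."""
    return f"host={host} port={port} dbname={database} user={user} sslmode=require"


host = "examplecluster.abc123.us-east-1.redshift.amazonaws.com"  # hypothetical endpoint
print(jdbc_url(host, "dev"))
print(libpq_conninfo(host, "dev", "masteruser"))
```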
![Page 29: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/29.jpg)
Amazon Redshift integrates with multiple data sources
Amazon Elastic MapReduce
Amazon DynamoDB
Amazon Elastic Compute Cloud (EC2)
AWS Storage Gateway Service
Amazon Simple Storage Service (S3)
Corporate Data Center
Amazon Relational Database Service (RDS)
Amazon Redshift
![Page 30: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/30.jpg)
![Page 31: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/31.jpg)
Informatica Cloud Architecture Overview
Diagram (numbered steps 1–4): Secure Agent running in your company, Marketplace, Amazon Redshift
![Page 32: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/32.jpg)
Map Once. Deploy Anywhere.
On-premises, Hadoop, third-party applications, cloud
![Page 33: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/33.jpg)
Informatica Cloud Amazon Redshift Connector Demo
Nicolas Brisoux, Informatica Cloud Platform Adoption
![Page 34: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/34.jpg)
Best practices to remember…
• The Amazon S3 bucket that holds the data files must be created in the same region as your cluster
• Files are deleted from the Amazon S3 bucket when the upload is complete
• Choose a batch size where the number of batches matches the number of slices in your cluster
• Each XL node has 2 slices; each 8XL node has 16
• If you have a 2-node XL cluster (4 slices) and 40,000 rows of data, choose a batch size of 10,000
• The Informatica Cloud Redshift connector can maximize Amazon’s parallel processing capabilities this way
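The batch-size rule above reduces to dividing the row count by the cluster's total slice count. Here is a small Python sketch of that arithmetic. The function name is hypothetical and not a connector setting, and rounding up is an assumption so that no rows are left in an extra batch.

```python
import math

SLICES_PER_NODE = {"XL": 2, "8XL": 16}  # from the slide


def recommended_batch_size(node_type, node_count, total_rows):
    """Batch size that yields one batch per slice (rounded up so no rows are left over)."""
    slices = SLICES_PER_NODE[node_type] * node_count
    return math.ceil(total_rows / slices)


print(recommended_batch_size("XL", 2, 40_000))  # 10000, matching the slide's example
```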
![Page 35: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/35.jpg)
Informatica Cloud Amazon Redshift demonstration
Components: Firewall, Informatica Cloud Secure Agent, Metadata, Mappings
1. Authenticate and retrieve the Data Synchronization Task
2. Retrieve account data
3. Perform lookup on SLA level
4. Put account data & SLA level into a flat file
5. Transfer the compressed flat file
6. Initiate load from Amazon S3
7. Load data into Amazon Redshift
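Steps 3 to 5 of the demo flow (lookup, flat file, compression) can be sketched in a few lines of Python. This covers only the data shaping. Authentication, the S3 transfer, and the Redshift load (steps 1, 6, and 7) are handled by the Secure Agent and Redshift and are not modeled. The field names and sample records are hypothetical.

```python
import csv
import gzip
import io


def enrich_with_sla(accounts, sla_by_account):
    """Step 3: look up each account's SLA level (None when there is no match)."""
    return [{**account, "sla_level": sla_by_account.get(account["id"])} for account in accounts]


def to_gzipped_flat_file(rows, fieldnames):
    """Steps 4-5: write rows as a CSV flat file and gzip-compress it for transfer."""
    buffer = io.StringIO()
    csv.DictWriter(buffer, fieldnames=fieldnames).writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


accounts = [{"id": "A1", "name": "Acme"}, {"id": "B2", "name": "Beta"}]  # hypothetical step 2 output
rows = enrich_with_sla(accounts, {"A1": "Gold"})
payload = to_gzipped_flat_file(rows, ["id", "name", "sla_level"])
print(gzip.decompress(payload).decode("utf-8"))
```

No header row is written. A missing SLA level becomes an empty field.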
![Page 36: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/36.jpg)
PowerCenter Mappings and Informatica Cloud
• If you want to reuse your existing PowerCenter mappings with Informatica Cloud and Redshift, you have two options:
1. Use the PowerCenter Repository Manager to export your existing workflows, then import them into Informatica Cloud using the PowerCenter Tasks feature.
2. Keep your existing mappings in PowerCenter and stage the data:
  • Create a Data Synchronization (DSS) task in Informatica Cloud to move the data from the staging area to Redshift
  • This task can be managed from PowerCenter
![Page 37: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/37.jpg)
Why Informatica Cloud Integration for Redshift?
1 Map Once, Deploy Anywhere
2 Rapid Connectivity & Deployment
3 Advanced Integration Delivered Easily
4 Excellence in batch and real-time integration
InformaticaCloud.com
![Page 38: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/38.jpg)
Next Steps
• Get started with Amazon Redshift
• Get started with Informatica Cloud
• InformaticaCloud.com
• Learn more about our Redshift Connector
• InformaticaCloud.com/Amazon-Redshift
![Page 39: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/39.jpg)
Discussion
Rahul Pathak, Amazon Redshift Product Management
Nicolas Brisoux, Informatica Cloud Platform Adoption
Darren Cunningham, Informatica Cloud Marketing
@infacloud #redshift
InformaticaCloud.com
![Page 40: Big Data in the Cloud with Informatica Cloud and Amazon Redshift](https://reader034.vdocuments.mx/reader034/viewer/2022042521/5550d5a9b4c90599308b514d/html5/thumbnails/40.jpg)