SQL Server 2008 R2 Parallel Data Warehouse
DESCRIPTION
Presentation by Bruce Campbell of Microsoft. Learn about a new capability in SQL Server 2008 R2, Parallel Data Warehouse, formerly known as Project Madison.
TRANSCRIPT
SQL Server and Data Warehousing
SQL Server 2008 R2 Parallel Data Warehouse Appliance
Speaker: Phil Hummel of WinWire Technologies
Presentation developed by: Bruce Campbell
Western Region Data Warehouse Specialist, Microsoft
Silicon Valley SQL Server User Group
February 16, 2009
Mark Ginnebaugh, User Group Leader
Agenda
• SQL 2008 R2 Parallel DW Appliance
– Hardware and Software Architecture
– Case Study
– Customer Experience Opportunities
• Next Steps
SQL Server Parallel Data Warehouse (formerly Project Madison)
[Diagram: Madison MPP layer on reference hardware platforms]
• Industry standard networking
• Industry standard servers
• Industry standard storage
Parallel DW Appliance Experience
• All hardware from a single vendor
• Multiple vendors to choose from
• Orderable at the rack or cluster
• Vendor will
– Assemble appliances
– Image appliances with OS, SQL Server and Madison software
• Appliance installed in less than a day
• Support
– Vendor provides hardware support
– Microsoft provides software support
SQL Server Parallel DW Node
Parallel DW - MPP Example
[Diagram: client query sent to the database servers]
• ODBC/JDBC
• SQL92 with analytical extensions
• Query rewritten into steps that run efficiently on the database servers
• Dual Infiniband
• Dual Fiber Channel
SELECT location, year,
sum(b.sales_amt)
FROM customer a, sales b
WHERE b.sales > 500 and
a.custid = b.custid
GROUP BY location, year
ORDER BY 1,2
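To make concrete what the example query computes, here is a minimal sketch that runs it against an in-memory SQLite database. The table layouts and rows are invented for illustration, and the filter is assumed to apply to sales_amt (the slide writes b.sales); SQLite merely stands in for a single database server and is not part of the appliance.

```python
import sqlite3

# Hypothetical toy schema: column names follow the slide's query,
# but the table layouts and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custid INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE sales (custid INTEGER, year INTEGER, sales_amt REAL);
    INSERT INTO customer VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales VALUES
        (1, 2008, 600.0), (1, 2008, 400.0), (1, 2009, 900.0),
        (2, 2008, 700.0), (2, 2009, 300.0);
""")

# Assumption: the filter column is sales_amt.
QUERY = """
    SELECT location, year, SUM(b.sales_amt)
    FROM customer a, sales b
    WHERE b.sales_amt > 500 AND a.custid = b.custid
    GROUP BY location, year
    ORDER BY 1, 2
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Only sales rows above 500 survive the filter, so each (location, year) group sums the qualifying amounts for customers in that location.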
Database Servers
• A SQL Server 2008 instance
• SQL as primary interface
• Each MPP node is a highly tuned SMP node with standard interfaces
• DB engine nodes autonomous on local data
Ultra Shared Nothing
• An extension of traditional shared nothing design
– Push shared nothing architecture into SMP node
• IO and CPU affinity within SMP nodes
– Eliminate contention per user query
– Use full PDW Node resources for each user query
– Multiple physical instances of tables
• Distribute large tables
• Replicate small tables
– Re-Distribute rows “on-the-fly” when necessary
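The three table placements listed above (distribute, replicate, redistribute) can be sketched as follows. This is an illustration only: the node count, the CRC32 hash, and the table contents are assumptions, and the slides do not describe the actual hashing scheme PDW uses.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical appliance size


def node_for(key):
    """Map a distribution key to a node with a stable hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def distribute(rows, key):
    """Split a large table so that each row lives on exactly one node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key])].append(row)
    return nodes


def replicate(rows):
    """Copy a small table in full to every node."""
    return {n: list(rows) for n in range(NUM_NODES)}


def redistribute(nodes, new_key):
    """Re-hash rows that are already placed on nodes onto a different key."""
    all_rows = [row for part in nodes.values() for row in part]
    return distribute(all_rows, new_key)


# Invented sample data.
sales = [{"custid": c, "storeid": c % 3, "sales_amt": 100.0 * c} for c in range(1, 13)]
customer = [{"custid": c, "location": "East" if c % 2 else "West"} for c in range(1, 13)]

sales_by_cust = distribute(sales, "custid")              # large table: distributed
customer_copies = replicate(customer)                    # small table: replicated
sales_by_store = redistribute(sales_by_cust, "storeid")  # re-hashed for a join on storeid
```

Because every node holds a full copy of the small table, a join between sales and customer needs no data movement; rows only move when a query joins or groups on a column other than the distribution key.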
Control Node & Client Drivers
• Client connections always go through the control node
– Clustered to a passive node to support High Availability
• Processes SQL requests
• Prepares execution plan
• Orchestrates distributed execution
• Local SQL Server to do final query plan processing / result aggregation
• Drivers
– ODBC
– OLE-DB
– ADO.NET client drivers
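The control node's role described above (orchestrate distributed execution, then do the final result aggregation) can be sketched as a scatter-gather over per-node partial sums. The function names, the sample rows, and the assumption that sales rows are spread on a key other than the grouping columns are all hypothetical; this is not the appliance's actual query plan.

```python
from collections import defaultdict


def local_step(sales_part, customer_copy, min_amt=500):
    """Work done on one compute node: filter, join, and pre-aggregate local rows."""
    location_of = {c["custid"]: c["location"] for c in customer_copy}
    partial = defaultdict(float)
    for s in sales_part:
        if s["sales_amt"] > min_amt and s["custid"] in location_of:
            partial[(location_of[s["custid"]], s["year"])] += s["sales_amt"]
    return dict(partial)


def control_node(partials):
    """Final step: merge the per-node partial sums and sort the result."""
    total = defaultdict(float)
    for partial in partials:
        for group, amount in partial.items():
            total[group] += amount
    return sorted((loc, yr, amt) for (loc, yr), amt in total.items())


# Invented data: customer is replicated; sales is spread over three nodes.
customer = [{"custid": 1, "location": "East"}, {"custid": 2, "location": "West"}]
node_sales = [
    [{"custid": 1, "year": 2008, "sales_amt": 600.0},
     {"custid": 1, "year": 2009, "sales_amt": 900.0}],
    [{"custid": 2, "year": 2008, "sales_amt": 700.0},
     {"custid": 2, "year": 2009, "sales_amt": 300.0}],
    [{"custid": 1, "year": 2008, "sales_amt": 800.0}],
]

result = control_node(local_step(part, customer) for part in node_sales)
print(result)
```

The same group ("East", 2008) shows up on two nodes here, which is why the final merge on the control node is needed: SUM can be computed as a sum of partial sums.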
Landing Zone
• Provides high capacity storage for data files from ETL processes
• Supports division of workload dedicated to ETL processes
• SSIS available on the landing zone
• Connected to PDW internal network
• Available as sandbox for other applications and scripts that run on internal network.
[Diagram: data flow from Source to Landing Zone files, through the Data Loader, to the Compute Nodes]
Backup Node
• Builds on SQL Server native backup/restore facility
• Executes at Infiniband network speeds
• Database-level backup
• Subsequent backups are optimized
• Coordinated backup across the nodes
• Quiesce write activity to synchronize
Software Architecture
[Diagram: software components by node]
• Nodes: Control Node, Compute Nodes, Landing Zone, Backup Node, Management Node
• Client tools: Nexus Query Tool, MS BI (AS, RS), SQL SSIS, other 3rd party tools
• Client drivers: JDBC, OLE-DB, ODBC, ADO.NET
• Components: PDW Services, DSQL Core Engine Services, DMS, DMS Manager, DMS Loader, Loader Client, IIS, Admin Console, SQL Server, SQL OS, HPC, AD, User Data
• DW databases: DW Authentication, DW Configuration, DW Queue, DW Schema
• Legend: Built by DWPU; Existing MS software; 3rd Party
Data Distribution supports even distribution of data across PDW nodes
Data Replication
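How evenly rows spread across nodes depends on the column chosen as the distribution key. The sketch below compares a key with many distinct values against one with only three; the CRC32 hash, the node count, and the column names are assumptions made for illustration, not details from the slides.

```python
import zlib
from collections import Counter


def rows_per_node(keys, num_nodes):
    """Count how many rows land on each node for a given list of key values."""
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_nodes for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


def skew(counts):
    """Largest node's row count relative to a perfectly even split (1.0 is ideal)."""
    return max(counts) / (sum(counts) / len(counts))


NUM_NODES = 8                               # hypothetical appliance size
order_ids = range(100_000)                  # many distinct key values
regions = [i % 3 for i in range(100_000)]   # only three distinct key values

even = rows_per_node(order_ids, NUM_NODES)
lumpy = rows_per_node(regions, NUM_NODES)

print("order_id key:", even, "skew", round(skew(even), 2))
print("region key:  ", lumpy, "skew", round(skew(lumpy), 2))
```

With only three distinct key values, at most three of the eight nodes can receive rows, so the busiest node carries well over twice its fair share while the others sit idle.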
SQL Server Parallel DW Architecture - HP
[Diagram: HP reference architecture]
• Control Nodes (active / passive)
• Database Servers, plus a Spare Database Server
• Client Drivers
• ETL Load Interface
• Corporate Backup Solution
• Data Center Monitoring
• Dual Infiniband
• Dual Fiber Channel
• Corporate Network and Private Network
MPP Architecture
HA Built In
Linear Scalability
Hub and Spoke – Flexible Business Alignment
• Parallel database copy technology enables rapid data integration and consistency between hub and spokes
• Support user groups with very different SLAs; hot, warm and cold data; different requirements on data loading, etc.
• A Hub and Spoke solution gives you the flexibility to add/change diverse workloads/user groups, while maintaining data consistency across the enterprise
• Create SQL Server Parallel Data Warehouse, SQL Server 2008, Fast Track Data Warehouse, and SQL Server Analysis Services spokes
Parallel DW and Fast Track Hub and Spoke
[Diagram: hub and spoke topology]
• Central EDW Hub
• ETL Tools
• Spokes: Regional Reporting, Departmental Reporting, High Performance HQ Reporting
Microsoft Released First Technology Preview for Parallel Data Warehouse
• First Technology Preview released on August 14
• DATAllegro’s MPP engine is now ported to SQL Server 2008 and Windows Server 2008
• 10 customers from 7 industries signed up
– First Premier BankCard was the first customer to enlist on Madison
– Internally – ICE, MSIT, ADCenter, XBOX
• Appliances with 8 to 20 nodes now ready to host customer test drives
Early Results
• Data Loading rates of 1 TB per hour
• Query executions at over 1.5 TB per minute
• Madison running 5 times faster than DATAllegro with Ingres DBMS before acquisition!
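For a sense of scale, the two headline rates above can be converted to per-second figures. This is plain arithmetic, assuming decimal units (1 TB = 10^12 bytes), which the slide does not specify.

```python
TB = 10**12  # assumption: decimal terabyte, in bytes


def per_second(total_bytes, seconds):
    """Average throughput in bytes per second."""
    return total_bytes / seconds


# 1 TB per hour of loading, expressed in MB/s.
load_mb_s = per_second(1 * TB, 3600) / 10**6

# 1.5 TB per minute of query scanning, expressed in GB/s.
scan_gb_s = per_second(1.5 * TB, 60) / 10**9

print(f"load: {load_mb_s:.1f} MB/s")
print(f"scan: {scan_gb_s:.1f} GB/s")
```

Under that assumption the load rate is roughly 278 MB/s and the scan rate is 25 GB/s, aggregated across the whole appliance rather than per node.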
Launch of Parallel Data Warehouse:
• Next Technology Preview due early CY2010
• Technology Adoption Program (TAP) due early CY2010
• Nominations now open
• Parallel Data Warehouse to launch in summer 2010
Parallel DW Beta Programs
• Two Programs
– MTP – Madison Technology Preview
• 20 – 30 participants
• Duration of 4 to 6 weeks
– TAP – Beta production implementation
• 6 – 8 customers
• First iteration 9 to 12 weeks
Parallel DW Beta Programs
• Requirements
– Focus on EDW and large data marts
– Migration projects, not green field
– Open to customers & prospects
– 30+ TB of data…at least 4 100+ TB
– Hub-and-spoke in only a select few cases
Case Study: First Premier Bankcard
Existing Environment
• Hardware
– 16 CPU HP 8620 Itanium
– Hitachi Storage, 27 TB raw
– SATA, 21 LUNs
• Software
– Windows 2003 SP2
– SQL Server 2008
– SSIS/SSRS
• Data Warehouse
– 18 terabytes
– Star schema
– 80 fact tables
– 500+ dimensions
Current Challenges
• Data load speeds
• Analytic capacity
• Analytic speed
• Mixed workload
• Total cost of ownership
Madison Highlights
• Improved by 300%
• 30 TB / 160 cores
• Query speeds: 70X improvement
• Concurrency: mixed workload
• TCO lowered by 50%
Microsoft Commitment
• MTP
– High touch Support
– MS or partner will provide HW and will host the MTP
– Customer may have opportunity to engage with TAP
– MS will work with customer to define scope and success criteria
– MS will perform the bulk of MTP work (2-3 resources)
• TAP
– Customer must procure the Madison reference architecture and conduct the TAP in their own data center
– Premier support will be provided
– MSFT Services will be provided
– Training / mentoring will be provided
– MS will work with customer to define scope and success criteria
Customer Commitment
• MTP
– Customer to provide data, queries, concurrency model, existing data model, etc.
– Customer to provide SME and DBA to answer questions of MTP team
– Customer to provide existing benchmarks
– Customer to define priorities for testing and areas of interest
– Customer to attend 2-3 day MTP interactive session and review
• TAP
– Customer to provide data, queries, concurrency model, existing data model, etc.
– Customer to provide SME, DBA and other resources to work with MS TAP team
– For onsite: customer to provide building access, internet access, etc.
– Customer to provide PDW Reference Hardware
MTP & TAP Schedule
• MTP 1 – Completed
• MTP 2 – Q1 2010
• TAP – Q2 2010
• RTM – Summer 2010
Next Steps
Proof Steps
• Quick Start DW Roadmap Service
• Architectural Design Session
• Madison Technology Preview (MTP)
• Review Madison, SQL Server Classic or Fast Track DW HW/SW configurations and pricing
www.bayareasql.org
To attend our meetings or inquire about speaking opportunities, please contact:
Mark Ginnebaugh, User Group Leader
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.