Rushabh Mehta, Managing Director (India) | Solid Quality Mentors
DESCRIPTION
Agenda: Microsoft Data Warehousing Overview; SMP vs. MPP Architecture; Microsoft Parallel Data Warehouse Architecture and Components
TRANSCRIPT
Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition
Rushabh Mehta
Managing Director (India) | Solid Quality Mentors
[email protected]

About me: Rushabh Mehta
• Professional Association for SQL Server: President
• Solid Quality Mentors (SolidQ): Business Intelligence Mentor; Managing Director, India
• SQL Server MVP
• [email protected] ◊ www.solidq.com ◊ @rushabhmehta
Agenda
• Microsoft Data Warehousing Overview
• SMP vs. MPP Architecture
• Microsoft Parallel Data Warehouse Architecture and Components
Microsoft Data Warehousing Offerings
Microsoft’s Commitment to DW and BI (Pervasive Insight)

Release timeline:
• SQL 6.0 (1995)
• SQL 6.5 (1996)
• SQL 7.0 (1998)
• SQL 2000 (2000)
• SQL CE (2000)
• 64-bit (2001)
• SQL 2005 (2005)
• SQL 2008 (2008)
• Fast Track (2009)
• PDW (2010)

Capabilities called out along the timeline: OLAP and ETL; Data Mining; Managed Reporting; Data Warehousing; Ad-hoc Reporting; DW Scale; Data Profiling; Compression; VS Integration; KPIs; Multiple sources; Resource Governor; Partitioning; PowerPivot; Load Optimize; Parallel Processing; Scale to 100s of TB

• Gartner Leaders Quadrant for Business Intelligence, since 2008
• Gartner Leaders Quadrant for Data Warehouse, since 2008
• Leader in “The Forrester Wave: Enterprise Data Warehousing Platforms, Q1 2009”
• Fastest growing of the top 5 data warehouse vendors (IDC)
• Microsoft as a company spends $9.1 billion on research annually
SQL Server Fast Track Data Warehouse
A solution to help customers and partners accelerate their data warehouse deployments:
• A method for designing a cost-effective, balanced system for data warehouse workloads
• Reference hardware configurations developed in conjunction with hardware partners using this method
• Best practices for data layout, loading and management
Fast Track Scope
[Figure: Fast Track reference architecture scope (shown dashed on the slide), with a data path and a presentation data path.
• Tiers: Supporting Systems; BI Data Storage Systems; Presentation Layer Systems
• Components shown: Integration Services ETL; Data Staging, Bulk Loading; Data Warehouse; SAN, Storage Array; Subject Area Data Marts; Analysis Services Cubes; PerformancePoint Services; Reporting Services; Web Analytic Tools; SharePoint Services; Microsoft Office SharePoint]
Fast Track Value Proposition
• Appliance-like time to value: reduces DBA effort; fewer indexes, much higher level of sequential I/O
• Choice of HW platforms: Dell, HP, Bull, EMC and IBM, with more in future
• Low TCO: through commodity hardware and value pricing; lower storage costs
• High scale: new reference architectures scale up to 48 TB (assuming 2.5x compression)
• Reduced risk: validated by Microsoft; better choice of hardware; application of best practices
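As a rough worked example of the scale figure, the sketch below assumes that "48 TB (assuming 2.5x compression)" means 48 TB of uncompressed user data that shrinks by the quoted ratio on disk. The slide does not spell this out, so the function name and the interpretation are illustrative assumptions only.

```python
def compressed_size_tb(user_data_tb: float, compression_ratio: float) -> float:
    """Estimate on-disk size for a given amount of uncompressed user data.

    Assumption: the ratio is uncompressed size divided by compressed size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return user_data_tb / compression_ratio


# 48 TB of user data at the 2.5x ratio quoted on the slide
on_disk_tb = compressed_size_tb(48, 2.5)
print(f"{on_disk_tb:.1f} TB on disk")
```

Under that reading, the quoted capacity corresponds to about 19.2 TB of physical data; a lower real-world compression ratio would reduce the usable capacity proportionally.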
SMP Architecture
SMP = Symmetric Multiprocessing
• Two or more identical processors connected to a single shared main memory and controlled by a single OS instance
• Any processor can work on any task
• Tasks move easily between processors to balance the workload efficiently
• All SQL Server implementations up until now have been SMP

MPP Architecture
MPP = Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Each CPU has its own memory
• Applications must be segmented, using high-speed communications between nodes
Parallel Data Warehouse
[Figure: appliance layout with a Control Rack and one or more Data Racks.
• Control Rack: Control Nodes (Active / Passive), Management Servers, Landing Zone, Backup Node
• Data Rack(s): Compute Nodes running SQL Server, Storage Nodes attached over Dual Fibre Channel, and a Spare Compute Node
• Interconnect: Dual InfiniBand on the Private Network
• Corporate Network connections: Client Drivers, ETL Load Interface, Corporate Backup Solution, Support / Patching]
Parallel Data Warehouse
Compute Node + Storage Node = PDW Node

Compute Nodes
• Each MPP node is a highly tuned SMP node with standard interfaces
• Dedicated hardware, database and storage
• Running SQL Server 2008 EE
• SQL as primary interface
Architecture: Compute Server Node Hardware Options

Enterprise-class DBMS
• Dual multi-core processors (CPU, CPU, RAM)
• Dual 4 Gb FC
• Dual InfiniBand

TempDB workspace
• Pre-configured for each SQL Server instance on each compute node
• Drives configured as RAID 1 to avoid appliance failover for a single drive failure
• Dell compute nodes have 2 LUNs (2 RAID 1 pairs)
• HP compute nodes have 3 LUNs (3 RAID 1 pairs)

tempdb is used for the following purposes:
• Sort-work area for data loading into clustered index tables
• Spill area for hash joins not fitting into memory
• Temporary PDW tables
Data Layout
• Replicated: a table structure that exists as a full copy within each discrete PDW node.
• Distributed: a table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS.
• Ultra shared nothing: the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
• Small sets of data can be more efficiently stored in full (replicated).
• Certain set operations are more efficient against full sets of data (i.e., single-node operations).
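The replicated and distributed layouts described above can be sketched in a few lines of Python. This is an illustration only, not PDW's actual hashing or storage code: the node count, the CRC32 hash, and the table and column names (`store_dim`, `sales_fact`, `prod_id`, and so on) are all assumptions made for the example.

```python
import zlib

NUM_NODES = 4  # hypothetical appliance size


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, column, num_nodes=NUM_NODES):
    """Distributed table: each row lands on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[column], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated table: every node holds a full copy."""
    return [list(rows) for _ in range(num_nodes)]


def local_join(fact_rows, dim_rows, key):
    """Join one node's fact slice to its local copy of the dimension."""
    dim_by_key = {d[key]: d for d in dim_rows}
    return [{**f, **dim_by_key[f[key]]} for f in fact_rows if f[key] in dim_by_key]


store_dim = [
    {"store_id": 1, "store_name": "North"},
    {"store_id": 2, "store_name": "South"},
]
sales_fact = [
    {"prod_id": i % 5, "store_id": 1 + i % 2, "qty_sold": i} for i in range(10)
]

fact_by_node = distribute(sales_fact, "prod_id")  # large fact table: hashed
dim_by_node = replicate(store_dim)                # small dimension: copied

# Every node joins its own fact rows to its own dimension copy,
# so no rows have to move between nodes.
joined = [
    row
    for n in range(NUM_NODES)
    for row in local_join(fact_by_node[n], dim_by_node[n], "store_id")
]
print(len(joined), "joined rows")
```

Because the small dimension is present on every node, the join never needs rows from another node, which is the data-movement saving the "ultra shared nothing" bullet refers to.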
Data Layout
Example star schema:
• Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
• Sales Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
[Figure: each node holds a copy of the dimension tables (DD, SD, ID, PD) and one distribution of the Sales Fact table (SF1, SF2, SF3, ...).]
Parallel Data Warehouse
Control Nodes (Active / Passive Cluster), Client Drivers

Control Node & Client Drivers
• Client connections always go through the control node
• The control node contains no persistent user data
• PDW ‘Secret Sauce’
  • Processes SQL requests
  • Prepares execution plan
  • Orchestrates distributed execution
• Local SQL Server to do final query plan processing / result aggregation
• Client drivers provided by DataDirect
  • ODBC, OLE-DB, JDBC and ADO.NET client drivers
  • Available drivers for 32 and 64 bits
PDW Benefits – Massive Parallel Processing
[Figure: Query 1 flowing from the Control Rack to the Data Rack]
• Query 1 is standard T-SQL submitted to SQL Server on the Control Node
• The query is executed on all 10 nodes
• Results are sent back to the client
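The flow on this slide (one request in, execution on every node, one result back) resembles a scatter-gather aggregation. The sketch below is a simplified illustration under that assumption, not PDW's query engine: the function names, the `(store_id, dollars_sold)` row shape, and the use of a thread pool to stand in for separate compute nodes are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Runs on one compute node: aggregate only the locally stored rows."""
    partial = Counter()
    for store_id, dollars_sold in rows:
        partial[store_id] += dollars_sold
    return partial


def control_node_query(nodes):
    """Runs on the control node: fan out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_partial, nodes))
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds values per key
    return dict(final)


# Three hypothetical compute nodes, each holding its own slice of the fact table
nodes = [
    [(1, 100), (2, 50)],
    [(1, 25)],
    [(2, 75), (3, 10)],
]
totals = control_node_query(nodes)
print(totals)
```

The merge step mirrors the control node's "final query plan processing / result aggregation" role: compute nodes return small partial results rather than raw rows.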
PDW Benefits – Massive Parallel Processing
• Blazing fast performance by parallelizing queries on highly optimized ultra shared nothing nodes
• Multiple queries are simultaneously executed across all nodes
• PDW supports querying while data is loading
[Figure: multiple queries flowing from the Control Rack to the Data Rack]
Parallel Data Warehouse
Management Nodes (Active / Passive Cluster), Support / Patching

Management Node
• Runs a separate domain controller (Active Directory)
• Used for deploying patches to all nodes in the appliance
• Holds images in case a node needs reimaging
• High availability using active / passive clustering
Parallel Data Warehouse
Landing Zone, ETL Load Interface

Landing Zone
• Provides high-capacity storage for data files from ETL processes
• Integration Services available on the landing zone
• Connected to internal network
• Available as sandbox for other applications and scripts that run on internal network
Load path: Source → Landing Zone Files → Data Loader → Compute Nodes
Parallel Data Warehouse
Backup Node, Corporate Backup Solution

Backup Node
• Coordinated backup across the nodes
• Database-level backup
• Full or differential
• Metadata backup
• Can restore to a larger appliance
• Optional item – 1 size per config
• Up to 524 TB of capacity
• Available in XS, S, M, L and XL
PDW Software Architecture
[Figure: PDW software stack by node.
• Nodes: Control Node, Compute Nodes, Landing Zone, Backup Node, Management Node
• Components shown: SQL Server (DW Authentication, DW Configuration, DW Queue, DW Schema); PDW Services; DSQL Core Engine Services; DMS Manager; DMS; Admin Console; IIS; User Data; Loader Client; SQL; SSIS; HPC; AD; SQL OS
• Client access: Nexus Query Tool, MS BI (AS, RS) and 3rd party tools, over JDBC, OLE-DB, ODBC and ADO.NET
• Legend: Built by DWPU; Existing MS software; 3rd Party]
Conclusion
• MPP architecture supports massive scale through increased parallelization and shared-nothing architecture
• Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition brings massive scale wrapped in the simplicity of an appliance
References
• Microsoft Parallel Data Warehouse official site: http://www.microsoft.com/pdw
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.