Agenda
• Data Warehousing – Current Situation
• Customer Challenges
• Industry Trends
• Fast Track Data Warehouse
• Parallel Data Warehouse
• Summary
Data Warehouse Landscape and Customer Challenges
1. Data explosion – the number of firms with 10+ TB of data is expected to double in 3 years
2. Customers need to reduce cost
3. More customers plan to do real-time analytics
4. Growing market adoption of DW appliances
5. Move to MPP – more customers plan to use MPP in the next 3 years
[Chart: survey figures. > 10 TB: 17% and 34%; Appliances: 57%; other values shown: 78%, 92%, 82%; labels: Growing Market, Massive Parallel Processing]
Source – TDWI, “Next Generation Data Warehouse Platforms”
Data Warehouse Industry Trends
Microsoft has steadily invested in the most important data warehouse technologies
• Data Quality: Data Quality (Zoomix)
• Real-Time DW and Streaming Data: StreamInsight (Streaming Data)
• Advanced Analytics
• MPP: MPP (Parallel Data Warehouse)
• MDM: Master Data Services
• Secure and Robust: Database Security
• Column Store: Column Store (Project Apollo)
Microsoft Data Warehouse Vision
Make SQL Server the fastest and most affordable database for customers of all sizes
• Massive Scalability at a Low Cost
• Flexibility and Choice
• Complete Data Warehouse Solution
• Simplified Data Warehouse Management
• Microsoft & Partner Services
A complete range of solutions on one platform
Fast Track Data Warehouse
• A method for designing a cost-effective, balanced system for Data Warehouse workloads
• Reference hardware configurations developed in conjunction with hardware partners using this method
• Best practices for data layout, loading and management
Software:
• SQL Server 2008 Enterprise
• Windows Server 2008
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Hardware:
• Tight specifications for servers, storage and networking
• "Per core" building block
Introducing Parallel Data Warehouse
Tier 1 Enterprise Data Warehouse Appliance Offering
• High scalability from 10s to 100s of terabytes
• High performance through MPP system
Flexibility and Choice
• Multiple hardware vendors
• Choice of deployment options through distributed architecture
Most Comprehensive Solution
• Complete data warehouse solution spanning desktop, enterprise data warehouse (EDW), and data marts
• Deep integration with Microsoft business intelligence (BI)
• Comprehensive toolset for BI, ETL, MDM, and streaming data
Redefining High Scale Data Warehousing
Tier 1 Enterprise Data Warehouse, Flexibility and Choice, Most Comprehensive Solution
• Leverage existing data marts and data warehouse solutions
• Pre-tested configurations that provide low risk implementation
• Rich, comprehensive intelligence
• Multiple hardware offerings mean no vendor lock-in
• Support for data marts, Fast Track implementations, and MPP hubs
• Storage and processing options to fit your needs
• Ultra shared nothing architecture
• Operations run in parallel for performance and scale
• Interoperability with existing BI solutions
Parallel Data Warehouse Appliance Hardware Architecture
[Diagram: Control Rack and Data Rack connected over a Private Network, with links to the Corporate Network]
• Control Rack: Control Nodes (Active/Passive), Management Servers, Landing Zone, Backup Node
• Data Rack: Compute Nodes (SQL), Storage Nodes, Spare Compute Node (spare database server)
• Interconnects: Dual Infiniband, Dual Fiber Channel
• Corporate Network: Client Drivers, ETL Load Interface, Corporate Backup Solution, Data Center Monitoring
• Data: Star Schema or Normalized Data; data stored on servers; Backup Data
Query 1
Query 1 is submitted to SQL Server on Control Node
Query is executed on all 10 Nodes
Results are sent back to client
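The three steps above (submit to the control node, execute on all nodes, return results) follow a general scatter-gather pattern. The sketch below illustrates that pattern only; it is not PDW code. The node count of 10 comes from the slide, while the row placement rule, the sample rows, and the function names (`distribute`, `run_on_node`, `run_query`) are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 10  # the slide mentions 10 nodes; everything else here is illustrative


def distribute(rows, num_nodes=NUM_NODES):
    """Spread rows across nodes by key (hypothetical placement rule)."""
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        # Sum of character codes is a deterministic stand-in for a real hash.
        nodes[sum(map(ord, region)) % num_nodes].append((region, amount))
    return nodes


def run_on_node(node_rows):
    """Per-node step: aggregate only the rows stored on this node."""
    partial = Counter()
    for region, amount in node_rows:
        partial[region] += amount
    return partial


def run_query(nodes):
    """Control-node step: fan the query out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(run_on_node, nodes))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the counts together
    return dict(total)


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
result = run_query(distribute(rows))
print(result)
```

Each node works only on its own rows, so the per-node work shrinks as nodes are added; the merge step on the control node is the only place where results meet.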
Blazing fast performance by parallelizing queries on highly optimized ultra shared nothing nodes.
Multiple queries are simultaneously executed across all nodes.
PDW supports querying while data is loading.
Software Architecture
[Diagram: software components on the Control Node, Compute Nodes, and Landing Zone Node]
• Clients: Query Tool, MS BI (AS, RS), Other Third-Party Tools, DWSQL, Internet Explorer
• Data Access (OLEDB, ODBC, ADO.NET, JDBC)
• Components shown: MPP Engine Coordinator, IIS Admin Console, Data Movement Service, SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB), SQL Server User Data
MPP Engine Coordinator
• Provides single system image
• SQL compilation
• Global metadata and appliance configuration
• Global query optimization and plan generation
• Global query execution coordination
• Global transaction coordination
• Authentication and authorization
• Supportability (hardware and software status)
Data Movement Service
• Data movement across the appliance
• Distributed query execution operators
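One common reason to move data across a shared-nothing system is to co-locate rows that must be joined. The sketch below shows a generic hash-redistribution step followed by a local join on each node. It is an illustration under stated assumptions, not the Data Movement Service's actual behavior: the CRC32-based placement, the sample tables, and the names `target_node` and `redistribute` are all hypothetical.

```python
import zlib


def target_node(key, num_nodes):
    """Pick a node from a stable hash of the key (illustrative rule)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def redistribute(nodes, key_index):
    """Move every row to the node chosen by hashing row[key_index]."""
    num_nodes = len(nodes)
    moved = [[] for _ in range(num_nodes)]
    for node_rows in nodes:
        for row in node_rows:
            moved[target_node(row[key_index], num_nodes)].append(row)
    return moved


# Hypothetical starting placement: rows sit on arbitrary nodes.
orders = [[(1, "book")], [(2, "pen"), (1, "lamp")], [(3, "desk")]]
customers = [[(3, "Cy")], [(1, "Ann")], [(2, "Bo")]]

# After redistribution on customer id, matching rows share a node.
orders_by_cust = redistribute(orders, key_index=0)
customers_by_id = redistribute(customers, key_index=0)

joined = []
for order_rows, customer_rows in zip(orders_by_cust, customers_by_id):
    names = dict(customer_rows)  # local lookup: customer id -> name
    joined += [(names[cust_id], item) for cust_id, item in order_rows]

print(sorted(joined))
```

Because both tables are placed with the same hash rule, every join can be completed with data already on the node, and no node needs to see the whole table.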
[Diagram labels: SQL Parser, DMS Manager, Core Engine Services; Backup Node: Data Movement Service]
Parallel Data Warehouse: An appliance experience
• All hardware from a single vendor
• Multiple vendors to choose from
• Orderable at the rack level
• Vendor will:
  • Assemble appliances
  • Image appliances with OS, SQL Server, and PDW software
• Appliance installed in 1 to 2 days
• Support:
  • Microsoft provides first call support
  • Hardware partner provides onsite break/fix support
The Most Complete Data Warehouse Solution
Value to Customer, with Supporting Features
• Compelling end user BI experience
  Supporting features: PowerPivot, Reporting Services, and Microsoft Excel®
• Agility to support new business groups with integrated data marts
  Supporting features: Distributed Data Warehouse Architecture integrates both MPP and SMP data warehouses
• Fast time to value for BI through integration with Microsoft BI
  Supporting features: Integration with Microsoft BI tools, including Analysis Services, Integration Services, and Microsoft SharePoint®
• Complete data warehouse solution spanning desktop, data marts, and EDW
  Supporting features: Most comprehensive toolset, including ETL, BI, MDM, and StreamInsight
Integration with SQL Server BI tools
• SQL Server Integration Services (ETL) has PDW as a destination
• SQL Server Analysis Services (OLAP) has PDW as a source
• SQL Server Reporting Services
• PowerPivot integration
Complementary SQL Server tools
• Master Data Services (for MDM)
• StreamInsight (for streaming data)
The Most Complete Data Warehouse Solution
Distributed Data Warehouse Architecture - Flexible Business Alignment
A distributed architecture gives you the flexibility to add or change diverse workloads or user groups, while maintaining data consistency across the enterprise.
• Parallel database copy technology enables rapid data movement and consistency between EDW and Data Marts
• Create SQL Server 2008, Fast Track Data Warehouse, and SQL Server Analysis Services Data Marts
• Support user groups with very different SLAs:
  • Performance
  • Capacity
  • Loading
  • Concurrency
Summary
Parallel Data Warehouse offers
• Massive scalability to 100s of terabytes
• Massively Parallel Processing Appliances
• Low Cost Hardware Choice through industry standard hardware
• Microsoft BI Platform Integration
• Tier 1 Level Critical Advantage Program
• Distributed Data Warehouse
Next Steps
Learn More:
• Visit the Microsoft Data Warehousing portal
• Visit the Fast Track and Parallel Data Warehouse Web pages
• Visit the SQL Server DW Portal on TechNet