TRANSCRIPT
Optimizing Your BI Semantic Model for Performance and Scale
Dave Wickert (AE7TD), [email protected]
Program Manager, SQL Server Business Intelligence, Microsoft
My radios
• Main rig: ICOM 7000, 100W HF into a 33’ Buddipole
• QRP rig (on order): Elecraft KX3, 10W HF into a 17M 28’ PAR EndFedz wire thrown up into a tree
• SDR ‘toy’: FUNcube Dongle
• D-STAR digital radio in my office: DV Dongle
Session Objectives
You will understand:
• The architecture of Analysis Services in tabular mode
• Optimizing processing performance
• Query processing architecture
Takeaway
Factors to think through for capacity planning
BI Semantic Model: Architecture
[Diagram: the BI Semantic Model sits between data sources (databases, LOB applications, files, OData feeds, cloud services) and clients (Reporting Services, Excel, PowerPivot, SharePoint Insights, third-party applications). The model has three layers: data model (multidimensional or tabular), business logic and queries (MDX or DAX), and data access (ROLAP, MOLAP, xVelocity/VertiPaq, DirectQuery).]
VertiPaq Design Principles
1. Performance, Performance, Performance
2. Query Performance >> Processing Performance
3. Accommodate changes without forcing reload, if possible
Encoding
• Value Encoding: arithmetic for value <-> dataID. Great for dense value distributions; allows computation directly on dataIDs.
• Hash Encoding: hash table for value <-> dataID. Great for sparse value distributions; requires decompression for computation.
Encoding is chosen per column; compression is applied per segment.
Encoding Example
[Example: a Sales column containing the sparse values 1 and 2,000,000,000. Hash encoding maps them to the compact dataIDs 1 and 2, whereas value encoding would have to cover the full numeric range.]
Encoding is defined automatically to conserve space and is discoverable through DMVs.
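The two encoding schemes can be contrasted in a minimal sketch (this is illustrative Python, not the actual VertiPaq implementation; dataID numbering starting at 1 is an assumption):

```python
def value_encode(values, base=1):
    """Value encoding: dataID = value - min(values) + base.
    Arithmetic converts both ways; good for dense distributions."""
    lo = min(values)
    data_ids = [v - lo + base for v in values]
    decode = lambda data_id: data_id + lo - base
    return data_ids, decode

def hash_encode(values):
    """Hash encoding: a dictionary maps each distinct value to a dataID.
    Good for sparse distributions; decoding needs the dictionary."""
    dictionary, data_ids = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary) + 1
        data_ids.append(dictionary[v])
    reverse = {i: v for v, i in dictionary.items()}
    return data_ids, reverse

# Dense column: value encoding keeps dataIDs small and computable.
dense_ids, decode = value_encode([100, 101, 103, 101])
# Sparse column, as in the Sales example (1 and 2,000,000,000):
sparse_ids, reverse = hash_encode([1, 2_000_000_000, 1])
```

The sparse column compresses to dataIDs [1, 2, 1] under hash encoding, which is why the engine picks it for sparse distributions.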
VertiPaq Storage
[Diagram: two tables, T1(C1,C2,C3) with partitions 1 and 2, and T2(C1,C2) with partitions 1 and 2. Each column is stored as a dictionary plus column segments; hierarchies, a calculated column (CC1), and a relationship structure mapping rows between the tables are also shown.]
• Table data is stored in segments and dictionaries, per column. Calculated columns are stored like regular columns. Hierarchies can provide quicker access for querying.
• Relationship structures are created to accelerate lookups across tables and remove hard coupling between tables.
• Partitions are a group of segments, intended for data management. Any table can have partitions, defined independently of other tables. Partitions apply only to segment data.
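The per-column layout described above can be sketched as follows (a hypothetical toy model: the tiny segment size and zero-based dataIDs are illustrative choices, not the real engine defaults):

```python
def store_column(values, segment_rows=4):
    """Store one column as a dictionary plus fixed-size segments of
    dataIDs, mirroring the dictionary + column-segment layout above.
    segment_rows=4 is illustrative; the real default is millions of rows."""
    dictionary, ids = {}, []
    for v in values:
        # setdefault assigns the next dataID on first sight of a value.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    # Split the encoded column into segments of segment_rows rows each.
    segments = [ids[i:i + segment_rows]
                for i in range(0, len(ids), segment_rows)]
    return dictionary, segments

dictionary, segments = store_column(
    ["red", "blue", "red", "red", "green", "blue"])
```

A partition, in these terms, is just a group of such segments that can be loaded, merged, or dropped independently; the dictionary stays shared across the whole column.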
demo
Inspecting VertiPaq Storage
Processing Architecture
What’s going on?
[Diagram: network, memory, and CPU usage over time during processing.]
Processing Architecture
Hands On
Processing Phases
• Segments are processed in a pipeline: while segment N is being compressed, segment N+1 is already being read and encoded.
• After the last segment, calculated columns, hierarchies, and relationships are built.
Special case of the first segments:
• The first segment can “stretch” to be twice as large: data is read until 2 × segment_size, then split into segments 1 and 2 before compression. This optimizes for smaller lookup tables, which then fit in a single segment.
• Reading and encoding of segment 3 overlaps with the compression of segments 1 and 2.
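The phase ordering above can be laid out as a simple plan generator (a sketch under the stated assumptions; the real engine overlaps read and compress phases in time, which a flat event list only approximates):

```python
def processing_plan(total_rows, segment_rows):
    """Return the sequence of processing events implied by the slide:
    the first read stretches to 2 * segment_rows, then splits; every
    segment is compressed; derived structures are built at the end."""
    events = []
    if total_rows <= 2 * segment_rows:
        # Small (lookup) table: everything fits in the stretched
        # first segment, so no split is needed.
        events.append(("read+encode", "segment 1 (stretched)"))
        events.append(("compress", "segment 1"))
        events.append(("build", "calc columns, hierarchies, relationships"))
        return events
    events.append(("read+encode", "segments 1+2 (stretched)"))
    events.append(("split", "segments 1+2"))
    n_segments = -(-total_rows // segment_rows)  # ceiling division
    for seg in range(1, n_segments + 1):
        if seg >= 3:
            # From segment 3 on, reading overlaps earlier compression.
            events.append(("read+encode", f"segment {seg}"))
        events.append(("compress", f"segment {seg}"))
    events.append(("build", "calc columns, hierarchies, relationships"))
    return events
```

For a table smaller than twice the segment size, the plan collapses to one stretched segment, which is exactly the lookup-table optimization the slide calls out.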
Processing - Memory & CPU usage
[Diagram: the processing phases above (read & encode, split, compress, build calc columns/hierarchies/relationships) plotted against network, memory, and CPU usage over time.]
Controlling Segment Size and Compression
DefaultSegmentRowCount – number of rows per segment
• 0 – use the default: 8M for Analysis Services, 1M for PowerPivot
• Value must be a power of 2 and should be at least 1M
• Larger => generally better compression, faster queries with lower overhead
• Smaller => smaller working set during processing
ProcessingTimeboxSecPerMRow – seconds per million rows spent optimizing compression
• -1 – use the default: 10 sec
• Smaller => the greedy algorithm gives most gains in the beginning
• Larger => almost always better compression, higher query performance
• Increase for a large number of columns (~>200)
• Reported in profiler & DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
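The DefaultSegmentRowCount rules above can be checked mechanically. A hedged sketch (the helper name and the interpretation of "8M/1M" as the powers of two 2^23/2^20 are assumptions; the slide only quotes the rounded figures):

```python
def effective_segment_rows(setting, server="as"):
    """Resolve a DefaultSegmentRowCount setting per the slide's rules:
    0 means the default (8M rows for Analysis Services, 1M for
    PowerPivot); otherwise the value must be a power of 2, >= 1M."""
    if setting == 0:
        return 2**23 if server == "as" else 2**20  # 8M vs 1M rows
    if setting < 2**20:
        raise ValueError("DefaultSegmentRowCount should be at least 1M")
    if setting & (setting - 1) != 0:  # power-of-two bit trick
        raise ValueError("DefaultSegmentRowCount must be a power of 2")
    return setting
```

The power-of-two check uses the usual `n & (n - 1) == 0` bit trick, which holds exactly for positive powers of two.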
Processing options
[Diagram: a table T1 with its data, hierarchies, calculated column CC1, and relationships; Recalc rebuilds the derived structures.]
• Full: Data and Recalc
• Default: Data, and Recalc if needed
• Defrag: defragment dictionaries for a table
• Clear: remove data
• Add: creates a new partition and merges
Incremental Processing
Typical:
• Create new partitions for new data
• ProcessData on a new partition, or to reload into existing tables
• Process Recalc to rebuild calc columns, hierarchies, relationships
Avoid:
• Multiple ProcessFull – each causes a Recalc, unless in a single transactional batch
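The cost of the "avoid" case can be made concrete with a small model (an illustrative sketch of the rule stated above, not an Analysis Services API):

```python
def recalc_count(commands, single_batch):
    """Count the Recalc passes implied by a sequence of processing
    commands: each ProcessFull triggers its own Recalc, unless all
    commands share a single transactional batch (then one Recalc)."""
    fulls = sum(1 for c in commands if c == "ProcessFull")
    if single_batch:
        return 1 if fulls else 0
    return fulls

# Three tables processed with ProcessFull, one at a time vs batched:
separate = recalc_count(["ProcessFull"] * 3, single_batch=False)  # 3 Recalcs
batched  = recalc_count(["ProcessFull"] * 3, single_batch=True)   # 1 Recalc
```

Since Recalc rebuilds calculated columns, hierarchies, and relationships for the whole database, collapsing three passes into one is a direct processing-time win.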
Advanced Processing
Parallel processing:
• Use a single transaction batch for processing tables in parallel
• No parallel processing of a table’s partitions in SQL Server 2012
Use ProcessDefrag periodically:
• Deleting partitions may leave outdated values in dictionaries
• Merge partitions to reduce metadata overhead
Error handling:
• RI violations are assigned a blank value; no error during processing
Server Memory Map
• VertiPaqPagingMode enables use of system paging; dictionaries need to remain in memory
[Diagram: server memory holding the formula engine (FE), VertiPaq caches, and databases DB1 and DB2; DB1~ is the new, not-yet-committed part of the database. VertiPaqMemoryLimit defaults to 60% and TotalMemoryLimit to 80%.]
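The two limits are simple percentages, so sizing a box reduces to arithmetic (a capacity-planning sketch assuming the limits are percentages of physical server memory, as the diagram suggests):

```python
def memory_limits(physical_gb, vertipaq_pct=60, total_pct=80):
    """Translate the percentage limits from the slide into GB for a
    server with the given physical memory. Defaults match the slide:
    VertiPaqMemoryLimit = 60%, TotalMemoryLimit = 80%."""
    return {
        "VertiPaqMemoryLimit_GB": physical_gb * vertipaq_pct / 100,
        "TotalMemoryLimit_GB": physical_gb * total_pct / 100,
    }

limits = memory_limits(128)  # a 128 GB server
```

For a 128 GB server, that leaves roughly 76.8 GB for VertiPaq data and a 102.4 GB overall ceiling; the gap between the two is what the formula engine and caches work within.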
demo
Inspecting Processing Sequence
Query Processing
Querying
• Two formula engines: MDX and DAX
• The formula engine (FE) calls into VertiPaq to retrieve data
• Query logic is pushed down to VertiPaq where possible
• A VertiPaq query executes in parallel, one core per segment
• Optimized for execution on the compressed format
• VertiPaq-level caching for chatty FE/VertiPaq communication
• Events visible in profiler
VertiPaq Query Performance
• Scans at 5B+ rows/s over 1B+ rows of data
• Scans at 20B+ rows/s over 10B+ rows of data
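For capacity planning, the quoted scan rates translate directly into expected scan latency (back-of-envelope arithmetic; real throughput depends on hardware, compression, and query shape):

```python
def scan_seconds(rows, rows_per_second):
    """Estimated wall-clock time for a full VertiPaq scan at a
    given aggregate scan rate."""
    return rows / rows_per_second

# Using the figures from the slide:
t_small = scan_seconds(1e9, 5e9)    # 1B rows at 5B rows/s  -> 0.2 s
t_large = scan_seconds(10e9, 20e9)  # 10B rows at 20B rows/s -> 0.5 s
```

Sub-second full scans are what make the "no pre-aggregation" tabular model viable at this scale.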
demo
Inspecting Query Evaluation
Rich & Fast
DAX/MDX formula engine – Rich:
• Single threaded per query
• Designed for expressivity
VertiPaq query – Simple:
• One core per segment
• Optimized for speed
More is pushed down to VertiPaq over time.
Why raw speed counts: VertiPaq Performance, Performance, Performance
Amdahl's law: establishes the maximum expected improvement to an overall system when only part of the system is improved.
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
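Amdahl's law as stated above, applied to this architecture: with parallel fraction p and n cores, speedup = 1 / ((1 - p) + p / n).

```python
def amdahl_speedup(p, n):
    """Amdahl's law: maximum speedup on n cores when a fraction p
    of the work parallelizes and (1 - p) stays sequential."""
    return 1.0 / ((1.0 - p) + p / n)

# Even if 95% of a query runs in the parallel VertiPaq scan,
# 16 cores give only about a 9.1x speedup; the single-threaded
# formula engine portion caps the total.
capped = amdahl_speedup(0.95, 16)
```

This is why pushing more query logic down into the parallel VertiPaq layer matters more than adding cores.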
Session Objectives and Takeaways
Session Objectives:
• Understand the architecture of Analysis Services in tabular mode
• Understand and optimize processing performance
• Understand query processing architecture
Takeaways:
• Factors to think through for capacity planning
Related Content
DBI61-HOL: Developing a Tabular BISM Using SQL Server Data Tools
Microsoft SQL Server: Breakthrough Insights - Credible, Consistent Data
Workbook shown in talk, available here:
• http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/bismservermemoryreport
• http://www.powerpivotblog.nl/wp-content/uploads/2012/02/BISMServerMemoryReport.xlsx
• http://www.powerpivotblog.nl/?s=BISMServerMemory
Track Resources
• @sqlserver, @ms_teched
• Microsoft Virtual Academy (MVA)
• SQL Server 2012 Eval Copy
• Get Certified!
• Hands-On Labs
Resources
Connect. Share. Discuss. – http://europe.msteched.com
• Learning: Microsoft Certification & Training Resources – www.microsoft.com/learning
• TechNet: Resources for IT Professionals – http://microsoft.com/technet
• Resources for Developers – http://microsoft.com/msdn
Evaluations: http://europe.msteched.com/sessions – Submit your evals online
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS
PRESENTATION.
Best Practices for Tabular Models
Workspace DB for a data-driven design experience:
[Diagram: BI Dev Studio works against a Workspace DB on an Analysis Services instance, processing tables/partitions; the project is then deployed as a DB to the target Analysis Services instance.]
Options for working with large datasets:
1. Point to a smaller/empty DB
2. Filter large tables to reduce rows