Jeff McAllister, Intel Senior Technical Consulting Engineer
Webinar – Oct. 27, 2016
Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Welcome! - About our Speaker
Jeff McAllister
Media & OpenCL Senior Software Technical Consulting Engineer
Developer Products Division
Intel Software & Services Group
Welcome! What We’ll Cover Today
Intel Hardware
Intel® Software Tools & SDKs
Awesome Video Processing Solutions
The Other Side of the Chip
See Technical Specifications for System Requirements - Select SKUs of Intel® Xeon® & Core™ processor-based platforms apply.
Why Develop Now for 6th Generation Intel® Xeon® & Intel® Core™ Processors?
Take advantage of Intel CPUs, integrated graphics (GPUs), and hardware-accelerated MPEG-2, AVC, and now NEW HEVC codecs to deliver fast, high-density, real-time video transcoding.
Stay competitive – Transition to 4K - Innovate cloud video, OTT streaming, immersive experiences & more.
Intel Hardware is Heterogeneous
CPUs
• Awesome general purpose performance
• Large software ecosystem
Other Programmable Intel Hardware
• GPU (shown here)
• IPU
• FPGA
Media Capabilities
Gen9 Processor Graphics GPU
- 14nm process technology
- Integrated with processor
Higher Performance
- GT2 with 24 execution units
- GT4e* with 72 EUs & 128MB eDRAM
- CPU+GPU provide over 1 TFLOPS processing power
Latest API feature support
- DirectX 3D 2015 version, OGL 4.4, OpenGL ES 3.0, OpenCL 2.1
- Tightly coupled CPU/GPU programming using Shared Virtual Memory + OpenCL
Expanded hardware acceleration for media features
- Low power/full fixed function AVC encode
- HEVC Encode/Decode
- MJPEG Encode
Video Transcoding Performance: HEVC
Intel and the Intel logo are trademarks of Intel Corporation.
NEW! Up to 2 Real-time HEVC streams per Intel® Xeon® processor [1]
[1] 15 real-time HD AVC-HEVC or 4 real-time UHD AVC-HEVC transcodes, 8 real-time HD HEVC-HEVC or 2 real-time UHD HEVC-HEVC transcodes using Intel Media SDK (Target usage 7), all content 8-bit 4:2:0. Benchmark platform configuration: Processor: Intel® Xeon® processor E3-1585Lv5 @ 3.0GHz, Ring @ 3.0GHz and GT @ 1.15GHz; primary BIOS Version: SKLSE2R1.R00.B104.B01.1511110114; driver: 20.19.15.4444; platform: RVP11 halo fab 2; OS: Windows* 8.1 x64 Enterprise, 16 GB memory, 2 DIMMs 2133 MHz, one socket, four cores, Intel® Iris™ Pro Graphics P580, Intel® Hyper-threading Technology enabled, Intel® Virtualization technology enabled.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/performance.
Multistream Performance (1xRT=30fps)

Transcode                        Real-time (30fps) streams   Real-time (60fps) streams
1080p-to-1080p, AVC-to-HEVC      15                          7
1080p-to-1080p, HEVC-to-HEVC     8                           4
4K-to-4K, AVC-to-HEVC            4                           2
4K-to-4K, HEVC-to-HEVC           2                           1
E3-1500 v5 HEVC is fully accelerated targeting 4K60 capability
Specific hardware technical specifications apply. See performance benchmarks and Media Server Studio site for details.
Graphics Technology Highlights
Glossary
• Execution Units (EUs) = general purpose cores
• EUs, samplers, caches, etc. in “slices”
• Fixed function is in “unslice”
• eDRAM adds cache, increases bandwidth

Naming Convention             Adds               Other names             Summary
Intel® HD Graphics            -                  GT2, “4+2”              Good
Intel® Iris™ Graphics         +slices +eDRAM     GT3, “2+3e”             Better
Intel® Iris™ Pro Graphics     +slices +eDRAM     GT3e, GT4e, “4+4e”      Best
Just look for Intel® QuickSync Video at ark.intel.com
Fixed Function (VDBox, VEBox)
Intel Processor Graphics/GPU Overview
• GT2: Intel® HD Graphics, 24 EUs, 1 MFX
• GT3: Intel® Iris™ Graphics, 48 EUs, 2 MFX
• GT4: Intel® Iris™ Pro Graphics, 72 EUs, 2 MFX
Codecs + Frame Processing use Fixed Function + EUs
[Diagram: GPU blocks showing groups of EUs with Samplers and Caches, 3D fixed function (FF), and Media Fixed Function (VDBOX, VEBOX)]
Video Decoding
• BSD = VDBox decode
Video Encoding
• ENC = EU + VDBox VME (MB type, motion vectors, bit budget/BRC)
• PAK = VDBox (residue packing & entropy coding)
• VDENC = low power encode (6th Generation Core® & forward)
VPP
VPHal: Video Processing Hardware Acceleration Layer
VEBox
• Deinterlacing
• Denoise (Luma/Chroma)
• Frame Rate Conversion
• Color space conversions
• Composition/alpha blending
• Scaling
Optimizing Media Solutions & Applications with the Intel® Media SDK & Intel® SDK for OpenCL™ Applications
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
Better Together
Heterogeneous Tools: access the other side of the chip
Intel® Media Server Studio
Intel® Media SDK**
Intel® SDK for OpenCL™ Applications**
Heterogeneous Hardware
**Also available as standalone tools.
Intel® Media SDK / Intel® Media Server Studio
Initialization
• Decode init: Parameters (from header)
• VPP init: Parameters (in & out)
• Encode init: Parameters
Main loop
• stream → Decode → frame → VPP → frame → Encode → stream
• Media accelerator framework
• Codec based
• High level/parameter interface
• 3 operations
Good option for: Accelerated video encode, decode (and short list of frame processing)
Links to More Information: Media Server Studio, Media SDK, Intel Media Code Samples
Intel® Media SDK 2017 Supported Codecs
Standard               Encode              Decode
HEVC (main profile)    HW                  HW
AVC                    SW/HW/low power     SW/HW
MPEG-2                 SW/HW               SW/HW
MJPEG                  SW/HW               SW/HW
MVC                    SW/HW               SW/HW
VC-1                   -                   SW/HW
green=new in Intel® Media Server Studio for Gen9
Intel® Media SDK 2017 Supported Video Processing Features
• N:1 Frame Composition
• Resizing
• Color Conversion
• Deinterlacing
• Denoising
• Frame Rate Conversion
• Brightness/Contrast/Saturation
• Sharpening
Media Software Scope Diagram
Transcode pipeline
Container file input → Demuxer/Splitter → ES → Decode → Process → Encode → ES → Muxer → Container file output
• Intel® Media SDK (Video): Decode, Process, Encode
• Intel® Media SDK Audio (AAC enc/dec, MPEG dec): Decode, Encode
ES = Elementary stream
Legend: Intel Media SDK/Intel® Media Server Studio focus; Limited support; Out of scope/external component
Containers & Timestamps
• Presentation Time Stamp (PTS): Time when a unit should be presented (viewed/heard)
• Decoding Time Stamp (DTS): Secondary. Required when decode + buffer must happen before presentation due to complex reference structure (e.g. modern video formats)
Container
• Audio: Packets, Timestamps
• Video: Packets, DTS, PTS
Pipeline: container fmt bitstream → Demux → elementary video bitstream → Decode → Surface → Frame Proc → Surface → Encode → elementary video bitstream → mux (video + audio)
mfxBitstream
• DecodeTimeStamp (DTS)
• TimeStamp (PTS)
mfxFrameSurface1->mfxFrameData
• TimeStamp (PTS)
Takeaways
• Intel® Media SDK forwards timestamps (PTS) through the pipeline
• Except for FRC and deinterlace VPP, Media SDK does not touch timestamps
• New DTS added to output bitstream based on encoder GOP settings
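Because the SDK forwards PTS values rather than inventing them, an application that feeds raw frames may need to supply the timestamps itself. A minimal sketch of one way to do that, assuming a constant frame rate: Media SDK timestamps are expressed in 90 kHz units, while the helper names `FrameIndexToPts` and `PtsToSeconds` are hypothetical and not part of the SDK.

```cpp
#include <cassert>
#include <cstdint>

// Media SDK timestamps (for example the TimeStamp fields shown above)
// are expressed in units of a 90 kHz clock.
constexpr std::uint64_t kMfxClockHz = 90000;

// Hypothetical helper: PTS of frame `frameIndex` for a constant frame rate
// given as the rational frameRateNum / frameRateDen (e.g. 30000 / 1001).
std::uint64_t FrameIndexToPts(std::uint64_t frameIndex,
                              std::uint32_t frameRateNum,
                              std::uint32_t frameRateDen) {
    assert(frameRateNum != 0);
    // Multiply before dividing to keep integer precision.
    return frameIndex * kMfxClockHz * frameRateDen / frameRateNum;
}

// Hypothetical helper: convert a 90 kHz timestamp to seconds, e.g. for logs.
double PtsToSeconds(std::uint64_t pts) {
    return static_cast<double>(pts) / static_cast<double>(kMfxClockHz);
}
```

An application could assign `FrameIndexToPts(n, 30, 1)` to the input surface's timestamp before submitting frame `n`, then read the same value back from the output bitstream.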
Memory Usage – Memory Surfaces
Two surface types
– System surfaces
– Video surfaces
Video memory is tiled & can’t be efficiently accessed by CPU
Every Media SDK component supports both memory types; an internal copy is used if necessary
Internal copy may lead to HUGE performance degradation
[Diagram: system memory is used by the CPU (app SW); video memory is used by the GPU (HW)]
Opaque Memory
Before
• Software/Hardware decision
• Hardware: Allocate DirectX* Surfaces
• Software: Allocate System Memory Buffers
• Allocator Callbacks
• DECODE/VPP/ENCODE Initialization
After
• Allocate frames with NULL buffer pointers
• DECODE/VPP/ENCODE Initialization
Problem: Different allocation pathways for software & hardware implementations increase complexity & code maintenance
Solution: Let Intel® Media SDK allocate surfaces & handle them internally
Basic Structure of an Intel® Media SDK-optimized Application
Application
• Initialize Session, set parameters
• Query + Allocate
• Main loop (repeat):
  – Find free surface
  – Queue stages: decode, VPP, encode
  – Sync
  – Retrieve output
• Drain loop: Same as above
• Clean up, exit
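The "find free surface" step can be sketched as a scan over a surface pool for an entry the session is not currently using. This is an illustration only: `Surface` is a stand-in type with a simple lock counter, not the SDK's surface structure, and `FindFreeSurface` is a hypothetical name similar in spirit to the `GetFreeSurfaceIndex` helper used in the encode loop later in this deck.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a frame surface. The assumption here is that a non-zero
// lock counter means the pipeline is still working on the frame.
struct Surface {
    unsigned locked = 0;
};

// Return the index of the first surface that is not locked,
// or -1 when every surface in the pool is still in use.
int FindFreeSurface(const std::vector<Surface>& pool) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].locked == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;  // caller should sync on pending work and try again
}
```

When the function returns -1, a main loop would typically synchronize on an outstanding operation to release a surface before retrying.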
Intel® Media SDK “Hello world”
All Stages are Initialized with mfxVideoParam
typedef struct {
    mfxU32 AllocId;
    mfxU32 reserved[2];
    mfxU16 reserved3;
    mfxU16 AsyncDepth;
    union {
        mfxInfoMFX mfx;
        mfxInfoVPP vpp;
    };
    mfxU16 Protected;
    mfxU16 IOPattern;
    mfxExtBuffer** ExtParam;
    mfxU16 NumExtParam;
    mfxU16 reserved2;
} mfxVideoParam;
(in mfxstructures.h)
mfxInfoMFX (decode/encode)
• Codec, profile/level
• Decode: mostly read from stream header
• Encode: params covered in encode section
mfxInfoVPP
• In/Out frame parameters (covered in VPP section)
Extended Parameter sets (next slide)
Extended Parameter Sets
mfxExtCodingOption2 extendedCodingOptions2;
memset(&extendedCodingOptions2, 0, sizeof(extendedCodingOptions2));
extendedCodingOptions2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
extendedCodingOptions2.Header.BufferSz = sizeof(extendedCodingOptions2);
extendedCodingOptions2.BRefType=MFX_B_REF_PYRAMID;
mfxExtCodingOption3 extendedCodingOptions3;
memset(&extendedCodingOptions3, 0, sizeof(extendedCodingOptions3));
extendedCodingOptions3.Header.BufferId = MFX_EXTBUFF_CODING_OPTION3;
extendedCodingOptions3.Header.BufferSz = sizeof(extendedCodingOptions3);
extendedCodingOptions3.EnableMBQP=MFX_CODINGOPTION_ON;
mfxExtBuffer* extendedBuffers[2];
extendedBuffers[0] = (mfxExtBuffer*) & extendedCodingOptions2;
extendedBuffers[1] = (mfxExtBuffer*) & extendedCodingOptions3;
mfxEncParams.ExtParam = extendedBuffers;
mfxEncParams.NumExtParam = 2;
Problem: How to future proof AND allow the SDK to grow?
Solution: Design in the ability to extend while retaining mfxVideoParam original size
(Now most apps require EPS)
Common pattern
• Configure ID and size in header
• 0 = default/unused
• Supply values for params used
• Attach an array of param buffers
• Tell Intel® Media SDK how big the array is
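To show why the ID-and-size header makes this extensible, here is a small self-contained sketch of the pattern. All types and IDs in it (`ExtHeader`, `OptionA`, `OptionB`, `kIdOptionA`, `kIdOptionB`, `MakeExt`, `FindExtBuffer`) are hypothetical stand-ins for illustration, not the definitions in mfxstructures.h: the consumer walks the attached array, matches on the ID, and treats a missing buffer as "use defaults".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in header; every extension struct starts with one.
struct ExtHeader {
    std::uint32_t BufferId;  // identifies which option struct this is
    std::uint32_t BufferSz;  // size of the whole struct in bytes
};

// Two hypothetical option sets, added without changing the base struct.
struct OptionA { ExtHeader Header; std::uint16_t Flag; };
struct OptionB { ExtHeader Header; std::uint16_t Level; std::uint16_t Mode; };

constexpr std::uint32_t kIdOptionA = 0x41;  // hypothetical IDs
constexpr std::uint32_t kIdOptionB = 0x42;

// Zero-initialize (0 = default/unused), then fill in ID and size.
template <typename T>
T MakeExt(std::uint32_t id) {
    T ext{};
    ext.Header.BufferId = id;
    ext.Header.BufferSz = sizeof(T);
    return ext;
}

// Return the attached buffer with a matching ID, or nullptr when the
// caller did not attach one.
const ExtHeader* FindExtBuffer(ExtHeader* const* buffers,
                               std::size_t count,
                               std::uint32_t id) {
    for (std::size_t i = 0; i < count; ++i) {
        if (buffers[i] != nullptr && buffers[i]->BufferId == id) {
            return buffers[i];
        }
    }
    return nullptr;
}
```

The base parameter struct only ever stores a pointer to the array and a count, so new option structs can be introduced later without changing its size.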
Enabling Low Power H.264 Encode
mfxEncParams.mfx.LowPower=MFX_CODINGOPTION_ON;
3 Flavors of HEVC
[Diagram labels: Plugins, Driver]
Hardware
• Closest to hardware AVC performance
• Fastest
Software
• Best quality, most options
• Slowest
Software + GPU Acceleration
• Close to software quality
• Boost for software performance
In Intel® Media Server Studio 2017 & Intel® Media SDK (Windows) when run on supported hardware with Gen9/Skylake graphics
In Intel® Media Server Studio Professional Edition. More hardware options.
Encode Setup
//load HEVC plugin here
sts = Initialize(impl, ver, &session, NULL);
MFXVideoENCODE mfxENC(session);// Create Media SDK encoder
// Set required video parameters for encode...
// Query number of required surfaces for encoder
sts = mfxENC.QueryIOSurf(&mfxEncParams, &EncRequest);
sts = mfxENC.Init(&mfxEncParams); // Initialize the Media SDK encoder
// Main loop
Expected Return Codes for EncodeFrameAsync
Basic Encode Flow
• Initialize
• Main loop: EncodeFrameAsync(surface in); MFX_ERR_MORE_DATA means more input is needed
• When input is finished, switch to the drain loop
• Drain loop: EncodeFrameAsync(null in)
• Finish (MFX_ERR_MORE_DATA indicates all surfaces drained)
MFX_ERR_MORE_DATA
•More input surface data is required to proceed. Encode may request several input surfaces before producing its first output.
MFX_WRN_DEVICE_BUSY
•Hardware device is unable to respond. This is an expected output for normal operation & should clear after a very short wait. However, if this state persists more than a few milliseconds this may indicate a problem.
MFX_ERR_NOT_ENOUGH_BUFFER
•Bitstream output buffer is not big enough to contain output frame. Output buffer size must be increased.
Other
•Other error codes may be bugs. Contact an Intel support representative for more information.
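The "busy should clear after a very short wait" guidance can be sketched as a bounded retry helper. This is a minimal illustration under stated assumptions: `Status` is a stand-in enum rather than the SDK's status type, and `RetryWhileBusy` is a hypothetical name. The operation is passed in as a callable so the sketch stays independent of any real encoder.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Stand-in status codes for illustration; not the real SDK values.
enum class Status { Ok, MoreData, DeviceBusy, NotEnoughBuffer };

// Call `op` until it returns something other than DeviceBusy, sleeping
// `waitMs` milliseconds between attempts and giving up after `maxRetries`
// retries. A DeviceBusy result from this function means the busy state
// persisted, which may indicate a problem.
Status RetryWhileBusy(const std::function<Status()>& op,
                      int maxRetries,
                      int waitMs = 1) {
    Status sts = op();
    for (int i = 0; i < maxRetries && sts == Status::DeviceBusy; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(waitMs));
        sts = op();
    }
    return sts;
}
```

Bounding the retries keeps a persistent busy state from turning into an endless spin, so the caller can log it and decide how to recover.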
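Encode
do {
    if (still_reading_file) {
        // main loop: submit the current input surface
        sts = mfxENC.EncodeFrameAsync(NULL, pEncSurfaces[nEncSurfIdx], &mfxBS, &syncp);
    } else {
        // drain loop: a NULL surface flushes frames still buffered in the encoder
        sts = mfxENC.EncodeFrameAsync(NULL, NULL, &mfxBS, &syncp);
        if (sts == MFX_ERR_MORE_DATA) break;   // all buffered frames drained
    }
    switch (sts) {
    case MFX_WRN_DEVICE_BUSY:
        MSDK_SLEEP(1);   // hardware busy: wait briefly, then resubmit the same surface
        break;
    case MFX_ERR_NONE:        // input consumed, output produced
    case MFX_ERR_MORE_DATA:   // input consumed, more input needed before output
        if (!still_reading_file) break;
        // load the next frame so the same surface is not submitted twice
        nEncSurfIdx = GetFreeSurfaceIndex(pEncSurfaces, nEncSurfNum);   // Find free surface
        readsts = LoadRawFrame(pEncSurfaces[nEncSurfIdx], fSource);
        if (readsts != MFX_ERR_NONE) still_reading_file = 0;   // end of input: start draining
        break;
    }
    // stop on unexpected errors (negative codes) instead of looping forever
    if (sts < MFX_ERR_NONE && sts != MFX_ERR_MORE_DATA) break;
    if (sts != MFX_ERR_NONE) continue;
    sts = session.SyncOperation(syncp, 60000);   // Synchronize.
    // bitstream data can be used here
} while (true);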
Application Design Fundamentals
Heuristic
Use video memory/NV12 color format
Avoid CPU<->GPU raw frame copies
Run asynchronously
Minimize waits for non-GPU tasks
Asynchronous: Each stage can have multiple frames “in flight.” Frames are locked while a session is working on them.
Session/pipeline-based: Not accelerating individual operations
Based on video memory (NV12 color format, GPU allocated): Arrange pipelines to minimize conversion steps.
Designed to minimize copies: As with NV12 conversions, arrange pipeline steps to reuse surfaces in the same location instead of copying them between CPU and GPU.
Minimize waits: Enqueue as many operations as possible w/o blocking for CPU.
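The "multiple frames in flight" idea can be modeled as a small queue that only blocks on the oldest operation once a depth limit is reached. This is a sketch, not SDK code: `InFlightQueue` and its methods are hypothetical, task IDs stand in for sync points, and "synchronizing" is modeled simply by returning the IDs that would have been waited on. The depth plays a role similar to the `AsyncDepth` field of `mfxVideoParam`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of a pipeline that keeps up to `asyncDepth`
// submitted operations pending before it waits on the oldest one.
class InFlightQueue {
public:
    explicit InFlightQueue(std::size_t asyncDepth)
        : depth_(asyncDepth == 0 ? 1 : asyncDepth) {}

    // Submit an operation. Returns the IDs that had to be synchronized
    // (oldest first) to make room for it; usually empty or one entry.
    std::vector<int> Submit(int id) {
        std::vector<int> synced;
        while (pending_.size() >= depth_) {
            synced.push_back(pending_.front());
            pending_.pop_front();
        }
        pending_.push_back(id);
        return synced;
    }

    // At end of stream, synchronize everything still pending.
    std::vector<int> Drain() {
        std::vector<int> synced(pending_.begin(), pending_.end());
        pending_.clear();
        return synced;
    }

    std::size_t Pending() const { return pending_.size(); }

private:
    std::size_t depth_;
    std::deque<int> pending_;
};
```

With a depth of 1 this degenerates to submit-then-wait; larger depths let the CPU keep enqueuing work while earlier frames are still being processed.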
Intel® Media SDK (Video): Decode → Process → Encode
Why OpenCL + Intel® Media SDK?
What Intel® Media SDK covers (high level): Decode → Process → Encode
What OpenCL covers: fuller range/lower level
Media SDK provides optimized implementations for:
• Codecs
• Frame Processing Operations
For video processing tasks not in Media SDK’s scope, extend with OpenCL
• Make use of growing GPU capabilities
• Keep pipelines on GPU
Example uses: color conversions, custom bit rate control
Fixed Function Performance
Add your Innovation via GPGPU
Build Something Awesome!
Advanced Analysis can be Yours
Use Intel® VTune™ Amplifier** to analyze Intel® Media SDK & OpenCL™ optimized Media Applications
• GEN GPU engines utilization
• GPU hardware metrics over time
• Memory Bandwidth
• CPU Software Threads
**Available in Intel® Media Server Studio or as a standalone tool.
How to get the Intel® Media SDK
Intel® Media SDK - FREE
Platform / Device Targets
• Intel® Core™ or Core™ M processors
• Select SKUs of Intel® Celeron™, Pentium™ & Atom™ processors with Intel® HD Graphics supporting Intel® Quick Sync Video
• Client devices – Desktop/mobile applications
See Technical Specifications for System Requirements
Download: software.intel.com/media-sdk

Intel® Media Server Studio – 3 Editions (includes Free Community)
Platform / Device Targets
• Select SKUs of Intel® Xeon® & Core™ processor-based platforms
• Applications for media, communications infrastructure, video processing/conferencing, digital surveillance, video cloud & data center
• For HEVC, AVC, MPEG-2, MPEG-Audio
See Technical Specifications for System Requirements
Download: software.intel.com/intel-media-server-studio
More Resources
• Intel® Media SDK: software.intel.com/media-sdk
• Intel® Media Server Studio: software.intel.com/intel-media-server-studio
• Learn from Samples & Tutorials: github.com/Intel-Media-SDK/samples
• Ask questions at the forum: software.intel.com/forums/intel-media-sdk
Webinar Replays
Legal Notices, Disclaimers & Optimization Notice
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
All information provided here is subject to change without notice. Contact your Intel representative, sales office or distributor to obtain the latest Intel product specifications and roadmaps.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel, the Intel logo, Xeon, Core, Iris Pro, and VTune are trademarks of Intel Corporation in the U.S. and other countries. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
Intel® Media Server Studio Editions – At a Glance
Feature/Component Community Edition** Essentials Edition Professional Edition
Intel® Media SDK
Graphics Drivers
Code Samples
OpenCL™ Code Builder and Runtime
Metrics Monitor (Linux* only)
Intel® Premier Support
HEVC Decoder & Encoder, GPU Assist APIs
Audio Decoder & Encoder
Video Quality Caliper
Intel® VTune™ Amplifier
Premium Telecine Interlace Reverser
Premium Components