Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform
DAC50, Designer Track, 156-VB543. Kazuya YOKOHARI, Koyo NITTA, Mitsuo IKEDA, and Atsushi SHIMIZU, NTT Media Intelligence Laboratories.
Copyright(c) 2013 Nippon Telegraph and Telephone Corporation
Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform
Kazuya YOKOHARI, Koyo NITTA, Mitsuo IKEDA, and Atsushi SHIMIZU
NTT Media Intelligence Laboratories
DAC50, Designer Track, 156-VB543
6/5/2013
Outline
• Introduction
• Proposed Design Methodology
• Case Study: 4K HEVC Intra Codec
• Evaluation
• Conclusion
Video Codec LSI
• MPEG-2 and H.264/AVC are major video coding standards.
• We have developed an MPEG-2 video codec LSI (VASA) and an H.264/AVC codec LSI (SARA).
• Developing a video codec LSI requires many simulations.
[Figure: Test data, Codec LSI (VASA for MPEG-2, SARA for H.264/AVC), Bit Stream (Coded Image)]
• Coded images should be evaluated both subjectively and objectively.
• Some degradations in coded images are not detected by objective evaluation.
• Real-time subjective evaluation is important for finding these degradations.
Objective evaluation examples: BD-Bitrate, SSIM, PSNR
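As an illustration of one objective metric named above, PSNR can be sketched as follows. This is a minimal sketch, assuming 8-bit samples and images passed as lists of rows; the function name `psnr` is illustrative and not taken from the slides.

```python
import math

def psnr(reference, coded, max_value=255):
    """Return PSNR in dB between two equally sized images (lists of rows)."""
    ref_pixels = [p for row in reference for p in row]
    cod_pixels = [p for row in coded for p in row]
    if len(ref_pixels) != len(cod_pixels) or not ref_pixels:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error averaged over every pixel of the frame.
    mse = sum((r - c) ** 2 for r, c in zip(ref_pixels, cod_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

reference = [[100, 100], [100, 100]]
coded = [[101, 99], [100, 100]]
print(round(psnr(reference, coded), 2))
```

Because the error is averaged over the whole frame, a small but visible local artifact may barely change the score, which is one reason the slide pairs objective metrics with subjective viewing.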
Existing LSI Design Flow
[Figure: existing LSI design flow]
Design levels: Behavioral design, RTL design, Gate-level design
Steps: Stimulus, Verification (Pass/Fail), Behavioral Synthesis, Verification (Pass/Fail), Logic Synthesis, P & R
Artifacts: SystemC source codes, Verilog-RTL codes, Verilog-RTL codes (already verified), Technology Library
Other labels: ASIC, FPGA, IP core
• In the existing design flow, even behavioral design, which is the fastest simulation environment, needs 100 times real time to simulate.
• A fast simulation environment is important because video codec LSI design requires many simulations.
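Simulation Speed (simulation time relative to real time):
Behavioral design: X100 (on CPU)
RTL design: X1,000 (on CPU), X100 (on emulator)
Gate-level design: X10,000 (on CPU), X1,000 (on emulator)
Loop shown: Existing architecture exploration loop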
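To make the slowdown factors on this slide concrete, the sketch below converts them into wall-clock simulation time. The 10-second clip length is a hypothetical value chosen for illustration, and the assignment of each factor to a design level follows the order in which they appear on the slide, which is an assumption.

```python
# Slowdown factors from the slide (simulation time / real time).
# The mapping to design levels is assumed from the slide order.
SLOWDOWN = {
    "behavioral design (CPU)": 100,
    "RTL design (CPU)": 1_000,
    "RTL design (emulator)": 100,
    "gate-level design (CPU)": 10_000,
    "gate-level design (emulator)": 1_000,
}

def simulation_seconds(video_seconds, slowdown):
    """Wall-clock seconds needed to simulate `video_seconds` of video."""
    return video_seconds * slowdown

CLIP_SECONDS = 10  # hypothetical test clip length

for environment, factor in SLOWDOWN.items():
    hours = simulation_seconds(CLIP_SECONDS, factor) / 3600
    print(f"{environment}: {hours:.2f} h")
```

Under these assumptions, a clip that plays in 10 seconds takes about 17 minutes at X100 and more than a day at X10,000, before any repetition for architecture exploration.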
The Problems of The Video Codec LSI Development
• Developing a video codec LSI requires many simulations.
• In the existing LSI design flow, simulation needs 100 times real time.
• To resolve these problems, the simulation and circuit design environments must make it easy to check and improve codec LSI performance.
• Simulation environment: FPGA-based platform. Real-time simulation becomes possible using FPGAs.
• Circuit design environment: High-level synthesis. Rapid prototyping becomes possible using high-level synthesis.
Video Codec Design Platform
• The video codec design platform can simulate a large-scale circuit in real time using many FPGAs.
• The proposed platform inputs and outputs image data in real time through SDI interfaces.
[Figure: platform with FPGA1, FPGA2, FPGA3, FPGA4, FPGA (Center), and SDI interface]
• The proposed platform has many FPGAs because a product-level video codec LSI is very large.
• This platform enables simulation of a product-level circuit using many FPGAs.
Proposed Video Codec Design Flow (1/2)
[Figure: design flow diagram, with the same steps and labels as on the "Existing LSI Design Flow" slide]
• The proposed design flow enables rapid prototyping using high-level synthesis.
• The proposed design flow enables real-time simulation using the proposed platform.
Simulation Speed (simulation time relative to real time):
Behavioral design: X100 (on CPU)
RTL design: X1,000 (on CPU), X100 (on emulator)
Gate-level design: X10,000 (on CPU), X1,000 (on emulator)
Video codec design platform: X1
Loops shown: Existing architecture exploration loop, Proposed architecture exploration loop
• When a single architecture exploration loop is used, every design step is repeated on each iteration, which costs feedback time.
[Slide annotations: GOOD, NOT GOOD]
Proposed Video Codec Design Flow (2/2)
[Figure: design flow diagram, with the same steps and labels as on the two preceding design flow slides]
• The circuit design is subdivided and the parts are designed in parallel, which reduces the feedback time caused by repeating every design step.
• With parallel design, architecture exploration proceeds quickly.
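The effect of subdividing the design can be illustrated with a deliberately simple model. This is a sketch under stated assumptions, not the authors' method: the per-block durations are hypothetical, a single loop is assumed to redo every block before feedback arrives, and the parallel case assumes each block is iterated independently.

```python
def single_loop_feedback(block_days):
    """One shared loop: feedback arrives only after every block is redone."""
    return sum(block_days.values())

def parallel_feedback(block_days):
    """Subdivided design: blocks iterate independently, so the slowest
    block bounds the wait for a full round of feedback."""
    return max(block_days.values())

# Hypothetical days per design iteration for each block (illustration only).
block_days = {
    "intra prediction": 5,
    "transform and quantization": 4,
    "entropy coding": 3,
}

print("single loop:", single_loop_feedback(block_days), "days")
print("parallel:   ", parallel_feedback(block_days), "days")
```

In this model the faster blocks also get their own feedback sooner than the slowest one, which matches the slide's point that smaller design units shorten the loop.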
Summary of The Proposed Design Methodology
The proposed parallel design methodology has three features.
1. High-level synthesis.
– With high-level synthesis, a target circuit architecture can be changed and tuned more easily than with an RTL design methodology.
2. Video codec design platform.
– With the video codec design platform, subjective image evaluation can be performed, since the platform can run simulation in real time.
3. Parallel design.
– With parallel design and high-level synthesis, functions can be added in smaller units, which reduces feedback time.
Combining these three features, the effect of each function on subjective image quality can be evaluated and used for architecture exploration.
Case Study: 4K HEVC Intra Codec
[Figure: Video Coding. Input Data, Intra Prediction, Transform and Quantization, Entropy Coding, Output Stream]
• HEVC (High Efficiency Video Coding) is a next-generation video coding standard.
• The HEVC intra codec consists of three blocks: intra prediction, transform and quantization, and entropy coding.
Intra Prediction: generates a prediction difference image from the input data and predicted image data.
Transform and Quantization: generates quantized values from the transformed difference image, and a reconstruction image from the quantized values.
Entropy Coding: generates a bit stream from the quantized values.
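The data flow through the three blocks can be sketched with a toy example. This is not HEVC-conformant and is not the authors' implementation: it assumes a DC-style prediction from neighbor pixels, skips the transform and quantizes the residual directly, and uses run-length pairs as a stand-in for entropy coding. All function names are hypothetical.

```python
def intra_predict_dc(block, neighbors):
    """Toy DC prediction: predict every pixel as the rounded neighbor mean."""
    dc = round(sum(neighbors) / len(neighbors))
    residual = [[pixel - dc for pixel in row] for row in block]
    return dc, residual

def quantize(residual, qstep):
    """Toy quantizer on the residual itself (the transform is skipped here)."""
    return [[int(value / qstep) for value in row] for row in residual]

def reconstruct(dc, levels, qstep):
    """Rebuild the image from the prediction and the dequantized levels."""
    return [[dc + level * qstep for level in row] for row in levels]

def run_length_encode(levels):
    """Stand-in for entropy coding: (value, run length) pairs in raster order."""
    pairs = []
    for value in (level for row in levels for level in row):
        if pairs and pairs[-1][0] == value:
            pairs[-1] = (value, pairs[-1][1] + 1)
        else:
            pairs.append((value, 1))
    return pairs

block = [[10, 12], [14, 16]]   # input data (hypothetical 2x2 block)
neighbors = [12, 12, 12, 12]   # previously reconstructed neighbor pixels
dc, residual = intra_predict_dc(block, neighbors)
levels = quantize(residual, qstep=2)
recon = reconstruct(dc, levels, qstep=2)
stream = run_length_encode(levels)
print(levels, recon, stream)
```

The reconstruction step matters because later predictions must use what a decoder would see, not the original input; that is why the Transform and Quantization block outputs a reconstruction image as well as quantized values.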
The Specifications of the HEVC Intra Codec
*CU stands for Coding Unit.
*PU stands for Prediction Unit.
*TU stands for Transform Unit.
*HM is the reference software of HEVC.
• Prediction Mode
[Figure: intra prediction modes. 0: Planar, 1: DC; directions labeled 2, 10, 18, 26, 34]
Specifications by design step (columns: STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), STEP2 (LOOP#3)); entries are listed in slide order:
Intra Prediction: PU: 32x32, Prediction Mode: 4; Prediction Mode: 7; PU: 64x64, 16x16
Transform and Quantization: TU: 32x32; TU: 16x16
Entropy Coding: CU: 32x32; CU: 64x64
Base Algorithm: HM3.0; HM7.0
[Slide annotation: This slide's scope.]
Evaluation (1/2)
[Figure: Circuits Performances and Design Period. Charts of Area and Cycle against Design Period (Month) for STEP1, STEP2 LOOP#1, STEP2 LOOP#2, and STEP2 LOOP#3. Legend: IPD, TQ, EC, Cycle.]
The main changes to each block:
• LOOP#1: Version upgrade of the base algorithm of each block
• LOOP#2: Functional expansion of IPD
• LOOP#3: Functional expansion of each block
• The circuit performance of each expanded function is evaluated at STEP2.
• At STEP2, feedback data is available from the other design loops.
[Figure annotations: Subjective Evaluation Period (shown twice); Feedback data is available]
Evaluation (2/2)
• Using the proposed parallel design methodology, we were able to try three design loops in only seven months.
• Using the proposed parallel design methodology, the cycle*area product was reduced to 1/5 within four months of the preliminary design of LOOP#1, and to 1/4 within three months of the preliminary design of LOOP#2.
[Figure: Cycle*Area against Design Period (Month) for STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), and STEP2 (LOOP#3). Annotations: 90% down (STEP1, STEP2); LOOP#1: 80% down (four months); LOOP#2: 75% down (three months).]
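The "reduced to 1/5" and "reduced to 1/4" figures in the bullets correspond to the "80% down" and "75% down" chart annotations. The short check below shows the conversion; the helper name `percent_down` is illustrative only.

```python
from fractions import Fraction

def percent_down(ratio):
    """Percentage reduction for a quantity reduced to `ratio` of its start."""
    return (1 - Fraction(ratio)) * 100

print(percent_down(Fraction(1, 5)))  # LOOP#1: reduced to 1/5
print(percent_down(Fraction(1, 4)))  # LOOP#2: reduced to 1/4
```

Exact fractions are used so the result does not depend on floating-point rounding.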
Conclusion
• We proposed a new design methodology for video codec LSI. With it, we can reduce feedback time, run simulation in real time, and evaluate coded images in real time.
• Using the proposed design methodology, we were able to try three design loops in only seven months.
• Using the proposed design methodology, the cycle*area product was reduced to 1/5 within four months of the preliminary design of LOOP#1, and to 1/4 within three months of the preliminary design of LOOP#2.
• To realize an HEVC codec, we need to add or expand some functional tools while checking each tool by subjective evaluation.