Embedded System Design, Simon Fraser University (source: lshannon/courses/ensc460/lecture_set_3.pdf)
TRANSCRIPT
Special Topics in Advanced Digital System Design
by Dr. Lesley Shannon
Email: [email protected] Website: http://www.ensc.sfu.ca/~lshannon/
Slide Set: 3, Date: January 23, 2007
Simon Fraser University
ENSC 460/894: Lecture Set 3 2
Slide Set Overview
• An implementation study example
– Demonstrates why we have to make choices in our implementation platform to meet application specification requirements
• Example Application:
– A digital camera
ENSC 460/894: Lecture Set 3 3
Embedded System Design
• All real systems contain both hardware and software
– No such thing as a software-only system
– What do these systems look like?
– How do we design these systems?
• Recall the embedded system design methodology handout.
ENSC 460/894: Lecture Set 3 4
Embedded System Design
• This design study shows:
– Four implementations (with varying degrees of hardware)
– Illustrates the tradeoffs between hardware and software implementations
– Demonstrates how fixed-point vs floating-point can be used as an optimization for hardware and software
• This example is from Embedded System Design: A Unified Hardware/Software Introduction, Vahid and Givargis, 2000 (you can find more details in Chapter 7 of this book)
ENSC 460/894: Lecture Set 3 6
What does a digital camera do?
• Recall our iPod Use Case Diagram:
1. Capture images, process them, store them in memory
2. Uploads images to a PC
3. Vary image size, delete images, digital stretching, zoom in and out, etc
• This example focuses on the first use case:
– When the shutter is pressed:
• Image captured
• Converted to digital form by a charge-coupled device (CCD)
• Compressed and archived
• The compression part is discussed in detail
2
ENSC 460/894: Lecture Set 3 7
Specify System Requirements
• Performance:
– We want to process a picture in one second
• Slower would be annoying
• Faster not necessary for a low-end camera
• Size:
– Must fit on a low-cost chip
• Let's say 200,000 gates, including the processor
• Power and Energy:
– We don't want a fan
– We want the battery to last as long as possible
ENSC 460/894: Lecture Set 3 8
[Figure: CCD layout, showing the lens area, pixel rows and columns, covered columns, circuitry, and shutter]
When exposed to light, each cell of the Charge-Coupled Device (CCD) becomes electrically charged. This charge can then be converted to an 8-bit value, where 0 represents no exposure and 255 represents very intense exposure of that cell to light.
Overview of Image Capture
ENSC 460/894: Lecture Set 3 9
Some of the columns are covered with a black strip of paint. The light-intensity of these pixels is used for zero-bias adjustments of all the cells.
Overview of Image Capture
ENSC 460/894: Lecture Set 3 10
The electromechanical shutter is activated to expose the cells to light for a brief moment.
Overview of Image Capture
ENSC 460/894: Lecture Set 3 11
The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.
Overview of Image Capture
ENSC 460/894: Lecture Set 3 12
[Flowchart: CCD input → Zero-bias adjust → DCT → Quantize → "More 8×8 blocks?" (yes: loop back) → Archive in memory → "Done?" (no: capture again; yes: Transmit serially → serial output, e.g., 011010...)]
ENSC 460/894: Lecture Set 3 13
• Manufacturing errors cause cells to measure slightly above or slightly below the actual light intensity
• Error typically the same along columns, but different across rows
• Some of the left-most columns are blocked by black paint
– If you get anything but 0 from them, you have a zero-bias error
– Each row is corrected by subtracting the average error in all the blocked cells for that row (example on next slide)
Zero-Bias Error
ENSC 460/894: Lecture Set 3 14
Zero-Bias Error

Before zero-bias adjustment (last two columns are the covered cells):

136 170 155 140 144 115 112 248 | 12 14
145 146 168 123 120 117 119 147 | 12 10
144 153 168 117 121 127 118 135 |  9  9
176 183 161 111 186 130 132 133 |  0  0
144 156 161 133 192 153 138 139 |  7  7
122 131 128 147 206 151 131 127 |  2  0
121 155 164 185 254 165 138 129 |  4  4
173 175 176 183 188 184 117 129 |  5  5

Zero-bias adjustment per row: -13, -11, -9, 0, -7, -1, -4, -5

After zero-bias adjustment:

123 157 142 127 131 102  99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
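The per-row correction shown in the tables above can be sketched in C. This is our own illustration, not code from the lecture; the function name and the assumption of two covered cells per row follow the example.

```c
/* Subtract the row's zero-bias (the average of its covered cells)
   from every visible pixel in that row. */
void fix_row(int *row, int n, const int *covered, int ncov)
{
    int sum = 0;
    for (int i = 0; i < ncov; i++)
        sum += covered[i];
    int bias = sum / ncov;               /* e.g. (12 + 14) / 2 = 13 */
    for (int i = 0; i < n; i++)
        row[i] -= bias;
}
```

Applied to the first row of the example (covered cells 12 and 14), this turns 136 into 123, matching the "after" table.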
ENSC 460/894: Lecture Set 3 15
• CcdppCapture uses CcdCapture and CcdPopPixel to obtain the image
• Performs zero-bias adjustment after each row is read in
void CcdppCapture(void) {
   int rowIndex, colIndex, bias;   /* declarations added for completeness */
   CcdCapture();
   for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
      for(colIndex=0; colIndex<SZ_COL; colIndex++) {
         buffer[rowIndex][colIndex] = CcdPopPixel();
      }
      bias = (CcdPopPixel() + CcdPopPixel()) / 2;
      for(colIndex=0; colIndex<SZ_COL; colIndex++) {
         buffer[rowIndex][colIndex] -= bias;
      }
   }
}
CCDPP (CCD PreProcessing) module
ENSC 460/894: Lecture Set 3 16
• Store more images
• Transmit image to PC in less time
• JPEG (Joint Photographic Experts Group)
– Popular standard format for representing digital images in a compressed form
– Provides for a number of different modes of operation
– Based on the discrete cosine transform (DCT)
– Image data is divided into 8x8 blocks of pixels
– 3 steps performed on each block:
1. DCT
2. Quantization
3. Huffman Encoding
Compression
ENSC 460/894: Lecture Set 3 17
• Transforms an 8x8 block of pixels into the frequency domain
– We produce a new 8x8 block of values such that:
• The upper-left corner represents the "low frequency" components (the essence of the image)
• The lower-right corner represents the "high frequency" components (the finer details)
– We can reduce the precision of the higher-frequency components and retain reasonable image quality
Discrete Cosine Transform (DCT)
ENSC 460/894: Lecture Set 3 18
• Equation to perform the DCT:

F(u,v) = (1/4) · C(u) · C(v) · Σ_{x=0..7} Σ_{y=0..7} D(x,y) · cos((2x+1)uπ/16) · cos((2y+1)vπ/16)

• where

C(h) = 1/√2 if h = 0, and 1 otherwise
Discrete Cosine Transform (DCT)
ENSC 460/894: Lecture Set 3 19
• Achieve a high compression ratio by reducing image quality
• Reduce the bit precision of the encoded data
– Fewer bits needed for encoding
– Let's reduce the precision equally across all frequency values
– Even better, divide by a power of 2
• Simple right shifts can do this
After being encoded using the DCT:

1150  39 -43 -10  26 -83  11  41
 -81  -3 115 -73  -6  -2  22  -5
  14 -11   1 -42  26  -3  17 -38
   2 -61 -13 -12  36 -23 -18   5
  44  13  37  -4  10 -21   7  -8
  36 -11  -9  -4  20 -28 -21  14
 -19  -7  21  -6   3   3  12 -21
  -5 -13 -11 -17  -4  -1   7  -4

After quantization (divide each cell's value by 8):

144   5  -5  -1   3 -10   1   5
-10   0  14  -9  -1   0   3  -1
  2  -1   0  -5   3   0   2  -5
  0  -8  -2  -2   5  -3  -2   1
  6   2   5  -1   1  -3   1  -1
  5  -1  -1  -1   3  -4  -3   2
 -2  -1   3  -1   0   0   2  -3
 -1  -2  -1  -2  -1   0   1  -1
Quantization
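The divide-by-8 step can be written either as an arithmetic right shift (what the slide suggests, which truncates) or as a round-to-nearest division (which is what the table values above reflect, e.g. 1150 → 144 and -43 → -5). Both helpers below are our own illustration.

```c
/* Quantize by dividing by 8 via a right shift; note this truncates
   toward negative infinity for negative values on typical targets. */
int quantize_shift(int v) { return v >> 3; }

/* Quantize by 8 with rounding to the nearest integer, matching the
   quantized table on this slide. */
int quantize_round(int v) { return (v + (v >= 0 ? 4 : -4)) / 8; }
```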
ENSC 460/894: Lecture Set 3 20
void CodecDoFdct_for_one_block(int i, int j) {
int x, y;
for(x=0; x<8; x++)
for(y=0; y<8; y++)
obuffer[i*8+x][j*8+y] = FDCT(i, j, x, y, ibuffer);
}
void CodecDoFdct(void) {
int i, j;
for(i=0; i<NUM_ROW_BLOCKS; i++)
for(j=0; j<NUM_COL_BLOCKS; j++)
CodecDoFdct_for_one_block(i, j);
}
CODEC
ENSC 460/894: Lecture Set 3 21
• Rather than computing the floating-point cosine function each time…
– Notice that there are only 64 distinct values needed for the cosine:
– So let's pre-compute them
F(u,v) = (1/4) · C(u) · C(v) · Σ_{x=0..7} Σ_{y=0..7} D(x,y) · cos((2x+1)uπ/16) · cos((2y+1)vπ/16)
What about Fixed Point Number Representation?
ENSC 460/894: Lecture Set 3 22
• The result of the cosine is floating point
– It would be better if we could store the table in less memory
• Example: Suppose we want to represent -1 to 1 using 16 bits

Floating Point   Fixed Point
 0.0                    0
 0.25                8192
 0.5                16384
 0.99999…           32767
-0.5               -16384

• So if x is the floating-point number, the fixed-point number is round(32768 * x)
• We can generate a fixed-point table now…
What about Fixed Point Number Representation?
ENSC 460/894: Lecture Set 3 23
/* Note: 32768 does not fit in a signed 16-bit short (max 32767);
   a real build would saturate these entries to 32767. */
static const short COS_TABLE[8][8] = {
  { 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
  { 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
  { 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
  { 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
  { 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
  { 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
  { 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
  { 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
};
static double COS(int xy, int uv) {
  return COS_TABLE[xy][uv] / 32768.0;
}
CODEC
ENSC 460/894: Lecture Set 3 24
static int FDCT(int base_x, int base_y, int offset_x, int offset_y, short **img) {
  int x, u, v;
  double r = 0, s[8];
  u = base_x*8 + offset_x;
  v = base_y*8 + offset_y;
  for (x = 0; x < 8; x++) {
    s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
           img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
           img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
           img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
  }
  for (x = 0; x < 8; x++) r += s[x] * COS(x, u);
  return (int)(r * .25 * C(u) * C(v));
}
CODEC
ENSC 460/894: Lecture Set 3 25
• Serialize the 8x8 block of pixels
– Values are converted into a single list using a zigzag pattern
• Perform Huffman encoding
– More frequently occurring pixel values are assigned short binary codes
• Longer binary codes are left for less frequently occurring pixel values
Huffman Encoding
ENSC 460/894: Lecture Set 3 26
Pixel frequencies (how often each pixel value occurs; so, -1 occurs 15 times, 0 occurs 8 times, etc.):

 -1  15x      -10  2x
  0   8x      144  1x
 -2   6x       -9  1x
  1   5x       -8  1x
  2   5x       -4  1x
  3   5x        6  1x
  5   5x       14  1x
 -3   4x
 -5   3x
Huffman Encoding
ENSC 460/894: Lecture Set 3 27–42
[Slides 27–42 repeat the pixel-frequency table and draw the Huffman tree one merge at a time; only the captions and node frequencies are recoverable here.]
• We are going to construct a tree. Each leaf represents one pixel value; the number inside the node is the frequency.
• We start adding internal nodes. The value inside each internal node is the sum of its two children. Choose two nodes such that the sum will be as small as possible (in this case, both leaves are 1, and the sum is 2).
• Repeat, until we have a tree: the successive merges shown on the slides create internal nodes with frequencies 2, 2, 2, 4, 5, 6, 9, 10, 11, 14, 17, 18, 29, and 35, ending with a single root of frequency 64 (the total pixel count).
• Now let's construct the Huffman Table.
Huffman Encoding
8
ENSC 460/894: Lecture Set 3 43–48
[Slides 43–48 derive the code for each pixel value from the finished tree, one entry at a time.]
• Find a path for each table entry: 0 = go left, 1 = go right. The bit sequence along the path from the root to a leaf is that pixel value's Huffman code:

Pixel value   Huffman code
 -1           000
  0           100
 -2           110
  1           010
  2           1110
  3           1010
  5           0110
 -3           11110
 -5           10110
-10           01110
144           111111
 -9           111110
 -8           101111
 -4           101110
  6           011111
 14           011110
Huffman Encoding
ENSC 460/894: Lecture Set 3 49
[The finished tree and code table are repeated.]
• Common pixel values get short codes
• No code is a prefix of another code – makes decoding easy!
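The prefix-free property is exactly what makes a simple greedy decoder correct: read bits until they match a code, emit that value, and continue. This decoder is our own illustration, built from a subset of the code table above.

```c
#include <string.h>

struct HuffEntry { int value; const char *code; };

/* A subset of the Huffman table from the slides. */
static const struct HuffEntry huff_table[] = {
    { -1, "000"   }, {  0, "100"   }, { -2, "110"    }, {  1, "010"    },
    {  2, "1110"  }, {  3, "1010"  }, {  5, "0110"   }, { -3, "11110"  },
    { -5, "10110" }, { -10, "01110" }, { 144, "111111" }, { 14, "011110" },
};

/* Match codes one at a time; the first match is the only possible one,
   because no code is a prefix of another.  Returns the number of
   decoded values written to out, or -1 on an undecodable bit string. */
int huff_decode(const char *bits, int *out, int max)
{
    int n = 0;
    size_t pos = 0, len = strlen(bits);
    while (pos < len && n < max) {
        int matched = 0;
        for (size_t i = 0; i < sizeof huff_table / sizeof huff_table[0]; i++) {
            size_t cl = strlen(huff_table[i].code);
            if (pos + cl <= len &&
                strncmp(bits + pos, huff_table[i].code, cl) == 0) {
                out[n++] = huff_table[i].value;
                pos += cl;
                matched = 1;
                break;
            }
        }
        if (!matched) return -1;
    }
    return n;
}
```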
ENSC 460/894: Lecture Set 3 50
Archiving Images
Here’s a really simple memory map
The amount of memory required depends on N and the compression ratio
ENSC 460/894: Lecture Set 3 51
Uploading to the PC
• When connected to the PC and an upload command is received:
– Read images from memory
– Transmit serially using the UART
– While transmitting:
• Reset pointers, image-size variables, and the global memory pointer accordingly
ENSC 460/894: Lecture Set 3 52
[The image-capture flowchart from slide 12 is repeated: CCD input → Zero-bias adjust → DCT → Quantize → Archive in memory → Transmit serially → serial output.]
ENSC 460/894: Lecture Set 3 53
Implementing the Design
• We are going to talk about four potential implementations:
1. Microcontroller alone (everything in software)
2. Microcontroller and CCDPP
3. Microcontroller and CCDPP / fixed-point DCT
4. Microcontroller and CCDPP / DCT
ENSC 460/894: Lecture Set 3 54
Recall our System Requirements
• Performance:
– We want to process a picture in one second
• Slower would be annoying
• Faster not necessary for a low-end camera
• Size:
– Must fit on a low-cost chip
• Let's say 200,000 gates, including the processor
• Power and Energy:
– We don't want a fan
– We want the battery to last as long as possible
ENSC 460/894: Lecture Set 3 55
Implementation 1: Microcontroller Alone
Suppose we use an Intel 8051 microcontroller:
• Total IC cost about $5
– Well below 200 mW power
– We figure it will take 3 months to get the product done
– 12 MHz, 12 cycles per instruction
• one million instructions per second
• Can we get the required performance?
– (let's say our grid is 64x64)
ENSC 460/894: Lecture Set 3 56
void CcdppCapture(void) {
   int rowIndex, colIndex, bias;   /* declarations added for completeness */
   CcdCapture();
   for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
      for(colIndex=0; colIndex<SZ_COL; colIndex++) {
         buffer[rowIndex][colIndex] = CcdPopPixel();
      }
      bias = (CcdPopPixel() + CcdPopPixel()) / 2;
      for(colIndex=0; colIndex<SZ_COL; colIndex++) {
         buffer[rowIndex][colIndex] -= bias;
      }
   }
}
Nested loops, 64*(64+64) iterations.
If each iteration is 50 assembly language instructions,
8192 * 50 instructions = 409,600 instructions per image
This is half our budget and we haven’t even done DCT or Huffman yet!
ENSC 460/894: Lecture Set 3 58
Implementation 2: Microcontroller and CCDPP
• CCDPP function implemented as a custom hardware unit
– Improves performance – fewer microcontroller cycles
– Increases engineering cost and time-to-market
– Easy to implement
• Simple datapath
• Few states in the controller
• A simple UART is easy to implement as a custom hardware unit also
• EEPROM for program memory and RAM for data memory are added as well
[Block diagram: SOC containing the 8051, UART, CCDPP, RAM, and EEPROM]
ENSC 460/894: Lecture Set 3 60
• A synthesizable version of the Intel 8051 is available
– Written in VHDL
– Captured at register-transfer level (RTL)
• Fetches instructions from ROM
• Decodes using the Instruction Decoder
• ALU executes arithmetic operations
– Source and destination registers reside in RAM
• Special data-movement instructions are used to load and store externally
• A special program generates the VHDL description of the ROM from the output of a C compiler
Microcontroller
[Block diagram of the Intel 8051 processor core: Controller, 4K ROM, 128-byte RAM, Instruction Decoder, and ALU, connected to the external memory bus]
ENSC 460/894: Lecture Set 3 61
UART
• The UART is invoked when the 8051 executes a store instruction with the UART's enable register as the target address
– Memory-mapped communication between the 8051 and the UART
• Start state transmits 0, indicating the start of byte transmission, then transitions to the Data state
• Data state sends 8 bits serially, then transitions to the Stop state
• Stop state transmits 1, indicating transmission done, then transitions back to the Idle state
[State machine: Idle (I = 0) → when invoked → Start: transmit LOW → Data: transmit data(I), then I++ (repeat while I < 8) → when I = 8 → Stop: transmit HIGH → back to Idle]
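The three transmit states produce a standard 10-bit frame. A short model of it follows; LSB-first data order is our assumption, since the slide does not specify it.

```c
/* Build the 10-bit frame the state machine above transmits:
   one start bit (0), 8 data bits (LSB first, an assumption here),
   and one stop bit (1). */
void uart_frame(unsigned char data, int bits[10])
{
    bits[0] = 0;                          /* Start: transmit LOW    */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (data >> i) & 1;    /* Data: transmit data(I) */
    bits[9] = 1;                          /* Stop: transmit HIGH    */
}
```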
ENSC 460/894: Lecture Set 3 62
CCDPP
• Hardware implementation of the zero-bias operations
• Internal buffer, B, memory-mapped to the 8051
• GetRow state reads in one row from the CCD to B
– 66 bytes: 64 pixels + 2 blacked-out pixels
• ComputeBias state computes the bias for that row and stores it in the variable Bias
• FixBias state iterates over the same row, subtracting Bias from each element
• NextRow transitions to GetRow to repeat the process on the next row, or to the Idle state when all 64 rows are completed
[State machine: Idle (R=0, C=0) → when invoked → GetRow: B[R][C]=Pxl, C=C+1 (while C < 66) → when C = 66 → ComputeBias: Bias=(B[R][11] + B[R][10]) / 2, C=0 → FixBias: B[R][C]=B[R][C]-Bias (while C < 64) → when C = 64 → NextRow: R++, C=0; repeat while R < 64, Idle when R = 64]
ENSC 460/894: Lecture Set 3 63
Connecting SoC Components
• Memory-mapped
– All single-purpose processors and RAM are connected to the 8051's memory bus
• Read
– Processor places address on the 16-bit address bus
– Asserts the read control signal for 1 cycle
– Reads data from the 8-bit data bus 1 cycle later
– Device (RAM or custom circuit) detects the asserted read control signal
– Checks the address
– Places and holds the requested data on the data bus for 1 cycle
• Write
– Processor places address and data on the address and data buses
– Asserts the write control signal for 1 clock cycle
– Device (RAM or custom circuit) detects the asserted write control signal
– Checks the address bus
– Reads and stores the data from the data bus
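The read/write protocol can be mimicked with a toy address decoder. The address map below is entirely made up for illustration; it is not from the lecture.

```c
/* Toy model of the memory-mapped bus: one address range decodes to RAM,
   one address decodes to a (fake) UART data register.  All addresses
   here are hypothetical. */
#define RAM_SIZE  0x1000
#define UART_ADDR 0xFE00

static unsigned char ram[RAM_SIZE];
static unsigned char uart_data;   /* last byte "transmitted" */

void bus_write(unsigned short addr, unsigned char d)
{
    if (addr < RAM_SIZE)
        ram[addr] = d;            /* RAM detects the write, stores data */
    else if (addr == UART_ADDR)
        uart_data = d;            /* UART detects the write */
}

unsigned char bus_read(unsigned short addr)
{
    if (addr < RAM_SIZE)
        return ram[addr];         /* RAM places data on the data bus */
    if (addr == UART_ADDR)
        return uart_data;
    return 0xFF;                  /* unmapped address */
}
```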
ENSC 460/894: Lecture Set 3 64
Analysis
[Analysis flow: VHDL descriptions → VHDL simulator → execution time; VHDL descriptions → synthesis tool → gates → sum gates → chip area; gate-level simulator + power equation → power]
ENSC 460/894: Lecture Set 3 65
Analysis
• Entire SOC tested on a VHDL simulator
– Interprets the VHDL descriptions and functionally simulates execution of the system
• Recall the program code is translated to a VHDL description of the ROM
– Tests for correct functionality
– Measures clock cycles to process one image (performance)
• Gate-level description obtained through synthesis
– A synthesis tool is like a compiler for hardware
– Simulate the gate-level models to obtain data for power analysis
• Number of times gates switch from 1 to 0 or 0 to 1
– Count the number of gates for chip area
ENSC 460/894: Lecture Set 3 66
Analysis of Implementation 2
• Total execution time for processing one image:
– 9.1 seconds
• Power consumption:
– 0.033 watt
• Energy consumption:
– 0.30 joule (9.1 s x 0.033 watt)
• Total chip area:
– 98,000 gates
ENSC 460/894: Lecture Set 3 68
Implementation 3: Fixed-Point DCT
• Most of the execution time is now spent in the DCT
• We could design custom hardware like we did for CCDPP– More complex, so more design effort
• Let’s see if we can speed up the DCT by modifying the number representation (but still do it in software)
ENSC 460/894: Lecture Set 3 69
DCT Floating-Point Cost
• The DCT uses ~260 floating-point operations per pixel transformation
– 4096 (64 x 64) pixels per image
– ~1 million floating-point operations per image
• No floating-point support on the Intel 8051
– The compiler must emulate it
– Generates procedures for each floating-point operation
• mult, add
– Each procedure uses tens of integer operations
• Thus, > 10 million integer operations per image
• More procedures increase code size
– Fixed-point arithmetic can improve on this
– Shrinks code size
ENSC 460/894: Lecture Set 3 70
Fixed-Point Arithmetic
• Integers are used to represent a real number
– Some bits represent the fraction, some bits represent the whole number

Example: Integer part = 2. There are 16 possible values ("codes") of the fractional part. If we "quantize" the fractional value over these 16 possible codes:
0: encode with 0000
1/16: encode with 0001
…
15/16: encode with 1111
So this fractional part is 12/16 = 0.75, and the number is 2.75.
ENSC 460/894: Lecture Set 3 71
Fixed-Point Arithmetic
• How do you convert a real constant to fixed point?
– Multiply the real value by 2^(# of bits used for fractional part)
– Round to the nearest integer

Example: Represent 3.14 as an 8-bit integer with 4 bits for the fraction
• 2^4 = 16
• 3.14 x 16 = 50.24 ≈ 50 = 00110010
• 16 (2^4) possible values for the fraction, each represents 0.0625 (1/16)
• Last 4 bits (0010) = 2
• 2 x 0.0625 = 0.125
• 3 (0011) + 0.125 = 3.125 ≈ 3.14

• To get a more accurate representation:
– Use more bits to represent the fraction
ENSC 460/894: Lecture Set 3 72
Fixed-Point Arithmetic
• Addition:
– A good approximation is to simply add the fixed-point representations:

Example: Suppose we want to add 3.14 and 2.71
3.14 is represented as 0011 0010
2.71 is represented as 0010 1011
Add these two representations to get: 0101 1101
This corresponds to 5.8125, which is fairly close to 5.85

• To get a more accurate representation:
– Use more bits to represent the fraction
ENSC 460/894: Lecture Set 3 73
Fixed-Point Arithmetic
• Multiply:
– Multiply the representations
– Shift right by the number of bits in the fractional part

Example: Suppose we want to multiply 3.14 and 2.71
3.14 is represented as 0011 0010
2.71 is represented as 0010 1011
Multiply these two representations to get: 100001100110
Shift right by 4 bits: 10000110
This corresponds to 8.375, which is fairly close to 8.5094

• Moral: we can add and multiply easily. This is faster and smaller than floating point.
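And the multiplication example as code (same 4-fractional-bit format; the function name is ours):

```c
/* Fixed-point multiply with 4 fractional bits: multiply the raw
   representations, then shift right by 4 to drop the extra
   fractional bits. */
int fx4_mul(int a, int b) { return (a * b) >> 4; }
```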
ENSC 460/894: Lecture Set 3 74
New CODEC
static const char code COS_TABLE[8][8] = {
  { 64,  62,  59,  53,  45,  35,  24,  12 },
  { 64,  53,  24, -12, -45, -62, -59, -35 },
  { 64,  35, -24, -62, -45,  12,  59,  53 },
  { 64,  12, -59, -35,  45,  53, -24, -62 },
  { 64, -12, -59,  35,  45, -53, -24,  62 },
  { 64, -35, -24,  62, -45, -12,  59, -53 },
  { 64, -53,  24,  12, -45,  62, -59,  35 },
  { 64, -62,  59, -53,  45, -35,  24, -12 }
};
static int FDCT(int base_x, int base_y, int offset_x, int offset_y, short **img) {
  int x, j, u, v;
  long r = 0, s[8];
  u = base_x*8 + offset_x;
  v = base_y*8 + offset_y;
  for (x = 0; x < 8; x++) {
    s[x] = 0;
    for (j = 0; j < 8; j++)
      s[x] += (img[x][j] * COS_TABLE[j][v]) >> 6;
  }
  for (x = 0; x < 8; x++) r += (s[x] * COS_TABLE[x][u]) >> 6;
  return (short)((((r * (((16*C(u)) >> 6) * C(v)) >> 6)) >> 6) >> 6);
}
ENSC 460/894: Lecture Set 3 75
Analysis of Implementation 3
• Total execution time for processing one image:
– 1.5 seconds
• Power consumption:
– 0.033 watt (same as implementation 2)
• Energy consumption:
– 0.050 joule (1.5 s x 0.033 watt)
– Battery life 6x longer!!
• Total chip area:
– 90,000 gates
– (8,000 fewer gates – less memory needed for code)
ENSC 460/894: Lecture Set 3 77
Implementation 4: Implement the CODEC in H/W
[Block diagram: SOC containing the 8051, UART, CCDPP, CODEC, RAM, and EEPROM]
ENSC 460/894: Lecture Set 3 79
CODEC Design
• Four memory-mapped registers
– C_DATAI_REG: used to push an 8x8 block into the CODEC
– C_DATAO_REG: used to pop an 8x8 block out of the CODEC
– C_CMND_REG: used to command the CODEC
• Writing 1 to this register invokes the CODEC
– C_STAT_REG: indicates the CODEC is done and ready for the next block
• Polled in software
• Direct translation of the C code to VHDL for the actual hardware implementation
• Fixed-point version used
ENSC 460/894: Lecture Set 3 80
Analysis of Implementation 4
• Total execution time for processing one image:
– 0.099 seconds (well under 1 sec)
• Power consumption:
– 0.040 watt
– Increase over 2 and 3 because the chip has more hardware
• Energy consumption:
– 0.0040 joule (0.099 s x 0.040 watt)
– Battery life 12x longer than the previous implementation!!
• Total chip area:
– 128,000 gates
– Significant increase over previous implementations
ENSC 460/894: Lecture Set 3 82
So, what do you tell your boss?
• Implementation 3:
– Close in performance
– Cheaper
• Less design time and fewer gates
• Implementation 4:
– Great performance and energy consumption
– More expensive and may miss the time-to-market window
• If we design the DCT ourselves: increased engineering cost and time-to-market
• If we purchase an existing DCT: increased IC cost
• Which is better?

                      Implementation 2   Implementation 3   Implementation 4
Performance (second)        9.1                1.5               0.099
Power (watt)                0.033              0.033             0.040
Size (gate)              98,000             90,000            128,000
Energy (joule)              0.30               0.050             0.0040
ENSC 460/894: Lecture Set 3 83
Highlights of this slide set:
• We saw an example / case study that illustrates some of the tradeoffs:
– Hardware takes longer to design
– Hardware will be faster
– Sometimes you can optimize the software instead
– There is always a tradeoff between performance, cost, and time
• This was just one example.
• However, these concepts can be applied to the general problem of designing embedded systems and SoCs.