accumulator architecture processing on the tms320c6x vliw dsp

38
RASTER IMAGE PROCESSING ON THE TMS320C6X VLIW DSP Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Wade Schwartzkopf Embedded Signal Processing Laboratory The University of Texas at Austin Austin, TX 78712-1084 http://signal.ece.utexas.edu/ Accumulator architecture Load-store architecture M em ory-register architecture

Upload: others

Post on 12-Sep-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

07_C6xImage2RASTER IMAGE PROCESSING ON THE TMS320C6X VLIW DSP
P r of. B r i a n L . E v a n s
in co l labora t ion w i th N ir a n ja n D a m e r a -Ven k a t a a n d
W a d e S ch w a r t z k opf
E m b e d d e d S ign a l P r oces s in g L a b or a t or y T h e U n iver s it y of T e x a s a t A u s t in
A u s t in , TX 78712-1084
h t t p ://s i g n a l.e c e .u t e x a s .e d u /
A ccu m u la tor arch i tec tu re
L oad-s tore arch itectu r e
M em ory-regis ter arch itectu r e
2
Outline
n I n t e r p ola t ion
n H a lft o n i n g
n C /C++ codin g t i p s
n Con clu s ion
3
Introduction
n R a s t e r s ca n
n Raster image processing 4Process one or more rows at a time
4Pixel operations: color conversion, ordered dither halftoning
4Local operations: JPEG coding, FIR filtering, interpolation, error diffusion halftoning
4
Raster Image Processing on the TMS320C6x
n T M S 3 2 0 C C 6 x w or k s b e s t w it h 1 6 -bit d a t a
4B y t e s p e r im a g e p ixel : 1 for g r e y s ca le , 3 or 4 for color
4R e d u ce p rocesso r pe r fo rm a n ce or d ou b le m e m or y
n N u m b e r of 4 8 0 0 -p ixel r o w s (e.g. 8 in . a t 6 0 0 d p i) of a g r e y s ca le im a ge t h a t ca n fit in t o m e m or y
P r o c e s s o r M H z D a t a ( k b i t s )
P r o g r a m ( k b i t s ) P r i c e
M a x R o w s b y t e p i x e l s
M a x R o w s h a l f w o r d s
C 6 2 1 1 ** 1 6 7 3 2 + 5 1 2 3 2 $ 2 5 1 3 6 .5
C 6 2 0 1 2 0 0 5 1 2 5 1 2 $ 1 5 9 1 3 6 .5
C 6 2 0 2 2 5 0 1 0 0 0 2 0 0 0 $ 1 8 4 2 6 1 3 .0
C 6 2 0 3 3 0 0 4 0 0 0 3 0 0 0 n /a 1 0 4 5 2 .0
** C6211 has 512 kbits of L2 on-chip cache. All of it used for the image.
5
Color Spaces
n R G B : R e d G r e e n B l u e
4A d d it ive color
4C R T d i s p l a y s
n Y C r C b : L u m i n a n c e C h r om i n a n ce
4D e cou p les in t e n s it y a n d color in for m a t ion
4D igit a l im a ge /vid e o com p r e s s ion s t a n d a r d s (d igit a l TV)
4E y e l e s s s e n s i t i v e t o ch r o m i n a n c e t h a n lu m in a n ce: s u b s a m p le C r /C b w it h ou t s ign ifica n t v isu a l degr a d a t ion
n C M Y(K ): C y a n M a gen t a Yellow (Black)
4S u b t r a ct ive color
4P r in t in g a n d p h ot o g r a p h y
4B la ck in k u s e d for im p r oved color g a m u t , a n d fa s t e r d r y in g a n d p u r e r r e n d e r in g for b la ck a n d gr e y s
6
n N e s t e d con v e r s ion for m u la s
4Y = C r e d R + C g r e e n G + C blu e B
4C r = (R - Y ) / (2 - 2 C r e d )
4C b = (B - Y ) / (2 - 2 C b l u e)
ITU RGB to YCrCb Standards
R e c o m m e n d a t i o n C r e d
C g r e e n
C b l u e
C r r a n g e C
b r a n g e
I T U ( C C I R ) 6 0 1 -1 0 . 2 9 8 9 0 . 5 8 6 6 0 . 1 1 4 5 [-1 8 2 ,182 ] [-1 4 4 ,144 ]
I T U ( C C I R ) 7 0 9 0 . 2 1 2 6 0 . 7 1 5 2 0 . 0 7 2 2 [-1 6 2 ,162 ] [-1 3 7 ,137 ]
I T U 0 . 2 2 2 0 0 . 7 0 6 7 0 . 0 7 1 3 [-1 6 4 ,164 ] [-1 3 7 ,137 ]
n Y is lossless, but Cr/Cb is clipped to [-128,127]
n Assume that RGB has been gamma corrected
n Rec 601-1 used with TIFF and JPEG standards
n 8-bit format
4Y,R,G,B in [0,255]
4Cr in [-128,127]
4Cb in [-128,127]
7
n N e s t e d con v e r s ion for m u la s
4R = Y + (2 - 2 C r e d ) C r
4B = Y + (2 - 2 C b l u e) C b
4G = (Y - C blu e B - C r e d R ) / C g r e e n
ITU YCrCb to RGB Standards
n 8-bit format
4Y,R,G,B in [0,255]
4Cr in [-128,127]
4Cb in [-128,127]
http://www.neuro.sfc.keio.ac.jp/~aly/polygon/info/color-space-faq.html http://www.inforamp.net/~poynton/notes/colour_and_gamma/ColorFAQ.html
R e c o m m e n d a t i o n C r e d
C g r e e n
C b l u e
R r a n g e B r a n g e
I T U ( C C I R ) 6 0 1 -1 0 .2989 0 .5866 0 .1145 [-1 7 9 ,433] [-2 2 7 ,480]
I T U ( C C I R ) 7 0 9 0 .2126 0 .7152 0 .0722 [-2 0 0 ,455] [-2 3 8 ,493]
I T U 0 .2220 0 .7067 0 .0713 [-1 9 9 ,453] [-2 3 8 ,493]
n Range of G is [-134,390] for Rec 601-1
n RGB values are clipped to [0,255] and rounded
8
RGB/YCrCb Conversion in Floating Point
n N e s t e d for m u la s
0.2989 0.5866 0.1145 0.5000 -0.4183 -0.0817 -0.1688 -0.3312 0.5000
1 1.4022 0 1 -0.7145 -0.3458 1 0 1.7710
n Matrix multiplication
[Y Cr Cb] T = M [R G B]T [R G B]T = M [Y Cr Cb]
T
Cr = 0.7132 (R - Y)
Cb = 0.5647 (B - Y)
9
n M u lt iplica t ion b y d ir e ct ion ca lcu la t ion
4Q u a n t ize coefficien t s
4P u t coefficien t s in r e g i s t e r s a n d s t r e a m p ixels
4H igh ly a ccu r a t e u n d e r e x t e n d e d p r e cis ion a ccu m u la t ion
4W e ll-m a t ch e d t o D S P s a n d gr a p h ics ca r d s
n M u lt iplica t ion b y t a b le look u p
4P r eca lcu la t e m u lt iplica t ion s (floa t in g p oin t t im e s b y t e )
4S t or e in 5 (9 ) 256-by te t ab l e s for n e s t e d (m a t r ix) for m u la s m e a n s 5 (9 ) t im es in cr e a s e in m e m o r y b a n d w i d t h a n d p oor ca ch e p e r for m a n ce
4D o n ot n e e d e x t e n d e d p r e cision a ccu m u la t or s
4W e ll-m a t ch e d t o A S I C s a n d m icrocon t r olle r s
n A d d it ion s : 4 (6) for n e s t e d (m a t r ix) for m u la s
RGB/YCrCb Conversion in Fixed Point
10
9794 19221 3752 16384 -13706 -2677 -5531 -10852 16384
32767 2 (22973) 0 32767 -23412 -11331
32767 0 2 (29015)
n Matrix multiplication (coefficients scaled by 215-1)
[Y Cr Cb] T = M [R G B]T [R G B]T = M [Y Cr Cb]
T
G = 2 (27929) Y - 6396 B - 18504 R
Y = 9794 R + 19221 G + 3752 B
Cr = 23369 (R - Y)
Cb = 18504 (B - Y)
n Move 8-bit color quantity to upper 8 of 16 bits
n Use 16 x 16 multiplication, 32-bit accumulation
n Nested formulas (coefficients scaled by 215-1)
11
n R G B t o CMY ( idea l case )
4C = 2 5 5 - R
4M = 2 5 5 - G
4Y = 2 5 5 - B
n C M Y t o C M Y K R G B t o C M Y K
4K = m i n ( C ,M ,Y ) K = 2 5 5 - m a x(R , G , B )
4C = 2 5 5 (C - K ) / (2 5 5 - K ) C = 2 5 5 (m - R ) / m
4M = 2 5 5 (M - K ) / (2 5 5 - K ) M = 2 5 5 (m - G ) / m
4Y = 2 5 5 (Y - K ) / (2 5 5 - K ) Y = 2 5 5 (m - B ) / m
4 2 -D look u p t a b les m = m a x (R , G , B )
n R , G , B , C , M , Y , a n d K h a ve a r a n ge of [0 ,255]
n U s e fu l in p r in t e r s a n d cop ier s
D ivis ion t a k e s 1 or 2 in s t r u ct ion s p e r b it
of p r e cision in r e s u lt
12
Matrix Computation Example
; Texas Instruments, INC. ; ; MATRIX VECTOR MULTIPLY ; ; ftp://ftp.ti.com/pub/tms320bbs/c67xfiles/mvm.asm ; ; DESCRIPTION ; A[][] * B[] = C[] ; ; ARGUMENTS PASSED ; a[] -> A4 ; b[] -> B4 ; c[] -> A6 ; rows -> B6 ; columns -> A8 ; ; CYCLES ; (n + 20)*m + 1 (m = # of rows, n = # of columns)
13
Matrix Computation Example (cont.)
*** begin piplining inner loop
SUB .L1X rows,1,ocntr || ADD .L2 bptr,4,btmp || LDW .D1T1 *aptr++(4),aa0 ;1 load a[i] from memory || LDW .D2T2 *bptr,bb0 ;1 load b[i] from memory || SUB .S2X colms,1,lcntr ; load cntr = comumns - 1
oloop:
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;2 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;2 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;2 if(lcntr) lcntr -= 1 || SUB .S1 colms,2,icntr ; || ZERO .L1 sum0 ; zero the running sum
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;3 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;3 if(lcntr) load b[i] from memory
|| [lcntr] SUB .L2 lcntr,1,lcntr ;3 if(lcntr) lcntr -= 1
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;4 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;4 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;4 if(lcntr) lcntr -= 1
14
Matrix Computation Example (cont.)
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;5 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;5 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;5 if(lcntr) lcntr -= 1 || B .S2 iloop ;1 branch to iloop
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;6 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;6 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;6 if(lcntr) lcntr -= 1 || [icntr] SUB .L1 icntr,1,icntr ;6 if(icntr) icntr -= 1 || MPYSP .M1X aa0,bb0,mult0 ;1 mult0 = a[i]*b[i] || [icntr] B .S2 iloop ;2 if(icntr) branch to iloop
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;7 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;7 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;7 if(lcntr) lcntr -= 1 || [icntr] SUB .L1 icntr,1,icntr ;7 if(icntr) icntr -= 1 || MPYSP .M1X aa0,bb0,mult0 ;2 mult0 = a[i]*b[i] || [icntr] B .S2 iloop ;3 if(icntr) branch to iloop
15
Matrix Computation Example (cont.)
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;8 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;8 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;8 if(lcntr) lcntr -= 1 || [icntr] SUB .L1 icntr,1,icntr ;8 if(icntr) icntr -= 1 || MPYSP .M1X aa0,bb0,mult0 ;3 mult0 = a[i]*b[i] || [icntr] B .S2 iloop ;4 if(icntr) branch to iloop
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;9 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;9 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;9 if(lcntr) lcntr -= 1 || [icntr] SUB .L1 icntr,1,icntr ;9 if(icntr) icntr -= 1 || MPYSP .M1X aa0,bb0,mult0 ;4 mult0 = a[i]*b[i] || [icntr] B .S2 iloop ;5 if(icntr) branch to iloop
iloop:
[lcntr] LDW .D1T1 *aptr++(4),aa0 ;10 if(lcntr) load a[i] from memory || [lcntr] LDW .D2T2 *btmp++(4),bb0 ;10 if(lcntr) load b[i] from memory || [lcntr] SUB .L2 lcntr,1,lcntr ;10 if(lcntr) lcntr -= 1 || [icntr] SUB .S1 icntr,1,icntr ;10 if(icntr) icntr -= 1 || MPYSP .M1X aa0,bb0,mult0 ;5 mult0 = a[i]*b[i] || ADDSP .L1 mult0,sum0,sum0 ;1 sum0 = sum0+mult0 || [icntr] B .S2 iloop ;6 if(icntr) branch to iloop
16
***************** add up the running sums ***
MV .D1 sum0,temp1 ; temp1 = sum0 ADDSP .L1 sum0,temp1,temp2 ; temp2 = temp1 + sum0 (2nd sum0) MV .D1 sum0,temp1 ; temp1 = sum0 (the 3rd sum0) ADDSP .L1 sum0,temp1,temp3 ; temp3 = temp1 + sum0 (4th sum0) NOP 2 ; wait for temp3
[ocntr] B .S2 oloop ; if(ocntr) branch to oloop ADDSP .L1 temp2,temp3,sum0 ; sum0 = temp2 + temp3
*** [ocntr] MV .D2 bptr,btmp ; reset *b to beginning of b
SUB .S1 colms,2,icntr ; inner cntr = columns - 2 || SUB .S2X colms,1,lcntr ; load cntr = comumns - 1
LDW .D1T1 *aptr++(4),aa0 ;1 load a[i] from memory || LDW .D2T2 *btmp++(4),bb0 ;1 load b[i] from memory
STW .D1 sum0,*cptr++(4) ; c[i] = sum0 || [ocntr] SUB .L1 ocntr,1,ocntr ; if(ocntr) ocntr -= 1
17
Pixels of original image u
Convolution mask for the interpolation
1 1 1 1
FIR filter by H
4Alternate pixels may be
skipped
1
/* v is the zoomed (interpolated) version of u */ v[m,n]=u[round(m/2),round(n/2)]
n Interpolation by pixel replication
4Computationally simple
Bilinear Interpolation
n I n t e r p ola t e r o w s t h e n colu m n s (or v ice-versa )
4 I n cr e a s e d com p lexi t y
4R e d u ced a lia s in g
/* v is the zoomed (interpolated) version of u */ v1[m,2n] = u[m,n] v1[m,2n+1] = a1*u[m,n]+a2*u[m,n+1] v[2m,n] = v1[m,n] v[2m+1,n] = b1*v1[m,n]+b2*v1[m+1,n]
1 2 1
1 2 1
4 22H = >> 4
D FIR filter by H followed
by a shift
2-D FIR Filter
n D iffe r e n ce equ a t ion
y (n ) = 2 x (n 1 ,n 2) + 3 x (n 1-1,n 2) + x (n 1 ,n 2-1) + x (n 1-1,n 2-1)
n Vector dot product plus keep M1 rows in memory and
circularly buffer input
2-D Filter Implementations
n S t or e M 1 x M 2 filt e r coe fficie n t s i n s e q u e n t ia l m e m or y (vector) of len g t h M = M 1 M 2
n F or e a ch ou t p u t , for m vector f rom N 1 x N 2 im a ge
1 M 1 s e p a r a t e d ot p r o d u c t s o f l e n g t h M 2 a s b y t e s
2 F or m im a ge vect or b y r a s t e r s ca n n in g i m a ge a s b y t e s
3 F or m im a ge vect or b y r a s t e r s ca n n in g i m a ge a s w or d s
I m p lem e n t a t ion 1 2 3
T h r ou gh p u t
(s a m p les/cycle) 1 2 1 .5
D a t a r e a d a t
on e t im e
( b y t e s ) 1 1 2
Raster scan
2-D FIR Implementation #1 on C6x
; registers: A5=&a(0,0) B5=&x(n1,n2) B7=M A9=M2 B8=N2 fir2d1 MV .D1 A9,A2 ; inner product length || SUB .D2 B8,B7,B10 ; offset to next row || CMPLT.L1 B7,A9,A1 ; A1=no more rows to do || ZERO .S1 A4 ; initialize accumulator || SUB .S2 B7,A9,B7 ; number of taps left fir1 LDBU .D1 *A5++,A6 ; load a(m1,m2), zero fill || LDBU .D2 *B5++,B6 ; load x(n1-m1,n2-m2) || MPYU .M1X A6,B6,A3 ; A3=a(m1,m2) x(n1-m1,n2-m2) || ADD .L1 A3,A4,A4 ; y(n1,n2) += A3 ||[A2] SUB .S1 A2,1,A2 ; decrement loop counter ||[A2] B .S2 fir1 ; if A2 != 0, then branch
MV .D1 A9,A2 ; inner product length || CMPLT.L1 B7,A9,A1 ; A1=no more rows to do || ADD .L2 B5,B10,B5 ; advance to next image row ||[!A1]B .S1 fir1 ; outer loop || SUB .S2 B7,A9,B7 ; count number of taps left ; A4=y(n1,n2)
23
2-D FIR Implementation #2 on C6x
; registers: A5=&a(0,0) B5=&x(n1,n2) A2=M B7=M2 B8=N2 fir2d2 SUB .D2 B8,B7,B9 ; byte offset between rows || ZERO .L1 A4 ; initialize accumulator || SUB .L2 B7,1,B7 ; B7 = numFilCols - 1 || ZERO .S2 B2 ; offset into image data
fir2 LDBU .D1 *A5++,A6 ; load a(m1,m2), zero fill || LDBU .D2 *B6[B2],B6 ; load x(n1-m1,n2-m2) || MPYU .M1X A6,B6,A3 ; A3=a(m1,m2) x(n1-m1,n2-m2) || ADD .L1 A3,A4,A4 ; y(n1,n2) += A3 || CMPLT.L2 B2,B7,B1 ; need to go to next row? || ADD .S2 B2,1,B2 ; incr offset into image
[!B1] ADD .L2 B2,B9,B2 ; move offset to next row ||[A2] SUB .S1 A2,1,A2 ; decrement loop counter ||[A2] B .S2 fir2 ; if A2 != 0, then branch ; A4=y(n1,n2)
24
2-D FIR Implementation #3 on C6x
; registers: A5=&a(0,0) B5=&x(n1,n2) A2=M B7=M2 B8=N2 fir2d3 ZERO .D1 A4 ; initialize accumulator #1 || SUB .D2 B8,B7,B9 ; index offset between rows || ZERO .L2 B2 ; offset into image data || MVKH .S1 0xFF,A8 ; mask to get lowest 8 bits || SHR .S2 B7,1,B7 ; divide by 2: 16bit address
ZERO .D2 B4 ; initialize accumulator #2 || ZERO .L1 A6 ; current coefficient value || ZERO .L2 B6 ; current image value || SHR .S1 A2,1,A2 ; divide by 2: 16bit address || SHR .S2 B9,1,B9 ; divide by 2: 16bit address
Initialization
25
2-D FIR Implementation #3 on C6x (cont.)
fir3 LDHU .D1 *A5++,A6 ; load a(m1,m2) a(m1+1,m2+1) || LDHU .D2 *B6[B2],B6 ; load two pixels of image x || CMPLT.L2 B2,B7,B1 ; need to go to next row? || ADD .S2 B2,1,B2 ; incr offset into image
AND .L1 A6,A8,A6 ; extract a(m1,m2) || AND .L2 B6,A8,B6 ; extract x(n1-m1,n2-m2) || EXTU .S1 A6,0,8,A9 ; extract a(m1+1,m2+1) || EXTU .S2 B6,0,8,B9 ; extract x(n1-m1+1,n2-m2+1)
MPYHU .M1X A6,B6,A3 ; A3=a(m1,m2) x(n1-m1,n2-m2) || MPYHU .M2X A9,B9,B3 ; B3=a*x offset by 1 index || ADD .L1 A3,A4,A4 ; y(n1,n2) += A3 || ADD .L2 B3,B4,B4 ; y(n1+1,n2+1) += B3 ||[!B1]ADD .D2 B2,B9,B2 ; move offset to next row ||[A2] SUB .S1 A2,1,A2 ; decrement loop counter ||[A2] B .S2 fir3 ; if A2 != 0, then branch ; A4=y(n1,n2) and B4=y(n1+1,n2+1) Main Loop
26
FIR Filter Implementation on the C6x
MVK .S1 0x0001,AMR ; modulo block size 2^2 MVKH .S1 0x4000,AMR ; modulo addr register B6 MVK .S2 2,A2 ; A2 = 2 (four-tap filter) ZERO .L1 A4 ; initialize accumulators ZERO .L2 B4
; initialize pointers A5, B6, and A7 fir LDW .D1 *A5++,A0 ; load a(n) and a(n+1)
LDW .D2 *B6++,B1 ; load x(n) and x(n+1) MPY .M1X A0,B1,A3 ; A3 = a(n) * x(n) MPYH .M2X A0,B1,B3 ; B3 = a(n+1) * x(n+1) ADD .L1 A3,A4,A4 ; yeven(n) += A3 ADD .L2 B3,B4,B4 ; yodd(n) += B3
[A2] SUB .S1 A2,1,A2 ; decrement loop counter [A2] B .S2 fir ; if A2 != 0, then branch
ADD .L1 A4,B4,A4 ; Y = Yodd + Yeven STH .D1 A4,*A7 ; *A7 = Y
Throughput of two multiply-accumulates per cycle
27
Ordered Dithering on a TMS320C62x
; remove next two lines if thresholds in linear array MVK .S1 0x0001,AMR ; modulo block size 2^2 MVKH .S1 0x4000,AMR ; modulo addr reg B6
; initialize A6 and B6 .trip 100 ; minimum loop count
dith: LDB .D1 *A6++,A4 ; read pixel || LDB .D2 *B6++,B4 ; read threshold || CMPGTU .L1x A4,B4,A1 ; threshold pixel || ZERO .S1 A5 ; 0 if <= threshold [A1] MVK .S1 255,A5 ; 255 if > threshold || STB .D1 A5,*A6++ ; store result ||[B0] SUB .L2 B0,1,B0 ; decrement counter ||[B0] B .S2 dith ; branch if not zero
Throughput of two cycles
More Efficient Ordered Dithering on the C6x
MVK .S1 0x00ff,A8 ; white pixel #1 || MVK .S2 0x0001,AMR ; modulo block size 2^2
SHL .S1 A8,8,A9 ; white pixel #2 || MVKH .S2 0x4000,AMR ; modulo addr reg. B6
SHL .S1 A8,16,A10 ; white pixel #3 || SHL .S2 A8,24,B9 ; white pixel #4
; initialize ; A2 number of pixels divided by 4 ; A6 pointer to pixels (will be overwritten) ; B6 pointer to thresholds dith2: LDW .D1 *A6,A4 ; read 4 pixels (bytes)
LDW .D2 *B6++,B4 ; read 4 thresholds EXTU .S1 A4,24,24,A12 ; extract pixel #2 EXTU .S2 B4,24,24,B12 ; extract threshold #2 ZERO .L1 A5 ; store output in A5 CMPLTU .L2 A12,B12,B0 ; B0 = (A12 < B12)
Throughput of 1.25 pixels Initialization
29
More Efficient Ordered Dithering on the C6x EXTU .S1 A4,16,24,A13 ; extract pixel #2 EXTU .S2 B4,16,24,B13 ; extract threshold #2
[!B0] OR .L1 A5,A8,A5 ; output of pixel 1 CMPLTU .L2 A13,B13,B1 ; B1 = (A13 < B13)
EXTU .S1 A4,8,24,A14 ; extract pixel #3 EXTU .S2 B4,8,24,B14 ; extract threshold #3
[!B1] OR .L1 A5,A9,A5 ; output of pixels 1-2 CMPLTU .L2 A14,B14,B2 ; B2 = (A14 < B14)
EXTU .S1 A4,0,24,A15 ; extract pixel #4 EXTU .S2 B4,0,24,B15 ; extract threshold #4
[!B2] OR .L2 A5,B9,B5 ; output of pixels 1-3 CMPLTU .L1 A15,B15,A1 ; B2 = (A15 < B15)
[!A1] OR .S1 B5,A11,A5 ; output of pixels 1-4 STW .D1 A5,*A6++ ; store results
[A2] SUB .L1 A2,1,A2 ; decrement loop count [A2] B .L2 dith2 ; if A2 != 0, branch
30
Floyd-Steinberg Error Diffusion
n N ois e -s h a p e d f e e d b a c k cod e r (2-D s igm a d e lt a )
n Error filter H(z)
Floyd-Steinberg Error Diffusion
n C im p lem e n t a t ion color /gr a ysca le e r r or d iffu s ion
n R e p la cin g m u lt iplica t ion s w it h a d d s a n d s h ift s
4 3*er r or = (e r r or < < 2 ) - e r r or
4 5*er r or = (e r r or < < 2 ) + e r r o r
4 7*er r or = (e r r or < < 3 ) - e r r or
4C a n r e u s e (e r r or < < 2 ) ca lcu la t ion
n R e p la ce divis ion b y 1 6 w it h a d d s a n d s h ift s
4 n > > 4 d oes n ot g ive r igh t a n s w e r for n ega t ive n
4A d d offs e t of 2 4-1 = 15 fo r nega t i ve n : (n + 1 5 ) >> 4
4Alt e r n a t i v e i s t o w o r k w i t h | e r r or |
n Com b in e n e s t e d for loop s in t o on e for loop t h a t ca n b e p ipel in e d b y t h e C 6 x t ools
32
n Loca l va r ia b les
4D e fin e on ly w h e n a n d w h e r e n e e d e d t o a s s is t com p iler in m a p p in g v a r ia b l e s t o r e g i s t e r s (especia lly on C 6 x )
4G ive in it ia l va lu e s t o a v oid u n in it ia l ized r e a d e r r or s
4C h oos e n a m e s t o in d ica t e p u r p ose a n d d a t a t y p e
4 I n C , m a y on ly be de fin e d a t s t a r t of n e w e n v ir o n m e n t
4 I n C + + , m a y b e d e fin e d a n y w h e r e
4F u n ct ion a r g u m e n t s a s loca l va r ia b les (m a y b e u p d a t e d )
n R e a d in g s t r in g s from file s u s in g fge t s
4R e a d s N ch a r a ct e r s or n e w lin e , wh ich e v e r com e s fir s t
4D oes n ot g u a r a n t e e t h a t n e w lin e i s r e a d
4D oes n ot g u a r a n t e e t h a t s t r in g i s n u l l t e r m i n a t e d
n D e fin e a s m a n y con s t a n t s a s p oss ib le
33
C/C++ Coding Tips int fileHasLine(FILE *filePtr, char *searchStr) { char bufStr[128], *strPtr; int foundFlag; foundFlag = 0; while ( ! feof(filePtr) ) { strPtr = fgets(bufStr, 127, filePtr); if (strPtr && strcmp(bufStr,searchStr) == 0) { foundFlag = 1; break; } } return(foundFlag); }
int fileHasLine(FILE *filePtr, const char *searchStr) { int foundFlag = FALSE; while ( ! feof(filePtr) ) { char bufStr[BUFLEN]; int bufStrLen = 0; char *strPtr = fgets(bufStr, BUFLEN-1, filePtr); bufStr[BUFLEN-1] = ‘\0’; bufStrLen = strlen(bufStr); if ( bufStr[bufStrLen-1] == ‘\n’ ) bufStr[bufStrLen - 1] = ‘\0’; if (strPtr && strcmp(bufStr,searchStr) == 0) { foundFlag = TRUE; break; } } return(foundFlag); }
#define BUFLEN 128
C/C++ Coding Tips
n Alloca t in g d y n a m ic m e m or y
4F u n ct ion m a lloc a lloca t e s b u t d oes n ot in it ia l ize va lu e s : u s e ca lloc (a lloca t e /in it ia l ize) or m e m s e t (in it ia lize)
4 I n C + + , n e w o p e r a t or ca lls m a lloc a n d t h e n ca lls t h e con s t r u ct or for e a ch cr e a t e d object
4O n fa ilu r e, m a lloc a n d n e w r e t u r n 0 : w h e n n e w fa ils, _n e w _h a n d ler is ca l led if s e t (s e t b y s e t _n e w _h a n d ler )
n D e a lloca t in g d y n a m ic m e m or y
4F u n ct ion fr e e cr a s h e s if p a s s e d a n u ll poin t e r
4 I n C + + , d e let e o p e r a t o r fir s t ca lls d e s t r u ct or of ob ject (s ) a n d t h e n ca l ls fr e e : d e le t e ign or e s n u ll poin t e r s
4U s e d e le t e [] a r r a y P t r t o d e a lloca t e a n a r r a y
4Zer o poin t e r a ft e r d e a lloca t in g it t o p r e v e n t r e d e le t ion
4D e a lloca t e a p oin t e r b e for e r e a s s ign in g i t
35
Filter::AllocateBuffer(int n) { DeallocateBuffer(); buf = new int [n]; if (buf == 0) { cerr << “allocation failed”; exit(0); } memset(buf, 0, n* sizeof(int)); } Filter::DeallocateBuffer() { delete [] buf; buf = 0; }
Not robust Robust (keep constructor and destructor)
36
C/C++ Coding Tips
n S t a t ic s t r in g l e n g t h
#define IS_STRING_NULL(s) (! *(s)) #define IS_STRING_NOT_NULL (*(s))
#define KEYSTR “MarketShare” #define STATIC_STRLEN(s) (sizeof(s) - 1) strncmp(strBuf, KEYSTR, STATIC_STRLEN(KEYSTR)) == 0)
n Dynamic string length
37
Conclusion
n P r in t e r p ipel in e
4R G B t o Y C r C b con v e r s ion
4 J P E G com p r e s s ion a n d d e com p r e s s ion
4D ocu m e n t s e g m e n t a t ion a n d e n h a n cem e n t
4Y C r C b t o R G B t o C M Y K c o n v e r s i o n
4 I n t e r p ola t ion (e .g n e a r e s t n e igh b or or b ilin e a r )
4H a lft on in g (e.g. or d e r e d d it h e r or e r r or d iffu s ion )
n S p lit e m b e d d e d s oft w a r e s y s t e m s
4C + + for n on -r e a l-t im e t a s k s : G U I s a n d file in p u t /ou t p u t
4C for low -leve l im a g e p r oces s in g o p e r a t i o n s
4A N S I C ca n b e cr o s s -com p iled on t o D S P s
4P r ogr a m C cod e t o w o r k w i t h b l o c k s o r r o w s b e c a u s e e m b e d d e d p r oce s s or s h a ve l i t t le on -ch ip m e m or y
38
Conclusion
n W e b r e s ou r ces
4 com p .d s p n e w s g r ou p : F A Q w w w .b d t i.com /fa q /d s p _fa q .h t m l
4 e m b e d d e d p r oce s s or s a n d s y s t e m s : w w w .eg3.com
4 on -lin e cou r s e s a n d D S P b oa r d s : w w w .t e ch on lin e .com
4 soft w a r e d e v e lop m e n t : w w w .ece.u t e x a s .e d u /~ beva n s /t a lk s /so f tware_deve lopm e n t
4 T I color la s e r p r in t e r x S t r e a m t e ch n ology w w w .t i.com /sc/docs/d s p s /x s t r e a m /in d e x .h t m
n R e fer e n ces 4 B . L. E v a n s , “S o f t w a r e D e v e l o p m e n t in t h e U n ix E n vir on m e n t ”.
h t t p ://w w w .ece .u t e x a s .e d u /~ b e v a n s /t a lk s /s o f t w a r e _ d e v e l o p m e n t /
4 B . L. E v a n s , “E E 3 7 9 K -17 Rea l -T im e D S P L a b or a t ory , ” U T Au s t i n . h t t p ://w w w .ece .u t e x a s .e d u /~ b e v a n s /cou r s e s /r e a lt i m e /