Post on 19-Dec-2015
FPGA based implementation of divider in Finite field
Final MS Project by
Fareena Fiaz
Advisor: Dr. Shahid Masud
Lahore University of Management Sciences (LUMS)
5th July 2004
2
Overview
Problem description
  Finite field
  Representing alpha values
  Finite/Galois field algebra
  Division algorithms
  Efficient hardware requirements
Related work
My contribution
  Project phases
  Design flow
  UMD algorithm: pseudocode, description, state diagram
  C code implementation
  Verilog code results and timing diagram
  Implementation and synthesis for an FPGA (Field Programmable Gate Array)
Conclusion
Future work
References
3
Finite field
A finite set of elements
All algebraic operations hold
Order of the field (say q) is its number of elements
Finite field representation: GF(q)
Commonly used field types
  Prime field: GF(p)
  Binary field: GF(2^m)
Applications
  Code construction
  Cryptography
  Decoding
4
Representing alpha (α) values
Power   Polynomial              Binary 8-tuple   Decimal
α^0     1                       (10000000)       1
α^1     α                       (01000000)       2
α^2     α^2                     (00100000)       4
α^3     α^3                     (00010000)       8
α^4     α^4                     (00001000)       16
α^8     1 + α^2 + α^3 + α^4     (10111000)       29
α^9     α + α^3 + α^4 + α^5     (01011100)       58
P(x) = x^8 + x^4 + x^3 + x^2 + 1 for GF(2^8)
P(α) = α^8 + α^4 + α^3 + α^2 + 1 = 0
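The table rows can be regenerated by multiplying repeatedly by α and substituting for α^8 with P(α) = 0 whenever the degree reaches 8. The sketch below assumes the table's convention that the 8-tuple lists the α^0 coefficient first; the names PRIMITIVE_POLY, alpha_powers and as_tuple are illustrative and do not come from the project.

```python
# Illustrative sketch: powers of alpha in GF(2^8) for
# P(x) = x^8 + x^4 + x^3 + x^2 + 1 (bits 8, 4, 3, 2, 0 set).
PRIMITIVE_POLY = 0x11D


def alpha_powers(count):
    """Return the decimal values of alpha^0 .. alpha^(count-1)."""
    values = []
    element = 1  # alpha^0
    for _ in range(count):
        values.append(element)
        element <<= 1  # multiply by alpha
        if element & 0x100:  # degree reached 8: reduce using P(alpha) = 0
            element ^= PRIMITIVE_POLY
    return values


def as_tuple(value):
    """Format an element as an 8-tuple, alpha^0 coefficient first."""
    return "".join(str((value >> bit) & 1) for bit in range(8))


for power, value in enumerate(alpha_powers(10)):
    print(power, as_tuple(value), value)
```

The rows for powers 8 and 9 come out as decimal 29 and 58, matching the table.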
5
Finite/Galois field algebra
Field: a number system with defined fundamental operations (+, -, /, x)
Galois Field (GF) is a finite field, here GF(2^m) with m = 8
Multiplication: α^i * α^j = α^[(i + j) mod (2^m - 1)]
Division: α^i / α^j = α^[(i - j) mod (2^m - 1)]
Example (m = 3, so 2^m - 1 = 7): α^5 * α^4 = α^(9 mod 7) = α^2
Addition/Subtraction: use bitwise XOR
Example: (00011001) + (10110100) = (10101101)
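A minimal sketch of these rules in software, assuming GF(2^8) with the polynomial x^8 + x^4 + x^3 + x^2 + 1: multiplication and division become addition and subtraction of exponents modulo 2^m - 1, and addition is XOR. The table and function names (EXP, LOG, gf_add, gf_mul, gf_div) are hypothetical.

```python
# Illustrative sketch: GF(2^8) arithmetic with exponent (log/antilog) tables.
M = 8
ORDER = (1 << M) - 1       # 2^m - 1 nonzero elements
PRIMITIVE_POLY = 0x11D     # x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * ORDER          # EXP[i] = alpha^i
LOG = {}                   # LOG[alpha^i] = i
element = 1
for i in range(ORDER):
    EXP[i] = element
    LOG[element] = i
    element <<= 1
    if element & (1 << M):
        element ^= PRIMITIVE_POLY


def gf_add(a, b):
    """Addition and subtraction are both bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """alpha^i * alpha^j = alpha^((i + j) mod (2^m - 1))."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]


def gf_div(a, b):
    """alpha^i / alpha^j = alpha^((i - j) mod (2^m - 1))."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % ORDER]
```

Table look-ups like these are convenient in software; in hardware the tables grow with the field size, which is one reason to consider iterative division algorithms.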
6
Division algorithms
For computing Z = X/Y in Finite fields
Extended Euclidean algorithm
Fermat's theorem
Look-up tables
Reduction to subfield inversion
Direct inversion
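As one concrete instance from this list, division by Fermat's theorem in a prime field GF(p) uses Y^(p-2) as the inverse of Y. The sketch below is only an illustration with a hypothetical function name; the values X = 98, Y = 11, p = 103 are the ones used in the deck's C code example.

```python
def divide_fermat(x, y, p):
    """Compute x / y mod p for prime p, using y^(p-2) as the inverse of y."""
    if y % p == 0:
        raise ZeroDivisionError("divisor must be nonzero modulo p")
    inverse = pow(y, p - 2, p)  # Fermat: y^(p-1) = 1 mod p
    return (x * inverse) % p


print(divide_fermat(98, 11, 103))  # prints 37
```

This costs one modular exponentiation per division, so its iteration count is tied to the bit length of p.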
7
Euclidean algorithm
Statement: for integers a and b,
  if b = 0, then gcd(a, b) = a
  else compute the remainder c = a mod b, replace a = b and b = c, and start the process again
Example: gcd(1071, 1029)
a      b      c
1071   1029   42
1029   42     21
42     21     0
21     0             (b = 0, so gcd = 21)
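The same procedure as a short sketch that also records the (a, b, c) rows of the table; gcd_trace is a hypothetical name.

```python
def gcd_trace(a, b):
    """Euclidean algorithm; returns the gcd and the (a, b, c) rows visited."""
    rows = []
    while b != 0:
        c = a % b            # remainder
        rows.append((a, b, c))
        a, b = b, c          # replace a = b and b = c, then repeat
    return a, rows


result, rows = gcd_trace(1071, 1029)
for row in rows:
    print(*row)
print("gcd =", result)
```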
8
Extended Euclidean algorithm
Keep track of the quotients
Find integers x and y such that a*x + b*y = gcd(a, b)
For a and b relatively prime, gcd(a, b) = 1
  => a*x + b*y = 1
  => a*x = 1 mod b
  => x = a^(-1) mod b
Hence, x is the multiplicative inverse of a modulo b
Extended GCD for polynomials: gcd(P(x), F(x)) = 1
  P(x)*S(x) + F(x)*R(x) = 1
  Take both sides modulo the irreducible polynomial F(x)
  P(x)*S(x) = 1 mod F(x)
  S(x) = (P(x))^(-1) mod F(x)
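For the integer case, the steps above can be sketched as follows. The function names are illustrative, and the polynomial case follows the same pattern with polynomial division in place of integer division.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b                    # the quotient that is kept track of
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def mod_inverse(a, modulus):
    """Return x with a*x = 1 mod modulus, when gcd(a, modulus) = 1."""
    g, x, _ = extended_gcd(a % modulus, modulus)
    if g != 1:
        raise ValueError("a has no inverse modulo modulus")
    return x % modulus


print(mod_inverse(11, 103))  # prints 75, since 11 * 75 = 825 = 8 * 103 + 1
```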
9
Efficient hardware requirements
An algorithm suits hardware implementation when:
  Each iteration involves few complex operations or tests
  It needs the fewest iterations (clock cycles) to complete the task
11
Systolic VLSI implementation [2][3]
Serial-in serial-out architectures
Parallel-in serial-out architectures
Serial-in parallel-out architectures
13
Project phases
The project was divided into four phases
Phase 1: Prototype testing (Maple 7)
Phase 2: Functionality testing (C language)
Phase 3: Simulation testing (Active-HDL)
Phase 4: Synthesis testing (Xilinx Project Navigator)
Time duration:
  Phase 1, Phase 2 and part of Phase 3: March 2004 to May 2004
  Remainder of Phase 3 and Phase 4: June 2004 to 1 July 2004
14
Design Flow
[Flow diagram: Define Requirements, C coding, Verilog Coding, Synthesis, Place & Route, Configuration, FPGA Chip, with four Verify Logic steps]
Define requirements: specify the functionality of the device, e.g. I/O and performance requirements.
C coding: source code of a program that implements a finite field divider.
Verilog coding entry: functional description of the architectures. Includes combinational logic design and RTL (Register Transfer Level) coding.
Synthesis: generates a netlist from the Verilog code. The netlist is a low-level abstraction of the code.
Place and Route: determines the placement of each cell and the connections between cells in the chip.
Configuration: the process in which the circuit design (bitstream file) is downloaded into the FPGA.
15
UMD Algorithm
UMD (unified modular division): a combination of the Extended Euclidean and binary GCD algorithms [1]
Input: let m = number of bits, 0 ≤ X ≤ p, 0 < Y < p, 2^m < p < 2^(m+1)
Output: the quotient is computed as
• Quotient[m-1:0] = X[m-1:0] / Y[m-1:0] mod p, with Y ≠ 0
Modular inversion: the special case X = 1
16
Pseudocode
Inputs: X, Y, p
c := Y, u := X, d := p, w := 0, g := 0
WHILE c != 0
    IF c0 = 0 THEN            /* c0: least significant bit of c */
        c := c/2
        g := g - 1
    ELSE
        IF g < 0 THEN
            SWAP(c, d), SWAP(u, w), g := -g
        ENDIF;
        k := 1
        IF ((c + d) mod 4 ≠ 0) THEN
            k := -1
        ELSE
            g := g - 1
        ENDIF;
        c := (c + k*d)/2, u := (u + k*w)
    ENDIF
    u := (u + u0*p)/2         /* u0: least significant bit of u */
END WHILE
IF d = 1 THEN
    Quotient := w
ELSE
    Quotient := p - w
ENDIF
17
Description
If c is even and d is odd, then gcd(c, d) = gcd(c/2, d)
If c and d are both odd, then 4 divides either c + d or c - d
In case of (c + d): gcd(c, d) = gcd((c + d)/2, d) = gcd((c + d)/4, d), with |(c + d)/4| ≤ max(|c/2|, |d/2|)
In case of (c - d): gcd(c, d) = gcd((c - d)/2, d) = gcd((c - d)/4, d), with |(c - d)/4| ≤ max(|c/2|, |d/2|)
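These identities can be checked by brute force over small odd values. The sketch below is only a sanity check with hypothetical helper names; it picks k = +1 or -1 so that c + k*d is divisible by 4, then confirms that the gcd is preserved and that the reduced value is at most half the larger magnitude.

```python
from math import gcd


def reduce_odd_pair(c, d):
    """For odd c and d, return ((c + k*d) // 4, k) with k making the division exact."""
    k = 1 if (c + d) % 4 == 0 else -1
    return (c + k * d) // 4, k


def check_identities(limit):
    """Verify the gcd identity and the size bound for all odd |c|, |d| <= limit."""
    for c in range(-limit, limit + 1, 2):
        for d in range(-limit, limit + 1, 2):
            reduced, k = reduce_odd_pair(c, d)
            assert (c + k * d) % 4 == 0
            assert gcd(c, d) == gcd(reduced, d)
            assert 2 * abs(reduced) <= max(abs(c), abs(d))
    return True


print(check_identities(51))  # limit must be odd so that every value tested is odd
```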
18
Explanation
Inputs: Divisor, Dividend, Prime_value
c := Divisor, u := Dividend, d := Prime_value, counter := 0, quotient_w := 0
WHILE c != 0
    IF c0 = 0 THEN            /* c0: least significant bit of c */
        c := c/2
        counter := counter - 1
    ELSE
        IF counter < 0 THEN
            SWAP(c, d), SWAP(u, quotient_w), counter := -counter
        ENDIF;
        k := 1
        IF ((c + d) mod 4 ≠ 0) THEN
            k := -1
        ELSE
            counter := counter - 1
        ENDIF;
        c := (c + k*d)/2
        u := (u + k*quotient_w)
    ENDIF
    u := (u + u0*Prime_value)/2   /* u0: least significant bit of u */
END WHILE
IF d = 1 THEN
    Quotient := quotient_w
ELSE
    Quotient := Prime_value - quotient_w
ENDIF
19
Contd’
When c > d: the size of bit vector c is reduced by 1
When c < d: the size of the bit vector may not be reduced
The counter is used to reduce the number of iterations by forcing SWAP(c, d) so that c > d, which is required for convergence
20
Contd’
At any point, these two relations always hold:
  w*Y = d*X (mod p)
  u*Y = c*X (mod p)
Start of iterations: w = 0, d = p, so 0*Y = p*X (mod p) = 0
End of iterations: c = 0, d = ±1 because gcd(Y, p) = 1, and for d = 1, w = Z
  Z*Y = X (mod p) => Z = X/Y (mod p)
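A minimal Python sketch that follows the slide pseudocode step by step and asserts the two relations after every iteration. The function name umd_divide is hypothetical, the final reduction modulo p is an added safeguard that is not in the pseudocode, and termination has only been traced here for the example values, not proven for all inputs.

```python
def umd_divide(x, y, p):
    """Return x / y mod p for an odd prime p, following the slide pseudocode."""
    if y % p == 0:
        raise ZeroDivisionError("divisor must be nonzero modulo p")
    c, u, d, w, g = y, x, p, 0, 0
    while c != 0:
        if c % 2 == 0:                 # least significant bit of c is 0
            c //= 2
            g -= 1
        else:
            if g < 0:
                c, d = d, c            # SWAP(c, d)
                u, w = w, u            # SWAP(u, w)
                g = -g
            k = 1
            if (c + d) % 4 != 0:
                k = -1
            else:
                g -= 1
            c = (c + k * d) // 2       # c and d are both odd, so this is exact
            u = u + k * w
        u = (u + (u % 2) * p) // 2     # halve u modulo p
        # The two relations stated on the slide hold after every iteration.
        assert (u * y - c * x) % p == 0
        assert (w * y - d * x) % p == 0
    # Assumption: "% p" is added here to keep the result in [0, p).
    return w % p if d == 1 else (-w) % p


print(umd_divide(98, 11, 103))  # prints 37
```

With X = 1 the same routine returns the modular inverse of Y; for Y = 11 and p = 103 that is 75.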
21
State diagram
23
C Code Implementation
Source code of a program that implements the divider using the UMD algorithm
Used for verification purposes
24
Outputs of C Code
Example: X = 98, Y = 11, P = 103, Quotient = 37
98/11 mod 103 = 37
Check: (37 * 11) mod 103 = 407 mod 103 = 98
Consultation Session
26
Add Verilog code
27
Verilog code results
29
Implementation
FPGA design for the prime-field divider
30
Synthesis for FPGA
HDL Synthesis Report for the Finite field divider
Selected Device: 2s30tq144-5
Macro Statistics
# Registers: 14
    1-bit register: 5
    8-bit register: 9
# Counters: 1
    3-bit up counter: 1
# Adders/Subtractors: 6
    8-bit adder: 3
    8-bit subtractor: 1
    9-bit adder: 1
    9-bit adder carry out: 1
# Multiplexers: 3
    2-to-1 multiplexer: 3
Minimum period: 12.260 ns
Maximum frequency: 81.566 MHz
32
Conclusion
UMD is applicable to both the integer (prime field) and binary field domains
Uses fewer clock cycles/iterations to compute the result
Reduces the complexity of iterations by using a counter variable to switch between prime and binary fields
Its simplicity makes it suitable for hardware implementation, as shown
33
Future work
Reduce the number of clock cycles to make the hardware implementation faster
Reduce the clock cycles spent on swapping
Extend the hardware implementation to the binary field as well
34
References
J. Guajardo and C. Paar, “Itoh-Tsuji Inversion in Standard Basis and Its Applications in Cryptography and Codes,” and Christof Paar, “Inversion in Finite Fields and Rings,” Designs, Codes and Cryptography, 25, pp. 207-216, 2002.
A. F. Tenca and L. A. Tawalbeh (School of Electrical Engineering and Computer Sciences, Oregon State University, USA), “Algorithm for unified modular division in GF(p) and GF(2^n) suitable for cryptographic hardware,” Electronics Letters, Vol. 40, No. 5, March 2004.
IEEE P1363 / D13, Standard Specifications for Public Key Cryptography, copyright © 1999 by the Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017, USA.
http://www.math.sc.edu/~sumner/numbertheory/euclidean/euclidean.html
Shu Lin and Daniel J. Costello, Jr., Error Control Coding: Fundamentals and Applications, Prentice Hall Series in Computer Applications in Electrical Engineering, Franklin F. Kuo, Editor.
Chin-Liang Wang and Jung-Lung Lin, “A Systolic Architecture for Computing Inverses and Divisions in Finite Fields GF(2^m),” IEEE Log Number 9207282, Dec. 19, 1990; revised July 1, 1992, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
35
Chien-Hsing Wu, Chien-Ming Wu, Ming-Der Shieh, and Yin-Tsung Hwang, “High-Speed, Low-Complexity Systolic Designs of Novel Iterative Division Algorithms in GF(2^m),” IEEE Transactions on Computers, Vol. 53, No. 3, March 2004.
S. C. Shantz, “From Euclid's GCD to Montgomery Multiplication to the Great Divide,” Technical Report TR-2001-95, Sun Microsystems Laboratories, 2001.
E. Savas and C. K. Koc, “Architectures for unified field inversion with applications in elliptic curve cryptography,” 9th IEEE Intl. Conf. on Electronics, Circuits and Systems (ICECS 2002), Dubrovnik, Croatia, September 2002.
Clifford E. Cummings, “Non Blocking Assignments in Verilog Synthesis, Coding Styles That Kill,” Sunburst Design, Inc., www.sunburst-design.com/papers
VHDL & RTL Books
Lionel Bening and Harry Foster, Principles of Verifiable RTL Design, A Functional Coding Style Supporting Verification Processes in Verilog, Kluwer Academic Publishers.
J. Bhasker, Verilog VHDL Synthesis, A Practical Primer, Star Galaxy Publishing, Lucent Technologies, 1998.
Thank you
Questions?