IMPLEMENTATION OF FINITE FIELD INVERSION Debdeep Mukhopadhyay Chester Rebeiro Dept. of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA


IMPLEMENTATION OF

FINITE FIELD

INVERSION

Debdeep Mukhopadhyay Chester Rebeiro

Dept. of Computer Science and Engineering

Indian Institute of Technology Kharagpur

INDIA


Finite Field Inverse

23-27 May 2011, Anurag Labs, DRDO


Itoh-Tsujii Method for Binary Fields


The Steps
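The step list on this slide did not survive in the transcript. As a rough sketch only, the standard Itoh-Tsujii idea is that in GF(2^m) the inverse is a^-1 = (a^(2^(m-1) - 1))^2, and that the intermediate values beta_k = a^(2^k - 1) can be built up with the rule beta_(k+j) = beta_k^(2^j) * beta_j. The helper names (gf_mul, gf_sqr_n, ita_inverse) and the use of a binary addition chain for m-1 are assumptions made for illustration, not taken from the slides.

```python
def gf_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m); poly is the irreducible polynomial as a bit mask."""
    r = 0
    for _ in range(m):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:      # degree reached m: subtract (XOR) the polynomial
            a ^= poly
    return r


def gf_sqr_n(a, n, poly, m):
    """Square a repeatedly n times, i.e. compute a^(2^n)."""
    for _ in range(n):
        a = gf_mul(a, a, poly, m)
    return a


def ita_inverse(a, poly, m):
    """Itoh-Tsujii style inversion: a^-1 = (a^(2^(m-1) - 1))^2."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    beta, k = a, 1            # invariant: beta == a^(2^k - 1)
    # Walk the bits of m-1 below the leading one (binary addition chain, an assumption).
    for bit in bin(m - 1)[3:]:
        beta = gf_mul(gf_sqr_n(beta, k, poly, m), beta, poly, m)  # k -> 2k
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_sqr_n(beta, 1, poly, m), a, poly, m)  # k -> k + 1
            k += 1
    return gf_mul(beta, beta, poly, m)    # final squaring


# GF(2^4) with x^4 + x + 1, the small field used elsewhere in the slides
inv = ita_inverse(0b1101, 0b10011, 4)
print(gf_mul(0b1101, inv, 0b10011, 4))  # 1
```

The multiplications are the expensive part; the repeated squarings are what the squarer and quad circuits on the following slides speed up.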


How do we do a Squaring?

Consider (again) the field GF(2^4), with irreducible polynomial x^4 + x + 1. What is (x^3 + x^2 + 1)^2 in this field?
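The question can be checked with a few lines of code. This is a minimal sketch, assuming field elements are stored as integers whose bit i is the coefficient of x^i; the function name gf16_square is hypothetical. Squaring over GF(2) spreads the coefficients to the even positions (x^i becomes x^(2i)), and the result is then reduced modulo x^4 + x + 1.

```python
POLY = 0b10011  # x^4 + x + 1
M = 4


def gf16_square(a):
    # Spread the bits: the coefficient of x^i moves to x^(2i).
    s = 0
    for i in range(M):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    # Reduce modulo POLY, clearing bits from degree 2M-2 down to M.
    for i in range(2 * M - 2, M - 1, -1):
        if (s >> i) & 1:
            s ^= POLY << (i - M)
    return s


print(bin(gf16_square(0b1101)))  # 0b1110, i.e. x^3 + x^2 + x
```

By hand: (x^3 + x^2 + 1)^2 = x^6 + x^4 + 1, and with x^4 = x + 1 and x^6 = x^3 + x^2 this gives x^3 + x^2 + x.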


Squaring

Squaring can be represented in the form of a matrix multiplication T.a
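The matrix on this slide is not in the transcript, so the following is only a sketch of how such a matrix can be derived for the GF(2^4) example with x^4 + x + 1. Column j of T is x^(2j) reduced modulo the polynomial. The names reduce_poly, square_direct, squaring_matrix and mat_vec are hypothetical, and each row is stored as a bit mask over the input bits.

```python
POLY, M = 0b10011, 4  # x^4 + x + 1


def reduce_poly(s):
    """Reduce a polynomial of degree <= 2M-2 modulo POLY."""
    for i in range(2 * M - 2, M - 1, -1):
        if (s >> i) & 1:
            s ^= POLY << (i - M)
    return s


def square_direct(a):
    """Square by spreading bits to even positions, then reducing."""
    spread = 0
    for i in range(M):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
    return reduce_poly(spread)


def squaring_matrix():
    """Rows of T; bit j of row i is set when input bit j feeds output bit i."""
    cols = [reduce_poly(1 << (2 * j)) for j in range(M)]  # column j = x^(2j) mod p
    rows = []
    for i in range(M):
        row = 0
        for j in range(M):
            if (cols[j] >> i) & 1:
                row |= 1 << j
        rows.append(row)
    return rows


def mat_vec(rows, a):
    """Compute T.a over GF(2): each output bit is the parity of (row AND a)."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & a).count("1") & 1) << i
    return out


T = squaring_matrix()
print([bin(r) for r in T])      # ['0b101', '0b100', '0b1010', '0b1000']
print(bin(mat_vec(T, 0b1101)))  # 0b1110
```

Each row of T is just an XOR of a few input bits, which is why a squarer needs no multiplier in hardware.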


Quad Operation

A quad operation (raising to the fourth power) can be done by two squaring operations.

The quad operation can be written in the form T^2.a
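As a small check of the T^2.a form, the sketch below builds the squaring matrix for GF(2^4) with x^4 + x + 1 column by column, composes it with itself, and compares the result against squaring twice. Matrices are stored as lists of columns (bit masks); all function names are hypothetical.

```python
POLY, M = 0b10011, 4  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4) modulo POLY."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def square(a):
    return gf_mul(a, a)


def linear_map_columns(f):
    """Matrix of a GF(2)-linear map f, as columns: column j is f(x^j)."""
    return [f(1 << j) for j in range(M)]


def apply_matrix(cols, a):
    """Matrix-vector product over GF(2): XOR the columns selected by the bits of a."""
    out = 0
    for j in range(M):
        if (a >> j) & 1:
            out ^= cols[j]
    return out


def compose(outer, inner):
    """Matrix product outer.inner over GF(2)."""
    return [apply_matrix(outer, c) for c in inner]


T = linear_map_columns(square)
T2 = compose(T, T)
print(T2)                        # [1, 3, 5, 15]
print(apply_matrix(T2, 0b1101))  # same value as square(square(0b1101))
```

A quad block built from T^2 is a single layer of XORs, rather than two squarers in series.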


Advantage of using Quad Operations

Quad circuits have better LUT utilization compared to Squarer circuits


Generalization of the Itoh-Tsujii Algorithm


Theorem 1


Theorem 2


Quad Itoh-Tsujii Inversion Algorithm


A Circuit for Inversion

At every clock cycle, either the multiplier or the quadblock is active.

The output of the multiplier is stored in the mout register.


Finding the Inverse


Finding the Inverse Step 2


Finding the Inverse Step 2


Control Signals for the Inverse


Performance Charts


Higher Powered Itoh-Tsujii


• We have seen that quad circuits utilize LUTs better than squarer circuits.

• Also, LUT size is increasing as silicon technology scales down.

• We have seen the 4-LUT become the 6-LUT, and now the 8-LUT.

• This motivates investigating power circuits higher than quad.


Revisiting the Theorems


2^n Itoh-Tsujii Inversion

These are the overheads

Higher Powered


Overhead in 2^n Itoh-Tsujii

• Computation of .

• Using addition chain for , can be computed in clock cycles, where is the length of addition chain for .

• Computation of , for

• Using addition chain for , that contains , can be computed during computation, because .


2^n Itoh-Tsujii Design

Configurable Parameters

• Addition chain.

• Power circuit used in power block.

• Number of cascaded power circuits in the power block.

• These have an effect on
  – Number of clock cycles.
  – Critical path delay.

Building the Optimal Design

For a given field and a given FPGA, how do we decide the optimal design?


Estimating AREA required on an FPGA


• A k-input LUT (k-LUT) can implement any function of at most k input variables.

• Total number of k-LUTs to implement a function with variables can be expressed as
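The expression from the slide is missing in this transcript. As a hedged stand-in, the sketch below uses a simple counting argument rather than the slide's own formula: if the function decomposes into a tree of k-LUTs (true for wide XORs, which dominate these circuits, but not for arbitrary functions), each LUT merges up to k signals into one and so removes up to k - 1 signals, giving ceil((d - 1)/(k - 1)) LUTs for d inputs. The function name lut_count is hypothetical.

```python
def lut_count(d, k):
    """Estimated number of k-LUTs for one output bit that depends on d inputs.

    Assumes the function decomposes into a tree of k-LUTs (e.g. a wide XOR).
    """
    if d <= 1:
        return 0                   # a wire, no logic needed
    return -(-(d - 1) // (k - 1))  # ceil((d - 1) / (k - 1))


for d in (4, 5, 7, 8):
    print(d, lut_count(d, 4))      # 4 -> 1, 5 -> 2, 7 -> 2, 8 -> 3
```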


Estimating Delay of a Design in an FPGA


• Delay in FPGAs comprises LUT delay and routing delay.

• For this ITA architecture, we have found experimentally that the total delay is proportional to the number of LUTs in the critical path.

• We denote the number of LUTs in a delay path as maxlutpath.

• In a k-LUT based FPGA, maxlutpath of an variable function is
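The expression for maxlutpath is not present in the transcript. A plausible reading, offered here only as an assumption, is the depth of a balanced tree of k-LUTs: a function of d variables needs ceil(log_k(d)) LUT levels. The sketch computes this with integer arithmetic to avoid floating-point rounding; the function name max_lut_path is hypothetical.

```python
def max_lut_path(d, k):
    """Depth, in LUT levels, of a balanced k-LUT tree with d inputs (assumed model)."""
    depth, reach = 0, 1
    while reach < d:   # after `depth` levels a tree can cover k**depth inputs
        reach *= k
        depth += 1
    return depth


for d in (4, 5, 16, 17):
    print(d, max_lut_path(d, 4))  # 4 -> 1, 5 -> 2, 16 -> 2, 17 -> 3
```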


Recap : Karatsuba Multiplier


Hybrid Karatsuba Multiplier for GF(2^233)

Note that the school book multiplier has replaced the general Karatsuba multiplier.


School Book Multiplier


• The field multiplier is a hybrid Karatsuba multiplier.

• A bit hybrid Karatsuba multiplier consists of two bit and one bit multipliers. This happens recursively.

• At the threshold ( ) level, the School-Book multiplier is invoked.

• Total area of bit hybrid Karatsuba multiplier is given by

• Total area for the School-Book multiplier is

Estimating LUT Requirement for Hybrid Karatsuba Multiplier
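The recursive structure described in the bullets can be sketched in software. This is an illustration under stated assumptions, not the authors' hardware design: polynomials over GF(2) are stored as Python integers, the split is into a floor(m/2)-bit low half and a ceil(m/2)-bit high half, and the threshold value of 16 is arbitrary (the slide's threshold is not in the transcript). The function names are hypothetical, and reduction modulo the field polynomial is not included.

```python
def schoolbook_mul(a, b):
    """Shift-and-XOR multiplication of two GF(2) polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def hybrid_karatsuba(a, b, m, threshold=16):
    """Multiply two m-bit GF(2) polynomials (unreduced product)."""
    if m <= threshold:
        return schoolbook_mul(a, b)        # leaf of the recursion tree
    lo_bits = m // 2                       # floor(m/2)
    hi_bits = m - lo_bits                  # ceil(m/2)
    mask = (1 << lo_bits) - 1
    a_lo, a_hi = a & mask, a >> lo_bits
    b_lo, b_hi = b & mask, b >> lo_bits
    # Three sub-multiplications: one floor(m/2)-bit and two ceil(m/2)-bit.
    low = hybrid_karatsuba(a_lo, b_lo, lo_bits, threshold)
    high = hybrid_karatsuba(a_hi, b_hi, hi_bits, threshold)
    mid = hybrid_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, hi_bits, threshold)
    # Cross term = mid + low + high (addition is XOR over GF(2)).
    return low ^ ((mid ^ low ^ high) << lo_bits) ^ (high << (2 * lo_bits))


a = pow(3, 140) % (1 << 233)
b = pow(5, 100) % (1 << 233)
print(hybrid_karatsuba(a, b, 233) == schoolbook_mul(a, b))  # True
```

Raising the threshold trades fewer recursion levels (less delay) against larger school-book leaves.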


Estimating Delay of Hybrid Karatsuba Multiplier


• The hybrid Karatsuba multiplier is distributed in smaller multipliers like a tree. Height of the tree is

• Each level of the Simple Karatsuba tree introduces one LUT delay.

• In threshold ( ) level, School-Book multiplier delay is added.

• Delay of School-Book multiplier is

• Delay of the entire multiplier in LUTs is given by


• For fields generated by trinomials, the area of modular reduction is almost equal to the field size and the delay is one LUT, considering LUT size .

• For fields generated by pentanomials,
  – and 2 LUT for .
  – and 2 LUT for .

Estimating Area & Delay for Modular Reduction
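To see why trinomial reduction is cheap, here is a minimal sketch for GF(2^233). It assumes the trinomial x^233 + x^74 + 1 (the slides name the field but not the polynomial) and a hypothetical function name. Every bit above position 232 is folded back using x^233 = x^74 + 1, so each result bit ends up as the XOR of only a few product bits.

```python
M, K = 233, 74  # assumed trinomial: x^233 + x^74 + 1


def reduce_trinomial(c):
    """Reduce a GF(2) polynomial c (as an int) modulo x^M + x^K + 1."""
    mask = (1 << M) - 1
    while c >> M:
        hi = c >> M
        # Replace hi * x^M by hi * (x^K + 1).
        c = (c & mask) ^ hi ^ (hi << K)
    return c


print(reduce_trinomial(1 << 233) == (1 << 74) | 1)  # True: x^233 = x^74 + 1
```

For a product of two 233-bit elements the loop runs twice, since the first fold can push bits back above position 232.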


Area & Delay Estimates for 2^n Circuit

• The output of a 2^n circuit, which raises an input can be expressed as , where is binary field matrix and ,

• LUT requirement per output bit is

• Total LUT requirement for the 2^n circuit is

• LUT delay per output bit is

• Since all bits are in parallel, delay of 2^n circuit is


Area & Delay Estimates for Multiplexer


• For a 2^s : 1 MUX, there are s selection lines and thus the output is a function of 2^s + s variables.

• For a MUX in , each of the 2^s input lines is of width m bits.

• Total LUT requirement is

• Total LUT delay of the MUX is

• When number of inputs to MUX , the above gives a close upper bound
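The LUT and delay expressions for the MUX are missing from the transcript. As a rough sketch only, the code below combines the one fact the slide does state (each output bit is a function of 2^s + s variables, and there are m output bits) with an assumed tree-of-LUTs model: ceil((d - 1)/(k - 1)) LUTs and ceil(log_k(d)) levels for a d-input function. The model and the name mux_estimate are assumptions, not the slide's formulas.

```python
def mux_estimate(s, m, k):
    """Return (total LUTs, LUT levels) for an m-bit wide 2^s : 1 MUX on k-LUTs.

    Assumed model: each output bit is a tree of k-LUTs over d = 2^s + s inputs.
    """
    d = 2 ** s + s                         # data inputs plus selection lines
    luts_per_bit = -(-(d - 1) // (k - 1))  # ceil((d - 1) / (k - 1))
    depth, reach = 0, 1
    while reach < d:                       # ceil(log_k(d)) without floats
        reach *= k
        depth += 1
    return m * luts_per_bit, depth


print(mux_estimate(2, 233, 4))  # (466, 2): a 4:1 MUX in GF(2^233) on 4-LUTs
print(mux_estimate(2, 233, 6))  # (233, 1): the same MUX on 6-LUTs
```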


Area & Delay of PowerBlock


• Let the Powerblock contain u_s cascaded 2^n circuits.

• The has selection lines, where

• LUT requirement for is

• Total LUT requirement for Powerblock is

• Delay of is

• Total LUT delay of Powerblock is


Area & Delay for the Entire Architecture


• LUT estimate for the entire architecture is

• There are two parallel delay paths.
  – LUT delay of first path is
  – LUT delay of second path is
  – LUT delay of entire architecture is


Optimal Number of Cascades


• For a given field and based FPGA, Powerblock can be configured with different power circuits and cascades .

• Increase in reduces clock cycles, but increases delay of Powerblock.

• is fixed, but depends on and .

• is minimum when

• Minimum delay of the ITA architecture is thus


Power Circuit Selection to achieve Minimum Clock Cycles


• Number of clock cycles for the inversion can be approximated as

• Number of clock cycles for increases linearly with .

• The term reduces with increase in .

• When is small, the reduction in is significant for increase in .

• But, for large values of n, the increase in dominates over the decrease in

• So, increases with increase in for large values of .


• The performance metric is

• Minimization of without increasing gives best performance. Area remains almost same.

• The following steps are performed to achieve optimal performance

• The optimal architecture is given by

Tuning Design for Optimality


• Our estimation model uses maxlutpath to find LUT delay.

• Routing delay is difficult to model in FPGAs.

• To get overall delay, we have used experimental results for a reference ITA architecture.

• Total delay of reference architecture is the

• Let the LUT delay of the reference architecture be

• Total delay of any other ITA architecture in the same field is approximately

• Here is a constant and depends on FPGA technology.

• In 4-LUT based and 6-LUT based Xilinx FPGAs, has values 0.2 and 0.1 respectively.

Validation of Theoretical Estimates


Validation on 4-input LUT FPGAs


Validation on 6-input LUT FPGAs


Experimental Results


Comparison Charts
