software implementation of binary elliptic curves: impact ......introduction algorithms and...

50
Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication J. Taverne 0 , A. Faz-Hern´ andez 1 , D. F. Aranha 2 , F. Rodr´ ıguez-Henr´ ıquez 1 , D. Hankerson 3 , J. L´ opez 2 0 Universit´ e Lyon 1 - ISFA - France 1 CINVESTAV - IPN - M´ exico 2 Universidad de Campinas - Brazil 3 Auburn University - USA CHES - Nara - Japan September 29th 2011

Upload: others

Post on 19-Mar-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

Software implementation of binary elliptic curves:impact of the carry-less multiplier

on scalar multiplication

J. Taverne0, A. Faz-Hernandez1, D. F. Aranha2,F. Rodrıguez-Henrıquez1, D. Hankerson3, J. Lopez2

0Universite Lyon 1 - ISFA - France1CINVESTAV - IPN - Mexico2Universidad de Campinas - Brazil3Auburn University - USA

CHES - Nara - JapanSeptember 29th 2011

Page 2: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 2 / 30

Page 3: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 3 / 30

Page 4: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 5: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 6: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 7: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 8: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

...

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 9: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Scalar multiplication implementation at CHES

...

Julio Lopez, Ricardo Dahab: Fast Multiplication on EllipticCurves over GF (2m) without Precomputation

A whole section devoted to Implementation of Elliptic CurveCryptosystems

Two sections on ECC : Elliptic Curve Algorithms and SideChannel Attacks on Elliptic Curve Cryptanalysis

Three sections on Elliptic Curve Cryptography

...

Three papers on efficient/fast ECC implementation

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 4 / 30

Page 10: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Intel® Carry-Less Multiplication Instruction

Available since Westmere architecture [32nm], PCLMULQDQinstruction performs a multiplication of two 64-bit operandswithout carry bits. Its latency ranges from 10 to 15 clockcycles (8 to 14 in Sandy Bridge).

Unlike Westmere, new Sandy Bridge architecture providesthree-operand code for this instruction.

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 5 / 30

Page 11: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Impact on the field arithmetic assumptions

Mul in F2233 comb method (LD) Karatsuba (CMUL)Westmere i5 256 cc * 128 cc ** according to our experimentations

Impact on the ratios of multiplication with other operations:• Mul/Sq, Mul/Sqrt• Inv/Mul• Quadratic solver/Mul

Consequences on the elliptic curves arithmetic:

??? DOUBLING HALVING ???[Multiplication] [Quad. solver]

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 6 / 30

Page 12: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Impact on the field arithmetic assumptions

Mul in F2233 comb method (LD) Karatsuba (CMUL)Westmere i5 256 cc * 128 cc ** according to our experimentations

Impact on the ratios of multiplication with other operations:• Mul/Sq, Mul/Sqrt• Inv/Mul• Quadratic solver/Mul

Consequences on the elliptic curves arithmetic:

??? DOUBLING HALVING ???[Multiplication] [Quad. solver]

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 6 / 30

Page 13: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Motivation

From the point of view of software implementations binaryelliptic curves have almost always be considered [much] slowerthan prime field multiplications.

Until now, little attention has been put on multi-coreimplementation of a single elliptic curve scalar multiplication

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 7 / 30

Page 14: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Motivation

From the point of view of software implementations binaryelliptic curves have almost always be considered [much] slowerthan prime field multiplications.

Until now, little attention has been put on multi-coreimplementation of a single elliptic curve scalar multiplication

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 7 / 30

Page 15: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Motivation

From the point of view of software implementations binaryelliptic curves have almost always be considered [much] slowerthan prime field multiplications.

Until now, little attention has been put on multi-coreimplementation of a single elliptic curve scalar multiplication

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 7 / 30

Page 16: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Our contribution

Squaring and square-root are not negligible anymore withrespect to multiplication

Half-trace is computed at the same cost as multiplication

Fastest single-core implementation of a single scalarmultiplication on various binary curves at the 112- 128-192-bit security levels

Efficient multi-core implementation of a single scalarmultiplication achieving an almost 2 factor of accelerationfrom algorithm analysis and 1.46 to 1.72 in practice

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 8 / 30

Page 17: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 9 / 30

Page 18: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 19: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 20: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 21: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 22: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 23: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Structure

Tower

Protocols

Scalar mul.

Ell. curves arith.

F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 10 / 30

Page 24: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 11 / 30

Page 25: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Multiplication

The maximum amount of work should be done in registers toavoid costly load/store instructions

The multiplier should have 128-bit granularity to benefit fromcheap xor and shift-by-byte instructions

Minimal overhead when implementing Karatsuba in F2p

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 12 / 30

Page 26: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Squaring and square-root

Vectorized implementations with simultaneous table lookupsthrough byte shuffling instructions [Aranha et al., LATINCRYPT 10’]

improved by a careful reordering of the instructions

Multi-squaring = exponentiation to 2k . The method uses onlyxor operations, taking the values from a large precomputedtable. Although very memory demanding, this multi-squaringfunction brings substantial speed improvements.

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 13 / 30

Page 27: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Quadratic solver z2 + z = c

Algorithm 1 Solve x2 + x = c [Avanzi, IACR ePrint 07’]

Input: c =∑m−1

i=0 cizi ∈ F2m where m is odd and Tr(c) = 0

Output: a solution s of x2 + x = c1: compute H(l0c8i+1 + l1c8i+3 + l2c8i+5 + l3c8i+7)

for i ∈ I = {0, . . . , bm−38c} and lj ∈ F2

2: s ← 03: for i = (m − 1)/2 downto 1 do4: if c2i = 1 then5: c ← c + z i , s ← s + z i

6: return s ← s +∑

i∈I c8i+1H(z8i+1) + c8i+3H(z8i+3) + c8i+5H(z8i+5) +

c8i+7H(z8i+7)

Free memory environment [desktop/server], not focused onmemory minimization

Step 5 benefits from the vectorization in the same way assquare-root

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 14 / 30

Page 28: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Ratios

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 15 / 30

Page 29: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 16 / 30

Page 30: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Elliptic curves arithmetic

Set of curves used in this work:

• NIST random binary elliptic curves: B-233, B-409• NIST Koblitz curves: K-233, K-409• Binary Edwards elliptic curve: curve2251

y2 + xy = x3 + (z13 + z9 + z8 + z7 + z2 + z + 1)

Op. LD Kim&Kim * Exp. result in F2233

Point Doubling 4M, 5S 4M, 5S, -2Red 5.5M

Point Addition 8M, 5S 8M, 4S, -3Red 9M

* [Kim et al., IACR ePrint 07’]

Op. Theoretical cost Exp. result in F2233

Point Halving 1M, 1QS, 1SQRT 3.3M

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 17 / 30

Page 31: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Elliptic curves arithmetic

Set of curves used in this work:

• NIST random binary elliptic curves: B-233, B-409• NIST Koblitz curves: K-233, K-409• Binary Edwards elliptic curve: curve2251

y2 + xy = x3 + (z13 + z9 + z8 + z7 + z2 + z + 1)

Op. LD Kim&Kim * Exp. result in F2233

Point Doubling 4M, 5S 4M, 5S, -2Red 5.5M

Point Addition 8M, 5S 8M, 4S, -3Red 9M

* [Kim et al., IACR ePrint 07’]

Op. Theoretical cost Exp. result in F2233

Point Halving 1M, 1QS, 1SQRT 3.3M

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 17 / 30

Page 32: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 18 / 30

Page 33: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Scalar multiplication

Implementation methods and features:

Curve Method Parallel SCP *

random Double&add, Halve&add OK X

Koblitz τ&add, τ−1&add OK X

curve2251 Montgomery laddering X OK

* SCP = Side-Channel Protected

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 19 / 30

Page 34: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Scalar multiplication

Implementation methods and features:

Curve Method Parallel SCP *

random Double&add, Halve&add OK X

Koblitz τ&add, τ−1&add OK X

curve2251 Montgomery laddering X OK

* SCP = Side-Channel Protected

Today’s focus = random curves

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 19 / 30

Page 35: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Sequential algorithm

Algorithm 2 Double-and-add scalar multiplication

Input: ω, k, P ∈ E(F2m ) of odd order rOutput: kP1: Obtain the representation ωNAF(k) =

∑ti=0 ki2

i

2: Compute Pi = iP for i ∈ I = {1, 3, . . . , 2ω−1 − 1}3: Q ← O4: for i = t downto 0 do5: Q ← 2Q6: if k ′i > 0 then7: Q ← Q + Pki8: else if k ′i < 0 then9: Q ← Q − P−ki

10: return Q

cost = pre-comp + tω+1 PA + t.PD

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 20 / 30

Page 36: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Sequential algorithm

Algorithm 3 Halve-and-add scalar multiplication [Fong et al., IEEE TC 04’]

Input: ω, k, P ∈ E(F2m ) of odd order rOutput: kP1: Perform scalar recoding: k ′ = 2tk mod r where t = dlog2 re2: Obtain the representation ωNAF(k ′)/2t =

∑ti=0 k

′i 2i−t

3: Initialize Qi ← O for i ∈ I = {1, 3, . . . , 2ω−1 − 1}4: for i = t downto 0 do5: if k ′i > 0 then6: Qk′i

← Qk′i+ P

7: else if k ′i < 0 then8: Q−k′i

← Q−k′i− P

9: P ← P/210: return Q ←

∑i∈I iQi

cost = tω+1 PA + t.PH + post-comp

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 21 / 30

Page 37: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel formulation

Formula for parallel implementation on random binary curves:

kP = (k ′t2t−n + · · · + k ′n)P + (k ′n−12−1 + · · · + k ′02−n)P

In other words:

kP =t∑

i=n

k ′i 2i−nP +

n−1∑i=0

k ′i 2−n+iP

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 22 / 30

Page 38: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 4 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP

1: Compute Pi = iP fori ∈ I = {1, 3, . . . , 2ω−1 − 1}

2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}17: return Q ← Q0 +

∑i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 39: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 5 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP1: Compute Pi = iP for

i ∈ I = {1, 3, . . . , 2ω−1 − 1}2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}

5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}17: return Q ← Q0 +

∑i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 40: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 6 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP1: Compute Pi = iP for

i ∈ I = {1, 3, . . . , 2ω−1 − 1}2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}17: return Q ← Q0 +

∑i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 41: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 7 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP1: Compute Pi = iP for

i ∈ I = {1, 3, . . . , 2ω−1 − 1}2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}17: return Q ← Q0 +

∑i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 42: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 8 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP1: Compute Pi = iP for

i ∈ I = {1, 3, . . . , 2ω−1 − 1}2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}

17: return Q ← Q0 +∑

i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 43: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Binary field arithmeticElliptic curves arithmeticScalar multiplication

Parallel algorithm

Algorithm 9 Double-and-add, halve-and-add scalar multiplication: parallel

Input: ω, scalar k, P ∈ E(F2m ) of odd order r , constant n ≈ t2

Output: kP1: Compute Pi = iP for

i ∈ I = {1, 3, . . . , 2ω−1 − 1}2: Q0 ← O

3: Recode: k ′ = 2nk mod r and obtain repωNAF(k ′)/2n =

∑ti=0 k

′i 2i−n

4: Initialize Qi ← O for i ∈ I

{Barrier}5: for i = t downto n do6: Q0 ← 2Q0

7: if k ′i > 0 then8: Q0 ← Q0 + Pk′i9: else if k ′i < 0 then

10: Q0 ← Q0 − P−k′i

11: for i = n − 1 downto 0 do12: P ← P/213: if k ′i > 0 then14: Qk′i

← Qk′i+ P

15: else if k ′i < 0 then16: Q−k′i

← Q−k′i− P

{Barrier}17: return Q ← Q0 +

∑i∈I iQi

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 23 / 30

Page 44: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Outline of the talk

1 Introduction

2 Algorithms and implementationBinary field arithmeticElliptic curves arithmeticScalar multiplication

3 Results

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 24 / 30

Page 45: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Benchmark environment

Timings validated with SUPERCOP [”turbo” mode disabled]from eBACS: http://bench.cr.yp.to

Intel Westmere and Sandy Bridge families [respectively Corei5-660 and Core i7-2600K]

Random scalar and unknown point scenario

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 25 / 30

Page 46: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Benchmark

Timings in 103 clock cycles, (SC)=Single-Core, (MC)=Multi-Core

112-bit security levelImplementation System Finite Field Westmere Sandy Bridge

NIST - K-233 (SC) τ&add (5τ -NAF) F2233 89 67.8

NIST - B-233 (SC) Halve&add (4-NAF) F2233 182 157

NIST - K-233 (MC) (τ |τ)&add (5τ -NAF) F2233 58 46.5

NIST - B-233 (MC) (Dbl,Halve)&add (4-NAF) F2233 116 100

128-bit security levelImplementation System Finite Field Westmere Sandy Bridge

curve2251 (SC) Montgomery F2251 282 225

192-bit security levelImplementation System Finite Field Westmere Sandy Bridge

NIST - K-409 (SC) τ&add (5τ -NAF) F2409 321 255.6

NIST - B-409 (SC) Halve&add (4-NAF) F2409 705 557

NIST - K-409 (MC) (τ |τ)&add (5τ -NAF) F2409 191 148.8

NIST - B-409 (MC) (Dbl,Halve)&add (4-NAF) F2409 444 349

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 26 / 30

Page 47: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Benchmark

Timings in 103 clock cycles, (SC)=Single-Core, (MC)=Multi-Core

112-bit security levelImplementation System Finite Field Westmere Sandy Bridge

NIST - K-233 (SC) τ&add (5τ -NAF) F2233 89 67.8

NIST - B-233 (SC) Halve&add (4-NAF) F2233 182 157

NIST - K-233 (MC) (τ |τ)&add (5τ -NAF) F2233 58 46.5 (x1.46)

NIST - B-233 (MC) (Dbl,Halve)&add (4-NAF) F2233 116 100 (x1.57)

128-bit security levelImplementation System Finite Field Westmere Sandy Bridge

curve2251 (SC) Montgomery F2251 282 225

192-bit security levelImplementation System Finite Field Westmere Sandy Bridge

NIST - K-409 (SC) τ&add (5τ -NAF) F2409 321 255.6

NIST - B-409 (SC) Halve&add (4-NAF) F2409 705 557

NIST - K-409 (MC) (τ |τ)&add (5τ -NAF) F2409 191 148.8 (x1.72)

NIST - B-409 (MC) (Dbl,Halve)&add (4-NAF) F2409 444 349 (x1.6)

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 27 / 30

Page 48: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Comparison with the literature

128-bits security level. Timings validated with SUPERCOP except for (*)

Implementation System Finite Field 103 clock cycles

Bernstein (*)curve2251Core 2 Quad Q6600

binary field: F2251 314.3

Galbraith, Lin, Scottgls1271Intel Xeon E5620 (WSM)

prime field: F(2127−1)2 278.3

This workcurve2251Intel Xeon E5620 (WSM)

binary field: F2251 263.1

Bernstein et al.curve25519Intel Xeon E5620 (WSM)

prime field: F2255−19

226.9

This workcurve2251Intel Core i7-2600K (SB)

binary field: F2251 225

Bernstein et al.curve25519Intel Core i7-2600K (SB)

prime field: F2255−19

193.8

Hu, Longa, Xu (*)Jac128gls4Intel Core i7-2600M (SB)

prime field: F(2128−40557)2 120

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 28 / 30

Page 49: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Concluding remarks

This work achieved an almost 2 factor of acceleration forparallel algorithms. However in practice this factor is up to1.72 in our best implementation.

Future improvement in parallelization management couldimprove this factor.

If the latency would be reduced to 3 clock cycles as theinstruction for integer, binary would have great chance to winagainst prime fields.

AMD Bulldozer release is coming: will AMD’s carry-lessmultiplication instruction latency be lower?

Last update: scalar multiplication has been computed in148.7 × 103 clock cycles using NIST recommended ellipticcurve K-283. Moreover parallelized version takes 95.5 × 103.

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 29 / 30

Page 50: Software implementation of binary elliptic curves: impact ......Introduction Algorithms and implementation Results Outline of the talk 1 Introduction 2 Algorithms and implementation

IntroductionAlgorithms and implementation

Results

Thank you!

Software implementation of binary elliptic curves: impact of the carry-less multiplier on scalar multiplication CHES 2011 30 / 30