Lightweight Cryptography and RFID Security · 2011. 6. 14.
Lightweight Cryptography and RFID Security
Svetla Nikova
COSIC, KULeuven and UTwente
Overview
• Lightweight cryptography - state of the art
• Comparison Standard vs. Lightweight
• SCA countermeasures - TI approach
• TI implementations
• Area, Power or Throughput?
• Conclusions
Lightweight Crypto
Stream ciphers (3): only the eStream finalists: 2005 Grain, Trivium, Mickey
Block ciphers (25):
1977 DES
1989 GOST
1997 XTEA
1998 AES
2005 mCrypton, STEA
2006 Hight, SEA
2007 Clefia, Kasumi, DESL, DESXL, Present
2008 Puffin
2009 Katan, Ktantan, Hummingbird, MIBS
2010 PRINT
2011 Klein, LED, Twine, EPCBC, Vitamin-B, Piccolo
Hash functions (10):
2007 MAME
2008 Squash, DM-Present, H-Present, Keccak
2010 Quark, Armadillo
2011 Spongent, Vitamin-H, Photon
Standard vs. Lightweight

Modules            AES-T   Present
Memory [GE]        2040    887
Mix Column [GE]    306     0
S-box [GE]         408     32
FSM + Rest [GE]    646     192
Total Area [GE]    3400    1111
Cycles             1000    547
Power [µA]         3.0     1.34

AES-T (TUG) 2005: 0.35 µm CMOS Technology of Philips, 100 kHz @ 1.5 V, Encryption + Decryption
Present (RUB/DTU/ORANGE) 2007: UMC L180 0.18 µm 1P6M, 100 kHz @ 1.8 V, Encryption only

Difficult to compare:
• Different technology: GE differs.
• Power depends even more on the technology used.
• Here 0.35 µm vs. 0.18 µm is a big technology difference!
• Cycles do not depend on technology.
• But AES = 128 while Present = 64 bits.
• Some implementations are encryption only, others include decryption too.
• Including decryption in AES adds cost in MixColumn and FSM.
Standard (hits back) vs. Lightweight

Modules            AES-T   AES-B   Present
Memory [GE]        2040    1678    887
Mix Column [GE]    306     373     0
S-box [GE]         408     233     32
FSM + Rest [GE]    646     317     192
Total Area [GE]    3400    2601    1111
Cycles             1000    226     547
Power [µA]         3.0     3.7     1.34

AES-T: 0.35 µm CMOS Technology of Philips, 100 kHz @ 1.5 V, Encryption + Decryption
Present and AES-B: UMC L180 0.18 µm 1P6M, 100 kHz @ 1.8 V, Encryption only

Fair comparison is now possible.
Standard (hits back) vs. Lightweight
Standard designs hit back: AES-B (RUB/NTU) 2011
Standard (hits back) vs. Lightweight

Modules            AES-T   AES-B   Present
Memory [GE]        2040    1678    887
Mix Column [GE]    306     373     0
S-box [GE]         408     233     32
FSM + Rest [GE]    646     317     192
Total Area [GE]    3400    2601    1111
Cycles             1000    226     547
Power [µA]         3.0     3.7     1.34

• Memory becomes smaller in GE due to the technology change.
• MixColumns becomes bigger, but this is the trade-off in order to gain more in the FSM.
• Canright's S-box is used, which is smaller, but not by as much as indicated (again because of the technology change).
• It is difficult to compare the FSM since AES-T also contains the decryption; still, the AES-B state machine is smaller.
Standard vs. Lightweight (updated)

Modules            AES-B   Present
Memory [GE]        1678    887
Mix Column [GE]    373     0
S-box [GE]         233     32
FSM + Rest [GE]    317     192
Total Area [GE]    2601    1111
Cycles             226     547
Power [µA]         3.7     1.34

Smaller key and block size
• 128 bit: too much
• 80 bit key and 64 bit data: ok
• 32, 48 bit data might be acceptable?

Key + data [bits]   Relative state size
128 + 128           100 %
80 + 64             56.25 %
80 + 48             50 %
80 + 32             43.75 %

• Memory as a share of total area:
  • 65% for AES-B
  • 80% for Present
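The percentages on this slide can be re-derived with a few lines of arithmetic. The sketch below is only a check under the assumption that the relative state size is (key bits + data bits) divided by the 128 + 128 bit baseline, and that the memory share is memory GE divided by total GE; the function and variable names are illustrative and not from the talk.

```python
# Key and data sizes in bits, as listed on the slide.
CONFIGS = [(128, 128), (80, 64), (80, 48), (80, 32)]
BASELINE_BITS = 128 + 128  # AES-style 128-bit key + 128-bit block


def relative_state(key_bits, data_bits):
    """State size as a percentage of the 128 + 128 bit baseline."""
    return 100.0 * (key_bits + data_bits) / BASELINE_BITS


state_percent = {cfg: relative_state(*cfg) for cfg in CONFIGS}

# Memory as a share of total area, from the GE figures on the slide.
memory_share = {
    "AES-B": 100.0 * 1678 / 2601,
    "Present": 100.0 * 887 / 1111,
}

if __name__ == "__main__":
    for (key_bits, data_bits), pct in state_percent.items():
        print(f"{key_bits} + {data_bits}: {pct:.2f} %")
    for name, pct in memory_share.items():
        print(f"{name}: memory is {pct:.1f} % of total area")
```

Rounding the memory shares to whole percentages reproduces the 65% and 80% quoted for AES-B and Present.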
Standard vs. Lightweight (updated)

Modules            AES-B   Present
Memory [GE]        1678    887
Mix Column [GE]    373     0
S-box [GE]         233     32
FSM + Rest [GE]    317     192
Total Area [GE]    2601    1111
Cycles             226     547
Power [µA]         3.7     1.34

• P-layer costs 0 for Present.
• Simple FSM can save a lot.
• An 8x8 S-box costs ~300 GE, or at least 200.
• While a 4x4 S-box costs ~50 GE, or at least 30.
• Saving of 6 to 7 times in the S-box.
• Weaker S-box and P-layer are compensated by a larger number of rounds: 31 vs. 10.
Standard vs. Lightweight (updated)
Still, the lightweight cipher is more than two times smaller, and its power consumption is ~3 times lower.
Side-Channel Attacks
Device executing the cryptographic algorithm leaks information on internal state.
Instantaneous leakage depends on intermediate variables, which results in equations
• that have lower nonlinearity
• that may contain noise
Power consumption depends on:
• Instructions executed
• Data processed
Signal is noisy; multiple measurements needed.
SCA countermeasures at different levels
Hardware logic style
• Relieves cryptographers
• Places burden on hardware designers
Algorithms and implementations
• Probably lowest feasible level
Ciphers and protocols
• New standards, takes time
Lightweight SCA protection
Simple masking schemes are vulnerable due to glitches.
Private circuits [Ishai et al.]: too expensive, not a realistic model.
Multi-party computation (TI) made practical.

Unshared function: z = f(x), with x = x1 + x2 + x3 and z = z1 + z2 + z3 (addition is XOR).

1. Correctness: z1 + z2 + z3 = z = f(x)
2. Non-completeness:
   z1 = f1(x1, x2)
   z2 = f2(x1, x3)
   z3 = f3(x2, x3)
3. Independent uniform distribution of input

Power consumption of each fi is independent of the unshared value x, since no fi sees all of x1, x2, x3.
Secure in the presence of glitches (transition count model) against 1st order SCA.
Example: multiplier
• Multiplier = secure AND gate
• 3 shares
• Secure in the presence of glitches
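A shared AND gate with three shares can be sketched as follows. This is a minimal illustration of the correctness and non-completeness properties, using the commonly cited 3-share formulas in which output share i never reads input share i; which share each output omits is a labelling convention and may differ from the figure on the slide. The helper names `share3`, `ti_and`, `check_correctness` and `check_non_completeness` are hypothetical. Uniformity of the output sharing (Property 3) is not checked here.

```python
from itertools import product


def share3(bit, r1, r2):
    """Split one bit into three XOR shares using two random bits r1, r2."""
    return (r1, r2, bit ^ r1 ^ r2)


def ti_and(x, y):
    """3-share AND: output share i is computed without input share i."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)  # reads shares 2 and 3 only
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)  # reads shares 1 and 3 only
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)  # reads shares 1 and 2 only
    return (z1, z2, z3)


def check_correctness():
    """XOR of the output shares equals a AND b for every input and mask."""
    for a, b, r1, r2, r3, r4 in product((0, 1), repeat=6):
        z = ti_and(share3(a, r1, r2), share3(b, r3, r4))
        if z[0] ^ z[1] ^ z[2] != (a & b):
            return False
    return True


def check_non_completeness():
    """Output share i must not change when input share i is changed."""
    for bits in product((0, 1), repeat=6):
        x, y = bits[:3], bits[3:]
        z = ti_and(x, y)
        for i in range(3):
            for xi, yi in product((0, 1), repeat=2):
                xm = x[:i] + (xi,) + x[i + 1:]
                ym = y[:i] + (yi,) + y[i + 1:]
                if ti_and(xm, ym)[i] != z[i]:
                    return False
    return True


if __name__ == "__main__":
    print("correct:", check_correctness())
    print("non-complete:", check_non_completeness())
```

The exhaustive loops cover all 64 combinations of inputs and masks, which is practical only because the gate is tiny.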
Lightweight SCA protection
Protecting Arbitrary Functions:
• Multiplication of n elements needs at least n+1 shares.
• Hardware size increases about quadratically with the number of shares.
• Can we reduce the number of shares?
• Hence 3 shares can be applied only to quadratic functions.

Pipelining:
• Registers are insensitive to glitches.
• Split functions into parts with less non-linearity.
• Use registers between combinatorial parts.

Problem:
• Property 3: the inputs of each step need to be independent and uniformly distributed.
• Pipelining: the output of each step is the input of the next step.
• We need Property 3 for the output as well.
Lightweight SCA protection
Which functions can we protect?
The number of shares depends on the degree of the function.
Hence 3 shares can be applied only to quadratic functions.

• The multiplications in GF(2) (AND gate) and GF(4).
• The Boolean functions with 2 and 3 inputs.
• Noekeon (KUL) 2000, S-box.

S(x) = NL(L(NL(x)))
Pipelined implementation
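The decomposition S(x) = NL(L(NL(x))) can be explored numerically. The sketch below assumes the bit-level description of Noekeon's nonlinear layer as recalled from the cipher's reference code (that description, the bit ordering with a0 as least significant bit, and all function names are assumptions, not taken from the slides). It computes the algebraic degree of a lookup table with a Möbius transform and shows that NL is quadratic while the composed S-box is cubic.

```python
def nl(a0, a1, a2, a3):
    """Assumed nonlinear step: two AND-type updates, each of degree 2."""
    a1 ^= (1 ^ a3) & (1 ^ a2)
    a0 ^= a2 & a1
    return a0, a1, a2, a3


def lin(a0, a1, a2, a3):
    """Assumed linear step: swap a0 and a3, then mix into a2."""
    a0, a3 = a3, a0
    a2 ^= a0 ^ a1 ^ a3
    return a0, a1, a2, a3


def unpack(v):
    """4-bit integer to bit tuple, a0 is the least significant bit."""
    return tuple((v >> i) & 1 for i in range(4))


def pack(bits):
    """Bit tuple back to a 4-bit integer."""
    return sum(b << i for i, b in enumerate(bits))


def degree(table, n=4):
    """Algebraic degree of an n-bit lookup table (max over output bits)."""
    best = 0
    for bit in range(n):
        anf = [(y >> bit) & 1 for y in table]
        # In-place Moebius transform: truth table to algebraic normal form.
        for i in range(n):
            for m in range(1 << n):
                if m & (1 << i):
                    anf[m] ^= anf[m ^ (1 << i)]
        for m, coeff in enumerate(anf):
            if coeff:
                best = max(best, bin(m).count("1"))
    return best


NL_TABLE = [pack(nl(*unpack(v))) for v in range(16)]
S_TABLE = [pack(nl(*lin(*nl(*unpack(v))))) for v in range(16)]

if __name__ == "__main__":
    print("degree(NL) =", degree(NL_TABLE))
    print("degree(S)  =", degree(S_TABLE))
```

Because each NL stage has degree 2, each stage can be shared with 3 shares, with a register between the stages as in the pipelined implementation.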
Noekeon Implementation Results
• Implementation using Austria Microsystems Standard Cell Library CMOS 0.35 µm
• S-box: 54 GE (implementation of 2 quadratic mappings)
• Protected S-box: 188 GE (excluding 12 bit register)
• No correlation between shares and unshared values
• Less than 4x increase (actually 3.5x) in size. Note: nonlinear part only.
Noekeon Implementation Results
• A 4x4 S-box costs ~50 GE, or at least 30 GE, but the S-box of Noekeon is 54 GE when decomposed into two quadratic mappings.
• Since the shared mappings are less efficient than the originals, the increase is slightly more (3.5x) than the theoretically expected 3x.
Lightweight SCA protection
Which 4x4 S-boxes can we protect?
A. Poschmann et al. 2010: the Present S-box can also be decomposed.
Hence, similar to Noekeon, Present can be shared with 3 shares only.

There are 302 affine-equivalent classes for the 4 x 4 bijections: 295 cubic classes, 6 quadratic classes and 1 affine class.
Bijections (permutations) in GF(2)^4 belong to the symmetric group S_16.

Theorem. A 4 x 4 bijection can be decomposed using quadratic bijections if and only if it belongs to the alternating group A_16 (151 classes).
Hence there are 302/2 - 6 - 1 = 144 cubic classes in A_16 which can be decomposed.
• 30 classes can be decomposed with length 2,
• the remaining 114 classes can be decomposed with length 3.

Thus 144 classes can be masked using only 3 shares.
Decomposable S-boxes: Noekeon; Present; Serpent 0, 1, 2, 6; Khazad P, Q.
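Membership in the alternating group is just a parity check on the S-box viewed as a permutation of 16 values. The sketch below is an illustration, not the authors' tooling: it counts cycle lengths to decide whether a permutation is even, and applies that to the Present S-box table as given in the cipher's published specification. The function name `is_even_permutation` is hypothetical.

```python
# Present S-box, as listed in the cipher's specification.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def is_even_permutation(perm):
    """True if perm (a list mapping i -> perm[i]) is an even permutation.

    A cycle of length k equals k - 1 transpositions, so the permutation
    is even exactly when the sum of (k - 1) over all cycles is even.
    """
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            transpositions += length - 1
    return transpositions % 2 == 0


# Class counts quoted on the slide.
CUBIC_CLASSES_IN_A16 = 302 // 2 - 6 - 1

if __name__ == "__main__":
    print("Present S-box in A_16:", is_even_permutation(PRESENT_SBOX))
    print("cubic classes in A_16:", CUBIC_CLASSES_IN_A16)
```

An even result is consistent with the Present S-box appearing in the list of decomposable S-boxes; an odd permutation would fail the theorem's condition.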
PRESENT - Implementation Results

A. Poschmann et al. 2010
UMC L180 0.18 µm 1P6M, 100 kHz @ 1.8 V

Modules            Present   Present TI   Increase
Memory [GE]        887       2635         300%
Mix Column [GE]    0         0
S-box [GE]         32        355          11x
FSM + Rest [GE]    192       592          308%
Total Area [GE]    1111      3582         322%
Cycles             547       578          106%
Power [µA]         1.34      5.02         375%

• Memory 80 + 64 bits. Shared: 3x increase.
• Efficient S-box, only 32 GE. Shared: 8.8x increase + 12 bit register (pipelined).
• The FSM increases 3 times. Pipelining increases the cycles and, slightly, the control.
• In total the increase is ~3x.
• Cycles: small increase only. But the power increases ~4x.
Lightweight SCA protection
Can we protect the AES S-box (8x8) or only 4x4 S-boxes?
• Nonlinear part = inversion over GF(256)
• Tower field approach
• Need to ensure Property 3 in every step
• No efficient method
• Large search space
• Ongoing research to make it efficient.
Lightweight SCA protection
Can we protect the AES S-box?
Remember we have the sharing of the multiplications in GF(4).
But this multiplication is the only non-linear operation in the AES (Canright) S-box.

S-box is transformed from GF(2^8) to GF(2^8)/GF(2^4)/GF(2^2).
Tower field approach
RUB/NTU 2011 [MPLPW2011]
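To see why GF(4) multiplication fits the 3-share setting, it helps to write it out at bit level: every output bit is a sum of products of one bit from each operand, so the function is quadratic. The sketch below uses a polynomial basis with x^2 = x + 1, which is an assumption made for readability; Canright's design uses normal bases, so the bit-level formulas there differ. The name `gf4_mul` is illustrative.

```python
def gf4_mul(a, b):
    """Multiply two GF(4) elements encoded as 2-bit integers.

    Encoding (an assumption for this sketch): value = a1*x + a0 with
    x^2 = x + 1. Each output bit is a XOR of AND terms, i.e. degree 2.
    """
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    hi = (a1 & b0) ^ (a0 & b1) ^ (a1 & b1)
    lo = (a0 & b0) ^ (a1 & b1)
    return (hi << 1) | lo


# Full multiplication table, indexed as MUL_TABLE[a][b].
MUL_TABLE = [[gf4_mul(a, b) for b in range(4)] for a in range(4)]

if __name__ == "__main__":
    for row in MUL_TABLE:
        print(row)
```

Since each product term involves only two input bits, a multiplication of this kind can be shared with 3 shares in the same way as the AND gate.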
Lightweight SCA protection
Can we protect the AES S-box?
The multiplication in GF(4) is the only non-linear operation in the AES S-box.
Recall that our countermeasure requires registers between different stages of shared functions.
Thus Canright's S-box representation requires in total five pipelining stages.
This implies that in total one needs to store 174 bits.
AES Implementation Results

RUB/NTU 2011 [MPLPW2011]
UMC L180 0.18 µm 1P6M, 100 kHz @ 1.8 V

Modules            AES-B   AES TI   Increase
Memory [GE]        1678    5055     300%
Mix Column [GE]    373     1120     300%
S-box [GE]         233     4244     18x
FSM + Rest [GE]    317     695      219%
Total Area [GE]    2601    11114    427%
Cycles             226     266      118%
Power [µA]         3.7     13.4     362%

• Memory 2x128 bits. Shared: 3x increase.
• Complex S-box, only 233 GE. Shared: 13.7x increase + 174 bit register (pipelined).
• The FSM increases only 2 times.
• In total the increase is ~4x.
• Cycles: small increase only. But the power increases ~4x.
Threshold Implementation [MPLPW2011]
• Present TI: first-order DPA fails with 5 million measurements (data masking, key masking, random data and key permutations).
• AES TI: with 5 million traces a correlation collision attack succeeds, because uniformity fails and resharing is required.
• With resharing, 100 million traces are still insufficient for CPA using an HD model and for MIA using an HD model; even third-order CPA with 400 million traces fails.
Threshold Implementation Results

UMC L180 0.18 µm 1P6M, 100 kHz @ 1.8 V

Modules            AES-B   AES TI   Increase   Present   Present TI   Increase
Memory [GE]        1678    5055     300%       887       2635         300%
Mix Column [GE]    373     1120     300%       0         0
S-box [GE]         233     4244     1821.5%    32        355          1094%
FSM + Rest [GE]    317     695      219%       192       592          308%
Total Area [GE]    2601    11114    427%       1111      3582         322%
Cycles             226     266      118%       547       578          106%
Power [µA]         3.7     13.4     362%       1.34      5.02         375%
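The "increase" figures in these result tables are plain ratios of the TI value to the unprotected value. The following sketch recomputes them from the GE, cycle and power numbers quoted on the slides; the dictionary layout and the function name `overhead_percent` are illustrative assumptions.

```python
# (unprotected, threshold implementation) pairs taken from the slides.
AES = {
    "Memory": (1678, 5055),
    "Mix Column": (373, 1120),
    "S-box": (233, 4244),
    "FSM + Rest": (317, 695),
    "Total Area": (2601, 11114),
    "Cycles": (226, 266),
    "Power": (3.7, 13.4),
}
PRESENT = {
    "Memory": (887, 2635),
    "S-box": (32, 355),
    "FSM + Rest": (192, 592),
    "Total Area": (1111, 3582),
    "Cycles": (547, 578),
    "Power": (1.34, 5.02),
}


def overhead_percent(plain, shared):
    """TI value expressed as a percentage of the unprotected value."""
    return 100.0 * shared / plain


def overheads(rows):
    """Overhead percentage for every module in one cipher's table."""
    return {name: overhead_percent(p, s) for name, (p, s) in rows.items()}


if __name__ == "__main__":
    for cipher, rows in (("AES", AES), ("Present", PRESENT)):
        for name, pct in overheads(rows).items():
            print(f"{cipher:8s} {name:11s} {pct:7.1f} %")
```

Under this definition the Present S-box ratio comes out at about 1109%, slightly above the 1094% shown in the combined table; the other recomputed values round to the percentages on the slides.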
Area, Power or Throughput

Cipher      Area [GE]       Power [µW]          Throughput
            NXP 0.140 µm    @ 1 MHz, 1.2 V      [bit/cycle]
AES-T       3162            5.95                0.12
Present     1173            3.45                0.12
Present     1598            5.56                2.06
Katan 64    984             7.62                0.50
Katan 64    1102            8.63                0.75
Grain       861             7.40                1.00
Trivium     1298            12.02               1.00
Crypto 1    306             2.57                1.00

[Bar charts: Area, Power and Throughput for AES (Tina), Present, Present, Katan64, Katan64, Grain, Trivium, Crypto 1]
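One way to weigh power against throughput is to combine the two columns into an energy-per-bit figure. This metric is not on the slide; it is a derived quantity under the assumption that the listed power is the average at 1 MHz, so that power divided by clock frequency gives energy per cycle, and dividing again by bits per cycle gives energy per bit. Row labels for the second Present and Katan 64 entries are hypothetical.

```python
CLOCK_MHZ = 1.0  # the slide quotes power at 1 MHz

# (power in microwatts, throughput in bit/cycle) from the table.
CIPHERS = {
    "AES-T": (5.95, 0.12),
    "Present (row 1)": (3.45, 0.12),
    "Present (row 2)": (5.56, 2.06),
    "Katan 64 (row 1)": (7.62, 0.50),
    "Katan 64 (row 2)": (8.63, 0.75),
    "Grain": (7.40, 1.00),
    "Trivium": (12.02, 1.00),
    "Crypto 1": (2.57, 1.00),
}


def energy_pj_per_bit(power_uw, bits_per_cycle, clock_mhz=CLOCK_MHZ):
    """Energy per processed bit in picojoules.

    microwatt / megahertz = picojoule per cycle; dividing by
    bit/cycle gives picojoule per bit.
    """
    return power_uw / (clock_mhz * bits_per_cycle)


energy = {name: energy_pj_per_bit(p, t) for name, (p, t) in CIPHERS.items()}

if __name__ == "__main__":
    for name, pj in sorted(energy.items(), key=lambda item: item[1]):
        print(f"{name:18s} {pj:6.2f} pJ/bit")
```

Sorting by this figure ranks the designs differently from sorting by area alone, which is in line with the talk's point that area is not the only parameter worth optimizing.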
Power and Throughput

Road tolling example: a car passing at high speed should authenticate with an antenna/reader at a certain (height) distance.

Requirements: Distance < 10-12 m; Time < 10 ms.

Why is power so important?
In that "extreme" example the power consumption is more important than the area. The excess of power can be used to improve the throughput.

Can we do crypto on an RFID tag 12 meters away?
Can we authenticate a tag in a short time from so far away?
Power and Throughput

Toll example requirements: Distance < 10-12 m; Time < 10 ms.

So we can not only do crypto, but we can make a full authentication even with an SCA-protected lightweight implementation.

Cipher / Authentication   Time [ms]   Distance [m]
AES-T                     6           10
                          10          12
AES TI                    10          7
                          23          10
Present                   2           10
                          10          20
Present TI                8           10
                          10          11

[Bar charts: "Distance for Fixed Time 10 ms" and "Time at Fixed Distance 10 m" for AES, AES TI, Present, Present TI]
Conclusions
• Young and challenging research area
• Already many interesting lightweight designs available
• New lightweight primitives should be designed with SCA protection in mind
• The semiconductor industry shows interest in implementing lightweight primitives with SCA countermeasures
• Research should focus on all parameters, not only on area
Thank you!