accelerating cryptographic performance on the zynq ... · accelerating cryptographic performance on...

12
WP512 (v1.0) May 21, 2019 www.xilinx.com 1 © Copyright 2017–2019 Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Spartan, UltraScale, Versal, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. AMBA, AMBA Designer, Arm, Arm1176JZ-S, CoreSight, Cortex, and PrimeCell are trademarks of Arm in the EU and other countries. All other trademarks are the property of their respective owners. The Zynq® UltraScale+™ MPSoC’s embedded cryptographic accelerator enables system architects to greatly increase cryptographic performance—by as much as 10,000% or more—compared to software-only solutions. White Paper: Zynq UltraScale+ MPSoC WP512 (v1.0) May 21, 2019 Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC ABSTRACT The Zynq UltraScale+ MPSoC features embedded cryptographic cores[Ref 1] for AES-GCM-256 bit, SHA-3/384, and RSA, as well as Arm® v8 cryptographic extensions[Ref 2] within the Arm Cortex®-A53 processor, providing accelerated cryptographic performance. The performance of the embedded cryptographic accelerators is described under two software architectures, running on a real-time operating system (RTOS) as well as in user space on Linux. In each scenario, the cryptographic accelerator’s performance is compared to using a software-only cryptographic solution. These performance measurements show how system architects can maximize cryptographic performance depending on their needs.

Upload: others

Post on 18-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 1

© Copyright 2017–2019 Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Spartan, UltraScale, Versal, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. AMBA, AMBA Designer, Arm, Arm1176JZ-S, CoreSight, Cortex, and PrimeCell are trademarks of Arm in the EU and other countries. All other trademarks are the property of their respective owners.

The Zynq® UltraScale+™ MPSoC’s embedded cryptographic accelerator enables system architects to greatly increase cryptographic performance—by as much as 10,000% or more—compared to software-only solutions.

White Paper: Zynq UltraScale+ MPSoC

WP512 (v1.0) May 21, 2019

Accelerating Cryptographic Performance on the

Zynq UltraScale+ MPSoC

ABSTRACTThe Zynq UltraScale+ MPSoC features embedded cryptographic cores[Ref 1] for AES-GCM-256 bit, SHA-3/384, and RSA, as well as Arm® v8 cryptographic extensions[Ref 2] within the Arm Cortex®-A53 processor, providing accelerated cryptographic performance. The performance of the embedded cryptographic accelerators is described under two software architectures, running on a real-time operating system (RTOS) as well as in user space on Linux. In each scenario, the cryptographic accelerator’s performance is compared to using a software-only cryptographic solution. These performance measurements show how system architects can maximize cryptographic performance depending on their needs.

Page 2: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 2

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

IntroductionThis white paper illustrates:

• The performance of software running on the Arm Cortex-A53 processor, leveraging the built-in cryptographic accelerators in the Zynq UltraScale+ MPSoC’s Configuration Security Unit (CSU)

• The performance of the equivalent software algorithm running on the Arm Cortex-A53 processor, but leveraging the Arm v8 cryptographic extensions

Both results are compared to the native Arm Cortex-A53 software solution.

The Zynq UltraScale+ MPSoC accelerators located in the CSU contain the embedded cryptographic cores for AES-GCM 256-bit encryption and decryption, SHA-3/384 hashing, and both public and private RSA operations with a key size of up to 4,096 bits. Although the default clock for the CSU is the Zynq UltraScale+ MPSoC's internal system oscillator running at 180MHz[Ref 1], the clock was changed for these tests to run off the low power domain's phase-locked loop (PLL), set at 375MHz, to achieve better performance. The cryptographic extensions available in the Arm v8 core accelerate AES, SHA-1, SHA-2, and CRC-32 operations, but they do not support acceleration of any RSA or SHA-3 operations.

Performance measurements for all tests were run on the Arm Cortex-A53 processor in the Zynq UltraScale+ MPSoC, and utilized the wolfSSL[Ref 3] built-in benchmarking software version 3.12.0 running out of external dual data rate (DDR) memory. WolfSSL is able to leverage a range of hardware for cryptographic functions during cryptographic performance measurements. On the Zynq UltraScale+ MPSoC, this includes the embedded cryptographic core, Arm v8 cryptographic extensions, and software-only algorithms. Data sets of 16 bytes up to 15,888 bytes were used to measure the performance of the AES-GCM 256-bit and SHA3/384 cryptographic algorithms, while RSA-2048 and RSA-4096 were used to measure RSA performance.

wolfSSL on FreeRTOSWolfSSL version 3.12.0 was ported to FreeRTOS on Xilinx SDK 2017.1 for the Zynq UltraScale+ MPSoC to benchmark the performance of the software-only solution, Arm v8 cryptographic extensions, and the Zynq UltraScale+ MPSoC’s embedded cryptographic core for the AES-GCM 256-bit, SHA3/384, RSA-2048, and RSA-4096 algorithms. CSU DMA caching and endian byte-swapping was disabled in the XilSecure libraries[Ref 4] for the wolfSSL FreeRTOS port to maximize the embedded cryptographic performance. Data block sizes of 16 bytes, 528 bytes, 1,024 bytes, 4,112 bytes, 7,696 bytes, and 15,888 bytes were used to measure the throughput of the AES-GCM 256-bit and SHA3/384 algorithms. RSA-2048 and RSA-4096 public and private operations were performed to measure the average amount of time needed for each operation. For the RSA operations, the pre-calculated exponential value was not used.

Page 3: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 3

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

ResultsTable 1 shows the results of AES-GCM 256-bit encryption running the wolfSSL benchmarks under FreeRTOS. This table highlights in bold the fastest performing cryptographic extension and shows that the Arm v8 cryptographic extensions outperform the Zynq UltraScale+ MPSoC’s embedded cryptographic core for data blocks of less than or equal to 1,024 bytes, while the MPSoC’s crypto core outperforms the Arm v8 cryptographic extensions for data blocks greater than 1,024 bytes.

Table 2 shows the results of AES-GCM 256-bit decryption running the wolfSSL benchmarks under FreeRTOS. This table highlights in bold the fastest performing cryptographic extension and shows that the Arm v8 cryptographic extensions outperform the Zynq UltraScale+ MPSoC’s embedded cryptographic core for data blocks of less than or equal to 1,024 bytes, while the MPSoC crypto core outperforms the Arm v8 cryptographic extensions for data blocks greater than 1,024 bytes.

Table 1: FreeRTOS Throughput, AES-GCM Encryption

FreeRTOS Operation Throughput: AES-GCM 256-bit Encryption (MB/s)

Block SizeSoftware

OnlyArm v8

Crypto ExtensionsZynq UltraScale+ MPSoC

Crypto Hardware

16 3.785 53.041 4.012

528 7.52 278.156 122.285

1,024 7.634 297.949 219.531

4,112 7.73 315.976 555.58

7,696 7.743 318.9 691.379

15,888 7.746 318.949 844.344

Table 2: FreeRTOS Throughput, AES-GCM Decryption

FreeRTOS Operation Throughput: AES-GCM 256-bit Decryption (MB/s)

Block SizeSoftware

OnlyArm v8

Crypto ExtensionsZynq UltraScale+ MPSoC

Crypto Hardware

16 3.756 43.414 2.069

528 7.515 166.307 59.153

1,024 7.634 175.415 100.244

4,112 7.73 183.429 237.055

7,696 7.75 184.771 291.194

15,888 7.746 184.854 373.363

Page 4: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 4

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

In Table 3, the performance of SHA3/384 reveals that the software-only solution outperforms the Zynq UltraScale+ MPSoC’s embedded cryptographic core only for data blocks of 16 bytes.

Table 4 shows that the Zynq UltraScale+ MPSoC embedded cryptographic core outperforms all software-only based RSA solutions.

wolfSSL on LinuxwolfSSL version 3.12.0 was run under Linux 4.14 as available in Xilinx release 2018.3[Ref 5] for the Zynq UltraScale+ MPSoC. This was used to benchmark the software-only solution, Arm v8 cryptographic extensions, and the Zynq UltraScale+ MPSoC’s embedded cryptographic core performance of the AES-GCM 256-bit, SHA3/384, RSA-2048, and RSA-4096 algorithms. Due to memory management and execution privilege differences, applications running under Linux are unlike RTOS applications, in that the same modifications made to the FreeRTOS port were not applied to the Linux port. However, the same data set used in the FreeRTOS benchmarking was used in Linux.

Due to Linux, the expectation is that user space applications run at the lowest-exception level (EL0) on the Armv8 architecture. The Zynq UltraScale+ MPSoC’s cryptographic extensions are only accessible from user space applications via a secure monitor call (SMC). This accesses the platform management unit (PMU), which then accesses the cryptographic operation in the CSU using the XilSecure library[Ref 4]. When the cryptographic operation is complete, the Linux application continues to run after returning from the CSU, returning then from the PMU call, and subsequently from the SMC. The high overhead of this scheme is reflected in the Linux benchmarking results.

Table 3: FreeRTOS Throughput, SHA3/384

FreeRTOS Operation Throughput: SHA3/384 (MB/s)

Block SizeSoftware

OnlyZynq UltraScale+ MPSoC

Crypto Hardware

16 38.502 17.934

528 55.087 328.623

1,024 56.152 417.92

4,112 57.295 592.763

7,696 57.615 628.626

15,888 57.52 654.565

Table 4: FreeRTOS Average RSA Operation Time

FreeRTOS Avg. RSA Operation Time: wolfSSL v3.12.0 (ms)

RSA OperationSoftware

OnlyZynq UltraScale+ MPSoC

Crypto Hardware

Public Encrypt 2048 4.874 0.552

Private Decrypt 2048 89.25 12.846

Public Encrypt 4096 18.519 1.95

Private Decrypt 4096 619.47 95.9

Page 5: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 5

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

ResultsTable 5 shows the results of AES-GCM 256-bit encryption running the wolfSSL benchmarks under Linux. This table highlights in bold the fastest performing cryptographic extension and shows that the Arm v8 cryptographic extensions outperform the Zynq UltraScale+ MPSoC’s embedded cryptographic core for all data block sizes. This is because the Arm v8 cryptographic extensions can be directly accessed in user space and do not have to traverse a large software stack to reach the CSU.

Table 6 shows the results of AES-GCM 256-bit decryption running the wolfSSL benchmarks under Linux. This table highlights in bold the fastest performing cryptographic extension and shows that the Arm v8 cryptographic extensions outperform the Zynq UltraScale+ MPSoC’s embedded cryptographic core for all data block sizes, due to the same reasons stated for encryption.

Table 5: Linux Throughput, AES-GCM Encryption

Linux 4.14 Operation Throughput: AES-GCM 256-bit Encryption (MB/s)

Block SizeSoftware

OnlyArm v8

Crypto ExtensionsZynq UltraScale+ MPSoC

Crypto Hardware

16 4.426 54.309 0.151

528 8.971 300.263 4.68

1,024 9.105 322.934 8.66

4,112 9.206 344.002 27.769

7,696 9.235 347.552 42.572

15,888 9.23 345.605 59.948

Table 6: Linux Throughput, AES-GCM Decryption

Linux 4.14 Operation Throughput: AES-GCM 256-bit Decryption (MB/s)

Block SizeSoftware

OnlyArm v8

Crypto ExtensionsZynq UltraScale+ MPSoC

Crypto Hardware

16 4.397 34.835 0.146

528 8.955 179.661 4.535

1,024 9.1 192.258 8.52

4,112 9.22 203.652 27.083

7,696 9.231 205.579 41.784

15,888 9.213 204.456 59.311

Page 6: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 6

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

In Table 7, the performance of SHA3/384 reveals that the software-only solution outperforms the Zynq UltraScale+ MPSoC’s embedded cryptographic core for data block sizes less than or equal to 1,024 bytes, and the MPSoC crypto core outperforms the software-only solution for data block sizes greater than or equal to 4,112 bytes.

Table 8 shows that the Zynq UltraScale+ MPSoC’s embedded cryptographic core outperforms all software-only RSA solutions, even with the high Linux calling overhead.

Table 7: Linux Throughput, SHA3/384

Linux Operation Throughput: SHA3/384 (MB/s)

Block SizeSoftware

OnlyZynq UltraScale+ MPSoC

Crypto Hardware

16 41.58 0.308

528 59.632 9.822

1,024 60.797 18.602

4,112 62.025 66.507

7,696 62.357 112.26

15,888 62.291 179.718

Table 8: Linux Average RSA Operation Time

Linux Avg. RSA Operation Time: wolfSSL v3.12.0 (ms)

RSA OperationSoftware

OnlyZynq UltraScale+ MPSoC

Crypto Hardware

Public Encrypt 2048 4.424 1.342

Private Decrypt 2048 83.512 25.71

Public Encrypt 4096 16.736 4.152

Private Decrypt 4096 569.89 191.778

Page 7: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 7

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

ConclusionWith the exception only of the SHA3/384 Linux result, the cryptographic acceleration provided by the Arm v8 cryptographic extensions and by the Zynq UltraScale+ MPSoC's embedded cryptographic core outperforms a software-only solution.

Table 9 highlights the percentage of maximum performance improvement using the cryptographic accelerators over a software-only solution. In Table 9, all maximum percentage of performance improvements occurred in the wolfSSL FreeRTOS benchmark results using the Zynq UltraScale+ MPSoC's embedded cryptographic core with the largest block size for the AES and SHA algorithms, and for all key sizes and all operations for the RSA algorithm.

The slight performance improvement of the software-only solutions between the FreeRTOS and Linux data is due to the fact that the software-only solutions used different versions of compilers. The FreeRTOS benchmarking used GCC version 6.2.1, while the Linux benchmarking used GCC version 7.3.1. To confirm this difference in compilers, the software-only version of wolfSSL for Linux was recompiled using GCC version 6.2.1 and the AES-GCM 256-bit algorithms were retested. The results showed that AES-GCM 256-bit software-only encryption performance decreased to 7.023 MB/s, while decryption performance decreased to 7.022 MB/s. Even though this white paper focuses on using cryptographic accelerators, the difference in compiler version performance shows that using the latest tools is extremely important to maximize software performance.

Figure 1 compares all the benchmarking results from this white paper of the AES-GCM 256-bit encryption algorithm run under the different benchmarking scenarios. Figure 1 clearly shows that the software-only solutions have dramatically slower throughput compared to both accelerated cryptographic solutions. When running wolfSSL benchmarking under FreeRTOS, the Zynq UltraScale+ MPSoC's hardened cryptographic core outperforms the Arm v8 cryptographic extensions for block sizes greater than 2,000 bytes. The FreeRTOS software-only solution overlaps with the Linux software-only solution and is not legible on the chart. Due to the high overhead of calling the MPSoC's AES-GCM 256-bit encryption engine from Linux, using the Arm v8 cryptographic extensions is always a better option to maximize performance.

Table 9: Performance Improvement: Cryptographic Algorithms Over Software-Only Solutions

Algorithm Performance Improvement

AES-GCM 256-bit Encryption 10,800.39%

AES-GCM 256-bit Decryption 4,720.07%

SHA3/384 1,037.98%

RSA 2048 Public Encryption 782.97%

RSA 2048 Private Decryption 594.77%

RSA 4096 Public Encryption 849.69%

RSA 4096 Private Decryption 545.95%

Page 8: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 8

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

Figure 2 compares all the performance results of the AES-GCM 256-bit decryption algorithm run under the different benchmarking scenarios. Figure 2 does show results similar to those displayed in Figure 1; however, the performance of the Zynq UltraScale+ MPSoC's embedded cryptographic engine run under FreeRTOS outperforms the Arm v8 cryptographic extensions for block sizes greater than 4,000 bytes. Again, the FreeRTOS software-only solution overlaps with the Linux software-only solution and is not legible on the chart.

Figure 3 shows the comparison results of running the SHA3/384 algorithm under the different benchmarking scenarios. Like the results for AES-GCM 256-bit decryption, the performance of the Zynq UltraScale+ MPSoC's embedded cryptographic engine when run under FreeRTOS clearly outperforms all the other solutions except when using a block size of 16 bytes. Even Linux, with its

X-Ref Target - Figure 1

Figure 1: Comparison Summary of All AES-GCM 256-Bit Encryption Results

X-Ref Target - Figure 2

Figure 2: Comparison Summary of All AES-GCM 256-Bit Decryption Results

WP512_01_050719

900

800

700

600

500T

hro

ug

hp

ut

(MB

/s)

Data Block Size (Bytes)

FreeRTOS Software OnlyFreeRTOS Arm v8 Crypto ExtensionsFreeRTOS Zynq UltraScale+ MPSoc Crypto Hardware

Linux 4.14 Software OnlyLinux 4.14 Arm v8 Crypto ExtensionsLinux 4.14 Zynq UltraScale+ MPSoc Crypto Hardware

wolfSSL 3.12.0 AES-GCM 256-Bit Encryption Performance

400

300

200

100

2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,00000

WP512_02_050719

400

350

300

250

Th

rou

gh

pu

t (M

B/s

)

Data Block Size (Bytes)

FreeRTOS Software OnlyFreeRTOS Arm v8 Crypto ExtensionsFreeRTOS Zynq UltraScale+ MPSoC Crypto Hardware

Linux 4.14 Software OnlyLinux 4.14 Arm v8 Crypto ExtensionsLinux 4.14 Zynq UltraScale+ MPSoC Crypto Hardware

wolfSSL 3.12.0 AES-GCM 256-Bit Decryption Performance

200

150

100

50

2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,00000

Page 9: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 9

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

high calling overhead, shows performance improvement over a software-only solution when using the embedded cryptographic accelerators for block sizes greater than 4,000 bytes.

Figure 4, 5, 6, and 7 show the collective comparison of running all RSA operations under the various benchmarking scenarios. These graphs consistently indicate that the use of the Zynq UltraScale+ MPSoC's embedded cryptographic core always outperforms a software-only solution. Additionally, when accelerated performance is measured running RSA algorithms under FreeRTOS versus Linux, FreeRTOS is always at least twice as fast due to the CSU being directly accessible in FreeRTOS.

X-Ref Target - Figure 3

Figure 3: Comparison Summary of All SHA3/384 Results

X-Ref Target - Figure 4

Figure 4: Summary of All RSA-2048 Public Encryption Benchmark Results

WP512_03_050719

700

600

500

Th

rou

gh

pu

t (M

B/s

)

Data Block Size (Bytes)

FreeRTOS Software OnlyFreeRTOS ZU+Crypto Hardware

Linux 4.14 Software OnlyLinux 4.14 ZU+Crypto Hardware

wolfSSL 3.12.0 SHA3 Performance

400

300

200

100

2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,00000

WP512_04_040819

5

4

4.5

3.5

Tim

e (m

s)

Benchmark Test

3

2

2.5

1

1.5

0.5

0FreeRTOS

Software OnlyFreeRTOS

Zynq UltraScale+ MPSoCCrypto Hardware

Linux 4.14Software Only

Linux 4.14Zynq UltraScale+ MPSoC

Crypto Hardware

wolfSSL 3.12.0 Average Operation Timeof RSA 2048 Public Encryption

4.874

0.552

4.424

1.342

Page 10: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 10

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

X-Ref Target - Figure 5

Figure 5: Summary of All RSA-2048 Public Decryption Benchmark Results

X-Ref Target - Figure 6

Figure 6: Summary of All RSA-4096 Public Encryption Benchmark Results

WP512_05_040819

90

80

70

Tim

e (m

s)

Benchmark Test

60

40

50

20

30

10

0FreeRTOS

Software OnlyFreeRTOS

Zynq UltraScale+ MPSoCCrypto Hardware

Linux 4.14Software Only

Linux 4.14Zynq UltraScale+ MPSoC

Crypto Hardware

wolfSSL 3.12.0 Average Operation Timeof RSA 2048 Private Decryption

89.25

12.846

83.512

25.71

WP512_06_040819

20

16

18

14

Tim

e (m

s)

Benchmark Test

12

8

10

4

6

2

0FreeRTOS

Software OnlyFreeRTOS

Zynq UltraScale+ MPSoCCrypto Hardware

Linux 4.14Software Only

Linux 4.14Zynq UltraScale+ MPSoC

Crypto Hardware

wolfSSL 3.12.0 Average Operation Timeof RSA 4096 Public Encryption

18.519

1.95

16.736

4.152

Page 11: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 11

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

System architects who are tasked to define cryptographic solutions should evaluate:

• How the algorithm is run—utilizing Arm v8 cryptographic extensions, or using the Zynq UltraScale+ MPSoC's embedded cryptographic core

• The size of each data block • Where the algorithm is running—on a simple RTOS, or in Linux user space • Out of which memory space the software is running—out of DDR memory, or out of on-chip

memory (OCM). While this paper did not test running the cryptographic algorithms out of DDR versus OCM, this factor should be considered.

As shown in this white paper, these variables can significantly affect the performance of the Zynq UltraScale+ MPSoC's accelerated cryptographic performance. The research for this white paper covered only a small subset of possible benchmarking scenarios compared to the number of possible solution combinations. End users should ultimately benchmark their own design to make sure performance requirements are being met.

These results confirm that using the cryptographic accelerators on the Zynq UltraScale+ MPSoC can vastly improve cryptographic performance and will benefit any market and any application that utilizes the AES-GCM-256-bit algorithm, the SHA3/384 algorithm, the RSA-2048 algorithm, and/or the RSA-4096 algorithm. Secondly, these algorithms cover all the foundations for a secure system and covers the aspects of confidentiality, integrity, and authentication. Lastly, not only is a performance increase achieved on the Zynq UltraScale+ MPSoC by using the cryptographic accelerators but doing so will reduce the resource load on the Arm Cortex-A53s, the Arm Cortex-R5s, and the programmable logic. In this way, Zynq UltraScale+ MPSoC applications have a lot more system resources available to dedicate to the application's needs instead of devoting resources to cryptographic needs.

X-Ref Target - Figure 7

Figure 7: Summary of All RSA-4096 Public Decryption Benchmark ResultsWP512_07_040819

700

Tim

e (m

s)

Benchmark Test

600

400

500

200

300

100

0FreeRTOS

Software OnlyFreeRTOS

Zynq UltraScale+ MPSoCCrypto Hardware

Linux 4.14Software Only

Linux 4.14Zynq UltraScale+ MPSoC

Crypto Hardware

wolfSSL 3.12.0 Average Operation Timeof RSA 4096 Private Decryption

619.47

95.9

569.89

191.778

Page 12: Accelerating Cryptographic Performance on the Zynq ... · Accelerating Cryptographic Performance on the Zynq UltraScale+MPSoC In Table3, the performance of SHA3/384 reveals that the

WP512 (v1.0) May 21, 2019 www.xilinx.com 12

Accelerating Cryptographic Performance on the Zynq UltraScale+ MPSoC

References1. Xilinx User Guide UG1085, Zynq UltraScale+ Device Technical Reference Manual 2. Arm Technical Reference Manual, Cortex®-A53 MPCore Processor Cryptography Extension,

Revision r0p4 3. wolfSSL website product landing page 4. Xilinx User Guide UG1137, Zynq UltraScale+ MPSoC Software Developer Guide, Appendix I:

XilSecure Library v3.1 5. Xilinx User Guide UG1144, Linux Tools Documentation Reference Guide

Revision HistoryThe following table shows the revision history for this document:

DisclaimerThe information disclosed to you hereunder (the “Materials”) is provided solely for the selection and use of Xilinxproducts. To the maximum extent permitted by applicable law: (1) Materials are made available “AS IS” and with all faults,Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOTLIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE;and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability)for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including youruse of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including lossof data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) evenif such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinxassumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or toproduct specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior writtenconsent. Certain products are subject to the terms and conditions of Xilinx’s limited warranty, please refer to Xilinx’sTerms of Sale which can be viewed at http://www.xilinx.com/legal.htm#tos; IP cores may be subject to warranty andsupport terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safeor for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx productsin such critical applications, please refer to Xilinx’s Terms of Sale which can be viewed at http://www.xilinx.com/legal.htm#tos.

Automotive Applications DisclaimerXILINX PRODUCTS ARE NOT DESIGNED OR INTENDED TO BE FAIL-SAFE, OR FOR USE IN ANY APPLICATION REQUIRINGFAIL-SAFE PERFORMANCE, SUCH AS APPLICATIONS RELATED TO: (I) THE DEPLOYMENT OF AIRBAGS, (II) CONTROL OF AVEHICLE, UNLESS THERE IS A FAIL-SAFE OR REDUNDANCY FEATURE (WHICH DOES NOT INCLUDE USE OF SOFTWARE INTHE XILINX DEVICE TO IMPLEMENT THE REDUNDANCY) AND A WARNING SIGNAL UPON FAILURE TO THE OPERATOR, OR(III) USES THAT COULD LEAD TO DEATH OR PERSONAL INJURY. CUSTOMER ASSUMES THE SOLE RISK AND LIABILITY OFANY USE OF XILINX PRODUCTS IN SUCH APPLICATIONS.

Date Version Description of Revisions

05/21/2019 1.0 Initial Xilinx release.