Architectural Impact of SSL Processing Jingnan Yao

Post on 21-Dec-2015


Architectural Impact of SSL Processing

Jingnan Yao


References

“Architectural Impact of Secure Socket Layer on Internet Servers”, Krishna Kant, Ravishankar Iyer and Prasant Mohapatra.

“Anatomy and Performance of SSL Processing”, Li Zhao, Ravi Iyer, Srihari Makineni and Laxmi Bhuyan.


Two Major Approaches

IPsec (Internet Protocol Security): operates at the IP level; implemented in NICs (network interface cards)

SSL (Secure Socket Layer): operates at the transport level; secures an individual communication session

Secure HTTP (called HTTPS) uses SSL for security and is widely used in e-commerce environments.


Performance Impact

Server: the number of simultaneous connections drops significantly

Client: unduly long client response time (10-25% of e-commerce transactions are aborted)


Simultaneous Connections for SPECweb99 and SPECweb99_SSL

SPECweb99 achieves much higher throughput than SPECweb99_SSL.


Overview of SSL

Privacy, Integrity & Authentication

Session Negotiation Phase: Authentication of the server and client at the beginning of the session

Bulk Data Transfer Phase: Encryption/decryption of data exchanged between the two parties during the session


Execution Time Breakdown in Web Server (1KB webpage)

SSL processing (libcrypto & libssl) takes 71.6% of the execution time.


Further Breakdown of Crypto Operations

Public key encryption

Private key encryption

Hashing

Other operations


Configurations

Number of processors in the SMP server: uniprocessor, dual processor, quad processor

Three different L2 cache sizes: 512KB, 1MB, 2MB

Three different file sizes:
30 bytes: handshake performance
1 MB: bulk data encryption performance
36 KB: average web-page transfer
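The configuration space above can be enumerated as a small grid. The sketch below assumes every combination of processor count, L2 size, and file size was measured, which the slides do not confirm; the constant names are illustrative only.

```python
from itertools import product

# Values taken from the slide; names are hypothetical.
PROCESSORS = (1, 2, 4)                           # uniprocessor, dual, quad
L2_CACHE_KB = (512, 1024, 2048)                  # 512KB, 1MB, 2MB
FILE_SIZES_BYTES = (30, 36 * 1024, 1024 * 1024)  # handshake, average page, bulk

# Assumption: a full cross product of the three dimensions.
configs = list(product(PROCESSORS, L2_CACHE_KB, FILE_SIZES_BYTES))

for cpus, l2_kb, size in configs[:3]:
    print(f"{cpus}P, L2={l2_kb}KB, file={size} bytes")
print(len(configs), "configurations in total")
```

Under that assumption the grid has 3 x 3 x 3 = 27 points, each of which would be run with and without SSL.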


Overall Performance


Observation 1:

“SSL increases path length 10-15 fold over non-SSL case” + “CPI drops by more than a factor of 2”

“The use of SSL increases computational cost of the transactions by a factor of 5-7.”

“As the number of processors increase, the ratio goes down.”

“More processors mean more coherency traffic in both SSL and non-SSL cases.”
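The factor of 5-7 follows from the two quoted figures, since CPU cycles per transaction equal path length (instructions executed) times CPI. A minimal sketch of that arithmetic, using a CPI drop of exactly 2 as an upper-bound assumption (the slides say "more than a factor of 2"); `cost_ratio` is a hypothetical helper name:

```python
def cost_ratio(path_length_ratio, cpi_drop_factor):
    """Ratio of CPU cycles per transaction, SSL over non-SSL.

    cycles = path_length * CPI, so the SSL/non-SSL cycle ratio is the
    path-length ratio divided by the factor by which CPI drops.
    """
    return path_length_ratio / cpi_drop_factor

low = cost_ratio(10, 2)    # 10-fold path length, CPI halved
high = cost_ratio(15, 2)   # 15-fold path length, CPI halved
print(f"cost increase between {low:.1f}x and {high:.1f}x")
```

With CPI dropping by somewhat more than 2, the upper end falls from 7.5 toward the quoted 7.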


Observation 2:

“Small CPI for SSL”: A faster CPU core would not be very helpful in improving SSL performance so long as L1 is large enough to supply much of the code and data needed.

“Bulk data encryption/decryption algorithms highly sequential in nature”: A wider issue width would not help, but a longer pipeline would.


L1 Cache Characteristics

Separate instruction and data L1 caches: 16KB

Single unified L2 cache


Observation 1:

“L1 instruction miss ratios are very low in all cases. L1 data miss ratios are more significant.”

“The instruction miss ratio generally decreases with number of processors, but the data miss ratio goes up.”

“More processors allow a better sharing of code, but the coherency misses in data cache increase.”


Observation 2:

“30 byte file sizes: the miss ratio for both instruction and data are much lower in the SSL case than non-SSL case.”

“The data miss ratio retains the same behavior for all file sizes and processor configurations.”

“The frequent reuse of the data during the encryption and decryption process.”

“The instruction locality relating to handshaking process is very high.”


Observation 3:

“1 MB file sizes: the instruction miss ratio becomes very poor with the SSL traffic for bulk data transfers.”

“Low instruction locality in the bulk data transfer case.”

“Working set of instructions in the bulk transfer case does not fit within L1 cache.”

“Larger instruction L1 cache would help to improve bulk data encryption performance.”
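The effect of a working set that does not fit in L1 can be illustrated with a toy cache model. This is a sketch only: the 16KB capacity comes from the slides, but the direct-mapped organization, 32-byte line size, 4-byte fetch step, and looping trace are all assumptions, and `miss_ratio` and `loop_trace` are hypothetical names.

```python
def miss_ratio(addresses, cache_bytes=16 * 1024, line_bytes=32):
    """Miss ratio of a direct-mapped cache over a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # block number currently held by each line
    misses = total = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:       # miss: fetch the block into this line
            tags[index] = block
            misses += 1
        total += 1
    return misses / total if total else 0.0

def loop_trace(working_set_bytes, passes=10, step=4):
    """Fetch addresses for a straight-line loop body executed `passes` times."""
    for _ in range(passes):
        yield from range(0, working_set_bytes, step)

fits = miss_ratio(loop_trace(8 * 1024))     # working set smaller than the cache
spills = miss_ratio(loop_trace(32 * 1024))  # working set twice the cache size
print(f"fits: {fits:.4f}  spills: {spills:.4f}")
```

In this model the 8KB loop misses only on its first pass, while the 32KB loop evicts its own lines and misses on every line in every pass, which is the behavior a larger instruction cache would remove.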


L2 Cache Characteristics


Observation 1:

“High L2 miss ratios, especially for large size webpages (1MB sizes)”

“High degree of locking/contention in TCP processing.”

“Cache pollution because of TCP checksum.”


Encryption Dominated (1MB files) & SSL Handshake Dominated (30 byte files)


Observation 1:

“1MB case: SSL bulk data transfer shows very good L2 miss ratios.”

“The heavy computational workload of SSL helps in reducing the L2 cache miss ratio.”

“SSL processing itself has certain features that would lead to high L2 cache miss ratios.”

“30 byte case: SSL Handshake shows very high L2 miss ratios.”


Branch and Prediction Behavior


Observation 1:

“Branch frequency with SSL is about 30%-50% of that without SSL.”

“There are less control dependencies in the SSL-based transactions.”

“Low branch frequency in SSL encourages high degree of pipelining in the processor architecture.”

“Lower control dependency is another reason for high hit rate in L1 and low CPI in case of SSL.”


Observation 2:

“For 1P/2P configuration: the misprediction rate with SSL is lower.”

“For 4P configuration: the misprediction rate with SSL is always higher.”

“For 4P configuration: BTB is highly inefficient.”

“Better branch prediction algorithms can be investigated.”

“Avoid overly complex branch predictor for SSL transactions since the branch frequency is very low.”
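Why a better predictor pays off less when branches are rare can be seen from a first-order stall model: the CPI lost to mispredictions is branch frequency times misprediction rate times penalty. The sketch below uses purely hypothetical numbers (20% branch frequency without SSL, 40% of that with SSL, a 10% misprediction rate, a 10-cycle penalty); none of them are measurements from the slides, and `branch_stall_cpi` is an illustrative name.

```python
def branch_stall_cpi(branch_freq, mispredict_rate, penalty_cycles):
    """Cycles per instruction lost to branch mispredictions (first-order model)."""
    return branch_freq * mispredict_rate * penalty_cycles

PENALTY = 10                         # hypothetical pipeline flush cost in cycles
NON_SSL_BRANCH_FREQ = 0.20           # hypothetical
SSL_BRANCH_FREQ = 0.40 * 0.20        # within the quoted 30%-50% of non-SSL

def gain_from_better_predictor(branch_freq, old_rate=0.10, new_rate=0.05):
    """CPI saved by halving the misprediction rate (hypothetical rates)."""
    return (branch_stall_cpi(branch_freq, old_rate, PENALTY)
            - branch_stall_cpi(branch_freq, new_rate, PENALTY))

non_ssl_gain = gain_from_better_predictor(NON_SSL_BRANCH_FREQ)
ssl_gain = gain_from_better_predictor(SSL_BRANCH_FREQ)
print(f"CPI saved: non-SSL {non_ssl_gain:.3f}, SSL {ssl_gain:.3f}")
```

Under these assumptions the same predictor improvement saves less than half as much CPI for SSL traffic, which is the sense in which an overly complex predictor is hard to justify for SSL transactions.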


Conclusion

SSL overhead increases the computational cost of transactions by a factor of 5-7.

SSL transactions do not benefit much from a larger L2 cache, but a larger L1 cache would be helpful.

Complex logic for handling control dependencies is not useful for SSL transactions, as the frequency of branches is very low.