Post on 21-Dec-2015
Architectural Impact of SSL Processing
Jingnan Yao
References
“Architectural Impact of Secure Socket Layer on Internet Servers”, Krishna Kant, Ravishankar Iyer and Prasant Mohapatra.
“Anatomy and Performance of SSL Processing”, Li Zhao, Ravi Iyer, Srihari Makineni and Laxmi Bhuyan.
Two Major Approaches
IPsec: Internet Protocol Security; operates at the IP level; implemented in NICs (network interface cards)
SSL: Secure Sockets Layer; operates at the transport level; secures an individual communication session
Secure HTTP (called HTTPS) uses SSL for security and is widely used in e-commerce environments.
Performance Impact
Server: the number of simultaneous connections drops significantly
Client: unduly long client response time (10-25% of e-commerce transactions are aborted)
Simultaneous Connections for SPECweb99 and SPECweb99_SSL
SPECweb99 achieves much higher throughput than SPECweb99_SSL.
Overview of SSL
Privacy, Integrity & Authentication
Session Negotiation Phase:
Authentication of the server and client at the beginning of the session
Bulk Data Transfer Phase: Encryption/decryption of data exchanged between the two parties during the session
Execution Time Breakdown in Web Server (1KB webpage)
SSL processing (libcrypto & libssl) takes 71.6% of the execution time.
Further Breakdown of Crypto Operations
Public key encryption
Private key encryption
Hashing
Other operations
Configurations
Number of processors in the SMP server: uniprocessor, dual processor, quad processor
Three different L2 cache sizes: 512KB, 1MB, 2MB
Three different file sizes: 30 bytes (handshake performance), 1MB (bulk data encryption performance), 36KB (average web-page transfer)
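The reason the smallest file isolates handshake cost and the largest isolates bulk encryption cost can be sketched with a simple additive cost model. This is only an illustration: the function name and both cost constants are hypothetical placeholders, not measurements from the referenced papers.

```python
def ssl_cost_split(file_bytes, handshake_cost=1_000_000.0, per_byte_cost=50.0):
    """Return (handshake share, bulk share) of total SSL cost for one transfer.

    handshake_cost and per_byte_cost are made-up cycle counts chosen only
    to illustrate the shape of the trade-off.
    """
    bulk_cost = file_bytes * per_byte_cost
    total = handshake_cost + bulk_cost
    return handshake_cost / total, bulk_cost / total


# The three file sizes used in the study.
for label, size in [("30 B", 30), ("36 KB", 36 * 1024), ("1 MB", 1024 * 1024)]:
    handshake_share, bulk_share = ssl_cost_split(size)
    print(f"{label:>6}: handshake {handshake_share:.1%}, bulk {bulk_share:.1%}")
```

Under these assumed constants the 30 byte transfer is almost entirely handshake cost, the 1MB transfer is almost entirely bulk encryption cost, and the 36KB transfer falls in between.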
Overall Performance
Observation 1:
“SSL increases path length 10-15 fold over non-SSL case” + “CPI drops by more than a factor of 2” => “The use of SSL increases computational cost of the transactions by a factor of 5-7.”
“As the number of processors increase, the ratio goes down.” “More processors mean more coherency traffic in both SSL and non-SSL cases.”
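The 5-7x figure follows from the fact that cycles per transaction equal path length (instructions executed) times CPI. A minimal sketch of that arithmetic, where the function name is illustrative and 0.5 is used as the boundary value for "CPI drops by more than a factor of 2":

```python
def cost_ratio(path_length_ratio, cpi_ratio):
    """Cycles = instructions x CPI, so the SSL/non-SSL cycle ratio
    is the product of the path-length ratio and the CPI ratio."""
    return path_length_ratio * cpi_ratio


# Path length grows 10-15x with SSL; CPI falls to at most half.
low = cost_ratio(10, 0.5)
high = cost_ratio(15, 0.5)
print(low, high)  # 5.0 7.5
```

Because CPI drops by somewhat more than a factor of 2, the upper end lands nearer 7 than 7.5, which matches the quoted range.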
Observation 2:
“Small CPI for SSL”: A faster CPU core would not be very helpful in improving SSL performance so long as L1 is large enough to supply much of the code and data needed.
“Bulk data encryption/decryption algorithms highly sequential in nature”: A wider issue width would not help, but a longer pipeline would.
L1 Cache Characteristics
Separate instruction and data L1 caches: 16KB
Single unified L2 cache
Observation 1:
“L1 instruction miss ratios are very low in all cases. L1 data miss ratios are more significant.”
“The instruction miss ratio generally decreases with number of processors, but the data miss ratio goes up.”
“More processors allow a better sharing of code, but the coherency misses in data cache increase.”
Observation 2:
“30 byte file sizes: the miss ratio for both instruction and data are much lower in the SSL case than non-SSL case.”
“The data miss ratio retains the same behavior for all file sizes and processor configurations.”
“The frequent reuse of the data during the encryption and decryption process.”
“The instruction locality relating to handshaking process is very high.”
Observation 3:
“1MB file sizes: the instruction miss ratio becomes very poor with the SSL traffic for bulk data transfers.”
“Low instruction locality in the bulk data transfer case.”
“Working set of instructions in the bulk transfer case does not fit within L1 cache.”
“Larger instruction L1 cache would help to improve bulk data encryption performance.”
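The effect of a working set that does not fit in L1 can be illustrated with a toy direct-mapped cache simulator. This is a sketch under stated assumptions: the 16KB capacity comes from the configuration above, while the 32-byte line size, 4-byte fetch step, direct-mapped organization, and the looping access pattern are all hypothetical choices made for illustration.

```python
def miss_ratio(addresses, cache_bytes, line_bytes=32):
    """Miss ratio of a direct-mapped cache over a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # block number currently held in each line
    misses = 0
    total = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if resident[index] != block:
            resident[index] = block  # fill the line on a miss
            misses += 1
        total += 1
    return misses / total


def loop_trace(working_set_bytes, passes, step=4):
    """Repeatedly sweep a contiguous region, like a loop refetching its code."""
    for _ in range(passes):
        for addr in range(0, working_set_bytes, step):
            yield addr


L1_BYTES = 16 * 1024
fits = miss_ratio(loop_trace(8 * 1024, passes=10), L1_BYTES)
too_big = miss_ratio(loop_trace(32 * 1024, passes=10), L1_BYTES)
print(f"8KB working set:  {fits:.4f}")
print(f"32KB working set: {too_big:.4f}")
```

When the region fits, only the first sweep misses; when it is twice the cache size, every line is evicted before it is reused, so each sweep misses on every line.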
L2 Cache Characteristics
Observation 1:
“High L2 miss ratios, especially for large size webpages (1MB sizes)”
“High degree of locking/contention in TCP processing.”
“Cache pollution because of TCP checksum.”
Encryption Dominated (1MB files) & SSL Handshake Dominated (30 byte files)
Observation 1:
“1MB case: SSL bulk data transfer shows very good L2 miss ratios.”
“The heavy computational workload of SSL helps in reducing the L2 cache miss ratio.”
“SSL processing itself has certain features that would lead to high L2 cache miss ratios.”
“30 byte case: SSL Handshake shows very high L2 miss ratios.”
Branch and Prediction Behavior
Observation 1:
“Branch frequency with SSL is about 30%-50% of that without SSL.”
“There are less control dependencies in the SSL-based transactions.”
“Low branch frequency in SSL encourages high degree of pipelining in the processor architecture.”
“Lower control dependency is another reason for high hit rate in L1 and low CPI in case of SSL.”
Observation 2:
“For 1P/2P configuration: the misprediction rate with SSL is lower.”
“For 4P configuration: the misprediction rate with SSL is always higher.”
“For 4P configuration: BTB is highly inefficient.”
“Better branch prediction algorithms can be investigated.”
“Avoid overly complex branch predictor for SSL transactions since the branch frequency is very low.”
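One way to see why a low branch frequency limits the payoff from a better predictor is the usual first-order estimate of stall cycles per instruction: branch frequency times misprediction rate times misprediction penalty. The sketch below uses hypothetical numbers throughout (a 20% non-SSL branch frequency, an SSL frequency at 40% of that, a 10-cycle penalty, and illustrative misprediction rates); none are taken from the referenced papers.

```python
def branch_stall_cpi(branch_freq, mispredict_rate, penalty_cycles):
    """First-order CPI contribution from branch mispredictions."""
    return branch_freq * mispredict_rate * penalty_cycles


PENALTY = 10  # hypothetical misprediction penalty in cycles

# Hypothetical non-SSL workload: 20% branches, 5% mispredicted.
non_ssl = branch_stall_cpi(0.20, 0.05, PENALTY)

# Hypothetical SSL workload: branch frequency is 40% of the non-SSL value,
# with a slightly worse misprediction rate (as in the 4P observation).
ssl = branch_stall_cpi(0.20 * 0.40, 0.06, PENALTY)

print(f"non-SSL stall CPI: {non_ssl:.3f}")
print(f"SSL stall CPI:     {ssl:.3f}")
```

Even with the worse misprediction rate assumed for SSL, its stall contribution is about half the non-SSL value under these numbers, because branches are so much rarer.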
Conclusion
SSL overhead increases the computational cost of transactions by a factor of 5-7.
SSL transactions do not benefit much from a larger L2 cache, but a larger L1 cache would be helpful.
Complex logic for handling control dependencies is not useful for SSL transactions, as the frequency of branches is very low.