In 18th ACM Conference on Computer and Communications Security, Chicago, October 2011
Automated Black-Box Detection of Side-Channel Vulnerabilities in Web Applications
Peter Chapman, University of Virginia
David Evans, University of Virginia, evans@virginia.edu
ABSTRACT
Web applications divide their state between the client and the server. The frequent and highly dynamic client-server communication that is characteristic of modern web applications leaves them vulnerable to side-channel leaks, even over encrypted connections. We describe a black-box tool for detecting and quantifying the severity of side-channel vulnerabilities by analyzing network traffic over repeated crawls of a web application. By viewing the adversary as a multi-dimensional classifier, we develop a methodology to more thoroughly measure the distinguishability of network traffic for a variety of classification metrics. We evaluate our detection system on several deployed web applications, accounting for proposed client-side and server-side defenses. Our results illustrate the limitations of entropy measurements used in previous work and show how our new metric based on the Fisher criterion can be used to more robustly reveal side-channels in web applications.
1. INTRODUCTION
Communication between the client and server in a web application is necessary for meaningful and efficient operation but, without care, can leak substantial information through a variety of side-channels. Previous work has demonstrated the ability to profile transfer size distributions over encrypted connections in order to identify visited websites [6, 8, 30]. Today, such side-channel leaks are especially pervasive and difficult to mitigate due to modern web development techniques that require increased client-server communication. The competitive marketplace encourages a dynamic and responsive browsing experience. Using AJAX and similar technologies, information is brought to the user on demand, limiting unnecessary traffic, decreasing latency, and increasing responsiveness. By design, this approach separates traffic into a series of small requests that are specific to the actions of the user.
Analyzing network activity generated by web applications can reveal a surprising amount of information. Most attacks examine the size distributions of transmitted data, since the most commonly used encryption mechanisms on the web make no effort to conceal the size of the payload. As a result, traffic patterns can be correlated with specific keypresses and mouse clicks. Fundamentally, these attacks leverage correlations between the network traffic and the collective state of the web application.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, is strongly encouraged. CCS 2011, October 17-21, 2011, Chicago, Illinois, USA.
To assist developers who want to create web applications that are responsive but have limited side-channel leaks, we developed a system to automatically detect and quantify the side-channel leaks in a web application. Our system identifies side-channel vulnerabilities by extensively crawling a target application to find network traffic that is predictably associated with changes in the application state. We use a black-box approach for compatibility and accuracy. Driving an actual web browser enables the deployment of our tools on any website, regardless of back-end implementation or complexity. Further, by generating the same traffic as would be seen by an attacker, we ensure that information leaks due to unpredictable elements such as plug-ins or third-party scripts are still detected.
Previous work has used the concept of an attacker's ambiguity set, measured either in reduction power [5, 22, 33] or conditional entropy [21, 33], to measure information leaks. Our results show that entropy-based metrics are very fragile and do not adequately measure the risk of information leaking for complex web applications under realistic conditions. As an alternative metric, we adopt the Fisher criterion to measure the classifiability of web application network traffic and, by extension, information leakage.
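To make the entropy-based measurement concrete, the following Python sketch estimates the conditional entropy of a user action given what an eavesdropper observes. It is an illustration under stated assumptions, not the measurement procedure of the cited work: the observation is reduced to a single total byte count per trace, observations are compared by exact equality, and the function name and sample data are invented.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(samples):
    """Estimate H(action | observation) in bits from (action, observation) pairs."""
    total = len(samples)
    by_obs = defaultdict(Counter)
    for action, obs in samples:
        by_obs[obs][action] += 1
    h = 0.0
    for actions in by_obs.values():
        n_obs = sum(actions.values())
        p_obs = n_obs / total
        # Entropy of the ambiguity set that remains after seeing this observation.
        h_obs = -sum((c / n_obs) * math.log2(c / n_obs) for c in actions.values())
        h += p_obs * h_obs
    return h

# Hypothetical data: (typed letter, total bytes observed).
samples = [("a", 748), ("b", 762), ("c", 748), ("d", 775)]
noisy = [("a", 748), ("b", 762), ("c", 749), ("d", 775)]

print(conditional_entropy(samples))  # 0.5
print(conditional_entropy(noisy))    # 0.0
```

In this toy data, shifting one trace by a single byte moves the estimate from 0.5 bits to 0, which hints at how sensitive an exact-match ambiguity set can be to small variations in traffic.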
Contributions and Overview. We consider two threat models for studying web application side-channel leaks: one where the attacker listens to encrypted wireless traffic and another where an attacker intercepts encrypted network traffic at the level of an Internet service provider (Section 3.2).
We present a black-box web application crawling system that logs network traffic while interacting with a website in the same manner as an actual user using a standard web browser, outputting a set of web application states and user actions with generated network traffic traces, conceptually represented as a finite-state machine (Section 3). We have developed a rich XML specification format to configure the crawler to interact effectively with many websites (Section 4.2).
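One way to picture the crawler's output is as a table of transitions, each labeled with a user action and carrying the traces recorded on repeated crawls. The Python sketch below is only an assumed in-memory representation: the class names, field names, state labels, and byte counts are hypothetical, and it does not reflect the tool's actual data model or its XML specification format.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transfer:
    direction: str  # "up" (client to server) or "down" (server to client)
    size: int       # bytes seen on the wire

@dataclass
class Transition:
    source: str     # state before the user action
    action: str     # user action that triggers the traffic
    target: str     # state after the user action
    traces: list = field(default_factory=list)  # one list of Transfer per crawl

class CrawlResult:
    """Finite-state machine keyed by (source state, action)."""

    def __init__(self):
        self.transitions = {}

    def record(self, source, action, target, trace):
        key = (source, action)
        transition = self.transitions.setdefault(key, Transition(source, action, target))
        transition.traces.append(list(trace))
        return transition

# Two crawls of the same action yield two traces on one edge.
result = CrawlResult()
result.record("empty", "type:a", "q=a", [Transfer("up", 540), Transfer("down", 748)])
result.record("empty", "type:a", "q=a", [Transfer("up", 540), Transfer("down", 750)])
```

Keeping every trace per edge, rather than a single representative, preserves the run-to-run variation that a leak measurement would need to account for.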
Using the finite-state machine output of the web application exploration, we consider information leaks from the perspective of a multi-class classification problem (Section 5). In building our example nearest-centroid classifier, we enumerate three distance metrics that measure the similarity of two network traces. Using the same set of distance metrics, we measure the entropy of user actions in the web application in the same manner as prior work and show that the variation and noise in real web applications make the concept of an uncertainty set insufficient for describing information leaks (Section 5.2). This motivates an alternative measurement based on the Fisher criterion to quantify the classifiability, and therefore the information leakage, of network traces in a web application based on the same set of distance metrics (Section 5.3).
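As a rough illustration of the nearest-centroid idea, the Python sketch below labels an unseen trace with the user action whose average training trace is closest. The three distance metrics referred to above are not reproduced here; this sketch substitutes a deliberately simple, assumed one (Euclidean distance between total bytes sent and total bytes received), and all function names and byte counts are hypothetical.

```python
import math
from collections import defaultdict

def features(trace):
    """Summarize a trace [(direction, size), ...] as (bytes up, bytes down)."""
    up = sum(size for direction, size in trace if direction == "up")
    down = sum(size for direction, size in trace if direction == "down")
    return (up, down)

def train(labeled_traces):
    """Compute one centroid (mean feature vector) per user action."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for label, trace in labeled_traces:
        up, down = features(trace)
        acc = sums[label]
        acc[0] += up
        acc[1] += down
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}

def classify(centroids, trace):
    """Return the action whose centroid is nearest to the observed trace."""
    point = features(trace)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical training traces for two keystrokes.
training = [
    ("a", [("up", 540), ("down", 748)]),
    ("a", [("up", 540), ("down", 752)]),
    ("b", [("up", 541), ("down", 762)]),
    ("b", [("up", 541), ("down", 766)]),
]
centroids = train(training)
guess = classify(centroids, [("up", 540), ("down", 751)])
print(guess)  # a
```

Swapping `features` and `math.dist` for another trace representation and distance function would change what the classifier can distinguish, which is why the choice of distance metric matters to the measured leak.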
We evaluate our crawling system and leak quantification techniques on several complex web applications. Compared to entropy-based metrics, we find the Fisher criterion is more resilient to outliers in trace data and provides a more useful measure of an application's vulnerability to side-channel classification (Section 6).
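For readers unfamiliar with the Fisher criterion, the Python sketch below computes a common textbook form of it for a single scalar feature: the ratio of between-class scatter to within-class scatter, where larger values mean the classes are easier to tell apart. This is an assumed simplification for illustration; it is not claimed to be the exact formulation used in this paper, and the function name and sample byte counts are invented.

```python
from statistics import mean

def fisher_criterion(classes):
    """Between-class scatter divided by within-class scatter.

    classes: dict mapping a label to a list of scalar observations,
    for example the total bytes of each trace recorded for one user action.
    """
    all_values = [x for xs in classes.values() for x in xs]
    overall = mean(all_values)
    between = 0.0
    within = 0.0
    for xs in classes.values():
        mu = mean(xs)
        between += len(xs) * (mu - overall) ** 2
        within += sum((x - mu) ** 2 for x in xs)
    if within == 0:
        return float("inf")  # no variation inside any class: perfectly separable
    return between / within

# Hypothetical traces: tight, distinct clusters versus overlapping ones.
well_separated = {"a": [748, 750, 752], "b": [762, 764, 766]}
overlapping = {"a": [748, 756, 764], "b": [750, 758, 766]}

print(fisher_criterion(well_separated))  # 18.375
print(fisher_criterion(overlapping))     # 0.0234375
```

Because the score is built from means and variances rather than exact matches, a single unusual trace shifts it gradually instead of creating or destroying an ambiguity set outright.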
2. RELATED WORK
The study of side-channel vulnerabilities extends back at least to World War II. Previous work has considered side-channel vulnerabilities in a wide variety of domains, including cryptographic implementations, sound from dot matrix printers, and, of course, web traffic [5, 6, 8, 30]. The most effective side-channel attacks on web applications have examined the size distributions of traffic [6, 8, 30]. The vulnerabilities stem from the individual transfer of objects in a web page, which vary significantly in size and quantity. Furthermore, due to the deployment of stream ciphers on the Internet, the sizes of objects are often visible in encrypted traffic. Tunneled connections [3, 21] and even encrypted wireless connections [3, 5] are vulnerable to these attacks.
The interactive traffic of modern web applications presents attackers with rich opportunities for side-channel attacks. Chen et al. demonstrated how an attacker could identify specific user actions within a web application based on intercepted encrypted traffic. The leaks they found in search engines, a personal health information management system, a tax application, and a banking site required application-specific mitigation techniques for adequate and efficient protection. Search engine suggestions are a suitable example to demonstrate these attacks. As the user types a search query, the client sends the server the typed keys and the server returns a list of suggestions. An attacker can leverage the fact that the size of the suggestions sent after each keystroke varies depending on the typed letter to reconstruct search terms. Figure 1 shows how a single query is divided into a revealing series of network transfers. For a single letter, the attacker only needs to match the traffic to a set of 26 possibilities (letters A through Z). With the next letter, the attacker can use the reduced set of possibilities given by the first letter to drastically reduce the search space.
[Figure 1 diagram: network transfers between Client and Google; the surviving byte counts are 748 and 762.]
Figure 1: Search engines leak queries through the network traffic generated by auto-complete suggestions. The numbers indicate the number of bytes transferred.
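The keystroke-by-keystroke narrowing described for auto-complete suggestions can be sketched in Python as follows. This is a minimal illustration under assumptions: the attacker is presumed to have already built a profile mapping each query prefix to its expected response size, and the function name, the `tolerance` parameter, and every profile entry below are invented for the example.

```python
import string

def infer_queries(profile, observed_sizes, tolerance=0):
    """Return the query prefixes consistent with the observed response sizes.

    profile: dict mapping a prefix to its expected suggestion response size (bytes)
    observed_sizes: one eavesdropped size per keystroke, in typing order
    """
    candidates = [""]
    for size in observed_sizes:
        next_candidates = []
        for prefix in candidates:
            # Only extend prefixes that survived earlier keystrokes.
            for letter in string.ascii_lowercase:
                extended = prefix + letter
                expected = profile.get(extended)
                if expected is not None and abs(expected - size) <= tolerance:
                    next_candidates.append(extended)
        candidates = next_candidates
    return candidates

# Hypothetical profile gathered by the attacker's own queries.
profile = {
    "d": 748, "f": 748, "g": 755,
    "da": 762, "do": 770, "fa": 771, "fo": 765,
}

print(infer_queries(profile, [748]))       # ['d', 'f']
print(infer_queries(profile, [748, 762]))  # ['da']
```

The first observation leaves two candidate letters; the second is matched only against extensions of those two, so the search space shrinks with each keystroke instead of growing by a factor of 26.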
be found through a straightforward static analysis. Sidebuster then quantifies the leaks using rerun testing, which initiates the statically discovered communication-inducing action using a simulated browser called HtmlUnit and records the network traffic with Jpcap. It measures the leaks by calculating the conditional entropy for each tested action. Sidebuster was tested on several demonstration applications and mock websites and shown to discover significant information leaks. We choose a black-box approach, in part, to create a leak discovery tool that is not limited to a particular implementation platform.
Black-box Web Application Exploration and Testing. Black-box exploration of traditional applications is