

Do I Use the Wrong Definition?
DefUse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs *

Yao Shi, Tsinghua University, [email protected]
Soyeon Park, University of California, San Diego, [email protected]
Zuoning Yin, University of Illinois at Urbana-Champaign, [email protected]
Shan Lu, University of Wisconsin-Madison, [email protected]
Yuanyuan Zhou, University of California, San Diego, [email protected]
Wenguang Chen, Tsinghua University, [email protected]
Weimin Zheng, Tsinghua University, [email protected]

Abstract

Software bugs, such as concurrency, memory and semantic bugs, can significantly affect system reliability. Although much effort has been made to address this problem, there are still many bugs that cannot be detected, especially concurrency bugs, due to the complexity of concurrent programs. Effective approaches for detecting these common bugs are therefore highly desired.

This paper presents an invariant-based bug detection tool, DefUse, which can detect not only concurrency bugs (including the previously under-studied order violation bugs), but also memory and semantic bugs. Based on the observation that many bugs appear as violations to programmers' data flow intentions, we introduce three different types of definition-use invariants that commonly exist in both sequential and concurrent programs. We also design an algorithm to automatically extract such invariants from programs, which are then used to detect bugs. Moreover, DefUse uses various techniques to prune false positives and rank error reports.

We evaluated DefUse using sixteen real-world applications with twenty real-world concurrency and sequential bugs. Our results show that DefUse can effectively detect 19 of these bugs, including 2 new bugs that were never reported before, with only a few false positives. Our training sensitivity results show that, with the benefit of the pruning and ranking algorithms, DefUse is accurate even with insufficient training.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification—Reliability, Statistical methods; D.2.5 [Software Engineering]: Testing and Debugging—Diagnostics, Testing tools

General Terms Reliability

Keywords Concurrency Bug, Sequential Bug, Atomicity Violation, Order Violation

* This work is supported by the IBM CAS Canada research fellowship "May-Happen-In-Parallel Analysis for POSIX Thread" and the Intel-MOE joint research award "Program Analysis for Parallel Applications".

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA/SPLASH'10, October 17-21, 2010, Reno/Tahoe, Nevada, USA.
Copyright © 2010 ACM 978-1-4503-0203-6/10/10...$10.00

1. Introduction

1.1 Motivation

As software has grown in size and complexity, the difficulty of finding and fixing bugs has increased. Inevitably, many bugs leak into deployed software, contributing to up to 27% of system failures [22].

Among the different types of bugs, concurrency bugs are the most notorious, due to their non-deterministic nature. Unlike sequential bugs, concurrency bugs depend not only on inputs and execution environments, but also on thread interleaving and other timing-related events that are to be manifested [3, 18, 27, 30, 34, 40]. This makes them hard to expose and detect during in-house testing. Although concurrency bugs may appear less frequently than sequential bugs, they can cause more severe consequences, such as data corruption, system hangs, or even catastrophic disasters (e.g., the Northeastern Electricity Blackout Incident [35]). With the pervasiveness of multi-core machines and concurrent programs, this problem is becoming more and more severe.

Recent research has made significant advances in detecting concurrency bugs, especially data races [10, 23, 34, 39] and atomicity violations 1 [9, 11, 19, 33, 42, 44], but order violation bugs have been neglected. An order violation occurs if a programming assumption on the order of certain events is not guaranteed during the implementation [18] 2. According to a recent real-world concurrency bug characteristic study [18], order violations account for one third of all non-deadlock concurrency bugs.

Figure 1(a) shows an order violation bug in HTTrack. In this example, the programmer incorrectly assumes that S2 in Thread 1 is always executed after S3 in Thread 2 (probably due to some misunderstanding of the thread creation process and scheduling). Unfortunately, this order assumption is not guaranteed by the implementation. In some cases, S2 may be executed before S3, resulting in a null-pointer parameter, which may lead to a null-pointer dereference, crashing the program later.
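To make the pattern concrete, here is a minimal, hypothetical C++ sketch of this kind of order violation; it is not HTTrack's code, and the names (Options, g_opt, child_thread) are invented for illustration.

```cpp
#include <cstdio>
#include <thread>

struct Options { int depth; };
Options* g_opt = nullptr;          // S1: initial definition in the parent thread

void child_thread() {
    g_opt = new Options{5};        // S3: the definition the parent expects to see
}

void parent_use() {
    // S2: the programmer assumes the child has already run, but nothing
    // enforces that order; g_opt may still hold the nullptr from S1.
    std::printf("depth = %d\n", g_opt->depth);   // possible null-pointer dereference
}

int main() {
    std::thread t(child_thread);   // thread creation; no synchronization before the use
    parent_use();                  // may execute before child_thread assigns g_opt
    t.join();
    delete g_opt;
}
```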

Besides concurrency bugs, semantic bugs are another form of bug that is hard to detect. To date, only a few tools have been proposed to detect semantic bugs, due to their versatility and lack of general patterns. Figure 2(a) shows such an example from gzip. This bug is caused by the missing reassignment of the variable ifd before S2. With certain inputs, ifd can incorrectly reuse the value set by S3, causing the program to misbehave. This bug is program-specific and cannot be detected easily.
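As a rough illustration of this pattern (not gzip's actual code; the loop structure and names are assumptions), a C++ sketch might look like:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Sketch of the gzip-style bug pattern: the descriptor variable is assigned
// only on one path, so a later loop iteration can silently reuse the
// definition left behind by the previous one.
void process_files(char** names, int count) {
    int ifd = STDIN_FILENO;                  // intended default definition
    for (int i = 0; i < count; ++i) {
        if (names[i][0] != '-')
            ifd = open(names[i], O_RDONLY);  // S3-style per-file definition
        // BUG: for a "-" argument, ifd should be reset to STDIN_FILENO here
        // (the missing reassignment); instead the read below may use the
        // descriptor that the previous iteration opened.
        char buf[4096];
        read(ifd, buf, sizeof buf);          // S2-style use of a possibly stale definition
    }
}
```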

Finally, memory corruption bugs such as buffer overflows and dangling pointers are also important, since they can be exploited by malicious users. Many tools have been built for detecting such bugs, but each targets only a special subtype. Figures 2(c) and 2(d) show two real-world memory corruption bugs.

1.2 Commonality among Bug Types

Interestingly, regardless of the difference between these bugs' root causes, many of them share a common characteristic: when triggered, they are usually followed by an incorrect data flow, i.e., a read instruction uses the value from an unexpected definition 3. We refer to such a definition as an incorrect definition. Figures 1 and 2 use eight real-world bugs to demonstrate this commonality. Although these bugs have different root causes, when they are manifested, they invariably result in a read (highlighted in each figure) that uses a value from an incorrect definition.

1 An atomicity violation bug is a software error occurring when a set of operations/accesses that is supposed to be atomic is not protected with appropriate synchronizations.
2 Even though order violations can be fixed in ways similar to atomicity violation bugs (i.e., by adding locks), their root causes are different. Order is not essential in an atomicity violation, as long as there is no intervening remote access breaking the assumed atomicity.

For example, in the HTTrack example (Figure 1(a)), during correct execution, S2 always uses the definition from S3. However, in a buggy run, S2 uses the definition from S1, resulting in a null-pointer dereference later. Similarly, in the atomicity violation bug example from Apache (Figure 1(b)), during correct execution, S2 always uses the definition from S1. However, in a buggy run, S3 is interleaved between S1 and S2. Thus S2 uses the definition from S3, which causes both threads to call cleanup_cache_obj(). This leads to a null-pointer dereference inside this function. The other two bugs, in Figure 1(c) and (d), are similar.
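The read-after-write atomicity pattern behind this example can be sketched as follows; this is a hypothetical simplification in C++, not Apache's code, and refcount/release_object are illustrative names.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Each thread intends "decrement refcount, then read the value I produced"
// to be atomic, but the other thread's decrement can slip in between.
std::atomic<int> refcount{2};

void release_object() {
    refcount.fetch_sub(1);          // S1: local definition (atomic decrease)
    // ... S3: the remote thread's fetch_sub can interleave here ...
    if (refcount.load() == 0) {     // S2: use; may read the remote definition
        std::printf("cleanup\n");   // both threads can take this branch: double cleanup
    }
}

int main() {
    std::thread t1(release_object), t2(release_object);
    t1.join();
    t2.join();
}
```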

Some sequential bugs also share a similar characteristic. For example, in the semantic bug example shown in Figure 2(a), in correct runs, S2 always uses the definition from S1. However, when the bug manifests, S2 can read the definition from S3 from the last loop iteration. The other three bugs shown in Figure 2, including one semantic bug and two memory bugs, are similar.

The above commonality indicates that, if we can detect such incorrect definition-use data flow, it is possible to catch these bugs, regardless of their different root causes.

In order to detect incorrect definition-use flow, we first need to determine which definitions a correct instruction should use. Unfortunately, it is not easy to get such information. Firstly, it is tedious to ask programmers to provide it (if they were consciously aware of such information, they would not have introduced such bugs in the first place). Secondly, it is also hard to extract such information by static analysis. Though an advanced static analysis could reveal all possible definitions that an instruction could use, it cannot differentiate the correct definitions from incorrect ones. For example, in Figure 2(a), static analysis tools would report that the definitions from both S1 and S3 can reach S2, but they cannot identify S3 as an incorrect definition for S2.

In this paper, we use a different but complementary technique to address this problem for both concurrent and sequential programs written in unsafe languages.

1.3 Our Contributions

The technique proposed in this paper is a new class of program invariants called definition-use invariants, which can capture the inherent relationships between definitions and uses for concurrent and sequential programs.

3 When a variable, v, is on the left-hand side of an assignment statement, this statement is a definition of v. If variable v is on the right-hand side of a statement, v has a use at this statement. A reaching definition for a given instruction is another instruction, the target variable of which reaches the given instruction without an intervening assignment [1]. For simplicity, we refer to a use's reaching definition as its definition.


[Figure 1. Real world concurrency bug examples; definition-use (DefUse) invariants and violations are shown in these examples. Panels: (a) order violation in HTTrack; (b) read-after-write atomicity violation in Apache; (c) read-after-read atomicity violation in Apache (buf_index); (d) order violation in Transmission (DSet violation).]

A definition-use invariant reflects the programmers' assumptions about which definition(s) a read should use. Specifically, we have observed the following three types of definition-use invariants. We combine all three types into DefUse invariants.

• Local/Remote (LR) invariants. A read only uses definitions from either the local thread (e.g., Figure 1(b)) or a remote thread (e.g., Figure 1(a)). LR invariants are useful in detecting order violations and read-after-write atomicity violations in multi-threaded programs.

• Follower invariants. When there are two consecutive reads of the same variable from the same thread, the second always uses the same definition as the first. For example, in Figure 1(c), S2 should use the same definition of buf_index as S1. Follower invariants are useful for detecting read-after-read atomicity violations and certain memory bugs.

• Definition Set (DSet) invariants. A read should always use definition(s) from a certain set of writes. For example, in Figure 2(a), S1 is the only correct definition for S2 to use. DSet invariants can be useful for detecting both concurrency and sequential bugs, but may introduce more false positives.

We have designed a tool, called DefUse, that automatically extracts definition-use invariants from programs written in C/C++ and uses them to detect software bugs of various types. To the best of our knowledge, this is one of the first techniques that can be used to detect a large variety of software bugs, including concurrency bugs (both order violations and atomicity violations) as well as sequential bugs.

Similar to many previous invariant-based bug detection studies [8, 13, 19, 37, 44, 45], DefUse leverages in-house testing as training runs to extract the definition-use invariants, which are then used to detect concurrency and sequential bugs. To tolerate insufficient training, DefUse automatically prunes out unlikely invariants and violations, and ranks the remaining violations based on confidence. Our invariant extraction algorithm also takes possible training noise (i.e., incorrectly labeled training runs) into consideration.

[Figure 2. Real world sequential bug examples; definition-use (DefUse) invariants and violations are shown in these examples. Panels: (a) a DSet invariant in gzip (semantic bug), (b) a DSet invariant in seq (semantic bug), (c) a DSet invariant in a GNU coreutils tool (buffer overflow), (d) a DSet invariant in a GNU coreutils tool (dangling pointer).]

We envision our tool being used in two scenarios: (1) General bug detection during testing: once the invariants are extracted, DefUse can be used to monitor testing runs for possible violations. (2) Post-mortem diagnosis: when a programmer tries to diagnose a bug, he can use DefUse to see whether any DefUse invariant is violated, and which "incorrect" definition-use data flow is detected. Such information provides useful clues for narrowing down the root cause. For such usage, we assume deterministic replay support, such as the Flight Data Recorder [41] or others [27], to reproduce the failure occurrence.

We have evaluated DefUse with twenty representative real-world concurrency and sequential bugs 4, including order violations, atomicity violations, memory corruption, and semantic bugs, from sixteen server and desktop applications such as Apache, MySQL, Mozilla, HTTrack, GNU Linux coreutils, and so on.

4 Like other dynamic tools such as AVIO [19] or Purify [15], our bug detection tool requires the bug to be manifested during execution (either a monitored run for post-mortem diagnosis or a production run). Hence, evaluation for a dynamic tool is usually done with mostly known bugs—whether it can detect these bugs or not. This does not mean that it can only detect known bugs. When such a tool (e.g., the commonly-used Purify [15] or our tool) is used in the field, it can also help programmers detect unknown bugs. Furthermore, during our evaluation process, we discovered two new bugs that were never reported before.


DefUse successfully detects 19 of them, including 2 new bugs that were never reported before. Furthermore, DefUse reports only a few (0-3) false positives for our evaluated applications, benefiting from DefUse's pruning algorithm, which can reduce the number of false positives by up to 87%. Our training sensitivity results show that DefUse is reasonably tolerant of insufficient training and achieves acceptable false positive rates after only a few (around 20) training runs.

2. Definition-Use Invariants

2.1 Overview

Classification of Program Invariants. Invariants are program properties that are preserved during correct executions. They reflect programmer intentions. Invariants are traditionally used for compiler optimization (e.g., loop invariants). Recent works automatically infer likely invariants of certain types from training runs and use them to detect software bugs. Previously proposed invariants include value-range invariants [8, 13], access-set invariants [45], access interleaving invariants [19] and others [6, 17].

Since few previous studies have classified program invariants, we summarize them into three categories below:

• Characteristic-based Invariants: Sometimes, a program element's state may change, but all states share a common characteristic. Examples include "variable x's value should be less than 5", "function f's return value should be positive", "basic block b should be atomic", and so on.

• Relation-based Invariants: Some invariants are about the relationship among multiple program elements. An example is the multi-variable invariant discussed in MUVI [17]. It captures the relations between multiple variables that are always accessed together.

• Value-based Invariants: In some cases, certain program elements always have a fixed set of values. Here, we use value in a broad sense: the value can be a variable value, an access set, and so on. We call such invariants value-based. The access-set invariant discovered in AccMon [45] is such an example. It captures the set of instructions (program counters) that access a target memory location.

Definition-Use Invariants. As discussed in Section 1.2, definition-use invariants widely exist in both sequential and concurrent programs, but have not been studied before. They reflect programmers' intentions on data flows and are closely related to many software bugs, such as those shown in Figure 1 and Figure 2.

From many real-world programs, we have observed that common data flow intentions can be captured by the following three types of definition-use invariants, each belonging to one of the categories discussed above.

• Local/Remote (LR) is a type of characteristic-based invariant for concurrent programs. An LR invariant reflects programmers' intention on an important property of a read instruction: should it use only definitions from a local (respectively remote) thread? It is useful for detecting order violation bugs, and read-after-write atomicity violation bugs in concurrent programs.

• Follower is a type of relation-based invariant that is especially useful for concurrent programs. It reflects the relation between two consecutive reads to the same memory location from the same thread: should the second read use the same definition as the first one? It captures the atomicity assumption between two consecutive reads, and can be used to detect read-after-read atomicity violation bugs.

• Definition Set (DSet) is a value-based invariant and captures the set of definitions (i.e., write instructions) that a read instruction should use. Unlike LR and Follower, DSet is applicable to both concurrent and sequential programs. It is useful for detecting concurrency bugs (atomicity violations, especially read-after-write atomicity violations, and order violations), as well as memory bugs (dangling pointers, buffer overflows, and so on) and certain semantic bugs introduced by incorrect data flow.

The above types of definition-use invariants capture different data flow intentions and complement each other well. When used for detecting bugs, each has different trade-offs between detection power and false positive rate. Therefore, we combine these three types into a single definition-use framework, called DefUse.

In the rest of this section, we first discuss each one with real-world examples, followed by the rationale for combining all three of them in DefUse.

2.2 Local/Remote (LR) Invariants

In concurrent programs, certain reads may use only local definitions (maybe due to atomicity) or only remote definitions (maybe for the purpose of communication or synchronization). LR invariants can capture these properties.

Formally speaking, an LR invariant, LR(r), at a read r equals "LOCAL" (respectively "REMOTE") if r uses only definitions from the local (respectively remote) thread. We denote these as LR-Local and LR-Remote, respectively. If r can read from either a local or a remote definition, r has no LR invariant. Figure 3(a) illustrates the basic idea of LR invariants.

Figure 1(a) and (b) demonstrate two different real-world examples of LR invariants, from HTTrack and Apache, and their corresponding violations. As explained in the introduction, in the example shown in Figure 1(a), the global opt pointer at S2 in thread 1 should always use a definition from a remote thread, i.e., a child thread of thread 1, not from the local thread. Otherwise, it can lead to a null-pointer dereference. In other words, it has an LR-Remote invariant, but unfortunately this invariant is not guaranteed by the implementation (probably due to the programmer's wrong assumption about the execution order after thread creation).

[Figure 3. Examples of Definition-Use invariants and their violations: (a) Local/Remote (LR) — R1 should read a value defined by a local writer W1, or a value defined by a remote writer W1; (b) Follower — R2 (a follower) should read the same value as what R1 reads; (c) Definition Set (DSet) — R1 should read the value defined by a write in its definition set (= {W1}).]

Conversely, in the Apache example shown in Figure 1(b), S2 always needs to read the refcount value decremented by the local thread, i.e., S2 holds an LR-Local invariant. However, due to the incorrect implementation, the above assumption is not guaranteed: S1 and S2 can be interleaved by S3, which causes S2 to read the definition from the remote write S3. This violates S2's LR-Local invariant.

2.3 Follower Invariants

In concurrent programs, programmers frequently make assumptions about the atomicity of certain code regions. LR invariants already capture the "read after write" data-flow relation in an assumed atomic region, but not the "read after read" case, which can be captured by a Follower invariant.

Specifically, for two consecutive reads, r1 and r2, of the same location from the same thread, if r2 always uses the same definition as r1, we say r2 has a Follower invariant. Follower is different from LR because, as long as r2 uses the same definition as r1, the definition can come from either a local or a remote thread. Figure 3(b) illustrates Follower invariants.

Figure 1(c) shows a real-world example of a Follower invariant from Apache. As long as S2 reads the same definition as S1, wherever buf_index is updated, the execution is correct, i.e., S2 holds a Follower invariant. However, when this assumption is not guaranteed, the sequence of S1 and S2 may be interrupted by the interleaved remote access S3, making S2 read a different value from S1. Therefore, S2's Follower invariant is violated. This is an atomicity violation bug.

2.4 Definition Set (DSet) Invariants

While concurrent programs have special inter-thread data flows, definition-use is not specific to concurrent programs. Our third type of invariant, Definition Set (DSet), is suitable for both sequential and concurrent programs. A DSet invariant at a read is defined as the set of all writes whose definitions this read may use. Figure 3(c) shows a DSet invariant at R1. Every read instruction has such a DSet.

When a read consumes a value defined by an instruction outside its DSet, the DSet invariant is violated, indicating a likely bug.

Figure 1(d) shows a real-world example of a DSet invariant in a concurrent program, Transmission. S3 is supposed to use h->bandwidth's value defined only by S2. If the thread creation and initialization in S2 take longer than the programmer assumed, S3 may use the definition from S1, violating the DSet invariant at S3. Note that there is no Follower invariant here. Even though there is an LR-Remote invariant, the bug's manifestation does not violate LR-Remote at S3, because the wrong definition S1 also comes from a remote thread (thread 1). Therefore, LR and Follower invariants are not effective in detecting this bug, while DSet is.

Apart from concurrent programs, DSet is also applicable to sequential programs, as shown in Figure 2. The DSet invariant at S2 in Figure 2(a) captures the set of valid definitions for S2 to use: {S1}. When the semantic bug is triggered, S2 uses a definition from S3, violating S2's DSet invariant. Hence, this bug can be detected. The other three sequential bug examples are similar.

2.5 DefUse: Combining All Three Invariants

The above three types of invariants capture different aspects of definition-use properties. Although DSet invariants appear more general than LR and Follower, they do not necessarily subsume LR and Follower, since they do not capture inter-thread information. In a definition set, we do not express whether a definition is local or remote. As a result, in some concurrency bug cases, DSet may not be violated, but LR or Follower invariants are, as shown in the examples below.

Figure 1(b) shows such an example from Apache Httpd. The refcount definition used at S2 is always from atomic_decrease() at S1. However, when the bug is triggered, S2 can use a definition from a remote thread also executing atomic_decrease(). Therefore, the DSet invariant at S2 is not violated during such an abnormal execution. However, the LR invariant is violated.

Similarly, DSet invariants do not completely cover Follower invariants either. For example, in Figure 1(c), there are two definition-use chains, S3-S1 and S3-S2, which can be recorded in two DSets. However, the dependence relation between the two reads' definitions is not captured by DSet. In an incorrect run, if another instance of S3 is interleaved between S1 and S2, no DSet invariant is violated, although the Follower invariant at S2 is.

[Figure 4. Overview of the DefUse System. In the extraction phase, the Extractor processes runtime information and stores instruction counters and DSet, LR, and Follower invariants in the Invariant Database; in the detection phase, the Detector checks runtime information against the database and the Pruning/Ranking Engine produces the violation report.]

In addition, since LR and Follower invariants capture general properties instead of concrete values, they produce fewer false positives in bug detection. On the other hand, their generality may also cause them to miss some actual bugs. In particular, LR and Follower invariants are not suitable for capturing sequential data flow, whereas DSet is. Therefore, DSet can be used for detecting memory and semantic bugs.

Due to the above reasons, our work combines all three invariants into a single framework, called DefUse.

3. Invariant Extraction

DefUse has two phases: (1) an extraction phase for inferring definition-use invariants; and (2) a detection phase for detecting invariant violations and reporting potential bugs after pruning and ranking. DefUse uses four components for these two phases: the Extractor, the Detector, the Invariant Database, and the Pruning/Ranking Engine, which are shown in Figure 4.

This section discusses how to automatically infer a program's definition-use invariants. It first describes the general idea and then provides the extraction algorithms for LR, Follower and DSet invariants. The next section discusses violation detection, pruning and ranking.

3.1 The Main Idea

Definition-use invariants are hard to extract using static code analysis, especially from concurrent programs. It is also too tedious to require programmers to write down all invariants. In addition, programmers may not be consciously aware of many such invariants; otherwise, they probably would have implemented the code correctly without introducing such bugs.

Similar to the previous works on extracting program invariants, such as DAIKON [8], DIDUCE [13], AVIO [19], AccMon [45], and so on, we also rely on correct runs from in-house testing to statistically infer definition-use invariants. Although it is possible to combine this with static analysis, it is difficult to use such information, especially for concurrent programs written in C/C++, for which it is hard to statically determine which code can be executed in parallel. Similar to the previous works, we also assume that the target program is reasonably mature and thereby feasible for extracting invariants statistically.

Symbol              Description
I                   a static instruction
I_U                 a read instruction (i.e., a use)
I_D                 a write instruction (i.e., a definition)
i                   the latest dynamic instance of I
i_u                 the latest dynamic instance of I_U
i_d                 the latest dynamic instance of I_D
T(i_d) or T(i_u)    i_d's or i_u's thread identifier

Table 1. Symbols used in this paper. All instruction symbols are for the memory address m by default.

As pointed out by previous works [19, 44], concurrent programs pose a unique challenge for generating different training runs. Different runs, even with the same input, can have different interleavings. A systematic concurrency testing framework such as CHESS [25, 26] or CTrigger [30] can be used to systematically explore different interleavings for each input.

Insufficient training problem and training sensitivity. It is conceivable that training runs may not cover all execution paths. Therefore, the extracted invariants may suffer from the problem of insufficient training. To address this, both our invariant extraction and bug detection algorithms use various techniques to filter out unlikely invariants or violations, and to devalue unconfident invariants and bug reports (see Section 4.2). Moreover, the extracted invariants can be refined after each detection run based on the programmer's feedback on bug reports. Finally, in our experiments, we studied training sensitivity by testing with inputs that are very different from those used in training (see Section 5.5).

3.2 Extraction Algorithms

In this section, we describe in detail the algorithms for extracting definition-use invariants. Due to page limitations, we present only the algorithms for concurrent programs; the extraction of DSet invariants works for sequential programs as well. For simplicity, we use the symbols explained in Table 1.

DSet invariant extraction. Our DefUse extraction algorithm obtains DSet by collecting all definitions that are used by I_U during training runs. In our implementation, DefUse uses the Pin [21] binary instrumentation framework to instrument accesses to every monitored memory location. For each memory location, DefUse stores its most recent write instruction in a global hash-table, called the Definition-Table. At a read i_u, DefUse gets its definition i_d from the Definition-Table and uses i_d's corresponding static instruction I_D to update I_U's DSet:

DSet(I_U) ← DSet(I_U) ∪ {I_D}    (1)

After the training phase, every instruction's DSet is stored in the invariant database, along with some statistical information, such as the number of times I_U and I_D are executed, for the purpose of pruning and ranking.
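As a rough sketch of this bookkeeping (our own C++ simplification, not the actual Pin-based implementation; the callback and table names are illustrative):

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using Addr   = std::uintptr_t;   // monitored memory address (byte granularity)
using InstPC = std::uintptr_t;   // static instruction, identified by program counter

// Definition-Table: for each address, the static instruction of the most recent write.
std::unordered_map<Addr, InstPC> def_table;

// DSet(I_U): for each read instruction, the set of definitions it used during training.
std::unordered_map<InstPC, std::unordered_set<InstPC>> dset;

// Invoked by the instrumentation on every monitored write.
void on_write(InstPC I_D, Addr m) {
    def_table[m] = I_D;
}

// Invoked on every monitored read: record the current definition in the
// reader's DSet, as in Equation (1).
void on_read(InstPC I_U, Addr m) {
    auto it = def_table.find(m);
    if (it != def_table.end())
        dset[I_U].insert(it->second);
    // Otherwise no preceding definition was seen; a virtual external
    // definition could be recorded instead (Section 3.3).
}
```

In the real tool these callbacks would be registered through Pin's instrumentation interface and would also update the per-instruction execution counters used later for pruning and ranking.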

LR invariant extraction. In order to infer LR invariants, DefUse first needs to know which thread provides the definition for a read. This can easily be obtained through the Definition-Table. Specifically, when i_u is executed, DefUse finds its definition i_d in the Definition-Table. DefUse then compares T(i_u) and T(i_d) to determine whether they are from the same thread. Finally, it compares the answer with the LR(I_U) associated with I_U. If they are different, LR(I_U) is set to NO_INV to indicate that there is no LR invariant at I_U, and this read is no longer monitored for LR invariant extraction. LR(I_U) is initialized based on the definition source (either REMOTE or LOCAL) the first time I_U is executed. This process can be formalized as follows:

LR(I_U) ←  LOCAL,   if LR(I_U) = LOCAL  ∧ T(i_d) = T(i_u)
           REMOTE,  if LR(I_U) = REMOTE ∧ T(i_d) ≠ T(i_u)
           NO_INV,  otherwise                                  (2)
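A minimal C++ sketch of this per-read state machine (our simplification; the enum values and the thread-id plumbing are our own assumptions):

```cpp
#include <cstdint>
#include <unordered_map>

using Addr   = std::uintptr_t;
using InstPC = std::uintptr_t;
using Tid    = int;

enum class LRState { UNSEEN, LOCAL, REMOTE, NO_INV };

struct Def { InstPC pc; Tid tid; };              // current definition and its thread
std::unordered_map<Addr, Def>       def_table;   // Definition-Table with thread ids
std::unordered_map<InstPC, LRState> lr;          // LR(I_U)

void on_write(InstPC I_D, Addr m, Tid t) { def_table[m] = {I_D, t}; }

// Update LR(I_U) as in Equation (2): initialize from the first observed
// definition source, then demote to NO_INV on any disagreement.
void on_read(InstPC I_U, Addr m, Tid t) {
    auto it = def_table.find(m);
    if (it == def_table.end()) return;           // external definition; not handled here
    bool local = (it->second.tid == t);
    LRState& s = lr.try_emplace(I_U, LRState::UNSEEN).first->second;
    switch (s) {
        case LRState::UNSEEN:  s = local ? LRState::LOCAL : LRState::REMOTE; break;
        case LRState::LOCAL:   if (!local) s = LRState::NO_INV; break;
        case LRState::REMOTE:  if (local)  s = LRState::NO_INV; break;
        case LRState::NO_INV:  break;            // this read is no longer monitored
    }
}
```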

Follower invariant extraction. In order to infer Follower invariants, DefUse needs to store the recent access history to determine whether an instruction and its predecessor use the same definition. To achieve this, DefUse maintains a bit-vector for every memory location m, called has_read(m). A bit in the vector, has_read(m, t), indicates whether the current definition of memory location m has already been used by thread t. By checking this bit-vector before every read instruction i_u, DefUse can easily determine whether i_u and its predecessor use the same definition.

Specifically, the bit-vector has_read(m) is initialized to zero. Every write to m sets all bits of has_read(m) to zero. After any read from m in thread t, has_read(m, t) is set to one. To know whether an instruction i_u uses the same definition as its predecessor, DefUse checks, before executing i_u, whether the corresponding bit (i.e., has_read(m, t)) is one. If it is, there has been no new definition since thread T(i_u)'s last use of m; in other words, i_u shares the same definition with its predecessor.

To maintain and update the Follower invariant information for I_U, DefUse associates Follower(I_U) with it. This flag is set to TRUE if I_U's dynamic instances always share a definition with their predecessors. Whenever a dynamic instance of I_U uses a different definition from its predecessor, the flag is set to FALSE and I_U is no longer monitored for Follower invariant extraction.

Follower(I_U) ← Follower(I_U) ∧ has_read(m, T(i_u))    (3)
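A sketch of the has_read bookkeeping (simplified; a fixed upper bound on thread ids is assumed so a bitset can serve as the per-location bit-vector):

```cpp
#include <bitset>
#include <cstdint>
#include <unordered_map>

constexpr int kMaxThreads = 64;                  // assumed bound on thread ids
using Addr   = std::uintptr_t;
using InstPC = std::uintptr_t;

std::unordered_map<Addr, std::bitset<kMaxThreads>> has_read;  // has_read(m)
std::unordered_map<InstPC, bool> follower;                    // Follower(I_U)

void on_write(Addr m) {
    has_read[m].reset();             // new definition: no thread has read it yet
}

// Equation (3): I_U keeps its Follower invariant only while every dynamic
// instance finds the current definition already read by its own thread,
// i.e., it always shares its predecessor's definition.
void on_read(InstPC I_U, Addr m, int tid) {
    std::bitset<kMaxThreads>& bits = has_read[m];
    auto it = follower.try_emplace(I_U, true).first; // optimistically TRUE at first sight
    it->second = it->second && bits.test(tid);
    bits.set(tid);                   // this thread has now used the current definition
}
```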

3.3 Design and Implementation Issues

Our implementation is based on Pin [21], a dynamic binary instrumentation framework. In the following paragraphs, we discuss several detailed implementation issues.

Monitored memory locations. The invariant extraction algorithms can be applied to any memory unit (byte, word, and so on) and any memory region (stack, heap, global data region, and so on). We use byte granularity in our implementation for the greatest accuracy. Currently, our DefUse prototype monitors the heap and the global data region. The stack is ignored because variables stored on the stack are seldom involved in concurrency bugs [18]. Nevertheless, some sequential bugs may not be detected because of this choice. We will extend DefUse in the future to cover these stack-related sequential bugs.

External definitions. Programs may use external libraries and system calls. We term the definitions from such external code external definitions. The absence of these external definitions could result in some definition-use invariants not being detected.

We use two approaches to address this problem. Firstly, we annotate some system calls and widely used library functions that can create definitions, such as memset and memcpy. We regard these functions as black boxes and their call-sites as definitions.

Of course, it is impractical to annotate all external functions and system calls. Therefore, our second approach is to introduce ξ = (I_ξ, T_ξ) to represent unannotated external definitions. When DefUse encounters a use and does not find any preceding definition to the same memory location, the symbol I_ξ is introduced to represent the virtual external definition.

Note that since we instrument the code at the binary level, we can always detect every user-level definition and use, regardless of whether or not it comes from a library. However, annotating a library function may improve context-sensitivity compared to directly processing the definitions within it.

Virtual address recycling. We also need to consider virtual address recycling caused by memory allocation and de-allocation. In other words, some instructions may access the same memory location simply due to memory recycling rather than because of some inherent data flow relation. To address this problem, we intercept every de-allocation function and delete all previous information for accesses to the deallocated memory region.
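A minimal sketch of such a deallocation hook (our simplification; only one per-address table is shown, but the has_read bit-vectors would be purged the same way):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Per-byte bookkeeping keyed by address, e.g., the Definition-Table.
std::unordered_map<std::uintptr_t, std::uintptr_t> def_table;

// Called when an intercepted deallocation releases [base, base + size):
// erase all history for those bytes so that a future allocation recycling
// the same addresses does not inherit unrelated definition-use relations.
void on_deallocate(std::uintptr_t base, std::size_t size) {
    for (std::size_t off = 0; off < size; ++off)
        def_table.erase(base + off);
}
```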

Context-sensitivity. Small un-inlined functions may have many call-sites. As a result, a use in such a function may have many definitions from different call-sites, which can lead to inaccurate invariants. This issue can be addressed in two ways. The first is to inline the small function and treat each call-site's use as a different read; in other words, we can make the analysis context-sensitive. A simpler approach is to just prune such uses from the invariant database, since they have too many definitions.

Training noise. Although existing in-house testing oracles are relatively accurate in separating incorrect runs from correct ones in most cases, we also face the problem of training noise (i.e., a buggy run incorrectly labeled as correct), as experienced by all previous works on invariant-based bug detection [8, 13, 19, 45]. To handle this problem, our extraction algorithm can be extended by relaxing the 100% support constraint. For example, if an invariant is supported by 95% of the training runs, it is not pruned and is still used to detect potential violations during monitored runs. Intuitively, violations of such invariants should be ranked lower than violations of invariants with 100% support from training runs.
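A sketch of this relaxed support check (the counters, the 95% threshold, and the function name are illustrative assumptions, not the paper's exact mechanism):

```cpp
#include <cstdint>
#include <unordered_map>

struct InvariantStats {
    unsigned supporting_runs = 0;   // training runs in which the invariant held
    unsigned exercised_runs  = 0;   // training runs in which the read was exercised
};

std::unordered_map<std::uintptr_t, InvariantStats> stats;   // keyed by read PC

// Keep an invariant for detection if it is supported by at least min_support
// of the training runs (e.g., 0.95 instead of 1.0 to tolerate mislabeled runs).
// The support ratio can also be used later to rank its violations lower.
bool keep_for_detection(std::uintptr_t read_pc, double min_support = 0.95) {
    const InvariantStats& s = stats[read_pc];
    return s.exercised_runs > 0 &&
           static_cast<double>(s.supporting_runs) / s.exercised_runs >= min_support;
}
```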

4. Bug Detection

4.1 Detection Algorithms

DefUse detects bugs by checking for violations of the extracted DSet, LR, and Follower invariants. At every read instruction, if the monitored properties do not match the associated invariants, DefUse reports the violation to its Pruning/Ranking Engine for further analysis.

The detection algorithms are similar to the extraction algorithms discussed in Section 3. Below, we briefly describe the basic detection algorithm for each type of definition-use invariant violation; we discuss how to prune and rank bug reports in Section 4.2.

To detect DSet invariant violations, DefUse maintains a Definition-Table at run time so that it knows which instruction provides the definition for each use. If the definition for a use is not in that use's DSet, a violation is reported. Formally speaking, the violation condition is:

I_D ∉ DSet(I_U)    (4)

Detecting violations of LR invariants is also straightforward. At a monitored read, DefUse first checks whether the read has an LR invariant. If it does, DefUse examines whether the read and its definition come from the same thread and matches this observation against the type of LR invariant (LOCAL or REMOTE) extracted at this read. If there is a violation, it is reported to the Pruning/Ranking Engine:

{LR(I_U) = LOCAL ∧ T(i_d) ≠ T(i_u)} ∨ {LR(I_U) = REMOTE ∧ T(i_d) = T(i_u)}    (5)

The basic idea of detecting violations of Follower invariants is to check whether an instruction with a Follower invariant shares the same definition with its predecessor (by leveraging the has_read(m) vector, similar to that used in extraction). If not, a violation is reported for pruning and ranking:

Follower(I_U) ∧ ¬has_read(m, T(i_u))    (6)
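Putting the three checks together, a detection-time sketch (our simplification, reusing the same style of bookkeeping as in Section 3.2; not the actual Pin tool) could look like this:

```cpp
#include <bitset>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <unordered_set>

constexpr int kMaxThreads = 64;
using Addr = std::uintptr_t;
using InstPC = std::uintptr_t;

enum class LRState { LOCAL, REMOTE, NO_INV };
struct Def { InstPC pc; int tid; };

struct Invariants {                                   // loaded from the invariant database
    std::unordered_map<InstPC, std::unordered_set<InstPC>> dset;
    std::unordered_map<InstPC, LRState>                    lr;
    std::unordered_map<InstPC, bool>                       follower;
};

struct Violation { InstPC use; InstPC def; const char* kind; };

// Runtime state, maintained the same way as during extraction.
std::unordered_map<Addr, Def> def_table;
std::unordered_map<Addr, std::bitset<kMaxThreads>> has_read;

void on_write(InstPC I_D, Addr m, int tid) {
    def_table[m] = {I_D, tid};
    has_read[m].reset();
}

// Check a monitored read against Equations (4), (5) and (6); any violation
// would be handed to the Pruning/Ranking Engine.
std::optional<Violation> check_read(const Invariants& inv, InstPC I_U, Addr m, int tid) {
    auto d = def_table.find(m);
    if (d == def_table.end()) return std::nullopt;    // external definition: skip
    const Def& def = d->second;
    std::optional<Violation> v;

    if (auto s = inv.dset.find(I_U); s != inv.dset.end() && !s->second.count(def.pc))
        v = Violation{I_U, def.pc, "DSet"};           // (4): I_D not in DSet(I_U)

    if (auto s = inv.lr.find(I_U); s != inv.lr.end()) {
        bool local = (def.tid == tid);
        if ((s->second == LRState::LOCAL && !local) ||
            (s->second == LRState::REMOTE && local))
            v = Violation{I_U, def.pc, "LR"};         // (5)
    }

    if (auto s = inv.follower.find(I_U); s != inv.follower.end() && s->second)
        if (!has_read[m].test(tid))
            v = Violation{I_U, def.pc, "Follower"};   // (6): new definition since last read

    has_read[m].set(tid);                             // keep runtime state up to date
    return v;
}
```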

4.2 Pruning and Ranking

For training-based approaches, it is usually hard to know whether the training is sufficient. Without ranking and pruning, bug detection using such approaches may generate a significant number of false positives due to insufficient training.

Pruning. DefUse automatically prunes the following cases from bug reports (a sketch of these filters follows the list):

• Barely exercised uses: For reads that are never covered during training, we do not report any violations, since we do not extract any invariants associated with them. It is also useful to extend this strategy to uses that we have seen only once or twice during training runs and therefore have little confidence in the extracted invariants.

• Barely exercised definitions: Similarly, for definitions that are never exercised during training runs, if we encounter them during detection, we are not confident whether they are violations. In this case, we also prune them from the bug reports.

• Popular uses: Some uses, such as those in a small function called from multiple call-sites, are very popular and have a large definition set. In this case, we also prune the use, as it might be perfectly acceptable to have yet another definition for this use during detection runs.
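The following sketch shows how such filters could be applied to the collected statistics; the thresholds and counter names are illustrative, not the paper's exact values:

```cpp
#include <cstdint>
#include <unordered_map>

struct UseStats {
    unsigned train_instances = 0;   // dynamic instances of the use seen during training
    unsigned dset_size       = 0;   // |DSet(I_U)|
};

std::unordered_map<std::uintptr_t, UseStats>  use_stats;        // keyed by read PC
std::unordered_map<std::uintptr_t, unsigned>  def_train_count;  // keyed by write PC

// Decide whether a detected violation (use_pc reading a value defined at
// def_pc) survives the three pruning rules. Thresholds are illustrative.
bool keep_report(std::uintptr_t use_pc, std::uintptr_t def_pc,
                 unsigned min_train_instances = 3, unsigned max_dset_size = 16) {
    const UseStats& u = use_stats[use_pc];
    if (u.train_instances < min_train_instances) return false;  // barely exercised use
    if (def_train_count[def_pc] == 0)            return false;  // barely exercised definition
    if (u.dset_size > max_dset_size)             return false;  // popular use (huge DSet)
    return true;
}
```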

Ranking. After pruning the above cases, DefUse ranks every unpruned violation based on its confidence. In this way, users can start with the top-ranked bug reports and gradually move down the list. In addition, DefUse also provides an optional pruning phase, called confidence-based pruning, which prunes violations whose confidence is lower than a certain threshold. By default, we do not use this option in our experiments unless explicitly mentioned.

Intuitively, the following conditions increase the confidence of a true bug:

• many dynamic instances of the definition (#I_D) and the use (#I_U) during training;

• no significant difference between the number of instances of the definition and instances of the use (|#I_D − #I_U|) during training;

• a small definition set (|DSet(I_U)|);

• few instances of this violation pair (#violation_DSet(I_D, I_U)) during detection.

For a DSet invariant violation, if the invariant is supported by many instances of definitions and uses during training, any violation of it has a high probability of being a bug. Meanwhile, if the size of DSet(I_U) is large, or the same violation is detected many times, the violation is unlikely to be a bug. As such, the confidence is computed as follows:


conf_DSet = (#I_D × #I_U) / ((|#I_D − #I_U| + 1) × |DSet(I_U)| × #violation_DSet(I_D, I_U))    (7)

For LR and Follower violations, the confidence is computed based only on the number of dynamic instances of the use during training and the number of violations occurring during detection, as follows:

conf_LR = #I_U / #violation_LR(I_U)    (8)

conf_F = #I_U / #violation_F(I_U)    (9)
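A sketch of these confidence computations, Equations (7)-(9), plus the geometric-mean combination described next (how the counters are collected is assumed to follow the description above):

```cpp
#include <cmath>

// Counters gathered during training and detection for one reported violation.
struct ViolationCounts {
    double def_train;        // #I_D
    double use_train;        // #I_U
    double dset_size;        // |DSet(I_U)|
    double dset_violations;  // #violation_DSet(I_D, I_U)
    double lr_violations;    // #violation_LR(I_U)
    double f_violations;     // #violation_F(I_U)
};

double conf_dset(const ViolationCounts& c) {          // Equation (7)
    return (c.def_train * c.use_train) /
           ((std::fabs(c.def_train - c.use_train) + 1.0) * c.dset_size * c.dset_violations);
}

double conf_lr(const ViolationCounts& c) {            // Equation (8)
    return c.use_train / c.lr_violations;
}

double conf_follower(const ViolationCounts& c) {      // Equation (9)
    return c.use_train / c.f_violations;
}

// If a bug violates several invariants, their confidences are combined with
// a geometric mean (see "Violations to multiple invariants" below).
double combined_confidence(const double* confs, int n) {
    double log_sum = 0.0;
    for (int i = 0; i < n; ++i) log_sum += std::log(confs[i]);
    return std::exp(log_sum / n);
}
```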

Violations to multiple invariants. In some cases, a bug may violate multiple invariants (e.g., both the DSet and LR invariants). In these cases, DefUse uses the geometric mean of the confidences of all violations as DefUse's confidence. As an alternative approach, we could also weight each invariant differently. For example, violations of LR and Follower invariants may receive higher confidence values, since they are coarse-grained invariants and hence can gain more statistical support from training runs.

5. Experimental Evaluation

5.1 Test Platform, Applications and Bugs

We implemented DefUse using Intel's dynamic instrumentation tool, Pin [21], for invariant extraction and bug detection. All experiments were conducted on an 8-core Intel Xeon machine (2.33 GHz, 4 GB of memory) running Linux 2.6.9.

We used a total of sixteen representative real-world applications (shown in Table 2), including 8 multi-threaded applications and 8 sequential applications. Concurrent applications include 2 widely used servers (Apache Httpd and MySQL) and 6 desktop/client applications (the Mozilla JavaScript engine, HTTrack, Transmission, PBZip2, and so on). We used the original applications from their official websites and did not modify any of them for our evaluation.

We evaluated DefUse with 20 real-world software bugs (11 concurrency bugs and 9 sequential bugs) with various root causes, including atomicity violations, order violations, memory bugs and semantic bugs.

5.2 Training Process

In our experiments, we tried our best to emulate how programmers would use DefUse in practice, especially for invariant extraction from training runs. The following is our training setup. Like all previous invariant-based works, we leverage the commonly available testing oracle from in-house regression testing to label correct testing runs for training.

Invariants are extracted systematically. Specifically, weused multiple different inputs for training, which are cru-cial for sequential applications. All inputs used for trainingare from standard test cases released together with the cor-responding software. In other words, training inputs are se-lected in a way that is oblivious to the tested bug. In some

Type Applications LOC DescriptionsServer Apache 345K Web server

Con- App. MySQL 1.9M Database servercurrent Mozilla 3.4M Web browser suiteApp. HTTrack 54.8K Website copier

Desktop Transmission 86.4K BitTorrent clientApp. PBZip2 2.0K Parallel file compressor

x264 29.7K H.264 codecZSNES 37.3K Nintendo game emulator

Utility gzip 14.6K Compression tooltar 41.2K File archiving toolseq 1.7K Print number sequencepaste 1.4K Merge lines of files

Sequen- GNU sort 4.3K Sort lines in filestial Linux ptx 5.7K Index of file contentsApp. coreutils cut 3.3K Remove sections from files

pr 6.3K Convert files for printing

Table 2. Evaluated applications.

Bug Type                 ID             Bug description
Atomicity Violation      Apache#1       Random crash in cache management
Bugs                     Apache#2       Non-deterministic log-file corruption
                         Apache#3       Random crash in worker-queue accesses
                         Mozilla        Wrong results of JavaScript execution
                         MySQL#1        Non-deterministic DB log disorder
Order Violation          MySQL#2        Read after free object
Bugs                     HTTrack        Read before object creation
                         Transmission   A shared variable is read before it is
                                        properly assigned by a child thread
                         PBZip2         Random crash in file decompression
                         x264           Read after file closed
                         ZSNES          Inquiry lock before initialization
Semantics                gzip           Use wrong file descriptor
                         seq            Read wrong string terminator
Unbounded memory read    tar#1          Read out of buffer
                         paste          Read out of buffer
                         sort           Read unopened file
Dangling pointer         cut            Dangling pointer
Memory corruption        ptx            Buffer overflow
                         pr             Buffer overflow
                         tar#2          Overwrite wrong buffer fields

Table 3. Evaluated real-world bugs.

experiments (for studying training sensitivity), we even manually examined the training inputs to ensure that they are different from the input used for detection (as explained in the next subsection).

For concurrent applications, for each input, training also needs to cover various interleavings [25, 26, 36]. Therefore, we executed the tested program with each input for 10 times⁵ on our 8-core machine. It is conceivable to use more advanced interleaving testing tools [25, 26, 30, 36] to improve DefUse's training process and cover more interleavings.

⁵ In the case of the Apache server, we define one run as the duration needed to service 100 different client requests; for HTTrack, one run is set to five minutes of web crawling.


5.3 Two Sets of Experiments

DefUse is designed for two different usage scenarios (post-mortem diagnosis and general bug detection during testing), as mentioned in the introduction. During post-mortem diagnosis, developers usually know which input triggers the bug, so that they can easily design a good training input set for DefUse. For general bug detection, programmers can rely only on in-house testing suites, and so it is possible that some monitored run may have a very different input. In such cases, it would be desirable for a bug detector to not report many false positives.

Considering the above two different usage scenarios, we design two sets of experiments to thoroughly evaluate DefUse.

1. DefUse bug detection with sufficient training inputs. In this set of experiments, DefUse's training input set includes all suitable inputs provided by the software developers or those generated by representative tools (e.g., for MySQL, we used the mysql test suite; for Apache, we used Surge [2] and httperf [24]). For some applications (e.g., tar, sort), we generated many different inputs. The results with this setting are shown in Table 4 and Table 5.

2. DefUse bug detection with insufficient training inputs. We use two experiments to show how DefUse would perform with insufficient training: how the number of false positives changes with inputs different from those used in training (Figure 5), and the amount of training runs/inputs needed to achieve a good detection accuracy (Figure 6).

In all our experiments, confidence is only used for ranking, but not for pruning, unless specifically mentioned (Figure 5(b) and Figure 6(b)).

5.4 Overall Results

Table 4 shows DefUse's detection capability and the number of false positives, as well as the bug rankings in DefUse's violation report.

Bug detection. As shown in Table 4, DefUse can detect 19 of the 20 tested bugs in the 16 applications. It is of note that these bugs are of various types, including atomicity violations, order violations, semantic bugs, buffer overflows, dangling pointers, unbounded reads, and so on. Section 5.6 explains why DefUse misses the tar#2 bug.

The results also show that the three types of definition-use invariants (LR, Follower and DSet) are very complementary. DSet can detect 7 of the 11 concurrency bugs and all the tested sequential bugs, but misses 4 concurrency bugs which can be detected by LR or Follower. For example, Apache#1 and Mozilla are detected only by using LR invariants, and Apache#2 is detected only by using Follower. This justifies the benefit of combining all three invariants in DefUse.

DefUse can also detect bugs that would be missed by previous tools. One is an order violation bug in HTTrack and the other two are semantic bugs in gzip and seq. Specifically, the recently proposed invariant-based bug detector, PSet [44], cannot detect the HTTrack bug, since its detection is not strictly based on definition-use relations. Figure 8 in Section 6 shows the simplified code of HTTrack and compares DefUse's behavior with PSet's. Note that DefUse can detect the bug with both DSet and LR invariants. A more detailed comparison with PSet, with a bug example, is in Section 6.

The two evaluated semantic bugs are caused only by a semantically wrong definition-use pair, not by accessing an invalid memory location. Hence, they would not be detected by previous memory checkers such as Valgrind [28].

False positives. As Table 4 shows, the number of false positives is fewer than four for every application. Moreover, almost all bugs rank at the top of the violation report, which helps programmers easily identify them.

The low false positive numbers are a result of our pruning scheme described in Section 4.2 as well as sufficient training (the results of insufficient training are shown later). Table 5 demonstrates the effectiveness of our basic pruning algorithm (please note that no confidence-based pruning is used here). Overall, the number of false positives is reduced by up to 87% by pruning.

Comparing LR and Follower with DSet, the first two have much fewer false positives because they are coarse-grained (i.e., they look only at high-level definition-use properties instead of concrete definitions) and have more statistical support from training runs.

5.5 Training Sensitivity

As discussed in Section 5.3, in some scenarios, especially if DefUse is used for general bug detection instead of post-mortem analysis, it is possible that training is not sufficient; that is, the training does not have a good coverage of execution paths or inputs. Our second set of experiments aims to answer the following related questions:

What if detection runs are dramatically different from training runs? To show DefUse's training sensitivity, we first analyze the effect of training inputs, as shown in Figure 5. For detection, we use various inputs that are different from training inputs. To quantitatively measure their differences, we use the number of basic blocks newly explored in a detection run (but not in any training run).
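The difference metric itself is simple to compute; a minimal sketch (in C, with hypothetical data structures, not DefUse's actual code) counts the basic blocks exercised by a detection run that never appeared in any training run:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coverage bitmaps indexed by static basic-block id. */
#define MAX_BLOCKS (1u << 20)

static bool trained[MAX_BLOCKS];    /* marked during all training runs  */
static bool detected[MAX_BLOCKS];   /* marked during one detection run  */

/* Number of basic blocks explored in the detection run but in no training
 * run; this is the quantity plotted on the x-axis of Figures 5 and 6. */
static size_t untrained_blocks(void) {
    size_t count = 0;
    for (size_t i = 0; i < MAX_BLOCKS; i++)
        if (detected[i] && !trained[i])
            count++;
    return count;
}
```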

Without any confidence-based pruning, as shown in Figures 5(a) and 5(c), when a detection run is very different from training runs (e.g., for MySQL, with 835 untrained basic blocks), DefUse can introduce quite a few false positives (e.g., 58 in MySQL). Fortunately, this problem can be addressed by applying confidence-based pruning, i.e., pruning out those violations with low confidence. As shown in Figures 5(b) and 5(d), after confidence-based pruning, even with 700 untrained basic blocks in a detection run, the number of false positives is always below 30. However, confidence-based pruning may also increase


Bug Type     Application      LR                Follower          DSet              DefUse
                              Detected?   FP    Detected?   FP    Detected?   FP    Detected?   FP
Concurrent Applications
Atomicity    Apache#1         Yes(1)      0     no          0     no          1     Yes(2)      1
Violation    Apache#2         no          2     Yes(1)      0     no          3     Yes(1)      3
Bugs         Apache#3         Yes(1)      0     Yes(1)      0     no          0     Yes(1)      0
             Mozilla          Yes(1)      0     no          0     no          3     Yes(1)      3
             MySQL#1          no          0     no          1     Yes(1)      0     Yes(2)      1
Order        MySQL#2          no          0     no          1     Yes(1)      2     Yes(1)      3
Violation    HTTrack *        Yes(1)      0     no          0     Yes(1)      0     Yes(1)      0
Bugs         Transmission *   no          0     no          0     Yes(1)      0     Yes(1)      0
             PBZip2           no          0     Yes(1)      0     Yes(1)      0     Yes(1)      0
             x264             no          0     no          0     Yes(1)      0     Yes(1)      0
             ZSNES            Yes(1)      0     no          0     Yes(1)      0     Yes(1)      0
Sequential Applications
Semantics    gzip             -           -     -           -     Yes(1)      0     Yes(1)      0
             seq              -           -     -           -     Yes(1)      0     Yes(1)      0
Unbounded    tar#1            -           -     -           -     Yes(1)      0     Yes(1)      0
memory       paste            -           -     -           -     Yes(1)      0     Yes(1)      0
read         sort             -           -     -           -     Yes(2)      1     Yes(2)      1
Dangling     cut              -           -     -           -     Yes(1)      1     Yes(1)      1
pointer
Memory       ptx              -           -     -           -     Yes(1)      0     Yes(1)      0
corruption   pr               -           -     -           -     Yes(1)      0     Yes(1)      0
             tar#2            -           -     -           -     no          0     no          0

Table 4. Overall results from DefUse with invariants extracted from sufficient training runs described in Section 5.3. The number inside each parenthesis is the bug's ranking in the violation report of DefUse. The bugs marked * are new bugs that were never reported before.

[Figure 5 plots omitted: (a) MySQL before confidence-based pruning, (b) MySQL after confidence-based pruning, (c) HTTrack before confidence-based pruning, (d) HTTrack after confidence-based pruning. The after-pruning panels use a confidence threshold of 5.]

Figure 5. Training sensitivity in terms of input difference. Each dot is one detection run. The detection runs use various inputs that differ from those used in training runs by various degrees. For each detection run, its difference from the training runs is reflected by the number of basic blocks that are not covered by the training runs. The x-axis is the number of such basic blocks; the y-axis is the number of false positives that occurred in each detection run.

the chance of pruning out some true bugs, even though this was not observed in our experiments.
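As a concrete illustration of how violation reports might be pruned and ranked by confidence, the following sketch (in C; the threshold of 5 matches the figures, but the data structures and names are hypothetical, not DefUse's actual code) drops reports whose invariant has weak statistical support and sorts the rest so the most trustworthy reports come first:

```c
#include <stdlib.h>

/* Hypothetical violation report; fields are illustrative only. */
struct report {
    const char *use_site;   /* source location of the suspicious use */
    double confidence;      /* combined confidence of the violated invariants */
};

/* Sort so the highest-confidence reports (most likely real bugs) come first. */
static int by_confidence_desc(const void *a, const void *b) {
    double ca = ((const struct report *)a)->confidence;
    double cb = ((const struct report *)b)->confidence;
    return (ca < cb) - (ca > cb);
}

/* Confidence-based pruning: drop reports below the threshold, then rank.
 * Returns the number of reports kept. */
static size_t prune_and_rank(struct report *r, size_t n, double threshold) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].confidence >= threshold)
            r[kept++] = r[i];
    qsort(r, kept, sizeof r[0], by_confidence_desc);
    return kept;
}
```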

How many training runs/inputs are sufficient? Figure 6 shows how the number of training inputs/runs affects the number of false positives. Without confidence-based pruning (shown in Figure 6(a)), after DefUse is trained with 13 inputs, the number of false positives is gradually reduced and finally reaches 1. Confidence-based pruning (shown in Figure 6(b)) makes the false positive curves drop faster.

In comparison, LR's and Follower's false positives are always fewer than DSet's. DefUse has fewer false positives than DSet with confidence pruning because its confidence is calculated as a geometric mean of the confidences from all three types of invariants.

5.6 Bug Missed by DefUse

Even though DefUse can detect various types of concurrency and sequential bugs, it still misses one tested bug (tar#2), which is described in Figure 7. If a file name (i.e., read_s_name) is longer than posix_header.name, S1 corrupts the following fields, e.g., header.mode, and dumps the wrong data to fd. Note that the wrong write S1 and its use S3 do not violate the DSet invariant held by S3, since this definition-use relation has already been captured from header.name during training. Interestingly, existing memory checkers (e.g., Valgrind [28]) cannot detect it either, since S1 always writes to a valid location, even when this bug manifests.


Application     LR          Follower    DSet        DefUse
                BP    AP    BP    AP    BP    AP    BP    AP
Apache#1        0     0     0     0     2     1     2     1
Apache#2        2     2     0     0     3     3     3     3
Apache#3        0     0     0     0     0     0     0     0
Mozilla         0     0     1     0     4     3     5     3
MySQL#1         0     0     1     1     4     0     5     1
MySQL#2         0     0     1     1     22    2     23    3
HTTrack         0     0     0     0     1     0     1     0
Transmission    0     0     0     0     0     0     0     0
PBZip2          0     0     0     0     0     0     0     0
x264            0     0     0     0     1     0     1     0
ZSNES           0     0     0     0     0     0     0     0

Table 5. Comparison of the number of false positives before pruning (BP) and after pruning (AP). AP prunes barely exercised uses and definitions, popular uses, and local violations against Follower, as described in Section 4.2. The results show the worst case from five detection runs. The evaluation setup is the same as in Table 4.

[Figure 6 plots omitted: (a) before confidence-based pruning, (b) after confidence-based pruning (threshold = 5).]

Figure 6. False positives with increasing number of training runs. The plain-line curve shows the number of untrained basic blocks in the detection run.

To address this problem, DefUse needs to be extended to consider other types of definition-use invariants, such as whether a definition is used at least once before it is re-defined. In Figure 7, when the bug manifests, header.typeflag is re-defined by S2 without any uses after S1, which can be detected by such an extension.
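A minimal sketch of the kind of check such an extension might perform is shown below (in C; this is our illustration of the idea, not part of DefUse, and the data structures are hypothetical). It flags a write to a location whose previous definition was never read:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-location shadow state for the proposed extension. */
struct shadow {
    const void *last_def_pc;   /* instruction that produced the current definition */
    bool        used;          /* has the current definition been read at least once? */
};

/* Called on every monitored read of the location. */
static void on_use(struct shadow *loc) {
    loc->used = true;
}

/* Called on every monitored write of the location by instruction `pc`.
 * If the previous definition was never used, report a potential
 * "overwritten-before-use" definition, such as S2 overwriting S1's
 * definition of header.typeflag in Figure 7. */
static bool on_define(struct shadow *loc, const void *pc) {
    bool suspicious = (loc->last_def_pc != NULL && !loc->used);
    loc->last_def_pc = pc;
    loc->used = false;
    return suspicious;   /* caller would record a violation report */
}
```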

Like previous invariant-based approaches, DefUse would miss bugs where there is no statistical support to extract any invariants, and bugs that do not violate any invariants.

5.7 Overhead of DefUse

As Table 6 shows, the slowdown of DefUse is less than 5X in most cases and up to 20X for memory-intensive concurrent applications such as x264.

[Figure 7 code omitted: a simplified header-writing routine from GNU tar, in which S1 copies read_s_name into header.name, S2 assigns header.typeflag, and S3 writes the header data to fd; struct posix_header contains the name, mode, ..., typeflag fields.]

Figure 7. A memory corruption bug in GNU tar that cannot be detected by DefUse.
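For readers without access to the original figure, the following is a rough reconstruction of the scenario based on the description in Section 5.6; the identifiers mirror the text (read_s_name, header.name, header.mode, header.typeflag, fd), but the function name, field sizes, and the unbounded copy in S1 are our assumptions, and the exact statements in tar differ:

```c
#include <string.h>
#include <unistd.h>

/* Simplified tar-style header; field sizes are illustrative. */
struct posix_header {
    char name[100];
    char mode[8];
    /* ... other fields ... */
    char typeflag;
};

static struct posix_header header;

void write_header(int fd, const char *read_s_name) {
    /* S1: if read_s_name is longer than header.name, this copy
     * overflows into the following fields (header.mode, ..., typeflag). */
    strcpy(header.name, read_s_name);

    /* S2: re-defines typeflag; when the bug manifests, S1's corrupting
     * definition of typeflag is overwritten here without ever being used. */
    header.typeflag = '0';

    /* S3: uses the header; the definition-use pair S1 -> S3 for header.name
     * matches what training observed, so no DSet invariant is violated. */
    write(fd, &header, sizeof(header));
}
```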

Application   Mozilla   MySQL   Transmission   PBZip2   x264     gzip
Slowdown      3.26X     5.53X   1.03X          3.39X    20.26X   4.12X

Application   seq       tar     paste          sort     cut      ptx
Slowdown      1.87X     2.18X   2.71X          2.45X    2.39X    2.41X

Table 6. Overhead of DefUse.

Compared to other software-only concurrency bug detection tools, such as Valgrind (which includes a lockset data race detector and incurs 10X-100X overhead) [28] and the AVIO software implementation (which has 15X-40X slowdowns) [19], DefUse is much faster. Even though DefUse's overhead may be high for some applications, this is not critical because we expect DefUse to be used during post-mortem diagnosis and in-house testing/debugging, similar to Valgrind.

6. Related work

Program invariants. Our work was inspired by many previous works on program invariants, including DAIKON [8], DIDUCE [13], AccMon [45], AVIO [19], DITTO [37], MUVI [17], and so on. Similar to these studies, we also extract invariants from training runs, and dynamically detect violations to report potential bugs. Unlike these works, our invariants focus on data flow. More specifically, the focus is on definition-use invariants, which can be used to detect both concurrency and sequential bugs. Few previous methods can detect both types. Furthermore, our tool is also one of the first that can detect order violations in concurrent programs.

Definition-use and data flow. A recent study [4] used data flow to detect security vulnerabilities, i.e., buffer overflow attacks. It was based on the insight that if an intended data flow is subverted with an unexpected use, it may lead to an attack. Our work differs in that we determine whether a use reads from a wrong definition by extracting definition-use invariants from both concurrent and sequential programs, and we report potential concurrency (including atomicity and order violations) and sequential bugs.

Definition-use relations have also been used for software testing. Many works [14, 16, 43] have focused on testing definition-use relations to increase test coverage.

Data flow has many other uses. For example, EIO [12] uses data flow analysis to check whether low-level error codes


[Figure 8 schematic: thread 1 executes I0: write, then I2: read, I3: read, I2: read; thread 2 executes I1: write. From the two correct runs: PSet[I0] = { }, PSet[I2] = {I1}, PSet[I1] = {I0, I2}, PSet[I3] = {I1}; DSet[I2] = {I0, I1}, DSet[I3] = {I1}.]

Figure 8. Simplified code of HTTrack showing the difference between PSet and DSet. In the incorrect run (the execution order of I1 and I3 does not match the assumption), I3 obtains the definition from I0, violating DSet(I3).

(e.g., "I/O error") are correctly propagated to file systems so that recovery mechanisms can handle them. In [5], data flow analysis is used to detect memory leaks and double frees.

Concurrency bug detection and avoidance. There has been a lot of work on detecting data races [10, 23, 39]. Happens-before [31] and lockset [7, 34, 38] algorithms, as well as many extensions [29, 32], have been proposed for data race detection. However, data race detectors have two problems: (1) many data races in real-world applications are benign, so race detectors may raise too many false alarms; and (2) some wrong interleavings and orders are not related to data races. Therefore, the notions of atomicity violations and order violations were introduced to characterize such concurrency bugs [18]. Atomicity violations have been well studied [9, 11, 19, 33, 42, 44], but order violations are, so far, rarely studied.

Much research has been conducted on detecting or avoiding concurrency bugs. Recently, [44] proposed an innovative method to avoid concurrency bugs by using data dependency information collected during correct runs. Their approach encodes the set of tested correct interleavings in a program's binary executable, and enforces only those interleavings at run time. They use Predecessor Set (PSet) constraints to capture the tested interleavings between two dependent memory instructions. Specifically, for each shared memory instruction, PSet specifies the set of all valid remote memory instructions that it can be immediately dependent upon⁶.

Their work is quite different from DefUse in the following aspects. (1) Their work focuses on run-time fault tolerance and recovery, while DefUse is tailored to bug detection and post-mortem failure diagnosis. (2) PSet does not directly target definition-use relations, even though it reflects some aspects of them. Specifically, PSet does not capture such definition-use relations because it cares only about remote, preceding instructions. Figure 8 shows the simplified code from HTTrack, where PSet does not capture the definition-use relation between I3 and I1, and misses the order violation

⁶ The PSet of a static memory instruction M includes another instruction P only if (i) P and M are in two different threads; (ii) M is immediately dependent on P; and (iii) neither of the two threads executes a read or a write (to the same memory location) that interleaves between P and M [44].

bug. DefUse (with DSet or LR invariants) can detect it. (3) Our work also includes LR and Follower invariants and thereby can check other properties of concurrent execution. (4) DefUse can also detect sequential bugs and works for sequential programs.
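To make the contrast in point (2) concrete, here is an illustrative sketch (in C, with hypothetical types and names; it is neither DefUse's nor PSet's actual implementation) of how the two checks would treat the incorrect run in Figure 8:

```c
#include <stdbool.h>
#include <stddef.h>

struct access { int inst_id; int thread_id; };

/* Membership test in a small invariant set (array of instruction ids). */
static bool in_set(int id, const int *set, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == id) return true;
    return false;
}

/* DSet-style check: flag the use if the observed definition (local or remote)
 * is not in DSet(use). In Figure 8's incorrect run, I3 reads I0's value and
 * I0 is not in DSet[I3] = {I1}, so a violation is reported. */
static bool dset_violation(const struct access *def, const int *dset, size_t n) {
    return !in_set(def->inst_id, dset, n);
}

/* PSet-style check: only remote immediate dependences are constrained, so a
 * use whose value comes from the same thread (as I3 reading I0's value in
 * the incorrect run) is never checked, and the bug is missed. */
static bool pset_violation(const struct access *use, const struct access *def,
                           const int *pset, size_t n) {
    if (use->thread_id == def->thread_id)
        return false;               /* local dependence: outside PSet's scope */
    return !in_set(def->inst_id, pset, n);
}
```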

Bugaboo [20] uses context-aware communication graphs for concurrency bug detection. However, it is limited by the context size due to the exponential cost. If the granularity of the bug manifestation is larger than the context size (5 by default), the detection may be inaccurate. For example, in Figure 1(a), if the interval between S1 and S3 during execution is larger than the context size, Bugaboo may miss the bug or report a false positive. In addition, Bugaboo needs a special hardware extension because its software implementation is expensive even for the default context size.

Our tool, DefUse, can detect not only atomicity violations, but also order violations, which have seldom been studied in previous research. Using the same framework, we can also detect memory bugs and semantic bugs.

7. Conclusion

This paper proposed definition-use invariants, which can be used to detect a wide spectrum of software bugs, including both concurrency bugs (atomicity and order violations) and sequential bugs (memory corruptions and certain semantic bugs). Our experimental results with 16 real-world applications and 20 real-world bugs of different types have shown that DefUse was effective in detecting 19 of them, including 2 new bugs that were never reported before, while introducing only 0-3 false positives. Our training sensitivity experiments showed that DefUse can reasonably tolerate insufficient training, especially with confidence-based pruning.

There are several possible directions for future work. Firstly, as shown in our experiments, DefUse still fails to detect one tested bug (Section 5.6). To handle such bugs, we need to extend DefUse invariants to include such constraints. Secondly, it is interesting to consider other types of definition-use invariants. Thirdly, DefUse is currently implemented entirely in software, but it is conceivable that some simple hardware extensions could significantly reduce DefUse's run-time overhead and make it possible to use DefUse to monitor production runs.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 1986.

[2] P. Barford and M. Crovella. Generating representative web workloads for network and server performance evaluation. In ACM SIGMETRICS, June 1998.

[3] M. Burrows and K. R. M. Leino. Finding stale-value errors in concurrent programs. Concurrency and Computation: Practice & Experience, 16(12):1161–1172, 2004.

[4] M. Castro, M. Costa, and T. Harris. Securing software by enforcing data-flow integrity. In OSDI, 2006.

[5] S. Cherem, L. Princehouse, and R. Rugina. Practical memory leak detection using guarded value-flow analysis. In PLDI, 2007.

[6] T. Chilimbi and V. Ganapathy. HeapMD: Identifying heap-based bugs using anomaly detection. In ASPLOS, 2006.

[7] J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002.

[8] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1–3):35–45, Dec. 2007.

[9] C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004.

[10] C. Flanagan and S. N. Freund. FastTrack: efficient and precise dynamic race detection. In PLDI, 2009.

[11] C. Flanagan and S. Qadeer. A type and effect system for atomicity. In PLDI, pages 338–349, 2003.

[12] H. S. Gunawi, C. Rubio-González, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and B. Liblit. EIO: Error handling is occasionally correct. In FAST, 2008.

[13] S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In ICSE, 2002.

[14] M. J. Harrold and B. A. Malloy. Data flow testing of parallelized code. In ICSM, 1992.

[15] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Usenix Winter Technical Conference, 1992.

[16] S. Lu, W. Jiang, and Y. Zhou. A study of interleaving coverage criteria. In FSE, 2007.

[17] S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In SOSP, 2007.

[18] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes – a comprehensive study of real world concurrency bug characteristics. In ASPLOS, 2008.

[19] S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access interleaving invariants. In ASPLOS, 2006.

[20] B. Lucia and L. Ceze. Finding concurrency bugs with context-aware communication graphs. In MICRO, 2009.

[21] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI, 2005.

[22] E. Marcus and H. Stern. Blueprints for high availability (2nd edition). John Wiley and Sons, 2003.

[23] D. Marino, M. Musuvathi, and S. Narayanasamy. LiteRace: effective sampling for lightweight data-race detection. In PLDI, 2009.

[24] D. Mosberger and T. Jin. httperf—a tool for measuring web server performance. Performance Evaluation Review, 26(3):31–37, 1998.

[25] M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007.

[26] M. Musuvathi, S. Qadeer, T. Ball, and G. Basler. Finding and reproducing heisenbugs in concurrent programs. In OSDI, 2008.

[27] S. Narayanasamy, C. Pereira, and B. Calder. Recording shared memory dependencies using strata. In ASPLOS, 2006.

[28] N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In PLDI, 2007.

[29] R. O'Callahan and J.-D. Choi. Hybrid dynamic data race detection. In PPoPP, 2003.

[30] S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing atomicity violation bugs from their hiding places. In ASPLOS, 2009.

[31] D. Perkovic and P. J. Keleher. Online data-race detection via coherency guarantees. In OSDI, 1996.

[32] E. Pozniansky and A. Schuster. Efficient on-the-fly data race detection in multithreaded C++ programs. In PPoPP, 2003.

[33] A. Sasturkar, R. Agarwal, L. Wang, and S. D. Stoller. Automated type-based analysis of data races and atomicity. In PPoPP, pages 83–94, 2005.

[34] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM TOCS, 1997.

[35] SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016.

[36] K. Sen. Race directed random testing of concurrent programs. In PLDI, 2008.

[37] A. Shankar and R. Bodik. DITTO: Automatic incrementalization of data structure invariant checks (in Java). In PLDI, 2007.

[38] C. von Praun and T. R. Gross. Object race detection. In OOPSLA, 2001.

[39] C. von Praun and T. R. Gross. Static conflict analysis for multi-threaded object oriented programs. In PLDI, 2003.

[40] M. Xu, R. Bodik, and M. Hill. A regulated transitive reduction for longer memory race recording. In ASPLOS, 2006.

[41] M. Xu, R. Bodik, and M. D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. In ISCA, 2003.

[42] M. Xu, R. Bodik, and M. D. Hill. A serializability violation detector for shared-memory server programs. In PLDI, pages 1–14, 2005.

[43] C.-S. D. Yang, A. L. Souter, and L. L. Pollock. All-du-path coverage for parallel programs. In ISSTA, 1998.

[44] J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009.

[45] P. Zhou, W. Liu, F. Long, S. Lu, F. Qin, Y. Zhou, S. Midkiff, and J. Torrellas. AccMon: Automatically detecting memory-related bugs via program counter-based invariants. In MICRO, 2004.