![Page 1: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/1.jpg)
Reverse Engineering State Machines by Interactive Grammar Inference
Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin
![Page 2: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/2.jpg)
State Machines
• Used to model software behaviour
– Documentation
– Inspection / review
– Model-based testing
– Model checking
[Figure: example state machine with transitions load, edit, save as, ok, close, exit]
![Page 3: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/3.jpg)
State Machines
• Only useful if complete and up-to-date
• Usually not the case due to time constraints and software evolution
![Page 4: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/4.jpg)
Reverse Engineering State Machines
• Static analysis – analysis of source code
– Symbolic execution, flow analyses, ...
– Inevitably considers executions that are infeasible in practice
• Dynamic analysis – infer model from sample executions
– Favoured for accuracy
– States considered equal if subsequent trace is similar
– Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse engineering algorithms
![Page 5: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/5.jpg)
Traditional Approach
• For any point in a trace, its k-tail is the following sequence of k events or functions
– Point x is considered equivalent to y if the k-tails are equal
• Example trace: <load,edit,edit,edit,save_as,ok,edit,edit>
[Figure: the trace drawn as a chain of states with transitions load, edit, edit, edit, save_as, ok, edit, edit]
![Page 6: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/6.jpg)
Traditional Approach
• K=2 for the trace <load,edit,edit,edit,save_as,ok,edit,edit>
![Page 7: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/7.jpg)
Traditional Approach
• Points with equal k-tails (K=2) are merged
[Figure: merged machine with transitions load, edit, save_as, edit, ok]
![Page 8: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/8.jpg)
Traditional Approach
• Remove non-determinism
[Figure: resulting deterministic machine with transitions load, edit, save_as, ok]
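The k-tail comparison on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `k_tails` and `merge_equal_tails` are hypothetical, a point near the end of the trace simply gets a shorter tail, and the final step of removing non-determinism is not implemented.

```python
from collections import defaultdict

def k_tails(trace, k):
    """k-tail at each point i: the (up to) k events that follow point i."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) + 1)]

def merge_equal_tails(trace, k):
    """Give points with equal k-tails the same state id and collect transitions."""
    tails = k_tails(trace, k)
    state_of = {}                       # k-tail -> state id
    for tail in tails:
        state_of.setdefault(tail, len(state_of))
    transitions = defaultdict(set)      # (state, event) -> set of target states
    for i, event in enumerate(trace):
        source = state_of[tails[i]]
        target = state_of[tails[i + 1]]
        transitions[(source, event)].add(target)
    return state_of, transitions

# The trace from the slide, with K=2.
trace = ["load", "edit", "edit", "edit", "save_as", "ok", "edit", "edit"]
state_of, transitions = merge_equal_tails(trace, k=2)

# Entries with more than one target are the non-deterministic transitions
# that the last slide of this sequence removes.
nondeterministic = {key: targets for key, targets in transitions.items()
                    if len(targets) > 1}
print(len(state_of), "states after merging")
print("non-deterministic:", nondeterministic)
```

Under these assumptions, the three points whose 2-tail is (edit, edit) collapse into one state, and that state then has several different targets on `edit`, which is the non-determinism the slide refers to.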
![Page 9: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/9.jpg)
Problems
• Too expensive if result is to be correct and complete:
– Need complete set of executions up to certain length
– Passive – all executions need to be presented at once
• If the provided traces are only partial (probable for a non-trivial system), the resulting model is untrustworthy
– Difficult to tell how complete the model is – what’s missing?
[Figure: the inferred machine (load, edit, save_as, ok) beside the full machine (load, edit, save as, ok, close, exit)]
![Page 10: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/10.jpg)
Regular Grammar Inference
• Given a set of valid and (optionally) invalid sentences from a language, infer its grammar.
• Regular grammars can be represented as deterministic finite state machines
• Problem of regular grammar inference equivalent to that of reverse engineering state machines
• Several sophisticated grammar inference techniques
– Effectively address many problems that arise with current reverse-engineering approaches
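To make the link between sentences and state machines concrete, here is a minimal Python sketch of a deterministic finite state machine that classifies event sequences as valid or invalid. The state names and the transition table are hypothetical; they are loosely modelled on the editor example from the earlier slides and are not taken from the deck.

```python
# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    ("start", "load"): "loaded",
    ("loaded", "edit"): "edited",
    ("edited", "edit"): "edited",
    ("edited", "save_as"): "saving",
    ("saving", "ok"): "loaded",
    ("loaded", "close"): "start",
    ("edited", "close"): "start",
    ("start", "exit"): "end",
}
ACCEPTING = {"end"}   # assumption: a valid sentence ends with exit

def accepts(sentence, start="start"):
    """Return True if the event sequence is a valid sentence of the machine."""
    state = start
    for event in sentence:
        state = TRANSITIONS.get((state, event))
        if state is None:          # no such transition: invalid sentence
            return False
    return state in ACCEPTING

print(accepts(["load", "close", "exit"]))   # a positive sample
print(accepts(["close", "exit"]))           # a negative sample
```

Grammar inference works in the opposite direction: given only sentences labelled valid or invalid, it tries to recover a table like `TRANSITIONS`.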
![Page 11: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/11.jpg)
Benefits of Adapting Grammar Inference Techniques
• Active techniques
– Do not require set of executions to be presented at once
– Interact with an oracle to identify missing information
• More efficient
– Can efficiently process large sample sets
• Reasonably accurate given sparse sets of executions
– More sophisticated heuristics to accurately identify equivalent states
![Page 12: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/12.jpg)
Query-Driven State Merging (QSM)
• Devised by Dupont et al.
• Combines benefits mentioned on previous slide
– Active, efficient, reasonably accurate for sparse sets of sample executions
• Guaranteed to produce correct machine if set of sample executions is characteristic:
– Must cover every transition in the target grammar
– Enough positive and negative samples to differentiate between different states (to prevent false merges)
– Questions aim to elicit characteristic sample from oracle
![Page 13: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/13.jpg)
Query-Driven State Merging (QSM)
• Sample executions:
– <load, close, exit>
– <load, edit, edit, save_as, ok, close, exit>
– <load, edit, edit, edit, close, exit>
• Generate “Prefix Tree Acceptor”
[Figure: prefix tree acceptor in which the three executions share the load transition and then branch]
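A prefix tree acceptor for the three sample executions can be built as follows. This is a minimal Python sketch: the function name `build_pta` and the integer state numbering are assumptions made for illustration, not the authors' implementation.

```python
def build_pta(traces):
    """Build a prefix tree acceptor with one state per distinct prefix.

    Returns (transitions, accepting), where transitions maps
    (state, event) -> state and state 0 is the root (empty prefix).
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:      # new prefix: create a fresh state
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)                # the end of each sample is accepting
    return transitions, accepting

samples = [
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
]
transitions, accepting = build_pta(samples)
print(len(transitions), "transitions;", "accepting states:", sorted(accepting))
```

Because shared prefixes reuse states, all three samples pass through the same state after `load`, and the two longer samples also share their first two `edit` transitions.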
![Page 14: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/14.jpg)
Query-Driven State Merging (QSM)
[Figure: the prefix tree acceptor during an attempted merge]
• Attempt merge
• Produce questions (executions valid in this machine, but not in unmerged version)
– <close,exit>?
– <edit,edit...>?
– <load,load,close,exit>?
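The question-generation step can be sketched as follows. This is a simplified illustration in Python, not the QSM algorithm as published: it merges two states of the prefix tree acceptor, folds further states only where needed to keep the machine deterministic, and then lists every bounded-length execution that the merged machine accepts but the unmerged one does not. All function names and the length bound are assumptions.

```python
def build_pta(traces):
    """Prefix tree acceptor: (state, event) -> state, root is state 0."""
    transitions, accepting, next_state = {}, set(), 1
    for trace in traces:
        state = 0
        for event in trace:
            if (state, event) not in transitions:
                transitions[(state, event)] = next_state
                next_state += 1
            state = transitions[(state, event)]
        accepting.add(state)
    return transitions, accepting

def merge_states(transitions, accepting, keep, drop):
    """Merge `drop` into `keep`; merge further states until deterministic."""
    parent = {}
    def find(state):
        while state in parent:
            state = parent[state]
        return state
    trans, pending = dict(transitions), [(keep, drop)]
    while pending:
        a, b = pending.pop()
        a, b = find(a), find(b)
        if a == b:
            continue
        parent[b] = a
        rebuilt = {}
        for (src, event), dst in trans.items():
            key, dst = (find(src), event), find(dst)
            if key in rebuilt and find(rebuilt[key]) != dst:
                pending.append((rebuilt[key], dst))   # conflicting targets
            else:
                rebuilt[key] = dst
        trans = rebuilt
    return trans, {find(s) for s in accepting}, find(0)

def accepted_up_to(transitions, accepting, start, max_len):
    """All accepted event sequences of length <= max_len."""
    found, frontier = set(), [(start, ())]
    while frontier:
        state, word = frontier.pop()
        if state in accepting:
            found.add(word)
        if len(word) < max_len:
            for (src, event), dst in transitions.items():
                if src == state:
                    frontier.append((dst, word + (event,)))
    return found

samples = [
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
]
pta, acc = build_pta(samples)
# Assumption: the attempted merge is the root with the state reached by load.
merged, merged_acc, merged_start = merge_states(pta, acc, keep=0, drop=1)
questions = (accepted_up_to(merged, merged_acc, merged_start, 4)
             - accepted_up_to(pta, acc, 0, 4))
print(sorted(questions))
```

With the root merged into the state after `load`, the new executions of length four or less are the ones that skip `load` or repeat it, which corresponds to two of the questions on the slide.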
![Page 15: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/15.jpg)
Query-Driven State Merging (QSM)
• Attempt merge
• Produce questions (executions valid in this machine, but not in unmerged version)
• If all questions answered yes, merge nodes
• Else add negative questions to graph
[Figure: the prefix tree acceptor with negative information added; labels include “close, edit”]
• Active
• Efficient
• Accepts negative information about model
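The accept-or-reject decision on this slide can be sketched as a small Python function. The oracle here is a hypothetical stand-in for a human or a test harness, and its rule (an execution must start with load, and load may not be repeated immediately) is an assumption made only so the example runs.

```python
def answer_questions(questions, oracle):
    """Ask the oracle every question.

    Returns (merge_allowed, negatives): the merge is allowed only if every
    question is answered yes; otherwise the rejected executions are returned
    so they can be added to the graph as negative information.
    """
    negatives = [q for q in questions if not oracle(q)]
    return not negatives, negatives

def toy_oracle(trace):
    # Hypothetical rule standing in for a person or an automated test.
    if not trace or trace[0] != "load":
        return False
    return all(not (a == "load" and b == "load")
               for a, b in zip(trace, trace[1:]))

questions = [("close", "exit"), ("load", "load", "close", "exit")]
merge_allowed, negatives = answer_questions(questions, toy_oracle)
print("merge allowed:", merge_allowed)
print("negative executions:", negatives)
```

Under this toy rule both questions are answered no, so the merge is rejected and both executions become negative samples that block the same merge later.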
![Page 16: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/16.jpg)
Implementation
• Use Eclipse TPTP to record traces
– Sequence of method calls → <load,edit...>
• Questions can either be answered manually
– OR as tests directly to the system
– Can vary number of questions generated
• QSM component accepts simple text files of strings (prefixed with “+” and “-”)
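A reader for such a text file might look like the Python sketch below. The slide only states that strings are prefixed with “+” or “-”; the rest of the layout (one execution per line, events separated by whitespace) and the function name `parse_samples` are assumptions.

```python
def parse_samples(text):
    """Split a sample file into positive and negative executions.

    Assumed layout: one execution per line, a leading "+" or "-",
    then the events separated by whitespace. Blank lines are skipped.
    """
    positive, negative = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        sign, events = line[0], tuple(line[1:].split())
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"line {number}: expected '+' or '-' prefix")
    return positive, negative

example = """\
+ load close exit
+ load edit edit edit close exit
- close exit
"""
positive, negative = parse_samples(example)
print(positive)
print(negative)
```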
![Page 17: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/17.jpg)
![Page 18: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/18.jpg)
Evaluation
• Used traces to generate JHotDraw case study
– Described in paper
• Generated random state machines
– Subject to certain constraints – minimal, deterministic etc.
– Three sets of 10 random machines (5, 25, 50 states)
– Random paths over these machines = initial set of traces
– Measured accuracy of final machine, and number of questions required
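Drawing random paths over a machine to obtain an initial set of traces can be sketched as a random walk. This Python example is only an illustration: the small machine, the function name `random_paths`, the seed and the length bound are all assumptions, and it does not reproduce the generator or the constraints used in the evaluation.

```python
import random

def random_paths(transitions, start, count, max_len, seed=0):
    """Return `count` random walks over a deterministic machine."""
    rng = random.Random(seed)            # fixed seed for repeatable traces
    outgoing = {}
    for (src, event), dst in transitions.items():
        outgoing.setdefault(src, []).append((event, dst))
    paths = []
    for _ in range(count):
        state, path = start, []
        for _ in range(rng.randint(1, max_len)):
            choices = outgoing.get(state)
            if not choices:              # dead end: stop this walk early
                break
            event, state = rng.choice(choices)
            path.append(event)
        paths.append(tuple(path))
    return paths

# Hypothetical machine, for illustration only.
machine = {
    ("start", "load"): "loaded",
    ("loaded", "edit"): "edited",
    ("edited", "edit"): "edited",
    ("edited", "close"): "start",
    ("loaded", "close"): "start",
    ("start", "exit"): "end",
}
paths = random_paths(machine, "start", count=5, max_len=6)
for path in paths:
    print(path)
```

Each walk follows only existing transitions, so every generated trace is a valid execution of the machine it was drawn from.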
![Page 19: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/19.jpg)
![Page 20: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/20.jpg)
![Page 21: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/21.jpg)
Current and Future Work
• Identify data constraints associated with states
– Can use tools such as Daikon
• Automatically answer queries
– Static analysis – using call graph analysis to automatically propose negative / impossible executions
– Automated test generation
• Heuristics – can certain questions be safely ignored?
![Page 22: Reverse Engineering State Machines by Interactive Grammar Inference](https://reader030.vdocuments.mx/reader030/viewer/2022032612/56813320550346895d99f484/html5/thumbnails/22.jpg)
Conclusions
• Preliminary results show technique is reasonably accurate and efficient
• Can potentially be almost entirely automated
– Automatically generates tests (questions), many of which can be eliminated by static analysis anyway
• Grammar Inference is a useful source of ideas for dynamic analysis and reverse engineering