Automatic Verification of Data-Centric Web Services, Victor Vianu, U.C. San Diego
Web service: service hosted on the Web
• Should be accessible by humans or programs: need for standard interface
• Should be possible to discover automatically: need for uniform description of functionality
• Should be possible to create complex services using simpler ones: Web service composition and synthesis
This talk: automatic verification of Web services
• Finite-state abstractions
• Beyond finite-state: workflow+data
The WAVE verifier
Specifying Web services
• White box: internal logic
Simplest: finite state Mealy machines
[Figure: a Mealy machine for a service that receives order (?o) and payment (?p) and sends bill (!b) and delivery (!d)]
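The Mealy-machine view of a single service can be sketched in a few lines of Python. This is only an illustration: the state names s0 to s4 and the strictly sequential order, bill, payment, delivery flow are assumptions, and the machine on the slide may allow other orderings.

```python
# Hypothetical Mealy-style peer: each transition is labeled ?m (receive m) or !m (send m).
# State names and the strictly sequential flow are illustrative assumptions.
TRANSITIONS = {
    ("s0", "?o"): "s1",  # receive order
    ("s1", "!b"): "s2",  # send bill
    ("s2", "?p"): "s3",  # receive payment
    ("s3", "!d"): "s4",  # send delivery
}
START, FINAL = "s0", {"s4"}


def accepts(actions):
    """Return True if the action sequence drives the peer from START to a final state."""
    state = START
    for action in actions:
        key = (state, action)
        if key not in TRANSITIONS:
            return False  # no transition for this action in this state
        state = TRANSITIONS[key]
    return state in FINAL


ok_run = accepts(["?o", "!b", "?p", "!d"])
bad_run = accepts(["?o", "!d"])  # delivery before bill and payment
```

Under these assumptions the first run is accepted and the second is rejected, because no transition labeled !d leaves state s1.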
• Interacting Web services
P : finite set of Web services (peers)
C : finite set of peer-to-peer channels
M : (finite) set of message classes
[Figure: peers store, bank, supplier1 and supplier2 exchanging the messages authorize, ok, order1, receipt1, bill1, payment1, order2, receipt2, bill2, payment2]
Combining Peer and Composition Models
• Peer fsa’s begin in their start states
[Figure: the peers store, bank, supplier1 and supplier2 as Mealy machines, with transitions !o1, !a, ?k, !o2, ?b1, ?a, !k, ?o1, !b1, ?o2, ?r2, !b2]
Executing a Mealy Composition (cont.)
• STORE produces letter a and sends to BANK
Executing a Mealy Composition (cont.)
• BANK consumes letter a
• Important parameters:
-- bounded or unbounded queues
-- open or closed system
• Execution successful if all queues are empty and fsa’s in final state
[Figure: the same composition with messages (o1, o2, b1, b2, r2) waiting in the queues]
Verifying Temporal Properties of Mealy Compositions
• Label states with propositions
• Express temporal formulas in LTL, e.g.,
– “shipment just made” only after “line-of-credit avail”
[Figure: composition of store, bank, warehouse1 and warehouse2, with states labeled “line-of-credit available” and “shipment just made”]
Linear time temporal logic (LTL)
– Temporal operators
• Xp: p holds in the next time
• pUq: p holds until q holds
• Fp: p holds eventually; Gp: p always holds
• pBq: either q always holds or p holds before q fails
[Figure: timeline marking current time, next time and some time later, showing where p and q hold]
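The operators above can be made concrete with a small evaluator over ultimately periodic ("lasso") words, written as a finite prefix followed by a loop repeated forever. This is a sketch only: the tuple encoding of formulas and the function name `holds` are assumptions made for the example, and B is encoded as not(not p U not q), which matches the reading "either q always holds or p holds before q fails".

```python
def holds(formula, prefix, loop):
    """Return the set of positions of the word prefix.loop^omega where formula holds.

    Each letter is a set of proposition names; loop must be non-empty.
    Formulas are nested tuples: ("true",), ("ap", name), ("not", f), ("and", f, g),
    ("X", f), ("U", f, g), ("F", f), ("G", f), ("B", f, g).
    """
    word = list(prefix) + list(loop)
    n, positions = len(word), set(range(len(word)))

    def succ(i):
        # The last position loops back to the first position of the periodic part.
        return i + 1 if i + 1 < n else len(prefix)

    def ev(f):
        op = f[0]
        if op == "true":
            return set(positions)
        if op == "ap":
            return {i for i in positions if f[1] in word[i]}
        if op == "not":
            return positions - ev(f[1])
        if op == "and":
            return ev(f[1]) & ev(f[2])
        if op == "X":
            target = ev(f[1])
            return {i for i in positions if succ(i) in target}
        if op == "U":
            # Least fixpoint of: q, or (p and next position already in the result).
            left, result = ev(f[1]), set(ev(f[2]))
            while True:
                new = {i for i in left if succ(i) in result} - result
                if not new:
                    return result
                result |= new
        if op == "F":  # F p  ==  true U p
            return ev(("U", ("true",), f[1]))
        if op == "G":  # G p  ==  not F not p
            return ev(("not", ("F", ("not", f[1]))))
        if op == "B":  # p B q  ==  not (not p U not q)
            return ev(("not", ("U", ("not", f[1]), ("not", f[2]))))
        raise ValueError(f"unknown operator: {op}")

    return ev(formula)


P, Q = ("ap", "p"), ("ap", "q")
prefix, loop = [{"p"}, {"p"}], [{"q"}]   # the word p p q q q ...
until_positions = holds(("U", P, Q), prefix, loop)
always_q_positions = holds(("G", Q), prefix, loop)
```

On the word p p q q q ..., p U q holds at every position, while G q holds only from the third position on.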
Results on Temporal Verification
• Long history, see [Clarke et al. ’00]
• E.g.: one fsa and propositional LTL
– PSPACE in size of formula + fsa
– linear time in size of fsa
• Mealy compositions
– Bounded queues
• Composition can be simulated as Mealy machine
• Verification is decidable
• Standard techniques to reduce cost
– Unbounded queues
• In general, undecidable [Brand & Zafiropulo 83]
Alternative: temporal property of sequence of exchanged messages
[Figure: store, bank, warehouse1 and warehouse2 exchanging authorize, ok, order1, receipt1, bill1, payment1, order2, receipt2, bill2, payment2]
“conversation”: a k o1 b1 o2 p1 r1 r2 b2 p2
LTL properties: Every authorize followed by some bill?
• Conversation: sequence of exchanged messages
• Conversation language: set of all conversations between Mealy peers
• Bounded queues: regular language
Conversation languages
• Unbounded queues
[Figure: peer p1 with transitions !a and ?b, peer p2 with transitions ?a and !b, exchanging messages a and b]
• Conversation language L is not regular: L ∩ a*b* = { a^n b^n | n ≥ 0 }
• Conversation languages are context sensitive; in fact, they are quasi-realtime languages [Bultan+Fu+Hull+Su 03]
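The two-peer example can be explored by brute force. The sketch below assumes a particular reading of the figure: p1 has a single state in which it may send a or consume b, and p2 alternates between consuming one a and sending one b. It enumerates all complete conversations with a bounded number of sends and then intersects them with a*b*.

```python
import re


def conversations(max_len):
    """Enumerate complete conversations (sequences of sent messages) with at most max_len sends.

    Assumed peers: p1 loops on !a and ?b in one state; p2 cycles t0 --?a--> t1 --!b--> t0.
    A conversation is complete when both queues are empty and p2 is back in t0.
    """
    results, seen = set(), set()
    # Configuration: (state of p2, a's queued for p2, b's queued for p1, conversation so far)
    stack = [("t0", 0, 0, "")]
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        p2, qa, qb, conv = config
        if p2 == "t0" and qa == 0 and qb == 0:
            results.add(conv)
        if len(conv) < max_len:
            stack.append((p2, qa + 1, qb, conv + "a"))        # p1 sends a
            if p2 == "t1":
                stack.append(("t0", qa, qb + 1, conv + "b"))  # p2 sends b
        if qb > 0:
            stack.append((p2, qa, qb - 1, conv))              # p1 consumes b
        if p2 == "t0" and qa > 0:
            stack.append(("t1", qa - 1, qb, conv))            # p2 consumes a
    return results


all_conversations = conversations(6)
in_a_star_b_star = {w for w in all_conversations if re.fullmatch("a*b*", w)}
```

Up to six sends, the conversations inside a*b* are exactly the empty word, ab, aabb and aaabbb, which is the a^n b^n pattern; interleavings such as abab are also complete conversations but fall outside a*b*.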
Synthesis of compositions
• Given a set of peers and communication channels with message names, and a constraint Φ on the conversation language, find Mealy automata for peers so that the constraint is satisfied.
Bounded queues
• Closed systems: PSPACE for LTL; PTIME for ω-regular sets given by an automaton
• Open systems: “game” against environment
– synthesis = finding winning strategy
– undecidable for arbitrary topology
– hierarchical topology: decidable [Kupferman + Vardi 01], but non-elementary even in linear case [Pnueli+Rosner]
Synthesizing hierarchical composition from “library” of services
[Figure: a Customized Travel Service built over a database from a library containing Travel Service Templates, Air Travel Templates, Airport Transfer and Hotel Reservation]
[Figure: page flow of a computer shopping Web application, annotated with state, input, output, query and update against a database. Pages: Home page (HP) with name, password and login; Customer page (CP) with Desktop, Laptop and My order; Desktop Search and Laptop Search (SP) with Ram, Hdd and, for laptops, Display; Product index page (PIP) with matching products; Product detail page (PP); Confirmation page (CoP) with order detail; Past Order (POP); Order status (OSP); Cancel confirmation page (CCP); Error message page (MP). Buttons include search, buy, cancel and back.]
Motivation
• Interactive, data-driven Web applications: powered by an underlying database and interacting with external users according to explicit or implicit workflow
• Complexity of workflow leads to bugs: see the public database of Web site bugs (Orbitz bug)
• Static analysis required to increase confidence in correctness and robustness of applications
• Problem: Data-driven Web applications are infinite-state systems!
The WAVE Verifier
• WAVE = Web Application VErifier
• Practical, sound and complete automatic verification for data-driven applications
• Novel coupling of model checking and database optimization techniques.
[Figure: high-level WebML-style workflow of the computer shopping application. Home Page (HP) with NAME, PASSWD, login and cancel; Customer Page (CP) with desktop and laptop; Laptop Search (LSP) with RAM, CPU, SCREEN and submit; Desktop Search (DSP) with RAM, CPU and submit; Product Index (PIP) with matching products; Product Detail (PDP) with details and buy; Confirmation (CoP) with confirmation and print; Message Page (MP) with message and back. Pages query the DB and trigger state updates and actions.]
Modeling Web Applications
A Web application is described by
• the set of
– database relations D (fixed during a session)
– state relations S (updateable)
– input relations I (hold user’s input choice)
– action relations A
• and a set of web page schemas (templates)
– their contents are determined at run-time, as function of D, S, I
Webpage Schema
• Input options (user may pick at most one of them)
– Query(DB, State, Previous input)
• Actions and Transitions (triggered by user input)
– Actions: Query(DB, State, Input, Prev. input)
– Next Webpage: Query(DB, State, Input, Prev. input)
• State updates (triggered by user input)
– insertions/deletions: Query(DB, State, Input, Prev. input)
“Query”: First Order Logic formula (core SQL)
Details on Modeling the Input
• input options as provided by menus, pull-down lists, buttons, HTTP links: user must choose one
• input constants model text input boxes: name, password, etc.
• input I at previous step: prev-I can be viewed as special state
Home page (HP)
Input: name, password, clickbutton(x)
Input options: clickbutton(x) ← (x = “login” or x = “cancel”)
State update: error("bad user") ← not users(name, password) and clickbutton("login")
Page transition rules:
CP ← users(name, password) and clickbutton(“login”)
MP ← not users(name, password) and clickbutton("login")
Annotations: name and password are input constants; users is a DB table; error is a state table; CP and MP are the possible next pages.
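The Home page rules can be read as one step of a small transducer. The Python sketch below is an illustration, not WAVE's specification language: the function name, the tuple encoding of state facts and the choice to stay on HP when no transition rule fires are all assumptions.

```python
def home_page_step(users, name, password, button):
    """Apply the HP rules once and return (next_page, state_insertions).

    users: set of (name, password) pairs, standing in for the DB table users.
    name, password: input constants; button: the value chosen for clickbutton.
    """
    known = (name, password) in users   # users(name, password)
    login = button == "login"           # clickbutton("login")

    state_insertions = set()
    if not known and login:
        state_insertions.add(("error", "bad user"))   # state update rule

    if known and login:
        next_page = "CP"                # transition rule for the customer page
    elif not known and login:
        next_page = "MP"                # transition rule for the message page
    else:
        next_page = "HP"                # assumption: no rule fires, so the page does not change
    return next_page, state_insertions


USERS = {("alice", "secret")}           # hypothetical DB content
good_login = home_page_step(USERS, "alice", "secret", "login")
bad_login = home_page_step(USERS, "bob", "guess", "login")
```

The two transition guards are mutually exclusive, so the next page is uniquely defined whenever the login button is clicked.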
Product Info Page (PIP)
Input: pick(pid, price)
Input options:
pick(pid, price) ← ∃ram ∃cpu prev-search(ram, cpu) ∧ catalog(pid, ram, cpu, price)
…
Annotations: prev-search is the previous input; catalog is a DB table.
Verifying Web Applications
We want to verify properties of the possible interactions between users and the Web application.
This requires describing the interactions:
A run of Web application W is the sequence of configurations through which W evolves.
Configuration: input, page, actions, state, DB
Examples of Desirable Properties
• Semantic properties
– “no product is delivered before payment in the right amount is received"
– “no user can cancel an order that has already been shipped”
• Basic soundness of specification
– “conditions guarding transition to next Web page are mutually exclusive”
• Navigational properties
– “the shopping cart page is reachable from any page”
Expressing properties in LTL-FO
LTL-FO: First-Order logic + linear temporal logic operators
• Xp: p holds in the next time step
• pUq: p holds until q holds
• Fp: p will finally hold
• Gp: p generally holds (in all steps)
• pBq: p must hold before q holds
More expressive than classical (propositional) LTL
Example Property
Any shipped product must be previously paid for:
∀pid, uname, price [ξ(pid, uname, price) B ¬Ship(uname, pid)]
where ξ(pid, uname, price) is the formula
PP ∧ pay(price) ∧ button(“authorize payment”) ∧ pick(pid, price) ∧ prod-price(pid, price)
Annotations: PP is a page name; pay and button are inputs; pick is a state relation; prod-price is a database relation; Ship is an action.
The Verification Problem
Given Web application W and LTL-FO property P
Decide if every run of W satisfies P. If not, exhibit a counterexample run.
Challenge: infinite-state reactive system
Control: (input, state, db) → (action, state)
relational transducer
[Figure: a control box reading input, state and db and producing action and state]
Verification of Infinite-State Systems
• Typical approaches in Software Verification are unsatisfactory:
– Model checking: developed for finite-state systems described by propositional states. More expressive specifications first abstracted to propositional ones.
Unsatisfactory: can check that some payment occurred before some shipment, but not that it involved the correct amount and product.
– Theorem proving: not autonomous. Prover gets stuck during search and asks for guidance from expert user.
Verification results for LTL-FO [PODS’04]
• The verification problem is undecidable
• Restriction for decidability: input boundedness
– Essentially, input-guarded quantification, i.e. quantified variables must appear in input atoms
pick(pid, price) ← ∃ram ∃cpu prev-search(ram, cpu) ∧ catalog(pid, ram, cpu, price)
Input-bounded specs
• State, action, and transition rules use FO conditions with “input-bounded” quantification:
∃x ( input(x) ∧ φ(x) )
∀x ( input(x) → φ(x) )
state atoms have no quantified variables
prev-I can also serve as guard
• Input options definitions: ∃*FO (db, prev-input, ground state atoms)
Input-bounded LTL-FO property: FO components are input bounded
∀x G [ X reject-order(x) → (past-order(x) ∧ ¬∃y (pay(x,y) ∧ price(x,y))) ]
“An order is rejected in the next step only if it has already been ordered but not paid correctly in the current input”
Verification results:
If W and P are input-bounded, checking whether W satisfies P is PSPACE-complete
Even modest relaxations of input boundedness restriction lead to undecidability.
Extensions leading to undecidability
• Relaxing the requirement that state atoms must be ground in formula defining the input options.
Reduction: Does TM halt on input epsilon?
• Lifting the input-bounded requirement by allowing state projection.
Reduction: Implication for FDs and IDs
• Allowing Prev-I to record all previous input to I rather than the most recent one.
Reduction: Trakhtenbrot’s Theorem
• Extend the FO-LTL formulas with path quantification.
Reduction: validity of ∃*∀*FO formulas
Expressivity of Input-bounded Specs.
See demo site http://www.db.ucsd.edu for modeling of significant parts of the following Web applications:
• Dell-like computer shopping website
• Expedia
• Barnes&Noble
• GrandPrix motor sport Web site
How to verify
To check that W satisfies P, verify that there is no run satisfying ¬P.
Model checking approach (finite-state):
• Build Büchi automaton A(¬P) for ¬P
• Build automaton S accepting all runs
• Check that there is no counterexample run: emptiness of S × A(¬P)
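The final emptiness check can be illustrated on an explicit graph. In the sketch below the product automaton is assumed to be given directly as an adjacency map with initial and accepting states (this representation is an assumption of the example, not how a real model checker stores it). A Büchi automaton is non-empty exactly when some accepting state is reachable from an initial state and lies on a cycle.

```python
def reachable(edges, sources):
    """Return the states reachable from any source in one or more steps."""
    seen = set()
    stack = [t for s in sources for t in edges.get(s, ())]
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(edges.get(state, ()))
    return seen


def buchi_nonempty(edges, initial, accepting):
    """True iff some accepting state is reachable from an initial state and lies on a cycle."""
    from_initial = set(initial) | reachable(edges, initial)
    return any(f in reachable(edges, [f]) for f in accepting if f in from_initial)


# Hypothetical product graph: 0 -> 1 -> 2 -> 1
EDGES = {0: [1], 1: [2], 2: [1]}
has_counterexample = buchi_nonempty(EDGES, initial={0}, accepting={2})
no_counterexample = buchi_nonempty(EDGES, initial={0}, accepting={0})
```

A non-empty product means a counterexample run exists (here the lasso 0, 1, 2, 1, 2, ...); the property holds only when the check returns False.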
Our case: infinite-state system
Verification: build A(¬P), then search for counterexample runs accepted by A(¬P)
But: no automaton S for the runs!
Challenge in searching for counterexample runs:
– infinite runs
– infinitely many underlying databases
How to limit the search space?
Challenge: Infinite Search Space for Runs
[Figure: grid of candidate runs; one axis is the length of run, the other the number of underlying DBs, both unbounded]
Bounding the Search for Counterexample Runs
[Figure: the same grid, now bounded along both axes: length of run and number of underlying DBs]
• Periodic runs suffice: counterexample iff periodic one
– doubly-exponential length in size of spec + prop
• Sufficient to consider only DBs over a fixed domain of cardinality exponential in size of spec + prop
– doubly-exponentially many DBs
Finite search space yields decidability of verification
Key insight for PSPACE complexity
No need to explicitly materialize entire configuration:
Instead, at each step construct only those portions of DB, state and actions which can affect property.
Call them pseudoconfigurations.
Pseudorun: the resulting sequence of pseudoconfigurations.
counterexample run iff counterexample pseudorun
Pseudoconfigs have size polynomial in size of spec + prop
PSPACE verification algorithm
Pseudoconfigurations
C = a set of relevant constants extracted from the spec. and prop. + a fixed number of variables
• input picked from C
• page
• restriction of states to constants in C
• restriction of actions to constants in C
• restriction of DB to C
Pseudoruns
[Figure: a pseudorun shown as a window sliding over the DB at each step]
• pseudoconfigs have size polynomial in size of spec + prop
• we never construct entire DB, just “slide” poly window over it!
PSPACE verification algorithm
The WAVE Verifier [SIGMOD’05]
• Essentially implements search for a counterexample pseudorun
• Many tricks and heuristics to achieve good verification times
Reducing the number of pseudoruns
For computer shopping Website:
Specification contains 29 constants
(“login”, “cancel”, “submit”, “admin”, etc.).
Four DB tables of arity 2, 3, 5 and 7.
Yield
2^(29^2 + 29^3 + 29^5 + 29^7) = 2^(17,270,412,688) partial DBs!
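The exponent can be checked directly: with 29 constants, a table of arity k has 29^k possible tuples, and each possible tuple is either present or absent in a partial DB. A minimal Python check follows (the variable names are only for this example).

```python
import math

CONSTANTS = 29
ARITIES = [2, 3, 5, 7]  # the four DB tables

# Total number of distinct tuples that could appear across the four tables.
possible_tuples = sum(CONSTANTS ** arity for arity in ARITIES)

# The number of partial DBs is 2 ** possible_tuples. Rather than materializing
# that number, count its decimal digits via logarithms.
decimal_digits = math.floor(possible_tuples * math.log10(2)) + 1

print(f"{possible_tuples:,} possible tuples")
print(f"2^{possible_tuples} has about {decimal_digits:,} decimal digits")
```

The first line printed confirms the exponent 17,270,412,688 quoted on the slide.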
Heuristics for Pruning Pseudoruns
• Only some combinations of constants are relevant
Ex: button values compared to “cancel”, “login”, “submit”, “register”, user names to “admin”, but not conversely!
A DB tuple listing a user named “cancel” is irrelevant to pseudorun and should not be constructed in the first place.
• In general, dataflow analysis identifies all constants to which a DB attribute may be compared (directly or indirectly).
This limits the relevant combinations of constants when constructing partial DBs.
• Spectacular reduction: for the computer shopping website, from 2^(17,270,412,688) partial DBs down to just 8
More Tricks
• Internal representation of pseudoconfigs to
– Efficiently detect loop in periodic run
– Efficiently evaluate queries
• Early pruning of pseudoruns as soon as property is violated
• See SIGMOD’05 paper for more
Experimental Evaluation of WAVE Tool
• Online Demo at http://www.db.ucsd.edu/
• WAVE was evaluated experimentally on 4 Web applications:
– Dell-like computer shopping
– Part of Expedia, Barnes&Noble, GrandPrix
• We measured verification time for a battery of properties: all within seconds, below one minute.
• Here, report only Dell experiment. All others are similar.
Some of the Verified Properties
Property type                   Property name   Time (seconds)
Sequence pBq                    P5 (true)       4
Sequence pBq                    P7 (true)       2
Session Gp → Gq                 P9 (true)       1
Correlation Fp → Fq             P10 (true)      0.23
Correlation Fp → Fq             P11 (false)     0.29
Correlation Fp → Fq             P12 (true)      0.6
Correlation Fp → Fq             P13 (false)     0.44
Response p → Fq                 P14 (false)     0.19
Reachability Gp or Fq           P2 (true)       0.9
Reachability Gp or Fq           P3 (false)      0.37
Recurrence G(Fp)                P17 (false)     0.15
Strong non-progress F(Gp)       P15 (false)     0.26
Weak non-progress G(p → Xp)     P6 (false)      0.49
Guarantee Fp                    P1 (true)       0.02
Guarantee Fp                    P8 (false)      0.11
Shipment only after proper payment
Failure of Classical Tools
• SPIN model checker
Abstraction is unsatisfactory.
Alternative trick: Try to use SPIN to verify pseudoruns. The resulting SPIN input is too large to handle.
• PVS theorem prover
Not guaranteed to find a counterexample. Gets stuck during search, asks for guidance from expert user.
Work in progress: composition of Web services
[Figure: the shopping application composed with a payment service. Pages: Buyer Login page (BP), Category choice page (CP), Desktop Search and Laptop Search (SP), Product index page (PIP), Product detail page (PP), Confirmation page (CoP), Past Order (POP), Order status (OSP), Cancel confirmation page (CCP), User payment (UPP) with CC No and Expire date, linked to Credit Verification]
Conclusions
• Sound and complete verification is feasible for a significant class of database-powered (hence infinite-state) Web services.
• The verification times are surprisingly good: incomplete verification of software often takes days, even after abstraction.
Our results suggest that
• database-powered Web applications may be unusually well suited for automated verification.
• Coupling of database and model-checking techniques is extremely effective.
Significant to both the DB area and automatic verification.
Branching-time temporal properties
• Computation tree logic (CTL* | CTL). Add path quantifiers:
• A: “for every path”
• E: “there exists a path”
Computation tree logic (CTL)
CTL example
From every page, there is a way back to the home page:
AG EF homepage
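On a finite page graph, AG EF homepage reduces to plain reachability: every page reachable from the start page must itself be able to reach the home page. The sketch below uses page abbreviations from the talk, but the link structure is a hypothetical example.

```python
def ag_ef(links, start, target):
    """Check AG EF target: from every page reachable from start, target is reachable."""
    def reach(source):
        # Pages reachable from source in zero or more steps.
        seen, stack = {source}, [source]
        while stack:
            page = stack.pop()
            for nxt in links.get(page, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return all(target in reach(page) for page in reach(start))


# Hypothetical navigation graphs.
GOOD_SITE = {"HP": ["CP"], "CP": ["PIP", "HP"], "PIP": ["CP"]}
DEAD_END_SITE = {"HP": ["CP", "MP"], "CP": ["HP"], "MP": []}  # MP has no way back

good = ag_ef(GOOD_SITE, "HP", "HP")
bad = ag_ef(DEAD_END_SITE, "HP", "HP")
```

The second site violates the property because the message page MP is reachable but has no outgoing link.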
Verification results for CTL(*)
• Propositional transducers:
--states and outputs are propositional
--prev-I atoms are disallowed
• CTL* formulas using state, output, and
inputs interpreted as propositions
• Verification of CTL(*) formulas for propositional transducers:
--CO-NEXPTIME for CTL
--EXPSPACE for CTL*
Proof idea:
(i) show that there is a bound
on the databases that need to be considered in
order to detect a violation;
(ii) for a fixed database, reduce checking violation
to model checking for a Kripke structure
generated from the database.
Getting down to PSPACE:
• Fully propositional transducers:
inputs are also propositional
Proof technique: highly efficient model-checking technique of Kupferman, Vardi, Wolper using hesitant alternating tree automata (HAA). Reduce to checking emptiness of a one-letter word HAA.
Alternative restriction: capturing “user-driven search”
• Propositional states and actions
• Inputs are monadic, propagated using prev-I atoms
Example: allows conducting a user-driven search going through consecutive stages of refinement
For transducers with “user-driven search”:
EXPTIME for fixed out-degree of input choice
Proof: reduce to satisfiability of CTL(*) formulas by a Kripke structure
• CTL formulas can be verified in EXPTIME
• CTL* formulas can be verified in 2-EXPTIME
Application: verification for Web services
• Basic soundness of specification
“no input constant is required before it is supplied”
“the next Web page is always uniquely defined”
• Semantic properties
“no product is delivered before payment of the right amount is received"
“no user can cancel an order that has already been shipped”
• Navigational properties
“there is a way to reach the home page from any page”
Next step
• Practical tools for verification: SPIN? symbolic model-checking?
• Multiple users, sessions...
• Modeling Web service compositions
• Analysis of compositions
• Synthesis of compositions
The XML angle
• Black box: input/output signature
[Figure: a Supplier service whose messages order, delivery, bill and payment each have an XML type]
• Interacting Web services
[Figure: store, bank, supplier1 and supplier2 exchanging authorize, ok, order1, receipt1, bill1, payment1, order2, receipt2, bill2, payment2; each channel is annotated with XML types]
Quick XML Review
• XML document: labeled, unranked, ordered tree
[Figure: example tree with a root, nested section nodes, and intro and conc leaves]
• XML type: regular tree language
• XML query: tree transducer, e.g. k-pebble transducer
Thomas Schwentick’s tutorial, Games’03
Application: typechecking
[Figure: Web services A, B and C connected through an XML query (tree transducer T), with output types α and β]
Need to check: T(α) ⊆ β
Equivalently: α ⊆ T^-1(β)
Theorem: T^-1(β) is a regular tree language [Milo+Suciu+V.]
Typechecking:
1. Compute from T and β the tree automaton for T^-1(β)
2. Check that α ⊆ T^-1(β)
Caveat: no data joins
Example: application to matching for service composition
[Figure: Web service A with output type α and Web service B with input type β]
Can output of A be restructured to fit the input type β of B?
Key: describe allowed restructurings by a nondeterministic transducer T
• Given output I of service A, check whether T(I) ∩ β ≠ ∅ by constructing a tree automaton for T(I)
• If yes, produce as side effect a minimal restructuring of I that satisfies β, witnessing the nonempty intersection
Static version:
Can every output of A be restructured so as to satisfy the input type of B?
• Key: { I | T(I) ∩ β ≠ ∅ } is regular if T is k-pebble transducer
• Enough to check that α ⊆ { I | T(I) ∩ β ≠ ∅ }
Going all the way: Active XML
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the document as a tree: newspaper with children title (“Le Monde”), date (“06/10/2003”), a GetTemp call with parameter city “Paris”, and a GetEvents call with parameter “Exhibits”]
Materialization: replacing a service call by its result. It’s a recursive process.
Result of Yahoo.GetTemp:
<temp>16°C</temp>
Result of TimeOut.GetEvents, which itself contains a call to GetExhibits with city “Paris”:
<exhibits>
  <call svc="Yahoo.GetExhibits">
    <city>Paris</city>
  </call>
</exhibits>
[Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD03]
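Recursive materialization can be sketched with Python's standard `xml.etree.ElementTree`. This is not the Active XML implementation: the `SERVICES` table stands in for remote Web services, and the result returned for `Yahoo.GetExhibits` is a made-up placeholder, since the slides do not show it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for remote services: each takes the <call> element
# and returns its result as XML text.
SERVICES = {
    "Yahoo.GetTemp": lambda call: "<temp>16°C</temp>",
    "TimeOut.GetEvents": lambda call: (
        '<exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>'
    ),
    "Yahoo.GetExhibits": lambda call: "<exhibit>placeholder</exhibit>",  # invented result
}


def materialize(elem):
    """Recursively replace every <call> below elem by the result of invoking its service."""
    for index, child in enumerate(list(elem)):
        if child.tag == "call":
            result = ET.fromstring(SERVICES[child.get("svc")](child))
            result.tail = child.tail
            elem[index] = result    # swap the call for its result, keeping its position
            child = result
        materialize(child)          # a result may itself contain service calls
    return elem


doc = materialize(ET.fromstring(
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
))
remaining_calls = list(doc.iter("call"))
```

After materialization no `call` element remains: the GetEvents result was expanded a second time because it contained a nested call. The sketch assumes the chain of calls terminates; a service whose result always embeds another call to itself would recurse forever.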
• Context: peer-to-peer Web services
• Each peer
– Repository of intensional (AXML) documents
– Server: provides Web services (XQuery)
– Client: when invoking the embedded service calls
Extended type
• Restriction on where data and service calls occur in tree
[Figure: the newspaper tree with title “Le Monde”, date “06/10/2003”, a GetTemp call with city “Paris” and a GetEvents call with “Exhibits”]
• Service call input/output signatures
• Service call definitions
– input parameters: XQueries on client data
– output: XQuery on parameters and server data
Basic typechecking problem
Given AXML types and service signatures and definitions for all peers, check if:
• all AXML documents resulting from calls among peers are valid
• all service inputs and outputs satisfy the signatures
Can be checked using transducers (if no data joins)
Controlling expansion policy by typing
Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it.
[Figure: the newspaper document shown twice, with the GetTemp call (city “Paris”) alongside its result temp “16°C” and the GetEvents call (“Exhibits”)]
Why control the materialization of calls?
• For added functionality, e.g.
– Intensional data allows to get up-to-date information.
• For security reasons or capabilities, e.g.
– I don’t trust this Web service/domain,
– I don’t have the right credentials to invoke it,
– It costs money,
– Maybe the receiver doesn’t know Active XML!
• For performance reasons, e.g.
– A proxy can invoke services on behalf of a PDA.
Example scenario
• Client allows only certain service calls, specified by its type
• Can server always force its answer to satisfy the client’s schema by appropriate expansions?
• Game between server and invoked services
Example: word case
• Client type: regular language R
• Input: word a1 ... an; each ai represents a Web service call
• Output type of service a: regular language Ra
• Game: Bob chooses a in the current word, Alice responds with word in Ra to replace a
• Bob wins if resulting word is in R
Does Bob have a winning strategy on a1 ... an?
Undecidable [Segoufin,Schwentick,Muscholl – STACS’04], even if limited to “context-free games”: Ra consists of just finitely many words
Decidable under restrictions [Segoufin,Schwentick,Muscholl – STACS’04] [Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD’03]; complexity: PSPACE to EXPSPACE
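The word game can be explored with a depth-bounded search when each Ra is a finite set of words. This is only a sketch: the function name and the bound `depth` are assumptions of the example, and because the general problem is undecidable a False answer means only that Bob has no winning strategy within the given number of moves.

```python
import re


def bob_wins(word, target, outputs, depth):
    """True if Bob can force the word into the language `target` within `depth` moves.

    target: a regular expression for R.
    outputs: maps a letter a to the finite list of words Alice may answer with (Ra).
    Letters without an entry in `outputs` cannot be expanded.
    """
    if re.fullmatch(target, word):
        return True                  # Bob stops: the current word is already in R
    if depth == 0:
        return False
    for i, letter in enumerate(word):
        replies = outputs.get(letter)
        # Bob picks position i; he wins only if every possible reply still lets him win.
        if replies and all(
            bob_wins(word[:i] + reply + word[i + 1:], target, outputs, depth - 1)
            for reply in replies
        ):
            return True
    return False


# Hypothetical instance: R = b+c. Every reply for a lies in b+, so Bob wins in one move.
forced_win = bob_wins("ac", "b+c", {"a": ["b", "bb"]}, depth=3)
# Here Alice can keep answering "a", so Bob never reaches R within the bound.
no_win = bob_wins("ac", "b+c", {"a": ["b", "a"]}, depth=5)
```

In the first instance both replies lead into R; in the second Alice has a reply that leaves the word unchanged, so no bounded strategy for Bob succeeds.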
Approaches not covered here:
• Process algebras, pi-calculus
• Situation calculus
• Petri nets
• Transaction logic
• Use of ontologies, description logics
References
• [Brand and Zafiropulo 83]: [15] in Hull PODS 2003
• [Bultan Fu Hull Su 03]: [17] in Hull PODS 2003
• [Pnueli+Rosner]: [57] in Hull PODS 2003
• [Kupferman + Vardi 01]: [48] in Hull PODS 2003
• Quasi-realtime languages: accepted by nondeterministic multi-tape TM in linear time [Book Greibach]; also, smallest AFL containing the context-free languages and closed under intersection
AFL: closed under union, concatenation, +, non-erasing homomorphism, inverse homomorphism, intersection with regular languages