Unambiguous automata inference by means of states-merging methods
François Coste, Daniel Fredouille
{fcoste|dfredoui}@irisa.fr
http://www.irisa.fr/symbiose
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
D. Fredouille and F. Coste, Unambiguous Automata Inference 2
Definitions
Alphabet: $\Sigma = \{a, b\}$
Word: abbabbabbaaa
Language: $L \subseteq \Sigma^*$
Automaton: [figure: an automaton accepting the example language L]
I- Automata inference
$L = (a+b)^* a (a+b)$, i.e. the words whose next-to-last letter is a
Classes of automata (1/3)
Nondeterministic Automata (NFA)
Deterministic Automata (DFA)
– a single initial state and at most one outgoing transition per state and input symbol
[figure: an NFA and a DFA for $L = (a+b)^* a (a+b)$]
Classes of automata (2/3)
Unambiguous Automata (UFA) [SH85]
– at most one accepting path per word
[figure: an NFA, a UFA and a DFA for $L = (a+b)^* a (a+b)$; the classes are nested, DFA ⊂ UFA ⊂ NFA]
Automata inference
Examples: $S_+ = \{aa, abab\}$
Counter-examples: $S_- = \{ba, abbb\}$
$L = (a+b)^* a (a+b)$
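A quick sanity check of this sample, assuming that "+" in the slides denotes union, so that L corresponds to the Python regular expression `[ab]*a[ab]`: every example should belong to L and no counter-example should. The helper name `in_L` is ours.

```python
import re

# L = (a+b)*a(a+b): any word over {a, b} whose next-to-last letter is a.
TARGET = re.compile(r"[ab]*a[ab]")


def in_L(word):
    return TARGET.fullmatch(word) is not None


S_plus = {"aa", "abab"}
S_minus = {"ba", "abbb"}

print(all(in_L(w) for w in S_plus))       # True: examples are in L
print(not any(in_L(w) for w in S_minus))  # True: counter-examples are not
print(in_L("abbabbabbaaa"))               # True: the example word of slide 2
```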
Why this study?
State of the art: DFA inference
Our goal: introducing some amount of non-determinism
Why?
– NFA << DFA (an NFA can be exponentially smaller than an equivalent DFA)
– inferring with less data
– inferring “explicit” representations
Method:
– extending classical DFA inference algorithms
II - Study of the DFA inference framework
Search space for NFAs [DMV94]
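[figure: the search space, bounded by the maximal canonical automaton (MCA) and the universal automaton (UA)]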
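The two bounds of this search space can be sketched in a few lines of Python, using the same assumed triple encoding. We assume the usual MCA construction, in which every example gets its own branch of fresh states and all branches share a single initial state; every automaton of the search space is then a quotient of this MCA, obtained by merging states. Function names are hypothetical.

```python
# Assumed encoding: (trans, initials, finals), trans = {(source, symbol, target)}.

def mca(examples):
    """Maximal canonical automaton: one branch of fresh states per example,
    all branches starting from the shared initial state 0."""
    trans, finals = set(), set()
    fresh = 1
    for word in sorted(examples):
        current = 0
        for symbol in word:
            trans.add((current, symbol, fresh))
            current, fresh = fresh, fresh + 1
        finals.add(current)
    return trans, {0}, finals


def universal_automaton(alphabet):
    """UA: one initial and final state with a loop on every symbol."""
    return {(0, a, 0) for a in alphabet}, {0}, {0}


def accepts(automaton, word):
    trans, initials, finals = automaton
    current = set(initials)
    for symbol in word:
        current = {q for p, a, q in trans if p in current and a == symbol}
    return bool(current & finals)


def states(automaton):
    trans, initials, finals = automaton
    return {p for p, _, _ in trans} | {q for _, _, q in trans} | initials | finals


mca_s = mca({"aa", "abab"})
ua = universal_automaton("ab")
print(len(states(mca_s)))                                 # 7 = 1 + 2 + 4
print([accepts(mca_s, w) for w in ["aa", "abab", "ab"]])  # [True, True, False]
print([accepts(ua, w) for w in ["", "ba", "abbb"]])       # [True, True, True]
```

The MCA accepts exactly $S_+$ and the UA accepts all of $\Sigma^*$, which is why they bound every hypothesis.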
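Counter-examples: compatibility
[figure: the search space between MCA and UA, split into compatible and incompatible automata]
– hypothesis $A$ compatible: $S_- \cap L(A) = \emptyset$
– hypothesis $A$ incompatible: $S_- \cap L(A) \neq \emptyset$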
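Because $S_-$ is finite, compatibility can be checked by running each counter-example through the hypothesis. A minimal sketch under the same assumed encoding (names are ours):

```python
def accepts(automaton, word):
    """Subset simulation of a (possibly nondeterministic) automaton."""
    trans, initials, finals = automaton
    current = set(initials)
    for symbol in word:
        current = {q for p, a, q in trans if p in current and a == symbol}
    return bool(current & finals)


def is_compatible(automaton, counter_examples):
    """S- ∩ L(A) = ∅: no counter-example is accepted."""
    return not any(accepts(automaton, w) for w in counter_examples)


S_minus = {"ba", "abbb"}
ua = ({(0, "a", 0), (0, "b", 0)}, {0}, {0})  # accepts every word
nfa_L = ({(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2), (1, "b", 2)},
         {0}, {2})                           # accepts L = (a+b)*a(a+b)

print(is_compatible(ua, S_minus))     # False: the UA over-generalizes
print(is_compatible(nfa_L, S_minus))  # True
```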
The search space for DFA
[figure: the DFA search space between MCA and UA, explored by state merging and by deterministic merging]
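State merging itself can be sketched as a renaming: merging $q_2$ into $q_1$ replaces $q_2$ by $q_1$ everywhere. Every accepting path survives the renaming, so the language can only grow, which is how moving through the search space generalizes the sample. The encoding and the names `merge_states` and `mca_ex` are our own assumptions; `mca_ex` is the MCA of $S_+ = \{aa, abab\}$ written out by hand.

```python
def accepts(automaton, word):
    trans, initials, finals = automaton
    current = set(initials)
    for symbol in word:
        current = {q for p, a, q in trans if p in current and a == symbol}
    return bool(current & finals)


def merge_states(automaton, q1, q2):
    """Quotient automaton in which q2 is merged into (renamed) q1."""
    def rename(q):
        return q1 if q == q2 else q
    trans, initials, finals = automaton
    return ({(rename(p), a, rename(q)) for p, a, q in trans},
            {rename(q) for q in initials},
            {rename(q) for q in finals})


# MCA of S+ = {aa, abab}: branches 0-1-2 and 0-3-4-5-6.
mca_ex = ({(0, "a", 1), (1, "a", 2), (0, "a", 3), (3, "b", 4), (4, "a", 5),
           (5, "b", 6)}, {0}, {2, 6})
merged = merge_states(mca_ex, 1, 5)  # states reached by "a" and by "aba"

for w in ["aa", "abab", "ab", "abaa", "ba"]:
    print(w, accepts(mca_ex, w), accepts(merged, w))
# aa True True / abab True True / ab False True / abaa False True / ba False False
```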
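Merging for determinization procedure
$\forall q_1, q_2 \in Q,\ \forall w \in \Sigma^*:\quad w \in \mathrm{pref}(q_1) \wedge w \in \mathrm{pref}(q_2) \Rightarrow \text{state-merging}(q_1, q_2)$
where $\mathrm{pref}(q)$ is the set of words leading from an initial state to $q$: any two states reachable by the same word are merged.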
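One possible implementation of this procedure (a sketch, not the authors' code) is a fixpoint over a union-find structure: all initial states share the empty prefix, and whenever two states are reached from the same class on the same symbol, they share a prefix and are merged. The deterministic merging operator of the next slide is then a state merge followed by this procedure. Encoding and names (`determinize_by_merging`, `deterministic_merge`) are hypothetical.

```python
def find(parent, q):
    """Union-find representative of q (with path halving)."""
    parent.setdefault(q, q)
    while parent[q] != q:
        parent[q] = parent[parent[q]]
        q = parent[q]
    return q


def determinize_by_merging(automaton):
    """Merge every pair of states sharing a prefix w, up to a fixpoint."""
    trans, initials, finals = automaton
    parent = {}

    def union(p, q):
        rp, rq = find(parent, p), find(parent, q)
        if rp == rq:
            return False
        parent[rq] = rp
        return True

    starts = list(initials)           # w = empty word: merge all initial states
    for q in starts[1:]:
        union(starts[0], q)
    changed = True
    while changed:                    # longer w: same class, same symbol
        changed = False
        targets = {}
        for p, a, q in trans:
            targets.setdefault((find(parent, p), a), []).append(q)
        for qs in targets.values():
            for q in qs[1:]:
                changed = union(qs[0], q) or changed

    def rep(q):
        return find(parent, q)
    return ({(rep(p), a, rep(q)) for p, a, q in trans},
            {rep(q) for q in initials}, {rep(q) for q in finals})


def deterministic_merge(automaton, q1, q2):
    """State merging of q2 into q1, then merging for determinization."""
    trans, initials, finals = automaton
    ren = {q2: q1}
    merged = ({(ren.get(p, p), a, ren.get(q, q)) for p, a, q in trans},
              {ren.get(q, q) for q in initials}, {ren.get(q, q) for q in finals})
    return determinize_by_merging(merged)


def accepts(automaton, word):
    trans, initials, finals = automaton
    current = set(initials)
    for symbol in word:
        current = {q for p, a, q in trans if p in current and a == symbol}
    return bool(current & finals)


def states(automaton):
    trans, initials, finals = automaton
    return {p for p, _, _ in trans} | {q for _, _, q in trans} | initials | finals


# MCA of S+ = {aa, abab}: branches 0-1-2 and 0-3-4-5-6.
mca_ex = ({(0, "a", 1), (1, "a", 2), (0, "a", 3), (3, "b", 4), (4, "a", 5),
           (5, "b", 6)}, {0}, {2, 6})
pta = determinize_by_merging(mca_ex)
print(len(states(pta)))               # 6: the prefix tree of S+
dfa = deterministic_merge(mca_ex, 0, 4)  # states reached by ε and by "ab"
print(len(states(dfa)))               # 3
print([accepts(dfa, w) for w in ["aa", "abab", "abaa", "ba", "abbb"]])
# [True, True, True, False, False]
```

On the MCA, the procedure only merges the two states reached by a. After merging the states reached by ε and ab, it generalizes $S_+$ to a 3-state DFA that still rejects both counter-examples.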
Deterministic merging operator = state-merging + merging for determinization
Very commonly used [OG92, LPP98, ...]
Demonstration of formal properties
– Merging for determinization
• makes it possible to reach the “closest” DFA from the original NFA
– Deterministic merging
• makes it possible to reach all DFAs derived from a given DFA
– ... (see tech. rep.)
III - From DFA to UFA inference
or how to introduce some amount of non-determinism in inference
Inferring non-deterministic representations: the choice of UFA
Why UFA?
– uniqueness in the search space, as for DFA: a unique “closest” UFA can be reached from a given NFA
[figure: NFA, UFA and DFA regions of the search space between MCA({aaaaa}) and UA]
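Merging for disambiguation procedure
$\forall q_1, q_2 \in Q,\ \forall w_1, w_2 \in \Sigma^*:\quad w_1 \in \mathrm{pref}(q_1) \wedge w_1 \in \mathrm{pref}(q_2) \wedge w_2 \in \mathrm{suff}(q_1) \wedge w_2 \in \mathrm{suff}(q_2) \Rightarrow \text{state-merging}(q_1, q_2)$
where $\mathrm{pref}(q)$ (resp. $\mathrm{suff}(q)$) is the set of words leading from an initial state to $q$ (resp. from $q$ to a final state): two states occurring at the same position on two accepting paths of the same word $w_1 w_2$ are merged.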
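The condition above says that $(q_1, q_2)$ is reachable (common prefix $w_1$) and co-reachable (common suffix $w_2$) in the self-product of the automaton, which is also the standard ambiguity test. A minimal sketch of the procedure under that reading, not the authors' code: collect all such pairs with $q_1 \neq q_2$, merge them, and repeat until none is left. Encoding and names are hypothetical.

```python
def ambiguous_pairs(trans, initials, finals):
    """Pairs (q1, q2), q1 != q2, with a common prefix w1 and a common suffix w2."""
    succ = {}
    for p, a, q in trans:
        succ.setdefault((p, a), set()).add(q)
    symbols = {a for _, a, _ in trans}

    def pair_succ(pair):
        for a in symbols:
            for r in succ.get((pair[0], a), ()):
                for s in succ.get((pair[1], a), ()):
                    yield (r, s)

    # w1 in pref(q1) ∩ pref(q2): the pair is reachable in the self-product.
    reach = {(i, j) for i in initials for j in initials}
    stack = list(reach)
    while stack:
        for nxt in pair_succ(stack.pop()):
            if nxt not in reach:
                reach.add(nxt)
                stack.append(nxt)
    # w2 in suff(q1) ∩ suff(q2): a pair of final states is reachable from it.
    coreach = {(p, q) for p, q in reach if p in finals and q in finals}
    changed = True
    while changed:
        changed = False
        for pair in reach - coreach:
            if any(n in coreach for n in pair_succ(pair)):
                coreach.add(pair)
                changed = True
    return {(p, q) for p, q in coreach if p != q}


def disambiguate_by_merging(automaton):
    """Merge ambiguous pairs until none is left; the result is a UFA."""
    trans, initials, finals = automaton
    while True:
        pairs = ambiguous_pairs(trans, initials, finals)
        if not pairs:
            return trans, initials, finals
        parent = {}

        def find(q):
            while parent.get(q, q) != q:
                q = parent[q]
            return q

        for p, q in pairs:
            rp, rq = find(p), find(q)
            if rp != rq:
                parent[rq] = rp
        trans = {(find(p), a, find(q)) for p, a, q in trans}
        initials = {find(q) for q in initials}
        finals = {find(q) for q in finals}


def unambiguous_merge(automaton, q1, q2):
    """State merging of q2 into q1, then merging for disambiguation."""
    trans, initials, finals = automaton
    ren = {q2: q1}
    merged = ({(ren.get(p, p), a, ren.get(q, q)) for p, a, q in trans},
              {ren.get(q, q) for q in initials}, {ren.get(q, q) for q in finals})
    return disambiguate_by_merging(merged)


def states(automaton):
    trans, initials, finals = automaton
    return {p for p, _, _ in trans} | {q for _, _, q in trans} | initials | finals


# Ambiguous automaton for L: states 1 and 3 both guess the next-to-last a.
amb = ({(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2), (1, "b", 2),
        (0, "a", 3), (3, "a", 2), (3, "b", 2)}, {0}, {2})
print(len(states(disambiguate_by_merging(amb))))     # 3: only 1 and 3 merged

mca_ex = ({(0, "a", 1), (1, "a", 2), (0, "a", 3), (3, "b", 4), (4, "a", 5),
           (5, "b", 6)}, {0}, {2, 6})
print(len(states(disambiguate_by_merging(mca_ex))))  # 7: already a UFA
print(len(states(unambiguous_merge(mca_ex, 0, 4))))  # 6
```

This illustrates why the operator is finer: the MCA is left untouched, whereas merging for determinization would merge the two a-successors of its initial state. By our hand computation, merging for determinization even collapses `amb` into the one-state universal automaton.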
Unambiguous merging = state-merging + merging for disambiguation
A finer operator than merging for determinization
Demonstration of formal properties
– Merging for disambiguation
• makes it possible to reach the “closest” UFA from the original NFA
– Unambiguous merging
• makes it possible to reach all UFAs derived from a given UFA
– ... (see tech. rep.)
IV - Comparative experiments
- Inference algorithms
- Benchmarks
- Experimental results
Algorithms
UFA
– Hill-climbing heuristic
DFA
– EDSM heuristic [LPP98]
– Hill-climbing heuristic
RFSA
– DeLeTe II [DLT01]
Counter-example use for DFA and UFA inference
Compatibility [DMV94]
– generalization of $S_+$, stopped by $S_-$
Functionality [AS95]
– generalization of both $S_+$ and $S_-$, constrained to keep the two generalized languages disjoint (empty intersection)
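Under functionality, both samples are generalized, so the check is no longer a membership test of $S_-$ but an emptiness test for the intersection of the two generalized languages. A minimal sketch, assuming a single automaton whose final states are split into a positive set and a negative set (our reading of the slide; encoding and names are hypothetical): a word lies in both languages exactly when a pair (positive final, negative final) is reachable in the self-product.

```python
# Assumed encoding: (trans, initials, f_plus, f_minus).

def two_sided_mca(positives, negatives):
    """One branch per word from shared state 0; branch ends marked + or -."""
    trans, f_plus, f_minus = set(), set(), set()
    fresh = 1
    for word, ends in ([(w, f_plus) for w in sorted(positives)]
                       + [(w, f_minus) for w in sorted(negatives)]):
        current = 0
        for symbol in word:
            trans.add((current, symbol, fresh))
            current, fresh = fresh, fresh + 1
        ends.add(current)
    return trans, {0}, f_plus, f_minus


def is_functional(trans, initials, f_plus, f_minus):
    """True iff no word reaches both an f_plus state and an f_minus state."""
    succ = {}
    for p, a, q in trans:
        succ.setdefault((p, a), set()).add(q)
    symbols = {a for _, a, _ in trans}
    seen = {(i, j) for i in initials for j in initials}
    stack = list(seen)
    while stack:
        p, q = stack.pop()
        if p in f_plus and q in f_minus:
            return False
        for a in symbols:
            for r in succ.get((p, a), ()):
                for s in succ.get((q, a), ()):
                    if (r, s) not in seen:
                        seen.add((r, s))
                        stack.append((r, s))
    return True


def merge(automaton, q1, q2):
    """State merging of q2 into q1 in a two-sided automaton."""
    def ren(q):
        return q1 if q == q2 else q
    trans, initials, f_plus, f_minus = automaton
    return ({(ren(p), a, ren(q)) for p, a, q in trans}, {ren(q) for q in initials},
            {ren(q) for q in f_plus}, {ren(q) for q in f_minus})


A = two_sided_mca({"aa", "abab"}, {"ba", "abbb"})
# Numbering: aa -> 0,1,2; abab -> 0,3..6; abbb -> 0,7..10; ba -> 0,11,12
print(is_functional(*A))                # True
print(is_functional(*merge(A, 1, 3)))   # True: both reached by "a"
print(is_functional(*merge(A, 2, 12)))  # False: "aa" now also ends in f_minus
```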
Benchmarks
[DLT01]
– Generation: DFA, NFA, regular expression
– 4 training sample sizes
– 30 languages generated for each generation mode and sample size
+ UFA generator
Evaluation based on
– average recognition level on test sets
– matches between recognition levels
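The slides do not define these two measures precisely. One plausible reading, which is our assumption, is that the recognition level is the fraction of test words an inferred automaton classifies correctly, and that a match compares two algorithms language by language, counting how often the first is better, both are equal, or the second is better. A small sketch with toy numbers (not the paper's data; function names are hypothetical):

```python
import re


def recognition_level(classify, test_positives, test_negatives):
    """Fraction of test words classified correctly."""
    correct = (sum(1 for w in test_positives if classify(w))
               + sum(1 for w in test_negatives if not classify(w)))
    return correct / (len(test_positives) + len(test_negatives))


def matches(levels_x, levels_y):
    """Per-language comparison of two algorithms: (x better, equal, y better)."""
    pairs = list(zip(levels_x, levels_y))
    x_better = sum(1 for x, y in pairs if x > y)
    equal = sum(1 for x, y in pairs if x == y)
    return x_better, equal, len(pairs) - x_better - equal


TARGET = re.compile(r"[ab]*a[ab]")  # L = (a+b)*a(a+b)


def in_L(word):
    return TARGET.fullmatch(word) is not None


def accept_all(word):
    return True


pos, neg = ["aa", "ab", "bab"], ["ba", "bb", "abbb"]
print(recognition_level(in_L, pos, neg))          # 1.0
print(recognition_level(accept_all, pos, neg))    # 0.5
print(matches([0.9, 0.8, 0.7], [0.8, 0.8, 0.9]))  # (1, 1, 1)
```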
Results
Best algorithms w.r.t. benchmarks
– DFA bench: UFA inference with hill-climbing
– UFA bench: DFA inference with hill-climbing, UFA inference with hill-climbing
– NFA bench: RFSA inference
– Reg. Expr. bench: RFSA inference, DFA inference with hill-climbing
Results
Heuristic:
– Hill-climbing >> EDSM when inferring DFAs for the NFA, regular expression and UFA benchmarks
Counter-examples:
– Compatibility Functionality
Results (matches): for each generator, comparison and training sample size, a triple of counts over the 30 generated languages
(ufa/dfa: UFA/DFA inference; C/F: compatibility/functionality; DLTII: DeLeTe II)

Generator   Comparison   200        150        100        50
UFA         ufaC-dfaC    13 13 4    11 14 5    8 16 6     15 11 4
            DLTII-ufaC   1 7 22     1 2 27     2 6 22     2 5 23
            dfaF-dfaC    4 10 16    2 7 21     2 12 16    7 10 13
            ufaF-ufaC    8 16 6     4 21 5     10 18 2    10* 16 3
DFA         ufaC-dfaC    17 6 7     13 14 3    7 15 8     13 9 8
            DLTII-ufaC   6 11 13    4 14 12    3 13 14    6 13 11
            dfaF-dfaC    1 4 25     1 6 23     2 13 15    6 6 18
            ufaF-ufaC    5 15 10    4 14 12    3* 15 11   3 12 15
NFA         ufaC-dfaC    16 9 5     11 12 7    12 10 8    12 10 8
            DLTII-ufaC   9 6 15     12 13 5    14 10 6    16 11 3
            dfaF-dfaC    10 8 12    5 13 12    5 10 15    6 12 12
            ufaF-ufaC    4 11 15    4 20 6     5 18 7     6 11 13
Reg. Expr.  ufaC-dfaC    5 12 13    11 6 13    10 14 6    6 10 14
            DLTII-ufaC   15 8 7     17 8 5     10 10 10   18 9 3
            dfaF-dfaC    7 12 11    5 13 12    5 16 9     3 23 4
            ufaF-ufaC    9 13 8     9 5 16     5 13 12    6 13 11
Conclusion
UFA inference
– Merging for disambiguation
– Heuristic
Comparison with EDSM & DeLeTe II
Perspectives
– Speeding up the algorithm
– Application
– Using properties of the DFA/UFA space
References
[AS95] Alquézar, Sanfeliu, “Incremental grammatical inference from positive and negative data using unbiased finite state automata”, SSPR ’94
[DMV94] Dupont et al., “What is the search space of the regular inference?”, ICGI ’94
[DLT00] Denis et al., “Learning regular languages using nondeterministic automata”, ICGI ’00
[SH85] Stearns, Hunt, “On the equivalence and containment problems for unambiguous regular expressions, regular grammars and finite automata”, SIAM vol. 14
[tech. rep.] Coste, Fredouille, “What is the search space for NFA, UFA and DFA inference?”, IRISA