Post on 14-Dec-2015

Page 1: XDuce Tabuchi Naoshi, M1, Yonelab. (tabee@yl.is.s.u-tokyo.ac.jp)

XDuce

Tabuchi Naoshi, M1, Yonelab.

(tabee@yl.is.s.u-tokyo.ac.jp)

Presentation Outline

XDuce: Introduction
Regular Expression Types
Regular Expression Pattern Matching
Algorithms for Pattern Matching
Type Inference
Conclusion / Future Works
References
Xperl(?)

XDuce: For What?

A functional language for XML processing
On the basis of
  Regular Expression Types, and
  Pattern Matching
Statically typed
  i.e. outputs are statically checked for DTD conformance etc.

Advantages (vs. “untyped”)

“Untyped” XML processing: programs using DOM etc.
  Little connection between the program and the XML schema
  Validity can be checked only at run-time, if at all

Advantages (vs. “embedding”)

“Embedding”: mapping an XML schema into the language’s type system.

e.g.

<!ELEMENT person (name, mail*, tel?)> (DTD)

type person = Person of name * mail list * tel option (ML)

Advantages (vs. “embedding”)

Embedding does not suit intuition in some cases.

e.g.

Intuitively…
    (name, mail*, tel?) <: (name, mail*, tel*)
but not
    name * mail list * tel option <: name * mail list * tel list
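
The intuition on this slide can be tried out on flat label sequences. The sketch below is an illustration in Python, not XDuce. It assumes one character per element (n = name, m = mail, t = tel), ignores element contents, and only inspects sequences up to a fixed length, so it is a bounded test and not a proof of inclusion. The helper names `sequences` and `included` are made up for this example.

```python
import re
from itertools import product

# One character per element label: n = name, m = mail, t = tel.
OPTIONAL_TEL = re.compile(r"nm*t?")   # (name, mail*, tel?)
REPEATED_TEL = re.compile(r"nm*t*")   # (name, mail*, tel*)

def sequences(max_len):
    """Yield every label sequence over {n, m, t} with at most max_len labels."""
    for length in range(max_len + 1):
        for labels in product("nmt", repeat=length):
            yield "".join(labels)

def included(sub, sup, max_len=6):
    """Bounded check: every sequence matched by sub is also matched by sup."""
    return all(sup.fullmatch(s) is not None
               for s in sequences(max_len) if sub.fullmatch(s))

forward = included(OPTIONAL_TEL, REPEATED_TEL)    # tel? inside tel*
backward = included(REPEATED_TEL, OPTIONAL_TEL)   # tel* inside tel?
print(forward, backward)
```

Under these assumptions `forward` holds and `backward` does not: the sequence name, tel, tel belongs to the second type only.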

Language Features (1/2)

ML-like pattern matching
e.g.
    match p with
    | person[name[n], (ms as mail*), tel[t]]
        -> (* case: p has a tel *)
    | person[name[n], (ms as mail*)]
        -> (* case: p has no tel *)
    …

Language Features (2/2)

Type inference
e.g. if
    type Person = person[name[String], mail*, tel[String]?]
and
    p :: Person
then
    match p with person[name[n], (ms as mail*)]
⇒ n :: String, ms :: mail* are inferred.

Applications

Bookmarks (Mozilla bookmark extraction)
Html2Latex
Diff (diff for XML)
All 300 – 350 lines.

Regular Expression Types

Types are defined in regular expression form with labels
  Concatenation, alternation, repetition as basic constructors
  Labels correspond to elements of XML (person, name, mail, etc…)

Syntax of Types

    T ::= ()
        | X
        | l[T]
        | T, T     (* concat. *)
        | T | T    (* alt. *)
        | T*       (* rep. *)
where
    X : type variables
    l : labels

Recursive Types

Types can be (mutually) recursive.
e.g.
    type Folder = Entry*
    type Entry  = name[String], file[File]
                | name[String], folder[Folder]
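
One way to picture the type syntax and the Folder/Entry example is as a small syntax tree. The sketch below is in Python, not XDuce. The class names (`Empty`, `Var`, `Label`, `Concat`, `Alt`, `Rep`) and the `show` helper are assumptions made for illustration, and base types such as String are simply treated as variables.

```python
from dataclasses import dataclass

class Type:
    """Base class for the surface syntax T of regular expression types."""

@dataclass(frozen=True)
class Empty(Type):       # ()
    pass

@dataclass(frozen=True)
class Var(Type):         # X (also used here for base types such as String)
    name: str

@dataclass(frozen=True)
class Label(Type):       # l[T]
    label: str
    content: Type

@dataclass(frozen=True)
class Concat(Type):      # T, T
    first: Type
    second: Type

@dataclass(frozen=True)
class Alt(Type):         # T | T
    left: Type
    right: Type

@dataclass(frozen=True)
class Rep(Type):         # T*
    body: Type

def show(t, nested=False):
    """Render a type in the slide's concrete syntax."""
    if isinstance(t, Empty):
        return "()"
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Label):
        return f"{t.label}[{show(t.content)}]"
    if isinstance(t, Rep):
        return f"{show(t.body, nested=True)}*"
    if isinstance(t, Concat):
        text = f"{show(t.first, nested=True)}, {show(t.second, nested=True)}"
    elif isinstance(t, Alt):
        text = f"{show(t.left)} | {show(t.right)}"
    else:
        raise TypeError(f"not a type: {t!r}")
    # Parenthesize compound types when they appear under "," or "*".
    return f"({text})" if nested else text

# The mutually recursive definitions from the slide, keyed by type name.
definitions = {
    "Folder": Rep(Var("Entry")),
    "Entry": Alt(
        Concat(Label("name", Var("String")), Label("file", Var("File"))),
        Concat(Label("name", Var("String")), Label("folder", Var("Folder"))),
    ),
}

for name, body in definitions.items():
    print(f"type {name} = {show(body)}")
```

Recursion is expressed only through names in `definitions`, which mirrors how Folder and Entry refer to each other.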

Subtyping

The meaning of subtyping is as usual: all values t of T are also values of T’

    T <: T’ ⇔ ∀t. (t ∈ T ⇒ t ∈ T’)

Subtagging

Subtaggings are user-defined, “ad-hoc” subtype relations between labels
e.g.
    small <: font
… the <small> tag is a special case of the <font> tag (in HTML)

Complexity of Subtyping

With unrestricted recursion, the subtype relation (T <: T’) is equivalent to inclusion between CFGs: Undecidable!

Need some restrictions on syntax (next slide…)

Well-formedness of Types

Syntactic restriction on types to ensure “regularity”

Recursive use of types can only occur
  at the tail position of a type definition, or
  inside labels.

Well-formed Types: Examples

    type X = Int, Y
    type Y = String, X | ()
and
    type Z = String, lab[Z], String | ()
are well-formed, but
    type U = Int, U, String | ()
is not.
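
The three examples can be checked mechanically under a simplified reading of the rule: a set of definitions is rejected when a type variable that is outside every label and not in tail position leads back to the definition it occurs in. The Python sketch below encodes types as nested tuples. This encoding and the function names are assumptions for illustration, and the check is not claimed to be XDuce's exact definition.

```python
def unguarded_refs(t, tail=True):
    """Yield (name, in_tail_position) for each variable outside any label."""
    kind = t[0]
    if kind == "var":
        yield t[1], tail
    elif kind == "cat":
        yield from unguarded_refs(t[1], False)   # something follows it
        yield from unguarded_refs(t[2], tail)
    elif kind == "alt":
        yield from unguarded_refs(t[1], tail)
        yield from unguarded_refs(t[2], tail)
    elif kind == "rep":
        yield from unguarded_refs(t[1], False)   # the body may repeat
    # ("empty",) and ("label", l, T) yield nothing: labels guard their content.

def reaches(edges, src, dst):
    """True if dst can be reached from src through unguarded references."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(target for target, _ in edges[node])
    return False

def well_formed(defs):
    """Reject unguarded recursion that passes through a non-tail position."""
    edges = {name: [(ref, tail) for ref, tail in unguarded_refs(body)
                    if ref in defs]          # Int, String etc. are not edges
             for name, body in defs.items()}
    return not any(not tail and reaches(edges, ref, name)
                   for name, out in edges.items()
                   for ref, tail in out)

INT, STRING, EMPTY = ("var", "Int"), ("var", "String"), ("empty",)

good = {
    "X": ("cat", INT, ("var", "Y")),
    "Y": ("alt", ("cat", STRING, ("var", "X")), EMPTY),
    "Z": ("alt", ("cat", STRING,
                  ("cat", ("label", "lab", ("var", "Z")), STRING)), EMPTY),
}
bad = {"U": ("alt", ("cat", INT, ("cat", ("var", "U"), STRING)), EMPTY)}

print(well_formed(good), well_formed(bad))
```

With this reading, X, Y and Z pass because their recursion is either in tail position or under lab[…], while U fails because U is followed by String.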

Complexity of Subtyping, again

With well-formedness, checking the subtype relation is:
  still EXPTIME-complete, but
  acceptable in practical cases.

Pattern Matching

Pattern match can also involve regular expression types.

e.g.
    match p with
    | person[name[n], (ms as mail*), (t as tel?)] -> …

Policies of Pattern Matching

Pattern matching has two basic policies:
  First-match (as in ML): only the first pattern matched is taken
  Longest-match (as usual in regexp. matching on strings): matching is done as much as possible

First-match: Example

    (* p = person[name[…], mail, tel[…]] *)
    match p with
    | person[name[n], (ms as mail*), tel[t]]
        -> (* invoked *)
    | person[name[n], (ms as mail*), (tl as tel?)]
        -> (* not invoked *)

Longest-match: Example

    (* p = person[name, mail, mail, tel] *)
    match p with
    | … (m1 as mail*), (m2 as mail*), …
        -> (* m1 = mail, mail
              m2 = () *)
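
Both policies can be imitated on flat strings with Python's `re` module. This is only an analogy: it assumes one character per element (n = name, m = mail, t = tel), and Python's quantifiers are greedy with backtracking, which gives the same answer as longest-match in this example. The clause descriptions and the `first_match` helper are made up for illustration.

```python
import re

# First-match: clauses are tried in order and the first that matches wins.
CLAUSES = [
    ("first clause (tel required)", re.compile(r"nm*t")),
    ("second clause (tel optional)", re.compile(r"nm*t?")),
]

def first_match(value):
    """Return the description of the first clause whose pattern matches."""
    for description, pattern in CLAUSES:
        if pattern.fullmatch(value):
            return description
    return None

chosen = first_match("nmt")   # name, mail, tel: both clauses would match

# Longest-match: both groups could absorb the mails, but the first group
# takes as many as it can and leaves nothing for the second.
m1, m2 = re.fullmatch(r"n(m*)(m*)t", "nmmt").groups()

print(chosen)
print(repr(m1), repr(m2))
```

Here `chosen` is the first clause even though the second also matches, and `m1` receives both mails while `m2` is empty, as on the two example slides.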

Exhaustiveness and Redundancy

Pattern matches are checked for exhaustiveness and redundancy.
  Exhaustiveness: no “omission” of values
  Redundancy: never-matched patterns

Exhaustiveness

A pattern match P1 -> e1 | … | Pn -> en is exhaustive (wrt. input type T)
⇔ all values t ∈ T are matched by some Pi
or, equivalently,
    T <: P1 | … | Pn

Exhaustiveness: Example (1/2)

    match p with
    | person[name[n], (ms as mail*), tel[t]]
        -> ...
    | person[name[n], (ms as mail*)]
        -> ...
is an exhaustive pattern match (wrt. Person)

Exhaustiveness: Example (2/2)

    match p with
    | person[name[n], (ms as mail*), tel[t]]
        -> ...
    | person[name[n], (ms as mail+)]
        -> ...
is NOT exhaustive (wrt. Person):
    person[name[...]] does not match

Redundancy

A pattern Pi is redundant in P1 -> e1 | … | Pn -> en (wrt. input type T)
⇔ all values of T matched by Pi are also matched by P1 | ... | Pi-1

Redundancy: Example

    match p with
    | person[name[n], (ms as mail*), (tl as tel?)]
        -> ...
    | person[name[n], (ms as mail*)]
        -> ...
The second pattern is redundant:
  anything that matches the second pattern also matches the first one.
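
The exhaustiveness and redundancy checks can be imitated on flat label sequences. The Python sketch below assumes one character per element (n = name, m = mail, t = tel), writes the contents of Person as the regular expression `nm*t?`, and enumerates sequences only up to a fixed length, so it is a bounded test and not the tree-automata algorithm used by XDuce. The function names are illustrative.

```python
import re
from itertools import product

def values(input_type, max_len=6):
    """Yield the label sequences of the input type, up to max_len labels."""
    pattern = re.compile(input_type)
    for length in range(max_len + 1):
        for labels in product("nmt", repeat=length):
            candidate = "".join(labels)
            if pattern.fullmatch(candidate):
                yield candidate

def uncovered(input_type, patterns):
    """Values of the input type matched by no pattern (empty if exhaustive)."""
    return [v for v in values(input_type)
            if not any(re.fullmatch(p, v) for p in patterns)]

def redundant(input_type, patterns, i):
    """True if every value matched by patterns[i] is matched by an earlier one."""
    return all(any(re.fullmatch(p, v) for p in patterns[:i])
               for v in values(input_type) if re.fullmatch(patterns[i], v))

PERSON = r"nm*t?"   # name[...], mail*, tel[...]?

print(uncovered(PERSON, [r"nm*t", r"nm*"]))      # exhaustive pair of patterns
print(uncovered(PERSON, [r"nm*t", r"nm+"]))      # misses a person with only a name
print(redundant(PERSON, [r"nm*t?", r"nm*"], 1))  # second pattern never fires
```

The second call reports the single missing value `n`, which corresponds to person[name[...]], and the third confirms that the second pattern of the redundancy example is covered by the first.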

Algorithms for Pattern Matching

Pattern matching takes the following steps:
  Translation of values into internal forms (binary trees)
  Translation of types and patterns into internal forms (binary trees and tree automata)
  Values are matched against patterns, in terms of tree automata

Internal Forms of Values

Values are represented as binary trees internally

    t ::= ε          (* leaves *)
        | l(t, t)    (* labels *)

The first child is the content of the label; the second is the remainder of the sequence.

Internal Forms of Values: Example

    person[name[], mail[], mail[]]
is translated into
    person(name(ε, mail(ε, mail(ε, ε))), ε)
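
The translation can be written as a short recursive function. The Python sketch below assumes a value is given as a list of `(label, children)` pairs, and the names `encode` and `show` are chosen for this illustration only. The first component of each node is the encoded content of the element and the second is the encoded rest of the sequence.

```python
EPS = "ε"   # the empty sequence (a leaf)

def encode(sequence):
    """Encode a sequence of (label, children) nodes as a binary tree."""
    if not sequence:
        return EPS
    (label, children), rest = sequence[0], sequence[1:]
    return (label, encode(children), encode(rest))

def show(tree):
    """Render a binary tree in the l(t, t) notation of the slides."""
    if tree == EPS:
        return EPS
    label, content, rest = tree
    return f"{label}({show(content)}, {show(rest)})"

# person[name[], mail[], mail[]]
person = [("person", [("name", []), ("mail", []), ("mail", [])])]
print(show(encode(person)))   # person(name(ε, mail(ε, mail(ε, ε))), ε)
```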

Internal Forms of Types

Types are also translated into binary trees
    T ::= φ          (* empty *)
        | ε          (* leaves *)
        | T | T
        | l(X, X)

X ranges over states, used in tree automata

Internal Forms of Types: Tree Automata

A tree automaton M is a mapping of States -> Types
e.g.
    M(X) = name(Y, Z)
    M(Y) = ε
    M(Z) = mail(Y, Z) | ε
    ...

Internal Forms of Types: Example

    type Person = person[name[], mail*, tel[]?]
is translated into the binary tree
    person(X1, X0)
and tree automaton M, s.t.
    M(X0) = ε
    M(X1) = name(X0, X2)
    M(X2) = mail(X0, X2) | tel(X0, X0) | ε
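
Acceptance by such an automaton can be sketched in a few lines of Python. The encoding below is an assumption for illustration: each state maps to a list of alternatives, and an alternative is either ε or a triple of label, content state and rest state. X2 is given a direct tel(X0, X0) alternative so that a name followed immediately by tel is accepted, as the type mail*, tel[]? requires.

```python
EPS = "ε"   # the empty sequence (a leaf)

# State -> alternatives; an alternative is EPS or (label, content, rest).
M = {
    "X0": [EPS],
    "X1": [("name", "X0", "X2")],
    "X2": [("mail", "X0", "X2"), ("tel", "X0", "X0"), EPS],
}

def accepts(state, tree):
    """True if the binary tree belongs to the type denoted by the state."""
    for alternative in M[state]:
        if alternative == EPS:
            if tree == EPS:
                return True
        elif tree != EPS and tree[0] == alternative[0]:
            _, content_state, rest_state = alternative
            if accepts(content_state, tree[1]) and accepts(rest_state, tree[2]):
                return True
    return False

def is_person(tree):
    """Check a tree against the top-level form person(X1, X0)."""
    return (tree != EPS and tree[0] == "person"
            and accepts("X1", tree[1]) and accepts("X0", tree[2]))

with_mail_and_tel = ("person", ("name", EPS, ("mail", EPS, ("tel", EPS, EPS))), EPS)
tel_before_mail = ("person", ("name", EPS, ("tel", EPS, ("mail", EPS, EPS))), EPS)
print(is_person(with_mail_and_tel), is_person(tel_before_mail))
```

The first value is accepted and the second is rejected, since nothing may follow tel.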

Internal Forms of Patterns

Patterns are similar to types, with some additions
    P ::= (* same as types... *)
        | x : P    (* x as P *)
        | T        (* wildcard *)

Wildcards are used for variables without an “as” pattern (e.g. n in name[n] becomes n : T)

Internal Forms of Patterns: Example

Pattern
    person[name[n], (ms as mail*)]
is translated into the binary tree
    person(Y1, Y0)
and tree automaton N, s.t.
    N(Y0) = ε
    N(Y1) = name(n:T, ms:Y2)
    N(Y2) = mail(Y0, Y2) | ε

Pattern Matching (1/3)

Pattern matching has two roles
  match input values (of course!)
  bind variables to components of the input value, if matched
Written formally
    t ∈ D ⇒ V
“t is matched by D, yielding V” (V : Vars -> Values)

Pattern Matching (2/3)

The matching relation t ∈ D ⇒ V is defined by the following rules... (next slide)

Assumptions:
  D ranges over patterns and states
  A tree automaton N is implied
  (D, N) corresponds to the external pattern

Pattern Matching (3/3)

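\[
\frac{}{t \in \mathcal{T} \Rightarrow \emptyset}
\qquad
\frac{}{\epsilon \in \epsilon \Rightarrow \emptyset}
\]

\[
\frac{t \in P \Rightarrow V}
     {t \in x : P \Rightarrow V \cup \{x \mapsto t\}}
\]

\[
\frac{t \in P_1 \Rightarrow V}
     {t \in P_1 \mid P_2 \Rightarrow V}
\qquad
\frac{t \notin P_1 \quad t \in P_2 \Rightarrow V}
     {t \in P_1 \mid P_2 \Rightarrow V}
\]

\[
\frac{t_1 \in Y_1 \Rightarrow V_1 \quad t_2 \in Y_2 \Rightarrow V_2}
     {l(t_1, t_2) \in l(Y_1, Y_2) \Rightarrow V_1 \cup V_2}
\]

\[
\frac{t \in N(Y) \Rightarrow V}
     {t \in Y \Rightarrow V}
\]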
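
The matching relation t ∈ D ⇒ V can be read as a recursive function that returns the bindings V, or nothing when t is not matched. The Python sketch below uses a tuple encoding chosen for illustration: `("any",)` is the wildcard, `("eps",)` is ε, `("bind", x, D)` is x : D, `("alt", D1, D2)` is D1 | D2 tried in first-match order, `("node", l, D1, D2)` is l(D1, D2), and `("state", Y)` is unfolded through N. The automaton N and the input are those of the pattern example person[name[n], (ms as mail*)].

```python
EPS = "ε"   # the empty sequence (a leaf)

def match(t, d, N):
    """Return the bindings V if tree t is matched by d, otherwise None."""
    kind = d[0]
    if kind == "any":                       # wildcard: matches anything
        return {}
    if kind == "eps":                       # only the leaf matches ε
        return {} if t == EPS else None
    if kind == "state":                     # unfold the state through N
        return match(t, N[d[1]], N)
    if kind == "bind":                      # x : D also binds x to t
        v = match(t, d[2], N)
        return None if v is None else {**v, d[1]: t}
    if kind == "alt":                       # first-match: try D1, then D2
        v = match(t, d[1], N)
        return v if v is not None else match(t, d[2], N)
    if kind == "node":                      # l(D1, D2): match both children
        if t == EPS or t[0] != d[1]:
            return None
        v1 = match(t[1], d[2], N)
        if v1 is None:
            return None
        v2 = match(t[2], d[3], N)
        return None if v2 is None else {**v1, **v2}
    raise ValueError(f"unknown pattern form: {kind}")

N = {
    "Y0": ("eps",),
    "Y1": ("node", "name", ("bind", "n", ("any",)),
           ("bind", "ms", ("state", "Y2"))),
    "Y2": ("alt", ("node", "mail", ("state", "Y0"), ("state", "Y2")), ("eps",)),
}
PATTERN = ("node", "person", ("state", "Y1"), ("state", "Y0"))

# person[name[], mail[], mail[]] in binary-tree form
value = ("person", ("name", EPS, ("mail", EPS, ("mail", EPS, EPS))), EPS)
bindings = match(value, PATTERN, N)
print(bindings)
```

In this run `n` is bound to the (empty) content of name and `ms` to the binary tree of the two mail elements. A value with a trailing tel yields `None`, because this pattern has no case for it.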

Type Inference (1/2)

Infer types of variables in patterns
Results are exact types of variables
Type of each variable depends on
  the pattern itself, and
  the type of the input

Type Inference (2/2)

Type inference is “flow-sensitive”
  In P1 -> e1 | … | Pn -> en, inference on Pi depends on P1 ... Pi-1
Because…
  Values matched by Pi are those NOT matched by P1 ... Pi-1

Type Inference: Example (1/2)

    (* p :: person[name[], mail*, tel[]?] *)
    match p with
    | person[name[], rest] -> …
Type of rest inferred is
    mail*, tel[]?
in this case

Type Inference: Example (2/2)

    match p with
    | person[name[], tel[]] -> …
    | person[name[], rest] -> …
Type of rest becomes
    (mail+, tel[]?) | ()
in this case, because…
    person[name[], (), tel[]]
is matched by the first pattern
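
The inferred type can be cross-checked on flat label sequences by removing what the first pattern takes from the input type. The Python sketch below assumes one character per element (m = mail, t = tel), looks only at the part of the value after name[], and enumerates sequences up to a fixed length, so it is a bounded check and not the inference algorithm itself. All names in it are illustrative.

```python
import re
from itertools import product

INPUT_REST = re.compile(r"m*t?")          # mail*, tel[]?  (after name[])
FIRST_PATTERN = re.compile(r"t")          # tel[]          (first clause)
CLAIMED_TYPE = re.compile(r"(?:m+t?)?")   # (mail+, tel[]?) | ()

def all_sequences(max_len=6):
    """Yield every sequence over {m, t} with at most max_len labels."""
    for length in range(max_len + 1):
        for labels in product("mt", repeat=length):
            yield "".join(labels)

# Values that fall through to the second clause: in the input type,
# but not taken by the first pattern.
reaching_second = {s for s in all_sequences()
                   if INPUT_REST.fullmatch(s) and not FIRST_PATTERN.fullmatch(s)}

# Values described by the type claimed on the slide.
claimed = {s for s in all_sequences() if CLAIMED_TYPE.fullmatch(s)}

exact = reaching_second == claimed
print(exact)
```

Within the bound the two sets coincide: the lone tel is excluded, while the empty sequence and every sequence that starts with mail remain.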

Type Inference: Limitations

“Exact” type inference is possible only on variables
  at tail position, or
  inside labels (cf. well-formedness)

The limitation comes from the internal representation of patterns (binary trees)

Conclusion

The expressiveness of regular expression types and pattern matching is useful for XML processing.

Type inference (including checking of the subtype relation) is possible and efficient (in most practical cases).

Future Works

Precise type inference on all variables
Introducing an Any type: not possible in a naïve way
  Breaks the closure property of tree automata
  Makes type inference impossible

References

Regular Expression Pattern Matching for XML: Hosoya and Pierce

Regular Expression Types for XML: Hosoya, Vouillon, and Pierce

Available @ http://xduce.sourceforge.net/papers.html

Xperl(?)

My own current research
Regular expression types for Perl
Motivation: scripting languages
  are used more widely
  will live longer than XML

Features (in mind)

Regular expression (but not tree) types
Infer outputs of scripts, etc.
Detect possible run-time errors

Progress Report (1/3)

Parsing: Nightmare!
  ASTs can be extracted through the debug interface, fortunately :-p

Progress Report (2/3)

Semantics: no specification other than the implementation
  Trying from scratch, step by step
  Queer, esp. around side-effects and data structures
  First attempt in the world?

Progress Report (3/3)

Type System: Working along with semantics

Types are regular expressions:

    τ ::= ε | α | ττ | τ|τ | τ* …

Preliminary implementation of inference
  Still VERY trivial...

Resources

No documentation yet.
A working note is placed @ http://tabee.com/private/lab/xperl/defn.dvi, AS-IS.