
On Decidability of Nominal Subtyping with Variance Andrew Kennedy (Microsoft Research Cambridge) Benjamin Pierce (University of Pennsylvania)

Posted on 20-Dec-2015


Page 1

On Decidability of Nominal Subtyping with Variance
Andrew Kennedy (Microsoft Research Cambridge)
Benjamin Pierce (University of Pennsylvania)

Page 2

Compiler demo

Java 1.6

class N<Z> { }
class C<X> extends N<N<? super C<C<X>>>> {
    N<? super C<Object>> cast(C<Object> c) { return c; }
}

Scala 2.3.1

class N[-Z]
class C extends N[N[C]] {
  def cast(c: C): N[C] = c
}

.NET 2.0

.class interface N<-Z> { }

.class C implements class N<class N<class C>> {
  .method static class N<class C> cast(class C c) cil managed {
    .maxstack 1
    ldarg.0
    ret
  }
}

Run compilers!

Page 3

Two features in common

1. Generic inheritance

Java:  class C<X,Y> extends D<E<X>> implements I<Y>
Scala: class C[X,Y] extends D[E[X]] with I[Y]
.NET:  .class C<X,Y> extends class D<class E<!X>> implements class I<!Y>

2. Generic variance

Java:  interface Func<X,Y> ... Func<? super C, ? extends D> ...
Scala: trait Func[-X,+Y]
.NET:  .class interface Func<-X,+Y>

Page 4

Generic inheritance

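Inheritance declaration has form

C<X1,...,Xn> <:: V, …

where C is the generic class name, X1,...,Xn are its formal type parameters, <:: is short for “extends or implements”, and the supertypes V, … may use X1,...,Xn.

Syntax-directed subtyping rule:

$$\frac{C\langle X_1,\ldots,X_n\rangle <:: V \qquad V[T_1/X_1,\ldots,T_n/X_n] <: D\langle U_1,\ldots,U_m\rangle \qquad C \neq D}{C\langle T_1,\ldots,T_n\rangle <: D\langle U_1,\ldots,U_m\rangle}\ (\mathrm{SUPER})$$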
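To make (SUPER) concrete, here is a minimal Java sketch (Java 16+, for records) of the substitution V[T1/X1,...,Tn/Xn] applied to one declaration. It assumes a hypothetical representation in which a type is a class name applied to argument types and a type variable is just a bare name listed among the class's parameters; `Ty`, `Decl`, `subst` and `immediateSupers` are illustrative names, not taken from the paper.

```java
import java.util.*;

public class SuperRule {
    // Hypothetical representation: a class name applied to arguments; a type variable is a bare name.
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }

        @Override public String toString() {
            if (args.isEmpty()) return name;
            StringJoiner j = new StringJoiner(",", name + "<", ">");
            for (Ty a : args) j.add(a.toString());
            return j.toString();
        }
    }

    // Hypothetical inheritance declaration C<X1,...,Xn> <:: V1, ..., Vk.
    record Decl(List<String> params, List<Ty> supers) {}

    // V[T1/X1,...,Tn/Xn]: replace each bare formal parameter by its actual argument.
    static Ty subst(Ty v, Map<String, Ty> s) {
        if (v.args().isEmpty() && s.containsKey(v.name())) return s.get(v.name());
        List<Ty> out = new ArrayList<>();
        for (Ty a : v.args()) out.add(subst(a, s));
        return new Ty(v.name(), out);
    }

    // Every V[T/X] for C<T1,...,Tn>: the candidate left-hand sides of the (SUPER) premise.
    static List<Ty> immediateSupers(Map<String, Decl> decls, Ty t) {
        Decl d = decls.get(t.name());
        if (d == null) return List.of();              // undeclared class: no supertypes
        Map<String, Ty> s = new HashMap<>();
        for (int i = 0; i < d.params().size(); i++) s.put(d.params().get(i), t.args().get(i));
        List<Ty> out = new ArrayList<>();
        for (Ty v : d.supers()) out.add(subst(v, s));
        return out;
    }

    // Example 2 from the slides: class C<Y> <:: N<N<C<C<Y>>>>.
    static String demo() {
        Ty y = Ty.of("Y");
        Map<String, Decl> decls = Map.of("C",
            new Decl(List.of("Y"), List.of(Ty.of("N", Ty.of("N", Ty.of("C", Ty.of("C", y)))))));
        return immediateSupers(decls, Ty.of("C", Ty.of("A"))).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [N<N<C<C<A>>>>]
    }
}
```

Substituting A for Y yields the single premise type N<N<C<C<A>>>>, which (SUPER) then compares against the right-hand side D<U1,...,Um>.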

Page 5

Generic variance

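Variance declaration, e.g. C<+X,-Y,Z> <:: ...

Subtyping rule:

$$\frac{C\langle v_1 X_1,\ldots,v_n X_n\rangle \qquad \forall i,\ T_i <:^{v_i} U_i}{C\langle T_1,\ldots,T_n\rangle <: C\langle U_1,\ldots,U_n\rangle}\ (\mathrm{VAR})$$

where each variance annotation vi is +, - or ±, and determines the subtyping direction for the i’th argument:

T <:+ U means T <: U
T <:- U means U <: T
T <:± U means T = U

Java has wildcards, a.k.a. use-site variance. These can mimic declaration-site variance, e.g.

$$\frac{T <: U \qquad U' <: T'}{C\langle ?\ \mathtt{extends}\ T,\ ?\ \mathtt{super}\ T',\ V\rangle <: C\langle ?\ \mathtt{extends}\ U,\ ?\ \mathtt{super}\ U',\ V\rangle}$$

Figure 1: The subtyping relation

$$\frac{\forall i,\ T_i <:^{\mathrm{var}(C\#i)} U_i}{C\langle\bar{T}\rangle <: C\langle\bar{U}\rangle}\ (\mathrm{Var}) \qquad \frac{C\langle\bar{X}\rangle <:: V \qquad V[\bar{T}/\bar{X}] <: D\langle\bar{U}\rangle \qquad C \neq D}{C\langle\bar{T}\rangle <: D\langle\bar{U}\rangle}\ (\mathrm{Super})$$

$$\frac{T <: U}{T <:^{+} U} \qquad \frac{}{T <:^{\pm} T} \qquad \frac{U <: T}{T <:^{-} U}$$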
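Read bottom-up, (VAR) and (SUPER) suggest an obvious recursive checker. The Java sketch below (Java 16+) is an illustration rather than the authors' code: it encodes variances as the characters '+', '-' and '=' (standing in for ±), which is an assumption, and caps the number of recursive calls with a hypothetical `fuel` counter, because on Example 1 the naive search never stops.

```java
import java.util.*;

public class NaiveSubtype {
    // Hypothetical representation: a class name applied to arguments; type variables are bare names.
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }
    }

    // Hypothetical declaration: parameter names, one variance per parameter
    // ('+', '-', or '=' standing for ±), and the declared supertypes.
    record Decl(List<String> params, String variances, List<Ty> supers) {}

    static final class OutOfFuel extends RuntimeException {}

    private final Map<String, Decl> decls;
    private int fuel;

    NaiveSubtype(Map<String, Decl> decls, int fuel) { this.decls = decls; this.fuel = fuel; }

    // V[T1/X1,...,Tn/Xn]
    static Ty subst(Ty v, Map<String, Ty> s) {
        if (v.args().isEmpty() && s.containsKey(v.name())) return s.get(v.name());
        List<Ty> out = new ArrayList<>();
        for (Ty a : v.args()) out.add(subst(a, s));
        return new Ty(v.name(), out);
    }

    // T <:v U: the direction is chosen by the variance annotation.
    boolean subV(char v, Ty t, Ty u) {
        if (v == '+') return sub(t, u);
        if (v == '-') return sub(u, t);
        return t.equals(u);                              // invariant (±): T = U
    }

    // Direct reading of (VAR) and (SUPER); every generic class needs a Decl.
    boolean sub(Ty t, Ty u) {
        if (--fuel < 0) throw new OutOfFuel();
        Decl d = decls.get(t.name());
        if (t.name().equals(u.name())) {                 // (VAR): same class C
            for (int i = 0; i < t.args().size(); i++)
                if (!subV(d.variances().charAt(i), t.args().get(i), u.args().get(i))) return false;
            return true;
        }
        if (d == null) return false;                     // no supertypes to climb
        Map<String, Ty> s = new HashMap<>();
        for (int i = 0; i < d.params().size(); i++) s.put(d.params().get(i), t.args().get(i));
        for (Ty v : d.supers())                          // (SUPER): try each declared supertype
            if (sub(subst(v, s), u)) return true;
        return false;
    }

    // Returns "true", "false", or "out of fuel".
    static String check(Map<String, Decl> decls, Ty t, Ty u) {
        try {
            return String.valueOf(new NaiveSubtype(decls, 1000).sub(t, u));
        } catch (OutOfFuel e) {
            return "out of fuel";
        }
    }

    // Example 1 from the slides: class N<-X>, class C <:: N<N<C>>.
    static Map<String, Decl> example1() {
        return Map.of(
            "N", new Decl(List.of("X"), "-", List.of()),
            "C", new Decl(List.of(), "", List.of(Ty.of("N", Ty.of("N", Ty.of("C"))))));
    }

    public static void main(String[] args) {
        Ty c = Ty.of("C");
        System.out.println(check(example1(), c, Ty.of("N", Ty.of("N", c)))); // true
        System.out.println(check(example1(), c, Ty.of("N", c)));             // out of fuel
    }
}
```

The first query succeeds in a few steps; the second alternates between C <: NC and NNC <: NC until the fuel runs out.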

Page 6

What goes wrong?

Example 1: class N<-X>, class C <:: NNC (short for N<N<C>>)

Question: C <: NC ?
→ (by inheritance rule) NNC <: NC ?
→ (by variance rule) C <: NC ?

Oops. We’re back where we started.

Page 7

What goes wrong?

Example 2: class N<-X>, class C<Y> <:: NNCCY (short for N<N<C<C<Y>>>>)

Question: CA <: NCB ?
→ (by inheritance rule) NNCCA <: NCB ?
→ (by variance rule) CB <: NCCA ?
→ (by inheritance rule) NNCCB <: NCCA ?
→ (by variance rule) CCA <: NCCB ?

Oops. Types are growing forever...

Page 8

Even when it goes right...

Example 3:
class N<-X>
class C0<Y> <:: NNY
class C1<Y> <:: C0C0Y
...
class Cn<Y> <:: Cn-1Cn-1Y

Question: CnNA <: NCnA ?
Answer: yes, by a derivation that uses $2^{n+1}$ instances of variance.

Page 9

Research outline

Start with “essence of generic Java/Scala/.NET subtyping”:
• ground subtyping only (future: open types with bounds on type parameters)
• declaration-site variance (same issues, and more, arise in wildcards/use-site variance)

Investigate algorithmics of subtyping.

Presentations of Java-style subtyping are typically declarative, e.g. FGJ, Wild FJ, Viroli/Igarashi Variant FGJ.

So the first step is to present syntax-directed (a.k.a. algorithmic) rules and prove transitivity; equivalence of the declarative and algorithmic systems follows.

Not trivial: see the Appendix of the paper for the proof.

Page 10

Start with General Problem

Just two restrictions on inheritance:
1. Acyclicity: if C<T> <:: ... <:: D<U> then C ≠ D
2. Variance-respecting: e.g. C<+X> <:: N<X> is illegal if N is contravariant

Theorem. Subtyping is undecidable. (Java, Scala, and .NET all impose further restrictions on inheritance, so this result does not transfer.)

Page 11

Post Correspondence Problem
Given a sequence of pairs (u1,v1),...,(un,vn) of words over a finite alphabet, find an index sequence i1,...,im such that

$$u_{i_1}\cdots u_{i_m} = v_{i_1}\cdots v_{i_m}$$

Example problem:

i   ui    vi
1   a     ab
2   b     ca
3   ca    a
4   abc   c

Solution: 1 2 3 1 4, since u1 u2 u3 u1 u4 = abcaaabc = v1 v2 v3 v1 v4.
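PCP is only semi-decidable, so a program can search for solutions only up to some bound. As an illustration (not part of the reduction), this hypothetical Java sketch performs a breadth-first search over index sequences of length at most `maxLen`, discarding any partial sequence in which neither concatenation is a prefix of the other, and recovers the solution of the example problem.

```java
import java.util.*;

public class PcpSearch {
    // Returns a shortest solution (1-based indices) with at most maxLen indices, or null.
    static List<Integer> solve(String[] u, String[] v, int maxLen) {
        Deque<List<Integer>> queue = new ArrayDeque<>();
        queue.add(List.of());
        while (!queue.isEmpty()) {
            List<Integer> seq = queue.poll();
            if (seq.size() == maxLen) continue;               // bound reached: do not extend
            for (int i = 1; i <= u.length; i++) {
                List<Integer> next = new ArrayList<>(seq);
                next.add(i);
                String su = concat(u, next), sv = concat(v, next);
                if (su.equals(sv)) return next;
                if (su.startsWith(sv) || sv.startsWith(su)) queue.add(next); // still extendable
            }
        }
        return null;                                          // none within the bound
    }

    static String concat(String[] words, List<Integer> seq) {
        StringBuilder sb = new StringBuilder();
        for (int i : seq) sb.append(words[i - 1]);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] u = {"a", "b", "ca", "abc"};
        String[] v = {"ab", "ca", "a", "c"};
        List<Integer> sol = solve(u, v, 8);
        System.out.println(sol + " " + concat(u, sol)); // [1, 2, 3, 1, 4] abcaaabc
    }
}
```

Returning null only means that no solution exists within the bound; in general no bound suffices, which is exactly why the problem is undecidable.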

Page 12

Undecidability of subtyping: proof

Post Correspondence Problem is undecidable

Reduce an instance of PCP to an instance of subtyping under some inheritance declarations.

Represent letters of the alphabet by unary generic classes, and define a non-generic class E for “end-of-word”:

class a<X>   class b<X>   class c<X>   class E

Words are represented by repeated type application, e.g.

abca  is represented as  a<b<c<a<E>>>>
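The encoding is a plain string construction. In this small Java sketch, `encode` is a hypothetical helper; its `tail` argument is E for a complete word, while a type variable such as X gives the open form used when a word is prepended to an already-accumulated one.

```java
public class WordEncoding {
    // Wraps tail in one unary class per letter: encode("ab", "X") = a<b<X>>.
    static String encode(String word, String tail) {
        StringBuilder sb = new StringBuilder();
        for (char ch : word.toCharArray()) sb.append(ch).append('<');
        return sb.append(tail).append(">".repeat(word.length())).toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("abca", "E")); // a<b<c<a<E>>>>
        System.out.println(encode("ab", "X"));   // a<b<X>>
    }
}
```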

Page 13

Undecidability of subtyping: proof

The state of the search for a solution is encoded by the subtype problem

C<u,v> <: NC<u,v> ?

where u and v are the currently-accumulated words, N is contravariant, and the choice of the next word is encoded by multiple supertypes of C. Class B is used to choose the very first word. All Ni are contravariant; S is invariant. Below, uiX stands for the encoding of word ui with X in place of E (for example, abX is a<b<X>>).

class C<X,Y> <:: NN1C<u1X, v1Y>
             <:: N1NC<u1X, v1Y>
             ...
             <:: NNnC<unX, vnY>
             <:: NnNC<unX, vnY>
             <:: NSX
             <:: SY

class B <:: NN1C<u1E, v1E>
        <:: N1NC<u1E, v1E>
        ...
        <:: NNnC<unE, vnE>
        <:: NnNC<unE, vnE>

It turns out that

B <: NB iff $u_{i_1}\cdots u_{i_m} = v_{i_1}\cdots v_{i_m}$ for some i1,...,im.
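The declarations are generated mechanically from the PCP instance. This Java sketch (a hypothetical helper that prints types in full bracket notation rather than the abbreviated NN1C<...> form) builds the supertype lists of C and B; the variances of N, Ni and S are as stated above and are not represented in the output.

```java
import java.util.*;

public class PcpReduction {
    // Encodes word w applied to tail: enc("ab", "X") = a<b<X>>.
    static String enc(String w, String tail) {
        StringBuilder sb = new StringBuilder();
        for (char ch : w.toCharArray()) sb.append(ch).append('<');
        return sb.append(tail).append(">".repeat(w.length())).toString();
    }

    // The 2n "choice" supertypes; tails (X, Y) give class C, tails (E, E) give class B.
    static List<String> supers(String[] u, String[] v, String tx, String ty) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < u.length; i++) {
            String c = "C<" + enc(u[i], tx) + "," + enc(v[i], ty) + ">";
            String ni = "N" + (i + 1);
            out.add("N<" + ni + "<" + c + ">>");   // N Ni C<uiX, viY>
            out.add(ni + "<N<" + c + ">>");        // Ni N C<uiX, viY>
        }
        return out;
    }

    static List<String> declarations(String[] u, String[] v) {
        List<String> c = new ArrayList<>(supers(u, v, "X", "Y"));
        c.add("N<S<X>>");                          // NSX
        c.add("S<Y>");                             // SY
        return List.of(
            "class C<X,Y> <:: " + String.join(", ", c),
            "class B <:: " + String.join(", ", supers(u, v, "E", "E")));
    }

    public static void main(String[] args) {
        String[] u = {"a", "b", "ca", "abc"};
        String[] v = {"ab", "ca", "a", "c"};
        declarations(u, v).forEach(System.out::println);
    }
}
```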

Page 14

Example

class C<X,Y> <:: NN1C<aX,abY>     (C1)
             <:: N1NC<aX,abY>     (C1’)
             ...
             <:: NN4C<abcX,cY>    (Cn)
             <:: N4NC<abcX,cY>    (Cn’)
             <:: NSX              (L)
             <:: SY               (R)

Steps of subtyping derivation:

B <: NB
→ …
→ C<bcaaabcE,caaabcE> <: NC<bcaaabcE,caaabcE>
→ (by C1) NN1C<abcaaabcE,abcaaabcE> <: NC<bcaaabcE,caaabcE>
→ (by VAR) C<bcaaabcE,caaabcE> <: N1C<abcaaabcE,abcaaabcE>
→ (by C1’) N1NC<abcaaabcE,abcaaabcE> <: N1C<abcaaabcE,abcaaabcE>
→ (by VAR) C<abcaaabcE,abcaaabcE> <: NC<abcaaabcE,abcaaabcE>
→ (by L) NSabcaaabcE <: NC<abcaaabcE,abcaaabcE>
→ (by VAR) C<abcaaabcE,abcaaabcE> <: SabcaaabcE
→ (by R) SabcaaabcE <: SabcaaabcE
→ (by reflexivity) QED.

i   ui    vi
1   a     ab
2   b     ca
3   ca    a
4   abc   c

Page 15

Ingredients of undecidability
1. Contravariance (used to send a term to the other side and back again)
2. Unbounded growth in the size of the subtype assertion (used to accumulate the concatenation of words)
3. Multiple instantiation inheritance (used to encode the choice of words)

Idea: investigate contribution of each of these ingredients by eliminating them, one at a time.

Page 16

Ingredient 1: contravariance

Theorem. If no parameters are contravariant, then subtyping is decidable.

Proof. Define well-founded order on subtype assertions:

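$$(T_1 <: U_1) \prec (T_2 <: U_2) \iff \mathrm{size}(U_1) < \mathrm{size}(U_2) \ \lor\ \big(\mathrm{size}(U_1) = \mathrm{size}(U_2) \wedge T_2 <::^{+} T_1\big)$$

where <::+ is the transitive closure of <::. This order decreases from conclusion to premises in the subtyping rules, so the rule-based algorithm terminates.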

Page 17

Ingredient 2: unbounded growth of types

Definitions. A set of types S is inheritance closed if the following conditions hold:

• Inheritance: if T ∈ S and T <:: U then U ∈ S, and
• Decomposition: if C<T1,...,Tn> ∈ S then T1,...,Tn ∈ S.

The inheritance closure of a set is the least superset that is inheritance closed.

Class declarations are finitary if inheritance closure of any finite set of types is finite.

Example 2 (class N<-X>, class C<Y> <:: NNCCY): the inheritance closure of { CA } is infinite, since it includes { A, CA, CCA, CCCA, ... }, so these declarations are not finitary.

Theorem. For finitary inheritance, subtyping is decidable.
Proof. The algorithm simply maintains a list of “visited” goals to detect cycles. As the inheritance closure is finite, the algorithm explores only a finite set of types, and hence terminates.
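A minimal Java sketch (Java 16+) of such a cycle-detecting algorithm, under a hypothetical encoding of variances as '+', '-' and '=' (for ±). Rather than a global list, it keeps only the goals on the current search path and fails a branch when a goal recurs, since a goal that depends on itself has no finite derivation along that branch. Example 1 (where C is non-generic) is finitary, so the search below terminates.

```java
import java.util.*;

public class FinitarySubtype {
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }
    }
    // Hypothetical declaration format: variances are '+', '-' or '=' (for ±).
    record Decl(List<String> params, String variances, List<Ty> supers) {}
    record Goal(Ty lhs, Ty rhs) {}

    private final Map<String, Decl> decls;
    private final Set<Goal> onPath = new HashSet<>();   // goals currently being proved

    FinitarySubtype(Map<String, Decl> decls) { this.decls = decls; }

    static Ty subst(Ty v, Map<String, Ty> s) {
        if (v.args().isEmpty() && s.containsKey(v.name())) return s.get(v.name());
        List<Ty> out = new ArrayList<>();
        for (Ty a : v.args()) out.add(subst(a, s));
        return new Ty(v.name(), out);
    }

    boolean subV(char v, Ty t, Ty u) {
        if (v == '+') return sub(t, u);
        if (v == '-') return sub(u, t);
        return t.equals(u);                              // invariant: T = U
    }

    boolean sub(Ty t, Ty u) {
        Goal g = new Goal(t, u);
        // A goal met again while it is still being proved has no finite derivation on this path.
        if (!onPath.add(g)) return false;
        try {
            Decl d = decls.get(t.name());
            if (t.name().equals(u.name())) {             // (VAR)
                for (int i = 0; i < t.args().size(); i++)
                    if (!subV(d.variances().charAt(i), t.args().get(i), u.args().get(i))) return false;
                return true;
            }
            if (d == null) return false;
            Map<String, Ty> s = new HashMap<>();
            for (int i = 0; i < d.params().size(); i++) s.put(d.params().get(i), t.args().get(i));
            for (Ty v : d.supers())                      // (SUPER), with back-tracking
                if (sub(subst(v, s), u)) return true;
            return false;
        } finally {
            onPath.remove(g);
        }
    }

    static boolean check(Map<String, Decl> decls, Ty t, Ty u) {
        return new FinitarySubtype(decls).sub(t, u);
    }

    // Example 1: class N<-X>, class C <:: N<N<C>>.
    static Map<String, Decl> example1() {
        return Map.of(
            "N", new Decl(List.of("X"), "-", List.of()),
            "C", new Decl(List.of(), "", List.of(Ty.of("N", Ty.of("N", Ty.of("C"))))));
    }

    public static void main(String[] args) {
        Ty c = Ty.of("C");
        System.out.println(check(example1(), c, Ty.of("N", c)));             // false
        System.out.println(check(example1(), c, Ty.of("N", Ty.of("N", c)))); // true
    }
}
```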

Page 18

Characterizing infinitary inheritance

Syntactic characterization (due to Viroli): create a type parameter dependency graph that represents uses of formal type parameters in inheritance declarations.

• Nodes are formal parameters.
• Non-expansive edges represent “naked” uses of type parameters.
• Expansive edges represent “nested” uses of type parameters.
• Inheritance is infinitary iff a cycle contains an expansive edge.

Example 2: class N<-W>, class C<Y> <:: NNCCY
Example 1a: class N<-X>, class D<Z> <:: NNDZ

[Figure: dependency graphs for Example 1a (nodes X, Z; finitary) and Example 2 (nodes W, Y; infinitary)]
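The characterization can be sketched in Java (Java 16+) as follows. The precise edge rule is our reading of “naked” and “nested”: for each subterm D<T1,...,Tm> of a supertype of C, a parameter X of C with Tj = X gives a non-expansive edge to D’s j’th parameter, and X occurring strictly inside Tj gives an expansive one. Class and method names are hypothetical, and variances are omitted because they do not affect the graph.

```java
import java.util.*;

public class ExpansiveCycles {
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }

        // Does the type variable x occur anywhere inside this type?
        boolean mentions(String x) {
            if (args.isEmpty()) return name.equals(x);
            for (Ty a : args) if (a.mentions(x)) return true;
            return false;
        }
    }
    record Decl(List<String> params, List<Ty> supers) {}
    record Edge(String from, String to, boolean expansive) {}   // nodes are "Class.Param"

    static List<Edge> edges(Map<String, Decl> decls) {
        List<Edge> out = new ArrayList<>();
        decls.forEach((c, d) -> {
            for (Ty v : d.supers()) collect(decls, c, d.params(), v, out);
        });
        return out;
    }

    // Visit every subterm D<T1,...,Tm> of a supertype of class c.
    static void collect(Map<String, Decl> decls, String c, List<String> params, Ty v, List<Edge> out) {
        for (int j = 0; j < v.args().size(); j++) {
            Ty tj = v.args().get(j);
            String target = v.name() + "." + decls.get(v.name()).params().get(j);
            for (String x : params) {
                if (!tj.mentions(x)) continue;
                boolean naked = tj.args().isEmpty();              // tj is exactly x
                out.add(new Edge(c + "." + x, target, !naked));
            }
            collect(decls, c, params, tj, out);
        }
    }

    static boolean reachable(List<Edge> es, String from, String to) {
        Deque<String> work = new ArrayDeque<>(List.of(from));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String n = work.pop();
            if (n.equals(to)) return true;
            if (!seen.add(n)) continue;
            for (Edge e : es) if (e.from().equals(n)) work.push(e.to());
        }
        return false;
    }

    // Infinitary iff some expansive edge X -> Y lies on a cycle, i.e. X is reachable from Y.
    static boolean infinitary(Map<String, Decl> decls) {
        List<Edge> es = edges(decls);
        for (Edge e : es) if (e.expansive() && reachable(es, e.to(), e.from())) return true;
        return false;
    }

    static Ty t(String name, Ty... args) { return Ty.of(name, args); }

    // Example 2: class N<-W>, class C<Y> <:: N<N<C<C<Y>>>>
    static Map<String, Decl> example2() {
        return Map.of(
            "N", new Decl(List.of("W"), List.of()),
            "C", new Decl(List.of("Y"), List.of(t("N", t("N", t("C", t("C", t("Y"))))))));
    }

    // Example 1a: class N<-X>, class D<Z> <:: N<N<D<Z>>>
    static Map<String, Decl> example1a() {
        return Map.of(
            "N", new Decl(List.of("X"), List.of()),
            "D", new Decl(List.of("Z"), List.of(t("N", t("N", t("D", t("Z")))))));
    }

    // Example 2b: class N<-W>, class C<Y> <:: N<N<E<C<Y>>>>, class E<V> <:: C<V>
    static Map<String, Decl> example2b() {
        return Map.of(
            "N", new Decl(List.of("W"), List.of()),
            "C", new Decl(List.of("Y"), List.of(t("N", t("N", t("E", t("C", t("Y"))))))),
            "E", new Decl(List.of("V"), List.of(t("C", t("V")))));
    }

    public static void main(String[] args) {
        System.out.println(infinitary(example2()));  // true
        System.out.println(infinitary(example1a())); // false
        System.out.println(infinitary(example2b())); // true
    }
}
```

In Example 1a the expansive edges lead from Z to N’s parameter X, which has no outgoing edges, so no cycle passes through an expansive edge; in Example 2 the nested use of Y in C<C<Y>> gives an expansive self-loop.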

Page 19

Characterizing infinitary inheritance


Example 2b: class N<-W>, class C<Y> <:: NNECY, class E<V> <:: C<V>
Example 1a: class N<-X>, class D<Z> <:: NNDZ

[Figure: dependency graphs for Example 1a (nodes X, Z; finitary) and Example 2b (nodes W, Y, V; infinitary)]

Page 20

Ingredient 3: multiple instantiation inheritance

C# permits a class to implement the same generic interface at different instantiations, e.g.

class E : IEnumerator<int>, IEnumerator<string>

Instantiations must be non-overlapping, e.g.

class C<X> : I<X>, I<object>
class D<Y,Z> : I<Y>, I<Z>

are illegal.

Java outlaws multiple instantiation inheritance (it can’t be implemented by type erasure).

Question: does this make subtyping decidable?

Page 21

No back-tracking

In the absence of multiple instantiation inheritance, we have

$$T <::^{*} C\langle U_1,\ldots,U_n\rangle \;\wedge\; T <::^{*} C\langle V_1,\ldots,V_n\rangle \;\Rightarrow\; \forall i,\ U_i = V_i$$

where <::* is the reflexive transitive closure of single-step inheritance; i.e., instantiations are uniquely determined by inheritance.

We can then reformulate subtyping so that derivations are unique; the algorithm can proceed without back-tracking. We combine inheritance and variance into a single rule:

$$\frac{T <::^{*} D\langle T_1,\ldots,T_n\rangle \qquad D\langle v_1 X_1,\ldots,v_n X_n\rangle \qquad \forall i,\ T_i <:^{v_i} U_i}{T <: D\langle U_1,\ldots,U_n\rangle}\ (\mathrm{SUPERVAR})$$
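Because SUPERVAR never back-tracks, the search for unary classes is a single chain of goals. This Java sketch (Java 16+; hypothetical names; variances encoded as '+', '-', '=' for ±) finds the unique instantiation T <::* D<T1,...,Tn> by walking the declared supertypes, applies SUPERVAR, and records each goal. Run on Example 2a with a cap of five goals, it reproduces the chain of goals on the next slide.

```java
import java.util.*;

public class SuperVar {
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }
        @Override public String toString() {
            if (args.isEmpty()) return name;
            StringJoiner j = new StringJoiner(",", name + "<", ">");
            for (Ty a : args) j.add(a.toString());
            return j.toString();
        }
    }
    // Hypothetical declaration format: variances are '+', '-' or '=' (for ±).
    record Decl(List<String> params, String variances, List<Ty> supers) {}

    private final Map<String, Decl> decls;
    private final int maxGoals;
    final List<String> trace = new ArrayList<>();

    SuperVar(Map<String, Decl> decls, int maxGoals) { this.decls = decls; this.maxGoals = maxGoals; }

    static Ty subst(Ty v, Map<String, Ty> s) {
        if (v.args().isEmpty() && s.containsKey(v.name())) return s.get(v.name());
        List<Ty> out = new ArrayList<>();
        for (Ty a : v.args()) out.add(subst(a, s));
        return new Ty(v.name(), out);
    }

    // The D<T1,...,Tn> with T <::* D<T1,...,Tn>, or null. Without multiple
    // instantiation inheritance the first one found is the only one.
    Ty ancestor(Ty t, String dName) {
        if (t.name().equals(dName)) return t;
        Decl d = decls.get(t.name());
        if (d == null) return null;
        Map<String, Ty> s = new HashMap<>();
        for (int i = 0; i < d.params().size(); i++) s.put(d.params().get(i), t.args().get(i));
        for (Ty v : d.supers()) {
            Ty a = ancestor(subst(v, s), dName);
            if (a != null) return a;
        }
        return null;
    }

    // (SUPERVAR), recording each goal; throws once maxGoals goals have been visited.
    boolean sub(Ty t, Ty u) {
        if (trace.size() >= maxGoals) throw new IllegalStateException("goal limit reached");
        trace.add(t + " <: " + u);
        Ty a = ancestor(t, u.name());
        if (a == null) return false;
        for (int i = 0; i < u.args().size(); i++) {
            char v = decls.get(u.name()).variances().charAt(i);
            Ty ti = a.args().get(i), ui = u.args().get(i);
            boolean ok = v == '+' ? sub(ti, ui) : v == '-' ? sub(ui, ti) : ti.equals(ui);
            if (!ok) return false;
        }
        return true;
    }

    static Ty t(String name, Ty... args) { return Ty.of(name, args); }

    // Example 2a: class N<-X>, class D<Z>, class C<Y> <:: N<N<C<D<Y>>>>.
    static List<String> traceExample2a(int maxGoals) {
        Map<String, Decl> decls = Map.of(
            "N", new Decl(List.of("X"), "-", List.of()),
            "D", new Decl(List.of("Z"), "=", List.of()),
            "C", new Decl(List.of("Y"), "=", List.of(t("N", t("N", t("C", t("D", t("Y"))))))));
        SuperVar sv = new SuperVar(decls, maxGoals);
        try {
            sv.sub(t("C", t("A")), t("N", t("C", t("B"))));
        } catch (IllegalStateException e) {
            // expected here: goals keep growing, so the cap is reached
        }
        return sv.trace;
    }

    public static void main(String[] args) {
        traceExample2a(5).forEach(System.out::println);
    }
}
```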

Page 22

Accessibility

Example 2a: class N<-X>, class D<Z>, class C<Y> <:: NNCDY

Question: CA <: NCB ?
→ (by SUPERVAR) CB <: NCDA ?
→ (by SUPERVAR) CDA <: NCDB ?
→ (by SUPERVAR) CDB <: NCDDA ?
→ (by SUPERVAR) CDDA <: NCDDB ?
→ ...

Page 23

Accessibility


Observation 1. Types A and B do not affect validity: they are “inaccessible”. In fact, everything underneath C is inaccessible. So by checking equivalence “up to accessibility”, we can detect looping.

Observation 2. The inaccessible region of the assertion grows unboundedly. The accessible region is bounded in size.

Page 24

Characterizing accessibility

Example 2a: class N<-X>, class D<Z>, class C<Y> <:: NNCDY

[Figure: type parameter dependency graph over X, Y, Z]

Parameter Y is “expansive-recursive”: it appears in an expansive cycle in the type parameter dependency graph.

Instantiations of Y are inaccessible because
1. Invariance of Y => the variance rule does not “uncover” an instantiation
2. Recursion through Y => inheritance always instantiates Y with another type involving C (in more complex examples, in mutual recursion with C)

Definition

• C<T1,...,Tn> ~ D<U1,...,Un> (“equivalent up to accessibility”) when C = D and for each i, either the i’th parameter of C is expansive-recursive or Ti ~ Ui

• (T <: T’) ~ (U <: U’) when T~U and T’~U’

Page 25

Decidability argument

1. Lemma. Suppose $J_1 \sim J_2$ for subtype judgments $J_1$ and $J_2$. If $J_1 \to J_1'$ then $J_2 \to J_2'$ for some $J_2'$ such that $J_1' \sim J_2'$.
Corollary: if $J \to^{+} J'$ and $J \sim J'$ then $J \to^{\infty}$, i.e. the search from J never terminates.

2. Lemma. For a given set of inheritance declarations, there exists some bound $K$ such that $\text{accessible-depth}(J) \le K$ and $J \to J'$ imply $\text{accessible-depth}(J') \le K$.

3. Corollary. If all expansive-recursive parameters are invariant and used exactly once, then subtyping is decidable.
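Putting the pieces together, this Java sketch (Java 16+) checks subtyping with SUPERVAR and fails a branch as soon as the current judgment is equivalent up to accessibility to one still on the search path, which by the corollary to Lemma 1 means the search would run forever. It assumes the set of expansive-recursive parameters is supplied by the caller (written, hypothetically, as "C#0" for C’s first parameter) instead of being computed from the dependency graph, and it relies on the conditions of Corollary 3 for termination; all names are illustrative.

```java
import java.util.*;

public class AccessibilityCheck {
    record Ty(String name, List<Ty> args) {
        static Ty of(String name, Ty... args) { return new Ty(name, List.of(args)); }
    }
    // Hypothetical declaration format: variances are '+', '-' or '=' (for ±).
    record Decl(List<String> params, String variances, List<Ty> supers) {}
    record Goal(Ty lhs, Ty rhs) {}

    private final Map<String, Decl> decls;
    private final Set<String> expRec;                    // assumed input, e.g. "C#0"
    private final Deque<Goal> path = new ArrayDeque<>(); // judgments still being proved

    AccessibilityCheck(Map<String, Decl> decls, Set<String> expRec) {
        this.decls = decls;
        this.expRec = expRec;
    }

    static Ty subst(Ty v, Map<String, Ty> s) {
        if (v.args().isEmpty() && s.containsKey(v.name())) return s.get(v.name());
        List<Ty> out = new ArrayList<>();
        for (Ty a : v.args()) out.add(subst(a, s));
        return new Ty(v.name(), out);
    }

    // The unique D<...> with T <::* D<...>, or null.
    Ty ancestor(Ty t, String dName) {
        if (t.name().equals(dName)) return t;
        Decl d = decls.get(t.name());
        if (d == null) return null;
        Map<String, Ty> s = new HashMap<>();
        for (int i = 0; i < d.params().size(); i++) s.put(d.params().get(i), t.args().get(i));
        for (Ty v : d.supers()) {
            Ty a = ancestor(subst(v, s), dName);
            if (a != null) return a;
        }
        return null;
    }

    // T ~ U: same class, and each argument is at an expansive-recursive position or is itself ~.
    boolean equiv(Ty t, Ty u) {
        if (!t.name().equals(u.name()) || t.args().size() != u.args().size()) return false;
        for (int i = 0; i < t.args().size(); i++)
            if (!expRec.contains(t.name() + "#" + i) && !equiv(t.args().get(i), u.args().get(i)))
                return false;
        return true;
    }

    boolean sub(Ty t, Ty u) {
        // An enclosing J with J ~ (t <: u) gives J ->+ J' and J ~ J', so J has no finite derivation.
        for (Goal g : path)
            if (equiv(g.lhs(), t) && equiv(g.rhs(), u)) return false;
        Ty a = ancestor(t, u.name());
        if (a == null) return false;
        path.push(new Goal(t, u));
        try {
            for (int i = 0; i < u.args().size(); i++) {      // (SUPERVAR) premises
                char v = decls.get(u.name()).variances().charAt(i);
                Ty ti = a.args().get(i), ui = u.args().get(i);
                boolean ok = v == '+' ? sub(ti, ui) : v == '-' ? sub(ui, ti) : ti.equals(ui);
                if (!ok) return false;
            }
            return true;
        } finally {
            path.pop();
        }
    }

    static Ty t(String name, Ty... args) { return Ty.of(name, args); }

    // Example 2a; C's parameter Y (index 0) is the only expansive-recursive parameter.
    static boolean checkExample2a(Ty lhs, Ty rhs) {
        Map<String, Decl> decls = Map.of(
            "N", new Decl(List.of("X"), "-", List.of()),
            "D", new Decl(List.of("Z"), "=", List.of()),
            "C", new Decl(List.of("Y"), "=", List.of(t("N", t("N", t("C", t("D", t("Y"))))))));
        return new AccessibilityCheck(decls, Set.of("C#0")).sub(lhs, rhs);
    }

    public static void main(String[] args) {
        System.out.println(checkExample2a(t("C", t("A")), t("N", t("C", t("B")))));             // false
        System.out.println(checkExample2a(t("C", t("A")), t("N", t("N", t("C", t("D", t("A"))))))); // true
    }
}
```

On the looping query CA <: NCB, the second goal CB <: NCDA is already equivalent to the first up to accessibility, so the check stops immediately instead of generating the ever-growing chain.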

Page 26

Discussion

• Subtyping in .NET is decidable
  - .NET outlaws infinitary inheritance to ensure termination of eager supertype loading
  - Our decidability result applies to ground subtyping; we believe it’s easy to extend the result to open subtyping with type parameter bounds
• Decidability of subtyping in Scala and Java is still open
  - It would be nice to generalize the last result to remove the variance/linearity restriction
  - This would imply decidability of Scala subtyping (we think)
  - Java wildcards are more complex: even the context can grow unboundedly