don't compare averages holger bast max-planck-institut für informatik (mpii) saarbrücken, germany...

Don't Compare Averages Holger Bast Max-Planck-Institut für Informatik (MPII) Saarbrücken, Germany joint work with Ingmar Weber WEA 2005 May 10 – May 13, Santorini Island, Greece

Post on 20-Dec-2015

TRANSCRIPT

  • Slide 1
  • Don't Compare Averages. Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany. Joint work with Ingmar Weber. WEA 2005, May 10 – May 13, Santorini Island, Greece
  • Slide 2
  • Two famous quotes
      "There are three kinds of lies: lies, damn lies, and statistics." Benjamin Disraeli, 1804–1881 (reported by Mark Twain)
      "Never believe any statistics you haven't forged yourself." Winston Churchill, 1874–1965
  • Slide 3
  • A typical figure
      [Figure: two curves, "Theirs" and "Ours". Y-axis: some cost measure; X-axis: input size.]
      Each point represents an average over a number of iterations.
  • Slide 4
  • Changing the cost measure by a monotone function, say from $c$ to $2^c$
      This is from authentic data!
      [Figure: the same comparison shown for cost measure $c$ and for cost measure $2^c$.]
  • Slide 5
  • No deep mathematics here
      Even for strictly monotone $f$, certainly $E f(X) \ne f(E X)$ in general,
      but also $E X \le E Y$ does not in general imply $E f(X) \le E f(Y)$.
      Example:
        $X$: 4, 4 (average 4)
        $Y$: 1, 5 (average 3)
        $2^X$: $2^4$, $2^4$ (average 16)
        $2^Y$: $2^1$, $2^5$ (average 17)
      Here $E Y \le E X$, yet $E\,2^Y > E\,2^X$.
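The example on this slide can be checked numerically. The following is a minimal sketch in Python; the helper names `average` and `f` are introduced here for illustration and do not come from the talk.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def f(c):
    """The monotone change of cost measure from c to 2^c."""
    return 2 ** c


# Measurements taken from the slide.
xs = [4, 4]
ys = [1, 5]

avg_x, avg_y = average(xs), average(ys)
avg_fx = average([f(x) for x in xs])
avg_fy = average([f(y) for y in ys])

print(avg_x, avg_y)    # 4.0 3.0
print(avg_fx, avg_fy)  # 16.0 17.0
```

X has the larger average under the original cost measure, while Y has the larger average after the transformation.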
  • Slide 6
  • Examples of multiple cost measures
      Language modeling: for a given probability distribution $p_1, \dots, p_n$, find a distribution $q_1, \dots, q_n$ from a constrained class that
        minimizes cross-entropy $\sum_i p_i \log (p_i / q_i)$
        minimizes perplexity $\prod_i (p_i / q_i)^{p_i} = 2^{\text{cross-entropy}}$
      Algorithm A uses algorithm B as a subroutine:
        B produces a result of average quality $q$
        the complexity of A depends on, say, $q^2$
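The relation between the two language-modeling measures can be verified on a small case. This sketch uses the formulas exactly as written on the slide and assumes the logarithm is taken to base 2, which is what makes the identity with $2^{\text{cross-entropy}}$ hold. The distributions `p` and `q` are made up for illustration.

```python
import math


def cross_entropy(p, q):
    """Sum of p_i * log2(p_i / q_i), the quantity the slide calls cross-entropy."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def perplexity(p, q):
    """Product of (p_i / q_i) ** p_i, the quantity the slide calls perplexity."""
    return math.prod((pi / qi) ** pi for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]  # made-up target distribution
q = [0.25, 0.25, 0.5]  # made-up candidate distribution

ce = cross_entropy(p, q)
pp = perplexity(p, q)

# The two measures differ by the monotone map c -> 2^c.
print(math.isclose(pp, 2 ** ce))  # True
```

Because the two measures are related by a monotone but non-linear map, averaging one of them over several runs need not rank methods the same way as averaging the other.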
  • Slide 7
  • Can this also happen with error bars?
      Error bars for $c$ don't overlap, yet reversal for $f(c)$?
      Yes, this can also happen!
      [Figure: error bars under cost measure $c$ and under $f(c)$.]
  • Slide 8
  • Can this also happen with error bars?
      Complete reversal with error bars?
      [Figure: error bars under cost measure $c$ and under $f(c)$.]
  • Slide 9
  • Can this also happen with error bars?
      Complete reversal with error bars?
      [Figure: error bars under cost measure $c$ and under $f(c)$.]
  • Slide 10
  • Can this also happen with error bars?
      Complete reversal with error bars?
        under $c$: $E Y + \delta Y \le E X - \delta X$
        under $f(c)$: $E f(Y) - \delta f(Y) > E f(X) + \delta f(X)$
      Here $\delta Z = E\,|Z - E Z|$ (absolute deviation) or $\delta Z = \sqrt{E\,(Z - E Z)^2}$ (standard deviation).
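With the two deviation measures from this slide, the earlier claim that non-overlapping error bars do not prevent a reversal of the averages can be illustrated on a small data set. The samples below are made up for this sketch (they are not the authentic data mentioned in the talk), and the helper names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def abs_dev(values):
    """Absolute deviation E|Z - E Z|."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


def std_dev(values):
    """Standard deviation sqrt(E (Z - E Z)^2)."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def f(c):
    return 2 ** c


xs = [5, 5]           # made-up sample: always 5
ys = [0] * 9 + [10]   # made-up sample: mostly 0, one outlier
fxs = [f(x) for x in xs]
fys = [f(y) for y in ys]

results = {}
for dev in (abs_dev, std_dev):
    # Y's error bar lies wholly below X's error bar under c.
    separated = mean(ys) + dev(ys) < mean(xs) - dev(xs)
    # f(Y)'s error bar lies wholly above f(X)'s error bar under f(c).
    complete = mean(fys) - dev(fys) > mean(fxs) + dev(fxs)
    results[dev.__name__] = (separated, complete)
    print(dev.__name__, separated, complete)  # True False for both measures

# The averages themselves do reverse under f.
print(mean(fxs) < mean(fys))  # True
```

In this example the error bars are separated under $c$ and the averages swap order under $f(c)$, but the error bars under $f(c)$ still overlap, so the reversal is not complete.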
  • Slide 11
  • Can this also happen with error bars?
      Complete reversal with error bars?
      Theorem: complete reversal can never happen!
        if $E X - \delta X \ge E Y + \delta Y$ (under $c$)
        then $E f(X) + \delta f(X) \ge E f(Y) - \delta f(Y)$ (under $f(c)$)
  • Slide 12
  • Can this also happen with error bars?
      Complete reversal with error bars?
        if $E X - \delta X \ge E Y + \delta Y$ (under $c$)
        then $E f(X) + \delta f(X) \ge E f(Y) - \delta f(Y)$ (under $f(c)$)
      If only one of the four $\delta$ terms is dropped, the theorem no longer holds in general.
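The theorem can be sanity-checked by a randomized search for a counterexample. The sketch below is only an illustration, not a proof: the sample sizes, value range, seed, and choice of $f(c) = 2^c$ are arbitrary assumptions, and all function names are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def abs_dev(values):
    """Absolute deviation E|Z - E Z|."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


def std_dev(values):
    """Standard deviation sqrt(E (Z - E Z)^2)."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def f(c):
    return 2.0 ** c


def completely_reversed(xs, ys, dev, tol=1e-9):
    """True if Y's bar is wholly below X's under c, yet wholly above under f(c)."""
    fxs = [f(x) for x in xs]
    fys = [f(y) for y in ys]
    below_under_c = mean(ys) + dev(ys) <= mean(xs) - dev(xs)
    # tol guards against floating-point noise in the comparison.
    above_under_f = mean(fys) - dev(fys) > mean(fxs) + dev(fxs) + tol
    return below_under_c and above_under_f


rng = random.Random(0)


def random_sample():
    return [rng.randint(0, 10) for _ in range(rng.randint(1, 6))]


violations = 0
for _ in range(20000):
    xs, ys = random_sample(), random_sample()
    for dev in (abs_dev, std_dev):
        if completely_reversed(xs, ys, dev):
            violations += 1

# If the theorem holds, no complete reversal is ever found.
print(violations)
```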
  • Slide 13
  • Our first proof
  • Slide 14
  • The canonical proof
      1. The medians $M X$ and $M Y$ do commute with $f$:
           $\mathrm{Prob}(X \le M X) = 1/2 = \mathrm{Prob}(f(X) \le f(M X))$
           $\Rightarrow f(M X) = M f(X)$ and $f(M Y) = M f(Y)$
      2. ... and hence cannot reverse their order:
           $M X \le M Y$
           $\Rightarrow f(M X) \le f(M Y)$, because $f$ is monotone
           $\Rightarrow M f(X) \le M f(Y)$, because $M$ and $f$ commute
      3. Expectation and median are related as
           $|E X - M X| \le \delta X = E\,|X - E X|$
           $|E Y - M Y| \le \delta Y = E\,|Y - E Y|$
         Nothing new, but hardly any computer scientist seems to know this.
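Steps 1 and 3 of the proof can be observed on a concrete sample. This is a small sketch with a made-up data set; it uses `statistics.median_low` so that the median is always one of the sample values, which is an assumption of this illustration rather than something stated in the talk.

```python
import statistics


def f(c):
    """A strictly increasing transformation, here c -> 2^c."""
    return 2 ** c


xs = [1, 2, 2, 7, 30]  # made-up sample
fxs = [f(x) for x in xs]

# Step 1: the median commutes with f, that is f(M X) = M f(X).
median_x = statistics.median_low(xs)
median_fx = statistics.median_low(fxs)
print(f(median_x) == median_fx)  # True

# The mean, in contrast, does not commute with f.
mean_x = statistics.fmean(xs)
mean_fx = statistics.fmean(fxs)
print(math_close := abs(f(mean_x) - mean_fx) < 1e-9)  # False

# Step 3: |E X - M X| <= delta X, with delta X the absolute deviation.
abs_dev_x = statistics.fmean(abs(x - mean_x) for x in xs)
print(abs(mean_x - median_x) <= abs_dev_x)  # True
```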
  • Slide 15
  • The canonical proof
      Now assume this would happen:
        under $c$: $E Y + \delta Y \le E X - \delta X$
        under $f(c)$: $E f(Y) - \delta f(Y) > E f(X) + \delta f(X)$
      Then $M Y \le M X$, yet $M f(Y) > M f(X)$.
      This contradicts the fact that the medians cannot reverse.
  • Slide 16
  • Conclusion
      Average comparison is a deceptive thing, even with error bars!
      There are more effects of this kind:
        e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
        e.g. for normally distributed $X$, $Y$, $\mathrm{Prob}(X + \delta X \le Y - \delta Y \mid E X > E Y)$ is up to 8%
      Better always look at the complete histogram, and at least check maximum and minimum.
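The closing advice, to look at the whole histogram and at least at the extremes, could be followed with a few lines of code. This is one possible sketch; the function names `summarize` and `text_histogram` and the sample data are hypothetical.

```python
from collections import Counter
import statistics


def summarize(values):
    """Report the extremes alongside the usual location measures."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
    }


def text_histogram(values):
    """One text line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value:>4} {'#' * counts[value]}" for value in sorted(counts)]


ys = [0] * 9 + [10]  # made-up measurements with a single outlier

print(summarize(ys))
for line in text_histogram(ys):
    print(line)
```

For this sample the mean is 1.0 although nine of the ten measurements are 0, which the histogram and the maximum make visible at a glance.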
  • Slide 17
  • Conclusion
      Average comparison is a deceptive thing, even with error bars!
      There are more effects of this kind:
        e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
        e.g. for normally distributed $X$, $Y$, $\mathrm{Prob}(X + \delta X \le Y - \delta Y \mid E X > E Y)$ is up to 8%