LibreOffice Brno 2016 Conference: Functional Document Testing

Functional Document Testing

Michael Stahl, Red Hat, Inc.

2016-09-08

Functional Document Testing

1) Problem Statement / Requirements

2) Diff Basics

3) Non-Determinism

4) First Results

Problem: Change Is Bad

Data-loss regressions in ODF round trips like

● tdf#74230 graphic defaults changed

● tdf#92379 background properties lost

● tdf#100182 document index marks lost

Status Quo

Automated Crash Testing:

● 90k documents automatically loaded & stored

● success = LO doesn't crash

● no check of the content of written documents

● Idea: compare written documents to "reference" docs (at least for ODF round-trips, the most important format)

Non-Requirements

General purpose ODF diff tool

● Different producers

● User initiated content changes

Requirements

● Scalable: we have 27k documents (threading...)

● Performance: want to compare faster than LO generates

● Memory usage: DOMs wasteful?

● Can make assumptions about documents: written by LO

● UTF-8

● Namespaces

● Non-determinism

● Intentional changes

Putting the Fun in Functional

How to compare 2 XML documents?

● save memory => avoid DOM

LazyList of SAX events + LazyList of SAX events => List of Differences

but: SAX callback interface?

● could Greenspun some CPS transform ...

● ... or be pragmatic and use a language where this is idiomatic
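
To illustrate why a lazy list of SAX events avoids holding a whole DOM in memory, here is a minimal sketch. The Event type and the function firstMismatch are assumptions made for this example; they are not taken from ODFunDiff. The comparison only forces as much of each list as it needs, so it even works on infinite lists.

```haskell
-- Hypothetical, simplified SAX event type (not ODFunDiff's own).
data Event
  = StartElement String [(String, String)] -- element name, attributes
  | EndElement String
  | Characters String
  deriving (Eq, Show)

-- Index of the first position at which two event lists differ,
-- or Nothing if they are identical. Only the prefix up to the
-- first mismatch is ever evaluated.
firstMismatch :: [Event] -> [Event] -> Maybe Int
firstMismatch = go 0
  where
    go :: Int -> [Event] -> [Event] -> Maybe Int
    go _ [] [] = Nothing
    go n (x:xs) (y:ys) | x == y = go (n + 1) xs ys
    go n _ _ = Just n
```

With a lazy parser producing the lists, events that have already been compared can be garbage collected while the rest of the file has not been read yet.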

Highly Unsophisticated Algorithm

diff :: State → [Event] → [Event] → [Difference]

● compare first SAX events in both lists

● if they match, continue with rests of both lists

● if not:

● cut 5 elements from both lists

● try to find shortest "edit distance" between them (smallest number of insert / delete / difference)

● report differences

● continue with the first elements that match again
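
The steps above can be sketched roughly as follows. This is a simplified stand-in, not the ODFunDiff implementation: the State argument from the slide's signature is omitted, the Event and Difference types are assumptions, and the helper resync replaces a full edit-distance search with a cheaper rule (skip the fewest events in the two 5-element windows until the heads match again).

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical event and difference types for this sketch.
data Event
  = StartElement String [(String, String)]
  | EndElement String
  | Characters String
  deriving (Eq, Show)

data Difference
  = Inserted Event      -- only in the second list
  | Deleted Event       -- only in the first list
  | Changed Event Event -- different events at the same position
  deriving (Eq, Show)

windowSize :: Int
windowSize = 5

diff :: [Event] -> [Event] -> [Difference]
diff [] ys = map Inserted ys
diff xs [] = map Deleted xs
diff (x:xs) (y:ys) | x == y = diff xs ys
diff xs ys = edits ++ diff (drop i wx ++ restX) (drop j wy ++ restY)
  where
    -- cut (up to) 5 events from both lists
    (wx, restX) = splitAt windowSize xs
    (wy, restY) = splitAt windowSize ys
    (i, j) = resync wx wy
    k = min i j
    edits = zipWith Changed (take k wx) (take k wy)
         ++ map Deleted (take (i - k) (drop k wx))
         ++ map Inserted (take (j - k) (drop k wy))

-- How many events to skip on each side so that the heads match again.
-- The cost of skipping (i, j) is max i j: min i j changes plus the
-- remaining inserts or deletes. Skipping both whole windows is always
-- a candidate, so the list is never empty and diff always progresses.
resync :: [Event] -> [Event] -> (Int, Int)
resync wx wy = minimumBy (comparing (uncurry max)) candidates
  where
    candidates =
      [ (i, j)
      | i <- [0 .. length wx], j <- [0 .. length wy]
      , i + j > 0
      , realigned (drop i wx) (drop j wy) ]
    realigned (a:_) (b:_) = a == b
    realigned [] [] = True
    realigned _ _ = False
```

Because the windows are small and the result list is produced incrementally, the function stays lazy in both inputs; events skipped past the resynchronisation point are put back in front of the remaining lists.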

Non-Determinism: ODF

ODF has non-deterministic features:

● Spreadsheet formulas like RAND, NOW, INFO

● Presentations: random animations

● Fields: <text:date> / <text:time> etc.

● meta.xml <dc:date> / <meta:creation-date> (templates)

● XForms bindings: now()

Non-Determinism: Application Specific

● Automatic Styles

● Writer layout: how far did it go?

● Various generated “*:id” attributes

● ToX-mark / Reference-mark order

● ...

To Fix LO or to Ignore?

Reasons for fixing LO to be deterministic:

● Non-determinism is annoying for a small sub-set of users

Reasons for ignoring non-determinism in ODFunDiff:

● Inherent ODF non-determinism

● Intentional changes

● Older releases

● Want to compare production releases, not special “mode”

Automatic Styles

● Cannot compare directly (non-determinism)

● Cut them out of the document

● Usage (attributes of type styleNameRef) creates a mapping

● At end of document:

● use mapping to compare style content

● compare unused automatic styles
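
A minimal sketch of the end-of-document step, under simplifying assumptions: each automatic style is reduced to a flat property list, the usages argument holds the (reference name, test name) pairs recorded whenever a styleNameRef attribute was encountered, and unused styles are compared as an unordered collection of property lists. All type and function names here are hypothetical.

```haskell
import Data.List (nub, sort)
import qualified Data.Map.Strict as Map

type StyleName = String
type Properties = [(String, String)]          -- attribute name, value
type AutoStyles = Map.Map StyleName Properties

data StyleDifference
  = ContentDiffers StyleName StyleName -- mapped styles with different content
  | UnusedDiffer                       -- the leftover styles do not match up
  deriving (Eq, Show)

compareAutoStyles
  :: AutoStyles                 -- automatic styles cut from the reference document
  -> AutoStyles                 -- automatic styles cut from the test document
  -> [(StyleName, StyleName)]   -- mapping collected from style usages
  -> [StyleDifference]
compareAutoStyles ref test usages = used ++ unused
  where
    pairs = nub usages
    -- styles that were used: compare content through the mapping
    used =
      [ ContentDiffers r t
      | (r, t) <- pairs
      , Map.lookup r ref /= Map.lookup t test ]
    -- styles that were never referenced: names are meaningless,
    -- so compare only their sorted contents
    leftover names = Map.filterWithKey (\name _ -> name `notElem` names)
    unusedRef = leftover (map fst pairs) ref
    unusedTest = leftover (map snd pairs) test
    unused =
      [ UnusedDiffer
      | sort (Map.elems unusedRef) /= sort (Map.elems unusedTest) ]
```

The generated names (such as P1 or P7) never take part in the comparison themselves; only the content they point to does.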

Stateless Pre-Filtering

preFilter :: [Event] → [Event]

● Eliminate <text:soft-page-break> element (layout)

● Merge surrounding <text:span> / <text:a> / <text:s>

● Sort ToX-marks / Ref-marks

● such that the IDs can be mapped later...

● Sort <config:config-item>

● Sort <manifest:file-entry>

● … and remove "Configurations2/accelerator/current.xml"

● … and remove “layout-cache”

● Merge multiple consecutive character data events
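
As an illustration, two of the steps listed above (eliminating <text:soft-page-break> and merging consecutive character data) could look like this. The Event type is an assumption for the sketch, and the sorting and span-merging steps are left out.

```haskell
-- Hypothetical, simplified SAX event type (not ODFunDiff's own).
data Event
  = StartElement String [(String, String)]
  | EndElement String
  | Characters String
  deriving (Eq, Show)

-- Stateless filter: the output depends only on the input events.
preFilter :: [Event] -> [Event]
preFilter = mergeCharacters . dropSoftPageBreaks

-- Soft page breaks depend on layout, so drop both of their events.
dropSoftPageBreaks :: [Event] -> [Event]
dropSoftPageBreaks = filter (not . isSoftPageBreak)
  where
    isSoftPageBreak (StartElement "text:soft-page-break" _) = True
    isSoftPageBreak (EndElement "text:soft-page-break") = True
    isSoftPageBreak _ = False

-- A parser may split one text run into several events; join them
-- so that both documents chunk their text the same way.
mergeCharacters :: [Event] -> [Event]
mergeCharacters (Characters a : Characters b : rest) =
  mergeCharacters (Characters (a ++ b) : rest)
mergeCharacters (e : rest) = e : mergeCharacters rest
mergeCharacters [] = []
```

The soft page breaks are removed first so that the text on either side of a removed break ends up in a single character data event.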

Stateful Attribute Value Mapping

● Generated ID attributes

● styleNameRef attributes with automatic styles

● <text:list> xml:id / text:continue-list

● ToX mark start / end text:id

● <text:changed-region> text:id / <text:change> text:change-id

● Random animations: draw:id, smil:targetElement, smil:begin
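
One way to picture the stateful mapping: the first time a pair of generated values is seen, it is recorded; every later occurrence of the reference value must map to the same test value. The sketch below assumes this one-directional rule and uses hypothetical names (IdMap, matchId, matchAll); it does not check that two reference values map to distinct test values.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Reference-document value -> test-document value, learned on first use.
type IdMap = Map.Map String String

-- Compare one pair of generated attribute values.
-- Returns the updated mapping and whether the pair is consistent.
matchId :: IdMap -> String -> String -> (IdMap, Bool)
matchId seen refValue testValue =
  case Map.lookup refValue seen of
    Nothing -> (Map.insert refValue testValue seen, True)
    Just expected -> (seen, expected == testValue)

-- Thread the mapping through a whole sequence of value pairs.
matchAll :: [(String, String)] -> [Bool]
matchAll = snd . mapAccumL step Map.empty
  where
    step seen (refValue, testValue) = matchId seen refValue testValue
```

In a real diff the mapping would be part of the State threaded through the comparison, with a separate map per kind of identifier.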

Missing Attributes

● Writer table style:width / style:column-width (layout)

● Writer text:use-soft-page-breaks (layout)

● Most pointless attribute ever

Attribute Value and Character Data Changes

● Writer draw:z-index (layout)

● officeooo:rsid (generated)

● Fields: <text:page-number>, <text:page-count> (layout)

● Header: <text:chapter>, <text:variable-get> (layout)

● Footnotes: <text:note-citation> (layout)

● <meta:document-statistic> (layout, async. word-count)

● <meta:generator>

● Config-item “DoNotCaptureDrawObjsOnPage” (layout)

Teach Your Tool Ignorance With a Big Hammer

● Non-localized changes:

● Spreadsheet formulas like RAND, NOW

● Random animations

● Regex search content.xml => pass in flag to ignore more stuff
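
A crude sketch of the idea, with a plain substring search swapped in for the regex to keep the example dependency-free. The Options record, its field, and the list of markers are assumptions for illustration only; random animations are not covered.

```haskell
import Data.List (isInfixOf)

-- Hypothetical flags handed to the diff.
newtype Options = Options
  { ignoreFormulaResults :: Bool -- skip cached cell values when set
  } deriving (Eq, Show)

-- Formula fragments that make cell results change on every load.
volatileMarkers :: [String]
volatileMarkers = ["RAND(", "NOW("]

-- Scan the raw content.xml text once, before the event-based diff runs.
scanContent :: String -> Options
scanContent xml =
  Options { ignoreFormulaResults = any (`isInfixOf` xml) volatileMarkers }
```

The search can over-match (any function name ending in NOW would trigger it), which only makes the tool ignore more than strictly necessary for that document.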

Are We Sufficiently Ignorant Yet?

● If written by same LO build, around 20 documents still differ

● ooo71392-1.ods spreadsheet with randomized styles

● tdf89783-12.odt 64k automatic table styles => OOM

It’s Not a Bug, it’s a Feature

● Filter differences that are intentional changes

● Implemented as separate result filter function

postFilter :: [Difference] → [Difference]

● Version dependent

● Ignore change if reference version < N and test version >= N

● For 5.3, ~15 changes identified
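
The version rule above might be sketched like this. The slide's signature takes only the list of differences; here the two versions and the table of known changes are explicit parameters, so partially applying them gives a function of the slide's shape. The Version, Difference and KnownChange types are assumptions for the example.

```haskell
-- Major and minor release number, e.g. Version 5 3.
data Version = Version Int Int
  deriving (Eq, Ord, Show)

-- Hypothetical, simplified difference record.
data Difference = Difference
  { diffWhere :: String -- element or attribute the difference is about
  , diffOld :: String
  , diffNew :: String
  } deriving (Eq, Show)

-- An intentional change: the release that introduced it and a
-- predicate recognising the differences it causes.
data KnownChange = KnownChange
  { introducedIn :: Version
  , recognises :: Difference -> Bool
  }

postFilter :: Version -> Version -> [KnownChange] -> [Difference] -> [Difference]
postFilter refVersion testVersion known = filter (not . intentional)
  where
    -- a change only applies if the reference predates it and the test has it
    active =
      [ recognises change
      | change <- known
      , refVersion < introducedIn change
      , testVersion >= introducedIn change ]
    intentional d = any ($ d) active
```

Keeping this as a separate pass over the result list means the diff itself needs no knowledge of release history.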

Bugs Found

● Hyperlinks on <form:form> not converted to absolute URLs

● OOo 1.0 XML: Impress converting legacy animations via timer

● Calc ISOWEEKNUM migration issue

● Number format “month” vs. “minutes” regression

● Calc wrong matrix formula result “Err:504”

● Calc header / footer lost?

● some other number formatting bug

● Calc CEILING formula not returning an error (now declared a feature)

Surely Functional Means Slow?

● GHC profiling helps

● LO release build round-trips documents in 90 minutes

● ODFunDiff compares them in 18 minutes

● Trivial multi-threading with GHC runtime and Chan
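
A minimal sketch of what multi-threading with Chan can look like: a fixed number of worker threads pull jobs from one channel and push results to another. The function parallelMap is a hypothetical name, not taken from ODFunDiff, and the sketch does no error handling (a job that throws would leave the caller waiting for a result that never arrives).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM, replicateM_)

-- Run a job for every input on a fixed number of worker threads.
-- Results come back in completion order, not input order.
parallelMap :: Int -> (a -> IO b) -> [a] -> IO [b]
parallelMap workers job inputs = do
  jobs <- newChan
  results <- newChan
  forM_ inputs (writeChan jobs . Just)
  replicateM_ workers (writeChan jobs Nothing) -- one stop marker per worker
  replicateM_ workers (forkIO (worker jobs results))
  replicateM (length inputs) (readChan results)
  where
    worker jobs results = do
      next <- readChan jobs
      case next of
        Nothing -> pure () -- stop marker: this worker is done
        Just input -> do
          result <- job input
          writeChan results result
          worker jobs results
```

For the threads to run on several cores, the program has to be built with GHC's -threaded option and started with +RTS -N.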

All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.

Thank you …

git://gerrit.libreoffice.org/odfundiff