Quarterly review Q4, June 2014
Agenda
○ Our objectives
○ Progress Q4
○ Outlook 2014 / 15
○ Tasks Q1
○ Questions and discussion
Our objectives
Bridge between wikitext and HTML5 + RDFa
1. Faithful bidirectional conversion without dirty wikitext diffs
2. Improve performance & enable new features by moving our primary content representation to HTML5 + RDFa
3. Research better templating, widget, and diffing solutions
Progress Q4
What we got done:
● Continued to improve rendering
● Continued to improve round-tripping (RT)
● Improved performance
● Parsoid-specific CSS
● Improved testing
● Wrapped up image editing support
● Parsing of transclusion parameters to DOM
● Proper structured logging
● Mentoring of wikitext linter project
● Helped CX team get started with Parsoid HTML5
● Services bootstrapping
And what we didn’t:
● Language variant editing
● Visual diffing
● Content widgets (research)
● Support repeated switching between wikitext & HTML in a single edit without information loss (stretch)
● Non-Wikipedia project support (stretch)
Progress Q4: Continuous iteration
○ Perfect round-tripping & render accuracy
  ○ template encapsulation, foster parenting, links, selective serialization, …
  ○ informed by data from testing, bug reports from VE usage, production logs
○ Continuous deploys
  ○ mediawiki.org/wiki/Parsoid/Deployments
  ○ Very solid thanks to work invested in testing
Progress Q4: Testing infrastructure
○ Parser tests
  ○ Now simulating edits
  ○ Support multiple formats per test for cases where PHP / Parsoid HTML differs (images)
  ○ Working on HTML tidy integration
    ■ Important for Parsoid HTML5 page view preparation
○ Round-trip testing
  ○ Trivial edit mode to detect selser (selective serializer) regressions
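The idea behind selective serialization, and why a trivial edit is a good regression probe for it, can be shown with a toy model. This is only a sketch under assumptions: the `Node` class, the `serialize` function, and the whitespace-collapsing `normalize` step are hypothetical stand-ins, not Parsoid's data structures or API. Each node remembers the wikitext it was parsed from, and unmodified nodes are emitted verbatim, so only edited nodes can introduce a diff.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy DOM node (hypothetical, not a Parsoid structure)."""
    text: str               # current content of the node
    source: str             # original wikitext the node was parsed from
    modified: bool = False  # set when an editor touches the node


def normalize(text: str) -> str:
    """Stand-in for a full serializer, which may normalize its output."""
    return " ".join(text.split())


def serialize(nodes, selective=True):
    """Serialize nodes back to wikitext.

    With selective=True, unmodified nodes reuse their original source,
    so only edited nodes can produce a diff against the old wikitext.
    """
    out = []
    for node in nodes:
        if selective and not node.modified:
            out.append(node.source)
        else:
            out.append(normalize(node.text))
    return "\n".join(out)


original = ["Some   oddly  spaced text", "Second paragraph"]
nodes = [Node(text=s, source=s) for s in original]

# Simulate a trivial edit that touches the second node only.
nodes[1].text = "Second paragraph, edited"
nodes[1].modified = True

full = serialize(nodes, selective=False)   # first line gets normalized
selser = serialize(nodes, selective=True)  # first line is kept verbatim
```

In this model, any difference in an untouched line after a trivial edit points to a selective-serialization bug rather than to the edit itself.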
Progress Q4: Parser test results
● Better coverage
● More passing tests

Cells show passing / failing test counts per mode.

     # tests  wt2html    wt2wt       html2wt    html2html  selser
Q3   1280     821 / 459  1139 / 136  286 / 963  749 / 494  15994 / 1387
Q4   1347     911 / 436  1198 / 125  390 / 895  814 / 467  16438 / 1198
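To compare the two quarters on one scale, the counts in the table above can be turned into pass rates. The sketch below assumes each cell is a passing / failing pair (for wt2html the two numbers sum to the total test count, which supports that reading). The variable and function names are illustrative.

```python
# Counts copied from the table above, read as (passing, failing) pairs.
results = {
    "Q3": {
        "wt2html": (821, 459),
        "wt2wt": (1139, 136),
        "html2wt": (286, 963),
        "html2html": (749, 494),
        "selser": (15994, 1387),
    },
    "Q4": {
        "wt2html": (911, 436),
        "wt2wt": (1198, 125),
        "html2wt": (390, 895),
        "html2html": (814, 467),
        "selser": (16438, 1198),
    },
}


def pass_rate(passed, failed):
    """Return the share of passing tests as a percentage."""
    return 100.0 * passed / (passed + failed)


# Print the pass rate per mode for both quarters side by side.
for mode in results["Q3"]:
    q3 = pass_rate(*results["Q3"][mode])
    q4 = pass_rate(*results["Q4"][mode])
    print(f"{mode:10s} {q3:5.1f}% -> {q4:5.1f}%")
```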
Progress Q4: Round-trip test results
○ Q3: 0.33% have semantic diffs without selser
○ Now: 0.25% have semantic diffs without selser
○ With selser (after trivial edits): 0.006%; the only diffs introduced are single missing newlines
Progress Q4: Get ready for HTML5 page views
○ Goals:
  ○ speed up editing & logged-in browsing
  ○ enable new ways to view & edit content
  ○ avoid maintaining two render pipelines in the longer term
○ Developed Parsoid-specific CSS
  ○ now used by VE (Desktop + Mobile), Flow, CX etc.
  ○ Found & fixed some render issues in manual testing
○ Next steps:
  ○ Automate render testing / visual diffing
  ○ Services to provide needed infrastructure (Q1)
Progress Q4: Performance
[Figures not preserved in this transcript: "Performance: nodegrind", "Performance", and two slides titled "Performance: HTML request"]
Progress Q4: Images
○ No shortage of edge cases, big mess in MW
○ Now basically bug-for-bug compatible
○ Started initiatives to clean up image handling in the longer term
  ○ semantic image formats
  ○ square bounding boxes
  ○ less confusing options, better uniform styling in skin / view
Progress Q4: DOM template parameter editing
○ Parse transclusion parameters to DOM
  ○ enables visual parameter editing
  ○ not yet enabled in production, as VE does not support it yet
○ Performance implications
  ○ Parse times:
    ■ ~10s -> ~16s on transclusion-heavy pages like [[Barack Obama]]
    ■ Transclusion reuse will partly mitigate the problem in production; need to quantify this before deploy
  ○ HTML size:
    ■ Larger DOM
    ■ Will be mitigated when attributes are moved out (depends on storage service, goal Q1)
Progress Q4: Structured logging
● GELF logging using bunyan
● Will hook up to logstash, which makes it easier to analyze issues & usage
● Even in its current form, it has exposed errors in production that have since been fixed
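For readers unfamiliar with GELF, the sketch below builds one GELF 1.1 message as JSON. Parsoid itself logs through bunyan in Node.js; this Python version only illustrates the payload shape. The `gelf_message` helper and the `wiki` and `page` fields are hypothetical examples, not fields Parsoid is known to emit.

```python
import json
import socket
import time

# Syslog severity numbers, which GELF uses for its "level" field.
SYSLOG_LEVELS = {"error": 3, "warning": 4, "info": 6, "debug": 7}


def gelf_message(short_message, level="info", **fields):
    """Build a GELF 1.1 payload as a JSON string.

    Extra keyword arguments become additional fields, which GELF
    requires to be prefixed with an underscore.
    """
    msg = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": SYSLOG_LEVELS[level],
    }
    for key, value in fields.items():
        if key == "id":
            # The GELF specification does not allow an "_id" field.
            raise ValueError("GELF does not allow an additional field named _id")
        msg["_" + key] = value
    return json.dumps(msg)


# Hypothetical log event with two illustrative additional fields.
line = gelf_message("round-trip failed", level="error",
                    wiki="enwiki", page="Example")
```

Because every event is a flat JSON object, a collector such as logstash can index the additional fields directly instead of parsing free-form log lines.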
Progress Q4: Language variants
○ Work just started
  ○ Basic parsing & DOM representation
  ○ Lots of very different bits of functionality squeezed into one -{ … }- syntax
  ○ Occupies an awkward post-processing spot in the PHP parser; better integrated in Parsoid; will have to unify in future
○ Next: Full editing support
  ○ But: no rule/glossary support yet (RFC for that)
  ○ Editing will be in a mix of variants
    ■ Handling variants better is a research project (i.e., there are different ideas on how best to do it)
Progress Q4: Wikitext linter GSoC project
○ GSoC ’14 project (Hardik Juneja)
○ Flags wikitext “errors”
  ■ Missing end tags, fostered content, etc.
○ Collects statistics on wikitext usage patterns
  ■ Ex: Multi-template uses
○ Surfaces info that Parsoid already has, since:
  ■ it has to recover from these errors
  ■ it has to reproduce errors back
○ Collected during normal parsing
  ■ Sends data to an “external” web service
○ Name? Lintoid/LintTrap/WikiLint/Linter/??
  ○ Bikeshedding required, though not necessarily now. :-)
Progress Q4: Wikitext linter GSoC project
○ http://lintbridge.wmflabs.org/
  ○ presents results with HTML & web service interfaces
○ Future plan:
  ○ Enable in prod (either always or in sampling mode: 1 in N parses, depending on overheads)
  ○ Pursue collaboration with Project CheckWiki
  ○ Possibly in decent shape by Q2
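The "1 in N parses" sampling mode mentioned above could look roughly like the following. This is a minimal sketch under assumptions: `should_lint` and `parse_with_linting` are hypothetical names, and the paragraph wrapper is a stand-in for the real parser, not Parsoid code.

```python
import random


def should_lint(sample_every, rng=random):
    """Return True for roughly 1 in `sample_every` calls.

    sample_every=1 means every parse is linted (the "always" mode).
    """
    if sample_every < 1:
        raise ValueError("sample_every must be >= 1")
    return rng.randrange(sample_every) == 0


def parse_with_linting(wikitext, sample_every, report, rng=random):
    """Hypothetical parse wrapper that lints only sampled parses."""
    html = "<p>" + wikitext + "</p>"  # stand-in for the real parse
    if should_lint(sample_every, rng):
        report(wikitext)              # e.g. send to an external service
    return html


# Usage: collect lint reports for about 1 in 10 parses.
reports = []
rng = random.Random(0)  # seeded so repeated runs behave the same
for i in range(10000):
    parse_with_linting(f"page {i}", 10, reports.append, rng)
```

Random sampling keeps the linting overhead proportional to 1/N without tracking any per-request state. A counter-based scheme would be an equally valid choice.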
Progress Q4: HTML templating
● Goals:
  ○ Secure & sane: Balanced DOM, context-sensitive escaping
  ○ Efficient to execute on server or client
  ○ Content: Develop convincing alternative to wikitext-based templating
● Implemented KnockOff / TAssembly in Q3
● Mobile now testing KnockOff / TAssembly for client-side UI templating
● First design sketch for i18n & content templates
● Had useful conversations with i18n folks
Progress Q4: Helped Content Translation team
Progress Q4: Services
○ Gabriel transitioning into services team
  ○ Starting up: hiring now
  ○ Goal setting, design discussions
    ■ Authentication RFC
    ■ REST API front-end
    ■ Storage service
○ Infrastructure work
  ○ pushing (Debian, for now) packaging along
○ Scott working on PDF renderer
  ○ Leverages semantic info in Parsoid HTML for massaging & conversion to LaTeX
  ○ Getting ready for deploy
Progress Q4: Things we didn’t get done
● Language variant editing (only partly)
● Visual diffing (for HTML pageview testing)
● Performance: More efficient template updates (stretch goal for Q4)
  ○ lower priority than HTML storage
● Research: Support switching between HTML and wikitext within one edit
  ○ works better with stable-id support
● Research: Content widgets
Stretch goals Q4
○ Non-Wikipedia projects
  ○ Parsoid enabled on all public WMF projects
  ○ biggest issue for editing seems to be labeled section transclusion (primarily Wikisource)
  ○ potential to fix complexity issues in projects like Wiktionary
  ○ no time spent on this so far
2014/15 plans
https://www.mediawiki.org/wiki/Parsoid/Roadmap/2014_15_Draft
Broad focus areas:
● Continue iterating on RT & render quality, performance
● Language variant support
● Parsoid HTML page views
● Stable element ids
● Support for HTML wikis
2014/15 Q1-Q4: More of the same
Not tied to roadmap tasks / features:
● Bug fixing
● Long tail of compatibility with PHP parser + Tidy combo
● Performance work
● Maintenance and nodejs upgrades (0.8 → 0.10 → 0.11 …)
● Continuous deploys
● GSoC, talks, conferences, etc.
2014/15 Q1: Language variant support
● Finish up basic parsing support for language variants
● Use a bot to fix nesting issues:
  -{zh-cn=<span style='color:red';zh-tw=<span style='color:green'}->foo</span>
  rewritten to
  <span style="-{zh-cn='color:red';zh-tw='color:green'}-">foo</span>
● Finish basic “mix of variants” editing support
● Evolve longer-term strategy
  ○ glossaries, page-specific variant rules, etc.
  ○ better editing support when working in a variant (or use content translation mechanisms)
2014/15 Q1-Q3/Q4: Parsoid HTML pageviews
● Dependencies:
  ○ HTML storage + Content API (Services team)
  ○ User-agnostic HTML + client-side customization
    ■ Redlinks, etc. (Services, platform teams)
  ○ Any team willing to experiment with this initially
● Timeline:
  ○ Prototype by end of Q1
    ■ will have a good idea of remaining render quality work by then
  ○ Move to beta in Q2
  ○ Work with Mobile on specific skinning
2014/15 Q1-Q3/Q4: Parsoid HTML pageviews: Tasks
● Reduce HTML footprint
  ○ Move data attributes into metadata storage
  ○ Size difference
    ■ Currently: 3-3.5x on large pages
    ■ Goal: ~1x
● Identify missing functionality/support
  ○ Ex: Mixed content/styling templates
  ○ Long tail of rendering diffs
● Testing + QA
  ○ Tidy support in parser tests (work in progress)
  ○ Visual diffs
  ○ Reuse existing RT-testing framework
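Moving data attributes into metadata storage can be pictured as splitting each page into slim HTML plus a side table keyed by element id. The sketch below is an illustration under assumptions: `split_metadata` is a hypothetical helper, the input must be well-formed XML so the standard library parser can read it, and the id values and attribute contents are made up for the example.

```python
import xml.etree.ElementTree as ET


def split_metadata(html, attr="data-parsoid"):
    """Strip `attr` from every element and return (slim_html, metadata).

    The metadata dict maps element id -> removed attribute value, so the
    two parts can be stored separately and rejoined when needed.
    """
    root = ET.fromstring(html)
    metadata = {}
    for elem in root.iter():
        value = elem.attrib.pop(attr, None)
        if value is not None:
            elem_id = elem.get("id")
            if elem_id is None:
                raise ValueError("element with metadata needs an id")
            metadata[elem_id] = value
    return ET.tostring(root, encoding="unicode"), metadata


# Made-up sample page: one paragraph carries serialization metadata.
html = (
    """<body><p id="mwAQ" data-parsoid='{"dsr":[0,3,0,0]}'>foo</p>"""
    """<p id="mwAg">bar</p></body>"""
)
slim, metadata = split_metadata(html)
```

The join key is the element id, which is why this approach works best once ids stay stable across edits.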
2014/15 Q1-Q3: Stable element ids (What?)
● Parsoid parses page P with wikitext foo: <p id="1">foo</p>
● I edit the source of P to bar\n\nfoo
● Parsoid reparses P: <p id="..">bar</p><p id="..">foo</p>
● Can we re-assign id 1 to the second paragraph?
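One simple way to approach the question above is to align the old and new paragraph lists and carry ids over for paragraphs that match. The sketch below uses Python's `difflib.SequenceMatcher` for the alignment. It is an assumption-laden illustration, not the approach Parsoid takes: `reassign_ids` is a hypothetical helper, and it only matches paragraphs whose text is unchanged.

```python
from difflib import SequenceMatcher


def reassign_ids(old, new_paragraphs, next_id):
    """Carry ids from `old` over to matching paragraphs in the new list.

    old: list of (id, text) pairs from the previous parse.
    new_paragraphs: list of paragraph texts from the reparse.
    next_id: first unused id, handed out to unmatched paragraphs.
    """
    old_texts = [text for _, text in old]
    ids = [None] * len(new_paragraphs)
    matcher = SequenceMatcher(a=old_texts, b=new_paragraphs, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical paragraphs in both versions.
        for k in range(block.size):
            ids[block.b + k] = old[block.a + k][0]
    result = []
    for elem_id, text in zip(ids, new_paragraphs):
        if elem_id is None:
            elem_id = next_id
            next_id += 1
        result.append((elem_id, text))
    return result


# The example from the slide: "foo" becomes "bar\n\nfoo".
old = [(1, "foo")]
new = reassign_ids(old, ["bar", "foo"], next_id=2)
# new is [(2, "bar"), (1, "foo")]: id 1 follows the "foo" paragraph.
```

A paragraph that was edited in place would get a fresh id here, so a practical scheme would also need fuzzy matching for modified content.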
2014/15 Q1-Q3: Stable element ids (Why?)
● Lets us associate metadata with content elements
  ○ authorship maps
  ○ efficient diffs
  ○ inline comments
  ○ content translation tracking
  ○ awesome features we haven’t thought about yet
● Lets us slim down HTML size, but still support switching between HTML & wikitext in VE
2014/15 Q2-Q4: Support for HTML wikis
● HTML content templates
● Content widgets for some scenarios
  ○ Ex: navboxes, data tables
● HTML diffs (for revision history)
● Abuse filters
Cross-team efforts
  ○ Parsoid, Services, VE, Platform, Design, Community, ...
Q1 tasks
● HTML views: work towards a prototype
  ○ Tidy support in parser tests
  ○ Visual diffs + reuse existing RT-testing framework
  ○ Start plugging gaps in rendering accuracy
    ■ Ex: mixed content-style templates used heavily in some infoboxes
  ○ Move data attrs to metadata storage
    ■ Depends on storage support (services Q1)
    ■ Work with VE + other clients that depend on these attributes for a transition plan.
  ○ Anything else?
  ○ First experiment with PDF rendering and/or “printable” page view?
Q1 tasks
● Language variant support
  ○ Finish up basic parsing / editing support
● HTML for template parameters
  ○ quantify perf issues
  ○ work with VE team to enable it
● Wrap up wikitext linting GSoC project
● Resume work on stable id support
● Hook up logging with logstash
● Other:
  ○ bug fixes, deploys
  ○ Wikimania, post-Wikimania vacations
Thank you!
https://www.mediawiki.org/wiki/Parsoid