Quarterly review Q4, June 2014

Page 1: Quarterly review Q4, June 2014 - Wikimedia

Quarterly review Q4, June 2014

Agenda

○ Our objectives
○ Progress Q4
○ Outlook 2014 / 15
○ Tasks Q1
○ Questions and discussion

Our objectives

Bridge between wikitext and HTML5 + RDFa

1. Faithful bidirectional conversion without dirty wikitext diffs

2. Improve performance & enable new features by moving our primary content representation to HTML5 + RDFa

3. Research better templating, widget, and diffing solutions

Progress Q4

What we got done:
● Continued to improve rendering
● Continued to improve RT-ing
● Improved performance
● Parsoid-specific CSS
● Improved testing
● Wrapped up image editing support
● Parsing of transclusion parameters to DOM
● Proper structured logging
● Mentoring of wikitext linter project
● Helped CX team get started with Parsoid HTML5
● Services bootstrapping

And what we didn’t:
● Language variant editing
● Visual diffing
● Content widgets (research)
● Support repeated switching between Wikitext & HTML in single edit without information loss (stretch)
● Non-Wikipedia project support (stretch)

Progress Q4:

Continuous iteration

○ Perfect round-tripping & render accuracy
    ○ template encapsulation, foster parenting, links, selective serialization, …
    ○ informed by data from testing, bug reports from VE usage, production logs

○ Continuous deploys
    ○ mediawiki.org/wiki/Parsoid/Deployments
    ○ Very solid thanks to work invested in testing

Progress Q4: Testing infrastructure

○ Parser tests
    ○ Now simulating edits
    ○ Support multiple formats per test for cases where PHP / Parsoid HTML differs (images)
    ○ Working on HTML tidy integration
        ■ Important for Parsoid HTML5 page view preparation

○ Round-Trip testing
    ○ Trivial edit mode to detect selser (selective serializer) regressions
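
The trivial-edit check above can be illustrated with a toy model. This is only a sketch under loud assumptions: the line-based "parser", the heading-normalizing serializer, and every name here (`Node`, `parse`, `full_serialize`, `selective_serialize`, `trivial_edit`) are hypothetical and are not Parsoid's code. It shows the general idea behind selective serialization: unmodified nodes reuse their original source, so a small edit does not drag unrelated normalization into the diff.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str               # "heading" or "text"
    content: str            # parsed content
    src: str                # original wikitext for this node
    modified: bool = False  # set when an editor touches the node


def parse(wikitext):
    """Toy parser: one node per line, headings lose their original spacing."""
    nodes = []
    for line in wikitext.split("\n"):
        m = re.fullmatch(r"==\s*(.*?)\s*==", line)
        if m:
            nodes.append(Node("heading", m.group(1), line))
        else:
            nodes.append(Node("text", line, line))
    return nodes


def serialize(node):
    """Normalizing serializer: always emits '== Heading ==' with spaces."""
    if node.kind == "heading":
        return f"== {node.content} =="
    return node.content


def full_serialize(nodes):
    return "\n".join(serialize(n) for n in nodes)


def selective_serialize(nodes):
    """Reuse the original source for every node that was not modified."""
    return "\n".join(serialize(n) if n.modified else n.src for n in nodes)


def trivial_edit(nodes):
    """Simulate a small edit on the last node only."""
    nodes[-1].content += " (edited)"
    nodes[-1].modified = True


def changed_lines(a, b):
    return sum(x != y for x, y in zip(a.split("\n"), b.split("\n")))


original = "==History==\nFirst paragraph.\nSecond paragraph."
nodes = parse(original)
trivial_edit(nodes)
full = full_serialize(nodes)
selser = selective_serialize(nodes)

print(changed_lines(original, full))    # 2: the edit plus a dirty heading diff
print(changed_lines(original, selser))  # 1: only the edited line
```

A regression test in this spirit would apply a trivial edit to each test page and flag any changed line outside the edited node.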

Progress Q4:

Parser test results

● Better coverage
● More passing tests

      # tests   wt2html     wt2wt        html2wt     html2html   selser
Q3    1280      821 / 459   1139 / 136   286 / 963   749 / 494   15994 / 1387
Q4    1347      911 / 436   1198 / 125   390 / 895   814 / 467   16438 / 1198
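
To compare the two quarters, the table can be turned into rates. The sketch below assumes each pair is "passing / failing"; the slide does not label the pairs, although the wt2html pairs do sum to the "# tests" column, so treat that reading as an assumption. The names `results` and `pass_rate` are made up for illustration.

```python
# Figures copied from the table above; (passing, failing) is an assumed reading.
results = {
    "Q3": {
        "wt2html": (821, 459),
        "wt2wt": (1139, 136),
        "html2wt": (286, 963),
        "html2html": (749, 494),
        "selser": (15994, 1387),
    },
    "Q4": {
        "wt2html": (911, 436),
        "wt2wt": (1198, 125),
        "html2wt": (390, 895),
        "html2html": (814, 467),
        "selser": (16438, 1198),
    },
}


def pass_rate(passed, failed):
    """Fraction of tests in one mode that pass."""
    return passed / (passed + failed)


for mode in results["Q3"]:
    q3 = pass_rate(*results["Q3"][mode])
    q4 = pass_rate(*results["Q4"][mode])
    print(f"{mode:10s} {q3:6.1%} -> {q4:6.1%}")
```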

Progress Q4:

Round-trip test results

Q3: 0.33% have semantic diffs without selser

Now: 0.25% have semantic diffs without selser

With selser: 0.006% (after trivial edits); only diffs introduced are single missing newlines.

Progress Q4:

Round-trip test results

Progress Q4:

Get ready for HTML5 page views

○ Goals:
    ○ speed up editing & logged-in browsing
    ○ enable new ways to view & edit content
    ○ avoid maintaining two render pipelines in the longer term
○ Developed Parsoid-specific CSS
    ○ now used by VE (Desktop + Mobile), Flow, CX etc.
○ Found & fixed some render issues in manual testing
○ Next steps:
    ○ Automate render testing / visual diffing
    ○ Services to provide needed infrastructure (Q1)

Progress Q4:

Get ready for HTML5 page views

Progress Q4:

Performance: nodegrind

Progress Q4:

Performance

Progress Q4:

Performance: HTML request

Progress Q4:

Performance: HTML request

Progress Q4:

Images

○ No shortage of edge cases, big mess in MW
○ Now basically bug-for-bug compatible
○ Started initiatives to clean up image handling in the longer term
    ○ semantic image formats
    ○ square bounding boxes
    ○ less confusing options, better uniform styling in skin / view

Progress Q4:

DOM tpl. parameter editing

○ Parse transclusion parameters to DOM
    ○ enables visual parameter editing
    ○ not yet enabled in prod as no VE support yet

○ Performance implications
    ○ Parse times:
        ■ ~10s -> ~16s on transclusion-heavy pages like [[Barack Obama]]
        ■ Transclusion reuse will partly mitigate problem in prod.; need to quantify this before deploy
    ○ HTML size:
        ■ Larger DOM
        ■ Will be mitigated when attributes are moved out (depends on storage service, goal Q1)

Progress Q4:

Structured Logging

● GELF logging using bunyan
● Will hook up to logstash, makes it easier to analyze issues & usage
● Even in current form, has exposed errors in production that have since been fixed.
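
For readers unfamiliar with GELF, the sketch below builds one GELF 1.1 record as a JSON line. It is written in Python for illustration and is not the bunyan-based logger the slide refers to; the function name `gelf_record`, the level names, the sample message, and the custom fields `wiki` and `title` are all assumptions. The fixed parts follow the GELF 1.1 payload format: `version`, `host`, and `short_message` are required, `level` is a syslog severity, and custom fields carry a leading underscore.

```python
import json
import socket
import time

# Syslog severities used by GELF's "level" field.
SYSLOG_LEVELS = {"fatal": 2, "error": 3, "warn": 4, "info": 6, "debug": 7}


def gelf_record(short_message, level="info", host=None, **fields):
    """Build a GELF 1.1 payload as a plain dict."""
    record = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": SYSLOG_LEVELS[level],
    }
    for key, value in fields.items():
        if key == "id":
            # The GELF spec does not allow an additional field named _id.
            raise ValueError("'_id' is not allowed as an additional GELF field")
        record["_" + key] = value  # additional fields are underscore-prefixed
    return record


# Hypothetical log event; message and field values are made up.
rec = gelf_record(
    "serialization error",
    level="error",
    host="parsoid-test",
    wiki="enwiki",
    title="Barack Obama",
)
line = json.dumps(rec)
print(line)
```

Because every record is structured JSON, a collector such as logstash can filter or aggregate on fields like `_wiki` instead of parsing free-form text.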

Progress Q4:

Language variants

○ Work just started
    ○ Basic parsing & DOM representation
    ○ Lots of very different bits of functionality squeezed into one -{ … }- syntax
    ○ Occupies an awkward post-processing spot in PHP parser; better integrated in Parsoid; will have to unify in future

○ Next: Full editing support
    ○ But: no rule/glossary support yet (RFC for that)
    ○ Editing will be in a mix of variants
        ■ Handling variants better is a Research Project (i.e., different ideas how to best do it)

Progress Q4:

Wikitext linter GSoC project

○ GSoC ‘14 project (Hardik Juneja)
○ Flags wikitext “errors”
    ■ Missing end-tags, fostered content, etc.
○ Collect statistics on wikitext usage patterns
    ■ Ex: Multi-template uses
○ Surfaces info that Parsoid already has since:
    ■ it has to recover from these errors
    ■ it has to reproduce errors back
○ Collected during normal parsing
    ■ Sends data to an “external” webservice
○ Name? Lintoid/LintTrap/WikiLint/Linter/??
    ○ Bikeshedding reqd .. not necessarily now. :-)

Progress Q4:

Wikitext linter GSoC project

○ http://lintbridge.wmflabs.org/
    ○ presents results with HTML & web service interfaces
○ Future plan:
    ○ Enable in prod (either always or in sampling mode: 1 in N parses depending on overheads)
    ○ Pursue collaboration with Project CheckWiki
    ○ Possibly in decent shape by Q2
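
The "1 in N parses" sampling mode mentioned in the future plan could look roughly like this. It is a minimal sketch, not the linter's actual code: the function name `should_lint` and the choice of random rather than counter-based sampling are assumptions.

```python
import random


def should_lint(n, rng=random):
    """Return True for roughly 1 in n parses; n == 1 means always lint."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return rng.randrange(n) == 0


# Example: with n = 10, about a tenth of the parses are linted.
rng = random.Random(0)  # seeded only to make the demo repeatable
linted = sum(should_lint(10, rng) for _ in range(10_000))
print(f"linted {linted} of 10000 parses")
```

Random sampling keeps each parse independent, so no shared counter is needed across worker processes; a deterministic counter would give exact rates at the cost of shared state.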

Progress Q4:

HTML templating

● Goals:
    ○ Secure & sane: Balanced DOM, Context Sensitive Escaping
    ○ Efficient to execute on server or client
    ○ Content: Develop convincing alternative to wikitext-based templating
● Implemented KnockOff / TAssembly Q3
● Mobile now testing KnockOff / TAssembly for client-side UI templating
● first design sketch for i18n & content tpl.
● had useful conversations with i18n folks

Progress Q4:

Helped Content Translation team

Progress Q4:

Services

○ Gabriel transitioning into services team
    ○ Starting up: hiring now
    ○ Goal setting, design discussions
        ■ Authentication RFC
        ■ REST API front-end
        ■ Storage service
○ Infrastructure work
    ○ pushing (Debian, for now) packaging along
○ Scott working on PDF renderer
    ○ Leverages semantic info in Parsoid HTML for massaging & conversion to LaTeX
    ○ Getting ready for deploy

Progress Q4:

Things we didn’t get done

● Language variant editing (only partly)
● Visual diffing (for HTML pageview testing)
● Performance: More efficient template updates (stretch goal for Q4)
    ○ lower priority than HTML storage
● Research: Support switching between HTML and Wikitext within one edit
    ○ works better with stable-id support
● Research: Content widgets

Stretch goals Q4

○ Non-Wikipedia projects
    ○ Parsoid enabled on all public WMF projects
    ○ biggest issue for editing seems to be labeled section transclusion (primarily Wikisource)
    ○ potential to fix complexity issues in projects like Wiktionary
    ○ no time spent on this so far

2014/15 plans

https://www.mediawiki.org/wiki/Parsoid/Roadmap/2014_15_Draft

Broad focus areas:
● Continue iterating on RT & render quality, performance
● Language variant support
● Parsoid HTML page views
● Stable element ids
● Support for HTML wikis

2014/15 Q1-Q4: More of the same

Not tied to roadmap tasks / features:
● Bug fixing
● Long tail of compatibility with PHP parser + Tidy combo
● Performance work
● Maintenance and nodejs upgrades (0.8 → 0.10 → 0.11 …)
● Continuous deploys
● GSoC, talks, conferences, etc.

2014/15 Q1: Language variant support

● Finish up basic parsing support for lang. variants
● Use a bot to fix nesting issues:
    -{zh-cn=<span style='color:red';zh-tw=<span style='color:green'}->foo</span>
    rewritten to
    <span style="-{zh-cn='color:red';zh-tw='color:green'}-">foo</span>
● Finish basic “mix of variants” editing support
● Evolve longer term strategy
    ○ glossaries, page-specific variant rules, etc.
    ○ better editing support when working in a variant (or use content translation mechanisms)

2014/15 Q1-Q3/Q4: Parsoid HTML pageviews

● Dependencies:
    ○ HTML storage + Content API (Services team)
    ○ User-agnostic HTML + client-side customization
        ■ Redlinks, etc. (Services, platform teams)
    ○ Any team willing to experiment with this initially
● Time line:
    ○ Prototype by end of Q1
        ■ will have good idea of remaining render quality work by then
    ○ Move to beta in Q2
    ○ Work with Mobile on specific skinning

2014/15 Q1-Q3/Q4: Parsoid HTML pageviews: Tasks

● Reduce HTML footprint
    ○ Move data attributes into metadata storage
    ○ Size difference
        ■ Currently: 3-3.5x on large pages
        ■ Goal: ~1x
● Identify missing functionality/support
    ○ Ex: Mixed content/styling templates
    ○ Long tail of rendering diffs
● Testing + QA
    ○ Tidy support in parser tests (Work-In-Progress)
    ○ Visual diffs
    ○ Reuse existing RT-testing framework
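
The "move data attributes into metadata storage" task can be sketched as splitting a page into slim HTML plus a side table keyed by element id. This is an illustration only: it assumes well-formed XML-style markup so that Python's `xml.etree.ElementTree` can parse it, uses a plain dict as a stand-in for the storage service, and the function name `split_metadata` and the sample `data-parsoid` values are made up.

```python
import xml.etree.ElementTree as ET


def split_metadata(markup, attrs=("data-parsoid",)):
    """Strip the given attributes and return (slim_markup, metadata_by_id)."""
    root = ET.fromstring(markup)
    metadata = {}
    for el in root.iter():
        el_id = el.get("id")
        if el_id is None:
            continue  # without an id there is no key to store metadata under
        for name in attrs:
            if name in el.attrib:
                metadata.setdefault(el_id, {})[name] = el.attrib.pop(name)
    return ET.tostring(root, encoding="unicode"), metadata


# Made-up sample page; the attribute values are placeholders.
page = (
    "<div>"
    "<p id=\"p1\" data-parsoid='{\"dsr\":[0,3,0,0]}'>foo</p>"
    "<p id=\"p2\" data-parsoid='{\"dsr\":[5,8,0,0]}'>bar</p>"
    "</div>"
)
slim, meta = split_metadata(page)
print(len(page), "->", len(slim), "characters")
print(meta["p1"])
```

The split only works if element ids survive re-parsing, which is why this task is tied to the stable element id work described later in the deck.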

2014/15 Q1-Q3: Stable element ids (What?)

● Parsoid parses page P with WT foo: <p id="1">foo</p>
● I edit source of P to bar\n\nfoo
● Parsoid reparses P: <p id="..">bar</p><p id="..">foo</p>
● Can we re-assign id 1 to the second para?
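
One naive way to answer the question above is to align the old and new paragraph lists and carry ids over for exact matches. The sketch uses Python's `difflib.SequenceMatcher` for the alignment; it is an assumption-laden illustration, not the approach the team chose, and the function name `reassign_ids` and the id scheme are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import count


def reassign_ids(old, new_texts, fresh_ids):
    """old: list of (id, text); new_texts: list of text.

    Unchanged paragraphs keep their old id; everything else gets a fresh one.
    """
    old_texts = [text for _, text in old]
    ids = [None] * len(new_texts)
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ids[block.b + k] = old[block.a + k][0]
    return [
        (el_id if el_id is not None else next(fresh_ids), text)
        for el_id, text in zip(ids, new_texts)
    ]


old = [("1", "foo")]                    # first parse: <p id="1">foo</p>
fresh = (str(n) for n in count(2))      # made-up generator of new ids
new = reassign_ids(old, ["bar", "foo"], fresh)
print(new)  # [('2', 'bar'), ('1', 'foo')]
```

Here the second paragraph keeps id 1 and the inserted paragraph gets a new id. Exact matching is the simplest policy; a paragraph that was edited rather than moved would lose its id under this sketch, so a real solution would need some notion of similarity.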

2014/15 Q1-Q3: Stable element ids (Why?)

● Lets us associate metadata with content elements
    ○ authorship maps
    ○ efficient diffs
    ○ inline comments
    ○ content translation tracking
    ○ awesome features we haven’t thought about yet
● Lets us slim down HTML size, but still support switching between HTML & wikitext in VE

2014/15 Q2-Q4: Support for HTML wikis

● HTML content templates
● Content widgets for some scenarios
    ○ Ex: navboxes, data tables
● HTML diffs (for revision history)
● Abuse filters

Cross-team efforts
    ○ Parsoid, Services, VE, Platform, Design, Community, ...

Q1 tasks

● HTML views: work towards a prototype
    ○ Tidy support in parser tests
    ○ Visual diffs + reuse existing RT-testing framework
    ○ Start plugging gaps in rendering accuracy
        ■ Ex: mixed content-style templates used heavily in some infoboxes
    ○ Move data attrs to metadata storage
        ■ Depends on storage support (services Q1)
        ■ Work with VE + other clients that depend on these attributes for a transition plan.
    ○ Anything else?
    ○ First experiment with PDF rendering and/or “printable” page view?

Q1 tasks

● Language variant support
    ○ Finish up basic parsing / editing support
● HTML for template parameters
    ○ quantify perf issues
    ○ work with VE team to enable it
● Wrap up wikitext linting GSoC project
● Resume work on stable id support
● Hook up logging with logstash
● Other:
    ○ bug fixes, deploys
    ○ Wikimania, post-wikimania vacations

Thank you!

https://www.mediawiki.org/wiki/Parsoid