TRANSCRIPT
Automated QA of DITA Content Sets
Ben Colborn, Sr. Manager, Technical Publications, Nutanix
“[The Machine] is a universal educator, surely raising the level of human intelligence. …Every age has done its work … with the best tools or contrivances it knew, the tools most successful in saving the most precious thing in the world—human effort.”—Frank Lloyd Wright, “The Art and Craft of the Machine”
What is quality?
First order of quality: ROT vs. RAT
• ROT: Redundant, Obsolete, Trivial
• RAT: Relevant, Accurate, Timely
Second order: Surface features
What can be discerned by an editor:
• General writing conventions
• Organizational conventions
• Domain conventions
• Information types
• Grammaticality
• Terminology
Levels of edit
• Coordination: manuscript handling, job monitoring and control
• Policy: ensuring that a publication reflects the policy of the organization
• Integrity: ensuring that parts of a publication match
• Screening: spelling, subject-verb agreement
• Copy clarification: clarifying illegible text, preparing graphics
• Format: ensuring conformity with format
• Mechanical style: checking capitalization, abbreviations, use of numbers, consistency of spelling, organizational terminology
• Language: checking grammar, usage, parallelism, conciseness
• Substantive: ensuring that the necessary content for the intended scope is present
Which levels of edit can be automated?
Automate what you can to free people to do what computers can’t!
Division of labor

Human:
• Coordination
• Policy (mostly)
• Copy clarification
• Substantive

Machine:
• Integrity: validate and check for completeness
• Screening: spell checker, grammar checker
• Format: schema-driven authoring, automatic stylesheet application
• Mechanical style: QA plugin
• Language: Acrolinx and aspirants
Mechanical style example
• MMSTP prohibits “click on”.
• How reliably will a computer find all occurrences of “click on” in 500 pages of content? How long will it take?
• How reliably will a person? How long will it take?
• What tasks of higher impact could the person have done in the same time?
• How will the person feel after making this attempt?
• What about when there are 100 rules and not just one?
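A check like this is trivial to automate. The following is a minimal sketch (not the plugin's actual code) of flagging the prohibited phrase in a topic; the function name `find_violations` and the sample text are illustrative:

```python
import re

# Flag every occurrence of the prohibited phrase "click on".
# \b word boundaries avoid matching e.g. "double-click onscreen help".
PATTERN = re.compile(r"\bclick on\b", re.IGNORECASE)

def find_violations(text):
    """Return (line_number, line) pairs containing 'click on'."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if PATTERN.search(line)]

sample = "Click on Save.\nSelect the file.\nThen click on OK."
print(find_violations(sample))  # [(1, 'Click on Save.'), (3, 'Then click on OK.')]
```

A script like this answers the slide's question in milliseconds, every time, for any number of rules.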
Approaches

Terminology:
• Acrolinx
• Shared dictionary
• String-matching script

Markup:
• Constraints
• Schematron
• XPath-matching script

The Ditanauts QA plugin is one of the three legs of a QA process, alongside spelling/grammar checks and editorial/peer review.
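To make the XPath-matching approach concrete, here is a minimal sketch, not the plugin itself. It uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset (a real implementation would use a full XPath 1.0 engine, as the OT does via XSLT); the rule set and messages are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each XPath paired with a resolution message.
RULES = {
    ".//required-cleanup": "Remove <required-cleanup> before publishing",
    ".//draft-comment": "Resolve draft comments",
}

def check_topic(xml_text):
    """Return the messages for every rule that matches in the topic."""
    root = ET.fromstring(xml_text)
    return [msg for xpath, msg in RULES.items() if root.findall(xpath)]

topic = "<task><title>t</title><draft-comment>fix me</draft-comment></task>"
print(check_topic(topic))  # ['Resolve draft comments']
```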
Ditanauts QA plugin
Overview
• Freely available on GitHub
• Customization of the DITA Open Toolkit HTML plugin
• Checks for the occurrence of XPath expressions
• Creates a report database (for customization) and user-readable reports
Process
1. Compile terminology checks into XPath expressions.
2. Check each topic for the occurrence of user-configured XPath expressions.
3. Write a database file (a DITA topic) listing each topic with the found violations and other metadata.
4. Write user-readable reports: quality summary, DITAMAP, CSV.
Input: XPath expressions
Input: Expression compiler
Execution

OT 1.x:
> ant -Dtranstype=qa -Douter.control=quiet \
      -Dargs.input=samples/taskbook.ditamap \
      -Dsetchunk=true

OT 2.x:
> dita -f qa -i samples/taskbook.ditamap \
      -Dsetchunk=true
Output: Database file
Output: CSV
Output: DITAMAP
Best practices
• Keep the list of violations short.
• Only include violations that are likely to occur in your content set.
• Only include violations that are impactful.
• Only include rules that are systematically violated.
• Update the violations list over time.
• Carefully craft checks to avoid false positives.
• Provide a specific resolution for each violation.
• Use @class rather than element names in the XPath expressions.
• Designate a project team member to run the QA routine and follow up on resolution.
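The @class recommendation matters because DITA specialization renames elements while preserving the ancestry recorded in the @class attribute, so an element-name match silently misses specialized content. A sketch of the difference (the specific class token shown is an assumption for illustration):

```
<!-- Element-name match: misses any specialization of <codeph> -->
//codeph

<!-- @class match: catches <codeph> and elements specialized from it -->
//*[contains(@class, ' pr-d/codeph ')]
```

Note the leading and trailing spaces inside the token, which prevent accidental matches on longer class names.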
Resources
• QA plugin: https://github.com/dita-community/org.dita-community.qa
• Ditanauts blog: http://ditanauts.org/tag/qa/