5 Reasons Not to Use DITA from a CCMS Perspective
Marcus Kesseler, Managing Director, SCHEMA GmbH
TEKOM 2015, Stuttgart, November 10
Some Definitions and Terminology
SCHEMA Group 2015 – All rights reserved
Definitions and Terminology: Marcus Kesseler, SCHEMA & DERCOM
Marcus Kesseler
Computer Scientist with a heavy Artificial Intelligence background.
One of two founders and managing directors of SCHEMA GmbH.
SCHEMA
A software company based in Nürnberg.
SCHEMA is 20 years old and we have been making and selling CCMS from day one.
DERCOM
The Association of German Manufacturers of Authoring and Content Management Systems.
Currently 7 companies, with 1,400 customers between them.
Definitions and Terminology: CCMS
CCMS: Component Content Management System.
The main difference between a CMS and a CCMS: A CCMS has the ability to aggregate content components into larger documents.
A CCMS is able to publish content as “classic” documents or as Web portal content or app content, all with very high quality.
Definitions and Terminology: DITA
DITA: Darwin Information Typing Architecture, an XML- and file-based standard for the representation of componentized and interlinked content.
Although there are several DITA-based CCMS implementations, DITA can be used with just an XML Editor, the file system and the DITA Open Toolkit.
What we like about DITA is the visibility it brings to the enormous advantages of componentized content.
We fully agree with the DITA community that there really is no alternative to working with components (or topics) in large-scale, state-of-the-art technical content authoring, management and distribution.
More Terminology: Essential and Incidental Complexity
Essential complexity, also called intrinsic or inherent complexity, is the complexity you cannot hide or get rid of in a software implementation. It is directly derived from the domain you are modelling.
Example: When moving from document-based content authoring to a componentized one, the number of objects you have to deal with goes up by two or three orders of magnitude. The only way to hide this increase would be to hide the components, which, of course, would defeat the purpose.
Incidental complexity is an extra dose of complexity added on top of the essential complexity by bad choices of architecture, data representation or user experience design.
Context of this Talk: Large Technical Content Departments
Our Context is not the Lone Technical Content Ranger
All arguments in this talk assume that we are talking about the processes and needs of large technical content departments operating at a high level of maturity.
We are not talking about the perspective of the Lone Technical Content Ranger.
Russell Ward presented this perspective in his great talk last year here at tekom 2014:
Five reasons not to use DITA
[http://conferences.tekom.de/fileadmin/tx_doccon/slides/742_5_Reasons_Not_to_Use_DITA.pdf]
Large Technical Content Departments: Some Parameters
So, what is a Large Technical Content Department?
5 to several dozen technical writers.
Publications have to be regularly updated in 5 to 30 (or more) languages.
Multiple publication formats, including:
Paginated formats, like PDF (directly or via InDesign, FrameMaker or Word).
Online formats, like HTML, HTML5, EPUB, etc.
Custom XML formats.
Large Technical Content Departments: Processes & Workflows
The following are defined and enforced:
Writing standards and terminology
Translation standards and workflows
Artwork & media standards and workflows
Publication workflows
Release workflows
Distribution workflows
Large Technical Content Departments: Core Challenges
Layout has to be of the highest quality, strictly adhering to Corporate Design standards.
Products are highly modular or organized in product families with common base features, both of which are key requirements for effective and massive content reuse.
Product innovation is fast and relentless, so the technical content team is always under pressure to keep product and information life cycles in sync.
So, just another great day in the wonderful world of technical content publishing. Life is good!
Reason 1: Coverage of Component Content Management Requirements in DITA is Surprisingly Small
Requirements Coverage of XML, DITA and CCMS
# | Process Name & Requirements | Max Points | XML | DITA | CCMS
1 | Topics management (classes, workflows, versioning, ownership, access control) | 10 | 0 | 3 | 9
2 | Manage the links between topics (classes, workflows, versioning, ownership, referential integrity) | 10 | 0 | 3 | 9
3 | Management of the maps that build the publications out of the underlying components (versioning, ownership, referential integrity) | 10 | 0 | 3 | 9
4 | Manage the metadata on topics, links and maps (classes, workflows, versioning, ownership) | 10 | 1 | 2 | 9
5 | Translation management with automatic flagging of topics needing re-translation (ownership, workflow, dataflow) | 10 | 1 | 1 | 8
6 | Media assets management (classes, workflows, ownership, guidelines, conversion, translation) | 10 | 1 | 2 | 7
7 | Publication formats and layout management (design within corporate guidelines, implementation, revisions) | 10 | 0 | 4 | 8
8 | Automatic publication generation and channel specific distribution (workflow, IT systems integration) | 10 | 0 | 2 | 6
9 | Overall content, links and publications quality assurance and approval processes (correctness, writing style, terminology, translations, links, publication maps, graphics and layout) | 10 | 2 | 3 | 8
Requirements Coverage of XML, DITA and CCMS (continued)
# | Process Name & Requirements | Max Points | XML | DITA | CCMS
10 | Information model management (conceptual design, classes, roles, rights, workflows, evolution) | 10 | 0 | 2 | 9
11 | Performance & costs management (financial controlling, key performance indicators monitoring, tracking, corrective actions) | 10 | 0 | 2 | 4
12 | Security (user management, user roles, access control, change tracking) | 10 | 0 | 0 | 8
13 | IT and software infrastructure management (change, updates and upgrades) | 10 | 0 | 0 | 4
14 | Manage the communication with adjacent departments, like product management, engineering and marketing (responsibilities, workflows) | 10 | 0 | 0 | 3
15 | Team management (skills, training, structure, responsibilities, motivation) | 10 | 0 | 0 | 0
- | Coverage [Points] | 150 | 5 | 27 | 101
- | Coverage [Percent] | - | 3% | 18% | 67%
- | Coverage with CCMS baseline [Percent] | - | - | 27% | 100%
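The summary rows can be re-derived from the per-process scores. The following is a minimal sketch in Python; the scores are copied from the two table slides, while the variable and function names are only illustrative.

```python
# Scores per process (rows 1 to 15), copied from the coverage table.
SCORES = {
    "XML":  [0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    "DITA": [3, 3, 3, 2, 1, 2, 4, 2, 3, 2, 2, 0, 0, 0, 0],
    "CCMS": [9, 9, 9, 9, 8, 7, 8, 6, 8, 9, 4, 8, 4, 3, 0],
}
MAX_POINTS = 10 * 15  # 10 points for each of the 15 processes


def percent(part, whole):
    """Return part/whole as a percentage, rounded to the nearest integer."""
    return round(100 * part / whole)


totals = {name: sum(points) for name, points in SCORES.items()}
coverage = {name: percent(total, MAX_POINTS) for name, total in totals.items()}
# DITA coverage when the CCMS score, not the maximum, is taken as 100%.
dita_vs_ccms = percent(totals["DITA"], totals["CCMS"])

print(totals)        # {'XML': 5, 'DITA': 27, 'CCMS': 101}
print(coverage)      # {'XML': 3, 'DITA': 18, 'CCMS': 67}
print(dita_vs_ccms)  # 27
```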
Requirements Coverage of XML, DITA and CCMS
[Figure: requirements coverage compared for XML, DITA, CCMS [DITA] and CCMS [DERCOM]. Annotations: "Business logic in DITA Open Toolkit"; "Business logic in database, workflow system, TMS interfaces, media assets management, etc."; "Non-DITA CCMSs bonus for being on the market for at least 10 years longer?"]
Drawbacks of a Small Requirements Coverage
Comparing CCMSs by their level of DITA compliance would not yield much insight, since most requirements are outside DITA’s scope.
All features not within DITA’s scope would not be trivially portable to other DITA-based systems. Some examples:
Versioning
Translation states & dataflow
Release and ongoing workflow states
Media assets management
Access rights & user management
Note: Even with a DITA-based CCMS, you would incur a significant amount of vendor lock-in!
Reason 2: Evolution of the DITA Standard is too Slow
Evolution of DITA is too Slow
An update every five years is just not compatible with the demands of an ever-accelerating market (variables? scoped keys?).
Fast evolution of DITA is impeded by the following two inherently conflicting requirements:
The need to add features that are crucially missing in real-life application scenarios.
The need to prevent new features that would add even more incidental complexity to the standard.
Evolution of DITA is too slow
Scoped keys are a good example:
Under heavy reuse scenarios you are very, very likely to need them.
On the other hand, should tech writers really need to be trained in the scoping concepts of programming languages, just to be able to handle reuse variability?
Reason 3: How DITA deals with the Number of Files Explosion
How is a DITA Topic Represented in a File System?
[Figure: a DITA topic (TOP) is a single XML file carrying three layers: file metadata (name, owner, last write date, …), metadata within the XML DITA topic (class, author, target audience, …), and the XML content.]
Now we add some translations…
[Figure: the topic file duplicated once per language: TOP EN, TOP FR, TOP JA, TOP PT, …]
… and some versions …
[Figure: each language file (TOP EN, TOP FR, TOP JA, TOP PT, …) duplicated once per version V1, V2, … Vn.]
… and after several years, a single topic may have proliferated into m × n files!
[Figure: grid of topic files, TOP EN/FR/JA/PT by V1/V2/…/Vn, with m languages along one axis and n versions along the other.]
How m × n Topics are Accessed in DITA
In DITA each single translation or version is a unique, individual file and hence a distinct topic.
The user has to know exactly what language and version is being referenced.
Keys or file names will likely follow some pattern like this:
Topic_Intro_en_V1
Topic_Intro_fr_V1
Topic_Intro_ja_V1
Topic_Intro_en_V2
Topic_Intro_fr_V2
Topic_Intro_ja_V2
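To make the growth concrete, the naming pattern above can be enumerated mechanically. This is only a sketch: the function name is made up for illustration, and real projects may well use a different pattern.

```python
from itertools import product


def topic_file_names(topic, languages, versions):
    """List one name per language/version pair, following the Topic_Intro_en_V1 pattern."""
    return [f"{topic}_{lang}_V{ver}" for ver, lang in product(versions, languages)]


names = topic_file_names("Topic_Intro", ["en", "fr", "ja"], [1, 2])
for name in names:
    print(name)
print(len(names))  # 3 languages x 2 versions = 6
```

With 30 languages and 10 versions, the same call would return 300 names for this one topic, each of which somebody has to reference correctly.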
How m × n Topics are Accessed in a CCMS
In a CCMS implemented on top of a database, all these m × n topics can be addressed with a single key:
[ID_Intro, Language, LatestReleasedVersion]
where Language and LatestReleasedVersion are variables that the system populates automatically as needed.
In Computer Science this is called a composite key, a concept invented over 45 years ago at IBM.
Composite keys capture and optimally encode the regularities in the target domain and let the computer do the tedious book-keeping. This is what computers are good at!
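A composite key lookup of this kind can be sketched in a few lines of Python. This is a toy in-memory stand-in, not how any particular CCMS is implemented; the class, its methods and the sample content are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopicStore:
    """Toy in-memory stand-in for the database behind a CCMS (hypothetical)."""
    # (topic_id, language, version) -> XML content
    rows: dict = field(default_factory=dict)
    # (topic_id, language) -> set of released version numbers
    released: dict = field(default_factory=dict)

    def add(self, topic_id, language, version, xml, is_released=False):
        self.rows[(topic_id, language, version)] = xml
        if is_released:
            self.released.setdefault((topic_id, language), set()).add(version)

    def resolve(self, topic_id, language, version=None):
        """Fill in the variable parts of the key; default to the latest released version."""
        if version is None:
            version = max(self.released[(topic_id, language)])
        return self.rows[(topic_id, language, version)]


store = TopicStore()
store.add("ID_Intro", "en", 1, "<topic>Intro v1</topic>", is_released=True)
store.add("ID_Intro", "en", 2, "<topic>Intro v2</topic>", is_released=True)
store.add("ID_Intro", "en", 3, "<topic>Intro v3 draft</topic>")
store.add("ID_Intro", "fr", 1, "<topic>Intro v1 (fr)</topic>", is_released=True)

print(store.resolve("ID_Intro", "en"))  # <topic>Intro v2</topic>
print(store.resolve("ID_Intro", "fr"))  # <topic>Intro v1 (fr)</topic>
```

The caller only names the topic and the language; which version counts as the latest released one is the system's book-keeping, not the author's.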
How m × n Topics are Accessed by the Author in a CCMS
Authors will rarely need to see, insert or handle full CCMS composite topic keys:
[ID_Intro, Language, LatestReleasedVersion]
Since the composite key structure is universal within the system, there is no need to explicitly represent the variable parts. They are optional and will be implicitly added at document aggregation time.
What the author sees and handles is just:
[ID_Intro]
And, of course, usually even this is hidden by the GUI.
Advantages of Composite Keys
DITA would be so much easier if references were defined as composite keys:
Maps would be directly reusable. No need to create and maintain a map for each language. A change to the map structure in English is automatically available in all other languages.
New languages (or versions) can be added to your pool without touching the maps at all!
No need to develop, train and enforce sophisticated file name or key patterns to manually capture and encode these rather trivial domain regularities.
Authors need only insert a reference to the topic; the system does the tedious and error-prone book-keeping.
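The map reuse argument can be illustrated with a small sketch. Everything here is hypothetical (the pool, the map and the aggregate function are not taken from any real system); it only shows that a map holding bare topic IDs can be resolved for any language at aggregation time.

```python
# Hypothetical pool: (topic_id, language) -> latest released XML content.
POOL = {
    ("ID_Intro", "en"): "<topic>Introduction</topic>",
    ("ID_Intro", "fr"): "<topic>Introduction (fr)</topic>",
    ("ID_Safety", "en"): "<topic>Safety</topic>",
    ("ID_Safety", "fr"): "<topic>Sécurité</topic>",
}

# One language-neutral map: only topic IDs, no language or version in the references.
MANUAL_MAP = ["ID_Intro", "ID_Safety"]


def aggregate(topic_map, language, pool):
    """Resolve every reference for the requested language at aggregation time."""
    return [pool[(topic_id, language)] for topic_id in topic_map]


english = aggregate(MANUAL_MAP, "en", POOL)
french = aggregate(MANUAL_MAP, "fr", POOL)

# Adding a language only touches the pool, never the map.
POOL[("ID_Intro", "pt")] = "<topic>Introdução</topic>"
POOL[("ID_Safety", "pt")] = "<topic>Segurança</topic>"
portuguese = aggregate(MANUAL_MAP, "pt", POOL)

print(french)
print(portuguese)
```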
Representation of m × n Topics in a CCMS
[Figure: a TOPIC container holds one language container per language (EN, FR, JA, PT); each language container holds XML containers for versions V1, V2, … Vn, each with its XML content. Metadata is attached at three levels: for all versions in all languages (topic container), for all versions in this language (language container), and for this version in this language (XML container).]
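The three metadata levels of this container model can be sketched as nested dictionaries. The field names and values below are invented for illustration, and the rule that the most specific level wins is an assumption, not something the slide states.

```python
from collections import ChainMap

# Hypothetical topic container with language and version containers nested inside.
TOPIC = {
    "metadata": {"class": "concept", "owner": "alice"},      # all versions, all languages
    "languages": {
        "fr": {
            "metadata": {"translator": "agency_a"},           # all versions in this language
            "versions": {
                1: {"metadata": {"state": "released"}, "xml": "<topic>v1 (fr)</topic>"},
                2: {"metadata": {"state": "draft"}, "xml": "<topic>v2 (fr)</topic>"},
            },
        },
    },
}


def effective_metadata(topic, language, version):
    """Merge the three metadata levels; the most specific level is looked up first."""
    lang = topic["languages"][language]
    ver = lang["versions"][version]
    return dict(ChainMap(ver["metadata"], lang["metadata"], topic["metadata"]))


print(effective_metadata(TOPIC, "fr", 2))
```

Each metadata value is stored exactly once, at the level where it applies, instead of being copied into every one of the m × n files.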
Cool stuff you can easily do with Composite Keys
A complete and detailed translation status report is just a trivial query.
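What such a query might look like can be sketched with Python's built-in sqlite3 module. The table layout, the sample rows and the status labels are assumptions for illustration; in particular, the sketch assumes English is the source language and that a translation carries the version number of the source version it was made from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topic_version ("
    " topic_id TEXT, language TEXT, version INTEGER,"
    " PRIMARY KEY (topic_id, language, version))"  # the composite key
)
conn.executemany(
    "INSERT INTO topic_version VALUES (?, ?, ?)",
    [
        ("ID_Intro", "en", 1), ("ID_Intro", "en", 2),
        ("ID_Intro", "fr", 1), ("ID_Intro", "fr", 2),
        ("ID_Intro", "ja", 1),
        ("ID_Safety", "en", 1),
    ],
)

# For every topic and target language, compare the newest translated version
# with the newest source (English) version.
REPORT_SQL = """
SELECT src.topic_id, lang.language, src.latest, MAX(tv.version)
FROM (SELECT topic_id, MAX(version) AS latest
      FROM topic_version WHERE language = 'en' GROUP BY topic_id) AS src
CROSS JOIN (SELECT DISTINCT language FROM topic_version
            WHERE language <> 'en') AS lang
LEFT JOIN topic_version AS tv
       ON tv.topic_id = src.topic_id AND tv.language = lang.language
GROUP BY src.topic_id, lang.language, src.latest
ORDER BY src.topic_id, lang.language
"""


def status(source_version, translated_version):
    """Classify one topic/language pair."""
    if translated_version is None:
        return "missing"
    return "up to date" if translated_version >= source_version else "outdated"


report = [
    (topic, lang, status(src, translated))
    for topic, lang, src, translated in conn.execute(REPORT_SQL)
]
for row in report:
    print(row)
```

With the sample rows, the French introduction is up to date, the Japanese one is outdated, and the safety topic is missing in both target languages.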
Translation Report: Details
Representation of a Graphic in a CCMS
[Figure: a GRAPHIC container holds language containers (Neutral, EN, PT); each language container holds format containers (Vector [SVG], Pixel [PNG], Source); each format container holds the graphics files for versions V1, V2, … Vn.]
Call Out Designer
Call Out Designer
Reason 4: DITA’s XML-first Paradigm vs. a Database-first Paradigm
DITA’s XML-first Paradigm vs. a Database-first Paradigm
In DITA, all information or data needed to drive business processes has to be inside the XML files, together with the content itself (DITA’s XML-first paradigm).
This goes against quite a few Computer Science principles of information model design.
Any change, however minimal, to a topic can affect content, structure, linking or metadata and therefore has to be carefully scrutinized to identify what exactly changed and if any consistency rules were broken.
Enforcing the principles of Atomicity, Consistency and Isolation in DITA is quite a challenge (cf. The ACID Principles of Database Design).
DITA’s XML-first vs. Database-first
Please note that DITA’s XML-first paradigm is a huge driver of incidental complexity for DITA-based CCMS implementations:
There is pressure to improve metadata handling by keeping the metadata in the database, but with XML-first you also have to keep it in the DITA files. Now there are two distinct and separate representations. You’ve lost your single source of truth.
The database value and the DITA XML value can become inconsistent through update conflicts and may have to be corrected manually by the users.
Controlling change permissions for individual metadata values in a file is also a huge challenge. It is possible to do it in good XML editors. But users can still open the XML file in Notepad…
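The double book-keeping described above means a DITA-based CCMS needs some kind of reconciliation step. The following sketch shows one way such a check could look; it assumes the metadata lives in othermeta elements of the topic prolog, and the function names and sample values are illustrative only.

```python
import xml.etree.ElementTree as ET

TOPIC_XML = """
<topic id="intro">
  <title>Intro</title>
  <prolog>
    <metadata>
      <othermeta name="status" content="draft"/>
      <othermeta name="owner" content="alice"/>
    </metadata>
  </prolog>
  <body><p>Text</p></body>
</topic>
"""

# What the (hypothetical) database believes about the same topic.
DB_METADATA = {"status": "released", "owner": "alice"}


def xml_metadata(topic_xml):
    """Read name/content pairs from the <othermeta> elements of a topic."""
    root = ET.fromstring(topic_xml)
    return {m.get("name"): m.get("content") for m in root.iter("othermeta")}


def find_conflicts(db_metadata, topic_xml):
    """Return {name: (database value, XML value)} for every value that differs."""
    in_file = xml_metadata(topic_xml)
    return {
        name: (db_value, in_file.get(name))
        for name, db_value in db_metadata.items()
        if in_file.get(name) != db_value
    }


print(find_conflicts(DB_METADATA, TOPIC_XML))  # {'status': ('released', 'draft')}
```

Detecting the conflict is the easy part; deciding which of the two values is the correct one is what ends up on the user's desk.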
Reason 5: The Default DITA Content Model is too Complex
Trend in CCMS: Content Model Complexity Reduction
In the last 10 years, there has been a very strong trend in the CCMS market to reduce content model complexity (aka semantic DTDs).
Content departments observed that, in the long term, they never recouped their investment in the design, implementation, training and especially maintenance of their sophisticated, made-to-order content models.
The trend is simply to move the needed business data from the XML content into the database, where it is much easier to implement, manage, interface with, retrieve and use productively.
Examples of Content Model Complexity Reduction
Some examples:
Topic types or classes are just metadata in the database. The variability on the XML Editor (DTD) level is reduced to an absolute minimum.
All metadata assigned to a topic is moved from the XML into the database.
Fine-grained variability in the content is handled by variables, which on the XML content level are just very simple references into the database. The data model for variables in the database is very powerful and table-oriented (much like an Excel sheet), so that it is easy to maintain versions, languages and taxonomic dependencies of variable names and values without touching the XML content.
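The variables mechanism can be sketched as follows. The var element, the table layout and the product data are all invented for illustration; the point is only that the XML holds a bare reference while languages (and, in a real system, versions) are handled in the table.

```python
import re

# Hypothetical variable table: (name, language) -> value, like cells of a spreadsheet.
VARIABLES = {
    ("product_name", "en"): "Pump X200",
    ("product_name", "de"): "Pumpe X200",
    ("max_pressure", "en"): "16 bar",
    ("max_pressure", "de"): "16 bar",
}

# The XML content only carries a very simple reference (element name is made up).
VAR_REF = re.compile(r'<var ref="([^"]+)"\s*/>')

CONTENT = '<p>The <var ref="product_name"/> is rated for <var ref="max_pressure"/>.</p>'


def resolve_variables(xml_content, language, variables):
    """Replace each <var ref="..."/> reference with its value for the given language."""
    def lookup(match):
        return variables[(match.group(1), language)]
    return VAR_REF.sub(lookup, xml_content)


print(resolve_variables(CONTENT, "en", VARIABLES))
print(resolve_variables(CONTENT, "de", VARIABLES))
```

Renaming the product then means changing a few table cells, not touching any topic.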
DITA Specialization
As a Computer Scientist, I think DITA Specialization is a really impressive and elegant solution for the implementation of sophisticated content models.
But again, DITA is adding all this sophistication to the XML level, where it will incur a big cost in incidental complexity.
I think there is a consensus that even the default DITA content model is already challenging for most technical writers new to component-based authoring.
DITA Specialization
There is a paradox, in that just to trim the content model down to a more manageable scope, you already need a significant amount of consulting and configuration.
The OASIS Lightweight DITA Initiative, chaired by Michael Priestley (IBM), is trying to remedy this situation, so that you can start simple and add more features later, when you understand the principles and can be sure that you really need them.
Summary & Conclusion
Summary of our 5 Reasons against DITA
1. Coverage of Component Content Management Requirements in DITA is Surprisingly Small.
2. Evolution of the DITA Standard is too Slow.
3. How DITA deals with the Number of Files Explosion.
4. DITA’s XML-first Paradigm.
5. The Default DITA Content Model is too Complex.
Conclusion
As long as the DITA standard is based on a non-negotiable XML-first paradigm, it will always incur a tremendous incidental complexity cost on multiple levels:
Initial configuration, even if just to trim DITA back, is significant.
Integrating DITA into a CCMS (or database) is fragile and expensive.
Technical writers will need a lot of training and close motivation monitoring.
Recommendation
Our recommendation would be to decouple the DITA business logic from the XML-first principle.
In the end, this means the DITA Open Toolkit would not be just a smart topic aggregation compiler, but behave much more like an integrated database application, in short: just like a state-of-the-art CCMS.
Tekom 2015 presents a very convenient opportunity to take a closer look at these systems!
Thank you very much for your attention!
Read our blog: http://blog.schema.de