5 Reasons Not to Use DITA from a CCMS Perspective

Marcus Kesseler, Managing Director – SCHEMA GmbH

TEKOM 2015, Stuttgart – November 10

Some Definitions and Terminology

SCHEMA Group 2015 – All rights reserved

Definitions and Terminology: Marcus Kesseler, SCHEMA & DERCOM

Marcus Kesseler

Computer Scientist with a heavy Artificial Intelligence background.

One of two founders and managing directors of SCHEMA GmbH.

SCHEMA

A software company based in Nürnberg.

SCHEMA is 20 years old and we have been making and selling CCMS from day one.

DERCOM

Is the Association of German Manufacturers of Authoring and Content Management Systems.

Currently 7 companies, with 1,400 customers between them.

Definitions and Terminology: CCMS

CCMS: Component Content Management System.

The main difference between a CMS and a CCMS: A CCMS has the ability to aggregate content components into larger documents.

A CCMS is able to publish content as “classic” documents or as Web portal content or app content, all with very high quality.

Definitions and Terminology: DITA

DITA: Darwin Information Typing Architecture, an XML- and file-based standard for the representation of componentized and interlinked content.

Although there are several DITA-based CCMS implementations, DITA can be used with just an XML Editor, the file system and the DITA Open Toolkit.

What we like about DITA is the visibility it brings to the enormous advantages of componentized content.

We fully agree with the DITA community that there really is no alternative to working with components (or topics) in large-scale, state-of-the-art technical content authoring, management and distribution.

More Terminology: Essential and Incidental Complexity

Essential complexity, also called intrinsic or inherent complexity, is the complexity you cannot hide or get rid of in a software implementation. It is directly derived from the domain you are modelling.

Example: When moving from document-based content authoring to componentized authoring, the number of objects you have to deal with goes up by two or three orders of magnitude. The only way to hide this increase would be to hide the components, which, of course, would defeat the purpose.

Incidental complexity is an extra dose of complexity added on top of the essential complexity by bad choices of architecture, data representation or user experience design.

Context of this Talk: Large Technical Content Departments

Our context is not the Lone Technical Content Ranger

All arguments in this talk assume that we are talking about the processes and needs of large technical content departments operating at a high level of maturity.

We are not talking about the perspective of the Lone Technical Content Ranger.

Russell Ward presented this perspective in his great talk last year here at tekom 2014:

Five reasons not to use DITA

http://conferences.tekom.de/fileadmin/tx_doccon/slides/742_5_Reasons_Not_to_Use_DITA.pdf

Large Technical Content Departments: Some Parameters

So, what is a Large Technical Content Department?

5 to several dozen technical writers.

Publications have to be regularly updated in 5 to 30 (or more) languages.

Multiple publication formats, including:

Paginated formats, like PDF (directly or via InDesign, FrameMaker or Word).

Online formats, like HTML, HTML5, EPUB, etc.

Custom XML formats.

Large Technical Content Departments: Processes & Workflows

The following are defined and enforced:

Writing standards and terminology

Translation standards and workflows

Artwork & media standards and workflows

Publication workflows

Release workflows

Distribution workflows

Large Technical Content Departments: Core Challenges

Layout has to be of the highest quality, strictly adhering to Corporate Design standards.

Products are highly modular or organized in product families with common base features, both of which are key requirements for effective and massive content reuse.

Product innovation is fast and relentless; the technical content team is always under pressure to keep product and information life cycles in sync.

So, just another great day in the wonderful world of technical content publishing. Life is good!

Reason 1: Coverage of Component Content Management Requirements in DITA is Surprisingly Small

Requirements Coverage of XML, DITA and CCMS

#   Max Points  XML  DITA  CCMS  Process Name & Requirements
1   10          0    3     9     Topics management (classes, workflows, versioning, ownership, access control).
2   10          0    3     9     Manage the links between topics (classes, workflows, versioning, ownership, referential integrity).
3   10          0    3     9     Management of the maps that build the publications out of the underlying components (versioning, ownership, referential integrity).
4   10          1    2     9     Manage the metadata on topics, links and maps (classes, workflows, versioning, ownership).
5   10          1    1     8     Translation management with automatic flagging of topics needing re-translation (ownership, workflow, dataflow).
6   10          1    2     7     Media assets management (classes, workflows, ownership, guidelines, conversion, translation).
7   10          0    4     8     Publication formats and layout management (design within corporate guidelines, implementation, revisions).
8   10          0    2     6     Automatic publication generation and channel-specific distribution (workflow, IT systems integration).
9   10          2    3     8     Overall content, links and publications quality assurance and approval processes (correctness, writing style, terminology, translations, links, publication maps, graphics and layout).

Requirements Coverage of XML, DITA and CCMS (continued)

#   Max Points  XML  DITA  CCMS  Process Name & Requirements
10  10          0    2     9     Information model management (conceptual design, classes, roles, rights, workflows, evolution).
11  10          0    2     4     Performance & costs management (financial controlling, key performance indicators monitoring, tracking, corrective actions).
12  10          0    0     8     Security (user management, user roles, access control, change tracking).
13  10          0    0     4     IT and software infrastructure management (change, updates and upgrades).
14  10          0    0     3     Manage the communication with adjacent departments, like product management, engineering and marketing (responsibilities, workflows).
15  10          0    0     0     Team management (skills, training, structure, responsibilities, motivation).

    150         5    27    101   Coverage [Points]
                3%   18%   67%   Coverage [Percent]
                     27%   100%  Coverage with CCMS baseline [Percent]
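The coverage figures can be recomputed from the individual scores. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of the talk, and the scores are simply copied from the two table slides.

```python
# Scores per requirement as (XML, DITA, CCMS), copied from the table rows 1-15.
SCORES = [
    (0, 3, 9),  # 1  topics management
    (0, 3, 9),  # 2  links between topics
    (0, 3, 9),  # 3  maps
    (1, 2, 9),  # 4  metadata
    (1, 1, 8),  # 5  translation management
    (1, 2, 7),  # 6  media assets
    (0, 4, 8),  # 7  publication formats and layout
    (0, 2, 6),  # 8  automatic publication generation
    (2, 3, 8),  # 9  quality assurance and approval
    (0, 2, 9),  # 10 information model management
    (0, 2, 4),  # 11 performance & costs
    (0, 0, 8),  # 12 security
    (0, 0, 4),  # 13 IT and software infrastructure
    (0, 0, 3),  # 14 communication with adjacent departments
    (0, 0, 0),  # 15 team management
]
MAX_POINTS_PER_ROW = 10


def coverage(scores):
    """Return (maximum total, column totals, rounded percentages)."""
    max_total = MAX_POINTS_PER_ROW * len(scores)
    totals = [sum(column) for column in zip(*scores)]
    percents = [round(100 * total / max_total) for total in totals]
    return max_total, totals, percents


max_total, totals, percents = coverage(SCORES)
xml_total, dita_total, ccms_total = totals

# "CCMS baseline": DITA's points relative to what a CCMS covers.
dita_vs_ccms = round(100 * dita_total / ccms_total)

print(max_total, totals, percents, dita_vs_ccms)
```

The baseline row answers a narrower question than the percent row: of the requirements a CCMS covers in practice, how large a share falls within DITA's scope.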

Requirements Coverage of XML, DITA and CCMS

[Figure: coverage compared for XML, DITA, CCMS [DITA] and CCMS [DERCOM]. Labels: "Business logic in DITA Open Toolkit"; "Business logic in database, workflow system, TMS interfaces, media assets management, etc."; "Non-DITA CCMSs bonus for being on the market for at least 10 years longer?"]

Drawbacks of a Small Requirements Coverage

Comparing CCMSs based on their level of DITA compliance would not yield much insight, since most requirements are outside of DITA’s scope.

All features not within DITA’s scope would not be trivially portable to other DITA-based systems. Some examples:

Versioning

Translation states & dataflow

Release and ongoing workflow states

Media assets management

Access rights & user management

Note: Even with a DITA-based CCMS, you would incur a significant amount of vendor lock-in!

Reason 2: Evolution of the DITA Standard is too Slow

Evolution of DITA is too Slow

An update every five years is just not compatible with the demands of an ever-accelerating market (variables? scoped keys?).

Fast evolution of DITA is impeded by the following two inherently conflicting requirements:

The need to add features that are crucially missing in real-life application scenarios.

The need to prevent new features that would add even more incidental complexity to the standard.

Evolution of DITA is too slow

Scoped keys are a good example:

Under heavy reuse scenarios you are very, very likely to need them.

On the other hand, should tech writers really need to be trained in the scoping concepts of programming languages, just to be able to handle reuse variability?

Reason 3: How DITA deals with the Number of Files Explosion

How is a DITA Topic Represented in a File System?

[Figure: a DITA topic (TOP) stored as a single XML file, made up of three layers:]

File metadata (name, owner, last write date, …)

Metadata within the XML DITA topic (class, author, target audience, …)

XML content

Now we add some translations…

[Figure: the topic file multiplied into one file per language: TOP EN, TOP FR, TOP JA, TOP PT, …]

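… and some versions …

[Figure: each language file multiplied again into one file per version: TOP EN V1, TOP FR V1, TOP JA V1, TOP PT V1, …; TOP EN V2, TOP FR V2, TOP JA V2, TOP PT V2, …; up to TOP EN Vn, TOP FR Vn, TOP JA Vn, TOP PT Vn, …]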

… and after several years, a single topic may have proliferated into m × n files!

[Figure: grid of topic files with m languages (EN, FR, JA, PT, …) on one axis and n versions (V1, V2, …, Vn) on the other.]

How m × n Topics are Accessed in DITA

In DITA each single translation or version is a unique, individual file and hence a distinct topic.

The user has to know exactly what language and version is being referenced.

Keys or file names will likely follow some pattern like this:

Topic_Intro_en_V1

Topic_Intro_fr_V1

Topic_Intro_ja_V1

Topic_Intro_en_V2

Topic_Intro_fr_V2

Topic_Intro_ja_V2
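To make the growth concrete, here is a minimal sketch that generates names following the pattern above for m languages and n versions. The function name and its parameters are assumptions chosen for illustration; DITA itself does not prescribe any naming scheme.

```python
from itertools import product


def file_names(topic, languages, versions):
    """Build one name per (version, language) pair, e.g. Topic_Intro_en_V1."""
    return [
        f"{topic}_{language}_V{version}"
        for version, language in product(versions, languages)
    ]


names = file_names("Topic_Intro", ["en", "fr", "ja"], [1, 2])
for name in names:
    print(name)

# The number of names is always len(languages) * len(versions).
print(len(names))
```

Every one of these names has to be known and typed correctly by whoever references the topic, which is the bookkeeping burden the slide describes.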

How m × n Topics are Accessed in a CCMS

In a CCMS implemented on top of a database, all these m × n topics can be addressed with a single key:

[ID_Intro, Language, LatestReleasedVersion]

where Language and LatestReleasedVersion are variables that the system will automatically populate as needed.

In Computer Science this is called a composite key, a concept invented over 45 years ago at IBM.

Composite keys capture and optimally encode the regularities in the target domain and let the computer do the tedious book-keeping. This is what computers are good at!
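A composite-key lookup of this kind can be sketched as follows. This is an illustration under stated assumptions, not the data model of any particular CCMS: the class names, the `released` flag and the in-memory dictionary are all hypothetical stand-ins for what a real system would keep in database tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicVersion:
    topic_id: str
    language: str
    version: int
    released: bool
    content: str


class TopicStore:
    """Hypothetical store addressed by (topic_id, language, version)."""

    def __init__(self):
        self._rows = {}

    def add(self, row):
        self._rows[(row.topic_id, row.language, row.version)] = row

    def latest_released(self, topic_id, language):
        """Resolve the 'LatestReleasedVersion' part of the key."""
        candidates = [
            row
            for (tid, lang, _version), row in self._rows.items()
            if tid == topic_id and lang == language and row.released
        ]
        if not candidates:
            raise KeyError((topic_id, language))
        return max(candidates, key=lambda row: row.version)


store = TopicStore()
store.add(TopicVersion("ID_Intro", "en", 1, True, "Intro v1"))
store.add(TopicVersion("ID_Intro", "en", 2, True, "Intro v2"))
store.add(TopicVersion("ID_Intro", "en", 3, False, "Intro v3 (draft)"))
store.add(TopicVersion("ID_Intro", "fr", 1, True, "Intro v1 (FR)"))

print(store.latest_released("ID_Intro", "en").content)
print(store.latest_released("ID_Intro", "fr").content)
```

The caller supplies only the stable part of the key and the language; which version is current is a question the system answers, not the author.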

How m × n Topics are Accessed by the Author in a CCMS

Authors will rarely need to see, insert or handle full CCMS composite topic keys:

[ID_Intro, Language, LatestReleasedVersion]

Since the composite key structure is universal within the system, there is no need to explicitly represent the variable parts. They are optional and will be implicitly added at document aggregation time.

What the author sees and handles is just:

[ID_Intro]

And, of course, usually even this is hidden by the GUI.

Advantages of Composite Keys

DITA would be so much easier if references were defined as composite keys:

Maps would be directly reusable. No need to create and maintain a map for each language. A change to the map structure in English is automatically available in all other languages.

New languages (or versions) can be added to your pool without touching the maps at all!

No need to develop, train and enforce sophisticated file name or key patterns to manually capture and encode these rather trivial domain regularities.

Authors need only insert a reference to the topic; the system does the tedious and error-prone book-keeping.
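The map-reuse advantage can be illustrated with a small sketch. Everything here is hypothetical: the pool layout, the topic IDs other than ID_Intro, and the sample texts are invented for the example, and for brevity it picks the highest version number rather than tracking release states.

```python
# Hypothetical content pool: (topic_id, language) -> {version: text}.
POOL = {
    ("ID_Intro", "en"): {1: "Introduction", 2: "Introduction (revised)"},
    ("ID_Intro", "fr"): {1: "Introduction (FR)"},
    ("ID_Safety", "en"): {1: "Safety notes"},
    ("ID_Safety", "fr"): {1: "Consignes de sécurité"},
}

# One language-neutral map: it contains topic IDs only.
MANUAL_MAP = ["ID_Intro", "ID_Safety"]


def aggregate(topic_map, language, pool):
    """Fill in language and version at aggregation time."""
    document = []
    for topic_id in topic_map:
        versions = pool[(topic_id, language)]
        document.append(versions[max(versions)])
    return document


print(aggregate(MANUAL_MAP, "en", POOL))
print(aggregate(MANUAL_MAP, "fr", POOL))

# Adding a language touches only the pool; the map stays unchanged.
POOL[("ID_Intro", "de")] = {1: "Einleitung"}
POOL[("ID_Safety", "de")] = {1: "Sicherheitshinweise"}
print(aggregate(MANUAL_MAP, "de", POOL))
```

Because the map never mentions a language or a version, a structural change made once applies to every language the pool contains.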

Representation of m × n Topics in a CCMS

[Figure: nested containers for one topic.]

Topic container (TOPIC): metadata for all versions in all languages.

Language containers (EN, FR, JA, PT): metadata for all versions in this language.

XML containers (V1, V2, …, Vn within each language): metadata for this version in this language, plus the XML content.

Cool stuff you can easily do with Composite Keys

A complete and detailed translation status report is just a trivial query.
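As a rough illustration of such a query, the sketch below uses Python's built-in sqlite3 module. The table layout, the choice of English as source language, and the rule that a translation carries the version number of the source it was made from are assumptions for this example, not a description of any vendor's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE topic_version (
        topic_id TEXT    NOT NULL,
        language TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        PRIMARY KEY (topic_id, language, version)  -- the composite key
    )
    """
)
conn.executemany(
    "INSERT INTO topic_version VALUES (?, ?, ?)",
    [
        ("ID_Intro", "en", 1), ("ID_Intro", "en", 2),
        ("ID_Intro", "fr", 1), ("ID_Intro", "fr", 2),
        ("ID_Intro", "ja", 1),
        ("ID_Safety", "en", 1),
        ("ID_Safety", "fr", 1),
    ],
)

# For every topic and target language: latest source version,
# latest translated version, and the resulting status.
REPORT_SQL = """
WITH source AS (
    SELECT topic_id, MAX(version) AS version
    FROM topic_version WHERE language = 'en' GROUP BY topic_id
),
targets AS (
    SELECT DISTINCT language FROM topic_version WHERE language <> 'en'
),
translated AS (
    SELECT topic_id, language, MAX(version) AS version
    FROM topic_version WHERE language <> 'en' GROUP BY topic_id, language
)
SELECT s.topic_id, t.language, s.version, tr.version,
       CASE
           WHEN tr.version IS NULL THEN 'missing'
           WHEN tr.version < s.version THEN 'outdated'
           ELSE 'up to date'
       END AS status
FROM source AS s
CROSS JOIN targets AS t
LEFT JOIN translated AS tr
       ON tr.topic_id = s.topic_id AND tr.language = t.language
ORDER BY s.topic_id, t.language
"""

report = conn.execute(REPORT_SQL).fetchall()
for row in report:
    print(row)
```

With file names as the only index, the same report would require parsing every name in the repository and trusting that each one follows the naming pattern.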

Translation Report: Details

Representation of a Graphic in a CCMS

[Figure: nested containers for one graphic.]

Graphic container (GRAPHIC).

Language containers: Neutral, EN, PT.

Format containers within each language: Vector [SVG], Pixel [PNG], Source.

Graphics files within each format: V1, V2, …, Vn.

Call Out Designer

Reason 4: DITA’s XML-first Paradigm vs. a Database-first Paradigm

DITA’s XML-first Paradigm vs. a Database-first Paradigm

In DITA, every piece of information or data needed to drive business processes has to be inside the XML files, together with the content itself (= DITA’s XML-first paradigm).

This goes against quite a few information model design principles from Computer Science.

Any change, however minimal, to a topic can affect content, structure, linking or metadata and therefore has to be carefully scrutinized to identify what exactly changed and if any consistency rules were broken.

Enforcing the principles of Atomicity, Consistency and Isolation in DITA is quite a challenge (cf. The ACID Principles of Database Design).

DITA’s XML-first vs. Database-first

Please note that DITA’s XML-first paradigm is a huge incidental complexity driver for DITA-based CCMS implementations:

There is pressure to improve metadata handling by keeping the metadata in the database, but with XML-first you also have to keep it in the DITA files. Now there are two distinct and separate representations. You’ve lost your single source of truth.

The database value and the DITA XML value can become inconsistent through update conflicts and may have to be corrected manually by the users.

Controlling change permissions for individual metadata values in a file is also a huge challenge. It is possible to do it in good XML editors. But users can still open the XML file in Notepad…
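The double-representation problem can be shown with a short sketch that compares metadata held in a database record with the same metadata stored in a topic file. The record layout, the sample values and the function names are hypothetical; the XML is a simplified DITA-style topic used only for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified topic file with metadata stored inside the XML.
TOPIC_XML = """<topic id="intro">
  <title>Introduction</title>
  <prolog>
    <author>Alice</author>
    <metadata><audience type="administrator"/></metadata>
  </prolog>
  <body><p>Some content.</p></body>
</topic>"""

# Hypothetical database record for the same topic. Someone changed the
# audience here without the change reaching the file (or vice versa).
DB_RECORD = {"author": "Alice", "audience": "user"}


def xml_metadata(xml_text):
    """Extract the metadata fields that are duplicated in the database."""
    root = ET.fromstring(xml_text)
    audience = root.find("prolog/metadata/audience")
    return {
        "author": root.findtext("prolog/author"),
        "audience": audience.get("type") if audience is not None else None,
    }


def find_conflicts(db_record, xml_text):
    """Return {field: (database value, XML value)} for every mismatch."""
    in_xml = xml_metadata(xml_text)
    fields = sorted(set(db_record) | set(in_xml))
    return {
        field: (db_record.get(field), in_xml.get(field))
        for field in fields
        if db_record.get(field) != in_xml.get(field)
    }


conflicts = find_conflicts(DB_RECORD, TOPIC_XML)
print(conflicts)
```

A check like this can detect the mismatch but cannot decide which side is right; that decision falls back to a person, which is the cost of losing the single source of truth.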

Reason 5: The Default DITA Content Model is too Complex

Trend in CCMS: Content Model Complexity Reduction

In the last 10 years, there has been a very strong trend in the CCMS market to reduce the complexity of content models (aka semantic DTDs).

Content departments observed that, in the long term, they never recouped their investment in the design, implementation, training and especially maintenance of their sophisticated, made-to-order content models.

The trend is simply to move the needed business data from the XML content into the database, where it is much easier to implement, manage, interface with, retrieve and use productively.

Examples of Content Model Complexity Reduction

Some examples:

Topic types or classes are just metadata in the database. The variability on the XML Editor (DTD) level is reduced to an absolute minimum.

All metadata assigned to a topic is moved from the XML into the database.

Fine-grained variability in the content is handled by variables, which on the XML content level are just very simple references into the database. The data model for variables in the database is very powerful and table-oriented (like Excel), so that it is easy to maintain versions, languages and taxonomic dependencies of variable names and values without touching the XML content.
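The variable mechanism might look roughly like the sketch below. The `<var ref="…"/>` element, the table layout and the product names are invented for illustration; they are not DITA markup and not the syntax of any specific CCMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable table, one row per (name, language).
VARIABLES = {
    ("product_name", "en"): "Model A",
    ("product_name", "de"): "Modell A",
}

# In the content, a variable is only a bare reference by name.
CONTENT = '<p>Switch on the <var ref="product_name"/> before use.</p>'


def resolve_variables(xml_text, language, table):
    """Replace each <var> reference with its value for the given language."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first, because children are removed below.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag != "var":
                previous = child
                continue
            value = table[(child.get("ref"), language)] + (child.tail or "")
            if previous is None:
                # No earlier sibling: the text belongs to the parent.
                parent.text = (parent.text or "") + value
            else:
                # Otherwise it follows the previous sibling element.
                previous.tail = (previous.tail or "") + value
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


print(resolve_variables(CONTENT, "en", VARIABLES))
print(resolve_variables(CONTENT, "de", VARIABLES))
```

Renaming the product or adding a language is then an edit to one table row, with no change to the XML content.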

DITA Specialization

As a Computer Scientist, I think DITA Specialization is a really impressive and elegant solution for the implementation of sophisticated content models.

But again, DITA is adding all this sophistication to the XML level, where it will incur a big cost in incidental complexity.

I think there is a consensus that even the default DITA content model is already challenging for most technical writers new to component-based authoring.

DITA Specialization

The paradox is that, just to trim the content model down to a more manageable scope, you already need a significant amount of consulting and configuration.

The OASIS Lightweight DITA Initiative, chaired by Michael Priestley (IBM), is trying to remedy this situation, so that you can start simple and add more features later, when you understand the principles and can be sure that you really need them.

Summary & Conclusion

Summary of our 5 Reasons against DITA

1. Coverage of Component Content Management Requirements in DITA is Surprisingly Small.

2. Evolution of the DITA Standard is too Slow.

3. How DITA deals with the Number of Files Explosion.

4. DITA’s XML-first Paradigm.

5. The Default DITA Content Model is too Complex.

Conclusion

As long as the DITA standard is based on a non-negotiable XML-first paradigm, it will always incur a tremendous incidental complexity cost on multiple levels:

Initial configuration, even if just to trim DITA back, is significant.

Integrating DITA into a CCMS (or database) is fragile and expensive.

Technical writers will need a lot of training, and their motivation will need close monitoring.

Recommendation

Our recommendation would be to decouple the DITA business logic from the XML-first principle.

In the end, this means the DITA Open Toolkit would not be just a smart topic aggregation compiler, but behave much more like an integrated database application, in short: just like a state-of-the-art CCMS.

Tekom 2015 presents a very convenient opportunity to take a closer look at these systems!

Thank you very much for your attention!

Read our blog: http://blog.schema.de
