Page 1: Csun presentation-170302-hykim

AUTOMATIC AND SEMI-AUTOMATIC 2-TIER CHECK SYSTEM

FOR EPUB ACCESSIBILITY

2017.03.02 Hyun-Young Kim SookMyung Women’s University

Page 2: Csun presentation-170302-hykim

WHAT IS EPUB

• One of several e-book file formats

• De facto standard, published by the International Digital Publishing Forum (IDPF) since 2007

• De jure standard, published by the International Organization for Standardization (ISO) as ISO/IEC TS 30135 (parts 1-7) in 2014

• Version history

EPUB 2.0 in October 2007

Maintenance update (2.0.1) in September 2010

EPUB 3.0 in October 2011

Maintenance update (3.0.1) in June 2014

EPUB 3.1, the current version, in January 2017

Page 3: Csun presentation-170302-hykim

EPUB & WEB RELATION

• EPUB production needs web technologies

• W3C's Web Accessibility Initiative

Web Content Accessibility Guidelines (WCAG) 2.0

Accessible Rich Internet Applications (WAI-ARIA) 1.0

• EPUB also needs the book metaphor and structural information

Semantic Markup Features

Navigation Features

Page 4: Csun presentation-170302-hykim

EXISTING ACCESSIBILITY DOCUMENTS

• IDPF EPUB3 Accessibility Guidelines

• Semantics, Navigation, Metadata

• XHTML Content Documents, MathML, SVG, EPUB Style Sheets, Media Overlays

• IDPF EPUB Accessibility 1.0

• Developed as part of EPUB 3.1 to provide guidance on making EPUB publications accessible

• BISG (Book Industry Study Group) Quick Start Guide To Accessible Publishing

• Essential Check Points from EPUB3 Accessibility Guidelines

• DAISY member, DIAGRAM Image Description Guidelines

• Description guidelines that apply to any type of image.

• Guidelines for describing images within specific types of categories, such as maps.

Page 5: Csun presentation-170302-hykim

EPUB PRODUCTION STATUS IN KOREA

• Publishers only convert content to EPUB; accessibility is not addressed

• The National Library has to re-produce titles as DAISY or accessible EPUB

• The library defined e-book accessibility certification criteria and designated them as an industry standard in Korea

• The proposed accessibility checker is based on these e-book accessibility certification criteria

Page 6: Csun presentation-170302-hykim

PROPOSED CHECKER

• 156 Check Points from Previous Guidelines

• Some Check Points can be decided automatically

• Language definition, existence of LOI (list of illustrations) and LOT (list of tables), existence of LOA (list of audio clips) and LOV (list of video clips), etc.

• Others must be decided manually

• whether the epub:type attribute is meaningful enough

• whether each page number accurately matches the page number in the paper book, etc.

• 2-tier Checker

• Automatic Check for 39 Points, PC Standalone version

• Semi-Automatic Check for 117 Points, Web version linked with editor

• Web Checker indicates points where problems may occur

• HTML Editor that opens XHTML and CSS documents after decomposing EPUB
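One automatic check of the kind listed above (existence of LOI and LOT) can be sketched as follows. This is a minimal illustration, not the presented checker: it assumes the lists are declared as nav elements with epub:type values in the navigation document, and the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def find_nav_types(nav_xhtml):
    """Collect the epub:type tokens declared on <nav> elements."""
    # Note: ElementTree needs well-formed XML; undefined entities such as
    # &nbsp; would raise a ParseError here.
    root = ET.fromstring(nav_xhtml)
    found = set()
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        found.update(nav.get(f"{{{EPUB_NS}}}type", "").split())
    return found


def missing_lists(nav_xhtml, required=("toc", "loi", "lot")):
    """Return the required navigation lists that are not declared."""
    found = find_nav_types(nav_xhtml)
    return [name for name in required if name not in found]


# Hypothetical navigation document with a TOC and an LOI but no LOT.
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops" lang="ko" xml:lang="ko">
  <body>
    <nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav>
    <nav epub:type="loi"><ol><li><a href="ch1.xhtml#fig1">Figure 1</a></li></ol></nav>
  </body>
</html>"""

print(missing_lists(SAMPLE_NAV))  # ['lot']
```

Checks such as "is the epub:type value meaningful enough" cannot be decided this way, which is why they fall to the semi-automatic tier.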

Page 7: Csun presentation-170302-hykim

AUTOMATIC CHECKER

Page 8: Csun presentation-170302-hykim

SEMI-AUTOMATIC CHECKER

Page 9: Csun presentation-170302-hykim

PROPOSED CHECKER VS. EPUBCHECK

• EpubCheck

• Tool to validate EPUB files, developed by IDPF and DAISY

• Detects many types of errors in EPUB structure, such as OCF container structure, OPF and OPS mark-up, and internal reference consistency

• Does not check any accessibility issues

• Proposed Checker

• Tool to investigate the accessibility of EPUB

• Some modules are the same as those of EpubCheck: parsing the EPUB package and checking OCF-related content
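The shared OCF-related checking could look roughly like the sketch below. It is an assumption about what such a module covers, written with the Python standard library only; it is not code from EpubCheck or from the proposed checker, and check_ocf and build_sample_epub are hypothetical names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def check_ocf(epub_file):
    """Return a list of OCF container problems (path or file object)."""
    problems = []
    with zipfile.ZipFile(epub_file) as zf:
        names = zf.namelist()
        # The mimetype file must come first and hold the EPUB media type.
        if not names or names[0] != "mimetype":
            problems.append("mimetype must be the first entry in the archive")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype must contain 'application/epub+zip'")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        # container.xml points at the package document (OPF).
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        paths = [rf.get("full-path")
                 for rf in root.iter(f"{{{CONTAINER_NS}}}rootfile")]
        if not paths:
            problems.append("container.xml declares no rootfile")
        for path in paths:
            if path is None or path not in names:
                problems.append(f"package document not found: {path}")
    return problems


def build_sample_epub(with_opf=True):
    """Build a tiny in-memory EPUB skeleton for demonstration only."""
    container = (
        f'<container xmlns="{CONTAINER_NS}" version="1.0"><rootfiles>'
        '<rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/>'
        "</rootfiles></container>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        if with_opf:
            zf.writestr("OEBPS/content.opf", "<package/>")
    buf.seek(0)
    return buf


print(check_ocf(build_sample_epub()))
print(check_ocf(build_sample_epub(with_opf=False)))
```

The first call reports no problems; the second reports that the package document named in container.xml is not in the archive.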

Page 10: Csun presentation-170302-hykim

WORKFLOW OF PROPOSED CHECKER

[Workflow diagram]

EPUB

Decomposition: XHTML / CSS / SVG / SMIL / Navigation

Inspection:

Lang / Audio Clip / Video Clip / Alt Text …

CSS separation / em / strong / Formatting / justified …

SVG lang / description

media-type / list

TOC / LOI / LOV / LOT…

OPF Metadata / lang …
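The decomposition step can be sketched as grouping the archive entries by component type before inspection. This is a simplification under stated assumptions: it classifies by file extension, whereas a real checker would more likely read the media types from the OPF manifest, and the names decompose, COMPONENT_BY_SUFFIX, and build_sample are hypothetical.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to the component names on the slide.
COMPONENT_BY_SUFFIX = {
    ".xhtml": "XHTML",
    ".html": "XHTML",
    ".css": "CSS",
    ".svg": "SVG",
    ".smil": "SMIL",
    ".opf": "OPF",
}


def decompose(epub_file):
    """Group the entries of an EPUB archive by component type."""
    groups = defaultdict(list)
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            suffix = PurePosixPath(name).suffix.lower()
            component = COMPONENT_BY_SUFFIX.get(suffix)
            if component:
                groups[component].append(name)
    return dict(groups)


def build_sample():
    """Build an in-memory archive with empty placeholder files."""
    names = ("mimetype", "OEBPS/content.opf", "OEBPS/ch1.xhtml",
             "OEBPS/style.css", "OEBPS/fig1.svg", "OEBPS/ch1.smil")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    buf.seek(0)
    return buf


for component, files in decompose(build_sample()).items():
    print(component, files)
```

Each group can then be handed to the inspection routine for that component, for example language and alt-text checks for XHTML or justification checks for CSS.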

Page 11: Csun presentation-170302-hykim

VERIFICATION OF CHECKER

• 50 EPUB files that have been deposited with the National Library of Korea

• 148 accessibility defects per file on average

• Accessibility errors concentrate on 8 check points

• The Korean e-book market is 90% EPUB 2.x and 10% EPUB 3.x

• Very few e-books use multimedia, MathML, or Media Overlays

• The 8 error points occur in parts that are unrelated to the EPUB 3 specifications

Page 12: Csun presentation-170302-hykim

MAJOR DEFECTS

• To define the default language for an XHTML document, the lang and xml:lang language attributes need to be attached to the root html element. This accounts for 41% of all defects.

• In the case of multilingual publications, best practice is to always specify the language in each content document to ensure proper rendering. This accounts for 21% of all defects.

• When using the epub:type attribute in a content document, the epub namespace must be declared on the element containing the attribute, or on one of its ancestors. This accounts for 13% of all defects.

• Images that are central to the understanding of a publication must always include a text alternative in their alt attribute. This accounts for 7% of all defects.

• When creating hyperlinks, the text inside the link should provide the full context of what is being linked to, or the link should have alternate text. This accounts for 7% of all defects.

• Separating style from markup is not just about keeping CSS in a separate file from your markup, but about recognizing that markup must convey meaning to be useful to all readers. This accounts for 7% of all defects.

• When using bold and italics, EPUB follows the rules of the HTML5 and CSS standards. This accounts for 2% of all defects.

• Avoid justifying text, as the uneven spacing that occurs between words can reduce readability for some people. This accounts for 1% of all defects.
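Several of the defects above lend themselves to automatic detection. The sketch below covers four of them (root language attributes, undeclared epub namespace, missing alt attributes, links without text) using only the Python standard library. It is an illustrative assumption, not the presented checker; check_content_document and the sample document are hypothetical, and whether an image is central to understanding still needs human judgment.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
IMG = f"{{{XHTML_NS}}}img"
A = f"{{{XHTML_NS}}}a"


def check_content_document(xhtml):
    """Return a list of defect messages for one XHTML content document."""
    try:
        root = ET.fromstring(xhtml)
    except ET.ParseError as err:
        # An epub:type attribute without a declared epub namespace is
        # reported by the parser as an unbound prefix.
        return [f"document is not well-formed XML: {err}"]
    defects = []
    if not root.get("lang"):
        defects.append("root html element has no lang attribute")
    if not root.get(f"{{{XML_NS}}}lang"):
        defects.append("root html element has no xml:lang attribute")
    for img in root.iter(IMG):
        # alt="" is allowed for decorative images, so only a missing
        # attribute is flagged here.
        if img.get("alt") is None:
            defects.append(f"img without alt attribute: {img.get('src')}")
    for link in root.iter(A):
        has_text = "".join(link.itertext()).strip()
        has_img_alt = any(img.get("alt") for img in link.iter(IMG))
        if not has_text and not has_img_alt:
            defects.append(f"link without text: {link.get('href')}")
    return defects


# Hypothetical content document with four defects.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><img src="chart.png"/></p>
    <p><a href="ch2.xhtml"></a></p>
  </body>
</html>"""

for defect in check_content_document(SAMPLE):
    print(defect)
```

The style-related defects (CSS separation, bold and italics, justified text) would need a separate pass over the style sheets and are left out of this sketch.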

Page 13: Csun presentation-170302-hykim

FUTURE WORK

• The 1st-tier automatic system can pick up problematic items for the 39 check points defined for it

• It is responsible for 25% of all 156 check points

• The 2nd-tier semi-automatic system handles the remaining 75% of check points (117 of 156)

• It should be moved toward automatic detection through machine learning algorithms