Knowledge Engineering for TELDAP

Uploaded by aat-taiwan on 25 May 2015

TRANSCRIPT

Page 1: Knowledge Engineering for TELDAP

Knowledge Engineering for TELDAP

Keh-Jiann Chen
Principal Investigator, Core Platforms for Digital Contents Project, TELDAP
Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica

Outline

Introduction
1. Union catalog
2. Databases and metadata for digital contents and websites
3. Knowledge engineering
Future perspectives

Introduction

The integration and management of digital content has become an important issue as the amount of digital content produced by different projects and institutions grows rapidly. Our project goal is to achieve optimized preservation, retrieval, and presentation of digital collections.

1. Union Catalog

What is the Union Catalog?

It is a catalog and portal for all TELDAP digital collections.
It is an integrated platform for browsing and searching the entire digital content of TELDAP.
Its metadata provides core descriptions and licensing information for each digital collection.

Figure: Home page of the Union Catalog, with browsing by topic and search by keyword

2. Databases and metadata for digital contents and websites

Metadata models for different types of objects

Archived digital items: Union Catalog metadata model (Dublin Core+)
Websites: DCCAP (Dublin Core Collections Application Profile), plus fields for internal use only: Unique Identifier, Format, Evaluation, Cataloging History
Documents: document metadata (Dublin Core)

Metadata for digital items: over 2 million digital items and still increasing

Element: definition

Title: A name given to the resource
Creator: An entity primarily responsible for making the content of the resource
Subject and Keywords: The topic of the content of the resource
Description: An account of the content of the resource
Publisher: An entity responsible for making the resource available
Contributor: An entity responsible for making contributions to the content of the resource
Date: A date associated with an event in the life cycle of the resource
Resource Type: The nature or genre of the content of the resource
Format: The physical or digital manifestation of the resource
Resource Identifier: An unambiguous reference to the resource within a given context
Source: A reference to a resource from which the present resource is derived
Language: A language of the intellectual content of the resource
Relation: A reference to a related resource
Coverage: The extent or scope of the content of the resource
Rights Management: Information about rights held in and over the resource
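As an illustration only, the fifteen elements in the table can be modeled as a simple record and serialized under the standard Dublin Core element namespace (http://purl.org/dc/elements/1.1/). The class name `DCRecord`, the `record` wrapper element, and the omission of the "Dublin Core+" extension fields are assumptions of this sketch, not the Union Catalog's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set 1.1 namespace

# Table labels -> Dublin Core element names, kept in table order.
DC_ELEMENTS = {
    "Title": "title",
    "Creator": "creator",
    "Subject and Keywords": "subject",
    "Description": "description",
    "Publisher": "publisher",
    "Contributor": "contributor",
    "Date": "date",
    "Resource Type": "type",
    "Format": "format",
    "Resource Identifier": "identifier",
    "Source": "source",
    "Language": "language",
    "Relation": "relation",
    "Coverage": "coverage",
    "Rights Management": "rights",
}


@dataclass
class DCRecord:
    """Hypothetical record for one digital item. Every Dublin Core element is
    optional and repeatable, so each element name maps to a list of values.
    The "Dublin Core+" extension fields are not modeled here."""
    values: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, label: str, value: str) -> None:
        # Raises KeyError for labels that are not in the table.
        name = DC_ELEMENTS[label]
        self.values.setdefault(name, []).append(value)

    def to_xml(self) -> str:
        ET.register_namespace("dc", DC_NS)
        root = ET.Element("record")  # hypothetical wrapper element
        for name in DC_ELEMENTS.values():  # emit elements in table order
            for value in self.values.get(name, []):
                ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
        return ET.tostring(root, encoding="unicode")
```

For instance, after `rec.add("Creator", "蘇軾")`, the output of `rec.to_xml()` contains `<dc:creator>蘇軾</dc:creator>`.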

Metadata for websites
Over 200 websites and still increasing
Metadata: DCCAP (Dublin Core Collections Application Profile), combining the standard with our own requirements in 19 data fields

Metadata for websites

Fields shown in the example record:
Website homepage picture
URL, project information
Type, name, author, subject, description, language, item type, target
Archived information: URL, time, authorization
Copyright, purpose, other information

Figure: http://digitalarchives.tw

Dynamic categorization

User-oriented categorization: general public, elementary school students, high school students, researchers, etc.
Topic-based categorization: archaeology, painting, animals, plants, documents, etc.
Function-based categorization: research, education, business, technology, etc.
Institution-based categorization: Academia Sinica, National Taiwan University, National Palace Museum, etc.
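One way to read "dynamic" categorization is that categories are generated from each website's metadata instead of from a fixed tree, so one site can appear under several user groups, topics, functions, and institutions at once. The sketch below assumes that reading; the facet names (`users`, `topics`, `functions`, `institutions`) and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical website records: each facet holds a set of values, because one
# site can serve several user groups, topics, functions, or institutions.
SITES = [
    {"name": "Site A",
     "users": {"elementary school students", "general public"},
     "topics": {"animal", "plant"},
     "functions": {"education"},
     "institutions": {"Academia Sinica"}},
    {"name": "Site B",
     "users": {"researchers"},
     "topics": {"archaeology"},
     "functions": {"research"},
     "institutions": {"Academia Sinica"}},
    {"name": "Site C",
     "users": {"high school students", "researchers"},
     "topics": {"painting", "document"},
     "functions": {"education", "research"},
     "institutions": {"National Palace Museum"}},
]


def categorize(sites, facet):
    """Build categories on the fly: map each value of `facet` to the names
    of the sites that carry that value."""
    groups = defaultdict(list)
    for site in sites:
        for value in sorted(site.get(facet, set())):
            groups[value].append(site["name"])
    return dict(groups)


def select(sites, **criteria):
    """Return names of sites whose facets contain every requested value,
    e.g. select(SITES, functions="education", users="researchers")."""
    return [site["name"] for site in sites
            if all(value in site.get(facet, set())
                   for facet, value in criteria.items())]
```

Because categories are computed rather than stored, adding a new website record updates every view without editing a category tree.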

Purpose: Education
Target users: elementary school students, junior high school students, teachers…
Selection: the top 5 websites according to 40 evaluation indicators

Purpose: Creative applications
Selection: the top 5 websites according to 40 evaluation indicators

Purpose: Academic research
Subjects: animals, archaeology, anthropology…
Selection: the top 3 websites according to 40 evaluation indicators

Figure: http://digitalarchives.tw
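The slides do not say how the 40 evaluation indicators are combined. A minimal sketch, assuming each website receives one numeric score per indicator and the scores are averaged (equal weights unless weights are supplied), would pick the top websites like this; the function names and the scoring rule are hypothetical.

```python
import heapq
from typing import Dict, List, Optional

NUM_INDICATORS = 40  # number of evaluation indicators stated on the slide


def overall_score(scores: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted mean of one website's indicator scores (equal weights by default).
    The aggregation rule is an assumption; the slide does not specify it."""
    if len(scores) != NUM_INDICATORS:
        raise ValueError(f"expected {NUM_INDICATORS} indicator scores, got {len(scores)}")
    if weights is None:
        weights = [1.0] * NUM_INDICATORS
    if len(weights) != NUM_INDICATORS:
        raise ValueError("one weight per indicator is required")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def select_top(candidates: Dict[str, List[float]], k: int) -> List[str]:
    """Return the names of the k highest-scoring websites, best first."""
    return heapq.nlargest(k, candidates, key=lambda name: overall_score(candidates[name]))
```

Under this reading, the education and creative-application selections would call `select_top(candidates, 5)` and the academic-research selection `select_top(candidates, 3)`, each over the candidates already filtered by purpose and subject.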

Metadata for project documents
Over 5,000 documents and still increasing
Metadata: Dublin Core
Construct Teldapwiki, a Wikipedia for TELDAP: http://wiki.teldap.tw/

3. Knowledge Engineering

Plans for building knowledge structures for TELDAP

Construct metadata models for different objects.
Establish hyperlinks between contents and objects.
Develop keyword extraction tools.
Design automatic tagging tools.
Construct the TELDAP ontology and thesaurus: Art & Architecture Thesaurus (Getty), Chinese WordNet.

(1) Metadata models for different objects

Digital collections: Union Catalog metadata model (Dublin Core+)
Websites: DCCAP (Dublin Core Collections Application Profile), with public fields and private fields
Private fields: Unique Identifier, Format, Evaluation, Cataloging History
Documents: document metadata (Dublin Core)

(2) Establish hyperlinks between contents and objects

Identify keywords in the content.
Tag the keywords with hyperlinks to related objects.

Develop hyperlink tagging tools: word segmentation tools

Resolve word segmentation ambiguities and identify keywords.
CKIP word segmentation system: http://ckipsvr.iis.sinica.edu.tw/
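The CKIP segmenter is a full system, and the sketch below is not its algorithm. It only illustrates what a segmentation ambiguity looks like, using the textbook forward and backward maximum-matching baselines over a small hypothetical lexicon: where the two passes disagree, the span is ambiguous and needs a stronger resolver.

```python
from typing import List, Set

# Textbook baselines for illustration only; not the CKIP system's algorithm.


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: scan left to right, always taking the longest
    lexicon word that starts here; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


def bmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Backward maximum matching: the same idea, scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            candidate = text[j - n:j]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                j -= n
                break
    words.reverse()
    return words


def is_ambiguous(text: str, lexicon: Set[str], max_len: int = 4) -> bool:
    """Flag text whose forward and backward segmentations disagree."""
    return fmm_segment(text, lexicon, max_len) != bmm_segment(text, lexicon, max_len)
```

With the lexicon {研究, 研究生, 生命, 起源}, the phrase 研究生命起源 ("study the origin of life") segments forward as 研究生 / 命 / 起源, wrongly reading 研究生 ("graduate student"), but backward as 研究 / 生命 / 起源, so `is_ambiguous` returns True.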

Develop hyperlink tagging tools: TELDAP keyword dictionary

Extract keywords from metadata and establish object-keyword relations.

Extract the text from each object's XML data. The text is classified by topic, title, description, author, location, era, etc. From each class of text, extract keywords using automatic word segmentation and keyword extraction techniques.
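The object-keyword relations can be sketched as an inverted index from keyword to (object, field class) pairs. This assumes a hypothetical XML layout with an `id` attribute on the root and one element per field class; short fields such as author or era are used whole as keywords, while free-text classes such as descriptions would first need word segmentation and keyword extraction, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical field tags; actual TELDAP XML schemas vary by collection.
FIELD_CLASSES = ("title", "author", "location", "era")

KeywordIndex = Dict[str, Set[Tuple[str, str]]]


def index_object(xml_text: str, index: KeywordIndex) -> None:
    """Add (object id, field class) pairs to `index` under each keyword.
    Whole field values serve as keywords; segmentation of free text is omitted."""
    root = ET.fromstring(xml_text)
    obj_id = root.get("id")
    for cls in FIELD_CLASSES:
        for elem in root.iter(cls):
            keyword = (elem.text or "").strip()
            if keyword:
                index[keyword].add((obj_id, cls))


def build_keyword_dictionary(xml_docs: Iterable[str]) -> KeywordIndex:
    """Build the keyword -> objects dictionary over a batch of XML records."""
    index: KeywordIndex = defaultdict(set)
    for xml_text in xml_docs:
        index_object(xml_text, index)
    return dict(index)
```

Recording the field class alongside the object ID keeps "蘇軾 as author" distinct from "蘇軾 mentioned in a title", which a tagger can use when choosing which objects to link.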

Prototype system for hyperlink tagger: identify and select keywords from the input text

Prototype system for hyperlink tagger: produce text with hyperlinks

Prototype system for hyperlink tagger: hyperlinks point to the related digital collections
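The three prototype steps (select keywords, produce linked text, point the links at collections) can be sketched as a longest-match scan over the keyword dictionary. The endpoint `https://catalog.example.org/search?q=` is a placeholder, not the real Union Catalog URL, and preferring the longest match at each position is an assumption of this sketch.

```python
import html
from typing import Set
from urllib.parse import quote

# Placeholder endpoint; the real Union Catalog search URL is not given in the slides.
BASE_URL = "https://catalog.example.org/search?q="


def tag_hyperlinks(text: str, keywords: Set[str]) -> str:
    """Wrap every dictionary keyword found in `text` in a link to BASE_URL.

    At each position the longest matching keyword wins, so a longer term is
    not split into shorter ones. Non-keyword text is HTML-escaped."""
    max_len = max((len(k) for k in keywords), default=0)
    out, i = [], 0
    while i < len(text):
        match = None
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in keywords:
                match = text[i:i + n]
                break
        if match:
            # quote() percent-encodes the keyword so it is safe inside the URL.
            out.append(f'<a href="{BASE_URL}{quote(match)}">{html.escape(match)}</a>')
            i += len(match)
        else:
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)
```

For example, with the keywords {蘇東坡, 寒食帖, 寒食}, the text 蘇東坡的寒食帖 gets two links, and 寒食帖 is linked as a whole rather than as 寒食.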

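(3) Construct the TELDAP ontology and thesaurus

Topical relations
Synonym relation: 蘇軾 = 蘇東坡 (Su Dongpo) = Su Shi; 鄭成功 (Koxinga) = 延平郡王 (Prince of Yanping)
Hypernym/hyponym relation: 器物 (artifacts) → [陶器 (pottery), 瓷器 (porcelain)] / [杯 (cup), 盤 (plate), 碗 (bowl), 甕 (jar)]

Establish implicit links between objects by author, material, object type, etc.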
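A minimal sketch of how the two relations could be used, assuming the thesaurus is stored as synonym sets plus a hyponym table (both filled here only with the slide's examples): `expand` widens a query term with its synonyms and all narrower terms, and `implicit_links` connects objects whose author, material, or type agree after synonym normalization. The data layout, the function names, and the choice of canonical form are hypothetical.

```python
from itertools import combinations

# Synonym sets and hyponyms taken from the slide examples.
SYNONYM_SETS = [
    {"蘇軾", "蘇東坡", "Su Shi"},
    {"鄭成功", "延平郡王"},
]
# The slide groups the hyponyms of 器物 by material and by shape; this sketch
# flattens both groups into one list (assumption).
HYPONYMS = {"器物": ["陶器", "瓷器", "杯", "盤", "碗", "甕"]}


def synonyms_of(term):
    """Return the synonym set containing `term`, or just the term itself."""
    for group in SYNONYM_SETS:
        if term in group:
            return set(group)
    return {term}


def canonical(term):
    # Arbitrary but deterministic representative of a synonym set.
    return min(synonyms_of(term))


def expand(term):
    """Return the term, its synonyms, and all (transitive) hyponyms with their synonyms."""
    result = synonyms_of(term)
    frontier = list(result)
    while frontier:
        current = frontier.pop()
        for child in HYPONYMS.get(current, []):
            for alias in synonyms_of(child):
                if alias not in result:
                    result.add(alias)
                    frontier.append(alias)
    return result


def implicit_links(objects, attributes=("author", "material", "type")):
    """Link every pair of objects sharing a value (up to synonymy) for an attribute."""
    links = set()
    for a, b in combinations(sorted(objects), 2):
        for attr in attributes:
            va, vb = objects[a].get(attr), objects[b].get(attr)
            if va is not None and vb is not None and canonical(va) == canonical(vb):
                links.add((a, b, attr))
    return links
```

With this normalization, an object credited to 蘇軾 and another credited to 蘇東坡 are linked by author even though the strings differ.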

(3) Construct the TELDAP ontology and thesaurus

Establish association links between Chinese keywords and the Getty AAT.
Merge Chinese WordNet with English WordNet.

Future Perspectives

Technology development
Construct multilingual thesauri (Getty AAT)
Maintain the TELDAP keyword and object relation database
Construct name authority files, gazetteers, and universal calendars
Design hyperlink taggers and keyword extension tools
Design an authoring tool that automatically provides hyperlinks to keyword-related digital content
Design a knowledge-based content retrieval system
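A universal calendar has to map the era-based year notations common in historical records onto one timeline. A minimal sketch, assuming a lookup table of the Gregorian year corresponding to each era's first year (元年); only a handful of Qing, Japanese, and Republic of China eras are shown, and a real table would be far larger and would also handle lunar dates and mid-year era changes.

```python
# Hypothetical era table: Gregorian year corresponding to each era's first year (元年).
ERA_START = {
    "康熙": 1662,  # Kangxi, Qing dynasty
    "乾隆": 1736,  # Qianlong, Qing dynasty
    "明治": 1868,  # Meiji, Japan
    "大正": 1912,  # Taishō, Japan
    "昭和": 1926,  # Shōwa, Japan
    "民國": 1912,  # Republic of China (Minguo)
}


def to_gregorian(era: str, year: int) -> int:
    """Convert an era-based year (e.g. 民國 100) to a Gregorian year number.
    Ignores lunar-calendar offsets and mid-year era changes."""
    if year < 1:
        raise ValueError("era years start at 1")
    if era not in ERA_START:
        raise KeyError(f"unknown era: {era}")
    return ERA_START[era] + year - 1
```

For example, `to_gregorian("民國", 100)` gives 2011 and `to_gregorian("乾隆", 60)` gives 1795, which lets objects dated in different systems be sorted or linked by era.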

Content enrichment

Within TELDAP:
Standardize the object metadata model and data format.
Ensure that every TELDAP object has metadata.
Write scripts and stories on different topics with a wiki-like knowledge structure.
Enrich the digital collections.
Establish hyperlinks between textbooks and TELDAP collections.

Extend the knowledge sources, e.g., Wikipedia.

Thank you for your attention! Comments and suggestions are welcome (敬請指教).