The XML-based Enterprise Information Portal Solutions Company



Page 1

The XML-based Enterprise Information Portal Solutions Company

Page 2

Eric Freese

Director of Professional Services - Midwest Region

ISOGEN International/DataChannel

Knowledge Technologies 2001 – Austin, TX

7 March 2001

Extracting Knowledge from XML Documents Using Topic Maps

Page 3

Premise

Rules and procedures can be established that allow automated harvesting of information from structured documents (XML) into a knowledge base by using the structure and the relationships between the structural components

Topic maps can be used as the interchange and management model for knowledge bases

New knowledge can be inferred within a knowledge base using defined inference rules

Page 4

Overview

Late Breaking News
Topic Maps
Knowledge Representation/Semantic Networks
Topic Map Constructs for Semantic Networks
SemanText - Example Application
Conclusions

Page 5

Late Breaking News

RDF and Topic Maps

XML Topic Maps (XTM)

Page 6

Topic Maps and Semantic Nets

A Topic Map is a mechanism for describing and representing data about the structure and content of an information set, using topics, associations, and occurrences.

A Semantic Network is a knowledge representation technique consisting of nodes and links.

Page 7

Topic Maps

ISO/IEC 13250:2000 Document description and processing languages – Topic Maps

TopicMaps.Org – XML Topic Maps (XTM)

Topic maps are optimized for navigation of large amounts of data

They are similar to indexes in the paper publishing world

A topic map can also be compared to a glossary, cross-reference, thesaurus, or catalog

Page 8

Topics

Topics are the basic building blocks of topic maps

A topic is anything a user wants to describe

A topic can have zero or many links to occurrences within an information set

A topic can be used to aggregate all the information about a subject within the information set

Topics are categorized using topic types

Topics can have multiple types

Types are defined using topics
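A minimal Python sketch of this topic model, assuming a simple in-memory representation (the `Topic` class, its fields, and the `family.xml#eric` reference are hypothetical and are not taken from SemanText or tmproc). Each topic can carry several types, and every type is itself an ordinary topic:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    """A topic: anything a user wants to describe (hypothetical model)."""
    id: str
    name: str
    # Types are themselves topics, and a topic may have several of them.
    types: list["Topic"] = field(default_factory=list)
    # Zero or more links into the information set (occurrence locations).
    occurrence_links: list[str] = field(default_factory=list)

    def is_a(self, type_topic: "Topic") -> bool:
        """True if type_topic is one of this topic's declared types."""
        return type_topic in self.types


# Topic types are defined using ordinary topics.
person = Topic("person", "Person")
male = Topic("male", "Male")

# One topic with two types and one (hypothetical) occurrence link.
eric = Topic("eric", "Eric", types=[male, person])
eric.occurrence_links.append("family.xml#eric")

print([t.name for t in eric.types])  # ['Male', 'Person']
```

Because types are plain topics, the same structure that names and links `eric` also names and links `male` and `person`.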

Page 9

Family Tree Example

[Figure: family tree diagram with the names George, Cara, Olivia, Eric, Rita, Dawn, Scott, Becky, Todd, Carmen, Tiffani, Keri, and Jordan]

Page 10

Family Tree Example

[Figure: family tree diagram with the names George, Cara, Olivia, Eric, Rita, Dawn, Scott, Becky, Todd, Carmen, Tiffani, Keri, and Jordan]

Page 11

Associations

Associations relate topics together

They express a semantic relationship between topics

An association can be defined as an instance of a specific topic

Topics are members of and have roles within associations

Association role types are topics
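The association model can be sketched the same way. In this hypothetical Python representation, the association type and each role type are ordinary topics, and each member pairs a role with the topic playing it; the `marriage`, `spouse`, `alice`, and `bob` topics are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    id: str
    name: str


@dataclass(frozen=True)
class Member:
    role: Topic    # the role type is itself a topic
    player: Topic  # the topic that plays this role


@dataclass(frozen=True)
class Association:
    type: Topic                  # the association is an instance of this topic
    members: tuple[Member, ...]

    def players(self, role: Topic) -> list[Topic]:
        """Return every topic playing the given role in this association."""
        return [m.player for m in self.members if m.role == role]


# Association type, role type, and players are all ordinary topics (hypothetical).
marriage = Topic("marriage", "Marriage")
spouse = Topic("spouse", "Spouse")
alice = Topic("alice", "Alice")
bob = Topic("bob", "Bob")

a = Association(marriage, (Member(spouse, alice), Member(spouse, bob)))
print([t.name for t in a.players(spouse)])  # ['Alice', 'Bob']
```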

Page 12

Family Tree Example

[Figure: family tree diagram with the names George, Cara, Olivia, Eric, Rita, Dawn, Scott, Becky, Todd, Carmen, Tiffani, Keri, and Jordan]

Page 13

Occurrences

Occurrences provide links from the topic map into the information set

Occurrences also provide an internal means for describing topics in the topic map

An occurrence can have only one type

Occurrence roles are topics
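A sketch of an occurrence, assuming a representation modeled loosely on XTM's `resourceRef` (a link out into the information set) and `resourceData` (inline data held in the map), both of which appear in the XML examples in this presentation. The single `type_id` field enforces the one-type rule; the class and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    type_id: str                         # exactly one occurrence type (a topic id)
    resource_ref: Optional[str] = None   # link out into the information set
    resource_data: Optional[str] = None  # internal description held in the map

    def __post_init__(self) -> None:
        # Assumed rule: an occurrence is either an external link or inline data.
        if (self.resource_ref is None) == (self.resource_data is None):
            raise ValueError("set exactly one of resource_ref or resource_data")


# Hypothetical occurrence types and targets.
bio_link = Occurrence("biography", resource_ref="http://example.com/eric.xml")
note = Occurrence("description", resource_data="Director of Professional Services")
```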

Page 14

Topic Scopes and Themes

Themes can be defined which can be used to group topics on a broader scale than types

Themes can also be viewed as filters for topic information

Scopes can be assigned to topic characteristics, associations, and occurrences, which call the themes into effect

Themes and scopes are used to disambiguate topics
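One way to use scope for disambiguation, sketched in Python under stated assumptions: two hypothetical topics share the name "Austin", each name is scoped by a different theme, and a lookup returns only names whose scope is unconstrained or overlaps the active themes. The matching rule is an assumption chosen for illustration, not one prescribed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedName:
    topic_id: str
    text: str
    scope: frozenset[str] = frozenset()  # theme ids; empty means unconstrained


def lookup(names: list[ScopedName], text: str, themes: set[str]) -> list[str]:
    """Return ids of topics with this name that are valid in the theme context.

    Assumed rule: a name applies if its scope is empty or shares at least
    one theme with the active context.
    """
    return [n.topic_id for n in names
            if n.text == text and (not n.scope or n.scope & themes)]


# Two hypothetical topics that would collide without scope.
names = [
    ScopedName("austin-tx", "Austin", frozenset({"geography"})),
    ScopedName("austin-person", "Austin", frozenset({"genealogy"})),
]

print(lookup(names, "Austin", {"geography"}))  # ['austin-tx']
print(lookup(names, "Austin", {"genealogy"}))  # ['austin-person']
```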

Page 15

Semantic Network Architecture

A semantic network is drawn as a series of nodes connected by links

Nodes represent objects, concepts, or situations within a specific domain

Links represent relationships between nodes

Specialized computer languages (such as Prolog) have been developed which can model and process the logic within a semantic network

A semantic network can be used as the basis for the development of facts and rules within an expert system

Page 16

Associative Properties

The links within a semantic network may have the following properties:
– Reflexive - a topic can have the association applied to itself
– Symmetric - the association is true no matter the position of the topics; the topics are often of the same or related types
– Transitive - the association can be derived based on other associations

Page 17

Examples

Reflexive
Spouse is married to spouse

Symmetric
Husband is married to wife AND
Wife is married to husband

Transitive
Fathers are parents AND
Eric is a father SO
Eric is a parent
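The symmetric and transitive examples above can be derived mechanically. This Python sketch assumes each link is a plain (from, to) pair of strings; it computes a symmetric closure for "is married to" and a transitive closure for the "is a" chain, and leaves the reflexive case aside.

```python
def symmetric_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """If (a, b) holds, add (b, a) as well."""
    return pairs | {(b, a) for a, b in pairs}


def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Repeatedly add (a, d) whenever (a, b) and (b, d) hold, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


married = symmetric_closure({("husband", "wife")})
print(sorted(married))  # [('husband', 'wife'), ('wife', 'husband')]

# Eric is a father, and fathers are parents, so Eric is a parent.
is_a = transitive_closure({("eric", "father"), ("father", "parent")})
print(("eric", "parent") in is_a)  # True
```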

Page 18

Semantic Network Relationships

Typically binary – one node at the end of each link

N-ary relationships can be broken down into binary relationships

"Austin, Texas is a city in the United States." breaks down into:
– Austin, Texas is a city
– Geographic regions (cities) are located in geographic regions (countries)
– United States is a country
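The slide decomposes the sentence by type. Another widely used way to break an n-ary relationship into binary links, named plainly here as a swapped-in technique, is to introduce a node for the relationship instance and attach each participant to it with a role-labeled link, much as a topic map association holds role-typed members. The `located-in` relation, the `loc1` node id, and the role names below are hypothetical.

```python
def to_binary(relation: str, instance_id: str,
              roles: dict[str, str]) -> list[tuple[str, str, str]]:
    """Break one n-ary relationship into binary links via a node for the instance.

    Each participant is linked to the new relationship node by a link
    labeled with its role, so every resulting link has exactly two ends.
    """
    triples = [(instance_id, "instance-of", relation)]
    triples += [(instance_id, role, player) for role, player in roles.items()]
    return triples


links = to_binary("located-in", "loc1",
                  {"city": "Austin", "state": "Texas", "country": "United States"})
for link in links:
    print(link)
# ('loc1', 'instance-of', 'located-in')
# ('loc1', 'city', 'Austin')
# ('loc1', 'state', 'Texas')
# ('loc1', 'country', 'United States')
```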

Page 19

Topic Maps vs. Semantic Networks

Commonalities between topic maps and semantic networks:
– Both are organized into a network of information nodes or modules.
– Both allow the user to model links between the nodes.
– Both allow the user to attach semantic information to the nodes and the links.

One basic difference:
– Topic maps focus on navigation between topics.
– Semantic networks focus on the links/associations between the nodes and the knowledge represented by the linked nodes.

Page 20

Harvesting Knowledge from Structured Information

XML provides a way of attaching semantics to pieces of information through markup

Markup can be used to define or identify topic types
– Element names
– Attribute values

Associations between different pieces of information can be determined by structural relationships

XPath can be used to denote the structural components
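A small Python sketch of harvesting with the standard library's `xml.etree.ElementTree`, whose `findall` accepts a limited subset of XPath. The input document and its element and attribute names are hypothetical. Element names and attribute values become topic types, and nesting of `<child>` inside `<person>` is read as a parent/child association; richer XPath expressions would need a full XPath engine such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the element and attribute names are assumptions.
DOC = """
<family>
  <person id="alice" gender="female">
    <child ref="bob"/>
  </person>
  <person id="bob" gender="male"/>
</family>
"""

root = ET.fromstring(DOC.strip())

# Topics and their types from markup: the element name supplies one type
# and the value of the gender attribute supplies another.
topics: dict[str, set[str]] = {}
for person in root.findall(".//person"):
    topics[person.get("id")] = {person.tag, person.get("gender")}

# Associations from structure: a <child> nested directly inside a <person>
# is read as a parent/child relationship.
associations: list[tuple[str, str, str]] = []
for person in root.findall(".//person"):
    for child in person.findall("child"):
        associations.append(("parent-child", person.get("id"), child.get("ref")))

print(associations)  # [('parent-child', 'alice', 'bob')]
```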

Page 21

Topic Map Constructs for Semantic Nets

Published Subject Identifiers
Topic Map Templates/Association Templates
Type Hierarchies/Ontologies
Association Types
Association Properties
Association Occurrences
Inference Rules

Page 22

Published Subject Identifiers (PSIs)

Allow an identifier to be attached to a subject so that it can be unambiguously named and referenced

XTM identifies a core set of PSIs for the main building blocks of topic maps as well as for selected association types

Two topics which indicate the same subject are merged automatically

http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass-subclass
http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass
http://www.topicmaps.org/xtm/1.0/psi1.xtm#subclass
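Merging on subject identity can be sketched as follows, assuming topics are reduced to sets of names and sets of PSIs (a hypothetical simplification; the XTM merge rules cover more cases). Topics that share any PSI end up in one merged topic, and a union-find structure makes the sharing transitive. The second name and the `example.com` PSI are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    names: set[str]
    subject_identifiers: set[str]  # PSIs (URIs)


def merge_by_psi(topics: list[Topic]) -> list[Topic]:
    """Merge topics that share at least one subject identifier.

    Sharing is treated as transitive: if A and B share one PSI and B and C
    share another, all three merge. Uses a simple union-find over indices.
    """
    parent = list(range(len(topics)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: dict[str, int] = {}  # first topic index seen for each PSI
    for i, t in enumerate(topics):
        for psi in t.subject_identifiers:
            if psi in owner:
                parent[find(i)] = find(owner[psi])
            else:
                owner[psi] = i

    merged: dict[int, Topic] = {}
    for i, t in enumerate(topics):
        m = merged.setdefault(find(i), Topic(set(), set()))
        m.names |= t.names
        m.subject_identifiers |= t.subject_identifiers
    return list(merged.values())


PSI = "http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass"
a = Topic({"Superclass"}, {PSI})
b = Topic({"Supertype"}, {PSI})  # hypothetical second name for the same subject
c = Topic({"Eric"}, {"http://example.com/people#eric"})  # hypothetical PSI

result = merge_by_psi([a, b, c])
print(len(result))  # 2
```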

Page 23

Templates/Schemas

Define semantics contained within an association

Define constraints on the creation of semantically valid topic map structures

Provide roadmaps for creation of topic map structures

Defined using regular topic map syntax

Future work may include definition of extents
– Cardinality
– Time/Date

Page 24

Templates/Schemas – cont.

<topic id="marriage.schema">
  <instanceOf><topicRef xlink:href="#association.class"/></instanceOf>
  <instanceOf><topicRef xlink:href="#schema"/></instanceOf>
  <baseName><baseNameString>Marriage</baseNameString></baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#association.property"/></instanceOf>
    <resourceRef xlink:href="#reflexive"/>
  </occurrence>
  <occurrence id="minimum.spouses">
    <instanceOf><topicRef xlink:href="#minimum.occurrences"/></instanceOf>
    <resourceData>2</resourceData>
  </occurrence>
  <occurrence id="maximum.spouses">
    <instanceOf><topicRef xlink:href="#maximum.occurrences"/></instanceOf>
    <resourceData>2</resourceData>
  </occurrence>
</topic>

Page 25

Templates/Schemas – cont.

<association>
  <instanceOf><topicRef xlink:href="#marriage.schema"/></instanceOf>
  <scope><topicRef xlink:href="#schema"/></scope>
  <member>
    <roleSpec><topicRef xlink:href="#spouse"/></roleSpec>
    <resourceRef xlink:href="#minimum.spouses"/>
    <resourceRef xlink:href="#maximum.spouses"/>
  </member>
</association>
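The cardinality occurrences in the marriage template (a minimum and a maximum of two spouses) could drive a validity check like the one below. The Python dictionary standing in for the parsed template, and the `validate` function, are assumptions for illustration.

```python
# Role cardinality constraints for the "marriage" association type, as they
# might be read from the marriage.schema template. The layout is an assumption.
TEMPLATES: dict[str, dict[str, tuple[int, int]]] = {
    "marriage": {"spouse": (2, 2)},
}


def validate(assoc_type: str, members: list[tuple[str, str]]) -> list[str]:
    """Check (role, player) members against the template for assoc_type.

    Returns a list of human-readable problems; an empty list means valid.
    """
    problems = []
    for role, (lo, hi) in TEMPLATES.get(assoc_type, {}).items():
        count = sum(1 for r, _ in members if r == role)
        if not (lo <= count <= hi):
            problems.append(f"{role}: expected {lo}..{hi}, found {count}")
    return problems


print(validate("marriage", [("spouse", "alice"), ("spouse", "bob")]))  # []
print(validate("marriage", [("spouse", "alice")]))
# ['spouse: expected 2..2, found 1']
```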

Page 26

Type Hierarchies/Ontologies

Hierarchies allow ontologies to be developed by which additional knowledge can be inferred simply through hierarchical inheritance

Templates can be used to control or enhance the ontology

Page 27

Type Hierarchies/Ontologies – cont.

<topic id="person">
  <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
  <baseName><baseNameString>Person</baseNameString></baseName>
</topic>
<topic id="male">
  <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
  <baseName><baseNameString>Male</baseNameString></baseName>
</topic>

<topic id="eric">
  <instanceOf><topicRef xlink:href="#male"/></instanceOf>
  <instanceOf><topicRef xlink:href="#person"/></instanceOf>
  <baseName><baseNameString>Eric</baseNameString></baseName>
</topic>
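In the XML above, `eric` lists both `male` and `person` explicitly. With a superclass/subclass link from `male` to `person` (assumed here; the example does not declare one), the second type could instead be inferred through inheritance. A minimal Python sketch:

```python
# Hypothetical superclass/subclass links: subclass -> its direct superclasses.
SUPERCLASSES: dict[str, set[str]] = {"male": {"person"}}


def all_types(declared: set[str]) -> set[str]:
    """Return the declared types plus every ancestor reachable through the hierarchy."""
    result: set[str] = set()
    stack = list(declared)
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(SUPERCLASSES.get(t, ()))
    return result


# eric declares only "male"; "person" is inherited.
print(sorted(all_types({"male"})))  # ['male', 'person']
```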

Page 28

Association Types

ISO 13250 implicitly specifies class/instance associations

XTM specifies, through PSIs, class/instance and superclass/subclass associations

Other examples:
– Component/object
– Member/collection
– Portion/mass
– Feature/activity
– Place/area
– Phase/process

Page 29

Association Properties

Transitivity, reflexivity, and symmetry properties can be attached to associations

These properties allow special processing and understanding to occur when using associations

Page 30

Association Occurrences

Topic maps center more on topics, whereas other knowledge management schemes concentrate more on associations or relationships between topics

In topic maps, associations can have topics defined which reify them

Reification of associations allows them to have occurrences
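A sketch of reification in Python, under an assumed representation: a topic points at the association it stands for through a hypothetical `reifies` field, and occurrences attached to that topic (here an illustrative wedding date) become reachable from the association. The names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Association:
    id: str
    type: str
    members: list[tuple[str, str]]  # (role, player) pairs


@dataclass
class Topic:
    id: str
    # Hypothetical field: id of the association this topic stands for, if any.
    reifies: Optional[str] = None
    occurrences: list[tuple[str, str]] = field(default_factory=list)  # (type, value)


def occurrences_of(assoc: Association, topics: list[Topic]) -> list[tuple[str, str]]:
    """Collect occurrences from every topic that reifies the given association."""
    return [occ for t in topics if t.reifies == assoc.id for occ in t.occurrences]


marriage = Association("assoc-1", "marriage", [("spouse", "alice"), ("spouse", "bob")])

# The bare association has no slot for a wedding date, but a topic that
# reifies it can carry one as an occurrence.
marriage_topic = Topic("alice-bob-marriage", reifies=marriage.id,
                       occurrences=[("wedding-date", "1990-06-01")])

print(occurrences_of(marriage, [marriage_topic]))  # [('wedding-date', '1990-06-01')]
```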

Page 31

Inference Rules

Inference rules allow new topics and associations to be created based on the existence of others

Rules can be stored and managed using topic map syntax

Page 32

Inference Rules – cont.

<association>
  <instanceOf><topicRef xlink:href="#inference.rule"/></instanceOf>
  <scope><topicRef xlink:href="#inference.rule.schema"/></scope>
  <member>
    <roleSpec><topicRef xlink:href="#inference.rule.condition"/></roleSpec>
    <topicRef xlink:href="#ir.parent.in.family.N345"/>
    <topicRef xlink:href="#ir.parent.in.family.N456"/>
    <topicRef xlink:href="#ir.sibling.in.family.N567"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#inference.rule.statement"/></roleSpec>
    <topicRef xlink:href="#ir.cousin.N678"/>
  </member>
</association>
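Read as logic, the rule above appears to say: if A is a parent of X, B is a parent of Y, and A and B are siblings, then X and Y are cousins. That reading is an interpretation of the topic ids, which the slide does not spell out. A minimal forward-chaining sketch in Python, with hypothetical facts:

```python
Fact = tuple[str, str, str]  # (relation, subject, object)


def infer_cousins(facts: set[Fact]) -> set[Fact]:
    """Apply one rule: parent(A, X), parent(B, Y), sibling(A, B) => cousin(X, Y)."""
    parents = [(a, x) for rel, a, x in facts if rel == "parent"]
    siblings = {(a, b) for rel, a, b in facts if rel == "sibling"}
    siblings |= {(b, a) for a, b in siblings}  # siblinghood is symmetric
    return {("cousin", x, y)
            for a, x in parents
            for b, y in parents
            if (a, b) in siblings}


facts = {
    ("parent", "alice", "carol"),
    ("parent", "bob", "dave"),
    ("sibling", "alice", "bob"),
}

new_facts = infer_cousins(facts)
print(sorted(new_facts))
# [('cousin', 'carol', 'dave'), ('cousin', 'dave', 'carol')]
```

In a system like the one described, the derived facts would be written back as new associations.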

Page 33

SemanText: Using Topic Maps for Knowledge Representation

100% pure Python system developed to demonstrate the joining of topic maps and semantic networks

Uses tmproc, wxPython, PyXML

Enables creation, modification, and querying of topic map structures

Semantic network structures with entities and relationships

Built-in inference engine where the user can add rules which create new topic map structures

Development is continuing

Page 34

Demo

Page 35

Future SemanText Plans

Implement XTM
Implement scopes, themes
Implement merge – hard vs. soft
Integration with grove-based system to allow point-and-click input from multiple data formats
Hooks to natural language tools
Voice input/output using VoiceML
Graphical output such as VRML or SVG
Textual output such as Open E-book, PalmOS, WML

Page 36

Conclusions

SemanText demonstrates that information can be harvested using the markup from XML documents in order to build a knowledge base

It demonstrates that the topic map architecture can be used to interchange semantic network information

It also demonstrates that topic maps can be used to feed a semantic network

It demonstrates that topic map syntax can be used to extend the topic map paradigm
– Schemas, templates, inference rules

Page 37

Q & A

SemanText available from www.semantext.com

Questions or comments welcome at:
ISOGEN International/DataChannel
1611 W. County Road B, Suite 204
St. Paul, MN 55113 USA
Voice: 1.651.636.9100 - Fax: 1.651.636.9191
[email protected]
www.isogen.com - www.datachannel.com