The XML-based Enterprise Information Portal Solutions Company. Extracting Knowledge from XML Documents Using Topic Maps. Eric Freese, Director of Professional Services - Midwest Region, ISOGEN International/DataChannel. Knowledge Technologies 2001 – Austin, TX, 7 March 2001.
The XML-based Enterprise Information Portal Solutions Company
Eric Freese
Director of Professional Services - Midwest Region
ISOGEN International/DataChannel
Knowledge Technologies 2001 – Austin, TX
7 March 2001
Extracting Knowledge from XML Documents
Using Topic Maps
Premise
Rules and procedures can be established that allow automated harvesting of information from structured documents (XML) into a knowledge base by using the structure and the relationships between the structural components
Topic maps can be used as the interchange and management model for knowledge bases
New knowledge can be inferred within a knowledge base using defined inference rules
Overview
Late Breaking News
Topic Maps
Knowledge Representation/Semantic Networks
Topic Map Constructs for Semantic Networks
SemanText - Example Application
Conclusions
Late Breaking News
RDF and Topic Maps
XML Topic Maps (XTM)
Topic Maps and Semantic Nets
A Topic Map is a mechanism for describing and representing data about the structure and content of an information set, using topics, associations, and occurrences.
A Semantic Network is a knowledge representation technique consisting of nodes and links.
Topic Maps
ISO/IEC 13250:2000 Document description and processing languages – Topic Maps
TopicMaps.Org – XML Topic Maps (XTM)
Topic maps are optimized for navigation of large amounts of data
They are similar to indexes in the paper publishing world
A topic map can also be compared to a glossary, cross-reference, thesaurus, or catalog
Topics
Topics are the basic building blocks of topic maps
A topic is anything a user wants to describe
A topic can have zero or many links to occurrences within an information set
A topic can be used to aggregate all the information about a subject within the information set
Topics are categorized using topic types
Topics can have multiple types
Types are defined using topics
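The points on this slide can be sketched as a small data structure. This is a minimal Python illustration, not SemanText's actual model: the `Topic` class, its field names, and the example URL are hypothetical. The ids `person`, `male`, and `eric` are borrowed from the XTM samples later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic: anything a user wants to describe (hypothetical model)."""
    id: str
    name: str
    types: list = field(default_factory=list)        # topic types are themselves Topic objects
    occurrences: list = field(default_factory=list)  # zero or many links into the information set


# Types are defined using topics.
person = Topic("person", "Person")
male = Topic("male", "Male")

# A topic can have multiple types.
eric = Topic("eric", "Eric", types=[male, person])

# Illustrative occurrence link (the URL is made up for this example).
eric.occurrences.append("http://example.org/family.xml#eric")
```

Because a type is just another `Topic`, a type can itself carry names, occurrences, and types of its own.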
Family Tree Example
[Figure: family tree diagram; names shown, in order of appearance: George, Cara, Olivia, Eric, Rita, Dawn, Scott, Becky, Todd, Carmen, Tiffani, Keri, Jordan]
Associations
Associations relate topics together
They express a semantic relationship between topics
An association can be defined as an instance of a specific topic
Topics are members of and have roles within associations
Association role types are topics
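One way to picture members and roles is the sketch below. It is a hedged Python illustration only: the `Member` and `Association` classes, the `players` helper, and the ids `person-a` and `person-b` are hypothetical, since the slides do not state who is related to whom.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    role: str    # id of the topic that types the role, e.g. "spouse"
    topic: str   # id of the topic playing that role


@dataclass
class Association:
    type: str                                    # id of the topic this association is an instance of
    members: list = field(default_factory=list)


def players(association, role):
    """Return the ids of the topics that play the given role."""
    return [m.topic for m in association.members if m.role == role]


# Hypothetical data: an association of type "marriage" with two "spouse" members.
marriage = Association(
    "marriage",
    [Member("spouse", "person-a"), Member("spouse", "person-b")],
)
```

Both the association type (`marriage`) and the role type (`spouse`) are stored as topic ids, which mirrors the point that association types and role types are themselves topics.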
Occurrences
Occurrences provide links from the topic map into the information set
Occurrences also provide an internal means for describing topics in the topic map
An occurrence can have only one type
Occurrence roles are topics
Topic Scopes and Themes
Themes can be defined which can be used to group topics on a broader scale than types
Themes can also be viewed as filters for topic information
Scopes can be assigned to topic characteristics, associations and occurrences which call the themes into effect
Themes and scopes are used to disambiguate topics
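The "filter" view of themes can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `in_scope` function, the theme names, and the sample names are hypothetical, and an empty scope is treated here as valid under every theme.

```python
def in_scope(characteristics, theme):
    """Keep characteristics that are unscoped or whose scope contains the theme."""
    return [value for value, scope in characteristics if not scope or theme in scope]


# Hypothetical names for one topic, each paired with the set of themes that scope it.
names = [
    ("Austin", set()),                   # unscoped: kept under every theme
    ("Capital of Texas", {"english"}),
    ("Capital de Texas", {"spanish"}),
]
```

Calling `in_scope(names, "english")` keeps the unscoped name and the English one and drops the Spanish one, which is the filtering behaviour the slide describes.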
Semantic Network Architecture
A semantic network is drawn as a series of nodes connected by links
Nodes represent objects, concepts, or situations within a specific domain
Links represent relationships between nodes
Specialized computer languages (such as Prolog) have been developed which can model and process the logic within a semantic network
A semantic network can be used as the basis for the development of facts and rules within an expert system
Associative Properties
The links within a semantic network may have the following properties:
– Reflexive - topic can have the association applied to itself
– Symmetric - association is true no matter the position of the topics; topics are often of the same or related types
– Transitive - association can be derived based on other associations
Examples
Reflexive
– Spouse is married to spouse
Symmetric
– Husband is married to wife AND
– Wife is married to husband
Transitive
– Fathers are parents AND
– Eric is a father SO
– Eric is a parent
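The symmetric and transitive examples can be worked through in a few lines. This is a hedged Python sketch, not the author's implementation: facts are stored as `(subject, relation, object)` tuples, the function names are hypothetical, and a single `is-a` relation stands in for both "is a father" and "fathers are parents" as a simplification.

```python
def apply_symmetric(facts, relation):
    """Add (b, relation, a) for every (a, relation, b)."""
    return facts | {(b, r, a) for (a, r, b) in facts if r == relation}


def apply_is_a(facts):
    """Derive (x, 'is-a', c) from (x, 'is-a', b) and (b, 'is-a', c) until nothing new appears."""
    facts = set(facts)
    while True:
        derived = {
            (x, "is-a", c)
            for (x, r1, b) in facts if r1 == "is-a"
            for (b2, r2, c) in facts if r2 == "is-a" and b2 == b
        }
        if derived <= facts:
            return facts
        facts |= derived


facts = {
    ("husband", "married-to", "wife"),
    ("father", "is-a", "parent"),
    ("Eric", "is-a", "father"),
}
facts = apply_symmetric(facts, "married-to")
facts = apply_is_a(facts)
```

After both steps the set also contains `("wife", "married-to", "husband")` and `("Eric", "is-a", "parent")`, neither of which was stated explicitly.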
Semantic Network Relationships
Typically binary: one node at the end of each link
N-ary relationships can be broken down into binary relationships
Austin, Texas is a city in the United States. =
– Austin, Texas is a city
– Geographic regions (cities) are located in geographic regions (countries)
– United States is a country
Topic Maps vs. Semantic Networks
Commonalities between topic maps and semantic networks:
– Both are organized into a network of information nodes or modules.
– Both allow the user to model links between the nodes.
– Both allow the user to attach semantic information to the nodes and the links.
One basic difference:
– Topic maps focus on navigation between topics.
– Semantic networks focus on the links/associations between the nodes and the knowledge represented by the linked nodes.
Harvesting Knowledge from Structured Information
XML provides a way of attaching semantics to pieces of information through markup
Markup can be used to define or identify topic types
– Element names
– Attribute values
Associations between different pieces of information can be determined by structural relationships
XPath can be used to denote the structural components
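The harvesting idea can be illustrated with Python's standard library, whose `ElementTree` module supports a limited XPath subset. This is a sketch under stated assumptions, not SemanText's harvesting code: the `atlas` document, its element and attribute names, the `harvest` function, and the `located-in` association name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element and attribute names are invented for this sketch.
SOURCE = """
<atlas>
  <country name="United States">
    <city name="Austin"/>
  </country>
</atlas>
"""


def harvest(xml_text):
    """Derive topics from element names and associations from parent/child structure."""
    root = ET.fromstring(xml_text)
    topics = {}        # topic name -> topic type (taken from the element name)
    associations = []  # (city, "located-in", country), taken from containment
    for country in root.findall("./country"):
        topics[country.get("name")] = country.tag
        for city in country.findall("./city"):
            topics[city.get("name")] = city.tag
            associations.append((city.get("name"), "located-in", country.get("name")))
    return topics, associations


topics, associations = harvest(SOURCE)
```

Here the element name supplies the topic type, and the fact that a `city` element sits inside a `country` element supplies the association.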
Topic Map Constructs for Semantic Nets
Published Subject Identifiers
Topic Map Templates/Association Templates
Type Hierarchies/Ontologies
Association Types
Association Properties
Association Occurrences
Inference Rules
Published Subject Identifiers (PSIs)
Allows an identifier to be attached to a subject so that it can unambiguously be named and referenced
XTM identifies a core set of PSIs for the main building blocks for topic maps as well as selected association types
Two topics which are related to the same subject are merged automatically
http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass-subclass
http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass
http://www.topicmaps.org/xtm/1.0/psi1.xtm#subclass
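Merging on a shared subject can be sketched as grouping by identifier. This is a minimal Python illustration: the `merge_by_subject` function and the second name are hypothetical, and a real processor would merge more than names. The identifier is the first PSI listed on the slide.

```python
def merge_by_subject(topics):
    """Merge topics that share a subject identifier; their names are pooled."""
    merged = {}
    for subject, names in topics:
        merged.setdefault(subject, set()).update(names)
    return merged


PSI = "http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass-subclass"

# Two topics, possibly from different topic maps, that point at the same subject.
topics = [
    (PSI, {"superclass-subclass"}),
    (PSI, {"is a kind of"}),   # hypothetical second name for the same subject
]
merged = merge_by_subject(topics)
```

The two input topics collapse into one entry keyed by the PSI, carrying both names.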
Templates/Schemas
Define semantics contained within an association
Define constraints on the creation of semantically valid topic map structures
Provide roadmaps for creation of topic map structures
Defined using regular topic maps syntax
Future work may include definition of extents
– Cardinality
– Time/Date
Templates/Schemas – cont.
<topic id="marriage.schema">
  <instanceOf><topicRef xlink:href="#association.class"/></instanceOf>
  <instanceOf><topicRef xlink:href="#schema"/></instanceOf>
  <baseName><baseNameString>Marriage</baseNameString></baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#association.property"/></instanceOf>
    <resourceRef xlink:href="#reflexive"/>
  </occurrence>
  <occurrence id="minimum.spouses">
    <instanceOf><topicRef xlink:href="#minimum.occurrences"/></instanceOf>
    <resourceData>2</resourceData>
  </occurrence>
  <occurrence id="maximum.spouses">
    <instanceOf><topicRef xlink:href="#maximum.occurrences"/></instanceOf>
    <resourceData>2</resourceData>
  </occurrence>
</topic>
Templates/Schemas – cont.
<association>
  <instanceOf><topicRef xlink:href="#marriage.schema"/></instanceOf>
  <scope><topicRef xlink:href="#schema"/></scope>
  <member>
    <roleSpec><topicRef xlink:href="#spouse"/></roleSpec>
    <resourceRef xlink:href="#minimum.spouses"/>
    <resourceRef xlink:href="#maximum.spouses"/>
  </member>
</association>
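A processor could read the cardinality limits out of the marriage schema topic and use them to check an association. The following is a hedged Python sketch: the `read_limits` and `is_valid` functions are hypothetical, and the schema is a trimmed copy wrapped in a `topicMap` element only so that the `xlink` prefix is declared.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the schema topic, wrapped so the xlink prefix is declared.
SCHEMA = """
<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="marriage.schema">
    <occurrence id="minimum.spouses">
      <instanceOf><topicRef xlink:href="#minimum.occurrences"/></instanceOf>
      <resourceData>2</resourceData>
    </occurrence>
    <occurrence id="maximum.spouses">
      <instanceOf><topicRef xlink:href="#maximum.occurrences"/></instanceOf>
      <resourceData>2</resourceData>
    </occurrence>
  </topic>
</topicMap>
"""

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def read_limits(xml_text):
    """Map each occurrence type (e.g. '#minimum.occurrences') to its integer value."""
    limits = {}
    for occ in ET.fromstring(xml_text).iter("occurrence"):
        kind = occ.find("instanceOf/topicRef").get(XLINK_HREF)
        limits[kind] = int(occ.findtext("resourceData"))
    return limits


def is_valid(spouses, limits):
    """Check that the number of spouse role players lies within the schema limits."""
    return limits["#minimum.occurrences"] <= len(spouses) <= limits["#maximum.occurrences"]


limits = read_limits(SCHEMA)
```

With both limits set to 2, `is_valid` accepts exactly two spouse members and rejects one or three.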
Type Hierarchies/Ontologies
Hierarchies allow ontologies to be developed by which additional knowledge can be inferred simply through hierarchical inheritance
Can use templates to control or enhance the ontology
Type Hierarchies/Ontologies – cont.
<topic id="person">
  <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
  <baseName><baseNameString>Person</baseNameString></baseName>
</topic>
<topic id="male">
  <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
  <baseName><baseNameString>Male</baseNameString></baseName>
</topic>
<topic id="eric">
  <instanceOf><topicRef xlink:href="#male"/></instanceOf>
  <instanceOf><topicRef xlink:href="#person"/></instanceOf>
  <baseName><baseNameString>Eric</baseNameString></baseName>
</topic>
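Hierarchical inheritance can be sketched as a walk up the type hierarchy. This is a minimal Python illustration with a stated assumption: the XTM sample declares both `male` and `person` on `eric` directly, whereas here `person` is reached through an assumed superclass link from `male` to `person`. The function and dictionary names are hypothetical.

```python
def all_types(topic, instance_of, superclasses):
    """Return the declared types of a topic plus every type inherited through the hierarchy."""
    found = set()
    pending = list(instance_of.get(topic, ()))
    while pending:
        current = pending.pop()
        if current not in found:          # also guards against cycles in the hierarchy
            found.add(current)
            pending.extend(superclasses.get(current, ()))
    return found


# Assumed data for the sketch: "eric" is declared only as "male",
# and "male" is declared a subclass of "person".
instance_of = {"eric": ["male"]}
superclasses = {"male": ["person"]}
```

Under these assumptions `all_types("eric", instance_of, superclasses)` yields both `male` and `person`, so the second `instanceOf` would not need to be written by hand.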
Association Types
ISO 13250 implicitly specifies class/instance associations
XTM specifies, through PSIs, class/instance and superclass/subclass
Other examples
– Component/object
– Member/collection
– Portion/mass
– Feature/activity
– Place/area
– Phase/process
Association Properties
Transitivity, reflexivity, symmetry properties can be attached to associations
Allows special processing and understanding to occur when using associations
Association Occurrences
Topic maps center more on topics where other knowledge management schemes concentrate more on associations or relationships between topics
In topic maps, associations can have topics defined which reify them
Reification of associations allows them to have occurrences
Inference Rules
Inference rules allow new topics and associations to be created based on the existence of others
Rules can be stored and managed using topic map syntax
Inference Rules – cont.
<association>
  <instanceOf><topicRef xlink:href="#inference.rule"/></instanceOf>
  <scope><topicRef xlink:href="#inference.rule.schema"/></scope>
  <member>
    <roleSpec><topicRef xlink:href="#inference.rule.condition"/></roleSpec>
    <topicRef xlink:href="#ir.parent.in.family.N345"/>
    <topicRef xlink:href="#ir.parent.in.family.N456"/>
    <topicRef xlink:href="#ir.sibling.in.family.N567"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#inference.rule.statement"/></roleSpec>
    <topicRef xlink:href="#ir.cousin.N678"/>
  </member>
</association>
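The rule above has two parent conditions and one sibling condition, and it concludes a cousin relationship. One plausible reading is sketched below in Python. This is an interpretation, not SemanText's inference engine: the `infer_cousins` function is hypothetical, and the names and relationships in the sample data are invented.

```python
def infer_cousins(parent_of, siblings):
    """If A and B are siblings, every child of A is a cousin of every child of B."""
    cousins = set()
    for a, b in siblings:
        for x in parent_of.get(a, ()):
            for y in parent_of.get(b, ()):
                cousins.add(frozenset((x, y)))   # unordered pair: cousin is symmetric
    return cousins


# Hypothetical family; these names and relationships are invented for the sketch.
parent_of = {"alice": ["carol"], "bob": ["dave", "erin"]}
siblings = [("alice", "bob")]
cousins = infer_cousins(parent_of, siblings)
```

Each derived pair corresponds to a new cousin association that was not present in the input, which is the behaviour the Inference Rules slide describes.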
SemanText: Using Topic Maps for Knowledge Representation
100% pure Python system developed to demonstrate the joining of topic maps and semantic networks
Uses tmproc, wxPython, PyXML
Enables creation, modification, querying of topic map structures
Semantic network structures with entities and relationships
Inference engine built in where user can add rules which create new topic map structures
Development is continuing
Demo
Future SemanText Plans
Implement XTM
Implement scopes, themes
Implement merge – hard vs. soft
Integration with grove-based system to allow point-and-click input from multiple data formats
Hooks to natural language tools
Voice input/output using VoiceML
Graphical output such as VRML or SVG
Textual output such as Open E-book, PalmOS, WML
Conclusions
SemanText demonstrates that information can be harvested using the markup from XML documents in order to build a knowledge base
It demonstrates that the topic map architecture can be used to interchange semantic network information
It also demonstrates that topic maps can be used to feed a semantic network
It demonstrates that topic map syntax can be used to extend the topic map paradigm
– Schemas, templates, inference rules
Q & A
SemanText available from www.semantext.com
Questions or comments welcome at:
ISOGEN International/DataChannel
1611 W. County Road B, Suite 204
St. Paul, MN 55113 USA
Voice: 1.651.636.9100 - Fax: 1.651.636.9191
[email protected]
www.isogen.com - www.datachannel.com