creating semantic web contents with protege 2000

Upload: jsmith2090

Post on 05-Apr-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    1/12

    60 1094-7167/01/$10.00 2001 IEEE IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    Creating Semantic

    Web Contents withProtg-2000

    Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubzy, Ray W. Fergerson, and

    Mark A. Musen, Stanford University

    Because we can process only a tiny fraction of information available on the Web,

    we must turn to machines for help in processing and analyzing its contents. With

    current technology, machines cannot understand and interpret the meaning of the infor-

    mation in natural-language form, which is how most Web information is represented

    today. We need a Semantic Web to express informa-tion in a precise, machine-interpretable form, so soft-ware agents processing the same set of data share anunderstanding of what the terms describing the datamean.1

    Consequently, weve recently seen an explosionin the number of Semantic Web languages devel-

    oped. Because researchers and developers haventyet reached a consensus on which language is themost suitable, which features each language musthave,or which syntax is the most appropriate,we arelikely to see even more languages emerge. We needto develop tools that will let us experiment with thesenew languages so we can compare their expressive-ness and features, change language specifications,and select a suitable language for a specific task.

    In this article, we describe Protg-2000, a graph-ical tool for ontology editing and knowledge acqui-sition that we can adapt to enable conceptual model-ing with new and evolving Semantic Web languages.Protg-2000 lets us think about domain models at a

    conceptual level without having to know the syntaxof the language ultimately used on the Web. We canconcentrate on the concepts and relationships in thedomain and the facts about them that we need toexpress. For example, if we are developing an ontol-ogy of wines, food, and appropriate winefood com-binations, we can focus on Bordeaux and lambinstead of markup tags and correct syntax.

    Naturally, designing a new tool specifically for anew language could be better than adapting an exist-ing tool. We can offer several reasons, however, for

    adapting an existing tool at the stage where no sin-gle language has emerged as the winner. First, wecan experiment with emerging languages withoutcommitting enormous amounts of resources to cre-ating tools that are custom-tailored for these lan-guagesonly to decide later that the languages arenot suitable. Second,Protg-2000 already provides

    considerable functionality that a new tool will needto replicate, both at the modeling and user-interfacelevels. Third, using different customizations of thesame tool to edit ontologies in different languagesgives us most of the translation among the modelsin the languages for free.Translating a model fromone language to another becomes as easy as select-ing a save as item from a menu.

    Semantic Web languagesAI researchers have used ontologies for a long

    time to express formally a shared understanding ofinformation. An ontology is an explicit specificationof the concepts in a domain and the relations among

    them, which provides a formal vocabulary for infor-mation exchange. Specific instances of the conceptsdefined in the ontologiesinstance datapairedwith ontologies constitute the basis of the SemanticWeb. In recent experiments to prototype the Seman-tic Web, members of different communities with dif-ferent backgrounds and goals in mind have createda multitude of languages for representing ontologiesand instance data on the Web (see Table1). Typically,a Semantic Web language for describing ontologiesand instance data contains a hierarchical description

    As researchers continue

    to create new languages

    in the hope of developing

    a Semantic Web, they still

    lack consensus on a

    standard. The authors

    describe how Protg-

    2000a tool for

    ontology development

    and knowledge

    acquisitioncan be

    adapted for editing

    models in different

    Semantic Web

    languages.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    2/12

    of important concepts in a domain (classes).Individuals in the domain are instances ofthese classes, and properties (slots) of eachclass describe various features and attributesof the concept. Logical statements describerelations among concepts. For example, con-sider an ontology describing wines,food, andappropriate winefood combinations. Someof the classes describing this domain are Wine,Wineries, and different types ofFood. Some

    properties of the Wine class include the winesflavor, body, sugar level, and thewinery that pro-duced it.

    These notions are present in many Seman-tic Web languages existing today includingSHOE, Topic Maps, XOL, RDF and RDFS,and DAML+OIL.

    The SHOE (Simple HTML OntologyExtensions) language, developed at the Uni-versity of Maryland, introduces primitives todefine ontology and instance data on Webpages. Classes are called categories in SHOE.Categories constitute a simple is-a hierarchy,and slots are binary relations. SHOE also

    allows relations among instances or instancesand data to have any number of arguments(not just binary relations). Horn clausesexpress intensional definitions in SHOE.

    The Hytime community developed TopicMaps, a recent ISO standard (ISO/IEC13250). Topic Maps aim to annotate docu-ments with conceptual information. Topicscorrespond to classes in other ontology lan-guages and can be linked to documents. Top-ics are instances ofTopic Types (other topics),which can be related to one another withAsso-ciations. Associations correspond closely toslots in other ontology languages. Associa-

    tions belong toAssociation Types, which areagain Topics. Topic Maps do not have a spe-cialized primitive for representing instances.Any instance of a topic type can act as a topictype itself.

    The bioinformatics community designedXOL for the exchange of ontologies in thefield of bioinformatics. It evolved to becomea general language for interchange of ontol-ogy and instance data. Being an interchangelanguage, XOL includes primitives found inmany knowledge-representation systems,object databases,and relational databases. Itprovides means to define classes, a class hier-

    archy, slots, facets, and instances.RDF (resource description framework)provides a graph-based data model, consist-ing of nodes and edges. Nodes correspond to

    objects or resources and the edges to prop-erties of these objects. The labels on thenodes and edges are Uniform Resource Iden-tifiers (URIs). However, RDF itself does notdefine any primitives for creating ontologies.It is the basis for several other ontology-def-inition languages such as RDFS andDAML+OIL.

    RDF Schema2 defines the primitives forcreating ontologies. Figure 1 shows an exam-

    ple of a graph representing our ontology ofwines as an RDFS. In RDFS, there areclasses of concepts, which constitute a hier-archy with multiple inheritance. For exam-ple, the class Wine is a subclass of the classDrink. Classes typically have instances (forexample, a specific red wine is an instanceof the Red Wine class) and a resource can be aninstance of more than one class (for exam-ple, Romariz Port is an instance of both the RedWine and the Dessert Wine classes). Resourceshave properties associated with them (forexample, Wine has flavor). Properties describeattributes of a resource or a relation of a

    resource to another resource. RDFS definesa propertys domainresources that can besubjects of the propertyand a propertysrangeresources that can be objects of the

    MARCH/APRIL 2001 computer.org/intelligent 61

    Table 1. A selection of Semantic Web languages.

    Language Description URL

    XOL XML-based ontology-exchange language www.ai.sri.com/~pkarp/xol

    Topic Maps ISO standard for describing knowledge structures www.topicmaps.org

    SHOE Simple HTML Ontology Extensions www.cs.umd.edu/projects/plus/SHOE

    RDF and Resource description framework and www.w3.org/RDFRDFS RDF Schema

    DAML+OIL DARPA Agent Markup Language + www.daml.org

    Ontology Inference Layer

    rdf:type

    rdfs:subClassOf

    rdfs:subClassOf

    rdf:type

    rdf:typerdf:type

    rdf:type rdf:type

    rdf:type

    rdf:type

    rdf:type

    rdf:type

    rdfs:rangerdfs:range

    rdfs:subClassOf

    rdfs:subClassOf rdfs:subClassOf

    rdfs:domain

    rdfs:domain

    rdf:Property

    d:White_wine d:Dessert_wine

    d:Drink d:maker

    d:grape

    rdfs:Class

    d:Wine

    d:Winery

    d:Red_wine d:Rose_wine

    d:Wine_grape

    Figure 1. An RDF Schema graph representing the Wine ontology.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    3/12

    property. For example, the property maker

    may have a class Wine as its domain and aclass Winery as its range.DAML+OIL (DARPA Agent Markup lan-

    guage + Ontology Inference Layer)3 takes adifferent approach to defining classes andinstances. In addition to defining classes andinstances declaratively, DAML+OIL andother description-logics languages let us cre-ate intensional class definitions using Booleanexpressions and specify necessary, or neces-sary and sufficient, conditions for class mem-bership. These languages rely on inferenceengines (classifiers) to compute a class hier-archy and to determine class membership of

    instances based on the properties of classesand instances. For example, we can define aclass of Bordeaux wines as a class of winesproduced by a winery in the Bordeaux region.In DAML+OIL, we can also specify globalproperties of classes and slots. For example,we can say that the location slot is transitive: ifa winery is located in the Bordeaux region andthe Bordeaux region is located in France,thenthe winery is in France. We will describeDAML+OIL in more detail later.

    We can see from this discussion that

    Semantic Web languages for representingontologies and instance data have many fea-tures in common. At the same time, there aresignificant differences stemming from dif-ferent design goals for the languages. Inadapting Protg-2000 as an editor for theselanguages, we build on the similaritiesamong them and custom-tailor the tool toaccount for the individual differences.

    Protg-2000For many years now, experts in domains

    such as medicine and manufacturing haveused Protg-2000 for domain modeling. We

    show not only how we adapt Protg-2000to the new world of the Semantic Webreusing its user interface, internal represen-tation, and frameworkbut also how ourchanges enable conceptual modeling with thenew Semantic Web languages.

    Protg-2000 is highly customizable,which makes its adaptation as an editor for anew language faster than creating a new edi-tor from scratch. The following featuresmake this customization possible:

    An extensible knowledge model. We can

    redefine declaratively the representationalprimitives the system uses. A customizable output file format.We can

    implement Protg-2000 components thattranslate from the Protg-2000 internalrepresentation to a text representation inany formal language.

    A customizable user interface. We canreplace Protg-2000 user-interface com-ponents for displaying and acquiring datawith new components that fit the new lan-guage better.

    An extensible architecture that enablesintegration with other applications. We

    can connect Protg-2000 directly toexternal semantic modules, such as spe-cific reasoning engines operating on themodels in the new language.

    Protg-2000 knowledge modelProtg-2000s representational primi-

    tivesthe elements of its knowledgemodel4are very similar to those of theSemantic Web languages that we describedearlier. Protg-2000 has classes, instances

    62 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    The tabs representing differentviews of a knowledge base and the

    configuration information

    The class hiearchyThe slot definition

    The facets

    Figure 2. A snapshot of the ontology representing wines. The tree on the left represents a class hierarchy. The form on the right

    shows the definition of the Wine class.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    4/12

    of these classes, slots representing attributesof classes and instances, and facets express-ing additional information about slots.

    Figure 2 shows an example definition of aclass,which is part of an ontology describing

    wines, food, and desirable winefood com-binations. In the figure, the tree on the leftrepresents a class hierarchy. The class ofPauil-lac wines, for instance, is a subclass of theclass ofMdoc wines. In other words, eachPauillac wine is a Mdoc wine. The class ofMdocwines is, in turn, a subclass ofRed Bordeauxwines and so on. The form on the right in Fig-ure 2 represents the definition of the selectedclass (Wine). There is the name of the class,its documentation, a list of possible con-straints, and definitions of slots that theinstances of this class will have. Instances ofthe class Wine will have slots describing their

    flavor, body, sugar level, thewinery that producedthe wine, and so on.The form in Figure 3 displays an instance

    of the class Pauillac representing Chteau LafiteRothschild Pauillac, and the fields display the slotvalues for that instance. Therefore, we knowthat Chteau Lafite Rothschild Pauillac has afull body and strong flavor among otherproperties. Both the class-definition forms(the right-hand side in Figure 2) and theinstance-definition forms (Figure 3) areknowledge-acquisition forms in Protg-2000. The fields on the knowledge-acquisi-tion forms correspond to slot values, and we

    define classes and instances by filling in slotvalues in these fields. Protg-2000 gener-ates knowledge-acquisition forms automati-cally based on the types of the slots andrestrictions on their values.

    The Protg-2000 user interface (Figure2) consists of several tabs for editing differ-ent elements of a knowledge base and cus-tom tailoring the layout of the knowledge-acquisition forms, such as the forms inFigures 2 and 3. The Classes tab definesclasses and slots, and the Instances tabacquires specific instances. The Forms taballows us to change the layout and the con-

    tents of the knowledge-acquisition forms.We can customize almost all of the Pro-

    tg-2000 features we have described to fita specific domain or task by

    changing declaratively the standard classand slot definitions;

    changing the content and the layout of theknowledge-acquisition forms; and

    developing plug-ins using the Protg-2000 application-programming interface.5

    Lets look at how we can customize Pro-tg-2000 and then see how we can use thisflexibility to create Protg-based editors fornew Semantic Web languages.

    Changing the notion of classesand slotsThe definition of the Wine class in Figure 2

    is a standard class definition in Protg-2000,with a class name, documentation, list ofslots,and so on. What if we need to add moreattributes to a class definition, or change howa class looks, or change the default definitionof a class in the system? For instance, wemight want to add a list of a few best winer-ies for each type of wine in the hierarchy.Such a list is a property of a class (such asPauillac wines) rather than a property of spe-cific instances of the class (such as Chteau

    Lafite Rothschild Pauillac). The list of the bestwineries for a class is not inherited by its sub-classes:The best wineries producing red Bor-deaux are not necessarily the same as the bestMdoc or Pauillac wineries (although, theymay overlap). Therefore, this list mustbecome a part of a class definition the sameway as documentation is a part of a class def-inition. The Protg-2000 metaclass archi-tecture lets us do just that.4,5

    Metaclasses are templates for classes inthe same way that classes are templates forinstances. We can define our own meta-classes and effectively change a definition of

    what a class is, in the same way we woulddefine a new class. The default Protg-2000template (the standard metaclass) defines thefields that we see in Figure 2. We can extenddeclaratively this standard definition of what

    a class is with new fields of any type bydefining a new metaclass, which simplybecomes a part of the knowledge base. Fig-ure 4 shows a definition of the Red Bordeauxclass that includes the additional field with a

    list of the best Bordeaux wineries.Similarly, we can define newmetaslots asuser-defined templates for new slots. If slotdefinitions in our system must have fields inaddition to the ones that Protg-2000 has,we simply define new templates where wedescribe these new fields.

    Custom-tailoring slot widgets forvalue acquisition

    The look and behavior of the fields on theknowledge-acquisition forms in Figures 2and 3 depend on the types of the values thatthe fields can take. A field for a string value,

    such as a class name, has a simple text win-dow. A field that contains a list of complexvalues is a list box with buttons to add andremove values and to create new values.These fields are called slot widgets. They notonly display the values appropriately, but alsohelp to ensure that the values are correctbased on the slot definitions in the ontology.For example, the maker of a wine must be awineryan instance of the Winery class. Theslot widget for the maker slot in Figure 3 letsus set the value only to a Winery instance.

    Developers can extend Protg-2000 byimplementing their own slot widgets that are

    tailored to acquire and verify particular kindsof values. Suppose we wanted to be moreprecise about the sugar level in wine and tomark it on a scale rather than simply choos-ing among three valuesdry, sweet, or off-dry.

    MARCH/APRIL 2001 computer.org/intelligent 63

    Figure 3. An instance of the class Pauillac representing the Chteau Lafite Rothschild Pauillac. Thiswine has a full body, a strong flavor, and a moderate tannin level, among other properties.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    5/12

    We could store the value as a number in thesugar slot. We could use a slot widget thatwould let us select the value on a slider ratherthan enter a number (see Figure 5). When wecustomize knowledge-acquisition forms, wechoose not only the layout of the fields onthe form, but also the slot widgets that mustbe used for different fields. The slot widgetswe choose do not usually affect the contentsof the knowledge base itself, but their use canmake the look and feel of the tool much moresuitable for a particular domain or language.

    Slot widgets also can help ensure the internalconsistency of a knowledge base by check-ing, for example, that an integer value thatwe enter is between the allowed maximumand minimum for that slot.

    Using a back-end plug-in togenerate the right output

    When we develop a domain model in Pro-tg-2000, we do not need to think about thespecific file format that Protg-2000 will use

    to save our work. We think about our domainat the conceptual level and create classes,slots,facets, and instances using the graphical userinterface. Protg-2000 then saves the result-ing domain models in its own file format.Developers can change this file format in thesame way they plug in slot widgets. Back-endplug-ins let developers substitute the Protg-2000 text file format with any other file for-mat. For example, suppose we wanted to useXML to publish the wine ontology and otherdomain models we create using Protg-2000.

    We would then need to create an XML backend that substitutes files in the Protg-2000format with the files in XML. A back end cre-ates a mapping between the in-memory rep-resentation of a Protg-2000 knowledge baseand the file output in the required format. Theback end also enables us to import the files inthat format into Protg-2000. The new fileformat has the same status as the Protg-2000native file format, and the users can chooseeither format for their files.

    Editing Semantic Web languageswith Protg-2000

    Armed with the arsenal of tools to custom-tailor Protg-2000 quickly and easily, letslook at what is involved in creating a Protg-2000 editor for a new Semantic Web language.We will use the Protg-RDFS editor devel-oped in our laboratory as an example, but theideas are the same for any new language.

    We start creating a Protg-2000 editor forour new language by determining the differ-ences between the knowledge models of the

    two languages: the knowledge model of Pro-tg-2000 and the knowledge model under-lying our language of choice. We then decidewhich of the available toolsmetaclasses,custom user-interface components, or a cus-tom back endwe will use to reconcile thedifferences or to hide them from the user.

    In practice, the overlap between the knowl-edge models underlying the Semantic Weblanguages available today is very large. Themodels might use different terminology for

    64 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    The field in thenew template

    Figure 4. A class definition that uses a nonstandard template. We added the best wineries slot to the standard class-definition template.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    6/12

    the same notion (for example, slots in Pro-tg-2000 andproperties in RDFS). How-ever, the structure of the concepts, the under-lying semantics, and the restrictions are oftensimilar.

    When we compare the two knowledgemodels, we identify four categories of con-cepts (see Figure 6):

    1. Concepts that are exactly the same in the

    two languages (possibly with different

    names). Usually, classes, inheritance,instances, slots as properties of classes andinstances, and many of the slot restrictionsfall into this category.

    2. Concepts that are the same but expressed

    differently in the two languages. Forexample, Protg-2000 associates slotswith classes by attaching a slot to a class.

    RDFS defines essentially the same rela-tionship by defining a domain of a property.3. Concepts in our language of choice that

    do not have an equivalent in Protg-2000.For example,RDFS allows an instance tohave more than one type, whereas in Pro-tg-2000 each instance has a unique directtype.

    4. Concepts that Protg-2000 supports and

    our language of choice does not. For exam-ple, Protg-2000 allows a slot to have

    more than one allowed class for its values,

    whereas the range of a property in RDFSis limited to a single class.

    Naturally, we can express all the featuresof our language that fall into the first categorydirectly in Protg-2000. We deal with the dif-ferences in the other three categories by defin-ing appropriate metaclasses and metaslots andby resolving the remaining changes in theback end. We hide the differences from theuser behind custom-tailored slot widgets.

    The second item on the list, the concepts

    that do not have a direct equivalent in Protg-2000 but that can be mapped to native Pro-tg-2000 concepts, deserves a special dis-cussion. Consider domains of properties inRDFS (rdfs:domain). A domain specifies a classon which a property might be used. For exam-ple, the domain of the flavor property is the Wineclass. Protg-2000 slots are similar to prop-erties in RDFS. Attaching a slot to a class inProtg-2000 also specifies that a slot can beused with that class. For example, the flavor slot

    MARCH/APRIL 2001 computer.org/intelligent 65

    A slider widget fornumeric values

    Figure 5. Changing a slot widget. We use a slider instead of a simple field to acquire

    numeric values for the sugar level.

    (2) Concepts that can beencoded as native Protg

    concepts

    (1) Conceptsthat are identical

    semantically

    (3) Concepts that do nothave an equivalentin native Protg

    (4) Concepts in Protg thatdo not have an equivalent

    in the language

    Use nativeProtg concepts

    Use Protgconcepts directly

    Use metaclasses andmetaslots to encode

    the information

    Use custom labels andslot widgets to hide

    the differences

    Back end

    Use custom slot widgetsto facilitate

    knowledge entry

    Use knowledge-acquisitionforms to disable

    the features

    Map between the model inProtg and the model

    required by the language

    Map directly intothe model required

    by the language

    Modeling

    User interface

    Map between the model inProtg and the model

    required by the language

    Define the means of storingthe information in the

    language format

    The new language3

    1

    2

    4 Protg-2000

    Figure 6. Comparing the knowledge models of Protg-2000 and a new Semantic Web language.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    7/12

    is attached to the Wine class. We have two waysto encode the RDFS domain information in aProtg-RDFS editor. First, we can add adomain slot to a template (metaslot) that we willuse for all our slots. Then, a field for domainwill appear on each form for a slot, and wewill fill it in there. Second, we can simply usethe native Protg-2000 notion of slot attach-ment and translate the attachments of slots to

    classes into domains of properties in the backend. The second solution lets us use the Pro-tg-2000 user interface directly and hides thefeatures of a specific language used to storethe information.

    We find it extremely beneficial to adopt theparadigm of using the native Protg-2000 fea-tures wherever possible and of resorting toadditional definitions,such as metaclasses andmetaslots, only when absolutely necessary.This approach maximally facilitates theexchange of domain models among differentlanguages, which we edit (or will edit) withProtg-2000. As new languages emerge and

    we experiment with them,the knowledge mod-els underlying these languages will undoubt-edly overlap. By encoding as much as possiblein the native Protg-2000 structures and leav-ing part of the translation between the Protg-2000 model and the language to the back end,we maximize the amount of information thatwe will preserve by simply loading a knowl-edge base in one language supported by Pro-tg-2000 and saving it to another language.Even though there would often be some parts

    of these models that will not be part of thisoverlap, we are maximizing the amount ofinformation that gets ported among models indifferent languages for free.

    Having generated the four groups of con-cepts after comparing the two knowledgemodels (see Figure 6), we can reconcile thedifferences using

    1. modelingby changing default defini-tions of classes and slots at the modelinglevel;

    2. the user interfaceby developing spe-cialized user-interface components; and

    3. the back endby implementing the newback end that will translate between thedomain model in Protg-2000 and thedomain model in our language of choice.

    Lets look at how each of these three lev-els works using the development of a Pro-tg-based RDFS editor as an example (seeTable 2 for a summary of the entire process).

    The modeling levelWe start by determining which concepts in

    our language of choice are identical to Protg-2000 concepts or that can be represented usingthe native Protg-2000 concepts. We use thenative Protg-2000 as a means to model thisgroup of concepts, even if it is not how theseelements are directly expressed in our lan-guage. We then define the new templates forclass and slot definitions if necessary.

    Consider, for example, the two attrib-utesrdfs:seeAlso and rdfs:isDefinedBythat areassociated with each class and each propertyin RDFS. The rdfs:seeAlso property specifiesanother resource containing informationabout the subject resource, and the rdfs:isDe-finedBy property indicates the resource defin-ing the subject resource. The values of theseproperties are other resources or URIs point-

    ing to other resources. We must add these twofields that the Protg-2000 itself does nothave to each class and slot form in our knowl-edge base. To add these fields, we define anew metaclass that will serve as a templatefor RDFS classes. This metaclass is, in fact,equivalent to the RDFS class rdfs:Class. Figure7 shows the definition ofrdfs:Class with the newtemplate slots that will appear on each classform that uses this template.

    The user-interface levelWhen creating a Protg-based editor for a

    new language,we can change both the behav-

    ior and the look and feel of the knowledge-acquisition forms to reflect the terminologyand the features of the language. First, we canchange the labels on the formsthe simplesttype of customization. For example, we caneasily replace Protgs Template slotslabel in a class definition with the RDFSProperties label to give the form an RDFSlook. Other elements that we can easily con-figure by manipulating the forms includewhether or not a field should be visible to the

    66 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    Table 2. Creating the Protg-based RDFS editor.

    Category (1) Concepts in RDFS (2) Concepts in RDFS (3) Concepts in RDFS (4) Concepts in Protgthat are (nearly) identical that can be encoded that do not have an that do not haveto Protg concepts as native Protg concepts equivalent in native Protg an equivalent in RDFS

    Modeling rdfs:Class = :STANDARD-CLASS Do not use explicit rdfs:domain and rdfs:Class is a default for Cardinality, inverse slot,rdfs:subClassOf = rdfs:range for properties; rdfs: metaclass, and rdf:Property is a and default value facets;

    subclass of domain encoded as slot attachment; default metaslot; add properties multip le allowed classesrdf:type = instance of rdfs:range encoded as allowed class rdfs:isDefinedBy, rdfs:seeAlso for a slotrdf:Property = Instance-typed slots as core slots; add

    :STANDARD-SLOT rdfs:ConstraintProperty andrdfs:subPropertyOf = rdfs:ConstraintResource as

    subslot of core classes; multiple types ofrdfs:Resource = :THING an instancerdf:comment =

    :DOCUMENTATION

    User Custom labels on class Plug-in URI slot widget for Disable default-value andinterface and slots forms (for val idating URI-type slots. inverse-slot widgets on

    example, Properties slot forms.and Comment)

    Back end Map Protg concepts Translate slot attachments On import, create new class Write out extra facet informationdirectly to RDFS concepts as rdfs:domain for as a subclass of the multiple as Protg-specific properties

    properties and allowed types. on properties. If a slot has

    classes as rdfs:range multiple allowed classes, createa new class for rdfs:rangevalue. On import, do the reverse.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    8/12

    user, the buttons on the fields, the position and

    size of the fields, and the slot widgets to beused for each field. We perform this configu-ration entirely in the Protg-2000 Forms taband not in the programming code.

    We could also develop our own slot-wid-get plug-ins to allow editing and verificationof elements that are unique to our language.For example, a URI widget in the Protg-RDF editor can validate that the user hasentered a correct URI or even take the userto the corresponding Web page.

    Disabling fields for some slots on the formprevents the user from exercising Protg-2000 features that the particular Semantic Web

    language does not support. For example, wecan disable the field for entering default slotvalues in the Protg-RDFS editor, becauseRDFS does not support default values.

    The back-end levelWhatever the differences between Protg-

    2000 and our language that we could notresolve at the modeling and user-interface lev-els, we will need to reconcile in the modulethat saves the internal Protg-2000 repre-

    sentation in the required output file format

    the back-end plug-in. The back-end plug-in

    1. saves a Protg-2000 knowledge base ina file format that conforms to the syntaxof our language of choice;

    2. maps the elements of the Protg-2000knowledge base that do not have a directequivalent in our language to the appro-priate set of elements in this language; and

    3. imports domain models in this languagethat were developed elsewhere for edit-ing in Protg-2000.

    Usually, when developers define a lan-

    guage with a new syntax, they quickly imple-ment a parser that allows developers to readand write files in that languages syntax.Many of the new languages are extensions ofXML or RDF, and thus we can often use theexisting XML and RDF parsers to take careof the syntactic part of adapting to the newlanguage.

    In RDFS, the back end must deal with anumber of issues that we did not resolve atthe modeling level or in the user interface. We

    might have resolved some of these issues

    there, but it would have unnecessarily com-plicated the editor for the user. For example,instances in Protg-2000 are of a singleclass, whereas in RDFS they can be directinstances of several classes (for example, theyhave several direct types). Because the RDFSmodel is more general, we have no problemin saving a Protg-2000 knowledge base inRDFS. However, when we import RDFSinstance data into Protg-2000, we must dealwith instances that have several direct types.Suppose we have a class for red wines and aclass for dessert wines. We have Romariz Port asan instance of both classes in RDFS. When

    we import this RDFS instance data into Pro-tg-2000,the back end can create a new classthat is a subclass of both classes (for exam-ple, denoting a concept of dessert red wines)and make the Romariz Port instance an instanceof this new class. We can record the two orig-inal classes ofRomariz Port as additional slotson the newly created class (as shown in Fig-ure 4). When saving back to RDFS, the backend can extract the information from this slot,thus preserving the original model.

    MARCH/APRIL 2001 computer.org/intelligent 67

    The additional slotsdefining RFDS-specific

    properties

    Figure 7. A template definition for classes in RDFS. The class rdfs:Class inherits most of the slots from the standard class template, butthe two slots at the top of the list of properties are the ones that we defined for RDFS.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    9/12

    Any user-defined back end has the samestatus as all the other back ends, includingthe ones that are part of the core Protg-2000 system: it can be used as a storage for-mat for Protg-2000 knowledge bases.Therefore, there is another, no less impor-tant, goal of a back-end plug-in: to ensurethat when we create a knowledge base in Pro-tg-2000, save it using the back-end plug-in,

    and load it again, we have preserved all theinformation. Hence, we must find a way tostore the elements that Protg-2000 sup-ports, but that our language of choice doesnot. Most Semantic Web languages are flex-ible enough to easily store this information.For example, in RDFS, we simply add newProtg-specific properties to slots to recorddefault values, which RDFS does not have.These properties have no meaning to anotherRDFS agent, but if we read the knowledge

    base back in Protg-2000, we will have thedefault values preserved.

    Creating new tabs to includeother semantic modules

    In addition to creating a Protg-basededitor for a new Semantic Web language,developers can plug in other applications inthe knowledge-baseediting environment. In

    addition to the standard tabs that constitutethe Protg-2000 user interface (Figures 2and 4), developers can create tab plug-ins inthe same way they can plug in new slot wid-gets. These tabs can include arbitrary appli-cations that benefit from the live connectionto the knowledge base. These applicationsthen become an integral part of the knowl-edge-baseediting environment.

    Consider our wine example again. Havingcreated a knowledge base of wines and food

    and the appropriate combinations, we mightwant to build an application that produceswine suggestions for a meal course in a restau-rant. Such an application would actively usethe data in the knowledge base but it wouldalso implement its own reasoning mechanismto analyze suggestions. We can implement thiswine-selection application as a tab plug-in.

    In practical terms, a tab plug-in is a sepa-

    rate application, a developers own user-interface space from which the developer canconnect to, query, and update the current Pro-tg-2000 knowledge base.

    In the realm of the Semantic Web, a tabcan include any applications that would helpus acquire or annotate the knowledge base.Such applications can

    enable direct annotation of HTML pageswith semantic elements;

    68 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    Auxilliary coreclasses

    Figure 8. The definition for the spicy-red-meat-course class in the Protg-OIL editor. In addition to the standard fields, such as those

    shown in Figure 2 ), we have OIL-specific fields such as hasPropertyRestriction and subClassOf for specifying complex class expressions.These slots use the OIL-specific slot widget to display expressions. The tree on the left contains the auxiliary core classes wedefined for OIL.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    10/12

    provide connection to external reasoningand inference resources;

    acquire the semantic data automaticallyfrom text; and

    present a graphical view of a set of inter-

    related resources.

    Using Protg to edit DAML+OILDAML+OIL, the Semantic Web language

    that was heavily inspired by research indescription logics (DL), allows more typesof concept definitions in ontologies than Pro-tg-2000 and RDFS do. The DL-inspiredlanguages usually include the following fea-tures in addition to the ones found in the tra-ditional frame-based languages:

    We can specify not only necessarybut alsosufficientconditions for class membership.

    For example, if a wine is produced by awinery from the Bordeaux region, it is aBordeaux wine.

    We can use arbitrary Boolean expressionsin class and slot definitions to specifysuperclasses of a class, domain and rangeof a slot, and so on. For example, a spicyred-meat course must contain red meat andmust contain food that is itself spicy orfood containing something that is spicy.

    We can specify global slot properties. Forexample, location is a transitive property: ifthe Chteau Lafite Rothschild winery is in the Bor-deaux region and the Bordeaux region is in

    France, then the Chteau Lafite Rothschild wineryis in France. We can define global axioms that express

    additional properties of classes. For exam-ple, we can say that the classification ofthe class of all wines into the subclassesfor red, white, and ros wines is disjoint:Each instance of the Wine class belongsonly to one of these subclasses.

    We have adapted Protg-2000 to work asan OIL editor. (The OIL language is a pre-cursor for DAML+OIL.) In doing so, we fol-

    lowed the same steps we described in creat-ing the Protg-based RDFS editor. Inaddition, we have integrated external servicesfor OIL ontologies into Protg-2000. Inte-grating DAML+OIL would require nearlyidentical steps.

    The modeling levelWe introduce the new class and slot tem-

    plates, OilClass and OilProperty, to specify com-plex class and slot definitions. As a result, a

    class template, for example, acquires thesethree new fields (see Figure 8):

    1. typeto specify whether the class defin-ition contains only necessary or both

    necessary and sufficient conditions forclass membership;2. hasPropertyRestrictionto specify complex

    expressions for slot restrictions; and3. subclassOfto specify complex expres-

    sions describing the position of the classin the class hierarchy.

    To integrate OIL into Protg-2000,we usedthe names from the RDFS serialization syntaxof (Standard-)OIL and not the plain ASCII ver-sion. See www.ontoknowledge.org/oil/ for thevarious syntaxes and versions.

    Just as for RDFS, we use as many native

    Protg-2000 mechanisms for modeling OILontologies as possible. If a new class is sim-ply a subclass of several existing classes inthe hierarchy,we use Protgs own notion ofsubclasses by placing the new class where itbelongs in the hierarchy. However, if a super-class definition requires boolean expres-sionssomething Protg-2000 does notallowwe use the subClassOf field that we seeon the template. Even though Protg-2000does not understand the semantics of thisfield, we can represent this additional super-class information declaratively,and then passit to a classifier or simply save in OIL.

    We use the hasPropertyRestriction field when weneed complex expressions or when we needto specify existential slot constraints: Protg-2000 allows definition of value-type con-straints on slots (All values of this slot mustbe instances of this class). OIL allows exis-tential slot constraints in addition to the value-type constraints (One value of this slot mustcome from this class and one value mustcome from that class). We build the complexexpressions declaratively by creatinginstances of core auxiliary classes. We cansee some of these core classes in the tree inFigure 8. In the example, we specify a sub-

    class of a meal-course, spicy-red-meat-course, whichwe define as a course that must contain redmeat andmust contain food that is itself spicyorfood containing something that is spicy.

    Even though Protg-2000 does not sup-port some of the semantics that OIL has, wecan still encode the additional informationdeclaratively. Protg-2000 will ignore theinformation, but it will be able to pass it onto a classifier or to encode it in OIL so that anOIL agent can understand it.

    The user-interface levelApart from changing labels and rearrang-

    ing fields on the forms for the OilClass and Oil-Property templates, we created a new slot widgetto allow easier editing of nested expressions

    such as the ones representing food that isitself spicy orfood containing something thatis spicy in Figure 8. This widget augmentsthe standard Protg-2000 widget for select-ing and creating values for instance-valuedslots with a display of the nested expressionsin a more practical form. A further extensionof this simple but effective slot widget caninclude a full context-sensitive, validatingexpression editor.

    The back-end levelWe describe here an OIL back end that pro-

    duces the RDFS output for OIL. Therefore,

    we can build largely on the existing RDFSback end. In defining the class and slot namesand the structure of the auxiliary core classesin the OIL editor,we have mainly adhered tothe RDFS specification of OIL. As a result,

    just using the RDFS back end, described ear-lier, gives us an output that is very close butnot identical to the RDFS OIL output that weneed. Thus, to create the OIL back end, westarted with the existing RDFS back end. Weadapted it to add the parts of definitions spec-ified by the native Protg-2000 means to thecomplex class expressions.

    The OIL back end encodes the concepts

    that Protg-2000 has and OIL does not(global cardinality restrictions on slots, forexample) by defining additional statementsin a Protg namespace. An OIL agent willnot understand these statements and willignore them, but Protg-2000 will be able toextract the necessary information from them.

    Because many Semantic Web languagesare in their infancy and already come in manydifferent versions, there is an alternativeapproach to developing specific back endsfor each of these versions. We can create ageneral RDF back end for Protg-2000 andthen use a declarative and easily adaptable

    RDF transformation language for generatingthe desired outputs. Some research groupsare currently investigating such a back endand the corresponding RDF transformation(and query) language.

    Accessing external services througha tab plug-in

    The DL languages, such as OIL andDAML+OIL, traditionally rely on an infer-ence componenta classifierto find the

    MARCH/APRIL 2001 computer.org/intelligent 69

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    11/12

    right position of a class in the class hierar-chy and to determine which class definitionsare unsatisfiable (cannot have any instances).Therefore, it is crucial to have a connectionto a DL classifier as part of the environmentfor editing OIL and DAML+OIL ontologies.Having created a set of definitions, we caninvoke the classifier to determine how theevolving class hierarchy looks. We can seethe effects that changes in class definitionswill have on the evolving hierarchy. We canimmediately check if logical expressionsdefining a class contradict one another mak-ing the class unsatisfiable.

    Therefore, in order to create a full-fledgedProtg-based OIL editor, we need to con-nect Protg-2000 to such an inference com-ponent and present the results to the user. Weimplemented this connection as a Protg-2000 tab plug-in.

    Figure 9 shows the OIL tab in action. Ini-tially, the class hierarchy has the various meal-course subclasses as siblings. In addition, wespecify that an oyster-shellfish-course is a meal-course that has OYSTERS as the value for its FOOD slot;

    a shellfish-course is a meal-course that has shellfishas its food, and so on. We then use the OIL tabto connect to a DL classifier, FaCT,6 and tohave it rearrange the class hierarchy accord-ing to the class definitions. In the resultinghierarchy, the oyster-shellfish-course class, forexample, is correctly classified as being asubclass of the shellfish-course class.

    With the advent of the Semantic Web,the current network of onlineresources is expanding from a set of staticdocuments designed mainly for human con-sumption to a new Web of dynamic docu-ments, services, and devices, which softwareagents will be able to understand. Develop-ers will likely create many different repre-sentation languages to embrace the hetero-geneous nature of these resources. Somelanguages will be used to describe specificdomains of knowledge; others will modelcapabilities of services and devices. These

    languages will have different emphasis,scope, and expressive power.Protg-2000 provides full-fledged sup-

    port for knowledge modeling and acquisi-tion. Developers also can custom-tailor Pro-tg-2000 quickly and easily to be an editorfor a new Semantic Web language. A Pro-tg-based editor enables modeling at a con-ceptual level that allows developers to thinkin terms of concepts and relations in thedomain that they are modeling and not interms of the syntax of the final output.

    By adapting Protg-2000 to edit a newSemantic Web language rather than creating

    a new editor from scratch or using a text edi-tor to create ontologies in the new language,we obtain a graphical, conceptual-levelontology editor and knowledge-acquisitiontool. We get a new editor to experiment withthe new language without investing manyresources into it. And we can use Protg-2000 as an interchange module to translatemost of the models in other Semantic-Weblanguages into our new language and viceversa. In our experience, it takes a few days

    70 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

    T h e S e m a n t i c W e b

    OIL tab plugin

    Figure 9. Tab plug-in for classification of OIL ontologies. On the right, we see a hierarchy of meal courses before classification. The

    middle pane shows interactions with the FaCT classifier. The hierarchy on the right is the one that the classifier computed.

  • 7/31/2019 Creating Semantic Web Contents With Protege 2000

    12/12

    to adapt Protg-2000 to a new Semantic-Web languagea lot less time than isrequired to create any sophisticated softwarefrom scratch. We were able to create theseeditors even for a language like OIL, which

    takes a knowledge-modeling approach thatis different from the frame-based approachfor which Protg originally was designed.The extensible and flexible knowledge modeland the open plug-in architecture of Protg-2000 constitute the basis for developing asuite of conceptual-level editors for Seman-tic Web languages.

    AcknowledgmentsFor more information about the Protg project,

    please visit http://protege.stanford.edu. A grantfrom Spawar, a grant from FastTrack Systems,andthe DARPA DAML program supported this work.

    References

    1. T. Berners-Lee, M. Fischetti, and M. Der-touzos, Weaving the Web: The Original

    Design and Ultimate Destiny of the World

    Wide Web by its Inventor, Harper, San Fran-cisco, 1999.

    2. D. Brickley and R.V. Guha, ResourceDescription Framework (RDF) Schema Spec-ification,World Wide Web Consortium, Pro-posed Recommendation 1999, www.w3.org/TR/2000/CR-rdf-schema-20000327 (cur-rent 28 Mar. 2001).

    3. J. Hendler and D.L. McGuinness, TheDARPA Agent Markup Language, IEEE

    Intelligent Systems, vol. 16, no. 6, Jan./Feb.,2000, pp. 6773.

    4. N.F. Noy, R.W. Fergerson, and M.A. Musen,The Knowledge Model of Protg-2000:Combining Interoperability and Flexibility,Proc. Knowledge Engineering and Knowl-edge Management: 12th Intl Conf. (EKAW-2000), Lecture Notes in Artificial Intelligence,no. 1937, Springer-Verlag, Berlin, 2000,pp.1732.

    5. M.A. Musen et al., Component-Based Sup-port for Building Knowledge-AcquisitionSystems, Proc. Conf. Intelligent Informa-tion Processing (IIP 2000) Intl Federation

    for Information Processing World ComputerCongress (WCC 2000), Beijing, China, 2000,http://smi-web.stanford.edu/pubs/SMI_Abstracts/SMI-2000-0838.html (current 28Mar. 2001).

    6. I. Horrocks, The FaCT system, Proc. Auto-mated Reasoning with Analytic Tableaux and

    Related Methods:Intl Conf. Tableaux 98,Lec-

    ture Notes in Artificial Intelligence, no. 1397,Springer-Verlag,Berlin, 1998,pp. 307312.

    MARCH/APRIL 2001 computer.org/intelligent 71

    Natalya F. Noy is a research scientist in the Stanford Medical Informaticslaboratory at Stanford University. Her interests include ontology developmentand evaluation, semantic integration of ontologies, and making ontology-

    development accessible to experts in noncomputer-science domains. She hasa BS in applied mathematics from Moscow State University, Russia, an MAin computer science from Boston University, and a PhD in computer sciencefrom Northeastern University in Boston. Contact her at Stanford MedicalInformatics, 251 Campus Dr., Stanford Univ., Stanford, CA 94305;[email protected].

    Michael Sintek is a research scientist at the German Research Center forArtificial Intelligence. Currently, he is project leader of the FRODO projectwhere an RDF-based framework for building distributed organizational mem-ories is developed. He has a Diplom in computer science and economics fromthe University of Kaiserslautern. Contact him at the German Research Cen-ter for Artificial Intelligence (DFKI) GmbH, Knowledge Management Group,Postfach 2080, D-67608 Kaiserslautern, Germany; [email protected].

    Stefan Decker is a postdoctoral fellow at the Department of Computer Sci-ence at Stanford University, where he works on Semantic Web Infrastruc-ture in the DARPA DAML program. His research interests include knowledgerepresentation and database systems for the Web, information integration,and ontology articulation and merging. He has a PhD in computer sciencefrom the University of Karlsruhe, Germany, where he worked on ontology-based access to information. Contact him at Stanford University, Gates Hall4A, Room 425, Stanford, CA 94305; [email protected].

    Monica Crubzy is a postdoctoral researcher in the Stanford Medical Infor-

    matics laboratory at Stanford University. Her research focuses on the mod-eling and integration of libraries of problem-solving methods in the Protgknowledge-based-system development framework. She graduated from thecole Polytechnique Fminine,France, in general engineering and computerscience. She has a PhD in computer science from the Institut National deRecherche en Informatique et Automatique in Sophia Antipolis, France. Con-tact her at Stanford Medical Informatics, 251 Campus Drive,Stanford Uni-versity, Stanford, CA 94305; [email protected].

    Ray Fergerson is a programmer in the Stanford Medical Informatics labo-ratory at Stanford University. He has a BS in physics from the ColoradoSchool of Mines and a PhD in experimental nuclear physics from the Uni-versity of Texas in Austin. Contact him at Stanford Medical Informatics, 251Campus Drive, Stanford University, Stanford, CA 94305; [email protected].

    Mark A. Musens biography appears in the Guest Editors Introduction on page 25.

    T h e A u t h o r s