Digital Libraries Models and Content

Upload: caitlin-lattimore

Post on 16-Dec-2015


Page 1: Digital Libraries Models and Content. Goals for tonight Finish up from last week – the 5 S model more formally – Status of the systems available Obtaining,

Digital Libraries

Models and Content


Goals for tonight

• Finish up from last week
  – The 5S model, more formally
  – Status of the systems available

• Obtaining, describing, indexing content
  – XML
  – Dublin Core
  – Introducing content exchanges (OAI)


Applying the 5S model, informally

Choose a subject area, then answer the questions:

• Stream - What types of data? gif, jpg, avi, docx, pdf, html?
• Structure - How are the elements organized? Is there a hierarchy? Are there multiple structures?
• Spaces - How will we index the items? How will we divide them into related groups?
• Scenarios - What services will we provide? What information do we need to provide those services? What events might happen that we need to plan for?
• Societies - Who is the library intended to serve? Remember to include agents and other processes as well as users.

This is the first deliverable for your first project.


More formally: Definitions

• Definition: A stream is a sequence whose codomain is a nonempty set.

• Definition: A structure is a tuple (G, L, F) where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function F : (V ∪ E) → L.

See http://www.mathsisfun.com/sets/domain-range-codomain.html for a nice description of domain, range, codomain if you need it.


Structure illustration

[Figure: a "Collection" node with three directed edges, each labeled "includes", pointing to "Images", "Audio files", and "Books".]

A very simple structure. How might it be enhanced? How would an index be included? What substructures might be added?

What are the G, L, F, V, E parts of this example?
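One possible reading of the illustration in terms of the definition can be sketched in Python. The choice to label each vertex with its own name is an assumption made for this sketch; the slide only shows the "includes" edge label.

```python
# Vertices and directed edges of the example graph G = (V, E).
V = {"Collection", "Images", "Audio files", "Books"}
E = {
    ("Collection", "Images"),
    ("Collection", "Audio files"),
    ("Collection", "Books"),
}
G = (V, E)

# L is the set of label values. Assumption: vertices are labeled with
# their own names, and every edge carries the label "includes".
L = V | {"includes"}


def F(item):
    """Labeling function F : (V ∪ E) → L."""
    if item in E:
        return "includes"
    if item in V:
        return item
    raise KeyError(item)


structure = (G, L, F)

# Every vertex and edge maps to some label in L.
labels = {F(x) for x in V | E}
```

Enhancing the structure (adding an index, or substructures under "Books") would mean adding vertices and edges to V and E and, if needed, new label values to L.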


Definitions, cont’d

• Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space.
  – A vector space is a representation for the set of elements in a collection. The vector representing each element is a set of characteristics held by that element; it both connects that element to others that are similar and distinguishes it from those that are different.
  – We will do an exercise to illustrate.


Vector space illustration

• Consider a car. What are the characteristics that you associate with a car?
  – If you want to compare one car to another, what characteristics would you choose?
  – If you wanted to distinguish a car from another type of vehicle, what characteristics would you need?
    • Distinguish from a snowmobile
    • Distinguish from a truck

• Make a vector of those characteristics.
• Then, fill in the vector for several specific cars.
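As a rough sketch of the exercise, the vectors can be written down and compared in Python. The feature names and all the numbers below are made up for illustration, and cosine similarity is just one common way to compare vectors; the slides do not prescribe a measure.

```python
import math

# Hypothetical characteristics chosen for this sketch.
FEATURES = ["wheels", "seats", "cargo_capacity_m3", "runs_on_snow"]

# Hypothetical values, one vector per vehicle, in FEATURES order.
vehicles = {
    "compact car": [4, 5, 0.4, 0],
    "sedan": [4, 5, 0.5, 0],
    "pickup truck": [4, 3, 1.5, 0],
    "snowmobile": [0, 2, 0.05, 1],
}


def cosine(a, b):
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


car_vs_car = cosine(vehicles["compact car"], vehicles["sedan"])
car_vs_snowmobile = cosine(vehicles["compact car"], vehicles["snowmobile"])
```

With these made-up values the two cars score much closer to each other than the car and the snowmobile do, which is the point of choosing characteristics that both connect similar items and separate different ones.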


Definitions - 3

• Definition: A scenario is a sequence of related transition events (e1, e2, …, en) on state set S such that ek = (sk, sk+1) for 1 <= k <= n.
  – More easily visualized, a scenario is a path in a directed graph, G = (S, ∑e), where vertices correspond to states in the state set S and directed edges are equivalent to events in a set of events, ∑e, and correspond to transitions between states.
  – Scenarios must be implemented to make a working system.
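The path view of a scenario can be sketched in a few lines of Python. The states and events here are hypothetical, invented for a "search, then download" service; only the rule that each event must start where the previous one ended comes from the definition.

```python
# Hypothetical events, each mapped to its (from_state, to_state) transition.
TRANSITIONS = {
    "submit_query": ("idle", "results_shown"),
    "select_item": ("results_shown", "item_shown"),
    "download": ("item_shown", "idle"),
}


def is_scenario(events):
    """True if the events form a path: each one starts in the state
    where the previous one ended."""
    steps = [TRANSITIONS[e] for e in events]
    return all(prev[1] == nxt[0] for prev, nxt in zip(steps, steps[1:]))


valid = is_scenario(["submit_query", "select_item", "download"])
broken = is_scenario(["submit_query", "download"])
```

Here `valid` is a path through the state graph, while `broken` skips the state in which an item is shown, so it is not a scenario under the definition.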


Definitions - 4

• Definition: A society is a tuple (C, R) where
  – C = {c1, c2, …, cn} is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data);
  – R = {r1, r2, …, rm} is a set of relationships, each relationship being a tuple rj = (ej, ij), where ej is a Cartesian product ck1 × ck2 × … × cknj, 1 <= k1 < k2 < … < knj <= n, which specifies the communities involved in the relationship, and ij is an activity.
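A small Python sketch may make the tuple (C, R) more concrete. The community names, their members, and the activity "searches" are all hypothetical; the dictionary of named communities is a convenience of this sketch, not part of the definition.

```python
from itertools import product

# Hypothetical communities: each is a set of individuals of one type.
C = {
    "patrons": {"alice", "bob"},
    "librarians": {"carol"},
    "search_services": {"catalog_search"},
}


def relationship(community_names, activity):
    """Build r_j = (e_j, i_j): e_j is the Cartesian product of the
    named communities, i_j is the activity that relates them."""
    e = set(product(*(sorted(C[name]) for name in community_names)))
    return (e, activity)


r1 = relationship(["patrons", "search_services"], "searches")
R = [r1]
society = (C, R)
```

In this sketch `r1` pairs every patron with the search service, which mirrors the reminder that societies include software agents and processes as well as human users.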


Projects in our DL laboratory

• Mendel 289 is the center of activity for projects related to digital libraries and similar projects.

• Summary of the projects under way, which may present opportunities for class projects or for independent study

• NSDL, CITIDEL, CSTA, Ensemble, Distributed Expertise, Computing Ontology, Interdisciplinary Computing and its relationship to the libraries ….


Our systems

• Now available
  – Fedora Linux machines, remotely accessible (use the gateway)
  – Bare machines with just the basic system
  – We can install Drupal either from the Drupal site (doing things for ourselves) or from the Bitnami site (builds the stack for us)
    • I just heard that Drupal may already be installed. Feel free to uninstall and reinstall if you wish.

• If you have a computer of your own and want to use it,
  – Fine, but you must be able to demonstrate it to the class at the end of the semester. I will need to be able to see what you are doing from time to time during the semester.
  – That means you need a static IP address.


The Digital Library Content

• Essential elements for a digital library
  – Users
  – Content
  – Services


Content - requirements

• Obtain
• Store
  – Organize
  – Describe
• Find
• Deliver


Describing the content

• How to describe content
  – Metadata
    • Machine readable description of anything

• What description
  – Machine readable requires standard descriptive elements
    • Dublin Core (http://dublincore.org/)
      – International standard
      – “a standard for cross-domain information resource description.”
      – 15 descriptive elements
    • Other metadata schemes
      – IEEE-LOM


Metadata

• What does metadata look like?
• Metadata is data about data
  – Information about a resource, encoded in the resource or associated with the resource.

• The language of metadata: XML
  – eXtensible Markup Language


XML

• XML is a markup language
• XML describes features
• There is no standard XML
• Use XML to create a resource type
• Separately develop software to interact with the data described by the XML codes.

Source: tutorial at w3school.com


XML rules

• Easy rules, but very strict
• First line is the version and character set used:
  – <?xml version="1.0" encoding="ISO-8859-1"?>
• The rest is user defined tags
• Every tag has an opening and a closing
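How strict the rules are can be seen by handing a parser one document that closes every tag and one that does not. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Every opening tag has a matching closing tag.
GOOD = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<recipe><recipe-title>Meringue cookies</recipe-title></recipe>"
)

# The recipe-title element is never closed.
BAD = b"<recipe><recipe-title>Meringue cookies</recipe>"


def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


good_ok = is_well_formed(GOOD)
bad_ok = is_well_formed(BAD)
```

Unlike a web browser reading sloppy HTML, the XML parser refuses the second document outright rather than guessing what was meant.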


Element naming

• XML elements must follow these naming rules:
  – Names can contain letters, numbers, and other characters
  – Names must not start with a number or punctuation character
  – Names must not start with the letters xml (or XML or Xml ..)
  – Names cannot contain spaces
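The naming rules on this slide can be approximated with a short check in Python. This is a simplified sketch that only handles ASCII names and a few allowed punctuation characters; the full rules in the XML specification are broader (for example, they allow many non-ASCII letters). The function name is_valid_name is made up for this example.

```python
import re

# Simplified pattern: starts with a letter or underscore, then letters,
# digits, underscores, periods, or hyphens. No spaces anywhere.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name):
    """Approximate check of the slide's naming rules (ASCII only)."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, XML, Xml, ... are not allowed
    return _NAME.fullmatch(name) is not None


checks = {
    "recipe-title": is_valid_name("recipe-title"),
    "2cups": is_valid_name("2cups"),
    "xmlData": is_valid_name("xmlData"),
    "ingredient name": is_valid_name("ingredient name"),
}
```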


Elements and attributes

• Use elements to describe data
• Use attributes to present information that is not part of the data
  – For example, the file type or some other information that would be useful in processing the data, but is not part of the data.
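The split between elements and attributes can be illustrated with Python's standard xml.etree.ElementTree module. The element names follow the recipe example used in these slides; the "scale" attribute and its value "us-customary" are hypothetical.

```python
import xml.etree.ElementTree as ET

ingredient = ET.Element("ingredient")

# The amount itself is data, so it goes in element text.
# The measurement system helps software process the amount but is not
# part of the data, so it goes in an attribute (hypothetical "scale").
amount = ET.SubElement(ingredient, "ingredient-amount", {"scale": "us-customary"})
amount.text = "1 cup"

name = ET.SubElement(ingredient, "ingredient-name")
name.text = "sugar"

xml_text = ET.tostring(ingredient, encoding="unicode")
```

A program reading this entry would take "1 cup" and "sugar" from the element text, and use the attribute only to decide how to interpret the amount.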


Repeating elements

• Naming an element means it appears exactly once.
• Name+ means it appears one or more times.
• Name* means it appears 0 or more times.
• Name? means it appears 0 or one time.
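The four cases can be summarized as minimum and maximum counts. The following Python sketch is only an illustration of the notation; the table name OCCURRENCE and the function count_allowed are made up here and are not part of any XML library.

```python
# Occurrence indicator -> (minimum, maximum); None means "no upper limit".
OCCURRENCE = {
    "": (1, 1),     # Name   : exactly once
    "+": (1, None),  # Name+  : one or more times
    "*": (0, None),  # Name*  : zero or more times
    "?": (0, 1),     # Name?  : zero or one time
}


def count_allowed(indicator, count):
    """True if an element appearing `count` times satisfies the indicator."""
    low, high = OCCURRENCE[indicator]
    return count >= low and (high is None or count <= high)


zero_with_star = count_allowed("*", 0)
zero_with_plus = count_allowed("+", 0)
two_with_optional = count_allowed("?", 2)
```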


Parts of an XML document

• Elements
  – The components of an XML document
  – Some contain other parts, some are empty
    • Ex in HTML: “br” or “table”; in XML: “ingredient”

• Attributes
  – Information about elements, not data
    • Ex in HTML: “src=”; in XML: “scale=”

• Entities
  – Special characters or strings with pre-assigned meaning
    • Ex in HTML: &nbsp; for non-breaking space

• PCDATA
  – Parsed character data: text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.

• CDATA
  – Character data: text that will not be parsed and interpreted. It will be displayed exactly as provided.

The HTML examples are familiar; the XML examples are made up – dependent on the specific XML scheme used
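The difference between parsed text and a CDATA section can be seen with Python's standard xml.etree.ElementTree parser. The sample document below is made up for this sketch: it contains the predefined entity &amp;amp; in parsed text and a CDATA section holding something that looks like a tag.

```python
import xml.etree.ElementTree as ET

# Parsed text with an entity, followed by a CDATA section.
DOC = b"<directions>Beat &amp; fold <![CDATA[<b>gently</b>]]></directions>"

root = ET.fromstring(DOC)

# The entity is expanded to "&"; the CDATA content is kept as literal
# text, so "<b>" does not become a child element.
text = root.text
child_count = len(root)
```

After parsing, `text` holds "Beat & fold <b>gently</b>" and the element has no children, since the parser did not interpret the markup inside the CDATA section.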


Using XML - an example

Define the fields of a recipe collection:

<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount> </ingredient-amount>
      <ingredient-name> </ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions></directions>
</recipe>

ISO 8859 is a character set.

See http://www.bbsinc.com/iso8859.html


Processing the XML data

• How do we know what to do with the information in an XML file?
  – Document Type Definition (DTD)
    • Put in the same file as the data -- immediate reference
    • Put a reference to an external description
    • Provides the definition of the legitimate content for each element


Document Type Definition

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (recipe-title, ingredient-list, directions)>
  <!ELEMENT recipe-title (#PCDATA)>
  <!ELEMENT ingredient-list (ingredient*)>
  <!ELEMENT ingredient (ingredient-amount, ingredient-name)>
  <!ELEMENT ingredient-amount (#PCDATA)>
  <!ELEMENT ingredient-name (#PCDATA)>
  <!ELEMENT directions (#PCDATA)>
]>

The * on ingredient means: repeat 0 or more times.


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <recipe-title> Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3 </ingredient-amount>
      <ingredient-name> egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount> 1 cup</ingredient-amount>
      <ingredient-name> sugar</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 teaspoon </ingredient-amount>
      <ingredient-name> vanilla</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>2 cups </ingredient-amount>
      <ingredient-name>mini chocolate chips </ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.
  </directions>
</recipe>

Not the way that I want to see a recipe in a magazine!

What could we do with a large collection of such entries?

How would we get the information entered into a collection?

External reference to DTD
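One answer to what could be done with a large collection of such entries is to let software read the fields. The sketch below uses Python's standard xml.etree.ElementTree module on a shortened copy of the meringue recipe; the function names ingredients and recipes_using are made up for illustration, and the parser used here does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe entry, without the DOCTYPE line.
RECIPE = b"""<recipe>
  <recipe-title> Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3 </ingredient-amount>
      <ingredient-name> egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount> 1 cup</ingredient-amount>
      <ingredient-name> sugar</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff.</directions>
</recipe>"""


def ingredients(recipe_xml):
    """Return (amount, name) pairs, with surrounding whitespace removed."""
    root = ET.fromstring(recipe_xml)
    return [
        (
            item.findtext("ingredient-amount", "").strip(),
            item.findtext("ingredient-name", "").strip(),
        )
        for item in root.iter("ingredient")
    ]


def recipes_using(collection, ingredient_name):
    """Titles of all recipes in the collection that list the ingredient."""
    titles = []
    for recipe_xml in collection:
        if any(name == ingredient_name for _, name in ingredients(recipe_xml)):
            title = ET.fromstring(recipe_xml).findtext("recipe-title", "")
            titles.append(title.strip())
    return titles


pairs = ingredients(RECIPE)
with_sugar = recipes_using([RECIPE], "sugar")
```

The same fields could feed a shopping list, a search by ingredient, or a nicely formatted page, which is why the raw XML does not need to look like a magazine recipe.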


XML exercise

• Design an XML schema for an application of your choice. Keep it simple.

• Examples -- address book, TV program listing, DVD collection, …


Another example

• A paper with content encoded with XML: http://tecfaseed.unige.ch/staf18/modules/ePBL/uploads/proj3/paper81.xml

• First few lines:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="ePBLpaper11.css" type="text/css"?>
<?xml-stylesheet href="ePBLpaper11.xsl" type="text/xsl"?>
<!DOCTYPE paper SYSTEM "ePBLpaper11.dtd">
<paper id="proj3">
  <info>
    <title>Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context</title>
    <authors>
      <author>
        <firstname>Rodolophe</firstname>
        <familyname>Borer</familyname>
        <homepageurl>http://tecfa.unige.ch/perso/staf/borer/</homepageurl>
        <email/>
      </author>
    </authors>

"ePBLpaper11.dtd" shown on next slide


<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- _________ _____________________ -->
<!-- ePBL-project DTD for student project management & specification -->
<!-- Copyright: (2004) [email protected] -->
<!-- http://tecfa.unige.ch/~paraskev/ -->
<!-- Daniel K. Schneider -->
<!-- http://tecfa.unige.ch/tecfa-people/schneider.html-->
<!-- Created: 13/11/2002 (based on EVA_pm grammar) -->
<!-- Updated: 07/05/2004 -->
<!-- VERSIONS -->
<!-- v1.1 Adaptations to use with Morphon xml editor and addition of IDs-->
<!-- ____________________ -->
<!-- _ ENTITY DECLARATIONS ______ -->
<!ENTITY % foreign-dtd SYSTEM "ibtwsh6_ePBL.dtd">
%foreign-dtd;
<!ENTITY % id "id ID #IMPLIED">
<!-- ______ MAIN ELEMENT _________ -->
<!ELEMENT project (name, authors, date, updated, goal, state-of-the-art, research-development-questions, methodology, workpackages )>
<!ELEMENT name (#PCDATA )>
<!ELEMENT date (#PCDATA )>
<!ELEMENT authors (#PCDATA )>
<!ELEMENT updated (#PCDATA )>
<!ELEMENT goal (title, description )>
<!ELEMENT state-of-the-art %vert.model;>
<!ATTLIST state-of-the-art %id;>
<!ELEMENT research-development-questions (question )+>
<!ELEMENT question (title, description )>
<!ELEMENT methodology %vert.model;>
<!ATTLIST methodology %id;>
<!ELEMENT workpackages (workpackage )+>
<!ELEMENT workpackage (planning, objectives, deliverables )>
<!ATTLIST workpackage %id;>
<!ELEMENT objectives (objective )+>
<!ELEMENT objective (title, description )>
<!ELEMENT deliverables (deliverable )+>
<!ELEMENT deliverable (url, title, description )>
<!ELEMENT url (#PCDATA )>
<!ELEMENT planning (from, to, progress )>
<!ELEMENT from (#PCDATA )>
<!ELEMENT to (#PCDATA )>
<!ELEMENT progress (#PCDATA )>
<!-- ________________________ -->
<!ELEMENT title (#PCDATA )>
<!ATTLIST title %id;>
<!ELEMENT description %vert.model;>
<!-- _______________________ -->

Source: http://tecfa.unige.ch/staf/staf-j/vuilleum/staf18/p6/


Vocabulary

• Given the need for processing, do you want free text or restricted entries?

• Free text gives more flexibility for the person making the entry

• Controlled vocabulary helps with
  – Consistent processing
  – Comparison between entries

• Controlled vocabulary limits
  – Options for what is said


Vocabulary example

• Recipe example
  – What text should be controlled?
  – What should be free text?

• Ingredients
  – Ingredient-amount
  – Ingredient-name
  – Should we revise how we coded ingredient amount?

• Directions
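One possible revision of the ingredient amount is to split it into a free numeric quantity and a unit drawn from a controlled list. The Python sketch below shows that idea under stated assumptions: the list of units is hypothetical, a missing unit is treated as "each", and plural forms are reduced by dropping a trailing "s".

```python
# Hypothetical controlled vocabulary of units for this sketch.
UNITS = {"cup", "teaspoon", "tablespoon", "each"}


def parse_amount(text):
    """Split a free-text amount such as "2 cups" into (quantity, unit).

    Raises ValueError if the unit is not in the controlled vocabulary
    or the quantity is not a plain number.
    """
    parts = text.split()
    quantity = float(parts[0])
    # Assumption: no unit means a count of items; plurals lose their "s".
    unit = parts[1].rstrip("s").lower() if len(parts) > 1 else "each"
    if unit not in UNITS:
        raise ValueError(f"unit not in controlled vocabulary: {unit}")
    return quantity, unit


egg_whites = parse_amount("3 ")
chips = parse_amount("2 cups ")
```

With the unit controlled, amounts from different recipes can be compared or scaled consistently, at the cost of rejecting entries such as "1 pinch" until the vocabulary is extended.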


Dublin Core

• Standard set of metadata fields for entries in digital libraries:
  – Title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights


Dublin Core elements
see: http://dublincore.org/documents/dces/

• Title
• Creator
  – Entity primarily responsible for making content of the resource
• Subject - C
• Description
• Publisher
  – Entity making the resource available
• Contributor
  – Contributor to content of the resource
• Date
  – e.g. YYYY-MM-DD
• Type - C
  – Ex: collection, dataset, event, image
• Format - C
  – What is needed to display or operate the resource.
• Identifier
  – Unambiguous ID
• Source
• Language
  – Standards RFC 3066, ISO639
• Relation
  – Ref. to related resource
• Coverage - C
  – Space, time, jurisdiction.
• Rights
  – Rights Management information

C = controlled vocabulary recommended.
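A Dublin Core description of one item can be written as XML with Python's standard xml.etree.ElementTree module. The namespace URI below is the published one for the 15 Dublin Core elements; the wrapper element "record", the helper dc_record, and all field values are hypothetical choices for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a <record> element with one dc:* child per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Hypothetical values describing the recipe entry from earlier slides.
record = dc_record(
    {
        "title": "Meringue cookies",
        "type": "Text",
        "format": "text/xml",
        "language": "en",
        "date": "2010-01-31",
    }
)
xml_text = ET.tostring(record, encoding="unicode")
```

The fields marked C on the slide (here type and format) are the ones where the values would come from a controlled vocabulary rather than free text.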


Dublin Core Terms

• An update to the original DC elements
  – Adds the concept of range and domain

Each term has this minimal set of attributes:
• Name: A token appended to the URI of a DCMI namespace to create the URI of the term.
• Label: The human-readable label assigned to the term.
• URI: The Uniform Resource Identifier used to uniquely identify a term.
• Definition: A statement that represents the concept and essential nature of the term.
• Type of Term: The type of term as described in the DCMI Abstract Model [DCAM].
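The minimal set of attributes can be pictured as a small record type. This Python sketch uses the "title" term from the DCMI terms namespace as its example; the class name Term and its field names are made up here, and the URI is derived by appending the name to the namespace, as the Name attribute describes.

```python
from dataclasses import dataclass

# Namespace of the DCMI Metadata Terms.
DCTERMS_NS = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class Term:
    """Hypothetical container for the minimal attributes of a DC term."""

    name: str
    label: str
    definition: str
    type_of_term: str

    @property
    def uri(self):
        # The name is a token appended to the namespace URI.
        return DCTERMS_NS + self.name


title = Term(
    name="title",
    label="Title",
    definition="A name given to the resource.",
    type_of_term="Property",
)
title_uri = title.uri
```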


DC Terms

Additional Attributes possible:

• Comment: Additional information about the term or its application.
• See: Authoritative documentation related to the term.
• References: A resource referenced in the Definition or Comment.
• Refines: A Property of which the described term is a Sub-Property.
• Broader Than: A Class of which the described term is a Super-Class.
• Narrower Than: A Class of which the described term is a Sub-Class.
• Has Domain: A Class of which a resource described by the term is an Instance.
• Has Range: A Class of which a value described by the term is an Instance.
• Member Of: An enumerated set of resources (Vocabulary Encoding Scheme) of which the term is a Member.
• Instance Of: A Class of which the described term is an instance.
• Version: A specific historical description of a term.
• Equivalent Property: A Property to which the described term is equivalent.


The DC Terms – from 15 to …

abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid


DC terms

• See http://dublincore.org/documents/dcmi-terms/

• Review the list and see what has been added


A Drupal example

• Ensemble: www.computingportal.org


IEEE - LOM

• Example of a specialized metadata scheme
  – Learning Object Metadata
    • Specifically for collections of educational materials
    • Includes all of Dublin Core
    • See http://projects.ischool.washington.edu/sasutton/IEEE1484.html


Computing systems

• Linux machines
• Introduction to Unix: http://www.csc.villanova.edu/~lab/unix/
• DSpace: http://www.dspace.org/
  – Documentation, including installation - http://www.dspace.org/index.php?option=com_content&task=view&id=151&Itemid=116

• Najib Nadi, our system administrator, is setting up the machines. He will send a message to the class by the middle of the week with details of machine location and login.

Remember - you have the option to use your own machine, but must meet the criteria described last week.


This session

• Defined metadata and its role in digital libraries.

• Introduced XML as a language for describing a collection of content.

• Described the computing resources and how to get ready for the first DL setup.