Spatiotemporal Infrastructure for Semantic Network in Digital Archives



Page 1: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

2002 APEC Workshop on e-Learning and Digital Libraries, Academia Sinica, Taipei, Taiwan, Dec. 16-20

Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Eric Yen
Computing Centre, Academia Sinica

December 2002

Page 2: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Outline
Introduction
NDAP Approaches: Space-Time-Language Coordinates
Archiving and processing of millions of geospatial materials in AS
  Characteristics
  How to delve into the knowledge level
  Experiences and lessons we learned
  Extend to a more general solution
Geolibrary
The Trends
Conclusions

Page 3: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Introduction to Digital Archive
A digital archive is a collection of digital objects.
A digital object is something (e.g., an image, an audio recording, a text document, a movie, a map) that has been digitally encoded and integrated with metadata to support its discovery, use, and storage.

Goals for a Digital Archive (functional point of view)
  Protection of the original
  Duplication for safety
  Search and retrieval
  Easy access
  Resource sharing
  Lower cost of maintenance and dissemination
  Maximum flexibility for integration of heterogeneous/homogeneous information resources
  Providing abundant resources for knowledge discovery and knowledge construction

Page 4: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Knowledge Discovery and Construction
Knowledge construction means the active process of manipulating data to arrive at abstract models of relationships among phenomena in the world that facilitate our understanding of those phenomena and, ultimately, of the world. [1]
Knowledge discovery is a nontrivial process of identifying valid, novel, useful, and understandable patterns in data. [2]
Persistent cataloging, classification, and segmentation of digital objects are the groundwork for finding patterns, models, and trends in large volumes of data.

References
1. MacEachren, A. et al., Constructing knowledge from multivariate spatiotemporal data: integrating geographic visualization with knowledge discovery in database methods.
2. Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., 1996, From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, pp. 1-34.

Page 5: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Types of Elementary Knowledge Organization Systems
  Classification Systems
  Ontologies
  Taxonomies
  Index Languages
  Thesauri and other controlled lists of keywords
  Glossary
  Dictionaries
  Clustering Approaches
  Lexical Databases
  Concept Maps/Spaces
  Semantic Road Maps
  …

Page 6: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Why a Knowledge-based Approach for Digital Library? (1)
Providing “Conceptual Infrastructure”
  Mapping out the conceptual structure and providing a common language for a field
  Providing classification/typology and concept definitions; clarifying concepts by putting them into context, thus providing orientation and serving as a reference tool for individual researchers and practitioners, and thereby
    Assisting with the exploration of the conceptual context of a research problem and in structuring the problem, thereby providing the conceptual basis for the design of good research, for the consistent definition of variables, and thus the cumulation of research results
    Providing the conceptual basis for the exploration of the various aspects of a program in program planning, in the identification of approaches and strategies, and in the development of evaluation criteria
Assisting users in understanding context
Assisting information providers with conceptualizing a topic and with finding the proper term
Discovery of high-quality resources
Providing frameworks for information exchange and resource interoperability

Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)

Page 7: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Why a Knowledge-based Approach for Digital Library? (2)
Information Storage & Retrieval
  Information system(s) in which the vocabulary is to be used
  Use of the vocabulary
    Vocabulary control in indexing and searching (controlled vocabulary)
    Vocabulary control only for searching: assist with clarifying a search topic and assembling all applicable concepts and terms, whether searching with a controlled vocabulary or free text
  ISAR technique(s) (such as printed index, computer search system)
    Support of inclusive (hierarchically expanded) searching
    Automated vs. manual indexing or query formulation
    Approach to indexing to be supported: request-oriented vs. entity-oriented
    Techniques for eliciting user needs (e.g., menu based on search tree; questions based on facet structure)
  Summary evaluation of the vocabulary's adequacy for the stated purpose, based on the more detailed analysis as outlined below
Translation
Language learning

Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)

Page 8: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Digital library requirements for knowledge organization schemas

The need for knowledge organization in subject gateways and discovery services, issues of application and use

Web-based directory structures as knowledge organization systems

Knowledge organization as support for web-based information retrieval, query expansion, cross-language searching

Semantic portals

ECDL2000, Special Workshop on Networked Knowledge Organization Systems, http://nkos.slis.kent.edu/ECDL-NKOS-final.htm

Page 9: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Digital library requirements for knowledge-based data processing

Knowledge organization for filtering, information extraction, summary

Knowledge organization support for multilingual systems, natural language processing or machine translation

Structured result display, clustering

End-user interactions with knowledge organization systems, evaluation and studies of use, knowledge bases for supportive user interfaces, visualization

ECDL2000, Special Workshop on Networked Knowledge Organization Systems, http://nkos.slis.kent.edu/ECDL-NKOS-final.htm

Page 10: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Digital library requirements for knowledge structuring and management

Suitable vocabulary structures, conceptual relationships

Comparison between established library classification systems and home-grown browsing structures

Methodologies, tools and formats for the construction and maintenance of vocabularies and for mapping between terms, classes and systems

Frameworks for the analysis of assumptions and viewpoints underlying the construction and application of terminology systems

Methods for the combination and adaptation of different vocabularies

ECDL2000, Special Workshop on Networked Knowledge Organization Systems, http://nkos.slis.kent.edu/ECDL-NKOS-final.htm

Page 11: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Digital library requirements for access to knowledge structures

Data exchange and description formats for knowledge organization systems, the potential and limitations of XML and RDF schemas

Handling of subject information in metadata formats

Standards and repositories for machine-readable description of networked knowledge organization schemas (as collections/systems)

Interoperability, cross-browsing and cross-searching between distributed services based on knowledge organization systems

Distributed access to knowledge organization systems: standard solutions and protocols for query and response, taxonomy servers

ECDL2000, Special Workshop on Networked Knowledge Organization Systems, http://nkos.slis.kent.edu/ECDL-NKOS-final.htm

Page 12: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Discover Knowledge from Digital Archive

Geospatial information means geo-materials that are georeferenced and have well-documented metadata

Ref. Components of a digital object in digital archive
Geospatial content based
Extracting knowledge by space-time-language

Page 13: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Knowledge about Space
Temporal characteristics are embedded and cannot be neglected
Acquisition
  Direct experience
    Locomotion through the environment (crawling, walking, running, bicycling, driving, flying, etc.)
    Stationary viewing
  Secondary environmental experience
    Static medium: maps, diagrams, paintings, photos, etc.
    Dynamic medium: animated static visual figures that show changes over time
  Other ways to conceive of things that cannot be viewed
Characteristics
  Multimodal: proprioceptive, kinesthetic, auditory, visual, etc.
  Language is often used to convey spatial information
  Multiple perspectives and scales

A thorough understanding of how humans acquire, integrate, and use spatial information will promote more effective use of such information and help build application mechanisms that better meet actual needs (e.g., aid for decision making)

Page 14: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Spatial Representation in GIS
Data model
  Vector: explicit
    Basic elements: point, line, and polygon
  Raster: implicit
Geographic space is organized into partitions (layers)
  Space-dominant representations focus on the spatial arrangement of entities based on the geometric and thematic properties of these entities
  Space is a neutral container
  Entities only exist when associated with a layer or theme
  Applied primarily in traditional mapping
Layer-based raster and vector models
  Each layer is associated with a period or point in time
  Change- or update-based scenario
  Analysis based on similarity or dissimilarity between aggregations (layers) at different points in time
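
The two data models and the layer-per-time idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the grid values, and the years 1900 and 1950 are made up for the example and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Tuple


# Vector model: geometry is stored explicitly as coordinates.
@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Polygon:
    vertices: Tuple[Point, ...]  # boundary points, in order


# Raster model: geometry is implicit in the cell position within a grid.
# Each layer is tied to one point in time (hypothetical years and values).
layers = {
    1900: [[0, 0, 1],
           [0, 1, 1],
           [0, 0, 0]],
    1950: [[0, 1, 1],
           [0, 1, 1],
           [0, 1, 0]],
}


def changed_cells(before, after):
    """Return (row, col) of every cell whose value differs between two layers."""
    return [(r, c)
            for r, row in enumerate(before)
            for c, value in enumerate(row)
            if value != after[r][c]]


parcel = Polygon((Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0)))
changes = changed_cells(layers[1900], layers[1950])
```

Comparing two layers cell by cell is the simplest form of the change-based analysis named above; a real GIS would also have to align extents and resolutions first.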

Page 15: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Why Think in Spatio-Temporal Ways?

Because the earth is always in motion: it is incomplete to describe an event or object in the spatial domain only.

Learn from the past, and plan for (predict) the future.

Characteristics of space and time
Importance
To organize space over time

Page 16: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Discover Knowledge from Geospatial Information

Geospatial information means geo-materials that are georeferenced and have well-documented metadata

Ref. Components of a digital object in digital archive
Geospatial content based
  Feature identification
  Feature comparison: enhance the likelihood of finding relationships among features
  Feature interpretation: relate the identified features and their relationships to real-world entities, using domain knowledge
  Linking to other resources that are related to this feature, this place, and the time, by parsing the collected information from metadata or lexical analysis

Demands
  Link spatiotemporal data analysis techniques to GIS
  Feature interpretation tools must provide connections between abstract representations of data, metadata that describe those data, an analyst’s knowledge, and knowledge sources external to the data set being explored (e.g., through a digital library)

Page 17: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Discover Knowledge from Geospatial Information

Feature Identification
Def: Finding instances of identifiable features in spatiotemporal data
Emphasis is on examining the distribution of data in all of its dimensions in an effort to notice any distinct object, regularity, anomaly, hot spot, etc.

Example: Distribution of tombs in the Han Dynasty
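
One simple way to notice hot spots is to count records per grid cell and time period and keep the dense combinations. The sketch below assumes that approach; the function name, the cell size, the threshold, and the sample records are all hypothetical and are not the tomb data shown in the talk.

```python
from collections import Counter


def hot_spots(records, cell_size, min_count):
    """Count records per (grid cell, period) and keep the dense combinations.

    records: iterable of (x, y, period) tuples.
    """
    counts = Counter()
    for x, y, period in records:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[(cell, period)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up sample records, for illustration only.
sample = [
    (1.2, 3.4, "early"), (1.8, 3.9, "early"), (1.5, 3.1, "early"),
    (7.0, 2.0, "early"), (1.1, 3.3, "late"),
]
dense = hot_spots(sample, cell_size=1.0, min_count=3)
```

Binning over space and time together keeps the temporal dimension in view, so the same place can be a hot spot in one period and not in another.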

Page 18: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Integrated Support for Research

Page 19: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

WebGIS-based System Architecture

Page 20: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Challenges of Geospatial Information Processing

High threshold for general users
Hard to find required geospatial content/services
New retrieval technology for geospatial information
Persistent metadata and archive
Mechanism for effective management of huge volumes of data
Efficient ways for digitization/vectorization of geospatial materials
Integration with other information resources

Page 21: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Discover Knowledge by Space-Time-Language Coordinates

Constructing the linkage among diversified archives through language (vocabulary)
Lingual coordinate has both spatial and temporal extents
  Lingual-Temporal Plane: evolution of language through time
  Lingual-Spatial Plane: spatial distribution of dialects
Multilingual support for digital archives
Establishment of domain-specific controlled vocabulary sets, serving as the basis of an ontology

Page 22: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Discover Knowledge by Space-Time-Language Coordinates

[Figure: diagram with the labels Language, Time, Space]

Page 23: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Space, Time and Language Coordinates for Digital Archives

[Figure: diagram with the labels Language, Time, Space, Language in Time, Language in Space, Language in Text, in Speech..., Historical GIS, Language Changes, Language Variations, Digital Archives]

Page 24: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Lingual Coordinate in NDAP
A lexis (vocabulary item) in context is analogous to the basic unit of a concept in knowledge
  Lexis is the basic unit for any kind of language process, such as recognition, parsing, word formation, semantics, conversation, and analysis
  Through lexical analysis, the collection of all lexical types (word classes), lexical patterns (grammar), and instances can lay the foundation for the lingual coordinate
  By collecting enough description (context, including metadata) for a specific domain (which could be a set of digital objects), an ontology (the collection of concepts for the domain) of that field is constructed
    How do we know if that is enough? The mechanism needs a self-learning capability
Atomic attributes of a place name
  Name
    Glyph and stroke: original writing, all the historical and contemporary writings, and romanization (pinyin)
    Pronunciation: indigenous, and evolutions afterward
    Meaning (if we can restore the original fonts and sound)
  Footprint
    Could be ambiguous: M N
  Time: (start, end), could be vague for historical names
  Type: geographic type; the administrative level can also be known if the name represents an administrative area
Atomic attributes of a datum
  People, event, time, place, object
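
The atomic attributes of a place name listed on this slide map naturally onto a small record type. The following is a minimal sketch, not the NDAP schema: every field name, the bounding-box footprint format, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed footprint format: (min_lon, min_lat, max_lon, max_lat).
BoundingBox = Tuple[float, float, float, float]


@dataclass
class PlaceName:
    writings: List[str]                  # original, historical, contemporary forms
    romanization: str                    # e.g. pinyin
    pronunciation: Optional[str] = None
    meaning: Optional[str] = None
    # One name may correspond to several footprints, hence a list.
    footprints: List[BoundingBox] = field(default_factory=list)
    start_year: Optional[int] = None     # None when the bound is unknown or vague
    end_year: Optional[int] = None
    feature_type: Optional[str] = None   # geographic type
    admin_level: Optional[int] = None    # only for administrative areas

    def in_use(self, year: int) -> bool:
        """True if the name was valid in the given year; open bounds always match."""
        after_start = self.start_year is None or self.start_year <= year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


# Hypothetical entry; the values are placeholders, not gazetteer data.
example = PlaceName(
    writings=["EXAMPLE"],
    romanization="example",
    footprints=[(120.0, 23.0, 121.0, 24.0)],
    start_year=1700,
    feature_type="town",
)
```

Leaving the time bounds optional is one way to carry the vagueness of historical names without inventing dates.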

Page 25: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Constructing Space-Time-Language Coordinates for NDAP
Geographic searching is a powerful and important tool
  More than 80% of information resources pertain to specific geographic areas and are either explicitly or implicitly georeferenced.
  To benefit from geographic search, we have to georeference information contents first.
  The cost of creating geographic footprints for each record is very high (the Alexandria Digital Library Project spent $4m over four years).
  Automatic extraction of georeferenced information is also possible, but it needs sophisticated tools that go further than geographic name extraction.
Moving from information management toward knowledge management (demands)
  New ways of information search and retrieval
    Traditional full-text search
    Keyword-based or query-by-example search
    Query by information content (image, audio, video, and multimedia contents)
    Incorporation of geographic and temporal search
  Versatile ways of presenting information and knowledge
    2D, 3D, or 4D
    Multimedia, virtual reality
    Map-on-demand, through a parser of geographic names from context, or directly by coordinates
  Separation of content representation and presentation
The core is metadata-based content analysis
  CA(Information Content) Metadata
  Schemes for management of contents
  Identify the best way of representing information so that it becomes a persistent archive
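
Once records carry a geographic footprint and a time span, a combined geographic and temporal search reduces to two overlap tests. The sketch below assumes bounding-box footprints and (start, end) year spans; the record layout, identifiers, and values are hypothetical.

```python
def boxes_intersect(a, b):
    """Boxes are (min_x, min_y, max_x, max_y); touching edges count as a hit."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spans_overlap(a, b):
    """Spans are (start_year, end_year), both inclusive."""
    return a[0] <= b[1] and b[0] <= a[1]


def search(records, box, span):
    """Return ids of records whose footprint and time span both match the query."""
    return [r["id"] for r in records
            if boxes_intersect(r["footprint"], box)
            and spans_overlap(r["span"], span)]


# Made-up records, for illustration only.
records = [
    {"id": "map-1", "footprint": (120.0, 22.0, 122.0, 25.0), "span": (1895, 1945)},
    {"id": "text-2", "footprint": (100.0, 30.0, 110.0, 40.0), "span": (1900, 1910)},
    {"id": "photo-3", "footprint": (121.0, 24.0, 121.5, 24.5), "span": (1960, 1970)},
]
hits = search(records, box=(121.0, 23.0, 122.0, 26.0), span=(1900, 1950))
```

A linear scan is enough to show the idea; a collection with millions of items would need a spatial index instead.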

Page 26: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Integrated Applications of the Chinese Historical and Cultural Map

[Figure: diagram with the labels below]
Full-text search of Chinese classical texts
Union catalog of books query
Qing dynasty local gazetteer search
Biographical database query

Page 27: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Page 28: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Roles of Visualization in Knowledge Discovery

Role
  Useful in finding holes or errors in data sets
  Useful for noticing abstract features and patterns
  Distill complex relations in data sets into visual form
  Facilitate access to multiple perspectives on information, through interactivity
  Facilitate decisions on appropriate model representation during the analysis stage
Functionality
  Process tracking: uncover key aspects of a process
  Parameter control to get the corresponding outcome on the fly

Page 29: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Geolibrary
Objective: lower the barriers to applying GIScience technologies
Approaches
  Collecting and providing basic georeferenced spatial data/knowledge persistently
  Building up an application environment and tools for the utilization of spatiotemporal knowledge and technologies
  Development of spatiotemporal-based technologies for multidisciplinary content integration, aggregation, and knowledge discovery in a map metaphor
Focus & Approach
  Construction of the system infrastructure for spatial and temporal information technology
  Development of core technology
  Establishment of an effective service model for research support

Page 30: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Clearinghouse
An instance of implementation of interoperability
Functionality
  Locating the required resources/services
  Maintaining a persistent catalog of resources/services for sharing
  Exchange of information content
  Format transformation

[Figure: diagram with the labels Partnerships, Metadata, GEOdata, Clearinghouse (catalog), Framework, Standards]

Page 31: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Effective Management System for Huge Volume of Data

Remote sensing data: 2 TB/day, accumulating to 5 petabytes in 2005.
According to the statistics of EU Space Center
  Raw data from satellite: 100 GB/day; 500 GB/day (after Feb. 2002)
  800 TB of data had been archived
Big challenge for IT in cataloging, searching, retrieval, management, identification, knowledge discovery, and integration
Trading off between decentralization and consolidation on cost
  Converging to multiple centers of information resources on the Internet
  Think about how to facilitate collaboration among those centers: community and virtual organization
Demands for complete architecture and services
  Data Grid

Page 32: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

What’s the Solution
Support sharing and coordinated use of diverse resources in dynamic “virtual organizations” -- Grid!
Good technical solutions for key problems, such as
  Security enhancement like authentication and authorization
  Resource discovery and monitoring
  Reliable remote service invocation
  High-performance remote data access -- Grid!
Good-quality reference implementation, multilingual support, interfaces to many systems, large user base, industrial support, etc. -- Grid!
Persistent Web Services -- Grid!

Page 33: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Measuring Success
  High degree of component autonomy
  Low cost of infrastructure
  Ease of contributing components
  Ease of using components
  Breadth of task complexity supported by the approach
  Scalability in the number of components

Page 34: Spatiotemporal Infrastructure for Semantic Network in Digital Archives

Conclusions and Future Work
Building the right infrastructure will be crucial
The intersection of spatiotemporal coordinates and the lingual coordinate constitutes a good framework for both knowledge extraction and interoperability
Consensus gathering and technology development are still the major challenges for interoperability
Open System, Open Standard, and Open Source