automatic metadata generation using associative networks

Automatic Metadata Generation Using Associative-Networks Marko A. Rodriguez CCS-3 ‘Tech Talk’ December 7, 2005 http://www.soe.ucsc.edu/~okram

Upload: marko-rodriguez

Post on 11-May-2015


DESCRIPTION

In spite of its tremendous value, metadata is generally sparse and incomplete, thereby hampering the effectiveness of digital information services. Many of the existing mechanisms for the automated creation of metadata rely primarily on content analysis which can be costly and inefficient. The automatic metadata generation system proposed in this article leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources. Because of its independence from content analysis, it can be applied to a wide variety of resource media types and is shown to be computationally inexpensive. The proposed method operates through two distinct phases. Occurrence and co-occurrence algorithms first generate an associative network of repository resources leveraging existing repository metadata. Second, using the associative network as a substrate, metadata associated with metadata-rich resources is propagated to metadata-poor resources by means of a discrete-form spreading activation algorithm. This article discusses the general framework for building associative networks, an algorithm for disseminating metadata through such networks, and the results of an experiment and validation of the proposed method using a standard bibliographic dataset.

TRANSCRIPT

Page 1: Automatic Metadata Generation using Associative Networks

Automatic Metadata Generation Using Associative-Networks

Marko A. Rodriguez
CCS-3 ‘Tech Talk’
December 7, 2005

http://www.soe.ucsc.edu/~okram

Page 2: Automatic Metadata Generation using Associative Networks

Resources and Metadata

• A resource is any digital object (e.g. manuscripts, images, video, audio).

• A resource’s metadata record is a list of attributes describing the resource.

[ EXAMPLE MANUSCRIPT METADATA ] Authors, Institutions, Keywords, Subject Categories, Citations, Year, Publishing Journal, Usage Data

Page 3: Automatic Metadata Generation using Associative Networks

Metadata Record

<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <responseDate>2005-09-07T15:25:04Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0412047" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0412047</identifier>
        <datestamp>2004-12-14</datestamp>
        <setSpec>cs</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
          <dc:title>A Social Network for Societal-Scale Decision-Making Systems</dc:title>
          <dc:creator>Rodriguez, Marko</dc:creator>
          <dc:creator>Steinbock, Daniel</dc:creator>
          <dc:subject>Computers and Society</dc:subject>
          <dc:subject>Data Structures and Algorithms</dc:subject>
          <dc:subject>Human-Computer Interaction</dc:subject>
          <dc:subject>H.4.2</dc:subject>
          <dc:subject>J.7</dc:subject>
          <dc:subject>K.4.m</dc:subject>
          <dc:description>In societal-scale decision-making systems the collective is faced ...</dc:description>
          <dc:description>Comment: Dynamically Distributed Democracy algorithm</dc:description>
          <dc:date>2004-12-10</dc:date>
          <dc:type>text</dc:type>
          <dc:identifier>http://arxiv.org/abs/cs/0412047</dc:identifier>
          <dc:identifier>North American Association for Computational Social and Organizational Science Conference Proceedings 2004</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
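As a rough illustration of how such a record becomes a list of attributes, the sketch below reads the Dublin Core fields of an abbreviated copy of the record into a dictionary. The function name `dc_fields` is hypothetical, and the `xmlns:dc` declaration is an assumption added so that the fragment parses on its own (the slide does not show it).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the record above. The xmlns:dc declaration is an
# assumption added here so the fragment parses; the slide omits it.
RECORD = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0412047</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Social Network for Societal-Scale Decision-Making Systems</dc:title>
          <dc:creator>Rodriguez, Marko</dc:creator>
          <dc:creator>Steinbock, Daniel</dc:creator>
          <dc:subject>Computers and Society</dc:subject>
          <dc:date>2004-12-10</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""

# ElementTree expands prefixed tags to "{namespace-uri}localname".
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_fields(xml_text):
    """Collect every Dublin Core element into {field name: [values]}."""
    fields = defaultdict(list)
    for elem in ET.fromstring(xml_text.strip()).iter():
        if elem.tag.startswith(DC_NS):
            name = elem.tag[len(DC_NS):]
            fields[name].append((elem.text or "").strip())
    return dict(fields)


record = dc_fields(RECORD)
print(record["creator"])
```

Repeated elements such as `dc:creator` are kept as lists, which matches the idea of a metadata record holding several values per attribute.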

Page 4: Automatic Metadata Generation using Associative Networks

Problem Statement

• Metadata is costly to generate by hand

• Metadata is hard to extract from raw resources (e.g. audio, video)

• How can we automatically generate metadata for atrophied resource records?

Page 5: Automatic Metadata Generation using Associative Networks

General System Overview

• Generate resource relations with existing metadata in the repository.
  – occurrence and/or co-occurrence networks

• Propagate metadata from metadata-rich resources to metadata-limited resources.
  – encapsulate metadata in discrete particles and disseminate them over the generated associative network

Page 6: Automatic Metadata Generation using Associative Networks

HEP-TH 2003 Semantic Network

[Figure: semantic network diagram. Node labels: A1–A4, P1–P5, O1–O2, J1–J2, K1–K2, T1–T2. Edge labels: Author of, Organization of, cites, Published journal, Published time, Has keyword.]

Page 7: Automatic Metadata Generation using Associative Networks

Transforming the Semantic Network

Convert the multi-node network into a collection of manuscripts with their associated attributes (metadata record).

– manuscript (resource), with its metadata record:
  • Authors
  • Citations
  • Publication Date
  • Keywords
  • Organizations
  • Journal
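The conversion can be sketched as a single pass over typed edges. Everything in this example is illustrative: the edge list, the relation names, and the assumed edge directions (author to manuscript, manuscript to keyword, journal, time, and cited manuscript) are not taken from the HEP-TH data.

```python
from collections import defaultdict

# Hypothetical typed edges (source, relation, target), named after the
# edge labels in the semantic network figure. Directions are assumptions.
EDGES = [
    ("A1", "author_of", "P1"),
    ("A2", "author_of", "P1"),
    ("A2", "author_of", "P2"),
    ("P1", "cites", "P2"),
    ("P1", "has_keyword", "K1"),
    ("P1", "published_journal", "J1"),
    ("P2", "published_journal", "J1"),
    ("P2", "published_time", "T1"),
]

# Relations where the manuscript is the edge source, mapped to a field name.
MANUSCRIPT_IS_SOURCE = {
    "cites": "citations",
    "has_keyword": "keywords",
    "published_journal": "journal",
    "published_time": "date",
}
# Relations where the manuscript is the edge target.
MANUSCRIPT_IS_TARGET = {"author_of": "authors"}


def to_records(edges):
    """Fold typed edges into {manuscript: {field: set(values)}}."""
    records = defaultdict(lambda: defaultdict(set))
    for source, relation, target in edges:
        if relation in MANUSCRIPT_IS_SOURCE:
            records[source][MANUSCRIPT_IS_SOURCE[relation]].add(target)
        elif relation in MANUSCRIPT_IS_TARGET:
            records[target][MANUSCRIPT_IS_TARGET[relation]].add(source)
    return {m: dict(fields) for m, fields in records.items()}


records = to_records(EDGES)
for manuscript, fields in sorted(records.items()):
    print(manuscript, fields)
```

Organizations are left out of the sketch because attaching them to a manuscript would require a second hop through the author, and the slides do not spell out how that is done.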

Page 8: Automatic Metadata Generation using Associative Networks

Occurrence/Co-Occurrence

• Citation: two manuscripts are connected if one manuscript cites the other.

• Co-Author: two manuscripts are connected if they share the same authors.

• Co-Citation: two manuscripts are connected if they share the same citations.

• Co-Keyword: two manuscripts are connected if they share the same keywords

• Co-Organization: two manuscripts are connected if they share the same organizations

• Co-Date: two manuscripts are connected if they share the same publication date

• Co-Journal: two manuscripts are connected if they share the same journal
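A minimal sketch of a co-occurrence network builder, assuming each metadata record is a dictionary of sets. The sample records and the choice of edge weight (the number of shared values) are assumptions; the slides do not say how edges are weighted.

```python
from itertools import combinations

# Hypothetical metadata records: {manuscript: {field: set(values)}}.
RECORDS = {
    "P1": {"authors": {"A1", "A2"}, "keywords": {"K1"}},
    "P2": {"authors": {"A2"}, "keywords": {"K2"}},
    "P3": {"authors": {"A3"}, "keywords": {"K1", "K2"}},
}


def co_occurrence_edges(records, field):
    """Connect two manuscripts when their values for `field` overlap.

    The edge weight is the number of shared values (an assumption).
    """
    edges = {}
    for a, b in combinations(sorted(records), 2):
        shared = records[a].get(field, set()) & records[b].get(field, set())
        if shared:
            edges[(a, b)] = len(shared)
    return edges


co_author = co_occurrence_edges(RECORDS, "authors")
co_keyword = co_occurrence_edges(RECORDS, "keywords")
print(co_author)
print(co_keyword)
```

The same function covers every co-occurrence type in the list by changing `field`. A citation (occurrence) network needs no pairwise comparison, since each record already names the resources it points to.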

Page 9: Automatic Metadata Generation using Associative Networks

Network Generation Running Times

• Occurrence: O(N)
  – Each resource’s metadata record must be checked once and only once for a direct reference to another resource.

• Co-occurrence: O([N^2 – N] / 2), i.e. O(N^2)
  – Each resource’s metadata record must be checked against every other resource’s (N^2), except itself (–N), once and only once (1/2).

[Figure: two small diagrams, one with nodes A and B, the other with nodes A, B, and C.]
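The pair count behind the co-occurrence bound can be checked directly. This sketch only counts comparisons; the function names are hypothetical.

```python
from itertools import combinations


def occurrence_checks(n):
    """Each of the n records is scanned exactly once."""
    return n


def co_occurrence_checks(n):
    """Count unordered pairs of distinct records by enumerating them."""
    return sum(1 for _ in combinations(range(n), 2))


for n in (3, 10, 100):
    # Enumerated pairs agree with the closed form (N^2 - N) / 2.
    assert co_occurrence_checks(n) == (n * n - n) // 2
    print(n, occurrence_checks(n), co_occurrence_checks(n))
```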

Page 10: Automatic Metadata Generation using Associative Networks

Particle Propagation

• Every resource is given one particle, p_i. This particle contains all the metadata associated with its resource.

• A particle also has an energy value, e_i. The further the particle travels (edge steps), the more its energy value decays.

e_i(t+1) = e_i(t) * (1-\delta)
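Applying the decay rule step by step is equivalent to the closed form e_i(t) = e_i(0) * (1-\delta)^t. The sketch below assumes an initial energy of 1.0, which the slides do not specify.

```python
def energy_after(steps, delta, e0=1.0):
    """Apply e(t+1) = e(t) * (1 - delta) for the given number of edge steps.

    e0 = 1.0 is an assumed starting energy.
    """
    energy = e0
    for _ in range(steps):
        energy *= (1.0 - delta)
    return energy


delta = 0.5
for t in range(4):
    stepwise = energy_after(t, delta)
    closed_form = (1.0 - delta) ** t
    assert abs(stepwise - closed_form) < 1e-12
    print(t, stepwise)
```

A larger \delta keeps a particle's influence close to its source resource, while a smaller one lets metadata travel further across the network.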

Page 11: Automatic Metadata Generation using Associative Networks

Particle Propagation

• The particle takes an outgoing edge of its current node, chosen according to the probability distribution over that node’s outgoing edge set. If the resource it encounters has no metadata of a particular type, the particle recommends its own metadata of that type to the resource, weighted by the particle’s energy value.
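One possible reading of the propagation step, as a sketch rather than the algorithm used in the experiments. The graph and record layout, the stopping conditions (`max_steps`, `min_energy`), the initial energy of 1.0, and the choice to decay energy before recommending are all assumptions.

```python
import random
from collections import defaultdict


def propagate(graph, records, field, delta=0.15, max_steps=20,
              min_energy=0.01, seed=0):
    """Walk one particle per metadata-bearing resource over the network.

    graph:   {node: {neighbor: edge weight}}
    records: {node: {field: set(values)}}
    Returns {node: {value: accumulated recommendation strength}}.
    """
    rng = random.Random(seed)
    recommendations = defaultdict(lambda: defaultdict(float))
    for start, record in records.items():
        values = record.get(field)
        if not values:
            continue  # this resource has nothing of this type to carry
        node, energy = start, 1.0  # assumed initial energy
        for _ in range(max_steps):
            neighbors = graph.get(node)
            if not neighbors or energy < min_energy:
                break
            # Pick an outgoing edge with probability proportional to weight.
            node = rng.choices(list(neighbors),
                               weights=list(neighbors.values()))[0]
            energy *= (1.0 - delta)  # decay per edge step
            if not records.get(node, {}).get(field):
                for value in values:
                    recommendations[node][value] += energy
    return {n: dict(v) for n, v in recommendations.items()}


# Hypothetical three-node network: P3 has no journal metadata.
graph = {
    "P1": {"P3": 1.0},
    "P2": {"P3": 1.0},
    "P3": {"P1": 1.0, "P2": 1.0},
}
records = {
    "P1": {"journal": {"Journal of Complexity"}},
    "P2": {"journal": {"Journal of Information Science"}},
    "P3": {},
}
recs = propagate(graph, records, "journal")
print(recs)
```

Recommendations accumulate per value, so a resource that is reached often, or early in a particle's walk, receives a stronger recommendation.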

Page 12: Automatic Metadata Generation using Associative Networks

Metadata Recommendations

• Manuscript A
  – Journal
    • Journal of Complexity [0.2457]
    • Journal of Information Science [0.1]
    • Information Processing and Management [0.001]

(bracketed values: recommendation strength)

Page 13: Automatic Metadata Generation using Associative Networks
Page 14: Automatic Metadata Generation using Associative Networks

Mini-Break

Page 15: Automatic Metadata Generation using Associative Networks

Terrorist Alert

Page 16: Automatic Metadata Generation using Associative Networks

System Parameters

• Metadata Density: to validate the algorithm, we remove a percentage of the metadata in the system and see whether the algorithm can reconstruct it (d \in [0,1]).

• Metadata Percentile: only those metadata tags in the pth percentile are accepted as valid metadata (p \in [0,1]).

** Validation is based on precision and recall values.
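The validation step can be sketched as a percentile filter followed by a precision and recall computation. The percentile rule used here (keep a value if at least a fraction p of the recommended values are no stronger than it) is only one interpretation of the slide, and the ground-truth set is invented for the example.

```python
def accept(recommendations, p):
    """Keep values at or above the p-th percentile of recommendation strength.

    Assumed rule: a value is kept if at least a fraction p of all
    recommended values have strength <= its own.
    """
    strengths = list(recommendations.values())
    total = len(strengths)
    accepted = set()
    for value, strength in recommendations.items():
        rank = sum(1 for s in strengths if s <= strength) / total
        if rank >= p:
            accepted.add(value)
    return accepted


def precision_recall(accepted, truth):
    """Precision and recall of accepted values against the removed metadata."""
    hits = len(accepted & truth)
    precision = hits / len(accepted) if accepted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Strengths from the Metadata Recommendations slide; the truth set is hypothetical.
recommended = {
    "Journal of Complexity": 0.2457,
    "Journal of Information Science": 0.1,
    "Information Processing and Management": 0.001,
}
truth = {"Journal of Complexity"}

for p in (0.0, 0.5, 0.9):
    kept = accept(recommended, p)
    print(p, sorted(kept), precision_recall(kept, truth))
```

Raising p trades recall for precision: fewer recommendations are accepted, but those that remain are the strongest ones.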

Page 17: Automatic Metadata Generation using Associative Networks

Results for Co-Author Network (Citation Metadata)

Page 18: Automatic Metadata Generation using Associative Networks

Results for Co-Author Network (Organization Metadata)

Page 19: Automatic Metadata Generation using Associative Networks

Results for Co-Author Network (Keyword Metadata)

Page 20: Automatic Metadata Generation using Associative Networks

Results for Co-Keyword Network (Citation Metadata)

Page 21: Automatic Metadata Generation using Associative Networks

Results for Co-Keyword Network (Journal Metadata)

Page 22: Automatic Metadata Generation using Associative Networks

Results for Citation Network (Author Metadata)

Page 23: Automatic Metadata Generation using Associative Networks

Results for Citation Network (Keyword Metadata)

Page 24: Automatic Metadata Generation using Associative Networks

Results for Citation Network (Journal Metadata)

Page 25: Automatic Metadata Generation using Associative Networks

Take Home Points

• Different edge types are better at propagating different metadata types.

• Can work for any resource type as long as there exists some preliminary vetted metadata and a way to create resource relations (if there is pre-existing metadata, resource relations can be created automatically).

Page 26: Automatic Metadata Generation using Associative Networks

Future Work (part 1)

• What about path types? e.g. take a co-author edge, then a citation edge, etc. Better precision and recall?

• Explore usage metadata, which is applicable to any resource type and allows for cross-resource relations (e.g. manuscripts connected to audio). The weight between two resources is a function of the interval between their downloads from the same IP (Bollen et al., 2004).

Page 27: Automatic Metadata Generation using Associative Networks

Future Work (part 2)

• Application to social networks? Given an unknown individual, infer their attributes from their social relationships.

– How does ‘work_with’ differ from ‘married_to’? The former tends to link people who share income metadata, the latter people who share religious-belief metadata.

Page 28: Automatic Metadata Generation using Associative Networks

Conclusion

• Good life…

Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks”, [unpublished], 2005.

Know of a good journal venue?