Page 1: Automatic Metadata Generation using Associative Networks

Automatic Metadata Generation Using Associative-Networks

Marko A. Rodriguez
CCS-3 ‘Tech Talk’
December 7, 2005

http://www.soe.ucsc.edu/~okram


Resources and Metadata

• A resource is any digital object (e.g., manuscripts, images, video, audio).

• A resource’s metadata record is a list of attributes describing the resource.

[ EXAMPLE MANUSCRIPT METADATA ] Authors, Institutions, Keywords, Subject Categories, Citations, Year, Publishing Journal, Usage Data


Metadata Record

<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <responseDate>2005-09-07T15:25:04Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0412047" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0412047</identifier>
        <datestamp>2004-12-14</datestamp>
        <setSpec>cs</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
          <dc:title>A Social Network for Societal-Scale Decision-Making Systems</dc:title>
          <dc:creator>Rodriguez, Marko</dc:creator>
          <dc:creator>Steinbock, Daniel</dc:creator>
          <dc:subject>Computers and Society</dc:subject>
          <dc:subject>Data Structures and Algorithms</dc:subject>
          <dc:subject>Human-Computer Interaction</dc:subject>
          <dc:subject>H.4.2</dc:subject>
          <dc:subject>J.7</dc:subject>
          <dc:subject>K.4.m</dc:subject>
          <dc:description>In societal-scale decision-making systems the collective is faced ...</dc:description>
          <dc:description>Comment: Dynamically Distributed Democracy algorithm</dc:description>
          <dc:date>2004-12-10</dc:date>
          <dc:type>text</dc:type>
          <dc:identifier>http://arxiv.org/abs/cs/0412047</dc:identifier>
          <dc:identifier>North American Association for Computational Social and Organizational Science Conference Proceedings 2004</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
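The following sketch is not part of the original talk. It shows how the Dublin Core fields of a record like the one above could be gathered into the attribute list that a metadata record is defined to be, using Python's standard `xml.etree.ElementTree`. The sample is trimmed. It also declares the `dc` prefix as `http://purl.org/dc/elements/1.1/`, the standard Dublin Core elements namespace, which the excerpt above leaves out. `dc_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespaces used by an OAI-PMH response in the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the record; the xmlns:dc declaration is an added assumption
# so that this fragment parses on its own.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record>
    <header><identifier>oai:arXiv.org:cs/0412047</identifier></header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A Social Network for Societal-Scale Decision-Making Systems</dc:title>
        <dc:creator>Rodriguez, Marko</dc:creator>
        <dc:creator>Steinbock, Daniel</dc:creator>
        <dc:subject>Computers and Society</dc:subject>
        <dc:date>2004-12-10</dc:date>
      </oai_dc:dc>
    </metadata>
  </record></GetRecord>
</OAI-PMH>"""


def dc_record(xml_text):
    """Return {Dublin Core element name: [values]} for the first oai_dc block."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    fields = defaultdict(list)
    for child in dc:
        # child.tag looks like "{http://purl.org/dc/elements/1.1/}creator".
        local_name = child.tag.split("}", 1)[1]
        fields[local_name].append((child.text or "").strip())
    return dict(fields)


print(dc_record(SAMPLE)["creator"])  # ['Rodriguez, Marko', 'Steinbock, Daniel']
```

Repeated elements such as `dc:creator` and `dc:subject` become multi-valued attributes. The rest of the talk assumes this shape for a resource's metadata record.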


Problem Statement

• Metadata is costly to generate by hand

• Metadata is hard to extract from raw resources (e.g., audio, video)

• How can we automatically generate metadata for resources whose metadata records are atrophied (sparse or incomplete)?


General System Overview

• Generate resource relations from the existing metadata in the repository
  – occurrence and/or co-occurrence networks

• Propagate metadata from metadata-rich resources to metadata-limited resources
  – encapsulate metadata in discrete particles and disseminate them over the generated associative network


HEP-TH 2003 Semantic Network

[Figure: sample of the HEP-TH 2003 semantic network. Author nodes (A1-A4), paper nodes (P1-P5), organization nodes (O1, O2), journal nodes (J1, J2), keyword nodes (K1, K2), and publication-time nodes (T1, T2) are linked by "author of", "organization of", "cites", "published journal", "has keyword", and "published time" edges.]


Transforming the Semantic Network

Convert the multi-node network into a collection of manuscripts with their associated attributes (metadata record).

Resource: manuscript
Metadata record:
  • Authors
  • Citations
  • Publication Date
  • Keywords
  • Organizations
  • Journal
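Below is a minimal Python sketch of this transformation. It assumes the semantic network is stored as (subject, edge label, object) triples. The edge labels follow the relation names in the HEP-TH figure. Several details are assumptions made for illustration: the triple format, the node names, and the rule that a manuscript inherits the organizations of its authors. `to_records` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical triples in the spirit of the HEP-TH figure: (subject, edge label, object).
TRIPLES = [
    ("A1", "author_of", "P1"),
    ("A2", "author_of", "P1"),
    ("A2", "author_of", "P2"),
    ("P1", "cites", "P2"),
    ("P1", "published_journal", "J1"),
    ("P2", "has_keyword", "K1"),
    ("O1", "organization_of", "A1"),
    ("P2", "published_time", "T1"),
]

# Edge labels whose subject is the manuscript, mapped to a record attribute.
PAPER_EDGES = {
    "cites": "citations",
    "published_journal": "journal",
    "has_keyword": "keywords",
    "published_time": "date",
}


def to_records(triples):
    """Collapse the multi-node network into {manuscript: {attribute: set(values)}}."""
    records = defaultdict(lambda: defaultdict(set))
    author_orgs = defaultdict(set)
    for subj, label, obj in triples:
        if label == "author_of":
            records[obj]["authors"].add(subj)
        elif label in PAPER_EDGES:
            records[subj][PAPER_EDGES[label]].add(obj)
        elif label == "organization_of":
            author_orgs[obj].add(subj)
    # Assumption: organizations reach a manuscript through its authors.
    for rec in records.values():
        for author in rec["authors"]:
            rec["organizations"] |= author_orgs[author]
    # Drop empty attributes so each record lists only the metadata it has.
    return {paper: {k: v for k, v in rec.items() if v} for paper, rec in records.items()}


print(sorted(to_records(TRIPLES)["P1"]))  # ['authors', 'citations', 'journal', 'organizations']
```

After this step, authors, journals, keywords, and other non-manuscript nodes appear only as attribute values. The only remaining nodes are the resources.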


Occurrence/Co-Occurrence

• Citation: two manuscripts are connected if one manuscript cites the other.

• Co-Author: two manuscripts are connected if they share the same authors

• Co-Citation: two manuscripts are connected if they share the same citations

• Co-Keyword: two manuscripts are connected if they share the same keywords

• Co-Organization: two manuscripts are connected if they share the same organizations

• Co-Date: two manuscripts are connected if they share the same publication date

• Co-Journal: two manuscripts are connected if they share the same journal
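The edge definitions above can be sketched in Python over per-manuscript records of the form `{attribute: set(values)}`. The slides only say that two manuscripts are "connected". Weighting a co-occurrence edge by the number of shared values is an assumption. `co_occurrence`, `citation_network`, and the sample records are hypothetical.

```python
from itertools import combinations

# Hypothetical metadata records.
RECORDS = {
    "P1": {"authors": {"A1", "A2"}, "keywords": {"K1"}, "citations": {"P2"}},
    "P2": {"authors": {"A2"}, "keywords": {"K1", "K2"}},
    "P3": {"authors": {"A3"}, "keywords": {"K2"}, "citations": {"P1", "P2"}},
}


def co_occurrence(records, attribute):
    """Undirected edges between manuscripts that share >= 1 value of `attribute`.

    Weight = number of shared values (an assumption). Each unordered pair is
    compared exactly once.
    """
    edges = {}
    for a, b in combinations(sorted(records), 2):
        shared = records[a].get(attribute, set()) & records[b].get(attribute, set())
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def citation_network(records):
    """Occurrence network: a directed edge a -> b whenever a's record cites b."""
    return {(a, b) for a, rec in records.items()
            for b in rec.get("citations", ()) if b in records}


print(co_occurrence(RECORDS, "keywords"))  # {('P1', 'P2'): 1, ('P2', 'P3'): 1}
```

Calling `co_occurrence(RECORDS, "citations")` links P1 and P3 because both cite P2. That is the co-citation edge type. Calling it with `"authors"` gives the co-author network.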


Network Generation Running Times

• Occurrence: O(N)
  – Each resource’s metadata record must be checked once and only once for a direct reference to another resource.

• Co-occurrence: O((N^2 - N) / 2), i.e., O(N^2)
  – Each resource’s metadata record must be checked against every other resource’s (N^2), except itself (-N), once and only once (1/2).

[Figure: occurrence (A linked directly to B) vs. co-occurrence (A and B linked through a shared C)]
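The co-occurrence count can be checked numerically. Comparing each record with every other record exactly once takes (N^2 - N)/2 comparisons, which equals the number of unordered pairs that `itertools.combinations` produces. `comparisons` is a hypothetical helper.

```python
from itertools import combinations


def comparisons(n):
    """Record-to-record checks needed to build a co-occurrence network: (N^2 - N) / 2."""
    return (n * n - n) // 2


# The closed form matches a brute-force count of unordered pairs.
for n in (1, 2, 10, 1000):
    assert comparisons(n) == sum(1 for _ in combinations(range(n), 2))

print(comparisons(1000))  # 499500
```

Doubling N roughly quadruples the work. This is why co-occurrence networks are O(N^2) to build, while occurrence networks need only a single pass.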


Particle Propagation

• Every resource is given one particle, p_i. This particle contains all the metadata associated with its resource.

• A particle also has an energy value, e_i. The further the particle travels (edge steps), the more its energy value decays.

$$e_i(t+1) = e_i(t)\,(1-\delta)$$
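Assume a constant decay rate $0 < \delta < 1$ and an initial energy $e_i(0)$. Unrolling the recurrence then shows that a particle's energy decays geometrically with the number of edge steps:

$$e_i(t) = e_i(t-1)\,(1-\delta) = e_i(t-2)\,(1-\delta)^2 = \cdots = e_i(0)\,(1-\delta)^{t}$$

For example, with $\delta = 0.1$ a particle that has taken three steps keeps $0.9^3 = 0.729$ of its initial energy. Its recommendations three hops from its source are therefore weighted by $0.729\,e_i(0)$.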


Particle Propagation

• At each step, the particle takes one of its current node’s outgoing edges, chosen according to the probability distribution over that node’s outgoing edge set. If the resource it reaches lacks metadata of a particular type, the particle recommends its own metadata of that type to the resource, weighted by the particle’s current energy value.
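Below is a minimal Python sketch of this walk. It rests on several assumptions:

- the graph is a dict of weighted outgoing edges;
- the next edge is drawn with probability proportional to its weight;
- each particle starts with energy 1.0 and walks at most `steps` edges;
- energy decays by $(1-\delta)$ on every step, before the particle recommends at the node it reaches.

The slides give each resource one particle, which corresponds to `walks=1`. Larger `walks` values average over more random walks. `propagate` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def propagate(graph, records, delta=0.1, steps=5, walks=1, seed=0):
    """Energy-weighted metadata recommendations from particle random walks.

    graph:   {node: [(neighbor, weight), ...]}  outgoing weighted edges
    records: {node: {metadata_type: set(values)}}
    Returns {node: {metadata_type: {value: accumulated strength}}}.
    Assumed (not from the slides): initial energy 1.0, fixed maximum walk length.
    """
    rng = random.Random(seed)
    recs = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for source, metadata in records.items():
        for _ in range(walks):
            node, energy = source, 1.0
            for _ in range(steps):
                out = graph.get(node, [])
                if not out:
                    break  # dead end: the particle stops
                neighbors, weights = zip(*out)
                node = rng.choices(neighbors, weights=weights)[0]
                energy *= 1 - delta  # e_i(t+1) = e_i(t) * (1 - delta)
                have = records.get(node, {})
                for mtype, values in metadata.items():
                    if not have.get(mtype):  # target lacks this metadata type
                        for value in values:
                            recs[node][mtype][value] += energy
    return recs


# Chain P1 -> P2 -> P3; P3 has no journal metadata.
graph = {"P1": [("P2", 1.0)], "P2": [("P3", 1.0)]}
records = {"P1": {"journal": {"J1"}}, "P2": {"journal": {"J2"}}, "P3": {"keywords": {"K3"}}}
print(dict(propagate(graph, records, delta=0.5)["P3"]["journal"]))  # {'J1': 0.25, 'J2': 0.5}
```

In the example, P2's particle reaches P3 after one step with energy 0.5. P1's particle needs two steps and arrives with 0.25. The nearer neighbor's journal therefore gets the stronger recommendation.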


Metadata Recommendations

• Manuscript A
  – Journal
    • Journal of Complexity [0.2457]
    • Journal of Information Science [0.1]
    • Information Processing and Management [0.001]

(bracketed values are recommendation strengths)


Mini-Break


Terrorist Alert


System Parameters

• Metadata Density: to validate the algorithm, we remove a percentage of the metadata in the system and test whether the algorithm can reconstruct it ($d \in [0,1]$)

• Metadata Percentile: only those recommended metadata tags in the p-th percentile are accepted as valid metadata ($p \in [0,1]$)

** Validation is based on precision and recall values.
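The Python sketch below shows one possible reading of the percentile parameter and the validation step. A recommendation is accepted if its strength is at or above the p-quantile of the strengths proposed for that resource and metadata type. The accepted set is then scored against the metadata that was removed. The quantile rule and the helper names `accept` and `precision_recall` are assumptions, not the authors' definitions. The sample strengths are the Manuscript A journal recommendations listed earlier. Treating "Journal of Complexity" as the removed ground truth is hypothetical.

```python
def accept(scores, p):
    """Return tags whose strength is at or above the p-quantile (p in [0, 1])."""
    if not scores:
        return set()
    ranked = sorted(scores.values())
    # Cutoff strength; p = 0.0 keeps every tag, p = 1.0 keeps only the strongest.
    cutoff = ranked[min(int(p * len(ranked)), len(ranked) - 1)]
    return {tag for tag, s in scores.items() if s >= cutoff}


def precision_recall(predicted, actual):
    """Precision = hits / |predicted|; recall = hits / |actual|."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


scores = {
    "Journal of Complexity": 0.2457,
    "Journal of Information Science": 0.1,
    "Information Processing and Management": 0.001,
}
kept = accept(scores, 0.5)
print(precision_recall(kept, {"Journal of Complexity"}))  # (0.5, 1.0)
```

Raising p keeps fewer, stronger recommendations. This typically raises precision at the cost of recall, which is the trade-off the percentile parameter controls.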


Results for Co-Author Network (Citation Metadata)


Results for Co-Author Network (Organization Metadata)


Results for Co-Author Network (Keyword Metadata)


Results for Co-Keyword Network (Citation Metadata)


Results for Co-Keyword Network (Journal Metadata)


Results for Citation Network (Author Metadata)


Results for Citation Network (Keyword Metadata)


Results for Citation Network (Journal Metadata)


Take Home Points

• Different edge types are better at propagating different metadata types.

• The method can work for any resource type, as long as there is some preliminary vetted metadata and a way to create resource relations (if pre-existing metadata is available, resource relations can be created automatically).


Future Work (part 1)

• What about path types (e.g., take a co-author edge, then a citation edge)? Would they give better precision and recall?

• Explore usage metadata, which applies to any resource type and allows cross-resource relations (e.g., manuscripts connected to audio). The weight between two resources is a function of the time interval between their downloads from the same IP address (Bollen et al., 2004).


Future Work (part 2)

• Application to social networks? Given an unknown individual, infer their attributes from their social relationships.

– How does ‘work_with’ differ from ‘married_to’? People linked by ‘work_with’ share income metadata, while people linked by ‘married_to’ share religious-belief metadata.


Conclusion

• Good life…

Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks”, [unpublished], 2005.

Know of a good journal venue?

