© 2004 OCLC Online Computer Library Center, Inc.
OCLC Online Computer Library Center
Using Literary Warrant to Define a Version of the DDC for Automated Classification Services
Diane Vizine-Goetz, Research Scientist, OCLC Research
Julianne Beall, Assistant Editor, DDC
ISKO Conference, London, 13-16 July 2004
2
Exploratory Study
Defining a version of the DDC
– To facilitate automatic assignment of DDC numbers to electronic documents
– Based on literary warrant for topics in electronic resources
3
DDC for Automated Classification
Machine classification service
– A database of concepts used to classify a document
– Software that generates a prioritized list of concepts that characterize the content of the document (Scorpion)
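The two components above can be illustrated with a minimal sketch. It is not Scorpion's actual ranking method, which the slides do not describe; it assumes a simple term-overlap score, and the tiny concept database, `tokenize`, and `rank_concepts` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature concept database: class number -> descriptive terms.
CONCEPTS = {
    "520": "astronomy celestial bodies outer space",
    "551.6": "climatology weather forecasting hurricanes rain",
    "512.7": "number theory prime numbers factorization",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_concepts(document, concepts=CONCEPTS):
    """Return (class number, score) pairs, best first.

    Assumed scoring: the number of document tokens that also occur
    in the concept's descriptive terms.
    """
    doc_counts = Counter(tokenize(document))
    scores = []
    for number, terms in concepts.items():
        vocabulary = set(tokenize(terms))
        score = sum(n for token, n in doc_counts.items() if token in vocabulary)
        if score:
            scores.append((number, score))
    # Highest score first; ties broken by class number for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


ranked = rank_concepts("Hurricanes and rain: weather forecasting for the coast")
```

The richer the terminology attached to each class number, the more documents can be matched, which is why the later slides concentrate on adding terms to the database.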
4
Checking Literary Warrant
Primary source for checking literary warrant: BUBL
– Ca. 12,000 Internet resources
Canadian Information By Subject
– Ca. 10,000 Internet resources
KidsClick!
– Ca. 6,400 Internet resources
5
http://bubl.ac.uk/link/ddc.html
6
BUBL Site Statistics
Dewey class   Number of sites   Site status OK   US sites   UK sites
500           135               103              59         27
510           167               36               65         43
520           186               133              84         25
530           139               111              68         20
540           118               82               38         33
550           247               196              127        30
Total         992               761              441        178
7
http://www.nlc-bnc.ca/caninfo/esub.htm
8
http://sunsite.berkeley.edu/KidsClick!/dewey.html
9
Defining a Version of the DDC
Starting point: classification numbers in Abridged Edition 14
True abridgment: the truncated number for a topic is always the same as the full number for the topic, except shorter, e.g.:
– 551.64 Forecasting and forecasts of specific phenomena
  • Cut back to 551.6 Climatology and weather
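Because the abridgment is a true abridgment, cutting a number back can be sketched as dropping trailing digits until a number in the chosen version is reached. The sketch below assumes a small, hypothetical set of retained numbers (`RETAINED`) and a hypothetical helper name (`cut_back`); it is not taken from the project's software.

```python
# Hypothetical subset of class numbers retained in the working version.
RETAINED = {"520", "551", "551.6", "512.7"}


def cut_back(number, retained=RETAINED):
    """Shorten a DDC number digit by digit until it is in `retained`.

    Returns None if no retained number is a prefix of `number`.
    DDC numbers have at least three digits, so the loop stops there.
    """
    candidate = number
    while len(candidate) >= 3:
        if candidate in retained:
            return candidate
        # Drop the last digit, and the decimal point if it is left dangling.
        candidate = candidate[:-1].rstrip(".")
    return None


truncated = cut_back("551.64")
```

With the assumed set, `cut_back("551.64")` yields `"551.6"`, mirroring the example on the slide.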
10
Database Record
Class number
Caption
Superordinate hierarchy
Notes that describe what is found in a class
Relative Index entries
Mapped terminology
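One way to picture such a record is as a small data structure with one field per element listed above. This is only a sketch: the field names, the `keywords` helper, and the sample values other than the class number and caption are assumptions for illustration, not the project's actual record format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassRecord:
    """Hypothetical in-memory form of one database record."""
    number: str                                             # class number
    caption: str                                            # caption
    hierarchy: List[str] = field(default_factory=list)      # superordinate captions
    notes: List[str] = field(default_factory=list)          # notes describing the class
    index_entries: List[str] = field(default_factory=list)  # Relative Index entries
    mapped_terms: List[str] = field(default_factory=list)   # mapped terminology

    def keywords(self):
        """All text in the record that a document could be matched against."""
        return [self.caption, *self.hierarchy, *self.notes,
                *self.index_entries, *self.mapped_terms]


# Illustrative values only; the hierarchy shown here is abbreviated.
record = ClassRecord(
    number="551.6",
    caption="Climatology and weather",
    hierarchy=["Earth sciences"],
    index_entries=["Rain—weather forecasting"],
    mapped_terms=["Storms—Forecasting"],
)
```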
11
Keywords from 551.64 Added to 551.6; 551.64 Deleted
Class-here note: methods of forecasting specific phenomena specific areas
Relative Index entries, e.g.,
– Acid rain—weather forecasting
– Hurricanes—weather forecasting
– Rain—weather forecasting
Subject Headings for Children LCSH
– Storms—Forecasting
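The step described on this slide, moving the terminology of a deleted number up to the number it was cut back to, might look like the following sketch. It assumes the database is a plain mapping from class number to a list of keywords; `fold_into_parent` is a hypothetical name, and only the 551.64 entries are taken from the slide.

```python
def fold_into_parent(db, child, parent):
    """Return a new mapping in which `child` is deleted and its
    keywords are appended to `parent`, skipping duplicates."""
    result = {number: list(terms) for number, terms in db.items()
              if number != child}
    for term in db[child]:
        if term not in result[parent]:
            result[parent].append(term)
    return result


# Assumed starting state: 551.6 carries only its caption.
db = {
    "551.6": ["Climatology and weather"],
    "551.64": [
        "Acid rain—weather forecasting",
        "Hurricanes—weather forecasting",
        "Rain—weather forecasting",
        "Storms—Forecasting",
    ],
}

merged = fold_into_parent(db, child="551.64", parent="551.6")
```

The function returns a copy rather than editing in place, so the base file stays available for comparison with the enriched versions.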
12
Enriching Terminology for Numbers Built from Table 1
Example: built number 520.6
520 Astronomy and allied sciences
Relative Index terms that approximate the whole of 520:
– Astronomy
– Celestial bodies
– Outer space
– Space—astronomy
13
Built Number 520.6
Relative Index terms from T1—06, e.g.:
– Associations
– Organizations
Combined entries for 520.6, e.g.:
– Astronomy—associations
– Astronomy—organisations
– Astronomy—organizations
– Celestial bodies—associations
– Celestial bodies—organisations
– Celestial bodies—organizations
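The combined entries above follow a simple pattern: every term for the base class is paired with every term for the standard subdivision. A minimal sketch, assuming the subdivision terms already include the UK spelling and using the hypothetical helper name `combine_entries`:

```python
from itertools import product


def combine_entries(class_terms, subdivision_terms):
    """Pair each class term with each subdivision term, index-entry style."""
    return [f"{class_term}—{sub_term.lower()}"
            for class_term, sub_term in product(class_terms, subdivision_terms)]


entries_520_6 = combine_entries(
    ["Astronomy", "Celestial bodies"],
    ["Associations", "Organisations", "Organizations"],
)
```

With these inputs the function reproduces the six combined entries listed on the slide, in the same order.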
14
Subdivisions Added or Enriched
Number    Class                                           Subdivision                                                Table 1 notation
505       Science                                         Serial publications                                        1_05
506       Science                                         Organizations                                              1_06
507.2     Science                                         Research; statistical methods                              1_072
507.4     Science                                         Museums, collections, exhibits                             1_074
509       Science                                         Historical, geographic, persons treatment                  1_09
509.2     Science                                         Persons                                                    1_092
510.28    Mathematics                                     Auxiliary techniques and procedures; apparatus, equipment  1_0285
510.5     Mathematics                                     Serial publications                                        1_05
510.6     Mathematics                                     Organizations                                              1_06
510.71    Mathematics                                     Education                                                  1_071
510.9     Mathematics                                     Historical, geographic, persons treatment                  1_09
520.6     Astronomy                                       Organizations                                              1_06
526.06    Cartography                                     Organizations                                              1_06
530.05    Physics                                         Serial publications                                        1_05
530.06    Physics                                         Organizations                                              1_06
530.071   Physics                                         Education                                                  1_071
540.5     Chemistry                                       Serial publications                                        1_05
540.6     Chemistry                                       Organizations                                              1_06
540.71    Chemistry                                       Education                                                  1_071
550.5     Earth sciences                                  Serial publications                                        1_05
550.6     Earth sciences                                  Organizations                                              1_06
550.71    Earth sciences                                  Education                                                  1_071
551.4606  Hydrosphere and submarine geology Oceanography  Organizations                                              1_06
551.4607  Hydrosphere and submarine geology Oceanography  Education and research                                     1_071; 1_072
15
Added UK Spellings for Index Entries
512.7 Number theory
– Factorisation—number theory
– Factorization—number theory
– Number theory
– Prime numbers
519.6 Mathematical optimization
– Mathematical optimisation
– Mathematical optimization
– Optimisation—mathematics
– Optimization—mathematics
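Adding UK spellings of this kind can be approximated with a spelling rule. The sketch below is a heuristic only: it assumes that rewriting "-iz-" to "-is-" in words such as "optimization" is sufficient, which is not true of every English word, so real output would need editorial review. The names `uk_variant` and `with_uk_spellings` are hypothetical.

```python
import re

# Heuristic: a stem of three or more letters followed by "iz" and a common
# suffix. The minimum stem length avoids words such as "size" or "prize".
_IZ_WORD = re.compile(r"\b(\w{3,}?)iz(ations?|e|ed|es|ing)\b")


def uk_variant(entry):
    """Return the entry with -iz- spellings rewritten as -is-."""
    return _IZ_WORD.sub(lambda m: f"{m.group(1)}is{m.group(2)}", entry)


def with_uk_spellings(entries):
    """Return the entries plus any UK variants, sorted, without duplicates."""
    expanded = set(entries)
    expanded.update(uk_variant(entry) for entry in entries)
    return sorted(expanded)


index_512_7 = with_uk_spellings([
    "Factorization—number theory",
    "Number theory",
    "Prime numbers",
])
```

Keeping both spellings in the record, rather than replacing one with the other, lets documents from either US or UK sites match the same class number.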
16
Results: Scorpion & BUBL
Match type        A14 base  A14.v1  A14.v2  A14.v3
Exact             139       139     129     183
Partial           155       155     186     167
Exact or Partial  294       294     315     350
Non-match         455       455     434     399
Total             749       749     749     749
A14.v1: base file + UK spelling
A14.v2: base file + UK spelling + SS added/enriched
A14.v3: base file + UK spelling + SS added/enriched + truncation
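To compare the versions on a common scale, the counts in the results table can be turned into match rates. The short sketch below uses only the figures from the table; the dictionary layout and the function name `match_rate` are assumptions made for illustration.

```python
# Counts copied from the results table (749 test items per version).
RESULTS = {
    "A14 base": {"exact": 139, "partial": 155, "non_match": 455},
    "A14.v1": {"exact": 139, "partial": 155, "non_match": 455},
    "A14.v2": {"exact": 129, "partial": 186, "non_match": 434},
    "A14.v3": {"exact": 183, "partial": 167, "non_match": 399},
}


def match_rate(counts):
    """Share of items with an exact or partial match, as a fraction."""
    total = counts["exact"] + counts["partial"] + counts["non_match"]
    return (counts["exact"] + counts["partial"]) / total


for version, counts in RESULTS.items():
    print(f"{version}: {match_rate(counts):.1%} exact or partial")
```

On these figures the exact-or-partial rate rises from about 39% for the base file to about 47% for A14.v3, while the UK spellings alone (A14.v1) leave the counts unchanged.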
17
18
19
20
21
22
Next Steps
Analyze where the truncation and the enriched terminology were useful and where not; revise the v3 database accordingly
Extend approach to additional classes and projects (ePrints UK)
23
Links
Research : Projects : ePrints-UK
– http://www.oclc.org/research/projects/mswitch/epuk.htm
Dewey
– http://www.oclc.org/dewey/