
    November 2005 INIS Training Seminar 1

    INIS Training Seminar, 14-18 November 2005

    Subject Analysis, Thesaurus and Computer-assisted Indexing

    Alexander Nevyjel

    Database Production and Development Group

    INIS Unit, INIS&NKM Section, IAEA


    Introduction to Subject Analysis

    Subject analysis should be carried out, whenever possible, by subject specialists with a good knowledge of the subject matter and familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules).

    Steps of Subject Analysis

    - subject classification
    - abstracting
    - subject indexing


    Subject Classification

    The main topic of the document determines the primary subject category. If there are other significant topics, one or more secondary subject categories can be assigned in addition.



    Abstracting

    - Each input item should contain an English abstract (exception: short communications).
    - Abstracts in other languages are optional.
    - If an author abstract is available, it should be checked by the subject specialist and edited if necessary.
    - An abstract should be as informative as possible.
    - Emphasize what is novel in the information in the original document.


    Thesaurus

    What is a Thesaurus?

    A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge.

    This definition has been adopted by UNESCO: Guidelines for the establishment and development of monolingual thesauri, UNESCO, SC/W/255, Paris, September 1973.


    The Thesaurus and its Structure

    Relationship   Symbol  Cross reference
    hierarchical   BT      broader term (level 1, 2, ...)
    hierarchical   NT      narrower term (level 1, 2, ...)
    affinitive     RT      related term
    preferential   UF      used for (reciprocally USE ...)
    preferential   UF+     used for multiple (reciprocally USE ... AND ...)
    preferential   SF      seen for (reciprocally SEE ... OR ...)
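    The relationship types in the table can be modelled as a small in-memory data structure. The sketch below is a hypothetical illustration, not INIS software: the class and method names are invented, and the terms in the usage note are placeholders rather than actual Thesaurus entries. It shows the reciprocal links (BT/NT, symmetric RT, UF/USE) and the difference between UF+ (replace by all listed descriptors) and SF (choose one of them).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical in-memory model of thesaurus relationships."""
    broader: dict[str, set[str]] = field(default_factory=dict)     # BT links
    narrower: dict[str, set[str]] = field(default_factory=dict)    # NT links
    related: dict[str, set[str]] = field(default_factory=dict)     # RT links
    use: dict[str, tuple[str, ...]] = field(default_factory=dict)  # USE / USE ... AND ...
    see: dict[str, tuple[str, ...]] = field(default_factory=dict)  # SEE ... OR ...

    def add_bt(self, term: str, broader_term: str) -> None:
        # Store each BT together with its reciprocal NT.
        self.broader.setdefault(term, set()).add(broader_term)
        self.narrower.setdefault(broader_term, set()).add(term)

    def add_rt(self, a: str, b: str) -> None:
        # RT is symmetric, so it is stored in both directions.
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def add_uf(self, descriptor: str, forbidden: str) -> None:
        # Descriptor UF forbidden term <=> forbidden term USE descriptor.
        self.use[forbidden] = (descriptor,)

    def add_uf_plus(self, descriptors: tuple[str, ...], forbidden: str) -> None:
        # UF+: the forbidden term is replaced by ALL listed descriptors (USE ... AND ...).
        self.use[forbidden] = tuple(descriptors)

    def add_sf(self, descriptors: tuple[str, ...], forbidden: str) -> None:
        # SF: the indexer picks ONE of the listed descriptors (SEE ... OR ...).
        self.see[forbidden] = tuple(descriptors)

    def resolve(self, term: str) -> tuple[str, ...]:
        """Descriptor(s) to index with; a SEE reference needs a human choice."""
        if term in self.see:
            raise ValueError(f"{term!r}: choose one of {self.see[term]}")
        return self.use.get(term, (term,))

    def broader_levels(self, term: str) -> list[set[str]]:
        """Broader terms grouped by level (level 1 = direct BT, level 2 = BT of BT, ...)."""
        levels: list[set[str]] = []
        frontier, seen = {term}, {term}
        while True:
            nxt = set().union(*(self.broader.get(t, set()) for t in frontier)) - seen
            if not nxt:
                return levels
            levels.append(nxt)
            seen |= nxt
            frontier = nxt
```

    For example, after the placeholder links `t.add_bt("CESIUM 137", "CESIUM ISOTOPES")` and `t.add_bt("CESIUM ISOTOPES", "ALKALI METAL ISOTOPES")`, `t.broader_levels("CESIUM 137")` returns `[{"CESIUM ISOTOPES"}, {"ALKALI METAL ISOTOPES"}]`, and `t.narrower["CESIUM ISOTOPES"]` contains `"CESIUM 137"` without a separate NT entry.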


    Subject Indexing

    Subject indexing means analysing the information content of a piece of literature and expressing its meaningful content in the language of the database, using the controlled vocabulary of the Thesaurus. It requires:

    - understanding of the content (the task of a subject specialist)
    - familiarity with the Thesaurus and the indexing rules
    - selecting a set of descriptors that describes the subject content of the piece of literature


    Procedures for Indexing

    - Carefully read the title and abstract, and scan the body of the piece of literature.
    - Scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items that are missing from the abstract or require more precision.
    - Identify the concept(s) about which the piece of literature contains useful information.
    - Translate the concepts into descriptors.
    - Avoid overindexing.


    Proposed Terms (Technical Note 175)

    If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing:

    - the proposed term
    - the proposed word block of the term (in particular, the proposed BTs)
    - potential forbidden terms pointing to the proposed descriptor
    - a scope note, when appropriate
    - an explanation and justification for the proposal
    - one or more sample records
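    The proposal checklist maps naturally onto a record with required and optional fields. A minimal sketch, assuming a hypothetical `TermProposal` type (not part of Technical Note 175 or any INIS tool) and treating the broader terms, the justification and at least one sample record as mandatory:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermProposal:
    """Hypothetical record for a new-descriptor proposal."""
    proposed_term: str
    broader_terms: list[str]            # part of the proposed word block
    justification: str                  # explanation and justification
    sample_records: list[str]           # identifiers of one or more sample records
    forbidden_terms: list[str] = field(default_factory=list)  # would get USE -> proposed_term
    scope_note: Optional[str] = None    # only when appropriate

    def problems(self) -> list[str]:
        """Missing parts of the proposal; an empty list means it looks complete."""
        issues = []
        if not self.proposed_term.strip():
            issues.append("proposed term is empty")
        if not self.broader_terms:
            issues.append("no broader terms proposed")
        if not self.justification.strip():
            issues.append("explanation/justification missing")
        if not self.sample_records:
            issues.append("at least one sample record is required")
        return issues
```

    Forbidden terms and the scope note are optional in this sketch because the checklist qualifies them ("potential", "when appropriate").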


    The purpose of subject indexing is to enable useful retrieval.


    Computer-assisted Indexing

    - Kick-off meeting: Jan 2004
    - Implementation and customisation: Jun 2004
    - Production indexing: from Jun 2004, ongoing
    - CAI version 1.0 final acceptance: Aug 2004
    - Tuning of the system: from Aug 2004, ongoing
    - CAI version 1.10 kick-off: Dec 2004
    - CAI version 1.10 acceptance: Apr 2005
    - RetrievalWare pilot: Aug 2005
    - CAI Thesaurus extension: planned for Jan 2006


    CAI Thesaurus Extension

    Hidden terms are character patterns representing the different appearances, in free text, of a concept that is indexed by one or more descriptors. Hidden terms are:

    - handled similarly to forbidden terms, with one or more USE relations
    - CAI-internal only: not exported to the INIS production system, not exported to FIBRE, and not printed in any edition of the thesaurus
    - used to support the identification of descriptors in the free text
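    To illustrate how hidden terms can support descriptor identification, here is a minimal dictionary-matching sketch. It assumes, hypothetically, that matching ignores letter case and treats a hyphen and a space as equivalent (the free-text column of the isotope examples suggests this); the function names and sample patterns are illustrative, and this is not a description of how CAI itself matches text.

```python
from __future__ import annotations

import re
from typing import Callable


def _norm(s: str) -> str:
    # Assumption: letter case, and hyphen versus space, are not significant.
    return s.lower().replace("-", " ")


def build_matcher(patterns: dict[str, tuple[str, ...]]) -> Callable[[str], dict[str, int]]:
    """patterns: hidden term (or descriptor name) -> descriptor(s) it USEs."""
    table = {_norm(p): descriptors for p, descriptors in patterns.items()}
    # Try longer patterns first so that the longest pattern at a position wins.
    ordered = sorted(table, key=len, reverse=True)
    body = "|".join("[ -]".join(re.escape(w) for w in p.split(" ")) for p in ordered)
    regex = re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)

    def suggest(text: str) -> dict[str, int]:
        """Candidate descriptors with the number of matches supporting each."""
        counts: dict[str, int] = {}
        for m in regex.finditer(text):
            for descriptor in table[_norm(m.group(0))]:
                counts[descriptor] = counts.get(descriptor, 0) + 1
        return counts

    return suggest
```

    With the patterns `cesium 137` and `caesium 137` (both USE CESIUM 137) and `behavior`/`behaviour` (USE BEHAVIOR), the text "Behaviour of Caesium-137 and cesium 137 in soils" yields `{"BEHAVIOR": 1, "CESIUM 137": 2}`; counting the supporting matches gives a simple basis for ranking suggestions.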


    Hidden Terms: Compounds

    Approx. 1,400 hidden terms (approx. 3,000 expected)


    Hidden Terms: Isotopes

    Descriptor    Hidden term    Free text
    CESIUM 137                   Cesium 137, Cesium-137
                  "1"3"7cs       ¹³⁷Cs
                  137 caesium    137 Caesium, 137-Caesium
                  caesium 137    Caesium 137, Caesium-137
                  137 cesium     137 Cesium, 137-Cesium
                  137 cs         137 Cs, 137-Cs
                  cs 137         Cs 137, Cs-137
                  cs"1"3"7       Cs¹³⁷
                  cs137          Cs137
    CESIUM 138    "1"3"8"mcs     ¹³⁸ᵐCs
                  cs"1"3"8"m     Cs¹³⁸ᵐ

    Approx. 22,400 hidden terms
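    The isotope hidden terms follow a regular pattern: element name or symbol, mass number before or after it, with or without a space, and superscript mass numbers written with a `"` before each character. A hypothetical generator for such patterns, assuming exactly these rules; the slides do not say how the isotope hidden terms were actually produced.

```python
def superscript(s: str) -> str:
    """Superscript markup as it appears in the hidden terms: '"' before each character."""
    return "".join('"' + ch for ch in s)


def isotope_variants(descriptor: str, names: list[str], symbol: str, mass: str) -> list[str]:
    """Hypothetical hidden-term patterns for one isotope descriptor."""
    sym = symbol.lower()
    variants = {f"{sym} {mass}", f"{mass} {sym}", f"{sym}{mass}",
                superscript(mass) + sym, sym + superscript(mass)}
    for name in names:  # all accepted spellings of the element name
        variants |= {f"{name} {mass}", f"{mass} {name}"}
    # The descriptor itself already matches, so it is not repeated as a hidden term.
    variants.discard(descriptor.lower())
    return sorted(variants)
```

    Called as `isotope_variants("CESIUM 137", ["cesium", "caesium"], "Cs", "137")`, it returns the same eight hidden terms listed for CESIUM 137 above; passing `"138m"` as the mass number produces the `"1"3"8"mcs` and `cs"1"3"8"m` forms.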


    Hidden Terms: Elementary Particles

    Descriptor          Hidden term     Free text
    B QUARKS            bottom quarks
    T QUARKS            top quarks
    ELECTRON NEUTRINOS  #nu#_e          ν_e
    MUON NEUTRINOS      #nu#_#mu#       ν_μ
    TAU NEUTRINOS       #nu#_#tau#      ν_τ
    RHO-770 MESONS      #rho#-770       ρ-770
    OMEGA-782 MESONS    #omega#-782     ω-782
    KAONS NEUTRAL       K"0             K⁰

    Approx. 300 hidden terms


    Hidden Terms: UK/US Spellings

    Descriptor                  Hidden term
    A CENTERS                   a centres
    ACTIVITY METERS             activity metres
    ANALOG COMPUTERS            analogue computers
    ANESTHESIA                  anaesthesia
    ARCHAEOLOGY                 archeology
    AUSTRIAN ORGANIZATIONS      austrian organisations
    BALLISTIC MISSILE DEFENSE   ballistic missile defence
    BAYARD-ALPERT GAGES         bayard-alpert gauges
    BEAM ANALYZERS              beam analysers
    BEHAVIOR                    behaviour
    CATALOGS                    catalogues

    Approx. 800 hidden terms
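    UK/US pairs are mostly word-level substitutions, and the direction varies (ARCHAEOLOGY is the UK form, while most other descriptors use US spelling). A sketch, assuming a hypothetical pair table built only from the examples above, that generates every alternative spelling of a multi-word descriptor:

```python
from itertools import product

# Word-level spelling pairs taken from the slide examples; direction does not matter.
PAIRS = [("centers", "centres"), ("meters", "metres"), ("analog", "analogue"),
         ("anesthesia", "anaesthesia"), ("archaeology", "archeology"),
         ("organizations", "organisations"), ("defense", "defence"),
         ("gages", "gauges"), ("analyzers", "analysers"),
         ("behavior", "behaviour"), ("catalogs", "catalogues")]
ALTERNATES = {}
for a, b in PAIRS:
    ALTERNATES[a] = b
    ALTERNATES[b] = a


def spelling_variants(descriptor: str) -> list[str]:
    """All spellings of the descriptor that differ from it in at least one word."""
    words = descriptor.lower().split(" ")
    options = [(w, ALTERNATES[w]) if w in ALTERNATES else (w,) for w in words]
    original = " ".join(words)
    return sorted({" ".join(combo) for combo in product(*options)} - {original})
```

    An explicit table is used instead of suffix rules because a rule such as "-ers becomes -res" would be right for "meters" but wrong for "computers".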


    Hidden Terms: Diacritics and Countries

    Diacritics
    Descriptor                 Hidden term
    BAECKLUND TRANSFORMATION   backlund transformation
    BRUECKNER MODEL            bruckner model
    BRUNSBUETTEL REACTOR       brunsbuttel reactor
    MOESSBAUER EFFECT          mossbauer effect

    Country names
    Descriptor                 Hidden term
    CAMBODIA                   kampuchea
    COTE D'IVOIRE              ivory coast
    GREECE                     hellas
    MYANMAR                    burma
    SYRIA                      syrian arab republic
    THAILAND                   siam

    Approx. 250 hidden terms
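    The diacritic examples reflect two ways of normalising free text: the descriptors transliterate umlauts (MOESSBAUER), whereas text whose diacritics have simply been stripped reads "mossbauer", which is what these hidden terms catch. A sketch of both normalisations using Python's standard `unicodedata` module; the German transliteration table is an assumption limited to umlauts and ß, and the function names are hypothetical.

```python
import unicodedata

# Assumed transliteration for German characters (umlauts and sharp s only).
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop the combining marks: 'Mössbauer' -> 'mossbauer'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def transliterate(text: str) -> str:
    """Descriptor-style transliteration: 'Mössbauer' -> 'moessbauer'."""
    return unicodedata.normalize("NFC", text).translate(GERMAN).lower()
```

    The first function reproduces the hidden-term column, the second the lower-cased descriptor, so free text normalised either way can be matched to the same descriptor.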


    Hidden Terms: Other Spellings

    Singular/plural
    Descriptor                 Hidden term
    FUNGI                      fungus
    FUNGI                      funguses
    G MATRIX                   g matrices
    G MATRIX                   g matrixes

    Reverse sequence
    Descriptor                 Hidden term
    ATOM-MOLECULE COLLISIONS   atom-molecule scattering
    ATOM-MOLECULE COLLISIONS   molecule-atom scattering
    ATOM-MOLECULE COLLISIONS   atom-molecule reactions
    ATOM-MOLECULE COLLISIONS   molecule-atom reactions
    ATOM-MOLECULE COLLISIONS   atom-molecule interactions
    ATOM-MOLECULE COLLISIONS   molecule-atom interactions

    Approx. 900 hidden terms
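    Both kinds of variant on this slide are regular enough to express in a few lines. A hypothetical sketch: an irregular-number table containing only the two examples above, and a generator for the reverse-sequence variants of a two-part modifier combined with alternative head nouns.

```python
# Hypothetical table of irregular number forms, keyed by the form used in the descriptor.
NUMBER_FORMS = {"fungi": ["fungus", "funguses"], "matrix": ["matrices", "matrixes"]}


def number_variants(descriptor: str) -> list[str]:
    """Replace the last word of the descriptor by its other number forms."""
    *head, last = descriptor.lower().split(" ")
    return [" ".join(head + [form]) for form in NUMBER_FORMS.get(last, [])]


def reverse_sequence_variants(pair: tuple[str, str], heads: list[str]) -> list[str]:
    """'a-b head' and 'b-a head' for every head noun."""
    a, b = pair
    return [f"{x}-{y} {h}" for h in heads for x, y in ((a, b), (b, a))]
```

    `reverse_sequence_variants(("atom", "molecule"), ["scattering", "reactions", "interactions"])` returns the six ATOM-MOLECULE COLLISIONS hidden terms in the order listed above.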


    CAI Thesaurus Extension

    Thesaurus
      Valid descriptors                      21,953
      Forbidden terms                         9,411
    CAI hidden terms                         29,237
    Total (Terminological Knowledge Base)    60,601


    Further Improvements under Development


    Case sensitivity: with case-insensitive matching, "TiN" is mapped to TIN (instead of TITANIUM NITRIDES), "gas" to GALLIUM SULFIDES, and "who" in "who is the WHO" to WHO (World Health Organization).
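    One way to handle case sensitivity is to keep two pattern sets: patterns that must match with their exact case (chemical formulas, acronyms) and patterns that may match in any case, checking the exact-case set first. A minimal sketch with single-word patterns only; the tables and the function name are hypothetical.

```python
import re

# Hypothetical patterns that only match with exactly this capitalisation.
CASE_SENSITIVE = {"TiN": "TITANIUM NITRIDES", "GaS": "GALLIUM SULFIDES", "WHO": "WHO"}
# Hypothetical patterns that match in any capitalisation (keys are lower case).
CASE_INSENSITIVE = {"tin": "TIN"}


def tag(text: str) -> list[str]:
    """Descriptors suggested for each word, exact-case patterns taking priority."""
    hits = []
    for token in re.findall(r"\w+", text):
        if token in CASE_SENSITIVE:
            hits.append(CASE_SENSITIVE[token])
        elif token.lower() in CASE_INSENSITIVE:
            hits.append(CASE_INSENSITIVE[token.lower()])
    return hits
```

    For "TiN coatings on tin; gas flow; who is the WHO?" this returns `["TITANIUM NITRIDES", "TIN", "WHO"]`: "gas" and "who" no longer produce false matches.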

    Verbs versus nouns: "this leads us to" should not yield LEAD, and "this leaves it ..." should not yield LEAVES.

    Homographic terms: "solutions" can mean SOLUTIONS or MATHEMATICAL SOLUTIONS.

    Nuclear reactions, e.g. ¹⁴N(γ,α)¹⁰B: targets, beams, reactions
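    Reaction notation can be decomposed mechanically into target, beam (projectile), outgoing particle and residual nucleus, and the mass number A and charge Z must balance on both sides. A sketch assuming a hypothetical ASCII notation (g for gamma, a for alpha, and so on) and a deliberately tiny element table; mapping the parts to descriptors is left out.

```python
import re

# Hypothetical ASCII particle codes -> (mass number A, charge Z).
PARTICLES = {"g": (0, 0), "n": (1, 0), "p": (1, 1), "d": (2, 1), "t": (3, 1), "a": (4, 2)}
# Atomic numbers for a few light elements (extend as needed).
ATOMIC_NUMBER = {"B": 5, "C": 6, "N": 7, "O": 8}

# <A><Element>(<beam>,<outgoing>)<A><Element>, e.g. 14N(n,a)11B
REACTION = re.compile(r"(\d+)([A-Z][a-z]?)\((\w+),(\w+)\)(\d+)([A-Z][a-z]?)")


def parse_reaction(text: str) -> dict:
    m = REACTION.fullmatch(text.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a reaction: {text!r}")
    a_t, el_t, beam, out, a_r, el_r = m.groups()
    beam_a, beam_z = PARTICLES[beam]
    out_a, out_z = PARTICLES[out]
    # Conservation of nucleon number and charge: target + beam = residual + outgoing.
    balanced = (int(a_t) + beam_a == int(a_r) + out_a
                and ATOMIC_NUMBER[el_t] + beam_z == ATOMIC_NUMBER[el_r] + out_z)
    return {"target": f"{el_t}-{a_t}", "beam": beam, "outgoing": out,
            "residual": f"{el_r}-{a_r}", "balanced": balanced}
```

    For "14N(n,a)11B" the parts are N-14, n, a and B-11 with `balanced` True; a typo such as "14N(n,a)10B" is flagged by `balanced` being False.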


    [Figure: CAI processing workflow with three paths: Conventional Processing, Interactive CAI Processing and Batch Mode. Elements shown: Input from Member States; Electronic Records from Publishers; CAI Offline/Batch; Records with CAI-suggested Descriptors; INIS Subject Analysis Module; CAI Interactive; Training of CAI; Full Indexing; Records with Full Indexing; Proposed Terms / No Indexing; INIS Verification and Production System.]


    CAI Batch Processing Statistics, November 2004 to November 2005

    Country           Records   Files
    AR Argentina          133       7
    AU Australia          443       2
    BD Bangladesh           2       1
    BG Bulgaria            27       1
    BR Brazil              10       1
    CH Switzerland         58       4
    CN China              294       3
    DE Germany            363      11
    FR France             243       3
    JP Japan                6       1
    MK Macedonia          107       1
    MY Malaysia           125       3
    SE Sweden              27       1
    TH Thailand            15       1
    UZ Uzbekistan         144       2
    Total                1997      42


    CAI Batch Processing

    Input:  MemSt-CC-yymmdd-xxxxxxxxxxx
    Output: _MemSt-CC-yymmdd-xxxxxxxxxxx

    where:

    - MemSt is a standard prefix (meaning "member state")
    - CC is the country code
    - yymmdd is the date when the file was generated
    - xxxxxxxxxxx is any additional identification
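    The naming convention can be checked mechanically. A sketch, assuming the country code is two upper-case letters (as in the statistics table) and that the date field is a valid yymmdd date; the function names and the example file name are hypothetical.

```python
import re
from datetime import date, datetime

# MemSt-CC-yymmdd-<any additional identification>
INPUT_NAME = re.compile(r"MemSt-([A-Z]{2})-(\d{6})-(.+)")


def parse_input_name(name: str) -> tuple[str, date, str]:
    """Split an input file name into country code, generation date and identifier."""
    m = INPUT_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    country, yymmdd, ident = m.groups()
    generated = datetime.strptime(yymmdd, "%y%m%d").date()  # raises on an invalid date
    return country, generated, ident


def output_name(input_name: str) -> str:
    """The output file carries the input name with a leading underscore."""
    parse_input_name(input_name)  # validate before deriving the output name
    return "_" + input_name
```

    For the hypothetical file "MemSt-AT-051114-batch01", `parse_input_name` returns `("AT", date(2005, 11, 14), "batch01")` and `output_name` returns "_MemSt-AT-051114-batch01".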

