Korean Script Searching in Korean Library OPACs
-
Korean Script Searching in Korean Library OPACs
Junglim Chae, Yonsei University
-
Indexing Method
N-Gram
Morphological Analysis
-
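N-Gram Indexing
N-Gram: unigram, bigram, trigram, ..., n-gram
E.g.) 아버지가 방에 들어가신다 ("Father is entering the room") yields 12 index terms by bigram segmentation:
아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다 (0 marks a space)
Many index terms: many results, but lots of noise
High recall ratio but low precision ratio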
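A minimal Python sketch of the bigram segmentation above, assuming (as the example does) that spaces are kept as characters and written as 0. The function name `bigrams` and its `space_marker` parameter are illustrative only, not taken from any library system.

```python
def bigrams(text: str, space_marker: str = "0") -> list[str]:
    """Return every overlapping 2-character window of `text`.

    Spaces are kept as characters (written as `space_marker`), so windows
    that straddle a word boundary, such as 가0 and 0방, are produced too.
    """
    chars = text.replace(" ", space_marker)
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


terms = bigrams("아버지가 방에 들어가신다")
print(len(terms))  # 12
print(", ".join(terms))
```

Because the windows run across the spaces, the cross-word terms 가0, 0방, 에0 and 0들 appear, which is part of why the index grows and why noise increases. A literal 0 in the input would collide with the marker; a real indexer would pick a marker that cannot occur in the data.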
-
Morphological Analysis
Requires a morphological analysis dictionary
E.g.) Three index terms by morphological analysis
Ability to match linguistically similar terms
Faster performance with a smaller index
Accurate matches that meet user expectations
High precision ratio but low recall ratio
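Real morphological analysis needs a full dictionary plus rules for particles and verb endings. The Python sketch below only imitates the idea with a longest-prefix dictionary lookup on each word; `DICTIONARY` and its three entries are hypothetical, chosen to show how few terms such an index keeps compared with the 12 bigrams.

```python
# Hypothetical toy dictionary standing in for a Korean morphological
# analysis dictionary.
DICTIONARY = {"아버지", "방", "들어가"}


def dictionary_terms(text: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Index each space-separated word by its longest dictionary prefix.

    Whatever follows the matched prefix (particles such as 가 or 에, verb
    endings such as 신다) is discarded. Words with no dictionary prefix
    contribute nothing.
    """
    terms = []
    for word in text.split():
        for end in range(len(word), 0, -1):
            if word[:end] in dictionary:
                terms.append(word[:end])
                break
    return terms


print(dictionary_terms("아버지가 방에 들어가신다"))  # ['아버지', '방', '들어가']
```

The dependence on the dictionary is the trade-off in the slide: anything missing from it (a new name, a rare compound) is simply not indexed, which lowers recall.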
-
N-Gram vs. Morphological Analysis
Criterion          N-Gram       Morphological Analysis
Recall ratio       High         Low
Precision ratio    Low          High
Index size         Large        Small
Indexing speed     Fast         Slow
Search speed       Slow         Fast
Application        Libraries    Web search engines
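The recall and precision rows use the standard retrieval definitions, given here for reference. The worked numbers on the second line are invented for illustration: a hypothetical search retrieves 20 records, 5 of them relevant, out of 8 relevant records in the catalogue.

```latex
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|},
\qquad
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}

\text{recall} = \frac{5}{8} = 0.625,
\qquad
\text{precision} = \frac{5}{20} = 0.25
```

More index terms (n-gram) tend to raise the numerator of recall while also inflating the retrieved set, which is what pulls precision down.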
-
A Case Study
Yonsei University Library
Library system: Maestro-Y
Search engine: K2 by Verity
Indexing method: N-Gram (bigram) + morphological analysis
Indexing rules:
Rule 1: Divide strings at spaces
Rule 2: Extract index terms with the bigram indexing method
Rule 3: Add the whole string with the spaces between words removed
Rule 4: Add words from the Korean morphological analysis dictionary
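The four indexing rules can be sketched in Python as follows. This is an illustration under assumptions, not Maestro-Y or Verity K2 code: Rule 2 is taken to produce bigrams inside each word, Rule 4 is read as adding every dictionary word that occurs in the spaceless string, and `HYPOTHETICAL_DICTIONARY` and the sample title are invented.

```python
# Invented entries standing in for the Korean morphological analysis dictionary.
HYPOTHETICAL_DICTIONARY = {"한국", "도서관"}


def word_bigrams(word: str) -> list[str]:
    """Overlapping 2-character windows inside a single word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def build_index_terms(text: str, dictionary: set[str] = HYPOTHETICAL_DICTIONARY) -> list[str]:
    words = text.split()                       # Rule 1: divide the string at spaces
    terms: list[str] = []
    for word in words:
        terms.extend(word_bigrams(word))       # Rule 2: bigrams of each word
    joined = "".join(words)
    terms.append(joined)                       # Rule 3: whole string without spaces
    # Rule 4 (assumed reading): dictionary words found in the spaceless string
    terms.extend(entry for entry in sorted(dictionary) if entry in joined)
    return list(dict.fromkeys(terms))          # drop duplicates, keep first-seen order


print(build_index_terms("한국 도서관"))
# ['한국', '도서', '서관', '한국도서관', '도서관']
```

Combining the methods lets the bigrams supply recall while the whole-string and dictionary terms give exact, meaningful hooks for precise queries.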
-
A Case Study
Yonsei University Library
E.g.) … / … (Rule 1); … (Rule 2); … (Rule 3); … (Rule 4)
Index: …
-
Search Tips
-
Search Tips (1): Keyword Search
Default search option
Use at most 3 keywords
Use Boolean operators
Omit stop-words
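As a rough illustration of combining keywords with Boolean operators, the Python sketch below evaluates a query such as `한국 AND 역사 NOT 문학` over a toy inverted index. The record numbers, the operator spelling and the strict left-to-right evaluation are assumptions; the OPAC's actual query syntax is not described in the slides.

```python
# Hypothetical inverted index: term -> set of record IDs.
INDEX: dict[str, set[int]] = {
    "한국": {1, 2, 3},
    "역사": {2, 3, 4},
    "문학": {3, 5},
}


def boolean_search(query: str, index: dict[str, set[int]]) -> set[int]:
    """Evaluate 'term OP term OP term ...' strictly left to right.

    OP is AND, OR or NOT (NOT meaning "and not"). This sketch has no
    operator precedence and no parentheses.
    """
    tokens = query.split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for op, term in zip(tokens[1::2], tokens[2::2]):
        postings = index.get(term, set())
        if op == "AND":
            result &= postings
        elif op == "OR":
            result |= postings
        elif op == "NOT":
            result -= postings
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


print(sorted(boolean_search("한국 AND 역사", INDEX)))         # [2, 3]
print(sorted(boolean_search("한국 AND 역사 NOT 문학", INDEX)))  # [2]
print(sorted(boolean_search("문학 OR 역사", INDEX)))          # [2, 3, 4, 5]
```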
-
Search Tips (2): Keyword Search
Follow the Korean word division (spacing) rules. E.g.) … (O, correct) … (X, incorrect)
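Why spacing matters can be sketched with a bigram index built inside each word (Rules 1 and 2 of the case study). The matching rule used here, that every query bigram must occur in the record, is an assumption for illustration. A query spaced as 아버지 가방에 ("in father's bag") creates the bigram 가방, which the correctly spaced title never produced.

```python
def word_bigrams(text: str) -> set[str]:
    """Bigrams taken inside each space-separated word (no cross-word bigrams)."""
    return {w[i:i + 2] for w in text.split() for i in range(len(w) - 1)}


def matches(record: str, query: str) -> bool:
    """Assumed matching rule: every query bigram must occur in the record."""
    return word_bigrams(query) <= word_bigrams(record)


record = "아버지가 방에 들어가신다"
print(matches(record, "아버지가 방에"))  # True: spacing follows the rules
print(matches(record, "아버지 가방에"))  # False: 가방 is not in the index
```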
-
Search Tips (3): Keyword Search
Compound nouns: do not use spaces between the nouns. E.g.) … (O), … (X)
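Rule 3 of the case study adds the whole string with its spaces removed, so a compound noun typed without spaces can be compared with that term however the title itself was spaced. A minimal sketch, assuming an exact comparison against the Rule 3 term; the titles are hypothetical.

```python
def rule3_term(text: str) -> str:
    """Rule 3: the whole string with the spaces removed."""
    return "".join(text.split())


# Hypothetical catalogue titles: one spaced, one written solid, one different.
titles = ["중앙 도서관", "중앙도서관", "도서관 중앙"]

query = "중앙도서관"  # compound noun typed without spaces
hits = [title for title in titles if rule3_term(title) == query]
print(hits)  # ['중앙 도서관', '중앙도서관']
```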
-
Search Tips (4): Browse Search
"Begin with" or truncation
Use when you already know the first word of the title, author, or publisher. E.g.) …
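A "begin with" (right-truncated) browse can be sketched as a prefix lookup in a sorted heading list: headings sharing a prefix sit next to each other, so a binary search finds the first one and a forward scan collects the rest. The function name and the titles below are hypothetical; the sort is plain Unicode code-point order, not a library filing order.

```python
from bisect import bisect_left


def begins_with(headings: list[str], prefix: str) -> list[str]:
    """Return all headings that start with `prefix`.

    `headings` must be sorted. bisect_left jumps to the first heading that
    is not smaller than `prefix`; the scan stops at the first non-match.
    """
    start = bisect_left(headings, prefix)
    result = []
    for heading in headings[start:]:
        if not heading.startswith(prefix):
            break
        result.append(heading)
    return result


titles = sorted(["한국사", "한국문학", "한글", "일본미술"])  # hypothetical titles
print(begins_with(titles, "한국"))  # ['한국문학', '한국사']
```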
-
Search Tips (5): Browse Search
Korean classics. E.g.) …
-
Search Tips (6): Exact Match
Precise search
Known items. E.g.) …
-
Search Tips (7): Exact Match
Single-character words. E.g.) …, …, C
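One plausible reason single-character words call for exact match is that a bigram index cannot represent them at all. The Python sketch assumes a keyword search that compares per-word bigrams; the titles are hypothetical.

```python
def word_bigrams(text: str) -> set[str]:
    """Bigrams taken inside each space-separated word."""
    return {w[i:i + 2] for w in text.split() for i in range(len(w) - 1)}


def keyword_search(titles: list[str], query: str) -> list[str]:
    """Assumed bigram keyword search: all query bigrams must occur in the title."""
    wanted = word_bigrams(query)
    if not wanted:  # a one-character word yields no bigram to look up
        return []
    return [t for t in titles if wanted <= word_bigrams(t)]


def exact_search(titles: list[str], query: str) -> list[str]:
    """Exact match: compare the whole field, ignoring case and outer spaces."""
    q = query.strip().casefold()
    return [t for t in titles if t.strip().casefold() == q]


titles = ["꽃", "꽃과 나무", "C", "C++ Primer"]  # hypothetical titles
print(word_bigrams("꽃"))            # set()
print(keyword_search(titles, "꽃"))  # []
print(exact_search(titles, "꽃"))    # ['꽃']
print(exact_search(titles, "c"))     # ['C']
```

Returning nothing for an empty bigram set is a choice made in this sketch; matching every record would be the opposite extreme. Neither finds the one-character title the way a whole-field comparison does.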
-
Search Tips (8): Hangul/Hancha (Chinese character) searching is supported
E.g.) … / …
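One way an OPAC could let Hangul and Hancha queries meet is to reduce both to Hangul readings before comparing. The Python sketch assumes that approach; `HANJA_READING` is a hypothetical five-entry table, whereas a real one needs thousands of characters, multiple readings, and handling of the initial-sound rule.

```python
# Hypothetical Hanja -> Hangul reading table (a tiny excerpt for illustration).
HANJA_READING = {"大": "대", "學": "학", "校": "교", "韓": "한", "國": "국"}


def to_hangul(text: str) -> str:
    """Replace each Hanja character with its Hangul reading; keep other characters."""
    return "".join(HANJA_READING.get(ch, ch) for ch in text)


def same_term(a: str, b: str) -> bool:
    """Treat two strings as the same search term if their Hangul readings agree."""
    return to_hangul(a) == to_hangul(b)


print(to_hangul("韓國大學"))          # 한국대학
print(same_term("大學校", "대학교"))  # True
```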
-
Search Tips (9)
Japanese kana, archaic Korean, Russian, and special characters: choose the script from the Multi-language Input Table
-
E.g.) Multi-Script Input Table
-
Search Tips (10)
Japanese kana: … / … / …
-
Search Tips (11)
Personal names: … ; Shakespeare ; Murakami, Haruki ; … ; …, …
-
Search Tips (12)
A space is treated as AND. E.g.) … … = … AND …
In some OPACs, however, spaces in character fields do make a difference in retrieval.
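Treating a space as AND amounts to intersecting the posting sets of the query words. A minimal Python sketch over a hypothetical inverted index (the terms and record numbers are invented):

```python
# Hypothetical inverted index: term -> set of record IDs.
INDEX: dict[str, set[int]] = {"한국": {1, 2, 3}, "역사": {2, 3, 4}, "문학": {3, 5}}


def space_as_and(query: str, index: dict[str, set[int]]) -> set[int]:
    """Treat every space-separated word as a required term (implicit AND)."""
    result = None
    for word in query.split():
        postings = index.get(word, set())
        result = set(postings) if result is None else result & postings
    return result or set()


print(sorted(space_as_and("한국 역사", INDEX)))       # [2, 3]
print(sorted(space_as_and("한국 역사 문학", INDEX)))  # [3]
```

Each extra word can only shrink the result, which is why adding words narrows a search under this convention.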
-
Comparative search: with and without spaces
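The difference can be sketched with a simplified index that holds each word (Rule 1) and the spaceless string (Rule 3) but leaves out bigrams (Rule 2) and dictionary words (Rule 4). With a space, the query words are ANDed; without a space, the query is one term. The records are hypothetical.

```python
def index_terms(title: str) -> set[str]:
    """Simplified index: each word (Rule 1) plus the spaceless string (Rule 3)."""
    words = title.split()
    return set(words) | {"".join(words)}


def search(records: dict[int, str], query: str) -> list[int]:
    """Space-separated query words are ANDed together."""
    wanted = set(query.split())
    return [rid for rid, title in records.items() if wanted <= index_terms(title)]


records = {1: "한국 역사", 2: "한국역사", 3: "역사 속의 한국"}  # hypothetical
print(search(records, "한국 역사"))  # [1, 3]
print(search(records, "한국역사"))   # [1, 2]
```

With the space, record 3 matches because both words occur somewhere in it; without the space, only titles whose spaceless form is exactly that term match, so record 2 is found instead.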
-
Thank You