INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 2 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics

Upload: eileen-webster

Post on 23-Dec-2015

TRANSCRIPT

  • Slide 1
  • Slide 2
  • INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 2 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics
  • Slide 3
  • Inverse Document Frequency (IDF)
  • Slide 4
  • Inverse Document Frequency
  • Slide 5
  • Slide 6
  • Document/Term Matrix
  • Slide 7
  • Weight Factor Computation
  • Slide 8
  • VSM Pros and Cons
    Benefits:
      • Documents can be ordered by importance, that is, ranked by their similarity to the query
      • Threshold display limits are easy to honor
      • Documents similar to the query that are retrieved early can be used for relevance feedback
    Drawbacks:
      • The assumption that terms are orthogonal is false
      • Some vector operations have no theoretical justification
  • Slide 9
  • References
    Sources:
      • Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press.
      • Gerard Salton, Automatic Text Processing, Addison-Wesley Publishing.
  • Slide 10
  • The end of the second in-depth description of the vector space model slide show has come. End of the Slides
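
The formulas on the IDF, document/term matrix and weight-factor slides did not survive in this transcript, so the following is only a minimal sketch of how those pieces usually fit together, not the deck's own computation. It assumes the common textbook weighting (raw term frequency times log10(N/df)) and cosine similarity for ranking. The function names `build_weights`, `cosine` and `rank`, and the `threshold` parameter, are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def build_weights(docs):
    """Build a tf-idf document/term matrix from tokenized documents.

    Assumption: weight = tf * log10(N / df), a common textbook form;
    the slides' exact weighting formula is not in the transcript.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}
    vocab = sorted(idf)
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        matrix.append([tf[term] * idf[term] for term in vocab])
    return vocab, idf, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs, threshold=0.0):
    """Return (score, doc_index) pairs above the threshold, best match first."""
    vocab, idf, matrix = build_weights(docs)
    q_tf = Counter(query)
    # Query terms outside the vocabulary are ignored.
    q_vec = [q_tf[term] * idf[term] for term in vocab]
    scored = [(cosine(q_vec, row), i) for i, row in enumerate(matrix)]
    return sorted([(s, i) for s, i in scored if s > threshold], reverse=True)


docs = [
    ["vector", "space", "model"],
    ["inverse", "document", "frequency"],
    ["vector", "model", "retrieval", "model"],
]
for score, index in rank(["vector", "model"], docs):
    print(index, round(score, 3))
```

The sketch illustrates two of the listed benefits: the scores give an ordering of the documents, and raising `threshold` limits how many are displayed. It also inherits the first drawback, since each term is treated as its own independent axis.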