bsc thesis part1 - web based information retrieval

Upload: tpitikaris

Post on 07-Apr-2018

  • 8/6/2019 Bsc Thesis part1 - WEB BASED INFORMATION RETRIEVAL

    WEB BASED INFORMATION RETRIEVAL

    by Theodoros Pitikaris

    A thesis submitted in partial fulfillment of the requirements for the degree of:

    BSc in Computing and Information Technology

    Department of Computing

    University of Surrey

    UNIVERSITY OF SURREY

    ABSTRACT

    WEB BASED INFORMATION RETRIEVAL

    by

    Theodoros Pitikaris

    Supervisory Committee: Dr. Bogdan Vrusias

    Department of Computing

    Dr. Nick Antonopoulos

    Department of Computing

    The World Wide Web contains large sets of information. This characteristic of the Web, however, can become a real pain for users who seek sources that are both of high quality and relevant to their information needs. In this Final Year project we examine some information retrieval methods over web-stored information. The main focus is on whether and how software agents could enhance the information retrieval process.

    Another topic that we examine in this Final Year project is the requirements, phases and evaluation process that are necessary in the software design and production process.

    Project development process .... 54
    Chapter 5. Software Development PHASES in Details .... 58
    Design Overview .... 58
    Facilities .... 58
    The core system .... 59
    Software development platform .... 59
    Integrated Development Environment Development .... 60
    System Design .... 61
    Unit Testing .... 69
    Integration Testing .... 70
    Chapter 6. DISCUSSION .... 72
    Interesting parts during development process .... 72
    Prototype evaluation .... 72
    Comments on the evaluation results and related work .... 74
    Overall project Evaluation .... 75
    Chapter 7. Conclusions .... 77
    Future work .... 78
    INDEX .... 83

    LIST OF TABLES

    Table 1 Agile vs Waterfall methodology (available from http://en.wikipedia.org/wiki/Agile_software_development) .... 29
    Table 2 Development Phases .... 57
    Table 3 Sample of a Matrix candidate for SVD .... 64

    LIST OF FIGURES

    Figure 1 Google database development .... 6
    Figure 2 The Waterfall Model .... 26
    Figure 4 Waterfall vs. Agile .... 28
    Figure 5 System Use Case .... 62
    Figure 6 System State Diagram .... 63
    Figure 7 Users' opinion about the system .... 74

    ACKNOWLEDGMENTS

    The author wishes to express sincere appreciation to Mr Staurakakis Emanuel and Mr Tsagatsakis John for their assistance in the preparation of this Final Year Project report.

    INTRODUCTION

    In 2001 the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel was awarded to George Akerlof, Michael Spence and Joseph Stiglitz for their analyses of markets with asymmetric information.

    With their work (http://www.nobel.se/economics/laureates/2001/ecoadv.pdf) they validated not only the importance of information but also the importance of access to that information.

    Nowadays everyone in the West, especially since the development of the internet, has access to large amounts of data, in electronic or paper form. The main problem we usually face is that the volume of this information is so large that we cannot easily handle it or, worse, that it is of no use to us.

    In order to take advantage of this information we need to categorize it by theme and thus turn it into manageable data. A few decades ago this was the librarians' job, but, as already mentioned, the volume of data has increased so dramatically (Society, 2004) that the traditional methods of indexing are not in a position to face this new challenge.

    The problem gets bigger when we need to categorize new documents based on their content. Of course, many documents have an abstract at the top, but in fact only documents with a special purpose, such as scientific papers, have this form. For example, an abstract is essential for a paper but not for a newspaper or a magazine.

    Some people believe that when we talk about retrieving dat