MSE Portfolio - Eric Davis (people.cis.ksu.edu/~efd3467/MSE_Portfolio-Eric_Davis.pdf)


MASTER OF SOFTWARE ENGINEERING PORTFOLIO

by

ERIC F. DAVIS

B.S., Kansas State University, 2003

A REPORT

submitted in partial fulfillment of the requirements for the degree

MASTER OF SOFTWARE ENGINEERING

Department of Computing and Information Sciences

College of Engineering

KANSAS STATE UNIVERSITY

Manhattan, Kansas

2008

Approved by:

Major Professor

Dr. William H. Hsu

Abstract

The KDD-Research Entity Search Tool (KREST) is a standalone application that allows a user to find specific pieces of data available on the World Wide Web. It supports web crawling, web searching, and entity searching. Web searches and entity searches can be performed on web pages that are loaded into the program or on pages that were crawled using the application's web crawler. The benefit of an all-in-one tool like KREST is that it lets users find the specific piece of contact information they want, such as an email address, a phone number, or a street address. It aims to replace the current tedious search method of opening an Internet browser, going to a search engine, trying to determine the proper search term, and then wading through matching pages until the desired information is found.

Table of Contents

List of Figures

List of Tables

CHAPTER 1 - Vision Document

CHAPTER 2 - Project Plan

CHAPTER 3 - Software Quality Assurance Plan

CHAPTER 4 - Architectural Design

CHAPTER 5 - Technical Inspection Checklist

CHAPTER 6 - Component Design

CHAPTER 7 - Test Plan

CHAPTER 8 - Test Assessment Evaluation

CHAPTER 9 - User’s Manual

CHAPTER 10 - Project Evaluation

References

Appendix A - Source Metrics

List of Figures

Figure 1.1 Project Overview

Figure 1.2 Project Block Diagram

Figure 1.3 KREST Data Flow Diagram

Figure 1.4 System Use Case

Figure 2.1 Project Schedule

Figure 4.1 Package View

Figure 4.2 KREST Application Package

Figure 4.3 Controller Package

Figure 4.4 KrestController Class

Figure 4.5 KrestAboutDialog Class

Figure 4.6 WebCrawler Class

Figure 4.7 SiteVisitor Class

Figure 4.8 ThreadController Class

Figure 4.9 HTTPReader Class

Figure 4.10 WebSearcher Class

Figure 4.11 EntitySearcher Class

Figure 4.12 View Package

Figure 4.13 KrestView Class

Figure 4.14 CrawlerObserver Class

Figure 4.15 SearchObserver Class

Figure 4.16 EntityObserver Class

Figure 4.17 Model Package

Figure 4.18 KrestModel Class

Figure 4.19 KrestObjectLibrary Class

Figure 4.20 WebObject Class

Figure 4.21 Webpage Class

Figure 4.22 KrestEntity Class

Figure 4.23 AddressEntity Class

Figure 4.24 EmailEntity Class

Figure 4.25 FaxEntity Class

Figure 4.26 PhoneEntity Class

Figure 4.27 ZipEntity Class

Figure 4.28 OverarchingEntity Class

Figure 4.29 Web Crawl Sequence Diagram

Figure 4.30 Web Search Sequence Diagram

Figure 4.31 Entity Search Sequence Diagram

Figure 6.1 Package View

Figure 6.2 KREST Application Package

Figure 6.3 Controller Package

Figure 6.4 KrestController Class

Figure 6.5 KrestAboutDialog Class

Figure 6.6 FileLoader Class

Figure 6.7 WebCrawler Class

Figure 6.8 SiteVisitor Class

Figure 6.9 ThreadController Class

Figure 6.10 HTTPReader Class

Figure 6.11 WebSearcher Class

Figure 6.12 EntitySearcher Class

Figure 6.13 View Package

Figure 6.14 KrestView Class

Figure 6.15 CrawlerObserver Class

Figure 6.16 SearchObserver Class

Figure 6.17 EntityObserver Class

Figure 6.18 Model Package

Figure 6.19 KrestModel Class

Figure 6.20 KrestObjectLibrary Class

Figure 6.21 WebObject Class

Figure 6.22 Webpage Class

Figure 6.23 KrestEntity Class

Figure 6.24 AddressEntity Class

Figure 6.25 EmailEntity Class

Figure 6.26 FaxEntity Class

Figure 6.27 PhoneEntity Class

Figure 6.28 ZipEntity Class

Figure 6.29 OverarchingEntity Class

Figure 9.1 Opening KREST Screen

Figure 9.2 Completed Breadth-First Web Crawl

Figure 9.3 Depth-First Crawl in Progress

Figure 9.4 Saving a Web Crawl

Figure 9.5 Stopping a Web Crawl

Figure 9.6 Resetting a Web Crawl

Figure 9.7 Performing a Web Search

Figure 9.8 Filtering the Web Search by Back-link Count

Figure 9.9 Performing an Entity Search

Figure 9.10 How to Load Data into KREST

Figure 9.11 How to Save Entity Search Results

Figure 9.12 KREST Application with Exit Methods Circled

Figure 9.13 How to Access the Help Menu

Figure 10.2 Phase Breakdown

Figure 10.3 Project Activity Breakdown

Figure 10.4 Phase 1 Activity Breakdown

Figure 10.5 Phase 2 Activity Breakdown

Figure 10.6 Phase 3 Activity Breakdown

List of Tables

Table 2.1 COCOMO Effort Adjustment Factors

Table 2.2 Project Effort Adjustment Factor Values

Table 5.1 Technical Inspection Checklist

Table 6.1 Detailed Description of the KrestApplication Class

Table 6.2 Detailed Description of the KrestController Class

Table 6.3 Detailed Description of the KrestAboutDialog Class

Table 6.4 Detailed Description of the FileLoader Class

Table 6.5 Detailed Description of the WebCrawler Class

Table 6.6 Detailed Description of the SiteVisitor Class

Table 6.7 Detailed Description of the ThreadController Class

Table 6.8 Detailed Description of the HTTPReader Class

Table 6.9 Detailed Description of the WebSearcher Class

Table 6.10 Detailed Description of the EntitySearcher Class

Table 6.11 Detailed Description of the KrestView Class

Table 6.12 Detailed Description of the CrawlerObserver Class

Table 6.13 Detailed Description of the SearchObserver Class

Table 6.14 Detailed Description of the EntityObserver Class

Table 6.15 Detailed Description of the KrestModel Class

Table 6.16 Detailed Description of the KrestObjectLibrary Class

Table 6.17 Detailed Description of the WebObject Class

Table 6.18 Detailed Description of the Webpage Class

Table 6.19 Detailed Description of the KrestEntity Class

Table 6.20 Detailed Description of the AddressEntity Class

Table 6.21 Detailed Description of the EmailEntity Class

Table 6.22 Detailed Description of the FaxEntity Class

Table 6.23 Detailed Description of the PhoneEntity Class

Table 6.24 Detailed Description of the ZipEntity Class

Table 6.25 Detailed Description of the OverarchingEntity Class

Table 7.1 Test Case 1

Table 7.2 Test Case 2

Table 7.3 Test Case 3

Table 7.4 Test Case 4

Table 7.5 Test Case 5

Table 8.1 Test Results Summary

Table 8.2 Test Log for Test Case 1

Table 8.3 Test Log for Test Case 2

Table 8.4 Test Log for Test Case 3

Table 8.5 Test Log for Test Case 4

Table 8.6 Test Log for Test Case 5

Table 10.1 Project Phase Completion Dates

CHAPTER 1 - Vision Document

1 Introduction

1.1 Motivation

The motivation for this project is to improve upon the current state of web searching. Searching the web for contact information is currently a tedious task: it involves trying to determine a proper search string and then wading through matching pages looking for the desired contact information. The goal of this project is to improve this process by providing the contact information after a single search, without requiring the user to wade through pages that match the search string.

1.2 KDD-Research Entity Search Tool (KREST)

The KDD-Research Entity Search Tool (KREST) is intended to solve the search-related problems described above. It breaks web pages apart into entities, such as email addresses, phone numbers, and fax numbers. Specific entity results are returned to the user, rather than the list of matching pages that a traditional search returns. KREST also allows the user to perform a web crawl from a given starting web page and to perform a traditional web search in addition to an entity search.
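The vision document does not say how KREST recognizes entities inside a page. As a minimal sketch, assuming plain regular expressions are good enough for a first pass, the Java class below breaks a page's text into email addresses and US-style phone numbers. The class name `EntityExtractor`, its methods, and both patterns are hypothetical; the sketch also does not strip HTML or tell fax numbers apart from phone numbers, which would need extra context such as a nearby “Fax” label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based entity extractor (illustrative only, not the KREST code). */
class EntityExtractor {
    // Simplified patterns, assumed for illustration; real pages need far more robust rules.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("email",
                Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"));
        // US-style numbers such as (555) 555-0100 or 555-555-0100.
        PATTERNS.put("phone",
                Pattern.compile("\\(?\\d{3}\\)?[ .-]?\\d{3}[ .-]\\d{4}"));
    }

    /** Returns every match of the given entity type in the page text, in document order. */
    static List<String> extract(String pageText, String entityType) {
        Pattern pattern = PATTERNS.get(entityType);
        if (pattern == null) {
            throw new IllegalArgumentException("Unsupported entity type: " + entityType);
        }
        List<String> entities = new ArrayList<>();
        Matcher m = pattern.matcher(pageText);
        while (m.find()) {
            entities.add(m.group());
        }
        return entities;
    }

    /** Breaks a page into all supported entity types at once, keyed by type. */
    static Map<String, List<String>> extractAll(String pageText) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        for (String type : PATTERNS.keySet()) {
            byType.put(type, extract(pageText, type));
        }
        return byType;
    }
}
```

For example, on the text `Contact help@example.com or call (555) 555-0100.`, `extract(text, "email")` yields `help@example.com` and `extract(text, "phone")` yields `(555) 555-0100`.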

Entity search works by allowing the user to specify the type of information they are looking for. To the end user, entity search will feel somewhat like querying a database: only the requested information is returned, without any additional filler. Rather than being forced to search for a general term like “Amazon Customer Service”, the user can specify that they are looking for the Amazon Customer Service phone number by entering a search term like “Amazon Customer Service #phone”. Similarly, a user looking for the email addresses of the Kansas State University professors who teach database courses could search for them directly with a search term such as “Kansas State University professor database #email”. Upon receiving a search term, the entity search tool will look for the pages that match the search text. Those pages will then be broken apart to extract the requested entities (if they exist on those pages). The matching entities will then be returned to the user, along with links to the pages that contained them, in case the user wishes to verify the information.
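The query flow described above (parse the search term, find matching pages, extract the requested entities, and return them with links) can be sketched in Java as follows. This is a minimal sketch under assumptions the document does not state: a `#type` token names a requested entity type, a page matches when its text contains every remaining keyword (case-insensitively), and pages are held in memory as a URL-to-text map. The `EntityQuery`, `EntityHit`, and `EntitySearchSketch` names are hypothetical, and the entity extractor is passed in as a function so any extraction strategy can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical parsed query: plain keywords plus the entity types requested with '#'. */
record EntityQuery(List<String> keywords, List<String> entityTypes) {

    /** Splits e.g. "Amazon Customer Service #phone" into [amazon, customer, service] and [phone]. */
    static EntityQuery parse(String raw) {
        List<String> keywords = new ArrayList<>();
        List<String> types = new ArrayList<>();
        for (String token : raw.trim().split("\\s+")) {
            if (token.startsWith("#") && token.length() > 1) {
                types.add(token.substring(1).toLowerCase(Locale.ROOT));
            } else if (!token.isEmpty()) {
                keywords.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return new EntityQuery(List.copyOf(keywords), List.copyOf(types));
    }
}

/** One result: the entity value plus the URL of the page it came from, for verification. */
record EntityHit(String type, String value, String sourceUrl) {}

class EntitySearchSketch {

    /**
     * pages: URL -> page text. extractor: (entityType, pageText) -> entity values.
     * Both representations are assumptions of this sketch.
     */
    static List<EntityHit> search(String rawQuery, Map<String, String> pages,
                                  BiFunction<String, String, List<String>> extractor) {
        EntityQuery query = EntityQuery.parse(rawQuery);
        List<EntityHit> hits = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String lower = page.getValue().toLowerCase(Locale.ROOT);
            // Step 1: a page matches only if it contains every keyword (assumed rule).
            if (!query.keywords().stream().allMatch(lower::contains)) {
                continue;
            }
            // Step 2: break the matching page apart into the requested entity types.
            for (String type : query.entityTypes()) {
                for (String value : extractor.apply(type, page.getValue())) {
                    hits.add(new EntityHit(type, value, page.getKey()));
                }
            }
        }
        return hits;
    }
}
```

The sketch uses records, so it assumes Java 16 or later. Returning the source URL with every hit mirrors the point above that users may want to verify where an entity came from.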

1.3 Terms & Definitions

Actor – For UML purposes, the actor is the end user of the system.

Entity – A specific piece of information, such as an email address or a phone number.

Knowledge Discovery in Databases (KDD) – A group headed by Dr. William Hsu whose primary focus is data mining.

Sequence Diagram – A UML diagram used to show the order in which objects interact over a period of time.

Unified Modeling Language (UML) – A standard graphical notation used to model software systems and the real-world objects they represent.

Use Case Diagram – A behavioral diagram defined by UML. It provides a graphical depiction of system functionality in terms of actors and the use cases they take part in.

2 Project Overview

The Project Overview section provides information about the structure and goals of the KREST project.

Figure 1.1 Project Overview

2.1 Introduction

Figure 1.1 provides a high-level overview of what the KREST project aims to achieve. KREST will allow the user to perform web crawls, web searches, and entity searches all within the same tool. It will be a self-contained application that works independently of the user’s normal Internet browser. The KREST environment will update, and read data from, a database that stores previously crawled web pages.

Figure 1.2 Project Block Diagram

Figure 1.2 provides a block diagram of how the KREST project will operate. The user will interact with the KREST tool within the KREST environment. The KREST environment makes calls to the Application Level, where the Web Crawler Service, the Web Search Service, or the Entity Search Service performs the work. Each of these services uses the Website Database in the Storage Level. All of this work is performed on the Java Virtual Machine, which in turn runs on the user’s actual system hardware.
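To make the layering in Figure 1.2 concrete, here is a minimal Java sketch, assuming each Application Level service receives a shared handle to the Storage Level. The `WebsiteDatabase` here is an in-memory map standing in for the real database, and `KrestEnvironment`, the service interfaces, and `SimpleWebSearchService` are hypothetical names rather than the classes of the KREST architectural design. Only the web search service is implemented; the crawler and entity search services appear as interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Storage Level: hypothetical in-memory stand-in for the Website Database. */
class WebsiteDatabase {
    private final Map<String, String> pages = new LinkedHashMap<>(); // URL -> page text

    void store(String url, String text) { pages.put(url, text); }

    Map<String, String> allPages() { return pages; }
}

/** Application Level: the three services from Figure 1.2 (names are hypothetical). */
interface WebCrawlerService { void crawl(String startUrl, int maxPages); }
interface WebSearchService { List<String> search(String term); }
interface EntitySearchService { List<String> entitySearch(String query); }

/** Only the web search service is filled in; it reads from the shared database. */
class SimpleWebSearchService implements WebSearchService {
    private final WebsiteDatabase db;

    SimpleWebSearchService(WebsiteDatabase db) { this.db = db; }

    /** Traditional page-level search: URLs of pages whose text contains the term. */
    @Override
    public List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, String> page : db.allPages().entrySet()) {
            if (page.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                urls.add(page.getKey());
            }
        }
        return urls;
    }
}

/** KREST environment (GUI layer): delegates user actions to application-level services. */
class KrestEnvironment {
    private final WebSearchService webSearch;

    KrestEnvironment(WebSearchService webSearch) { this.webSearch = webSearch; }

    /** What a "Search" button handler might call. */
    List<String> onSearch(String term) { return webSearch.search(term); }
}
```

Keeping the GUI layer dependent only on service interfaces means a service can be swapped (for example, for a database-backed search) without touching the GUI code.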

Figure 1.3 KREST Data Flow Diagram

Figure 1.3 shows how data will be used throughout the program, especially for entity searching. The database will contain web pages, and each specific entity instance will link back to the web page it was extracted from.

2.2 Project Goal

The goal of the KREST project is to create an application that can perform entity searches on either previously loaded data or crawled web pages. The project should be able to reproduce the findings of Tao Cheng’s entity search work [2], which searches for contact information in a publicly available data set of web pages.

2.3 Project Purpose

The purpose of the KREST project is to provide a tool that enhances web searching by way of entity search, delivered as a standalone application that will speed up searches on the client end. The developed application will act as a platform on which future KDD students can perform entity search testing, and will provide a good base for future entity search enhancements.

3 Project Requirements

The Project Requirements section details all of the requirements for the KREST project. Each requirement is discussed in detail, along with its requirement number and the planned release that will fulfill it (i.e., Demo 1, Demo 2, or Final Release). All of the project’s critical requirements are noted.

Figure 1.4 System Use Case

The requirements are broken out into four distinct sections based on the Use Case diagram in Figure 1.4: Application Requirements, Web Crawler Requirements, Web Search Requirements, and Entity Search Requirements. This division makes it easier to track requirements across the different parts of the application, and to refine and add requirements as the project progresses.

3.1 Application Requirements

This section details all of the requirements related to the main application that are not specific to the web crawler, the web search, or the entity search pieces. The requirements are numbered ARI 1XX, where ARI stands for Application Requirement Item.

3.1.1 ARI 100 [Critical Requirement]

The program shall provide a GUI for user interaction. This is a critical requirement because the usefulness of the system would be extremely limited if it offered only a command-line interface.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.1.2 ARI 101

The application shall be executable in a single step (i.e., without having to perform any setup steps).

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.1.3 ARI 102

The application shall have a menu bar that contains, at a minimum, a File menu and a Help menu.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.1.4 ARI 103 [Critical Requirement]

The application shall allow the user to load a data set of web pages. This is a critical requirement because in order to reproduce the findings of [2], the same data set needs to be used.

• Build Release Applicability: Final Release

3.1.5 ARI 104

The application shall allow the user to save entity search results.

• Build Release Applicability: Final Release

3.1.6 ARI 105

The application's Help menu shall contain, at a minimum, an ‘About’ menu item.

• Build Release Applicability: Demo 2, Final Release

3.1.7 ARI 106

The application's menu bar shall contain shortcut keys.

• Build Release Applicability: Demo 2, Final Release

3.1.8 ARI 107 [Critical Requirement]

The application shall be platform independent. This is a critical requirement because, while the application is being developed on Windows, the goal is to allow it to be used on Linux and Unix as well.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.1.9 ARI 108

The application shall be able to be minimized.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.1.10 ARI 109

The user shall be able to close the application without having to press Control-C at the command line.

• Build Release Applicability: Demo 1, Demo 2, Final Release
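Requirements ARI 102, ARI 105, ARI 106, and ARI 109 all concern the menu bar and shutting the program down. The document states only that KREST runs on the Java Virtual Machine; assuming the GUI is built with Swing, a minimal sketch that satisfies these requirements could look like the following. The `KrestMenus` class, the menu contents, and the particular mnemonics and accelerators (F, H, Ctrl+Q, F1) are illustrative choices, not taken from KREST.

```java
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;
import javax.swing.JMenu;
import javax.swing.JMenuBar;
import javax.swing.JMenuItem;
import javax.swing.JOptionPane;
import javax.swing.KeyStroke;
import javax.swing.SwingUtilities;

/** Hypothetical menu bar covering ARI 102, 105, 106 and 109. */
class KrestMenus {

    static JMenuBar buildMenuBar(JFrame owner) {
        JMenuBar bar = new JMenuBar();

        // ARI 102: a File menu, with a mnemonic for keyboard access (ARI 106).
        JMenu file = new JMenu("File");
        file.setMnemonic(KeyEvent.VK_F);
        JMenuItem exit = new JMenuItem("Exit", KeyEvent.VK_X);
        exit.setAccelerator(KeyStroke.getKeyStroke(KeyEvent.VK_Q, InputEvent.CTRL_DOWN_MASK));
        // ARI 109: close through the normal window-closing path, no Ctrl-C needed.
        exit.addActionListener(e ->
                owner.dispatchEvent(new WindowEvent(owner, WindowEvent.WINDOW_CLOSING)));
        file.add(exit);

        // ARI 102 and ARI 105: a Help menu containing an About item.
        JMenu help = new JMenu("Help");
        help.setMnemonic(KeyEvent.VK_H);
        JMenuItem about = new JMenuItem("About", KeyEvent.VK_A);
        about.setAccelerator(KeyStroke.getKeyStroke(KeyEvent.VK_F1, 0));
        about.addActionListener(e -> JOptionPane.showMessageDialog(owner,
                "KDD-Research Entity Search Tool (KREST)", "About",
                JOptionPane.INFORMATION_MESSAGE));
        help.add(about);

        bar.add(file);
        bar.add(help);
        return bar;
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("KREST");
            // A standard decorated frame can be minimized (ARI 108), and closing it
            // ends the program (ARI 109).
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setJMenuBar(buildMenuBar(frame));
            frame.setSize(640, 480);
            frame.setVisible(true);
        });
    }
}
```

Dispatching a `WINDOW_CLOSING` event from the Exit item, rather than calling `System.exit` directly, routes both the menu item and the window's close button through the same `EXIT_ON_CLOSE` handling.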

3.2 Web Crawler Requirements

This section details all of the requirements related to the web crawling portion of the project. The requirements are numbered WCRI 1XX, where WCRI stands for Web Crawling Requirement Item.

3.2.1 WCRI 100 [Critical Requirement]

The user shall have the ability to perform a web crawl from a starting website. This is a critical requirement because without the web crawling portion of the project, its usefulness would be extremely limited (it could only use data sets loaded by the user). Allowing user-specified web crawls lets the user tailor the search to their needs.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.2.2 WCRI 101 [Critical Requirement]

The user shall be allowed to specify the starting website (if none is specified, http://www.cis.ksu.edu will be used). This is a critical requirement because the starting point determines which pages the crawl can reach. Without the ability to specify the starting point, the web crawler would be of little use.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.2.3 WCRI 102

The user shall have the ability to specify the maximum depth of the web crawl.

• Build Release Applicability: Demo 2, Final Release

3.2.4 WCRI 103

The user shall have the ability to specify a log file in which to save the results of the

crawl.

• Build Release Applicability: Demo 2, Final Release

3.2.5 WCRI 104 [Critical Requirement]

The user shall be allowed to specify the maximum number of websites to crawl before stopping. This is a critical requirement because if the user could not specify how many websites to crawl, the limit would have to be fixed by the application, which is not a good solution. Allowing the user to specify the maximum number of websites gives much better control over the web crawl.

• Build Release Applicability: Demo 1, Demo 2, Final Release
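The crawl limits in WCRI 100 through 104 (a default start page, a maximum depth, and a maximum number of pages) can be illustrated with a bounded breadth-first traversal. This is a minimal sketch, not the KREST implementation. The names `BoundedCrawler` and `LinkSource` are hypothetical. Links are supplied by a callback so that the example runs without network access; a real crawler would fetch and parse each page at that point.

```java
import java.util.*;

/** Hypothetical sketch of a bounded breadth-first crawl (not the KREST code). */
class BoundedCrawler {
    /** Default start page named in WCRI 101. */
    static final String DEFAULT_START = "http://www.cis.ksu.edu";

    /** Supplies the outgoing links of a page; a real crawler would fetch and parse it. */
    interface LinkSource {
        List<String> linksOf(String url);
    }

    /**
     * Visits pages breadth-first from the start URL. It does not follow links from
     * pages that are maxDepth links away from the start, and it stops after maxPages
     * pages. Returns the URLs in the order they were visited.
     */
    static List<String> crawl(String start, int maxDepth, int maxPages, LinkSource source) {
        String root = (start == null || start.trim().isEmpty()) ? DEFAULT_START : start.trim();
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();           // URLs already queued, to avoid revisits
        Map<String, Integer> depth = new HashMap<>(); // link distance from the start page
        Deque<String> frontier = new ArrayDeque<>();

        seen.add(root);
        depth.put(root, 0);
        frontier.add(root);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            int d = depth.get(url);
            if (d >= maxDepth) {
                continue; // at the depth limit: record the page but do not follow its links
            }
            for (String link : source.linksOf(url)) {
                if (seen.add(link)) {
                    depth.put(link, d + 1);
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

With `maxDepth = 1`, only the start page and the pages it links to directly are visited. Breadth-first order means that the page limit cuts off the most distant pages first.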

3.2.6 WCRI 105

The user shall be allowed to stop the crawl at any time before it finishes.

• Build Release Applicability: Demo 2, Final Release


3.2.7 WCRI 106

The user shall be notified when the crawl is complete.

• Build Release Applicability: Demo 2, Final Release

3.2.8 WCRI 107

The user shall be kept apprised of the total number of pages left to crawl.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.2.9 WCRI 108

The user shall be apprised of the total number of pages crawled.

• Build Release Applicability: Demo 1, Demo 2, Final Release
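WCRI 105 through 108 (stop at any time, report progress, announce completion) suggest a crawl loop that checks a shared stop flag and reports to a listener after each page. The sketch below assumes the loop runs on a single background worker thread and that only the stop flag is shared with the GUI thread. `CrawlListener` and `StoppableCrawlLoop` are hypothetical names. In Swing, the listener would hand its updates to the event dispatch thread.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical callback the GUI could implement to show crawl status. */
interface CrawlListener {
    void progress(int pagesCrawled, int pagesLeft);        // WCRI 107 and 108
    void finished(int pagesCrawled, boolean stoppedEarly); // WCRI 106
}

/** Sketch of a crawl loop that can be stopped from another thread (WCRI 105). */
class StoppableCrawlLoop {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    /** Safe to call from any thread, e.g. a Stop button handler. */
    void requestStop() {
        stopRequested.set(true);
    }

    void run(Deque<String> frontier, CrawlListener listener) {
        int crawled = 0;
        while (!frontier.isEmpty() && !stopRequested.get()) {
            String url = frontier.poll();
            // A real crawler would fetch `url` here and add newly found links to the frontier.
            crawled++;
            listener.progress(crawled, frontier.size());
        }
        // Pages still queued at this point mean the user stopped the crawl early.
        listener.finished(crawled, !frontier.isEmpty());
    }
}
```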

3.2.10 WCRI 109 [Critical Requirement]

The crawler shall follow the Robots Exclusion Protocol. This is a critical requirement because it keeps the web crawler from crawling pages that are not intended to be crawled. If a domain does not provide a robots.txt file, all of its pages will be considered crawlable.

• Build Release Applicability: Demo 2, Final Release
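The sketch below is a deliberately simplified reading of robots.txt. It assumes that only the `User-agent: *` group and plain `Disallow` path prefixes matter, and it ignores `Allow` lines, wildcards, and agent-specific groups, all of which the full Robots Exclusion Protocol supports. The class and method names are hypothetical. As the requirement states, a missing file (passed here as `null`) allows every page.

```java
import java.util.*;

/** Hypothetical, simplified robots.txt rules for the "*" user agent only. */
class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses robots.txt text; null means the site has no robots.txt (allow all). */
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        if (robotsTxt == null) {
            return rules;
        }
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            String line = rawLine;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share one group of rules.
                if (!lastWasAgent) {
                    inStarGroup = false;
                }
                if (value.equals("*")) {
                    inStarGroup = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (inStarGroup && field.equals("disallow") && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    /** True if a path such as "/private/a.html" may be crawled. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```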

3.2.11 WCRI 110 [Critical Requirement]

The crawler shall use multiple threads to avoid putting too much stress on an individual

web host. This is a critical requirement because it will help prevent overloading a web

host with numerous requests one right after another.

• Build Release Applicability: Demo 2, Final Release
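Multiple worker threads help spread requests across hosts. To keep several threads from hitting the same host back to back, one common companion technique is a per-host minimum delay shared by all workers. It is sketched here as an assumption, not as KREST's actual design. `HostPolitenessGate` and its delay parameter are hypothetical, and the delay value would be a tuning choice.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-host rate limiter shared by all crawler worker threads. */
class HostPolitenessGate {
    private final long minDelayMillis;
    // host -> earliest time (ms) at which the next request to that host may be sent
    private final ConcurrentHashMap<String, Long> nextFree = new ConcurrentHashMap<>();

    HostPolitenessGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Claims the next request slot for the host and returns how many milliseconds
     * the caller must wait before sending. compute() runs atomically per key, so
     * two threads can never claim the same slot.
     */
    long reserve(String host, long nowMillis) {
        long slotEnd = nextFree.compute(host, (h, next) -> {
            long slotStart = (next == null || next < nowMillis) ? nowMillis : next;
            return slotStart + minDelayMillis;
        });
        return (slotEnd - minDelayMillis) - nowMillis;
    }

    /** Blocks the calling worker thread until it may contact the host. */
    void awaitTurn(String host) throws InterruptedException {
        long wait = reserve(host, System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

Each worker calls `awaitTurn(host)` before fetching. Requests to different hosts proceed in parallel, while requests to the same host are spaced at least `minDelayMillis` apart.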

3.3 Web Search Requirements

This section details all of the requirements related to the web search portion of the

project. The requirements are numbered WSRI 1XX, where WSRI stands for Web

Search Requirement Item.

3.3.1 WSRI 100 [Critical Requirement]

The user shall be allowed to search over previously crawled web pages. This is a critical requirement because it is important to provide web search functionality, similar to what is available on the web, for comparison with the entity search portion of the project.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.3.2 WSRI 101 [Critical Requirement]

The user shall have a box to enter search terms. This is a critical requirement because without a box for the user to enter search terms, it would not be possible to provide a web search capability.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.3.3 WSRI 102

The user shall be allowed to specify the minimum number of back-links required for a

page containing the search term to be considered a match.

• Build Release Applicability: Demo 2, Final Release

3.3.4 WSRI 103

The URLs that match the search terms shall be sorted in order of number of back-links.

• Build Release Applicability: Demo 2, Final Release
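The search, filtering, and ordering described in WSRI 100 through 103 can be sketched as follows. The sketch assumes that a page's back-link count is the number of distinct other crawled pages that link to it, and that matching is a simple case-insensitive substring test. The types and method names are hypothetical. A full implementation would more likely use an inverted index than scan every page.

```java
import java.util.*;

/** A crawled page and how many other crawled pages link to it (hypothetical type). */
class CrawledPage {
    final String url;
    final String text;
    final int backLinks;

    CrawledPage(String url, String text, int backLinks) {
        this.url = url;
        this.text = text;
        this.backLinks = backLinks;
    }
}

class WebSearch {
    /** Counts, for each URL, how many distinct other pages link to it. */
    static Map<String, Integer> countBackLinks(Map<String, List<String>> outLinks) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, List<String>> page : outLinks.entrySet()) {
            for (String target : new HashSet<>(page.getValue())) { // ignore duplicate links
                if (!target.equals(page.getKey())) {             // ignore self-links
                    counts.merge(target, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /**
     * Case-insensitive phrase search (WSRI 100) that keeps pages with at least
     * minBackLinks back-links (WSRI 102), most-linked first (WSRI 103).
     */
    static List<String> search(List<CrawledPage> pages, String phrase, int minBackLinks) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<CrawledPage> hits = new ArrayList<>();
        for (CrawledPage p : pages) {
            if (p.backLinks >= minBackLinks && p.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(p);
            }
        }
        // List.sort is stable, so pages with equal counts keep their crawl order.
        hits.sort((a, b) -> Integer.compare(b.backLinks, a.backLinks));
        List<String> urls = new ArrayList<>();
        for (CrawledPage p : hits) {
            urls.add(p.url);
        }
        return urls;
    }
}
```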

3.3.5 WSRI 104 [Critical Requirement]

The URLs that match the search terms shall be displayed in a scrollable text box. This

is a critical requirement because all of the results need to be shown to the user in a

useful fashion.

• Build Release Applicability: Demo 1, Demo 2, Final Release

3.4 Entity Search Requirements

This section details all of the requirements related to the entity search portion of the

project. The requirements are numbered ESRI 1XX, where ESRI stands for Entity

Search Requirement Item.

3.4.1 ESRI 100 [Critical Requirement]


The user shall have the ability to search for entities from previously crawled websites.

This is a critical requirement because providing an entity search capability is the

primary thrust of the project.

• Build Release Applicability: Demo 2, Final Release

3.4.2 ESRI 101 [Critical Requirement]

The user shall have a box to enter search terms. This is a critical requirement because

without a box for the user to enter search terms, it would not be possible to provide an

entity search capability.

• Build Release Applicability: Demo 2, Final Release

3.4.3 ESRI 102 [Critical Requirement]

There shall be entities for, at a minimum: email address, phone number, fax number, street address, and zip code. This is a critical project requirement because this is the minimum amount of information required to reproduce the findings from Tao Cheng’s work [2].

• Build Release Applicability: Demo 2, Final Release
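The sketch below illustrates how the minimum entity types in ESRI 102 might be recognized, using regular expressions from `java.util.regex`. The patterns are simplified, US-centric assumptions for illustration only; they are not the extractors used in Tao Cheng’s work [2]. A fax number is taken to be a phone-shaped number preceded by the word “Fax”. A street address must end in one of a few common suffixes. Any standalone five-digit number is treated as a zip code. `ContactPatterns` is a hypothetical name.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical, simplified patterns for the contact entities listed in ESRI 102. */
class ContactPatterns {
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // North American numbers such as (785) 555-0100, 785-555-0100 or 785.555.0100
    static final Pattern PHONE =
            Pattern.compile("\\(?\\b\\d{3}\\)?[-. ]?\\d{3}[-.]\\d{4}\\b");
    // A phone-shaped number directly after the label "fax"; group 1 is the number
    static final Pattern FAX =
            Pattern.compile("(?i)\\bfax\\b[:.]?\\s*(" + PHONE.pattern() + ")");
    // House number, one to three capitalized words, then a common street suffix
    static final Pattern STREET = Pattern.compile(
            "\\b\\d{1,5}\\s+(?:[A-Z][a-z]+\\s+){1,3}"
            + "(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\\b\\.?");
    // Five-digit zip code with an optional ZIP+4 extension
    static final Pattern ZIP = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    /** Returns every match of the given capturing group (0 = whole match). */
    static List<String> findAll(Pattern pattern, String text, int group) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group(group));
        }
        return found;
    }
}
```

`PHONE` also matches fax numbers. Deciding which label applies needs surrounding context, such as the keyword check in `FAX`.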

3.4.4 ESRI 103

There shall be an overarching entity that gathers all contact info.

• Build Release Applicability: Demo 2, Final Release

3.4.5 ESRI 104

The entity search results shall be ranked by score, from highest to lowest.

• Build Release Applicability: Final Release
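Taken together, ESRI 103 and ESRI 104 suggest a container that gathers every contact entity found for a query and orders each type by score. The sketch below assumes that each extracted entity already carries a numeric score from whatever ranking model the entity searcher uses; the scoring itself is not shown. The class names are hypothetical.

```java
import java.util.*;

/** One extracted entity with its score (hypothetical type). */
class EntityHit {
    final String type;   // e.g. "email", "phone", "fax", "street", "zip"
    final String value;
    final double score;  // assumed to come from the entity ranking model

    EntityHit(String type, String value, double score) {
        this.type = type;
        this.value = value;
        this.score = score;
    }
}

/** Overarching contact entity: every contact hit gathered for one query (ESRI 103). */
class ContactInfo {
    private final Map<String, List<EntityHit>> byType = new TreeMap<>();

    void add(EntityHit hit) {
        byType.computeIfAbsent(hit.type, t -> new ArrayList<>()).add(hit);
    }

    /** Hits of one type, highest score first (ESRI 104). */
    List<EntityHit> ranked(String type) {
        List<EntityHit> hits =
                new ArrayList<>(byType.getOrDefault(type, Collections.<EntityHit>emptyList()));
        hits.sort(Comparator.comparingDouble((EntityHit h) -> h.score).reversed());
        return hits;
    }

    /** Best-scoring value for each entity type found, keyed by type name. */
    Map<String, String> best() {
        Map<String, String> result = new TreeMap<>();
        for (String type : byType.keySet()) {
            result.put(type, ranked(type).get(0).value);
        }
        return result;
    }
}
```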

3.4.6 ESRI 105 [Critical Requirement]

The user shall be allowed to specify search terms in addition to entity terms. This is a

critical requirement because without being allowed to specify additional search terms, it

would not be possible to return any interesting results.

• Build Release Applicability: Demo 2, Final Release
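The sample queries in the testing section (for example, “Bill Gates #email”) mark entity terms with a leading `#`. Assuming that convention, the sketch below separates ordinary search terms from entity terms, as ESRI 105 requires. `EntityQuery` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical parsed form of an entity search query. */
class EntityQuery {
    final List<String> keywords = new ArrayList<>();    // ordinary search terms
    final List<String> entityTypes = new ArrayList<>(); // terms written as #type

    static EntityQuery parse(String query) {
        EntityQuery q = new EntityQuery();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a blank query splits into one empty token
            }
            if (token.startsWith("#") && token.length() > 1) {
                q.entityTypes.add(token.substring(1).toLowerCase(Locale.ROOT));
            } else {
                q.keywords.add(token);
            }
        }
        return q;
    }
}
```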

3.4.7 ESRI 106 [Critical Requirement]


The entities that match the search terms shall be displayed in a scrollable text box. This

is a critical requirement because all of the results need to be shown to the user in a

useful fashion.

• Build Release Applicability: Demo 2, Final Release

4 Assumptions

• Java Runtime Environment 1.3.1 or later will be installed on the computer running the

application.

• In order to run a search, the user will have an active Internet connection.

• In order to perform a Web Crawl in a reasonable amount of time, the user will have a

high-speed Internet connection (DSL or better).

• The user will need a minimum of 512 MB of memory.

• The user will have a computer with a minimum speed of 1.6 GHz.

5 Constraints

• Java will be used for the web crawling. While it will not be as efficient as some other languages, the JDK provides extensive built-in web functionality, which makes the web crawler easier to write.

• Entity Search is being limited to searching for contact info entities. An excellent future

enhancement would be to add other entity types.
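As an example of the JDK's built-in web support, `java.net.URI` can resolve the relative links found on a crawled page against that page's address. The sketch makes several assumptions. A simple regular expression is taken to be good enough to pull `href` values out of anchor tags; a production crawler would more likely use an HTML parser. Only http and https links are kept, and `#fragment` parts are dropped. `LinkExtractor` is a hypothetical name.

```java
import java.net.URI;
import java.util.*;
import java.util.regex.*;

/** Hypothetical helper that finds and absolutizes the links on one page. */
class LinkExtractor {
    // Captures the href value of <a ...> tags up to the closing quote or a '#' fragment.
    private static final Pattern HREF =
            Pattern.compile("(?i)<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]*)");

    static List<String> extract(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        if (base.getPath() == null || base.getPath().isEmpty()) {
            // Give a bare host URL a root path so relative links resolve under "/".
            base = base.resolve("/");
        }
        Set<String> links = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (href.isEmpty()) {
                continue; // e.g. href="#top" points back into the same page
            }
            try {
                URI resolved = base.resolve(href).normalize();
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException malformed) {
                // Skip hrefs that are not valid URIs (e.g. containing spaces).
            }
        }
        return new ArrayList<>(links);
    }
}
```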

6 Environment

• Eclipse 3.3.0 will be used as the IDE.

• Java version JDK 1.5 will be used.

• The Jigloo plugin for Eclipse will be used for GUI development.


CHAPTER 2 - Project Plan

1 Task Breakdown

1.1 Project Phases

The project is broken into three distinct phases: the Inception Phase, the Elaboration

Phase, and the Production Phase.

1.1.1 Inception Phase

The inception phase is focused on creating the scope of the project, and developing the

formal project requirements. A vision document will be developed during this phase,

which details the project scope and requirements. A project plan will also be created

during this phase that describes the project schedule and effort estimate. A software

quality assurance plan will also be designed which will list the required project

documentation as well as the steps that will be taken to ensure a quality project is

delivered.

An initial prototype is created during this phase that will show project feasibility. It

will demonstrate some of the project requirements listed in the vision document.

The inception phase is complete when the developer has delivered a prototype as well

as all required documentation to the supervisory committee, and the supervisory

committee has reviewed and approved all items. The first presentation will be given at

the end of this phase.

1.1.2 Elaboration Phase

During the elaboration phase the architecture of the project will be finalized into an architectural design plan. In addition, all documents from the inception phase will be updated to include any revisions noted by the supervisory committee at the first project presentation. The project requirements will be formally specified using OCL. Also, a formal test plan will be developed that will include the method of testing, as well as the way of documenting, tracking, and fixing bugs found. Two fellow MSE students will perform technical inspections of the architectural design and will report on their findings.

A second prototype will be created during this phase that expands upon the first

prototype. It will demonstrate some of the more challenging project requirements, as

well as showing features requested by the supervisory committee.

The elaboration phase is complete after the developer delivers the second version of the

prototype and all required documentation, and the supervisory committee has given its

approval. The second presentation will be given at the end of this phase.

1.1.3 Production Phase

The production phase focuses on project implementation and testing. During this phase

the developer will complete the coding of the project, as well as produce all supporting

documentation (User Manual, Project Evaluation, Test Logs, etc.).

The production phase is complete when the developer has completed all required

functionality in the project, has delivered the project and all supporting documentation

to the supervisory committee, and the supervisory committee has reviewed and

approved all items. The final presentation will be given at the end of this phase.

1.2 Project Schedule

The current schedule for the project is displayed in Figure 2.1. If viewing this document in digital format, the chart can be seen better by increasing the zoom. (A PDF version of the Gantt chart is also available on the project website.) Note: This schedule held through both the Inception and Elaboration phases of the project.


Figure 2.1 Project Schedule

2 Cost Estimate

Barry Boehm’s Constructive Cost Model (COCOMO) will be used to estimate project effort and time. The COCOMO model was developed in the early 1980s and is applicable to a wide range of software projects.

2.1 Elaboration Phase - COCOMO

Intermediate COCOMO will be used, which is an extension of Basic COCOMO. It includes an Effort Adjustment Factor (EAF), which adjusts the estimated effort according to project attributes. The KDD-Research Entity Search Tool project is an Organic Project in COCOMO terms, because it will be a relatively small software project with somewhat flexible requirements, and a developer with application programming experience.

Effort will be estimated using the formula

$$\text{Effort} = 3.2 \times \text{EAF} \times \text{KLOC}^{1.05}$$

where KLOC represents the number of thousands of lines of source code developed. Time will be estimated in months using the formula

$$\text{Time} = 2.5 \times \text{Effort}^{0.38}$$

There are a total of 15 Effort Adjustment Factors, each of which takes a value within a given range. Each factor is classified as very low, low, nominal, high, very high, or extra high, and this classification determines the value of the adjustment factor. The 15 Effort Adjustment Factors can be found in Table 2.1.

Table 2.1 COCOMO Effort Adjustment Factors

| Identifier | Effort Adjustment Factor | Possible Range of Values |
|---|---|---|
| RELY | Required Software Reliability | 0.75 – 1.40 |
| DATA | Size of Application Database | 0.94 – 1.16 |
| CPLX | Complexity of the Product | 0.70 – 1.65 |
| TIME | Run-time Performance Requirements | 1.00 – 1.66 |
| STOR | Memory Constraints | 1.00 – 1.56 |
| VIRT | Virtual Machine Volatility | 0.87 – 1.30 |
| TURN | Required Turnaround Time | 0.87 – 1.15 |
| ACAP | Analyst Capability | 1.46 – 0.71 |
| AEXP | Applications Experience | 1.29 – 0.82 |
| PCAP | Software Engineer Capability | 1.42 – 0.70 |
| VEXP | Virtual Machine Experience | 1.21 – 0.90 |
| LEXP | Programming Language Experience | 1.14 – 0.95 |
| TOOL | Use of Software Tools | 1.24 – 0.82 |
| MODP | Use of Modern Software Practices | 1.24 – 0.83 |
| SCED | Required Development Schedule | 1.23 – 1.10 |

The values chosen for the KDD-Research Entity Search Tool are given in Table 2.2, along with an explanation for each value chosen.

Table 2.2 Project Effort Adjustment Factor Values

| Identifier | Classification | Value | Reasoning |
|---|---|---|---|
| RELY | Low | 0.88 | Project is not safety critical, and does not have to be completely reliable |
| DATA | High | 1.08 | A large number of web pages are needed in order to perform a thorough search |
| CPLX | Nominal | 1.00 | Web crawling, Web Search, and Entity Search are not overly complicated concepts |
| TIME | Nominal | 1.00 | Response time is important yet not overly critical |
| STOR | Very High | 1.21 | Crawling and searching will require a lot of memory usage |
| VIRT | Low | 0.87 | Low complexity of the hardware and software |
| TURN | Low | 0.87 | Since this is a single developer project, the turnaround time on results is low |
| ACAP | High | 0.86 | Developer has 4+ years of experience in software engineering |
| AEXP | High | 0.91 | Developer has 3+ years of experience in applications development |
| PCAP | High | 0.86 | Developer has applicable experience |
| VEXP | Nominal | 1.00 | Developer has 2+ years of experience developing for the Java virtual machine |
| LEXP | High | 0.95 | Developer has 2+ years of experience developing in Java |
| TOOL | Nominal | 1.00 | Moderate experience with the tools being used |
| MODP | Very High | 0.83 | Developer has 4+ years of experience employing modern software engineering practices |
| SCED | Nominal | 1.00 | Project has a tight schedule, but some slippage is allowable |

Based on these numbers, the value for EAF is: 0.95

The estimated size of the project is 2.25 KLOC. This estimate is based on the size of the source code of other available web crawlers. Simple applet web crawlers with minimal extra features average about 0.75 KLOC. The 2.25 KLOC estimate is calculated by doubling the web crawler estimate to account for entity search (1.5 KLOC), plus an additional 0.75 KLOC for GUI features that are not available in the applets.

Using these figures, the Effort and Time values are calculated as:

$$\text{Effort} = 3.2 \times 0.95 \times 2.25^{1.05} = 7.12$$

$$\text{Time} = 2.5 \times 7.12^{0.38} = 5.27$$

This means that COCOMO estimates that 7.12 staff months will be necessary to

complete the project. The Time value estimates that the project can be completed in

5.27 chronological months. I believe that this estimate is fairly accurate. In fact, it is

very close to the estimate presented in the Gantt chart in Section 1.2. As can be seen in

the Gantt chart, additional time is needed for the project to produce additional

documentation for the MSE project.
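The Intermediate COCOMO arithmetic above can be reproduced with a few lines of Java; `Cocomo` and its method names are illustrative only. The code also makes one observation easy to check. The EAF of 0.95 used above matches the arithmetic mean of the fifteen values in Table 2.2. Intermediate COCOMO as Boehm defines it instead takes their product, which for these values is about 0.46 and would give an effort of roughly 3.5 staff-months.

```java
/** Illustrative Intermediate COCOMO (organic mode) calculator. */
class Cocomo {
    // Table 2.2 values, in order: RELY, DATA, CPLX, TIME, STOR, VIRT, TURN,
    // ACAP, AEXP, PCAP, VEXP, LEXP, TOOL, MODP, SCED
    static final double[] TABLE_2_2 = {
        0.88, 1.08, 1.00, 1.00, 1.21, 0.87, 0.87,
        0.86, 0.91, 0.86, 1.00, 0.95, 1.00, 0.83, 1.00
    };

    /** Effort in staff-months: 3.2 * EAF * KLOC^1.05. */
    static double effort(double eaf, double kloc) {
        return 3.2 * eaf * Math.pow(kloc, 1.05);
    }

    /** Development time in months: 2.5 * Effort^0.38. */
    static double time(double effort) {
        return 2.5 * Math.pow(effort, 0.38);
    }

    /** Standard Intermediate COCOMO EAF: the product of all adjustment factors. */
    static double eafProduct(double[] factors) {
        double eaf = 1.0;
        for (double f : factors) {
            eaf *= f;
        }
        return eaf;
    }

    /** Arithmetic mean of the factors, for comparison with the 0.95 used in the text. */
    static double mean(double[] factors) {
        double sum = 0.0;
        for (double f : factors) {
            sum += f;
        }
        return sum / factors.length;
    }
}
```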

The COCOMO model is not without its faults, however. It is based on projects that were created by teams, so it may not apply perfectly to projects with a single developer. This estimate also assumes a fairly steady development pace with little interruption; increased project complexity or scope, or misjudged EAF values, can cause the estimate to be off.

2.2 Production Phase Estimates

Estimates for the Production Phase were compiled differently than those for the Elaboration Phase. At the completion of the Elaboration Phase, a total of 2K SLOC had been developed. This represented the implementation of 29 out of 34 requirements, which means that about 85 percent of all requirements have been implemented. Assuming that each requirement takes about the same amount of SLOC to develop, there are about 353 SLOC left to develop ((2000 / 0.85) - 2000).

At the completion of the Elaboration Phase, software productivity was calculated as

17.86 SLOC per hour. This means that the time remaining in software development

during the production phase should be about 20 hours (353 / 17.86). Due to the

developer only being able to devote about 2 hours a day to software development, this

represents about 10 days worth of coding remaining. The original estimates for testing

(21 days) and documentation (25 days) still hold. This means that the time required for

the Production Phase should be 56 days (10 + 21 + 25).
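The Production Phase arithmetic can be checked the same way. This small sketch uses the figures from the two paragraphs above and rounds to whole SLOC, hours, and days. The class and method names are hypothetical.

```java
/** Re-derives the Production Phase schedule estimate from the stated figures. */
class ProductionEstimate {
    /** SLOC still to write if slocSoFar covers fractionDone of the requirements. */
    static long remainingSloc(int slocSoFar, double fractionDone) {
        return Math.round(slocSoFar / fractionDone - slocSoFar);
    }

    /** Coding hours needed at the measured productivity, rounded to whole hours. */
    static long codingHours(long sloc, double slocPerHour) {
        return Math.round(sloc / slocPerHour);
    }

    /** Calendar days of coding when only hoursPerDay can be spent per day. */
    static long codingDays(long hours, int hoursPerDay) {
        return (hours + hoursPerDay - 1) / hoursPerDay; // integer ceiling division
    }

    static long totalDays(long codingDays, int testingDays, int documentationDays) {
        return codingDays + testingDays + documentationDays;
    }
}
```

For example, `remainingSloc(2000, 0.85)` gives 353, which at 17.86 SLOC per hour and 2 hours per day is 10 coding days, for a total of 56 days.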

3 Architecture Elaboration Plan

This section details all of the documents and artifacts that are to be completed by the

end of the Elaboration phase before the second presentation.

3.1 Vision Document Revision

Suggestions from the supervisory committee during the first project presentation

regarding the vision document will be included in a revision of the vision document.

The document will also be updated to include a complete requirements listing. The

requirements will be ranked in order of importance, and will have unique identifiers.

The major professor will approve the changes to the document.


3.2 Project Plan Revision

Suggestions from the supervisory committee during the first project presentation

regarding the project plan will be included in a revision of the project plan. The Gantt

chart will be updated with any changes in schedule, and the COCOMO estimate will be

updated based on any changes regarding the cost estimate. The major professor will

approve the changes to the document.

3.3 Architectural Design

The architectural design document will use UML to create the architectural

components. It will include all state, sequence, class, and data models for the project.

The major professor will approve the architectural design document.

3.4 Prototype Development

The prototype developed during the Inception Phase will be expanded upon during the

Elaboration Phase. Additions will include new functionality, as well as suggestions

from the supervisory committee during the first project presentation. The features

implemented for the prototype will be approved by the major professor.

3.5 Test Plan

A test plan will be developed that ensures that all requirements specified in the Vision Document are met. The document will contain detailed instructions on how to evaluate the product, and will be approved by the major professor.

3.6 Formal Technical Inspections

Two MSE students will provide input into the project by completing formal technical

inspections. The inspectors will use a formal inspection checklist that will be produced

during the Elaboration Phase. Both inspectors will produce a report based on their

findings.

3.7 Formal Requirements Specification

The web crawling portion of the project will be specified using OCL. This section was

chosen rather than the entity search portion of the project because it will allow for a


more substantial formal specification. The specification will be done in OCL using the

USE (UML-based Specification Environment) tool. The major professor will approve

the formal requirements specification.

4 Software Production Plan

This section details all of the documents and artifacts that are to be completed by the end of

the Production phase before the third presentation.

4.1 Test Plan Revision

Suggestions from the supervisory committee during the second project presentation

regarding the test plan will be included in a revision of the document. The document

will also be updated with specific file names to be loaded. The major professor will

approve the changes to the document.

4.2 Architectural Design Revision

Suggestions from the supervisory committee during the second project presentation

regarding the design will be included in a revision of the Architectural Design

document. The major professor will approve the changes to the document.

4.3 Component Design

The component design document will use UML to convey detailed information about

the software components. It will include all attributes and methods for the classes in

the project. The major professor will approve the component design document.

4.4 Final Software Executable

The prototype developed during the Architecture Elaboration Phase will be expanded

upon during the Production Phase. Additions will include all remaining required

functionality, as well as late suggestions from the supervisory committee during the

second project presentation. The features implemented for the final executable will be

approved by the major professor.


4.5 Formal Technical Inspections

Two MSE students will provide input into the project by completing formal technical

inspections. The inspectors will use a formal inspection checklist that was produced

during the Elaboration Phase. Both inspectors will produce a report based on their

findings.

4.6 User’s Manual

At the completion of software development, the developer will create a User’s Manual,

which will act as a guide for using the completed system. The manual will be broken

up into different sections for performing various tasks within the system, and will act as

a basic walkthrough of the system. The manual will also list various troubleshooting

problems and solutions.

4.7 Test Assessment

At the completion of software development, the developer will run the tests contained

in the Test Plan document, and will record the results. The Test Assessment document

will contain the results of running these tests.

4.8 Technical Instructions for Reuse and Extension

At the completion of software development, the developer shall produce a guide that

explains how to reuse the project in the future for other MSE projects. The document

shall also describe how to extend various features within the project to adapt the project

for different types of use.

4.9 Project Assessment

At the completion of software development and testing, the developer will write up a

document containing the developer’s opinion on the project. The document will

describe in detail what went well, what could have been better, and what simply did not

work. The Project Assessment will also contain the final metrics for the project.


CHAPTER 3 - Software Quality Assurance Plan

1 Software Production Plan

This document defines the steps taken to ensure that the Knowledge Discovery in

Databases (KDD) Research Entity Search Tool project is a high quality product. All

required documentation for the project is listed.

2 Management

2.1 Organization

Supervisory Committee

• Dr. Scott DeLoach

• Dr. David Gustafson

• Dr. William Hsu

Major Professor

• Dr. William Hsu

Developer

• Eric Davis

Formal Technical Inspectors

• Steve Stampbach

• Tim Weninger

2.2 Tasks

All project tasks are discussed in detail in the Project Plan. The Project Plan includes a

Gantt chart that lays out all of the tasks and their deadlines.


2.3 Responsibilities

2.3.1 Supervisory Committee

The role of the supervisory committee is to prepare for and attend each of the three project presentations that will occur at the end of each project phase. The committee members will provide feedback and suggestions on the state of the project.

2.3.2 Major Professor

The role of the major professor is twofold: to act as a supervisory committee member, and to meet weekly with the developer to discuss progress and expectations and to provide suggestions.

2.3.3 Developer

The role of the developer is to produce the product and all supporting documentation. The developer is responsible for maintaining a time log, and for meeting weekly with the major professor to discuss the project.

2.3.4 Formal Technical Inspectors

The role of the formal technical inspectors is to complete a formal inspection of the project’s architecture, design, and source code. They will submit a report on their findings from the formal inspection.

3 Documentation

The official documentation requirements for MSE projects are defined at:

http://mse.cis.ksu.edu/online/mse-portfolio.htm. Additional documentation may be

required at the discretion of the major professor and developer. The planned documentation for the project is listed in Section 11 of this document.

All project documentation will be available on the project website:

http://www.cis.ksu.edu/~efd3467/index.html

4 Standards, Practices, Conventions, and Metrics


4.1 Documentation Standards

IEEE standards will be followed for all applicable documentation throughout the project.

4.2 Coding Standards

Java naming conventions will be followed for all source code developed. Source code API documentation will be generated using Javadoc.

4.3 Metrics

COCOMO will be used to estimate project effort.

5 Reviews and Audits

All documentation, source code, and executable products will be evaluated by members of

the supervisory committee at the conclusion of each phase of the project. Formal

inspections of the architecture, design, and source code will be conducted by the formal

technical inspectors when the coding is complete.

6 Testing

The Test Plan will list the test procedures and expected results of all tests in detail.

However, a brief description of what type of testing will be performed will be given below.

General testing of the web crawling and web search portions of the project will be performed by crawling the Kansas State University Department of Computing and Information Sciences domain and searching for specific pieces of information. Sample queries include “professor of machine learning”, “computer graphics”, and “enrollment forms”. The results returned will then be verified manually to ensure that the pages being returned actually contain the requested search strings.

The formal testing of the entity search portion of the project will follow the same tests as found in Tao Cheng’s entity search work [2]. A dataset based on a 2006 general web crawl from the WebBase Project will be used. The original data was over 2 TB, so it will have to be scaled down in order to allow reasonable testing to occur. Once the data is scaled down, entity searches will be performed on the data to see if the correct information can be extracted. Sample queries include “Amazon Customer Service #phone”, “Bill Gates #email”, and “Ebay Customer Service #phone”. The results given as the best results by the entity searcher will be manually checked for accuracy.

7 Problem Reporting and Corrective Actions

All problems found during testing will be recorded in the Software Problem Report

spreadsheet. Each problem found will list the problem, the estimated time to fix, the date

fixed, and the corrective action taken. If the problem cannot or will not be solved during

the project, it will be noted. All problems will be discussed with the major professor.

8 Tools, Technologies, and Methodologies

The following tools will be used for coding, testing, and documentation:

• Eclipse IDE – for software development

• Eclipse FatJar – for building executable JAR files

• Eclipse Jigloo Plug-in – for GUI development

• Microsoft Word – for documentation development

• Microsoft Excel – for risk and problem report tracking and time logs

• Microsoft PowerPoint – for project presentation creation

• Adobe Acrobat – for document conversion to PDF

• Microsoft Project – for project planning

• Microsoft Visio – for software design development

• USE 2.3.1 – for developing formal specifications

9 Code and Media Control

All developed source code will be controlled using CVS. The CVS repository is located at:

http://fingolfin.user.cis.ksu.edu/repos/KDD/projects/entitysearch.

All documents will be maintained on the developer’s personal computer with associated

version numbers. Change logs will be maintained in each document. All completed


project documentation will be available on the project website at:

http://www.cis.ksu.edu/~efd3467/index.html.

10 Risk Management

Software risks will be documented in the Software Risk Reporting and Mitigation

spreadsheet. The risks and potential mitigation strategies will be discussed with the major

professor as they appear.

11 Deliverables

The following are the deliverables for each phase of the project:

Phase I

• Vision Document

• Project Plan

• Prototype Demonstration

• Software Quality Assurance Plan

• Time Log

• Presentation

Phase II

• Vision Document

• Project Plan

• Software Requirements Specification

• Architecture Design

• Test Plan

• Software Risk Reporting and Mitigation Document

• Technical Inspection Checklist

• Executable Architecture Prototype

• Action Items

• Time Log


• Presentation

Phase III

• Component Design

• Source Code

• Executable Project

• User Manual

• Formal Technical Inspection Letters

• Project Evaluation

• Software Problem Reports

• Time Log

• Presentation

CHAPTER 4 - Architectural Design

1 Introduction

The purpose of this document is to provide an architectural design of the KDD-Research Entity Search Tool (KREST). The document presents class diagrams and sequence diagrams and describes the purpose of each class shown in them. Section 3 also gives a formal specification of the project.

1.1 Background

The purpose of KREST is to provide a multifunctional web search tool that runs as a standalone application. KREST allows the user to perform a web crawl, a basic web search over the crawled pages, and an entity search over the crawled pages. It also allows the user to perform web searches and entity searches over datasets loaded into the tool.

2 KDD-Research Entity Search Tool Architecture

2.1 Package View

The KREST project will follow the Model-View-Controller (MVC) architecture, with an application class that starts the program. This separation allows the screen to be updated easily whenever the model changes.

Figure 4.1 Package View
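As a rough illustration of the model-to-view update path, the following Java sketch assumes a simple observer callback. The KrestModel and KrestView classes here are simplified stand-ins, not the project's actual classes: the model stores object names instead of WebObjects, and the modelChanged method and the CountingView class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view base class: concrete observers override modelChanged().
abstract class KrestView {
    abstract void modelChanged(Set<String> currentNames);
}

// Simplified model: stores object names instead of full WebObjects.
class KrestModel {
    private final Set<String> names = new LinkedHashSet<>();
    private final List<KrestView> observers = new ArrayList<>();

    void addObserver(KrestView observer) {
        observers.add(observer);
    }

    void addObject(String name) {
        if (names.add(name)) {        // notify only if the model actually changed
            notifyObservers();
        }
    }

    void removeObject(String name) {
        if (names.remove(name)) {
            notifyObservers();
        }
    }

    private void notifyObservers() {
        // Hand each view a read-only snapshot so views cannot modify the model.
        Set<String> snapshot = Collections.unmodifiableSet(new LinkedHashSet<>(names));
        for (KrestView view : observers) {
            view.modelChanged(snapshot);
        }
    }
}

// Example observer that records how many updates it received and the latest state.
class CountingView extends KrestView {
    int updates = 0;
    Set<String> lastSeen = Collections.emptySet();

    @Override
    void modelChanged(Set<String> currentNames) {
        updates++;
        lastSeen = currentNames;
    }
}
```

Views are notified only when the model actually changes, so adding the same name twice does not trigger a redundant screen update. If the GUI is built with Swing, a view would normally hand the update to the event dispatch thread (for example with SwingUtilities.invokeLater) before touching any components.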

2.2 Application Package

Figure 4.2 KREST Application Package

2.2.1 Class Description

2.2.1.1 KrestApplication

The KrestApplication class is a very simple class used to start the program. It starts up the KrestController and makes it visible.

2.3 Controller Package

Figure 4.3 Controller Package

2.3.1 Class Description

2.3.1.1 KrestController

The KrestController class is responsible for starting up all of the other parts of the application. It signals web crawls, web searches, and entity searches to begin processing, and it controls the display of the main form.

Figure 4.4 KrestController Class

2.3.1.2 KrestAboutDialog

The KrestAboutDialog class is a dialog that displays information about the KREST application.

Figure 4.5 KrestAboutDialog Class

2.3.1.3 WebCrawler

The WebCrawler class is responsible for setting up everything needed for a web crawl and for starting the crawl.

Figure 4.6 WebCrawler Class

2.3.1.4 SiteVisitor

The SiteVisitor class is responsible for visiting individual web pages. Each instance of the SiteVisitor class is a thread that visits a different web page.

Figure 4.7 SiteVisitor Class
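The following single-threaded Java sketch shows the bookkeeping a SiteVisitor-style crawl needs: a set of already-seen addresses, a depth limit, and a maximum page count. It is a simplification under stated assumptions rather than the project's implementation: the CrawlSketch class is hypothetical, page downloads are replaced by an in-memory map from address to HTML, and links are extracted with a naive href regular expression instead of an HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    // Naive link extractor: only double-quoted href values, no relative-URL resolution.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // A queued address together with its distance from the start page.
    private static final class Entry {
        final String address;
        final int depth;

        Entry(String address, int depth) {
            this.address = address;
            this.depth = depth;
        }
    }

    static List<String> extractHyperTextLinks(String page) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(page);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl from startAddress. "web" stands in for downloading:
    // it maps an address to its HTML. Returns the addresses visited, in order.
    static List<String> crawl(Map<String, String> web, String startAddress,
                              int maxToCrawl, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();      // addresses already queued or visited
        Deque<Entry> queue = new ArrayDeque<>();
        queue.add(new Entry(startAddress, 0));
        seen.add(startAddress);

        while (!queue.isEmpty() && visited.size() < maxToCrawl) {
            Entry entry = queue.poll();
            String page = web.get(entry.address);
            if (page == null) {
                continue;                        // page could not be "downloaded"; skip it
            }
            visited.add(entry.address);
            if (entry.depth >= maxDepth) {
                continue;                        // do not follow links beyond maxDepth
            }
            for (String link : extractHyperTextLinks(page)) {
                if (seen.add(link)) {            // add() returns false for repeats
                    queue.add(new Entry(link, entry.depth + 1));
                }
            }
        }
        return visited;
    }
}
```

Marking an address as seen when it is queued, rather than when it is visited, keeps the same page from being queued twice. In the threaded design described above, the seen set and the page counter would also have to be shared safely between SiteVisitor threads.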

2.3.1.5 ThreadController

The ThreadController class ensures that no more than the specified maximum number of web crawling threads run at any one time. The web crawling threads are instances of the SiteVisitor class. The ThreadController maintains tickets to keep track of which threads are allowed to run: a thread that holds a ticket is allowed to run; otherwise, it sleeps while waiting to acquire a ticket.

Figure 4.8 ThreadController Class
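One way to implement the ticket scheme is a monitor built on synchronized, wait(), and notifyAll(). The Java sketch below is illustrative only: its ThreadController mirrors the getTicket, returnTicket, and findFreeTicket operations from the formal specification, but the method bodies, the boolean ticket array, and the TicketDemo helper are assumptions. In this sketch a thread without a ticket blocks in wait() until one is returned, rather than sleeping on a timer.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Ticket-based limiter: at most maxCrowdSize threads may hold a ticket at once.
class ThreadController {
    private final boolean[] ticketInUse;  // index = ticket number
    private int crowdSize = 0;            // tickets currently handed out

    ThreadController(int maxCrowdSize) {
        // Require at least one ticket; with zero, every caller would wait forever.
        if (maxCrowdSize <= 0) {
            throw new IllegalArgumentException("maxCrowdSize must be > 0");
        }
        ticketInUse = new boolean[maxCrowdSize];
    }

    // Blocks until a ticket is free, then claims it and returns its number.
    synchronized int getTicket() throws InterruptedException {
        while (crowdSize == ticketInUse.length) {
            wait();                       // releases the monitor while waiting
        }
        int ticket = findFreeTicket();
        ticketInUse[ticket] = true;
        crowdSize++;
        return ticket;
    }

    // Gives a ticket back and wakes any threads waiting for one.
    synchronized void returnTicket(int ticket) {
        if (!ticketInUse[ticket]) {
            throw new IllegalStateException("ticket not in use: " + ticket);
        }
        ticketInUse[ticket] = false;
        crowdSize--;
        notifyAll();
    }

    // Called only while holding the monitor and when crowdSize < max.
    private int findFreeTicket() {
        for (int i = 0; i < ticketInUse.length; i++) {
            if (!ticketInUse[i]) {
                return i;
            }
        }
        throw new IllegalStateException("no free ticket");
    }
}

class TicketDemo {
    // Starts `workers` threads that each hold a ticket briefly; returns peak concurrency.
    static int peakConcurrency(int maxCrowdSize, int workers) {
        ThreadController controller = new ThreadController(maxCrowdSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    int ticket = controller.getTicket();
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        Thread.sleep(20);   // stands in for visiting a page
                    } finally {
                        running.decrementAndGet();
                        controller.returnTicket(ticket);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

The standard library class java.util.concurrent.Semaphore provides the same "at most N holders" behavior and would be a reasonable alternative when individual ticket numbers are not needed.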

2.3.1.6 HTTPReader

The HTTPReader class is responsible for downloading the text of a given web page. If the given web page does not exist, it will throw an exception.

Figure 4.9 HTTPReader Class
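A minimal Java sketch of the download step, assuming the standard java.net.HttpURLConnection API. The static downloadWWWPage(String) signature differs from the parameterless operation in the formal specification and is hypothetical; the five-second timeouts and the UTF-8 decoding are also assumptions. Any HTTP status of 400 or above, such as 404 for a missing page, is reported as an IOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class HTTPReader {
    // Downloads the body of the page at pageAddress (http or https only) as text.
    // Throws IOException for network failures and for HTTP status codes >= 400.
    static String downloadWWWPage(String pageAddress) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(pageAddress).toURL().openConnection();
        connection.setConnectTimeout(5000);   // milliseconds
        connection.setReadTimeout(5000);
        try {
            int status = connection.getResponseCode();
            if (status >= 400) {
                throw new IOException("HTTP " + status + " for " + pageAddress);
            }
            try (InputStream in = connection.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Assumes UTF-8; a fuller version would honor the Content-Type charset.
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```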

2.3.1.7 WebSearcher

The WebSearcher class is responsible for setting up everything needed for a web search and for starting the search.

Figure 4.10 WebSearcher Class

2.3.1.8 EntitySearcher

The EntitySearcher class is responsible for setting up everything needed for an entity search and for starting the search.

Figure 4.11 EntitySearcher Class

2.4 View Package

Figure 4.12 View Package

2.4.1 Class Description

2.4.1.1 KrestView

The KrestView class is an abstract class that can be extended by the CrawlerObserver, SearchObserver, and EntityObserver classes. It is used to update the display in response to changes in the model.

Figure 4.13 KrestView Class

2.4.1.2 CrawlerObserver

The CrawlerObserver class is responsible for updating the screen when the model changes due to web crawling.

Figure 4.14 CrawlerObserver Class

2.4.1.3 SearchObserver

The SearchObserver class is responsible for updating the screen when the model changes due to web searching.

Figure 4.15 SearchObserver Class

2.4.1.4 EntityObserver

The EntityObserver class is responsible for updating the screen when the model changes due to entity searching.

Figure 4.16 EntityObserver Class

2.5 Model Package

Figure 4.17 Model Package

2.5.1 Class Description

2.5.1.1 KrestModel

The KrestModel class is responsible for holding the current KrestObjectLibrary object and for making the appropriate pieces of it available to other classes.

Figure 4.18 KrestModel Class

2.5.1.2 KrestObjectLibrary

The KrestObjectLibrary class is responsible for holding onto all created WebObjects.

Figure 4.19 KrestObjectLibrary Class
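The sketch below shows one way to hold WebObjects so that two objects can never share a name, which is the UniqueNamesKrestObjectLibrary invariant in the formal specification. Keying a map by name is an assumption, as are the minimal WebObject and Webpage stand-ins and the use of Optional for lookups; setName is omitted because renaming a stored object would also require re-keying it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Minimal stand-ins for the model classes (name only).
abstract class WebObject {
    private final String name;

    WebObject(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class Webpage extends WebObject {
    Webpage(String name) {
        super(name);
    }
}

// Stores WebObjects keyed by name, so no two stored objects can share a name.
class KrestObjectLibrary {
    private final Map<String, WebObject> objects = new LinkedHashMap<>();

    void addObject(WebObject newObject) {
        if (objects.containsKey(newObject.getName())) {
            throw new IllegalArgumentException("duplicate name: " + newObject.getName());
        }
        objects.put(newObject.getName(), newObject);
    }

    // Removes exactly this object; fails if it is not the object stored under its name.
    void removeObject(WebObject objectToRemove) {
        if (!objects.remove(objectToRemove.getName(), objectToRemove)) {
            throw new IllegalArgumentException("not in library: " + objectToRemove.getName());
        }
    }

    Optional<WebObject> findObjectByName(String name) {
        return Optional.ofNullable(objects.get(name));
    }

    Set<String> getKeys() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(objects.keySet()));
    }

    int size() {
        return objects.size();
    }
}
```

Rejecting a duplicate name is slightly stronger than the addObject precondition in Section 3, which only requires that the same object not already be present.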

2.5.1.3 WebObject

The WebObject class is an abstract class that can be extended by both the Webpage class and the KrestEntity class. It is used to hold data found during web crawls and web or entity searches.

Figure 4.20 WebObject Class

2.5.1.4 Webpage

The Webpage class is responsible for holding onto information about a single web page. Each web page explored will have its own Webpage instance.

Figure 4.21 Webpage Class

2.5.1.5 KrestEntity

The KrestEntity class is an abstract class that can be extended by the AddressEntity, EmailEntity, FaxEntity, PhoneEntity, ZipEntity, and OverarchingEntity classes. It is used to hold data found during entity searches.

Figure 4.22 KrestEntity Class
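A possible shape for the entity hierarchy is sketched below in Java. It is not the project's code: here the occurrence list is a set of website addresses and the occurrence count is derived from its size, so the two cannot drift apart, whereas the formal specification keeps numberOfOccurrences as a separate attribute. The EmailEntity and PhoneEntity subclasses are trimmed to their constructors and getName, and the other subclasses would follow the same pattern.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Abstract entity base: remembers every website on which the entity was found.
abstract class KrestEntity {
    private final Set<String> occurrences = new LinkedHashSet<>();

    // Display value of the entity, e.g. the email address itself.
    abstract String getName();

    // Records a website; returns false (and changes nothing) if it was already recorded.
    boolean addOccurrence(String websiteFound) {
        return occurrences.add(websiteFound);
    }

    // Derived from the set, so it always matches the number of recorded websites.
    int getNumberOfOccurrences() {
        return occurrences.size();
    }

    Set<String> getAllOccurrences() {
        return Collections.unmodifiableSet(occurrences);
    }
}

class EmailEntity extends KrestEntity {
    private final String emailAddress;

    EmailEntity(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    @Override
    String getName() {
        return emailAddress;
    }
}

class PhoneEntity extends KrestEntity {
    private final String areaCode;
    private final String phoneNumber;

    PhoneEntity(String areaCode, String phoneNumber) {
        this.areaCode = areaCode;
        this.phoneNumber = phoneNumber;
    }

    // Area code concatenated with the phone number, with no separator.
    String getAreaAndPhoneNumber() {
        return areaCode + phoneNumber;
    }

    @Override
    String getName() {
        return getAreaAndPhoneNumber();
    }
}
```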

2.5.1.6 AddressEntity

The AddressEntity class is responsible for holding onto information about a single address entity. Each address found during an entity search will have its own instance of this class.

Figure 4.23 AddressEntity Class

2.5.1.7 EmailEntity

The EmailEntity class is responsible for holding onto information about a single email entity. Each email address found during an entity search will have its own instance of this class.

Figure 4.24 EmailEntity Class

2.5.1.8 FaxEntity

The FaxEntity class is responsible for holding onto information about a single fax entity. Each fax number found during an entity search will have its own instance of this class.

Figure 4.25 FaxEntity Class

2.5.1.9 PhoneEntity

The PhoneEntity class is responsible for holding onto information about a single phone entity. Each phone number found during an entity search will have its own instance of this class.

Figure 4.26 PhoneEntity Class

2.5.1.10 ZipEntity

The ZipEntity class is responsible for holding onto information about a single zip entity. Each zip code found during an entity search will have its own instance of this class.

Figure 4.27 ZipEntity Class

2.5.1.11 OverarchingEntity

The OverarchingEntity class is responsible for holding onto information about all entity types. Each street address, email address, fax number, phone number, and zip code found during an overarching entity search will have its own instance of this class.

Figure 4.28 OverarchingEntity Class

2.6 Sequence Diagrams

The following three subsections show the sequence diagrams for three user actions: performing a web crawl, performing a web search, and performing an entity search.

2.6.1 User Performs a Web Crawl

Prerequisites: KREST is already running.

Sequence of Events:

1. User presses the ‘Begin Crawl’ button.

2. The KrestController is notified that the crawl button was pressed and tells the WebCrawler to begin the crawl.

3. The WebCrawler tells the SiteVisitor to start visiting web pages.

4. The SiteVisitor updates the model with the web pages visited.

5. The SiteVisitor signals that the crawl is complete.

6. The WebCrawler updates the screen with the latest information via the CrawlerObserver.

Post-conditions: The KrestModel is updated with all pages visited, and the screen is updated for the user.

Figure 4.29 Web Crawl Sequence Diagram

2.6.2 User Performs a Web Search

Prerequisites:

1. KREST is already running.

2. A web crawl has already been performed.

Sequence of Events:

1. User presses the ‘Begin Search’ button.

2. The KrestController is notified that the search button was pressed and tells the WebSearcher to begin the search.

3. The WebSearcher queries the KrestModel for all Webpages.

4. The WebSearcher searches through the crawled pages for the search terms.

5. The WebSearcher updates the screen with the matching pages via the SearchObserver.

Post-conditions: The screen is updated with all matching web pages for the user.

Figure 4.30 Web Search Sequence Diagram
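The search in step 4 can be sketched as a case-insensitive scan of each crawled page's text. The Java sketch below assumes that a page matches only when it contains every whitespace-separated search term; the WebSearchSketch class, and its use of a map from page name to page text in place of the KrestModel, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WebSearchSketch {
    // Returns the names of pages whose text contains every whitespace-separated
    // term of searchString, ignoring case, in the map's iteration order.
    static List<String> search(Map<String, String> crawledPages, String searchString) {
        List<String> matches = new ArrayList<>();
        String trimmed = searchString.trim();
        if (trimmed.isEmpty()) {
            return matches;                  // a blank query matches nothing
        }
        String[] terms = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (Map.Entry<String, String> page : crawledPages.entrySet()) {
            String text = page.getValue().toLowerCase(Locale.ROOT);
            boolean containsAll = true;
            for (String term : terms) {
                if (!text.contains(term)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                matches.add(page.getKey());
            }
        }
        return matches;
    }
}
```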

2.6.3 User Performs an Entity Search

Prerequisites:

1. KREST is already running.

2. A web crawl has already been performed.

Sequence of Events:

1. User presses the ‘Begin Search’ button.

2. The KrestController is notified that the search button was pressed and tells the EntitySearcher to begin the search.

3. The EntitySearcher queries the KrestModel for all Webpages.

4. The EntitySearcher searches through the crawled pages for the search terms and extracts the corresponding entities.

5. The EntitySearcher updates the screen with the entities found via the EntityObserver.

Post-conditions: The screen is updated with all found entities.

Figure 4.31 Entity Search Sequence Diagram
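Entity extraction in step 4 is commonly done with regular expressions. The Java sketch below is a hypothetical illustration, not the project's patterns: it scans only pages that mention the search string, then collects every match of a given pattern (email address, US-style phone number, or ZIP code) along with the pages on which it appeared. The EntityExtractorSketch class and all three patterns are assumptions and will miss many real-world formats.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityExtractorSketch {
    // Illustrative patterns only; real addresses and numbers come in many more forms.
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    static final Pattern PHONE =
            Pattern.compile("\\(?\\d{3}\\)?[ .-]?\\d{3}[ .-]\\d{4}");
    static final Pattern ZIP =
            Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    // Scans only pages whose text contains searchString (ignoring case) and maps
    // each match of `pattern` to the set of page names it was found on.
    static Map<String, Set<String>> search(Map<String, String> crawledPages,
                                           String searchString, Pattern pattern) {
        String needle = searchString.toLowerCase(Locale.ROOT);
        Map<String, Set<String>> found = new LinkedHashMap<>();
        for (Map.Entry<String, String> page : crawledPages.entrySet()) {
            if (!page.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                continue;
            }
            Matcher m = pattern.matcher(page.getValue());
            while (m.find()) {
                found.computeIfAbsent(m.group(), k -> new LinkedHashSet<>())
                     .add(page.getKey());
            }
        }
        return found;
    }
}
```

Each value in the returned map plays the role of an entity's occurrence list: the same email address found on two pages yields one entry with two page names.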

3 Formal Specification

The project was formally specified in the Object Constraint Language (OCL) and validated using USE 2.3.1. All of the important classes, attributes, and operations were specified, along with invariants and pre- and postconditions. The formal specification follows:

model krest

--

-- APPLICATION PACKAGE

--

class KrestApplication

operations

init()

end

--

-- CONTROLLER PACKAGE

--

class KrestController

attributes

operations

KrestController()

initGui()

crawlButtonActionPerformed()

resetCrawlerButtonActionPerformed()

searchButtonActionPerformed()

resetTables()

entitySearchButtonActionPerformed()

end

class KrestAboutDialog

operations

KrestAboutDialog()

end

class WebSearcher

attributes

matches: Set(WebObject)

searchString: String

operations

WebSearcher(newSearchString: String)

beginSearch()

end

class WebCrawler

attributes

siteVisitor: SiteVisitor

debugSwitch: Boolean

operations

WebCrawler()

beginCrawl(pageAddress: String, searchString: String, maxToCrawl: Integer,

minBacklinks: Integer, filePath: String, maxDepth: Integer)

stopCrawling()

performReset(partial: Boolean)

getMatches(): Set(WebObject)

getSiteVisitorThreads(): Integer

getSiteVisitor(): SiteVisitor

end

class ThreadController

attributes

crowdSize: Integer

maxCrowdSize: Integer

ticketDatabase: Integer

operations

ThreadController(maxCrowdSizes: Integer)

getTicket(): Integer

returnTicket(ticket: Integer)

findFreeTicket(): Integer

end

class EntitySearcher

attributes

entityMatches: Set(KrestEntity)

entityType: Integer

searchString: String

operations

EntitySearcher(newSearchString: String)

beginSearch()

end

class SiteVisitor

attributes

MAX_THREADS: Integer

threadLimiter: ThreadController

debugSwitch: Boolean

searchString: String

crawlCounter: Integer

pagesToCrawl: Integer

pagesVisited: Integer

maxCrawl: Integer

threadCount: Integer

fileName: String

pageAddress: String

keepProcessing: Boolean

threadList: Set(SiteVisitor)

maxDepth: Integer

currentDepth: Integer

pageDatabase: Set(Webpage)

pageToFetch: Webpage

operations

SiteVisitor(pageAddr: String, searchStr: String, maxToCrawl: Integer, filePath: String,

maxSearchableDepth: Integer, curDepth: Integer)

start()

run()

stopAllThreads()

resetCrawler(partial: Boolean)

getMatches(): Set(Webpage)

getThreadCount(): Integer

getCrawlCount(): Integer

getQueueCount(): Integer

loadPage(page: String): String

extractHyperTextLinks(page: String)

containsSearchString(page: String): Boolean

alreadyVisited(pageAddr: String): Boolean

markAsVisited(pageAddr: String)

end

class HTTPReader

attributes

HTTP_PORT: Integer

operations

HTTPReader()

downloadWWWPage(): String

end

--

-- MODEL PACKAGE

--

class KrestModel

attributes

library: KrestObjectLibrary

name: String

operations

KrestModel()

setName(newName: String)

getName(): String

addObject(webObject: WebObject)

removeObject(webObject: WebObject)

getData(): KrestObjectLibrary

addObserver(observer: KrestView)

end

class KrestObjectLibrary

attributes

objects: Set(WebObject)

operations

KrestObjectLibrary()

findObjectByName(name: String): WebObject

findObjectsByType(type: Integer): KrestObjectLibrary

getKeys(): Set(String)

addObject(newObject: WebObject)

removeObject(objectToRemove: WebObject)

getAllObjects(): Set(WebObject)

end

class WebObject

attributes

name: String

operations

WebObject(newName: String)

getName(): String

setName(newName: String)

end

class Webpage < WebObject

attributes

pageText: String

backlinksCount: Integer

backlinks: Set(String)

operations

Webpage(newName: String)

getText(): String

setText(newText:String)

getBacklinkCount(): Integer

addNewBacklink(backlinkName: String)

end

class KrestEntity < WebObject

attributes

entityName: String

entityPattern: String

numberOfOccurrences: Integer

occurrenceList: Set(KrestEntity)

operations

KrestEntity()

getName(): String

setName(newName: String)

addOccurrence(websiteFound: String)

getAllOccurrences(): Set(KrestEntity)

end

class AddressEntity < KrestEntity

attributes

streetAddress: String

cityString: String

stateString: String

operations

AddressEntity(newStreet: String, newCity: String, newState: String)

getStreet(): String

getCity(): String

getState(): String

setStreet(newStreet: String)

setCity(newCity: String)

setState(newState: String)

end

class PhoneEntity < KrestEntity

attributes

areaCode: String

phoneNumber: String

operations

PhoneEntity(newAreaCode: String, newPhoneNumber: String)

getAreaCode(): String

getPhoneNumber(): String

getAreaAndPhoneNumber(): String

setAreaCode(newAreaCode: String)

setPhoneNumber(newPhoneNumber: String)

end

class FaxEntity < KrestEntity

attributes

areaCode: String

faxNumber: String

operations

FaxEntity(newAreaCode: String, newFaxNumber: String)

getAreaCode(): String

getFaxNumber(): String

getFullFaxNumber(): String

setAreaCode(newAreaCode: String)

setFaxNumber(newFaxNumber: String)

end

class ZipEntity < KrestEntity

attributes

zipCode: String

operations

ZipEntity(newZipCode: String)

getZipCode(): String

setZipCode(newZipCode: String)

end

class OverarchingEntity < KrestEntity

attributes

phoneAreaCode: String

phoneNumber: String

faxAreaCode: String

faxNumber: String

streetString: String

cityString: String

stateString: String

zipString: String

emailAddress: String

operations

OverarchingEntity(phoneAreaCodeString: String, phoneNumberString: String,

faxAreaCodeString: String, faxNumberString: String, newStreetString: String,

newCityString: String, newStateString: String, newZipCodeString: String,

newEmailAddress: String)

getPhoneAreaCode(): String

getPhoneNumber(): String

getAreaAndPhoneNumber(): String

getFaxAreaCode(): String

getFaxNumber(): String

getFaxAreaAndNumber(): String

getStreetAddress(): String

getCity(): String

getState(): String

getZipCode(): String

getEmailAddress(): String

setPhoneAreaCode(newAreaCode: String)

setPhoneNumber(newPhoneNumber: String)

setFaxAreaCode(newAreaCode: String)

setFaxNumber(newFaxNumber: String)

setStreetAddress(newStreetAddress: String)

setCity(newCity: String)

setState(newState: String)

setZipCode(newZipCode: String)

setEmailAddress(newEmailAddress: String)

end

class EmailEntity < KrestEntity

attributes

emailAddress: String

operations

EmailEntity(newEmail: String)

getEmailAddress(): String

setEmailAddress(newEmail: String)

end

--

-- VIEW PACKAGE

--

class KrestView

attributes

crawler: CrawlObserver

search: SearchObserver

entity: EntityObserver

operations

KrestView()

end

class CrawlObserver

attributes

operations

CrawlerObserver()

updateCurrentlyCrawlingField(): String

updateCrawledURLsTextField(): String

updateQueuedSitesTextField(): String

updateCrawlProgressBar(): Integer

end

class EntityObserver

attributes

operations

EntityObserver()

updateEntitySearchResults(results: KrestObjectLibrary)

end

class SearchObserver

attributes

operations

SearchObserver()

updateWebSearchResults(results: KrestObjectLibrary)

end

--

-- ASSOCIATIONS

--

--

-- CONTROLLER PACKAGE

--

association Dialog between

KrestController[1]

KrestAboutDialog[0..1] role dialog

end

association Searcher between

KrestController[1]

WebSearcher[1] role searcher

end

association Crawler between

KrestController[1]

WebCrawler[0..1] role crawler

end

association Entity between

KrestController[1]

EntitySearcher[1] role entity

end

association Threads between

WebCrawler[1]

ThreadController[1] role threads

end

association Visitor between

WebCrawler[1]

SiteVisitor[1..*] role visitor

end

association Reader between

SiteVisitor[1]

HTTPReader[1] role reader

end

--

-- MODEL PACKAGE

--

association Library between

KrestModel[1]

KrestObjectLibrary[1] role library

end

association Objects between

KrestObjectLibrary[1]

WebObject[0..*] role objects

end

--

-- VIEW PACKAGE

--

association CrawlerView between

KrestView[1]

CrawlObserver[1] role crawlerview

end

association SearcherView between

KrestView[1]

SearchObserver[1] role searchview

end

association EntityView between

KrestView[1]

EntityObserver[1] role entityview

end

--

-- CONSTRAINTS

--

constraints

--

-- All WebSearcher matches must have unique names

--

context ws : WebSearcher

inv UniqueNamesWebSearcherMatches:

ws.matches->forAll(p1,p2 | p1 <> p2

implies p1.name <> p2.name)

--

-- Every ThreadController has a current crowd size, a max crowd size,

-- and a ticket database count, each of which must be >= 0

--

context tc : ThreadController

inv PositiveCrowdSize:

tc.crowdSize >= 0

inv PositiveMaxCrowdSize:

tc.maxCrowdSize >= 0

inv PositiveDatabaseTicketsCount:

tc.ticketDatabase >= 0

--

-- All EntitySearcher matches must have unique names, and entity type

-- must be >= 0

--

context es : EntitySearcher

inv UniqueNamesEntitySearcherMatches:

es.entityMatches->forAll(p1,p2 | p1 <> p2

implies p1.entityName <> p2.entityName)

inv PositiveEntityType:

es.entityType >= 0

--

-- Every SiteVisitor has a MAX_THREADS value, a crawl counter, a count of pages

-- left to crawl, a count of pages visited, a maximum number of pages to crawl,

-- a current thread count, and a maximum search depth, each of which must be >= 0;

-- a current depth that must be >= 0 and <= the maximum search depth; and a page

-- database that contains only unique webpages

--

context sv : SiteVisitor

inv PositiveMaxThreads:

sv.MAX_THREADS >= 0

inv PositiveCrawlCounter:

sv.crawlCounter >= 0

inv PositivePagesToCrawl:

sv.pagesToCrawl >= 0

inv PositivePagesVisited:

sv.pagesVisited >= 0

inv PositiveMaxCrawlCount:

sv.maxCrawl >= 0

inv PositiveThreadCount:

sv.threadCount >= 0

inv PositiveMaxSearchDepth:

sv.maxDepth >= 0

inv PositiveCurrentDepth:

sv.currentDepth >= 0

inv CurrentDepthNotGreaterThanMaxDepth:

sv.currentDepth <= sv.maxDepth

inv UniqueWebpagesOnly:

sv.pageDatabase->forAll(p1,p2 | p1 <> p2

implies p1.name <> p2.name)

--

-- Every HTTPReader has an HTTP_PORT value between 0 and 65535

--

context hr : HTTPReader

inv PositivePortValue:

hr.HTTP_PORT >= 0

inv PortValueLessThanMax:

hr.HTTP_PORT <= 65535

--

-- All KrestObjectLibrary objects must have unique names

--

context lib : KrestObjectLibrary

inv UniqueNamesKrestObjectLibrary:

lib.objects->forAll(p1,p2 | p1 <> p2

implies p1.name <> p2.name)

--

-- Every Webpage object has a non-negative number of backlinks

--

context wp : Webpage

inv PositiveBacklinks:

wp.backlinksCount >= 0

--

-- The occurrences in each KrestEntity's occurrence list must have unique names,

-- and the number of occurrences must be non-negative

--

context ent : KrestEntity

inv UniqueNamesKrestEntities:

ent.occurrenceList->forAll(p1,p2 | p1 <> p2

implies p1.entityName <> p2.entityName)

inv PositiveOccurrenceCount:

ent.numberOfOccurrences >= 0

--

-- All WebObjects must be either a Webpage or KrestEntity, but not both

--

context wo: WebObject

inv IsOneOfItsSubtypes:

wo.oclIsKindOf(Webpage) or wo.oclIsKindOf(KrestEntity)

inv MutualExclusion1:

if wo.oclIsKindOf(Webpage) then not wo.oclIsKindOf(KrestEntity) else

wo.oclIsKindOf(KrestEntity) endif

--

-- OPERATIONS

--

--

-- Any object added to the KrestModel must be a new object

--

context KrestModel::addObject(webObject: WebObject)

pre cond1 : library.objects->excludes(webObject)

post cond2 : library.objects = library.objects@pre->including(webObject)

post cond3 : (library.objects - library.objects@pre)->size() = 1

--

-- Deleting an object from the KrestModel must remove it while the other objects

-- remain unchanged

--

context KrestModel::removeObject(webObject: WebObject)

pre cond1 : library.objects->includes(webObject)

post cond2 : library.objects = library.objects@pre->excluding(webObject)

post cond3 : (library.objects@pre - library.objects)->size() = 1

--

-- Finding an object by name in the KrestObjectLibrary

--

context KrestObjectLibrary::findObjectByName(name: String): WebObject

post cond1 : result = objects->any(c1 | c1.name = name)

--

-- Any object added to the KrestObjectLibrary must be a new object

--

context KrestObjectLibrary::addObject(newObject: WebObject)

pre cond1 : objects->excludes(newObject)

post cond2 : objects = objects@pre->including(newObject)

post cond3 : (objects - objects@pre)->size() = 1

--

-- Deleting an object from the KrestObjectLibrary must remove it while the other

-- objects remain unchanged

--

context KrestObjectLibrary::removeObject(objectToRemove: WebObject)

pre cond1 : objects->includes(objectToRemove)

post cond2 : objects = objects@pre->excluding(objectToRemove)

post cond3 : (objects@pre - objects)->size() = 1

--

-- Getting all objects from the KrestObjectLibrary returns all objects

--

context KrestObjectLibrary::getAllObjects(): Set(WebObject)

post cond1 : result = self.objects

--

-- Getting the name from the WebObject returns its name

--

context WebObject::getName(): String

post cond1 : result = self.name

--

-- Setting the name for the WebObject sets its name

--

context WebObject::setName(newName: String)

post cond1 : self.name = newName

--

-- Getting the page text from the Webpage returns its text

--

context Webpage::getText(): String

post cond1 : result = self.pageText

--

-- Setting the pageText for the Webpage sets its text

--

context Webpage::setText(newText: String)

post cond1 : self.pageText = newText

--

-- Getting the backlink count from the Webpage returns its count

--

context Webpage::getBacklinkCount(): Integer

post cond1 : result = self.backlinksCount

--

-- Any backlink added to the Webpage must be a new backlink

--

context Webpage::addNewBacklink(backlinkName: String)

pre cond1 : backlinks->excludes(backlinkName)

post cond2 : backlinks = backlinks@pre->including(backlinkName)

post cond3 : (backlinks - backlinks@pre)->size() = 1

--

-- Getting the name from the KrestEntity returns its name

--

context KrestEntity::getName(): String

post cond1 : result = self.entityName

--

-- Setting the name for the KrestEntity sets its entityName

--

context KrestEntity::setName(newName: String)

post cond1 : self.entityName = newName

--

-- Any added occurrence of an entity will be a new occurrence, and will increment

-- the number of occurrences

--

context KrestEntity::addOccurrence(websiteFound: String)

pre cond1 : self.occurrenceList.entityName->excludes(websiteFound)

post cond2 : occurrenceList.entityName = occurrenceList.entityName@pre->including(websiteFound)

post cond3 : (occurrenceList - occurrenceList@pre)->size() = 1

post cond4 : (numberOfOccurrences - numberOfOccurrences@pre) = 1

--

-- Getting all KrestEntity occurrences returns the list of occurrences

--

context KrestEntity::getAllOccurrences(): Set(KrestEntity)

post cond1 : result = self.occurrenceList

--

-- Creating a new AddressEntity sets the values passed in

--

context AddressEntity::AddressEntity(newStreet: String, newCity: String,

newState: String)

post cond1 : streetAddress = newStreet

post cond2 : cityString = newCity

post cond3 : stateString = newState

--

-- Getting the Street from AddressEntity returns the Street string

--

context AddressEntity::getStreet(): String

post cond1 : result = self.streetAddress


--

-- Getting the City from AddressEntity returns the city string

--

context AddressEntity::getCity(): String

post cond1 : result = self.cityString

--

-- Getting the stateString from AddressEntity returns the state string

--

context AddressEntity::getState(): String

post cond1 : result = self.stateString

--

-- Setting the street for AddressEntity stores the new street

--

context AddressEntity::setStreet(newStreet: String)

post cond1 : streetAddress = newStreet

--

-- Setting the city for AddressEntity stores the new city

--

context AddressEntity::setCity(newCity: String)

post cond1 : cityString = newCity

--

-- Setting the state for AddressEntity stores the new state

--


context AddressEntity::setState(newState: String)

post cond1 : stateString = newState

--

-- Creating a new PhoneEntity sets the values passed in

--

context PhoneEntity::PhoneEntity(newAreaCode: String, newPhoneNumber: String)

post cond1 : areaCode = newAreaCode

post cond2 : phoneNumber = newPhoneNumber

--

-- Getting the Area Code from PhoneEntity returns the area code

--

context PhoneEntity::getAreaCode(): String

post cond1 : result = self.areaCode

--

-- Getting the Phone Number from PhoneEntity returns the phone number

--

context PhoneEntity::getPhoneNumber(): String

post cond1 : result = self.phoneNumber

--

-- Getting the Area Code and Phone Number from PhoneEntity returns the area code

-- concatenated with the phone number

--


context PhoneEntity::getAreaAndPhoneNumber(): String

post cond1 : result = self.areaCode.concat(self.phoneNumber)

--

-- Setting the area code for PhoneEntity stores the new area code

--

context PhoneEntity::setAreaCode(newAreaCode: String)

post cond1 : areaCode = newAreaCode

--

-- Setting the phone number for PhoneEntity stores the new phone number

--

context PhoneEntity::setPhoneNumber(newPhoneNumber: String)

post cond1 : phoneNumber = newPhoneNumber
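The PhoneEntity contracts translate almost directly into Java. The sketch below assumes a hypothetical class with two `String` fields; note that the OCL specifies a plain `concat` with no separator, so any display formatting such as parentheses or dashes would have to happen elsewhere.

```java
// Hypothetical PhoneEntity sketch matching the OCL contracts above.
class PhoneEntity {
    private String areaCode;
    private String phoneNumber;

    // post cond1, cond2: the constructor stores both values passed in
    PhoneEntity(String newAreaCode, String newPhoneNumber) {
        this.areaCode = newAreaCode;
        this.phoneNumber = newPhoneNumber;
    }

    String getAreaCode() { return areaCode; }

    String getPhoneNumber() { return phoneNumber; }

    // post cond1: result = self.areaCode.concat(self.phoneNumber)
    String getAreaAndPhoneNumber() { return areaCode.concat(phoneNumber); }

    void setAreaCode(String newAreaCode) { this.areaCode = newAreaCode; }

    void setPhoneNumber(String newPhoneNumber) { this.phoneNumber = newPhoneNumber; }
}
```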

--

-- Creating a new FaxEntity sets the values passed in

--

context FaxEntity::FaxEntity(newAreaCode: String, newFaxNumber: String)

post cond1 : areaCode = newAreaCode

post cond2 : faxNumber = newFaxNumber

--

-- Getting the Area Code from FaxEntity returns the area code

--

context FaxEntity::getAreaCode(): String

post cond1 : result = self.areaCode


--

-- Getting the Fax Number from FaxEntity returns the fax number

--

context FaxEntity::getFaxNumber(): String

post cond1 : result = self.faxNumber

--

-- Getting the Area Code and Fax Number from FaxEntity returns the area code

-- concatenated with the fax number

--

context FaxEntity::getFullFaxNumber(): String

post cond1 : result = self.areaCode.concat(self.faxNumber)

--

-- Setting the area code for FaxEntity stores the new area code

--

context FaxEntity::setAreaCode(newAreaCode: String)

post cond1 : areaCode = newAreaCode

--

-- Setting the fax number for FaxEntity stores the new fax number

--

context FaxEntity::setFaxNumber(newFaxNumber: String)

post cond1 : faxNumber = newFaxNumber

--


-- Creating a new ZipEntity sets the value passed in

--

context ZipEntity::ZipEntity(newZipCode: String)

post cond1 : zipCode = newZipCode

--

-- Getting the Zip Code from ZipEntity returns the zip code

--

context ZipEntity::getZipCode(): String

post cond1 : result = self.zipCode

--

-- Setting the zip code for ZipEntity stores the new zip code

--

context ZipEntity::setZipCode(newZipCode: String)

post cond1 : zipCode = newZipCode

--

-- Creating a new EmailEntity sets the value passed in

--

context EmailEntity::EmailEntity(newEmail: String)

post cond1 : emailAddress = newEmail

--

-- Getting the Email Address from EmailEntity returns the email address

--


context EmailEntity::getEmailAddress(): String

post cond1 : result = self.emailAddress

--

-- Setting the Email Address for EmailEntity stores the new email address

--

context EmailEntity::setEmailAddress(newEmail: String)

post cond1 : emailAddress = newEmail

--

-- Creating a new OverarchingEntity sets the values passed in

--

context OverarchingEntity::OverarchingEntity(phoneAreaCodeString: String,
    phoneNumberString: String, faxAreaCodeString: String, faxNumberString: String,
    newStreetString: String, newCityString: String, newStateString: String,
    newZipCodeString: String, newEmailAddress: String)

post cond1 : self.phoneAreaCode = phoneAreaCodeString

post cond2 : self.phoneNumber = phoneNumberString

post cond3 : self.faxAreaCode = faxAreaCodeString

post cond4 : self.faxNumber = faxNumberString

post cond5 : self.streetString = newStreetString

post cond6 : self.cityString = newCityString

post cond7 : self.stateString = newStateString

post cond8 : self.zipString = newZipCodeString

post cond9 : self.emailAddress = newEmailAddress

--

-- Getting the Phone Area Code from OverarchingEntity returns the phone area code

--


context OverarchingEntity::getPhoneAreaCode(): String

post cond1 : result = self.phoneAreaCode

--

-- Getting the Phone Number from OverarchingEntity returns the phone number

--

context OverarchingEntity::getPhoneNumber(): String

post cond1 : result = self.phoneNumber

--

-- Getting the Area Code and Phone Number from OverarchingEntity returns the area

-- code concatenated with the phone number

--

context OverarchingEntity::getAreaAndPhoneNumber(): String

post cond1 : result = self.phoneAreaCode.concat(self.phoneNumber)

--

-- Getting the Fax Area Code from OverarchingEntity returns the fax area code

--

context OverarchingEntity::getFaxAreaCode(): String

post cond1 : result = self.faxAreaCode

--

-- Getting the Fax Number from OverarchingEntity returns the fax number

--

context OverarchingEntity::getFaxNumber(): String


post cond1 : result = self.faxNumber

--

-- Getting the Area Code and Fax Number from OverarchingEntity returns the

-- area code concatenated with the fax number

--

context OverarchingEntity::getFaxAreaAndNumber(): String

post cond1 : result = self.faxAreaCode.concat(self.faxNumber)

--

-- Getting the Street from OverarchingEntity returns the Street string

--

context OverarchingEntity::getStreetAddress(): String

post cond1 : result = self.streetString

--

-- Getting the City from OverarchingEntity returns the city string

--

context OverarchingEntity::getCity(): String

post cond1 : result = self.cityString

--

-- Getting the stateString from OverarchingEntity returns the state string

--

context OverarchingEntity::getState(): String

post cond1 : result = self.stateString


--

-- Getting the zip code from OverarchingEntity returns the zip code string

--

context OverarchingEntity::getZipCode(): String

post cond1 : result = self.zipString

--

-- Getting the email address from OverarchingEntity returns the email address

-- string

--

context OverarchingEntity::getEmailAddress(): String

post cond1 : result = self.emailAddress

--

-- Setting the phone area code for OverarchingEntity stores the new area code

--

context OverarchingEntity::setPhoneAreaCode(newAreaCode: String)

post cond1 : phoneAreaCode = newAreaCode

--

-- Setting the phone number for OverarchingEntity stores the new phone number

--

context OverarchingEntity::setPhoneNumber(newPhoneNumber: String)

post cond1 : phoneNumber = newPhoneNumber

--

-- Setting the fax area code for OverarchingEntity stores the new area code


--

context OverarchingEntity::setFaxAreaCode(newAreaCode: String)

post cond1 : faxAreaCode = newAreaCode

--

-- Setting the fax number for OverarchingEntity stores the new fax number

--

context OverarchingEntity::setFaxNumber(newFaxNumber: String)

post cond1 : faxNumber = newFaxNumber

--

-- Setting the street for OverarchingEntity stores the new street

--

context OverarchingEntity::setStreetAddress(newStreetAddress: String)

post cond1 : streetString = newStreetAddress

--

-- Setting the city for OverarchingEntity stores the new city

--

context OverarchingEntity::setCity(newCity: String)

post cond1 : cityString = newCity

--

-- Setting the state for OverarchingEntity stores the new state

--

context OverarchingEntity::setState(newState: String)


post cond1 : stateString = newState

--

-- Setting the zip code for OverarchingEntity stores the new zip code

--

context OverarchingEntity::setZipCode(newZipCode: String)

post cond1 : zipString = newZipCode

--

-- Setting the Email Address for OverarchingEntity stores the new email address

--

context OverarchingEntity::setEmailAddress(newEmailAddress: String)

post cond1 : emailAddress = newEmailAddress


CHAPTER 5 - Technical Inspection Checklist

1 Software Production Plan

This document provides a checklist to be used in the technical inspection of the Knowledge

Discovery in Databases (KDD) Research Entity Search Tool project. It provides a

guideline for the inspectors to follow to ensure that the Architectural Design Document and

the OCL formal specification model are both complete and correct.

2 Items to be Inspected

Inspectors should refer to Vision Document 2.0 while completing the technical inspection.

2.1 UML Diagrams

• Class Diagrams

• Sequence Diagrams

• Class Descriptions

2.2 Formal Specification

• Class Diagrams

3 Formal Inspectors

• Steve Stampbach

Contact: [email protected]

• Tim Weninger

Contact: [email protected]

4 Formal Inspection Checklist

Table 5.1 Technical Inspection Checklist

| Item # | Inspection Item | Pass/Fail/Partial | Comments |
|---|---|---|---|
| TI-1 | The symbols used in the class diagrams conform to UML standards. | | |
| TI-2 | The symbols used in the sequence diagrams conform to UML standards. | | |
| TI-3 | The classes in the class diagrams have corresponding descriptions provided in the Architectural Design Document. | | |
| TI-4 | The descriptions of the classes in the Architectural Design Document are clear and concise. | | |
| TI-5 | The classes in the formal specification are consistent with those in the Architectural Design Document (related to Web Crawling only). | | |
| TI-6 | The attributes in the formal specification are consistent with the attributes of the corresponding class diagrams. | | |
| TI-7 | The associations in the formal specification are present in the class diagrams as association links. | | |
| TI-8 | The multiplicities in the formal specification are consistent with the multiplicities of the corresponding class diagrams. | | |
| TI-9 | The sequence diagrams are clear and concise. | | |


CHAPTER 6 - Component Design

1 Introduction

The purpose of this document is to provide a component design of the KDD-Research Entity Search Tool (KREST). The document presents the class diagrams, gives the purpose of each class in the diagrams, and describes its attributes and methods.

1.1 Background

The purpose of KREST is to provide a multifunctional web search tool that runs as a

standalone application. The project allows the user to perform a web crawl, to perform

a basic web search over the crawled pages, and to perform an entity search over the

crawled pages. The project also allows the user to perform web searches and entity

searches based on datasets that can be loaded into the tool.

2 KDD-Research Entity Search Tool Architecture

2.1 Package View

The KREST project will follow the Model-View-Controller (MVC) architecture, with an application class that starts the program. Separating the view from the model in this way allows the display to be updated easily whenever the model changes.


Figure 6.1 Package View

2.2 Application Package

Figure 6.2 KREST Application Package

2.2.1 Class Description

2.2.1.1 KrestApplication

The KrestApplication class is a very simple class that is used to start the program. It starts up the KrestController and makes it visible.

Table 6.1 Detailed Description of the KrestApplication Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| KrestApplication | public | JDialog | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| (none) | | | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| init | public | void | void | Starts the application |
| main | public | String[] | void | Main method |

2.3 Controller Package

Figure 6.3 Controller Package

2.3.1 Class Description

2.3.1.1 KrestController

The KrestController class is responsible for getting all of the other parts up and running. It signals the web crawls, web searches, and entity searches to begin processing, and it also controls the display of the form.


Figure 6.4 KrestController Class


Table 6.2 Detailed Description of the KrestController Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| KrestController | public | JFrame | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| aboutAction | private | AbstractAction | |
| aboutMenuItem | private | JMenuItem | |
| crawlButton | private | JButton | |
| crawlOptionPanel | private | JPanel | |
| crawlProgressBar | private | JProgressBar | |
| crawlProgressLabel | private | JLabel | |
| crawledURLsLabel | private | JLabel | |
| crawledURLsTextField | private | JTextField | |
| currentlyCrawlingLabel | private | JLabel | |
| currentlyCrawlingTextField | private | JTextField | |
| entitySearchButton | private | JButton | |
| entitySearchPanel | private | JPanel | |
| entitySearchResultsTable | private | JTable | |
| entitySearchScrollPane | private | JScrollPane | |
| entitySearchStringLabel | private | JLabel | |
| entitySearchStringTextField | private | JTextField | |
| entitySearcher | private | EntitySearcher | |
| exitAction | private | AbstractAction | |
| exitMenuItem | private | JMenuItem | |
| fileMenu | private | JMenu | |
| helpMenu | private | JMenu | |
| jSeparator1 | private | JSeparator | |
| jSeparator2 | private | JSeparator | |
| krestMenuBar | private | JMenuBar | |
| krestTabbedPane | private | JTabbedPane | |
| loadDataAction | private | AbstractAction | |
| loadDataMenuItem | private | JMenuItem | |
| logFileCheckbox | private | JCheckBox | |
| logFileTextField | private | JTextField | |
| matchingPagesTable | private | JTable | |
| maxDepthRadioButton | private | JRadioButton | |
| maxDepthTextField | private | JTextField | |
| maxSitesComboBox | private | JComboBox | |
| maxSitesRadioButton | private | JRadioButton | |
| minBacklinksLabel | private | JLabel | |
| minBacklinksTextField | private | JTextField | |
| parentFrame | private | KrestController | |
| queueSitesCountLabel | private | JLabel | |
| queueSitesCountTextField | private | JTextField | |
| resetCrawlerButton | private | JButton | |
| saveResultsAction | private | AbstractAction | |
| saveResultsMenuItem | private | JMenuItem | |
| searchResultsScrolledPane | private | JScrollPane | |
| searchStringLabel | private | JLabel | |
| searchStringTextField | private | JTextField | |
| serialVersionUID | private | Long | final, static |
| startCrawlLabel | private | JLabel | |
| startCrawlTextField | private | JTextField | |
| view | private | KrestView | |
| webCrawler | private | WebCrawler | |
| webCrawlerPanel | private | JPanel | |
| webSearchButton | private | JButton | |
| webSearchPanel | private | JPanel | |
| webSearcher | private | WebSearcher | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| KrestController | public | void | void | Constructor to set up the class |
| crawlButtonActionPerformed | private | void | void | Starts the crawling action |
| entitySearchButtonActionPerformed | private | ActionEvent | void | Starts the entity search action |
| getAboutAction | private | void | AbstractAction | Gets the 'About' menu action |
| getCrawlOptionsPanel | private | void | JPanel | Gets the crawl options panel |
| getCrawlProgressBar | private | void | JProgressBar | Gets the crawl progress bar |
| getCrawlProgressLabel | private | void | JLabel | Gets the crawl progress label |
| getCrawledURLsLabel | private | void | JLabel | Gets the crawled URLs label |
| getCrawledURLsTextField | private | void | JTextField | Gets the crawled URLs text field |
| getCurrentlyCrawlingLabel | private | void | JLabel | Gets the currently crawling label |
| getCurrentlyCrawlingTextField | private | void | JTextField | Gets the currently crawling text field |
| getEntitySearchButton | private | void | JButton | Gets the entity search button |
| getEntitySearchResultsTable | private | void | JTable | Gets the entity search results table |
| getEntitySearchScrollPane | private | void | JScrollPane | Gets the entity search scroll pane |
| getEntitySearchStringLabel | private | void | JLabel | Gets the entity search string label |
| getEntitySearchStringTextField | private | void | JTextField | Gets the entity search string text field |
| getExitAction | private | void | AbstractAction | Gets the action for the exit menu item |
| getExitMenuItem | private | void | JMenuItem | Gets the exit menu item |
| getJSeparator2 | private | void | JSeparator | Gets the JSeparator |
| getLoadAction | private | void | AbstractAction | Gets the action for the load menu item |
| getLoadDataMenuItem | private | void | JMenuItem | Gets the load data menu item |
| getLogFileCheckbox | private | void | JCheckBox | Gets the log file checkbox |
| getMatchingPagesTable | private | void | JTable | Gets the table of matching pages |
| getMaxDepthRadioButton | private | void | JRadioButton | Gets the max depth radio button |
| getMaxDepthTextField | private | void | JTextField | Gets the max depth text field |
| getMaxSitesRadioButton | private | void | JRadioButton | Gets the max sites radio button |
| getMinBacklinksLabel | private | void | JLabel | Gets the min backlinks label |
| getMinBacklinksTextField | private | void | JTextField | Gets the min backlinks text field |
| getQueueSitesCountLabel | private | void | JLabel | Gets the queue sites count label |
| getQueueSitesCountTextField | private | void | JTextField | Gets the queue sites count text field |
| getResetCrawlerButton | public | void | JButton | Gets the reset crawler button |
| getSaveResultsAction | private | void | AbstractAction | Gets the action for saving results |
| getSaveResultsMenuItem | private | void | JMenuItem | Gets the save results menu item |
| getSearchResultsScrolledPane | private | void | JScrollPane | Gets the scroll pane containing search results |
| getSearchStringLabel | private | void | JLabel | Gets the search string label |
| getSearchStringTextField | private | void | JTextField | Gets the search string text field |
| getWebSearchButton | private | void | JButton | Gets the web search button |
| getWebSearchPanel | private | void | JPanel | Gets the web search panel |
| initGUI | private | void | void | Builds and displays the GUI |
| resetCrawlerButtonActionPerformed | private | void | void | Action that takes place when the reset crawler button is pressed |
| searchButtonActionPerformed | private | void | void | Action that takes place when the search button is pressed |

2.3.1.2 KrestAboutDialog

The KrestAboutDialog class is a dialog that displays information about the KREST application.

Figure 6.5 KrestAboutDialog Class

Table 6.3 Detailed Description of the KrestAboutDialog Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| KrestAboutDialog | public | JDialog | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| serialVersionUID | public | Long | final, static |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| KrestAboutDialog | public | JFrame | void | Constructor which sets up the dialog |


2.3.1.3 FileLoader

The FileLoader class is responsible for loading previously retrieved data into the application.

Figure 6.6 FileLoader Class

Table 6.4 Detailed Description of the FileLoader Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| FileLoader | public | JDialog | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| fileToLoad | private | String | |
| library | private | KrestObjectLibrary | |
| parent | private | KrestController | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| FileLoader | public | String, KrestController | void | Constructor which sets up the class |
| readInWebBaseFile | private | void | void | Reads in the chosen file in the WebBase format |

2.3.1.4 WebCrawler

The WebCrawler class is responsible for setting up everything needed for a web crawl and for starting the crawl.


Figure 6.7 WebCrawler Class

Table 6.5 Detailed Description of the WebCrawler Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| WebCrawler | public | JDialog | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| debugSwitch | public | Boolean | |
| siteVisitor | private | SiteVisitor | static |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| WebCrawler | public | none | void | Constructor which sets up the class |
| beginCrawl | public | String, String, Integer, Integer, String, Integer | void | Starts up the web crawl based upon the beginning page address |
| getMatches | public | void | Vector | Gets the matching web pages crawled |
| getSiteVisitor | public | void | SiteVisitor | Gets the SiteVisitor class object |
| getSiteVisitorThreads | public | void | Integer | Gets the current number of running SiteVisitor threads |
| performReset | public | Boolean | void | Resets the database by forcing a removal of all crawled pages |
| stopCrawling | public | void | void | Stops the current web crawl |

2.3.1.5 SiteVisitor

The SiteVisitor is responsible for visiting individual web pages. Each instance of

the SiteVisitor class is a thread that represents a different web page being visited.


Figure 6.8 SiteVisitor Class

Table 6.6 Detailed Description of the SiteVisitor Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| SiteVisitor | public | Thread | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| MAX_THREADS | private | Integer | static, final |
| crawlCounter | public | Integer | static |
| currentDepth | private | Integer | |
| debugSwitch | public | Boolean | |
| fileName | public | String | |
| fileWriter | private | FileWriter | |
| keepProcessing | public | Boolean | |
| library | public | KrestObjectLibrary | |
| maxCrawl | public | Integer | |
| maxDepth | private | Integer | |
| minBacklinks | public | Integer | |
| observer | public | CrawlerObserver | |
| pageAddress | public | String | |
| pageDatabase | public | Hashtable | |
| pageMatches | public | Vector | |
| pageToFetch | public | URL | |
| pagesToCrawl | public | Integer | |
| pagesVisited | public | Integer | |
| printStream | private | BufferedWriter | |
| searchString | public | String | |
| threadCount | public | Integer | |
| threadLimiter | public | ThreadController | |
| threadList | private | ArrayList | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| SiteVisitor | public | String, String, Integer, Integer, String, Integer | void | Constructor to set up the class |
| SiteVisitor | public | String, String, Integer, Integer, String, Integer, Integer | void | Constructor to set up the class |
| alreadyVisited | private | String | Boolean | Checks to see whether or not a page has already been visited |
| containsSearchString | private | String | Boolean | Checks to see whether or not a page contains the search string |
| extractHyperTextLinks | private | String | Vector | Extracts the links from the web page |
| getCrawlCount | public | void | Integer | Gets the number of pages crawled |
| getMatches | public | void | Vector | Gets the matching web pages |
| getQueueCount | public | void | Integer | Gets the number of pages in the queue |
| getThreadCount | public | void | Integer | Gets the number of SiteVisitor threads running |
| loadPage | private | URL | String | Gets the text of the given URL |
| markAsVisited | private | String | void | Marks that the page has been visited so that it is not crawled again |
| resetCrawler | public | Boolean | void | Resets the crawl information |
| run | public | void | void | Starts the crawl |
| stopAllThreads | public | void | void | Stops all threads from crawling |
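The per-page bookkeeping listed for SiteVisitor (alreadyVisited, markAsVisited, extractHyperTextLinks, containsSearchString) can be sketched in Java as follows. This is a hedged illustration, not the KREST implementation: the helper class `CrawlHelpers` is hypothetical, and extracting links with a regular expression over `href` attributes is an assumption, since the design does not say how pages are parsed. Because several SiteVisitor threads share the visited set, the sketch uses a concurrent set.

```java
import java.util.Locale;
import java.util.Set;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers mirroring SiteVisitor's alreadyVisited, markAsVisited,
// extractHyperTextLinks and containsSearchString. Regex-based link extraction
// is an assumption; the design does not say how links are parsed.
class CrawlHelpers {
    // Matches href="..." or href='...' in any letter case.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Shared by every SiteVisitor thread, so the set must be thread-safe.
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    boolean alreadyVisited(String url) {
        return visited.contains(url);
    }

    // Returns true only for the first caller that marks this URL, which stops
    // two threads from crawling the same page.
    boolean markAsVisited(String url) {
        return visited.add(url);
    }

    Vector<String> extractHyperTextLinks(String pageText) {
        Vector<String> links = new Vector<>();
        Matcher m = HREF.matcher(pageText);
        while (m.find()) {
            String link = m.group(1);
            int hash = link.indexOf('#');
            if (hash >= 0) {
                link = link.substring(0, hash); // drop in-page anchors
            }
            link = link.trim();
            if (!link.isEmpty() && !links.contains(link)) {
                links.add(link);
            }
        }
        return links;
    }

    boolean containsSearchString(String pageText, String searchString) {
        return pageText.toLowerCase(Locale.ROOT).contains(searchString.toLowerCase(Locale.ROOT));
    }
}
```

Relative links such as `/y.html` are returned as-is; a real crawler would resolve them against the page address (for example with `java.net.URI.resolve`) before queueing them.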

2.3.1.6 ThreadController

The ThreadController class ensures that no more than the specified maximum number of web crawling threads run at any one time. The web crawling threads are instances of the SiteVisitor class. The ThreadController maintains tickets to keep track of which threads are allowed to run: a thread that holds a ticket may run; otherwise, it sleeps while waiting to grab a ticket.

Figure 6.9 ThreadController Class

Table 6.7 Detailed Description of the ThreadController Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| ThreadController | public | none | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| crowdSize | public | Integer | |
| maxCrowdSize | public | Integer | |
| ticketDatabase | public | Integer [] | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| ThreadController | public | Integer | void | Constructor which sets up the class |
| findFreeTicket | protected | void | Integer | Finds the first available free ticket |
| getTicket | public | void | Integer | Grabs a ticket |
| returnAllTickets | public | void | void | Returns all tickets to the database |
| returnTicket | public | Integer | void | Returns a specific ticket to the database |
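The ticket scheme described for ThreadController is essentially a counting semaphore with numbered permits. The Java sketch below reuses the method names from Table 6.7 but is otherwise hypothetical: a `boolean[]` stands in for the `Integer []` ticketDatabase, and blocking with `wait()`/`notifyAll()` is an assumption (the original class may instead poll with `Thread.sleep()`).

```java
// Hypothetical sketch of the ticket scheme described for ThreadController.
// Using wait()/notifyAll() for the "sleep while waiting" step is an assumption.
class ThreadController {
    private final int maxCrowdSize;
    private final boolean[] ticketDatabase; // true = ticket currently held by a thread
    private int crowdSize = 0;              // number of tickets handed out

    ThreadController(int maxCrowdSize) {
        if (maxCrowdSize < 1) {
            throw new IllegalArgumentException("maxCrowdSize must be positive");
        }
        this.maxCrowdSize = maxCrowdSize;
        this.ticketDatabase = new boolean[maxCrowdSize];
    }

    // Returns the index of the first free ticket, or -1 if all are taken.
    protected synchronized int findFreeTicket() {
        for (int i = 0; i < maxCrowdSize; i++) {
            if (!ticketDatabase[i]) {
                return i;
            }
        }
        return -1;
    }

    // Blocks until a ticket is free, then claims it for the calling thread.
    public synchronized int getTicket() throws InterruptedException {
        int ticket;
        while ((ticket = findFreeTicket()) < 0) {
            wait();
        }
        ticketDatabase[ticket] = true;
        crowdSize++;
        return ticket;
    }

    public synchronized void returnTicket(int ticket) {
        if (ticketDatabase[ticket]) {
            ticketDatabase[ticket] = false;
            crowdSize--;
            notifyAll(); // wake threads waiting in getTicket()
        }
    }

    public synchronized void returnAllTickets() {
        java.util.Arrays.fill(ticketDatabase, false);
        crowdSize = 0;
        notifyAll();
    }

    public synchronized int getCrowdSize() {
        return crowdSize;
    }
}
```

A SiteVisitor thread would call `getTicket()` before fetching a page and `returnTicket(...)` in a `finally` block afterwards, so a failed download never leaks a ticket. In modern Java, `java.util.concurrent.Semaphore` provides the same limit without numbered tickets.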

2.3.1.7 HTTPReader

The HTTPReader class is responsible for downloading the text of a given web page.

If the given web page does not exist, it will throw an exception.

Figure 6.10 HTTPReader Class

Table 6.8 Detailed Description of the HTTPReader Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| HTTPReader | public | none | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| HTTP_PORT | public | Integer | final, static |
| in | public | DataInputStream | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| checkRobotExclusionaryProtocol | private | URL | Boolean | Checks to see if crawling the page is allowed |
| downloadWWWPage | public | URL | String | Grabs the text of the web page at the given URL |
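The robots check performed by checkRobotExclusionaryProtocol can be illustrated with a simplified, hypothetical helper that works on robots.txt text already downloaded from the site. The class `RobotsCheck` and its `isAllowed` method are illustrative names. The sketch only honors `Disallow` lines in groups addressed to `User-agent: *` and uses plain prefix matching; `Allow` lines and wildcards are deliberately ignored, so it is not a complete implementation of the protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt check in the spirit of
// HTTPReader.checkRobotExclusionaryProtocol. Only "Disallow" lines in groups
// for "User-agent: *" are honored, using plain prefix matching.
final class RobotsCheck {
    private RobotsCheck() {}

    static boolean isAllowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<>();
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            String line = rawLine;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share one group of rules.
                boolean star = value.equals("*");
                inStarGroup = lastWasAgent ? (inStarGroup || star) : star;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything".
                if (field.equals("disallow") && inStarGroup && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```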

2.3.1.8 WebSearcher

The WebSearcher class is responsible for setting up everything needed for a web search and for starting the search.

Figure 6.11 WebSearcher Class

Table 6.9 Detailed Description of the WebSearcher Class

| Class | Visibility | Extends | Implements |
|---|---|---|---|
| WebSearcher | public | none | none |

| Attribute | Visibility | Type | Other |
|---|---|---|---|
| debugSwitch | public | Boolean | |
| matches | public | ArrayList | |
| searchString | public | String | |
| searchStrings | public | ArrayList | |
| view | public | KrestView | |

| Function | Visibility | Parameters | Returns | Actions |
|---|---|---|---|---|
| WebSearcher | public | KrestView | void | Constructor which sets up the class |
| beginSearch | public | String | Integer | Kicks off the search for web pages that contain the search strings |
| determineMatches | private | ArrayList | ArrayList | Finds all matches in the web page list |
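The search step of WebSearcher can be sketched as below. This is a hypothetical illustration: `SimpleWebSearcher` is not a KREST class, pages are passed in as a map from URL to page text, and splitting the query on whitespace and requiring every term to appear (AND semantics, case-insensitive) are assumptions, since the design only says that matching pages "contain the search strings".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of WebSearcher's matching step (AND semantics assumed).
class SimpleWebSearcher {
    private final List<String> searchStrings = new ArrayList<>();

    // Splits the query into terms, fills `matches`, and returns the match count,
    // mirroring beginSearch's Integer result.
    int beginSearch(String searchString, Map<String, String> pages, List<String> matches) {
        searchStrings.clear();
        for (String term : searchString.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                searchStrings.add(term.toLowerCase(Locale.ROOT));
            }
        }
        matches.clear();
        if (!searchStrings.isEmpty()) {
            matches.addAll(determineMatches(pages));
        }
        return matches.size();
    }

    // Keeps the URLs whose text contains every search term, ignoring case.
    private List<String> determineMatches(Map<String, String> pages) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String text = page.getValue().toLowerCase(Locale.ROOT);
            boolean containsAll = true;
            for (String term : searchStrings) {
                if (!text.contains(term)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                result.add(page.getKey());
            }
        }
        return result;
    }
}
```

An empty or all-whitespace query returns no matches rather than every page.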

2.3.1.9 EntitySearcher

The EntitySearcher class is responsible for setting up everything needed for an entity search and for starting the search.

Figure 6.12 EntitySearcher Class


Table 6.10 Detailed Description of the EntitySearcher Class

Class Visibility Extends Implements

EntitySearcher public none none

Attribute Visibility Type Other

entityString public String

library public KrestObjectLibrary

matches public ArrayList

searchString public String

searchStrings public ArrayList

view public KrestView

Function Visibility Parameters Returns Actions

EntitySearcher public KrestView void Constructor that sets up the class

beginSearch public String Integer Kicks off the search for entities

determineEmailMatches private ArrayList ArrayList Finds all email entities in the matching web pages

determineFaxMatches private ArrayList ArrayList Finds all fax entities in the matching web pages

determineOverarchingMatches private ArrayList ArrayList Finds all entities in the matching web pages

determinePhoneMatches private ArrayList ArrayList Finds all phone number entities in the matching web pages

determineStreetAddressMatches private ArrayList ArrayList Finds all street address entities in the matching web pages

determineZipMatches private ArrayList ArrayList Finds all zip code entities in the matching web pages
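The design does not list the patterns that the `determine...Matches` methods use, although `KrestEntity` stores an `entityPattern` string. The sketch below uses `java.util.regex` to find US-style email addresses, phone numbers and zip codes in page text. It rests on several assumptions:

- These patterns are illustrative, not the ones KREST uses.
- They will miss some valid forms and accept some invalid ones.
- Fax numbers have the same shape as phone numbers, so telling them apart would need extra context, such as a nearby "Fax" label. That step is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately simple US-centric entity patterns.
class EntityPatterns {
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Area code (optional parentheses), then a 3-digit exchange and a 4-digit line number.
    static final Pattern PHONE =
            Pattern.compile("\\(?(\\d{3})\\)?[-. ]?(\\d{3})[-.](\\d{4})");
    // Five-digit zip code with an optional ZIP+4 suffix.
    static final Pattern ZIP = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    /** Returns every substring of pageText matched by the pattern, in order of appearance. */
    static List<String> findAll(Pattern pattern, String pageText) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(pageText);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

For example, in the text `Email info@example.edu or call (555) 123-4567. Manhattan, KS 66506.` each pattern finds exactly one entity. The matches are `info@example.edu`, `(555) 123-4567` and `66506`.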

2.4 View Package

Figure 6.13 View Package

2.4.1 Class Description

2.4.1.1 KrestView

The KrestView class is an abstract class that can be implemented by the

CrawlerObserver, the SearchObserver, and the EntityObserver classes. It is used to

update the display based on changes from the model.

Figure 6.14 KrestView Class

Table 6.11 Detailed Description of the KrestView Class

Class Visibility Extends Implements

KrestView public none none

Attribute Visibility Type Other

crawler private CrawlerObserver

entity private EntityObserver

search private SearchObserver

Function Visibility Parameters Returns Actions

KrestView public JTable, JTextField, JTable void Constructor to set up the view

addPageToSearchTable public String void Adds a new page to the search table

updateCurrentCount public Integer void Updates the number of crawled web pages

updateCurrentPage public String void Updates the current web page being crawled

updateCurrentProgress public Integer, Integer void Updates the current progress

updateEntitySearchResults public KrestObjectLibrary, String void Updates the screen with the entity search matches found

updateSitesToCrawl public Integer void Updates the number of sites left to crawl

updateWebSearchResults public ArrayList void Updates the screen with the web search matches found

2.4.1.2 CrawlerObserver

The CrawlerObserver class is responsible for updating the screen when the model

changes due to web crawling.

Figure 6.15 CrawlerObserver Class

Table 6.12 Detailed Description of the CrawlerObserver Class

Class Visibility Extends Implements

CrawlerObserver public none none

Attribute Visibility Type Other

crawlButton public JButton static

currentCount public JTextField static

currentPage public JTextField static

currentProgress public JProgressBar static

krestFrame public JFrame static

matchingPagesTable public JTable static

maxToCrawl public JComboBox static

sitesToCrawl public JTextField static

Function Visibility Parameters Returns Actions

CrawlerObserver public void Default Constructor to set up the view

CrawlerObserver public JTextField, JTextField, JProgressBar, JComboBox, JTable, JFrame, JButton void Class constructor

addPageToSearchTable public String void Adds a new page to the crawled pages table

updateCurrentCount public Integer void Updates the current number of crawled pages

updateCurrentPage public String void Updates the current page being crawled

updateCurrentProgress public Integer, Integer void Updates the progress bar

updateSitesToCrawl public Integer void Updates the number of sites left to crawl
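The tables do not define the two `Integer` arguments of `updateCurrentProgress`. The sketch below assumes they are the number of pages crawled so far and the maximum number of pages to crawl. It also assumes the crawl runs on a worker thread.

- Under those assumptions, one plausible implementation converts the two values to a percentage.
- It then passes the `JProgressBar` update to `SwingUtilities.invokeLater`, because Swing components should only be changed on the event dispatch thread.
- `ProgressUpdater` and `percentComplete` are hypothetical names.

```java
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;

// Hypothetical sketch: how an observer might drive the "Current Progress" bar.
class ProgressUpdater {
    private final JProgressBar progressBar;

    ProgressUpdater(JProgressBar progressBar) {
        this.progressBar = progressBar; // assumed to use the default 0..100 range
    }

    /** Converts (crawled, maxToCrawl) into a 0..100 percentage, clamped at both ends. */
    static int percentComplete(int crawled, int maxToCrawl) {
        if (maxToCrawl <= 0) {
            return 0;
        }
        long percent = (100L * crawled) / maxToCrawl;
        return (int) Math.max(0, Math.min(100, percent));
    }

    /** Safe to call from the crawler thread: the Swing update runs on the event dispatch thread. */
    void updateCurrentProgress(int crawled, int maxToCrawl) {
        final int percent = percentComplete(crawled, maxToCrawl);
        SwingUtilities.invokeLater(() -> progressBar.setValue(percent));
    }
}
```

The arithmetic is kept in a static method so that it can be checked without creating any Swing components.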

2.4.1.3 SearchObserver

The SearchObserver class is responsible for updating the screen when the model

changes due to web searching.

Figure 6.16 SearchObserver Class

Table 6.13 Detailed Description of the SearchObserver Class

Class Visibility Extends Implements

SearchObserver public none none

Attribute Visibility Type Other

matchingPagesTable public JTable

minBacklinksField public JTextField

Function Visibility Parameters Returns Actions

SearchObserver public JTable, JTextField void Constructor to set up the class

removeMatchesBelowBacklinkCount private ArrayList ArrayList Removes the matching web pages without the requisite number of back links

sortListByBacklinkCount private ArrayList ArrayList Sorts the matching web pages by decreasing number of back links

updateWebSearchResults public ArrayList void Updates the matching web pages found
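`removeMatchesBelowBacklinkCount` and `sortListByBacklinkCount` together cover WSRI 102 and WSRI 103. The minimal sketch below shows the two steps. It makes these assumptions:

- `MatchedPage` is a hypothetical stand-in for the `Webpage` objects held in the real `ArrayList`.
- The minimum is passed in as an `int` rather than read from `minBacklinksField`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of back-link filtering and ranking for web search results.
class BacklinkRanking {
    static final class MatchedPage {
        final String url;
        final int backlinkCount;

        MatchedPage(String url, int backlinkCount) {
            this.url = url;
            this.backlinkCount = backlinkCount;
        }
    }

    /** Keeps only the pages that have at least minBacklinks back links. */
    static List<MatchedPage> removeMatchesBelowBacklinkCount(List<MatchedPage> matches, int minBacklinks) {
        List<MatchedPage> kept = new ArrayList<>();
        for (MatchedPage page : matches) {
            if (page.backlinkCount >= minBacklinks) {
                kept.add(page);
            }
        }
        return kept;
    }

    /** Returns a copy sorted by decreasing back-link count; ties keep their input order. */
    static List<MatchedPage> sortListByBacklinkCount(List<MatchedPage> matches) {
        List<MatchedPage> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((MatchedPage p) -> p.backlinkCount).reversed());
        return sorted;
    }
}
```

`List.sort` is stable, so pages with equal back-link counts keep their original relative order. Filtering before sorting means only the surviving pages are sorted.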

2.4.1.4 EntityObserver

The EntityObserver class is responsible for updating the screen when the model

changes due to entity searching.

Figure 6.17 EntityObserver Class

Table 6.14 Detailed Description of the EntityObserver Class

Class Visibility Extends Implements

EntityObserver public none none

Attribute Visibility Type Other

matchingEntitiesTable public JTable

Function Visibility Parameters Returns Actions

EntityObserver public JTable void Constructor to set up the class

sortListByMatchSize private ArrayList ArrayList Sorts the entities found by the number of pages they were found on

updateEntitySearchResults public KrestObjectLibrary, String void Updates the matching entities found

2.5 Model Package

Figure 6.18 Model Package

2.5.1 Class Description

2.5.1.1 KrestModel

The KrestModel class is responsible for holding the current KrestObjectLibrary object and for making the appropriate pieces available to other classes.

Figure 6.19 KrestModel Class

Table 6.15 Detailed Description of the KrestModel Class

Class Visibility Extends Implements

KrestModel public none none

Attribute Visibility Type Other

library private KrestObjectLibrary static

name private String

Function Visibility Parameters Returns Actions

KrestModel public String void Constructor to set up the class

addObject public WebObject void Adds a new object to the model

findDataByName public String WebObject Find a specific object by name in the model

findObjectsByType public Integer ArrayList Find all objects of a specified type in the model

getData public void Enumeration Gets all of the objects in the model

getName public void String Gets the name of the model

removeObject public WebObject void Remove a specific object from the database

setName public String void Sets the name of the model

2.5.1.2 KrestObjectLibrary

The KrestObjectLibrary class is responsible for holding onto all created

WebObjects.

Figure 6.20 KrestObjectLibrary Class

Table 6.16 Detailed Description of the KrestObjectLibrary Class

Class Visibility Extends Implements

KrestObjectLibrary public none none

Attribute Visibility Type Other

objects public Hashtable static

Function Visibility Parameters Returns Actions

KrestObjectLibrary public void void Constructor to set up the class

findObjectByName public String WebObject Find a specific object in the database

findObjectsByType public Integer ArrayList Find all objects of a specified type in the database

getAllObjects public void Enumeration Gets all of the objects in the database

getKeys public void Enumeration Gets all of the keys of the database

removeObject public WebObject void Remove a specific object from the database
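The sketch below illustrates a name-keyed object store that supports type queries. It assumes the following:

- The `Hashtable` is keyed by object name, which would make `findObjectByName` a single lookup.
- Each object carries an integer type code, which `findObjectsByType` uses.

`LibrarySketch`, `StoredObject`, the `TYPE_` constants, and the `addObject` method (borrowed from `KrestModel`) are all hypothetical. The table marks `objects` as `static`; the sketch uses an instance field for simplicity.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch of a name-keyed object store with type queries.
class LibrarySketch {
    // Assumed type codes; the design only says the type argument is an Integer.
    static final int TYPE_WEBPAGE = 0;
    static final int TYPE_EMAIL = 1;

    static final class StoredObject {
        final String name;
        final int type;

        StoredObject(String name, int type) {
            this.name = name;
            this.type = type;
        }
    }

    // Keyed by name, so adding an object with an existing name replaces the old one.
    private final Hashtable<String, StoredObject> objects = new Hashtable<>();

    void addObject(StoredObject object) {
        objects.put(object.name, object);
    }

    StoredObject findObjectByName(String name) {
        return objects.get(name); // null when no object has that name
    }

    List<StoredObject> findObjectsByType(int type) {
        List<StoredObject> result = new ArrayList<>();
        for (StoredObject object : objects.values()) { // Hashtable iteration order is unspecified
            if (object.type == type) {
                result.add(object);
            }
        }
        return result;
    }

    void removeObject(StoredObject object) {
        objects.remove(object.name);
    }
}
```

A lookup by name is a single hash access, but a query by type scans every stored object. If type queries dominate, keeping a second index per type would avoid the scan.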

2.5.1.3 WebObject

The WebObject class is an abstract class that is extended by both the Webpage class and the KrestEntity class. It holds the data found during web crawls and web or entity searches.

Figure 6.21 WebObject Class

Table 6.17 Detailed Description of the WebObject Class

Class Visibility Extends Implements

WebObject public none none

Attribute Visibility Type Other

name protected String

Function Visibility Parameters Returns Actions

getName public void String Grabs the name of the object

setName public String void Gives a new name to the object

2.5.1.4 Webpage

The Webpage class is responsible for holding onto information about a single web page. Each web page explored will have its own Webpage instance.

Figure 6.22 Webpage Class

Table 6.18 Detailed Description of the Webpage Class

Class Visibility Extends Implements

Webpage public WebObject none

Attribute Visibility Type Other

backlinkCount private Integer

backlinks private ArrayList

pageText private String

Function Visibility Parameters Returns Actions

Webpage public String void Constructor that sets up the class

addNewBacklink public String void Adds the name of a webpage that links to this page

getBacklinkCount public void Integer Grabs the number of pages that link to this one

getText public void String Grabs the text of this object

setText public String void Sets the text of this object
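The compact sketch below illustrates the `WebObject`/`Webpage` relationship described above. The class names follow the tables, but the code is illustrative and not KREST's source. It differs from the tables and adds assumptions of its own:

- It derives the back-link count from the size of the list instead of keeping a separate `backlinkCount` field.
- It assumes that a page linking here more than once is counted once. The design does not say this.
- It assumes that self-links are ignored. The design does not say this either.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the WebObject -> Webpage inheritance; not the KREST source.
abstract class WebObject {
    protected String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

class Webpage extends WebObject {
    private final List<String> backlinks = new ArrayList<>();
    private String pageText = "";

    public Webpage(String url) {
        this.name = url;
    }

    /** Records a page that links here; duplicates and self-links are ignored (an assumption). */
    public void addNewBacklink(String linkingPage) {
        if (!linkingPage.equals(name) && !backlinks.contains(linkingPage)) {
            backlinks.add(linkingPage);
        }
    }

    public int getBacklinkCount() {
        return backlinks.size();
    }

    public String getText() {
        return pageText;
    }

    public void setText(String text) {
        this.pageText = text;
    }
}
```

The list mirrors the `ArrayList` in the table. A `LinkedHashSet` would make the duplicate check constant-time for pages with many back links.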

2.5.1.5 KrestEntity

The KrestEntity class is an abstract class that is extended by the AddressEntity, EmailEntity, FaxEntity, PhoneEntity, ZipEntity, and OverarchingEntity classes. It holds the data found during entity searches.

Figure 6.23 KrestEntity Class

Table 6.19 Detailed Description of the KrestEntity Class

Class Visibility Extends Implements

KrestEntity public WebObject none

Attribute Visibility Type Other

entityName protected String

entityPattern protected String

numberOfOccurrences public Integer

occurrenceList protected ArrayList

Function Visibility Parameters Returns Actions

addOccurrence public String void Adds a new webpage that contains the entity

getAllOccurrences public void ArrayList Grabs all instances of the entity from the database

getName public void String Grabs the name of this object

setName public String void Sets the name of this object
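`addOccurrence` and `numberOfOccurrences`, together with `EntityObserver.sortListByMatchSize`, suggest that each entity is ranked by the number of pages it was found on (ESRI 104). The sketch below assumes that interpretation:

- It records each page URL once per entity.
- It sorts entities by descending page count.
- A `LinkedHashSet` replaces the table's `ArrayList`, so repeated sightings on the same page are not double-counted.

`EntityOccurrences` and `sortByMatchSize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: per-entity page tracking and ranking by page count.
class EntityOccurrences {
    private final String entityName;
    // Distinct page URLs on which the entity was found, in discovery order.
    private final Set<String> occurrenceList = new LinkedHashSet<>();

    EntityOccurrences(String entityName) {
        this.entityName = entityName;
    }

    String getName() {
        return entityName;
    }

    void addOccurrence(String pageUrl) {
        occurrenceList.add(pageUrl); // a page already recorded is ignored
    }

    List<String> getAllOccurrences() {
        return new ArrayList<>(occurrenceList);
    }

    int numberOfOccurrences() {
        return occurrenceList.size();
    }

    /** Returns a copy sorted so that entities found on the most pages come first. */
    static List<EntityOccurrences> sortByMatchSize(List<EntityOccurrences> entities) {
        List<EntityOccurrences> sorted = new ArrayList<>(entities);
        sorted.sort(Comparator.comparingInt(EntityOccurrences::numberOfOccurrences).reversed());
        return sorted;
    }
}
```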

2.5.1.6 AddressEntity

The AddressEntity class is responsible for holding onto information about a single

address entity. Each address found during an entity search will have its own

instance of this class.

Figure 6.24 AddressEntity Class

Table 6.20 Detailed Description of the AddressEntity Class

Class Visibility Extends Implements

AddressEntity public KrestEntity none

Attribute Visibility Type Other

cityString private String

stateString private String

streetAddress private String

Function Visibility Parameters Returns Actions

AddressEntity public String, String, String void Constructor to set up the class

getCity public void String Gets the city

getState public void String Gets the state

getStreet public void String Gets the street

setCity public String void Sets the new city of the entity

setState public String void Sets the new state of the entity

setStreet public String void Sets the new street of the entity

2.5.1.7 EmailEntity

The EmailEntity class is responsible for holding onto information about a single

email entity. Each email address found during an entity search will have its own

instance of this class.

Figure 6.25 EmailEntity Class

Table 6.21 Detailed Description of the EmailEntity Class

Class Visibility Extends Implements

EmailEntity public KrestEntity none

Attribute Visibility Type Other

emailAddress private String

Function Visibility Parameters Returns Actions

EmailEntity public String void Constructor to set up the class

getEmailAddress public void String Grabs the email address associated with the object

setEmailAddress public String void Gives a new email address to the object

2.5.1.8 FaxEntity

The FaxEntity class is responsible for holding onto information about a single fax

entity. Each fax number found during an entity search will have its own instance of

this class.

Figure 6.26 FaxEntity Class

Table 6.22 Detailed Description of the FaxEntity Class

Class Visibility Extends Implements

FaxEntity public KrestEntity none

Attribute Visibility Type Other

areaCode private String

faxNumber private String

Function Visibility Parameters Returns Actions

FaxEntity public String, String void Constructor to set up the class

getAreaCode public void String Gets the area code of the fax number

getFaxNumber public void String Gets the fax number without the area code

getFullFaxNumber public void String Gets the fax number with the area code

setAreaCode public String void Sets the new area code of the entity

setPhoneNumber public String void Sets the new number of the entity

2.5.1.9 PhoneEntity

The PhoneEntity class is responsible for holding onto information about a single

phone entity. Each phone number found during an entity search will have its own

instance of this class.

Figure 6.27 PhoneEntity Class

Table 6.23 Detailed Description of the PhoneEntity Class

Class Visibility Extends Implements

PhoneEntity public KrestEntity none

Attribute Visibility Type Other

areaCode private String

phoneNumber private String

Function Visibility Parameters Returns Actions

PhoneEntity public String, String void Constructor to set up the class

getAreaAndPhoneNumber public void String Gets the phone number with the area code

getAreaCode public void String Gets the area code of the phone number

getPhoneNumber public void String Gets the phone number without the area code

setAreaCode public String void Sets the new area code of the entity

setPhoneNumber public String void Sets the new number of the entity
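`PhoneEntity` stores the area code and the local number separately and recombines them in `getAreaAndPhoneNumber`. The sketch below shows one way to split a matched phone string into those two constructor arguments using regex capture groups. The following are assumptions, not details from the design:

- the pattern itself;
- the `(AAA) NNN-NNNN` output format;
- the names `PhoneEntitySketch` and `parse`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splitting a US phone number into area code and local number.
class PhoneEntitySketch {
    private static final Pattern US_PHONE =
            Pattern.compile("\\(?(\\d{3})\\)?[-. ]?(\\d{3})[-.](\\d{4})");

    private final String areaCode;
    private final String phoneNumber;

    PhoneEntitySketch(String areaCode, String phoneNumber) {
        this.areaCode = areaCode;
        this.phoneNumber = phoneNumber;
    }

    /** Builds an entity from the first phone number in raw, or returns null if there is none. */
    static PhoneEntitySketch parse(String raw) {
        Matcher matcher = US_PHONE.matcher(raw);
        if (!matcher.find()) {
            return null;
        }
        // Group 1 is the area code; groups 2 and 3 form the local number.
        return new PhoneEntitySketch(matcher.group(1), matcher.group(2) + "-" + matcher.group(3));
    }

    String getAreaCode() {
        return areaCode;
    }

    String getPhoneNumber() {
        return phoneNumber;
    }

    String getAreaAndPhoneNumber() {
        return "(" + areaCode + ") " + phoneNumber;
    }
}
```

`FaxEntity` has the same two-part shape, so the same splitting would apply to fax numbers once they have been identified.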

2.5.1.10 ZipEntity

The ZipEntity class is responsible for holding onto information about a single zip

entity. Each zip code found during an entity search will have its own instance of

this class.

Figure 6.28 ZipEntity Class

Table 6.24 Detailed Description of the ZipEntity Class

Class Visibility Extends Implements

ZipEntity public KrestEntity none

Attribute Visibility Type Other

zipCode private String

Function Visibility Parameters Returns Actions

ZipEntity public String void Constructor to set up the class

getZipCode public void String Grabs the zip code associated with the object

setZipCode public String void Gives a new zip code to the object

2.5.1.11 OverarchingEntity

The OverarchingEntity class is responsible for holding onto information about all

entity types. Each street address, email address, fax number, phone number, and

zip code found during an overarching entity search will have its own instance of

this class.

Figure 6.29 OverarchingEntity Class

Table 6.25 Detailed Description of the OverarchingEntity Class

Class Visibility Extends Implements

OverarchingEntity public KrestEntity none

Attribute Visibility Type Other

cityString private String

emailAddress private String

faxAreaCode private String

faxNumber private String

phoneAreaCode private String

phoneNumber private String

stateString private String

streetString private String

zipCode private String

Function Visibility Parameters Returns Actions

OverarchingEntity public String, String, String, String, String, String, String, String, String void Constructor to set up the class

getAreaAndPhoneNumber public void String Gets the area code and phone number

getEmailAddress public void String Gets the email address

getCity public void String Gets the city

getFaxAreaAndNumber public void String Gets the fax area code and fax number

getFaxAreaCode public void String Gets the fax area code

getFaxNumber public void String Gets the fax number without the area code

getPhoneAreaCode public void String Gets the phone area code

getPhoneNumber public void String Gets the phone number without the area code

getState public void String Gets the state

getStreetAddress public void String Gets the street

getZipCode public void String Gets the zip code

setEmailAddress public String void Sets the new email address of the entity

setCity public String void Sets the new city of the entity

setFaxAreaCode public String void Sets the new fax area code of the entity

setFaxNumber public String void Sets the new fax number of the entity

setPhoneAreaCode public String void Sets the new phone area code of the entity

setPhoneNumber public String void Sets the new phone number of the entity

setState public String void Sets the new state of the entity

setStreetAddress public String void Sets the new street of the entity

setZipCode public String void Sets the new zip code of the entity

CHAPTER 7 - Test Plan

1 Test Plan Identifier

KREST-Validation-V-1.0

2 Introduction

This document describes the methods that will be used to test the KDD-Research Entity Search Tool (KREST). The project allows the user to perform a web crawl, to perform a basic web search over the crawled pages, and to perform an entity search over the crawled pages. Each task will be treated as a separate module of the system and will be tested against the associated requirements described in the Vision Document.

3 Test Items

The following items will be tested:

• General Application Related Items

• Web Crawler Items

• Web Search Items

• Entity Search Items

• Reproduction of results similar to those reported in [2], using the same datasets.

4 Tested Features

All features listed below will be tested. These features can also be found in the Vision

Document.

4.1 General Application Related Items

• ARI 100 – The program shall provide a GUI for user interaction.

• ARI 101 – The application shall be executable in a single step (i.e., without having to perform any setup steps).

• ARI 102 – The application shall have a menu bar that contains at a minimum: a File

menu and a Help menu.

• ARI 103 – The application shall allow the user to load a data set of web pages.

• ARI 104 – The application shall allow the user to save entity search results.

• ARI 105 – The application's Help menu shall contain at a minimum an About menu

item.

• ARI 106 – The application's menu bar shall contain shortcut keys.

• ARI 107 – The application shall be platform independent.

• ARI 108 – The application shall be able to be minimized.

• ARI 109 – The application shall be able to be closed without having to perform a

Control-C from the command line.

4.2 Web Crawler Items

• WCRI 100 – The user shall have the ability to perform a web crawl based on a

starting website.

• WCRI 101 – The user shall be allowed to specify the starting website (if none is

specified, http://www.cis.ksu.edu will be used).

• WCRI 102 – The user shall have the ability to specify the maximum depth of the

web crawl.

• WCRI 103 – The user shall have the ability to specify a log file in which to save

the results of the crawl.

• WCRI 104 – The user shall be allowed to specify the maximum number of

websites to crawl before stopping.

• WCRI 105 – The user shall be allowed to stop the crawl at any time before it

finishes.

• WCRI 106 – The user shall be notified when the crawl is complete.

• WCRI 107 – The user shall be kept apprised of the total number of pages left to

crawl.

• WCRI 108 – The user shall be apprised of the total number of pages crawled.

4.3 Web Search Items

• WSRI 100 – The user shall be allowed to search over previously crawled web

pages.

• WSRI 101 – The user shall have a box to enter search terms.

• WSRI 102 – The user shall be allowed to specify the minimum number of back-

links required for a page containing the search term to be considered a match.

• WSRI 103 – The URLs that match the search terms shall be sorted in order of

number of back-links.

• WSRI 104 – The URLs that match the search terms shall be displayed in a

scrollable text box.

4.4 Entity Search Items

• ESRI 100 – The user shall have the ability to search for entities from previously

crawled websites.

• ESRI 101 – The user shall have a box to enter search terms.

• ESRI 102 – There shall be entities for, at a minimum, email address, phone number, fax number, street address, and zip code.

• ESRI 103 – There shall be an overarching entity that gathers all contact info.

• ESRI 104 – The entity search results shall be ranked based on highest score.

• ESRI 105 – The user shall be allowed to specify search terms in addition to entity

terms.

• ESRI 106 – The entities that match the search terms shall be displayed in a

scrollable text box.

5 Features not to be Tested

The following two requirements will not be tested; instead, they will be verified by inspecting the code.

• WCRI 109 – The crawler shall follow the robot exclusionary protocol.

• WCRI 110 – The crawler shall use multiple threads to avoid putting too much

stress on an individual web host.

6 Approach

Testing will be performed by running separate series of actions in KREST. Each sequence of actions is defined as a separate test case in Section 10. For every step, a test case lists the action to be performed, the expected result, and the requirements that the step covers.

7 Item Pass / Fail Criteria

Each test case will be considered successful if it meets the requirements mentioned in the

Vision document. A test case will fail if any requirement is not met as described.

8 Suspension Criteria and Resumption Requirements

8.1 Suspension Criteria

If a test case fails, execution of that test case shall be halted. The failure shall be logged in the Test Log, together with its likely cause and suggested solutions to the problem.

8.2 Resumption Requirements

After a test case failure, the test case shall be rerun from its first step once the problem has been logged, its cause identified, and a solution implemented. Independent test cases may continue to be executed while problems found in other areas are being fixed.

9 Test Deliverables

A Test Log document will be maintained during testing and delivered when testing is complete. The Test Log will record the time and date of every test case run and whether each test passed or failed. For a failed test, the Test Log will also contain the reason for the failure and suggested solutions.

10 Testing Tasks

10.1 Test Case 1: Application Items

This test case tests the basic application items.

Prerequisites: None.

Table 7.1 Test Case 1

Step # Action Performed Expected Outcome Requirements Met Pass/Fail

1 Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help Menu
• Observe that the menu items contain shortcuts.
ARI 100, ARI 101, ARI 102, ARI 106

2 Tester types Alt-H | A.
• Observe that an About dialog is opened.
ARI 105, ARI 106

3 Tester selects the “OK” button from the About dialog.
• Observe that the About dialog closes.

4 Tester minimizes the KREST application.
• Observe that the KREST application is minimized.
ARI 108

5 Tester restores the KREST application.
• Observe that the KREST application is restored.
ARI 108

6 Tester types Alt-F | X.
• Observe that the KREST application closes.
ARI 109

7 Tester starts KREST by running the .jar file on a CIS Linux or Unix machine.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help Menu
• Observe that the menu items contain shortcuts.
ARI 100, ARI 101, ARI 107, ARI 102, ARI 106

8 Tester types Alt-F | X.
• Observe that the KREST application closes.
ARI 109

10.2 Test Case 2: Web Crawler Items

This test case tests the web crawler requirements.

Prerequisites: Test Case 1 must have passed.

Table 7.2 Test Case 2

Step # Action Performed Expected Outcome Requirements Met Pass/Fail

1 Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help Menu
• Observe that the menu items contain shortcuts.
ARI 100, ARI 101, ARI 102, ARI 106

2 Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Field becomes populated.
WCRI 100, WCRI 101

3 Tester selects the radio button next to “Max Depth to Explore”
• Observe that the radio button next to “Max Depth to Explore” becomes selected.
• Observe that the radio button next to “Max Sites to Explore” becomes deselected.

4 Tester selects the radio button next to “Max Sites to Explore”.
• Observe that the radio button next to “Max Sites to Explore” becomes selected.
• Observe that the radio button next to “Max Depth to Explore” becomes deselected.

5 Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
WCRI 104

6 Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is renamed to a “Stop Crawl” button.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the current website being explored.
• Observe that the “Crawled URLs” field is updated with the current number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
WCRI 108, WCRI 107, WCRI 106

7 Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.

8 Tester presses the “Reset

Crawler” button.

• Observe that a

Page 141:   MSE Portfolio - Eric Davispeople.cis.ksu.edu/~efd3467/MSE_Portfolio-Eric_Davis.pdfERIC F. DAVIS B.S., Kansas State University, 2003 A REPORT submitted in partial fulfillment of

133

Crawler” button. confirmation dialog

appears.

9 Tester presses the “Cancel” button in the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that there are no changes to the form.

10 Tester presses the “Reset Crawler” button.
• Observe that a confirmation dialog appears.

11 Tester presses the “OK” button in the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that the “Reset Crawler” button is desensitized.
• Observe that the crawl button is labeled “Begin Crawl”.
• Observe that the “Currently Crawling” field is empty.
• Observe that the “Crawled URLs” field is set to 0.
• Observe that the “Sites in the Queue” field is set to 0.
• Observe that the “Crawl Progress” progress bar is reset.

12 Tester selects the checkbox next to the “Log File to Use” field.
• Observe that the checkbox becomes selected.
• Observe that the field becomes enabled.
Requirements met: WCRI 103

13 Tester enters a valid filename where they would like the log results saved (or uses the default file name).
• Observe that the field is updated.

14 Tester selects the radio button next to “Max Depth to Explore”.
• Observe that the radio button next to “Max Depth to Explore” becomes selected.
• Observe that the radio button next to “Max Sites to Explore” becomes deselected.

15 Tester enters a value between 2 and 5 in the “Max Depth to Explore” field.
• Observe that the field is updated.
Requirements met: WCRI 102

16 Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is relabeled “Stop Crawl”.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the website currently being explored.
• Observe that the “Crawled URLs” field is updated with the number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Before the crawl completes, move on to the next step.
Requirements met: WCRI 108, WCRI 107

17 Tester presses the “Stop Crawl” button.
• Observe that the button text returns to “Begin Crawl”.
• Observe that the crawl is halted.
Requirements met: WCRI 105

18 Tester presses the “Reset Crawler” button.
• Observe that a confirmation dialog appears.

19 Tester presses the “OK” button in the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that the “Reset Crawler” button is desensitized.
• Observe that the crawl button is labeled “Begin Crawl”.
• Observe that the “Currently Crawling” field is empty.
• Observe that the “Crawled URLs” field is set to 0.
• Observe that the “Sites in the Queue” field is set to 0.
• Observe that the “Crawl Progress” progress bar is reset.

20 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

21 Tester opens the log file that they specified for crawling.
• Observe that the results of the web crawl were logged.
Requirements met: WCRI 103
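Test Case 2 exercises two mutually exclusive crawl limits (“Max Sites to Explore” and “Max Depth to Explore”) and a progress bar driven by the pages left to crawl. The Java sketch below shows one way such limits could bound a breadth-first crawl. It is not KREST's implementation, and several details are assumptions: it uses an in-memory link graph instead of live HTTP fetches, it counts the start page as depth 0, and it computes progress as crawled pages divided by crawled plus queued pages. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a bounded breadth-first crawl; not KREST's actual crawler. */
class BoundedCrawlSketch {

    /**
     * Visits pages breadth-first from {@code start}, stopping after {@code maxSites}
     * pages and never following links from pages at {@code maxDepth} (start = depth 0).
     * Pass Integer.MAX_VALUE to disable either limit.
     */
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int maxSites, int maxDepth) {
        List<String> crawled = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // URLs already queued or crawled
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();  // the "Sites in the Queue"
        queue.add(start);
        seen.add(start);
        depth.put(start, 0);
        while (!queue.isEmpty() && crawled.size() < maxSites) {
            String url = queue.poll();
            crawled.add(url);
            int d = depth.get(url);
            if (d >= maxDepth) {
                continue; // depth limit reached: do not enqueue this page's links
            }
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return crawled;
    }

    /** Progress as an integer percentage of crawled pages out of crawled plus queued. */
    static int progressPercent(int crawled, int queued) {
        int total = crawled + queued;
        return total == 0 ? 100 : (100 * crawled) / total;
    }
}
```

For example, with a depth limit of 1, only the start page and the pages it links to directly are visited. With a site limit of 2, the crawl stops after the second page regardless of depth.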

10.3 Test Case 3: Web Search Items

This test case tests the web search requirements.

Prerequisites: Test Case 1 and Test Case 2 must both have passed.

Table 7.3 Test Case 3

Step # | Action Performed | Expected Outcome | Requirements Met | Pass/Fail

1 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

2 Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Observe that the field becomes populated.
Requirements met: WCRI 100, WCRI 101

3 Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
Requirements met: WCRI 104

4 Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is relabeled “Stop Crawl”.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the website currently being explored.
• Observe that the “Crawled URLs” field is updated with the number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
Requirements met: WCRI 108, WCRI 107, WCRI 106

5 Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.

6 Tester selects the Web Search tab.
• Observe that the Web Search tab is now raised.

7 Tester enters a string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements met: WSRI 100, WSRI 101

8 Tester enters a value between 1 and 3 in the “Min # of Backlinks” field.
• Observe that the field is updated.
Requirements met: WSRI 102

9 Tester presses the “Begin Search” button.
• Observe that URLs whose text contains the search string are listed in the Search Results scrollable box.
• Observe that the URLs are sorted by decreasing number of backlinks.
Requirements met: WSRI 104, WSRI 103

10 Tester enters a new string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements met: WSRI 100, WSRI 101

11 Tester presses the “Begin Search” button.
• Observe that URLs whose text contains the search string are listed in the Search Results scrollable box.
• Observe that the URLs are sorted by decreasing number of backlinks.
Requirements met: WSRI 104, WSRI 103

12 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109
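Steps 7 through 11 expect only pages that contain the search string to be listed, sorted by decreasing number of backlinks, with the “Min # of Backlinks” value acting as a threshold. The Java sketch below shows that ranking under a few assumptions: matching is a case-insensitive substring test, pages below the minimum are filtered out, and backlink counts are already known for each crawled page. The class, field, and method names are hypothetical and do not come from KREST.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of backlink-ranked web search; not KREST's actual code. */
class WebSearchSketch {

    /** A crawled page: its URL, its text, and how many crawled pages link to it. */
    static final class Page {
        final String url;
        final String text;
        final int backlinks;

        Page(String url, String text, int backlinks) {
            this.url = url;
            this.text = text;
            this.backlinks = backlinks;
        }
    }

    /**
     * Returns the URLs of pages whose text contains {@code query} (ignoring case) and
     * that have at least {@code minBacklinks} backlinks, most-linked first.
     */
    static List<String> search(List<Page> pages, String query, int minBacklinks) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Page> hits = new ArrayList<>();
        for (Page page : pages) {
            if (page.backlinks >= minBacklinks
                    && page.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(page);
            }
        }
        // List.sort is stable, so pages with equal backlink counts keep their input order.
        hits.sort(Comparator.comparingInt((Page p) -> p.backlinks).reversed());
        List<String> urls = new ArrayList<>();
        for (Page page : hits) {
            urls.add(page.url);
        }
        return urls;
    }
}
```

Using a stable sort keeps the result deterministic when several pages have the same number of backlinks. That makes the expected ordering easier to check by hand during the test.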

10.4 Test Case 4: Entity Search Items

This test case tests the entity search requirements.

Prerequisites: Test Case 1 and Test Case 2 must both have passed.

Table 7.4 Test Case 4

Step # | Action Performed | Expected Outcome | Requirements Met | Pass/Fail

1 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

2 Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Observe that the field becomes populated.
Requirements met: WCRI 100, WCRI 101

3 Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
Requirements met: WCRI 104

4 Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is relabeled “Stop Crawl”.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the website currently being explored.
• Observe that the “Crawled URLs” field is updated with the number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
Requirements met: WCRI 108, WCRI 107, WCRI 106

5 Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.

6 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

7 Tester enters a string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

8 Tester appends “#email” (without the quotes) to the “Search String” field to search for the email address associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

9 Tester presses the “Begin Search” button.
• Observe that email addresses found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

10 Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

11 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

12 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

13 Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

14 Tester appends “#fax” (without the quotes) to the “Search String” field to search for the fax number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

15 Tester presses the “Begin Search” button.
• Observe that fax numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

16 Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

17 Tester appends “#address” (without the quotes) to the “Search String” field to search for the street address associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

18 Tester presses the “Begin Search” button.
• Observe that street addresses found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

19 Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

20 Tester appends “#zip” (without the quotes) to the “Search String” field to search for the zip code associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

21 Tester presses the “Begin Search” button.
• Observe that zip codes found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

22 Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

23 Tester appends “#all” (without the quotes) to the “Search String” field to search for all of the contact information associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 103

24 Tester presses the “Begin Search” button.
• Observe that all contact information found on the same pages that matched the search string is listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
Requirements met: ESRI 106, ESRI 104

25 Tester types Alt-F | S to save the entity search results.
• Observe that a file dialog appears.
Requirements met: ARI 104

26 Tester enters a valid file name and selects the ‘Save’ button.
• Observe that the entity search results were saved to the specified file.

27 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109
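Test Case 4 appends a tag such as “#email” or “#phone” to a search term. It expects the matching entities from the same pages, ranked by how often each was found. The Java sketch below shows that idea under several assumptions:

- the tag is the text after the last “#”;
- page texts are already in memory;
- only the email and phone tags are handled;
- the regular expressions are deliberately simplified (for example, phone numbers are assumed to use the NNN-NNN-NNNN form seen in Test Case 5).

All names are hypothetical, and this is not KREST's extraction code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of tag-based entity search; not KREST's actual code. */
class EntitySearchSketch {

    // Deliberately simplified patterns (assumptions for illustration only).
    private static final Map<String, Pattern> PATTERNS = Map.of(
            "email", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"),
            "phone", Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"));

    /** Splits "New York DMV #phone" into {"New York DMV", "phone"}. */
    static String[] parse(String query) {
        int hash = query.lastIndexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("query has no #tag: " + query);
        }
        String term = query.substring(0, hash).trim();
        String tag = query.substring(hash + 1).trim().toLowerCase(Locale.ROOT);
        return new String[] {term, tag};
    }

    /**
     * Collects entities of the tagged type from every page whose text contains the
     * search term (ignoring case). Most frequently found entities come first, and
     * ties are sorted alphabetically.
     */
    static List<String> search(List<String> pages, String query) {
        String[] parsed = parse(query);
        String term = parsed[0].toLowerCase(Locale.ROOT);
        Pattern pattern = PATTERNS.get(parsed[1]);
        if (pattern == null) {
            throw new IllegalArgumentException("unsupported tag in this sketch: #" + parsed[1]);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String page : pages) {
            if (!page.toLowerCase(Locale.ROOT).contains(term)) {
                continue; // only pages that match the search term contribute entities
            }
            Matcher m = pattern.matcher(page);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort(Comparator.comparing((String s) -> counts.get(s)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }
}
```

Breaking ties alphabetically is a design choice made here so that results are reproducible. The test steps only require ordering by the number of times found.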

10.5 Test Case 5: Reproducing the Results of [2]

This test case tests the ability of the entity searcher to reproduce the results of the entity search project described in [2].

Prerequisites: Test Case 1 and Test Case 2 must both have passed. The twelve datasets that represent a sampling of the original dataset used in Tao Cheng’s entity search work [2] must be available for use.

Table 7.5 Test Case 5

Step # | Action Performed | Expected Outcome | Requirements Met | Pass/Fail

1 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

2 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

3 Tester loads the ‘Test_Data_1.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

4 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

5 Tester enters “Citibank Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

6 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

7 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 800-967-2400 is among the matches.
Requirements met: ESRI 106, ESRI 104

8 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

9 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

10 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

11 Tester loads the ‘Test_Data_2.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

12 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

13 Tester enters “New York DMV” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

14 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

15 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 800-342-5368 is among the matches.
Requirements met: ESRI 106, ESRI 104

16 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

17 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

18 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

19 Tester loads the ‘Test_Data_3.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

20 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

21 Tester enters “Amazon Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

22 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

23 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 800-201-7575 is among the matches.
Requirements met: ESRI 106, ESRI 104

24 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

25 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

26 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

27 Tester loads the ‘Test_Data_4.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

28 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

29 Tester enters “EBay Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

30 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

31 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 888-749-3229 is among the matches.
Requirements met: ESRI 106, ESRI 104

32 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

33 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

34 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

35 Tester loads the ‘Test_Data_5.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

36 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

37 Tester enters “Thinkpad Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

38 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

39 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 877-338-4465 is among the matches.
Requirements met: ESRI 106, ESRI 104

40 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

41 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

42 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

43 Tester loads the ‘Test_Data_6.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

44 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

45 Tester enters “Illinois IRS” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

46 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

47 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 800-829-3676 is among the matches.
Requirements met: ESRI 106, ESRI 104

48 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

49 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

50 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

51 Tester loads the ‘Test_Data_7.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

52 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

53 Tester enters “Barnes & Noble Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

54 Tester appends “#phone” (without the quotes) to the “Search String” field to search for the phone number associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

55 Tester presses the “Begin Search” button.
• Observe that phone numbers found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that 800-422-7717 is among the matches.
Requirements met: ESRI 106, ESRI 104

56 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

57 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

58 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

59 Tester loads the ‘Test_Data_8.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

60 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

61 Tester enters “Bill Gates” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

62 Tester appends “#email” (without the quotes) to the “Search String” field to search for the email address associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

63 Tester presses the “Begin Search” button.
• Observe that email addresses found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that [email protected] is among the matches.
Requirements met: ESRI 106, ESRI 104

64 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

65 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

66 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

67 Tester loads the ‘Test_Data_9.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

68 Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

69 Tester enters “Oprah Winfrey” in the “Search String” field.
• Observe that the field is updated.
Requirements met: ESRI 100, ESRI 101, ESRI 105

70 Tester appends “#email” (without the quotes) to the “Search String” field to search for the email address associated with the preceding term.
• Observe that the field is updated.
Requirements met: ESRI 102

71 Tester presses the “Begin Search” button.
• Observe that email addresses found on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by decreasing number of times found.
• Observe that [email protected] is among the matches.
Requirements met: ESRI 106, ESRI 104

72 Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements met: ARI 109

73 Tester starts KREST by double-clicking the .jar file on a Windows PC.
• Observe that the KREST program starts up with the Web Crawler tab open.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items have shortcuts.
Requirements met: ARI 100, ARI 101, ARI 102, ARI 106

74 Tester presses Alt-F | L.
• Observe that a dialog for loading a file opens.
Requirements met: ARI 103

75 Tester loads the ‘Test_Data_10.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements met: ARI 103

Step 76: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

Step 77: Tester enters “Elvis Presley” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105

Step 78: Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102

Step 79: Tester presses the “Begin Search” button.
• Observe that email addresses that occurred on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that [email protected] is contained in the matches.
Requirements Met: ESRI 106, ESRI 104

Step 80: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109

Step 81: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106

Step 82: Tester presses Alt-F | L.
• Observe that the dialog to load a file opens.
Requirements Met: ARI 103

Step 83: Tester loads the ‘Test_Data_11.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements Met: ARI 103

Step 84: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

Step 85: Tester enters “Larry Page” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105

Step 86: Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102

Step 87: Tester presses the “Begin Search” button.
• Observe that email addresses that occurred on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that [email protected] is contained in the matches.
Requirements Met: ESRI 106, ESRI 104

Step 88: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109

Step 89: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106

Step 90: Tester presses Alt-F | L.
• Observe that the dialog to load a file opens.
Requirements Met: ARI 103

Step 91: Tester loads the ‘Test_Data_12.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements Met: ARI 103

Step 92: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.

Step 93: Tester enters “Arnold Schwarzenegger” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105

Step 94: Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102

Step 95: Tester presses the “Begin Search” button.
• Observe that email addresses that occurred on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that [email protected] is contained in the matches.
Requirements Met: ESRI 106, ESRI 104

Step 96: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
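The entity search steps above append a tag such as “#email” to the search term in the “Search String” field. The following Java sketch shows one way such a string could be split into a search term and an entity tag. The class name `EntityQuery`, the tag set, and the rule that the tag is whatever follows the last “#” are assumptions made for illustration; they are not taken from the KREST source.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: split "Oprah Winfrey #email" into a term and a tag. */
final class EntityQuery {
    // Tags exercised by the test procedures; the full KREST tag set is assumed.
    static final Set<String> TAGS = Set.of("email", "phone", "fax", "address", "zip", "all");

    final String term;
    final String tag;

    private EntityQuery(String term, String tag) {
        this.term = term;
        this.tag = tag;
    }

    /** Treats the text after the last '#' as the tag and the rest as the term. */
    static EntityQuery parse(String searchString) {
        String s = searchString.trim();
        int hash = s.lastIndexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("missing entity tag: " + searchString);
        }
        String term = s.substring(0, hash).trim();
        String tag = s.substring(hash + 1).trim().toLowerCase(Locale.ROOT);
        if (term.isEmpty()) {
            throw new IllegalArgumentException("missing search term: " + searchString);
        }
        if (!TAGS.contains(tag)) {
            throw new IllegalArgumentException("unknown entity tag: #" + tag);
        }
        return new EntityQuery(term, tag);
    }

    public static void main(String[] args) {
        EntityQuery q = parse("Oprah Winfrey #email");
        System.out.println(q.term + " -> " + q.tag); // Oprah Winfrey -> email
    }
}
```

Rejecting unknown tags up front, as sketched here, would let a tool report a mistyped tag instead of silently returning no results.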


CHAPTER 8 - Test Assessment Evaluation

1 Introduction

This document provides the results of performing functional qualification testing on the KDD-Research Entity Search Tool (KREST) project. The project allows the user to perform a web crawl, to perform a basic web search over the crawled pages, and to perform an entity search over the crawled pages. Functional black-box testing was performed. The functionality tested and the methods used are described in the Test Plan document.

2 Test Results Summary

Table 8.1 Test Results Summary

Test Case   | Main Functionality Tested      | Pass/Fail
Test Case 1 | Application Functionality      | PASS
Test Case 2 | Web Crawling Functionality     | PASS
Test Case 3 | Web Searching                  | PASS
Test Case 4 | Entity Searching               | PASS
Test Case 5 | Reproducing the results in [2] | PASS

The specific requirements tested by each test case are listed in the test logs, next to the step in which they are tested.

3 Complete Test Results

3.1 Test Case 1: Application Items

This test case tests the basic application items.

Prerequisites: None.


Date Performed: 3/9/08

Issues Found: None

Comments: Test ran perfectly.

Table 8.2 Test Log for Test Case 1

Each step below lists the action performed, the expected outcome, the requirements met, and the pass/fail result.

Step 1: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 2: Tester types Alt-H | A.
• Observe that an About dialog is opened.
Requirements Met: ARI 105, ARI 106
Pass/Fail: Pass

Step 3: Tester selects the “OK” button from the About dialog.
• Observe that the About dialog closes.
Pass/Fail: Pass

Step 4: Tester minimizes the KREST application.
• Observe that the KREST application is minimized.
Requirements Met: ARI 108
Pass/Fail: Pass

Step 5: Tester restores the KREST application.
• Observe that the KREST application is restored.
Requirements Met: ARI 108
Pass/Fail: Pass

Step 6: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass

Step 7: Tester starts KREST by running the .jar file on a CIS Linux or Unix machine.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 107, ARI 102, ARI 106
Pass/Fail: Pass

Step 8: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass

3.2 Test Case 2: Web Crawler Items

This test case tests the web crawler requirements.

Prerequisites: Test Case 1 must have passed.

Date Performed: 3/9/08

Issues Found: None

Comments: The crawler seemed to hang during the first attempt at breadth-first crawling. After the application was restarted, the issue could not be repeated. It is considered to be an issue with the internet connection (which has been flaky lately).

Table 8.3 Test Log for Test Case 2

Each step below lists the action performed, the expected outcome, the requirements met, and the pass/fail result.

Step 1: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 2: Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Observe that the field becomes populated.
Requirements Met: WCRI 100, WCRI 101
Pass/Fail: Pass

Step 3: Tester selects the radio button next to “Max Depth to Explore”.
• Observe that the radio button next to “Max Depth to Explore” becomes selected.
• Observe that the radio button next to “Max Sites to Explore” becomes deselected.
Pass/Fail: Pass

Step 4: Tester selects the radio button next to “Max Sites to Explore”.
• Observe that the radio button next to “Max Sites to Explore” becomes selected.
• Observe that the radio button next to “Max Depth to Explore” becomes deselected.
Pass/Fail: Pass

Step 5: Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
Requirements Met: WCRI 104
Pass/Fail: Pass

Step 6: Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is renamed to a “Stop Crawl” button.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the current website being explored.
• Observe that the “Crawled URLs” field is updated with the current number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
Requirements Met: WCRI 108, WCRI 107, WCRI 106
Pass/Fail: Pass

Step 7: Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.
Pass/Fail: Pass

Step 8: Tester presses the “Reset Crawler” button.
• Observe that a confirmation dialog appears.
Pass/Fail: Pass

Step 9: Tester presses the “Cancel” button from the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that there are no changes to the form.
Pass/Fail: Pass

Step 10: Tester presses the “Reset Crawler” button.
• Observe that a confirmation dialog appears.
Pass/Fail: Pass

Step 11: Tester presses the “OK” button from the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that the “Reset Crawler” button is desensitized.
• Observe that the “Begin Crawl” button is labeled appropriately.
• Observe that the “Currently Crawling” field is empty.
• Observe that the “Crawled URLs” field is set to 0.
• Observe that the “Sites in the Queue” field is set to 0.
• Observe that the “Crawl Progress” progress bar is reset.
Pass/Fail: Pass

Step 12: Tester selects the checkbox next to the “Log File to Use” field.
• Observe that the checkbox becomes selected.
• Observe that the field becomes enabled.
Requirements Met: WCRI 103
Pass/Fail: Pass

Step 13: Tester enters a valid filename where they would like the log results saved (or uses the default file name).
• Observe that the field is updated.
Pass/Fail: Pass

Step 14: Tester selects the radio button next to “Max Depth to Explore”.
• Observe that the radio button next to “Max Depth to Explore” becomes selected.
• Observe that the radio button next to “Max Sites to Explore” becomes deselected.
Pass/Fail: Pass

Step 15: Tester enters a value between 2 and 5 in the “Max Depth to Explore” field.
• Observe that the field is updated.
Requirements Met: WCRI 102
Pass/Fail: Pass

Step 16: Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is renamed to a “Stop Crawl” button.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the current website being explored.
• Observe that the “Crawled URLs” field is updated with the current number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Before the crawl completes, move to the next step.
Requirements Met: WCRI 108, WCRI 107
Pass/Fail: Pass

Step 17: Tester presses the “Stop Crawl” button.
• Observe that the text of the button returns to “Begin Crawl”.
• Observe that the crawl is halted.
Requirements Met: WCRI 105
Pass/Fail: Pass

Step 18: Tester presses the “Reset Crawler” button.
• Observe that a confirmation dialog appears.
Pass/Fail: Pass

Step 19: Tester presses the “OK” button from the confirmation dialog.
• Observe that the confirmation dialog disappears.
• Observe that the “Reset Crawler” button is desensitized.
• Observe that the “Begin Crawl” button is labeled appropriately.
• Observe that the “Currently Crawling” field is empty.
• Observe that the “Crawled URLs” field is set to 0.
• Observe that the “Sites in the Queue” field is set to 0.
• Observe that the “Crawl Progress” progress bar is reset.
Pass/Fail: Pass

Step 20: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass

Step 21: Tester opens the log file that they specified for crawling.
• Observe that the results of the web crawl were logged.
Requirements Met: WCRI 103
Pass/Fail: Pass
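Test Case 2 exercises a crawl bounded either by “Max Sites to Explore” or by “Max Depth to Explore”, and its comments mention breadth-first crawling. The Java sketch below illustrates such a bounded breadth-first traversal over an in-memory link map that stands in for fetched pages. The class `BoundedCrawler`, its method names, and the progress formula (crawled pages divided by crawled plus queued pages) are hypothetical and only approximate what the “Current Progress” bar might report.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a breadth-first crawl bounded by sites or by depth. */
final class BoundedCrawler {
    enum Limit { MAX_SITES, MAX_DEPTH }

    /** A queued URL together with its link distance from the start page. */
    private static final class Entry {
        final String url;
        final int depth;
        Entry(String url, int depth) { this.url = url; this.depth = depth; }
    }

    /** Returns the URLs in the order they were crawled. */
    static List<String> crawl(Map<String, List<String>> links, String start,
                              Limit mode, int limit) {
        List<String> crawled = new ArrayList<>();
        Set<String> seen = new HashSet<>();      // URLs already queued, to avoid revisits
        Deque<Entry> queue = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        queue.add(new Entry(start, 0));
        seen.add(start);
        while (!queue.isEmpty()) {
            // "Max Sites to Explore": stop once enough pages have been crawled.
            if (mode == Limit.MAX_SITES && crawled.size() >= limit) {
                break;
            }
            Entry current = queue.poll();
            crawled.add(current.url);
            // "Max Depth to Explore": crawl this page but do not follow its links.
            if (mode == Limit.MAX_DEPTH && current.depth >= limit) {
                continue;
            }
            for (String next : links.getOrDefault(current.url, List.of())) {
                if (seen.add(next)) {
                    queue.add(new Entry(next, current.depth + 1));
                }
            }
        }
        return crawled;
    }

    /** Percentage complete, assuming progress = crawled / (crawled + still queued). */
    static int progressPercent(int crawled, int queued) {
        int total = crawled + queued;
        return total == 0 ? 100 : (100 * crawled) / total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d", "e"),
                "d", List.of("f"));
        System.out.println(crawl(links, "a", Limit.MAX_SITES, 3)); // [a, b, c]
        System.out.println(crawl(links, "a", Limit.MAX_DEPTH, 2)); // [a, b, c, d, e]
    }
}
```

A real crawler would fetch each page over the network, which is where a hang such as the one noted in the comments could occur; a connection timeout on each fetch would keep one slow site from stalling the queue.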

3.3 Test Case 3: Web Search Items

This test case tests the web search requirements.

Prerequisites: Test Case 1 and Test Case 2 must both have passed.

Date Performed: 3/9/08

Issues Found: None

Comments: The first search yielded no results (probably due to the small number of web pages actually crawled). After the search term was changed, the test was completed properly.

Table 8.4 Test Log for Test Case 3

Each step below lists the action performed, the expected outcome, the requirements met, and the pass/fail result.

Step 1: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 2: Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Observe that the field becomes populated.
Requirements Met: WCRI 100, WCRI 101
Pass/Fail: Pass

Step 3: Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
Requirements Met: WCRI 104
Pass/Fail: Pass

Step 4: Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is renamed to a “Stop Crawl” button.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the current website being explored.
• Observe that the “Crawled URLs” field is updated with the current number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
Requirements Met: WCRI 108, WCRI 107, WCRI 106
Pass/Fail: Pass

Step 5: Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.
Pass/Fail: Pass

Step 6: Tester selects the Web Search tab.
• Observe that the Web Search tab is now raised.
Pass/Fail: Pass

Step 7: Tester enters a string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements Met: WSRI 100, WSRI 101
Pass/Fail: Pass

Step 8: Tester enters a value between 1 and 3 in the “Min # of Backlinks” field.
• Observe that the field is updated.
Requirements Met: WSRI 102
Pass/Fail: Pass

Step 9: Tester presses the “Begin Search” button.
• Observe that URLs that contain the search string in their text are listed in the Search Results scrollable box.
• Observe that the URLs are sorted by decreasing number of backlinks.
Requirements Met: WSRI 104, WSRI 103
Pass/Fail: Pass

Step 10: Tester enters a new string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements Met: WSRI 100, WSRI 101
Pass/Fail: Pass

Step 11: Tester presses the “Begin Search” button.
• Observe that URLs that contain the search string in their text are listed in the Search Results scrollable box.
• Observe that the URLs are sorted by decreasing number of backlinks.
Requirements Met: WSRI 104, WSRI 103
Pass/Fail: Pass

Step 12: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass
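Test Case 3 expects web search results that contain the search string, filtered by “Min # of Backlinks” and sorted by decreasing number of backlinks. A minimal Java sketch of that ranking follows. It assumes that a backlink is counted once per distinct crawled page linking to a URL and that text matching ignores case; `WebSearchRanker` and its parameters are hypothetical names, not the KREST API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of backlink-ranked web search over crawled pages. */
final class WebSearchRanker {
    /**
     * @param pageText     crawled URL -> page text
     * @param outLinks     crawled URL -> URLs that page links to
     * @param query        the "Search String"
     * @param minBacklinks the "Min # of Backlinks" threshold
     */
    static List<String> search(Map<String, String> pageText,
                               Map<String, List<String>> outLinks,
                               String query, int minBacklinks) {
        // Count each linking page once per target, ignoring self-links.
        Map<String, Integer> backlinks = new HashMap<>();
        for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
            for (String target : new HashSet<>(e.getValue())) {
                if (!target.equals(e.getKey())) {
                    backlinks.merge(target, 1, Integer::sum);
                }
            }
        }
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : pageText.entrySet()) {
            boolean matches = e.getValue().toLowerCase(Locale.ROOT).contains(needle);
            if (matches && backlinks.getOrDefault(e.getKey(), 0) >= minBacklinks) {
                hits.add(e.getKey());
            }
        }
        // Most backlinks first; ties broken alphabetically for a stable display.
        hits.sort((a, b) -> {
            int byLinks = Integer.compare(backlinks.getOrDefault(b, 0), backlinks.getOrDefault(a, 0));
            return byLinks != 0 ? byLinks : a.compareTo(b);
        });
        return hits;
    }
}
```

Breaking ties alphabetically is a choice made here only so the ordering is deterministic; the test log does not say how URLs with equal backlink counts are ordered.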

3.4 Test Case 4: Entity Search Items

This test case tests the entity search requirements.

Prerequisites: Test Case 1 and Test Case 2 must both have passed.

Date Performed: 3/9/08

Issues Found: None

Comments: Completed the test flawlessly.

Table 8.5 Test Log for Test Case 4

Each step below lists the action performed, the expected outcome, the requirements met, and the pass/fail result.

Step 1: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 2: Tester enters a valid website to crawl in the “Start Crawl At:” field.
• Observe that the field becomes populated.
Requirements Met: WCRI 100, WCRI 101
Pass/Fail: Pass

Step 3: Tester enters a number between 10 and 25 in the “Max Sites to Explore” field.
• Observe that the field is updated.
Requirements Met: WCRI 104
Pass/Fail: Pass

Step 4: Tester presses the “Begin Crawl” button.
• Observe that the “Begin Crawl” button is renamed to a “Stop Crawl” button.
• Observe that the “Reset Crawler” button becomes sensitized.
• Observe that the “Currently Crawling” field continuously updates with the current website being explored.
• Observe that the “Crawled URLs” field is updated with the current number of URLs that have been crawled.
• Observe that the “Sites in the Queue” field is updated with the number of websites that could still be explored.
• Observe that the “Current Progress” progress bar is updated based on the number of pages left to crawl.
• Observe that when the “Current Progress” progress bar reaches 100%, a dialog appears notifying the operator that the crawl is complete.
• When the crawl is complete, observe that the “Stop Crawl” button returns to a “Begin Crawl” button.
• Observe that the Web Search and Entity Search tabs become sensitized.
Requirements Met: WCRI 108, WCRI 107, WCRI 106
Pass/Fail: Pass

Step 5: Tester selects the “OK” button from the dialog.
• Observe that the dialog closes.
Pass/Fail: Pass

Step 6: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.
Pass/Fail: Pass

Step 7: Tester enters a string to search for in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 8: Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 9: Tester presses the “Begin Search” button.
• Observe that email addresses that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 10: Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 11: Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 12: Tester presses the “Begin Search” button.
• Observe that phone numbers that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 13: Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 14: Tester adds “#fax” (without the quotes) to the “Search String” field to search for the fax number of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 15: Tester presses the “Begin Search” button.
• Observe that fax numbers that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 16: Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 17: Tester adds “#address” (without the quotes) to the “Search String” field to search for the street address of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 18: Tester presses the “Begin Search” button.
• Observe that street addresses that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 19: Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 20: Tester adds “#zip” (without the quotes) to the “Search String” field to search for the zip code of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 21: Tester presses the “Begin Search” button.
• Observe that zip codes that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 22: Tester deletes the old value in the “Search String” field and enters a new string to search for.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 23: Tester adds “#all” (without the quotes) to the “Search String” field to search for all of the contact info of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 103
Pass/Fail: Pass

Step 24: Tester presses the “Begin Search” button.
• Observe that all contact info that was contained on the same pages that matched the search string is listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 25: Tester types Alt-F | S, saves the entity search results, and verifies that the data was saved.
• Observe that the entity search results were saved to the specified file.
Requirements Met: ARI 104
Pass/Fail: Pass

Step 26: Tester enters a valid file name and selects the ‘Save’ button.
• Observe that the entity search results were saved to the specified file.
Pass/Fail: Pass

Step 27: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass
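Test Case 4 checks that entities of the requested type, found on pages matching the search term, are listed with the most frequently found entries first. The Java sketch below illustrates that idea with regular expressions for three of the tags. The patterns (US-style “800-967-2400” phone numbers, five- or nine-digit zip codes, a simplified email pattern), the class `EntityExtractor`, and the case-insensitive page match are all assumptions; fax, street-address, and “#all” handling are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of tag-driven entity extraction ranked by frequency. */
final class EntityExtractor {
    // Simplified, assumed patterns; real contact data is far more varied.
    static final Map<String, Pattern> PATTERNS = Map.of(
            "email", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"),
            "phone", Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"),
            "zip", Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b"));

    /** Counts entities of the given tag on pages containing the term, most frequent first. */
    static List<Map.Entry<String, Integer>> rank(List<String> pages, String term, String tag) {
        Pattern pattern = PATTERNS.get(tag);
        if (pattern == null) {
            throw new IllegalArgumentException("no pattern for #" + tag);
        }
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (String page : pages) {
            if (!page.toLowerCase(Locale.ROOT).contains(needle)) {
                continue; // only pages that match the search term contribute
            }
            Matcher m = pattern.matcher(page);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so output is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }
}
```

Counting every occurrence across all matching pages, rather than once per page, is one reading of “number of times found”; counting distinct pages would be an equally plausible alternative.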

3.5 Test Case 5: Reproducing the Results of [2]

This test case tests the ability of the entity searcher to reproduce the results of the entity search project described in [2].

Prerequisites: Test Case 1 and Test Case 2 must both have passed. Four datasets which represent a sampling of the original dataset found in [2] must be available for use.

Date Performed: 3/11/08. Retested 3/12/08.

Issues Found: The Entity Searcher was having trouble with the case sensitivity of search terms. The check in the code was updated, and the application was rebuilt and retested.

Comments: Overall the test worked well after the fix.
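The case-sensitivity fix is not shown in this report. As one hedged illustration, a case-insensitive containment check in Java could use `String.regionMatches` with `ignoreCase` set to `true`, which compares characters in place instead of building lower-cased copies of each page. The class `TermMatcher` and its method are hypothetical names.

```java
/** Hypothetical sketch of a case-insensitive search-term check. */
final class TermMatcher {
    /** Returns true if pageText contains term, ignoring differences in case. */
    static boolean containsIgnoreCase(String pageText, String term) {
        int n = term.length();
        if (n == 0) {
            return true; // an empty term trivially matches, like String.contains("")
        }
        // Slide a window of the term's length across the page text.
        for (int i = 0; i + n <= pageText.length(); i++) {
            if (pageText.regionMatches(true, i, term, 0, n)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "EBay" in the search string still matches "eBay" on the page.
        System.out.println(containsIgnoreCase("Contact eBay customer service", "EBay Customer Service")); // true
    }
}
```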

Table 8.6 Test Log for Test Case 5

Each step below lists the action performed, the expected outcome, the requirements met, and the pass/fail result.

Step 1: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 2: Tester presses Alt-F | L.
• Observe that the dialog to load a file opens.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 3: Tester loads the ‘Test_Data_1.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 4: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.
Pass/Fail: Pass

Step 5: Tester enters “Citibank Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 6: Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 7: Tester presses the “Begin Search” button.
• Observe that phone numbers that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that 800-967-2400 is contained in the matches.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 8: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass

Step 9: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 10: Tester presses Alt-F | L.
• Observe that the dialog to load a file opens.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 11: Tester loads the ‘Test_Data_2.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 12: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.
Pass/Fail: Pass

Step 13: Tester enters “New York DMV” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 14: Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 15: Tester presses the “Begin Search” button.
• Observe that phone numbers that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that 800-342-5368 is contained in the matches.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

Step 16: Tester types Alt-F | X.
• Observe that the KREST application closes.
Requirements Met: ARI 109
Pass/Fail: Pass

Step 17: Tester starts KREST by double clicking on the .jar file on a Windows PC.
• Observe that the KREST program starts up, with the Web Crawler tab opened.
• Observe that the menu bar contains a File menu and a Help menu.
• Observe that the menu items contain shortcuts.
Requirements Met: ARI 100, ARI 101, ARI 102, ARI 106
Pass/Fail: Pass

Step 18: Tester presses Alt-F | L.
• Observe that the dialog to load a file opens.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 19: Tester loads the ‘Test_Data_3.pages’ dataset.
• Observe that the load dialog disappears.
• Observe that the Web Search and Entity Search tabs become enabled.
Requirements Met: ARI 103
Pass/Fail: Pass

Step 20: Tester selects the Entity Search tab.
• Observe that the Entity Search tab is now raised.
Pass/Fail: Pass

Step 21: Tester enters “Amazon Customer Service” in the “Search String” field.
• Observe that the field is updated.
Requirements Met: ESRI 100, ESRI 101, ESRI 105
Pass/Fail: Pass

Step 22: Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.
• Observe that the field is updated.
Requirements Met: ESRI 102
Pass/Fail: Pass

Step 23: Tester presses the “Begin Search” button.
• Observe that phone numbers that were contained on the same pages that matched the search string are listed in the Entity Search Results scrollable box.
• Observe that the results are sorted by the number of times each was found, most frequent first.
• Observe that 800-201-7575 is contained in the matches.
Requirements Met: ESRI 106, ESRI 104
Pass/Fail: Pass

24 Tester types Alt-F | X. • Observe that the

KREST application

closes.

ARI 109 Pass

25 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

26 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

27 Tester loads the ‘Test_Data_4.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

28 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

29 Tester enters “EBay Customer Service” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

30 Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

31 Tester presses the “Begin Search” button.

• Observe that the phone numbers found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain 888-749-3229.

ESRI 106, ESRI 104 Pass

32 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

33 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

34 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

35 Tester loads the ‘Test_Data_5.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

36 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

37 Tester enters “Thinkpad Customer Service” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

38 Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

39 Tester presses the “Begin Search” button.

• Observe that the phone numbers found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain 877-338-4465.

ESRI 106, ESRI 104 Pass

40 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

41 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

42 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

43 Tester loads the ‘Test_Data_6.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

44 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

45 Tester enters “Illinois IRS” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

46 Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

47 Tester presses the “Begin Search” button.

• Observe that the phone numbers found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain 800-829-3676.

ESRI 106, ESRI 104 Pass

48 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

49 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

50 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

51 Tester loads the ‘Test_Data_7.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

52 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

53 Tester enters “Barnes & Noble Customer Service” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

54 Tester adds “#phone” (without the quotes) to the “Search String” field to search for the phone number of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

55 Tester presses the “Begin Search” button.

• Observe that the phone numbers found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain 800-422-7717.

ESRI 106, ESRI 104 Pass

56 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

57 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

58 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

59 Tester loads the ‘Test_Data_8.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

60 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

61 Tester enters “Bill Gates” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

62 Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

63 Tester presses the “Begin Search” button.

• Observe that the email addresses found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain [email protected].

ESRI 106, ESRI 104 Pass

64 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

65 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

66 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

67 Tester loads the ‘Test_Data_9.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

68 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

69 Tester enters “Oprah Winfrey” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

70 Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

71 Tester presses the “Begin Search” button.

• Observe that the email addresses found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain [email protected].

ESRI 106, ESRI 104 Pass

72 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

73 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

74 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

75 Tester loads the ‘Test_Data_10.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

76 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

77 Tester enters “Elvis Presley” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

78 Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

79 Tester presses the “Begin Search” button.

• Observe that the email addresses found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain [email protected].

ESRI 106, ESRI 104 Pass

80 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

81 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

82 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

83 Tester loads the ‘Test_Data_11.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

84 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

85 Tester enters “Larry Page” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

86 Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

87 Tester presses the “Begin Search” button.

• Observe that the email addresses found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain [email protected].

ESRI 106, ESRI 104 Pass

88 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

89 Tester starts KREST by double-clicking the .jar file on a Windows PC.

• Observe that the KREST program starts up with the Web Crawler tab open.

• Observe that the menu bar contains a File menu and a Help menu.

• Observe that the menu items have keyboard shortcuts.

ARI 100, ARI 101, ARI 102, ARI 106 Pass

90 Tester presses Alt-F | L.

• Observe that the dialog to load a file opens.

ARI 103 Pass

91 Tester loads the ‘Test_Data_12.pages’ dataset.

• Observe that the load dialog disappears.

• Observe that the Web Search and Entity Search tabs become enabled.

ARI 103 Pass

92 Tester selects the Entity Search tab.

• Observe that the Entity Search tab is now raised.

Pass

93 Tester enters “Arnold Schwarzenegger” in the “Search String” field.

• Observe that the field is updated.

ESRI 100, ESRI 101, ESRI 105 Pass

94 Tester adds “#email” (without the quotes) to the “Search String” field to search for the email address of the previous term.

• Observe that the field is updated.

ESRI 102 Pass

95 Tester presses the “Begin Search” button.

• Observe that the email addresses found on the pages that matched the search string are listed in the Entity Search Results scrollable box.

• Observe that the results are sorted by the number of times each was found, highest first.

• Observe that the matches contain [email protected].gov.

ESRI 106, ESRI 104 Pass

96 Tester types Alt-F | X.

• Observe that the KREST application closes.

ARI 109 Pass

4 Overall Results

KREST passed the formal qualification testing with flying colors and is now ready for the final MSE presentation.


CHAPTER 9 - User’s Manual

1 Introduction

This document describes how to set up and run the KDD-Research Entity Search Tool (KREST). It explains how to run web crawls, web searches, and entity searches, and how to load existing data.

2 Application Setup

This section lists what is needed to run KREST.

2.1 Required Software

• Java Runtime Environment 1.3.1 or later

2.2 Recommended Hardware

• Minimum recommended processor speed: 1.6 GHz

• Minimum recommended RAM: 512 MB

• Minimum recommended internet connection: DSL or better

2.3 Required Files

• KREST.jar – This jar file contains everything necessary to run KREST. To view or modify the source code, download KREST-Source-final.zip, make any necessary changes, and rebuild the project. The FatJar plugin for Eclipse was used to package everything into the executable jar file.

2.4 Recommended Files

• WebBase Datasets – These can be created from WebBase at http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/. They contain previously crawled pages. If you want to load a large set of crawled pages for web or entity searching, consider downloading datasets from there. Instructions for downloading datasets are available on the WebBase website.

3 KREST

3.1 Running KREST

• Double-click the KREST.jar executable jar file to start the application. You should see a screen like the one below.

Figure 9.1 Opening KREST Screen

3.2 Performing a Web Crawl

So you want to perform a web crawl. Before you start, there are several decisions that you need to make:

• Where do you want to start the web crawl?

• Do you want to perform a breadth-first crawl? If so, how many pages do you want to explore?

• Or would you rather perform a depth-limited crawl? If so, how many levels deep would you like to explore?

3.2.1 Breadth-First Crawling

This type of crawl limits the scope of the web crawl by the number of websites that you want to explore. First, enter the website where you would like to begin exploring. Next, make sure that the ‘Max Sites to Explore’ radio button is selected, and enter the maximum number of websites to explore, either by choosing one of the amounts in the drop-down box or by typing a specific number.

Note that if the crawler runs out of web pages to explore before it reaches your maximum number of sites, it will stop crawling. (However, this is extremely rare.)

Once you are satisfied with the start page and the maximum number of sites to explore, press the ‘Begin Crawl’ button. The fields at the bottom of the KREST form will start updating, and the progress bar will show how much of the web crawl has been completed. When the crawl is complete, a box will pop up telling you so.


Figure 9.2 Completed Breadth-First Web Crawl
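The breadth-first crawl described above can be sketched as a queue-driven traversal. This is a minimal sketch under stated assumptions, not KREST's crawler. The hypothetical `BreadthFirstCrawl` class walks an in-memory link graph (`Map<String, List<String>>`) instead of downloading pages, and it uses Java 8 collection methods, which are newer than the JRE 1.3.1 listed under Required Software. It stops either when `maxSites` pages have been explored or when no unexplored pages remain, the two stopping conditions described in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a breadth-first crawl limited by the number of pages explored.
// An in-memory link graph stands in for downloading and parsing real pages.
class BreadthFirstCrawl {
    static List<String> crawl(Map<String, List<String>> links, String start, int maxSites) {
        List<String> explored = new ArrayList<>();
        Set<String> seen = new HashSet<>();          // pages already queued, to avoid repeats
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        seen.add(start);
        // Stop at the page limit, or earlier if no unexplored pages remain.
        while (!frontier.isEmpty() && explored.size() < maxSites) {
            String page = frontier.poll();           // oldest discovered page first
            explored.add(page);
            for (String next : links.getOrDefault(page, Collections.emptyList())) {
                if (seen.add(next)) {                // add() returns false if already queued
                    frontier.add(next);
                }
            }
        }
        return explored;
    }
}
```

For example, with links a → b, c; b → d; c → d, e and a limit of 3, the sketch explores a, b, and c. With a limit of 10 it explores all five pages and then stops because the queue is empty.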

3.2.2 Depth-First Crawling

This type of crawl limits the scope of the web crawl by how many links away from the start page you want to explore. First, enter the website where you would like to begin exploring. Next, make sure that the ‘Max Depth to Explore’ radio button is selected, and enter the maximum depth that you want to explore. You can change the default depth of 3, but keep in mind that increasing it too much can leave the crawler running for a long time!

Note that if the crawler runs out of web pages to explore before it reaches your maximum depth, it will stop crawling. (However, this is extremely rare.)

Once you are satisfied with the start page and the maximum depth to explore, press the ‘Begin Crawl’ button. The fields at the bottom of the KREST form will start updating, and the progress bar will show how much of the web crawl has been completed. When the crawl is complete, the progress bar will stop moving forward.

Figure 9.3 Depth-First Crawl in Progress
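A crawl limited by depth can be sketched by recording each page's distance (in links) from the start page. In this hypothetical `DepthLimitedCrawl` sketch, the limit is enforced level by level. That guarantees each page is recorded at its shortest distance from the start page. This ordering is an assumption for illustration, and KREST's depth-first mode may visit pages in a different order. Pages at the maximum depth are recorded, but their links are not followed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: explore every page within maxDepth links of the start page.
class DepthLimitedCrawl {
    static List<String> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        List<String> explored = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        List<String> level = new ArrayList<>();
        level.add(start);
        seen.add(start);
        // Depth 0 is the start page; each pass moves one link further away.
        for (int depth = 0; depth <= maxDepth && !level.isEmpty(); depth++) {
            List<String> nextLevel = new ArrayList<>();
            for (String page : level) {
                explored.add(page);
                if (depth == maxDepth) {
                    continue; // record pages at the limit, but do not follow their links
                }
                for (String next : links.getOrDefault(page, Collections.emptyList())) {
                    if (seen.add(next)) {
                        nextLevel.add(next);
                    }
                }
            }
            level = nextLevel;
        }
        return explored;
    }
}
```

With a chain of links a → b → c → d, a maximum depth of 2 explores a, b, and c.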

3.2.3 Saving Web Crawl Information

If you want to save information about the web crawl, check the box next to the “Log File to Use:” field. The field will become editable. Either enter a new file name or use the one provided. When this box is checked and you press the ‘Begin Crawl’ button, all information about the web crawl will be written to the file.


Figure 9.4 Saving a Web Crawl

3.2.4 Stopping a Web Crawl

Did you make a mistake in the page that you wanted to start crawling from? Is the crawl taking too long, and you just want it to end? Don’t worry; you can stop the web crawl at any point. Once you have started a web crawl, notice that the ‘Begin Crawl’ button has changed to a ‘Stop Crawl’ button. Press the ‘Stop Crawl’ button at any point during a web crawl, and the crawl will stop immediately, with the status fields reset to their defaults. You may also be interested in clearing crawled pages out of the database, which is described in the next section.


Figure 9.5 Stopping a Web Crawl

3.2.5 Resetting the Crawled Pages

If you want to start over from scratch after performing a web crawl, press the ‘Reset Crawler’ button. It clears all previously crawled web pages out of the database and resets the fields on the form. If you press ‘Reset Crawler’ in the middle of a web crawl, the crawl will stop, and both the database and the fields containing information about the crawl will be reset.


Figure 9.6 Resetting a Web Crawl

3.3 Performing a Web Search

Performing a web search is simple with KREST. First, you must have either performed a web crawl or loaded pages into the application. (Loading data is discussed in Section 3.5.) To perform a web search, click on the ‘Web Search’ tab, enter the term that you would like to search for, and press the ‘Begin Search’ button. The pages that contain the search terms will be listed in the ‘Search Results’ table, ranked by number of back-links, that is, the number of pages that link to that particular web page.


Figure 9.7 Performing a Web Search
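Ranking by back-links amounts to inverting the link graph: every link from page A to page B gives B one back-link. The sketch below is one possible implementation, not KREST's code. The `BacklinkRanking` class, the map of outgoing links, the map of page text, and the case-insensitive substring match are all assumptions for illustration. Duplicate links from the same page are counted once, which matches the definition above (the number of pages that link to a page).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: rank pages that contain a search term by their back-link count.
class BacklinkRanking {
    // For each page, count the distinct pages that link to it.
    static Map<String, Integer> countBacklinks(Map<String, List<String>> outLinks) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> targets : outLinks.values()) {
            // A page linking to the same target twice still counts once.
            for (String target : new HashSet<>(targets)) {
                counts.merge(target, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Case-insensitive substring match, then most back-links first.
    static List<String> search(Map<String, String> pageText,
                               Map<String, List<String>> outLinks, String term) {
        Map<String, Integer> backlinks = countBacklinks(outLinks);
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : pageText.entrySet()) {
            if (e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(e.getKey());
            }
        }
        matches.sort((a, b) -> {
            int byLinks = Integer.compare(backlinks.getOrDefault(b, 0), backlinks.getOrDefault(a, 0));
            return byLinks != 0 ? byLinks : a.compareTo(b); // ties in alphabetical order
        });
        return matches;
    }
}
```

The alphabetical tie-break is only there to make the order deterministic, because `HashMap` iteration order is unspecified.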

3.3.1 Filtering the Web Search Results

Did you get too many results, or do you only want to see the most significant ones? The ‘Min # of Backlinks’ field lets you filter out pages that fewer than the given number of other pages link to, which helps ensure that you get the highest-quality results. Simply enter the minimum number of back-links required and press ‘Begin Search’; results with fewer back-links will be filtered out automatically.


Figure 9.8 Filtering the Web Search by Back-link Count
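The back-link filter can be sketched as a single pass that drops results below the threshold while keeping the ranked order. This minimal sketch assumes the back-link counts have already been computed into a map. The `BacklinkFilter` name is hypothetical, and pages missing from the map are treated as having zero back-links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the 'Min # of Backlinks' filter.
class BacklinkFilter {
    // Keep only results linked to by at least minBacklinks pages, preserving rank order.
    static List<String> filter(List<String> rankedResults,
                               Map<String, Integer> backlinks, int minBacklinks) {
        List<String> kept = new ArrayList<>();
        for (String url : rankedResults) {
            if (backlinks.getOrDefault(url, 0) >= minBacklinks) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

A threshold of 1 removes pages that no other page links to, and a threshold of 0 keeps everything.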

3.4 Performing an Entity Search

Performing an entity search is simple with KREST. First, you must have either performed a web crawl or loaded pages into the application. (Loading data is discussed in Section 3.5.) To perform an entity search, click on the ‘Entity Search’ tab, enter the term that you would like to search for followed by the entity type that you would like to find, and press the ‘Begin Search’ button. For example, entering “Amazon Customer Service #phone” looks for phone numbers on pages that match “Amazon Customer Service”. The matching entities, along with the pages that contain them, will be listed in the ‘Search Results’ table. The entities found will be ranked by the number of web pages that contained each entity.

To search for an entity, enter the type preceded by the pound (#) sign. Acceptable entity types are Street Addresses (#address), Email Addresses (#email), Phone Numbers (#phone), Fax Numbers (#Fax), and Zip Codes (#Zip). There is also an overarching entity (#all) that will pick up all entity information. If you do not enter a valid entity type in the search box, a box will pop up listing the valid entity types.

Figure 9.9 Performing an Entity Search
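The entity search described above can be sketched in three steps. First, keep the pages that contain the search string. Second, extract entities of the requested type from those pages. Third, rank each entity by the number of pages that contained it. This sketch covers only phone numbers and is not KREST's implementation. The `PhoneEntitySearch` class and the in-memory page map are assumptions. So is the simplified pattern, which matches three-three-four digit North American numbers such as 800-342-5368 or (800) 342-5368.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a #phone entity search over already-loaded pages.
class PhoneEntitySearch {
    // Simplified pattern for numbers such as 800-342-5368 or (800) 342-5368.
    private static final Pattern PHONE =
        Pattern.compile("\\(?\\b(\\d{3})\\)?[-. ]?(\\d{3})[-. ](\\d{4})\\b");

    // Returns (phone number, number of matching pages containing it), most pages first.
    static List<Map.Entry<String, Integer>> search(Map<String, String> pages, String searchString) {
        String needle = searchString.toLowerCase(Locale.ROOT);
        Map<String, Integer> pageCounts = new HashMap<>();
        for (String text : pages.values()) {
            if (!text.toLowerCase(Locale.ROOT).contains(needle)) {
                continue; // only pages matching the search string contribute entities
            }
            Set<String> found = new TreeSet<>(); // each page counts once per number
            Matcher m = PHONE.matcher(text);
            while (m.find()) {
                // Normalize every format to ddd-ddd-dddd so variants are counted together.
                found.add(m.group(1) + "-" + m.group(2) + "-" + m.group(3));
            }
            for (String phone : found) {
                pageCounts.merge(phone, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(pageCounts.entrySet());
        // Most pages first; ties broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }
}
```

Each page contributes at most once per phone number. As a result, a number repeated many times on one page does not outrank a number that appears on several different pages.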

3.5 Loading Data

Sometimes you’d rather skip the web crawl and look at data that you already have on your computer. To load previously crawled data, go to the ‘File’ menu and select ‘Load Data’. A file dialog will appear asking you to select the location of the previously crawled data. Once you select the right file, KREST will begin loading it. PLEASE NOTE: Loading data can take a while. Once the file has been loaded, a box will pop up notifying you that loading is complete.


Figure 9.10 How to Load Data into KREST

3.6 Saving Entity Search Results

Need to save your entity search results to a file? Complete an entity search, then select the ‘File’ menu and choose ‘Save Results’. A file dialog will pop up allowing you to choose where the results will be saved.


Figure 9.11 How to Save Entity Search Results

3.7 Exiting KREST

Leaving so soon? There are two ways to shut down the KREST application:

• Click the ‘X’ button in the upper-right corner of the application.

• Go to the ‘File’ menu and select ‘Exit’.


Figure 9.12 KREST Application with Exit Methods Circled

3.8 Information About KREST

Want to find out who created KREST, and when? Click on the ‘Help’ menu and select ‘About’. A box will pop up with information about the developer.


Figure 9.13 How to Access the Help Menu

3.9 Troubleshooting

Have a problem that wasn’t answered elsewhere in the manual? Your problem might be answered here.

3.9.1 Crawler not Getting All Links on a Web Page

The Web Crawler looks for all instances of “http://…” in the HTML of the web page. It is currently unable to extract relative links (such as “/cgi-bin/index.html”). This feature may be implemented in a future build.
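The difference between absolute and partial (relative) links can be illustrated with the sketch below. `absoluteOnly` approximates the current behavior of matching text that begins with “http://”. `resolved` shows one way a future build might handle relative links: it resolves each `href` value against the page's own address with `java.net.URI.resolve`. The `LinkExtractor` class and both regular expressions are hypothetical simplifications; for example, only double-quoted `href` values are handled.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch contrasting absolute-only extraction with relative-link resolution.
class LinkExtractor {
    // Roughly the current behavior: only text that starts with "http://".
    private static final Pattern ABSOLUTE = Pattern.compile("http://[^\\s\"'<>]+");
    // Simplified: double-quoted href values, absolute or relative.
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> absoluteOnly(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = ABSOLUTE.matcher(html);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    // Resolves each href against the page's own URL, e.g. "/cgi-bin/index.html".
    // URI.resolve throws IllegalArgumentException for malformed references.
    static List<String> resolved(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(base.resolve(m.group(1)).toString());
        }
        return links;
    }
}
```

On a page at http://www.example.com/dir/page.html, the relative link “/cgi-bin/index.html” resolves to http://www.example.com/cgi-bin/index.html. Absolute links pass through unchanged.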

3.9.2 Progress Bar not Updating During Depth-First Crawls

Depth-first crawling works differently than normal breadth-first crawling. Because the crawler keeps processing until it reaches the maximum depth, there isn’t an easy way to track when all of the pages at the maximum level have been processed. As a result, the progress bar will sometimes hang at 66%. If crawling appears to have completed (the crawled page shown on the form stops changing), it is safe to move on to web or entity searches.


3.9.3 Cannot Click on URLs in the Web Search Results

The URLs in the Web Search Results area are not clickable. However, if you want to visit one of the URLs that were found, click in the cell and highlight the URL, then copy the text and paste it into your web browser.

3.9.4 Cannot Click on URLs in the Entity Search Results

Ideally, you would not need to click on the URLs in the Entity Search Results area, because the information has already been extracted from the web pages. However, if you really want to see the web page, click in the cell and highlight the URL, then copy the text and paste it into your web browser.

3.9.5 Tried to Load Data, but Received an Error Message

Currently, KREST can only load datasets downloaded from WebBase (http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/). Trying to load any other type of data will display an error message.

3.9.6 Tried to Load Data, but Only Loaded X Number of Pages

The KREST application is currently limited to loading about 32 MB of data from a file. This is due to the memory limits of the Java virtual machine. All pages that were loaded have been loaded properly, and you may perform web searches and entity searches on them.
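The roughly 32 MB ceiling may come from the Java virtual machine's maximum heap size. That is an assumption, since the manual does not name the setting involved. If so, the current limit can be checked from Java with `Runtime.maxMemory()`, and it can be raised with the `-Xmx` option of the `java` launcher, for example `java -Xmx512m -jar KREST.jar`. Whether a larger heap actually lets KREST load more pages has not been verified. The hypothetical `HeapReport` class below only reports the limit.

```java
// Hypothetical helper: report how much heap the running JVM may use.
class HeapReport {
    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE when the JVM reports no inherent limit.
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        long mb = maxHeapMegabytes();
        System.out.println(mb < 0 ? "No heap limit reported" : "Maximum heap: " + mb + " MB");
    }
}
```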

3.9.7 Entity Search Results Don’t Match What I Expected for Overarching Results

Overarching results are based on the address. Once an address has been found on a web page, the other entities are searched for from that point onward. Nothing before that point in the page is recorded.
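The behavior described above can be sketched with `Matcher.find(int)`, which resets the matcher and searches from a given index. The sketch locates the first address, then searches for the other entity types only from the end of that address onward. The `OverarchingSearch` class and its address, phone, and email patterns are deliberately crude placeholders, not KREST's extraction rules, and only the first match of each type is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of address-anchored ("overarching") extraction.
class OverarchingSearch {
    private static final Pattern ADDRESS =
        Pattern.compile("\\d+ [A-Z][a-z]+ (Street|St|Avenue|Ave|Road|Rd)\\b");
    private static final Pattern PHONE =
        Pattern.compile("\\(?\\b(\\d{3})\\)?[-. ]?(\\d{3})[-. ](\\d{4})\\b");
    private static final Pattern EMAIL =
        Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Returns entity type -> first match; empty if the page has no address.
    static Map<String, String> extract(String page) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher address = ADDRESS.matcher(page);
        if (!address.find()) {
            return result; // no address, so nothing is recorded
        }
        result.put("address", address.group());
        int from = address.end(); // the search resumes after the address
        Matcher phone = PHONE.matcher(page);
        if (phone.find(from)) {
            result.put("phone", phone.group());
        }
        Matcher email = EMAIL.matcher(page);
        if (email.find(from)) {
            result.put("email", email.group());
        }
        return result;
    }
}
```

A phone number that appears before the first address on a page is therefore never reported. This is why overarching results can differ from a plain #phone search.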

3.9.8 Searching for Multiple Entity Types

KREST can search for only one entity type at a time. To search for more than one type, use the “#overarching” entity type, which combines them all. If you enter more than one entity type, only the last one will be used.
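The one-type-at-a-time rule can be sketched as a small parser. It separates plain search terms from `#` tags, rejects unknown tags, and lets a later tag replace an earlier one. The `EntityQuery` class is hypothetical. Its set of valid types comes from the entity search section of this manual and includes both `#all` and `#overarching`, since the manual uses both names. Matching tags case-insensitively (so `#Fax` and `#fax` are the same) is an assumption. KREST shows a pop-up for an invalid type, whereas the sketch throws an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for search strings such as "Kansas DMV #phone".
class EntityQuery {
    static final Set<String> VALID_TYPES = new HashSet<>(Arrays.asList(
        "address", "email", "phone", "fax", "zip", "all", "overarching"));

    final String terms;      // text to match against page contents
    final String entityType; // last valid #tag in the input

    private EntityQuery(String terms, String entityType) {
        this.terms = terms;
        this.entityType = entityType;
    }

    static EntityQuery parse(String input) {
        List<String> words = new ArrayList<>();
        String type = null;
        for (String token : input.trim().split("\\s+")) {
            if (token.startsWith("#")) {
                String candidate = token.substring(1).toLowerCase(Locale.ROOT);
                if (!VALID_TYPES.contains(candidate)) {
                    throw new IllegalArgumentException("Unknown entity type: " + token);
                }
                type = candidate; // a later tag replaces an earlier one
            } else {
                words.add(token);
            }
        }
        if (type == null) {
            throw new IllegalArgumentException("No entity type given");
        }
        return new EntityQuery(String.join(" ", words), type);
    }
}
```

For example, parsing “Bill Gates #phone #email” yields the search terms “Bill Gates” and the entity type email.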

3.9.9 Miscellaneous Problem Not Mentioned Above


If you are reading this section after encountering a problem, you may have found a bug in the application. Please note the bug and email it to the developer at [email protected] (maintained through May 2008). If the issue is serious enough to prevent you from continuing, shut down KREST and restart it.


CHAPTER 10 - Project Evaluation

1 Introduction

This document describes my experiences working on the KDD-Research Entity Search Tool (KREST) project over two semesters of CIS 895. It covers the problems encountered, a source code analysis, a time log analysis, and lessons learned, and it closes with a section on possible future work.

2 Problems Encountered

Several areas of the project were frequent sources of trouble, and most of the debugging time was spent on them.

2.1 Web Crawler Thread Control

To speed up web crawling, I implemented a system that allowed multiple threads to crawl web pages at the same time. I had very little previous experience with thread control, but for basic crawling the system seemed to work well. However, I ran into many problems with more complex operations, such as stopping a web crawl, restarting a web crawl, or starting a new crawl after one had been stopped. I eventually solved these problems, but debugging them took much more time than I had planned.
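Stop and restart problems like these are often easiest to avoid by giving every crawl its own thread pool and its own result list, so that nothing from a stopped crawl can leak into the next one. The sketch below shows that pattern with java.util.concurrent. CrawlSession and the fetch function (a stand-in for a real HTTP request) are hypothetical, and this is not a description of how KREST's ThreadController works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: each crawl gets a fresh pool and a fresh result list,
// so stopping one crawl and starting another cannot mix their state.
class CrawlSession {
    private final int threads;
    private final Function<String, String> fetch; // stand-in for an HTTP fetch
    private ExecutorService pool;
    private List<String> results = Collections.synchronizedList(new ArrayList<>());

    CrawlSession(int threads, Function<String, String> fetch) {
        this.threads = threads;
        this.fetch = fetch;
    }

    // Starts a new crawl, stopping any crawl that is still running.
    synchronized void start(List<String> urls) {
        stop();
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        results = sink;
        pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                if (!Thread.currentThread().isInterrupted()) {
                    sink.add(fetch.apply(url)); // writes only to this crawl's list
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already-queued URLs still run
    }

    // Stops the crawl: queued URLs are dropped and running tasks are interrupted.
    synchronized void stop() {
        if (pool == null) {
            return;
        }
        pool.shutdownNow();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
        }
        pool = null;
    }

    // Waits for the current crawl to finish; returns false on timeout or interrupt.
    boolean awaitFinished(long timeout, TimeUnit unit) {
        ExecutorService current;
        synchronized (this) {
            current = pool;
        }
        if (current == null) {
            return true;
        }
        try {
            return current.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Snapshot of the pages fetched by the most recent crawl.
    synchronized List<String> fetched() {
        synchronized (results) {
            return new ArrayList<>(results);
        }
    }
}
```

Calling start() again, whether or not stop() was called first, always begins with an empty result list, which is the property that makes "start a new crawl after stopping one" safe.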

2.2 Java Class Size Limitations

My initial plan was to store all of the crawled or loaded web pages in a Hashtable within the KrestObjectLibrary class. I wanted to avoid connecting to a database, because I did not have much experience with JDBC calls, and I wanted to keep the storage mechanism as simple as possible. I assumed that I would be able to devote all available memory to storing web pages. When testing the crawl functionality, however, I found that the application ran out of Java heap space once the stored pages reached about 32 MB. This limited crawls to around 1500 to 2500 web pages, depending on the size of the pages. While not bad, that was far short of the 50,000 or more web pages I had hoped to load at a time. Since I was still able to achieve significant results with the smaller number of pages, I did not need to add a real database; however, adding database storage for web pages would be a good follow-on project.
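In a standard JVM the memory ceiling applies to the heap as a whole rather than to any single class. It can be inspected at runtime with Runtime.maxMemory() and raised at launch with the -Xmx option (for example, -Xmx256m). The sketch below is a hypothetical HeapReport helper, not part of KREST; its page estimate is simple division. With a 32 MB budget, the 1500 to 2500 pages reported above correspond to roughly 13 to 22 KB of heap per page.

```java
// Hypothetical helper for reasoning about the heap ceiling; not part of KREST.
class HeapReport {
    // Maximum heap the JVM will attempt to use, in MB.
    // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Rough number of pages that fit in a budget, given an average in-memory page size.
    static long estimatePages(long budgetBytes, long avgPageBytes) {
        if (avgPageBytes <= 0) {
            throw new IllegalArgumentException("avgPageBytes must be positive");
        }
        return budgetBytes / avgPageBytes;
    }
}
```

For instance, estimatePages(32L * 1024 * 1024, 16L * 1024) gives 2048 pages at an assumed 16 KB per page.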

2.3 Jigloo GUI Builder

The project needed a graphical user interface, and I wanted a GUI builder that was integrated with Eclipse, my integrated development environment (IDE) of choice. I chose Jigloo, which previous CIS 895 students had used to build their projects. When I inspected it and ran small tests with the Eclipse plugin, the tool seemed to build interfaces well. However, the larger the screens grew, the longer Jigloo took to load them, and recompiling after each change also took longer and longer. I also struggled with the plugin's layouts, which did not seem to pack well when the GUI was built as an executable. If I were starting the project again, I would use a different GUI builder.

3 Source Lines of Code (SLOC)

The first SLOC estimate for the project was made at the end of Phase 1. Based on other available web crawling projects, it came to 2000 SLOC. At the end of Phase 2, a revised estimate anticipated around 2350 SLOC.

The actual SLOC developed was 2960. A detailed breakdown of the SLOC produced for the project can be seen in Appendix A.

I believe the original estimate was low because of the extra code generated by the Jigloo GUI builder. The builder added many “getter” methods for the graphical widgets, most of which were never used; these accounted for about 350 to 400 SLOC. The other area that grew larger than expected was the entity search portion of the project. Searching for specific entity types required extra code, which added several hundred SLOC. Since entity search made up most of the code still to be written in Phase 3, it likely accounts for the rest of the gap.

Overall, I think I did a reasonable job with the original SLOC estimate, although I would have liked to do better. The original estimate was low by 960 SLOC, or 48%, and the second estimate was low by 610 SLOC, or about 26%.
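Assuming each error is measured relative to the estimate (the report does not state which base it used), the figures work out as follows:

```latex
\text{error} = \frac{\text{actual} - \text{estimate}}{\text{estimate}}
\qquad
\frac{2960 - 2000}{2000} = \frac{960}{2000} = 0.48
\qquad
\frac{2960 - 2350}{2350} = \frac{610}{2350} \approx 0.26
```

Measured against the actual 2960 SLOC instead, the same shortfalls would be about 32% and 21%.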

4 Project Duration

The following table shows the preliminary estimated completion dates for the three project phases alongside the actual completion dates. The actual dates stayed very close to the estimated schedule: Phase 1 finished on time, and Phases 2 and 3 each finished two days early.

Table 10.1 Project Phase Completion Dates

Phase   Expected Completion Date   Actual Completion Date
1       November 13, 2007          November 13, 2007
2       February 15, 2008          February 13, 2008
3       April 25, 2008             April 23, 2008

The figure below shows the total time spent working on the project during each phase.

Figure 10.1 Phase Breakdown
Time Spent Per Phase (in Hours)
Phase 1: 55.92 hours (37%)
Phase 2: 57.83 hours (38%)
Phase 3: 37.67 hours (25%)

Phases 1 and 2 took roughly the same amount of time (55.92 and 57.83 hours), and Phase 3 still took 37.67 hours, despite the differences in calendar length between phases. This was because I was trying to stay on schedule, which meant cramming more work into a compressed amount of calendar time.
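The percentages in Figure 10.1 can be recomputed from the hour totals, which also shows how Phase 3 compares with Phase 2. The 151.42-hour total equals the sum of the activity hours in Figure 10.2.

```latex
T = 55.92 + 57.83 + 37.67 = 151.42\ \text{h}
\qquad
\frac{55.92}{T} \approx 0.369, \quad
\frac{57.83}{T} \approx 0.382, \quad
\frac{37.67}{T} \approx 0.249
\qquad
\frac{37.67}{57.83} \approx 0.65
```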

The following chart shows an overall breakdown of the time spent on project activities. Over 75% of the total project time went into documentation and coding, which is to be expected. The charts after it show the activity breakdown for each phase.

Figure 10.2 Project Activity Breakdown
Time Spent Per Project Activity (in Hours)
Coding: 57.92 hours (38.25%)
Documentation: 57.17 hours (37.75%)
Presentation: 11.58 hours (7.65%)
Reading: 9.50 hours (6.27%)
Environment: 4.58 hours (3.03%)
Discussion: 4.50 hours (2.97%)
Webpage: 2.00 hours (1.32%)
Research: 1.92 hours (1.27%)
Timelog: 1.50 hours (0.99%)
Integration: 0.75 hours (0.50%)

The chart below details the activity breakdown for Phase 1. Although slightly over 50% of the time was spent coding and producing documentation, a large share also went into discussion, reading, research, and setting up the project environment.

Figure 10.3: Phase 1 Activity Breakdown
Time Spent Per Project Activity During Phase 1 (in Hours)
Documentation: 17.50 hours (31.30%)
Coding: 12.58 hours (22.50%)
Reading: 9.50 hours (16.99%)
Environment: 4.58 hours (8.20%)
Discussion: 4.33 hours (7.75%)
Presentation: 3.67 hours (6.56%)
Research: 1.92 hours (3.43%)
Webpage: 1.00 hours (1.79%)
Timelog: 0.83 hours (1.49%)
Integration: 0.00 hours (0.00%)

The chart below details the activity breakdown for Phase 2. The time spent producing documentation was similar to Phase 1 (18.67 hours versus 17.50), but the time spent coding more than doubled (30.33 hours versus 12.58), as did the time spent preparing for the presentation (7.92 hours versus 3.67). By Phase 2, no time was spent reading, researching, or setting up the environment, since those activities had been completed during Phase 1.

Figure 10.4: Phase 2 Activity Breakdown
Time Spent Per Project Activity During Phase 2 (in Hours)
Coding: 30.33 hours (52.45%)
Documentation: 18.67 hours (32.28%)
Presentation: 7.92 hours (13.69%)
Timelog: 0.67 hours (1.15%)
Webpage: 0.25 hours (0.43%)
Discussion, Research, Reading, Environment, Integration: no time recorded

The chart that follows details the activity breakdown for Phase 3. The time spent coding dropped to about half of its Phase 2 level (15.00 hours versus 30.33), because coding was nearly complete by the time Phase 3 began. Documentation, on the other hand, rose to 21.00 hours, or 56% of the phase, compared with roughly a third of the time in each earlier phase. This reflects the larger amount of documentation required for Phase 3, as well as cleaning up previously released documents and assembling the portfolio from earlier work.

Figure 10.5: Phase 3 Activity Breakdown
Time Spent Per Project Activity During Phase 3 (in Hours)
Documentation: 21.00 hours (56%)
Coding: 15.00 hours (40%)
Integration: 0.75 hours (2%)
Webpage: 0.75 hours (2%)
Discussion: 0.17 hours (0%)
Presentation: 0.00 hours (0%)
Timelog: 0.00 hours (0%)
Research, Reading, Environment: no time recorded

5 Lessons Learned

Over the course of the project, I learned several things that I expect to apply in the future.

5.1 Eclipse IDE

I use C++ every day at work, and I had not used Java or Eclipse for more than brief assignments since earning my undergraduate degree from Kansas State in 2003. This project was the first significant piece of software I developed in either Java or the Eclipse IDE. It took me a while to set up the development environment, but once I got past the learning curve, Eclipse proved to be an extremely powerful tool. Knowing how to use it will be useful if I ever move to a project at work that uses Java.

5.2 Creation of Design Documents

On the projects I have worked on since earning my undergraduate degree, I have never been through the full software life cycle; I have always joined during the coding phase and stayed through integration before moving to a new project. As a result, I have spent a lot of time developing software from design documents produced by others, but I had never written design documents from scratch. Using Microsoft Visio to develop design documents was a good learning experience that will be useful in the future.

6 Future Work

There are three areas I would consider for enhancements if I had more time to work on the project.

6.1 Integration of Open Source Web Crawler

KREST currently supports web crawling, web searching, and entity searching. Early in the project, I decided to implement the crawling capability myself rather than use one of the available open source crawlers. This let me learn how web crawling works and served as a base for the later entity search development. However, because of the time and scope limits of the MSE project, the crawler is limited compared with open source crawlers. For instance, while it follows links found in web pages, it supports only full http:// URLs and cannot handle partial (relative) paths. Also, while the crawler has its own thread control to keep it from overwhelming the internet connection, that control is not nearly as robust as in any of the open source crawlers.
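Partial paths such as about.html or ../contact.html could be handled by resolving each link against the URL of the page it was found on. The standard java.net.URI class already does this merging and normalization; the sketch below wraps it in a hypothetical LinkResolver helper that returns null for links that are not valid URI syntax. Note that java.net.URI follows RFC 2396 rather than the newer RFC 3986, which may matter for some edge cases.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for turning links found on a page into absolute URLs.
class LinkResolver {
    static String resolve(String pageUrl, String href) {
        try {
            URI base = new URI(pageUrl);
            String path = base.getRawPath();
            if (!base.isOpaque() && (path == null || path.isEmpty())) {
                // Give bases like http://example.com an explicit "/" path first,
                // since merging a relative path into an empty path can drop the separator.
                base = base.resolve("/");
            }
            // Absolute links come back unchanged; relative ones are merged and normalized.
            return base.resolve(href.trim()).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // not valid URI syntax, e.g. an unescaped space
        }
    }
}
```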

6.2 Adding a Database to Hold Web Pages

To limit the scope of the project, KREST holds crawled Webpage objects in a Java Hashtable rather than a full back-end database. While this mechanism works well for the current project, it is also one of the project's main limitations. Because the Hashtable is held by the KrestObjectLibrary class, loading too many web pages causes Java to run out of heap space once the stored pages grow beyond the roughly 32 MB available.

To make data storage more robust, a developer extending the project in the future should probably implement a full database. An added benefit would be the ability to keep previously crawled pages across sessions, rather than starting over from scratch each time the program is run.
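One way to prepare for that change is to put page storage behind a small interface, so the Hashtable could later be swapped for a database without touching the crawler or the searchers. In the sketch below, PageStore, HashtablePageStore, JdbcPageStore, and the pages table schema are all hypothetical. The JDBC class uses only standard java.sql calls, but it needs a driver and an existing table to run.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical storage abstraction; none of these types exist in KREST.
interface PageStore {
    void put(String url, String html); // stores or replaces the page for url
    String get(String url);            // returns null if the page is not stored
    int size();
}

// In-memory store, similar in spirit to the Hashtable in KrestObjectLibrary.
class HashtablePageStore implements PageStore {
    private final Map<String, String> pages = new Hashtable<>();

    public void put(String url, String html) { pages.put(url, html); }
    public String get(String url) { return pages.get(url); }
    public int size() { return pages.size(); }
}

// Database-backed store. Assumes an existing table created as
//   CREATE TABLE pages (url VARCHAR(2048) PRIMARY KEY, html CLOB)
// and a JDBC driver on the classpath; both are illustrative assumptions.
class JdbcPageStore implements PageStore {
    private final Connection conn;

    JdbcPageStore(Connection conn) { this.conn = conn; }

    public void put(String url, String html) {
        // Delete-then-insert gives Hashtable-like replace semantics portably;
        // it is only atomic if the caller runs it inside a transaction.
        try (PreparedStatement del = conn.prepareStatement("DELETE FROM pages WHERE url = ?");
             PreparedStatement ins = conn.prepareStatement("INSERT INTO pages (url, html) VALUES (?, ?)")) {
            del.setString(1, url);
            del.executeUpdate();
            ins.setString(1, url);
            ins.setString(2, html);
            ins.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("could not store " + url, e);
        }
    }

    public String get(String url) {
        try (PreparedStatement ps = conn.prepareStatement("SELECT html FROM pages WHERE url = ?")) {
            ps.setString(1, url);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("could not read " + url, e);
        }
    }

    public int size() {
        try (PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM pages");
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getInt(1) : 0;
        } catch (SQLException e) {
            throw new IllegalStateException("could not count pages", e);
        }
    }
}
```

Because the database keeps its contents between runs, a JdbcPageStore opened on the same database would also give the multi-session persistence described above.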

6.3 Adding Ability to Load Different File Types

KREST is currently set up to load files created from the WebBase repository, available at http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/. The operator can go to the WebBase website and download a specified number of web pages. KREST can currently hold about 32 MB of data, which in most cases works out to roughly 1500 web pages.

In the future, it may be useful to use KREST as an alternate test bed for comparison against other projects. To do so, KREST would have to be extended to load additional file types.
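A simple way to open the loader to new formats is to register one parser per format and reject anything unregistered, which preserves the current "error message for unknown data" behavior. LoaderRegistry and the format names are hypothetical. The report does not describe the WebBase file layout, so no WebBase parser is shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping a format name to a parser that turns raw file
// contents into page bodies; unknown formats are rejected with an exception.
class LoaderRegistry {
    private final Map<String, Function<String, List<String>>> parsers = new HashMap<>();

    void register(String format, Function<String, List<String>> parser) {
        parsers.put(format, parser);
    }

    List<String> load(String format, String contents) {
        Function<String, List<String>> parser = parsers.get(format);
        if (parser == null) {
            throw new IllegalArgumentException("Unsupported data format: " + format);
        }
        return parser.apply(contents);
    }
}
```

Supporting a new file type would then mean writing one parser and registering it, without changing the code that consumes the loaded pages.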


References

[1] Cheng, T., Yang, X., & Chang, K. (2007). EntityRank: Searching Entities Directly and Holistically. In Proceedings of the 33rd Very Large Data Bases Conference (VLDB 2007).

[2] Cheng, T., Yang, X., & Chang, K. (2007). Supporting Entity Search: a Large-Scale Prototype Search Engine. In Proceedings of the 2007 ACM SIGMOD Conference (SIGMOD 2007), pages 1144-1146.

[3] Gallagher, P. (2005). Component Design 1.0. Retrieved 03/17/2008, from http://mse.cis.ksu.edu/gallagher/PhaseThree/PDF/Component_Design_1_0.pdf.

[4] Gallagher, P. (2005). Technical Inspection List 1.0. Retrieved 12/13/2007, from http://mse.cis.ksu.edu/gallagher/PhaseTwo/PDF/Technical_Inspection_1_0.pdf.

[5] Gallagher, P. (2005). Test Plan 1.0. Retrieved 01/09/2008, from http://mse.cis.ksu.edu/gallagher/PhaseTwo/PDF/Test_Plan_1.0.pdf.

[6] Gallagher, P. (2005). Vision Document 2.0. Retrieved 10/29/2007, from http://mse.cis.ksu.edu/gallagher/PhaseTwo/PDF/Vision_Document_2.0.pdf.

[7] Guillen, E. (2004). Architecture Design 1.0. Retrieved 01/14/2008, from http://mse.cis.ksu.edu/esteban/phase_2/docs/Architecture Design1.0.pdf.

[8] IEEE Standard for Software Quality Assurance Planning. IEEE Std 730-1998 (Revision of IEEE Std 730-1989).

[9] IEEE Guide for Software Quality Assurance Planning. IEEE Std 730.1-1995 (Revision of IEEE Std 983-1986).

[10] Marston, T. (2007). The Model-View-Controller (MVC) Design Pattern for PHP. Retrieved 01/15/2008, from http://www.tonymarston.net/php-mysql/model-view-controller.html.

[11] IEEE Standard for Software Test Documentation (software test plans). IEEE Std 829-1998.

[12] Sepaha, B. (2005). Inspection Checklist. Retrieved 12/13/2007, from http://mse.cis.ksu.edu/binti/Phase2Documents/Checklist.pdf.

[13] Wikipedia. Retrieved 10/29/2007, from http://www.wikipedia.org.


[14] Zhong, H., & Cheng, T. (2007). Virtual Web: What If You Own the Entire Web? Retrieved 10/29/2007, from http://mias.uiuc.edu/dssi/2007_virtual_web.


Appendix A - Source Metrics

The project source metrics were determined using SLOC Metrics 3.0, available from http://SLOCMetrics.com.

1. Project Summary Metrics

Table A.1 Overall Project SLOC Metrics

Project                  SLOC %   SLOC   Comments   Blank Lines   Total
All Source               100.0%   2960       1374           541    4875

2. Source Metrics By Package

Table A.2 Source Metrics by Package

Package                  SLOC %   SLOC   Comments   Blank Lines   Total
Application               0.61%     18         10             7      35
Controller               72.43%   2144        637           344    3125
Model                    14.97%    443        474           119    1036
View                     11.99%    355        253            71     679
Total                    100.0%   2960       1374           541    4875

3. Source Metrics of the Application Package

Table A.3 Source Metrics of the Application Package

File                     SLOC %   SLOC   Comments   Blank Lines   Total
KrestApplication.java   100.00%     18         10             7      35
Total                   100.00%     18         10             7      35


4. Source Metrics of the Controller Package

Table A.4 Source Metrics of the Controller Package

File                     SLOC %   SLOC   Comments   Blank Lines   Total
EntitySearcher.java      36.71%    787         82            84     953
KrestController.java     32.09%    688        210            72     970
SiteVisitor.java         15.72%    337        138            88     563
FileLoader.java           4.15%     89          5            16     110
HTTPReader.java           3.08%     66         32            22     120
WebSearcher.java          2.38%     51         30            15      96
ThreadController.java     2.19%     47         50            18     115
KrestAboutDialog.java     1.45%     31         15             9      55
WebCrawler.java           1.26%     27         50            13      90
Webpage.java              0.98%     21         25             7      53
Total                   100.00%   2144        637           344    3125

5. Source Metrics of the Model Package

Table A.5 Source Metrics of the Model Package

File                     SLOC %   SLOC   Comments   Blank Lines   Total
OverarchingEntity.java   26.64%    118        121            30     269
KrestObjectLibrary.java  18.74%     83         45            13     141
FaxEntity.java            9.03%     40         41            11      92
PhoneEntity.java          9.03%     40         41            11      92
AddressEntity.java        7.90%     35         46            10      91
KrestModel.java           7.22%     32         50            11      93
Webpage.java              5.42%     24         34             8      66
KrestEntity.java          5.19%     23         30             7      60
EmailEntity.java          4.29%     19         24             7      50
ZipEntity.java            4.29%     19         24             7      50
WebObject.java            2.26%     10         18             4      32
Total                   100.00%    443        474           119    1036

6. Source Metrics of the View Package

Table A.6: Source Metrics of the View Package

File                     SLOC %   SLOC   Comments   Blank Lines   Total
EntityObserver.java      25.07%     89         41            19     149
TextAreaRenderer.java    21.41%     76         53             9     138
CrawlerObserver.java     18.87%     67         56            13     136
SearchObserver.java      18.03%     64         31            15     110
KrestView.java           10.14%     36         60            11     107
TextAreaEditor.java       6.48%     23         12             4      39
Total                   100.00%    355        253            71     679