Johnson Graduate School of Management Library Project


Post on 02-Jan-2016

Page 1: Johnson Graduate School of Management Library Project

Johnson Graduate School of Management Library Project

Clients:

Ken Bolton

Angela K. Horne

JGSM Library Reference Team

Project Team:

Jonathan Gong

Benson Lee

Man Fai Matthew Lee

Greg Leedberg

Liz Xu

Page 2: Johnson Graduate School of Management Library Project

Functional Requirements

Search Function
• Simple Search
• Advanced Search

Administrative Features
• Add HTML Page
• Remove HTML Page
• Update Existing HTML Page

Page 3: Johnson Graduate School of Management Library Project

Search Feature

Why? Because the client would like the content of their website to be more accessible

Simple Search: for an easily accessible search

Advanced Search: to limit the search and get better results

Page 4: Johnson Graduate School of Management Library Project

Simple Search

A search box will be located on the home page of the JGSM library website. (http://www.library.cornell.edu/johnson/)

The system will return all pages that contain all or any of the words provided by the user (with exceptions).

Example: “Bloomberg FAQ”

“the Bloomberg FAQ”

Page 5: Johnson Graduate School of Management Library Project

Advanced Search

Search fields:
• Find pages with all of the keywords
• Find pages with any of the keywords
• Find pages with “the exact phrase” (example: “Bloomberg FAQ”)
• Limit search to a specific category
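The three keyword modes can be sketched as follows. This is a minimal illustration, not the project's code: the function names and the mode labels "all", "any", and "phrase" are assumptions, and the category limit is left out.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches(page_text, query, mode):
    """Return True if page_text satisfies query under the given mode.

    mode is "all", "any", or "phrase" (hypothetical labels for the
    three advanced search fields).
    """
    page_words = tokenize(page_text)
    query_words = tokenize(query)
    if not query_words:
        return False
    if mode == "all":
        return all(w in page_words for w in query_words)
    if mode == "any":
        return any(w in page_words for w in query_words)
    if mode == "phrase":
        # The query words must appear consecutively, in order.
        n = len(query_words)
        return any(page_words[i:i + n] == query_words
                   for i in range(len(page_words) - n + 1))
    raise ValueError(f"unknown mode: {mode}")
```

With this sketch, a page reading "FAQ for Bloomberg" matches the query "Bloomberg FAQ" in "all" mode but not in "phrase" mode.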

Page 6: Johnson Graduate School of Management Library Project

Viewing Search Results

The results of the search should be displayed 10 to a page in ranked order

Search results will contain the title of the pages, link to the pages, and a short description

Search results should be ranked so that the links most useful to users appear first

Example: “Bloomberg FAQ”

Page 7: Johnson Graduate School of Management Library Project

Administrative Features - Add

All administrative features must authenticate the user using a username and password.

Add HTML Page – Administrator can:
• specify a URL to add to the search system; the system will add the page and key metadata into the database
• select a category for the page
• add an abstract to be associated with the page (optional); if there is no abstract, part of the text of the document will be displayed in search results
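The abstract fallback could look like the sketch below. The function name and the 160-character limit are assumptions; the slides only say that part of the document text is shown when no abstract exists.

```python
def result_description(abstract, full_text, max_chars=160):
    """Pick the text shown under a search result.

    Uses the administrator's abstract when there is one; otherwise
    falls back to the start of the page text. max_chars is an assumed
    limit, not a value from the requirements.
    """
    if abstract and abstract.strip():
        return abstract.strip()
    text = " ".join(full_text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    # Cut at the last word boundary that fits, then mark the truncation.
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```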

Page 8: Johnson Graduate School of Management Library Project

Admin Features - Remove

Remove HTML Page – Administrator can:
• specify a URL to remove from the search system; the system will remove the page and all associations with the URL from the database
• upon removal, the page will no longer appear in user searches
• if the URL does not exist in the database, the system will display an error

Page 9: Johnson Graduate School of Management Library Project

Admin Features - Update

Update HTML Page – Administrator can:
• specify the page to update using its URL; the page metadata in the database is updated from the new URL
• change the category of the page (optional)
• change the abstract of the page after viewing the old page abstract (optional)

Page 10: Johnson Graduate School of Management Library Project

Non-Functional Requirements

• Ease of Use
• Documentation
• Help System
• Deployment
• Scalability
• Security
• Design Criteria

Page 11: Johnson Graduate School of Management Library Project

Ease of Use

System will be extremely easy to use

Search
• Search box on the main JGSM Library page
• A link on the main JGSM Library page to the advanced search page
• The advanced search's 3 options are also self-explanatory

Page 12: Johnson Graduate School of Management Library Project

Ease of Use

Administration
• The administration user interface is very straightforward
• Three functionalities: Add a Page, Remove a Page, Update a Page

Page 13: Johnson Graduate School of Management Library Project

Ease of Use

After viewing the training slides and trying the system a few times, an administrator should be able to maintain the database through the administration page immediately

Page 14: Johnson Graduate School of Management Library Project

Documentation

All source code that we write will have documentation within

All source code that we use from another source will include information on where it came from

A separate document will contain our implementation strategies and describe all algorithms we use

Page 15: Johnson Graduate School of Management Library Project

Help System

Search
• There will be a link to a help page on the advanced search page that suggests ways to get better search results
• That page will automatically display if no search results are found

Page 16: Johnson Graduate School of Management Library Project

Help System

Administration
• A brief help page written by us will be linked to on the administration page for instructions on usage
• If errors occur during database maintenance, error messages will indicate what went wrong

Page 17: Johnson Graduate School of Management Library Project

Deployment

We will install and configure all necessary software and integrate the system into the JGSM Library system

After deployment, system can be used instantly by anyone who accesses the page

Page 18: Johnson Graduate School of Management Library Project

Scalability

The system will not experience visible slowdown as the document base grows, up to at least twice the number of documents currently in the database

This applies to both searching and database administration

Page 19: Johnson Graduate School of Management Library Project

Security

Administration page will be accessed with user name and password

We recommend that the client not link to the administration page from anywhere on the JGSM site

Page 20: Johnson Graduate School of Management Library Project

Use Cases

The use scenarios of this system involve two actors:
• The website user who wishes to search
• The administrator who actually manages the website

[Diagram: the two actors, WebsiteUser and Administrator]

Page 21: Johnson Graduate School of Management Library Project

Use Cases: WebsiteUser

[Use case diagram: WebsiteUser performs Quick search and Advanced Search; each «uses» View Results]

Page 22: Johnson Graduate School of Management Library Project

Name: Quick search

Actor: WebsiteUser

Flow of events:

1. WebsiteUser visits Johnson Graduate School of Management Library Website

2. WebsiteUser clicks in "simple search" box near top of page

3. WebsiteUser types one or more search terms into the box.

4. WebsiteUser presses <enter>.


Page 23: Johnson Graduate School of Management Library Project

5. WebsiteUser views results via the View Results use case.

6. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.

Entry conditions: WebsiteUser knows URL of library website. WebsiteUser has a compatible browser.


Page 24: Johnson Graduate School of Management Library Project

Name: Advanced Search

Actor: WebsiteUser

Flow of events:

1. WebsiteUser visits Johnson Graduate School of Management Library Website

2. WebsiteUser clicks on "Advanced Search" link.

3. WebsiteUser is presented with advanced search options: searching for "any" words, "all" words, exact phrase, or within a certain category.

4. WebsiteUser types in one or more search terms into the box that corresponds to the type of search they wish to perform.

5. WebsiteUser selects the category they wish to search within, if any.

6. WebsiteUser clicks the "search" button.


Page 25: Johnson Graduate School of Management Library Project

7. WebsiteUser views results via the View Results use case.

8. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.

Entry conditions:
• WebsiteUser must have a web browser capable of displaying the Johnson Graduate School of Management library website.
• WebsiteUser must know the URL of the JGSM Library website, or browse there from another site.


Page 26: Johnson Graduate School of Management Library Project

Name: View Results

Actor: WebsiteUser

Flow of events:

1. Website presents WebsiteUser with a results page, containing a list of the first 10 results, ordered by relevance as determined by the search engine's ranking algorithm.

2. For each result, the results page includes a title of the page, a link to that page, and the context in which the search term(s) were used, OR an abstract of the page.

3. If a result seems useful to WebsiteUser, they click on the link and can visit the page. They may navigate back to the results page to see the results again.

4. If there are more than 10 results, WebsiteUser may see the next 10 by clicking a "next page" link at the bottom of the search results.
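The paging described in this use case can be sketched as below. The function names are hypothetical; the page size of 10 and the "next page" link come from the slides.

```python
RESULTS_PER_PAGE = 10  # from the requirements: 10 results to a page

def page_of_results(ranked_results, page_number):
    """Return the results shown on a 1-based page_number.

    ranked_results is assumed to be already sorted by relevance.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * RESULTS_PER_PAGE
    return ranked_results[start:start + RESULTS_PER_PAGE]

def has_next_page(ranked_results, page_number):
    """True if a "next page" link should be shown under this page."""
    return page_number * RESULTS_PER_PAGE < len(ranked_results)
```

For 25 ranked results, page 3 holds the last five and shows no "next page" link.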


Page 27: Johnson Graduate School of Management Library Project

Use Cases: Administrator

[Use case diagram: Administrator performs Add a page to site/index, Remove a page from site/index, and Update a page's content in site/index; each «uses» Authenticate]

Page 28: Johnson Graduate School of Management Library Project

Name: Add a page to the site/index

Actor: Administrator

Flow of events:

1. Administrator adds HTML page to online website.

2. Administrator visits the Administration Page.

3. The “Authenticate” use case authenticates the Administrator.

4. In the "Add" section, Administrator enters the URL of the page just added.

5. If Administrator desires to store a description of the page (for use in the search results), they enter it in the description box.

6. If this page belongs to a category (used for advanced searching), they may select that category from the category pull-down menu.

7. Administrator clicks the "Add" button.


Page 29: Johnson Graduate School of Management Library Project

8. Page is now indexed and available for searching.

9. Administrator is returned to the Administration page.

Entry conditions:
• Administrator must have a compatible web browser.
• Administrator must know the URL of the administration page.


Page 30: Johnson Graduate School of Management Library Project

Name: Remove a page from the site/index

Actor: Administrator

Flow of events:

1. Administrator removes HTML webpage from the online website.

2. Administrator visits the Administration page.

3. The “Authenticate” use case authenticates the administrator.

4. In the "Remove" section, Administrator enters the URL of the page just removed.

5. Administrator clicks the "Remove" button.


Page 31: Johnson Graduate School of Management Library Project

6. All data relating to that webpage is then removed from the index, and the page will no longer appear in search results.

7. Administrator is now returned to the administration page.

Entry conditions:
• Administrator must have a compatible web browser.
• Administrator must know the URL of the administration page.


Page 32: Johnson Graduate School of Management Library Project

Name: Update a page's content in the site/index

Actor: Administrator

Flow of events:

1. Administrator updates the HTML webpage on the online website.

2. Administrator visits the Administration page.

3. Administrator is authenticated through the “Authenticate” use case.

4. In the "Update" section, Administrator enters the URL of the page which has been updated.

5. Administrator clicks the "Continue" button.


Page 33: Johnson Graduate School of Management Library Project

6. Administrator is now presented with the current abstract, if one exists, for the page being updated.

7. If the Administrator wishes to alter or remove the abstract, they may edit it here.

8. Administrator clicks the "Update" button.

9. All data relating to the updated page in the search index now reflects the updated contents.

10. Administrator is now returned to the administration page.

Entry conditions:
• Administrator must have a compatible browser.
• Administrator must know the URL of the administration page.


Page 34: Johnson Graduate School of Management Library Project

Name: Authenticate

Actor: Administrator

Flow of events:

1. Administrator is prompted for a user name and password for the administration page.

2. Administrator supplies user name and password, and presses <enter>.

3. Administrator is granted access to administration page.
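A user name and password check along these lines would satisfy the use case. The slides do not say how credentials are stored; the salted PBKDF2 hash, the iteration count, and all names below are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not from the slides

def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def authenticate(stored, username, password):
    """Check a login against stored, a dict of username -> (salt, digest)."""
    record = stored.get(username)
    if record is None:
        return False
    salt, expected = record
    _, actual = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected)
```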


Page 35: Johnson Graduate School of Management Library Project

User Interface

The design follows our understanding of the client’s requirements.

Page 36: Johnson Graduate School of Management Library Project

Simple Search

Page 37: Johnson Graduate School of Management Library Project

Advanced Search

Page 38: Johnson Graduate School of Management Library Project

Search Results

Page 39: Johnson Graduate School of Management Library Project

Ease of Use

Simple search for new users. Advanced search for skilled users.

Page 40: Johnson Graduate School of Management Library Project

Administration Interface

Page 41: Johnson Graduate School of Management Library Project

Adding a Page…

The add-page section on the Database Administration page

Page 42: Johnson Graduate School of Management Library Project

Removing a Page…

The remove-page section on the Database Administration page

Page 43: Johnson Graduate School of Management Library Project

Updating a Page…

The update-page section on the Database Administration page

Page 44: Johnson Graduate School of Management Library Project

Update a Page…

Page 45: Johnson Graduate School of Management Library Project

Consistent Procedures

Adding a page, removing a page and updating a page all follow similar procedures.

Page 46: Johnson Graduate School of Management Library Project

Feedback

Feedback about the administrative operation will be displayed at the top of the main administration page after the operation.

Page 47: Johnson Graduate School of Management Library Project

Error Handling

Error messages are displayed for failed operations.

Page 48: Johnson Graduate School of Management Library Project

Follow up

The user should feel in control

Page 49: Johnson Graduate School of Management Library Project

Development Tools

Page 50: Johnson Graduate School of Management Library Project

PhpDig vs. Home Brew

PhpDig pros:
• Easier to maintain if familiar with search API
• Potentially more flexible – for example, can automate indexing
• More robust than our solution
• PhpDig works right now
• Can index MS-Word, PDF, Excel documents with plug-ins

PhpDig cons:
• Documentation does not specify algorithms used
• Code is longer and more complex
• Indexing relatively slow, must use Firefox to add and update pages
• Using Help Forum on website requires $5.00 for 30 days access
• Simplistic ranking algorithm (based on cursory glance)

We currently favor using PhpDig as our solution.

Page 51: Johnson Graduate School of Management Library Project

Database Schema (Ours)

WordTable
• Word (PK)
• ID (PK)
• Count

PageTable
• ID (PK)
• URL
• Title
• Category
• FullText
• DateModified
• Abstract
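The two tables could be declared as in the sketch below. SQLite is used only so the example is self-contained; the column types, the UNIQUE constraint on URL, and the reading of WordTable.ID as a reference to PageTable.ID are assumptions, since the slide lists only table and column names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PageTable (
    ID           INTEGER PRIMARY KEY,
    URL          TEXT NOT NULL UNIQUE,  -- assumed: one row per URL
    Title        TEXT,
    Category     TEXT,
    FullText     TEXT,
    DateModified TEXT,
    Abstract     TEXT
);
CREATE TABLE WordTable (
    Word  TEXT NOT NULL,
    ID    INTEGER NOT NULL REFERENCES PageTable(ID),  -- assumed page ID
    Count INTEGER NOT NULL,
    PRIMARY KEY (Word, ID)  -- one row per (word, page) pair
);
"""

def create_database(path=":memory:"):
    """Create both tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite key on (Word, ID) means each word stem is stored once per page, with Count holding how often it occurs there.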

Page 52: Johnson Graduate School of Management Library Project

Three Main Functions

• Add page to database
• Search for page in database
• Remove page from database

Page 53: Johnson Graduate School of Management Library Project

Stop List (PhpDig + Ours)

Example words: a, the, I’m, isn’t, moreover

Example: The pig makes excellent soup

Words kept after filtering: pig, makes, excellent, soup
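A stop-list filter can be as small as the sketch below. The stop list here holds only the slide's five example words, and the function name is hypothetical.

```python
# Only the example words from the slide; a real stop list is much longer.
STOP_WORDS = {"a", "the", "i'm", "isn't", "moreover"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    kept = []
    for raw in text.lower().split():
        # Normalize curly apostrophes and trim surrounding punctuation.
        word = raw.replace("\u2019", "'").strip(".,;:!?\"()")
        if word and word not in STOP_WORDS:
            kept.append(word)
    return kept

print(remove_stop_words("The pig makes excellent soup"))
# prints ['pig', 'makes', 'excellent', 'soup']
```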

Page 54: Johnson Graduate School of Management Library Project

Porter Stemmer (Ours)

• Word stems are extracted from words
• Implementation is from Porter’s website
• Example: pig, makes, excellent, soup → pig, make, excel, soup
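The full Porter algorithm has several steps; the sketch below implements only its first one (step 1a, plural endings) to show the flavor of the rules. It is not the project's implementation, which the slide says comes from Porter's website.

```python
def porter_step_1a(word):
    """Apply step 1a of the Porter stemmer: strip plural endings.

    Rules are tried in order and only the first match applies.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

print([porter_step_1a(w) for w in ["pig", "makes", "excellent", "soup"]])
# prints ['pig', 'make', 'excellent', 'soup']
```

Step 1a alone already turns "makes" into "make"; reducing "excellent" to "excel" needs later steps of the full algorithm, which this sketch omits.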

Page 55: Johnson Graduate School of Management Library Project

Adding a Page

1. Scan page into database

2. Filter out common words with stop-list

3. Use Porter Stemming algorithm to retrieve word stems (PhpDig uses twoletters trick)

4. Add word stems to database
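The four steps above can be sketched end to end as follows. This is an illustration under stated assumptions: SQLite stands in for the project's database, the table definitions are trimmed to the columns used here, the stop list has two placeholder words, and `stem` is a stand-in that only strips a plural "s" rather than the real Porter stemmer.

```python
import sqlite3
from collections import Counter

STOP_WORDS = {"a", "the"}  # placeholder stop list

def stem(word):
    """Stand-in for the Porter stemmer: strips a plural 's' only."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def add_page(conn, url, title, full_text):
    """Index one page: store it, then store a count per word stem."""
    cur = conn.execute(
        "INSERT INTO PageTable (URL, Title, FullText) VALUES (?, ?, ?)",
        (url, title, full_text))                      # step 1: scan page in
    page_id = cur.lastrowid
    words = [w for w in full_text.lower().split()
             if w not in STOP_WORDS]                  # step 2: stop list
    counts = Counter(stem(w) for w in words)          # step 3: stemming
    conn.executemany(
        "INSERT INTO WordTable (Word, ID, Count) VALUES (?, ?, ?)",
        [(word, page_id, n) for word, n in counts.items()])  # step 4
    conn.commit()
    return page_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PageTable (ID INTEGER PRIMARY KEY, URL TEXT UNIQUE,
                            Title TEXT, FullText TEXT);
    CREATE TABLE WordTable (Word TEXT, ID INTEGER, Count INTEGER,
                            PRIMARY KEY (Word, ID));
""")
add_page(conn, "http://example.edu/soup.html", "Soup",
         "The pig makes excellent soup soup")
```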

Page 56: Johnson Graduate School of Management Library Project

Search

1. Filter out common words from query

2. Use Porter Stemming algorithm on query (PhpDig uses twoletters)

3. Look for words in word table

4. Return pages that contain query terms
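The search steps above can be sketched as below. The word table is modeled as a plain dictionary from stem to {page ID: count} to keep the example short; that layout, the two-word stop list, the stand-in `stem` function, and the `require_all` flag are all assumptions.

```python
STOP_WORDS = {"a", "the"}  # placeholder stop list

def stem(word):
    """Stand-in for the Porter stemmer: strips a plural 's' only."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def search(word_table, query, require_all=False):
    """Return the set of page IDs that contain the query terms.

    word_table maps each stem to {page_id: count}. With require_all
    the pages must contain every term; otherwise any term suffices.
    """
    terms = [stem(w) for w in query.lower().split()
             if w not in STOP_WORDS]                 # steps 1 and 2
    page_sets = [set(word_table.get(t, {})) for t in terms]  # step 3
    if not page_sets:
        return set()
    if require_all:
        return set.intersection(*page_sets)
    return set.union(*page_sets)                     # step 4

word_table = {
    "bloomberg": {1: 3, 2: 1},
    "faq": {1: 2, 3: 1},
}
```

Because "the" is filtered out before lookup, the queries "Bloomberg FAQ" and "the Bloomberg FAQ" return the same pages in this sketch.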

Page 57: Johnson Graduate School of Management Library Project

Removing a Page

PhpDig:
• Look up URL in sites table, get page ID, then get spider ID.
• Remove entries with corresponding spider IDs in engine and spider tables
• (Optional) Delete file from FTP server

Ours:
• Look up URL in page table, get page ID
• Remove word entries with corresponding page ID in word table
• Remove entry with specified URL in page table
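The three steps of our removal procedure can be sketched as below. SQLite, the trimmed table definitions, the function name, and the choice of `LookupError` for a missing URL are assumptions; the requirements only say that an error is displayed when the URL is not in the database.

```python
import sqlite3

def remove_page(conn, url):
    """Remove a page and its word entries; raise if the URL is unknown."""
    row = conn.execute(
        "SELECT ID FROM PageTable WHERE URL = ?", (url,)).fetchone()
    if row is None:
        raise LookupError(f"URL not in database: {url}")
    page_id = row[0]
    # Delete the word entries first, then the page row itself.
    conn.execute("DELETE FROM WordTable WHERE ID = ?", (page_id,))
    conn.execute("DELETE FROM PageTable WHERE ID = ?", (page_id,))
    conn.commit()

# Small in-memory database with one indexed page, for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PageTable (ID INTEGER PRIMARY KEY, URL TEXT UNIQUE);
    CREATE TABLE WordTable (Word TEXT, ID INTEGER, Count INTEGER,
                            PRIMARY KEY (Word, ID));
    INSERT INTO PageTable (ID, URL) VALUES (1, 'http://example.edu/a.html');
    INSERT INTO WordTable VALUES ('soup', 1, 2), ('pig', 1, 1);
""")
```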

Page 58: Johnson Graduate School of Management Library Project

Query Results

Ours: Sort by term frequency - inverse document frequency (tf-idf) score

PhpDig: sorted by occurrence
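A tf-idf ranking can be sketched as below. The slides do not give the exact formula, so this uses one common variant as an assumption: each query term contributes count(term, page) × ln(N / number of pages containing the term), where N is the total number of pages. The dictionary layout and names are also hypothetical.

```python
import math

def rank_by_tf_idf(word_table, total_pages, query_terms):
    """Return page IDs sorted by descending tf-idf score.

    word_table maps each term to {page_id: count}. tf is the raw
    count and idf is ln(total_pages / pages containing the term).
    """
    scores = {}
    for term in query_terms:
        postings = word_table.get(term, {})
        if not postings:
            continue  # term appears on no page
        idf = math.log(total_pages / len(postings))
        for page_id, count in postings.items():
            scores[page_id] = scores.get(page_id, 0.0) + count * idf
    return sorted(scores, key=lambda page_id: scores[page_id], reverse=True)

word_table = {
    "bloomberg": {1: 3, 2: 1},
    "faq": {1: 2, 3: 4},
}
print(rank_by_tf_idf(word_table, 4, ["bloomberg", "faq"]))
# prints [1, 3, 2]
```

A term that appears on every page gets an idf of zero here, so it does not affect the ranking; sorting by raw occurrence would let such common terms dominate.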

“Everything should be made as simple as possible, but not simpler.” -Einstein