Johnson Graduate School of Management Library Project
Clients:
Ken Bolton
Angela K. Horne
JGSM Library Reference Team
Project Team:
Jonathan Gong
Benson Lee
Man Fai Matthew Lee
Greg Leedberg
Liz Xu
Functional Requirements
Search Function: Simple Search, Advanced Search
Administrative Features: Add HTML Page, Remove HTML Page, Update Existing HTML Page
Search Feature
Why? Because the client would like the content of their website to be more accessible.
Simple Search: for an easy, accessible search
Advanced Search: to narrow the search and get more relevant results
Simple Search
A search box will be located on the home page of the JGSM Library website (http://www.library.cornell.edu/johnson/).
The system will return all of the pages that contain all or any of the words provided by the user, with exceptions such as common words like "the", which are ignored.
Example: “Bloomberg FAQ” and “the Bloomberg FAQ”
Advanced Search
Search fields:
Find pages with all of the keywords
Find pages with any of the keywords
Find pages with “the exact phrase”
Limit search to a specific category
Example: “Bloomberg FAQ”
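The four advanced search fields can be read as filters that are combined: a page must satisfy every field the user fills in. The sketch below assumes that reading, uses a hypothetical in-memory `Page` record and example.edu URLs in place of the real database, and leaves out the stop list and stemming to stay short.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical in-memory stand-in for a PageTable row."""
    url: str
    title: str
    category: str
    full_text: str


def tokenize(text):
    """Lowercase the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_phrase(tokens, phrase):
    """True if `phrase` (a token list) appears as a contiguous run in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def advanced_search(pages, all_words="", any_words="", exact_phrase="", category=None):
    """Return the pages that satisfy every field the user filled in."""
    results = []
    for page in pages:
        tokens = tokenize(page.full_text)
        words = set(tokens)
        if category and page.category != category:
            continue
        if all_words and not all(w in words for w in tokenize(all_words)):
            continue
        if any_words and not any(w in words for w in tokenize(any_words)):
            continue
        if exact_phrase and not contains_phrase(tokens, tokenize(exact_phrase)):
            continue
        results.append(page)
    return results


# Sample data with hypothetical URLs.
pages = [
    Page("https://example.edu/bloomberg", "Bloomberg FAQ", "Databases",
         "Frequently asked questions about the Bloomberg terminal"),
    Page("https://example.edu/hours", "Library Hours", "General",
         "Library hours and Bloomberg terminal access"),
]
print([p.title for p in advanced_search(pages, all_words="Bloomberg questions")])  # prints ['Bloomberg FAQ']
```

Comparing token sequences for the exact phrase means punctuation and extra spaces in the page text do not prevent a match.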
Viewing Search Results
The results of the search should be displayed 10 to a page in ranked order
Search results will contain the title of the pages, link to the pages, and a short description
Search results should be ordered so that the links most useful to users appear first
Example: “Bloomberg FAQ”
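Showing ranked results 10 to a page, each with a title, link and short description, comes down to slicing the ranked list and choosing a description per hit. This is a minimal sketch, assuming the results are already ranked and stored as dicts with hypothetical keys; the 150-character fallback snippet length is also an assumption.

```python
import math

PAGE_SIZE = 10       # 10 results per page, as the requirements specify
SNIPPET_CHARS = 150  # assumed length of the fallback description


def results_page(ranked, page_number, page_size=PAGE_SIZE):
    """Return (rows, total_pages) for one page of already-ranked results.

    `ranked` is ordered best-first; each hit is a dict with hypothetical keys
    "title", "url", "abstract" and "text". page_number starts at 1.
    """
    total_pages = max(1, math.ceil(len(ranked) / page_size))
    start = (page_number - 1) * page_size
    rows = []
    for hit in ranked[start:start + page_size]:
        # Prefer the administrator's abstract; otherwise show part of the page text.
        description = hit.get("abstract") or hit["text"][:SNIPPET_CHARS]
        rows.append((hit["title"], hit["url"], description))
    return rows, total_pages


ranked = [
    {"title": f"Page {i}", "url": f"https://example.edu/{i}", "abstract": "", "text": f"Text of page {i}"}
    for i in range(1, 24)
]
rows, total_pages = results_page(ranked, 3)
print(total_pages, rows[0])  # prints 3 ('Page 21', 'https://example.edu/21', 'Text of page 21')
```

A "next page" link would be shown whenever `page_number < total_pages`.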
Administrative Features - Add
All administrative features must authenticate the user using a username and password.
Add HTML Page – the Administrator can:
specify a URL to add to the search system; the system will add the page and its key metadata to the database
select a category for the page
add an abstract to be associated with the page (optional); if there is no abstract, part of the text of the document will be displayed in search results
Admin Features - Remove
Remove HTML Page – the Administrator can:
specify a URL to remove from the search system; the system will remove the page and all data associated with the URL from the database
once removed, the page will no longer appear in users' search results
if the URL does not exist in the database, the system will display an error
Admin Features - Update
Update HTML Page – the Administrator can:
specify the page to update using its URL; the page metadata in the database is refreshed from the current contents of that URL
change the category of the page (optional)
change the abstract of the page after viewing the old abstract (optional)
Non-Functional Requirements
Ease of Use, Documentation, Help System, Deployment, Scalability, Security, Design Criteria
Ease of Use
The system will be extremely easy to use.
Search: a search box on the main JGSM Library page, and a link on the main JGSM Library page to the advanced search page. The three options on the advanced search page are self-explanatory.
Ease of Use
Administration: the administration user interface is very straightforward, with three functions:
Add a Page, Remove a Page, Update a Page
Ease of Use
After viewing the training slides and trying the system out a few times, an administrator should be able to maintain the database through the administration page right away.
Documentation
All source code that we write will include inline documentation.
All source code that we use from another source will include information on where it came from
A separate document will contain our implementation strategies and describe all algorithms we use
Help System
Search: the advanced search page will link to a help page that suggests ways to get better search results.
That help page will be displayed automatically if no search results are found.
Help System
Administration: the administration page will link to a brief help page, written by us, with usage instructions.
If errors occur during database maintenance, error messages will indicate what went wrong.
Deployment
We will install and configure all necessary software and integrate the system into the JGSM Library system
After deployment, the system can be used immediately by anyone who visits the page.
Scalability
The system will not experience visible slowdown as the document base grows, up to at least twice the number of documents currently in the database
This applies for both searching and database administration
Security
Administration page will be accessed with user name and password
We recommend that the client not link to the administration page from anywhere on the JGSM site.
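The slides only state that the administration page is protected by a user name and password; they do not say how credentials are stored or checked, and authentication configured in the web server would also meet the requirement. As a hedged sketch, assuming the check is done in application code, a salted PBKDF2 hash with a constant-time comparison looks like this. The `ADMINS` store, the sample password and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed PBKDF2 work factor


def hash_password(password, salt=None):
    """Derive a key from the password with PBKDF2-HMAC-SHA256; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


# Hypothetical credential store: user name -> (salt, key). Plain passwords are never stored.
ADMINS = {"admin": hash_password("example-password")}


def authenticate(username, password):
    """Return True only if the user exists and the password matches."""
    record = ADMINS.get(username)
    if record is None:
        return False
    salt, expected = record
    _, candidate = hash_password(password, salt)
    # compare_digest avoids the timing leak of an early-exit byte comparison.
    return hmac.compare_digest(candidate, expected)
```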
Use Cases
The use scenarios of this system involve two actors: the website user, who wishes to search, and the administrator, who manages the website.
Use Cases: WebsiteUser
[Use case diagram: WebsiteUser with Quick search and Advanced Search, each «uses» View Results]
Name: Quick search
Actor: WebsiteUser
Flow of events:
1. WebsiteUser visits the Johnson Graduate School of Management Library website.
2. WebsiteUser clicks in the "simple search" box near the top of the page.
3. WebsiteUser types one or more search terms into the box.
4. WebsiteUser presses <enter>.
5. WebsiteUser views results via the View Results use case.
6. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.
Entry conditions: WebsiteUser knows the URL of the library website. WebsiteUser has a compatible browser.
Name: Advanced Search
Actor: WebsiteUser
Flow of events:
1. WebsiteUser visits the Johnson Graduate School of Management Library website.
2. WebsiteUser clicks on the "Advanced Search" link.
3. WebsiteUser is presented with the advanced search options: searching for "any" words, "all" words, an exact phrase, or within a certain category.
4. WebsiteUser types one or more search terms into the box that corresponds to the type of search they wish to perform.
5. WebsiteUser selects the category they wish to search within, if any.
6. WebsiteUser clicks the "search" button.
7. WebsiteUser views results via the View Results use case.
8. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.
Entry conditions: WebsiteUser must have a web browser capable of displaying the Johnson Graduate School of Management library website.
WebsiteUser must know the URL of the JGSM Library website, or browse there from another site.
Name: View Results
Actor: WebsiteUser
Flow of events:
1. Website presents WebsiteUser with a results page, containing a list of the first 10 results, ordered by relevance as determined by the search engine's ranking algorithm.
2. For each result, the results page includes a title of the page, a link to that page, and the context in which the search term(s) were used, OR an abstract of the page.
3. If a result seems useful to WebsiteUser, they click on the link and can visit the page. They may navigate back to the results page to see the results again.
4. If there are more than 10 results, WebsiteUser may see the next 10 by clicking a "next page" link at the bottom of the search results.
Use Cases: Administrator
[Use case diagram: Administrator with Add a page to site/index, Remove a page from site/index, and Update a page's content in site/index, each «uses» Authenticate]
Name: Add a page to the site/index
Actor: Administrator
Flow of events:
1. Administrator adds an HTML page to the online website.
2. Administrator visits the Administration page.
3. The “Authenticate” use case authenticates the Administrator.
4. In the "Add" section, Administrator enters the URL of the page just added.
5. If Administrator wants to store a description of the page (for use in the search results), they enter it in the description box.
6. If this page belongs to a category (used for advanced searching), they may select that category from the category pull-down menu.
7. Administrator clicks the "Add" button.
8. Page is now indexed and available for searching.
9. Administrator is returned to the Administration page.
Entry conditions: Administrator must have a compatible web browser. Administrator must know the URL of the administration page.
Name: Remove a page from the site/index
Actor: Administrator
Flow of events:
1. Administrator removes the HTML webpage from the online website.
2. Administrator visits the Administration page.
3. The “Authenticate” use case authenticates the Administrator.
4. In the "Remove" section, Administrator enters the URL of the page just removed.
5. Administrator clicks the "Remove" button.
6. All data relating to that webpage is then removed from the index, and the page will no longer appear in search results.
7. Administrator is returned to the Administration page.
Entry conditions: Administrator must have a compatible web browser. Administrator must know the URL of the administration page.
Name: Updating a page in the site/index
Actor: Administrator
Flow of events:
1. Administrator updates the HTML webpage on the online website.
2. Administrator visits the Administration page.
3. Administrator is authenticated through the “Authenticate” use case.
4. In the "Update" section, Administrator enters the URL of the page which has been updated.
5. Administrator clicks the "Continue" button.
6. Administrator is now presented with the current abstract for the page being updated, if one exists.
7. If Administrator wishes to alter or remove the abstract, they may edit it here.
8. Administrator clicks the "Update" button.
9. All data relating to the updated page in the search index now reflects the updated contents.
10. Administrator is returned to the Administration page.
Entry conditions: Administrator must have a compatible browser. Administrator must know the URL of the administration page.
Name: Authenticate
Actor: Administrator
Flow of events:
1. Administrator is prompted for a user name and password for the administration page.
2. Administrator supplies the user name and password, and presses <enter>.
3. Administrator is granted access to the administration page.
User Interface
The design follows our understanding of the client’s requirements.
Simple Search
Advanced Search
Search Results
Ease of Use
Simple search for new users. Advanced search for skilled users.
Administration Interface
Adding a Page…
The add-page section on the Database Administration page
Removing a Page…
The remove-page section on the Database Administration page
Updating a Page…
The update-page section on the Database Administration page
Update a Page…
Consistent Procedures
Adding a page, removing a page and updating a page all follow similar procedures.
Feedback
Feedback about each administrative operation will be displayed at the top of the main administration page after the operation.
Error Handling
Error messages for failed operations.
Follow up
The user should feel in control
Development Tools
PhpDig vs. Home Brew
PhpDig pros:
Easier to maintain for anyone familiar with the search API
Potentially more flexible; for example, indexing can be automated
More robust than our solution
PhpDig works right now
Can index MS Word, PDF, and Excel documents with plug-ins
PhpDig cons:
Documentation does not specify the algorithms used
Code is longer and more complex
Indexing is relatively slow, and Firefox must be used to add and update pages
Using the Help Forum on the website costs $5.00 for 30 days of access
Simplistic ranking algorithm (based on a cursory glance)
We currently favor using PhpDig as our solution.
Database Schema (Ours)
WordTable: Word (PK), ID (PK), Count
PageTable: ID (PK), URL, Title, Category, FullText, DateModified, Abstract
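The two tables can be written out as SQL DDL. This sketch uses SQLite only so that it runs without a server; the column types, the `UNIQUE` constraint on URL and the foreign-key reference are assumptions, since the slides list only the column names and primary keys.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PageTable (
    ID           INTEGER PRIMARY KEY,
    URL          TEXT UNIQUE NOT NULL,
    Title        TEXT,
    Category     TEXT,
    FullText     TEXT,
    DateModified TEXT,
    Abstract     TEXT
);
CREATE TABLE WordTable (
    Word  TEXT NOT NULL,
    -- REFERENCES is only enforced if PRAGMA foreign_keys = ON
    ID    INTEGER NOT NULL REFERENCES PageTable(ID),
    Count INTEGER NOT NULL,
    PRIMARY KEY (Word, ID)  -- one row per (word stem, page) pair
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The composite key means each stem is stored once per page, with `Count` holding how many times it occurs there.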
Three Main Functions
Add page to database
Search for page in database
Remove page from database
Stop List (PhpDig + Ours)
Example words: a, the, I’m, isn’t, moreover
Example: The pig makes excellent soup
Filtered words: pig, makes, excellent, soup
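Stop-list filtering is a set-membership test applied after tokenizing. The stop list here is a small illustrative subset that includes the example words from the slide, not the combined PhpDig and project list.

```python
import re

# Illustrative subset only; the real stop list (PhpDig's plus ours) is much longer.
STOP_WORDS = {"a", "an", "the", "i'm", "isn't", "moreover", "of", "and", "to"}


def filter_stop_words(text):
    """Lowercase, split into words, and drop every word found in the stop list."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(filter_stop_words("The pig makes excellent soup"))  # prints ['pig', 'makes', 'excellent', 'soup']
```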
Porter Stemmer (Ours)
Word stems are extracted from words. The implementation is taken from Porter’s website. Example:
pig, makes, excellent, soup
pig, make, excel, soup
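The project uses the implementation from Porter's website, which is not reproduced here. To show what a Porter step looks like, this sketch implements only step 1a (plural suffixes: SSES to SS, IES to I, SS unchanged, trailing S dropped). Step 1a alone turns "makes" into "make"; reducing "excellent" to "excel" needs later steps (step 4 removes "-ent", step 5b reduces the double "l").

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


print([porter_step_1a(w) for w in ["pig", "makes", "excellent", "soup"]])  # prints ['pig', 'make', 'excellent', 'soup']
```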
Adding a Page
1. Scan page into database
2. Filter out common words with stop-list
3. Use the Porter stemming algorithm to extract word stems (PhpDig uses the twoletters trick)
4. Add word stems to database
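The four steps above map onto one page insert followed by one row per distinct stem. A minimal sketch against SQLite, assuming the page text has already been fetched (the fetch is omitted) and using a hypothetical `stem` helper that only strips a plural "s" as a stand-in for the Porter stemmer.

```python
import re
import sqlite3
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "and", "to"}  # illustrative subset of a stop list


def stem(word):
    # Hypothetical stand-in for the Porter stemmer: only strips a plural "s".
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def add_page(conn, url, title, full_text, category=None, abstract=None):
    """Store one page, then one (stem, page ID, count) row per distinct stem."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(  # step 1: store the page itself
            "INSERT INTO PageTable (URL, Title, Category, FullText, Abstract) "
            "VALUES (?, ?, ?, ?, ?)",
            (url, title, category, full_text, abstract),
        )
        page_id = cur.lastrowid
        tokens = re.findall(r"[a-z0-9']+", full_text.lower())
        kept = [t for t in tokens if t not in STOP_WORDS]  # step 2: stop list
        counts = Counter(stem(t) for t in kept)            # step 3: stems
        conn.executemany(                                  # step 4: word rows
            "INSERT INTO WordTable (Word, ID, Count) VALUES (?, ?, ?)",
            [(w, page_id, n) for w, n in counts.items()],
        )
    return page_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PageTable (ID INTEGER PRIMARY KEY, URL TEXT UNIQUE NOT NULL, Title TEXT,
                            Category TEXT, FullText TEXT, DateModified TEXT, Abstract TEXT);
    CREATE TABLE WordTable (Word TEXT, ID INTEGER, Count INTEGER, PRIMARY KEY (Word, ID));
    """
)
page_id = add_page(conn, "https://example.edu/bloomberg", "Bloomberg FAQ",
                   "The Bloomberg FAQ answers questions about Bloomberg terminals")
```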
Search
1. Filter out common words from query
2. Use the Porter stemming algorithm on the query (PhpDig uses twoletters)
3. Look for words in word table
4. Return pages that contain query terms
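The search steps above can be expressed as one query over WordTable joined to PageTable. This sketch assumes SQLite, a hypothetical plural-stripping `stem` stand-in for the Porter stemmer, and a `require_all` flag for the "all of the keywords" mode; results are ordered by page ID only as a placeholder for a real ranking.

```python
import re
import sqlite3

STOP_WORDS = {"a", "an", "the", "of", "and", "to"}  # illustrative subset of a stop list


def stem(word):
    # Hypothetical stand-in for the Porter stemmer: only strips a plural "s".
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def search(conn, query, require_all=False):
    """Filter and stem the query, look the stems up in WordTable, return page URLs."""
    terms = {stem(t) for t in re.findall(r"[a-z0-9']+", query.lower()) if t not in STOP_WORDS}
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = ("SELECT p.URL FROM WordTable w JOIN PageTable p ON p.ID = w.ID "
           f"WHERE w.Word IN ({placeholders}) GROUP BY p.ID, p.URL")
    params = list(terms)
    if require_all:
        # A page matches only if every distinct query stem appears on it.
        sql += " HAVING COUNT(DISTINCT w.Word) = ?"
        params.append(len(terms))
    sql += " ORDER BY p.ID"  # placeholder order; a ranking such as tf-idf would replace this
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PageTable (ID INTEGER PRIMARY KEY, URL TEXT UNIQUE NOT NULL);
    CREATE TABLE WordTable (Word TEXT, ID INTEGER, Count INTEGER, PRIMARY KEY (Word, ID));
    INSERT INTO PageTable VALUES (1, 'https://example.edu/bloomberg'), (2, 'https://example.edu/hours');
    INSERT INTO WordTable VALUES ('bloomberg', 1, 3), ('faq', 1, 1), ('bloomberg', 2, 1), ('hour', 2, 2);
    """
)
print(search(conn, "the Bloomberg FAQ", require_all=True))  # prints ['https://example.edu/bloomberg']
```

Because "the" is on the stop list, "the Bloomberg FAQ" and "Bloomberg FAQ" produce the same lookup.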
Removing a Page
PhpDig:
• Look up the URL in the sites table, get the page ID, then get the spider ID.
• Remove entries with the corresponding spider IDs from the engine and spider tables.
• (Optional) Delete the file from the FTP server.
Ours:
• Look up the URL in the page table and get the page ID.
• Remove word entries with the corresponding page ID from the word table.
• Remove the entry with the specified URL from the page table.
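The three "Ours" steps above translate directly into SQL. A minimal sketch against SQLite, assuming a hypothetical `PageNotFoundError` to produce the error the requirements call for when the URL is not in the database; the two deletes run in one transaction so a failure cannot leave orphaned word rows.

```python
import sqlite3


class PageNotFoundError(LookupError):
    """Hypothetical error raised when the URL is not in PageTable."""


def remove_page(conn, url):
    """Delete a page's word rows and its PageTable row; return the removed page ID."""
    row = conn.execute("SELECT ID FROM PageTable WHERE URL = ?", (url,)).fetchone()
    if row is None:
        raise PageNotFoundError(f"URL not in index: {url}")
    page_id = row[0]
    with conn:  # commits on success, rolls back if either DELETE fails
        conn.execute("DELETE FROM WordTable WHERE ID = ?", (page_id,))
        conn.execute("DELETE FROM PageTable WHERE URL = ?", (url,))
    return page_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PageTable (ID INTEGER PRIMARY KEY, URL TEXT UNIQUE NOT NULL);
    CREATE TABLE WordTable (Word TEXT, ID INTEGER, Count INTEGER, PRIMARY KEY (Word, ID));
    INSERT INTO PageTable VALUES (1, 'https://example.edu/bloomberg'), (2, 'https://example.edu/hours');
    INSERT INTO WordTable VALUES ('bloomberg', 1, 3), ('faq', 1, 1), ('bloomberg', 2, 1), ('hour', 2, 2);
    """
)
```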
Query Results
Ours: Sort by term frequency - inverse document frequency (tf-idf) score
PhpDig: sorted by occurrence
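The slides name tf-idf but not which variant. A minimal sketch, assuming the raw `Count` from WordTable as tf and idf = log(N / df), where N is the number of pages and df the number of pages containing the stem, summed over the query terms with no length normalization. The sample counts are made up for illustration.

```python
import math


def tfidf_scores(query_terms, word_counts, num_pages):
    """Rank pages by the sum over query terms of tf(term, page) * log(N / df(term)).

    word_counts maps each stem to {page_id: Count}, i.e. the WordTable rows.
    """
    scores = {}
    for term in query_terms:
        postings = word_counts.get(term, {})
        if not postings:
            continue
        idf = math.log(num_pages / len(postings))
        for page_id, count in postings.items():
            scores[page_id] = scores.get(page_id, 0.0) + count * idf
    # Highest score first; ties broken by page ID so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


word_counts = {"bloomberg": {1: 3, 2: 1}, "faq": {1: 1, 3: 2}}
ranked = tfidf_scores(["bloomberg", "faq"], word_counts, num_pages=4)
print([page for page, _ in ranked])  # prints [1, 3, 2]
```

A stem that occurs on every page gets idf = log(1) = 0 and so does not affect the order, which is the main difference from ranking by raw occurrence counts.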
“Everything should be made as simple as possible, but not simpler.” -Einstein