University of Southern California
Prototype Report
Soccer Data Web Crawler
Team No. 02
Name                                  Role
Trupti Sardesai                       Project Manager
Wenchen Tu                            Prototyper
Subessware Selvameena Karunamoorthy   System/Software Architect
Pranshu Kumar                         Requirements Engineer
Zhitao Zhou                           Feasibility Analyst
Yan Zhang                             Operational Concept Engineer
Qing Hu                               Life Cycle Planner
Amir ali Tahmasebi                    Shaper
Version History
Date      Author   Version  Changes Made                      Rationale
10/11/14  WT       1.0      First version of this document    For draft FC package
12/01/14  WT       2.0      Second version of this document   For DDC package
12/08/14  WT, TS   3.0      Changes made to UI images         Final version of DC package
Table of Contents
Prototype Report
Version History
Table of Contents
Table of Tables
Table of Figures
1. Introduction
1.1 Purpose of Prototype Report
1.2 Status
2. Navigation Flow
3. Prototype
3.1 Developer User Interface
3.2 Basic Web Crawler
3.3 Gather Social Media Data
Table of Tables
Table 1: Developer User Interface
Table 2: Web Crawler Architecture
Table 3: Spider for whoscored.com
Table 4: Workflow for gathering Facebook data
Table 5: Workflow for gathering Twitter data
Table of Figures
Figure 1: Navigation Flow of Soccer Data Web Crawler
Figure 2: Websites New Player User Interface
Figure 3: New Task User Interface
Figure 4: Player List User Interface
Figure 5: Task List User Interface
Figure 6: The basic web crawler architecture
Figure 7: team data extracted from topdraw
Figure 8: player data extracted from NASL
Figure 9: player data extracted from SB Nation
Figure 10: team data extracted from MLS
Figure 11: Workflow of gathering Facebook Data
Figure 12: main data gathered from Facebook
Figure 13: post data gathered from Facebook
Figure 14: Workflow of gathering Twitter Data
Figure 15: main data gathered from Twitter
Figure 16: tweet data gathered from Twitter
Prototype Report Version 3.0
1. Introduction
1.1 Purpose of Prototype Report
A prototype is an early sample, model, or release of a product built to test a concept or process, or to serve as something to be replicated or learned from. Prototyping lets system analysts and users test and trial a new design so that it can be made more precise. The purpose of this prototype report is to describe the overall structure of the Soccer Data Web Crawler so that clients get a clearer visual picture of the product. This helps us eliminate ambiguity, detect misunderstandings, and make improvements.
1.2 Status
The second version of the Prototype Report records the prototype of the whole system, including the web crawler, the social media crawler, and the developer user interface. For the web crawler, we present a prototype of the overall architecture and spiders for two representative websites. For extracting data from social media, we prototype the workflow that uses the Facebook API to acquire a player's Facebook data from his name, as well as the corresponding workflow for Twitter. For the developer user interface, we implement a simple version that provides functions such as adding a new player and updating or deleting entries in the website list.
PRO_FCP_F14a_T02_V3.0 Version Date: 12/07/14
2. Navigation Flow
Figure 1: Navigation Flow of Soccer Data Web Crawler
3. Prototype
3.1 Developer User Interface
Description: The user interface that lets developers manage the website list and other parameters.
Related Win Condition:
WC_3398  As a developer, I can add, delete, and update the specific websites visited, the fields to capture from each website, and the crawler refresh frequency for each specified website.
Table 1: Developer User Interface
Figure 2: Websites New Player User Interface
Figure 3: New Task User Interface
Figure 4: Player List User Interface
Figure 5: Task List User Interface
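WC_3398 asks the developer interface to add, update, and delete websites together with the fields to capture and a per-site refresh frequency. The sketch below shows one possible in-memory model such a UI could sit on. It is a minimal illustration, not the prototype's actual design: the class names (`WebsiteEntry`, `WebsiteList`), the field names, and the choice of hours as the refresh unit are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class WebsiteEntry:
    """One row of the website list (hypothetical schema)."""
    url: str
    fields: List[str]        # fields to capture, e.g. ["name", "goals"]
    refresh_hours: int       # assumed unit: hours between re-crawls
    last_crawled: Optional[datetime] = None


class WebsiteList:
    """Add / update / delete operations the developer UI would call."""

    def __init__(self) -> None:
        self._sites: Dict[str, WebsiteEntry] = {}

    def add(self, entry: WebsiteEntry) -> None:
        if entry.url in self._sites:
            raise ValueError(f"{entry.url} is already in the list")
        self._sites[entry.url] = entry

    def update(self, url: str, *, fields: Optional[List[str]] = None,
               refresh_hours: Optional[int] = None) -> None:
        entry = self._sites[url]  # raises KeyError for an unknown site
        if fields is not None:
            entry.fields = list(fields)
        if refresh_hours is not None:
            entry.refresh_hours = refresh_hours

    def delete(self, url: str) -> None:
        del self._sites[url]

    def due_for_crawl(self, now: datetime) -> List[str]:
        """URLs never crawled, or whose refresh interval has elapsed."""
        due = []
        for entry in self._sites.values():
            interval = timedelta(hours=entry.refresh_hours)
            if entry.last_crawled is None or now - entry.last_crawled >= interval:
                due.append(entry.url)
        return due
```

A scheduler could call `due_for_crawl` periodically and hand the returned URLs to the crawler. Keeping the refresh interval on each entry mirrors the win condition's "for each specified website" wording.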
3.2 Basic Web Crawler
In this prototype, we present the architecture of the web crawler. The architecture consists of modules that extract both data and links from each website.
Description: The modules and workflow of the basic web crawler.
Related Win Condition:
WC_3473  The web crawler shall gather team information from the websites in the website list.
WC_3472  The web crawler shall gather player information from the websites in the website list.
WC_3413  The web crawler shall gather headshots of players from the biography page on the website being crawled so that the player's picture can be shown on the generated report.
WC_3412  The web crawler shall gather videos from the pages being crawled and ingest them into STBI as-is so that the coach and fans are able to watch the relevant videos PA.
Table 2: Web Crawler Architecture
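A crawler whose modules extract both links and data, as Table 2 describes, can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the team's implementation: `fetch` is an injected callable (in a live run it could wrap `urllib.request.urlopen`), the crawl is breadth-first and stays on the seed's host, and headshots (WC_3413) and videos (WC_3412) are approximated by `<img>` and `<video>`/`<source>` URLs.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Link extractor and media extractor for a single page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.images: List[str] = []  # candidate headshots (WC_3413)
        self.videos: List[str] = []  # candidate videos (WC_3412)
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src")
        if tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments.
            self.links.append(urldefrag(urljoin(self.base_url, attrs["href"]))[0])
        elif tag == "img" and src:
            self.images.append(urljoin(self.base_url, src))
        elif tag == "video":
            self._in_video = True
            if src:
                self.videos.append(urljoin(self.base_url, src))
        elif tag == "source" and self._in_video and src:
            # <source> children of a <video> element carry the real file URL.
            self.videos.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def crawl(seed: str, fetch: Callable[[str], str],
          max_pages: int = 50) -> Dict[str, Dict[str, List[str]]]:
    """Breadth-first crawl restricted to the seed URL's host."""
    host = urlparse(seed).netloc
    frontier = deque([seed])  # URLs waiting to be fetched
    seen: Set[str] = {seed}   # URLs already queued, to avoid repeats
    results: Dict[str, Dict[str, List[str]]] = {}
    while frontier and len(results) < max_pages:
        url = frontier.popleft()
        parser = PageParser(url)
        parser.feed(fetch(url))
        parser.close()
        results[url] = {"images": parser.images, "videos": parser.videos}
        for link in parser.links:
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return results
```

Injecting `fetch` keeps the traversal logic testable offline and leaves politeness concerns (rate limiting, robots.txt) to whatever real fetcher is plugged in.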
Figure 6: The basic web crawler architecture
We also analyzed the websites in the list the client gave us. They fall into two categories. In the first, the statistics table is generated statically, so we can read the table directly from the page's source code. In the second, the statistical data is generated dynamically when the page loads, so we need to find the request URL that the page uses to load the data. The nasl website belongs to the first category, while uslpro.uslsoccer.com belongs to the second.
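The two site categories call for different extraction paths. For a static page, the stats table can be parsed straight out of the HTML. For a dynamic page, the data request URL (typically found in the browser's developer tools) is fetched directly, and its response is parsed instead of the rendered page. The sketch below assumes the dynamic endpoint returns JSON with a top-level list under a key named `players`; that response shape and the helper names are hypothetical and will differ per site.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class TableParser(HTMLParser):
    """Category 1: turns each <tr> of a static stats table into a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> in a cell) is still collected.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_static_page(html: str) -> List[List[str]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_from_dynamic_endpoint(body: str, key: str = "players") -> List[dict]:
    """Category 2: parse the response of the page's data request URL.
    The JSON layout (a list under `key`) is an assumption."""
    return json.loads(body)[key]
```

In practice the spider for each site would pick one of the two functions according to the category recorded for that site.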
Description: The spider that crawls player data from the whoscored.com website.
Related Win Condition:
WC_3472  The web crawler shall gather player information from the websites in the website list.
Table 3: Spider for whoscored.com
Figure 7: team data extracted from topdraw
Figure 8: player data extracted from NASL
Figure 9: player data extracted from SB Nation
Figure 10: team data extracted from MLS
3.3 Gather Social Media Data
3.3.1 Facebook
Description: The overall workflow for using a player's name to get his Facebook data.
Related Win Condition:
WC_3416  The web crawler shall get the comments, the name, the number of members, and the likes of specified Facebook pages.
Table 4: Workflow for gathering Facebook data
Figure 11: Workflow of gathering Facebook Data
Figure 12: main data gathered from Facebook
Figure 13: post data gathered from Facebook
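The name-driven Facebook workflow boils down to three requests: find a page for the player's name, read page-level data, then read recent posts with their comments. The sketch below expresses that sequence with an injected `get_json(path, params)` function so it can run offline. The request paths (`search` with `type=page`, `{id}/posts`) and field names (`name`, `likes`, `message`, `comments`) are assumptions modeled on the 2014-era Graph API; page search and the `likes` field have since been restricted or renamed, so check the current Graph API reference before relying on them. Picking the top search hit is a deliberately naive, hypothetical choice.

```python
from typing import Callable, Dict

# get_json(path, params) -> parsed JSON. In a real run it would issue an
# authenticated HTTPS GET to the Graph API; here it is injected (assumption).
GetJson = Callable[[str, Dict[str, str]], dict]


def facebook_profile_for_player(player_name: str, get_json: GetJson,
                                max_posts: int = 10) -> dict:
    # Step 1: look up candidate pages by the player's name.
    hits = get_json("search", {"q": player_name, "type": "page"}).get("data", [])
    if not hits:
        return {"player": player_name, "found": False}
    page_id = hits[0]["id"]  # naive: take the top search hit

    # Step 2: page-level data (name and like count).
    page = get_json(page_id, {"fields": "name,likes"})

    # Step 3: recent posts, each with its comments.
    posts = get_json(f"{page_id}/posts",
                     {"fields": "message,comments", "limit": str(max_posts)}).get("data", [])
    return {
        "player": player_name,
        "found": True,
        "page_name": page.get("name"),
        "likes": page.get("likes"),
        "posts": [
            {
                "message": p.get("message", ""),
                "comments": [c.get("message", "")
                             for c in p.get("comments", {}).get("data", [])],
            }
            for p in posts
        ],
    }
```

Because disambiguating players who share a name is the weak point of this workflow, a real version would likely let the developer confirm or override the page chosen in step 1.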
3.3.2 Twitter
Description: The overall workflow for getting a player's Twitter information.
Related Win Condition:
WC_3417  The web crawler shall get the number of followers, the comments, and the number of retweets for a specified Twitter account.
Table 5: Workflow for gathering Twitter data
Figure 14: Workflow of gathering Twitter Data
Figure 15: main data gathered from Twitter
Figure 16: tweet data gathered from Twitter
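The Twitter workflow needs two lookups per account: account-level data for the follower count, and the recent timeline for tweet text and retweet counts. The sketch below assumes the REST API v1.1 resources `users/show.json` (whose user objects carry `followers_count`) and `statuses/user_timeline.json` (whose tweet objects carry `text` and `retweet_count`). The `get_json` function is injected and hypothetical, and authentication is omitted. Later API versions use different endpoints and field names. Replies from other users, if that is what "comments" means in WC_3417, would need a separate search request that this sketch does not cover.

```python
from typing import Any, Callable, Dict

# get_json(path, params) -> parsed JSON from the API (injected; OAuth omitted).
GetJson = Callable[[str, Dict[str, str]], Any]


def twitter_summary(screen_name: str, get_json: GetJson, count: int = 20) -> dict:
    # Step 1: account-level data; v1.1 user objects include followers_count.
    user = get_json("users/show.json", {"screen_name": screen_name})

    # Step 2: the account's recent tweets; each includes text and retweet_count.
    timeline = get_json("statuses/user_timeline.json",
                        {"screen_name": screen_name, "count": str(count)})
    tweets = [{"text": t["text"], "retweets": t["retweet_count"]} for t in timeline]

    return {
        "screen_name": screen_name,
        "followers": user["followers_count"],
        "tweets": tweets,
        "total_retweets": sum(t["retweets"] for t in tweets),
    }
```

Summing retweets per account gives a simple engagement figure that could sit alongside the follower count in the generated report.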