Prototype Report - University of Southern California

Prototype Report

Soccer Data Web Crawler

Team No. 02

Name                                   Role
Trupti Sardesai                        Project Manager
Wenchen Tu                             Prototyper
Subessware Selvameena Karunamoorthy    System/Software Architect
Pranshu Kumar                          Requirements Engineer
Zhitao Zhou                            Feasibility Analyst
Yan Zhang                              Operational Concept Engineer
Qing Hu                                Life Cycle Planner
Amir ali Tahmasebi                     Shaper


Version History

Date      Author  Version  Changes made                     Rationale
10/11/14  WT      1.0      First Version of this Document   For draft FC Package
12/01/14  WT      2.0      Second Version of this Document  For DDC package
12/08/14  WT, TS  3.0      Changes made to UI images        Final version of DC package


Table of Contents

Prototype Report
Version History
Table of Contents
Table of Tables
Table of Figures

1. Introduction
1.1 Purpose of Prototype Report
1.2 Status
2. Navigation Flow
3. Prototype
3.1 Developer User Interface
3.2 Basic Web Crawler
3.3 Social Media Crawler


Table of Tables

Table 1: Developer User Interface
Table 2: Web Crawler Architecture
Table 3: Spider for whoscored.com
Table 4: Workflow for gathering Facebook data
Table 5: Workflow for gathering Twitter data


Table of Figures

Figure 1: Navigation Flow of Soccer Data Web Crawler
Figure 2: Websites New Player User Interface
Figure 3: New Task User Interface
Figure 4: Player List User Interface
Figure 5: Task List User Interface
Figure 6: The basic web crawler architecture
Figure 7: team data abstracted from topdraw
Figure 8: player data abstracted from nasl
Figure 9: player data abstracted from sbnation
Figure 10: team data abstracted from mls
Figure 11: Workflow of gathering Facebook Data
Figure 12: main data gathered from Facebook
Figure 13: post data gathered from Facebook
Figure 14: Workflow of gathering Twitter Data
Figure 15: main data gathered from Twitter
Figure 16: tweet data gathered from Twitter


Prototype Report Version 3.0

1. Introduction

1.1 Purpose of Prototype Report

A prototype is an early sample, model, or release of a product, built to test a concept or process or to serve as something to be replicated or learned from. It lets system analysts and users try out a new design and refine it. The purpose of this Prototype Report is to describe the overall structure of the Soccer Data Web Crawler so that the clients can get a clearer, more concrete view of the product. In this way, we can eliminate ambiguity, detect misunderstandings, and make improvements.

1.2 Status

The second version of the Prototype Report records the prototype for the whole system, including the web crawler, the social media crawler, and the developer user interface. For the web crawler, we prototype the overall architecture and spiders for two representative websites. For social media, we prototype the workflow that uses the Facebook API to acquire a player's Facebook data from the player's name, as well as the corresponding workflow for Twitter. For the developer user interface, we implement a simple version that provides functions such as adding a new player and updating and deleting entries in the website list.

PRO_FCP_F14a_T02_V3.0 Version Date: 12/07/14


2. Navigation Flow

Figure 1: Navigation Flow of Soccer Data Web Crawler


3. Prototype

3.1 Developer User Interface

Description: The user interface that lets the developer manage the website list and other parameters.

Related Win Condition:

WC_3398: As a developer, I can add, delete, and update the specific websites visited, the fields to capture from each website, and the frequency of crawler refreshes for each specified website.

Table 1: Developer User Interface
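
The add, update, and delete operations named in WC_3398 can be sketched as a small in-memory registry. This is only an illustration under assumptions: the class names, field names, and the default refresh interval below are hypothetical and are not taken from the project's actual implementation or storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WebsiteEntry:
    """One row of the website list (all field names are hypothetical)."""
    url: str
    fields: List[str] = field(default_factory=list)  # fields to capture
    refresh_hours: int = 24  # assumed unit for the crawler refresh frequency


class WebsiteList:
    """In-memory stand-in for the list that the developer UI manages."""

    def __init__(self) -> None:
        self._entries: Dict[str, WebsiteEntry] = {}

    def add(self, url: str, fields: List[str], refresh_hours: int = 24) -> WebsiteEntry:
        if url in self._entries:
            raise ValueError(f"{url} is already in the website list")
        entry = WebsiteEntry(url, list(fields), refresh_hours)
        self._entries[url] = entry
        return entry

    def update(self, url: str, fields: Optional[List[str]] = None,
               refresh_hours: Optional[int] = None) -> WebsiteEntry:
        entry = self._entries[url]  # raises KeyError for an unknown website
        if fields is not None:
            entry.fields = list(fields)
        if refresh_hours is not None:
            entry.refresh_hours = refresh_hours
        return entry

    def delete(self, url: str) -> None:
        del self._entries[url]

    def get(self, url: str) -> WebsiteEntry:
        return self._entries[url]

    def urls(self) -> List[str]:
        return sorted(self._entries)
```

A real version would persist the entries so that the crawler can read them between runs; the dictionary here only keeps the example self-contained.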

Figure 2: Websites New Player User Interface


Figure 3: New Task User Interface

Figure 4: Player List User Interface


Figure 5: Task List User Interface

3.2 Basic Web Crawler

In this prototype, we present the web crawler architecture. The architecture consists of modules that extract both data and links from a website.

Description: The modules and workflow of the basic web crawler.

Related Win Conditions:

WC_3473: The web crawler shall gather team information from the websites in the website list.

WC_3472: The web crawler shall gather player information from the websites in the website list.

WC_3413: The web crawler shall gather head shots of players from the biography page on the website being crawled so that the player's picture can be shown on the report being generated.

WC_3412: The web crawler shall gather videos from the pages being crawled and ingest them into STBI as is so that the coach and fans are able to watch the relevant videos PA.

Table 2: Web Crawler Architecture
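
To make the split between data extraction and link extraction concrete, here is a minimal sketch of a crawl loop built only on the Python standard library. It is not the team's architecture: the names `PageParser` and `crawl`, the choice to treat table cells as "data", and the injected `fetch` callable are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects hyperlinks and table-cell text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            self.cells[-1] = self.cells[-1].strip()


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    queue, seen, data = deque([start_url]), {start_url}, {}
    while queue and len(data) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be loaded; skip it
        parser = PageParser()
        parser.feed(html)
        data[url] = parser.cells          # data extraction step
        for href in parser.links:         # link extraction step
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return data
```

Passing `fetch` in as a parameter keeps the sketch runnable offline, for example with a dictionary's `get` method standing in for real HTTP requests.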


Figure 6: The basic web crawler architecture

We also analyzed the websites in the list the client gave us. They fall into two basic categories. In the first, the statistical table is generated statically, so we can get the table directly from the page's source code. In the second, the statistical data is generated dynamically when the page is loaded, so we need to find the request URL that the page uses to load the data. The uslpro.uslsoccer.com website belongs to the second category, while the nasl website belongs to the first.
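
The two website categories can be handled roughly as follows. This sketch assumes that a static page carries its statistics in ordinary HTML table markup and that a dynamic page's request URL returns a JSON array of objects; the actual response format of any given site may differ, and the function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Reads the rows of HTML tables embedded in the page source."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(self._cell.strip())
            self._cell = None


def rows_from_static_page(html):
    """Category 1: the statistics table is already in the page source."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def rows_from_dynamic_response(body, columns):
    """Category 2: the page loads its data from a separate request URL.

    `body` is the text returned by that URL, assumed here to be a JSON
    array of objects; `columns` selects which keys to keep, in order.
    """
    return [[str(record[c]) for c in columns] for record in json.loads(body)]
```

For the second category, the request URL is typically found by watching the network requests the page makes while it loads; once found, it can be fetched directly without rendering the page.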

Description: The spider for crawling player data from the whoscored.com website.

Related Win Condition:

WC_3472: The web crawler shall gather player information from the websites in the website list.

Table 3: Spider for whoscored.com


Figure 7: team data abstracted from topdraw

Figure 8: player data abstracted from nasl


Figure 9: player data abstracted from sbnation

Figure 10: team data abstracted from mls


3.3 Gather Social Media Data

3.3.1 Facebook

Description: The overall workflow that uses a player's name to get Facebook data.

Related Win Condition:

WC_3416: The web crawler shall get comments, name and number of members, and likes from specified Facebook pages.

Table 4: Workflow for gathering Facebook data

Figure 11: Workflow of gathering Facebook Data
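
The workflow from a player's name to Facebook data can be outlined in code as below. Because the report does not list the API requests involved, the three callables (`find_page`, `get_page`, `get_posts`) are hypothetical wrappers around whatever Facebook API calls the project uses, and the dictionary keys are assumptions based on the fields named in WC_3416.

```python
def gather_facebook_data(player_name, find_page, get_page, get_posts):
    """Name -> page lookup -> main data -> post data.

    All three callables are hypothetical: `find_page(name)` returns a
    page id or None, `get_page(page_id)` returns a dict of page fields,
    and `get_posts(page_id)` returns a list of post dicts.
    """
    page_id = find_page(player_name)
    if page_id is None:
        return None  # no Facebook page found for this player
    page = get_page(page_id)
    posts = get_posts(page_id)
    return {
        "player": player_name,
        "main": {
            "name": page.get("name"),
            "members": page.get("members"),
            "likes": page.get("likes"),
        },
        "posts": [
            {"message": p.get("message"), "comments": p.get("comments", [])}
            for p in posts
        ],
    }
```

Keeping the API access behind injected functions means the workflow can be exercised with stubbed responses before any credentials or rate limits come into play.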


Figure 12: main data gathered from Facebook

Figure 13: post data gathered from Facebook


3.3.2 Twitter

Description: The overall workflow to get a player's Twitter information.

Related Win Condition:

WC_3417: The web crawler shall get the number of followers, the comments, and the number of retweets for a specified Twitter account.

Table 5: Workflow for gathering Twitter data

Figure 14: Workflow of gathering Twitter Data


Figure 15: main data gathered from Twitter

Figure 16: tweet data gathered from Twitter
