qing-cai chen; xiao-hong yang; xiao-long wang machine learning and cybernetics (icmlc), 2011...


A PEER-TO-PEER BASED PASSIVE WEB CRAWLING SYSTEM

Qing-Cai Chen; Xiao-Hong Yang; Xiao-Long Wang

2011 International Conference on Machine Learning and Cybernetics (ICMLC), 2011, pp. 1878–1883

Speaker: Chang, Kun-Hsiang


Outline

- Abstract
- P2P based passive web crawling system
- Crawler server registering
- Content updated notification
- Download updated content by P2P network
- Website discovering


Abstract

This paper proposes an innovative client/server-based web crawling system. Its main benefits are:

- the capability of managing web changes in a timely manner for a crawler;
- savings in website bandwidth resources;
- the capability of downloading large files or multimedia content;
- the capability of protecting intellectual property while the content is indexed and searched.


The basic principle of a crawler
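A conventional (active) crawler can be sketched as a breadth-first loop over a URL frontier: take a URL, fetch the page, hand it to the indexer, extract its links, and queue every link not seen before. The sketch below is a generic illustration, not code from the paper; `crawl`, `LinkExtractor` and the in-memory `FAKE_WEB` that stands in for real HTTP fetches are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from `seeds`.

    `fetch` is any callable mapping a URL to HTML text, or None if unavailable.
    Returns {url: html} for every page fetched.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # a real crawler would pass this to the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages


# Hypothetical in-memory "web" so the example runs without network access.
FAKE_WEB = {
    "http://a.example/": '<a href="/b">b</a> <a href="http://c.example/">c</a>',
    "http://a.example/b": '<a href="/">home</a>',
    "http://c.example/": "no links here",
}
pages = crawl(["http://a.example/"], FAKE_WEB.get)
print(sorted(pages))
```

The crawler must revisit pages on its own schedule to detect changes, which is the cost the passive design in this paper aims to avoid.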


P2P based passive web crawling system


Assignment of Responsibilities to the Crawler Server and the Crawler Client


Crawler server registering

robots.xml: port, IP address
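The slide only states that registration involves robots.xml together with the crawler server's port and IP address. A minimal sketch, assuming the website publishes these two values in its robots.xml and the crawler client reads and records them: the XML layout (`<crawlerServer>`, `<ip>`, `<port>`) and the names `parse_robots_xml`, `register` and `registry` are hypothetical, not the paper's format.

```python
import ipaddress
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# HYPOTHETICAL robots.xml layout: only "port" and "IP address" come from the slide.
SAMPLE_ROBOTS_XML = """
<robots>
  <crawlerServer>
    <ip>192.0.2.10</ip>
    <port>8765</port>
  </crawlerServer>
</robots>
"""


@dataclass(frozen=True)
class CrawlerServerInfo:
    site: str
    ip: str
    port: int


def parse_robots_xml(site: str, xml_text: str) -> CrawlerServerInfo:
    """Extract and validate the crawler server's IP address and port."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("crawlerServer")
    if node is None:
        raise ValueError(f"{site}: no crawlerServer entry in robots.xml")
    # ip_address() and int() raise ValueError on malformed or missing values
    ip = str(ipaddress.ip_address(node.findtext("ip", "").strip()))
    port = int(node.findtext("port", "").strip())
    if not 0 < port < 65536:
        raise ValueError(f"{site}: port {port} out of range")
    return CrawlerServerInfo(site, ip, port)


# Hypothetical crawler-client table of registered crawler servers.
registry: dict[str, CrawlerServerInfo] = {}


def register(site: str, xml_text: str) -> None:
    registry[site] = parse_robots_xml(site, xml_text)


register("www.example.com", SAMPLE_ROBOTS_XML)
```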


Content updated notification

A newly registered server has to wait several days or weeks before it is notified to download all of the historical content on the website.
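The notification step can be illustrated with a small change tracker. This is a hypothetical sketch, not the paper's mechanism: it assumes the crawler server fingerprints each page with SHA-256 and reports the pages added, modified or deleted since its last notification. With an empty history, as for a newly registered server, every page is reported, which corresponds to downloading all of the historical content once. `UpdateTracker` and `build_notification` are illustrative names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


class UpdateTracker:
    """Hypothetical crawler-server component that reports changed pages."""

    def __init__(self) -> None:
        self.last_notified: dict[str, str] = {}  # url -> fingerprint already announced

    def build_notification(self, site_pages: dict[str, bytes]) -> dict[str, list[str]]:
        """Compare the current pages with what was last announced."""
        current = {url: fingerprint(body) for url, body in site_pages.items()}
        added = sorted(u for u in current if u not in self.last_notified)
        modified = sorted(
            u for u in current
            if u in self.last_notified and self.last_notified[u] != current[u]
        )
        deleted = sorted(u for u in self.last_notified if u not in current)
        self.last_notified = current  # the next notification is relative to this one
        return {"added": added, "modified": modified, "deleted": deleted}


tracker = UpdateTracker()
# First notification: empty history, so the whole site is reported.
first = tracker.build_notification({"/index.html": b"v1", "/news.html": b"n1"})
# Later notification: only the differences are reported.
second = tracker.build_notification({"/index.html": b"v2", "/about.html": b"a1"})
```

Sending the notification to the crawler clients (and retrying on failure) is left out of the sketch.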


Download updated content by P2P network
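The slide gives no protocol details for the P2P download. As a generic illustration, the sketch below borrows BitTorrent-style piece hashing (a swapped-in technique, not necessarily what the paper uses): the content is split into fixed-size pieces, a manifest of SHA-256 piece hashes is assumed to come from the trusted crawler server, and each piece is taken from any peer whose copy matches its hash. `make_manifest`, `p2p_download`, `PIECE_SIZE` and the peer layout are hypothetical.

```python
import hashlib

PIECE_SIZE = 4  # tiny for the demo; real systems use much larger pieces


def make_manifest(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """Hash each fixed-size piece so that downloaded pieces can be verified."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def p2p_download(manifest: list[str], peers: dict[str, dict[int, bytes]]) -> bytes:
    """Fetch every piece from the first peer that holds a valid copy.

    `peers` maps a peer id to {piece_index: bytes}. A piece whose hash does not
    match the manifest (corrupt or tampered) is skipped and another peer tried.
    """
    pieces = []
    for index, expected in enumerate(manifest):
        for held in peers.values():
            chunk = held.get(index)
            if chunk is not None and hashlib.sha256(chunk).hexdigest() == expected:
                pieces.append(chunk)
                break
        else:
            raise RuntimeError(f"no peer has a valid copy of piece {index}")
    return b"".join(pieces)


original = b"updated page body"
manifest = make_manifest(original)  # assumed to be published by the crawler server
peers = {
    "peer-A": {0: b"upda", 1: b"XXXX", 2: b"page"},  # piece 1 is corrupted here
    "peer-B": {1: b"ted ", 3: b" bod", 4: b"y"},
}
restored = p2p_download(manifest, peers)
```

Verifying every piece against the server's manifest is what lets untrusted peers carry the bulk of the traffic, in line with the bandwidth-saving goal stated in the abstract.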


Website discovering


END
