qing-cai chen; xiao-hong yang; xiao-long wang machine learning and cybernetics (icmlc), 2011...
TRANSCRIPT
1
A PEER-TO-PEER BASED PASSIVE WEB CRAWLING SYSTEM
Qing-Cai Chen; Xiao-Hong Yang; Xiao-Long Wang
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Year: 2011, Page(s): 1878–1883
Speaker: Chang, Kun-Hsiang
2
Outline
Abstract
P2P based passive web crawling system
Crawler server registering
Content updated notification
Download updated content by P2P network
Website discovering
3
Abstract
This paper proposes an innovative client/server based web crawling system.
Main benefits:
Timely handling of website changes by the crawler.
Savings in website bandwidth.
The ability to download large files and multimedia content.
Protection of intellectual property while the content is indexed and searched.
4
The basic principle of a Crawler
5
P2P based passive web crawling system
6
Assignment of Responsibilities to the Crawler Server and Crawler Client
7
Crawler server registering
robots.xml
Port
IP address
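The slide names only robots.xml, a port, and an IP address as the ingredients of registration, and gives no record format. As a rough illustration, the sketch below assumes a crawler server registers by submitting its IP address and port. `Registration` and `make_registration` are hypothetical names, not taken from the paper.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical record a crawler server submits when registering."""
    ip: str
    port: int


def make_registration(ip: str, port: int) -> Registration:
    # ip_address() raises ValueError for a malformed IPv4/IPv6 string.
    ipaddress.ip_address(ip)
    # Valid TCP/UDP port numbers fall in 1..65535.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return Registration(ip=ip, port=port)
```

Validating the address and port at registration time keeps bad entries out of whatever list the website later uses to contact crawler servers.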
8
Content updated notification
A newly registered server has to wait several days or weeks before it is notified to download all of the historical content on this website.
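One possible reading of the notification step is that the website keeps a list of registered crawler servers and pushes a message to each when a page changes. The sketch below assumes that reading. `UpdateNotifier`, the message fields, and the example URL are hypothetical, and actual network delivery is left out.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateNotifier:
    """Hypothetical website-side registry of crawler servers."""
    servers: set = field(default_factory=set)  # (ip, port) pairs

    def register(self, ip: str, port: int) -> None:
        self.servers.add((ip, port))

    def notify(self, changed_url: str) -> list:
        # Build one message per registered server; sending is omitted.
        return [
            {"to": f"{ip}:{port}", "event": "content-updated", "url": changed_url}
            for ip, port in sorted(self.servers)
        ]


notifier = UpdateNotifier()
notifier.register("192.0.2.10", 8080)
messages = notifier.notify("http://example.com/news.html")
```

Under this assumption a server hears only about changes made after it registered, which is consistent with the delay noted above for obtaining a site's historical content.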
9
Download updated content by P2P network
10
Website discovering
11
END