Building Topic/Trend Detection System based on Slow Intelligence

Building Topic/Trend Detection System based on Slow Intelligence Chia-Chun Shih & Ting-Chun Peng Institute for Information Industry Taipei, Taiwan Presented at DMS’10 special session on Slow Intelligence Systems


Page 1: Building Topic/Trend Detection System based on Slow Intelligence

Building Topic/Trend Detection System based on Slow Intelligence

Chia-Chun Shih & Ting-Chun Peng

Institute for Information Industry

Taipei, Taiwan

Presented at DMS’10 special session on Slow Intelligence Systems

Page 2: Building Topic/Trend Detection System based on Slow Intelligence

Agenda

• Introduction
• Topic/Trend Detection System
• Topic/Trend Detection System with Slow Intelligence
• Conclusion

Page 3: Building Topic/Trend Detection System based on Slow Intelligence

Introduction

Page 4: Building Topic/Trend Detection System based on Slow Intelligence

Introduction

• Social media is becoming pervasive
• Social media is a reflection of the real world
  – An experiment from HP Social Computing Lab shows:
    • Twitter-rate time series can accurately predict box-office movie sales, with adjusted R² = 0.973 (amazing!!)
• The emerging market for Social Media Monitoring Services
  – E.g., Nielsen Buzzmetrics, Radian6

[Figure panels: Twitter Posts, Blog Posts, Facebook Users]

Page 5: Building Topic/Trend Detection System based on Slow Intelligence

Introduction (cont’d)

• Topic Detection and Tracking (TDT)
  – Initiated by DARPA in 1996
  – Discovers the topical structure in unsegmented streams of news reporting as it appears across multiple media
  – Tasks:
    • Topic Detection
    • Topic Tracking
    • First Story Detection
    • Story Segmentation
    • Link Detection

Page 6: Building Topic/Trend Detection System based on Slow Intelligence

Introduction (cont’d)

• Slow Intelligence provides a software development framework that lets systems with insufficient computing resources gradually adapt to their environments and handle complexity

[Figure: Slow Intelligence System. Labels: Problem, Enumerator, Adaptor, Eliminator, Concentrator, Solution, Knowledge-based Controller, Environment; stages numbered 1 to 4]

Page 7: Building Topic/Trend Detection System based on Slow Intelligence

Introduction (cont’d)

• In this paper, we propose a design for an online topic/trend detection system for Social Media that takes advantage of Slow Intelligence.
• Four complexities of designing online topic/trend detection systems are identified, along with corresponding Slow Intelligence solutions.

[Figure: Slow Intelligence System building blocks (Enumerator, Adaptor, Eliminator, Concentrator) applied to the Topic/Trend Detection System (Crawler & Extractor, Topic Extractor, Trend Detector) through four subsystems: SIS system for Focused Crawling, SIS system for scheduling Crawlers, SIS system for adapting Extractors, SIS system for selecting Trend Estimation Method]

Page 8: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System

Page 9: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System

• Objective
  – Detect current hot topics and predict future hot topics based on data collected from Social Media
• Three components
  – Crawler & Extractor: Collect data and extract information from Social Media
  – Topic Extractor: Detect hot topics from a set of text documents
  – Trend Detector: Detect trends (future hot topics) based on currently available data

[Figure: Social Media → Crawler & Extractor → Topic Extractor → Trend Detector; outputs: Current Hot topics, Future Hot topics]

Page 10: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System (cont’d)

• Crawler & Extractor
  – Information Extractor: Extract articles and metadata (title, author, content, etc.) from semi-structured web content

[Figure: Crawler & Extractor. Labels: Social Media, User’s Keywords of Interests, Web Crawler, HTML documents, Information Extractor, Text documents, Web data DB, Topic Extractor]

Page 11: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System (cont’d)

• Topic Extractor

[Figure: Topic Extractor. Labels: Web data DB, Topic Word Extraction, Topic Word Clustering, Hot topic extraction, Current topics, Current Hot topics]

• Apply TF-IDF scheme to generate Top-N topic words for each document

• Apply clustering algorithm to cluster topic words into topic groups. The topic groups are treated as “topics”

• Apply aging theory to find hot topics
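The first step above (TF-IDF to pick Top-N topic words per document) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `top_n_topic_words`, the whitespace tokenization, the unsmoothed `log(N / df)` weighting, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter

def top_n_topic_words(documents, n=3):
    """Return the n highest-scoring TF-IDF words for each tokenized document."""
    num_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    results = []
    for doc in documents:
        term_counts = Counter(doc)
        # TF is the relative frequency in the document; IDF is log(N / df).
        scores = {
            word: (count / len(doc)) * math.log(num_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:n])
    return results

# Hypothetical sample documents, already lowercased.
docs = [
    "earthquake hits city earthquake damage".split(),
    "city council votes on budget".split(),
    "budget cuts hit city schools".split(),
]
topics = top_n_topic_words(docs, n=2)
print(topics[0])
```

A word such as "city" that occurs in every document gets an IDF of zero and never surfaces as a topic word, which is the behavior the TF-IDF step relies on.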

Page 12: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System (cont’d)

• Trend Detector

[Figure: Trend Detector. Labels: Current topics, Trend Estimation Algorithms, Topic Trend (Future Hot Topics)]

• The Trend Estimation Algorithm is a black box for now; however, it will “find its way” once Slow Intelligence is involved in the system

Page 13: Building Topic/Trend Detection System based on Slow Intelligence

Topic/Trend Detection System with Slow Intelligence

Page 14: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence

• Four complexities of designing online topic/trend detection systems

• 1. It is infeasible to collect all web data with a limited amount of computing resources. The system needs to develop data collection strategies that concentrate limited resources on collecting important web data.

(Component: Crawler & Extractor)
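One way to concentrate a limited crawl budget on important pages is best-first (focused) crawling, sketched below. This is an illustration under stated assumptions rather than the system described in the slides: the in-memory `web` dictionary stands in for real HTTP fetching, and `relevance`, `focused_crawl`, the keyword-overlap score, and the sample pages are hypothetical.

```python
import heapq

def relevance(text, keywords):
    """Fraction of the keywords of interest that appear in the text."""
    words = set(text.lower().split())
    return sum(1 for k in keywords if k in words) / len(keywords)

def focused_crawl(web, seeds, keywords, budget):
    """Visit at most `budget` pages, always expanding the most promising link first.

    `web` maps url -> (page_text, [(anchor_text, linked_url), ...]).
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited, collected = set(), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        if relevance(text, keywords) > 0:
            collected.append(url)
        # Links are ranked by how well their anchor text matches the keywords.
        for anchor, target in links:
            if target not in visited:
                heapq.heappush(frontier, (-relevance(anchor, keywords), target))
    return collected

# Hypothetical miniature web: one hub page linking to three articles.
web = {
    "hub": ("daily news index", [("earthquake report", "a"),
                                 ("celebrity gossip", "b"),
                                 ("city budget", "c")]),
    "a": ("earthquake damages city center", []),
    "b": ("celebrity wedding photos", []),
    "c": ("city budget approved", []),
}
result = focused_crawl(web, ["hub"], ["earthquake", "city"], budget=3)
print(result)
```

With a budget of three page visits, the off-topic page "b" is never fetched, so the budget is spent only on the hub and the two relevant articles.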

Page 15: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

• 2. Many computation methods are available for estimating trends. Once parameter settings are also taken into account, there are too many combinations to choose from. Furthermore, the Internet is a changing environment, so the current best solution may not perform well in the future. The system needs to automatically (or at least quasi-automatically) find the best solution among many alternatives in a changing environment.

(Component: Trend Detector)
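The selection problem can be sketched with the Enumerator, Eliminator, and Concentrator stages named earlier. The sketch below is only an assumption about how such a loop might look: the two forecasting families (moving average, exponential smoothing), their parameter grids, the error measure, and every function name are illustrative choices, not the trend estimation algorithms of the original system.

```python
def moving_average(window):
    def predict(history):
        recent = history[-window:]
        return sum(recent) / len(recent)
    return predict

def exponential_smoothing(alpha):
    def predict(history):
        level = history[0]
        for value in history[1:]:
            level = alpha * value + (1 - alpha) * level
        return level
    return predict

def one_step_error(predict, series, start):
    """Mean absolute error of one-step-ahead forecasts from index `start` on."""
    errors = [abs(predict(series[:t]) - series[t]) for t in range(start, len(series))]
    return sum(errors) / len(errors)

def select_method(series, keep=2):
    # Enumerate: every (method, parameter) combination under consideration.
    candidates = {f"ma(window={w})": moving_average(w) for w in (1, 3, 5)}
    candidates.update({f"es(alpha={a})": exponential_smoothing(a) for a in (0.2, 0.8)})
    # Eliminate: a cheap check on the last three points drops weak candidates.
    cheap = {name: one_step_error(f, series, len(series) - 3)
             for name, f in candidates.items()}
    survivors = sorted(cheap, key=cheap.get)[:keep]
    # Concentrate: spend the remaining effort on a fuller evaluation of survivors.
    full = {name: one_step_error(candidates[name], series, 1) for name in survivors}
    return min(full, key=full.get)

# Hypothetical mention counts for one topic, rising steadily over eight periods.
mentions = [1, 2, 3, 4, 5, 6, 7, 8]
best = select_method(mentions)
print(best)
```

Re-running `select_method` on a sliding window of recent data is one way to let the choice follow a changing environment; how often to re-run it is left open here.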

Page 16: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

• 3. The crawler needs to revisit websites at hourly or daily intervals to collect up-to-date data. Each site has a different amount of data to be updated and a different policy for restricting frequent access, both of which are unknown beforehand. The system needs to find a feasible data collection schedule based on past experience.

(Component: Crawler & Extractor)
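Learning a schedule from past experience can be sketched as a per-site revisit interval that is adjusted after every visit. The class name `SiteSchedule`, the multipliers (halve, 1.5x, double), and the visit outcomes below are assumptions chosen for illustration; the slides do not specify an update rule.

```python
from dataclasses import dataclass

@dataclass
class SiteSchedule:
    interval_hours: float = 24.0   # current revisit interval
    min_hours: float = 1.0         # hourly is the fastest rate considered
    max_hours: float = 24.0        # daily is the slowest rate considered

    def record_visit(self, new_items, blocked):
        """Update the interval from the outcome of the latest visit."""
        if blocked:
            # The site restricted access: visit less often.
            self.interval_hours *= 2
        elif new_items == 0:
            # Nothing new was found: slow down a little.
            self.interval_hours *= 1.5
        else:
            # New data was waiting: visit more often.
            self.interval_hours /= 2
        # Keep the interval inside the hourly-to-daily range.
        self.interval_hours = min(self.max_hours,
                                  max(self.min_hours, self.interval_hours))
        return self.interval_hours

# Hypothetical outcomes of four consecutive visits: (new_items, blocked).
site = SiteSchedule()
history = [site.record_visit(new_items=n, blocked=b)
           for n, b in [(12, False), (9, False), (0, True), (0, False)]]
print(history)
```

Each site keeps its own `SiteSchedule`, so a busy forum and a rarely updated blog converge to different intervals without either being configured by hand.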

Page 17: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

• 4. Any change in web pages may disrupt Extractors. When many websites are being monitored, an automatic repair mechanism for Extractors is needed. The repair mechanism needs to detect Extractor errors, find alternatives, and choose the best alternative to fix the disrupted Extractors.

(Component: Crawler & Extractor)
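The detect, find alternatives, choose best cycle can be sketched with a small set of candidate extraction rules. Everything here is a hypothetical stand-in: the regex rules in `CANDIDATE_RULES`, the scoring heuristic, the 0.8 threshold, and the sample pages are assumptions, and regular expressions are used only to keep the example self-contained, not as a recommendation for parsing HTML.

```python
import re

# Hypothetical candidate rules: each regex captures a title in group 1.
CANDIDATE_RULES = {
    "h1": r"<h1[^>]*>(.*?)</h1>",
    "h2-title": r'<h2 class="title">(.*?)</h2>',
    "title-tag": r"<title>(.*?)</title>",
}

def extract(rule, html):
    match = re.search(CANDIDATE_RULES[rule], html, re.S)
    return match.group(1).strip() if match else None

def score(rule, pages):
    """Share of sample pages yielding a distinct, non-empty title.

    Counting distinct values penalizes rules that latch onto site-wide text.
    """
    values = {extract(rule, page) for page in pages}
    values.discard(None)
    values.discard("")
    return len(values) / len(pages)

def repair(current_rule, pages, threshold=0.8):
    """Keep the current rule while it works; otherwise pick the best alternative."""
    # Detect: a low score on recent pages signals a disrupted extractor.
    if score(current_rule, pages) >= threshold:
        return current_rule
    # Find alternatives and choose the best: evaluate every candidate rule.
    return max(CANDIDATE_RULES, key=lambda rule: score(rule, pages))

# Hypothetical pages after a redesign that moved titles from <h1> to <h2>.
pages = [
    '<html><title>Site</title><h2 class="title">Quake hits city</h2></html>',
    '<html><title>Site</title><h2 class="title">Budget approved</h2></html>',
]
new_rule = repair("h1", pages)
print(new_rule, extract(new_rule, pages[0]))
```

The `title-tag` rule also matches every page, but it returns the same site-wide string each time, so it scores lower than the rule that recovers a different title per article.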

Page 18: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

1. SIS to help restrict the range of data collection

[Figure labels: Knowledge of data, Knowledge of algorithm]

Page 19: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

2. SIS to help select and adapt trend detection algorithms

Page 20: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

3. SIS to help schedule the Crawler

Page 21: Building Topic/Trend Detection System based on Slow Intelligence

T/TD System with Slow Intelligence (cont’d)

4. SIS to help adapt Extractors

Page 22: Building Topic/Trend Detection System based on Slow Intelligence

Conclusion

Page 23: Building Topic/Trend Detection System based on Slow Intelligence

Conclusion

• An online trend detection system requires careful resource allocation and automatic algorithm adaptation to process huge volumes of heterogeneous data.

• This research adopts Slow Intelligence, which provides a framework for systems with insufficient computing resources to gradually adapt to their environments, to respond to these challenges.

• Four Slow Intelligence subsystems are proposed, each targeting one challenge in designing online topic/trend detection systems.

Page 24: Building Topic/Trend Detection System based on Slow Intelligence

If you have any questions, please e-mail us

[email protected] (Chia-Chun Shih)

[email protected] (Ting-Chun Peng)