Page 1: Trends in Web Search and its relevance to Digital Libraries

Trends in Web Search and its relevance to Digital Libraries

Min-Yen Kan

Web IR NLP Group (WING)

National University of Singapore

Page 2: Trends in Web Search and its relevance to Digital Libraries

26 Sep 2008

Min-Yen Kan, WING@NUS

World Scientific Talk

Tips on Web Searching

• Visualize results, then come up with multiple queries
• Use multiple search engines
• Advanced Search
  – inurl:, site:
  – “Phrasal search”

But that’s just general search…
• Federated resources / Niche search engines
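The advanced operators named on this slide can be combined in a single query string. The following is a minimal sketch, assuming the common `site:`, `inurl:` and quoted-phrase syntax; the helper name `build_query` and the domain `example.org` are hypothetical and only serve the illustration.

```python
def build_query(terms, phrase=None, site=None, inurl=None):
    """Compose a search query string from plain terms and optional operators."""
    parts = list(terms)
    if phrase:
        # Quoting asks the engine to match the words as an exact phrase.
        parts.append(f'"{phrase}"')
    if site:
        # Restrict results to one site or domain.
        parts.append(f"site:{site}")
    if inurl:
        # Require the given string to appear in the result URL.
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


query = build_query(
    ["digital", "libraries"],
    phrase="web search",
    site="example.org",  # hypothetical domain
    inurl="publications",
)
print(query)
```

Generating several such variants for one information need is one way to follow the tip about coming up with multiple queries.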

Page 3: Trends in Web Search and its relevance to Digital Libraries


Site- and Task-specific resources

• Site Prestige: Know what others think and do
  – Google PageRank (Link structure), Alexa (Traffic)
  – Google Trends / Insight (Queries)
• Social Searching (Web 2.0): The voice of the reader / critic
  – (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org
  – (News) Digg / Slashdot
  – (Blogs) Google Blog, Technorati
• People Search: Finding public information on a person
  – Spock (web), Zabasearch (US only)
  – LinkedIn, Facebook
  – Must validate your sources

http://labs.digg.com/arc/

Page 4: Trends in Web Search and its relevance to Digital Libraries


Expert Search

Find people who will advocate on your behalf
• What do they want?
• Scholar:
  – Active? → Check their recent articles
  – Names common? → Define area of interest
  – Compare against peers
  – Download vs. citation counts
• Patent search:
  – Referenced by: (citation count; different than scholar)
• Identifying webfaced advocates:
  – Blog search, PageRank

http://flickr.com/photos/phauly/

How do machines do it?
• Expert search task as benchmark test
• Download web pages to analyze
• Needed to deal with spam pages
• Used PageRank to assess prestige

→ Impact
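To make the PageRank step concrete, here is a minimal sketch of the textbook power-iteration formulation on a toy link graph. It is not necessarily the variant used in the systems this slide refers to; the damping factor 0.85, the iteration count and the page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical pages): a, b and d all link to c; c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most in-links, `c`, ends up with the highest score, which is the sense in which link structure is used as a proxy for prestige.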

Page 5: Trends in Web Search and its relevance to Digital Libraries


Problem or opportunity?
• Revenue from print continually declining
• Students and researchers rely on internet
• Researchers want archiving rights – freedom of academic information

Characteristics:
• Not zero-sum content
• Distribution is now largely the role of search engines
→ Necessitates new role of publisher and new revenue model
  – Will classic models work? Advertising, Subscription, Transactional & Bundling
  – Variants? Versioning (Varian), Moving window (JSTOR)

http://flickr.com/photos/danielbroche/

The game has fundamentally changed


Page 6: Trends in Web Search and its relevance to Digital Libraries


Forecasting

Content is becoming free
– MIT / Stanford opening up textbooks
– Open access archiving
→ long term: content will not be primary revenue source

eBook revenue hasn’t lived up to its promise yet…
– Device gap: iPhone and nextGen devices
→ Revenue may be further down the pipe

+

Academic publishers
– Connect to libraries and federations at institution level
– Individual customers are secondary

Trusted source
– Expertise in copyediting, typesetting, project management, distribution, social networking
– Many individual web publishers rediscovering same problems
→ Consultancy model
→ Win-win partnerships with individual authors

Page 7: Trends in Web Search and its relevance to Digital Libraries


Web Trends
• Social Content
• Wisdom of masses: Crowdsourcing
• Rich Media
• Open Source / Access

Paradigmatic change
– Classifieds → Craigslist
– POTS → Skype
– CD store → iTunes
– Publishers → ??

http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif

Page 8: Trends in Web Search and its relevance to Digital Libraries


Where is research going?

• Search API usage
• Browser as computer
• Web page structure, mining text data
• Modeling web users at tasks: Exploring / Fact-finding
• Personalization, recommending
• Social networks
• Understanding opinion
• Query and log analysis

http://flickr.com/photos/alisdair/

User centric

Server centric

Page 9: Trends in Web Search and its relevance to Digital Libraries


Webfaced pop quiz – which is which?

Springer
American Statistical Society
World Scientific

courtesy: http://pagerank.si/

WING@NUS

Page 10: Trends in Web Search and its relevance to Digital Libraries


Forecast: Know your strengths

Get advocates
• Make it easy for individuals to urge their institution to buy your materials
• Know who is accessing (not necessarily buying) your content

Content revenue will continue to decline
• Find an economic model that works for you
• Work as partners in content creation

Be savvy on trends
• Be visible: do “white hat” Search Engine Optimization (SEO)
• Make your abstracts indexable by others


Page 11: Trends in Web Search and its relevance to Digital Libraries


Trends in Digital Libraries
• Expanding types of information in search
• Automated tools for DLs
• Usability in E-books and online media
• User modeling
• Personalization, annotation and relation to other user tasks

http://flickr.com/photos/pathfinderlinden

>> WING @ NUS

Page 12: Trends in Web Search and its relevance to Digital Libraries


Scholarly Digital Libraries

• ForeCite: our scholarly DL
• Data Cleaning
• Slide and Document Alignment
• Searching in the OPAC
• Math Information Retrieval

Page 13: Trends in Web Search and its relevance to Digital Libraries


ForeCite: Beyond the document as an item

A user-centric DL framework
• Put author / reader functionality together
• Tagging, correction, annotation and viewing
• Automatic tools: keyphrases and sentence classification
• For use on and offline, organizes local PDF files for you
• Only need your web browser

[Diagram labels: Server, Client]

Page 14: Trends in Web Search and its relevance to Digital Libraries


Data Cleaning
• Addresses
  – Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802
  – LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343
• Products
  – Honda Fit vs. Honda Jazz
  – Apple iPod Nano 4GB vs. 4GB iPod nano 4GB
• Idea: use web as additional context for disambiguation and clustering
• Placed 3rd in Web People Search Task (WEPS 2007)

Search results:

Query                  Pages      + “aho”    Ratio
“Jeffrey D. Ullman”    384,000    174,000    45%
“J. Ullman”            124,000    41,000     33%
“Shimon Ullman”        27,300     66         0%
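The percentages on this slide are consistent with dividing the joint hit count (name plus “aho”) by the hit count for the name alone. A minimal sketch of that arithmetic follows, using the page counts from the slide; the 10% decision threshold and the function names are assumptions for illustration, not values taken from the talk.

```python
# (hits for the name alone, hits for the name together with "aho"),
# copied from the search results on the slide.
hit_counts = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}


def cooccurrence_ratio(name_hits, joint_hits):
    """Fraction of a name's pages that also mention the context term."""
    if name_hits == 0:
        return 0.0
    return joint_hits / name_hits


def likely_same_entity(ratio, threshold=0.10):
    """Hypothetical decision rule: a high ratio suggests a shared context."""
    return ratio >= threshold


ratios = {name: cooccurrence_ratio(*counts) for name, counts in hit_counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%} same-context={likely_same_entity(ratio)}")
```

Under this assumed threshold, “J. Ullman” clusters with “Jeffrey D. Ullman” (both co-occur often with “aho”), while “Shimon Ullman” does not.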

Page 15: Trends in Web Search and its relevance to Digital Libraries


Slides and their relationship to documents

[Figure panels: Document in focus; Slides in focus]

Page 16: Trends in Web Search and its relevance to Digital Libraries


Searching in Libraries

http://linc.comp.nus.edu.sg

Page 17: Trends in Web Search and its relevance to Digital Libraries


Symbolic Information Search

How do users want to search math materials?

Our answer: Text-to-Expression Linking
– Resolve text keywords to expressions
– e.g., “Pythagorean Theorem” → $a^2+b^2=c^2$ or $x^2+y^2=z^2$

Reduce the need for expression input

Solves the notational variation problem

Not quite right…
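One way to picture text-to-expression linking is a lookup from a text keyword to the notational variants of the expression it names, followed by a match against the expressions found in each document. The sketch below is an illustration under stated assumptions, not the group's actual system: the mapping table, the whitespace-only normalization and the document identifiers are all hypothetical.

```python
import re

# Hypothetical keyword-to-expression table; a real system would learn or curate it.
KEYWORD_TO_EXPRESSIONS = {
    "pythagorean theorem": ["a^2+b^2=c^2", "x^2+y^2=z^2"],
}


def normalize(expr):
    """Drop whitespace so 'a^2 + b^2 = c^2' and 'a^2+b^2=c^2' compare equal."""
    return re.sub(r"\s+", "", expr)


def find_documents(keyword, documents):
    """Return ids of documents containing any known variant for the keyword."""
    variants = {normalize(e) for e in KEYWORD_TO_EXPRESSIONS.get(keyword.lower(), [])}
    return [
        doc_id
        for doc_id, expressions in documents.items()
        if any(normalize(e) in variants for e in expressions)
    ]


# Hypothetical documents, each reduced to the expressions extracted from it.
docs = {
    "notes-1": ["a^2 + b^2 = c^2"],
    "notes-2": ["x^2+y^2=z^2"],
    "notes-3": ["e = mc^2"],
}
matches = find_documents("Pythagorean Theorem", docs)
print(matches)
```

The user types only the keyword, and both notational variants are retrieved, which is the sense in which the approach reduces expression input and handles notational variation.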

Page 18: Trends in Web Search and its relevance to Digital Libraries


Conclusions

• Consider us your research WING!
• Trade data and problems for solutions and interns

Meanwhile:
• Use better search strategies
• Practice white hat SEO
• Identify webfaced advocates

Page 19: Trends in Web Search and its relevance to Digital Libraries


References
• Kahin and Varian (2000) Internet Publishing and Beyond
• Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104

Photo Credits
• Flickr Creative Commons Search

Thanks to all of you for listening

& my fellow WING group members


Page 21: Trends in Web Search and its relevance to Digital Libraries


Abstract
• I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click through and ad data to improve relevance ranking. With respect to digital library research, I discuss my group's research at NUS on advancing the state-of-the-art in scholarly digital libraries. I cover advances on how we deal with data cleaning issues, and slide and equation retrieval and alignment.