Lessons Learned from LOD Failure and Big Data: The Future Trend
DESCRIPTION
I discuss the failure of LOD and the reasons for it. Drawing on the lessons learned, LOD2 was launched more than four years ago and is about to be completed. What do these lessons tell us about the future trend of Big Data?
TRANSCRIPT
Lessons Learned from LOD (Linked Open Data) Failure and Big Data:
The Future Trend
Youngwhan Lee, Ph.D.
Phone: 010-7997-0345
Email: [email protected]: Youngwhan Nick Lee
Twitter: nicklee002
Web Evolution and Big Data
Internet Today
2010:
• An estimated 10^11 Web pages in the world
2012:
• Social media: Facebook (1 billion monthly active users)
• From the invention of writing through 2003, 5 exabytes of data were created; as of 2012, 7 exabytes are generated every day
• Is “big data” a big pile of garbage?
Web Explosion and Big Data
• Number of Web users (Mar. 2012): 2.3 billion
• 10^11 Web pages in the world (est. 2010)
  – The Web has existed for about 7,000 days (roughly 20 years), which means humans have created over 10 million pages a day.
• Digital information created in 2010: 1 zettabyte (10^21 bytes)
  – "There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing." – Eric Schmidt (2010)
  – In 2012, almost 7 exabytes are created every day.
  – We call it “Big Data.”
• What does this mean?
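As a quick check of the arithmetic behind these figures, the sketch below recomputes the pages-per-day estimate and compares the 2012 daily volume with the 5 exabytes created through 2003. The constants are the slide's own numbers; the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted on this slide.
WEB_PAGES = 10**11        # estimated Web pages in the world (2010)
WEB_AGE_DAYS = 7000       # the slide's figure for the age of the Web (~20 years)

EXABYTE = 10**18          # bytes
ZETTABYTE = 10**21        # bytes

# Average number of pages created per day since the Web began.
pages_per_day = WEB_PAGES / WEB_AGE_DAYS

# How long it takes in 2012 to create as much data as existed through 2003.
through_2003 = 5 * EXABYTE
daily_2012 = 7 * EXABYTE
days_to_match_2003 = through_2003 / daily_2012

# A full year at the 2012 daily rate, expressed in zettabytes.
yearly_2012_zb = daily_2012 * 365 / ZETTABYTE

print(f"{pages_per_day:,.0f} pages per day")
print(f"{days_to_match_2003:.2f} days to match everything through 2003")
print(f"{yearly_2012_zb:.3f} zettabytes per year at the 2012 rate")
```

The first figure comes out at roughly 14 million pages a day, which is consistent with the "over 10 million" on the slide.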
[Figure: modified, based on Gene Bellinger, Durval Castro, Anthony Mills, http://www.systems-thinking.org/dikw/dikw.htm, http://yjhyjh.egloos.com/39721. Labels: R-DBMS, NoSQL, data analysis, MapReduce, LOD, curation, SPARQL, RDF, knowledge structuring, OWL, RIF, Aggregation, Understanding.]
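Extracting Information and Knowledge from Big Data and the Web
• Information retrieval
  – SEO (Search Engine Optimization), PageRank, EdgeRank
• Data mining: programs can extract information (knowledge)
  – Statistical analysis, rule-based analysis, neural network analysis
  – Visualization
• Using knowledge engineering
  – Cumulatively link ontologies built with RDF/OWL
  – Link raw data and open it up so that it can be analyzed (Linked Open Data; LOD)
  – Programs can extract knowledge that supports logical analysis
    • SPARQL
    • RIF (Rule Interchange Format)
• Using human power: curation
  – Filter and synthesize information using human eyes and knowledge
    • Examples: pinterest.com, videocooki.com, storify.com, scoop.it, curated.by
Data science
Knowledge engineering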
Longtail Phenomena in
[Figure: popularity curve showing Pareto’s Law, with Bighead Applications on one side and Longtail Applications on the other; vertical axis: Popularity. The Long Tail by Chris Anderson (Wired, Oct. ’04), adapted to information domains.]
Mobile Apps, iPhone Apps, Android Apps
SNS Apps, Facebook Apps, Twitter Apps
LOD and Others, Medical Apps, public information Apps, …
Approaches from Knowledge Engineering
• Ontology construction
  – Cyc
  – WolframAlpha
  – Siri
• Web of Data
  – LOD, LOD2
Old “Layercake” of Semantic Web
[Figure: Semantic Web layer cake. Recoverable labels: information exchange, RDF, OWL2.]
Linked Open Data (LOD) Principles
Linking Open Data (LOD) means connecting data and opening it to the public.
4 Principles of LOD
1. Use URIs as names for things
2. Use HTTP URIs
3. When someone looks up a URI, provide useful information
4. Include links to other URIs
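The four principles can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `example.org` URIs, the property names, and the in-memory `DATA` dictionary are all invented, and a real deployment would serve these descriptions over HTTP rather than from a Python dict.

```python
from urllib.parse import urlparse

# Hypothetical mini-dataset. Every thing is named by a URI (principle 1),
# and every URI is an HTTP URI (principle 2). All names are invented.
DATA = {
    "http://example.org/id/seoul": {
        "label": "Seoul",
        "links": {"http://example.org/prop/capitalOf": "http://example.org/id/korea"},
    },
    "http://example.org/id/korea": {
        "label": "Republic of Korea",
        "links": {"http://example.org/prop/capital": "http://example.org/id/seoul"},
    },
}

def is_http_uri(name: str) -> bool:
    """Principle 2: the name must be an HTTP(S) URI with a host."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def look_up(uri: str) -> dict:
    """Principle 3: looking up a URI returns useful information about the thing."""
    return DATA[uri]

def follow_links(uri: str) -> list[str]:
    """Principle 4: a description links to other URIs that can be looked up in turn."""
    return list(look_up(uri)["links"].values())

start = "http://example.org/id/seoul"
assert is_http_uri(start)
for target in follow_links(start):
    print(look_up(target)["label"])   # prints "Republic of Korea"
```

A client that knows only the starting URI can discover the rest of the data by following links, which is the sense in which the data is "linked."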
A Little History of the LOD Project
• Tim Berners-Lee proposed the LOD (Linking Open Data) project (2006).
• Since the proposal, numerous countries and organizations have participated, causing the amount of LOD data to explode.
• Wikipedia: DBpedia (www.dbpedia.org)
• The Bio2RDF project opened in 27 fields of biology, genetics, and medicine, with data sets totaling about 2.3 billion (Bio2RDF.org) (2008.10).
• The BBC announced that it would participate in the LOD project (www.bbc.org); it is now one of the institutions actively using the data.
• US Data.gov released 5 billion data triples.
• The US Library of Congress announced that it would join the LOD project (http://id.loc.gov/authorities/sh85042531#concept).
• The NY Times (data.nytimes.com) released data from 150 years of publication (2009.10).
• The US White House released a plan to open data in RDF (2009.11).
Advantages of LOD
• Elegant
• Expandable
• Flexible
• Powerful
• Decentralized
• Participatory
• Inclusive, and
• “Free” to use
Change of Web Structure
[Figure: two views of the Web. Labels: Web page links for humans; Web page link bus; user interface; mashups; Web data links for computers; Web data link bus.]
[Figure: snapshots dated May 2007, Mar. 2008, Sep. 2008, and July 2009.]
SPARQL
SPARQL (SPARQL Protocol and RDF Query Language)
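To give a feel for what a SPARQL query does, here is a toy sketch of triple-pattern matching, the core idea behind a SPARQL `WHERE` clause. It is not a SPARQL engine: the triples, the `ex:` names, and the `select` helper are all invented for illustration, and a real system would use an RDF store and a SPARQL endpoint.

```python
# Toy illustration of SPARQL-style triple-pattern matching.
# All data and names below are hypothetical.
TRIPLES = [
    ("ex:DBpedia", "ex:derivedFrom", "ex:Wikipedia"),
    ("ex:DBpedia", "ex:format", "RDF"),
    ("ex:Bio2RDF", "ex:format", "RDF"),
    ("ex:Bio2RDF", "ex:domain", "ex:Biology"),
]

# Roughly equivalent SPARQL:
#   PREFIX ex: <http://example.org/>
#   SELECT ?dataset ?source WHERE {
#     ?dataset ex:format "RDF" .
#     ?dataset ex:derivedFrom ?source .
#   }
QUERY = [
    ("?dataset", "ex:format", "RDF"),
    ("?dataset", "ex:derivedFrom", "?source"),
]

def match_pattern(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` matches `triple`; return None on failure."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant: it must match the triple exactly.
            return None
    return result

def select(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions

print(select(QUERY, TRIPLES))
# [{'?dataset': 'ex:DBpedia', '?source': 'ex:Wikipedia'}]
```

Each pattern narrows the candidate bindings left by the previous one, so the two patterns together act as a join on `?dataset`.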
Web 3.0: Merging the Two Perspectives
[Figure: timeline running from a Technical Proposal Phase to a Practical Use Phase, seen from a Market Behavior Perspective and a Technology Innovation Perspective. Labels: WWW Proposal (1989); Semantic Web; LOD Proposal (2006); WEB 1.0; WEB 2.0; Data-based Semantics; Knowledge-based Semantics; “GGG” Proposal (2007); Next Generation Web; “WEB2” Proposal (2009); Web 3.0.]
But no Champagne…
• Definition unclear
  – Berners-Lee’s 4 principles are ambiguous
    • Difficult to interpret
    • Inconsistent
    • Difficult both to learn and to use
    • Difficult to build browsers and reasoners
• “Free” to use
  – Full of incomplete and inconsistent RDF data, with no way to make it evolve
In short, we experienced “garbage in, garbage out.”
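What "incomplete and inconsistent" might look like in practice can be sketched with two simple checks over a set of triples. This is one possible reading, not the talk's own definition: the `ex:` names and values are invented, and the rules (a property assumed to be single-valued, a property assumed to be required) are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical triples; all names and values are invented.
TRIPLES = [
    ("ex:thing1", "ex:label", "Thing One"),
    ("ex:thing1", "ex:birthYear", 1955),
    ("ex:thing1", "ex:birthYear", 1956),   # conflicts with the line above
    ("ex:thing2", "ex:birthYear", 1970),   # has no ex:label at all
]
SINGLE_VALUED = {"ex:birthYear"}   # ASSUMPTION: at most one value per subject
REQUIRED = {"ex:label"}            # ASSUMPTION: every subject needs a label

def find_conflicts(triples, single_valued):
    """Inconsistent: a single-valued property carrying several distinct values."""
    values = defaultdict(set)
    for subject, prop, obj in triples:
        if prop in single_valued:
            values[(subject, prop)].add(obj)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}

def find_incomplete(triples, required):
    """Incomplete: a subject that lacks one or more required properties."""
    present = defaultdict(set)
    for subject, prop, _ in triples:
        present[subject].add(prop)
    return {s: sorted(required - have) for s, have in present.items() if required - have}

print(find_conflicts(TRIPLES, SINGLE_VALUED))
print(find_incomplete(TRIPLES, REQUIRED))
```

When nobody is responsible for running checks like these and repairing what they find, the defects accumulate, which is the "garbage in, garbage out" problem.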
Solution to LOD Problems: LOD2
• LOD2 Stack: a technical approach
  – Linked data management
  – Enrichment and quality improvement
  – Various tools to use
    • Storage and querying
    • Revision and authoring
    • Interlinking and fusing
    • Classification and enrichment
    • …
Q: Is this technical approach for LOD good enough?
A: A business approach is definitely needed.
Big Data
What did we do with big data in 2013?
What would we do with big data in 2014?
End of Theory
“The End of Theory” by Chris Anderson
Big data and data supremacy
Implication
• Issue: the haves and the have-nots are separated
  – E.g., in marketing
    • 4Ps: price, product, place, promotion
    • STP: segmentation, targeting, and positioning
Implication
• Is a technical approach needed?
Business Approach
• Data markets
  – Azure Data Marketplace
  – Data.com
  – Infochimps.com
  – DataMarket.com
  – Kaggle.com
Data Market: Azure Data Marketplace
Data Market: Data.com
Data Market: Infochimps.com
Data Market: DataMarket.com
Data Market: Kaggle.com
Conclusion
• Positioning for Korea
  – Where are we?
  – Where are we heading?
References
• Web 3.0 Is Changing the World (웹 3.0 세상을 바꾸고 있다) – 이영환
• A Semantic Web Primer (Cooperative Information Systems series) – Grigoris Antoniou, Frank van Harmelen
• Semantic Web for the Working Ontologist, Second Edition: Effective Modeling in RDFS and OWL – Dean Allemang, James Hendler
• Ontology: The Key to the Evolution of the Internet (온톨로지: 인터넷 진화의 열쇠) – 노상규, 박진수
• World Wide Web (월드와이드웹) – Tim Berners-Lee
• Curation (큐레이션) – Steven Rosenbaum, translated by 이시은
Web sites
• Problems of Linked Data
  – http://milicicvuk.com/blog/2011/07/26/problems-of-linked-data-14-identity/
• LOD2
  – http://lod2.eu/Welcome.html
  – http://stack.lod2.eu/blog/
• How to Define Web 3.0
  – http://howtosplitanatom.com/news/how-to-define-web-30-2/
• SPARQL by Example
  – http://www.cambridgesemantics.com/semantic-university/sparql-by-example#(1)
• Practical P-P-P-Problems with Linked Data
  – http://www.mkbergman.com/917/practical-p-p-p-problems-with-linked-data/
• Linked-Data-Api
  – https://code.google.com/p/linked-data-api/