Alex Cheng of Baidu: "Big Data: A New Frontier"

Post on 01-Nov-2014


Big Data: A New Frontier

Alex Cheng, VP Baidu 2013-4-12

5 billion+ Search Queries

~4 million Posts on PostBar

~500 million Users

100 million+ Mobile Search Users

~500,000 Business Clients

Every day at Baidu

Storage

Processing

Analytics & Prediction

Data Intelligence

Volume, Velocity, Variety, Value

Web Pages & Links: 100+ PB
Logs: 100+ PB
UGC: 1 PB

Web, News, PostBar, Encyclopedia, Knows

Searches, Clicks, Posts, etc.

1 petabyte = 2x National Library of China

Logs: 100+ PB

UGC: 1+ PB

[Chart: data volume by year, 2005 to 2012, drawn as stacked 100 PB blocks]

•  95% of the data was created within the last 3 years

•  100 PB of new data is processed every day

Growth: 100%+ YoY
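The "95% within the last 3 years" bullet and the "100%+ YoY" growth figure can be cross-checked with a little arithmetic. The sketch below assumes that total data volume multiplies by a constant factor every year. That constant-growth model is an assumption for illustration, not something the slide states. Under it, the share of data created in the last n years is 1 - g^(-n).

```python
def share_created_recently(growth_factor: float, years: int) -> float:
    """Fraction of all data that is at most `years` old, assuming the
    total volume multiplies by `growth_factor` every year."""
    return 1.0 - growth_factor ** (-years)


def growth_factor_for_share(share: float, years: int) -> float:
    """Constant yearly growth factor needed for `share` of all data
    to have been created within the last `years` years."""
    return (1.0 - share) ** (-1.0 / years)


# Exactly 100% YoY means the total doubles each year (factor 2.0).
doubling_share = share_created_recently(2.0, 3)

# Growth factor implied by "95% of the data was created within the last 3 years".
implied_factor = growth_factor_for_share(0.95, 3)

print(f"share created in last 3 years at 100% YoY: {doubling_share:.3f}")
print(f"yearly factor implied by the 95% bullet:   {implied_factor:.2f}x")
```

Under this assumption, exact doubling gives 87.5%, so the 95% figure corresponds to a factor of roughly 2.7x per year, which is consistent with the "100%+" wording.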

Hardware Innovations

•  Custom ARM-based Servers

•  Gigabit Switches

•  Custom SSD/Flash Storage

TCO -25% Density +70%

PUE 1.18 / 1.37 (#1) Non-cooling hours 48%

Custom Rack Uptime Efficiency 10x

Performance 2x Cost -48%

Baidu Cloud IDC Yangquan, Shanxi, China
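For readers unfamiliar with PUE (power usage effectiveness): it is total facility energy divided by the energy used by the IT equipment itself, so 1.0 would mean no overhead at all. The slide does not say what its two figures refer to, so the sketch below only treats 1.18 and 1.37 as two PUE values and compares them. The 1,000 kWh IT load is an illustrative assumption.

```python
def overhead_fraction(pue: float) -> float:
    """Non-IT energy (cooling, power distribution, etc.) per unit of IT energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy needed to deliver `it_energy_kwh` to IT equipment."""
    return it_energy_kwh * pue


IT_LOAD_KWH = 1000.0  # hypothetical IT load, for illustration only

for pue in (1.18, 1.37):
    total = facility_energy_kwh(IT_LOAD_KWH, pue)
    print(f"PUE {pue}: overhead {overhead_fraction(pue):.0%}, total {total:.0f} kWh")

# Relative saving in total energy when moving from PUE 1.37 to PUE 1.18.
saving = 1.0 - 1.18 / 1.37
print(f"total energy saved for the same IT load: {saving:.1%}")
```

For the same IT load, a facility at PUE 1.18 draws about 14% less total energy than one at PUE 1.37.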

Software Innovations

•  Global Optimization
•  Multiple Replication
•  Data Distribution
•  Partial Update

TRADITIONAL SERVER STACK: MONOLITHIC HW, TRADITIONAL RELATIONAL DATABASE, DIRECT RECORD ACCESS OR QUERIES

NEW SERVER STACK: DISTRIBUTED HARDWARE, HADOOP, MAPREDUCE, NOSQL DATABASE, PARALLEL RELATIONAL DATABASE
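The MapReduce model named in the new stack can be illustrated in a few lines. This is a single-process sketch of the map, shuffle and reduce phases, not Hadoop code and not Baidu's pipeline. The tab-separated "user, query" log format and the sample lines are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(log_line: str) -> Iterator[Tuple[str, int]]:
    """Emit (query, 1) for one log line in the assumed 'user<TAB>query' format."""
    _user, query = log_line.split("\t", 1)
    yield query.strip().lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts for one query."""
    return key, sum(values)


# Hypothetical search-log lines.
logs = ["u1\tbig data", "u2\tHadoop", "u3\tBig Data"]

pairs = [pair for line in logs for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

On distributed hardware, the map and reduce calls run on many machines in parallel and the shuffle moves data between them. The per-record logic stays this small.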

Big Data + Web Search

•  Real-time online learning
•  Tens of billions of training samples
•  Billions of complex features

[Diagram. Offline: Logs, Feature extraction, Model Training, Models. Online: Query, Advanced Search Module, CTR-server.]
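"Real-time online learning" for click-through-rate (CTR) prediction means the model is updated one logged impression at a time instead of being retrained in batches. The slides do not say which model Baidu uses. The sketch below swaps in plain logistic regression with stochastic gradient descent over sparse string features, and the feature names, click rates and learning rate are all assumptions for illustration.

```python
import math
import random
from typing import Dict, List


class OnlineCTRModel:
    """Logistic regression updated one impression at a time (plain SGD)."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}  # sparse: one weight per seen feature

    def predict(self, features: List[str]) -> float:
        """Predicted click probability for an impression with these features."""
        score = sum(self.weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features: List[str], clicked: int) -> None:
        """One SGD step on the log loss for a single logged impression."""
        error = self.predict(features) - clicked  # d(log loss) / d(score)
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * error


# Simulated log stream: hypothetical ad A is clicked far more often than ad B.
rng = random.Random(0)
model = OnlineCTRModel()
for _ in range(5000):
    ad = rng.choice(["ad=A", "ad=B"])
    click_probability = 0.30 if ad == "ad=A" else 0.05
    clicked = 1 if rng.random() < click_probability else 0
    model.update(["bias", ad], clicked)

p_a = model.predict(["bias", "ad=A"])
p_b = model.predict(["bias", "ad=B"])
print(f"predicted CTR for ad A: {p_a:.3f}, for ad B: {p_b:.3f}")
```

Because the weights live in a sparse dictionary keyed by feature, the same update loop works whether there are two features or billions. Only the storage and distribution of the weights change.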

Big Data + IME

•  Real-time Dictionary Updates
•  Dynamic Result Modeling
•  High-frequency Inputs Recommendation

[Diagram: User Input, NLP Module, On-Device Quick Search, Device-based Dictionary, Cloud-based Dictionary, Consolidated Search Result, Output]
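The diagram's split between a device-based and a cloud-based dictionary, merged into a consolidated result, can be sketched as follows. This is only an illustration of the idea. The prefix matching, the frequency counts, the merge rule and every name in the code are assumptions, and a real IME would also involve the NLP module shown in the diagram.

```python
from typing import Dict, List


def lookup(dictionary: Dict[str, int], prefix: str) -> Dict[str, int]:
    """Return dictionary entries that start with the typed prefix, with counts."""
    return {word: count for word, count in dictionary.items() if word.startswith(prefix)}


def consolidate(
    prefix: str,
    device_dict: Dict[str, int],
    cloud_dict: Dict[str, int],
    limit: int = 3,
) -> List[str]:
    """Merge candidates from both dictionaries and rank them by frequency."""
    merged = lookup(device_dict, prefix)
    for word, count in lookup(cloud_dict, prefix).items():
        # Assumed merge rule: keep the higher of the two counts for a shared word.
        merged[word] = max(merged.get(word, 0), count)
    # Highest count first; ties broken alphabetically so the order is stable.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _count in ranked[:limit]]


# Hypothetical dictionaries: a small on-device one and a larger, fresher cloud one.
device_dict = {"hadoop": 40, "hardware": 25, "hash": 10}
cloud_dict = {"hadoop": 90, "haidian": 60, "hashtag": 5}

candidates = consolidate("ha", device_dict, cloud_dict)
print(candidates)
```

The on-device lookup can answer immediately, while the cloud dictionary contributes entries the device has never seen, which is one way to read the "Real-time Dictionary Updates" bullet.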

Deep Learning

Voice, Images

•  10+ Billion Training Examples
•  Heterogeneous Features
•  Intensive Computing

The Future of Big Data

[Chart: “Digital Universe” size in exabytes by year, 2009 to 2020; y-axis ticks at 10,000, 20,000, 30,000 and 40,000]

Machine-generated Sensor Data

“Anytime, Anywhere, Any Devices”: Smartphone, Smart Home, Wearable Devices, Smart Car, …
