facebook (stylized facebook) is a social networking system and website launched in february 2004,...

17
facebook Powered By : M.Bahmani H.Adldoost S.Entekhabi Urmia University April 2011

Upload: jewel-dickerson

Post on 27-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

facebook

Powered By:

M.BahmaniH.AdldoostS.Entekhabi

Urmia UniversityApril 2011

What is the

facebook?

Facebook (stylized facebook) is a Social Networking System and

website launched in February 2004, operated and privately

owned by Facebook, Inc.

As of January 2011, Facebook has more than 600 million active users.  Users may create a

personal profile, add other users as friends, and exchange

messages, including automatic notifications when they update

their profile . . .

Facebook has the First largest installation of

HADOOP(?) 1 Year ago , Yahoo Was the first

and now it’s the second. It is also the creator of HIVE(?)

But, What is The Hadoop and Hive?

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

HADOOP:

HADOOP SEMINAR – 2010

In The Other words: Hadoop is an open source framework that enables data-intensive distributed applications to efficiently process gigantic amounts of data.   It’s an open source implementation of the MapReduce approach to processing data.  MapReduce was invented at Google to deal with the massive quantities of data necessary to index the web.  There are two main components to the system: the Hadoop Distributed File System (HDFS) which stores and maintains data across many machines, and the MapReduce engine which processes the data.

MAP REDUCE:

Hadoop was created by Doug Cutting, who named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.

THERE ARE MANY TECHNOLOGIES BUILT ON TOP OF HADOOP THAT NEED TO BE CONSIDERED FOR YOUR SYSTEM. FOR EXAMPLE:

THERE IS:

HiveA SYSTEM FOR OFFLINE ANALYSIS

Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data “ETL”, a mechanism to put structures on the data, and the capability to querying and analysis of large data sets stored in Hadoop files. Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language.

Hive:So what is the

MySQL Federated Storage Engine:

The MySQL Federated storage engine for the MySQL relational database management system is a storage engine which allows a user to create a table that is a local representation of a foreign (remote) table. It utilizes the MySQL client library API as a data transport, treating the remote data source the same way other storage engines treat local data sources whether they be MYD files (MyISAM), memory (Cluster, Heap), or tablespace (InnoDB) . Each Federated table that is defined there is one .frm (data definition file containing information such as the URL of the data source). The actual data can exist on a local or remote MySQL instance.

Oracle RAC:

Oracle Real Application Clusters (RAC) is an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i that provides software for clustering and high availability in Oracle database environments. Oracle RAC allows multiple computers to run Oracle RDBMS software simultaneously while accessing a single database, thus providing a clustered database.

An example of Hadoop/Hive Advantages , that has been told by a facebook stakeholder :

When we started at Facebook in 2007 all of the data processing infrastructure was built around a data warehouse built using a commercial RDBMS. The data that we were generating was growing very fast – as an example we grew from a 15TB data set in 2007 to a 2PB data set today. The infrastructure at that time was so inadequate that some daily data processing jobs were taking more than a day to process and the situation was just getting worse with every passing day!

FACE BOOK SERVERS

We had an urgent need for infrastructure that could scale along with our data and it was at that time we then started exploring Hadoop as a way to address our scaling needs.[The] Hive/Hadoop cluster at Facebook stores more than 2PB of uncompressed data and routinely loads 15 TB of data daily

Thanks a lot for your attention