2012-01-10-data tuesday

Posted on 24-May-2015

DESCRIPTION

Additional information on #datatuesday: http://data-tuesday.com/

Additional information on Hadoop on Azure: http://www.hadooponazure.com, http://aka.ms/benjguinhadoop

TRANSCRIPT

Hadoop on Azure

Data Tuesday – 10 January 2012

Pierre Lagarde (DPE) – pierlag@microsoft.com

Benjamin Guinebertière (DPE) – www.benjguin.com

Microsoft Distribution of Hadoop [MDH]

• Code name: Isotope
• Leveraging the Hadoop data-driven community
  – On-premises
  – Cloud
  – Windows Server integration [AD – Secure HDFS]
  – Connection with SQL Server / Excel
  – Developer framework [JavaScript, .NET, F#, …]
  – Hadoop as a Service through Azure [eMDH]

Structural Overview

A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS

EIS / ERP RDBMS File System OData [RSS] Azure Storage

ISOTOPE[Azure and Enterprise]

OCEAN OF DATA[unstructured, semi-structured, structured]

Java - JavaScript - Streaming OM - HiveQL - PigLatin - (T)SQL - .NET/C#/F#

HDFS

NOSQL ETL

Creating a cluster on demand

Map/Reduce - Java

Map/Reduce – C#

Map/Reduce - JavaScript
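
The slide's own code is not preserved in this transcript. As a rough illustration of what a JavaScript map/reduce word count can look like, here is a minimal sketch. The function shapes (`map`, `reduce`, a `context` object with a `write` method) and the in-memory driver `runLocal` are assumptions made so the example runs locally in Node.js; they are not taken from the deck and may differ from the service's actual scripting interface.

```javascript
// Hypothetical word-count script; not the file shown in the talk.

// map: called once per input line, emits (word, 1) for every word found.
function map(key, value, context) {
  const words = value.split(/[^a-zA-Z]+/);
  for (const w of words) {
    if (w !== "") {
      context.write(w.toLowerCase(), 1);
    }
  }
}

// reduce: called once per distinct word with all of its emitted counts.
function reduce(key, values, context) {
  let sum = 0;
  for (const v of values) {
    sum += v;
  }
  context.write(key, sum);
}

// Assumed local driver: stands in for the cluster by grouping the
// map output by key in memory and then calling reduce for each key.
function runLocal(lines) {
  const groups = new Map();
  const mapContext = {
    write: (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    },
  };
  lines.forEach((line, i) => map(i, line, mapContext));

  const out = {};
  const reduceContext = { write: (k, v) => { out[k] = v; } };
  for (const [k, vs] of groups) {
    reduce(k, vs, reduceContext);
  }
  return out;
}

const counts = runLocal(["the cat and the hat", "The end"]);
console.log(counts);
```

On a real cluster the grouping step is performed by the framework between the map and reduce phases; the driver here only imitates that behaviour for a handful of lines.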

Demo - JavaScript

Azure Storage

distcp HDFS Sort/filter

JavaScript M/R

HDFS File

Graph.bar(data)

Excel ODBC Hive Connector

Reporting SQLServer

from("books") .mapReduce("file.js", "word, count:long") .orderBy("count DESC") .take(10) .to("top10")

• from("books").mapReduce("bin/WordCountLong.js", "word, count:long").orderBy("count DESC").take(10).to("demo-top10")

• #get top10
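
To make the stages after `mapReduce` in the query above more concrete, the following sketch imitates them in plain JavaScript on a few sample records. It assumes the map/reduce output is tab-separated text with one `word<TAB>count` pair per line, read with the schema "word, count:long". The helper names `parseRecords`, `orderByCountDesc` and `take` are hypothetical and are not the API used in the demo.

```javascript
// Hypothetical helpers imitating .orderBy("count DESC").take(n) locally.

// Parse "word<TAB>count" lines into typed records (count as a number),
// mirroring the schema string "word, count:long".
function parseRecords(lines) {
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [word, count] = line.split("\t");
      return { word: word, count: Number(count) };
    });
}

// Sort a copy of the records by count, largest first.
function orderByCountDesc(records) {
  return [...records].sort((a, b) => b.count - a.count);
}

// Keep only the first n records.
function take(records, n) {
  return records.slice(0, n);
}

// Made-up sample data for illustration only.
const sample = ["the\t3", "cat\t1", "hat\t2", ""];
const top2 = take(orderByCountDesc(parseRecords(sample)), 2);
console.log(top2);
```

Declaring `count` as `long` matters for the ordering step: sorting the counts as strings would place "10" before "9", whereas numeric comparison gives the intended ranking.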
