TRANSCRIPT
![Page 1: Committed to Deliver…. We are Leaders in Hadoop Ecosystem. We support, maintain, monitor and provide services over Hadoop whether you run apache Hadoop,](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/1.jpg)
Committed to Deliver…
![Page 2](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/2.jpg)
We are leaders in the Hadoop ecosystem. We support, maintain, monitor, and provide services over Hadoop, whether you run Apache Hadoop, the Facebook distribution, or the Cloudera distribution in your own data center, or on a cluster of machines on Amazon EC2, Rackspace, etc.

We provide a scalable end-to-end solution: a solution that can scale to large data sets (terabytes or petabytes).

Low-cost solution: based on the open-source framework currently used by Google, Yahoo, and Facebook.

Solutions optimized to minimize SLA times and maximize performance.
![Page 3](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/3.jpg)
– Project Initiation
  • Project planning
  • Requirement collection
  • POC using Hadoop technology

– Team Building
  • Highly skilled Hadoop experts
  • Dedicated team for the project

– Agile Methodology
  • Small iterations
  • Easy to implement changing requirements

– Support
  • Long-term relationship to support the developed product
  • Scope to change based on business/technical needs
![Page 4](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/4.jpg)
Our combined experience has led us to adopt a unique methodology that ensures quality work. We:

Evaluate the available hardware and understand the client's requirements.

Peek through the data: analyze it, prototype with Map/Reduce code, and show the results to our clients. We iterate and improve continuously, developing a better understanding of the data.

Develop the various tasks in parallel:
◦ Data collection
◦ Data storage in HDFS
◦ Map/Reduce analytics jobs
◦ A scheduler to run the Map/Reduce jobs and coordinate them
◦ Transformation of the output into OLAP cubes (dimension and fact tables)
◦ A custom interface to retrieve the Map/Reduce output
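The Map/Reduce analytics step above follows the standard map-shuffle-reduce pattern. The sketch below is a minimal in-process illustration of that pattern, not the actual Hadoop jobs; the example job (counting events per day from timestamped records) is hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Minimal in-process Map/Reduce: map each record to (key, value)
    pairs, group values by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical analytics job: count events per day from timestamped records.
def day_mapper(record):
    yield record["timestamp"][:10], 1  # key = YYYY-MM-DD prefix

def count_reducer(key, values):
    return sum(values)

events = [
    {"timestamp": "2011-03-01T09:12:00"},
    {"timestamp": "2011-03-01T17:40:00"},
    {"timestamp": "2011-03-02T08:01:00"},
]
print(run_mapreduce(events, day_mapper, count_reducer))
# {'2011-03-01': 2, '2011-03-02': 1}
```

On a real cluster the mapper and reducer run as distributed tasks over HDFS blocks, but the key-grouping logic is the same.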
![Page 5](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/5.jpg)
We are experts in time-series data; in other words, we work with time-stamped data.

We have ample experience in writing efficient, fast, and robust Map/Reduce code that implements ETL functions.

We have hardened Hadoop to enterprise standards, providing features such as high availability, data collection, and data merging.

Writing Map/Reduce is not enough. We wrote layers on top of Hadoop that use Hive and Pig to transform data into OLAP cubes for easy UI consumption.
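The OLAP-cube transformation mentioned above amounts to rolling raw rows up by a set of dimensions into aggregated measures. The actual layers use Hive and Pig; this pure-Python rollup is just an illustrative sketch of the transformation, with hypothetical row and column names.

```python
from collections import defaultdict

def build_cube(rows, dimensions, measure):
    """Roll records up into a simple OLAP-style cube: one fact row
    (sum of `measure`) per combination of dimension values."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)

# Hypothetical time-stamped rows, like the time-series data described above.
rows = [
    {"day": "2011-03-01", "site": "web",    "amount": 10.0},
    {"day": "2011-03-01", "site": "mobile", "amount": 5.0},
    {"day": "2011-03-01", "site": "web",    "amount": 2.5},
]
print(build_cube(rows, ["day", "site"], "amount"))
# {('2011-03-01', 'web'): 12.5, ('2011-03-01', 'mobile'): 5.0}
```

In Hive the same rollup would be a `GROUP BY` over the dimension columns, with the cube materialized as a fact table for the UI.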
![Page 6](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/6.jpg)
Below is a brief overview of our clients.
![Page 7](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/7.jpg)
Architecture diagram: a Collector feeds the Hadoop Cluster (which also holds the Training Data); the Map/Reduce output is exposed through a Thrift Service to the Web UI Display.
![Page 8](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/8.jpg)
News-processing pipeline diagram: an External News Collector feeds a Map/Reduce job for filtering and term-frequency collection; a Map/Reduce training-set job builds the Training Data; a Map/Reduce categorization job builds the index; results are accessed through a DFS client and a Hive interface.
![Page 9](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/9.jpg)
We were asked to analyze the client's sales data and extract valuable information from it.

The data was in 9-tuple format: <OrderID, EmailID, MobileNum, ProductID, PayableAmount, DeliveryCharges, ModeOfPayment, OrderStatus, OrderSite>.

We were asked to provide information such as the unique subscriber count (by email address) and per-day transaction amounts.

We deployed the Hadoop cluster on three machines:
◦ Deployed our collector to pump data from the DB into HDFS.
◦ Wrote Map/Reduce jobs to generate OLAP cubes.
◦ Provided a Hive interface to extract the results and show them in the UI.
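The two reports named above (unique subscribers and per-day amounts) can be sketched in a few lines. The field names follow the 9-tuple on this slide, except `Day`, which is an assumed field derived from an order timestamp that the tuple itself does not show; the records are hypothetical.

```python
from collections import defaultdict

def sales_report(orders):
    """Compute the unique subscriber count (distinct email addresses)
    and the total transaction amount per day from order records.
    'Day' is an assumed field derived from each order's timestamp."""
    subscribers = set()
    per_day = defaultdict(float)
    for order in orders:
        subscribers.add(order["EmailID"])
        per_day[order["Day"]] += order["PayableAmount"]
    return len(subscribers), dict(per_day)

orders = [
    {"OrderID": 1, "EmailID": "a@example.com", "Day": "2011-03-01", "PayableAmount": 100.0},
    {"OrderID": 2, "EmailID": "b@example.com", "Day": "2011-03-01", "PayableAmount": 50.0},
    {"OrderID": 3, "EmailID": "a@example.com", "Day": "2011-03-02", "PayableAmount": 75.0},
]
print(sales_report(orders))
# (2, {'2011-03-01': 150.0, '2011-03-02': 75.0})
```

In the delivered system these aggregates were computed by Map/Reduce jobs over HDFS and surfaced through Hive rather than in a single process.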
![Page 10](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/10.jpg)
Data-flow diagram: order fields (OrderID, EmailID, Mobile Num, Payable Amount, Delivery Charges, Mode of Payment, Order Status, Order Site) are rolled up at day granularity into cube measures: Actual Number of Customers, Forecast Number of Customers, Total Aggregated Amount, and Forecast Aggregated Amount, with Email ID and Payable Amount as the key inputs.
![Page 11](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/11.jpg)
We delivered an end-to-end reporting solution to Guavus.

The data was provided by the Sprint network (a Tier 1 company); we had to develop a reporting engine to analyze it and generate OLAP cubes.

We were asked to evaluate petabytes of data and provide an ETL solution.

We deployed the Hadoop cluster on 10 Linux machines.

We wrote a collector that read binary data and pushed it into the Hadoop cluster.

We wrote Map/Reduce jobs (which ran for four hours) every day; the idea was to provide analytics on stream data.

We generated OLAP cubes, storing the results in InfinityDB (a column DB) and Hive.
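The collector's job, reading binary records and turning them into rows suitable for HDFS, can be sketched as fixed-width record decoding. The record layout below is purely hypothetical; the actual Sprint feed format is not described here.

```python
import struct

# Hypothetical fixed-width record layout (not the actual feed format):
# record id (u32), timestamp (u64), metric value (f64), big-endian.
RECORD = struct.Struct(">IQd")

def decode_records(blob):
    """Decode a byte stream of fixed-width binary records, as a
    collector would before pushing rows into the Hadoop cluster."""
    for offset in range(0, len(blob), RECORD.size):
        rec_id, ts, value = RECORD.unpack_from(blob, offset)
        yield {"id": rec_id, "ts": ts, "value": value}

# Pack two sample records and decode them back.
blob = RECORD.pack(1, 1300000000, 3.5) + RECORD.pack(2, 1300000060, 1.25)
print(list(decode_records(blob)))
```

A production collector would additionally handle framing errors, batching, and writes into HDFS, which this sketch omits.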
![Page 12](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/12.jpg)
Architecture diagram: a Reporting UI / Web Interface sits on top of the Report Generation Task (Map/Reduce framework); a Data Collector, a Query Engine, and the Hadoop Configuration feed the Distributed Storage Framework (Hadoop/HDFS), backed by InfinityDB / Hive / Pig.
![Page 13](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/13.jpg)
Deployment diagram: multiple Data Collectors feed the Hadoop infrastructure, where Map/Reduce tasks run under a Monitor / Overall Scheduler; results land in InfinityDB / Hive / Pig and are served through the Rubix Framework to the UI Display.
![Page 14](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/14.jpg)
For HT we are developing a syndication clustering algorithm.

We had a large collection of old news documents and were asked to cluster them; clustering them manually was nearly impossible.

We implemented a clustering Map/Reduce algorithm using cosine similarity and clustered the documents.
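The core of the cosine-similarity step can be sketched as follows. The real job ran as Map/Reduce over term-frequency vectors built from the news documents; this single-process sketch with made-up documents only illustrates the similarity computation and the closest-pair selection.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_pair(docs):
    """Return the pair of document ids with the highest cosine similarity,
    mirroring the 'minimum distance pair' step of the clustering job."""
    return max(combinations(docs, 2),
               key=lambda pair: cosine_similarity(docs[pair[0]], docs[pair[1]]))

# Made-up term-frequency vectors for three news stories.
docs = {
    "story1": {"election": 3, "minister": 2},
    "story2": {"election": 2, "minister": 1, "poll": 1},
    "story3": {"cricket": 4, "match": 2},
}
print(closest_pair(docs))  # ('story1', 'story2')
```

Iterating this merge step, always joining the most similar pair, yields the clusters of closely related stories described on the next slide.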
![Page 15](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/15.jpg)
Clustering pipeline diagram (Hadoop platform):

Map functionality: the list of XML news files is transformed into integer vectors (V1 … VN); one XML news file maps to one vector.

Reduce functionality: cosine similarity is applied between vectors, the minimum-distance pair of vectors is selected, and a list of closely related stories is created from the news files.

Downstream stages: cluster algorithm → C-Bayes classification → categorize documents.
![Page 16](https://reader031.vdocuments.mx/reader031/viewer/2022013101/56649e1b5503460f94b097a9/html5/thumbnails/16.jpg)
Office Locations:
India: A-82, Sector 57, Noida, UP, 201301
Japan: 2-8-6-405, Higashi Tabata, Kita-ku, Tokyo, Japan
General Inquiries: [email protected]
Sales Inquiries: [email protected]