How Salesforce.com Uses Hadoop
![Page 1: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/1.jpg)
How Salesforce.com Uses Hadoop
Some Data Science Use Cases
Narayan Bharadwaj, salesforce.com, @nadubharadwaj
Jed Crosby, salesforce.com, @JedCrosby
![Page 2: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/2.jpg)
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of
intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we
operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new
releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization
and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These
documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of
our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based
upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-
looking statements.
![Page 3: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/3.jpg)
Agenda
• Technology
• Hadoop use cases
• Use case discussion
  • Product Metrics
  • User Behavior Analysis
  • Collaborative Filtering
• Q&A
Every time you see the elephant, we will attempt to explain a
Hadoop-related concept.
![Page 4: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/4.jpg)
Got “Cloud Data”?
800 million transactions/day
Terabytes/day
130k customers
Millions of users
![Page 5: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/5.jpg)
Technology
![Page 6: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/6.jpg)
Hadoop Overview
- Started by Doug Cutting at Yahoo!
- Based on two Google papers
  Google File System (GFS): http://research.google.com/archive/gfs.html
  Google MapReduce: http://research.google.com/archive/mapreduce.html
- Hadoop is an open source Apache project
  Hadoop Distributed File System (HDFS)
  Distributed Processing Framework (MapReduce)
- Several related projects
  HBase, Hive, Pig, Flume, ZooKeeper, Mahout, Oozie, HCatalog
![Page 7: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/7.jpg)
Our Hadoop Ecosystem
Apache Pig
![Page 8: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/8.jpg)
Contributions
@pRaShAnT1784 : Prashant Kommireddi
Lars Hofhansl
@thefutureian : Ian Varley
![Page 9: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/9.jpg)
Use Cases
![Page 10: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/10.jpg)
• Product Metrics
• User behavior analysis
• Capacity planning
• Monitoring intelligence
• Collections
• Query Runtime Prediction
• Early Warning System
• Collaborative Filtering
• Search Relevancy
Legend: Internal App, Product feature
Hadoop Use Cases
![Page 11: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/11.jpg)
Product Metrics
![Page 12: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/12.jpg)
Track feature usage/adoption across 130k+ customers
  E.g., Accounts, Contacts, Visualforce, Apex, …
Track standard metrics across all features
  E.g., #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …
Track features and metrics across all channels
  API, UI, Mobile
Primary audience: Executives, Product Managers
Product Metrics – Problem Statement
![Page 13: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/13.jpg)
Feature (What?)
Feature Metadata (Instrumentation)
Crunch it (How?)
Storage & Processing
Daily Summary (Output)
Fancy UI (Visualize)
Collaborate & Iterate
Data Pipeline
![Page 14: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/14.jpg)
Feature Metrics (Custom Object)
Trend Metrics (Custom Object)
Client Machine: Java Program (Pig script generator)
Hadoop
Log Files (Log Pull)
User Input (Page Layout)
Reports, Dashboards
Collaboration (Chatter)
Connections: API, Workflow, Formula Fields
Product Metrics Pipeline
![Page 15: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/15.jpg)
| Id | Feature Name | PM | Instrumentation | Metric1 | Metric2 | Metric3 | Metric4 | Status |
|---|---|---|---|---|---|---|---|---|
| F0001 | Accounts | John | /001 | #requests | #UniqOrgs | #UniqUsers | AvgRT | Dev |
| F0002 | Contacts | Nancy | /003 | #requests | #UniqOrgs | #UniqUsers | AvgRT | Review |
| F0003 | API | Eric | A | #requests | #UniqOrgs | #UniqUsers | AvgRT | Deployed |
| F0004 | Visualforce | Roger | V | #requests | #UniqOrgs | #UniqUsers | AvgRT | Decom |
| F0005 | Apex | Kim | axapx | #requests | #UniqOrgs | #UniqUsers | AvgRT | Deployed |
| F0006 | Custom Objects | Chun | /aXX | #requests | #UniqOrgs | #UniqUsers | AvgRT | Deployed |
| F0008 | Chatter | Jed | chcmd | #requests | #UniqOrgs | #UniqUsers | AvgRT | Deployed |
| F0009 | Reports | Steve | R | #requests | #UniqOrgs | #UniqUsers | AvgRT | Deployed |
Feature Metrics (Custom Object)
![Page 16: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/16.jpg)
Feature Metrics (Custom Object)
![Page 17: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/17.jpg)
User Input (Page Layout)
Formula Field
Workflow Rule
![Page 18: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/18.jpg)
User Input (Child Custom Object)
Child Objects
![Page 19: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/19.jpg)
Apache Pig
![Page 20: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/20.jpg)
-- Define UDFs
DEFINE GFV GetFieldValue('/path/to/udf/file');
-- Load data
A = LOAD '/path/to/cloud/data/log/files' USING PigStorage();
-- Filter data (keep only records whose logRecordType is 'U')
B = FILTER A BY GFV(*, 'logRecordType') == 'U';
-- Extract fields
C = FOREACH B GENERATE GFV(*, 'orgId') AS orgId, GFV(*, 'userId') AS userId ……;
-- Group
G = GROUP C BY ……;
-- Compute output metrics (a nested FOREACH must end with GENERATE)
O = FOREACH G {
    orgs = C.orgId;
    uniqueOrgs = DISTINCT orgs;
    GENERATE group, COUNT(uniqueOrgs) AS numUniqueOrgs;
}
-- Store or dump results
STORE O INTO '/path/to/user/output';
Basic Pig Script Construct
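The same filter, group, and aggregate flow can be sketched outside Pig to show what the four standard metrics mean. This is a minimal Python sketch, not the production pipeline: the record layout (feature id, org id, user id, response time in milliseconds) and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical parsed log records: (feature_id, org_id, user_id, response_time_ms).
records = [
    ("F0001", "org1", "u1", 120),
    ("F0001", "org1", "u2", 80),
    ("F0001", "org2", "u3", 100),
    ("F0002", "org1", "u1", 50),
]

def summarize(records):
    """Return {feature_id: metrics} with the four standard metrics per feature."""
    grouped = defaultdict(list)
    for feature, org, user, rt in records:      # GROUP BY feature
        grouped[feature].append((org, user, rt))
    summary = {}
    for feature, rows in grouped.items():
        summary[feature] = {
            "requests": len(rows),                                   # #Requests
            "unique_orgs": len({org for org, _, _ in rows}),         # #UniqueOrgs
            "unique_users": len({user for _, user, _ in rows}),      # #UniqueUsers
            "avg_response_time": sum(rt for _, _, rt in rows) / len(rows),
        }
    return summary

metrics = summarize(records)
```

Each entry of `metrics` corresponds to one row of a daily summary for that feature.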
![Page 21: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/21.jpg)
Java Pig Script Generator (Client)
![Page 22: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/22.jpg)
| Id | Date | #Requests | #Unique Orgs | #Unique Users | Avg ResponseTime |
|---|---|---|---|---|---|
| F0001 | 06/01/2012 | <big> | <big> | <big> | <little> |
| F0002 | 06/01/2012 | <big> | <big> | <big> | <little> |
| F0003 | 06/01/2012 | <big> | <big> | <big> | <little> |
| F0001 | 06/02/2012 | <big> | <big> | <big> | <little> |
| F0002 | 06/02/2012 | <big> | <big> | <big> | <little> |
| F0003 | 06/03/2012 | <big> | <big> | <big> | <little> |
Trend Metrics (Custom Object)
![Page 23: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/23.jpg)
Upload to Trend Metrics (Custom Object)
![Page 24: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/24.jpg)
Visualization (Reports & Dashboards)
![Page 25: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/25.jpg)
Visualization (Reports & Dashboards)
![Page 26: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/26.jpg)
Collaborate, Iterate (Chatter)
![Page 27: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/27.jpg)
Feature Metrics (Custom Object)
Trend Metrics (Custom Object)
Client Machine: Java Program (Pig script generator)
Hadoop
Log Files (Log Pull)
User Input (Page Layout)
Reports, Dashboards
Collaboration (Chatter)
Connections: API, Workflow, Formula Fields
Recap
![Page 28: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/28.jpg)
User Behavior Analysis
![Page 29: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/29.jpg)
Problem Statement
How do we reduce the number of clicks on the user interface?
We need to understand the top user click paths: what are users typically trying to do?
What are the user clusters/personas?
Approach:
• Markov transitions for click paths, visualized with D3.js
• K-means (unsupervised) clustering for user groups
![Page 30: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/30.jpg)
Markov Transitions for "Setup" Pages
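A first-order Markov model of click paths reduces to counting page-to-page transitions and normalizing each row. The sketch below assumes sessions are available as ordered lists of page names; the page names and sessions are hypothetical, not taken from the Setup data.

```python
from collections import Counter, defaultdict

# Hypothetical click paths, one ordered list of page names per session.
sessions = [
    ["Home", "Users", "Profiles"],
    ["Home", "Users", "Roles"],
    ["Home", "Profiles"],
]

def transition_probabilities(sessions):
    """Estimate P(next page | current page) from observed click paths."""
    counts = defaultdict(Counter)
    for path in sessions:
        # Each consecutive pair of pages is one observed transition.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs

P = transition_probabilities(sessions)
```

Rows with one dominant transition point at click paths that could be shortened.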
![Page 31: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/31.jpg)
K-means clustering of "Setup" Pages
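K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The following is a small pure-Python sketch of that loop. The feature vectors (per-user visit counts for two groups of Setup pages) are an assumption for illustration; the slides do not say which features were used.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # initialize from the data
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids

# Hypothetical visit counts per user: (user-management pages, customization pages).
users = [(9, 1), (8, 2), (10, 0), (1, 9), (0, 8), (2, 10)]
labels, centers = kmeans(users, k=2)
```

On this toy data the two groups of users end up in separate clusters, which is the kind of persona split the slide refers to.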
![Page 32: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/32.jpg)
Collaborative Filtering
Jed Crosby
![Page 33: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/33.jpg)
Show similar files within an organization
Content-based approach
Community-based approach
Collaborative Filtering – Problem Statement
![Page 34: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/34.jpg)
Popular File
![Page 35: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/35.jpg)
Related File
![Page 36: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/36.jpg)
Amazon published this algorithm in 2003.
Amazon.com Recommendations: Item-to-Item Collaborative Filtering, by
Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet Computing,
January-February 2003.
At Salesforce, we adapted this algorithm for Hadoop, and we
use it to recommend files to view and users to follow.
We found this relationship using item-to-item collaborative
filtering
![Page 37: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/37.jpg)
Annual Report
Vision Statement
Dilbert Comic
Darth Vader Cartoon
Disk Usage Report
Example: CF on 5 files
![Page 38: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/38.jpg)
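| User | Annual Report | Vision Statement | Dilbert Cartoon | Darth Vader Cartoon | Disk Usage Report |
|---|---|---|---|---|---|
| Miranda (CEO) | 1 | 1 | 1 | 0 | 0 |
| Bob (CFO) | 1 | 1 | 1 | 0 | 0 |
| Susan (Sales) | 0 | 1 | 1 | 1 | 0 |
| Chun (Sales) | 0 | 0 | 1 | 1 | 0 |
| Alice (IT) | 0 | 0 | 1 | 1 | 1 |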
View History Table
![Page 39: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/39.jpg)
Annual Report
Vision Statement
Dilbert Cartoon
Darth Vader Cartoon
Disk Usage Report
Relationships Between the Files
![Page 40: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/40.jpg)
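Annual Report / Vision Statement: 2
Annual Report / Dilbert Cartoon: 2
Annual Report / Darth Vader Cartoon: 0
Annual Report / Disk Usage Report: 0
Vision Statement / Dilbert Cartoon: 3
Vision Statement / Darth Vader Cartoon: 1
Vision Statement / Disk Usage Report: 0
Dilbert Cartoon / Darth Vader Cartoon: 3
Dilbert Cartoon / Disk Usage Report: 1
Darth Vader Cartoon / Disk Usage Report: 1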
Relationships Between the Files
![Page 41: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/41.jpg)
| Annual Report | Vision Statement | Dilbert Cartoon | Darth Vader Cartoon | Disk Usage Report |
|---|---|---|---|---|
| Dilbert (2) | Dilbert (3) | Vision Stmt. (3) | Dilbert (3) | Dilbert (1) |
| Vision Stmt. (2) | Annual Rpt. (2) | Darth Vader (3) | Vision Stmt. (1) | Darth Vader (1) |
| | Darth Vader (1) | Annual Rpt. (2) | Disk Usage (1) | |
| | | Disk Usage (1) | | |
The popularity problem: notice that Dilbert appears first in every other file's list. This is
probably not what we want.
The solution: divide each relationship tally by the popularities of the two files involved (here, by the square root of their product).
Sorted Relationships for Each File
![Page 42: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/42.jpg)
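Annual Report / Vision Statement: .82
Annual Report / Dilbert Cartoon: .63
Annual Report / Darth Vader Cartoon: 0
Annual Report / Disk Usage Report: 0
Vision Statement / Dilbert Cartoon: .77
Vision Statement / Darth Vader Cartoon: .33
Vision Statement / Disk Usage Report: 0
Dilbert Cartoon / Darth Vader Cartoon: .77
Dilbert Cartoon / Disk Usage Report: .45
Darth Vader Cartoon / Disk Usage Report: .58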
Normalized Relationships Between the Files
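The normalized values on this slide can be reproduced from the view history table by dividing each co-view tally by the square root of the product of the two files' popularities, which is the cosine similarity of their view columns. The Python sketch below is a single-machine check of the arithmetic, not the Hadoop implementation; file names are shortened for readability.

```python
from itertools import combinations
from math import sqrt

# View history from the View History Table: user -> set of files viewed.
views = {
    "Miranda": {"Annual Report", "Vision Statement", "Dilbert"},
    "Bob": {"Annual Report", "Vision Statement", "Dilbert"},
    "Susan": {"Vision Statement", "Dilbert", "Darth Vader"},
    "Chun": {"Dilbert", "Darth Vader"},
    "Alice": {"Dilbert", "Darth Vader", "Disk Usage"},
}

popularity = {}   # file -> number of users who viewed it
tally = {}        # (file_a, file_b) in alphabetical order -> co-view count
for files in views.values():
    for f in files:
        popularity[f] = popularity.get(f, 0) + 1
    for pair in combinations(sorted(files), 2):
        tally[pair] = tally.get(pair, 0) + 1

# Normalize each tally by the popularities of both files in the pair.
normalized = {
    (a, b): n / sqrt(popularity[a] * popularity[b]) for (a, b), n in tally.items()
}
```

For example, Dilbert and Darth Vader share 3 viewers and have popularities 5 and 3, giving 3 / sqrt(15), which rounds to .77.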
![Page 43: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/43.jpg)
| Annual Report | Vision Statement | Dilbert Cartoon | Darth Vader Cartoon | Disk Usage Report |
|---|---|---|---|---|
| Vision Stmt. (.82) | Annual Report (.82) | Darth Vader (.77) | Dilbert (.77) | Darth Vader (.58) |
| Dilbert (.63) | Dilbert (.77) | Vision Stmt. (.77) | Disk Usage (.58) | Dilbert (.45) |
| | Darth Vader (.33) | Annual Report (.63) | Vision Stmt. (.33) | |
| | | Disk Usage (.45) | | |
High relationship tallies AND similar popularity values now drive closeness.
Sorted relationships for each file, normalized by file popularities
![Page 44: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/44.jpg)
1) Compute file popularities
2) Compute relationship tallies and divide by file popularities
3) Sort and store the results
The Item-to-Item CF Algorithm
![Page 45: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/45.jpg)
MapReduce Overview: Map, Shuffle, Reduce
(adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
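The three phases can be imitated in memory to make the data flow concrete. The helper below is a teaching sketch in Python, not the Hadoop API: `map_reduce`, `word_mapper`, and `count_reducer` are names invented here, and a real job distributes each phase across machines.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    # Map: each input record yields zero or more (key, value) pairs.
    mapped = [kv for record in records for kv in mapper(record)]
    # Shuffle: bring together all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: each key with its list of values yields zero or more outputs.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def word_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, ones):
    yield word, sum(ones)

counts = dict(map_reduce(["a b a", "b c"], word_mapper, count_reducer))
```

Word count is the standard first example; the collaborative filtering steps reuse the same shape with different mappers and reducers.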
![Page 46: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/46.jpg)
<user, file>
Inverse identity map
<file, List<user>>
Reduce
<file, (user count)>
Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.
1. Compute File Popularities
![Page 47: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/47.jpg)
(Miranda, Dilbert), (Bob, Dilbert), (Susan, Dilbert), (Chun, Dilbert), (Alice, Dilbert)
Inverse identity map
<Dilbert, {Miranda, Bob, Susan, Chun, Alice}>
Reduce
(Dilbert, 5)
Example: File popularity for Dilbert
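The Dilbert example can be traced with a small in-memory sketch of the same map, shuffle, and reduce steps. The function names are illustrative, and counting distinct users (rather than raw view events) is an assumption about how repeated views are handled.

```python
from collections import defaultdict

# <user, file> pairs from the example.
view_history = [
    ("Miranda", "Dilbert"), ("Bob", "Dilbert"), ("Susan", "Dilbert"),
    ("Chun", "Dilbert"), ("Alice", "Dilbert"),
]

def inverse_identity_map(user, file):
    # Swap key and value so that the shuffle groups users by file.
    yield file, user

def popularity_reduce(file, users):
    # Assumption: a user who views a file several times counts once.
    yield file, len(set(users))

shuffled = defaultdict(list)
for user, file in view_history:
    for key, value in inverse_identity_map(user, file):
        shuffled[key].append(value)

popularity = dict(
    out for file, users in shuffled.items() for out in popularity_reduce(file, users)
)
```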
![Page 48: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/48.jpg)
<user, file>
Identity map
<user, List<file>>
Reduce
<(file1, file2), Integer(1)>,
<(file1, file3), Integer(1)>,
…
<(file(n-1), file(n)), Integer(1)>
Relationships have their file IDs in alphabetical order to avoid double counting.
2a. Compute Relationship Tallies − Find All Relationships in View History Table
![Page 49: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/49.jpg)
(Miranda, Annual Report), (Miranda, Vision Statement), (Miranda, Dilbert)
Identity map
<Miranda, {Annual Report, Vision Statement, Dilbert}>
Reduce
<(Annual Report, Dilbert), Integer(1)>,
<(Annual Report, Vision Statement), Integer(1)>,
<(Dilbert, Vision Statement), Integer(1)>
Example 2a: Miranda’s (CEO) File Relationship Votes
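Miranda's votes can be generated with a short sketch of step 2a. Sorting the file names before pairing them is what keeps each relationship in alphabetical order; the function name `relationship_reduce` is made up for this illustration.

```python
from collections import defaultdict
from itertools import combinations

view_history = [
    ("Miranda", "Annual Report"),
    ("Miranda", "Vision Statement"),
    ("Miranda", "Dilbert"),
]

def relationship_reduce(user, files):
    # Sorting puts every pair in alphabetical order, so (A, B) and (B, A)
    # can never be emitted as two different relationships.
    for pair in combinations(sorted(set(files)), 2):
        yield pair, 1

# Identity map plus shuffle: group the files viewed by each user.
by_user = defaultdict(list)
for user, file in view_history:
    by_user[user].append(file)

votes = [
    vote for user, files in by_user.items() for vote in relationship_reduce(user, files)
]
```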
![Page 50: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/50.jpg)
<(file1, file2), Integer(1)>
Identity map
<(file1, file2), List<Integer(1)>>
Reduce: count and divide by popularities
<file1, (file2, similarity score)>, <file2, (file1, similarity score)>
Note that we emit each result twice, once for each file that belongs to the relationship.
2b. Tally the Relationship Votes − Just a Word Count, Where Each Relationship Occurrence Is a Word
![Page 51: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/51.jpg)
<(Dilbert, Vader), Integer(1)>,
<(Dilbert, Vader), Integer(1)>,
<(Dilbert, Vader), Integer(1)>
Identity map
<(Dilbert, Vader), {1, 1, 1}>
Reduce: count and divide by popularities
<Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
Example 2b: the Dilbert/Darth Vader Relationship
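The reduce step of 2b for this pair can be written out as follows. In this sketch the popularity table is a plain dictionary standing in for the lookup described in step 1, and the divisor is the square root of the product of the two popularities, which is what yields sqrt(3/5) here.

```python
from collections import defaultdict
from math import sqrt

popularity = {"Dilbert": 5, "Vader": 3}        # output of step 1
votes = [(("Dilbert", "Vader"), 1)] * 3        # output of step 2a

def tally_reduce(pair, ones, popularity):
    file1, file2 = pair
    # Count the votes, then normalize by both files' popularities.
    score = sum(ones) / sqrt(popularity[file1] * popularity[file2])
    # Emit once per file so each file can later collect its own neighbors.
    yield file1, (file2, score)
    yield file2, (file1, score)

grouped = defaultdict(list)                    # identity map plus shuffle
for pair, one in votes:
    grouped[pair].append(one)

scores = [
    out for pair, ones in grouped.items() for out in tally_reduce(pair, ones, popularity)
]
```

Here 3 / sqrt(5 * 3) equals sqrt(9/15), which simplifies to sqrt(3/5), about .77.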
![Page 52: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/52.jpg)
<file1, (file2, similarity score)>
Identity map
<file1, List<(file2, similarity score)>>
Reduce
<file1, {top n similar files}>
Store the results in your location of choice
3. Sort and Store Results
![Page 53: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/53.jpg)
<Dilbert, (Annual Report, .63)>,
<Dilbert, (Vision Statement, .77)>,
<Dilbert, (Disk Usage, .45)>,
<Dilbert, (Darth Vader, .77)>
Identity map
<Dilbert, {(Annual Report, .63), (Vision Statement, .77), (Disk Usage, .45), (Darth Vader, .77)}>
Reduce
<Dilbert, {Darth Vader, Vision Statement}> (Top 2 files)
Store results
Example 3: Sorting the Results for Dilbert
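The final reduce is a per-file sort followed by a cut at the top n. In the sketch below, ties (the two .77 scores) are broken alphabetically; the slides do not specify a tie-breaking rule, so that choice is an assumption.

```python
from collections import defaultdict

# <file1, (file2, similarity score)> records for Dilbert.
scored = [
    ("Dilbert", ("Annual Report", 0.63)),
    ("Dilbert", ("Vision Statement", 0.77)),
    ("Dilbert", ("Disk Usage", 0.45)),
    ("Dilbert", ("Darth Vader", 0.77)),
]

def top_n_reduce(file, neighbors, n=2):
    # Sort by descending score; break ties by file name for determinism.
    ranked = sorted(neighbors, key=lambda fs: (-fs[1], fs[0]))
    yield file, [name for name, _ in ranked[:n]]

grouped = defaultdict(list)                    # identity map plus shuffle
for file, neighbor in scored:
    grouped[file].append(neighbor)

top = dict(out for file, ns in grouped.items() for out in top_n_reduce(file, ns))
```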
![Page 54: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/54.jpg)
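Cosine formula and normalization trick to avoid the distributed cache
Mahout has CF (collaborative filtering) implementations
Asymptotic order of the algorithm is O(M*N^2) in the worst case, but it is helped by sparsity.
$$\cos\theta_{AB} = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}$$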
Appendix
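The identity behind the normalization trick is that dividing the dot product by both norms gives the same result as normalizing each vector first and then taking the dot product. The sketch below checks this on the Dilbert and Darth Vader view columns from the example. How the authors used it to skip the distributed cache is not spelled out on the slide; one plausible reading, stated here as an assumption, is that each view entry carries its own pre-normalized weight so the reducer needs no popularity lookup.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]

# View columns (Miranda, Bob, Susan, Chun, Alice) from the view history table.
dilbert = [1, 1, 1, 1, 1]
vader = [0, 0, 1, 1, 1]

# Divide after the dot product ...
direct = dot(dilbert, vader) / (sqrt(dot(dilbert, dilbert)) * sqrt(dot(vader, vader)))
# ... or normalize first and take a plain dot product.
pre_normalized = dot(normalize(dilbert), normalize(vader))
```

Both values equal sqrt(3/5), the similarity computed in example 2b.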
![Page 55: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/55.jpg)
Narayan Bharadwaj
Director, Product Management
@nadubharadwaj
Jed Crosby
Data Scientist
@JedCrosby
![Page 56: How Salesforce.com Uses Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022062319/557ad1dcd8b42add288b4e7e/html5/thumbnails/56.jpg)