hive@king: Threshing data
Mattias Andersson, BI Developer, [email protected]
“Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”
Agenda
• A short history of King
• Why do we use hive at King?
• I will discuss hive from an analytics and data warehouse user perspective
• Keep it short
This is
Bragging warning!
Level 1
Thomas Hartwig (CTO), Patrik Stymne (Architect), Sebastian Knutsson (Chief Product Officer), Riccardo Zacconi (CEO), Lars Markgren (GM Sweden)
Founded in 2003 by a bunch of ex-Spray guys
+ in London, Malmö, Bucharest, San Fran, Malta & Barcelona.
A European developer with its heart in Sthlm
”Silicontull”
We create & publish casual games
2003-2010
200+ casual games
The foundation for our crusade on Facebook and mobile
2003-2010
Fucked by Facebook (FBF Index)
[Chart, 2004-2010: Facebook unique visitors vs. Yahoo Games US unique visitors; a 500m mark is labelled]
Fall of 2010
Facebook, Fall of 2012. Industry experts:
“King missed the train, it’s too late now”
“Zynga and Wooga owns the market”
King’s response?
It is never too late to disrupt an industry
April 2011: Bubble Saga on Facebook
2011
The Saga format
Bubble Saga was a hit… No. 7 on Facebook after 4 months
Daily Active Users (DAU)
2.4 million DAU!
April 2011
Bubble Witch Saga…
Daily Active Uniques (DAU)
Explosive growth: from 0 to 6 million daily players in 4 months
Oct 2011-2012
1 year growth: from 220,000 DAU to 8,500,000!
Mobile: July 2012
Mobile July 2012 - now
Also #1 top grossing app in Sweden since February
How we succeeded technically speaking…
Our platform

Tech choices:
• Application – 96 servers (Java)
• MySQL – 59 servers
• Memcache – 24 servers
• Hadoop cluster – 20 servers

How it all works from a BI perspective:
• MySQL shards with user state; they are off limits for BI
• The game logs events whenever something interesting has happened
• Hourly rolling of logs to a central log server, where we fetch the data
Big data, bigger metadata
Metadata…
We are on our way…
Are we Big Data?
The most important success factor for hive
Hive connectivity

Hue
Web interface to hive. Easy to use, so it is a great first encounter.

ODBC
Enables us to pull data from hive into QlikView/R/Excel.

Command line interface
The default/advanced interface.

Scumbag hive:
Different interfaces use different escape sequences/variable substitution…
This is what sold it to me
Hive programmability

Hive custom transform
from (
  from dual
  map a using 'seq 1 5' as sequence int
  sort by sequence
) map_out
reduce sequence
using 'awk "{sum+=$0\; print sum}"' as cumulative int;

Output:
1
3
6
10
15
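The reduce step of the custom transform above is just a script that reads rows on standard input and writes rows on standard output. As a minimal sketch of the same running total in Python (the function name `running_total` and the in-memory sample rows are illustrative assumptions, not part of the original slides):

```python
def running_total(lines):
    """Yield the cumulative sum after each input row, like the awk reducer."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows rather than failing on int("")
        total += int(line)
        yield str(total)


# Stand-in for the sorted output of the map step (`seq 1 5`), one row per line.
mapped_rows = ["1\n", "2\n", "3\n", "4\n", "5\n"]
cumulative = list(running_total(mapped_rows))
print("\n".join(cumulative))
```

Saved as a script that loops over `sys.stdin` instead of the sample list, something like this could stand in for the awk one-liner in the `using` clause, assuming Python is available on the cluster nodes.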
Scumbag hive:
Really easy to make something horribly unmaintainable. Perl/xslt/wget in one hql-statement…
Map as a double entendre
Hive complexity

Map datatype
create table if not exists test2 (
  test map<string,map<string,int>>
)
ROW FORMAT DELIMITED
STORED AS TEXTFILE;

select test["test"]["x"] from test2;
Scumbag hive:
There is no syntax to declare map/array separators beyond the first level for hive tables in textfile format; \004 \005 and \006 \007 are hardcoded.
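To make the separator issue concrete, here is a rough sketch of how one text-file value for the nested map column declared above might be encoded and decoded. The specific separator assignment (\002 and \003 for the outer map, \004 and \005 for the inner map) is an assumption based on Hive's default delimited text layout, and the helper names are hypothetical:

```python
# Assumed separators (octal \002..\005); not confirmed by the slides.
OUTER_ENTRY = "\x02"  # between entries of the outer map
OUTER_KV = "\x03"     # between an outer key and its value
INNER_ENTRY = "\x04"  # between entries of the inner map
INNER_KV = "\x05"     # between an inner key and its value


def encode_nested_map(data):
    """Serialize {str: {str: int}} into a single delimited text field."""
    outer = []
    for key, inner in data.items():
        pairs = INNER_ENTRY.join(f"{k}{INNER_KV}{v}" for k, v in inner.items())
        outer.append(f"{key}{OUTER_KV}{pairs}")
    return OUTER_ENTRY.join(outer)


def parse_nested_map(field):
    """Parse a delimited text field back into {str: {str: int}}."""
    result = {}
    if not field:
        return result
    for entry in field.split(OUTER_ENTRY):
        key, _, inner = entry.partition(OUTER_KV)
        inner_map = {}
        for pair in (inner.split(INNER_ENTRY) if inner else []):
            k, _, v = pair.partition(INNER_KV)
            inner_map[k] = int(v)
        result[key] = inner_map
    return result


field = encode_nested_map({"test": {"x": 1, "y": 2}})
parsed = parse_nested_map(field)
print(parsed["test"]["x"])  # same lookup as: select test["test"]["x"] from test2;
```

Whatever writes the log files has to emit exactly these control characters for the inner levels, since the table definition offers no way to choose others.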
It's complicated…
So why did we choose to use hive?

Pros
• SQL is easy to learn
• Supports custom mapreduce jobs
• ODBC connection for QlikView
• Hue for lightweight access
• Development is moving fast
• Open source

Cons
• High latency
• Lots of moving parts
• Not free from bugs
The end.