Mind Candy GIAF 4 Presentation
![Page 1: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/1.jpg)
Capturing Events at Mind Candy
Tim Bennett – Data Engineer at Mind Candy
Please visit: http://slid.es/bennetimo/capturing-events-at-mind-candy
![Page 2: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/2.jpg)
Statistics on a Macbook
Liz Macfie – Data Scientist at Mind Candy
![Page 3: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/3.jpg)
What is a Data Scientist?
Data Scientist (n.): Person who is better at statistics than a software engineer and better at software engineering than a statistician.
Josh Wills (paraphrased)
![Page 4: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/4.jpg)
What is a Data Scientist at Mind Candy?
Data Engineers
Analysts
![Page 5: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/5.jpg)
maintaining data structures and back-end systems
pulling data from various external sources
communicating the needs of the product managers and analysts
![Page 6: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/6.jpg)
determining which questions need to be answered
carrying out statistically relevant analysis of the data
providing support for internal tools and systems
![Page 7: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/7.jpg)
What is a Data Scientist at Mind Candy?
![Page 8: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/8.jpg)
Recent Highlights
Deciding which events to capture
Moshi Monsters Village
![Page 9: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/9.jpg)
Rescuing Moshlings
Farming Crops
Sending Gifts
Making IAPs
Completing Quests
Building Homes
Inviting Friends
Converting Currencies
![Page 10: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/10.jpg)
Recent Highlights
Deciding which events to capture
trade-off between getting all possible information and bloating the app
receive prioritised questions from analysts and determine the data needed to answer them
design a data structure to meet the analysts' needs
source any required external data and place it in suitable databases
liaise with developers to test all events
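As a rough illustration of what an event data structure for the actions listed earlier might look like, here is a minimal Python sketch. The field names (`event_type`, `player_id`, `payload`, `event_id`, `timestamp`) and the event-type strings are assumptions made for this example, not Mind Candy's actual schema.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical event types, loosely based on the Moshi Monsters Village actions.
EVENT_TYPES = {
    "rescue_moshling", "farm_crop", "send_gift", "make_iap",
    "complete_quest", "build_home", "invite_friend", "convert_currency",
}


@dataclass
class GameEvent:
    event_type: str
    player_id: str
    payload: dict = field(default_factory=dict)
    # A client-generated ID lets downstream jobs recognise re-sent events.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject event types nobody agreed to capture.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


event = GameEvent("send_gift", player_id="p-123", payload={"gift": "seed_pack"})
print(event.to_json())
```

Keeping the common fields fixed and pushing event-specific detail into `payload` is one way to balance "capture everything" against bloating the app; other layouts are equally plausible.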
![Page 11: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/11.jpg)
Recent Highlights
Creating dashboards - real-time
key metrics for display on large screens around the office
allow product managers to make immediate decisions about content delivery and strategies
give immediate feedback to management
![Page 12: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/12.jpg)
Recent Highlights
Creating dashboards - aggregated
aggregated metrics for display on web-based dashboards
allow analysts to answer more complex queries, building filters and tables
provide longer-term analysis of the game's performance across various segments
![Page 13: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/13.jpg)
When Things Go Wrong
Luis Vicente – Data Engineer at Mind Candy
![Page 14: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/14.jpg)
Problems...
Evan and his friends were enjoying their life in Redshift...
... but there were dark clouds approaching.
![Page 15: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/15.jpg)
Problems...
Being prepared (or discovering you're not)
We thought we had designed our eventing system very carefully and would not have many problems
To quote our CPO: “If our servers aren't breaking, then we probably aren't growing fast enough”.
![Page 16: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/16.jpg)
Then our servers broke.
![Page 17: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/17.jpg)
Problems...
Investigating the reasons why
How were we using Redshift?
BI Engineers
Deep analysis dashboards
Real-time dashboards
The eventing system
All using Redshift heavily.
![Page 18: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/18.jpg)
Problems...
BI Engineers
They have to answer questions, which need data to be stored in the data warehouse.
How dangerous could a BI Engineer be...?
![Page 19: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/19.jpg)
Problems...
Deep Analysis
We use QlikView for deep analysis
Daily incremental updates of QlikView datasets at midnight
Small incremental updates during the day to provide up-to-date metrics
We wanted to stop the daily refreshes... but couldn't at that point
![Page 20: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/20.jpg)
Problems...
Real-time Madness
The real-time dashboard started as a generic tool
But product managers love real-time dashboards...
At first... there was 1 dashboard with 6 metrics
Now... there are 5 dashboards with 15-20 metrics each
![Page 21: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/21.jpg)
Problems...
Eventing System
All these dashboards and users need events
Whirlpool was responsible for storing them in the data warehouse (DWH)
But we were worried about duplicates in the data warehouse...
...so we stored them in a STAGING table, then used an UPSERT job to move them to their final destination
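The staging-then-merge pattern described above can be sketched as follows. This is an illustration only: it uses SQLite from the Python standard library as a stand-in for Redshift, the table and column names (`events`, `staging`, `event_id`, and so on) are hypothetical, and the real Whirlpool job is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (event_id TEXT, player_id TEXT, event_type TEXT);
    CREATE TABLE staging (event_id TEXT, player_id TEXT, event_type TEXT);
""")


def upsert_from_staging(conn):
    """Move staged rows into `events`, replacing rows with the same event_id."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Remove target rows that are about to be re-inserted.
        conn.execute(
            "DELETE FROM events WHERE event_id IN (SELECT event_id FROM staging)"
        )
        # 2. Copy staged rows across, collapsing duplicates within the batch.
        conn.execute(
            "INSERT INTO events "
            "SELECT DISTINCT event_id, player_id, event_type FROM staging"
        )
        # 3. Empty the staging table for the next batch.
        conn.execute("DELETE FROM staging")


rows = [("e1", "p1", "send_gift"), ("e2", "p1", "farm_crop"), ("e1", "p1", "send_gift")]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
upsert_from_staging(conn)

# The same event arrives again in a later batch.
conn.execute("INSERT INTO staging VALUES ('e2', 'p1', 'farm_crop')")
upsert_from_staging(conn)

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

Note that every run has to scan the target table for matching IDs, so the cost of each merge tends to grow with the size of the target.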
![Page 22: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/22.jpg)
Problems...
Why this concern over duplicates?
Our previous DWH was highly structured, with primary keys and uniqueness
Redshift speaks SQL, but it is not a traditional relational database...
...primary key and uniqueness constraints are informational only and are not enforced
So you can have duplicates!
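When uniqueness is not enforced, duplicates have to be found by query rather than rejected on insert. A minimal sketch of such a check, again using SQLite as a stand-in with no constraint declared on a hypothetical `events` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE constraint, so nothing stops repeated rows.
conn.execute("CREATE TABLE events (event_id TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("e1", "send_gift"), ("e2", "farm_crop"), ("e1", "send_gift")],
)

# List every event_id that appears more than once.
duplicates = conn.execute("""
    SELECT event_id, COUNT(*) AS copies
    FROM events
    GROUP BY event_id
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)  # [('e1', 2)]
```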
![Page 23: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/23.jpg)
What happened?
![Page 24: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/24.jpg)
What happened?
After two days working with a real load, events weren't being stored in the DWH.
We started seeing this kind of error message:
Serializable isolation violation on table - 111594, transactions forming the cycle are: 2604845, 2604854, 2604912 (pid:2053)
We began running UPSERT jobs every five minutes, but each run took longer than the last... until the jobs stopped working altogether.
![Page 25: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/25.jpg)
Everything OK now?
Almost!
We don't have those problems anymore... but we do have other problems!
We are still executing a huge number of queries, and Redshift only runs 5 in parallel by default...
...well, you can configure it to handle more, but they start fighting one another for resources!
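One general way to keep clients from piling more queries onto a warehouse than it runs in parallel is to cap concurrency on the client side. The sketch below is not the approach taken in the talk; `run_query` is a hypothetical placeholder that only sleeps, and the limit of 5 simply mirrors the figure quoted on this slide.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # mirrors the limit quoted above; an assumption for this sketch

_lock = threading.Lock()
_running = 0
peak_running = 0


def run_query(sql: str) -> str:
    """Hypothetical stand-in for a warehouse query; it only sleeps briefly."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.01)  # pretend the query takes some time
    with _lock:
        _running -= 1
    return f"done: {sql}"


queries = [f"SELECT {i}" for i in range(20)]
# The pool never runs more than MAX_PARALLEL queries at once; the rest wait.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    results = list(pool.map(run_query, queries))

print(len(results), peak_running <= MAX_PARALLEL)  # 20 True
```

A cap like this only queues the work; it does not reduce the total number of queries, which is why moving the real-time dashboards off the warehouse is the more fundamental fix.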
![Page 26: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/26.jpg)
What are we going to do?
Our real-time dashboards will stop using the DWH as their data source
Since we still need real-time data, we will build a “lambda architecture” around our eventing system
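In a lambda architecture, a batch layer periodically recomputes views from the full event log, a speed layer keeps incremental views of events that arrived since the last batch run, and queries combine the two. The toy in-memory sketch below illustrates that idea only; the class and method names are hypothetical, and the slides do not describe how Mind Candy's version will be built.

```python
from collections import Counter


class LambdaCounter:
    """Counts events per type by combining a batch view with a real-time view."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed periodically from the log
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, event_type: str) -> None:
        # Every event goes to the log and to the real-time view.
        self.master_log.append(event_type)
        self.speed_view[event_type] += 1

    def run_batch(self) -> None:
        # Recompute from scratch, then drop the real-time counts now covered.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, event_type: str) -> int:
        # Serving step: batch result plus whatever arrived since.
        return self.batch_view[event_type] + self.speed_view[event_type]


counter = LambdaCounter()
counter.ingest("send_gift")
counter.ingest("send_gift")
counter.run_batch()          # batch view now holds 2
counter.ingest("send_gift")  # arrives after the batch run
print(counter.query("send_gift"))  # 3
```

The point of the split is that dashboards can read the small real-time view frequently without sending each refresh to the warehouse.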
![Page 27: Mind Candy GIAF 4 Presentation](https://reader034.vdocuments.mx/reader034/viewer/2022051513/547de3adb4af9fef158b54eb/html5/thumbnails/27.jpg)
And then everyone is happy!