Engineering Principles of Kylin
October 2014
1
Jiang Xu
2
Done is better than perfect!
3
How to get product ideas?
Get ideas from real product problems
• We got these ideas from the ??? project and MicroStrategy's limitations:
– Although data is being onboarded to Hadoop, accessing it is a big issue. Hive is too slow!
– Although MicroStrategy is fast, it can’t handle 2+ billion records
– Although there are lots of SQL-on-Hadoop solutions, they can’t guarantee low latency for big queries
• Lesson learned
– Try to get ideas from customers’ pain points
– Always get ideas from real product problems
4
Think of a product instead of a project
• We aim to build a generic product or platform
– Standard: ANSI SQL
– Full Stack: ODBC/JDBC for BI tools integration
– …
• Lesson Learned:
– When you get ideas, try to think in terms of a product or platform
– A product is more generic and easier to adopt in the long term
5
Control scope to build the best solution
• Due to time and resource limitations, we must control the scope of the product
– Focus on MOLAP instead of HOLAP
– Focus on Tableau instead of MicroStrategy
– Don’t support real-time
– Don’t support full SQL
• Try to build the best solution for a “small problem”
6
Reference industry solutions & academic papers
• Study industry analysis reports
– Gartner
– Forrester
– …
• Study existing solutions
– Google BigQuery
– Google Dremel & PowerDrill
– SQL-on-Hadoop (Hive, Presto, Phoenix, Druid…)
• Study academic papers
– Data Mining: Concepts and Techniques, 3rd edition
– Lots of papers on data cubes, OLAP…
7
8
How to set up a team?
Find the right people
• Due to the complexity of this product, we put a lot of effort into setting up the team
– Smart
– Diligent
– Solid CS background
– Matching the team’s chemistry
• Try to use your connections to find good candidates
– We found a very good team member through a friend
• Give a tough interview to find good candidates
– Give a 2+ hour 1:1 interview, mostly on coding, algorithms and problem solving
9
Assign the right tasks to the right people
• Assign components based on each team member’s capabilities and interests
• All members have to do the dirty work
• All members have the opportunity to take on challenging tasks
• People have to prove themselves to take on more challenging tasks
10
Lead by example
• Leader Knows Details, Leader Writes Code
• If you want team members to follow the engineering principles, the leader must follow them first.
– For example, with test-driven development, the leader must write test cases first.
• The lead should take the tasks nobody wants
– Support
– Testing
– Customer onboarding
– …
11
12
How to design a product?
Done is better than perfect
• It’s easy to design a “perfect” system, but it’s hard to design a feasible one!
• Due to resource limitations, we must guarantee that the design can be built by the team.
• Don’t be average at everything. Try to be the best at one thing!
13
KISS – Keep it simple, stupid
• Designing a simple system is much more challenging than designing a complex one.
– Give simple solutions to complex problems;
– Build a system that is easy to maintain and extend over time
• For example, Kylin has a very simple deployment architecture: just a web server beside Hadoop
14
SOLID Principles - Robert C. Martin
• SRP: Single responsibility principle
• OCP: Open/closed principle
• LSP: Liskov substitution principle
• ISP: Interface segregation principle
• DIP: Dependency inversion principle
15
Don’t reinvent the wheel
• Try to reuse existing open source products
– Calcite
– Hive
– MapReduce
– HBase
– …
• Try to reference existing solutions
– Bias error in HyperLogLog
• Google HyperLogLog++
• Facebook Presto: magic parameter
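The bias issue mentioned above can be illustrated with a minimal HyperLogLog sketch. This is not Kylin's implementation and not Google's HyperLogLog++ (which replaces the simple correction below with empirically derived bias tables). It only shows the classic estimator plus the small-range "linear counting" fallback from the original HyperLogLog paper. The function name, the SHA-1 hash and the precision p=10 are assumptions chosen for illustration.

```python
import hashlib
import math


def hll_estimate(items, p=10):
    """Estimate the number of distinct items with 2**p registers (illustrative only)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Assumption: SHA-1 truncated to 64 bits stands in for a fast 64-bit hash.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                    # first p bits pick the register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64-p bits
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)           # constant valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: the raw estimator is biased for low cardinalities,
    # so fall back to linear counting while empty registers remain.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw
```

With 1024 registers the expected relative error of the raw estimator is roughly 3%, so results should be read as approximate counts, not exact ones.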
16
80-20 Rule
• Put 80% of the effort into developing the 20% most important features
• What should be done?
– ODBC driver
– Analytic SQL: grouping, aggregation, filter, join, projection, sub-query…
– …
• What shouldn’t be done?
– BI tools
– Full ANSI SQL
– …
17
Explain your design in simple words
• If you can’t explain it to your peers in simple words, there must be something wrong.
• Challenge each other!
• Good design is involving!
18
Build a working prototype
• Paperwork can’t verify your design
• Only a working prototype can validate your design.
• We spent 1 month building a working prototype
– SQL was parsed by a hand-written ANTLR grammar
– Cube was built by simple map-reduce scripts
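To give a feel for what "building a cube with simple map-reduce scripts" can mean, here is a small in-memory sketch. It is not the prototype's actual code: the dimension names, the `amount` measure and the function names are hypothetical, and a real job would run the two phases on Hadoop instead of in one Python process. The map phase emits one key per combination of dimensions (one per cuboid), and the reduce phase sums the measure per key.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions of a tiny fact table.
DIMENSIONS = ("country", "category")


def map_phase(row):
    """Emit (cuboid key, measure) pairs for every subset of the dimensions."""
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            key = (dims, tuple(row[d] for d in dims))
            yield key, row["amount"]


def reduce_phase(pairs):
    """Sum the measure for each cuboid key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_cube(rows):
    """Run map then reduce over all rows and return the pre-aggregated cube."""
    return reduce_phase(pair for row in rows for pair in map_phase(row))
```

A query such as "total amount by country" then becomes a lookup of the keys whose dimension tuple is `("country",)`, with no scan of the raw rows.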
19
20
How to develop a product?
Automate Testing
• Automated integration tests >> automated unit tests
– No mock!
– Test on live system
– Each case covers one use case
• 1+ automated test for each feature & 1+ automated test for each bug fix
• Reusing a gold-standard test sample simplifies building test cases
• Automate everything
– Compare SQL results between H2 and Hadoop
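The result-comparison idea can be sketched as follows. This is an illustration under stated assumptions, not the team's test harness: two in-memory SQLite databases stand in for H2 (the reference engine) and the Hadoop-backed system, and the `sales` table, `GOLDEN_ROWS` sample and function names are hypothetical. The same SQL runs on both engines over the same golden sample, and the sorted result sets must be equal.

```python
import sqlite3

# Hypothetical gold-standard sample shared by every test case.
GOLDEN_ROWS = [("US", 10), ("US", 5), ("EU", 7)]


def load_engine(rows):
    """Create an in-memory database loaded with the given sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def results_match(reference, candidate, sql):
    """Run the same SQL on both engines and compare order-insensitively."""
    expected = sorted(reference.execute(sql).fetchall())
    actual = sorted(candidate.execute(sql).fetchall())
    return expected == actual
```

Sorting before comparing keeps the check independent of row order, which engines are free to vary when the query has no ORDER BY.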
21
Code Review - Simple is Beautiful
• Code is clear to read and easy to change
• If I have a problem understanding your code, FIX it!
– One class has > 1 responsibility
– Code looks complex
– Not easy to enhance
– Duplicate logic
– Package organization looks messy
– …
22
Code Review – Buddy Programming
• Can code review find bugs? – NO!
• How can we find bugs?
– Test as a customer with vertical use cases
– You write the first version, I write the second version
– Each component has 2+ owners
23
Continuous Code Refactoring
• If other people have a problem understanding your code, REFACTOR it!
• A comprehensive automated test suite makes refactoring much easier
24
DevOps – Develop For Operation
• Log every piece of important information
• Export every important metric
• Easy to troubleshoot
• Easy to monitor
• One-liner installation
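One possible way to apply "log it and export it as a metric" is a small timing decorator. This is only a sketch: the logger name, the `METRICS` dictionary and the `timed` decorator are hypothetical, and a production system would export to a real monitoring backend instead of a module-level dictionary.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("query")   # hypothetical logger name
METRICS = defaultdict(list)           # metric name -> recorded durations in seconds


def timed(name):
    """Log and record how long each call of the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on failure, so slow errors are visible too.
                elapsed = time.perf_counter() - start
                METRICS[name].append(elapsed)
                logger.info("%s took %.3f s", name, elapsed)
        return wrapper
    return decorator
```

Decorating a function with `@timed("some_step")` then leaves both a log line and a recorded duration for every call.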
25
Performance Tuning - Question Everything
• System Level
– CPU, Memory
• JVM Level
– GC: Calcite generates code and uses up the permanent generation, which triggers full GC
– Use a Java profiler to question every hotspot
– Remove hotspots one by one
• Hadoop
– Data Skew
– MapReduce Job Tuning
– …
• Algorithm
– HyperLogLog
26
Open Source Adoption
• Open source software is budget-free, but isn’t bug-free
• We fixed lots of bugs
– Calcite
– Trev4j
– HyperLogLog
– …
27
28
How to onboard a product?
Customer is 1st priority
• Work closely with customers
– Help customers design cubes
– Refine requirements to reduce complexity
• Because we make the impossible possible
• Fix bugs quickly
– Fixing product bugs is more important than feature development
• Continuous Improvement
– Prioritize customer requirements
– Give a workable solution quickly, then improve it later.
• Specific requirements vs. generic requirements
– Do your best to give a generic solution to a specific requirement
– Say NO to very specific solutions
29
2+ Cases for Product Verification
• To develop a good product, we need at least 2 use cases to verify and finalize our design.
• Try to find different use cases to verify the product
– Transaction Data
– Behavior Data
30
Usability is Key
• Usability is key for customer onboarding
• Easy-to-use GUI to hide complex concepts
• …
31
Q & A
32