Use dependency injection to get Hadoop *out* of your application code

Opower @ Hadoop Summit North America Use Dependency Injection to get Hadoop out of your application code June 27, 2013 Eric Chang Technology Lead, Data Services Opower

Upload: hadoop-summit

Post on 20-Aug-2015


TRANSCRIPT

  1. Opower @ Hadoop Summit North America: Use Dependency Injection to get Hadoop out of your application code. June 27, 2013. Eric Chang, Technology Lead, Data Services, Opower
  2. OPOWER CONFIDENTIAL: DO NOT DISTRIBUTE
     Agenda
     1. Problem Statement
     2. Solution
     3. Example
     4. Opower case study
     5. Wrap up
  3. Problem Statement: "Hadoop is hard, let's go shopping!", or Effective Separation of Concerns in Hadoop
  4. Problem Statement
     Why Separation of Concerns?
     - Integration/migration of existing code
     - Allows for code re-use
     - Allows for different levels of expertise
     - Greater testability
     Hadoop doesn't do Separation of Concerns:
     - Serialization, input/output formats, and partitioning are not portable
     - Provides little guidance or out-of-the-box functionality for integrating code components (existing or new)
  6. Solution: Dependency Injection, or "Don't call us, we'll call you"
  7. Solution: DI, illustrated
     [Diagram: aRealtimeCallFromTheWeb() { businessService.run() } runs inside an IoC container that wires BizServiceImpl to a RealtimeReadDAO and a RealtimeWriteDAO, both backed by a Realtime DataStore.]
  8. Solution: DI, illustrated
     [Diagram: reduce(key, values, context) { businessService.run() } runs inside an IoC container that wires BizServiceImpl to a ValuesBackedReadDAO and a ContextBackedWriteDAO.]
  10. Example: Small-batch, Artisanal WordCount -> Petabyte-scale WordCount* (*healthy suspension of disbelief required). Reference: http://wiki.apache.org/hadoop/WordCount
  11. Example: Artisanal WordCount
      - You live in a borough of NYC and have a beard
      - You've built a great business around counting words, one at a time, in small, handcrafted batches in linear O(n) time
      - You receive files from customers and run your simple but effective code
      - You had the foresight to know that some day you would need to scale up, so you created a properly componentized architecture:
        - Domain objects
        - Data access layer
        - Service layer (application logic)
  12. Example: Artisanal WordCount
      [Class diagram]
      - WordCountDTO: word : String, count : int
      - ArtisanalWordCountDAO: getWords() : Iterable, writeWordCount(count : WordCountDTO)
      - WordCountServiceImpl: countWord(word : String)
      Flow: 1. Retrieve words, 2. Count words, 3. Write count
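
The class diagram translates fairly directly into Java types. The following is a minimal sketch, not the code from the talk: the getter names, the `Iterable<String>` element type, and the null check are assumptions, since the slide only shows field and method names.

```java
import java.util.Objects;

/** Domain object from the class diagram: a word and how often it occurred. */
final class WordCountDTO {
    private final String word;
    private final int count;

    WordCountDTO(String word, int count) {
        this.word = Objects.requireNonNull(word, "word");
        this.count = count;
    }

    // Getter names are assumed; the slide only lists the two fields.
    String getWord() { return word; }
    int getCount() { return count; }
}

/** Data access contract: the only layer that knows where words come from and go to. */
interface WordCountDAO {
    Iterable<String> getWords();              // step 1: retrieve words
    void writeWordCount(WordCountDTO count);  // step 3: write count
}

/** Service contract: application logic that never sees files or Hadoop types. */
interface WordCountService {
    void countWord(String word);              // step 2: count words
}
```

Keeping storage details behind `WordCountDAO` is what later lets a different DAO be swapped in without touching the service.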
  13. Example: Artisanal WordCount
      Core business logic: WordCountServiceImpl.countWord()

          public void countWord(String word) {
              // Scan every word the DAO provides and count exact matches.
              int wordCount = 0;
              for (String nextWord : wordCountDAO.getWords()) {
                  if (nextWord.equals(word)) {
                      ++wordCount;
                  }
              }
              // Hand the result back to the DAO; the service never sees the storage.
              WordCountDTO wordCountDTO = new WordCountDTO(word, wordCount);
              wordCountDAO.writeWordCount(wordCountDTO);
          }
  14. Example: Artisanal WordCount
      IoC configuration (Google Guice)

          public class WordCountGuiceModule extends AbstractModule {
              ...
              @Override
              protected void configure() {
                  bind(WordCountService.class)
                      .to(WordCountServiceImpl.class);
                  bind(WordCountDAO.class)
                      .toInstance(this.wordCountDAO);
              }
          }
  15. Example: Artisanal WordCount
      Artisanal WordCount wiring and execution

          // Build the DAO from the input and output files, then let Guice wire the service.
          WordCountDAO wordCountDAO =
              new ArtisanalWordCountDAO(inFile, outFile);
          WordCountService wordCountService = Guice.createInjector(
              new WordCountGuiceModule(wordCountDAO)
          ).getInstance(WordCountService.class);
          for (String word : getWordsToCount()) {
              wordCountService.countWord(word);
          }
  16. Example: Artisanal WordCount
      [Diagram: artisanalWordCount() { wordCountService.countWord("hat") } runs inside the IoC container, which wires WordCountServiceImpl to ArtisanalWordCountDAO. Input words: bat cat hat mat hat sat rat.]
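
The run in the diagram can be reproduced without files or a container. This is a sketch under stated assumptions: it uses hand-written constructor injection in place of Guice, `InMemoryWordCountDAO` is a hypothetical test double that does not appear in the slides, and the types are nested in one class only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArtisanalSketch {
    // Compact copies of the slide's types so the sketch is self-contained.
    static final class WordCountDTO {
        final String word;
        final int count;
        WordCountDTO(String word, int count) { this.word = word; this.count = count; }
    }

    interface WordCountDAO {
        Iterable<String> getWords();
        void writeWordCount(WordCountDTO count);
    }

    // Same counting logic as the slide; the DAO arrives through the constructor.
    static final class WordCountServiceImpl {
        private final WordCountDAO wordCountDAO;

        WordCountServiceImpl(WordCountDAO wordCountDAO) { this.wordCountDAO = wordCountDAO; }

        void countWord(String word) {
            int wordCount = 0;
            for (String nextWord : wordCountDAO.getWords()) {
                if (nextWord.equals(word)) {
                    ++wordCount;
                }
            }
            wordCountDAO.writeWordCount(new WordCountDTO(word, wordCount));
        }
    }

    // Hypothetical test double: reads from a list and records writes in memory.
    static final class InMemoryWordCountDAO implements WordCountDAO {
        private final List<String> words;
        final List<WordCountDTO> written = new ArrayList<>();

        InMemoryWordCountDAO(List<String> words) { this.words = words; }

        @Override public Iterable<String> getWords() { return words; }
        @Override public void writeWordCount(WordCountDTO count) { written.add(count); }
    }

    /** Counts "hat" in the diagram's sample input and returns what the DAO received. */
    static WordCountDTO countHat() {
        InMemoryWordCountDAO dao = new InMemoryWordCountDAO(
                Arrays.asList("bat", "cat", "hat", "mat", "hat", "sat", "rat"));
        new WordCountServiceImpl(dao).countWord("hat");
        return dao.written.get(0);
    }
}
```

Because the service depends only on the DAO contract, the same double can serve as a unit-test fixture for the business logic.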
  17. Example: Artisanal WordCount
  18. Example: Petabyte WordCount
      - Indie days are over: petabytes of words! O(n) won't cut it
      - Hadoop to the rescue. You partition by word in your map phase. Your reduce method looks like:
        public void reduce(Text key, Iterable values, Context context)
      - MapReduceWordCountDAO fulfills the WordCountDAO contract (more on this later)
      - WordCountDTOs are written to an MR context and collected
  19. Example: Petabyte WordCount
      [Diagram: reduce(key, values, context) { wordCountService.countWord(key.toString()) } runs inside the IoC container, which wires WordCountServiceImpl to MapReduceWordCountDAO. Input words: bat cat hat mat hat sat cat. Output keys: bat, cat, hat.]
  20. Example: Petabyte WordCount
      Petabyte WordCount wiring and execution

          public void reduce(Text key, Iterable values, Context ctx) {
              // Only the DAO changes; the Guice module and the service are reused.
              MapReduceWordCountDAO wordCountDAO =
                  new MapReduceWordCountDAO(key, values, ctx);
              WordCountService wordCountService = Guice.createInjector(
                  new WordCountGuiceModule(wordCountDAO)
              ).getInstance(WordCountService.class);
              wordCountService.countWord(key.toString());
          }
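
The slides do not show the body of `MapReduceWordCountDAO`, so the following is only a guess at its shape: reads are served from the reducer's values and writes are forwarded to the job's output. To keep the sketch runnable without Hadoop on the classpath, a `BiConsumer` stands in for the MapReduce context and plain `String` values stand in for Hadoop writables; both are assumptions, as is dropping the `key` constructor argument that the slide passes.

```java
import java.util.function.BiConsumer;

final class PetabyteSketch {
    // Compact copies of the slide's types so the sketch is self-contained.
    static final class WordCountDTO {
        final String word;
        final int count;
        WordCountDTO(String word, int count) { this.word = word; this.count = count; }
    }

    interface WordCountDAO {
        Iterable<String> getWords();
        void writeWordCount(WordCountDTO count);
    }

    /**
     * Hypothetical reduce-side DAO. In a real job the sink would delegate to the
     * reducer context's write call; here it is a plain BiConsumer stand-in.
     */
    static final class MapReduceWordCountDAO implements WordCountDAO {
        private final Iterable<String> values;
        private final BiConsumer<String, Integer> sink;

        MapReduceWordCountDAO(Iterable<String> values, BiConsumer<String, Integer> sink) {
            this.values = values;
            this.sink = sink;
        }

        @Override public Iterable<String> getWords() { return values; }

        @Override public void writeWordCount(WordCountDTO count) {
            sink.accept(count.word, count.count);
        }
    }

    // Unchanged business logic: it cannot tell a file from a reducer.
    static final class WordCountServiceImpl {
        private final WordCountDAO wordCountDAO;

        WordCountServiceImpl(WordCountDAO wordCountDAO) { this.wordCountDAO = wordCountDAO; }

        void countWord(String word) {
            int wordCount = 0;
            for (String nextWord : wordCountDAO.getWords()) {
                if (nextWord.equals(word)) {
                    ++wordCount;
                }
            }
            wordCountDAO.writeWordCount(new WordCountDTO(word, wordCount));
        }
    }

    /** Stand-in for reduce(key, values, context): wire the DAO, then run the service. */
    static void reduce(String key, Iterable<String> values, BiConsumer<String, Integer> sink) {
        WordCountDAO dao = new MapReduceWordCountDAO(values, sink);
        new WordCountServiceImpl(dao).countWord(key);
    }
}
```

Since the map phase partitions by word, each reduce call sees only the occurrences of its own key, so the linear scan in `countWord` touches one partition rather than the whole input.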
  21. Example: Petabyte WordCount
  23. Opower case study: Bill Projection
      Opower in 5 bullet points: We help people like you and me reduce energy usage by working with utility companies to analyze energy usage and provide actionable insights. One of the ways we do this is via Bill Projection.
  24. Opower case study: Bill Projection
      How it works:
      - Retrieve energy usage (kWh, therms)
      - Forecast usage
      - Apply rates to project costs
      [Diagram labels: Rate Engine, rates, $30]
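
The three steps can be sketched as a small service with its collaborators injected. Everything below is illustrative: the slides do not show Opower's interfaces, forecasting model, or rates, so `getDailyUsageKwh`, `costFor`, the average-day extrapolation, and the flat rate used in the usage note are placeholders. Only the names `UsageDAO`, `RateEngine`, and `forecast` echo labels from the case-study diagrams.

```java
import java.util.List;

final class BillProjectionSketch {
    /** Hypothetical read-side DAO: daily usage so far in the billing period, in kWh. */
    interface UsageDAO {
        List<Double> getDailyUsageKwh();
    }

    /** Hypothetical rate engine: converts a usage total into a cost. */
    interface RateEngine {
        double costFor(double kwh);
    }

    static final class BillForecastService {
        private final UsageDAO usageDAO;
        private final RateEngine rateEngine;

        // Both collaborators are injected, so batch and real-time callers can differ.
        BillForecastService(UsageDAO usageDAO, RateEngine rateEngine) {
            this.usageDAO = usageDAO;
            this.rateEngine = rateEngine;
        }

        /** Retrieve usage, extrapolate to the full period, then apply rates. */
        double forecast(int daysInPeriod) {
            List<Double> daily = usageDAO.getDailyUsageKwh();   // 1. retrieve usage
            if (daily.isEmpty()) {
                throw new IllegalStateException("no usage data to forecast from");
            }
            double total = 0.0;
            for (double kwh : daily) {
                total += kwh;
            }
            // 2. forecast usage (placeholder model: average day times period length)
            double projectedKwh = total / daily.size() * daysInPeriod;
            return rateEngine.costFor(projectedKwh);            // 3. apply rates
        }
    }
}
```

For example, three days at 10 kWh each, a 30-day period, and an assumed flat rate of 0.10 per kWh project 300 kWh and a cost of about 30.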
  25. Opower case study: Bill Projection
      DI is used to employ the same code components for batch and in-process, synchronous (real-time) calculations.
      [Diagram labels: Batch M/R calculations; In-process calculations; web, email, sms, ivr; Bill Projection code components; Curated data inputs; Results validation]
  26. Opower case study: Bill Projection
      [Diagram: map() reads from HBase; reduce(key, values, context) { billForecastService.forecast() } runs inside a Spring IoC container. Container labels as extracted: BillForecastServiceImpl, RateEngineImpl, MapReduceDAOMRUsageDAO.]
  27. Opower case study: Bill Projection
      [Diagram: calculateBillProjection() { billForecastService.forecast() } runs inside a Spring IoC container backed by HBase. Container labels as extracted: BillForecastServiceImpl, RateEngineImpl, MapReduceDAOHBaseUsageDAO.]
  28. Opower case study: Bill Projection
      Benefits of the DI solution:
      - We're able to use the pre-Hadoop Rate Engine code component
      - Calculations can be applied in batch and/or in real time
      - Good test coverage
  30. Wrap up
      Dependency Injection + Hadoop gives you:
      - Separation of Concerns
      - Batch and real-time calculations using the same code
      Some limitations:
      - Code must be sufficiently componentized
      - Assumes domain classes can survive MR partitioning
      - Somebody still has to know MR
      Opower employs DI + Hadoop to serve up Bill Projections using a mixed batch + real-time workflow
  31. Wrap up
      Questions?
      Eric Chang, Technical Lead, Data Services, Opower
      http://www.linkedin.com/in/ericgchang
      Artisanal WordCount example: https://github.com/opower/artisanal-word-count