Cloudera Impala

Post on 25-Dec-2014

DESCRIPTION

Cloudera Impala provides a fast, ad hoc query capability to Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale. Presented to the Portland Big Data User Group on July 23, 2014. http://www.meetup.com/Hadoop-Portland/events/194930422/

TRANSCRIPT

  • 1. Cloudera Impala. Portland Big Data User Group, July 2014. Alex Moundalexis, @technmsg
  • 2. Thirty Seconds About Alex: Solutions Architect, aka consultant; government infrastructure; former coder of Perl; former administrator; fan of Portland
  • 3. What Does Cloudera Do? product: distribution of Hadoop components, Apache licensed; enterprise tooling; support; training; services (aka consulting); community
  • 4. Disclaimer: Cloudera builds things (software): most donated to Apache, some closed-source. Cloudera products I reference are open source: Apache Licensed, source code is on GitHub, https://github.com/cloudera
  • 5. What This Talk Isn't About: deploying (Puppet, Chef, Ansible, homegrown scripts, intern labor); sizing & tuning (depends heavily on data and workload); coding (unless you count XML or CSV or SQL); algorithms
  • 6. Public Domain IFCAR
  • 7. CC BY-SA Lilian De Cassai
  • 8. cloudera impala /kloudi()r impal/ noun: a modern, open source, MPP SQL query engine for Apache Hadoop. Cloudera Impala provides fast, ad hoc SQL query capability for Apache Hadoop, complementing traditional MapReduce batch processing.
  • 9. Quick and dirty, for context. The Apache Hadoop Ecosystem
  • 10. Why Ecosystem? In the beginning, just Hadoop: HDFS, MapReduce. Today, dozens of interrelated components: I/O, Processing, Specialty Applications, Configuration, Workflow
  • 11. HDFS: Distributed, highly fault-tolerant filesystem. Optimized for large streaming access to data. Based on Google File System, http://research.google.com/archive/gfs.html
  • 12. Lots of Commodity Machines. Image: Yahoo! Hadoop cluster [OSCON 07]
  • 13. MapReduce (MR): Programming paradigm. Batch oriented, not realtime. Works well with distributed computing. Lots of Java, but other languages supported. Based on Google's paper, http://research.google.com/archive/mapreduce.html
  • 14. Under the Covers
  • 15. You specify map() and reduce() functions. The framework does the rest.
  • 16. Apache Hive: Abstraction of Hadoop's Java API. HiveQL compiles down to MR; a SQL-like language. Eases analysis using MapReduce
  • 17. Apache Hive Metastore: Maps HDFS files to DB-like resources: Databases; Tables; Column/field names, data types; Roles/users; InputFormat/OutputFormat
  • 18. WHY DO WE NEED THIS? But wait
  • 19. 19
  • 20. I am not a SQL wizard by any means. Super Shady SQL Supplement
  • 21. A Simple Relational Database. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: >
  • 22. Interacting with Relational Data. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT * FROM people;
  • 23. Interacting with Relational Data. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT * FROM people;
  • 24. Requesting Specific Fields. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT name, state FROM people;
  • 25. Requesting Specific Fields. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT name, state FROM people;
  • 26. Requesting Specific Rows. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT name, state FROM people WHERE year < 2012;
  • 27. Requesting Specific Rows. Table (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT name, state FROM people WHERE year < 2012;
  • 28. Two Simple Tables. Table pets (owner, species, name): Alex, Cactus, Marvin; Joey, Cat, Brain; Sean, None; Paris, Unknown. Table people (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: >
  • 29. Joining Two Tables. Table pets (owner, species, name): Alex, Cactus, Marvin; Joey, Cat, Brain; Sean, None; Paris, Unknown. Table people (name, state, employer, year): Alex, Maryland, Cloudera, 2013; Joey, Maryland, Cloudera, 2011; Sean, Texas, Cloudera, 2013; Paris, Maryland, AOL, 2011. Query: > SELECT people.name AS owner, people.state AS state, pets.name AS pet FROM people LEFT JOIN pets ON people.name = pets.owner
  • 30. Joining Two Tables. Table pets (owner, species, name): Alex, Cactus, Marvin; Joey, Cat, Brain; Sean, None; Paris, Unknown. Query: > SELECT people.name AS owner, people.state AS state, pets.name AS pet FROM people LEFT JOIN pets ON people.name = pets.owner Table people (name, state, employer, year): Alex, Maryland
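
The queries on slides 22 through 30 can be tried without a cluster. What follows is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a real SQL engine; SQLite is not Impala, and it is used here only because the statements on the slides are plain SQL. The helper names build_demo_db and run_query and the QUERIES dictionary are hypothetical, introduced for illustration. The rows are the ones shown on the slides, and the two pets with no name on the slide are stored as NULL, which is an assumption about what the blank cells mean.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with the two tables from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
    )
    conn.execute("CREATE TABLE pets (owner TEXT, species TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        [
            ("Alex", "Maryland", "Cloudera", 2013),
            ("Joey", "Maryland", "Cloudera", 2011),
            ("Sean", "Texas", "Cloudera", 2013),
            ("Paris", "Maryland", "AOL", 2011),
        ],
    )
    # Assumption: a blank pet name on the slide is stored as NULL (None).
    conn.executemany(
        "INSERT INTO pets VALUES (?, ?, ?)",
        [
            ("Alex", "Cactus", "Marvin"),
            ("Joey", "Cat", "Brain"),
            ("Sean", "None", None),
            ("Paris", "Unknown", None),
        ],
    )
    return conn


# The four statements from the slides, keyed by what each slide demonstrates.
QUERIES = {
    "all rows": "SELECT * FROM people",
    "specific fields": "SELECT name, state FROM people",
    "specific rows": "SELECT name, state FROM people WHERE year < 2012",
    "join": (
        "SELECT people.name AS owner, people.state AS state, pets.name AS pet "
        "FROM people LEFT JOIN pets ON people.name = pets.owner"
    ),
}


def run_query(conn, sql):
    """Execute one statement and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in QUERIES.items():
        print(label)
        for row in run_query(conn, sql):
            print("   ", row)
```

The LEFT JOIN keeps every row of people even when the matching pets row has no name, so Sean and Paris still appear in the result with an empty pet column. Row order is not guaranteed without an ORDER BY clause, so sort the results before comparing them.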
