Intro to Python Data Analysis in Wakari

Karissa McKelvey, Software Developer, Continuum Analytics (@karissamck). November 8, 2013, PyData NYC.

Upload: karissa-rae-mckelvey

Post on 26-Jan-2015


DESCRIPTION

Outlines the vision and philosophy for Wakari.io with a basic overview of popular Python data analysis packages. Most of the talk is conducted in Wakari and is not visible on these slides. 90 minutes for PyData NYC, November 8th, 2013.

TRANSCRIPT

Page 1: Intro to Python Data Analysis in Wakari

Intro to Python Data Analysis in Wakari

Karissa McKelvey

Software Developer, Continuum Analytics

@karissamck

November 8, 2013

PyData NYC

Page 2: Intro to Python Data Analysis in Wakari

$ WHOAMI

karissamck.com

@karissamck

Page 3: Intro to Python Data Analysis in Wakari

truthy.indiana.edu

Page 4: Intro to Python Data Analysis in Wakari

More Tweets, More Votes

Page 5: Intro to Python Data Analysis in Wakari

MY GOALS

Get you excited about data analysis in Wakari

Walk through some basic analysis packages and Wakari workflows

Kick-start your journey

Page 6: Intro to Python Data Analysis in Wakari

WHO ARE YOU?

Page 14: Intro to Python Data Analysis in Wakari

Putting Science back in Comp Sci

• Much of the software stack is for systems programming: C++, Java, .NET, ObjC, web
  - Complex numbers?
  - Vectorized primitives?

• Software stack for scientists is not as helpful as it should be

• Fortran is still where many scientists end up

Page 16: Intro to Python Data Analysis in Wakari

Why Python?

Page 17: Intro to Python Data Analysis in Wakari

High Performance with BIG DATA

Page 18: Intro to Python Data Analysis in Wakari

Packages for data analysis and visualization

Page 19: Intro to Python Data Analysis in Wakari

Syntax – Gets out of your way

Page 20: Intro to Python Data Analysis in Wakari

Community Driven

Page 21: Intro to Python Data Analysis in Wakari

Ready for web applications, too.

Page 25: Intro to Python Data Analysis in Wakari

• “Python is good for data cleanup, R for statistical models”

• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”

• “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python

Page 26: Intro to Python Data Analysis in Wakari

Ready for DATA, and then some

“You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

Page 28: Intro to Python Data Analysis in Wakari

Numba: just-in-time compiler to LLVM through @decorators*

numba.pydata.org

*aka, fast. easy.
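
As a rough illustration of the decorator idea on this slide (not code from the talk), the sketch below applies Numba's `njit` decorator to a plain Python loop. The function name `estimate_pi` and the numerical task are hypothetical, chosen only to show a loop-heavy computation. The `ImportError` fallback is an assumption added so the example still runs, uncompiled, where Numba is not installed.

```python
try:
    from numba import njit  # Numba's "nopython" JIT decorator
except ImportError:
    # Assumption for portability: a no-op stand-in so the example
    # still runs (without compilation) when Numba is unavailable.
    def njit(func):
        return func


@njit
def estimate_pi(n_steps):
    """Midpoint-rule estimate of pi: integrate 4 / (1 + x^2) over [0, 1]."""
    total = 0.0
    width = 1.0 / n_steps
    for i in range(n_steps):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


# With Numba installed, the first call triggers compilation;
# later calls reuse the compiled machine code.
approx = estimate_pi(1_000_000)
print(approx)
```

The function body is ordinary Python, so the only change needed to opt in is the decorator line.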

Page 30: Intro to Python Data Analysis in Wakari

Basic packages for data analysis and visualization

Page 31: Intro to Python Data Analysis in Wakari

NumPy: The foundation of the Python Data Analysis stack

Page 32: Intro to Python Data Analysis in Wakari

NumPy: Array-oriented
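
A minimal sketch of what "array-oriented" can look like in practice, assuming made-up sample values (the variable names and numbers are illustrative, not from the talk). Operations apply to whole arrays at once instead of looping over elements, and complex numbers are an ordinary element type.

```python
import numpy as np

# Made-up sample readings in degrees Celsius (illustrative only).
temps_c = np.array([12.5, 15.0, 9.8, 21.3])

# Arithmetic is applied elementwise to the whole array, no explicit loop.
temps_f = temps_c * 9.0 / 5.0 + 32.0

# A boolean mask selects the elements that satisfy a condition.
warm = temps_c[temps_c > 12.0]

# Complex numbers are a built-in dtype; np.abs returns their magnitudes.
z = np.array([1 + 2j, 3 - 1j])
magnitudes = np.abs(z)

print(temps_f, warm, magnitudes)
```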

Page 36: Intro to Python Data Analysis in Wakari

Pandas: Builds upon NumPy
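
One way to picture "builds upon NumPy" is a small labeled table whose columns can be handed back as NumPy arrays. The sketch below uses made-up data and hypothetical column names (`city`, `temp_c`); it is not taken from the talk.

```python
import numpy as np
import pandas as pd

# Made-up readings (illustrative only): labeled columns in a DataFrame.
df = pd.DataFrame({
    "city": ["NYC", "Austin", "NYC", "Austin"],
    "temp_c": [10.0, 24.0, 14.0, 26.0],
})

# Group rows by label and aggregate each group.
mean_by_city = df.groupby("city")["temp_c"].mean()

# A column's values can be retrieved as a plain NumPy array.
values = df["temp_c"].to_numpy()

print(mean_by_city)
print(type(values), values.dtype)
```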

Page 37: Intro to Python Data Analysis in Wakari

Matplotlib: 2D plotting library

Page 38: Intro to Python Data Analysis in Wakari

IPython: Interactive Python (+ in the Web)

tab completion

magic %-commands

Inline plots

Page 39: Intro to Python Data Analysis in Wakari

Anaconda: pulls it all together

Page 41: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 42: Intro to Python Data Analysis in Wakari

Share files, IPython notebooks, and plots with pay-as-you-go compute

IPython Notebook

Scientific Packages

Terminal

Page 43: Intro to Python Data Analysis in Wakari

Sharing in Wakari

• Packages IPython notebooks, files, folders, data, and environment

• Get a link

• Share that link.

Page 44: Intro to Python Data Analysis in Wakari

Reproducible Research

Page 46: Intro to Python Data Analysis in Wakari

“A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”

Page 47: Intro to Python Data Analysis in Wakari

How do we replicate research today?

Page 48: Intro to Python Data Analysis in Wakari

How do we replicate research today?

collaborate on

Page 49: Intro to Python Data Analysis in Wakari

How do we replicate research today?

collaborate on data analysis

Page 50: Intro to Python Data Analysis in Wakari

How do we collaborate today?

Page 54: Intro to Python Data Analysis in Wakari

????????

Page 55: Intro to Python Data Analysis in Wakari

How do we replicate research today?

Page 56: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 57: Intro to Python Data Analysis in Wakari

Enterprise or Cloud

Online at wakari.io or install locally for access to your hardware and data

Page 58: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 59: Intro to Python Data Analysis in Wakari

Coming Soon

Page 60: Intro to Python Data Analysis in Wakari

Project-based interaction

Projects starting at $10/month with unlimited team members

Page 61: Intro to Python Data Analysis in Wakari

Interactive Plotting

Next-generation collaborative data manipulation, analysis, and presentation

Page 62: Intro to Python Data Analysis in Wakari

Talks to see

• Jake VanderPlas (Washington): Efficient computing with NumPy
  - 29th Floor combo, 3pm (Right now, next door!)

• Julia Evans (N/A): A practical introduction to IPython Notebook & pandas
  - Here, 4:45pm.

Page 63: Intro to Python Data Analysis in Wakari

Talks to see

• Sarah Guido (Michigan): A Beginner’s Guide to Machine Learning with scikit-learn

• Imran Haque (Counsyl): Beyond the dict

• Peter Wang (Continuum): Bokeh Workshop

Page 64: Intro to Python Data Analysis in Wakari

Special Thanks

Ben Zaitlin
Mark Florisson
Clayton Davis
Bryan Van de Ven
Travis Oliphant

Page 65: Intro to Python Data Analysis in Wakari

Karissa McKelvey

@karissamck