Intro to Python Data Analysis in Wakari
DESCRIPTION
Outlines the vision and philosophy for Wakari.io with a basic overview of popular Python data analysis packages. Most of the talk is conducted in Wakari and is not visible on these slides. 90 minutes for PyData NYC, November 8th 2013.
TRANSCRIPT
Intro to Python Data Analysis in Wakari
Karissa McKelvey
Software Developer, Continuum Analytics
@karissamck
November 8, 2013
PyData NYC
$ WHOAMI
karissamck.com
@karissamck
truthy.indiana.edu
More Tweets, More Votes
MY GOALS
Get you excited about data analysis in Wakari
Walk through some basic analysis packages and Wakari workflows
Kick-start your journey
WHO ARE YOU?
Putting Science back in Comp Sci
• Much of the software stack is for systems programming: C++, Java, .NET, ObjC, web
  - Complex numbers?
  - Vectorized primitives?
• Software stack for scientists is not as helpful as it should be
• Fortran is still where many scientists end up
Why Python?
High Performance with BIG DATA
Packages for data analysis and visualization
Syntax – Gets out of your way
Community Driven
Ready for web applications, too.
• “Python is good for data cleanup, R for statistical models”
• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”
• “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python
Ready for DATA, and then some
“You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
Numba: just-in-time compiler to LLVM through @decorators*
numba.pydata.org
* aka fast, easy.
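A minimal sketch of the decorator idea, not taken from the talk itself. It assumes a current Numba release, where `njit` is the usual entry point (the 2013-era API differed); the function name `sum_of_squares` and the sample data are hypothetical. A fallback keeps the example runnable when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: fall back to plain Python so the
    # sketch still runs (without any speedup) when Numba is missing.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop like this is slow in pure Python;
    # Numba compiles it to machine code on the first call.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1000, dtype=np.float64)
result = sum_of_squares(data)
print(result)
```

The only change to the function is the decorator line, which is the "easy" part of the slide.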
Basic packages for data analysis and visualization
NumPy: The foundation of the Python Data Analysis stack
NumPy: Array-oriented
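To make "array-oriented" concrete, here is a small sketch with made-up temperature readings (the data and variable names are illustrative, not from the talk). Each expression operates on the whole array at once, with no explicit loop.

```python
import numpy as np

# Hypothetical sample data: five temperature readings in Celsius.
celsius = np.array([12.5, 18.0, 21.3, 9.8, 15.1])

# One vectorized expression converts every element.
fahrenheit = celsius * 9 / 5 + 32

# Boolean masks select elements without a loop.
warm = celsius[celsius > 15]

# Complex numbers are a native array dtype.
z = np.array([1 + 2j, 3 - 1j])
magnitudes = np.abs(z)

print(fahrenheit)
print(warm)
print(magnitudes)
```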
Pandas: Builds upon NumPy
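A short sketch of what building on NumPy looks like in practice: a labeled table whose columns are backed by arrays. The table contents below are invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "Austin", "Austin"],
    "year": [2012, 2013, 2012, 2013],
    "attendees": [200, 350, 150, 180],
})

# Filter rows with a boolean mask, just as with a NumPy array.
recent = df[df["year"] == 2013]

# Group by a label column and aggregate.
totals = df.groupby("city")["attendees"].sum()

print(recent)
print(totals)
```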
Matplotlib: 2D plotting library
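A minimal plotting sketch, assuming a script run without a display (hence the `Agg` backend); in an IPython notebook the figure would appear inline instead. The output filename `waves.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Write the figure to a file in the working directory.
fig.savefig("waves.png")
```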
IPython: Interactive Python (+ in the Web)
tab completion
magic %-commands
Inline plots
Anaconda: pulls it all together
wakari.io Browser-based Python & Linux environment
Share files, IPython notebooks, and plots with pay-as-you-go compute
IPython Notebook
Scientific Packages
Terminal
Sharing in Wakari
• Packages IPython notebooks, files, folders, data, and environment
• Get a link
• Share that link.
Reproducible Research
“A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”
How do we replicate research today?
How do we collaborate on data analysis today?
How do we collaborate today?
????????
How do we replicate research today?
wakari.io Browser-based Python & Linux environment
Enterprise or Cloud
Online at wakari.io or install locally for access to your hardware and data
wakari.io Browser-based Python & Linux environment
Coming Soon
Project-based interaction
Projects starting at $10/month with unlimited team members
user
Interactive Plotting
Next-generation collaborative data manipulation, analysis, and presentation
Talks to see
• Jake VanderPlas (Washington) – Efficient computing with NumPy
  • 29th Floor combo, 3pm (right now, next door!)
• Julia Evans (N/A) – A practical introduction to IPython Notebook & pandas
  • Here, 4:45pm.
Talks to see
• Sarah Guido (Michigan) – A Beginner’s Guide to Machine Learning with scikit-learn
• Imran Haque (Counsyl) – Beyond the dict
• Peter Wang (Continuum) – Bokeh Workshop
Special Thanks
Ben Zaitlin
Mark Florisson
Clayton Davis
Bryan Van de Ven
Travis Oliphant
Karissa McKelvey
@karissamck