data visualization in python - github pages · biocomputing bootcamp 2016 matplotlib • resulting...

17
Data visualization in python Day 2

Upload: others

Post on 30-Oct-2019

42 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Data visualization in python

Day 2

Page 2: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

A variety of packages and philosophies• (today) matplotlib: http://matplotlib.org/

– Gallery: http://matplotlib.org/gallery.html– Frequently used commands:

http://matplotlib.org/api/pyplot_summary.html

• Seaborn: http://stanford.edu/~mwaskom/software/seaborn/

• ggplot: – R version: http://docs.ggplot2.org/current/– Python port: http://ggplot.yhathq.com/

• Bokeh (live plots in your browser) – http://bokeh.pydata.org/en/latest/

Page 3: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Matplotlib• Gallery: http://matplotlib.org/gallery.html• Top commands: http://matplotlib.org/api/pyplot_summary.html• Provides "pylab" API, a mimic of matlab• Many different graph types and options, some obscure

Page 4: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Matplotlib• Resulting plots

represented by python objects, from entire figure down to individual points/lines.

• Large API allows any aspect to be tweaked

• Lengthy coding sometimes required to make a plot "just so"

Page 5: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Seaborn

• https://stanford.edu/~mwaskom/software/seaborn/• Implements more complex plot types

– Joint points, clustergrams, fitted linear models

• Uses matplotlib "under the hood"

Page 6: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Others

• ggplot: – (Original) R version: http://docs.ggplot2.org/current/– A recent python port: http://ggplot.yhathq.com/– We'll discuss this on the R side tomorrow, both the basics of

both work similarly.

• Bokeh (live plots in your browser) – http://bokeh.pydata.org/en/latest/

• Plotting functionality built-in to pandas– http://pandas.pydata.org/pandas-

docs/stable/visualization.html

Page 7: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Using matplotlib

• This 'magic' command tells ipython:– Load matplotlib (import as the alias "mpl")– Load the pyplot interface (as "plt"), which approximates the

plotting functionality and syntax of MATLAB Put the output inline with notebook results (rather than saving to file, opening a new window, etc)

• What if we're not using ipython notebook?

In[1]: %pylab inline

All the magic commands:https://ipython.org/ipython-doc/3/interactive/magics.html

import matplotlib as mplimport pyplot as pltimport numpy as np

Page 8: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Generate some data to plot

• Draw 100 samples into x from N(0, 10)• Draw 100 samples into y from N(20, 2)• Set z = 3 times y plus x plus N(0, 1)

• Inspect sample mean and standard deviation using numpy functions mean, std:

>>> print 'x mean: ',np.mean(x) >>> print 'x std: ',np.std(x)x mean: 0.0820478565308 x std: 9.9856477737

Page 9: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Scatterplots

• plt.scatter• plt.title• plt.xlabel• plt.ylabel

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter

Page 10: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Overlay multiple series on a single plot

• Simply issue more than one plotting command in a row

• Just a few of the parameters you can customize:– marker– color (for other plot types,

edgecolor, fillcolor)– label– Size

• plt.legend() adds a legend

Page 11: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Adjacent plots

>>> my_figure, my_axes = plt.subplots( 1, 2, sharey=True, sharex=True )

>>> my_axes[0].scatter( … )# ...

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplots

0 1

Page 12: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Boxplots

• plt.boxplot(…)

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot

Page 13: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Histograms

• plt.hist( … )

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist

Page 14: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Why are these binned differently?

What's all this?

Page 15: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Check the manual…

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist

In (required or optional)

3 things out(besides a plot)

Page 16: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

Get bin boundaries from 1st hist, use in 2nd_ = something(…) here,

means call function something (or interpret some expression), get the result, and then toss it (don't put in a variable)

Page 17: Data visualization in python - GitHub Pages · Biocomputing Bootcamp 2016 Matplotlib • Resulting plots represented by python objects, from entire figure down to individual points/lines

Biocomputing Bootcamp 2016

No fill color – can see through overlapping bins