Pandas
Python for Data Analysis
By: Asst. Prof. Abhishek Singh Sambyal
Central University of Jammu
Pandas - outline
● Overview
● Purpose - Why?
● Terminology
● DataFrame
● DataFrame - Reindexing
● DataFrame - Summarising and Descriptive Statistics
● Bibliography
Pandas - overview
● Python Data Analysis Library, similar to:
○ R
○ MATLAB
● Combined with IPython toolkit
● Built on top of NumPy, SciPy and, to some extent, Matplotlib
● Open source - BSD License
● Key Component
○ DataFrame
Pandas - Why?
● Ideal tool for data scientists
● Munging data
● Cleaning data
● Analyzing data
● Modeling data
● Organizing the results of the analysis into a form
suitable for plotting or tabular display
Pandas - terminology
● IPython is a command shell for interactive computing in
multiple programming languages, especially focused on the
Python programming language, that offers enhanced
introspection, rich media, additional shell syntax, tab
completion, and rich history.
● NumPy is the fundamental package for scientific computing
with Python.
Pandas - terminology
● Matplotlib is a Python 2D plotting library which produces
publication-quality figures in a variety of hardcopy
formats and interactive environments across platforms.
● SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem
of open-source software for mathematics, science, and
engineering.
● Data Munging or Data Wrangling means taking data that's
stored in one format and changing it into another format.
Pandas - terminology
● The Cython programming language is a superset of Python with
a foreign function interface for invoking C/C++ routines
and the ability to declare the static type of subroutine
parameters and results, local variables, and class
attributes.
Pandas - DataFrame
Data structure
DataFrame data structure
● Spreadsheet-like data structure containing an ordered
collection of columns
● Has both a row and column index
● Can be thought of as a dict of Series (all sharing the same index)
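The "dict of Series" view can be illustrated with a minimal sketch. The column names and values below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import pandas as pd

# Hypothetical data: two Series that share the same row labels.
population = pd.Series([1.5, 1.7, 3.6], index=["a", "b", "c"])
area = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# Each dict key becomes a column; the shared labels become the row index.
frame = pd.DataFrame({"population": population, "area": area})

print(frame.index.tolist())    # row index
print(frame.columns.tolist())  # column index
print(type(frame["area"]))     # each column comes back as a Series
```

Here the DataFrame has both a row index and a column index, and each column is itself a Series aligned on the same row labels.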
DataFrame
● Creation with dict of equal-length lists
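The example from the original slide is not preserved in this transcript, so the following is only a sketch of creation from a dict of equal-length lists, using made-up data.

```python
import pandas as pd

# Hypothetical example data; every list has the same length (5).
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# Dict keys become column names; rows get a default integer index 0..4.
frame = pd.DataFrame(data)

# The columns argument fixes the column order explicitly.
frame2 = pd.DataFrame(data, columns=["year", "state", "pop"])
print(frame2)
```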
DataFrame
● Creation with dict of dicts
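As a sketch with made-up data (the slide's own example is not preserved here), a nested dict is interpreted with the outer keys as columns and the inner keys as row labels.

```python
import pandas as pd

# Hypothetical nested dict: outer keys -> columns, inner keys -> row labels.
pop = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6},
}
frame = pd.DataFrame(pop)

# The inner keys are combined to form the row index; a cell with no
# value in the nested dict (Nevada in 2000) is filled with NaN.
print(frame)
```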
DataFrame
● Columns can be retrieved as Series
○ dict notation
○ attribute notation
● Rows can be retrieved by position or by name (using the ix attribute; ix was removed in pandas 1.0 in favour of loc for labels and iloc for positions)
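A minimal sketch of both kinds of retrieval, with hypothetical data. It assumes a recent pandas release, so it uses loc and iloc rather than the ix attribute named on the slide.

```python
import pandas as pd

# Hypothetical data with string row labels.
frame = pd.DataFrame(
    {"state": ["Ohio", "Nevada", "Texas"], "population": [1.5, 2.4, 3.1]},
    index=["one", "two", "three"],
)

# Columns come back as Series.
col_dict = frame["state"]  # dict notation
col_attr = frame.state     # attribute notation (needs a valid identifier
                           # that does not clash with a DataFrame method)

# Rows also come back as Series, indexed by the column names.
row_by_name = frame.loc["two"]     # by label
row_by_position = frame.iloc[1]    # by integer position
print(row_by_name)
```

Dict notation is the safer default: it works for any column name, whereas attribute notation fails for names such as "pop" or "count" that are already DataFrame methods.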
DataFrame
● New columns can be added (by computation or direct assignment)
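Both ways of adding a column can be sketched as follows; the data and the column names ("debt", "large", "rank") are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": ["Ohio", "Nevada", "Texas"], "population": [1.5, 2.4, 3.1]}
)

# Direct assignment: a scalar is broadcast to every row.
frame["debt"] = 0.0

# Direct assignment: a list must match the number of rows.
frame["rank"] = [3, 2, 1]

# Computation: derive a new column from an existing one.
frame["large"] = frame["population"] > 2.0

print(frame)
```

New columns have to be created with dict notation; assigning to a new attribute name (frame.large = ...) does not add a column.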
DataFrame - Reindexing
● Creates a new object with the data conformed to a new index
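A small sketch of reindexing with hypothetical labels: reindex returns a new object, reorders existing rows to match the new index, and introduces missing values for labels that were not present.

```python
import pandas as pd

frame = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# "c" and "a" are reordered, "b" is dropped, "d" is new and gets NaN.
new = frame.reindex(["c", "a", "d"])

# fill_value supplies a default for labels missing from the original.
filled = frame.reindex(["c", "a", "d"], fill_value=0)

print(new)
print(filled)
print(frame)  # the original object is left unchanged
```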
DataFrame - Summarising and Descriptive Statistics
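The content of this slide is not preserved in the transcript. As an assumption about what it covered, the sketch below shows a few standard pandas reductions and the describe summary on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing some missing values.
frame = pd.DataFrame(
    {"one": [1.4, 7.1, np.nan, 0.75], "two": [np.nan, -4.5, np.nan, -1.3]},
    index=["a", "b", "c", "d"],
)

col_sums = frame.sum()           # one value per column; NaN is skipped
row_means = frame.mean(axis=1)   # one value per row; all-NaN rows give NaN
summary = frame.describe()       # count, mean, std, min, quartiles, max

print(col_sums)
print(row_means)
print(summary)
```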
Pandas - Bibliography
● Pandas: Python Data Analysis Library - http://pandas.pydata.org/
● IPython - http://ipython.org/
● IPython (Wikipedia) - http://en.wikipedia.org/wiki/IPython
● NumPy - http://www.numpy.org/
● SciPy - http://scipy.org/
● Matplotlib - http://matplotlib.org/
● Cython - http://www.cython.org/