python data structures - best in class for data analysis

Python Data Structures - Best in Class for Data Science

Rajesh Manickadas, July 2016

Objective

The objective of this presentation is to introduce the Python data structures available for data science.

Python Data Structures - Primer

A refresher on Python data structures

Tuples: immutable sequences

Lists: mutable sequences (dynamic arrays)

Dict: hash tables
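A minimal sketch of the three built-in types above; the variable names are illustrative only. It shows that a tuple rejects item assignment, a list is modified in place, and a dict finds values by hashing the key (a tuple can serve as a key because it is immutable).

```python
# Tuples: immutable sequences (hashable when their items are hashable)
point = (3, 4)
try:
    point[0] = 10            # any item assignment on a tuple fails
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# Lists: mutable sequences that grow and change in place
values = [1, 2, 3]
values.append(4)             # same list object, now 4 items
values[0] = 100              # in-place item assignment is allowed

# Dicts: hash tables mapping hashable keys to values
lookup = {point: "origin offset", "name": "demo"}   # a tuple works as a key
found = lookup[(3, 4)]       # average O(1) lookup by hashing the key

print(tuple_mutated, values, found)   # False [100, 2, 3, 4] origin offset
```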

More…

➔ Built-in types
➔ Data type modules
➔ Numerical and mathematical modules
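Assuming the three groups above refer to the Python standard library documentation chapters "Built-in Types", "Data Types" and "Numeric and Mathematical Modules", here is a small sampler of one or two modules from each; the selection is ours, not the slide's.

```python
import array
import collections
import fractions
import statistics

# Built-in types: list, tuple, dict, set, ...
seen = set([3, 1, 3, 2])                          # duplicates collapse: {1, 2, 3}

# Data type modules (collections, array, ...)
counts = collections.Counter("banana")             # letter frequencies
recent = collections.deque([1, 2, 3], maxlen=3)    # bounded double-ended queue
recent.append(4)                                   # oldest item (1) is dropped
doubles = array.array("d", [1.0, 2.0, 3.0])        # compact typed 1-D array of C doubles

# Numerical and mathematical modules (math, decimal, fractions, statistics, ...)
half = fractions.Fraction(1, 3) + fractions.Fraction(1, 6)   # exact rational arithmetic
avg = statistics.mean([1, 2, 3, 4])
```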

Python Data Structures - Functional Optimization Patterns

The prime objective is to optimize the data structures for functional programming.

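Scalars are Python objects designed with functional optimization patterns: they are immutable, so equal values can safely share one object. CPython, for example, caches small integers, which is why a and b in the following transcript have the same id.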

>>> a = 45
>>> b = 45
>>> id(a)
16790784
>>> id(b)
16790784
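The shared id in the transcript comes from CPython's small-integer cache (roughly -5 to 256), an implementation detail. The sketch below relies only on the language-level guarantee behind it: an int cannot be changed in place, so "updating" a name binds it to a new object and other names are unaffected. The function name is hypothetical.

```python
def rebinding_demo():
    a = 45
    b = a                   # both names refer to the same int object
    shared = a is b         # True: no copy was made
    a += 1                  # ints are immutable, so this binds a to a new object (46)
    return shared, a, b, a is b

print(rebinding_demo())     # (True, 46, 45, False)
```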

No built-in multidimensional arrays: just lists, lists of lists, lists of lists of lists, and so on.

Good for functional work, but not designed for large-scale data processing.
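A small sketch of the "list of lists" approach, to make the limitation concrete without claiming any timings: every row is its own list object holding references to separate int objects, so column access, transposition and arithmetic all need Python-level loops that build new lists.

```python
# A 3x5 "matrix" as a list of lists: each row is a separate list object,
# and each element is a separate Python int referenced from that row.
rows, cols = 3, 5
matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]

# Row access is direct, but a column needs a loop over the rows.
second_row = matrix[1]
third_column = [row[2] for row in matrix]

# Transposing builds brand-new lists rather than reinterpreting existing memory.
transposed = [list(col) for col in zip(*matrix)]

# Element-wise arithmetic also requires explicit loops.
doubled = [[2 * x for x in row] for row in matrix]
```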

Ndarray Structure

NumPy Data Structures - ndarrays - Basics

Ndarrays - Basic Modelling

An ndarray has two parts: metadata and a data buffer.

Metadata
● Flexibility/shape - designed with data transformation optimization patterns, e.g. transpose
● Reuse - the data buffer is reused, e.g. views
● Data type encapsulation - the dtype describes how each element (scalar) is interpreted

Data buffer: a chunk of memory starting at a particular location
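A minimal sketch of the metadata/data-buffer split, assuming an explicit int64 dtype so the byte strides are predictable. Transposing and reinterpreting the dtype both produce new metadata over the same buffer, which `np.shares_memory` confirms.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)   # 6 int64 values = 48-byte buffer

# Metadata: shape, strides and dtype describe how to walk the buffer.
print(a.shape, a.strides, a.dtype)               # (2, 3) (24, 8) int64

# Transpose only rewrites metadata (swapped shape and strides); no data is copied.
t = a.T
print(t.shape, t.strides)                        # (3, 2) (8, 24)
print(np.shares_memory(a, t))                    # True

# Data type encapsulation: the same bytes reinterpreted with another dtype.
raw = a.view(np.uint8)                           # one-byte elements over the same buffer
print(raw.shape, np.shares_memory(a, raw))       # (2, 24) True

# Indexing a single element returns a NumPy scalar of the array's dtype.
first = a[0, 0]
```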

NumPy Data Structures - ndarray - Data Transformations

Ndarrays - Data Transformation Optimizations

PyArrayObject

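typedef struct PyArrayObject {
    PyObject_HEAD
    char *data;             /* pointer to the first element in the data buffer */
    int nd;                 /* number of dimensions */
    npy_intp *dimensions;   /* shape: size along each dimension */
    npy_intp *strides;      /* bytes to step to the next element along each dimension */
    PyObject *base;         /* object that owns the buffer (for views), else NULL */
    PyArray_Descr *descr;   /* data type descriptor (dtype) */
    int flags;              /* e.g. C/Fortran contiguity, writeability */
    PyObject *weakreflist;  /* support for weak references */
} PyArrayObject;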
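From Python, the struct fields above surface as ordinary ndarray attributes. This sketch maps each one using a basic slice (a view), so that `base` and the offset `data` pointer are visible; the helper `data_ptr` is hypothetical and simply reads the address NumPy publishes in `__array_interface__`.

```python
import numpy as np

a = np.arange(4, dtype=np.int64)   # owns its 32-byte buffer
b = a[1:]                           # a view starting at the second element

def data_ptr(arr):
    # Address of the first element, i.e. the struct's `char *data`
    return arr.__array_interface__["data"][0]

print(b.ndim, b.shape, b.strides)   # nd, dimensions, strides -> 1 (3,) (8,)
print(b.dtype)                      # descr -> int64
print(b.base is a)                  # base: the view keeps its owner alive -> True
print(a.base is None)               # an array that owns its memory has no base -> True
print(data_ptr(b) - data_ptr(a))    # the view's data pointer is offset by one item -> 8
print(b.flags["C_CONTIGUOUS"], b.flags["OWNDATA"])   # flags -> True False
```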

>>> import numpy as np
>>> matx = np.arange(15)
>>> id(matx)
139892166884368
>>> mat3x5 = matx.reshape(3,5)
>>> id(mat3x5)
139892020117712
>>> matx
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>>> mat3x5
array([[ 0, 1, 2, 3, 4],
       [ 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])
>>> matx[4] = 100
>>> matx
array([ 0, 1, 2, 3, 100, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>>> mat3x5
array([[ 0, 1, 2, 3, 100],
       [ 5, 6, 7, 8, 9],
       [ 10, 11, 12, 13, 14]])

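Ndarray 1: matx - ndim: 1, strides: (8,), shape: (15,)

Ndarray 2: mat3x5 (created by reshape) - ndim: 2, strides: (40, 8), shape: (3, 5)

The two objects have different ids but share one data buffer; only the metadata (shape and strides) differs, which is why the assignment matx[4] = 100 also appears in mat3x5.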
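The reshape behaviour above can be checked directly. This sketch fixes the dtype to int64 (the strides on the slide imply 8-byte elements) and contrasts the shared-buffer view with an explicit copy.

```python
import numpy as np

matx = np.arange(15, dtype=np.int64)   # explicit int64 so strides match the slide
mat3x5 = matx.reshape(3, 5)            # new ndarray object, same data buffer

print(matx.ndim, matx.strides, matx.shape)        # 1 (8,) (15,)
print(mat3x5.ndim, mat3x5.strides, mat3x5.shape)  # 2 (40, 8) (3, 5)
print(mat3x5.base is matx)                        # True: mat3x5 is a view on matx

matx[4] = 100                 # write through the 1-D array...
print(mat3x5[0, 4])           # ...and it shows up in the 2-D view: 100

independent = matx.reshape(3, 5).copy()   # an explicit copy gets its own buffer
independent[0, 0] = -1
print(matx[0])                # unchanged: 0
```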

NumPy Data Structures - More Concepts

The more you know, the more you can operate on your data.

● Broadcasting
● N-D iterators
● Indexing
● Scalars
● Routines
● Shapes and views
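A compact tour of several of the concepts listed above (broadcasting, indexing, views, `np.nditer` and a reduction routine) on one small array; the arrays are made up for illustration.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) result
col = np.array([[0], [10], [20]])
row = np.array([0, 1, 2, 3])
grid = col + row                     # shape (3, 4); no explicit loops or tiling

# Indexing: boolean masks and integer ("fancy") indexing both return copies
evens = grid[grid % 2 == 0]          # 1-D array of the even entries
corners = grid[[0, -1], [0, -1]]     # picks grid[0, 0] and grid[-1, -1]

# Shapes and views: basic slicing returns a view, so this write reaches grid
first_row = grid[0]
first_row[:] = -1

# N-D iterators: np.nditer visits every element regardless of ndim
total = 0
for x in np.nditer(grid):
    total += int(x)

# Routines: whole-array functions such as a sum along an axis
col_sums = grid.sum(axis=0)
```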

Pandas - Where Python Meets the Tables

What people see is what they manipulate.

Series (1-D)

DataFrame (2-D)

Panel (3-D; deprecated in pandas 0.20 and removed in 0.25)

DataFrame
● Indexing - an immutable, ordered set backed by a hash table (dict-like label lookup)
● Set algebra - unions, intersections, filters
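A minimal sketch of the index properties listed above, using two small hypothetical label sets: set algebra on `pd.Index`, hash-based label lookup, boolean filtering, and the `TypeError` raised by any attempt to mutate an index.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

both = left.intersection(right)    # set intersection -> ['b', 'c']
either = left.union(right)         # set union -> ['a', 'b', 'c', 'd']

# Hash-based lookup: label -> integer position
pos = left.get_loc("c")            # 2

# Filtering with a boolean mask returns a new Index
not_a = left[left != "a"]          # ['b', 'c']

# Indexes are immutable: item assignment raises TypeError
try:
    left[0] = "z"
    mutated = True
except TypeError:
    mutated = False
```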

Pandas - Indexing

Indexing is the key data structure element in pandas.

● Index is a PandasObject
● The motivation is to allow different implementations of indexing (custom indexing)
● Indexes are immutable
● Multi-indexing / hierarchical indexing
● Time series: datetime indexes (DatetimeIndex)
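A short sketch of the two specialised index types mentioned above. The region/year labels and the numbers are hypothetical, chosen only to show that selecting the outer level of a `MultiIndex` returns a Series keyed by the inner level, and that `pd.date_range` builds a `DatetimeIndex` of Timestamps.

```python
import pandas as pd

# Hierarchical (multi-level) index built from (region, year) tuples; toy values
mi = pd.MultiIndex.from_tuples(
    [("north", 1997), ("north", 1998), ("south", 1997), ("south", 1998)],
    names=["region", "year"],
)
sales = pd.Series([10, 12, 7, 9], index=mi)
north = sales.loc["north"]   # outer-level selection -> Series indexed by year

# DatetimeIndex: a time-series index whose labels are Timestamps
dates = pd.date_range("2000-01-01", periods=3, freq="D")
ts = pd.Series([1.0, 2.0, 3.0], index=dates)
day_two = ts.loc[pd.Timestamp("2000-01-02")]
```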

Pandas - Indexing a DataFrame

Indexing organization

Year    Total    Gas   Liquid    Solid
1997   250255  12561    66649   159191
1998   255310  12990    71750   158106
1999   271548  11549    77852   169087
2000   281389  11974    82834   172812

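● Label index: array, ordered, immutable, hash table, int64 labels
● DatetimeIndex: array, ordered, immutable, hash table, Timestamp labels
● Data: held in ndarrays, each with a dtype; rows are labelled by the index (axis 0) and columns by the column index (axis 1)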
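A sketch that rebuilds the table above as a DataFrame, first with an integer label index and then with the same rows keyed by a DatetimeIndex. The numbers are copied from the table; converting years to January 1 Timestamps via `format="%Y"` is our choice for illustration.

```python
import pandas as pd

# Emission figures copied from the table above
df = pd.DataFrame(
    {
        "Total": [250255, 255310, 271548, 281389],
        "Gas": [12561, 12990, 11549, 11974],
        "Liquid": [66649, 71750, 77852, 82834],
        "Solid": [159191, 158106, 169087, 172812],
    },
    index=pd.Index([1997, 1998, 1999, 2000], name="Year"),
)

# Label index: integer years; the column labels form an Index of their own
print(df.index.dtype)          # int64
print(list(df.columns))        # ['Total', 'Gas', 'Liquid', 'Solid']

# The same rows keyed by a DatetimeIndex (each year -> January 1 of that year)
df_ts = df.copy()
df_ts.index = pd.to_datetime(df.index.astype(str), format="%Y")

# Label-based lookup through either index
solid_1999 = df.loc[1999, "Solid"]
solid_1999_ts = df_ts.loc[pd.Timestamp("1999-01-01"), "Solid"]
```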

Pandas - Time Series - CO2 Emissions in India (1858-2014)

Time series example

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> # Read the CSV and use its Year column as the index
>>> co2emission = pd.read_csv('inco2.csv', index_col='Year')
>>> # Turn the integer years into a DatetimeIndex (pd.datetime no longer exists in current pandas)
>>> co2emission.index = pd.to_datetime(co2emission.index.astype(str), format='%Y')
>>> ax = co2emission.plot()
>>> plt.show()
>>> co2solidemission = co2emission['Solid']
>>> ax = co2solidemission.plot()
>>> plt.show()
>>> co2solidemission.mean()
50129.979310344825
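The original CSV is not included here, so this sketch applies a few typical time-series operations to the four Solid values from the table earlier in this deck instead: partial-string date slicing, a rolling mean and a year-over-year difference. All three depend on the ordered DatetimeIndex.

```python
import pandas as pd

# Solid-fuel figures for 1997-2000 from the table earlier in this deck
years = pd.to_datetime(["1997", "1998", "1999", "2000"], format="%Y")
solid = pd.Series([159191, 158106, 169087, 172812], index=years, name="Solid")

# Partial-string slicing on a DatetimeIndex (both ends inclusive)
mid = solid.loc["1998":"1999"]

# Window and change operations that rely on ordered, time-based rows
rolling2 = solid.rolling(window=2).mean()   # first value is NaN (window not yet full)
change = solid.diff()                       # year-over-year absolute change

average = solid.mean()
```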

Pandas - More Concepts

● Set algebra - SQL-style joins, indexing and filtering
● Categorical data
● I/O optimizations
● R integration
● Panels
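A minimal sketch of two items from the list above: SQL-style joins with `DataFrame.merge` and categorical data. Both tables and their values are hypothetical toy data for illustration.

```python
import pandas as pd

# Hypothetical toy tables, for illustration only
fuels = pd.DataFrame({"fuel": ["coal", "oil", "gas"], "state": ["solid", "liquid", "gas"]})
usage = pd.DataFrame({"fuel": ["coal", "gas", "wind"], "share": [0.6, 0.1, 0.05]})

inner = fuels.merge(usage, on="fuel", how="inner")   # like SQL INNER JOIN
left = fuels.merge(usage, on="fuel", how="left")     # like SQL LEFT OUTER JOIN; no match -> NaN

# Categorical data: repeated strings stored as small integer codes plus a category table
states = pd.Series(["solid", "liquid", "solid", "gas"], dtype="category")
```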
