Creative Data Analysis with Python
Grant Paton-Simpson
Senior Data & Implementation Specialist, Optima Corporation
Creator of SOFA Statistics
Great Python Tools Available
● Matplotlib (see Creating Interactive Applications in Matplotlib by Jake Vanderplas http://vimeo.com/63260224)
● NumPy
● Python sets, ordered dicts, named tuples
● pandas
● SQLAlchemy, adodbapi, dbapi
● Easy text processing (e.g. HTML)
● CSV
● Python!
Get Inspired!
Flexibility
Use Freedom Responsibly!
See http://blog.revolutionanalytics.com/2010/04/when-infographics-go-bad.html etc. and http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation
The point is in there somewhere – honest!
Simple can be best
Make a Simple Point
● Make complex things simple
● Extract small information from large data
● Present truth, do not deceive
http://www.dataists.com/2010/10/...… what-data-visualization-should-do-simple-small-truth/
Make it easy for the audience
Flexible analysis needs flexible tools
Matplotlib can do it
is your friend
● How to shift a legend outside the plot
● How to have a major and minor axis
● How to shift x axis labels to the middle of a bar
● How to position a triangle a certain percentage along the x axis
● How to apply a heat map to circles etc etc
Annotations, layers, shape placement and much more!
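A few of the how-tos above can be sketched in one small chart. This is an illustration only, not the code behind the slides: the data, the labels and the choice of the Agg backend (so it runs without a display) are all assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Invented data, for illustration only
labels = ["North", "South", "East", "West"]
values = [12, 30, 18, 24]
positions = list(range(len(labels)))

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(positions, values, width=0.6, align="center", label="Jobs")

# x axis labels in the middle of each bar: ticks sit at the bar centres
ax.set_xticks(positions)
ax.set_xticklabels(labels)

# Major and minor ticks on the y axis
ax.yaxis.set_major_locator(MultipleLocator(10))
ax.yaxis.set_minor_locator(MultipleLocator(2))

# A triangle 75% of the way along the x axis, just above the baseline.
# transAxes means both coordinates are fractions of the axes (0 to 1).
ax.plot([0.75], [0.05], marker="^", markersize=12, color="red",
        linestyle="none", transform=ax.transAxes, label="75% along")

# Legend outside the plot, to the right of the axes
ax.legend(loc="center left", bbox_to_anchor=(1.02, 0.5))

# Written to memory here; pass a file name instead to save to disk
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```

Saving with bbox_inches="tight" matters here: without it, a legend placed outside the axes can be cut off at the edge of the figure.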
Example with Percentile Lines
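The chart from this slide is not in the transcript, so the following is only a guess at the general idea: compute percentiles with NumPy and draw each one as a labelled line over a histogram. The data is randomly generated and the percentile choices (50, 90, 95) are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no display is needed
import matplotlib.pyplot as plt

# Invented, skewed data standing in for something like response times
rng = np.random.default_rng(42)
durations = rng.gamma(shape=2.0, scale=5.0, size=500)

percentiles = [50, 90, 95]
cutoffs = np.percentile(durations, percentiles)

fig, ax = plt.subplots()
ax.hist(durations, bins=40, color="lightsteelblue")

for pct, cutoff in zip(percentiles, cutoffs):
    # One dashed vertical line per percentile
    ax.axvline(cutoff, linestyle="--", color="black")
    # Label near the top of the plot: x is in data units,
    # y is a fraction of the axes height
    ax.text(cutoff, 0.95, "%sth" % pct, rotation=90, va="top", ha="right",
            transform=ax.get_xaxis_transform())

ax.set_xlabel("Duration (minutes)")
ax.set_ylabel("Count")
```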
Iterate
Colour adds meaning
The power of ... SQL
● Planned non-obsolescence
● Nothing you can't do
● Scales
● Can decouple
● SQLAlchemy, dbapi, adodbapi etc
● In my current role, I use SQL with safe data where there is no significant potential for dangerous input. In this case, the most readable and maintainable way of building SQL strings is to use dicts and string interpolation: "SELECT %(fld1)s, %(fld2)s FROM ..." % {"fld1": dest_arrive_time, "fld2": dest_depart_time}. But this is not a good habit otherwise. Search on "SQL injection" if you don't know why!
● Read data using dicts: row["dest_x"]
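One way to keep the readable dict interpolation while staying out of trouble is to interpolate only field names that come from your own code, and to pass values as query parameters. The sketch below uses the standard library's sqlite3 module so it runs anywhere; the trips table, its contents and the priority field are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # lets us read rows by field name
cur = con.cursor()

# Hypothetical table and data
cur.execute("""CREATE TABLE trips (
    dest_x REAL, dest_arrive_time TEXT, dest_depart_time TEXT,
    priority INTEGER)""")
cur.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", [
    (10.5, "09:00", "09:20", 1),
    (22.0, "10:15", "10:40", 2),
    (31.25, "11:05", "11:30", 3),
])

# Field names are fixed in our own code, so interpolating them is safe
flds = {"fld1": "dest_arrive_time", "fld2": "dest_depart_time"}
sql = ("SELECT dest_x, %(fld1)s, %(fld2)s FROM trips "
       "WHERE priority >= ? ORDER BY dest_x") % flds

# Values go in as parameters, never into the SQL string itself
min_priority = 2
cur.execute(sql, (min_priority,))
rows = cur.fetchall()

for row in rows:
    print(row["dest_x"], row["dest_arrive_time"], row["dest_depart_time"])

con.close()
```

The placeholder style ("?" here) varies between database modules, so check the documentation for the one you use.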
dbapi
● con = db.connect(host=...)
● cur = con.cursor()
● sql = "SELECT fname FROM data WHERE age > 40"
● cur.execute(sql)
● print(", ".join(x["fname"] for x in cur.fetchall()))
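The same connect, cursor, execute, fetch pattern can be tried without a database server. This sketch swaps in the standard library's sqlite3 module for the unnamed db module on the slide, with an in-memory table and invented names and ages.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # makes x["fname"] work on each row
cur = con.cursor()

# Invented data so the query has something to return
cur.execute("CREATE TABLE data (fname TEXT, age INTEGER)")
cur.executemany("INSERT INTO data (fname, age) VALUES (?, ?)",
                [("Ana", 52), ("Ben", 33), ("Caro", 41)])

sql = "SELECT fname FROM data WHERE age > 40 ORDER BY fname"
cur.execute(sql)
names = ", ".join(x["fname"] for x in cur.fetchall())
print(names)  # Ana, Caro

con.close()
```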
The power of ... HTML
● Text
● Nothing you can't do
● Easy to display tabular data, hyperlinks, subreports
● Clean HTML can be opened as documents and spreadsheets
● Conditional highlighting e.g.
class_str = "class='highlight'" if age > 10 else ""
html.append("<td %(class_str)s>%(age_val)s</td>" % {"class_str": class_str, "age_val": age})
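Put together, the conditional highlighting idea might look like the sketch below. The names, ages and the threshold of 10 are invented, and html.escape is added on the assumption that text values could contain characters such as "<".

```python
from html import escape

# Invented data, for illustration only
people = [("Ana", 52), ("Ben", 8), ("Caro", 11)]

html = ["<table>", "<tr><th>Name</th><th>Age</th></tr>"]
for name, age in people:
    # Only cells over the threshold get the highlight class
    class_str = "class='highlight'" if age > 10 else ""
    html.append(
        "<tr><td>%(name)s</td><td %(class_str)s>%(age_val)s</td></tr>"
        % {"name": escape(name), "class_str": class_str, "age_val": age})
html.append("</table>")

report = "\n".join(html)
print(report)
```

A matching CSS rule such as .highlight {background-color: yellow;} in the page's style block is what makes the highlighted cells stand out.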
Imagine, create, iterate ...