PyCon APAC 2016 Keynote

Post on 23-Jan-2018

TRANSCRIPT

Saturday Morning Keynote

Wes McKinney @wesmckinn

PyCon APAC 2016 (Seoul)

Me

DataPad

Apache Arrow

Feather

Ibis

In process: Python for Data Analysis, 2nd Edition. Coming 2017 (in English :))

Q: What brings you here?

Our shared values

Pride in software craftsmanship

My story

•  Accidental software developer

•  2007: My first job (financial research analyst)

•  I started writing Python libraries to do my own work better

•  Soon I was helping my colleagues work better, too

Tools

Empathy

the feeling that you understand and share another person's experiences and emotions: the ability to share someone else's feelings

Source: Merriam-Webster's Learner's Dictionary

Open source is wonderful…

Open source is wonderful… but it can also be frustrating

Sustainable open source

•  How to keep contributors from drowning / burning out?

•  How to fund the work?

•  How to protect and serve the community?

The Grind

“The grind is an endless stream of bug reports, requests, demands, questions, and occasional inquisitions.”

DHH, Creator of Ruby on Rails

pandas, the open source project

•  Parts of code date back to April 2008

•  Over 600 unique contributors on GitHub

•  Active project maintainers range from 4-7 people

•  >6900 Closed Issues

•  >5100 Pull Requests

pandas at end of 2012

April 7, 2014

"Some might argue that [Heartbleed] is the worst vulnerability found (at least in terms of its potential impact) since commercial traffic began to flow on the Internet."

Joseph Steinberg, Forbes cybersecurity columnist

“There should be at least… [6] full time OpenSSL team members, not just one, able to concentrate… without having to hustle commercial work. If you’re a… in a position to do something about it, give it some thought. Please. I’m getting old and weary and I’d like to retire someday.”

Steve Marquess, OpenSSL team

By Nadia Eghbal, supported by the Ford Foundation

For more on this

“TheCathedralandtheBazaar”

Python’s normalization in industry

•  Python has become a leading language instead of something “experimental” or “risky”

•  Many businesses founded on the growth of the Python user base

•  See Paul Graham’s 2004 essay “The Python Paradox”: how things have changed!

Governance

“the processes of interaction and decision-making among the actors involved in a collective problem…”

M. Hufty (via Wikipedia)

Openness and Transparency

Consensus

Some example governance documents

•  NumPy (see the docs)

•  IPython/Jupyter governance

– github.com/jupyter/governance

•  pandas

– github.com/pydata/pandas-governance

– Modeled after Jupyter governance

http://numfocus.org

http://apache.org

conda-forge

•  Community-curated conda package channel (hosted on anaconda.org)

•  Reproducible build infrastructure (Docker + CircleCI + Travis CI + AppVeyor)

•  Automated GitHub helper tools

conda config --add channels conda-forge

What is next for pandas?

•  pandas 1.0

– A stable, maintenance-only release

•  Beginning “pandas 2.0”

– Planning significant refactoring on the internals of Series, DataFrame

Why pandas 2.0?

•  Some changes difficult/impossible to do in an incremental way

•  pandas’s relationship with the ecosystem has evolved over the last 5 years

•  Make pandas

– Faster and use less memory

– Fix long-standing limitations/inconsistencies

– Easier interoperability/extensibility

Apache Arrow

http://arrow.apache.org

High Performance Sharing & Interchange

Today

•  Each system has its own internal memory format

•  70-80% CPU wasted on serialization and deserialization

•  Similar functionality implemented in multiple projects

With Arrow

•  All systems utilize the same memory format

•  No overhead for cross-system communication

•  Projects can share functionality (e.g., Parquet-to-Arrow reader)
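The contrast between the two columns can be sketched with the Python standard library alone. This is an analogy, not Arrow's API: `pickle` stands in for a serialize/deserialize hand-off between systems, and a `memoryview` over an `array` stands in for two consumers agreeing on one memory layout. The variable names are illustrative, and the buffer protocol only works inside one process, whereas Arrow specifies a language-independent columnar layout.

```python
import pickle
from array import array

# Producer's data: four 64-bit floats stored contiguously.
values = array("d", [1.5, 2.5, 3.5, 4.5])

# "Today": hand the data over by serializing and deserializing.
# The consumer ends up with its own private copy of every value.
payload = pickle.dumps(values)
copied = pickle.loads(payload)

# "With a shared format" (by analogy): the consumer reads the
# producer's buffer directly, so no bytes are converted or copied.
shared = memoryview(values)

# A later change by the producer is visible through the shared view
# but not in the deserialized copy.
values[0] = 100.0
print(copied[0])      # 1.5
print(shared[0])      # 100.0
print(shared.nbytes)  # 32 (4 values * 8 bytes each)
```

The copy costs time and memory proportional to the data size on every hand-off; the shared view costs neither, which is the property the "With Arrow" column describes.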

Feather File Format for Python and R

• Problem: fast, language-agnostic binary data frame file format

• By Wes McKinney (Python) and Hadley Wickham (R)

• Read speeds close to disk IO performance

• Leverages Apache Arrow

Thank you

@wesmckinn http://wesmckinney.com

pandas sprint on Monday!
