python networking gitbook


Upload: hoangtruc

Post on 04-Jan-2017


Page 1: Python Networking Gitbook
Table of Contents

1. Introduction
2. Introduction to Client/Server Networking
    i. Virtualenv
    ii. Installing virtualenv in Erle
    iii. Create a virtual environment to test packages
3. Introduction to socket
    i. What is socket?
    ii. Creating a Socket
    iii. Using sockets
    iv. Disconnecting
    v. Non-blocking sockets
4. UDP and TCP
    i. Addresses and port numbers
    ii. UDP
        i. How UDP works?
        ii. When to use UDP
        iii. Socket (UDP)
        iv. Unreliability, Backoff, Blocking, Timeouts
        v. Connecting UDP Sockets
        vi. Binding to Interfaces (UDP)
        vii. UDP Fragmentation
        viii. Socket Options
    iii. TCP
        i. How TCP works?
        ii. When to use TCP
        iii. What TCP Sockets Mean
        iv. A Simple TCP Client and Server
        v. Binding to Interfaces (TCP)
        vi. Deadlock
        vii. Closed Connections, Half-Open Connections
        viii. Using TCP Streams like Files
5. Socket names and DNS
    i. Socket names
    ii. Five socket coordinates
    iii. IPv6
    iv. The getaddrinfo() function
        i. Asking getaddrinfo() Where to Bind
        ii. Asking getaddrinfo() About Services
        iii. Asking getaddrinfo() for Pretty Hostnames
        iv. Other getaddrinfo() Flags
        v. getaddrinfo() in your own code
    v. A Sketch of How DNS Works
    vi. Using DNS
6. Network Data and Network Errors
    i. Text and Encodings
    ii. Network Byte Order
    iii. Framing and Quoting
    iv. Pickles and Self-Delimiting Formats
    v. XML, JSON, Etc.
    vi. Compression
    vii. Network Exceptions
    viii. Handling Exceptions
7. TLS and SSL
    i. Cleartext on the Network
    ii. TLS Encrypts Your Conversations
    iii. Supporting TLS in Python
    iv. The Standard SSL Module
8. Server Architecture
    i. Daemons and Logging
    ii. Introductory example
    iii. Elementary client
    iv. Event-Driven Servers
    v. The Semantics of Non-blocking
    vi. Twisted Python
    vii. Threading and Multi-processing
    viii. Threading and Multi-processing Frameworks
9. Caches, Message Queues, and Map-Reduce
    i. Using Memcached
    ii. Memcached and Sharding
    iii. Message Queues
    iv. Using Message Queues from Python
    v. Map-Reduce
10. HTTP
    i. URL Anatomy
    ii. Relative URLs
    iii. Instrumenting urllib2
    iv. The GET Method and The Host Header
    v. Payloads and Persistent Connections
    vi. POST And Forms
    vii. REST And More HTTP Methods
    viii. Identifying User Agents and Web Servers
    ix. Content Type Negotiation
    x. Compression
    xi. HTTP Caching
    xii. The HEAD Method
    xiii. HTTPS Encryption
    xiv. HTTP Authentication
    xv. Cookies
    xvi. HTTP Session Hijacking
    xvii. Cross-Site Scripting Attacks
11. Screen Scraping
    i. Fetching Web Pages
    ii. Downloading Pages Through Form Submission
    iii. The Structure of Web Pages
    iv. Three Axes
    v. Diving into an HTML Document
    vi. Selectors
12. Web Applications
    i. Web Servers and Python
    ii. Choosing a Web Server
    iii. WSGI
    iv. WSGI Middleware
    v. Python Web Frameworks
    vi. URL Dispatch Techniques
    vii. Templates
    viii. Pure-Python Web Servers
    ix. Common Gateway Interface (CGI)
    x. mod_python
13. E-mail Composition and Decoding
    i. E-mail Messages
    ii. Composing Traditional Messages
    iii. Parsing Traditional Messages
    iv. Parsing Dates
    v. Understanding MIME
    vi. Composing MIME Attachments
    vii. MIME Alternative Parts
    viii. Composing Non-English Headers
    ix. Composing Nested Multiparts
    x. Parsing MIME Messages
    xi. Decoding Headers
14. Simple Mail Transport Protocol (SMTP)
    i. E-mail Clients, Webmail Services
    ii. How SMTP Is Used
    iii. Sending E-Mail
    iv. Introducing the SMTP Library
    v. Error Handling and Conversation Debugging
    vi. Getting Information from EHLO
    vii. Using Secure Sockets Layer and Transport Layer Security
    viii. Authenticated SMTP
15. Post Office Protocol (POP)
    i. Connecting and Authenticating
    ii. Obtaining Mailbox Information
    iii. Downloading and Deleting Messages
16. Internet Message Access Protocol (IMAP)
    i. Understanding IMAP in Python
    ii. IMAPClient
    iii. Message Numbers vs. UIDs
    iv. Summary Information
    v. Downloading an Entire Mailbox
    vi. Downloading Messages Individually
    vii. Flagging and Deleting Messages
    viii. Searching and Manipulating Messages
17. Telnet and SSH
    i. Command-Line Automation
    ii. Command-Line Expansion and Quoting
    iii. Unix Has No Special Characters
    iv. Quoting Characters for Protection
    v. Things Are Different in a Terminal
    vi. Terminals Do Buffering
    vii. Telnet
    viii. SSH: The Secure Shell
    ix. SSH Host Keys
    x. SSH Authentication
    xi. Shell Sessions and Individual Commands
    xii. SFTP: File Transfer Over SSH
18. File Transfer Protocol (FTP)
    i. What to Use Instead of FTP
    ii. Communication Channels
    iii. Using FTP in Python
    iv. ASCII and Binary Files
    v. Advanced Binary Downloading
    vi. Uploading Data
    vii. Advanced Binary Uploading
    viii. Handling Errors
    ix. Detecting Directories and Recursive Download
    x. Creating Directories, Deleting Things
19. Remote Procedure Call (RPC)
    i. Features of RPC
    ii. XML-RPC
    iii. JSON-RPC
    iv. Self-documenting Data
    v. Talking About Objects: Pyro and RPyC
    vi. An RPyC Example
    vii. RPC, Web Frameworks, Message Queues


Erle Robotics: Python Networking Programming

Book

This book teaches the reader about Python networking in Linux, using Erle Robotics autopilots. Erle Robotics creates small-size Linux computers for making drones.

By Python networking we mean using this programming language to control incoming and outgoing connections and to work with different protocols such as IP.

About

For years we have been working in the robotics field, particularly with drones. We have passed through different universities and research centers, and in all these places we found that most drones are black boxes (check out our 60s pitch), not meant to be used for learning or research. The software they use is in most cases unknown, closed source or undocumented. Given these conditions, how are we going to educate the next generations on these technologies? How do you get started programming drones if you don't have a $1000+ budget? Which platform allows me to get started with drones without risking a hand?

We are coming up with an answer to all these questions, our technology: Erle.

Inspired by the BeagleBone development board, we have designed a small computer with about 36+ sensors, plenty of I/O and processing power for real-time analysis. Erle is the enabling technology for the next generation of aerial and terrestrial robots that will be used in cities, solving tasks such as surveillance, environmental monitoring or even providing aid at catastrophes.

Our small-size Linux computer is bringing robotics to people and businesses.

License

This book is based on different Linux documentation available on the Internet. Refer to the sources for the corresponding licenses:

Python Documentation

Python Standard Library

Python Package Index

All Python releases are Open Source (see link for the Open Source Definition).

Foundations of Python Network Programming by Brandon Rhodes and John Goerzen

Unless specified, this content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

All derivative works are to be attributed to Silvia Núñez Rivero of Erle Robotics S.L.

For any questions, concerns, or issues, submit them to support [at] erlerobot.com.

Introduction to Client/Server Networking

This chapter is about network programming with the Python language: about accomplishing a specific set of tasks that all involve a particular technology (computer networks) using a general-purpose programming language that can do all sorts of things.

From now on, we will frequently use:

Python Standard Library documentation

Python Package Index

Virtualenv

A common situation is that you find a Python package that sounds like it might already do exactly what you want, and you want to try it out on your system. For this you should be introduced to the very best Python technology for quickly trying out a new library: virtualenv.

In the old days, installing a Python package was a gruesome and irreversible act that required administrative privileges on your machine and left your system Python install permanently altered.

Careful Python programmers do not suffer from this situation any longer. Many of them install only one Python package system-wide: virtualenv. Once virtualenv is installed, you have the power to create any number of small, self-contained “virtual Python environments” where packages can be installed, un-installed, and experimented with without contaminating your system-wide Python. When a particular project or experiment is over, you simply remove its virtual environment directory, and your system is clean.

Installing virtualenv in Erle

This is the official website of virtualenv, where you can find information about installation and usage.

If you are connected to the Internet from Erle (by using a wireless nano USB adapter), then you only need to type:

    root@erlerobot:~# pip install virtualenv

If not, the process is a bit more tedious:

First of all, you need to download virtualenv from here. Download the file called virtualenv-1.11.6.tar.gz (md5, pgp) to your PC. Then copy it to the Erle board; you can find out how to do this in this tutorial. Once you have copied it, type:

    root@erlerobot:~# tar xvfz virtualenv-1.11.6.tar.gz
    root@erlerobot:~# cd virtualenv-1.11.6
    root@erlerobot:~/virtualenv-1.11.6# python setup.py install

Congratulations, you are now ready to use it!

Create a virtual environment to test packages

We are now going to use virtualenv to create a new environment and install the googlemaps package in it. You can read more about this package here.

Now type the following:

    root@erlerobot:~# virtualenv --no-site-packages gmapenv
    New python executable in gmapenv/bin/python
    Installing setuptools, pip...done.
    root@erlerobot:~#
    root@erlerobot:~# cd gmapenv
    root@erlerobot:~/gmapenv# ls
    bin include lib local
    root@erlerobot:~/gmapenv# . bin/activate
    (gmapenv)root@erlerobot:~/gmapenv# python -c 'import googlemaps'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named googlemaps
    (gmapenv)root@erlerobot:~/gmapenv#

As you can see, the googlemaps package is not yet available. To install it, use the pip command that is inside your virtualenv and that is now on your path thanks to the activate command that you ran:

    (gmapenv)root@erlerobot:~/gmapenv# pip install googlemaps
    Downloading/unpacking googlemaps
      Downloading googlemaps-1.0.2.tar.gz (60Kb): 60Kb downloaded
      Running setup.py egg_info for package googlemaps
    Installing collected packages: googlemaps
      Running setup.py install for googlemaps
    Successfully installed googlemaps
    Cleaning up...

The python binary inside the virtualenv will now have the googlemaps package available:

    (gmapenv)root@erlerobot:~/gmapenv# python -c 'import googlemaps'

When you install a package, be careful: it must be suitable for the Erle architecture.
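Since the activate step only changes which python and pip are found on your path, it can be handy to confirm from inside Python which environment is running. The following is a minimal sketch, written for Python 3 (unlike the Python 2 session above); the function name in_virtual_environment is made up for this illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer tools instead leave
    # sys.base_prefix pointing at the system-wide installation.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

Run with the environment activated, the prefix printed should point inside the environment directory (gmapenv in the session above).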

Introduction to socket

We will use sockets a lot in future chapters. This chapter therefore aims to introduce you to the basic concepts of sockets.

What is socket?

Rather than trying to invent its own API for doing networking, Python made an interesting decision: it simply provides a slightly object-based interface to all of the normal, gritty, low-level operating system calls that are normally used to accomplish networking tasks on POSIX-compliant operating systems.

So, Python exposes the normal POSIX calls for raw UDP and TCP connections rather than trying to invent any of its own. And the normal POSIX networking calls operate around a central concept called a socket.

This means that, in Python, communication between different entities on a network is based on the classic concept of sockets. A socket is an abstract concept that designates the endpoint of a connection. Programs use sockets to communicate with other programs, which may be located on different computers. A socket is defined by the IP address of the machine, the port on which it listens, and the protocol used.

Moreover, if you have ever worked with POSIX before, you will probably have run across the fact that instead of making you repeat a file name over and over again, the calls let you use the file name to create a “file descriptor” that represents a connection to the file, and through which you can access the file until you are done working with it. Sockets provide the same idea for the networking realm: when you ask for access to a line of communication (like a UDP port, as we are about to see) you create one of these abstract “socket” objects and then ask for it to be bound to the port you want to use. If the binding is successful, then the socket “holds on to” that port number for you.

You should also be aware that part of the trouble with understanding these things is that “socket” can mean a number of subtly different things, depending on context. So first, let’s make a distinction between a “client” socket, an endpoint of a conversation, and a “server” socket, which is more like a switchboard operator. The client application (your browser, for example) uses “client” sockets exclusively; the web server it’s talking to uses both “server” sockets and “client” sockets.

The Python documentation provides more information about the socket module.
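To make the file-descriptor analogy concrete, here is a small sketch (Python 3, not taken from the book) that creates a UDP socket, binds it to a port chosen by the operating system, and reports the three things that define it. The helper name describe_bound_socket and the dictionary keys are illustrative choices only.

```python
import socket


def describe_bound_socket():
    """Bind a UDP socket to a free loopback port and report what identifies it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the operating system to pick any free port.
        sock.bind(("127.0.0.1", 0))
        host, port = sock.getsockname()
        return {
            "descriptor": sock.fileno(),  # integer handle, much like a file descriptor
            "address": host,
            "port": port,
            "protocol": "UDP" if sock.type == socket.SOCK_DGRAM else "TCP",
        }
    finally:
        sock.close()  # closing the socket releases the port again


if __name__ == "__main__":
    print(describe_bound_socket())
```

The port number differs from run to run, because the operating system hands out whichever free port it likes.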

Creating a Socket

Roughly speaking, when you clicked on the link that brought you to this page, your browser did something like the following:

    import socket

    # create an INET, STREAMing socket
    s = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80
    # - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))

When the connect completes, the socket s can be used to send in a request for the text of the page. The same socket will read the reply, and then be destroyed. That’s right, destroyed. Client sockets are normally only used for one exchange (or a small set of sequential exchanges).

What happens in the web server is a bit more complex. First, the web server creates a “server socket”:

    # create an INET, STREAMing socket
    serversocket = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host,
    # and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)

A couple of things to notice: we used socket.gethostname() so that the socket would be visible to the outside world. If we had used s.bind(('localhost', 80)) or s.bind(('127.0.0.1', 80)) we would still have a “server” socket, but one that was only visible within the same machine. s.bind(('', 80)) specifies that the socket is reachable by any address the machine happens to have.

A second thing to note: low-numbered ports are usually reserved for “well known” services (HTTP, SNMP, etc.). If you’re playing around, use a nice high number (4 digits).

Finally, the argument to listen tells the socket library that we want it to queue up as many as 5 connect requests (the normal max) before refusing outside connections. If the rest of the code is written properly, that should be plenty.

Now that we have a “server” socket, listening on port 80, we can enter the main loop of the web server:

    while 1:
        # accept connections from outside
        (clientsocket, address) = serversocket.accept()
        # now do something with the clientsocket
        # in this case, we'll pretend this is a threaded server
        # (client_thread is a placeholder, not defined here)
        ct = client_thread(clientsocket)
        ct.run()

There are actually 3 general ways in which this loop could work: dispatching a thread to handle clientsocket, creating a new process to handle clientsocket, or restructuring this app to use non-blocking sockets and multiplexing between our “server” socket and any active clientsockets using select. The important thing to understand now is this: this is all a “server” socket does. It doesn’t send any data. It doesn’t receive any data. It just produces “client” sockets. Each clientsocket is created in response to some other “client” socket doing a connect() to the host and port we’re bound to. As soon as we’ve created that clientsocket, we go back to listening for more connections. The two “clients” are free to chat it up; they are using some dynamically allocated port which will be recycled when the conversation ends.
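The division of labour between the “server” socket and the “client” sockets it produces can be tried out inside a single process. The sketch below is an illustration written for Python 3, not the book's code: it listens on a loopback port picked by the operating system instead of port 80, and the function name accept_one_connection is made up.

```python
import socket


def accept_one_connection():
    """Show that a listening socket only hands out new per-connection sockets."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free high port
    server.listen(5)
    server_address = server.getsockname()

    # The kernel completes the handshake and queues the connection, so
    # connect() returns even though accept() has not been called yet.
    client = socket.create_connection(server_address)
    connection, peer_address = server.accept()
    try:
        client.sendall(b"hello")
        received = connection.recv(1024)
        return {
            "listening": server_address,
            "client_side": client.getsockname(),
            "peer_seen_by_server": peer_address,
            "received": received,
        }
    finally:
        for sock in (connection, client, server):
            sock.close()


if __name__ == "__main__":
    for name, value in accept_one_connection().items():
        print(name, value)
```

The address the server sees as its peer is exactly the dynamically allocated address of the connecting socket, while the listening socket itself never carries any data.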

Using sockets

The first thing to note is that the web browser’s “client” socket and the web server’s “client” socket are identical beasts. That is, this is a “peer to peer” conversation. Or to put it another way, as the designer, you will have to decide what the rules of etiquette are for a conversation. Normally, the connecting socket starts the conversation, by sending in a request, or perhaps a signon. But that’s a design decision; it’s not a rule of sockets.

Now there are two sets of verbs to use for communication. You can use send() and recv(), or you can transform your client socket into a file-like beast and use read() and write(). I’m not going to talk about it here, except to warn you that you need to use flush on sockets. These are buffered “files”, and a common mistake is to write something, and then read for a reply. Without a flush in there, you may wait forever for the reply, because the request may still be in your output buffer.

Now we come to the major stumbling block of sockets: send() and recv() operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.

When a recv() returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection.

A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That’s it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes.

But if you plan to reuse your socket for further transfers, you need to realize that there is no EOT on a socket. I repeat: if a socket send() or recv() returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv() forever, because the socket will not tell you that there’s nothing more to read (for now). Now if you think about that a bit, you’ll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours (but some ways are righter than others).

Assuming you don’t want to end the connection, the simplest solution is a fixed length message:

    import socket

    MSGLEN = 1024  # fixed message length; the value is chosen here only for illustration

    class mysocket:
        '''demonstration class only
          - coded for clarity, not efficiency
        '''

        def __init__(self, sock=None):
            if sock is None:
                self.sock = socket.socket(
                    socket.AF_INET, socket.SOCK_STREAM)
            else:
                self.sock = sock

        def connect(self, host, port):
            self.sock.connect((host, port))

        def mysend(self, msg):
            totalsent = 0
            while totalsent < MSGLEN:
                sent = self.sock.send(msg[totalsent:])
                if sent == 0:
                    raise RuntimeError("socket connection broken")
                totalsent = totalsent + sent

        def myreceive(self):
            chunks = []
            bytes_recd = 0
            while bytes_recd < MSGLEN:
                chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
                if chunk == '':
                    raise RuntimeError("socket connection broken")
                chunks.append(chunk)
                bytes_recd = bytes_recd + len(chunk)
            return ''.join(chunks)

The sending code here is usable for almost any messaging scheme: in Python you send strings, and you can use len() to determine a string's length (even if it has embedded \0 characters). It’s mostly the receiving code that gets more complex.

The easiest enhancement is to make the first character of the message an indicator of message type, and have the type determine the length. Now you have two recvs: the first to get (at least) that first character so you can look up the length, and the second in a loop to get the rest. If you decide to go the delimited route, you’ll be receiving in some arbitrary chunk size (4096 or 8192 is frequently a good match for network buffer sizes), and scanning what you’ve received for a delimiter.

One complication to be aware of: if your conversational protocol allows multiple messages to be sent back to back (without some kind of reply), and you pass recv() an arbitrary chunk size, you may end up reading the start of a following message. You’ll need to put that aside and hold onto it, until it’s needed.

Prefixing the message with its length (say, as 5 numeric characters) gets more complex, because (believe it or not) you may not get all 5 characters in one recv. In playing around, you’ll get away with it; but in high network loads, your code will very quickly break unless you use two recv loops: the first to determine the length, the second to get the data part of the message. Nasty. This is also when you’ll discover that send does not always manage to get rid of everything in one pass. And despite having read this, you will eventually get bit by it!

We will discuss the issue of framing (delimiting messages) in a later chapter: Network Data and Network Errors.
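The length-prefix scheme with its two receive loops can be sketched as follows. This is an illustration in Python 3, not the book's implementation: it assumes a 4-byte binary length header in network byte order rather than the 5 numeric characters mentioned above, and the helper names (recv_exactly, send_message, recv_message) are made up.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def recv_exactly(sock, count):
    """Keep calling recv() until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send a length header followed by the payload; sendall() loops for us."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """The first loop reads the header, the second reads the body it announces."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


def demo():
    left, right = socket.socketpair()
    try:
        send_message(left, b"first")
        send_message(left, b"second message")  # back to back, no reply in between
        return [recv_message(right), recv_message(right)]
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

Because recv_exactly never asks for more bytes than the current message still needs, it cannot swallow the start of the following message, which sidesteps the back-to-back complication described above.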

Disconnecting

Strictly speaking, you’re supposed to use shutdown on a socket before you close it. The shutdown is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a close is the same as shutdown(); close(). So in most situations, an explicit shutdown is not needed.

One way to use shutdown effectively is in an HTTP-like exchange. The client sends a request and then does a shutdown(1). This tells the server “This client is done sending, but can still receive.” The server can detect “EOF” by a receive of 0 bytes. It can assume it has the complete request. The server sends a reply. If the send completes successfully then, indeed, the client was still receiving.

Python takes the automatic shutdown a step further, and says that when a socket is garbage collected, it will automatically do a close if it’s needed. But relying on this is a very bad habit. If your socket just disappears without doing a close, the socket at the other end may hang indefinitely, thinking you’re just being slow. So it is strongly recommended that you close your sockets when you’re done.
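The HTTP-like exchange described above can be sketched in a few lines. This is an illustrative Python 3 example that uses socket.socketpair() to stand in for a real client and server; the request text and the helper name read_until_eof are invented for the demonstration.

```python
import socket


def read_until_eof(sock):
    """Collect data until recv() returns 0 bytes, meaning the peer stopped sending."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def demo():
    client, server = socket.socketpair()
    try:
        client.sendall(b"GET /status")
        client.shutdown(socket.SHUT_WR)    # same as shutdown(1): done sending, still receiving
        request = read_until_eof(server)   # the server sees the end of the request as 0 bytes
        server.sendall(b"reply to " + request)
        server.close()                     # closing marks the end of the reply
        return read_until_eof(client)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

The named constant socket.SHUT_WR has the value 1, so it is the same call as the shutdown(1) in the text, only easier to read.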

Non-blocking sockets

In Python, you use socket.setblocking(0) to make a socket non-blocking. You do this after creating the socket, but before using it. (Actually, if you’re nuts, you can switch back and forth.)

The major mechanical difference is that send(), recv(), connect() and accept() can return without having done anything. You have (of course) a number of choices. You can check return codes and error codes and generally drive yourself crazy. Your app will grow large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it right. Use select.

    ready_to_read, ready_to_write, in_error = \
        select.select(
            potential_readers,
            potential_writers,
            potential_errs,
            timeout)

You pass select three lists: the first contains all sockets that you might want to try reading; the second all the sockets you might want to try writing to, and the last (normally left empty) those that you want to check for errors. You should note that a socket can go into more than one list. The select call is blocking, but you can give it a timeout. This is generally a sensible thing to do: give it a nice long timeout (say a minute) unless you have good reason to do otherwise.

In return, you will get three lists. They contain the sockets that are actually readable, writable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in.

If a socket is in the output readable list, you can be as-close-to-certain-as-we-ever-get-in-this-business that a recv on that socket will return something. Same idea for the writable list. You’ll be able to send something. Maybe not all you want to, but something is better than nothing. (Actually, any reasonably healthy socket will return as writable; it just means outbound network buffer space is available.)

If you have a “server” socket, put it in the potential_readers list. If it comes out in the readable list, your accept will (almost certainly) work. If you have created a new socket to connect to someone else, put it in the potential_writers list. If it shows up in the writable list, you have a decent chance that it has connected.

One very nasty problem with select: if somewhere in those input lists of sockets is one which has died a nasty death, the select will fail. You then need to loop through every single damn socket in all those lists and do a select([sock], [], [], 0) until you find the bad one. That timeout of 0 means it won’t take long, but it’s ugly.

Actually, select can be handy even with blocking sockets. It’s one way of determining whether you will block: the socket returns as readable when there’s something in the buffers. However, this still doesn’t help with the problem of determining whether the other end is done, or just busy with something else.

Portability alert: on Unix, select works with both sockets and files. Don’t try this on Windows. On Windows, select works with sockets only.
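As a rough illustration of the rules above, the following Python 3 sketch puts a non-blocking “server” socket in the readers list, accepts the connection once select reports it readable, and then waits for data on the accepted socket. The one-second timeout and the event labels are arbitrary choices made for this example.

```python
import select
import socket


def demo(timeout=1.0):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    server.setblocking(False)  # set right after creation, before using the socket

    client = socket.create_connection(server.getsockname())
    events = []
    try:
        # A readable "server" socket means accept() will (almost certainly) work.
        readable, _, _ = select.select([server], [], [], timeout)
        events.append(("server readable", server in readable))
        connection, _ = server.accept()
        connection.setblocking(False)

        client.sendall(b"ping")
        # Wait until the data has arrived before calling recv().
        readable, _, _ = select.select([connection], [], [], timeout)
        events.append(("connection readable", connection in readable))
        # A healthy socket with free buffer space reports as writable.
        _, writable, _ = select.select([], [connection], [], timeout)
        events.append(("connection writable", connection in writable))
        events.append(("data", connection.recv(4096)))
        connection.close()
    finally:
        client.close()
        server.close()
    return events


if __name__ == "__main__":
    for label, value in demo():
        print(label, value)
```

A real server would keep every accepted connection in the readers list and call select in a loop; this sketch handles a single connection only to keep the flow visible.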

UDP and TCP

The two principal approaches when building atop IP are UDP and TCP.

The vast majority of applications today are built atop TCP, the Transmission Control Protocol, which offers ordered and reliable data streams between IP applications. A few protocols, usually with short, self-contained requests and responses, and simple clients that will not be annoyed if a request gets lost and they have to repeat it, choose UDP, the User Datagram Protocol.

These two methods are described in depth throughout this chapter, but for now take a quick look at the differences between the two.

Addresses and port numbers

We are going to briefly review these two topics.

The IP protocol assigns an IP address, which traditionally takes the form of a four-octet code like 18.9.22.69, to every machine connected to an IP network. In fact, it does a bit more than this: a machine with several network cards connected to the network will typically have a different IP address for each card, so that other hosts can choose the network over which they want to contact the machine. But even if an IP-connected machine has only one network card, it also has at least one other network address: the address 127.0.0.1 is how machines can connect to themselves. It serves as a stable name that each machine has for itself, one that stays the same as network cables are plugged and unplugged and as wireless signals come and go. And these IP addresses allow millions of different machines, using all sorts of different network hardware, to pass packets to each other over the fabric of an IP network.

But with UDP and TCP we now take a big step, and stop thinking about the routing needs of the network as a whole and start considering the needs of specific applications that are running on a particular machine. And the first thing we notice is that a single computer today can have many dozens of programs running on it at any given time, and many of these will want to use the network at the same moment. You might be checking e-mail with Thunderbird while a web page is downloading in Google Chrome, or installing a Python package with pip over the network while checking the status of a remote server with SSH. Somehow, all of those different and simultaneous conversations need to take place without interfering with each other. This problem is known as the need for multiplexing: the need for a single channel to be shared unambiguously by several different conversations.

You should also remember that when a program on your computer sends or receives data over the Internet, it sends that data to an IP address and a specific port on the remote computer, and receives the data on a (usually random) port on its own computer. If it uses the TCP protocol to send and receive the data, then it will connect and bind itself to a TCP port. If it uses the UDP protocol to send and receive data, it will use a UDP port.
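Multiplexing by port number can be observed with two sockets that share the loopback address. The sketch below is an illustration in Python 3; the "mail" and "web" labels merely echo the Thunderbird and Chrome example above and are not real services.

```python
import socket


def demo():
    """Two conversations share one IP address but are kept apart by port number."""
    mail = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    web = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for sock in (mail, web):
            sock.bind(("127.0.0.1", 0))  # same address, distinct OS-assigned ports
            sock.settimeout(1.0)         # do not hang if a datagram goes missing
        sender.sendto(b"for the mail program", mail.getsockname())
        sender.sendto(b"for the web program", web.getsockname())
        return {
            "mail_port": mail.getsockname()[1],
            "web_port": web.getsockname()[1],
            "mail_got": mail.recv(1024),
            "web_got": web.recv(1024),
        }
    finally:
        for sock in (mail, web, sender):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

Each datagram reaches only the socket bound to the port it was addressed to, even though both sockets live on the same IP address.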

UDP

Now we are going to focus on UDP (User Datagram Protocol).

How UDP works?

The UDP scheme is really quite simple; an IP address and port are all that is necessary to direct a packet to its destination.

Imagine, for example, that you set up a DNS server (see the chapter Socket names and DNS) on one of your machines, with the IP address 192.168.1.9. To allow other computers to find the service, the server will ask the operating system for permission to take control of the UDP port with the standard DNS port number 53. Assuming that no process is already running that has claimed that port number, the DNS server will be granted that port.

Next, imagine that a client machine with the IP address 192.168.1.30 on your network is given the IP address of this new DNS server and wants to issue a query. It will craft a DNS query in memory, and then ask the operating system to send that block of data as a UDP packet. Since there will need to be some way to identify the client when the packet returns, and since the client has not explicitly requested a port number, the operating system assigns it a random one, say port 44137.

The packet will therefore wing its way toward port 53 with labels that identify its source as the following IP address and UDP port number (here separated by a colon):

    192.168.1.30:44137

And it will give its destination as the following:

    192.168.1.9:53

This destination address, simple though it looks (just the number of a computer and the number of a port), is everything that an IP network stack needs to guide this packet to its destination. The DNS server will receive the request from its operating system, along with the originating IP and port number. Once it has formulated a response, the DNS server will ask the operating system to send the response as a UDP packet to the IP address and UDP port number from which the request originally came. The reply packet will have the source and destination swapped from what they were in the original packet, and upon its arrival at the source machine, it will be delivered to the waiting client program.
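The swap of source and destination between query and reply can be observed on the loopback interface. This is a minimal sketch in Python 3 under stated assumptions: both ends run in one process, the "server" uses an OS-assigned port instead of 53, and the payloads are placeholders rather than real DNS messages.

```python
import socket


def endpoint(address):
    """Format an (ip, port) pair in the ip:port notation used above."""
    return "%s:%d" % address


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # stand-in for 192.168.1.9:53
        server.settimeout(1.0)
        client.settimeout(1.0)

        client.sendto(b"query", server.getsockname())
        query, query_source = server.recvfrom(65535)

        # The reply goes back to wherever the query came from.
        server.sendto(b"answer to " + query, query_source)
        reply, reply_source = client.recvfrom(65535)

        return {
            "query": (endpoint(query_source), endpoint(server.getsockname())),
            "reply": (endpoint(reply_source), endpoint(query_source)),
            "payload": reply,
        }
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    result = demo()
    print("query: %s -> %s" % result["query"])
    print("reply: %s -> %s" % result["reply"])
```

The two printed lines show the same pair of endpoints in opposite order, which is all the reply needs in order to find its way back to the client.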

When to use UDP

So the User Datagram Protocol, UDP, lets user-level programs send individual packets across an IP network. Typically, a client program sends a packet to a server, which then replies using the return address built into every UDP packet. You might think that UDP would be very efficient for sending small messages. Actually, UDP is efficient only if your host only ever sends one message at a time and then waits for a response.

There are two good reasons to use UDP:

Because you are implementing a protocol that already exists, and it uses UDP.

Because unreliable subnet broadcast is a great pattern for your application, and UDP supports it perfectly.
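For the broadcast case, a UDP socket must first be given permission to send to broadcast addresses. The following Python 3 sketch shows only that step; the function name make_broadcast_socket is made up, port 1060 is simply the example port used by the listings later in this chapter, and whether a broadcast actually leaves the machine depends on its network interfaces.

```python
import socket


def make_broadcast_socket():
    """Create a UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without this option the operating system refuses broadcast sends.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


if __name__ == "__main__":
    sock = make_broadcast_socket()
    try:
        # May fail on hosts that have no broadcast-capable interface.
        sock.sendto(b"anyone there?", ("<broadcast>", 1060))
    except OSError as error:
        print("broadcast not possible here:", error)
    finally:
        sock.close()
```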

Socket (UDP)

As we have seen, sockets make talking to arbitrary machines around the world unbelievably easy (at least compared to other schemes).

When you craft programs that accept port numbers from user input like the command line or configuration files, it is friendly to allow not just numeric port numbers but also human-readable names for well-known ports. These names are standard, and are available through the getservbyname() call supported by Python’s standard socket module. If we want to ask where the Domain Name Service lives, we can find out this way:

    >>> import socket
    >>> socket.getservbyname('domain')
    53

Now examine the following code, udp_local.py, which shows a simple server and client. You can see already that all sorts of operations are taking place that are drawn from the socket module in the Python Standard Library.

    # UDP client and server on localhost
    import socket, sys

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    MAX = 65535
    PORT = 1060

    if sys.argv[1:] == ['server']:
        s.bind(('127.0.0.1', PORT))
        print 'Listening at', s.getsockname()
        while True:
            data, address = s.recvfrom(MAX)
            print 'The client at', address, 'says', repr(data)
            s.sendto('Your data was %d bytes' % len(data), address)

    elif sys.argv[1:] == ['client']:
        print 'Address before sending:', s.getsockname()
        s.sendto('This is my message', ('127.0.0.1', PORT))
        print 'Address after sending', s.getsockname()
        data, address = s.recvfrom(MAX)  # overly promiscuous - see text!
        print 'The server', address, 'says', repr(data)

    else:
        print >>sys.stderr, 'usage: udp_local.py server|client'

When running it without arguments, you should get something similar to this:

    root@erlerobot:~/Python_files# python udp_local.py
    usage: udp_local.py server|client

Now try running the server first:

    root@erlerobot:~/Python_files# python udp_local.py server
    Listening at ('127.0.0.1', 1060)

And then, in a new terminal window, the client:

    root@erlerobot:~/Python_files# python udp_local.py client
    Address before sending: ('0.0.0.0', 0)
    Address after sending ('0.0.0.0', 59726)
    The server ('127.0.0.1', 1060) says 'Your data was 18 bytes'

A new line will appear in the server window:

    The client at ('127.0.0.1', 59726) says 'This is my message'

Note that the Python program can always use a socket’s getsockname() method to retrieve the current IP and port to which the socket is bound. Once the socket has been bound successfully, the server is ready to start receiving requests! It enters a loop and repeatedly runs recvfrom(), telling the routine that it will happily receive messages up to a maximum length of MAX, which is equal to 65535 bytes. This value happens to be the greatest length that a UDP packet can possibly have, so we will always be shown the full content of each packet. Until we send a message with a client, our recvfrom() call will wait forever.
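The advice about accepting either a port number or a service name from user input could be wrapped in a small helper. This is a sketch for Python 3; the function name parse_port and its error messages are illustrative, and name lookups depend on the services database present on the host.

```python
import socket


def parse_port(text, protocol="udp"):
    """Accept either a numeric port ('53') or a well-known service name ('domain')."""
    if text.isdigit():
        port = int(text)
        if not 0 < port < 65536:
            raise ValueError("port out of range: %s" % text)
        return port
    try:
        return socket.getservbyname(text, protocol)
    except OSError:
        # Raised when the name is unknown or the host has no services database.
        raise ValueError("unknown service name: %r" % text)


if __name__ == "__main__":
    for candidate in ("1060", "domain"):
        try:
            print(candidate, "->", parse_port(candidate))
        except ValueError as error:
            print(candidate, "->", error)
```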

Page 27: Python Networking Gitbook

Becausetheclientandserverintheprevioussectionwerebothrunningonthesamemachineandtalkingthroughitsloopbackinterface—whichisnotevenaphysicalnetworkcardthatcouldexperienceasignalingglitchandloseapacket,butmerelyavirtualconnectionbacktothesamemachinedeepinthenetworkstack—therewasnorealwaythatpacketscouldgetlost,andsowedidnotactuallyseeanyoftheinconvenienceofUDP.

YoucanrunthisclientandserverexampleontwodifferentmachinesontheInternet.Andinsteadofalwaysansweringclientrequests,thisserverrandomlychoosestoansweronlyhalfoftherequestscominginfromclients—whichwillletusdemonstratehowtobuildreliabilityintoourclientcode,withoutwaitingwhatmightbehoursforarealdroppedpackettooccur.

importrandom,socket,sys

s=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)

MAX=65535

PORT=1060

if2<=len(sys.argv)<=3andsys.argv[1]=='server':

interface=sys.argv[2]iflen(sys.argv)>2else''

s.bind((interface,PORT))

print'Listeningat',s.getsockname()

whileTrue:

data,address=s.recvfrom(MAX)

ifrandom.randint(0,1):

print'Theclientat',address,'says:',repr(data)

s.sendto('Yourdatawas%dbytes'%len(data),address)

else:

print'Pretendingtodroppacketfrom',address

eliflen(sys.argv)==3andsys.argv[1]=='client':

hostname=sys.argv[2]

s.connect((hostname,PORT))

print'Clientsocketnameis',s.getsockname()

delay=0.1

whileTrue:

s.send('Thisisanothermessage')

print'Waitingupto',delay,'secondsforareply'

s.settimeout(delay)

try:

data=s.recv(MAX)

exceptsocket.timeout:

delay*=2#waitevenlongerforthenextrequest

ifdelay>2.0:

raiseRuntimeError('Ithinktheserverisdown')

except:

raise#arealerror,sowelettheuserseeit

else:

break#wearedone,andcanstoplooping

print'Theserversays',repr(data)

else:

print>>sys.stderr,'usage:udp_remote.pyserver[<interface>]'

print>>sys.stderr,'or:udp_remote.pyclient<host>'

sys.exit(2)

Running the file with no arguments prints its usage message:

root@erlerobot:~/Python_files# python udp_remote.py
usage: udp_remote.py server [ <interface> ]
   or: udp_remote.py client <host>

Then run the server:

root@erlerobot:~/Python_files# python udp_remote.py server
Listening at ('0.0.0.0', 1060)

And now the client; remember to pass the hostname where the server script is running (in this case the same machine):

root@erlerobot:~/Python_files# python udp_remote.py client 127.0.0.1
Client socket name is ('127.0.0.1', 54770)
Waiting up to 0.1 seconds for a reply
Waiting up to 0.2 seconds for a reply
Waiting up to 0.4 seconds for a reply
Waiting up to 0.8 seconds for a reply
The server says 'Your data was 23 bytes'

Unreliability, Backoff, Blocking, Timeouts

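As you can see, each time a request is received, the server uses randint() to flip a coin to decide whether this request will be answered, so that we do not have to keep running the client all day waiting for a real dropped packet. The client will find that one or more of its requests never result in replies. Each time a reply fails to arrive before the timeout, the client doubles its delay and sends the request again; it gives up once the delay has grown beyond two seconds.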
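The doubling of the timeout seen in the client output can be looked at in isolation. The following is only a sketch: the function name backoff_delays and its parameters are made up for illustration, and it is written in Python 3 syntax rather than the Python 2 of the listing above. It reproduces the sequence of waits that the client in udp_remote.py would go through if no reply ever arrived.

```python
def backoff_delays(initial=0.1, limit=2.0):
    """Yield each timeout the client would wait before giving up.

    Hypothetical helper: mirrors 'delay *= 2' and the 'delay > 2.0'
    check in the udp_remote.py client loop.
    """
    delay = initial
    while delay <= limit:
        yield delay
        delay *= 2  # wait twice as long before the next attempt


schedule = list(backoff_delays())
print(schedule)        # the individual waits, in seconds
print(sum(schedule))   # worst-case total time spent waiting for the server
```

Doubling the delay keeps a client from flooding a server that is merely slow, while the upper limit ensures that the program eventually reports a failure instead of waiting forever.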

The remote UDP client in udp_remote.py uses a new call that we have not discussed before: the connect() socket operation. You can see easily enough what it does. Instead of having to use sendto() and an explicit UDP address every time we want to send something to the server, the connect() call lets the operating system know ahead of time the remote address to which we want to send packets, so that we can simply supply data to the send() call and not have to repeat the server address again. But connect() does something else important, which will not be obvious at all from reading the udp_remote.py script. To approach this topic, let us return to the udp_local.py file for a moment. You will recall that both its client and server use the loopback IP address and assume reliable delivery; the client will wait forever for a response. Try running the client in one window:

root@erlerobot:~/Python_files# python udp_local.py
Address before sending: ('0.0.0.0', 0)
Address after sending ('0.0.0.0', 52970)

The client is now waiting, perhaps forever, for a response to the packet it has just sent to the localhost IP address at UDP port 1060. But what if we nefariously try sending it back a packet from a different server instead? From another command prompt on the same system, try running Python and entering these commands; for the port number, copy the integer that was just printed to the screen when you ran the UDP client:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.sendto('Fake reply', ('127.0.0.1', 52970))
10
>>>

In the client window, this appears:

The server ('127.0.0.1', 65320) says 'Fake reply'

It turns out that our first client accepts answers from anywhere. Even though the server is running on the localhost, and remote network connectivity is not even desirable, the client will even accept packets from another machine. If I bring up a Python prompt on another box and run the same commands as just shown, then a waiting client can even see the remote IP address.

There are, then, two ways to write UDP clients that are careful about the return addresses of the packets arriving back:

You can use sendto() and direct each outgoing packet to a specific destination, and then use recvfrom() to receive the replies and carefully check the return address it gives you against the list of servers to which you have made outstanding requests.

You can connect() your socket right after creating it, and then simply use send() and recv(), and the operating system will filter out unwanted packets for you. This works only for speaking to one server at a time, because running connect() a second time on the same socket does not add a second destination address to your UDP socket. Instead, it wipes out the first address entirely, so that no further replies from the earlier address will be delivered to your program.
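The filtering performed by a connected UDP socket can be observed on the loopback interface alone. The sketch below is an illustration under assumptions, not part of the original listings: it uses Python 3 syntax, the function name connected_socket_filters_strangers is invented, and port 0 is used so that the operating system picks free ports. It sends one datagram from an unrelated socket and one from the connected peer.

```python
import socket


def connected_socket_filters_strangers():
    """Return (stranger_delivered, server_reply) for a connected UDP client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    stranger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(('127.0.0.1', 0))         # port 0: let the OS pick a free port
        client.connect(server.getsockname())  # only this peer may now reply
        client.settimeout(0.5)

        # A datagram from an unrelated socket should be discarded by the OS.
        stranger.sendto(b'Fake reply', client.getsockname())
        try:
            client.recv(65535)
            stranger_delivered = True
        except socket.timeout:
            stranger_delivered = False

        # A datagram from the connected peer is delivered as usual.
        server.sendto(b'Real reply', client.getsockname())
        server_reply = client.recv(65535)
        return stranger_delivered, server_reply
    finally:
        for sock in (server, stranger, client):
            sock.close()


print(connected_socket_filters_strangers())
```

The first approach in the list, checking the address returned by recvfrom() by hand, would be needed instead when one socket has to talk to several servers at once.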

Connecting UDP Sockets

When using sockets, it is important to distinguish the act of “binding” (by which you grab a particular UDP port for the use of a particular socket) from the act that the client performs by “connecting,” which limits all replies received so that they can come only from the particular server to which you want to talk.

So far we have seen two possibilities for the IP address used in the bind() call that the server makes: you can use '127.0.0.1' to indicate that you only want packets from other programs running on the same machine, or use an empty string '' as a wildcard, indicating that you are willing to receive packets from any interface. It actually turns out that there is a third choice: you can provide the IP address of one of the machine's external IP interfaces, like its Ethernet connection or wireless card, and the server will listen only for packets destined for that IP. First, what if we bind solely to an external interface? Run the server like this, using whatever your operating system tells you is the external IP address of your system:

root@erlerobot:~/Python_files# python udp_remote.py server 192.168.1.35
Listening at ('192.168.1.35', 1060)

Connecting to this IP address from another machine should still work just fine:

root@erlerobot:~/Python_files# python udp_remote.py client 192.168.1.35
Client socket name is ('192.168.1.35', 58824)
Waiting up to 0.1 seconds for a reply
The server says 'Your data was 23 bytes'

But if you try connecting to the service through the loopback interface by running the client script on the same machine, the packets will never be delivered:

root@erlerobot:~/Python_files# python udp_remote.py client 127.0.0.1
Client socket name is ('127.0.0.1', 60251)
Waiting up to 0.1 seconds for a reply
Traceback (most recent call last):
  ...
socket.error: [Errno 111] Connection refused

If you run the client again on the same machine, but this time use the external IP address of the box, it will work without any error, even though the client and server are both running there. So binding to an IP interface might limit which external hosts can talk to you; but it will certainly not limit conversations with other clients on the same machine, so long as they know the IP address that they should use to connect.

Now, stop all of the scripts that are running, and we can try running two servers on the same box.

root@erlerobot:~/Python_files# python udp_remote.py server 127.0.0.1
Listening at ('127.0.0.1', 1060)

And then we try running a second one, bound to the wildcard IP address that allows requests from any address:

root@erlerobot:~/Python_files# python udp_remote.py server
Traceback (most recent call last):
  ...
socket.error: [Errno 98] Address already in use

We have learned something about operating system IP stacks and the rules that they follow: they do not allow two different sockets to listen at the same IP address and port number, because then the operating system would not know where to deliver incoming packets. But what if, instead of trying to run the second server against all IP interfaces, we just ran it against an external IP interface, one that the first copy of the server is not listening to? Let us try:

Binding to Interfaces (UDP)

root@erlerobot:~/Python_files# python udp_remote.py server 192.168.1.35
Listening at ('192.168.1.35', 1060)

It worked! This means that there are now two servers running on this machine, one of which is bound to the inward-looking port 1060 on the loopback interface, and the other looking outward for packets arriving on port 1060 from the network to which my wireless card has connected.

The IP network stack never thinks of a UDP port as a lone entity that is either entirely available, or else in use, at any given moment. Instead, it thinks in terms of UDP “socket names” that are always a pair linking an IP interface (even if it is the wildcard interface) with a UDP port number. It is these socket names that must not conflict among the listening servers at any given moment, rather than the bare UDP ports that are in use.
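The rule that socket names, rather than bare ports, must not conflict can be checked with two sockets on the loopback interface. This is a minimal sketch in Python 3 syntax, with variable names chosen only for illustration; it asks the operating system for a free port instead of using 1060, and then tries to bind a second socket to exactly the same (address, port) pair.

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(('127.0.0.1', 0))   # port 0 asks the OS for any free port
name = first.getsockname()     # the full socket name: (address, port)

try:
    second.bind(name)          # exactly the same (address, port) pair
    conflict = None
except OSError as exc:
    conflict = exc.errno       # "Address already in use" is expected here
finally:
    first.close()
    second.close()

print(name, errno.errorcode.get(conflict))
```

Binding the second socket to a different local address with the same port number, as in the two-server experiment above, would be the complementary test, but it requires a machine with a second configured interface.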

The foregoing program listings have suggested that a UDP packet can be up to 64 kB in size, whereas you probably already know that your Ethernet or wireless card can only handle packets of around 1,500 bytes instead.

The actual truth is that IP sends small UDP packets as single packets on the wire, but splits up larger UDP packets into several small physical packets. This means that large packets are more likely to be dropped, since if any one of their pieces fails to make its way to the destination, then the whole packet can never be reassembled and delivered to the listening operating system. But aside from the higher chance of failure, this process of fragmenting large UDP packets so that they will fit on the wire should be invisible to your application. There are three ways, however, in which it might be relevant:

If you are thinking about efficiency, you might want to limit your protocol to small packets, to make retransmission less likely and to limit how long it takes the remote IP stack to reassemble your UDP packet and give it to the waiting application.

If a firewall wrongfully blocks the ICMP packets that would normally allow your host to auto-detect the MTU between you and the remote host, then your larger UDP packets might disappear into oblivion without your ever knowing. The MTU is the “maximum transmission unit” or “largest packet size” that all of the network devices between two hosts will support.

If your protocol can make its own choices about how it splits up data between different packets, and you want to be able to auto-adjust this size based on the actual MTU between two hosts, then some operating systems let you turn off fragmentation and receive an error if a UDP packet is too big. This lets you regroup and split it into several packets if that is possible.

Linux is one operating system that supports this last option. Take a look at big_sender.py, which sends a very large message to one of the servers that we have just designed.

import IN, socket, sys

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

MAX = 65535
PORT = 1060

if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: big_sender.py host'
    sys.exit(2)

hostname = sys.argv[1]
s.connect((hostname, PORT))
s.setsockopt(socket.IPPROTO_IP, IN.IP_MTU_DISCOVER, IN.IP_PMTUDISC_DO)
try:
    s.send('#' * 65000)
except socket.error:
    print 'The message did not make it'
    option = getattr(IN, 'IP_MTU', 14)  # constant taken from <linux/in.h>
    print 'MTU:', s.getsockopt(socket.IPPROTO_IP, option)
else:
    print 'The big message was sent! Your network supports really big packets!'

If we run this program against a server elsewhere on my home network, then we discover that my wireless network allows physical packets that are no bigger than the 1,500 bytes typically supported by Ethernet-style networks:

root@erlerobot:~/Python_files# python big_sender.py 127.0.0.0
The message did not make it
MTU: 1500
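To get a feeling for how many physical packets a large datagram turns into, the arithmetic can be sketched as follows. The function name fragment_count is hypothetical, and the calculation assumes IPv4 with a 20-byte header that carries no options, an 8-byte UDP header, and fragment payloads that are multiples of 8 bytes; real paths may differ.

```python
IP_HEADER = 20   # bytes; assumes an IPv4 header without options
UDP_HEADER = 8   # bytes


def fragment_count(data_len, mtu=1500):
    """Estimate how many IP fragments a UDP payload of data_len bytes needs."""
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # fragment data comes in 8-byte units
    total = UDP_HEADER + data_len              # the UDP header rides in the first fragment
    return -(-total // per_fragment)           # ceiling division


print(fragment_count(65000))   # the payload that big_sender.py tries to send
print(fragment_count(1472))    # the largest payload that still fits in one packet
```

Under these assumptions, the 65,000-byte message from big_sender.py would need dozens of fragments, and losing any single one of them would lose the whole datagram.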

UDP Fragmentation

The POSIX socket interface also supports all sorts of socket options that control specific behaviors of network sockets. These are accessed through the Python socket methods getsockopt() and setsockopt(), using the options you will find documented for your operating system. You can find these options described in the Python documentation.

When fetching and setting socket options, the calls look similar to these:

value = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, value)

Here are some of the more common options:

SO_BROADCAST: Allows broadcast UDP packets to be sent and received; see the next section for details.

SO_DONTROUTE: Only be willing to send packets that are addressed to hosts on subnets to which this computer is connected directly.

SO_TYPE: When passed to getsockopt(), this tells you whether a socket is of type SOCK_DGRAM and can be used for UDP, or is of type SOCK_STREAM and instead supports the semantics of TCP.
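The options above can be exercised without sending any traffic. The following sketch, in Python 3 syntax with illustrative variable names, reads SO_BROADCAST, turns it on, reads it back, and then uses SO_TYPE to confirm that the socket is a datagram socket.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

before = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)  # off by default
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)        # switch it on
after = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)   # now nonzero

# SO_TYPE reports the socket type that was chosen when the socket was created.
is_udp = s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE) == socket.SOCK_DGRAM

s.close()
print(before, after, is_udp)
```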

NOTE:

If UDP has a superpower, it is its ability to support broadcast: instead of sending a packet to some specific other host, you can point it at an entire subnet to which your machine is attached and have the physical network card broadcast the packet so that all attached hosts see it without its having to be copied separately to each one of them. Here and here you can find two examples of broadcasting.

Socket Options

The Transmission Control Protocol (TCP) is the workhorse of the Internet. Protocols that carry documents and files nearly always ride atop TCP, including HTTP and all the major ways of transmitting e-mail. It is also the foundation of choice for protocols that carry on long conversations between people or computers, like SSH and many popular chat protocols.

TCP

First, every packet is given a sequence number, so that the system on the receiving end can put them back together in the right order, and so that it can notice missing packets in the sequence and ask that they be re-transmitted. Instead of using sequential integers (1, 2, …) to mark packets, TCP uses a counter that counts the number of bytes transmitted. So a 1,024-byte packet with a sequence number of 7,200 would be followed by a packet with a sequence number of 8,224. This means that a busy network stack does not have to remember how it broke a data stream up into packets; if asked for a re-transmission, it can break the stream up into packets some other way (which might let it fit more data into a packet if more bytes are now waiting for transmission), and the receiver can still put the packets back together.

Rather than running very slowly in lock-step by needing every packet to be acknowledged before it sends the next one, TCP sends whole bursts of packets at a time before expecting a response. The amount of data that a sender is willing to have on the wire at any given moment is called the size of the TCP “window.” The TCP implementation on the receiving end can regulate the window size of the transmitting end, and thus slow or pause the connection. This is called “flow control.” This lets it forbid the transmission of additional packets in cases where its input buffer is full and it would have to discard any more data if it were to arrive right now.

Finally, if TCP sees that packets are being dropped, it assumes that the network is becoming congested and stops sending as much data every second.
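The byte-counting rule for sequence numbers described above amounts to one line of arithmetic. In this sketch the function name next_sequence_number is invented for illustration; the modulus reflects the fact that the TCP sequence number is a 32-bit field that wraps around.

```python
SEQ_MODULUS = 2 ** 32   # the sequence number field is 32 bits wide and wraps


def next_sequence_number(seq, payload_len):
    """Sequence number of the packet that follows one carrying payload_len bytes."""
    return (seq + payload_len) % SEQ_MODULUS


print(next_sequence_number(7200, 1024))   # 8224, matching the example in the text
```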

How TCP works

Although TCP has very nearly become a universal default when two programs need to communicate, we should look at a few instances in which its behavior is not optimal for certain kinds of data, in case an application you are writing ever falls into one of these categories. First, TCP is unwieldy for protocols where clients want to send single, small requests to a server, and then are done and will not talk to it further. It takes three packets for two hosts to set up a TCP connection, the famous sequence of SYN, SYN-ACK, and ACK (which mean “I want to talk, here is the packet sequence number I will be starting with”; “okay, here's mine”; “okay!”), and then another three or four to shut the connection back down (either a quick FIN, FIN-ACK, ACK, or a slightly longer pair of separate FIN and ACK packets). That is six packets just to send a single request! Protocol designers quickly turn to UDP in such cases.

In view of this, we are going to detail two situations where the use of TCP is not appropriate:

The first is where no long-term relationship exists between client and server. This is where UDP really shines over TCP, especially where there are so many clients that a typical TCP implementation would run out of port numbers if it had to keep up with a separate data stream for each active client.

The second situation where TCP is inappropriate is when an application can do something much smarter than simply re-transmit data when a packet has been lost. Imagine an audio chat conversation, for example: if a second's worth of data is lost because of a dropped packet, then it will do little good to simply re-send that same second of audio, over and over, until it finally arrives.

When to use TCP

As we have mentioned before, TCP uses port numbers to distinguish different applications running at the same IP address, and follows exactly the same conventions regarding well-known and ephemeral port numbers. With a stateful stream protocol like TCP, the connect() call becomes the fundamental act upon which all other network communication hinges. TCP connect() can fail: the remote host might not answer; it might refuse the connection; or more obscure protocol errors might occur, like the immediate receipt of a RST (“reset”) packet. Because a stream connection involves setting up a persistent connection between two hosts, the other host needs to be listening and ready to accept your connection.

On the “server side” (which, for the purpose of this chapter, is the conversation partner not doing the connect() call but receiving the SYN packet that it initiates), an incoming connection generates an even more momentous event: the creation of a new socket. This is because the standard POSIX interface to TCP actually involves two completely different kinds of sockets: “passive” listening sockets and active “connected” ones:

A passive socket holds the “socket name” (the address and port number) at which the server is ready to receive connections. No data can ever be received or sent by this kind of socket; it does not represent any actual network conversation. Instead, it is how the server alerts the operating system to its willingness to receive incoming connections in the first place.

An active socket (connected socket) is bound to one particular remote conversation partner, who has their own IP address and port number. It can be used only for talking back and forth with that partner, and can be read and written to without worrying about how the resulting data will be split up into packets. In many cases, a connected socket can be passed to another POSIX program that expects to read from a normal file, and the program will never even know that it is talking to the network.

Note that while a passive socket is made unique by the interface address and port number at which it is listening (so that no one else is allowed to grab that same address and port), there can be many active sockets that all share the same local socket name.

What makes an active socket unique is, rather, the four-part coordinate (local_ip, local_port, remote_ip, remote_port). It is this four-tuple by which the operating system names each active TCP connection, and incoming TCP packets are examined to see whether their source and destination address associate them with any of the currently active sockets on the system.
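The claim that several active sockets can share one local socket name, and differ only in their four-tuple, can be observed on the loopback interface. This is a sketch in Python 3 syntax with illustrative variable names; it lets the operating system choose the listening port and opens two client connections to the same listener.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # passive socket
listener.bind(('127.0.0.1', 0))   # port 0: any free port
listener.listen(2)

# Two clients connect to the same listening socket name.
clients = [socket.create_connection(listener.getsockname(), timeout=5)
           for _ in range(2)]
# Each accept() produces a new active socket for one of the connections.
accepted = [listener.accept()[0] for _ in range(2)]

local_names = {sc.getsockname() for sc in accepted}
four_tuples = {sc.getsockname() + sc.getpeername() for sc in accepted}

for sock in clients + accepted + [listener]:
    sock.close()

print(len(local_names), len(four_tuples))
```

Both accepted sockets report the listener's address and port as their own local name; only the remote half of each four-tuple tells the two connections apart.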

What TCP Sockets Mean

Here is the code of a simple TCP client and server, tcp_sixteen.py, that send and receive 16 octets:

import socket, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060

def recv_all(sock, length):
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'We have accepted a connection from', sockname
        print 'Socket connects', sc.getsockname(), 'and', sc.getpeername()
        message = recv_all(sc, 16)
        print 'The incoming sixteen-octet message says', repr(message)
        sc.sendall('Farewell, client')
        sc.close()
        print 'Reply sent, socket closed'

elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    print 'Client has been assigned socket name', s.getsockname()
    s.sendall('Hi there, server')
    reply = recv_all(s, 16)
    print 'The server said', repr(reply)
    s.close()

else:
    print >>sys.stderr, 'usage: tcp_sixteen.py server|client [host]'

First, the TCP connect() call is not the innocuous bit of local socket configuration that it is in the case of UDP, where it merely sets a default address used with any subsequent send() calls, and places a filter on packets arriving at our socket. Here, connect() is a real live network operation that kicks off the three-way handshake between the client and server machine so that they are ready to communicate. This means that connect() can fail, as you can verify quite easily by executing this script when the server is not running:

root@erlerobot:~/Python_files# python tcp_sixteen.py client
Traceback (most recent call last):
  File "tcp_sixteen.py", line 29, in <module>
    s.connect((HOST, PORT))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock, name)(*args)
socket.error: [Errno 61] Connection refused

You will see that this TCP client is in one way much simpler than our UDP client, because it does not need to make any provision for missing data. Because of the assurances that TCP provides, it can send() data without checking whether the remote end receives it, and run recv() without having to consider the possibility of re-transmitting its request.

A Simple TCP Client and Server

When we perform a TCP send(), our operating system's networking stack will face one of three situations:

The data can be immediately accepted by the system, either because the network card is immediately free to transmit, or because the system has room to copy the data to a temporary outgoing buffer so that your program can continue running. In these cases, send() returns immediately, and it will return the length of your data string because the whole string was transmitted.

Another possibility is that the network card is busy and that the outgoing data buffer for this socket is full and the system cannot, or will not, allocate any more space. In this case, the default behavior of send() is simply to block, pausing your program until the data can be accepted.

There is a final, hybrid possibility: that the outgoing buffers are almost full, but not quite, and so part of the data you are trying to send can be immediately queued, but the rest will have to wait. In this case, send() completes immediately and returns the number of bytes accepted from the beginning of your data string, but leaves the rest of the data unprocessed.
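Because of the last case, a program that calls send() directly has to check the return value and offer the unsent remainder again. The loop below is only a sketch of that idea in Python 3 syntax: the helper name send_everything is made up, and socket.socketpair() is used merely to obtain two connected stream sockets without a network.

```python
import socket


def send_everything(sock, data):
    """Call send() repeatedly until every byte of data has been accepted."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])  # may accept only part of what is offered
        total += sent
    return total


a, b = socket.socketpair()              # two connected stream sockets, for illustration
count = send_everything(a, b'Hi there, server')
received = b.recv(16)
a.close()
b.close()
print(count, received)
```

With a message this small the loop body normally runs only once; the second and later iterations matter when the outgoing buffer is nearly full.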

Because of this last possibility, a careful program has to call send() in a loop, offering whatever data remains, until all of it has been accepted. Fortunately, Python does not force us to do this dance ourselves every time we have a block of data to send: the Standard Library socket implementation provides a friendly sendall() method. Not only is sendall() faster than doing it ourselves, it releases the Global Interpreter Lock during its loop so that other Python threads can run without contention until all of the data has been transmitted. Unfortunately, no equivalent is provided for the recv() call, despite the fact that it might return only part of the data that is on the way from the client. Internally, the operating system implementation of recv() uses logic very close to that used when sending:

If no data is available, then recv() blocks and your program pauses until data arrives.

If plenty of data is available already in the incoming buffer, then you are given as many bytes as you asked recv() for.

But if the buffer contains a bit of data, but not as much as you are asking for, then you are immediately returned what does happen to be there, even if it is not as much as you have asked for.

In the code stored in tcp_sixteen.py, you can see how the distinction between active and listening sockets is carried through in actual server code. The link, which might strike you as odd at first, is that a listening socket actually produces new connected sockets as the return value of accept(). Follow the steps in the program listing to see the order in which the socket operations occur.

Run the server:

root@erlerobot:~/Python_files# python tcp_sixteen.py server
Listening at ('127.0.0.1', 1060)

And then the client (in another terminal window):

root@erlerobot:~/Python_files# python tcp_sixteen.py client
Client has been assigned socket name ('127.0.0.1', 49607)
The server said 'Farewell, client'

The server prints this:

We have accepted a connection from ('127.0.0.1', 49607)
Socket connects ('127.0.0.1', 1060) and ('127.0.0.1', 49607)
The incoming sixteen-octet message says 'Hi there, server'
Reply sent, socket closed
Listening at ('127.0.0.1', 1060)

The IP address that you pair with a port number when you perform a bind() operation tells the operating system which network interfaces you are willing to receive connections from. The example invocations of tcp_sixteen.py used the localhost IP address 127.0.0.1, which protects your code from connections originating on other machines. You can verify this by running tcp_sixteen.py in server mode as shown previously, and trying to connect with a client from another machine:

root@erlerobot:~/Python_files# python tcp_sixteen.py client 192.168.1.35
Traceback (most recent call last):
  File "tcp_sixteen.py", line 29, in <module>
    s.connect((HOST, PORT))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock, name)(*args)
socket.error: [Errno 61] Connection refused

But if you run the server with an empty string for the hostname, which tells the Python bind() routine that you are willing to accept connections through any of your machine's active network interfaces, then the client can connect successfully from another host:

root@erlerobot:~/Python_files# python tcp_sixteen.py server ""
Listening at ('0.0.0.0', 1060)

Run the client:

root@erlerobot:~/Python_files# python tcp_sixteen.py client 192.168.1.35
Client has been assigned socket name ('192.168.1.35', 49696)
The server said 'Farewell, client'

This appears in the server terminal:

We have accepted a connection from ('192.168.1.35', 49696)
Socket connects ('192.168.1.35', 1060) and ('192.168.1.35', 49696)
The incoming sixteen-octet message says 'Hi there, server'
Reply sent, socket closed
Listening at ('0.0.0.0', 1060)

Binding to Interfaces (TCP)

The term “deadlock” is used for all sorts of situations in computer science where two programs, sharing limited resources, can wind up waiting on each other forever because of poor planning. It turns out that it can happen fairly easily when using TCP.

Take a look at tcp_deadlock.py for an example of a server and client that try to be a bit too clever without thinking through the consequences. Here, the server author has done something that is actually quite intelligent. The server's job is to turn an arbitrary amount of text into uppercase. Recognizing that its client's requests can be arbitrarily large, and that one could run out of memory trying to read an entire stream of input before trying to process it, the server reads and processes small blocks of 1,024 bytes at a time.

import socket, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

HOST = '127.0.0.1'
PORT = 1060

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'Processing up to 1024 bytes at a time from', sockname
        n = 0
        while True:
            message = sc.recv(1024)
            if not message:
                break
            sc.sendall(message.upper())  # send it back uppercase
            n += len(message)
            print '\r%d bytes processed so far' % (n,),
            sys.stdout.flush()
        print
        sc.close()
        print 'Completed processing'

elif len(sys.argv) == 3 and sys.argv[1] == 'client' and sys.argv[2].isdigit():
    bytes = (int(sys.argv[2]) + 15) // 16 * 16  # round up to a multiple of 16
    message = 'capitalize this!'  # 16-byte message to repeat over and over
    print 'Sending', bytes, 'bytes of data, in chunks of 16 bytes'
    s.connect((HOST, PORT))
    sent = 0
    while sent < bytes:
        s.sendall(message)
        sent += len(message)
        print '\r%d bytes sent' % (sent,),
        sys.stdout.flush()
    print
    s.shutdown(socket.SHUT_WR)
    print 'Receiving all the data the server sends back'
    received = 0
    while True:
        data = s.recv(42)
        if not received:
            print 'The first data received says', repr(data)
        received += len(data)
        if not data:
            break
        print '\r%d bytes received' % (received,),
    s.close()

else:
    print >>sys.stderr, 'usage: tcp_deadlock.py server | client <bytes>'

Deadlock

If you start the server and then run the client with a command-line argument specifying a modest number of bytes (say, asking it to send 32 bytes of data; for simplicity, it will round whatever value you supply up to a multiple of 16 bytes), then it will get its text back in all uppercase:

root@erlerobot:~/Python_files# python tcp_deadlock.py server
Listening at ('127.0.0.1', 1060)

root@erlerobot:~/Python_files# python tcp_deadlock.py client 32
Sending 32 bytes of data, in chunks of 16 bytes
32 bytes sent
Receiving all the data the server sends back
The first data received says 'CAPITALIZE THIS!CAPITALIZE THIS!'
32 bytes received

On the server screen this is displayed:

Processing up to 1024 bytes at a time from ('127.0.0.1', 49702)
32 bytes processed so far
Completed processing
Listening at ('127.0.0.1', 1060)

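Now, try using the client to send a very large stream of data, say, one totaling a gigabyte:

root@erlerobot:~/Python_files# python tcp_deadlock.py client 1073741824
Sending 1073741824 bytes of data, in chunks of 16 bytes
1399600 bytes sent

In the server window:

Processing up to 1024 bytes at a time from ('127.0.0.1', 49703)
688032 bytes processed so far

You will see both the client and the server furiously updating their terminal windows as they breathlessly update you with the amount of data they have transmitted and received. The numbers will climb and climb until, quite suddenly, both connections freeze. The server's output buffer and the client's input buffer have both finally filled, and TCP has used its window adjustment protocol to signal this fact and stop the socket from sending more data that would have to be discarded and later re-sent. The reason is that the client does not call recv() until it has finished sending, so the server's replies pile up unread; once those buffers are full, the server blocks inside sendall() and stops reading, and soon the client's own sendall() blocks as well.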

tcp_deadlock.py shows us how a Python socket object behaves when an end-of-file is reached. You will see that the client makes a shutdown() call on the socket after it finishes sending its transmission. This solves an important problem: if the server is going to read forever until it sees end-of-file, then how will the client avoid having to do a full close() on the socket and thus forbid itself from doing the many recv() calls that it still needs to make to receive the server's response? The solution is to “half-close” the socket (that is, to permanently shut down communication in one direction but without destroying the socket itself) so that the server can no longer read any data, but can still send any remaining reply back in the other direction, which will still be open. The shutdown() call can be used to end either direction of communication in a two-way socket like this; its argument can be one of three symbols:

SHUT_WR: This is the most common value used, since in most cases a program knows when its own output is finished but not about when its conversation partner will be done. This value says that the caller will be writing no more data into the socket, and that reads from its other end should act like it is closed.

SHUT_RD: This is used to turn off the incoming socket stream, so that an end-of-file error is encountered if your peer tries to send any more data to you on the socket.

SHUT_RDWR: This closes communication in both directions on the socket. It might not, at first, seem useful, because you can also just perform a close() on the socket and communication is similarly ended in both directions. The difference is a rather advanced one: if several programs on your operating system are allowed to share a single socket, then close() just ends your process's relationship with the socket, but keeps it open as long as another process is still using it; but shutdown() will always immediately disable the socket for everyone using it.
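The half-close with SHUT_WR can be tried in miniature without the deadlock-prone client. This sketch uses Python 3 syntax and socket.socketpair() as a stand-in for a real client and server connection; the variable names are illustrative only.

```python
import socket

client, server = socket.socketpair()   # two connected stream sockets, for illustration

client.sendall(b'capitalize this!')
client.shutdown(socket.SHUT_WR)        # no more writes from us; reading stays possible

request = b''
while True:
    chunk = server.recv(1024)
    if not chunk:                      # an empty result signals end-of-file
        break
    request += chunk

server.sendall(request.upper())        # the reply direction is still open
reply = client.recv(1024)

server.close()
client.close()
print(request, reply)
```

The server's read loop ends only because of the shutdown() call; without it, recv() would block waiting for more data that never comes.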

Closed Connections, Half-Open Connections

Since TCP sockets carry streams of data, they might have already reminded you of normal files, which also support reading and writing as fundamental operations. Python does a very good job of keeping these concepts separate: file objects can read() and write(), sockets can send() and recv(), and no kind of object can do both. But sometimes you will want to treat a socket like a normal Python file object, often because you want to pass it into code like that of the many Python modules such as pickle, json, and zlib that can read and write data directly from a file. For this purpose, Python provides a makefile() method on every socket that returns a Python file object that is really calling recv() and send() behind the scenes:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> hasattr(s, 'read')
False
>>> f = s.makefile()
>>> hasattr(f, 'read')
True

Sockets, like normal Python files, also have a fileno() method that lets you discover their file descriptor number in case you need to supply it to lower-level calls.
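As an illustration of why makefile() is convenient, the sketch below hands socket-backed file objects to the json module. It is written in Python 3 syntax, where makefile() takes a mode string much like open(); the socketpair() call and the example dictionary are assumptions made only to keep the example self-contained.

```python
import json
import socket

a, b = socket.socketpair()   # two connected stream sockets, for illustration
writer = a.makefile('w')     # text-mode file object that sends through socket a
reader = b.makefile('r')     # text-mode file object that receives from socket b

json.dump({'port': 1060}, writer)   # json writes to what it believes is a file
writer.write('\n')
writer.flush()                      # push the buffered text out through the socket

message = json.loads(reader.readline())
fd = a.fileno()                     # the integer descriptor behind the socket

for obj in (writer, reader, a, b):
    obj.close()
print(message, fd)
```

The explicit flush() matters: file objects buffer their output, so nothing reaches the peer until the buffer is flushed or the file is closed.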

Using TCP Streams like Files

In this chapter, we will discuss the topic of network addresses and will describe the distributed service that allows names to be resolved to raw IP addresses.

Socket names and DNS

The last chapter has already introduced you to the fact that sockets cannot be named with a single primitive Python value like a number or string. Instead, both TCP and UDP use integer port numbers to share a single machine's IP address among the many different applications that might be running there, and so the address and port number have to be combined in order to produce a socket name, like this:

('18.9.22.69', 80)

You will recall that socket names are important at several points in the creation and use of sockets. For your reference, here are all of the major socket methods that either take a socket name as an argument or return one:

mysocket.accept(): Each time this is called on a listening TCP stream socket that has incoming connections ready to hand off to the application, it returns a tuple (ordered set of values) whose second item is the remote address that has connected (the first item in the tuple is the new socket connected to that remote address).

mysocket.bind(address): Assigns the socket the local address so that outgoing packets have an address from which to originate, and so that any incoming connections from other machines have a name that they can use to connect.

mysocket.connect(address): Establishes that data sent through this socket will be directed to the given remote address. For UDP sockets, this simply sets the default address used if the caller uses send() rather than sendto(); for TCP sockets, this actually negotiates a new stream with another machine using a three-way handshake, and raises an exception if the negotiation fails.

mysocket.getpeername(): Returns the remote address to which this socket is connected.

mysocket.getsockname(): Returns the address of this socket's own local endpoint.

mysocket.recvfrom(...): For UDP sockets, this returns a tuple that pairs a string of returned data with the address from which it was just sent.

mysocket.sendto(data, address): An unconnected UDP socket uses this method to fire off a data packet at a particular remote address.

In general, any of the foregoing methods can receive or return any of the sorts of addresses that follow, meaning that they will work regardless of whether you are using IPv4, IPv6, or another address family.

Socket names

Ifyoureviewpreviouscode,youwillnoticethatwehaveuse:

importsocket

s=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)

s.bind(('localhost',1060))

We paid particular attention to the hostnames and IP addresses that the sockets used. But if you read each program listing from the beginning, you will see that these are only the last two coordinates of five major decisions that were made during the construction and deployment of each socket object. In order, here is the full list of values that had to be chosen:

First, the address family makes the biggest decision: it names what kind of network you want to talk to, out of the many kinds that a particular machine might support. The examples so far always use the value AF_INET.

Next after the address family comes the socket type. It chooses the particular kind of communication technique that you want to use on the network you have chosen. The socket interface designers decided to create generic names for the broad idea of a packet-based socket, which goes by the name SOCK_DGRAM, and the broad idea of a reliable flow-controlled data stream, which as we have seen is known as SOCK_STREAM.

The third field in the socket() call, the protocol, is rarely used because once you have specified the address family and socket type, you have narrowed down the possible protocols to one major option.

The fourth and fifth fields are, then, the IP address and UDP or TCP port number that were explained in detail in the last chapters.
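To see all five coordinates on one socket object, a short sketch (Python 3 syntax assumed; the variable names are illustrative) can read the first three back from the socket's attributes and the last two from its name:

```python
import socket

# Coordinates 1-3 are given to the socket() constructor.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

# Coordinates 4-5 (IP address and port) arrive together as the socket name.
s.bind(('127.0.0.1', 0))          # port 0: let the OS choose a free port
ip_address, port = s.getsockname()

coordinates = (s.family, s.type, s.proto, ip_address, port)
s.close()
```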

Five socket coordinates

And having explained all of that, it turns out that this book actually does need to introduce one additional address family, beyond the AF_INET we have used so far: the address family for IPv6, named AF_INET6, which is the way forward into a future where the world does not, in fact, run out of IP addresses.

In Python you can test directly for whether the underlying platform supports IPv6 by checking the has_ipv6 Boolean attribute inside the socket module:

>>> import socket
>>> socket.has_ipv6
True

But note that this does not tell you whether an actual IPv6 interface is up and configured and can currently be used to send packets anywhere; it is purely an assertion about whether IPv6 support has been compiled into the operating system, not about whether it is in use.

The differences that IPv6 will make for your Python code might sound quite daunting, if listed one right after the other:

Your sockets have to be prepared to have the family AF_INET6 if you are called upon to operate on an IPv6 network.

No longer do socket names consist of just two pieces, an address and a port number; instead, they can also involve additional coordinates that provide “flow” information and a “scope” identifier.

The pretty IPv4 octets like 18.9.22.69 that you might already be reading from configuration files or from your command-line arguments will now sometimes be replaced by IPv6 host addresses instead, which you might not even have good regular expressions for yet. They have lots of colons, they can involve hexadecimal numbers, and in general they look quite ugly.
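Rather than writing a regular expression for IPv6 addresses, one option is to let the Standard Library parse them. This sketch assumes Python 3, where the ipaddress module is available; classify_host is a hypothetical helper name:

```python
import ipaddress

def classify_host(text):
    """Return 'IPv4', 'IPv6', or 'hostname' for a user-supplied string."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return 'hostname'   # not a literal address, so treat it as a name
    return 'IPv%d' % address.version
```

For example, classify_host('18.9.22.69') reports an IPv4 address, while a string such as 'gatech.edu' falls through to the hostname case.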

IPv6


To make your code simple, powerful, and immune from the complexities of the transition from IPv4 to IPv6, you should turn your attention to one of the most powerful tools in the Python socket user's arsenal: getaddrinfo(). The getaddrinfo() function sits in the socket module along with most other operations that involve addresses (rather than being a socket method). Unless you are doing something specialized, it is probably the only routine that you will ever need to transform the hostnames and port numbers that your users specify into addresses that can be used by socket methods. Its approach is simple: rather than making you attack the addressing problem piecemeal, which is necessary when using the older routines in the socket module, it lets you specify everything you know about the connection that you need to make in a single call. In response, it returns all of the coordinates we discussed earlier that are necessary for you to create and connect a socket to the named destination.

The official Python documentation gives some useful explanations. First, the syntax is the following:

socket.getaddrinfo(host, port[, family[, socktype[, proto[, flags]]]])

What getaddrinfo() does is translate the host/port argument into a sequence of 5-tuples that contain all the necessary arguments for creating a socket connected to that service. host is a domain name, a string representation of an IPv4/v6 address, or None. port is a string service name such as 'http', a numeric port number, or None. By passing None as the value of host and port, you can pass NULL to the underlying C API.

The function returns a list of 5-tuples with the following structure:

(family, socktype, proto, canonname, sockaddr)

In these tuples, family, socktype, and proto are all integers and are meant to be passed to the socket() function. canonname will be a string representing the canonical name of the host if AI_CANONNAME is part of the flags argument; otherwise canonname will be empty. sockaddr is a tuple describing a socket address, whose format depends on the returned family (an (address, port) 2-tuple for AF_INET, an (address, port, flowinfo, scopeid) 4-tuple for AF_INET6), and is meant to be passed to the socket.connect() method.

Here is an example of its use:

>>> import socket
>>> from pprint import pprint
>>> infolist = socket.getaddrinfo('gatech.edu', 'www')
>>> pprint(infolist)
[(2, 2, 17, '', ('130.207.160.173', 80)),
 (2, 1, 6, '', ('130.207.160.173', 80))]
>>>
>>> ftpca = infolist[0]
>>> ftpca[0:3]
(2, 2, 17)
>>> s = socket.socket(*ftpca[0:3])
>>> ftpca[4]
('130.207.160.173', 80)
>>> s.connect(ftpca[4])
>>>

ftpca here is an acronym for the order of the variables that are returned: “family, type, protocol, canonical name, and address,” which contain everything you need to make a connection. Here, we have asked about the possible methods for connecting to the HTTP port of the host gatech.edu, and have been told that there are two ways to do it: by creating a SOCK_STREAM socket (socket type 1) that uses IPPROTO_TCP (protocol number 6) or else by using a SOCK_DGRAM (socket type 2) socket with IPPROTO_UDP (which is the protocol represented by the integer 17).

As you can see from the foregoing code snippet, getaddrinfo() generally allows not only the hostname but also the port name to be a symbol rather than an integer.
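The five fields can also be unpacked by name instead of by index. The sketch below assumes Python 3 and deliberately uses a literal loopback address with AI_NUMERICHOST so that it runs without any DNS lookup; the address and port are placeholders, not values from the session above:

```python
import socket

infolist = socket.getaddrinfo('127.0.0.1', 80, type=socket.SOCK_STREAM,
                              flags=socket.AI_NUMERICHOST)

# Each entry unpacks into the "ftpca" fields described in the text.
for family, socktype, proto, canonname, sockaddr in infolist:
    print(family, socktype, proto, repr(canonname), sockaddr)

first_family, first_type, first_proto, _, first_sockaddr = infolist[0]
```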

The getaddrinfo() function

Before tackling all of the options that getaddrinfo() supports, it will be more useful to see how it is used to support three basic network operations. We will tackle them in the order that you might perform operations on a socket: binding, connecting, and then identifying a remote host who has sent you information.

>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo(None, 'smtp', 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
[(2, 1, 6, '', ('0.0.0.0', 25)), (30, 1, 6, '', ('::', 25, 0, 0))]
>>> getaddrinfo(None, 53, 0, socket.SOCK_DGRAM, 0, socket.AI_PASSIVE)
[(2, 2, 17, '', ('0.0.0.0', 53)), (30, 2, 17, '', ('::', 53, 0, 0))]
>>>

Here we asked about where we should bind() a socket if we want to serve SMTP traffic using TCP, and if we want to serve DNS traffic using UDP, respectively. The answers we got back in each case are the appropriate wildcard addresses that will let us bind to every IPv4 and every IPv6 interface on the local machine with all of the right values for the socket family, socket type, and protocol in each case. If you instead want to bind() to a particular IP address that you know that the local machine holds, then omit the AI_PASSIVE flag and just specify the hostname. For example, here are two ways that you might try binding to localhost:

>>> getaddrinfo('127.0.0.1', 'smtp', 0, socket.SOCK_STREAM, 0)
[(2, 1, 6, '', ('127.0.0.1', 25))]
>>> getaddrinfo('localhost', 'smtp', 0, socket.SOCK_STREAM, 0)
[(30, 1, 6, '', ('::1', 25, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 25)), (30, 1, 6, '', ('fe80::1%lo0', 25, 0, 1))]
>>>

You can see that supplying the IPv4 address for the localhost locks you down to receiving connections only over IPv4, while using the symbolic name localhost (at least on a Linux laptop, with a well-configured /etc/hosts file) makes available both the IPv4 and IPv6 local names for the machine.
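The wildcard answers can be fed straight into socket() and bind(). The following is one possible sketch, assuming Python 3; open_listeners is a hypothetical helper, and setting IPV6_V6ONLY is a design choice made here so that the IPv6 socket does not also claim the IPv4 wildcard:

```python
import socket

def open_listeners(port):
    """Open one listening TCP socket per wildcard address getaddrinfo() suggests."""
    listeners = []
    infolist = socket.getaddrinfo(None, port, 0, socket.SOCK_STREAM, 0,
                                  socket.AI_PASSIVE)
    for family, socktype, proto, _canonname, sockaddr in infolist:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError:
            continue  # this address family is not usable on this machine
        try:
            if family == socket.AF_INET6:
                # Keep the IPv6 socket from also claiming the IPv4 wildcard.
                sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(sockaddr)
            sock.listen(5)
        except OSError:
            sock.close()
            continue  # could not bind this particular address
        listeners.append(sock)
    return listeners
```

Calling open_listeners(0) asks the operating system for free ports, which is convenient for experiments; a real server would pass its service port instead.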

Asking getaddrinfo() Where to Bind

Most uses of getaddrinfo() are outward-looking and generate information suitable for connecting you to other applications. In all such cases, you can either use an empty string to indicate that you want to connect back to the localhost using the loopback interface, or provide a string giving an IPv4 address, IPv6 address, or hostname to name your destination. When you are preparing to connect() or sendto(), the usual approach is to specify the AI_ADDRCONFIG flag, which filters out any addresses that are impossible for your computer to reach. For example, an organization might have both an IPv4 and an IPv6 range of IP addresses; but if your particular host supports only IPv4, then you will want the results filtered to include only addresses in that family. In case the local machine has only an IPv6 network interface but the service you are connecting to supports only IPv4, the AI_V4MAPPED flag will return those IPv4 addresses re-encoded as IPv6 addresses that you can actually use. So you will usually use getaddrinfo() this way when connecting:

>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo('ftp.kernel.org', 'ftp', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED)
[(2, 1, 6, '', ('199.204.44.194', 21)), (2, 1, 6, '', ('198.145.20.140', 21)), (2, 1, 6, '', ('149.20.4.69', 21))]
>>>

And we have gotten exactly what we wanted: every way to connect to a host named ftp.kernel.org through a TCP connection to its FTP port.

Here is another query, which describes how I can connect from my laptop to the HTTP interface of the IANA that assigns port numbers in the first place:

>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED)
[(2, 1, 6, '', ('192.0.43.8', 80))]
>>>

If we take away our carefully chosen flags in the sixth parameter, then we will also be able to see their IPv6 address:

>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0)
[(2, 1, 6, '', ('192.0.43.8', 80)), (30, 1, 6, '', ('2001:500:88:200::8', 80, 0, 0))]
>>>

Asking getaddrinfo() About Services

One last circumstance that you will commonly encounter is where you either are making a new connection, or maybe have just received a connection to one of your own sockets, and you want an attractive hostname to display to the user or record in a log file. This is slightly dangerous because a hostname lookup can take quite a bit of time, even on the modern Internet, and might return a hostname that no longer works by the time you go and check your logs; so for log files, try to record both the hostname and raw IP address. But if you have a good use for the “canonical name” of a host, then try running getaddrinfo() with the AI_CANONNAME flag turned on, and the fourth item of any of the tuples that it returns (which were always empty strings in the foregoing examples, you will note) will contain the canonical name:

>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME)
[(2, 1, 6, 'iana.org', ('192.0.43.8', 80))]
>>>

Asking getaddrinfo() for Pretty Hostnames

The flags available vary somewhat by operating system, and you should always consult your own computer's documentation (not to mention its configuration) if you are confused about a value that it chooses to return. But there are several flags that tend to be cross-platform; here are some of the more important ones:

AI_ALL: The AI_V4MAPPED option will save you in the situation where you are on a purely IPv6-connected host, but the host to which you want to connect advertises only IPv4 addresses: it resolves this problem by “mapping” the IPv4 addresses to their IPv6 equivalent. But if some IPv6 addresses do happen to be available, then they will be the only ones shown. Thus the existence of this option: if you want to see all of the addresses from your IPv6-connected host, even though some perfectly good IPv6 addresses are available, then combine this AI_ALL flag with AI_V4MAPPED and the list returned to you will have every address known for the target host.

AI_NUMERICHOST: This turns off any attempt to interpret the hostname parameter (the first parameter to getaddrinfo()) as a textual hostname like cern.ch, and only tries to interpret the hostname string as a literal IPv4 or IPv6 address like 74.207.234.78 or fe80::fcfd:4aff:fecf:ea4e. This is much faster, as no DNS round-trip is incurred (see the next section), and prevents possibly untrusted user input from forcing your system to issue a query to a name server under someone else's control.

AI_NUMERICSERV: This turns off symbolic port names like www and insists that port numbers like 80 be used instead. This does not necessarily have the network query implications of the previous option, since port-number databases are typically stored locally on IP-connected machines; on POSIX systems, resolving a symbolic port name typically requires only a quick scan of the /etc/services file (but check your /etc/nsswitch.conf file's services option to be sure). But if you know your port string should always be an integer, then activating this flag can be a useful sanity check.
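The two numeric flags combine naturally into a sanity check on user input. This is a sketch under the assumption of Python 3; resolve_numeric is a hypothetical helper name, and returning an empty list on failure is simply one convenient convention:

```python
import socket

def resolve_numeric(host, port):
    """Resolve only literal addresses and numeric ports; never consult DNS."""
    flags = socket.AI_NUMERICHOST | socket.AI_NUMERICSERV
    try:
        return socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM, 0, flags)
    except socket.gaierror:
        return []   # not a literal address, or not a numeric port
```

With these flags, a textual hostname or a symbolic port name is rejected locally instead of triggering a lookup.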

Other getaddrinfo() Flags

Here you have a quick example of how getaddrinfo() looks in actual code, in www_ping.py.

import socket, sys

if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: www_ping.py <hostname_or_ip>'
    sys.exit(2)

hostname_or_ip = sys.argv[1]

try:
    infolist = socket.getaddrinfo(
        hostname_or_ip, 'www', 0, socket.SOCK_STREAM, 0,
        socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME,
        )
except socket.gaierror, e:
    print 'Name service failure:', e.args[1]
    sys.exit(1)

info = infolist[0]  # per standard recommendation, try the first one
socket_args = info[0:3]
address = info[4]
s = socket.socket(*socket_args)
try:
    s.connect(address)
except socket.error, e:
    print 'Network failure:', e.args[1]
else:
    print 'Success: host', info[3], 'is listening on port 80'

It performs a simple are-you-there test of whatever web server you name on the command line by attempting a quick connection to port 80 with a streaming socket. Using the script would look something like this:

root@erlerobot:~/Python_files#
root@erlerobot:~/Python_files# python www_ping.py mit.edu
Success: host mit.edu is listening on port 80
root@erlerobot:~/Python_files# python www_ping.py smtp.google.com
Name service failure: nodename nor servname provided, or not known
root@erlerobot:~/Python_files# python www_ping.py no-such-host.com
Name service failure: nodename nor servname provided, or not known
root@erlerobot:~/Python_files#

Note that the socket() constructor does not take a sequence of three items as its parameter. Instead, the parameter list is introduced by an asterisk, which means that the three elements of socket_args are passed as three separate parameters to the constructor.
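The script tries only the first address that getaddrinfo() returns. A variation, sketched here in Python 3 syntax with a hypothetical helper named connect_to, walks the whole list until one address accepts the connection (the Standard Library's socket.create_connection() performs a similar loop):

```python
import socket

def connect_to(host, port, timeout=5.0):
    """Try each address getaddrinfo() returns until one accepts the connection."""
    last_error = None
    infolist = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infolist:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as e:
            last_error = e      # family not supported here; try the next entry
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except OSError as e:
            last_error = e      # this address refused or timed out
            sock.close()
            continue
        return sock
    raise last_error or OSError('getaddrinfo() returned no addresses')
```

The asterisk form socket.socket(*info[0:3]) used in www_ping.py and the explicit three-argument call used here are equivalent.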

getaddrinfo() in your own code

The purpose of the DNS protocol is to turn hostnames into IP addresses.

For example, consider the domain name www.python.org. If your web browser needs to know this address, then the browser runs a call like getaddrinfo() to ask the operating system to resolve that name. Your system will know either that it is running a name server of its own, or that the network to which it is attached provides name service. So, the first act of your DNS server will be to check its own cache of recently queried domain names to see if www.python.org has already been checked by some other machine served by the DNS server in the last few minutes or hours. If an entry is present and has not yet expired, then it can be returned immediately. (The owner of each domain name gets to choose its expiration timeout, because some organizations like to change IP addresses quickly if they need to, while others are happy to have old IP addresses linger for hours or days in the world's DNS caches.)

But let us imagine that it is morning and that you are the first person in your office or in the coffee shop to try talking to www.python.org today, and so the DNS server has to go find the hostname from scratch. Your DNS server will now begin a recursive process of asking about www.python.org at the very top of the world's DNS server hierarchy: the “root-level” name servers that know all of the top-level domains (TLDs) like .com, .org, .net, and all of the country domains, and know the groups of servers that are responsible for each. Name server software generally comes with the IP addresses of these top-level servers built in, to solve the bootstrapping problem of how you find any domain name servers before you are actually connected to the domain name system. With this first UDP round-trip, your DNS server will learn (if it did not know already from another recent query) which servers keep the full index of the .org domain.

Now a second DNS request will be made, this time to one of the .org servers, asking who on earth runs the python.org domain. You can find out what those top-level servers know about a domain by running the whois command-line program on a POSIX system, or by using one of the many “whois” web pages online, typing:

whois python.org

Wherever you are in the world, your DNS request for any hostname within python.org must be passed on to one of the two DNS servers named in that entry.

There are some reasons not to speak DNS directly, and to use getaddrinfo() or some other system-supported mechanism for resolving hostnames instead.

The DNS is often not the only way that a system gets name information.

If your application runs off and tries to use DNS on its own as its first choice for resolving a domain name, then users will notice that some computer names that work everywhere else on your system (in their browser, in file share names, and so forth) suddenly do not work when they use your application, because you are not deferring to mechanisms like WINS or /etc/hosts like the operating system itself does.

The local machine probably has a cache of recently queried domain names that might already know about the host whose IP address you need. If you try speaking DNS yourself to answer your query, you will be duplicating work that has already been done.

The system on which your Python script is running already knows about the local domain name servers, thanks either to manual intervention by your system administrator or a network configuration protocol like DHCP in your office, home, or coffee shop. To crank up DNS right inside your Python program, you will have to learn how to query your particular operating system for this information, an operating-system-specific action that we will not be covering in this book.

If you do not use the local DNS server, then you will not be able to benefit from its own cache that would prevent your application and other applications running on the same network from repeating requests about a hostname that is in frequent use at your location.

A Sketch of How DNS Works

From time to time, adjustments are made to the world DNS infrastructure, and operating system libraries and daemons are gradually updated to accommodate this. If your program makes raw DNS calls of its own, then you will have to follow these changes yourself and make sure that your code stays up-to-date with the latest changes in TLD server IP addresses, conventions involving internationalization, and tweaks to the DNS protocol itself.

There is, however, a solid and legitimate reason to make a DNS call from Python: because you are a mail server, or at the very least a client trying to send mail directly to your recipients without needing to run a local mail relay, and you want to look up the MX records associated with a domain so that you can find the correct mail server for your friends at @example.com.


PyDNS provides a module for performing DNS queries from Python applications. You can install it with:

pip install pydns

Your Python interpreter will then gain the ability to run our first DNS program listing, shown in dns_basic.py.

import sys, DNS

if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: dns_basic.py <hostname>'
    sys.exit(2)

DNS.DiscoverNameServers()
request = DNS.Request()
for qt in DNS.Type.A, DNS.Type.AAAA, DNS.Type.CNAME, DNS.Type.MX, DNS.Type.NS:
    reply = request.req(name=sys.argv[1], qtype=qt)
    for answer in reply.answers:
        print answer['name'], answer['classstr'], answer['typename'], \
            repr(answer['data'])

Running this program will result in:

root@erlerobot:~/Python_files# python dns_basic.py python.org
python.org IN A '82.94.164.162'
python.org IN AAAA ' \x01\x08\x88 \x00\x00\r\x00\x00\x00\x00\x00\x00\x00\xa2'
python.org IN MX (50, 'mail.python.org')
python.org IN NS 'ns2.xs4all.nl'
python.org IN NS 'ns.xs4all.nl'

The keys that get printed on each line are as follows:

The name that we looked up.

The “class,” which in all queries you are likely to see is IN, meaning it is a question about Internet addresses.

The “type” of record; some common ones are A for an IPv4 address, AAAA for an IPv6 address, NS for a record that lists a name server, and MX for a statement about what mail server should be used for a domain.

Finally, the “data” provides the information for which the record type was essentially a promise: the address, or data, or hostname associated with the name that we asked about.
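The AAAA data in that output is not text but the raw bytes of the address. As a hedged illustration in Python 3, assuming the data field holds the full 16-byte address (two of those bytes being space characters, 0x20) and that MX data arrives as (preference, hostname) pairs as printed above, the values can be made readable like this; the variable names are hypothetical:

```python
import ipaddress

# Sample answers shaped like the ones printed by dns_basic.py.
aaaa_data = b' \x01\x08\x88 \x00\x00\r\x00\x00\x00\x00\x00\x00\x00\xa2'
mx_answers = [(50, 'mail.python.org')]

# Sixteen raw bytes become the familiar colon-separated notation.
readable = str(ipaddress.IPv6Address(aaaa_data))

# MX records with lower preference numbers are tried first.
mail_hosts = [host for preference, host in sorted(mx_answers)]
```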

Using DNS

What data should we send? How should it be encoded and formatted? For what kinds of errors will our Python programs need to be prepared? We will look at the basic answers in this chapter, and learn how to use sockets responsibly so that our data arrives intact.

Network Data and Network Errors

The use of ASCII for the basic English letters and numbers is nearly universal among network protocols these days. But when you begin to use more interesting characters, you have to be careful. In Python you should always represent a meaningful string of text with a “Unicode string” that is denoted with a leading u, like this:

>>> elvish = u'Namárië!'

But you cannot put such strings directly on a network connection without specifying which rival system of encoding you want to use to mix your characters down to bytes. A very popular system is UTF-8, because normal characters are represented by the same codes as in ASCII, and longer sequences of bytes are necessary only for international characters. Other encodings are available in Python; the Standard Library documentation for the codecs package lists them all. They each represent a full system for reducing symbols to bytes. Here are a few examples:

>>> elvish.encode('idna')
'xn--namri!-rta6f'
>>> elvish.encode('cp500')
'\xd5\x81\x94E\x99\x89SO'
>>> elvish.encode('utf_8_sig')
'\xef\xbb\xbfNam\xc3\xa1ri\xc3\xab!'

On the receiving end of such a string, simply take the byte string and call its decode() method with the name of the codec that was used to encode it:

>>> 'xn--namri!-rta6f'.decode('idna')
u'nam\xe1ri\xeb!'
>>> '\xd5\x81\x94E\x99\x89SO'.decode('cp500')
u'Nam\xe1ri\xeb!'
>>> '\xef\xbb\xbfNam\xc3\xa1ri\xc3\xab!'.decode('utf_8_sig')
u'Nam\xe1ri\xeb!'
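For comparison, here is the same round trip as a sketch in Python 3 (an assumption; the sessions above are Python 2), where every str is already Unicode and encode() produces a separate bytes object. Plain UTF-8 is used here rather than the codecs shown above:

```python
elvish = 'Nam\u00e1ri\u00eb!'            # the same text, written with escapes

wire_bytes = elvish.encode('utf-8')      # text -> bytes, ready for send()
round_trip = wire_bytes.decode('utf-8')  # bytes -> text on the receiving end

# Decoding with the wrong codec does not fail here, it silently garbles.
garbled = wire_bytes.decode('latin-1')
```

Both ends therefore have to agree on the codec ahead of time, or name it somewhere in the protocol.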

Text and Encodings

To understand the issue of byte order, consider the process of sending an integer over the network. To be specific, think about the integer 4253.

Many protocols, of course, will simply transmit this integer as the string '4253', that is, as four distinct characters. The four digits will require at least four bytes to transmit, at least in any common text encoding. And using decimal digits will also involve some computational expense: since numbers are not stored inside computers in base 10, it will take repeated division, with inspection of the remainder, to determine that this number is in fact made of 4 thousands, plus 2 hundreds, plus 5 tens, plus 3 left over. And when the four-digit string '4253' is received, repeated addition and multiplication by powers of ten will be necessary to put the text back together into a number.

In any case, the string '4253' is not how your computer represents this number as an integer variable in Python. Instead it will store it as a binary number, using the bits of several successive bytes to represent the one's place, two's place, four's place, and so forth of a single large number. We can glimpse the way that the integer is stored by using the hex() built-in function at the Python prompt:

>>> hex(4253)
'0x109d'

Each hex digit corresponds to four bits, so each pair of hex digits represents a byte of data. Instead of being stored as four decimal digits 4, 2, 5, and 3 with the 4 being the “most significant” digit (since tweaking its value would throw the number off by a thousand) and 3 being its least significant digit, the number is stored as a most significant byte 0x10 and a least significant byte 0x9d, adjacent to one another in memory.

Here we reach a great difference between computers. While they will all agree that the bytes in memory have an order, and they will all store a string like Content-Length: 4253 in exactly that order starting with C and ending with 3, they do not share a single idea about the order in which the bytes of a binary number should be stored. Some computers are “big-endian” and put the most significant byte first; others are “little-endian” and put the least significant byte first.

Python makes it very easy to see the difference between the two endiannesses. Simply use the struct module, which provides a variety of operations for converting data to and from popular binary formats. Here is the number 4253 represented first in a little-endian format and then in a big-endian order:

>>> import struct
>>> struct.pack('<i', 4253)
'\x9d\x10\x00\x00'
>>> struct.pack('>i', 4253)
'\x00\x00\x10\x9d'

The struct module performs conversions between Python values and C structs represented as Python strings. You can read more in the Standard Library documentation for struct. Here we used the code i, which uses four bytes to store an integer, so the two upper bytes are zero for a small number like 4253. It also supports an unpack() operation, which converts the binary data back to Python numbers:

>>> struct.unpack('>i', '\x00\x00\x10\x9d')
(4253,)

Network protocols conventionally use big-endian order, which is therefore also known as network byte order. For this reason the struct module provides another symbol, '!', which means the same thing as '>' when used in pack() and unpack() but says to other programmers (and, of course, to yourself as you read the code later), “I am packing this data so that I can send it over the network.”
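As a small sketch of '!' in use (Python 3 syntax assumed), the following packs a made-up two-field header; the field layout, a 2-byte type code followed by a 4-byte length, is purely hypothetical and chosen only for illustration:

```python
import struct

# Hypothetical header layout: 2-byte unsigned type code, 4-byte unsigned length.
header = struct.Struct('!HI')

packed = header.pack(7, 4253)
message_type, length = header.unpack(packed)

# In Python 3, int.to_bytes() gives the same big-endian bytes for one integer.
same_bytes = (4253).to_bytes(4, 'big')
```

Because '!' also disables the alignment padding that native C structs may add, the packed header is exactly six bytes long on every platform.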

Network Byte Order

If you have made the far more common choice of using a TCP stream for communication, then you will face the issue of framing, that is, the issue of how to delimit your messages so that the receiver can tell where one message ends and the next begins.

There is a first pattern (streaming) that can be used by extremely simple network protocols that involve only the delivery of data: no response is expected, so there never has to come a time when the receiver decides “Enough!” and turns around to send a response. In this case, the sender can loop until all of the outgoing data has been passed to sendall() and then close() the socket. The receiver need only call recv() repeatedly until the call finally returns an empty string, indicating that the sender has finally closed the socket. You can see this pattern in streamer.py:

import socket, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    print 'Listening at', s.getsockname()
    sc, sockname = s.accept()
    print 'Accepted connection from', sockname
    sc.shutdown(socket.SHUT_WR)
    message = ''
    while True:
        more = sc.recv(8192)  # arbitrary value of 8k
        if not more:  # socket has closed when recv() returns ''
            break
        message += more
    print 'Done receiving the message; it says:'
    print message
    sc.close()
    s.close()

elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)
    s.sendall('Beautiful is better than ugly.\n')
    s.sendall('Explicit is better than implicit.\n')
    s.sendall('Simple is better than complex.\n')
    s.close()

else:
    print >>sys.stderr, 'usage: streamer.py server|client [host]'

If you run this script as a server and then, at another command prompt, run the client version, you will see that all of the client's data makes it intact to the server, with the end-of-file event generated by the client closing the socket serving as the only framing that is necessary:

root@erlerobot:~/Python_files# python streamer.py server
Listening at ('127.0.0.1', 1060)
Accepted connection from ('127.0.0.1', 49592)
Done receiving the message; it says:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

A second pattern is a variant on the first: streaming in both directions. The socket is initially left open in both directions. First, data is streamed in one direction, exactly as in the first pattern, and then that direction alone is shut down. Second, data is then streamed in the other direction, and the socket is finally closed.

Framing and Quoting

A third pattern, which we have already seen, is to use fixed-length messages, as illustrated in tcp_sixteen.py. You can use the Python sendall() method to keep sending parts of a string until the whole thing has been transmitted, and then use a recv() loop of your own devising to make sure that you receive the whole message.

A fourth pattern is to somehow delimit your messages with special characters. The receiver would wait in a recv() loop like the one just cited, but wait until the reply string it was accumulating finally contained the delimiter indicating the end-of-message.

A fifth pattern is to prefix each message with its length. This is a very popular choice for high-performance protocols since blocks of binary data can be sent verbatim without having to be analyzed, quoted, or interpolated. Of course, the length itself has to be framed using one of the techniques given previously; often it is simply a fixed-width binary integer, or else a variable-length decimal string followed by a delimiter. But either way, once the length has been read and decoded, the receiver can enter a loop and call recv() repeatedly until the whole message has arrived.

A sixth pattern handles data whose total length is not known in advance. Instead of sending just one, try sending several blocks of data that are each prefixed with their length. This means that as each chunk of new information becomes available to the sender, it can be labeled with its length and placed on the outgoing stream. When the end finally arrives, the sender can emit an agreed-upon signal, perhaps a length field giving the number zero, that tells the receiver that the series of blocks is complete.

The following listing (blocks.py) is an example of this sixth pattern. Like the previous one, this sends data in only one direction, from the client to the server, but the data structure is much more interesting. Each message is prefixed with a 4-byte length; in a struct, 'I' means a 32-bit unsigned integer, meaning that these messages can be up to 4GB in length. A series of three such messages is sent to the server, followed by a zero-length message (which is essentially just a length field with zeros inside and then no message data after it) to signal that the series of blocks is over.

import socket, struct, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060
format = struct.Struct('!I')  # for messages up to 2**32 - 1 in length

def recvall(sock, length):
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data

def get(sock):
    lendata = recvall(sock, format.size)
    (length,) = format.unpack(lendata)
    return recvall(sock, length)

def put(sock, message):
    # sendall() keeps sending until the whole block has been transmitted
    sock.sendall(format.pack(len(message)) + message)

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    print 'Listening at', s.getsockname()
    sc, sockname = s.accept()
    print 'Accepted connection from', sockname
    sc.shutdown(socket.SHUT_WR)
    while True:
        message = get(sc)
        if not message:  # a zero-length block ends the series
            break
        print 'Message says:', repr(message)
    sc.close()
    s.close()

elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)
    put(s, 'Beautiful is better than ugly.')
    put(s, 'Explicit is better than implicit.')
    put(s, 'Simple is better than complex.')
    put(s, '')
    s.close()

else:
    print >>sys.stderr, 'usage: blocks.py server|client [host]'

Running first the server and then the client in different terminals results in:

root@erlerobot:~/Python_files# python blocks.py server
Listening at ('127.0.0.1', 1060)
Accepted connection from ('127.0.0.1', 49692)
Message says: 'Beautiful is better than ugly.'
Message says: 'Explicit is better than implicit.'
Message says: 'Simple is better than complex.'
root@erlerobot:~/Python_files#
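The fourth pattern described above, delimiter-based framing, has no listing of its own, so here is one possible sketch in Python 3 syntax. The helper name recv_until and the choice to hand back leftover bytes to the caller are assumptions made for this illustration:

```python
import socket

def recv_until(sock, delimiter, buffered=b''):
    """Read until delimiter appears; return (message, leftover bytes)."""
    data = buffered
    while delimiter not in data:
        more = sock.recv(4096)
        if not more:
            raise EOFError('socket closed before the delimiter arrived')
        data += more
    # Everything after the delimiter belongs to the next message.
    message, _, rest = data.partition(delimiter)
    return message, rest
```

A single recv() may deliver more than one message, so the caller passes the leftover bytes back in on the next call, for example message, rest = recv_until(sock, b'\n', rest). The delimiter must never occur inside a message, which is why this pattern usually needs some quoting or escaping rule.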


Note that some kinds of data that you might send across the network already include some form of delimiting built-in. If you are transmitting such data, then you might not have to impose your own framing atop what the data is already doing. Consider Python “pickles” for example, the native form of serialization that comes with the Standard Library. The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Moreover, using a quirky mix of text commands and data, a pickle stores the contents of a Python data structure so that you can reconstruct it later or on a different machine:

>>> import pickle
>>> pickle.dumps([5, 6, 7])
'(lp0\nI5\naI6\naI7\na.'

The interesting thing about the format is the '.' character that you see at the end of the foregoing string: it is the format's way of marking the end of a pickle. Upon encountering it, the loader can stop and return the value without reading any further. Thus we can take the foregoing pickle, stick some ugly data on the end, and see that loads() will completely ignore it and give us our original list back:

>>> pickle.loads('(lp0\nI5\naI6\naI7\na.UjJGdVpHRnNaZz09')
[5, 6, 7]

Of course, using loads() this way is not useful for network data, since it does not tell us how many bytes it processed in order to reload the pickle; we still do not know how much of our string is pickle data. But if we switch to reading from a file and using the pickle load() function, then the file pointer will be left right at the end of the pickle data, and we can start reading from there if we want to read what came after the pickle:

>>> from StringIO import StringIO
>>> f = StringIO('(lp0\nI5\naI6\naI7\na.UjJGdVpHRnNaZz09')
>>> pickle.load(f)
[5, 6, 7]
>>> f.pos
18
>>> f.read()
'UjJGdVpHRnNaZz09'
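The same experiment can be sketched in Python 3 (an assumption; the session above is Python 2), where pickles are bytes, io.BytesIO stands in for StringIO, and tell() replaces the pos attribute. The trailing bytes are an arbitrary placeholder:

```python
import io
import pickle

payload = pickle.dumps([5, 6, 7])
stream = io.BytesIO(payload + b'trailing bytes')

value = pickle.load(stream)   # stops at the end-of-pickle marker
consumed = stream.tell()      # file pointer now sits just past the pickle
remainder = stream.read()     # whatever followed the pickle
```

Keep in mind that unpickling data from an untrusted peer can execute arbitrary code, so this framing trick is best kept to connections between programs you control.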

Pickles and Self-Delimiting Formats

If your protocol needs to be usable from other programming languages, or if you simply prefer universal standards to formats specific to Python, then the JSON and XML data formats are each a popular choice. Note that neither of these formats supports framing, so you will have to first figure out how to extract a complete string of text from over the network before you can then process it.

JSON is among the best choices available today for sending data between different computer languages. Since Python 2.6, it has been included in the Standard Library as a module named json. JSON, short for JavaScript Object Notation, is a lightweight format for data exchange. JSON is a subset of JavaScript's object literal notation, and it does not require the use of XML. For encoding basic Python object hierarchies:

>>> # The syntax is:
...
>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> # Example:
...
>>> json.dumps([51, u'Namárië!'])
'[51, "Nam\\u00e1ri\\u00eb!"]'

For decoding it you should use:

>>> # The syntax is:
...
>>> import json
>>> json.loads('["foo", {"bar": ["baz", null, 1.0, 2]}]')
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]
>>> # An example:
...
>>> json.loads('{"name": "Lancelot", "quest": "Grail"}')
{u'quest': u'Grail', u'name': u'Lancelot'}

Note that the protocol fully supports Unicode strings. It does, however, have a weakness: a vast omission in the JSON standard is that it provides absolutely no provision for cleanly passing binary data like images or arbitrary documents. The XML format is better for documents, since its basic structure is to take strings and mark them up by wrapping them in angle-bracketed elements.
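Since JSON brings no framing of its own, one way to send it over a stream is to wrap each document in a length prefix. This is a sketch in Python 3 syntax; frame_json, unframe_json, and the 4-byte '!I' prefix are assumptions chosen for illustration, not part of any standard:

```python
import json
import struct

LENGTH = struct.Struct('!I')   # 4-byte big-endian length prefix (assumed)

def frame_json(obj):
    """Encode obj as UTF-8 JSON preceded by its byte length."""
    body = json.dumps(obj).encode('utf-8')
    return LENGTH.pack(len(body)) + body

def unframe_json(data):
    """Return (object, remaining bytes); assumes one whole frame is present."""
    (length,) = LENGTH.unpack_from(data)
    start, end = LENGTH.size, LENGTH.size + length
    return json.loads(data[start:end].decode('utf-8')), data[end:]
```

On a real socket, the receiver would first read exactly LENGTH.size bytes, then exactly as many bytes as the prefix announces, before calling json.loads().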

XML, JSON, Etc.

SincethetimenecessarytotransmitdataoverthenetworkisoftenmoresignificantthanthetimeyourCPUspendspreparingthedatafortransmission,itisoftenworthwhiletocompressdatabeforesendingit.ThepopularHTTPprotocolletsaclientandserverfigureoutwhethertheycanbothsupportcompression.

Aninterestingfactaboutthemostubiquitousformofcompression,theGNUzlibfacility(Forapplicationsthatrequiredatacompression,thefunctionsinthismoduleallowcompressionanddecompression,usingthezliblibrary)thatisavailablethroughthePythonStandardLibrary,isthatitisself-framing.Ifyoustartfeedingitacompressedstreamofdata,thenitcantellyouwhenthecompresseddatahasendedandfurther,uncompresseddatahasarrivedpastitsend.

Most protocols choose to do their own framing and then, if desired, pass the resulting block to zlib for decompression. But you could conceivably promise yourself that you would always tack a bit of uncompressed data onto the end of each zlib compressed string (here, we will use a single '.' byte) and watch for your decompression object to split out that “extra data” as the signal that you are done. Consider this combination of two compressed data streams:

>>> import zlib
>>> data = zlib.compress('sparse') + '.' + zlib.compress('flat') + '.'
>>> data
'x\x9c+.H,*N\x05\x00\t\r\x02\x8f.x\x9cK\xcbI,\x01\x00\x04\x16\x01\xa8.'
>>> len(data)
28

Imagine that these 28 bytes arrive at their destination in 8-byte packets. After processing the first packet, we will find the decompression object's unused_data slot still empty, which tells us that there is still more data coming, so we would recv() on our socket again:

>>> dobj = zlib.decompressobj()
>>> dobj.decompress(data[0:8]), dobj.unused_data
('spars', '')

But the second block of eight characters, when fed to our decompress object, both finishes out the compressed data we were waiting for (since the final 'e' completes the string 'sparse') and also finally has a non-empty unused_data value that shows us that we finally received our '.' byte:

>>> dobj.decompress(data[8:16]), dobj.unused_data
('e', '.x')

If another stream of compressed data is coming, then we have to provide everything past the '.' (in this case, the character 'x') to our new decompress object, then start feeding it the remaining “packets”:

>>> dobj2 = zlib.decompressobj()
>>> dobj2.decompress('x'), dobj2.unused_data
('', '')
>>> dobj2.decompress(data[16:24]), dobj2.unused_data
('flat', '')
>>> dobj2.decompress(data[24:]), dobj2.unused_data
('', '.')

At this point, unused_data is again non-empty, meaning that we have read past the end of this second bout of compressed data and can examine its content.
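The manual steps above can be folded into a small loop. The following is only a sketch, written for Python 3 (an assumption; the session above is Python 2), and the helper name iter_messages is hypothetical. It relies on the eof and unused_data attributes of the decompression object and assumes a single-byte terminator.

```python
import zlib

def iter_messages(chunks, terminator=b'.'):
    """Yield each decompressed message from zlib streams that are each
    followed by a one-byte terminator, fed in arbitrary-sized chunks."""
    dobj = zlib.decompressobj()
    pieces = []
    for data in chunks:
        while data:
            if not dobj.eof:
                pieces.append(dobj.decompress(data))
                data = dobj.unused_data  # empty unless the stream ended in this chunk
            if not dobj.eof or not data:
                break  # need another chunk before anything more can be decided
            if not data.startswith(terminator):
                raise ValueError('expected terminator after compressed stream')
            yield b''.join(pieces)
            # Start over with a fresh object for whatever follows the terminator.
            pieces = []
            data = data[len(terminator):]
            dobj = zlib.decompressobj()
```

Feeding it the 28 bytes in 8-byte slices, as in the session above, should yield the two original strings one after the other.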

Compression


Depending on the protocol implementation that you are using, you might have to deal only with exceptions specific to that protocol, or you might have to deal with both protocol-specific exceptions and raw socket errors.

The exceptions that are specific to socket operations are:

socket.gaierror: This exception is raised when getaddrinfo() cannot find a name or service that you ask about, hence the letters G, A, and I in its name.

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('nonexistent.hostname.foo.bar', 80))
Traceback (most recent call last):
  ...
gaierror: [Errno -5] No address associated with hostname

socket.error: This is the workhorse of the socket module, and will be raised for nearly every failure that can happen at any stage in a network transmission.

socket.timeout: This exception is raised only if you, or a library that you are using, decide to set a timeout on a socket rather than wait forever for a send() or recv() to complete. It indicates that the timeout was reached before the operation could complete normally.
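When several of these exceptions can occur in one place, the order of the except clauses matters, because the more specific exceptions are subclasses of the general one. Here is a minimal sketch for Python 3 (an assumption; there socket.error is an alias of OSError). The helper name attempt and the labels it returns are hypothetical.

```python
import socket

def attempt(operation):
    """Run a zero-argument callable and label the kind of socket failure, if any."""
    try:
        return ('ok', operation())
    except socket.gaierror as e:   # name or service lookup failed
        return ('lookup failed', e)
    except socket.timeout as e:    # a timeout set on the socket expired
        return ('timed out', e)
    except socket.error as e:      # any other socket-level failure
        return ('socket error', e)
```

If the socket.error clause came first, it would swallow the other two cases, since both are subclasses of it.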

Network Exceptions


There are four basic approaches to handling the errors that can occur.

The first is not to handle exceptions at all. If only you or other Python programmers will be using your script, then they will probably not be fazed by seeing an exception. If you are writing a library of calls to be used by other programmers, then this first approach is usually preferable, since by letting the exception through you give the programmer using your API the chance to decide how to present errors to their users.

If you are indeed writing a library, then there is a second approach to consider: wrapping the network errors in an exception of your own.

A third approach to exceptions is to wrap a try…except clause around every single network call that you ever make, and print out a pithy error message in its place. While suitable for short programs, this can become very repetitive when long programs are involved, without necessarily providing that much more information for the user.

There is one final reason that might dictate where you add an exception handler to your network program: you might want to intelligently retry an operation that failed.
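The second and fourth approaches combine naturally. The sketch below, written for Python 3 (an assumption), retries a failing operation with an increasing delay and, once it gives up, wraps the last socket error in a library-level exception. The names NetworkFailure and call_with_retry, and the default attempt count and delay, are all hypothetical.

```python
import socket
import time

class NetworkFailure(Exception):
    """Hypothetical library-level exception that wraps a low-level socket error."""

def call_with_retry(operation, attempts=3, delay=0.1):
    """Call operation() up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except socket.error as e:
            if attempt == attempts:
                # Out of retries: hide the raw socket error behind our own exception.
                raise NetworkFailure('gave up after %d attempts: %s' % (attempts, e)) from e
            time.sleep(delay * 2 ** (attempt - 1))
```

Callers of such a library then need to catch only NetworkFailure, while the original error stays reachable through the exception's __cause__ attribute.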

Handling Exceptions


Before you send sensitive data across a network, you need proof of the identity of the machine that you think is on the other end of the socket, and while sending the data, you need it protected against the prying eyes of anyone controlling the gateways and network switches that see all of your packets. The solution to this problem is to use Transport Layer Security (TLS). Because earlier versions of TLS were called the Secure Sockets Layer (SSL), nearly all of the libraries that you will use to speak TLS still have SSL somewhere in the name.

TLS and SSL


There are several security problems that TLS is designed to solve. They are best understood by considering the dangers of sending your network data as “cleartext” over a plain old socket, which copies your data byte for byte into the packets that get sent over the network.

What are the consequences when someone can observe, capture, and analyze your data at his leisure?

He can see all of the data that passes over that segment of the network. The fraction of your data that he can capture depends on how much of it passes over that particular link.

He will see any usernames and passwords that your clients use to connect to the servers behind them.

Log messages can also be intercepted, if they are being sent to a central location and happen to travel over a compromised IP segment or device. This could be very useful if the observer wants to probe for vulnerabilities in your software.

If your database server is not picky about who connects, aside from caring that the web front end sends a password, then the attacker can now launch a “replay attack,” in which he makes his own connection to your database and downloads all of the data that a front-end server is normally allowed to access.

Now imagine an attacker who cannot yet alter traffic on your network itself, but who can compromise one of the services around the edges that help your servers find each other. Specifically, what if she can compromise the DNS service that lets your web front ends find your db.example.com server? Then some interesting tricks might become possible:

When your front ends ask for the hostname db.example.com, she could answer with the IP address of her own server, located anywhere in the world, instead.

The fake database server will be at a loss to answer requests with any real data that the intruder has not already copied down off the network.

If your database is not carefully locked down and so is not picky about which servers connect, then the attacker can do something more interesting: as requests start arriving at her fake database server, she can have it turn around and forward those requests to the real database server. This is called a “man-in-the-middle” attack: she will be in fairly complete control of your application.

While proxying the client requests through to the database, the attacker will probably also have the option of inserting queries of her own into the request stream. This could let her download entire tables of data and delete or change whatever data the front-end services are typically allowed to modify.

Cleartext on the Network


The secret to TLS is public-key cryptography. Several mathematical schemes can support public-key cryptography, but they all have these three features:

Anyone can generate a key pair, consisting of a private key that they keep to themselves and a public key that they can broadcast however they want.

If the public key is used to encrypt information, then the resulting block of binary data cannot be read by anyone, anywhere in the world, except by someone who holds the private key.

If the system that holds the private key uses it to encrypt information, then any copy of the public key can be used to decrypt the data.

Public keys are used at two different levels within TLS: first, to establish a certificate authority (CA) system that lets servers prove “who they really are” to the clients that want to connect; and, second, to help a particular client and server communicate securely.

TLS Encrypts Your Conversations


From the point of view of your network program, you start a TLS connection by turning control of a socket over to an SSL library. By doing so, you indicate that you want to stop using the socket for cleartext communication, and start using it for encrypted data under the control of the library.

From that point on, you no longer use the raw socket; doing so will cause an error and break the connection. Instead, you will use routines provided by the library to perform all communication. Both client and server should turn their sockets over to SSL at the same time, after reading all pending data off of the socket in both directions. There are two general approaches to using SSL:

The most straightforward option is probably to use the ssl package that recent versions of Python ship with the Standard Library.

The alternative is to use a third-party Python library. There are several of these that support TLS, but many of them are decrepit and seem to have been abandoned. One example of a third-party library is the M2Crypto package.

Supporting TLS in Python


Here is an example of the use of TLS. The first and last few lines of the file sslclient.py look completely normal: opening a socket to a remote server, and then sending and receiving data per the protocol that the server supports. The cryptographic protection is invoked by the few lines of code in the middle: two lines that load a certificate database and make the TLS connection itself, and then the call to match_hostname() that performs the crucial test of whether we are really talking to the intended server or perhaps to an impersonator.

import os, socket, ssl, sys
from backports.ssl_match_hostname import match_hostname, CertificateError

try:
    script_name, hostname = sys.argv
except ValueError:
    print >>sys.stderr, 'usage: sslclient.py <hostname>'
    sys.exit(2)

# First we connect, as usual, with a socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((hostname, 443))

# Next, we turn the socket over to the SSL library!
# (Note: SSLv3 is obsolete and insecure; it is kept here as in the original script.)
ca_certs_path = os.path.join(os.path.dirname(script_name), 'certfiles.crt')
sslsock = ssl.wrap_socket(sock, ssl_version=ssl.PROTOCOL_SSLv3,
                          cert_reqs=ssl.CERT_REQUIRED, ca_certs=ca_certs_path)

# Does the certificate that the server proffered *really* match the
# hostname to which we are trying to connect? We need to check.
try:
    match_hostname(sslsock.getpeercert(), hostname)
except CertificateError, ce:
    print 'Certificate error:', str(ce)
    sys.exit(1)

# From here on, our `sslsock` works like a normal socket. We can, for
# example, make an impromptu HTTP call.
sslsock.sendall('GET / HTTP/1.0\r\n\r\n')
result = sslsock.makefile().read()  # quick way to read until EOF
sslsock.close()
print 'The document https://%s/ is %d bytes long' % (hostname, len(result))

Note that the certificate database needs to be provided as a file named certfiles.crt in the same directory as the script.

root@erlerobot:~/Python_files# cat /etc/ssl/certs/* > certfiles.crt
root@erlerobot:~/Python_files# python sslclient.py www.openssl.org
The document https://www.openssl.org/ is 15941 bytes long
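For comparison, here is a rough sketch of the same client for Python 3, where the Standard Library ssl module can verify both the certificate chain and the hostname without the backports package. It is an illustration under that assumption rather than a drop-in replacement for the script above; the function names make_client_context and fetch_length are hypothetical.

```python
import socket
import ssl

def make_client_context():
    """Build a context that uses the system CA store and checks hostnames."""
    return ssl.create_default_context()

def fetch_length(hostname, port=443):
    """Make an impromptu HTTP request over TLS and return the reply length in bytes."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        # wrap_socket() performs the handshake and the hostname check for us.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            request = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % hostname
            tls.sendall(request.encode('ascii'))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return len(b''.join(chunks))
```

A call such as fetch_length('www.openssl.org') needs network access; a certificate that fails verification surfaces as an ssl.SSLError during wrap_socket().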

The Standard SSL Module


This chapter explores how network programming intersects with the general tools and techniques that Python developers use to write long-running daemons that can perform significant amounts of work by keeping a computer and its processors busy.

Server Architecture


A daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. You can install python-daemon from the Python Package Index, and its code will let your server program become a daemon entirely under its own power.

Another useful tool is the modern logging module, which can write to syslog, files, network sockets, or anything in between. The simplest pattern is to place something like this at the top of each of your daemon’s source files:

import logging
log = logging.getLogger(__name__)

Then your code can generate messages very simply:

log.error('the system is down')
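A logger only produces output once a handler is attached somewhere above it. The following minimal sketch, written for Python 3, attaches a stream handler with an explicit format; the logger name 'lancelot_daemon', the helper name configure_logging, and the format string are assumptions chosen for illustration.

```python
import logging
import sys

log = logging.getLogger('lancelot_daemon')  # hypothetical logger name

def configure_logging(stream=sys.stderr, level=logging.INFO):
    """Attach a handler so that messages sent to `log` are written to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
    log.addHandler(handler)
    log.setLevel(level)
    return handler
```

A real daemon would more likely attach a logging.handlers.SysLogHandler or a file handler, since a daemon normally has no terminal to write to.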

Daemons and Logging


In the minimalist protocol implemented in lancelot.py, the client opens a socket, sends across one of the three questions asked of Sir Launcelot at the Bridge of Death in Monty Python’s Holy Grail movie, and terminates the message with a question mark: What is your name? The server replies by sending back the appropriate answer, which always ends with a period: My name is Sir Lancelot of Camelot. Both question and answer are encoded as ASCII.

import socket, sys

PORT = 1060
qa = (('What is your name?', 'My name is Sir Lancelot of Camelot.'),
      ('What is your quest?', 'To seek the Holy Grail.'),
      ('What is your favorite color?', 'Blue.'))
qadict = dict(qa)

def recv_until(sock, suffix):
    """Receive data until the message ends with `suffix`; raise EOFError on early close."""
    message = ''
    while not message.endswith(suffix):
        data = sock.recv(4096)
        if not data:
            raise EOFError('socket closed before we saw %r' % suffix)
        message += data
    return message

def setup():
    """Parse the command line and return a listening socket bound to PORT."""
    if len(sys.argv) != 2:
        print >>sys.stderr, 'usage: %s interface' % sys.argv[0]
        sys.exit(2)
    interface = sys.argv[1]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((interface, PORT))
    sock.listen(128)
    print 'Ready and listening at %r port %d' % (interface, PORT)
    return sock

The server code is in server_simple.py:

import lancelot

def handle_client(client_sock):
    try:
        while True:
            question = lancelot.recv_until(client_sock, '?')
            answer = lancelot.qadict[question]
            client_sock.sendall(answer)
    except EOFError:
        client_sock.close()

def server_loop(listen_sock):
    while True:
        client_sock, sockname = listen_sock.accept()
        handle_client(client_sock)

if __name__ == '__main__':
    listen_sock = lancelot.setup()
    server_loop(listen_sock)

This simple server has terrible performance characteristics. The difficulty comes when many clients all want to connect at the same time. The first client’s socket will be returned by accept(), and the server will enter the handle_client() loop to start answering that first client’s questions. But while the questions and answers are trundling back and forth across the network, all of the other clients are forced to queue up.

Introductory example


We will tackle the deficiencies of the simple server shown in server_simple.py in two discussions. First, in this section, we will discuss how much time it spends waiting even on one client that needs to ask several questions; and in the next section, we will look at how it behaves when confronted with many clients at once. A simple client for the Launcelot protocol connects, asks each of the three questions once, and then disconnects. The code of client.py is the following:

import socket, sys, lancelot

def client(hostname, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((hostname, port))
    s.sendall(lancelot.qa[0][0])
    answer1 = lancelot.recv_until(s, '.')  # answers end with '.'
    s.sendall(lancelot.qa[1][0])
    answer2 = lancelot.recv_until(s, '.')
    s.sendall(lancelot.qa[2][0])
    answer3 = lancelot.recv_until(s, '.')
    s.close()
    print answer1
    print answer2
    print answer3

if __name__ == '__main__':
    if not 2 <= len(sys.argv) <= 3:
        print >>sys.stderr, 'usage: client.py hostname [port]'
        sys.exit(2)
    port = int(sys.argv[2]) if len(sys.argv) > 2 else lancelot.PORT
    client(sys.argv[1], port)

With these two scripts in place, we can start running our server in one console window:

```

root@erlerobot:~/Python_files# python server_simple.py localhost
Ready and listening at 'localhost' port 1060

```

We can then run our client in another window, and see the three answers returned by the server:

root@erlerobot:~/Python_files# python client.py localhost
My name is Sir Lancelot of Camelot.
To seek the Holy Grail.
Blue.

The client and server run very quickly here on my laptop. But appearances are deceiving, so we had better approach this client-server interaction more scientifically by bringing real measurements to bear upon its activity.

One way to measure the real waiting time is to run the client and server on a single machine, but to send the connection through a round trip to another machine by way of an SSH tunnel.

When doing this you will notice how the cost of communication dominates the performance. It will always seem to take less than 10 μs for the server to run the answer = line and retrieve the response that corresponds to a particular question. If actually generating the answer were the server’s only job, then we could expect it to serve more than 100,000 client requests per second. But look at all of the time that the client and server spend waiting for the network: every time one of them finishes a sendall() call, it takes between 500 μs and 800 μs before the other conversation partner is released from its recv() call and can proceed.

From now on, we will need a system for comparing the subsequent server designs that we explore. We are therefore going to turn to a public tool: FunkLoad, which is written in Python and available from the Python Package Index.

Elementary client


root@erlerobot:~/Python_files# pip install funkload


The simple server we have been examining has the problem that the recv() call often finds that no data is yet available from the client, so the call “blocks” until data arrives. The time spent waiting, as we have seen, is time lost; it cannot be spent usefully by the server to answer requests from other clients.

But what if we avoided ever calling recv() until we knew that data had arrived from a particular client? The result would be an event-driven server that sits in a tight loop watching many clients; I have written an example, shown in server_poll.py.

import lancelot
import select

listen_sock = lancelot.setup()
sockets = {listen_sock.fileno(): listen_sock}
requests = {}
responses = {}

poll = select.poll()
poll.register(listen_sock, select.POLLIN)

while True:
    for fd, event in poll.poll():
        sock = sockets[fd]

        # Remove closed sockets from our list.
        if event & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
            poll.unregister(fd)
            del sockets[fd]
            requests.pop(sock, None)
            responses.pop(sock, None)

        # Accept connections from new sockets.
        elif sock is listen_sock:
            newsock, sockname = sock.accept()
            newsock.setblocking(False)
            fd = newsock.fileno()
            sockets[fd] = newsock
            poll.register(fd, select.POLLIN)
            requests[newsock] = ''

        # Collect incoming data until it forms a question.
        elif event & select.POLLIN:
            data = sock.recv(4096)
            if not data:  # end-of-file
                sock.close()  # makes POLLNVAL happen next time
                continue
            requests[sock] += data
            if '?' in requests[sock]:
                question = requests.pop(sock)
                answer = dict(lancelot.qa)[question]
                poll.modify(sock, select.POLLOUT)
                responses[sock] = answer

        # Send out pieces of each reply until they are all sent.
        elif event & select.POLLOUT:
            response = responses.pop(sock)
            n = sock.send(response)
            if n < len(response):
                responses[sock] = response[n:]
            else:
                poll.modify(sock, select.POLLIN)
                requests[sock] = ''

The main loop in this program is controlled by the poll object, which is queried at the top of every iteration. The poll() call is itself a blocking call; the difference is that recv() has to wait on one single client, while poll() can wait on dozens or hundreds of clients, and return when any of them shows activity.

The way poll() works is that we tell it which sockets we need to monitor, and whether each socket interests us because we want to read from it or write to it. When one or more of the sockets are ready, poll() returns and provides a list of the sockets that we can now use.

Event-Driven Servers

To keep things straight when reading the code, think about the lifespan of one particular client and trace what happens to its socket and data.

The client will first do a connect(), and the server’s poll() call will return and declare that there is data ready on the main listening socket. That can mean only one thing: a new client has connected. So we accept() the connection and tell our poll object that we want to be notified when data becomes available for reading from the new socket. To make sure that the recv() and send() methods on the socket never block and freeze our event loop, we call the setblocking() socket method with the value False (which means “blocking is not allowed”).

When data becomes available, the incoming string is appended to whatever is already in the requests dictionary under the entry for that socket. (Sockets can safely be used as dictionary keys in Python.)

We keep accepting more data until we see a question mark, at which point the Launcelot question is complete. The questions are so short that, in practice, they probably all arrive in the very first recv() from each socket; but just to be safe, we have to be prepared to make several recv() calls until the whole question has arrived. We then look up the appropriate answer, store it in the responses dictionary under the entry for this client socket, and tell the poll object that we no longer want to listen for more data from this client but instead want to be told when its socket can start accepting outgoing data.

Once a socket is ready for writing, we send as much of the answer as will fit into one send() call on the client socket. This, by the way, is a big reason send() returns a length: if you use it in non-blocking mode, then it might be able to send only some of your bytes without making you wait for a buffer to drain back down.

Once this server has finished transmitting the answer, we tell the poll object to swap the client socket back over to being listened to for new incoming data.

After many question-answer exchanges, the client will finally close the connection. Oddly enough, the POLLHUP, POLLERR, and POLLNVAL circumstances that poll() can tell us about (all of which indicate that the connection has closed one way or another) are returned only if we are trying to write to the socket, not read from it. So when an attempt to read returns zero bytes, we have to tell the poll object that we now want to write to the socket so that we receive the official notification that the connection is closed.

A slightly older mechanism for writing event-driven servers that listen to sockets is to use the select() call, which like poll() is available from the Python select module in the Standard Library. I recommend using poll() because it produces much cleaner code, but many people choose select() because it is supported on Windows.
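The readiness test at the heart of both calls can be tried in isolation. The sketch below, for Python 3, uses select.select() on one end of a connected socket pair; the helper name ready_to_read is hypothetical, and it illustrates the idea only, not the server above.

```python
import select
import socket

def ready_to_read(socks, timeout=0):
    """Return the subset of `socks` on which recv() would not block right now."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable
```

With a timeout of zero the call returns immediately, so a socket with nothing waiting on it simply does not appear in the result. An event loop would instead pass a longer timeout (or None) and sleep until some socket becomes ready.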

When talking about event-driven servers, keep the following in mind: event-driven servers are blocking and synchronous. Some people call event-driven servers like the one in server_poll.py “non-blocking,” despite the fact that the poll() call blocks (they mean that it does not block waiting for any particular client), and others call them “asynchronous” despite the fact that the program executes its statements in their usual linear order.

Two things you should know


I should add a quick note about how recv() and send() behave in non-blocking mode, when you have called setblocking(False) on their socket. A poll() loop like the one just shown means that we never end up calling either of these functions when they cannot accept or provide data. But what if we find ourselves in a situation where we want to call either function in non-blocking mode and do not yet know whether the socket is ready?

For the recv() call, these are the rules:

If data is ready, it is returned.
If no data has arrived, socket.error is raised.
If the connection has closed, '' is returned.

Note that a closed connection returns a value, but a still-open connection raises an exception. The logic behind this behavior is that the first and last possibilities are both possible in blocking mode as well: either you get data back, or finally the connection closes and you get back an empty string. So to communicate the extra, third possibility that can happen in non-blocking mode (that the connection is still open but no data is ready yet) an exception is used.

The behavior of non-blocking send() is similar:

Some data is sent, and its length is returned.
The socket buffers are full, so socket.error is raised.
If the connection is closed, socket.error is also raised.

This last possibility is a reminder that poll() could say that a socket is ready for sending, but a FIN packet from the client could arrive right after the server is released from its poll() but before it can start up its send() call.
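The three recv() outcomes can be observed directly on a connected socket pair. This is a sketch for Python 3 (an assumption), where the "no data yet" case surfaces as BlockingIOError, a subclass of the general socket error; the helper name try_recv and its status labels are hypothetical.

```python
import socket

def try_recv(sock, nbytes=4096):
    """Call recv() on a non-blocking socket and label which of the three cases occurred."""
    try:
        data = sock.recv(nbytes)
    except BlockingIOError:
        return ('no data yet', b'')   # connection still open, nothing has arrived
    if not data:
        return ('closed', b'')        # the peer has closed the connection
    return ('data', data)
```

The exception has to be caught before the empty-string test, because in the "no data yet" case recv() never returns a value at all.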

The Semantics of Non-blocking


There are a couple of Python facts to take into account when you are writing your own event-driven server.

It happens that Python comes with an event-driven framework built into the Standard Library. I am going to recommend that you ignore it entirely. It is a pair of ancient modules, asyncore and asynchat, that date from the early days of Python (you will note that all of the classes they define are lowercase, in defiance of both good taste and all subsequent practice) and that are difficult to use correctly.

Instead, we will talk about Twisted Python. Twisted Python is not simply a framework; it is an event-driven networking engine for Python. The Twisted community has developed a way of writing Python that is all their own.

Take a look at server_twisted.py to see how simple our event-driven server can become if we leave the trouble of dealing with the low-level operating system calls to someone else.

from twisted.internet.protocol import Protocol, ServerFactory
from twisted.internet import reactor
import lancelot

class Lancelot(Protocol):
    def connectionMade(self):
        self.question = ''

    def dataReceived(self, data):
        self.question += data
        if self.question.endswith('?'):
            self.transport.write(dict(lancelot.qa)[self.question])
            self.question = ''

factory = ServerFactory()
factory.protocol = Lancelot
reactor.listenTCP(1060, factory)
reactor.run()

Once a client connects, every event on its socket is translated into a method call on our protocol object, letting us write code that appears to be thinking about just one client at a time. But thanks to the fact that Twisted will create dozens or hundreds of our Launcelot protocol objects, one corresponding to each connected client, the result is an event loop that can respond to whichever client sockets are ready.

More information is available in the Twisted Python documentation.

Twisted Python


The essential idea of a threaded or multi-process server is that we take the simple and straightforward server that we started out with (server_simple.py) and run several copies of it at once so that we can serve several clients at once, without making them wait on each other.

Using multiple threads or processes is very common, especially in high-capacity web and database servers. The Standard Library provides the threading and multiprocessing modules for this purpose.

(Note: The main program logic does not even know which solution is being used; the Thread and Process classes have a similar enough interface that either can be used here interchangeably.)

Look at the example in server_multi.py:

import sys, time, lancelot
from multiprocessing import Process
from server_simple import server_loop
from threading import Thread

WORKER_CLASSES = {'thread': Thread, 'process': Process}
WORKER_MAX = 10

def start_worker(Worker, listen_sock):
    worker = Worker(target=server_loop, args=(listen_sock,))
    worker.daemon = True  # exit when the main process does
    worker.start()
    return worker

if __name__ == '__main__':
    if len(sys.argv) != 3 or sys.argv[2] not in WORKER_CLASSES:
        print >>sys.stderr, 'usage: server_multi.py interface thread|process'
        sys.exit(2)
    Worker = WORKER_CLASSES[sys.argv.pop()]  # setup() wants len(argv)==2

    # Every worker will accept() forever on the same listening socket.
    listen_sock = lancelot.setup()
    workers = []
    for i in range(WORKER_MAX):
        workers.append(start_worker(Worker, listen_sock))

    # Check every two seconds for dead workers, and replace them.
    while True:
        time.sleep(2)
        # Iterate over a copy, since the list is modified inside the loop.
        for worker in list(workers):
            if not worker.is_alive():
                print worker.name, "died; starting replacement worker"
                workers.remove(worker)
                workers.append(start_worker(Worker, listen_sock))

As you can see, this program lets multiple threads or processes all call accept() on the very same server socket. Instead of raising an error and insisting that only one thread at a time be able to wait for an incoming connection, the operating system patiently queues up all of our waiting workers and then wakes up one worker for each new connection that arrives. The fact that a listening socket can be shared at all between threads and processes, and that the operating system hands each new connection to one of the workers that are waiting on an accept() call, is one of the great glories of the POSIX network stack and execution model; it makes programs like this very simple to write.
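The sharing of one listening socket can be seen in a few lines. This is a small sketch for Python 3 (an assumption), using threads only; the names worker and start_workers are hypothetical, and instead of speaking the Launcelot protocol each worker simply replies with its own name.

```python
import socket
import threading

def worker(listen_sock, name):
    """Accept connections forever on the shared socket, replying with this worker's name."""
    while True:
        try:
            conn, _ = listen_sock.accept()
        except OSError:
            return  # the listening socket was closed
        with conn:
            conn.sendall(name.encode('ascii'))

def start_workers(count):
    """Bind a listening socket on a free local port and start `count` accepting threads."""
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind(('127.0.0.1', 0))  # port 0 asks the OS for any free port
    listen_sock.listen(16)
    for i in range(count):
        t = threading.Thread(target=worker, args=(listen_sock, 'worker-%d' % i))
        t.daemon = True  # do not keep the process alive for the workers
        t.start()
    return listen_sock
```

Connecting several times to listen_sock.getsockname() shows which worker picked up each connection; which one that is depends on the operating system.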

Threading and Multi-processing


The SocketServer module simplifies the task of writing network servers.

There are four basic server classes: TCPServer, UDPServer, UnixDatagramServer and UnixStreamServer.

These four classes process requests synchronously; each request must be completed before the next request can be started. This isn’t suitable if each request takes a long time to complete, because it requires a lot of computation, or because it returns a lot of data which the client is slow to process.

In server_SocketServer.py, you can see how small our multi-threaded server becomes when it takes advantage of this framework. (There is also a ForkingMixIn that you can use if you want it to spawn several processes, at least on a POSIX system.)

from SocketServer import ThreadingMixIn, TCPServer, BaseRequestHandler
import lancelot, server_simple, socket

class MyHandler(BaseRequestHandler):
    def handle(self):
        server_simple.handle_client(self.request)

class MyServer(ThreadingMixIn, TCPServer):
    allow_reuse_address = 1
    # address_family = socket.AF_INET6  # if you need IPv6

server = MyServer(('', lancelot.PORT), MyHandler)
server.serve_forever()

Whereas our earlier example created the workers up front so that they were all sharing the same listening socket, the SocketServer does all of its listening in the main thread and creates one worker each time accept() returns a new client socket.
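In Python 3 the module is named socketserver (lowercase). The sketch below is a self-contained variant of the same idea for Python 3. It does not import lancelot or server_simple, so the ANSWERS table, the fallback reply, and the helper ask() are assumptions made for illustration.

```python
import socket
import socketserver

ANSWERS = {b'What is your name?': b'My name is Sir Lancelot of Camelot.',
           b'What is your quest?': b'To seek the Holy Grail.'}

class Handler(socketserver.BaseRequestHandler):
    def handle(self):
        # Read until the question mark that ends every question.
        question = b''
        while not question.endswith(b'?'):
            data = self.request.recv(4096)
            if not data:
                return  # client went away before finishing its question
            question += data
        self.request.sendall(ANSWERS.get(question, b'I do not know.'))

class Server(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True  # worker threads should not block interpreter exit

def ask(address, question):
    """Connect, send one question, and read the reply up to its closing period."""
    with socket.create_connection(address, timeout=5) as s:
        s.sendall(question)
        reply = b''
        while not reply.endswith(b'.'):
            data = s.recv(4096)
            if not data:
                break
            reply += data
    return reply
```

To try it, create Server(('127.0.0.1', 0), Handler), run its serve_forever() method in a background thread, and pass server.server_address to ask().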

Threading and Multi-processing Frameworks


This chapter surveys the handful of technologies that have together become fundamental building blocks for expanding applications to Internet scale.

This chapter’s purpose is to introduce you to the problem that each tool solves; explain how to use the service to address that issue; and give a few hints about using the tool from Python.

Caches, Message Queues, and Map-Reduce


Memcached is the “memory cache daemon.” Its impact on many large Internet services has been, by all accounts, revolutionary. After glancing at how to use it from Python, we will discuss its implementation, which will teach us about a very important modern network concept called sharding.

The actual procedures for using Memcached are designed to be very simple:

You run a Memcached daemon on every server with some spare memory.

You make a list of the IP address and port numbers of your new Memcached daemons, and distribute this list to all of the clients that will be using the cache.

Your client programs now have access to an organization-wide, blazing-fast key-value cache that acts something like a big Python dictionary that all of your servers can share.

The cache operates on an LRU (least-recently-used) basis, dropping old items that have not been accessed for a while so that it has room to both accept new entries and keep records that are being frequently accessed.

Using Memcached


Enough Python clients are currently listed for Memcached that I had better just send you to the page that lists them, rather than try to review them here: http://code.google.com/p/memcached/wiki/Clients. The client that they list first is written in pure Python, and therefore will not need to compile against any libraries. It can be installed with pip, since it is available on the Python Package Index:

root@erlerobot:~/Python_files# pip install python-memcached

The interface is straightforward. Though you might have expected an interface that more strongly resembles a Python dictionary with native methods like `__getitem__`, the author of python-memcached chose instead to use the same method names as are used in other languages supported by Memcached, which I think was a good decision, since it makes it easier to translate Memcached examples into Python:

```python
>>> import memcache
>>> mc = memcache.Client(['127.0.0.1:11211'])
>>> mc.set('user:19', '{name: "Lancelot", quest: "Grail"}')
True
>>> mc.get('user:19')
'{name: "Lancelot", quest: "Grail"}'
```

The basic pattern by which Memcached is used from Python is shown in squares.py. Before embarking on an (artificially) expensive operation, it checks Memcached to see whether the answer is already present. If so, then the answer can be returned immediately; if not, then it is computed and stored in the cache before being returned.

import memcache, random, time, timeit

mc = memcache.Client(['127.0.0.1:11211'])

def compute_square(n):
    value = mc.get('sq:%d' % n)
    if value is None:
        time.sleep(0.001)  # pretend that computing a square is expensive
        value = n * n
        mc.set('sq:%d' % n, value)
    return value

def make_request():
    compute_square(random.randint(0, 5000))

print 'Ten successive runs:',
for i in range(1, 11):
    print '%.2fs' % timeit.timeit(make_request, number=2000),
print

The Memcached daemon needs to be running on your machine at port 11211 for this example to succeed. For the first few hundred requests, of course, the program will run at its usual speed. But as the cache begins to accumulate more requests, it is able to accelerate an increasingly large fraction of them.

root@erlerobot:~/Python_files# python squares.py
Ten successive runs: 2.75s 1.98s 1.51s 1.14s 0.90s 0.82s 0.71s 0.65s 0.58s 0.55s

This pattern is generally characteristic of caching: a gradual improvement as the cache begins to cover the problem domain, and then stability as either the cache fills or the input domain has been fully covered.

You must always remember that Memcached is a cache; it is ephemeral, it uses RAM for storage, and, if restarted, it remembers nothing that you have ever stored! Your application should always be able to recover if the cache should disappear.
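One way to honor that last rule is to treat every cache failure as a cache miss. The following is a hedged sketch for Python 3 that does not talk to a real Memcached server: DictCache is a hypothetical in-process stand-in that offers only the get() and set() methods used above, and cached_square is a hypothetical variant of compute_square() that accepts any such object.

```python
class DictCache(object):
    """Hypothetical in-process stand-in offering get() and set() like the client above."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a miss, as in squares.py

    def set(self, key, value):
        self._data[key] = value
        return True

def cached_square(cache, n):
    """Return n*n, consulting the cache first but surviving if the cache fails."""
    key = 'sq:%d' % n
    try:
        value = cache.get(key)
    except Exception:  # deliberately broad: any cache trouble counts as a miss
        value = None
    if value is None:
        value = n * n  # the authoritative computation never depends on the cache
        try:
            cache.set(key, value)
        except Exception:
            pass  # failing to store the result must not fail the request
    return value
```

The broad except clauses are a simplification for this sketch; production code would normally catch only the exceptions its client library documents.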


The design of Memcached illustrates an important principle that is used in several other kinds of databases, and which you might want to employ in architectures of your own: the clients shard the database by hashing the keys’ string values and letting the hash determine which member of the cluster is consulted for each key.

To understand why this is effective, consider a particular key/value pair, like the key sq:42 and the value 1764 that might be stored by squares.py. To make the best use of the RAM it has available, the Memcached cluster wants to store this key and value exactly once. But to make the service fast, it wants to avoid duplication without requiring any coordination between the different servers or communication between all of the clients.

This means that all of the clients, without any other information to go on than (a) the key and (b) the list of Memcached servers with which they are configured, need some scheme for working out where that piece of information belongs. If they fail to make the same decision, then not only might the key and value be copied onto several servers and reduce the overall memory available, but also a client’s attempt to remove an invalid entry could leave other invalid copies elsewhere.

The solution is that the clients all implement a single, stable algorithm that can turn a key into an integer n that selects one of the servers from their list. They do this by using a “hash” algorithm, which mixes the bits of a string when forming a number so that any pattern in the string is, hopefully, obliterated. The hashlib module in the Python Standard Library provides several such algorithms.

To see why patterns in key values must be obliterated, consider hashing.py. It loads a dictionary of English words (you might have to download a dictionary of your own or adjust the path to make the script run on your own machine), and explores how those words would be distributed across four servers if they were used as keys. The first algorithm tries to divide the alphabet into four roughly equal sections and distributes the keys using their first letter; the other two algorithms use hash functions.

import hashlib

def alpha_shard(word):
    """Do a poor job of assigning data to servers by using first letters."""
    if word[0] in 'abcdef':
        return 'server0'
    elif word[0] in 'ghijklm':
        return 'server1'
    elif word[0] in 'nopqrs':
        return 'server2'
    else:
        return 'server3'

def hash_shard(word):
    """Do a great job of assigning data to servers using a hash value."""
    return 'server%d' % (hash(word) % 4)

def md5_shard(word):
    """Do a great job of assigning data to servers using a hash value."""
    # digest() is a byte string, so we ord() its last character
    return 'server%d' % (ord(hashlib.md5(word).digest()[-1]) % 4)

words = open('/usr/share/dict/words').read().split()

for function in alpha_shard, hash_shard, md5_shard:
    d = {'server0': 0, 'server1': 0, 'server2': 0, 'server3': 0}
    for word in words:
        d[function(word.lower())] += 1
    print function.__name__[:-6], d

The hash() function is Python’s own built-in hash routine, which is designed to be blazingly fast because it is used internally to implement Python dictionary lookup.
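The listing above is written for Python 2. As a rough sketch of the same idea under Python 3 (an assumption, since the text itself only covers Python 2), the MD5 variant might look like the following. The `SERVERS` list and the sample key are illustrative only. Note that Python 3 randomizes the built-in hash() of strings per process by default, so a digest from hashlib is the safer basis for a choice that every client must agree on.

```python
import hashlib

# Hypothetical server list; every client must be configured with the same one.
SERVERS = ['server0', 'server1', 'server2', 'server3']

def md5_shard(key, servers=SERVERS):
    """Pick a server from the MD5 digest of the key (Python 3 sketch)."""
    digest = hashlib.md5(key.encode('utf-8')).digest()
    # In Python 3, indexing a bytes object already yields an integer,
    # so no ord() call is needed.
    return servers[digest[-1] % len(servers)]

choice = md5_shard('sq:42')
```

Because the digest depends only on the key, two clients that share the same server list should arrive at the same `choice` without talking to each other.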

Memcached and Sharding

Message queue protocols let you send reliable chunks of data called messages. Typically, a queue promises to transmit messages reliably, and to deliver them atomically: a message either arrives whole and intact, or it does not arrive at all. Clients never have to loop and keep calling something like recv() until a whole message has arrived. The other innovation that message queues offer is that, instead of supporting only the point-to-point connections that are possible with an IP transport like TCP, you can set up all kinds of topologies between messaging clients. Each brand of message queue typically supports several topologies.

A pipeline topology is the pattern that perhaps best resembles the picture you have in your head when you think of a queue: a producer creates messages and submits them to the queue, from which the messages can then be received by a consumer. For example, the front-end web machines of a photo sharing web site might accept image uploads from end users and list the incoming files on an internal queue. A machine room full of servers could then read from the queue, each receiving one message for each read it performs, and generate thumbnails for each of the incoming images. The queue might get long during the day and then be short or empty during periods of relatively low use, but either way the front-end web servers are freed to quickly return a page to the waiting customer, telling them that their upload is complete and that their images will soon appear in their photo stream.
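The pipeline idea can be sketched inside a single process with the standard library alone. This is only an analogy, not a real message queue: `queue.Queue` and threads stand in for the broker and the thumbnail servers, and the function name and file names are made up for illustration. The code is Python 3.

```python
import queue
import threading

def run_pipeline(messages, worker_count=3):
    """Deliver each message to exactly one of several consumers."""
    pipeline = queue.Queue()
    handled = []                      # (worker number, message) pairs
    lock = threading.Lock()

    def consumer(n):
        while True:
            message = pipeline.get()
            if message is None:       # sentinel: no more work for this worker
                break
            with lock:
                handled.append((n, message))

    workers = [threading.Thread(target=consumer, args=(n,))
               for n in range(worker_count)]
    for worker in workers:
        worker.start()
    for message in messages:          # the producer side of the pipeline
        pipeline.put(message)
    for _ in workers:                 # one sentinel per worker
        pipeline.put(None)
    for worker in workers:
        worker.join()
    return handled

handled = run_pipeline(['photo1.jpg', 'photo2.jpg', 'photo3.jpg', 'photo4.jpg'])
```

Which worker receives which message is not predictable, but each message should appear in `handled` exactly once, which is the property that distinguishes a pipeline from a publisher-subscriber fan-out.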

A publisher-subscriber topology looks very much like a pipeline, but with a key difference. The pipeline makes sure that every queued message is delivered to exactly one consumer since, after all, it would be wasteful for two thumbnail servers to be assigned the same photograph. But subscribers typically want to receive all of the messages that are being enqueued by each publisher, or else they want to receive every message that matches some particular topic. Either way, a publisher-subscriber model supports messages that fan out to be delivered to every interested subscriber. This kind of queue can be used to power external services that need to push events to the outside world, and also to form a fabric that a machine room full of servers can use to advertise which systems are up, which are going down for maintenance, and that can even publish the addresses of other message queues as they are created and destroyed.

Finally, a request-reply pattern is often the most complex because messages have to make a round trip. Both of the previous patterns placed very little responsibility on the producer of a message: they connect to the queue, transmit their message, and are done. But a message queue client that makes a request has to stay connected and wait for the corresponding reply to be delivered back to it. The queue itself, to support this, has to feature some sort of addressing scheme by which replies can be directed to the correct client that is still sitting and waiting for it. But for all of its underlying complexity, this is probably the most powerful pattern of all, since it allows the load of dozens or hundreds of clients to be spread across equally large numbers of servers without any effort beyond setting up the message queue. And since a good message queue will allow servers to attach and detach without losing messages, this topology allows servers to be brought down for maintenance in a way that is invisible to the population of client machines.

Message Queues

There are several AMQP (Advanced Message Queuing Protocol) implementations currently listed in the Python Package Index.

An alternative to using AMQP and having to run a central broker, like RabbitMQ or Apache Qpid, is to use ØMQ, the “Zero Message Queue,” which was invented by the same company as AMQP but moves the messaging intelligence from a centralized broker into every one of your message client programs.

A good summary of the advantages and disadvantages is provided at the ØMQ web site: http://zeromq.org/docs:welcome-from-amqp

The next example, queuecrazy.py, shows some of the patterns that can be supported when message queues are used to connect different parts of an application. It requires ØMQ, which you can most easily install from the Python Package Index:

root@erlerobot:~/Python_files# pip install pyzmq-static

The listing uses Python threads to create a small cluster of six different services. One pushes a constant stream of words onto a pipeline. Three others sit ready to receive a word from the pipeline; each word wakes one of them up. The final two are request-reply servers, which resemble remote procedure endpoints and send back a message for each message they receive.

import random, threading, time, zmq

zcontext = zmq.Context()

def fountain(url):
    """Produces a steady stream of words."""
    zsock = zcontext.socket(zmq.PUSH)
    zsock.bind(url)
    words = [w for w in dir(__builtins__) if w.islower()]
    while True:
        zsock.send(random.choice(words))
        time.sleep(0.4)

def responder(url, function):
    """Performs a string operation on each word received."""
    zsock = zcontext.socket(zmq.REP)
    zsock.bind(url)
    while True:
        word = zsock.recv()
        zsock.send(function(word))  # send the modified word back

def processor(n, fountain_url, responder_urls):
    """Read words as they are produced; get them processed; print them."""
    zpullsock = zcontext.socket(zmq.PULL)
    zpullsock.connect(fountain_url)
    zreqsock = zcontext.socket(zmq.REQ)
    for url in responder_urls:
        zreqsock.connect(url)
    while True:
        word = zpullsock.recv()
        zreqsock.send(word)
        print n, zreqsock.recv()

def start_thread(function, *args):
    thread = threading.Thread(target=function, args=args)
    thread.daemon = True  # so you can easily Control-C the whole program
    thread.start()

start_thread(fountain, 'tcp://127.0.0.1:6700')
start_thread(responder, 'tcp://127.0.0.1:6701', str.upper)
start_thread(responder, 'tcp://127.0.0.1:6702', str.lower)
for n in range(3):
    start_thread(processor, n + 1, 'tcp://127.0.0.1:6700',
                 ['tcp://127.0.0.1:6701', 'tcp://127.0.0.1:6702'])
time.sleep(30)

Using Message Queues from Python

The two request-reply servers are different (one turns each word it receives to uppercase, while the other makes its words all lowercase), and you can tell the three processors apart by the fact that each is assigned a different integer.

Finally, to cement the concept of message queues: message queues provide a point of coordination and integration for different parts of your application that may require different hardware, load balancing techniques, platforms, or even programming languages. They can take responsibility for distributing messages among many waiting consumers or servers in a way that is not possible with the single point-to-point links offered by normal TCP sockets, and can also use a database or other persistent storage to assure that updates to your service are not lost if the server goes down. Message queues also offer resilience and flexibility, since if some part of your system temporarily becomes a bottleneck, then the message queue can absorb the shock by allowing many messages to queue up for that service. By hiding the population of servers or processes that serve a particular kind of request, the message queue pattern also makes it easy to disconnect, upgrade, reboot, and reconnect servers without the rest of your infrastructure noticing.

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.

These two operations bear some resemblance to the Python built-in functions of that name (which Python itself borrowed from the world of functional programming); imagine how one might split across several servers the tasks of summing the squares of many integers:

>>> squares = map(lambda n: n*n, range(11))
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> import operator
>>> reduce(operator.add, squares)
385

The mapping operation should be prepared to run once on some particular slice of the overall problem or data set, and to produce a tally, table, or response that summarizes its findings for that slice of the input. The reduce operation is then exposed to the outputs of the mapping functions, to combine them together into an ever-accumulating answer. To use the map-reduce cluster’s power effectively, frameworks are not content to simply run the reduce function on one node once all of the dozens or hundreds of active machines have finished the mapping stage. Instead, the reduce function is run in parallel on many nodes at once, each considering the output of a handful of map operations, and then these intermediate results are combined again and again in a tree of computations until a final reduce step produces output for the whole input.
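The slicing and the tree of reductions described above can be imitated on one machine. The following is a minimal Python 3 sketch, not a real framework: the helper names `map_slice` and `tree_reduce`, the slice size of three, and the fan-in of two are all assumptions chosen for illustration, and everything runs sequentially rather than on separate nodes.

```python
from functools import reduce
import operator

def map_slice(numbers):
    """Map step: summarize one slice of the input as a partial sum of squares."""
    return sum(n * n for n in numbers)

def tree_reduce(values, fan_in=2):
    """Reduce step: combine partial results a handful at a time, level by level."""
    values = list(values)
    while len(values) > 1:
        # Each group of `fan_in` results could be reduced on a different node.
        values = [reduce(operator.add, values[i:i + fan_in])
                  for i in range(0, len(values), fan_in)]
    return values[0]

data = list(range(11))
slices = [data[i:i + 3] for i in range(0, len(data), 3)]  # four slices of the input
partials = [map_slice(s) for s in slices]
total = tree_reduce(partials)
```

The result matches the single-machine answer of 385 computed in the interactive session above, because addition gives the same total however the partial sums are grouped.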

In conclusion, the map-reduce pattern provides a cloud-style framework for distributed computation across many processors and, potentially, across many parts of a large data set.

Map-Reduce

Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text. HTTP (the Hypertext Transfer Protocol) is the protocol used to exchange or transfer hypertext.

HTTP is the foundation of data communication for the World Wide Web. As this chapter proceeds to explore the features of HTTP, we are going to illustrate the protocol using several modules that come built into the Python Standard Library.

HTTP

Uniform Resource Locators (URLs) are strings that tell your web browser how to fetch resources from the World Wide Web. They are a subclass of the full set of possible Uniform Resource Identifiers (URIs); specifically, they are URIs constructed so that they give instructions for fetching a document, instead of serving only as an identifier.

To understand how they work, consider a very simple URL like the following: http://python.org If submitted to a web browser, this URL is interpreted as an order to resolve the hostname python.org to an IP address, make a TCP connection to that IP address at the standard HTTP port 80, and then ask for the root document / that lives at that site.

Now imagine a more complicated URL. Suppose that we wanted the logo for Nord/LB, a large German bank. The resulting URL might look something like this: http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96

Here, the URL specifies more information than our previous example did:

The protocol will, again, be HTTP. The hostname example.com will be resolved to an IP. This time, port 8080 will be used instead of 80. Once a connection is complete, the remote server will be asked for the resource named: /Nord%2FLB/logo?shape=square&dpi=96

Web servers, in practice, have absolute freedom to interpret URLs as they please; however, the intention of the standard is that this URL be parsed into two question-mark-delimited pieces. The first is a path consisting of two elements:

A Nord/LB path element. A logo path element.

The string following the ? is interpreted as a query containing two terms:

A shape parameter whose value is square. A dpi parameter whose value is 96.

Any character other than the alphanumerics and a few punctuation marks (specifically the set $-_.+!*'(),) must be percent-encoded, as must the special delimiter characters themselves (like the slash) when they appear as data rather than as delimiters. Percent-encoding follows a percent sign % with the two-digit hexadecimal code for the character.

You should note that the following URL paths are not equivalent:

Nord%2FLB%2Flogo = A single path component, named Nord/LB/logo.

Nord%2FLB/logo = Two path components, Nord/LB and logo.

Nord/LB/logo = Three separate path components Nord, LB, and logo.
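The difference between these three paths can be checked with a short sketch. It is written for Python 3, where the relevant functions live in `urllib.parse` (an assumption, since the surrounding text targets Python 2), and the helper `split_path` is a made-up name for illustration. The key point it tries to show is the order of operations: split on literal slashes first, and only then decode each component.

```python
from urllib.parse import quote, unquote

def split_path(path):
    """Split on literal slashes first, then percent-decode each component."""
    return [unquote(part) for part in path.strip('/').split('/')]

# safe='' asks quote() to encode the slash too, since here it is data.
encoded = quote('Nord/LB', safe='')

one = split_path('Nord%2FLB%2Flogo')
two = split_path('Nord%2FLB/logo')
three = split_path('Nord/LB/logo')
```

Decoding the whole path before splitting would erase the distinction, since all three strings decode to the same text.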

The most important Python routines for working with URLs live, appropriately enough, in their own module, urlparse. This module defines a standard interface to break URL strings up into components (addressing scheme, network location, path etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.”

>>> from urlparse import urlparse, urldefrag, parse_qs, parse_qsl

With these routines, you can take large and complex URLs like the example given earlier and turn them into their component parts, with RFC-compliant parsing already implemented for you:

>>> p = urlparse('http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96')
>>> p
ParseResult(scheme='http', netloc='example.com:8080', path='/Nord%2FLB/logo',
            params='', query='shape=square&dpi=96', fragment='')

URL Anatomy

The query string that is offered by the ParseResult can then be submitted to one of the parsing routines if you want to interpret it as a series of key-value pairs, which is a standard way for web forms to submit them:

>>> parse_qs(p.query)
{'shape': ['square'], 'dpi': ['96']}

Note that each value in this dictionary is a list, rather than simply a string. This is to support the fact that a given parameter might be specified several times in a single URL; in such cases, the values are simply appended to the list:

>>> parse_qs('mode=topographic&pin=Boston&pin=San%20Francisco')
{'mode': ['topographic'], 'pin': ['Boston', 'San Francisco']}

This, you will note, preserves the order in which values arrive; of course, this does not preserve the order of the parameters themselves because dictionary keys do not remember any particular order. If the order is important to you, then use the parse_qsl() function instead (the l must stand for “list”):

>>> parse_qsl('mode=topographic&pin=Boston&pin=San%20Francisco')
[('mode', 'topographic'), ('pin', 'Boston'), ('pin', 'San Francisco')]

Finally, note that an “anchor” appended to a URL after a # character is not relevant to the HTTP protocol. This is because any anchor is stripped off and is not turned into part of the HTTP request. Instead, the anchor tells a web client to jump to some particular section of a document after the HTTP transaction is complete and the document has been downloaded. To remove the anchor, use urldefrag():

>>> u = 'http://docs.python.org/library/urlparse.html#urlparse.urldefrag'
>>> urldefrag(u)
('http://docs.python.org/library/urlparse.html', 'urlparse.urldefrag')

You can turn a ParseResult back into a URL by calling its geturl() method. When combined with the urlencode() function, which knows how to build query strings, this can be used to construct new URLs:

>>> import urllib, urlparse
>>> query = urllib.urlencode({'company': 'Nord/LB', 'report': 'sales'})
>>> p = urlparse.ParseResult(
...     'https', 'example.com', 'data', None, query, None)
>>> p.geturl()
'https://example.com/data?report=sales&company=Nord%2FLB'
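For readers on Python 3, the same round trip might be sketched as follows. This assumes the Python 3 layout, in which the urlparse module and urllib.urlencode() have both moved into `urllib.parse`; a list of pairs is passed to `urlencode()` here only to make the parameter order explicit.

```python
from urllib.parse import urlparse, parse_qs, parse_qsl, urlencode, urlunparse

# Take a URL apart.
p = urlparse('http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96')
terms = parse_qs(p.query)
pairs = parse_qsl('mode=topographic&pin=Boston&pin=San%20Francisco')

# Build a new URL from its six components:
# scheme, netloc, path, params, query, fragment.
query = urlencode([('company', 'Nord/LB'), ('report', 'sales')])
url = urlunparse(('https', 'example.com', '/data', '', query, ''))
```

The parsed result also exposes `p.hostname` and `p.port`, which separate the two halves of the network location.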

Finally, an HTTP request looks like this:

GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: www.ietf.org
Connection: close
User-Agent: Python-urllib/2.7

And the HTTP response that comes back over the socket also starts with a set of headers, but then also includes a body that contains the document itself that has been requested:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 11 Jul 2014 07:02:55 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d5be98ff9fbae526f308d478da5bb413e1405062173934; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.ietf.org; HttpOnly
Last-Modified: Fri, 11 Jun 1999 18:46:53 GMT
Vary: Accept-Encoding
CF-RAY: 1483235b13c51043-CDG

<addinfourl at 4341048456 whose fp = <socket._fileobject object at 0x102a13750>>

Very often, the links used in web pages do not specify full URLs, but relative URLs that are missing several of the usual components. When one of these links needs to be resolved, the client needs to fill in the missing information with the corresponding fields from the URL used to fetch the page in the first place.

The simplest relative links are the names of pages one level deeper than the base page:

>>> urlparse.urljoin('http://www.python.org/psf/', 'grants')
'http://www.python.org/psf/grants'
>>> urlparse.urljoin('http://www.python.org/psf/', 'mission')
'http://www.python.org/psf/mission'

Note the crucial importance of the trailing slash in the URLs:

>>> urlparse.urljoin('http://www.python.org/psf', 'grants')
'http://www.python.org/grants'

Like file system paths on the POSIX and Windows operating systems, . can be used for the current directory and .. is the name of the parent:

>>> urlparse.urljoin('http://www.python.org/psf/', './mission')
'http://www.python.org/psf/mission'
>>> urlparse.urljoin('http://www.python.org/psf/', '../news/')
'http://www.python.org/news/'
>>> urlparse.urljoin('http://www.python.org/psf/', '/dev/')
'http://www.python.org/dev/'

And, as illustrated in the last example, a relative URL that starts with a slash is assumed to live at the top level of the same site as the original URL. Happily, the urljoin() function ignores the base URL entirely if the second argument also happens to be an absolute URL. This means that you can simply pass every URL on a given web page to the urljoin() function, and any relative links will be converted; at the same time, absolute links will be passed through untouched:

>>> # Absolute links are safe from change
...
>>> urlparse.urljoin('http://www.python.org/psf/', 'http://yelp.com/')
'http://yelp.com/'
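The idea of passing every link on a page through urljoin() can be sketched as a small helper. This version assumes Python 3, where the function is `urllib.parse.urljoin`; the helper name `absolutize` and the list of links are made up for illustration.

```python
from urllib.parse import urljoin

def absolutize(base, links):
    """Resolve every link found on a page against the URL of that page."""
    return [urljoin(base, link) for link in links]

resolved = absolutize('http://www.python.org/psf/',
                      ['grants', '../news/', '/dev/', 'http://yelp.com/'])
```

Relative links pick up the missing pieces from the base URL, while the absolute link at the end comes back unchanged.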

Relative URLs

We now turn to the HTTP protocol itself. The urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more. Although the on-the-wire appearance of HTTP is usually an internal detail handled by web browsers and by libraries like urllib2, we are going to adjust the behavior of urllib2 so that we can see the protocol printed to the screen. Take a look at verbose_http.py:

import StringIO, httplib, urllib2

class VerboseHTTPResponse(httplib.HTTPResponse):
    def _read_status(self):
        s = self.fp.read()
        print '-' * 20, 'Response', '-' * 20
        print s.split('\r\n\r\n')[0]
        self.fp = StringIO.StringIO(s)
        return httplib.HTTPResponse._read_status(self)

class VerboseHTTPConnection(httplib.HTTPConnection):
    response_class = VerboseHTTPResponse
    def send(self, s):
        print '-' * 50
        print s.strip()
        httplib.HTTPConnection.send(self, s)

class VerboseHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(VerboseHTTPConnection, req)

To allow for customization, the urllib2 library lets you bypass its vanilla urlopen() function and instead build an opener full of handler classes of your own devising, a fact that we will use repeatedly as this chapter progresses. The verbose_http.py listing provides exactly such a handler class by performing a slight customization on the normal HTTP handler. This customization prints out both the outgoing request and the incoming response instead of keeping them both hidden. For many of the following examples, we will use an opener object that we build right here, using the handler from verbose_http.py:

>>> from verbose_http import VerboseHTTPHandler
>>> import urllib, urllib2
>>> opener = urllib2.build_opener(VerboseHTTPHandler)

You can try using this opener against the URL of the RFC that we mentioned at the beginning of this chapter: opener.open('http://www.ietf.org/rfc/rfc2616.txt')

Instrumenting urllib2

When the earliest version of HTTP was first invented, it had a single power: to issue a method called GET that named and returned a hypertext document from a remote server. That method is still the backbone of the protocol today.

The GET method, like all HTTP methods, is the first thing transmitted as part of an HTTP request, and it is immediately followed by the request headers. For simple GET methods, the request simply ends with the blank line that terminates the headers so the server can immediately stop reading and send a response.

>>> info = opener.open('http://www.ietf.org/rfc/rfc2616.txt')
GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: www.ietf.org
Connection: close
...
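To make the layout of a GET request concrete, here is a hand-built version of the bytes that would travel over the socket. This is a Python 3 sketch for illustration only; the helper name `build_get_request` is an assumption, and real code would normally let a library assemble the request.

```python
def build_get_request(host, path='/'):
    """Assemble the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        'GET %s HTTP/1.1' % path,   # the method comes first, on the request line
        'Host: %s' % host,          # headers follow, one per line
        'Connection: close',
        '',                         # an empty line terminates the headers
        '',
    ]
    # HTTP separates lines with a carriage return plus line feed.
    return '\r\n'.join(lines).encode('ascii')

request = build_get_request('www.ietf.org', '/rfc/rfc2616.txt')
```

Nothing follows the blank line, which is how the server knows that a simple GET is complete.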

The opener’s open() method, like the plain urlopen() function at the top level of urllib2, returns an information object that lets us examine the result of the GET method. You can see that the HTTP response started with a status line containing the HTTP version, a status code, and a short message. The info object makes these available as object attributes; it also lets us examine the headers through a dictionary-like object:

>>> info.code
200
>>> info.msg
'OK'
>>> sorted(info.headers.keys())
['accept-ranges', 'connection', 'content-length', 'content-type',
 'date', 'etag', 'last-modified', 'server', 'vary']
>>> info.headers['Content-Type']
'text/plain'

Finally, the info object is also prepared to act as a file. The HTTP response status line, the headers, and the blank line that follows them have all been read from the HTTP socket, and now the actual document is waiting to be read. As is usually the case with file objects, you can either start reading the info object in pieces through read(N) or readline(); or you can choose to bring the entire data stream into memory as a single string:

>>> print info.read().strip()
Network Working Group                                      R. Fielding
Request for Comments: 2616                                   UC Irvine
Obsoletes: 2068                                              J. Gettys
Category: Standards Track                                   Compaq/W3C
...

These are the first lines of the longer text file that you will see if you point your web browser at the same URL.
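The three parts of a response (status line, headers, and body) can be pulled apart by hand to see what the info object is doing on our behalf. The following Python 3 sketch works on a canned response string rather than a live socket; `RAW` and `parse_response` are hypothetical names, and the parser ignores details such as repeated or folded headers that a real library has to handle.

```python
# A canned response, standing in for what would arrive over the socket.
RAW = (
    'HTTP/1.1 200 OK\r\n'
    'Content-Type: text/plain\r\n'
    'Content-Length: 5\r\n'
    '\r\n'
    'hello'
)

def parse_response(raw):
    """Split a raw response into version, code, message, headers, and body."""
    head, _, body = raw.partition('\r\n\r\n')       # blank line ends the headers
    status_line, _, header_block = head.partition('\r\n')
    version, code, message = status_line.split(' ', 2)
    headers = {}
    for line in header_block.split('\r\n'):
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), message, headers, body

version, code, message, headers, body = parse_response(RAW)
```

Lowercasing the header names mirrors the way the interactive session above lists them.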

In a world of six billion people and four billion IP addresses, the need quickly became clear to support servers that might host dozens of web sites at the same IP. And that is why the host portion of the URL is now included in every HTTP request. For compatibility, it has not been made part of the GET request line itself, but has instead been stuck into the headers under the name Host.

>>> info = opener.open('http://www.google.com/')
-------------------- Response --------------------
HTTP/1.1 302 Found
Cache-Control: private
...
--------------------------------------------------
GET /?gfe_rd=cr&ei=OY6_U_qjHOeA8QeTg4H4BQ HTTP/1.1
Accept-Encoding: identity
Host: www.google.es
Connection: close
User-Agent: Python-urllib/2.7
-------------------- Response --------------------
HTTP/1.1 200 OK
...

The GET Method and The Host Header

Depending on how they are configured, servers might return entirely different sites when confronted with two different values for Host; they might present slightly different versions of the same site; or they might ignore the header altogether. But semantically, two requests with different values for Host are asking about two entirely different URLs. When several sites are hosted at a single IP address, those sites are each said to be served by a virtual host, and the whole practice is sometimes referred to as virtual hosting.

It is also important to keep in mind that an HTTP exchange can produce different kinds of responses, among them ordinary status codes, errors, and redirections. You can read more about this here.

By default, HTTP/1.1 servers will keep a TCP connection open even after they have delivered their response. This enables you to make further requests on the same socket and avoid the expense of creating a new socket for every piece of data you might need to download. Keep in mind that downloading a modern web page can involve fetching dozens, if not hundreds, of separate pieces of content. The HTTPConnection class provided by httplib lets you take advantage of this feature. In fact, all requests go through one of these objects; when you use a function like urlopen() or use the open() method on an opener object, an HTTPConnection object is created behind the scenes, used for that one request, and then discarded. When you might make several requests to the same site, use a persistent connection instead:

>>> import httplib
>>> c = httplib.HTTPConnection('www.python.org')
>>> c.request('GET', '/')
>>> original_sock = c.sock
>>> content = c.getresponse().read()  # get the whole page
>>> c.request('GET', '/about/')
>>> c.sock is original_sock
True

Now, if we insert a Connection: close header manually, then we force the HTTPConnection object to create a second socket when we ask it for a second page:

>>> c = httplib.HTTPConnection('www.python.org')
>>> c.request('GET', '/', headers={'Connection': 'close'})
>>> original_sock = c.sock
>>> content = c.getresponse().read()
>>> c.request('GET', '/about/')
>>> c.sock is original_sock
False

Note that HTTPConnection does not raise an exception when one socket closes and it has to create another one; you can keep using the same object over and over again. This holds true regardless of whether the server is accepting all of the requests over a single socket, or it is sometimes hanging up and forcing HTTPConnection to reconnect.
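The socket reuse shown above can also be tried without touching a public web site. The sketch below assumes Python 3, where httplib has become `http.client`, and it starts a tiny throwaway server on the loopback interface purely so that the experiment is self-contained; `HelloHandler` and the paths requested are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'      # lets the handler keep connections open

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        # A Content-Length header tells the client where the body ends,
        # which is what makes reusing the socket possible.
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the demonstration quiet
        pass

server = HTTPServer(('127.0.0.1', 0), HelloHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

c = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
c.request('GET', '/')
original_sock = c.sock
first = c.getresponse().read()         # read the whole body before reusing
c.request('GET', '/about/')
reused = c.sock is original_sock
second = c.getresponse().read()

c.close()
server.shutdown()
server.server_close()
```

With both sides speaking HTTP/1.1 and no Connection: close header in play, `reused` should come out true, matching the first interactive session above.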

Payloads and Persistent Connections

The POST HTTP method was designed to power web forms. When forms are used with the GET method, which is indeed their default behavior, they append the form’s field values to the end of the URL: http://www.google.com/search?q=python+language

The construction of such a URL creates a new named location that can be saved; bookmarked; referenced from other web pages; and sent in e-mails, Tweets, and text messages. And for actions like searching and selecting data, these features are perfect. But what about a login form that accepts your e-mail address and password? Not only would there be negative security implications to having these elements appended to the form URL (such as the fact that they would be displayed on the screen in the URL bar and included in your browser history), but surely it would be odd to think of your username and password as creating a new location or page on the web site in question: http://example.com/[email protected]&pw=aaz9Gog3

Building URLs in this way would imply that a different page exists on the example.com web site for every possible password that you could try typing. This is undesirable for obvious reasons. And so the POST method should always be used for forms that are not constructing the name of a particular page or location on a web site, but are instead performing some action on behalf of the caller. Forms in HTML can specify that they want the browser to use POST by specifying that method in their <form> element:

<form name="myloginform" action="/access/dummy" method="post">
E-mail: <input type="text" name="e-mail" size="20">
Password: <input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">
</form>

Instead of stuffing form parameters into the URL, a POST carries them in the body of the request. We can perform the same action ourselves in Python by using urlencode to format the form parameters, and then supplying them as a second parameter to any of the urllib2 methods that open a URL. (From the standard Python library documentation: urllib.urlencode(query[, doseq]) converts a mapping object or a sequence of two-element tuples to a “percent-encoded” string, suitable to pass to urlopen() as the optional data argument. This is useful to pass a dictionary of form fields to a POST request.)

>>> form = urllib.urlencode({'inputstring': 'Atlanta, GA'})
>>> response = opener.open('http://forecast.weather.gov/zipcity.php', form)
--------------------------------------------------
POST /zipcity.php HTTP/1.1
...
Content-Length: 25
Host: forecast.weather.gov
Content-Type: application/x-www-form-urlencoded
...
--------------------------------------------------
inputstring=Atlanta%2C+GA
-------------------- Response --------------------
HTTP/1.1 302 Found
...
Location: http://forecast.weather.gov/MapClick.php?CityName=Atlanta&state=GA&site=FFC&textField1=33.7629&textField2=-84.4226&e=1
...
--------------------------------------------------
GET /MapClick.php?CityName=Atlanta&state=GA&site=FFC&textField1=33.7629&textField2=-84.4226&e=1 HTTP/1.1
...
-------------------- Response --------------------
HTTP/1.1 200 OK
...

Although our opener object is putting a dashed line between each HTTP request and its payload for clarity (a blank line, you will recall, is what really separates headers and payload on the wire) you are otherwise seeing a raw HTTP POST method here. Note these features of the request-responses shown in the example above:

The request line starts with the string POST. Content is provided (and thus, a Content-Length header). The form parameters are sent as the body. The Content-Type for standard web forms is x-www-form-urlencoded.

POST And Forms

The most important thing to grasp is that GET and POST are most emphatically not simply two different ways to format form parameters. Instead, they actually mean two entirely different things. The GET method means, “I believe that there is a document at this URL; please return it.” The POST method means, “Here is an action that I want performed.”
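The way a form body turns a request into a POST can be inspected without sending anything. The sketch below assumes Python 3, where the relevant pieces live in `urllib.parse` and `urllib.request`; it only builds request objects and never opens a connection.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://forecast.weather.gov/zipcity.php'

# urlencode() produces text; Python 3 expects the request body as bytes.
form = urlencode({'inputstring': 'Atlanta, GA'}).encode('ascii')

post_request = Request(url, data=form)   # supplying data implies POST
get_request = Request(url)               # no data: a plain GET

post_method = post_request.get_method()
get_method = get_request.get_method()
```

The encoded body is 25 bytes long, which agrees with the Content-Length header in the exchange printed earlier.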

In the POST example above you can notice that instead of simply returning a status of 200 followed by a page of weather forecast data, it instead returned a 302 redirect that urllib2 obeyed by performing a GET for the page named in the Location: header.

A web site leaves users in a very difficult position if it answers a POST form submission with a literal web page. Well-designed user-facing POST forms always redirect to a page that shows the result of the action, and this page can be safely bookmarked, shared, stored, and reloaded. This is an important feature of modern browsers: if a POST results in a redirect, then pressing the reload button simply refetches the final URL and does not reattempt the whole train of redirects that lead to the current location.

Successful Form POSTs Should Always Redirect

Many web-based APIs fetch documents and data using GET and POST to specific URLs. We should immediately note, however, that many modern web services try to integrate their APIs more tightly with HTTP by going beyond the two most common HTTP methods and implementing additional methods like PUT and DELETE.

A design pattern named “Representational State Transfer” has therefore been taking hold in many developer communities. It specifies that the nouns of an API should live at their own URLs. For example, PUT, GET, POST, and DELETE should be used, respectively, to create, fetch, modify, and remove the documents living at these URLs.

By coupling this basic recommendation with further guidelines, the REST methodology guides the creation of web services that make more complete use of the HTTP protocol. Such web services also offer quite clean semantics, and can be accelerated by the same caching proxies that are often used to speed the delivery of normal web pages.

Note that HTTP supports arbitrary method names, even though the standard defines specific semantics for GET and POST and all of the rest. Tradition would dictate using the well-known methods defined in the standard unless you are using a specific framework or methodology that recognizes and has defined other methods.

REST And More HTTP Methods

User-Agent: Python-urllib/2.6 is a header that is optional in the HTTP protocol, and many sites simply ignore or log it. It can be useful when sites want to know which browsers their visitors use most often, and it can sometimes be used to distinguish search engine spiders (bots) from normal users browsing a site.

Many web sites are sensitive to the kinds of browsers that view them. If you need to access such sites with urllib2, you can simply instruct it to lie about its identity, and the receiving web site will not know the difference:

>>> url = 'https://wca.eclaim.com/'
>>> urllib2.urlopen(url).read()
'<HTML>...The following are...required...Microsoft Internet Explorer...'
>>> agent = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)'
>>> request = urllib2.Request(url)
>>> request.add_header('User-Agent', agent)
>>> urllib2.urlopen(request).read()
'\r\n<HTML>\r\n<HEAD>\r\n\t<TITLE>Eclaim.com - Log In</TITLE>...'

There are databases of possible user agent strings online at several sites that you can reference both when analyzing agent strings that your own servers have received, as well as when concocting strings for your own HTTP requests:

http://www.zytrax.com/tech/web/browser_ids.htm
http://www.useragentstring.com/pages/useragentstring.php

Identifying User Agents and Web Servers

It is always possible to simply make an HTTP request and let the server return a document with whatever Content-Type: is appropriate for the information we have requested. Some of the usual content types encountered by a browser include the following: text/html, text/plain, text/css, image/gif, image/jpeg, image/x-png, application/javascript, application/pdf, application/zip.

If the web service is returning a generic data stream of bytes that it cannot describe more specifically, it can always fall back to the content type: application/octet-stream.

The four headers that will interest you include the following: Accept, Accept-Charset, Accept-Language, Accept-Encoding

Each of these headers supports a comma-separated list of items, where each item can be given a weight between one and zero (larger weights indicate more preferred items) by adding a suffix that consists of a semi-colon and q= string to the item. The result will look something like this (using, for illustration, the Accept: header that my Google Chrome browser seems to be currently using): Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

This indicates that Chrome prefers XML and XHTML, but will accept HTML or even plain text if those are the only document formats available; that Chrome prefers PNG images when it can get them; and that it has no preference between all of the other content types in existence.
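The weighting scheme can be illustrated with a small parser for such a header. This is a simplified Python 3 sketch: `parse_accept` is a hypothetical helper, it treats a missing q= as a weight of 1.0, and it ignores any other parameters an item might carry.

```python
def parse_accept(header):
    """Return (item, weight) pairs sorted from most to least preferred."""
    items = []
    for part in header.split(','):
        fields = [f.strip() for f in part.split(';')]
        weight = 1.0                      # default when no q= suffix is given
        for field in fields[1:]:
            if field.startswith('q='):
                weight = float(field[2:])
        items.append((fields[0], weight))
    # sorted() is stable, so items of equal weight keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)

accept = ('application/xml,application/xhtml+xml,text/html;q=0.9,'
          'text/plain;q=0.8,image/png,*/*;q=0.5')
preferences = parse_accept(accept)
```

Applied to the Chrome header quoted above, the catch-all `*/*` entry ends up last, since it carries the lowest weight.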

Content Type Negotiation

While many documents delivered over HTTP are already fairly heavily compressed, including images and file formats like PDF, web pages themselves are written in verbose SGML dialects that can consume much less bandwidth if subjected to generic textual compression. Similarly, CSS and JavaScript files also contain very stereotyped patterns of punctuation and repeated variable names, which are very amenable to compression.

Web clients can make servers aware that they accept compressed documents by listing the formats they support in a request header, as in this example: Accept-Encoding: gzip

For some reason, many sites seem to not offer compression unless the User-Agent: header specifies something they recognize. Thus, to convince Google to compress its Google News page, you have to use urllib2 something like this:

>>> request = urllib2.Request('http://news.google.com/')
>>> request.add_header('Accept-Encoding', 'gzip')
>>> request.add_header('User-Agent', 'Mozilla/5.0')
>>> info = opener.open(request)
--------------------------------------------------
GET / HTTP/1.1
Host: news.google.com
User-Agent: Mozilla/5.0
Connection: close
Accept-Encoding: gzip
-------------------- Response --------------------
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
...
Content-Encoding: gzip
...

Remember that web servers do not have to perform compression, and that many will ignore your Accept-Encoding: header. Therefore, you should always check the content encoding of the response, and perform decompression only when the server declares that it is necessary:

>>> info.headers['Content-Encoding'] == 'gzip'
True
>>> import gzip, StringIO
>>> gzip.GzipFile(fileobj=StringIO.StringIO(info.read())).read()
'<!DOCTYPE HTML ... <html> ... </html>'

As you can see, Python does not let us pass the file-like info response object directly to the GzipFile class because it is not quite file-like enough. Here, we can perform the quick work-around of reading the whole compressed file into memory and then wrapping it in a StringIO object that does support tell().
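
For readers on Python 3, the same "decompress only when declared" rule can be sketched with gzip.decompress(), which works on a bytes object directly. The decode_body helper and the simulated headers below are assumptions made for illustration; no request is actually sent.

```python
import gzip


def decode_body(headers, body):
    """Decompress body only if the server declared a gzip encoding."""
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        return gzip.decompress(body)
    return body


# Simulated responses, so the sketch runs without network access
page = b'<!DOCTYPE html><html>...</html>'
compressed = decode_body({'Content-Encoding': 'gzip'}, gzip.compress(page))
plain = decode_body({}, page)
print(compressed == page, plain == page)
```

Both calls yield the original page: the first because the declared encoding triggers decompression, the second because an undeclared body is passed through untouched.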

Compression

Many elements of a typical web site design are repeated on every page you visit, and your browsing would slow to a crawl if every image and decoration had to be downloaded separately for every page you viewed. Well-configured web servers therefore add headers to every HTTP response that allow browsers, as well as any proxy caches between the browser and the server, to continue using a copy of a downloaded resource for some period of time until it expires.

There are two basic mechanisms by which servers can support client caching. In the first approach, an HTTP response includes an Expires: header that formats a date and time using the same format as the standard Date: header: Expires: Thu, 21 Jan 2010 17:06:12 GMT. However, this requires the client to check its clock, and many computers run clocks that are far ahead of or behind the real current date and time.

This brings us to a second, more modern alternative, the Cache-Control header, which depends only on the client being able to correctly count seconds forward from the present. For example, to allow an image or page to be cached for an hour but then insist that it be refetched once the hour is up, a cache control header could be supplied like this: Cache-Control: max-age=3600, must-revalidate.
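
As a rough illustration of how a client might act on such a header, the following Python 3 sketch splits the directives apart and checks whether a cached copy is still fresh. Both function names are hypothetical, and a real cache must also honor directives such as no-cache and no-store that are ignored here.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a dict of directives."""
    directives = {}
    for part in header.split(','):
        name, _, value = part.strip().partition('=')
        # Directives without a value, like must-revalidate, map to True
        directives[name.lower()] = value or True
    return directives


def is_fresh(directives, age_seconds):
    """True while a cached copy is younger than max-age seconds."""
    max_age = directives.get('max-age')
    return max_age is not None and age_seconds < int(max_age)


cc = parse_cache_control('max-age=3600, must-revalidate')
print(cc)
print(is_fresh(cc, 120), is_fresh(cc, 4000))
```

The client only needs to know how many seconds have passed since it stored the copy, which is why this scheme does not depend on an accurate wall clock.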

HTTP Caching

It’s possible that you might want your program to check a series of links for validity or whether they have moved, but you do not want to incur the expense of actually downloading the body that would follow the HTTP headers. In this case, you can issue a HEAD request. This is directly possible through httplib, but it can also be performed by urllib2 if you are willing to write a small request class of your own:

>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return 'HEAD'
...
>>> info = urllib2.urlopen(HeadRequest('http://www.google.com/'))
>>> info.read()
''

You can see that the body of the response is completely empty.
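
In Python 3, where urllib2 became urllib.request, the subclass is no longer needed because Request accepts a method argument. The sketch below only builds the request and does not send it; the example.com address is a placeholder of my own.

```python
import urllib.request

# Build (but do not send) a HEAD request; the URL is only a placeholder.
request = urllib.request.Request('http://www.example.com/', method='HEAD')
print(request.get_method())
```

Passing this object to urllib.request.urlopen() would then issue the HEAD request over the network.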

The HEAD Method

An encrypted URL starts with https: instead of simply http:, uses the default port 443 instead of port 80, and uses TLS.

Encryption has to be negotiated before the user can send his HTTP request, lest all of the information in it be divulged; but until the request is transmitted, the server does not know what Host: the request will specify. Therefore, encrypted web sites still live under the old problem of having to use a different IP address for every domain that must be hosted.

A technique known as “Server Name Indication” (SNI) has been developed to get around this traditional restriction; however, at the time of writing, Python does not yet support it. It appears, though, that a patch was applied to the Python 3 trunk with this feature, only days prior to the time of writing. Here is the ticket in case you want to follow the issue: http://bugs.python.org/issue5639.
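
Later Python releases did gain this feature: the ssl module exposes an ssl.HAS_SNI flag, and SSLContext.wrap_socket() accepts a server_hostname argument. The following Python 3 sketch, which opens no connection, simply inspects the defaults that ssl.create_default_context() provides; treat it as a quick check of your own interpreter rather than a complete client.

```python
import ssl

context = ssl.create_default_context()

# True when the interpreter's TLS library can send the server name
print('SNI available:', ssl.HAS_SNI)

# The default context verifies both the certificate chain and the hostname
print('hostname checked:', context.check_hostname)
print('certificate required:', context.verify_mode == ssl.CERT_REQUIRED)
```

A socket wrapped with context.wrap_socket(sock, server_hostname='www.example.com') would announce that placeholder hostname during the handshake, letting one IP address serve many encrypted sites.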

To use HTTPS from Python, simply supply an https: scheme in your URL:

>>> info = urllib2.urlopen('https://www.ietf.org/rfc/rfc2616.txt')
>>>

If the connection works properly, then neither your government nor any of the various large and shadowy corporations that track such things should be able to easily determine either the document you requested or the content you received.

HTTPS Encryption

The HTTP protocol came with a means of authentication that was so poorly thought out and so badly implemented that it seems to have been almost entirely abandoned. When a server was asked for a page to which access was restricted, it was supposed to return a response code: HTTP/1.1 401 Authorization Required.

The authentication token was generated by doing base64 encoding on the colon-separated username and password:

>>> import base64
>>> print base64.b64encode("guido:vanOranje!")
Z3VpZG86dmFuT3JhbmplIQ==

This, of course, just protects any special characters in the username and password that might have been confused as part of the headers themselves; it does not protect the username and password at all, since they can very simply be decoded again:

>>> print base64.b64decode("Z3VpZG86dmFuT3JhbmplIQ==")
guido:vanOranje!

Anyway, once the encoded value was computed, it could be included in the second request like this: Authorization: Basic Z3VpZG86dmFuT3JhbmplIQ==

An incorrect password or unknown user would elicit additional 401 errors from the server, resulting in the pop-up box appearing again and again. Finally, if the user got it right, she would either be shown the resource or, if she in fact did not have permission, be shown a response code like the following: 403 Forbidden.

Python supports this kind of authentication through a handler that, as your program uses it, can accumulate a list of passwords.

auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='voetbal', uri='http://www.onsoranje.nl/',
                          user='guido', passwd='vanOranje!')

The resulting handler can be passed into build_opener().
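
To show what the handler ultimately puts on the wire, here is a small Python 3 sketch that builds the header value by hand. The helper name basic_auth_header is made up for this illustration; in Python 3 the encoder works on bytes, so the credentials are encoded first.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an Authorization: header for Basic auth."""
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


header = basic_auth_header('guido', 'vanOranje!')
print(header)
```

Because the value is merely encoded and not encrypted, anyone who sees the header can recover the password, which is why Basic authentication should only ever travel over HTTPS.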

HTTP Authentication

The actual mechanism that powers user identity tracking, logging in, and logging out of modern web sites is the cookie. The HTTP responses sent by a server can optionally include a number of Set-cookie: headers that browsers store on behalf of the user. In every subsequent request made to that site, the browser will include a Cookie: header corresponding to each cookie that has been set.

The most obvious use of cookies is to keep up with user identity. To support logging in, a web site can deploy a normal form that asks for your username and password (or e-mail address and password, or whatever).

Cookies can also be used for feats other than simply identifying users. For example, a site can issue a cookie to every browser that connects, enabling it to track even casual visitors. This approach enables an online store to let visitors start building a shopping cart full of items without ever being forced to create an account.

From the point of view of a web client, cookies are moderately short strings that have to be stored and then divulged when matching requests are made. The Python Standard Library puts this logic in its own module, cookielib, which defines classes for automatic handling of HTTP cookies and whose CookieJar objects can be used as small cookie databases by the HTTPCookieProcessor in urllib2. To see its effect, you need go no further than the front page of Google, which sets cookies in the mere event of an unknown visitor arriving at the site for the first time. Here is how we create a new opener that knows about cookies:

>>> import cookielib
>>> cj = cookielib.CookieJar()
>>> cookie_opener = urllib2.build_opener(VerboseHTTPHandler,
...     urllib2.HTTPCookieProcessor(cj))

Opening the Google front page will result in two different cookies getting set:

>>> response = cookie_opener.open('http://www.google.com/')
--------------------------------------------------
GET / HTTP/1.1
...
-------------------- Response --------------------
HTTP/1.1 200 OK
...
Set-Cookie: PREF=ID=94381994af6d5c77:FF=0:TM=1288205983:LM=1288205983:S=Mtwivl7EB73uL5Ky;
    expires=Fri, 26-Oct-2012 18:59:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=40=rWLn_I8_PAhUF62J0yFLtb1-AoftgU0RvGSsa81FhTvd4vXD91iU5DOEdxSVt4otiISY-
    3RfEYcGFHZA52w3-85p-hujagtB9akaLnS0QHEt2v8lkkelEGbpo7oWr9u5; expires=Thu, 28-Apr-2011
    18:59:43 GMT; path=/; domain=.google.com; HttpOnly
...

If you investigate cookielib further, you will find that you can do more than query and modify the cookies that have been set. You can also automatically store them in a file, so that they survive from one Python session to the next. You can even create cookie processors that implement your own custom policies with respect to which cookies to store and which to divulge.

Servers can constrain a cookie to a particular domain and path, in addition to setting a Max-age or expires time. Unfortunately, some browsers ignore this setting, so sites should never base their security on the assumption that the expires time will be obeyed. Servers can also mark cookies as secure; this ensures that such cookies are only transmitted with HTTPS requests to the site and never in unsecure HTTP requests.
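
The attributes just described can be inspected with the Python 3 standard library, as in this minimal sketch. It uses http.cookies.SimpleCookie to parse a made-up Set-Cookie value (the name session, its value, and the example.com domain are all invented), which is a simpler tool than the cookie jar used above.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
# The text of a hypothetical Set-Cookie: header sent by a server
cookie.load('session=abc123; Path=/; Domain=.example.com; Secure; HttpOnly')

morsel = cookie['session']
print(morsel.value, morsel['path'], morsel['domain'])
print(bool(morsel['secure']), bool(morsel['httponly']))

# What a client would send back in its Cookie: header on matching requests
cookie_header = '; '.join(
    '{}={}'.format(name, m.value) for name, m in cookie.items())
print(cookie_header)
```

Note that only the name and value travel back to the server; the path, domain, and secure rules stay with the client, which uses them to decide when the cookie may be divulged.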

Cookies

A perpetual problem with cookies is that web site designers do not seem to realize that cookies need to be protected as zealously as your username and password. While it is true that well-designed cookies expire and will no longer be accepted as valid by the server, cookies, while they last, give exactly as much access to a web site as a username and password.

Some sites do not protect cookies at all: they might require HTTPS for your username and password, but then return you to normal HTTP for the rest of your session. Other sites are smart enough to protect subsequent page loads with HTTPS, even after you have left the login page, but they forget that static data from the same domain, like images, decorations, CSS files, and JavaScript source code, will also carry your cookie. The better alternatives are to either send all of that information over HTTPS, or to carefully serve it from a different domain or path that is outside the jurisdiction of the session cookie.

Should you happen to observe or capture a Cookie: header from an HTTP request, remember that there is no need to store it in a CookieJar or represent it as a cookielib object at all. Indeed, you could not do that anyway because the outgoing Cookie: header does not reveal the domain and path rules that the cookie was stored with. Instead, just inject the Cookie: header raw into the requests you make to the web site:

request = urllib2.Request(url)
request.add_header('Cookie', intercepted_value)
info = urllib2.urlopen(request)

HTTP Session Hijacking

The earliest experiments with scripts that could run in web browsers revealed a problem: all of the HTTP requests made by the browser were done with the authority of the user’s cookies, so pages could cause quite a bit of trouble by attempting to, say, POST to the online web site of a popular bank asking that money be transferred to the attacker’s account. Anyone who visited the problem site while logged on to that particular bank in another window could lose money. To address this, browsers imposed the restriction that scripts in languages like JavaScript can only make connections back to the site that served the web page, and not to other web sites. This is called the “same origin policy.”

Today, would-be attackers find ways around this policy by using a constellation of attacks called cross-site scripting (known by the acronym XSS to prevent confusion with Cascading Style Sheets). These techniques include things like finding the fields on a web page where the site will include snippets of user-provided data without properly escaping them, and then figuring out how to craft a snippet of data that will perform some compromising action on behalf of the user or send private information to a third party. Next, the would-be attackers release a link or code containing that snippet onto a popular web site, bulletin board, or in spam e-mails, hoping that thousands of people will click and inadvertently assist in their attack against the site. There is a collection of techniques that are important for avoiding cross-site scripting; you can find them in any good reference on web development. The most important ones include the following:

When processing a form that is supposed to submit a POST request, always carefully disregard any GET parameters.

Never support URLs that produce some side effect or perform some action simply through being the subject of a GET.

In every form, include not only the obvious information, such as a dollar amount and destination account number for bank transfers, but also a hidden field with a secret value that must match for the submission to be valid. That way, random POST requests that attackers generate with the dollar amount and destination account number will not work because they will lack the secret that would make the submission valid.
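
The hidden-field technique in the last item can be sketched in a few lines of Python 3. The function names below are hypothetical, and the sketch assumes the server remembers the token it issued (for example, in the user's session) so that it can compare it with the submitted value.

```python
import hmac
import secrets


def new_form_token():
    """Create an unguessable value to place in the form's hidden field."""
    return secrets.token_hex(16)


def is_valid_submission(expected_token, submitted_token):
    """Accept a POST only if it carries the secret that was issued."""
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected_token, submitted_token or '')


token = new_form_token()  # stored server-side and rendered into the form
print(is_valid_submission(token, token))
print(is_valid_submission(token, 'forged-value'))
```

An attacker who can guess the dollar amount and account number still cannot guess the token, so a forged POST is rejected.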

While the possibilities for XSS are not, strictly speaking, problems or issues with the HTTP protocol itself, it helps to have a solid understanding of them when you are trying to write any program that operates safely on the World Wide Web.

A library called WebOb is also available for Python (and listed on the Python Package Index) that contains HTTP request and response classes that were designed from the other direction: that is, they were intended all along as general-purpose representations of HTTP in all of its low-level details. You can learn more about them at the WebOb project web page: http://pythonpaste.org/webob/

Cross-Site Scripting Attacks

Most web sites are designed first and foremost for human eyes. While well-designed sites offer formal APIs by which you can construct Google maps, upload Flickr photos, or browse YouTube videos, many sites offer nothing but HTML pages formatted for humans. If you need a program to be able to fetch its data, then you will need the ability to dive into densely formatted markup and retrieve the information you need, a process known affectionately as screen scraping.

Screen Scraping

Before you can parse an HTML-formatted web page, you of course have to acquire some. Here are some options for downloading content.

You can use urllib2, or the even lower-level httplib, to construct an HTTP request that will return a web page. For each form that has to be filled out, you will have to build a dictionary representing the field names and data values inside; unlike a real web browser, these libraries will give you no help in submitting forms.

You can install mechanize and write a program that fills out and submits web forms much as you would do when sitting in front of a web browser. The downside is that, to benefit from this automation, you will need to download the page containing the form HTML before you can then submit it, possibly doubling the number of web requests you perform.

If you need to download and parse entire web sites, take a look at the Scrapy project, hosted at http://scrapy.org, which provides a framework for implementing your own web spiders. With the tools it provides, you can write programs that follow links to every page on a web site, tabulating the data you want extracted from each page.

When web pages wind up being incomplete because they use dynamic JavaScript to load data that you need, you can use the QtWebKit module of the PyQt4 library to load a page, let the JavaScript run, and then save or parse the resulting complete HTML page.

Finally, if you really need a browser to load the site, both the Selenium and Windmill test platforms provide a way to drive a standard web browser from inside a Python program. You can start the browser up, direct it to the page of interest, fill out and submit forms, do whatever else is necessary to bring up the data you need, and then pull the resulting information directly from the DOM elements that hold them.

Fetching Web Pages

The task of grabbing information from a web site usually starts by reading it carefully with a web browser and finding a route to the information you need.

Figure fetch_urllib2.py shows the site of the National Weather Service; for our first example, we will write a program that takes a city and state as arguments and prints out the current conditions, temperature, and humidity.

When using the urllib2 module from the Standard Library, you will have to read the web page HTML manually to find the form. You can use the View Source command in your browser, search for the words “Local forecast,” and find the following form in the middle of the sea of HTML:

<form method="post" action="http://forecast.weather.gov/zipcity.php" ...>
  ...
  <input type="text" id="zipcity" name="inputstring" size="9"
         value="City, St" onfocus="this.value='';" />
  <input type="submit" name="Go2" value="Go" />
</form>

The only important elements here are the <form> itself and the <input> fields inside; everything else is just decoration intended to help human readers. This form does a POST to a particular URL with, it appears, just one parameter: an inputstring giving the city name and state. fetch_urllib2.py shows a simple Python program that uses only the Standard Library to perform this interaction, and saves the result to phoenix.html.

import urllib, urllib2
data = urllib.urlencode({'inputstring': 'Phoenix, AZ'})
info = urllib2.urlopen('http://forecast.weather.gov/zipcity.php', data)
content = info.read()
open('phoenix.html', 'w').write(content)
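
It can help to see exactly what urlencode() produces before anything is sent. This Python 3 sketch (where the function lives in urllib.parse) encodes the same single field and decodes it again; nothing is posted to the weather site.

```python
from urllib.parse import urlencode, parse_qs

# Encode the single form field the way a browser would for a POST body
body = urlencode({'inputstring': 'Phoenix, AZ'})
print(body)

# Decoding the body recovers the original field value
fields = parse_qs(body)
print(fields)

# In Python 3, urlopen() expects the POST data as bytes
post_data = body.encode('ascii')
```

The comma becomes %2C and the space becomes a plus sign, which is the application/x-www-form-urlencoded format that HTML forms submit.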

On the one hand, urllib2 makes this interaction very convenient; we are able to download a forecast page using only a few lines of code. But, on the other hand, we had to read and understand the form ourselves instead of relying on an actual HTML parser to read it. The approach encouraged by mechanize is quite different: you need only the address of the opening page to get started, and the library itself will take responsibility for exploring the HTML and letting you know what forms are present. Here are the forms that it finds on this particular page:

>>> import mechanize
>>> br = mechanize.Browser()
>>> response = br.open('http://www.weather.gov/')
>>> for form in br.forms():
...     print '%r %r %s' % (form.name, form.attrs.get('id'), form.action)
...     for control in form.controls:
...         print '   ', control.type, control.name, repr(control.value)
...
None None http://search.usa.gov/search
    hidden v:project 'firstgov'
    text query ''
    radio affiliate ['nws.noaa.gov']
    submit None 'Go'
None None http://forecast.weather.gov/zipcity.php
    text inputstring 'City, St'
    submit Go2 'Go'
'jump' 'jump' http://www.weather.gov/
    select menu ['http://www.weather.gov/alerts-beta/']
    button None None

Once we have determined that we need the zipcity.php form, we can write a program like that shown in fetch_mechanize.py. You can see that at no point does it build a set of form fields manually itself, as was necessary in our previous listing. Instead, it simply loads the front page, sets the one field value that we care about, and then presses the form’s submit button. Note that since this HTML form did not specify a name, we had to create our own filter function (the lambda function in the listing) to choose which of the three forms we wanted.

Downloading Pages Through Form Submission

import mechanize
br = mechanize.Browser()
br.open('http://www.weather.gov/')
br.select_form(predicate=lambda form: 'zipcity' in form.action)
br['inputstring'] = 'Phoenix, AZ'
response = br.submit()
content = response.read()
open('phoenix.html', 'w').write(content)

Many mechanize users instead choose to select forms by the order in which they appear in the page, in which case we could have called select_form(nr=1). But I prefer not to rely on the order, since the real identity of a form is inherent in the action that it performs, not its location on a page.


The Hypertext Markup Language (HTML) is one of many markup dialects built atop the Standard Generalized Markup Language (SGML), which bequeathed to the world the idea of using thousands of angle brackets to mark up plain text. Inserting bold and italics into a format like HTML is as simple as typing eight angle brackets:

The <b>very</b> strange book <i>Tristram Shandy</i>.

In the terminology of SGML, the strings <b> and </b> are each tags (they are, in fact, an opening and a closing tag) and together they create an element that contains the text very inside it. Elements can contain text as well as other elements, and can define a series of key/value attribute pairs that give more information about the element:

<p content="personal">I am reading <i document="play">Hamlet</i>.</p>

The problem with SGML languages in this regard (and HTML is one particular example) is that they expect parsers to know the rules about which elements can be nested inside which other elements, and this leads to constructions like this unordered list <ul>, inside which are several list items <li>:

<ul><li>First<li>Second<li>Third<li>Fourth</ul>

Since HTML in fact says that <li> elements cannot nest, an HTML parser will understand the foregoing snippet to be equivalent to this more explicit XML string:

<ul><li>First</li><li>Second</li><li>Third</li><li>Fourth</li></ul>

And beyond this implicit understanding of HTML that a parser must possess are the twin problems that, first, various browsers over the years have varied wildly in how well they can reconstruct the document structure when given very concise or even deeply broken HTML; and, second, most web page authors judge the quality of their HTML by whether their browser of choice renders it correctly. This has resulted not only in a World Wide Web that is full of sites with invalid and broken HTML markup, but also in the fact that the permissiveness built into browsers has encouraged different flavors of broken HTML among their different user groups.
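
To see how little the terse list actually spells out, the following Python 3 sketch feeds it to the event-based html.parser.HTMLParser from the Standard Library and records which tags are explicitly opened and closed. The TagCounter class is my own illustration; this parser only reports what is literally in the text and does not infer the missing closing tags.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Record every opening and closing tag that appears in the markup."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


counter = TagCounter()
counter.feed('<ul><li>First<li>Second<li>Third<li>Fourth</ul>')
counter.close()
print(counter.opened)
print(counter.closed)
```

Four <li> tags are opened but none is ever closed; it is up to a tree-building parser to know that each new <li> implicitly ends the previous one.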

For more documentation about these topics, visit:

http://www.w3.org/MarkUp/Guide/
http://www.w3.org/MarkUp/Guide/Advanced.html
http://www.w3.org/MarkUp/Guide/Style
http://werbach.com/barebones/barebones.html
http://www.w3.org/TR/REC-html40/
http://validator.w3.org/
http://tidy.sourceforge.net/

The Structure of Web Pages

Parsing HTML with Python requires three choices:

The parser you will use to digest the HTML, and try to make sense of its tangle of opening and closing tags.

The API (Application Programming Interface) by which your Python program will access the tree of concentric elements that the parser built from its analysis of the HTML page.

What kinds of selectors you will be able to write to jump directly to the part of the page that interests you, instead of having to step into the hierarchy one element at a time.

The issue of selectors is a very important one, because a well-written selector can unambiguously identify an HTML element that interests you without your having to touch any of the elements above it in the document tree.

Now, I should pause for a second to explain terms like “deeper,” and I think the concept will be clearest if we reconsider the unordered list that was quoted in the previous section. An experienced web developer looking at that list rearranges it in her head, so that this is what it looks like:

<ul>
  <li>First</li>
  <li>Second</li>
  <li>Third</li>
  <li>Fourth</li>
</ul>

Here the <ul> element is said to be a “parent” element of the individual list items, which “wraps” them and which is one level “above” them in the whole document. The <li> elements are “siblings” of one another; each is a “child” of the <ul> element that “contains” them, and they sit “below” their parent in the larger document tree. This kind of spatial thinking winds up being very important for working your way into a document through an API.
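
The parent and child relationships can be seen directly through the ElementTree API. This Python 3 sketch parses the explicit, well-formed version of the list with the Standard Library's xml.etree.ElementTree, used here only because the snippet happens to be valid XML; real-world HTML needs a more forgiving parser.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring(
    '<ul><li>First</li><li>Second</li><li>Third</li><li>Fourth</li></ul>')

# Iterating over an element yields its children, one level "below" it
children = list(ul)
texts = [li.text for li in children]

print(ul.tag, len(children))
print(texts)
```

The four <li> elements are siblings that share the <ul> element as their parent. Note that a Standard Library element does not keep a reference to its parent, whereas lxml elements offer a getparent() method.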

In brief, here are your choices along each of the three axes that were just listed:

The most powerful, flexible, and fastest parser at the moment appears to be the HTMLParser that comes with lxml; the next most powerful is the longtime favorite BeautifulSoup; and coming in dead last are the parsing classes included with the Python Standard Library, which no one seems to use for serious screen scraping.

The best API for manipulating a tree of HTML elements is ElementTree, which has been brought into the Standard Library for use with the Standard Library parsers, and is also the API supported by lxml; BeautifulSoup supports an API peculiar to itself; and a pair of ancient, ugly, event-based interfaces to HTML still exist in the Python Standard Library.

The lxml library supports two of the major industry-standard selectors: CSS selectors and the XPath query language; BeautifulSoup has a selector system all its own, but one that is very powerful and has powered countless web-scraping programs over the years.

Three Axes

The tree of objects that a parser creates from an HTML file is often called a Document Object Model, or DOM, even though this is officially the name of one particular API defined by the standards bodies and implemented by browsers for the use of JavaScript running on a web page.

The task we have set for ourselves, you will recall, is to find the current conditions, temperature, and humidity in the phoenix.html page that we have downloaded.

There are two approaches to narrowing your attention to the specific area of the document in which you are interested. You can either search the HTML for a word or phrase close to the data that you want, or, as we mentioned previously, use Google Chrome or Firefox with Firebug to “Inspect Element” and see the element you want embedded in an attractive diagram of the document tree.

To see how direct document-object manipulation would work in this case, we can load the raw page directly into both the lxml and BeautifulSoup systems.

>>> import lxml.etree
>>> parser = lxml.etree.HTMLParser(encoding='utf-8')
>>> tree = lxml.etree.parse('phoenix.html', parser)

A separate parser object is needed here because, as you might guess from its name, lxml is natively targeted at XML files.

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(open('phoenix.html'))
Traceback (most recent call last):
  ...
HTMLParseError: malformed start tag, at line 96, column 720

What on earth? Well, look, the National Weather Service does not check or tidy its HTML. Jumping to line 96, column 720 of phoenix.html, we see that there does indeed appear to be some broken HTML:

<a href="http://www.weather.gov"<u>www.weather.gov</u></a>

You can see that the <u> tag starts before a closing angle bracket has been encountered for the <a> tag. But why should BeautifulSoup care? I wonder what version I have installed.

>>> BeautifulSoup.__version__
'3.1.0'

Well, drat. I typed too quickly and was not careful to specify a working version when I ran pip to install BeautifulSoup into my virtual environment. Let’s try again:

root@erlerobot:~/Python_files# pip install BeautifulSoup==3.0.8.1

Now, if we were to take the approach of starting at the top of the document and digging ever deeper until we find the node that we are interested in, we are going to have to generate some very verbose code. Here is the approach we would have to take with lxml:

Diving into an HTML Document

>>> fonttag = tree.find('body').find('div').findall('table')[3] \
...     .findall('tr')[1].find('td').findall('table')[1].find('tr') \
...     .findall('td')[1].findall('table')[1].find('tr').find('td') \
...     .find('table').findall('tr')[1].find('td').find('table') \
...     .find('tr').find('td').find('font')
>>> fonttag.text
'\nA Few Clouds'

An attractive syntactic convention lets BeautifulSoup handle some of these steps more beautifully:

>>> fonttag = soup.body.div('table', recursive=False)[3] \
...     ('tr', recursive=False)[1].td('table', recursive=False)[1].tr \
...     ('td', recursive=False)[1]('table', recursive=False)[1].tr.td \
...     .table('tr', recursive=False)[1].td.table \
...     .tr.td.font
>>> fonttag.text
u'A Few Clouds71&deg;F(22&deg;C)'

BeautifulSoup lets you choose the first child element with a given tag by simply selecting the attribute .tagname, and lets you receive a list of child elements with a given tag name by calling an element like a function with the tag name and a recursive option telling it to pay attention just to the children of an element.

Both lxml and BeautifulSoup provide attractive ways to quickly grab a child element based on its tag name and position in the document. Even so, we clearly should not be using such primitive navigation to try descending into a real-world web page.

Figuring out how HTML elements are grouped, by the way, is much easier if you either view HTML with an editor that prints it as a tree, or if you run it through a tool like HTML tidy from W3C that can indent each tag to show you which ones are inside which other ones. tidy can validate, correct, and pretty-print HTML files. You can use this command line:

tidy phoenix.html > phoenix-tidied.html


A selector is a pattern that is crafted to match document elements on which your program wants to operate. There are several styles:

People who are deeply XML-centric prefer XPath expressions, which are a companion technology to XML itself and let you match elements based on their ancestors, their own identity, and textual matches against their attributes and text content.

If you are a web developer, then you probably think of CSS selectors as the most natural choice for examining HTML.

Both lxml and BeautifulSoup, as we have seen, provide a smattering of their own methods for finding document elements.
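
The contrast with step-by-step navigation is easiest to see in code. The sketch below, written for Python 3, uses the limited XPath subset built into the Standard Library's xml.etree.ElementTree as a stand-in for lxml, so that it runs without third-party packages. The small table is invented markup that only imitates the shape of the forecast page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup loosely modeled on the forecast page
doc = ET.fromstring("""
<html><body><table>
  <tr><td class="big">A Few Clouds</td></tr>
  <tr><td><b>Humidity</b></td><td>30 %</td></tr>
</table></body></html>
""")

# Match by attribute, wherever the element sits in the tree
big = doc.find(".//td[@class='big']")

# Match the row that has a cell whose complete text is "Humidity"
row = doc.find(".//tr[td='Humidity']")
humidity = row.findall('td')[1].text

print(big.text)
print(humidity)
```

Neither expression mentions how many tables or rows lie above the target, so the code keeps working if the page layout around it changes. The full XPath language in lxml is considerably richer than this subset.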

Here are standards and descriptions for each of the selector styles just described:

http://www.w3.org/TR/xpath/
http://codespeak.net/lxml/tutorial.html#using-xpath-to-find-text
http://codespeak.net/lxml/xpathxslt.html
http://www.w3.org/TR/CSS2/selector.html
http://codespeak.net/lxml/cssselect.html

And, finally, here are links to documentation that looks at selector methods peculiar to lxml and BeautifulSoup:

http://codespeak.net/lxml/tutorial.html#elementpath
http://www.crummy.com/software/BeautifulSoup/documentation.html

Now, here you have a completed weather scraper in the file weather.py:

import sys, urllib, urllib2
import lxml.etree
from lxml.cssselect import CSSSelector
from BeautifulSoup import BeautifulSoup

if len(sys.argv) < 2:
    print >>sys.stderr, 'usage: weather.py CITY, STATE'
    exit(2)

data = urllib.urlencode({'inputstring': ' '.join(sys.argv[1:])})
info = urllib2.urlopen('http://forecast.weather.gov/zipcity.php', data)
content = info.read()

# Solution #1 using CSSSelector
parser = lxml.etree.HTMLParser(encoding='utf-8')
tree = lxml.etree.fromstring(content, parser)
big = CSSSelector('td.big')(tree)[0]
if big.find('font') is not None:
    big = big.find('font')
print 'Condition:', big.text.strip()
print 'Temperature:', big.findall('br')[1].tail
tr = tree.xpath('.//td[b="Humidity"]')[0].getparent()
print 'Humidity:', tr.findall('td')[1].text
print

# Solution #2 using BeautifulSoup
soup = BeautifulSoup(content)  # doctest: +SKIP
big = soup.find('td', 'big')
if big.font is not None:
    big = big.font
print 'Condition:', big.contents[0].string.strip()
temp = big.contents[3].string or big.contents[4].string  # can be either
print 'Temperature:', temp.replace('&deg;', ' ')
tr = soup.find('b', text='Humidity').parent.parent.parent
print 'Humidity:', tr('td')[1].string
print

Selectors

Note that running this program also requires the lxml module to be installed.


This chapter focuses on the actual act of programming. Every other issue that we consider will be in the service of this overarching goal: to create a new web service using Python as our language.

Web Applications

Acceptable web site performance generally requires the ability to serve several users concurrently.

To avoid corrupting in-memory data structures, CPython employs a Global Interpreter Lock (GIL), so that only one thread in a multi-threaded program can actually be executing Python code at any given time. Thus Python will let you create as many threads as you want in a given process; however, only one thread can run code at a time, as though your threads were confined to a single processor.

A typical web application receives and parses the user's request, then makes a corresponding request to the database behind it; while that thread is waiting for a response from the database, the GIL is available for any other threads that need to run Python code. Finally the database answers; the waiting thread reacquires the GIL; and, in a quick blaze of CPU activity, the data is turned into an attractive web page, and the response is sent winging its way back to the user.
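
The effect described above can be observed with a small experiment. In this Python 3 sketch, time.sleep() stands in for the wait on the database (an assumption made purely for illustration); because a sleeping thread releases the GIL, four simulated requests finish in roughly the time of one.

```python
import threading
import time


def handle_request(results, index):
    time.sleep(0.2)  # stands in for waiting on the database; GIL is released
    results[index] = 'page %d' % index  # the brief burst of Python work


results = [None] * 4
threads = [threading.Thread(target=handle_request, args=(results, i))
           for i in range(4)]

start = time.monotonic()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.monotonic() - start

print(results)
print('elapsed: %.2f seconds' % elapsed)
```

Run one after another, the four requests would need about 0.8 seconds; with threads the waits overlap. Replacing the sleep with a CPU-bound loop would remove most of that benefit, since the threads would then contend for the GIL.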

Thus threads can sometimes at least perform decently. Nevertheless, multiple processes are the more general way to scale. This is because, as a service gets bigger, additional processes can be brought up on additional machines, rather than being confined to a single machine. Threads, no matter their other merits, cannot do that! There are two general approaches to running a Python web application inside of a collection of identical worker processes:

The Apache web server can be combined with the popular mod_wsgi module to host a separate Python interpreter in every Apache worker process.

The web application can be run inside of either the flup server or the uWSGI server. Both of these servers will manage a pool of worker processes where each process hosts a Python interpreter running your application. The front-end web server can submit requests to flup using either the standard Fast CGI (FCGI) or Simple CGI (SCGI) protocol, while it has to speak to uWSGI in its own special “uwsgi” protocol (whose name is all lowercase to distinguish it from the name of the server).

Web Servers and Python

AllofthepopularopensourcewebserverscanbeusedtoservePythonwebapplications,sothefullrangeofmodernoptionsisavailable:

ApacheHTTPServer:SincetakingtheleadasthemostpopularHTTPserverbackin1996.Itsstatedgoalisflexibilityandmodularity;itisreasonablyfast,butitwillnotwinspeedrecordsagainstmorerecentserversthatfocusonlyonspeed.Itsconfigurationfilescanbeabitlongandverbose,butthroughthemApacheoffersverypowerfuloptionsforapplyingdifferentrulesandbehaviorstodifferentdirectoriesandURLs.Avarietyofextensionmodulesareavailable(manyofwhichcomebundledwithit),anduserdirectoriescanhaveseparate.htaccessconfigurationfilesthatmakefurtheradjustmentstothemainconfiguration.

nginx(“engineX”):Thenginxserverhasbecomeagreatfavoriteoforganizationswithalargevolumeofcontentthatneedstobeservedquickly.Itisconsideredfairlyeasytoconfigure.lighttpd(“lighty”):Firstwrittentodemonstrateanarchitecturethatcouldsupporttensofthousandsofopenclientsockets(bothnginxandCherokeearealsocontendersinthisclass),thisserverisknownforbeingveryeasytoconfigure.Somesystemadministratorscomplainaboutitsmemoryusage,butmanyothershaveobservednoproblemswithit.

Cherokee:Notonlydoesthisserverofferperformancethatmightedgeoutevennginxandlighttpd,butitletsyouconfiguretheserverthroughabuilt-inwebinterface.

SotocombineeachoftheseserverswithPython;forexampleinthecaseofApache:themod_wsgimodulehasadaemonmodewhereitinternallyrunsyourPythoncodeinsideastackofdedicatedserverprocessesthatareseparatefromApache.EachWebServerGatewayInterface(WSGI)processcanevenrunasadifferentuser.IfyoureallywanttouseApacheasyourfrontend,thisisoneofthebestoptionsavailable.

But the most strongly recommended approach today is to set up one of the three fast servers to provide your static content, and then use one of the following three techniques to run your Python code behind them:

Use HTTP proxying so that your nginx, lighttpd, or Cherokee front-end server delivers HTTP requests for dynamic web pages to a back-end Apache instance running mod_wsgi.

Use the FastCGI protocol or SCGI protocol to talk to a flup instance running your Python code.

Use the uwsgi protocol to talk to a uWSGI instance running your Python code.

At this point, you understand something of the larger context in which Python web applications are usually run; you are now ready to turn your attention to the task of programming.

Choosing a Web Server


Integrating Python with web servers was much improved by the creation of PEP 333, which defines the Python Web Server Gateway Interface (WSGI): http://legacy.python.org/dev/peps/pep-0333/.

WSGI introduced a single calling convention that every web server could implement, thereby making that web server instantly compatible with all of the Python web applications and web frameworks that also support WSGI.

The Python Standard Library documentation has more information about the wsgiref package, whose simple_server we will use in the example. The package provides a variety of utilities for working with WSGI: functions for examining, further unpacking, and modifying the environ object; a prebuilt iterator for streaming large files back to the server; and even a validate sub-module whose routines can check a WSGI application to see whether it complies with the specification when presented with a series of representative requests.
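As a rough illustration of the validate sub-module, the sketch below wraps two tiny applications in wsgiref.validate.validator and calls each once with an environ built by wsgiref.util.setup_testing_defaults. It is written for Python 3, whereas the listings in this text target Python 2, and the names hello_app, silent_app, and call_once are hypothetical, invented only for this example.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def hello_app(environ, start_response):
    """A compliant application: one start_response() call, a bytes body."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, WSGI\n']


def silent_app(environ, start_response):
    """A broken application: it never calls start_response()."""
    return [b'oops\n']


def call_once(app):
    """Call a validator-wrapped app once and return (status, body)."""
    environ = {'QUERY_STRING': ''}
    setup_testing_defaults(environ)      # fills in the required CGI and wsgi.* keys
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status
        return lambda data: None         # stand-in for the legacy write() callable

    result = validator(app)(environ, start_response)
    try:
        body = b''.join(result)          # iterating triggers the validator's checks
    finally:
        result.close()                   # the validator expects close() to be called
    return seen.get('status'), body


print(call_once(hello_app))
try:
    call_once(silent_app)
except AssertionError as exc:
    print('validator complained:', exc)
```

The validator reports violations by raising AssertionError, so wrapping an application this way is mainly useful in tests and during development rather than in production.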

Developers generally avoid writing raw WSGI applications because the conveniences of even a simple web framework make code so much easier to write and maintain. But, for the sake of illustration, wsgi_app.py shows a small WSGI application whose front page asks the user to type a string. Submitting the string takes the user to a second web page, where he can see its base64 encoding. From there, a link will take him back to the first page to repeat the process.

import cgi, base64
from wsgiref.simple_server import make_server

def page(content, *args):
    # Generator that wraps the page content in a minimal HTML document.
    yield '<html><head><title>wsgi_app.py</title></head><body>'
    yield content % args
    yield '</body></html>'

def simple_app(environ, start_response):
    gohome = '<br><a href="/">Return to the home page</a>'
    q = cgi.parse_qs(environ['QUERY_STRING'])

    if environ['PATH_INFO'] == '/':

        # The front page accepts only a plain GET without a query string.
        if environ['REQUEST_METHOD'] != 'GET' or environ['QUERY_STRING']:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: the front page is not a form']

        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('Welcome! Enter a string: <form action="encode">'
                    '<input name="mystring"><input type="submit"></form>')

    elif environ['PATH_INFO'] == '/encode':

        if environ['REQUEST_METHOD'] != 'GET':
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form does not support POST parameters']

        if 'mystring' not in q or not q['mystring'][0]:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form requires a "mystring" parameter']

        my = q['mystring'][0]
        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('<tt>%s</tt> base64 encoded is: <tt>%s</tt>' + gohome,
                    cgi.escape(repr(my)), cgi.escape(base64.b64encode(my)))

    else:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return ['That URL is not valid']

print 'Listening on localhost:8000'
make_server('localhost', 8000, simple_app).serve_forever()

WSGI

The first thing to note in this code listing is that two very different objects are being created: a WSGI server that knows how to use HTTP to talk to a web browser and an application written to respond correctly when invoked per the WSGI calling convention. Note that these two pieces (the server and the application) could easily be swapped out. This code example should make the calling convention clear enough:

For each incoming request, the application is called with an environ object, giving it the details of the HTTP request, and a live callable named start_response().

Once the application has decided what HTTP response code and headers need to be returned, it makes a single call to start_response(). Its headers will be combined with any headers that the WSGI server might already provide to the client.

Finally, the application needs only to return the actual content: either a list of strings or a generator yielding strings. Either way, the strings will be concatenated by the WSGI server to produce the response body that is transmitted back to the client. Generators are useful for cases where it would be unwise for an application to try loading all of the content (like large files) into memory at once.
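The three steps above can be tried out without any network at all. The following is a minimal Python 3 sketch, not the Python 2 listing itself: in Python 3 the body must consist of byte strings, and urllib.parse.parse_qs and html.escape stand in for the older cgi helpers. The names encode_app and call_app are hypothetical; call_app plays the role of the WSGI server.

```python
import base64
import html
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def encode_app(environ, start_response):
    """Return the base64 encoding of the 'mystring' query parameter."""
    q = parse_qs(environ.get('QUERY_STRING', ''))
    if 'mystring' not in q:
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b'Error: this form requires a "mystring" parameter']  # list body
    my = q['mystring'][0]
    encoded = base64.b64encode(my.encode('utf-8'))
    start_response('200 OK', [('Content-Type', 'text/html')])

    def body():                          # generator body, sent chunk by chunk
        yield b'<html><body><tt>'
        yield html.escape(repr(my)).encode('utf-8')
        yield b'</tt> base64 encoded is: <tt>'
        yield encoded
        yield b'</tt></body></html>'
    return body()


def call_app(app, query_string=''):
    """Play the server: build environ, then collect status, headers, and body."""
    environ = {'QUERY_STRING': query_string}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    chunks = app(environ, start_response)
    return captured['status'], captured['headers'], b''.join(chunks)


status, headers, body = call_app(encode_app, 'mystring=hello')
print(status, headers, body)
print(call_app(encode_app)[0])
```

Because the server only ever iterates over what the application returns, the list in the error branch and the generator in the success branch are handled identically by call_app.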


Standard interfaces like WSGI make it possible for developers to create wrappers (a design-patterns person would call these adapters) that accept a request from a server; modify, adjust, or record the request; and then call a normal WSGI application with the modified environment. Such middleware can also inspect and adjust the outgoing data stream; everything, in fact, is up for grabs, and essentially arbitrary changes can be made both to the circumstances under which a WSGI application runs, as well as to the content that it returns.

If several WSGI applications need to live at a single web site under different URLs, then a piece of middleware can be given the URLs and dispatch each incoming request to the application that owns it. (You can read more at http://pythonpaste.org/)

If each WSGI application on a web site were to keep its own list of passwords and honor only its own session cookies, then users would have to log in again each time they crossed an application boundary. By delegating authentication to WSGI middleware, applications can be relieved even of the duty to provide their own login page; instead, the middleware asks a user who lacks a session cookie to log in; once a user is authenticated, the middleware can pass along the user's identity to the applications by putting the user's information in the environ argument. Both repoze.who and repoze.what can help site integrators assert site-wide control over users and their permissions.

Theming can be a problem when several small applications are combined to form a larger web site. This is because each application typically has its own approach to theming. This has led to the development of two competing tools, xdv and Deliverance, that let you build a single HTML theme and then provide simple rules that pull text out of your back-end applications and drop it into your theme in the right places.

Debuggers can be created that call a WSGI application and, if an uncaught Python exception is raised, display an annotated traceback to support debugging. WebError actually provides the developer with a live, in-browser Python command line prompt for every level in a stack trace at which the developer can investigate a failure. Another popular tool is repoze.profile, which watches the application as it processes requests and produces a report on which functions are consuming the most CPU cycles.
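To make the wrapper idea concrete, here is a small Python 3 sketch of two pieces of middleware: one adjusts the incoming environ by dispatching on the first URL path segment, and one adjusts the outgoing response by appending a header. It uses only wsgiref.util from the Standard Library; PrefixDispatcher, add_header, make_app, and call_app are hypothetical names invented for this example, not the API of Paste, repoze, or any other package mentioned above.

```python
from wsgiref.util import setup_testing_defaults, shift_path_info


def make_app(name):
    """Build a trivial application that reports the paths it was handed."""
    def app(environ, start_response):
        text = '%s saw SCRIPT_NAME=%s PATH_INFO=%s' % (
            name, environ['SCRIPT_NAME'], environ['PATH_INFO'])
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [text.encode('utf-8')]
    return app


class PrefixDispatcher(object):
    """Middleware that picks an application by the first URL path segment."""

    def __init__(self, apps, fallback):
        self.apps = apps
        self.fallback = fallback

    def __call__(self, environ, start_response):
        first = environ.get('PATH_INFO', '').lstrip('/').split('/', 1)[0]
        if first in self.apps:
            shift_path_info(environ)     # move that segment into SCRIPT_NAME
            return self.apps[first](environ, start_response)
        return self.fallback(environ, start_response)


def add_header(app, name, value):
    """Middleware that appends one header to every outgoing response."""
    def wrapper(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start_response)
    return wrapper


site = add_header(
    PrefixDispatcher({'blog': make_app('blog'), 'shop': make_app('shop')},
                     fallback=make_app('home')),
    'X-Served-By', 'middleware-sketch')


def call_app(app, path):
    """Play the server for one request and return (status, headers, body)."""
    environ = {'SCRIPT_NAME': '', 'PATH_INFO': path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'], captured['headers'] = status, headers

    body = b''.join(app(environ, start_response))
    return captured['status'], dict(captured['headers']), body.decode('utf-8')


print(call_app(site, '/blog/2014'))
print(call_app(site, '/about'))
```

Neither wrapped application knows that it is mounted under a prefix or that a header is being added, which is the property that lets middleware be stacked freely.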

If you are interested in what WSGI middleware is available, then you can visit this pair of sites to learn more:

http://wsgi.org/wsgi/Middleware_and_Utilities
http://repoze.org/repoze_components.html#middleware

Today there are at least three major competing approaches in the Python community for crafting modular components that can be used to build web sites:

The WSGI middleware approach thinks that code reuse can often best be achieved through a component stack, where each component uses WSGI to speak to the next. Here, all interaction has to somehow be made to fit the model of a dictionary of strings being handed down and then content being passed back up.

Everything built atop the Zope Toolkit uses formal Design Pattern concepts like interfaces and factories to let components discover one another and be configured for operation. Thanks to adapters, components can often be used with widgets that were not originally designed with a given type of component in mind.

Several web frameworks have tried to adopt conventions that would make it easy for third-party pieces of functionality to be added to an application. The Django community seems to have traveled the farthest in this direction, but it also looks as though it has encountered quite serious roadblocks in cases where a component needs to add its own tables to the database that have foreign-key relationships with user tables.

These examples illustrate an important fact: WSGI middleware is a good idea that has worked very well for a small class of problems where the idea of wrapping an application with concentric functionality makes solid sense. However, most web programmers seem to want to use more typical Python mechanisms like APIs, classes, and objects to combine their own code with existing components.

WSGI Middleware


Now we are going to talk about an entirely different discipline: web application development.

Network programmers think about things like sockets, port numbers, protocols, packet loss, latency, framing, and encodings. Although all of these concepts must also be in the back of a web developer's mind, her actual attention is focused on a set of technologies so intricate and fast-changing that the actual packets and latencies are recalled to mind only when they are causing trouble. The web developer needs to think instead about HTML, GET, POST, forms, REST, CSS, JavaScript, Ajax, APIs, sprites, compression, and emerging technologies like HTML5 and WebSocket. The web site exists in her mind primarily as a series of documents that users will traverse to accomplish goals.

Web frameworks exist to help programmers step back from the details of HTTP (which is, after all, an implementation detail most users never even become aware of) and to write code that focuses on the nouns of web design. bottle_app.py shows how even a very modest Python microframework can be used to reorient the attention of a web programmer.

You can install the bottle framework and run the listing once you have activated a virtual environment, like this:

root@erlerobot:~/Python_files# pip install bottle
root@erlerobot:~/Python_files# python bottle_app.py

The bottle_app.py:

import base64, bottle

bottle.debug(True)
app = bottle.Bottle()

@app.route('/encode')
@bottle.view('bottle_template.html')
def encode():
    mystring = bottle.request.GET.get('mystring')
    if mystring is None:
        bottle.abort(400, 'This form requires a "mystring" parameter')
    return dict(mystring=mystring, myb=base64.b64encode(mystring))

@app.route('/')
@bottle.view('bottle_template.html')
def index():
    return dict(mystring=None)

bottle.run(app=app, host='localhost', port=8080)

The wsgi_app.py listing used for comparison is the raw WSGI application shown in full in the WSGI section.

Python Web Frameworks

In wsgi_app.py the attention was on the single incoming HTTP request, and the branches in our logic explored all of the possible lifespans for that particular protocol request. bottle_app.py changes the focus to the pages that actually exist on the site and to giving each of these pages reasonable behaviors. The same tree of possibilities exists, but the tree exists implicitly thanks to the possible URLs defined in the code, not because the programmer has written a large if statement.

The bottle_template.html:

%# The page template that goes with bottle_app.py.
%#
<html><head><title>bottle_app.py</title></head>
<body>
  %if mystring is None:
    Welcome! Enter a string:
    <form action="encode"><input name="mystring"><input type="submit"></form>
  %else:
    <tt>{{mystring}}</tt> base64 encoded is: <tt>{{myb}}</tt><br>
    <a href="/">Return to the home page</a>
  %end
</body></html>

It might seem merely a pleasant convenience that we can use the Bottle SimpleTemplate to insert our variables into a web page and know that they will be escaped correctly. But the truth is that templates serve, just like schemes for URL dispatch, to re-orient our attention: instead of the resulting web page existing in our minds as what will result when the strings in our program listing are finally concatenated, we get to lay out its HTML intact, in order, and in a file that can actually take an .html extension and be highlighted and indented as HTML in our editor. The Python program will no longer impede our relationship with our markup.

And full-fledged Python frameworks abstract away even more implementation details. A very important feature they typically provide is data abstraction: instead of talking to a database using its raw APIs, a programmer can define models, laying out the data fields so they are easy to instantiate, search, and modify. And some frameworks can provide entire RESTful APIs that allow creation, inspection, modification, and deletion with PUT, GET, POST, and DELETE. The programmer merely needs to define the structure of his data document, and then name the URL at which the tree of REST objects should be based.

When looking for a web framework, you will find that the various frameworks differ on a few major points. The upcoming sections will walk you through what these points are, and how they might affect your development experience.


The various Python web frameworks tend to handle URL dispatch quite differently.

Some small frameworks like Bottle and Flask let you create small applications by decorating a series of callables with URL patterns; small applications can then be combined later by placing them beneath one or more top-level applications.

Other frameworks, like Django, Pylons, and Werkzeug, encourage each application to define its URLs all in one place. This breaks your code into two levels, where URL dispatch happens in one location and rendering in another. This separation makes it easier to review all of the URLs that an application supports; it also means that you can attach code to new URLs without having to modify the functions themselves.

Another approach has you define controllers, which are classes that represent some point in the URL hierarchy (say, the path /cart) and then write methods on the controller class named view() and edit() if you want to support sub-pages named /cart/view and /cart/edit. CherryPy, TurboGears2, and Pylons (if you use controllers instead of Routes) all support this approach. While determining later what URLs are supported can mean traversing a maze of different connected classes, this approach does allow for dynamic, recursive URL spaces that exist only at runtime as classes hand off dispatch requests based on live data about the site structure.

A large community with its own conferences exists around the Zope framework.

The various mechanisms for URL dispatch can all be used to produce fairly clean design, and choosing from among them is largely a matter of taste.
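The three dispatch styles described above can be compared side by side in a framework-free sketch. The Python 3 code below is only an illustration of the ideas; route, dispatch, URLCONF, CartController, and dispatch_controller are hypothetical names and do not reproduce the API of Bottle, Django, CherryPy, or any other framework.

```python
import re

# Style 1: decorators attach a URL pattern to each callable.
ROUTES = []


def route(pattern):
    def register(func):
        ROUTES.append((re.compile('^' + pattern + '$'), func))
        return func
    return register


@route(r'/')
def index():
    return 'front page'


@route(r'/cart/(?P<item>\d+)')
def cart_item(item):
    return 'cart item %s' % item


def dispatch(path, routes=ROUTES):
    """Call the first view whose pattern matches the path."""
    for regex, view in routes:
        match = regex.match(path)
        if match:
            return view(**match.groupdict())
    return '404 Not Found'


# Style 2: views are plain functions and all URLs live in one central table.
def show_user(name):
    return 'user %s' % name


URLCONF = [(re.compile(r'^/users/(?P<name>\w+)$'), show_user)]


# Style 3: a controller class owns a point in the URL hierarchy, and the
# next path segment selects one of its methods.
class CartController(object):
    def view(self):
        return 'viewing cart'

    def edit(self):
        return 'editing cart'


def dispatch_controller(path, controllers):
    parts = path.strip('/').split('/')
    if len(parts) == 2 and parts[0] in controllers and not parts[1].startswith('_'):
        method = getattr(controllers[parts[0]], parts[1], None)
        if callable(method):
            return method()
    return '404 Not Found'


print(dispatch('/cart/42'))
print(dispatch('/users/alice', URLCONF))
print(dispatch_controller('/cart/edit', {'cart': CartController()}))
```

In the first style the URL sits next to the code it triggers; in the second, every URL can be reviewed in one table; in the third, the set of URLs is only discovered by walking the objects at run time.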

URL Dispatch Techniques


Almost all web frameworks expect you to produce web pages by combining Python code called a view with an HTML template; you saw this approach in action in bottle_app.py. This approach has gained traction because of its eminent maintainability: building a dictionary of information is best performed in plain Python code, and the items fetched and arranged by the view can then easily be included by the template, so long as the template language supports basic actions like iteration and some form of expression evaluation. (A template is a document skeleton with placeholders that are filled in with data, which facilitates the development of web pages, letters, or other content.) It is one of the glories of Python that we use views and templates, and one of the shames of traditional PHP development that developers would freely intermix HTML and extensive PHP code to produce a single, unified mess.

Views can also become more testable when their only job is to generate a dictionary of data. A good framework will let you write tests that simply check the raw data returned by the function instead of making you peek repeatedly into fully rendered templates to see if the view corralled its data correctly.
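The split between a view that returns data and a template that presents it can be sketched with nothing but the Standard Library. This Python 3 example uses string.Template as a stand-in for a real template language, which is an assumption made purely for illustration; encode_view, render, and PAGE are hypothetical names.

```python
import base64
import html
from string import Template

# The "template": markup with named placeholders.
PAGE = Template('<tt>$mystring</tt> base64 encoded is: <tt>$myb</tt>')


def encode_view(mystring):
    """The view: plain Python whose only job is to build a dictionary."""
    myb = base64.b64encode(mystring.encode('utf-8')).decode('ascii')
    return {'mystring': mystring, 'myb': myb}


def render(template, data):
    """The rendering step: escape every value, then fill in the template."""
    safe = {key: html.escape(str(value)) for key, value in data.items()}
    return template.substitute(safe)


data = encode_view('<b>hi</b>')
print(data)                  # a test can check this dictionary directly
print(render(PAGE, data))    # escaping happens only at render time
```

A test for encode_view needs to look only at the returned dictionary; whether the markup is escaped correctly is a separate concern that belongs to the rendering step.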

There seem to be two major differences of opinion among the designers and users of the various template languages about what constitutes the best way to use templates:

Should templates be valid HTML with iteration and expressions hidden in element attributes? Or should the template language use its own style of markup that festoons and wraps the literal HTML of the web page? While the former can let the developer run HTML validation against template files before they are ever rendered and be assured that rendering will not change the validator's verdict, most developers seem to find the latter approach much easier to read and maintain.

Should templates allow arbitrary Python expressions in template code, or lock down the available options to primitive operations like dictionary get-item and object get-attribute? Many popular frameworks choose the latter option, requiring even lazy programmers to push complex operations into their Python code “where it belongs.” But several template languages reason that, if Python programmers do so well without type checking, then maybe they should also be trusted with the choice of which expressions belong in the view and which in the template.

Since many Python frameworks let you plug in your template language of choice, and only a few of them lock you down to one option, you might find that you can pair your favorite approaches.

Templates


A fun way to demonstrate that Python comes with “batteries included” is to enter a directory on your system and run the SimpleHTTPServer Standard Library module as a stand-alone program:

root@erlerobot:~/Python_files# python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

If you direct your browser to localhost:8000, you will see the contents of the current directory displayed for browsing, much like the listings provided by Apache when a site leaves a directory browsable. Documents and images will load in your web browser when selected, based on the content types chosen through the best guesses of the mimetypes Standard Library module. The mimetypes module converts between a filename or URL and the MIME type associated with the filename extension. Conversions are provided from filename to MIME type and from MIME type to filename extension; encodings are not supported for the latter conversion.
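The two conversions offered by mimetypes can be tried directly. The sketch below is written for Python 3 (where the equivalent server command is python -m http.server); the file names are made up for the example, and the exact results can vary slightly with the MIME tables installed on a given system.

```python
import mimetypes

# Filename (or URL) to MIME type, plus any encoding layered on top.
for name in ('index.html', 'photo.png', 'notes.txt', 'archive.tar.gz'):
    content_type, encoding = mimetypes.guess_type(name)
    print(name, content_type, encoding)

# MIME type back to a filename extension; no encoding is involved here.
print(mimetypes.guess_extension('image/png'))
```

Note how archive.tar.gz is reported as a tar archive with a gzip encoding, which is the distinction the text draws between a type and an encoding.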

Today, we use namespaces, callables, and duck-typed objects to provide much cleaner forms of extensibility. For example, an object like start_response is provided as an argument (dependency injection), and the WSGI standard specifies its behavior rather than its inheritance tree (duck typing). The Standard Library includes two other HTTP servers:

CGIHTTPServer takes the SimpleHTTPServer and, instead of just serving static files off of the disk, it adds the ability to run CGI scripts.

SimpleXMLRPCServer and DocXMLRPCServer each provide a server endpoint against which client programs can make XML-RPC remote procedure calls. This protocol uses XML files submitted through HTTP requests.

Note that none of the preceding servers is typically intended for production use; instead, they are useful for small internal tasks for which you just need a quick HTTP endpoint to be used by other services internal to a system or subnet. Most Python web frameworks also provide a way to run your application from the command line for debugging. These pure-Python web servers can be very useful if you are writing an application that users will be installing locally, and you want to provide a web interface without having to ship a separate web server like Apache or nginx.

Pure-Python Web Servers


When the first experiments were taking place with dynamically generated web pages, a calling convention was necessary, and so the Common Gateway Interface (CGI) was defined. It allowed programs in all sorts of languages (C, the various Unix shells, awk, Perl, Python, PHP, and so forth) to be partners in generating dynamic content.

Today, the design of CGI is considered something of a disaster. Running a new process from scratch is just about the most expensive single operation that you can perform on a modern operating system, and requiring that this take place for every single incoming HTTP request is simply madness. You should avoid CGI under all circumstances. But it is possible you might someday have to connect Python code to a legacy HTTP server that does not support at least FastCGI or SCGI, so I will outline CGI's essential features. Three standard lines of communication that already existed between parent and child processes on Unix systems were used by web servers when invoking a CGI script:

The Unix environment, a list of strings provided to each process upon its invocation that traditionally includes things like TZ=EST (the time zone) and COLUMNS=80 (user's screen width), was instead stuffed full of information about the HTTP request that the CGI script was being called upon to answer. The various parts of the request's URL; the user agent string; basic information about the web server; and even a cookie could be included in the list of name=value pairs.

The standard input to the script could be read to end-of-file to receive whatever data had been submitted in the body of the HTTP request using POST. Whether a request was indeed a POST could be checked by examining the REQUEST_METHOD environment variable.

Finally, the script would produce content, which it did by writing HTTP headers, a blank line, and then a response body to its standard output. To be a valid response, a Content-Type header was generally necessary at a minimum, though in its absence, some web servers would instead accept a Location header as a signal that they should send a redirect.
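The three lines of communication just described can be sketched as a single function that receives an environment, an input stream, and an output stream. This is a simplified Python 3 illustration rather than a production CGI script: it uses text streams instead of the byte streams a real server provides, and run_cgi and the fake environment values are hypothetical.

```python
import io


def run_cgi(environ, stdin, stdout):
    """Answer one request the way a CGI script would."""
    # 1. The environment carries the details of the HTTP request.
    if environ.get('REQUEST_METHOD') == 'POST':
        # 2. Standard input carries the body submitted with a POST.
        submitted = stdin.read()
    else:
        submitted = environ.get('QUERY_STRING', '')
    # 3. Standard output receives headers, a blank line, and then the body.
    stdout.write('Content-Type: text/plain\r\n')
    stdout.write('\r\n')
    stdout.write('You sent: %s\n' % submitted)


# Simulate what a web server would set up before running the script.
fake_env = {'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': 'text/plain'}
out = io.StringIO()
run_cgi(fake_env, io.StringIO('mystring=hello'), out)
print(out.getvalue())
```

In a real CGI script the three arguments would be os.environ, sys.stdin, and sys.stdout, and the whole process would be started afresh for every request, which is exactly the cost the text warns about.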

Should you ever need to run Python behind an HTTP server that only supports CGI, then I recommend that you use the CGIHandler class from the wsgiref.handlers module of the Standard Library (this is useful when you have a WSGI application and want to run it as a CGI script). This lets you use a normal Python web framework to write your service (or, alternatively, roll up your sleeves and write a raw WSGI application) and then offer the HTTP server a CGI script, as shown here:

from wsgiref.handlers import CGIHandler
from mywsgiapp import MyWSGIApp  # hypothetical module that holds your application

my_wsgi_app = MyWSGIApp()  # configuration necessary here?
CGIHandler().run(my_wsgi_app)

Be sure to check whether your web framework of choice already provides a way to invoke it as a CGI script; if so, your web framework will already know all of the steps involved in loading and configuring your application.

Common Gateway Interface (CGI)


As it became clear that CGI was both inefficient and inflexible (CGI scripts could not flexibly set the HTTP return code, for example), it became fashionable to start embedding programming languages directly in web servers.

Back in the early days, embedding was also possible for Python, through a somewhat different approach that actually made Python an extension language for much of the internals of Apache itself. The module that supported this was mod_python, and for years it was by far the most popular way to connect Python to the World Wide Web. The mod_python Apache module put a Python interpreter inside of every worker process spawned by Apache. Programmers could arrange for their Python code to be invoked by writing directives into their Apache configuration.

Today, mod_python is mainly of historical interest. I have outlined its features here, not only because you might be called upon to maintain or upgrade a service that is still running on mod_python, but because it still provides unique Apache integration points where Python cannot get involved in any other way. If you run into either situation, you can find its documentation at http://modpython.org/

mod_python


Here, we will learn about the actual payload carried by all of the protocols involved when a message is transmitted and received (authenticated SMTP, POP, IMAP): the format of e-mail messages themselves.

E-mail Composition and Decoding


Each traditional e-mail message contains two distinct parts: headers and the body. Here is a very simple e-mail message so that you can see what the two sections look like:

From: Jane Smith <[email protected]>
To: Alan Jones <[email protected]>
Subject: Testing This E-Mail Thing

Hello Alan,

This is just a test message. Thanks.

The first section is called the headers, which contain all of the metadata about the message, like the sender, the destination, and the subject of the message: everything except the text of the message itself. The body then follows and contains the message text itself. There are three basic rules of Internet e-mail formatting:

At least during actual transmission, every line of an e-mail message should be terminated by the two-character sequence carriage return, newline, represented in Python by '\r\n'. E-mail clients running on your laptop or desktop machine tend to make different decisions about whether to store messages in this format, or replace these two-character line endings with whatever ending is native to your operating system.

The first few lines of an e-mail are headers, which consist of a header name, a colon, a space, and a value. A header can be several lines long by indenting the second and following lines from the left margin as a signal that they belong to the header above them.

The headers end with a blank line (that is, by two line endings back-to-back without intervening text) and then the message body is everything else that follows. The body is also sometimes called the payload.
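The three rules above are simple enough to apply by hand, which can make them easier to remember. The Python 3 sketch below is only an illustration (the email package is the right tool for real messages); split_message is a hypothetical helper, and the example.com address is a placeholder.

```python
RAW = ('From: Jane Smith <jane@example.com>\r\n'
       'Subject: Testing This\r\n'
       ' E-Mail Thing\r\n'
       '\r\n'
       'Hello Alan,\r\n'
       'This is just a test message.\r\n')


def split_message(raw):
    """Split a raw message into a list of (name, value) headers and a body."""
    # Rule 3: the first blank line separates the headers from the body.
    head, _, body = raw.partition('\r\n\r\n')
    headers = []
    # Rule 1: lines are terminated by carriage return plus newline.
    for line in head.split('\r\n'):
        # Rule 2: an indented line continues the header above it.
        if line[:1] in (' ', '\t') and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + ' ' + line.strip())
        else:
            name, _, value = line.partition(':')
            headers.append((name, value.strip()))
    return headers, body


headers, body = split_message(RAW)
print(headers)
print(repr(body))
```

The Subject header here spans two physical lines but comes back as a single value, while the body keeps its '\r\n' line endings untouched.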

The headers are there for the benefit of the person who reads the e-mail message, and the most important headers are these:

From: This identifies the message sender. It can also, in the absence of a Reply-To header, be used as the destination when the reader clicks the e-mail client’s “Reply” button.

Reply-To: This sets an alternative address for replies, in case they should go to someone besides the sender named in the From header.

Subject: This is a short several-word description of the e-mail’s purpose, used by most clients when displaying whole mailboxes full of e-mail messages.

Date: This is a header that can be used to sort a mailbox in the order in which e-mails were sent.

Message-ID and In-Reply-To: Each ID uniquely identifies a message, and these IDs are then used in e-mail replies to specify exactly which message was being replied to. This can help sophisticated mail readers perform “threading,” arranging messages so that replies are grouped directly beneath the messages to which they reply.

E-mail Messages


How can we generate a traditional e-mail in Python without having to implement the formatting details ourselves? The answer is to use the modules within the powerful email package. The email package is a library for managing e-mail messages, including MIME and other RFC 2822-based message documents.

As our first example, trad_gen_simple.py shows a program that generates a simple message. Note that when you generate messages this way, manually setting the payload with the Message class, you should limit yourself to using plain 7-bit ASCII text.

from email.message import Message

text = """Hello,

This is a test message.

-- Anonymous"""

msg = Message()
msg['To'] = '[email protected]'
msg['From'] = 'Test Sender <[email protected]>'
msg['Subject'] = 'Test Message'
msg.set_payload(text)

print msg.as_string()

The program is simple. It creates a Message object, sets the headers and body, and prints the result. When you run this program, you will get a nicely formatted message with proper headers:

root@erlerobot:~/Python_files# python trad_gen_simple.py
To: [email protected]
From: Test Sender <[email protected]>
Subject: Test Message

Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#

While technically correct, this message is actually a bit deficient when it comes to providing enough headers to really function in the modern world. For one thing, most e-mails should have a Date header, in a format specific to e-mail messages. Python provides an email.utils.formatdate() routine that will generate dates in the right format. You should also add a Message-ID header to messages. This header should be generated in such a way that no other e-mail, anywhere in history, will ever have the same Message-ID. This might sound difficult, but Python provides a function to help do that as well: email.utils.make_msgid(). So take a look at trad_gen_newhdrs.py, which fleshes out our first sample program into a more complete example that sets these additional headers.

import email.utils
from email.message import Message

message = """Hello,

This is a test message.

-- Anonymous"""

msg = Message()
msg['To'] = '[email protected]'
msg['From'] = 'Test Sender <[email protected]>'
msg['Subject'] = 'Test Message'
msg['Date'] = email.utils.formatdate(localtime=1)
msg['Message-ID'] = email.utils.make_msgid()
msg.set_payload(message)

print msg.as_string()

Composing Traditional Messages

If you run the program, you will notice two new headers in the output.

root@erlerobot:~/Python_files# python trad_gen_newhdrs.py
To: [email protected]
From: Test Sender <[email protected]>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:31:50 +0200
Message-ID: <[email protected]>

Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#


What happens when you receive an incoming message as a raw block of text and want to look inside? Well, the email module also provides support for parsing e-mail messages, re-constructing the same Message object that would have been used to create the message in the first place. (Of course, it does not matter whether the e-mail you are parsing was originally created in Python through the Message class, or whether some other e-mail program created it; the format is standard, so Python’s parsing should work either way.) After parsing the message, you can easily access individual headers and the body of the message using the same conventions as you used to create messages: headers look like the dictionary key-values of the Message, and the body can be fetched with a function.

A simple example of a parser is shown in trad_parse.py. All of the actual parsing takes place in the one-line function message_from_file(); everything else in the program listing is simply an illustration of how a Message object can be mined for headers and data.

import email

banner = '-' * 48
popular_headers = ('From', 'To', 'Subject', 'Date')
msg = email.message_from_file(open('message.txt'))
headers = sorted(msg.keys())

print banner
for header in headers:
    if header not in popular_headers:
        print header + ':', msg[header]
print banner
for header in headers:
    if header in popular_headers:
        print header + ':', msg[header]
print banner
if msg.is_multipart():
    print "This program cannot handle MIME multipart messages."
else:
    print msg.get_payload()

The output should be like this:

root@erlerobot:~/Python_files# python trad_parse.py
------------------------------------------------
Message-ID: <[email protected]>
------------------------------------------------
Date: Mon, 14 Jul 2014 14:33:54 +0200
From: Test Sender <[email protected]>
Subject: Test Message, Chapter 12
To: [email protected]
------------------------------------------------
Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#

As you can see, the Python Standard Library makes it quite easy both to create and then to parse standard Internet e-mail messages. Note that the email package also offers a message_from_string() function that, instead of taking a file, can simply be handed the string containing an e-mail message.
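As a brief illustration of message_from_string(), the Python 3 sketch below parses a message held in a string instead of a file. The addresses use the placeholder domain example.com and the message text is made up for the example.

```python
import email

RAW = ('To: recipient@example.com\n'
       'From: Test Sender <sender@example.com>\n'
       'Subject: Test Message\n'
       '\n'
       'Hello,\n'
       'This is a test message.\n')

msg = email.message_from_string(RAW)

# Headers behave like dictionary entries, with case-insensitive names.
for name, value in msg.items():
    print('%s: %s' % (name, value))

# The body of a non-multipart message is returned as a plain string.
if not msg.is_multipart():
    print(msg.get_payload())
```

The resulting object is the same kind of Message that message_from_file() returns, so everything shown in trad_parse.py applies to it unchanged.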

Parsing Traditional Messages


The email package provides two functions that work together as a team to help you parse the Date field of e-mail messages, whose format you can see in the preceding example: a date and time, followed by a time zone expressed as hours and minutes (two digits each) relative to UTC. Countries in the eastern hemisphere experience sunrise early, so their time zones are expressed as positive numbers, like the following:

Date: Sun, 27 May 2007 11:34:43 +1000

Those of us in the western hemisphere have to wait longer for the sun to rise, so our time zones lag behind; Eastern Daylight Time, for example, runs four hours behind UTC:

Date: Sun, 27 May 2007 08:36:37 -0400

To figure out what moment of time is really meant by a Date header, simply call two functions in a row:

Call parsedate_tz() to extract the time and time zone.

Use mktime_tz() to add or subtract the time zone. The result will be a standard Unix timestamp.

For example, consider the two Date headers shown previously. If you just compared their bare times, the first date looks later: 11:34 a.m. is, after all, after 8:36 a.m. But the second time is in fact the much later one, because it is expressed in a time zone that is so much farther west. We can test this by using the functions previously named. First, turn the top date into a timestamp:

>>> from email.utils import parsedate_tz, mktime_tz
>>> timetuple1 = parsedate_tz('Sun, 27 May 2007 11:34:43 +1000')
>>> print timetuple1
(2007, 5, 27, 11, 34, 43, 0, 1, -1, 36000)
>>> timestamp1 = mktime_tz(timetuple1)
>>> print timestamp1
1180229683.0

Then turn the second date into a timestamp as well, and the dates can be compared directly:

>>> timetuple2 = parsedate_tz('Sun, 27 May 2007 08:36:37 -0400')
>>> timestamp2 = mktime_tz(timetuple2)
>>> print timestamp2
1180269397.0
>>> timestamp1 < timestamp2
True

Ifyouhaveneverseenatimestampvaluebefore,theyrepresenttimeveryplainly:asthenumberofsecondsthathavepassedsincethebeginningof1970.YouwillfindfunctionsinPython’soldtimemodulefordoingcalculationswithtimestamps,andyouwillalsofindthatyoucanturnthemintonormalPythondatetimeobjectsquiteeasily:

>>>fromdatetimeimportdatetime

>>>datetime.fromtimestamp(timestamp2)

datetime.datetime(2007,5,27,8,36,37)

Intherealworld,manypoorlywrittene-mailclientsgeneratetheirDateheadersincorrectly.WhiletheroutinespreviouslyshowndotrytobeflexiblewhenconfrontedwithamalformedDate,theysometimescansimplymakenosenseofitandparsedate_tz()hastogiveupandreturnNone.Sowhencheckingareal-worlde-mailmessageforadate,remembertodoitinthreesteps:firstcheckwhetheraDateheaderispresentatall;thenbepreparedforNonetobereturnedwhenyouparseit;andfinallyapplythetimezoneconversiontogetarealtimestampthatyoucanworkwith.
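The three steps just described can be sketched as a small helper. This is only a minimal sketch, written in Python 3 syntax (the listings in this chapter use Python 2); the function name date_header_to_timestamp and the three sample messages are made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_tz, mktime_tz


def date_header_to_timestamp(msg):
    """Return the Date header as a Unix timestamp, or None if unusable."""
    raw = msg.get('Date')            # step 1: is the header present at all?
    if raw is None:
        return None
    timetuple = parsedate_tz(raw)    # step 2: parsing may give up and return None
    if timetuple is None:
        return None
    return mktime_tz(timetuple)      # step 3: apply the time zone offset


# Hypothetical sample messages: one good, one malformed, one without a Date.
good = message_from_string('Date: Sun, 27 May 2007 08:36:37 -0400\n\nHello\n')
bad = message_from_string('Date: not a date\n\nHello\n')
missing = message_from_string('Subject: no date here\n\nHello\n')

for m in (good, bad, missing):
    print(date_header_to_timestamp(m))
```

Returning None for both failure cases keeps the caller's check simple; a program that needs to tell "missing" from "malformed" would have to return something richer.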

Parsing Dates

So far we have discussed e-mail messages that are plain text: the characters after the blank line that ends the headers are to be presented literally to the user as the content of the e-mail message. Today, only a fraction of the messages sent across the Internet are so simple.

The Multipurpose Internet Mail Extensions (MIME) standard is a set of rules for encoding data, rather than simple plain text, inside e-mails. MIME provides a system for things like attachments, alternative message formats, and text that is stored in alternate encodings. Because MIME messages have to be transmitted and delivered through many of the same old e-mail services that were originally designed to handle plain-text e-mails, MIME operates by adding headers to an e-mail message and then giving it content that looks like plain text to the machine but that can actually be decoded by an e-mail client into HTML, images, or attachments.

The most important features of MIME are, first, that MIME supports multipart messages. A normal e-mail message, as we have seen, contains some headers and a body. But a MIME message can squeeze several different parts into the message body. These parts might be things to be presented to the user in order, like a plain-text message, an image file attachment, and then a PDF attachment. Or, they could be alternative multiparts, which represent the same content in different ways (usually, by encoding a message in both plain text and HTML).

Second, MIME supports different transfer encodings. Traditional e-mail messages are limited to 7-bit data, which renders them unusable for international alphabets. MIME has several ways of transforming 8-bit data so it fits within the confines of e-mail systems:

The "plain" encoding is the same as you would see in traditional messages, and passes 7-bit text unmodified.

"Base-64" is a way of encoding raw binary data that turns it into normal alphanumeric data. Most of the attachments you send and receive, such as images, PDFs, and ZIP files, are encoded with base-64.

"Quoted-printable" is a hybrid that tries to leave plain English text alone so that it remains readable in old mail readers, while also letting unusual characters be included as well.
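To make the difference between the last two encodings concrete, here is a small sketch using the standard base64 and quopri modules directly (the email package normally applies these for you). It is written in Python 3 syntax, and the sample data is arbitrary.

```python
import base64
import quopri

# Arbitrary binary data: base-64 turns it entirely into ASCII characters.
binary = bytes(range(8))
b64 = base64.b64encode(binary)

# Mostly-ASCII text with one 8-bit character (u-umlaut in ISO-8859-1).
text = 'Michael M\xfcller'.encode('iso-8859-1')
qp = quopri.encodestring(text)   # ASCII stays readable; the 0xFC byte becomes =FC

print(b64)
print(qp)

# Both encodings are reversible.
print(base64.b64decode(b64) == binary)
print(quopri.decodestring(qp) == text)
```

Base-64 output is unreadable but compact for binary data, while quoted-printable only pays a cost for the bytes that fall outside plain ASCII.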

MIME also provides content types, which tell the recipient what kind of content is present. For instance, a content type of text/plain indicates a plain-text message, while image/jpeg is a JPEG image.

You will recall that MIME messages must work within the limited plain-text framework of traditional e-mail messages. To do that, the MIME specification defines some headers and some rules about formatting the body text.

For non-multipart messages that are a single block of data, MIME simply adds some headers to specify what kind of content the e-mail contains, along with its character set. But the body of the message is still a single piece, although it might be encoded with one of the schemes already described.

For multipart messages, things get trickier: MIME places a special marker in the e-mail body everywhere that it needs to separate one part from the next. Each part can then have its own limited set of headers, which occur at the start of the part, followed by data. By convention, the most basic content in an e-mail comes first (like a plain-text message, if one has been included), so that people without MIME-aware readers will see the plain text immediately without having to scroll down through dozens or hundreds of pages of MIME data.

Understanding MIME

How MIME works

We will start by looking at how to create MIME messages. To compose a message with attachments, you will generally follow these steps:

1. Create a MIMEMultipart object and set its message headers.
2. Create a MIMEText object with the message body text and attach it to the MIMEMultipart object.
3. Create appropriate MIME objects for each attachment and attach them to the MIMEMultipart object.
4. Finally, call as_string() on the MIMEMultipart object to write out the resulting message.

Take a look at mime_gen_basic.py for a program that implements this algorithm. You can see that parts of the code look similar to logic that we used to generate a traditional e-mail. After creating the message and its text body, the program loops over each file given on the command line and attaches it to the growing message.

from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email import utils, encoders
import mimetypes, sys

def attachment(filename):
    fd = open(filename, 'rb')
    mimetype, mimeencoding = mimetypes.guess_type(filename)
    if mimeencoding or (mimetype is None):
        mimetype = 'application/octet-stream'
    maintype, subtype = mimetype.split('/')
    if maintype == 'text':
        retval = MIMEText(fd.read(), _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(fd.read())
        encoders.encode_base64(retval)
    retval.add_header('Content-Disposition', 'attachment',
                      filename=filename)
    fd.close()
    return retval

message = """Hello,

This is a test message.

-- Anonymous"""

msg = MIMEMultipart()
msg['To'] = '[email protected]'
msg['From'] = 'Test Sender <[email protected]>'
msg['Subject'] = 'Test Message'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

body = MIMEText(message, _subtype='plain')
msg.attach(body)
for filename in sys.argv[1:]:
    msg.attach(attachment(filename))
print msg.as_string()

The attachment() function does the work of creating a message attachment object. First, it determines the MIME type of each file by using Python's built-in mimetypes module. If the type can't be determined, or it will need a special kind of encoding, then a type is declared that promises only that the data is made of a "stream of octets" (sequence of bytes) but without any further promise about what they mean. If the file is a text document whose MIME type starts with text/, a MIMEText object is created to handle it; otherwise, a generic MIMEBase object is created. In the latter case, the contents are assumed to be binary, so they are encoded with base-64. Finally, an appropriate Content-Disposition header is added to that section of the MIME file so that mail readers will know that they are dealing with an attachment.

The result of running this program is shown below:

Composing MIME Attachments

root@erlerobot:~/Python_files# echo "This is a test" > test.txt
root@erlerobot:~/Python_files# gzip < test.txt > test.txt.gz
root@erlerobot:~/Python_files# python mime_gen_basic.py test.txt test.txt.gz
Content-Type: multipart/mixed; boundary="===============1623374356=="
MIME-Version: 1.0
To: [email protected]
From: Test Sender <[email protected]>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:36:07 +0200
Message-ID: <[email protected]>

--===============1623374356==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Hello,

This is a test message.

-- Anonymous
--===============1623374356==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="test.txt"

This is a test

--===============1623374356==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.txt.gz"

H4sIAP3o2D8AAwvJyCxWAKJEhZLU4hIuAIwtwPoPAAAA
--===============1623374356==--

The message starts off looking quite similar to the traditional ones we created earlier; you can see familiar headers like To, From, and Subject just like before. Note the Content-Type line, however: it indicates multipart/mixed. That tells the mail reader that the body of the message contains multiple MIME parts, and that the string containing equals signs will be the separator between them. Next comes the message's first part. Notice that it has its own Content-Type header! The second part looks similar to the first, but has an additional Content-Disposition header; this will signal most e-mail readers that the part should be displayed as a file that the user can save rather than being immediately displayed to the screen. Finally comes the part containing the binary file, encoded with base-64, which makes it not directly readable.
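For readers on Python 3, the same attachment logic can be sketched with the newer email.message.EmailMessage class (available since Python 3.6), which builds the multipart/mixed container for you. This is an illustrative sketch rather than a port of mime_gen_basic.py: the helper names guess_types and build_message, the in-memory attachment data, and the example.com addresses are all placeholders.

```python
import mimetypes
from email.message import EmailMessage


def guess_types(filename):
    """Pick a MIME type with the same fallback rule as mime_gen_basic.py."""
    mimetype, encoding = mimetypes.guess_type(filename)
    if encoding or mimetype is None:
        mimetype = 'application/octet-stream'
    return tuple(mimetype.split('/', 1))


def build_message(body, attachments):
    """Build a message; attachments is a list of (filename, bytes) pairs."""
    msg = EmailMessage()
    msg['To'] = 'recipient@example.com'                 # placeholder address
    msg['From'] = 'Test Sender <sender@example.com>'    # placeholder address
    msg['Subject'] = 'Test Message'
    msg.set_content(body)                               # the plain-text body comes first
    for filename, data in attachments:
        maintype, subtype = guess_types(filename)
        # Bytes payloads are base-64 encoded and marked as attachments.
        msg.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return msg


msg = build_message('Hello,\n\nThis is a test message.\n',
                    [('test.txt', b'This is a test\n'),
                     ('test.txt.gz', b'\x1f\x8b placeholder, not real gzip data')])
print(msg.get_content_type())
for part in msg.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

One difference from the listing above: because the sketch hands over bytes, even the text/plain attachment is base-64 encoded rather than sent as 7bit text.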


MIME "alternative" parts let you generate multiple versions of a single document. The user's mail reader will then automatically decide which one to display, depending on which content type it likes best; some mail readers might even show the user radio buttons, or a menu, and let them choose. The process of creating alternatives is similar to the process for attachments, and is illustrated in mime_gen_alt.py:

from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email import utils, encoders

def alternative(data, contenttype):
    maintype, subtype = contenttype.split('/')
    if maintype == 'text':
        retval = MIMEText(data, _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(data)
        encoders.encode_base64(retval)
    return retval

messagetext = """Hello,

This is a *great* test message.

-- Anonymous"""

messagehtml = """Hello,<P>
This is a <B>great</B> test message from Chapter 12. I hope you enjoy
it!<P>
-- <I>Anonymous</I>"""

msg = MIMEMultipart('alternative')
msg['To'] = '[email protected]'
msg['From'] = 'Test Sender <[email protected]>'
msg['Subject'] = 'Test Message, Chapter 12'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

msg.attach(alternative(messagetext, 'text/plain'))
msg.attach(alternative(messagehtml, 'text/html'))
print msg.as_string()

Notice the differences between an alternative message and a message with attachments! With the alternative message, no Content-Disposition header is inserted. Also, the MIMEMultipart object is passed the alternative subtype to tell the mail reader that all objects in this multipart are alternative views of the same thing. Note again that it is always most polite to include the plain-text object first for people with ancient or incapable mail readers, which simply show them the entire message as text.
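In Python 3 the same idea can be sketched with EmailMessage, whose add_alternative() method converts the message into multipart/alternative on demand. This is a hedged sketch, not a replacement for mime_gen_alt.py; the message text is abbreviated and the HTML is illustrative.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Test Message'

# Plain text goes first, so that simple readers show something sensible.
msg.set_content('Hello,\n\nThis is a *great* test message.\n')

# The HTML version is added as an alternative view of the same content.
msg.add_alternative(
    '<p>Hello,</p><p>This is a <b>great</b> test message.</p>',
    subtype='html')

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type(), part['Content-Disposition'])
```

A receiving program can ask for its preferred rendering with msg.get_body(preferencelist=('html', 'plain')), which mirrors the choice a mail reader makes.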

MIME Alternative Parts

Although you have seen how MIME can encode message body parts with base-64 to allow 8-bit data to pass through, that does not solve the problem of special characters in headers. For instance, if your name was Michael Müller (with an umlaut over the "u"), you would have trouble representing your name accurately in your own alphabet. The "u" would come out bare. Therefore, MIME provides a way to encode data in headers. Take a look at mime_headers.py for how to do it in Python.

from email.mime.text import MIMEText
from email.header import Header

message = """Hello,

This is a test message.

-- Anonymous"""

msg = MIMEText(message)
msg['To'] = '[email protected]'
fromhdr = Header()
fromhdr.append(u"Michael M\xfcller")
fromhdr.append('<[email protected]>')
msg['From'] = fromhdr
msg['Subject'] = 'Test Message'

print msg.as_string()

The code '\xfc' in the Unicode string is the character "ü" (strings in Python source files that are prefixed with u can contain arbitrary Unicode characters, rather than being restricted to characters whose value is between 0 and 255).

root@erlerobot:~/Python_files# python mime_headers.py
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
To: [email protected]
From: =?iso-8859-1?q?Michael_M=FCller?= <[email protected]>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:46:33 +0200
Message-ID: <[email protected]>

Hello,

This is a test message.

-- Anonymous
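The encoded word in the From header above can be reproduced in isolation. The following sketch uses Python 3 syntax, where every str is already Unicode, and asks for ISO-8859-1 explicitly; it shows only the name, without the address part.

```python
from email.header import Header, decode_header

# Encode a non-ASCII display name as an RFC 2047 "encoded word".
name = Header('Michael M\xfcller', 'iso-8859-1')
encoded = name.encode()
print(encoded)

# decode_header() recovers the original bytes and the charset name.
print(decode_header(encoded))
```

The q in the middle of the encoded word stands for quoted-printable; a charset such as UTF-8 would typically be encoded with base-64 instead and marked with b.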

Composing Non-English Headers

Now that you know how to generate a message with alternatives and one with attachments, you may be wondering how to do both. To do that, you create a standard multipart for the main message. Then you create a multipart/alternative inside that for your body text, and attach your message formats to it. Finally, you attach the various files. Take a look at mime_gen_both.py for the complete solution.

from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import utils, encoders
import mimetypes, sys

def genpart(data, contenttype):
    maintype, subtype = contenttype.split('/')
    if maintype == 'text':
        retval = MIMEText(data, _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(data)
        encoders.encode_base64(retval)
    return retval

def attachment(filename):
    fd = open(filename, 'rb')
    mimetype, mimeencoding = mimetypes.guess_type(filename)
    if mimeencoding or (mimetype is None):
        mimetype = 'application/octet-stream'
    retval = genpart(fd.read(), mimetype)
    retval.add_header('Content-Disposition', 'attachment',
                      filename=filename)
    fd.close()
    return retval

messagetext = """Hello,

This is a *great* test message from Chapter 12. I hope you enjoy it!

-- Anonymous"""

messagehtml = """Hello,<P>
This is a <B>great</B> test message<P>
-- <I>Anonymous</I>"""

msg = MIMEMultipart()
msg['To'] = '[email protected]'
msg['From'] = 'Test Sender <[email protected]>'
msg['Subject'] = 'Test Message'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

body = MIMEMultipart('alternative')
body.attach(genpart(messagetext, 'text/plain'))
body.attach(genpart(messagehtml, 'text/html'))
msg.attach(body)

for filename in sys.argv[1:]:
    msg.attach(attachment(filename))
print msg.as_string()

Composing Nested Multiparts

Python's email module can read a message from a file or a string, and generate the same kind of in-memory object tree that we were generating ourselves in the aforementioned listings. To understand the e-mail's content, all you have to do is step through its structure. mime_structure.py shows an example:

import sys, email

def printmsg(msg, level=0):
    prefix = "| " * level
    prefix2 = prefix + "|"
    print prefix + "+ Message Headers:"
    for header, value in msg.items():
        print prefix2, header + ":", value
    if msg.is_multipart():
        for item in msg.get_payload():
            printmsg(item, level + 1)

msg = email.message_from_file(sys.stdin)
printmsg(msg)

This program is short and simple. For each object it encounters, it checks to see if it is multipart; if so, the children of that object are displayed as well. Individual parts of a message can easily be extracted. You will recall that there are several ways that message data may be encoded; fortunately, the email module can decode them all! mime_decode.py shows a program that will let you decode and save any component of a MIME message:

import sys, email

counter = 0
parts = []

def printmsg(msg, level=0):
    global counter
    l = "| " * level
    if msg.is_multipart():
        print l + "Found multipart:"
        for item in msg.get_payload():
            printmsg(item, level + 1)
    else:
        disp = ['%d. Decodable part' % (counter + 1)]
        if 'content-type' in msg:
            disp.append(msg['content-type'])
        if 'content-disposition' in msg:
            disp.append(msg['content-disposition'])
        print l + ", ".join(disp)
        counter += 1
        parts.append(msg)

inputfd = open(sys.argv[1])
msg = email.message_from_file(inputfd)
printmsg(msg)

while 1:
    print "Select part number to decode or q to quit:"
    part = sys.stdin.readline().strip()
    if part == 'q':
        sys.exit(0)
    try:
        part = int(part)
        msg = parts[part - 1]
    except:
        print "Invalid selection."
        continue

    print "Select file to write to:"
    filename = sys.stdin.readline().strip()
    try:
        fd = open(filename, 'wb')
    except:
        print "Invalid filename."
        continue

    fd.write(msg.get_payload(decode=1))
    fd.close()  # make sure the decoded part is flushed to disk

Parsing MIME Messages

This program steps through the message, like the last example. We skip asking the user about message components that are multipart because those exist only to contain other message objects, like text and attachments; multipart sections have no actual payload of their own.
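When no interactive menu is needed, the recursion can be replaced by the walk() method that message objects provide. The sketch below uses Python 3 syntax and a tiny hand-written message (the boundary XYZ and the file name data.bin are made up) to list the non-multipart parts and decode each payload.

```python
import email

# A hypothetical two-part message: one text part, one base-64 attachment.
raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    'MIME-Version: 1.0\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain; charset="us-ascii"\n'
    '\n'
    'Hello,\n'
    '--XYZ\n'
    'Content-Type: application/octet-stream\n'
    'Content-Transfer-Encoding: base64\n'
    'Content-Disposition: attachment; filename="data.bin"\n'
    '\n'
    'AAECAw==\n'
    '--XYZ--\n'
)

msg = email.message_from_string(raw)

# walk() yields the container itself and then every nested part, depth first.
leaves = [part for part in msg.walk() if not part.is_multipart()]

# decode=True undoes base-64 or quoted-printable and returns bytes.
decoded = [part.get_payload(decode=True) for part in leaves]

for part, data in zip(leaves, decoded):
    print(part.get_content_type(), part.get_filename(), len(data))
```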


The last trick that we should cover regarding MIME messages is decoding headers that may have been encoded to carry text in other languages. The function decode_header() takes a single header and returns a list of pieces of the header; each piece is a binary string together with its encoding (named as a string if it is something besides 7-bit ASCII, else the value None):

>>> x = '=?iso-8859-1?q?Michael_M=FCller?= <[email protected]>'
>>> import email.header
>>> pieces = email.header.decode_header(x)
>>> print pieces
[('Michael M\xfcller', 'iso-8859-1'), ('<[email protected]>', None)]

Of course, this raw information is likely to be of little use to you. To instead see the actual text inside the encoding, use the decode() function of each binary string in the list (falling back to an 'ascii' encoding if None was returned) and paste the result together with spaces:

>>> print ' '.join(s.decode(enc or 'ascii') for s, enc in pieces)
Michael Müller <[email protected]>

It is always good practice to use decode_header() on any of the "big three" headers (From, To, and Subject) before displaying them to the user. If no special encoding was used, then the result will simply be a one-element list containing the header string with a None encoding.
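In Python 3 the pieces come back as bytes, and the standard library offers make_header() to glue them back together. A minimal sketch, assuming Python 3; the helper name header_to_text and the example.com address are made up for illustration.

```python
from email.header import decode_header, make_header


def header_to_text(value):
    """Return a header value with any encoded words turned into text."""
    return str(make_header(decode_header(value)))


raw = '=?iso-8859-1?q?Michael_M=FCller?= <sender@example.com>'
print(header_to_text(raw))

# Headers without encoded words pass through unchanged.
print(header_to_text('Test Message'))
```

Wrapping the call in a helper like this makes it easy to apply the same treatment to From, To, and Subject before display.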

Decoding Headers

The actual movement of e-mail between systems is accomplished through SMTP: the "Simple Mail Transfer Protocol." In this chapter we will analyze SMTP in depth.

Simple Mail Transfer Protocol (SMTP)

The role of SMTP in message submission, where the user presses "Send" and expects a message to go winging its way across the Internet, will probably be least confusing if we trace how users have historically worked with Internet mail. The key concept to understand as we begin this history is that users have never been asked to sit around and wait for an e-mail message to actually be delivered. This process can often take quite a bit of time, and up to several dozen repeated attempts, before an e-mail message is actually delivered to its destination. Any number of things could cause delays: a message could have to wait because other messages are already being transmitted across a link of limited bandwidth; the destination server might be down for a few hours, or its network might not be currently accessible because of a glitch; and if the mail is destined for a large organization, then it might have to make several different "hops" as it arrives at the big university server, then is directed to a smaller college e-mail machine, and then finally is directed to a departmental e-mail server.

E-mail browsing and submission, therefore, become a black box: your browser interacts with a web API, and on the other end, you will see plain old SMTP connections originating from and going to the large organization as mail is delivered in each direction. But in the world of webmail, client protocols are removed from the equation, taking us back to the old days of pure server-to-server unauthenticated SMTP.

E-mail Clients, Webmail Services

The foregoing narrative has hopefully helped you structure your thinking about Internet e-mail protocols, and realize how they fit together in the bigger picture of getting messages to and from users. But the subject of this chapter is a narrower one: the Simple Mail Transfer Protocol in particular. And we should start by stating the basics:

SMTP is a TCP/IP-based protocol.

Connections can be authenticated, or not.

Connections can be encrypted, or not.

Most e-mail connections across the Internet these days seem to lack any attempt at encryption, which means that whoever owns the Internet backbone routers is theoretically in a position to read simply staggering amounts of other people's mail.

What are the two ways that SMTP is used? First, SMTP can be used for e-mail submission between a client e-mail program like Thunderbird or Outlook, claiming that a user wants to send e-mail, and a server at an organization that has given that user an e-mail address. These connections generally use authentication, so that spammers cannot connect and send millions of messages on a user's behalf without his or her password. Once the message is received, the server puts it in a queue for delivery (and often makes its first attempt at sending it moments later), and the client can forget about the message and presume the server will keep trying to deliver it.

Second, SMTP is used between Internet mail servers as they move e-mail from its origin to its destination. This typically involves no authentication; after all, big organizations like Google, Yahoo!, and Microsoft do not know the passwords of each other's users, so when Yahoo! receives an e-mail from Google claiming that it was sent from an @gmail.com user, Yahoo! just has to believe them (or not: sometimes organizations blacklist each other if too much spam is making it through their servers, as happened to a friend of mine the other day when Hotmail stopped accepting his client's newsletters from GoDaddy's servers because of alleged problems with spam).

So, typically, no authentication takes place between servers talking SMTP to each other, and even encryption against snooping routers seems to be used only rarely. Because of the problem of spammers connecting to e-mail servers and claiming to be delivering mail from another organization's users, there has been an attempt made to lock down who can send e-mail on an organization's behalf. Though controversial, some e-mail servers consult the Sender Policy Framework (SPF), defined in RFC 4408, to see whether the server they are talking to really has the authority to deliver the e-mails it is transmitting. But the SPF and other anti-spam technologies are unfortunately beyond the scope of this book, which must limit itself to the question of using the basic protocols themselves from Python. So we now turn to the more technical question of how you will actually use SMTP from your Python programs.

How SMTP Is Used

Successfully sending e-mail generally requires a queue where a message can sit for seconds, minutes, or days until it can be successfully transmitted toward its destination. So you typically do not want your programs using Python's smtplib to send mail directly to a message's destination, because if your first transmission attempt fails, then you will be stuck with the job of writing a full "mail transfer agent" (MTA), as the RFCs call an e-mail server, and giving it a full standards-compliant retry queue. This is not only a big job, but also one that has already been done well several times, and you will be wise to take advantage of one of the existing MTAs (look at postfix, exim, and qmail) before trying to write something of your own.

So only rarely will you be making SMTP connections out into the world from Python. More usually, your system administrator will tell you one of two things:

That you should make an authenticated SMTP connection to an existing e-mail server, using a username and password that will belong to your application and give it permission to use the e-mail server to queue outgoing messages.

That you should run a local binary on the system, like the sendmail program, that the system administrator has already gone to the trouble to configure so that local programs can send mail.

Sending E-Mail

Python's built-in SMTP implementation is in the Python Standard Library module smtplib, which makes it easy to do simple tasks with SMTP.

In the examples that follow, the programs are designed to take several command-line arguments: the name of an SMTP server, a sender address, and one or more recipient addresses. Please use them cautiously; name only an SMTP server that you yourself run or that you know will be happy receiving your test messages, lest you wind up getting an IP address banned for sending spam! If you don't know where to find an SMTP server, you might try running a mail daemon like postfix or exim locally and then pointing these example programs at localhost. Many UNIX, Linux, and Mac OS X systems have an SMTP server like one of these already listening for connections from the local machine.

Otherwise, consult your network administrator or Internet provider to obtain a proper hostname and port. Note that you usually cannot just pick a mail server at random; many store or forward mail only from certain authorized clients. So, take a look at simple.py for a very simple SMTP program:

import sys, smtplib

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the simple.py program.
""" % (', '.join(toaddrs), fromaddr)

s = smtplib.SMTP(server)
s.sendmail(fromaddr, toaddrs, message)

print "Message successfully sent to %d recipient(s)" % len(toaddrs)


It starts by generating a simple message from the user's command-line arguments. Then it creates an smtplib.SMTP object that connects to the specified server. Finally, all that's required is a call to sendmail(). If that returns successfully, then you know that the message was sent.

Introducing the SMTP Library

When you run the program, it will look like this:

root@erlerobot:~/Python_files#[email protected]@example.com
Message successfully sent to 2 recipient(s)

Thanks to the hard work that the authors of the Python Standard Library have put into the sendmail() method, it might be the only SMTP call you ever need.
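For comparison, here is a rough Python 3 sketch of the same idea. It builds the message with EmailMessage instead of string formatting and hands it to send_message(), a convenience wrapper around sendmail(). The function names build_message and send are made up, and nothing is sent unless you call send() with a server you are allowed to use.

```python
import smtplib
from email.message import EmailMessage


def build_message(fromaddr, toaddrs, body):
    """Assemble a plain-text message with the usual headers."""
    msg = EmailMessage()
    msg['To'] = ', '.join(toaddrs)
    msg['From'] = fromaddr
    msg['Subject'] = 'Test Message from simple.py'
    msg.set_content(body)
    return msg


def send(server, fromaddr, toaddrs, body):
    """Connect to `server` and submit the message; returns refused recipients."""
    msg = build_message(fromaddr, toaddrs, body)
    with smtplib.SMTP(server) as s:       # QUIT is sent when the block exits
        return s.send_message(msg, fromaddr, toaddrs)


# Building the message needs no network access:
example = build_message('sender@example.com',
                        ['first@example.com', 'second@example.com'],
                        'Hello,\n\nThis is a test message.\n')
print(example['To'])
```

Passing the envelope addresses explicitly keeps the distinction between the envelope and the To header visible, just as in the listing above.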


There are several different exceptions that might be raised while you're programming with smtplib. They are:

socket.gaierror for errors looking up address information.

socket.error for general I/O and communication problems.

socket.herror for other addressing errors.

smtplib.SMTPException or a subclass of it for SMTP conversation problems.
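In Python 3 these exceptions are easier to catch, because socket.error is simply an alias for OSError and the other socket exceptions are subclasses of it. The sketch below is an assumption-laden illustration: the helper name try_send and the 10-second timeout are arbitrary choices, and the function only touches the network when called.

```python
import smtplib
import socket


def try_send(server, fromaddr, toaddrs, message, timeout=10):
    """Return None on success, or the exception that stopped delivery."""
    try:
        with smtplib.SMTP(server, timeout=timeout) as s:
            s.sendmail(fromaddr, toaddrs, message)
    except smtplib.SMTPException as e:
        return e      # the SMTP conversation itself went wrong
    except OSError as e:
        return e      # covers socket.gaierror, socket.herror, socket.error
    return None


# The class relationships that make the two except clauses sufficient:
print(socket.error is OSError)
print(issubclass(socket.gaierror, OSError))
print(issubclass(socket.herror, OSError))
```

Catching SMTPException first lets a caller distinguish protocol-level refusals from lower-level network failures.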

The smtplib module also provides a way to get a series of detailed messages about the steps it takes to send an e-mail. To enable that level of detail, you can call smtpobj.set_debuglevel(1). With this option, you should be able to track down any problems. Take a look at debug.py for an example program that provides basic error handling and debugging.

import sys, smtplib, socket

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the debug.py program.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    s.set_debuglevel(1)
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)

This program looks similar to the last one. However, the output will be very different.

root@erlerobot:~/Python_files#[email protected]@complete.org
send: 'ehlo localhost\r\n'
reply: '250-localhost\r\n'
reply: '250-PIPELINING\r\n'
reply: '250-SIZE 20480000\r\n'
reply: '250-VRFY\r\n'
reply: '250-ETRN\r\n'
reply: '250-STARTTLS\r\n'
...
Message successfully sent to 1 recipient(s)

From this example, you can see the conversation that smtplib is having with the SMTP server over the network. Let's look at what's happening: First, the client (the smtplib library) sends an EHLO command (an "extended" successor to a more ancient command that was named, more readably, HELO) with your hostname in it. The remote server responds with its hostname, and lists any optional SMTP features that it supports. Next, the client sends the mail from command, which states the "envelope sender" e-mail address and the size of the message. The server at this moment has the opportunity to reject the message (for example, because it thinks you are a spammer); but in this case, it responds with 250 Ok. (Note that in this case, the code 250 is what matters; the remaining text is just a human-readable comment and varies from server to server.) Then the client sends a rcpt to command, with the "envelope recipient" that we talked so much about earlier in this chapter; you can finally see that, indeed, it is transmitted separately from the text of the message itself when using the SMTP protocol. If you were sending the message to more than one recipient, a separate rcpt to command would be sent for each of them. Finally, the client sends a data command, transmits the actual message (using verbose carriage return-line feed line endings, you will note, per the Internet e-mail standard), and finishes the conversation.

Error Handling and Conversation Debugging

The smtplib module is doing all this automatically for you in this example. In the rest of the chapter, we will look at how to take more control of the process so you can take advantage of some more advanced features.
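The reply lines in the debug output all share one shape: a three-digit code, then either a hyphen (more lines follow) or a space (last line), then free text. The following sketch shows how such a line could be picked apart; smtplib already does this internally, so the helper names parse_reply_line and is_success exist here only for illustration.

```python
def parse_reply_line(line):
    """Split one SMTP reply line into (code, is_last_line, text)."""
    line = line.rstrip('\r\n')
    code = int(line[:3])
    # A hyphen after the code announces a continuation line;
    # a space (or nothing at all) marks the final line of the reply.
    is_last = line[3:4] != '-'
    return code, is_last, line[4:]


def is_success(code):
    """Codes in the 2xx range mean the command was accepted."""
    return 200 <= code <= 299


print(parse_reply_line('250-SIZE 20480000\r\n'))
print(parse_reply_line('250 Ok\r\n'))
print(is_success(250), is_success(550))
```

Only the numeric code should drive program logic; the text after it is meant for humans and differs between servers.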


Sometimes it is nice to know about what kind of messages a remote SMTP server will accept. For instance, most SMTP servers have a limit on what size message they permit, and if you fail to check first, then you may transmit a very large message only to have it rejected when you have completed transmission.

Some servers do not support ESMTP. On those servers, EHLO will just return an error. In that case, you must send a HELO command instead. In the previous examples, we used sendmail() immediately after creating our SMTP object, so smtplib had to send its own "hello" message to the server. But if it sees you attempt to send the EHLO or HELO command on your own, then sendmail() will no longer attempt to send these commands itself. ehlo.py shows a program that gets the maximum size from the server, and returns an error before sending if a message would be too large.

import sys, smtplib, socket

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the ehlo.py program.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    code = s.ehlo()[0]
    uses_esmtp = (200 <= code <= 299)
    if not uses_esmtp:
        code = s.helo()[0]
        if not (200 <= code <= 299):
            print "Remote server refused HELO; code:", code
            sys.exit(1)

    if uses_esmtp and s.has_extn('size'):
        print "Maximum message size is", s.esmtp_features['size']
        if len(message) > int(s.esmtp_features['size']):
            print "Message too large; aborting."
            sys.exit(1)

    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)

If you run this program, and the remote server provides its maximum message size, then the program will display the size on your screen and verify that its message does not exceed that size before sending. Here is what running this program might look like:

root@erlerobot:~/Python_files#[email protected]@complete.org
Maximum message size is 10240000
Message successfully sent to 1 recipient(s)

Take a look at the part of the code that verifies the result from a call to ehlo() or helo(). Those two functions return a tuple; the first item in the tuple is a numeric result code from the remote SMTP server.
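The size check in ehlo.py converts the advertised value with int() directly, which would fail if a server announced SIZE without a number. A slightly more defensive sketch is shown below; it works on any dictionary shaped like the esmtp_features attribute, and the helper names size_limit and fits are made up. It also treats a value of 0 as "no fixed limit", which is how the SIZE extension (RFC 1870) defines it.

```python
def size_limit(esmtp_features):
    """Return the advertised SIZE limit in bytes, or None if there is none."""
    value = esmtp_features.get('size', '').strip()
    if not value.isdigit() or int(value) == 0:
        return None     # not advertised, no number given, or "no fixed limit"
    return int(value)


def fits(message_bytes, esmtp_features):
    """True if the message is within the server's limit (or no limit is known)."""
    limit = size_limit(esmtp_features)
    return limit is None or len(message_bytes) <= limit


features = {'size': '10240000', 'starttls': ''}   # shaped like s.esmtp_features
print(size_limit(features))
print(fits(b'x' * 100, features))
print(fits(b'x' * 100, {'size': '50'}))
```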

Getting Information from EHLO

E-mails sent in plain text over SMTP can be read by anyone with access to an Internet gateway or router across which the packets happen to pass. The best solution to this problem is to encrypt each e-mail with a public key whose private key is possessed only by the person to whom you are sending the e-mail; there are freely available systems such as PGP and GPG for doing exactly this. But regardless of whether the messages themselves are protected, individual SMTP conversations between particular pairs of machines can be encrypted and authenticated using a method known as SSL/TLS.

The general procedure for using TLS in SMTP is as follows:

1. Create the SMTP object, as usual.
2. Send the EHLO command. If the remote server does not support EHLO, then it will not support TLS.
3. Check s.has_extn() to see if starttls is present. If not, then the remote server does not support TLS and the message can only be sent normally, in the clear.
4. Call starttls() to initiate the encrypted channel.
5. Call ehlo() a second time; this time, it's encrypted.
6. Finally, send your message.

The first question you have to ask yourself when working with TLS is whether you should return an error if TLS is not available. Depending on your application, you might want to raise an error for any of the following:

There is no support for TLS on the remote side.

The remote side fails to establish a TLS session properly.

The remote server presents a certificate that cannot be validated.

tls.py acts as a TLS-capable general-purpose client. It will connect to a server and use TLS if it can; otherwise, it will fall back and send the message as usual. (But it will die with an error if the attempt to start TLS fails while talking to an ostensibly capable server.)

import sys, smtplib, socket

if len(sys.argv) < 4:
    print "Syntax: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the tls.py program
in Foundations of Python Network Programming.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    # Try EHLO first; fall back to HELO if the server rejects it.
    code = s.ehlo()[0]
    uses_esmtp = (200 <= code <= 299)
    if not uses_esmtp:
        code = s.helo()[0]
        if not (200 <= code <= 299):
            print "Remote server refused HELO; code:", code
            sys.exit(1)

    if uses_esmtp and s.has_extn('starttls'):
        print "Negotiating TLS...."
        s.starttls()
        # EHLO must be repeated once the channel is encrypted.
        code = s.ehlo()[0]
        if not (200 <= code <= 299):
            print "Couldn't EHLO after STARTTLS"
            sys.exit(5)
        print "Using TLS connection."
    else:
        print "Server does not support TLS; using normal connection."
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)

Using Secure Sockets Layer and Transport Layer Security

If you run this program and give it a server that understands TLS, the output will look like this:

root@erlerobot:~/Python_files# [email protected]@complete.org
Negotiating TLS....
Using TLS connection.
Message successfully sent to 1 recipient(s)

Notice that the call to sendmail() in these last few listings is the same, regardless of whether TLS is used.


We now reach the topic of Authenticated SMTP, where your ISP, university, or company e-mail server needs you to log in with a username and password to prove that you are not a spammer before it allows you to send e-mail.

For maximum security, TLS should be used in conjunction with authentication; otherwise your password (and username, for that matter) will be visible to anyone observing the connection. The proper way to do this is to establish the TLS connection first, and then send your authentication information only over the encrypted communications channel.

But using authentication itself is simple; smtplib provides a login() function that takes a username and a password. login.py shows an example. To avoid repeating code already shown in previous listings, this listing does not take the advice of the previous paragraph, and sends the username and password over an unencrypted connection, which exposes them in the clear.

import sys, smtplib, socket
from getpass import getpass

if len(sys.argv) < 4:
    print "Syntax: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the login.py program
in Foundations of Python Network Programming.
""" % (', '.join(toaddrs), fromaddr)

sys.stdout.write("Enter username: ")
username = sys.stdin.readline().strip()
password = getpass("Enter password: ")

try:
    s = smtplib.SMTP(server)
    try:
        s.login(username, password)
    except smtplib.SMTPException, e:
        print "Authentication failed:", e
        sys.exit(1)
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)

You can run this program just like the previous examples. If you run it with a server that does support authentication, you will be prompted for a username and password. If they are accepted, then the program will proceed to transmit your message.

Authenticated SMTP


The Post Office Protocol (POP) is a simple protocol that is used to download e-mail from a mail server, and is typically used through an e-mail client like Thunderbird or Outlook. POP does not support multiple mailboxes on the remote side, nor does it provide any reliable, persistent message identification. This means that you cannot use POP as a protocol for mail synchronization. The Python Standard Library provides the poplib module, which offers a convenient interface for using POP. In this chapter, you will learn how to use poplib to connect to a POP server, gather summary information about a mailbox, download messages, and delete the originals from the server.

Post Office Protocol (POP)


POP supports several authentication methods. The two most common are basic username-password authentication, and APOP, which is an optional extension to POP that helps protect passwords from being sent in plain text if you are using an ancient POP server that does not support SSL.

The process of connecting and authenticating to a remote server looks like this in Python:

1. Create a POP3_SSL or just a plain POP3 object, and pass the remote hostname and port to it.
2. Call user() and pass_() to send the username and password. Note the underscore in pass_(). It is present because pass is a keyword in Python and cannot be used for a method name.
3. If the exception poplib.error_proto is raised, it means that the login has failed and the string value of the exception contains the error explanation sent by the server.

The choice between POP3 and POP3_SSL is governed by whether your e-mail provider offers (or, in this day and age, even requires) that you connect over an encrypted connection.

popconn.py uses the foregoing steps to log in to a remote POP server. Once connected, it calls stat(), which returns a simple tuple giving the number of messages in the mailbox and the messages’ total size. Finally, the program calls quit(), which closes the POP connection.

import getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    sys.exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)  # or "POP3" if SSL is not supported
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    # stat() returns (message count, total size in bytes).
    status = p.stat()
    print "You have %d messages totaling %d bytes" % status
finally:
    p.quit()

You can test this program if you have a POP account somewhere. The program will then prompt you for your password. Finally, it will display the mailbox status, without touching or altering any of your mail.

When POP servers do not support SSL to protect your connection from snooping, they sometimes at least support an alternate authentication protocol called APOP, which uses a challenge-response scheme to assure that your password is not sent in the clear. (But all of your e-mail will still be visible to any third party watching the packets go by.) The Python Standard Library makes this very easy to attempt: just call the apop() method, then fall back to basic authentication if the POP server you are talking to does not understand. To use APOP but fall back to plain authentication, you could use a stanza like the one shown below inside your POP program (like popconn.py).

print "Attempting APOP authentication..."
try:
    p.apop(user, passwd)
except poplib.error_proto:
    print "Attempting standard authentication..."
    try:
        p.user(user)
        p.pass_(passwd)
    except poplib.error_proto, e:
        print "Login failed:", e
        sys.exit(1)

Connecting and Authenticating
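To make the challenge-response idea behind APOP concrete, here is a rough Python 3 sketch of the digest a client computes: an MD5 hash of the timestamp token from the server greeting followed by the shared password. The helper name apop_digest, the greeting token, and the password are all made up for illustration; poplib's apop() method performs this step for you.

```python
import hashlib


def apop_digest(timestamp, password):
    """Return the hex MD5 digest that APOP sends in place of the password.

    `timestamp` is the <...> token from the server greeting, angle
    brackets included; both arguments are bytes.
    """
    return hashlib.md5(timestamp + password).hexdigest()


# Made-up greeting token and password, for illustration only.
digest = apop_digest(b"<1234.5678@pop.example.com>", b"secret")
print(len(digest))  # 32 hex characters; the password itself never travels
```

Because the server issues a fresh timestamp for every connection, a captured digest cannot simply be replayed later.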


The preceding example showed you stat(), which returns the number of messages in the mailbox and their total size. Another useful POP command is list(), which returns more detailed information about each message. The most interesting part is the message number, which is required to retrieve messages later. Note that there may be gaps in message numbers: a mailbox may, for example, contain message numbers 1, 2, 5, 6, and 9. Also, the number assigned to a particular message may be different on each connection you make to the POP server. mailbox.py shows how to use the list() command to display information about each message.

import getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    sys.exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    response, listings, octet_count = p.list()
    # Each listing is a string of the form "<number> <size>".
    for listing in listings:
        number, size = listing.split()
        print "Message %s has %s bytes" % (number, size)
finally:
    p.quit()

The list() function returns a tuple containing three items; you should generally pay attention to the second item. Here is its raw output for one of my POP mailboxes at the moment, which has three messages in it:

('+OK 3 messages (5675 bytes)', ['1 2395', '2 1626', '3 1654'], 24)

The three strings inside the second item give the message number and size for each of the three messages in my in-box.
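If you would rather work with numbers than strings, the second item can be turned into a dictionary. This is a small sketch in Python 3 syntax; parse_listings is a hypothetical helper name, and the sample data is the listing shown above. Under Python 3, poplib returns each entry as bytes, which int() also accepts.

```python
def parse_listings(listings):
    """Map message number -> size in bytes from the second item of list()."""
    sizes = {}
    for listing in listings:
        number, size = listing.split()
        sizes[int(number)] = int(size)
    return sizes


sizes = parse_listings(['1 2395', '2 1626', '3 1654'])
print(sizes)                # {1: 2395, 2: 1626, 3: 1654}
print(sum(sizes.values()))  # 5675, matching the server's "+OK" line
```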

Obtaining Mailbox Information


You should now be getting the hang of POP: when using poplib you get to issue small atomic commands that always return a tuple inside which are various strings and lists of strings showing you the result. We are now ready to actually manipulate messages! The three relevant methods, which all identify messages using the same integer identifiers that are returned by list(), are these:

retr(num): This method downloads a single message and returns a tuple containing a result code and the message itself, delivered as a list of lines. This will cause most POP servers to set the “seen” flag for the message to “true,” barring you from ever seeing it from POP again (unless you have another way into your mailbox that lets you set messages back to “Unread”).

top(num, body_lines): This method returns its result in the same format as retr() without marking the message as “seen.” But instead of returning the whole message, it just returns the headers plus however many lines of the body you ask for in body_lines. This is useful for previewing messages if you want to let the user decide which ones to download.

dele(num): This method marks the message for deletion from the POP server, to take place when you quit this POP session. Typically you would do this only if the user directly requests irrevocable destruction of the message, or if you have stored the message to disk and used something like fsync() to assure the data’s safety.
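The "store to disk, fsync(), then delete" advice might look like the following sketch, written in Python 3 syntax (where poplib delivers the message lines as bytes). The function name save_then_delete and the path argument are hypothetical; retr() and dele() are the poplib methods described above.

```python
import os


def save_then_delete(p, num, path):
    """Write message `num` to `path`, force it to disk, then mark it deleted."""
    response, lines, octets = p.retr(num)
    with open(path, 'wb') as f:
        f.write(b'\r\n'.join(lines) + b'\r\n')
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    # Only reached if the write above succeeded without an exception.
    p.dele(num)
```

The deletion still takes effect only when quit() is called, so a crash between dele() and quit() leaves the message on the server.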

To put everything together, take a look at download-and-delete.py, which is a fairly functional e-mail client that speaks POP. It checks your in-box to determine how many messages there are and to learn what their numbers are; then it uses top() to offer a preview of each one; and, at the user’s option, it can retrieve the whole message, and can also delete it from the mailbox.

import email, getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    sys.exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    response, listings, octets = p.list()
    for listing in listings:
        number, size = listing.split()
        print 'Message', number, '(size is', size, 'bytes):'
        print
        # top() with zero body lines fetches only the headers.
        response, lines, octets = p.top(number, 0)
        message = email.message_from_string('\n'.join(lines))
        for header in 'From', 'To', 'Subject', 'Date':
            if header in message:
                print header + ':', message[header]
        print
        print 'Read this message [ny]?'
        answer = raw_input()
        if answer.lower().startswith('y'):
            response, lines, octets = p.retr(number)
            message = email.message_from_string('\n'.join(lines))
            print '-' * 72
            for part in message.walk():
                if part.get_content_type() == 'text/plain':
                    print part.get_payload()
                    print '-' * 72
        print
        print 'Delete this message [ny]?'
        answer = raw_input()
        if answer.lower().startswith('y'):
            p.dele(number)
            print 'Deleted.'
finally:
    p.quit()

Downloading and Deleting Messages

If you run this program, you’ll see output similar to this:

root@erlerobot:~/Python_files# python download-and-delete.py pop.gmail.com my_gmail_acct
Message 1 (size is 1847 bytes):

From: [email protected]
To: Brandon Rhodes <[email protected]>
Subject: Backup complete
Date: Tue, 13 Apr 2010 16:56:43 -0700 (PDT)

Read this message [ny]?
n

Delete this message [ny]?
y
Deleted.


Like POP, IMAP is a way that a laptop or desktop computer can connect to a larger Internet server to view and manipulate a user’s e-mail. Whereas the capabilities of POP are rather anemic, the IMAP protocol offers such a full array of capabilities that many users store their e-mail permanently on the server, keeping it safe from a laptop or desktop hard drive crash.

This chapter will teach just the basics, with a focus on how to best connect from Python.

Internet Message Access Protocol (IMAP)


The Python Standard Library contains an IMAP client interface named imaplib, which does offer rudimentary access to the protocol. Unfortunately, it limits itself to knowing how to send requests and deliver their responses back to your code. It makes no attempt to actually implement the detailed rules in the IMAP specification for parsing the returned data.

As an example of how values returned from imaplib are usually too raw to be usefully used in a program, take a look at open_imaplib.py. It is a simple script that uses imaplib to connect to an IMAP account, list the “capabilities” that the server advertises, and then display the status code and data returned by the LIST command.

import getpass, imaplib, sys

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

m = imaplib.IMAP4_SSL(hostname)
m.login(username, getpass.getpass())
print 'Capabilities:', m.capabilities
print 'Listing mailboxes '
# list() returns the raw status string and unparsed response lines.
status, data = m.list()
print 'Status:', repr(status)
print 'Data:'
for datum in data:
    print repr(datum)
m.logout()

If you run this script with appropriate arguments, it will start by asking for your password; IMAP authentication is almost always accomplished through a username and password:

root@erlerobot:~/Python_files# [email protected]
Password:

If your password is correct, it will then print out a response that looks something like the result shown below:

Capabilities: ('IMAP4REV1', 'UNSELECT', 'IDLE', 'NAMESPACE', 'QUOTA',
 'XLIST', 'CHILDREN', 'XYZZY', 'SASL-IR', 'AUTH=XOAUTH')
Listing mailboxes
Status: 'OK'
Data:
'(\\HasNoChildren) "/" "INBOX"'
'(\\HasNoChildren) "/" "Personal"'
'(\\HasNoChildren) "/" "Receipts"'
'(\\HasNoChildren) "/" "Travel"'
'(\\HasNoChildren) "/" "Work"'
'(\\Noselect \\HasChildren) "/" "[Gmail]"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/All Mail"'
'(\\HasNoChildren) "/" "[Gmail]/Drafts"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/Sent Mail"'
'(\\HasNoChildren) "/" "[Gmail]/Spam"'
'(\\HasNoChildren) "/" "[Gmail]/Starred"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/Trash"'

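There are two main problems: first, we have to check the returned status code manually; and second, imaplib gives us no help in interpreting the results, which arrive as raw strings that our own code would have to parse.

So unless you want to implement several details of the protocol yourself, you will want a more capable IMAP client library.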
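To give a feel for what "implementing the details yourself" involves, here is a deliberately simplified Python 3 sketch that parses one raw LIST line of the form shown above. The name parse_list_line and the regular expression are illustrative assumptions; the pattern handles only quoted delimiters and quoted folder names, and a real parser would also need to cope with unquoted names, NIL delimiters, and literals.

```python
import re

# Matches lines such as: (\HasNoChildren) "/" "INBOX"
LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "(?P<name>[^"]*)"')


def parse_list_line(line):
    """Split one raw LIST response line into (flags, delimiter, name)."""
    m = LIST_LINE.match(line)
    if m is None:
        raise ValueError('unrecognized LIST response: %r' % (line,))
    return m.group('flags').split(), m.group('delim'), m.group('name')


print(parse_list_line('(\\HasNoChildren) "/" "INBOX"'))
print(parse_list_line('(\\Noselect \\HasChildren) "/" "[Gmail]"'))
```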

Understanding IMAP in Python


Fortunately, a popular and battle-tested IMAP library for Python does exist, and is available for easy installation from the Python Package Index. The IMAPClient package is written by a friendly Python programmer named Menno Smits, and in fact uses the Standard Library imaplib module behind the scenes. Once installed, you can use the python interpreter in the virtual environment to run the program shown in open_imap.py.

import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

print 'Capabilities:', c.capabilities()
print 'Listing mailboxes:'
data = c.list_folders()
for flags, delimiter, folder_name in data:
    print '%-30s %s %s' % (' '.join(flags), delimiter, folder_name)
c.logout()

You can see immediately from the code that more details of the protocol exchange are now being handled on our behalf. For example, we no longer get a status code back that we have to check every time we run a command; instead, the library is doing that check for us and will raise an exception to stop us in our tracks if anything goes wrong. Second, you can see that each result from the LIST command (which in this library is offered as the list_folders() method instead of the list() method offered by imaplib) has already been parsed into Python data types for us. Each line of data comes back as a tuple giving us the folder flags, folder name delimiter, and folder name, and the flags themselves are a sequence of strings. Here is what the output of this second script looks like:

Capabilities: ('IMAP4REV1', 'UNSELECT', 'IDLE', 'NAMESPACE', 'QUOTA', 'XLIST', 'CHILDREN',
 'XYZZY', 'SASL-IR', 'AUTH=XOAUTH')
Listing mailboxes:
\HasNoChildren                 / INBOX
\HasNoChildren                 / Personal
\HasNoChildren                 / Receipts
\HasNoChildren                 / Travel
\HasNoChildren                 / Work
\Noselect \HasChildren         / [Gmail]
\HasChildren \HasNoChildren    / [Gmail]/All Mail
\HasNoChildren                 / [Gmail]/Drafts
\HasChildren \HasNoChildren    / [Gmail]/Sent Mail
\HasNoChildren                 / [Gmail]/Spam
\HasNoChildren                 / [Gmail]/Starred
\HasChildren \HasNoChildren    / [Gmail]/Trash

The standard flags listed for each folder may be zero or more of the following:

\Noinferiors: This means that the folder does not contain any sub-folders and that it is not possible for it to contain sub-folders in the future. Your IMAP client will receive an error if it tries to create a sub-folder under this folder.

\Noselect: This means that it is not possible to run select_folder() on this folder; that is, this folder does not and cannot contain any messages. (Perhaps it exists just to allow sub-folders beneath it, as one possibility.)

\Marked: This means that the server considers this box to be interesting in some way; generally, this indicates that new messages have been delivered since the last time the folder was selected. However, the absence of \Marked does not guarantee that the folder does not contain new messages; some servers simply do not implement \Marked at all.

\Unmarked: This guarantees that the folder doesn’t contain new messages.

IMAPClient


IMAP provides two different ways to refer to a specific message within a folder: by a temporary message number (which typically goes 1, 2, 3, and so forth) or by a UID (unique identifier). The difference between the two lies with persistence. Message numbers are assigned right when you select the folder. This means they can be pretty and sequential, but it also means that if you revisit the same folder later, then a given message may have a different number. For programs such as live mail readers or simple download scripts, this behavior (which is the same as POP) is fine; you do not need the numbers to stay the same. But a UID, by contrast, is designed to remain the same even if you close your connection to the server and do not reconnect again for another week. If a message had UID 1053 today, then the same message will have UID 1053 tomorrow, and no other message in that folder will ever have UID 1053. If you are writing a synchronization tool, this behavior is quite useful! It will allow you to verify with certainty that actions are being taken against the correct message. This is one of the things that make IMAP so much more fun than POP.

Most IMAP commands that work with specific messages can take either message numbers or UIDs. Normally, IMAPClient always uses UIDs and ignores the temporary message numbers assigned by IMAP. But if you want to see the temporary numbers instead, simply instantiate IMAPClient with a use_uid=False argument; you can even set the value of the class’s use_uid attribute to False and True on the fly during your IMAP session.

Message Numbers vs. UIDs


When you first select a folder, the IMAP server provides some summary information about it: about the folder itself and also about its messages. The summary is returned by IMAPClient as a dictionary. Here are the keys that most IMAP servers will return when you run select_folder():

EXISTS: An integer giving the number of messages in the folder.

FLAGS: A list of the flags that can be set on messages in this folder.

RECENT: Specifies the server’s approximation of the number of messages that have appeared in the folder since the last time an IMAP client ran select_folder() on it.

PERMANENTFLAGS: Specifies the list of flags that the client can change permanently; a \* entry indicates that the client may also create custom keyword flags.

UIDNEXT: The server’s guess about the UID that will be assigned to the next incoming (or uploaded) message.

UIDVALIDITY: A string that can be used by clients to verify that the UID numbering has not changed; if you come back to a folder and this is a different value than the last time you connected, then the UID numbering has started over and your stored UID values are no longer valid.

UNSEEN: Specifies the message number of the first unseen message (one without the \Seen flag) in the folder.

Of these keys, servers are only required to return FLAGS, EXISTS, and RECENT, though most will include at least UIDVALIDITY as well.

folder_info.py shows an example program that reads and displays the summary information of my INBOX mail folder:

import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)
else:
    # select_folder() returns the summary dictionary described above.
    select_dict = c.select_folder('INBOX', readonly=True)
    for k, v in select_dict.items():
        print '%s: %r' % (k, v)
    c.logout()

When run, this program displays results such as this:

```
root@erlerobot:~/Python_files# [email protected]:
EXISTS: 3
PERMANENTFLAGS: ('\Answered', '\Flagged', '\Draft', '\Deleted', '\Seen', '\*')
READ-WRITE: True
UIDNEXT: 2626
FLAGS: ('\Answered', '\Flagged', '\Draft', '\Deleted', '\Seen')
UIDVALIDITY: 1
RECENT: 0
```

Summary Information


That shows that my INBOX folder contains three messages, none of which have arrived since I last checked. If your program is interested in using UIDs that it stored during previous sessions, remember to compare the UIDVALIDITY to a stored value from a previous session.
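That comparison might be sketched as follows in Python 3 syntax. The helper name uids_still_valid is hypothetical, and how you persist the stored value is up to your program. The sketch looks the key up both as text and as bytes, on the assumption that the key type can differ between IMAPClient versions.

```python
def uids_still_valid(stored_uidvalidity, select_dict):
    """Return True if UIDs saved in an earlier session can still be trusted."""
    # Assumption: the key may be 'UIDVALIDITY' or b'UIDVALIDITY'.
    current = select_dict.get('UIDVALIDITY',
                              select_dict.get(b'UIDVALIDITY'))
    if current is None or stored_uidvalidity is None:
        return False  # nothing to compare against: treat stored UIDs as stale
    return int(current) == int(stored_uidvalidity)


print(uids_still_valid(1, {'EXISTS': 3, 'UIDVALIDITY': 1}))  # True
print(uids_still_valid(1, {'EXISTS': 3, 'UIDVALIDITY': 2}))  # False
```

When the function returns False, a synchronization tool should discard its cached UIDs for that folder and rebuild them from scratch.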


With IMAP, the FETCH command is used to download mail, which IMAPClient exposes as its fetch() method. The simplest way to fetch involves downloading all messages at once, in a single big gulp. While this is simplest and requires the least network traffic (since you do not have to issue repeated commands and receive multiple responses), it does mean that all of the returned messages will need to sit in memory together as your program examines them. For very large mailboxes whose messages have lots of attachments, this is obviously not practical.

mailbox_summary.py downloads all of the messages from a folder into your computer’s memory in a Python data structure, and then displays a bit of summary information about each one.

import email, getpass, sys
from imapclient import IMAPClient

try:
    hostname, username, foldername = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username folder' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

c.select_folder(foldername, readonly=True)
# BODY.PEEK[] fetches whole messages without setting the \Seen flag.
msgdict = c.fetch('1:*', ['BODY.PEEK[]'])
for message_id, message in msgdict.items():
    e = email.message_from_string(message['BODY[]'])
    print message_id, e['From']
    payload = e.get_payload()
    if isinstance(payload, list):
        part_content_types = [part.get_content_type() for part in payload]
        print '  Parts:', ' '.join(part_content_types)
    else:
        print '  ', ' '.join(payload[:60].split()), '...'
c.logout()

Remember that IMAP is stateful: first we use select_folder() to put us “inside” the given folder, and then we can run fetch() to ask for message content. The range '1:*' means “the first message through the end of the mail folder,” because message IDs (whether temporary or UIDs) are always positive integers.
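Ranges other than '1:*' follow the same notation: single IDs and start:end runs, separated by commas. As a rough Python 3 sketch, a hypothetical helper named message_set could compress a list of integer IDs into such a string, for example to fetch a handful of messages rather than the whole folder.

```python
def message_set(ids):
    """Compress integers into an IMAP-style set string such as '1:3,5,7:9'."""
    ids = sorted(set(ids))
    if not ids:
        return ''
    runs = []
    start = prev = ids[0]
    for n in ids[1:]:
        if n == prev + 1:
            prev = n  # still inside the current consecutive run
            continue
        runs.append((start, prev))
        start = prev = n
    runs.append((start, prev))
    return ','.join(str(a) if a == b else '%d:%d' % (a, b) for a, b in runs)


print(message_set([1, 2, 3, 5, 7, 8, 9]))  # 1:3,5,7:9
```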

Here is what it looks like to run this script:

root@erlerobot:~/Python_files# python mailbox_summary.py imap.example.com brandon INBOX
Password:
2590 "Amazon.com" <[email protected]>
   Dear Brandon, Portable Power Systems, Inc. shipped the follo ...
2469 Meetup Reminder <[email protected]>
  Parts: text/plain text/html
[email protected]
   Thank you. Please note that charges will appear as "Linode.c ...

Downloading an Entire Mailbox


E-mail messages can be quite large, and so can mail folders: many mail systems permit users to have hundreds or thousands of messages, each of which can be 10MB or more. That kind of mailbox can easily exceed the RAM on the client machine if its contents are all downloaded at once, as in the previous example. To help network-based mail clients that do not want to keep local copies of every message, IMAP supports several operations besides the big “fetch the whole message” command that we saw in the previous section.

An e-mail’s headers can be downloaded as a block of text, separately from the message.

Particular headers from a message can be requested and returned.

The server can be asked to recursively explore and return an outline of the MIME structure of a message.

The text of particular sections of the message can be returned.

This allows IMAP clients to perform very efficient queries that download only the information they need to display for the user, decreasing the load on the IMAP server and the network, and allowing results to be displayed more quickly to the user. For an example of how a simple IMAP client works, examine simple_client.py, which puts together a number of ideas about browsing an IMAP account. Hopefully this provides more context than would be possible if these features were spread out over a half-dozen shorter program listings at this point in the chapter. You can see that the client consists of three concentric loops that each take input from the user as he or she views the list of mail folders, then the list of messages within a particular mail folder, and finally the sections of a specific message.

import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

banner = '-' * 72

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

def display_structure(structure, parentparts=[]):
    """Attractively display a given message structure."""

    # The whole body of the message is named 'TEXT'.
    if parentparts:
        name = '.'.join(parentparts)
    else:
        print 'HEADER'
        name = 'TEXT'

    # Print this part's designation and its MIME type.
    is_multipart = isinstance(structure[0], list)
    if is_multipart:
        parttype = 'multipart/%s' % structure[1].lower()
    else:
        parttype = ('%s/%s' % structure[:2]).lower()
    print '%-9s' % name, parttype,

    # For a multipart part, print all of its subordinate parts; for
    # other parts, print their disposition (if available).
    if is_multipart:
        print
        subparts = structure[0]
        for i in range(len(subparts)):
            display_structure(subparts[i], parentparts + [str(i + 1)])
    else:
        if structure[6]:
            print 'size=%s' % structure[6],
        if structure[8]:
            disposition, namevalues = structure[8]
            print disposition,
            for i in range(0, len(namevalues), 2):
                print '%s=%r' % tuple(namevalues[i:i+2])
        print

def explore_message(c, uid):
    """Let the user view various parts of a given message."""

    msgdict = c.fetch(uid, ['BODYSTRUCTURE', 'FLAGS'])

    while True:
        print
        print 'Flags:',
        flaglist = msgdict[uid]['FLAGS']
        if flaglist:
            print ' '.join(flaglist)
        else:
            print 'none'
        display_structure(msgdict[uid]['BODYSTRUCTURE'])
        print
        reply = raw_input('Message %s - type a part name, or "q" to quit: '
                          % uid).strip()
        print
        if reply.lower().startswith('q'):
            break
        key = 'BODY[%s]' % reply
        try:
            msgdict2 = c.fetch(uid, [key])
        except c._imap.error:
            print 'Error - cannot fetch section %r' % reply
        else:
            content = msgdict2[uid][key]
            if content:
                print banner
                print content.strip()
                print banner
            else:
                print '(No such section)'

def explore_folder(c, name):
    """List the messages in folder `name` and let the user choose one."""

    while True:
        c.select_folder(name, readonly=True)
        msgdict = c.fetch('1:*', ['BODY.PEEK[HEADER.FIELDS (FROM SUBJECT)]',
                                  'FLAGS', 'INTERNALDATE', 'RFC822.SIZE'])
        print
        for uid in sorted(msgdict):
            items = msgdict[uid]
            print '%6d  %20s  %6d bytes  %s' % (
                uid, items['INTERNALDATE'], items['RFC822.SIZE'],
                ' '.join(items['FLAGS']))
            for i in items['BODY[HEADER.FIELDS (FROM SUBJECT)]'].splitlines():
                print ' ' * 6, i.strip()

        reply = raw_input('Folder %s - type a message UID, or "q" to quit: '
                          % name).strip()
        if reply.lower().startswith('q'):
            break
        try:
            reply = int(reply)
        except ValueError:
            print 'Please type an integer or "q" to quit'
        else:
            if reply in msgdict:
                explore_message(c, reply)

    c.close_folder()

def explore_account(c):
    """Display the folders in this IMAP account and let the user choose one."""

    while True:
        print
        folderflags = {}
        data = c.list_folders()
        for flags, delimiter, name in data:
            folderflags[name] = flags
        for name in sorted(folderflags.keys()):
            print '%-30s %s' % (name, ' '.join(folderflags[name]))
        print

        reply = raw_input('Type a folder name, or "q" to quit: ').strip()
        if reply.lower().startswith('q'):
            break
        if reply in folderflags:
            explore_folder(c, reply)
        else:
            print 'Error: no folder named', repr(reply)

if __name__ == '__main__':
    explore_account(c)

Downloading Messages Individually

You can see that the outer function uses a simple list_folders() call to present the user with a list of his or her mail folders, like some of the program listings we have seen already. Each folder’s IMAP flags are also displayed. This lets the program give the user a choice between folders:

INBOX                          \HasNoChildren
Receipts                       \HasNoChildren
Travel                         \HasNoChildren
Work                           \HasNoChildren

Type a folder name, or "q" to quit:

Once a user has selected a folder, things become more interesting: a summary has to be printed for each message. Note that the program is careful to use BODY.PEEK instead of BODY to fetch these items, since the IMAP server would otherwise mark the messages as \Seen merely because they had been displayed in a summary.

The results of this fetch() call are printed to the screen once an e-mail folder has been selected:

  2703   2010-09-28 21:32:13   19129 bytes  \Seen
       From: Brandon Craig Rhodes
       Subject: Digested Articles
  2704   2010-09-28 23:03:45   15354 bytes
       Subject: Re: [venv] Building a virtual environment for offline testing
       From: "W. Craig Trader"
  2705   2010-09-29 08:11:38   10694 bytes
       Subject: Re: [venv] Building a virtual environment for offline testing
       From: Hugo Lopes Tavares
Folder INBOX - type a message UID, or "q" to quit:

As you can see, the fact that several items of interest can be supplied to the IMAP fetch() command lets us build fairly sophisticated message summaries with only a single round-trip to the server. One final note about the fetch() command: it lets you not only pull just the parts of a message that you need at any given moment, but also truncate them in case they are quite long and you just want to provide an excerpt from the beginning to tantalize the user.


You might have noticed, while trying out simple_client.py or reading its example output just shown, that IMAP marks messages with attributes called “flags,” which typically take the form of a backslash-prefixed word, like \Seen for one of the messages just cited. Several of these are standard, and are defined in RFC 3501 for use on all IMAP servers. Here is what the most important ones mean:

\Answered: The user has replied to the message.

\Draft: The user has not finished composing the message.

\Flagged: The message has somehow been singled out specially; the purpose and meaning of this flag vary between mail readers.

\Recent: No IMAP client has seen this message before. This flag is unique, in that the flag cannot be added or removed by normal commands; it is automatically removed after the mailbox is selected.

\Seen: The message has been read.

The IMAPClient library supports several methods for working with flags. The simplest retrieves the flags as though you had done a fetch() asking for 'FLAGS', but goes ahead and removes the dictionary around each answer:

>>> c.get_flags(2703)
{2703: ('\\Seen',)}

There are also calls to add and remove flags from a message:

c.remove_flags(2703, ['\\Seen'])
c.add_flags(2703, ['\\Answered'])

In case you want to completely change the set of flags for a particular message without figuring out the correct series of adds and removes, you can use set_flags() to unilaterally replace the whole list of message flags with a new one:

c.set_flags(2703, ['\\Seen', '\\Answered'])

Any of these operations can take a list of message UIDs instead of the single UID shown in these examples.

One last interesting use of flags is that it is how IMAP supports message deletion. The process, for safety, takes two steps: first the client marks one or more messages with the \Deleted flag; then it calls expunge() to perform the deletions as a single operation. The IMAPClient library does not make you do this by hand, however (though that would work); instead it hides the fact that flags are involved behind a simple delete_messages() routine that marks the messages for you. It still has to be followed by expunge() if you actually want the operation to take effect, though:

c.delete_messages([2703, 2704])
c.expunge()
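The two-step nature of deletion can be illustrated without a server. The FakeMailbox class below is a made-up, in-memory stand-in written in Python 3 syntax; it is not part of IMAPClient and only mimics the add_flags() and expunge() calls named above, to show that marking alone removes nothing.

```python
class FakeMailbox:
    """Hypothetical in-memory stand-in for an IMAP folder (not IMAPClient)."""

    def __init__(self, uids):
        self.flags = {uid: set() for uid in uids}

    def add_flags(self, uids, flags):
        for uid in uids:
            self.flags[uid].update(flags)

    def expunge(self):
        # Permanently drop every message carrying the \Deleted flag.
        doomed = [u for u, f in self.flags.items() if '\\Deleted' in f]
        for uid in doomed:
            del self.flags[uid]


box = FakeMailbox([2703, 2704, 2705])
box.add_flags([2703, 2704], ['\\Deleted'])
print(sorted(box.flags))  # [2703, 2704, 2705]: marked, but still present
box.expunge()
print(sorted(box.flags))  # [2705]
```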

Flagging and Deleting Messages

Flagging

Deleting Messages


Searching is another issue that is very important for a protocol designed to let you keep all your mail on the mail server itself: without search, an e-mail client would have to download all of a user’s mail anyway the first time he or she wanted to perform a full-text search to find an e-mail message. The essence of search is simple: you call the search() method on an IMAP client instance, and are returned the UIDs (assuming, of course, that you accept the IMAPClient default of use_uid=True for your client) of the messages that match your criteria:

>>> c.select_folder('INBOX')
>>> c.search('SINCE 20-Aug-2010 TEXT Apress')
[2590L, 2652L, 2653L, 2654L, 2655L, 2699L]

There are many criteria that you can combine in order to form a query. Like the rest of IMAP, they are specified in RFC 3501. Some criteria are quite simple, and refer to binary attributes like flags:

ALL: Every message in the mailbox

UID (id, ...): Messages with the given UIDs

LARGER n: Messages more than n octets in length

SMALLER m: Messages less than m octets in length

ANSWERED: Have the flag \Answered

DELETED: Have the flag \Deleted

DRAFT: Have the flag \Draft

FLAGGED: Have the flag \Flagged

KEYWORD flag: Have the given keyword flag set

NEW: Have the flag \Recent but lack the flag \Seen

OLD: Lack the flag \Recent

UNANSWERED: Lack the flag \Answered

UNDELETED: Lack the flag \Deleted

UNDRAFT: Lack the flag \Draft

UNFLAGGED: Lack the flag \Flagged

UNKEYWORD flag: Lack the given keyword flag

UNSEEN: Lack the flag \Seen

There are two sets of criteria for dates, depending on which date you want to query by: the date in the message’s Date header (the sent date), or the date at which the message arrived at the IMAP server (the internal date).

Finally,therearetwosearchoperationsthatrefertothetextofthemessageitself—thesearethebigworkhorsesthatsupportfull-textsearchofthekindyourusersareprobablyexpectingwhentheytypeintoasearchfieldinane-mailclient:

BODYstring:Themessagebodymustcontainthestring.

TEXTstring:Theentiremessage,eitherbodyorheader,mustcontainthestringsomewhere.
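The date criteria expect dates written like 20-Aug-2010. As a minimal sketch (written for Python 3, and not part of IMAPClient), the helper names imap_date() and since_with_text() below are hypothetical; they only show how a criteria string like the one in the search() example could be assembled from a Python date:

```python
import datetime

# Month abbreviations are spelled out so the result does not depend on the
# process locale, as strftime('%b') would.
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def imap_date(day):
    """Format a date in the day-month-year form used by IMAP searches."""
    return '%d-%s-%04d' % (day.day, MONTHS[day.month - 1], day.year)

def since_with_text(day, text):
    """Hypothetical helper: combine a SINCE and a TEXT criterion.

    The text is assumed to be a single word; a phrase containing spaces
    would need to be quoted for the server.
    """
    return 'SINCE %s TEXT %s' % (imap_date(day), text)

criteria = since_with_text(datetime.date(2010, 8, 20), 'Apress')
print(criteria)
```

The resulting string could then be handed to search() in the same way as the literal string shown earlier.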

Creating or deleting folders is done quite simply in IMAP, by providing the name of the folder:

c.create_folder('Personal')
c.delete_folder('Work')

Some IMAP servers or configurations may not permit these operations, or may have restrictions on naming; be sure to have error checking in place when calling them.

There are two operations that can create new e-mail messages in your IMAP account besides the “normal” means of waiting for people to send them to you. First, you can copy an existing message from its home folder over into another folder. Start by using select_folder() to visit the folder where the messages live, and then run the copy method like this:

c.select_folder('INBOX')
c.copy([2653L, 2654L], 'TODO')

Searching and Manipulating Messages

Manipulating Folders and Messages

Finally, it is possible to add a message to a mailbox with IMAP. You do not need to send the message first with SMTP; IMAP is all that is needed. Adding a message is a simple process, though there are a couple of things to be aware of.

You must be careful about how you change the line endings, because some messages may use '\r\n' somewhere inside despite using only '\n' for the first few dozen lines, and IMAP clients have been known to fail if a message mixes different line endings! The solution is a simple one, thanks to Python’s powerful splitlines() string method that recognizes all three possible line endings; simply call the method on your message and then re-join the lines with the standard line ending:

>>> 'one\rtwo\nthree\r\nfour'.splitlines()
['one', 'two', 'three', 'four']
>>> '\r\n'.join('one\rtwo\nthree\r\nfour'.splitlines())
'one\r\ntwo\r\nthree\r\nfour'
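The same idea can be wrapped in a small function. This is only a sketch for Python 3, where the message is assumed to be held as bytes; the name normalize_line_endings() is made up for illustration, and ending the result with a final '\r\n' is a design choice of this sketch rather than something the text requires:

```python
def normalize_line_endings(raw_message):
    """Return the message with every line terminated by CRLF.

    On a bytes object, splitlines() breaks on '\r', '\n', and '\r\n',
    the three endings discussed above.
    """
    lines = raw_message.splitlines()
    return b'\r\n'.join(lines) + b'\r\n'

mixed = b'one\rtwo\nthree\r\nfour'
cleaned = normalize_line_endings(mixed)
print(cleaned)
```

Working on bytes avoids a subtlety of Python 3 text strings, whose splitlines() also breaks on several additional separator characters.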

The actual act of appending a message, once you have the line endings correct, is to call the append() method on your IMAP client:

c.append('INBOX', my_message)

You can also supply a list of flags as a keyword argument, as well as a msg_time to be used as its arrival time by passing a normal Python datetime object.

The “command line” is the topic of this chapter: how you can access it over the network, together with enough discussion about its typical behavior to get you through any frustrations you might encounter while trying to use it.

Telnet and SSH

Before getting into the details of how the command line works, and how you can access it over the network, we should pause and note that there exist many systems today for automating the entire process. If you have dozens or hundreds of machines to maintain and you need to start sending them all the same commands, then you might find that tools already exist: tools that provide ways to write command scripts, push them out for execution across a cloud of machines, batch up any error messages or responses for your review, and even save commands in a queue to be re-tried later in case a machine is down and cannot be reached at the moment.

What are the options? First, the Fabric library is very popular with Python programmers who need to run commands and copy files to remote server machines. As you can see in fabfile.py, a Fabric script calls very simple functions with names like put(), cd(), and run() to perform operations on the machines to which it connects. You can learn more about it at its web site: http://fabfile.org/. Although fabfile.py is designed to be run by Fabric's own fab command-line tool, Fabric can also be used from inside your own Python programs; again, consult its documentation for details.

from fabric.api import *

def versions():
    with cd('/usr/bin'):
        with settings(hide('warnings'), warn_only=True):
            for version in '2.4', '2.5', '2.6', '2.7', '3.0', '3.1':
                result = run('python%s -c "None"' % version)
                if not result.failed:
                    print "Host", env.host, "has Python", version

Another project to check out is Silver Lining. It is still very immature, but if you are an experienced programmer who needs its specific capabilities, then you might find that it solves your problems well. This library goes beyond batching commands across many different servers: it will actually create and initialize Ubuntu servers through the “libcloud” Python API, and then install your Python web applications there for you. You can learn more about this promising project at http://cloudsilverlining.org/.

On the other hand, there is “pexpect.” While it is not, technically, a program that itself knows how to use the network, it is often used to control the system “ssh” or “telnet” command when a Python programmer wants to automate interactions with a remote prompt of some kind. This typically takes place in a situation where no API for a device is available, and commands simply have to be typed each time the command-line prompt appears. Configuring simple network hardware often requires this kind of clunky step-by-step interaction. You can learn more about “pexpect” here: http://pypi.python.org/pypi/pexpect.

Finally, there are more specific projects that provide mechanisms for remote systems administration. Red Hat and Fedora users might look at func, which uses an SSL-encrypted XML-RPC service that lets you write Python programs that perform system configuration and maintenance: https://fedorahosted.org/func/.

Command-Line Automation

If you have ever typed many commands at a Unix command prompt, you will be aware that not every character you type is interpreted literally. Consider this command, for example:

root@erlerobot:~# echo *
Hello.txt Python-3.4.1 Python-3.4.1.tgz Python_files build gmapenv hola.txt otro text.txt virtualenv-1.11.6 virtualenv-1.11.6.tar.gz
root@erlerobot:~#

The asterisk * in this command was not interpreted to mean “print out an asterisk character to the screen”; instead, the shell thought I was trying to write a pattern that would match all of the file names in the current directory. To actually print out an asterisk, I have to use another special character (an “escape” character, because it lets me “escape” from the shell's normal meaning) to tell it that I just mean the asterisk literally:

root@erlerobot:~# echo Here is a lone asterisk: \*
Here is a lone asterisk: *
root@erlerobot:~# echo And here are '*' two "*" more asterisks
And here are * two * more asterisks
root@erlerobot:~#

The rules by which modern shells interpret the special characters in your command line have become quite complex. Fortunately, to use the command line effectively, you just have to understand two points:

Special characters are interpreted as special by the shell you are using, like bash.

When passing commands to a shell either locally or across the network, you need to escape the special characters you use so that they are not expanded into unintended values on the remote system.

Command-Line Expansion and Quoting

Like many very useful statements, the bold claim of the title of this section is, alas, a lie. There is, in fact, a character that Unix considers special. But, in general, Unix has no special characters, and this is a very important fact for you to grasp.

The shell's special characters cut both ways: on the one hand, they make it very easy to, say, name all of the files in the current directory as arguments to a command; but on the other hand, they can make it very difficult to echo a message to the screen that mixes single quotes and double quotes.

The simple lesson of this section is that the whole set of conventions to which you are accustomed has nothing to do with your operating system; they are simply and entirely a behavior of the bash shell, or of whichever of the other popular (or arcane) shells you are using. It does not matter how familiar the rules seem, or how difficult it is for you to imagine using a Unix-like system without them. If you take bash away, they are simply not there. You can observe this quite simply by taking control of the operating system's process launcher yourself and trying to throw some special characters at a familiar command:

>>> import subprocess
>>> args = ['echo', 'Sometimes an', '*', 'just means an', '*']
>>> subprocess.call(args)
Sometimes an * just means an *

Here, we are bypassing all of the shell applications that are available for interpreting commands, and we are telling the operating system to start a new process using precisely the list of arguments we have provided. And the process (the echo command, in this case) is getting exactly those characters, instead of having the * turned into a list of file names first.

Though we rarely think about it, the most common “special” character is one we use all the time: the space character. Rather than assume that you actually mean each space character to be passed to the command you are invoking, the shell instead interprets it as the delimiter separating the actual text you want the command to see. This causes endless entertainment when people include spaces in Unix file names, and then try to move the file somewhere else:

root@erlerobot:~# mv Smith Contract.txt ~/Documents
mv: cannot stat `Smith': No such file or directory
mv: cannot stat `Contract.txt': No such file or directory

To make the shell understand that you are talking about one file with a space in its name, not two files, you have to contrive something like one of these possible command lines:

root@erlerobot:~# mv Smith\ Contract.txt ~/Documents
root@erlerobot:~# mv "Smith Contract.txt" ~/Documents
root@erlerobot:~# mv Smith*Contract.txt ~/Documents

That last possibility obviously means something quite different, since it will match any file name that happens to start with Smith and end with Contract.txt, regardless of whether the text between them is a simple space character or some much longer sequence of text. Still, I have seen many people type it in frustration who are still learning shell conventions and cannot remember how to type a literal space character for the shell.

If you want to convince yourself that none of the characters that the bash shell has taught you to be careful about is special, shell.py shows a simple shell, written in Python, that treats only the space as special but passes everything else through literally to the command.

import subprocess

while True:
    args = raw_input('] ').split()
    if not args:
        pass
    elif args == ['exit']:
        break
    elif args[0] == 'show':
        print "Arguments:", args[1:]
    else:
        subprocess.call(args)

Unix Has No Special Characters

Running this file results in:

root@erlerobot:~# python shell.py
] echo Hi there!
Hi there!
] echo An asterisk * is not special.
An asterisk * is not special.
] echo The string $HOST is not special, nor are "double quotes".
The string $HOST is not special, nor are "double quotes".
] echo What? No *<>!$ special characters?
What? No *<>!$ special characters?
] show "The 'show' built-in lists its arguments."
Arguments: ['"The', "'show'", 'built-in', 'lists', 'its', 'arguments."']
] exit

You can see here absolute evidence that Unix commands (in this case, the /bin/echo command that we are calling over and over again) do not generally attempt to interpret their arguments as anything other than strings. The echo command happily accepts double quotes, dollar signs, and asterisks, and treats them all as literal characters. As the foregoing show command illustrates, Python is simply reducing our arguments to a list of strings for the operating system to use in creating a new process. What if we fail to split our command into separate arguments?

>>> import subprocess
>>> subprocess.call(['echo hello'])
Traceback (most recent call last):
  ...
OSError: [Errno 2] No such file or directory

The operating system does not know that spaces should be special; that is a quirk of shell programs, not of Unix-like operating systems themselves! So the system thinks that it is being asked to run a command literally named echo[space]hello, and, unless you have created such a file in the current directory, it fails to find it and raises an exception.
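The point that arguments reach a new process untouched when no shell is involved can be checked from Python itself. The following is a small sketch for Python 3; the helper name child_sees() is invented for this illustration, and the child is simply another Python interpreter that reports the argument list it was given:

```python
import subprocess
import sys

# The child program only prints the arguments it was started with.
CHILD = 'import sys; print(sys.argv[1:])'

def child_sees(arguments):
    """Start a child process without a shell and return what it reports."""
    result = subprocess.run([sys.executable, '-c', CHILD] + list(arguments),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Characters that a shell would expand or strip arrive unchanged.
report = child_sees(['*', '$HOME', 'two words', '"quoted"'])
print(report)
```

Because the arguments are passed as a list, the asterisk is not expanded, the dollar sign is not treated as a variable reference, and the space inside 'two words' does not split the argument.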

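The one character that really is special to Unix is the null character, which the operating system uses to mark the end of each command-line argument in memory; an argument therefore can never contain one. To prevent you from making this mistake, Python stops you in your tracks if you include a null character in a command-line argument:

>>> import subprocess
>>> subprocess.call(['echo', 'Sentences can end\0 abruptly.'])
Traceback (most recent call last):
  ...
TypeError: execv() arg 2 must contain only strings

Since every command on the system is designed to live within this limitation, you will generally find there is never any reason to put null characters into command-line arguments anyway.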

In the foregoing section, we used routines in Python's subprocess module to directly invoke commands. (The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.) This was great, and let us pass characters that would have been special to a normal interactive shell. If you have a big list of file names with spaces and other special characters in them, it can be wonderful to simply pass them into a subprocess call and have the command on the receiving end understand you perfectly.

But when you are using remote-shell protocols over the network (which, you will recall, is the subject of this chapter!), you are generally going to be talking to a shell like bash instead of getting to invoke commands directly like you do through the subprocess module. This means that remote-shell protocols will feel more like the system() routine from the os module, which does invoke a shell to interpret your command line, and therefore involves you in all of the complexities of the Unix command line:

>>> import os
>>> os.system('echo *')
Hello.txt Python-3.4.1 Python-3.4.1.tgz Python_files build gmapenv hola.txt otro text.txt virtualenv-1.11.6 virtualenv-

Of course, if the other end of a remote-shell connection is using some sort of shell with which you are unfamiliar, there is little that Python can do. The authors of the Standard Library have no idea how, say, a Motorola DSL router's Telnet-based command line might handle special characters, or even whether it pays attention to quotes at all. But if the other end of a network connection is a standard Unix shell of the sh family, like bash or zsh, then you are in luck: the fairly obscure Python pipes module, which is normally used to build complex shell command lines, contains a helper function that is perfect for escaping arguments. It is called quote, and can simply be passed a string:

>>> from pipes import quote
>>> print quote("filename")
filename
>>> print quote("file with spaces")
'file with spaces'
>>> print quote("file 'single quoted' inside!")
"file 'single quoted' inside!"
>>> print quote("danger!; rm -r *")
'danger!; rm -r *'
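In Python 3 the same helper is available as shlex.quote(), and the pipes module is no longer part of recent releases. The sketch below assumes a remote shell of the sh family; the function name remote_command_line() is made up for illustration. It quotes every argument, joins them with spaces, and then uses shlex.split() locally as a rough stand-in for how such a shell would split the line again:

```python
import shlex

def remote_command_line(arguments):
    """Quote each argument and join the results with single spaces."""
    return ' '.join(shlex.quote(argument) for argument in arguments)

arguments = ['mv', 'Smith Contract.txt', "it's here", 'danger!; rm -r *']
command_line = remote_command_line(arguments)
print(command_line)

# Splitting the line by shell-like rules should give back the same list.
round_trip = shlex.split(command_line)
print(round_trip == arguments)
```

The round trip is only a local sanity check; it does not prove how any particular remote shell will behave.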

So preparing a command line for remote execution generally just involves running quote() on each argument and then pasting the results together with spaces. Note that using a remote shell with Python does not involve you in the terrors of two levels of shell quoting! You will know those terrors if you have ever tried to build a remote SSH command line that uses fancy quoting by typing a local command line into your own shell. The attempt tends to generate a series of experiments like this:

$ echo $HOST
guinness
$ ssh asaph echo $HOST
guinness
$ ssh asaph echo \$HOST
asaph
$ ssh asaph echo \\$HOST
guinness
$ ssh asaph echo \\\$HOST
$HOST
$ ssh asaph echo \\\\$HOST
\guinness

Fortunately, using a remote-shell protocol through Python does not involve two levels of shell like this. Instead, you get to construct a literal string in Python that then directly becomes what is executed by the remote shell; no local shell is involved. So if using a shell-within-a-shell has you convinced that passing strings and file names safely to a remote shell is a very hard problem, relax: no local shell will be involved in our following examples.

Quoting Characters for Protection

You will probably talk to more programs than just the shell over your Python-powered remote-shell connection, of course. You will often want to watch the incoming data stream for the information and errors printed out by the commands you are running. And sometimes you will even want to send data back, either to provide the remote programs with input, or to respond to questions and prompts that they present.

When performing tasks like this, you might be surprised to find that programs hang indefinitely without ever finishing the output that you are waiting on, or that data you send seems to not be getting through. To help you through situations like this, a brief discussion of Unix terminals is in order.

A terminal typically names a device into which a user types text, and on whose screen the computer's response can be displayed. If a Unix machine has physical serial ports that could possibly host a physical terminal, then the device directory will contain entries like /dev/ttyS1 with which programs can send and receive strings to that device. But most terminals these days are, in reality, other programs: an xterm terminal, or a Gnome or KDE terminal program, or a PuTTY client on a Windows machine that has connected via a remote-shell protocol of the kind we will discuss.

But the programs running inside the terminal on your laptop or desktop machine still need to know that they are talking to a person; they still need to feel like they are talking through the mechanism of a terminal device connected to a display. So the Unix operating system provides a set of “pseudo-terminal” devices (which might have less confusingly been named “virtual” terminals) with names like /dev/tty42. When someone brings up an xterm or connects through SSH, the xterm or SSH daemon grabs a fresh pseudo-terminal, configures it, and runs the user's shell behind it. The shell examines its standard input, sees that it is a terminal, and presents a prompt since it believes itself to be talking to a person.

This is a crucial distinction to understand: the shell presents a prompt because, and only because, it thinks it is connected to a terminal! If you start up a shell and give it a standard input that is not a terminal (like, say, a pipe from another command) then no prompt will be printed, yet it will still respond to commands:

root@erlerobot:~# cat | bash
echo Here we are inside of bash, with no prompt!
Here we are inside of bash, with no prompt!
python
print 'Python has not printed a prompt, either.'
import sys
print 'Is this a terminal?', sys.stdin.isatty()

You can see that Python, also, does not print its usual startup banner, nor does it present any prompts.
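The isatty() test that the session above ends with can also be run from a script. In this Python 3 sketch (the helper name child_stdin_is_terminal() is invented here), a child interpreter is started with its standard input connected to a pipe, so it should report that it is not talking to a terminal:

```python
import subprocess
import sys

# The child reports whether its own standard input is a terminal.
CHILD = 'import sys; print(sys.stdin.isatty())'

def child_stdin_is_terminal():
    """Run the child with a pipe as stdin and return its answer as text."""
    result = subprocess.run([sys.executable, '-c', CHILD], input='',
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

answer = child_stdin_is_terminal()
print(answer)
```

Passing input='' makes subprocess create a pipe for the child's standard input, which is the same situation as the "cat | bash" experiment.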

There are even changes in how some commands format their output depending on whether they are talking to a terminal. Some commands with long lines of output (the ps command comes to mind) will truncate their lines to your terminal width if used interactively, but produce arbitrarily wide output if connected to a pipe or file. And, entertainingly enough, the familiar column-based output of the ls command gets turned off and replaced with a file name on each line (which is, you must admit, an easier format for reading by another program) if its output is a pipe or file:

root@erlerobot:~# ls
Hello.txt         Python_files  hola.txt  virtualenv-1.11.6
Python-3.4.1      build         otro      virtualenv-1.11.6.tar.gz
Python-3.4.1.tgz  gmapenv       text.txt
root@erlerobot:~# ls | cat
Hello.txt
Python-3.4.1
Python-3.4.1.tgz
Python_files
build
gmapenv
hola.txt
otro
text.txt
virtualenv-1.11.6
virtualenv-1.11.6.tar.gz
root@erlerobot:~#

Things Are Different in a Terminal

A program running behind Telnet, for example, always thinks it is talking to a terminal; so your scripts or programs must always expect to see a prompt each time the shell is ready for input, and so forth. But when you make a connection over the more sophisticated SSH protocol, you will actually have your choice of whether the program thinks that its input is a terminal or just a plain pipe or file. You can test this easily from the command line if there is another computer you can connect to:

root@erlerobot:~# ssh -t asaph
asaph$ echo "Here we are, at a prompt."
Here we are, at a prompt.

So when you spawn a command through a modern protocol like SSH, you need to consider whether you want the program on the remote end thinking that you are a person typing at it through a terminal, or whether it had best think it is talking to raw data coming in through a file or pipe.

Programs are not actually required to act any differently when talking to a terminal; it is just for our convenience that they vary their behavior:

Programs that are often used interactively will present a human-readable prompt when they are talking to a terminal. But when they think input is coming from a file, they avoid printing a prompt.

Sophisticated interactive programs, these days, usually turn on command-line editing when their input is a TTY.

Many programs read only one line of input at a time when listening to a terminal, because humans like to get an immediate response to every command they type. But when reading from a pipe or file, these same programs will wait until thousands of characters have arrived before they try to interpret their first batch of input.

It is even more common for programs to adjust their output based on whether they are talking to a terminal.

Both of the last two issues, which involve buffering, cause all sorts of problems when you take a process that you usually do manually and try to automate it. In doing so you often move from terminal input to input provided through a file or pipe, and suddenly you find that the programs behave quite differently, and might even seem to be hanging because “print” statements are not producing immediate output, but are instead saving up their results to push out all at once when their output buffer is full.

You can see this easily with a simple Python program (since Python is one of the applications that decides whether to buffer its output based on whether it is talking to a terminal) that prints a message, waits for a line of input, and then prints again:

root@erlerobot:~# python -c 'print "talk:"; s = raw_input(); print "you said", s'
talk:
hi
you said hi
root@erlerobot:~# python -c 'print "talk:"; s = raw_input(); print "you said", s' | cat
hi
talk:
you said hi

You can see that in the first instance, when Python knew its output was a terminal, it printed talk: immediately. But in the second instance, its output was a pipe to the cat command, and so it decided that it could save up the results of that first print statement and batch them together with the rest of the program's output, so that both lines of output appeared only once you had provided your input and the program was ending.

The foregoing problem is why many carefully written programs, both in Python and in other languages, frequently call flush() on their output to make sure that anything waiting in a buffer goes ahead and gets sent out, regardless of whether the output looks like a terminal. So those are the basic problems with terminals and buffering: programs change their behavior, often in idiosyncratic ways, when talking to a terminal (think again of the ls example), and they often start heavily buffering their output if they think they are writing to a file or pipe.
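The effect of an explicit flush can be sketched in Python 3 as follows. The names CHILD and talk_to_child() are invented for this example. The child is the same "talk" program as above, except that it flushes its first line; that is what allows the parent to read the prompt through a pipe before it has sent any input:

```python
import subprocess
import sys

# Same idea as the 'talk' one-liner, but the first print is flushed.
CHILD = 'print("talk:", flush=True); s = input(); print("you said", s)'

def talk_to_child():
    """Read the child's prompt, answer it, and return both output lines."""
    child = subprocess.Popen([sys.executable, '-c', CHILD],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             text=True)
    prompt = child.stdout.readline()   # available at once thanks to the flush
    child.stdin.write('hi\n')
    child.stdin.flush()                # the parent has to flush its side too
    reply = child.stdout.readline()
    child.stdin.close()
    child.wait()
    child.stdout.close()
    return prompt, reply

prompt, reply = talk_to_child()
print(repr(prompt), repr(reply))
```

Without the flush=True in the child, the first readline() in the parent would wait for output that is still sitting in the child's buffer, while the child waits for input, which is exactly the kind of hang described above.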


Beyond the program-specific behaviors just described, there are additional problems raised by terminals.

For example, what happens when you want a program to be reading your input one character at a time, but the Unix terminal device itself is buffering your keystrokes to deliver them as a whole line? This common problem happens because the Unix terminal defaults to “canonical” input processing, where it lets the user enter a whole line, and even edit it by backspacing and re-typing, before finally pressing “Enter” and letting the program see what he or she has typed. If you want to turn off canonical processing so that a program can see every individual character as it is typed, you can use the stty “Set TTY settings” command to disable it:

root@erlerobot:~# stty -icanon

Another problem is that Unix terminals traditionally supported a pair of keystrokes for pausing the output stream so that the user could read something on the screen before it scrolled off and was replaced by more text. Often these were the characters Ctrl+S for “Stop” and Ctrl+Q for “Keep going,” and it was a source of great annoyance that if binary data worked its way into an automated Telnet connection, the first Ctrl+S that happened to pass across the channel would pause the terminal and probably ruin the session. Again, this setting can be turned off with stty:

root@erlerobot:~# stty -ixon -ixoff

There are plenty of less famous settings that can also cause you grief. Because there are so many, and because they vary between Unix implementations, the stty command actually supports two modes, cooked and raw, that turn dozens of settings like icanon and ixon on and off together:

root@erlerobot:~# stty raw
root@erlerobot:~# stty cooked

In case you make your terminal settings a hopeless mess after some experimentation, most Unix systems provide a command for resetting the terminal back to reasonable, sane settings (you might need to hit Ctrl+J to submit the reset command, since your Return key, whose equivalent is Ctrl+M, actually only functions to submit commands because of a terminal setting called icrnl):

root@erlerobot:~# reset

If, instead of trying to get the terminal to behave across a Telnet or SSH session, you happen to be talking to a terminal from Python, check out the termios module that comes with the Standard Library. This module provides an interface to the POSIX calls for tty I/O control. For a complete description of these calls, see the POSIX or Unix manual pages.
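As a rough sketch of what "stty -icanon" corresponds to in the termios module (Unix only, Python 3), the code below clears the ICANON bit in the local-flags entry of a terminal attribute list. The function names are invented for this example; tcgetattr(), tcsetattr(), TCSANOW, and ICANON are the module's own names:

```python
import termios

LFLAG = 3  # position of the local-mode flags in a termios attribute list

def without_canonical_mode(attributes):
    """Return a copy of the attribute list with ICANON switched off."""
    updated = list(attributes)
    updated[LFLAG] &= ~termios.ICANON
    return updated

def disable_canonical_mode(fd):
    """Apply the change to a real terminal; fd must refer to a TTY.

    The previous attributes are returned so the caller can restore them
    later with termios.tcsetattr().
    """
    previous = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, without_canonical_mode(previous))
    return previous
```

A script would normally call disable_canonical_mode(sys.stdin.fileno()) only after checking sys.stdin.isatty(), and would restore the saved attributes before exiting.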

Terminals Do Buffering

Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte-oriented data connection over the Transmission Control Protocol (TCP).

Telnet is insecure: anyone watching your Telnet packets fly by will see your username, password, and everything you do on the remote system. It is clunky. And it has been completely abandoned for most systems administration.

In case you have to write a Python program that speaks Telnet to a device that still requires it, here are a few pointers on using the Python telnetlib. The telnetlib module provides a Telnet class that implements the Telnet protocol.

First, you have to realize that all Telnet does is establish a channel and send the things you type, and receive the things the remote system says, back and forth across that channel. This means that Telnet is ignorant of all sorts of things of which you might expect a remote-shell protocol to be aware.

For example, it is conventional that when you Telnet to a Unix machine, you are presented with a login: prompt at which you type your username, and a password: prompt where you enter your password. Telnet itself knows nothing about this convention; to the protocol, the prompts and your answers are just more text passing across the channel.

The fact that Telnet is ignorant about authentication has an important consequence: you cannot type anything on the command line itself to get yourself pre-authenticated to the remote system, nor avoid the login and password prompts that will pop up when you first connect! If you are going to use plain Telnet, you are going to have to somehow watch the incoming text for those two prompts (or however many the remote system supplies) and issue the correct replies.

Obviously, if systems vary in what username and password prompts they present, then you can hardly expect standardization in the error messages or responses that get sent back when your password fails. That is why Telnet is so hard to script and program from a language like Python and a library like telnetlib.

So if you are using Telnet, then you are playing a text game: you watch for text to arrive, and then try to reply with something intelligible to the remote system. To help you with this, the Python telnetlib provides not only basic methods for sending and receiving data, but also a few routines that will watch and wait for a particular string to arrive from the remote system.

telnet_login.py connects to localhost, which in this case is my Ubuntu laptop, where I have just run aptitude install telnetd so that a Telnet daemon is now listening on its standard port 23.

import telnetlib

t = telnetlib.Telnet('localhost')
# t.set_debuglevel(1)        # uncomment this for debugging messages

t.read_until('login:')
t.write('brandon\n')
t.read_until('assword:')     # let "P" be capitalized or not
t.write('mypass\n')
n, match, previous_text = t.expect([r'Login incorrect', r'\$'], 10)
if n == 0:
    print "Username and password failed - giving up"
else:
    t.write('exec uptime\n')
    print t.read_all()       # keep reading until the connection closes

If the script is successful, it shows you what the simple uptime command prints on the remote system:

root@erlerobot:~/Python_files# python telnet_login.py
10:24:43 up 5 days, 12:13, 14 users, load average: 1.44, 0.91, 0.73

Telnet

The listing shows you the general structure of a session powered by telnetlib. First, a connection is established, which is represented in Python by an instance of the Telnet object. Here only the hostname is specified, though you can also provide a port number to connect to some other service port than standard Telnet. You can call set_debuglevel(1) if you want your Telnet object to print out all of the strings that it sends and receives during the session. This actually turned out to be important for writing even the very simple script shown in the listing, because in two different cases it got hung up, and I had to re-run it with debugging messages turned on so that I could see the actual output and fix the script. I generally turn off debugging only once a program is working perfectly, and turn it back on whenever I want to do more work on the script.

Note that Telnet does not disguise the fact that its service is backed by a TCP socket, and will pass through to your program any socket.error and socket.gaierror exceptions that are raised. Once the Telnet session is established, interaction generally falls into a receive-and-send pattern, where you wait for a prompt or response from the remote end, then send your next piece of information. The listing illustrates two methods of waiting for text to arrive:

The very simple read_until() method watches for a literal string to arrive, then returns a string providing all of the text that it received from the moment it started listening until the moment it finally saw the string you were waiting for.

The more powerful and sophisticated expect() method takes a list of Python regular expressions. Once the text arriving from the remote end finally adds up to something that matches one of the regular expressions, expect() returns three items: the index in your list of the pattern that matched, the regular expression SRE_Match object itself, and the text that was received leading up to the matching text. For more information on what you can do with an SRE_Match, including finding the values of any sub-expressions in your pattern, read the Standard Library documentation for the re module.

If the script sees an error message because of an incorrect password (and does not get stuck waiting forever for a login or password prompt that never arrives or that looks different than it was expecting) then it exits:

root@erlerobot:~/Python_files# python telnet_login.py
Username and password failed - giving up

If you wind up writing a Python script that has to use Telnet, it will simply be a larger or more complicated version of the same simple pattern shown here. Both read_until() and expect() take an optional second argument named timeout that places a maximum limit on how long the call will watch for the text pattern before giving up and returning control to your Python script. If they quit and give up because of the timeout, they do not raise an error; instead, awkwardly enough, they just return the text they have seen so far, and leave it to you to figure out whether that text contains the pattern.

There are a few odds and ends in the Telnet object that we need not cover here. You will find them in the telnetlib Standard Library documentation, including an interact() method that lets the user “talk” directly over your Telnet connection using the terminal! This kind of call was very popular back in the old days, when you wanted to automate login but then take control and issue normal commands yourself.
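To make the "watch for text, with a timeout" pattern concrete, here is a sketch in Python 3 that works on a plain socket. It is not telnetlib's implementation and it ignores Telnet option bytes entirely; the function name read_until() is reused only for familiarity. Unlike the telnetlib behavior described above, it reports explicitly whether the marker was seen, and it returns everything received so far, including any bytes that arrived after the marker:

```python
import socket
import time

def read_until(sock, marker, timeout):
    """Receive until marker appears or timeout seconds have passed.

    Returns (found, data), where data is everything received so far.
    """
    deadline = time.monotonic() + timeout
    received = b''
    while marker not in received:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False, received
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return False, received
        if not chunk:              # the other end closed the connection
            return False, received
        received += chunk
    return True, received

# A connected socket pair stands in for a client and a remote login prompt.
client, server = socket.socketpair()
server.sendall(b'Ubuntu\r\nlogin: ')
found, text = read_until(client, b'login:', 1.0)
missing, leftover = read_until(client, b'assword:', 0.2)
client.close()
server.close()
print(found, missing)
```

Returning a separate found flag spares the caller from searching the returned text again to learn whether the wait ended in success or in a timeout.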

Normally, each time a Telnet server sends an option request, telnetlib flatly refuses to send or receive that option. But you can provide a Telnet object with your own callback function for processing options; a modest example is shown in telnet_codes.py. For most options, it simply re-implements the default telnetlib behavior and refuses to handle any options (and always remember to respond to each option one way or another; failing to do so will often hang the Telnet session as the server waits forever for your reply). But if the server expresses interest in the “terminal type” option, then this client sends back a reply of “mypython,” which the shell command it runs after logging in then sees as its $TERM environment variable.

from telnetlib import Telnet, IAC, DO, DONT, WILL, WONT, SB, SE, TTYPE

def process_option(tsocket, command, option):
    if command == DO and option == TTYPE:
        tsocket.sendall(IAC + WILL + TTYPE)
        print 'Sending terminal type "mypython"'
        tsocket.sendall(IAC + SB + TTYPE + '\0' + 'mypython' + IAC + SE)
    elif command in (DO, DONT):
        print 'Will not', ord(option)
        tsocket.sendall(IAC + WONT + option)
    elif command in (WILL, WONT):
        print 'Do not', ord(option)
        tsocket.sendall(IAC + DONT + option)

t = Telnet('localhost')
# t.set_debuglevel(1)        # uncomment this for debugging messages

t.set_option_negotiation_callback(process_option)
t.read_until('login:', 5)
t.write('brandon\n')
t.read_until('assword:', 5)  # so P can be capitalized or not
t.write('mypass\n')
n, match, previous_text = t.expect([r'Login incorrect', r'\$'], 10)
if n == 0:
    print "Username and password failed - giving up"
else:
    t.write('exec echo $TERM\n')
    print t.read_all()
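The replies that the callback sends are just short byte sequences. The sketch below (Python 3, independent of telnetlib) spells them out using the numeric codes that the Telnet protocol assigns to these commands; the function names refusal() and terminal_type_reply() are invented for illustration:

```python
# Telnet command codes and the terminal-type option number.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
SB, SE = 250, 240
TTYPE = 24

def refusal(command, option):
    """Build the reply that declines an option request."""
    if command in (DO, DONT):
        return bytes([IAC, WONT, option])    # "I will not do that"
    if command in (WILL, WONT):
        return bytes([IAC, DONT, option])    # "please do not do that"
    raise ValueError('not a negotiation command: %r' % (command,))

def terminal_type_reply(name):
    """Build the sub-negotiation that reports a terminal type name."""
    return bytes([IAC, SB, TTYPE, 0]) + name.encode('ascii') + bytes([IAC, SE])

print(refusal(DO, 1))
print(terminal_type_reply('mypython'))
```

Seeing the bytes laid out this way can help when reading the output of set_debuglevel(1) during a session that hangs in option negotiation.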


TheSSHprotocolisoneofthebest-knownexamplesofasecure,encryptedprotocolamongmodernsystemadministrators(HTTPSisprobablytheverybestknown).

SSHisdescendedfromanearlierprotocolthatsupported“remotelogin,”“remoteshell,”and“remotefilecopy”commandsnamedrlogin,rsh,andrcp,whichintheirtimetendedtobecomemuchmorepopularthanTelnetatsitesthatsupportedthem.Youcannotimaginewhatarevelationrcpwasparticular,unlessyouhavespenthourstryingtotransferafilebetweencomputersarmedwithonlyTelnetandascriptthattriestotypeyourpasswordforyou,onlytodiscoverthatyourfilecontainsabytethatlookslikeacontrolcharactertoTelnetortheremoteterminal,andhavethewholethinghanguntilyouaddalayerofescaping(orfigureouthowtodisableboththeTelnetescapekeyandallinterpretationtakingplaceontheremoteterminal).

Butthebestfeatureoftherloginfamilywasthattheydidnotjustechousernameandpasswordpromptswithoutactuallyknowingthemeaningofwhatwasgoingon.Instead,theystayedinvolvedthroughtheprocessofauthentication,andyoucouldevencreateafileinyourhomedirectorythattoldthem“whensomeonenamedbrandontriestoconnectfromtheasaphmachine,justlettheminwithoutapassword.”Suddenly,systemadministratorsandUnixusersalikereceivedbackhoursofeachmonththatwouldotherwisehavebeenspenttypingtheirpassword.Suddenly,youcouldcopytenfilesfromonemachinetoanothernearlyaseasilyasyoucouldhavecopiedthemintoalocalfolder.SSHhaspreservedallofthesegreatfeaturesoftheearlyremote-shellprotocol,whilebringingbulletproofsecurityandhardencryptionthatistrustedworldwideforadministeringcriticalservers.

AtSSH,wereachaprotocolsosophisticatedthatitactuallyimplementsitsownrulesformultiplexing,sothatseveral“channels”ofinformationcanallsharethesameSSHsocket.EveryblockofinformationSSHsendsacrossitssocketislabeledwitha“channel”identifiersothatseveralconversationscansharethesocket.Thereareatleasttworeasonssub-channelsmakesense.First,eventhoughthechannelIDtakesupabitofbandwidthforeverysingleblockofinformationtransmitted,theadditionaldataissmallcomparedtohowmuchextrainformationSSHhastotransmittonegotiateandmaintainencryptionanyway.Second,channelsmakesensebecausetherealexpenseofanSSHconnectionissettingitup.Hostkeynegotiationandauthenticationcantogethertakeupseveralsecondsofrealtime,andoncetheconnectionisestablished,youwanttobeabletouseitforasmanyoperationsaspossible.ThankstotheSSHnotionofachannel,youcanamortizethehighcostofconnectingbyperformingmanyoperationsbeforeyoulettheconnectionclose.Onceconnected,youcancreateseveralkindsofchannels:

An interactive shell session, like that supported by Telnet.

The individual execution of a single command.

A file-transfer session letting you browse the remote filesystem.

A port-forward that intercepts TCP connections.
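The idea of labeling every block with a channel identifier can be illustrated with a toy framing scheme. The sketch below is not the real SSH wire format: the 8-byte header (channel ID plus payload length) and the names `pack_block()` and `unpack_blocks()` are assumptions invented for this illustration.

```python
import struct

# Hypothetical header layout: 4-byte channel ID, 4-byte payload length
HEADER = struct.Struct('!II')

def pack_block(channel_id, payload):
    """Prefix a payload with its channel ID and length."""
    return HEADER.pack(channel_id, len(payload)) + payload

def unpack_blocks(stream):
    """Split packed blocks back into (channel_id, payload) pairs."""
    blocks = []
    offset = 0
    while offset < len(stream):
        channel_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        blocks.append((channel_id, stream[offset:offset + length]))
        offset += length
    return blocks

# Two conversations interleaved on one "socket" (here just a byte string)
wire = (pack_block(1, b'echo One') + pack_block(2, b'echo A') +
        pack_block(1, b'echo Two'))
per_channel = {}
for channel_id, payload in unpack_blocks(wire):
    per_channel.setdefault(channel_id, []).append(payload)
```

Because each block carries its own label, the receiver can sort the data back into separate conversations no matter how the sender interleaved them.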

SSH: The Secure Shell

An Overview of SSH

When an SSH client first connects to a remote host, they exchange temporary public keys that let them encrypt the rest of their conversation without revealing any information to any watching third parties. Then, before the client is willing to divulge any further information, it demands proof of the remote server's identity. This makes good sense as a first step: if you are really talking to an attacker who has temporarily managed to grab the remote server's IP address, you do not want SSH to divulge even your username, much less your password.

From the point of view of SSH, there are many problems with proving a server's identity through certificates signed by a central authority. While it is true that you can build a public-key infrastructure internal to an organization, where you distribute your own signing authority's certificates to your web browsers or other applications and then can sign your own server certificates without paying a third party, a public-key infrastructure is still considered too cumbersome a process for something like SSH; server administrators want to set up, use, and tear down servers all the time, without having to talk to a central authority first.

So SSH has the idea that each server, when installed, creates its own random public-private key pair that is not signed by anybody. Instead, one of two approaches is taken to key distribution:

A system administrator writes a script that gathers up all of the host public keys in an organization, creates an ssh_known_hosts file listing them all, and places this file in the /etc/ssh directory on every system in the organization. Now every SSH client will know about every SSH host key before it even connects for the first time.

Abandon the idea of knowing host keys ahead of time, and instead memorize them at the moment of first connection. Users of the SSH command line will be very familiar with this: the client says it does not recognize the host to which you are connecting, you reflexively answer “yes,” and its key gets stored in your ~/.ssh/known_hosts file. You actually have no guarantee that you are really talking to the host you think you are; but at least you will be guaranteed that every subsequent connection you ever make to that machine reaches the same machine as the first time, and not some other server that someone has swapped into place at the same IP address.
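The second approach, remembering a key at first contact and insisting on the same key afterwards, can be sketched in a few lines. This only illustrates the idea; it is not how the ssh command or paramiko implement it. The `check_host_key()` function, the in-memory dictionary, and the sample hostname and fingerprints are all hypothetical.

```python
# Hypothetical in-memory stand-in for the ~/.ssh/known_hosts file
known_hosts = {}

def check_host_key(hostname, fingerprint):
    """Return 'added' or 'ok'; raise ValueError if the key has changed."""
    stored = known_hosts.get(hostname)
    if stored is None:
        known_hosts[hostname] = fingerprint  # first contact: trust and remember
        return 'added'
    if stored != fingerprint:
        raise ValueError('host key for %s has changed' % hostname)
    return 'ok'

first = check_host_key('asaph.example.com', '85:8f:32:4e')
second = check_host_key('asaph.example.com', '85:8f:32:4e')
try:
    check_host_key('asaph.example.com', 'aa:bb:cc:dd')
    change_detected = False
except ValueError:
    change_detected = True
```

The first connection is accepted on faith, which is exactly the weakness described above; every later connection is checked against what was stored.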

The familiar prompt from the SSH command line when it sees an unfamiliar host looks like this:

root@erlerobot:~# ssh asaph.rhodesmill.org
The authenticity of host 'asaph.rhodesmill.org (74.207.234.78)' can't be established.
RSA key fingerprint is 85:8f:32:4e:ac:1f:e9:bc:35:58:c1:d4:25:e3:c7:8c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'asaph.rhodesmill.org,74.207.234.78' (RSA) to the list of known hosts.

That “yes” answer buried deep on the next-to-last full line is the answer that I typed giving SSH the go-ahead to make the connection and remember the key for next time.

The paramiko library has full support for all of the normal SSH tactics surrounding host keys. But its default behavior is rather spare: it loads no host-key files by default, and will then, of course, raise an exception for the very first host to which you connect because it will not be able to verify its key. The exception that it raises is a bit uninformative; it is only by looking at the fact that it comes from inside the missing_host_key() function that I usually recognize what has caused the error. (Before trying this, install the paramiko module from the Python Package Index.)

>>> import paramiko
>>> client = paramiko.SSHClient()
>>> client.connect('my.example.com', username='test')
Traceback (most recent call last):
  ...
  File ".../paramiko/client.py", line 85, in missing_host_key
    raise SSHException('Unknown server %s' % hostname)
paramiko.SSHException: Unknown server my.example.com

SSH Host Keys

To behave like the normal SSH command, load both the system and the current user's known-host keys before making the connection:

>>> client.load_system_host_keys()
>>> client.load_host_keys('/home/brandon/.ssh/known_hosts')
>>> client.connect('my.example.com', username='test')

The paramiko library also lets you choose how you handle unknown hosts. Once you have a client object created, you can provide it with a decision-making class that is asked what to do if a host key is not recognized. You can build these classes yourself by inheriting from the MissingHostKeyPolicy class:

>>> class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
...     def missing_host_key(self, client, hostname, key):
...         return
...
>>> client.set_missing_host_key_policy(AllowAnythingPolicy())
>>> client.connect('my.example.com', username='test')

Note that, through the arguments to the missing_host_key() method, you receive several pieces of information on which to base your decision; you could, for example, allow connections to machines on your own server subnet without a host key, but disallow all others.
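The subnet idea could be sketched with the standard ipaddress module. This is only one possible decision function, and it assumes you connect by IP address rather than by name; `TRUSTED_SUBNET` and `should_accept()` are hypothetical names, and the 10.0.0.0/24 range is a made-up example. Inside a MissingHostKeyPolicy subclass, missing_host_key() could simply return when `should_accept(hostname)` is true and raise an exception otherwise.

```python
import ipaddress

# Hypothetical server subnet whose machines are accepted without a known key
TRUSTED_SUBNET = ipaddress.ip_network(u'10.0.0.0/24')

def should_accept(hostname):
    """Accept an unknown host key only for IP addresses inside TRUSTED_SUBNET."""
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:
        return False  # not an IP address literal: refuse rather than guess
    return address in TRUSTED_SUBNET
```

Refusing anything that is not an address literal keeps the sketch conservative; resolving names first would add a dependency on DNS answers that an attacker might control.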

Inside paramiko there are also several decision-making classes that already implement several basic host-key options:

paramiko.AutoAddPolicy: Host keys are automatically added to your user host-key store (the file ~/.ssh/known_hosts on Unix systems) when first encountered, but any change in the host key from then on will raise a fatal exception.

paramiko.RejectPolicy: Connecting to hosts with unknown keys simply raises an exception.

paramiko.WarningPolicy: An unknown host causes a warning to be logged, but the connection is then allowed to proceed.

The AutoAddPolicy never needs human interaction, but will at least assure you on subsequent encounters that you are still talking to the same machine as before.


Since this chapter is primarily about how to “speak SSH” from Python, I will just briefly outline how authentication works. There are generally three ways to prove your identity to a remote server you are contacting through SSH:

1. You can provide a username and password.

2. You can provide a username, and then have your client successfully perform a public-key challenge-response. This clever operation manages to prove that you are in possession of a secret “identity” key without actually exposing its contents to the remote system.

3. You can perform Kerberos authentication. If the remote system is set up to allow Kerberos, and if you have run the kinit command-line tool to prove your identity to one of the master Kerberos servers in the SSH server's authentication domain, then you should be allowed in without a password.

Since option 3 is very rare, we will concentrate on the first two. Using a username and password with paramiko is very easy: you simply provide them in your call to the connect() method:

>>> client.connect('my.example.com', username='brandon', password=mypass)

Public-key authentication, where you use ssh-keygen to create an “identity” key pair (which is typically stored in your ~/.ssh directory) that can be used to authenticate you without a password, makes the Python code even easier.

>>> client.connect('my.example.com')

If your identity key file is stored somewhere other than in the normal ~/.ssh/id_rsa file, then you can provide its file name (or a whole Python list of file names) to the connect() method manually:

>>> client.connect('my.example.com', key_filename='/home/brandon/.ssh/id_sysadmin')

Once the connect() method has succeeded, you are ready to start performing remote operations, all of which will be forwarded over the same physical socket without requiring re-negotiation of the host key, your identity, or the encryption that protects the SSH socket itself.

SSH Authentication

Once you have a connected SSH client, the entire world of SSH operations is open to you. Simply by asking, you can access remote-shell sessions, run individual commands, commence file-transfer sessions, and set up port forwarding.

First, SSH can set up a raw shell session for you, running on the remote end inside a pseudo-terminal so that programs act like they normally do when they are interacting with the user at a terminal. This kind of connection behaves very much like a Telnet connection; take a look at ssh_simple.py for an example, which pushes a simple echo command at the remote shell, and then asks it to exit.

import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    # Accept any host key without checking it (acceptable only for local testing)
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # add password='...' if needed

channel = client.invoke_shell()
stdin = channel.makefile('wb')
stdout = channel.makefile('rb')
stdin.write('echo Hello, world\rexit\r')
print stdout.read()
client.close()

If you actually run this script, you will see that the commands you type are echoed to you twice, and that there is no obvious way to separate these command echoes from the actual command output.

Because of quirky terminal-dependent behaviors, you should generally avoid ever using invoke_shell() unless you are actually writing an interactive terminal program where you let a live user type commands. A much better option for running remote commands is to use exec_command(), which, instead of starting up a whole shell session, just runs a single command, giving you control of its standard input, output, and error streams just as though you had run it using the subprocess module in the Standard Library. As we have seen, the subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

A script demonstrating its use is shown in ssh_commands.py. The difference between exec_command() and a local subprocess is that you do not get the chance to pass command-line arguments as separate strings; instead, you have to pass a whole command line for interpretation by the shell on the remote end.

import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # add password='...' if needed

# Each command runs in its own channel over the same SSH connection
for command in 'echo "Hello, world!"', 'uname', 'uptime':
    stdin, stdout, stderr = client.exec_command(command)
    stdin.close()
    print repr(stdout.read())
    stdout.close()
    stderr.close()

client.close()
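Because exec_command() hands a single string to the remote shell, arguments that contain spaces or shell metacharacters have to be quoted by you. One way to do that, assuming the remote shell is POSIX-like, is `shlex.quote()` from the Python 3 standard library (Python 2 offers the same function as `pipes.quote()`). The `build_command()` helper below is a hypothetical name used only for this sketch.

```python
import shlex

def build_command(program, *args):
    """Join a program and its arguments into one safely quoted command line."""
    return ' '.join(shlex.quote(part) for part in (program,) + args)

command = build_command('echo', 'Hello, world!', 'a b')
```

The resulting string could then be passed to exec_command(), and the remote shell would see each original argument as a single word.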

Shell Sessions and Individual Commands

Every time you start a new SSH shell session with invoke_shell(), and every time you kick off a command with exec_command(), a new SSH “channel” is created behind the scenes, which is what provides the file-like Python objects that let you talk to the remote command's standard input, output, and error. Channels, as just explained, can run in parallel, and SSH will cleverly interleave their data on your single SSH connection so that all of the conversations happen simultaneously without ever becoming confused.

Take a look at ssh_threads.py for a very simple example of what is possible. Here, two “commands” are kicked off remotely, which are each a simple shell script with some echo commands interspersed with pauses created by calls to sleep. The threading module constructs higher-level threading interfaces on top of the lower-level thread module.

import threading
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # add password='...' if needed

def read_until_EOF(fileobj):
    # Print each line as it arrives; readline() returns '' at end-of-file
    s = fileobj.readline()
    while s:
        print s.strip()
        s = fileobj.readline()

# exec_command() returns (stdin, stdout, stderr); index 1 selects stdout
out1 = client.exec_command('echo One;sleep 2;echo Two;sleep 1;echo Three')[1]
out2 = client.exec_command('echo A;sleep 1;echo B;sleep 2;echo C')[1]
thread1 = threading.Thread(target=read_until_EOF, args=(out1,))
thread2 = threading.Thread(target=read_until_EOF, args=(out2,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

client.close()

In order to be able to process these two streams of data simultaneously, we are kicking off two threads, and are handing each of them one of the channels from which to read. They each print out each line of new information as soon as it arrives, and finally exit when the readline() call indicates end-of-file by returning an empty string. When run, this script should print something like this:

root@erlerobot:~/Python_files# python ssh_threads.py

One

A

B

Two

Three

C

SSH channels over the same TCP connection are completely independent, can each receive (and send) data at their own pace, and can close independently when the particular command that they are talking to finally terminates.

Version 2 of the SSH protocol includes a sub-protocol called the “SSH File Transfer Protocol” (SFTP) that lets you walk the remote directory tree, create and delete directories and files, and copy files back and forth from the local to the remote machine. The capabilities of SFTP are so complex and complete, in fact, that they support not only simple file-copy operations, but can power graphical file browsers and can even let the remote filesystem be mounted locally.

The individual SFTP commands are listed in the bare paramiko documentation for the Python SFTP client (http://www.lag.net/paramiko/docs/paramiko.SFTPClient-class); here are the main things to remember when doing SFTP:

The SFTP protocol is stateful, just like FTP, and just like your normal shell account. So you can either pass all file and directory names as absolute paths that start at the root of the filesystem, or use getcwd() and chdir() to move around the filesystem and then use paths that are relative to the directory in which you have arrived.

You can open a file using either the file() or open() method, and you get back a file-like object connected to an SSH channel that runs independently of your SFTP channel.

Because each open remote file gets an independent channel, file transfers can happen asynchronously; you can open many remote files at once and have them all streaming down to your disk drive, or open new files and be sending data the other way.

Finally, keep in mind that no shell expansion is done on any of the file names you pass across SFTP. If you try using a file name like * or one that has spaces or special characters, they are simply interpreted as part of the file name. This means that any support for pattern-matching that you want to provide to the user has to be through fetching the directory contents yourself and then checking their pattern against each one, using a routine like those provided in fnmatch in the Python Standard Library. The fnmatch module provides support for Unix shell-style wildcards, which are not the same as regular expressions.
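The pattern-matching step from the last point might look like the following sketch. It works on a plain list of names, which here stands in for whatever sftp.listdir() would return; the `match_remote_names()` helper and the sample listing are hypothetical.

```python
import fnmatch

def match_remote_names(names, pattern):
    """Return the names from a directory listing that match a shell-style pattern."""
    return sorted(fnmatch.filter(names, pattern))

# Sample listing standing in for the result of sftp.listdir()
listing = ['messages', 'messages.1', 'messages.2.gz', 'syslog', 'auth.log']
matches = match_remote_names(listing, 'messages.*')
```

Note that the pattern `messages.*` requires the dot, so the bare `messages` file is not selected, just as a shell would behave.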

A very modest example SFTP session is shown in sftp.py. It does something simple that system administrators might often need: it connects to the remote system and copies the messages log files out of the /var/log directory, perhaps for scanning or analysis on the local machine. The functools module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function for the purposes of this module, as shown in sftp.py:

import functools
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # add password='...' if needed

def my_callback(filename, bytes_so_far, bytes_total):
    print 'Transfer of %r is at %d/%d bytes (%.1f%%)' % (
        filename, bytes_so_far, bytes_total, 100. * bytes_so_far / bytes_total)

sftp = client.open_sftp()
sftp.chdir('/var/log')
for filename in sorted(sftp.listdir()):
    if filename.startswith('messages.'):
        # Bind the file name now; get() supplies the two byte counts later
        callback_for_filename = functools.partial(my_callback, filename)
        sftp.get(filename, filename, callback=callback_for_filename)

client.close()

SFTP: File Transfer Over SSH

Note that, although I made a big deal of talking about how each file that you open with SFTP uses its own independent channel, the simple get() and put() convenience functions provided by paramiko (which are really lightweight wrappers for an open() followed by a loop that reads and writes) do not attempt any asynchrony, but instead just block and wait until each whole file has arrived. This means that the foregoing script calmly transfers one file at a time, producing output that looks something like this:

root@erlerobot:~/Python_files# python sftp.py
Transfer of 'messages.1' is at 32768/128609 bytes (25.5%)
Transfer of 'messages.1' is at 65536/128609 bytes (51.0%)
Transfer of 'messages.1' is at 98304/128609 bytes (76.4%)
Transfer of 'messages.1' is at 128609/128609 bytes (100.0%)
Transfer of 'messages.2.gz' is at 32768/40225 bytes (81.5%)
Transfer of 'messages.2.gz' is at 40225/40225 bytes (100.0%)
Transfer of 'messages.3.gz' is at 28249/28249 bytes (100.0%)
Transfer of 'messages.4.gz' is at 32768/71703 bytes (45.7%)
Transfer of 'messages.4.gz' is at 65536/71703 bytes (91.4%)
Transfer of 'messages.4.gz' is at 71703/71703 bytes (100.0%)
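The role functools.partial() plays in sftp.py can be tried without any network connection. The paramiko callback is invoked with two numbers (bytes so far and total bytes), while the reporting function also wants the file name, so the name is bound in advance. In this sketch the function returns the message instead of printing it, purely so the result is easy to inspect; that change is an assumption of the example, not part of the original script.

```python
import functools

def my_callback(filename, bytes_so_far, bytes_total):
    """Build the same progress message that sftp.py prints."""
    return 'Transfer of %r is at %d/%d bytes (%.1f%%)' % (
        filename, bytes_so_far, bytes_total,
        100. * bytes_so_far / bytes_total)

# Bind the first argument now; the remaining two are supplied on each call
callback_for_file = functools.partial(my_callback, 'messages.1')
line = callback_for_file(32768, 128609)
```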


The File Transfer Protocol (FTP) was once among the most widely used protocols on the Internet, invoked whenever a user wanted to transfer files between Internet-connected computers.

In this chapter we will examine this protocol and study the possible alternatives.

File Transfer Protocol (FTP)

Today, there are better alternatives than the FTP protocol for pretty much anything you could want to do with it.

The biggest problem with the protocol is its lack of security: not only files, but usernames and passwords are sent completely in the clear and can be viewed by anyone observing network traffic.

A second issue is that an FTP user tends to make a connection, choose a working directory, and do several operations all over the same network connection. Modern Internet services, with millions of users, prefer protocols like HTTP that consist of short, completely self-contained requests, instead of long-running FTP connections that require the server to remember things like a current working directory.

A final big issue is filesystem security. The early FTP servers, instead of showing users just a sliver of the host filesystem that the owner wanted exposed, tended to simply expose the entire filesystem, letting users cd to / and snoop around to see how the system was configured.

For file download, HTTP is the standard protocol on today's Internet, protected with SSL when necessary for security. Instead of exposing system-specific file name conventions like FTP, HTTP supports system-independent URLs.

Anonymous upload is a bit less standard, but the general tendency is to use a form on a web page that instructs the browser to use an HTTP POST operation to transmit the file that the user selects.

File synchronization has improved immeasurably since the days when a recursive FTP file copy was the only common way to get files to another computer. Instead of wastefully copying every file, modern commands like rsync or rdist efficiently compare files at both ends of the connection and copy only the ones that are new or have changed.

Full filesystem access is actually the one area where FTP can still commonly be found on today's Internet: thousands of cut-rate ISPs continue to support FTP, despite its insecurity, as the means by which users copy their media and (typically) PHP source code into their web account. A much better alternative today is for service providers to support SFTP instead.

What to Use Instead of FTP

So what are the alternatives?

FTP is unusual because, by default, it actually uses two TCP connections during operation. One connection is the control channel, which carries commands and the resulting acknowledgments or error codes. The second connection is the data channel, which is used solely for transmitting file data or other blocks of information, such as directory listings. Technically, the data channel is full duplex, meaning that it allows files to be transmitted in both directions simultaneously. However, in actual practice, this capability is rarely used.

The process of downloading a file from an FTP server traditionally runs like this:

1. First, the FTP client establishes a command connection by connecting to the FTP port on the server.
2. The client authenticates itself, usually with username and password.
3. The client changes directory on the server to where it wants to deposit or retrieve files.
4. The client begins listening on a new port for the data connection, and then informs the server about that port.
5. The server connects to the port the client requested.
6. The file is transmitted.
7. The data connection is closed.

FTP also supports what is known as passive mode. In this scenario, the data connection is made backward: the server opens an extra port, and tells the client to make the second connection. Other than that, everything behaves the same way.
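In passive mode, the server announces its extra port in the reply to the client's PASV command, using the RFC 959 form `(h1,h2,h3,h4,p1,p2)`: four address bytes followed by the port number split into two bytes. The ftplib module decodes this reply for you, so the hypothetical `parse_pasv_reply()` helper below exists only to make the format concrete; the sample reply text is invented.

```python
import re

def parse_pasv_reply(reply):
    """Extract (host, port) from a reply such as
    '227 Entering Passive Mode (192,168,1,2,19,137)'."""
    match = re.search(r'\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)', reply)
    if match is None:
        raise ValueError('no address found in reply: %r' % reply)
    numbers = [int(n) for n in match.groups()]
    host = '.'.join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # the port arrives as two bytes
    return host, port

host, port = parse_pasv_reply('227 Entering Passive Mode (192,168,1,2,19,137)')
```

The client would then open its data connection to that host and port, which is the "backward" connection described above.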

Communication Channels

The Python module ftplib is the primary interface to FTP for Python programmers. It handles the details of establishing the various connections for you, and provides convenient ways to automate common commands. You can use this to write Python programs that perform a variety of automated FTP jobs, such as mirroring other FTP servers. It is also used by the module urllib to handle URLs that use FTP. For more information on FTP (File Transfer Protocol), see Internet RFC 959.

connect.py shows a very basic ftplib example. The program connects to a remote server, displays the welcome message, and prints the current working directory.

from ftplib import FTP

f = FTP('ftp.ibiblio.org')
print "Welcome:", f.getwelcome()
f.login()  # no arguments: anonymous login
print "Current working directory:", f.pwd()
f.quit()

Recall that an FTP session can visit different directories, just like a shell prompt can move between locations with cd. Here, the pwd() function returns the current working directory on the remote side of the connection. Finally, the quit() function logs out and closes the connection. Here is what the program outputs when run:

root@erlerobot:~/Python_files# python connect.py
Welcome: 220 ProFTPD Server
Current working directory: /

Using FTP in Python

When making an FTP transfer, you have to decide whether you want the file treated as a monolithic block of binary data, or whether you want it parsed as a text file so that your local machine can paste its lines back together using whatever end-of-line character is native to your platform. A file transferred in so-called “ASCII mode” is delivered one line at a time, so that you can glue the lines back together on the local machine using its own line-ending convention. Take a look at asciidl.py for a Python program that downloads a well-known text file and saves it in your local directory.

import os
from ftplib import FTP

if os.path.exists('README'):
    raise IOError('refusing to overwrite your README file')

def writeline(data):
    # retrlines() strips the line ending, so add the local one back
    fd.write(data)
    fd.write(os.linesep)

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel')
fd = open('README', 'w')
f.retrlines('RETR README', writeline)
fd.close()
f.quit()

In the example, the cwd() function selects a new working directory on the remote system. Then the retrlines() function begins the transfer. Its first parameter specifies a command to run on the remote system, usually RETR, followed by a file name. Its second parameter is a function that is called, over and over again, as each line of the text file is retrieved; if omitted, the data is simply printed to standard output. The lines are passed with the end-of-line character stripped, so the homemade writeline() function simply appends your system's standard line ending to each line as it is written out. Try running this program; there should be a file in your current directory named README after the program is done. Basic binary file transfers work in much the same way as text-file transfers; binarydl.py shows an example.

import os
from ftplib import FTP

if os.path.exists('patch8.gz'):
    raise IOError('refusing to overwrite your patch8.gz file')

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel/v1.0')
fd = open('patch8.gz', 'wb')
# fd.write is called once for each block of data received
f.retrbinary('RETR patch8.gz', fd.write)
fd.close()
f.quit()

When run, it deposits a file named patch8.gz in your current working directory. The retrbinary() function simply passes blocks of data to the specified function. This is convenient, since a file object's write() function expects just such data, so in this case no custom function is necessary.
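Both retrlines() and retrbinary() do nothing more than call your function with each piece of data, so a callback can be exercised without a server. The sketch below is an alternative to the global `fd` used in asciidl.py: a hypothetical `make_line_writer()` factory returns a callback bound to a particular file object. An in-memory `io.StringIO` buffer and two invented lines stand in for the local file and the transfer.

```python
import io

def make_line_writer(fileobj, newline=u'\n'):
    """Return a callback that writes each received line plus a line ending."""
    def writeline(line):
        fileobj.write(line)
        fileobj.write(newline)
    return writeline

# Simulate what retrlines() does: call the callback once per stripped line
buffer = io.StringIO()
callback = make_line_writer(buffer)
for line in [u'first line', u'second line']:
    callback(line)
text = buffer.getvalue()
```

With a real connection, the same callback could be passed as the second argument to retrlines().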

ASCII and Binary Files

The ftplib module provides a second function that can be used for binary downloading: ntransfercmd(). This command provides a lower-level interface, but can be useful if you want to know a little bit more about what's going on during the download. In particular, this more advanced command lets you keep track of the number of bytes transferred, and you can use that information to display status updates for the user. advbinarydl.py shows a sample program that uses ntransfercmd().

import os, sys
from ftplib import FTP

if os.path.exists('linux-1.0.tar.gz'):
    raise IOError('refusing to overwrite your linux-1.0.tar.gz file')

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel/v1.0')
f.voidcmd("TYPE I")  # switch to binary ("image") mode by hand

# The transfer is started before the local file is opened
datasock, size = f.ntransfercmd("RETR linux-1.0.tar.gz")
bytes_so_far = 0
fd = open('linux-1.0.tar.gz', 'wb')

while True:
    buf = datasock.recv(2048)
    if not buf:
        break
    fd.write(buf)
    bytes_so_far += len(buf)
    print "\rReceived", bytes_so_far,
    if size:
        print "of %d total bytes (%.1f%%)" % (
            size, 100 * bytes_so_far / float(size)),
    else:
        print "bytes",
    sys.stdout.flush()

print
fd.close()
datasock.close()
f.voidresp()  # read the server's final response to the transfer
f.quit()

There are a few new things to note here. First comes the call to voidcmd(). This passes an FTP command directly to the server, checks for an error, but returns nothing. In this case, the raw command is TYPE I. That sets the transfer mode to “image,” which is how FTP refers internally to binary files. In the previous example, retrbinary() automatically ran this command behind the scenes, but the lower-level ntransfercmd() does not. Next, note that ntransfercmd() returns a tuple consisting of a data socket and an estimated size. Always bear in mind that the size is merely an estimate, and should not be considered authoritative; the file may end sooner, or it might go on much longer, than this value. Also, if a size estimate from the FTP server is simply not available, then the estimated size returned will be None.

After receiving the data, it is important to close the data socket and call voidresp(), which reads the command response code from the server, raising an exception if there was any error during transmission. Even if you do not care about detecting errors, failing to call voidresp() will make future commands likely to fail because the server's output socket will be blocked waiting for you to read the results. Here is an example of running this program:

root@erlerobot:~/Python_files# python advbinarydl.py
Received 1259161 of 1259161 total bytes (100.0%)
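Since the size returned by ntransfercmd() may be None, any progress display has to cope with both cases. One way to isolate that logic is a small helper like the hypothetical `format_progress()` below, which mirrors the messages built in advbinarydl.py but returns a string rather than printing it.

```python
def format_progress(bytes_so_far, size=None):
    """Describe download progress; size is None when the server gave no estimate."""
    if size:
        return 'Received %d of %d total bytes (%.1f%%)' % (
            bytes_so_far, size, 100 * bytes_so_far / float(size))
    return 'Received %d bytes' % bytes_so_far

with_size = format_progress(2048, 8192)
without_size = format_progress(2048)
```

Because the size is only an estimate, the percentage can exceed 100 if the file turns out to be longer than the server claimed.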

Advanced Binary Downloading

File data can also be uploaded through FTP. As with downloading, there are two basic functions for uploading: storbinary() and storlines(). Both take a command to run, and a file-like object to transmit. The storbinary() function will call the read() method repeatedly on that object until its content is exhausted, while storlines(), by contrast, calls the readline() method. Unlike the corresponding download functions, these methods do not require you to provide a callable function of your own. (But you could, of course, pass a file-like object of your own crafting whose read() or readline() method computes the outgoing data as the transmission proceeds.) binaryul.py shows how to upload a file in binary mode.

from ftplib import FTP
import sys, getpass, os.path

if len(sys.argv) != 5:
    print "usage: %s <host> <username> <localfile> <remotedir>" % (
        sys.argv[0])
    sys.exit(2)

host, username, localfile, remotedir = sys.argv[1:]
password = getpass.getpass(
    "Enter password for %s on %s: " % (username, host))

f = FTP(host)
f.login(username, password)
f.cwd(remotedir)
fd = open(localfile, 'rb')
# storbinary() reads from fd until it is exhausted
f.storbinary('STOR %s' % os.path.basename(localfile), fd)
fd.close()
f.quit()

This program looks quite similar to our earlier efforts. Since most anonymous FTP sites do not permit file uploading, you will have to find a server somewhere to test it against; I simply installed the old, venerable ftpd on my laptop for a few minutes and ran the test like this:

root@erlerobot:~/Python_files# python binaryul.py localhost brandon test.txt /tmp

You can modify this program to upload a file in ASCII mode by simply changing storbinary() to storlines().
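The remark about passing a file-like object of your own crafting can be made concrete. The hypothetical `GeneratedFile` class below computes its data on demand instead of reading it from disk; the loop that follows imitates how storbinary() keeps calling read() until it gets an empty result, so the sketch runs without a server.

```python
class GeneratedFile(object):
    """File-like object whose read() produces data on demand."""

    def __init__(self, total_bytes, fill=b'x'):
        self.remaining = total_bytes
        self.fill = fill

    def read(self, blocksize=8192):
        count = min(blocksize, self.remaining)
        self.remaining -= count
        return self.fill * count  # an empty result signals end of data

# Imitate the upload loop: keep reading until nothing is left
source = GeneratedFile(20000)
chunks = []
while True:
    chunk = source.read(8192)
    if not chunk:
        break
    chunks.append(chunk)
```

With a real connection, an object like `source` could be handed to storbinary() in place of the open file.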

Uploading Data

Just like the download process had a complicated raw version, it is also possible to upload files “by hand” using ntransfercmd(), as shown in advbinaryul.py.

from ftplib import FTP
import sys, getpass, os.path

BLOCKSIZE = 8192  # chunk size to read and transmit: 8 kB

if len(sys.argv) != 5:
    print "usage: %s <host> <username> <localfile> <remotedir>" % (
        sys.argv[0])
    sys.exit(2)

host, username, localfile, remotedir = sys.argv[1:]
password = getpass.getpass(
    "Enter password for %s on %s: " % (username, host))

f = FTP(host)
f.login(username, password)
f.cwd(remotedir)
f.voidcmd("TYPE I")  # binary ("image") mode

fd = open(localfile, 'rb')
datasock, esize = f.ntransfercmd('STOR %s' % os.path.basename(localfile))
size = os.stat(localfile)[6]  # index 6 of the stat tuple is the file size
bytes_so_far = 0

while True:
    buf = fd.read(BLOCKSIZE)
    if not buf:
        break
    datasock.sendall(buf)
    bytes_so_far += len(buf)
    # The trailing comma keeps the status on one line until the final print
    print "\rSent", bytes_so_far, "of", size, "bytes", \
        "(%.1f%%)\r" % (100 * bytes_so_far / float(size)),
    sys.stdout.flush()

print
datasock.close()
fd.close()
f.voidresp()  # read the server's final response to the transfer
f.quit()

Now we can perform an upload that continuously displays its status as it progresses:

root@erlerobot:~/Python_files# python advbinaryul.py localhost brandon patch8.gz /tmp
Enter password for brandon on localhost:
Sent 6408 of 6408 bytes (100.0%)

Advanced Binary Uploading

Like most Python modules, ftplib will raise an exception when an error occurs. It defines several exceptions of its own, and it can also raise socket.error and IOError. As a convenience, it offers a tuple, named ftplib.all_errors, that lists all of the exceptions that can possibly be raised by ftplib. This is often a useful shortcut for writing a try…except clause.
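A try…except clause built on ftplib.all_errors might be wrapped up as in the sketch below. The `try_ftp()` helper and the `failing_operation()` stand-in are hypothetical; the latter raises ftplib.error_perm by hand so that the example runs without contacting a server.

```python
import ftplib

def try_ftp(operation):
    """Run operation(); return (result, None) or (None, error) on an FTP failure."""
    try:
        return operation(), None
    except ftplib.all_errors as e:
        return None, e

def failing_operation():
    # Stand-in for a real call such as f.cwd('/no/such/dir')
    raise ftplib.error_perm('550 No such directory')

result, error = try_ftp(failing_operation)
```

Catching the whole tuple is convenient for scripts; code that needs to react differently to permanent and temporary failures would catch the individual exception classes instead.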

One of the problems with the basic retrbinary() function is that, in order to use it easily, you will usually wind up opening the file on the local end before beginning the transfer on the remote side. If your command aimed at the remote side retorts that the file does not exist, or if the RETR command otherwise fails, then you will have to close and delete the local file you have just created (or else wind up littering the filesystem with zero-length files).

With the ntransfercmd() method, by contrast, you can check for a problem prior to opening a local file. advbinarydl.py already follows these guidelines: if ntransfercmd() fails, the exception will cause the program to terminate before the local file is opened.

Scanning Directories

FTP provides two ways to discover information about server files and directories. These are implemented in ftplib as the nlst() and dir() methods.

The nlst() method returns a list of entries in a given directory: all of the files and directories inside. However, the bare names are all that is returned. There is no other information about which particular entries are files or are directories, on the sizes of the files present, or anything else.

The more powerful dir() function returns a directory listing from the remote. This listing is in a system-defined format, but typically contains a file name, size, modification date, and file type. On UNIX servers, it is typically the output of one of these two shell commands:

root@erlerobot:~# ls -l
root@erlerobot:~# ls -la

nlst.py shows an example of using nlst() to get directory information.

from ftplib import FTP

f = FTP('ftp.ibiblio.org')
f.login()
f.cwd('/pub/academic/astronomy/')
entries = f.nlst()  # bare names only
entries.sort()
print len(entries), "entries:"
for entry in entries:
    print entry
f.quit()

When you run this program, you will see output like this:

root@erlerobot:~/Python_files# python nlst.py
13 entries:

INDEX

README

ephem_4.28.tar.Z

hawaii_scope

incoming

jupitor-moons.shar.Z

lunar.c.Z

lunisolar.shar.Z

moon.shar.Z

planetary

sat-track.tar.Z

stars.tar.Z

xephem.tar.Z

Handling Errors

If you were to use an FTP client to manually log on to the server, you would see the same files listed. Notice that the file names are in a convenient format for automated processing (a bare list of file names) but that there is no extra information. The result will be different when we try another file listing command in dir.py:

from ftplib import FTP

f = FTP('ftp.ibiblio.org')
f.login()
f.cwd('/pub/academic/astronomy/')
entries = []
f.dir(entries.append)  # called once for each line of the listing
print "%d entries:" % len(entries)
for entry in entries:
    print entry
f.quit()

Contrast the bare list of file names we saw earlier with the output from dir.py, which uses dir():

root@erlerobot:~/Python_files# python dir.py
13 entries:
-rw-r--r--   1 (?)      (?)           750 Feb 14  1994 INDEX
-rw-r--r--   1 root     bin           135 Feb 11  1999 README
-rw-r--r--   1 (?)      (?)        341303 Oct  2  1992 ephem_4.28.tar.Z
drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 hawaii_scope
drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 incoming
-rw-r--r--   1 (?)      (?)          5983 Oct  2  1992 jupitor-moons.shar.Z
-rw-r--r--   1 (?)      (?)          1751 Oct  2  1992 lunar.c.Z
-rw-r--r--   1 (?)      (?)          8078 Oct  2  1992 lunisolar.shar.Z
-rw-r--r--   1 (?)      (?)         64209 Oct  2  1992 moon.shar.Z
drwxr-xr-x   2 (?)      (?)          4096 Jan  6  1993 planetary
-rw-r--r--   1 (?)      (?)        129969 Oct  2  1992 sat-track.tar.Z
-rw-r--r--   1 (?)      (?)         16504 Oct  2  1992 stars.tar.Z
-rw-r--r--   1 (?)      (?)        410650 Oct  2  1992 xephem.tar.Z

The dir() method takes a function that it calls for each line, delivering the directory listing in pieces just like retrlines() delivers the contents of particular files. Here, we simply supply the append() method of our plain old Python entries list.
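When a server does return Unix-style lines like the ones above, a program can pick them apart, although nothing guarantees that format. The sketch below shows one fragile way to do it; `is_directory_line()` and `name_from_line()` are hypothetical helpers, and they assume an `ls -l` layout with a file name that contains no spaces.

```python
def is_directory_line(line):
    """Guess from an 'ls -l' style line whether the entry is a directory."""
    return line.startswith('d')

def name_from_line(line):
    """Take the last whitespace-separated field as the file name."""
    return line.split()[-1]

lines = [
    '-rw-r--r--   1 root     bin           135 Feb 11  1999 README',
    'drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 incoming',
]
directories = [name_from_line(l) for l in lines if is_directory_line(l)]
```

Because the listing format is system-defined, code like this should be treated as a convenience for servers you know, not as a general solution.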

If you cannot guarantee what information an FTP server might choose to return from its dir() command, how are you going to tell directories from normal files, which is an essential step in downloading entire trees of files from the server? The answer, shown in recursedl.py, is to simply try a cwd() into every name that nlst() returns and, if you succeed, conclude that the entity is a directory. This sample program does not do any actual downloading; instead, to keep things simple, it just prints the directories it visits to the screen.

import os, sys
from ftplib import FTP, error_perm

def walk_dir(f, dirpath):
    original_dir = f.pwd()
    try:
        f.cwd(dirpath)
    except error_perm:
        return  # ignore non-directories and ones we cannot enter
    print dirpath
    names = f.nlst()
    for name in names:
        walk_dir(f, dirpath + '/' + name)
    f.cwd(original_dir)  # return to cwd of our caller

f = FTP('ftp.kernel.org')
f.login()
walk_dir(f, '/pub/linux/kernel/Historic/old-versions')
f.quit()

This sample program will run a bit slowly (there are, it turns out, quite a few files in the old-versions directory on the Linux Kernel Archive), but within a few dozen seconds you should see the resulting directory tree displayed on the screen:

root@erlerobot:~/Python_files# python recursedl.py

/pub/linux/kernel/Historic/old-versions

/pub/linux/kernel/Historic/old-versions/impure

/pub/linux/kernel/Historic/old-versions/old

/pub/linux/kernel/Historic/old-versions/old/corrupt

/pub/linux/kernel/Historic/old-versions/tytso
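The cwd() probing technique can be tried without touching a real server. The following is a minimal Python 3 sketch, not the listing above: `walk_dirs` is a generator variant of the same idea, and `FakeFTP` is a hypothetical in-memory stand-in (it is not part of ftplib) that implements only the three methods the walk needs.

```python
from ftplib import error_perm

def walk_dirs(ftp, dirpath):
    """Yield dirpath and every directory beneath it, probing with cwd()."""
    original_dir = ftp.pwd()
    try:
        ftp.cwd(dirpath)
    except error_perm:
        return  # not a directory, or one we are not allowed to enter
    try:
        yield dirpath
        for name in ftp.nlst():
            yield from walk_dirs(ftp, dirpath + '/' + name)
    finally:
        ftp.cwd(original_dir)  # return to the caller's working directory

class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP, used only for illustration."""
    def __init__(self, tree):
        self.tree = tree          # maps directory path -> list of entry names
        self.current = '/'
    def pwd(self):
        return self.current
    def cwd(self, path):
        if path not in self.tree:
            raise error_perm('550 Not a directory')
        self.current = path
    def nlst(self):
        return list(self.tree[self.current])

tree = {
    '/': ['pub'],
    '/pub': ['README', 'old'],
    '/pub/old': ['corrupt', 'notes.txt'],
    '/pub/old/corrupt': [],
}
ftp = FakeFTP(tree)
visited = list(walk_dirs(ftp, '/pub'))
print(visited)
```

Because the probe is wrapped in try/finally, the connection ends up back in the directory where the walk started even if the caller stops iterating early.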

Detecting Directories and Recursive Download

Page 220: Python Networking Gitbook

Finally, FTP supports file deletion, and supports both the creation and deletion of directories. These more obscure calls are all described in the ftplib documentation:

delete(filename) will delete a file from the server.

mkd(dirname) attempts to create a new directory.

rmd(dirname) will delete a directory; note that most systems require the directory to be empty first.

rename(oldname, newname) works, essentially, like the Unix command mv: if both names are in the same directory, the file is essentially re-named; but if the destination specifies a name in a different directory, then the file is actually moved.

To use TLS, create your FTP connection with the FTP_TLS class instead of the plain FTP class; simply by doing this, your username and password and, in fact, the entire FTP command channel will be protected from prying eyes. If you then additionally run the class’s prot_p() method (it takes no arguments), then the FTP data connection will be protected as well. Should you for some reason want to return to using an un-encrypted data connection during the session, there is a prot_c() method that returns the data stream to normal. Again, your commands will continue to be protected as long as you are using the FTP_TLS class.

Check the Python Standard Library documentation for more details (they include a small code sample) if you wind up needing this extension to FTP: http://docs.python.org/library/ftplib.html#ftplib.FTP_TLS
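As a rough illustration of the sequence just described, here is a minimal Python 3 sketch. The helper name `list_securely` and its default credentials are assumptions made for this example; the function is only defined here, not called, because it needs a server that actually speaks FTP over TLS.

```python
from ftplib import FTP_TLS

def list_securely(host, user='anonymous', passwd=''):
    """Return the name listing of the login directory over an encrypted session."""
    ftps = FTP_TLS(host)
    try:
        # login() secures the command channel before sending the credentials.
        ftps.login(user, passwd)
        # prot_p() additionally protects the data connection.
        ftps.prot_p()
        return ftps.nlst()
    finally:
        ftps.close()
```

Calling prot_c() on the same object would switch the data connection back to clear text while leaving the command channel protected.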

Creating Directories, Deleting Things

Doing FTP Securely

Page 221: Python Networking Gitbook

Remote Procedure Call (RPC) systems let you call a remote function using the same syntax that you would use when calling a routine in a local API or library. This tends to be useful in two situations: first, when your program has a lot of work to do, and you want to spread it across several machines by making calls across the network; and second, when you need data or information that is only available on another hard drive or network.

In this chapter we will get to know RPC better and learn how to use it in combination with Python.

Remote Procedure Call (RPC)

Page 222: Python Networking Gitbook

Besides serving their essential purpose of letting you make what appear to be local function or method calls that are in fact passing across the network to a different server, RPC protocols have several key features, and also some differences, that you should keep in mind when choosing and then deploying an RPC client or server.

First, every RPC mechanism has limits on the kind of data you can pass. The most popular protocols, therefore, support only a few kinds of numbers and strings; one sequence or list data type; and then something like a struct or associative array.

A second common feature is the ability of the server to signal that an exception occurred while it was running the remote function. In such cases, the client RPC library will typically raise an exception itself to tell the client that something has gone wrong.

Third, many RPC mechanisms provide introspection, which is a way for clients to list the calls that are supported and perhaps to discover what arguments they take.

Fourth, each RPC mechanism needs to support some addressing scheme whereby you can reach out and connect to a particular remote API. Some such mechanisms are quite complicated, and they might even have the ability to automatically connect you to the correct server on your network for performing a particular task, without your having to know its name beforehand. Other mechanisms are quite simple and just ask you for the IP address, port number, or URL of the service you want to access. These mechanisms expose the underlying network addressing scheme, rather than creating a scheme of their own.

Finally, some RPC mechanisms support authentication, access control, and even full impersonation of particular user accounts when RPC calls are made by several different client programs wielding different credentials.

Features of RPC

Page 223: Python Networking Gitbook

XML-RPC has native support in Python precisely because it was one of the first RPC protocols of the Internet age, operating natively over HTTP instead of insisting on its own on-the-wire protocol. This means our examples will not even require any third-party modules. While we will see that this makes our RPC server somewhat less capable than if we moved to a third-party library, this will also make the examples good ones for an initial foray into RPC.

If you have ever used raw XML, then you are familiar with the fact that it lacks any data-type semantics; it cannot represent numbers, for example, but only elements that contain other elements, text strings, and text-string attributes. Thus the XML-RPC specification has to build additional semantics on top of the plain XML document format in order to specify how things like numbers should look when converted into marked-up text. The Python Standard Library makes it easy to write either an XML-RPC client or server, though more power is available when writing a client. For example, the client library supports HTTP basic authentication, while the server does not support this. Therefore, we will begin at the simple end, with the server.
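To make the idea of "additional semantics on top of XML" concrete, the short sketch below asks the Python 3 standard library (module `xmlrpc.client`, the successor of the Python 2 `xmlrpclib` used in this chapter's listings) to marshal a few values without any network involved. The method name `demo` is a placeholder chosen for this example.

```python
import xmlrpc.client

# dumps() turns a tuple of arguments into the XML body of a method call.
xml = xmlrpc.client.dumps((42, 2.5, 'text', [1, 2]), methodname='demo')
print(xml)
```

In the printed document each value is wrapped in an element naming its type, such as `<int>`, `<double>`, `<string>`, and `<array>`, which is how the protocol recovers the data types that plain XML lacks.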

xmlrpc_server.py shows a basic server that starts a web server on port 7001 and listens for incoming connections. Here we will use the operator module, which exports a set of efficient functions corresponding to the intrinsic operators of Python.

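import operator, math
from SimpleXMLRPCServer import SimpleXMLRPCServer

def addtogether(*things):
    """Add together everything in the list `things`."""
    return reduce(operator.add, things)

def quadratic(a, b, c):
    """Determine `x` values satisfying: `a` * x*x + `b` * x + c == 0"""
    b24ac = math.sqrt(b*b - 4.0*a*c)
    # Beware: `/ 2.0*a` divides by 2.0 and then multiplies by `a`, so the
    # results match the true roots only when `a` is 1 or -1; dividing by
    # `(2.0*a)` would give the roots in general.
    return list(set([ (-b-b24ac) / 2.0*a,
                      (-b+b24ac) / 2.0*a ]))

def remote_repr(arg):
    """Return the `repr()` rendering of the supplied `arg`."""
    # The argument is handed back unchanged; XML-RPC decides how it is encoded.
    return arg

server = SimpleXMLRPCServer(('127.0.0.1', 7001))
server.register_introspection_functions()  # enable the system.* listing calls
server.register_multicall_functions()      # enable system.multicall
server.register_function(addtogether)
server.register_function(quadratic)
server.register_function(remote_repr)
print "Server ready"
server.serve_forever()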

You can see that the three sample functions that the server offers over XML-RPC (the ones that are added to the RPC service through the register_function() calls) are quite typical Python functions. And that, again, is the whole point of XML-RPC: it lets you make routines available for invocation over the network without having to write them any differently than if they were normal functions offered inside of your program.

Note that two additional configuration calls are made in addition to the three calls that register our functions. Each of them turns on an additional service that is optional, but often provided by XML-RPC servers: an introspection routine that a client can use to ask which RPC calls are supported by a given server; and the ability to support a multicall function that lets several individual function calls be bundled together into a single network round-trip. This server will need to be running before we can try any of the next three program listings, so bring up a command window and get it started:

root@erlerobot:~/Python_files# python xmlrpc_server.py
Server ready

XML-RPC

Page 224: Python Networking Gitbook

This means that the server is now waiting for connections on localhost port 7001.

Now, open another command window and get ready to try out the next three listings as we review them. First, we will try out the introspection capability that we turned on in this particular server. Note that this ability is optional, and it may not be available on many other XML-RPC services that you use online or that you deploy yourself. xmlrpc_introspect.py shows how introspection happens from the client’s point of view.

import xmlrpclib
proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')

print 'Here are the functions supported by this server:'
for method_name in proxy.system.listMethods():

    if method_name.startswith('system.'):
        continue

    signatures = proxy.system.methodSignature(method_name)
    if isinstance(signatures, list) and signatures:
        for signature in signatures:
            print '%s(%s)' % (method_name, signature)
    else:
        print '%s(...)' % (method_name,)

    method_help = proxy.system.methodHelp(method_name)
    if method_help:
        print '  ', method_help

The introspection mechanism is an optional extension that is not actually defined in the XML-RPC specification itself. The client is able to call a series of special methods that all begin with the string system. to distinguish them from normal methods. These special methods give information about the other calls available. We start by calling listMethods(). If introspection is supported at all, then we will receive back a list of other method names; for this example listing, we ignore the system methods and only proceed to print out information about the other ones. In xmlrpc_introspect.py we use the xmlrpclib module, which supports writing XML-RPC client code; it handles all the details of translating between conformable Python objects and XML on the wire.

root@erlerobot:~/Python_files# python xmlrpc_introspect.py
Here are the functions supported by this server:
addtogether(...)
   Add together everything in the list `things`.
quadratic(...)
   Determine `x` values satisfying: `a` * x*x + `b` * x + c == 0
remote_repr(...)
   Return the `repr()` rendering of the supplied `arg`.
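For readers on Python 3, the same introspection round trip can be tried in a single process. This is a sketch under assumptions rather than a port of the listing: it uses the Python 3 module names `xmlrpc.server` and `xmlrpc.client`, runs the server in a background thread, and lets the operating system pick a free port instead of using 7001.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def addtogether(*things):
    """Add together everything in the list `things`."""
    total = things[0]
    for thing in things[1:]:
        total = total + thing
    return total

# Port 0 asks the operating system for any free port.
server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_introspection_functions()
server.register_function(addtogether)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:%d' % server.server_address[1])
# Skip the system.* helpers, as the listing above does.
names = [n for n in proxy.system.listMethods() if not n.startswith('system.')]
help_text = proxy.system.methodHelp('addtogether')
print(names, help_text)

server.shutdown()
server.server_close()
```

The help text that comes back is simply the function's docstring, which is why the functions in this chapter are documented so carefully.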

You will recall that the whole point of an RPC service is to make function calls in a target language look as natural as possible. And as you can see in xmlrpc_client.py, the Standard Library’s xmlrpclib gives you a proxy object for making function calls against the server. These calls look exactly like local function calls.

import xmlrpclib
proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')
print proxy.addtogether('x', 'ÿ', 'z')
print proxy.addtogether(20, 30, 4, 1)
print proxy.quadratic(2, -4, 0)
print proxy.quadratic(1, 2, 1)
print proxy.remote_repr((1, 2.0, 'three'))
print proxy.remote_repr([1, 2.0, 'three'])
print proxy.remote_repr({'name': 'Arthur', 'data': {'age': 42, 'sex': 'M'}})
print proxy.quadratic(1, 0, 1)

Note how almost all of the calls work without a hitch, and how both the calls in this listing and the functions themselves back in xmlrpc_server.py look like completely normal Python; there is nothing about them that is particular to a network:

Page 225: Python Networking Gitbook

root@erlerobot:~/Python_files# python xmlrpc_client.py
xÿz
55
[0.0, 8.0]
[-1.0]
[1, 2.0, 'three']
[1, 2.0, 'three']
{'data': {'age': 42, 'sex': 'M'}, 'name': 'Arthur'}
Traceback (most recent call last):
  ...
xmlrpclib.Fault: <Fault 1: "<type 'exceptions.ValueError'>:math domain error">
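The traceback shows a server-side exception arriving at the client as a Fault. A client that wants to survive such a failure can catch it; the sketch below shows one way, assuming Python 3 (`xmlrpc.client` and `xmlrpc.server`), an in-process server on an arbitrary free port, and a `quadratic` written for this example rather than copied from xmlrpc_server.py.

```python
import math
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def quadratic(a, b, c):
    """Return the real roots of a*x*x + b*x + c == 0."""
    root = math.sqrt(b * b - 4.0 * a * c)  # raises ValueError if negative
    return sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})

server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_function(quadratic)
threading.Thread(target=server.serve_forever, daemon=True).start()
proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:%d' % server.server_address[1])

roots = proxy.quadratic(1, 2, 1)
caught = None
try:
    proxy.quadratic(1, 0, 1)      # no real roots, so the server raises
except xmlrpc.client.Fault as fault:
    # The server's exception type and message travel in faultString.
    caught = (fault.faultCode, fault.faultString)
print(roots, caught)

server.shutdown()
server.server_close()
```

Only the fault code and a descriptive string cross the network; the original exception object and its traceback stay on the server.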

Note that XML-RPC function calls, like those of Python and many other languages in its lineage, can take several arguments, but can only return a single result value. That value might be a complex data structure, but it will be returned as a single result. And the protocol does not care whether that result has a consistent shape or size; the list returned by quadratic() varies in its number of elements returned without any complaint from the network logic. Note, also, that the rich variety of Python data types must be reduced to the smaller set that XML-RPC itself happens to support. In particular, XML-RPC only supports a single sequence type: the list.
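The reduction of tuples to lists can be observed without a server by marshalling a value and immediately unmarshalling it. This is a small sketch that assumes the Python 3 module `xmlrpc.client`.

```python
import xmlrpc.client

# Encode a response whose single result value is a tuple...
payload = xmlrpc.client.dumps(((1, 2.0, 'three'),), methodresponse=True)

# ...and decode it again, as the receiving side would.
(result,), method_name = xmlrpc.client.loads(payload)
print(type(result).__name__, result)
```

The decoded value is a list, because the wire format has no way to record that the sender started with a tuple.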

Thus far we have covered the general features and restrictions of XML-RPC. If you consult the documentation for either the client or the server module in the Standard Library, you can learn about a few more features. In particular, you can learn how to use TLS and authentication by supplying more arguments to the ServerProxy class. But one feature is important enough to go ahead and cover here: the ability to make several calls in a network round-trip when the server supports it, as shown in xmlrpc_multicall.py.

import xmlrpclib
proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')
multicall = xmlrpclib.MultiCall(proxy)
multicall.addtogether('a', 'b', 'c')
multicall.quadratic(2, -4, 0)
multicall.remote_repr([1, 2.0, 'three'])
for answer in multicall():
    print answer

When you run this script, you can carefully watch the server’s command window to confirm that only a single HTTP request is made in order to answer all three function calls that get made.
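Instead of watching the server window, the request count can also be checked in code. The following Python 3 sketch is an illustration under assumptions: `CountingHandler` and `post_count` are names invented here, the server runs in a background thread on an arbitrary free port, and only one sample function is registered.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler

post_count = 0

class CountingHandler(SimpleXMLRPCRequestHandler):
    """Request handler that counts how many HTTP POSTs reach the server."""
    def do_POST(self):
        global post_count
        post_count += 1
        super().do_POST()

def addtogether(*things):
    total = things[0]
    for thing in things[1:]:
        total = total + thing
    return total

server = SimpleXMLRPCServer(('127.0.0.1', 0), requestHandler=CountingHandler,
                            logRequests=False)
server.register_multicall_functions()
server.register_function(addtogether)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:%d' % server.server_address[1])
multicall = xmlrpc.client.MultiCall(proxy)
multicall.addtogether('a', 'b', 'c')   # queued locally, nothing is sent yet
multicall.addtogether(20, 30, 4, 1)    # queued as well
answers = list(multicall())            # one request carries both calls
print(answers, post_count)

server.shutdown()
server.server_close()
```

Both answers come back in order, and the counter shows that a single POST was enough to deliver them.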

Three final points are worth mentioning before we move on to examining another RPC mechanism:

There are two additional data types that sometimes prove hard to live without, so many XML-RPC mechanisms support them: dates and the value that Python calls None. Python’s client and server both support options that will enable the transmission and reception of these nonstandard types.

Keyword arguments are, alas, not supported by XML-RPC, because few languages are sophisticated enough to include them and XML-RPC wants to interoperate with those languages. Some services get around this by allowing a dictionary to be passed as a function’s final argument.

Finally, keep in mind that dictionaries can only be passed if all of their keys are strings, whether normal or Unicode. See the “Self-documenting Data” section later in this chapter for more information on how to think about this restriction.
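Two of these points are easy to check locally. The sketch below assumes Python 3's `xmlrpc.client`, where the relevant options are the `allow_none` argument of `dumps()` and the `use_builtin_types` argument of `loads()`; the sample date is arbitrary.

```python
import datetime
import xmlrpc.client

# None is refused unless the extension is explicitly enabled.
try:
    xmlrpc.client.dumps((None,))
    none_rejected = False
except TypeError:
    none_rejected = True
with_nil = xmlrpc.client.dumps((None,), allow_none=True)

# Dictionary keys that are not strings are refused as well.
try:
    xmlrpc.client.dumps(({1: 'H'},))
    int_keys_rejected = False
except TypeError:
    int_keys_rejected = True

# A datetime can make the round trip when built-in types are requested.
when = datetime.datetime(2017, 1, 4, 12, 0, 0)
(decoded,), _ = xmlrpc.client.loads(xmlrpc.client.dumps((when,)),
                                    use_builtin_types=True)
print(none_rejected, int_keys_rejected, decoded)
```

A service that must accept optional values therefore has to agree on these options at both ends, since a peer written in another language may not understand the extension.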

Page 226: Python Networking Gitbook

The bright idea behind JSON is to serialize data structures to strings that use the syntax of the JavaScript programming language. This means that JSON strings can be turned back into data in a web browser simply by using the eval() function. By using a syntax specifically designed for data rather than adapting a verbose document markup language like XML, this remote procedure call mechanism can make your data much more compact while simultaneously simplifying your parsers and library code.
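The compactness claim can be checked with the Python 3 standard library alone by encoding the same value both ways. This is only a rough comparison on one made-up sample value; the exact sizes depend on the data.

```python
import json
import xmlrpc.client

data = {'name': 'Arthur', 'data': {'age': 42, 'sex': 'M'},
        'scores': [1, 2.0, 'three']}

as_json = json.dumps(data)
as_xml = xmlrpc.client.dumps((data,), methodresponse=True)

print(len(as_json), 'characters as JSON')
print(len(as_xml), 'characters as XML-RPC')
```

In Python the safe way to turn the string back into data is json.loads() rather than eval(), since eval() will execute whatever code the string contains.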

JSON-RPC is not supported in the Python Standard Library, so you will have to choose one of the several third-party distributions available. You can find these distributions on the Python Package Index. My own favorite is lovely.jsonrpc. If you install it in a virtual environment, then you can try out the server and client shown in Listings jsonrpc_server.py and jsonrpc_client.py.

from wsgiref.simple_server import make_server
import lovely.jsonrpc.dispatcher, lovely.jsonrpc.wsgi

def lengths(*args):
    results = []
    for arg in args:
        try:
            arglen = len(arg)
        except TypeError:
            arglen = None
        results.append((arglen, arg))
    return results

dispatcher = lovely.jsonrpc.dispatcher.JSONRPCDispatcher()
dispatcher.register_method(lengths)
app = lovely.jsonrpc.wsgi.WSGIJSONRPCApplication({'': dispatcher})
server = make_server('localhost', 7002, app)
print "Starting server"
while True:
    server.handle_request()

The server code is quite simple, as an RPC mechanism should be. As with XML-RPC, we merely need to name the functions that we want offered over the network, and they become available for queries.

from lovely.jsonrpc import proxy
proxy = proxy.ServerProxy('http://localhost:7002')
print proxy.lengths((1, 2, 3), 27, {'Sirius': -1.46, 'Rigel': 0.12})

First, note that the protocol allowed us to send as many arguments as we wanted; it was not bothered by the fact that it could not introspect a static method signature from our function. Second, note that the None value in the server’s reply passes back to us unhindered.

root@erlerobot:~/Python_files# python jsonrpc_server.py
Starting server
[In another command window:]
$ python jsonrpc_client.py
[[3, [1, 2, 3]], [None, 27], [2, {'Rigel': 0.12, 'Sirius': -1.46}]]

JSON-RPC

Page 227: Python Networking Gitbook

You have just seen that both XML-RPC and JSON-RPC appear to support a data structure very much like a Python dictionary, but with an annoying limitation. In XML-RPC, the data structure is called a struct, whereas JSON calls it an object. To the Python programmer, however, it looks like a dictionary, and your first reaction will probably be annoyance that its keys cannot be integers, floats, or tuples. Let us look at a concrete example. Imagine that you have a dictionary of physical element symbols indexed by their atomic number:

{1: 'H', 2: 'He', 3: 'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O'}

You might want to transmit this dictionary over an RPC mechanism exactly as it stands. Simply put, however, the struct and object RPC data structures are not designed to pair keys with values in containers of an arbitrary size. Instead, they are designed to associate a small set of pre-defined attribute names with the attribute values that they happen to carry for some particular object. If you try to use a struct to pair random keys and values, you might inadvertently make it very difficult to use for people unfortunate enough to be using statically-typed programming languages. Instead, you should think of dictionaries being sent across RPCs as being like the __dict__ attributes of your Python objects, which you should generally not find yourself using to associate an arbitrary set of keys with values.

All of this means that the dictionary that I showed a few moments ago should actually be serialized as a list of explicitly labelled values if it is going to be used by a general-purpose RPC mechanism:

[{'number': 1, 'symbol': 'H'},
 {'number': 2, 'symbol': 'He'},
 {'number': 3, 'symbol': 'Li'},
 {'number': 4, 'symbol': 'Be'},
 {'number': 5, 'symbol': 'B'},
 {'number': 6, 'symbol': 'C'},
 {'number': 7, 'symbol': 'N'},
 {'number': 8, 'symbol': 'O'}]

Note that the preceding examples show the Python data structure as you will pass it into your RPC call, not the way it would be represented on the wire.

If you have a Python dictionary like the one we are discussing here, you can turn it into an RPC-appropriate data structure, and then change it back with code like this:

>>> elements = {1: 'H', 2: 'He'}
>>> t = [{'number': key, 'symbol': elements[key]} for key in elements]
>>> t
[{'symbol': 'H', 'number': 1}, {'symbol': 'He', 'number': 2}]
>>> dict((obj['number'], obj['symbol']) for obj in t)
{1: 'H', 2: 'He'}

Using named tuples might be an even better way to marshal such values before sending them if you find yourself creating and destroying too many dictionaries to make this transformation appealing.
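One possible shape for the named-tuple approach is sketched below in Python 3. The `Element` type and its field names are assumptions chosen to match the labelled records shown earlier; nothing here is required by any RPC library.

```python
from collections import namedtuple

# A record type whose field names double as the keys sent over RPC.
Element = namedtuple('Element', ['number', 'symbol'])

elements = {1: 'H', 2: 'He'}
records = [Element(number, symbol) for number, symbol in sorted(elements.items())]

# Convert to plain dictionaries only at the moment of sending...
wire = [dict(record._asdict()) for record in records]

# ...and rebuild named tuples, then the original mapping, on receipt.
received = [Element(**item) for item in wire]
restored = {record.number: record.symbol for record in received}
print(wire, restored)
```

Inside the program the records stay light and readable (`record.symbol`), and the dictionaries exist only at the boundary where the RPC layer needs them.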

Self-documenting Data

Page 228: Python Networking Gitbook

If the idea of RPC was to make remote function calls look like local ones, then the two basic RPC mechanisms we have looked at actually fail pretty spectacularly. If the functions we were calling happened to only use basic data types in their arguments and return values, then XML-RPC and JSON-RPC would work fine. But think of all of the occasions when you use more complex parameters and return values instead! What happens when you need to pass live objects?

When all you have are Python programs that need to talk to each other, there is at least one excellent reason to look for an RPC service that knows about Python objects and their ways: Python has a number of very powerful data types, so it can simply be unreasonable to try “talking down” to the dialect of limited data formats like XML-RPC and JSON-RPC. This is especially true when Python dictionaries, sets, and datetime objects would express exactly what you want to say. There are two Python-native RPC systems that we should mention: Pyro and RPyC. The Pyro project lives here: http://ww.xs4all.nl/~irmen/pyro3/

This well-established RPC library is built on top of the Python pickle module, and it can send any kind of argument and response value that is inherently pickle-able. Basically, this means that, if an object (and its attributes) can be reduced to its basic types, then it can be transmitted. However, if the values you want to send or receive are ones that the pickle module chokes on, then Pyro will not work for your situation. The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy.
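The difference between pickle-able and non-pickle-able values can be seen with the standard library alone. The sketch below uses Python 3 and only the pickle module itself, not Pyro; the sample values are made up, and a thread lock stands in for the kind of object that cannot be reduced to basic types.

```python
import datetime
import pickle
import threading

# Sets, dates, and dictionaries with non-string keys all survive pickling.
value = {'when': datetime.date(2017, 1, 4), 'tags': {'ftp', 'rpc'}, 1: 'H'}
restored = pickle.loads(pickle.dumps(value))

# A lock is tied to the running process, so pickle refuses it.
try:
    pickle.dumps(threading.Lock())
    lock_pickled = True
except TypeError:
    lock_pickled = False

print(restored == value, lock_pickled)
```

Note that the dictionary above carries an integer key and a set, two things that neither XML-RPC nor JSON-RPC would accept as they stand.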

Talking About Objects: Pyro and RPyC

Page 229: Python Networking Gitbook

The RPyC project lives here: http://rpyc.wikidot.com/

This project takes a much more sophisticated approach toward objects. Indeed, what actually gets passed across the network is a reference to an object, which the receiver can use to call back and invoke more of the object’s methods later if it needs to. The most recent version also seems to have put more thought into security, which is important if you are letting other organizations use your RPC mechanism. After all, if you let someone give you some data to un-pickle, you are essentially letting them run arbitrary code on your computer.

You can see an example client and server in Listings rpyc_client.py and rpyc_server.py. If you want an example of the incredible kinds of things that a system like RPyC makes possible, you should study these listings closely.

import rpyc

def noisy(string):
    print 'Noisy:', repr(string)

proxy = rpyc.connect('localhost', 18861, config={'allow_public_attrs': True})
fileobj = open('testfile.txt')
linecount = proxy.root.line_counter(fileobj, noisy)
print 'The number of lines in the file was', linecount

At first the client might look like a rather standard program using an RPC service. After all, it calls a generically-named connect() function with a network address, and then accesses methods of the returned proxy object as though the calls were being performed locally.

The server exposes a single method that takes the proffered file object and callable function. It uses these exactly as you would in a normal Python program that was happening inside a single process. It calls the file object’s readlines() and expects the return value to be an iterator over which a for loop can repeat. Finally, the server calls the function object that has been passed in without any regard for where the function actually lives (namely, in the client).

import rpyc

class MyService(rpyc.Service):
    def exposed_line_counter(self, fileobj, function):
        for linenum, line in enumerate(fileobj.readlines()):
            function(line)
        return linenum + 1

from rpyc.utils.server import ThreadedServer
t = ThreadedServer(MyService, port = 18861)
t.start()

It is especially instructive to look at the output generated by running the client, assuming that a small testfile.txt indeed exists in the current directory and that it has a few words of wisdom inside:

root@erlerobot:~/Python_files# python rpyc_client.py
Noisy: 'Simple\n'
Noisy: 'is\n'
Noisy: 'better\n'
Noisy: 'than\n'
Noisy: 'complex.\n'
The number of lines in the file was 5

Equally startling here are two facts. First, the server was able to iterate over multiple results from readlines(), even though this required the repeated invocation of file-object logic that lived on the client. Second, the server didn’t somehow copy the noisy() function’s code object so it could run the function directly; instead, it repeatedly invoked the function, with the correct argument each time, on the client side of the connection.

An RPyC Example

Page 230: Python Networking Gitbook

RPyC takes exactly the opposite approach from the other RPC mechanisms we have looked at. Whereas all of the other techniques try to serialize and send as much information across the network as possible, and then leave the remote code to either succeed or fail with no further information from the client, the RPyC scheme only serializes completely immutable items such as Python integers, floats, strings, and tuples. For everything else, it passes across an object name that lets the remote side reach back into the client to access attributes and invoke methods on those live objects.

Page 231: Python Networking Gitbook

Be willing to explore alternative transmission mechanisms for your work with RPC services. The classes provided in the Python Standard Library for XML-RPC, for example, are not even used by many Python programmers who need to speak that protocol.

There are three useful ways that you can look into moving beyond overly simple example code that makes it look as though you have to bring up a new web server for every RPC service you want to make available from a particular site.

First, look into whether you can use the pluggability of WSGI to let you install an RPC service that you have incorporated into a larger web project that you are deploying. Implementing both your normal web application and your RPC service as WSGI servers beneath a filter that checks the incoming URL enables you to allow both services to live at the same hostname and port number.

Second, instead of using a dedicated RPC library, you may find that your web framework of choice already knows how to host an XML-RPC, JSON-RPC, or some other flavor of RPC call.

Third, you might want to try sending RPC messages over an alternate transport that does a better job than the protocol’s native transport of routing the calls to servers that are ready to handle them. Message queues are often an excellent vehicle for RPC calls when you want a whole rack of servers to stay busy sharing the load of incoming requests.
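The first suggestion, a filter that checks the incoming URL, might look roughly like the Python 3 sketch below. Everything in it is hypothetical: `web_app` and `rpc_app` are placeholders for a real web application and a real RPC WSGI application, and the `/rpc` prefix is an arbitrary choice.

```python
from wsgiref.util import setup_testing_defaults

def web_app(environ, start_response):
    """Placeholder for the ordinary web application."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ordinary web page']

def rpc_app(environ, start_response):
    """Placeholder where an XML-RPC or JSON-RPC WSGI application would go."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'rpc endpoint']

def dispatch(environ, start_response):
    """Route by URL path so both applications share one host and port."""
    path = environ.get('PATH_INFO', '')
    if path == '/rpc' or path.startswith('/rpc/'):
        return rpc_app(environ, start_response)
    return web_app(environ, start_response)

def call(path):
    """Invoke the dispatcher with a minimal fake request for `path`."""
    environ = {}
    setup_testing_defaults(environ)
    environ['PATH_INFO'] = path
    return b''.join(dispatch(environ, lambda status, headers: None))

print(call('/rpc'), call('/index.html'))
```

Because `dispatch` is itself a WSGI application, it can be handed to any WSGI server in place of either of the two applications it wraps.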

Of course, there is one reality of life on the network that RPC services cannot easily hide: the network can be down or even go down in the middle of a particular RPC call. You will find that most RPC mechanisms simply raise an exception if a call is interrupted and does not complete. Note that an error, unfortunately, is no guarantee that the remote end did not process the request; maybe it actually did finish processing it, but then the network went down right as the last packet of the reply was being sent. In this case, your call would have technically happened and the data would have been successfully added to the database or written to a file or whatever the RPC call does. However, you will think the call failed and want to try it again, possibly storing the same data twice.
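One common way to make such retries harmless, which the text above does not itself prescribe, is to tag each logical call with a unique request ID and have the server remember which IDs it has already processed. The Python 3 sketch below simulates this entirely in memory; `Server`, `FlakyLink`, and `store_with_retry` are hypothetical names, and the "network" is a class that loses the first reply on purpose.

```python
import uuid

class Server:
    """Stores rows, remembering request IDs so repeats are not stored twice."""
    def __init__(self):
        self.rows = []
        self.seen = {}
    def store(self, request_id, row):
        if request_id in self.seen:       # a retry of a call already handled
            return self.seen[request_id]
        self.rows.append(row)
        self.seen[request_id] = len(self.rows)
        return self.seen[request_id]

class FlakyLink:
    """Simulated network: delivers every call but loses the first reply."""
    def __init__(self, server):
        self.server = server
        self.replies_to_drop = 1
    def store(self, request_id, row):
        result = self.server.store(request_id, row)
        if self.replies_to_drop:
            self.replies_to_drop -= 1
            raise ConnectionError('reply lost')
        return result

def store_with_retry(link, row, attempts=3):
    request_id = str(uuid.uuid4())        # same ID reused for every attempt
    for _ in range(attempts):
        try:
            return link.store(request_id, row)
        except ConnectionError:
            continue
    raise ConnectionError('gave up after %d attempts' % attempts)

server = Server()
result = store_with_retry(FlakyLink(server), 'some data')
print(result, server.rows)
```

The first attempt stores the row but its reply is lost; the retry carries the same ID, so the server returns the remembered result instead of storing the row again.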

It is possible you will want both features: a compact and efficient binary format and support across several different languages. Here are a few options:

Some JSON-RPC libraries support the BSON protocol, which provides a tight binary transport format and also an expanded range of data types beyond those supported by JSON.

The Apache Foundation is now incubating Thrift, an RPC system developed several years ago at Facebook and released as open source.

Google Protocol Buffers are popular with many programmers, but strictly speaking they are not a full RPC system; instead, they are a binary data serialization protocol.

RPC, Web Frameworks, Message Queues

Recovering From Network Errors

Binary Options: Thrift and Protocol Buffers