python networking gitbook
Post on 04-Jan-2017
Table of Contents

1. Introduction
2. Introduction to Client/Server Networking
   i. Virtualenv
   ii. Installing virtualenv in Erle
   iii. Create a virtual environment to test packages
3. Introduction to socket
   i. What is socket?
   ii. Creating a Socket
   iii. Using sockets
   iv. Disconnecting
   v. Non-blocking sockets
4. UDP and TCP
   i. Addresses and port numbers
   ii. UDP
      i. How UDP works?
      ii. When to use UDP
      iii. Socket (UDP)
      iv. Unreliability, Backoff, Blocking, Timeouts
      v. Connecting UDP Sockets
      vi. Binding to Interfaces (UDP)
      vii. UDP Fragmentation
      viii. Socket Options
   iii. TCP
      i. How TCP works?
      ii. When to use TCP
      iii. What TCP Sockets Mean
      iv. A Simple TCP Client and Server
      v. Binding to Interfaces (TCP)
      vi. Deadlock
      vii. Closed Connections, Half-Open Connections
      viii. Using TCP Streams like Files
5. Socket names and DNS
   i. Socket names
   ii. Five socket coordinates
   iii. IPv6
   iv. The getaddrinfo() function
      i. Asking getaddrinfo() Where to Bind
      ii. Asking getaddrinfo() About Services
      iii. Asking getaddrinfo() for Pretty Hostnames
      iv. Other getaddrinfo() Flags
      v. getaddrinfo() in your own code
   v. A Sketch of How DNS Works
   vi. Using DNS
6. Network Data and Network Errors
   i. Text and Encodings
   ii. Network Byte Order
   iii. Framing and Quoting
   iv. Pickles and Self-Delimiting Formats
   v. XML, JSON, Etc.
   vi. Compression
   vii. Network Exceptions
   viii. Handling Exceptions
7. TLS and SSL
   i. Cleartext on the Network
   ii. TLS Encrypts Your Conversations
   iii. Supporting TLS in Python
   iv. The Standard SSL Module
8. Server Architecture
   i. Daemons and Logging
   ii. Introductory example
   iii. Elementary client
   iv. Event-Driven Servers
   v. The Semantics of Non-blocking
   vi. Twisted Python
   vii. Threading and Multi-processing
   viii. Threading and Multi-processing Frameworks
9. Caches, Message Queues, and Map-Reduce
   i. Using Memcached
   ii. Memcached and Sharding
   iii. Message Queues
   iv. Using Message Queues from Python
   v. Map-Reduce
10. HTTP
   i. URL Anatomy
   ii. Relative URLs
   iii. Instrumenting urllib2
   iv. The GET Method and The Host Header
   v. Payloads and Persistent Connections
   vi. POST And Forms
   vii. REST And More HTTP Methods
   viii. Identifying User Agents and Web Servers
   ix. Content Type Negotiation
   x. Compression
   xi. HTTP Caching
   xii. The HEAD Method
   xiii. HTTPS Encryption
   xiv. HTTP Authentication
   xv. Cookies
   xvi. HTTP Session Hijacking
   xvii. Cross-Site Scripting Attacks
11. Screen Scraping
   i. Fetching Web Pages
   ii. Downloading Pages Through Form Submission
   iii. The Structure of Web Pages
   iv. Three Axes
   v. Diving into an HTML Document
   vi. Selectors
12. Web Applications
   i. Web Servers and Python
   ii. Choosing a Web Server
   iii. WSGI
   iv. WSGI Middleware
   v. Python Web Frameworks
   vi. URL Dispatch Techniques
   vii. Templates
   viii. Pure-Python Web Servers
   ix. Common Gateway Interface (CGI)
   x. mod_python
13. E-mail Composition and Decoding
   i. E-mail Messages
   ii. Composing Traditional Messages
   iii. Parsing Traditional Messages
   iv. Parsing Dates
   v. Understanding MIME
   vi. Composing MIME Attachments
   vii. MIME Alternative Parts
   viii. Composing Non-English Headers
   ix. Composing Nested Multiparts
   x. Parsing MIME Messages
   xi. Decoding Headers
14. Simple Mail Transport Protocol (SMTP)
   i. E-mail Clients, Webmail Services
   ii. How SMTP Is Used
   iii. Sending E-Mail
   iv. Introducing the SMTP Library
   v. Error Handling and Conversation Debugging
   vi. Getting Information from EHLO
   vii. Using Secure Sockets Layer and Transport Layer Security
   viii. Authenticated SMTP
15. Post Office Protocol (POP)
   i. Connecting and Authenticating
   ii. Obtaining Mailbox Information
   iii. Downloading and Deleting Messages
16. Internet Message Access Protocol (IMAP)
   i. Understanding IMAP in Python
   ii. IMAPClient
   iii. Message Numbers vs. UIDs
   iv. Summary Information
   v. Downloading an Entire Mailbox
   vi. Downloading Messages Individually
   vii. Flagging and Deleting Messages
   viii. Searching and Manipulating Messages
17. Telnet and SSH
   i. Command-Line Automation
   ii. Command-Line Expansion and Quoting
   iii. Unix Has No Special Characters
   iv. Quoting Characters for Protection
   v. Things Are Different in a Terminal
   vi. Terminals Do Buffering
   vii. Telnet
   viii. SSH: The Secure Shell
   ix. SSH Host Keys
   x. SSH Authentication
   xi. Shell Sessions and Individual Commands
   xii. SFTP: File Transfer Over SSH
18. File Transfer Protocol (FTP)
   i. What to Use Instead of FTP
   ii. Communication Channels
   iii. Using FTP in Python
   iv. ASCII and Binary Files
   v. Advanced Binary Downloading
   vi. Uploading Data
   vii. Advanced Binary Uploading
   viii. Handling Errors
   ix. Detecting Directories and Recursive Download
   x. Creating Directories, Deleting Things
19. Remote Procedure Call (RPC)
   i. Features of RPC
   ii. XML-RPC
   iii. JSON-RPC
   iv. Self-documenting Data
   v. Talking About Objects: Pyro and RPyC
   vi. An RPyC Example
   vii. RPC, Web Frameworks, Message Queues
This book teaches the reader about Python networking in Linux, using Erle Robotics autopilots. Erle Robotics creates small-size Linux computers for making drones.
By Python networking we mean using this programming language to control incoming and outgoing connections and to work with different protocols such as IP.
For years we have been working in the robotics field, particularly with drones. We have passed through different universities and research centers, and in all these places we found that most drones are black boxes (check out our 60s pitch), not meant to be used for learning or research. The software they use is in most cases unknown, closed source, or undocumented. Given these conditions, how are we going to educate the next generations on these technologies? How do you get started programming drones if you don't have a $1000+ budget? Which platform allows me to get started with drones without risking a hand?
We are coming up with an answer to all these questions, our technology: Erle.
Erle Robotics: Python Networking Programming
Book
About
Inspired by the BeagleBone development board, we have designed a small computer with about 36+ sensors, plenty of I/O and processing power for real-time analysis. Erle is the enabling technology for the next generation of aerial and terrestrial robots that will be used in cities solving tasks such as surveillance, environmental monitoring or even providing aid at catastrophes.
Our small-size Linux computer is bringing robotics to the people and businesses.
This book is based on different Linux documentation available on the internet. Refer to the sources for the corresponding licenses:
Python Documentation
Python Standard Library
Python Package Index
All Python releases are Open Source (see link for the Open Source Definition).
Foundations of Python Network Programming by Brandon Rhodes and John Goerzen
Unless specified, this content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
All derivative works are to be attributed to Silvia Núñez Rivero of Erle Robotics S.L.
For any questions, concerns, or issues, submit them to support [at] erlerobot.com.
License
This chapter is about network programming with the Python language: about accomplishing a specific set of tasks that all involve a particular technology, computer networks, using a general-purpose programming language that can do all sorts of things.
From now on, we will frequently use:
Python Standard Library documentation
Python Package Index
Introduction to Client/Server Networking
A common situation is that you find a Python package that sounds like it might already do exactly what you want, and you want to try it out on your system. For this you should be introduced to the very best Python technology for quickly trying out a new library: virtualenv.
In the old days, installing a Python package was a gruesome and irreversible act that required administrative privileges on your machine and left your system Python install permanently altered.
Careful Python programmers do not suffer from this situation any longer. Many of them install only one Python package system-wide: virtualenv. Once virtualenv is installed, you have the power to create any number of small, self-contained “virtual Python environments” where packages can be installed, un-installed, and experimented with without contaminating your system-wide Python. When a particular project or experiment is over, you simply remove its virtual environment directory, and your system is clean.
Virtualenv
This is the official website of virtualenv, where you can find information about installation and usage.
If you are connected to the Internet from Erle (by using a wireless nano USB adapter), then you only need to type:
    root@erlerobot:~# pip install virtualenv
If not, the process is a bit more tedious:
First of all, you need to download virtualenv from here. Download the file called virtualenv-1.11.6.tar.gz (md5, pgp) to your PC. Then copy it to the Erle board; you can find in this tutorial how to do it. Once you have copied it, type:
    root@erlerobot:~# tar xvfz virtualenv-1.11.6.tar.gz
    root@erlerobot:~# cd virtualenv-1.11.6
    root@erlerobot:~# python setup.py install
Congratulations, you are now ready to use it!
Installing virtualenv in Erle
We are now going to use virtualenv to create a new environment and install the googlemaps package in it. You can read more about this package here.
Now type the following:
    root@erlerobot:~# virtualenv --no-site-packages gmapenv
    New python executable in gmapenv/bin/python
    Installing setuptools, pip...done.
    root@erlerobot:~#
    root@erlerobot:~# cd gmapenv
    root@erlerobot:~/gmapenv# ls
    bin include lib local
    root@erlerobot:~/gmapenv# . bin/activate
    (gmapenv)root@erlerobot:~/gmapenv# python -c 'import googlemaps'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named googlemaps
    (gmapenv)root@erlerobot:~/gmapenv#
As you can see, the googlemaps package is not yet available. To install it, use the pip command that is inside your virtualenv and that is now on your path thanks to the activate command that you ran:
    (gmapenv)root@erlerobot:~/gmapenv# pip install googlemaps
    Downloading/unpacking googlemaps
      Downloading googlemaps-1.0.2.tar.gz (60Kb): 60Kb downloaded
      Running setup.py egg_info for package googlemaps
    Installing collected packages: googlemaps
      Running setup.py install for googlemaps
    Successfully installed googlemaps
    Cleaning up...
The python binary inside the virtualenv will now have the googlemaps package available:
    (gmapenv)root@erlerobot:~/gmapenv# python -c 'import googlemaps'
When you install a package, be careful: it must be suitable for the Erle architecture.
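When several environments exist side by side, it can help to confirm from inside Python which one is active. The following is a minimal sketch, assuming Python 3; the helper name in_virtualenv is hypothetical and not part of virtualenv itself.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the standard library's
    # venv module and newer virtualenv releases instead make sys.base_prefix
    # differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print("interpreter prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```

Run with the python binary inside gmapenv/bin, the prefix printed should point at the gmapenv directory rather than at the system-wide installation.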
Create a virtual environment to test packages
We will use sockets a lot in future chapters. Thus, this chapter's aim is to introduce you to the basic concepts of sockets.
Introduction to socket
Rather than trying to invent its own API for doing networking, Python made an interesting decision: it simply provides a slightly object-based interface to all of the normal, gritty, low-level operating system calls that are normally used to accomplish networking tasks on POSIX-compliant operating systems.
So, Python exposes the normal POSIX calls for raw UDP and TCP connections rather than trying to invent any of its own. And the normal POSIX networking calls operate around a central concept called a socket.
That means that communication between different entities on a network is, in Python, based on the classic concept of sockets. A socket is an abstract concept that designates the endpoint of a connection. Programs use sockets to communicate with other programs, which may be located on different computers. A socket is defined by the IP address of the machine, the port on which it listens, and the protocol used.
Moreover, if you have ever worked with POSIX before, you will probably have run across the fact that instead of making you repeat a file name over and over again, the calls let you use the file name to create a “file descriptor” that represents a connection to the file, and through which you can access the file until you are done working with it. Sockets provide the same idea for the networking realm: when you ask for access to a line of communication (like a UDP port, as we are about to see) you create one of these abstract “socket” objects and then ask for it to be bound to the port you want to use. If the binding is successful, then the socket “holds on to” that port number for you until you close the socket.
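The file-descriptor analogy can be observed directly. The following is a minimal sketch, assuming Python 3 and a machine with a loopback interface; binding to port 0 is a choice made for this example so that the operating system picks any free port.

```python
import socket

# Create a UDP socket; the operating system hands back a descriptor for it.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
descriptor = sock.fileno()

# Ask for the socket to be bound; port 0 means "any free port".
sock.bind(("127.0.0.1", 0))
address, port = sock.getsockname()
print("descriptor", descriptor, "holds", address, "port", port)

# Closing the socket releases both the port and the descriptor.
sock.close()
closed_descriptor = sock.fileno()  # -1 once the socket is closed
```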
You should also be aware that part of the trouble with understanding these things is that “socket” can mean a number of subtly different things, depending on context. So first, let’s make a distinction between a “client” socket, an endpoint of a conversation, and a “server” socket, which is more like a switchboard operator. The client application (your browser, for example) uses “client” sockets exclusively; the web server it’s talking to uses both “server” sockets and “client” sockets.
From the Python documentation we can extract more information about the socket module.
What is socket?
Roughly speaking, when you clicked on the link that brought you to this page, your browser did something like the following:
    import socket

    # create an INET, STREAMing socket
    s = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80
    # - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))
When the connect completes, the socket s can be used to send in a request for the text of the page. The same socket will read the reply, and then be destroyed. That’s right, destroyed. Client sockets are normally only used for one exchange (or a small set of sequential exchanges).
What happens in the web server is a bit more complex. First, the web server creates a “server socket”:
    # create an INET, STREAMing socket
    serversocket = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host,
    # and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)
A couple of things to notice: we used socket.gethostname() so that the socket would be visible to the outside world. If we had used s.bind(('localhost', 80)) or s.bind(('127.0.0.1', 80)) we would still have a “server” socket, but one that was only visible within the same machine. s.bind(('', 80)) specifies that the socket is reachable by any address the machine happens to have.
A second thing to note: low number ports are usually reserved for “well known” services (HTTP, SNMP etc). If you’re playing around, use a nice high number (4 digits).
Finally, the argument to listen tells the socket library that we want it to queue up as many as 5 connect requests (the normal max) before refusing outside connections. If the rest of the code is written properly, that should be plenty.
Now that we have a “server” socket, listening on port 80, we can enter the main loop of the web server:
    while 1:
        # accept connections from outside
        (clientsocket, address) = serversocket.accept()
        # now do something with the clientsocket
        # in this case, we'll pretend this is a threaded server
        # (client_thread is a placeholder that is not defined here)
        ct = client_thread(clientsocket)
        ct.run()
There are actually 3 general ways in which this loop could work: dispatching a thread to handle clientsocket, creating a new process to handle clientsocket, or restructuring this app to use non-blocking sockets and multiplexing between our “server” socket and any active clientsockets using select. The important thing to understand now is this: this is all a “server” socket does. It doesn’t send any data. It doesn’t receive any data. It just produces “client” sockets. Each clientsocket is created in response to some other “client” socket doing a connect() to the host and port we’re bound to. As soon as we’ve created that clientsocket, we go back to listening for more connections. The two “clients” are free to chat it up: they are using some dynamically allocated port which will be recycled when the conversation ends.
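The division of labour between a “server” socket and the “client” sockets it produces can be tried out on the loopback interface. The following is a minimal sketch, assuming Python 3 and the thread-per-client option; the names handle_client, serve and ask are hypothetical, and port 0 lets the operating system pick a free port instead of the privileged port 80.

```python
import socket
import threading


def handle_client(client_sock):
    """Talk to one client: read a request, send one reply, hang up."""
    with client_sock:
        # One recv is enough for the tiny messages used here;
        # real code must loop until the whole request has arrived.
        request = client_sock.recv(1024)
        client_sock.sendall(b"you said: " + request)


def serve(server_sock, connections):
    """Accept a fixed number of connections, one thread per client."""
    workers = []
    for _ in range(connections):
        # accept() returns a brand-new socket for each conversation.
        client_sock, address = server_sock.accept()
        worker = threading.Thread(target=handle_client, args=(client_sock,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def ask(port, message):
    """Connect as a client, send one message, read the reply until EOF."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
        return reply


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(5)
port = server_sock.getsockname()[1]

acceptor = threading.Thread(target=serve, args=(server_sock, 2))
acceptor.start()
replies = [ask(port, b"first"), ask(port, b"second")]
acceptor.join()
server_sock.close()
print(replies)
```

Note that the server socket itself never sends or receives a byte of application data; only the sockets returned by accept() do.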
Creating a Socket
The first thing to note is that the web browser’s “client” socket and the web server’s “client” socket are identical beasts. That is, this is a “peer to peer” conversation. Or to put it another way, as the designer, you will have to decide what the rules of etiquette are for a conversation. Normally, the connecting socket starts the conversation, by sending in a request, or perhaps a signon. But that’s a design decision, not a rule of sockets.
Now there are two sets of verbs to use for communication. You can use send() and recv(), or you can transform your client socket into a file-like beast and use read() and write(). I’m not going to talk about it here, except to warn you that you need to use flush on sockets. These are buffered “files”, and a common mistake is to write something, and then read for a reply. Without a flush in there, you may wait forever for the reply, because the request may still be in your output buffer.
Now we come to the major stumbling block of sockets: send() and recv() operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
When a recv() returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection.
A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That’s it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes.
But if you plan to reuse your socket for further transfers, you need to realize that there is no EOT on a socket. I repeat: if a socket send() or recv() returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv() forever, because the socket will not tell you that there’s nothing more to read (for now). Now if you think about that a bit, you’ll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours (but some ways are righter than others).
Assuming you don’t want to end the connection, the simplest solution is a fixed length message:
    import socket

    MSGLEN = 1024  # assumed fixed message length; the original listing leaves MSGLEN undefined

    class mysocket:
        '''demonstration class only
          - coded for clarity, not efficiency
        '''

        def __init__(self, sock=None):
            if sock is None:
                self.sock = socket.socket(
                    socket.AF_INET, socket.SOCK_STREAM)
            else:
                self.sock = sock

        def connect(self, host, port):
            self.sock.connect((host, port))

        def mysend(self, msg):
            totalsent = 0
            while totalsent < MSGLEN:
                sent = self.sock.send(msg[totalsent:])
                if sent == 0:
                    raise RuntimeError("socket connection broken")
                totalsent = totalsent + sent

        def myreceive(self):
            chunks = []
            bytes_recd = 0
            while bytes_recd < MSGLEN:
                chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
                if chunk == '':  # Python 2: recv() returns a str
                    raise RuntimeError("socket connection broken")
                chunks.append(chunk)
                bytes_recd = bytes_recd + len(chunk)
            return ''.join(chunks)
Using sockets
The sending code here is usable for almost any messaging scheme: in Python you send strings, and you can use len() to determine a string’s length (even if it has embedded \0 characters). It’s mostly the receiving code that gets more complex.
The easiest enhancement is to make the first character of the message an indicator of message type, and have the type determine the length. Now you have two recvs: the first to get (at least) that first character so you can look up the length, and the second in a loop to get the rest. If you decide to go the delimited route, you’ll be receiving in some arbitrary chunk size (4096 or 8192 is frequently a good match for network buffer sizes), and scanning what you’ve received for a delimiter.
One complication to be aware of: if your conversational protocol allows multiple messages to be sent back to back (without some kind of reply), and you pass recv() an arbitrary chunk size, you may end up reading the start of a following message. You’ll need to put that aside and hold on to it, until it’s needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more complex, because (believe it or not), you may not get all 5 characters in one recv. In playing around, you’ll get away with it; but in high network loads, your code will very quickly break unless you use two recv loops: the first to determine the length, the second to get the data part of the message. Nasty. This is also when you’ll discover that send does not always manage to get rid of everything in one pass. And despite having read this, you will eventually get bit by it!
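The two-loop approach to length-prefixed messages can be sketched as follows. This is a minimal illustration, assuming Python 3; it swaps the 5 numeric characters mentioned above for a 4-byte binary length packed with struct, and the helper names recv_exactly, send_message and recv_message are hypothetical. socket.socketpair() stands in for a real network connection so the example runs on one machine.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length in network byte order


def recv_exactly(sock, count):
    """Keep calling recv() until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(
                "socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send the length header followed by the payload; sendall() loops for us."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """First loop reads the length, second loop reads the data part."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


left, right = socket.socketpair()
send_message(left, b"first message")
send_message(left, b"")
send_message(left, b"third")
received = [recv_message(right) for _ in range(3)]
left.close()
right.close()
print(received)
```

Because each read asks only for the bytes still missing from the current message, the start of a following message is never consumed by accident.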
We will discuss the issue of framing (delimiting messages) in a later chapter: Network Data and Network Errors.
Strictly speaking, you’re supposed to use shutdown on a socket before you close it. The shutdown is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a close is the same as shutdown(); close(). So in most situations, an explicit shutdown is not needed.
One way to use shutdown effectively is in an HTTP-like exchange. The client sends a request and then does a shutdown(1). This tells the server “This client is done sending, but can still receive.” The server can detect “EOF” by a receive of 0 bytes. It can assume it has the complete request. The server sends a reply. If the send completes successfully then, indeed, the client was still receiving.
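The HTTP-like exchange described above can be sketched like this. It is a minimal illustration, assuming Python 3, with socket.socketpair() standing in for a real client/server connection; the request text and the helper read_until_eof are made up for the example.

```python
import socket


def read_until_eof(sock):
    """Collect data until recv() returns 0 bytes, i.e. the peer stopped sending."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


client, server = socket.socketpair()

# Client: send the request, then announce "done sending, still listening".
client.sendall(b"GET /status")
client.shutdown(socket.SHUT_WR)  # socket.SHUT_WR is the constant for shutdown(1)

# Server: the 0-byte receive marks the end of the request.
request = read_until_eof(server)
server.sendall(b"200 OK: " + request)
server.close()

# Client: the receiving half is still open, so the reply arrives.
reply = read_until_eof(client)
client.close()
print(reply)
```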
Python takes the automatic shutdown a step further, and says that when a socket is garbage collected, it will automatically do a close if it’s needed. But relying on this is a very bad habit. If your socket just disappears without doing a close, the socket at the other end may hang indefinitely, thinking you’re just being slow. So, it is strongly recommended that you close your sockets when you’re done.
Disconnecting
In Python, you use socket.setblocking(0) to make a socket non-blocking. You do this after creating the socket, but before using it. (Actually, if you’re nuts, you can switch back and forth.)
The major mechanical difference is that send(), recv(), connect() and accept() can return without having done anything. You have (of course) a number of choices. You can check return codes and error codes and generally drive yourself crazy. Your app will grow large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it right. Use select.
    ready_to_read, ready_to_write, in_error = \
        select.select(
            potential_readers,
            potential_writers,
            potential_errs,
            timeout)
You pass select three lists: the first contains all sockets that you might want to try reading; the second all the sockets you might want to try writing to, and the last (normally left empty) those that you want to check for errors. You should note that a socket can go into more than one list. The select call is blocking, but you can give it a timeout. This is generally a sensible thing to do: give it a nice long timeout (say a minute) unless you have good reason to do otherwise.
In return, you will get three lists. They contain the sockets that are actually readable, writable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in.
If a socket is in the output readable list, you can be as-close-to-certain-as-we-ever-get-in-this-business that a recv on that socket will return something. Same idea for the writable list. You’ll be able to send something. Maybe not all you want to, but something is better than nothing. (Actually, any reasonably healthy socket will return as writable: it just means outbound network buffer space is available.)
If you have a “server” socket, put it in the potential_readers list. If it comes out in the readable list, your accept will (almost certainly) work. If you have created a new socket to connect to someone else, put it in the potential_writers list. If it shows up in the writable list, you have a decent chance that it has connected.
One very nasty problem with select: if somewhere in those input lists of sockets is one which has died a nasty death, the select will fail. You then need to loop through every single damn socket in all those lists and do a select([sock],[],[],0) until you find the bad one. That timeout of 0 means it won’t take long, but it’s ugly.
Actually, select can be handy even with blocking sockets. It’s one way of determining whether you will block: the socket returns as readable when there’s something in the buffers. However, this still doesn’t help with the problem of determining whether the other end is done, or just busy with something else.
Portability alert: On Unix, select works with both sockets and files. Don’t try this on Windows. On Windows, select works with sockets only.
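The behaviour of a non-blocking socket and of select can be observed without a network. The following is a minimal sketch, assuming Python 3, where a non-blocking recv() with nothing to read raises BlockingIOError; socket.socketpair() provides two connected sockets in one process.

```python
import select
import socket

reader, writer = socket.socketpair()
reader.setblocking(False)

# With nothing in the buffer, a non-blocking recv() returns at once with an error.
try:
    reader.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

# A timeout of 0 makes select() poll: nothing is readable yet.
readable_before, _, _ = select.select([reader], [], [], 0)

writer.sendall(b"ping")

# Now the reader shows up in the readable list, so recv() will return data.
readable_after, _, _ = select.select([reader], [], [], 1.0)
data = reader.recv(1024) if readable_after else b""

# A healthy socket with free outbound buffer space is reported as writable.
_, writable, _ = select.select([], [writer], [], 0)

print(would_block, len(readable_before), data, len(writable))
reader.close()
writer.close()
```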
Non-blocking sockets
The two principal approaches when building atop IP are UDP and TCP.
The vast majority of applications today are built atop TCP, the Transmission Control Protocol, which offers ordered and reliable data streams between IP applications. A few protocols, usually with short, self-contained requests and responses, and simple clients that will not be annoyed if a request gets lost and they have to repeat it, choose UDP, the User Datagram Protocol.
These two methods are described in depth in this chapter, but for now take a quick look at the differences between them.
UDP and TCP
We are going to review a bit about these two topics:
The IP protocol assigns an IP address, which traditionally takes the form of a four-octet code like 18.9.22.69, to every machine connected to an IP network. In fact, it does a bit more than this: a machine with several network cards connected to the network will typically have a different IP address for each card, so that other hosts can choose the network over which they want to contact the machine. But even if an IP-connected machine has only one network card, it also has at least one other network address: the address 127.0.0.1 is how machines can connect to themselves. It serves as a stable name that each machine has for itself, one that stays the same as network cables are plugged and unplugged and as wireless signals come and go. And these IP addresses allow millions of different machines, using all sorts of different network hardware, to pass packets to each other over the fabric of an IP network.
But with UDP and TCP we now take a big step, and stop thinking about the routing needs of the network as a whole and start considering the needs of specific applications that are running on a particular machine. And the first thing we notice is that a single computer today can have many dozens of programs running on it at any given time, and many of these will want to use the network at the same moment. You might be checking e-mail with Thunderbird while a web page is downloading in Google Chrome, or installing a Python package with pip over the network while checking the status of a remote server with SSH. Somehow, all of those different and simultaneous conversations need to take place without interfering with each other. This problem is known as the need for multiplexing: the need for a single channel to be shared unambiguously by several different conversations.
You should also remember that when a program on your computer sends or receives data over the Internet, it sends that data to an IP address and a specific port on the remote computer, and receives the data on a usually random port on its own computer. If it uses the TCP protocol to send and receive the data, then it will connect and bind itself to a TCP port. If it uses the UDP protocol to send and receive data, it will use a UDP port.
Addresses and port numbers
Now we are going to focus on UDP (User Datagram Protocol).
UDP
The UDP scheme is really quite simple; an IP address and port are all that is necessary to direct a packet to its destination.
Imagine, for example, that you set up a DNS server (covered in the chapter Socket names and DNS) on one of your machines, with the IP address 192.168.1.9. To allow other computers to find the service, the server will ask the operating system for permission to take control of the UDP port with the standard DNS port number 53. Assuming that no process is already running that has claimed that port number, the DNS server will be granted that port.
Next, imagine that a client machine with the IP address 192.168.1.30 on your network is given the IP address of this new DNS server and wants to issue a query. It will craft a DNS query in memory, and then ask the operating system to send that block of data as a UDP packet. Since there will need to be some way to identify the client when the packet returns, and since the client has not explicitly requested a port number, the operating system assigns it a random one, say port 44137.
The packet will therefore wing its way toward port 53 with labels that identify its source as the IP address and UDP port number (here separated by a colon):
    192.168.1.30:44137
And it will give its destination as the following:
    192.168.1.9:53
This destination address, simple though it looks (just the number of a computer, and the number of a port), is everything that an IP network stack needs to guide this packet to its destination. The DNS server will receive the request from its operating system, along with the originating IP and port number. Once it has formulated a response, the DNS server will ask the operating system to send the response as a UDP packet to the IP address and UDP port number from which the request originally came. The reply packet will have the source and destination swapped from what they were in the original packet, and upon its arrival at the source machine, it will be delivered to the waiting client program.
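The swap of source and destination between request and reply can be observed on the loopback interface. The following is a minimal sketch, assuming Python 3; it does not speak DNS, the payloads are made up, and both sockets use ports picked by the operating system rather than port 53.

```python
import socket

# "Server": claims a UDP port (port 0 lets the OS pick a free one).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_address = server.getsockname()

# "Client": never asks for a port; the OS assigns one on the first send.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"query", server_address)
client_port = client.getsockname()[1]

# The server learns the return address from the packet itself...
request, request_source = server.recvfrom(65535)
# ...and replies to it, so source and destination are swapped.
server.sendto(b"answer to " + request, request_source)

reply, reply_source = client.recvfrom(65535)
print("request came from", request_source)
print("reply came from", reply_source)

client.close()
server.close()
```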
How UDP works?
So, the User Datagram Protocol, UDP, lets user-level programs send individual packets across an IP network. Typically, a client program sends a packet to a server, which then replies back using the return address built into every UDP packet. You might think that UDP would be very efficient for sending small messages. Actually, UDP is efficient only if your host ever only sends one message at a time, then waits for a response.
There are two good reasons to use UDP:
Because you are implementing a protocol that already exists, and it uses UDP.
Because unreliable subnet broadcast is a great pattern for your application, and UDP supports it perfectly.
When to use UDP
As we have seen, sockets make talking to arbitrary machines around the world unbelievably easy (at least compared to other schemes).
When you craft programs that accept port numbers from user input like the command line or configuration files, it is friendly to allow not just numeric port numbers but to let users type human-readable names for well-known ports. These names are standard, and are available through the getservbyname() call supported by Python’s standard socket module. If we want to ask where the Domain Name Service lives, we can find out this way:
    >>> import socket
    >>> socket.getservbyname('domain')
    53
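A program that accepts either form of port from its users might wrap getservbyname() as follows. This is a minimal sketch, assuming Python 3; the helper name parse_port and its error messages are hypothetical.

```python
import socket


def parse_port(text, protocol="udp"):
    """Turn user input such as '53' or 'domain' into an integer port number."""
    if text.isdigit():
        port = int(text)
    else:
        try:
            # Looks the name up in the system's services database.
            port = socket.getservbyname(text, protocol)
        except OSError:
            raise ValueError("unknown service name: %r" % text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


print(parse_port("1060"))
```

Whether a given name resolves depends on the services database of the machine the program runs on, which is why unknown names are reported as an ordinary ValueError here.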
Now examine the following code, which shows a simple server and client. You can see already that all sorts of operations are taking place that are drawn from the socket module in the Python Standard Library.
    # UDP client and server on localhost (Python 2)
    import socket, sys

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    MAX = 65535
    PORT = 1060

    if sys.argv[1:] == ['server']:
        s.bind(('127.0.0.1', PORT))
        print 'Listening at', s.getsockname()
        while True:
            data, address = s.recvfrom(MAX)
            print 'The client at', address, 'says', repr(data)
            s.sendto('Your data was %d bytes' % len(data), address)

    elif sys.argv[1:] == ['client']:
        print 'Address before sending:', s.getsockname()
        s.sendto('This is my message', ('127.0.0.1', PORT))
        print 'Address after sending', s.getsockname()
        data, address = s.recvfrom(MAX)  # overly promiscuous - see text!
        print 'The server', address, 'says', repr(data)

    else:
        print >>sys.stderr, 'usage: udp_local.py server|client'
When running it, you should get something similar to this:
    root@erlerobot:~/Python_files# python udp_local.py
    usage: udp_local.py server|client
Now try running the server first:
    root@erlerobot:~/Python_files# python udp_local.py server
    Listening at ('127.0.0.1', 1060)
And then, in a new Terminal window, the client:
    root@erlerobot:~/Python_files# python udp_local.py client
    Address before sending: ('0.0.0.0', 0)
    Address after sending ('0.0.0.0', 59726)
    The server ('127.0.0.1', 1060) says 'Your data was 18 bytes'
A new line will appear in the server window:
    The client at ('127.0.0.1', 59726) says 'This is my message'
Note that the Python program can always use a socket’s getsockname() method to retrieve the current IP and port to which the socket is bound. Once the socket has been bound successfully, the server is ready to start receiving requests! It enters a loop and repeatedly runs recvfrom(), telling the routine that it will happily receive messages up to a maximum length of MAX, which is equal to 65535 bytes. That value happens to be the greatest length that a UDP packet can possibly have, so we will always be shown the full content of each packet. Until we send a message with a client, our recvfrom() call will wait forever.
Socket (UDP)
The client and server in the previous section were both running on the same machine and talking through its loopback interface, which is not even a physical network card that could experience a signaling glitch and lose a packet, but merely a virtual connection back to the same machine deep in the network stack. There was therefore no real way that packets could get lost, and so we did not actually see any of the inconvenience of UDP.
The following client and server example can be run on two different machines on the Internet. And instead of always answering client requests, this server randomly chooses to answer only half of the requests coming in from clients, which will let us demonstrate how to build reliability into our client code without waiting what might be hours for a real dropped packet to occur.
    # UDP client and server for talking over the network (Python 2)
    import random, socket, sys

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    MAX = 65535
    PORT = 1060

    if 2 <= len(sys.argv) <= 3 and sys.argv[1] == 'server':
        interface = sys.argv[2] if len(sys.argv) > 2 else ''
        s.bind((interface, PORT))
        print 'Listening at', s.getsockname()
        while True:
            data, address = s.recvfrom(MAX)
            if random.randint(0, 1):
                print 'The client at', address, 'says:', repr(data)
                s.sendto('Your data was %d bytes' % len(data), address)
            else:
                print 'Pretending to drop packet from', address

    elif len(sys.argv) == 3 and sys.argv[1] == 'client':
        hostname = sys.argv[2]
        s.connect((hostname, PORT))
        print 'Client socket name is', s.getsockname()
        delay = 0.1
        while True:
            s.send('This is another message')
            print 'Waiting up to', delay, 'seconds for a reply'
            s.settimeout(delay)
            try:
                data = s.recv(MAX)
            except socket.timeout:
                delay *= 2  # wait even longer for the next request
                if delay > 2.0:
                    raise RuntimeError('I think the server is down')
            except:
                raise  # a real error, so we let the user see it
            else:
                break  # we are done, and can stop looping

        print 'The server says', repr(data)

    else:
        print >>sys.stderr, 'usage: udp_remote.py server [<interface>]'
        print >>sys.stderr, '   or: udp_remote.py client <host>'
        sys.exit(2)
Running the file by itself results in:
    root@erlerobot:~/Python_files# python udp_remote.py
    usage: udp_remote.py server [<interface>]
       or: udp_remote.py client <host>
Then run the server:
    root@erlerobot:~/Python_files# python udp_remote.py server
    Listening at ('0.0.0.0', 1060)
And now the client; remember to pass the hostname where the server script is being run (in this case the same machine):
    root@erlerobot:~/Python_files# python udp_remote.py client 127.0.0.1
    Client socket name is ('127.0.0.1', 54770)
    Waiting up to 0.1 seconds for a reply
    Waiting up to 0.2 seconds for a reply
    Waiting up to 0.4 seconds for a reply
    Waiting up to 0.8 seconds for a reply
    The server says 'Your data was 23 bytes'
As you can see, each time a request is received, the server uses randint() to flip a coin to decide whether this request will be answered, so that we do not have to keep running the client all day waiting for a real dropped packet. The client will find that one or more of its requests never result in replies.
Unreliability, Backoff, Blocking, Timeouts
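The client's retry loop with a doubling timeout can be isolated into a helper. The following is a minimal sketch, assuming Python 3 (unlike the Python 2 listing it is modelled on); the names request_with_backoff and lossy_server are hypothetical, and the test server drops a fixed number of packets instead of flipping a coin so that the run is repeatable.

```python
import socket
import threading


def request_with_backoff(sock, payload, first_delay=0.1, max_delay=2.0):
    """Send payload on a connected UDP socket, doubling the wait after each timeout.

    Returns the reply and the number of times the request was sent.
    """
    delay = first_delay
    attempts = 0
    while True:
        sock.send(payload)
        attempts += 1
        sock.settimeout(delay)
        try:
            return sock.recv(65535), attempts
        except socket.timeout:
            delay *= 2  # wait even longer for the next request
            if delay > max_delay:
                raise RuntimeError("no reply; the server may be down")


def lossy_server(sock, drop_first):
    """Ignore the first drop_first requests, answer the next one, then stop."""
    seen = 0
    while True:
        data, address = sock.recvfrom(65535)
        seen += 1
        if seen > drop_first:
            sock.sendto(b"got " + data, address)
            return


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(5.0)
thread = threading.Thread(target=lossy_server, args=(server, 2))
thread.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.connect(server.getsockname())
reply, attempts = request_with_backoff(client, b"hello", first_delay=0.05)
thread.join()
client.close()
server.close()
print(reply, "after", attempts, "attempts")
```

Doubling the delay keeps a struggling server from being flooded with retries, while the upper bound stops the client from waiting indefinitely.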
The remote UDP client in udp_remote.py uses a new call that we have not discussed before: the connect() socket operation. You can see easily enough what it does. Instead of having to use sendto() and an explicit UDP address every time we want to send something to the server, the connect() call lets the operating system know ahead of time the remote address to which we want to send packets, so that we can simply supply data to the send() call and not have to repeat the server address again. But connect() does something else important, which will not be obvious at all from reading the script of udp_remote.py. To approach this topic, let us return to the udp_local.py file for a moment. You will recall that both its client and server use the loopback IP address and assume reliable delivery: the client will wait forever for a response. Try running the client in one window:
root@erlerobot:~/Python_files# python udp_local.py
Address before sending: ('0.0.0.0', 0)
Address after sending ('0.0.0.0', 52970)
The client is now waiting, perhaps forever, for a response in reply to the packet it has just sent to the localhost IP address at UDP port 1060. But what if we nefariously try sending it back a packet from a different server instead? From another command prompt on the same system, try running Python and entering these commands. For the port number, copy the integer that was just printed to the screen when you ran the UDP client:
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.sendto('Fake reply', ('127.0.0.1', 52970))
10
>>>
The following appears in the client window:
The server ('127.0.0.1', 65320) says 'Fake reply'
It turns out that our first client accepts answers from anywhere. Even though the server is running on the localhost, and remote network connectivity is not even desirable, the client will even accept packets from another machine. If I bring up a Python prompt on another box and run the same two lines of code as just shown, then a waiting client can even see the remote IP address.
There are, then, two ways to write UDP clients that are careful about the return addresses of the packets arriving back:
You can use sendto() and direct each outgoing packet to a specific destination, and then use recvfrom() to receive the replies and carefully check the return address it gives you against the list of servers to which you have made outstanding requests.
You can connect() your socket right after creating it, and then simply use send() and recv(), and the operating system will filter out unwanted packets for you. This works only for speaking to one server at a time, because running connect() a second time on the same socket does not add a second destination address to your UDP socket. Instead, it wipes out the first address entirely, so that no further replies from the earlier address will be delivered to your program.
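The filtering performed by connect() can be observed on the loopback interface. This is a hedged sketch for Python 3, not one of the original scripts: the names server, impostor, and client are made up for the demonstration, and it assumes an operating system (such as Linux) that discards datagrams whose source does not match the connected address.

```python
import socket


def make_udp_socket():
    """Return a UDP socket bound to a free port on the loopback interface."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(('127.0.0.1', 0))  # port 0 asks the OS to pick a free port
    return s


server = make_udp_socket()
impostor = make_udp_socket()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.connect(server.getsockname())  # replies must now come from the server
client.settimeout(0.2)
client.send(b'request')
_, client_address = server.recvfrom(65535)

# A packet from any other address never reaches the connected socket.
impostor.sendto(b'Fake reply', client_address)
try:
    fake = client.recv(65535)
except socket.timeout:
    fake = None

# A packet from the connected address is delivered as usual.
server.sendto(b'Real reply', client_address)
real = client.recv(65535)
print('fake:', fake, 'real:', real)
```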
Connecting UDP Sockets
When using sockets, it is important to distinguish the act of “binding” (by which you grab a particular UDP port for the use of a particular socket) from the act that the client performs by “connecting,” which limits all replies received so that they can come only from the particular server to which you want to talk.
So far we have seen two possibilities for the IP address used in the bind() call that the server makes: you can use '127.0.0.1' to indicate that you only want packets from other programs running on the same machine, or use an empty string '' as a wildcard, indicating that you are willing to receive packets from any interface. It actually turns out that there is a third choice: you can provide the IP address of one of the machine’s external IP interfaces, like its Ethernet connection or wireless card, and the server will listen only for packets destined for that IP address. First, what if we bind solely to an external interface? Run the server like this, using whatever your operating system tells you is the external IP address of your system:
root@erlerobot:~/Python_files# python udp_remote.py server 192.168.1.35
Listening at ('192.168.1.35', 1060)
Connecting to this IP address from another machine should still work just fine:
root@erlerobot:~/Python_files# python udp_remote.py client 192.168.1.35
Client socket name is ('192.168.1.35', 58824)
Waiting up to 0.1 seconds for a reply
The server says 'Your data was 23 bytes'
But if you try connecting to the service through the loopback interface by running the client script on the same machine, the packets will never be delivered:
root@erlerobot:~/Python_files# python udp_remote.py client 127.0.0.1
Client socket name is ('127.0.0.1', 60251)
Waiting up to 0.1 seconds for a reply
Traceback (most recent call last):
  ...
socket.error: [Errno 111] Connection refused
If you run the client again on the same machine, but this time use the external IP address of the box, then no error occurs, even though the client and server are both running there. So binding to an IP interface might limit which external hosts can talk to you; but it will certainly not limit conversations with other clients on the same machine, so long as they know the IP address that they should use to connect.
Now, stop all of the scripts that are running, and we can try running two servers on the same box.
root@erlerobot:~/Python_files# python udp_remote.py server 127.0.0.1
Listening at ('127.0.0.1', 1060)
And then we try running a second one, bound to the wildcard IP address that allows requests from any address:
root@erlerobot:~/Python_files# python udp_remote.py server
Traceback (most recent call last):
  ...
socket.error: [Errno 98] Address already in use
Binding to Interfaces (UDP)
We have learned something about operating system IP stacks and the rules that they follow: they do not allow two different sockets to listen at the same IP address and port number, because then the operating system would not know where to deliver incoming packets. But what if instead of trying to run the second server against all IP interfaces, we just ran it against an external IP interface, one that the first copy of the server is not listening to? Let us try:
root@erlerobot:~/Python_files# python udp_remote.py server 192.168.1.35
Listening at ('192.168.1.35', 1060)
It worked. This means that there are now two servers running on this machine, one of which is bound to the inward-looking port 1060 on the loopback interface, and the other looking outward for packets arriving on port 1060 from the network to which my wireless card has connected.
The IP network stack never thinks of a UDP port as a lone entity that is either entirely available, or else in use, at any given moment. Instead, it thinks in terms of UDP “socket names” that are always a pair linking an IP interface (even if it is the wildcard interface) with a UDP port number. It is these socket names that must not conflict among the listening servers at any given moment, rather than the bare UDP ports that are in use.
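The conflict rule can be checked without starting two copies of the server. The sketch below is written for Python 3 and assumes Linux behavior, matching the "Address already in use" error shown in the transcript. The helper try_bind is a hypothetical name introduced only for this illustration.

```python
import errno
import socket

# The first socket claims the socket name ('127.0.0.1', <some free port>).
first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(('127.0.0.1', 0))
port = first.getsockname()[1]


def try_bind(address):
    """Try to bind a new UDP socket; return None on success, else the errno."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.bind(address)
        return None
    except OSError as e:
        return e.errno
    finally:
        s.close()


# The identical socket name is refused, and so is the wildcard address,
# because the wildcard would overlap with the loopback interface.
same_name = try_bind(('127.0.0.1', port))
wildcard = try_bind(('', port))
print(same_name == errno.EADDRINUSE, wildcard == errno.EADDRINUSE)
```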
The foregoing program listings have suggested that a UDP packet can be up to 64 kB in size, whereas you probably already know that your Ethernet or wireless card can only handle packets of around 1,500 bytes instead.
The actual truth is that IP sends small UDP packets as single packets on the wire, but splits up larger UDP packets into several small physical packets. This means that large packets are more likely to be dropped, since if any one of their pieces fails to make its way to the destination, then the whole packet can never be reassembled and delivered to the listening operating system. But aside from the higher chance of failure, this process of fragmenting large UDP packets so that they will fit on the wire should be invisible to your application. There are three ways, however, in which it might be relevant:
If you are thinking about efficiency, you might want to limit your protocol to small packets, to make retransmission less likely and to limit how long it takes the remote IP stack to reassemble your UDP packet and give it to the waiting application.
If a firewall wrongfully blocks the ICMP packets that would normally allow your host to auto-detect the MTU between you and the remote host, then your larger UDP packets might disappear into oblivion without your ever knowing. The MTU is the “maximum transmission unit” or “largest packet size” that all of the network devices between two hosts will support.
If your protocol can make its own choices about how it splits up data between different packets, and you want to be able to auto-adjust this size based on the actual MTU between two hosts, then some operating systems let you turn off fragmentation and receive an error if a UDP packet is too big. This lets you regroup and split it into several packets if that is possible.
Linux is one operating system that supports this last option. Take a look at big_sender.py, which sends a very large message to one of the servers that we have just designed.
import IN, socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
MAX = 65535
PORT = 1060
if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: big_sender.py host'
    sys.exit(2)
hostname = sys.argv[1]
s.connect((hostname, PORT))
s.setsockopt(socket.IPPROTO_IP, IN.IP_MTU_DISCOVER, IN.IP_PMTUDISC_DO)
try:
    s.send('#' * 65000)
except socket.error:
    print 'The message did not make it'
    option = getattr(IN, 'IP_MTU', 14)  # constant taken from <linux/in.h>
    print 'MTU:', s.getsockopt(socket.IPPROTO_IP, option)
else:
    print 'The big message was sent! Your network supports really big packets!'
If we run this program against a server elsewhere on my home network, then we discover that my wireless network allows physical packets that are no bigger than the 1,500 bytes typically supported by Ethernet-style networks:
root@erlerobot:~/Python_files# python big_sender.py 127.0.0.0
The message did not make it
MTU: 1500
UDP Fragmentation
The POSIX socket interface also supports all sorts of socket options that control specific behaviors of network sockets. These are accessed through the Python socket methods getsockopt() and setsockopt(), using the options you will find documented for your operating system. You can find these options described in the Python documentation.
When getting and setting socket options, the calls look similar to this:
value = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, value)
Here are some of the more common options:
SO_BROADCAST: Allows broadcast UDP packets to be sent and received; see the note below for details.
SO_DONTROUTE: Only be willing to send packets that are addressed to hosts on subnets to which this computer is connected directly.
SO_TYPE: When passed to getsockopt(), this tells you whether a socket is of type SOCK_DGRAM and can be used for UDP, or is of type SOCK_STREAM and instead supports the semantics of TCP.
NOTE:
If UDP has a superpower, it is its ability to support broadcast: instead of sending a packet to some specific other host, you can point it at an entire subnet to which your machine is attached and have the physical network card broadcast the packet so that all attached hosts see it without its having to be copied separately to each one of them.
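The getsockopt() and setsockopt() calls described in this section can be tried on a fresh socket without sending any traffic. This is a small Python 3 sketch; the variable names are illustrative.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Options live in a group (here SOL_SOCKET) and are read or set by name.
before = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
after = s.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)

# SO_TYPE reports which kind of socket this is.
sock_type = s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)

print('broadcast before:', before, 'after:', after)
print('is datagram socket:', sock_type == socket.SOCK_DGRAM)
```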
Socket Options
The Transmission Control Protocol (TCP) is the workhorse of the Internet. Protocols that carry documents and files nearly always ride atop TCP, including HTTP and all the major ways of transmitting e-mail. It is also the foundation of choice for protocols that carry on long conversations between people or computers, like SSH and many popular chat protocols.
TCP
First, every packet is given a sequence number, so that the system on the receiving end can put them back together in the right order, and so that it can notice missing packets in the sequence and ask that they be re-transmitted. Instead of using sequential integers (1, 2, …) to mark packets, TCP uses a counter that counts the number of bytes transmitted. So a 1,024-byte packet with a sequence number of 7,200 would be followed by a packet with a sequence number of 8,224. This means that a busy network stack does not have to remember how it broke a data stream up into packets; if asked for a re-transmission, it can break the stream up into packets some other way (which might let it fit more data into a packet if more bytes are now waiting for transmission), and the receiver can still put the packets back together.
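The byte-counting arithmetic is easy to reproduce. The function below is only an illustration (sequence_numbers is a made-up name, not an operating system API); it relies on the fact that TCP sequence numbers are 32-bit values that wrap around.

```python
def sequence_numbers(first_seq, segment_sizes):
    """Return the sequence number carried by each segment of a byte stream."""
    numbers = []
    seq = first_seq
    for size in segment_sizes:
        numbers.append(seq)
        seq = (seq + size) % 2**32  # sequence numbers are 32-bit and wrap
    return numbers


# The example from the text: a 1,024-byte packet numbered 7,200 is
# followed by a packet numbered 8,224.
print(sequence_numbers(7200, [1024, 1024]))

# The same 2,048 bytes split differently still end at the same position,
# which is why a sender may re-segment data when it re-transmits.
print(sequence_numbers(7200, [512, 512, 1024]))
```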
Rather than running very slowly in lock-step by needing every packet to be acknowledged before it sends the next one, TCP sends whole bursts of packets at a time before expecting a response. The amount of data that a sender is willing to have on the wire at any given moment is called the size of the TCP “window.” The TCP implementation on the receiving end can regulate the window size of the transmitting end, and thus slow or pause the connection. This is called “flow control.” This lets it forbid the transmission of additional packets in cases where its input buffer is full and it would have to discard any more data if it were to arrive right now.
Finally, if TCP sees that packets are being dropped, it assumes that the network is becoming congested and stops sending as much data every second.
How TCP works
Because TCP has very nearly become a universal default when two programs need to communicate, we should look at a few instances in which its behavior is not optimal for certain kinds of data, in case an application you are writing ever falls into one of these categories. First, TCP is unwieldy for protocols where clients want to send single, small requests to a server, and then are done and will not talk to it further. It takes three packets for two hosts to set up a TCP connection, the famous sequence of SYN, SYN-ACK, and ACK (which mean “I want to talk, here is the packet sequence number I will be starting with”; “okay, here’s mine”; “okay!”), and then another three or four to shut the connection back down (either a quick FIN, FIN-ACK, ACK, or a slightly longer pair of separate FIN and ACK packets). That is at least six packets just to send a single request. Protocol designers quickly turn to UDP in such cases.
In view of this, we are going to detail two situations where the use of TCP is not appropriate:
The first is where no long-term relationship exists between client and server, as with the single small requests just described. UDP really shines over TCP here, especially where there are so many clients that a typical TCP implementation would run out of port numbers if it had to keep up with a separate data stream for each active client.
The second situation where TCP is inappropriate is when an application can do something much smarter than simply re-transmit data when a packet has been lost. Imagine an audio chat conversation, for example: if a second’s worth of data is lost because of a dropped packet, then it will do little good to simply re-send that same second of audio, over and over, until it finally arrives.
When to use TCP
As we have mentioned before, TCP uses port numbers to distinguish different applications running at the same IP address, and follows exactly the same conventions regarding well-known and ephemeral port numbers. With a stateful stream protocol like TCP, the connect() call becomes the fundamental act upon which all other network communication hinges. TCP connect() can fail: the remote host might not answer; it might refuse the connection; or more obscure protocol errors might occur, like the immediate receipt of a RST (“reset”) packet. Because a stream connection involves setting up a persistent connection between two hosts, the other host needs to be listening and ready to accept your connection.
On the “server side” (which, for the purpose of this chapter, is the conversation partner not doing the connect() call but receiving the SYN packet that it initiates) an incoming connection generates an even more momentous event, the creation of a new socket. This is because the standard POSIX interface to TCP actually involves two completely different kinds of sockets: “passive” listening sockets and active “connected” ones:
A passive socket holds the “socket name” (the address and port number) at which the server is ready to receive connections. No data can ever be received or sent by this kind of socket; it does not represent any actual network conversation. Instead, it is how the server alerts the operating system to its willingness to receive incoming connections in the first place.
An active socket (connected socket) is bound to one particular remote conversation partner, who has their own IP address and port number. It can be used only for talking back and forth with that partner, and can be read and written to without worrying about how the resulting data will be split up into packets. In many cases, a connected socket can be passed to another POSIX program that expects to read from a normal file, and the program will never even know that it is talking to the network.
Note that while a passive socket is made unique by the interface address and port number at which it is listening (so that no one else is allowed to grab that same address and port), there can be many active sockets that all share the same local socket name.
What makes an active socket unique is, rather, the four-part coordinate: (local_ip, local_port, remote_ip, remote_port). It is this four-tuple by which the operating system names each active TCP connection, and incoming TCP packets are examined to see whether their source and destination address associate them with any of the currently active sockets on the system.
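The difference between the one listening socket and the many active sockets it produces can be seen by accepting two connections on the loopback interface. This is a Python 3 sketch with illustrative variable names; it uses socket.create_connection() from the standard library for the client side.

```python
import socket

# One passive socket, identified by its local socket name alone.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('127.0.0.1', 0))
listener.listen(5)
address = listener.getsockname()

# Two clients connect; each accept() hands back a new active socket.
clients = [socket.create_connection(address) for _ in range(2)]
active = [listener.accept()[0] for _ in range(2)]

# Both active sockets share the listener's local socket name...
local_names = {sc.getsockname() for sc in active}
# ...but their four-tuples differ, because the remote ports differ.
four_tuples = {sc.getsockname() + sc.getpeername() for sc in active}

print('local names:', local_names)
print('four-tuples:', four_tuples)
```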
What TCP Sockets Mean
Here you can find the code of a simple TCP client and server that send and receive 16 octets:
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060

def recv_all(sock, length):
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'We have accepted a connection from', sockname
        print 'Socket connects', sc.getsockname(), 'and', sc.getpeername()
        message = recv_all(sc, 16)
        print 'The incoming sixteen-octet message says', repr(message)
        sc.sendall('Farewell, client')
        sc.close()
        print 'Reply sent, socket closed'
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    print 'Client has been assigned socket name', s.getsockname()
    s.sendall('Hi there, server')
    reply = recv_all(s, 16)
    print 'The server said', repr(reply)
    s.close()
else:
    print >>sys.stderr, 'usage: tcp_sixteen.py server|client [host]'
First, the TCP connect() call is not the innocuous bit of local socket configuration that it is in the case of UDP, where it merely sets a default address used with any subsequent send() calls, and places a filter on packets arriving at our socket. Here, connect() is a real live network operation that kicks off the three-way handshake between the client and server machine so that they are ready to communicate. This means that connect() can fail, as you can verify quite easily by executing this script when the server is not running:
root@erlerobot:~/Python_files# python tcp_sixteen.py client
Traceback (most recent call last):
  File "tcp_sixteen.py", line 29, in <module>
    s.connect((HOST, PORT))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 61] Connection refused
You will see that this TCP client is in one way much simpler than our UDP client, because it does not need to make any provision for missing data. Because of the assurances that TCP provides, it can send() data without checking whether the remote end receives it, and run recv() without having to consider the possibility of re-transmitting its request.
A Simple TCP Client and Server
When we perform a TCP send(), our operating system’s networking stack will face one of three situations:
The data can be immediately accepted by the system, either because the network card is immediately free to transmit, or because the system has room to copy the data to a temporary outgoing buffer so that your program can continue running. In these cases, send() returns immediately, and it will return the length of your data string because the whole string was transmitted.
Another possibility is that the network card is busy and that the outgoing data buffer for this socket is full and the system cannot (or will not) allocate any more space. In this case, the default behavior of send() is simply to block, pausing your program until the data can be accepted.
There is a final, hybrid possibility: that the outgoing buffers are almost full, but not quite, and so part of the data you are trying to send can be immediately queued, but the rest will have to wait. In this case, send() completes immediately and returns the number of bytes accepted from the beginning of your data string, but leaves the rest of the data unprocessed.
Because of this last possibility, a program that calls send() must check the return value and call send() again with the remaining data until all of it has been accepted. Fortunately, Python does not force us to do this dance ourselves every time we have a block of data to send: the Standard Library socket implementation provides a friendly sendall() method. Not only is sendall() faster than doing it ourselves, it releases the Global Interpreter Lock during its loop so that other Python threads can run without contention until all of the data has been transmitted. Unfortunately, no equivalent is provided for the recv() call, despite the fact that it might return only part of the data that is on the way from the client. Internally, the operating system implementation of recv() uses logic very close to that used when sending:
If no data is available, then recv() blocks and your program pauses until data arrives.
If plenty of data is available already in the incoming buffer, then you are given as many bytes as you asked recv() for.
But if the buffer contains a bit of data, but not as much as you are asking for, then you are immediately returned what does happen to be there, even if it is not as much as you have asked for.
This is why tcp_sixteen.py wraps recv() in its own recv_all() loop.
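Both loops can be written out explicitly. The following is a Python 3 sketch, not the code of tcp_sixteen.py: send_all is a hypothetical hand-written stand-in for what sendall() already does, and recv_all follows the same idea as the function in the listing but works on bytes.

```python
import socket
import threading


def send_all(sock, data):
    """Call send() repeatedly until every byte has been accepted."""
    sent = 0
    while sent < len(data):
        # send() may accept only part of the data; it returns how much.
        sent += sock.send(data[sent:])


def recv_all(sock, length):
    """Call recv() repeatedly until exactly `length` bytes have arrived."""
    chunks = []
    remaining = length
    while remaining:
        more = sock.recv(remaining)
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (length - remaining, length))
        chunks.append(more)
        remaining -= len(more)
    return b''.join(chunks)


if __name__ == '__main__':
    # socketpair() gives two connected stream sockets for a local demo.
    a, b = socket.socketpair()
    payload = b'x' * 1000000
    sender = threading.Thread(target=send_all, args=(a, payload))
    sender.start()
    print(len(recv_all(b, len(payload))), 'bytes received')
    sender.join()
```

In real code sendall() remains the better choice for sending; the explicit loop is shown only to make the partial-send case visible.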
In the code stored in tcp_sixteen.py, you can see how the distinction between active and listening sockets is carried through in actual server code. The link, which might strike you as odd at first, is that a listening socket actually produces new connected sockets as the return value that you get by calling accept(). Follow the steps in the program listing to see the order in which the socket operations occur.
Run the server:
root@erlerobot:~/Python_files# python tcp_sixteen.py server
Listening at ('127.0.0.1', 1060)
And then the client (in another terminal window):
root@erlerobot:~/Python_files# python tcp_sixteen.py client
Client has been assigned socket name ('127.0.0.1', 49607)
The server said 'Farewell, client'
The server prints this:
We have accepted a connection from ('127.0.0.1', 49607)
Socket connects ('127.0.0.1', 1060) and ('127.0.0.1', 49607)
The incoming sixteen-octet message says 'Hi there, server'
Reply sent, socket closed
Listening at ('127.0.0.1', 1060)
The IP address that you pair with a port number when you perform a bind() operation tells the operating system which network interfaces you are willing to receive connections from. The example invocations of tcp_sixteen.py used the localhost IP address 127.0.0.1, which protects your code from connections originating on other machines. You can verify this by running tcp_sixteen.py in server mode as shown previously, and trying to connect with a client from another machine:
root@erlerobot:~/Python_files# python tcp_sixteen.py client 192.168.1.35
Traceback (most recent call last):
  File "tcp_sixteen.py", line 29, in <module>
    s.connect((HOST, PORT))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 61] Connection refused
But if you run the server with an empty string for the hostname, which tells the Python bind() routine that you are willing to accept connections through any of your machine’s active network interfaces, then the client can connect successfully from another host:
root@erlerobot:~/Python_files# python tcp_sixteen.py server ""
Listening at ('0.0.0.0', 1060)
Run the client:
root@erlerobot:~/Python_files# python tcp_sixteen.py client 192.168.1.35
Client has been assigned socket name ('192.168.1.35', 49696)
The server said 'Farewell, client'
This appears in the server terminal:
We have accepted a connection from ('192.168.1.35', 49696)
Socket connects ('192.168.1.35', 1060) and ('192.168.1.35', 49696)
The incoming sixteen-octet message says 'Hi there, server'
Reply sent, socket closed
Listening at ('0.0.0.0', 1060)
Binding to Interfaces (TCP)
The term “deadlock” is used for all sorts of situations in computer science where two programs, sharing limited resources, can wind up waiting on each other forever because of poor planning. It turns out that it can happen fairly easily when using TCP.
Take a look at tcp_deadlock.py for an example of a server and client that try to be a bit too clever without thinking through the consequences. Here, the server author has done something that is actually quite intelligent. The server's job is to turn an arbitrary amount of text into uppercase. Recognizing that its client’s requests can be arbitrarily large, and that one could run out of memory trying to read an entire stream of input before trying to process it, the server reads and processes small blocks of 1,024 bytes at a time.
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = '127.0.0.1'
PORT = 1060

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'Processing up to 1024 bytes at a time from', sockname
        n = 0
        while True:
            message = sc.recv(1024)
            if not message:
                break
            sc.sendall(message.upper())  # send it back uppercase
            n += len(message)
            print '\r%d bytes processed so far' % (n,),
            sys.stdout.flush()
        sc.close()
        print 'Completed processing'
elif len(sys.argv) == 3 and sys.argv[1] == 'client' and sys.argv[2].isdigit():
    bytes = (int(sys.argv[2]) + 15) // 16 * 16  # round up to a multiple of 16
    message = 'capitalize this!'  # 16-byte message to repeat over and over
    print 'Sending', bytes, 'bytes of data, in chunks of 16 bytes'
    s.connect((HOST, PORT))
    sent = 0
    while sent < bytes:
        s.sendall(message)
        sent += len(message)
        print '\r%d bytes sent' % (sent,),
        sys.stdout.flush()
    s.shutdown(socket.SHUT_WR)
    print 'Receiving all the data the server sends back'
    received = 0
    while True:
        data = s.recv(42)
        if not received:
            print 'The first data received says', repr(data)
        received += len(data)
        if not data:
            break
        print '\r%d bytes received' % (received,),
    s.close()
else:
    print >>sys.stderr, 'usage: tcp_deadlock.py server | client <bytes>'
Deadlock
If you start the server and then run the client with a command-line argument specifying a modest number of bytes (say, asking it to send 32 bytes of data; for simplicity, it will round whatever value you supply up to a multiple of 16 bytes), then it will get its text back in all uppercase:
root@erlerobot:~/Python_files# python tcp_deadlock.py server
Listening at ('127.0.0.1', 1060)
root@erlerobot:~/Python_files# python tcp_deadlock.py client 32
Sending 32 bytes of data, in chunks of 16 bytes
32 bytes sent
Receiving all the data the server sends back
The first data received says 'CAPITALIZE THIS!CAPITALIZE THIS!'
32 bytes received
On the server screen this is displayed:
Processing up to 1024 bytes at a time from ('127.0.0.1', 49702)
32 bytes processed so far
Completed processing
Listening at ('127.0.0.1', 1060)
Now, try using the client to send a very large stream of data, say, one totaling a gigabyte:
root@erlerobot:~/Python_files# python tcp_deadlock.py client 1073741824
Sending 1073741824 bytes of data, in chunks of 16 bytes
1399600 bytes sent
In the server window:
Processing up to 1024 bytes at a time from ('127.0.0.1', 49703)
688032 bytes processed so far
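You will see both the client and the server furiously updating their terminal windows as they breathlessly update you with the amount of data they have transmitted and received. The numbers will climb and climb until, quite suddenly, both connections freeze. The client does not start calling recv() until it has finished sending, so the server's replies pile up unread. The server’s output buffer and the client’s input buffer have both finally filled, and TCP has used its window adjustment protocol to signal this fact and stop the socket from sending more data that would have to be discarded and later re-sent. With the server blocked in sendall() and no longer reading, the client's own sendall() soon blocks as well, and each side waits for the other forever.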
tcp_deadlock.py also shows us how a Python socket object behaves when an end-of-file is reached. You will see that the client makes a shutdown() call on the socket after it finishes sending its transmission. This solves an important problem: if the server is going to read forever until it sees end-of-file, then how will the client avoid having to do a full close() on the socket and thus forbid itself from doing the many recv() calls that it still needs to make to receive the server’s response? The solution is to “half-close” the socket (that is, to permanently shut down communication in one direction but without destroying the socket itself) so that the server can no longer read any data, but can still send any remaining reply back in the other direction, which will still be open. The shutdown() call can be used to end either direction of communication in a two-way socket like this; its argument can be one of three symbols:
SHUT_WR: This is the most common value used, since in most cases a program knows when its own output is finished but not about when its conversation partner will be done. This value says that the caller will be writing no more data into the socket, and that reads from its other end should act like it is closed.
SHUT_RD: This is used to turn off the incoming socket stream, so that an end-of-file error is encountered if your peer tries to send any more data to you on the socket.
SHUT_RDWR: This closes communication in both directions on the socket. It might not, at first, seem useful, because you can also just perform a close() on the socket and communication is similarly ended in both directions. The difference is a rather advanced one: if several programs on your operating system are allowed to share a single socket, then close() just ends your process’s relationship with the socket, but keeps it open as long as another process is still using it; but shutdown() will always immediately disable the socket for everyone using it.
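The half-close can be tried in a single process. This Python 3 sketch makes two simplifying assumptions that differ from tcp_deadlock.py: socket.socketpair() stands in for a real TCP connection, and the hypothetical upper_server function collects all of its input before replying. The function names are illustrative only.

```python
import socket
import threading


def upper_server(sock):
    """Read until end-of-file, then send the text back in uppercase."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:  # the peer called shutdown(SHUT_WR) or close()
            break
        chunks.append(data)
    sock.sendall(b''.join(chunks).upper())
    sock.close()


def ask(text):
    """Send text, half-close the socket, and return the complete reply."""
    client, server_end = socket.socketpair()
    worker = threading.Thread(target=upper_server, args=(server_end,))
    worker.start()
    client.sendall(text)
    client.shutdown(socket.SHUT_WR)  # no more writes; reading still works
    reply = b''
    while True:
        data = client.recv(1024)
        if not data:
            break
        reply += data
    client.close()
    worker.join()
    return reply


if __name__ == '__main__':
    print(ask(b'capitalize this!'))
```

Without the shutdown() call the server would wait for more input forever, and the client would wait forever for a reply.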
Closed Connections, Half-Open Connections
Since TCP sockets support streams of data, they might have already reminded you of normal files, which also support reading and writing as fundamental operations. Python does a very good job of keeping these concepts separate: file objects can read() and write(), sockets can send() and recv(), and no kind of object can do both. But sometimes you will want to treat a socket like a normal Python file object, often because you want to pass it into code like that of the many Python modules such as pickle, json, and zlib that can read and write data directly from a file. For this purpose, Python provides a makefile() method on every socket that returns a Python file object that is really calling recv() and send() behind the scenes:
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> hasattr(s, 'read')
False
>>> f = s.makefile()
>>> hasattr(f, 'read')
True
Sockets, like normal Python files, also have a fileno() method that lets you discover their file descriptor number in case you need to supply it to lower-level calls.
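To see the file objects actually carry data, a pair of connected sockets is enough. This is a Python 3 sketch in which socket.socketpair() stands in for a TCP connection; the variable names are illustrative.

```python
import socket

a, b = socket.socketpair()

# In Python 3, makefile() takes a mode and an optional text encoding.
writer = a.makefile('w', encoding='utf-8')
reader = b.makefile('r', encoding='utf-8')

writer.write('first line\nsecond line\n')
writer.flush()  # file objects buffer, so push the data to the socket

line1 = reader.readline()
line2 = reader.readline()
fd = a.fileno()  # the descriptor number for lower-level calls

print(repr(line1), repr(line2), fd)
```

Because the file object buffers its output, forgetting flush() is a common reason for a peer to wait on data that was never sent.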
Using TCP Streams like Files
In this chapter, we will discuss the topic of network addresses and will describe the distributed service that allows names to be resolved to raw IP addresses.
Socket names and DNS
The last chapter has already introduced you to the fact that sockets cannot be named with a single primitive Python value like a number or string. Instead, both TCP and UDP use integer port numbers to share a single machine's IP address among the many different applications that might be running there, and so the address and port number have to be combined in order to produce a socket name, like this:
('18.9.22.69', 80)
You will recall that socket names are important at several points in the creation and use of sockets. For your reference, here are all of the major socket methods that take or return some sort of socket name:
mysocket.accept(): Each time this is called on a listening TCP stream socket that has incoming connections ready to hand off to the application, it returns a tuple (ordered set of values) whose second item is the remote address that has connected (the first item in the tuple is the new socket connected to that remote address).
mysocket.bind(address): Assigns the socket the local address so that outgoing packets have an address from which to originate, and so that any incoming connections from other machines have a name that they can use to connect.
mysocket.connect(address): Establishes that data sent through this socket will be directed to the given remote address. For UDP sockets, this simply sets the default address used if the caller uses send() rather than sendto(); for TCP sockets, this actually negotiates a new stream with another machine using a three-way handshake, and raises an exception if the negotiation fails.
mysocket.getpeername(): Returns the remote address to which this socket is connected.
mysocket.getsockname(): Returns the address of this socket's own local endpoint.
mysocket.recvfrom(...): For UDP sockets, this returns a tuple that pairs a string of returned data with the address from which it was just sent.
mysocket.sendto(data, address): An unconnected UDP socket uses this method to fire off a data packet at a particular remote address.
In general, any of the foregoing methods can receive or return any of the sorts of addresses that follow, meaning that they will work regardless of whether you are using IPv4, IPv6, or others.
Socket names
If you review the previous code, you will notice that we have used:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('localhost', 1060))
We paid particular attention to the hostnames and IP addresses that the sockets used. But if you read each program listing from the beginning, you will see that these are only the last two coordinates of five major decisions that were made during the construction and deployment of each socket object. In order, here is the full list of values that had to be chosen, and you will see that there are five in all:
First, the address family makes the biggest decision: it names what kind of network you want to talk to, out of the many kinds that a particular machine might support. So far we have always used the value AF_INET.
Next after the address family comes the socket type. It chooses the particular kind of communication technique that you want to use on the network you have chosen. Rather than names tied to UDP and TCP, the socket interface designers decided to create more generic names for the broad idea of a packet-based socket, which goes by the name SOCK_DGRAM, and the broad idea of a reliable flow-controlled data stream, which as we have seen is known as a SOCK_STREAM.
The third field in the socket() call, the protocol, is rarely used because once you have specified the address family and socket type, you have narrowed down the possible protocols to one major option.
The fourth and fifth fields are, then, the IP address and UDP or TCP port number that were explained in detail in the last chapters.
Five socket coordinates
And having explained all of that, it turns out that this book actually does need to introduce one additional address family, beyond the AF_INET we have used so far: the address family for IPv6, named AF_INET6, which is the way forward into a future where the world does not, in fact, run out of IP addresses.
In Python you can test directly for whether the underlying platform supports IPv6 by checking the has_ipv6 Boolean attribute inside the socket module:
>>> import socket
>>> socket.has_ipv6
True
But note that this does not tell you whether an actual IPv6 interface is up and configured and can currently be used to send packets anywhere; it is purely an assertion about whether IPv6 support has been compiled into the operating system, not about whether it is in use.
The differences that IPv6 will make for your Python code might sound quite daunting, if listed one right after the other:
Your sockets have to be prepared to have the family AF_INET6 if you are called upon to operate on an IPv6 network.
No longer do socket names consist of just two pieces, an address and a port number; instead, they can also involve additional coordinates that provide “flow” information and a “scope” identifier.
The pretty IPv4 octets like 18.9.22.69 that you might already be reading from configuration files or from your command-line arguments will now sometimes be replaced by IPv6 host addresses instead, which you might not even have good regular expressions for yet. They have lots of colons, they can involve hexadecimal numbers, and in general they look quite ugly.
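For the last point, one option that avoids regular expressions altogether is the ipaddress module of the Python 3 standard library, which is not used by the original listings. The helper name describe in this sketch is hypothetical.

```python
import ipaddress


def describe(text):
    """Return (IP version, normalized text) for an IPv4 or IPv6 address.

    Raises ValueError if the text is not an IP address at all, for
    example when it is a hostname that still needs to be resolved.
    """
    address = ipaddress.ip_address(text)
    return address.version, address.compressed


print(describe('18.9.22.69'))
print(describe('2001:0db8:0000:0000:0000:0000:0000:0001'))
```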
IPv6
To make your code simple, powerful, and immune from the complexities of the transition from IPv4 to IPv6, you should turn your attention to one of the most powerful tools in the Python socket user's arsenal: getaddrinfo(). The getaddrinfo() function sits in the socket module along with most other operations that involve addresses (rather than being a socket method). Unless you are doing something specialized, it is probably the only routine that you will ever need to transform the hostnames and port numbers that your users specify into addresses that can be used by socket methods. Its approach is simple: rather than making you attack the addressing problem piecemeal, which is necessary when using the older routines in the socket module, it lets you specify everything you know about the connection that you need to make in a single call. In response, it returns all of the coordinates we discussed earlier that are necessary for you to create and connect a socket to the named destination.
If we visit the official Python documentation, we find some interesting explanations. First, the syntax is the following:
socket.getaddrinfo(host, port[, family[, socktype[, proto[, flags]]]])
So what getaddrinfo() does is translate the host/port argument into a sequence of 5-tuples that contain all the necessary arguments for creating a socket connected to that service. host is a domain name, a string representation of an IPv4/v6 address, or None. port is a string service name such as 'http', a numeric port number, or None. By passing None as the value of host and port, you can pass NULL to the underlying C API.
The function returns a list of 5-tuples with the following structure:
(family, socktype, proto, canonname, sockaddr)
In these tuples, family, socktype, and proto are all integers and are meant to be passed to the socket() function. canonname will be a string representing the canonical name of the host if AI_CANONNAME is part of the flags argument; else canonname will be empty. sockaddr is a tuple describing a socket address, whose format depends on the returned family (an (address, port) 2-tuple for AF_INET, an (address, port, flowinfo, scopeid) 4-tuple for AF_INET6), and is meant to be passed to the socket.connect() method.
Here you can find an example of its use:
>>> import socket
>>> from pprint import pprint
>>> infolist = socket.getaddrinfo('gatech.edu', 'www')
>>> pprint(infolist)
[(2, 2, 17, '', ('130.207.160.173', 80)),
 (2, 1, 6, '', ('130.207.160.173', 80))]
>>>
>>> ftpca = infolist[0]
>>> ftpca[0:3]
(2, 2, 17)
>>> s = socket.socket(*ftpca[0:3])
>>> ftpca[4]
('130.207.160.173', 80)
>>> s.connect(ftpca[4])
>>>
ftpca here is an acronym for the order of the variables that are returned: “family, type, protocol, canonical name, and address,” which contain everything you need to make a connection. Here, we have asked about the possible methods for connecting to the HTTP port of the host gatech.edu, and have been told that there are two ways to do it: by creating a SOCK_STREAM socket (socket type 1) that uses IPPROTO_TCP (protocol number 6) or else by using a SOCK_DGRAM (socket type 2) socket with IPPROTO_UDP (which is the protocol represented by the integer 17).
As you can see from the foregoing code snippet, getaddrinfo() generally allows not only the hostname but also the port name to be a symbol rather than an integer.
The getaddrinfo() function
Before tackling all of the options that getaddrinfo() supports, it will be more useful to see how it is used to support three basic network operations. We will tackle them in the order that you might perform operations on a socket: binding, connecting, and then identifying a remote host who has sent you information.
>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo(None, 'smtp', 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
[(2, 1, 6, '', ('0.0.0.0', 25)), (30, 1, 6, '', ('::', 25, 0, 0))]
>>> getaddrinfo(None, 53, 0, socket.SOCK_DGRAM, 0, socket.AI_PASSIVE)
[(2, 2, 17, '', ('0.0.0.0', 53)), (30, 2, 17, '', ('::', 53, 0, 0))]
>>>
Here we asked about where we should bind() a socket if we want to serve SMTP traffic using TCP, and if we want to serve DNS traffic using UDP, respectively. The answers we got back in each case are the appropriate wildcard addresses that will let us bind to every IPv4 and every IPv6 interface on the local machine with all of the right values for the socket family, socket type, and protocol in each case. If you instead want to bind() to a particular IP address that you know that the local machine holds, then omit the AI_PASSIVE flag and just specify the hostname. For example, here are two ways that you might try binding to localhost:
>>> getaddrinfo('127.0.0.1', 'smtp', 0, socket.SOCK_STREAM, 0)
[(2, 1, 6, '', ('127.0.0.1', 25))]
>>> getaddrinfo('localhost', 'smtp', 0, socket.SOCK_STREAM, 0)
[(30, 1, 6, '', ('::1', 25, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 25)), (30, 1, 6, '', ('fe80::1%lo0', 25, 0, 1))]
>>>
You can see that supplying the IPv4 address for the localhost locks you down to receiving connections only over IPv4, while using the symbolic name localhost (at least on a Linux laptop, with a well-configured /etc/hosts file) makes available both the IPv4 and IPv6 local names for the machine.
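The binding queries above can be turned into listening sockets along the following lines. This is a minimal sketch that assumes Python 3; the function name open_listeners is hypothetical, and setting IPV6_V6ONLY is an illustrative choice so that the IPv6 wildcard socket does not also claim IPv4 traffic.

```python
import socket


def open_listeners(port, socktype=socket.SOCK_STREAM):
    """Bind one socket per wildcard address that getaddrinfo() suggests."""
    listeners = []
    infolist = socket.getaddrinfo(None, port, 0, socktype, 0, socket.AI_PASSIVE)
    for family, kind, proto, _canonname, sockaddr in infolist:
        try:
            s = socket.socket(family, kind, proto)
        except OSError:
            continue  # this address family is not usable on this machine
        try:
            if family == socket.AF_INET6 and hasattr(socket, 'IPV6_V6ONLY'):
                # Keep the IPv6 socket from also accepting IPv4 connections.
                s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(sockaddr)
            if kind == socket.SOCK_STREAM:
                s.listen(5)
        except OSError:
            s.close()
            continue
        listeners.append(s)
    return listeners
```

Calling open_listeners(0) asks the operating system to pick a free port for each socket; a real server would pass its own port number and then accept() on every socket returned.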
Asking getaddrinfo() Where to Bind
The majority of uses of getaddrinfo() are outward-looking, and generate information suitable for connecting you to other applications. In all such cases, you can either use an empty string to indicate that you want to connect back to the local host using the loopback interface, or provide a string giving an IPv4 address, IPv6 address, or hostname to name your destination. The usual use of getaddrinfo() in all other cases (which, basically, is when you are preparing to connect() or sendto()) is to specify the AI_ADDRCONFIG flag, which filters out any addresses that are impossible for your computer to reach. For example, an organization might have both an IPv4 and an IPv6 range of IP addresses; but if your particular host supports only IPv4, then you will want the results filtered to include only addresses in that family. In case the local machine has only an IPv6 network interface but the service you are connecting to supports only IPv4, the AI_V4MAPPED flag will return you those IPv4 addresses re-encoded as IPv6 addresses that you can actually use. So you will usually use getaddrinfo() this way when connecting:
>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo('ftp.kernel.org', 'ftp', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED)
[(2, 1, 6, '', ('199.204.44.194', 21)), (2, 1, 6, '', ('198.145.20.140', 21)), (2, 1, 6, '', ('149.20.4.69', 21))]
>>>
And we have gotten exactly what we wanted: every way to connect to a host named ftp.kernel.org through a TCP connection to its FTP port.
Here is another query, which describes how I can connect from my laptop to the HTTP interface of the IANA that assigns port numbers in the first place:
>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED)
[(2, 1, 6, '', ('192.0.43.8', 80))]
>>>
If we take away our carefully chosen flags in the sixth parameter, then we will also be able to see their IPv6 address:
>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0)
[(2, 1, 6, '', ('192.0.43.8', 80)), (30, 1, 6, '', ('2001:500:88:200::8', 80, 0, 0))]
>>>
Asking getaddrinfo() About Services
One last circumstance that you will commonly encounter is where you either are making a new connection, or maybe have just received a connection to one of your own sockets, and you want an attractive hostname to display to the user or record in a log file. This is slightly dangerous because a hostname lookup can take quite a bit of time, even on the modern Internet, and might return a hostname that no longer works by the time you go and check your logs. So for log files, try to record both the hostname and the raw IP address. But if you have a good use for the “canonical name” of a host, then try running getaddrinfo() with the AI_CANONNAME flag turned on, and the fourth item of any of the tuples that it returns (always an empty string in the foregoing examples, you will note) will contain the canonical name:
>>> import socket
>>> from socket import getaddrinfo
>>> getaddrinfo('iana.org', 'www', 0, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME)
[(2, 1, 6, 'iana.org', ('192.0.43.8', 80))]
>>>
Asking getaddrinfo() for Pretty Hostnames
The flags available vary somewhat by operating system, and you should always consult your own computer's documentation (not to mention its configuration) if you are confused about a value that it chooses to return. But there are several flags that tend to be cross-platform; here are some of the more important ones:
AI_ALL: The AI_V4MAPPED option will save you in the situation where you are on a purely IPv6-connected host, but the host to which you want to connect advertises only IPv4 addresses: it resolves this problem by “mapping” the IPv4 addresses to their IPv6 equivalent. But if some IPv6 addresses do happen to be available, then they will be the only ones shown. Thus the existence of this option: if you want to see all of the addresses from your IPv6-connected host, even though some perfectly good IPv6 addresses are available, then combine this AI_ALL flag with AI_V4MAPPED and the list returned to you will have every address known for the target host.
AI_NUMERICHOST: This turns off any attempt to interpret the hostname parameter (the first parameter to getaddrinfo()) as a textual hostname like cern.ch, and only tries to interpret the hostname string as a literal IPv4 or IPv6 address like 74.207.234.78 or fe80::fcfd:4aff:fecf:ea4e. This is much faster, as no DNS round-trip is incurred (see the next section), and prevents possibly untrusted user input from forcing your system to issue a query to a name server under someone else's control.
AI_NUMERICSERV: This turns off symbolic port names like www and insists that port numbers like 80 be used instead. This does not necessarily have the network query implications of the previous option, since port-number databases are typically stored locally on IP-connected machines; on POSIX systems, resolving a symbolic port name typically requires only a quick scan of the /etc/services file (but check your /etc/nsswitch.conf file's services option to be sure). But if you know your port string should always be an integer, then activating this flag can be a useful sanity check.
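As an illustration of AI_NUMERICHOST, the following sketch checks whether a string is an IP literal without ever contacting a name server. It assumes Python 3, and the helper name is_ip_literal is hypothetical.

```python
import socket


def is_ip_literal(text):
    """Return True if text parses as an IPv4 or IPv6 literal; never queries DNS."""
    try:
        socket.getaddrinfo(text, None, 0, socket.SOCK_STREAM, 0,
                           socket.AI_NUMERICHOST)
    except socket.gaierror:
        return False  # the resolver refused to treat it as a numeric address
    return True


print(is_ip_literal('74.207.234.78'))
print(is_ip_literal('cern.ch'))
```

Because the flag forbids name resolution, the second call fails immediately instead of waiting on a DNS round-trip.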
Other getaddrinfo() Flags
Here is a quick example of how getaddrinfo() looks in actual code, in www_ping.py.
# www_ping.py (Python 2 syntax)
import socket, sys

if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: www_ping.py <hostname_or_ip>'
    sys.exit(2)

hostname_or_ip = sys.argv[1]
try:
    infolist = socket.getaddrinfo(
        hostname_or_ip, 'www', 0, socket.SOCK_STREAM, 0,
        socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME,
        )
except socket.gaierror, e:
    print 'Name service failure:', e.args[1]
    sys.exit(1)

info = infolist[0]  # per standard recommendation, try the first one
socket_args = info[0:3]
address = info[4]
s = socket.socket(*socket_args)
try:
    s.connect(address)
except socket.error, e:
    print 'Network failure:', e.args[1]
else:
    print 'Success: host', info[3], 'is listening on port 80'
It performs a simple are-you-there test of whatever web server you name on the command line by attempting a quick connection to port 80 with a streaming socket. Using the script would look something like this:
root@erlerobot:~/Python_files#
root@erlerobot:~/Python_files# python www_ping.py mit.edu
Success: host mit.edu is listening on port 80
root@erlerobot:~/Python_files# python www_ping.py smtp.google.com
Name service failure: nodename nor servname provided, or not known
root@erlerobot:~/Python_files# www_ping.py no-such-host.com
Name service failure: nodename nor servname provided, or not known
root@erlerobot:~/Python_files#
Note that the socket() constructor does not take a sequence of three items as its parameter. Instead, the argument is introduced by an asterisk, which means that the three elements of the socket_args tuple are passed as three separate parameters to the constructor.
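The script above only tries the first address that getaddrinfo() returns. A slightly more forgiving variant, sketched here for Python 3, walks the whole list until one address accepts the connection. The function name connect_to and its timeout default are hypothetical choices; the standard library's socket.create_connection() follows a similar pattern.

```python
import socket


def connect_to(host, port, timeout=5.0):
    """Try each address from getaddrinfo() in turn; return a connected socket."""
    last_error = None
    infolist = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infolist:
        try:
            s = socket.socket(family, socktype, proto)
        except OSError as e:
            last_error = e  # family not supported locally; try the next entry
            continue
        try:
            s.settimeout(timeout)
            s.connect(sockaddr)
        except OSError as e:
            s.close()
            last_error = e  # remember why this address failed, then move on
            continue
        return s
    raise last_error or OSError('getaddrinfo() returned no addresses')
```

A caller would use it as sock = connect_to('mit.edu', 'www') and handle socket.gaierror and OSError exactly as www_ping.py handles its two failure cases.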
getaddrinfo() in your own code
The purpose of the DNS protocol is to turn hostnames into IP addresses.
For example, consider the domain name www.python.org. If your web browser needs to know this address, then the browser runs a call like getaddrinfo() to ask the operating system to resolve that name. Your system will know either that it is running a name server of its own, or that the network to which it is attached provides name service. So, the first act of your DNS server will be to check its own cache of recently queried domain names to see if www.python.org has already been checked by some other machine served by the DNS server in the last few minutes or hours. If an entry is present and has not yet expired, then it can be returned immediately. (The owner of each domain name gets to choose its expiration timeout, because some organizations like to change IP addresses quickly if they need to, while others are happy to have old IP addresses linger for hours or days in the world's DNS caches.)
But let us imagine that it is morning and that you are the first person in your office or in the coffee shop to try talking to www.python.org today, and so the DNS server has to go find the hostname from scratch. Your DNS server will now begin a recursive process of asking about www.python.org at the very top of the world's DNS server hierarchy: the “root-level” name servers that know all of the top-level domains (TLDs) like .com, .org, .net, and all of the country domains, and know the groups of servers that are responsible for each. Name server software generally comes with the IP addresses of these top-level servers built in, to solve the bootstrapping problem of how you find any domain name servers before you are actually connected to the domain name system. With this first UDP round-trip, your DNS server will learn (if it did not know already from another recent query) which servers keep the full index of the .org domain.
Now a second DNS request will be made, this time to one of the .org servers, asking who on earth runs the python.org domain. You can find out what those top-level servers know about a domain by running the whois command-line program on a POSIX system, or by using one of the many “whois” web pages online, typing:
whois python.org
Wherever you are in the world, your DNS request for any hostname within python.org must be passed on to one of the two DNS servers named in that entry.
There are several reasons not to speak DNS directly, and to rely instead on getaddrinfo() or some other system-supported mechanism for resolving hostnames.
The DNS is often not the only way that a system gets name information.
If your application runs off and tries to use DNS on its own as its first choice for resolving a domain name, then users will notice that some computer names that work everywhere else on your system (in their browser, in file share names, and so forth) suddenly do not work when they use your application, because you are not deferring to mechanisms like WINS or /etc/hosts like the operating system itself does.
The local machine probably has a cache of recently queried domain names that might already know about the host whose IP address you need. If you try speaking DNS yourself to answer your query, you will be duplicating work that has already been done.
The system on which your Python script is running already knows about the local domain name servers, thanks either to manual intervention by your system administrator or to a network configuration protocol like DHCP in your office, home, or coffee shop. To crank up DNS right inside your Python program, you will have to learn how to query your particular operating system for this information, an operating-system-specific action that we will not be covering in this book.
If you do not use the local DNS server, then you will not be able to benefit from its own cache that would prevent your application and other applications running on the same network from repeating requests about a hostname that is in frequent use at your location.
A Sketch of How DNS Works
From time to time, adjustments are made to the world DNS infrastructure, and operating system libraries and daemons are gradually updated to accommodate this. If your program makes raw DNS calls of its own, then you will have to follow these changes yourself and make sure that your code stays up-to-date with the latest changes in TLD server IP addresses, conventions involving internationalization, and tweaks to the DNS protocol itself.
There is, however, a solid and legitimate reason to make a DNS call from Python: because you are a mail server, or at the very least a client trying to send mail directly to your recipients without needing to run a local mail relay, and you want to look up the MX records associated with a domain so that you can find the correct mail server for your friends at @example.com.
PyDNS provides a module for performing DNS queries from Python applications. You can install it with:
pip install pydns
Your Python interpreter will then gain the ability to run our first DNS program listing, shown in dns_basic.py.
# dns_basic.py (Python 2 syntax)
import sys, DNS

if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: dns_basic.py <hostname>'
    sys.exit(2)

DNS.DiscoverNameServers()
request = DNS.Request()
for qt in DNS.Type.A, DNS.Type.AAAA, DNS.Type.CNAME, DNS.Type.MX, DNS.Type.NS:
    reply = request.req(name=sys.argv[1], qtype=qt)
    for answer in reply.answers:
        print answer['name'], answer['classstr'], answer['typename'], \
            repr(answer['data'])
Running this program will result in:
root@erlerobot:~/Python_files# dns_basic.py python.org
python.org IN A '82.94.164.162'
python.org IN AAAA '\x01\x08\x88\x00\x00\r\x00\x00\x00\x00\x00\x00\x00\xa2'
python.org IN MX (50, 'mail.python.org')
python.org IN NS 'ns2.xs4all.nl'
python.org IN NS 'ns.xs4all.nl'
The keys that get printed on each line are as follows:
The name that we looked up.
The “class,” which in all queries you are likely to see is IN, meaning it is a question about Internet addresses.
The “type” of record; some common ones are A for an IPv4 address, AAAA for an IPv6 address, NS for a record that lists a name server, and MX for a statement about what mail server should be used for a domain.
Finally, the “data” provides the information for which the record type was essentially a promise: the address, or data, or hostname associated with the name that we asked about.
Using DNS
What data should we send? How should it be encoded and formatted? For what kinds of errors will our Python programs need to be prepared? We will look at the basic answers in this chapter, and learn how to use sockets responsibly so that our data arrives intact.
Network Data and Network Errors
The use of ASCII for the basic English letters and numbers is nearly universal among network protocols these days. But when you begin to use more interesting characters, you have to be careful. In Python 2, the version used in these listings, you should always represent a meaningful string of text with a “Unicode string” that is denoted with a leading u, like this:
>>> elvish = u'Namárië!'
But you cannot put such strings directly on a network connection without specifying which rival system of encoding you want to use to mix your characters down to bytes. A very popular system is UTF-8, because normal characters are represented by the same codes as in ASCII, and longer sequences of bytes are necessary only for international characters. Other encodings are available in Python; the Standard Library documentation for the codecs package lists them all. They each represent a full system for reducing symbols to bytes. Here are a few examples:
>>> elvish.encode('idna')
'xn--namri!-rta6f'
>>> elvish.encode('cp500')
'\xd5\x81\x94E\x99\x89SO'
>>> elvish.encode('utf_8_sig')
'\xef\xbb\xbfNam\xc3\xa1ri\xc3\xab!'
On the receiving end of such a string, simply take the byte string and call its decode() method with the name of the codec that was used to encode it:
>>> 'xn--namri!-rta6f'.decode('idna')
u'nam\xe1ri\xeb!'
>>> '\xd5\x81\x94E\x99\x89SO'.decode('cp500')
u'Nam\xe1ri\xeb!'
>>> '\xef\xbb\xbfNam\xc3\xa1ri\xc3\xab!'.decode('utf_8_sig')
u'Nam\xe1ri\xeb!'
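For readers on Python 3, where every str is already Unicode and encode() returns a separate bytes type, the same round trip might look like the sketch below. The choice of UTF-8 as the wire encoding is an assumption made for illustration.

```python
# Python 3 sketch: text is str, wire data is bytes.
text = 'Nam\xe1ri\xeb!'          # the same word as above, written with escapes

wire = text.encode('utf-8')      # what would actually be passed to sendall()
print(wire)                      # b'Nam\xc3\xa1ri\xc3\xab!'
print(len(text), len(wire))      # 8 characters become 10 bytes

# The receiver must decode with the same codec the sender used.
assert wire.decode('utf-8') == text

# Decoding with the wrong codec fails loudly rather than silently.
try:
    wire.decode('ascii')
except UnicodeDecodeError as e:
    print('cannot decode as ASCII:', e.reason)
```

The length difference is worth remembering whenever a protocol carries a byte count: the count must describe the encoded bytes, not the characters.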
Text and Encodings
To understand the issue of byte order, consider the process of sending an integer over the network. To be specific, think about the integer 4253.
Many protocols, of course, will simply transmit this integer as the string '4253', that is, as four distinct characters. The four digits will require at least four bytes to transmit, at least in any common text encoding. And using decimal digits will also involve some computational expense: since numbers are not stored inside computers in base 10, it will take repeated division, with inspection of the remainder, to determine that this number is in fact made of 4 thousands, plus 2 hundreds, plus 5 tens, plus 3 left over. And when the four-digit string '4253' is received, repeated addition and multiplication by powers of ten will be necessary to put the text back together into a number.
In any case, the string '4253' is not how your computer represents this number as an integer variable in Python. Instead it will store it as a binary number, using the bits of several successive bytes to represent the one's place, two's place, four's place, and so forth of a single large number. We can glimpse the way that the integer is stored by using the hex() built-in function at the Python prompt:
>>> hex(4253)
'0x109d'
Each hex digit corresponds to four bits, so each pair of hex digits represents a byte of data. Instead of being stored as four decimal digits 4, 2, 5, and 3 with the 4 being the “most significant” digit (since tweaking its value would throw the number off by a thousand) and 3 being its least significant digit, the number is stored as a most significant byte 0x10 and a least significant byte 0x9d, adjacent to one another in memory.
Here we reach a great difference between computers. While they will all agree that the bytes in memory have an order, and they will all store a string like Content-Length: 4253 in exactly that order starting with C and ending with 3, they do not share a single idea about the order in which the bytes of a binary number should be stored. Some computers are “big-endian” and put the most significant byte first; others are “little-endian” and put the least significant byte first.
Python makes it very easy to see the difference between the two endiannesses. Simply use the struct module, which provides a variety of operations for converting data to and from popular binary formats. Here is the number 4253 represented first in a little-endian format and then in a big-endian order:
>>> import struct
>>> struct.pack('<i', 4253)
'\x9d\x10\x00\x00'
>>> struct.pack('>i', 4253)
'\x00\x00\x10\x9d'
The struct module performs conversions between Python values and C structs represented as Python strings. You can read more in the Standard Library documentation for the struct module. We here used the code i, which uses four bytes to store an integer, so the two upper bytes are zero for a small number like 4253. It also supports an unpack() operation, which converts the binary data back to Python numbers:
>>> struct.unpack('>i', '\x00\x00\x10\x9d')
(4253,)
Network protocols conventionally transmit binary numbers in big-endian order, which is why that order is also known as network byte order. The struct module therefore provides another symbol, '!', which means the same thing as '>' when used in pack() and unpack() but says to other programmers (and, of course, to yourself as you read the code later), “I am packing this data so that I can send it over the network.”
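The three byte-order prefixes can be compared side by side in a few lines. This sketch assumes Python 3, where pack() returns bytes; the two-field header layout (a 2-byte message type followed by a 4-byte length) is hypothetical and chosen only to show '!' applied to more than one field.

```python
import struct
import sys

value = 4253

little = struct.pack('<I', value)    # least significant byte first
big = struct.pack('>I', value)       # most significant byte first
network = struct.pack('!I', value)   # network byte order, same bytes as '>'

print('this machine is', sys.byteorder + '-endian')
print(little, big, network)

# Hypothetical wire header: 2-byte message type, then 4-byte payload length.
HEADER = struct.Struct('!HI')
packed = HEADER.pack(7, value)
msg_type, length = HEADER.unpack(packed)
print(HEADER.size, packed, msg_type, length)
```

With the '!' prefix no padding is inserted between fields, so the header occupies exactly six bytes on every platform.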
Network Byte Order
If you have made the far more common choice of using a TCP stream for communication, then you will face the issue of framing, that is, the issue of how to delimit your messages so that the receiver can tell where one message ends and the next begins.
There is a first pattern (streaming) that can be used by extremely simple network protocols that involve only the delivery of data. No response is expected, so there never has to come a time when the receiver decides “Enough!” and turns around to send a response. In this case, the sender can loop until all of the outgoing data has been passed to sendall() and then close() the socket. The receiver need only call recv() repeatedly until the call finally returns an empty string, indicating that the sender has finally closed the socket. You can see this pattern in streamer.py:
# streamer.py (Python 2 syntax)
import socket, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    print 'Listening at', s.getsockname()
    sc, sockname = s.accept()
    print 'Accepted connection from', sockname
    sc.shutdown(socket.SHUT_WR)  # the server only receives
    message = ''
    while True:
        more = sc.recv(8192)  # arbitrary value of 8k
        if not more:  # socket has closed when recv() returns ''
            break
        message += more
    print 'Done receiving the message; it says:'
    print message
    sc.close()
    s.close()
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)  # the client only sends
    s.sendall('Beautiful is better than ugly.\n')
    s.sendall('Explicit is better than implicit.\n')
    s.sendall('Simple is better than complex.\n')
    s.close()
else:
    print >>sys.stderr, 'usage: streamer.py server|client [host]'
If you run this script as a server and then, at another command prompt, run the client version, you will see that all of the client's data makes it intact to the server, with the end-of-file event generated by the client closing the socket serving as the only framing that is necessary:
root@erlerobot:~/Python_files# python streamer.py server
Listening at ('127.0.0.1', 1060)
Accepted connection from ('127.0.0.1', 49592)
Done receiving the message; it says:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
A second pattern is a variant on the first: streaming in both directions. The socket is initially left open in both directions. First, data is streamed in one direction, exactly as in streamer.py, and then that direction alone is shut down. Second, data is then streamed in the other direction, and the socket is finally closed.
Framing and Quoting
A third pattern, which we have already seen, is to use fixed-length messages, as illustrated in tcp_sixteen.py. You can use the Python sendall() method to keep sending parts of a string until the whole thing has been transmitted, and then use a recv() loop of your own devising to make sure that you receive the whole message.
A fourth pattern is to somehow delimit your messages with special characters. The receiver would wait in a recv() loop like the one just cited, but wait until the reply string it was accumulating finally contained the delimiter indicating the end-of-message.
A fifth pattern is to prefix each message with its length. This is a very popular choice for high-performance protocols since blocks of binary data can be sent verbatim without having to be analyzed, quoted, or interpolated. Of course, the length itself has to be framed using one of the techniques given previously; often it is simply a fixed-width binary integer, or else a variable-length decimal string followed by a delimiter. But either way, once the length has been read and decoded, the receiver can enter a loop and call recv() repeatedly until the whole message has arrived.
There is a sixth pattern for which unknown lengths are no problem. Instead of sending just one, try sending several blocks of data that are each prefixed with their length. This means that as each chunk of new information becomes available to the sender, it can be labeled with its length and placed on the outgoing stream. When the end finally arrives, the sender can emit an agreed-upon signal, perhaps a length field giving the number zero, that tells the receiver that the series of blocks is complete.
The following listing (blocks.py) is an example of this sixth pattern. Like the previous one, it sends data in only one direction, from the client to the server, but the data structure is much more interesting. Each message is prefixed with a 4-byte length; in a struct, 'I' means a 32-bit unsigned integer, meaning that these messages can be up to 4GB in length. A series of three such messages is sent to the server, followed by a zero-length message (which is essentially just a length field with zeros inside and then no message data after it) to signal that the series of blocks is over.
# blocks.py (Python 2 syntax)
import socket, struct, sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060
format = struct.Struct('!I')  # for messages up to 2**32 - 1 in length

def recvall(sock, length):
    # Keep calling recv() until exactly `length` bytes have arrived.
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data

def get(sock):
    # Read the 4-byte length prefix, then the message body it announces.
    lendata = recvall(sock, format.size)
    (length,) = format.unpack(lendata)
    return recvall(sock, length)

def put(sock, message):
    # sendall() rather than send(), so a partial write cannot truncate a block.
    sock.sendall(format.pack(len(message)) + message)

if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    print 'Listening at', s.getsockname()
    sc, sockname = s.accept()
    print 'Accepted connection from', sockname
    sc.shutdown(socket.SHUT_WR)
    while True:
        message = get(sc)
        if not message:  # a zero-length block ends the series
            break
        print 'Message says:', repr(message)
    sc.close()
    s.close()
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)
    put(s, 'Beautiful is better than ugly.')
    put(s, 'Explicit is better than implicit.')
    put(s, 'Simple is better than complex.')
    put(s, '')
    s.close()
else:
    print >>sys.stderr, 'usage: blocks.py server|client [host]'
Running first the server and then the client in different terminals results in:
root@erlerobot:~/Python_files# python blocks.py server
Listening at ('127.0.0.1', 1060)
Accepted connection from ('127.0.0.1', 49692)
Message says: 'Beautiful is better than ugly.'
Message says: 'Explicit is better than implicit.'
Message says: 'Simple is better than complex.'
root@erlerobot:~/Python_files#
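The fourth pattern described earlier, delimiter-based framing, can be sketched in a similar spirit. The code below assumes Python 3 and a newline as the delimiter, and it uses socket.socketpair() so that it can run without a separate server; the name recv_messages is hypothetical.

```python
import socket

DELIMITER = b'\n'  # assumed terminator; it must never appear inside a message


def recv_messages(sock, delimiter=DELIMITER, bufsize=4096):
    """Yield each delimiter-terminated message received on sock until EOF."""
    buffer = b''
    while True:
        # Hand back every complete message already sitting in the buffer.
        while delimiter in buffer:
            message, _, buffer = buffer.partition(delimiter)
            yield message
        more = sock.recv(bufsize)
        if not more:  # the peer closed the connection
            if buffer:
                raise EOFError('connection closed in the middle of a message')
            return
        buffer += more


# Demonstration over a connected pair of local sockets.
sender, receiver = socket.socketpair()
sender.sendall(b'Beautiful is better than ugly.\nExplicit is better')
sender.sendall(b' than implicit.\n')  # a message may span several sends
sender.close()
received = list(recv_messages(receiver))
receiver.close()
print(received)
```

The buffer is what lets a message straddle two recv() calls, and it is also why data arriving after a delimiter is kept for the next message rather than thrown away.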
Note that some kinds of data that you might send across the network already include some form of delimiting built in. If you are transmitting such data, then you might not have to impose your own framing atop what the data is already doing. Consider Python “pickles” for example, the native form of serialization that comes with the Standard Library. The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Moreover, using a quirky mix of text commands and data, a pickle stores the contents of a Python data structure so that you can reconstruct it later or on a different machine:
>>> import pickle
>>> pickle.dumps([5, 6, 7])
'(lp0\nI5\naI6\naI7\na.'
The interesting thing about the format is the '.' character that you see at the end of the foregoing string: it is the format's way of marking the end of a pickle. Upon encountering it, the loader can stop and return the value without reading any further. Thus we can take the foregoing pickle, stick some ugly data on the end, and see that loads() will completely ignore it and give us our original list back:
>>> pickle.loads('(lp0\nI5\naI6\naI7\na.UjJGdVpHRnNaZz09')
[5, 6, 7]
Of course, using loads() this way is not useful for network data, since it does not tell us how many bytes it processed in order to reload the pickle; we still do not know how much of our string is pickle data. But if we switch to reading from a file and using the pickle load() function, then the file pointer will be left right at the end of the pickle data, and we can start reading from there if we want to read what came after the pickle:
>>> from StringIO import StringIO
>>> f = StringIO('(lp0\nI5\naI6\naI7\na.UjJGdVpHRnNaZz09')
>>> pickle.load(f)
[5, 6, 7]
>>> f.pos
18
>>> f.read()
'UjJGdVpHRnNaZz09'
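The same experiment can be repeated on Python 3, where pickles are bytes and io.BytesIO takes the place of StringIO. This is only a sketch of the self-delimiting behavior; it also bears repeating that pickle data from an untrusted peer should never be loaded, since unpickling can execute arbitrary code.

```python
import io
import pickle

# One pickle followed by unrelated trailing bytes, as in the session above.
stream = io.BytesIO(pickle.dumps([5, 6, 7]) + b'UjJGdVpHRnNaZz09')

value = pickle.load(stream)   # stops reading at the end of the pickle
rest = stream.read()          # whatever followed the pickle is still unread

print(value)
print(rest)
```

Because load() leaves the file position just past the pickle, several pickles written back to back can be read with repeated load() calls on the same file object.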
Pickles and Self-Delimiting Formats
If your protocol needs to be usable from other programming languages, or if you simply prefer universal standards to formats specific to Python, then the JSON and XML data formats are each a popular choice. Note that neither of these formats supports framing, so you will have to first figure out how to extract a complete string of text from over the network before you can then process it.
JSON is among the best choices available today for sending data between different computer languages. Since Python 2.6, it has been included in the Standard Library as a module named json. JSON, short for JavaScript Object Notation, is a lightweight format for data exchange. It is a subset of JavaScript's object literal notation and does not require the use of XML. For encoding basic Python object hierarchies:
>>> # The syntax is:
...
>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> # Example:
...
>>> json.dumps([51, u'Namárië!'])
'[51, "Nam\\u00e1ri\\u00eb!"]'
For decoding it you should use:
>>> # The syntax is:
...
>>> import json
>>> json.loads('["foo", {"bar": ["baz", null, 1.0, 2]}]')
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]
>>> # An example:
...
>>> json.loads('{"name": "Lancelot", "quest": "Grail"}')
{u'quest': u'Grail', u'name': u'Lancelot'}
Note that the protocol fully supports Unicode strings. It does, however, have a weakness: a vast omission in the JSON standard is that it makes absolutely no provision for cleanly passing binary data like images or arbitrary documents. The XML format is better for documents, since its basic structure is to take strings and mark them up by wrapping them in angle-bracketed elements.
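One common workaround for the binary-data gap, shown here only as a sketch, is to Base64-encode the bytes and carry them as an ordinary JSON string. The code assumes Python 3; the field names name and payload and both function names are hypothetical.

```python
import base64
import json


def encode_message(name, payload):
    """Serialize a text field plus raw bytes as one UTF-8 JSON document."""
    document = {
        'name': name,
        # JSON has no bytes type, so the payload travels as Base64 text.
        'payload': base64.b64encode(payload).decode('ascii'),
    }
    return json.dumps(document).encode('utf-8')


def decode_message(wire):
    """Reverse encode_message(), returning (name, payload_bytes)."""
    document = json.loads(wire.decode('utf-8'))
    return document['name'], base64.b64decode(document['payload'])


wire = encode_message('Lancelot', b'\x00\xff\x10\x9d')
print(wire)
print(decode_message(wire))
```

The cost is size: Base64 grows the payload by roughly a third, and the resulting JSON document still needs one of the framing patterns discussed earlier before it can be sent over a stream.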
XML, JSON, Etc.
Since the time necessary to transmit data over the network is often more significant than the time your CPU spends preparing the data for transmission, it is often worthwhile to compress data before sending it. The popular HTTP protocol lets a client and server figure out whether they can both support compression.
An interesting fact about the most ubiquitous form of compression, the zlib facility that is available through the Python Standard Library (its zlib module provides functions for compression and decompression using the zlib library), is that it is self-framing. If you start feeding it a compressed stream of data, then it can tell you when the compressed data has ended and further, uncompressed data has arrived past its end.
Most protocols choose to do their own framing and then, if desired, pass the resulting block to zlib for decompression. But you could conceivably promise yourself that you would always tack a bit of uncompressed data onto the end of each zlib compressed string (here, we will use a single '.' byte) and watch for your compression object to split out that “extra data” as the signal that you are done. Consider this combination of two compressed data streams:
>>> import zlib
>>> data = zlib.compress('sparse') + '.' + zlib.compress('flat') + '.'
>>> data
'x\x9c+.H,*N\x05\x00\t\r\x02\x8f.x\x9cK\xcbI,\x01\x00\x04\x16\x01\xa8.'
>>> len(data)
28
Imagine that these 28 bytes arrive at their destination in 8-byte packets. After processing the first packet, we will find the decompression object's unused_data slot still empty, which tells us that there is still more data coming, so we would recv() on our socket again:
>>> dobj = zlib.decompressobj()
>>> dobj.decompress(data[0:8]), dobj.unused_data
('spars', '')
But the second block of eight characters, when fed to our decompress object, both finishes out the compressed data we were waiting for (since the final 'e' completes the string 'sparse') and also finally has a non-empty unused_data value that shows us that we finally received our '.' byte:
>>> dobj.decompress(data[8:16]), dobj.unused_data
('e', '.x')
If another stream of compressed data is coming, then we have to provide everything past the '.' (in this case, the character 'x') to our new decompress object, then start feeding it the remaining “packets”:
>>> dobj2 = zlib.decompressobj()
>>> dobj2.decompress('x'), dobj2.unused_data
('', '')
>>> dobj2.decompress(data[16:24]), dobj2.unused_data
('flat', '')
>>> dobj2.decompress(data[24:]), dobj2.unused_data
('', '.')
At this point, unused_data is again non-empty, meaning that we have read past the end of this second bout of compressed data and can examine its content.
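The bookkeeping in that session can be wrapped in a small helper. The sketch below assumes Python 3, where compressed data is bytes and decompression objects expose an eof attribute; the function name read_one_stream is hypothetical, and it stops consuming chunks as soon as the first compressed stream ends.

```python
import zlib


def read_one_stream(chunks):
    """Decompress one zlib stream from an iterable of byte chunks.

    Returns (payload, leftover), where leftover holds any bytes that
    followed the end of the compressed stream in the final chunk read.
    """
    dobj = zlib.decompressobj()
    parts = []
    for chunk in chunks:
        parts.append(dobj.decompress(chunk))
        if dobj.eof:  # the compressed stream is complete
            break
    return b''.join(parts), dobj.unused_data


data = zlib.compress(b'sparse') + b'.' + zlib.compress(b'flat') + b'.'
packets = [data[i:i + 8] for i in range(0, len(data), 8)]

payload, leftover = read_one_stream(packets)
print(payload)
print(leftover)
```

A real receiver would go on to check that leftover begins with the agreed '.' byte and then feed the remainder, plus any later packets, to a fresh decompression object.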
Compression
Depending on the protocol implementation that you are using, you might have to deal only with exceptions specific to that protocol, or you might have to deal with both protocol-specific exceptions and with raw socket errors as well.
The exceptions that are specific to socket operations are:
socket.gaierror: This exception is raised when getaddrinfo() cannot find a name or service that you ask about, hence the letters G, A, and I in its name.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('nonexistent.hostname.foo.bar', 80))
Traceback (most recent call last):
...
gaierror: [Errno -5] No address associated with hostname
socket.error: This is the workhorse of the socket module, and will be raised for nearly every failure that can happen at any stage in a network transmission.
socket.timeout: This exception is raised only if you, or a library that you are using, decides to set a timeout on a socket rather than wait forever for a send() or recv() to complete. It indicates that the timeout was reached before the operation could complete normally.
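The three exceptions can be told apart in a single try statement, as in the following sketch. It assumes Python 3, where socket.error is an alias of OSError and both socket.gaierror and socket.timeout are subclasses of it, so the more specific handlers have to come first. The function name probe and the wording of its return values are hypothetical.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and describe the outcome as a short string."""
    try:
        s = socket.create_connection((host, port), timeout=timeout)
    except socket.gaierror as e:
        return 'name service failure: %s' % e   # the lookup itself failed
    except socket.timeout:
        return 'timed out'                      # no answer within the timeout
    except OSError as e:
        return 'network failure: %s' % e        # refused, unreachable, ...
    s.close()
    return 'success'
```

If the OSError handler were listed first, it would swallow the other two cases and the distinction would be lost.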
Network Exceptions
There are four basic approaches to handling the errors that can occur.
The first is not to handle exceptions at all. If only you or only other Python programmers will be using your script, then they will probably not be fazed by seeing an exception. If you are writing a library of calls to be used by other programmers, then this first approach is usually preferable, since by letting the exception through you give the programmer using your API the chance to decide how to present errors to his or her users.
If you are indeed writing a library, then there is a second approach to consider: wrapping the network errors in an exception of your own.
A third approach to exceptions is to wrap a try...except clause around every single network call that you ever make, and print out a pithy error message in its place. While suitable for short programs, this can become very repetitive when long programs are involved, without necessarily providing that much more information for the user.
There is one final reason that might dictate where you add an exception handler to your network program: you might want to intelligently retry an operation that failed.
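The second and fourth approaches can be combined, as in the Python 3 sketch below. Everything named here is hypothetical: `ServiceError` stands in for a library's own exception, and `call_with_retries` retries an operation with a simple exponential backoff before giving up.

```python
import socket
import time

class ServiceError(Exception):
    """Hypothetical library-level exception that hides the socket details."""

def call_with_retries(operation, attempts=3, delay=0.01):
    """Run operation(); retry on socket errors, then wrap the final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except socket.error as e:
            if attempt == attempts:
                raise ServiceError('gave up after %d attempts: %s'
                                   % (attempts, e))
            # Wait a little longer after each failure before trying again.
            time.sleep(delay * 2 ** (attempt - 1))

# A stand-in operation that fails twice and then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise socket.error('temporary failure')
    return 'ok'

result = call_with_retries(flaky)
```

Placing the handler around a whole operation, rather than around each socket call, is what makes the retry meaningful: the operation can be restarted from a known state.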
Handling Exceptions
Before you send sensitive data across a network, you need proof of the identity of the machine that you think is on the other end of the socket, and while sending the data, you need it protected against the prying eyes of anyone controlling the gateways and network switches that see all of your packets. The solution to this problem is to use Transport Layer Security (TLS). Because earlier versions of TLS were called the Secure Sockets Layer (SSL), nearly all of the libraries that you will use to speak TLS still have SSL somewhere in the name.
TLS and SSL
There are several security problems that TLS is designed to solve. They are best understood by considering the dangers of sending your network data as "cleartext" over a plain old socket, which copies your data byte-for-byte into the packets that get sent over the network.
What are the consequences if someone can now observe, capture, and analyze your data at his leisure?
He can see all of the data that passes over that segment of the network. The fraction of your data that he can capture depends on how much of it passes over that particular link.
He will see any usernames and passwords that your clients use to connect to the servers behind them.
Log messages can also be intercepted, if they are being sent to a central location and happen to travel over a compromised IP segment or device. This could be very useful if the observer wants to probe for vulnerabilities in your software.
If your database server is not picky about who connects, aside from caring that the web front end sends a password, then the attacker can now launch a "replay attack," in which he makes his own connection to your database and downloads all of the data that a front-end server is normally allowed to access.
Now imagine an attacker who cannot yet alter traffic on your network itself, but who can compromise one of the services around the edges that help your servers find each other. Specifically, what if she can compromise the DNS service that lets your web front ends find your db.example.com server? Then some interesting tricks might become possible:
When your front ends ask for the hostname db.example.com, she could answer with the IP address of her own server, located anywhere in the world, instead.
The fake database server will be at a loss to answer requests with any real data that the intruder has not already copied down off the network.
If your database is not carefully locked down and so is not picky about which servers connect, then the attacker can do something more interesting: as requests start arriving at her fake database server, she can have it turn around and forward those requests to the real database server. This is called a "man-in-the-middle" attack: she will be in fairly complete control of your application.
While proxying the client requests through to the database, the attacker will probably also have the option of inserting queries of her own into the request stream. This could let her download entire tables of data and delete or change whatever data the front-end services are typically allowed to modify.
Cleartext on the Network
The secret to TLS is public-key cryptography. There are several mathematical schemes that have been proved able to support public-key schemes, but they all have these three features:
Anyone can generate a key pair, consisting of a private key that they keep to themselves and a public key that they can broadcast however they want.
If the public key is used to encrypt information, then the resulting block of binary data cannot be read by anyone, anywhere in the world, except by someone who holds the private key.
If the system that holds the private key uses it to encrypt information, then any copy of the public key can be used to decrypt the data.
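The second and third features can be illustrated with textbook RSA arithmetic. This is only a toy under stated assumptions: the primes 61 and 53 are far too small to be secure, real systems add padding and use keys of thousands of bits, and the helper name `apply_key` is hypothetical.

```python
# Toy RSA key pair built from the tiny primes 61 and 53.
n = 61 * 53        # modulus, part of both the public and the private key
e = 17             # public exponent
d = 2753           # private exponent, chosen so that (e * d) % 3120 == 1,
                   # where 3120 == (61 - 1) * (53 - 1)

def apply_key(number, exponent):
    """Raise number to the exponent modulo n; the same step serves both keys."""
    return pow(number, exponent, n)

message = 65

# Feature two: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# Feature three: transform with the private key, undo it with the public key.
signature = apply_key(message, d)
verified = apply_key(signature, e)
```

The third feature is what lets a server prove its identity: only the holder of the private key could have produced a value that the public key turns back into the expected message.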
Public keys are used at two different levels within TLS: first, to establish a certificate authority (CA) system that lets servers prove "who they really are" to the clients that want to connect; and, second, to help a particular client and server communicate securely.
TLS Encrypts Your Conversations
From the point of view of your network program, you start a TLS connection by turning control of a socket over to an SSL library. By doing so, you indicate that you want to stop using the socket for cleartext communication, and start using it for encrypted data under the control of the library.
From that point on, you no longer use the raw socket; doing so will cause an error and break the connection. Instead, you will use routines provided by the library to perform all communication. Both client and server should turn their sockets over to SSL at the same time, after reading all pending data off of the socket in both directions. There are two general approaches to using SSL:
The most straightforward option is probably to use the ssl package that recent versions of Python ship with the Standard Library.
The alternative is to use a third-party Python library. There are several of these that support TLS, but many of them are decrepit and seem to have been abandoned. One example of a third-party library is the M2Crypto package.
Supporting TLS in Python
The following example, sslclient.py, shows TLS in use. The first and last few lines of the file look completely normal: opening a socket to a remote server, and then sending and receiving data per the protocol that the server supports. The cryptographic protection is invoked by the few lines of code in the middle: two lines that load a certificate database and make the TLS connection itself, and then the call to match_hostname() that performs the crucial test of whether we are really talking to the intended server or perhaps to an impersonator.
import os, socket, ssl, sys
from backports.ssl_match_hostname import match_hostname, CertificateError

try:
    script_name, hostname = sys.argv
except ValueError:
    print >>sys.stderr, 'usage: sslclient.py <hostname>'
    sys.exit(2)

# First we connect, as usual, with a socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((hostname, 443))

# Next, we turn the socket over to the SSL library!
# PROTOCOL_SSLv23 negotiates the highest protocol version both ends support.
ca_certs_path = os.path.join(os.path.dirname(script_name), 'certfiles.crt')
sslsock = ssl.wrap_socket(sock, ssl_version=ssl.PROTOCOL_SSLv23,
                          cert_reqs=ssl.CERT_REQUIRED, ca_certs=ca_certs_path)

# Does the certificate that the server proffered *really* match the
# hostname to which we are trying to connect? We need to check.
try:
    match_hostname(sslsock.getpeercert(), hostname)
except CertificateError as ce:
    print 'Certificate error:', str(ce)
    sys.exit(1)

# From here on, our `sslsock` works like a normal socket. We can, for
# example, make an impromptu HTTP call.
sslsock.sendall('GET / HTTP/1.0\r\n\r\n')
result = sslsock.makefile().read()  # quick way to read until EOF
sslsock.close()
print 'The document https://%s/ is %d bytes long' % (hostname, len(result))
Note that the certificate database needs to be provided as a file named certfiles.crt in the same directory as the script.
root@erlerobot:~/Python_files# cat /etc/ssl/certs/* > certfiles.crt
root@erlerobot:~/Python_files# python sslclient.py www.openssl.org
The document https://www.openssl.org/ is 15941 bytes long
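On Python 3 the same two checks (a trusted certificate chain and a matching hostname) can be requested through an SSL context. The following is a sketch rather than a port of sslclient.py; the function names `make_client_context` and `open_tls_connection` are hypothetical, and the system's default certificate store is assumed in place of certfiles.crt.

```python
import socket
import ssl

def make_client_context():
    """Build a context that verifies certificates and hostnames."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on both certificate and hostname verification.
    return ssl.create_default_context()

def open_tls_connection(hostname, port=443):
    """Connect to hostname and return a socket wrapped in TLS."""
    context = make_client_context()
    sock = socket.create_connection((hostname, port))
    # server_hostname is the name checked against the server's certificate.
    return context.wrap_socket(sock, server_hostname=hostname)

context = make_client_context()
```

A call such as `open_tls_connection('www.openssl.org')` would raise an ssl.SSLError subclass during the handshake if either check failed, so no separate match_hostname() step is needed.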
The Standard SSL Module
This chapter explores how network programming intersects with the general tools and techniques that Python developers use to write long-running daemons that can perform significant amounts of work by keeping a computer and its processors busy.
Server Architecture
A daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. You can install python-daemon from the Python Package Index, and its code will let your server program become a daemon entirely on its own power.
Another useful tool is the modern logging module, which can write to syslog, files, network sockets, or anything in between. The simplest pattern is to place something like this at the top of each of your daemon's source files:
import logging
log = logging.getLogger(__name__)
Then your code can generate messages very simply:
log.error('the system is down')
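Where the messages end up is decided by the handlers attached to the logger. A minimal Python 3 sketch follows, assuming a hypothetical logger name `lancelot_daemon` and an in-memory stream as the destination; a daemon could attach logging.handlers.SysLogHandler or logging.FileHandler in the same way.

```python
import io
import logging

log = logging.getLogger('lancelot_daemon')   # hypothetical logger name

def configure_logging(stream):
    """Send this logger's messages to `stream` with a timestamped format."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(name)s %(levelname)s %(message)s'))
    log.addHandler(handler)
    log.setLevel(logging.INFO)

buffer = io.StringIO()        # stands in for a log file or syslog
configure_logging(buffer)
log.error('the system is down')
logged = buffer.getvalue()
```

Keeping the configuration in one function means the modules that only call `log.error()` never need to know whether the daemon logs to a file, to syslog, or to the console.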
Daemons and Logging
The module lancelot.py defines a minimalist protocol. The client opens a socket and sends across one of the three questions asked of Sir Launcelot at the Bridge of Death in the movie Monty Python and the Holy Grail; the question mark marks the end of the message: What is your name? The server replies by sending back the appropriate answer, which always ends with a period: My name is Sir Lancelot of Camelot. Both question and answer are encoded as ASCII.
import socket, sys

PORT = 1060
qa = (('What is your name?', 'My name is Sir Lancelot of Camelot.'),
      ('What is your quest?', 'To seek the Holy Grail.'),
      ('What is your favorite color?', 'Blue.'))
qadict = dict(qa)

def recv_until(sock, suffix):
    # Keep reading until the message ends with the expected terminator.
    message = ''
    while not message.endswith(suffix):
        data = sock.recv(4096)
        if not data:
            raise EOFError('socket closed before we saw %r' % suffix)
        message += data
    return message

def setup():
    if len(sys.argv) != 2:
        print >>sys.stderr, 'usage: %s interface' % sys.argv[0]
        sys.exit(2)
    interface = sys.argv[1]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((interface, PORT))
    sock.listen(128)
    print 'Ready and listening at %r port %d' % (interface, PORT)
    return sock
The server code is server_simple.py:
import lancelot

def handle_client(client_sock):
    # Answer questions until the client closes the connection.
    try:
        while True:
            question = lancelot.recv_until(client_sock, '?')
            answer = lancelot.qadict[question]
            client_sock.sendall(answer)
    except EOFError:
        client_sock.close()

def server_loop(listen_sock):
    while True:
        client_sock, sockname = listen_sock.accept()
        handle_client(client_sock)

if __name__ == '__main__':
    listen_sock = lancelot.setup()
    server_loop(listen_sock)
This simple server has terrible performance characteristics. The difficulty comes when many clients all want to connect at the same time. The first client's socket will be returned by accept(), and the server will enter the handle_client() loop to start answering that first client's questions. But while the questions and answers are trundling back and forth across the network, all of the other clients are forced to queue up.
Introductory example
We will tackle the deficiencies of the simple server shown in server_simple.py in two discussions. First, in this section, we will discuss how much time it spends waiting even on one client that needs to ask several questions; and in the next section, we will look at how it behaves when confronted with many clients at once. A simple client for the Launcelot protocol connects, asks each of the three questions once, and then disconnects. The code of client.py is the following:
import socket, sys, lancelot

def client(hostname, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((hostname, port))
    s.sendall(lancelot.qa[0][0])
    answer1 = lancelot.recv_until(s, '.')  # answers end with '.'
    s.sendall(lancelot.qa[1][0])
    answer2 = lancelot.recv_until(s, '.')
    s.sendall(lancelot.qa[2][0])
    answer3 = lancelot.recv_until(s, '.')
    s.close()
    print answer1
    print answer2
    print answer3

if __name__ == '__main__':
    if not 2 <= len(sys.argv) <= 3:
        print >>sys.stderr, 'usage: client.py hostname [port]'
        sys.exit(2)
    port = int(sys.argv[2]) if len(sys.argv) > 2 else lancelot.PORT
    client(sys.argv[1], port)
With these two scripts in place, we can start running our server in one console window:
```
root@erlerobot:~/Python_files# python server_simple.py localhost
Ready and listening at 'localhost' port 1060
```
We can then run our client in another window, and see the three answers returned by the server:
root@erlerobot:~/Python_files# python client.py localhost
My name is Sir Lancelot of Camelot.
To seek the Holy Grail.
Blue.
The client and server run very quickly here on my laptop. But appearances are deceiving, so we had better approach this client-server interaction more scientifically by bringing real measurements to bear upon its activity.
One way to measure the real waiting time is to run the client and server on a single machine, but to send the connection on a round trip to another machine by way of an SSH tunnel.
When doing this you will notice how the cost of communication dominates the performance. It always seems to take less than 10 μs for the server to run the answer = line and retrieve the response that corresponds to a particular question. If actually generating the answer were the server's only job, then we could expect it to serve more than 100,000 client requests per second. But look at all of the time that the client and server spend waiting for the network: every time one of them finishes a sendall() call, it takes between 500 μs and 800 μs before the other conversation partner is released from its recv() call and can proceed.
From now on, we will need a system for comparing the subsequent server designs that we explore. We are therefore going to turn to a public tool: FunkLoad, written in Python and available from the Python Package Index.
root@erlerobot:~/Python_files# pip install funkload
Elementary client
The simple server we have been examining has the problem that the recv() call often finds that no data is yet available from the client, so the call "blocks" until data arrives. The time spent waiting, as we have seen, is time lost; it cannot be spent usefully by the server to answer requests from other clients.
But what if we avoided ever calling recv() until we knew that data had arrived from a particular client? The result would be an event-driven server that sits in a tight loop watching many clients; I have written an example, shown in server_poll.py.
import lancelot
import select

listen_sock = lancelot.setup()
sockets = {listen_sock.fileno(): listen_sock}
requests = {}
responses = {}

poll = select.poll()
poll.register(listen_sock, select.POLLIN)

while True:
    for fd, event in poll.poll():
        sock = sockets[fd]

        # Remove closed sockets from our list.
        if event & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
            poll.unregister(fd)
            del sockets[fd]
            requests.pop(sock, None)
            responses.pop(sock, None)

        # Accept connections from new sockets.
        elif sock is listen_sock:
            newsock, sockname = sock.accept()
            newsock.setblocking(False)
            fd = newsock.fileno()
            sockets[fd] = newsock
            poll.register(fd, select.POLLIN)
            requests[newsock] = ''

        # Collect incoming data until it forms a question.
        elif event & select.POLLIN:
            data = sock.recv(4096)
            if not data:      # end-of-file
                sock.close()  # makes POLLNVAL happen next time
                continue
            requests[sock] += data
            if '?' in requests[sock]:
                question = requests.pop(sock)
                answer = dict(lancelot.qa)[question]
                poll.modify(sock, select.POLLOUT)
                responses[sock] = answer

        # Send out pieces of each reply until they are all sent.
        elif event & select.POLLOUT:
            response = responses.pop(sock)
            n = sock.send(response)
            if n < len(response):
                responses[sock] = response[n:]
            else:
                poll.modify(sock, select.POLLIN)
                requests[sock] = ''
The main loop in this program is controlled by the poll object, which is queried at the top of every iteration. The poll() call is itself a blocking call; the difference is that recv() has to wait on one single client, while poll() can wait on dozens or hundreds of clients and return when any of them shows activity.
The way poll() works is that we tell it which sockets we need to monitor, and whether each socket interests us because we want to read from it or write to it. When one or more of the sockets are ready, poll() returns and provides a list of the sockets that we can now use.
Event-Driven Servers
To keep things straight when reading the code, think about the lifespan of one particular client and trace what happens to its socket and data.
The client will first do a connect(), and the server's poll() call will return and declare that there is data ready on the main listening socket. That can mean only one thing: a new client has connected. So we accept() the connection and tell our poll object that we want to be notified when data becomes available for reading from the new socket. To make sure that the recv() and send() methods on the socket never block and freeze our event loop, we call the setblocking() socket method with the value False (which means "blocking is not allowed").
When data becomes available, the incoming string is appended to whatever is already in the requests dictionary under the entry for that socket. (Sockets can safely be used as dictionary keys in Python.)
We keep accepting more data until we see a question mark, at which point the Launcelot question is complete. The questions are so short that, in practice, they probably all arrive in the very first recv() from each socket; but just to be safe, we have to be prepared to make several recv() calls until the whole question has arrived. We then look up the appropriate answer, store it in the responses dictionary under the entry for this client socket, and tell the poll object that we no longer want to listen for more data from this client but instead want to be told when its socket can start accepting outgoing data.
Once a socket is ready for writing, we send as much of the answer as will fit into one send() call on the client socket. This, by the way, is a big reason send() returns a length: because if you use it in non-blocking mode, then it might be able to send only some of your bytes without making you wait for a buffer to drain back down.
Once this server has finished transmitting the answer, we tell the poll object to swap the client socket back over to being listened to for new incoming data.
After many question-answer exchanges, the client will finally close the connection. The POLLHUP, POLLERR, and POLLNVAL circumstances that poll() can tell us about all indicate that a socket is no longer usable, but a clean close by the client shows up first as a read that returns zero bytes. When that happens, the listing closes the socket, so that the next poll() reports POLLNVAL for its descriptor and the cleanup branch at the top of the loop can forget the socket.
A slightly older mechanism for writing event-driven servers that listen to sockets is to use the select() call, which like poll() is available from the Python select module in the Standard Library. I recommend using poll() because it produces much cleaner code, but many people choose select() because it is supported on Windows.
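For comparison, here is a minimal Python 3 sketch of waiting on several sockets with select(). The helper name `wait_readable` is hypothetical, and local socket pairs stand in for connected clients.

```python
import select
import socket

def wait_readable(socks, timeout):
    """Return the sockets in `socks` that have data (or EOF) ready to read."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

# Two "clients", each represented by one end of a local socket pair.
a, b = socket.socketpair()
c, d = socket.socketpair()

idle = wait_readable([a, c], 0.05)     # nobody has sent anything yet
b.sendall(b'What is your name?')
ready = wait_readable([a, c], 1.0)     # now only `a` has data waiting
question = ready[0].recv(4096)

for s in (a, b, c, d):
    s.close()
```

Where poll() keeps a registry of descriptors between calls, select() takes the full lists of sockets on every call, which is part of why code built on it tends to be more cluttered.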
When talking about event-driven servers, keep one point in mind: event-driven servers are blocking and synchronous. Some people call servers like the one in server_poll.py "non-blocking," despite the fact that the poll() call blocks (they mean that it does not block waiting for any particular client), and others call them "asynchronous," despite the fact that the program executes its statements in their usual linear order.
Two things you should know
I should add a quick note about how recv() and send() behave in non-blocking mode, when you have called setblocking(False) on their socket. A poll() loop like the one just shown means that we never wind up calling either of these functions when they cannot accept or provide data. But what if we find ourselves in a situation where we want to call either function in non-blocking mode and do not yet know whether the socket is ready?
For the recv() call, these are the rules:
If data is ready, it is returned.
If no data has arrived, socket.error is raised.
If the connection has closed, '' is returned.
Note that a closed connection returns a value, but a still-open connection raises an exception. The logic behind this behavior is that the first and last possibilities are both possible in blocking mode as well: either you get data back, or finally the connection closes and you get back an empty string. So to communicate the extra, third possibility that can happen in non-blocking mode (that the connection is still open but no data is ready yet) an exception is used.
The behavior of non-blocking send() is similar:
Some data is sent, and its length is returned.
The socket buffers are full, so socket.error is raised.
If the connection is closed, socket.error is also raised.
The last possibility is a reminder that poll() can say that a socket is ready for sending, and yet a FIN packet from the client can arrive right after the server is released from its poll() but before it can start its send() call.
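The three recv() outcomes can be observed directly. This sketch is written for Python 3, where the "no data yet" case raises BlockingIOError (a subclass of OSError, of which socket.error is an alias) and the empty result is b''; a local socket pair stands in for a client connection.

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)

# Case 1: the connection is open but nothing has arrived, so recv() raises.
try:
    a.recv(4096)
    outcome = 'data'
except BlockingIOError:
    outcome = 'would block'

# Case 2: data is ready, so it is returned.
b.sendall(b'Blue.')
select.select([a], [], [], 1.0)    # wait until the bytes are readable
data = a.recv(4096)

# Case 3: the peer has closed, so an empty bytes object is returned.
b.close()
select.select([a], [], [], 1.0)    # wait until end-of-file is visible
at_eof = a.recv(4096)
a.close()
```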
The Semantics of Non-blocking
There are a couple of Python facts to take into account when you are writing your own event-driven server.
It happens that Python comes with an event-driven framework built into the Standard Library. I am going to recommend that you ignore it entirely. It is a pair of ancient modules, asyncore and asynchat, that date from the early days of Python (you will note that all of the classes they define are lowercase, in defiance of both good taste and all subsequent practice) and that are difficult to use correctly.
Instead, we will talk about Twisted Python. Twisted Python is not simply a framework; it is an event-driven networking engine for Python. The Twisted community has developed a way of writing Python that is all their own.
Take a look at server_twisted.py to see how simple our event-driven server can become if we leave the trouble of dealing with the low-level operating system calls to someone else.
from twisted.internet.protocol import Protocol, ServerFactory
from twisted.internet import reactor
import lancelot

class Lancelot(Protocol):
    def connectionMade(self):
        self.question = ''

    def dataReceived(self, data):
        self.question += data
        if self.question.endswith('?'):
            self.transport.write(dict(lancelot.qa)[self.question])
            self.question = ''

factory = ServerFactory()
factory.protocol = Lancelot
reactor.listenTCP(1060, factory)
reactor.run()
When a client connects, Twisted creates an instance of our protocol class for the new socket. From then on, every event on that socket is translated into a method call to our object, letting us write code that appears to be thinking about just one client at a time. But thanks to the fact that Twisted will create dozens or hundreds of our Launcelot protocol objects, one corresponding to each connected client, the result is an event loop that can respond to whichever client sockets are ready.
More information about Twisted Python is available in the Twisted project documentation.
Twisted Python
The essential idea of a threaded or multi-process server is that we take the simple and straightforward server that we started out with (server_simple.py) and run several copies of it at once so that we can serve several clients at once, without making them wait on each other.
Using multiple threads or processes is very common, especially in high-capacity web and database servers. In the Standard Library you can find the threading and multiprocessing modules, which provide the Thread and Process classes.
(Note: The main program logic does not even know which solution is being used; the two classes have a similar enough interface that either Thread or Process can here be used interchangeably.)
Look at the example in server_multi.py:
import sys, time, lancelot
from multiprocessing import Process
from server_simple import server_loop
from threading import Thread

WORKER_CLASSES = {'thread': Thread, 'process': Process}
WORKER_MAX = 10

def start_worker(Worker, listen_sock):
    worker = Worker(target=server_loop, args=(listen_sock,))
    worker.daemon = True  # exit when the main process does
    worker.start()
    return worker

if __name__ == '__main__':
    if len(sys.argv) != 3 or sys.argv[2] not in WORKER_CLASSES:
        print >>sys.stderr, 'usage: server_multi.py interface thread|process'
        sys.exit(2)
    Worker = WORKER_CLASSES[sys.argv.pop()]  # setup() wants len(argv)==2

    # Every worker will accept() forever on the same listening socket.
    listen_sock = lancelot.setup()
    workers = []
    for i in range(WORKER_MAX):
        workers.append(start_worker(Worker, listen_sock))

    # Check every two seconds for dead workers, and replace them.
    while True:
        time.sleep(2)
        # Iterate over a copy, since the loop body modifies the list.
        for worker in list(workers):
            if not worker.is_alive():
                print worker.name, "died; starting replacement worker"
                workers.remove(worker)
                workers.append(start_worker(Worker, listen_sock))
As you can see, the listing lets multiple threads or processes all call accept() on the very same server socket. Instead of raising an error and insisting that only one thread at a time be able to wait for an incoming connection, the operating system patiently queues up all of our waiting workers and then wakes up one worker for each new connection that arrives. The fact that a listening socket can be shared at all between threads and processes, and that the operating system does round-robin balancing among the workers that are waiting on an accept() call, is one of the great glories of the POSIX network stack and execution model; it makes programs like this very simple to write.
Threading and Multi-processing
The SocketServer module simplifies the task of writing network servers.
There are four basic server classes: TCPServer, UDPServer, UnixDatagramServer, and UnixStreamServer.
These four classes process requests synchronously; each request must be completed before the next request can be started. This isn't suitable if each request takes a long time to complete, because it requires a lot of computation, or because it returns a lot of data which the client is slow to process.
In server_SocketServer.py, you can see how small our multi-threaded server becomes when it takes advantage of this framework. (There is also a ForkingMixIn that you can use if you want it to spawn several processes, at least on a POSIX system.)
from SocketServer import ThreadingMixIn, TCPServer, BaseRequestHandler
import lancelot, server_simple, socket

class MyHandler(BaseRequestHandler):
    def handle(self):
        server_simple.handle_client(self.request)

class MyServer(ThreadingMixIn, TCPServer):
    allow_reuse_address = 1
    # address_family = socket.AF_INET6  # if you need IPv6

server = MyServer(('', lancelot.PORT), MyHandler)
server.serve_forever()
Whereas our earlier example created the workers up front so that they were all sharing the same listening socket, the SocketServer does all of its listening in the main thread and creates one worker each time accept() returns a new client socket.
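In Python 3 the module is named socketserver. The sketch below shows the same one-thread-per-connection pattern end to end under a few assumptions: the `ANSWERS` table, the `QuestionHandler` and `ThreadedServer` names, and the single-question exchange are illustrative, and the server binds to port 0 so that the operating system picks a free port.

```python
import socket
import socketserver
import threading

ANSWERS = {b'What is your quest?': b'To seek the Holy Grail.'}

class QuestionHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Runs in its own thread for each connection accepted by the server.
        question = b''
        while not question.endswith(b'?'):
            data = self.request.recv(4096)
            if not data:
                return                     # client closed early
            question += data
        self.request.sendall(ANSWERS.get(question, b'I do not know.'))

class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True                  # workers die with the main thread

server = ThreadedServer(('127.0.0.1', 0), QuestionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client asks one question and reads until the terminating period.
client = socket.create_connection(server.server_address)
client.sendall(b'What is your quest?')
reply = b''
while not reply.endswith(b'.'):
    chunk = client.recv(4096)
    if not chunk:
        break
    reply += chunk
client.close()

server.shutdown()
server.server_close()
```

The main thread only ever calls accept() inside serve_forever(); all per-client waiting happens in the handler threads, which is the division of labor described above.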
Threading and Multi-processing Frameworks
This chapter surveys the handful of technologies that have together become fundamental building blocks for expanding applications to Internet scale.
This chapter's purpose is to introduce you to the problem that each tool solves; explain how to use the service to address that issue; and give a few hints about using the tool from Python.
Caches, Message Queues, and Map-Reduce
Memcached is the "memory cache daemon." Its impact on many large Internet services has been, by all accounts, revolutionary. After glancing at how to use it from Python, we will discuss its implementation, which will teach us about a very important modern network concept called sharding.
The actual procedures for using Memcached are designed to be very simple:
You run a Memcached daemon on every server with some spare memory. You make a list of the IP address and port numbers of your new Memcached daemons, and distribute this list to all of the clients that will be using the cache. Your client programs now have access to an organization-wide, blazing-fast key-value cache that acts something like a big Python dictionary that all of your servers can share. The cache operates on an LRU (least-recently-used) basis, dropping old items that have not been accessed for a while so that it has room to both accept new entries and keep records that are being frequently accessed.
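The LRU principle itself fits in a few lines of Python 3. This is an illustration of the eviction policy only, not of how Memcached is implemented; the class name `LRUCache` and the capacity of two entries are assumptions made for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` items, dropping the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()       # oldest entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the oldest entry

cache = LRUCache(2)
cache.set('sq:1', 1)
cache.set('sq:2', 4)
cache.get('sq:1')        # touching sq:1 makes sq:2 the oldest entry
cache.set('sq:3', 9)     # the cache is full, so sq:2 is dropped
```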
Using Memcached
Enough Python clients are currently listed for Memcached that I had better just send you to the page that lists them, rather than try to review them here: http://code.google.com/p/memcached/wiki/Clients. The client that they list first is written in pure Python, and therefore will not need to compile against any libraries. It can be installed from the Python Package Index:
root@erlerobot:~/Python_files# pip install python-memcached
The interface is straightforward. Though you might have expected an interface that more strongly resembles a Python dictionary with native methods like `__getitem__`, the author of python-memcached chose instead to use the same method names as are used in other languages supported by Memcached, which I think was a good decision, since it makes it easier to translate Memcached examples into Python:
```python
>>> import memcache
>>> mc = memcache.Client(['127.0.0.1:11211'])
>>> mc.set('user:19', '{name: "Lancelot", quest: "Grail"}')
True
>>> mc.get('user:19')
'{name: "Lancelot", quest: "Grail"}'
```
The basic pattern by which Memcached is used from Python is shown in squares.py. Before embarking on an (artificially) expensive operation, it checks Memcached to see whether the answer is already present. If so, then the answer can be returned immediately; if not, then it is computed and stored in the cache before being returned.
import memcache, random, time, timeit
mc = memcache.Client(['127.0.0.1:11211'])

def compute_square(n):
    value = mc.get('sq:%d' % n)
    if value is None:
        time.sleep(0.001)  # pretend that computing a square is expensive
        value = n * n
        mc.set('sq:%d' % n, value)
    return value

def make_request():
    compute_square(random.randint(0, 5000))

print 'Ten successive runs:',
for i in range(1, 11):
    print '%.2fs' % timeit.timeit(make_request, number=2000),
The Memcached daemon needs to be running on your machine at port 11211 for this example to succeed. For the first few hundred requests, of course, the program will run at its usual speed. But as the cache begins to accumulate more requests, it is able to accelerate an increasingly large fraction of them.
root@erlerobot:~/Python_files# python squares.py
Ten successive runs: 2.75s 1.98s 1.51s 1.14s 0.90s 0.82s 0.71s 0.65s 0.58s 0.55s
This pattern is generally characteristic of caching: a gradual improvement as the cache begins to cover the problem domain, and then stability as either the cache fills or the input domain has been fully covered.
You must always remember that Memcached is a cache; it is ephemeral, it uses RAM for storage, and, if restarted, it remembers nothing that you have ever stored! Your application should always be able to recover if the cache should disappear.
The design of Memcached illustrates an important principle that is used in several other kinds of databases, and which you might want to employ in architectures of your own: the clients shard the database by hashing the keys' string values and letting the hash determine which member of the cluster is consulted for each key.
To understand why this is effective, consider a particular key/value pair, like the key sq:42 and the value 1764 that might be stored by squares.py. To make the best use of the RAM it has available, the Memcached cluster wants to store this key and value exactly once. But to make the service fast, it wants to avoid duplication without requiring any coordination between the different servers or communication between all of the clients.
This means that all of the clients, without any other information to go on than (a) the key and (b) the list of Memcached servers with which they are configured, need some scheme for working out where that piece of information belongs. If they fail to make the same decision, then not only might the key and value be copied onto several servers and reduce the overall memory available, but also a client's attempt to remove an invalid entry could leave other invalid copies elsewhere.
The solution is that the clients all implement a single, stable algorithm that can turn a key into an integer n that selects one of the servers from their list. They do this by using a "hash" algorithm, which mixes the bits of a string when forming a number so that any pattern in the string is, hopefully, obliterated. The hashlib module in the Python Standard Library provides several such algorithms.
To see why patterns in key values must be obliterated, consider hashing.py. It loads a dictionary of English words (you might have to download a dictionary of your own or adjust the path to make the script run on your own machine), and explores how those words would be distributed across four servers if they were used as keys. The first algorithm tries to divide the alphabet into four roughly equal sections and distributes the keys using their first letter; the other two algorithms use hash functions.
import hashlib

def alpha_shard(word):
    """Do a poor job of assigning data to servers by using first letters."""
    if word[0] in 'abcdef':
        return 'server0'
    elif word[0] in 'ghijklm':
        return 'server1'
    elif word[0] in 'nopqrs':
        return 'server2'
    else:
        return 'server3'

def hash_shard(word):
    """Do a great job of assigning data to servers using a hash value."""
    return 'server%d' % (hash(word) % 4)

def md5_shard(word):
    """Do a great job of assigning data to servers using a hash value."""
    # digest() is a byte string, so we ord() its last character
    return 'server%d' % (ord(hashlib.md5(word).digest()[-1]) % 4)

words = open('/usr/share/dict/words').read().split()

for function in alpha_shard, hash_shard, md5_shard:
    d = {'server0': 0, 'server1': 0, 'server2': 0, 'server3': 0}
    for word in words:
        d[function(word.lower())] += 1
    print function.__name__[:-6], d
The hash() function is Python's own built-in hash routine, which is designed to be blazingly fast because it is used internally to implement Python dictionary lookup.
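For a Python 3 client, a sketch of the "single, stable algorithm" might look like the following. The server list and the function name `server_for` are hypothetical. Note that in Python 3 the built-in hash() of a string is salted per process unless PYTHONHASHSEED is fixed, so separate clients would not agree on it; a digest from hashlib gives every client the same answer.

```python
import hashlib

# Hypothetical list of Memcached daemons that every client is configured with.
SERVERS = ['127.0.0.1:11211', '127.0.0.1:11212',
           '127.0.0.1:11213', '127.0.0.1:11214']

def server_for(key, servers=SERVERS):
    """Map a key to one server, identically in every process and machine."""
    digest = hashlib.md5(key.encode('utf-8')).digest()
    n = int.from_bytes(digest[:4], 'big')   # first 32 bits as an integer
    return servers[n % len(servers)]

# Count how the keys that squares.py might store spread across the servers.
counts = {server: 0 for server in SERVERS}
for i in range(1000):
    counts[server_for('sq:%d' % i)] += 1
```

MD5 is used here only to mix bits evenly, not for security. A plain modulo also means that changing the length of the server list moves most keys to a different server, which is worth keeping in mind before adding or removing daemons.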
Memcached and Sharding
Message queue protocols let you send reliable chunks of data called messages. Typically, a queue promises to transmit messages reliably, and to deliver them atomically: a message either arrives whole and intact, or it does not arrive at all. Clients never have to loop and keep calling something like recv() until a whole message has arrived. The other innovation that message queues offer is that, instead of supporting only the point-to-point connections that are possible with an IP transport like TCP, you can set up all kinds of topologies between messaging clients. Each brand of message queue typically supports several topologies.
A pipeline topology is the pattern that perhaps best resembles the picture you have in your head when you think of a queue: a producer creates messages and submits them to the queue, from which the messages can then be received by a consumer. For example, the front-end web machines of a photo-sharing web site might accept image uploads from end users and list the incoming files on an internal queue. A machine room full of servers could then read from the queue, each receiving one message for each read it performs, and generate thumbnails for each of the incoming images. The queue might get long during the day and then be short or empty during periods of relatively low use, but either way the front-end web servers are freed to quickly return a page to the waiting customer, telling them that their upload is complete and that their images will soon appear in their photo stream.
A publisher-subscriber topology looks very much like a pipeline, but with a key difference. The pipeline makes sure that every queued message is delivered to exactly one consumer, since, after all, it would be wasteful for two thumbnail servers to be assigned the same photograph. But subscribers typically want to receive all of the messages that are being enqueued by each publisher, or else they want to receive every message that matches some particular topic. Either way, a publisher-subscriber model supports messages that fan out to be delivered to every interested subscriber. This kind of queue can be used to power external services that need to push events to the outside world, and also to form a fabric that a machine room full of servers can use to advertise which systems are up, which are going down for maintenance, and that can even publish the addresses of other message queues as they are created and destroyed.
Finally, a request-reply pattern is often the most complex because messages have to make a round trip. Both of the previous patterns placed very little responsibility on the producer of a message: they connect to the queue, transmit their message, and are done. But a message queue client that makes a request has to stay connected and wait for the corresponding reply to be delivered back to it. The queue itself, to support this, has to feature some sort of addressing scheme by which replies can be directed to the correct client that is still sitting and waiting for it. But for all of its underlying complexity, this is probably the most powerful pattern of all, since it allows the load of dozens or hundreds of clients to be spread across equally large numbers of servers without any effort beyond setting up the message queue. And since a good message queue will allow servers to attach and detach without losing messages, this topology allows servers to be brought down for maintenance in a way that is invisible to the population of client machines.
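The difference between the first two topologies comes down to who receives each message. The in-process Python 3 sketch below makes that concrete without any real message queue; the function names are hypothetical, and handing messages to consumers in strict rotation is an assumption, since a real broker is free to choose whichever consumer is ready.

```python
from collections import deque

def pipeline_deliver(messages, consumers):
    """Pipeline: every message goes to exactly one consumer."""
    shared = deque(messages)
    received = {name: [] for name in consumers}
    turn = 0
    while shared:
        name = consumers[turn % len(consumers)]   # take turns (an assumption)
        received[name].append(shared.popleft())
        turn += 1
    return received

def pubsub_deliver(messages, subscribers):
    """Publisher-subscriber: every message fans out to every subscriber."""
    return {name: list(messages) for name in subscribers}

photos = ['a.jpg', 'b.jpg', 'c.jpg']
thumbnail_work = pipeline_deliver(photos, ['thumb1', 'thumb2'])
announcements = pubsub_deliver(photos, ['search', 'stats'])
```

In the pipeline result each photo appears under exactly one worker, while in the publisher-subscriber result every subscriber holds the full list.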
Message Queues
There are several AMQP (Advanced Message Queuing Protocol) implementations currently listed in the Python Package Index.
An alternative to using AMQP and having to run a central broker, like RabbitMQ or Apache Qpid, is to use ØMQ, the "Zero Message Queue," which was invented by the same company as AMQP but moves the messaging intelligence from a centralized broker into every one of your message client programs.
A good summary of the advantages and disadvantages is provided at the ØMQ web site: http://zeromq.org/docs:welcome-from-amqp
The next example, queuecrazy.py, shows some of the patterns that can be supported when message queues are used to connect different parts of an application. It requires ØMQ, which you can most easily make available to Python by installing it from the Python Package Index:
root@erlerobot:~/Python_files# pip install pyzmq-static
The listing uses Python threads to create a small cluster of six different services. One pushes a constant stream of words onto a pipeline. Three others sit ready to receive a word from the pipeline; each word wakes one of them up. The final two are request-reply servers, which resemble remote procedure endpoints and send back a message for each message they receive.
import random, threading, time, zmq
zcontext = zmq.Context()

def fountain(url):
    """Produces a steady stream of words."""
    zsock = zcontext.socket(zmq.PUSH)
    zsock.bind(url)
    words = [w for w in dir(__builtins__) if w.islower()]
    while True:
        zsock.send(random.choice(words))
        time.sleep(0.4)

def responder(url, function):
    """Performs a string operation on each word received."""
    zsock = zcontext.socket(zmq.REP)
    zsock.bind(url)
    while True:
        word = zsock.recv()
        zsock.send(function(word))  # send the modified word back

def processor(n, fountain_url, responder_urls):
    """Read words as they are produced; get them processed; print them."""
    zpullsock = zcontext.socket(zmq.PULL)
    zpullsock.connect(fountain_url)
    zreqsock = zcontext.socket(zmq.REQ)
    for url in responder_urls:
        zreqsock.connect(url)
    while True:
        word = zpullsock.recv()
        zreqsock.send(word)
        print n, zreqsock.recv()

def start_thread(function, *args):
    thread = threading.Thread(target=function, args=args)
    thread.daemon = True  # so you can easily Control-C the whole program
    thread.start()

start_thread(fountain, 'tcp://127.0.0.1:6700')
start_thread(responder, 'tcp://127.0.0.1:6701', str.upper)
start_thread(responder, 'tcp://127.0.0.1:6702', str.lower)
for n in range(3):
    start_thread(processor, n + 1, 'tcp://127.0.0.1:6700',
                 ['tcp://127.0.0.1:6701', 'tcp://127.0.0.1:6702'])
time.sleep(30)

Using Message Queues from Python
The two request-reply servers are different (one turns each word it receives to uppercase, while the other makes its words all lowercase), and you can tell the three processors apart by the fact that each is assigned a different integer.
Finally, to reinforce the concept of message queues: message queues provide a point of coordination and integration for different parts of your application that may require different hardware, load balancing techniques, platforms, or even programming languages. They can take responsibility for distributing messages among many waiting consumers or servers in a way that is not possible with the single point-to-point links offered by normal TCP sockets, and can also use a database or other persistent storage to assure that updates to your service are not lost if the server goes down. Message queues also offer resilience and flexibility, since if some part of your system temporarily becomes a bottleneck, then the message queue can absorb the shock by allowing many messages to queue up for that service. By hiding the population of servers or processes that serve a particular kind of request, the message queue pattern also makes it easy to disconnect, upgrade, reboot, and reconnect servers without the rest of your infrastructure noticing.
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
These two operations bear some resemblance to the Python built-in functions of that name (which Python itself borrowed from the world of functional programming); imagine how one might split across several servers the tasks of summing the squares of many integers:
>>> squares = map(lambda n: n*n, range(11))
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> import operator
>>> reduce(operator.add, squares)
385
The mapping operation should be prepared to run once on some particular slice of the overall problem or data set, and to produce a tally, table, or response that summarizes its findings for that slice of the input. The reduce operation is then exposed to the outputs of the mapping functions, to combine them together into an ever-accumulating answer. To use the map-reduce cluster’s power effectively, frameworks are not content to simply run the reduce function on one node once all of the dozens or hundreds of active machines have finished the mapping stage. Instead, the reduce function is run in parallel on many nodes at once, each considering the output of a handful of map operations, and then these intermediate results are combined again and again in a tree of computations until a final reduce step produces output for the whole input.
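The tree-shaped reduction described above can be sketched on a single machine. This is only an illustration, assuming Python 3 (where reduce lives in functools); the helper names map_slice and tree_reduce and the slice size are hypothetical, and plain function calls stand in for work that a real framework would distribute across nodes.

```python
from functools import reduce
import operator

def map_slice(numbers):
    """Map stage: summarize one slice of the input as a partial sum of squares."""
    return sum(n * n for n in numbers)

def tree_reduce(values, fan_in=2):
    """Reduce stage: combine partial results a few at a time until one remains."""
    values = list(values)
    while len(values) > 1:
        # Each pass models one level of the reduction tree.
        values = [reduce(operator.add, values[i:i + fan_in])
                  for i in range(0, len(values), fan_in)]
    return values[0]

data = list(range(11))
slices = [data[i:i + 3] for i in range(0, len(data), 3)]  # four slices
partials = [map_slice(s) for s in slices]                  # one result per slice
total = tree_reduce(partials)
print(partials, total)
```

Each level of the loop corresponds to a round of reducers working on a handful of earlier outputs; the final value matches the single-process result of 385 shown earlier.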
In conclusion, the map-reduce pattern provides a cloud-style framework for distributed computation across many processors and, potentially, across many parts of a large data set.
Map-Reduce
Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text. HTTP (the Hypertext Transfer Protocol) is the protocol used to exchange or transfer hypertext.
HTTP is the foundation of data communication for the World Wide Web. As this chapter proceeds to explore the features of HTTP, we are going to illustrate the protocol using several modules that come built into the Python Standard Library.
HTTP
Uniform Resource Locators (URLs) are strings that tell your web browser how to fetch resources from the World Wide Web. They are a subclass of the full set of possible Uniform Resource Identifiers (URIs); specifically, they are URIs constructed so that they give instructions for fetching a document, instead of serving only as an identifier.
To understand how they work, consider a very simple URL like the following: http://python.org If submitted to a web browser, this URL is interpreted as an order to resolve the host name python.org to an IP address, make a TCP connection to that IP address at the standard HTTP port 80, and then ask for the root document / that lives at that site.
Now imagine another, more complicated URL: imagine that we wanted the logo for Nord/LB, a large German bank. The resulting URL might look something like this: http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96
Here, the URL specifies more information than our previous example did:
- The protocol will, again, be HTTP.
- The hostname example.com will be resolved to an IP.
- This time, port 8080 will be used instead of 80.
- Once a connection is complete, the remote server will be asked for the resource named: /Nord%2FLB/logo?shape=square&dpi=96
Web servers, in practice, have absolute freedom to interpret URLs as they please; however, the intention of the standard is that this URL be parsed into two question-mark-delimited pieces. The first is a path consisting of two elements:
- A Nord/LB path element.
- A logo path element.
The string following the ? is interpreted as a query containing two terms:
- A shape parameter whose value is square.
- A dpi parameter whose value is 96.
Any characters beyond the alphanumerics, a few punctuation marks (specifically the set $-_.+!*'(),), and the special delimiter characters themselves (like the slashes) must be percent-encoded by following a percent sign % with the two-digit hexadecimal code for the character.
You should note that the following URL paths are not equivalent:
Nord%2FLB%2Flogo = A single path component, named Nord/LB/logo.
Nord%2FLB/logo = Two path components, Nord/LB and logo.
Nord/LB/logo = Three separate path components Nord, LB, and logo.
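The difference between these three paths can be checked with a short sketch. It assumes Python 3, where the quoting routines live in urllib.parse; the helper name path_components is hypothetical. The key point is that the path is split on literal slashes before each piece is decoded.

```python
from urllib.parse import quote, unquote

def path_components(path):
    """Split on literal slashes first, then percent-decode each component."""
    return [unquote(part) for part in path.split('/')]

for path in ('Nord%2FLB%2Flogo', 'Nord%2FLB/logo', 'Nord/LB/logo'):
    print(path, '->', path_components(path))

# Going the other way: safe='' asks quote() to encode the slash as well.
encoded = quote('Nord/LB', safe='')
print(encoded)
```

Decoding before splitting would erase the distinction, which is why the order of the two steps matters.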
The most important Python routines for working with URLs live, appropriately enough, in their own module, urlparse. This module defines a standard interface to break URL strings up into components (addressing scheme, network location, path, etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.”
>>> from urlparse import urlparse, urldefrag, parse_qs, parse_qsl
With these routines, you can take large and complex URLs like the example given earlier and turn them into their component parts, with RFC-compliant parsing already implemented for you:
```python
>>> p = urlparse('http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96')
>>> p
ParseResult(scheme='http', netloc='example.com:8080', path='/Nord%2FLB/logo',
            params='', query='shape=square&dpi=96', fragment='')
```
URL Anatomy
The query string that is offered by the ParseResult can then be submitted to one of the parsing routines if you want to interpret it as a series of key-value pairs, which is a standard way for web forms to submit them:
>>> parse_qs(p.query)
{'shape': ['square'], 'dpi': ['96']}
Note that each value in this dictionary is a list, rather than simply a string. This is to support the fact that a given parameter might be specified several times in a single URL; in such cases, the values are simply appended to the list:
>>> parse_qs('mode=topographic&pin=Boston&pin=San%20Francisco')
{'mode': ['topographic'], 'pin': ['Boston', 'San Francisco']}
This, you will note, preserves the order in which values arrive; of course, this does not preserve the order of the parameters themselves because dictionary keys do not remember any particular order. If the order is important to you, then use the parse_qsl() function instead (the l must stand for “list”):
>>> parse_qsl('mode=topographic&pin=Boston&pin=San%20Francisco')
[('mode', 'topographic'), ('pin', 'Boston'), ('pin', 'San Francisco')]
Finally, note that an “anchor” appended to a URL after a # character is not relevant to the HTTP protocol. This is because any anchor is stripped off and is not turned into part of the HTTP request. Instead, the anchor tells a web client to jump to some particular section of a document after the HTTP transaction is complete and the document has been downloaded. To remove the anchor, use urldefrag():
>>> u = 'http://docs.python.org/library/urlparse.html#urlparse.urldefrag'
>>> urldefrag(u)
('http://docs.python.org/library/urlparse.html', 'urlparse.urldefrag')
You can turn a ParseResult back into a URL by calling its geturl() method. When combined with the urlencode() function, which knows how to build query strings, this can be used to construct new URLs:
>>> import urllib, urlparse
>>> query = urllib.urlencode({'company': 'Nord/LB', 'report': 'sales'})
>>> p = urlparse.ParseResult(
...     'https', 'example.com', 'data', None, query, None)
>>> p.geturl()
'https://example.com/data?report=sales&company=Nord%2FLB'
Finally, an HTTP request looks like this:
GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: www.ietf.org
Connection: close
User-Agent: Python-urllib/2.7
And the HTTP response that comes back over the socket also starts with a set of headers, but then also includes a body that contains the document itself that has been requested:
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 11 Jul 2014 07:02:55 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d5be98ff9fbae526f308d478da5bb413e1405062173934; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.ietf.org; HttpOnly
Last-Modified: Fri, 11 Jun 1999 18:46:53 GMT
Vary: Accept-Encoding
CF-RAY: 1483235b13c51043-CDG
<addinfourl at 4341048456 whose fp = <socket._fileobject object at 0x102a13750>>
Very often, the links used in web pages do not specify full URLs, but relative URLs that are missing several of the usual components. When one of these links needs to be resolved, the client needs to fill in the missing information with the corresponding fields from the URL used to fetch the page in the first place.
The simplest relative links are the names of pages one level deeper than the base page:
>>> urlparse.urljoin('http://www.python.org/psf/', 'grants')
'http://www.python.org/psf/grants'
>>> urlparse.urljoin('http://www.python.org/psf/', 'mission')
'http://www.python.org/psf/mission'
Note the crucial importance of the trailing slash in the URLs:
>>> urlparse.urljoin('http://www.python.org/psf', 'grants')
'http://www.python.org/grants'
Like file system paths on the POSIX and Windows operating systems, . can be used for the current directory and .. is the name of the parent:
>>> urlparse.urljoin('http://www.python.org/psf/', './mission')
'http://www.python.org/psf/mission'
>>> urlparse.urljoin('http://www.python.org/psf/', '../news/')
'http://www.python.org/news/'
>>> urlparse.urljoin('http://www.python.org/psf/', '/dev/')
'http://www.python.org/dev/'
And, as illustrated in the last example, a relative URL that starts with a slash is assumed to live at the top level of the same site as the original URL. Happily, the urljoin() function ignores the base URL entirely if the second argument also happens to be an absolute URL. This means that you can simply pass every URL on a given web page to the urljoin() function, and any relative links will be converted; at the same time, absolute links will be passed through untouched:
>>> # Absolute links are safe from change
...
>>> urlparse.urljoin('http://www.python.org/psf/', 'http://yelp.com/')
'http://yelp.com/'
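Passing every link on a page through urljoin() can be sketched as follows. This assumes Python 3, where the function lives in urllib.parse; the list of links is a made-up stand-in for the href values a real page might contain.

```python
from urllib.parse import urljoin

base = 'http://www.python.org/psf/'
# Hypothetical href values: relative, dot-relative, site-rooted, and absolute.
links = ['grants', './mission', '../news/', '/dev/', 'http://yelp.com/']

absolute = [urljoin(base, link) for link in links]
for link, resolved in zip(links, absolute):
    print('%-20s -> %s' % (link, resolved))
```

No special case is needed for the absolute link, since urljoin() returns it unchanged.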
Relative URLs
We now turn to the HTTP protocol itself. Its on-the-wire appearance is usually an internal detail handled by web browsers and by libraries like urllib2, a module that defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
We are going to adjust its behavior so that we can see the protocol printed to the screen. Take a look at verbose_http.py:
import StringIO, httplib, urllib2

class VerboseHTTPResponse(httplib.HTTPResponse):
    def _read_status(self):
        s = self.fp.read()
        print '-' * 20, 'Response', '-' * 20
        print s.split('\r\n\r\n')[0]
        self.fp = StringIO.StringIO(s)
        return httplib.HTTPResponse._read_status(self)

class VerboseHTTPConnection(httplib.HTTPConnection):
    response_class = VerboseHTTPResponse
    def send(self, s):
        print '-' * 50
        print s.strip()
        httplib.HTTPConnection.send(self, s)

class VerboseHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(VerboseHTTPConnection, req)
To allow for customization, the urllib2 library lets you bypass its vanilla urlopen() function and instead build an opener full of handler classes of your own devising, a fact that we will use repeatedly as this chapter progresses. The verbose_http.py listing provides exactly such a handler class by performing a slight customization on the normal HTTP handler. This customization prints out both the outgoing request and the incoming response instead of keeping them both hidden. For many of the following examples, we will use an opener object that we build right here, using the handler from verbose_http.py:
>>> from verbose_http import VerboseHTTPHandler
>>> import urllib, urllib2
>>> opener = urllib2.build_opener(VerboseHTTPHandler)
You can try using this opener against the URL of the RFC that we mentioned at the beginning of this chapter: opener.open('http://www.ietf.org/rfc/rfc2616.txt')
Instrumenting urllib2
When the earliest version of HTTP was first invented, it had a single power: to issue a method called GET that named and returned a hypertext document from a remote server. That method is still the backbone of the protocol today.
The GET method, like all HTTP methods, is the first thing transmitted as part of an HTTP request, and it is immediately followed by the request headers. For simple GET methods, the request simply ends with the blank line that terminates the headers so the server can immediately stop reading and send a response.
>>> info = opener.open('http://www.ietf.org/rfc/rfc2616.txt')
GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: www.ietf.org
Connection: close
...
The opener’s open() method, like the plain urlopen() function at the top level of urllib2, returns an information object that lets us examine the result of the GET method. You can see that the HTTP response started with a status line containing the HTTP version, a status code, and a short message. The info object makes these available as object attributes; it also lets us examine the headers through a dictionary-like object:
>>> info.code
200
>>> info.msg
'OK'
>>> sorted(info.headers.keys())
['accept-ranges', 'connection', 'content-length', 'content-type',
 'date', 'etag', 'last-modified', 'server', 'vary']
>>> info.headers['Content-Type']
'text/plain'
Finally, the info object is also prepared to act as a file. The HTTP response status line, the headers, and the blank line that follows them have all been read from the HTTP socket, and now the actual document is waiting to be read. As is usually the case with file objects, you can either start reading the info object in pieces through read(N) or readline(); or you can choose to bring the entire data stream into memory as a single string:
>>> print info.read().strip()
Network Working Group                                      R. Fielding
Request for Comments: 2616                                   UC Irvine
Obsoletes: 2068                                              J. Gettys
Category: Standards Track                                   Compaq/W3C
...
These are the first lines of the longer text file that you will see if you point your web browser at the same URL.
In a world of six billion people and four billion IP addresses, the need quickly became clear to support servers that might host dozens of web sites at the same IP. And that is why the URL location is now included in every HTTP request. For compatibility, it has not been made part of the GET request line itself, but has instead been stuck into the headers under the name Host.
>>> info = opener.open('http://www.google.com/')
-------------------- Response --------------------
HTTP/1.1 302 Found
Cache-Control: private
...
--------------------------------------------------
GET /?gfe_rd=cr&ei=OY6_U_qjHOeA8QeTg4H4BQ HTTP/1.1
Accept-Encoding: identity
Host: www.google.es
Connection: close
User-Agent: Python-urllib/2.7
-------------------- Response --------------------
HTTP/1.1 200 OK
...
The GET Method and The Host Header
Depending on how they are configured, servers might return entirely different sites when confronted with two different values for Host; they might present slightly different versions of the same site; or they might ignore the header altogether. But semantically, two requests with different values for Host are asking about two entirely different URLs. When several sites are hosted at a single IP address, those sites are each said to be served by a virtual host, and the whole practice is sometimes referred to as virtual hosting.
It is also important to remember that, when handling HTTP, different kinds of responses can occur, among them status codes, errors, and redirections. You can read more about this here.
By default, HTTP/1.1 servers will keep a TCP connection open even after they have delivered their response. This enables you to make further requests on the same socket and avoid the expense of creating a new socket for every piece of data you might need to download. Keep in mind that downloading a modern web page can involve fetching dozens, if not hundreds, of separate pieces of content. The HTTPConnection class in httplib, on which urllib2 is built, lets you take advantage of this feature. In fact, all requests go through one of these objects; when you use a function like urlopen() or use the open() method on an opener object, an HTTPConnection object is created behind the scenes, used for that one request, and then discarded. When you might make several requests to the same site, use a persistent connection instead:
>>> import httplib
>>> c = httplib.HTTPConnection('www.python.org')
>>> c.request('GET', '/')
>>> original_sock = c.sock
>>> content = c.getresponse().read()  # get the whole page
>>> c.request('GET', '/about/')
>>> c.sock is original_sock
True
Now, if we insert a Connection: close header manually, then we force the HTTPConnection object to create a second socket when we ask it for a second page:
>>> c = httplib.HTTPConnection('www.python.org')
>>> c.request('GET', '/', headers={'Connection': 'close'})
>>> original_sock = c.sock
>>> content = c.getresponse().read()
>>> c.request('GET', '/about/')
>>> c.sock is original_sock
False
Note that HTTPConnection does not raise an exception when one socket closes and it has to create another one; you can keep using the same object over and over again. This holds true regardless of whether the server is accepting all of the requests over a single socket, or it is sometimes hanging up and forcing HTTPConnection to reconnect.
Payloads and Persistent Connections
The POST HTTP method was designed to power web forms. When forms are used with the GET method, which is indeed their default behavior, they append the form’s field values to the end of the URL: http://www.google.com/search?q=python+language
The construction of such a URL creates a new named location that can be saved; bookmarked; referenced from other web pages; and sent in e-mails, Tweets, and text messages. And for actions like searching and selecting data, these features are perfect. But what about a login form that accepts your e-mail address and password? Not only would there be negative security implications to having these elements appended to the form URL (such as the fact that they would be displayed on the screen in the URL bar and included in your browser history), but surely it would be odd to think of your username and password as creating a new location or page on the web site in question: http://example.com/welcome?email=brandon@rhodesmill.org&pw=aaz9Gog3
Building URLs in this way would imply that a different page exists on the example.com web site for every possible password that you could try typing. This is undesirable for obvious reasons. And so the POST method should always be used for forms that are not constructing the name of a particular page or location on a web site, but are instead performing some action on behalf of the caller. Forms in HTML can specify that they want the browser to use POST by specifying that method in their <form> element:
<form name="myloginform" action="/access/dummy" method="post">
E-mail: <input type="text" name="e-mail" size="20">
Password: <input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">
</form>
Instead of stuffing form parameters into the URL, a POST carries them in the body of the request. We can perform the same action ourselves in Python by using urlencode to format the form parameters, and then supplying them as a second parameter to any of the urllib2 methods that open a URL. (From the standard Python library: urllib.urlencode(query[, doseq]) converts a mapping object or a sequence of two-element tuples to a “percent-encoded” string, suitable to pass to urlopen() as the optional data argument. This is useful to pass a dictionary of form fields to a POST request.)
>>> form = urllib.urlencode({'inputstring': 'Atlanta, GA'})
>>> response = opener.open('http://forecast.weather.gov/zipcity.php', form)
--------------------------------------------------
POST /zipcity.php HTTP/1.1
...
Content-Length: 25
Host: forecast.weather.gov
Content-Type: application/x-www-form-urlencoded
...
--------------------------------------------------
inputstring=Atlanta%2C+GA
-------------------- Response --------------------
HTTP/1.1 302 Found
...
Location: http://forecast.weather.gov/MapClick.php?CityName=Atlanta&state=GA
  &site=FFC&textField1=33.7629&textField2=-84.4226&e=1
...
--------------------------------------------------
GET /MapClick.php?CityName=Atlanta&state=GA&site=FFC&textField1=33.7629&textField2=
  -84.4226&e=1 HTTP/1.1
...
-------------------- Response --------------------
HTTP/1.1 200 OK
...
Although our opener object is putting a dashed line between each HTTP request and its payload for clarity (a blank line, you will recall, is what really separates headers and payload on the wire), you are otherwise seeing a raw HTTP POST method here. Note these features of the request-responses shown in the example above:
POST And Forms
- The request line starts with the string POST.
- Content is provided (and thus, a Content-Length header).
- The form parameters are sent as the body.
- The Content-Type for standard web forms is application/x-www-form-urlencoded.
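The features listed above can be inspected without touching the network. The sketch below assumes Python 3, where urlencode lives in urllib.parse and Request in urllib.request, and where the form body must be bytes; it only builds the request object and never sends it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Encode the form fields exactly as a browser would for a standard web form.
form = urlencode({'inputstring': 'Atlanta, GA'}).encode('ascii')

# Supplying a data argument is what turns the request into a POST.
request = Request('http://forecast.weather.gov/zipcity.php', data=form)

print(request.get_method())  # the method that would be used on the wire
print(form)                  # the body that would follow the headers
print(len(form))             # the value a Content-Length header would carry
```

The length printed here matches the Content-Length: 25 seen in the captured request above.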
The most important thing to grasp is that GET and POST are most emphatically not simply two different ways to format form parameters. Instead, they actually mean two entirely different things. The GET method means, “I believe that there is a document at this URL; please return it.” The POST method means, “Here is an action that I want performed.”
In the POST example above you can notice that instead of simply returning a status of 200 followed by a page of weather forecast data, the server returned a 302 redirect that urllib2 obeyed by performing a GET for the page named in the Location: header.
A web site leaves users in a very difficult position if it answers a POST form submission with a literal web page. Well-designed user-facing POST forms always redirect to a page that shows the result of the action, and this page can be safely bookmarked, shared, stored, and reloaded. This is an important feature of modern browsers: if a POST results in a redirect, then pressing the reload button simply refetches the final URL and does not reattempt the whole train of redirects that led to the current location.
Successful Form POSTs Should Always Redirect
Web-based APIs typically fetch documents and data using GET and POST to specific URLs. We should note, however, that many modern web services try to integrate their APIs more tightly with HTTP by going beyond the two most common HTTP methods and implementing additional methods like PUT and DELETE.
A design pattern named “Representational State Transfer” has therefore been taking hold in many developer communities. It specifies that the nouns of an API should live at their own URLs. For example, PUT, GET, POST, and DELETE should be used, respectively, to create, fetch, modify, and remove the documents living at these URLs.
By coupling this basic recommendation with further guidelines, the REST methodology guides the creation of web services that make more complete use of the HTTP protocol. Such web services also offer quite clean semantics, and can be accelerated by the same caching proxies that are often used to speed the delivery of normal web pages.
Note that HTTP supports arbitrary method names, even though the standard defines specific semantics for GET and POST and all of the rest. Tradition would dictate using the well-known methods defined in the standard unless you are using a specific framework or methodology that recognizes and has defined other methods.
REST And More HTTP Methods
User-Agent: Python-urllib/2.6: This header is optional in the HTTP protocol, and many sites simply ignore or log it. It can be useful when sites want to know which browsers their visitors use most often, and it can sometimes be used to distinguish search engine spiders (bots) from normal users browsing a site.
Many web sites are sensitive to the kinds of browsers that view them. If you need to access such sites with urllib2, you can simply instruct it to lie about its identity, and the receiving web site will not know the difference:
>>> url = 'https://wca.eclaim.com/'
>>> urllib2.urlopen(url).read()
'<HTML>...The following are...required...Microsoft Internet Explorer...'
>>> agent = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)'
>>> request = urllib2.Request(url)
>>> request.add_header('User-Agent', agent)
>>> urllib2.urlopen(request).read()
'\r\n<HTML>\r\n<HEAD>\r\n\t<TITLE>Eclaim.com - Log In</TITLE>...'
There are databases of possible user agent strings online at several sites that you can reference both when analyzing agent strings that your own servers have received, as well as when concocting strings for your own HTTP requests:
http://www.zytrax.com/tech/web/browser_ids.htm
http://www.useragentstring.com/pages/useragentstring.php
Identifying User Agents and Web Servers
It is always possible to simply make an HTTP request and let the server return a document with whatever Content-Type: is appropriate for the information we have requested. Some of the usual content types encountered by a browser include the following: text/html, text/plain, text/css, image/gif, image/jpeg, image/x-png, application/javascript, application/pdf, application/zip.
If the web service is returning a generic data stream of bytes that it cannot describe more specifically, it can always fall back to the content type: application/octet-stream.
The four headers that will interest you include the following: Accept, Accept-Charset, Accept-Language, Accept-Encoding
Each of these headers supports a comma-separated list of items, where each item can be given a weight between one and zero (larger weights indicate more preferred items) by adding a suffix that consists of a semicolon and a q= string to the item. The result will look something like this (using, for illustration, the Accept: header that my Google Chrome browser seems to be currently using):
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
This indicates that Chrome prefers XML and XHTML, but will accept HTML or even plain text if those are the only document formats available; that Chrome prefers PNG images when it can get them; and that it has no preference between all of the other content types in existence.
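How a server might read such a header can be sketched in a few lines. The function name parse_accept is hypothetical and the parsing is deliberately simplified (it ignores parameters other than q=); items with no q= suffix are given the default weight of 1.

```python
def parse_accept(header):
    """Return (item, weight) pairs from an Accept-style header, best first."""
    items = []
    for part in header.split(','):
        fields = [f.strip() for f in part.split(';')]
        weight = 1.0  # no q= suffix means the highest preference
        for field in fields[1:]:
            if field.startswith('q='):
                weight = float(field[2:])
        items.append((fields[0], weight))
    # sorted() is stable, so equally weighted items keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)

accept = ('application/xml,application/xhtml+xml,text/html;q=0.9,'
          'text/plain;q=0.8,image/png,*/*;q=0.5')
preferences = parse_accept(accept)
for media_type, weight in preferences:
    print(weight, media_type)
```

The three unweighted types sort to the top, followed by HTML, plain text, and finally the catch-all */*.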
Content Type Negotiation
While many documents delivered over HTTP are already fairly heavily compressed, including images and file formats like PDF, web pages themselves are written in verbose SGML dialects that can consume much less bandwidth if subjected to generic textual compression. Similarly, CSS and JavaScript files also contain very stereotyped patterns of punctuation and repeated variable names, which are very amenable to compression.
Web clients can make servers aware that they accept compressed documents by listing the formats they support in a request header, as in this example: Accept-Encoding: gzip
For some reason, many sites seem to not offer compression unless the User-Agent: header specifies something they recognize. Thus, to convince Google to compress its Google News page, you have to use urllib2 something like this:
>>> request = urllib2.Request('http://news.google.com/')
>>> request.add_header('Accept-Encoding', 'gzip')
>>> request.add_header('User-Agent', 'Mozilla/5.0')
>>> info = opener.open(request)
--------------------------------------------------
GET / HTTP/1.1
Host: news.google.com
User-Agent: Mozilla/5.0
Connection: close
Accept-Encoding: gzip
-------------------- Response --------------------
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
...
Content-Encoding: gzip
...
Remember that web servers do not have to perform compression, and that many will ignore your Accept-Encoding: header. Therefore, you should always check the content encoding of the response, and perform decompression only when the server declares that it is necessary:
>>> info.headers['Content-Encoding'] == 'gzip'
True
>>> import gzip, StringIO
>>> gzip.GzipFile(fileobj=StringIO.StringIO(info.read())).read()
'<!DOCTYPE HTML ... <html>...</html>'
As you can see, Python does not let us pass the file-like info response object directly to the GzipFile class because it is not quite file-like enough: it lacks the tell() method. Here, we can perform the quick work-around of reading the whole compressed file into memory and then wrapping it in a StringIO object that does support tell().
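The "decompress only when declared" rule can be exercised offline. This sketch assumes Python 3, where the in-memory buffer is io.BytesIO rather than StringIO; the helper name decode_body is hypothetical, and a locally compressed byte string stands in for a real response body.

```python
import gzip
import io

def decode_body(raw_body, content_encoding):
    """Decompress only when the server declared gzip; otherwise pass through."""
    if content_encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(raw_body)).read()
    return raw_body

page = b'<html><body>' + b'hello ' * 100 + b'</body></html>'
compressed = gzip.compress(page)  # stands in for a gzip-encoded response body

print(len(page), len(compressed))
print(decode_body(compressed, 'gzip') == page)
print(decode_body(page, None) == page)
```

Keying the decision on the Content-Encoding value, rather than on what the client asked for, covers servers that ignore Accept-Encoding.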
Compression
Many elements of a typical web site design are repeated on every page you visit, and your browsing would slow to a crawl if every image and decoration had to be downloaded separately for every page you viewed. Well-configured web servers therefore add headers to every HTTP response that allow browsers, as well as any proxy caches between the browser and the server, to continue using a copy of a downloaded resource for some period of time until it expires.
There are two basic mechanisms by which servers can support client caching. In the first approach, an HTTP response includes an Expires: header that formats a date and time using the same format as the standard Date: header: Expires: Sun, 21 Jan 2010 17:06:12 GMT. However, this requires the client to check its clock, and many computers run clocks that are far ahead of or behind the real current date and time.
This brings us to a second, more modern alternative, the Cache-Control header, that depends only on the client being able to correctly count seconds forward from the present. For example, to allow an image or page to be cached for an hour but then insist that it be refetched once the hour is up, a cache control header could be supplied like this: Cache-Control: max-age=3600, must-revalidate.
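A client-side freshness check based on max-age might look like the sketch below. The function names parse_cache_control and is_fresh are hypothetical, and the parsing is simplified (it does not handle quoted directive values); the point is that only elapsed seconds matter, not the absolute date.

```python
import time

def parse_cache_control(header):
    """Split a Cache-Control value into a dict of directives."""
    directives = {}
    for part in header.split(','):
        name, _, value = part.strip().partition('=')
        directives[name.lower()] = value or True  # bare flags map to True
    return directives

def is_fresh(fetched_at, header, now=None):
    """True while the cached copy is younger than max-age seconds."""
    now = time.time() if now is None else now
    max_age = int(parse_cache_control(header).get('max-age', 0))
    return (now - fetched_at) < max_age

header = 'max-age=3600, must-revalidate'
fetched_at = 1000.0  # arbitrary timestamp for illustration
print(parse_cache_control(header))
print(is_fresh(fetched_at, header, now=fetched_at + 1800))  # half an hour later
print(is_fresh(fetched_at, header, now=fetched_at + 3600))  # the hour is up
```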
HTTP Caching
It’s possible that you might want your program to check a series of links for validity or whether they have moved, but you do not want to incur the expense of actually downloading the body that would follow the HTTP headers. In this case, you can issue a HEAD request. This is directly possible through httplib, but it can also be performed by urllib2 if you are willing to write a small request class of your own:
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return 'HEAD'
...
>>> info = urllib2.urlopen(HeadRequest('http://www.google.com/'))
>>> info.read()
''
You can see that the body of the response is completely empty.
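For comparison, and assuming Python 3 rather than the Python 2 used in this chapter, urllib.request.Request accepts a method argument, so the subclass is not needed there. The sketch only constructs the request and does not send it.

```python
from urllib.request import Request

# The method keyword overrides the default GET/POST selection.
request = Request('http://www.google.com/', method='HEAD')
print(request.get_method())
```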
The HEAD Method
An encrypted URL starts with https: instead of simply http:, uses the default port 443 instead of port 80, and uses TLS.
Encryption has to be negotiated before the user can send his HTTP request, lest all of the information in it be divulged; but until the request is transmitted, the server does not know what Host: the request will specify. Therefore, encrypted web sites still live under the old problem of having to use a different IP address for every domain that must be hosted.
A technique known as “Server Name Indication” (SNI) has been developed to get around this traditional restriction; however, Python does not yet support it. It appears, though, that a patch was applied to the Python 3 trunk with this feature, only days prior to the time of writing. Here is the ticket in case you want to follow the issue: http://bugs.python.org/issue5639.
To use HTTPS from Python, simply supply an https: scheme in your URL:
>>> info = urllib2.urlopen('https://www.ietf.org/rfc/rfc2616.txt')
>>>
If the connection works properly, then neither your government nor any of the various large and shadowy corporations that track such things should be able to easily determine either the search term you used or the results you viewed.
HTTPS Encryption
The HTTP protocol came with a means of authentication that was so poorly thought out and so badly implemented that it seems to have been almost entirely abandoned. When a server was asked for a page to which access was restricted, it was supposed to return a response code: HTTP/1.1 401 Authorization Required.
The authentication token was generated by doing base64 encoding on the colon-separated username and password:
>>> import base64
>>> print base64.b64encode("guido:vanOranje!")
Z3VpZG86dmFuT3JhbmplIQ==
This, of course, just protects any special characters in the username and password that might have been confused as part of the headers themselves; it does not protect the username and password at all, since they can very simply be decoded again:
>>> print base64.b64decode("Z3VpZG86dmFuT3JhbmplIQ==")
guido:vanOranje!
Anyway, once the encoded value was computed, it could be included in the second request like this: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
An incorrect password or unknown user would elicit additional 401 errors from the server, resulting in the pop-up box appearing again and again. Finally, if the user got it right, she would either be shown the resource or, if she in fact did not have permission, be shown a response code like the following: 403 Forbidden.
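Building and reading such a header takes only a few lines. The sketch assumes Python 3, where base64 works on bytes; the helper names basic_auth_header and decode_basic_auth are hypothetical.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(('%s:%s' % (username, password)).encode('utf-8'))
    return 'Basic ' + token.decode('ascii')

def decode_basic_auth(value):
    """Recover (scheme, username, password), showing that nothing is hidden."""
    scheme, _, token = value.partition(' ')
    credentials = base64.b64decode(token).decode('utf-8')
    username, _, password = credentials.partition(':')
    return scheme, username, password

print(basic_auth_header('guido', 'vanOranje!'))
print(decode_basic_auth('Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='))
```

Because anyone who sees the header can reverse it this easily, Basic authentication offers no confidentiality unless the whole connection is encrypted.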
Pythonsupportsthiskindofauthenticationthroughahandlerthat,asyourprogramusesit,canaccumulatealistofpasswords.
auth_handler=.HTTPBasicAuthHandler()
auth_handler.add_password(realm='voetbal',uri='http://www.onsoranje.nl/',
user='guido',passwd='vanOranje!')
Theresultinghandlercanbepassedintobuild_opener().
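For readers on Python 3, where urllib2 has become urllib.request, the same steps might look like the following sketch. It builds the opener but makes no request, so nothing is sent over the network; peeking at the handler's passwd attribute is done here only to show what was stored.

```python
import urllib.request

auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='voetbal', uri='http://www.onsoranje.nl/',
                          user='guido', passwd='vanOranje!')

# The opener would consult the handler whenever a server answers 401.
opener = urllib.request.build_opener(auth_handler)

# Look up what the handler would offer for this realm and URL.
stored = auth_handler.passwd.find_user_password(
    'voetbal', 'http://www.onsoranje.nl/')
print(stored)
```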
HTTP Authentication
The actual mechanism that powers user identity tracking, logging in, and logging out of modern websites is the cookie. The HTTP responses sent by a server can optionally include a number of Set-cookie: headers that browsers store on behalf of the user. In every subsequent request made to that site, the browser will include a Cookie: header corresponding to each cookie that has been set.
The most obvious use of cookies is to keep up with user identity. To support logging in, a website can deploy a normal form that asks for your username and password (or e-mail address and password, or whatever).
Cookies can also be used for feats other than simply identifying users. For example, a site can issue a cookie to every browser that connects, enabling it to track even casual visitors. This approach enables an online store to let visitors start building a shopping cart full of items without ever being forced to create an account.
From the point of view of a web client, cookies are moderately short strings that have to be stored and then divulged when matching requests are made. The Python Standard Library puts this logic in its own module, cookielib (which defines classes for automatic handling of HTTP cookies), whose CookieJar objects can be used as small cookie databases by the HTTPCookieProcessor in urllib2. To see its effect, you need go no further than the front page of Google, which sets cookies in the mere event of an unknown visitor arriving at the site for the first time. Here is how we create a new opener that knows about cookies:
>>> import cookielib
>>> cj = cookielib.CookieJar()
>>> cookie_opener = urllib2.build_opener(VerboseHTTPHandler,
...                                      urllib2.HTTPCookieProcessor(cj))
Opening the Google front page will result in two different cookies getting set:
>>> response = cookie_opener.open('http://www.google.com/')
--------------------------------------------------
GET / HTTP/1.1
...
-------------------- Response --------------------
HTTP/1.1 200 OK
...
Set-Cookie: PREF=ID=94381994af6d5c77:FF=0:TM=1288205983:LM=1288205983:S=Mtwivl7EB73uL5Ky;
  expires=Fri, 26-Oct-2012 18:59:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=40=rWLn_I8_PAhUF62J0yFLtb1-AoftgU0RvGSsa81FhTvd4vXD91iU5DOEdxSVt4otiISY-
  3RfEYcGFHZA52w3-85p-hujagtB9akaLnS0QHEt2v8lkkelEGbpo7oWr9u5; expires=Thu, 28-Apr-2011
  18:59:43 GMT; path=/; domain=.google.com; HttpOnly
...
If you investigate cookielib further, you will find that you can do more than query and modify the cookies that have been set. You can also automatically store them in a file, so that they survive from one Python session to the next. You can even create cookie processors that implement your own custom policies with respect to which cookies to store and which to divulge.
Servers can constrain a cookie to a particular domain and path, in addition to setting a Max-age or expires time. Unfortunately, some browsers ignore this setting, so sites should never base their security on the assumption that the expires time will be obeyed. Servers can also mark cookies as secure; this ensures that such cookies are only transmitted with HTTPS requests to the site and never in unsecure HTTP requests.
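To see how those constraints travel inside a single header, it can help to pull one apart. The sketch below assumes Python 3 and uses the standard http.cookies module rather than cookielib; the header value is made up for illustration and does not come from a real site.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie value carrying every constraint discussed above.
raw = ('session=abc123; Path=/account; Domain=.example.com; '
       'Max-Age=3600; Secure; HttpOnly')

jar = SimpleCookie()
jar.load(raw)
morsel = jar['session']

print('value:  ', morsel.value)
print('path:   ', morsel['path'])
print('domain: ', morsel['domain'])
print('max-age:', morsel['max-age'])
print('secure: ', bool(morsel['secure']))
print('httponly:', bool(morsel['httponly']))
```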
Cookies
A perpetual problem with cookies is that website designers do not seem to realize that cookies need to be protected as zealously as your username and password. While it is true that well-designed cookies expire and will no longer be accepted as valid by the server, cookies (while they last) give exactly as much access to a website as a username and password.
Some sites do not protect cookies at all: they might require HTTPS for your username and password, but then return you to normal HTTP for the rest of your session. Other sites are smart enough to protect subsequent page loads with HTTPS, even after you have left the login page, but they forget that static data from the same domain, like images, decorations, CSS files, and JavaScript source code, will also carry your cookie. The better alternatives are to either send all of that information over HTTPS, or to carefully serve it from a different domain or path that is outside the jurisdiction of the session cookie.
Should you happen to observe or capture a Cookie: header from an HTTP request, remember that there is no need to store it in a CookieJar or represent it as a cookielib object at all. Indeed, you could not do that anyway because the outgoing Cookie: header does not reveal the domain and path rules that the cookie was stored with. Instead, just inject the Cookie: header raw into the requests you make to the website:
request = urllib2.Request(url)
request.add_header('Cookie', intercepted_value)
info = urllib2.urlopen(request)
HTTP Session Hijacking
The earliest experiments with scripts that could run in web browsers revealed a problem: all of the HTTP requests made by the browser were done with the authority of the user’s cookies, so pages could cause quite a bit of trouble by attempting to, say, POST to the online website of a popular bank asking that money be transferred to the attacker’s account. Anyone who visited the problem site while logged on to that particular bank in another window could lose money. To address this, browsers imposed the restriction that scripts in languages like JavaScript can only make connections back to the site that served the web page, and not to other websites. This is called the “same origin policy.”
Today, would-be attackers find ways around this policy by using a constellation of attacks called cross-site scripting (known by the acronym XSS to prevent confusion with Cascading Style Sheets). These techniques include things like finding the fields on a web page where the site will include snippets of user-provided data without properly escaping them, and then figuring out how to craft a snippet of data that will perform some compromising action on behalf of the user or send private information to a third party. Next, the would-be attackers release a link or code containing that snippet onto a popular website, bulletin board, or in spam e-mails, hoping that thousands of people will click and inadvertently assist in their attack against the site. There is a collection of techniques that are important for avoiding cross-site scripting; you can find them in any good reference on web development. The most important ones include the following:
When processing a form that is supposed to submit a POST request, always carefully disregard any GET parameters.
Never support URLs that produce some side effect or perform some action simply through being the subject of a GET.
In every form, include not only the obvious information (such as a dollar amount and destination account number for bank transfers) but also a hidden field with a secret value that must match for the submission to be valid. That way, random POST requests that attackers generate with the dollar amount and destination account number will not work because they will lack the secret that would make the submission valid.
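The hidden-field technique in the last item can be sketched in a few lines. This is an illustration only, assuming Python 3; the session dictionary, the field name form_token, and both helper functions are hypothetical stand-ins for whatever your framework provides.

```python
import hmac
import secrets

def issue_form_token(session):
    """Create a random secret, remember it, and return it for the hidden field."""
    token = secrets.token_hex(16)
    session['form_token'] = token
    return token

def is_valid_submission(session, submitted_fields):
    """Accept a POST only if its hidden field matches the stored secret."""
    expected = session.get('form_token')
    supplied = submitted_fields.get('form_token', '')
    if expected is None:
        return False
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected, supplied)

session = {}  # hypothetical per-user session store
token = issue_form_token(session)

genuine = {'amount': '100', 'account': '12345', 'form_token': token}
forged = {'amount': '100', 'account': '99999'}  # attacker cannot know the token

print(is_valid_submission(session, genuine))
print(is_valid_submission(session, forged))
```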
While the possibilities for XSS are not, strictly speaking, problems or issues with the HTTP protocol itself, it helps to have a solid understanding of them when you are trying to write any program that operates safely on the World Wide Web.
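The escaping of user-provided data mentioned earlier is the simplest of these defenses to demonstrate. The following sketch assumes Python 3 and its standard html module; the render_comment helper and the hostile string are invented for illustration.

```python
import html

def render_comment(user_text):
    """Place user-provided text inside a page with markup characters escaped."""
    return '<p class="comment">%s</p>' % html.escape(user_text, quote=True)

hostile = '<script>steal(document.cookie)</script>'
safe_markup = render_comment(hostile)
print(safe_markup)
```

Because the angle brackets arrive in the browser as entities, the snippet is displayed as text instead of being run as a script.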
A library called WebOb is also available for Python (and listed on the Python Package Index) that contains HTTP request and response classes that were designed from the other direction: that is, they were intended all along as general-purpose representations of HTTP in all of its low-level details. You can learn more about them at the WebOb project web page: http://pythonpaste.org/webob/
Cross-Site Scripting Attacks
Most websites are designed first and foremost for human eyes. While well-designed sites offer formal APIs by which you can construct Google maps, upload Flickr photos, or browse YouTube videos, many sites offer nothing but HTML pages formatted for humans. If you need a program to be able to fetch its data, then you will need the ability to dive into densely formatted markup and retrieve the information you need, a process known affectionately as screen scraping.
Screen Scraping
Before you can parse an HTML-formatted web page, you of course have to acquire some. Here are some options for downloading content.
You can use urllib2, or the even lower-level httplib, to construct an HTTP request that will return a web page. For each form that has to be filled out, you will have to build a dictionary representing the field names and data values inside; unlike a real web browser, these libraries will give you no help in submitting forms.
You can install mechanize and write a program that fills out and submits web forms much as you would do when sitting in front of a web browser. The downside is that, to benefit from this automation, you will need to download the page containing the form HTML before you can then submit it, possibly doubling the number of web requests you perform.
If you need to download and parse entire websites, take a look at the Scrapy project, hosted at http://scrapy.org, which provides a framework for implementing your own web spiders. With the tools it provides, you can write programs that follow links to every page on a website, tabulating the data you want extracted from each page.
When web pages wind up being incomplete because they use dynamic JavaScript to load data that you need, you can use the QtWebKit module of the PyQt4 library to load a page, let the JavaScript run, and then save or parse the resulting complete HTML page.
Finally, if you really need a browser to load the site, both the Selenium and Windmill test platforms provide a way to drive a standard web browser from inside a Python program. You can start the browser up, direct it to the page of interest, fill out and submit forms, do whatever else is necessary to bring up the data you need, and then pull the resulting information directly from the DOM elements that hold them.
Fetching Web Pages
The task of grabbing information from a website usually starts by reading it carefully with a web browser and finding a route to the information you need.
Figure fetch_urllib2.py shows the site of the National Weather Service; for our first example, we will write a program that takes a city and state as arguments and prints out the current conditions, temperature, and humidity.
When using the urllib2 module from the Standard Library, you will have to read the web page HTML manually to find the form. You can use the View Source command in your browser, search for the words “Local forecast,” and find the following form in the middle of the sea of HTML:
<form method="post" action="http://forecast.weather.gov/zipcity.php" ...>
  ...
  <input type="text" id="zipcity" name="inputstring" size="9"
    value="City, St" onfocus="this.value='';" />
  <input type="submit" name="Go2" value="Go" />
</form>
The only important elements here are the <form> itself and the <input> fields inside; everything else is just decoration intended to help human readers. This form does a POST to a particular URL with, it appears, just one parameter: an inputstring giving the city name and state. fetch_urllib2.py shows a simple Python program that uses only the Standard Library to perform this interaction, and saves the result to phoenix.html.
import urllib, urllib2
data = urllib.urlencode({'inputstring': 'Phoenix, AZ'})
info = urllib2.urlopen('http://forecast.weather.gov/zipcity.php', data)
content = info.read()
open('phoenix.html', 'w').write(content)
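To see what the listing actually puts on the wire, the request can be built without being sent. This is a sketch in Python 3 syntax (urllib.parse and urllib.request stand in for urllib and urllib2); it makes no network connection.

```python
from urllib.parse import urlencode
from urllib.request import Request

form_fields = {'inputstring': 'Phoenix, AZ'}

# urlencode() produces the same key=value text a browser would submit.
body = urlencode(form_fields).encode('ascii')

# Supplying a body is what turns the request from a GET into a POST.
request = Request('http://forecast.weather.gov/zipcity.php', data=body)

print(body)
print(request.get_method())
```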
On the one hand, urllib2 makes this interaction very convenient; we are able to download a forecast page using only a few lines of code. But, on the other hand, we had to read and understand the form ourselves instead of relying on an actual HTML parser to read it. The approach encouraged by mechanize is quite different: you need only the address of the opening page to get started, and the library itself will take responsibility for exploring the HTML and letting you know what forms are present. Here are the forms that it finds on this particular page:
>>> import mechanize
>>> br = mechanize.Browser()
>>> response = br.open('http://www.weather.gov/')
>>> for form in br.forms():
...     print '%r %r %s' % (form.name, form.attrs.get('id'), form.action)
...     for control in form.controls:
...         print '   ', control.type, control.name, repr(control.value)
None None http://search.usa.gov/search
    hidden v:project 'firstgov'
    text query ''
    radio affiliate ['nws.noaa.gov']
    submit None 'Go'
None None http://forecast.weather.gov/zipcity.php
    text inputstring 'City, St'
    submit Go2 'Go'
'jump' 'jump' http://www.weather.gov/
    select menu ['http://www.weather.gov/alerts-beta/']
    button None None
Once we have determined that we need the zipcity.php form, we can write a program like that shown in fetch_mechanize.py. You can see that at no point does it build a set of form fields manually itself, as was necessary in our previous listing. Instead, it simply loads the front page, sets the one field value that we care about, and then presses the form’s submit button. Note that since this HTML form did not specify a name, we had to create our own filter function (the lambda function in the listing) to choose which of the three forms we wanted.
Downloading Pages Through Form Submission
import mechanize
br = mechanize.Browser()
br.open('http://www.weather.gov/')
br.select_form(predicate=lambda form: 'zipcity' in form.action)
br['inputstring'] = 'Phoenix, AZ'
response = br.submit()
content = response.read()
open('phoenix.html', 'w').write(content)
Many mechanize users instead choose to select forms by the order in which they appear in the page, in which case we could have called select_form(nr=1). But I prefer not to rely on the order, since the real identity of a form is inherent in the action that it performs, not its location on a page.
The Hypertext Markup Language (HTML) is one of many markup dialects built atop the Standard Generalized Markup Language (SGML), which bequeathed to the world the idea of using thousands of angle brackets to mark up plain text. Inserting bold and italics into a format like HTML is as simple as typing eight angle brackets:
The <b>very</b> strange book <i>Tristram Shandy</i>.
The very strange book Tristram Shandy.
In the terminology of SGML, the strings <b> and </b> are each tags (an opening and a closing tag, in fact), and together they create an element that contains the text very inside it. Elements can contain text as well as other elements, and can define a series of key/value attribute pairs that give more information about the element:
<p content="personal">I am reading <i document="play">Hamlet</i>.</p>
I am reading Hamlet.
The problem with SGML languages in this regard (and HTML is one particular example) is that they expect parsers to know the rules about which elements can be nested inside which other elements, and this leads to constructions like this unordered list <ul>, inside which are several list items <li>:
<ul><li>First<li>Second<li>Third<li>Fourth</ul>
First
Second
Third
Fourth
Since HTML in fact says that <li> elements cannot nest, an HTML parser will understand the foregoing snippet to be equivalent to this more explicit XML string:
<ul><li>First</li><li>Second</li><li>Third</li><li>Fourth</li></ul>
First
Second
Third
Fourth
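The single nesting rule at work here is simple enough to imitate. The following toy, assuming Python 3 and its standard html.parser module, re-emits the terse list with the implied closing tags made explicit. It knows only this one rule, drops attributes, and does not handle nested lists; a real HTML parser applies many more rules.

```python
from html.parser import HTMLParser

class ListItemCloser(HTMLParser):
    """Re-emit markup, closing an open <li> wherever HTML implies it."""

    def __init__(self):
        super().__init__()
        self.pieces = []
        self.li_open = False

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            if self.li_open:              # a new <li> ends the previous one
                self.pieces.append('</li>')
            self.li_open = True
        self.pieces.append('<%s>' % tag)  # attributes are ignored in this toy

    def handle_endtag(self, tag):
        if tag == 'li':
            self.li_open = False
        elif tag == 'ul' and self.li_open:  # </ul> ends the last <li> too
            self.pieces.append('</li>')
            self.li_open = False
        self.pieces.append('</%s>' % tag)

    def handle_data(self, data):
        self.pieces.append(data)

closer = ListItemCloser()
closer.feed('<ul><li>First<li>Second<li>Third<li>Fourth</ul>')
closer.close()
explicit = ''.join(closer.pieces)
print(explicit)
```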
And beyond this implicit understanding of HTML that a parser must possess are the twin problems that, first, various browsers over the years have varied wildly in how well they can reconstruct the document structure when given very concise or even deeply broken HTML; and, second, most web page authors judge the quality of their HTML by whether their browser of choice renders it correctly. This has resulted not only in a World Wide Web that is full of sites with invalid and broken HTML markup, but also in the fact that the permissiveness built into browsers has encouraged different flavors of broken HTML among their different user groups.
For more documentation about these topics visit:
http://www.w3.org/MarkUp/Guide/
http://www.w3.org/MarkUp/Guide/Advanced.html
http://www.w3.org/MarkUp/Guide/Style
http://werbach.com/barebones/barebones.html
http://www.w3.org/TR/REC-html40/
http://validator.w3.org/
http://tidy.sourceforge.net/
The Structure of Web Pages
Parsing HTML with Python requires three choices:
The parser you will use to digest the HTML, and try to make sense of its tangle of opening and closing tags.
The API (Application Programming Interface) by which your Python program will access the tree of concentric elements that the parser built from its analysis of the HTML page.
What kinds of selectors you will be able to write to jump directly to the part of the page that interests you, instead of having to step into the hierarchy one element at a time.
The issue of selectors is a very important one, because a well-written selector can unambiguously identify an HTML element that interests you without your having to touch any of the elements above it in the document tree.
Now, I should pause for a second to explain terms like “deeper,” and I think the concept will be clearest if we reconsider the unordered list that was quoted in the previous section. An experienced web developer looking at that list rearranges it in her head, so that this is what it looks like:
First
Second
Third
Fourth
<ul>
  <li>First</li>
  <li>Second</li>
  <li>Third</li>
  <li>Fourth</li>
</ul>
Here the <ul> element is said to be a “parent” element of the individual list items, which “wraps” them and which is one level “above” them in the whole document. The <li> elements are “siblings” of one another; each is a “child” of the <ul> element that “contains” them, and they sit “below” their parent in the larger document tree. This kind of spatial thinking winds up being very important for working your way into a document through an API.
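These family relationships map directly onto an element tree. The sketch below assumes Python 3 and the Standard Library's xml.etree.ElementTree, fed the explicit form of the list so that a plain XML parser can read it. ElementTree elements keep no pointer to their parent, so the example builds that map itself.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring(
    '<ul><li>First</li><li>Second</li><li>Third</li><li>Fourth</li></ul>')

# Iterating over an element yields its children, one level "below" it.
children = list(ul)
print(ul.tag, [li.text for li in children])

# Build a child-to-parent map to move "up" the tree.
parent_of = {child: parent for parent in ul.iter() for child in parent}

third = children[2]
siblings = [li.text for li in parent_of[third] if li is not third]
print(third.text, 'has parent', parent_of[third].tag, 'and siblings', siblings)
```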
In brief, here are your choices along each of the three axes that were just listed:
The most powerful, flexible, and fastest parser at the moment appears to be the HTMLParser that comes with lxml; the next most powerful is the longtime favorite BeautifulSoup; and coming in dead last are the parsing classes included with the Python Standard Library, which no one seems to use for serious screen scraping.
The best API for manipulating a tree of HTML elements is ElementTree, which has been brought into the Standard Library for use with the Standard Library parsers, and is also the API supported by lxml; BeautifulSoup supports an API peculiar to itself; and a pair of ancient, ugly, event-based interfaces to HTML still exist in the Python Standard Library.
The lxml library supports two of the major industry-standard selectors: CSS selectors and the XPath query language; BeautifulSoup has a selector system all its own, but one that is very powerful and has powered countless web-scraping programs over the years.
Three Axes
The tree of objects that a parser creates from an HTML file is often called a Document Object Model, or DOM, even though this is officially the name of one particular API defined by the standards bodies and implemented by browsers for the use of JavaScript running on a web page.
The task we have set for ourselves, you will recall, is to find the current conditions, temperature, and humidity in the phoenix.html page that we have downloaded.
There are two approaches to narrowing your attention to the specific area of the document in which you are interested. You can either search the HTML for a word or phrase close to the data that you want, or, as we mentioned previously, use Google Chrome or Firefox with Firebug to “Inspect Element” and see the element you want embedded in an attractive diagram of the document tree.
To see how direct document-object manipulation would work in this case, we can load the raw page directly into both the lxml and BeautifulSoup systems.
>>> import lxml.etree
>>> parser = lxml.etree.HTMLParser(encoding='utf-8')
>>> tree = lxml.etree.parse('phoenix.html', parser)
The need for a separate parser object here is because, as you might guess from its name, lxml is natively targeted at XML files.
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(open('phoenix.html'))
Traceback (most recent call last):
  ...
HTMLParseError: malformed start tag, at line 96, column 720
What on earth? Well, look, the National Weather Service does not check or tidy its HTML. Jumping to line 96, column 720 of phoenix.html, we see that there does indeed appear to be some broken HTML:
<a href="http://www.weather.gov"<u>www.weather.gov</u></a>
You can see that the <u> tag starts before a closing angle bracket has been encountered for the <a> tag. But why should BeautifulSoup care? I wonder what version I have installed.
>>> BeautifulSoup.__version__
'3.1.0'
Well, drat. I typed too quickly and was not careful to specify a working version when I ran pip to install BeautifulSoup into my virtual environment. Let’s try again:
root@erlerobot:~/Python_files# pip install BeautifulSoup==3.0.8.1
Now, if we were to take the approach of starting at the top of the document and digging ever deeper until we find the node that we are interested in, we are going to have to generate some very verbose code. Here is the approach we would have to take with lxml:
Diving into an HTML Document
>>> fonttag = tree.find('body').find('div').findall('table')[3] \
...     .findall('tr')[1].find('td').findall('table')[1].find('tr') \
...     .findall('td')[1].findall('table')[1].find('tr').find('td') \
...     .find('table').findall('tr')[1].find('td').find('table') \
...     .find('tr').find('td').find('font')
>>> fonttag.text
'\nA Few Clouds'
An attractive syntactic convention lets BeautifulSoup handle some of these steps more beautifully:
>>> fonttag = soup.body.div('table', recursive=False)[3] \
...     ('tr', recursive=False)[1].td('table', recursive=False)[1].tr \
...     ('td', recursive=False)[1]('table', recursive=False)[1].tr.td \
...     .table('tr', recursive=False)[1].td.table \
...     .tr.td.font
>>> fonttag.text
u'A Few Clouds71°F(22°C)'
BeautifulSoup lets you choose the first child element with a given tag by simply selecting the attribute .tagname, and lets you receive a list of child elements with a given tag name by calling an element like a function with the tag name and a recursive option telling it to pay attention just to the children of an element.
Both lxml and BeautifulSoup provide attractive ways to quickly grab a child element based on its tag name and position in the document. Even so, we clearly should not be using such primitive navigation to try descending into a real-world web page.
Figuring out how HTML elements are grouped, by the way, is much easier if you either view HTML with an editor that prints it as a tree, or if you run it through a tool like HTML tidy from W3C that can indent each tag to show you which ones are inside which other ones. tidy can validate, correct, and pretty-print HTML files. You can use this command line:
tidy phoenix.html > phoenix-tidied.html
A selector is a pattern that is crafted to match document elements on which your program wants to operate. Some of them are:
People who are deeply XML-centric prefer XPath expressions, which are a companion technology to XML itself and let you match elements based on their ancestors, their own identity, and textual matches against their attributes and text content.
If you are a web developer, then you probably think of CSS selectors as the most natural choice for examining HTML.
Both lxml and BeautifulSoup, as we have seen, provide a smattering of their own methods for finding document elements.
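To give a feel for what a selector buys you, here is a minimal sketch that needs no third-party packages. It assumes Python 3 and uses the limited XPath subset built into xml.etree.ElementTree, not the full XPath or CSS support of lxml. The table is a small, well-formed stand-in invented for illustration; the real page's markup is larger and messier.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like a weather table.
snippet = """
<table>
  <tr><td class="big">A Few Clouds</td></tr>
  <tr><td><b>Humidity</b></td><td>30 %</td></tr>
  <tr><td><b>Wind Speed</b></td><td>Calm</td></tr>
</table>
"""
root = ET.fromstring(snippet)

# Match on an attribute value, wherever the element sits in the tree.
condition = root.find(".//td[@class='big']").text

# Match a row by the text of one of its cells, then read the next cell.
row = root.find(".//tr[td='Humidity']")
humidity = row.findall('td')[1].text

print('Condition:', condition)
print('Humidity:', humidity)
```

Neither expression mentions how deep in the document the elements live, which is what separates a selector from step-by-step navigation.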
Here are standards and descriptions for each of the selector styles just described:
http://www.w3.org/TR/xpath/
http://codespeak.net/lxml/tutorial.html#using-xpath-to-find-text
http://codespeak.net/lxml/xpathxslt.html
http://www.w3.org/TR/CSS2/selector.html
http://codespeak.net/lxml/cssselect.html
And, finally, here are links to documentation that looks at selector methods peculiar to lxml and BeautifulSoup:
http://codespeak.net/lxml/tutorial.html#elementpath
http://www.crummy.com/software/BeautifulSoup/documentation.html
Now, here you have a completed weather scraper in the file weather.py:
import sys, urllib, urllib2
import lxml.etree
from lxml.cssselect import CSSSelector
from BeautifulSoup import BeautifulSoup
if len(sys.argv) < 2:
    print >>sys.stderr, 'usage: weather.py CITY, STATE'
    sys.exit(2)
# Submit the city and state to the same form the browser would use.
data = urllib.urlencode({'inputstring': ' '.join(sys.argv[1:])})
info = urllib2.urlopen('http://forecast.weather.gov/zipcity.php', data)
content = info.read()
# Solution #1 using CSSSelector
parser = lxml.etree.HTMLParser(encoding='utf-8')
tree = lxml.etree.fromstring(content, parser)
big = CSSSelector('td.big')(tree)[0]
if big.find('font') is not None:
    big = big.find('font')
print 'Condition:', big.text.strip()
print 'Temperature:', big.findall('br')[1].tail
tr = tree.xpath('.//td[b="Humidity"]')[0].getparent()
print 'Humidity:', tr.findall('td')[1].text
# Solution #2 using BeautifulSoup
soup = BeautifulSoup(content)  # doctest: +SKIP
big = soup.find('td', 'big')
if big.font is not None:
    big = big.font
print 'Condition:', big.contents[0].string.strip()
temp = big.contents[3].string or big.contents[4].string  # can be either
print 'Temperature:', temp.replace('&deg;', ' ')  # degree entity as found in the raw HTML
tr = soup.find('b', text='Humidity').parent.parent.parent
print 'Humidity:', tr('td')[1].string
Selectors
Take into account that to run this you also need to have the lxml module installed.
This chapter focuses on the actual act of programming. Every other issue that we consider will be in the service of this overarching goal: to create a new web service using Python as our language.
Web Applications
Acceptable website performance generally requires the ability to serve several users concurrently.
To avoid corrupting in-memory data structures, CPython employs a Global Interpreter Lock (GIL), so that only one thread in a multi-threaded program can actually be executing Python code at any given time. Thus Python will let you create as many threads as you want in a given process; however, only one thread can run code at a time, as though your threads were confined to a single processor.
A typical web application receives and parses the user's request, then makes a corresponding request to the database behind it; while that thread is waiting for a response from the database, the GIL is available for any other threads that need to run Python code. Finally the database answers; the waiting thread reacquires the GIL; and, in a quick blaze of CPU activity, the data is turned into an attractive web page, and the response is sent winging its way back to the user.
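The effect is easy to observe. In the following sketch, written for Python 3, time.sleep() stands in for the blocking database call (it releases the GIL while waiting, as blocking I/O does); the handle_request function and the timings are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Pretend to serve one request: wait on a 'database', then format."""
    time.sleep(0.2)  # stand-in for a blocking database call
    return 'page for request %d' % request_id

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(handle_request, range(4)))
elapsed = time.monotonic() - start

print(pages)
print('elapsed: %.2f seconds' % elapsed)
```

Four requests that each wait 0.2 seconds finish in well under the 0.8 seconds that serving them one after another would take, because the waits overlap.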
Thus threads can sometimes at least perform decently. Nevertheless, multiple processes are the more general way to scale. This is because, as a service gets bigger, additional processes can be brought up on additional machines, rather than being confined to a single machine. Threads, no matter their other merits, cannot do that! There are two general approaches to running a Python web application inside of a collection of identical worker processes:
The Apache web server can be combined with the popular mod_wsgi module to host a separate Python interpreter in every Apache worker process.
The web application can be run inside of either the flup server or the uWSGI server. Both of these servers will manage a pool of worker processes where each process hosts a Python interpreter running your application. The front-end web server can submit requests to flup using either the standard Fast CGI (FCGI) or Simple CGI (SCGI) protocol, while it has to speak to uWSGI in its own special “uwsgi” protocol (whose name is all lowercase to distinguish it from the name of the server).
Web Servers and Python
All of the popular open source web servers can be used to serve Python web applications, so the full range of modern options is available:
Apache HTTP Server: It took the lead as the most popular HTTP server back in 1996. Its stated goal is flexibility and modularity; it is reasonably fast, but it will not win speed records against more recent servers that focus only on speed. Its configuration files can be a bit long and verbose, but through them Apache offers very powerful options for applying different rules and behaviors to different directories and URLs. A variety of extension modules are available (many of which come bundled with it), and user directories can have separate .htaccess configuration files that make further adjustments to the main configuration.
nginx (“engine X”): The nginx server has become a great favorite of organizations with a large volume of content that needs to be served quickly. It is considered fairly easy to configure.
lighttpd (“lighty”): First written to demonstrate an architecture that could support tens of thousands of open client sockets (both nginx and Cherokee are also contenders in this class), this server is known for being very easy to configure. Some system administrators complain about its memory usage, but many others have observed no problems with it.
Cherokee: Not only does this server offer performance that might edge out even nginx and lighttpd, but it lets you configure the server through a built-in web interface.
Each of these servers can be combined with Python. In the case of Apache, for example, the mod_wsgi module has a daemon mode where it internally runs your Python code inside a stack of dedicated server processes that are separate from Apache. Each Web Server Gateway Interface (WSGI) process can even run as a different user. If you really want to use Apache as your front end, this is one of the best options available.
But the most strongly recommended approach today is to set up one of the three fast servers to provide your static content, and then use one of the following three techniques to run your Python code behind them:
Use HTTP proxying so that your nginx, lighttpd, or Cherokee front-end server delivers HTTP requests for dynamic web pages to a back-end Apache instance running mod_wsgi.
Use the Fast CGI protocol or SCGI protocol to talk to a flup instance running your Python code.
Use the uwsgi protocol to talk to a uWSGI instance running your Python code.
At this point, you understand something of the larger context in which Python web applications are usually run; you are now ready to turn your attention to the task of programming.
Choosing a Web Server
Integrating Python with web servers was much improved by the creation of PEP 333, which defines the Python Web Server Gateway Interface (WSGI): http://legacy.python.org/dev/peps/pep-0333/.
WSGI introduced a single calling convention that every web server could implement, thereby making that web server instantly compatible with all of the Python web applications and web frameworks that also support WSGI.
The Python Library documentation has more information about the wsgiref module, which provides a variety of utility functions for working with WSGI environments. The wsgiref package, whose simple_server we will use in the example, also contains several utilities for working with WSGI. It includes functions for examining, further unpacking, and modifying the environ object; a prebuilt iterator for streaming large files back to the server; and even a validate sub-module whose routines can check a WSGI application to see whether it complies with the specification when presented with a series of representative requests.
Developers generally avoid writing raw WSGI applications because the conveniences of even a simple web framework make code so much easier to write and maintain. But, for the sake of illustration, wsgi_app.py shows a small WSGI application whose front page asks the user to type a string. Submitting the string takes the user to a second web page, where he can see its base64 encoding. From there, a link will take him back to the first page to repeat the process.
import cgi, base64
from wsgiref.simple_server import make_server
def page(content, *args):
    yield '<html><head><title>wsgi_app.py</title></head><body>'
    yield content % args
    yield '</body>'
def simple_app(environ, start_response):
    gohome = '<br><a href="/">Return to the home page</a>'
    q = cgi.parse_qs(environ['QUERY_STRING'])
    if environ['PATH_INFO'] == '/':
        if environ['REQUEST_METHOD'] != 'GET' or environ['QUERY_STRING']:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: the front page is not a form']
        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('Welcome! Enter a string: <form action="encode">'
                    '<input name="mystring"><input type="submit"></form>')
    elif environ['PATH_INFO'] == '/encode':
        if environ['REQUEST_METHOD'] != 'GET':
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form does not support POST parameters']
        if 'mystring' not in q or not q['mystring'][0]:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form requires a "mystring" parameter']
        my = q['mystring'][0]
        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('<tt>%s</tt> base64 encoded is: <tt>%s</tt>' + gohome,
                    cgi.escape(repr(my)), cgi.escape(base64.b64encode(my)))
    else:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return ['That URL is not valid']
print 'Listening on localhost:8000'
make_server('localhost', 8000, simple_app).serve_forever()
WSGI
The first thing to note in this code listing is that two very different objects are being created: a WSGI server that knows how to use HTTP to talk to a web browser and an application written to respond correctly when invoked per the WSGI calling convention. Note that these two pieces (the server and the application) could easily be swapped out. This code example should make the calling convention clear enough:
For each incoming request, the application is called with an environ object giving it the details of the HTTP request, and with a live callable named start_response().
Once the application has decided what HTTP response code and headers need to be returned, it makes a single call to start_response(). Its headers will be combined with any headers that the WSGI server might already provide to the client.
Finally, the application needs only to return the actual content: either a list of strings or a generator yielding strings. Either way, the strings will be concatenated by the WSGI server to produce the response body that is transmitted back to the client. Generators are useful for cases where it would be unwise for an application to try loading all of the content (like large files) into memory at once.
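The three steps above can be exercised without any web server at all, by playing the server's part by hand. The sketch below assumes Python 3, where the response body must be bytes rather than str; hello_app and call_app are hypothetical names, and setup_testing_defaults() from wsgiref.util fills in the environ keys a real server would supply.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """A tiny WSGI application (hypothetical, for illustration)."""
    body = ('You asked for %s' % environ['PATH_INFO']).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

def call_app(app, path):
    """Play the part of the server: build environ, collect the response."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # adds the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    chunks = app(environ, start_response)
    return captured['status'], captured['headers'], b''.join(chunks)

status, headers, body = call_app(hello_app, '/weather')
print(status)
print(body)
```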
Standard interfaces like WSGI make it possible for developers to create wrappers (a design-patterns person would call these adapters) that accept a request from a server; modify, adjust, or record the request; and then call a normal WSGI application with the modified environment. Such middleware can also inspect and adjust the outgoing data stream; everything, in fact, is up for grabs, and essentially arbitrary changes can be made both to the circumstances under which a WSGI application runs, as well as to the content that it returns.
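A wrapper of this kind can be very small. The following sketch, in Python 3 syntax, records each request path on the way in and adds one response header on the way out; the class name, the X-Wrapped header, and inner_app are all invented for illustration.

```python
def inner_app(environ, start_response):
    """The ordinary application being wrapped."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

class HeaderAndLogMiddleware(object):
    """Record each request path and add one header to every response."""

    def __init__(self, app):
        self.app = app
        self.seen_paths = []

    def __call__(self, environ, start_response):
        self.seen_paths.append(environ.get('PATH_INFO', ''))

        def wrapped_start_response(status, headers, exc_info=None):
            headers = list(headers) + [('X-Wrapped', 'yes')]
            return start_response(status, headers, exc_info)

        return self.app(environ, wrapped_start_response)

# To a server, the wrapped object looks like any other WSGI application.
wrapped = HeaderAndLogMiddleware(inner_app)

result = {}
def fake_start_response(status, headers, exc_info=None):
    result['status'], result['headers'] = status, headers

body = b''.join(wrapped({'PATH_INFO': '/demo'}, fake_start_response))
print(result['status'], result['headers'], body)
```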
If several WSGI applications need to live at a single website under different URLs, then a piece of middleware can be given the URLs and dispatch each incoming request to the matching application. (You can read more at http://pythonpaste.org/.)
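The idea of handing URLs to a piece of middleware can be sketched as follows. This is a hand-rolled illustration in Python 3 syntax, not the mechanism offered by the Paste project; PrefixDispatcher, make_text_app, and the mount points are hypothetical.

```python
def make_text_app(text):
    """Build a trivial WSGI application that always answers with text."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [text.encode('utf-8')]
    return app

class PrefixDispatcher(object):
    """Route each request to the application mounted at its URL prefix."""

    def __init__(self, mounts):
        # Try longer prefixes first so the most specific mount wins.
        self.mounts = sorted(mounts.items(),
                             key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + '/'):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME.
                child = dict(environ)
                child['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                child['PATH_INFO'] = path[len(prefix):]
                return app(child, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'No application is mounted at that URL']

site = PrefixDispatcher({'/blog': make_text_app('blog'),
                         '/shop': make_text_app('shop')})

def run(path):
    """Call the dispatcher the way a server would and return the result."""
    seen = {}
    def start_response(status, headers, exc_info=None):
        seen['status'] = status
    body = b''.join(site({'PATH_INFO': path}, start_response))
    return seen['status'], body

print(run('/blog/post/1'))
print(run('/other'))
```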
If each WSGI application on a website were to keep its own list of passwords and honor only its own session cookies, then users would have to log in again each time they crossed an application boundary. By delegating authentication to WSGI middleware, applications can be relieved even of the duty to provide their own login page; instead, the middleware asks a user who lacks a session cookie to log in; once a user is authenticated, the middleware can pass along the user's identity to the applications by putting the user's information in the environ argument. Both repoze.who and repoze.what can help site integrators assert site-wide control over users and their permissions.
Theming can be a problem when several small applications are combined to form a larger website. This is because each application typically has its own approach to theming. This has led to the development of two competing tools, xdv and Deliverance, that let you build a single HTML theme and then provide simple rules that pull text out of your back-end applications and drop it into your theme in the right places.
Debuggers can be created that call a WSGI application and, if an uncaught Python exception is raised, display an annotated traceback to support debugging. WebError actually provides the developer with a live, in-browser Python command line prompt for every level in a stack trace at which the developer can investigate a failure. Another popular tool is repoze.profile, which watches the application as it processes requests and produces a report on which functions are consuming the most CPU cycles.
If you are interested in what WSGI middleware is available, then you can visit this pair of sites to learn more:
http://wsgi.org/wsgi/Middleware_and_Utilities
http://repoze.org/repoze_components.html#middleware
Today there are at least three major competing approaches in the Python community for crafting modular components that can be used to build web sites:
The WSGI middleware approach thinks that code reuse can often best be achieved through a component stack, where each component uses WSGI to speak to the next. Here, all interaction has to somehow be made to fit the model of a dictionary of strings being handed down and then content being passed back up.
Everything built atop the Zope Toolkit uses formal Design Pattern concepts like interfaces and factories to let components discover one another and be configured for operation. Thanks to adapters, components can often be used with widgets that were not originally designed with a given type of component in mind.
Several web frameworks have tried to adopt conventions that would make it easy for third-party pieces of functionality to be added to an application. The Django community seems to have traveled the farthest in this direction, but it also looks as though it has encountered quite serious roadblocks in cases where a component needs to add its own tables to the database that have foreign-key relationships with user tables.
These examples illustrate an important fact: WSGI middleware is a good idea that has worked very well for a small class of problems where the idea of wrapping an application with concentric functionality makes solid sense. However, most web programmers seem to want to use more typical Python mechanisms like APIs, classes, and objects to combine their own code with existing components.
WSGI Middleware
Now we are going to talk about an entirely different discipline: web application development.
Network programmers think about things like sockets, port numbers, protocols, packet loss, latency, framing, and encodings. Although all of these concepts must also be in the back of a web developer's mind, her actual attention is focused on a set of technologies so intricate and fast-changing that the actual packets and latencies are recalled to mind only when they are causing trouble. The web developer needs to think instead about HTML, GET, POST, forms, REST, CSS, JavaScript, Ajax, APIs, sprites, compression, and emerging technologies like HTML5 and WebSocket. The web site exists in her mind primarily as a series of documents that users will traverse to accomplish goals.
Web frameworks exist to help programmers step back from the details of HTTP (which is, after all, an implementation detail most users never even become aware of) and to write code that focuses on the nouns of web design. bottle_app.py shows how even a very modest Python microframework can be used to reorient the attention of a web programmer.
You can install the framework bottle and run the listing once you have activated a virtual environment, like this:

root@erlerobot:~/Python_files# pip install bottle
root@erlerobot:~/Python_files# python bottle_app.py

The bottle_app.py:

import base64, bottle
bottle.debug(True)
app = bottle.Bottle()

# Each decorated function answers one URL; the view decorator renders
# the returned dictionary through bottle_template.html.
@app.route('/encode')
@bottle.view('bottle_template.html')
def encode():
    mystring = bottle.request.GET.get('mystring')
    if mystring is None:
        bottle.abort(400, 'This form requires a "mystring" parameter')
    return dict(mystring=mystring, myb=base64.b64encode(mystring))

@app.route('/')
@bottle.view('bottle_template.html')
def index():
    return dict(mystring=None)

bottle.run(app=app, host='localhost', port=8080)
The wsgi_app.py:

import cgi, base64
from wsgiref.simple_server import make_server

def page(content, *args):
    yield '<html><head><title>wsgi_app.py</title></head><body>'
    yield content % args
    yield '</body>'

def simple_app(environ, start_response):
    gohome = '<br><a href="/">Return to the home page</a>'
    q = cgi.parse_qs(environ['QUERY_STRING'])

    if environ['PATH_INFO'] == '/':

        if environ['REQUEST_METHOD'] != 'GET' or environ['QUERY_STRING']:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: the front page is not a form']

        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('Welcome! Enter a string: <form action="encode">'
                    '<input name="mystring"><input type="submit"></form>')

    elif environ['PATH_INFO'] == '/encode':

        if environ['REQUEST_METHOD'] != 'GET':
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form does not support POST parameters']

        if 'mystring' not in q or not q['mystring'][0]:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['Error: this form requires a "mystring" parameter']

        my = q['mystring'][0]
        start_response('200 OK', [('Content-Type', 'text/html')])
        return page('<tt>%s</tt> base64 encoded is: <tt>%s</tt>' + gohome,
                    cgi.escape(repr(my)), cgi.escape(base64.b64encode(my)))

    else:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return ['That URL is not valid']

print 'Listening on localhost:8000'
make_server('localhost', 8000, simple_app).serve_forever()

Python Web Frameworks
In wsgi_app.py the attention was on the single incoming HTTP request, and the branches in our logic explored all of the possible lifespans for that particular protocol request. bottle_app.py changes the focus to the pages that actually exist on the site and to giving each of these pages reasonable behaviors. The same tree of possibilities exists, but the tree exists implicitly thanks to the possible URLs defined in the code, not because the programmer has written a large if statement.
%#The page template that goes with bottle_app.py.
%#
<html><head><title>bottle_app.py</title></head>
<body>
%if mystring is None:
  Welcome! Enter a string:
  <form action="encode"><input name="mystring"><input type="submit"></form>
%else:
  <tt>{{mystring}}</tt> base64 encoded is: <tt>{{myb}}</tt><br>
  <a href="/">Return to the home page</a>
%end
</body>
It might seem merely a pleasant convenience that we can use the Bottle SimpleTemplate to insert our variables into a web page and know that they will be escaped correctly. But the truth is that templates serve, just like schemes for URL dispatch, to re-orient our attention: instead of the resulting web page existing in our minds as what will result when the strings in our program listing are finally concatenated, we get to lay out its HTML intact, in order, and in a file that can actually take an .html extension and be highlighted and indented as HTML in our editor. The Python program will no longer impede our relationship with our markup.
And full-fledged Python frameworks abstract away even more implementation details. A very important feature they typically provide is data abstraction: instead of talking to a database using its raw APIs, a programmer can define models, laying out the data fields so they are easy to instantiate, search, and modify. And some frameworks can provide entire RESTful APIs that allow creation, inspection, modification, and deletion with PUT, GET, POST, and DELETE. The programmer merely needs to define the structure of his data document, and then name the URL at which the tree of REST objects should be based.
When looking for a web framework, you will find that the various frameworks differ on a few major points. The upcoming sections will walk you through what these points are, and how they might affect your development experience.
The various Python web frameworks tend to handle URL dispatch quite differently.
Some small frameworks like Bottle and Flask let you create small applications by decorating a series of callables with URL patterns; small applications can then be combined later by placing them beneath one or more top-level applications.
Other frameworks, like Django, Pylons, and Werkzeug, encourage each application to define its URLs all in one place. This breaks your code into two levels, where URL dispatch happens in one location and rendering in another. This separation makes it easier to review all of the URLs that an application supports; it also means that you can attach code to new URLs without having to modify the functions themselves.
Another approach has you define controllers, which are classes that represent some point in the URL hierarchy (say, the path /cart) and then write methods on the controller class named view() and edit() if you want to support sub-pages named /cart/view and /cart/edit. CherryPy, TurboGears2, and Pylons (if you use controllers instead of Routes) all support this approach. While determining later what URLs are supported can mean traversing a maze of different connected classes, this approach does allow for dynamic, recursive URL spaces that exist only at runtime as classes hand off dispatch requests based on live data about the site structure.
A large community with its own conferences exists around the Zope framework.
The various mechanisms for URL dispatch can all be used to produce fairly clean design, and choosing from among them is largely a matter of taste.
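To make the idea of dispatch concrete, here is a toy dispatcher in plain Python 3. It does not reproduce the API of Bottle, Django, or any other framework; the names routes, route, and dispatch are invented for this sketch. The decorator shows the "decorate a callable with a URL pattern" style, while the routes list it fills is the kind of single table that the "all URLs in one place" style would have you write out by hand.

```python
import re

routes = []   # central table of (compiled pattern, view function) pairs


def route(pattern):
    """Register the decorated view under a URL pattern (regular expression)."""
    def register(view):
        routes.append((re.compile('^' + pattern + '$'), view))
        return view
    return register


@route(r'/')
def index():
    return 'home page'


@route(r'/cart/(?P<action>view|edit)')
def cart(action):
    return 'cart: ' + action


def dispatch(path):
    """Find the first view whose pattern matches the path and call it."""
    for pattern, view in routes:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments.
            return view(**match.groupdict())
    return '404 Not Found'
```

With these definitions, dispatch('/cart/edit') calls cart(action='edit'), and an unknown path falls through to the 404 string.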
URL Dispatch Techniques
Almost all web frameworks expect you to produce web pages by combining Python code called a view with an HTML template; you saw this approach in action in bottle_app.py. This approach has gained traction because of its eminent maintainability: building a dictionary of information is best performed in plain Python code, and the items fetched and arranged by the view can then easily be included by the template, so long as the template language supports basic actions like iteration and some form of expression evaluation. (A template is a document skeleton with placeholders that are filled in with data, which facilitates the development of web pages, letters, or other content.) It is one of the glories of Python that we use views and templates, and one of the shames of traditional PHP development that developers would freely intermix HTML and extensive PHP code to produce a single, unified mess.
Views can also become more testable when their only job is to generate a dictionary of data. A good framework will let you write tests that simply check the raw data returned by the function instead of making you peek repeatedly into fully rendered templates to see if the view corralled its data correctly.
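The division of labor between view and template, and the testing benefit it brings, can be sketched without any framework at all. This Python 3 example uses string.Template from the Standard Library as a stand-in for a real template language; encode_view, render, and PAGE are hypothetical names, and unlike Bottle's SimpleTemplate, string.Template does no escaping, so the sketch escapes the values itself.

```python
import base64
import html
from string import Template

# The "template": markup with placeholders, kept apart from the logic.
PAGE = Template('<tt>$mystring</tt> base64 encoded is: <tt>$myb</tt>')


def encode_view(mystring):
    """The "view": plain Python that only builds a dictionary of data."""
    encoded = base64.b64encode(mystring.encode('utf-8')).decode('ascii')
    return {'mystring': mystring, 'myb': encoded}


def render(view_data):
    """Combine view data with the template, escaping every value for HTML."""
    safe = {key: html.escape(value) for key, value in view_data.items()}
    return PAGE.substitute(safe)
```

A test can now check encode_view('hello') directly against the expected dictionary, and only a separate, much smaller test needs to look at rendered HTML.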
There seem to be two major differences of opinion among the designers and users of the various template languages about what constitutes the best way to use templates:
Should templates be valid HTML with iteration and expressions hidden in element attributes? Or should the template language use its own style of markup that festoons and wraps the literal HTML of the web page? While the former can let the developer run HTML validation against template files before they are ever rendered and be assured that rendering will not change the validator's verdict, most developers seem to find the latter approach much easier to read and maintain.
Should templates allow arbitrary Python expressions in template code, or lock down the available options to primitive operations like dictionary get-item and object get-attribute? Many popular frameworks choose the latter option, requiring even lazy programmers to push complex operations into their Python code “where it belongs.” But several template languages reason that, if Python programmers do so well without type checking, then maybe they should also be trusted with the choice of which expressions belong in the view and which in the template.
Since many Python frameworks let you plug in your template language of choice, and only a few of them lock you down to one option, you might find that you can pair your favorite approaches.
Templates
A fun way to demonstrate that Python comes with “batteries included” is to enter a directory on your system and run the SimpleHTTPServer Standard Library module as a stand-alone program:

root@erlerobot:~/Python_files# python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

If you direct your browser to localhost:8000, you will see the contents of the current directory displayed for browsing, much like the listings provided by Apache when a site leaves a directory browsable. Documents and images will load in your web browser when selected, based on the content types chosen through the best guesses of the mimetypes Standard Library module. The mimetypes module converts between a filename or URL and the MIME type associated with the filename extension. Conversions are provided from filename to MIME type and from MIME type to filename extension; encodings are not supported for the latter conversion.
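The guessing that mimetypes performs is easy to try on its own. The short Python 3 sketch below wraps mimetypes.guess_type() in a helper whose name, describe, is made up for this example; the filenames are arbitrary and need not exist on disk, because only the extension is examined.

```python
import mimetypes


def describe(filename):
    """Return the (content type, encoding) guessed from a filename."""
    content_type, encoding = mimetypes.guess_type(filename)
    return content_type, encoding


print(describe('report.pdf'))      # ('application/pdf', None)
print(describe('archive.tar.gz'))  # ('application/x-tar', 'gzip')
print(describe('no_extension'))    # (None, None)
```

Note how a compressed file is reported as its underlying type plus an encoding; the attachment() function later in this chapter treats any such encoding as a reason to fall back to application/octet-stream.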
Today, we use namespaces, callables, and duck-typed objects to provide much cleaner forms of extensibility than the subclass-and-override style that these older server modules rely on. For example, today an object like start_response is provided as an argument (dependency injection), and the WSGI standard specifies its behavior rather than its inheritance tree (duck typing). The Standard Library includes two other HTTP servers:
CGIHTTPServer takes the SimpleHTTPServer and, instead of just serving static files off of the disk, adds the ability to run CGI scripts.
SimpleXMLRPCServer and DocXMLRPCServer each provide a server endpoint against which client programs can make XML-RPC remote procedure calls. This protocol uses XML files submitted through HTTP requests.
Note that none of the preceding servers is typically intended for production use; instead, they are useful for small internal tasks for which you just need a quick HTTP endpoint to be used by other services internal to a system or subnet. Most Python web frameworks will also provide a way to run your application from the command line for debugging. These pure-Python web servers can be very useful if you are writing an application that users will be installing locally, and you want to provide a web interface without having to ship a separate web server like Apache or nginx.
Pure-Python Web Servers
When the first experiments were taking place with dynamically generated web pages, a calling convention was necessary, and so the Common Gateway Interface (CGI) was defined. It allowed programs in all sorts of languages (C, the various Unix shells, awk, Perl, Python, PHP, and so forth) to be partners in generating dynamic content.
Today, the design of CGI is considered something of a disaster. Running a new process from scratch is just about the most expensive single operation that you can perform on a modern operating system, and requiring that this take place for every single incoming HTTP request is simply madness. You should avoid CGI under all circumstances. But it is possible you might someday have to connect Python code to a legacy HTTP server that does not support at least FastCGI or SCGI, so I will outline CGI's essential features. Three standard lines of communication that already existed between parent and child processes on Unix systems were used by web servers when invoking a CGI script:
The Unix environment (a list of strings provided to each process upon its invocation that traditionally includes things like TZ=EST for the time zone and COLUMNS=80 for the user's screen width) was instead stuffed full of information about the HTTP request that the CGI script was being called upon to answer. The various parts of the request's URL; the user agent string; basic information about the web server; and even a cookie could be included in the list of name=value pairs.
The standard input to the script could be read to end-of-file to receive whatever data had been submitted in the body of the HTTP request using POST. Whether a request was indeed a POST could be checked by examining the REQUEST_METHOD environment variable.
Finally, the script would produce content, which it did by writing HTTP headers, a blank line, and then a response body to its standard output. To be a valid response, a Content-Type header was generally necessary at a minimum, though in its absence some web servers would instead accept a Location header as a signal that they should send a redirect.
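Purely to illustrate these three channels, and not as a recommendation to write CGI scripts, the Python 3 sketch below takes the environment, standard input, and standard output as parameters so that it can be tried without a web server. The function name run_cgi and the shape of its response are invented for this example; a real script would be invoked as run_cgi(os.environ, sys.stdin, sys.stdout).

```python
import io


def run_cgi(environ, stdin, stdout):
    """Answer one request using the three CGI lines of communication."""
    # 1. The environment describes the HTTP request.
    method = environ.get('REQUEST_METHOD', 'GET')
    if method == 'POST':
        # 2. Standard input carries the body of a POST request.
        submitted = stdin.read()
    else:
        submitted = environ.get('QUERY_STRING', '')
    # 3. Standard output receives headers, a blank line, then the body.
    stdout.write('Content-Type: text/plain\r\n')
    stdout.write('\r\n')
    stdout.write('method=%s data=%s\n' % (method, submitted))


# Try it in memory, with file-like objects standing in for the pipes.
fake_stdout = io.StringIO()
run_cgi({'REQUEST_METHOD': 'GET', 'QUERY_STRING': 'mystring=hello'},
        io.StringIO(''), fake_stdout)
print(fake_stdout.getvalue())
```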
Should you ever need to run Python behind an HTTP server that only supports CGI, then I recommend that you use the CGIHandler class from the wsgiref.handlers module of the Standard Library (this is useful when you have a WSGI application and want to run it as a CGI script). This lets you use a normal Python web framework to write your service (or, alternatively, roll up your sleeves and write a raw WSGI application) and then offer the HTTP server a CGI script, as shown here:

from wsgiref.handlers import CGIHandler
from mywsgiapp import MyWSGIApp    # placeholder: import your own WSGI application here

my_wsgi_app = MyWSGIApp()    # configuration necessary here?
CGIHandler().run(my_wsgi_app)

Be sure to check whether your web framework of choice already provides a way to invoke it as a CGI script; if so, your web framework will already know all of the steps involved in loading and configuring your application.
Common Gateway Interface (CGI)
As it became clear that CGI was both inefficient and inflexible (CGI scripts could not flexibly set the HTTP return code, for example), it became fashionable to start embedding programming languages directly in web servers.
Back in the early days, embedding Python was also possible, through a somewhat different approach that actually made Python an extension language for much of the internals of Apache itself. The module that supported this was mod_python, and for years it was by far the most popular way to connect Python to the World Wide Web. The mod_python Apache module put a Python interpreter inside of every worker process spawned by Apache. Programmers could arrange for their Python code to be invoked by writing directives into their Apache configuration.
Today, mod_python is mainly of historical interest. I have outlined its features here, not only because you might be called upon to maintain or upgrade a service that is still running on mod_python, but because it still provides unique Apache integration points where Python cannot get involved in any other way. If you run into either situation, you can find its documentation at http://modpython.org/
mod_python
Here, we will learn about the actual payload that is carried by all of the protocols involved as a message is transmitted and received (authenticated SMTP, POP, IMAP): that is, the format of e-mail messages themselves.
E-mail Composition and Decoding
Each traditional e-mail message contains two distinct parts: headers and the body. Here is a very simple e-mail message so that you can see what the two sections look like:

From: Jane Smith <jsmith@example.com>
To: Alan Jones <ajones@example.com>
Subject: Testing This E-Mail Thing

Hello Alan,
This is just a test message. Thanks.

The first section is called the headers, which contain all of the metadata about the message, like the sender, the destination, and the subject of the message: everything except the text of the message itself. The body then follows and contains the message text itself. There are three basic rules of Internet e-mail formatting:
At least during actual transmission, every line of an e-mail message should be terminated by the two-character sequence carriage return, newline, represented in Python by '\r\n'. E-mail clients running on your laptop or desktop machine tend to make different decisions about whether to store messages in this format, or replace these two-character line endings with whatever ending is native to your operating system.
The first few lines of an e-mail are headers, which consist of a header name, a colon, a space, and a value. A header can be several lines long by indenting the second and following lines from the left margin as a signal that they belong to the header above them.
The headers end with a blank line (that is, by two line endings back-to-back without intervening text) and then the message body is everything else that follows. The body is also sometimes called the payload.
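The three rules are simple enough to apply by hand, which the following Python 3 sketch does purely for illustration; in real code you would let the email package do this work. The function name split_message and the sample message (a variation on the one above, with its Subject folded onto two lines) are made up for this example.

```python
RAW = ('From: Jane Smith <jsmith@example.com>\r\n'
       'To: Alan Jones <ajones@example.com>\r\n'
       'Subject: Testing This\r\n'
       ' E-Mail Thing\r\n'          # indented: continues the Subject header
       '\r\n'                       # blank line: the headers end here
       'Hello Alan,\r\n'
       'This is just a test message. Thanks.\r\n')


def split_message(raw):
    """Split a raw message into a list of (name, value) headers and a body."""
    # Rule 3: the first blank line separates the headers from the body.
    head, _, body = raw.partition('\r\n\r\n')
    headers = []
    # Rule 1: lines are terminated by carriage return plus newline.
    for line in head.split('\r\n'):
        if line[:1] in (' ', '\t') and headers:
            # Rule 2: an indented line belongs to the header above it.
            name, value = headers[-1]
            headers[-1] = (name, value + ' ' + line.strip())
        else:
            name, _, value = line.partition(':')
            headers.append((name, value.strip()))
    return headers, body


headers, body = split_message(RAW)
for name, value in headers:
    print(name, '=>', value)
```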
The headers are there for the benefit of the person who reads the e-mail message, and the most important headers are these:
From: This identifies the message sender. It can also, in the absence of a Reply-To header, be used as the destination when the reader clicks the e-mail client’s “Reply” button.
Reply-To: This sets an alternative address for replies, in case they should go to someone besides the sender named in the From header.
Subject: This is a short several-word description of the e-mail’s purpose, used by most clients when displaying whole mailboxes full of e-mail messages.
Date: This is a header that can be used to sort a mailbox in the order in which e-mails arrived.
Message-ID and In-Reply-To: Each ID uniquely identifies a message, and these IDs are then used in e-mail replies to specify exactly which message was being replied to. This can help sophisticated mail readers perform “threading,” arranging messages so that replies are grouped directly beneath the messages to which they reply.
E-mail Messages
How can we generate a traditional e-mail in Python without having to implement the formatting details ourselves? The answer is to use the modules within the powerful email package. The email package is a library for managing e-mail messages, including MIME and other RFC 2822-based message documents.
As our first example, trad_gen_simple.py shows a program that generates a simple message. Note that when you generate messages this way, manually setting the payload with the Message class, you should limit yourself to using plain 7-bit ASCII text.

from email.message import Message

text = """Hello,

This is a test message.

-- Anonymous"""

msg = Message()
msg['To'] = 'recipient@example.com'
msg['From'] = 'Test Sender <sender@example.com>'
msg['Subject'] = 'Test Message'
msg.set_payload(text)
print msg.as_string()

The program is simple. It creates a Message object, sets the headers and body, and prints the result. When you run this program, you will get a nicely formatted message with proper headers:

root@erlerobot:~/Python_files# python trad_gen_simple.py
To: recipient@example.com
From: Test Sender <sender@example.com>
Subject: Test Message

Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#
While technically correct, this message is actually a bit deficient when it comes to providing enough headers to really function in the modern world. For one thing, most e-mails should have a Date header, in a format specific to e-mail messages. Python provides an email.utils.formatdate() routine that will generate dates in the right format. You should also add a Message-ID header to messages. This header should be generated in such a way that no other e-mail, anywhere in history, will ever have the same Message-ID. This might sound difficult, but Python provides a function to help do that as well: email.utils.make_msgid(). So take a look at trad_gen_newhdrs.py, which fleshes out our first sample program into a more complete example that sets these additional headers.

import email.utils
from email.message import Message

message = """Hello,

This is a test message.

-- Anonymous"""

msg = Message()
msg['To'] = 'recipient@example.com'
msg['From'] = 'Test Sender <sender@example.com>'
msg['Subject'] = 'Test Message'
msg['Date'] = email.utils.formatdate(localtime=1)
msg['Message-ID'] = email.utils.make_msgid()
msg.set_payload(message)
print msg.as_string()

Composing Traditional Messages
If you run the program, you will notice two new headers in the output.

root@erlerobot:~/Python_files# python trad_gen_newhdrs.py
To: recipient@example.com
From: Test Sender <sender@example.com>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:31:50 +0200
Message-ID: <20140714123150.987.14344@root-erlerobot.local>

Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#
What happens when you receive an incoming message as a raw block of text and want to look inside? Well, the email module also provides support for parsing e-mail messages, re-constructing the same Message object that would have been used to create the message in the first place. (Of course, it does not matter whether the e-mail you are parsing was originally created in Python through the Message class, or whether some other e-mail program created it; the format is standard, so Python’s parsing should work either way.) After parsing the message, you can easily access individual headers and the body of the message using the same conventions as you used to create messages: headers look like the dictionary key-values of the Message, and the body can be fetched with a function.
A simple example of a parser is shown in trad_parse.py. All of the actual parsing takes place in the one-line call to message_from_file(); everything else in the program listing is simply an illustration of how a Message object can be mined for headers and data.

import email

banner = '-' * 48
popular_headers = ('From', 'To', 'Subject', 'Date')
msg = email.message_from_file(open('message.txt'))
headers = sorted(msg.keys())

# First the less common headers, then the popular ones, then the body.
print banner
for header in headers:
    if header not in popular_headers:
        print header + ':', msg[header]
print banner
for header in headers:
    if header in popular_headers:
        print header + ':', msg[header]
print banner
if msg.is_multipart():
    print "This program cannot handle MIME multipart messages."
else:
    print msg.get_payload()

The output should look like this:

root@erlerobot:~/Python_files# python trad_parse.py
------------------------------------------------
Message-ID: <20140714123150.987.14344@root-erlerobot.local>
------------------------------------------------
Date: Mon, 14 Jul 2014 14:33:54 +0200
From: Test Sender <sender@example.com>
Subject: Test Message, Chapter 12
To: recipient@example.com
------------------------------------------------
Hello,

This is a test message.

-- Anonymous
root@erlerobot:~/Python_files#

As you can see, the Python Standard Library makes it quite easy both to create and then to parse standard Internet e-mail messages. Note that the email package also offers a message_from_string() function that, instead of taking a file, can simply be handed the string containing an e-mail message.
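A short Python 3 sketch of message_from_string() follows. The sample text and the helper name summarize are invented for illustration; the message is the same kind of traditional, non-MIME message generated earlier.

```python
import email

RAW = ('To: recipient@example.com\n'
       'From: Test Sender <sender@example.com>\n'
       'Subject: Test Message\n'
       '\n'
       'Hello,\n'
       'This is a test message.\n')


def summarize(raw):
    """Parse a message held in a string; return subject, sender, and body."""
    msg = email.message_from_string(raw)
    if msg.is_multipart():
        body = None     # this sketch only handles simple messages
    else:
        body = msg.get_payload()
    return msg['Subject'], msg['From'], body


subject, sender, body = summarize(RAW)
print(subject)
print(sender)
print(body)
```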
Parsing Traditional Messages
The email package provides two functions that work together as a team to help you parse the Date field of e-mail messages, whose format you can see in the preceding example: a date and time, followed by a time zone expressed as hours and minutes (two digits each) relative to UTC. Countries in the eastern hemisphere experience sunrise early, so their time zones are expressed as positive numbers, like the following:

Date: Sun, 27 May 2007 11:34:43 +1000

Those of us in the western hemisphere have to wait longer for the sun to rise, so our time zones lag behind; Eastern Daylight Time, for example, runs four hours behind UTC:

Date: Sun, 27 May 2007 08:36:37 -0400

To figure out what moment of time is really meant by a Date header, simply call two functions in a row:
Call parsedate_tz() to extract the time and time zone.
Use mktime_tz() to add or subtract the time zone. The result will be a standard Unix timestamp.
For example, consider the two Date headers shown previously. If you just compared their bare times, the first date looks later: 11:34 a.m. is, after all, after 8:36 a.m. But the second time is in fact the much later one, because it is expressed in a time zone that is so much farther west. We can test this by using the functions previously named. First, turn the top date into a timestamp:

>>> from email.utils import parsedate_tz, mktime_tz
>>> timetuple1 = parsedate_tz('Sun, 27 May 2007 11:34:43 +1000')
>>> print timetuple1
(2007, 5, 27, 11, 34, 43, 0, 1, -1, 36000)
>>> timestamp1 = mktime_tz(timetuple1)
>>> print timestamp1
1180229683.0

Then turn the second date into a timestamp as well, and the dates can be compared directly:

>>> timetuple2 = parsedate_tz('Sun, 27 May 2007 08:36:37 -0400')
>>> timestamp2 = mktime_tz(timetuple2)
>>> print timestamp2
1180269397.0
>>> timestamp1 < timestamp2
True

If you have never seen timestamp values before, they represent time very plainly: as the number of seconds that have passed since the beginning of 1970. You will find functions in Python’s old time module for doing calculations with timestamps, and you will also find that you can turn them into normal Python datetime objects quite easily. Note that datetime.fromtimestamp() converts to the local time zone of the machine it runs on, so the result below is what a machine set to U.S. Eastern time would print:

>>> from datetime import datetime
>>> datetime.fromtimestamp(timestamp2)
datetime.datetime(2007, 5, 27, 8, 36, 37)

In the real world, many poorly written e-mail clients generate their Date headers incorrectly. While the routines previously shown do try to be flexible when confronted with a malformed Date, they sometimes can simply make no sense of it, and parsedate_tz() has to give up and return None. So when checking a real-world e-mail message for a date, remember to do it in three steps: first check whether a Date header is present at all; then be prepared for None to be returned when you parse it; and finally apply the time zone conversion to get a real timestamp that you can work with.
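Those three steps fit naturally into one small helper. The Python 3 sketch below is one possible arrangement, not the only one; the function name message_timestamp is made up, and returning None for both "no header" and "unparseable header" is a simplification chosen for this example.

```python
import email
from email.utils import parsedate_tz, mktime_tz


def message_timestamp(msg):
    """Return the Unix timestamp of a message's Date header, or None."""
    # Step 1: is there a Date header at all?
    header = msg['Date']
    if header is None:
        return None
    # Step 2: parsing may fail on a malformed date and give back None.
    timetuple = parsedate_tz(header)
    if timetuple is None:
        return None
    # Step 3: apply the time zone offset to get a real timestamp.
    return mktime_tz(timetuple)


good = email.message_from_string('Date: Sun, 27 May 2007 08:36:37 -0400\n\nHi\n')
bad = email.message_from_string('Date: garbage\n\nHi\n')
missing = email.message_from_string('Subject: no date here\n\nHi\n')
print(message_timestamp(good))     # 1180269397
print(message_timestamp(bad))      # None
print(message_timestamp(missing))  # None
```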
Parsing Dates
So far we have discussed e-mail messages that are plain text: the characters after the blank line that ends the headers are to be presented literally to the user as the content of the e-mail message. Today, only a fraction of the messages sent across the Internet are so simple.
The Multipurpose Internet Mail Extensions (MIME) standard is a set of rules for encoding data, rather than simple plain text, inside e-mails. MIME provides a system for things like attachments, alternative message formats, and text that is stored in alternate encodings. Because MIME messages have to be transmitted and delivered through many of the same old e-mail services that were originally designed to handle plain-text e-mails, MIME operates by adding headers to an e-mail message and then giving it content that looks like plain text to the machine but that can actually be decoded by an e-mail client into HTML, images, or attachments.
The most important features of MIME are, first, that MIME supports multipart messages. A normal e-mail message, as we have seen, contains some headers and a body. But a MIME message can squeeze several different parts into the message body. These parts might be things to be presented to the user in order, like a plain-text message, an image file attachment, and then a PDF attachment. Or, they could be alternative multiparts, which represent the same content in different ways, usually by encoding a message in both plain text and HTML. Second, MIME supports different transfer encodings. Traditional e-mail messages are limited to 7-bit data, which renders them unusable for international alphabets. MIME has several ways of transforming 8-bit data so it fits within the confines of e-mail systems:
The “plain” encoding is the same as you would see in traditional messages, and passes 7-bit text unmodified.
“Base-64” is a way of encoding raw binary data that turns it into normal alphanumeric data. Most of the attachments you send and receive, such as images, PDFs, and ZIP files, are encoded with base-64.
“Quoted-printable” is a hybrid that tries to leave plain English text alone so that it remains readable in old mail readers, while also letting unusual characters be included as well.
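The difference between the last two encodings is easy to see by applying them directly with the Standard Library's base64 and quopri modules. This Python 3 sketch uses made-up helper names, and the Latin-1 byte for "é" is just one convenient example of an 8-bit character.

```python
import base64
import quopri


def to_base64(raw_bytes):
    """Encode arbitrary bytes as ASCII letters, digits, '+', '/' and '='."""
    return base64.b64encode(raw_bytes)


def to_quoted_printable(raw_bytes):
    """Leave plain ASCII text alone; escape other bytes as =XX."""
    return quopri.encodestring(raw_bytes)


print(to_base64(b'hello'))                            # b'aGVsbG8='
print(to_quoted_printable(b'plain text'))             # b'plain text'
print(to_quoted_printable('café'.encode('latin-1')))  # b'caf=E9'
```

Base-64 makes even ordinary text unreadable but handles any binary data, while quoted-printable keeps mostly-ASCII text legible, which is why it is the usual choice for message text rather than for attachments.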
MIME also provides content types, which tell the recipient what kind of content is present. For instance, a content type of text/plain indicates a plain-text message, while image/jpeg is a JPEG image.
You will recall that MIME messages must work within the limited plain-text framework of traditional e-mail messages. To do that, the MIME specification defines some headers and some rules about formatting the body text.
For non-multipart messages that are a single block of data, MIME simply adds some headers to specify what kind of content the e-mail contains, along with its character set. But the body of the message is still a single piece, although it might be encoded with one of the schemes already described.
For multipart messages, things get trickier: MIME places a special marker in the e-mail body everywhere that it needs to separate one part from the next. Each part can then have its own limited set of headers, which occur at the start of the part, followed by data. By convention, the most basic content in an e-mail comes first (like a plain-text message, if one has been included), so that people without MIME-aware readers will see the plain text immediately without having to scroll down through dozens or hundreds of pages of MIME data.
Understanding MIME
How MIME works
We will start by looking at how to create MIME messages. To compose a message with attachments, you will generally follow these steps:
1. Create a MIMEMultipart object and set its message headers.
2. Create a MIMEText object with the message body text and attach it to the MIMEMultipart object.
3. Create appropriate MIME objects for each attachment and attach them to the MIMEMultipart object.
4. Finally, call as_string() on the MIMEMultipart object to write out the resulting message.
Take a look at mime_gen_basic.py for a program that implements this algorithm. You can see that parts of the code look similar to logic that we used to generate a traditional e-mail. After creating the message and its text body, the program loops over each file given on the command line and attaches it to the growing message.
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email import utils, encoders
import mimetypes, sys

def attachment(filename):
    fd = open(filename, 'rb')
    mimetype, mimeencoding = mimetypes.guess_type(filename)
    # Unknown or specially encoded files become a generic stream of bytes.
    if mimeencoding or (mimetype is None):
        mimetype = 'application/octet-stream'
    maintype, subtype = mimetype.split('/')
    if maintype == 'text':
        retval = MIMEText(fd.read(), _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(fd.read())
        encoders.encode_base64(retval)
    retval.add_header('Content-Disposition', 'attachment',
                      filename=filename)
    fd.close()
    return retval

message = """Hello,

This is a test message.

-- Anonymous"""

msg = MIMEMultipart()
msg['To'] = 'recipient@example.com'
msg['From'] = 'Test Sender <sender@example.com>'
msg['Subject'] = 'Test Message'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

body = MIMEText(message, _subtype='plain')
msg.attach(body)
for filename in sys.argv[1:]:
    msg.attach(attachment(filename))
print msg.as_string()
Theattachment()functiondoestheworkofcreatingamessageattachmentobject.First,itdeterminestheMIMEtypeofeachfilebyusingPython’sbuilt-inmimetypesmodule.Ifthetypecan’tbedetermined,oritwillneedaspecialkindofencoding,thenatypeisdeclaredthatpromisesonlythatthedataismadeofa“streamofoctets”(sequenceofbytes)butwithoutanyfurtherpromiseaboutwhattheymean.IfthefileisatextdocumentwhoseMIMEtypestartswithtext/,aMIMETextobjectiscreatedtohandleit;otherwise,aMIMEBase)genericobjectiscreated.Inthelattercase,thecontentsareassumedtobebinary,sotheyareencodedwithbase-64.Finally,anappropriateContent-DispositionheaderisaddedtothatsectionoftheMIMEfilesothatmailreaderswillknowthattheyaredealingwithanattachment.
The result of running this program is shown below:
Composing MIME Attachments
root@erlerobot:~/Python_files# echo "This is a test" > test.txt
root@erlerobot:~/Python_files# gzip < test.txt > test.txt.gz
root@erlerobot:~/Python_files# python mime_gen_basic.py test.txt test.txt.gz
Content-Type: multipart/mixed; boundary="===============1623374356=="
MIME-Version: 1.0
To: recipient@example.com
From: Test Sender <sender@example.com>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:36:07 +0200
Message-ID: <20140714123150.987.14344@root-erlerobot.local>

--===============1623374356==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Hello,

This is a test message.

-- Anonymous
--===============1623374356==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="test.txt"

This is a test

--===============1623374356==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.txt.gz"

H4sIAP3o2D8AAwvJyCxWAKJEhZLU4hIuAIwtwPoPAAAA
--===============1623374356==--
The message starts off looking quite similar to the traditional ones we created earlier; you can see familiar headers like To, From, and Subject just like before. Note the Content-Type line, however: it indicates multipart/mixed. That tells the mail reader that the body of the message contains multiple MIME parts, and that the string containing equals signs will be the separator between them. Next comes the message’s first part. Notice that it has its own Content-Type header! The second part looks similar to the first, but has an additional Content-Disposition header; this will signal most e-mail readers that the part should be displayed as a file that the user can save rather than being immediately displayed to the screen. Finally comes the part containing the binary file, encoded with base-64, which makes it not directly readable.
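The structure described above can also be checked programmatically. The following is a minimal sketch, written for Python 3 (the listings in this chapter use Python 2 syntax); the helper names build_message() and describe_parts() are hypothetical and are not part of mime_gen_basic.py. It builds a small multipart/mixed message and reports the content type, disposition, and transfer encoding of each leaf part.

```python
# Python 3 sketch; helper names are illustrative assumptions.
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(body_text, attachment_name, attachment_bytes):
    """Return a multipart/mixed message with one binary attachment."""
    msg = MIMEMultipart()  # the default subtype is "mixed"
    msg['To'] = 'recipient@example.com'
    msg['From'] = 'Test Sender <sender@example.com>'
    msg['Subject'] = 'Test Message'
    msg.attach(MIMEText(body_text, _subtype='plain'))

    part = MIMEBase('application', 'octet-stream')
    part.set_payload(attachment_bytes)
    encoders.encode_base64(part)  # also sets Content-Transfer-Encoding
    part.add_header('Content-Disposition', 'attachment',
                    filename=attachment_name)
    msg.attach(part)
    return msg


def describe_parts(msg):
    """List (content type, disposition, transfer encoding) per leaf part."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        rows.append((part.get_content_type(),
                     part.get_content_disposition(),
                     part['Content-Transfer-Encoding']))
    return rows


message = build_message('Hello,\n\nThis is a test message.\n',
                        'test.txt.gz', b'\x1f\x8b\x08\x00 example bytes')
for row in describe_parts(message):
    print(row)
```

The body part reports no disposition, while the attachment reports "attachment" and a base64 transfer encoding, which matches the layout of the output shown earlier.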
MIME “alternative” parts let you generate multiple versions of a single document. The user’s mail reader will then automatically decide which one to display, depending on which content type it likes best; some mail readers might even show the user radio buttons, or a menu, and let them choose. The process of creating alternatives is similar to the process for attachments, and is illustrated in mime_gen_alt.py:
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email import utils, encoders

def alternative(data, contenttype):
    maintype, subtype = contenttype.split('/')
    if maintype == 'text':
        retval = MIMEText(data, _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(data)
        encoders.encode_base64(retval)
    return retval

messagetext = """Hello,

This is a *great* test message.

-- Anonymous"""
messagehtml = """Hello,<P>
This is a <B>great</B> test message from Chapter 12. I hope you enjoy
it!<P>
-- <I>Anonymous</I>"""

msg = MIMEMultipart('alternative')
msg['To'] = 'recipient@example.com'
msg['From'] = 'Test Sender <sender@example.com>'
msg['Subject'] = 'Test Message, Chapter 12'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

msg.attach(alternative(messagetext, 'text/plain'))
msg.attach(alternative(messagehtml, 'text/html'))
print msg.as_string()
Notice the differences between an alternative message and a message with attachments! With the alternative message, no Content-Disposition header is inserted. Also, the MIMEMultipart object is passed the alternative subtype to tell the mail reader that all objects in this multipart are alternative views of the same thing. Note again that it is always most polite to include the plain-text object first for people with ancient or incapable mail readers, which simply show them the entire message as text.
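To make the reader's side of this concrete, here is a minimal Python 3 sketch of how a mail reader might choose among alternatives. The function names build_alternative() and pick_preferred() and the preference list are assumptions for illustration; real mail readers apply their own rules.

```python
# Python 3 sketch; function names and preference order are assumptions.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative(plain_text, html_text):
    """Build a multipart/alternative message, plain text first."""
    msg = MIMEMultipart('alternative')
    msg.attach(MIMEText(plain_text, _subtype='plain'))
    msg.attach(MIMEText(html_text, _subtype='html'))
    return msg


def pick_preferred(msg, preferences):
    """Return the first part whose type appears earliest in preferences."""
    parts = [p for p in msg.walk() if not p.is_multipart()]
    for content_type in preferences:
        for part in parts:
            if part.get_content_type() == content_type:
                return part
    return parts[0] if parts else None  # fall back to the first part


msg = build_alternative('This is a *great* test message.',
                        'This is a <B>great</B> test message.')
chosen = pick_preferred(msg, ['text/html', 'text/plain'])
print(chosen.get_content_type())
```

A reader that cannot render HTML would simply pass a preference list containing only text/plain and receive the first part.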
MIME Alternative Parts
Although you have seen how MIME can encode message body parts with base-64 to allow 8-bit data to pass through, that does not solve the problem of special characters in headers. For instance, if your name was Michael Müller (with an umlaut over the “u”), you would have trouble representing your name accurately in your own alphabet. The “u” would come out bare. Therefore, MIME provides a way to encode data in headers. Take a look at mime_headers.py for how to do it in Python.
from email.mime.text import MIMEText
from email.header import Header

message = """Hello,

This is a test message.

-- Anonymous"""

msg = MIMEText(message)
msg['To'] = 'recipient@example.com'
fromhdr = Header()
fromhdr.append(u"Michael M\xfcller")
fromhdr.append('<mmueller@example.com>')
msg['From'] = fromhdr
msg['Subject'] = 'Test Message'
print msg.as_string()
The code '\xfc' in the Unicode string represents the character “ü” (strings in Python source files that are prefixed with u can contain arbitrary Unicode characters, rather than being restricted to characters whose value is between 0 and 255).
root@erlerobot:~/Python_files# python mime_headers.py
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
To: recipient@example.com
From: =?iso-8859-1?q?Michael_M=FCller?= <mmueller@example.com>
Subject: Test Message
Date: Mon, 14 Jul 2014 14:46:33 +0200
Message-ID: <20140714123150.987.14344@root-erlerobot.local>

Hello,

This is a test message.

-- Anonymous
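For readers on Python 3, where every string is already Unicode, the same header can be built as in the sketch below. Passing the charset 'iso-8859-1' explicitly is an assumption made here so that the result resembles the output above; without it, Python 3 would typically fall back to UTF-8 for the non-ASCII chunk.

```python
# Python 3 sketch; the explicit 'iso-8859-1' charset is an assumption.
from email.header import Header
from email.mime.text import MIMEText

msg = MIMEText('Hello,\n\nThis is a test message.\n')
msg['To'] = 'recipient@example.com'

fromhdr = Header()
fromhdr.append('Michael M\xfcller', 'iso-8859-1')  # encoded chunk
fromhdr.append('<mmueller@example.com>')           # plain ASCII chunk
msg['From'] = fromhdr
msg['Subject'] = 'Test Message'

encoded_from = fromhdr.encode()  # the ASCII-only form sent on the wire
print(encoded_from)
```

The encoded value contains only ASCII characters, which is what allows it to travel safely in a message header.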
Composing Non-English Headers
Now that you know how to generate a message with alternatives and one with attachments, you may be wondering how to do both. To do that, you create a standard multipart for the main message. Then you create a multipart/alternative inside that for your body text, and attach your message formats to it. Finally, you attach the various files. Take a look at mime_gen_both.py for the complete solution.
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import utils, encoders
import mimetypes, sys

def genpart(data, contenttype):
    maintype, subtype = contenttype.split('/')
    if maintype == 'text':
        retval = MIMEText(data, _subtype=subtype)
    else:
        retval = MIMEBase(maintype, subtype)
        retval.set_payload(data)
        encoders.encode_base64(retval)
    return retval

def attachment(filename):
    fd = open(filename, 'rb')
    mimetype, mimeencoding = mimetypes.guess_type(filename)
    if mimeencoding or (mimetype is None):
        mimetype = 'application/octet-stream'
    retval = genpart(fd.read(), mimetype)
    retval.add_header('Content-Disposition', 'attachment',
                      filename=filename)
    fd.close()
    return retval

messagetext = """Hello,

This is a *great* test message from Chapter 12. I hope you enjoy it!

-- Anonymous"""
messagehtml = """Hello,<P>
This is a <B>great</B> test message<P>
-- <I>Anonymous</I>"""

msg = MIMEMultipart()
msg['To'] = 'recipient@example.com'
msg['From'] = 'Test Sender <sender@example.com>'
msg['Subject'] = 'Test Message'
msg['Date'] = utils.formatdate(localtime=1)
msg['Message-ID'] = utils.make_msgid()

body = MIMEMultipart('alternative')
body.attach(genpart(messagetext, 'text/plain'))
body.attach(genpart(messagehtml, 'text/html'))
msg.attach(body)

for filename in sys.argv[1:]:
    msg.attach(attachment(filename))
print msg.as_string()
Composing Nested Multiparts
Python’s email module can read a message from a file or a string, and generate the same kind of in-memory object tree that we were generating ourselves in the aforementioned listings. To understand the e-mail’s content, all you have to do is step through its structure. An example is shown in mime_structure.py:
import sys, email

def printmsg(msg, level=0):
    prefix = "|  " * level
    prefix2 = prefix + "|"
    print prefix + "+ Message Headers:"
    for header, value in msg.items():
        print prefix2, header + ":", value
    if msg.is_multipart():
        for item in msg.get_payload():
            printmsg(item, level + 1)

msg = email.message_from_file(sys.stdin)
printmsg(msg)
This program is short and simple. For each object it encounters, it checks to see if it is multipart; if so, the children of that object are displayed as well. Individual parts of a message can easily be extracted. You will recall that there are several ways that message data may be encoded; fortunately, the email module can decode them all! mime_decode.py shows a program that will let you decode and save any component of a MIME message:
import sys, email

counter = 0
parts = []

def printmsg(msg, level=0):
    global counter
    l = "|  " * level
    if msg.is_multipart():
        print l + "Found multipart:"
        for item in msg.get_payload():
            printmsg(item, level + 1)
    else:
        disp = ['%d. Decodable part' % (counter + 1)]
        if 'content-type' in msg:
            disp.append(msg['content-type'])
        if 'content-disposition' in msg:
            disp.append(msg['content-disposition'])
        print l + ", ".join(disp)
        counter += 1
        parts.append(msg)

inputfd = open(sys.argv[1])
msg = email.message_from_file(inputfd)
printmsg(msg)

while 1:
    print "Select part number to decode or q to quit:"
    part = sys.stdin.readline().strip()
    if part == 'q':
        sys.exit(0)
    try:
        part = int(part)
        msg = parts[part - 1]
    except:
        print "Invalid selection."
        continue
    print "Select file to write to:"
    filename = sys.stdin.readline().strip()
    try:
        fd = open(filename, 'wb')
    except:
        print "Invalid filename."
        continue
    # decode=1 undoes base-64 or quoted-printable before writing
    fd.write(msg.get_payload(decode=1))
    fd.close()
Parsing MIME Messages
This program steps through the message, like the last example. We skip asking the user about message components that are multipart because those exist only to contain other message objects, like text and attachments; multipart sections have no actual payload of their own.
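The same idea can be sketched non-interactively for Python 3. The message text below, including its BOUNDARY separator and the file name test.bin, is made up for illustration, and decodable_parts() is a hypothetical helper; only the email calls themselves come from the standard library.

```python
# Python 3 sketch; RAW_MESSAGE and decodable_parts() are illustrative.
import email

RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello, this is the body.
--BOUNDARY
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.bin"

VGhpcyBpcyBhIHRlc3QK
--BOUNDARY--
"""


def decodable_parts(msg):
    """Return the non-multipart parts in document order."""
    return [part for part in msg.walk() if not part.is_multipart()]


msg = email.message_from_string(RAW_MESSAGE)
parts = decodable_parts(msg)
for number, part in enumerate(parts, start=1):
    data = part.get_payload(decode=True)  # bytes, transfer encoding undone
    print(number, part.get_content_type(), len(data), 'bytes')
```

Because multipart containers are filtered out first, every entry in the list is something that get_payload(decode=True) can turn back into raw bytes.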
The last trick that we should cover regarding MIME messages is decoding headers that contain encoded non-ASCII text. The function decode_header() takes a single header and returns a list of pieces of the header; each piece is a binary string together with its encoding (named as a string if it is something besides 7-bit ASCII, else the value None):
>>> x = '=?iso-8859-1?q?Michael_M=FCller?= <mmueller@example.com>'
>>> import email.header
>>> pieces = email.header.decode_header(x)
>>> print pieces
[('Michael M\xfcller', 'iso-8859-1'), ('<mmueller@example.com>', None)]
Of course, this raw information is likely to be of little use to you. To instead see the actual text inside the encoding, use the decode() function of each binary string in the list (falling back to an ‘ascii’ encoding if None was returned) and paste the result together with spaces:
>>> print ' '.join(s.decode(enc or 'ascii') for s, enc in pieces)
Michael Müller <mmueller@example.com>
It is always good practice to use decode_header() on any of the “big three” headers (From, To, and Subject) before displaying them to the user. If no special encoding was used, then the result will simply be a one-element list containing the header string with a None encoding.
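A small helper along these lines might look like the following Python 3 sketch. The name decode_mime_header() is hypothetical. In Python 3, decode_header() can return either bytes or str for a piece, so the sketch handles both; the exact whitespace between pieces may differ between Python versions.

```python
# Python 3 sketch; decode_mime_header() is a hypothetical helper name.
from email.header import decode_header


def decode_mime_header(raw):
    """Return a header value as readable text."""
    pieces = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # Fall back to ASCII when no charset was declared
            value = value.decode(charset or 'ascii')
        pieces.append(value)
    return ''.join(pieces)


encoded = '=?iso-8859-1?q?Michael_M=FCller?= <mmueller@example.com>'
print(decode_mime_header(encoded))
print(decode_mime_header('Test Message'))
```

Headers without any encoded words pass through unchanged, so the helper can be applied to every From, To, and Subject value before display.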
Decoding Headers
The actual movement of e-mail between systems is accomplished through SMTP, the “Simple Mail Transfer Protocol.” In this chapter we will analyze SMTP in depth.
Simple Mail Transfer Protocol (SMTP)
The role of SMTP in message submission, where the user presses “Send” and expects a message to go winging its way across the Internet, will probably be least confusing if we trace the history of how users have worked with Internet mail. The key concept to understand as we begin this history is that users have never been asked to sit around and wait for an e-mail message to actually be delivered. This process can often take quite a bit of time, and up to several dozen repeated attempts, before an e-mail message is actually delivered to its destination. Any number of things could cause delays: a message could have to wait because other messages are already being transmitted across a link of limited bandwidth; the destination server might be down for a few hours, or its network might not be currently accessible because of a glitch; and if the mail is destined for a large organization, then it might have to make several different “hops” as it arrives at the big university server, then is directed to a smaller college e-mail machine, and then finally is directed to a departmental e-mail server.
E-mail browsing and submission, therefore, become a black box: your browser interacts with a web API, and on the other end, you will see plain old SMTP connections originating from and going to the large organization as mail is delivered in each direction. But in the world of webmail, client protocols are removed from the equation, taking us back to the old days of pure server-to-server unauthenticated SMTP.
E-mail Clients, Webmail Services
The foregoing narrative has hopefully helped you structure your thinking about Internet e-mail protocols, and realize how they fit together in the bigger picture of getting messages to and from users. But the subject of this chapter is a narrower one: the Simple Mail Transfer Protocol in particular. And we should start by stating the basics:
SMTP is a TCP/IP-based protocol. Connections can be authenticated, or not. Connections can be encrypted, or not.
Most e-mail connections across the Internet these days seem to lack any attempt at encryption, which means that whoever owns the Internet backbone routers is theoretically in a position to read simply staggering amounts of other people’s mail.
What are the two ways that SMTP is used? First, SMTP can be used for e-mail submission between a client e-mail program like Thunderbird or Outlook, claiming that a user wants to send e-mail, and a server at an organization that has given that user an e-mail address. These connections generally use authentication, so that spammers cannot connect and send millions of messages on a user’s behalf without his or her password. Once received, the server puts the message in a queue for delivery (and often makes its first attempt at sending it moments later), and the client can forget about the message and presume the server will keep trying to deliver it. Second, SMTP is used between Internet mail servers as they move e-mail from its origin to its destination. This typically involves no authentication; after all, big organizations like Google, Yahoo!, and Microsoft do not know the passwords of each other’s users, so when Yahoo! receives an e-mail from Google claiming that it was sent from an @gmail.com user, Yahoo! just has to believe them (or not; sometimes organizations blacklist each other if too much spam is making it through their servers, as happened to a friend of mine the other day when Hotmail stopped accepting his client’s newsletters from GoDaddy’s servers because of alleged problems with spam).
So, typically, no authentication takes place between servers talking SMTP to each other, and even encryption against snooping routers seems to be used only rarely. Because of the problem of spammers connecting to e-mail servers and claiming to be delivering mail from another organization’s users, there has been an attempt made to lock down who can send e-mail on an organization’s behalf. Though controversial, some e-mail servers consult the Sender Policy Framework (SPF), defined in RFC 4408 (since superseded by RFC 7208), to see whether the server they are talking to really has the authority to deliver the e-mails it is transmitting. But SPF and other anti-spam technologies are unfortunately beyond the scope of this book, which must limit itself to the question of using the basic protocols themselves from Python. So we now turn to the more technical question of how you will actually use SMTP from your Python programs.
How SMTP Is Used
Successfully sending e-mail generally requires a queue where a message can sit for seconds, minutes, or days until it can be successfully transmitted toward its destination. So you typically do not want your programs using Python’s smtplib to send mail directly to a message’s destination, because if your first transmission attempt fails, then you will be stuck with the job of writing a full “mail transfer agent” (MTA), as the RFCs call an e-mail server, and giving it a full standards-compliant retry queue. This is not only a big job, but also one that has already been done well several times, and you will be wise to take advantage of one of the existing MTAs (look at postfix, exim, and qmail) before trying to write something of your own.
So only rarely will you be making SMTP connections out into the world from Python. More usually, your system administrator will tell you one of two things:
That you should make an authenticated SMTP connection to an existing e-mail server, using a username and password that will belong to your application, and give it permission to use the e-mail server to queue outgoing messages.
That you should run a local binary on the system, like the sendmail program, that the system administrator has already gone to the trouble to configure so that local programs can send mail.
Sending E-Mail
Python’s built-in SMTP implementation is in the Python Standard Library module smtplib, which makes it easy to do simple tasks with SMTP.
In the examples that follow, the programs are designed to take several command-line arguments: the name of an SMTP server, a sender address, and one or more recipient addresses. Please use them cautiously; name only an SMTP server that you yourself run or that you know will be happy receiving your test messages, lest you wind up getting an IP address banned for sending spam! If you don’t know where to find an SMTP server, you might try running a mail daemon like postfix or exim locally and then pointing these example programs at localhost. Many UNIX, Linux, and Mac OS X systems have an SMTP server like one of these already listening for connections from the local machine.
Otherwise, consult your network administrator or Internet provider to obtain a proper hostname and port. Note that you usually cannot just pick a mail server at random; many store or forward mail only from certain authorized clients. So, take a look at simple.py for a very simple SMTP program:
import sys, smtplib

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the simple.py program.
""" % (', '.join(toaddrs), fromaddr)

s = smtplib.SMTP(server)
s.sendmail(fromaddr, toaddrs, message)
print "Message successfully sent to %d recipient(s)" % len(toaddrs)
It starts by generating a simple message from the user’s command-line arguments. Then it creates an smtplib.SMTP object that connects to the specified server. Finally, all that’s required is a call to sendmail(). If that returns successfully, then you know that the server accepted the message for delivery.
Introducing the SMTP Library
When you run the program, it will look like this:
root@erlerobot:~/Python_files# python simple.py localhost sender@example.com recipient@example.com
Message successfully sent to 1 recipient(s)
Thanks to the hard work that the authors of the Python Standard Library have put into the sendmail() method, it might be the only SMTP call you ever need.
There are several different exceptions that might be raised while you’re programming with smtplib. They are:
socket.gaierror for errors looking up address information.
socket.error for general I/O and communication problems.
socket.herror for other addressing errors.
smtplib.SMTPException or a subclass of it for SMTP conversation problems.
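In Python 3 the picture is simpler: socket.error is an alias of OSError, and socket.gaierror and socket.herror are subclasses of it. The sketch below shows one way to wrap a send in Python 3; the function name try_send() and its smtp_factory parameter are assumptions introduced so the logic can be exercised without a real server.

```python
# Python 3 sketch; try_send() and smtp_factory are illustrative names.
import smtplib
import socket


def try_send(fromaddr, toaddrs, message, server, smtp_factory=smtplib.SMTP):
    """Return (True, None) on success or (False, exception) on failure."""
    try:
        connection = smtp_factory(server)
        try:
            connection.sendmail(fromaddr, toaddrs, message)
        finally:
            connection.quit()
    except (OSError, smtplib.SMTPException) as e:
        # OSError covers socket.error, socket.gaierror and socket.herror
        return False, e
    return True, None


def _unresolvable(server):
    """Stand-in factory that fails the way a bad hostname would."""
    raise socket.gaierror(-2, 'Name or service not known')


ok, error = try_send('sender@example.com', ['recipient@example.com'],
                     'Subject: test\r\n\r\nHello', 'mail.invalid',
                     smtp_factory=_unresolvable)
print(ok, error)
```

Listing smtplib.SMTPException alongside OSError keeps the intent explicit even on versions where one already derives from the other.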
The smtplib module also provides a way to get a series of detailed messages about the steps it takes to send an e-mail. To enable that level of detail, you can call smtpobj.set_debuglevel(1). With this option, you should be able to track down any problems. Take a look at debug.py for an example program that provides basic error handling and debugging.
import sys, smtplib, socket

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the debug.py program.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    s.set_debuglevel(1)
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)
This program looks similar to the last one. However, the output will be very different.
root@erlerobot:~/Python_files# python debug.py localhost foo@example.com jgoerzen@complete.org
send: 'ehlo localhost\r\n'
reply: '250-localhost\r\n'
reply: '250-PIPELINING\r\n'
reply: '250-SIZE 20480000\r\n'
reply: '250-VRFY\r\n'
reply: '250-ETRN\r\n'
reply: '250-STARTTLS\r\n'
...
Message successfully sent to 1 recipient(s)
From this example, you can see the conversation that smtplib is having with the SMTP server over the network. Let’s look at what’s happening. First, the client (the smtplib library) sends an EHLO command (an “extended” successor to a more ancient command that was named, more readably, HELO) with your hostname in it. The remote server responds with its hostname, and lists any optional SMTP features that it supports. Next, the client sends the mail from command, which states the “envelope sender” e-mail address and the size of the message. The server at this moment has the opportunity to reject the message (for example, because it thinks you are a spammer); but in this case, it responds with 250 Ok. (Note that in this case, the code 250 is what matters; the remaining text is just a human-readable comment and varies from server to server.) Then the client sends a rcpt to command, with the “envelope recipient” that we talked so much about earlier in this chapter; you can finally see that, indeed, it is transmitted separately from the text of the message itself when using the SMTP protocol. If you were sending the message to more than one recipient, each would be named in its own rcpt to command. Finally, the client sends a data command, transmits the actual message (using verbose carriage return-linefeed line endings, you will note, per the Internet e-mail standard), and finishes the conversation.
Error Handling and Conversation Debugging
The smtplib module is doing all this automatically for you in this example. In the rest of the chapter, we will look at how to take more control of the process so you can take advantage of some more advanced features.
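As a rough illustration of what sendmail() does on your behalf, the Python 3 sketch below issues the individual commands through the lower-level smtplib methods mail(), rcpt(), and data(). The function name send_stepwise() is hypothetical, and the error handling is deliberately simplified compared with the library's own implementation.

```python
# Python 3 sketch; send_stepwise() is a hypothetical helper.
import smtplib


def send_stepwise(smtp, fromaddr, toaddrs, message):
    """Drive the SMTP commands one at a time on a connected SMTP object.

    Returns a dict mapping each refused recipient to its (code, reply).
    """
    smtp.ehlo_or_helo_if_needed()          # EHLO, falling back to HELO
    code, reply = smtp.mail(fromaddr)      # MAIL FROM: envelope sender
    if code != 250:
        smtp.rset()
        raise smtplib.SMTPSenderRefused(code, reply, fromaddr)

    refused = {}
    for address in toaddrs:
        code, reply = smtp.rcpt(address)   # one RCPT TO per recipient
        if code not in (250, 251):
            refused[address] = (code, reply)
    if len(refused) == len(toaddrs):
        smtp.rset()
        raise smtplib.SMTPRecipientsRefused(refused)

    code, reply = smtp.data(message)       # DATA, then the message text
    if code != 250:
        raise smtplib.SMTPDataError(code, reply)
    return refused
```

It would be called with an object created by smtplib.SMTP(server), for example send_stepwise(s, fromaddr, toaddrs, message). Only the three-digit codes are inspected, since the reply text varies from server to server.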
Sometimes it is nice to know about what kind of messages a remote SMTP server will accept. For instance, most SMTP servers have a limit on what size message they permit, and if you fail to check first, then you may transmit a very large message only to have it rejected when you have completed transmission.
Some servers do not support ESMTP. On those servers, EHLO will just return an error. In that case, you must send a HELO command instead. In the previous examples, we used sendmail() immediately after creating our SMTP object, so smtplib had to send its own “hello” message to the server. But if it sees you attempt to send the EHLO or HELO command on your own, then sendmail() will no longer attempt to send these commands itself. ehlo.py shows a program that gets the maximum size from the server, and returns an error before sending if a message would be too large.
import sys, smtplib, socket

if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the ehlo.py program.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    code = s.ehlo()[0]
    uses_esmtp = (200 <= code <= 299)
    if not uses_esmtp:
        code = s.helo()[0]
        if not (200 <= code <= 299):
            print "Remote server refused HELO; code:", code
            sys.exit(1)

    if uses_esmtp and s.has_extn('size'):
        print "Maximum message size is", s.esmtp_features['size']
        if len(message) > int(s.esmtp_features['size']):
            print "Message too large; aborting."
            sys.exit(1)

    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)
If you run this program, and the remote server provides its maximum message size, then the program will display the size on your screen and verify that its message does not exceed that size before sending. Here is what running this program might look like:
root@erlerobot:~/Python_files# python ehlo.py localhost foo@example.com jgoerzen@complete.org
Maximum message size is 10240000
Message successfully sent to 1 recipient(s)
Take a look at the part of the code that verifies the result from a call to ehlo() or helo(). Those two functions return a tuple; the first item in the tuple is a numeric result code from the remote SMTP server.
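The checks in ehlo.py can be factored into small helpers, as in this Python 3 sketch. The function names are hypothetical. Treating a missing, empty, or zero SIZE value as “no known limit” is a choice made here for illustration (a value of zero is defined by the SIZE extension to mean that no fixed maximum is in force).

```python
# Python 3 sketch; helper names are assumptions, not smtplib API.
def is_success(code):
    """SMTP reply codes in the 2xx range indicate success."""
    return 200 <= code <= 299


def advertised_size_limit(smtp):
    """Return the server's SIZE limit in bytes, or None if unknown."""
    if not smtp.has_extn('size'):
        return None
    value = smtp.esmtp_features.get('size', '').strip()
    return int(value) if value.isdigit() else None


def fits(smtp, message):
    """True if the message is within the advertised limit, if any."""
    limit = advertised_size_limit(smtp)
    return limit is None or limit == 0 or len(message) <= limit
```

These would be called after a successful smtp.ehlo(), since esmtp_features is only filled in once the server has answered EHLO.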
Getting Information from EHLO
E-mails sent in plain text over SMTP can be read by anyone with access to an Internet gateway or router across which the packets happen to pass. The best solution to this problem is to encrypt each e-mail with a public key whose private key is possessed only by the person to whom you are sending the e-mail; there are freely available systems such as PGP and GPG for doing exactly this. But regardless of whether the messages themselves are protected, individual SMTP conversations between particular pairs of machines can be encrypted and authenticated using a method known as SSL/TLS.
The general procedure for using TLS in SMTP is as follows:
1. Create the SMTP object, as usual.
2. Send the EHLO command. If the remote server does not support EHLO, then it will not support TLS.
3. Check s.has_extn() to see if starttls is present. If not, then the remote server does not support TLS and the message can only be sent normally, in the clear.
4. Call starttls() to initiate the encrypted channel.
5. Call ehlo() a second time; this time, it’s encrypted.
6. Finally, send your message.
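Steps 2 through 5 can be sketched in Python 3 as follows. The function name negotiate_tls() and its require_tls flag are assumptions made for illustration. The sketch passes an ssl context from ssl.create_default_context() to starttls(), which on Python 3 asks for the server certificate to be validated.

```python
# Python 3 sketch; negotiate_tls() and require_tls are illustrative.
import smtplib
import ssl


def negotiate_tls(smtp, require_tls=True, context=None):
    """Return True if the connection was upgraded to TLS, else False."""
    code, _ = smtp.ehlo()                                   # step 2
    esmtp_ok = 200 <= code <= 299
    if not (esmtp_ok and smtp.has_extn('starttls')):        # step 3
        if require_tls:
            raise smtplib.SMTPNotSupportedError(
                'server does not offer STARTTLS')
        if not esmtp_ok:
            smtp.helo()  # plain SMTP server: greet it the old way
        return False
    if context is None:
        context = ssl.create_default_context()
    smtp.starttls(context=context)                          # step 4
    code, _ = smtp.ehlo()                                   # step 5
    if not (200 <= code <= 299):
        raise smtplib.SMTPException('EHLO failed after STARTTLS')
    return True
```

A caller would create the object with smtplib.SMTP(server), call negotiate_tls(s), and then call s.sendmail() as usual. Whether require_tls should default to True depends on how sensitive the mail is.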
The first question you have to ask yourself when working with TLS is whether you should return an error if TLS is not available. Depending on your application, you might want to raise an error for any of the following:
There is no support for TLS on the remote side.
The remote side fails to establish a TLS session properly.
The remote server presents a certificate that cannot be validated.
tls.py acts as a TLS-capable general-purpose client. It will connect to a server and use TLS if it can; otherwise, it will fall back and send the message as usual. (But it will die with an error if the attempt to start TLS fails while talking to an ostensibly capable server.)
import sys, smtplib, socket

if len(sys.argv) < 4:
    print "Syntax: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the tls.py program
in Foundations of Python Network Programming.
""" % (', '.join(toaddrs), fromaddr)

try:
    s = smtplib.SMTP(server)
    code = s.ehlo()[0]
    uses_esmtp = (200 <= code <= 299)
    if not uses_esmtp:
        code = s.helo()[0]
        if not (200 <= code <= 299):
            print "Remote server refused HELO; code:", code
            sys.exit(1)

    if uses_esmtp and s.has_extn('starttls'):
        print "Negotiating TLS...."
        s.starttls()
        code = s.ehlo()[0]
        if not (200 <= code <= 299):
            print "Couldn't EHLO after STARTTLS"
            sys.exit(5)
        print "Using TLS connection."
    else:
        print "Server does not support TLS; using normal connection."
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)
Using Secure Sockets Layer and Transport Layer Security
If you run this program and give it a server that understands TLS, the output will look like this:
root@erlerobot:~/Python_files# python tls.py localhost jgoerzen@complete.org jgoerzen@complete.org
Negotiating TLS....
Using TLS connection.
Message successfully sent to 1 recipient(s)
Notice that the call to sendmail() in these last few listings is the same, regardless of whether TLS is used.
We reach the topic of Authenticated SMTP, where your ISP, university, or company e-mail server needs you to log in with a username and password to prove that you are not a spammer before they allow you to send e-mail.
For maximum security, TLS should be used in conjunction with authentication; otherwise your password (and username, for that matter) will be visible to anyone observing the connection. The proper way to do this is to establish the TLS connection first, and then send your authentication information only over the encrypted communications channel.
But using authentication itself is simple; smtplib provides a login() function that takes a username and a password. login.py shows an example. To avoid repeating code already shown in previous listings, this listing does not take the advice of the previous paragraph, and sends the username and password over an unencrypted connection, which exposes them in the clear.
import sys, smtplib, socket
from getpass import getpass

if len(sys.argv) < 4:
    print "Syntax: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)

server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]

message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the login.py program
in Foundations of Python Network Programming.
""" % (', '.join(toaddrs), fromaddr)

sys.stdout.write("Enter username: ")
username = sys.stdin.readline().strip()
password = getpass("Enter password: ")

try:
    s = smtplib.SMTP(server)
    try:
        s.login(username, password)
    except smtplib.SMTPException, e:
        print "Authentication failed:", e
        sys.exit(1)
    s.sendmail(fromaddr, toaddrs, message)
except (socket.gaierror, socket.error, socket.herror,
        smtplib.SMTPException), e:
    print " *** Your message may not have been sent!"
    print e
    sys.exit(1)
else:
    print "Message successfully sent to %d recipient(s)" % len(toaddrs)
You can run this program just like the previous examples. You will be prompted for a username and password; if the server supports authentication and accepts them, then the program will proceed to transmit your message.
Authenticated SMTP
The Post Office Protocol (POP) is a simple protocol that is used to download e-mail from a mail server, and is typically used through an e-mail client like Thunderbird or Outlook. POP does not support multiple mailboxes on the remote side, nor does it provide any reliable, persistent message identification. This means that you cannot use POP as a protocol for mail synchronization. The Python Standard Library provides the poplib module, which provides a convenient interface for using POP. In this chapter, you will learn how to use poplib to connect to a POP server, gather summary information about a mailbox, download messages, and delete the originals from the server.
Post Office Protocol (POP)
POP supports several authentication methods. The two most common are basic username-password authentication, and APOP, which is an optional extension to POP that helps protect passwords from being sent in plain text if you are using an ancient POP server that does not support SSL.
The process of connecting and authenticating to a remote server looks like this in Python:
1. Create a POP3_SSL or just a plain POP3 object, and pass the remote hostname and port to it.
2. Call user() and pass_() to send the username and password. Note the underscore in pass_(). It is present because pass is a keyword in Python and cannot be used for a method name.
3. If the exception poplib.error_proto is raised, it means that the login has failed and the string value of the exception contains the error explanation sent by the server.
The choice between POP3 and POP3_SSL is governed by whether your e-mail provider offers (or, in this day and age, even requires) that you connect over an encrypted connection.
popconn.py uses the foregoing steps to log in to a remote POP server. Once connected, it calls stat(), which returns a simple tuple giving the number of messages in the mailbox and the messages’ total size. Finally, the program calls quit(), which closes the POP connection.
import getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)  # or "POP3" if SSL is not supported
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    status = p.stat()
    print "You have %d messages totaling %d bytes" % status
finally:
    p.quit()
You can test this program if you have a POP account somewhere. The program will then prompt you for your password. Finally, it will display the mailbox status, without touching or altering any of your mail.
When POP servers do not support SSL to protect your connection from snooping, they sometimes at least support an alternate authentication protocol called APOP, which uses a challenge-response scheme to assure that your password is not sent in the clear. (But all of your e-mail will still be visible to any third party watching the packets go by.) The Python Standard Library makes this very easy to attempt: just call the apop() method, then fall back to basic authentication if the POP server you are talking to does not understand. To use APOP but fall back to plain authentication, you could use a stanza like the one shown below inside your POP program (like popconn.py).
print "Attempting APOP authentication..."
try:
    p.apop(user, passwd)
except poplib.error_proto:
    print "Attempting standard authentication..."
    try:
        p.user(user)
        p.pass_(passwd)
    except poplib.error_proto, e:
        print "Login failed:", e
        sys.exit(1)
Connecting and Authenticating
The preceding example showed you stat(), which returns the number of messages in the mailbox and their total size. Another useful POP command is list(), which returns more detailed information about each message. The most interesting part is the message number, which is required to retrieve messages later. Note that there may be gaps in message numbers: a mailbox may, for example, contain message numbers 1, 2, 5, 6, and 9. Also, the number assigned to a particular message may be different on each connection you make to the POP server. mailbox.py shows how to use the list() command to display information about each message.
import getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    response, listings, octet_count = p.list()
    for listing in listings:
        number, size = listing.split()
        print "Message %s has %s bytes" % (number, size)
finally:
    p.quit()
The list() function returns a tuple containing three items; you should generally pay attention to the second item. Here is its raw output for one of my POP mailboxes at the moment, which has three messages in it:
('+OK 3 messages (5675 bytes)', ['1 2395', '2 1626',
 '3 1654'], 24)
The three strings inside the second item give the message number and size for each of the three messages in my in-box.
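Turning those strings into numbers is a one-step job. The sketch below is written for Python 3, where poplib returns the listing lines as bytes rather than str; the helper name parse_listings() is hypothetical, and the sample tuple simply reuses the values shown above.

```python
# Python 3 sketch; parse_listings() is a hypothetical helper.
def parse_listings(listings):
    """Convert list() lines such as b'1 2395' into (number, size) pairs."""
    result = []
    for listing in listings:
        if isinstance(listing, bytes):
            listing = listing.decode('ascii')
        number, size = listing.split()
        result.append((int(number), int(size)))
    return result


response = (b'+OK 3 messages (5675 bytes)',
            [b'1 2395', b'2 1626', b'3 1654'], 24)
messages = parse_listings(response[1])
total_size = sum(size for _, size in messages)
print(messages, total_size)
```

The sizes add up to the total that the server reports in its response line, which is a handy sanity check.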
Obtaining Mailbox Information
You should now be getting the hang of POP: when using poplib you get to issue small atomic commands that always return a tuple inside which are various strings and lists of strings showing you the result. We are now ready to actually manipulate messages! The three relevant methods, which all identify messages using the same integer identifiers that are returned by list(), are these:
retr(num): This method downloads a single message and returns a tuple containing a result code and the message itself, delivered as a list of lines. This will cause most POP servers to set the “seen” flag for the message to “true,” barring you from ever seeing it from POP again (unless you have another way into your mailbox that lets you set messages back to “Unread”).
top(num, body_lines): This method returns its result in the same format as retr() without marking the message as “seen.” But instead of returning the whole message, it just returns the headers plus however many lines of the body you ask for in body_lines. This is useful for previewing messages if you want to let the user decide which ones to download.
dele(num): This method marks the message for deletion from the POP server, to take place when you quit this POP session. Typically you would do this only if the user directly requests irrevocable destruction of the message, or if you have stored the message to disk and used something like fsync() to assure the data’s safety.
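The "store to disk, then fsync(), then delete" advice can be sketched as follows. This is an illustration for Python 3 rather than code from the original listings: save_message is a hypothetical helper, the file name is made up, and the list of lines stands in for what retr() would return.

```python
import os
import tempfile

def save_message(lines, path):
    """Write message lines to `path` and force them out to the disk."""
    with open(path, 'wb') as f:
        f.write(b'\r\n'.join(lines) + b'\r\n')
        f.flush()               # push Python's buffer to the operating system
        os.fsync(f.fileno())    # ask the operating system to write it to disk

# Stand-in for the list of lines returned by retr() (bytes in Python 3).
lines = [b'Subject: Backup complete', b'', b'All done.']
path = os.path.join(tempfile.mkdtemp(), 'message-1.eml')
save_message(lines, path)
saved_size = os.path.getsize(path)
print(saved_size)
# Only after save_message() returns would it be reasonable to call dele().
```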
To put everything together, take a look at download-and-delete.py, which is a fairly functional e-mail client that speaks POP. It checks your in-box to determine how many messages there are and to learn what their numbers are; then it uses top() to offer a preview of each one; and, at the user’s option, it can retrieve the whole message, and can also delete it from the mailbox.
import email, getpass, poplib, sys

if len(sys.argv) != 3:
    print 'usage: %s hostname user' % sys.argv[0]
    exit(2)

hostname, user = sys.argv[1:]
passwd = getpass.getpass()

p = poplib.POP3_SSL(hostname)
try:
    p.user(user)
    p.pass_(passwd)
except poplib.error_proto, e:
    print "Login failed:", e
else:
    response, listings, octets = p.list()
    for listing in listings:
        number, size = listing.split()
        print 'Message', number, '(size is', size, 'bytes):'
        # top() with zero body lines fetches just the headers.
        response, lines, octets = p.top(number, 0)
        message = email.message_from_string('\n'.join(lines))
        for header in 'From', 'To', 'Subject', 'Date':
            if header in message:
                print header + ':', message[header]
        print 'Read this message [ny]?'
        answer = raw_input()
        if answer.lower().startswith('y'):
            response, lines, octets = p.retr(number)
            message = email.message_from_string('\n'.join(lines))
            print '-' * 72
            for part in message.walk():
                if part.get_content_type() == 'text/plain':
                    print part.get_payload()
                    print '-' * 72
        print 'Delete this message [ny]?'
        answer = raw_input()
        if answer.lower().startswith('y'):
            p.dele(number)
            print 'Deleted.'
finally:
    p.quit()

If you run this program, you’ll see output similar to this:
root@erlerobot:~/Python_files# python download-and-delete.py pop.gmail.com my_gmail_acct
Message 1 (size is 1847 bytes):
From: root@server.example.com
To: Brandon Rhodes <brandon.craig.rhodes@gmail.com>
Subject: Backup complete
Date: Tue, 13 Apr 2010 16:56:43 -0700 (PDT)
Read this message [ny]?
n
Delete this message [ny]?
y
Deleted.
Downloading and Deleting Messages
Like POP, IMAP is a way that a laptop or desktop computer can connect to a larger Internet server to view and manipulate a user’s e-mail. Whereas the capabilities of POP are rather anemic, the IMAP protocol offers such a full array of capabilities that many users store their e-mail permanently on the server, keeping it safe from a laptop or desktop hard drive crash.
This chapter will teach just the basics, with a focus on how to best connect from Python.
Internet Message Access Protocol (IMAP)
The Python Standard Library contains an IMAP client interface named imaplib, which does offer rudimentary access to the protocol. Unfortunately, it limits itself to knowing how to send requests and deliver their responses back to your code. It makes no attempt to actually implement the detailed rules in the IMAP specification for parsing the returned data.
As an example of how values returned from imaplib are usually too raw to be useful in a program, take a look at open_imaplib.py. It is a simple script that uses imaplib to connect to an IMAP account, list the “capabilities” that the server advertises, and then display the status code and data returned by the LIST command.
import getpass, imaplib, sys

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

m = imaplib.IMAP4_SSL(hostname)
m.login(username, getpass.getpass())
print 'Capabilities:', m.capabilities
print 'Listing mailboxes'
# list() hands back a status code plus the raw, unparsed response lines.
status, data = m.list()
print 'Status:', repr(status)
print 'Data:'
for datum in data:
    print repr(datum)
m.logout()
If you run this script with appropriate arguments, it will start by asking for your password; IMAP authentication is almost always accomplished through a username and password:
root@erlerobot:~/Python_files# python open_imaplib.py imap.example.com brandon@example.com
Password:
If your password is correct, it will then print out a response that looks something like the result shown below:
Capabilities: ('IMAP4REV1', 'UNSELECT', 'IDLE', 'NAMESPACE', 'QUOTA',
 'XLIST', 'CHILDREN', 'XYZZY', 'SASL-IR', 'AUTH=XOAUTH')
Listing mailboxes
Status: 'OK'
Data:
'(\\HasNoChildren) "/" "INBOX"'
'(\\HasNoChildren) "/" "Personal"'
'(\\HasNoChildren) "/" "Receipts"'
'(\\HasNoChildren) "/" "Travel"'
'(\\HasNoChildren) "/" "Work"'
'(\\Noselect \\HasChildren) "/" "[Gmail]"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/All Mail"'
'(\\HasNoChildren) "/" "[Gmail]/Drafts"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/Sent Mail"'
'(\\HasNoChildren) "/" "[Gmail]/Spam"'
'(\\HasNoChildren) "/" "[Gmail]/Starred"'
'(\\HasChildren \\HasNoChildren) "/" "[Gmail]/Trash"'
There are two main problems here. First, we have been handed a status code that we have to check manually. Second, imaplib gives us no help in interpreting the results: each mailbox comes back as a single raw string.
So unless you want to implement several details of the protocol yourself, you will want a more capable IMAP client library.
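To get a feel for what "implementing the details yourself" would involve, here is a rough Python 3 sketch that parses one LIST response line of the shape `(flags) "delimiter" "name"`. The names LIST_LINE and parse_list_line are hypothetical, and the pattern covers only this fully quoted form; real servers may also send unquoted names, a NIL delimiter, or literals, which is exactly the kind of detail a complete library has to handle.

```python
import re

# Matches lines of the form: (flags) "delimiter" "name"
LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "(?P<name>[^"]*)"$')

def parse_list_line(line):
    """Split one quoted LIST response line into (flags, delimiter, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError('unrecognized LIST response: %r' % line)
    flags = match.group('flags').split()
    return flags, match.group('delim'), match.group('name')

flags, delimiter, name = parse_list_line(
    '(\\HasChildren \\HasNoChildren) "/" "[Gmail]/All Mail"')
print(flags, delimiter, name)
```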
Understanding IMAP in Python
Fortunately, a popular and battle-tested IMAP library for Python does exist, and is available for easy installation from the Python Package Index. The IMAPClient package is written by a friendly Python programmer named Menno Smits, and in fact uses the Standard Library imaplib module internally. Once it is installed, you can use the python interpreter in the virtual environment to run the program shown in open_imap.py.
import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

print 'Capabilities:', c.capabilities()
print 'Listing mailboxes:'
data = c.list_folders()
for flags, delimiter, folder_name in data:
    print '%-30s %s %s' % (' '.join(flags), delimiter, folder_name)
c.logout()
You can see immediately from the code that more details of the protocol exchange are now being handled on our behalf. First, we no longer get a status code back that we have to check every time we run a command; instead, the library is doing that check for us and will raise an exception to stop us in our tracks if anything goes wrong. Second, you can see that each result from the LIST command (which in this library is offered as the list_folders() method instead of the list() method offered by imaplib) has already been parsed into Python data types for us. Each line of data comes back as a tuple giving us the folder flags, folder name delimiter, and folder name, and the flags themselves are a sequence of strings. Here is what the output of this second script looks like:
Capabilities: ('IMAP4REV1', 'UNSELECT', 'IDLE', 'NAMESPACE', 'QUOTA', 'XLIST', 'CHILDREN',
 'XYZZY', 'SASL-IR', 'AUTH=XOAUTH')
Listing mailboxes:
\HasNoChildren                 / INBOX
\HasNoChildren                 / Personal
\HasNoChildren                 / Receipts
\HasNoChildren                 / Travel
\HasNoChildren                 / Work
\Noselect \HasChildren         / [Gmail]
\HasChildren \HasNoChildren    / [Gmail]/All Mail
\HasNoChildren                 / [Gmail]/Drafts
\HasChildren \HasNoChildren    / [Gmail]/Sent Mail
\HasNoChildren                 / [Gmail]/Spam
\HasNoChildren                 / [Gmail]/Starred
\HasChildren \HasNoChildren    / [Gmail]/Trash
The standard flags listed for each folder may be zero or more of the following:
\Noinferiors: This means that the folder does not contain any sub-folders and that it is not possible for it to contain sub-folders in the future. Your IMAP client will receive an error if it tries to create a sub-folder under this folder.
\Noselect: This means that it is not possible to run select_folder() on this folder; that is, this folder does not and cannot contain any messages. (Perhaps it exists just to allow sub-folders beneath it, as one possibility.)
\Marked: This means that the server considers this box to be interesting in some way; generally, this indicates that new messages have been delivered since the last time the folder was selected. However, the absence of \Marked does not guarantee that the folder does not contain new messages; some servers simply do not implement \Marked at all.
\Unmarked: This guarantees that the folder doesn’t contain new messages.
IMAPClient
IMAP provides two different ways to refer to a specific message within a folder: by a temporary message number (which typically goes 1, 2, 3, and so forth) or by a UID (unique identifier). The difference between the two lies with persistence. Message numbers are assigned right when you select the folder. This means they can be pretty and sequential, but it also means that if you revisit the same folder later, then a given message may have a different number. For programs such as live mail readers or simple download scripts, this behavior (which is the same as POP) is fine; you do not need the numbers to stay the same. But a UID, by contrast, is designed to remain the same even if you close your connection to the server and do not reconnect again for another week. If a message had UID 1053 today, then the same message will have UID 1053 tomorrow, and no other message in that folder will ever have UID 1053. If you are writing a synchronization tool, this behavior is quite useful! It will allow you to verify with 100 percent certainty that actions are being taken against the correct message. This is one of the things that make IMAP so much more fun than POP.
Most IMAP commands that work with specific messages can take either message numbers or UIDs. Normally, IMAPClient always uses UIDs and ignores the temporary message numbers assigned by IMAP. But if you want to see the temporary numbers instead, simply instantiate IMAPClient with a use_uid=False argument, or you can even set the value of the class’s use_uid attribute to False and True on the fly during your IMAP session.
Message Numbers vs. UIDs
When you first select a folder, the IMAP server provides some summary information about it: about the folder itself and also about its messages. The summary is returned by IMAPClient as a dictionary. Here are the keys that most IMAP servers will return when you run select_folder():
EXISTS: An integer giving the number of messages in the folder.
FLAGS: A list of the flags that can be set on messages in this folder.
RECENT: Specifies the server’s approximation of the number of messages that have appeared in the folder since the last time an IMAP client ran select_folder() on it.
PERMANENTFLAGS: Specifies the list of custom flags that can be set on messages; this is usually empty.
UIDNEXT: The server’s guess about the UID that will be assigned to the next incoming (or uploaded) message.
UIDVALIDITY: A string that can be used by clients to verify that the UID numbering has not changed; if you come back to a folder and this is a different value than the last time you connected, then the UID numbering has started over and your stored UID values are no longer valid.
UNSEEN: Specifies the message number of the first unseen message (one without the \Seen flag) in the folder.
Of these keys, servers are only required to return FLAGS, EXISTS, and RECENT, though most will include at least UIDVALIDITY as well.
folder_info.py shows an example program that reads and displays the summary information of my INBOX mail folder:
import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)
else:
    select_dict = c.select_folder('INBOX', readonly=True)
    for k, v in select_dict.items():
        print '%s: %r' % (k, v)
    c.logout()
When run, this program displays results such as this:
```
root@erlerobot:~/Python_files# python folder_info.py imap.example.com brandon@example.com
Password:
EXISTS: 3
PERMANENTFLAGS: ('\Answered', '\Flagged', '\Draft', '\Deleted', '\Seen', '\*')
READ-WRITE: True
UIDNEXT: 2626
FLAGS: ('\Answered', '\Flagged', '\Draft', '\Deleted', '\Seen')
UIDVALIDITY: 1
RECENT: 0
```
Summary Information
That shows that my INBOX folder contains three messages, none of which have arrived since I last checked. If your program is interested in using UIDs that it stored during previous sessions, remember to compare the UIDVALIDITY to a stored value from a previous session.
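The UIDVALIDITY comparison might look something like the sketch below, written for Python 3. The function name uids_still_valid is hypothetical, the dictionary is a stand-in for what select_folder() returns, and the key is assumed to be the plain string 'UIDVALIDITY' as in the output above; depending on the IMAPClient release in use, the keys may instead be byte strings.

```python
def uids_still_valid(select_info, stored_uidvalidity):
    """Return True if UIDs saved in an earlier session can still be trusted."""
    current = select_info.get('UIDVALIDITY')
    if current is None or stored_uidvalidity is None:
        return False    # nothing to compare against, so assume the worst
    return current == stored_uidvalidity

# Stand-in for the dictionary that select_folder() returned above.
select_info = {'EXISTS': 3, 'UIDNEXT': 2626, 'UIDVALIDITY': 1, 'RECENT': 0}

same_session_value = uids_still_valid(select_info, 1)   # value unchanged
after_renumbering = uids_still_valid(select_info, 7)    # numbering restarted
print(same_session_value, after_renumbering)
```

When the check fails, the safe reaction is to discard every stored UID for that folder and rebuild the local state from scratch.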
With IMAP, the FETCH command is used to download mail, which IMAPClient exposes as its fetch() method. The simplest way to fetch involves downloading all messages at once, in a single big gulp. While this is simplest and requires the least network traffic (since you do not have to issue repeated commands and receive multiple responses), it does mean that all of the returned messages will need to sit in memory together as your program examines them. For very large mailboxes whose messages have lots of attachments, this is obviously not practical.
mailbox_summary.py downloads all of the messages from my INBOX folder into your computer’s memory in a Python data structure, and then displays a bit of summary information about each one.
import email, getpass, sys
from imapclient import IMAPClient

try:
    hostname, username, foldername = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username folder' % sys.argv[0]
    sys.exit(2)

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

c.select_folder(foldername, readonly=True)
# BODY.PEEK[] fetches whole messages without setting the \Seen flag.
msgdict = c.fetch('1:*', ['BODY.PEEK[]'])
for message_id, message in msgdict.items():
    e = email.message_from_string(message['BODY[]'])
    print message_id, e['From']
    payload = e.get_payload()
    if isinstance(payload, list):
        part_content_types = [part.get_content_type() for part in payload]
        print '  Parts:', ' '.join(part_content_types)
    else:
        print '  ', ' '.join(payload[:60].split()), '...'
c.logout()
Remember that IMAP is stateful: first we use select_folder() to put us “inside” the given folder, and then we can run fetch() to ask for message content. The range '1:*' means “the first message through the end of the mail folder”, because message IDs (whether temporary or UIDs) are always positive integers.
Here is what it looks like to run this script:
root@erlerobot:~/Python_files# python mailbox_summary.py imap.example.com brandon INBOX
Password:
2590 "Amazon.com" <order-update@amazon.com>
   Dear Brandon, Portable Power Systems, Inc. shipped the follo ...
2469 Meetup Reminder <info@meetup.com>
  Parts: text/plain text/html
2470 billing@linode.com
   Thank you. Please note that charges will appear as "Linode.c ...
Downloading an Entire Mailbox
E-mail messages can be quite large, and so can mail folders: many mail systems permit users to have hundreds or thousands of messages, each of which can be 10 MB or more. That kind of mailbox can easily exceed the RAM on the client machine if its contents are all downloaded at once, as in the previous example. To help network-based mail clients that do not want to keep local copies of every message, IMAP supports several operations besides the big “fetch the whole message” command that we saw in the previous section.
An e-mail’s headers can be downloaded as a block of text, separately from the message.
Particular headers from a message can be requested and returned.
The server can be asked to recursively explore and return an outline of the MIME structure of a message.
The text of particular sections of the message can be returned.
This allows IMAP clients to perform very efficient queries that download only the information they need to display for the user, decreasing the load on the IMAP server and the network, and allowing results to be displayed more quickly to the user. For an example of how a simple IMAP client works, examine simple_client.py, which puts together a number of ideas about browsing an IMAP account. Hopefully this provides more context than would be possible if these features were spread out over a half-dozen shorter program listings at this point in the chapter. You can see that the client consists of three concentric loops that each take input from the user as he or she views the list of mail folders, then the list of messages within a particular mail folder, and finally the sections of a specific message.
import getpass, sys
from imapclient import IMAPClient

try:
    hostname, username = sys.argv[1:]
except ValueError:
    print 'usage: %s hostname username' % sys.argv[0]
    sys.exit(2)

banner = '-' * 72

c = IMAPClient(hostname, ssl=True)
try:
    c.login(username, getpass.getpass())
except c.Error, e:
    print 'Could not log in:', e
    sys.exit(1)

def display_structure(structure, parentparts=[]):
    """Attractively display a given message structure."""

    # The whole body of the message is named 'TEXT'.
    if parentparts:
        name = '.'.join(parentparts)
    else:
        print 'HEADER'
        name = 'TEXT'

    # Print this part's designation and its MIME type.
    is_multipart = isinstance(structure[0], list)
    if is_multipart:
        parttype = 'multipart/%s' % structure[1].lower()
    else:
        parttype = ('%s/%s' % structure[:2]).lower()
    print '%-9s' % name, parttype,

    # For a multipart part, print all of its subordinate parts; for
    # other parts, print their disposition (if available).
    if is_multipart:
        print  # end the line left open by the trailing comma above
        subparts = structure[0]
        for i in range(len(subparts)):
            display_structure(subparts[i], parentparts + [str(i + 1)])
    else:
        if structure[6]:
            print 'size=%s' % structure[6],
        if structure[8]:
            disposition, namevalues = structure[8]
            print disposition,
            for i in range(0, len(namevalues), 2):
                print '%s=%r' % namevalues[i:i+2]
        print  # end the line for parts without a disposition

def explore_message(c, uid):
    """Let the user view various parts of a given message."""
    msgdict = c.fetch(uid, ['BODYSTRUCTURE', 'FLAGS'])

    while True:
        print 'Flags:',
        flaglist = msgdict[uid]['FLAGS']
        if flaglist:
            print ' '.join(flaglist)
        else:
            print 'none'
        display_structure(msgdict[uid]['BODYSTRUCTURE'])
        reply = raw_input('Message %s - type a part name, or "q" to quit: '
                          % uid).strip()
        if reply.lower().startswith('q'):
            break
        key = 'BODY[%s]' % reply
        try:
            msgdict2 = c.fetch(uid, [key])
        except c._imap.error:
            print 'Error - cannot fetch section %r' % reply
        else:
            content = msgdict2[uid][key]
            if content:
                print banner
                print content.strip()
                print banner
            else:
                print '(No such section)'

def explore_folder(c, name):
    """List the messages in folder `name` and let the user choose one."""

    while True:
        c.select_folder(name, readonly=True)
        msgdict = c.fetch('1:*', ['BODY.PEEK[HEADER.FIELDS (FROM SUBJECT)]',
                                  'FLAGS', 'INTERNALDATE', 'RFC822.SIZE'])
        for uid in sorted(msgdict):
            items = msgdict[uid]
            print '%6d  %20s  %6d bytes  %s' % (
                uid, items['INTERNALDATE'], items['RFC822.SIZE'],
                ' '.join(items['FLAGS']))
            for i in items['BODY[HEADER.FIELDS (FROM SUBJECT)]'].splitlines():
                print ' ' * 6, i.strip()

        reply = raw_input('Folder %s - type a message UID, or "q" to quit: '
                          % name).strip()
        if reply.lower().startswith('q'):
            break
        try:
            reply = int(reply)
        except ValueError:
            print 'Please type an integer or "q" to quit'
        else:
            if reply in msgdict:
                explore_message(c, reply)

    c.close_folder()

def explore_account(c):
    """Display the folders in this IMAP account and let the user choose one."""

    while True:
        folderflags = {}
        data = c.list_folders()
        for flags, delimiter, name in data:
            folderflags[name] = flags
        for name in sorted(folderflags.keys()):
            print '%-30s %s' % (name, ' '.join(folderflags[name]))

        reply = raw_input('Type a folder name, or "q" to quit: ').strip()
        if reply.lower().startswith('q'):
            break
        if reply in folderflags:
            explore_folder(c, reply)
        else:
            print 'Error: no folder named', repr(reply)

if __name__ == '__main__':
    explore_account(c)

Downloading Messages Individually
You can see that the outer function uses a simple list_folders() call to present the user with a list of his or her mail folders, like some of the program listings we have seen already. Each folder’s IMAP flags are also displayed. This lets the program give the user a choice between folders:
INBOX                          \HasNoChildren
Receipts                       \HasNoChildren
Travel                         \HasNoChildren
Work                           \HasNoChildren
Type a folder name, or "q" to quit:
Once a user has selected a folder, things become more interesting: a summary has to be printed for each message. Note that the program is careful to use BODY.PEEK instead of BODY to fetch these items, since the IMAP server would otherwise mark the messages as \Seen merely because they had been displayed in a summary.
The results of this fetch() call are printed to the screen once an e-mail folder has been selected:
  2703  2010-09-28 21:32:13   19129 bytes  \Seen
       From: Brandon Craig Rhodes
       Subject: Digested Articles
  2704  2010-09-28 23:03:45   15354 bytes
       Subject: Re: [venv] Building a virtual environment for offline testing
       From: "W. Craig Trader"
  2705  2010-09-29 08:11:38   10694 bytes
       Subject: Re: [venv] Building a virtual environment for offline testing
       From: Hugo Lopes Tavares
Folder INBOX - type a message UID, or "q" to quit:
As you can see, the fact that several items of interest can be supplied to the IMAP fetch() command lets us build fairly sophisticated message summaries with only a single round-trip to the server. One final note about the fetch() command: it lets you not only pull just the parts of a message that you need at any given moment, but also truncate them in case they are quite long and you just want to provide an excerpt from the beginning to tantalize the user.
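Truncation is requested by appending a `<start.length>` suffix to the fetch item, a syntax defined in RFC 3501. The helper below is a hypothetical Python 3 sketch that only builds such an item string; it does not talk to a server, and the `c.fetch()` call in the comment assumes a logged-in IMAPClient instance named `c`.

```python
def partial_body_item(section, length, start=0, peek=True):
    """Build a FETCH item such as 'BODY.PEEK[TEXT]<0.200>'."""
    command = 'BODY.PEEK' if peek else 'BODY'
    return '%s[%s]<%d.%d>' % (command, section, start, length)

# Ask for only the first 200 octets of the message text, without
# setting the \Seen flag.
item = partial_body_item('TEXT', 200)
print(item)
# With a logged-in client this would be used roughly as:
#     c.fetch(uid, [item])
```

Note that the server labels a partial result with the starting offset only (for example `BODY[TEXT]<0>`), so the key in the response may differ from the item that was requested.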
You might have noticed, while trying out simple_client.py or reading its example output just shown, that IMAP marks messages with attributes called “flags,” which typically take the form of a backslash-prefixed word, like \Seen for one of the messages just cited. Several of these are standard, and are defined in RFC 3501 for use on all IMAP servers. Here is what the most important ones mean:
\Answered: The user has replied to the message.
\Draft: The user has not finished composing the message.
\Flagged: The message has somehow been singled out specially; the purpose and meaning of this flag vary between mail readers.
\Recent: No IMAP client has seen this message before. This flag is unique, in that the flag cannot be added or removed by normal commands; it is automatically removed after the mailbox is selected.
\Seen: The message has been read.
The IMAPClient library supports several methods for working with flags. The simplest retrieves the flags as though you had done a fetch() asking for 'FLAGS', but goes ahead and removes the dictionary around each answer:
>>> c.get_flags(2703)
{2703: ('\\Seen',)}
There are also calls to add and remove flags from a message:
c.remove_flags(2703, ['\\Seen'])
c.add_flags(2703, ['\\Answered'])
In case you want to completely change the set of flags for a particular message without figuring out the correct series of adds and removes, you can use set_flags() to unilaterally replace the whole list of message flags with a new one:
c.set_flags(2703, ['\\Seen', '\\Answered'])
Any of these operations can take a list of message UIDs instead of the single UID shown in these examples.
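For comparison, here is roughly what "figuring out the correct series of adds and removes" would involve if you chose not to use set_flags(). This is a Python 3 sketch with a hypothetical helper name, flag_changes; it only computes the two lists and leaves the actual add_flags() and remove_flags() calls to the caller.

```python
def flag_changes(current, desired):
    """Return (to_add, to_remove) needed to turn `current` into `desired`."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)

# The message currently has \Seen; we want it to have \Seen and \Answered.
to_add, to_remove = flag_changes(['\\Seen'], ['\\Seen', '\\Answered'])
print(to_add, to_remove)
```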
One last interesting use of flags is that it is how IMAP supports message deletion. The process, for safety, takes two steps: first the client marks one or more messages with the \Deleted flag; then it calls expunge() to perform the deletions as a single operation. The IMAPClient library does not make you do this by hand, however (though that would work); instead it hides the fact that flags are involved behind a simple delete_messages() routine that marks the messages for you. It still has to be followed by expunge() if you actually want the operation to take effect, though:
c.delete_messages([2703, 2704])
c.expunge()
Flagging and Deleting Messages
Flagging
Deleting Messages
Searching is another issue that is very important for a protocol designed to let you keep all your mail on the mail server itself: without search, an e-mail client would have to download all of a user’s mail anyway the first time he or she wanted to perform a full-text search to find an e-mail message. The essence of search is simple: you call the search() method on an IMAP client instance, and are returned the UIDs (assuming, of course, that you accept the IMAPClient default of use_uid=True for your client) of the messages that match your criteria:
>>> c.select_folder('INBOX')
>>> c.search('SINCE 20-Aug-2010 TEXT Apress')
[2590L, 2652L, 2653L, 2654L, 2655L, 2699L]
There are many criteria that you can combine in order to form a query. Like the rest of IMAP, they are specified in RFC 3501. Some criteria are quite simple, and refer to binary attributes like flags:
ALL: Every message in the mailbox
UID (id, ...): Messages with the given UIDs
LARGER n: Messages more than n octets in length
SMALLER m: Messages less than m octets in length
ANSWERED: Have the flag \Answered
DELETED: Have the flag \Deleted
DRAFT: Have the flag \Draft
FLAGGED: Have the flag \Flagged
KEYWORD flag: Have the given keyword flag set
NEW: Have the flag \Recent but not the flag \Seen
OLD: Lack the flag \Recent
UNANSWERED: Lack the flag \Answered
UNDELETED: Lack the flag \Deleted
UNDRAFT: Lack the flag \Draft
UNFLAGGED: Lack the flag \Flagged
UNKEYWORD flag: Lack the given keyword flag
UNSEEN: Lack the flag \Seen
There are two sets of criteria for dates, depending on which date you want to query by: the date in the message’s Date header (the sent date) and the internal date at which the message arrived at the IMAP server.
Finally, there are two search operations that refer to the text of the message itself. These are the big workhorses that support full-text search of the kind your users are probably expecting when they type into a search field in an e-mail client:
BODY string: The message body must contain the string.
TEXT string: The entire message, either body or header, must contain the string somewhere.
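Criteria strings like the one passed to search() earlier can be assembled from Python values. The Python 3 sketch below uses two hypothetical helpers, imap_date and since_and_text; it spells out the month names by hand so that the result does not depend on the locale, and it assumes the search text is a single word that needs no quoting.

```python
import datetime

MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def imap_date(day):
    """Format a date the way IMAP search criteria expect: 20-Aug-2010."""
    return '%d-%s-%d' % (day.day, MONTHS[day.month - 1], day.year)

def since_and_text(day, word):
    """Combine a SINCE date criterion with a one-word TEXT criterion."""
    return 'SINCE %s TEXT %s' % (imap_date(day), word)

criteria = since_and_text(datetime.date(2010, 8, 20), 'Apress')
print(criteria)
```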
Creating or deleting folders is done quite simply in IMAP, by providing the name of the folder:
c.create_folder('Personal')
c.delete_folder('Work')
Some IMAP servers or configurations may not permit these operations, or may have restrictions on naming; be sure to have error checking in place when calling them. There are two operations that can create new e-mail messages in your IMAP account besides the “normal” means of waiting for people to send them to you. First, you can copy an existing message from its home folder over into another folder. Start by using select_folder() to visit the folder where the messages live, and then run the copy method like this:
c.select_folder('INBOX')
c.copy([2653L, 2654L], 'TODO')
Searching and Manipulating Messages
Manipulating Folders and Messages
Finally, it is possible to add a message to a mailbox with IMAP. You do not need to send the message first with SMTP; IMAP is all that is needed. Adding a message is a simple process, though there are a couple of things to be aware of.
You must be careful about line endings, since the message should use the standard '\r\n' ending on every line. Be cautious in how you change them, because some messages may use '\r\n' somewhere inside despite using only '\n' for the first few dozen lines, and IMAP clients have been known to fail if a message mixes different line endings! The solution is a simple one, thanks to Python’s powerful splitlines() string method that recognizes all three possible line endings; simply call the method on your message and then re-join the lines with the standard line ending:
>>> 'one\rtwo\nthree\r\nfour'.splitlines()
['one', 'two', 'three', 'four']
>>> '\r\n'.join('one\rtwo\nthree\r\nfour'.splitlines())
'one\r\ntwo\r\nthree\r\nfour'
The actual act of appending a message, once you have the line endings correct, is to call the append() method on your IMAP client:
c.append('INBOX', my_message)
You can also supply a list of flags as a keyword argument, as well as a msg_time to be used as its arrival time by passing a normal Python datetime object.
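The normalization step can be wrapped in a small function. The sketch below is written for Python 3 and uses re.sub() instead of splitlines(), because in Python 3 str.splitlines() also breaks on a few rarer characters besides the three line endings; normalize_line_endings is a hypothetical name, and the append() call in the comment assumes a logged-in IMAPClient instance `c` and a datetime object `some_datetime`.

```python
import re

def normalize_line_endings(text):
    """Rewrite any mix of \\r\\n, \\r, and \\n line endings as \\r\\n."""
    # The alternation tries '\r\n' first, so an existing CRLF pair is
    # matched as a unit and never doubled.
    return re.sub(r'\r\n|\r|\n', '\r\n', text)

my_message = normalize_line_endings('Subject: note to self\n\nBuy milk\r\n')
print(repr(my_message))
# With a logged-in client, the message could then be stored with:
#     c.append('INBOX', my_message, flags=['\\Seen'], msg_time=some_datetime)
```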
The “command line” is the topic of this chapter: how you can access it over the network, together with enough discussion about its typical behavior to get you through any frustrations you might encounter while trying to use it.
Telnet and SSH
Before getting into the details of how the command line works, and how you can access it over the network, we should pause and note that there exist many systems today for automating the entire process. If you have dozens or hundreds of machines to maintain and you need to start sending them all the same commands, then you might find that tools already exist: tools that already provide ways to write command scripts, push them out for execution across a cloud of machines, batch up any error messages or responses for your review, and even save commands in a queue to be re-tried later in case a machine is down and cannot be reached at the moment.
What are the options? First, the Fabric library is very popular with Python programmers who need to run commands and copy files to remote server machines. As you can see in fabfile.py, a Fabric script calls very simple functions with names like put(), cd(), and run() to perform operations on the machines to which it connects. You can learn more about it at its web site: http://fabfile.org/. Although fabfile.py is designed to be run by Fabric's own fab command-line tool, Fabric can also be used from inside your own Python programs; again, consult their documentation for details.
from fabric.api import *

def versions():
    with cd('/usr/bin'):
        with settings(hide('warnings'), warn_only=True):
            for version in '2.4', '2.5', '2.6', '2.7', '3.0', '3.1':
                result = run('python%s -c "None"' % version)
                if not result.failed:
                    print "Host", env.host, "has Python", version
Another project to check out is Silver Lining. It is still very immature, but if you are an experienced programmer who needs its specific capabilities, then you might find that it solves your problems well. This library goes beyond batching commands across many different servers: it will actually create and initialize Ubuntu servers through the “libcloud” Python API, and then install your Python web applications there for you. You can learn more about this promising project at http://cloudsilverlining.org/.
On the other hand, there is “pexpect.” While it is not, technically, a program that itself knows how to use the network, it is often used to control the system “ssh” or “telnet” command when a Python programmer wants to automate interactions with a remote prompt of some kind. This typically takes place in a situation where no API for a device is available, and commands simply have to be typed each time the command-line prompt appears. Configuring simple network hardware often requires this kind of clunky step-by-step interaction. You can learn more about “pexpect” here: http://pypi.python.org/pypi/pexpect.
Finally, there are more specific projects that provide mechanisms for remote systems administration. Red Hat and Fedora users might look at func, which uses an SSL-encrypted XML-RPC service that lets you write Python programs that perform system configuration and maintenance: https://fedorahosted.org/func/.
Command-Line Automation
If you have ever typed many commands at a Unix command prompt, you will be aware that not every character you type is interpreted literally. Consider this command, for example:
root@erlerobot:~# echo *
Hello.txt Python-3.4.1 Python-3.4.1.tgz Python_files build gmapenv hola.txt otrotext.txt virtualenv-1.11.6 virtualenv-1.11.6.tar.gz
root@erlerobot:~#
The asterisk * in this command was not interpreted to mean “print out an asterisk character to the screen”; instead, the shell thought I was trying to write a pattern that would match all of the file names in the current directory. To actually print out an asterisk, I have to use another special character (an “escape” character, because it lets me “escape” from the shell's normal meaning) to tell it that I just mean the asterisk literally:
root@erlerobot:~# echo Here is a lone asterisk: \*
Here is a lone asterisk: *
root@erlerobot:~# echo And here are '*' two "*" more asterisks
And here are * two * more asterisks
root@erlerobot:~#
The rules by which modern shells interpret the special characters in your command line have become quite complex. Rather than learn them all, to use the command line effectively you just have to understand two points:
Special characters are interpreted as special by the shell you are using, like bash.
When passing commands to a shell either locally or across the network, you need to escape the special characters you use so that they are not expanded into unintended values on the remote system.
Command-Line Expansion and Quoting
Like many very useful statements, the bold claim of the title of this section, that Unix has no special characters, is, alas, a lie. There is, in fact, one character that Unix considers special. But, in general, Unix has no special characters, and this is a very important fact for you to grasp.
The shell’s conventions cut both ways. On the one hand, they make it very easy to, say, name all of the files in the current directory as arguments to a command; but on the other hand, it can be very difficult to echo a message to the screen that mixes single quotes and double quotes.
The simple lesson of this section is that the whole set of conventions to which you are accustomed has nothing to do with your operating system; they are simply and entirely a behavior of the bash shell, or of whichever of the other popular (or arcane) shells you are using. It does not matter how familiar the rules seem, or how difficult it is for you to imagine using a Unix-like system without them. If you take bash away, they are simply not there. You can observe this quite simply by taking control of the operating system's process launcher yourself and trying to throw some special characters at a familiar command:
>>> import subprocess
>>> args = ['echo', 'Sometimes an', '*', 'just means an', '*']
>>> subprocess.call(args)
Sometimes an * just means an *
Here, we are bypassing all of the shell applications that are available for interpreting commands, and we are telling the operating system to start a new process using precisely the list of arguments we have provided. And the process (the echo command, in this case) is getting exactly those characters, instead of having the * turned into a list of file names first. Though we rarely think about it, the most common “special” character is one we use all the time: the space character. Rather than assume that you actually mean each space character to be passed to the command you are invoking, the shell instead interprets it as the delimiter separating the actual text you want the command to see. This causes endless entertainment when people include spaces in Unix file names, and then try to move the file somewhere else:
root@erlerobot:~# mv Smith Contract.txt ~/Documents
mv: cannot stat `Smith': No such file or directory
mv: cannot stat `Contract.txt': No such file or directory
To make the shell understand that you are talking about one file with a space in its name, not two files, you have to contrive something like one of these possible command lines:
root@erlerobot:~# mv Smith\ Contract.txt ~/Documents
root@erlerobot:~# mv "Smith Contract.txt" ~/Documents
root@erlerobot:~# mv Smith*Contract.txt ~/Documents
That last possibility obviously means something quite different, since it will match any file name that happens to start with Smith and end with Contract.txt, regardless of whether the text between them is a simple space character or some much longer sequence of text. But I have seen many people type it in frustration who are still learning shell conventions and cannot remember how to type a literal space character for the shell. If you want to convince yourself that none of the characters that the bash shell has taught you to be careful about is special, shell.py shows a simple shell, written in Python, that treats only the space as special but passes everything else through literally to the command.
import subprocess

while True:
    # Split on whitespace only; no other character is treated specially.
    args = raw_input('] ').split()
    if not args:
        pass
    elif args == ['exit']:
        break
    elif args[0] == 'show':
        print "Arguments:", args[1:]
    else:
        subprocess.call(args)

Unix Has No Special Characters
Running this file results in a session like this:
root@erlerobot:~# python shell.py
] echo Hi there!
Hi there!
] echo An asterisk * is not special.
An asterisk * is not special.
] echo The string $HOST is not special, nor are "double quotes".
The string $HOST is not special, nor are "double quotes".
] echo What? No *<>!$ special characters?
What? No *<>!$ special characters?
] show "The 'show' built-in lists its arguments."
Arguments: ['"The', "'show'", 'built-in', 'lists', 'its', 'arguments."']
] exit
You can see here absolute evidence that Unix commands (in this case, the /bin/echo command that we are calling over and over again) do not generally attempt to interpret their arguments as anything other than strings. The echo command happily accepts double quotes, dollar signs, and asterisks, and treats them all as literal characters. As the foregoing show command illustrates, Python is simply reducing our arguments to a list of strings for the operating system to use in creating a new process. What if we fail to split our command into separate arguments?
>>> import subprocess
>>> subprocess.call(['echo hello'])
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory
The operating system does not know that spaces should be special; that is a quirk of shell programs, not of Unix-like operating systems themselves! So the system thinks that it is being asked to run a command literally named echo[space]hello, and, unless you have created such a file in the current directory, it fails to find it and raises an exception.
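There is one character that the operating system itself does treat specially: the null character, which marks the end of each command-line argument in memory and therefore cannot appear inside one. To prevent you from making the mistake of including one, Python stops you in your tracks if you put a null character in a command-line argument:
>>> import subprocess
>>> subprocess.call(['echo', 'Sentences can end\0 abruptly.'])
Traceback (most recent call last):
  ...
TypeError: execv() arg 2 must contain only strings
Since every command on the system is designed to live within this limitation, you will generally find there is never any reason to put null characters into command-line arguments anyway.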
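The behavior described in this section can be checked without a real shell. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the listings here. The helper names run_with_args and try_unsplit_command are made up for this illustration. The child process is simply another Python interpreter that reports the argument list it received.

```python
import subprocess
import sys

# A tiny child program that reports the argument list it received.
CHILD = "import sys; print(repr(sys.argv[1:]))"


def run_with_args(args):
    """Run the child directly (no shell involved) and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD] + list(args),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


def try_unsplit_command():
    """Pass a whole command line as a single list item and report the failure."""
    try:
        subprocess.call(["echo hello"])
    except OSError as error:  # Python 3 raises the OSError subclass FileNotFoundError
        return type(error).__name__
    return None


# Characters that a shell would expand or strip arrive untouched.
print(run_with_args(["*", "$HOST", '"double quotes"', "two words"]))
print(try_unsplit_command())
```

Because no shell sits between the two processes, the asterisk, the dollar sign, and the quotation marks all reach the child exactly as written, and the unsplit command fails because no program is literally named "echo hello".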
In the foregoing section, we used routines in Python's subprocess module to directly invoke commands. (The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.) This was great, and let us pass characters that would have been special to a normal interactive shell. If you have a big list of filenames with spaces and other special characters in them, it can be wonderful to simply pass them into a subprocess call and have the command on the receiving end understand you perfectly.
But when you are using remote-shell protocols over the network (which, you will recall, is the subject of this chapter!), you are generally going to be talking to a shell like bash instead of getting to invoke commands directly like you do through the subprocess module. This means that remote-shell protocols will feel more like the system() routine from the os module, which does invoke a shell to interpret your command line, and therefore involves you in all of the complexities of the Unix command line:
>>> import os
>>> os.system('echo *')
Hello.txt Python-3.4.1 Python-3.4.1.tgz Python_files build gmapenv hola.txt otro text.txt virtualenv-1.11.6 virtualenv-
Of course, if the other end of a remote-shell connection is using some sort of shell with which you are unfamiliar, there is little that Python can do. The authors of the Standard Library have no idea how, say, a Motorola DSL router's Telnet-based command line might handle special characters, or even whether it pays attention to quotes at all. But if the other end of a network connection is a standard Unix shell of the sh family, like bash or zsh, then you are in luck: the fairly obscure Python pipes module, which is normally used to build complex shell command lines, contains a helper function that is perfect for escaping arguments. It is called quote, and can simply be passed a string:
>>> from pipes import quote
>>> print quote("filename")
filename
>>> print quote("file with spaces")
'file with spaces'
>>> print quote("file 'single quoted' inside!")
"file 'single quoted' inside!"
>>> print quote("danger!; rm -r *")
'danger!; rm -r *'
So preparing a command line for remote execution generally just involves running quote() on each argument and then pasting the results together with spaces. Note that using a remote shell with Python does not involve you in the terrors of two levels of shell quoting! If you have ever tried to build a remote SSH command line that uses fancy quoting by typing a local command line into your own shell, you will know what I mean. The attempt tends to generate a series of experiments like this:
$ echo $HOST
guinness
$ ssh asaph echo $HOST
guinness
$ ssh asaph echo \$HOST
asaph
$ ssh asaph echo \\$HOST
guinness
$ ssh asaph echo \\\$HOST
$HOST
$ ssh asaph echo \\\\$HOST
\guinness
But using a remote-shell protocol through Python does not involve two levels of shell like this. Instead, you get to construct a literal string in Python that then directly becomes what is executed by the remote shell; no local shell is involved. So if using a shell-within-a-shell has you convinced that passing strings and filenames safely to a remote shell is a very hard problem, relax: no local shell will be involved in our following examples.
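As a small illustration of quoting each argument and joining the results with spaces, here is a sketch for Python 3, where the same helper is available as shlex.quote(). The function name build_command_line is an assumption made for this example. Here shlex.split() stands in for the parsing that a remote sh-family shell would perform; it is not a real remote connection.

```python
import shlex


def build_command_line(args):
    """Join arguments into one string that an sh-family shell will split
    back into exactly the same arguments."""
    return " ".join(shlex.quote(arg) for arg in args)


arguments = ["mv", "Smith Contract.txt", "danger!; rm -r *"]
command_line = build_command_line(arguments)
print(command_line)

# shlex.split() applies sh-style word splitting and quote removal,
# so the round trip should give back the original list.
print(shlex.split(command_line) == arguments)
```

The resulting string is what you would hand to the remote shell as a single command line; the semicolon and asterisk inside the quoted argument are never seen by the shell as operators.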
Quoting Characters for Protection
You will probably talk to more programs than just the shell over your Python-powered remote-shell connection, of course. You will often want to watch the incoming data stream for the information and errors printed out by the commands you are running. And sometimes you will even want to send data back, either to provide the remote programs with input, or to respond to questions and prompts that they present.
When performing tasks like this, you might be surprised to find that programs hang indefinitely without ever finishing the output that you are waiting on, or that data you send seems not to be getting through. To help you through situations like this, a brief discussion of Unix terminals is in order.
A terminal typically names a device into which a user types text, and on whose screen the computer's response can be displayed. If a Unix machine has physical serial ports that could possibly host a physical terminal, then the device directory will contain entries like /dev/ttyS1 through which programs can send strings to, and receive strings from, that device. But most terminals these days are, in reality, other programs: an xterm terminal, or a Gnome or KDE terminal program, or a PuTTY client on a Windows machine that has connected via a remote-shell protocol of the kind we will discuss.
But the programs running inside the terminal on your laptop or desktop machine still need to know that they are talking to a person; they still need to feel like they are talking through the mechanism of a terminal device connected to a display. So the Unix operating system provides a set of “pseudo-terminal” devices (which might have less confusingly been named “virtual” terminals) with names like /dev/tty42 (or /dev/pts/42 on current Linux systems). When someone brings up an xterm or connects through SSH, the xterm or SSH daemon grabs a fresh pseudo-terminal, configures it, and runs the user's shell behind it. The shell examines its standard input, sees that it is a terminal, and presents a prompt since it believes itself to be talking to a person.
This is a crucial distinction to understand: the shell presents a prompt because, and only because, it thinks it is connected to a terminal! If you start up a shell and give it a standard input that is not a terminal (like, say, a pipe from another command) then no prompt will be printed, yet it will still respond to commands:
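root@erlerobot:~# cat | bash
echo Here we are inside of bash, with no prompt!
Here we are inside of bash, with no prompt!
python
print 'Python has not printed a prompt, either.'
import sys
print 'Is this a terminal?', sys.stdin.isatty()
You can see that Python, also, does not print its usual startup banner, nor does it present any prompts. It has not printed any output yet, either: because its standard input is not a terminal, Python reads the whole script from standard input and runs it only when the input ends.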
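A program makes this decision by asking whether its standard input is a terminal, which in Python is the isatty() call used in the session above. The following Python 3 sketch starts a child interpreter whose standard input is a pipe and reports what the child sees. The names CHILD and child_sees_terminal_on_pipe are invented for this illustration.

```python
import subprocess
import sys

# The child prints whether its own standard input is a terminal.
CHILD = "import sys; print(sys.stdin.isatty())"


def child_sees_terminal_on_pipe():
    """Start a child whose standard input is a pipe and return its answer."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input="",                     # supplying input makes stdin a pipe
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print("child stdin is a terminal:", child_sees_terminal_on_pipe())
print("our own stdin is a terminal:", sys.stdin.isatty())
```

The first line always reports False, since the child is reading from a pipe. The second line depends on how you launch the script: run it from an interactive prompt and it should report True, but feed it a file or pipe on standard input and it reports False.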
There are even changes in how some commands format their output depending on whether they are talking to a terminal. Some commands with long lines of output (the ps command comes to mind) will truncate their lines to your terminal width if used interactively, but produce arbitrarily wide output if connected to a pipe or file. And, entertainingly enough, the familiar column-based output of the ls command gets turned off and replaced with a filename on each line (which is, you must admit, an easier format for reading by another program) if its output is a pipe or file:
root@erlerobot:~# ls
Hello.txt         Python_files  hola.txt  virtualenv-1.11.6
Python-3.4.1      build         otro      virtualenv-1.11.6.tar.gz
Python-3.4.1.tgz  gmapenv       text.txt
root@erlerobot:~# ls | cat
Hello.txt
Python-3.4.1
Python-3.4.1.tgz
Python_files
build
gmapenv
hola.txt
otro
text.txt
virtualenv-1.11.6
virtualenv-1.11.6.tar.gz
root@erlerobot:~#
Things Are Different in a Terminal
A program running behind Telnet, for example, always thinks it is talking to a terminal; so your scripts or programs must always expect to see a prompt each time the shell is ready for input, and so forth. But when you make a connection over the more sophisticated SSH protocol, you will actually have your choice of whether the program thinks that its input is a terminal or just a plain pipe or file. You can test this easily from the command line if there is another computer you can connect to:
root@erlerobot:~# ssh -t asaph
asaph$ echo "Here we are, at a prompt."
Here we are, at a prompt.
So when you spawn a command through a modern protocol like SSH, you need to consider whether you want the program on the remote end thinking that you are a person typing at it through a terminal, or whether it had best think it is talking to raw data coming in through a file or pipe.
Programs are not actually required to act any differently when talking to a terminal; it is just for our convenience that they vary their behavior:
Programs that are often used interactively will present a human-readable prompt when they are talking to a terminal. But when they think input is coming from a file, they avoid printing a prompt.
Sophisticated interactive programs, these days, usually turn on command-line editing when their input is a TTY.
Many programs read only one line of input at a time when listening to a terminal, because humans like to get an immediate response to every command they type. But when reading from a pipe or file, these same programs will wait until thousands of characters have arrived before they try to interpret their first batch of input.
It is even more common for programs to adjust their output based on whether they are talking to a terminal.
Both of the last two issues, which involve buffering, cause all sorts of problems when you take a process that you usually do manually and try to automate it. In doing so you often move from terminal input to input provided through a file or pipe, and suddenly you find that the programs behave quite differently, and might even seem to be hanging because “print” statements are not producing immediate output, but are instead saving up their results to push out all at once when their output buffer is full.
You can see this easily with a simple Python program (since Python is one of the applications that decides whether to buffer its output based on whether it is talking to a terminal) that prints a message, waits for a line of input, and then prints again:
root@erlerobot:~# python -c 'print "talk:"; s = raw_input(); print "you said", s'
talk:
hi
you said hi
root@erlerobot:~# python -c 'print "talk:"; s = raw_input(); print "you said", s' | cat
hi
talk:
you said hi
You can see that in the first instance, when Python knew its output was a terminal, it printed talk: immediately. But in the second instance, its output was a pipe to the cat command, and so it decided that it could save up the results of that first print statement and batch them together with the rest of the program's output, so that both lines of output appeared only once you had provided your input and the program was ending.
The foregoing problem is why many carefully written programs, both in Python and in other languages, frequently call flush() on their output to make sure that anything waiting in a buffer goes ahead and gets sent out, regardless of whether the output looks like a terminal. So those are the basic problems with terminals and buffering: programs change their behavior, often in idiosyncratic ways, when talking to a terminal (think again of the ls example), and they often start heavily buffering their output if they think they are writing to a file or pipe.
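The effect of flush() can be seen in a small experiment. The following Python 3 sketch runs the same talk-and-reply program as a child whose input and output are both pipes, but has the child flush each line it prints. The names CHILD and converse are assumptions for this example. Without the flush on the first print, the parent's first readline() would wait forever, because the prompt would still be sitting in the child's buffer while the child waits for input.

```python
import subprocess
import sys

# The child flushes its prompt explicitly, so the prompt reaches the pipe
# before the child blocks waiting for a line of input.
CHILD = (
    "import sys\n"
    "print('talk:', flush=True)\n"
    "s = sys.stdin.readline().strip()\n"
    "print('you said', s, flush=True)\n"
)


def converse(reply):
    """Wait for the child's prompt, send a reply, and return both lines of output."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    prompt = proc.stdout.readline()   # arrives only because the child flushed
    proc.stdin.write(reply + "\n")
    proc.stdin.flush()                # the parent must flush its side as well
    answer = proc.stdout.readline()
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
    return prompt.strip(), answer.strip()


print(converse("hi"))
```

Note that both sides flush: the same buffering that delays the child's prompt would also delay the parent's reply if it were left in the parent's output buffer.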
Beyond the program-specific behaviors just described, there are additional problems raised by terminals.
For example, what happens when you want a program to be reading your input one character at a time, but the Unix terminal device itself is buffering your keystrokes to deliver them as a whole line? This common problem happens because the Unix terminal defaults to “canonical” input processing, where it lets the user enter a whole line, and even edit it by backspacing and re-typing, before finally pressing “Enter” and letting the program see what he or she has typed. If you want to turn off canonical processing so that a program can see every individual character as it is typed, you can use the stty “Set TTY settings” command to disable it:
root@erlerobot:~# stty -icanon
Another problem is that Unix terminals traditionally supported a pair of keystrokes for pausing the output stream so that the user could read something on the screen before it scrolled off and was replaced by more text. Often these were the characters Ctrl+S for “Stop” and Ctrl+Q for “Keep going,” and it was a source of great annoyance that, if binary data worked its way into an automated Telnet connection, the first Ctrl+S that happened to pass across the channel would pause the terminal and probably ruin the session. Again, this setting can be turned off with stty:
root@erlerobot:~# stty -ixon -ixoff
There are plenty of less famous settings that can also cause you grief. Because there are so many, and because they vary between Unix implementations, the stty command actually supports two modes, cooked and raw, that turn dozens of settings like icanon and ixon on and off together:
root@erlerobot:~# stty raw
root@erlerobot:~# stty cooked
In case you make your terminal settings a hopeless mess after some experimentation, most Unix systems provide a command for resetting the terminal back to reasonable, sane settings (you might need to hit Ctrl+J to submit the reset command, since your Return key, whose equivalent is Ctrl+M, actually only functions to submit commands because of a terminal setting called icrnl):
root@erlerobot:~# reset
If, instead of trying to get the terminal to behave across a Telnet or SSH session, you happen to be talking to a terminal from Python, check out the termios module that comes with the Standard Library. This module provides an interface to the POSIX calls for tty I/O control. For a complete description of these calls, see the POSIX or Unix manual pages.
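As a rough idea of what the termios module looks like in use, the following Python 3 sketch performs the equivalent of stty -icanon and stty icanon from Python. It is Unix-only and is only a sketch: the helper names set_canonical, is_canonical, and demo are made up here, and the demonstration runs against a freshly allocated pseudo-terminal (from os.openpty()) so that your real terminal is left alone.

```python
import os
import termios

LFLAG = 3  # index of the "local modes" field in the list tcgetattr() returns


def set_canonical(fd, enabled):
    """Turn canonical (line-at-a-time) input on or off for the terminal fd."""
    attrs = termios.tcgetattr(fd)
    if enabled:
        attrs[LFLAG] |= termios.ICANON
    else:
        attrs[LFLAG] &= ~termios.ICANON
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def is_canonical(fd):
    """Report whether the terminal fd is currently in canonical mode."""
    return bool(termios.tcgetattr(fd)[LFLAG] & termios.ICANON)


def demo():
    """Toggle canonical mode on a scratch pseudo-terminal and report each state."""
    master_fd, slave_fd = os.openpty()
    try:
        set_canonical(slave_fd, False)
        after_disable = is_canonical(slave_fd)
        set_canonical(slave_fd, True)
        after_enable = is_canonical(slave_fd)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
    return after_disable, after_enable


print(demo())
```

To change your own terminal instead, you would pass sys.stdin.fileno() as the file descriptor, ideally saving the original attributes first so that they can be restored afterwards.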
Terminals Do Buffering
Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte-oriented data connection over the Transmission Control Protocol (TCP).
Telnet is insecure: anyone watching your Telnet packets fly by will see your username, password, and everything you do on the remote system. It is clunky. And it has been completely abandoned for most systems administration.
In case you have to write a Python program that speaks Telnet to one of the devices that still offers nothing better, here are a few pointers on using the Python telnetlib. The telnetlib module provides a Telnet class that implements the Telnet protocol.
First, you have to realize that all Telnet does is establish a channel and send the things you type, and receive the things the remote system says, back and forth across that channel. This means that Telnet is ignorant of all sorts of things of which you might expect a remote-shell protocol to be aware.
For example, it is conventional that when you Telnet to a Unix machine, you are presented with a login: prompt at which you type your username, and a password: prompt where you enter your password. But these prompts are simply text produced by the remote system; the Telnet protocol itself knows nothing about them.
The fact that Telnet is ignorant about authentication has an important consequence: you cannot type anything on the command line itself to get yourself pre-authenticated to the remote system, nor avoid the login and password prompts that will pop up when you first connect! If you are going to use plain Telnet, you are going to have to somehow watch the incoming text for those two prompts (or however many the remote system supplies) and issue the correct replies.
Obviously, if systems vary in what username and password prompts they present, then you can hardly expect standardization in the error messages or responses that get sent back when your password fails. That is why Telnet is so hard to script and program from a language like Python and a library like telnetlib.
So if you are using Telnet, then you are playing a text game: you watch for text to arrive, and then try to reply with something intelligible to the remote system. To help you with this, the Python telnetlib provides not only basic methods for sending and receiving data, but also a few routines that will watch and wait for a particular string to arrive from the remote system.
telnet_login.py connects to localhost, which in this case is my Ubuntu laptop, where I have just run aptitude install telnetd so that a Telnet daemon is now listening on its standard port 23.
import telnetlib

t = telnetlib.Telnet('localhost')
# t.set_debuglevel(1)  # uncomment this for debugging messages

# Answer the two prompts that the remote system is expected to present.
t.read_until('login:')
t.write('brandon\n')
t.read_until('assword:')  # let "P" be capitalized or not
t.write('mypass\n')

# Wait up to 10 seconds for either a failure message or a shell prompt.
n, match, previous_text = t.expect([r'Login incorrect', r'\$'], 10)
if n == 0:
    print "Username and password failed - giving up"
else:
    t.write('exec uptime\n')
    print t.read_all()  # keep reading until the connection closes
If the script is successful, it shows you what the simple uptime command prints on the remote system:
root@erlerobot:~/Python_files# python telnet_login.py
10:24:43 up 5 days, 12:13, 14 users, load average: 1.44, 0.91, 0.73
Telnet
The listing shows you the general structure of a session powered by telnetlib. First, a connection is established, which is represented in Python by an instance of the Telnet object. Here only the hostname is specified, though you can also provide a port number to connect to some other service port than standard Telnet. You can call set_debuglevel(1) if you want your Telnet object to print out all of the strings that it sends and receives during the session. This actually turned out to be important for writing even the very simple script shown in the listing, because in two different cases it got hung up, and I had to re-run it with debugging messages turned on so that I could see the actual output and fix the script. I generally turn off debugging only once a program is working perfectly, and turn it back on whenever I want to do more work on the script.
Note that Telnet does not disguise the fact that its service is backed by a TCP socket, and will pass through to your program any socket.error and socket.gaierror exceptions that are raised. Once the Telnet session is established, interaction generally falls into a receive-and-send pattern, where you wait for a prompt or response from the remote end, then send your next piece of information. The listing illustrates two methods of waiting for text to arrive:
The very simple read_until() method watches for a literal string to arrive, then returns a string providing all of the text that it received from the moment it started listening until the moment it finally saw the string you were waiting for.
The more powerful and sophisticated expect() method takes a list of Python regular expressions. Once the text arriving from the remote end finally adds up to something that matches one of the regular expressions, expect() returns three items: the index in your list of the pattern that matched, the regular expression SRE_Match object itself, and the text that was received leading up to the matching text. For more information on what you can do with an SRE_Match, including finding the values of any sub-expressions in your pattern, read the Standard Library documentation for the re module.
If the script sees an error message because of an incorrect password (and does not get stuck waiting forever for a login or password prompt that never arrives or that looks different than it was expecting) then it exits:
root@erlerobot:~/Python_files# python telnet_login.py
Username and password failed - giving up
If you wind up writing a Python script that has to use Telnet, it will simply be a larger or more complicated version of the same simple pattern shown here. Both read_until() and expect() take an optional second argument named timeout that places a maximum limit on how long the call will watch for the text pattern before giving up and returning control to your Python script. If they quit and give up because of the timeout, they do not raise an error; instead, awkwardly enough, they just return the text they have seen so far, and leave it to you to figure out whether that text contains the pattern. There are a few odds and ends in the Telnet object that we need not cover here. You will find them in the telnetlib Standard Library documentation, including an interact() method that lets the user “talk” directly over your Telnet connection using the terminal! This kind of call was very popular back in the old days, when you wanted to automate login but then take control and issue normal commands yourself.
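The timeout convention just described, returning whatever has arrived instead of raising an error, is easy to trip over. The following Python 3 sketch imitates it over a plain socket so that the check can be seen in isolation. It is not telnetlib's implementation: the read_until function here is a simplified stand-in written for this example (it returns everything received, even bytes past the expected text), and socket.socketpair() plays the part of the network connection.

```python
import socket
import time


def read_until(sock, expected, timeout):
    """Collect bytes from sock until `expected` appears or `timeout` seconds pass.

    Returns whatever was received in either case; a timeout raises no error.
    """
    deadline = time.monotonic() + timeout
    received = b""
    while expected not in received:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break
        if not chunk:  # the other end closed the connection
            break
        received += chunk
    return received


def demo():
    """Return (found_login, found_password) for a peer that sends only a login prompt."""
    local, remote = socket.socketpair()
    try:
        remote.sendall(b"Ubuntu\r\nlogin: ")
        first = read_until(local, b"login:", 1.0)
        second = read_until(local, b"assword:", 0.2)  # never arrives
        # The caller has to check for the pattern itself.
        return b"login:" in first, b"assword:" in second
    finally:
        local.close()
        remote.close()


print(demo())
```

The second call returns quietly after its timeout, so only the explicit `in` test tells the caller that the password prompt never came.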
Normally, each time a Telnet server sends an option request, telnetlib flatly refuses to send or receive that option. But you can provide a Telnet object with your own callback function for processing options; a modest example is shown in telnet_codes.py. For most options, it simply re-implements the default telnetlib behavior and refuses to handle any options (and always remember to respond to each option one way or another; failing to do so will often hang the Telnet session as the server waits forever for your reply). But if the server expresses interest in the “terminal type” option, then this client sends back a reply of “mypython,” which the shell command it runs after logging in then sees as its $TERM environment variable.
from telnetlib import Telnet, IAC, DO, DONT, WILL, WONT, SB, SE, TTYPE

def process_option(tsocket, command, option):
    if command == DO and option == TTYPE:
        # Agree to the terminal-type option, then announce our type.
        tsocket.sendall(IAC + WILL + TTYPE)
        print 'Sending terminal type "mypython"'
        tsocket.sendall(IAC + SB + TTYPE + '\0' + 'mypython' + IAC + SE)
    elif command in (DO, DONT):
        # Refuse every other option the server asks us to perform.
        print 'Will not', ord(option)
        tsocket.sendall(IAC + WONT + option)
    elif command in (WILL, WONT):
        # Refuse every option the server offers to perform itself.
        print 'Do not', ord(option)
        tsocket.sendall(IAC + DONT + option)

t = Telnet('localhost')
# t.set_debuglevel(1)  # uncomment this for debugging messages
t.set_option_negotiation_callback(process_option)
t.read_until('login:', 5)
t.write('brandon\n')
t.read_until('assword:', 5)  # so P can be capitalized or not
t.write('mypass\n')
n, match, previous_text = t.expect([r'Login incorrect', r'\$'], 10)
if n == 0:
    print "Username and password failed - giving up"
else:
    t.write('exec echo $TERM\n')
    print t.read_all()
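To make the bytes involved in this negotiation visible, here is a Python 3 sketch that computes the same replies as the callback in the listing, but as a pure function with no network connection. The function name reply_for is an assumption for this example. The numeric values are the standard Telnet command codes (IAC is 255, DO is 253, and so on) and the terminal-type option number 24, written out by hand so that the sketch does not depend on telnetlib.

```python
# Standard Telnet command bytes and the terminal-type option number.
IAC, DONT, DO, WONT, WILL, SB, SE = (
    bytes([n]) for n in (255, 254, 253, 252, 251, 250, 240)
)
TTYPE = bytes([24])


def reply_for(command, option, terminal_type=b"mypython"):
    """Return the bytes to send back for one option request from the server."""
    if command == DO and option == TTYPE:
        # Agree to the option, then send the terminal type in a subnegotiation.
        return (IAC + WILL + TTYPE
                + IAC + SB + TTYPE + b"\0" + terminal_type + IAC + SE)
    if command in (DO, DONT):
        return IAC + WONT + option   # refuse to perform the option ourselves
    if command in (WILL, WONT):
        return IAC + DONT + option   # ask the server not to perform it
    return b""


ECHO = bytes([1])  # option number 1 is the Telnet "echo" option
print(reply_for(DO, ECHO))
print(reply_for(WILL, ECHO))
print(reply_for(DO, TTYPE))
```

Every request gets some reply, which is the property the text warns about: a server that never hears back about an option may wait forever.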
The SSH protocol is one of the best-known examples of a secure, encrypted protocol among modern system administrators (HTTPS is probably the very best known).
SSH is descended from an earlier protocol that supported “remote login,” “remote shell,” and “remote file copy” commands named rlogin, rsh, and rcp, which in their time tended to become much more popular than Telnet at sites that supported them. You cannot imagine what a revelation rcp in particular was, unless you have spent hours trying to transfer a file between computers armed with only Telnet and a script that tries to type your password for you, only to discover that your file contains a byte that looks like a control character to Telnet or the remote terminal, and have the whole thing hang until you add a layer of escaping (or figure out how to disable both the Telnet escape key and all interpretation taking place on the remote terminal).
But the best feature of the rlogin family was that they did not just echo username and password prompts without actually knowing the meaning of what was going on. Instead, they stayed involved through the process of authentication, and you could even create a file in your home directory that told them “when someone named brandon tries to connect from the asaph machine, just let them in without a password.” Suddenly, system administrators and Unix users alike received back hours of each month that would otherwise have been spent typing their password. Suddenly, you could copy ten files from one machine to another nearly as easily as you could have copied them into a local folder. SSH has preserved all of these great features of the early remote-shell protocol, while bringing bulletproof security and hard encryption that is trusted worldwide for administering critical servers.
With SSH, we reach a protocol so sophisticated that it actually implements its own rules for multiplexing, so that several “channels” of information can all share the same SSH socket. Every block of information SSH sends across its socket is labeled with a “channel” identifier so that several conversations can share the socket. There are at least two reasons sub-channels make sense. First, even though the channel ID takes up a bit of bandwidth for every single block of information transmitted, the additional data is small compared to how much extra information SSH has to transmit to negotiate and maintain encryption anyway. Second, channels make sense because the real expense of an SSH connection is setting it up. Host key negotiation and authentication can together take up several seconds of real time, and once the connection is established, you want to be able to use it for as many operations as possible. Thanks to the SSH notion of a channel, you can amortize the high cost of connecting by performing many operations before you let the connection close. Once connected, you can create several kinds of channels:
An interactive shell session, like that supported by Telnet.
The individual execution of a single command.
A file-transfer session letting you browse the remote filesystem.
A port-forward that intercepts TCP connections.
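The idea of labeling every block with a channel identifier can be illustrated with a toy framing scheme. The following Python 3 sketch is emphatically not the SSH wire format; the 8-byte header (channel ID plus payload length) and the names frame and unframe are inventions for this illustration only. It shows how blocks from two conversations can be interleaved on one byte stream and still be sorted out at the other end.

```python
import struct
from collections import defaultdict

# Toy header: 32-bit channel id and 32-bit payload length, network byte order.
HEADER = struct.Struct("!II")


def frame(channel_id, payload):
    """Label one block of data with the channel it belongs to."""
    return HEADER.pack(channel_id, len(payload)) + payload


def unframe(stream):
    """Split concatenated frames back into (channel_id, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(stream):
        channel_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        frames.append((channel_id, stream[offset:offset + length]))
        offset += length
    return frames


# Two conversations interleaved on a single "socket".
wire = frame(1, b"One\n") + frame(2, b"A\n") + frame(1, b"Two\n") + frame(2, b"B\n")

per_channel = defaultdict(bytes)
for channel_id, payload in unframe(wire):
    per_channel[channel_id] += payload
print(dict(per_channel))
```

The per-block overhead here is eight bytes, which gives a feel for why the text calls the cost of channel labels small next to the cost of setting up the encrypted connection.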
SSH: The Secure Shell
An Overview of SSH
When an SSH client first connects to a remote host, they exchange temporary public keys that let them encrypt the rest of their conversation without revealing any information to any watching third parties. Then, before the client is willing to divulge any further information, it demands proof of the remote server's identity. This makes good sense as a first step: if you are really talking to a hacker who has temporarily managed to grab the remote server's IP, you do not want SSH to divulge even your username, much less your password.
One answer to the problem of proving a machine's identity is the public-key infrastructure used by TLS, in which a trusted authority signs each server's certificate. There are many problems with this system from the point of view of SSH. While it is true that you can build a public-key infrastructure internal to an organization, where you distribute your own signing authority's certificates to your web browsers or other applications and then can sign your own server certificates without paying a third party, a public-key infrastructure is still considered too cumbersome a process for something like SSH; server administrators want to set up, use, and tear down servers all the time, without having to talk to a central authority first.
So SSH has the idea that each server, when installed, creates its own random public-private key pair that is not signed by anybody. Instead, one of two approaches is taken to key distribution:
A system administrator writes a script that gathers up all of the host public keys in an organization, creates an ssh_known_hosts file listing them all, and places this file in the system-wide SSH configuration directory (/etc/ssh for OpenSSH) on every system in the organization. Now every SSH client will know about every SSH host key before they even connect for the first time.
Abandon the idea of knowing host keys ahead of time, and instead memorize them at the moment of first connection. Users of the SSH command line will be very familiar with this: the client says it does not recognize the host to which you are connecting, you reflexively answer “yes,” and its key gets stored in your ~/.ssh/known_hosts file. You actually have no guarantee that you are really talking to the host you think you are; but at least you will be guaranteed that every subsequent connection you ever make to that machine is going to the right place, and not to other servers that someone is swapping into place at the same IP address.
The familiar prompt from the SSH command line when it sees an unfamiliar host looks like this:
root@erlerobot:~# ssh asaph.rhodesmill.org
The authenticity of host 'asaph.rhodesmill.org (74.207.234.78)'
can't be established.
RSA key fingerprint is 85:8f:32:4e:ac:1f:e9:bc:35:58:c1:d4:25:e3:c7:8c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'asaph.rhodesmill.org,74.207.234.78' (RSA)
to the list of known hosts.
That “yes” answer buried deep on the next-to-last full line is the answer that I typed giving SSH the go-ahead to make the connection and remember the key for next time.
The paramiko library has full support for all of the normal SSH tactics surrounding host keys. But its default behavior is rather spare: it loads no host-key files by default, and will then, of course, raise an exception for the very first host to which you connect because it will not be able to verify its key. The exception that it raises is a bit uninformative; it is only by looking at the fact that it comes from inside the missing_host_key() function that I usually recognize what has caused the error. (Before trying this, install the paramiko module from the Python Package Index.)
>>> import paramiko
>>> client = paramiko.SSHClient()
>>> client.connect('my.example.com', username='test')
Traceback (most recent call last):
  ...
  File ".../paramiko/client.py", line 85, in missing_host_key
    raise SSHException('Unknown server %s' % hostname)
paramiko.SSHException: Unknown server my.example.com
SSH Host Keys
To behave like the normal SSH command, load both the system and the current user's known-host keys before making the connection:
>>> client.load_system_host_keys()
>>> client.load_host_keys('/home/brandon/.ssh/known_hosts')
>>> client.connect('my.example.com', username='test')
The paramiko library also lets you choose how you handle unknown hosts. Once you have a client object created, you can provide it with a decision-making class that is asked what to do if a host key is not recognized. You can build these classes yourself by inheriting from the MissingHostKeyPolicy class:
>>> class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
...     def missing_host_key(self, client, hostname, key):
...         return
...
>>> client.set_missing_host_key_policy(AllowAnythingPolicy())
>>> client.connect('my.example.com', username='test')
Note that, through the arguments to the missing_host_key() method, you receive several pieces of information on which to base your decision; you could, for example, allow connections to machines on your own server subnet without a host key, but disallow all others.
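The subnet idea could be sketched as follows. This Python 3 example contains only the decision itself, using the Standard Library ipaddress module, so that it runs without paramiko installed. The subnet 10.0.0.0/24 and the name host_is_trusted are hypothetical, and the sketch assumes that the hostname handed to the policy is a literal IP address; a DNS name is simply treated as untrusted.

```python
import ipaddress

# Hypothetical server subnet; substitute your own network.
TRUSTED_SUBNET = ipaddress.ip_network("10.0.0.0/24")


def host_is_trusted(hostname, subnet=TRUSTED_SUBNET):
    """Return True when hostname is a literal IP address inside subnet."""
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:  # not an IP literal, for example a DNS name
        return False
    return address in subnet


for name in ("10.0.0.17", "192.168.1.5", "my.example.com"):
    print(name, host_is_trusted(name))
```

A policy class like the one above could call such a function from its missing_host_key() method, returning normally when the host is trusted and raising an exception otherwise.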
Inside paramiko there are also several decision-making classes that already implement several basic host-key options:
paramiko.AutoAddPolicy: Host keys are automatically added to your user host-key store (the file ~/.ssh/known_hosts on Unix systems) when first encountered, but any change in the host key from then on will raise a fatal exception.
paramiko.RejectPolicy: Connecting to hosts with unknown keys simply raises an exception.
paramiko.WarningPolicy: An unknown host causes a warning to be logged, but the connection is then allowed to proceed.
The AutoAddPolicy never needs human interaction, but will at least assure you on subsequent encounters that you are still talking to the same machine as before.
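The remember-on-first-contact behavior described for AutoAddPolicy can be modeled in a few lines. The following Python 3 sketch is an illustration of the logic only, not paramiko's code: the store is a plain dictionary instead of a known_hosts file, the keys are placeholder byte strings, and the names check_host_key and HostKeyChanged are made up for this example.

```python
class HostKeyChanged(Exception):
    """Raised when a host presents a key different from the remembered one."""


def check_host_key(known_hosts, hostname, key):
    """Remember a host's key on first contact and verify it on later contacts."""
    remembered = known_hosts.get(hostname)
    if remembered is None:
        known_hosts[hostname] = key   # first encounter: trust and store
        return "added"
    if remembered != key:
        raise HostKeyChanged(hostname)
    return "verified"


known_hosts = {}
print(check_host_key(known_hosts, "my.example.com", b"placeholder-key-1"))
print(check_host_key(known_hosts, "my.example.com", b"placeholder-key-1"))
try:
    check_host_key(known_hosts, "my.example.com", b"placeholder-key-2")
except HostKeyChanged as error:
    print("host key changed for", error)
```

The first call is the moment of blind trust; every call after it can detect a server that has been swapped into place at the same name.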
Since this chapter is primarily about how to “speak SSH” from Python, I will just briefly outline how authentication works. There are generally three ways to prove your identity to a remote server you are contacting through SSH:
You can provide a username and password.
You can provide a username, and then have your client successfully perform a public-key challenge-response. This clever operation manages to prove that you are in possession of a secret “identity” key without actually exposing its contents to the remote system.
You can perform Kerberos authentication. If the remote system is set up to allow Kerberos, and if you have run the kinit command-line tool to prove your identity to one of the master Kerberos servers in the SSH server's authentication domain, then you should be allowed in without a password.
Since option 3 is very rare, we will concentrate on the first two. Using a username and password with paramiko is very easy: you simply provide them in your call to the connect() method:
>>> client.connect('my.example.com', username='brandon', password=mypass)
Public-key authentication, where you use ssh-keygen to create an “identity” key pair (which is typically stored in your ~/.ssh directory) that can be used to authenticate you without a password, makes the Python code even easier.
>>> client.connect('my.example.com')
If your identity key file is stored somewhere other than in the normal ~/.ssh/id_rsa file, then you can provide its filename, or a whole Python list of filenames, to the connect() method manually:
>>> client.connect('my.example.com', key_filename='/home/brandon/.ssh/id_sysadmin')
Once the connect() method has succeeded, you are now ready to start performing remote operations, all of which will be forwarded over the same physical socket without requiring re-negotiation of the host key, your identity, or the encryption that protects the SSH socket itself.
SSH Authentication
Once you have a connected SSH client, the entire world of SSH operations is open to you. Simply by asking, you can access remote-shell sessions, run individual commands, commence file-transfer sessions, and set up port forwarding.
First, SSH can set up a raw shell session for you, running on the remote end inside a pseudo-terminal so that programs act like they normally do when they are interacting with the user at a terminal. This kind of connection behaves very much like a Telnet connection; take a look at ssh_simple.py for an example, which pushes a simple echo command at the remote shell, and then asks it to exit.
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # password='')

# Start an interactive shell and wrap the channel in file-like objects.
channel = client.invoke_shell()
stdin = channel.makefile('wb')
stdout = channel.makefile('rb')
stdin.write('echo Hello, world\rexit\r')
print stdout.read()
client.close()
If you actually run this script, you will see that the commands you type are echoed back to you twice, and that there is no obvious way to separate these command echoes from the actual command output.
Because of quirky terminal-dependent behaviors, you should generally avoid ever using invoke_shell() unless you are actually writing an interactive terminal program where you let a live user type commands. A much better option for running remote commands is to use exec_command(), which, instead of starting up a whole shell session, just runs a single command, giving you control of its standard input, output, and error streams just as though you had run it using the subprocess module in the Standard Library. As we have seen, this module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
A script demonstrating its use is shown in ssh_commands.py. The difference between exec_command() and a local subprocess is that you do not get the chance to pass command-line arguments as separate strings; instead, you have to pass a whole command line for interpretation by the shell on the remote end.
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # password='')

# Each command runs in its own channel with its own three streams.
for command in 'echo "Hello, world!"', 'uname', 'uptime':
    stdin, stdout, stderr = client.exec_command(command)
    stdin.close()
    print repr(stdout.read())
    stdout.close()
    stderr.close()

client.close()
Shell Sessions and Individual Commands
Every time you start a new SSH shell session with invoke_shell(), and every time you kick off a command with exec_command(), a new SSH “channel” is created behind the scenes, which is what provides the file-like Python objects that let you talk to the remote command's standard input, output, and error. Channels, as just explained, can run in parallel, and SSH will cleverly interleave their data on your single SSH connection so that all of the conversations happen simultaneously without ever becoming confused.
Take a look at ssh_threads.py for a very simple example of what is possible. Here, two “commands” are kicked off remotely, which are each a simple shell script with some echo commands interspersed with pauses created by calls to sleep. The threading module constructs higher-level threading interfaces on top of the lower-level thread module.
import threading
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # password='')

def read_until_EOF(fileobj):
    # readline() returns an empty string at end-of-file.
    s = fileobj.readline()
    while s:
        print s.strip()
        s = fileobj.readline()

# Element [1] of each result is the command's standard output.
out1 = client.exec_command('echo One;sleep 2;echo Two;sleep 1;echo Three')[1]
out2 = client.exec_command('echo A;sleep 1;echo B;sleep 2;echo C')[1]
thread1 = threading.Thread(target=read_until_EOF, args=(out1,))
thread2 = threading.Thread(target=read_until_EOF, args=(out2,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

client.close()
In order to be able to process these two streams of data simultaneously, we are kicking off two threads, and are handing each of them one of the channels from which to read. They each print out each line of new information as soon as it arrives, and finally exit when the readline() command indicates end-of-file by returning an empty string. When run, this script should print something like this:
root@erlerobot:~/Python_files# python ssh_threads.py
One
A
B
Two
Three
C
SSH channels over the same TCP connection are completely independent, can each receive (and send) data at their own pace, and can close independently when the particular command that they are talking to finally terminates.
Version 2 of the SSH protocol includes a sub-protocol called the “SSH File Transfer Protocol” (SFTP) that lets you walk the remote directory tree, create and delete directories and files, and copy files back and forth from the local to the remote machine. The capabilities of SFTP are so complex and complete, in fact, that they support not only simple file-copy operations, but can power graphical file browsers and can even let the remote filesystem be mounted locally.
The individual SFTP commands are described in the bare paramiko documentation for the Python SFTP client (http://www.lag.net/paramiko/docs/paramiko.SFTPClient-class); here are the main things to remember when doing SFTP:
The SFTP protocol is stateful, just like FTP, and just like your normal shell account. So you can either pass all file and directory names as absolute paths that start at the root of the filesystem, or use getcwd() and chdir() to move around the filesystem and then use paths that are relative to the directory in which you have arrived.
You can open a file using either the file() or open() method and you get back a file-like object connected to an SSH channel that runs independently of your SFTP channel.
Because each open remote file gets an independent channel, file transfers can happen asynchronously; you can open many remote files at once and have them all streaming down to your disk drive, or open new files and be sending data the other way.
Finally, keep in mind that no shell expansion is done on any of the filenames you pass across SFTP. If you try using a filename like * or one that has spaces or special characters, they are simply interpreted as part of the filename. This means that any support for pattern-matching that you want to provide to the user has to be through fetching the directory contents yourself and then checking their pattern against each one, using a routine like those provided in fnmatch in the Python Standard Library. The fnmatch module provides support for Unix shell-style wildcards, which are not the same as regular expressions.
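Since SFTP performs no wildcard expansion, the matching has to happen on the client. Here is a small sketch, written for Python 3, of how fnmatch could be applied to a directory listing. The function name match_remote_names and the sample listing are assumptions for illustration; in a real session the names would come from a call such as sftp.listdir().

```python
import fnmatch

def match_remote_names(names, pattern):
    """Return the names that match a shell-style wildcard pattern.

    Hypothetical helper: `names` stands in for the result of a remote
    directory listing.
    """
    return sorted(fnmatch.filter(names, pattern))

# Hard-coded sample listing used in place of a live SFTP connection.
listing = ['messages', 'messages.1', 'messages.2.gz', 'syslog', 'auth.log']
print(match_remote_names(listing, 'messages.*'))
```

Note that the pattern 'messages.*' requires the literal dot, so the bare name 'messages' is not selected.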
A very modest example SFTP session is shown in sftp.py. It does something simple that system administrators might often need: it connects to the remote system and copies messages log files out of the /var/log directory, perhaps for scanning or analysis on the local machine. The functools module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function for the purposes of this module, as shown in sftp.py:
import functools
import paramiko

class AllowAnythingPolicy(paramiko.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

client = paramiko.SSHClient()
client.set_missing_host_key_policy(AllowAnythingPolicy())
client.connect('127.0.0.1', username='test')  # password='')

def my_callback(filename, bytes_so_far, bytes_total):
    print 'Transfer of %r is at %d/%d bytes (%.1f%%)' % (
        filename, bytes_so_far, bytes_total, 100. * bytes_so_far / bytes_total)

sftp = client.open_sftp()
sftp.chdir('/var/log')
for filename in sorted(sftp.listdir()):
    if filename.startswith('messages.'):
        callback_for_filename = functools.partial(my_callback, filename)
        sftp.get(filename, filename, callback=callback_for_filename)

client.close()
SFTP: File Transfer Over SSH
Note that, although I made a big deal of talking about how each file that you open with SFTP uses its own independent channel, the simple get() and put() convenience functions provided by paramiko (which are really lightweight wrappers for an open() followed by a loop that reads and writes) do not attempt any asynchrony, but instead just block and wait until each whole file has arrived. This means that the foregoing script calmly transfers one file at a time, producing output that looks something like this:
root@erlerobot:~/Python_files# python sftp.py
Transfer of 'messages.1' is at 32768/128609 bytes (25.5%)
Transfer of 'messages.1' is at 65536/128609 bytes (51.0%)
Transfer of 'messages.1' is at 98304/128609 bytes (76.4%)
Transfer of 'messages.1' is at 128609/128609 bytes (100.0%)
Transfer of 'messages.2.gz' is at 32768/40225 bytes (81.5%)
Transfer of 'messages.2.gz' is at 40225/40225 bytes (100.0%)
Transfer of 'messages.3.gz' is at 28249/28249 bytes (100.0%)
Transfer of 'messages.4.gz' is at 32768/71703 bytes (45.7%)
Transfer of 'messages.4.gz' is at 65536/71703 bytes (91.4%)
Transfer of 'messages.4.gz' is at 71703/71703 bytes (100.0%)
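The functools.partial() step in sftp.py can be tried on its own, without an SSH server. The sketch below is written for Python 3 and is a variation on the listing rather than a copy: the callback returns its message instead of printing it, purely so that the result is easy to inspect.

```python
import functools

def my_callback(filename, bytes_so_far, bytes_total):
    """Format a progress message for one file transfer."""
    return 'Transfer of %r is at %d/%d bytes (%.1f%%)' % (
        filename, bytes_so_far, bytes_total,
        100.0 * bytes_so_far / bytes_total)

# partial() fixes the first argument, leaving a two-argument callable of the
# shape that sftp.py hands to sftp.get() as its callback.
callback_for_filename = functools.partial(my_callback, 'messages.1')
print(callback_for_filename(32768, 128609))
```

Binding the filename ahead of time is what lets one generic function report progress for many different files.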
The File Transfer Protocol (FTP) was once among the most widely used protocols on the Internet, invoked whenever a user wanted to transfer files between Internet-connected computers.
In this chapter we will examine this protocol and study the possible alternatives.
File Transfer Protocol (FTP)
Today, there are better alternatives than the FTP protocol for pretty much anything you could want to do with it.
The biggest problem with the protocol is its lack of security: not only files, but usernames and passwords are sent completely in the clear and can be viewed by anyone observing network traffic.
A second issue is that an FTP user tends to make a connection, choose a working directory, and do several operations all over the same network connection. Modern Internet services, with millions of users, prefer protocols like HTTP that consist of short, completely self-contained requests, instead of long-running FTP connections that require the server to remember things like a current working directory.
A final big issue is filesystem security. The early FTP servers, instead of showing users just a sliver of the host filesystem that the owner wanted exposed, tended to simply expose the entire filesystem, letting users cd to / and snoop around to see how the system was configured.
So what are the alternatives?
For file download, HTTP is the standard protocol on today’s Internet, protected with SSL when necessary for security. Instead of exposing system-specific filename conventions like FTP, HTTP supports system-independent URLs.
Anonymous upload is a bit less standard, but the general tendency is to use a form on a web page that instructs the browser to use an HTTP POST operation to transmit the file that the user selects.
File synchronization has improved immeasurably since the days when a recursive FTP file copy was the only common way to get files to another computer. Instead of wastefully copying every file, modern commands like rsync or rdist efficiently compare files at both ends of the connection and copy only the ones that are new or have changed.
Full filesystem access is actually the one area where FTP can still commonly be found on today’s Internet: thousands of cut-rate ISPs continue to support FTP, despite its insecurity, as the means by which users copy their media and (typically) PHP source code into their web account. A much better alternative today is for service providers to support SFTP instead.
What to Use Instead of FTP
FTP is unusual because, by default, it actually uses two TCP connections during operation. One connection is the control channel, which carries commands and the resulting acknowledgments or error codes. The second connection is the data channel, which is used solely for transmitting file data or other blocks of information, such as directory listings. Technically, the data channel is full duplex, meaning that it allows files to be transmitted in both directions simultaneously. However, in actual practice, this capability is rarely used.
The process of downloading a file from an FTP server runs mostly like this:
1. First, the FTP client establishes a command connection by connecting to the FTP port on the server.
2. The client authenticates itself, usually with username and password.
3. The client changes directory on the server to where it wants to deposit or retrieve files.
4. The client begins listening on a new port for the data connection, and then informs the server about that port.
5. The server connects to the port the client requested.
6. The file is transmitted.
7. The data connection is closed.
FTP also supports what is known as passive mode. In this scenario, the data connection is made backward: the server opens an extra port, and tells the client to make the second connection. Other than that, everything behaves the same way.
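To make the passive-mode handshake a little more concrete: in RFC 959 the server announces its extra port in a 227 reply whose parentheses hold six numbers, four for the IP address and two for the port (high byte, then low byte). The sketch below, written for Python 3, decodes such a reply. The function name parse_pasv_reply and the sample reply are assumptions for illustration; ftplib performs this step for you internally.

```python
import re

# Six comma-separated numbers inside parentheses: h1,h2,h3,h4,p1,p2
_PASV_RE = re.compile(r'\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)')

def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 Entering Passive Mode' reply."""
    match = _PASV_RE.search(reply)
    if match is None:
        raise ValueError('not a passive-mode reply: %r' % reply)
    numbers = [int(n) for n in match.groups()]
    host = '.'.join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # high byte, then low byte
    return host, port

print(parse_pasv_reply('227 Entering Passive Mode (192,0,2,10,195,80)'))
# ('192.0.2.10', 50000)
```

The client would then open its second TCP connection to that host and port, which is the "backward" data connection described above.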
Communication Channels
The Python module ftplib is the primary interface to FTP for Python programmers. It handles the details of establishing the various connections for you, and provides convenient ways to automate common commands. You can use this to write Python programs that perform a variety of automated FTP jobs, such as mirroring other FTP servers. It is also used by the module urllib to handle URLs that use FTP. For more information on FTP (File Transfer Protocol), see Internet RFC 959.
connect.py shows a very basic ftplib example. The program connects to a remote server, displays the welcome message, and prints the current working directory.
from ftplib import FTP

f = FTP('ftp.ibiblio.org')
print "Welcome:", f.getwelcome()
f.login()
print "Current working directory:", f.pwd()
f.quit()
Recall that an FTP session can visit different directories, just like a shell prompt can move between locations with cd. Here, the pwd() function returns the current working directory on the remote side of the connection. Finally, the quit() function logs out and closes the connection. Here is what the program outputs when run:
root@erlerobot:~/Python_files# python connect.py
Welcome: 220 ProFTPD Server
Current working directory: /
Using FTP in Python
When making an FTP transfer, you have to decide whether you want the file treated as a monolithic block of binary data, or whether you want it parsed as a text file so that your local machine can paste its lines back together using whatever end-of-line character is native to your platform. A file transferred in so-called “ASCII mode” is delivered one line at a time, so that you can glue the lines back together on the local machine using its own line-ending convention. Take a look at asciidl.py for a Python program that downloads a well-known text file and saves it in your local directory.
import os
from ftplib import FTP

if os.path.exists('README'):
    raise IOError('refusing to overwrite your README file')

def writeline(data):
    fd.write(data)
    fd.write(os.linesep)

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel')

fd = open('README', 'w')
f.retrlines('RETR README', writeline)
fd.close()

f.quit()
In the example, the cwd() function selects a new working directory on the remote system. Then the retrlines() function begins the transfer. Its first parameter specifies a command to run on the remote system, usually RETR, followed by a file name. Its second parameter is a function that is called, over and over again, as each line of the text file is retrieved; if omitted, the data is simply printed to standard output. The lines are passed with the end-of-line character stripped, so the homemade writeline() function simply appends your system’s standard line ending to each line as it is written out. Try running this program; there should be a file in your current directory named README after the program is done. Basic binary file transfers work in much the same way as text-file transfers; binarydl.py shows an example.
import os
from ftplib import FTP

if os.path.exists('patch8.gz'):
    raise IOError('refusing to overwrite your patch8.gz file')

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel/v1.0')

fd = open('patch8.gz', 'wb')
f.retrbinary('RETR patch8.gz', fd.write)
fd.close()

f.quit()
When run, it deposits a file named patch8.gz in your current working directory. The retrbinary() function simply passes blocks of data to the specified function. This is convenient, since a file object’s write() function expects just such data, so in this case no custom function is necessary.
ASCII and Binary Files
The ftplib module provides a second function that can be used for binary downloading: ntransfercmd(). This command provides a lower-level interface, but can be useful if you want to know a little bit more about what’s going on during the download. In particular, this more advanced command lets you keep track of the number of bytes transferred, and you can use that information to display status updates for the user. advbinarydl.py shows a sample program that uses ntransfercmd().
import os, sys
from ftplib import FTP

if os.path.exists('linux-1.0.tar.gz'):
    raise IOError('refusing to overwrite your linux-1.0.tar.gz file')

f = FTP('ftp.kernel.org')
f.login()
f.cwd('/pub/linux/kernel/v1.0')
f.voidcmd("TYPE I")

datasock, size = f.ntransfercmd("RETR linux-1.0.tar.gz")
bytes_so_far = 0
fd = open('linux-1.0.tar.gz', 'wb')

while 1:
    buf = datasock.recv(2048)
    if not buf:
        break
    fd.write(buf)
    bytes_so_far += len(buf)
    print "\rReceived", bytes_so_far,
    if size:
        print "of %d total bytes (%.1f%%)" % (
            size, 100 * bytes_so_far / float(size)),
    else:
        print "bytes",
    sys.stdout.flush()

fd.close()
datasock.close()
f.voidresp()
f.quit()
There are a few new things to note here. First comes the call to voidcmd(). This passes an FTP command directly to the server, checks for an error, but returns nothing. In this case, the raw command is TYPE I. That sets the transfer mode to “image,” which is how FTP refers internally to binary files. In the previous example, retrbinary() automatically ran this command behind the scenes, but the lower-level ntransfercmd() does not. Next, note that ntransfercmd() returns a tuple consisting of a data socket and an estimated size. Always bear in mind that the size is merely an estimate, and should not be considered authoritative; the file may end sooner, or it might go on much longer, than this value. Also, if a size estimate from the FTP server is simply not available, then the estimated size returned will be None.
After receiving the data, it is important to close the data socket and call voidresp(), which reads the command response code from the server, raising an exception if there was any error during transmission. Even if you do not care about detecting errors, failing to call voidresp() will make future commands likely to fail because the server’s output socket will be blocked waiting for you to read the results. Here is an example of running this program:
root@erlerobot:~/Python_files# python advbinarydl.py
Received 1259161 of 1259161 bytes (100.0%)
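The status line built inside the download loop has to cope with a size estimate that may be None. As a sketch of that logic in isolation, written for Python 3, the helper below returns the message as a string. The name format_progress is an assumption for illustration; the message wording follows the print statements in advbinarydl.py.

```python
def format_progress(bytes_so_far, size):
    """Build a status line; `size` may be None when no estimate exists."""
    if size:
        return 'Received %d of %d total bytes (%.1f%%)' % (
            bytes_so_far, size, 100.0 * bytes_so_far / size)
    return 'Received %d bytes' % bytes_so_far

print(format_progress(2048, None))
print(format_progress(629580, 1259161))
```

Because the size is only an estimate, the percentage can legitimately stop short of, or run past, 100 percent; the helper reports whatever the numbers say rather than clamping them.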
Advanced Binary Downloading
File data can also be uploaded through FTP. As with downloading, there are two basic functions for uploading: storbinary() and storlines(). Both take a command to run, and a file-like object to transmit. The storbinary() function will call the read() method repeatedly on that object until its content is exhausted, while storlines(), by contrast, calls the readline() method. Unlike the corresponding download functions, these methods do not require you to provide a callable function of your own. (But you could, of course, pass a file-like object of your own crafting whose read() or readline() method computes the outgoing data as the transmission proceeds.) binaryul.py shows how to upload a file in binary mode.
from ftplib import FTP
import sys, getpass, os.path

if len(sys.argv) != 5:
    print "usage: %s <host> <username> <localfile> <remotedir>" % (
        sys.argv[0])
    sys.exit(2)

host, username, localfile, remotedir = sys.argv[1:]
password = getpass.getpass(
    "Enter password for %s on %s: " % (username, host))

f = FTP(host)
f.login(username, password)
f.cwd(remotedir)

fd = open(localfile, 'rb')
f.storbinary('STOR %s' % os.path.basename(localfile), fd)
fd.close()

f.quit()
This program looks quite similar to our earlier efforts. Since most anonymous FTP sites do not permit file uploading, you will have to find a server somewhere to test it against; I simply installed the old, venerable ftpd on my laptop for a few minutes and ran the test like this:
root@erlerobot:~/Python_files# python binaryul.py localhost brandon test.txt /tmp
You can modify this program to upload a file in ASCII mode by simply changing storbinary() to storlines().
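The remark about passing a file-like object of your own crafting can be illustrated without a server. The sketch below, written for Python 3, defines a hypothetical GeneratedUpload class whose read() computes its data on demand, plus a drain() helper that imitates the repeated read() calls storbinary() is described as making. Both names are assumptions for illustration.

```python
class GeneratedUpload(object):
    """File-like object whose read() computes the outgoing data on demand."""

    def __init__(self, line_count):
        self.remaining = line_count   # lines still to be generated
        self.number = 0               # last line number produced
        self.buffer = b''             # generated but not yet returned

    def read(self, blocksize):
        # Generate lines until the buffer can satisfy the request.
        while len(self.buffer) < blocksize and self.remaining:
            self.number += 1
            self.remaining -= 1
            self.buffer += ('line %d\r\n' % self.number).encode('ascii')
        chunk, self.buffer = self.buffer[:blocksize], self.buffer[blocksize:]
        return chunk                  # b'' signals end of data

def drain(fileobj, blocksize=8192):
    """Imitate an upload loop: call read() until it returns nothing."""
    chunks = []
    while True:
        buf = fileobj.read(blocksize)
        if not buf:
            break
        chunks.append(buf)
    return b''.join(chunks)

data = drain(GeneratedUpload(3), blocksize=5)
print(data)
```

With a live connection, an object like this could be handed to storbinary() in place of the open file, for example f.storbinary('STOR generated.txt', GeneratedUpload(1000)), where the remote name generated.txt is made up for this example.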
Uploading Data
Just like the download process had a complicated raw version, it is also possible to upload files “by hand” using ntransfercmd(), as shown in advbinaryul.py.
from ftplib import FTP
import sys, getpass, os.path

BLOCKSIZE = 8192  # chunk size to read and transmit: 8 kB

if len(sys.argv) != 5:
    print "usage: %s <host> <username> <localfile> <remotedir>" % (
        sys.argv[0])
    exit(2)

host, username, localfile, remotedir = sys.argv[1:]
password = getpass.getpass("Enter password for %s on %s: " % \
    (username, host))

f = FTP(host)
f.login(username, password)
f.cwd(remotedir)
f.voidcmd("TYPE I")

fd = open(localfile, 'rb')
datasock, esize = f.ntransfercmd('STOR %s' % os.path.basename(localfile))
size = os.stat(localfile)[6]
bytes_so_far = 0

while 1:
    buf = fd.read(BLOCKSIZE)
    if not buf:
        break
    datasock.sendall(buf)
    bytes_so_far += len(buf)
    print "\rSent", bytes_so_far, "of", size, "bytes", \
        "(%.1f%%)\r" % (100 * bytes_so_far / float(size))
    sys.stdout.flush()

datasock.close()
fd.close()
f.voidresp()
f.quit()
Now we can perform an upload that continuously displays its status as it progresses:
root@erlerobot:~/Python_files# python advbinaryul.py localhost brandon patch8.gz /tmp
Enter password for brandon on localhost:
Sent 6408 of 6408 bytes (100.0%)
Advanced Binary Uploading
Like most Python modules, ftplib will raise an exception when an error occurs. It defines several exceptions of its own, and it can also raise socket.error and IOError. As a convenience, it offers a tuple, named ftplib.all_errors, that lists all of the exceptions that can possibly be raised by ftplib. This is often a useful shortcut for writing a try…except clause.
One of the problems with the basic retrbinary() function is that, in order to use it easily, you will usually wind up opening the file on the local end before beginning the transfer on the remote side. If your command aimed at the remote side retorts that the file does not exist, or if the RETR command otherwise fails, then you will have to close and delete the local file you have just created (or else wind up littering the filesystem with zero-length files).
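The cleanup just described can be sketched as follows, in Python 3. The download() helper and the MissingFileServer stand-in class are assumptions made so that the example runs without a network; only ftplib.all_errors and ftplib.error_perm come from the standard library.

```python
import ftplib
import os
import tempfile

def download(ftp, remote_name, local_name):
    """Fetch remote_name; delete the local file again if the transfer fails."""
    fd = open(local_name, 'wb')
    try:
        ftp.retrbinary('RETR %s' % remote_name, fd.write)
    except ftplib.all_errors:
        fd.close()
        os.remove(local_name)   # do not leave a zero-length file behind
        raise
    fd.close()

class MissingFileServer(object):
    """Stand-in for an FTP connection whose RETR command always fails."""
    def retrbinary(self, command, callback):
        raise ftplib.error_perm('550 No such file')

target = os.path.join(tempfile.mkdtemp(), 'patch8.gz')
try:
    download(MissingFileServer(), 'patch8.gz', target)
except ftplib.error_perm as e:
    print('download failed:', e)
print('local file left behind?', os.path.exists(target))
```

Re-raising after the cleanup keeps the error visible to the caller while still guaranteeing that no empty file is left on disk.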
With the ntransfercmd() method, by contrast, you can check for a problem prior to opening a local file. advbinarydl.py already follows these guidelines: if ntransfercmd() fails, the exception will cause the program to terminate before the local file is opened.
Scanning Directories
FTP provides two ways to discover information about server files and directories. These are implemented in ftplib as the nlst() and dir() methods.
The nlst() method returns a list of entries in a given directory: all of the files and directories inside. However, the bare names are all that is returned. There is no other information about which particular entries are files or are directories, on the sizes of the files present, or anything else.
The more powerful dir() function returns a directory listing from the remote. This listing is in a system-defined format, but typically contains a file name, size, modification date, and file type. On UNIX servers, it is typically the output of one of these two shell commands:
root@erlerobot:~# ls -l
root@erlerobot:~# ls -la
nlst.py shows an example of using nlst() to get directory information.
from ftplib import FTP

f = FTP('ftp.ibiblio.org')
f.login()
f.cwd('/pub/academic/astronomy/')
entries = f.nlst()
entries.sort()
print len(entries), "entries:"
for entry in entries:
    print entry
f.quit()
When you run this program, you will see output like this:
root@erlerobot:~/Python_files# python nlst.py
13 entries:
INDEX
README
ephem_4.28.tar.Z
hawaii_scope
incoming
jupitor-moons.shar.Z
lunar.c.Z
lunisolar.shar.Z
moon.shar.Z
planetary
sat-track.tar.Z
stars.tar.Z
xephem.tar.Z
Handling Errors
If you were to use an FTP client to manually log on to the server, you would see the same files listed. Notice that the filenames are in a convenient format for automated processing (a bare list of file names) but that there is no extra information. The result will be different when we try another file listing command in dir.py:
from ftplib import FTP

f = FTP('ftp.ibiblio.org')
f.login()
f.cwd('/pub/academic/astronomy/')
entries = []
f.dir(entries.append)
print "%d entries:" % len(entries)
for entry in entries:
    print entry
f.quit()
Contrast the bare list of file names we saw earlier with the output from dir.py, which uses dir():
root@erlerobot:~/Python_files# python dir.py
13 entries:
-rw-r--r--   1 (?)      (?)           750 Feb 14  1994 INDEX
-rw-r--r--   1 root     bin           135 Feb 11  1999 README
-rw-r--r--   1 (?)      (?)        341303 Oct  2  1992 ephem_4.28.tar.Z
drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 hawaii_scope
drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 incoming
-rw-r--r--   1 (?)      (?)          5983 Oct  2  1992 jupitor-moons.shar.Z
-rw-r--r--   1 (?)      (?)          1751 Oct  2  1992 lunar.c.Z
-rw-r--r--   1 (?)      (?)          8078 Oct  2  1992 lunisolar.shar.Z
-rw-r--r--   1 (?)      (?)         64209 Oct  2  1992 moon.shar.Z
drwxr-xr-x   2 (?)      (?)          4096 Jan  6  1993 planetary
-rw-r--r--   1 (?)      (?)        129969 Oct  2  1992 sat-track.tar.Z
-rw-r--r--   1 (?)      (?)         16504 Oct  2  1992 stars.tar.Z
-rw-r--r--   1 (?)      (?)        410650 Oct  2  1992 xephem.tar.Z
The dir() method takes a function that it calls for each line, delivering the directory listing in pieces just like retrlines() delivers the contents of particular files. Here, we simply supply the append() method of our plain old Python entries list.
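When the server does happen to answer in the Unix ls -l layout, the lines collected from dir() can be picked apart by hand. The sketch below, written for Python 3, assumes that nine-column layout and nothing else; the function name split_listing and the two sample lines are assumptions for illustration, and a server using another format would defeat it.

```python
def split_listing(lines):
    """Split Unix-style `ls -l` lines into (directories, files) by name."""
    directories, files = [], []
    for line in lines:
        # perms, links, owner, group, size, month, day, year/time, name
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue                  # not in the assumed layout; skip it
        name = fields[8]
        if line.startswith('d'):      # leading 'd' marks a directory
            directories.append(name)
        else:
            files.append(name)
    return directories, files

sample = [
    '-rw-r--r--   1 root     bin           135 Feb 11  1999 README',
    'drwxr-xr-x   2 (?)      (?)          4096 Feb 11  1999 incoming',
]
print(split_listing(sample))
```

Limiting the split to eight cuts keeps a file name that itself contains spaces in one piece.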
If you cannot guarantee what information an FTP server might choose to return from its dir() command, how are you going to tell directories from normal files, an essential step to downloading entire trees of files from the server? The answer, shown in recursedl.py, is to simply try a cwd() into every name that nlst() returns and, if you succeed, conclude that the entity is a directory. This sample program does not do any actual downloading; instead, to keep things simple, it only prints out the directories it visits to the screen.
import os, sys
from ftplib import FTP, error_perm

def walk_dir(f, dirpath):
    original_dir = f.pwd()
    try:
        f.cwd(dirpath)
    except error_perm:
        return  # ignore non-directories and ones we cannot enter
    print dirpath
    names = f.nlst()
    for name in names:
        walk_dir(f, dirpath + '/' + name)
    f.cwd(original_dir)  # return to cwd of our caller

f = FTP('ftp.kernel.org')
f.login()
walk_dir(f, '/pub/linux/kernel/Historic/old-versions')
f.quit()
This sample program will run a bit slow (there are, it turns out, quite a few files in the old-versions directory on the Linux Kernel Archive) but within a few dozen seconds, you should see the resulting directory tree displayed on the screen:
root@erlerobot:~/Python_files# python recursedl.py
/pub/linux/kernel/Historic/old-versions
/pub/linux/kernel/Historic/old-versions/impure
/pub/linux/kernel/Historic/old-versions/old
/pub/linux/kernel/Historic/old-versions/old/corrupt
/pub/linux/kernel/Historic/old-versions/tytso
Detecting Directories and Recursive Download
Finally, FTP supports file deletion, and supports both the creation and deletion of directories. These more obscure calls are all described in the ftplib documentation:
delete(filename) will delete a file from the server.
mkd(dirname) attempts to create a new directory.
rmd(dirname) will delete a directory; note that most systems require the directory to be empty first.
rename(oldname, newname) works, essentially, like the Unix command mv: if both names are in the same directory, the file is essentially re-named; but if the destination specifies a name in a different directory, then the file is actually moved.
To use TLS, create your FTP connection with the FTP_TLS class instead of the plain FTP class; simply by doing this, your username and password and, in fact, the entire FTP command channel will be protected from prying eyes. If you then additionally run the class’s prot_p() method (it takes no arguments), then the FTP data connection will be protected as well. Should you for some reason want to return to using an un-encrypted data connection during the session, there is a prot_c() method that returns the data stream to normal. Again, your commands will continue to be protected as long as you are using the FTP_TLS class.
Check the Python Standard Library documentation for more details (they include a small code sample) if you wind up needing this extension to FTP: http://docs.python.org/library/ftplib.html#ftplib.FTP_TLS
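As a rough sketch of the steps just described, written for Python 3, the function below logs in over TLS, protects the data connection with prot_p(), and fetches a listing. The function name and its parameters are assumptions for illustration, and it needs a real FTP server that supports TLS, so it is defined here but not called.

```python
from ftplib import FTP_TLS

def list_directory_securely(host, username, password, directory):
    """Return the names in `directory`, using TLS for commands and data."""
    ftps = FTP_TLS(host)             # same interface as the plain FTP class
    ftps.login(username, password)   # credentials travel over the protected channel
    ftps.prot_p()                    # protect the data connection as well
    ftps.cwd(directory)
    names = ftps.nlst()
    ftps.quit()
    return names
```

Calling prot_c() at any point after prot_p() would switch the data connection back to clear text while leaving the command channel protected.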
Creating Directories, Deleting Things
Doing FTP Securely
Remote Procedure Call (RPC) systems let you call a remote function using the same syntax that you would use when calling a routine in a local API or library. This tends to be useful in two situations: First, when your program has a lot of work to do, and you want to spread it across several machines by making calls across the network; and second, when you need data or information that is only available on another hard drive or network.
In this chapter we will try to get to know RPC better and learn how we can use it in combination with Python.
Remote Procedure Call (RPC)
Besides serving their essential purpose of letting you make what appear to be local function or method calls that are in fact passing across the network to a different server, RPC protocols have several key features, and also some differences, that you should keep in mind when choosing and then deploying an RPC client or server.
First, every RPC mechanism has limits on the kind of data you can pass. The most popular protocols, therefore, support only a few kinds of numbers and strings; one sequence or list data type; and then something like a struct or associative array.
A second common feature is the ability of the server to signal that an exception occurred while it was running the remote function. In such cases, the client RPC library will typically raise an exception itself to tell the client that something has gone wrong.
Third, many RPC mechanisms provide introspection, which is a way for clients to list the calls that are supported and perhaps to discover what arguments they take.
Fourth, each RPC mechanism needs to support some addressing scheme whereby you can reach out and connect to a particular remote API. Some such mechanisms are quite complicated, and they might even have the ability to automatically connect you to the correct server on your network for performing a particular task, without your having to know its name beforehand. Other mechanisms are quite simple and just ask you for the IP address, port number, or URL of the service you want to access. These mechanisms expose the underlying network addressing scheme, rather than creating a scheme of their own.
Finally, some RPC mechanisms support authentication, access control, and even full impersonation of particular user accounts when RPC calls are made by several different client programs wielding different credentials.
Features of RPC
XML-RPC has native support in Python precisely because it was one of the first RPC protocols of the Internet age, operating natively over HTTP instead of insisting on its own on-the-wire protocol. This means our examples will not even require any third-party modules. While we will see that this makes our RPC server somewhat less capable than if we moved to a third-party library, this will also make the examples good ones for an initial foray into RPC.
If you have ever used raw XML, then you are familiar with the fact that it lacks any data-type semantics; it cannot represent numbers, for example, but only elements that contain other elements, text strings, and text-string attributes. Thus the XML-RPC specification has to build additional semantics on top of the plain XML document format in order to specify how things like numbers should look when converted into marked-up text. The Python Standard Library makes it easy to write either an XML-RPC client or server, though more power is available when writing a client. For example, the client library supports HTTP basic authentication, while the server does not support this. Therefore, we will begin at the simple end, with the server.
xmlrpc_server.py shows a basic server that starts a web server on port 7001 and listens for incoming Internet connections. Here we will use the operator module, which exports a set of efficient functions corresponding to the intrinsic operators of Python.
import operator, math
from SimpleXMLRPCServer import SimpleXMLRPCServer

def addtogether(*things):
    """Add together everything in the list `things`."""
    return reduce(operator.add, things)

def quadratic(a, b, c):
    """Determine `x` values satisfying: `a` * x*x + `b` * x + c == 0"""
    b24ac = math.sqrt(b*b - 4.0*a*c)
    # Note: `/ 2.0*a` divides by 2.0 and then multiplies by `a`; the sample
    # client output in this chapter reflects that. The quadratic formula
    # itself calls for dividing by (2.0*a).
    return list(set([(-b - b24ac) / 2.0*a,
                     (-b + b24ac) / 2.0*a]))

def remote_repr(arg):
    """Return the `repr()` rendering of the supplied `arg`."""
    return arg

server = SimpleXMLRPCServer(('127.0.0.1', 7001))
server.register_introspection_functions()
server.register_multicall_functions()
server.register_function(addtogether)
server.register_function(quadratic)
server.register_function(remote_repr)
print "Server ready"
server.serve_forever()
You can see that the three sample functions that the server offers over XML-RPC (the ones that are added to the RPC service through the register_function() calls) are quite typical Python functions. And that, again, is the whole point of XML-RPC: it lets you make routines available for invocation over the network without having to write them any differently than if they were normal functions offered inside of your program.
Note that two additional configuration calls are made in addition to the three calls that register our functions. Each of them turns on an additional service that is optional, but often provided by XML-RPC servers: an introspection routine that a client can use to ask which RPC calls are supported by a given server; and the ability to support a multicall function that lets several individual function calls be bundled together into a single network round-trip. This server will need to be running before we can try any of the next three program listings, so bring up a command window and get it started:
root@erlerobot:~/Python_files# python xmlrpc_server.py
Server ready
XML-RPC
This means that the server is now waiting for connections on localhost port 7001.
Now, open another command window and get ready to try out the next three listings as we review them. First, we will try out the introspection capability that we turned on in this particular server. Note that this ability is optional, and it may not be available on many other XML-RPC services that you use online or that you deploy yourself. xmlrpc_introspect.py shows how introspection happens from the client’s point of view.
import xmlrpclib

proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')

print 'Here are the functions supported by this server:'
for method_name in proxy.system.listMethods():
    if method_name.startswith('system.'):
        continue
    signatures = proxy.system.methodSignature(method_name)
    if isinstance(signatures, list) and signatures:
        for signature in signatures:
            print '%s(%s)' % (method_name, signature)
    else:
        print '%s(...)' % (method_name,)
    method_help = proxy.system.methodHelp(method_name)
    if method_help:
        print ' ', method_help
The introspection mechanism is an optional extension that is not actually defined in the XML-RPC specification itself. The client is able to call a series of special methods that all begin with the string system. to distinguish them from normal methods. These special methods give information about the other calls available. We start by calling listMethods(). If introspection is supported at all, then we will receive back a list of other method names; for this example listing, we ignore the system methods and only proceed to print out information about the other ones. In xmlrpc_introspect.py we use the xmlrpclib module, which supports writing XML-RPC client code; it handles all the details of translating between conformable Python objects and XML on the wire.
root@erlerobot:~/Python_files# python xmlrpc_introspect.py
Here are the functions supported by this server:
addtogether(...)
  Add together everything in the list `things`.
quadratic(...)
  Determine `x` values satisfying: `a` * x*x + `b` * x + c == 0
remote_repr(...)
  Return the `repr()` rendering of the supplied `arg`.
You will recall that the whole point of an RPC service is to make function calls in a target language look as natural as possible. And as you can see in xmlrpc_client.py, the Standard Library’s xmlrpclib gives you a proxy object for making function calls against the server. These calls look exactly like local function calls.
import xmlrpclib

proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')
print proxy.addtogether('x', 'ÿ', 'z')
print proxy.addtogether(20, 30, 4, 1)
print proxy.quadratic(2, -4, 0)
print proxy.quadratic(1, 2, 1)
print proxy.remote_repr((1, 2.0, 'three'))
print proxy.remote_repr([1, 2.0, 'three'])
print proxy.remote_repr({'name': 'Arthur', 'data': {'age': 42, 'sex': 'M'}})
print proxy.quadratic(1, 0, 1)
Note how almost all of the calls work without a hitch, and how both the calls in this listing and the functions themselves back in xmlrpc_server.py look like completely normal Python; there is nothing about them that is particular to a network:
root@erlerobot:~/Python_files# python xmlrpc_client.py
xÿz
55
[0.0, 8.0]
[-1.0]
[1, 2.0, 'three']
[1, 2.0, 'three']
{'data': {'age': 42, 'sex': 'M'}, 'name': 'Arthur'}
Traceback (most recent call last):
...
xmlrpclib.Fault: <Fault 1: "<type 'exceptions.ValueError'>:math domain error">
Note that XML-RPC function calls, like those of Python and many other languages in its lineage, can take several arguments, but can only return a single result value. That value might be a complex data structure, but it will be returned as a single result. And the protocol does not care whether that result has a consistent shape or size; the list returned by quadratic() varies in its number of elements returned without any complaint from the network logic. Note, also, that the rich variety of Python data types must be reduced to the smaller set that XML-RPC itself happens to support. In particular, XML-RPC only supports a single sequence type: the list.
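The reduction of sequence types to the list can be observed without starting a server, by round-tripping a value through the marshalling layer. This sketch uses the Python 3 module xmlrpc.client, which is the successor of the Python 2 xmlrpclib used in the listings; the variable names are chosen only for illustration.

```python
import xmlrpc.client

# Marshal a one-element reply whose single value is a tuple...
wire_text = xmlrpc.client.dumps(((1, 2.0, 'three'),), methodresponse=True)

# ...and unmarshal it again, as a client receiving the reply would.
params, method_name = xmlrpc.client.loads(wire_text)

print('<array>' in wire_text)   # the tuple travels as an XML-RPC array
print(params[0])                # and comes back as a list
```

This is the same effect seen in the remote_repr() calls, where the tuple and the list arguments print identically on the client.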
Thus far we have covered the general features and restrictions of XML-RPC. If you consult the documentation for either the client or the server module in the Standard Library, you can learn about a few more features. In particular, you can learn how to use TLS and authentication by supplying more arguments to the ServerProxy class. But one feature is important enough to go ahead and cover here: the ability to make several calls in a network round-trip when the server supports it, as shown in xmlrpc_multicall.py.
import xmlrpclib

proxy = xmlrpclib.ServerProxy('http://127.0.0.1:7001')
multicall = xmlrpclib.MultiCall(proxy)
multicall.addtogether('a', 'b', 'c')
multicall.quadratic(2, -4, 0)
multicall.remote_repr([1, 2.0, 'three'])
for answer in multicall():
    print answer
When you run this script, you can carefully watch the server’s command window to confirm that only a single HTTP request is made in order to answer all three function calls that get made.
Three final points are worth mentioning before we move on to examining another RPC mechanism:
There are two additional data types that sometimes prove hard to live without, so many XML-RPC mechanisms support them: dates and the value that Python calls None. Python’s client and server both support options that will enable the transmission and reception of these nonstandard types.
Keyword arguments are, alas, not supported by XML-RPC, because few languages are sophisticated enough to include them and XML-RPC wants to interoperate with those languages. Some services get around this by allowing a dictionary to be passed as a function’s final argument.
Finally, keep in mind that dictionaries can only be passed if all of their keys are strings, whether normal or Unicode. See the “Self-documenting Data” section later in this chapter for more information on how to think about this restriction.
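Two of these points, the optional support for None and the string-keys rule for dictionaries, can be checked directly against the marshaller. The sketch below uses the Python 3 module xmlrpc.client (the successor of xmlrpclib); the helper name can_marshal is an assumption for illustration.

```python
import xmlrpc.client

def can_marshal(value, **options):
    """Report whether `value` can be encoded as an XML-RPC parameter."""
    try:
        xmlrpc.client.dumps((value,), **options)
    except (TypeError, OverflowError):
        return False
    return True

print(can_marshal(None))                    # False: None is off by default
print(can_marshal(None, allow_none=True))   # True: enabled by an option
print(can_marshal({'H': 1}))                # True: string keys are fine
print(can_marshal({1: 'H'}))                # False: integer keys are refused
```

The same allow_none option exists on the ServerProxy and server classes, so both ends can agree to carry the value.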
ThebrightideabehindJSONistoserializedatastructurestostringsthatusethesyntaxoftheJavaScriptprogramminglanguage.ThismeansthatJSONstringscanbeturnedbackintodatainawebbrowsersimplybyusingtheeval()function.ByusingasyntaxspecificallydesignedfordataratherthanadaptingaverbosedocumentmarkuplanguagelikeXML,thisremoteprocedurecallmechanismcanmakeyourdatamuchmorecompactwhilesimultaneouslysimplifyingyourparsersandlibrarycode.
JSON-RPC is not supported in the Python Standard Library, so you will have to choose one of the several third-party distributions available. You can find these distributions on the Python Package Index. My own favorite is lovely.jsonrpc. If you install it in a virtual environment, then you can try out the server and client shown in Listings jsonrpc_server.py and jsonrpc_client.py.
from wsgiref.simple_server import make_server
import lovely.jsonrpc.dispatcher, lovely.jsonrpc.wsgi

def lengths(*args):
    results = []
    for arg in args:
        try:
            arglen = len(arg)
        except TypeError:
            arglen = None
        results.append((arglen, arg))
    return results

dispatcher = lovely.jsonrpc.dispatcher.JSONRPCDispatcher()
dispatcher.register_method(lengths)
app = lovely.jsonrpc.wsgi.WSGIJSONRPCApplication({'': dispatcher})
server = make_server('localhost', 7002, app)
print "Starting server"
while True:
    server.handle_request()
The server code is quite simple, as an RPC mechanism should be. As with XML-RPC, we merely need to name the functions that we want offered over the network, and they become available for queries.
from lovely.jsonrpc import proxy

proxy = proxy.ServerProxy('http://localhost:7002')
print proxy.lengths((1, 2, 3), 27, {'Sirius': -1.46, 'Rigel': 0.12})
First, note that the protocol allowed us to send as many arguments as we wanted; it was not bothered by the fact that it could not introspect a static method signature from our function. Second, note that the None value in the server’s reply passes back to us unhindered.
root@erlerobot:~/Python_files# python jsonrpc_server.py
Starting server
[In another command window:]
$ python jsonrpc_client.py
[[3, [1, 2, 3]], [None, 27], [2, {'Rigel': 0.12, 'Sirius': -1.46}]]
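The output shows lists where the server built tuples, and a None that arrived intact. A rough way to see why, without installing lovely.jsonrpc, is to push the function's return value through the Standard Library json module. This sketch uses Python 3 syntax and only imitates the encoding step; it makes no claim about the exact envelope that lovely.jsonrpc sends.

```python
import json

def lengths(*args):
    """Same logic as the server listing: pair each argument with its length."""
    results = []
    for arg in args:
        try:
            arglen = len(arg)
        except TypeError:
            arglen = None
        results.append((arglen, arg))
    return results

local_result = lengths((1, 2, 3), 27, {'Sirius': -1.46, 'Rigel': 0.12})

# Imitate the trip over the wire: encode to JSON text, then decode it.
wire_text = json.dumps(local_result)
remote_result = json.loads(wire_text)

print(wire_text)       # None is written as null, tuples as arrays
print(remote_result)   # the tuples have come back as lists
```

JSON has one array type and a null value, so Python tuples and lists both become arrays, while None has a direct equivalent.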
JSON-RPC
You have just seen that both XML-RPC and JSON-RPC appear to support a data structure very much like a Python dictionary, but with an annoying limitation. In XML-RPC, the data structure is called a struct, whereas JSON calls it an object. To the Python programmer, however, it looks like a dictionary, and your first reaction will probably be annoyance that its keys cannot be integers, floats, or tuples. Let us look at a concrete example. Imagine that you have a dictionary of physical element symbols indexed by their atomic number:
{1: 'H', 2: 'He', 3: 'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O'}
If you need to transmit this dictionary over an RPC mechanism, its integer keys are a problem. Simply put, the struct and object RPC data structures are not designed to pair keys with values in containers of an arbitrary size. Instead, they are designed to associate a small set of pre-defined attribute names with the attribute values that they happen to carry for some particular object. If you try to use a struct to pair arbitrary keys and values, you might inadvertently make it very difficult to use for people unfortunate enough to be using statically-typed programming languages. Instead, you should think of dictionaries being sent across RPCs as being like the __dict__ attributes of your Python objects, which you should generally not find yourself using to associate an arbitrary set of keys with values.
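What actually happens to non-string keys depends on the library. As one data point, the sketch below (Python 3 syntax) shows the behavior of the Standard Library json module, which converts integer keys to strings without complaint and refuses tuple keys outright; other JSON or XML-RPC libraries may behave differently.

```python
import json

elements = {1: 'H', 2: 'He', 3: 'Li'}

# json.dumps() quietly turns the integer keys into strings.
wire_text = json.dumps(elements)
decoded = json.loads(wire_text)

print(wire_text)
print(decoded == elements)   # the keys are now '1', '2', '3'
print(sorted(decoded))

# Tuple keys cannot be converted and raise an error instead.
try:
    json.dumps({(1, 2): 'pair'})
    tuple_keys_accepted = True
except TypeError as error:
    tuple_keys_accepted = False
    print('tuple keys are refused:', error)
```

Either way, the receiver does not get back the dictionary that was sent.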
All of this means that the dictionary that I showed a few moments ago should actually be serialized as a list of explicitly labelled values if it is going to be used by a general-purpose RPC mechanism:
[{'number': 1, 'symbol': 'H'},
 {'number': 2, 'symbol': 'He'},
 {'number': 3, 'symbol': 'Li'},
 {'number': 4, 'symbol': 'Be'},
 {'number': 5, 'symbol': 'B'},
 {'number': 6, 'symbol': 'C'},
 {'number': 7, 'symbol': 'N'},
 {'number': 8, 'symbol': 'O'}]
Note that the preceding examples show the Python data structures as you will pass them into your RPC call, not the way they would be represented on the wire.
If you have a Python dictionary like the one we are discussing here, you can turn it into an RPC-appropriate data structure, and then change it back with code like this:
>>> elements = {1: 'H', 2: 'He'}
>>> t = [{'number': key, 'symbol': elements[key]} for key in elements]
>>> t
[{'symbol': 'H', 'number': 1}, {'symbol': 'He', 'number': 2}]
>>> dict((obj['number'], obj['symbol']) for obj in t)
{1: 'H', 2: 'He'}
Using named tuples might be an even better way to marshal such values before sending them if you find yourself creating and destroying too many dictionaries to make this transformation appealing.
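One way the named-tuple suggestion might look in practice is sketched below, in Python 3 syntax. The Element type is hypothetical and exists only for this example; the idea is to keep lightweight records inside the program and convert them to labelled dictionaries only at the moment of sending.

```python
from collections import namedtuple

# Element is a hypothetical record type invented for this example.
Element = namedtuple('Element', ['number', 'symbol'])

elements = {1: 'H', 2: 'He'}
records = [Element(number, symbol) for number, symbol in sorted(elements.items())]

# Convert to plain dictionaries only at the moment of sending.
payload = [dict(record._asdict()) for record in records]

# The receiver can rebuild both the records and the original dictionary.
received = [Element(**obj) for obj in payload]
restored = dict((record.number, record.symbol) for record in received)

print(payload)
print(restored)
```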
Self-documenting Data
If the idea of RPC was to make remote function calls look like local ones, then the two basic RPC mechanisms we have looked at actually fail pretty spectacularly. If the functions we were calling happened to only use basic data types in their arguments and return values, then XML-RPC and JSON-RPC would work fine. But think of all of the occasions when you use more complex parameters and return values instead! What happens when you need to pass live objects?
When all you have are Python programs that need to talk to each other, there is at least one excellent reason to look for an RPC service that knows about Python objects and their ways: Python has a number of very powerful data types, so it can simply be unreasonable to try “talking down” to the dialect of limited data formats like XML-RPC and JSON-RPC. This is especially true when Python dictionaries, sets, and datetime objects would express exactly what you want to say. There are two Python-native RPC systems that we should mention: Pyro and RPyC. The Pyro project lives here: http://ww.xs4all.nl/~irmen/pyro3/
This well-established RPC library is built on top of the Python pickle module, and it can send any kind of argument and response value that is inherently pickle-able. Basically, this means that, if an object (and its attributes) can be reduced to its basic types, then it can be transmitted. However, if the values you want to send or receive are ones that the pickle module chokes on, then Pyro will not work for your situation. The pickle module implements a fundamental but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy.
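The following sketch, in Python 3 syntax, shows both halves of that description using only the pickle module itself; it does not involve Pyro. The sample message is invented for illustration, and a threading lock stands in for the kind of value that pickle cannot reduce to basic types.

```python
import datetime
import pickle
import threading

# Values that the simpler RPC formats cannot express directly.
message = {
    1: 'integer keys are fine',
    'tags': {'fast', 'native'},
    'when': datetime.datetime(2017, 1, 4, 12, 30),
}

stream = pickle.dumps(message)      # pickling: objects -> byte stream
restored = pickle.loads(stream)     # unpickling: byte stream -> objects
print(restored == message)

# Some objects cannot be reduced to basic types; a lock is one example.
try:
    pickle.dumps(threading.Lock())
    lock_pickled = True
except (TypeError, pickle.PicklingError):
    lock_pickled = False
print(lock_pickled)
```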
Talking About Objects: Pyro and RPyC
The RPyC project lives here: http://rpyc.wikidot.com/
This project takes a much more sophisticated approach toward objects. Indeed, what actually gets passed across the network is a reference to an object, which the receiver can use to call back and invoke more of the object’s methods later if it needs to. The most recent version also seems to have put more thought into security, which is important if you are letting other organizations use your RPC mechanism. After all, if you let someone give you some data to un-pickle, you are essentially letting them run arbitrary code on your computer.
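The warning about un-pickling can be made concrete with a deliberately harmless example. In this Python 3 sketch, the hypothetical class Surprise uses pickle's __reduce__ hook to name a callable and its arguments; the receiver then runs that call simply by loading the data. Here the callable is the built-in sorted(), but the sender is the one who chooses it.

```python
import pickle

class Surprise(object):
    """Hypothetical class whose pickle names a function for the receiver to call."""
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted('cba')".
        return (sorted, ('cba',))

data = pickle.dumps(Surprise())

# The receiver only asked to load some data, yet sorted() has now run.
result = pickle.loads(data)
print(result)
```

This is why pickle data should only be accepted from parties you trust.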
You can see an example client and server in Listings rpyc_client.py and rpyc_server.py. If you want an example of the incredible kinds of things that a system like RPyC makes possible, you should study these listings closely.
import rpyc

def noisy(string):
    print 'Noisy:', repr(string)

proxy = rpyc.connect('localhost', 18861, config={'allow_public_attrs': True})
fileobj = open('testfile.txt')
linecount = proxy.root.line_counter(fileobj, noisy)
print 'The number of lines in the file was', linecount
At first the client might look like a rather standard program using an RPC service. After all, it calls a generically-named connect() function with a network address, and then accesses methods of the returned proxy object as though the calls were being performed locally.
The server exposes a single method that takes the proffered file object and callable function. It uses these exactly as you would in a normal Python program that was happening inside a single process. It calls the file object’s readlines() and expects the return value to be an iterable over which a for loop can repeat. Finally, the server calls the function object that has been passed in without any regard for where the function actually lives (namely, in the client).
import rpyc

class MyService(rpyc.Service):
    def exposed_line_counter(self, fileobj, function):
        linenum = -1  # so that an empty file gives a count of zero
        for linenum, line in enumerate(fileobj.readlines()):
            function(line)
        return linenum + 1

from rpyc.utils.server import ThreadedServer
t = ThreadedServer(MyService, port=18861)
t.start()
It is especially instructive to look at the output generated by running the client, assuming that a small testfile.txt indeed exists in the current directory and that it has a few words of wisdom inside:
root@erlerobot:~/Python_files# python rpyc_client.py
Noisy: 'Simple\n'
Noisy: 'is\n'
Noisy: 'better\n'
Noisy: 'than\n'
Noisy: 'complex.\n'
The number of lines in the file was 5
Equally startling here are two facts. First, the server was able to iterate over multiple results from readlines(), even though this required the repeated invocation of file-object logic that lived on the client. Second, the server didn’t somehow copy the noisy() function’s code object so it could run the function directly; instead, it repeatedly invoked the function, with the correct argument each time, on the client side of the connection.
An RPyC Example
RPyC takes exactly the opposite approach from the other RPC mechanisms we have looked at. Whereas all of the other techniques try to serialize and send as much information across the network as possible, and then leave the remote code to either succeed or fail with no further information from the client, the RPyC scheme only serializes completely immutable items such as Python integers, floats, strings, and tuples. For everything else, it passes across an object name that lets the remote side reach back into the client to access attributes and invoke methods on those live objects.
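The difference between sending a value and sending a name can be imitated inside a single process. The toy below, in Python 3 syntax, is not RPyC and uses none of its code: Owner, Reference, send() and invoke() are all hypothetical names, and no network is involved. It only illustrates the rule described above, that immutable items are copied while everything else is represented by a name the other side can call back through.

```python
IMMUTABLE_TYPES = (int, float, str, tuple)

class Owner:
    """The side of the pretend connection that owns the live objects."""
    def __init__(self):
        self.live_objects = {}

    def send(self, value):
        if isinstance(value, IMMUTABLE_TYPES):
            return ('value', value)             # immutable: copy it across
        name = id(value)
        self.live_objects[name] = value
        return ('name', name)                   # otherwise: only a name

    def invoke(self, name, method, *args):
        # Runs on the owner's side, against the original object.
        return getattr(self.live_objects[name], method)(*args)

class Reference:
    """What the receiving side holds in place of a copy."""
    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def __getattr__(self, method):
        def remote_method(*args):
            return self._owner.invoke(self._name, method, *args)
        return remote_method

owner = Owner()
lines = ['Simple\n', 'is\n']

kind, payload = owner.send(42)
print(kind, payload)                  # value 42

kind, name = owner.send(lines)
remote_lines = Reference(owner, name)
remote_lines.append('better\n')       # reaches back to the owner's list
print(len(lines))                     # 3
```

Because the receiver holds only a name, every method call on it is a trip back to the owner, which is the behavior the readlines() and noisy() calls showed in the listings.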
Be willing to explore alternative transmission mechanisms for your work with RPC services. The classes provided in the Python Standard Library for XML-RPC, for example, are not even used by many Python programmers who need to speak that protocol.
There are three useful ways that you can look into moving beyond overly simple example code that makes it look as though you have to bring up a new web server for every RPC service you want to make available from a particular site.
First, look into whether you can use the pluggability of WSGI to let you install an RPC service that you have incorporated into a larger web project that you are deploying. Implementing both your normal web application and your RPC service as WSGI applications beneath a filter that checks the incoming URL enables you to allow both services to live at the same hostname and port number.
Second, instead of using a dedicated RPC library, you may find that your web framework of choice already knows how to host an XML-RPC, JSON-RPC, or some other flavor of RPC call.
Third, you might want to try sending RPC messages over an alternate transport that does a better job than the protocol’s native transport of routing the calls to servers that are ready to handle them. Message queues are often an excellent vehicle for RPC calls when you want a whole rack of servers to stay busy sharing the load of incoming requests.
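The first of these suggestions, a filter that checks the incoming URL, can be sketched with nothing but plain WSGI callables. In this Python 3 example the two applications are trivial stand-ins, the /rpc prefix is an arbitrary assumption, and fetch() is a hypothetical helper that calls the filter directly so that no socket is needed.

```python
from wsgiref.util import setup_testing_defaults

def web_app(environ, start_response):
    """Stand-in for the normal web application."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'normal web page']

def rpc_app(environ, start_response):
    """Stand-in for a WSGI-based RPC service."""
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"result": null}']

def url_filter(environ, start_response):
    """Route by URL so both services share one hostname and port."""
    if environ.get('PATH_INFO', '').startswith('/rpc'):
        return rpc_app(environ, start_response)
    return web_app(environ, start_response)

def fetch(path):
    """Call the filter directly, without opening a network socket."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)
    statuses = []
    body = url_filter(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b''.join(body)

print(fetch('/index.html'))
print(fetch('/rpc'))
```

In a real deployment, url_filter would be the single application handed to the web server, and rpc_app would be replaced by something like the WSGI application object created in the JSON-RPC server listing.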
Of course, there is one reality of life on the network that RPC services cannot easily hide: the network can be down or even go down in the middle of a particular RPC call. You will find that most RPC mechanisms simply raise an exception if a call is interrupted and does not complete. Note that an error, unfortunately, is no guarantee that the remote end did not process the request. Maybe it actually did finish processing it, but then the network went down right as the last packet of the reply was being sent. In this case, your call would have technically happened and the data would have been successfully added to the database or written to a file or whatever the RPC call does. However, you will think the call failed and want to try it again, possibly storing the same data twice.
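One common way to make such a retry safe, which the text above does not prescribe and which is shown here only as an illustration, is to attach a unique identifier to each request so that the server can recognize a repeat. The sketch below is in Python 3 syntax and simulates the lost reply in-process; Server, store() and flaky_call() are hypothetical names.

```python
import uuid

class Server:
    """Pretend RPC server that remembers which requests it has handled."""
    def __init__(self):
        self.rows = []
        self.replies = {}                # request id -> earlier reply

    def store(self, request_id, value):
        if request_id in self.replies:   # a retry: do not store twice
            return self.replies[request_id]
        self.rows.append(value)
        self.replies[request_id] = len(self.rows)
        return self.replies[request_id]

def flaky_call(server, request_id, value, lose_reply):
    reply = server.store(request_id, value)   # the server does the work
    if lose_reply:
        raise ConnectionError('network went down before the reply arrived')
    return reply

server = Server()
request_id = str(uuid.uuid4())

try:
    reply = flaky_call(server, request_id, 'reading', lose_reply=True)
except ConnectionError:
    # The client cannot tell whether the data was stored, so it retries
    # with the same identifier.
    reply = flaky_call(server, request_id, 'reading', lose_reply=False)

print(reply, server.rows)
```

Without the identifier check in store(), the retry would have appended the same value a second time.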
It is possible you will want both features: a compact and efficient binary format and support across several different languages. Here are a few options:
Some JSON-RPC libraries support the BSON protocol, which provides a tight binary transport format and also an expanded range of data types beyond those supported by JSON.
The Apache Foundation is now incubating Thrift, an RPC system developed several years ago at Facebook and released as open source.
Google Protocol Buffers are popular with many programmers, but strictly speaking they are not a full RPC system; instead, they are a binary data serialization protocol.
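To get a feel for what a binary format buys you, the Python 3 sketch below compares JSON text with a fixed binary layout built by the Standard Library struct module. This is not BSON, Thrift, or Protocol Buffers, and the record and its layout are invented for the example; it only illustrates the general trade of field names and readability for size.

```python
import json
import struct

record = {'number': 8, 'mass': 15.999}

as_text = json.dumps(record).encode('ascii')

# '!Bd' = network byte order, one unsigned byte, one 8-byte float.
# Both sides must already agree on this layout; no field names are sent.
as_binary = struct.pack('!Bd', record['number'], record['mass'])

print(len(as_text), 'bytes as JSON')
print(len(as_binary), 'bytes as packed binary')

number, mass = struct.unpack('!Bd', as_binary)
print(number, mass)
```

Systems such as Thrift and Protocol Buffers describe the agreed layout in a separate schema so that programs in several languages can share it.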
RPC, Web Frameworks, Message Queues
Recovering From Network Errors
Binary Options: Thrift and Protocol Buffers