
Source: secant.cs.purdue.edu/_media/cs190c:09-04-06.pdf · 2009. 4. 10.

Analysis and visualization of protein-protein interactions

Olga Vitek, Assistant Professor

Statistics and Computer Science


Outline

1. Protein-protein interactions

2. Using graph structures to study protein-protein interactions

3. Clustering of graphs

4. Evaluation of clusters


An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Life begins with Cell

• A cell is the smallest structural unit of an organism that is capable of independent functioning

• All cells have some common features


DNA, RNA, and the Flow of Information

Figure labels: Replication (DNA to DNA), Transcription (DNA to RNA), Translation (RNA to protein)

This model is known as the “central dogma”


Why should we study proteins?

● Proteins: large molecules made up of amino acids
◆ accomplish most of the functions of living cells
■ by interacting (i.e. coming into physical contact) with other molecules
◆ linear structures fold into 3-dimensional shapes
■ the structure is used to accomplish the function

Figure: Ubiquitin-conjugating enzyme E2 G1 (PDB entry 2AWF)


Proteins accomplish function by forming complexes

● A protein complex is a group of tightly interacting proteins
◆ also called a functional module
◆ protein interactions within the complex help accomplish its function

● Example: exosome
◆ a complex of 11 proteins
◆ degrades RNA molecules
◆ ring structure ensures the function

http://en.wikipedia.org/wiki/Exosome_complex


Discovery of a complex helps in understanding its function

● Example: exosome
◆ first discovered in yeast
◆ helped in discovering an equivalent complex in humans
◆ has clinical implications
■ target of autoimmune disease
■ chemotherapies for cancer block its activity

● Knowledge of protein complexes speeds up biological and clinical research


Complexity of a bacterial cell

Often study simpler “model” organisms to gain insight into the function of the cell


Modern technologies determine protein interactions on a large scale

● New terms
◆ Proteome: all proteins that exist in an organism
◆ Interactome: all protein-protein interactions

● New questions
◆ Interactions of individual proteins
◆ Network-wide patterns of interactions

● New challenges
◆ Large, complex, noisy datasets
◆ Computational approaches are key


[Figure: network view of the entire HMS-PCI dataset. Nodes are labeled with yeast protein names (for example Kss1, Ste7, Ste12, Cdc28, Dun1). Legend of functional annotations: DNA damage response and repair; DNA metabolism; DNA recombination; DNA replication; RNA localization and processing; autophagy; budding; carbohydrate metabolism; cell cycle; cell growth and/or maintenance; cell organization and biogenesis; cell shape and cell size control; general metabolism; mating; nucleolar and ribosome biogenesis; protein amino acid phosphorylation/dephosphorylation; protein biosynthesis; protein degradation; protein metabolism and modification; protein transport; signal transduction; sporulation; stress response; transcription; transport.]

Figure S3: View of the entire HMS-PCI Dataset. Thick blue lines represent literature-derived interactions from PreBIND+MIPS in the HMS-PCI dataset. Thin orange lines represent potential novel interactions. Arrows point from bait to associated protein. Functional annotation derived from Gene Ontology (www.geneontology.org). http://www.bind.ca

Ho et al., Nature, 2002

New technologies determine protein-protein interactions on a large scale

Such datasets are being increasingly produced, and are publicly available


Outline

1. Protein-protein interactions

2. Using graph structures to study protein-protein interactions

3. Clustering of graphs

4. Evaluation of clusters


Representing protein-protein interactions using graphs

Figure: protein A and protein B joined by an edge (an experimentally determined interaction).

Protein attributes (for each of protein A and protein B):
• name
• function
• quantitative data

Interaction attributes:
• type
• confidence
• direction

● The direction of an interaction is an experimental artifact


● Use graphs to represent the large-scale information on proteins, interactions and their attributes
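The attribute lists above map naturally onto small record types. The following is a minimal Python sketch, not the lecture's own code: the class names `Protein`, `Interaction` and `InteractionGraph`, the default values, and the toy data (proteins "A", "B", "C" and the confidence 0.9) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protein:
    """A node: a protein with a name and a (possibly unknown) function."""
    name: str
    function: str = "unknown"


@dataclass(frozen=True)
class Interaction:
    """An edge: an experimentally determined interaction, stored bait -> prey."""
    bait: str
    prey: str
    kind: str = "physical"       # interaction type (hypothetical label)
    confidence: float = 1.0      # hypothetical score in [0, 1]


@dataclass
class InteractionGraph:
    proteins: dict = field(default_factory=dict)  # name -> Protein
    edges: list = field(default_factory=list)     # list of Interaction

    def add_protein(self, protein):
        self.proteins[protein.name] = protein

    def add_interaction(self, interaction):
        # Make sure both endpoints exist as nodes before storing the edge.
        for name in (interaction.bait, interaction.prey):
            self.proteins.setdefault(name, Protein(name))
        self.edges.append(interaction)

    def partners(self, name):
        """Proteins that share an edge with `name`, ignoring direction."""
        found = set()
        for edge in self.edges:
            if edge.bait == name:
                found.add(edge.prey)
            elif edge.prey == name:
                found.add(edge.bait)
        return found


# Toy data, invented for this example.
graph = InteractionGraph()
graph.add_protein(Protein("A", function="signal transduction"))
graph.add_interaction(Interaction("A", "B", confidence=0.9))
graph.add_interaction(Interaction("C", "A"))
print(sorted(graph.partners("A")))
```

Storing the direction on the edge but ignoring it in `partners` keeps the experimental detail available without letting it shape the neighbourhood of a protein.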


Graph-based representation of protein-protein interactions

[Figure: the HMS-PCI network view (Figure S3 of Ho et al., Nature, 2002), repeated. Thick blue lines are literature-derived interactions from PreBIND+MIPS, thin orange lines are potential novel interactions, and arrows point from bait to associated protein.]

● View data as a graph

◆ Proteins are nodes and interactions are edges

◆ Nodes have attributes
■ e.g. known function

◆ Directed edges
■ experimental artifact
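Because the edge direction only records which protein was the bait, one common first step is to collapse the directed edges into an undirected adjacency structure. The sketch below assumes edges arrive as (bait, prey) pairs; the function name `undirected_adjacency` and the toy edge list are made up for illustration.

```python
from collections import defaultdict


def undirected_adjacency(directed_edges):
    """Collapse (bait, prey) pairs into an undirected adjacency dict."""
    adjacency = defaultdict(set)
    for bait, prey in directed_edges:
        if bait == prey:
            continue  # ignore self-loops in this sketch
        adjacency[bait].add(prey)
        adjacency[prey].add(bait)
    return dict(adjacency)


# Toy edges: A->B and B->A describe the same pair seen from both sides.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "C")]
adjacency = undirected_adjacency(edges)

# Degree = number of distinct interaction partners of each protein.
degree = {protein: len(partners) for protein, partners in adjacency.items()}
print(degree)
```

Using sets means a pair observed in both directions is counted once, which is what treating direction as an artifact implies.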


The interactions are determined by tandem affinity purification (TAP)

● A protein (“bait”) is labeled by a chemical

● The bait forms its interactions (collects “prey”)

● The bait and all other proteins in the complex are isolated

● All components of the complex are identified by mass spectrometry
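The purification steps above yield, for each bait, a list of co-purified prey. One simple way to turn such lists into graph edges is to link each bait to each of its prey (often called a "spoke" representation). That choice is an assumption here, not something the slide prescribes, and the function name `spoke_edges` and the toy pull-down data are invented for this sketch.

```python
from collections import Counter


def spoke_edges(pulldowns):
    """Map {bait: [prey, ...]} to a Counter of unordered protein pairs.

    The count is the number of purifications that support the pair.
    """
    support = Counter()
    for bait, preys in pulldowns.items():
        for prey in set(preys):        # a prey listed twice counts once
            if prey != bait:
                support[frozenset((bait, prey))] += 1
    return support


# Toy pull-down results, invented for this example.
pulldowns = {
    "A": ["B", "C"],
    "B": ["A"],
    "D": ["C", "C"],
}
support = spoke_edges(pulldowns)

# Pairs seen from both sides (A pulls down B, and B pulls down A).
reciprocal = {pair for pair, count in support.items() if count >= 2}
print(reciprocal)
```

Counting support per pair gives a crude handle on noise: a pair recovered with either protein as bait is backed by two independent purifications.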

To appropriate a quote from John Donne, “no protein is an island entire of itself”, or at least, very few proteins are. Most seem to function within complicated cellular pathways, interacting with other proteins either in pairs or as components of larger complexes. A comprehensive understanding of these interactions will be needed before we can appreciate the mechanisms by which cellular pathways function and interlink. On pages 141 and 180 of this issue, Gavin et al. [1] and Ho et al. [2] describe significant advances towards this goal. Each group has characterized hundreds of distinct multiprotein complexes in the budding yeast Saccharomyces cerevisiae, using approaches in which individual proteins are tagged and used to pull down associated proteins, which are then analysed by mass spectrometry.

These studies [1,2] exemplify an emerging paradigm in protein biology: the systematic analysis of an organism’s complete complement of proteins (its ‘proteome’). Protein interactions on a proteome-wide scale have already been analysed in several ways. In a pair of landmark papers, Uetz et al. [3] and Ito et al. [4] adapted the yeast ‘two-hybrid’ assay (a means of assessing whether two single proteins interact) into a high-throughput method of mapping pair-wise protein interactions on a large scale. The authors collectively identified over 4,000 protein–protein interactions in S. cerevisiae. Our own group [5] has developed a microarray technology in which purified, active proteins from almost the entire yeast proteome are printed onto a microscope slide at high density, such that thousands of protein interactions (and other protein functions) can be assayed simultaneously.

Gavin et al. [1] and Ho et al. [2] take a different approach, one that is particularly effective at identifying protein complexes that contain three or more components. Large-scale efforts to characterize protein complexes are generally rate-limited by the need for a nearly pure preparation of each complex. In the new studies [1,2], protein complexes were purified as follows (Fig. 1). First, the authors attached tags to hundreds of different proteins (to create ‘bait’ proteins). They then introduced DNA encoding these bait proteins into yeast cells, allowing the modified proteins to be expressed in the cells and to form physiological complexes with other proteins. Then, using the tag, each bait protein was pulled out, often fishing out the entire complex with it (hence the term ‘bait’). The proteins extracted with the tagged bait were identified using standard mass-spectrometry methods.

Applying this approach on a proteome-wide scale, Gavin et al.¹ have identified 1,440 distinct proteins within 232 multiprotein complexes in yeast. As 91% of these complexes contain at least one protein of previously unknown function, the study provides a wealth of new information on 231 previously uncharacterized yeast proteins, and on a further 113 proteins to which the authors ascribe a previously unknown cellular role. Furthermore, Gavin et al. find that most of these complexes have a component in common with at least one other multiprotein assembly, suggesting a means of coordinating cellular functions into a higher-order network of interacting protein complexes.

An understanding of this high-order organization will undoubtedly offer insight into corresponding networks in other organisms, as most yeast complexes have counterparts in more complex species (one reason why researchers are interested in this unicellular organism). Gavin and colleagues illustrate this point by purifying and analysing three equivalent multiprotein complexes from yeast and human cells: the Arp2/3 complex, a component of the cellular ‘skeleton’; the Ccr4–Not1 complex, which is found in the nucleus; and the TRAPP complex, which is involved in transport from one intracellular compartment (the endoplasmic reticulum) to another (the Golgi). In each case, the authors retrieved human and yeast complexes that were similar, if not identical, in composition.

Using the same general approach, Ho et al.² constructed an initial set of 725 yeast bait proteins, from which they identified 3,617 interactions involving 1,578 different proteins. They describe interaction networks assembled around the protein kinase Kss1 (a known component of pathways involved in mating and filamentous growth) and complexes associated with the cyclin-dependent kinase Cdc28 and the gene-transcription factors Fkh1 and Fkh2. In addition, Ho and colleagues used 86 bait proteins that are implicated in the DNA-damage response, allowing them to delineate much of the yeast damage-response network. In particular, they reveal many regulators and targets of the protein kinase Dun1, and a possible role for the DNA-repair protein Rad7 in processes of targeted protein degradation.

The approach taken by Gavin et al. and Ho et al. is clearly powerful, but it does have

news and views

Protein complexes take the bait

Anuj Kumar and Michael Snyder

Figure 1 Analysing protein interactions. In the ‘co-precipitation/mass spectrometry’ approach used by Gavin et al.¹ and Ho et al.², an ‘affinity tag’ is first attached to a target protein (the ‘bait’; a). b, Bait proteins are systematically precipitated, along with any associated proteins, on an ‘affinity column’. c, Purified protein complexes are resolved by one-dimensional SDS–PAGE, a technique that involves running an electric charge through the complexes on a gel, so that proteins become separated according to mass. d, Proteins are excised from the gel, digested with the enzyme trypsin, and analysed by mass spectrometry. Database-search algorithms (bioinformatics) are then used to identify specific proteins from their mass spectra.

[Figure 1 panels: a, tagged bait with associated proteins 1–5; b, isolate protein complex on affinity column; c, SDS–PAGE separating proteins 1–5; d, excise bands, digest with trypsin, analyse by mass spectrometry and bioinformatics]

Many cellular functions are carried out by proteins that are bound together in complexes. In two new large-scale studies, labelled proteins are used as ‘bait’ to capture and identify those complexes.

NATURE | VOL 415 | 10 JANUARY 2002 | www.nature.com 123

Kumar & Snyder, Nature, 2002


The interactions are determined by tandem affinity purification (TAP)

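[Figure: TAP workflow in three steps. 1. Tagging: a tagged bait protein forms a complex with prey proteins inside the cell. 2. Purification: the protein complex is isolated. 3. Identification: proteins are identified from peptide sequences (KLNFMTP..., PNGFLKK..., SRKNFSL..., KFWQTY..., KKRLMTP...)]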


The technology yields false positive and false negative interactions

● Cannot distinguish between various types of complexes

[Figure: three possible complex topologies: chain, star, complete graph]

● Use the “spoke” model to represent results of experiments

✦ directed edges from “bait” to “prey”

✦ multiple proteins in a complex can be used as a bait

• direction of edges reflects experimental design, but not the underlying biology

[Figure: spoke model, with edges from the tagged bait to each prey]
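The spoke model can be sketched in a few lines of Python. This is only an illustration: the function name `spoke_graph`, the dictionary layout, and the protein labels are hypothetical and do not come from the Gavin et al. data.

```python
from collections import defaultdict

def spoke_graph(purifications):
    """Build a directed bait -> prey graph (spoke model).

    `purifications` maps each bait to the list of preys pulled down with it.
    Returns a dict: bait -> set of preys.
    """
    edges = defaultdict(set)
    for bait, preys in purifications.items():
        for prey in preys:
            if prey != bait:              # skip self-loops
                edges[bait].add(prey)
    return dict(edges)

# Hypothetical experiment: proteins A and B were both used as baits.
purifications = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
}
graph = spoke_graph(purifications)

# Pairs observed in both directions were recovered with two different baits.
reciprocal = {
    frozenset((u, v))
    for u in graph
    for v in graph[u]
    if u in graph.get(v, set())
}
print(graph)
print(reciprocal)
```

Because edge direction only records which protein was tagged, later analyses often drop the direction and keep reciprocal observations as extra evidence for an interaction.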


Global graph-based summaries: degree of a node

● Degree of a node: the number of edges that connect the node to other nodes

◆ degree distribution: for each degree k, the fraction of nodes in the network that have degree k

◆ mean degree: average degree over all nodes

Each node is labeled with its degree

http://en.wikipedia.org/wiki/Degree_(graph_theory)
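A minimal sketch of these three summaries, assuming the network is given as an undirected edge list. The helper names and the four-protein toy network are made up for illustration.

```python
from collections import Counter

def undirected_neighbors(edges):
    """Map each node to the set of its neighbors, ignoring edge direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def degree_summaries(edges):
    """Return (degree per node, degree distribution, mean degree)."""
    nbrs = undirected_neighbors(edges)
    degree = {node: len(ns) for node, ns in nbrs.items()}
    n = len(degree)
    counts = Counter(degree.values())
    # Fraction of nodes that have each observed degree k.
    distribution = {k: c / n for k, c in sorted(counts.items())}
    mean_degree = sum(degree.values()) / n
    return degree, distribution, mean_degree

# Toy network (hypothetical proteins A-D).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree, distribution, mean_degree = degree_summaries(edges)
print(degree)        # degree of each node
print(distribution)  # fraction of nodes per degree
print(mean_degree)   # average degree over all nodes
```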

Degree distribution: Gavin et al., 2002

[Figure: scatter plot titled “Degree Distribution: Gavin2002”; x-axis: Degree (0 to 50); y-axis: Node Count (0 to 500)]

● Only a few nodes have a large number of edges


Global graph-based summaries: clustering coefficient

● Clustering coefficient of a node: the fraction of pairs of the node's neighbors that are also neighbors of each other

● Clustering coefficient of a network: average clustering coefficient over all nodes

http://en.wikipedia.org/wiki/Clustering_coefficient
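The two definitions can be sketched as follows for an undirected network stored as a neighbor dictionary. Assigning a coefficient of 0 to nodes with fewer than two neighbors is a convention assumed here, not something stated on the slide; the toy network is hypothetical.

```python
from itertools import combinations

def local_clustering(nbrs, node):
    """Fraction of pairs of `node`'s neighbors that are themselves connected."""
    ns = nbrs[node]
    k = len(ns)
    if k < 2:
        return 0.0                      # assumed convention for degree < 2
    links = sum(1 for u, v in combinations(ns, 2) if v in nbrs[u])
    return links / (k * (k - 1) / 2)

def network_clustering(nbrs):
    """Average of the local clustering coefficient over all nodes."""
    return sum(local_clustering(nbrs, n) for n in nbrs) / len(nbrs)

# Toy undirected network: triangle A-B-C plus a pendant node D attached to A.
nbrs = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(nbrs, "A"))   # 1 of the 3 neighbor pairs is connected
print(network_clustering(nbrs))
```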


Mean degree vs clustering coefficient of experimental networks

[Figure: scatter plot titled “Experimental networks”; x-axis: mean degree (0 to 25); y-axis: clustering coefficient (0.0 to 0.4); one point per dataset: Gavin2002, Gavin2006, Hazbun2003, Ho2002, Ito2001, Krogan2004, Krogan2006, Tong2002, Uetz2000, Li2004, Stelzl2005]

Two technologies:

● AP-MS: tag affinity purification

● Y2H: yeast two-hybrid


Conclusion from these summaries for protein interaction networks:


● Most nodes have a low degree (i.e. few neighbors)

● Some nodes have a high clustering coefficient (i.e. their neighbors are also neighbors of each other)

● Of interest are protein clusters (i.e. groups of proteins that interact with each other more closely than with proteins outside the group)

◆ close interactions can help infer biological function

◆ challenge: large and noisy datasets


Outline

1. Protein-protein interactions

2. Using graph structures to study protein-protein interactions

3. Clustering of graphs

4. Evaluation of clusters


Our goal: find protein clusters in the large and noisy interaction graph

Gavin et al., Nature, 2002


Step 1: “de-noise” the interaction graph

● We are more confident in protein interactions if they are determined using multiple baits

◆ remove isolated subgraphs

◆ determine connected components

■ subgraphs where there is a directed path from each protein to every other protein

Gavin et al., Nature, 2002
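A rough sketch of this de-noising step. The slide's definition (a directed path from each protein to every other protein) corresponds to what graph theory calls strongly connected components. The size threshold `min_size` used to decide which components count as "isolated" is an assumption made for illustration, as are the function names and the toy graph.

```python
def reachable(graph, start):
    """Set of nodes reachable from `start` by following directed edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected_components(graph):
    """Components in which every node can reach every other node.

    Simple (not the fastest) method: a component is the intersection of the
    nodes reachable forwards and backwards from a starting node.
    """
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    reverse = {}
    for u, vs in graph.items():
        for v in vs:
            reverse.setdefault(v, set()).add(u)
    components, assigned = [], set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        comp = reachable(graph, node) & reachable(reverse, node)
        components.append(comp)
        assigned |= comp
    return components

def denoise(graph, min_size=3):
    """Keep only components with at least `min_size` proteins (assumed cutoff)."""
    return [c for c in strongly_connected_components(graph) if len(c) >= min_size]

# Toy bait -> prey graph: a directed cycle A->B->C->A and a lone edge X->Y.
graph = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "X": {"Y"}}
kept = denoise(graph)
print(kept)
```

For large networks a linear-time algorithm such as Tarjan's would be preferable; the version above trades speed for readability.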

[Figure: the same interaction graph, with an isolated component highlighted]

Gavin et al., Nature, 2002

Step 2: based on the graph topology, find protein clusters in the connected components

● Finding clusters

■ ignore directions of edges

■ use the Markov Cluster (MCL) algorithm for clustering

● The output is a collection of sets of closely interacting proteins

● Not every protein is expected to cluster

Gavin et al., Nature, 2002
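The idea behind MCL can be sketched in plain Python: simulate random walks on the undirected graph by alternating expansion (squaring the column-stochastic matrix) and inflation (raising entries to a power and re-normalizing columns) until the matrix stops changing. This is a simplified teaching version, not the reference MCL implementation; the default `inflation=2.0`, the added self-loops, the pruning threshold `1e-6`, and the toy proteins are all assumptions.

```python
def normalize_columns(m):
    """Scale each column of a square matrix (list of lists) to sum to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= total
    return m

def mcl(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Return a set of clusters (frozensets of nodes) for an undirected edge list."""
    nodes = sorted({u for e in edges for u in e})
    n = len(nodes)
    if n == 0:
        return set()
    index = {node: i for i, node in enumerate(nodes)}
    # Adjacency matrix with self-loops; edge direction is ignored.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1.0
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: two steps of the random walk (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row lists the nodes attracted to that row's node.
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Toy network: two triangles joined by a single edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
clusters = mcl(edges)
print(sorted(sorted(c) for c in clusters))
```

A larger inflation value tends to produce more, smaller clusters. In this simplified version clusters may occasionally overlap, and real analyses would typically use the published MCL software rather than a dense-matrix sketch like this one.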

Step 2: based on the graph topology, find protein clusters in the connected components

Output of a clustering procedure

Gavin et al., Nature, 2002

Exosome example: additional proteins were found by clustering the network

© 2006 Nature Publishing Group

components could be found under clustering conditions with slightly poorer accuracy or coverage. Therefore, we grouped similar complexes from conditions with coverage and accuracy above 70%. The resulting 5,488 different protein-complex variations were termed ‘complex isoforms’ (Fig. 1). This procedure increased the overall coverage to 90%. The inclusion of parameters resulting in accuracy/coverage below 70% did not increase the coverage, but significantly decreased accuracy (data not shown).

Comparison with the complete collection of known complexes (279 from MIPS and the literature) showed that 257 of 491 complexes were entirely novel, and just 20 of those previously known lacked novel components (Supplementary Table S2). Of the known complexes not recovered by the procedure above, 36 were partially found in single purifications (Supplementary Table S4) but produced a signal too weak to be recovered automatically.

Modular organization of the cell machinery

The above procedure partitions proteins in complexes into two types: core components that are present in most isoforms, and attachments present in only some of them (Fig. 1). This is reminiscent of an organization structure proposed previously that was based on a small-scale analysis²⁷. Complex cores ranged from 1–23 proteins in size (average 3.1 ± 2.5). Among the attachments, we noticed several instances where two or more proteins were always together and present in multiple complexes, which we call ‘modules’ (Supplementary Table S3; on average, associated with 3.3 ± 1.6 cores).

We tested whether this organization was a reflection of biological phenomena by first looking at transcriptional control of the complex components. A quality controlled set of 975 differentially expressed genes derived from microarray analyses¹⁵ showed that a large percentage of pairs of proteins within cores were coexpressed at the

Figure 2 | Evidence supporting complex organization. Proteins in each organization level (cores, and so on) are referred to as groups. a, Percentage of cell cycle co-regulated genes found in the same group. b, Percentage of co-regulated proteins in the same group expressed at the same time during the cell cycle. c, d, are as for a, b, but for sporulation genes. e, Average dispersion ranges for protein abundance within each group. f–h, Percentage of groups having exactly the same subcellular localizations, cellular functions or phylogenetic conservation, respectively. i, j, Percentage of pairs for which a direct interaction is known from three-dimensional structures or yeast two-hybrid experiments, respectively. Values on each bar show the total number of counts; n.d., not determined. See Supplementary Information for further details.

Figure 3 | Architecture and modularity of complexes. Proteins are coloured according to their localization²⁰. The line attribute corresponds to socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold and shaded circles around groups of proteins indicate cores and modules. a, The exosome and the Ski module. b, Stages in de-adenylation-dependent mRNA degradation; arrows show the order of events. c, Two distinct families of cap-binding proteins: the nuclear CBC (cap-binding complex) and the cytoplasmic eIF4F.

NATURE|Vol 440|30 March 2006 ARTICLES

633

Gavin et al., Nature, 2006