Introduction to Artificial Intelligence for Drug Development & Medical Science
Mark Chang, PhD
Copyright © Mark Chang, PhD, AGInception, Boston University, 2019
FDA-Industry Workshop, Washington DC, September 2020
Topics to Cover
• AI/Machine Learning Overview
• Supervised Learning
 – Deep Learning Neural Networks
 – Similarity-Based Learning / Kernel Method / Support Vector Machine
 – Decision Tree Methods
• Non-Supervised Learning
 – Unsupervised Learning
 – Reinforcement Learning
 – Swarm Intelligence Learning
 – Evolutionary Learning
Artificial Intelligence (AI)
• Intelligence = the ability to perform mental work
• AI = software with the ability to perform human mental work
• Robot = a machine (hardware) with the ability to perform physical and/or mental work that usually requires AI
• Examples of AI Technologies and Robotics
 – Calculators, computers, iPhone, iPad
 – ELIZA, apps, search engines, Google Maps, e-commerce systems, …, AlphaZero
 – Machine learning, data mining, statistical modeling
 – Humanoid robots, iRobot, robots on assembly lines, self-driving cars
• Medical AI
 – Quantitative Structure–Activity Relationship (QSAR) and molecular design in drug discovery
 – Cancer prediction using microarray data
 – Disease diagnosis and prognosis based on medical images
 – Natural language processing for medical records
 – Similarity-based AI for clinical trials
Artificial General Intelligence (AGI)
• AGI Is a Machine with the Full Scope of Human Intelligence
 – Unlike narrow AI, which accomplishes specific tasks for humans
 – Capable of experiencing consciousness, emotion, discovery, creativity, self-awareness, collaboration, and evolution
 – May be capable of creating robots like itself, just as humans can give birth
• AGI Is Another Race of Human Beings
 – As medical AI develops, we gradually replace our malfunctioning organs with new ones (artificial or not), making everyone a mixture of human and machine.
 – As robots reach full human intelligence and interact intensively with us, we will develop strong feelings for them and will not discriminate against them just because of their origins.
• Challenges to Overcome
 – Computing power required; possibly powered by quantum computing
 – Physical (muscular) power required
 – A man-machine mixture might be a solution
Machine Learning = Backbone of Narrow AI
[Diagram: machine learning branches into supervised learning (classification, regression), unsupervised learning (clustering, association, dimension reduction, anomaly detection), reinforcement learning, evolutionary learning, and swarm intelligence learning]
Types of Machine Learning
• Supervised Learning
 – The learner gives a response y based on input x and compares it with the target y
 – A trained learner will be able to predict future outcomes
 – Ex: digital image recognition for cancer diagnosis
• Unsupervised Learning
 – Re-represents the inputs in a more efficient way based on their similarities
 – Finds hidden structure without the help of a target answer
 – Ex: document organization, customer segmentation, patient clustering
• Reinforcement Learning
 – Concerns how a learner should take actions in an environment so as to maximize some notion of long-term reward
 – Ex: iRobot, driverless cars
• Evolutionary Intelligence/Learning
 – Emerges through evolution over generations of AI agents, which become better in some sense
 – Ex: genetic algorithms for pharmacovigilance and healthcare management
• Swarm Intelligence/Learning
 – System operation based on micro (individual) motivations, without a centralized controller or leader
 – Ex: ant algorithm for an optimal customer boarding strategy at an airport
Deep Learning
• Deep Learning Neural Networks
 – Multiple-layer artificial neural networks
 – Progressively extract higher-level features from the raw input
• Four Major Network Architectures
 1. Feed-forward Neural Network (FNN), for general classification and regression
 2. Convolutional Neural Network (CNN), for image and facial recognition
 3. Recurrent Neural Network (RNN), for speech recognition and natural language processing
 4. Deep Belief Network (DBN), for disease diagnosis and prognosis
Artificial Neural Network (ANN)
Activation functions: identity, softmax, logistic, sigmoid, tanh, and sgn functions, …
Multilayer Perceptron (Feed-forward Neural Network)
• With an identity activation function, the ANN = a linear function of the Xi and linear in the Wi
• The classic linear model is linear in the parameters Wi but can be nonlinear in the Xi
• Learning = updating the weights Wi
Learning: the Backpropagation Algorithm
Learning (training) = updating the weights wi in the network.
Error (loss) function: E = (1/N) Σi (Ŷi − Yi)², where the predicted output Ŷ = f(wi, Yi) is compared with the target Yi.
Weight increment: Δwi = −α ∂E/∂wi
Update the weights wi backwards, layer by layer, starting from the output layer, through the hidden layers, to the input layer. α = learning rate, a small constant chosen to ensure convergence of the algorithm.
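The weight-update rule Δwi = −α ∂E/∂wi can be sketched for a single linear unit (identity activation); this is a minimal pure-Python illustration, with the data and learning rate chosen for the example:

```python
def train_linear_unit(data, alpha=0.05, epochs=200):
    """Batch gradient descent for one linear unit yhat = w0 + w1*x.

    Loss E = (1/N) * sum_i (yhat_i - y_i)^2, and each epoch applies
    the weight increment dw = -alpha * dE/dw.
    """
    w0, w1 = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = (w0 + w1 * x) - y      # yhat - y
            g0 += 2 * err / n            # dE/dw0
            g1 += 2 * err * x / n        # dE/dw1
        w0 -= alpha * g0                 # dw = -alpha * dE/dw
        w1 -= alpha * g1
    return w0, w1

# Noise-free data generated from y = 1 + 2x; training should recover the weights.
data = [(x / 10, 1 + 2 * (x / 10)) for x in range(-10, 11)]
w0, w1 = train_linear_unit(data)
```

A full backpropagation implementation repeats this same gradient step layer by layer, using the chain rule to pass the error backwards.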
Training and Testing
• Train an AI model to determine its parameters before use
• Test the trained model for its performance
• Define the target population
• Obtain training and testing data sets
 – Resampling with replacement (bootstrap) and without replacement (split)
 – For large-sample problems, use resampling without replacement, because each sample likely reflects the distribution of the target population.
 – For small-sample problems, use resampling with replacement, because each sample often does not reflect the population and the sample is too small for resampling without replacement (empirical distribution ≈ population distribution).
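The two resampling schemes above can be sketched with the standard library; the subject IDs and the 70/30 split are illustrative choices:

```python
import random

random.seed(1)
population = list(range(100))  # hypothetical subject IDs

# Larger samples: resample WITHOUT replacement (split), e.g. 70/30 train/test.
shuffled = random.sample(population, k=len(population))
train_split, test_split = shuffled[:70], shuffled[70:]

# Small samples: resample WITH replacement (bootstrap); duplicates are allowed,
# treating the empirical distribution as the population distribution.
small_sample = population[:15]
bootstrap_train = [random.choice(small_sample) for _ in range(len(small_sample))]
```

Note that a split sample never repeats a subject, while a bootstrap sample of the same size typically does.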
Training Error Versus Testing Error
Over-fitting
Regularization: adding a penalty term to the loss function to reduce over-fitting.
Convolutional Neural Network (CNN)
• Main Ideas in CNNs
 – Different filters in the convolution layers
 – Location invariance and compositionality: these make sense for image processing but not for NLP or time-series events
• Filter Functions
 – Each identifies particular features or image elements
 – Overlapping local features combine to form the "overall picture"
• CNNs for Data Other Than Images, If
 – (1) the data can be transformed to look like image data in matrix form, and
 – (2) the data is just as useful after swapping any two columns
Convolution Illustrated
A convolution of two functions f(x) and g(y) is defined as the sum of the products of one image f(·) at location x and another image g(·) (called the filter) at location y − x. At the pixel level (filter size N = 9):
A high value indicates a good match between the filter and the original image; high values on the diagonal elements indicate the backslash image.
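The pixel-level sum of products can be written directly; as in most CNN implementations, the filter is applied without flipping (strictly, cross-correlation). The 4×4 image and 3×3 "backslash" filter here are illustrative:

```python
def convolve_valid(image, filt):
    """'Valid' 2-D convolution at the pixel level: slide the filter over
    the image and take the sum of element-wise products at each position."""
    fh, fw = len(filt), len(filt[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - fh + 1):
        row = []
        for c in range(iw - fw + 1):
            s = sum(image[r + i][c + j] * filt[i][j]
                    for i in range(fh) for j in range(fw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 "backslash" filter (N = 9 weights): high output where the image
# locally matches the filter's diagonal pattern.
backslash = [[1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
image = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
fmap = convolve_valid(image, backslash)  # high values along the diagonal
```

On this diagonal image the feature map has its largest values exactly where the backslash pattern sits, matching the slide's caption.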
CNN for Disease Diagnosis
• Pooling layer => reduces the resolution or size of the images
• Convolutional layer => filters specific features
• Dense layer => fully connected layer
Sample of MNIST Handwritten Digits
CNN for Written-Digit Recognition: 98.7% Accuracy Using keras in R
Applications of CNN
• Hussain (2017): a cascaded deep CNN for brain tumor segmentation
• Farooq (2017): a CNN-based method for the classification of Alzheimer's disease in MRI images
• Moeskops et al. (2016): a multiscale CNN-based approach for automatic segmentation of MR images, classifying voxels into brain tissue classes
• Prasoon et al. (2013): a tri-planar CNN used for segmentation of tibial cartilage in knee MRI images
• Zhang et al. (2015): segmentation of isointense brain tissue through a CNN using a multimodal MRI dataset, training the network on three patches extracted from the images
• Anthimopoulos et al. (2016): lung pattern classification for interstitial lung diseases using a deep convolutional neural network
• Cole et al. (2016): predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker
• Esteva et al. (2017): dermatologist-level classification of skin cancer with deep neural networks
Architecture of Recurrent Neural Network (RNN)
• Recurrent Neural Network (RNN)
 – An ANN for modeling temporal dynamic behavior
 – Unlike FNNs, RNNs can use their internal state as memory to process sequences of inputs.
• Basic RNN
 – Suffers from a short-term memory problem
 – Difficult to carry information from earlier time steps to later ones when the sequence is long, such as English text
• Long Short-Term Memory Networks (LSTMs)
 – Designed to deal with long-term dependency problems
 – Used for NLP, stock market prediction, voice recognition, motion picture captioning, and poem and music generation.
Vanishing and Exploding Gradient Problem & Solutions
[Diagram: an unrolled RNN X0, X1, …, Xt−1, Xt, …, Xn with a shared weight w and a skip connection]
Problem: when the number of layers gets very large, the gradients at the early layers vanish, and changes to the corresponding weights have minimal effect on the outcome.
Solutions:
• Introduce skip connections (e.g., if … then …)
• Leaky recurrent units
• Gated recurrent networks
• Long Short-Term Memory networks (LSTMs)
• Deep belief networks
Architecture of a Long Short-Term Memory Network
Application of RNNs to Molecular Design
Gupta et al. (2018) trained their RNN on 541,555 SMILES strings with lengths from 34 to 74 SMILES characters (tokens). The RNN model can be used to generate sequences one token at a time, since such models output a probability distribution over all possible tokens at each time step. Typically, the RNN aims to predict the next token of a given input; sampling from this distribution then allows generating novel molecular structures.
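Token-by-token sampling can be illustrated without a trained network; the tiny vocabulary and the fixed conditional distributions below are toy stand-ins for the probabilities a real RNN would emit at each time step:

```python
import random

# Toy stand-in for a trained model: a fixed distribution over the next
# token given the previous token (a real RNN conditions on the whole prefix).
VOCAB = ["C", "O", "N", "<end>"]
NEXT_TOKEN_PROBS = {
    "<start>": [0.70, 0.15, 0.15, 0.00],
    "C":       [0.40, 0.20, 0.15, 0.25],
    "O":       [0.55, 0.05, 0.10, 0.30],
    "N":       [0.55, 0.10, 0.05, 0.30],
}

def sample_sequence(rng, max_len=20):
    """Generate one token at a time by sampling from the model's
    next-token distribution, as in RNN-based molecule generation."""
    tokens, prev = [], "<start>"
    for _ in range(max_len):
        tok = rng.choices(VOCAB, weights=NEXT_TOKEN_PROBS[prev])[0]
        if tok == "<end>":
            break
        tokens.append(tok)
        prev = tok
    return "".join(tokens)

rng = random.Random(7)
samples = [sample_sequence(rng) for _ in range(5)]
```

Generating real SMILES requires a trained network and a chemical-validity check on the sampled strings; the loop structure, however, is exactly this.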
Simplified Molecular-Input Line-Entry System (SMILES)
SMILES describes the structure of chemical species using short ASCII strings. In 2007, an open standard called OpenSMILES was developed in the open-source chemistry community.
Applications of LSTMs in NLP
• Sentiment Analysis: identify text as "positive" or "negative," filter spam or classify email text as spam, classify the language of the source text
• Language Modeling: predict the probabilistic relationships between words, enabling one to predict the next words or phrases
• Speech Recognition: translate speech to text readable by humans
• Machine Translation: translate text between languages
• Document Summarization: a heading or abstract for a document
• Question Answering System: process questions and provide answers in a natural language
A trained LSTM provides the probability of a string S via conditional probabilities: P(S) = Πt P(st | s1, …, st−1).
Natural Language Processing (NLP) Versus Molecular Design
• Features in NLP
 – Words at different locations, or the sequence of words in a sentence
• Word Embedding
 – Semantic modeling requires mapping words to numbers, or word embedding.
 – Word embedding involves a mathematical embedding from a space with many dimensions per word to a continuous vector space of lower dimension.
• Features in Molecular Design
 – Molecular models exhibit tree structures, which can be transformed into one-dimensional sequences of different substructures
• Word Embedding in Biological Sequences
 – Word embedding for n-grams of biological sequences (e.g., DNA, RNA, and proteins): Asgari and Mofrad (2015).
 – The representation can be used in deep learning in proteomics and genomics.
• Long Short-Term Memory Nets
 – NLP deals with sequences of words; drug discovery deals with gene sequences or molecular structures represented by sequences of biological substructures
 – LSTMs can be used for compound screening and molecular design, as in NLP.
Deep Belief Network (DBN)
• Two Major Challenges in QSAR Studies and Drug Design
 – (1) the large number of descriptors may have autocorrelations
 – (2) proper parameter initialization in model prediction to avoid an over-fitting problem
• DBN Combines Unsupervised and Supervised Learning
 – For efficient learning
 – Each DBN layer is trained independently in the unsupervised portion
 – The unsupervised learning is used for dimension reduction or for selecting subsets of descriptors.
 – After completion of the unsupervised learning, the output from the layers is refined with supervised logistic regression.
Architecture of a Deep Belief Net with Stacked Restricted Boltzmann Machines (RBMs)
• RBMs are two-layer neural nets that constitute the building blocks of deep belief networks.
• The first layer of an RBM is called the visible, or input, layer; the second is the hidden layer.
• Activation function σ = normalized exponential energy function.
• Two-stage learning: unsupervised learning for feature selection = training the RBMs in parallel, followed by supervised learning using backpropagation.
[Diagram: stacked RBMs; the input layer feeds the 1st, 2nd, and 3rd RBMs of σ units, ending in a class output]
Single-Step Contrastive Divergence (CD-1) Procedure for RBM and DBN
• Take a training sample v and initial weights W; compute the probabilities of the hidden units.
• Sample a hidden activation vector h from this probability distribution.
• Compute the positive gradient = outer product of v and h.
• From h, sample a reconstruction v′ of the visible units, then resample the hidden activations h′ from it (the Gibbs sampling step).
• Compute the negative gradient = outer product of v′ and h′.
• Update the weights W with increment = learning_rate × (positive gradient − negative gradient).
• Update the biases a and b.
• The procedure starts from the input layer (the first RBM layer) and moves gradually to the output layer.
• When the unsupervised training is finished for all RBMs, the supervised learning (classification) starts to fine-tune the weights.
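The CD-1 recipe above can be sketched for a small binary RBM in pure Python. This is a toy, not an optimized implementation; the layer sizes, learning rate, and training pattern are illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v, W, a, b, lr, rng):
    """One CD-1 step for a binary RBM, following the slide's recipe.
    v: visible vector; W[i][j]: weight visible i -> hidden j; a, b: biases."""
    nv, nh = len(v), len(b)
    # Hidden probabilities given v, and a sampled hidden vector h
    ph = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h = [1 if rng.random() < p else 0 for p in ph]
    # Positive gradient = outer(v, h)
    pos = [[v[i] * h[j] for j in range(nh)] for i in range(nv)]
    # Reconstruction v' from h (Gibbs step), then hidden probabilities h' from v'
    pv = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(nh))) for i in range(nv)]
    v2 = [1 if rng.random() < p else 0 for p in pv]
    ph2 = [sigmoid(b[j] + sum(v2[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # Negative gradient = outer(v', h'); update weights and biases
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (pos[i][j] - v2[i] * ph2[j])
    for i in range(nv):
        a[i] += lr * (v[i] - v2[i])
    for j in range(nh):
        b[j] += lr * (h[j] - ph2[j])

rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(4)]  # 4 visible, 2 hidden
a, b = [0.0] * 4, [0.0] * 2
for _ in range(100):
    cd1_update([1, 1, 0, 0], W, a, b, lr=0.1, rng=rng)
```

After training on the pattern [1, 1, 0, 0], the visible biases drift toward the pattern: a[0] and a[1] can only increase, while a[2] and a[3] can only decrease.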
Applications of Deep Belief Networks
• Hinton et al. (2006) proposed a fast learning algorithm for deep belief networks.
• Zhen et al. (2016) proposed a convolutional deep belief network for direct estimation of ventricular volume from images without performing segmentation at all.
• Jaekwon et al. (2017) compared DBNs with other methods in cardiovascular risk prediction. They proposed a cardiovascular disease prediction model using the sixth Korea National Health and Nutrition Examination Survey (KNHANES-VI) 2013 dataset to analyze cardiovascular-related health data. They showed that the DBN-based prediction model has 83.9% accuracy.
• Ghasemi et al. (2018) used a deep belief network and evaluated the DBN's performance using Kaggle datasets with fifteen targets containing more than 70k molecules. The results revealed that optimizing parameter initialization improves the ability of deep neural networks to provide high-quality model predictions.
Software Packages for Deep Learning (TensorFlow, Keras, keras and kerasR)
• TensorFlow
 – Multiple levels of abstraction; developed by Google
 – An open-source artificial intelligence library that uses data flow graphs to build models
 – For creating large-scale neural networks with many layers
 – Scalar -> Vector -> Matrix -> Tensor
 – TensorFlow = multidimensional data flow from layer to layer in artificial neural networks
• Keras
 – A high-level neural networks API written in Python
 – Capable of running on top of TensorFlow, CNTK, or Theano
 – Developed with a focus on enabling fast experimentation
• The R packages keras and kerasR
 – Two R versions of Keras for the statistical community
 – The keras package uses the pipe operator (%>%) to connect functions or operations together; you won't find this in kerasR
 – kerasR uses the $ operator to build your model
 – kerasR contains functions that are named in a similar, but not identical, way to the original Keras package.
Machine Learning Packages in R
Public Data Sources for AI
• The commonly used public data repositories for AI applications in cancer prediction and clustering include the TCGA, UCI, NCBI Gene Expression Omnibus (GEO), and Kent Ridge biomedical databases.
• Kaggle
 – Inside Kaggle you'll find all the code and data you need for your data science work: over 19,000 public datasets and 200,000 public notebooks, ready for any analysis.
 – www.kaggle.com
• UCI Center for Machine Learning and Intelligent Systems
 – http://archive.ics.uci.edu/ml/datasets.php
Two Special Artificial Neural Networks:
– Generative Adversarial Networks (GANs)
– Autoencoder (Autoassociative Network)
Generative Adversarial Networks (GANs)
1. The generator takes in random numbers and returns an image.
2. The generated image is fed into the discriminator along with a stream of images taken from the actual dataset.
3. The discriminator takes in both real and fake images and returns probabilities of authenticity.
A GAN can be viewed as the combination of a counterfeiter and a cop, where the counterfeiter is learning to pass false notes and the cop is learning to detect them. Both are dynamic in this zero-sum game; each side comes to learn the other's methods in a constant escalation. As the discriminator changes its behavior, so does the generator, and vice versa.
Applications of Generative Adversarial Networks (GANs)
• Imaging markers can be used for monitoring disease progression with or without treatment. Models are typically based on large amounts of data with annotated examples of known markers, aiming at automating detection. Christian Doppler et al. (2017) developed a GAN that can learn a manifold of normal anatomical variability. Applied to new data, such as images containing retinal fluid or hyperreflective foci, the model labels anomalies and scores image patches, indicating their fit to the learned distribution.
• Deep generative adversarial networks are an emerging technology in drug discovery and biomarker development. Kadurin et al. (2017) demonstrated a proof of concept of implementing a deep GAN to identify new molecular fingerprints with predefined anticancer properties. They also developed a new GAN model for molecular feature extraction problems and showed that the model significantly enhances the capacity and efficiency of developing new molecules with specific anticancer properties using deep generative models.
• Yahi et al. (2017) proposed a framework for exploring the value of GANs in the context of continuous laboratory time series data. The authors devised an unsupervised evaluation method that measures the predictive power of synthetic laboratory test time series, and showed that when it comes to predicting the impact of drug exposure on laboratory test data, incorporating representation learning of the training cohorts prior to training the GAN models is beneficial.
• Putin et al. (2018) proposed a Reinforced Adversarial Neural Computer (RANC) for the de novo design of novel small-molecule organic structures based on the GAN paradigm and reinforcement learning. The study shows that RANC can reasonably be regarded as a promising starting point for developing novel molecules with activity against different biological targets or pathways. This approach allows scientists to save time and covers a broad chemical space populated with novel and diverse compounds.
Autoencoder (Autoassociative Network)
1. Autoassociative: output = input
2. Dimension reduction: the hidden layer is smaller than the input layer
3. A mixture of unsupervised and supervised learning
Applications of Autoencoders
• Kadurin et al. (2017) presented an application of autoencoders for generating novel molecular fingerprints with a defined set of parameters.
 – A 7-layer architecture with the latent middle layer serving as a discriminator
 – Input and output use a vector of binary fingerprints and the concentration of the molecule
 – NCI-60 cell line assay data for 6,252 compounds
 – Screened 72 million compounds in PubChem and selected candidate molecules with potential anti-cancer properties
• Cancer prediction using AI includes predicting the existence of cancer, the cancer type, and survivability risk. Different types of autoencoders have been used for filtering microarray gene expressions:
 – Stacked denoising autoencoders (Jie et al., 2015)
 – Contractive autoencoders (Macías-García et al., 2017)
 – Sparse autoencoders (Rasool, 2013)
 – Regularized autoencoders (Kumardeep et al., 2017)
 – Variational autoencoders (Way and Greene, 2017)
 – A deep architecture of four layers with 15,000, 10,000, 2,000, and 500 neurons (Danaee et al., 2016)
Decision Tree
• A Tree Method (TM)
 – A commonly used supervised learning method
 – Classification & Regression Trees
 – Features are dichotomized for a binary tree
• Impurity Measures for Optimization
 – Gini index (CART)
 – Entropy (ID3, C4.5)
 – Misclassification error
• Overfitting and Error Propagation
 – For larger trees, an early misclassification error can propagate downstream, eventually leading to poor predictions. Solutions:
 – Tree pruning
 – Bagging & boosting
 – Random forests
• Applications in Medicine
Example of a Binary Decision Tree
The Decision Tree Method for Categorical Outcomes
Common Impurity Measures
For node m of a classification tree (predicted probability pmk):
Misclassification error: 1 − maxk pmk
Gini index: Σk pmk (1 − pmk)
Cross-entropy or deviance: −Σk pmk log pmk
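All three measures are a few lines of code; for a two-class node, each peaks at p = (0.5, 0.5) and vanishes for a pure node:

```python
from math import log

def misclassification(p):
    """Misclassification error: 1 - max_k p_mk."""
    return 1 - max(p)

def gini(p):
    """Gini index: sum_k p_mk * (1 - p_mk)."""
    return sum(pk * (1 - pk) for pk in p)

def cross_entropy(p):
    """Cross-entropy (deviance): -sum_k p_mk * log(p_mk), skipping p = 0."""
    return -sum(pk * log(pk) for pk in p if pk > 0)

p = [0.5, 0.5]   # maximally impure two-class node
q = [0.9, 0.1]   # nearly pure node: all three measures are smaller
```

A tree-growing algorithm chooses the split that most reduces the chosen impurity, averaged over the child nodes.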
Threshold Determination and Tree Pruning
• Determining an Optimal Tree Model
 – Two competing factors: the accuracy of the tree method and computational efficiency.
 – For a given tree depth, the optimal threshold for each parameter is obtained by the greedy method: try a range of different thresholds
 – Any of the three impurity measures can be used, but often the misclassification rate.
 – Tree size is a tuning parameter governing the model's complexity
• Cost-Complexity Pruning
 – One could split tree nodes only when the decrease in error due to the split exceeds some threshold, but this strategy is too short-sighted, since a seemingly worthless split might lead to a very good split below it.
 – The preferred strategy is to grow a large tree, stopping the splitting process only when some minimum node size or tree depth is reached; this large tree is then pruned using cost-complexity pruning.
Bagging, Boosting, and Random Forests
• A single big tree can propagate classification errors to the leaves, making the prediction very unstable.
 – Solutions: bagging, boosting, and random forests
• Bagging (Bootstrap Aggregation)
 – Forming an average of many different trees.
• Boosting
 – Output the class: G(x) = sign(Σi αi Gi(x))
 – Weak classifiers Gi(x) are only slightly better than random guessing.
 – αi is computed by the boosting algorithm (AdaBoost) to put more weight on the better classifiers Gi(x).
• Random Forests
 – An ensemble classifier consisting of many decision trees with randomly selected features
 – The final class is the mode of the classes output by the individual trees.
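The boosted classifier's weighted vote is simple to sketch. The three weak "stumps" and their α weights below are hypothetical; a real AdaBoost run would fit the stumps and compute the α's from their weighted error rates:

```python
def adaboost_predict(classifiers, alphas, x):
    """Final boosted class G(x) = sign(sum_i alpha_i * G_i(x)),
    where each weak classifier G_i returns +1 or -1."""
    s = sum(a * g(x) for g, a in zip(classifiers, alphas))
    return 1 if s >= 0 else -1

# Three hypothetical decision stumps on a 1-D feature; the alphas give
# more say to the stumps assumed more reliable.
stumps = [lambda x: 1 if x > 0.2 else -1,
          lambda x: 1 if x > -0.1 else -1,
          lambda x: -1 if x > 0.9 else 1]
alphas = [0.9, 0.7, 0.2]
pred = adaboost_predict(stumps, alphas, 0.5)
```

A bagged or random-forest prediction replaces the weighted sum with an unweighted majority vote over trees grown on bootstrap samples.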
Applications of Decision Tree Methods
• Singh et al. (2015) used RFs for bioactivity classification.
• Wang et al. (2015) used the RF method to model the protein–ligand binding affinity for 170 complexes of HIV-1 proteases, 110 complexes of trypsin, and 126 complexes of carbonic anhydrase.
• Kumari et al. (2015) constructed an improved RF by integrating bootstrap and rotation feature matrix components for drug target identification.
• Mistry et al. (2016) used RFs and DTs to model the drug–vehicle toxicity relationship. Their dataset included 227,093 potential drug candidates and 39 potential vehicles. The resulting model predicted the toxicity relief of drugs by specific vehicles.
Similarity-Based and Similarity-Principle-Based Machine Learning
New Scientific Paradigm
• Controversies in Statistical Evidence and Scientific Discovery
• Call for New Paradigms: AI/Machine Learning
• The Similarity Principle (SP)
 – Human intelligence in daily life
 – Role of the SP in scientific discovery
 – Similarity-Based Machine Learning (SBML)
Simpson’sParadox
Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 47
Allpatients:BbetterthanAMales:AbetterthanBFemales:AbetterthanBQuestion:ShouldapatienttakeAorB?AllFemales:AbetterthanBYoungFemales:BisbetterthanAOldFemales:BisbetterthanAQuestion:ShouldapatienttakeAorB?
• Howtointerprettheresultsifthesepatients+you=theentirepatientpopulation(Oneresponsefromyouwouldnotchangethedirectionofthedrugeffect)?
• ControversialQuestion:HowSpecificIsTooSpecific?• Itisphilosophicalquestionorstatisticalquestion:Don’tputcartbeforethehorse• FundamentalSolution:TheSimilarityPrinciple
Controversies in Prediction of Drug Effect
• Longevity Prediction – the Paradox of Traveling
 – Does the prediction of a person's longevity change as soon as he arrives in a new country, because of its different life expectancy?
• Bias in Predictive Effect
 – Stratified randomization and non-randomness in clinical site selection lead to a patient distribution in the clinical trial (e.g., race distribution) that differs from the target population
 – If race has an effect, the mean drug effect will usually be biased.
• Drug Effect Estimation for Multiregional Trials
 – How small should regions be: country, state, city?
Similarity Principle
• All science, and learning itself, is based on a fundamental principle: the similarity principle (Chang, 2012, 2014).
• The Similarity Principle:
 – Similar things or individuals will likely behave similarly, and the more similar they are, the more similarly they behave.
 – Ex.: people with the same (or a similar) disease, gender, and age will likely have similar responses to a medical intervention. If they are similar in more aspects, they will have more similar responses.
 – Predicted outcome = similarity-weighted average of the observed outcomes.
 – Now, do you know how to resolve Simpson's Paradox?
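A similarity-weighted prediction takes only a few lines; the exponential similarity score and the (age, response) training pairs below are illustrative assumptions, not data from the slides:

```python
import math

def predict(new_x, train, scale=1.0):
    """Similarity-principle prediction: Y_hat = sum_j S_j * Y_j / sum_j S_j,
    a similarity-weighted average of the observed outcomes. The similarity
    here is a simple exponential in distance (an assumption; any bounded
    score in [0, 1] with identical = 1 would do)."""
    sims = [math.exp(-scale * abs(new_x - x)) for x, _ in train]
    return sum(s * y for s, (_, y) in zip(sims, train)) / sum(sims)

# Hypothetical training subjects: (age, response)
train = [(20, 1.0), (25, 1.2), (60, 3.0), (65, 3.1)]
y_young = predict(22, train)   # dominated by the similar young subjects
y_old = predict(63, train)     # dominated by the similar old subjects
```

Because the weights decay with distance, each prediction is driven almost entirely by the most similar subjects, which is exactly how the principle sidesteps the "which subgroup?" question behind Simpson's paradox.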
The Similarity Principle Illustrated
Similar things behave similarly; things more similar behave more similarly.
Examples:
1. QSAR: compounds with similar structures will have similar activities.
2. Terrorist attacks: September 11 of every year is high-risk.
3. Use the results of drug tests in animals to inform a small first-in-man clinical trial, and use the results from clinical trials to decide whether the drug can be used for a large patient population.
Feature selection and grouping; roles of the outcome and features in similarity:
1. Similarity relies on both the features and the outcome variable.
2. The relative importance of different features can be objectively determined using feature-scaling factors.
3. Define the target population, which is used for model evaluation.
4. Apply to scientific discovery and causal relationships.
Two Approaches to Learning from Data
• Causality Approach
 – Seeking the relationship between the outcome and the independent variables (attributes)
 – Y(x) = f(x; a(x′)); the parameters are a function of the observed data
• Similarity Approach
 – Seeking the relationship between the outcomes of different subjects with different attributes
 – Y(x) ~ S(x, x′) Y(x′); the similarity is a function of the observed data
• Similarity Principle (SP) versus Causality
 – The SP is the foundation of any causal relationship
 – Similarity grouping is unavoidable for repetitions of events and for causality
• AI Learning and Scientific Discovery
 – Shows the human brain's incredible ability
 – Implies humans' incredible inability to handle everything individually
Subjectivity of Causality & Controversies in Interpretations of Causal Effects
• Subjectivity in potential cause selection
• Causes are generally not independent
• Models are not unique
 – Body weight = f(waist, height)
 – Body weight = f(waist, height, shoulder height)
 – Body weight = f(waist, height, shoulder height, foot length)
 – In fact, all factors are correlated and statistically significant
• Different models provide different interpretations of the effect (association or causality) of waist and height
• Similar controversies in drug effect = f(drug, race, gene x, gene y)
 – Simply stating a drug effect without specifying the other factors is misleading.
 – Interpretation of a factor's effect must be in the context of all the other factors in the model
 – Given a large but finite population, many factors (and infinitely many possible combinations of these factors) can be significant. The factors included in the model are subjective, and consequently the interpretation of the drug effect is arbitrary.
Does Analysis of Covariance Make Sense?
• Model Options:
 – Model 1: weight = a1 × height
 – Model 2: weight = a2 × height + b2 × (shoulder height)
 – Model 3: weight = a3 × height + b3 × (shoulder height) + c3 × (leg length)
 – Model 4: weight = d4 × (head height) + b4 × (shoulder height)
• Given big data
 – All models are significant
• How should the parameters be interpreted?
 – What is the effect of height: a1, a2, or a3?
 – What is the effect of shoulder height: b2, b3, or b4?
 – a1, a2, and a3 can be very different, since height mainly consists of shoulder height, and the leg is part of shoulder height and height
 – c3 can be negative, since a3 and b3 have already included the leg effect; would you say "longer legs will reduce the weight"?
• Conclusion:
 – Many attributes or covariates (including genes) in modeling exhibit strong correlations; the common interpretation of the parameters often does not make much sense.
The Similarity Principle in Action: Similarix
Bounded similarity: 0 ≤ Sij ≤ 1; completely different = 0, identical = 1.
Feature-Scaling Factors (FSF)
• The Rk depend on the selected features and the outcome
• Including more common features will
 – reduce the differences and increase the similarity among subjects
 – lead to larger differences in the Rk to balance out the effect of the common features
• Why an exponential similarity function?
 – Humans can pay attention only to limited "local events"; distant events quickly disappear over the horizon
 – Human sense organs work on a log scale (e.g., loudness, brightness)
Exponential similarity function:
Distance function:
SBML Algorithm
Training:
1. Normalize the training data set.
2. Assign initial similarity scores Sij.
3. Solve for the initial scaling factors Rk based on Sij and the similarity function.
4. Calculate the weights Wij using Sij and Rk.
5. Model the outcome: Yi = Σj wij Oj.
6. Calculate the error E = Σi (Yi − Oi)² / N.
7. Update the scaling factors Rk using the gradient method with learning rate α.
Prediction:
1. Normalize the new data (without the outcome variable) using the mean and standard deviation from the training set.
2. Calculate the similarity scores Sij between the new subjects and the subjects in the training set using the previously determined Rk.
3. Calculate the weights Wij using Rk.
4. Predict the future outcome: Yi = Σj wij Oj.
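The prediction half of the flowchart can be sketched as follows. The exact distance and similarity forms are assumptions here (a weighted Euclidean distance inside an exponential similarity function); in full SBML training, the scaling factors Rk would themselves be learned by gradient descent on the error E:

```python
import math

def sbml_predict(new_subj, train_x, train_y, R):
    """One SBML-style prediction step (sketch).

    Assumed forms: d_ij = sqrt(sum_k R_k * (x_ik - x_jk)^2),
    S_ij = exp(-d_ij), w_ij = S_ij / sum_j S_ij,
    and prediction Y_i = sum_j w_ij * O_j.
    """
    sims = []
    for xj in train_x:
        d = math.sqrt(sum(r * (a - b) ** 2 for r, a, b in zip(R, new_subj, xj)))
        sims.append(math.exp(-d))
    total = sum(sims)
    weights = [s / total for s in sims]
    return sum(wij * oj for wij, oj in zip(weights, train_y))

# Hypothetical normalized data: feature 1 matters (R_1 large);
# feature 2 is pure noise and is ignored (R_2 = 0).
train_x = [(0.0, 9.0), (0.1, -9.0), (1.0, 0.5), (1.1, -0.5)]
train_y = [0.0, 0.1, 1.0, 1.1]
R = (5.0, 0.0)
y_hat = sbml_predict((0.05, 3.0), train_x, train_y, R)
```

With R2 = 0 the wildly different second feature contributes nothing to the distance, so the prediction is pulled toward the two subjects with a similar first feature.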
Learning with Regularization
• Loss Functions
 – Lasso: E + λ Σk |Rk|
 – Ridge: E + λ Σk Rk²
 – Elastic Net: E + λ1 Σk |Rk| + λ2 Σk Rk²
• Minimization with the gradient method
• Learning with rate α
Clinical Trial Example: Rare Disease
Mean squared error (MSE) of different machine learning methods, based on bootstraps:

Training Set Size | Full Linear Model | Optimal Linear Model | Ridge Regression | SBML
25%  | 0.774 | 0.788 | 0.796 | 0.616
50%  | 0.803 | 0.802 | 0.804 | 0.655
75%  | 0.806 | 0.809 | 0.810 | 0.668

Note: MSE normalized by the variance of the outcomes.
Training, Validation & Evaluation of AI Methods
• Cross-Validation for Tuning Parameters
 – Exhaustive cross-validation methods: all possible ways of making training/validation splits.
 – Leave-p-out cross-validation (LpOCV): exhaustively splits the data with p observations as the validation set and the rest for training.
 – Bootstrapping: random selection of size p as the training set and size q as the test set, with replacement
• Target Population (TP)
 – A well-defined TP (WDTP) is needed for model evaluation, but in the AI world a WDTP is often missing.
 – Big data: simply split the sample into a training set and a test set
 – Larger data: training and test sets obtained by resampling without replacement
 – Small data: use the empirical distribution; training and test sets obtained by bootstrapping
SBML Performance on a Regression Problem
[Plot: MSE of predicted Y / naïve MSE versus training sample size]
Outcome Y = X1 + X2X3 + X4X4 + X5. However, X1, …, X7, from a standard multivariate normal distribution, are included in all models (X6 and X7 are noise).
Precision Versus Complexity of the Data Relationship, Sample Size, and Computational Efficiency
• Variables X6 and X7 are noise
• Training time = O(N²) = 0.09 N² (seconds); training time = 2.4 minutes for training set size N = 400
• MSE normalized by the naïve MSE from the prediction using the training sample mean
• Learning rate α = 0.125, λ = −0.5, epochs = 5
[Plot: normalized MSE versus training sample size for a less nonlinear model, Y = X1 + X2X3 + X4X4 + X5, and a more nonlinear model, Y = X1 + 5X2X3 + X4X4 + X5]
SBML Performance for Classification
• Y = (sign(X1 + X2X3 + X4X4 + X5) + 1)/2
• Xi = multivariate normal
• Loss function λ = −0.5
• Learning rate α = 0.125
• MSE is based on the probability of Y = 1.
[Plot: MSE of the logistic model versus SBML]
Paradigms of Drug Development
• Challenges
 – Multi-regional trials and subgroup analysis
 – Rare disease and precision medicine
• Solutions
 – Smaller clinical trials
 – Stagewise marketing authorization
 – Enhanced pharmacovigilance & postmarketing monitoring
KernelMethod• KernelMethod:
– KernelAssmoothfunctionwithnormalizationfactor(Hastieetal,2001)– Non-linearkernelregressionwithoutnormalizationfactor(Hastieetal,2001)– Kernelasattributesinlearning(trainingweights)(Scholkopf,etc.,2005)
• Kernel
– Kernel,k(�,�),isthedotproductoffeaturevectors– Kernelisasimilarityscorefunction
• FeatureSelectionforDifferentObjects– Sequencing(text,sound,music)– Images(black-white,color)– Trees(chemicalcompounds)– Networks(metabolicnet,disease-symptomnet,drug-AEnet)
The Kernel Trick
• Any algorithm for vectorial data that can be expressed only in terms of dot products between vectors can be performed implicitly in the feature space associated with any kernel, by replacing each dot product with a kernel evaluation (Scholkopf et al., 2005).
• Transform linear methods (e.g., PCA) into nonlinear methods by simply replacing the classic dot product with a more general kernel, such as the Gaussian RBF kernel.
• Nonlinearity via the new kernel is then obtained at minimal extra computational cost, as the algorithm remains exactly the same.
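A small sketch of the trick (a Python illustration with hypothetical helper names, not code from the deck): squared distance in feature space can be written purely in dot products, so swapping the linear kernel for an RBF kernel changes the geometry while the algorithm stays identical.

```python
import math

def linear_k(x, y):
    """Ordinary dot product (the 'classic' kernel)."""
    return sum(a * b for a, b in zip(x, y))

def rbf_k(x, y, gamma=1.0):
    """Gaussian RBF kernel: an implicit feature-space dot product."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram(X, k):
    """Gram matrix: the only quantity a kernelizable algorithm needs."""
    return [[k(xi, xj) for xj in X] for xi in X]

def feature_dist2(x, y, k):
    """||phi(x) - phi(y)||^2 = k(x,x) - 2 k(x,y) + k(y,y), dot products only."""
    return k(x, x) - 2 * k(x, y) + k(y, y)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
G = gram(X, rbf_k)                          # swap rbf_k for linear_k; same code
d_lin = feature_dist2(X[0], X[1], linear_k)  # ordinary squared distance
d_rbf = feature_dist2(X[0], X[1], rbf_k)     # implicit feature-space distance
```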
Support Vector Machine As A Kernel Method
Classifier with Hard Margin Model:
Classifier with Soft Margin Model:
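The two classifiers named on this slide are usually written as the following optimization problems (a standard textbook reconstruction, since the slide's formula images are not reproduced here):

```latex
% Hard margin: maximize the margin subject to perfect separation
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{subject to } y_i\,(w^\top x_i + b) \ge 1,\ i = 1,\dots,N

% Soft margin: allow violations via slack variables \xi_i, penalized by C
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to } y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0
```

In the dual form, the data enter only through dot products x_i·x_j, which is what makes the SVM a kernel method: each dot product can be replaced by a kernel evaluation k(x_i, x_j).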
Support Vector Machine In R For Predicting Cognitive Impairment
• Alzheimer's disease is the most common cause of dementia in the elderly.
• Predict the cognitive outcome using the demographics and 130 biomarkers (predictors).
Application Of Kernel Methods
"Kernel Methods in Computational Biology" (Scholkopf et al., 2004) collects different applications of kernel methods and support vector machines:
• Inexact Matching String Kernels for Protein Classification
• Fast Kernels for String and Tree Matching
• Local Alignment Kernels for Biological Sequences
• Kernels for Graphs
• Diffusion Kernels
• A Kernel for Protein Secondary Structure Prediction
• Heterogeneous Data Comparison and Gene Selection with Kernel Canonical Correlation Analysis
• Kernel-Based Integration of Genomic Data Using Semi-definite Programming
• Protein Classification via Kernel Matrix Completion
• Accurate Splice Site Detection for Caenorhabditis elegans
• Gene Expression Analysis: Joint Feature Selection and Classifier Design
• Gene Selection for Microarray Data
Application Of SVM
• SVM-based protein fold recognition methods (Shamim et al., 2007, 2013; Damoulas and Girolami, 2008; Dong et al., 2009; Yang and Chen, 2011). The main difference among these SVM-based methods is their feature representation algorithms.
• Poorinmohammad et al. (2015) combined the SVM approach with pseudo amino acid composition descriptors to classify anti-HIV peptides, with a prediction accuracy of 96.76%.
• Wei and Zou (2016) provided a review of the recent progress in machine learning-based methods for protein fold recognition prior to 2011.
Unsupervised Learning
Unsupervised Learning (USL)
• Unsupervised Learning Has No Clear, Correct Answers
• USL = Clustering, Association, and Anomaly Detection
– Clustering = discovering the inherent groupings in the data. Ex: grouping patients by their baseline disease severity and demographic characteristics
– Association rule learning = discovering rules that describe large portions of the data. Ex: people buying product A also tend to buy product B.
– Anomaly (outlier) and novelty detection = identifying items, events, or observations that do not conform to an expected pattern. Anomaly detection can also be done as supervised learning. Ex: structural defects, medical problems, text errors, or bank fraud
• Applications
– Grouping breast cancer patients by their genetic markers
– Grouping movie viewers by the ratings they assign
– Finding sale items that are highly related is very helpful when stocking shelves or doing cross-marketing in sales promotions, catalog design, and consumer segmentation based on buying patterns
Common Unsupervised Learning Algorithms
• Apriori Algorithm for Association Rules
– Identify the frequent individual items in the relational database and extend them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database
• Principal Component Analysis (PCA)
– Find a low-dimensional representation of the observations
– Sequentially find a set of linear combinations of the predictors that have maximal variance and are orthogonal to each other
• K-Means Clustering for Dimension Reduction
– Partition the N observations into a pre-specified K clusters so as to minimize the within-cluster sum of squares
• Hierarchical Clustering for Dimension Reduction
– Yields a tree-like visual representation (dendrogram) of the observations without a pre-specified number of clusters
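The K-means objective described above — partitioning into K clusters to minimize the within-cluster sum of squares — can be illustrated with a minimal Lloyd's-algorithm sketch (the deck's code is in R; the data and function names here are hypothetical):

```python
import random

def kmeans(points, k, iters=20, seed=1):
    """Lloyd's algorithm on 1-D data: alternately assign each point to its
    nearest center and recompute centers, (locally) minimizing the
    within-cluster sum of squares (WSS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    wss = sum((p - centers[j]) ** 2
              for j, c in enumerate(clusters) for p in c)
    return centers, clusters, wss

# Two well-separated groups, e.g. mild vs. severe baseline severity scores
points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centers, clusters, wss = kmeans(points, 2)
```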
Hierarchical Clustering Algorithm
Hierarchical Clustering For Breast Cancer Patients
Heatmap Of Gene Samples
Applications Of Clustering Methods
• Böcker et al. (2005), a hierarchical clustering approach for large compound libraries
• Wallner et al. (2010) studied correlation and cluster analysis of immunomodulatory drugs based on cytokine profiles.
• Tan et al. (2015) and Rasool et al. (2013) used neural network filtering methods for extracting representations that best describe the gene expressions.
• Haraty et al. (2015) studied an enhanced k-means clustering algorithm for pattern discovery in health data.
• Yildirim et al. (2014) used the k-means algorithm to discover hidden knowledge in health records of children and data mining in bioinformatics. They conclude that medical professionals can investigate the clusters the study revealed, thus gaining useful knowledge and insight into the data for their clinical studies.
• Hameed et al. (2018) proposed a two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration.
• MacCuish et al. (book, 2019), clustering in bioinformatics and drug discovery
• Ma et al. (2019), a comparative study of cluster detection algorithms in protein–protein interaction for drug target discovery and drug repurposing
Reinforcement Learning (RL)
• Learning
– Optimize a utility function
– Feedback from one's environment is essential for RL.
– For situations where the correct answer is difficult to define or there are too many possible paths to complete the task
• Example: AlphaGo Zero
– A computer program that plays the board game Go
– At the 2017 Future of Go Summit, beat the world No. 1 ranked player
– AlphaGo and its successors use a Monte Carlo tree search algorithm to find their moves, based on knowledge learned by an ANN from both human and computer play
• Methods
– Stochastic Decision Process
– Q-Learning
• Applications
– Markov Decision Process for clinical development programs (Chang, 2010)
– Value iteration and policy iteration algorithms
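Q-learning, listed above, can be sketched on a toy problem (the environment, reward, and parameters here are invented for illustration, not from the deck): an agent on a short chain of states learns, from reward feedback alone, that moving right reaches the goal.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions 0 = left,
    1 = right; reward 1 only on reaching the terminal state n-1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (feedback-driven exploration)
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the best action in s2
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy for the non-terminal states: always move right
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]
```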
Phase Transition Probabilities Of Clinical Trials
Source: Chang et al. (2019)
Markov Chain For Clinical Development Program
Source: Chang, 2010; Chang et al., 2019
Pij = actual phase transition probability from Phase i to Phase j, for all disease indications
SDP = Markov chain & action rules (called a policy) to optimize the NPV at each stage. Backward induction & dynamic programming are used to solve the problem. Action = trial design and conduct, ……
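The backward-induction recursion mentioned above can be sketched as follows. The transition probabilities, stage costs, and terminal value here are purely hypothetical stand-ins (not the values in Chang et al., 2019): fold the terminal value back through the phases via V_i = -cost_i + p_i · V_{i+1}.

```python
# Hypothetical inputs (illustration only):
# p[i]    = probability of advancing from stage i to stage i+1
# cost[i] = cost of running stage i
# reward  = value at marketing authorization
p = [0.6, 0.35, 0.62]        # Phase 1 -> 2, Phase 2 -> 3, Phase 3 -> approval
cost = [5.0, 20.0, 60.0]     # stage costs (e.g., $M)
reward = 500.0

def expected_npv(p, cost, reward):
    """Backward induction (dynamic programming): start from the terminal
    value and fold back V_i = -cost_i + p_i * V_{i+1}."""
    v = reward
    for pi, ci in zip(reversed(p), reversed(cost)):
        v = -ci + pi * v
    return v

v = expected_npv(p, cost, reward)
# Phase 3: -60 + 0.62*500 = 250; Phase 2: -20 + 0.35*250 = 67.5;
# Phase 1: -5 + 0.6*67.5 = 35.5
```

In the full SDP, each stage would also carry a choice of action (trial design and conduct), and the recursion would take the maximum over actions; the sketch above is the fixed-policy special case.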
Sequential Decision Process (SDP)
Source: Chang, 2010; Chang et al., 2019
[Diagram: NME → Phase 1 & Action → Phase 2 & Action → Phase 3 & Action → Marketing Authorization → Net Present Value (NPV), with phase transition probabilities P01, P12, P23, P34 along the development timeline]
Stochastic Decision Process For A Cancer Clinical Development Program (CDP)
Source: Chang et al., 2019
Swarm Intelligence (SI)
• Self-Organized System (SOS)
– SOS = systems in which organized behavior arises without a centralized controller or leader
– Stigmergy = a mechanism of indirect coordination between agents: the trace left in the environment by an action stimulates the performance of a next action, by a different agent.
• Swarm Intelligence (SI)
– SI = intelligence possessed by an SOS without a common goal or leader
• Characteristics of SI
– Micromotives and macroconsequences (genotype versus phenotype)
– Each individual in an SOS has no intelligence and follows a simple rule.
– Ex: each ant simply follows the pheromone, but the colony behaves intelligently in finding the shortest path to the food source
– SI is not owned by any individual
• SI Examples in Nature
– Ant colonies, bird flocking, animal herding, bacterial growth, fish schooling, and microbial intelligence
• Artificial Swarm Intelligence Algorithms
– Particle swarm, ant colony, artificial bee colony, artificial immune systems, gravitational search, glowworm swarm, and bat algorithms
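Particle swarm optimization, the first algorithm in the list above, can be sketched in a few lines (a minimal illustration with standard textbook parameters; the objective is a simple convex stand-in, not a real docking score): each particle is pulled toward its own best position and the swarm's best, and collectively the swarm locates the minimum without any central controller.

```python
import random

def pso(f, dim=2, n=15, iters=60, seed=0):
    """Particle swarm optimization: velocities blend inertia, attraction to
    each particle's personal best, and attraction to the global best."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia, cognitive, social
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pbest = [xi[:] for xi in x]
    pval = [f(xi) for xi in x]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (pbest[i][d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < pval[i]:
                pbest[i], pval[i] = x[i][:], val
                if val < gval:
                    gbest, gval = x[i][:], val
    return gbest, gval

# Minimize the sphere function, a simple convex test objective
best, best_val = pso(lambda z: sum(t * t for t in z))
```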
How Swarm Intelligence Works
Swarm Intelligence: every ant follows the smell; collectively the ant colony quickly finds the shortest path without the guidance of any central intelligence.
Swarm Intelligence Applications
• A central part of the rational drug development process is the prediction of the complex structure of a small ligand with a protein, the so-called protein–ligand docking problem, used in virtual screening of large databases and lead optimization.
• Korb et al. (2006) developed a docking algorithm based on ant colony optimization for structure-based drug design.
• Fu et al. (2015) studied a new approach for flexible molecular docking based on swarm intelligence. They computed the interactions of 23 protein–ligand complexes.
• Rajeshkumar and Kousalya (2017) present a view on applications of swarm-based intelligence algorithms in the pharmaceutical industry, including drug design, pharmacovigilance, and sequence alignment.
• Soulami et al. (2017) used a particle swarm optimization (PSO)-based algorithm for detection and classification of abnormalities in mammographic images using texture features and support vector machine (SVM) classifiers.
• Fang et al. (2018) presented a novel feature selection method called the elite search mechanism-based flower pollination algorithm (ESFPA), used to determine protein essentiality. ESFPA uses an improved SI algorithm for feature selection and selects optimal features for protein essentiality prediction.
• The metaheuristicOpt package in R includes many of these optimization algorithms.
Evolutionary Intelligence (EI)
• Inspiration by the Natural Evolution Process
– Inspired by our understanding of biological evolution
• Minimal or No Specification of Solution Structure
– Automatically solves problems without requiring the user to know or specify the form or structure of the solution in advance
• Solution Improvement Via Evolution
– Generation by generation, stochastically transform populations of computer programs into new populations of programs that will effectively solve problems
• Genetic Algorithm (GA) versus Genetic Programming (GP)
– Both use evolutionary algorithms
– A GA solution is represented as a list of actions and values, often a string
– A GP solution is represented as a tree structure of actions and values, usually a nested data structure
Genetic Programming
Three Key Elements in GP:
(1) Representation: Syntax Tree
– Mathematical or compound structure
(2) Reproductive Mechanism: Crossovers and Mutations
– Crossover = combination of two randomly selected data structures to produce one new structure. A mutation randomly generates a subtree to replace a randomly selected subtree from a randomly selected individual.
(3) Survival Fitness
– For finding a function g(x) to approximate the target function f(x), the fitness = the mean square error between the two functions. For finding an optimal treatment sequence for infertility, the fitness function = treatment success (1 for success and 0 for failure).
Genetic Algorithm For Treatment Strategy Optimization
The population (treatment sequences) changes constantly, since survivability is a function of fitness, i.e., the observed effect of the treatment sequence.
Treatment sequences for treating female infertility
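A minimal GA sketch of this idea (the sequences, fitness function, and parameters are purely hypothetical stand-ins for the observed success of a treatment sequence, not the infertility application itself): selection by fitness, single-point crossover, and random mutation evolve the population toward high-fitness sequences.

```python
import random

# Each individual is a sequence of 6 treatment choices coded 0/1/2.
# The "true best" sequence below is invented so the toy fitness is computable;
# in practice fitness would come from observed treatment outcomes.
TARGET = [2, 0, 1, 2, 1, 0]

def fitness(seq):
    """Toy fitness: number of positions matching the best sequence."""
    return sum(1 for a, b in zip(seq, TARGET) if a == b)

def evolve(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in range(6)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]            # selection by fitness
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, 6)
            child = p1[:cut] + p2[cut:]             # single-point crossover
            if rng.random() < 0.2:                  # mutation
                child[rng.randrange(6)] = rng.randrange(3)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```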
Application Of Evolutionary Intelligence
• RNA structure prediction (Batenburg et al., 1995)
• Molecular structure optimization (Wong et al., 2011)
• Ghosh and Jain (2005) collected articles across a broad range of topics on the applications of evolutionary AI in drug discovery and GP in data mining for drug discovery.
• Barmpalexis et al. (2011) used symbolic regression via genetic programming in the optimization of a controlled-release pharmaceutical formulation and compared its predictive performance to ANNs.
• Ghaheri et al. (2015) reviewed the applications of the genetic algorithm in disease screening, diagnosis, treatment planning, pharmacovigilance, prognosis, and healthcare management.
• Ghaheri et al. (2019) provided a comprehensive review of the applications of genetic algorithms in medicine, in 14 different areas:
– radiology, oncology, cardiology, endocrinology, obstetrics and gynecology, pediatrics, surgery, infectious diseases, pulmonology, radiotherapy, rehabilitation medicine, orthopedics, neurology, pharmacotherapy, and healthcare management
Review & Summary
[Diagram: Machine Learning branches into Supervised Learning, Unsupervised Learning, Reinforcement Learning, Evolutionary Learning, and Swarm Intelligence Learning]
List Of Machine Learning Methods (details and references available from the upcoming book, Chang, 2020)
- Unsupervised: link analysis, PCA, K-means, hierarchical clustering, SOM
- Supervised: SBML, KNN, deep learning (FNN, CNN, RNN, DBN), kernel method, SVM, GANs, autoencoders, decision tree (bagging, boosting, random forest), logistic regression, ridge/lasso/elastic net regressions, Bayesian network
- Reinforcement: sequential decision process, Bayesian Q-learning
- Evolutionary: genetic algorithm, genetic programming, cellular automata
- Swarm: ant clustering model, particle swarm optimization (PSO)
AI Applications
[Diagram: AI application areas — Drug Discovery, Drug Development, Healthcare — with example methods:]
- Quantitative Structure–Activity Relationship (QSAR)
- Kernel method for protein classification & biological sequences
- Microarray data analysis
- SBML for clinical trials of rare disease & precision medicine
- Deep learning (FNN, CNN, RNN, LSTM, DBN) for drug design & molecule classification
- Decision tree, random forest for bioactivity classification, protein–ligand binding, toxicity modeling
- LDA for drug–drug interaction, kNN for druggable molecule identification, SVM for DNA-binding protein prediction, network similarity for disease modeling
- NLP for pharmacovigilance: auto-narrative generation, narrative analysis, QC assessment, causality assessment; disease models, probabilistic clinical risk stratification models, and practice-based clinical pathways
- Disease diagnosis & prognosis with medical image data
- NLP for medical records: text processing, classification
- KNN, SVM, CNN, AdaBoost, autoencoder for cancer detection from gene expression data
- ANN & SOM ensembles for gene-related clustering
- CNN for tumor detection, segmentation, classification, and computer-aided diagnosis: breast cancer (mammography), lung diseases, lesion segmentation for traumatic brain injuries, brain tumor, CV disease with cardiological images
- SVM for disability due to stroke, …
List Of AI Applications (details and references available from the upcoming book, Chang, Feb 2020)
References
Books:
• Chang, M. (2019). Artificial Intelligence in Drug Development, Precision Medicine, and Healthcare. CRC.
• Chang, M. (2010). Monte Carlo Simulation for the Pharmaceutical Industry. CRC.
• Chang, M. (2013). Principles of Scientific Methods. CRC.
• Chang, M. (2007; 2nd edition, 2014). Adaptive Design Theory and Implementation Using SAS and R. CRC.
• Chang, M. (2011). Modern Issues and Methods in Biostatistics. Springer.
• Chang, M. (2012). Paradoxes in Scientific Inference. CRC.
• Chang, M. et al. (2019). Innovative Strategies, Statistical Solutions and Simulations for Modern Clinical Trials. CRC.
• Chow and Chang (2011). Adaptive Design Methods for Clinical Trials. CRC.
• Hastie, T., Tibshirani, R., Friedman, J. (2001; 2nd edition, 2009). The Elements of Statistical Learning. Springer.
• Koza, J. et al. (2005). Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Springer.
• Goertzel, B. & Pennachin, C. (1998). Artificial General Intelligence. Springer.
• Russell, S. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.
• Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Review Papers:
• Daoud, M. and Mayo, M. (2019). A survey of neural network-based cancer prediction models from microarray data. Artificial Intelligence in Medicine 97, 204–214.
• Jiang, F. et al. (2017). Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology.
• Long, E., Lin, H., et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts.
• Qayyum, A. et al. (2018). Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems, 42:226, November 2018.
• Vanhaelen, Q. et al. (2017). Design of efficient computational workflows for in silico drug repurposing. Drug Discovery Today 22(2), February 2017.
• Schmider, J. (2019). Artificial intelligence in adverse event case processing. Clinical Pharmacology & Therapeutics, April 2019.
• Ghasemi, F. (2018). Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discovery Today 23(10), October 2018.
Similarity-Based Machine Learning in R (1)

library(class)

# Normalize data (by subtracting the mean and dividing by the SD of the reference data)
normalizeData = function(DataToNormalize, RefData) {
  normalizedData = DataToNormalize
  for (n in 1:ncol(DataToNormalize)) {
    normalizedData[, n] = (DataToNormalize[, n] - mean(RefData[, n])) / sd(RefData[, n])
  }
  return(normalizedData)
}

# Calculate initial scaling factors R0s
# S0s = p-values
InitialRs = function(X, eta, S0s) {
  K = length(S0s); R0s = rep(0, K)
  for (k in 1:K) {
    # force < 12 to avoid s = exp(d) overflow
    R0s[k] = min((-log(S0s[k]))^(1/eta) / max(IQR(X[, k]), 0.0000001), 12)
  }
  return(R0s)
}

# 3 vectors: Rs = scaling factors, x1 = point 1, x2 = point 2
Distance = function(Rs, x1, x2) {
  K = length(Rs)
  d = 0
  for (k in 1:K) { d = d + (Rs[k] * (x2[k] - x1[k]))^2 }
  return(d^0.5)
}

# Calculate similarity scores among the training subjects
SimilarityTrain = function(eta, Rs, Xtrain) {
  N1 = nrow(Xtrain); K = length(Rs)
  S = matrix(0, nrow = N1, ncol = N1)
  for (i in 1:N1) { for (j in i:N1) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (Xtrain[i, k] - Xtrain[j, k]))^2 }
    S[i, j] = exp(-d2^0.5)  # avoid 0 for i = j
    S[j, i] = S[i, j]       # use symmetry to reduce 50% CPU time
  }}  # End of j and i loops
  return(S)
}

# Calculate similarity scores between training and test subjects
Similarity = function(eta, Rs, Xtrain, Xtest) {
  N1 = nrow(Xtrain); N2 = nrow(Xtest); K = length(Rs)
  S = matrix(0, nrow = N2, ncol = N1)
  for (i in 1:N2) { for (j in 1:N1) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (Xtest[i, k] - Xtrain[j, k]))^2 }
    S[i, j] = exp(-d2^0.5)
  }}  # End of j and i loops
  return(S)
}
Weight = function(S) {
  N1 = nrow(S); N2 = ncol(S)
  W = matrix(0, nrow = N1, ncol = N2)
  for (i in 1:N1) {
    sum_S_row = sum(S[i, ])
    for (j in 1:N2) { W[i, j] = S[i, j] / max(sum_S_row, 0.00000000001) }
  }
  return(W)
}  # End of Weight

# Calculate the predicted outcome and error
PredictedY = function(W, Ytrain, Ytest) {
  OutObj = list()
  OutObj$pred_Y = W %*% Ytrain  # For a binary outcome, pred_Y = probability of 1
  OutObj$MSE = mean((OutObj$pred_Y - Ytest)^2)
  return(OutObj)
}  # End of PredictedY

# Derivatives of the loss function
DerivativeE = function(eta, pred_Y, Rs, X, S, O) {
  N = nrow(X); K = length(Rs)
  der_S = matrix(0, nrow = K*N, ncol = N); der_W = matrix(0, nrow = K*N, ncol = N)
  dist = matrix(0, nrow = N, ncol = N); der_E = rep(0, K)
  for (i in 1:N) { for (j in i:N) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (X[i, k] - X[j, k]))^2 }
    dist[i, j] = max(d2^0.5, 0.0000001)  # avoid overflow in der_d below
    dist[j, i] = dist[i, j]
  }}  # End of i and j loops
  for (m in 1:K) { for (i in 1:N) { for (j in i:N) {
    der_d = (Rs[m] / dist[i, j]) * (X[i, m] - X[j, m])^2
    der_S[(m-1)*N + i, j] = -1 * S[i, j] * eta * (dist[i, j])^(eta - 1) * der_d
    der_S[(m-1)*N + j, i] = der_S[(m-1)*N + i, j]
  }}}  # End of j, i, and m loops
  # Weight derivative
  for (m in 1:K) { for (i in 1:N) {
    sum_der_S = sum(der_S[(m-1)*N + i, ]); sumSi = sum(S[i, ])
    for (j in i:N) {
      der_W[(m-1)*N + i, j] = der_S[(m-1)*N + i, j] / sumSi - S[i, j] * sum_der_S / sumSi^2
      der_W[(m-1)*N + j, i] = der_W[(m-1)*N + i, j]
    }
  }}  # End of j, i, and m loops
  # Derivatives of E
  for (m in 1:K) { for (i in 1:N) {
    err = 2/N * (pred_Y[i] - O[i])  # For mean squared error
    for (j in 1:N) { der_E[m] = der_E[m] + err * O[j] * der_W[(m-1)*N + i, j] }
  }}  # End of i and m loops
  return(der_E)
}  # End of DerivativeE
Similarity-Based Machine Learning in R (2)
# Update scaling factors, Rs
Learning = function(LearningRate, Lamda, Rs, der_E) {
  K = length(Rs)
  der_lossFun = der_E + 2 * Lamda * Rs  # derivatives of the loss function
  Rs = Rs - LearningRate * der_lossFun
  # force Rs < 25 to avoid S = exp(d) overflow
  for (m in 1:length(Rs)) { Rs[m] = min(max(0, Rs[m]), 25) }
  return(Rs)
}  # End of Learning

### Training the AI to obtain scaling factors, Rs ###
SBMLtrain = function(Epoch, S0, Lamda, LearningRate, eta, Xtrain, Ytrain) {
  TrainObj = list(); OutObj0 = list(); OutObj = list()
  R0 = InitialRs(Xtrain, eta, S0); S = SimilarityTrain(eta, R0, Xtrain)
  OutObj0 = PredictedY(Weight(S), Ytrain, Ytrain)
  Rs = R0; OutObj = OutObj0  # in case Epoch = 0 for no learning
  TrainMSE0 = OutObj0$MSE
  preLoss = OutObj0$MSE + Lamda * sum(Rs^2)
  # Learning
  iter = 0
  while (iter < Epoch) {
    preRs = Rs
    Rs = Learning(LearningRate, Lamda, Rs,
                  DerivativeE(eta, OutObj0$pred_Y, Rs, Xtrain, S, Ytrain))
    OutObj = PredictedY(Weight(SimilarityTrain(eta, Rs, Xtrain)), Ytrain, Ytrain)
    iter = iter + 1
    Loss = OutObj$MSE + Lamda * sum(Rs^2)
    if (Loss > preLoss) { Rs = preRs; iter = Epoch + 1 }
    preLoss = Loss
  }
  TrainObj$Rs = Rs; TrainObj$Y = OutObj$pred_Y; TrainObj$MSE = OutObj$MSE
  return(TrainObj)
}

# Linear regression
# outcome = "gaussian" or "binomial", ...
GLMPv = function(Y, X, outcome) {
  LMfull = glm(Y ~ ., data = X, family = outcome)
  sumLM = coef(summary(LMfull))
  pVals = sumLM[, 4][1:ncol(X) + 1]
  return(pVals)
}
# Example with simulated multivariate normal data
# Covariance matrix for creating correlated X
library(mvtnorm)
sigma = matrix(c(1,   0.0, 0.6, 0.2, 0.0, 0.0, 0.0,
                 0.0, 1,   0.0, 0.0, 0.0, 0.0, 0.0,
                 0.6, 0.0, 1,   0.5, 0.0, 0.0, 0.0,
                 0.2, 0.0, 0.5, 1,   0.0, 0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 1,   0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 0.0, 1,   0.0,
                 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1), ncol = 7)
mu = c(0, 0, 0, 0, 0, 0, 0)  # 7 multivariate normal means
initS = rep(0.5, 7)          # 7 initial similarities
avemse1 = 0; avemse2 = 0     # average MSEs over 50 simulation runs
for (sim in 1:50) {
  xTrain = rmvnorm(n = 40, mean = mu, sigma = sigma)
  xTest = rmvnorm(n = 40, mean = mu, sigma = sigma)
  # Construct y that are not linearly related to x
  Ytrain = xTrain[, 1] + xTrain[, 2] * xTrain[, 3] + xTrain[, 4]^2 + xTrain[, 5]
  Ytest = xTest[, 1] + xTest[, 2] * xTest[, 3] + xTest[, 4]^2 + xTest[, 5]
  # Data normalization
  Xtrain = as.data.frame(normalizeData(xTrain, xTrain))
  Xtest = as.data.frame(normalizeData(xTest, xTrain))
  # Assign p-values from the LM to initial similarities
  # initS = GLMPv(Y = Ytrain, X = Xtrain, outcome = "gaussian")
  mySBMLtrain = SBMLtrain(Epoch = 5, S0 = initS, Lamda = -0.5, LearningRate = 0.125,
                          eta = 1, Xtrain, Ytrain)
  myPred = PredictedY(Weight(Similarity(eta = 1, mySBMLtrain$Rs, Xtrain, Xtest)),
                      Ytrain, Ytest)
  plot(myPred$pred_Y, Ytest)
  mse1 = mean((Ytest - mean(Ytrain))^2)  # naive MSE, using the training mean for prediction
  mse2 = myPred$MSE                      # prediction error
  avemse1 = avemse1 + mse1 / 50
  avemse2 = avemse2 + mse2 / 50
}
avemse1; avemse2
Acknowledgment
• Dr. Susan Hwang, Boston University