Introduction to Artificial Intelligence for Drug Development & Medical Science
Mark Chang, PhD
Copyright © Mark Chang, PhD, AGInception, Boston University, 2019
FDA-Industry Workshop, Washington DC, September 2020
Topics to Cover
• AI/Machine Learning Overview
• Supervised Learning
 – Deep Learning Neural Networks
 – Similarity-Based Learning / Kernel Method / Support Vector Machine
 – Decision Tree Methods
• Non-Supervised Learning
 – Unsupervised Learning
 – Reinforcement Learning
 – Swarm Intelligence Learning
 – Evolutionary Learning
Artificial Intelligence (AI)
• Intelligence = the ability to perform mental work
• AI = software with the ability to perform human mental work
• Robot = a machine (hardware) with the ability to perform physical and/or mental work that usually requires AI
• Examples of AI Technologies and Robotics
 – Calculators, computers, iPhone, iPad
 – ELIZA, apps, search engines, Google Maps, e-commerce systems, …, AlphaZero
 – Machine learning, data mining, statistical modeling
 – Humanoid robots, iRobot, robots on assembly lines, self-driving cars
• Medical AI
 – Quantitative Structure–Activity Relationship (QSAR) and molecular design in drug discovery
 – Cancer prediction using microarray data
 – Disease diagnosis and prognosis based on medical images
 – Natural language processing for medical records
 – Similarity-based AI for clinical trials
Artificial General Intelligence (AGI)
• AGI Is a Machine with the Full Scope of Human Intelligence
 – Unlike narrow AI, which accomplishes specific tasks for humans
 – Capable of experiencing consciousness, emotion, discovery, creativity, self-awareness, collaboration, and evolution
 – May be capable of creating robots like itself, just as humans can give birth
• AGI Is Another Race of Human Beings
 – As medical AI develops, we gradually replace our malfunctioning organs with new ones (artificial or not), making everyone a mixture of human and machine.
 – As robots reach full human intelligence and interact intensively with us, we will develop strong feelings for them and will not discriminate against them just because of their origins.
• Challenges to Overcome
 – Computing power required; possibly powered by quantum computing
 – Physical (muscular) power required
 – A man-machine mixture might be a solution
Machine Learning = Backbone of Narrow AI
[Diagram: machine learning branches into supervised learning (classification, regression), unsupervised learning (clustering, association, dimension reduction, anomaly detection), reinforcement learning, evolutionary learning, and swarm intelligence learning]
Types of Machine Learning
• Supervised Learning
 – The learner gives a response y based on input x and compares it with the target y
 – A trained learner will be able to predict future outcomes
 – Ex: digital image recognition for cancer diagnosis
• Unsupervised Learning
 – Re-represents the inputs in a more efficient way based on their similarities
 – Finds hidden structure without the help of a target answer
 – Ex: document organization, customer segmentation, patient clustering
• Reinforcement Learning
 – Concerns how a learner should take actions in an environment so as to maximize some notion of long-term reward
 – Ex: iRobot, driverless cars
• Evolutionary Intelligence/Learning
 – Emerges through evolution over generations of AI agents, which become better in some sense
 – Ex: genetic algorithms for pharmacovigilance and healthcare management
• Swarm Intelligence/Learning
 – System operation based on micro (individual) motivations, without a centralized controller or leader
 – Ex: ant algorithm for an optimal customer boarding strategy at an airport
Deep Learning
• Deep Learning Neural Networks
 – Multiple-layer artificial neural networks
 – Progressively extract higher-level features from the raw input
• Four Major Network Architectures
 1. Feed-forward Neural Network (FNN), for general classification and regression
 2. Convolutional Neural Network (CNN), for image and facial recognition
 3. Recurrent Neural Network (RNN), for speech recognition and natural language processing
 4. Deep Belief Network (DBN), for disease diagnosis and prognosis
Artificial Neural Network (ANN)
Activation functions: identity, softmax, logistic, sigmoid, tanh, and sgn functions, …
Multilayer Perceptron (Feed-forward Neural Network)
• With an identity activation function, the ANN = a linear function of the Xi and linear in the Wi
• The classic linear model is linear in the parameters Wi but can be nonlinear in the Xi
• Learning = updating the weights Wi
Learning: the Backpropagation Algorithm
Learning (training) = updating the weights wi in the network.
Error (loss) function: E = (1/N) Σi (Ŷi − Yi)², where the predicted output Ŷ = f(wi, Yi) is compared with the target Yi.
Weight increment: Δwi = −α ∂E/∂wi
Update the weights wi backwards, layer by layer, starting from the output layer, through the hidden layers, to the input layer. α = learning rate, a small constant chosen to ensure convergence of the algorithm.
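The weight-update rule Δwi = −α ∂E/∂wi can be sketched for a single linear unit (identity activation); this is a minimal pure-Python illustration, with the data and learning rate chosen for the example:

```python
def train_linear_unit(data, alpha=0.05, epochs=200):
    """Batch gradient descent for one linear unit yhat = w0 + w1*x.

    Loss E = (1/N) * sum_i (yhat_i - y_i)^2, and each epoch applies
    the weight increment dw = -alpha * dE/dw.
    """
    w0, w1 = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = (w0 + w1 * x) - y      # yhat - y
            g0 += 2 * err / n            # dE/dw0
            g1 += 2 * err * x / n        # dE/dw1
        w0 -= alpha * g0                 # dw = -alpha * dE/dw
        w1 -= alpha * g1
    return w0, w1

# Noise-free data generated from y = 1 + 2x; training should recover the weights.
data = [(x / 10, 1 + 2 * (x / 10)) for x in range(-10, 11)]
w0, w1 = train_linear_unit(data)
```

A full backpropagation implementation repeats this same gradient step layer by layer, using the chain rule to pass the error backwards.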
Training and Testing
• Train an AI model to determine its parameters before use
• Test the trained model for its performance
• Define the target population
• Obtain training and testing data sets
 – Resampling with replacement (bootstrap) and without replacement (split)
 – For large-sample problems, use resampling without replacement, because each sample likely reflects the distribution of the target population.
 – For small-sample problems, use resampling with replacement, because each sample often does not reflect the population and the sample is too small for resampling without replacement (empirical distribution ≈ population distribution).
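The two resampling schemes above can be sketched with the standard library; the subject IDs and the 70/30 split are illustrative choices:

```python
import random

random.seed(1)
population = list(range(100))  # hypothetical subject IDs

# Larger samples: resample WITHOUT replacement (split), e.g. 70/30 train/test.
shuffled = random.sample(population, k=len(population))
train_split, test_split = shuffled[:70], shuffled[70:]

# Small samples: resample WITH replacement (bootstrap); duplicates are allowed,
# treating the empirical distribution as the population distribution.
small_sample = population[:15]
bootstrap_train = [random.choice(small_sample) for _ in range(len(small_sample))]
```

Note that a split sample never repeats a subject, while a bootstrap sample of the same size typically does.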
Training Error Versus Testing Error
Over-fitting
Regularization: adding a penalty term to the loss function to reduce over-fitting.
Convolutional Neural Network (CNN)
• Main Ideas in CNNs
 – Different filters in the convolution layers
 – Location invariance and compositionality: these make sense for image processing but not for NLP or time-series events
• Filter Functions
 – Each identifies particular features or image elements
 – Overlapping local features combine to form the "overall picture"
• CNNs for Data Other Than Images, If
 – (1) the data can be transformed to look like image data in matrix form, and
 – (2) the data is just as useful after swapping any two columns
Convolution Illustrated
A convolution of two functions f(x) and g(y) is defined as the sum of the products of one image f(·) at location x and another image g(·) (called the filter) at location y − x. At the pixel level (filter size N = 9):
A high value indicates a good match between the filter and the original image; high values on the diagonal elements indicate the backslash image.
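The pixel-level sum of products can be written directly; as in most CNN implementations, the filter is applied without flipping (strictly, cross-correlation). The 4×4 image and 3×3 "backslash" filter here are illustrative:

```python
def convolve_valid(image, filt):
    """'Valid' 2-D convolution at the pixel level: slide the filter over
    the image and take the sum of element-wise products at each position."""
    fh, fw = len(filt), len(filt[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - fh + 1):
        row = []
        for c in range(iw - fw + 1):
            s = sum(image[r + i][c + j] * filt[i][j]
                    for i in range(fh) for j in range(fw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 "backslash" filter (N = 9 weights): high output where the image
# locally matches the filter's diagonal pattern.
backslash = [[1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
image = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
fmap = convolve_valid(image, backslash)  # high values along the diagonal
```

On this diagonal image the feature map has its largest values exactly where the backslash pattern sits, matching the slide's caption.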
CNN for Disease Diagnosis
• Pooling layer => reduces the resolution or size of the images
• Convolutional layer => filters specific features
• Dense layer => fully connected layer
Sample of MNIST Handwritten Digits
CNN for Written-Digit Recognition: 98.7% Accuracy Using keras in R
Applications of CNN
• Hussain (2017): a cascaded deep CNN for brain tumor segmentation
• Farooq (2017): a CNN-based method for the classification of Alzheimer's disease in MRI images
• Moeskops et al. (2016): a multiscale CNN-based approach for automatic segmentation of MR images, classifying voxels into brain tissue classes
• Prasoon et al. (2013): a tri-planar CNN used for segmentation of tibial cartilage in knee MRI images
• Zhang et al. (2015): segmentation of isointense brain tissue through a CNN using a multimodal MRI dataset, training the network on three patches extracted from the images
• Anthimopoulos et al. (2016): lung pattern classification for interstitial lung diseases using a deep convolutional neural network
• Cole et al. (2016): predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker
• Esteva et al. (2017): dermatologist-level classification of skin cancer with deep neural networks
Architecture of Recurrent Neural Network (RNN)
• Recurrent Neural Network (RNN)
 – An ANN for modeling temporal dynamic behavior
 – Unlike FNNs, RNNs can use their internal state as memory to process sequences of inputs.
• Basic RNN
 – Suffers from a short-term memory problem
 – Difficult to carry information from earlier time steps to later ones when the sequence is long, such as English text
• Long Short-Term Memory Networks (LSTMs)
 – Designed to deal with long-term dependency problems
 – Used for NLP, stock market prediction, voice recognition, motion picture captioning, and poem and music generation.
Vanishing and Exploding Gradient Problem & Solutions
[Diagram: an unrolled RNN X0, X1, …, Xt−1, Xt, …, Xn with a shared weight w and a skip connection]
Problem: when the number of layers gets very large, the gradients at the early layers vanish, and changes to the corresponding weights have minimal effect on the outcome.
Solutions:
• Introduce skip connections (e.g., if … then …)
• Leaky recurrent units
• Gated recurrent networks
• Long Short-Term Memory networks (LSTMs)
• Deep belief networks
Architecture of a Long Short-Term Memory Network
Application of RNNs to Molecular Design
Gupta et al. (2018) trained their RNN on 541,555 SMILES strings with lengths from 34 to 74 SMILES characters (tokens). The RNN model can be used to generate sequences one token at a time, since such models output a probability distribution over all possible tokens at each time step. Typically, the RNN aims to predict the next token of a given input; sampling from this distribution then allows generating novel molecular structures.
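Token-by-token sampling can be illustrated without a trained network; the tiny vocabulary and the fixed conditional distributions below are toy stand-ins for the probabilities a real RNN would emit at each time step:

```python
import random

# Toy stand-in for a trained model: a fixed distribution over the next
# token given the previous token (a real RNN conditions on the whole prefix).
VOCAB = ["C", "O", "N", "<end>"]
NEXT_TOKEN_PROBS = {
    "<start>": [0.70, 0.15, 0.15, 0.00],
    "C":       [0.40, 0.20, 0.15, 0.25],
    "O":       [0.55, 0.05, 0.10, 0.30],
    "N":       [0.55, 0.10, 0.05, 0.30],
}

def sample_sequence(rng, max_len=20):
    """Generate one token at a time by sampling from the model's
    next-token distribution, as in RNN-based molecule generation."""
    tokens, prev = [], "<start>"
    for _ in range(max_len):
        tok = rng.choices(VOCAB, weights=NEXT_TOKEN_PROBS[prev])[0]
        if tok == "<end>":
            break
        tokens.append(tok)
        prev = tok
    return "".join(tokens)

rng = random.Random(7)
samples = [sample_sequence(rng) for _ in range(5)]
```

Generating real SMILES requires a trained network and a chemical-validity check on the sampled strings; the loop structure, however, is exactly this.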
Simplified Molecular-Input Line-Entry System (SMILES)
SMILES describes the structure of chemical species using short ASCII strings. In 2007, an open standard called OpenSMILES was developed in the open-source chemistry community.
Applications of LSTMs in NLP
• Sentiment Analysis: identify text as "positive" or "negative," filter spam or classify email text as spam, classify the language of the source text
• Language Modeling: predict the probabilistic relationships between words, enabling one to predict the next words or phrases
• Speech Recognition: translate speech to text readable by humans
• Machine Translation: translate text between languages
• Document Summarization: a heading or abstract for a document
• Question Answering System: process questions and provide answers in a natural language
A trained LSTM provides the probability of a string S via conditional probabilities: P(S) = Πt P(st | s1, …, st−1).
Natural Language Processing (NLP) Versus Molecular Design
• Features in NLP
 – Words at different locations, or the sequence of words in a sentence
• Word Embedding
 – Semantic modeling requires mapping words to numbers, or word embedding.
 – Word embedding involves a mathematical embedding from a space with many dimensions per word to a continuous vector space of lower dimension.
• Features in Molecular Design
 – Molecular models exhibit tree structures, which can be transformed into one-dimensional sequences of different substructures
• Word Embedding in Biological Sequences
 – Word embedding for n-grams of biological sequences (e.g., DNA, RNA, and proteins): Asgari and Mofrad (2015).
 – The representation can be used in deep learning in proteomics and genomics.
• Long Short-Term Memory Nets
 – NLP deals with sequences of words; drug discovery deals with gene sequences or molecular structures represented by sequences of biological substructures
 – LSTMs can be used for compound screening and molecular design, as in NLP.
Deep Belief Network (DBN)
• Two Major Challenges in QSAR Studies and Drug Design
 – (1) the large number of descriptors may have autocorrelations
 – (2) proper parameter initialization in model prediction to avoid an over-fitting problem
• DBN Combines Unsupervised and Supervised Learning
 – For efficient learning
 – Each DBN layer is trained independently in the unsupervised portion
 – The unsupervised learning is used for dimension reduction or for selecting subsets of descriptors.
 – After completion of the unsupervised learning, the output from the layers is refined with supervised logistic regression.
Architecture of a Deep Belief Net with Stacked Restricted Boltzmann Machines (RBMs)
• RBMs are two-layer neural nets that constitute the building blocks of deep belief networks.
• The first layer of an RBM is called the visible, or input, layer; the second is the hidden layer.
• Activation function σ = normalized exponential energy function.
• Two-stage learning: unsupervised learning for feature selection = training the RBMs in parallel, followed by supervised learning using backpropagation.
[Diagram: stacked RBMs; the input layer feeds the 1st, 2nd, and 3rd RBMs of σ units, ending in a class output]
Single-Step Contrastive Divergence (CD-1) Procedure for RBM and DBN
• Take a training sample v and initial weights W; compute the probabilities of the hidden units.
• Sample a hidden activation vector h from this probability distribution.
• Compute the positive gradient = outer product of v and h.
• From h, sample a reconstruction v′ of the visible units, then resample the hidden activations h′ from it (the Gibbs sampling step).
• Compute the negative gradient = outer product of v′ and h′.
• Update the weights W with increment = learning_rate × (positive gradient − negative gradient).
• Update the biases a and b.
• The procedure starts from the input layer (the first RBM layer) and moves gradually to the output layer.
• When the unsupervised training is finished for all RBMs, the supervised learning (classification) starts to fine-tune the weights.
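The CD-1 recipe above can be sketched for a small binary RBM in pure Python. This is a toy, not an optimized implementation; the layer sizes, learning rate, and training pattern are illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v, W, a, b, lr, rng):
    """One CD-1 step for a binary RBM, following the slide's recipe.
    v: visible vector; W[i][j]: weight visible i -> hidden j; a, b: biases."""
    nv, nh = len(v), len(b)
    # Hidden probabilities given v, and a sampled hidden vector h
    ph = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h = [1 if rng.random() < p else 0 for p in ph]
    # Positive gradient = outer(v, h)
    pos = [[v[i] * h[j] for j in range(nh)] for i in range(nv)]
    # Reconstruction v' from h (Gibbs step), then hidden probabilities h' from v'
    pv = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(nh))) for i in range(nv)]
    v2 = [1 if rng.random() < p else 0 for p in pv]
    ph2 = [sigmoid(b[j] + sum(v2[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # Negative gradient = outer(v', h'); update weights and biases
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (pos[i][j] - v2[i] * ph2[j])
    for i in range(nv):
        a[i] += lr * (v[i] - v2[i])
    for j in range(nh):
        b[j] += lr * (h[j] - ph2[j])

rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(4)]  # 4 visible, 2 hidden
a, b = [0.0] * 4, [0.0] * 2
for _ in range(100):
    cd1_update([1, 1, 0, 0], W, a, b, lr=0.1, rng=rng)
```

After training on the pattern [1, 1, 0, 0], the visible biases drift toward the pattern: a[0] and a[1] can only increase, while a[2] and a[3] can only decrease.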
Applications of Deep Belief Networks
• Hinton et al. (2006) proposed a fast learning algorithm for deep belief networks.
• Zhen et al. (2016) proposed a convolutional deep belief network for direct estimation of ventricular volume from images without performing segmentation at all.
• Jaekwon et al. (2017) compared DBNs with other methods in cardiovascular risk prediction. They proposed a cardiovascular disease prediction model using the sixth Korea National Health and Nutrition Examination Survey (KNHANES-VI) 2013 dataset to analyze cardiovascular-related health data. They showed that the DBN-based prediction model has 83.9% accuracy.
• Ghasemi et al. (2018) used a deep belief network and evaluated the DBN's performance using Kaggle datasets with fifteen targets containing more than 70k molecules. The results revealed that optimizing parameter initialization improves the ability of deep neural networks to provide high-quality model predictions.
Software Packages for Deep Learning (TensorFlow, Keras, keras and kerasR)
• TensorFlow
 – Multiple levels of abstraction; developed by Google
 – An open-source artificial intelligence library that uses data flow graphs to build models
 – For creating large-scale neural networks with many layers
 – Scalar -> Vector -> Matrix -> Tensor
 – TensorFlow = multidimensional data flow from layer to layer in artificial neural networks
• Keras
 – A high-level neural networks API written in Python
 – Capable of running on top of TensorFlow, CNTK, or Theano
 – Developed with a focus on enabling fast experimentation
• The R packages keras and kerasR
 – Two R versions of Keras for the statistical community
 – The keras package uses the pipe operator (%>%) to connect functions or operations together; you won't find this in kerasR
 – kerasR uses the $ operator to build your model
 – kerasR contains functions that are named in a similar, but not identical, way to the original Keras package.
Machine Learning Packages in R
Public Data Sources for AI
• The commonly used public data repositories for AI applications in cancer prediction and clustering include the TCGA, UCI, NCBI Gene Expression Omnibus (GEO), and Kent Ridge biomedical databases.
• Kaggle
 – Inside Kaggle you'll find all the code and data you need for your data science work: over 19,000 public datasets and 200,000 public notebooks, ready for any analysis.
 – www.kaggle.com
• UCI Center for Machine Learning and Intelligent Systems
 – http://archive.ics.uci.edu/ml/datasets.php
Two Special Artificial Neural Networks:
– Generative Adversarial Networks (GANs)
– Autoencoder (Autoassociative Network)
Generative Adversarial Networks (GANs)
1. The generator takes in random numbers and returns an image.
2. The generated image is fed into the discriminator along with a stream of images taken from the actual dataset.
3. The discriminator takes in both real and fake images and returns probabilities of authenticity.
A GAN can be viewed as the combination of a counterfeiter and a cop, where the counterfeiter is learning to pass false notes and the cop is learning to detect them. Both are dynamic in this zero-sum game; each side comes to learn the other's methods in a constant escalation. As the discriminator changes its behavior, so does the generator, and vice versa.
Applications of Generative Adversarial Networks (GANs)
• Imaging markers can be used for monitoring disease progression with or without treatment. Models are typically based on large amounts of data with annotated examples of known markers, aiming at automating detection. Christian Doppler et al. (2017) developed a GAN that can learn a manifold of normal anatomical variability. Applied to new data, such as images containing retinal fluid or hyperreflective foci, the model labels anomalies and scores image patches, indicating their fit to the learned distribution.
• Deep generative adversarial networks are an emerging technology in drug discovery and biomarker development. Kadurin et al. (2017) demonstrated a proof of concept of implementing a deep GAN to identify new molecular fingerprints with predefined anticancer properties. They also developed a new GAN model for molecular feature extraction problems and showed that the model significantly enhances the capacity and efficiency of developing new molecules with specific anticancer properties using deep generative models.
• Yahi et al. (2017) proposed a framework for exploring the value of GANs in the context of continuous laboratory time series data. The authors devised an unsupervised evaluation method that measures the predictive power of synthetic laboratory test time series, and showed that when it comes to predicting the impact of drug exposure on laboratory test data, incorporating representation learning of the training cohorts prior to training the GAN models is beneficial.
• Putin et al. (2018) proposed a Reinforced Adversarial Neural Computer (RANC) for the de novo design of novel small-molecule organic structures based on the GAN paradigm and reinforcement learning. The study shows that RANC can reasonably be regarded as a promising starting point for developing novel molecules with activity against different biological targets or pathways. This approach allows scientists to save time and covers a broad chemical space populated with novel and diverse compounds.
Autoencoder (Autoassociative Network)
1. Autoassociative: output = input
2. Dimension reduction: the hidden layer is smaller than the input layer
3. A mixture of unsupervised and supervised learning
Applications of Autoencoders
• Kadurin et al. (2017) presented an application of autoencoders for generating novel molecular fingerprints with a defined set of parameters.
 – A 7-layer architecture with the latent middle layer serving as a discriminator
 – Input and output use a vector of binary fingerprints and the concentration of the molecule
 – NCI-60 cell line assay data for 6,252 compounds
 – Screened 72 million compounds in PubChem and selected candidate molecules with potential anti-cancer properties
• Cancer prediction using AI includes predicting the existence of cancer, the cancer type, and survivability risk. Different types of autoencoders have been used for filtering microarray gene expressions:
 – Stacked denoising autoencoders (Jie et al., 2015)
 – Contractive autoencoders (Macías-García et al., 2017)
 – Sparse autoencoders (Rasool, 2013)
 – Regularized autoencoders (Kumardeep et al., 2017)
 – Variational autoencoders (Way and Greene, 2017)
 – A deep architecture of four layers with 15,000, 10,000, 2,000, and 500 neurons (Danaee et al., 2016)
Decision Tree
• A Tree Method (TM)
 – A commonly used supervised learning method
 – Classification & Regression Trees
 – Features are dichotomized for a binary tree
• Impurity Measures for Optimization
 – Gini index (CART)
 – Entropy (ID3, C4.5)
 – Misclassification error
• Overfitting and Error Propagation
 – For larger trees, an early misclassification error can propagate downstream, eventually leading to poor predictions. Solutions:
 – Tree pruning
 – Bagging & boosting
 – Random forests
• Applications in Medicine
Example of a Binary Decision Tree
The Decision Tree Method for Categorical Outcomes
Common Impurity Measures
For node m of a classification tree (predicted probability pmk):
Misclassification error: 1 − maxk pmk
Gini index: Σk pmk (1 − pmk)
Cross-entropy or deviance: −Σk pmk log pmk
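All three measures are a few lines of code; for a two-class node, each peaks at p = (0.5, 0.5) and vanishes for a pure node:

```python
from math import log

def misclassification(p):
    """Misclassification error: 1 - max_k p_mk."""
    return 1 - max(p)

def gini(p):
    """Gini index: sum_k p_mk * (1 - p_mk)."""
    return sum(pk * (1 - pk) for pk in p)

def cross_entropy(p):
    """Cross-entropy (deviance): -sum_k p_mk * log(p_mk), skipping p = 0."""
    return -sum(pk * log(pk) for pk in p if pk > 0)

p = [0.5, 0.5]   # maximally impure two-class node
q = [0.9, 0.1]   # nearly pure node: all three measures are smaller
```

A tree-growing algorithm chooses the split that most reduces the chosen impurity, averaged over the child nodes.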
Threshold Determination and Tree Pruning
• Determining an Optimal Tree Model
 – Two competing factors: the accuracy of the tree method and computational efficiency.
 – For a given tree depth, the optimal threshold for each parameter is obtained by the greedy method: try a range of different thresholds
 – Any of the three impurity measures can be used, but often the misclassification rate.
 – Tree size is a tuning parameter governing the model's complexity
• Cost-Complexity Pruning
 – One could split tree nodes only when the decrease in error due to the split exceeds some threshold, but this strategy is too short-sighted, since a seemingly worthless split might lead to a very good split below it.
 – The preferred strategy is to grow a large tree, stopping the splitting process only when some minimum node size or tree depth is reached; this large tree is then pruned using cost-complexity pruning.
Bagging, Boosting, and Random Forests
• A single big tree can propagate classification errors to the leaves, making the prediction very unstable.
 – Solutions: bagging, boosting, and random forests
• Bagging (Bootstrap Aggregation)
 – Forming an average of many different trees.
• Boosting
 – Output the class: G(x) = sign(Σi αi Gi(x))
 – Weak classifiers Gi(x) are only slightly better than random guessing.
 – αi is computed by the boosting algorithm (AdaBoost) to put more weight on the better classifiers Gi(x).
• Random Forests
 – An ensemble classifier consisting of many decision trees with randomly selected features
 – The final class is the mode of the classes output by the individual trees.
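The boosted classifier's weighted vote is simple to sketch. The three weak "stumps" and their α weights below are hypothetical; a real AdaBoost run would fit the stumps and compute the α's from their weighted error rates:

```python
def adaboost_predict(classifiers, alphas, x):
    """Final boosted class G(x) = sign(sum_i alpha_i * G_i(x)),
    where each weak classifier G_i returns +1 or -1."""
    s = sum(a * g(x) for g, a in zip(classifiers, alphas))
    return 1 if s >= 0 else -1

# Three hypothetical decision stumps on a 1-D feature; the alphas give
# more say to the stumps assumed more reliable.
stumps = [lambda x: 1 if x > 0.2 else -1,
          lambda x: 1 if x > -0.1 else -1,
          lambda x: -1 if x > 0.9 else 1]
alphas = [0.9, 0.7, 0.2]
pred = adaboost_predict(stumps, alphas, 0.5)
```

A bagged or random-forest prediction replaces the weighted sum with an unweighted majority vote over trees grown on bootstrap samples.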
Applications of Decision Tree Methods
• Singh et al. (2015) used RFs for bioactivity classification.
• Wang et al. (2015) used the RF method to model the protein–ligand binding affinity for 170 complexes of HIV-1 proteases, 110 complexes of trypsin, and 126 complexes of carbonic anhydrase.
• Kumari et al. (2015) constructed an improved RF by integrating bootstrap and rotation feature matrix components for drug target identification.
• Mistry et al. (2016) used RFs and DTs to model the drug–vehicle toxicity relationship. Their dataset included 227,093 potential drug candidates and 39 potential vehicles. The resulting model predicted the toxicity relief of drugs by specific vehicles.
Similarity-Based and Similarity-Principle-Based Machine Learning
New Scientific Paradigm
• Controversies in Statistical Evidence and Scientific Discovery
• Call for New Paradigms: AI/Machine Learning
• The Similarity Principle (SP)
 – Human intelligence in daily life
 – Role of the SP in scientific discovery
 – Similarity-Based Machine Learning (SBML)
Simpson’sParadox
Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 47
Allpatients:BbetterthanAMales:AbetterthanBFemales:AbetterthanBQuestion:ShouldapatienttakeAorB?AllFemales:AbetterthanBYoungFemales:BisbetterthanAOldFemales:BisbetterthanAQuestion:ShouldapatienttakeAorB?
• Howtointerprettheresultsifthesepatients+you=theentirepatientpopulation(Oneresponsefromyouwouldnotchangethedirectionofthedrugeffect)?
• ControversialQuestion:HowSpecificIsTooSpecific?• Itisphilosophicalquestionorstatisticalquestion:Don’tputcartbeforethehorse• FundamentalSolution:TheSimilarityPrinciple
Controversies in Prediction of Drug Effect
• Longevity Prediction – the Paradox of Traveling
 – Does the prediction of a person's longevity change as soon as he arrives in a new country, because of its different life expectancy?
• Bias in Predictive Effect
 – Stratified randomization and non-randomness in clinical site selection lead to a patient distribution in the clinical trial (e.g., race distribution) that differs from the target population
 – If race has an effect, the mean drug effect will usually be biased.
• Drug Effect Estimation for Multiregional Trials
 – How small should regions be: country, state, city?
Similarity Principle
• All science, and learning itself, is based on a fundamental principle: the similarity principle (Chang, 2012, 2014).
• The Similarity Principle:
 – Similar things or individuals will likely behave similarly, and the more similar they are, the more similarly they behave.
 – Ex.: people with the same (or a similar) disease, gender, and age will likely have similar responses to a medical intervention. If they are similar in more aspects, they will have more similar responses.
 – Predicted outcome = similarity-weighted average of the observed outcomes.
 – Now, do you know how to resolve Simpson's Paradox?
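A similarity-weighted prediction takes only a few lines; the exponential similarity score and the (age, response) training pairs below are illustrative assumptions, not data from the slides:

```python
import math

def predict(new_x, train, scale=1.0):
    """Similarity-principle prediction: Y_hat = sum_j S_j * Y_j / sum_j S_j,
    a similarity-weighted average of the observed outcomes. The similarity
    here is a simple exponential in distance (an assumption; any bounded
    score in [0, 1] with identical = 1 would do)."""
    sims = [math.exp(-scale * abs(new_x - x)) for x, _ in train]
    return sum(s * y for s, (_, y) in zip(sims, train)) / sum(sims)

# Hypothetical training subjects: (age, response)
train = [(20, 1.0), (25, 1.2), (60, 3.0), (65, 3.1)]
y_young = predict(22, train)   # dominated by the similar young subjects
y_old = predict(63, train)     # dominated by the similar old subjects
```

Because the weights decay with distance, each prediction is driven almost entirely by the most similar subjects, which is exactly how the principle sidesteps the "which subgroup?" question behind Simpson's paradox.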
The Similarity Principle Illustrated
Similar things behave similarly; things more similar behave more similarly.
Examples:
1. QSAR: compounds with similar structures will have similar activities.
2. Terrorist attacks: September 11 of every year is high-risk.
3. Use the results of drug tests in animals to inform a small first-in-man clinical trial, and use the results from clinical trials to decide whether the drug can be used for a large patient population.
Feature selection and grouping; roles of the outcome and features in similarity:
1. Similarity relies on both the features and the outcome variable.
2. The relative importance of different features can be objectively determined using feature-scaling factors.
3. Define the target population, which is used for model evaluation.
4. Apply to scientific discovery and causal relationships.
Two Approaches to Learning from Data
• Causality Approach
 – Seeking the relationship between the outcome and the independent variables (attributes)
 – Y(x) = f(x; a(x′)); the parameters are a function of the observed data
• Similarity Approach
 – Seeking the relationship between the outcomes of different subjects with different attributes
 – Y(x) ~ S(x, x′) Y(x′); the similarity is a function of the observed data
• Similarity Principle (SP) versus Causality
 – The SP is the foundation of any causal relationship
 – Similarity grouping is unavoidable for repetitions of events and for causality
• AI Learning and Scientific Discovery
 – Shows the human brain's incredible ability
 – Implies humans' incredible inability to handle everything individually
Subjectivity of Causality & Controversies in Interpretations of Causal Effects
• Subjectivity in potential cause selection
• Causes are generally not independent
• Models are not unique
 – Body weight = f(waist, height)
 – Body weight = f(waist, height, shoulder height)
 – Body weight = f(waist, height, shoulder height, foot length)
 – In fact, all factors are correlated and statistically significant
• Different models provide different interpretations of the effect (association or causality) of waist and height
• Similar controversies in drug effect = f(drug, race, gene x, gene y)
 – Simply stating a drug effect without specifying the other factors is misleading.
 – Interpretation of a factor's effect must be in the context of all the other factors in the model
 – Given a large but finite population, many factors (and infinitely many possible combinations of these factors) can be significant. The factors included in the model are subjective, and consequently the interpretation of the drug effect is arbitrary.
Does Analysis of Covariance Make Sense?
• Model Options:
 – Model 1: weight = a1 × height
 – Model 2: weight = a2 × height + b2 × (shoulder height)
 – Model 3: weight = a3 × height + b3 × (shoulder height) + c3 × (leg length)
 – Model 4: weight = d4 × (head height) + b4 × (shoulder height)
• Given big data
 – All models are significant
• How should the parameters be interpreted?
 – What is the effect of height: a1, a2, or a3?
 – What is the effect of shoulder height: b2, b3, or b4?
 – a1, a2, and a3 can be very different, since height mainly consists of shoulder height, and the leg is part of shoulder height and height
 – c3 can be negative, since a3 and b3 have already included the leg effect; would you say "longer legs will reduce the weight"?
• Conclusion:
 – Many attributes or covariates (including genes) in modeling exhibit strong correlations; the common interpretation of the parameters often does not make much sense.
The Similarity Principle in Action: Similarix
Bounded similarity: 0 ≤ Sij ≤ 1; completely different = 0, identical = 1.
Feature-Scaling Factors (FSF)
• The Rk depend on the selected features and the outcome
• Including more common features will
 – reduce the differences and increase the similarity among subjects
 – lead to larger differences in the Rk to balance out the effect of the common features
• Why an exponential similarity function?
 – Humans can pay attention only to limited "local events"; distant events quickly disappear over the horizon
 – Human sense organs work on a log scale (e.g., loudness, brightness)
Exponential similarity function:
Distance function:
SBML Algorithm
Training:
1. Normalize the training data set.
2. Assign initial similarity scores Sij.
3. Solve for the initial scaling factors Rk based on Sij and the similarity function.
4. Calculate the weights Wij using Sij and Rk.
5. Model the outcome: Yi = Σj wij Oj.
6. Calculate the error E = Σi (Yi − Oi)² / N.
7. Update the scaling factors Rk using the gradient method with learning rate α.
Prediction:
1. Normalize the new data (without the outcome variable) using the mean and standard deviation from the training set.
2. Calculate the similarity scores Sij between the new subjects and the subjects in the training set using the previously determined Rk.
3. Calculate the weights Wij using Rk.
4. Predict the future outcome: Yi = Σj wij Oj.
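The prediction half of the flowchart can be sketched as follows. The exact distance and similarity forms are assumptions here (a weighted Euclidean distance inside an exponential similarity function); in full SBML training, the scaling factors Rk would themselves be learned by gradient descent on the error E:

```python
import math

def sbml_predict(new_subj, train_x, train_y, R):
    """One SBML-style prediction step (sketch).

    Assumed forms: d_ij = sqrt(sum_k R_k * (x_ik - x_jk)^2),
    S_ij = exp(-d_ij), w_ij = S_ij / sum_j S_ij,
    and prediction Y_i = sum_j w_ij * O_j.
    """
    sims = []
    for xj in train_x:
        d = math.sqrt(sum(r * (a - b) ** 2 for r, a, b in zip(R, new_subj, xj)))
        sims.append(math.exp(-d))
    total = sum(sims)
    weights = [s / total for s in sims]
    return sum(wij * oj for wij, oj in zip(weights, train_y))

# Hypothetical normalized data: feature 1 matters (R_1 large);
# feature 2 is pure noise and is ignored (R_2 = 0).
train_x = [(0.0, 9.0), (0.1, -9.0), (1.0, 0.5), (1.1, -0.5)]
train_y = [0.0, 0.1, 1.0, 1.1]
R = (5.0, 0.0)
y_hat = sbml_predict((0.05, 3.0), train_x, train_y, R)
```

With R2 = 0 the wildly different second feature contributes nothing to the distance, so the prediction is pulled toward the two subjects with a similar first feature.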
Learning with Regularization
• Loss Functions
 – Lasso: E + λ Σk |Rk|
 – Ridge: E + λ Σk Rk²
 – Elastic Net: E + λ1 Σk |Rk| + λ2 Σk Rk²
• Minimization with the gradient method
• Learning with rate α
Clinical Trial Example: Rare Disease
Mean squared error (MSE) of different machine learning methods, based on bootstraps:

Training Set Size | Full Linear Model | Optimal Linear Model | Ridge Regression | SBML
25%  | 0.774 | 0.788 | 0.796 | 0.616
50%  | 0.803 | 0.802 | 0.804 | 0.655
75%  | 0.806 | 0.809 | 0.810 | 0.668

Note: MSE normalized by the variance of the outcomes.
Training, Validation & Evaluation of AI Methods
• Cross-Validation for Tuning Parameters
 – Exhaustive cross-validation methods: all possible ways of making training/validation splits.
 – Leave-p-out cross-validation (LpOCV): exhaustively splits the data with p observations as the validation set and the rest for training.
 – Bootstrapping: random selection of size p as the training set and size q as the test set, with replacement
• Target Population (TP)
 – A well-defined TP (WDTP) is needed for model evaluation, but in the AI world a WDTP is often missing.
 – Big data: simply split the sample into a training set and a test set
 – Larger data: training and test sets obtained by resampling without replacement
 – Small data: use the empirical distribution; training and test sets obtained by bootstrapping
SBML Performance on a Regression Problem
[Plot: MSE of predicted Y / naïve MSE versus training sample size]
Outcome Y = X1 + X2X3 + X4X4 + X5. However, X1, …, X7, from a standard multivariate normal distribution, are included in all models (X6 and X7 are noise).
Precision Versus Complexity of the Data Relationship, Sample Size, and Computational Efficiency
• Variables X6 and X7 are noise
• Training time = O(N²) = 0.09 N² (seconds); training time = 2.4 minutes for training set size N = 400
• MSE normalized by the naïve MSE from the prediction using the training sample mean
• Learning rate α = 0.125, λ = −0.5, epochs = 5
[Plot: normalized MSE versus training sample size for a less nonlinear model, Y = X1 + X2X3 + X4X4 + X5, and a more nonlinear model, Y = X1 + 5X2X3 + X4X4 + X5]
SBML Performance for Classification
• Y = (sign(X1 + X2X3 + X4X4 + X5) + 1)/2
• Xi = multivariate normal
• Loss function λ = −0.5
• Learning rate α = 0.125
• MSE is based on the probability of Y = 1.
[Plot: MSE of the logistic model versus SBML]
Paradigms of Drug Development
• Challenges
 – Multi-regional trials and subgroup analysis
 – Rare disease and precision medicine
• Solutions
 – Smaller clinical trials
 – Stagewise marketing authorization
 – Enhanced pharmacovigilance & postmarketing monitoring
KernelMethod• KernelMethod:
– KernelAssmoothfunctionwithnormalizationfactor(Hastieetal,2001)– Non-linearkernelregressionwithoutnormalizationfactor(Hastieetal,2001)– Kernelasattributesinlearning(trainingweights)(Scholkopf,etc.,2005)
• Kernel
– Kernel,k(�,�),isthedotproductoffeaturevectors– Kernelisasimilarityscorefunction
• FeatureSelectionforDifferentObjects– Sequencing(text,sound,music)– Images(black-white,color)– Trees(chemicalcompounds)– Networks(metabolicnet,disease-symptomnet,drug-AEnet)
The Kernel Trick
• Any algorithm for vectorial data that can be expressed only in terms of dot products between vectors can be performed implicitly in the feature space associated with any kernel, by replacing each dot product with a kernel evaluation (Scholkopf et al., 2005).
• Transform linear methods (e.g., PCA) into nonlinear methods by simply replacing the classic dot product with a more general kernel, such as the Gaussian RBF kernel.
• Nonlinearity via the new kernel is then obtained at minimal extra computational cost, as the algorithm remains exactly the same.
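A small sketch of the trick (a Python illustration with hypothetical helper names, not code from the deck): squared distance in feature space can be written purely in dot products, so swapping the linear kernel for an RBF kernel changes the geometry while the algorithm stays identical.

```python
import math

def linear_k(x, y):
    """Ordinary dot product (the 'classic' kernel)."""
    return sum(a * b for a, b in zip(x, y))

def rbf_k(x, y, gamma=1.0):
    """Gaussian RBF kernel: an implicit feature-space dot product."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram(X, k):
    """Gram matrix: the only quantity a kernelizable algorithm needs."""
    return [[k(xi, xj) for xj in X] for xi in X]

def feature_dist2(x, y, k):
    """||phi(x) - phi(y)||^2 = k(x,x) - 2 k(x,y) + k(y,y), dot products only."""
    return k(x, x) - 2 * k(x, y) + k(y, y)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
G = gram(X, rbf_k)                          # swap rbf_k for linear_k; same code
d_lin = feature_dist2(X[0], X[1], linear_k)  # ordinary squared distance
d_rbf = feature_dist2(X[0], X[1], rbf_k)     # implicit feature-space distance
```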
Support Vector Machine As A Kernel Method
Classifier with Hard Margin Model:
Classifier with Soft Margin Model:
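The two classifiers named on this slide are usually written as the following optimization problems (a standard textbook reconstruction, since the slide's formula images are not reproduced here):

```latex
% Hard margin: maximize the margin subject to perfect separation
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{subject to } y_i\,(w^\top x_i + b) \ge 1,\ i = 1,\dots,N

% Soft margin: allow violations via slack variables \xi_i, penalized by C
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to } y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0
```

In the dual form, the data enter only through dot products x_i·x_j, which is what makes the SVM a kernel method: each dot product can be replaced by a kernel evaluation k(x_i, x_j).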
Support Vector Machine In R For Predicting Cognitive Impairment
• Alzheimer's disease is the most common cause of dementia in the elderly.
• Predict the cognitive outcome using the demographics and 130 biomarkers (predictors).
Application Of Kernel Methods
"Kernel Methods in Computational Biology" (Scholkopf et al., 2004) collects different applications of kernel methods and support vector machines:
• Inexact Matching String Kernels for Protein Classification
• Fast Kernels for String and Tree Matching
• Local Alignment Kernels for Biological Sequences
• Kernels for Graphs
• Diffusion Kernels
• A Kernel for Protein Secondary Structure Prediction
• Heterogeneous Data Comparison and Gene Selection with Kernel Canonical Correlation Analysis
• Kernel-Based Integration of Genomic Data Using Semi-definite Programming
• Protein Classification via Kernel Matrix Completion
• Accurate Splice Site Detection for Caenorhabditis elegans
• Gene Expression Analysis: Joint Feature Selection and Classifier Design
• Gene Selection for Microarray Data
Application Of SVM
• SVM-based protein fold recognition methods (Shamim et al., 2007, 2013; Damoulas and Girolami, 2008; Dong et al., 2009; Yang and Chen, 2011). The main difference among these SVM-based methods is their feature representation algorithms.
• Poorinmohammad et al. (2015) combined the SVM approach with pseudo amino acid composition descriptors to classify anti-HIV peptides, with a prediction accuracy of 96.76%.
• Wei and Zou (2016) provided a review of the recent progress in machine learning-based methods for protein fold recognition prior to 2011.
Unsupervised Learning
Unsupervised Learning (USL)
• Unsupervised Learning Has No Clear, Correct Answers
• USL = Clustering, Association, and Anomaly Detection
– Clustering = discovering the inherent groupings in the data. Ex: grouping patients by their baseline disease severity and demographic characteristics
– Association rule learning = discovering rules that describe large portions of the data. Ex: people buying product A also tend to buy product B.
– Anomaly (outlier) and novelty detection = identifying items, events, or observations that do not conform to an expected pattern. Anomaly detection can also be done as supervised learning. Ex: structural defects, medical problems, text errors, or bank fraud
• Applications
– Grouping breast cancer patients by their genetic markers
– Grouping movie viewers by the ratings they assign
– Finding sale items that are highly related is very helpful when stocking shelves or doing cross-marketing in sales promotions, catalog design, and consumer segmentation based on buying patterns
Common Unsupervised Learning Algorithms
• Apriori Algorithm for Association Rules
– Identify the frequent individual items in the relational database and extend them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database
• Principal Component Analysis (PCA)
– Find a low-dimensional representation of the observations
– Sequentially find a set of linear combinations of the predictors that have maximal variance and are orthogonal to each other
• K-Means Clustering for Dimension Reduction
– Partition the N observations into a pre-specified K clusters so as to minimize the within-cluster sum of squares
• Hierarchical Clustering for Dimension Reduction
– Yields a tree-like visual representation (dendrogram) of the observations without a pre-specified number of clusters
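The K-means objective described above — partitioning into K clusters to minimize the within-cluster sum of squares — can be illustrated with a minimal Lloyd's-algorithm sketch (the deck's code is in R; the data and function names here are hypothetical):

```python
import random

def kmeans(points, k, iters=20, seed=1):
    """Lloyd's algorithm on 1-D data: alternately assign each point to its
    nearest center and recompute centers, (locally) minimizing the
    within-cluster sum of squares (WSS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    wss = sum((p - centers[j]) ** 2
              for j, c in enumerate(clusters) for p in c)
    return centers, clusters, wss

# Two well-separated groups, e.g. mild vs. severe baseline severity scores
points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centers, clusters, wss = kmeans(points, 2)
```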
Hierarchical Clustering Algorithm
Hierarchical Clustering For Breast Cancer Patients
Heatmap Of Gene Samples
Applications Of Clustering Methods
• Böcker et al. (2005), a hierarchical clustering approach for large compound libraries
• Wallner et al. (2010) studied correlation and cluster analysis of immunomodulatory drugs based on cytokine profiles.
• Tan et al. (2015) and Rasool et al. (2013) used neural network filtering methods for extracting representations that best describe the gene expressions.
• Haraty et al. (2015) studied an enhanced k-means clustering algorithm for pattern discovery in health data.
• Yildirim et al. (2014) used the k-means algorithm to discover hidden knowledge in health records of children and data mining in bioinformatics. They conclude that medical professionals can investigate the clusters the study revealed, thus gaining useful knowledge and insight into the data for their clinical studies.
• Hameed et al. (2018) proposed a two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration.
• MacCuish et al. (book, 2019), clustering in bioinformatics and drug discovery
• Ma et al. (2019), a comparative study of cluster detection algorithms in protein–protein interaction for drug target discovery and drug repurposing
Reinforcement Learning (RL)
• Learning
– Optimize a utility function
– Feedback from one's environment is essential for RL.
– For situations where the correct answer is difficult to define or there are too many possible paths to complete the task
• Example: AlphaGo Zero
– A computer program that plays the board game Go
– At the 2017 Future of Go Summit, beat the world No. 1 ranked player
– AlphaGo and its successors use a Monte Carlo tree search algorithm to find their moves, based on knowledge learned by an ANN from both human and computer play
• Methods
– Stochastic Decision Process
– Q-Learning
• Applications
– Markov Decision Process for clinical development programs (Chang, 2010)
– Value iteration and policy iteration algorithms
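Q-learning, listed above, can be sketched on a toy problem (the environment, reward, and parameters here are invented for illustration, not from the deck): an agent on a short chain of states learns, from reward feedback alone, that moving right reaches the goal.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions 0 = left,
    1 = right; reward 1 only on reaching the terminal state n-1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (feedback-driven exploration)
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the best action in s2
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy for the non-terminal states: always move right
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]
```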
Phase Transition Probabilities Of Clinical Trials
Source: Chang et al. (2019)
Markov Chain For Clinical Development Program
Source: Chang, 2010; Chang et al., 2019
Pij = actual phase transition probability from Phase i to Phase j, for all disease indications
SDP = Markov chain & action rules (called a policy) to optimize the NPV at each stage. Backward induction & dynamic programming are used to solve the problem. Action = trial design and conduct, ……
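The backward-induction recursion mentioned above can be sketched as follows. The transition probabilities, stage costs, and terminal value here are purely hypothetical stand-ins (not the values in Chang et al., 2019): fold the terminal value back through the phases via V_i = -cost_i + p_i · V_{i+1}.

```python
# Hypothetical inputs (illustration only):
# p[i]    = probability of advancing from stage i to stage i+1
# cost[i] = cost of running stage i
# reward  = value at marketing authorization
p = [0.6, 0.35, 0.62]        # Phase 1 -> 2, Phase 2 -> 3, Phase 3 -> approval
cost = [5.0, 20.0, 60.0]     # stage costs (e.g., $M)
reward = 500.0

def expected_npv(p, cost, reward):
    """Backward induction (dynamic programming): start from the terminal
    value and fold back V_i = -cost_i + p_i * V_{i+1}."""
    v = reward
    for pi, ci in zip(reversed(p), reversed(cost)):
        v = -ci + pi * v
    return v

v = expected_npv(p, cost, reward)
# Phase 3: -60 + 0.62*500 = 250; Phase 2: -20 + 0.35*250 = 67.5;
# Phase 1: -5 + 0.6*67.5 = 35.5
```

In the full SDP, each stage would also carry a choice of action (trial design and conduct), and the recursion would take the maximum over actions; the sketch above is the fixed-policy special case.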
Sequential Decision Process (SDP)
Source: Chang, 2010; Chang et al., 2019
[Diagram: NME → Phase 1 & Action → Phase 2 & Action → Phase 3 & Action → Marketing Authorization → Net Present Value (NPV), with phase transition probabilities P01, P12, P23, P34 along the development timeline]
Stochastic Decision Process For A Cancer Clinical Development Program (CDP)
Source: Chang et al., 2019
Swarm Intelligence (SI)
• Self-Organized System (SOS)
– SOS = systems in which organized behavior arises without a centralized controller or leader
– Stigmergy = a mechanism of indirect coordination between agents: the trace left in the environment by an action stimulates the performance of a next action, by a different agent.
• Swarm Intelligence (SI)
– SI = intelligence possessed by an SOS without a common goal or leader
• Characteristics of SI
– Micromotives and macroconsequences (genotype versus phenotype)
– Each individual in an SOS has no intelligence and follows a simple rule.
– Ex: each ant simply follows the pheromone, but the colony behaves intelligently in finding the shortest path to the food source
– SI is not owned by any individual
• SI Examples in Nature
– Ant colonies, bird flocking, animal herding, bacterial growth, fish schooling, and microbial intelligence
• Artificial Swarm Intelligence Algorithms
– Particle swarm, ant colony, artificial bee colony, artificial immune systems, gravitational search, glowworm swarm, and bat algorithms
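Particle swarm optimization, the first algorithm in the list above, can be sketched in a few lines (a minimal illustration with standard textbook parameters; the objective is a simple convex stand-in, not a real docking score): each particle is pulled toward its own best position and the swarm's best, and collectively the swarm locates the minimum without any central controller.

```python
import random

def pso(f, dim=2, n=15, iters=60, seed=0):
    """Particle swarm optimization: velocities blend inertia, attraction to
    each particle's personal best, and attraction to the global best."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia, cognitive, social
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pbest = [xi[:] for xi in x]
    pval = [f(xi) for xi in x]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (pbest[i][d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < pval[i]:
                pbest[i], pval[i] = x[i][:], val
                if val < gval:
                    gbest, gval = x[i][:], val
    return gbest, gval

# Minimize the sphere function, a simple convex test objective
best, best_val = pso(lambda z: sum(t * t for t in z))
```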
How Swarm Intelligence Works
Swarm Intelligence: every ant follows the smell; collectively the ant colony quickly finds the shortest path without the guidance of any central intelligence.
Swarm Intelligence Applications
• A central part of the rational drug development process is the prediction of the complex structure of a small ligand with a protein, the so-called protein–ligand docking problem, used in virtual screening of large databases and lead optimization.
• Korb et al. (2006) developed a docking algorithm based on ant colony optimization for structure-based drug design.
• Fu et al. (2015) studied a new approach for flexible molecular docking based on swarm intelligence. They computed the interactions of 23 protein–ligand complexes.
• Rajeshkumar and Kousalya (2017) present a view on applications of swarm-based intelligence algorithms in the pharmaceutical industry, including drug design, pharmacovigilance, and sequence alignment.
• Soulami et al. (2017) used a particle swarm optimization (PSO)-based algorithm for detection and classification of abnormalities in mammographic images using texture features and support vector machine (SVM) classifiers.
• Fang et al. (2018) presented a novel feature selection method called the elite search mechanism-based flower pollination algorithm (ESFPA), used to determine protein essentiality. ESFPA uses an improved SI algorithm for feature selection and selects optimal features for protein essentiality prediction.
• The metaheuristicOpt package in R includes many of these optimization algorithms.
Evolutionary Intelligence (EI)
• Inspiration by the Natural Evolution Process
– Inspired by our understanding of biological evolution
• Minimal or No Specification of Solution Structure
– Automatically solves problems without requiring the user to know or specify the form or structure of the solution in advance
• Solution Improvement Via Evolution
– Generation by generation, stochastically transform populations of computer programs into new populations of programs that will effectively solve problems
• Genetic Algorithm (GA) versus Genetic Programming (GP)
– Both use evolutionary algorithms
– A GA solution is represented as a list of actions and values, often a string
– A GP solution is represented as a tree structure of actions and values, usually a nested data structure
Genetic Programming
Three Key Elements in GP:
(1) Representation: Syntax Tree
– Mathematical or compound structure
(2) Reproductive Mechanism: Crossovers and Mutations
– Crossover = combination of two randomly selected data structures to produce one new structure. A mutation randomly generates a subtree to replace a randomly selected subtree from a randomly selected individual.
(3) Survival Fitness
– For finding a function g(x) to approximate the target function f(x), the fitness = the mean square error between the two functions. For finding an optimal treatment sequence for infertility, the fitness function = treatment success (1 for success and 0 for failure).
Genetic Algorithm For Treatment Strategy Optimization
The population (treatment sequences) changes constantly, since survivability is a function of fitness, i.e., the observed effect of the treatment sequence.
Treatment sequences for treating female infertility
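A minimal GA sketch of this idea (the sequences, fitness function, and parameters are purely hypothetical stand-ins for the observed success of a treatment sequence, not the infertility application itself): selection by fitness, single-point crossover, and random mutation evolve the population toward high-fitness sequences.

```python
import random

# Each individual is a sequence of 6 treatment choices coded 0/1/2.
# The "true best" sequence below is invented so the toy fitness is computable;
# in practice fitness would come from observed treatment outcomes.
TARGET = [2, 0, 1, 2, 1, 0]

def fitness(seq):
    """Toy fitness: number of positions matching the best sequence."""
    return sum(1 for a, b in zip(seq, TARGET) if a == b)

def evolve(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in range(6)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]            # selection by fitness
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, 6)
            child = p1[:cut] + p2[cut:]             # single-point crossover
            if rng.random() < 0.2:                  # mutation
                child[rng.randrange(6)] = rng.randrange(3)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```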
Application Of Evolutionary Intelligence
• RNA structure prediction (Batenburg et al., 1995)
• Molecular structure optimization (Wong et al., 2011)
• Ghosh and Jain (2005) collected articles across a broad range of topics on the applications of evolutionary AI in drug discovery and GP in data mining for drug discovery.
• Barmpalexis et al. (2011) used symbolic regression via genetic programming in the optimization of a controlled-release pharmaceutical formulation and compared its predictive performance to ANNs.
• Ghaheri et al. (2015) reviewed the applications of the genetic algorithm in disease screening, diagnosis, treatment planning, pharmacovigilance, prognosis, and healthcare management.
• Ghaheri et al. (2019) provided a comprehensive review of the applications of genetic algorithms in medicine, in 14 different areas:
– radiology, oncology, cardiology, endocrinology, obstetrics and gynecology, pediatrics, surgery, infectious diseases, pulmonology, radiotherapy, rehabilitation medicine, orthopedics, neurology, pharmacotherapy, and healthcare management
Review & Summary
[Diagram: Machine Learning branches into Supervised Learning, Unsupervised Learning, Reinforcement Learning, Evolutionary Learning, and Swarm Intelligence Learning]
List Of Machine Learning Methods (details and references available from the upcoming book, Chang, 2020)
- Unsupervised: link analysis, PCA, K-means, hierarchical clustering, SOM
- Supervised: SBML, KNN, deep learning (FNN, CNN, RNN, DBN), kernel method, SVM, GANs, autoencoders, decision tree (bagging, boosting, random forest), logistic regression, ridge/lasso/elastic net regressions, Bayesian network
- Reinforcement: sequential decision process, Bayesian Q-learning
- Evolutionary: genetic algorithm, genetic programming, cellular automata
- Swarm: ant clustering model, particle swarm optimization (PSO)
AI Applications
[Diagram: AI application areas — Drug Discovery, Drug Development, Healthcare — with example methods:]
- Quantitative Structure–Activity Relationship (QSAR)
- Kernel method for protein classification & biological sequences
- Microarray data analysis
- SBML for clinical trials of rare disease & precision medicine
- Deep learning (FNN, CNN, RNN, LSTM, DBN) for drug design & molecule classification
- Decision tree, random forest for bioactivity classification, protein–ligand binding, toxicity modeling
- LDA for drug–drug interaction, kNN for druggable molecule identification, SVM for DNA-binding protein prediction, network similarity for disease modeling
- NLP for pharmacovigilance: auto-narrative generation, narrative analysis, QC assessment, causality assessment; disease models, probabilistic clinical risk stratification models, and practice-based clinical pathways
- Disease diagnosis & prognosis with medical image data
- NLP for medical records: text processing, classification
- KNN, SVM, CNN, AdaBoost, autoencoder for cancer detection from gene expression data
- ANN & SOM ensembles for gene-related clustering
- CNN for tumor detection, segmentation, classification, and computer-aided diagnosis: breast cancer (mammography), lung diseases, lesion segmentation for traumatic brain injuries, brain tumor, CV disease with cardiological images
- SVM for disability due to stroke, …
List Of AI Applications (details and references available from the upcoming book, Chang, Feb 2020)
References
Books:
• Chang, M. (2019). Artificial Intelligence in Drug Development, Precision Medicine, and Healthcare. CRC.
• Chang, M. (2010). Monte Carlo Simulation for the Pharmaceutical Industry. CRC.
• Chang, M. (2013). Principles of Scientific Methods. CRC.
• Chang, M. (2007; 2nd edition, 2014). Adaptive Design Theory and Implementation Using SAS and R. CRC.
• Chang, M. (2011). Modern Issues and Methods in Biostatistics. Springer.
• Chang, M. (2012). Paradoxes in Scientific Inference. CRC.
• Chang, M. et al. (2019). Innovative Strategies, Statistical Solutions and Simulations for Modern Clinical Trials. CRC.
• Chow and Chang (2011). Adaptive Design Methods for Clinical Trials. CRC.
• Hastie, T., Tibshirani, R., Friedman, J. (2001; 2nd edition, 2009). The Elements of Statistical Learning. Springer.
• Koza, J. et al. (2005). Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Springer.
• Goertzel, B. & Pennachin, C. (1998). Artificial General Intelligence. Springer.
• Russell, S. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.
• Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Review Papers:
• Daoud, M. and Mayo, M. (2019). A survey of neural network-based cancer prediction models from microarray data. Artificial Intelligence in Medicine 97, 204–214.
• Jiang, F. et al. (2017). Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology.
• Long, E., Lin, H., et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts.
• Qayyum, A. et al. (2018). Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems, 42:226, November 2018.
• Vanhaelen, Q. et al. (2017). Design of efficient computational workflows for in silico drug repurposing. Drug Discovery Today 22(2), February 2017.
• Schmider, J. (2019). Artificial intelligence in adverse event case processing. Clinical Pharmacology & Therapeutics, April 2019.
• Ghasemi, F. (2018). Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discovery Today 23(10), October 2018.
Similarity-Based Machine Learning in R (1)

library(class)

# Normalize data (by subtracting the mean and dividing by the SD of the reference data)
normalizeData = function(DataToNormalize, RefData) {
  normalizedData = DataToNormalize
  for (n in 1:ncol(DataToNormalize)) {
    normalizedData[, n] = (DataToNormalize[, n] - mean(RefData[, n])) / sd(RefData[, n])
  }
  return(normalizedData)
}

# Calculate initial scaling factors R0s
# S0s = p-values
InitialRs = function(X, eta, S0s) {
  K = length(S0s); R0s = rep(0, K)
  for (k in 1:K) {
    # force < 12 to avoid s = exp(d) overflow
    R0s[k] = min((-log(S0s[k]))^(1/eta) / max(IQR(X[, k]), 0.0000001), 12)
  }
  return(R0s)
}

# 3 vectors: Rs = scaling factors, x1 = point 1, x2 = point 2
Distance = function(Rs, x1, x2) {
  K = length(Rs)
  d = 0
  for (k in 1:K) { d = d + (Rs[k] * (x2[k] - x1[k]))^2 }
  return(d^0.5)
}

# Calculate similarity scores among the training subjects
SimilarityTrain = function(eta, Rs, Xtrain) {
  N1 = nrow(Xtrain); K = length(Rs)
  S = matrix(0, nrow = N1, ncol = N1)
  for (i in 1:N1) { for (j in i:N1) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (Xtrain[i, k] - Xtrain[j, k]))^2 }
    S[i, j] = exp(-d2^0.5)  # avoid 0 for i = j
    S[j, i] = S[i, j]       # use symmetry to reduce 50% CPU time
  }}  # End of j and i loops
  return(S)
}

# Calculate similarity scores between training and test subjects
Similarity = function(eta, Rs, Xtrain, Xtest) {
  N1 = nrow(Xtrain); N2 = nrow(Xtest); K = length(Rs)
  S = matrix(0, nrow = N2, ncol = N1)
  for (i in 1:N2) { for (j in 1:N1) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (Xtest[i, k] - Xtrain[j, k]))^2 }
    S[i, j] = exp(-d2^0.5)
  }}  # End of j and i loops
  return(S)
}
Weight = function(S) {
  N1 = nrow(S); N2 = ncol(S)
  W = matrix(0, nrow = N1, ncol = N2)
  for (i in 1:N1) {
    sum_S_row = sum(S[i, ])
    for (j in 1:N2) { W[i, j] = S[i, j] / max(sum_S_row, 0.00000000001) }
  }
  return(W)
}  # End of Weight

# Calculate the predicted outcome and error
PredictedY = function(W, Ytrain, Ytest) {
  OutObj = list()
  OutObj$pred_Y = W %*% Ytrain  # For a binary outcome, pred_Y = probability of 1
  OutObj$MSE = mean((OutObj$pred_Y - Ytest)^2)
  return(OutObj)
}  # End of PredictedY

# Derivatives of the loss function
DerivativeE = function(eta, pred_Y, Rs, X, S, O) {
  N = nrow(X); K = length(Rs)
  der_S = matrix(0, nrow = K*N, ncol = N); der_W = matrix(0, nrow = K*N, ncol = N)
  dist = matrix(0, nrow = N, ncol = N); der_E = rep(0, K)
  for (i in 1:N) { for (j in i:N) {
    d2 = 0
    for (k in 1:K) { d2 = d2 + (Rs[k] * (X[i, k] - X[j, k]))^2 }
    dist[i, j] = max(d2^0.5, 0.0000001)  # avoid overflow in der_d below
    dist[j, i] = dist[i, j]
  }}  # End of i and j loops
  for (m in 1:K) { for (i in 1:N) { for (j in i:N) {
    der_d = (Rs[m] / dist[i, j]) * (X[i, m] - X[j, m])^2
    der_S[(m-1)*N + i, j] = -1 * S[i, j] * eta * (dist[i, j])^(eta - 1) * der_d
    der_S[(m-1)*N + j, i] = der_S[(m-1)*N + i, j]
  }}}  # End of j, i, and m loops
  # Weight derivative
  for (m in 1:K) { for (i in 1:N) {
    sum_der_S = sum(der_S[(m-1)*N + i, ]); sumSi = sum(S[i, ])
    for (j in i:N) {
      der_W[(m-1)*N + i, j] = der_S[(m-1)*N + i, j] / sumSi - S[i, j] * sum_der_S / sumSi^2
      der_W[(m-1)*N + j, i] = der_W[(m-1)*N + i, j]
    }
  }}  # End of j, i, and m loops
  # Derivatives of E
  for (m in 1:K) { for (i in 1:N) {
    err = 2/N * (pred_Y[i] - O[i])  # For mean squared error
    for (j in 1:N) { der_E[m] = der_E[m] + err * O[j] * der_W[(m-1)*N + i, j] }
  }}  # End of i and m loops
  return(der_E)
}  # End of DerivativeE
Similarity-Based Machine Learning in R (2)
# Update scaling factors, Rs
Learning = function(LearningRate, Lamda, Rs, der_E) {
  K = length(Rs)
  der_lossFun = der_E + 2 * Lamda * Rs  # derivatives of the loss function
  Rs = Rs - LearningRate * der_lossFun
  # force Rs < 25 to avoid S = exp(d) overflow
  for (m in 1:length(Rs)) { Rs[m] = min(max(0, Rs[m]), 25) }
  return(Rs)
}  # End of Learning

### Training the AI to obtain scaling factors, Rs ###
SBMLtrain = function(Epoch, S0, Lamda, LearningRate, eta, Xtrain, Ytrain) {
  TrainObj = list(); OutObj0 = list(); OutObj = list()
  R0 = InitialRs(Xtrain, eta, S0); S = SimilarityTrain(eta, R0, Xtrain)
  OutObj0 = PredictedY(Weight(S), Ytrain, Ytrain)
  Rs = R0; OutObj = OutObj0  # in case Epoch = 0 for no learning
  TrainMSE0 = OutObj0$MSE
  preLoss = OutObj0$MSE + Lamda * sum(Rs^2)
  # Learning
  iter = 0
  while (iter < Epoch) {
    preRs = Rs
    Rs = Learning(LearningRate, Lamda, Rs,
                  DerivativeE(eta, OutObj0$pred_Y, Rs, Xtrain, S, Ytrain))
    OutObj = PredictedY(Weight(SimilarityTrain(eta, Rs, Xtrain)), Ytrain, Ytrain)
    iter = iter + 1
    Loss = OutObj$MSE + Lamda * sum(Rs^2)
    if (Loss > preLoss) { Rs = preRs; iter = Epoch + 1 }
    preLoss = Loss
  }
  TrainObj$Rs = Rs; TrainObj$Y = OutObj$pred_Y; TrainObj$MSE = OutObj$MSE
  return(TrainObj)
}

# Linear regression
# outcome = "gaussian" or "binomial", ...
GLMPv = function(Y, X, outcome) {
  LMfull = glm(Y ~ ., data = X, family = outcome)
  sumLM = coef(summary(LMfull))
  pVals = sumLM[, 4][1:ncol(X) + 1]
  return(pVals)
}
# Example with simulated multivariate normal data
# Covariance matrix for creating correlated X
library(mvtnorm)
sigma = matrix(c(1,   0.0, 0.6, 0.2, 0.0, 0.0, 0.0,
                 0.0, 1,   0.0, 0.0, 0.0, 0.0, 0.0,
                 0.6, 0.0, 1,   0.5, 0.0, 0.0, 0.0,
                 0.2, 0.0, 0.5, 1,   0.0, 0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 1,   0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 0.0, 1,   0.0,
                 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1), ncol = 7)
mu = c(0, 0, 0, 0, 0, 0, 0)  # 7 multivariate normal means
initS = rep(0.5, 7)          # 7 initial similarities
avemse1 = 0; avemse2 = 0     # average MSEs over 50 simulation runs
for (sim in 1:50) {
  xTrain = rmvnorm(n = 40, mean = mu, sigma = sigma)
  xTest = rmvnorm(n = 40, mean = mu, sigma = sigma)
  # Construct y that are not linearly related to x
  Ytrain = xTrain[, 1] + xTrain[, 2] * xTrain[, 3] + xTrain[, 4]^2 + xTrain[, 5]
  Ytest = xTest[, 1] + xTest[, 2] * xTest[, 3] + xTest[, 4]^2 + xTest[, 5]
  # Data normalization
  Xtrain = as.data.frame(normalizeData(xTrain, xTrain))
  Xtest = as.data.frame(normalizeData(xTest, xTrain))
  # Assign p-values from the LM to initial similarities
  # initS = GLMPv(Y = Ytrain, X = Xtrain, outcome = "gaussian")
  mySBMLtrain = SBMLtrain(Epoch = 5, S0 = initS, Lamda = -0.5, LearningRate = 0.125,
                          eta = 1, Xtrain, Ytrain)
  myPred = PredictedY(Weight(Similarity(eta = 1, mySBMLtrain$Rs, Xtrain, Xtest)),
                      Ytrain, Ytest)
  plot(myPred$pred_Y, Ytest)
  mse1 = mean((Ytest - mean(Ytrain))^2)  # naive MSE, using the training mean for prediction
  mse2 = myPred$MSE                      # prediction error
  avemse1 = avemse1 + mse1 / 50
  avemse2 = avemse2 + mse2 / 50
}
avemse1; avemse2
Acknowledgment
• Dr. Susan Hwang, Boston University