introduction to artificial intelligence for drug development & … · 2020. 9. 30. · with...

95
Introduction To Artificial Intelligence For Drug Development & Medical Science Mark Chang, PhD Copyright® Mark Chang, PhD, AGInception, Boston University, 2019 1 FDA-Industry Workshop, Washington DC, September, 2020

Upload: others

Post on 12-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

IntroductionToArtificialIntelligenceForDrug

Development&MedicalScience

MarkChang,PhD

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 1

FDA-IndustryWorkshop,WashingtonDC,September,2020

Page 2: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TopicstoCover•  AI/MachineLearningOverview•  SupervisedLearning

– DeepLearningNeuralNetworks–  Similarity-BasedLearning/KernelMethod/SupportVectorMachine

– DecisionTreeMethods•  Non-SupervisedLearning

– UnsupervisedLearning–  ReinforcementLearning–  SwarmIntelligentLearning–  EvolutionaryLearning

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 2

Page 3: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArtificialIntelligence(AI)•  Intelligence=AbilityofPerformingMentalWork•  AI=SoftwarewiththeAbilityofPerformingHumanMentalWork•  Robot=Machine(hardware)withtheAbilityPerformPhysicaland/or

MentalWorkthatUsuallyRequiresAI•  ExamplesofAITechnologiesandRobotics

–  Calculator,computer,iphone,ipad,–  ELIZA,Apps,searchengine,Googlemap,ecommercesystem,..Alpha0–  Machine-learning,datamining,statisticalmodeling.–  Humanoidrobots,iRobot,robotsinassemblylines,self-drivecars

•  MedicineAI–  QuantitativeStructure–ActivityRelationship(QSAR)andMolecularDesignin

DrugDiscovery–  CancerPredictionUsingMicroarrayData–  DiseaseDiagnosisandPrognosisBasedonMedicalImage–  NaturalLanguageProcessingforMedicalRecords–  Similarity-BasedAIforClinicalTrials

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 3

Page 4: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArtificialGeneralIntelligence(AGI)•  AGIIsMachineWithFullScopeOfHumanIntelligence

–  UnlikeNarrowAIthataccomplishspecifictasksforhuman–  Capabilityofexperiencingconsciousness,emotion,discovery,creativity,

self-awareness,collaboration,andevolution–  Maybebecapableofcreatingrobotslikehimselfandhumanthatcan

givebirths•  AGIIsAnotherRaceOfHumanBeings

–  AsmedicalAIdevelops,wegraduallyreplaceourmalfunctionedorganswithones(artificialornot),makingeveryoneamixtureofhumanandmachine.

–  Asrobotsreachfullhumanintelligenceandintensivelyinteractwithus,wewilldevelopstrongfeelingwiththemandwillnotdiscriminatethemjustbecauseoftheiroriginalities.

•  ChallengesToOvercome–  Computingpowerrequired-possiblypoweredbyQuantumComputing–  Physical(muscular)powerrequired–  Man-machinemixturemightbeasolution

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 4

Page 5: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 5

UnsupervisedLearning

MachineLearning

ReinforcementLearning

EvolutionalLearning

SwarmIntelligenceLearning

SupervisedLearning

Classification

Regression

Anomalydetection

Clustering

Association

Dimensionreduction

MachineLearning=BackboneofNarrowAI

Page 6: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TypesOfMachineLearning•  SupervisedLearning

–  Learnergivesaresponseybasedoninputx,andcomparethetargety–  Trainedlearnerwillbeabletopredictfutureoutcome–  Ex:Digitalimagingrecognitionforcancerdiagnosis

•  UnsupervisedLearning–  Basedonthesimilaritiestore-representtheinputsinamoreefficientway–  Findhiddenstructurewithoutthehelpofatargetanswer–  Ex:Documentationorganization,customersegmentation,patientclustering

•  ReinforcementLearning–  Concernshowalearnershouldtakeactionsinanenvironmentsoasto

maximizesomenotionoflong-termreward–  Ex:iRobot,driverlesscar

•  EvolutionaryIntelligence/Learning–  EmergesthroughevolutionsovergenerationsofAIagents–becomebetterin

somesense–  Ex:GeneticalgorithmforpharmacovigilanceandHealthcaremanagement

•  SwarmIntelligence/Learning–  Systemsoperationbasedonmicro(individual)motivations,withouta

centralizedcontrollerorleader–  Ex:Antalgorithmforoptimalcustomerboardingstrategyatairport

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 6

Page 7: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

DeepLearning•  DeepLearningNeuralNetwork

– Multiple-layerartificialneuralnetwork–  Progressivelyextracthigher-levelfeaturesfromrawinput

•  FourMajorNetworkArchitectures1.  Feed-forwardNeuralNetwork(FNN)forGeneral

ClassificationandRegression2.  ConvolutionNeuralNetwork(CNN)forImageandFacial

Recognition3.  RecurrentNeuralNetwork(RNN)forSpeech

Recognition,NaturalLanguageProcessing4.  DeepBeliefNetwork(DBN)forDiseaseDiagnosisand

Prognosis.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 7

Page 8: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArtificialNeuralNetwork(ANN)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 8

ActivationFunction:identity,softmax,logistic,sigmoidal,tanh,andsgnfunctions,…

Page 9: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

MultiplePerceptron(ForwardNeuralNetwork)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019

•  Withaidentityactivationfunction,theANN=linearfunctionXiandlinearinWi•  ClassiclinearmodelislinearinparameterWibutcannonlinearinXi•  Learning=updatingweightWi

9

Wi Wi

Page 10: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Learning-BackpropagationAlgorithm

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 10

Learning(training)=updatingweightwiinthenetwork.Errororlossfunction:

Updateweightwibackwardslayerbylayerstartingfromtheoutputlayer,hiddenlayers,totheinputlayer.α=learningrate,asmallconstanttoensureconvergenceofthealgorithm,

PredictedTargeted

Weightincrement

Δwi = −α∂E∂wi

E = 1N

(Yi −Yi )2

i∑

Y = f (wi ,Yi )

Page 11: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TrainingAndTesting•  TrainAIModeltoDetermineItsParametersBeforeUse•  TesttheTrainedModelforItsPerformance•  DefineTargetPopulation•  ObtainTrainingandTestingdatasets

–  Resamplingwithreplacement(bootstrap)andwithoutreplacement(split)

–  Forlargersamplesizeproblems,useresamplingwithoutreplacementbecauseeachsamplelikelyreflectsthedistributionofthetargetpopulation.

–  Forsmallsamplesizeproblems,useresamplingwithreplacementbecauseeachsampleoftendoesnotreflectthepopulationandsampleistoosmallforresamplingwithoutreplacement(Empiricaldistribution≈populationdistribution).

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 11

Page 12: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TrainingErrorVersusTestingError

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 12

Over-fitting

Regularization:Introducingalossfunctiontoreduceover-fitting.

Page 13: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ConvolutionNeuralNetwork(CNN)•  MainIdeasInCNNs

–  Differentfiltersintheconvolutionlayers–  Locationinvarianceandcompositionality:makesenseforimageprocessingbutnotforNLPortime-seriesevents

•  FilterFunctions–  Eachidentifiesparticularfeaturesorimageelements–  Overlappedlocalfeaturestoformulatethe“overallpicture.”

•  CNNforDataOtherThanImages,If–  (1)thedatacanbetransformedtolooklikeimagedatainmatrixform,and

–  (2)thedataisjustasusefulafterswappinganytwocolumns

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 13

Page 14: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ConvolutionIllustrated

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019

Aconvolution(function)oftwofunctionf(x)andg(y)isdefinedasthesumoftheproductsofoneimagef(·)atlocationxandanotherimage,g(·)(calledthefilter)atadifferentlocationy-x.Atpixlevel:

14

FiltersizeN=9

Ahighvalueindicatesgoodmatchbetweenfilterandoriginalimages

Highvaluesondiagonalelementsindicatethebackslashimage

Page 15: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

CNNForDiseaseDiagnosis

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 15

•  PoolingLayer=>ReduceResolutionorSizeofImages•  ConvolutionalLayer=>FilteringSpecificFeatures•  DenseLayer=>FullyConnectedLayer

Page 16: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SampleOfMnistHandwrittenDigits

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 16

Page 17: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

CNNForWrittenDigitsRecognition98.7%AccuracyUsingkerasInR

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 17

Page 18: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsOfCNN•  Hussain(2017),CascadeddeepCNNforabraintumorsegmentation.•  Farooq(2017),CNN-basedmethodfortheclassificationofAlzheimer’s

diseaseinMRIimages•  Moeskopset.al(2016),amultiscaleCNN-basedapproachforautomatic

segmentationofMRImagesforclassifyingvoxelintobraintissueclasses•  Prasoonet.al(2013),atri-planarCNNusedforsegmentationoftibial

cartilageinkneeMRIimages•  Zhangetal(2015),segmentationofisointensebraintissuepresented

throughaCNNusingamultimodalMRIdatasetbytrainingthenetworkonthreepatchesthatareextractedfromtheimages

•  Anthimopoulosetal(2016),lungpatternclassificationforinterstitiallungdiseasesusingadeepconvolutionalneuralnetwork

•  Coleetal(2016),Predictingbrainagewithdeeplearningfromrawimagingdataresultsinareliableandheritablebiomarker

•  Estevaetal(2017),Dermatologist-levelclassificationofskincancerwithdeepneuralnetworks

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 18

Page 19: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArchitectureOfRecurrentNeuralNetwork(RNN)

•  RecurrentNeuralNetwork(RNN)–  ANNformodelingtemporaldynamicbehavior–  UnlikeFNN,RNNscanusetheirinternalstateasmemorytoprocesssequencesofinputs.

•  BasicRNN–  Suffersfromashort-termmemoryproblem–  DifficulttocarryinformationfromearliertimestepstolateroneswhenthesequenceislongassuchEnglishtext

•  LongShort-TermMemoryNetworks(LSTMs)–  Designedtodealwithlong-termdependencyproblems–  UsedforLNP,stockmarketprediction,voicerecognition,motionpicturecaptioning,andpoemandmusicgeneration.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 19

Page 20: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

VanishingAndExplodingGradientProblem&Solutions

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 20

X0 X1 Xt-1 Xt+k Xn

w

Problem:Whenthenumberoflayersgetsverylarge,thegradientsatearlylayersvanishandchangesofthecorrespondingweightshaveminimaleffectontheoutcome

… … …www

Xt…

SkipConnection

Solutions:•  IntroduceSkipConnections(e.g.,if…then…)•  LeakyRecurrentUnits•  GateRecurrentNetwork•  LongShort-TermMemoryNetwork(LSTM)•  DeepBeliefNetwork

w

Page 21: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArchitectureOfLongShort-TermMemoryNetwork

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 21

Page 22: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationOfRNNToMolecularDesign

Gupta,etal.(2018)trainedtheirRNNon541,555SMILESstrings,withlengthsfrom34to74SMILEScharacters(tokens).TheRNNmodelcanbeusedtogeneratesequencesonetokenatatime,asthesemodelscanoutputaprobabilitydistributionoverallpossibletokensateachtimestep.Typically,theRNNaimstopredictthenexttokenofagiveninput.Samplingfromthisdistributionwouldthenallowgeneratingnovelmolecularstructures.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 22

Page 23: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SimplifiedMolecular-InputLine-EntrySystem(SMILES)

SMILESdescribethestructureofchemicalspeciesusingshortASCIIstrings.In2007,anopenstandardcalledOpenSMILESwasdevelopedintheopen-sourcechemistrycommunity

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 23

Page 24: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsOfLSTMInNLP

•  SentimentAnalysis:identifytextas“positive”or“negative,”filteringspamorclassifyingemailtextasspam,classifyingthelanguageofthesourcetext

•  LanguageModeling:predictingtheprobabilisticrelationshipsbetweenwords,enablingonetopredictthenextwordsorphrases

•  SpeechRecognition:translatespeechtotextreadablebyhumans•  MachineTranslation:translatetextbetweenlanguages•  DocumentSummarization:aheading,abstractforadocument•  QuestionAnsweringSystem:ProcessQuestionsandProvideAnswersina

naturallanguage

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 24

AtrainedLSTMistoprovidetheprobabilityofstringSviaaconditionalprobability:

Page 25: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

NaturalLanguageProcessing(NLP)VersusMolecularDesign

•  FeaturesInNLP–  Wordsatdifferencelocationsorsequenceofwordsinasentence

•  WordEmbedding–  Semanticmodelingrequiresmappingwordstonumbersorwordembedding.–  Wordembeddinginvolvesamathematicalembeddingfromaspacewithmany

dimensionsperwordtoacontinuousvectorspacewithalowerdimension.•  FeaturesInMolecularDesign

–  Molecularmodelsexhibittree-structures,whichcanbetransferredintoone-dimensionalsequencesofdifferentsubstructures

•  WordEmbeddingInBiologicalsequences–  WordEmbeddingforn-gramforbiologicalsequences(e.g.DNA,RNA,and

Proteins)-AsgariandMofrad(2015).–  Therepresentationcanbeusedindeeplearninginproteomicsandgenomics.

•  LongShort-TermMemoryNets–  NLPdealswithsequencesofwords.Drugdiscoverydealswithgenesequences

ormolecularstructuresrepresentedbyasequenceofbiologicalsubstructures–  LSTMscanbeusedforcompoundscreeningandmoleculardesignasinNLP.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 25

Page 26: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

DeepBeliefNetwork(DBN)•  TwoMajorChallengesInQSARStudiesandDrugDesign

–  (1)thelargenumberofdescriptorsmayhaveautocorrelations–  (2)properparameterinitializationinmodelpredictiontoavoidanover-fittingproblem.

•  DBNCombinesUnsupervisedandSupervisedLearning–  Forefficientlearning–  EachDBNlayeristrainedindependentlyviaunsupervisedportion

–  Theunsupervisedlearningisusedfordimensionreductionorselectingsubsetsofdescriptors.

–  Afterthecompletionofunsupervisedlearning,theoutputfromthelayersisrefinedwithsupervisedlogisticregression.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 26

Page 27: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ArchitectureOfDeepBeliefNetWithStraggledRestrictedBoltzmannMachine(RBM)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019

•  RBMsaretwo-layerneuralnetsthatconstitutethebuildingblocksofdeep-beliefnetworks.

•  ThefirstlayerofRBMiscalledthevisible,orinputlayer,andthesecondisthehiddenlayer.

•  Activationfunctionσ=normalizedexponentialenergyfunction.

•  Two-StageLearning:Unsupervisedlearningforfeatureselection=trainingRBMinparallel,followedbysupervisedlearningusingbackpropagation

27

σ

σσ

σσσ

σσ

σ σ 1stRBM

2ndRBM

3rdRBM

σ ClassOutput

InputLayer

Page 28: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Single-StepContrastiveDivergence(CD-1)ProcedureForRBMAndDBN

•  Takeatrainingsetvandinitialw,computetheprobabilitiesofthehiddenunits•  Sampleahiddenactivationvectorhfromthisprobabilitydistribution.•  Computepositivegradient=outerproductofvandh•  Fromh,sampleareconstructionv'ofthevisibleunits,thenresamplethehidden

activationsh'fromthis(Gibbssamplingstep).•  Computenegativegradient=outerproductofv'andh’•  UpdatetoweightWwithincrement=Learning_rate*(positivegradient-

negativegradient):•  Updatethebiasesaandb:•  TheaboveprocedurestartsfromtheinputlayerorthefirstRBMlaterandmove

graduallytotheoutputlayer.•  WhentheunsupervisedtrainingisfinishedforallRBM,thesupervisedlearning

(classification)startstofinetuningtheweights.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 28

Page 29: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationOfDeepBeliefNetwork•  Hintonetal(2006)proposedafastlearningalgorithmusedfor

deepbeliefnetworks.•  Zhenetal(2016)proposedaconvolutionaldeepbeliefnetworkis

fordirectestimationofaventricularvolumefromimageswithoutperformingsegmentationatall

•  Jaekwon,etal(2017)comparedDBNswithothermethodsincardiovascularriskprediction.TheyproposedacardiovasculardiseasepredictionmodelusingthesixthKoreaNationalHealthandNutritionExaminationSurvey(KNHANES-VI)2013datasettoanalyzecardiovascular-relatedhealthdata.TheyshowthatstatisticalDBN-basedpredictionmodelhasan83.9%accuracy.

•  Ghasemietal(2018)usedadeepbeliefnetworktoevaluatetheDBN’sperformanceusingKaggledatasetswithfifteentargetscontainingmorethan70kmolecules.Theresultsrevealedthatanoptimizationinparameterinitializationwillimprovetheabilityofdeepneuralnetworkstoprovidehighqualitymodelpredictions.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 29

Page 30: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SoftwarePackagesForDeepLearning

(TensorFlow,Keras,kerasandkerasR)•  TensorFlow

–  MultiplelevelsofabstractionpoweredbyMicrosoft–  Anopensourceartificialintelligencelibrary,usingdataflowgraphstobuild

models.–  Forcreatinglarge-scaleneuralnetworkswithmanylayers–  Scale->Vector->Matrix->Tensor.–  TensorFlow=MultidimensionalDataFlowfromLayertoLayerinArtificial

NeuralNetworks•  Keras

–  High-levelneuralnetworksAPIwritteninPython–  CapableofrunningontopofTensorFlow,CNTK,orTheano–  Developedwithafocusonenablingfastexperimentation

•  TheRpackages,kerasandkerasR–  TwoRversionofKerasforthestatisticalcommunity–  keraspackageusesthepipeoperator(%>%)toconnectfunctionsor

operationstogether,butyouwon’tfindthisinkerasR–  kerasRusesthe$operatortomakeyourmodel–  kerasRcontainsfunctionsthatarenamedinasimilar,butnotidenticalwayas

theoriginalKeraspackage.Copyright®MarkChang,PhD,AGInception,

BostonUniversity,2019 30

Page 31: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

MachineLearningPackagesInR

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 31

Page 32: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

PublicDataSourcesForAI•  ThecommonlyusedpublicdatarepositoriesforAI

applicationsincancerpredictionandclusteringincludetheTCGA,UCI,NCBIGeneExpressionOmnibus(GEO)andKentridgebiomedicaldatabases.

•  Kaggle–  InsideKaggleyou’llfindallthecode&datayouneedtodoyourdatasciencework.Useover19,000publicdatasetsand200,000publicnotebooksarereadytoconqueranyanalysisinnotime.

–  www.kaggle.com•  UCICenterforMachineLearningandIntelligentSystems

–  http://archive.ics.uci.edu/ml/datasets.php

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 32

Page 33: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TwoSpecialArtificialNeuralNetworks:

– GenerativeAdversarialNetworks(GANs)– Autoencoder(AutoassociativeNetwork)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 33

Page 34: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

GenerativeAdversarialNetworks(GANs)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 34

1.  Generatortakesinrandomnumbersandreturnsanimage.2.  Thegeneratedimageisfedintothediscriminatoralongwithastreamofimagestakenfrom

theactualdataset.3.  Discriminatortakesinbothrealandfakeimagesandreturnsprobabilitiesofauthenticity.

GANcanbeviewedasthecombinationofacounterfeiterandacop,wherethecounterfeiterislearningtopassfalsenotes,andthecopislearningtodetectthem.Botharedynamicinthezero-sumgame,eachsidecomestolearntheother’smethodsinaconstantescalation.Asthediscriminatorchangesitsbehavior,sodoesthegenerator,andviceversa.

Page 35: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsOfGenerativeAdversarialNetworks(GAN)

•  Imagingmarkerscanbeusedformonitoringdiseaseprogressionwithorwithouttreatment.Modelsaretypicallybasedonlargeamountsofdatawithannotatedexamplesofknownmarkersaimingatautomatingdetection.ChristianDoppleretal(2017)developedaGANthatcanlearnamanifoldofnormalanatomicalvariability.Appliedtonewdatasuchasimagescontainingretinalfluidorhyperreflectivefoci,themodellabelsanomaliesandscoresimagepatchesindicatingtheirfitintothelearneddistribution.

•  Deepgenerativeadversarialnetworksaretheemergingtechnologyindrugdiscoveryandbiomarkerdevelopment.Kadurinetal.(2017)demonstratedaproof-of-conceptinimplementingadeepGANtoidentifynewmolecularfingerprintswithpredefinedanticancerproperties.TheyalsodevelopedanewGANmodelformolecularfeatureextractionproblems,andshowedthatthemodelsignificantlyenhancesthecapacityandefficiencyofdevelopmentofthenewmoleculeswithspecificanticancerpropertiesusingthedeepgenerativemodels.

•  Yahi,etal.(2017)proposeaframeworkforexploringthevalueofGANsinthecontextofcontinuouslaboratorytimeseriesdata.Theauthorsdeviseanunsupervisedevaluationmethodthatmeasuresthepredictivepowerofsyn-theticlaboratorytesttimeseriesandshowthatwhenitcomestopredictingtheimpactofdrugexposureonlaboratorytestdata,incorporatingrepresen-tationlearningofthetrainingcohortspriortotrainingtheGANmodelsisbeneficial.

•  Putin,etal.(2018)proposedaReinforcedAdversarialNeuralComputer(RANC)forthedenovodesignofnovelsmall-moleculeorganicstructuresbasedontheGANparadigmandreinforcementlearning.ThestudyshowsRANCscanbereasonablyregardedasapromisingstartingpointtodevelopnovelmoleculeswithactivityagainstdifferentbiologicaltargetsorpathways.Thisapproachallowsscientiststosavetimeandcoversabroadchemicalspacepopulatedwithnovelanddiversecompounds.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 35

Page 36: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 36

1.  Autoassociative:Output=input2.  DimensionReduction:Hiddenlayersmallerthaninputlayer3.  Mixtureofunsupervisedandsupervisedlearning

Autoencoder(AutoassociativeNetwork)

Page 37: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsofAutoencoder•  Kadurin,etal.(2017)presentedanapplicationofautoencodersfor

generatingnovelmolecularfingerprintswithadefinedsetofparameters.–  7-layerarchitecturewiththelatentmiddlelayerservingasadiscriminator–  Inputandoutputuseavectorofbinaryfingerprintsandconcentrationofthe

molecule.–  NCI-60celllineassaydatafor6252compounds–  Screened72millioncompoundsinPubChemandselectcandidatemolecules

withpotentialanti-cancerproperties.•  CancerpredictionusingAIincludespredictingtheexistenceofcancer,

cancertypeandsurvivabilityrisk.Differenttypesofautoencodershavebeenusedforfilteringmicroarraygeneexpressions:–  Stackeddenoisingautoencoders(Jie,etal.,2015),–  Contractiveautoencoders(Macıas-Garcıa,etal.,2017)–  Sparseautoencoders(Rasool,2013),–  Regularizedautoencoders(Kumardeep,etal.,2017)–  Variationalautoencoders(WayandGreene,2017)–  Deeparchitectureoffourlayerswith15,000,10,000,2000,and500neurons

(Danaeeetal,2016)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 37

Page 38: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

DecisionTree•  ATreeMethod(TM)

–  Acommonlyusedsupervisedlearningmethod–  Classification&RegressionTree–  Featuresaredichotomizedforabinarytree

•  ImpurityMeasuresForOptimization–  Giniindex(CART)–  Entropy(ID3,C4.5)–  Misclassificationerror

•  OverfittingandErrorPropagation–  Forlargertrees,anearlymisclassificationerrorcanpropagatedownstream,

eventuallyleadingtopoorpredictions.Solutions:–  Treepruning–  Bagging&Boosting–  RandomForest

•  ApplicationsinMedicine

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 38

Page 39: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ExampleOfBinaryDecisionTree

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 39

Page 40: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TheDecisionTreeMethodForCategoricalOutcome

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 40

Page 41: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

CommonImpurityMeasures

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 41

MisclassificationError

GiniIndex

Gross-EntropyorDeviance

ForNodemofAClassificationTree(predictedprobabilitypmk):

Page 42: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ThresholdDeterminationAndTreePruning

•  DeterminingAnOptimalTreeModel–  Twocompetingfactors:theaccuracyofthetreemethodand

computationalefficacy.–  Foragiventreedepth,optimalthresholdforeachparameteris

obtainedusingthegreedymethod:tryarangeofdifferentthresholds–  Anyofthethreeimpuritiescanbeused,butoftenmisclassification

rate.–  Treesizeisatuningparametergoverningthemodel’scomplexity

•  Cost-ComplexityPruning–  Splittreenodesonlywhenthedecreaseinerrorduetothesplit

exceedssomethreshold.Thisstrategyistooshort-sightedsinceaseeminglyworthlesssplitmightleadtoaverygoodsplitbelowit.

–  Thepreferredstrategyistogrowalargetree,stoppingthesplittingprocessonlywhensomeminimumnodesizeortreedepthisreached.Thenthislargetreeisprunedusingcost-complexitypruning

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 42

Page 43: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Bagging,Boosting,AndRandomForest•  Asinglebigtreecanpropagateclassificationerrorstotheleaves,

makingthepredictionveryunstable.–  Solutions:Bagging,Boosting,andRandomForest

•  Bagging(BootstrapAggregation)–  Forminganaverageofmanydifferenttrees.

•  Boosting–  Outputtheclass:

–  WeakclassifiersGi(x)areonlyslightlybetterthanrandomguessing.–  αicomputedbytheboostingalgorithm(AdaBoostAlgorithm)to

weightmoreonbetterclassifiesGi(x).•  RandomForests

–  anensembleclassifierthatconsistsofmanydecisiontreeswithrandomlyselectedfeatures

–  thefinalclassisthemodeoftheclass’soutputbyindividualtrees.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 43

Page 44: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsOfDecisionTreeMethods

•  Singh,etal(2015)haveusedRFsforbioactivityclassification•  Wangetal.(2015)haveusedtheRFmethodtomodeltheprotein-

ligandbindingaffinitybetween170complexesofHIV-1proteases,110complexesoftrypsin,and126complexesofcarbonicanhydrase.

•  Kumarietal(2015)haveconstructedanimprovedRFbyintegratingbootstrapandrotationfeaturematrixcomponentsfordrugtargetsidentification.

•  Mistry,etal.(2016)haveusedRFandDTstomodelthedrug-vehicletoxicityrelationship.Theirdatasetincluded227,093potentialdrugcandidatesand39potentialvehicles.Theresultingmodelpredictedthetoxicityreliefofdrugsbyspecificvehicles.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 44

Page 45: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Similarity-BasedandSimilarity-Principle-BasedMachineLearning

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 45

Page 46: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

NewScientificParadigm

•  ControversiesinStatisticalEvidenceandScientificDiscovery

•  CallforNewParadigms–AI/MachineLearning•  TheSimilarityPrinciple(SP)

– Humanintelligenceindailylife– RoleofSPinscientificdiscovery– Similarity-BasedMachineLearning(SBML)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 46

Page 47: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Simpson’sParadox

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 47

Allpatients:BbetterthanAMales:AbetterthanBFemales:AbetterthanBQuestion:ShouldapatienttakeAorB?AllFemales:AbetterthanBYoungFemales:BisbetterthanAOldFemales:BisbetterthanAQuestion:ShouldapatienttakeAorB?

•  Howtointerprettheresultsifthesepatients+you=theentirepatientpopulation(Oneresponsefromyouwouldnotchangethedirectionofthedrugeffect)?

•  ControversialQuestion:HowSpecificIsTooSpecific?•  Itisphilosophicalquestionorstatisticalquestion:Don’tputcartbeforethehorse•  FundamentalSolution:TheSimilarityPrinciple

Page 48: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ControversiesInPredictionOfDrugEffect

•  LongevityPrediction–ParadoxofTraveling–  Doesthepredictionofperson’slongevitychangesassoonashearrivedatthenewcountrybecauseofitsdifferentlife-expectance?

•  BiasInPredictiveEffect–  Stratifiedrandomizedandun-randomnessinclinicsiteselectionleadtoadifferentpatientdistribution(e.g.racedistribution)intheclinicaltrialfromthetargetpopulation

–  Ifracehaseffect,themeandrugeffectwillbeusuallybiased.

•  DrugEffectEstimationforMultiregionalTrial–  Howsmallshouldregionstobe,country,state,city?

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 48

Page 49: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SimilarityPrinciple•  Allscience,andlearningitself,isbasedonafundamental

principle–thesimilarityprinciple(Chang,2012,2014).•  TheSimilarityPrinciple:

–  Similarthingsorindividualswilllikelybehavesimilarly,andthemoresimilartheyarethemoresimilarlytheybehave.

–  Ex.,peoplewiththesame(orasimilar)disease,gender,andagewilllikelyhavesimilarresponsestoamedicalintervention.Iftheyaresimilarinmoreaspectstheywillhavemoresimilarresponses.

–  Predictedoutcome=similarity-weightedtheobservedoutcomes:

–  NowdoyouknowhowtoresolvetheSimpson’sParadox?

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 49

Page 50: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SimilarityPrincipleIllustrated

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 50

Similarthingsbehavesimilarly.

IllustrationsSimilarityPrinciple

Thingsmoresimilarbehavemoresimilarly.

Featuresselectionandgrouping.

Rolesofoutcomeandfeaturesinsimilarity

1.  QSAR:Compoundswithsimilarstructureswillhavesimilaractivities.

2.  Terroristattach:Sept11Everyyearisahighrisky.

1.  Useresultsofdrugtestinanimalstoinformasmallfirst-in-manclinicaltrial,and

2.  Useresultsfromclinicaltrialstodecidewhetherthedrugcanbeusedforlargepatientpopulation

1.  Definethetargetpopulationandusedformodelevaluation

2.  Appliedtoscientificdiscoveryandcausalrelationships

1.  Similarityreliesonbyboththefeaturesandoutcomevariable.

2.  Therelativeimportanceofdifferentfeaturescanbeobjectivelydeterminedusingfeature-scalingfactors

Page 51: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TwoApproachesToLearningFromData

•  CausalityApproach–  Seekingtherelationshipbetweentheoutcomeandtheindependent

variables(attributes)–  Y(x)=f(x;a(x’)),parameters=functionofobserveddata

•  SimilarityApproach–  Seekingtherelationshipbetweentheoutcomesbetweendifferent

subjectswithdifferentattributes–  Y(x)~S(x,x’)Y(x’),similarity=functionofobserveddata

•  SimilarityPrinciple(SP)versusCausality–  SPisthefoundationofanycausalrelationships–  Unavoidablesimilaritygroupingforrepetitionsofeventsandcausality

•  AILearningandScientificDiscovery–  Showhumanbrainsincredibleability–  Implyhumanincredibleinabilitytohandleeverythingindividually

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 51

Page 52: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SubjectivityOfCausality&ControversiesInInterpretationsOfCauseEffects

•  SubjectivityinPotentialCauseSelection•  CausesAreGenerallyNotIndependent•  ModelsAreNotUnique

–  BodyWeight=f(waist,height)–  BodyWeight=f(waist,height,shoulderheight)–  BodyWeight=f(waist,height,shoulderheight,footlength)–  Infact,allfactorsarecorrelatedandstatisticallysignificant

•  DifferentModelsProvideDifferentInterpretationsofEffect(associationorcausality)ofWaistandHeight

•  SimilarControversiesInDrugEffect=f(drug,race,genex,geney)–  Simplystatingdrugeffectwithoutspecifyingotherfactorsismisleading.–  Interpretationofafactoreffectmustinthecontextofallotherfactorsinthe

model–  Givenalargerbutfinitepopulation,manyfactors(andinfinitepossible

combinationsofthesefactors)canbesignificant.Thefactorsincludedinthemodelissubjectiveandconsequentlytheinterpretationdrugeffectisarbitrary

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 52

Page 53: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

DoseAnalysisofCovarianceMakeSense?•  ModelOptions:

–  Model1:Weight=a1*height–  Model2:Weight=a2*height+b2*(shoulderHeight)–  Model3:Weight=a3*height+b3*(shoulderHeight)+c3*(leglength)–  Model4:Weight=d4*(headheight)+b4*(shoulderheight)

•  Givenbigdata–  Allmodelsaresignificant

•  Howtointerprettheparameters?–  Whatistheeffectofheight,a1,a2,ora3?–  Whatistheeffectofshoulderheight,b1,b2orb4?–  a1,a2,anda3canbeverydifferentsinceheightmainlyconsistsofshoulder-

height,andlegisapartofshoulder-heightandheight–  c3canbenegativesincea3andb3havealreadyincludethelegeffect,would

yousay:“longerlegwillreducetheweight?”•  Conclusion:

–  Manyattributesorcovariates(includinggenes)inmodelingexhibitstrongcorrelations,thecommoninterpretationofparametersoftendonotmakemuchsensetome.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 53

Page 54: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SimilarityPrincipleInAction-Similarix

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 54

BoundedSimilarity:0≤Sij≤1Completelydifferent=0Identical=1

Page 55: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Feature-ScalingFactors(FSF)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 55

•  RkdependonSelectedFeaturesandOutcome•  IncludingMoreCommonFeatureswill

•  reducethedifferenceandincreasethesimilarityamongsubjects•  leadtolargerRkdifferencestobalanceouttheeffectofcommonfeatures

•  WhyExponentialSimilarityFunction•  Humanonlycanpayattentiontolimited“localevents”anddistantevents

quicklydisappearatthehorizon•  Humanorgansensibilityisonlog-scale(e.g.,loudness,brightness)

ExponentialSimilarityFunction:

DistanceFunction:

Page 56: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SBML-Algorithm

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 56

SolveinitialscalingfactorsRkbasedonSijandsimilarityfunction

CalculateweightWijusingSijandRk

Normalizetrainingdataset

Assigninitialsimilarityscores,Sij

Normalizednewdata(withouttheoutcomevariable)usingmeanand

standarddeviationfromthetrainingset

Modeloutcome,Yi=sumwijOj

CalculateerrorE=sum(Yi-Oi)2/N

UpdatescalingFactorsRkusinggradientmethodwithlearningrate,alpha

CalculatesimilarityscoreSijbetweennewsubjectsandsubjectsinthetrainingsetusingthepreviouslydeterminedRk

CalculateweightWijusingRk

Predictfutureoutcome,Yi=sumwijOj

Page 57: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

•  LossFunctions–  Lasso:

–  Ridge:

–  ElasticNet:•  MinimizationwithGradientMethod

•  Learningwithrateα

LearningWithRegularization

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 57

Page 58: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ClinicalTrialExample:RareDisease

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 58

TrainingSetSize

FullLinearModel

OptimalLinearModel

RidgeRegression

SBML

25% 0.774 0.788 0.796 0.616

50% 0.803 0.802 0.804 0.655

75% 0.806 0.809 0.810 0.668

Note:MSEnormalizedbyvarianceoftheoutcomes

MeanSquaredError(MSE)ofDifferenceMachineLearningMethodsBasedonBootstraps

Page 59: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Training,Validation&EvaluationOfAIMethods

•  Cross-ValidationforTuningParameters–  Exhaustivecross-validationmethods:allpossiblewaysoftraining-validationsplits.

–  Leave-p-outcross-validation(LpOCV):exhaustivelysplitswithpobservationsasthevalidationset,therestfortraining.

–  Bootstrapping:therandomselectionofsizepastrainingsetandsizeqastestset,withreplacement

•  TargetPopulation(TP)–  Well-definedTP(WDTP)isneededformodelevaluation,butinAIworld,WDTPisoftenmissing.

–  Bigdata:simplysplitthesampleintotrainsetandtestset–  Largerdata:trainsetsandtestsetsobtainedfromresamplingwithoutreplacement

–  Smalldata:useempiricaldistribution–trainsetsandtestsetsobtainedfrombootstrapping

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 59

Page 60: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SBMLPerformanceOfRegressionProblem

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 60

MSEofP

redicted

Y/NaïveM

SE

TrainingSampleSize

OutcomeY=X1+X2X3+X4X4+X5.However,X1,…,X7fromstandardmultivariatenormaldistributionareincludedinallmodels(X6,X7arenoise).

Page 61: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

PrecisionVersusComplexityOfDataRelationship,SampleSize,AndComputationalEfficiency

-VariablesX6andX7arenoise-  TrainingTime=O(N2)=0.09N2

(second).TrainingTime=2.4minutesfortrainsetsizeN=400.

-  MSEnormalizedbythenaïveMSEfromthepredictionusingthetrainingsamplemean

-  Learningrateα=0.125,λ =-0.5Epoch=5

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 61

=X1+X2X3+X4X4+X5=X1+5X2X3+X4X4+X5

Normalize

dMSE

LessNonlinear

MoreNonlinear

TrainingSampleSize

Page 62: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SBMLPerformanceForClassification

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 62

•  Y=sign(X1+X2X3+X4X4+X5)+1)/2•  Xi=multivariatenormal•  Lossfunctionλ=-0.5•  LearningRateα=0.125•  MSEisbasedonprobabilityofY=1.

LogisticModel

SBML

MSE

MSE

MSE

Page 63: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ParadigmsOfDrugDevelopment

•  Challenges– Multi-RegionalTrialandSubgroupAnalysis– RareDiseaseandPrecisionMedicine

•  Solutions– SmallerClinicalTrials– StagewiseMarketing-Authorization– EnhancedPharmacovigilance&PostmarketingMonitoring

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 63

Page 64: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

KernelMethod•  KernelMethod:

–  KernelAssmoothfunctionwithnormalizationfactor(Hastieetal,2001)–  Non-linearkernelregressionwithoutnormalizationfactor(Hastieetal,2001)–  Kernelasattributesinlearning(trainingweights)(Scholkopf,etc.,2005)

•  Kernel

–  Kernel,k(�,�),isthedotproductoffeaturevectors–  Kernelisasimilarityscorefunction

•  FeatureSelectionforDifferentObjects–  Sequencing(text,sound,music)–  Images(black-white,color)–  Trees(chemicalcompounds)–  Networks(metabolicnet,disease-symptomnet,drug-AEnet)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 64

Page 65: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

TheKernelTrick•  Anyalgorithmforvectorialdatathatcanbeexpressedonlyintermsofdotproductsbetweenvectorscanbeperformedimplicitlyinthefeaturespaceassociatedwithanykernel,byreplacingeachdotproductbyakernelevaluation(Scholkopf,etc.,2005).

•  Transferlinearmethods(e.g.,PCA)intononlinearmethodsbysimplyreplacingtheclassicdotproductwithamoregeneralkernel,suchastheGaussianRBFkernel.

•  Nonlinearityviathenewkernelisthenobtainedatminimalextracomputationalcost,asthealgorithmremainsexactlythesame.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 65

Page 66: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SupportVectorMachineAsAKernelMethod

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 66

ClassifierwithHardMarginModel:

ClassifierwithSoftMarginModel:

Page 67: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SupportVectorMachineInR

ForPredictingCognitiveImpairment

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 67

•  Alzheimer’sdiseaseisthemostcommoncauseofdementiaintheelderly.

•  Predictthecognitiveoutcomeusingthedemographicsand130biomarkers(predictors).

Page 68: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationOfKernelMethods<<KernelMethodsinComputationalBiology>>,Scholkopfetal(2004)collectdifferentapplicationsofkernelmethodsandsupportvectormachines:•  InexactMatchingStringKernelsforProteinClassification•  FastKernelsforStringandTreeMatching•  LocalAlignmentKernelsforBiologicalSequences•  KernelsforGraphs•  DiffusionKernels•  AKernelforProteinSecondaryStructurePrediction•  HeterogeneousDataComparisonandGeneSelectionwithKernelCanonical

CorrelationAnalysis•  Kernel-BasedIntegrationofGenomicDataUsingSemi-definiteProgramming•  ProteinClassificationviaKernelMatrixCompletion•  AccurateSpliceSiteDetectionforCaenorhabditiselegans•  GeneExpressionAnalysis:JointFeatureSelectionandClassifierDesign•  GeneSelectionforMicroarrayData

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 68

Page 69: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationOfSVM•  SVM-basedproteinfoldrecognitionmethods(Shamimetal.,2007,2013;DamoulasandGirolami,2008;Dongetal.,2009;YangandChen,2011).ThemaindifferenceamongtheseSVM-basedmethodsistheirfeaturerepresentationalgorithms.

•  Poorinmohammadetal.(2015)combinedtheSVMapproachwithpseudoaminoacidcompositiondescriptorstoclassifyanti-HIVpeptides,withapredictionaccuracyof96.76%.

•  WeiandZou(2016)providedareviewoftherecentprogressinmachinelearning-basedmethodsforproteinfoldrecognitionpriorto2011.

Copyright®MarkChang,PhD,AGInception,

BostonUniversity,2019 69

Page 70: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

UnsupervisedLearning

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 70

Page 71: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

UnsupervisedLearning(USL)•  UnsupervisedLearningHasNoClear,CorrectAnswers•  USL=Clustering,Association,andAnomalyDetection

–  Clustering=discoveringtheinherentgroupingsinthedata.Ex:groupingpatientsbytheirbaselinediseaseseverityanddemographiccharacteristics

–  Anassociationrulelearning=discoveringrulesthatdescribelargeportionsofthedata.Ex:peoplebuyingproductAalsotendingtobuyproductB.

–  Anomaly(outlier)andnoveltydetections=identifyingitems,eventsorobservationsthatdonotconformtoanexpectedpattern.Anomalydetectioncanalsobeasupervisedlearning.Ex:structuraldefects,medicalproblems,texterrors,orbankfraud

•  Applications–  Groupingbreastcancerpatientsbytheirgeneticmarkers–  GroupingMovieviewersbytheratingsassignedbymovieviewers–  Findingsaleitemsthatarehighlyrelatedcanbeveryhelpfulwhen

stockingshelvesordoingcross-marketinginsalespromotions,catalogdesign,andconsumersegmentationbasedonbuyingpatterns.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 71

Page 72: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

•  AprioriAlgorithmforAssociationRule–  Identifythefrequentindividualitemsintherelationaldatabaseand

extendsthemtolargerandlargeritemsetsaslongasthoseitemsetsappearsufficientlyofteninthedatabase

•  PrincipalComponentAnalysis(PCA)–  Findalow-dimensionalrepresentationoftheobservations–  Sequentiallyfindasetoflinearcombinationsofthepredictorsthat

havemaximalvarianceandareorthogonalwitheachother.•  K-MeansClusteringforDimensionReduction

–  PartitiontheNobservationsintoapre-specifiedKclusterssoastominimizethewithin-clustersumofsquares:

•  HierarchicalClusteringforDimensionReduction–  Yieldsatree-likevisualrepresentation(Dendrogram)ofthe

observationswithoutpre-specifiednumberofclusters

CommonUnsupervisedLearningAlgorithms

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 72

Page 73: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

HierarchicalClusteringAlgorithm

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 73

Page 74: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

HierarchicalClusteringForBreastCancerPatients

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 74

Page 75: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

HeatmapofGeneSamples

75

Page 76: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationsOfClusteringMethods•  Böckeetal(2005),ahierarchicalclusteringapproachforlargecompoundlibraries•  Wallneretal(2010)studiedcorrelationandclusteranalysisofimmunomodulatory

drugsbasedoncytokineprofiles.•  Tanetal(2015)andRasooletal(2013)useneuralnetworkfilteringmethodsare

usedforextractingrepresentationsthatbestdescribethegeneexpressions.•  Haratyetal(2015)studiedanenhancedk-meanclusteringalgorithmforpattern

discoveryinhealthdata.•  Yildirimetal(2014)usek-meansalgorithmtodiscoverhiddenknowledgeinhealth

recordsofchildrenanddatamininginbioinformatics.Theyconcludethatmedicalprofessionalscaninvestigatetheclusterswhichourstudyrevealed,thusgainingusefulknowledgeandinsightintothisdatafortheirclinicalstudies.

•  Hameedertal(2018)proposedatwo-tieredunsupervisedclusteringapproachfordrugrepositioningthroughheterogeneousdataintegration.

•  MacCuishetal(book,2019),clusteringinbioinformaticsanddrugdiscovery•  Maetal(2019),acomparativestudyofclusterdetectionalgorithmsinprotein–

proteininteractionfordrugtargetdiscoveryanddrugrepurposing.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 76

Page 77: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ReinforcementLearning(RL)•  Learning

–  OptimizeautilityFunction–  Feedbackfromone’senvironmentisessentialforRL.–  Forthesituationwhenthecorrectanswerisdifficulttodefineorthere

aretoomanypossiblepathstocompletethetask.•  Example-AlphaGoZero

–  AcomputerprogramthatplaystheboardgameGo–  Atthe2017FutureofGoSummit,beattheworldNo.1rankedplayer–  AlphaGoanditssuccessorsuseaMonteCarlotreesearchalgorithmto

finditsmovesbasedonknowledgelearnedbyanANNfrombothhumanandcomputerplay

•  Methods–  StochasticDecisionProcess–  Q-Learning

•  Applications–  MarkovDecisionProcessforclinicaldevelopmentprograms(Chang,

2010)–  Valueiterationandpolicyiterationalgorithms

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 77

Page 78: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

PhaseTransitionProbabilitiesOfClinicalTrials

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 78

Source:Changetal.(2019)

Page 79: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

MarkovChainForClinicalDevelopmentProgram

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 79

Source:Chang,2010;Changetal.2019

Pij=ActualPhaseTransitionProbabilityfromPhaseitoPhasejforalldiseaseindications

Page 80: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SDP=Markovchain&actionrules(calledpolicy)tooptimizetheNPVateachstage.Backwardinduction&dynamicprogrammingusedtosolvetheproblem.Action=trialdesignandconduct,……

SequentialDecisionProcess(SDP)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 80

Source:Chang,2010;Changetal.2019

NME

Phase1&

Action

Phase2&

Action

Phase3&

Action

MarketingAuthorization

NetPresentValue

(NPV

)

DevelopmentTimeline

P01

P12

P23

P34

Page 81: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

StochasticDecisionProcessForACancerClinicalDevelopmentProgram(CDP)

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 81

Source:Changetal.2019

Page 82: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SwarmIntelligence(SI)•  Self-OrganizedSystem(SOS)

–  SOS=Systemsinwhichorganizedbehaviorariseswithoutacentralizedcontrollerorleader

–  Stigmergy=amechanismofindirectcoordinationbetweenagents:thetraceleftintheenvironmentbyanactionstimulatestheperformanceofanextaction,byadifferentagent.

•  SwarmIntelligence(SI)–  SI=intelligencepossessedbySOSwithoutcommongoalandleader

•  CharacteristicsofSI–  Micromotivedandmacroconsequence(genotypeversusphenotype)–  EachindividualinSOShasnointelligenceandfollowasimplerule.–  Ex:eachantsimplyfollowsthehormone,butthecolonybehavesintelligently

infindingtheshortestpathtothefoodsource–  SIisnotownbyanyindividual

•  SIExamplesinNature–  Antcolonies,birdflocking,animalherding,bacterialgrowth,fishschooling,

andmicrobialintelligence•  ArtificialSwarmIntelligenceAlgorithms

–  Particleswarm,Antcolony,Artificialbeecolony,Artificialimmunesystems,,Gravitationalsearch,Glowwormswarm,andBatalgorithms

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 82

Page 83: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

HowSwarmIntelligenceWorks

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 83

SwarmIntelligence:Everyantfollowthesmell,collectivelytheantcolonyquicklyfindtheshortestpathwithouttheguidanceofanycentralintelligence

Page 84: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

SwarmIntelligenceApplications•  Acentralpartoftherationaldrugdevelopmentprocessisthepredictionofthe

complexstructureofasmallligandwithaprotein,theso-calledprotein-liganddockingproblem,usedinvirtualscreeningoflargedatabasesandleadoptimization.

•  Korbetal(2006)developedadockingalgorithmbasedonantcolonyoptimization,tostructure-baseddrugdesign.

•  Fu,etal.(2015)studiedanewapproachforflexiblemoleculardockingbasedonswarmintelligence.Theycomputetheinteractionsof23protein-ligandcomplexes.

•  RajeshkumarandKousalya(2017)presentaviewonapplicationsofswarm-basedintelligencealgorithmsinpharmaceuticalin-dustry,includingdrugdesign,pharmacovigilance,andalignmentofsequence.

•  Soulami,et.Al.(2017)usedaparticleswarmoptimization(PSO)basedalgorithmfordetectionandclassificationofabnormalitiesinmammographicimagesusingtexturefeaturesandsupportvectormachine(SVM)classifiers.

•  Fanget.al.(2018)presentedanovelfeatureselectioncalledtheelitesearchmechanism-basedflowerpollinationalgorithm(ESFPA)usedtodetermineproteinessentiality.ESFPAusesanimprovedSIalgorithmforfeatureselectionandselectsoptimalfeaturesforproteinessentialityprediction.

•  ThemetaheuristicOptpackageinRincludesmanyoptimizationalgorithms

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 84

Page 85: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

EvolutionaryIntelligence(EI)•  InspirationbyNaturalEvolutionProcess

–  Inspiredbyourunderstandingofbiologicalevolution•  MinimalorNoSpecificationofSolutionStructure

–  Automaticallysolvesproblemswithoutrequiringtheusertoknoworspecifytheformorstructureofthesolutioninadvance.

•  SolutionImprovementViaEvolution–  Generationbygeneration,stochasticallytransformpopulationsofcomputerprogramsintonewpopulationsofprogramsthatwilleffectivelysolveproblems.

•  GeneticAlgorithm(GA)versusGeneticProgramming(GP)–  Bothuseevolutionalgorithms–  GAisrepresentedasalistofactionsandvalues,oftenastring–  GPisrepresentedasatreestructureofactionsandvalues,usuallyanesteddatastructure.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 85

Page 86: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

GeneticProgramming

ThreeKeyElementsinGP:(1)Representation:SyntaxTree

Mathematicalorcompoundstructure(2)ReproductiveMechanism:CrossoversandMutations

Crossover=combinationoftworandomlyselecteddatastructurestoproduceonenewstructure.Amutation=randomlygeneratesasubtreetoreplacearandomlyselectedsubtreefromarandomlyselectedindividual.

(3)SurvivalFitnessForfindingafunctiong(x)toapproximatethetargetfunctionf(x),thefitness=themeansquareerrorbetweenthetwofunctions.Forfindingaoptimaltreatmentsequenceforinfertility,thefitnessfunction=treatmentsuccess(1forsuccessand0forfailure).

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 86

Page 87: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

GeneticAlgorithmForTreatmentStrategyOptimization

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 87

Thepopulation(treatmentsequences)changesconstantlysincesurvivabilityisafunctionoffitnessorobservedeffectoftreatmentsequence

Treatmentsequencesfortreatingwomeninfertility

Page 88: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ApplicationOfEvolutionaryIntelligence•  RNAstructureprediction(Batenburgetal,1995)•  Molecularstructureoptimization(Wong,etal.2011)•  GhoshandJain(2005)collectedarticlesacrossabroadrangeoftopicson

theapplicationsofevolutionaryAIindrugdiscovery,GPindataminingfordrugdiscovery

•  Barmpalex,et.al.(2011)hasusedsymbolicregressionviageneticprogrammingintheoptimizationofacontrolledreleasepharmaceuticalformulationandcompareditspredictiveperformancetoANNs.

•  Ghaheri,et.al.(2015)reviewedtheapplicationsofthegeneticalgorithmindiseasescreening,diagnosis,treatmentplanning,pharmacovigilance,prognosis,andhealthcaremanagement.

•  Ghaherietal(2019)providedancomprehensivereviewoftheapplicationsofgeneticalgorithmsinmedicine,in14differentareas:–  radiology,oncology,cardiology,endocrinology,obstetricsandgynecology,

pediatrics,surgery,infectiousdiseases,pulmonology,radiotherapy,rehabilitationmedicine,orthopedics,neurology,pharmacotherapy,andhealthcaremanagement.

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 88

Page 89: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Review&Summary

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 89

Page 90: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 90

UnsupervisedLearning

MachineLearning

ReinforcementLearning

EvolutionalLearning

SwarmIntelligenceLearning

SupervisedLearning

ListOfMachineLearningMethodsDetailsandReferencesAvailablefromtheUpcomingBook(Chang,2020)

-Linkanalysis-PCA-K-Means-HierarchicalClustering-SOM

-SBML-KNN-Deeplearning(FNN,CNN,RNN,DBN)-Kernelmethod-SVM-GANs-Autoencoders-Decisiontree(bagging,Boosting,Randomforest)-Logisticregression-Ridge,lasso,elasticnetregressions-Bayesiannetwork

-Sequentialdecisionprocess-BayesianQ-learning

-Geneticalgorithm-Geneticprogramming-Cellularautomata

-Antclusteringmodel-Particleswarmoptimization(PSO)

Page 91: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 91

AIApplications

KernelMethodforProteinClassification&BiologicalSequences

DrugDiscovery

Healthcare

MicroarrayDataAnalysis

DrugDevelopment

SBMLforClinicalTrialsofRareDisease&PrecisionMedicine

DeepLearning(FNN,CNN,RNN,LSTM,DBN)fordrugdesign&moleculeclassification

DecisionTree,RandomForestforbioactivityclassification,protein-ligandbinding,toxicitymodeling

LDAfordrug-druginteraction,kNNfordrugablemoleculesidentification,SVMforDNA-bindingproteinprediction,Networksimilarityfordiseasemodeling

NLPforPharmacovigilance:Auto-narrativegeneration,narrativeanalysis,QCassessment,causalityassessment.developdiseasemodels,probabilisticclinicalriskstratificationmodels,andpractice-basedclinicalpathways

DiseaseDiagnosis&PrognosiswithMedicalImage

Data

NLPforMedicalRecords:textprocessing,classification

KNN,SVM,CNN,AdaBoost,AutoencoderforCancerDetectionfromGeneExpressionData

ANN&SOMensemblesforgene-relatedclustering

CNNfortumordetection,segmentation,classification,andcomputer-aideddiagnosis,breastcancer(mammography),lungdiseases,lesionsegmentationfortraumaticbraininjuries,braintumor,CVdiseasewithCardiologicalimage

ListOfAIApplicationsDetailsandReferencesAvailablefromUpcomingBook(Chang,Feb2020)

Structure-Activity

Relationship(QSAR)

SVMfordisabilityduetostroke,

Page 92: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

ReferencesBooks:•  Chang,M.(2019).ArtificialIntelligenceinDrugDevelopment,PrecisionMedicine,andHealthcare.CRC•  Chang,M.(2010).MonteCarloSimulationforthepharmaceuticalindustry.CRC•  Chang,M.(2013).PrinciplesofScientificMethods.CRC•  Chang,M.(2007,2014).AdaptiveDesignTheoryandImplementationUsingSASandR.CRC•  Chang,M.(2011).ModernIssuesandMethodsinBiostatistics,Springer.•  Chang,M.(2012).ParadoxesinScientificInference.Chang,M.(2014).PrinciplesofScientificMethods.CRC•  ChangM.etal.(2019).InnovativeStrategies,StatisticalSolutionsandSimulationsforModernClinicalTrials.CRC•  ChowandChang(2011).AdaptiveDesignMethodsforClinicalTrials.CRC•  Hastie,T.,Tibshirani,R.,Friedman,J.(2001,2ndEdition,2009).TheElementsofStatisticalLearning.Springer•  Koza,J.etal(2005).GeneticProgrammingVI.Routinehuman-competitivemachineintelligence.Springer•  Ben,G&Pennachin,C.(1998).ArtificialGeneralIntelligence.Springer•  RussellS&Norving(2003).Artificialintelligence–amodernapproach.PrenticeHall.•  Bishop.C.(2006).PatternrecognitionandMachinelearning.SpringerReviewPapers:•  Daouda,M.andMayo,M.(2019).ASurveyOfNeuralNetwork-basedCancerPredictionModelsFromMicroarray

Data.ArtificialIntelligenceInMedicine97,204—214•  Jiang,F.etal(2017).Artificialintelligenceinhealthcare:past,presentandfuture.StrokeandVascularNeurology•  Long,E.,Lin,H.,etal(2017).Anartificialintelligenceplatformforthemultihospitalcollaborativemanagementof

congenitalcataracts.•  Qayyuma,A.etal.(2018.)MedicalImageAnalysisusingConvolutionalNeuralNetworks:AReview,Journalof

MedicalSystemsNovember2018,42:226•  Vanhaelen,Q.etal.(2017).Designofefficientcomputationalworkflowsforinsilicodrugrepurposing.Drug

DiscoveryToday Volume22,Number2 February2017•  Schmider,J.ArtificialIntelligenceinAdverseEventCaseProcessing.Clin.Pharmco&Therapeutics.Aprial2019•  Ghasemi,F.(2018).Neuralnetworkanddeep-learningalgorithmsusedinQSARstudies:meritsanddrawbacks.

DrugDiscoveryTodayVolume23,Number10October2018

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 92

Page 93: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Similarity-BasedMachineLearninginR(1)library(class)#Normalizedata(bysubtractingMeanandDividingbySDoftheRefDatanormalizeData=function(DataToNormalize,RefData){normalizedData=DataToNormalizefor(nin1:ncol(DataToNormalize)){normalizedData[,n]=(DataToNormalize[,n]-mean(RefData[,n]))/sd(RefData[,n])}return(normalizedData)}#CalculateinitialscalingfactorsR0s#S0s=p-valuesInitialRs=function(X,eta,S0s){K=length(S0s);R0s=rep(0,K)for(kin1:K){#force<12toavoids=exp(d)overflow R0s[k] = min((-log(S0s[k])) ^(1/eta) / max(IQR(X[ ,k]), 0.0000001), 12) }return(R0s);}Distance=function(Rs,x1,x2){#3vectors:Rs=scalingfactors,x1=point1,x2=point2K=length(Rs) d = 0; for (k in 1:K) { d = d + (Rs[k]* (x2[k]-x1[k]))^2 } return(d^0.5)}#CalculateSimilarityScore.SimilarityTrain=function(eta,Rs,Xtrain){N1=nrow(Xtrain);K=length(Rs); S = matrix(0, nrow=N1, ncol=N1) for(iin1:N1){for(jini:N1){ d2 = 0; for (k in 1:K) { d2 = d2 + (Rs[k]* (Xtrain[i,k]-Xtrain[j,k]))^2 } S[i,j]=exp(-d2^0.5)#avoid0fori=jS[j,i]=S[i,j]#Usesymmetrytoreduce50%CPUtime.}}#Endjandiloopsreturn(S)}#CalculateSimilarityScoresbetweentrainingandtestsubjects.Similarity=function(eta,Rs,Xtrain,Xtest){N1=nrow(Xtrain);N2=nrow(Xtest);K=length(Rs) S = matrix(0, nrow=N2, ncol=N1) for(iin1:N2){for(jin1:N1){ d2 = 0; for (k in 1:K) { d2 = d2 + (Rs[k]* (Xtest[i,k]-Xtrain[j,k]))^2 } S[i,j] = exp(-d2^0.5) }}#Endjandiloopsreturn(S)}

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 93

Weight=function(S){ N1= nrow(S); N2=ncol(S); W=matrix(0,nrow=N1,ncol=N2)for(iin1:N1){sum_S_row=sum(S[i,])for(jin1:N2){W[i,j]=S[i,j]/max(sum_S_row,0.00000000001)}}return(W)}#EndofWeightPredictedY=function(W,Ytrain,Ytest){#CalculatepredictedoutcomeandErrorOutObj=list()OutObj$pred_Y=W%*%Ytrain#Forbinaryoutcome,pred_y=probof1.OutObj$MSE=mean((OutObj$pred_Y-Ytest)^2)return(OutObj)}#EndofPredictionYDerivativeE=function(eta,pred_Y,Rs,X,S,O){#DerivativesifthelossfunctionN=nrow(X);K=length(Rs)der_S=matrix(0,nrow=K*N,ncol=N);der_W=matrix(0,nrow=K*N,ncol=N)dist=matrix(0,nrow=N,ncol=N);der_E=rep(0,K)for(iin1:N){for(jini:N){    d2 = 0; for (k in 1:K) { d2 = d2 + (Rs[k]* (X[i,k]-X[j,k]))^2 }dist[i,j]=max(d2^0.5,0.0000001)#avoidoverflowinder_dbelowdist[j,i]=dist[i,j]}}#Endofiandjloopsfor(min1:K){for(iin1:N){for(jini:N){   der_d = (Rs[m]/dist[i,j]) * (X[i,m]-X[j,m]) ^2    der_S[(m-1)*N+i, j] = -1 * S[i,j] * eta * (dist[i,j])^(eta-1) * der_d    der_S[(m-1)*N+j, i] =der_S[(m-1)*N+i, j]}}}#Endofj,i,andmloops#WeightDerivativefor(min1:K){for(iin1:N){   sum_der_S = sum(der_S[(m-1)*N+i, ]); sumSi = sum(S[i, ])for(jini:N){     der_W[(m-1)*N+i, j] = der_S[(m-1)*N+i, j] / sumSi - S[i,j]* sum_der_S /sumSi^2      der_W[(m-1)*N+j, i] =der_W[(m-1)*N+i, j]}}}#Endofj,i,andmloops#DerivativesofEfor(min1:K){for(iin1:N){err=2/N*(pred_Y[i]-O[i])#Formeansquarederror   for (j in 1:N) { der_E[m] = der_E[m] +  err * O[j] * der_W[(m-1)*N+i, j] }}}#EndofIandmloopsreturn(der_E)}#EndofDerivativeE

Page 94: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 94

Similarity-BasedMachineLearninginR(2)

Learning=function(LearningRate,Lamda,Rs,der_E){#Updatescalingfactors,Rs.K=length(Rs)der_lossFun=der_E+2*Lamda*Rs#derivativesoflossfunctionRs=Rs-LearningRate*der_lossFun#forceRs<25toavoidS=exp(d)overflow for (m in 1:length(Rs)) { Rs[m] = min(max(0,Rs[m]),25) }return(Rs)}#EndofLearning#TrainingAItoobtainscalingfactors,Rs###SBMLtrain=function(Epoch,S0,Lamda,LearningRate,eta,Xtrain,Ytrain){TrainObj=list();OutObj0=list();OutObj=list()R0=InitialRs(Xtrain,eta,S0);S=SimilarityTrain(eta,R0,Xtrain)OutObj0=PredictedY(Weight(S),Ytrain,Ytrain)Rs=R0;OutObj=OutObj0#incaseEpoch=0fornolearningTrainMSE0=OutObj0$MSEpreLoss=OutObj0$MSE+Lamda*sum(Rs^2)#Learning iter=0;while(iter<Epoch){preRs=RsRs=Learning(LearningRate,Lamda,Rs,DerivativeE(eta,OutObj0$pred_Y,Rs,Xtrain,S,Ytrain))OutObj=PredictedY(Weight(SimilarityTrain(eta,Rs,Xtrain)),Ytrain,Ytrain)iter=iter+1Loss=OutObj$MSE+Lamda*sum(Rs^2)if(Loss>preLoss){Rs=preRs;iter=Epoch+1}preLoss=Loss}TrainObj$Rs=Rs;TrainObj$Y=OutObj$pred_Y;TrainObj$MSE=OutObj$MSEreturn(TrainObj)}#Linearregression#outcome="gaussian"or"binomial",...GLMPv=function(Y,X,outcome){LMfull=glm(Y~.,data=X,family=outcome)sumLM=coef(summary(LMfull)) pVals=sumLM[,4][1:ncol(X)+1]return(pVals)}

#ExamplewithSimulatedMultivariateNormalData#covariancematrixforcreatingcorrelatedX.library(mvtnorm)sigma=matrix(c(1,0.0,0.6,0.2,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.6,0.0,1,0.5,0.0,0.0,0.0,0.2,0.0,0.5,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1),ncol=7)mu=c(0,0,0,0,0,0,0)#7multivaritenormalmeansinitS=rep(0.5,7)#7intialsimilarities.xTrain=rmvnorm(n=40,mean=mu,sigma=sigma)xTest=rmvnorm(n=40,mean=mu,sigma=sigma)#ConstructythatarenotlinearlyrelatedtoxYtrain=xTrain[,1]+xTrain[,2]*xTrain[,3]+xTrain[,4]^2+xTrain[,5]Ytest=xTest[,1]+xTest[,2]*xTest[,3]+xTest[,4]^2+xTest[,5]#DatanormalizationXtrain=as.data.frame(normalizeData(xTrain,xTrain))Xtest=as.data.frame(normalizeData(xTest,xTrain))#Assignp-valuefromLMtoinitialsimilarities#initS=GLMPv(Y=Ytrain,X=Xtrain,outcome="gaussian”)mySBMLtrain=SBMLtrain(Epoch=5,S0=initS,Lamda=-0.5,LearningRate=0.125,eta=1,Xtrain,Ytrain)myPred=PredictedY(Weight(Similarity(eta=1,mySBMLtrain$Rs,Xtrain,Xtest)),Ytrain,Ytest)#MSEusingtrainingmeanforpredictionplot(myPred$pred_Y,Ytest)mse1=mean((Ytest-mean(Ytrain))^2)#naivemsemse2=myPred$MSE#Probabilityerroravemse1=avemse1+mse1/50avemse2=avemse2+mse2/50}avemse1;avemse2

Page 95: Introduction To Artificial Intelligence For Drug Development & … · 2020. 9. 30. · with ones (artificial or not), making everyone a mixture of human and machine. – As robots

Acknowledgment

•  Dr.SusanHwang,BostonUniversity

Copyright®MarkChang,PhD,AGInception,BostonUniversity,2019 95