evaluation of interactive machine learning systems › ial2019 › pdf ›...
TRANSCRIPT
EvaluationofInteractiveMachineLearningSystems
NadiaBoukhelifaIAL,September2019
EvaluationofInteractiveMachineLearningSystems
NadiaBoukhelifaIAL,September2019
visual
(IVMLs)
WhoamI?
3
NadiaBOUKHELIFA
Ph.D.2007—UniversityofKent,UKComputerScience,InformationVisualization
Post-doc—INRIA,TélécomParisTech,FranceVisualization,Human-ComputerInteraction
Researcher,Tenured,2016—INRA,FranceMulti-dimensionalDataVisualization,InteractiveModelling
INRA—NationalInstituteofAgriculturalResearch
4
NadiaBOUKHELIFA
INRA—NationalInstituteofAgriculturalResearch
5
NadiaBOUKHELIFA
http://www.cfsg.fr/site-de-grignon
�6
MALICESTeam:ModellingandKnowledgeIntegration
Expertiseformalisation
Uncertainty
Deterministicandstochasticmodelling
DecisionmakingOptimisation,visualisation
�7
Expertiseformalisation
Uncertainty
Deterministicandstochasticmodelling
DecisionmakingOptimisation,visualisation
NotexpertinAI
MALICESTeam:ModellingandKnowledgeIntegration
�8
Context:complexsystems,complexdatasets,complexmodels…
Azodyn-blé (Jeuffroy, Recous, Girard, Barré, Champeil, Meynard) modèle complet
Caractéristiques du sol (Clay%, CaCO3%, N%)
Données climatiques (température, pluie, Irr)
Nature du précédent
Amendements organiques (date, dose, forme)
Minéralisation nette de l�humus, des résidus et des amendements organiques (Mh, Mr, Ma)
Engrais minéral ou organique (date et dose)
Nmin dans sol
Vitesse de croissance la semaine avant apport
Indice de nutrition azotée (NNI)
Quantité de N aérien réel
Courbe de dilution maxi, Vmax
Climat (rayonnement, température)
Quantité maxi de N accumulé dans la culture
Biomasse aérienne
Rayonnement intercepté
Indice foliaire
RUE
NG/m²réel Quantité max de N dans les grains
Max Biomass in grains
Biomasse des grains = rendement
Quantité d�N dans les grains
Teneur en protéines
Acides aminés accumulés
N remobilisé
N dans organes végétatifs
Nmin dans sol à récolte climat (température)
CAU de l�engrais
NE/m²
climat avant floraison (température, rayonnement, déficit hydrique) NG/m² max
ENTREES
SORTIES
Jeuffroy, M-H., and Sylvie Recous. "Azodyn: a simple model simulating the date of nitrogen deficiency for decision support in wheat fertilization." European journal of Agronomy 10, no. 2 (1999): 129-144.
�9
Context:InteractiveModelExploration
Whyhumanintheloopinmodelling?
• tointegratevaluableexpertsknowledgethatmaybehardtoencodedirectlyintomathematicalorcomputationalmodels.
• tohelpresolveexistinguncertaintiesasaresultof,forexample,biasanderrorthatmayarisefromautomaticmachinelearning.
• tobuildtrustbymakinghumansinvolvedinthemodellingorlearningprocesses.
• tocaterforindividualhumandifferencesandsubjectiveassessmentssuchasinartandcreativeapplications.
�10
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
11
Boukhelifa,Nadia,etal."AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."Proceedingsofthe2019CHIConferenceonHumanFactorsinComputingSystems.ACM,2019.
OurapproachtoInteractiveModelExploration
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
12 Boukhelifa et al., CHI2019
Collaborativesetup
OurapproachtoInteractiveModelExploration
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
13
• Modelswrittenbythirdparties• Expertsspecialistsinpartsofthemodelledprocess
=>Co-locatedmultipleexpertise:domain,model,optimisation,visualization.
OurapproachtoInteractiveModelExploration
Collaborativesetup
Boukhelifa et al., CHI2019
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
Data & Models 14
OurapproachtoInteractiveModelExploration
Boukhelifa et al., CHI2019
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
Data & Models 15
Trade-offs
OurapproachtoInteractiveModelExploration
Boukhelifa et al., CHI2019
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
Data & Models Pareto Fronts 16
setofnon-dominatedcompromisepoints,wherenoobjectivecanbeimprovedwithoutsacrificingatleastoneotherobjective.
QU
AN
TITY
OF
ITEM
2
QUANTITY OF ITEM 1
OurapproachtoInteractiveModelExploration
Boukhelifa et al., CHI2019
VisualisationOptimisationModel Simulations
Experimental System
Application!Multiple Computational Stages!
Multiple Actors!
Feedback
Interaction!Results !
Visualization SystemData & Models Pareto Fronts 17
QU
AN
TITY
OF
ITEM
2
QUANTITY OF ITEM 1
OurapproachtoInteractiveModelExploration
�18
http://www.seguetech.com/
a way to generate pretty images from data
Definition#1:Visualization
�19
Definition#1:Visualization
Thepurposeofvisualizationisinsight,notpictures!
http://www.seguetech.com/
�20
Definition#1:Visualization
Informa<onVisualisa<on
“Theuseofcomputer-supported,interactive,visualrepresentationsofabstractdatatoamplifycognition.”[Cardetal.,1999]
Thepurposeofvisualizationisinsight,notpictures!
http://www.seguetech.com/
Card,StuartK.,JockD.Mackinlay,andBenShneiderman."Usingvisiontothink."Readingsininformationvisualization.MorganKaufmannPublishersInc.,1999.Card,S.
WhyvisualiseyourdataRawDatafromAnscombe’sQuartet
�21
FrankAnscombe
StatisticalanalysisRawDatafromAnscombe’sQuartet
�22
StatisticalProperties
Meanofx 9.0
Varianceofx 11.0
Meanofy 7.5
Varianceofy 4.12
Correlationbetweenxandy 0.816
Linearregressionline y=3+0.5x
VisualrepresentationofthedataRawDatafromAnscombe’sQuartet
�23
VisualProperties
Samestats,differentgraphs!
Nevertrustsummarystatisticsalone;visualiseyourdata!
�24
DatasaurusDozenDataset,AutodeskResearchhttps://www.autodeskresearch.com/publications/samestat
Visualrepresentationofthedata
�25
Communicate visually
Explore interactively
Evaluate
InformationVisualization
�26
Definition#2:Human-ComputerInteraction
“Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.”
Hewett,T.etal.ACMCurriculaforHuman-ComputerInteraction.1992;
“algorithms that can interact with both computational agents and human agents (in active learning: oracles) and can optimize their learning behavior through these interactions.”
�27
Definition#3:iML
“Users Are People, Not Oracles”
Takingintoaccounthumanfactors
Amershietal.,PowertothePeople:TheRoleofHumansinInteractiveMachineLearning,2014
AndreasHolzinger.Interactivemachinelearning(iml).InformatikSpektrum,39(1),2016.DanielKottke,InteractiveAdaptiveLearning,IntelligentEmbeddedSystems,UniversityofKassel,2018
�28
Definition#4:IVML
“In interactive visual machine learning (IVML), a human operator and a machine collaborate to achieve a task mediated by an interactive visual interface.”
Ourdefinition:
N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
�29
EvaluationofInteractiveSystems
✔Userperformance
✔Userexperience
✔Algorithmicperformance
Weknowhowtoassess:
�30
EvaluationofIVMLSystems
https://unsplash.com/photos/VBe9zj-JHBs
�31
OurexperiencewithEVE-EvolutionaryVisualExploration
N.Boukhelifa,A.Bezerianos,I-CTrelea,N.MPerrot,andELutton.AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."ACMCHIConferenceonHumanFactorsinComputingSystems,2019
N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
N.Boukhelifa,A.Bezerianos,W.CancinoandE.Lutton.EvolutionaryVisualExploration:EvaluationofanIECFrameworkforGuidedVisualSearch.EvolutionaryComputationJournal,MITpress,2017.
N.Boukhelifa,A.BezerianosandE.Lutton.AMixedApproachfortheEvaluationofaGuidedExploratoryVisualization System.EuroVisWorkshoponReproducibility,Verification,andValidationinVisualization(EuroRV3)2015.
W.Cancino,N.Boukhelifa,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:ExperimentalAnalysisofAlgorithmBehaviour.GECCOworkshoponGeneticandEvolutionaryComputation(VizGEC2013).
N.Boukhelifa,W.Cancino,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:EvaluationWithExpertUsers.ComputerGraphicsForum(EuroVis2013),EurographicsAssociation,2013,32(3).
EvaluationofIVMLSystems
N.Boukhelifa,A.Bezerianos,I-CTrelea,N.MPerrot,andELutton.AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."ACMCHIConferenceonHumanFactorsinComputingSystems,2019
N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
N.Boukhelifa,A.Bezerianos,W.CancinoandE.Lutton.EvolutionaryVisualExploration:EvaluationofanIECFrameworkforGuidedVisualSearch.EvolutionaryComputationJournal,MITpress,2017.
N.Boukhelifa,A.BezerianosandE.Lutton.AMixedApproachfortheEvaluationofaGuidedExploratoryVisualization System.EuroVisWorkshoponReproducibility,Verification,andValidationinVisualization(EuroRV3)2015.
W.Cancino,N.Boukhelifa,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:ExperimentalAnalysisofAlgorithmBehaviour.GECCOworkshoponGeneticandEvolutionaryComputation(VizGEC2013).
�32
EvaluationofIVMLSystems
w. Cancino A. Bezerinaos E. Lutton
Our experience with EVE - Evolutionary Visual Exploration
�33
EvolutionaryVisualExploration
Needtoexplorecombineddimensionshttps://unsplash.com/photos/YTbFHT9_IhY
�34
Howtoexploreahugesearchspaceofpossiblecombined
dimensions?
�35
Howtoexploreahugesearchspaceofpossiblecombined
dimensions?
n-DdatasetInteresting2D projectionsEV
E IEAs
EvolutionaryVisualExploration
�36
WhyIEAs?
✔cancombineobjective&subjectivemeasures
✔supportexploration&exploitation
✔adapttouserchangeofinterest
Suitableforexploratoryvisualization:
EVEPrototype
�37
IEA
Bou
khel
ifa e
t al.,
Eur
oVis
201
3
�38
ArtificialEvolution
NaturalEvolution EvolutionaryAlgorithms(EAs)
�39
Wikipedia
EVE:CreatingNewProjections
�40
Biscuit data (6D)
Top_heat, Bottom_heat
+,-,*,/, (.)(.), exp and log
offspring1: 0.2 * Top_heat + 0.7 * Bottom_heat
offspring2: Exp(Top_heat)
offspring3: 0.99 - (0.69 * log(Top_heat)) …
Favor linear patterns
Choose
offspringX
InteractiveEvolution(IEAs)
EvaluationofProjections
�41
User assessment ComplexitySurrogate function
Scagnostics, Wilkinson and Wills, 2008
learned
interactivecomputed
TheSearchSpace
TheSetofalldimensionsencodedastrees(GPframework,Koza92)
�42
initialdimensions,operators,constants
SubspaceExploration
�43
SubspaceExploration
�44
KeyfeaturesofEVE
�45
Intuitive
Adaptable
Interactive
Flexible
Isit?
EvaluatingEVE
46
Amixedapproach
�47
Algorithm centred evaluation
User centred Evaluation
Isthehuman-machinecooperationproducingtherightresults?
�48
Whenwehaveground-truth
Aim
• istheexplorationactuallysteeredtowardaninterestingareaofthesearchspace?
• aretheproposedsolutionsvaried?
Quantitativestudymethodology
• syntheticdataset
• pre-specifiedtask
�49
Evaluatinghuman-AIcollaboration
• 12participants• mean28.5years• noexperiencerequired• Syntheticdataset5D• 20minutes
Participants
�50
Gameseparatetwopointclusters
Task
�51
IEAisabletofollowtheorderofuserranking
Convergenceanalysis
�52
failure
failuresuccess
success
Interfaceevaluation-usestrategiesVisualpatternselectionstrategies
failure
failuresuccess
success
Interfaceevaluation-usestrategiesVisualpatternselectionstrategies
Keyfindings
Userstakedifferentsearchandevaluationstrategiesevenforasimpletask.
Onaveragethesurrogatefunctionfollowstheorderofuserrankingfairlyconsistently.
Linkbetweenuserevaluationstrategy,andoutcomeofexploration&speedofconvergence.
�55
Promisingresultsbut…
• Realworldsituationshavemorecomplexdatasetsandtasks.
• Usersarenotalwaysconsistentorgivedetailedfeedback.
• Wedonotalwayshavegroundtruth.
What happens when we do not have ground truth ?
�56
User-centredevaluation
Insightandusabilityevaluation
• areexpertsabletoconfirmoldknowledge?
• areexpertsabletogainnewinsight?
Qualitativestudymethodology
• thinkaloud,observe,interview&questionnaire
• videotapedandlogdatacapture
�57
• 5domainexperts
• mean34.2years
• owndatasets
• pre-questionnaire
• 2.5hours
Participants
�58
Training
T1:showinthetoolwhatyoualreadyknowaboutthedata
T2:explorethedatainlightofaresearchquestion
Tasks
�59
EVEResults
Hypothesisgeneration
�60
“thiscombinationmaybeanimportantfindingbecauseitinvolvesparametersthataffectonlyonepartofthe
simulationmodel...”
Cityemergencemodel
EVEResults
Hypothesisquantification
‘‘wealwaystalkaboutthisqualitatively.Thisisthefirst
timeIseeconcreteweights...’’
�61
Electricityconsumptionprofiles
EVEResults
�62
expertswereableto:• tryoutalternativescenarios• thinklaterally• quantifyaqualitativehypothesis• formulateanewhypothesisorrefineoldone• (domainvalue)
But…
�63
« On one hand, humans perform unpredictable and sophisticated reasoning; on the other, artificial solvers are technically complex and adopt solving strategies which are very different from those employed by humans.
Secondly, the environment from which the problem to be solved is drawn is usually uncontrollable and uncertain.
Together, these factors complicate the task of designing precise and effective evaluation studies. »
G. Cortellessa and A. Cesta, AAAI
EvaluationofIVMLisChallenging
Cortellessa,Gabriella,andAmedeoCesta."EvaluatingMixed-InitiativeSystems:AnExperimentalApproach."ICAPS.Vol.6.2006.
�64
EvaluationofIVMLisChallenging
Ch#1ComplexHumanFactors
Ch#2MultipleExpertise
Ch#3StochasticProcesses
Ch#4Co-adaptation
Ch#5Uncertainty
�65
Ch#1ComplexHumanFactors
“Users Are People, Not Oracles”
FrustratedBored
FatigueAnnoyed
Inconsistent
Reluctanttogivefeedback
Interruptibility
Amershi et al., Power to the People: The Role of Humans in Interactive Machine Learning , 2014
Bias
e.g.,Needbettertechniquestocaptureuserintent.
�66
Ch#2MultipleExpertise
ShouldIVMLhelpgroupsreachconsensus?Encouragemultiple
views?
Forus:buildcommongroundandselectbest
trade-offs
�67
Ch#2MultipleExpertise
ShouldIVMLhelpgroupsreachconsensus?Encouragemultiple
views?
Forus:buildcommongroundandselectbest
trade-offs
Weneedtocaterforindividualaswellascollective
exploration
Needsetupsthatcanswitchbetweenindividualand
collectivelearning
�68
• Riskofgettingstuckinlocaloptima?
�68
• Effectofstochasticityonuser’smentalmodeoftheML
Ch#3StochasticProcesses
Ch#3Co-adaptation
�69
“ Co-adaptive phenomena are defined as those in which the environment affects human behavior and at the same time, human behavior affects the environment. Such phenomena pose theoretical and methodological challenges and are difficult to study in traditional ways. ”
W. E. Mackay, Users and Customizable Software: A Co-Adaptive Phenomenon, 1990.
Ch#4Uncertainty
�70
Differentsourcesofuncertaintyarisingfrom:• Automatic
inferences• Analysts
reasoning
https://www.facebook.com/pedromics
�71
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
ReviewofCHI/VIZpublications2012-2017Keywordsearch:IML+Evaluation19papersfromdifferentdomains
Notanexhaustivesurvey!
IMLEvaluation-HCIStudies
�72
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
IMLEvaluation-HCIStudies
�73
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
HumanFeedback:
• implicit(7papers)
• explicit(8papers)
• mixed(4papers):• implicithumanfeedbackhelpsinferinformationtocomplementimplicithumanfeedback.
IMLEvaluation-HCIStudies
�74
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
SystemFeedback:
• toinformhumansaboutthestateofthemachinelearningalgorithm,andprovenanceofsystemsuggestions.
• canbevisual,progressive,andcanindicateuncertainty.
• mostsystemsgavefeedback.
• challenge:toinformwithoutoverwhelmingtheuser.
IMLEvaluation-HCIStudies
�75
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
Typesofstudy:
• 12/19studiesinvolvingusers(someforofcontrolledstudy)
• Difficulttoconduct:• potentialconfoundingfactors
• nogroundtruth
IMLEvaluation-HCIStudies
�76
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
EvaluationMetrics:
objective:• time,precision,re-call,#Insights
• iMLvs.baselineorvariantsofML;with/outsystemfeedback;explicitvs.implicit;impactofuserfeedback
• difficultyseparatingusabilityissuesfromtaskresults
•
IMLEvaluation-HCIStudies
�77
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
EvaluationMetrics:
subjective:• aspectsofuserexperience,e.g.,reported,easiness,speed,taskload,trust,andconfidence.
•
IMLEvaluation-HCIStudies
�78
N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.
EvaluationMetrics:
subjective:• aspectsofuserexperience,e.g.,reported,easiness,speed,taskload,trust,andconfidence.
•
IMLEvaluation-HCIStudies
“One sign of success of iML systems is when humans forget that they are feeding information to an algorithm, and rather focus on
synthesising information relevant to their task”. Endertetal.,2012[38]
GuidelinesforIVMLevaluation
�79
Weknowhowtoevaluateinterfaces&interactivesystems,butveryfewguidelinesexistspecificallyforiMLsystems!
JakobNielsen's10UsabilityHeuristicsforUserInterfaceDesign
G1:VisibilityofsystemstatusG2:MatchbetweensystemandtherealworldG3:UsercontrolandfreedomG4:ConsistencyandstandardsG5:ErrorpreventionG6:RecognitionratherthanrecallG7:FlexibilityandefficiencyofuseG8:AestheticandminimalistdesignG9:Helpusersrecognize,diagnose,&recoverfromerrorsG10:Helpanddocumentation
GuidelinesforIVMLevaluation
�80
G1Developingsignificantvalue-addedautomation.
G2Consideringuncertaintyaboutauser’sgoals.
G3Consideringthestatusofauser’sattentioninthetimingofservices.
G4Inferringidealactioninlightofcosts,benefits,anduncertainties.
G5Employingdialogtoresolvekeyuncertainties.
G6Allowingefficientdirectinvocationandtermination.
G7Minimizingthecostofpoorguessesaboutactionandtiming.
G8Scopingprecisionofservicetomatchuncertainty,variationingoals.
G9Providingmechanismsforefficientagent−usercollaborationtorefineresults.
G10Employingsociallyappropriatebehaviorsforagent−userinteraction.
G11Maintainingworkingmemoryofrecentinteractions.
G12Continuingtolearnbyobserving.
PrinciplesofMixed-InitiativeUserInterfaces-EricHorvitz,1999
GuidelinesforIVMLevaluation
�81
G1Makeclearwhatthesystemcando.
G2Makeclearhowwellthesystemcandowhatitcando.G3Timeservicesbasedoncontext.G4Showcontextuallyrelevantinformation.G5Matchrelevantsocialnorms.G6Mitigatesocialbiases.
G7Supportefcientinvocation.
G8Supportefcientdismissal.
G9Supportefcientcorrection.
G10Scopeserviceswhenindoubt.
G11Makeclearwhythesystemdidwhatitdid.G12Rememberrecentinteractions.G13Learnfromuserbehavior.G14Updateandadaptcautiously.G15Encouragegranularfeedback.
G16Conveytheconsequencesofuseractions.
G17Provideglobalcontrols.
G18Notifyusersaboutchanges.
Newguidelinesproposed,e.g.,Amershietal.,2019.
Noconclusions-researchquestions!
�82
Q1WhataspectsorcomponentsoftheIVMLsystemaremostimportanttoevaluate?
Q2Whattypesoftaskscanbedelegatedtomachinelearningandwhicharebestlefttohumans?
Q3WhoarethetargetusersofIVMLsystems?
Q4Whatistheroleofexpertiseinthiscontext?
Q5ShouldwehavedomainexpertstrainIVMLsystems?
Q6Whataretherisksandbenefitsofintroducinghumanexpertiseintothe(machine)learningprocess?
Q7Whatmetricsshouldweusetoevaluate?
Q8Canweestablishapplicationordomain-independentmetrics?
Q9DoweestablishdifferentevaluationsmeasuresfortheunderstandingofIVMLsystemsandfortheirperformance?
Q10Canweestablishbenchmarkdatasetsandtestuse-casestohelpevaluateIVMLsystems?
Q11Shouldweseekreplicationinthiscontext,andifso,howdowesupportreplicationofresultsinaco-learningoradaptiveenvironment?
Q12Howdowecommunicatetheevaluationresultstootherdisciplines?
eviva-ml.github.io
iML-eval:On-goingResearchTopic
�83
21 October, 2019
eviva-ml.github.io
�84
eviva-ml.github.ioEarly registration deadline 20/09
Organizing Committee
• Nadia Boukhelifa (INRA, FR)• Anastasia Bezerianos (Univ. Paris-Sud and INRIA, FR)• Enrico Bertini (NYU Tandon School of Engineering, USA)• Christopher Collins (Uni. of Ontario Institute of Technology, CA)• Steven Drucker (Microsoft Research, USA)• Alex Endert (Georgia Tech, USA)• Jessica Hullman (Northwestern University, USA)• Michael Sedlmair (University of Stuttgart, DE) • Remco Chang (Tufts University, USA)• Chris North (Virginia Tech, USA)
Noconclusions-researchquestions!
�85
Q1WhataspectsorcomponentsoftheIVMLsystemaremostimportanttoevaluate?
Q2Whattypesoftaskscanbedelegatedtomachinelearningandwhicharebestlefttohumans?
Q3WhoarethetargetusersofIVMLsystems?
Q4Whatistheroleofexpertiseinthiscontext?
Q5ShouldwehavedomainexpertstrainIVMLsystems?
Q6Whataretherisksandbenefitsofintroducinghumanexpertiseintothe(machine)learningprocess?
Q7Whatmetricsshouldweusetoevaluate?
Q8Canweestablishapplicationordomain-independentmetrics?
Q9DoweestablishdifferentevaluationsmeasuresfortheunderstandingofIVMLsystemsandfortheirperformance?
Q10Canweestablishbenchmarkdatasetsandtestuse-casestohelpevaluateIVMLsystems?
Q11Shouldweseekreplicationinthiscontext,andifso,howdowesupportreplicationofresultsinaco-learningoradaptiveenvironment?
Q12Howdowecommunicatetheevaluationresultstootherdisciplines?
eviva-ml.github.io
Danke!