whole-tale: the experience of research

Whole Tale: The Experience of Research through reproducible, computational narratives
YesWorkflow: Revealing workflow, provenance from scripts
Kurator: Automating data cleaning workflows
EulerX: Agreeing to disagree about variant taxonomies

Bertram Ludäscher
BCoN Workshop 2018-02-13..14, U Kansas
Director, Center for Informatics Research in Science & Scholarship (CIRSS), School of Information Sciences (iSchool@Illinois) & National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois)

Upload: bertram-ludaescher

Post on 17-Mar-2018



Whole Tale: The next step in the evolution of the scholarly article: The “Living” Paper

• 1st Generation: narrative (prose)
• 2nd Generation: plus … name..identify..include (access to) data
• 3rd Generation: plus … name..reference..include code (software).. and provenance … and execution environment (containers)

Ludäscher: Whole-Tale++

Whole Tale

Whole Tale Dashboard

Whole Tale: What’s in a name?

(1) Whole Tale ⇔ Whole Story:
◦ Support (computational / data) scientists
◦ … along the complete research lifecycle
◦ … from experiment to (new kind of) publication
◦ … and back!

(2) Whole Tale ⇔ for the Long Tail of Science
– Easy sharing of your computational narratives, data, and exec-env since 2017!
– Power applications for everyone!

The Whole Tale: Merging Science and Cyberinfrastructure Pathways

NSF-DIBBS award (5 years, 5 institutions)
• Illinois (NCSA & iSchool): Bertram Ludäscher (PI), MT Campbell (PM) [Kandace Turner], Victoria Stodden (coPI), Matt Turk (coPI), Kacper Kowalik (sw-architect), Craig Willis (dev)
• U of Chicago: Kyle Chard (coPI), Mihael Hategan (dev)
• UT Austin / TACC: Niall Gaffney (coPI), Siva Kulasekaran (dev)
• U Notre Dame: Jarek Nabrzyski (coPI), Ian Taylor (dev), Adam Brinckman (dev)
• UCSB / NCEAS: Matt Jones (coPI), Bryce Mecum (dev)

Whole Tale Motivation
• Can't reproduce a result because:
– Don't know how to run the analysis
– Can't get the software running
– Can't pay for the computer or compute power the result was computed on

Source: Bryce Mecum, WT team @ NCEAS

Whole Tale Vision: Addressing reproducibility

[Figure: Data, Code, and Execution Environment alongside the Article]

Whole Tale Vision
• Living publication (data + code + environment)
• Facilitate reproducibility
• Encourage investigation of results by making it easy to recreate the environment the result was created in

Whole Tale Vision: Addressing reproducibility

[Figure: Article + Tale]

Whole Tale Vision

[Figure: a Tale groups Data and Code; D1 PROV]

WT Architecture

https://dashboard.wholetale.org

DEMO: (re-)use an existing tale or …

… create a new Tale!

Running with RStudio: locally or on WT …

You’re up and running quickly on Whole Tale!

Maybe you just want to use WT to learn R for Data Science ...

Another example Tale: LIGO gravitational wave detection (tutorial Jupyter notebook)

New & Upcoming Features in WT ...
• Add your own frontends (e.g., OpenRefine, ..)
• Persistent, shared or personal files:
– /data/ (registered/external data, read-only, associated with a tale)
– /home/ (your own data, r/w, associated with all your tales)
– /workspace/ (shared r/w data, associated with a tale, across all users)
• WT “Derived Tales”:
– take a tale, modify it to your liking, and publish it as a derived work
• WT “Take-Out”:
– Want to run your tales elsewhere?
– Take out your tale and run it on your own (or a cloud) platform
• WT “Scale-Out”:
– If the WT dashboard isn’t enough => run your own WT system!
• WT provenance support:
– … via DataONE provenance tools, ProvONE model (W3C PROV extension)
– … via YesWorkflow
• Interested in joining a WT Biodiversity Informatics Working Group?
– We already have: archaeology & ecology, astronomy, materials science
– Your input wanted! (Is WT developing something useful for you?)
– Try out WT, create some examples (in R, Python, ...) and provide feedback!
– => possibility to fund a summer intern!

Additional Material …

… teasers ahead!

Provenance is: keeping records …

• Grand Canyon’s rock layers are a record of the early geologic history of North America. The ancestral Puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)
• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases => computational provenance is key for transparency & reproducibility

Ludäscher: Workflows & Provenance => Understanding

... and provenance is: Understanding what happened!

Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.

Author: Jkwchui (based on drawing by Truth-seeker2004)

Computational Provenance …
• Origin, processing history of artifacts
– data products, figures, ...
– also: underlying workflow => understand methods, dataflow, and dependencies

[Figure: Climate Change Impacts in the United States; U.S. National Climate Assessment; U.S. Global Change Research Program]

YesWorkflow: How does the LIGO script produce its results??

YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!

• YW annotations in a (Python, R, …) script recreate a workflow view from the script …

[Figure: YW workflow view recreated from an annotated data-collection script, with steps initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, and log_average_image_intensity; file outputs include run/run_log.txt, /run/rejected_samples.txt, data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img, and run/collected_images.csv]

YW!
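To make this concrete, here is a minimal sketch of what YW annotations can look like in a Python script. The script, its step names (`clean_occurrences`, `drop_missing_names`, `normalize_names`) and its data names are hypothetical; only the tags `@BEGIN`, `@END`, `@IN` and `@OUT` come from YW. Because the tags live in comments, Python ignores them and the script runs unchanged.

```python
# Hypothetical record-cleaning script annotated with YesWorkflow (YW) comments.
# YW reads only the comments; Python ignores them, so the script runs as usual.

# @BEGIN clean_occurrences
# @IN raw_records
# @OUT accepted_records
# @OUT rejected_records


def clean_occurrences(raw_records):
    """Split records into accepted (normalized name) and rejected (missing name)."""
    # @BEGIN drop_missing_names
    # @IN raw_records
    # @OUT named_records
    # @OUT rejected_records
    named_records = [r for r in raw_records if r.get("scientificName")]
    rejected_records = [r for r in raw_records if not r.get("scientificName")]
    # @END drop_missing_names

    # @BEGIN normalize_names
    # @IN named_records
    # @OUT accepted_records
    # Collapse repeated whitespace, capitalize the first letter, lower-case the rest
    # (e.g. "PUMA  concolor" -> "Puma concolor").
    accepted_records = [
        {**r, "scientificName": " ".join(r["scientificName"].split()).capitalize()}
        for r in named_records
    ]
    # @END normalize_names
    return accepted_records, rejected_records


# @END clean_occurrences
```

Feeding a file like this to the YW toolkit's graph command should produce a workflow view with the two inner steps linked by `named_records`; see the YW README for the exact invocation.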


@BEGIN .. @END .. @IN .. @OUT .. @URI .. @LOG ..

[Figure: YW view of a paleoclimate reconstruction script with steps GetModernClimate, SubsetAllData, CAR_Analysis_unique, CAR_Analysis_union, CAR_Reconstruction_union, and CAR_Reconstruction_union_output; inputs include master_data_directory, prism_directory, tree_ring_data, calibration_years, and retrodiction_years; outputs include ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif and ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif]

Paleoclimate Reconstruction (openSKOPE.org)
• … explained using YesWorkflow!

Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all told."

Data Curation Workflows (Filtered-Push … Kepler … Kurator projects)


http://kurator.acis.ufl.edu/kurator-web/


DwCA Taxon Lookup Workflow

• Declare inputs, outputs, and steps of a script (or wf) with YW annotations to ...
– communicate provenance graphically (via Graphviz)
– combine different forms of provenance
– query provenance
• Simple YW annotations in comments:
– @BEGIN Step, @END Step
– @IN Data, @OUT Data
– @URI Template, @LOG Pattern
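The annotation scheme is simple enough that its core can be sketched in a few lines. This is an illustrative parser, not YW's implementation: it assumes `#` comments, handles only `@BEGIN`/`@END`/`@IN`/`@OUT` (skipping `@URI`, `@LOG` and the other tags), and links a step's `@OUT` to any other step's `@IN` with the same data name, recovering the dataflow edges that a graph view would draw. The function names and the sample script are hypothetical.

```python
import re

# Matches the four structural YW tags; @URI, @LOG and other tags are ignored here.
TAG = re.compile(r"@(BEGIN|END|IN|OUT)\s+(\S+)")


def extract_steps(source):
    """Map each @BEGIN block name to the data names it declares via @IN/@OUT."""
    steps, stack = {}, []
    for line in source.splitlines():
        text = line.strip()
        if not text.startswith("#"):  # assumption: '#' comments (Python, R)
            continue
        for tag, name in TAG.findall(text):
            if tag == "BEGIN":
                stack.append(name)
                steps[name] = {"in": [], "out": []}
            elif tag == "END":
                if not stack or stack[-1] != name:
                    raise ValueError(f"@END {name} does not close the innermost block")
                stack.pop()
            elif stack:  # @IN/@OUT belong to the innermost open block
                steps[stack[-1]]["in" if tag == "IN" else "out"].append(name)
    if stack:
        raise ValueError(f"@BEGIN {stack[-1]} is never closed")
    return steps


def dataflow_edges(steps):
    """Return sorted (producer, data, consumer) triples linking an @OUT to an @IN."""
    return sorted(
        (producer, data, consumer)
        for producer, ports in steps.items()
        for data in ports["out"]
        for consumer, other in steps.items()
        if consumer != producer and data in other["in"]
    )


SAMPLE = """
# @BEGIN main
# @IN raw_records
# @OUT accepted_records
# @BEGIN drop_missing_names
# @IN raw_records
# @OUT named_records
# @END drop_missing_names
# @BEGIN normalize_names
# @IN named_records
# @OUT accepted_records @URI file:accepted.csv
# @END normalize_names
# @END main
"""
```

On `SAMPLE`, `dataflow_edges(extract_steps(SAMPLE))` yields the single edge from `drop_missing_names` to `normalize_names` via `named_records`.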


Taxon Lookup Workflow: Data View and Process View

The story of two individual records


• One took the GBIF route, while …
• … the other went all WORMS!

Non-marine? => GBIF
Marine? => WORMS
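The routing decision in this story can be sketched as a small function. The `is_marine` field and the `route_records` helper are hypothetical; the sketch only shows how each record could be dispatched (marine records to WoRMS, non-marine ones to GBIF) and how the routes aggregate. It does not call either service.

```python
from collections import defaultdict


def route_records(records):
    """Group record ids by the name service a taxon lookup would be sent to.

    Hypothetical rule mirroring the slide: marine -> WoRMS, non-marine -> GBIF.
    Records without a boolean 'is_marine' flag are set aside as 'unresolved'.
    """
    routes = defaultdict(list)
    for record in records:
        flag = record.get("is_marine")
        if flag is True:
            service = "WoRMS"
        elif flag is False:
            service = "GBIF"
        else:
            service = "unresolved"
        routes[service].append(record["id"])
    return dict(routes)
```

Keeping an explicit `unresolved` bucket makes records that took neither route visible in the aggregate view instead of silently dropping them.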

The aggregate story ..

• How many records were observed as inputs or outputs of workflow steps?
• Were there any NULL values? How many?
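Questions like these become simple aggregations once retrospective provenance is captured as observations. The sketch below assumes a hypothetical flat layout of `(step, direction, record_id, value)` tuples, which is not YW's own fact schema, and counts distinct records and `None` (NULL) values per step and direction.

```python
from collections import Counter, defaultdict


def summarize_observations(events):
    """Count distinct records and NULL (None) values per (step, direction).

    events: iterable of (step, direction, record_id, value) tuples, where
    direction is "in" or "out". This flat layout is an assumption.
    """
    records = defaultdict(set)
    nulls = Counter()
    for step, direction, record_id, value in events:
        records[(step, direction)].add(record_id)
        if value is None:
            nulls[(step, direction)] += 1
    counts = {key: len(ids) for key, ids in records.items()}
    return counts, dict(nulls)


# Hypothetical observations for one lookup step.
EVENTS = [
    ("lookup_name", "in", "r1", "Puma concolor"),
    ("lookup_name", "in", "r2", None),
    ("lookup_name", "out", "r1", "accepted"),
    ("lookup_name", "out", "r2", None),
]
```

Using sets of record ids means a record observed twice on the same port is counted once, which matches the "how many records" question rather than "how many observations".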

YesWorkflow Summary
• Lightweight YW annotations can be added easily to your scripts to reap workflow benefits:
– Documentation of what’s important
– Visualization of dependencies
– Querying provenance (prospective, retrospective, and hybrid)
– Independent of the system or language used (R, Python, MATLAB, workflow tools, …)

=> make provenance actionable => provenance for self!

=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org
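As an example of querying prospective provenance, the following sketch answers "what does this output depend on?" by walking backward from an output through the steps that produce it. The `steps` dictionary (a step name mapped to its input and output data names) and all step and data names are hypothetical stand-ins for what `@IN`/`@OUT` annotations declare.

```python
from collections import deque


def upstream(target, steps):
    """Return (data, steps) that `target` depends on, following producer links.

    steps maps a step name to (inputs, outputs), e.g. as declared via @IN/@OUT.
    """
    producers = {}
    for step, (_, outputs) in steps.items():
        for data in outputs:
            producers.setdefault(data, []).append(step)
    data_seen, steps_seen = set(), set()
    queue = deque([target])
    while queue:  # breadth-first walk from the target back toward raw inputs
        data = queue.popleft()
        for step in producers.get(data, []):
            if step in steps_seen:
                continue
            steps_seen.add(step)
            for needed in steps[step][0]:
                if needed not in data_seen:
                    data_seen.add(needed)
                    queue.append(needed)
    return data_seen, steps_seen


STEPS = {  # hypothetical prospective provenance
    "load": (["raw.csv"], ["table"]),
    "clean": (["table", "rules"], ["clean_table"]),
    "plot": (["clean_table"], ["figure.png"]),
    "summarize": (["table"], ["summary.txt"]),
}
```

For `figure.png` the walk reaches `plot`, `clean` and `load` but not `summarize`, which shares an input with the figure yet does not contribute to it.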


Demo Time

(Disclaimer) https://github.com/idaks/dataone-ahm-2016-poster https://github.com/idaks/wt-prov-summer-2017 https://github.com/yesworkflow-org/yw-idcc-17

DataONE: Search and Provenance Display

Adding YesWorkflow to DataONE

Yaxing’s script with inputs & output products
Christopher’s YesWorkflow model
Christopher using Yaxing’s outputs as inputs for his script
Christopher’s results can be traced back all the way to Yaxing’s input

Yi-Yun Cheng¹, Nico Franz², Jodi Schneider¹, Shizhuo Yu³, Thomas Rodenhausen⁴, Bertram Ludäscher¹
¹School of Information Sciences, University of Illinois at Urbana-Champaign; ²School of Life Sciences, Arizona State University; ³Department of Computer Science, University of California at Davis; ⁴School of Information, University of Arizona

Agreeing to Disagree: Reconciling Conflicting Taxonomic Views using a Logic-based Approach

Acknowledgments

Support of the authors’ research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn La Barre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.

CONCLUSION

• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues: it lets us mitigate the membership-condition problems that occur in equivalent crosswalking.
• The RCC-5 approach preserves the original taxonomies while providing an alignment view: it can solve data integration problems that arise in more coarse-grained relative crosswalking, which is otherwise subject to information loss.
• Our study also underscores the benefits of designing different alignment workflows (bottom-up vs. top-down) to match the needs of specific taxonomy alignment problems.
– Bottom-up approach: seems to work well when we have non-overlapping leaf-level (lowest-level) articulations and are not sure how the higher-level concepts should be aligned.
– Top-down approach: seems favorable when certain higher-level articulations are expected, in conjunction with under-specified, complex, and often overlapping leaf-level relations.

RELATED WORK

• Taxonomy Alignment Problems (TAP): taxonomies T1 and T2 are inter-linked via a set of input articulations A, defined as RCC-5 relations, to yield a “merged” taxonomy T3.
• Euler/X articulations: an articulation is a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.
• Region Connection Calculus (RCC-5)
• Possible worlds: when TAPs are encoded and solved via ASP (answer set programming), the different answer sets represent alternative taxonomy merge solutions, or possible worlds (PWs).
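The notion of possible worlds can be illustrated without an ASP solver on a toy problem. The sketch below is a brute-force stand-in for the Euler/X ASP encoding and is only practical for about four concepts: it enumerates which Venn cells are non-empty, keeps the assignments that respect each taxonomy (assumed here: children are pairwise disjoint and jointly cover their parent) and the given articulations, and identifies a possible world with the tuple of RCC-5 relations between every cross-taxonomy pair. The taxonomies `T1` and `T2` and all function names are hypothetical.

```python
from itertools import combinations, product


def rcc5(a, b):
    """RCC-5 relation between two extensions, in Euler/X symbols."""
    if a == b:
        return "=="
    if a > b:
        return ">"  # a properly includes b
    if a < b:
        return "<"  # a is properly included in b
    return "><" if a & b else "!"


def respects(taxonomy, ext):
    """Children are pairwise disjoint and their union equals the parent (assumed)."""
    for parent, children in taxonomy.items():
        if not children:
            continue
        kids = [ext[c] for c in children]
        if frozenset().union(*kids) != ext[parent]:
            return False
        if any(a & b for a, b in combinations(kids, 2)):
            return False
    return True


def possible_worlds(t1, t2, articulations):
    """Distinct tuples of cross-taxonomy RCC-5 relations over all valid models."""
    concepts = sorted(set(t1) | set(t2))
    # A Venn "cell" is identified by the set of concepts that contain it.
    cells = [frozenset(c) for r in range(1, len(concepts) + 1)
             for c in combinations(concepts, r)]
    pairs = [(x, y) for x in sorted(t1) for y in sorted(t2)]
    worlds = set()
    for mask in product((False, True), repeat=len(cells)):
        live = [cell for cell, on in zip(cells, mask) if on]
        ext = {c: frozenset(cell for cell in live if c in cell) for c in concepts}
        if not all(ext.values()):
            continue  # every concept must be non-empty
        if not (respects(t1, ext) and respects(t2, ext)):
            continue
        if any(rcc5(ext[x], ext[y]) not in allowed
               for (x, y), allowed in articulations.items()):
            continue
        worlds.add(tuple(rcc5(ext[x], ext[y]) for x, y in pairs))
    return worlds


# Hypothetical toy problem: T1.P is split into T1.X and T1.Y; T2 has one concept.
T1 = {"T1.P": ["T1.X", "T1.Y"], "T1.X": [], "T1.Y": []}
T2 = {"T2.Q": []}
```

With only `T1.X < T2.Q` the problem is ambiguous (5 worlds); adding `T1.P == T2.Q` leaves a single world, while adding `T1.P ! T2.Q` instead makes it inconsistent (0 worlds). These correspond to the N > 1, N = 1 and N = 0 outcomes of the Euler/X loop.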

INTRODUCTION

Tina: Hey Amy, can you recommend a signature dish from where you live?
Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.
Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?
Amy: Wrong guesses! Where do you live in the South?
Tina and Amy together: Washington, D.C.
[The two of them look at each other, confused.]

“In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required …” (Bowker & Star, 2000).

CASE 1 RESULTS: CEN vs. NDC
• State-level alignments are all congruent (bottom-up)
• Inferred new articulations for regional-level alignments

CASE 2 RESULTS: CEN vs. TZ

Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between T_CEN and T_NDC.
Figure 4. (Right) The unique possible world (PW) T3 reconciling T_CEN and T_NDC via inferred relationships.

Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)

• GitHub link: https://github.com/EulerProject/ASIST17


RESEARCH DESIGN

Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X

RCC-5 relations:
– Congruence: X == Y
– Inclusion: X > Y (X includes Y)
– Inverse inclusion: X < Y (X is included in Y)
– Overlap: X >< Y
– Disjointness: X ! Y


Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X

Figure 5. Top-down input alignments between T_CEN and T_TZ

Figure 6. The unique PW for the T_CEN with T_TZ alignment

Figure 10. Combined concepts solution for T_CEN and T_TZ

taxonomy CEN Census_Regions(USA Northeast Midwest South West)(Northeast CT MA ME NH NJ NY PA RI VT)(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)(West AZ CA CO ID MT NV NM OR UT WA WY)

taxonomy NDC National_Diversity_Council(USA Midwest Northeast Southeast Southwest West)(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)(Southwest AZ NM OK TX)(West CA CO ID MT NV OR WA WY UT)

articulations CEN NDC[CEN.AL equals NDC.AL][CEN.AR equals NDC.AR][CEN.AZ equals NDC.AZ][CEN.CA equals NDC.CA][CEN.CO equals NDC.CO][CEN.CT equals NDC.CT][CEN.DC equals NDC.DC][CEN.DE equals NDC.DE][CEN.FL equals NDC.FL][CEN.GA equals NDC.GA][CEN.IA equals NDC.IA][CEN.ID equals NDC.ID][CEN.IL equals NDC.IL][CEN.IN equals NDC.IN][CEN.KS equals NDC.KS][CEN.KY equals NDC.KY][CEN.LA equals NDC.LA][CEN.MA equals NDC.MA][CEN.MD equals NDC.MD][CEN.ME equals NDC.ME][CEN.MI equals NDC.MI][CEN.MN equals NDC.MN]...
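Articulations "automatically derived from data" can be illustrated with this very input: each region is the set of state codes below it, and the RCC-5 relation between two regions follows from comparing those sets. The sketch assumes, as the state-level `equals` articulations above assert, that a state code denotes the same area in both taxonomies. The helpers are hypothetical and not part of Euler/X, which reasons over the stated articulations rather than over member lists.

```python
import re

CEN = ("taxonomy CEN Census_Regions(USA Northeast Midwest South West)"
       "(Northeast CT MA ME NH NJ NY PA RI VT)"
       "(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)"
       "(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)"
       "(West AZ CA CO ID MT NV NM OR UT WA WY)")
NDC = ("taxonomy NDC National_Diversity_Council(USA Midwest Northeast Southeast Southwest West)"
       "(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)"
       "(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)"
       "(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)"
       "(Southwest AZ NM OK TX)"
       "(West CA CO ID MT NV OR WA WY UT)")


def parse_taxonomy(line):
    """Return (prefix, {parent: [children]}) from one 'taxonomy ...' line."""
    prefix = line.split()[1]
    children = {}
    for group in re.findall(r"\(([^()]*)\)", line):
        parent, *kids = group.split()
        children[parent] = kids
    return prefix, children


def leaves(concept, children):
    """Extension of a concept as the set of leaf (state) codes below it."""
    kids = children.get(concept, [])
    if not kids:
        return frozenset([concept])
    return frozenset().union(*(leaves(k, children) for k in kids))


def rcc5(a, b):
    """RCC-5 relation between two leaf sets, in Euler/X symbols."""
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "!"


def derive_articulations(line1, line2):
    """RCC-5 relation for every pair of non-leaf concepts, from shared leaf codes."""
    (p1, c1), (p2, c2) = parse_taxonomy(line1), parse_taxonomy(line2)
    return {
        (f"{p1}.{a}", f"{p2}.{b}"): rcc5(leaves(a, c1), leaves(b, c2))
        for a in c1
        for b in c2
    }
```

For example, the derived relations include CEN.Midwest == NDC.Midwest, CEN.South > NDC.Southeast, CEN.Northeast < NDC.Northeast, and CEN.South >< NDC.Southwest.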

Quick Scan!

taxonomy CEN Census_Regions(USA Midwest South West Northeast)

taxonomy TZ Time_Zone(USA Pacific Mountain Central Eastern)

articulations CEN TZ[CEN.Midwest disjoint TZ.Pacific][CEN.Midwest overlaps TZ.Eastern][CEN.Midwest overlaps TZ.Mountain][CEN.Northeast is_included_in TZ.Eastern][CEN.South disjoint TZ.Pacific][CEN.South overlaps TZ.Central][CEN.South overlaps TZ.Eastern][CEN.South overlaps TZ.Mountain][CEN.USA equals TZ.USA][CEN.West disjoint TZ.Central][CEN.West disjoint TZ.Eastern][CEN.West overlaps TZ.Mountain]
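The articulation syntax above is regular enough to read mechanically. This sketch parses the CEN-TZ input into `(left, symbol, right)` triples using the RCC-5 symbols from the slides, and checks that every articulated concept is declared in one of the two taxonomy lines. The function names are hypothetical; mapping the keyword `includes` to `>` is an assumption (it does not occur in this input), and disjunctive articulations are not handled.

```python
import re
from collections import Counter

# Verbal relation names mapped to RCC-5 symbols; "includes" is assumed.
SYMBOL = {
    "equals": "==",
    "includes": ">",
    "is_included_in": "<",
    "overlaps": "><",
    "disjoint": "!",
}
ART = re.compile(r"\[([^\s\]]+)\s+([^\s\]]+)\s+([^\s\]]+)\]")
GROUP = re.compile(r"\(([^()]*)\)")


def parse_articulations(line):
    """Turn '[A rel B][C rel D]...' into (A, symbol, B) triples."""
    triples = []
    for left, relation, right in ART.findall(line):
        if relation not in SYMBOL:
            raise ValueError(f"unsupported relation {relation!r}")
        triples.append((left, SYMBOL[relation], right))
    return triples


def undeclared(triples, *taxonomy_lines):
    """Concept names used in articulations but missing from the taxonomies."""
    declared = set()
    for line in taxonomy_lines:
        prefix = line.split()[1]
        for group in GROUP.findall(line):
            declared.update(f"{prefix}.{name}" for name in group.split())
    used = {name for left, _, right in triples for name in (left, right)}
    return sorted(used - declared)


CEN_TAX = "taxonomy CEN Census_Regions(USA Midwest South West Northeast)"
TZ_TAX = "taxonomy TZ Time_Zone(USA Pacific Mountain Central Eastern)"
CEN_TZ = ("articulations CEN TZ"
          "[CEN.Midwest disjoint TZ.Pacific][CEN.Midwest overlaps TZ.Eastern]"
          "[CEN.Midwest overlaps TZ.Mountain][CEN.Northeast is_included_in TZ.Eastern]"
          "[CEN.South disjoint TZ.Pacific][CEN.South overlaps TZ.Central]"
          "[CEN.South overlaps TZ.Eastern][CEN.South overlaps TZ.Mountain]"
          "[CEN.USA equals TZ.USA][CEN.West disjoint TZ.Central]"
          "[CEN.West disjoint TZ.Eastern][CEN.West overlaps TZ.Mountain]")
```

The parsed input contains 12 articulations: 6 overlaps, 4 disjointness, 1 inclusion, and 1 congruence.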


For another time? Non-unitary syntheses of systematic knowledge

Nico Franz

School of Life Sciences, Arizona State University

CIRSS Seminar – Center for Informatics Research in Science and Scholarship

February 17, 2017 – iSchool, University of Illinois Urbana-Champaign

@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge

Tracing taxonomic names (concepts!) over time …

Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning 11 classifications authored 1889-2015
• 36 unique taxonomic names
• 88 taxonomic concept labels => name sec. author strings
• Alignment by A.S. Weakley => row position = congruence
• 1 of 36 names has a unique 1:1 name:meaning cardinality across all classifications
– Andropogon virginicus
• Source: Franz et al. 2016¹

¹ Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220

http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions
RCC-5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

• Alignment visualization: "grey means taxonomically congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)


One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by the alignment signal propagated from their respective children.
=> Sensible when complete sampling of children is intended.


1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

[Figure: four classifications labeled The 'consensus', The 'bible', The (formerly) federal 'standard', and The 'best', latest regional flora]

"Controlling the taxonomic variable"

Expert views are in conflict

"Just bad"

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Impact: Name-based aggregation has created a novel synthesis that nobody believes in

Expert views are reconciled

Solution: Instead of aggregating an artificial 'consensus', build translation services

Leaving taxon and species headaches …
• To illustrate Euler, think of a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives

• Sorting things out!
– Euler as a taxon concept (& name) “microscope” ...
– .. or “time machine”?

Two Taxonomies: NDC vs CEN

“… in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p. 159]

[Figure: National Diversity Council map (NDC) vs. US Census Bureau map (CEN)]

Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)

The taxonomies

• The Census Regions Map (CEN) consists of four regions (West, Midwest, Northeast, and South) covering the contiguous 48 states and Washington, D.C.

• The National Diversity Council Map (NDC) consists of five regions (West, Southwest, Midwest, Northeast, and Southeast) covering the same 48 states and Washington, D.C.

• NDC splits South into SW and SE

• Do NDC and CEN agree on “West”? “Midwest”? …

• How can we sort this out?

Sorting things out …


• Given:
– taxonomies T1, T2
– and relations T1 ~ T2 (articulations, alignment)
• Find:
– merged taxonomy T3
• Such that:
– T1, T2 are preserved
– all pairwise relations are explicit
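One way to make "all pairwise relations explicit" in T3 is the combined-concepts construction used for the T_CEN and T_TZ solution: an overlapping pair X >< Y is split into X*Y (the shared part), X\Y and Y\X. The sketch below applies that split to two regions from the CEN and NDC input, with state lists copied from the taxonomy definitions; the function name is hypothetical and the sketch covers only a single overlapping pair, not a full merge.

```python
def combined_concepts(x_name, x, y_name, y):
    """Split two overlapping concepts into X*Y, X\\Y and Y\\X (set semantics)."""
    x, y = frozenset(x), frozenset(y)
    if not (x & y and x - y and y - x):
        raise ValueError(f"{x_name} and {y_name} do not overlap (><)")
    return {
        f"{x_name}*{y_name}": x & y,   # shared part
        f"{x_name}\\{y_name}": x - y,  # only in X
        f"{y_name}\\{x_name}": y - x,  # only in Y
    }


CEN_SOUTH = {"AL", "AR", "DE", "DC", "FL", "GA", "KY", "LA", "MD",
             "MS", "NC", "OK", "SC", "TN", "TX", "VA", "WV"}
NDC_SOUTHWEST = {"AZ", "NM", "OK", "TX"}
```

The three parts are pairwise disjoint and together cover X ∪ Y, so each input region can be recovered in T3 as a union of combined concepts; this is what keeps T1 and T2 preserved.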


5 ways to relate concepts (regions)

• Idea: relate concepts X and Y with articulations
• Articulation language: Region Connection Calculus (RCC-5): congruence, inclusion, inverse inclusion, overlap, disjointness
– Congruence: X == Y
– Inclusion: X > Y
– Inverse inclusion: X < Y
– Overlap: X >< Y
– Disjointness: X ! Y
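The five articulations are jointly exhaustive and pairwise disjoint for non-empty regions: any two concepts stand in exactly one of them. The sketch below checks this by brute force over all non-empty subsets of a four-element universe, reading each concept extensionally as a set. Here `X > Y` means X properly includes Y, consistent with `is_included_in` being written `<`. The function names are hypothetical.

```python
from itertools import combinations


def rcc5_predicates(x, y):
    """Truth value of each RCC-5 relation for two non-empty sets x and y."""
    return {
        "==": x == y,                                          # congruence
        ">": x > y,                                            # x properly includes y
        "<": x < y,                                            # x properly included in y
        "><": bool(x & y) and not (x <= y) and not (y <= x),   # overlap
        "!": not (x & y),                                      # disjointness
    }


def rcc5(x, y):
    """The unique RCC-5 relation holding between x and y."""
    holding = [rel for rel, ok in rcc5_predicates(x, y).items() if ok]
    if len(holding) != 1:
        raise ValueError(f"expected exactly one relation, got {holding}")
    return holding[0]


# All non-empty subsets of a 4-element universe, for an exhaustive check.
SUBSETS = [frozenset(c) for r in range(1, 5) for c in combinations(range(4), r)]
```

For instance, `rcc5(frozenset({"IL", "IN"}), frozenset({"IL"}))` returns `">"`, and two regions with no common state yield `"!"`.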


Merged taxonomy T3

[Figure: merged taxonomy T3 for CEN and NDC, with congruent nodes CEN.USA/NDC.USA and CEN.Midwest/NDC.Midwest, the remaining 3 CEN and 4 NDC regions, 8 is_a edges, and 3 overlap edges]

How we align two taxonomies T1 and T2

• Step 1. Supply input taxonomies T1 and T2
• Step 2. Describe the relationships between T1 and T2
• Step 3. Iteratively edit articulations in Euler/X

[Figure: Euler/X takes T1, T2, and the articulations A and returns N possible worlds; N = 1 yields the merged taxonomy T3, while N = 0 (inconsistent) or N > 1 (ambiguous) sends the user back to add or edit articulations]

• … but where do the articulations come from??
– expert opinion
– automatically derived from data

Case 1: Census Region vs. National Diversity Council

Ludäscher:Whole-Tale++

West

South

Midwest

North-east

NDC(withstates)

West

Southwest Southeast

Midwest North-east

CEN NDC

• … but where do the articulationscome from??– automatically derived from data– expert input

67

Ludäscher:Whole-Tale++

CEN.IL NDC.IL==

CEN.IN NDC.IN==

CEN.RI NDC.RI==

CEN.IA NDC.IA==

CEN.WV NDC.WV==

CEN.KS NDC.KS==

CEN.KY NDC.KY==

CEN.TX NDC.TX==

CEN.NortheastCEN.VTCEN.MA

CEN.ME

CEN.CT

CEN.PA

CEN.NY

CEN.NH

CEN.NJ

CEN.South

CEN.TN

CEN.MS

CEN.MD

CEN.DC

CEN.DE

CEN.VA

CEN.FL

CEN.AR

CEN.AL

CEN.OK

CEN.SC

CEN.LACEN.GA

CEN.NC

CEN.ID NDC.ID==

NDC.TN==

CEN.WY NDC.WY==

NDC.VT==

NDC.MS==

CEN.MT NDC.MT==

NDC.MA==

CEN.USA

CEN.Midwest

CEN.West

NDC.ME==

NDC.MD==

CEN.MI NDC.MI==

CEN.MN NDC.MN==

NDC.DC==

NDC.DE==

CEN.OR NDC.OR==

CEN.OH NDC.OH==

NDC.VA==

NDC.FL==

NDC.AR==

CEN.AZ NDC.AZ==

NDC.AL==

NDC.OK==

NDC.CT==

CEN.CO NDC.CO==

CEN.CA NDC.CA==

CEN.SD NDC.SD==

NDC.SC==

CEN.MO

CEN.ND

CEN.NE

CEN.WI

NDC.LA==

NDC.MO==

CEN.UT NDC.UT==

NDC.GA==

NDC.PA==

CEN.NV

CEN.NM

CEN.WA

NDC.NY==

NDC.NV==

NDC.NM==

NDC.WA==

NDC.NH==

NDC.NJ==

NDC.ND==

NDC.NE==

NDC.WI==

NDC.NC==

NDC.West

NDC.Midwest

NDC.Northeast

NDC.Southeast

NDC.USA

NDC.Southwest

Nodes

CEN 54NDC 55 Edges

isa_CEN 53isa_NDC 54Art. 49


[Figure: alignment output for Case 1: CEN.USA/NDC.USA, CEN.Midwest/NDC.Midwest and all 49 state pairs (e.g., CEN.IL/NDC.IL) collapse into combined nodes; CEN.West, CEN.South, CEN.Northeast and NDC.West, NDC.Southwest, NDC.Southeast, NDC.Northeast remain separate. Nodes: CEN 3, NDC 4, combined 51. Edges: input 61, inferred 3, overlaps inferred 3.]


USA, Midwest and state-level alignments are all congruent


[Figure: Case 1 alignment output. Nodes: CEN 3, NDC 4, combined 51. Edges: input 61, inferred 3, overlaps inferred 3.]


The overlapping relations are automatically derived from data
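When both taxonomies are defined over the same members (here, states), region-level articulations can be computed instead of asserted. A minimal sketch, assuming each region is given as a set of state codes; the `CEN`/`NDC` dictionaries below are hypothetical toy data, not the actual region definitions used on the slides, and `derive_articulations` is an illustrative name.

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets a and b."""
    if not a & b:
        return "!"   # disjoint
    if a == b:
        return "=="  # congruent
    if a < b:
        return "<"   # a properly included in b
    if a > b:
        return ">"   # a properly includes b
    return "><"      # partial overlap


def derive_articulations(left, right):
    """RCC-5 relation for every non-disjoint pair of regions.

    left and right map region names to sets of member units (e.g. state codes).
    """
    result = {}
    for left_name, left_members in left.items():
        for right_name, right_members in right.items():
            rel = rcc5(left_members, right_members)
            if rel != "!":  # omit disjoint pairs to keep the output short
                result[(left_name, right_name)] = rel
    return result


# Hypothetical toy data, NOT the actual CEN/NDC region definitions.
CEN = {"CEN.South": {"DC", "VA", "TX"}, "CEN.Northeast": {"NY", "PA"}}
NDC = {
    "NDC.Southeast": {"VA"},
    "NDC.Southwest": {"TX"},
    "NDC.Northeast": {"NY", "PA", "DC"},
}

for pair, rel in derive_articulations(CEN, NDC).items():
    print(pair, rel)
```

In this toy example a single shared member (DC) is enough to make CEN.South and NDC.Northeast partially overlap (><), while CEN.Northeast is properly included in NDC.Northeast (<).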


[Figure: Case 1 alignment output. Nodes: CEN 3, NDC 4, combined 51. Edges: input 61, inferred 3, overlaps inferred 3.]


DC is in both the South and the Northeast


Case 2: Census Region vs. Time Zone


[Figure: maps of the Census regions (CEN: West, South, Midwest, Northeast) and the US time zones (TZ: Pacific, Mountain, Central, Eastern)]

• … but where do the articulations come from??
  – automatically derived from data
  – expert input


[Figure, Input: CEN and TZ hierarchies with 12 region-level articulations, including CEN.USA == TZ.USA, CEN.Northeast < TZ.Eastern, and several overlap (><) and disjointness (!) relations. Nodes: CEN 5, TZ 5. Edges: isa_CEN 4, isa_TZ 4, articulations 12.]

[Figure, Output (one possible world): CEN.USA and TZ.USA merged into one combined node; the four Census regions and the four time zones remain separate. Nodes: CEN 4, combined 1, TZ 4. Edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1.]

Top-down regional alignment


How do we know if our ‘expert articulations’ are correct?


[Figure: map overlay of the Census regions and time zones, partitioned into regions R1–R9]

GIS solution as the ground truth
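One way to check expert articulations against a ground truth is to derive each relation from the extensions and report the disagreements. A minimal sketch, assuming the GIS overlay has been reduced to sets of hypothetical map cells (`c1`, `c2`, ...); the cell assignments, the expert input and the function name `check_articulations` are illustrative only.

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets a and b."""
    if not a & b:
        return "!"   # disjoint
    if a == b:
        return "=="  # congruent
    if a < b:
        return "<"   # a properly included in b
    if a > b:
        return ">"   # a properly includes b
    return "><"      # partial overlap


def check_articulations(expert, left, right):
    """List (pair, asserted, derived) for every expert articulation
    that disagrees with the relation derived from ground-truth extensions."""
    mismatches = []
    for (left_name, right_name), asserted in expert.items():
        derived = rcc5(left[left_name], right[right_name])
        if derived != asserted:
            mismatches.append(((left_name, right_name), asserted, derived))
    return mismatches


# Hypothetical ground truth: each region as a set of map cells.
CEN = {"CEN.West": {"c1", "c2", "c3"}}
TZ = {"TZ.Pacific": {"c1"}, "TZ.Mountain": {"c2", "c3", "c4"}}

# Hypothetical expert input; the second articulation contradicts the toy ground truth.
EXPERT = {
    ("CEN.West", "TZ.Pacific"): ">",
    ("CEN.West", "TZ.Mountain"): "<",
}

for pair, asserted, derived in check_articulations(EXPERT, CEN, TZ):
    print(f"{pair}: expert says {asserted}, ground truth gives {derived}")
```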


[Figure: regions R1–R9 and the combined-concepts graph, with nodes such as CEN.South*TZ.Central, CEN.South\TZ.Eastern, CEN.Midwest*TZ.Eastern and TZ.Mountain\CEN.West. Nodes: CEN 4, new combined 18, combined 1, TZ 4. Edges: input 6, inferred 37.]

Combined-concepts solution for regional-level alignments
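The combined concepts use `*` for the intersection and `\` for the difference of two overlapping regions (e.g., CEN.South*TZ.Central, CEN.South\TZ.Eastern). A minimal sketch of how such concepts could be generated from extensions, assuming regions are sets of hypothetical map cells; it is not the Euler/X algorithm, and `combined_concepts` is an illustrative name.

```python
def combined_concepts(left, right):
    """Combined concepts for every partially overlapping (><) pair of regions.

    Names follow the slide notation: A*B is the intersection of A and B,
    and A\\B is the part of A outside B.
    """
    combined = {}
    for left_name, a in left.items():
        for right_name, b in right.items():
            both, only_a, only_b = a & b, a - b, b - a
            if both and only_a and only_b:  # partial overlap only
                combined[f"{left_name}*{right_name}"] = both
                combined[f"{left_name}\\{right_name}"] = only_a
                combined[f"{right_name}\\{left_name}"] = only_b
    return combined


# Hypothetical toy data: regions as sets of map cells.
CEN = {"CEN.South": {"c1", "c2", "c3"}, "CEN.Midwest": {"c4", "c5"}}
TZ = {"TZ.Eastern": {"c1", "c4"}, "TZ.Central": {"c2", "c3", "c5"}}

for name, cells in combined_concepts(CEN, TZ).items():
    print(name, sorted(cells))
```

Several generated names can denote the same extension (in the toy data, CEN.South\TZ.Eastern and CEN.South*TZ.Central both cover c2 and c3); a fuller treatment would merge such names into one node.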


Do the taxonomies have to be spatial in order to use RCC-5?

• No! The more typical cases for taxonomy alignment are usually between non-spatial taxonomies
  – for which no “GIS route” or direct visual cues about regional extensions are available
  – the use of RCC-5 as an alignment vocabulary is a suitable approach to perform a wide range of multi-hierarchy reconciliations


Conclusion & Discussion
• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
  – Bottom-up: starts from non-overlapping relationships among the lowest-level articulations, but it is not clear how to align the higher-level concepts
  – Top-down: suited to cases where leaf-level relations often overlap; expert input will frequently be needed to establish such expectations under the top-down approach
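In the bottom-up workflow, leaf-level congruences (e.g., CEN.IL == NDC.IL) can be lifted to higher levels by computing each concept's extension as the set of leaves below it. A minimal sketch, assuming every leaf has an exact counterpart in the other taxonomy; the trees are a hypothetical four-state excerpt, the NDC groupings are illustrative rather than the actual NDC regions, and `leaf_extension`/`lift` are hypothetical names.

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets a and b."""
    if not a & b:
        return "!"   # disjoint
    if a == b:
        return "=="  # congruent
    if a < b:
        return "<"   # a properly included in b
    if a > b:
        return ">"   # a properly includes b
    return "><"      # partial overlap


def leaf_extension(tree, node, leaf_id):
    """Shared leaf identifiers below node; tree maps each parent to its children."""
    children = tree.get(node, [])
    if not children:  # a leaf: use its shared identifier
        return {leaf_id[node]}
    extension = set()
    for child in children:
        extension |= leaf_extension(tree, child, leaf_id)
    return extension


def lift(tree_a, tree_b, leaf_id):
    """RCC-5 relations between all non-leaf concepts (disjoint pairs omitted)."""
    relations = {}
    for x in tree_a:  # dict keys are the non-leaf concepts
        for y in tree_b:
            rel = rcc5(leaf_extension(tree_a, x, leaf_id),
                       leaf_extension(tree_b, y, leaf_id))
            if rel != "!":
                relations[(x, y)] = rel
    return relations


CEN_TREE = {
    "CEN.USA": ["CEN.Midwest", "CEN.South"],
    "CEN.Midwest": ["CEN.IL", "CEN.OH"],
    "CEN.South": ["CEN.TX", "CEN.VA"],
}
# Hypothetical NDC excerpt for illustration only.
NDC_TREE = {
    "NDC.USA": ["NDC.Midwest", "NDC.Southwest", "NDC.Southeast"],
    "NDC.Midwest": ["NDC.IL", "NDC.OH"],
    "NDC.Southwest": ["NDC.TX"],
    "NDC.Southeast": ["NDC.VA"],
}
# Leaf articulations CEN.XX == NDC.XX, expressed as one shared identifier per state.
LEAF_ID = {f"{prefix}.{state}": state
           for prefix in ("CEN", "NDC") for state in ("IL", "OH", "TX", "VA")}

for pair, rel in lift(CEN_TREE, NDC_TREE, LEAF_ID).items():
    print(pair, rel)
```

Here the lifted relations make CEN.USA == NDC.USA and CEN.Midwest == NDC.Midwest, while CEN.South properly includes both NDC.Southwest and NDC.Southeast; with overlapping leaves this simple lifting no longer applies, which is where the top-down workflow and expert input come in.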


https://github.com/EulerProject/ [email protected]


Implications

• Logic-based taxonomy alignment approach
  – Disambiguates name-based taxonomy alignment over time
    • 40% of the concepts in biology taxonomies undergo name changes over time (Franz et al., 2016)
  – May mitigate problems in equivalent crosswalking
    • e.g., the membership-condition problem that was often criticized in crosswalking
  – Preserves the original taxonomies while providing an alignment view
• Solves data integration problems that arise in the more coarse-grained relative crosswalking


Some EulerX History

• … Aristotle …
• … Euler …
• …
• … Greg Whitbread …

• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.

• [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.

• [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.

• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.

• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.

• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.

• …