Third Provenance Challenge University of Texas at El Paso Team’s Presentation


Post on 13-Jan-2016






Third Provenance Challenge: University of Texas at El Paso Team's Presentation. Team: Paulo Pinheiro da Silva, Nicholas Del Rio, Leonardo Salayandia. Presenter: James Michaelis (RPI). Overview. UTEP Approach: Process and Provenance Separation.


  • Third Provenance Challenge: University of Texas at El Paso Team's Presentation

    Team: Paulo Pinheiro da Silva, Nicholas Del Rio, Leonardo Salayandia

    Presenter: James Michaelis (RPI)

  • Overview

    UTEP Approach: Process and Provenance Separation
    Process: Workflow-Driven Ontologies (WDO) and Semantic Abstract Workflow (SAW)
    PC3 WDO and SAWs
    Provenance: Proof Markup Language (PML)
    PC3 PML
    Capturing PC3 PML
    Answering PC3 Questions
    Conclusions

  • UTEP Approach

    Unlike OPM, which treats process and provenance knowledge together, UTEP uses Inference Web technology, which explicitly separates process knowledge from provenance knowledge.
    Inference Web's provenance work was originally developed for theorem provers rather than scientific workflows.
    Inference Web has since been extended to support scientific workflows.
    The separation between process and provenance has been preserved; it is considered beneficial because many provenance scenarios involve no process knowledge.
    Process knowledge: Workflow-Driven Ontology (WDO) and Semantic Abstract Workflow (SAW)
    Provenance knowledge: Proof Markup Language (PML)

  • WDOs and SAWs

    WDOs are OWL-based ontologies used to represent process-related concepts, each classified as either Data or Method.
    WDO concepts can be created, or reused from other domain ontologies, as needed while specifying processes.
    SAWs are built from instances of WDO concepts connected through isInputTo and isOutputOf relations (and their inverses).
    WDO-It! is a graphical editor for WDOs and SAWs.
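
As a rough illustration of the SAW structure described on this slide, the sketch below stores Data and Method instances as triples linked by isInputTo and isOutputOf. It is a minimal in-memory stand-in, not the OWL encoding that WDO-It! produces; the class name `SAW`, its helper methods, and the instance names (`CSVFile`, `ReadCSVFile`, and so on) are all hypothetical.

```python
class SAW:
    """Tiny in-memory stand-in for a Semantic Abstract Workflow (illustrative only)."""

    def __init__(self):
        # Each triple is (data_instance, relation, method_instance).
        self.triples = set()

    def is_input_to(self, data, method):
        self.triples.add((data, "isInputTo", method))

    def is_output_of(self, data, method):
        self.triples.add((data, "isOutputOf", method))

    def inputs_of(self, method):
        return sorted(d for d, p, m in self.triples
                      if p == "isInputTo" and m == method)

    def outputs_of(self, method):
        return sorted(d for d, p, m in self.triples
                      if p == "isOutputOf" and m == method)

    def downstream_methods(self, method):
        """Methods that consume at least one output of `method`."""
        outs = set(self.outputs_of(method))
        return sorted(m for d, p, m in self.triples
                      if p == "isInputTo" and d in outs)


# Hypothetical two-step workflow: read a CSV file, then load it into a database.
saw = SAW()
saw.is_input_to("CSVFile", "ReadCSVFile")
saw.is_output_of("ParsedRows", "ReadCSVFile")
saw.is_input_to("ParsedRows", "LoadIntoDatabase")
saw.is_output_of("LoadedTable", "LoadIntoDatabase")
```

Because every edge joins a Data instance to a Method instance, two methods are chained only through the data they share, which is what `downstream_methods` relies on.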

  • PC3 Semantic Abstract Workflow

    WDO Data instances
    WDO Method instances
    PML-P Agent instances: Data comes from or goes to a PML-P Agent
    Data isOutputOf Method
    Data isInputTo Method
    Abstraction at multiple levels of detail

  • Proof Markup Language (PML)

    PML is an OWL-based ontology composed of three modules:
    PML-J (justifications): used to build information manipulation traces (or justifications) for a given response (or result)
    PML-P (provenance): used to annotate PML-J documents with metadata about sources, methods (called inference rules), and agents
    PML-T (trust): used to annotate PML-J with trust and belief metadata about agents and conclusions
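
To make the division of labor among the three modules concrete, here is a simplified sketch of a justification trace. It only loosely mirrors PML: the class and field names (`Information`, `InferenceStep`, `NodeSet`, `belief`) are chosen for this illustration and are not the OWL schema, and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Information:
    """PML-P style metadata: identifies a piece of data (a conclusion)."""
    uri: str


@dataclass
class InferenceStep:
    """PML-J style step: how a conclusion was derived."""
    rule: str    # PML-P style method (an "inference rule")
    agent: str   # PML-P style agent that applied the rule
    antecedents: List["NodeSet"] = field(default_factory=list)


@dataclass
class NodeSet:
    """PML-J style node: a conclusion plus the step that justifies it."""
    conclusion: Information
    step: Optional[InferenceStep] = None   # None means asserted source data
    belief: Optional[float] = None         # PML-T style trust annotation


def trace(node: NodeSet, depth: int = 0) -> List[str]:
    """Render the justification tree rooted at `node`, one line per node."""
    label = node.step.rule if node.step else "asserted"
    lines = ["  " * depth + f"{node.conclusion.uri} <- {label}"]
    if node.step:
        for antecedent in node.step.antecedents:
            lines.extend(trace(antecedent, depth + 1))
    return lines


# Hypothetical trace: a table derived from a source file by one method.
source = NodeSet(Information("CSVFile"))
result = NodeSet(
    Information("LoadedTable"),
    InferenceStep(rule="LoadIntoDatabase", agent="LoaderService",
                  antecedents=[source]),
    belief=0.9,
)
```

Calling `trace(result)` walks from the result back to its asserted sources, which is the "information manipulation trace" role the slide assigns to PML-J.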

  • PC3 PML Encoding

    [Figure: PML encoding of PC3, with elements labeled by their OPM counterparts: OPM:Artifact, OPM:Process, OPM:WasGeneratedBy, OPM:WasControlledBy]

  • PML Capture

    From a given SAW, WDO-It! offers two options for generating code that captures provenance:
    Generate PML wrappers: used for run-time capture of provenance
    Generate PML data annotators: used for post-execution generation of provenance
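
The wrapper option can be pictured as code that surrounds each workflow method and records a justification every time the method runs. The sketch below is an assumption about the general shape of such a wrapper, not the code WDO-It! generates; `pml_wrapper`, `PROVENANCE_LOG`, the record fields, and the two example methods are hypothetical.

```python
import functools

# Hypothetical in-memory store; a real wrapper would emit PML documents instead.
PROVENANCE_LOG = []


def pml_wrapper(rule_name, agent="hypothetical-agent"):
    """Wrap a method so each call records its inputs, output, and rule name."""
    def decorate(func):
        @functools.wraps(func)
        def wrapped(*args):
            result = func(*args)
            PROVENANCE_LOG.append({
                "conclusion": result,       # what the step produced
                "rule": rule_name,          # which method produced it
                "agent": agent,             # who or what ran the method
                "antecedents": list(args),  # what the step consumed
            })
            return result
        return wrapped
    return decorate


@pml_wrapper("ParseRows")
def parse_rows(text):
    return tuple(line.split(",") for line in text.splitlines())


@pml_wrapper("CountRows")
def count_rows(rows):
    return len(rows)


# Running the workflow captures provenance as a side effect.
row_count = count_rows(parse_rows("a,b\nc,d"))
```

A data annotator, by contrast, would build the same kind of record after execution from outputs that were already stored, so the workflow code itself would not need to be wrapped.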

  • Answering PC3 Questions: What processing steps were used?

    SPARQL can be used to query the PML provenance graph.
    This example shows how a SPARQL query could use the PML graph to determine which processing steps were used to generate a given artifact.
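
The slide does not include the query itself. As a stand-in, the sketch below shows the traversal such a query would have to express: start from an artifact and follow justification links backwards, collecting the method used at each step. It runs over plain Python tuples rather than an RDF graph, and the record layout and all names are hypothetical.

```python
# Each record is (conclusion, rule, antecedents); all names are hypothetical.
STEPS = [
    ("LoadedTable", "LoadIntoDatabase", ["ParsedRows"]),
    ("ParsedRows", "ReadCSVFile", ["CSVFile"]),
    ("OtherTable", "UnrelatedStep", ["OtherFile"]),
]


def steps_used(artifact, steps):
    """Return the rules on the derivation path of `artifact`, nearest first."""
    by_conclusion = {c: (rule, ants) for c, rule, ants in steps}
    used, stack, seen = [], [artifact], set()
    while stack:
        item = stack.pop()
        # Skip items already visited and source data with no recorded step.
        if item in seen or item not in by_conclusion:
            continue
        seen.add(item)
        rule, antecedents = by_conclusion[item]
        used.append(rule)
        stack.extend(antecedents)
    return used
```

For example, `steps_used("LoadedTable", STEPS)` collects the two steps on that artifact's derivation path and leaves out `UnrelatedStep`. In SPARQL, the same backward walk would typically be written with a repeated (transitive) graph pattern over the justification links.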

  • Conclusion

    The full encoding of the WDO, SAWs, and PML for PC3 was done in 36 hours.
    UTEP's approach relies on tools to:
    Understand and speed up the encoding of process knowledge (as WDOs and SAWs)
    Use process knowledge to create PML wrappers and/or PML data annotators
    Visualize and browse provenance
    Use provenance for explanations, trust computation, data discovery, etc.

  • Acknowledgements

    UTEP would like to thank James Michaelis for his effort to understand our work and represent our team at the 3rd Provenance Challenge.
    UTEP would like to thank the 3rd Provenance Challenge organizers, and Paul Groth in particular, for creating an opportunity for our team to be represented at the event.

    QUESTION 1: Could I get a 1-2 sentence definition of your term "Process" as a reference? I'm going to need to distinguish this from the OPM concept "Process" during the talk.

    NOTE 1: The last 2 bullets seem redundant with the previous slide. Should they be removed?
    NOTE 2: Bullet 1.3 seems redundant with the description in bullet 1. It might make sense to remove it.
    NOTE 3: We should include citations for Inference Web at this point (perhaps a URL to

    QUESTION 2: Do you have an example WDO ontology (an OWL file) I can reference? I'm not sure if you have one up on your PC3 page, but I can't load it right now since the PC3 server isn't responding.

    QUESTION 3: Should Document:EndResult be classified as a PMLP:Source instead? It would seem more intuitive to me to do this.
    QUESTION 4: When WDO data refers to a named variable, is the value of concern?

    QUESTION 5: In the bullet PMLJ, what is meant by "response"? Is "response" equivalent to "conclusion" in the PMLT bullet?

    NOTE 4: In the PMLP bullet, we should probably mention PMLP:information somewhere.
    NOTE 5: It seems the list of items provided is not complete. Maybe we should tack "etc." on at the end.

    QUESTION 6: For the oval corresponding to OPM:Artifact, are you mapping to PMLJ:NodeSet + PMLP:Information, or just PMLP:Information? Based on our earlier telecons, I think Paulo recommended we leave this as PMLP:Information.
    QUESTION 7: There is one oval without an explanation. What does this map to?

    QUESTION 8: Of these two options, which were directly used in the Provenance Challenge? And for which purposes?

    NOTE 6: In general, I'm uncertain as to what to discuss here for the talk. Could I get a paragraph on what I should say?

    QUESTION 10: Which of the Provenance Challenge queries does this example correspond to?
    QUESTION 11: Does the data described on this slide correspond to the SAW (slide 5) and/or PML (slide 7) given?

    NOTE 7: You mention using SPARQL, but don't provide query syntax for this example. Perhaps this should be included.
    NOTE 8: Since we have a Probe-it screenshot, we should include a bullet on it in this slide or a previous one (plus a reference).
    NOTE 9: For this, more details on the graph layout coloring scheme are needed. If I recall correctly, orange nodes correspond to source nodes in a tree or DAG. Is this right?
    NOTE 10: Also, some more information on the red arcs could be useful. Do these correspond to a specific query you developed?
    NOTE 11: As with slide 8, I'm not sure how I would talk the audience through this. Could I also get a discussion paragraph for this slide?

    QUESTION 12: Finishing up with the slides, I am not quite sure how the Process data relates to the Provenance data. Are the two completely independent? If they reference each other, in what ways does this happen?

    NOTE 12: When I discuss this slide, I assume I should emphasize that this work didn't require much effort, given the amount of time committed.