The Case for Browser Provenance

Daniel W. Margo and Margo Seltzer

Harvard School of Engineering and Applied Sciences

Overview

• Problem: Browser Data Management

• Solution: Provenance for Web Browsers

• Use Cases

• Details and Challenges

• Implementation

The Modern Browser: A Super-Application

• Originally a distributed document reader.
• But now most documents are distributed.
• And the definition of “document” has changed:
  – Webmail
  – YouTube
  – Google Apps
• It is difficult for users to manage all this data.
  – e.g., recall a specific web page.

Browser Data Management (I)

• A “little big data” problem…
  – My history: ~25k objects in ~2 months.
  – Tractable for computers, but not for users.
• Traditional solution: Bookmarks.
  – Requires users to tag their data in advance…
  – …and to manage the bookmarks.
• Advanced solutions:
  – History Search (Google Chrome’s “New Tab” page)
  – Autocompletion (form history, saved passwords)

Browser Data Management (II)

• Firefox 3’s “Smart Location Bar”
  (screenshot from http://support.mozilla.com/en-US/kb/Smart+Location+Bar)
• Most solutions are powered by history and usage statistics.
• “History and usage statistics” = provenance.

Traditional Browser History

Web Graphs (Firefox 3 Places)

Browser Provenance

Browser Provenance

Use Case: Contextual History Search

• Most history search is textual.
• Edges imply contextual relationships.
  – e.g., “rosebud” → “Citizen Kane”.
• 2-phase contextual search (Shah et al.):
  – Perform a textual history search.
  – Then, push the weight of results to neighbors.
• Similar to modern web search…
  – And good for the same reasons.
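The two phases above can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm of Shah et al.: the substring-count text score, the undirected treatment of edges, and the `damping` share are all choices made for this sketch, and the page names are hypothetical.

```python
from collections import defaultdict


def contextual_search(query, pages, edges, damping=0.5):
    """Rank history entries by text match plus weight pushed from neighbors.

    pages: dict mapping a page id to its text (hypothetical history store).
    edges: iterable of (source, destination) provenance edges.
    damping: fraction of a hit's score shared among its neighbors (assumed).
    """
    # Phase 1: plain textual search (here, a case-insensitive substring count).
    needle = query.lower()
    text_scores = {pid: float(text.lower().count(needle))
                   for pid, text in pages.items()}

    # Treat provenance edges as undirected for the purpose of "context".
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Phase 2: each textual hit pushes part of its weight to its neighbors.
    final = dict(text_scores)
    for pid, score in text_scores.items():
        if score > 0 and neighbors[pid]:
            share = damping * score / len(neighbors[pid])
            for other in neighbors[pid]:
                final[other] = final.get(other, 0.0) + share

    hits = [(pid, score) for pid, score in final.items() if score > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


pages = {
    "search?q=rosebud": "rosebud",
    "citizen-kane": "Citizen Kane (1941 film)",
    "weather": "Local forecast",
}
edges = [("search?q=rosebud", "citizen-kane")]
results = contextual_search("rosebud", pages, edges)
```

In this toy history the “Citizen Kane” page never mentions the query term, yet it is returned because it is one edge away from a page that does.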

Use Case: Personalizing Web Search

• Context is created by the user.
  – So a gardener relates “rosebud” → “flower”.
  – Frustrating if Google returns “Citizen Kane”.
• Browser could clarify context to search engine!
  – Naïve: Just insert “flower” into “rosebud” searches.
  – If the engine had a better interface, we could do better.
• Personalization with privacy.
  – Browser knows more about user than cookies can.
  – No need to give third parties raw personal data.
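The naïve approach mentioned above might look like the following. It is a sketch only: choosing the context term by co-occurrence in visited page titles is an assumption made here for illustration, and `augment_query` is a hypothetical helper, not a browser or search-engine API.

```python
from collections import Counter


def augment_query(query, visited_titles):
    """Append the word that most often co-occurs with the query in history.

    visited_titles: list of page titles from the local history (assumed input).
    Only the augmented query string would leave the browser, not the history.
    """
    term = query.lower()
    counts = Counter()
    for title in visited_titles:
        words = title.lower().split()
        if term in words:
            counts.update(word for word in words if word != term)
    if not counts:
        return query  # no local context, send the query unchanged
    # Highest count wins; ties are broken alphabetically for determinism.
    best = min(counts, key=lambda word: (-counts[word], word))
    return f"{query} {best}"


titles = ["rosebud flower care", "pruning a rosebud flower", "garden tools"]
personalized = augment_query("rosebud", titles)
```

A real implementation would at least filter stop words and use the provenance graph rather than titles alone.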

Use Case: Time-Contextual History Search

• Current histories can’t recreate prior state.
  – e.g., “were these two pages open simultaneously?”
• Time relationships…
  – Are natural: “rosebud, and I think I was also looking at gardening tools around that time.”
  – Narrow the search space a great deal.
• Related Work:
  – Gyllstrom and Soules’ “Seetrieve”
  – Dumais et al.’s “Stuff I’ve Seen”
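If the browser recorded when each page was opened and closed, the “open simultaneously” question reduces to an interval-overlap test. The sketch below assumes such records exist; the `Visit` type and its fields are hypothetical, since the slides note that current histories do not keep this state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One page view with open and close timestamps (hypothetical record)."""
    url: str
    opened: float  # seconds since some epoch
    closed: float


def open_simultaneously(a, b):
    """True if the two visits overlap in time (half-open intervals)."""
    return a.opened < b.closed and b.opened < a.closed


def open_alongside(target, visits):
    """URLs of all other visits that were open at the same time as target."""
    return [v.url for v in visits
            if v is not target and open_simultaneously(target, v)]


rosebud = Visit("search?q=rosebud", opened=0, closed=100)
tools = Visit("gardening-tools", opened=50, closed=150)
news = Visit("news", opened=200, closed=300)
alongside = open_alongside(rosebud, [rosebud, tools, news])
```

Filtering a textual search by this predicate is one way the time relationship could narrow the search space.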

Use Case: Download Lineage

• Need to know where data comes from.
  – For source attribution, finding updates, etc.
• URL is not always sufficient.
  – “This image came from…ImageShack!”
• This is exactly what provenance is for!
  – Just query ancestors!
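“Query ancestors” can be read as a graph traversal from the downloaded file back through its sources. The following is a minimal sketch assuming provenance is available as a mapping from each object to its immediate sources; the `parents` mapping and every name in it are hypothetical.

```python
from collections import deque


def ancestors(node, parents):
    """Return every ancestor of node, nearest first (breadth-first order).

    parents: dict mapping an object to the list of objects it came from
    (an assumed in-memory form of the provenance graph).
    """
    found = []
    visited = set()
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue  # guard against cycles and shared ancestors
        visited.add(current)
        found.append(current)
        queue.extend(parents.get(current, []))
    return found


parents = {
    "Downloads/cat.jpg": ["imagehost/cat.jpg"],
    "imagehost/cat.jpg": ["forum/thread"],
    "forum/thread": ["search?q=cats"],
}
lineage = ancestors("Downloads/cat.jpg", parents)
```

The image host's URL is only the first ancestor; the forum thread and the search that led to it are what the file's own URL cannot show.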

Conclusion

• Browsers record many statistics.

• These statistics are provenance records.

• Provenance techniques can improve:
  – History search, via context.
  – Web search, via personalization.
  – Data management, via lineage.

• Some details in the paper.

• Excruciating details in future work.