Remembrance: The Unbearable Sentience of Being Digital
Ragib Hasan*, Radu Sion, and Marianne Winslett
University of Illinois at Urbana-Champaign; Stony Brook University
4th Biennial Conference on Innovative Data Systems Research (CIDR)
January 4-7, 2009
2
What is the difference between …
Data, the android from Star Trek
and …
Data, stored in databases or file cabinets
A file, or a database tuple, is a dumb container of values
Data, the robot, can remember and has sentience
4th Conference on Innovative Data Systems Research (CIDR) 2009
3
Our Data objects suffer from Amnesia
Since the early days, our data processing model has assumed data containers (tuples, files, variables) to be oblivious of their past
We assume a data object to know only its present value, and not recall its old values, states, or context information
4
Our current data objects are not sentient
• Database tuples returned in query results only show their latest values
• Data processing and evaluation are based only on the current state of variables
• Data objects do not tell us their historical states, or how they were created, processed, transmitted
• We cannot pick a system, turn a knob, and time-travel to 5 minutes in the past
5
Exploring remembrance in various forms
Databases
• Time-travel / transaction-time databases
• Temporal SQL
• Checkpointing
Scientific computing
• Provenance
• Lineage
File systems
• Versioning file systems
• CVS and other SCMs
• WORM storage
Web
• Wayback Machine archive
• Gmail
Systems & languages
• Reflective systems, self-managed systems
• Time-traveling virtual machines
6
However …
• Most of these systems are, in essence, versioning systems
– Memory / history is not an intrinsic property of data
– Association between a data value and its history is kept externally
• These solutions are also isolated, piecemeal, and glued together by our original single-valued, oblivious data paradigm
7
Remembrant Computing
• We propose a new data paradigm, where
– Data objects retain their memories as an intrinsic property
– History, context, temporal events can be recalled
– Past (memory) and present (value) are considered as an atomic unit of data
• Files recall their past contents
• Variables remember their past values and context (e.g., x = 5, then x = 10)
• Hard disk blocks recall past content
• Queries return tuple objects which remember their past context, value, states
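The idea of a variable that remembers its past values and context can be illustrated with a minimal sketch. The slides do not specify an implementation; the names `Memory`, `RemembrantVar`, `recall`, and `value_at`, and the use of a shared logical clock, are all assumptions made for this example.

```python
import itertools
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Memory:
    """One remembered state: a past value plus the context it was set in."""
    tick: int      # logical timestamp (assumption; could be wall-clock time)
    value: Any
    context: str


class RemembrantVar:
    """Hypothetical variable whose history is part of the object itself."""

    _clock = itertools.count(1)  # shared logical clock (assumption of this sketch)

    def __init__(self, value: Any, context: str = "created") -> None:
        self._memories: List[Memory] = []
        self.set(value, context)

    def set(self, value: Any, context: str = "assigned") -> None:
        # Assigning never overwrites; it appends a new memory.
        self._memories.append(Memory(next(self._clock), value, context))

    @property
    def value(self) -> Any:
        # The present value is simply the most recent memory.
        return self._memories[-1].value

    def recall(self) -> List[Memory]:
        # Return a copy so callers cannot rewrite the past.
        return list(self._memories)

    def value_at(self, tick: int) -> Any:
        # "Time travel": the value as of a logical time, or None if not yet created.
        found = None
        for memory in self._memories:
            if memory.tick <= tick:
                found = memory
        return found.value if found else None


x = RemembrantVar(5, context="initial load")
first_tick = x.recall()[0].tick
x.set(10, context="user update")
print(x.value)                          # 10
print([m.value for m in x.recall()])    # [5, 10]
print(x.value_at(first_tick))           # 5
```

Here past and present are stored in one structure, so the current value cannot be separated from its history, which is one way to read "atomic unit of data".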
8
Remembrant Computing
• When data objects are transferred, they retain their old memories
• Copies retain memory of the original, along with copying context
• Deletions remove the value from container, but the memories may live on
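The transfer, copy, and delete behavior listed above could be sketched as follows. This is an illustration only: `RemembrantObject` and its methods are hypothetical names, and representing a memory as an `(event, detail)` pair is an assumption of the sketch.

```python
from typing import Any, List, Optional, Tuple

MemoryEntry = Tuple[str, str]  # (event, detail); hypothetical representation


class RemembrantObject:
    """Hypothetical container: a present value plus the memories that travel with it."""

    def __init__(self, value: Any, memories: Optional[List[MemoryEntry]] = None) -> None:
        self.value = value
        self.memories: List[MemoryEntry] = (
            list(memories) if memories else [("created", repr(value))]
        )

    def transfer(self, destination: str) -> "RemembrantObject":
        # The object moves, keeping its old memories and adding the transfer event.
        self.memories.append(("transferred", destination))
        return self

    def copy(self, context: str) -> "RemembrantObject":
        # The copy inherits the original's memories plus the copying context.
        return RemembrantObject(self.value, self.memories + [("copied", context)])

    def delete(self) -> None:
        # The value leaves the container; the memories live on.
        self.memories.append(("deleted", repr(self.value)))
        self.value = None


report = RemembrantObject("Q3 figures")
duplicate = report.copy("emailed to audit")
report.delete()
print(report.value)        # None
print(report.memories)     # [('created', "'Q3 figures'"), ('deleted', "'Q3 figures'")]
print(duplicate.memories)  # [('created', "'Q3 figures'"), ('copied', 'emailed to audit')]
```

Whether a deleted object's memories should keep the old value itself, as this sketch does, is a policy choice the slides leave open.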
9
But, what’s the point of remembering?
• “Time-aware knowledge”
• Associative memories
• More expressivity in data processing
– Compute based on not only present value, but historic information, derivation, lineage, provenance
– Mine useful patterns
• Taint analysis / information flow checking
• Recover from transient errors at arbitrary granularities
• Time-travel seamlessly to any point in an application, system, or website
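As one illustration of computing on lineage rather than on the present value alone, the sketch below propagates source labels through an addition and then checks for taint. The `Tracked` type, the `is_tainted` helper, and the source names are hypothetical; the slides do not prescribe a mechanism.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Tracked:
    """Hypothetical value that remembers which sources it was derived from."""
    value: float
    sources: FrozenSet[str]

    def __add__(self, other: "Tracked") -> "Tracked":
        # The result remembers the lineage of both operands.
        return Tracked(self.value + other.value, self.sources | other.sources)


def is_tainted(item: Tracked, untrusted: FrozenSet[str]) -> bool:
    # An item is tainted if any untrusted source appears in its lineage.
    return bool(item.sources & untrusted)


sensor = Tracked(3.0, frozenset({"sensor"}))
form = Tracked(4.0, frozenset({"web_form"}))
total = sensor + form

print(total.value)                                    # 7.0
print(is_tainted(total, frozenset({"web_form"})))     # True
print(is_tainted(sensor, frozenset({"web_form"})))    # False
```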
10
Is this possible, viable, desirable?
• Physical limitations
– Only a limited amount of “memory” is possible in primary and secondary data storage
– Not all memories can be retained forever
• Problem of Recursion
– Will the system that stores memories also have its own memory objects?
• Performance
– Handling a large amount of history for every data object can cause a performance bottleneck
11
Is this possible, viable, desirable?
• Security / privacy
– How do we control access to old memories?
– Would remembering states/values violate privacy?
• Legal issues
– Various regulations limit how long data can be retained
– Privacy laws limit contextual information that can be recorded
12
Is this possible, viable, desirable?
• Scalability: How do we recall, and when do we forget?
– Remembering everything can be undesirable
  • Some humans suffer from hyperthymesia, or total recall
  • Too many unimportant details can overwhelm functionality
– Deciding which “memories” to forget is an open issue
– Management, searching, and indexing all need to scale with a large number of memories
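One simple way to approach the "when do we forget?" question is a forgetting policy that combines a capacity bound with an age limit. The sketch below is an assumption for illustration, not a policy proposed in the slides; `BoundedMemory`, `capacity`, and `max_age` are hypothetical names.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class BoundedMemory:
    """Hypothetical memory store that forgets by capacity and by age."""

    def __init__(self, capacity: int, max_age: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._items: Deque[Tuple[int, Any]] = deque(maxlen=capacity)
        self._max_age = max_age

    def remember(self, tick: int, value: Any) -> None:
        # Ticks are assumed to arrive in non-decreasing order.
        self._items.append((tick, value))

    def forget_older_than(self, now: int) -> None:
        # Retention limit: drop memories older than max_age ticks.
        while self._items and now - self._items[0][0] > self._max_age:
            self._items.popleft()

    def recall(self) -> List[Tuple[int, Any]]:
        return list(self._items)


memory = BoundedMemory(capacity=3, max_age=10)
for tick in range(5):
    memory.remember(tick, f"v{tick}")
print(memory.recall())   # [(2, 'v2'), (3, 'v3'), (4, 'v4')]

memory.forget_older_than(now=13)
print(memory.recall())   # [(3, 'v3'), (4, 'v4')]
```

A fixed age limit also loosely models the retention regulations mentioned earlier, though real policies would likely weigh importance as well as age.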
13
Where we are today …
• MyLifeBits:
– Recording all of Gordon Bell’s personal interactions requires 18 GB/year, or 1.1 TB over a lifetime
• Provenance:
– 16% overhead in recording all information flows for files (PASS [Seltzer et al, Usenix Technical 06])
– 3%-15% overhead in secure, tamper-evident provenance for files [Hasan et al, Usenix FAST09]
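As a quick sanity check on the MyLifeBits figures, the two numbers together imply a recording span of roughly six decades. The calculation below assumes decimal units (1 TB = 1000 GB); the slides do not state the lifetime length, so the result is only what the two quoted figures imply.

```python
GB_PER_YEAR = 18        # from the slide
LIFETIME_TB = 1.1       # from the slide
GB_PER_TB = 1000        # assumption: decimal units

implied_years = LIFETIME_TB * GB_PER_TB / GB_PER_YEAR
print(round(implied_years, 1))   # 61.1
```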
14
Epilogue
• The ability to recall past memories and contextual information differentiates sentient beings from simpler organisms
• Augmenting data objects with memory as an intrinsic property will introduce sentience for digital objects