hashing for fun and profit
DESCRIPTION
An impromptu lightning talk I gave at Code4Lib North 2013.
TRANSCRIPT
HASHING FOR FUN AND PROFIT
Mat Trudel
@mattrudel
HASHING
• A one-way mathematical function that reduces a string of data of any length to a fixed-length number
• Easy to compute, hard to reverse
• Collision resistant: it should be infeasible to find two different files with the same hash
• Like a fingerprint, basically
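The properties above are easy to see with Python's standard `hashlib` module. This is a minimal sketch that uses SHA-1 (the function introduced on the next slide); the sample inputs are made up for illustration, and "hard to reverse" cannot really be demonstrated in a few lines, so it is not shown.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 lowercase hex characters."""
    return hashlib.sha1(data).hexdigest()

short = sha1_hex(b"a")
big = sha1_hex(b"a" * 1_000_000)  # a megabyte of input

# Fixed length: both digests are 40 hex chars (160 bits), whatever the input size.
print(len(short), len(big))

# Easy to compute and deterministic: the same bytes always give the same digest.
print(sha1_hex(b"a") == short)

# A one-character change in the input gives an unrelated-looking digest.
print(sha1_hex(b"Unicorns!"))
print(sha1_hex(b"Unicorns?"))
```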
SHA-1
160 bits (40 hex chars)
ff4f25dfc62c9df4478549444e9eb364841c9391
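A sketch of computing a file's SHA-1 the way a storage system might: reading in fixed-size chunks so large files never have to fit in memory. The 64 KiB chunk size and the temporary demo file are arbitrary choices for illustration.

```python
import hashlib
import os
import tempfile

def sha1_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks; the result is 40 hex characters."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# SHA-1 digests are 20 bytes, i.e. 160 bits.
print(hashlib.sha1().digest_size * 8)

# Demo on a throwaway temporary file with made-up contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name
try:
    digest = sha1_of_file(path)
    print(digest, len(digest))
finally:
    os.remove(path)
```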
WEBCITATION.ORG
[Three identical copies of a "Unicorns!" file, each hashing to ff4f25dfc62c9df4478549444e9eb364841c9391]
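Because identical bytes always produce the identical hash, an archive can spot duplicate copies by grouping files on their digest. A minimal sketch; the file names and contents below are hypothetical stand-ins for archived copies.

```python
import hashlib
from collections import defaultdict

def group_by_digest(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each SHA-1 digest to the names of all files with that content."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha1(data).hexdigest()].append(name)
    return dict(groups)

# Hypothetical archived copies: three share the same bytes, one differs.
files = {
    "capture-1/unicorn.jpg": b"Unicorns!",
    "capture-2/unicorn.jpg": b"Unicorns!",
    "capture-3/unicorn-copy.jpg": b"Unicorns!",
    "capture-3/other.jpg": b"Something else",
}

groups = group_by_digest(files)
for digest, names in groups.items():
    print(digest, names)
```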
ASSET STORAGE IS DEAD SIMPLE
ff4f25dfc62c9df4478549444e9eb364841c9391.jpg
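One way to sketch this content-addressed layout: name each stored file after the SHA-1 of its bytes plus its extension, and skip the write when that name already exists. The `store_asset` helper and the temporary directory are hypothetical; a production version would also want atomic writes (write to a temp file, then rename).

```python
import hashlib
import tempfile
from pathlib import Path

def store_asset(root: Path, data: bytes, ext: str) -> Path:
    """Save `data` as root/<sha1><ext>, e.g. ff4f...9391.jpg.

    If that file already exists it holds the same bytes (barring a SHA-1
    collision), so nothing is written a second time.
    """
    path = root / f"{hashlib.sha1(data).hexdigest()}{ext}"
    if not path.exists():
        path.write_bytes(data)
    return path

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    first = store_asset(root, b"fake jpeg bytes", ".jpg")
    second = store_asset(root, b"fake jpeg bytes", ".jpg")  # duplicate upload
    count = len(list(root.iterdir()))
    print(first == second, count)
```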
COST OF A DUPLICATE COPY IS A DB ROW OF METADATA
They both point to the same data on disk
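A sketch of the "duplicate costs one row" idea using an in-memory SQLite database: each copy gets its own metadata row, but the rows reference the same hash, and therefore the same file on disk. The `assets` table and its columns are hypothetical.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    " id INTEGER PRIMARY KEY,"
    " source_url TEXT NOT NULL,"
    " sha1 TEXT NOT NULL)"  # the on-disk file is named after this hash
)

def record_copy(source_url: str, data: bytes) -> str:
    """Add one metadata row for a copy; the bytes themselves are stored once, by hash."""
    digest = hashlib.sha1(data).hexdigest()
    conn.execute(
        "INSERT INTO assets (source_url, sha1) VALUES (?, ?)", (source_url, digest)
    )
    return digest

for url in ("http://example.com/a.jpg", "http://example.org/b.jpg"):
    record_copy(url, b"same image bytes")

rows, blobs = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT sha1) FROM assets"
).fetchone()
print(rows, blobs)  # two metadata rows, one stored file
```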
EVERY COPY OF ff4f25dfc62c9df4478549444e9eb364841c9391 IS THE SAME
TONS OF OTHER USEFUL PROPERTIES
• Content-addressable: the hash is essentially a URN
• Useful for detecting file changes (intentional or not)
• Can be computed using just the file itself (it’s just math)
• Indispensable part of many tools (git, CDNs, TLS)
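Two of these properties in a short sketch: detecting a change by recomputing a file's hash and comparing it to one recorded earlier, and computing the object ID that git (in its default SHA-1 object format) gives a file's contents, which is the SHA-1 of a `blob <size>` header, a NUL byte, and then the bytes. The helper names and sample data are hypothetical.

```python
import hashlib

def has_changed(data: bytes, recorded_sha1: str) -> bool:
    """True if `data` no longer matches a SHA-1 recorded earlier (edit or bit rot)."""
    return hashlib.sha1(data).hexdigest() != recorded_sha1

def git_blob_id(data: bytes) -> str:
    """SHA-1 over git's 'blob <size>' header, a NUL byte, then the file's bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()

original = b"Unicorns!\n"
recorded = hashlib.sha1(original).hexdigest()  # computed from the file alone
print(has_changed(original, recorded))        # False: untouched
print(has_changed(b"Unicorns?\n", recorded))  # True: one byte differs
print(git_blob_id(b""))  # git's well-known ID for an empty file
```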
• fin •