Hashing for Fun and Profit

Post on 26-May-2015

DESCRIPTION

An impromptu lightning talk I gave at Code4Lib North 2013

TRANSCRIPT

HASHING FOR FUN AND PROFIT

Mat Trudel

@mattrudel

HASHING

• A one-way mathematical function that reduces data of any length to a fixed-length number

• Easy to compute, hard to reverse

• Collision resistant: it should be infeasible to find two different files with the same hash

• Like a fingerprint, basically
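The properties above can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not code from the talk; the helper name `sha1_hex` and the sample inputs are assumptions made for the example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


# Inputs of very different sizes reduce to digests of the same length.
short_digest = sha1_hex(b"Unicorns!")
long_digest = sha1_hex(b"Unicorns! " * 10_000)
print(len(short_digest), len(long_digest))

# The same input always produces the same digest.
print(short_digest == sha1_hex(b"Unicorns!"))

# A one-character change produces a different digest.
changed_digest = sha1_hex(b"Unicorns?")
print(short_digest != changed_digest)
```

Going the other way, from a digest back to the input, has no corresponding function: that is the "hard to reverse" part.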

SHA-1

SHA-1: 160 bits (40 hex chars)

SHA-1: ff4f25dfc62c9df4478549444e9eb364841c9391

WEBCITATION.ORG

Unicorns! Unicorns! Unicorns!

Unicorns! Unicorns! Unicorns!

Unicorns! Unicorns! Unicorns!

Unicorns!

ff4f25dfc62c9df4478549444e9eb364841c9391 ff4f25dfc62c9df4478549444e9eb364841c9391 ff4f25dfc62c9df4478549444e9eb364841c9391

ASSET STORAGE IS DEAD SIMPLE

ff4f25dfc62c9df4478549444e9eb364841c9391.jpg

COST OF A DUPLICATE COPY IS A DB ROW OF METADATA

They both point to the same data on-disk
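A rough sketch of this storage scheme, assuming assets are written under a single directory and named by their SHA-1 digest. The `AssetStore` class, its `rows` list (a stand-in for a database table), and the file names are all hypothetical; the talk does not show how the real system is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


class AssetStore:
    """Hypothetical content-addressed store: the file name is the hash of the bytes."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.rows = []  # stand-in for a DB table of per-upload metadata

    def put(self, data: bytes, original_name: str, suffix: str = ".jpg") -> Path:
        digest = hashlib.sha1(data).hexdigest()
        path = self.root / f"{digest}{suffix}"
        # Identical bytes hash to the same name, so they are written only once.
        if not path.exists():
            path.write_bytes(data)
        # A duplicate upload costs one more metadata row and nothing else.
        self.rows.append({"name": original_name, "sha1": digest})
        return path


with tempfile.TemporaryDirectory() as tmp:
    store = AssetStore(Path(tmp) / "assets")
    first = store.put(b"Unicorns!", "upload-a.jpg")
    second = store.put(b"Unicorns!", "upload-b.jpg")
    same_path = first == second
    files_on_disk = len(list(store.root.iterdir()))
    row_count = len(store.rows)

print(same_path, files_on_disk, row_count)
```

Two uploads of the same bytes end up as one file on disk and two metadata rows.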

EVERY COPY OF ff4f25dfc62c9df4478549444e9eb364841c9391 IS THE SAME

TONS OF OTHER USEFUL PROPERTIES

• Content Addressable - essentially a URN

• Useful for detecting file changes (intentional or not)

• Can be computed using just the file itself (it’s just math)

• Indispensable part of many tools (git, CDNs, TLS)
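As one illustration of the change-detection point, a file's hash can be recorded once and recomputed later from nothing but the file's bytes. The sketch below assumes SHA-1 as in the rest of the talk; the function name `sha1_of_file`, the chunk size, and the file name are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory (hypothetical helper)."""
    hasher = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "asset.bin"
    asset.write_bytes(b"Unicorns!")
    recorded = sha1_of_file(asset)  # digest noted when the file was stored

    # Re-hashing an untouched file reproduces the recorded digest.
    unchanged = sha1_of_file(asset) == recorded

    # Any modification, intentional or not, shows up as a different digest.
    asset.write_bytes(b"Unicorns?")
    changed = sha1_of_file(asset) != recorded

print(unchanged, changed)
```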

• fin •
