hashing for fun and profit

HASHING FOR FUN AND PROFIT Mat Trudel @mattrudel

Uploaded by mattrudel on 26-May-2015

DESCRIPTION

An impromptu lightning talk I gave at Code4Lib North 2013

TRANSCRIPT

Page 1: Hashing for Fun and Profit

HASHING FOR FUN AND PROFIT Mat Trudel @mattrudel

Page 2

HASHING

• A one-way mathematical function that reduces data of any length to a fixed-length number

• Easy to compute, hard to reverse

• Collision resistant: it should be infeasible to find two different files with the same hash

• Like a fingerprint, basically
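A quick way to see the fixed-length and fingerprint properties is Python's standard `hashlib` module. This is a small sketch; `sha1_hex` is a hypothetical helper name and the inputs are arbitrary examples, not anything from the talk.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 lowercase hex characters."""
    return hashlib.sha1(data).hexdigest()

tiny = sha1_hex(b"a")
huge = sha1_hex(b"a" * 1_000_000)  # a million bytes in, still 40 hex chars out

# Fixed length, regardless of input size.
print(len(tiny), len(huge))  # 40 40

# Same input, same fingerprint, every time.
print(sha1_hex(b"Unicorns!") == sha1_hex(b"Unicorns!"))  # True

# Changing a single character gives an unrelated-looking digest.
print(sha1_hex(b"Unicorns!"))
print(sha1_hex(b"unicorns!"))
```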

Page 3

SHA-1

Page 4

SHA-1: 160 bits (40 hex chars)

Page 5

SHA-1: ff4f25dfc62c9df4478549444e9eb364841c9391

Page 6

ff4f25dfc62c9df4478549444e9eb364841c9391
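The "160 bits (40 hex chars)" arithmetic follows from each hex digit encoding 4 bits: 160 / 4 = 40. The sketch below checks this with `hashlib` and adds a hypothetical `looks_like_sha1` helper that only validates the *format* of a string such as the digest on the slide; it cannot tell you which input produced it.

```python
import hashlib
import re

digest = hashlib.sha1(b"abc")
print(digest.digest_size * 8)   # 160 bits
print(len(digest.digest()))     # 20 raw bytes
print(len(digest.hexdigest()))  # 40 hex characters, 4 bits each

# Hypothetical format check: exactly 40 lowercase hex digits.
SHA1_HEX = re.compile(r"[0-9a-f]{40}")

def looks_like_sha1(s: str) -> bool:
    """True if `s` has the shape of a SHA-1 hex digest (format only)."""
    return SHA1_HEX.fullmatch(s) is not None

print(looks_like_sha1("ff4f25dfc62c9df4478549444e9eb364841c9391"))  # True
```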

Page 7

WEBCITATION.ORG

Page 8

Unicorns! Unicorns! Unicorns!

Unicorns! Unicorns! Unicorns!

Page 9

Unicorns! Unicorns! Unicorns!

Unicorns!

ff4f25dfc62c9df4478549444e9eb364841c9391 ff4f25dfc62c9df4478549444e9eb364841c9391 ff4f25dfc62c9df4478549444e9eb364841c9391
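The unicorn slides make the point that byte-identical copies share one digest, whatever they are named. Here is a minimal sketch of grouping files by content, assuming the files sit on local disk; the file names and the stand-in payload are made up for illustration (they are not the images from the talk), and `file_sha1` is a hypothetical helper. Hashing in chunks means a file never has to fit in memory, and nothing but the file's own bytes is needed.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def file_sha1(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

payload = b"Unicorns! Unicorns! Unicorns!"  # stand-in for the image bytes

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Three byte-identical uploads under different names, plus one different file.
    for name in ("unicorn.jpg", "unicorn_copy.jpg", "UNICORN-final.jpg"):
        (tmp_dir / name).write_bytes(payload)
    (tmp_dir / "other.jpg").write_bytes(b"Not a unicorn")

    # Group file names by digest: identical content lands in the same bucket.
    groups = defaultdict(list)
    for path in sorted(tmp_dir.iterdir()):
        groups[file_sha1(path)].append(path.name)

for digest, names in groups.items():
    print(digest, names)
```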

Page 10

ASSET STORAGE IS DEAD SIMPLE

ff4f25dfc62c9df4478549444e9eb364841c9391.jpg
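A minimal sketch of hash-named asset storage, assuming a single local directory; `store_asset` and its parameters are hypothetical. It keeps SHA-1 to match the slides, although practical SHA-1 collisions were publicly demonstrated in 2017, so a new system might reasonably choose SHA-256 instead.

```python
import hashlib
import tempfile
from pathlib import Path

def store_asset(data: bytes, ext: str, root: Path) -> Path:
    """Save `data` as <sha1-hex>.<ext> under `root` and return the path.

    The file name comes from the content, so identical bytes always map to
    the same path and storing a duplicate is a no-op.
    """
    path = root / f"{hashlib.sha1(data).hexdigest()}.{ext}"
    if not path.exists():
        path.write_bytes(data)  # not atomic; see the note below
    return path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store_asset(b"pretend these are JPEG bytes", "jpg", root)
    second = store_asset(b"pretend these are JPEG bytes", "jpg", root)
    n_files = len(list(root.iterdir()))

print(first.name)
print(first == second, n_files)  # True 1
```

A real store would likely write to a temporary file and rename it into place so a concurrent reader never sees a half-written asset; the sketch skips that for brevity.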

Page 11

COST OF A DUPLICATE COPY IS A DB ROW OF METADATA

They both point to the same data on disk
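One way to picture "a duplicate costs one DB row" is Python's built-in `sqlite3` with an in-memory database. This is a sketch under assumptions: the `uploads` table, its columns, and the `upload` function are hypothetical, and a dict stands in for the directory of hash-named files.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    " id INTEGER PRIMARY KEY,"
    " filename TEXT NOT NULL,"
    " sha1 TEXT NOT NULL)"  # the digest is the pointer to the stored bytes
)

blobs = {}  # digest -> bytes; stands in for the directory of <sha1>.jpg files

def upload(filename: str, data: bytes) -> str:
    """Record one upload as a metadata row; store the bytes only if new."""
    digest = hashlib.sha1(data).hexdigest()
    blobs.setdefault(digest, data)
    conn.execute("INSERT INTO uploads (filename, sha1) VALUES (?, ?)", (filename, digest))
    return digest

image = b"pretend these are JPEG bytes"
for name in ("unicorn.jpg", "unicorn (1).jpg", "my_unicorn.jpg"):
    upload(name, image)

rows, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT sha1) FROM uploads"
).fetchone()
print(rows, distinct, len(blobs))  # 3 1 1
```

Three uploads produce three metadata rows, but only one copy of the bytes is kept.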

Page 12

EVERY COPY OF [image] IS THE SAME

ff4f25dfc62c9df4478549444e9eb364841c9391
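Because every intact copy has the same digest, a digest recorded once can later verify a copy fetched from anywhere. A minimal sketch; `verify` is a hypothetical helper and the bytes are a stand-in for a real file.

```python
import hashlib

def verify(data: bytes, expected_sha1: str) -> bool:
    """True if `data` is byte-for-byte the content named by `expected_sha1`."""
    return hashlib.sha1(data).hexdigest() == expected_sha1

original = b"pretend these are JPEG bytes"
known_digest = hashlib.sha1(original).hexdigest()  # recorded once, e.g. at upload time

good_copy = bytes(original)   # an exact copy, wherever it came from
damaged = bytearray(original)
damaged[0] ^= 0x01            # flip a single bit to simulate corruption

print(verify(good_copy, known_digest))       # True
print(verify(bytes(damaged), known_digest))  # False
```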

Page 13

TONS OF OTHER USEFUL PROPERTIES

• Content-addressable: the hash is essentially a URN

• Useful for detecting file changes (intentional or not)

• Can be computed using just the file itself (it’s just math)

• Indispensable part of many tools (git, CDNs, TLS)
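As one concrete instance of the Git case: in a repository using Git's default SHA-1 object format, a file's blob ID is the SHA-1 of a short header (`blob`, a space, the size in bytes, a NUL byte) followed by the file's bytes. Git also has an optional SHA-256 object format, which this sketch does not cover; `git_blob_sha1` is a hypothetical helper name.

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    """Return the ID Git gives a file's contents in a SHA-1 repository.

    Git hashes the header b"blob <size>" plus a NUL byte, followed by
    the raw file bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()

# The empty file has a well-known blob ID in every SHA-1 Git repository.
print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

To cross-check, `git hash-object path/to/file` should print the same ID for that file, assuming no line-ending conversion or other clean filters are configured.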

Page 14

• fin •