This page intentionally left blank. ⬇️, ➡️, or spacebar 🛰 to start slidedeck.

---
class: center, middle

# Fixity

---

# Fixity

- Intro
- Purpose
- Algorithms
- Tools

---

# What is a checksum?

These are sometimes defined as "digital fingerprints."

I think this word is heard quite often but is not well explained or well understood. The absolute essence of a checksum is just a "random" string... sorta. At least that is what it looks like. Underneath is an algorithm that turns the contents of your file into that string, which lets you compare your file to itself-from-the-past.

What matters is that the string is the same now as it was in the past and as it will be in the future.

---

# Cryptography

How are checksums used most commonly? Cryptography!

Hashes help protect against man-in-the-middle attacks during transfer across insecure lines. They help passwords and other sensitive information pass through a network to another location without that information being compromised.

---

# Salting

A "salt" is another random string that is combined with a password before it is hashed, for an extra layer of protection. The salt value and the resulting hash can then be stored in a database.

To check a password later you need both parts: the salt and the hash. And since hashing only goes one way, the stored hash can't simply be "un-encrypted" back into the password. Extra safe!

---

# Hash collision

A collision attack is an attempt to find two different inputs which produce the same hash value, hence causing a collision.

If someone can fake your checksum, they can get to the information they want. Or someone can inject bad information into your files. Or you can just receive the wrong thing!

---

# Broken?

"Broken" means the technology that keeps these file fingerprints secure can be defeated, because computers can reverse the algorithm or find collisions. Two things matter:

1. whether it's possible to break the code at all, and
2. the amount of time and money necessary to make that break happen.

If an algorithm would take modern computers years to break, it's still pretty secure.

---

# But preservation???
But for preservation purposes, checksums are used for FIXITY!!! So we don't have to worry about security!

---

# FIXITY!

We care about file integrity and file location authentication: are the files there, and are they what they used to be?

---

# Storage, too

You can also use checksums as short stand-ins for larger files, for quicker lookup and validation. For example, two files with the same checksum are almost certainly copies of the same thing.

---

# When to check?

Check your files after any major event:

- acquisition
- "pre-ingest"
- post-ingest
- migration (before and after)
- transfer (before and after)
- any major incident

---

# Downsides to checksums

- slow to compute for large files, since every byte has to be read
- uses computing time and energy

---
class: middle, center

# Algorithms

---

# CRC

CRC! This stands for Cyclic Redundancy Check. The CRC was invented by W. Wesley Peterson way back in 1961; CRC32 is the work of several researchers and was published in 1975.

It's a much older and more basic process, so there are lots of limitations here: it's not good for security and can be faked easily. But it is good at catching accidental errors, so it is used for verification inside file formats. Matroska files are an example of this, with a built-in self-check mechanism.

---

# SHA

SHA stands for Secure Hash Algorithm. It was published by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS).

SHA is very common, but it gets a little complicated because the different numbers mean different things. The older, less secure version (SHA-1) is still used to generate small unique identifiers. Git, for example, uses SHA-1 so that every commit has a unique ID.

---

# SHA-1 is Broken?

HTTPS certificates used to be signed using SHA-1, but SHA-1 has been broken for a while. They now use SHA-2, usually the SHA-256 version rather than the SHA-512 version (the algorithm is basically the same, but there are different versions with different output lengths). This is part of how your browser determines, via the SSL certificate, that you are legitimately on the website you intend to be on.

---

# MD5

Archives like MD5. BitTorrent files can also include MD5 sums for file integrity (the protocol itself checks pieces with SHA-1).
MD5 stands for Message Digest algorithm 5 (y'know, because computer scientists are great at naming things) and was invented by Ronald Rivest in 1991 to replace the old MD4 standard (you can guess what MD4 stood for).

It's "broken", though, so do not use it for security! But it is fine to use for fixity, where the goal is catching accidental change rather than stopping an attacker.

---

# Tools

- **md5** (Mac) / **md5sum** (Linux)
- [hashdeep](https://github.com/jessek/hashdeep)
- Windows Command Prompt: `certutil -hashfile your-file-here MD5`

And more. See [this section](http://mattersinmediaart.org/sustaining-your-collection.html#checksum-tools) of Matters in Media Art.

---

# Making an MD5

Mac: `md5 your-file-here`

Linux: `md5sum your-file-here`

To print only the checksum and not the associated filename on a Mac, use `md5 -q your-file-here` for "quiet mode."

Windows: `CertUtil -hashfile your-file-here MD5`

???

For Mac/Linux, this will make an MD5 hash and print it to the screen, but you will have to take extra steps if you want to save it somewhere (which you probably do, because why else did you make it?).

---

# Additional Resources

- [COPTR Fixity Tools List](http://coptr.digipres.org/Category:Fixity)
- [Digital Preservation Handbook: Fixity and checksums](http://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums)
- [!!Con 2015: Don't know about you, but I'm feeling like SHA-2!](https://www.youtube.com/watch?v=1QgamEwwPro)
- [Reconsidering the Checksum for Audiovisual Preservation](http://dericed.com/papers/reconsidering-the-checksum-for-audiovisual-preservation/)

---

# Learning more

[Home](/)
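
---

# Sketch: saving and re-checking an MD5

The commands on the "Making an MD5" slide only print a checksum to the screen. One way to save it and check it again later might look like the sketch below. It assumes Python 3 and a hypothetical convention of keeping the checksum in a sidecar file named `your-file-here.md5`; the function names `md5_of_file`, `write_sidecar`, and `check_fixity` are made up for illustration and are not part of any of the tools listed in this deck.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path):
    """Save the checksum next to the file as '<name>.md5' (assumed convention)."""
    sidecar = Path(str(path) + ".md5")
    sidecar.write_text(md5_of_file(path) + "\n")
    return sidecar


def check_fixity(path):
    """Return True if the file still matches the checksum saved earlier."""
    sidecar = Path(str(path) + ".md5")
    stored = sidecar.read_text().strip()
    return md5_of_file(path) == stored
```

You could call `write_sidecar("your-file-here")` once at acquisition, then `check_fixity("your-file-here")` after each transfer or migration. A `False` result means the file no longer matches the checksum that was saved.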