This page intentionally left blank. ⬇️, ➡️, or spacebar 🛰 to start slidedeck.

---
class: middle, center

# Storage

---

# Storage

*Note: This is about storage formats, not the theory and recommendations for digital preservation storage.*

.right[![](/img/next-cube.jpg)]

---

# Storage

- Tape
- Drives
- RAID
- Network attached storage
- Networks
- Cloud
- Film

.right[![](/img/spacecraft.gif)]

---

# Tape

Magnetic tape has been used to store media for over 50 years in various forms, from early IBM computers to cheap consumer media vehicles. 📼

.center[![ibm-tape](/img/ibm-tape.jpg)]

---

# Types of Tape

- Enterprise tape libraries (maybe with robots 🤖)
- High-density magnetic media
- Linear Tape-Open (LTO)

LTO is commonly used in archives.

.right[![tape-robot](/img/tape-robot.gif)]
.right[[source](http://www.popularmechanics.com/technology/robots/a20808/ibm-tape-swapping-robot/)]

---

# LTO

LTO is cheap, sturdy, and environmentally reasonable (as far as storage solutions go).

At the time of writing, there are 9 generations of LTO technology. Recent generations compare as follows (native, uncompressed figures):

| Generation | Capacity | Speed | Approx. time to write a full tape |
|---|---|---|---|
| LTO-9 | 18TB | 400 MB/s | 12h30m |
| LTO-8 | 12TB | 360 MB/s | 9h15m |
| LTO-7 | 6TB | 300 MB/s | 6h |
| LTO-6 | 2.5TB | 160 MB/s | 5h30m |
| LTO-5 | 1.5TB | 140 MB/s | 3h10m |

Of the generations listed, LTO-5 is the only one limited to 2 partitions (the others support 4) and 2:1 compression (the others use 2.5:1).

.right[![lto-tape](/img/lto-tape.png)]

---

# LTFS

LTO is commonly used with the Linear Tape File System (LTFS), which lets you browse and copy files on a tape much as you would on a disk drive.

Note that file names may need to be normalized before storing data with LTFS, because it cannot handle ampersands, backslashes, and other "illegal" characters.
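---

# LTFS: normalizing file names

File name normalization for LTFS can be scripted before files are copied to tape. Below is a minimal Python sketch, not a definitive tool. The set of characters it treats as unsafe is an assumption: the ampersand and backslash come from the LTFS slide, and the rest are the characters Windows reserves in file names. The function names, the `_` replacement character, and the optional Unicode NFC step are also hypothetical choices; check the documentation for your own LTFS software before relying on it.

```python
import unicodedata
from pathlib import PurePosixPath

# ASSUMPTION: "&" and "\" are named on the LTFS slide; the rest are the
# characters Windows reserves in file names. Adjust this set to match the
# LTFS software you actually use. "/" is left out because it separates folders.
UNSAFE_CHARS = set('&\\:*?"<>|')


def normalize_name(name: str, replacement: str = "_") -> str:
    """Return a tape-friendly version of a single file or folder name."""
    # Optional step: compose accented characters (NFC) so that the same
    # visible name is always stored as the same sequence of code points.
    name = unicodedata.normalize("NFC", name)
    # Replace each unsafe character with the replacement character.
    return "".join(replacement if ch in UNSAFE_CHARS else ch for ch in name)


def normalize_path(relative_path: str) -> str:
    """Normalize every component of a relative, "/"-separated path."""
    parts = PurePosixPath(relative_path).parts
    return "/".join(normalize_name(part) for part in parts)


if __name__ == "__main__":
    print(normalize_path("Smith & Jones/reel 1: final?.mov"))
    # prints: Smith _ Jones/reel 1_ final_.mov
```

One caveat with this approach: two different names can normalize to the same result (for example, `a&b` and `a?b` both become `a_b`), so a real workflow should check for collisions and keep a record of the original names.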
---

# Some other LTO factors

WORM (Write Once, Read Many): Possible with LTO-3 and later

Encryption: Possible with LTO-4 and later

Partitioning: Possible with LTO-5 and later

.right[![worm](/img/worm.png)]

---

# Drives (HDD, SSD)

If you want to keep bits as bits, you can store your data on drives. There are several kinds. Here are two common ones used for storage:

- Hard disk drive (HDD): Spinning magnetic disks
- Solid-state drive (SSD): Flash memory chips with no moving parts

---

# HDD

Pros:
- Cheap
- Well-known
- Large sizes available

Cons:
- Slower
- Bigger
- Has moving parts, so more fragile
- Consumes more power

---

# SSD

Pros:
- Fast
- Durable
- No moving parts

Cons:
- Expensive
- Large sizes are VERY expensive
- Wears out after a limited number of write cycles

---

# Formatting

Drives can be formatted with different file systems, which is annoying: some work natively only with Windows, and some only with macOS. On the next page are the most common modern file systems, but [there are many more](https://en.wikipedia.org/wiki/Comparison_of_file_systems).

---

**Ext2, Ext3, Ext4 (Extended File System)**
- Linux's preferred file system
- Ext2 maximum file/volume size: 16GB-2TB/2TB-32TB (depending on block size)
- Ext3 maximum file/volume size: 16GB-2TB/2TB-32TB (depending on block size)
- Ext4 maximum file/volume size: 16TB/1EiB

**FAT32 (File Allocation Table)**
- Natively read/write on Windows and macOS
- Maximum file/volume size: 4GB/2TB

**NTFS (Windows NT File System)**
- Natively read/write on Windows
- Read-only on macOS
- Maximum file/volume size: 16TB/256TB

**HFS+ (Hierarchical File System Plus, aka Mac OS Extended)**
- Natively read/write on macOS
- Required for Time Machine before macOS 11 (Big Sur), which added APFS backups
- Maximum file/volume size: 8EiB/8EiB

---

# Journaling

Hard drives can be formatted with journaled or non-journaled file systems.

"A journaling file system is a file system that keeps track of changes not yet committed to the file system's main part by recording the intentions of such changes in a data structure known as a "journal", which is usually a circular log."
[wikipedia](https://en.wikipedia.org/wiki/Journaling_file_system)

See [Anatomy of Linux journaling file systems](https://www.ibm.com/developerworks/library/l-journaling-filesystems/index.html) for more.

[Here](https://apple.stackexchange.com/questions/7609/what-are-the-differences-between-journaling-hfs-and-non-journaling-hfs) is a good breakdown of the differences when choosing whether to journal.

---

# RAID

RAID = Redundant Array of Inexpensive (or Independent) Disks

A RAID combines multiple drives with a small software/firmware layer that

- allows the drives to act as one cohesive storage unit,
- can efficiently create data redundancy, and
- can sometimes improve data transfer performance.

---

# RAID

RAID comes in different flavors, or *levels*. Higher levels generally add more protection, though the numbers are names rather than a strict ranking.

RAID 0: Striping. Spreads data across drives for speed but keeps no mirror and no parity, so there is no redundancy: losing one drive loses the array.

RAID 1: Mirroring without striping. With 2 drives, each drive holds a full copy of the same data.

RAID 2 and RAID 3: Rarely used; they combine striping with parity (a scheme for error protection).

RAID 4: Striping with dedicated parity (one drive holds all the parity).

RAID 5: Striping with distributed parity. Can survive one failed drive.

RAID 6: Striping with double distributed parity. Can survive two failed drives.

* Note that if another drive fails while a RAID 5 array is rebuilding after a failure, you can lose everything!

---

# Networks

Data storage doesn't have to be attached to the same computer, whether directly or by way of USB or FireWire. Storage can also be accessed over a network connection.

.center[![network-fire](/img/network-fire.png)]

---

# NAS, DAS, SAN

A NAS (network-attached storage) device is a server that holds data and is attached to a computer network. It can be connected to a router like the one in your home that gives you Wi-Fi.
A NAS can be used with RAID to create a redundant storage server that connected computers can access and manage through a web browser.

.center[![nas-written](/img/nas-written.png)]

---

# NAS, DAS, SAN

DAS (direct-attached storage) is storage attached directly to a computer or server, with no network in between. Very fast!

SAN (storage area network) is a dedicated network that connects a group of servers to shared storage, which each server can use as if it were directly attached.

---

# Cloud

Storage that you access via the internet is colloquially known as "cloud storage." How the data is stored is not necessarily known to you; you only know that it is present and accounted for somewhere else.

# ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️
# ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎ ☁︎
# ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️ ☁️

---

# Film

Unlike other media formats, film can be stored on film.

.center[![film-storage](/img/film-storage.jpg)]

---

# Film

Stable conditions of cool temperatures and low humidity can keep film safe for many years. The bummer, though, is you won't be able to access it easily.

Temperature: 10 °C ± 2 °C (about 50 °F ± 3.6 °F)

Relative Humidity: 40% ± 5%

Colder is better, BUT a consistent temperature is much more important than a cooler one. It is better to hold temperature and humidity at a sustainable level that does not fluctuate than to aim for "best practice" conditions you cannot maintain reliably.

---

# The future?

.center[![dna](/img/dna.png)]

---

# "Algorithms for DNA Data Storage"
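---

# Film: checking storage conditions

Returning to the film storage targets from the earlier slides: since stable conditions matter more than the coldest possible number, a vault's temperature and humidity log can be checked for both out-of-range readings and overall swing. Below is a minimal Python sketch under stated assumptions: the `(temperature_c, relative_humidity_pct)` log format and the `check_readings` name are hypothetical, and the targets are the 10 °C ± 2 °C and 40% ± 5% RH figures from the film slide. You may also want to look at how quickly conditions change, which this sketch does not measure.

```python
from statistics import mean

# Targets taken from the film storage slide.
TEMP_TARGET_C, TEMP_TOL_C = 10.0, 2.0  # 10 °C ± 2 °C
RH_TARGET_PCT, RH_TOL_PCT = 40.0, 5.0  # 40% ± 5% relative humidity


def check_readings(readings):
    """Summarize a log of (temperature_c, relative_humidity_pct) readings.

    Reports the mean and total swing (max minus min) of each measure, plus
    how many readings fell outside the target ranges.
    """
    if not readings:
        raise ValueError("no readings to check")
    temps = [t for t, _ in readings]
    rhs = [rh for _, rh in readings]
    # A reading counts as out of range if EITHER measure misses its target.
    out_of_range = sum(
        1
        for t, rh in readings
        if abs(t - TEMP_TARGET_C) > TEMP_TOL_C or abs(rh - RH_TARGET_PCT) > RH_TOL_PCT
    )
    return {
        "mean_temp_c": round(mean(temps), 2),
        "temp_swing_c": round(max(temps) - min(temps), 2),
        "mean_rh_pct": round(mean(rhs), 2),
        "rh_swing_pct": round(max(rhs) - min(rhs), 2),
        "out_of_range": out_of_range,
    }


if __name__ == "__main__":
    # Hypothetical log: one reading drifts too warm and too humid.
    log = [(9.5, 41.0), (10.5, 39.0), (13.0, 47.0), (11.0, 41.0)]
    print(check_readings(log))
    # prints: {'mean_temp_c': 11.0, 'temp_swing_c': 3.5, 'mean_rh_pct': 42.0, 'rh_swing_pct': 8.0, 'out_of_range': 1}
```

Note that the averages here look fine while the swing values reveal the fluctuation, which is the point the film slide makes about consistency.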
---

# Additional Resources

- [The History of Magnetic Tape and Computing: A 65-Year-Old Marriage Continues to Evolve](http://www.ironmountain.com/resources/general-articles/t/the-history-of-magnetic-tape-and-computing-a-65-year-old-marriage-continues-to-evolve)
- [ScienceFriday: Ghosts in the Reels](https://apps.sciencefriday.com/data/ghosts.html)

---

# Learning more

[Home](/)