dirkmeister.de: Storage

Showing posts with label Storage. Show all posts

Saturday, January 09, 2010

Storage Systems Course: My proposal

In my last post, I summarized some of the storage systems courses from international top universities with storage system labs.

In this post, I will distill my own ideas and my own views into a structure for a storage system course. Here, I assume here a 15-weeks course with a single 1 1/2 hour lecture per week (as we have in Germany):

Introduction, Overview, Disk Drive Architecture
Material: Ruemmler, Wilkes An introduction to disk drive modeling
Disk Scheduling / SSD
Material: Iyer, Druschel. Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O, Agrawal et al. Design Tradeoffs for SSD Performance
RAID
Material: Patterson et al. Introduction to Redundant Arrays of Inexpensive Disk (RAID), Corbett. Row-Diagonal Parity for Double Disk Failure Correction
Local File Systems
Local File System Case Studies: ext3, btrfs
Material: Valerie Aurora. A short history of btrfs, Card et al. Design and Implementation of the Second Extended Filesystem
Local File Structures (Sequential, Hashing, B-Tree)
Material: Comer. The Ubiquitous B-Tree
SAN / NAS / Object-based Storage
Material: Sacks. Demystifying DAS, SAN, NAS, NAS Gateways, Fibre Channel, and iSCSI
Examples: NFS, Ceph, GoogleFS/Hadoop DFS
Material: Weil. Ceph, A scalable, high-performance distributed file system, Ghemawat et al. The Google File System
Snapshots and Log-based Storage Designs
Material: Brinkmann, Effert. Snapshots and Continuous Data Replication in Cluster Storage Environments, Hitz et al. File System Design for an NFS File Server Appliance, Rosenblum, Ousterhout. The Design and Implementation of a Log-Structured File System
Fault Tolerance, Journaling, and Soft Updates
Material: Prabhakaran et al. Analysis and Evolution of Journaling File Systems, Seltzer et al. Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems
Advanced Hashing: Consistent Hashing, Share, and Crush
Material: Karger et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, Weil et al. CRUSH: controlled, scalable, decentralized placement of replicated data
Caching, Replication
Material: Nelson et al. Caching in the Sprite network file system, Kistler et al. Disconnected operation in the Coda File System
Consistency, Availability, and Partition Tolerance
Material: DeCandia et al. Dynamo: Amazon’s Highly Available Key-value Store, Helland, Life beyond Distributed Transaction: An Apostate's Opinion
Data Deduplication
Material: Muthitacharoen et al., A Low-bandwidth Network File System, Douglis, Iyengar. Application-specific Delta-encoding via Resemblance Detection
Performance Analysis
Material: Traeger, A nine year study of file system and storage benchmarking (at least parts of it)

As books I would recommend:

Callaghan. NFS Illustrated
Pate. UNIX Filesystems: Evolution, Design, and Implementation
Folk, Zoellick. File structures

For me, a few key points are important:

To clearly separate between classes of file systems and a concrete example. The best example is the class of network file systems vs. NFS. At the end there should be no much question if something is a inherent property of a class of file systems or of the concrete implementation
To have enough time to handle the basic concepts independently from concrete usages. For example explaining B-Trees as an important file structures independent from the usage in e.g. BTRFS.
The concepts are more important than the current technology or standards.

Thursday, April 02, 2009

Comparison of One-Hop Distributed Hash Tables (DHT)

I just uploaded an old semester work, which I haven't published here yet, about a comparision of One-Hop DHT:

Distributed Hash Tables (DHT) are an important substrate of several peer-to-peer (P2P) applications. Most existing approaches favor a small memory and network overhead over lookup latency. New approaches question this tradeoff and allow a lookup with using only one hop, but they store the routing information for all nodes on each node in the system and so require higher background traffic to maintain the routing tables up-to-date. In this paper the design of three one-hop DHT approaches is described and compared in detail. This comparison shows that different assumptions are used to analyze the approaches. Therefore, several parameters are inspected and an uni- ﬁed parameter setting is extract. Using the uniﬁed parameter setting, a fair and meaningful comparison of the approaches is possible. In particular, the bandwidth consumption, fault tolerance properties, the usage of heterogeneity in the P2P network, and the scalability are compared. The comparison shows that the unified parameter setting lead to different relative results as originally stated by the approach designers.

PDF

Saturday, March 28, 2009

Video about IBM ProtecTIER data deduplication

IBM has published a marketing video about their ProtecTIER data deduplication system recorded at the Pulse09 conference in February:

Key message: It is scalable. But the video contains 3 minutes of marketing stuff without much real information.

What I really find more interessing: At the SYSTOR'09 conference (one of the interessing talks I mentioned here) will be a research talk about the technology and concepts behind the ProtecTIER system, which is based on the product from the company Diligent that IBM bought April 2008. Abstract :

We describe some of the design choices that were made during the development of the IBM TS7650G ProtecTier, a fast, scalable, inline, deduplication device. The system's design goals and how they were achieved are presented. This is the first and only deduplication device that uses similarity matching. The paper provides the following original research contributions: we show how similarity signatures can serve in a deduplication scheme; a novel type of similarity signatures is presented and its advantages in the context of deduplication requirements are explained. It is also shown how to combine similarity matching schemes with hash based identity schemes.

I really look forward to this talk. Especially how the delimit their approach in comparision to approaches like DERD, DeepStore and other.

First paper accepted

My first paper has been accepted for publication at the SYSTOR'09 conference that takes place in Haifa at May 4-6.

It is based on the first part of my master thesis, but the contents has been extended and revised afterwards:

Data deduplication systems detect redundancies between data blocks to either reduce storage needs or to reduce network traffic. A class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks.

This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches based on real-live user data in a weekly full backup scenario, both at a single point in time as well as over several weeks.

In addition, we analyze how small changes affect the deduplication ratio for different file types on a microscopic level for chunking approaches and delta encoding. An intuitive assumption is that small semantic changes on documents cause only small modifications in the binary representation of files, which would imply a high ratio of deduplication. We will show that this assumption is not valid for many important file types and that application specific chunking can help to further decrease storage capacity demands.

I really look forward to that conference because surprisingly many talks in the program look really interesting and it is my first chance to meet storage researchers outside the Fürstenallee.

Monday, November 17, 2008

I will start at the Paderborn Center for Parallel Computing

With high probability, I will start as research assistent (aka PhD student) at the Paderborn Center for Parallel Computing.

There I will research in the area of storage systems, especially on the topic "data deduplication". It is really nice to be able to continue and complete the work of my master thesis.

Thursday, November 06, 2008

Compress, Encrypt and then remove redundancies:Really?

I knew that I had read it some where, but I forgot. I have found it again: The white paper "The New Metrics of Disk-based Data Protection" by the Strategic Research Cooperation claims that Diligent's Hyperfactor approach works in three steps - compress, encrypt, and then eliminate redundancy".

I really, really want to know how you can de-duplicate that have been compressed and encrypted before. Data that was similar to already known data elements has to be complettly different after compression and encryption. Then there - as I understand it - simply can't be any similarity left between the data elements.

Tuesday, November 04, 2008

Compression vs. Data Deduplication

How is data compression fundamentally different from data de-duplication? I really don't see it. But I'm not convinced that there is no need for a new word either.

Jon Bentley presented a compression algorithm for finding und eliminating common long strings in 1999. The approach works by dividing a stream of bytes into chunks of static size (he uses 1K chunks for the evaluation) and then hash them using Rabin's fingerprinting method. He maintains a table of all fingerprints seen up to date, and there he lookup up the hash value of a new chunk to check if a chunk with the same value was seen before. If there is such a collision, he checks these two chunks byte for byte. If the chunk was not seen before, the hash value is added to the table. If the chunk was there already, a link to an existing occurrences is stored.

The hash-based data de-duplication approach works as follows: The divide a stream of bytes into chunks (often using a more elaborate chunking method that is often falsely called "Rabin Fingerprints" because it used that fingerprinting technique), and then hash them using a cryptographic fingerprinting method (like SHA-1). Dedup systems maintain an index that contains all fingerprints seen up to data, and there they lookup up the fingerprint of the new chunk to check if the chunk was seen before. If the chunk was not seen before, the chunk is stored. If a chunk contains the same data (whp.) the data is not stored.

Looking similar?

In contrast to Jon Bentleys approach, it is common to skip the byte-by-byte comparison for performance reasons claiming that the collision probability using a cryptographic hash value is much lower than e.g. a random byte flip in memory. If there is a data loss, it is whp. not caused by a fingerprint collision. However, even that is not inherent to the de-duplication approach. You could easily perform a byte-by-byte-comparision in dedup systems, too.

One problem — from a students perspective writing a master thesis about that topic — is that before 2008 I found no evidence of the term "data de-duplication" (in a storage context, not in a data mining context) in research literature. The first usage I found was Zhu's paper about Data Domain's ways to avoid the disk bottleneck of hash-based data de-duplication.
There was an interesting discussion about the difference between the storage blogs "Backup Central" and "StorageMojo".
A StorageMojo author says:

I still don’t get why the industry refers to “de-duplication” rather than compression - why use a well-understood term when you can invent a new one? - but they did make the point that compression rates depend on your data types, backup policies and retention policies. Basically the more stuff stays the same the higher your back up compression rate.

W. Curtis Preston of "Backup Central" takes on that.

There are different definitions to distinguish compression from data de-duplication. Here are few tries:

Most often (Zhu, Kulkarni) it is claimed that the difference is that compression founds data only in a single file, while data de-duplication looks at multiple files. This is true e.g. for the zip compression application, it is a bit blurred by tar.gz, but the claim still seems to hold. But fundamentally, this is not more than an implementation detail. I have no trouble writing a small compression app that tries to find common strings from multiple files. Is that a compression application? Sure! In general compression is defined as working over a stream of bytes. If that stream is a single file, multiple files, or what ever is not important at a conceptually level. If the byte-stream stops producing new data for a week and than continue to produce new bytes like in a backup scenario also seems not important to me at a conceptually level.
In a recent SNIA white paper defines data de-duplication "as the process to examining a data set or byte stream at the sub-file level and storing and/or sending only unique data". Well, this differentiates dedup from "Content Addressable Storage"(CAS) or "Single Instance Storage"(SIS), but not from compression. The "unique data"-term wonders me a bit. All dedup systems I aware of (hash-based systems for sure and also Diligent's implementation of "Delta Encoding via Resemblance Detection") are very course grained approaches for finding redundancy. E.g. hash-based systems miss every redundancy smaller than a chunk size. They may miss large chunks of data if static-sized chunks are used (like in the Venti system) and a small shift has changed the data. Claiming that only unique data is stored, is not exactly true. A dedup systems stores no redundant data that it has classified as redundant. That is a difference.
The same SNIA paper defines compression as "the encoding of data to reduce its storage requirements". The referral to the "encoding" is a good point. Dedup systems use a permanent table to lookup, while most compression approaches use a temporary table and that is used to find a small encoding for the table, e.g., Huffman encodings. But wait a moment. A filesystem de-duplication system may consist of three parts a) A storage of all as new classified chunks b) an index with a mapping from the chunk fingerprint to the storage component and c) an index mapping from an Inode to a list of chunk fingerprints (together with offset and sizes and so). Can the value of an entry in the last index not be seen as an encoding of the file? I don't know. The point is not bad.
In a whitepaper from Diligent, they differentiate the two terms by claiming that compression finds redundancies only in a short "sliding window". Well, I have found no sliding window in Jon Bentleys compression algorithm.
W. Curtis Preston from Backup Central refers also to the file-by-file- or backup-by-backup property of compression, while de-duplication finds redundancies between multiple backups. Another difference is that de-duplication compression ratios are based on the types of data and how the backup is done. "Repeated incremental backups of brand new data (e.g., seismic data) would not de-dupe at all". This limits data de-duplication to the one (important) area of backup- and archival storage. De-duplication is very effective in that area, but not limited to it.

There are some approaches to differentiate the terms, but there is at least no clear and conscience definition and separation.

A commentator at "Backup Central" pointed out that data de-duplication "has more in common with image compression than text compression" and both dedup classes "look like out-of-order MPEG-4 compression". I honestly have no clue about image compression and MPEG-4 (I should!), but maybe he is right. It seems like the text compression approaches based on LW and the implementations like ZIP are simply stated as "compression". The whole concept of compression is reduced to that.

For me the goals are the same: To save storage capacity or bandwidth usage by finding and eliminating redundancies. The approaches are sometimes remarkable similar, while the implementations are not. May be de-duplication is just a subfield of compression

in that redundancies are found over all seen data,
used to build storage systems like block devices and file systems.

I'm really not sure about the formulation and the scope. I included the second point because a compression app that finds redundancies over all seen data and stores them in a single file, would I not call deduplication application. The first point is important because a storage system that applies only "classical" text compression wouldn't be a deduplication system.

Friday, October 31, 2008

Deduplication as Primary Data Storage

Das Blog StorageMojo schreibt in einem Artikel:

So what percentage de-dup compression of unstructured data is feasible? That is the key to understanding the economic basis of primary storage de-duplication of unstructured data. [...] Primary storage de-dup could be the next big win for IT shops. We just don’t have the data that can tell us how big the win could be

Hhm, mal schaun: "vim thesis.tex" - Just kidding!

8K Random Writes on Intel SSD

Eine neue Solid State Disk von Intel soll über 8.000 Random Writes pro Sekunde schaffen. 8000! "Meine" SSD, die ich in der Masterarbeit nutze, schafft gerade mal 3000 (laut IOMeter). Nicht umsonst sagt Linus Torvalds dazu "That thing absolutely rocks.". Der Preis ist immer noch etwas happig: Pro 80 GB 595 Euro (Großhandelspreis), also 7 Euro pro GB. Mal die erstbeste Festplatte von Alternate rausgesucht: 640 GB für 63 Euro (Endkundenpreis) - Also 10 Cent pro GB.

via [Kevin Burton]

P.S. Etwas verspätet, ich arbeite gerade meine Google Reader "Stars" ab.