My first paper has been accepted for publication at the SYSTOR'09 conference, which takes place in Haifa on May 4-6.
It is based on the first part of my master's thesis, but the content has been extended and revised since:
Data deduplication systems detect redundancies between data blocks to reduce storage needs or network traffic. One class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks. This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches based on real-life user data in a weekly full backup scenario, both at a single point in time and over several weeks. In addition, we analyze on a microscopic level how small changes affect the deduplication ratio for different file types, for both chunking approaches and delta encoding. An intuitive assumption is that small semantic changes to documents cause only small modifications in the binary representation of files, which would imply a high deduplication ratio. We show that this assumption does not hold for many important file types and that application-specific chunking can help to further reduce storage capacity demands.

I am really looking forward to the conference: surprisingly many of the talks in the program look interesting, and it is my first chance to meet storage researchers outside the Fürstenallee.
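For readers unfamiliar with chunk-based deduplication, the core idea, and why a tiny edit can hurt the deduplication ratio, fits in a few lines. The sketch below uses the simplest approach, fixed-size (static) chunking with SHA-1 fingerprints; the 4 KiB chunk size, the hash function, and the sample data are illustrative choices, not the configurations evaluated in the paper:

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative chunk size for static (fixed-size) chunking

def fingerprints(data, size=CHUNK_SIZE):
    """Split the stream into fixed-size chunks and fingerprint each one.

    A deduplication system stores a chunk only once per distinct fingerprint.
    """
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

# 16 KiB of non-repeating sample data (four distinct chunks)
original = b"".join(i.to_bytes(2, "big") for i in range(8192))
# a "small semantic change": a single byte inserted at the front
modified = b"\x00" + original

shared = set(fingerprints(original)) & set(fingerprints(modified))
print(len(shared))  # → 0: the one-byte shift misaligns every fixed-size chunk
```

The single inserted byte shifts every subsequent chunk boundary, so not one chunk of the modified file matches the original and nothing is deduplicated. Content-defined chunking approaches place boundaries based on the data itself, which is why the choice of chunking approach matters so much in the comparison above.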