Monday, April 06, 2009

Just Released: Filesystem Chunking

I have just released a set of tools for evaluating chunking-based data deduplication techniques on various systems, and for evaluating new chunking methods, as the new open source project "fs-c" (short for filesystem chunking).

The fs-c tools analyze the internal and temporal redundancy of file system directories, as found by content-defined chunking based on Rabin's fingerprinting method and by static chunking, each with different chunk sizes.
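To illustrate the general idea (this is a minimal sketch, not the fs-c implementation), here is content-defined chunking in Python. A simple polynomial rolling hash stands in for a real Rabin fingerprint, and the window, mask, and size parameters are made-up illustrative values:

    import hashlib

    WINDOW = 48                       # rolling-window size in bytes (illustrative)
    MASK = 0x1FFF                     # cut when hash & MASK == 0 -> ~8 KB average chunks
    MIN_SIZE, MAX_SIZE = 2048, 65536  # bounds on chunk length
    PRIME = 257
    POW = pow(PRIME, WINDOW - 1, 1 << 32)  # weight of the byte leaving the window

    def cdc_chunks(data):
        """Yield (offset, length, sha1-hex) for content-defined chunks of data."""
        start, h = 0, 0
        for i, b in enumerate(data):
            if i - start >= WINDOW:
                # remove the byte that falls out of the rolling window
                h = (h - data[i - WINDOW] * POW) & 0xFFFFFFFF
            h = (h * PRIME + b) & 0xFFFFFFFF
            length = i - start + 1
            # cut at a content-defined breakpoint, or force a cut at MAX_SIZE
            if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
                chunk = data[start:i + 1]
                yield start, length, hashlib.sha1(chunk).hexdigest()
                start, h = i + 1, 0
        if start < len(data):
            chunk = data[start:]
            yield start, len(chunk), hashlib.sha1(chunk).hexdigest()

Because the breakpoints depend only on the local content, inserting a few bytes into a file shifts only the chunks around the edit, which is why content-defined chunking typically finds more redundancy than static chunking at the same average chunk size.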

The goal is to give users a rough estimate of the redundancy a deduplication system would find for their concrete workload, and to provide a basis for further enhancements to the tools, e.g. application-specific chunking methods.

Currently the analysis relies on an in-memory hashtable, which limits the analyzable data set to a few hundred GB (unless you have a large shared-memory system). I have also developed Hadoop MapReduce jobs to calculate the redundancies, but that code is not yet ready for publication.
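As a rough sketch of what such an in-memory analysis looks like (again illustrative, not the fs-c code), one hashtable maps each chunk fingerprint to its size, and the redundancy estimate is the share of bytes that duplicate chunks account for:

    def estimate_redundancy(paths, chunker=cdc_chunks):
        """Return the fraction of bytes that deduplication would remove."""
        index = {}    # chunk fingerprint -> chunk length; the in-memory hashtable
        total = 0
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            for _, length, fp in chunker(data):
                index[fp] = length       # duplicate chunks overwrite the same entry
                total += length
        unique = sum(index.values())     # bytes left after removing duplicate chunks
        return 1.0 - unique / total if total else 0.0

The hashtable grows with the number of unique chunks, which is exactly why main memory becomes the bottleneck after a few hundred GB of data; a MapReduce formulation instead partitions the fingerprint space across machines.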

1 comment:

  1. Automatically imported comment
    Author: Prakti
    Date: Monday 06. April 2009


    Nice Pun with the fs-c ! Intent?
