dirkmeister.de: 04/01/2009

Sunday, April 26, 2009

Twitter Updates 2009-04-27

Wow. Oracle buys Sun! Changes: More weird version numbers for Sun products? Future of MySQL? Oracle is more anti-agile than IBM. #
Listening to "Storage Systems". #
My Prof. read the Scala book during the holidays and finds the language nice. Maybe we will use it more at the PC^2. That would be nice. #
Nice overview of Cloud Computing (including OSS) in the current "Linux Magazin" (German). I would like to evaluate "Eucalyptus". #
Removed a very hard-to-track bug in my master's thesis dedup system (ok, I introduced the bug after the thesis). I like this part of my job. #
Another talk about deduplication from Santa Cruz that seems interesting: "Data Sequentiality Potential for Data De-Duplication Schemes". #
Searching for the current mailing address of Peter Levart, project owner of FUSE-J. Sad that the project seems to be abandoned. #

Sunday, April 19, 2009

Twitter Updates 2009-04-19

Oh. For weeks no response from my hotel in Haifa. Today both hotels responded within 30 minutes. #
Playing with FUSE-J. Problems with the C generator. It finds a method with the same name and the same parameters but a different return type: ByteBuffer.array #
Visited a PhD defence of a friend today. :-) #
" Who want to write a paper for next year, "grep vs reverse index"?" on Hadoop mailing list: http://tinyurl.com/delgj2 Lol #
Just installed my Intel X25-E SSD. #
The grading of my master's thesis is now finished. Finally I'm really done. Nice. #
Today I played soccer organized by the PACE international graduate school. Finished as 8th of 9 teams. Oh. #
SSRC Seminar talk by Deepavali Bhagwat "Extreme Binning: Scalable, Parallel Deduplication". I would like to listen to this talk. #

Sunday, April 12, 2009

Twitter Updates 2009-04-12

I have now opened my first open source project. My first OS activity since the bad experience with the Shox network simulator last year. #
Just watched Monsters vs. Aliens in 3D. Not the best Pixar movie, but the 3D experience was "Wow". I bet most films will be 3D in a few years. #
Listening to Google Collect episode of the Javaposse. Used the library in Shox project two years ago. Nice engineering! #
The hotel in Haifa I picked first doesn't respond to my mails. One mail two weeks ago. One two days ago. Strange. Now I will try another one. #
Quillen: Nice deduplication project using AWS S3 on Google Code: http://code.google.com/p/quillen/ #
Statebook: http://www.statebook.co.uk/ #

Monday, April 06, 2009

Just Released: Filesystem Chunking

I just released some tools as the new open source project "fs-c" (for filesystem chunking). I developed them to evaluate chunking-based data deduplication techniques on various systems and to evaluate new chunking methods.

The fs-c tools make it possible to analyze the internal and temporal redundancy of file system directories, as found by content-defined chunking using Rabin's fingerprinting method and by static chunking with different chunk sizes.
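To make the difference between the two chunking strategies concrete, here is a minimal sketch in Python. It is not the fs-c code: instead of Rabin's fingerprinting method it uses a simple polynomial rolling hash as a stand-in, and all names and parameters (`window`, `mask`, `min_size`, `max_size`) are assumptions chosen for illustration.

```python
import random


def static_chunks(data, size):
    """Static chunking: cut at fixed offsets; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Content-defined chunking with a polynomial rolling hash.

    A chunk ends where the hash of the last `window` bytes has all bits of
    `mask` set, so boundaries depend on the content and not on the offset.
    This is a stand-in for Rabin fingerprints, not the same function.
    """
    base = 257
    mod = (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the oldest byte in the window
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= window:
            # Window is full: drop the oldest byte before adding the new one.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        length = i - start + 1
        if (length >= min_size and (h & mask) == mask) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_chunks(a, b):
    """Number of distinct chunks that occur in both chunk lists."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    rng = random.Random(42)
    data = bytes(rng.randrange(256) for _ in range(20000))
    shifted = b"X" + data  # one byte inserted at the front

    print("static :", shared_chunks(static_chunks(data, 128),
                                    static_chunks(shifted, 128)))
    print("content:", shared_chunks(content_defined_chunks(data),
                                    content_defined_chunks(shifted)))
```

Inserting a single byte at the front shifts every static chunk boundary, so static chunking finds almost nothing in common between the two versions. The content-defined boundaries fall back onto the same positions in the content after the insert, so most chunks are detected as duplicates.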

The goal is to give users a rough estimate of the redundancy that de-duplication systems would find for their concrete workload, and to provide a basis for further enhancements to the tools, e.g. application-specific chunking methods.

Currently the analysis is done only with an in-memory hash table, which limits the size of the analyzed system to a few hundred GB of data (or you need a large shared-memory system). I have also developed Hadoop MapReduce tasks to calculate the redundancies, but that code is not ready for publication.
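The in-memory analysis can be sketched roughly as follows. This is an illustration under assumptions, not the fs-c implementation: the SHA-1 fingerprint, the 8 KB average chunk size, and the function names are all hypothetical choices.

```python
import hashlib


def estimate_redundancy(chunks):
    """Return (total_bytes, redundant_bytes) for an iterable of chunks.

    Every fingerprint seen so far is kept in one in-memory hash table,
    which is what limits the amount of data that can be analyzed.
    """
    index = {}  # fingerprint -> chunk length
    total = 0
    redundant = 0
    for chunk in chunks:
        fingerprint = hashlib.sha1(chunk).digest()  # assumed fingerprint function
        total += len(chunk)
        if fingerprint in index:
            redundant += len(chunk)  # seen before: a dedup system would skip it
        else:
            index[fingerprint] = len(chunk)
    return total, redundant


def fingerprint_bytes(data_bytes, avg_chunk_size, fingerprint_len=20):
    """Raw fingerprint storage for a data set, ignoring hash table overhead."""
    return (data_bytes // avg_chunk_size) * fingerprint_len


if __name__ == "__main__":
    chunks = [b"a" * 4096, b"b" * 4096, b"a" * 4096, b"c" * 100]
    total, redundant = estimate_redundancy(chunks)
    print(f"{redundant} of {total} bytes are redundant")

    # 500 GiB of data with 8 KiB chunks and 20-byte fingerprints
    print(fingerprint_bytes(500 * 2**30, 8192), "bytes of raw fingerprints")
```

With the assumed 8 KB chunks, 500 GB of data already produce about 65 million fingerprints, or roughly 1.3 GB of raw SHA-1 values before any hash table overhead, which shows why the index outgrows the memory of a single machine at this scale.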

Sunday, April 05, 2009

Twitter Updates 2009-04-05

In Willingen for the PC^2 Seminar. Listening to a talk about "Scientific English" #
I'm aware of lots of storage research papers from NetApp, IBM and HP. I know some from Sun. Are there any from EMC I should be aware of? #
Something sucks here. Scalax and/or Scala Eclipse Plugin. OutOfMemoryException (on my 4 GB MBP) while simply compiling this project. #

Thursday, April 02, 2009

Comparison of One-Hop Distributed Hash Tables (DHT)

I just uploaded an old semester work, which I haven't published here yet, about a comparison of one-hop DHTs:

Distributed Hash Tables (DHT) are an important substrate of several peer-to-peer (P2P) applications. Most existing approaches favor a small memory and network overhead over lookup latency. New approaches question this tradeoff and allow a lookup using only one hop, but they store the routing information for all nodes on each node in the system and so require higher background traffic to keep the routing tables up to date. In this paper the design of three one-hop DHT approaches is described and compared in detail. This comparison shows that different assumptions are used to analyze the approaches. Therefore, several parameters are inspected and a unified parameter setting is extracted. Using the unified parameter setting, a fair and meaningful comparison of the approaches is possible. In particular, the bandwidth consumption, fault tolerance properties, the usage of heterogeneity in the P2P network, and the scalability are compared. The comparison shows that the unified parameter setting leads to different relative results than originally stated by the approach designers.

PDF
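The one-hop idea from the abstract can be illustrated with a small sketch. It is a generic consistent-hashing ring where each node holds the full membership list, not one of the three approaches compared in the paper; the class name, the node names, and the use of SHA-1 for ring positions are assumptions.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or key to a position on a 64-bit ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")


class OneHopNode:
    """A node that stores routing information for all nodes in the system."""

    def __init__(self, name, all_nodes):
        self.name = name
        # Full routing table: cheap lookups, but it must be kept up to date
        # by background traffic whenever nodes join or leave.
        self.ring = sorted((ring_position(n), n) for n in all_nodes)
        self.positions = [pos for pos, _ in self.ring]

    def owner_of(self, key):
        """Find the responsible node locally; contacting it is the one hop."""
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(8)]
    tables = {n: OneHopNode(n, nodes) for n in nodes}
    print(tables["node-0"].owner_of("some-key"))
```

Because every node resolves the owner from its own table, any node can send the request directly to the responsible node. The price is that each membership change has to reach all nodes, which is the background traffic the comparison looks at.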