Saturday, March 24, 2012

Reactivation of dirkmeister.de

Next month, an interesting time begins.

After presenting some recent work at MSST 2012, I will do a three-month internship at Greenplum, the data analytics part of EMC, in San Mateo: three months of California, three months of Silicon Valley, three months of sun, three months of coding.

Some of my friends requested photos, updates, and proof of life. I could also use Facebook, Google+, or even Twitter, but believe it or not, not all my friends use Facebook or Google+. Blogging feels very much like 2004, but it is OK for such a trip. For this reason, I will reactivate this blog.

I don't know the blogging policy of Greenplum/EMC, so I will stay on the safe side by doing two things:

  • Stressing that the views expressed here do not represent those of my current, previous, or future employers or project partners, including EMC, the University of Paderborn, and the University of Mainz.
  • Not even acknowledging that there could be something like a Greenplum Database.
Therefore, no technical content here. All the internship posts will be about photos, life in Silicon Valley (from my point of view), and weekend trips. 


This will be fun.

Saturday, January 09, 2010

Storage Systems Course: My proposal

In my last post, I summarized some of the storage systems courses at top international universities with storage systems labs.

In this post, I will distill my own ideas and views into a structure for a storage systems course. I assume a 15-week course with a single 90-minute lecture per week (as is usual in Germany):
  1. Introduction, Overview, Disk Drive Architecture
    Material: Ruemmler, Wilkes An introduction to disk drive modeling

  2. Disk Scheduling / SSD
    Material: Iyer, Druschel. Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O, Agrawal et al. Design Tradeoffs for SSD Performance

  3. RAID
    Material: Patterson et al. Introduction to Redundant Arrays of Inexpensive Disks (RAID), Corbett et al. Row-Diagonal Parity for Double Disk Failure Correction

  4. Local File Systems

  5. Local File System Case Studies: ext3, btrfs
    Material: Valerie Aurora. A short history of btrfs, Card et al. Design and Implementation of the Second Extended Filesystem

  6. Local File Structures (Sequential, Hashing, B-Tree)
    Material: Comer. The Ubiquitous B-Tree

  7. SAN / NAS / Object-based Storage
    Material: Sacks. Demystifying DAS, SAN, NAS, NAS Gateways, Fibre Channel, and iSCSI

  8. Examples: NFS, Ceph, GoogleFS/Hadoop DFS
    Material: Weil et al. Ceph: A Scalable, High-Performance Distributed File System, Ghemawat et al. The Google File System

  9. Snapshots and Log-based Storage Designs
    Material: Brinkmann, Effert. Snapshots and Continuous Data Replication in Cluster Storage Environments, Hitz et al. File System Design for an NFS File Server Appliance, Rosenblum, Ousterhout. The Design and Implementation of a Log-Structured File System

  10. Fault Tolerance, Journaling, and Soft Updates
    Material: Prabhakaran et al. Analysis and Evolution of Journaling File Systems, Seltzer et al. Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems

  11. Advanced Hashing: Consistent Hashing, Share, and Crush
    Material: Karger et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, Weil et al. CRUSH: controlled, scalable, decentralized placement of replicated data

  12. Caching, Replication
    Material: Nelson et al. Caching in the Sprite network file system, Kistler et al. Disconnected operation in the Coda File System

  13. Consistency, Availability, and Partition Tolerance
    Material: DeCandia et al. Dynamo: Amazon’s Highly Available Key-value Store, Helland. Life beyond Distributed Transactions: An Apostate's Opinion

  14. Data Deduplication
    Material: Muthitacharoen et al. A Low-bandwidth Network File System, Douglis, Iyengar. Application-specific Delta-encoding via Resemblance Detection

  15. Performance Analysis
    Material: Traeger et al. A Nine Year Study of File System and Storage Benchmarking (at least parts of it)
As books I would recommend:
For me, a few key points are important:
  • To clearly separate classes of file systems from concrete examples. The best example is the class of network file systems vs. NFS. In the end, there should be little question whether something is an inherent property of a class of file systems or of the concrete implementation.
  • To have enough time to cover the basic concepts independently of concrete usages. For example, explaining B-Trees as an important file structure independently of their use in, e.g., btrfs.
  • The concepts are more important than the current technology or standards.

Tuesday, December 01, 2009

Storage System and File System Courses

This year, I did a lot of research on storage systems classes taught at good universities. There were two reasons: first, this post by a researcher at NetApp about the lack of a good textbook for a storage or file systems class, and second, our own storage systems class, where I was the TA.

In this post, I want to give a short overview of the various courses, their focus, and other things. Please note that the following text might contain errors or misconceptions on my part. I also might have missed other storage courses at these universities.

University of California, Santa Cruz: 

Let's begin with the course at the University of California, Santa Cruz. Storage is a big topic at UCSC, with the Storage Systems Research Center, which partners with nearly everyone. The Ceph file system and the CRUSH hash function are two outcomes of their research.

The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures cover file systems, beginning with uniprocessor file systems and performance analysis and moving (very quickly) to distributed file systems. They also cover fault tolerance and other advanced topics. The reading material consists of 37 papers, from classics like "File System Design for an NFS File Server Appliance" to state-of-the-art research papers like "An Analysis of Data Corruption in the Storage Stack" (FAST 2008) that had appeared only about two weeks before.

I miss some basics that IMHO are important for understanding storage system design, like the properties of modern hard disks, and I am not that into archival storage (my boss is), but it is a really well-designed course. Unfortunately, the lecture slides are not available online.

Columbia University, New York
Advanced Topics in Network Storage Systems, Spring 2004:
http://www1.cs.columbia.edu/~magoutis/cs699810-spring04/index.html


I may have missed one, but the last storage-related course at Columbia University was given in 2004 by Kostas Magoutis. The course focuses on network storage and probably relies on basics from an operating systems class or a basic storage class. There was one lecture per week, with one to three papers as reading material per week.

Really nice is that the lecturer posted notes on how to read the papers, with questions and annotations for some of the material. Interestingly, data deduplication is covered with the LBFS paper, the Venti paper, and Henson's compare-by-hash papers.

Three books are recommended for the course: "UNIX Internals (1996)", "The Design and Implementation of the 4.4 BSD Operating System (1996)", and "NFS Illustrated (1999)".

Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009: 

At Cornell University, I found the course on advanced distributed storage systems by Hakim Weatherspoon (who took part in the OceanStore project). The lectures, two per week, cover "Cloud Computing", "Network File Systems", and the important topics of consistency, availability, replication, and scalability.

I think the major strength of this course is that it seems to focus much more than the other courses on the important concepts needed for storage system design, implementation, and research, rather than on standards, products, and storage management issues. The major weakness is that the individual lectures are very focused on presenting the content of the research papers, to the point that there is no common presentation scheme. I think this weakens the overall consistency of the lecture.

One interesting aspect of the course is that the students have to write and hand in short summaries of the reading material, each consisting of a summary (3-4 sentences), two or three major strengths, two or three weaknesses, and one question for future work that, in the student's opinion, should be pursued.

There are two projects as part of the course: In the first, the students have to develop a distributed file system based on Amazon Web Services infrastructure. The second is a research project that the students have to come up with by themselves.

Six books are recommended for the course: two books by Richard Stevens (UNIX Network Programming, Advanced Programming in the UNIX Environment), two books by Tanenbaum (Modern Operating Systems, Distributed Systems), "The Design and Implementation of the 4.4 BSD Operating System", and "The C++ Programming Language" by Stroustrup.

Johns Hopkins University
Storage Systems, Fall 2007:

At Johns Hopkins University, where our professor Christian Scheideler and my advisor Andre Brinkmann (as a visiting PhD student) had formerly been, I found the Storage Systems course by Randal Burns.

As usual, the course consists of a lecture series (two 50-minute lectures per week), homework, and a project. I like that the course covers some basics like disk drive architecture that are essential for understanding the design of storage systems. On the other hand, it is a bit short on distributed file systems.


University of Notre Dame:

In 2005, the University of Notre Dame offered the course "Distributed Storage" by Surendar Chandra.

As usual, the course consists of a series of lectures (two per week) and a project. The lecture topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of no fewer than 40 papers. My impression is that this collection differs a lot from the material of the other courses covered here; e.g., the well-known "classic" papers are not linked.

Technion

Technion, the "Israel Institute of Technology", is in Haifa, and as I said before, I am pretty envious of the students there. However, not because of the "Filesystems" course.

The lecture series consists of a short introduction to disk drive architecture, RAID, sequential data processing on tapes (hey, I infer this from the pictures in the slides only), disk-based sorting, B-Trees, hashing, concurrency and transactions, as well as recovery.

The course recommends five books: "File Structures: An Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". The books match the lectures exactly: they mostly cover the basics shared between databases and storage systems, but none of them is directly related to file systems.

The assignments seem to be pretty similar to ours: multiple assignments about a simple file system implementation. However, the assignments are given in Hebrew, so I don't understand them. I expected more from a Technion course.

University of Wisconsin in Madison: 
Advanced Storage Systems, Spring 2006:
http://pages.cs.wisc.edu/~remzi/Classes/738/Spring2006/


The advanced storage systems class given at the University of Wisconsin seems to be a nicely structured class with interesting topics: It begins with local storage systems, but moves very quickly (3rd topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability, as well as caching, replication, and consistency are discussed. The reading material is a nice list of now-classics like the WAFL paper, the AutoRAID paper, GoogleFS, and MapReduce, but also Row-Diagonal Parity and the "soft updates" paper.

Which universities are missing:
The University of California, Berkeley is missing: the home of BSD (and therefore the Fast File System), RAID, and a lot of early work in P2P storage seems to have no course focused on storage or file systems. I also could not find such classes at Stanford, Harvard, MIT, or Carnegie Mellon.

Summary
To sum these courses up: Most courses have large amounts of reading material. This is unusual in Germany (or at least at UPB); I had enough courses (especially in the SE part) without any reading material. We followed this "US style" in our course, but with only 12 papers. Most courses also have a project assignment for which the students have to come up with their own topic. I really like this, too.

Our own courses
Storage Systems (German), University of Paderborn, Spring 2009:
http://pc2.uni-paderborn.de/teaching/lectures/speichersysteme/


"Our" own storage systems course consists of a lecture series with 15 lectures of 90 minutes each and 6 assignments.

The lecture starts very slowly, with "Magnetic Storage Systems" (week 1), disk scheduling (week 2), an introduction to MEMS and flash storage (week 3), and RAID (weeks 4 and 5). Next came file systems (weeks 6 and 7) and storage connection technologies, from SCSI (week 8) to SANs (week 9). Network and parallel file systems are treated in weeks 10-12.

The assignments consisted of programming a small FUSE file system in C, step by step.

In the last third of the lecture, the course treated advanced storage topics that are interesting for our current research project, like long-term archiving, HPC I/O (MPI-IO), Continuous Data Protection (CDP), data deduplication, and P2P storage.

In addition to the reading material, we referred to the book "Linux Device Drivers".

Our professor, Andre Brinkmann, also gave a short course (6 lectures) called "Theoretical Aspects of Storage Systems Research" at the Politechnika Wrocławska in Poland, which is a very condensed version of our course focused on the theoretical aspects.

Last words:
I really liked studying and comparing the storage systems lectures. As long as a real storage systems textbook is missing, these lectures provide a pretty good overview of the classic (I should call them "essential") research papers of our field and of related books.

I am impressed that so many universities have "project" assignments where the students have to come up with a topic by themselves. These lectures show what is possible at good (mainly US) universities, with motivated students and the right foundations.

Monday, July 20, 2009

Travel report: Israel - Haifa, Jerusalem, and Tel Aviv

In the first weeks of May, I visited Israel to take part in the SYSTOR 2009 systems conference. The conference, which was hosted by the IBM Research Lab in Haifa, covered different systems aspects, but also had a deduplication track consisting of three talks. It was a nice trip, and this post covers the "business" part (meaning the conference) as well as thoughts about my private travel through the country after the conference.


Days 1 - 4: Haifa

The first day was hard. I arrived at 3 o'clock in the morning at Ben Gurion Airport in Tel Aviv. I had read that buses are the main way to travel between cities, so I expected a larger bus station (like the ones I later saw in Jerusalem and especially Tel Aviv). I found the station and waited there for around four hours. Then came a journey through the country. I sometimes had no idea where I was, but I am pretty sure that it was not the direct way to Haifa. It was my first time in a country that does not use Latin letters, and the Hebrew letters seemed like random noise to me. I was never good at learning languages. The start in Israel was therefore a bit rough.

The SYSTOR conference, on the other hand, took place in the nice and friendly location of the IBM Haifa Labs, near the University of Haifa. The conference is pretty small: 24 accepted papers, around 50 attendees. Some talks were really interesting, others were disappointing. I liked, e.g., Ethan Miller's talk about deduplication of virtual disk images. He presented no really unexpected results, but it is nice to have it written down what is important and what is not. I was disappointed by some of the industry talks. Some were much too marketing-driven and not deep enough in the technology I was really interested in. An example was the Mellanox talk.

On the second day, I presented the results of the first half of my master's thesis. Presenting at a conference was totally new to me. I think my presentation was pretty good. I think (and that is what matters) the audience understood my major points. The paper version of my talk is online in the ACM Digital Library.

But presenting a topic you know and had the opportunity to practice is one thing. Small talk and networking are also important, and at such a small conference, they are much easier. For example, I met a researcher whose blog I had read even before SYSTOR. Surprisingly, many people asked me about my blog and Twitter. I really liked the "social event" that took place in the old city of Caesarea, the city from which the Romans governed ancient Palestine and the site of a crusader fortress. Extremely cool view at sunset. Unfortunately, I have no photo of it.

One funny story: The title of the conference was "SYSTOR - The Israeli Experimental Systems Conference". The Israel Defense Forces (IDF) apparently misunderstood the focus on computer systems research and sent a soldier who is also a biology researcher to the conference. She probably understood pretty much nothing, just as I would understand nothing at a biology conference.

IBM is not the only cool company in Haifa. Nearly every top company has an engineering facility there: NetApp, Google, Yahoo, LSI, and many others. SAP has an office in another town near Haifa. So I am pretty envious of the students of the University of Haifa or of Technion, the Israel Institute of Technology. They have more top CS-related companies in their city than we have in all of Germany.

Days 5 - 7: Jerusalem and Dead Sea

After the conference, I stayed another week in Israel. From Haifa, I went to Jerusalem, a pretty extreme city. Everything there is about religion. As a mainly secular liberal (in the European sense), I found it pretty strange, and I would never ever want to live in that city.

On Saturday (Sabbath), my impression was that the city is practically shut down. Every restaurant is closed, and the streets are pretty empty (only sherut taxis are driving). As a tourist, you had better buy water on Friday, or you have a problem. This is totally different from Tel Aviv, where life is much more relaxed and people go to the beach on Sabbath. After seeing both cities, it is no surprise to read newspaper articles about the conflicts between the "Jerusalem lifestyle" and the "Tel Aviv lifestyle", e.g., here at spiegel.de.

I know how much history is located in the Old City, but I didn't like visiting it. Too many pushy souvenir traders and shady tourist guides. I really don't like that, especially in the Christian and Muslim quarters. I haven't seen them in the Jewish quarter, but maybe only because it was Sabbath, when even dubious souvenir traders have their day off. Please: if you consider something your "Holy Place", show some dignity.

After noticing that every direction sign towards the Western Wall (what we call "Klagemauer" in German) is intentionally misleading, I oriented myself only by my "Lonely Planet". Eventually, I found the Western Wall Plaza, the only place in the Old City that has kept its dignity. This is mainly because the rules stated on large signs keep people like me from disturbing the prayers too much (no photographs on Sabbath, for example). I was impressed by how close all these locations are in reality: the Western Wall, the Temple Mount, the Al-Aqsa Mosque. Wow.

On my second day in Jerusalem, I took a day trip to the Dead Sea. The "Lonely Planet" says that the "Ein Gedi Beach" is an "undeservedly popular beach". The book is right. Dear reader, if you visit the Dead Sea, please ride one Egged stop further. The "Ein Gedi Spa" is probably what you want to visit. However, swimming in the Dead Sea was awesome. It is a totally strange feeling to be pinned to the surface. I knew the photos of people reading newspapers in the Dead Sea, but I never really believed them. I always thought that it is maybe a bit easier to stay on the surface, but you really can read while swimming. We should try playing water polo in the Dead Sea.

Days 8 - 11: Tel Aviv





After a dose of "culture" in Jerusalem, I went to Tel Aviv. And Tel Aviv is a really nice city. While in Haifa and Jerusalem I had problems finding restaurants I could trust and finding a supermarket, both were easy in Tel Aviv. The shopping streets were fun, at least once you accept that it is normal for crowds of cute girls in uniform to jump up and down, shrieking, in front of bikini shops.

The beach was great. The 6 km promenade to the old city of Jaffa (the oldest documented harbor in the world) was great. I visited Tel Aviv University. Palm trees on the university grounds. Lots of green. Have I said that I was sometimes pretty envious of the students there?

At the time I visited Tel Aviv, you could stay at the beach and swim, but the beaches were pretty empty. So I visited two museums in Tel Aviv: the Diaspora Museum at the university (the Diaspora is the name for the time when the Jewish people lived in exile) and the Haganah Museum on Rothschild Boulevard near Independence Hall. Even considering that the Haganah Museum is pretty biased, it helped to fill some holes in my understanding of the history of the country. In German schools, the history of the Jews/Palestine/Israel is not a topic for the period between the year 70 and 1933, nor after 1945. I suppose most people in Germany think that 1933ff was the first time Jewish people returned to their country. However, I don't want to blame the schools for that. Class time is limited. Even the history of Germany after 1945 is not a topic in German schools.

I stayed at the Prima hotel. Nice, perfect location directly at the beach, but contrary to what the "Lonely Planet" says, there is no free internet. Only terribly expensive WLAN.

I really liked Tel Aviv, and if it were not so expensive, I would revisit it, for sure. Some kind of local travel guide would be nice next time.

More random notes

My return to Germany from Ben Gurion Airport was more eventful than expected. Some presents I bought in Tel Aviv aroused the suspicion of the security staff, and I got an extensive security check with some high-tech explosive detection devices. I had to show that my MBP and my camera really work. I had to remove the battery, etc. They even checked my "dirty clothes bag" for explosives. Fortunately, the atmosphere was pretty nice; e.g., the security staff found my English-Hebrew book interesting and apparently found some of the translations pretty funny. Totally cute was that one of the security girls carefully re-packed the gifts, which they had checked in detail, back into a box. The wrapping was nicer afterwards than before.

One thing I found very strange at the beginning: there are soldiers with weapons everywhere. Not because of checkpoints (I only saw one checkpoint, near the Dead Sea), but because it is normal to wear the uniform in one's spare time and often to carry one's personal weapon. I did my mandatory military service as a mechanized infantryman in 2000/2001. In the German army, soldiers are very strict when it comes to weapons. Carrying an assault rifle in one's spare time is completely unthinkable in Germany. For example, recent newspaper reports state that soldiers who forgot to leave their personal pocket knife(!) at the barracks and are checked by the police risk fines of up to 10,000 Euro (14,000 USD) due to very strict German weapon laws. From a military point of view, it is also clear why IDF soldiers take their weapons home. Similar to Switzerland, the country is simply too small to allow time for a lengthy mobilization in case of an attack. But it is weird. In Germany, even wearing the uniform in one's spare time is only allowed under very strict rules: a soldier may only wear it on the direct way between home and the barracks. In Israel, I saw a couple walking through the city park of Jerusalem, holding hands on Sabbath, and the man had a rifle on his back. Nothing unusual there.

Another thing about the IDF soldiers I found strange is what I would call a lack of discipline. They hang around at bus stations, putting on lipstick, chewing gum, wearing non-uniform clothes like flip-flops, or (as described above) going shopping in groups. Some female soldiers had opened the top three or four buttons of their shirts. None of this would be considered appropriate here. While I quickly got used to seeing uniforms everywhere, the lack of discipline kept surprising me.

Eating: I love English breakfast, so the Israeli breakfast is probably its exact opposite. Due to religious rules, they do not eat meat, eggs, ... for breakfast. The fresh salads, fruits, and cakes were really nice (at least in good hotels), but I still missed my favorite topping (ham). I am not really a fan of marmalade. The rest of the food I would consider "Arabic" (Israelis will probably scream when reading this): hummus (delicious!), couscous, and these extremely sweet cakes. Falafel seems to be some kind of national dish.

I have not fully understood what "kosher" really means, but nearly every restaurant I found had a kosher certificate. It is still really important there. But at least in Tel Aviv, there are also non-kosher restaurants; e.g., if I understand it correctly, an Italian restaurant is by definition non-kosher. In practice, e.g., in the IBM cafeteria, I didn't find the kosher thing limiting at all, except for breakfast.

Thursday, June 25, 2009

A small reminder to write about Israel

Today, lawblog.de gave me a small reminder to write a travel report about Israel:
During police checks, especially in Schleswig-Holstein, off-duty soldiers have been confronted with a legal accusation: violation of the Weapons Act. Their offense? The soldiers, mostly in uniform on their way home for the weekend, were carrying the Bundeswehr's "standard knife".

It is a pocket knife (manufacturer: Victorinox), but in the form of a one-handed knife. The special thing about one-handed knives is that they can be opened with one hand. The problem: one-handed knives have recently come under the Weapons Act. Anyone carrying such a knife risks a fine of up to 10,000 euros.
Only in Germany could someone come up with something like this, right?

In Israel (and fortunately, unlike Israel, Germany is surrounded by friends), you see soldiers everywhere with their weapons on the way home and on the way to the barracks. Or on Sunday, taking a walk in the park with their sweetheart.
It took a lot of getting used to, and I am glad that this is neither necessary nor common here, but making a fuss about the "standard knife" is something only Germany could come up with.

Saturday, June 20, 2009

dirkmeister.de down?

The web server that also hosts dirkmeister.de has had a hardware defect for the third time within one year. The switch of hosting provider that followed, which certainly makes sense, isn't going well at all: I can't get the domain transferred, I have no proper access to the new server, etc.

Because of these problems, I am moving to Blogger for now. After the last crash, it took me a year to get back to the reader numbers I had before. The show must go on. If some hard disk crashes here, I couldn't care less. If I like it "here", I may stay with Blogger for good.

Other websites are down for the same reason, for example the blog aggregator juli-blogs.de. juli-blogs.de is not down because I left the "Junge Liberale" (Young Liberals); that was just an unfortunate coincidence in timing. I have no plans to shut down juli-blogs.de.

Sunday, April 26, 2009

Twitter Updates 2009-04-27

  • Wow. Oracle buys Sun! Changes: weirder version numbers for Sun products? Future of MySQL? Oracle is more anti-agile than IBM.
  • Listening to "Storage Systems".
  • My prof read the Scala book during his holiday and finds the language nice. Maybe we will use it more at the PC². That would be nice.
  • Nice overview of Cloud Computing (including OSS) in the current "Linux Magazin" (German). I would like to evaluate "Eucalyptus".
  • Removed a very hard-to-track bug in my master's thesis dedup system (OK, I introduced the bug after the thesis). I like this part of my job.
  • Another talk about deduplication from Santa Cruz that seems interesting: "Data Sequentiality Potential for Data De-Duplication Schemes".
  • Searching for the current mailing address of Peter Levart, project owner of FUSE-J. Sad that the project seems to be abandoned.

Sunday, April 19, 2009

Twitter Updates 2009-04-19

  • Oh. For weeks, no response from my hotel in Haifa. Today, both hotels responded within 30 minutes.
  • Playing with FUSE-J. Problems with the C generator: it finds methods with the same name and the same parameters but different return types: ByteBuffer.array
  • Visited the PhD defense of a friend today. :-)
  • "Who want to write a paper for next year, "grep vs reverse index"?" on the Hadoop mailing list: http://tinyurl.com/delgj2 Lol
  • Just installed my Intel X25-E SSD.
  • The grading of my master's thesis is now finished. Finally, I'm really done. Nice.
  • Today I played soccer organized by the PACE international graduate school. Finished 8th of 9 teams. Oh.
  • SSRC seminar talk by Deepavali Bhagwat: "Extreme Binning: Scalable, Parallel Deduplication". I would like to listen to this talk.

Sunday, April 12, 2009

Twitter Updates 2009-04-12

  • Now I have opened my first open source project. My first open source activity since the bad experience with the Shox network simulator last year.
  • Just watched Monsters vs. Aliens in 3D. Not the best DreamWorks movie, but the 3D experience was "Wow". I bet most films will be 3D in a few years.
  • Listening to the Google Collections episode of the Java Posse. Used the library in the Shox project two years ago. Nice engineering!
  • The hotel in Haifa I picked first doesn't respond to my mails. One mail two weeks ago, one two days ago. Strange. Now I'm trying another one.
  • Quillen: Nice deduplication project using AWS S3 on Google Code: http://code.google.com/p/quillen/
  • Statebook: http://www.statebook.co.uk/

Monday, April 06, 2009

Just Released: Filesystem Chunking

I just released some tools that I developed to evaluate chunking-based data deduplication techniques on various systems and to evaluate new chunking methods as the new open source project "fs-c" (for filesystem chunking).

The fs-c tools allow you to analyze the internal and temporal redundancy of file system directories, as found by content-defined chunking using Rabin's fingerprinting method and by static chunking with different chunk sizes.
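To make the difference between the two chunking approaches concrete, here is a minimal Python sketch. It is not the fs-c code: as a stand-in for Rabin's fingerprinting (which works with an irreducible polynomial over GF(2)), it uses a simpler Rabin-Karp-style polynomial rolling hash modulo a prime, and the window size, boundary mask, and minimum/maximum chunk sizes are assumed values chosen only for illustration. A boundary is declared wherever the hash of the last 48 bytes matches the mask, so boundaries depend only on local content, whereas static chunk boundaries move with every inserted byte.

```python
from typing import List

# All parameters below are illustrative assumptions, not fs-c's settings.
WINDOW = 48              # bytes covered by the rolling hash
BASE = 257               # polynomial base
MOD = (1 << 61) - 1      # large prime modulus (a Mersenne prime)
MASK = (1 << 13) - 1     # 13 low bits: expect a boundary every ~8 KiB past MIN_SIZE
MIN_SIZE = 2 * 1024      # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024     # always cut a chunk at this length


def static_chunks(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes) -> List[bytes]:
    """Cut chunks where the rolling hash of the last WINDOW bytes matches MASK."""
    chunks: List[bytes] = []
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the byte that falls out on the left.
            h = (h - data[i - WINDOW] * out_weight) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        at_boundary = length >= MIN_SIZE and (h & MASK) == MASK
        if at_boundary or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the window at the beginning of the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a boundary
    return chunks
```

Because MIN_SIZE is larger than the window, every boundary decision sees exactly the last 48 bytes. Prepending a single byte to a file therefore typically changes only the first content-defined chunk, while it changes every static chunk, which is why content-defined chunking usually finds more redundancy between versions of modified files.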

The goal is to allow users to get a rough estimate of the redundancy that deduplication systems would find in their concrete workload, and to provide a basis for further enhancements to the tools, e.g., for application-specific chunking methods.

Currently, the analysis is only done using an in-memory hash table, which limits the amount of analyzed data to a few hundred GB (or you need a large shared-memory system). I have also developed Hadoop MapReduce tasks to calculate the redundancies, but that code is not ready for publication yet.
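The redundancy analysis itself can be sketched as a lookup in an in-memory hash table. In this hypothetical helper, each chunk is identified by its SHA-1 digest (an assumed choice of fingerprint), and a byte counts as redundant if its chunk has been seen before. A fresh table gives an estimate of internal redundancy; reusing the table filled by an earlier snapshot additionally counts content shared with that snapshot, i.e., temporal redundancy. These definitions are an interpretation of the terms, not necessarily how fs-c computes them.

```python
import hashlib
from typing import Dict, Iterable, Optional


def redundancy(chunks: Iterable[bytes],
               index: Optional[Dict[bytes, int]] = None) -> float:
    """Return the fraction of bytes whose chunk was already in the index.

    The index maps a chunk's SHA-1 digest to the chunk length and is updated
    in place. Start with an empty index to estimate internal redundancy, or
    pass the index filled by an older snapshot to also count content shared
    with it (temporal redundancy).
    """
    if index is None:
        index = {}
    total = 0
    duplicate = 0
    for chunk in chunks:
        fingerprint = hashlib.sha1(chunk).digest()
        total += len(chunk)
        if fingerprint in index:
            duplicate += len(chunk)          # redundant: content seen before
        else:
            index[fingerprint] = len(chunk)  # first occurrence: remember it
    return duplicate / total if total else 0.0
```

For example, the chunks b"aaaa", b"bbbb", b"aaaa", b"cccc" yield 0.25 with an empty table; feeding b"aaaa", b"dddd" afterwards into the same table yields 0.5. Because the table holds one entry per unique chunk, memory grows with the amount of unique data: at an assumed 8 KiB average chunk size, 300 GiB of unique data already means roughly 39 million entries.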