Tuesday, December 01, 2009

Storage System and File System Courses

This year I spent a lot of time looking at storage system classes given at good universities. There were two reasons: the first was this post by a researcher at NetApp about the lack of a good textbook for storage or file system classes, and the second was our own storage systems class, where I was the TA.

In this post I want to give a short overview of the various courses, their focus, and other aspects. Please note that the following text might contain errors or misconceptions on my part. I might also have missed other storage courses at these universities.

University of California, Santa Cruz: 

Let's begin with the course at the University of California, Santa Cruz. Storage is huge at UCSC, with the Storage Systems Research Center that partners with nearly everyone. The Ceph file system and the CRUSH hash function are two outcomes of their research.

The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures begin with uniprocessor file systems and performance analysis and move (very quickly) to distributed file systems. They also cover fault tolerance and other advanced topics. The reading material consists of 37 papers, from classics like "File System Design for an NFS File Server Appliance" to state-of-the-art research papers like "An Analysis of Data Corruption in the Storage Stack" (FAST 2008) that had come out about two weeks before.

I miss some basics that IMHO are important for understanding storage system design, like the properties of modern hard disks, and I am not that into archival storage (my boss is), but it is a really well-designed course. Unfortunately, the lecture slides are not available online.

Columbia University, New York
Advanced Topics in Network Storage Systems, Spring 2004:
http://www1.cs.columbia.edu/~magoutis/cs699810-spring04/index.html


I may have missed one, but the last storage-related course at Columbia University was given in 2004 by Kostas Magoutis. The course focuses on network storage and probably relies on basics from an operating systems class or a basic storage class. There was one lecture per week, with one to three papers as reading material per week.

A really nice touch is that the lecturer has posted notes on how to read the papers, with questions and annotations for some of the material. Interestingly, data deduplication is covered with the LBFS paper, the Venti paper, and Henson's compare-by-hash papers.

Three books are recommended for the course: "UNIX Internals" (1996), "The Design and Implementation of the 4.4BSD Operating System" (1996), and "NFS Illustrated" (1999).

Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009: 

At Cornell University, I found the course Advanced Distributed Storage Systems by Hakim Weatherspoon (who took part in the OceanStore project). The lectures, given twice per week, cover "Cloud Computing", "Network File Systems", and the important topics of consistency, availability, replication, and scalability.

I think the major strength of this course is that it seems to focus much more than the other courses on the important concepts needed for storage system design, implementation, and research, rather than on standards, products, and storage management issues. The major weakness is that the individual lectures are very focused on the research papers whose content is presented, even to the point that there is no single presentation scheme. I think the overall consistency of the lecture series is weakened this way.

One interesting aspect of the course is that the students have to write and hand in short summaries of the papers in the reading material, each consisting of a summary (3-4 sentences), two or three major strengths, two or three weaknesses, and one question for future work that should be pursued in the opinion of the student.

There are two projects as part of the course: in the first, the students have to develop a distributed file system based on Amazon Web Services infrastructure. The second is a research project that the students have to come up with by themselves.

Six books are recommended for the course: two books by Richard Stevens (UNIX Network Programming, Advanced Programming in the UNIX Environment), two books by Tanenbaum (Modern Operating Systems, Distributed Systems), "The Design and Implementation of the 4.4BSD Operating System", and "The C++ Programming Language" by Stroustrup.

Johns Hopkins University
Storage Systems, Fall 2007:

At Johns Hopkins University -- where our professor Christian Scheideler and my advisor Andre Brinkmann (as a visiting PhD student) had formerly been -- I found the Storage Systems course by Randal Burns.

As usual, the course consists of a lecture series (two lectures of 50 min per week), homework, and a project. I like that the course covers some basics like disk drive architecture that are essential to understanding the design of storage systems. On the other hand, it is a bit short on distributed file systems.


University of Notre Dame:

In 2005, the University of Notre Dame offered the course "Distributed Storage" by Surendar Chandra.

As usual, the course consists of a series of lectures (two per week) and a project. The lecture topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of no fewer than 40 papers. My impression is that the collection of reading material differs a lot from the material of the other courses covered here, e.g. the well-known "classical" papers are not linked.

Technion

Technion is the "Israel Institute of Technology" in Haifa, and as I said before: I am pretty envious of the students there. However, not especially because of the "Filesystems" course.

The lecture series consists of a short introduction to disk drive architecture, RAID, sequential data processing on tapes (hey, I infer this from the pictures in the slides only), disk-based sorting, B-trees, hashing, concurrency, and transactions as well as recovery.

The course recommends five books: "File Structures: An Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". The books match the lectures exactly: they mostly cover the basics shared between databases and storage systems, but none of them is directly related to file systems.

The assignments seem to be pretty similar to ours: apparently several assignments that build up a simple filesystem implementation. However, the assignments are given in Hebrew, so I don't understand them. I expected more from a Technion course.

University of Wisconsin in Madison: 
Advanced Storage Systems, Spring 2006:
http://pages.cs.wisc.edu/~remzi/Classes/738/Spring2006/


The advanced storage systems class given at the University of Wisconsin seems to be a nicely structured class with interesting topics: it begins with local storage systems, but moves very quickly (by the third topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability, as well as caching, replication, and consistency are discussed. The reading material is a nice list of what are now classics, like the WAFL paper, the AutoRAID paper, and the GoogleFS and MapReduce papers, but also Row-Diagonal Parity and the "soft updates" paper.

Which universities are missing:
The University of California, Berkeley is missing: the home of BSD (and therefore the Fast File System), RAID, and a lot of early work in P2P storage seems to have no course focused on storage or file systems. I also could not find classes at Stanford, Harvard, MIT, or Carnegie Mellon.

Summary
To sum these courses up a bit: most courses have large amounts of reading material. This is unusual in Germany (or at least at UPB). I had plenty of courses (especially in the SE part) without any reading material. We followed this "US style" in our course, but with only 12 papers. Most courses have a project assignment where the students have to come up with their own topic. I really like this, too.

Our own courses
Storage Systems (German), University of Paderborn, Spring 2009:
http://pc2.uni-paderborn.de/teaching/lectures/speichersysteme/


"Our" own storage systems course consists of a lecture series with 15 lectures of 90 min each and 6 assignments.

The lecture series starts very slowly, with "Magnetic Storage Systems" (week 1), disk scheduling (week 2), an introduction to MEMS and flash storage (week 3), and RAID (weeks 4 and 5). Next came file systems (weeks 6 and 7) and storage connection technologies from SCSI (week 8) to SANs (week 9). Network and parallel file systems are treated in weeks 10 to 12.

The assignments consisted of programming a small FUSE filesystem in C (step by step).

In the last third of the lecture series, the course treated advanced storage topics that are interesting for our current research projects, like long-term archiving, HPC I/O (MPI-IO), Continuous Data Protection (CDP), data deduplication, and P2P storage.

In addition to the reading material, we referred to the book "Linux Device Drivers".

Our professor, Andre Brinkmann, also gave a short course (6 lectures) called "Theoretical Aspects of Storage Systems Research" at the Politechnika Wrocławska in Poland, which is a very condensed version of our course focused on the theoretical aspects.

Last words:
I really liked studying and comparing the storage system lectures. These lectures provide a pretty good overview of the classical (I should call them "essential") research papers of our field and of related books, as long as a real storage systems textbook is missing.

I am impressed that so many universities have "project" assignments where the students have to come up with a topic by themselves. These lectures show what is possible at good (mainly US) universities, with motivated students, and with the right foundations.

Monday, July 20, 2009

Travel report: Israel - Haifa, Jerusalem, and Tel Aviv

In the first weeks of May I visited Israel to take part in the SYSTOR 2009 systems conference. The conference, which was hosted by the IBM Research Labs in Haifa, covered different systems aspects, but also had a deduplication track consisting of three talks. It was a nice trip, and this post covers the "business" -- meaning the conference -- as well as thoughts about my private travel through the country after the conference.



Day 1 - 4: Haifa

The first day was hard. I arrived at 3 o'clock in the morning at Ben Gurion Airport in Tel Aviv. I had read that buses are the major way to travel between cities, so I expected a larger bus station (like the ones I later saw in Jerusalem and especially Tel Aviv). I found the station and waited there for around 4 hours. Then came a journey through the country. I sometimes had no idea where I was, but I am pretty sure that it was not the direct way to Haifa. It was my first time in a country that does not use Latin letters. The Hebrew letters seemed like random noise to me. I was never good at learning languages. The start in Israel was therefore a bit rough.

The SYSTOR conference, on the other hand, started in the nice and friendly location of the IBM Haifa Labs, near the University of Haifa. The conference is pretty small: 24 accepted papers, around 50 visitors. Some talks were really interesting, others were disappointing. For example, I liked Ethan Miller's talk about deduplication of virtual disk images. He presented no really unexpected results, but it is nice to have written down what is important and what is not. I was disappointed by some of the industry talks. Some were much too marketing-driven and not deep enough in the technology I was really interested in. An example of this was the talk by Mellanox.

On the second day I presented the results of the first half of my master's thesis. Presenting at a conference was totally new to me. I think that my presentation was pretty good. I think -- and that is important -- the audience understood my major points. The paper version of the talk I gave is online in the ACM Digital Library.

But presenting a topic you know, and where you had the opportunity to practice, is one thing. Small talk and networking are also important, and at such a small conference they are much easier. For example, I met a researcher whose blog I had read even before SYSTOR. Surprisingly many people asked me about the blog and Twitter. I really liked the "social event" that took place in the old city of Caesarea, the city from which the Romans governed ancient Palestine and later a crusader fortress. Extremely cool view at sunset. Unfortunately I have no photo of it.

One funny story: the title of the conference was "SYSTOR - The Israeli experimental systems conference". The Israel Defense Forces (IDF) obviously misunderstood the focus on computer systems research and sent an IDF soldier, who is also a biology researcher, to visit the conference. She probably understood pretty much nothing, just as I would understand nothing at a biology conference.

IBM is not the only cool company in Haifa. Nearly every top company has an engineering facility there: NetApp, Google, Yahoo, LSI, and many others. SAP has an office in another town near Haifa. So I am pretty envious of the students of the University of Haifa or of Israel's technical university, the Technion. They have more CS-related top companies in their city than we have in all of Germany.

Day 5 - 7: Jerusalem and Dead Sea

After the conference, I stayed another week in Israel. From Haifa, I went to Jerusalem, a pretty extreme city. Everything is about religion there. As a mainly secular liberal (in a European sense), I found it pretty strange, and I would never ever want to live in that city.

On Saturday (Sabbath) my impression was that the city is practically shut down. Every restaurant is closed, and the streets are pretty empty (only Sherut taxis are driving). As a tourist you had better buy water on Friday, or you have a problem. This is totally different from Tel Aviv, where life is much more relaxed and people go to the beach on Sabbath. Reading newspaper articles about conflicts between the "Jerusalem Life Style" and the "Tel Aviv Life Style" is no surprise after seeing both cities, e.g. here at spiegel.de.

I know the amount of history located in the old city, but I didn't like visiting it. Too many persistent souvenir traders and shady tourist guides. I really don't like that. Especially in the Christian and the Muslim quarters. I haven't seen them in the Jewish quarter. Maybe only because it was Sabbath, and then even dubious souvenir traders have their day off. Please: if you consider something your "Holy Place", show some dignity.

After noticing that every direction sign towards the Western Wall (what we call "Klagemauer" in German) is intentionally misleading, I oriented myself only by my "Lonely Planet". Eventually, I found the Western Wall plaza -- the only place in the old city that has been left its dignity. Mainly because the rules stated on large signs ensure that the people praying are not disturbed too much by people like me (no photographs on Sabbath, for example). I was impressed by how close all these locations are in reality. The Western Wall, the Temple Mount, the Al-Aqsa Mosque. Wow.

On my second day in Jerusalem, I took a day trip to the Dead Sea. The "Lonely Planet" says that the "Ein Gedi Beach" is an "undeservedly popular beach". The book is right. Dear reader, if you visit the Dead Sea, please drive one Egged station further. The "Ein Gedi Spa" is probably what you want to visit. However, swimming in the Dead Sea was awesome. It is a totally strange feeling to be pinned to the surface. I knew photos of people reading a newspaper in the Dead Sea, but I never really took them for real. I always thought that it is maybe a bit easier to stay on the surface, but you really can read while swimming. We should try playing water polo in the Dead Sea.

Days 8 - 11: Tel Aviv





After a dose of "culture" in Jerusalem, I went to Tel Aviv. And Tel Aviv is really a nice city. While I had problems in Haifa and Jerusalem finding restaurants to trust and finding a supermarket, both were easy in Tel Aviv. The shopping streets were fun - at least after you accept that it is normal that crowds of cute girls in uniform jump up and down, shrieking, in front of bikini shops.

The beach was great. The 6-km promenade to the old city of Jaffa (the oldest documented harbour in the world) was great. I visited the University of Tel Aviv. Palm trees on the campus. Lots of green. Have I said that I was sometimes pretty envious of the students there?

At the time I visited Tel Aviv, you could stay at the beach and swim, but the beaches were pretty empty. So I visited two museums in Tel Aviv: the Diaspora Museum at the university (Diaspora is the name for the time when the Jewish people were in exile) and the Hagana Museum on Rothschild Boulevard near Independence Hall. Even considering that the Hagana Museum is pretty biased, it helped to fill some holes I had in my understanding of the history of the country. The history of Judaism/Palestine/Israel is not a topic in German schools for the time between the year 70 and 1933, and also not for the time after 1945. I suppose most people in Germany think that 1933ff was the first time that Jewish people returned to their country. However, I don't want to blame the schools for that. Class time is limited. Even the history of Germany after 1945 is not a topic in German schools.

I stayed at the Prima hotel. Nice, perfect location directly at the beach, but in contrast to what the "Lonely Planet" says, there is no free internet. Only terribly expensive WLAN.

I really liked Tel Aviv, and if it were not so expensive I would revisit it for sure. Some kind of local travel guide would be nice next time.

More random notes

My return to Germany from Ben Gurion Airport was more eventful than expected. Some presents I bought in Tel Aviv attracted the suspicion of the security staff, and I got into an extensive security check using some high-tech explosive detection devices. I had to show that my MBP and my camera really work. I had to remove the battery, etc. They even checked my "dirty clothes bag" for explosives. Fortunately the atmosphere was pretty nice, e.g. the security staff found my "English-Hebrew" book interesting and apparently found some of the translations pretty funny. Totally cute was that one of the security girls carefully re-packed the gifts, which they had checked in detail, back into a box. The wrapping was nicer afterwards than before.

One thing I found very strange at the beginning: there are armed soldiers everywhere. Not because of checkpoints (I have only seen one checkpoint, near the Dead Sea), but because it is normal to wear the uniform in one's spare time and often to carry the personal weapon as well. I did my mandatory military service as a mechanized infantryman in 2000/2001. In the German army, soldiers are very strict when it comes to weapons. Carrying an assault rifle in one's spare time is completely unthinkable in Germany. For example, recent newspaper reports state that soldiers who forgot to leave their personal pocket knife(!) at the barracks and are checked by the police risk a fine of up to 10,000 euros (14,000 USD) due to very strict German weapon laws. Even wearing the uniform in one's spare time is only allowed under very strict rules in Germany: a soldier may only wear the uniform on the direct way between home and the barracks. From a military point of view, it is also clear why the IDF soldiers take their weapons home. Similar to Switzerland, the country is simply too small to have time for a lengthy mobilization in case of an attack. But it is weird. I saw a couple walking through the city park of Jerusalem, holding hands on Sabbath, and the man had a rifle on his back. Nothing unusual there.

Another thing about the IDF soldiers that I found strange is -- I would call it -- a lack of discipline. They hang around at bus stations, putting on lipstick, chewing gum, wearing non-uniform clothes like flip-flops, or (as described above) go shopping in groups. Some female soldiers had opened the top three or four buttons of their shirts. None of this would be considered appropriate here. While I got used to seeing uniforms everywhere pretty fast, the lack of discipline kept surprising me.

Eating: I love English breakfast, and an Israeli breakfast is probably more or less its opposite. Due to religious rules, they do not eat meat, eggs, ... for breakfast. The fresh salads, fruits, and cakes were really nice (at least in good hotels), but I still missed my favorite topping (ham). I am not really a fan of marmalade. The rest of the food I would consider "Arabic" (Israeli people will probably scream when reading this): hummus (delicious!), couscous, and these extremely sweet cakes. Falafel seems to be some kind of national dish.

I have not fully understood what "kosher" really means, but nearly every restaurant I found had a "kosher" certificate. It is still really important there. But at least in Tel Aviv, there are also non-kosher restaurants; e.g. -- if I understand it correctly -- an Italian restaurant is by definition non-kosher. In practice, e.g. at the IBM canteen, I haven't found the kosher thing limiting at all, except for breakfast.

Thursday, June 25, 2009

A small reminder to write about Israel

Today lawblog.de gave me a small reminder to write a travel report about Israel:
During police checks, especially in Schleswig-Holstein, off-duty soldiers have been confronted with a legal accusation: violation of the Weapons Act. Their offense? The soldiers, mostly in uniform on their way home for the weekend, were carrying the "standard knife" of the Bundeswehr.

It is a pocket knife (manufacturer: Victorinox), but in the form of a one-handed knife. The special feature of one-handed knives is that they can be opened with one hand. The problem: one-handed knives have recently come under the Weapons Act. Anyone carrying such a knife risks a fine of up to 10,000 euros.
Only in Germany could someone come up with something like this, right?

In Israel -- and fortunately Germany, unlike Israel, is surrounded by friends -- you see soldiers with their weapons everywhere, on the way home and on the way to the barracks. Or on Sunday, on a walk in the park with their sweetheart.
It took a lot of getting used to, and I am glad that this is neither necessary nor customary here, but making a fuss about the "standard knife" is something only Germany could come up with.

Saturday, June 20, 2009

dirkmeister.de down?

The web server that also hosts dirkmeister.de has had a hardware defect for the third time within one year. A change of hosting provider was therefore surely sensible, but since then nothing really works properly: I can't get the domain moved, I have no decent access to the new server, etc.

Because of these problems I am moving to Blogger for now. After the last crash it took me a year to reach the same reader numbers as before. The show must go on. If some hard disk crashes here, I couldn't care less. If I like it "here", I might even stay with Blogger for good.

Other websites are down for the same reason, for example the blog aggregator juli-blogs.de. juli-blogs.de is not down because I left the "Junge Liberale". That was just an unfortunate coincidence in timing. I have no plan to shut down juli-blogs.de.

Sunday, April 26, 2009

Twitter Updates 2009-04-27

  • Wow. Oracle buys Sun! Changes: More weird version numbers for Sun products? Future of MySQL? Oracle is more anti-agile than IBM. #
  • Listening to "Storage Systems". #
  • My Prof. has read the Scala book during holiday and finds the language nice. Maybe we will use it more at the PC^2. That would be nice. #
  • Nice overview about Cloud Computing (including OSS) in current "Linux Magazin" (German). I would like to eval "Eucalyptus". #
  • Removed a very hard to track bug in my master thesis's dedup system (ok, I introduced the bug after the thesis). I like this part of my job #
  • Another talk about deduplication from Santa Cruz that seems interesting: "Data Sequentiality Potential for Data De-Duplication Schemes". #
  • Searching the current mailing address of Peter Levart, project owner of FUSE-J. Sad that the project seems to be abandoned. #

Sunday, April 19, 2009

Twitter Updates 2009-04-19

  • Oh. For weeks no response from my hotel in Haifa. Today both hotels responded within 30 minutes. #
  • Playing with FUSE-J. Problems with C generator. It finds method with same name, same parameters but different return type: ByteBuffer.array #
  • Visited a PhD defence of a friend today. :-) #
  • " Who want to write a paper for next year, "grep vs reverse index"?" on Hadoop mailing list: http://tinyurl.com/delgj2 Lol #
  • Just installed my Intel X25-E SSD. #
  • The grading of my master thesis is now finished. Finally I'm really done. Nice. #
  • Today I played soccer organized by the PACE international graduate school. Finished as 8th of 9 teams. Oh. #
  • SSRC Seminar talk by Deepavali Bhagwat "Extreme Binning: Scalable, Parallel Deduplication". I would like to listen to this talk. #

Sunday, April 12, 2009

Twitter Updates 2009-04-12

  • Now I have opened my first open source project. My first OS activity since the bad experience with the Shox network simulator last year. #
  • Just watched Monster vs Aliens in 3D. Not the best Pixar movie, but the 3D experience was "Wow". I bet most films will be 3D in a few years #
  • Listening to Google Collect episode of the Javaposse. Used the library in Shox project two years ago. Nice engineering! #
  • The hotel in Haifa I picked first doesn't respond to my mails. One mail two weeks ago. One two days ago. Strange. Now I try another one. #
  • Quillen: Nice deduplication project using AWS S3 on Google Code: http://code.google.com/p/quillen/ #
  • Statebook: http://www.statebook.co.uk/ #

Monday, April 06, 2009

Just Released: Filesystem Chunking

I just released some tools as the new open source project "fs-c" (for filesystem chunking). They were developed to evaluate chunking-based data deduplication techniques on various systems and to evaluate new chunking methods.

The fs-c tools allow you to analyze the internal and temporal redundancy of file system directories that is found by content-defined chunking using Rabin's fingerprinting method and by static chunking with different chunk sizes.
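To make the two chunking styles concrete, here is a minimal Python sketch. It is not the fs-c code: the rolling hash is a simple polynomial hash standing in for Rabin's fingerprinting method, and the window size, chunk size limits, and function names are assumptions chosen only for illustration.

```python
WINDOW = 48                        # bytes in the rolling window (assumed value)
BASE = 257                         # polynomial base (assumed value)
MOD = (1 << 61) - 1                # large prime modulus
POW = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window


def static_chunks(data, size=4096):
    """Static chunking: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data, avg_bits=12, min_size=1024, max_size=16384):
    """Content-defined chunking with a polynomial rolling hash.

    A chunk ends where the hash of the last WINDOW bytes has its lowest
    `avg_bits` bits equal to zero, so boundaries depend on the content
    and not on the byte offset. min_size and max_size bound the chunk length.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        if i - start >= WINDOW:
            # drop the byte that falls out of the window
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + data[i]) % MOD
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Both functions take a bytes object and return a list of chunks whose concatenation is the input again. The practical difference shows up when a byte is inserted near the start of a file: static chunk boundaries all shift, while content-defined boundaries usually line up again after the first cut point.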

The goal is to give users a rough estimate of the redundancy found by de-duplication systems for their concrete workload and to provide a basis for further enhancements to the tools, e.g. for application-specific chunking methods.

Currently the analysis is only done using an in-memory hash table, which limits the size of the system to a few hundred GB of data (or you need a large shared memory system). I have also developed Hadoop MapReduce tasks to calculate the redundancies, but that code is not ready for publication.
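The in-memory approach and its limit can be sketched in a few lines of Python. This is only an illustration, not the fs-c implementation: the function names, the use of SHA-1 fingerprints, and the 8 KB average chunk size are assumptions, and the size estimate counts raw fingerprint bytes only and ignores any per-entry overhead of the hash table.

```python
import hashlib


def estimate_redundancy(chunks):
    """Return (total bytes, unique bytes, redundancy) for an iterable of chunks.

    Every chunk fingerprint seen so far is kept in an in-memory set, which is
    exactly what limits the amount of data that can be analyzed.
    """
    seen = set()
    total = unique = 0
    for chunk in chunks:
        total += len(chunk)
        fingerprint = hashlib.sha1(chunk).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique += len(chunk)
    redundancy = 1 - unique / total if total else 0.0
    return total, unique, redundancy


def raw_index_bytes(data_bytes, avg_chunk_size=8192, fingerprint_size=20):
    """Lower bound for the index size: one fingerprint per chunk, no overhead."""
    return (data_bytes // avg_chunk_size) * fingerprint_size
```

Under these assumptions, 500 GB of data with 8 KB chunks already needs about 1.2 GB for the raw fingerprints alone (raw_index_bytes(500 * 10**9)), before counting the overhead of the hash table itself.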

Sunday, April 05, 2009

Twitter Updates 2009-04-05

  • in Willingen for the PC^2 Seminar. Listening to a talk about "Scientific English" #
  • I'm aware of lots of storage research papers from NetApp, IBM and HP. I know some from Sun. Are there any from EMC I should be aware of? #
  • Something sucks here. Scalax and/or Scala Eclipse Plugin. OutOfMemoryException (on my 4 GB MBP) while simply compiling this project. #

Thursday, April 02, 2009

Comparison of One-Hop Distributed Hash Tables (DHT)

I just uploaded an old semester work, which I haven't published here yet, about a comparison of one-hop DHTs:

Distributed Hash Tables (DHT) are an important substrate of several peer-to-peer (P2P) applications. Most existing approaches favor a small memory and network overhead over lookup latency. New approaches question this tradeoff and allow a lookup using only one hop, but they store the routing information for all nodes on each node in the system and so require higher background traffic to keep the routing tables up to date. In this paper the design of three one-hop DHT approaches is described and compared in detail. This comparison shows that different assumptions are used to analyze the approaches. Therefore, several parameters are inspected and a unified parameter setting is extracted. Using the unified parameter setting, a fair and meaningful comparison of the approaches is possible. In particular, the bandwidth consumption, fault tolerance properties, the usage of heterogeneity in the P2P network, and the scalability are compared. The comparison shows that the unified parameter setting leads to different relative results than originally stated by the approach designers.
PDF
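The basic idea of a one-hop lookup can be sketched in Python as follows. This is a generic consistent-hashing illustration and not one of the three approaches compared in the paper: the class name, the 32-bit ring, and the successor rule are assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(name, bits=32):
    """Map a node name or key onto a ring of 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


class OneHopTable:
    """Full routing table as every node in a one-hop DHT would hold it."""

    def __init__(self, node_names):
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        """Find the responsible node locally; contacting it is the single hop."""
        idx = bisect.bisect_left(self.positions, ring_position(key))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

    def remove(self, name):
        """Membership change: in a real system every node must learn about it."""
        self.ring = [(pos, n) for pos, n in self.ring if n != name]
        self.positions = [pos for pos, _ in self.ring]
```

The lookup itself needs no network traffic because the table is complete. The price is that each node stores one entry per node in the system and that every join or leave has to reach all nodes, which is the background traffic the abstract refers to.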

Sunday, March 29, 2009

Twitter Updates 2009-03-29

  • What is the best plotting tool available? R, gnuplot, Excel? Matlab?, matplotlib? I am currently trying R and I am not really happy with it. #
  • Very nice "timeout" shell script: http://tinyurl.com/3m8ul5 #
  • I find statements from PhDs like "such languages only used for learning purposes like Pascal and today Java" really strange. #
  • Note: You have written too much LaTeX text if you use \% in chats. #

Saturday, March 28, 2009

Video about IBM ProtecTIER data deduplication

IBM has published a marketing video about their ProtecTIER data deduplication system recorded at the Pulse09 conference in February:

Key message: It is scalable. But the video contains 3 minutes of marketing stuff without much real information.

What I really find more interesting: at the SYSTOR'09 conference there will be a research talk (one of the interesting talks I mentioned here) about the technology and concepts behind the ProtecTIER system, which is based on the product of the company Diligent that IBM bought in April 2008. Abstract:

We describe some of the design choices that were made during the development of the IBM TS7650G ProtecTier, a fast, scalable, inline, deduplication device. The system's design goals and how they were achieved are presented. This is the first and only deduplication device that uses similarity matching. The paper provides the following original research contributions: we show how similarity signatures can serve in a deduplication scheme; a novel type of similarity signatures is presented and its advantages in the context of deduplication requirements are explained.
It is also shown how to combine similarity matching schemes with hash based identity schemes.
I really look forward to this talk. I am especially interested in how they distinguish their approach from approaches like DERD, DeepStore, and others.

First paper accepted

My first paper has been accepted for publication at the SYSTOR'09 conference, which takes place in Haifa on May 4-6.

It is based on the first part of my master's thesis, but the content has been extended and revised afterwards:

Data deduplication systems detect redundancies between data blocks to either reduce storage needs or to reduce network traffic. A class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks.

This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches based on real-live user data in a weekly full backup scenario, both at a single point in time as well as over several weeks.

In addition, we analyze how small changes affect the deduplication ratio for different file types on a microscopic level for chunking approaches and delta encoding. An intuitive assumption is that small semantic changes on documents cause only small modifications in the binary representation of files, which would imply a high ratio of deduplication. We will show that this assumption is not valid for many important file types and that application specific chunking can help to further decrease storage capacity demands.
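Why a small change can hurt deduplication so much is easy to see with static chunking. The following Python sketch is only an illustration with random test data; it is not taken from the paper, and the 4 KB chunk size and the function names are assumptions.

```python
import hashlib
import random


def fingerprints(data, size=4096):
    """Set of SHA-1 fingerprints of the fixed-size chunks of data."""
    return {hashlib.sha1(data[i:i + size]).digest()
            for i in range(0, len(data), size)}


def shared_fraction(old, new, size=4096):
    """Fraction of the chunks of `new` that already exist in `old`."""
    old_fps = fingerprints(old, size)
    new_chunks = [new[i:i + size] for i in range(0, len(new), size)]
    hits = sum(1 for c in new_chunks if hashlib.sha1(c).digest() in old_fps)
    return hits / len(new_chunks)


rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(64 * 4096))

# One byte changed in place: only the chunk containing it differs.
overwritten = original[:100] + bytes([original[100] ^ 0xFF]) + original[101:]

# One byte inserted: every following chunk boundary shifts by one byte.
inserted = original[:100] + b"X" + original[100:]

print(shared_fraction(original, overwritten))  # 63 of 64 chunks are found again
print(shared_fraction(original, inserted))     # no chunk is found again here
```

Both edits are a single byte, but the insertion changes the binary layout of everything behind it. File formats that compress or re-encode their content on every save behave more like the second case, even under content-defined chunking.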

I really look forward to that conference because surprisingly many talks in the program look really interesting and it is my first chance to meet storage researchers outside the Fürstenallee.

Sunday, March 22, 2009

Twitter Updates 2009-03-22

  • Wow. Barbara Liskov has won the ACM Turing Award. I have read lots of papers she co-authored. Especially her work with Gupta and Rodrigues. #
  • Has anyone evaluated OCZ Vertex SSDs? I don't trust my benchmarks. They are simply too good. #
  • Is there an equivalent of LinkedHashMap in or for Python? #
  • My paper is accepted for SYSTOR 2009 conference in Haifa. :-) #
  • Watching iPhone 3.0 Sneak Peek #
  • A reviewer has criticized the use of color in my graphs. I don't have any colored graphs! Strange. #
  • The FDP in favor of Internet blocking? It is about time I left the party. #
  • This makes my hair stand on end. Does the author of this SPIEGEL article (http://tinyurl.com/d34edn) have the faintest idea what he is writing about? #

Saturday, March 07, 2009

Matlab for University of Paderborn staff on Mac OS X

MATLAB licenses are available for staff and lab machines at the University of Paderborn. In theory at least, because often enough the central license server has none free, but that is another topic.

The IMT website has instructions for Windows and Unix, but none for Mac OS. So much up front: the installation works much like on Windows. Although Mac OS is a Unix operating system, in this case it behaves more like Windows than Unix. The procedure is:

  1. First, ask the IMT for the license file and the File Installation Key.
  2. Then run the installer "InstallForMacOSX", which is located in AFS under /afs/uni-paderborn.de/public/imt-download/matlab/R20/macos_x_intel. The HTTP download link on the homepage cannot be used for this. How to set up AFS on Mac OS X is described on this page, although those instructions do not work as written under 10.5 either. But let us assume that AFS is already set up.
  3. Now, as on Windows, place the file network.lic (Download) in Matlab's licenses directory. To do so, show the package contents of the MATLAB_R2008b application (right-click the application and choose "Show Package Contents") and copy the file into the corresponding directory. On Mac OS, applications are "in reality" directories that can be opened via "Show Package Contents".
After that, Matlab can be started normally (provided that licenses happen to be "free" at the moment).

Subversion on Suse Linux Enterprise (SLE) 10

It is incredibly frustrating that Novell Suse Linux Enterprise 10 (SLE 10) does not offer Subversion out of the box. The reasoning: only vetted "enterprise-ready" software, blah blah. Subversion is about as enterprise-ready as it gets, and it already was when SLE 10 came out.

Well, complaining does not help. What does help is build.opensuse.org, which also offers RPM packages for SLE 10. Simply search for subversion and its dependencies (apr, apr-util and neon), download the RPMs and install them. Example with the current links:

wget http://download.opensuse.org/repositories/Subversion/SLE_10/i586/subversion-1.5.5-24.1.i586.rpm
wget http://download.opensuse.org/repositories/Apache/SLE_10_server_database_postgresql/i586/libapr1-1.3.3-6.1.i586.rpm
wget http://download.opensuse.org/repositories/KDE:/Backports/SLE_10/i586/libapr-util1-1.3.4-6.1.i586.rpm
wget http://download.opensuse.org/repositories/Subversion/SLE_10/i586/neon-0.26.1-10.1.i586.rpm

rpm -ivh libapr1-1.3.3-6.1.i586.rpm
rpm -ivh libapr-util1-1.3.4-6.1.i586.rpm
rpm -ivh neon-0.26.1-10.1.i586.rpm
rpm -ivh subversion-1.5.5-24.1.i586.rpm

And just like that, subversion is installed!

Sunday, January 25, 2009

Favorite PhD Comics

As a newbie PhD student, I should know what to expect. So besides xkcd, I read the comic series PhD Comics.
These are my favorites (in no particular order):

Friday, January 09, 2009

Done: Master of Science

Yesterday I completed my computer science degree with the defense of my master's thesis. Hooray!

That more or less makes me a "Master of Science".

Officially, of course, I am only one once the examination office (Prüf-Sek) confirms it. But outside of its office hours (Tue-Thu, 9:30-11:30), dealing with the examination office is a tricky matter (to put it kindly). So whether I can start at the PC^2 on Monday as planned is still up in the air.

Saturday, January 03, 2009

A comparison for object-oriented programming in C

Is it a fair comparison to say that

people who think C is a perfectly sufficient programming language for object-oriented programming also believe that the best way to compute 1000 x 1000 is to draw a grid of 1000 by 1000 units and then count all the squares in it?

The thought just occurred to me while reading "Gödel, Escher, Bach". The second part of the comparison is taken from the book.