Every Java developer knows (or should know), that concurrent access to a list has to be synchronized.
In Java there are two build in ways to synchronize lists directly:
- java.util.Vector: The original implementation of an dynamic array from the old days of Java 1.0 - Java Stone Age.
- the factory method Collections.synchronizedList which produces a synchonized wrapper around a list (e.g. ArrayList or LinkedList).
There is also the list implementation CopyOnWriteArrayList from the package java.util.concurrent, which I will ignore here most of the time.
When I have to synchronize the access to a list in new code, which variant should be used
Depending from the concrete usage it could be an ultimate advance of the factory method, that the implementation of the underlying list can be chosen. If Vector is used, it is obviously a bad idea to delete elements from the middle of a list on a regular basis. The O calculus strikes back.
Furthermore the performance comparison shows surprising clear differences.
At the artificial task to insert 2^17 elements sequentially into the list, Vector needs 8 percent less time than the synchronized ArrayList. The advance is caused by the fact that Vector doubles the capacity of its internal data structure every time a threshold is reached. The ArrayList increases its internal array only about 50%. As a result the ArrayList copies its data more often into new arrays.
The iteration over 2^17 elements the vector needs six times longer then the synchronized variant of the ArrayList. This huge difference is caused by the locking behavior of the classes. Vector locks every single access to next(). The synchronizedList-Implementation doesnât lock the access over an iterator as described in the javadoc. This breaks the rule that the client should never be responsible for thread safety of a class, but it allows coarse-grained locking around the whole loop. This avoids the possibility of ConcurrentModificationExceptions. The difference between one entry and release of an critical region against the 2^17 entrances and releases, causes the surprisingly high time difference.
Of course it is also possible to lock very single access to next, e.g. when the execution of the loop body takes a longer time. With this fine-grained locking the performance would be the same as of the Vector implementation.
Another reason against vector is the ballast the class carries from the Java 1.0 time. Vector implements the List interface and offers the methods add(), remove() and so, but it also has old methods like elementAt (old get), elements (old iterator which returns an Enumeration) and insertElementAt (old add). This makes the usage not very handy.
A more subtle reason for synchronizedList is that a lot of developers use Vector all the time, even if no synchronization is needed. It is often not clear, if Vector is used with purpose or since a lack of experience or knowledge. A lot of tutorials and books donât mention the collections api. Furthermore there are universities havenât discovery the collections api yet.
synchronizedList documents the purpose of the code in a better way: Yes, I want synchronization. Yes, I know what I do. In very case the intention should be documented. In the javadoc and (in the ideal case) also with the Annotations from "Java Concurrency in Practice".
In my opinion Vector should be declared obsolete at least after the introduction of generics. When no synchronization is needed, a "new" list implementation should be used. In the case of multi threading a synchronized wrapper or (when the reading accesses outnumbers the write accesses) the CopyOnWriteArrayList implementation should be used. Let Vector die, its time is over.
A bit more correct: Vector locks every single access to get(), which is called by next().[Update May, 5th, 2008: In the Javaposse Googlegroup is currently an active and interessing discussion about ArrayList vs. Vector with further details, options and surprisingly different results]
Wow !! to the point ! thanks A ton
ReplyDelete