MapDB 1.1 and sun.misc.Unsafe

High-performance Java projects often use sun.misc.Unsafe to manipulate memory directly. It is not officially part of the JDK, so it does not work on Dalvik and some other JVMs. It can also crash the JVM in case of an error, which makes debugging and user support hard.

MapDB uses ByteBuffers instead, a thin abstraction over Unsafe. They are safer and easier to use, at the cost of about a 10% performance penalty. This choice helped MapDB develop faster and become more robust.

However, a 10% performance bonus is nothing to sneeze at, so Unsafe storage is supported in the form of an optional extension. MapDB is built around ByteBuffers and cannot take full advantage of Unsafe yet. MapDB 1.1 will add the changes necessary to make it more usable.

So what is wrong with current ByteBuffers?

Unnecessary boundary checks

On each get or put, a ByteBuffer (BB) checks whether the offsets are within its limits. Unlike the bounds checks on a plain byte[], these checks cannot be optimized away by the JIT. Boundary checking adds about 10% overhead.
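To make the difference concrete, here is a minimal sketch of the same 8-byte read done both ways. The class and method names are made up for illustration and are not MapDB code; the byte[] version assumes big-endian order, which is also the ByteBuffer default.

```java
import java.nio.ByteBuffer;

final class BoundsCheckSketch {

    // Goes through ByteBuffer: the absolute getLong(int) validates the index
    // against the buffer's limit on every single call.
    static long readLong(ByteBuffer buf, int pos) {
        return buf.getLong(pos);
    }

    // Assembles the same big-endian long straight from a byte[].
    // Only the JVM's built-in array checks remain here.
    static long readLong(byte[] buf, int pos) {
        return ((long) (buf[pos]     & 0xff) << 56)
             | ((long) (buf[pos + 1] & 0xff) << 48)
             | ((long) (buf[pos + 2] & 0xff) << 40)
             | ((long) (buf[pos + 3] & 0xff) << 32)
             | ((long) (buf[pos + 4] & 0xff) << 24)
             | ((long) (buf[pos + 5] & 0xff) << 16)
             | ((long) (buf[pos + 6] & 0xff) << 8)
             |  (long) (buf[pos + 7] & 0xff);
    }
}
```

Both methods return the same value for the same data; only the amount of checking on the way differs.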

This is especially bad with the many small calls generated by the MapDB deserializer. A workaround is to read all data into a single byte[] and deserialize small chunks from there, but that means additional copying and allocation, which MapDB tries to avoid.
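The workaround looks roughly like this. It is only a sketch: the record layout (an int count followed by that many ints) and all names are invented for the example and have nothing to do with the real MapDB format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class BulkReadSketch {

    // One bulk get: a single bounds check, but also an extra allocation and copy.
    static byte[] readRecord(ByteBuffer store, int offset, int size) {
        byte[] record = new byte[size];
        ByteBuffer view = store.duplicate(); // leave the store's own position alone
        view.position(offset);
        view.get(record);
        return record;
    }

    // Many small reads, all served from the byte[] instead of the ByteBuffer.
    // Hypothetical layout: int count, then `count` int values.
    static int[] deserialize(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            int count = in.readInt();
            int[] values = new int[count];
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The small reads get cheaper, but every record now costs one temporary byte[] plus a copy.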

Defensive copy to transfer data

A BB stores its offsets internally, so to copy data from one BB to another, one has to call bb.duplicate() and update the offsets in the new instance. Under heavy load this triggers GC and ruins CPU caches. MapDB is already quite optimized, so these duplicates could account for almost 50% of GC garbage.
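A minimal sketch of such a copy, using only the standard java.nio API (the class and method names are illustrative, not taken from MapDB):

```java
import java.nio.ByteBuffer;

final class DuplicateCopySketch {

    // Copies len bytes from src[srcPos..] to dst[dstPos..] without touching the
    // position or limit of either original buffer. The price is two short-lived
    // ByteBuffer objects per transfer.
    static void copy(ByteBuffer src, int srcPos, ByteBuffer dst, int dstPos, int len) {
        ByteBuffer s = src.duplicate(); // shares content, has its own offsets
        s.clear();                      // resets offsets only, data stays
        s.position(srcPos);
        s.limit(srcPos + len);

        ByteBuffer d = dst.duplicate();
        d.clear();
        d.position(dstPos);
        d.put(s);                       // bulk copy of the remaining bytes of s
    }
}
```

The duplicates are needed because the buffers may be shared between threads, so moving the position of the original is not an option.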

General inflexibility of ByteBuffers

ByteBuffers are not really that well designed in Java terms. There is a 32-bit addressing limit. They are hard to extend, and most implementations (such as java.nio.DirectByteBuffer) are package-private, so they cannot be subclassed from user code.

In short, I don't like BBs anymore. To fix their problems there are dozens of abstractions, including Volume from MapDB. From now on, MapDB will build around ByteBuffers rather than on top of them.

MapDB 1.1 will add some changes:

DBMaker.newMemoryDB() will not use HeapByteBuffer but a raw byte[]. This means one less abstraction layer, no ByteBuffer boundary checks and an instant 10% performance boost. This change is already done in the MapDB snapshots.

Direct transfers between Volumes. MapDB tries to move data directly from one location to another without using a third buffer. So far we relied on ByteBuffers to do direct copying, but that does not support raw byte[], Unsafe and other storages. There is a new commit which makes this independent of ByteBuffers.
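The idea can be sketched with a made-up interface. This is not the real MapDB Volume API; `Vol`, `ByteArrayVol` and all method names below are hypothetical and only show how a transfer can skip the intermediate buffer when the storage allows it.

```java
final class TransferSketch {

    // Hypothetical storage abstraction, loosely inspired by the Volume idea.
    interface Vol {
        void getData(long offset, byte[] dst, int dstPos, int len);
        void putData(long offset, byte[] src, int srcPos, int len);

        // Generic fallback: goes through a temporary (third) buffer.
        default void transferTo(long offset, Vol target, long targetOffset, int len) {
            byte[] tmp = new byte[len];
            getData(offset, tmp, 0, len);
            target.putData(targetOffset, tmp, 0, len);
        }
    }

    // Storage backed by a raw byte[].
    static final class ByteArrayVol implements Vol {
        final byte[] data;

        ByteArrayVol(int size) {
            this.data = new byte[size];
        }

        @Override
        public void getData(long offset, byte[] dst, int dstPos, int len) {
            System.arraycopy(data, (int) offset, dst, dstPos, len);
        }

        @Override
        public void putData(long offset, byte[] src, int srcPos, int len) {
            System.arraycopy(src, srcPos, data, (int) offset, len);
        }

        // Direct transfer: hands the backing array to the target, no temporary buffer.
        @Override
        public void transferTo(long offset, Vol target, long targetOffset, int len) {
            target.putData(targetOffset, data, (int) offset, len);
        }
    }
}
```

Each storage type decides how to move its own bytes, so the copy no longer depends on ByteBuffer being on both sides.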

Off-heap memory based on Unsafe. So far sun.misc.Unsafe is not really supported. We cannot add official support, since it is not part of the JDK and does not work on Android. But the [mapdb-unsafe](https://github.com/jankotek/mapdb-unsafe) extension will be treated as a first-class citizen. Its releases will be synchronized with MapDB releases, and it will get bug fixes and support and will be mentioned in the documentation.
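For readers who have not used it, this is roughly what raw off-heap access through sun.misc.Unsafe looks like. It is a generic sketch and not code from mapdb-unsafe. It assumes a HotSpot-style JVM where the `theUnsafe` field is reachable by reflection; newer JDKs may print deprecation warnings for these calls.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeMemorySketch {

    // Unsafe has no public constructor; the usual trick is reading its singleton field.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Writes a long to freshly allocated off-heap memory and reads it back.
    // There are no bounds checks: a wrong address can crash the whole JVM.
    static long roundTrip(long value) {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(8);
        try {
            unsafe.putLong(address, value);
            return unsafe.getLong(address);
        } finally {
            unsafe.freeMemory(address); // off-heap memory is not garbage collected
        }
    }
}
```

The missing checks are exactly where the speed comes from, and also why a bug here ends in a JVM crash instead of an exception.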

Memory-mapped files via Unsafe. Memory-mapped files use ByteBuffers as well. The boundary-checking overhead is not that prominent here, since disk speeds are lower. But the extension should support Unsafe-based mmap files anyway.

Partial async file IO. Breaking away from ByteBuffer opens the option of asynchronous file reads (a better term is probably lazy reads). In practical terms, the DataInput will be loaded only once the Serializer actually starts reading data. This could improve performance by a few percent.
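A rough sketch of what such a lazy read could look like, using plain java.io. The names are invented and the `Supplier` stands in for whatever actually fetches the bytes from the file; how MapDB will implement this is not decided here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

final class LazyInputSketch {

    // Stream that postpones loading its bytes until the first read call.
    static final class LazyInputStream extends InputStream {
        private final Supplier<byte[]> loader; // stands in for the real file read
        private InputStream delegate;

        LazyInputStream(Supplier<byte[]> loader) {
            this.loader = loader;
        }

        private InputStream ensureLoaded() {
            if (delegate == null) {
                delegate = new ByteArrayInputStream(loader.get());
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return ensureLoaded().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return ensureLoaded().read(b, off, len);
        }
    }

    // The serializer gets an ordinary DataInput; nothing is read until it is used.
    static DataInputStream lazyDataInput(Supplier<byte[]> loader) {
        return new DataInputStream(new LazyInputStream(loader));
    }
}
```

Creating the input is free; the cost of the read is paid only if and when the Serializer asks for the first byte.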




Last modification: July 01 2014
