hamsterdb2: all your database are belong to us!
Currently it looks as if I had stopped working on hamsterdb (and Ohloh’s “Decreasing year-over-year development activity” really makes me feel bad!), but that’s far from true: I spend every minute on hamsterdb2, the next generation. I will continue to maintain hamsterdb, and I even hope to find time to publish a bugfix release today or tomorrow.
As I wrote in an older blog entry, hamsterdb will focus on embedded devices with limited hardware; it will offer high configurability and a limited feature set, optimized for throughput and performance. (Therefore I currently do not plan to improve transaction support in hamsterdb.)
In contrast, hamsterdb2 will be for modern (preferably multi-core) machines and modern applications. Here’s a list of buzzwords:
- multithreaded: hamsterdb2 will not just be thread-safe; it will actively encourage the use of multiple threads. Performance-intensive operations are processed in the background. Every API handle (for Environments, Databases and Transactions) can be used from multiple threads in parallel.
- transactional: hamsterdb2 will support ACID transactions and multiple isolation levels. The isolation level can be set individually for each transaction. Transactions themselves are kept in memory: whenever a Database is modified, the Database “diff” is stored in memory, and transaction conflicts are resolved in memory, without disk access. The same applies to UNDO operations. Committed transactions are periodically flushed to disk in the background.
- logging/recovery: since transactions are kept in memory (and would otherwise be lost in a crash), they are also written to a logical logfile. After a crash, all operations from this logfile are repeated. These operations are idempotent, which means that they can be applied multiple times without side effects. Incomplete (active) transactions are recreated; there’s an API function to enumerate all active transactions. The index file is based on a B+Tree with atomic algorithms and therefore does not need its own logfile.
- simplified API: I think the API of hamsterdb was already quite simple to use. Nevertheless, I managed to improve it a little.
- The current state? I’m currently working on the recovery algorithm. Transactions and the index files are already working, but the index files still need improvements (they currently only work with fixed-size keys). There is no freelist management yet, so empty pages are not reused. So far I have only implemented the SERIALIZE isolation level, but the others should be simple to add. There is currently no support for duplicate keys, record numbers or cursors.
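To illustrate what “transactions are kept in memory” can mean, here is a minimal Python sketch, not hamsterdb2’s code. The class names (`Database`, `Txn`, `TxnConflict`) and the simple write-write conflict rule (a key may only be modified by one active transaction at a time) are assumptions for illustration. Each transaction keeps its changes as an in-memory diff, conflicts are detected without touching the disk, and an abort (UNDO) simply discards the diff.

```python
import threading


class TxnConflict(Exception):
    """Raised when another active transaction already modified the same key."""


class Txn:
    def __init__(self, db):
        self.db = db
        self.diff = {}  # key -> new value; None marks an erased key

    def insert(self, key, value):
        self.db._write(self, key, value)

    def erase(self, key):
        self.db._write(self, key, None)

    def find(self, key):
        return self.db._read(self, key)

    def commit(self):
        self.db._finish(self, apply=True)

    def abort(self):
        # UNDO only discards the in-memory diff; no disk access is needed.
        self.db._finish(self, apply=False)


class Database:
    def __init__(self):
        self.committed = {}  # stands in for the on-disk index (assumption)
        self.owners = {}     # key -> transaction holding an uncommitted change
        self.lock = threading.Lock()  # lets several threads share one handle

    def begin(self):
        return Txn(self)

    def _read(self, txn, key):
        with self.lock:
            # A transaction sees its own uncommitted changes first,
            # other transactions only see committed values.
            if key in txn.diff:
                return txn.diff[key]
            return self.committed.get(key)

    def _write(self, txn, key, value):
        with self.lock:
            owner = self.owners.get(key)
            if owner is not None and owner is not txn:
                raise TxnConflict(key)  # detected in memory, before any I/O
            self.owners[key] = txn
            txn.diff[key] = value

    def _finish(self, txn, apply):
        with self.lock:
            for key, value in txn.diff.items():
                if apply and value is None:
                    self.committed.pop(key, None)
                elif apply:
                    self.committed[key] = value
                del self.owners[key]
            txn.diff.clear()
```

Different isolation levels would mainly change what `_read` returns and when `_write` reports a conflict; this sketch implements only one fixed policy.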
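The periodic background flush of committed transactions could look roughly like the following Python sketch. The `BackgroundFlusher` class, the `write_to_disk` callback and the fixed flush interval are hypothetical and only illustrate the idea. Committing threads merely append their diff to a list and return immediately, while a single worker thread writes the collected batches in commit order.

```python
import threading


class BackgroundFlusher:
    """Hypothetical sketch: committed diffs are written by a worker thread."""

    def __init__(self, write_to_disk, interval=0.1):
        self.write_to_disk = write_to_disk  # assumed callback taking one diff
        self.interval = interval            # seconds between flushes (assumed)
        self.lock = threading.Lock()
        self.pending = []
        self.stop_event = threading.Event()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def on_commit(self, diff):
        # Called by the committing thread; it does not wait for any I/O.
        with self.lock:
            self.pending.append(dict(diff))

    def _drain(self):
        # Swap out the batch under the lock, then write it without holding it.
        with self.lock:
            batch, self.pending = self.pending, []
        for diff in batch:
            self.write_to_disk(diff)

    def _run(self):
        # Event.wait() returns False on timeout and True once close() sets it.
        while not self.stop_event.wait(self.interval):
            self._drain()
        self._drain()  # final flush on shutdown

    def close(self):
        self.stop_event.set()
        self.worker.join()
```

Because only the worker thread calls `_drain`, diffs reach the disk in the order in which they were committed.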
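Why idempotent log operations make recovery simple can be shown with a small Python sketch of log replay. The record format (`LogRecord`), the function names and the policy of applying a transaction’s operations when its commit record is reached are assumptions for illustration, not hamsterdb2’s actual log format. Because an insert overwrites and an erase ignores missing keys, replaying the same log again (for example after a crash during recovery) leaves the index unchanged; operations of transactions without a commit record are returned so they could be recreated as active transactions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    txn_id: int
    op: str                      # "insert", "erase" or "commit"
    key: Optional[str] = None
    value: Optional[str] = None


def apply_op(index, op, key, value):
    # Both operations are idempotent: repeating them leaves the same state.
    if op == "insert":
        index[key] = value       # overwrite, never "insert if absent"
    elif op == "erase":
        index.pop(key, None)     # erasing a missing key is not an error


def recover(index, log):
    """Replay a logical log into `index`.

    Operations of committed transactions are applied; the operations of
    transactions without a commit record are returned so that they can be
    recreated as active in-memory transactions.
    """
    pending = {}  # txn_id -> [(op, key, value), ...]
    for rec in log:
        if rec.op == "commit":
            for op, key, value in pending.pop(rec.txn_id, []):
                apply_op(index, op, key, value)
        else:
            pending.setdefault(rec.txn_id, []).append((rec.op, rec.key, rec.value))
    return pending
```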
Feel free to post questions/comments!