Archive

Archive for the ‘Coding’ Category

A new implementation for Transactions

October 14th, 2010

Over the last few weeks i started merging the hamsterdb2 Transaction handling into hamsterdb. (Some time ago i already announced that hamsterdb2 will be discontinued due to lack of resources [= time]).

The code is merged on a new branch “bleeding_edge” (http://github.com/cruppstahl/hamsterdb/tree/bleeding_edge).

What’s so special about the Transaction stuff I’m now writing?

Traditional implementations of Transaction

Traditional implementations of Transactions are often based on one of the following methods:

  1. Modified database pages are locked when they’re modified; the lock is held till the Transaction is committed or aborted
  2. Modified database pages are copied to a “shadow page”; upon commit, the original page and the shadow page are exchanged, usually by swapping pointers
  3. Only one Transaction is allowed (currently this is hamsterdb’s way of dealing with Transactions)
  4. Transactions are not allowed at all

While the last two options obviously have some limitations, the first two are a little bit better. Locking pages can block your Transactions when they modify keys on the same database page:

begin T1
begin T2
insert T1, key1, record1
insert T2, key2, record2 –> this will block till T1 is committed or aborted

Depending on the implementation, “block” means that the calling thread will only continue when T1 is committed or aborted. This will always lead to deadlocks if T1 and T2 are used in the same thread.

Shadow pages are costly – a new page has to be allocated, the memory has to be copied etc.

How will hamsterdb deal with all this?

I have found a different way of dealing with Transactions in hamsterdb. They are based on the assumption that Transaction support is mainly interesting for desktop and enterprise environments, not so much for embedded ones. And a conclusion from that assumption is that there’s sufficient RAM to store additional data. (Another conclusion is that multi-core CPUs are available and some costly operations will be processed in the background; but this is not part of my early milestones.)

Whenever a key/value pair is inserted in a transactional environment, hamsterdb will create an in-memory tree structure for the modified database. The key/value pair is then inserted in this tree. As soon as this Transaction is committed, the tree is flushed to disk. Since RAM is volatile and the tree is lost if the application crashes, a logical journal is used to log the operations. (Lookup and erase operations are similar.)

What maybe sounds a bit complex will be much simpler when reading the pseudo code.

def insert(db, txn, key, value):
    // get or create a new Tree structure for this Database; each
    // Database has just one such tree
    tree=db_get_or_create_tree(db)

    // get or create a new node in this tree which manages all operations of this key
    // each Tree has just one node *per key+
    node=tree_get_or_create_node(tree, key)

    // check if another active transaction (neither committed nor aborted)
    // also modifies this key. If yes then we have two conflicting transactions and
    // we return an error. The function is listed below.
    check_for_conflicts(db, txn, node, key)

    // otherwise we append another operation to this node
    node_append_operation(node, INSERT, record, current_lsn, flags)

    // and write the operation to the journal
    log_append_operation(INSERT, key, record, flags)

    // that's it!
    return

As promised here’s the function to check for conflicting Transactions. It will also check for other errors, i.e. if we’re inserting a key which was already inserted in another committed Transaction (this would create a duplicate key which is not allowed unless explicitly requested by the user).

def check_for_conflicts(db, txn, node, key):
    // iterate over all database operations (insert/erase) which modify this key.
    foreach Operation op in node_get_operations(node):
        // get a reference to the Transaction of this operation. We need to see if it's
        // already committed, aborted or still active.
        optxn=operation_get_txn(op)

        // if the txn was aborted then it's not interesting for us
        if txn_is_aborted(optxn):
            continue

        // if the txn was committed (or if it's the current txn) then we have to have a
        // closer look; if the operation was INSERT then we have a duplicate. If it's ERASE
        // then we can return successfully, because the key no longer exists and we can
        // insert it.
        if txn_is_committed(optxn) or txn==optxn:
            if op_type==INSERT:
                return DUPLICATE_KEY
            if op_type==ERASE:
                return SUCCESS

        // if the txn is neither committed or aborted then it's still running.
        // if two running txn's modify the same key then we return an error
        if txn_is_active(optxn):
            return CONFLICT

    // we're nearly through - we now checked all Transaction trees that were not
    // yet committed. Now we have to check the committed transactions that were
    // already flushed to disk. This basically is a simple btree lookup.
    // if the key does NOT yet exist, we're fine. otherwise we insert a duplicate.
    switch status=btree_find(db, key, flags):
    case SUCCESS:
        return DUPLICATE_KEY
    case KEY_NOT_FOUND:
        return SUCCESS
    default:
        return status

That’s it. hamsterdb allows inserting duplicate keys and it can also overwrite existing keys. To simplify the code I have not included these cases here.

Challenges

While the algorithm above works similar for lookup/erase operations, there are still a couple of challenges and problems that i have not yet solved.

Replacing the physical logfile with a logical journal will bring vast performance benefits, and the file size of the journal will be much smaller than the size of the logfile. However, it also requires that all operations in the journal are idempotent, which means that they can be re-applied even if they were already applied partially or completely. For insert I already know how to solve this problem. For erase the situation is more complex because the SMOs (structure modification operations – the btree merging/shifting/collapsing) are maybe impossible to become idempotent. This will require a simplification of the erase algorithm, i.e. in combination with “lazy splits”, or maybe I will have to fallback to physical logging in such cases.

The Cursor operations will be tricky to implement. Internally, two Cursors have to be created; one for the Transaction tree, another for the Btree. At runtime, both Cursors then have to be consolidated.

Similar problems appear when reading partial keys or when doing approximate matching. In both scenarios the Transaction tree and the Btree have to be consolidated.

An outlook

On the other hand, benefits of the new approach are obvious. By keeping all Transactions in RAM, performance will be high and concurrency (as in having several Transactions in parallel) will be high, too. No pages have to be locked. If the amount of consumed RAM is too high, large keys or records can be swapped to disk immediately.

In addition, as mentioned earlier, committed Transactions can be flushed to disk asynchronously. On today’s multi-core CPUs this offer another performance benefit for many use cases.

I hope I can come up with a first release this year, although most likely with limited functionality. If you’re interested in the progress then drop me a mail, follow the blog or the code – the github link is above. If you’re just curious about the new Transactional abilities, you can browse a few of the unittests. The file is here:

http://github.com/cruppstahl/hamsterdb/blob/bleeding_edge/unittests/txn.cpp

Look at the tests called txnInsertConflict*, txnInsertFind*, txnInsertFindErase*.

chris hamsterdb

c’t 18/10, page 179

October 4th, 2010

hamsterdb: release 1.1.6!

September 17th, 2010

This release stabilized the remote functionality. New samples for client/server scenarios are provided. The remote server can now write access logfiles. A unix daemon and a win32 Service were added. The dependency to Google Protocol Buffers for C was removed; instead, Google Protocol Buffers (C++) are now used. Compilation on ArchLinux (gcc 4.5.1) was fixed. Several functions are now deprecated. The source repository was moved to github. Sources, CHANGELOG and precompiled win32 libraries are available for download here!

chris Coding, hamsterdb, Libraries

hamsterdb moves to github.com!

August 2nd, 2010

I just imported the hamsterdb sources to a github repository.

There were a few reasons for this, mainly

  • I was fed up with repairing broken svn repositories, i had to use “svnadmin recover” every second week and afterwards i had to fix broken file permissions for the bdb logfiles. One repository was broken beyond repair.
  • I hope to make collaborating easier by moving to an open platform
  • github offers a lot of features (i.e. a Wiki) that i could utilise, and i am happy if someone else does the server maintenance for me

If you want to browse the sources, see the most recent changes or branch off your own hamsterdb then please go to http://github.com/cruppstahl/hamsterdb. Everyone’s invited :)

chris Coding, hamsterdb

hamsterdb: release 1.1.5 UNSTABLE

July 25th, 2010

This release implements remote functionality over http. The protocol is implemented with Google’s Protocol Buffers, the client uses libcurl for network handling and the server is based on mongoose, an embedded micro http server.
The server interface is very simple, and the server can easily be embedded into an application.

There are no known bugs, but currently the remote functionality is only supported on Unix/Linux. The next version 1.1.6 will also support the new features on Win32; several other improvements are also pending for the next release: a unix daemon, a win32 service and some other minor issues.

Download the sources at http://hamsterdb.com/download.

chris Coding, hamsterdb, Libraries

hamsterdb-dotnet: release 0.0.3

April 19th, 2010

… is available for download, including most of the functionality in hamsterdb 1.1.4.

chris Coding, hamsterdb, Libraries

hamsterdb: release 1.1.4

April 16th, 2010

Version 1.1.4 is now available!

This is mainly a bugfix release, and in case you’re not haunted by these bugs, there’s no need for you to update.

First one: hamsterdb only supports one Transaction at a time, but nevertheless it was allowed to create temporary, unnamed Transactions while another one was already running. -> fixed

Second one: on win32, if you have a cache size of more than a gigabyte and fill up the cache then the non-paged memory pool can be exhausted. -> fixed

What else? some macros in the header file were renamed for consistency reasons, but the old macros still exist and source compatibility was not broken.

Everything’s available here: http://hamsterdb.com/download.

Next thing on the TODO list: update the .NET wrapper. And then finally implement remote access.

chris Coding, hamsterdb, Libraries

hamsterdb java wrapper: release 0.0.3

April 7th, 2010

I just released a new version of the java wrapper, after neglecting things for a long time. Most of the current functionality is now available.

Next on the TODO list is the .NET wrapper.

Everything’s available for download, including precompiled win32/win64 libraries.

chris Coding, hamsterdb, Libraries

hamsterdb: release 1.1.3!

March 16th, 2010

Four weeks after 1.1.2, here’s the next! The headline is “partial read/write” – records can now be read or written partially. This can be a benefit if your records are REALLY big. But it will also enable you to “stream” records.

This feature caused changes in the layout of ham_record_t. Therefore the ABI of hamsterdb is no longer compatible (but the API and the database file format are compatible). To avoid crashes, i incremented the libtool version and embedded the hamsterdb version in the win32 dll/lib filenames.

Also, i fixed a performance related bug in the caching – 1.1.3 should therefore be much faster than 1.1.2.

I added a new function “ham_get_env” to retrieve the Environment handle of a Database. (This handle always exists, even if it was not created explicitely).

Everything’s available on http://hamsterdb.com/download.

Next big feature will be a remote client/server setup. And i’ll FINALLY update the java/python/.NET wrappers.

chris Coding, hamsterdb, Libraries

Presentation for OpenHUG Munich

February 26th, 2010

Yesterday i was invited to give a short presentation about hamsterdb for OpenHUG (Hadoop User Group) in Munich.

And i really enjoyed this meeting – most of the audience was coming from the Web 2.0/Enterprise area, where i do not have that many insights.

The OpenHUG was organized by Bob Schulze – thanks for the invitation, and thank you for the beer :)

Here’s the presentation.

hamsterdb

chris Coding, hamsterdb