Tuesday, 30 August 2011

Disruptor 2.0 - All Change Please

Martin recently announced version 2.0 of the Disruptor - basically there have been so many changes since we first open-sourced it that it's time to mark that officially.  His post goes over all the changes, the aim of this article is to attempt to translate my previous blog posts into new-world-speak, since it's going to take a long time to re-write each of them all over again. Now I see the disadvantage of hand-drawing everything.

In the old world

This is an example of a configuration of the Disruptor (specifically a diamond configuration).  If none of this means anything to you, feel free to go back and refresh yourself on all the (now outdated) Disruptor details.

The most obvious changes over the last few weeks have been:
  1. Updated naming convention
  2. Integrating the producer barrier into the ring buffer
  3. Adding the Disruptor wizard into the main code base.
The New World Order

You'll see the fundamentals are pretty much the same.  It's simpler, because the ProducerBarrier is no longer an entity in its own right - its replacement is the PublishPort interface, which is implemented by the RingBuffer itself.

Similarly the name DependencyBarrier instead of ConsumerBarrier clarifies the job of this object; Publisher (instead of Producer) and EventProcessor instead of Consumer also more accurately represent what these things do.  There was always a little confusion over the name Consumer, since consumers never actually consumed anything from the ring buffer. It was simply a term that we hoped would make sense to those who were used to queue implementations.

Not shown on the diagram is the name change of the items in the RingBuffer - in the old world, we called this Entry, now they're an Event, hence EventProcessor at the other end.

The aim of this wholesale rename has not been to completely discredit all my old blogs so I can continue blogging about the Disruptor ad infinitum. This is far from what I want - I have other, more fluffy, things to write about.  The aim of the rename is to make it easier to understand how the Disruptor works and how to use it. Although we use the Disruptor for event processing, when we open sourced it we wanted it to look like a general purpose solution, so the naming convention tried to represent that.  But in fact the event processing model does seem more intuitive.

No more tedious wiring
Now the Disruptor wizard is part of the Disruptor itself, my whole post on wiring is pretty pointless - which is good, actually, because it was a little involved.

These days, if you want to create the diamond pattern (for example the FizzBuzz performance test), it's a lot simpler:
DisruptorWizard dw = new DisruptorWizard<FizzBuzzEvent>(
                         ENTRY_FACTORY, 
                         RING_BUFFER_SIZE, 
                         EXECUTOR,
                         ClaimStrategy.Option.SINGLE_THREADED,
                         WaitStrategy.Option.YIELDING);
FizzBuzzEventHandler fizzHandler = 
                         new FizzBuzzEventHandler(FIZZ);
FizzBuzzEventHandler buzzHandler = 
                         new FizzBuzzEventHandler(BUZZ);
FizzBuzzEventHandler fizzBuzzHandler = 
                         new FizzBuzzEventHandler(FIZZ_BUZZ);

dw.handleEventsWith(fizzHandler, buzzHandler)
  .then(fizzBuzzHandler);

RingBuffer ringBuffer = dw.start();
Note there is a Wiki page on the Disruptor Wizard.

Other changes: performance improvements
As Martin mentions in his post, he's managed to significantly improve the performance (even more!) of the Disruptor in 2.0.

The short version of this is that there is a shiny new class, Sequence, which both takes care of the cache line padding, and removes the need for memory barriers.  The cache line padding is now done slightly differently because, bless Java 7's little cotton socks, it managed to "optimise" our old technique away.

I'll leave you to read the details over there, in this post I just wanted to give a quick summary of the changes and explain why my old diagrams may no longer be correct.

Saturday, 13 August 2011

What I Did On My Holidays

And now, a post for my long-neglected, less technical readers.

I took a week off in July to try and avoid that Oh My God I Missed Summer Again feeling. Granted, it's easy to get that in the UK even if you're not stuck in an office the entire time.

Really this is just an excuse to post some photos on the blog.

Monday
Hopped on the bike and explored from Kensington to Westminster.

Felt distinctly smug when I grabbed my lunch from Victoria amongst all the less fortunate people who had to go back to their offices.
Tuesday
Decided to dose up on Culture, and went to the National Gallery.  Last time I was there I was eight years old, and I distinctly remember admiring the frames more than the art.

The rather marvellous Artfinder makes up for the fact that you can't take photos inside the gallery - I can share most of the pieces that struck me when I was there.

Wednesday
Finally got around to embarking on one of the walks in Secret London.  I chose to do the City, from St Paul's, covering Bank, Monument and Liverpool St.

It was fascinating. I've worked in a lot of places round there, socialise there regularly, and have explored a number of the nooks and crannies.  But I was astounded at how many places I had never seen, alleys I had no idea existed, and gardens hidden away between modern office blocks.

Highly recommended.


Thursday
My parents dropped by and I racked my brain for a different way to explore this massive, old city.  And it came to me: using the artery of the city - the river.  So we took a boat to Greenwich.

Where they were filming Batman.

And I didn't know until after we'd left.

Gutted.

Friday
The sun finally makes an appearance. I was going to write a blog post, but I refuse to be indoors on a sunny day if I can get outside for some Vitamin D production (and I hadn't bought my new shiny then).

Sat in the park, sunbathed.

Brilliant day.

Brilliant week.


Sunday, 7 August 2011

Dissecting the Disruptor: Demystifying Memory Barriers

My recent slow-down in posting is because I've been trying to write a post explaining memory barriers and their applicability in the Disruptor. The problem is, no matter how much I read and no matter how many times I ask the ever-patient Martin and Mike questions trying to clarify some point, I just don't intuitively grasp the subject. I guess I don't have the deep background knowledge required to fully understand.

So, rather than make an idiot of myself trying to explain something I don't really get, I'm going to try and cover, at an abstract / massive-simplification level, what I do understand in the area.  Martin has written a post going into memory barriers in some detail, so hopefully I can get away with skimming the subject.

Disclaimer: any errors in the explanation are completely my own, and no reflection on the implementation of the Disruptor or on the LMAX guys who actually do know about this stuff.

What's the point?
My main aim in this series of blog posts is to explain how the Disruptor works and, to a slightly lesser extent, why. In theory I should be able to provide a bridge between the code and the technical paper by talking about it from the point of view of a developer who might want to use it.

The paper mentioned memory barriers, and I wanted to understand what they were, and how they apply.

What's a Memory Barrier?
It's a CPU instruction.  Yes, once again, we're thinking about CPU-level stuff in order to get the performance we need (Martin's famous Mechanical Sympathy).  Basically it's an instruction to a) ensure the order in which certain operations are executed and b) influence visibility of some data (which might be the result of executing some instruction).

Compilers and CPUs can re-order instructions, provided the end result is the same, to try and optimise performance.  Inserting a memory barrier tells the CPU and the compiler that what happened before that command needs to stay before that command, and what happens after needs to stay after.  All similarities to a trip to Vegas are entirely in your own mind.


The other thing a memory barrier does is force an update of the various CPU caches - for example, a write barrier will flush all the data that was written before the barrier out to cache, therefore any other thread that tries to read that data will get the most up-to-date version regardless of which core or which socket it might be executing by.

What's this got to do with Java?
Now I know what you're thinking - this isn't assembler.  It's Java.

The magic incantation here is the word volatile (something I felt was never clearly explained in the Java certification).  If your field is volatile, the Java Memory Model inserts a write barrier instruction after you write to it, and a read barrier instruction before you read from it.



This means if you write to a volatile field, you know that:
  1. Any thread accessing that field after the point at which you wrote to it will get the updated value 
  2. Anything you did before you wrote that field is guaranteed to have happened and any updated data values will also be visible, because the memory barrier flushed all earlier writes to the cache.
Example please!
So glad you asked.  It's about time I started drawing doughnuts again.

The RingBuffer cursor is one of these magic volatile thingies, and it's one of the reasons we can get away with implementing the Disruptor without locking.


The Producer will obtain the next Entry (or batch of them) and do whatever it needs to do to the entries, updating them with whatever values it wants to place in there.  As you know, at the end of all the changes the producer calls the commit method on the ring buffer, which updates the sequence number.  This write of the volatile field (cursor) creates a memory barrier which ultimately brings all the caches up to date (or at least invalidates them accordingly).  

At this point, the consumers can get the updated sequence number (8), and because the memory barrier also guarantees the ordering of the instructions that happened before then, the consumers can be confident that all changes the producer did to to the Entry at position 7 are also available.

...and on the Consumer side?
The sequence number on the Consumer is volatile, and read by a number of external objects - other downstream consumers might be tracking this consumer, and the ProducerBarrier/RingBuffer (depending on whether you're looking at older or newer code) tracks it to make sure the the ring doesn't wrap.


So, if your downstream consumer (C2) sees that an earlier consumer (C1) reaches number 12, when C2 reads entries up to 12 from the ring buffer it will get all updates C1 made to the entries before it updated its sequence number.

Basically everything that happens after C2 gets the updated sequence number (shown in blue above) must occur after everything C1 did to the ring buffer before updating its sequence number (shown in black).

Impact on performance
Memory barriers, being another CPU-level instruction, don't have the same cost as locks  - the kernel isn't interfering and arbitrating between multiple threads.  But nothing comes for free.  Memory barriers do have a cost - the compiler/CPU cannot re-order instructions, which could potentially lead to not using the CPU as efficiently as possible, and refreshing the caches obviously has a performance impact.  So don't think that using volatile instead of locking will get you away scot free.

You'll notice that the Disruptor implementation tries to read from and write to the sequence number as infrequently as possible.  Every read or write of a volatile field is a relatively costly operation. However, recognising this also plays in quite nicely with batching behaviour - if you know you shouldn't read from or write to the sequences too frequently, it makes sense to grab a whole batch of Entries and process them before updating the sequence number, both on the Producer and Consumer side. Here's an example from BatchConsumer:

    long nextSequence = sequence + 1;
    while (running)
    {
        try
        {
            final long availableSequence = consumerBarrier.waitFor(nextSequence);
            while (nextSequence <= availableSequence)
            {
                entry = consumerBarrier.getEntry(nextSequence);
                handler.onAvailable(entry);
                nextSequence++;
            }
            handler.onEndOfBatch();
            sequence = entry.getSequence();
        }
        ...
        catch (final Exception ex)
        {
            exceptionHandler.handle(ex, entry);
            sequence = entry.getSequence();
            nextSequence = entry.getSequence() + 1;
        }
    }

(You'll note this is the "old" code and naming conventions, because this is inline with my previous blog posts, I thought it was slightly less confusing than switching straight to the new conventions).

In the code above, we use a local variable to increment during our loop over the entries the consumer is processing.  This means we read from and write to the volatile sequence field (shown in bold) as infrequently as we can get away with.

In Summary
Memory barriers are CPU instructions that allow you to make certain assumptions about when data will be visible to other processes.  In Java, you implement them with the volatile keyword.  Using volatile means you don't necessarily have to add locks willy nilly, and will give you performance improvements over using them.  However you need to think a little more carefully about your design, in particular how frequently you use volatile fields, and how frequently you read and write them.


PS Given that the New World Order in the Disruptor uses totally different naming conventions now to everything I've blogged about so far, I guess the next post is mapping the old world to the new one.