The next in the series on understanding the Disruptor pattern developed at LMAX.
After the last post we all understand ring buffers and how awesome they are. Unfortunately for you, I have not said anything about how to actually populate them or read from them when you're using the Disruptor.
ConsumerBarriers and Consumers
I'm going to approach this slightly backwards, because it's probably easier to understand in the long run. Assuming that some magic has already populated the ring buffer, how do you read something from it?
(OK, I'm starting to regret using Paint/Gimp. Although it's an excellent excuse to purchase a graphics tablet if I do continue down this road. Also UML gurus are probably cursing my name right now.)
Your Consumer is the thread that wants to get something off the buffer. It has access to a ConsumerBarrier, which is created by the RingBuffer and interacts with it on behalf of the Consumer. While the ring buffer obviously needs a sequence number to figure out what the next available slot is, the consumer also needs to know which sequence number it's up to - each consumer needs to be able to figure out which sequence number it's expecting to see next. So in the case above, the consumer has dealt with everything in the ring buffer up to and including 8, so it's expecting to see 9 next.
The consumer calls waitFor on the ConsumerBarrier with the sequence number it wants next:
final long availableSeq = consumerBarrier.waitFor(nextSequence);
and the ConsumerBarrier returns the highest sequence number available in the ring buffer - in the example above, 12. The ConsumerBarrier has a WaitStrategy which it uses to decide how to wait for this sequence number - I won't go into details of that right now, but the code has comments outlining the advantages and disadvantages of each.
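To make the waitFor contract concrete, here is a rough sketch in Java. It is not the Disruptor's own code: SketchBarrier, the bare AtomicLong cursor and the spin-then-yield loop are all stand-ins made up for illustration, and a real WaitStrategy would decide how to wait. The point is only that the caller asks for one sequence and gets back the highest one available.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a barrier; NOT the Disruptor's ConsumerBarrier.
final class SketchBarrier {
    // Highest sequence published so far (assumed to be updated by the producer).
    private final AtomicLong cursor;

    SketchBarrier(AtomicLong cursor) {
        this.cursor = cursor;
    }

    // Spin (yielding between checks) until the requested sequence has been
    // published, then return the highest published sequence, which may be
    // well past the one that was asked for.
    long waitFor(long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            Thread.yield();
        }
        return available;
    }
}

final class WaitForDemo {
    public static void main(String[] args) {
        AtomicLong cursor = new AtomicLong(12);      // everything up to 12 is published
        SketchBarrier barrier = new SketchBarrier(cursor);
        long availableSeq = barrier.waitFor(9);      // the consumer wants 9 next
        System.out.println(availableSeq);            // prints 12
    }
}
```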
Now what?
So the consumer has been hanging around waiting for more stuff to get written to the ring buffer, and it's been told what has been written - entries 9, 10, 11 and 12. Now that they're there, the consumer can ask the ConsumerBarrier to fetch them.
As it's fetching them, the Consumer is updating its own cursor.
You should start to get a feel for how this helps to smooth latency spikes - instead of asking "Can I have the next one yet? How about now? Now?" for every individual item, the Consumer simply says "Let me know when you've got more than this number", and is told in return how many more entries it can grab. Because these new entries have definitely been written (the ring buffer's sequence has been updated), and because the only things trying to get to these entries can only read them and not write to them, this can be done without locks. Which is nice. Not only is it safer and easier to code against, it's much faster not to use a lock.
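The wait-once-then-read-the-batch idea can be sketched roughly as follows. ToyRing and ToyConsumer are hypothetical names, not Disruptor classes, and the sketch assumes a single producer that writes a slot before publishing its sequence. It also assumes the producer never laps the consumer, since nothing here stops it from wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy ring buffer of long values; illustrative only.
final class ToyRing {
    private final long[] slots;
    private final int mask;
    // Highest published sequence; -1 means nothing has been published yet.
    private final AtomicLong cursor = new AtomicLong(-1);

    ToyRing(int sizePowerOfTwo) {
        slots = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Single producer assumed: write the slot first, then publish the sequence.
    void publish(long sequence, long value) {
        slots[(int) (sequence & mask)] = value;
        cursor.set(sequence);
    }

    long get(long sequence) {
        return slots[(int) (sequence & mask)];
    }

    long waitFor(long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            Thread.yield();
        }
        return available;
    }
}

final class ToyConsumer {
    private final ToyRing ring;
    private long nextSequence;                 // owned and written by this consumer only
    final List<Long> seen = new ArrayList<>();

    ToyConsumer(ToyRing ring, long firstSequence) {
        this.ring = ring;
        this.nextSequence = firstSequence;
    }

    // One wait, then read the whole available batch with no further waiting.
    int consumeBatch() {
        long availableSeq = ring.waitFor(nextSequence);
        int count = 0;
        while (nextSequence <= availableSeq) {
            seen.add(ring.get(nextSequence));
            nextSequence++;                    // the consumer advances its own cursor
            count++;
        }
        return count;
    }
}

final class BatchReadDemo {
    public static void main(String[] args) {
        ToyRing ring = new ToyRing(16);
        for (long seq = 0; seq <= 12; seq++) {
            ring.publish(seq, seq * 10);
        }
        ToyConsumer consumer = new ToyConsumer(ring, 9); // has already handled up to 8
        int count = consumer.consumeBatch();
        System.out.println(count + " " + consumer.seen); // prints 4 [90, 100, 110, 120]
    }
}
```

Note that the read loop takes no lock: the consumer only reads slots at or below the published sequence, and it is the only thread writing nextSequence.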
And the added bonus - you can have multiple Consumers reading off the same RingBuffer, with no need for locks and no need for additional queues to coordinate between the different threads. So you can really run your processing in parallel with the Disruptor coordinating the effort.
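As a rough illustration of several consumers sharing one buffer, the sketch below runs two threads that each keep a private sequence and read the same slots. Every name in it is made up for the example. To keep it short, the buffer is sized so that it never wraps, which sidesteps the question of how a producer avoids overwriting entries a slow consumer still needs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative shared buffer; sized so that this demo never wraps around.
final class SharedBuffer {
    final long[] slots = new long[1024];
    final AtomicLong cursor = new AtomicLong(-1);   // highest published sequence

    void publish(long sequence, long value) {
        slots[(int) sequence] = value;              // write the entry first...
        cursor.set(sequence);                       // ...then make it visible
    }
}

// Each consumer tracks its own position; the consumers share no mutable state.
final class SummingConsumer implements Runnable {
    private final SharedBuffer buffer;
    private final long lastSequence;
    private long nextSequence = 0;
    long sum = 0;

    SummingConsumer(SharedBuffer buffer, long lastSequence) {
        this.buffer = buffer;
        this.lastSequence = lastSequence;
    }

    @Override
    public void run() {
        while (nextSequence <= lastSequence) {
            long available;
            while ((available = buffer.cursor.get()) < nextSequence) {
                Thread.yield();
            }
            long limit = Math.min(available, lastSequence);
            while (nextSequence <= limit) {
                sum += buffer.slots[(int) nextSequence];
                nextSequence++;
            }
        }
    }
}

final class ParallelConsumersDemo {
    // Publishes the values 1..count and returns the sum seen by each consumer.
    static long[] runDemo(int count) {
        SharedBuffer buffer = new SharedBuffer();
        SummingConsumer a = new SummingConsumer(buffer, count - 1);
        SummingConsumer b = new SummingConsumer(buffer, count - 1);
        Thread ta = new Thread(a);
        Thread tb = new Thread(b);
        ta.start();
        tb.start();
        for (long seq = 0; seq < count; seq++) {
            buffer.publish(seq, seq + 1);
        }
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.sum, b.sum };
    }

    public static void main(String[] args) {
        long[] sums = runDemo(100);
        System.out.println(sums[0] + " " + sums[1]); // prints 5050 5050
    }
}
```

Both consumers see every entry, and neither needs a lock or a queue of its own, because the only shared value they read is the published sequence.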
The BatchConsumer is an example of consumer code, and if you implement the BatchHandler you can get the BatchConsumer to do the heavy lifting I've outlined above. Then it's easy to deal with the whole batch of entries processed (e.g. from 9-12 above) without having to fetch each one individually.
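The division of labour between a batch-driving loop and a handler can be sketched like this. The interface and method names below (SketchBatchHandler, onAvailable, onEndOfBatch) are assumptions chosen for illustration and may not match the library's actual BatchHandler, so check the Disruptor source for the real signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; the names are illustrative assumptions.
interface SketchBatchHandler<T> {
    void onAvailable(T entry);   // called once for each entry in the batch
    void onEndOfBatch();         // called once after the last entry of the batch
}

// Stand-in for the part that does the heavy lifting: it walks the batch and
// calls the handler, so the handler never fetches entries itself.
final class SketchBatchRunner {
    // Delivers entries from index 'from' to index 'to', inclusive.
    static <T> void deliver(List<T> entries, int from, int to, SketchBatchHandler<T> handler) {
        for (int i = from; i <= to; i++) {
            handler.onAvailable(entries.get(i));
        }
        handler.onEndOfBatch();
    }
}

// Example handler: buffers entries and flushes them once per batch.
final class JoiningHandler implements SketchBatchHandler<String> {
    private final StringBuilder pending = new StringBuilder();
    final List<String> flushed = new ArrayList<>();

    @Override
    public void onAvailable(String entry) {
        pending.append(entry);
    }

    @Override
    public void onEndOfBatch() {
        flushed.add(pending.toString());
        pending.setLength(0);
    }
}

final class BatchHandlerDemo {
    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i <= 12; i++) {
            entries.add("e" + i);
        }
        JoiningHandler handler = new JoiningHandler();
        SketchBatchRunner.deliver(entries, 9, 12, handler);
        System.out.println(handler.flushed); // prints [e9e10e11e12]
    }
}
```

An end-of-batch hook like this is handy for work that is cheaper done once per batch than once per entry, such as flushing a writer.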
EDIT: Note that version 2.0 of the Disruptor uses different names to the ones in this article. Please see my summary of the changes if you are confused about class names.
Nice, nice post - love the diagrams! After reading every single scrap of documentation on the Disruptor pattern, this post was the one that gave me my final "Aha!" moment. Thanks!
Thank you!
Trisha, you should just draw the diagrams on the board and take photos :-)
I am still trying to digest the idea of the design here: is its goal to have just one writer, which makes the code thread-safe without locks?
Thanks,
Doug
You should only have a single thread writing to a single variable at any time, to prevent locks (see the section on modifying entries in the wiring post).
Yes, this is to prevent the use of locks as locks are terrible for performance. But the design around multiple Consumers/EventHandlers is to also allow you to run things in parallel without contention.
Hi Trisha, just read this post. It's great, but I'm wondering about visibility guarantees. If there's no locking, how would a consumer see changes made to a message earlier by a producer?
The short version is - because of the way the sequence number is updated, the Java Memory Model ensures that all the writes that happened before the sequence number was updated are visible to any thread that then reads the updated sequence number. Therefore when the consumer sees a sequence number (let's say 14), it can be certain that all the values in the event at slot 14 are correct.
Hi Trisha, we have been trying to figure out how the different WaitStrategy implementations work. Where can we find material on this?
Hi Ravi,
I see you've posted a message to the Google Group, which is definitely the best place to get help. The only material I personally have is the following:
Blocking: Uses a lock. This strategy can be used when throughput and low latency are not as important as CPU resource.
BusySpin: Hard on the CPU. Fast, reduces jitter. Best to tie to a specific core.
Sleeping: Spins, yields, then parks. Best compromise, but has latency spikes.
Yielding: Spins, then yields. A compromise, with fewer spikes.
Hope that helps!