Monday, 18 March 2013

It Depends

Don't you hate it when you ask a perfectly good question, and someone comes back with the answer "it depends"?

It's so frustrating to think that in a world of ones and zeros, people can't give absolute answers and you can't rely on "best practice".

It's an answer I've given so many times, especially when someone asks about performance.  Well, I've had my comeuppance.  The entire exercise of designing the new Java driver for MongoDB has been nothing but a series of questions where the answer is "it depends":
  • Which Java version are our users, um, using?
  • Do people want an asynchronous driver?
  • How will they want to work with async?
  • Will they want to use async and synchronous method calls from the same application?
  • Do people typically use the Java driver directly, or do they use something that wraps it, like Morphia or Spring Data?
  • What's most important for users in terms of performance?  Throughput? Latency? Consistent GC profile? Something else?
  • What sorts of operations are our users doing?
  • Do people usually update their driver and the server version at the same time?
  • Is it easier for them to update their driver version than their server version?
  • Do they use (or will they need) custom encoders and decoders?
  • When should we deprecate?  When should we remove deprecated methods?
...and so on, and so forth.

It's such a change from the sort of development I'm used to: "the business" (a business analyst, a customer, a business owner) comes to you with a requirement, you ask a bunch of questions, preferably explaining the trade-offs that come with decisions or approaches, and then you and the team come up with a design and implement it.  If you're agile, this is all done in a nice, iterative fashion, which hopefully leads to "the business" being happy, or at the very least to another series of stories/requirements to work on.

The problem is that working on a library, particularly an open source library, is a completely different thing.  We don't even know who our users are.  This statement is true for pretty much any web application, of course, but there at least you can (if you choose) use tools like analytics and A/B testing to figure out what works for your users and what doesn't.

When your library is used for free by all kinds of different teams and companies, including organisations behind closed doors, you have no idea what's being used, what works, what people like, what people don't like.  The most visible feedback you have is when you see blog posts telling everyone how bad your product is.

This makes the design exercise VERY difficult.  Take performance for example.  Having come from a high performance, low latency background, I'm desperate to have a very extensive suite of automated performance tests (and we do already have some).  But how do I design those, when I don't know:
  • What operations are typical for our users
  • Whether our users care more about latency or throughput, or mean latency vs the long tail, or GC pauses, etc.
  • How much data customers tend to punt around
  • What the hardware or network topology looks like
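
To make the "mean latency vs the long tail" distinction concrete, here's a minimal sketch in Java. The class name LatencySummary and the nearest-rank percentile method are my own choices for illustration; this isn't code from our performance suite.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: summarises recorded latencies.
final class LatencySummary {
    private final long[] sorted;

    LatencySummary(long[] latenciesMicros) {
        if (latenciesMicros.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        sorted = latenciesMicros.clone();
        Arrays.sort(sorted);
    }

    // Arithmetic mean: easily dominated by a handful of slow outliers.
    double mean() {
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        return (double) sum / sorted.length;
    }

    // Nearest-rank percentile: the smallest sample with at least p percent
    // of all samples less than or equal to it.
    long percentile(double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With nine samples at 100 microseconds and one at 10,000, the median stays at 100 while the mean and the 99th percentile are pulled up by the single slow call, which is exactly why "which number do our users care about?" matters.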

The number one lesson you learn when performance testing is to make your test environment as similar to production as possible.  How can we do that for all our customers?

Well, we can't.  Of course.

What we can do is offer an easy way for our users to test it for themselves, with their own data, their own hardware, their own use-cases.  If we can provide some sort of hook into standard metrics, we could get users to plug into that and do what they needed.  In theory, as we get more examples of these standard metrics from a range of users, it will be easier for us to help them diagnose problems.
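
One possible shape for such a hook, sketched in Java. Everything here (OperationListener, InstrumentedClient, RecordingListener) is a made-up name for illustration, not an API that exists in the driver; it uses Java 8's Supplier purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical hook: NOT part of any real driver API.
interface OperationListener {
    void operationCompleted(String operationName, long elapsedNanos);
}

// Stand-in for the library: times each operation and reports it to the listener.
final class InstrumentedClient {
    private final OperationListener listener;

    InstrumentedClient(OperationListener listener) {
        this.listener = listener;
    }

    <T> T execute(String operationName, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            // Reported even if the operation throws, so failures are timed too.
            listener.operationCompleted(operationName, System.nanoTime() - start);
        }
    }
}

// Example user-side listener that simply records what it is told.
final class RecordingListener implements OperationListener {
    final List<String> names = new ArrayList<String>();
    final List<Long> timings = new ArrayList<Long>();

    @Override
    public void operationCompleted(String operationName, long elapsedNanos) {
        names.add(operationName);
        timings.add(elapsedNanos);
    }
}
```

The point of the design is that the library only emits events; users decide whether to feed them into a log, a histogram, or whatever monitoring they already run in production.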

What I discovered thinking about this problem is that I was asking the wrong question - it's not "how do we test this?" but "how do we make it as easy as possible for users to test this in a production-like environment?".

My biggest headache has been backwards compatibility.  I've worked on plenty of systems where we've provided an API which has to be maintained in a friendly way for those who use it.  That's tough enough - you have to be careful to only add methods, not to change signatures or remove them altogether.  But when your system is not simply an API to code against, but a library that runs within other people's systems, this problem is even harder.  Greg Young talked at QCon about this problem from the developer's point of view - every piece of code in your system, even if it's a third party library, is your code.  Because you're the one who'll get called at three in the morning if your system falls over with a ConcurrentModificationException in some third party data structure.
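
As a small illustration of "only add methods, don't change signatures", here's a sketch in Java. KeyValueStore is a hypothetical class invented for this example, not anything from the driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical library class, for illustration only.
final class KeyValueStore {
    private final Map<String, String> data = new HashMap<String, String>();

    // Original "version 1" method: signature untouched, so existing callers
    // still compile and link against the new jar.
    public void save(String key, String value) {
        save(key, value, true);
    }

    // Added in "version 2" as a new overload rather than a changed signature.
    // Returns true if the value was stored.
    public boolean save(String key, String value, boolean overwrite) {
        if (!overwrite && data.containsKey(key)) {
            return false;
        }
        data.put(key, value);
        return true;
    }

    /** @deprecated use {@link #save(String, String)}; kept so old callers keep working. */
    @Deprecated
    public void put(String key, String value) {
        save(key, value);
    }

    public String get(String key) {
        return data.get(key);
    }
}
```

The old methods delegate to the new one, so there is a single implementation to maintain while callers migrate at their own pace.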

So as library developers, we have to not only provide excellent quality, well tested code, but we also have to let our baby go off and run in strange environments.  Ones that might not be running Oracle's Java 1.7, ones that might contain other libraries that could clash with our own.  Have you ever looked at all the Maven dependencies in a large project?  You can end up with conflicting versions of common libraries (e.g. logging or DI frameworks) as every library you use pulls in dozens of libraries for itself.

In our case then, we need to use as few dependencies as possible, and to write nice, clean, modern Java, whilst supporting older versions of Java.  What's modern enough?  We know that some large organisations take a loooong time to upgrade to the latest versions of Java.  We currently support Java 5 and above, as 5 was a big enough change (and is old enough now) to be a good point to draw the line.  But what about Java 7, with its shiny new asynchronous channels?  That would be awesome for a driver like ours, an application that exists solely to connect to some server somewhere.  What about Java 8, with lambda goodness and a very appealing Stream interface, that supports (and encourages) a syntax that looks like it might work well for providing a fluent API for querying, I dunno, databases?  How do we make the most out of advances in modern Java, without alienating our existing users?
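
To show the kind of syntax I mean, here's a sketch using plain Java 8 streams over an in-memory list. It assumes the java.util.stream API; StreamQuerySketch and Person are hypothetical names, and nothing here talks to a database or reflects a driver API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: ordinary streams over a list, not a driver API.
final class StreamQuerySketch {
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Reads a lot like a query: filter, sort, project, limit.
    static List<String> namesOfAdults(List<Person> people, int limit) {
        return people.stream()
                .filter(p -> p.age >= 18)
                .sorted((a, b) -> a.name.compareTo(b.name))
                .map(p -> p.name)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```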

And, of course, I haven't even talked about actually changing the API.  The existing Java driver doesn't even make use of generics.  How can we change our API to provide a more modern-looking interface, without either forcing all our users to make massive changes to their applications (and for what?  Their code already works), or adding so many new classes and methods that it becomes very difficult for new users to work out the Best Practice way of interacting with our driver?
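
For anyone who hasn't felt the difference, here's a minimal Java sketch of what generics buy the caller. UntypedResults and TypedResults are hypothetical classes made up for this example; neither is the existing driver API nor the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, for illustration only.

// Pre-generics style: everything comes back as Object and the caller casts.
final class UntypedResults {
    private final List<Object> items = new ArrayList<Object>();

    void add(Object item) {
        items.add(item);
    }

    Object first() {
        return items.get(0);
    }
}

// Generic style: the compiler tracks the element type for the caller.
final class TypedResults<T> {
    private final List<T> items = new ArrayList<T>();

    void add(T item) {
        items.add(item);
    }

    T first() {
        return items.get(0);
    }
}
```

With the first class a caller writes `String s = (String) results.first();` and finds out at runtime if the cast is wrong; with the second, `String s = results.first();` is checked at compile time.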

So, what can we do?
  • Firstly we have to figure out which questions we have and what possible solutions exist.
  • Then we have to weigh up the pros and cons of each of the possible solutions.
  • Ideally we'd get feedback early and often from our users, from the community, about the approaches we're taking.
  • Development would happen in parallel, in a nice, agile, way, to bring the best possible solution for everyone.

What this all really means, though, is that the Shiny New Java Driver™ is not ready right now, and will not be ready immediately, despite the fact that we've already spent some time on its development, and considered all of those questions and more.  Right now, we're starting to get feedback from the community - from users, and from committers (or people who would like to be committers).  Which means that I get to go to more conferences and user groups, and talk about our problems when designing the new driver.  I'm hoping to get two things out of this, apart from more air miles:
  1. Feedback from the community about our assumptions and the direction we're taking.
  2. A chance to present to the development community some of the lessons we've learnt/are learning, in the hope that you can use them when approaching the design of your own applications.
<gratuitous-conference-plug>You've already seen some of these upcoming events.  But in case you haven't, this Thursday I'll be presenting at Skillsmatter on how we approached this design problem, and again at DevoxxUK next week.

If you're more interested in everything MongoDB, and want to get a much better look at what it is, how it works, how to design for it, then come to MongoDB London, where I am one of numerous presenters, all going into MongoDB specifics.</gratuitous-conference-plug>

So... I'm afraid to say that even after this long post, even after bemoaning my own experiences of hearing "it depends", I still don't have an answer for you.

But maybe I do.  

"It Depends" means "you need to get more information in order to answer that question".  And it's our job as developers to ask the right questions to gather that information.  If the answer is "it depends", you're not asking the right question yet, or you're not asking the right people.  So dig down, find out why you can't answer the original question, and start iterating through your design process until you have answers that you can act upon.

Easy.  Right...?

Wednesday, 13 March 2013

Upcoming Events

Following on from a busy and fun QCon, I can now let you in on the next set of events I'll be at:

Note that I'm actually going to be in the UK for all of March and April (at the moment), which will be a nice change.

I'm hoping to be able to speak about the same types of things for the rest of the year, as it has been exhausting presenting on new topics for every event over the last 6 months.  I'll happily take feedback on what people are most interested in hearing about.

Monday, 11 March 2013

The Coalescing Ring Buffer

For anyone who is interested in what LMAX is up to, and is still following my blog, have a look at this post about the latest tool they've open sourced: the Coalescing Ring Buffer.

Wednesday, 6 March 2013

QCon Day One

I like QCon London, I really do.  Not only is it on home turf, but, as I've said before, it doesn't just focus on technology, or a set of technologies.


Full disclosure: I've been involved in planning QCon this year.  So this time I know all the thinking, hard work, planning and last-minute changes that go into a conference like this.  And it's a joy to be able to sit in the audience and see the conference that you've helped build.

There are things I took out of today that I want to get down on "paper" now, because I think the next few days will have different themes.

Let's Not Forget About Computer Science
I'm so pleased to see this in a conference!  After documenting and talking about the Disruptor so much last year, I felt it was important for us to go back to our roots a bit, and have some Mechanical Sympathy.  Some of the sessions today brought us back to the school room and had us thinking about the tools we're using.

In my new role I'm doing something I've never had to do before, and that's writing a library that will be used by other developers.  Barbara Liskov's keynote had me thinking hard about "readability over writability" and "design for the case that is used most often".  I also came out slightly depressed that some things hadn't changed in 40 years.  Martin Thompson's talk made me wish I had written my driver performance tests up front, and poked me to continue thinking about our users' performance needs.  I'd like to say Damian Conway's keynote made me want to code in Latin, but in fact it motivated me to get back to learning Spanish.  But it also (unintentionally) backed up a point I've heard before, which is that programming is at least twice as hard to learn if you don't speak English.  Probably worse if you don't speak a European language.

Overall, the message I took from the combination of these sessions is that it's almost more important than ever to get back to basics.  Stop obsessing about frameworks and tools and different flavoured JVM languages.  Start remembering all these things are tools, that they run on machines that work with ones and zeros.  Whatever abstractions we're using, we will write better, cleaner, more performant, fit-for-purpose code if we understand our tools, and their strengths and weaknesses.  One of these tools is even our own brain - understand how it shapes the information and presents us with solutions.

Art Is Awesome (and Useful)
I went a bit off topic to go to two sessions on visual information.  My excuse is that this blog needs more pictures, and I want more visual ways of representing what's going on with MongoDB/the driver.
Heather Willem's session encouraged us to doodle throughout.  In fact, forced us to.  Right up front she addressed the fact that doodling is seen as a lack of attention, as a waste of time.  And I realised, sitting there in the audience with my iPad and stylus, that I did feel guilty drawing away while someone talked at me.  But it was a brilliant exercise in unblocking some of those creative juices, and letting us see the power in visual information.  Perfect is not important, pictures are powerful.



Fernando Orellana is an artist fascinated by building machines to create art.  It was awesome to see an artist who also builds robots and codes stuff to do what he wants.  He said "code is like paint, I use it to create".  See? I knew coding was a creative activity.  He's done some things that would probably have tempted me to ask "is that art?", like aiming to create 40 000 Play-Doh cars.  Or making creepy rodents that live in "dumb little suburban homes" out of singing hamster toys.  Go to his website and watch the videos; nothing else can describe it.  I was blown away.  It was inspiring, and useful too for sparking creativity in all of us.

Here's my "subconscious drawing" from the start of the session.  We need more child-like art in our lives.

These two sessions really rekindled my desire to do arty, drawing-y creative stuff.  And helped me see how fear blocks many of us from using this medium.

And...
Sitting in the audience as an attendee, different things jump out at you.  I was shocked, as someone who's been involved in suggesting and selecting speakers, at how few women there were again on this first day.  My thought was, how could this happen with me on the committee??  So I have more sympathy for conference organisers than I did, when facing this tough problem, but more conviction than ever that we need to do something different to showcase different types of role models.

On the other hand, it might be my imagination or wishful thinking, but the overall diversity of the attendees seemed much greater than recent years, which is a Good Thing.

I'm sorry for bringing up this subject again, but you might consider it blind of me not to notice something in a conference I've actually worked on.

But overall
I've been really impressed with the first day of the conference.  Not only was it, of course, a great opportunity to meet new people, catch up with old friends, and generally network, but the sessions that I went to were excellent, and have me excited to be working in this industry, at this time.  And to be in a position to hear about it all.  And maybe, just maybe, contribute something of my own.

GDL Presents Women Techmakers with Trisha Gee

I was flattered a couple of weeks ago to be interviewed by Google as part of their Women Techmakers series, as it moves over to Europe. In this video I talk about going to Mars, education, planning your career, being a developer, and the impact of technology on our lives. So, not much...