Tuesday, 17 January 2012

Christmas decorations teach me a lesson about troubleshooting

And now, after an absence of several weeks, you get to see how long it takes me to write some of these posts.


I was putting up the Christmas decorations one Saturday when my worst fear was realised1 - one of my three strings of lights was not working.

The first two went up fine.  The third lit up when I plugged it in, and in less than a second went out.  Curses.  This is not what I wanted, this was supposed to be a short exercise in making my tiny little flat look festive.

So I set about the tedious task of starting from the end closest to the plug and replacing every bulb, one by one, with a spare one to see if it magically lit up again.  When it doesn't, you take the spare back out and replace it with the original bulb.  I remember my parents going through this ritual every Christmas, the tediousness of this activity is more memorable than the fleeting joy of shinies.

While I was doing this, my mind was back on the job I'd been doing at work the previous week - battling an Internet Explorer 7 performance problem.  We have automated performance tests which give us an indication of the load time for our application in Chrome and IE, and some time in the previous couple of weeks our IE performance had significantly degraded in the development code.  Due to a number of too-boring-to-explain-here circumstances, the last known good revision was four days and nearly 250 revisions earlier than the first revision that showed the performance problem.

Since we couldn't see anything to indicate it was an environmental problem, the logical next step was to pinpoint the revision which caused the problem, so we could either fix it or get performance gains from somewhere else in the system.


The most obvious way to do this, given there were no obvious suspects, is with a binary search of the revisions.  Our last known good revision was 081, our first poor performing one was 240.  So the thing to do is to check revision 160, see if it falls on the poor or good performance side.


If 160 proves to be a poor performer, check revision 120....


...if 160 is fine, test revision 200...

...and keep splitting the revisions by half until you find the suspect.

So of course that's what I want to do with my stupid Christmas lights.  I do not want to sequentially check each light bulb, that has a worst-case number-of-bulbs-tried = n, where n is the number of bulbs (probably a couple of hundred, although it felt like several thousand).  So, in computer speak, O(n).  The binary search algorithm is O(log n).  At university, this Big Oh had no context for me.  But when you've taken 10 minutes to get a quarter of the way through your Christmas lights, and you diagnosed your IE performance problems... well, actually it took days.  But the point is, a binary search for the missing bulb would definitely have been a Good Thing.

I know you're dying to know if I tracked down the problem in Internet Explorer - I did.  What's the worst case when you're doing a binary search?  It's when the thing you're looking for is veeeery close to either your start point or your end point.

The revision number I was after was number 237.  Sigh.


And my Christmas tree lights?  Well, through the boredom I remembered that modern lights are in sections, so they have a sort of built-in binary search - well, limited segments will go dark if a single bulb is out - which allows you to narrow down your problem area.  Since the whole string was out, I figured something else was probably wrong.

Turned out the plug had come out of the socket.

So:
Lesson 1: Theoretical computer science does have a place when you care about how long something takes.  When it's you feeling the pain, you'll do anything to make it stop.

Lesson 2: When diagnosing a problem you will always biased towards what you think it is, in the face of actual evidence.  I was afraid I would have to search the whole set of lights for a blown bulb, so that's the problem I was looking for when the lights failed.  In actual fact it was a problem with a much simpler solution.



1 - OK, "worst fear" in this very limited context only - it's not like I lie awake at night in July afraid that one of the bulbs on my Christmas lights has blown.