Maybe I Should Be a Plumber?

It seems this week that I should reconsider my vocation, as work with pipes (no, not just the usual tubes) has been featuring prominently. Firstly, I had to put in a drainage pipe under the garden bed which I've (slowly) been building for Rach. Hopefully it will save the paving against the side of our house from flooding every time it rains. That would be super cool…

The second incident is that SwitchPipe has been acting like a bitch. You may have noticed my sites being down at random intervals lately? Yup, that's SwitchPipe dying off for no good reason. And after I'd said so many nice things about it!!!

I was away when it first started, receiving a phone call from truelocal (of all folks!) letting me know that the Achernar Solutions site was down. Great. I then got an email from Peter - you know, the dude who wrote SwitchPipe - saying something along the lines of "Hey! Notice your site is down, hope it's not because of SwitchPipe". Gak!

Several other people on the SwitchPipe group had been having the same problem - SwitchPipe would sit around for ages, working perfectly, and then it would just die with no explanation.

I'll spare you the more boring bits about hunting down the problem. Suffice to say that it involved lots of extra logging calls, including some hacked into the EventMachine library itself.

Just now, I managed to reproduce the problem, fix it, create a patch and fire it off. Hopefully, that'll be the end of this freaking irritating bug.

What seemed to be causing it (and how I finally reproduced it) was making a "GET / HTTP/1.0" request to SwitchPipe. The key bit there is the request for "/". A fairly safe request, one would think. Well, not quite. I'm fairly certain that mod_proxy sends requests just like this off to back-end servers periodically, just to see if they're alive. This is likely what was happening in my case.

See, SwitchPipe first tries to find which site to serve based upon the directory name in the request. "/" has no directory. Oops! As such, that code returns false (as in, the FalseClass object), where it would normally return a string of the directory name. Not that bad so far. However, later on, this value was turned into a symbol using to_sym(). OK, BUT FalseClass doesn't have a to_sym() method! So what happened? Exceptions were raised, Threads died, and eventually, SwitchPipe would come to a standstill.
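To make that failure concrete, here's a minimal sketch in plain Ruby. It is not SwitchPipe's actual code: app_name_for and app_key_for are hypothetical names I've made up to stand in for the lookup and the later to_sym() call, and the nil fallback is just one possible way to guard it.

```ruby
# Hypothetical stand-in for the lookup described above: returns the first
# path segment as a String, or false when the path has none (e.g. "/").
def app_name_for(path)
  segment = path.split("/").reject(&:empty?).first
  segment || false
end

# Guarded conversion (an assumption, not the real patch): only call to_sym
# when the lookup actually produced a name, otherwise return nil.
def app_key_for(path)
  name = app_name_for(path)
  name ? name.to_sym : nil
end

# The unguarded version blows up, because FalseClass has no to_sym method.
begin
  app_name_for("/").to_sym
rescue NoMethodError => e
  puts "unguarded lookup raised #{e.class}"
end

puts app_key_for("/blog/posts").inspect
puts app_key_for("/").inspect
```

With a guard like this, a request for "/" gives the caller something it can check for and answer with an error page, rather than an exception going off inside a worker Thread.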

This sort of stems from the fact that EventMachine creates 20 Thread objects when it's running, but never manages or restarts them. It just assumes that they'll always be alive until it says otherwise. So, each time a Thread died thanks to the Exception being thrown, the pool got a little smaller until, finally, there were no more Threads left to handle requests.
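The shrinking pool is easy to see in miniature. The sketch below is not EventMachine's code, just a toy pool built on the same assumption: a fixed set of worker Threads pulling jobs off a Queue, with no rescue and nothing restarting them. The names start_pool and alive_count are mine, and the pool size of 4 is arbitrary.

```ruby
# Keep the demo output quiet; dead threads would otherwise print a backtrace.
Thread.report_on_exception = false

# Start `size` workers that pop callables off the queue and run them.
# Nothing rescues exceptions and nothing restarts a dead worker.
def start_pool(size, queue)
  Array.new(size) do
    Thread.new do
      loop { queue.pop.call }
    end
  end
end

# Number of workers in the pool that are still running.
def alive_count(pool)
  pool.count(&:alive?)
end

queue = Queue.new
pool  = start_pool(4, queue)

# Each bad job kills whichever worker picks it up, so three bad jobs
# take out three of the four workers.
3.times { queue << -> { false.to_sym } }
sleep 0.05 until alive_count(pool) == 1

puts "workers still alive: #{alive_count(pool)}"
```

Feed it one more bad job and nothing is left to serve requests, which matches the standstill I was seeing. Wrapping the job call in a begin/rescue inside the worker would keep the Threads alive, though that's a general observation rather than what my patch does.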

Hopefully with this patch things will be back to normal with my sites, and they should stay up. Aside from this one small (but nasty) bug, SwitchPipe has been working like a charm. Ironically, the problem only surfaced after I made the post about it. Typical!