Ruby Net::HTTP and Content Encoding: http_encoding_helper

Working on LifeBox has given me a great opportunity to get my hands dirty with the Ruby Standard Library.

I've blogged previously about Net::HTTP, which is a very nice library for working with HTTP and HTTPS requests in Ruby when you need a bit more control than open-uri allows. You know, things like sending custom headers, handling 302 redirects, If-Modified-Since, etc.

Even though we're still in development, we are currently checking just under 400 feeds - some of which we check every 15-20 minutes. That is a serious amount of bandwidth unless we can take advantage of some of the bandwidth-saving features of the HTTP spec. With a combination of If-Modified-Since (which not everyone supports for their feeds) and compression (which not every server supports) we've made a huge reduction in bandwidth.
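As a rough illustration of the request side, here is a minimal sketch of building a conditional GET that also asks for compressed content. The helper name build_feed_request, the example URL and the timestamp are made up for this example; this is not LifeBox code.

```ruby
require "net/http"
require "uri"
require "time" # adds Time#httpdate

# Hypothetical helper: builds (but does not send) a GET request that asks
# for compressed content and, if we have fetched the feed before, tells
# the server when we last saw it.
def build_feed_request(url, last_fetched_at = nil)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["Accept-Encoding"] = "gzip, deflate"
  request["If-Modified-Since"] = last_fetched_at.httpdate if last_fetched_at
  request
end

# Example values only; the URL and timestamp are placeholders.
request = build_feed_request("http://example.com/feed.xml",
                             Time.utc(2007, 11, 10, 12, 0, 0))
```

A server that honours If-Modified-Since answers an unchanged feed with 304 Not Modified and no body, which Net::HTTP reports as a Net::HTTPNotModified response.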

However, there's a small catch when asking servers to send compressed content using the Accept-Encoding header - Net::HTTP can easily add the header to the request, but it currently provides no mechanism to actually handle the compressed data that comes back, which the server marks with the Content-Encoding header. (Although it looks like some other folks are looking into it. There's an update at the bottom of this post.) It's fairly simple to do, however.

If you're reading this, now it's even simpler, as you can get my 21-line library (excluding comments) that will do it for you - http_encoding_helper.

All that the library does is add a single method, plain_body, which works out whether the content is compressed and, if so, decompresses it (using gzip or deflate, as the case may be). All that you have to do is request compressed content using the Accept-Encoding header.
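To give a feel for what such a method has to do, here is a minimal sketch of the decoding step using Ruby's Zlib. It is not the source of http_encoding_helper; the function name decode_body is an assumption for this example, and it takes the body and the Content-Encoding value as plain arguments so it can be tried without a network connection.

```ruby
require "zlib"
require "stringio"

# Hypothetical stand-in for the decoding step, not the library's own code.
# Returns the body unchanged when no known encoding is given.
def decode_body(body, content_encoding)
  case content_encoding.to_s.strip.downcase
  when "gzip", "x-gzip"
    Zlib::GzipReader.new(StringIO.new(body)).read
  when "deflate"
    begin
      # "deflate" is supposed to mean zlib-wrapped data...
      Zlib::Inflate.inflate(body)
    rescue Zlib::DataError
      # ...but some servers send a raw deflate stream without the wrapper.
      Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(body)
    end
  else
    body
  end
end
```

With a Net::HTTP response in hand, this would be called along the lines of decode_body(response.body, response["Content-Encoding"]).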

Check out the project page for a code sample, and more information.

Updated: 10 Nov 2007. I've been swapping a few emails with Hugh Sasse, who is responsible for the effort to get the Net::HTTP library to use compression by default (it's his email I linked to above). Since that post, he's also made a revision.

One comment that Hugh made to me was that his patch and mine are fundamentally different in philosophy - his patch seeks to make compression the default, whereas mine is more of an "as you need it" kind of patch.

The reason that my patch is a separate library, rather than a patch to the Net::HTTP library itself, is purely simplicity and ease of management. As I deploy code to a variety of servers, mangling the net/http code itself is not convenient. Similarly, my patch doesn't enable compression by default, purely to keep it simple without having to mangle too much of the Net::HTTP code.

I'm eager to see Hugh's code make it into the Ruby 1.9 Net::HTTP library so that everyone's code can begin to use compression by default - being worked into the library itself makes it much more powerful, and much easier to keep working between changes in core.

Until that happens, my code will be available for use.

On a side note, it's interesting to see that lighttpd supports bzip2 compression - I don't know which user agents can handle that, but it's interesting nonetheless.