Building a new, super cheap transcoder using AWS Lambda and Layers

So remember that post I wrote a bit back about the Transcoder?  Yeah,  that’s old hat now.  I wrote a new one.  In an afternoon.  On a Sunday.

No joke, and it was super, super easy.

I’ve been experimenting with Golang a lot lately (and it is absolutely beautiful, I gotta point that out).  I started off yesterday reading about the newest option in AWS Lambda – layers.

For those who aren’t familiar, layers are a way to include necessary files in Lambda functions.  At their core, layers are uploaded as zip files, with the zip’s contents essentially being mounted at /opt on the instance.  By including extra binaries in a layer, you get a clean, simple way to put statically-linked (this part is important!) binaries onto your Lambda instances.

In this case, I made a zip file containing a directory, “ffmpeg”, which holds a statically-linked build of ffmpeg.  Since the zip file gets mounted at /opt, I can call ffmpeg at /opt/ffmpeg/ffmpeg and pass in its options like I normally would.
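To give a sense of how little glue this needs, here’s a minimal sketch of a Go handler calling the layer-provided binary.  The S3 trigger, the /tmp paths, and the ffmpeg flags are placeholder assumptions for illustration – the real pipeline also has to pull the source object down and push the result back up.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler is a hypothetical S3-triggered entry point. Downloading the source
// object into /tmp and uploading the result back to S3 are omitted to keep
// the sketch focused on calling the layer-provided binary.
func handler(ctx context.Context, event events.S3Event) error {
	for _, record := range event.Records {
		in := "/tmp/input"    // assume the object has already been fetched here
		out := "/tmp/output.mp4"

		// The layer's zip contained an "ffmpeg" directory, so the static
		// binary shows up at /opt/ffmpeg/ffmpeg inside the Lambda sandbox.
		cmd := exec.CommandContext(ctx, "/opt/ffmpeg/ffmpeg",
			"-i", in, "-c:v", "libx264", out)
		if output, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("ffmpeg failed for %s: %v\n%s",
				record.S3.Object.Key, err, output)
		}
	}
	return nil
}

func main() {
	lambda.Start(handler)
}
```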

From there, it’s just a matter of building the glue to do all of this.  What started out as an experiment in Go is now chugging away in production.  Who knows what’s next for a cruddy afternoon?

Writing a library in Go

At work, we do a lot with HAProxy.  I mean, a lot – our HAProxy configuration is over 6000 lines.  We’ve been looking into ways to pare this down, but that still leaves us with one issue: how can we even start to track what’s going on in this system at any given moment, when there are so many moving parts?

We’ve started using the ELK stack (though it’s now called the Elastic Stack by Elastic, the company that makes most of it), but so far only for logging API calls.

HAProxy allows you to create a “stats socket”, a unix domain socket that you can connect to and send commands to – either to control parts of HAProxy itself or, more usefully in this case, to get a list of statistics for each server, listener, backend, and frontend.  The problem is that in some setups, where HAProxy runs multiple threads to handle high load, we can end up with multiple sockets, and these have to be aggregated.
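Talking to the socket is straightforward.  Here’s a bare-bones sketch of what pulling the raw stats looks like in Go – the socket path is just an example, and this isn’t HAProxyGoStat’s code, only the general idea:

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// showStat connects to an HAProxy stats socket and returns the raw CSV
// produced by the "show stat" command. The socket path is whatever your
// haproxy.cfg binds with "stats socket ..."; /var/run/haproxy.sock is
// just an example.
func showStat(socketPath string) (string, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	// By default HAProxy reads one command, streams the response, and then
	// closes the connection, so a plain ReadAll is enough here.
	if _, err := fmt.Fprintln(conn, "show stat"); err != nil {
		return "", err
	}
	out, err := io.ReadAll(conn)
	return string(out), err
}

func main() {
	csvData, err := showStat("/var/run/haproxy.sock")
	if err != nil {
		panic(err)
	}
	fmt.Print(csvData)
}
```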

I’ve been experimenting with Golang lately, and so I started writing a tool to handle this data.  I have written a library, HAProxyGoStat, that makes the data easier to work with.  It handles parsing the CSV format that HAProxy outputs by default (other formats, such as the JSON output and the split-out format it can produce, are both more verbose and generally harder to parse, in addition to being unsupported on older versions of HAProxy).  You can find the library here: https://github.com/hmschreck/HAProxyGoStat.
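The CSV itself is simple: a header line prefixed with “# ”, then one line per frontend, backend, listener, or server.  The sketch below shows the general shape of the parsing problem using encoding/csv – the types and function names are made up for illustration and aren’t the library’s actual API:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Stat is one row of "show stat" output, keyed by column name
// (pxname, svname, scur, and so on). Illustrative type only.
type Stat map[string]string

// parseStats turns the raw CSV from the stats socket into a slice of Stats.
func parseStats(raw string) ([]Stat, error) {
	// The header line is prefixed with "# ", which encoding/csv would
	// otherwise treat as part of the first column name.
	raw = strings.TrimPrefix(strings.TrimSpace(raw), "# ")

	r := csv.NewReader(strings.NewReader(raw))
	r.FieldsPerRecord = -1 // HAProxy lines end with a trailing comma
	rows, err := r.ReadAll()
	if err != nil || len(rows) == 0 {
		return nil, err
	}

	header := rows[0]
	stats := make([]Stat, 0, len(rows)-1)
	for _, row := range rows[1:] {
		s := Stat{}
		for i, v := range row {
			if i < len(header) {
				s[header[i]] = v
			}
		}
		stats = append(stats, s)
	}
	return stats, nil
}

func main() {
	sample := "# pxname,svname,scur,smax,\nweb,FRONTEND,3,10,\nweb,srv1,1,4,\n"
	stats, _ := parseStats(sample)
	fmt.Println(stats[1]["svname"], stats[1]["scur"]) // srv1 1
}
```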

As a demonstration of just how fast this library is: in our current test environment, HAProxy has 4 stats sockets, and each one reports about 1,550 stats (each server, listener, backend, and frontend presents a stat).  Each stat is composed of 82 attributes, which means a single set of ‘snapshots’ across the sockets contains over half a million individual values (4 × 1,550 × 82 ≈ 508,000).

The aggregation combines the per-socket snapshots – either passing values through or taking the average, max, or sum of each field – and returns a single snapshot at the end of the process.  A rough sketch of that per-field folding is below.
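Again, these aren’t HAProxyGoStat’s real types or API – just an illustration of what combining the same field from several sockets into one value looks like:

```go
package main

import "fmt"

// Snapshot maps a proxy/server name ("backend/server") to one numeric
// field's value, e.g. scur. Illustrative type only.
type Snapshot map[string]float64

// aggregate folds the same field from several per-socket snapshots into
// one, using the requested mode: "sum", "max", or "avg".
func aggregate(snaps []Snapshot, mode string) Snapshot {
	out := Snapshot{}
	counts := map[string]int{}
	for _, snap := range snaps {
		for name, v := range snap {
			switch mode {
			case "max":
				if v > out[name] {
					out[name] = v
				}
			default: // "sum" and "avg" both accumulate first
				out[name] += v
				counts[name]++
			}
		}
	}
	if mode == "avg" {
		for name := range out {
			out[name] /= float64(counts[name])
		}
	}
	return out
}

func main() {
	// Two stats sockets reporting current sessions for the same servers.
	s1 := Snapshot{"web/srv1": 3, "web/srv2": 5}
	s2 := Snapshot{"web/srv1": 2, "web/srv2": 7}
	fmt.Println(aggregate([]Snapshot{s1, s2}, "sum")) // map[web/srv1:5 web/srv2:12]
	fmt.Println(aggregate([]Snapshot{s1, s2}, "max")) // map[web/srv1:3 web/srv2:7]
}
```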

In a test that creates a parser, reaches out to each of the four sockets simultaneously to build a set of snapshots, and then filters it, the entire program runs in 0.22–0.27 seconds, hovering around 0.25 seconds – and keep in mind, this includes several initialization steps, such as creating the parser, that a properly daemonized version wouldn’t need to repeat.

That’s half a million values processed down to a single snapshot in a quarter of a second.  I can literally run this every second and it will not back up.  That’s *fast*.