Saturday, December 20, 2014

How I built my own fountain code

What problem are we trying to solve?

Our phones and computers can only send data in very small chunks at a time. The internet deals with this by breaking up files (webpages, images, videos etc) into chunks and hoping that they can all be reassembled at the other end. This works fairly well, most of the time, and this is the reason that most of the internet runs on Transmission Control Protocol (TCP).

Isn't this good enough?

TCP has been "good enough" for a few decades now, but it would be nice to have some other options. Here are a couple of scenarios where it would be nice to have something different

Live content

When you're streaming a live event, you can't pause and wait for the download to catch up, or else it's no longer live anymore. TCP isn't suitable for this, as its way of dealing with packet loss is slowing down and resending the lost packets. There are codecs that can handle small amounts of packet loss, but it would be nice if we could trade off more bandwidth to hide any packet loss that might happen

Content Distribution Networks (CDNs)

Chances are that you downloaded this webpage from a different server to the one that I uploaded it to. The global nature of the internet means that requests for content are often directed to a server that's close to you. The way the internet currently works means you can only really download a file from a single server at a time - it's difficult to request it from multiple sources, and there's a chance you'll get multiple copies of the same chunks if you do.

How does a fountain code help?

We can describe TCP as being like a jigsaw puzzle, or a stamp collection. If you have a 1000 piece jigsaw puzzle, and lose a few of the pieces, you can't just get any other random pieces - they have to be the right pieces, or else the puzzle isn't complete. Similarly, the value of a stamp collection is generally in its completeness, rather than the sheer number of stamps there are. 1000 different stamps from a particular collection are valuable, whereas 1000 of the same stamp is rather boring.

So what is a fountain code?

Fountain codes are a type of forward error correction (FEC); specifically, a type of erasure code. An ideal fountain code is like filling up a glass of water at a fountain. You don't care which droplets land in the glass, but you do care about whether the glass is full. The current state-of-the-art fountain codes are based on Luby Transform Codes, which are based on a thing called the XOR operator.

XOR operators

If I have two numbers, 1010, and 1011, I can combine them using an XOR. 1010 xor 1011 = 0001. You line up the two numbers, and if the digits are the same, you write down a 0 - if the digits are different, then you write down a 1. XOR operators are cool because they work just as well in reverse - 1010 xor 0001 = 1011, and 1011 xor 0001 = 1010. We can use this when we break up a file to create extra pieces that can help us out if we get stuck (as a side note, this is how Stereo sound works on FM radio).

LT codes

For example, let's say we have a file that we break up into 5 pieces: a b c d e. We can also send other pieces, such as "a xor c", or maybe "b xor d xor e". If we send all 7, but the other person only receives 5, then they might be able to reconstruct the original message. For example, if you have pieces a, c, d, e, and "b xor d xor e", then you can xor d and e against "b xor d xor e" to extract b - this is the missing piece!

These sound cool - aren't they good enough?

Short answer: you need to pay to use LT codes or Raptor codes (except in New Zealand where software patents are illegal!) (I am not a lawyer btw). They're also the first practical implementation of fountain codes, and it stands to reason that there might be another way to build them that relies on different math.

They can also be a bit picky sometimes about *which* pieces you get. An ideal fountain code would work with any combination of pieces, but may have other issues that make it less practical. There's also a non-zero chance that there's a way cooler code out there, just waiting to be found!

... so I built my own

To quote my old math lecturer Geoff Whittle (congratulations by the way), "last night while I was lying awake thinking about math, I had an idea", so I got up and started writing down notes, and have spent the day re-learning field theory and cutting code. I wanted a system that didn't care at all which pieces you got, and managed to cobble something together in the end using linear equations.

Simultaneous equations

Remember doing these in high school? They usually looked something like this:

x + 2y = 25
3x + 4y = 9

The idea was that as long as you had as many equations as unknowns, you could usually solve the equations. The way you'd solve this was something along the lines of

(1) x + 2y = 25
(2) 3x + 4y = 9
(subtract (1) from (2) three times)
(3a) 0x - 2y = -66
(3b) -2y = -66
(divide by -2)
(4) y = 33
(substitute y for 33 in eqn 1)
(5a) x + 2*33 = 25
(5b) x + 66 = 25
(subtract 66)
(6a) x + 66 - 66 = 25 - 66
(6b) x = -41
x = -41
y = 33


There's no reason why x and y can't stand for multiple things though. Let's try this again:

x + 2y = 25, 49, 10, 58, 23, 95, 23, 47
3x + 4y = 9, 12, 67, 34, 23, 67, 98, 23

This looks kinda odd, but it still works. In addition, we've already done the work solving the first pair (25 and 9), so instead of solving again, we can just run the 6-steps for each of the other pairs and they'll solve too.

We can make this bigger if you like:

a + 31b + 4c + 2d + 45e + 5f = 12, 45, 65, 29 ..
3a + 24b + 2c + 23d + 23e + 11f = 56, 234, 92, 30 ..
6a + 45b + 4c + 65d + 56e + 89f = 39, 12, 6, 3 ..
5a + 6b + 34c + 4d + 78e + 56f = 34, 12, 67, 45 ..
23a + 23b + 12c + 43d + 12e + 23f = 4, 79, 20, 3 ..

It's worth pointing out that these equations are arbitrarily chosen - the receiver needs to get 5 of them to reconstruct the message, but if they miss a piece then we just choose a random new set of coefficients, calculate the answers, and send the packet down the line.

Results

It works! If you want 10 pieces, you can rebuild the message from *any* 10 pieces - I've run this about 50 times today and have yet to see decoding fail.

Improvements

You'll notice I haven't used any fractions or decimal points. There's not enough room here to get into this today, but I've used Galois Fields of order 2^8 with generator polynomial x^8 + x^4 + x^3 + x + 1. In short, if you redefine what addition and multiplication mean, and keep the numbers in the range of 0-255 (one byte, 8 bits) then you can avoid fractions while speeding up the calculations.

Problems

There's quite a bit of overhead involved. Each packet has to have the coefficients on the front, and needs one coefficient for each piece you've split the message into. If you have 100 pieces, you need 100 bytes of overhead on the front. If you have 1000 pieces, you have 1000 bytes of overhead on the front. If we keep in mind that an internet packet only holds about 1400 bytes, it means we have some serious limits here. If we want to send a 100KB message, we need to split it into about 100 pieces, and each piece has about 10% coding overhead. A 1MB message is out of the question. There are ways around this, where a 1GB file gets split up into 10,000 x 100KB messages, and you dump pieces onto the wire and the receiver lets you know once they've reconstructed parts, but this wasn't quite the ideal fountain code that I was hoping for.

In addition to this, the reference implementation really struggles to do the calculations once you get past 20-50 pieces. I suspect that most of this is due to the code being in python/numpy, and there are plenty of optimisations that could be made.

update 21/12/2014

The coefficients are random anyway, so I've changed things to make it just transmit a 4-byte seed for python's random.randint(). Now it'll handle any number of pieces with constant byte overhead, but 130 (for a 175KB file) pieces still takes a bit of effort to process (read: 2 minutes at each end to encode/decode). Given the encode time is similar to the decode time, I suspect the slowness is caused by the matrix multiplication code, or some double-copying of the GF(2^8) multiplication table, rather than my Gauss-Jordan implementation being dodgy. This gives me hope that I can speed things up by a few orders of magnitude and start testing with multiple MB files soon!

update 22/12/2014

It's been brought to my attention that this work has previously been covered by Muriel M├ędard at MIT. Looks like I have a bit more reading to do :)

Future work

There's still an ideal fountain code out there, waiting to be found. I sincerely hope somebody finds my work useful, but in any case, it's been a good day's work.

Code pls