What I hate about the Theora codec.


For the past three weeks or so, I’ve been writing C code again. It’s been both fun and frustrating, because it has probably been twenty years since I did any serious coding in that language. The application is for a startup, and I can’t reveal the specific task I’m doing, but it involves reading data from a video stream.

I could have used any video technology I wanted, but to avoid patent encumbrances I went with the Ogg container format and the Theora video codec. Luckily, I don’t need to actually decompress the video stream; I just need to be able to manipulate it at the frame/page level. Shouldn’t be too hard, and for the most part it wasn’t. I started by looking at the reference implementations for Ogg and Theora (libogg and libtheora), but they were a little too full-featured for my application, so I started to roll my own.

Things were going pretty well until I got to the Theora identification header. I’d read the relevant parts of the spec and done some poking around with a hex viewer on my sample Ogg files. Everything looked good, so I created a C structure definition that I thought matched the header:


typedef struct __attribute__ ((__packed__)) _THEORA_ID_HEADER {
  THEORA_HEADER HDR;  // Common header.
  byte VMAJ;          // The major version number.
  byte VMIN;          // The minor version number.
  byte VREV;          // The version revision number.
  uint16 FMBW;        // The width of the frame in macro blocks.
  uint16 FMBH;        // The height of the frame in macro blocks.
  uint PICW:24;       // The width of the picture region in pixels.
  uint PICH:24;       // The height of the picture region in pixels.
  byte PICX;          // The X offset of the picture region in pixels.
  byte PICY;          // The Y offset of the picture region in pixels.
  uint32 FRN;         // The frame-rate numerator.
  uint32 FRD;         // The frame-rate denominator.
  uint PARN:24;       // The pixel aspect-ratio numerator.
  uint PARD:24;       // The pixel aspect-ratio denominator.
  byte CS;            // The color space.
  uint NOMBR:24;      // The nominal bitrate of the stream, in bits per second.
  ushort QUAL:6;      // The quality hint.
  ushort KFGSHIFT:5;  // The amount to shift the key frame number by in the granule position.
  ushort PF:2;        // Pixel format.
  ushort RES:3;       // Reserved.
} THEORA_ID_HEADER;

The only problem: the values I read from an actual Theora stream didn’t match the header. They almost matched. Except for the one field I actually cared about (of course). I need the KFGSHIFT field to interpret the granule positions of the Ogg stream, and even though it’s apparently hardcoded to 6 in every library out there, my OCD wouldn’t let me get by with that. I wanted the real value from the stream. So, I fired up a hex editor in Atom and started digging.


After counting bytes, and recounting bytes, I found the offending 16-bit value that contains the bitfields in the header: 00 c0. Or, bitwise, once those two bytes are loaded as a little-endian 16-bit integer (0xC000):

msb                         lsb
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Which didn’t match the diagram in the Theora spec. Looking at that diagram, I was expecting the bytes 80 01 (0x0180 as a little-endian 16-bit integer):

msb                         lsb
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0

WTF?! I expected QUAL, PF, and RES to all be 0. I expected KFGSHIFT to be 6. Why were the two most-significant bits set? What was going on? So I went back to the Theora spec. I read the code for the Theora reference implementation (libtheora). Finally, buried in section 5.2 of the spec I found the explanation of this craziness.

The decoder logically unpacks integers by first reading the MSb of a binary integer from the logical bitstream, followed by the next most significant bit, etc., until the required number of bits have been read. When unpacking the bytes into bits, the decoder begins by reading the MSb of the integer to be read from the most significant unread bit position of the source byte, followed by the next-most significant bit position of the destination integer, and so on up to the requested number of bits. Note that this differs from the Vorbis I codec, which begins decoding with the LSb of the source integer, reading it from the LSb of the source byte. When all the bits of the current source byte are read, decoding continues with the MSb of the next byte. Any unfilled bits in the last byte of the packet MUST be cleared to zero by the encoder.

So, logically, bits are written to the Theora stream most-significant bit first. Okaaaay. A lot of people would look at this and say “well, they must have had a good reason for this”. I’ve been bit-twiddling since the early 80s, and I can unequivocally tell you: they didn’t. This scheme is just confusing and crazy.

Sure, if you just write a bit-streaming library and call it to read and write all of your values, you’ll never see this aspect. But just look at the structure in figure 6.2 above: only four values cannot be expressed as whole bytes. A structure with bitfields is a fine way to deal with this problem, but instead of the bit fields in the header being arranged like this:

msb                         lsb
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
R R R P P K K K K K Q Q Q Q Q Q
(R = RES, P = PF, K = KFGSHIFT, Q = QUAL)

They’re actually arranged like this:

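msb                         lsb
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
K K K P P R R R Q Q Q Q Q Q K K
(K = KFGSHIFT, with its low three bits on the left and its high two bits on the right; P = PF, R = RES, Q = QUAL)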

So, yes, that five-bit KFGSHIFT field is arranged so that it is virtually impossible to reference it as a single quantity, bit-wise. You can’t shift, you can’t mask. You have to access each piece separately and combine them. They also bear no resemblance to the diagram in the spec. So, my new, not-so-attractive structure looks like this (most fields omitted for brevity):

typedef struct __attribute__ ((__packed__)) _THEORA_ID_HEADER {
  . . .
  // !!!HACK!!! ok, per the Theora spec, bit fields are
  // "read" and "written" to bytes most-significant bit first. This means
  // that the KFGSHIFT field (which is, of course, the only field I'm interested
  // in here) isn't stored contiguously in the file. The MSBs are in the first
  // byte, the LSBs in the second. Crazy. http://www.theora.org/doc/Theora.pdf
  ushort KFGSHIFT_MSB:2;  // High two bits of the key frame granule shift.
  ushort QUAL:6;          // The quality hint.

  ushort RES:3;           // Reserved.
  ushort PF:2;            // Pixel format.
  ushort KFGSHIFT_LSB:3;  // Low three bits of the key frame granule shift.
} THEORA_ID_HEADER;


So, that structure, along with a helper macro that recombines the two pieces, gives me the KFGSHIFT value I wanted, aka 6. Hopefully the next person implementing Theora will find this post and save themselves a head-desk moment.

One comment

  1. Tim Jowers says:

    Totally matches what I’ve found in using libraries – even Java: the edge cases are often not programmed for and end up taking me a few days to create work-arounds. I notice you often roll-your-own when doing UI layer work and have come around to that approach myself. The libraries are just too brittle both in js and in Android.
