Title: Programming With Theora
Author: David Barrett (dbarrett@quinthar.com)
Last Updated: 2004/8/13

If you're reading this, you probably already know that Theora is an open-source, patent-free video compression library. It uses a simple, frame-at-a-time C API and operates on streams of planar YUV4:2:0 images (also known as IYUV). This document gives a very cursory overview of how to get up and running with Theora.

Getting Theora
The official Subversion repository for Theora is hosted by Xiph.org. You can get just the source by using the following Subversion command:

 

However, "libtheora" depends on "libogg", the code for which can be retrieved using:

 

Building Theora
Going from having the Theora source code to having "libtheora" linked into your application is non-trivial, and out of the scope of this document. One tip I will offer for Win32 programmers is to use the Ogg DirectShow filter project to build "libtheora". Even if you're not using DirectShow, it has all sorts of Visual Studio .NET projects ready to be compiled and linked:

 

Encoding With Theora
Theora is remarkably simple to use, once you know what to do. Here is a quick overview of how to encode a series of images using Theora:

First, you need to initialize Theora. There are lots of options to tweak Theora for your needs, but most aren't necessary to understand when getting started:

theora_info encoderInfo;
theora_info_init( &encoderInfo );
encoderInfo.frame_width        = INPUT_FRAME_WIDTH;  // Visible picture width
encoderInfo.frame_height       = INPUT_FRAME_HEIGHT; // Visible picture height
encoderInfo.width              = encoderInfo.frame_width;  // Encoded frame width; must be a multiple of 16
encoderInfo.height             = encoderInfo.frame_height; // Encoded frame height; must be a multiple of 16
encoderInfo.offset_x           = 0;
encoderInfo.offset_y           = 0;
encoderInfo.fps_numerator      = INPUT_FRAMES_PER_SECOND;
encoderInfo.fps_denominator    = 1;
encoderInfo.aspect_numerator   = 1; // Pixel aspect ratio (1:1 = square pixels),
encoderInfo.aspect_denominator = 1; // not the picture's width/height ratio
encoderInfo.colorspace         = OC_CS_UNSPECIFIED;
encoderInfo.target_bitrate     = 45000; // In bits per second; anywhere between 45kbps and 2000kbps
encoderInfo.quality            = 16;    // 0 (lowest) to 63 (highest)
encoderInfo.dropframes_p                 = 0;
encoderInfo.quick_p                      = 1;
encoderInfo.keyframe_auto_p              = 1;
encoderInfo.keyframe_frequency           = 64;
encoderInfo.keyframe_frequency_force     = 64;
encoderInfo.keyframe_data_target_bitrate = encoderInfo.target_bitrate * 3 / 2;
encoderInfo.keyframe_auto_threshold      = 80;
encoderInfo.keyframe_mindistance         = 8;
encoderInfo.noise_sensitivity            = 1;
theora_state theoraState;
theora_encode_init( &theoraState, &encoderInfo );
theora_info_clear( &encoderInfo );
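The code above assumes the input is already a multiple of 16 in each dimension. If it isn't, the encoded frame size has to be rounded up. The sketch below assumes, as libtheora's theora.h documents the fields, that `width`/`height` describe the encoded frame and `frame_width`/`frame_height` the visible picture inside it; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: round a picture dimension up to the next multiple
 * of 16 (the macroblock size), giving a legal encoded frame dimension. */
static uint32_t round_up_to_16(uint32_t n)
{
    return (n + 15u) & ~(uint32_t)15u;
}

/* Hypothetical helper: how many padding rows or columns that adds. */
static uint32_t padding_to_16(uint32_t n)
{
    return round_up_to_16(n) - n;
}
```

For a 1920x1080 input, for example, this would give an encoded frame of 1920x1088, with `frame_width`/`frame_height` left at 1920x1080 and the extra 8 rows filled with padding.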
Now that Theora is initialized, you can create the first packets you will send to your decoder. There are three Theora header packets: the "header" (identification) packet, the "comment" packet, and the "table" (setup) packet. These are created as follows:

ogg_packet headerPacket, commentPacket, tablePacket;
theora_encode_header( &theoraState, &headerPacket );
COPY_WRITE_OR_TRANSMIT_PACKET( headerPacket ); // Placeholder for your own code
theora_comment encoderComment;
theora_comment_init( &encoderComment );
theora_encode_comment( &encoderComment, &commentPacket );
COPY_WRITE_OR_TRANSMIT_PACKET( commentPacket );
theora_comment_clear( &encoderComment );
theora_encode_tables( &theoraState, &tablePacket );
COPY_WRITE_OR_TRANSMIT_PACKET( tablePacket );
Each of these packets is stored in an "ogg_packet" structure. Ogg is a multimedia container format also maintained by Xiph.org. The Theora encoder uses "libogg" for convenience, though you can encode and decode Theora video without any knowledge of Ogg. Theora initializes these packets with pointers to some of its internal buffers, so be sure to copy the packet data, write it to disk, or transmit it to the receiver before going on.
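If you choose to copy the data, the essential parts of an "ogg_packet" are its `packet` pointer and its `bytes` length. The sketch below duplicates that payload into memory the application owns; the `OwnedPacket` type and both helpers are hypothetical. The remaining "ogg_packet" fields (`b_o_s`, `e_o_s`, `granulepos`, `packetno`) are plain integers and can simply be copied by value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a packet payload the application owns. */
typedef struct {
    unsigned char *data;
    long           bytes;
} OwnedPacket;

/* Copy 'bytes' bytes from 'src' (in real code: op.packet and op.bytes of
 * the ogg_packet Theora just filled in). Returns 0 on success, -1 on a
 * negative length or allocation failure. */
static int owned_packet_copy(OwnedPacket *dst, const unsigned char *src, long bytes)
{
    dst->data  = NULL;
    dst->bytes = 0;
    if (bytes < 0)
        return -1;
    dst->data = malloc(bytes > 0 ? (size_t)bytes : 1);
    if (dst->data == NULL)
        return -1;
    if (bytes > 0)
        memcpy(dst->data, src, (size_t)bytes);
    dst->bytes = bytes;
    return 0;
}

/* Release the copy once it has been written or transmitted. */
static void owned_packet_free(OwnedPacket *p)
{
    free(p->data);
    p->data  = NULL;
    p->bytes = 0;
}
```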

Next, the input image must be in a planar YUV4:2:0 format, sometimes called IYUV. This image differs from a typical RGB image in three ways. First, it uses the YUV colorspace, where Y is the luminance (grayscale) channel and U and V represent color. Second, it's planar: the Y, U, and V channels are stored in three separate contiguous arrays. Third, the U and V channels have half the width and half the height of the Y channel, so each is a quarter of its size. Once you have your data in this format, you're ready to encode with Theora.

unsigned char yBuffer[ INPUT_FRAME_WIDTH * INPUT_FRAME_HEIGHT ];
unsigned char uBuffer[ INPUT_FRAME_WIDTH * INPUT_FRAME_HEIGHT / 4 ];
unsigned char vBuffer[ INPUT_FRAME_WIDTH * INPUT_FRAME_HEIGHT / 4 ];
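Arrays like these live on the stack, which can be a problem for larger frames. As an alternative, here is a sketch that allocates one heap block and points the three planes into it, using the 4:2:0 sizes: width * height bytes of Y, then (width/2) * (height/2) bytes each of U and V. The type and function names are hypothetical, and the dimensions are assumed to be even.

```c
#include <stdlib.h>

/* Hypothetical planar 4:2:0 frame held in a single allocation. */
typedef struct {
    int width, height;          /* luma size; assumed even */
    unsigned char *y, *u, *v;   /* plane pointers into 'storage' */
    unsigned char *storage;
} PlanarFrame420;

/* Total bytes for one 4:2:0 frame: full Y plus two quarter-size planes. */
static size_t i420_frame_size(int width, int height)
{
    size_t luma   = (size_t)width * (size_t)height;
    size_t chroma = (size_t)(width / 2) * (size_t)(height / 2);
    return luma + 2 * chroma;
}

/* Allocate the block and lay out Y, then U, then V. 0 on success. */
static int planar_frame_alloc(PlanarFrame420 *f, int width, int height)
{
    size_t luma   = (size_t)width * (size_t)height;
    size_t chroma = (size_t)(width / 2) * (size_t)(height / 2);
    f->storage = malloc(i420_frame_size(width, height));
    if (f->storage == NULL)
        return -1;
    f->width  = width;
    f->height = height;
    f->y = f->storage;
    f->u = f->y + luma;
    f->v = f->u + chroma;
    return 0;
}

static void planar_frame_free(PlanarFrame420 *f)
{
    free(f->storage);
    f->storage = f->y = f->u = f->v = NULL;
}
```

The resulting pointers could then be assigned to yuv.y, yuv.u and yuv.v, with y_stride set to the width and uv_stride to half the width.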
The Theora encoder uses the "yuv_buffer" structure, which just encapsulates the planar YUV data:

yuv_buffer yuv;
yuv.y_width   = INPUT_FRAME_WIDTH;  // encoderInfo was already cleared above,
yuv.y_height  = INPUT_FRAME_HEIGHT; // so don't read its fields here
yuv.y_stride  = INPUT_FRAME_WIDTH;
yuv.y         = yBuffer;
yuv.uv_width  = INPUT_FRAME_WIDTH / 2;
yuv.uv_height = INPUT_FRAME_HEIGHT / 2;
yuv.uv_stride = yuv.uv_width;
yuv.u         = uBuffer; 
yuv.v         = vBuffer; 

Just pass this frame to Theora...

theora_encode_YUVin( &theoraState, &yuv );

... read out the packet ...

ogg_packet framePacket;
theora_encode_packetout( &theoraState, 0 /* pass 1 for the last frame */, &framePacket );
COPY_WRITE_OR_TRANSMIT_PACKET( framePacket );
... and you've just encoded your first entire Theora frame! The result is stored in the "ogg_packet" structure. The details of that packet don't really matter, just so long as you faithfully record/transmit all its data and recreate it within the decoder. To finish up the encoder just clean up Theora as follows:
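How you record or transmit packets is up to you. One simple, hypothetical scheme is to prefix each packet's bytes with its length, so the receiver can split the byte stream back into packets and rebuild an "ogg_packet" from each (pointing `packet` at the payload and setting `bytes`). This is only a sketch: libtheora 1.x's decoder appears to check the `b_o_s` flag on the first header packet, so a real scheme may need to carry such flags too, or you may prefer libogg's own framing (`ogg_stream_packetin` / `ogg_stream_pageout`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: 4-byte big-endian length, then the packet bytes.
 * Returns the number of bytes written to 'out', or 0 if 'cap' is too small. */
static size_t frame_packet(unsigned char *out, size_t cap,
                           const unsigned char *payload, uint32_t len)
{
    if (cap < 4 || cap - 4 < len)
        return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Parse one framed packet from 'in'. On success, sets *payload and *len
 * (pointing inside 'in') and returns the bytes consumed; returns 0 if 'in'
 * does not yet hold a complete packet. */
static size_t unframe_packet(const unsigned char *in, size_t avail,
                             const unsigned char **payload, uint32_t *len)
{
    uint32_t n;
    if (avail < 4)
        return 0;
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (avail - 4 < n)
        return 0;
    *payload = in + 4;
    *len     = n;
    return 4 + (size_t)n;
}
```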

theora_clear( &theoraState );

And that's all there is to encoding with Theora.

Decoding With Theora
Decoding is even simpler than encoding. To initialize the Theora decoder, get the three header packets we created while encoding ("headerPacket", "commentPacket", and "tablePacket"), and decode them in order:

theora_info theoraInfo;
theora_comment theoraComment;
theora_state theoraState;
theora_info_init( &theoraInfo );
theora_comment_init( &theoraComment );
theora_decode_header( &theoraInfo, &theoraComment, &headerPacket );
theora_decode_header( &theoraInfo, &theoraComment, &commentPacket );
theora_decode_header( &theoraInfo, &theoraComment, &tablePacket );
theora_decode_init( &theoraState, &theoraInfo );
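Because the header packets must reach theora_decode_header in order, it can help to check what kind of packet you have before passing it in. In the Theora format, a header packet starts with a byte whose high bit is set (0x80 for the identification/"header" packet, 0x81 for the comment packet, 0x82 for the setup/"table" packet) followed by the ASCII string "theora", while video data packets have the high bit clear. The helper below is a hypothetical sketch of that check, not part of libtheora; you would call it with an ogg_packet's `packet` and `bytes` fields.

```c
#include <string.h>

/* Packet kinds, distinguished by the first byte of a Theora packet. */
typedef enum {
    PKT_IDENTIFICATION, /* 0x80 "theora": the "header" packet  */
    PKT_COMMENT,        /* 0x81 "theora": the "comment" packet */
    PKT_SETUP,          /* 0x82 "theora": the "table" packet   */
    PKT_FRAME,          /* high bit clear: compressed video    */
    PKT_UNKNOWN         /* empty, corrupt, or not Theora       */
} TheoraPacketKind;

static TheoraPacketKind classify_theora_packet(const unsigned char *data, long bytes)
{
    if (bytes < 1)
        return PKT_UNKNOWN;             /* empty packets are not classified here */
    if (!(data[0] & 0x80))
        return PKT_FRAME;
    if (bytes < 7 || memcmp(data + 1, "theora", 6) != 0)
        return PKT_UNKNOWN;
    switch (data[0]) {
    case 0x80: return PKT_IDENTIFICATION;
    case 0x81: return PKT_COMMENT;
    case 0x82: return PKT_SETUP;
    default:   return PKT_UNKNOWN;
    }
}
```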
Then, for each frame, just pass the packet to Theora and read out the frame data:

yuv_buffer outputYUV;
theora_decode_packetin( &theoraState, &framePacket );
theora_decode_YUVout( &theoraState, &outputYUV );
That's it! Just decode each frame in order, and you're good to go.

One thing to keep in mind is the stride of the output YUV image. Again, Theora uses a planar YUV4:2:0 format, which means each channel is kept in its own array. However, the array storing each channel is not necessarily contiguous: each row of Y, U, or V data may be separated from the next row by some number of non-image bytes. On top of that, the rows are not necessarily stored top to bottom in memory. What does all this mean? It means you need to be very careful to account for the "stride" of each channel when decoding the image.

The stride of a channel is the number of bytes from the start of one row to the start of the next. The stride can be longer than the width of the row, and can be negative. So, for example, to re-contiguify (is that a word?) a noncontiguous channel using its stride, you can do the following:

// Contiguify the Y channel
unsigned char  yBuffer[ OUTPUT_IMAGE_WIDTH * OUTPUT_IMAGE_HEIGHT ];
unsigned char* srcY = outputYUV.y;
unsigned char* dstY = yBuffer;
for( int row=0; row<outputYUV.y_height; ++row )
{
    // Copy into the dst buffer
    memcpy( dstY, srcY, outputYUV.y_width );
    dstY += outputYUV.y_width;
    srcY += outputYUV.y_stride;
}
Do the same for the U and V channels (remember, each is half the width and half the height of the Y channel, so use uv_width, uv_height, and uv_stride!) and you're ready to go.
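Since the same loop is needed for all three channels, it can be factored into one helper. This hypothetical sketch computes each source row's address as `src + row * stride`, which handles both padded rows and negative strides without stepping the pointer past the plane, and assumes (as the loop above does) that the plane pointer addresses the top row of the image.

```c
#include <stddef.h>
#include <string.h>

/* Copy one plane with an arbitrary (possibly negative) stride into a
 * tightly packed buffer of width * height bytes. 'src' points at the
 * top row of the plane. */
static void copy_plane_packed(unsigned char *dst, const unsigned char *src,
                              int width, int height, int stride)
{
    int row;
    for (row = 0; row < height; ++row) {
        const unsigned char *line = src + (ptrdiff_t)row * stride;
        memcpy(dst + (size_t)row * (size_t)width, line, (size_t)width);
    }
}
```

For the U channel, for example, the call would be copy_plane_packed( uBuffer, outputYUV.u, outputYUV.uv_width, outputYUV.uv_height, outputYUV.uv_stride ), where uBuffer is a hypothetical array of uv_width * uv_height bytes.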

Finally, to shut down the Theora decoder:

theora_clear( &theoraState );
theora_info_clear( &theoraInfo );
theora_comment_clear( &theoraComment );
And you're all done.

Concluding Thoughts
This overview is just a minimal introduction to the use of "libtheora". There's a lot more to know, especially when it comes to synchronizing audio and video, tweaking Theora to your needs, and so on. But this should get you started and whet your appetite for more! If you have any questions, please direct them to the Theora Developers mailing list (theora-dev@xiph.org), and we look forward to hearing about your experience!

Frequently Asked Questions
1. What's the relationship between Ogg and Theora, and "libogg" and "libtheora"?
 
Ogg is a standardized data format for multimedia files and streams. Theora is a standardized data format for compressed video data. "libogg" is an open-source implementation of a library of tools to create files and streams conforming to the Ogg data format. "libtheora" is an open-source implementation of an encoder and decoder that convert video to and from the Theora data format. "libtheora" uses "libogg", even though the Theora data format standard does not rely upon the Ogg data format.

2. How do I convert RGB images to/from the IYUV format used by the "libtheora" encoder/decoder?
 
Colorspace conversion is a big topic with lots of resources online. However, I recommend that you investigate the Intel Integrated Performance Primitives (IPP) library, as it offers tons of image conversion and manipulation routines optimized for Intel processors. It's a commercial library, but it has a free trial and will get you up and running while you look for an open-source alternative (or write your own).
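If you do write your own, the sketch below converts packed 24-bit RGB to planar 4:2:0 using the common BT.601 integer approximation (video range: Y in 16..235, U and V in 16..240) and averages each 2x2 block for the chroma planes. The coefficients, the R,G,B byte order, the even-dimension requirement, and all function names are assumptions for illustration; check which range and matrix your source material and players expect.

```c
#include <stddef.h>

/* BT.601 video-range integer approximation. For U and V, 32896 equals
 * 128 + (128 << 8): rounding plus the +128 chroma offset folded in before
 * the shift, so the shifted value is never negative. */
static unsigned char rgb_to_y(int r, int g, int b)
{
    return (unsigned char)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}
static unsigned char rgb_to_u(int r, int g, int b)
{
    return (unsigned char)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
}
static unsigned char rgb_to_v(int r, int g, int b)
{
    return (unsigned char)((112 * r - 94 * g - 18 * b + 32896) >> 8);
}

/* Convert tightly packed RGB24 (rows top to bottom) to planar 4:2:0.
 * width and height are assumed even; y holds width*height bytes, u and v
 * each (width/2)*(height/2) bytes. */
static void rgb24_to_yuv420(const unsigned char *rgb, int width, int height,
                            unsigned char *y, unsigned char *u, unsigned char *v)
{
    int row, col, dy, dx;
    for (row = 0; row < height; ++row)
        for (col = 0; col < width; ++col) {
            const unsigned char *p = rgb + 3 * ((size_t)row * width + col);
            y[(size_t)row * width + col] = rgb_to_y(p[0], p[1], p[2]);
        }
    for (row = 0; row < height; row += 2)
        for (col = 0; col < width; col += 2) {
            int su = 0, sv = 0;
            size_t ci = (size_t)(row / 2) * (width / 2) + col / 2;
            for (dy = 0; dy < 2; ++dy)
                for (dx = 0; dx < 2; ++dx) {
                    const unsigned char *p =
                        rgb + 3 * ((size_t)(row + dy) * width + (col + dx));
                    su += rgb_to_u(p[0], p[1], p[2]);
                    sv += rgb_to_v(p[0], p[1], p[2]);
                }
            u[ci] = (unsigned char)((su + 2) / 4); /* rounded 2x2 average */
            v[ci] = (unsigned char)((sv + 2) / 4);
        }
}
```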

