Colorspace

"The real trick to optimizing color space conversions is of course to not do them." --trbarry, April 2002

When storing video digitally there are two basic approaches: RGB and YUV. Each has a few variations that change how accurate the storage is, but that's it (RGB16, RGB24 and RGB32 on one side; YUY2, YV12, I420 and so on on the other).

RGB stores video rather intuitively. It stores a value for each of the 3 color levels, Red, Green and Blue, on a per-pixel basis. The most common RGB on computers these days is RGB24, which gives 8 bits to each color level (that's what gives us the 0-255 range, as 2 to the 8th power is 256), thus white is 255,255,255 and black is 0,0,0.
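
To make the storage concrete, here is a tiny Python sketch (the variable names are just illustrative):

    # RGB24: one byte (0-255) per channel, three bytes per pixel.
    # RGB32 adds a fourth byte, normally used as an alpha channel.
    white_rgb24 = bytes([255, 255, 255])
    black_rgb24 = bytes([0, 0, 0])
    white_rgb32 = bytes([255, 255, 255, 255])   # R, G, B + alpha
    print(2 ** 8)                               # 256 levels per channel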

RGB uses the fact that three color components can be added together to create any color. In contrast, YUV stores color the same way the human brain processes it.

Now comes an explanation: the primary thing the human brain registers is brightness, aka "luma".

Luma can be calculated relatively easily from the RGB channels by taking a weighted average of the color values. Scientists came up with weights that match human perception: green contributes the most, red about half as much as green, and blue about a third as much as red (the classic formula is Y = 0.299 R + 0.587 G + 0.114 B). Why this is the case is simply a matter of the way the brain works, and this perceptual model is central to how YUV was developed.

Luma is a simple positive value where zero means black and high values mean white.

As for the color information, things are not so easy. The two color channels are called U and V (or sometimes Cb and Cr respectively). They can have both positive and negative values, which matches the way color is processed in our brain.

Cr, when positive, means that the object is red. Cr negative means that the object is green.

Our brain understands these two colors as opposites - If you think about it, no object can ever be red-greenish.

Cb, when positive, indicates a blue object. Cb negative means yellow. Again, these are opposites to our brain, and so we have the reasoning behind YUV as a color model.
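
To see those signs in action, here is a minimal Python sketch using the classic ITU-R BT.601 weights (real conversions add offsets, scaling and clamping on top of this):

    # Rough RGB -> YUV conversion with the BT.601 weights.
    def rgb_to_yuv(r, g, b):
        y  = 0.299 * r + 0.587 * g + 0.114 * b   # luma: green counts most
        cb = 0.564 * (b - y)                     # blue vs. yellow, signed
        cr = 0.713 * (r - y)                     # red vs. green, signed
        return y, cb, cr

    print(rgb_to_yuv(255, 0, 0))    # pure red    -> Cr strongly positive
    print(rgb_to_yuv(0, 255, 0))    # pure green  -> Cr negative
    print(rgb_to_yuv(0, 0, 255))    # pure blue   -> Cb strongly positive
    print(rgb_to_yuv(255, 255, 0))  # yellow      -> Cb negative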

So, why is it useful to store pictures in YUV?

There are several reasons for storing video in YUV:

- a historical reason: when color TV was invented, it needed to be both backwards and forwards compatible with black-and-white TV. The old B&W signal became the luminance channel, while two color channels were added on top of that. Old TVs simply ignore the two extra channels, while color TVs understand that the chroma is zero in a B&W signal.

- you get channels that are largely uncorrelated - if shadows move, only one channel is affected. This helps video compression quite a lot, because the same information is not repeated across different channels.

- you get one channel that is much more important, and two channels that are less important (but necessary). You can do tricks with this idea, as you will see.

So, when dealing with YUV you can imagine Y as being the black and white image, and U and V as the "coloring" of the image. Here's a visual example:

[Image: the original image alongside its luma (Y) channel]
[Image: the U and V channels]


You can see straight away that the color information is much less detailed. This is true, but even if it weren't, the reality is that you just can't notice detail as much in the chroma channels (remember biology - rods and cones: the rods, which see brightness, far outnumber the cones, which see color, so you can't actually see colour as clearly as you can see luma).

Although you can have one Y, U and V sample per pixel, like you do with R, G and B, it is common for the chroma samples (the U and V) to be sampled less often, because lower chroma accuracy is less noticeable. There are many ways to do this, but we are going to demonstrate the two you will deal with most - YUY2 and YV12.

YUY2 is a type of YUV that samples the luma once every pixel but only samples the chroma once every horizontal pair of pixels - the point being that the human eye doesn't really notice that the chroma of the two pixels is the same when the luma values are different. It's just like the way you can be less accurate when coloring in a black and white picture than if you were making the picture from scratch with only colored pencils.

[Diagram: YUV 4:2:2 sampling - one chroma sample for each 2x1 pair of pixels]

So basically YUY2 stores color data at a lower accuracy than RGB without us really noticing. Effectively, the chroma information is stored at half the horizontal resolution of the luma.

Due to this nature of YUY2, converting from RGB to YUY2 loses some data (the chroma of each pixel pair is averaged), and converting back requires assumptions: the missing chroma must be guessed at or interpolated, because once the chroma has been averaged we can't find out what the real values were.
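
As a rough illustration, here is how one pair of pixels could be packed into YUY2's Y0-U-Y1-V byte order in Python (pack_yuy2_pair is a made-up helper; real converters may filter the chroma rather than taking a plain average):

    # Two pixels keep their own luma but share one averaged chroma
    # sample - 4 bytes for 2 pixels instead of 6 bytes in RGB24.
    def pack_yuy2_pair(y0, u0, v0, y1, u1, v1):
        u = (u0 + u1) // 2            # chroma averaged across the pair
        v = (v0 + v1) // 2
        return bytes([y0, u, y1, v])

    pair = pack_yuy2_pair(200, 90, 140, 50, 110, 160)
    print(list(pair))  # -> [200, 100, 50, 150]: the originals are gone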

Even less chroma sampling: YV12

YV12 is much like YUY2 but takes this one step further. Where YUY2 samples chroma once in every 2 pixels in a row, YV12 samples chroma once in every 2x2 pixel block! You'd think that having only one chroma sample per 2x2 square would look terrible, but the fact is we don't really notice the difference all that much. Of course, because there are so many fewer chroma samples (the chroma is effectively half the resolution of the luma in both dimensions, so a quarter of the samples), there is less information to store, which is great for saving bits. All major distribution codecs use a 1-chroma-per-4-pixels scheme - including the MPEG2 on DVDs.
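
Here is a small numpy sketch of that averaging, assuming a chroma plane whose dimensions divide evenly by two (real resamplers use smarter filters and also worry about exactly where the chroma sample sits):

    import numpy as np

    # Average each 2x2 block of a chroma plane down to one sample.
    def subsample_420(chroma):                   # chroma: (h, w) array
        h, w = chroma.shape
        blocks = chroma.reshape(h // 2, 2, w // 2, 2)
        return blocks.mean(axis=(1, 3))          # one value per 2x2 block

    u = np.arange(16, dtype=float).reshape(4, 4)
    print(subsample_420(u).shape)                # (2, 2): a quarter of the samples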

The top image is an original and below it is the same image with YUV 4:2:0 (YV12) sampling applied; notice how the colors of the hairline at the top left become muddled because of the chroma averaging between pixels.

[Image: anime eye - original]
[Image: anime eye - 4:2:0 sampled]

The sharp among you may think "um, OK, but what if the image is interlaced - you'd be sampling color from two different fields!" and you'd be right... which is why interlaced footage has to be sampled into YV12 a field at a time instead of a frame at a time.
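
Continuing the sketch from above, sampling a field at a time just means splitting the rows into fields before subsampling (subsample_420 is the hypothetical helper from the previous sketch):

    # Keep each field's rows together so a 2x2 chroma average never
    # mixes rows that were captured at different moments in time.
    def subsample_420_interlaced(chroma):
        top    = chroma[0::2]    # even rows: one field
        bottom = chroma[1::2]    # odd rows: the other field
        return subsample_420(top), subsample_420(bottom)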

Colorspace Conversions

Converting back and forth between colorspaces is bad because you can lose detail, as mentioned, and it also slows things down. So, you want to avoid colorspace conversions as much as possible. But how?

Well, you need to know two things - the colorspace of your footage and the colorspace used by your programs.

Footage Colorspaces:

DVDs - These use MPEG2 with 4:2:0 YUV (YV12) color. There is one chroma sample for each square of 2x2 pixels (unlike the YUY2 picture above which has one chroma for each group of 2x1 pixels).

DV - This uses 4:1:1 YUV (in NTSC, at least; consumer PAL DV uses 4:2:0), which has the same number of chroma samples as MPEG2's 4:2:0 but arranged differently: one chroma sample for each horizontal run of 4 pixels instead of each 2x2 block.

MJPEG - This can use all kinds of YUV sampling, but 4:2:0 (YV12) is very common.

MPEG1, 2 and 4 (DivX etc.) all use YV12 color (they can technically support other YUV modes, just not in practice). There are MPEG2 profiles (such as the 4:2:2 Profile) which can deal with 4:2:2 chroma, but mostly you will see 4:2:0 chroma being used.

HuffYUV - currently YUY2 and RGB storage are possible; YV12 support is still in development.

Programs:

Premiere, and almost all video editing programs, work in RGB because it's easier to deal with mathematically. Premiere demands that all incoming video be in RGB32 - that is, 24-bit color with an 8-bit alpha channel - and will convert any YUV footage you give it to that format for processing. Even Premiere Pro, which hailed itself as supporting YUV formats, can only handle 4:4:4 uncompressed YUV, which is hardly any different from RGB. The native DV support is useful, but it still doesn't warrant all the hype, as very few of the plugins (including Adobe's own sample code) actually use the YUV support at all.

AVISynth can work in either colorspace, but many filters prefer YUV, and most (if not all) AVISynth filters run in YV12 colorspace these days - which is fine, as it is the format used by DVDs and other distribution formats like MPEG1, MPEG4 and so on.

TMPGEnc's VFAPI plugins all operate in RGB colorspace because all of its filtering and processing runs in RGB.

This is also true of FlaskMPEG (mostly).

VirtualDub (and its variants) runs in RGB mode when you use Normal Recompress or Full Processing Mode (in the Video dropdown menu); all of VirtualDub's internal functions and filters run in RGB colorspace only. Fast Recompress, however, doesn't decode the video to RGB at all - it just shunts whatever your source is straight into the compressor you've selected, so if your source is a YUV type the video data goes to the video compressor as YUV.

This matters, of course, because you need to know what conversions will take place when you open a given source in a given program, or compress to a given format from it.

For example, using Fast Recompress in VirtualDub (or any of its variants) is not only the fastest way to transcode video but also the least costly in terms of colorspace conversions. The drawback is that you cannot use any of VirtualDub's filters in Fast Recompress mode - VirtualDub never even touches the incoming video stream... which is why you should use AviSynth for any video processing work.

By scripting all your filters in AVISynth you can avoid many color conversions. The optimal scenario involves only 2 colorspace conversions: the MPEG2 is decoded from the DVD in YV12 and processed with AviSynth in YV12, then converted to RGB32 ready for editing; the result is exported from the editing program using HuffYUV in RGB mode, then converted to YV12 ready for the final video compressor. By keeping to these two conversions you save both time and quality.

Many people once used YUY2 mode in HuffYUV all the time, thinking it was lossless - so they would import into Premiere, edit, export to YUY2, import again, edit, export again, and each time the video went into Premiere it was converted to RGB, and each time it was compressed with HuffYUV it was converted back to YUY2. This has led to many faded-looking videos in the past.

Thankfully, with a little knowledge of these colorspaces, you can avoid doing this or at least only do it when you really need to.