

Hack Terms FAQ

Maybe we ought to have a thread where we draft up and refine a good FAQ on the subject. Here's a first stab at it; obviously, changes, amendments, and clarifications are needed.

The Basic Overview

In an ideal world of fast cameras and unlimited data storage, video would be stored as a series of high-resolution frame images, all stored in a lossless and uncompressed data format.

Sadly, most video cameras don’t support these luxuries. When video is recorded, it is usually compressed; otherwise, it'd require writing massive amounts of data to the memory card, which may not be large enough or fast enough to handle it. This requires the camera to perform some very intensive processing to compress the video data, so that it can be written to the memory card.

Because this processing takes place during shooting, the camera has to do this very quickly. So the camera relies on shortcuts, which will be described later in this FAQ. As a result, the camera uses “lossy” data compression, and the image and motion quality can be compromised. (For now, we recommend reading Wikipedia on http://en.wikipedia.org/wiki/Video_coding#Video.)

Vitaly's hack enables users to reconfigure the way the GH2 camera encodes the video, in order to improve the quality of its recorded images and motion.

This reconfiguration is complex; sometimes, the settings may conflict with each other. (For example, some patches with high image quality may have trouble recording acceptable sound, or “spanning” across multiple files. More about this below.) So some users have spent a lot of time and effort experimenting with the settings, and they have developed “patches” with various strengths and weaknesses. These include the Sanity patch, the FlowMotion patch, and the many patches developed by Driftwood.

New or less-techy users may find it daunting to read through Personal View's lengthy forum discussions, just to find out what patch is best for their needs. This FAQ is intended to give such users a rough idea of what's going on.

Video Compression Basics

Imagine that you have a car that’s controlled by a computer, and you want it to travel down ten miles of perfectly straight highway all by itself, with no driver. One way to do this is to drive the car yourself, once, and have the car’s computer “record” the data—the car’s position, its orientation, speed, steering, etc. In theory, you could then put the car at its starting point, hit the Playback button, and the car should follow the same route every time.

Recording video is like driving the car on its initial trip down the highway, and assembling the data about the trip.

Ideally, you’d want to record every bit of data every second. But let’s also say that you have technical reasons to make the data smaller. Maybe the computer can’t record the data every second—or, when it plays the data back to the car, the car can’t respond very quickly. So, you find ways of making the data smaller. Maybe you decide to record the data every five seconds; that’d reduce the data by 80%. Or, consider this: when you’re actually driving, you just tap the steering wheel every couple of hundred yards or so, and you’ll stay on course well enough. That might shrink the data down even further. So you just record the data only when it changes.

In theory, the car should stick close enough to the road to get where it’s supposed to go, without any catastrophic problems. But the more you use these data shortcuts, the more the car will drift from its intended path. It may not drift much, but it’ll drift.

This is what video data compression is about. If you record the full data for every frame, you get perfect data, but those big files can be difficult to play back. But the techniques to make the data smaller introduce uncertainty, noise, and drift.
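That “record only the changes” shortcut can be sketched in a few lines of Python. This is a toy illustration (the steering-angle samples and function names are invented for the example, not real codec code), but the round trip shows why the trick works when the data holds steady between changes:

```python
# Toy "record only the changes" compression, like the car analogy:
# instead of logging the steering angle every second, log it only
# when it differs from the last recorded value.

def compress_log(samples):
    """Keep only (time, value) pairs where the value changed."""
    recorded = []
    last = None
    for t, value in enumerate(samples):
        if value != last:
            recorded.append((t, value))
            last = value
    return recorded

def replay(recorded, length):
    """Rebuild the full log by holding each value until the next change."""
    full = []
    idx = 0
    current = None
    for t in range(length):
        if idx < len(recorded) and recorded[idx][0] == t:
            current = recorded[idx][1]
            idx += 1
        full.append(current)
    return full

steering = [0, 0, 0, 5, 5, 5, 5, 0, 0, 0]   # made-up steering angles
compact = compress_log(steering)             # [(0, 0), (3, 5), (7, 0)]
assert replay(compact, len(steering)) == steering
```

Here the reconstruction is exact, because the data really did hold steady between changes. Real video compression accepts small errors in exchange for much bigger savings, which is where the drift comes from.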

Interframe and Intraframe

There are two basic techniques by which video can be compressed. One very simple technique, called “intraframing,” is to take each frame of the video and compress it individually, using a compression system like the JPEG standard. (Think of taking an uncompressed TIFF file and saving it as a JPEG.) Depending on the settings, this can be very lossy or even lossless, but it requires a lot of processing work by the camera.

The fact that successive video frames look very much like each other enables a more complicated technique called “interframing.” This is the technique used by the GH2.

Here, the camera records an initial frame as a full photo. The camera then records the subsequent frames as only the changes that occur from the previous frame. (See http://en.wikipedia.org/wiki/Video_coding#Video for a discussion.) So the video file may consist of the initial Frame 1, and then data that contains only the changes between Frames 1 and 2, then between Frames 2 and 3, etc.

Again, interframing requires the camera to perform a LOT of math before writing the data to the memory card. It is also a lossy technique, especially when combined with further compression techniques such as macroblocking and motion prediction.

Macroblocking and Motion Prediction

Earlier we said that the video records only the changes between one frame and the next. There are several ways of recording these changes, some of which allow greater compression.

For example, instead of recording the changes for every pixel, the camera may calculate the changes in small groups of pixels, called “macroblocks.” This FAQ won’t go into the complicated description (consult http://en.wikipedia.org/wiki/Macroblocks if you want that), but think of it this way. Sometimes, you’ll see little squares in your video, especially in areas of movement that are difficult to resolve (like water in a turbulent stream). This is an artifact of macroblocking, and it’s one of the reasons why people hack the GH2 camera; we want to reduce or eliminate that garbage.
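The basic idea of splitting a frame into macroblocks can be sketched like this (a toy Python illustration with a 2×2 block size for readability; real AVCHD encoding works on 16×16 macroblocks and subdivisions of them):

```python
# Toy macroblocking: split a frame into square blocks so changes can be
# tracked per block instead of per pixel. Block size 2 here for readability;
# real codecs use 16x16 macroblocks (and subdivisions).

def split_into_blocks(frame, block=2):
    """frame is a list of rows; returns a list of block-sized sub-grids."""
    h, w = len(frame), len(frame[0])
    blocks = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blocks.append([row[bx:bx + block] for row in frame[by:by + block]])
    return blocks

frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
blocks = split_into_blocks(frame)
assert len(blocks) == 4                   # a 4x4 frame yields four 2x2 blocks
assert blocks[0] == [[1, 2], [5, 6]]      # top-left block
```

The encoder can then record “this block changed, that block didn’t,” which is far cheaper than tracking every pixel, but it is also why compression artifacts show up as visible squares.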

Another technique involves examining the frames, identifying how parts of the image move, and recording how whole blocks or areas move about. Sometimes, instead of recording all the individual pixel changes, it’s easier to say (in essence) “Take this group of pixels and move it one pixel to the left.” This is called “motion prediction.” Motion prediction can compress the data even further, because sometimes it’s possible to predict the motion of certain pixels across several frames, not only the next frame.

Let’s go back to that robot car we were talking about earlier. Let’s say the car is at a certain point on the highway, but we know where it’s facing, how fast it’s going, and where the steering wheel is pointed. So we can predict where the car will be, and all of the other data about its travel, five or ten seconds from now. And in video, we can predict where a group of pixels will be four or five frames into the future. (The further into the future, the less reliable the predictions are. Like the weather.) Again, this is a complicated operation, so read http://en.wikipedia.org/wiki/Inter_frame for more information.
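A crude version of this search can be sketched in Python. This toy example only tests horizontal shifts on a single row of pixels (real encoders search in two dimensions, at sub-pixel precision, per macroblock), but it shows the principle: try a set of offsets and keep the one with the smallest difference:

```python
# Toy motion search: find how far image content moved between two frames
# by testing small horizontal shifts and picking the one with the
# smallest total pixel difference. Invented sample rows, not real codec code.

def best_shift(prev_row, curr_row, max_shift=2):
    """Return the horizontal shift that best maps prev_row onto curr_row."""
    best = None
    best_err = None
    for shift in range(-max_shift, max_shift + 1):
        err = 0
        for i, v in enumerate(curr_row):
            j = i - shift                 # where this pixel came from
            p = prev_row[j] if 0 <= j < len(prev_row) else 0
            err += abs(v - p)
        if best_err is None or err < best_err:
            best, best_err = shift, err
    return best

prev_row = [0, 0, 9, 9, 0, 0, 0]
curr_row = [0, 0, 0, 9, 9, 0, 0]   # same shape, moved one pixel right
assert best_shift(prev_row, curr_row) == 1
```

Instead of storing the changed pixels, the encoder can now store a single number (“moved 1 to the right”), which is a much smaller description of the same change.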

But here’s what you can take away for now. The video compression used by the GH2 works like this: the first frame is stored as-is, and each subsequent frame is stored as the changes from the previous frame. The changes are calculated using techniques like macroblocking and motion prediction. And these techniques sometimes get it wrong: errors crop up, motion doesn’t always match predictions, and macroblocks can look blocky. Both are lossy compression techniques.

P frames and B frames

We said earlier that uncompressed video would be a series of full frames stored at full resolution, while compressed video stores only an occasional full frame and estimates the rest. So you may have one initial, perfect frame, but the frames that follow become less and less accurate, because prediction is never quite exact. So the camera uses another technique to check the predicted frames against later frames.

This is where data compression and video can get very complicated. The Wikipedia page http://en.wikipedia.org/wiki/Inter_frame#Frame_types has a good description of this, and we’ll use its illustrations as an example.

Every tenth frame will be a complete frame, compressed on its own without reference to any other frame. It’s called an I-frame (for “intra-coded”). Your camera uses motion prediction to “predict” what the fourth and seventh frames will be from that I-frame. The fourth and seventh frames are called P-frames (for “predicted”).

And the frames in between the P-frames (the B-frames, for “bi-directional”) are, in turn, “predicted” from the surrounding P-frames.

The three types of frames differ in how reliable they are. I-frames are, of course, accurate and reliable. P-frames are estimates derived from the I-frame; they’re not perfect, but they’re very good. B-frames are estimates derived from P-frames, so they’re not as good as P-frames.

Because the predicted frames are checked against frames in the past and future, the “drift” from macroblocking and motion prediction is reduced.

Group of Pictures (GOP)

http://en.wikipedia.org/wiki/Group_of_pictures

This batch of frames—the initial I-frame and the following P- and B-frames—is referred to as a Group of Pictures, or GOP. (Every I-frame starts a new GOP.) Each GOP has two values that determine its structure: the GOP Size and the GOP Length.

The GOP Size is the number of frames between each P-frame. In our example above, that value would be 4. The GOP Length is the number of frames in the group—or the number of frames between I-frames. In our example, this value would be 10.

Obviously, changing these numbers affects the quality of the data compression. If the GOP Length were reduced, we would have more frequent I-frames, and the video data would be less compressed and closer to perfect. If the GOP Size were reduced, then we would have more P-frames and fewer B-frames; the data would be less compressed, less lossy, and with less drift.
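One plausible reading of those two values can be sketched in Python (this is our own toy illustration of the structure, with frame 1 as the I-frame; real encoders vary the exact pattern):

```python
# Toy GOP layout using the two values described above:
# GOP Length = frames per group (one I-frame starts each group),
# GOP Size   = spacing of the anchor P-frames within the group.
# Real encoders vary the details; this just illustrates the structure.

def gop_pattern(gop_length=10, gop_size=4):
    pattern = []
    for i in range(gop_length):
        if i == 0:
            pattern.append("I")                 # full intra-coded frame
        elif i % (gop_size - 1) == 0:
            pattern.append("P")                 # predicted from earlier frames
        else:
            pattern.append("B")                 # predicted bi-directionally
    return "".join(pattern)

print(gop_pattern(10, 4))  # IBBPBBPBBP -- P-frames land on the 4th and 7th frames
print(gop_pattern(5, 4))   # IBBPB -- shorter GOP means more frequent I-frames
```

Shrinking either value, as described above, pushes the pattern toward more I- and P-frames: less compression, but less drift.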

So, at long last, we learn something about how the GH2 patches improve the quality of your video—by changing the GOP values.

Bitrate

If the GH2 is shooting a lot of action, it can’t decide to simply shoot constant I-frames, because that’d create too much data for the system. And if it’s shooting placid, low-movement images, it can’t compress them all down to a single still image. It can’t use macroblocks that are too big, because the image would start to look blocky, and it can’t use small macroblocks too often because, again, that’d create too much data.

So the camera has to have limits. The camera creates data that falls within a range, a sweet spot between good compression and high detail. This range is specified by the bit rate.

An unhacked GH2’s highest bit rate is 24 Mbps. So the GH2 is constantly adjusting its compression techniques, as described above, so that the data stream doesn’t exceed 24 Mbps.

But if you have a hacked GH2, you can specify a higher bit rate—say, 40 Mbps, 80 Mbps, or in the case of some of Driftwood’s patches, more than 100 Mbps. Now, the camera has the leeway to use compression techniques that are less lossy, and preserve movement and detail more effectively. That is the advantage of higher bit rates.
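The arithmetic behind those numbers is worth seeing once: a bit rate in megabits per second divides by 8 to give megabytes per second, which is what memory-card write-speed ratings describe. A toy Python sketch (the 32 GB card is just an illustrative choice):

```python
# Rough card requirements implied by a video bit rate:
# 1 megabit per second = 0.125 megabytes per second, so sustained write
# speed and recording time per card follow directly from the bit rate.

def card_requirements(mbps, card_gb=32):
    write_mb_per_s = mbps / 8                        # Mbps -> MB/s
    minutes = card_gb * 1000 / write_mb_per_s / 60   # approx, 1 GB = 1000 MB
    return write_mb_per_s, round(minutes)

for rate in (24, 40, 100):                           # stock GH2 vs. patched rates
    mb_s, mins = card_requirements(rate)
    print(f"{rate} Mbps -> {mb_s} MB/s sustained, ~{mins} min on a 32 GB card")
```

So a 100 Mbps patch needs a card that can sustain roughly 12.5 MB/s of real-world writes, which is why slower cards that handle the stock 24 Mbps fine can fail with aggressive patches.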

This does not always work. You can’t simply use Vitaly’s hack to specify a 100 Mbps bit rate and expect the camera to work flawlessly. Your memory card may not be fast enough or large enough to handle the flood of data. The CPU can overload. Or the other settings, like the GOP values, may create conflicts or simply not work right. Happily, we don’t have to go into why this doesn’t always work. That’s for the people who design and test the patches, and who read all of the forum posts about these things. They try various combinations of GOP sizes, bit rates, and specs for the macroblocking and motion prediction, until they find combinations that do work.

The rest of us don’t need to understand it at that level. All we need to know for now is what the “bit rate” really means for us. The bit rate is a number that shows how much range the camera has in compressing video; the higher the number, the less loss in compression.


Refinements to interframe

Not written yet: see http://en.wikipedia.org/wiki/Inter-frame#H.264_Inter_frame_prediction_improvements for now.

gh2-hack/hack-terms-faq.1358903184.txt.gz · Last modified: 2013/01/23 01:06 by vitaliy_kiselev