Pro: AVCHD Quantization process
  • On the 48-entry tables - that's the conclusion I came to; otherwise the sequence of quantization values doesn't make sense. That might imply that quantization (and therefore other coding functions as well) happens in a 4:4:4 color sample environment and that color sub-sampling happens later. Actually, that makes sense: if you want a codec that can handle other color sample schemes, you would only want one coding engine common to all color sub-sample schemes and do the sub-sampling at the end; it results in much simpler code. Either that, or there are simply 4x as many blocks for the Y component as for the others. I doubt that, though, because it would make motion vector calculations more complicated. There might be something interesting to explore here. To some extent I would assume that Panasonic has a common codec that is used across several products; otherwise code maintenance would be a nightmare.

    I have no idea what the 66-entry table might be. I'll look at the reference codec to try to locate any 66-entry tables.

    Chris
  • About the 48-element tables.
    My current understanding is that these are three 4x4 tables in zigzag order.
    What do you think?
  • @cbrandin

    Any idea what it could be?

    [image attached: quant1.png, 456 x 273]
  • Yes, it is the number of frames making up whole GOPs that is closest to one second.
    I had a look at the H.264 reference code (available at http://iphome.hhi.de/suehring/tml/) and if you look at the routines having to do with quantization (they are named Qxxxxxx in the header and C code sections) you can see structural similarities with the GH2 tables. The numbers inside the structures are different, however. The reference code is not optimized (it's very, very slow), so there is no pre-calculation of table values, as I suspect Panasonic is doing. Another place to look is the x264 codec (available at http://www.videolan.org/developers/x264.html), which is optimized. I haven't gone through that yet, but I am somewhat hopeful that there will be some hints in there for us.

    As to the numbers you mentioned related to GOP length: they actually seem more related to framerate, corresponding to the whole-GOP frame count closest to 1 second. I vaguely recall the same parameters in the GH1 codec. I don't remember what the value for 1080/50 was, whether it was 40, 50, or 52 (because the GOP was 13 for the GH1). I had surmised that it had something to do with when frames are flushed to flash memory, but I'm not sure.

    Chris
  • Sorry, just tired. I fixed the original post.
  • I'm confused - what's the difference between 1080p24 and 1080p?
  • @woody123
    Use PM for thanks and similar stuff next time, ok? :-)
  • Yes, the encoder always works with 1088.
  • By the way, a confusing thing about this is why there are 8160 blocks and not 8100. If you take 1920x1080 you get 2073600 pixels. That divided by 256 (16x16) equals 8100, not 8160. The catch is that 1080 is not a multiple of 16 so you have to add an extra half-row of macroblocks. The actual calculation is (1920 x 1088) / 256 = 8160.
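
    Just to make the rounding explicit, here's a trivial sketch (the function and names are mine):

    #include <stdio.h>

    /* Round each dimension up to whole 16-pixel macroblocks. */
    static int mb_count(int width, int height)
    {
        int mb_w = (width  + 15) / 16;  /* 1920 -> 120 columns */
        int mb_h = (height + 15) / 16;  /* 1080 -> 68 rows (1088 padded) */
        return mb_w * mb_h;
    }

    int main(void)
    {
        printf("%d\n", mb_count(1920, 1080)); /* 8160, not 8100 */
        printf("%d\n", mb_count(1280, 720));  /* 3600 */
        return 0;
    }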

    Chris
  • @woody123
    Yes, I got that it is 16x16 blocks using math skills :-)
    For 720p they are using a constant of 3600.

    Another interesting thing is that in the same block there is
    a setting whose value is proportional to the GOP length.
    For 1080p24 it is 24 (GOP=12), 60 for 1080i60 (GOP=15), and 48 for 1080i50 (GOP=12).
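
    If these really are the whole-GOP frame/field counts closest to one second (as suggested above), this guess reproduces them; counting fields for interlaced modes is my assumption:

    /* Guess: setting = GOP length times the number of whole GOPs
       closest to one second; units are frames for progressive
       modes and fields for interlaced ones (my assumption). */
    static int gop_setting(int rate, int gop_units)
    {
        /* rate: frames/s (progressive) or fields/s (interlaced);
           gop_units: GOP length in the same units */
        int gops = (rate + gop_units / 2) / gop_units; /* nearest */
        return gops * gop_units;
    }

    /* gop_setting(24, 12) == 24  (1080p24, GOP 12 frames)
       gop_setting(60, 30) == 60  (1080i60, GOP 15 frames = 30 fields)
       gop_setting(50, 24) == 48  (1080i50, GOP 12 frames = 24 fields) */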
  • You beat me to it. Nice reference too!

    Chris
  • http://pro.sony.com/bbsccms/ext/cinealta/docs/ibc2004-SR_wpaper_v5.pdf
    pages 4,5

    "Picture segmentation
    Each 4:2:2 PsF 1920 x 1080 picture is first reconstituted into a progressive 1920 x 1080
    frame, then each frame is divided into 8160 16x16 shuffle blocks for luminance and two co-sited 8160 8x16 blocks for chrominance. In the case of 4:4:4 PsF, there are three 8160
    16x16 blocks for each of RGB. In the case of interlace signals, each field is treated as an
    independent 1920 x 540 field, and is divided into 4080 16x16 blocks for luminance and two
    4080 8x16 blocks for chrominance. An example for 4:2:2 PsF is shown in figure 2."
  • Vitaliy, as you noted, the asm listing calculates an index into the 52-element quantization table, in part by using a sequence of hard-coded reference levels (137, 152, 168, 192, 216, 240) as index cut off points. If there is only a single active instance of this routine in the encoder, the hard-coded reference levels could be patched with different values to globally bias the Qstep selection toward higher quality quantization factors.

    Alternately, the coarse end of the Qstep index range could easily be capped, boosting the quality of low-detail macroblocks without increasing the bitrate of medium and high-detail blocks. The to_check routine limits the most coarse quantization index to 51; if this hard-coded value were decreased, it would force the encoder to use a higher quality quantization factor in low-detail macroblocks.
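
    In C terms, the two patch points would look something like this. This is only a conceptual model with names of my own; in the firmware both live as hard-coded immediates inside the routine, not as a data table:

    #include <stdio.h>

    /* Reference levels used as cutoffs when mapping a macroblock
       measure to a quantization step; raising them biases the
       encoder toward finer (higher quality) steps. */
    static const int ref_levels[6] = { 137, 152, 168, 192, 216, 240 };
    static const int max_index = 51; /* lower this to cap the coarsest Qstep */

    static int pick_step(int v)
    {
        for (int i = 0; i < 6; i++)
            if (v <= ref_levels[i])
                return 4 + i;
        return 10;
    }

    int main(void)
    {
        printf("step=%d cap=%d\n", pick_step(200), max_index); /* step=8 */
        return 0;
    }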
  • Chris.
    Do you recognize the 4080 and 8160 constants (used for interlaced and progressive 1080 footage)?
    All I found is that some encoders report these numbers together with AC, DC and MV.
  • Maybe they do the block encoding first, and then the color subsampling. That would make sense if the codec was intended to support higher color subsampling rates. Boy, that would be nice - fat chance, though.

    Chris
  • OK.
    Feel free to PM me or use email for more detailed things.

    It looks like I found some funny settings that specify upper and lower bitrate limits for each encoder mode.
  • I'll have to study this a bit against the H.264 standard and see if I can correlate it to some of the reference codecs. It's been a while since I've looked at all this.

    Chris
  • As far as I remember from the GH1, they are words. I haven't found the actual usage here yet.
  • Here is one:
    word 0x906, 0xE08, 0xE08,0x100A,0x120A,0x100A,0x120C,0x1810,0x1810,0x120C,0x2014,0x5218,0x2014,0x6C1C,0x6C1C,0x6C20
    word 0x805, 0xF0A, 0xF0A,0x1E14,0x140F,0x1E14,0x6050,0x6050,0x6050,0x6050,0x806C,0x806C,0x806C,0x9C8C,0x9C8C,0x9CB0
  • Or, there might actually be 24 elements, which would correspond to 4x4 for Y, and two sets of 2x2 for U and V.

    Chris
  • Really? I would expect three parts: one for the Y component, which might be, say, 16x16, and U and V parts at 8x8 - or half of the Y table's dimensions (whatever they are). Color subsampling, another trick that contributes to compression, typically occurs before quantization. Come to think of it, with 48 entries I would expect the Y part to be 32 elements and the U and V parts to be 8 elements each. Those are somewhat strange sizes, as they do not correspond to squares, but there might be some data packing going on. How big is each element?

    Chris
  • @cbrandin

    I know all that you said.

    In practice we have 52-element tables (three for each mode).

    And also 4 tables for each mode consisting of 48 elements (similar ones are used in the GH1 encoder).
    Each such table looks like 3 parts of 16 elements each (each part may in fact be a 4x4 matrix).
  • lpowell is right, you don't want to mess with coefficient tables, etc... Actually, I'm not sure anything is to be gained by playing with quantization tables either. Compression typically happens at the macroblock level. A typical block of transform coefficients (we'll keep it small, 4x4, just for this example) would look something like this:

    A1 A2 A3 A4
    B1 B2 B3 B4
    C1 C2 C3 C4
    D1 D2 D3 D4

    A1 is the DC coefficient; the rest are all AC coefficients. Basically, the DC coefficient sets the base value and the AC coefficients are offsets using A1 as the base. As you go to the right you see coefficients representing higher horizontal frequencies (i.e. more detail in the horizontal plane). As you go down you see higher frequency components in the vertical plane. So, the top left is the lowest detail on both planes, and the bottom right is the highest.

    Quantization basically works by chopping off values going toward the bottom-right. You'll still see the entire block, but values toward the bottom-right will be zeros after quantization. When the block is transmitted, Huffman encoding (or the equivalent) is applied in a zig-zag pattern, processing coefficients in an order where A1 comes first, followed by A2, then B1, etc., with D4 coming last. This puts all the high-frequency coefficients that quantization set to zero in a single run at the end of the block's bitstream, which the Huffman encoding turns into just a few bits.

    During the encoding process, H.264 codecs typically choose a quantization level according to available bandwidth, so theoretically it should not be necessary to mess with quantization tables. The codec should simply truncate, or not truncate, block coefficients according to available bandwidth.
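
    To make the zig-zag part concrete, here's a small self-contained sketch (the coefficient values are made up, purely illustrative; the scan table is the standard 4x4 zig-zag order):

    #include <stdio.h>

    /* Standard 4x4 zig-zag scan order as raster indices:
       A1, A2, B1, C1, B2, A3, A4, B3, C2, D1, D2, C3, B4, C4, D3, D4 */
    static const int zigzag4x4[16] = {
        0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
    };

    int main(void)
    {
        /* A block after coarse quantization: the high-frequency
           (bottom-right) coefficients have been chopped to zero. */
        int block[16] = {
            25, 7, 3, 1,   /* A1..A4 */
             6, 4, 1, 0,   /* B1..B4 */
             2, 1, 0, 0,   /* C1..C4 */
             1, 0, 0, 0    /* D1..D4 */
        };

        /* Prints: 25 7 6 2 4 3 1 1 1 1 0 0 0 0 0 0 - all the zeros
           end up in one run that entropy coding stores very cheaply. */
        for (int i = 0; i < 16; i++)
            printf("%d ", block[zigzag4x4[i]]);
        printf("\n");
        return 0;
    }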

    Chris
  • I think that this routine returns an index into the table:

    VEnc_GetQuantIndex: ! CODE XREF: Venc_Encoder_Quant
    mov 0xFFFFFFFF, D2 ! D2 = -1 (will hold the exponent)
    mov D0, D1
    asr 9, D1 ! D1 = D0 >> 9
    cmp 1, D1
    blt skip_loop ! inputs below 512 skip the loop

    setlb ! loop: halve D1, counting iterations in D2,
    asr 1, D1 ! until D1 reaches zero, so that
    inc D2 ! D2 = floor(log2(D0 >> 9))
    cmp 1, D1
    lge


    skip_loop: ! CODE XREF: VEnc_GetQuantIndex+8j
    mov 2, D1
    add D2, D1
    asr D1, D0 ! D0 >>= (D2 + 2): normalize into roughly 128..255
    mov 4, D1 ! select a step of 4..10 within the octave
    cmp 137, D0 ! using the hard-coded cutoffs 137..240
    ble set

    mov 5, D1
    cmp 152, D0
    ble set

    mov 6, D1
    cmp 168, D0
    ble set

    mov 7, D1
    cmp 192, D0
    ble set

    mov 8, D1
    cmp 216, D0
    ble set

    mov 9, D1
    cmp 240, D0
    ble set

    mov 0xA, D1


    set: ! CODE XREF: VEnc_GetQuantIndex+1Dj
    ! VEnc_GetQuantIndex+25j ...
    udf00 6, D2 ! apparently D2 = D2 * 6 (six steps per octave)
    add D1, D2 ! index = 6 * exponent + step
    cmp 0, D2
    bge to_check

    clr D2 ! clamp negative indexes to 0
    bra return


    to_check: ! CODE XREF: VEnc_GetQuantIndex+4Fj
    cmp 51, D2
    ble return

    mov 51, D2 ! clamp to the maximum index, 51


    return: ! CODE XREF: VEnc_GetQuantIndex+52j
    ! VEnc_GetQuantIndex+56j
    mov D2, D0 ! return the index in D0
    retf [D2], 4

    ! End of function VEnc_GetQuantIndex
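
    Assuming udf00 is a multiply here (D2 = D2 * 6), the routine looks equivalent to the following C reconstruction. Only the function name and constants come from the listing; the rest is my reading of it, so treat it as a sketch:

    /* Reads like an H.264-style QP index computation: six steps per
       doubling of the input, clamped to 0..51 (the 52-entry table
       lpowell mentioned above). */
    static int VEnc_GetQuantIndex(unsigned int v)
    {
        int exp = -1;
        for (unsigned int m = v >> 9; m != 0; m >>= 1)
            exp++;                    /* exp = floor(log2(v >> 9)) */

        v >>= exp + 2;                /* normalize into roughly 128..255 */

        int step;                     /* cutoffs from the listing */
        if      (v <= 137) step = 4;
        else if (v <= 152) step = 5;
        else if (v <= 168) step = 6;
        else if (v <= 192) step = 7;
        else if (v <= 216) step = 8;
        else if (v <= 240) step = 9;
        else               step = 10;

        int idx = 6 * exp + step;     /* udf00 assumed to be D2 *= 6 */
        if (idx < 0)  idx = 0;
        if (idx > 51) idx = 51;       /* the to_check clamp */
        return idx;
    }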