Pro: AVCHD Quantization process
  • On the 48-entry tables - that's the conclusion I came to; otherwise the sequence of quantization values doesn't make sense. That might imply that quantization (and therefore other coding functions as well) happens in a 4:4:4 color sample environment and that color sub-sampling happens later. Actually, that makes sense: if you want a codec that can handle other color sample schemes, you would only want one coding engine common to all color sub-sample schemes and do the sub-sampling at the end; it results in much simpler code. Either that, or there are simply 4x as many blocks for the Y component as for the others. I doubt that, though, because it would make motion vector calculations more complicated. There might be something interesting to explore here. To some extent I would assume that Panasonic has a common codec that is used across several products; otherwise code maintenance would be a nightmare.

    I have no idea what the 66-entry table might be. I'll look at the reference codec to try to locate any 66-entry tables.

    Chris
  • About the 48-element tables.
    My current understanding is that these are three 4x4 tables in zigzag order.
    What do you think?
  • @cbrandin

    Any idea what it could be?

    [image attached: quant1.png, 456 x 273]
  • Yes, it is the number of frames making up whole GOPs that is closest to one second.
    I had a look at the H.264 reference code (available at http://iphome.hhi.de/suehring/tml/) and if you look at the routines having to do with quantization (they are named Qxxxxxx in the header and C code sections) you can see structural similarities with the GH2 tables. The numbers inside the structures are different, however. The reference code is not optimized (it's very, very slow), so there is no pre-calculation of table values, as I suspect Panasonic is doing. Another place to look is the x264 codec (available at http://www.videolan.org/developers/x264.html), which is optimized. I haven't gone through that yet, but I am somewhat hopeful that there will be some hints in there for us.

    As to the numbers you mentioned related to GOP length: they actually seem more related to framerate, corresponding to the whole-GOP frame count closest to 1 second. I vaguely recall the same parameters in the GH1 codec. I don't remember what the value for 1080/50 was, whether it was 40, 50, or 52 (because the GOP was 13 for the GH1). I had surmised that it had something to do with when frames are flushed to flash memory, but I'm not sure.

    Chris
  • Sorry, just tired. I fixed the original post.
  • I'm confused - what's the difference between 1080p24 and 1080p?
  • @woody123
    Use PM for thanks and similar stuff next time, ok? :-)
  • Yes, the encoder always works with 1088.
  • By the way, a confusing thing about this is why there are 8160 blocks and not 8100. If you take 1920x1080 you get 2073600 pixels. That divided by 256 (16x16) equals 8100, not 8160. The catch is that 1080 is not a multiple of 16 so you have to add an extra half-row of macroblocks. The actual calculation is (1920 x 1088) / 256 = 8160.
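
    Just to make the rounding explicit, here's a trivial sketch (the function and names are mine):

    #include <stdio.h>

    /* Round each dimension up to whole 16-pixel macroblocks. */
    static int mb_count(int width, int height)
    {
        int mb_w = (width  + 15) / 16;  /* 1920 -> 120 columns */
        int mb_h = (height + 15) / 16;  /* 1080 -> 68 rows (1088 padded) */
        return mb_w * mb_h;
    }

    int main(void)
    {
        printf("%d\n", mb_count(1920, 1080)); /* 8160, not 8100 */
        printf("%d\n", mb_count(1280, 720));  /* 3600 */
        return 0;
    }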

    Chris
  • @woody123
    Yes, I got that it is 16x16 blocks using math skills :-)
    For 720p they are using a constant of 3600.

    Another interesting thing is that in the same block there is
    a setting whose value is proportional to the GOP length.
    For 1080p24 it is 24 (GOP=12), 60 for 1080i60 (GOP=15), and 48 for 1080i50 (GOP=12).
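
    If these really are the whole-GOP frame/field counts closest to one second (as suggested above), this guess reproduces them; counting fields for interlaced modes is my assumption:

    /* Guess: setting = GOP length times the number of whole GOPs
       closest to one second; units are frames for progressive
       modes and fields for interlaced ones (my assumption). */
    static int gop_setting(int rate, int gop_units)
    {
        /* rate: frames/s (progressive) or fields/s (interlaced);
           gop_units: GOP length in the same units */
        int gops = (rate + gop_units / 2) / gop_units; /* nearest */
        return gops * gop_units;
    }

    /* gop_setting(24, 12) == 24  (1080p24, GOP 12 frames)
       gop_setting(60, 30) == 60  (1080i60, GOP 15 frames = 30 fields)
       gop_setting(50, 24) == 48  (1080i50, GOP 12 frames = 24 fields) */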
  • You beat me to it. Nice reference too!

    Chris
  • http://pro.sony.com/bbsccms/ext/cinealta/docs/ibc2004-SR_wpaper_v5.pdf
    pages 4,5

    "Picture segmentation
    Each 4:2:2 PsF 1920 x 1080 picture is first reconstituted into a progressive 1920 x 1080
    frame, then each frame is divided into 8160 16x16 shuffle blocks for luminance and two co-sited 8160 8x16 blocks for chrominance. In the case of 4:4:4 PsF, there are three 8160
    16x16 blocks for each of RGB. In the case of interlace signals, each field is treated as an
    independent 1920 x 540 field, and is divided into 4080 16x16 blocks for luminance and two
    4080 8x16 blocks for chrominance. An example for 4:2:2 PsF is shown in figure 2."
  • Vitaliy, as you noted, the asm listing calculates an index into the 52-element quantization table, in part by using a sequence of hard-coded reference levels (137, 152, 168, 192, 216, 240) as index cut off points. If there is only a single active instance of this routine in the encoder, the hard-coded reference levels could be patched with different values to globally bias the Qstep selection toward higher quality quantization factors.

    Alternately, the coarse end of the Qstep index range could easily be capped, boosting the quality of low-detail macroblocks without increasing the bitrate of medium and high-detail blocks. The to_check routine limits the most coarse quantization index to 51; if this hard-coded value were decreased, it would force the encoder to use a higher quality quantization factor in low-detail macroblocks.
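
    In C terms, the two patch points would look something like this. This is only a conceptual model with names of my own; in the firmware both live as hard-coded immediates inside the routine, not as a data table:

    #include <stdio.h>

    /* Reference levels used as cutoffs when mapping a macroblock
       measure to a quantization step; raising them biases the
       encoder toward finer (higher quality) steps. */
    static const int ref_levels[6] = { 137, 152, 168, 192, 216, 240 };
    static const int max_index = 51; /* lower this to cap the coarsest Qstep */

    static int pick_step(int v)
    {
        for (int i = 0; i < 6; i++)
            if (v <= ref_levels[i])
                return 4 + i;
        return 10;
    }

    int main(void)
    {
        printf("step=%d cap=%d\n", pick_step(200), max_index); /* step=8 */
        return 0;
    }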
  • Chris.
    Do you recognize the 4080 and 8160 constants (used for interlaced and progressive 1080 footage)?
    All I found is that some encoders report these numbers together with AC, DC and MV.
  • Maybe they do the block encoding first, and then the color subsampling. That would make sense if the codec was intended to support higher color subsampling rates. Boy, that would be nice - fat chance, though.

    Chris
  • OK.
    Feel free to PM me or use email for more detailed things.

    It looks like I found some funny settings that specify upper and lower bitrate limits for each encoder mode.
  • I'll have to study this a bit against the H.264 standard and see if I can correlate it to some of the reference codecs. It's been a while since I've looked at all this.

    Chris
  • As far as I remember from the GH1, they are words. I haven't found the actual usage here yet.
  • Here is one:
    word 0x906, 0xE08, 0xE08,0x100A,0x120A,0x100A,0x120C,0x1810,0x1810,0x120C,0x2014,0x5218,0x2014,0x6C1C,0x6C1C,0x6C20
    word 0x805, 0xF0A, 0xF0A,0x1E14,0x140F,0x1E14,0x6050,0x6050,0x6050,0x6050,0x806C,0x806C,0x806C,0x9C8C,0x9C8C,0x9CB0
  • Or, there might actually be 24 elements, which would correspond to 4x4 for Y, and two sets of 2x2 for U and V.

    Chris
  • Really? I would expect three parts: one for the Y component, which might be, say, 16x16, and U and V parts at 8x8 - or half of the Y table's dimensions (whatever they are). Color subsampling, another trick that contributes to compression, typically occurs before quantization. Come to think of it, with 48 entries I would expect the Y part to be 32 elements and the U and V parts to be 8 elements each. Those are somewhat strange sizes, as they do not correspond to squares, but there might be some data packing going on. How big is each element?

    Chris
  • @cbrandin

    I know all that you said.

    In practice we have 52-element tables (three for each mode).

    And also 4 tables for each mode consisting of 48 elements (similar ones are used in the GH1 encoder).
    Each such table looks like 3 parts of 16 elements each (each part may in fact be a 4x4 matrix).
  • lpowell is right, you don't want to mess with coefficient tables, etc... Actually, I'm not sure anything is to be gained by playing with quantization tables either. Compression typically happens at the macroblock level. A typical block of transform coefficients (we'll keep it small, 4x4, just for this example) would look something like this:

    A1 A2 A3 A4
    B1 B2 B3 B4
    C1 C2 C3 C4
    D1 D2 D3 D4

    A1 is the DC coefficient; the rest are all AC coefficients. Basically, the DC coefficient sets the base value and the AC coefficients are offsets using A1 as the base. As you go to the right you see coefficients representing higher horizontal frequencies (i.e. more detail in the horizontal plane). As you go down you see higher frequency components in the vertical plane. So, the top left is the lowest detail on both planes, and the bottom right is the highest.

    Quantization basically works by chopping off values going toward the bottom-right. You'll still see the entire block, but values toward the bottom-right will be zeros after quantization. When the block is transmitted, Huffman encoding (or the equivalent) is applied in a zig-zag pattern, processing coefficients in an order where A1 comes first, followed by A2, then B1, etc., with D4 coming last. This puts all the high-frequency coefficients that quantization set to zero in a single run at the end of the block's bitstream, which the Huffman encoding turns into just a few bits.

    During the encoding process, H.264 codecs typically choose a quantization level according to available bandwidth, so theoretically it should not be necessary to mess with quantization tables. The codec should simply truncate, or not truncate, block coefficients according to available bandwidth.
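
    To make the zig-zag part concrete, here's a small self-contained sketch (the coefficient values are made up, purely illustrative; the scan table is the standard 4x4 zig-zag order):

    #include <stdio.h>

    /* Standard 4x4 zig-zag scan order as raster indices:
       A1, A2, B1, C1, B2, A3, A4, B3, C2, D1, D2, C3, B4, C4, D3, D4 */
    static const int zigzag4x4[16] = {
        0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
    };

    int main(void)
    {
        /* A block after coarse quantization: the high-frequency
           (bottom-right) coefficients have been chopped to zero. */
        int block[16] = {
            25, 7, 3, 1,   /* A1..A4 */
             6, 4, 1, 0,   /* B1..B4 */
             2, 1, 0, 0,   /* C1..C4 */
             1, 0, 0, 0    /* D1..D4 */
        };

        /* Prints: 25 7 6 2 4 3 1 1 1 1 0 0 0 0 0 0 - all the zeros
           end up in one run that entropy coding stores very cheaply. */
        for (int i = 0; i < 16; i++)
            printf("%d ", block[zigzag4x4[i]]);
        printf("\n");
        return 0;
    }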

    Chris
  • I think that this routine returns an index into the table:

    VEnc_GetQuantIndex: ! CODE XREF: Venc_Encoder_Quant
    mov 0xFFFFFFFF, D2 ! D2 = -1 (will hold the exponent)
    mov D0, D1
    asr 9, D1 ! D1 = D0 >> 9
    cmp 1, D1
    blt skip_loop ! inputs below 512 skip the loop

    setlb ! loop: halve D1, counting iterations in D2,
    asr 1, D1 ! until D1 reaches zero, so that
    inc D2 ! D2 = floor(log2(D0 >> 9))
    cmp 1, D1
    lge


    skip_loop: ! CODE XREF: VEnc_GetQuantIndex+8j
    mov 2, D1
    add D2, D1
    asr D1, D0 ! D0 >>= (D2 + 2): normalize into roughly 128..255
    mov 4, D1 ! select a step of 4..10 within the octave
    cmp 137, D0 ! using the hard-coded cutoffs 137..240
    ble set

    mov 5, D1
    cmp 152, D0
    ble set

    mov 6, D1
    cmp 168, D0
    ble set

    mov 7, D1
    cmp 192, D0
    ble set

    mov 8, D1
    cmp 216, D0
    ble set

    mov 9, D1
    cmp 240, D0
    ble set

    mov 0xA, D1


    set: ! CODE XREF: VEnc_GetQuantIndex+1Dj
    ! VEnc_GetQuantIndex+25j ...
    udf00 6, D2 ! apparently D2 = D2 * 6 (six steps per octave)
    add D1, D2 ! index = 6 * exponent + step
    cmp 0, D2
    bge to_check

    clr D2 ! clamp negative indexes to 0
    bra return


    to_check: ! CODE XREF: VEnc_GetQuantIndex+4Fj
    cmp 51, D2
    ble return

    mov 51, D2 ! clamp to the maximum index, 51


    return: ! CODE XREF: VEnc_GetQuantIndex+52j
    ! VEnc_GetQuantIndex+56j
    mov D2, D0 ! return the index in D0
    retf [D2], 4

    ! End of function VEnc_GetQuantIndex
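
    Assuming udf00 is a multiply here (D2 = D2 * 6), the routine looks equivalent to the following C reconstruction. Only the function name and constants come from the listing; the rest is my reading of it, so treat it as a sketch:

    /* Reads like an H.264-style QP index computation: six steps per
       doubling of the input, clamped to 0..51 (the 52-entry table
       lpowell mentioned above). */
    static int VEnc_GetQuantIndex(unsigned int v)
    {
        int exp = -1;
        for (unsigned int m = v >> 9; m != 0; m >>= 1)
            exp++;                    /* exp = floor(log2(v >> 9)) */

        v >>= exp + 2;                /* normalize into roughly 128..255 */

        int step;                     /* cutoffs from the listing */
        if      (v <= 137) step = 4;
        else if (v <= 152) step = 5;
        else if (v <= 168) step = 6;
        else if (v <= 192) step = 7;
        else if (v <= 216) step = 8;
        else if (v <= 240) step = 9;
        else               step = 10;

        int idx = 6 * exp + step;     /* udf00 assumed to be D2 *= 6 */
        if (idx < 0)  idx = 0;
        if (idx > 51) idx = 51;       /* the to_check clamp */
        return idx;
    }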