420, 422, 444, rescaling and colors flame
  • @Vitaliy - At the beginning of this thread, you were asking for logic/facts, so I tried to bring those in...

    Anyway, I'm not saying that, mathematically, you can't CREATE a 2K 444 10-bit file from a 4K 420 8-bit file. Mathematically, you can truncate the square root of 7 to, say, 3 decimal places (2.645). However, when you square that number you don't get 7 again. You get a number very close (6.996025), but it isn't 7. Is it close enough for most purposes? Sure, it rounds to 7. It's in the ballpark. But it isn't 7.

    Heck, you can create a 2K 444 10-bit file from a 2K 420 8-bit file. What I'm saying is that, for the most part, the converted 4K file won't have the same color values as a 2K 444 10-bit file captured directly by the same device. No more than you could get the same colors out of an 8-bit JPG file as there are in a 12+ bit RAW file. Will it be aesthetically acceptable? Probably. But not exact.

    Hey, I admit that for many purposes, you could call this "picking fly manure out of the pepper... pretty soon you end up with all fly manure and no pepper." But the point is still valid. Once color errors are baked in, they can't be corrected. Detail may be enhanced, but correct color can't be recovered. It is a fundamental concept in information theory.

    I guess I'm saying I understand where @Ze_Cahue is coming from. Technically, the 4K 420 8-bit transformation can be done and it will probably be pretty close to as good as 2K 444 10-bit converted directly from the sensor. And ~80% close is probably "good enough". However, I feel sorry for the colorist who has to match and then grade multiple clips from different sources done that way. Then again, that would be infinitely better than trying to match/grade multiple 420 8-bit sources... :-)

  • @GlueFactoryBJJ

    I think we have now gone around in circles for the third time.

    The whole concept is simple.

    • You have 4 buckets of water; each bucket can store 10 liters max.
    • If you now want to mix all of their contents, a 10-liter bucket may not be enough; you need a 40-liter one to be safe.
    • And it is clear that if all 4 buckets were empty, the result will be an empty bucket; if all were full, the result will be a bucket holding 40 liters of water.

    Now, these buckets are individual sensor pixels; suppose they are b/w (as we really do not need any extra complexity here). If you scale down from 4K to FHD, you mix 4 pixels into one (in reality, good algorithms are slightly more complex).

    If you want to produce an 8-bit result, you make result_bucket = (bucket1+bucket2+bucket3+bucket4)/4.
    If you want 10-bit, you just make result_bucket = bucket1+bucket2+bucket3+bucket4.
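
    As a minimal sketch of this bucket arithmetic in numpy (the frame size and the random test data are placeholders, not anything GH4-specific):

        import numpy as np

        # Hypothetical 4K 8-bit monochrome frame (random stand-in data).
        frame_8bit = np.random.randint(0, 256, (2160, 3840), dtype=np.uint16)

        # Sum each 2x2 block: four 0..255 values give one 0..1020 value,
        # which fits in the 10-bit range (0..1023).
        h, w = frame_8bit.shape
        blocks = frame_8bit.reshape(h // 2, 2, w // 2, 2)
        frame_10bit = blocks.sum(axis=(1, 3))  # 1080x1920, values 0..1020

        assert frame_10bit.max() <= 1023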

  • @GlueFactoryBJJ the 4k 8-bit pixel would act only as a host for the 2k 10-bit pixel. We are talking digital here; it can be a simple "copy and paste". All the info could be restored.

    One 10-bit pixel = 1024 levels

    One 8-bit pixel = 256 levels

    If we get four 8-bit pixels, each one can host 1/4 of the original data.

    So, the first 8-bit pixel would get the values between 0 and 255, the second gets 256 to 512, the third gets 513 to 768, and the fourth gets 769 to 1024. We retrieve all the 10-bit data exactly, just by reversing the process.

  • Things got distorted because there are 2 different discussions in here. One is the false capability to have true 10bit 2k 444 from an 8-bit 4k 420 (GH4 output). The other one is the possibility to get 10bit 2k with a GH4 firmware modification.

  • So, the first 8-bit pixel would get the values between 0 and 255, the second gets 256 to 512, the third gets 513 to 768, and the fourth gets 769 to 1024. We retrieve all the 10-bit data exactly, just by reversing the process.

    LOL

    Things got distorted because there are 2 different discussions in here. One is the false capability to have true 10bit 2k 444 from an 8-bit 4k 420 (GH4 output). The other one is the possibility to get 10bit 2k with a GH4 firmware modification.

    It is not "false capability ". I really lost my hope in you :-)

    Because you fail to understand basic things, made using most basic arguments and illustrations. And keep going rounds. Again, read my previous post and think. Get some paper and pencil, spend few minutes.

  • @Ze_Cahue - In essence, I think that is what Vitaliy is saying. As you noted, the converted 2K 444 10-bit isn't exactly true to the "RAW" data, but may frequently be close enough. I just don't think that any camera priced near a GH3/GH4 carries the CPU that could keep up with that kind of conversion (i.e. 4K 420 8-bit structured to produce a "perfect" 2K 444 10-bit).

    Experience with color casts in recorded video indicates, at least to me, that it isn't happening "perfectly", so there are compromises. We "know" that most sensors record at least 12-bit data (RAW photos), so a lot of data is being thrown away to get to 420 8-bit. That is why RAW is SO much more flexible (latitude) in post than compressed data.

    Regardless, IMO, Panasonic is doing one heck of a job making a silk purse out of the "pig's ear" (420 8-bit)! I just wish we could get the extra 4+ times the data recorded by the sensor in the GH3 (i.e. 422 10-bit).

    Then again, the choice between Canon's or Nikon's equivalently priced products vs the GH3 is, to me, a no-brainer. I already made it, the GH3. :)

  • We "know" that most sensors records at least 12-bit data (RAW photos), so a lot of data is being thrown away to get to 420 8-bit. That is why RAW is SO much more flexible (latitude) in post than compressed data.

    Huh. They are not "thrown" away (it is just not fully correct to tell this), this is linear (almost) raw data that are 12 bit converted to8-bit data that are non linear.

    So, as you read about S-Log and such, it is different way to convert SAME linear data to non linear representation.

    Having 10-bits and/or S-Log is useful if you plan to make grading, and the more heavy it is - the more important it is.

    In practice most of people loudly complaining on how they need 444 10-bit ProRes do not need it for their work.

  • I have to admit I can't follow you on all these kinds of questions.

    But I would like to ask a simple thing: is it possible to have a topic with a tutorial on how to get 1080p 10bit 4:4:4 from 4k 8bit 4:2:0? Some say it isn't possible, others say it is. But if it can't be done, all this discussion is just theory.

    (yes, I need a simple guide to make it :D)

  • I need the version for two-year-olds, I guess.

    What I was asking: is there a particular process to follow in our video editing software, or do we just put the 4k into a 1080p project? The latter sounds too optimistic, even for a two-year-old.

  • From what I gather, it's as simple as dropping a 4K file into a 2K comp and scaling the footage down 50%. There's some discussion as to whether more sophisticated downscaling will result in better files, but nothing conclusive as far as I can tell.

    It would be interesting if someone could shoot the same scene with same settings in 4K and 1080 so we can do a comparison of how the downsampled file grades vs. the native-shot 1080 file.

  • I think the buckets concept looks good on paper, but in practice one must allow for the fact that there are only 256 levels stored for each 8 bit channel (Y,U,V). When adding together, as Vitaliy has explained, we will end up with 1024 levels for each 10 bit channel. If we use the simplest example, bit 1 of the 256 available bits of the 8 bit channel, and add together, we get 4. The end result is that there is NO bit 1,2 or 3. The method only works when adjacent pixels have different values and can be added to produce these in-between values.

    My point is that the original information has already been sampled into 8 bit, so some 'averaging' of the information has been done and 'baked in' to the 8 bit recording. While we can produce an acceptable 10 bit result from an 8 bit recording, it will never be as accurate as a higher bit depth recorded directly by the camera. One thing that WILL work is the spatial resampling from 4:2:0 to 4:4:4 (see the sketch at the end of this post).

    So, summing up: it's a worthwhile idea, as it will improve grading flexibility, but a higher bit depth recording will always be more accurate and give better results. I plan to do an in-depth technical test on this concept to examine the potential artefacts produced.
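
    A minimal numpy sketch of that 4:2:0 → 4:4:4 spatial resampling point (the plane sizes follow the 4K 4:2:0 layout; the random data is only a stand-in):

        import numpy as np

        # Hypothetical 4K 4:2:0 frame: full-res luma, half-res chroma planes.
        Y = np.random.randint(0, 256, (2160, 3840), dtype=np.uint16)
        Cb = np.random.randint(0, 256, (1080, 1920), dtype=np.uint16)
        Cr = np.random.randint(0, 256, (1080, 1920), dtype=np.uint16)

        # Downscale the luma 2x by averaging each 2x2 block.
        Y_2k = Y.reshape(1080, 2, 1920, 2).mean(axis=(1, 3))

        # The 4:2:0 chroma planes were already 1920x1080, so they now pair
        # 1:1 with the downscaled luma: effectively 4:4:4 at 2K.
        assert Y_2k.shape == Cb.shape == Cr.shape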

  • If we use the simplest example, bit 1 of the 256 available bits of the 8 bit channel, and add together, we get 4. The end result is that there is NO bit 1,2 or 3.

    The thing written above makes no sense at all. Zero.

    The method only works when adjacent pixels have different values and can be added to produce these in-between values.

    It does not make ANY difference what the values are. It is the simplest, most basic math.

    What we see here is the result of modern education.

  • Too much "thinking" going on. We are talking about a major reduction in resolution, so why worry so much about all the adjacent pixels and "perceived" chroma change vs actual. It doesn't matter. Pixel data can be combined when downsampled (good) or averaged (not as good). If you went 4k 8bit-->2k 8bit then we could have a legitimate discussion on the ramifications of the unfavorable math.

    4k-->2k gives you greater bit depth as long as the headroom is there. Period. Nothing is lost, just combined. It can't be any other way, because the resolution is decreased (by 400%). If there are chroma issues, it's because the original 8-bit may be inaccurate, not because anything was lost on the way to 10-bit.

    What about native 2k 10bit? Would it be any different than a 4k 8bit-->2k 10bit? Probably a little bit yes, because of codec variables. I doubt it would be perceptible, but I've never done it so I don't know for sure.

  • @Vitaliy and others - I guess I must be really dense this week. I can't see how you can say that 2^12 gradations (4096) is the same as 2^8 (256), regardless of whether it is linear or not. I never said that 8 bit (log) couldn't cover the same DR, just that there are not as many data points to use to recover detail, especially in the shadow areas. Unless I am completely misunderstanding you, it is like saying that measurements in whole meters (8-bit) are as accurate as measurements in whole centimeters (12-bit).

    Exposure To The Right (ETTR, referring to the histogram, full description on the Luminous Landscape site) uses the log allocation of data points as the basis for the technique.

    I GET that, mathematically, you can get 10-bit values from 4 * 8-bit. Not a problem. However, let's take a more detailed look at the analogy you have used.

    Let's say you have 4K 10-bit data that is being converted into 8-bit data (as is the case with internal recording on the GH4). We have 4 "buckets", as you have described them, using 10-bit monochrome, as you did, to simplify the example. Here are the bucket values:

    Bucket 1 = 1020, Bucket 2 = 1021, Bucket 3 = 1022, Bucket 4 = 1023

    Now let's convert these to 8-bit "equivalents".

    Bucket 1 = 255, Bucket 2 = 255, Bucket 3 = 255, Bucket 4 = 255

    Now let's convert these 8-bit equivalents to 2K 10-bit, using your additive method.

    255+255+255+255 = 1020

    Now let's convert the original 10-bit values to a 2K 10-bit value.

    (1020+1021+1022+1023)/4 = 1021.5 ~ 1022.

    Ok, that's only about a 0.2% error. However, when we look at COLOR values, the interaction of Cb and Cr can result in color changes that are more noticeable.

    Hey, it is close, but not exact. That is why JPGs always look different from RAW files.
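
    A quick sketch of that arithmetic, to make the round trip explicit:

        # The worked example above: four 10-bit values near clipping.
        orig = [1020, 1021, 1022, 1023]

        as_8bit = [v // 4 for v in orig]  # quantize to 8-bit: all become 255
        summed = sum(as_8bit)             # additive 10-bit reconstruction: 1020

        direct = round(sum(orig) / 4)     # downscale before quantizing: 1022

        print(summed, direct)             # 1020 vs 1022 -> the ~0.2% error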

  • That is why JPGs always look different from RAW files.

    Please ask people to sit down before reading this statement, as it is dangerous to their health if they are standing.

    Let me rephrase it a little for you to understand the absurdity:

    "That is why roast beef always looks different from cows."

  • @GlueFactoryBJJ Your example about the square root does not really apply to calculations like bit depth. If you are "holding" 12 places, the rounding will reproduce exactly the same result; it is only when you are adding places that you get a different result. Cameras (and audio) do not add more than a few floating-point places, which are finite. However, if you are adding random-number dither, you will of course get a different result. That is the nature of dither.

    @Vitaliy rofl

  • @GlueFactoryBJJ

    Let's try again. A simple case again: only b/w and only a 2x2 image.

    We have raw data; let it be a1, a2, a3, a4. Raw data is linear 12-bit data.

    The resulting 8-bit values will be b1=f(a1), b2=f(a2), b3=f(a3) and b4=f(a4),
    where f(x) is a non-linear transform.
    The usual approach is to try to preserve mid-tones more accurately and to compress shadows and highlights more :-)
    f(x) changes according to camera settings.

    With rescaling, all we get is a single element b. The simplest method is to make b = b1+b2+b3+b4 (remember! we do not have a1..a4 available!).
    As is quite clear to see, f(a1+a2+a3+a4) is generally not equal to b = f(a1)+f(a2)+f(a3)+f(a4).
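
    A tiny sketch of that inequality, with a made-up gamma-style curve standing in for f(x) (the constants are purely illustrative):

        import numpy as np

        # Made-up transfer curve: 12-bit linear in, 8-bit non-linear out.
        def f(a):
            return np.round(255 * (a / 4095) ** (1 / 2.2))

        a = np.array([100, 900, 2000, 4000])  # arbitrary 12-bit linear samples

        # Transform the downscaled (averaged) value vs sum the transformed
        # values; both scaled to the same 10-bit range for comparison.
        direct = 4 * f(a.mean())  # ~692
        summed = f(a).sum()       # ~611

        print(direct, summed)     # generally not equal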

  • @Vitaliy Your mathematics is very good. But this application of math to produce a 10bit image will not eliminate banding in 8bit images, such as skies with very gradual luminance and chroma variation.

    Criticising my education when you have no knowledge of me or my background is not very polite. It is easy to slam people; I have tried to avoid that. Most people on this thread have no idea how digital sampling works or how digital YCbCr signals work.

    We do not have the raw data in 12bit. We have 4k 8bit 4:2:0, which we want to turn into 1920x1080 10bit 4:4:4.

    Consider the case of a graduated 8-bit luminance signal, running from left to right (pixel 1 to pixel 4096, for example), which varies by only 2 discrete 8-bit 'steps', i.e. 235 to 236, across the full width of the image. The mathematical transform will perfectly preserve those same values in a 10-bit image, as there is no variation in the math until the value changes from 235 to 236. The banding is perfectly preserved. (A quick numerical check is sketched at the end of this post.)

    The colour accuracy and sharpness of the colour channels will improve due to the size reduction to a 1920x1080 image. But bit depth is about how many discrete levels of brightness of the Y signal are stored, and about the accuracy of the vector information stored in the CbCr channels.

    It's not mathematics, it's geometry!

    If you can make your math solve the problem of having to use higher bit depths to capture more information, you are likely to have many dollars coming your way, as this is what all the major imaging companies have been trying to achieve since the beginning of digital imaging.

    BTW, I did NOT have a 'modern' education.
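
    A quick numerical check of the banding example above (a two-band 8-bit image summed down to 10-bit; the sizes are arbitrary):

        import numpy as np

        # 8-bit image: 235 on the left half, 236 on the right.
        img = np.full((4, 4096), 235, dtype=np.uint16)
        img[:, 2048:] = 236

        # 2x2 block sum -> 10-bit values.
        out = img.reshape(2, 2, 2048, 2).sum(axis=(1, 3))

        print(np.unique(out))  # [940 944]: still two flat bands, no new levels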

  • VK, please stop being so condescending/sarcastic and try to help others see where they're supposedly wrong. I, too, think that by converting the original 12-bit data off the ADC into 8-bit, you have already discarded information which can never be recovered, only "guessed" at with a clever algorithm.

    That f(a1+a2+a3+a4) is not equal to f(a1)+f(a2)+f(a3)+f(a4) is exactly why going from 4K 12-bit to 4K 8-bit and then back to 2K 10-bit will not yield results as accurate as going directly from 4K 12-bit to 2K 10-bit.

  • Maybe I am missing something, but in my mind the most important consideration here is that the resolution drops to a quarter from 4k to 2k. It is impossible to get a perfect downsample no matter how many bits. Even if you went 4k 10bit to 2k 10bit, there would be inaccuracies in the data because many pixels are gone. Is averaging pixel data from 4k to 2k at equal bit depth any different than adding the 8bit values to get a 10bit value? I doubt the difference would be noticeable.

    The more important discussion is how much of a benefit we will see in 10bit 2k, not whether or not it is "true" 10bit.

  • @Mistas

    The more important discussion is how much of a benefit we will see in 10bit 2k, not whether or not it is "true" 10bit.

    Using the methods employed so far, simple spatial filtering, the answer is "not much". The things you'd really like it to just solve for you, like poor gradation, compression artifacts and ugly digital noise, are left intact during the reduction. You gain a little back from the color sub-sampling, but there are cleverer methods to achieve pseudo 4:4:4 from 4:2:0 without changing resolution.

  • @GlueFactoryBJJ: I never understood the "255" thing. 0 IS a value, so there are 256 of them.

    @caveport: "It's not mathematics, it's geometry!" -> You made my day.

    Beware of downscaling: higher bit depths have special properties like super-white and super-black that can be clamped. With 8 bit you'll never be able to retrieve this information, no matter how many downscales you perform. Burned highlights in 8 bit are burned highlights at every bit depth you convert to. The same goes for line skipping, although there exist methods to attenuate the effect in the sky (like adding a thin layer of grain and then performing a denoise to mix and smooth the pixels).

    Otherwise, Vitaliy pretty much summed the thing up with the kid version. It's a bit more technical in reality, but you get the picture.

  • Criticising my education when you have no knowledge of me or my background is not very polite. It is easy to slam people; I have tried to avoid that. Most people on this thread have no idea how digital sampling works or how digital YCbCr signals work.

    I did not touch you or your background. I just stated an obvious thing about the general education level and the level of the discussion. And, in fact, I never touched on "digital sampling" or "YCC".

    VK, please stop being so condescending/sarcastic and try to help others see where they're supposedly wrong. I, too, think that by converting the original 12-bit data off the ADC into 8-bit, you have already discarded information which can never be recovered, only "guessed" at with a clever algorithm.

    So, let's read CAREFULLY what was written:

    We have raw data; let it be a1, a2, a3, a4. Raw data is linear 12-bit data.
    The resulting 8-bit values will be b1=f(a1), b2=f(a2), b3=f(a3) and b4=f(a4), where f(x) is a non-linear transform. The usual approach is to try to preserve mid-tones more accurately and to compress shadows and highlights more :-)

    As is quite clear to see, f(a1+a2+a3+a4) is generally not equal to b = f(a1)+f(a2)+f(a3)+f(a4).

    We are NOT talking about recovering the original raw data for individual pixels of the 4K image.

    We are talking about the resulting 10-bit values after rescaling. Are they correct? Are they really 10-bit data? Yep. Are they equal to the data you would get by first rescaling in raw and applying the transform later? Generally, no. But they are pretty close, and if you know how to properly set the camera and use so-called flat picture profiles, you will get pretty accurate data.

    @caveport

    Consider the case of a graduated 8-bit luminance signal, running from left to right (pixel 1 to pixel 4096, for example), which varies by only 2 discrete 8-bit 'steps', i.e. 235 to 236, across the full width of the image. The mathematical transform will perfectly preserve those same values in a 10-bit image, as there is no variation in the math until the value changes from 235 to 236. The banding is perfectly preserved.

    We have an issue here.

    Again: we are NOT talking about recovering the original raw, or even 10-bit non-linear, data for individual pixels of the 4K image. We are talking about the image after rescaling.

    Let's reverse the example to make the big logic flaw easy to understand. Suppose you have X (between 1 and 1024) small balls in one big bucket. Now you get another 4 buckets, each exactly 1/4 the size of the big one (let's call them bad banding buckets :-) ). You put your hand into the big bucket, take out one ball, and place it randomly (with exactly equal probability!) into one of the 4 small buckets.
    After you have moved all the balls, it is clear that it is all messy banding shit.
    Now let's reverse the process. Put all the balls from the small buckets back into the large one. To our surprise, we again have the same X balls, and as X is arbitrary, it is clear that we preserved all the information and strangely got real 10-bit from horrible 8-bit.
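
    The ball shuffling above, in a few lines of numpy (the uniform random draw just stands in for however the encoder happens to split the values):

        import numpy as np

        rng = np.random.default_rng()

        # X balls in the big bucket (an arbitrary 10-bit value).
        X = int(rng.integers(1, 1025))

        # Drop each ball into one of the 4 small buckets with equal probability.
        buckets = rng.multinomial(X, [0.25] * 4)

        # Recombine: the sum is exactly X again, whatever X was.
        assert buckets.sum() == X
        print(X, buckets, buckets.sum())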

  • To check some practical results, I used ffmpeg to convert some GH4 4k samples to yuv444 and to scale them down to 1920x1080. While the downsampled images looked pretty good, I was shocked to see how little hardware support there is for yuv444: there is "theoretical" support in HW acceleration APIs like vdpau and vaapi, but none of the computers I own has a GPU that actually supports yuv444 overlay visuals.

    So the only way to play back and watch the yuv444-encoded video at full quality was to use the RGB "x11" display driver, which wastes a lot of CPU cycles.
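
    For reference, the conversion described here can be done with a command along these lines (flags and encoder choice will vary by build; the file names are placeholders):

        ffmpeg -i gh4_4k_sample.mp4 -vf scale=1920:1080 -pix_fmt yuv444p -c:v libx264 -crf 18 gh4_1080_444.mp4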