VideoCommon: Add support for unrestricted depth range. #13100

CrossVR · 2024-10-04T23:08:03Z

This PR uses the unrestricted depth range extension to achieve the following:

- Remove the need for normalization of the [0, 2^24) depth value range, using VK_EXT_depth_range_unrestricted and GL_NV_depth_buffer_float.
- Natively support oversized depth ranges while still clamping the result to the [0, 2^24) range, using VK_EXT_depth_range_unrestricted and VK_EXT_depth_clamp_control.

This is mostly a cleanup that provides a code path without normalization; it was not intended as an accuracy fix. However by removing normalization we have an opportunity to influence rounding behavior, which fixes the last known issue with fast depth.

CrossVR · 2024-10-05T02:09:21Z

There seems to be an issue with the lavapipe implementation; these results do not reproduce on Nvidia. I'll investigate the driver bug.

Pokechu22 · 2024-10-05T03:44:29Z

> However by removing normalization we have an opportunity to influence rounding behavior, which fixes the last known issue with fast depth.

Which issue does this fix? https://bugs.dolphin-emu.org/issues/13633 seems to still be present on my Nvidia GPU when using fast depth but not with fast depth disabled.

JMC47 · 2024-10-05T04:49:13Z

You might need a modified driver that supports it.

Pokechu22 · 2024-10-05T05:21:01Z

I see an Available extension: VK_EXT_depth_range_unrestricted line... but I'm missing VK_EXT_depth_clamp_control. (I do have VK_EXT_depth_clamp_zero_one but that doesn't seem useful here).

CrossVR · 2024-10-05T12:18:36Z

Correct, you need a driver with VK_EXT_depth_clamp_control support, or else we can't natively support oversized depth ranges. Though given that removing the normalization can fix the Pokemon Channel, I could add a code path that only removes the normalization while still using the vertex depth range hack to support oversized depth ranges.

CrossVR · 2024-10-05T14:34:06Z

I've added a code path that fixes the Pokemon Channel and doesn't include the native oversized depth range handling. As a bonus this also allows us to support this case on OpenGL since it also has an unrestricted depth range extension.
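The three cases discussed so far (no extension, unrestricted depth range only, and unrestricted depth range plus clamp control) can be sketched as a small selection function. This is only an illustration and not the code in this PR: the `DepthPath` enum, `HasExtension` and `SelectDepthPath` are hypothetical names, and only the two extension name strings come from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names for the three code paths discussed above.
enum class DepthPath
{
  NormalizedWithVertexHack,    // neither extension: [0, 1] range plus the vertex depth range hack
  UnnormalizedWithVertexHack,  // unrestricted range only: no normalization, hack still used
  NativeOversizedRange,        // unrestricted range and clamp control: no hack needed
};

// Returns true if `name` appears in the list of available extension names.
static bool HasExtension(const std::vector<std::string>& available, const std::string& name)
{
  return std::find(available.begin(), available.end(), name) != available.end();
}

// Picks a path from the device's extension list (assumed to be queried elsewhere).
DepthPath SelectDepthPath(const std::vector<std::string>& available)
{
  const bool unrestricted = HasExtension(available, "VK_EXT_depth_range_unrestricted");
  const bool clamp_control = HasExtension(available, "VK_EXT_depth_clamp_control");

  if (unrestricted && clamp_control)
    return DepthPath::NativeOversizedRange;
  if (unrestricted)
    return DepthPath::UnnormalizedWithVertexHack;
  return DepthPath::NormalizedWithVertexHack;
}
```

A driver that reports VK_EXT_depth_range_unrestricted and VK_EXT_depth_clamp_zero_one but not VK_EXT_depth_clamp_control, as described earlier in the thread, would land on the middle path in this sketch.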

Pokechu22 · 2024-10-07T05:15:36Z

On startup I get a failed VMA assertion. If I ignore it, everything seems to work nicely, though.

I don't entirely understand why a range of [0, 1) functions differently from a range of [0, 2^24). I'd expect both of them to behave the same with regards to floating-point precision (and if we wanted floats to not be able to represent any values other than those corresponding to integers in [0, 2^24), we'd need to map it into a range where values are only in the mantissa, so e.g. using [1, 2)). I suspect there's more to this I just don't understand though.

OatmealDome · 2024-10-07T05:47:26Z

btw, the VMA assertion is unrelated and was fixed in #13103.

Pokechu22 · 2024-10-07T05:53:58Z

Ah, I'd assumed it was related to the Vulkan-Headers update. If I rebase on master then the VMA assertion is indeed fixed.

CrossVR · 2024-10-07T15:45:46Z

@Pokechu22 Initially this PR was aimed at natively supporting oversized depth ranges where the benefit is obvious. We can definitely discuss some alternative normalization schemes, especially if that'll resolve the Pokemon Channel FIFO without the need for a depth bias.

CrossVR · 2024-10-07T16:34:31Z

@Pokechu22 To represent 24-bit integers shouldn't the range be [1, 4) rather than [1, 2) given that the mantissa is 23-bit and not 24-bit?

CrossVR · 2024-10-07T16:44:11Z

@dolphin-emu-bot rebuild

Pokechu22 · 2024-10-07T16:50:34Z

> To represent 24-bit integers shouldn't the range be [1, 4) rather than [1, 2) given that the mantissa is 23-bit and not 24-bit?

Ah yes, I thought it was storing 24 bits in addition to the implicit 1 bit, but it's actually 24 bits including that 1 implicit bit.
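The bit counts in this exchange can be checked with a small sketch, assuming the host `float` is IEEE 754 binary32. The helper names (`FloatBits`, `CountFloatsInRange`, `RoundTripsThroughFloat`) are made up for illustration. It counts the distinct floats in a positive range by subtracting bit patterns and checks which integers survive a round trip through `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// 24 significand bits: 23 stored bits plus the implicit leading 1.
static_assert(std::numeric_limits<float>::digits == 24, "expects IEEE 754 binary32");

// Reinterprets a float's bit pattern as an unsigned integer.
static std::uint32_t FloatBits(float value)
{
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Number of distinct floats in [lo, hi) for positive, finite lo <= hi.
// Positive floats sort in the same order as their bit patterns.
std::uint32_t CountFloatsInRange(float lo, float hi)
{
  assert(lo > 0.0f && lo <= hi);
  return FloatBits(hi) - FloatBits(lo);
}

// True if the integer n is exactly representable as a float.
bool RoundTripsThroughFloat(std::uint32_t n)
{
  return static_cast<std::uint32_t>(static_cast<float>(n)) == n;
}
```

Under that assumption, [1, 2) holds 2^23 floats and [1, 4) holds 2^24, although the spacing between adjacent floats doubles at 2.0, so the values in [1, 4) are not evenly spaced. Every integer up to 2^24 round-trips, and 2^24 + 1 is the first one that does not.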

I'm out of my depth (so to speak :P) on this subject and I don't think I can provide any useful feedback here. If others who are more familiar with how we handle depth are happy with this PR, then it's fine by me. I'll need to do some more reading about what fast depth does compared to the old approach, because I was under the impression that it was trying to store data into the mantissa like that already. I'm pretty sure it's been covered in one of the progress reports already.

I'm also not sure where the oversized depth ranges situation comes up in games. I'm guessing it's something that's also mentioned in a progress report and I just need to read some more.

CrossVR · 2024-10-07T17:02:33Z

@Pokechu22 Unfortunately, I don't think there has been a progress report that neatly explains everything. A lot of the knowledge about the depth buffer was built up over multiple progress reports.

An outside perspective on how to solve this issue without any regard for the currently implemented solutions would actually be very helpful, so let me try and give a concise explanation of the problem:

- The GameCube/Wii depth buffer is a 24-bit unsigned integer buffer.
- Depth values are stored in this depth buffer using the following equation: farZ + (z/w) * zRange
- The farZ and zRange parameters are floating point numbers and are unbounded, so the result of that equation can far exceed what the depth buffer supports (they can even be negative).
- The final result of that equation is floored and clamped to the [0, 2^24) integer range before being written to the depth buffer.

NOTE: There is no special fixed-point conversion; farZ and zRange are not normalized and are usually both set to 2^24-1, with z being a negative value.
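As a rough sketch of the behavior described above (not code from this PR), the value written for a single sample could be computed as follows. `ConsoleDepth` and `MAX_DEPTH` are hypothetical names, and the sketch assumes finite inputs and a non-zero w.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr float MAX_DEPTH = 16777215.0f;  // 2^24 - 1, the largest 24-bit value

// Evaluates farZ + (z/w) * zRange, then floors and clamps the result to the
// [0, 2^24) integer range, as described in the list above.
std::uint32_t ConsoleDepth(float far_z, float z_range, float z, float w)
{
  assert(w != 0.0f);
  const float depth = far_z + (z / w) * z_range;
  const float clamped = std::clamp(std::floor(depth), 0.0f, MAX_DEPTH);
  return static_cast<std::uint32_t>(clamped);
}
```

With the usual parameters (farZ and zRange both 2^24-1), a z/w of -1 gives 0 and a z/w of 0 gives 2^24-1. Oversized or negative parameters are simply clamped at the end.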

Trying to implement this behavior accurately without rounding errors within the constraints of modern graphics APIs has been challenging, especially since many of them are constrained to a [0, 1] depth range by default.

However, we can take advantage of two Vulkan extensions (and one OpenGL extension): VK_EXT_depth_range_unrestricted and VK_EXT_depth_clamp_control. The former lets us use a depth range beyond [0, 1], and the latter lets us clamp the final depth value to any range we want.

Another possible thing to take advantage of is that VK_EXT_depth_range_unrestricted defines behavior very similar to the GameCube/Wii, namely that for fixed-point depth buffers (like D24_UNORM) depth values beyond [0, 1] are clamped to that range. So this would in theory be perfect: a 24-bit unsigned integer depth buffer that supports an unrestricted depth range, with depth values clamped to [0, 1] without the use of another extension. The problem, however, is the normalization: dividing farZ and zRange in the depth equation by 2^24-1 and then having the GPU multiply the resulting value by 2^24-1 when writing to the depth buffer will result in rounding errors.
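One way to explore the normalization problem on the CPU is to evaluate the depth equation twice in single precision, once directly and once with farZ and zRange divided by a scale and the result multiplied back, and count how often the two land on different integers. The sketch below is only an approximation: `DepthEquation` and `CountNormalizationMismatches` are hypothetical names, and the final `floor` is a stand-in for the GPU's fixed-point conversion, which real hardware performs in its own way. No particular count is claimed for a scale of 2^24-1. A power-of-two scale only changes the exponent, so it should report zero mismatches.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Depth equation from the explanation above, with z_ndc standing in for z/w.
static float DepthEquation(float far_z, float z_range, float z_ndc)
{
  return far_z + z_ndc * z_range;
}

// Counts samples where the normalized evaluation (parameters divided by
// `scale`, result multiplied back by `scale`) gives a different integer than
// the direct evaluation.
std::uint32_t CountNormalizationMismatches(float far_z, float z_range, float scale,
                                           std::uint32_t samples)
{
  assert(scale > 0.0f);
  std::uint32_t mismatches = 0;
  for (std::uint32_t i = 0; i < samples; ++i)
  {
    // z_ndc sweeps [-1, 0), matching the negative z values mentioned above.
    const float z_ndc = -1.0f + static_cast<float>(i) / static_cast<float>(samples);
    const float direct = std::floor(DepthEquation(far_z, z_range, z_ndc));
    const float normalized = DepthEquation(far_z / scale, z_range / scale, z_ndc);
    // Stand-in for the fixed-point conversion done when writing the buffer.
    const float restored = std::floor(normalized * scale);
    if (restored != direct)
      ++mismatches;
  }
  return mismatches;
}
```

Calling it with `scale` set to 16777215.0f and to 16777216.0f for the same parameters gives a quick comparison of the two normalization choices.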

dolphin-ci · 2024-10-07T17:30:18Z

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

sw3-dt on ogl-lin-mesa: diff
aeon-charge-attack on vk-lin-mesa: diff
burnout2-vehicletextures on vk-lin-mesa: diff
chibi-robo-zfighting on vk-lin-mesa: diff
dbz-depth on vk-lin-mesa: diff
DKCR-Char on vk-lin-mesa: diff
ea-pink on vk-lin-mesa: diff
ed-updated on vk-lin-mesa: diff
inverted-depth-range on vk-lin-mesa: diff
kirby-logicop on vk-lin-mesa: diff
lego-star-wars-crane-shadow on vk-lin-mesa: diff
metroid-visor on vk-lin-mesa: diff
mp3-bloom on vk-lin-mesa: diff
nsmbw-intro on vk-lin-mesa: diff
pbr-sfx on vk-lin-mesa: diff
pm-hc-jp on vk-lin-mesa: diff
pokemon-channel-tv on vk-lin-mesa: diff
rs2-glass on vk-lin-mesa: diff
rs2-skybox on vk-lin-mesa: diff
rs3-bumpmapping on vk-lin-mesa: diff
sf-assault-flashing on vk-lin-mesa: diff
spyro-depth on vk-lin-mesa: diff
sw3-dt on vk-lin-mesa: diff
tla-menu on vk-lin-mesa: diff
tsp3-pinkgrass on vk-lin-mesa: diff

iwubcode · 2024-10-07T22:50:59Z

Source/Core/VideoBackends/Vulkan/VulkanContext.cpp

@@ -676,6 +678,13 @@ bool VulkanContext::SelectDeviceExtensions(bool enable_surface)
  AddExtension(VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME, false);
  AddExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME, false);

+  if (!DriverDetails::HasBug(DriverDetails::BUG_BROKEN_D32F_CLEAR))


Multiline if statements should use {}

iwubcode

Code LGTM. Minor testing with some potential z-fighting during a FreeLook movement, no changes. I currently don't have VK_EXT_depth_clamp_control but will update with notes if I go through and perform a driver update (always hesitant to do that).

Pokechu22 · 2024-10-08T06:13:21Z

Hmm, I guess I've been thinking about the wrong problems (mainly I was thinking about what happens if the GPU interpolates the depth value between vertices and it produces a floating-point value that's more precise than a 24-bit fixed-point value, but using D24_UNORM seems like it would avoid that).

Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication? ... Is that the entire premise of this PR?

> Minor testing with some potential z-fighting during a FreeLook movement, no changes.

I remember there being some issues like this with e.g. Lloyd.dff which I think were caused by updating the freelook state on every new projection matrix, meaning different freelook values were used throughout the frame. I wasn't able to reproduce that currently though, so I'm not sure if that's actually the cause.

iwubcode · 2024-10-08T06:55:40Z

> Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication?

Sorry if I'm speaking out of line, I wouldn't consider myself an expert either. But I'll see if I can answer and @CrossVR can always correct me if I'm wrong. One of the advantages of this PR is to avoid the restriction most (all?) modern graphics APIs have by default which is to have the depth range in [0, 1]. By having it unrestricted, we don't need to do the multiply/divides to try and get the depth into that range and therefore avoid the precision errors it can entail.

(I'm not sure about the viewport question. I remember reading up on that when working on post-processing depth logic but can't recall the specifics. I may be misremembering, but I think it allowed you to set the near/far plane; even so, that still puts depth in the 0...1 range and doesn't give you any more precision.)

> I remember there being some issues like this with e.g. Lloyd.dff which I think were caused by updating the freelook state on every new projection matrix, meaning different freelook values were used throughout the frame.

Interesting, I could see that or something similar. I'll have to look into that more. This FIFO log was a Fire Emblem one that a user gave me when describing this problem; I don't think it's on FifoCI (but something similar may be). The shadows disappear as you move around. It's completely possible that it isn't z-fighting and is instead something with the projection matrix, but I recall someone describing something else as z-fighting and I just sort of attributed it to that.

CrossVR · 2024-10-08T11:33:48Z

> Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication? ... Is that the entire premise of this PR?

With a D24_UNORM depth buffer that multiplication is hardwired into the GPU. It is how it does its fixed point conversion. The way we currently avoid that is by switching to a D32_FLOAT depth buffer which does not do any fixed-point conversion and simply stores the float depth value directly. However we still need to stay within the [0, 1] depth range.

To do that we divide the farZ and zRange by 2^24 instead since that won't result in rounding errors. And then we make sure we never set the viewport depth range to anything larger than (2^24-1) / 2^24, to avoid getting depth values that are beyond the 24-bit integer range.
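A small sketch of that scheme, with hypothetical names (`DEPTH_SCALE`, `MAX_NORMALIZED_DEPTH`, `NormalizeDepth`, `DenormalizeDepth`) and assuming IEEE 754 single precision: dividing by 2^24 only adjusts the exponent, so values round-trip exactly, and the largest allowed viewport depth maps back to 2^24-1.

```cpp
#include <cassert>

constexpr float DEPTH_SCALE = 16777216.0f;                         // 2^24
constexpr float MAX_NORMALIZED_DEPTH = 16777215.0f / DEPTH_SCALE;  // (2^24 - 1) / 2^24

// Maps a console-scale depth parameter into the host's [0, 1] range.
constexpr float NormalizeDepth(float console_depth)
{
  return console_depth / DEPTH_SCALE;
}

// Maps a normalized depth back to the console's integer scale.
constexpr float DenormalizeDepth(float normalized_depth)
{
  return normalized_depth * DEPTH_SCALE;
}

// The power-of-two scale round-trips the largest 24-bit value exactly.
static_assert(DenormalizeDepth(NormalizeDepth(16777215.0f)) == 16777215.0f);
// The limit stays below 1.0, which would map to 2^24 and fall out of range.
static_assert(MAX_NORMALIZED_DEPTH < 1.0f);
static_assert(DenormalizeDepth(1.0f) == 16777216.0f);
```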

Getting rid of that divisor entirely is indeed one of the two premises of this PR. However you are not wrong about the increased precision being an issue. That is also definitely still an issue and it has been an issue since we switched from D24_UNORM to D32_FLOAT.

iwubcode · 2024-10-08T15:29:15Z

Thanks for the details Cross. A couple more questions from me.

> To do that we divide the farZ and zRange by 2^24 instead since that won't result in rounding errors.

I thought floating point didn't match the emulated GPU. Therefore the emulator largely avoids floating point math. I assumed we were using unrestricted depth to avoid the divide/multiply and avoid the issues that occurred trying to replicate it. If it's not due to that, what is the reason for using unrestricted depth? Just an optimization?

> However you are not wrong about the increased precision being an issue. That is also definitely still an issue and it has been an issue since we switched from D24_UNORM to D32_FLOAT.

I don't do well with this sort of low level stuff. Why is the value being more precise an issue? ~~And do you recall - why did we decide to switch to D32_FLOAT?~~ Ah, right, you said that. We use it because we can just store the float directly.

CrossVR · 2024-10-08T15:56:00Z

> I thought floating point didn't match the emulated GPU. Therefore the emulator largely avoids floating point math.

At the vertex processing stage much of the GPU does use floating point math. It's at the pixel processing stage that the GPU is largely using integers. Our issue lies at the boundary between those two stages.

> I assumed we were using unrestricted depth to avoid the divide/multiply and avoid the issues that occurred trying to replicate it. If it's not due to that, what is the reason for using unrestricted depth? Just an optimization?

The primary reason for unrestricted depth is to accurately handle oversized depth ranges. Our current solution involves scaling and offsetting the z value in the vertex shader, but this once again results in rounding errors. By using an unrestricted depth range we can avoid rounding errors in the case where the depth range is oversized.

Getting rid of the divisor was just a code cleanup: it's easier to reason about the code when we don't have to constantly divide and multiply depth values and deal with odd limits like (2^24-1) / 2^24. The fact that the pokemon-channel-tv FIFO also seems to benefit from this change by applying a small bias to the depth value was a nice extra, but was actually unexpected.

> I don't do well with this sort of low level stuff. Why is the value being more precise an issue?

Imagine the scenario where the depth equation results in a value of 5.0 and the value in the floating point depth buffer is 5.5. If the depth test is set to EQUALS the depth test would fail here because 5.0 != 5.5. Whereas an accurate emulation of the GPU would've truncated the 5.5 to 5 when writing the value to the integer depth buffer, thus the depth test would've passed in that case.
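The scenario above can be written out as two comparison functions. This is only an illustration with hypothetical names, and it assumes both values are already within [0, 2^24).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Host-style EQUAL test on a float depth buffer: fractional parts are kept.
bool DepthTestEqualFloat(float incoming, float stored)
{
  return incoming == stored;
}

// Console-style EQUAL test: both values are floored to integers first, as an
// integer depth buffer would have done when the value was written.
bool DepthTestEqualInteger(float incoming, float stored)
{
  const auto to_int = [](float v) { return static_cast<std::uint32_t>(std::floor(v)); };
  return to_int(incoming) == to_int(stored);
}
```

For an incoming value of 5.0 and a stored value of 5.5, the float comparison fails while the integer comparison passes.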

CrossVR force-pushed the unrestricted-depth-range branch 3 times, most recently from 8a3e4e1 to c982fb1 (October 5, 2024 01:19)

CrossVR force-pushed the unrestricted-depth-range branch from 3d5c995 to 1f3ccf4 (October 5, 2024 14:32)

CrossVR force-pushed the unrestricted-depth-range branch from 1f3ccf4 to 1f63117 (October 5, 2024 14:37)

CrossVR added 4 commits October 5, 2024 16:46

VideoCommon: Add support for unrestricted depth range.

048233b

VideoCommon: Always utilize unrestricted depth range to get rid of normalization.

a73abcb

Vulkan: Don't use an unrestricted depth range on a 24-bit depth buffer.

1f2612b

OGL: Add unrestricted depth range support.

4d1000d

CrossVR force-pushed the unrestricted-depth-range branch from 1f63117 to 4d1000d (October 5, 2024 14:46)

iwubcode reviewed Oct 7, 2024

iwubcode approved these changes Oct 7, 2024