VideoCommon: Add support for unrestricted depth range. #13100

Open
wants to merge 4 commits into
base: master
from

Conversation

CrossVR
Contributor

@CrossVR CrossVR commented Oct 4, 2024

This PR uses the unrestricted depth range extension to achieve the following:

  • Remove the need for normalization of the [0, 2^24) depth value range using VK_EXT_depth_range_unrestricted and GL_NV_depth_buffer_float.
  • Natively support oversized depth ranges while still clamping the result to the [0, 2^24) range using VK_EXT_depth_range_unrestricted and VK_EXT_depth_clamp_control.

This is mostly a cleanup: it provides a code path that removes normalization, and it was not intended as an accuracy fix. However, by removing normalization we have an opportunity to influence rounding behavior, which fixes the last known issue with fast depth.

@CrossVR CrossVR force-pushed the unrestricted-depth-range branch 3 times, most recently from 8a3e4e1 to c982fb1 Compare October 5, 2024 01:19
@CrossVR
Contributor Author

CrossVR commented Oct 5, 2024

There seems to be an issue with the lavapipe implementation; these results do not reproduce on Nvidia. I'll investigate the driver bug.

@Pokechu22
Contributor

However by removing normalization we have an opportunity to influence rounding behavior, which fixes the last known issue with fast depth.

Which issue does this fix? https://bugs.dolphin-emu.org/issues/13633 seems to still be present on my Nvidia GPU when using fast depth but not with fast depth disabled.

@JMC47
Contributor

JMC47 commented Oct 5, 2024

You might need a modified driver that supports it.

@Pokechu22
Contributor

I see an Available extension: VK_EXT_depth_range_unrestricted line... but I'm missing VK_EXT_depth_clamp_control. (I do have VK_EXT_depth_clamp_zero_one but that doesn't seem useful here).

@CrossVR
Contributor Author

CrossVR commented Oct 5, 2024

Correct, you need a driver with VK_EXT_depth_clamp_control support, or else we can't natively support oversized depth ranges. Though given that removing the normalization can fix the Pokemon Channel, I could add a code path that only removes the normalization while still using the vertex depth range hack to support oversized depth ranges.

@CrossVR
Contributor Author

CrossVR commented Oct 5, 2024

I've added a code path that fixes the Pokemon Channel and doesn't include the native oversized depth range handling. As a bonus this also allows us to support this case on OpenGL since it also has an unrestricted depth range extension.

@Pokechu22
Contributor

image

On startup I get a failed VMA assertion. If I ignore it, everything seems to work nicely, though.

I don't entirely understand why a range of [0, 1) functions differently from a range of [0, 2^24). I'd expect both of them to behave the same with regards to floating-point precision (and if we wanted floats to not be able to represent any values other than those corresponding to integers in [0, 2^24), we'd need to map it into a range where values are only in the mantissa, so e.g. using [1, 2)). I suspect there's more to this I just don't understand though.

@OatmealDome
Member

btw, the VMA assertion is unrelated and was fixed in #13103.

@Pokechu22
Contributor

Ah, I'd assumed it was related to the Vulkan-Headers update. If I rebase on master then the VMA assertion is indeed fixed.

@CrossVR
Contributor Author

CrossVR commented Oct 7, 2024

@Pokechu22 Initially this PR was aimed at natively supporting oversized depth ranges where the benefit is obvious. We can definitely discuss some alternative normalization schemes, especially if that'll resolve the Pokemon Channel FIFO without the need for a depth bias.

@CrossVR
Contributor Author

CrossVR commented Oct 7, 2024

@Pokechu22 To represent 24-bit integers shouldn't the range be [1, 4) rather than [1, 2) given that the mantissa is 23-bit and not 24-bit?

@CrossVR
Contributor Author

CrossVR commented Oct 7, 2024

@dolphin-emu-bot rebuild

@Pokechu22
Contributor

To represent 24-bit integers shouldn't the range be [1, 4) rather than [1, 2) given that the mantissa is 23-bit and not 24-bit?

Ah yes, I thought it was storing 24 bits in addition to the implicit 1 bit, but it's actually 24 bits including that 1 implicit bit.
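The counting argument above can be checked with a short C++ sketch. It assumes IEEE 754 binary32 floats; the helper names `FloatBits` and `CountFloatsInRange` are made up for this illustration and are not part of Dolphin.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// A float has 24 significant bits: 23 stored plus the implicit leading one.
static_assert(std::numeric_limits<float>::digits == 24, "expects IEEE 754 binary32");

// Reinterpret a float's bits as an unsigned integer.
inline std::uint32_t FloatBits(float value)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Number of distinct floats in [lo, hi) for 0 < lo <= hi.
// Positive finite floats sort in the same order as their bit patterns,
// so the count is simply the difference of the two patterns.
inline std::uint32_t CountFloatsInRange(float lo, float hi)
{
  assert(lo > 0.0f && lo <= hi);
  return FloatBits(hi) - FloatBits(lo);
}
```

`CountFloatsInRange(1.0f, 2.0f)` gives 2^23 and `CountFloatsInRange(1.0f, 4.0f)` gives 2^24, which matches the correction above. Note that the values in [1, 4) are not evenly spaced: the step is 2^-23 in [1, 2) and 2^-22 in [2, 4).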

I'm out of my depth (so to speak :P) on this subject and I don't think I can provide any useful feedback here. If others who are more familiar with how we handle depth are happy with this PR, then it's fine by me. I'll need to do some more reading about what fast depth does compared to the old approach, because I was under the impression that it was trying to store data into the mantissa like that already. I'm pretty sure it's been covered in one of the progress reports already.

I'm also not sure where the oversized depth ranges situation comes up in games. I'm guessing it's something that's also mentioned in a progress report and I just need to read some more.

@CrossVR
Contributor Author

CrossVR commented Oct 7, 2024

@Pokechu22 Unfortunately, I don't think there has been a progress report that neatly explains everything. A lot of the knowledge about the depth buffer was built up over multiple progress reports.

An outside perspective on how to solve this issue without any regard for the currently implemented solutions would actually be very helpful, so let me try and give a concise explanation of the problem:

  • The GameCube/Wii depth buffer is a 24-bit unsigned integer buffer.
  • Depth values are stored in this depth buffer using the following equation: farZ + (z/w) * zRange
  • The farZ and zRange parameters are floating-point numbers and are unbounded, so the result of that equation can far exceed what the depth buffer supports (it can even be negative).
  • The final result of that equation is floored and clamped to the [0, 2^24) integer range before being written to the depth buffer.
  • NOTE: There is no special fixed-point conversion. farZ and zRange are not normalized and are usually both set to 2^24-1, with z being a negative value.
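The bullets above can be sketched in C++ roughly as follows. This is only an illustration of the equation as described here, not Dolphin's actual code: the function name `ConsoleDepth` and the constant `kMaxDepth24` are hypothetical, and NaN inputs are deliberately left out of scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest value a 24-bit unsigned depth buffer can hold: 2^24 - 1.
constexpr float kMaxDepth24 = 16777215.0f;

// Evaluate farZ + (z/w) * zRange in floating point, then floor the result
// and clamp it to the [0, 2^24) integer range before "writing" it.
inline std::uint32_t ConsoleDepth(float far_z, float z_range, float z_over_w)
{
  const float depth = far_z + z_over_w * z_range;
  assert(!std::isnan(depth));  // NaN handling is not covered by this sketch.
  const float floored = std::floor(depth);
  return static_cast<std::uint32_t>(std::clamp(floored, 0.0f, kMaxDepth24));
}
```

With the usual farZ = zRange = 2^24-1, a z/w of -1 lands on 0 and a z/w of 0 lands on 2^24-1. An oversized farZ such as 2^25 is clamped to 2^24-1, and a negative result is clamped to 0.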

Trying to implement this behavior accurately without rounding errors within the constraints of modern graphics APIs has been challenging, especially since many of them are constrained to a [0, 1] depth range by default.

However, we can take advantage of two Vulkan extensions (and one OpenGL extension): VK_EXT_depth_range_unrestricted and VK_EXT_depth_clamp_control. The former gives us the ability to use a depth range beyond [0, 1] and the latter allows us to clamp the final depth value to any range we want.

Another thing we could take advantage of is that VK_EXT_depth_range_unrestricted defines behavior very similar to the GameCube/Wii: for fixed-point depth buffers (like D24_UNORM), depth values beyond [0, 1] are clamped to that range. In theory this would be perfect: a 24-bit unsigned integer depth buffer that supports an unrestricted depth range, with depth values clamped to [0, 1] without the use of another extension. The problem, however, is the normalization. Dividing farZ and zRange in the depth equation by 2^24-1 and then having the GPU multiply the resulting value by 2^24-1 when writing to the depth buffer will result in rounding errors.
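To make the role of the unrestricted depth range concrete, here is a rough C++ sketch of how the console's farZ and zRange could map onto a host viewport's minimum and maximum depth. It relies on the Vulkan viewport transform, depth = minDepth + z * (maxDepth - minDepth), and on the assumption (mine, not confirmed by this PR) that z/w lies in [-1, 0] and is shifted to a [0, 1] clip-space z by adding 1. The type `HostDepthRange` and all function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the minDepth/maxDepth pair of a host viewport.
struct HostDepthRange
{
  float min_depth;
  float max_depth;
};

// farZ + (z_ndc - 1) * zRange == (farZ - zRange) + z_ndc * zRange,
// so under the assumption above the viewport spans [farZ - zRange, farZ].
inline HostDepthRange ToHostDepthRange(float far_z, float z_range)
{
  return {far_z - z_range, far_z};
}

// Without an unrestricted depth range, both ends must lie within [0, 1].
inline bool NeedsUnrestrictedDepthRange(const HostDepthRange& range)
{
  const auto in_unit = [](float v) { return v >= 0.0f && v <= 1.0f; };
  return !(in_unit(range.min_depth) && in_unit(range.max_depth));
}

// The viewport depth transform applied to a clip-space z in [0, 1].
inline float ViewportDepth(const HostDepthRange& range, float z_ndc)
{
  assert(z_ndc >= 0.0f && z_ndc <= 1.0f);
  return range.min_depth + z_ndc * (range.max_depth - range.min_depth);
}
```

For the usual farZ = zRange = 2^24-1 this yields a viewport depth range of [0, 2^24-1], which only an unrestricted depth range can express without first dividing both ends by a normalization constant.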

@dolphin-ci

dolphin-ci bot commented Oct 7, 2024

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

  • sw3-dt on ogl-lin-mesa: diff
  • aeon-charge-attack on vk-lin-mesa: diff
  • burnout2-vehicletextures on vk-lin-mesa: diff
  • chibi-robo-zfighting on vk-lin-mesa: diff
  • dbz-depth on vk-lin-mesa: diff
  • DKCR-Char on vk-lin-mesa: diff
  • ea-pink on vk-lin-mesa: diff
  • ed-updated on vk-lin-mesa: diff
  • inverted-depth-range on vk-lin-mesa: diff
  • kirby-logicop on vk-lin-mesa: diff
  • lego-star-wars-crane-shadow on vk-lin-mesa: diff
  • metroid-visor on vk-lin-mesa: diff
  • mp3-bloom on vk-lin-mesa: diff
  • nsmbw-intro on vk-lin-mesa: diff
  • pbr-sfx on vk-lin-mesa: diff
  • pm-hc-jp on vk-lin-mesa: diff
  • pokemon-channel-tv on vk-lin-mesa: diff
  • rs2-glass on vk-lin-mesa: diff
  • rs2-skybox on vk-lin-mesa: diff
  • rs3-bumpmapping on vk-lin-mesa: diff
  • sf-assault-flashing on vk-lin-mesa: diff
  • spyro-depth on vk-lin-mesa: diff
  • sw3-dt on vk-lin-mesa: diff
  • tla-menu on vk-lin-mesa: diff
  • tsp3-pinkgrass on vk-lin-mesa: diff

automated-fifoci-reporter

@@ -676,6 +678,13 @@ bool VulkanContext::SelectDeviceExtensions(bool enable_surface)
AddExtension(VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME, false);
AddExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME, false);

if (!DriverDetails::HasBug(DriverDetails::BUG_BROKEN_D32F_CLEAR))
Contributor

Multiline if statements should use {}

Contributor

@iwubcode iwubcode left a comment

Code LGTM. Minor testing with some potential z-fighting during a FreeLook movement, no changes. I currently don't have VK_EXT_depth_clamp_control but will update with notes if I go through and perform a driver update (always hesitant to do that).

@Pokechu22
Contributor

Hmm, I guess I've been thinking about the wrong problems (mainly I was thinking about what happens if the GPU interpolates the depth value between vertices and it produces a floating-point value that's more precise than a 24-bit fixed-point value, but using D24_UNORM seems like it would avoid that).

Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication? ... Is that the entire premise of this PR?

Minor testing with some potential z-fighting during a FreeLook movement, no changes.

I remember there being some issues like this with e.g. Lloyd.dff which I think were caused by updating the freelook state on every new projection matrix, meaning different freelook values were used throughout the frame. I wasn't able to reproduce that currently though, so I'm not sure if that's actually the cause.

@iwubcode
Contributor

iwubcode commented Oct 8, 2024

Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication?

Sorry if I'm speaking out of line, I wouldn't consider myself an expert either. But I'll see if I can answer and @CrossVR can always correct me if I'm wrong. One of the advantages of this PR is to avoid the restriction most (all?) modern graphics APIs have by default which is to have the depth range in [0, 1]. By having it unrestricted, we don't need to do the multiply/divides to try and get the depth into that range and therefore avoid the precision errors it can entail.

(I'm not sure about the viewport question, I remember reading up on that when working on post processing depth logic but can't recall the specifics; I may be misremembering but that allowed you to set near/far plane..but how that plays into depth still puts you in the 0...1 range and doesn't give you any more precision)

I remember there being some issues like this with e.g. Lloyd.dff which I think were caused by updating the freelook state on every new projection matrix, meaning different freelook values were used throughout the frame.

Interesting, I could see that or something similar. I'll have to look into that more. This fifo log was a Fire Emblem one. I don't think it's on fifoci (though something similar may be); a user gave it to me when describing this problem. The shadows disappear as you move around. It's entirely possible that it isn't z-fighting and is instead something to do with the projection matrix, but I recall someone describing something else as z-fighting and I just sort of attributed it to that.

@CrossVR
Contributor Author

CrossVR commented Oct 8, 2024

Is there a reason why we need to have the GPU multiply by 2^24-1 again? Could we configure the host viewport to not do that multiplication? ... Is that the entire premise of this PR?

With a D24_UNORM depth buffer that multiplication is hardwired into the GPU. It is how it does its fixed point conversion. The way we currently avoid that is by switching to a D32_FLOAT depth buffer which does not do any fixed-point conversion and simply stores the float depth value directly. However we still need to stay within the [0, 1] depth range.

To do that we divide the farZ and zRange by 2^24 instead since that won't result in rounding errors. We then make sure we never set the viewport depth range to anything larger than (2^24-1) / 2^24, to avoid getting depth values that are beyond the 24-bit integer range.
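The difference between the two divisors can be illustrated with a small C++ check. It assumes IEEE 754 single precision; `NormalizesExactly` is a hypothetical helper written for this comment. It divides a 24-bit depth value by the divisor in float, then multiplies back in double, where the product of two floats is exact, to see whether the normalized value still represents the original integer exactly.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if depth / divisor is exactly representable as a float.
// The product of two 24-bit significands fits in a double's 53 bits,
// so the multiplication in double introduces no rounding of its own.
inline bool NormalizesExactly(std::uint32_t depth, float divisor)
{
  assert(depth < (1u << 24));
  const float normalized = static_cast<float>(depth) / divisor;
  return static_cast<double>(normalized) * static_cast<double>(divisor) ==
         static_cast<double>(depth);
}
```

Dividing by 2^24 only changes the exponent, so `NormalizesExactly(k, 16777216.0f)` holds for every 24-bit k. With 2^24-1 as the divisor, already `NormalizesExactly(1, 16777215.0f)` is false, because 1 / (2^24-1) has no exact float representation. Whether such an inexact value ends up on the wrong side of an integer boundary depends on the rest of the pipeline, which this sketch does not model.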

Getting rid of that divisor entirely is indeed one of the two premises of this PR. However you are not wrong about the increased precision being an issue. That is also definitely still an issue and it has been an issue since we switched from D24_UNORM to D32_FLOAT.

@iwubcode
Contributor

iwubcode commented Oct 8, 2024

Thanks for the details Cross. A couple more questions from me.

To do that we divide the farZ and zRange by 2^24 instead since that won't result in rounding errors.

I thought floating point didn't match the emulated GPU. Therefore the emulator largely avoids floating point math. I assumed we were using unrestricted depth to avoid the divide/multiply and avoid the issues that occurred trying to replicate it. If it's not due to that, what is the reason for using unrestricted depth? Just an optimization?

However you are not wrong about the increased precision being an issue. That is also definitely still an issue and it has been an issue since we switched from D24_UNORM to D32_FLOAT.

I don't do well with this sort of low level stuff. Why is the value being more precise an issue? And do you recall - why did we decide to switch to D32_FLOAT? Ah, right, you said that. We use it because we can just store the float directly.

@CrossVR
Contributor Author

CrossVR commented Oct 8, 2024

I thought floating point didn't match the emulated GPU. Therefore the emulator largely avoids floating point math.

At the vertex processing stage much of the GPU does use floating point math. It's at the pixel processing stage that the GPU is largely using integers. Our issue lies at the boundary between those two stages.

I assumed we were using unrestricted depth to avoid the divide/multiply and avoid the issues that occurred trying to replicate it. If it's not due to that, what is the reason for using unrestricted depth? Just an optimization?

The primary reason for unrestricted depth is to accurately handle oversized depth ranges. Our current solution involves scaling and offsetting the z value in the vertex shader, but this once again results in rounding errors. By using an unrestricted depth range we can avoid rounding errors in the case where the depth range is oversized.

Getting rid of the divisor was just a code cleanup: it's easier to reason about the code when we don't have to constantly divide and multiply depth values or deal with weird limits like (2^24-1) / 2^24. The fact that the pokemon-channel-tv FIFO also seems to benefit from this change, by applying a small bias to the depth value, was a nice extra but was actually unexpected.

I don't do well with this sort of low level stuff. Why is the value being more precise an issue?

Imagine the scenario where the depth equation results in a value of 5.0 and the value in the floating point depth buffer is 5.5. If the depth test is set to EQUALS the depth test would fail here because 5.0 != 5.5. Whereas an accurate emulation of the GPU would've truncated the 5.5 to 5 when writing the value to the integer depth buffer, thus the depth test would've passed in that case.
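The scenario above can be written out as a minimal C++ sketch. The helper names are hypothetical, and the integer path simply floors and clamps to [0, 2^24) as described earlier in this thread; it is not Dolphin's actual depth test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Floor and clamp a float depth to the 24-bit unsigned integer range.
inline std::uint32_t ToDepth24(float depth)
{
  assert(!std::isnan(depth));  // NaN handling is not covered by this sketch.
  return static_cast<std::uint32_t>(std::clamp(std::floor(depth), 0.0f, 16777215.0f));
}

// EQUAL depth test against a float depth buffer: the fractional part of
// the stored value survives, so the comparison is on the raw floats.
inline bool EqualTestFloatBuffer(float incoming, float stored)
{
  return incoming == stored;
}

// EQUAL depth test against an integer depth buffer: both values are
// truncated to integers before they are compared.
inline bool EqualTestIntegerBuffer(float incoming, float stored)
{
  return ToDepth24(incoming) == ToDepth24(stored);
}
```

With an incoming depth of 5.0 and a stored depth of 5.5, `EqualTestFloatBuffer` fails while `EqualTestIntegerBuffer` passes, which is the mismatch described above.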
