-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VideoCommon: Add support for unrestricted depth range. #13100
base: master
Are you sure you want to change the base?
Conversation
8a3e4e1
to
c982fb1
Compare
There seems to be an issue with the lavapipe implementation, these results do not reproduce on Nvidia. I'll investigate the driver bug. |
Which issue does this fix? https://bugs.dolphin-emu.org/issues/13633 seems to still be present on my Nvidia GPU when using fast depth but not with fast depth disabled. |
You might need modified driver that supports it. |
I see an |
Correct you need a driver with |
3d5c995
to
1f3ccf4
Compare
I've added a code path that fixes the Pokemon Channel and doesn't include the native oversized depth range handling. As a bonus this also allows us to support this case on OpenGL since it also has an unrestricted depth range extension. |
1f3ccf4
to
1f63117
Compare
1f63117
to
4d1000d
Compare
On startup I get a failed VMA assertion. If I ignore it, everything seems to work nicely, though. I don't entirely understand why a range of |
btw, the VMA assertion is unrelated and was fixed in #13103. |
Ah, I'd assumed it was related to the Vulkan-Headers update. If I rebase on master then the VMA assertion is indeed fixed. |
@Pokechu22 Initially this PR was aimed at natively supporting oversized depth ranges where the benefit is obvious. We can definitely discuss some alternative normalization schemes, especially if that'll resolve the Pokemon Channel FIFO without the need for a depth bias. |
@Pokechu22 To represent 24-bit integers shouldn't the range be [1, 4) rather than [1, 2) given that the mantissa is 23-bit and not 24-bit? |
@dolphin-emu-bot rebuild |
Ah yes, I thought it was storing 24 bits in addition to the implicit 1 bit, but it's actually 24 bits including that 1 implicit bit. I'm out of my depth (so to speak :P) on this subject and I don't think I can provide any useful feedback here. If others who are more familiar with how we handle depth are happy with this PR, then it's fine by me. I'll need to do some more reading about what fast depth does compared to the old approach, because I was under the impression that it was trying to store data into the mantissa like that already. I'm pretty sure it's been covered in one of the progress reports already. I'm also not sure where the oversized depth ranges situation comes up in games. I'm guessing it's something that's also mentioned in a progress report and I just need to read some more. |
@Pokechu22 I don't think there has been a progress report that neatly explains everything unfortunately, a lot of the knowledge about the depth buffer was built up over multiple progress reports. An outside perspective on how to solve this issue without any regard for the currently implemented solutions would actually be very helpful, so let me try and give a concise explanation of the problem:
Trying to implement this behavior accurately without rounding errors within the constraints of modern graphics APIs has been challenging, especially since many of them are constrained to a However we can take advantage of two Vulkan extensions (and one OpenGL extension): Another possible thing to take advantage of is that |
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter |
@@ -676,6 +678,13 @@ bool VulkanContext::SelectDeviceExtensions(bool enable_surface) | |||
AddExtension(VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME, false); | |||
AddExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME, false); | |||
|
|||
if (!DriverDetails::HasBug(DriverDetails::BUG_BROKEN_D32F_CLEAR)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiline if statements should use {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. Minor testing with some potential z-fighting during a FreeLook movement, no changes. I currently don't have VK_EXT_depth_clamp_control
but will update with notes if I go through and perform a driver update (always hesitant to do that).
Hmm, I guess I've been thinking about the wrong problems (mainly I was thinking about what happens if the GPU interpolates the depth value between vertices and it produces a floating-point value that's more precise than a 24-bit fixed-point value, but using Is there a reason why we need to have the GPU multiply by
I remember there being some issues like this with e.g. Lloyd.dff which I think were caused by updating the freelook state on every new projection matrix, meaning different freelook values were used throughout the frame. I wasn't able to reproduce that currently though, so I'm not sure if that's actually the cause. |
Sorry if I'm speaking out of line, I wouldn't consider myself an expert either. But I'll see if I can answer and @CrossVR can always correct me if I'm wrong. One of the advantages of this PR is to avoid the restriction most (all?) modern graphics APIs have by default which is to have the depth range in [0, 1]. By having it unrestricted, we don't need to do the multiply/divides to try and get the depth into that range and therefore avoid the precision errors it can entail. (I'm not sure about the viewport question, I remember reading up on that when working on post processing depth logic but can't recall the specifics; I may be misremembering but that allowed you to set near/far plane..but how that plays into depth still puts you in the 0...1 range and doesn't give you any more precision)
Interesting, I could see that or something similar. I'll have to look into that more. This fifo log was a fire-emblem one, I don't think it's on fifoci (but something similar may be) a user gave it to me describing this problem. The shadows disappear as you move around. It's completely possible it isn't z-fighting and something with the projection matrix but I recall someone questioning something else as z-fighting and I just sort of attributed it to that. |
With a To do that we divide the Getting rid of that divisor entirely is indeed one of the two premises of this PR. However you are not wrong about the increased precision being an issue. That is also definitely still an issue and it has been an issue since we switched from |
Thanks for the details Cross. A couple more questions from me.
I thought floating point didn't match the emulated GPU. Therefore the emulator largely avoids floating point math. I assumed we were using unrestricted depth to avoid the divide/multiply and avoid the issues that occurred trying to replicate it. If it's not due to that, what is the reason for using unrestricted depth? Just an optimization?
I don't do well with this sort of low level stuff. Why is the value being more precise an issue? |
At the vertex processing stage much of the GPU does use floating point math. It's at the pixel processing stage that the GPU is largely using integers. Our issue lies at the boundary between those two stages.
The primary reason for unrestricted depth is to accurately handle oversized depth ranges. Our current solution involves scaling and offsetting the z value in the vertex shader, but this once again results in rounding errors. By using an unrestricted depth range we can avoid rounding errors in the case where the depth range is oversized. Getting rid of the divisor was just a code cleanup, it's easier to reason about the code when not having to constantly divide and multiply depth values and having to deal with weird limits like
Imagine the scenario where the depth equation results in a value of 5.0 and the value in the floating point depth buffer is 5.5. If the depth test is set to |
This PR uses the unrestricted depth range extension to achieve the following:
[0, 2^24)
depth value range usingVK_EXT_depth_range_unrestricted
andGL_NV_depth_buffer_float
.[0, 2^24)
range usingVK_EXT_depth_range_unrestricted
andVK_EXT_depth_clamp_control
.This is mostly a clean up by providing a code path that removes normalization and was not intended as an accuracy fix. However by removing normalization we have an opportunity to influence rounding behavior, which fixes the last known issue with fast depth.