axom::Array constructor crash on CUDA. #1432
Comments
I'm trying a fix to set m_executeOnGPU based on the memory space, inside of Array::initialize, Array::initialize_from_other, and one of the constructors that calls neither of those methods.
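A minimal sketch of that approach, using hypothetical stand-in names (`MemorySpace`, `ToyArray`, `shouldExecuteOnGPU`) rather than Axom's actual types. It assumes, for illustration only, that device-only memory is the case that needs GPU-side element operations. The flag is derived from the memory space in one helper, and a default member initializer keeps it well-defined on any constructor path that calls neither initialization method.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration; these are not Axom's types.
enum class MemorySpace { Host, Unified, Pinned, Device };

// Assumption for this sketch: only device-only memory needs GPU-side
// element operations.
constexpr bool shouldExecuteOnGPU(MemorySpace space)
{
  return space == MemorySpace::Device;
}

class ToyArray
{
public:
  // Calls neither initialize method; relies on the member initializers below.
  ToyArray() = default;

  ToyArray(std::size_t n, MemorySpace space) { initialize(n, space); }

  ToyArray(const ToyArray& other) { initialize_from_other(other); }

  bool executeOnGPU() const { return m_executeOnGPU; }

private:
  void initialize(std::size_t n, MemorySpace space)
  {
    m_size = n;
    m_space = space;
    m_executeOnGPU = shouldExecuteOnGPU(space);  // set on every init path
  }

  void initialize_from_other(const ToyArray& other)
  {
    initialize(other.m_size, other.m_space);
  }

  std::size_t m_size = 0;
  MemorySpace m_space = MemorySpace::Host;
  bool m_executeOnGPU = false;  // never left indeterminate
};
```

With this shape, a default-constructed `ToyArray` reports `false`, and an array built or copied with `MemorySpace::Device` reports `true`.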
As mentioned previously, here is a link to the documentation on setting or disabling Address Translation Services (ATS) and checking whether it is enabled or disabled (Point 19): https://lc.llnl.gov/confluence/display/SIERRA/Quickstart+Guide
Thanks @BradWhitlock. We can have @publixsubfan look into this to make sure other issues don't occur.
Update. I've had some trouble reproducing the crash on develop. The Array::m_executeOnGPU member is uninitialized, but it does not seem to matter much. When it fails in my branch, it seems like some bad optimization might be at work. I was getting the allocatorID to pass from execution_space<ExecSpace>::allocatorID(), and it seemed (in Totalview) that the allocatorID was getting optimized out. If I make it "volatile" so the compiler cannot optimize it away, then I can see it returns 3 and it works normally. The code resembles:
Yes, I believe we need to initialize But is this happening with CUDA device-only memory? The value of that variable should be immaterial -- we should be passing through to special logic for that case.
Code like the following resulted in Array::Array trying to initialize elements of a device-allocated array using placement new on the host. The code SEGV'd.
This method calls initialize() with 2 arguments, so the 3rd argument takes its default of true, which requests default construction of the elements.
axom/src/axom/core/Array.hpp, line 1084 in 70b3608
axom/src/axom/core/Array.hpp, line 1591 in 70b3608
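The default-argument behavior described above can be illustrated with a toy class. All names here (`ToyBuffer`, its `initialize` signature, `default_constructed`) are hypothetical and only mimic the shape of the call; this is not the code at the linked lines.

```cpp
#include <cstddef>

// Hypothetical toy type; it only records what the caller asked for.
struct ToyBuffer
{
  std::size_t size = 0;
  std::size_t capacity = 0;
  bool default_constructed = false;

  // The third parameter defaults to true, so a two-argument call
  // silently requests default construction of the elements.
  void initialize(std::size_t num_elements,
                  std::size_t requested_capacity,
                  bool default_construct = true)
  {
    size = num_elements;
    capacity = requested_capacity < num_elements ? num_elements
                                                 : requested_capacity;
    // A real container would construct elements here. If that is done
    // with placement new on the host while the storage is device-only,
    // the host dereferences a device pointer.
    default_constructed = default_construct;
  }
};
```

Calling `initialize(n, n)` takes the default-construct path, while `initialize(n, n, false)` skips it. A caller that knows the storage is device-only would need either the explicit `false` or a device-side construction path.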
I think the root of the problem could be that Array::m_executeOnGPU is not initialized anywhere. Valgrind was logging uninitialized memory in this area and m_executeOnGPU is probably the culprit.
Calling axom::Array(n, n, allocatorID) where allocatorID is a CUDA allocator should not cause a SEGV and it should initialize the data as needed on device.
I was told that ATS might have some bearing here too.