Particle diagnostic swaps particle id's, impossible to recreate trajectories #4908

Closed
aveksler1 opened this issue May 1, 2024 · 18 comments
Labels: bug, component: diagnostics, component: openPMD

Comments

aveksler1 (Contributor) commented May 1, 2024

After the transition to SoA particle attributes #4653, I am no longer able to recreate particle trajectories using ParticleDiagnostics with the openPMD backend. Admittedly, I haven't tried other backends, since openpmd-viewer is my main analysis tool.

When there are multiple particles, the positions recorded for a given particle id do not always correspond to the same physical particle from one iteration to the next. I see this using openpmd-viewer's ParticleTracker, as well as with a clunkier trajectory-reconstruction approach not shown here.

Below is a minimal reproducing example: a Python (PICMI) input script and an analysis script. I launch 100 ions starting at different positions (with do_not_deposit=True, so each ion acts as a simple tracer particle) in a uniform magnetic field, expecting to see perfect Larmor orbits. The plots below show how some trajectories "jump" around, mapping onto another particle's trajectory.

analysis.txt
PICMI_inputs_2d.txt

[images: reconstructed particle trajectories, showing some orbits jumping onto other particles' trajectories]
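Independent of openpmd-viewer, this kind of id mix-up can be flagged with a simple frame-to-frame displacement check. Below is a minimal, self-contained sketch (pure NumPy, synthetic data; all names are illustrative and not taken from the attached scripts):

```python
import numpy as np

def find_trajectory_jumps(positions, max_step):
    """Return iterations where any particle moves farther than
    max_step between consecutive frames (a likely id mix-up).

    positions: array of shape (n_iterations, n_particles, n_dims),
    already sorted by particle id within each iteration.
    """
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=-1)
    return np.nonzero((steps > max_step).any(axis=1))[0] + 1

# Synthetic data: two particles on well-separated circular (Larmor-like)
# orbits, with their ids accidentally swapped from iteration 50 onward.
t = np.linspace(0, 2 * np.pi, 100)
a = np.stack([np.cos(t), np.sin(t)], axis=-1)       # orbit around the origin
b = np.stack([10 + np.cos(t), np.sin(t)], axis=-1)  # orbit around (10, 0)
pos = np.stack([a, b], axis=1)                      # shape (100, 2, 2)
pos[50:] = pos[50:, ::-1].copy()                    # id swap at iteration 50

jumps = find_trajectory_jumps(pos, max_step=1.0)
print(jumps)  # -> [50]
```

The threshold only needs to exceed the largest physical per-step displacement, which for a clean Larmor orbit is easy to bound.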

@aveksler1 added the bug, component: openPMD, and component: diagnostics labels on May 1, 2024
roelof-groenewald (Member) commented

attn: @RemiLehe @ax3l

@ax3l self-assigned this on May 1, 2024
@ax3l added the bug: affects latest release label on May 1, 2024
ax3l (Member) commented May 1, 2024

Thanks for the ping! I'll take a look and see whether there is a mismatch between id and cpuid now :-o
(Or an initialization error.)

ax3l (Member) commented May 1, 2024

Thank you for the details!

Does the issue appear equally on CPU and GPU? How many MPI ranks did you use, if any?

RemiLehe (Member) commented May 1, 2024

Thanks for reporting this @aveksler1
Btw, did you see this when running on CPU or GPU (or both)? Are you running on multiple MPI ranks?

aveksler1 (Author) commented

I saw this when running on GPU, with 1 MPI rank and 1 grid. I will check on CPU once it finishes compiling.

ax3l (Member) commented May 1, 2024

@RemiLehe has a good guess that I think will be it: we forgot to move the idcpu when we sort particles in various ways.

We're looking into it now. cc @atmyers

update: see below, looks ok actually...

roelof-groenewald (Member) commented

Thanks guys!

ax3l (Member) commented May 1, 2024

Could it be a GPU race condition in swapping of idcpu?
@aveksler1 @roelof-groenewald does setting export CUDA_LAUNCH_BLOCKING=1 still show the issue?
https://warpx.readthedocs.io/en/latest/usage/workflows/debugging.html
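For reference, the debugging step linked above amounts to something like the following sketch (the run command is a hypothetical example; substitute your own build and inputs):

```shell
# Serialize CUDA kernel launches so an error is reported at the kernel
# that actually caused it, not at a later synchronization point.
export CUDA_LAUNCH_BLOCKING=1

# Then re-run the simulation with the same inputs, e.g. for a PICMI script:
# python3 PICMI_inputs_2d.py
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```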

aveksler1 (Author) commented

@ax3l Yes, it still shows the issue.

ax3l (Member) commented May 1, 2024

Verified in parallel: the openPMD write uses the correct data set:

// here we save the SoA property (idcpu)
{
    // todo: add support to not write the particle index
    getComponentRecord("id").storeChunkRaw(
        soa.GetIdCPUData().data(), {offset}, {numParticleOnTile64});
}

roelof-groenewald (Member) commented

Just guessing here, but could there be a problem with openPMD's ID tracker not disentangling the id and cpu bits appropriately? In the snippet above, it still looks like the full 64-bit idcpu number is stored as a single quantity.

ax3l (Member) commented May 1, 2024

For openPMD, ids are just 64-bit numbers; it does not deconstruct them.
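For readers who do want to deconstruct the raw number: my understanding is that AMReX packs the cpu rank into the low 24 bits and the (signed) particle id into the upper 40 bits. A reader-side sketch under that assumed layout (illustrative helper names, not WarpX code):

```python
def unpack_idcpu(idcpu):
    """Split an AMReX-style 64-bit idcpu into (id, cpu).

    Assumed layout: bit 63 = sign/validity flag of the id,
    bits 24..62 = id magnitude, bits 0..23 = cpu rank.
    """
    cpu = idcpu & 0xFFFFFF               # lower 24 bits
    pid = (idcpu >> 24) & 0x7FFFFFFFFF   # next 39 bits
    if idcpu >> 63:                      # sign bit set -> negative/invalid id
        pid = -pid
    return pid, cpu

def pack_idcpu(pid, cpu):
    """Inverse of unpack_idcpu, same assumed layout."""
    sign = 1 if pid < 0 else 0
    return (sign << 63) | (abs(pid) << 24) | (cpu & 0xFFFFFF)

print(unpack_idcpu(pack_idcpu(12345, 7)))  # -> (12345, 7)
```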

RemiLehe (Member) commented May 1, 2024

I just compiled and ran on CPU (WarpX (24.04-51-gc5a5732721b4)), and I do not see the issue in this case.

@roelof-groenewald I think that this excludes the possibility that openPMD-viewer is misinterpreting the ID

aveksler1 (Author) commented

The issue does not show up when running on CPU, with both 1 and 2 MPI ranks.

RemiLehe (Member) commented May 1, 2024

@aveksler1 I just compiled for NVIDIA GPU (WarpX (24.04-51-gc5a5732721b4)) and ran with 1 MPI rank, and I again do not see the issue.

I wonder what the difference with your case is. What was the platform that you ran on? Was it an AMD or NVIDIA GPU?

aveksler1 (Author) commented

@RemiLehe @ax3l Sorry to waste your time; my WarpX build was outdated. I should be better about rebuilding often, especially when chasing bugs. It seems AMReX-Codes/amrex#3890 is likely the fix I was missing. Thank you all for your help.

@aveksler1 removed the bug: affects latest release label on May 1, 2024
RemiLehe (Member) commented May 1, 2024

OK, no worries. Thanks for letting us know, and thanks again for reporting potential bugs here.
