-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sierra #4
Sierra #4
Conversation
- The --version argument is now passed down to the "real" mpi wrapper, which is then passed on to the serial compiler. - The mpif77 and mpif90 wrappers are removed, and replaced with the mpifort wrapper. - mpiCC and mpic++ are installed as symlinks to mpicxx - mpif77 and mpif90 are installed as symlinks to mpifort
- Add missing constant MPI_MAX_LIBRARY_VERSION_STRING - Add missing constant MPI_ERRCODES_IGNORE - Add missing function MPI_Comm_spawn
- Added "YogiMPI_VERSION_STR" to yogimpi.h and mpif.h, which should be set to the version of YogiMPI used at compile time. - Added "make dist" target, which will create a yogimpi-<VERSION>.tar.gz file, where <VERSION> is determined by $(git describe --tags).
By inspection, there is more to do here.
Before this commit, when client code would pass a copy or delete callback to MPI_Comm_create_keyval, if the callbacks had any MPI calls inside them then the code would segfault. The segfault was due to the native MPI passing in the native MPI_Comm to the callback, and then YogiMPI would segfault when tyring to covert the comm to a MPI_Comm because Yogi did not know the comm was already converted. This change captures the callback functions and wraps them in lambdas that un-converts the comm back to a YogiMPI_Comm so it is safe for the callbacks to contain MPI calls. There are likely other failure modes associated with user-supplied callbacks to MPI routines. Hopefully this will be a good pattern to repeat if we find similar issues with other callbacks.
This commit wraps almost every single native MPI call with a callDepth increment/decrement. Currently this is only used for indenting calls in the yogimpi.log.* files. There are a certain class of problems that are hard to diagnose, and the commonality is that YogiMPI has called the native MPI and then that MPI has called a callback that was wrapped by YogiMPI. Adding this callDepth and indenting the log file based on callDepth makes it much easier to diagnose the problem.
This commit wraps MPI_User_functions in lambdas, so we can revert the MPI_Datatype back to a YogiMPI_Datatype before passing it to the callback so that the callback can be called from the native MPI routines (e.g. MPI_Reduce, MPI_Scan, etc.).
I tried with intel and also a pure gnu build on rhel7 with mpich, and all system tests pass. I also built some third party solvers against this and those tests passed, so I'm pretty happy in that regard. I haven't done any large-scale testing yet, but I can do that next week if you want. Alternatively, I'm okay with approving this after you or @bpittman does that testing. I'm going to be out Fri-Wed, but if you ping me on here I'll see it and respond. |
@rtrigg, thanks. I'd say approve it whenever you are comfortable with it. The changes are significant enough that I don't want to merge it until we have run our ATS with it. We did go ahead and tag it v1.4.0rc1 though so @bpittman doesn't need to rename all his VMs and such if we don't find any problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
system tests are passing, I'm happy. I converted the README to README.md a while back, so I added a change for make dist
it really looks like I broke the CI with my update, but I swear I didn't :) I think this is an issue with a change to oneAPI; I'll look into it and report back. I'll add, this doesn't change my opinion about the merge: I think it's ready. this is more of a build issue in the CI. |
This reverts commit cc8ec8d.
This reverts commit 53c3c0c.
Still have more testing to do and a bit more cleanup, but I need to move my exploration to another machine.
Now only datatypes are using the new insertIntoPoolIfNotExists call. When using the new call for comms and requests things stop working for some reason I don't yet understand, so I'm only using the new call for the datatype which was causing the most issues (I think ...).
The changes in this branch are needed to get sierra versions 5.X to build and run with yogimpi. Commits a9bf7ae and de7576f are the most "surprising" and are probably the most important changes for everyone to understand.
Before merging this branch, we need to field test the branch with current Helios and Kestrel. I have Kestrel working on my workstation with gcc and both openmpi and mpich backends. I recommend rebuilding avtools in addition to the final products as some of the function signatures will have changed.
@bpittman, let me know when you have enough infrastructure in place for me to run Jenkins and ATS, and I'll run those.