Sierra #4

selamberson · 2023-06-16T22:14:59Z

The changes in this branch are needed to get sierra versions 5.X to build and run with yogimpi. Commits a9bf7ae and de7576f are the most "surprising" and are probably the most important changes for everyone to understand.

Before merging this branch, we need to field test the branch with current Helios and Kestrel. I have Kestrel working on my workstation with gcc and both openmpi and mpich backends. I recommend rebuilding avtools in addition to the final products as some of the function signatures will have changed.

@bpittman, let me know when you have enough infrastructure in place for me to run Jenkins and ATS, and I'll run those.

- The --version argument is now passed down to the "real" mpi wrapper, which is then passed on to the serial compiler. - The mpif77 and mpif90 wrappers are removed, and replaced with the mpifort wrapper. - mpiCC and mpic++ are installed as symlinks to mpicxx - mpif77 and mpif90 are installed as symlinks to mpifort

- Add missing constant MPI_MAX_LIBRARY_VERSION_STRING - Add missing constant MPI_ERRCODES_IGNORE - Add missing function MPI_Comm_spawn

- Added "YogiMPI_VERSION_STR" to yogimpi.h and mpif.h, which should be set to the version of YogiMPI used at compile time. - Added "make dist" target, which will create a yogimpi-<VERSION>.tar.gz file, where <VERSION> is determined by $(git describe --tags).

By inspection, there is more to do here.

Before this commit, when client code would pass a copy or delete callback to MPI_Comm_create_keyval, if the callbacks had any MPI calls inside them then the code would segfault. The segfault was due to the native MPI passing in the native MPI_Comm to the callback, and then YogiMPI would segfault when tyring to covert the comm to a MPI_Comm because Yogi did not know the comm was already converted. This change captures the callback functions and wraps them in lambdas that un-converts the comm back to a YogiMPI_Comm so it is safe for the callbacks to contain MPI calls. There are likely other failure modes associated with user-supplied callbacks to MPI routines. Hopefully this will be a good pattern to repeat if we find similar issues with other callbacks.

This commit wraps almost every single native MPI call with a callDepth increment/decrement. Currently this is only used for indenting calls in the yogimpi.log.* files. There are a certain class of problems that are hard to diagnose, and the commonality is that YogiMPI has called the native MPI and then that MPI has called a callback that was wrapped by YogiMPI. Adding this callDepth and indenting the log file based on callDepth makes it much easier to diagnose the problem.

This commit wraps MPI_User_functions in lambdas, so we can revert the MPI_Datatype back to a YogiMPI_Datatype before passing it to the callback so that the callback can be called from the native MPI routines (e.g. MPI_Reduce, MPI_Scan, etc.).

rtrigg · 2023-06-22T21:16:11Z

I tried with intel and also a pure gnu build on rhel7 with mpich, and all system tests pass. I also built some third party solvers against this and those tests passed, so I'm pretty happy in that regard.

I haven't done any large-scale testing yet, but I can do that next week if you want. Alternatively, I'm okay with approving this after you or @bpittman does that testing.

I'm going to be out Fri-Wed, but if you ping me on here I'll see it and respond.

selamberson · 2023-06-22T21:20:39Z

@rtrigg, thanks. I'd say approve it whenever you are comfortable with it. The changes are significant enough that I don't want to merge it until we have run our ATS with it. We did go ahead and tag it v1.4.0rc1 though so @bpittman doesn't need to rename all his VMs and such if we don't find any problems.

This reverts commit de7576f.

This reverts commit a9bf7ae.

selamberson · 2023-10-30T16:54:27Z

@rtrigg and @bpittman, I have made progress on my yogimpi+sierra work. I have a new rc to test. Please test v1.4.0rc2 and let me know if it works for you. If it works, please "approve" this pull request so github will let me merge it.

selamberson · 2023-11-27T14:50:37Z

@rtrigg and @bpittman, the Kestrel ATS on v1.4.0rc2 indicates that rc2 is ready from my point of view. Can you two please look at the merge request, do whatever testing you need to do, and approve the pull request if it is ready so github will let me merge it.

rtrigg

system tests are passing, I'm happy. I converted the README to README.md a while back, so I added a change for make dist

rtrigg · 2023-12-14T21:29:36Z

it really looks like I broke the CI with my update, but I swear I didn't :) I think this is an issue with a change to oneAPI; I'll look into it and report back.

I'll add, this doesn't change my opinion about the merge: I think it's ready. this is more of a build issue in the CI.

This reverts commit cc8ec8d.

This reverts commit 53c3c0c.

Still have more testing to do and a bit more cleanup, but I need to move my exploration to another machine.

Now only datatypes are using the new insertIntoPoolIfNotExists call. When using the new call for comms and requests things stop working for some reason I don't yet understand, so I'm only using the new call for the datatype which was causing the most issues (I think ...).

selamberson added 20 commits June 8, 2023 14:52

Add some missing constants and functions.

f577749

- Add missing constant MPI_MAX_LIBRARY_VERSION_STRING - Add missing constant MPI_ERRCODES_IGNORE - Add missing function MPI_Comm_spawn

Give explicit control over which python to use.

e7acfc8

Have YVERSION default to 3

01531a3

Make sure yogi doesn't mistake the '-o' argument as the source file.

3ee4a3f

Add some const-correctness stuff.

eb44d80

By inspection, there is more to do here.

Add support for MPI_APPNUM

0a716a8

Remove debug prints.

f64944a

Clean up compiler warnings.

2049028

Add exiting MPI_Finalize debug statement.

9c7d92e

Add some missing commattrs

8d914ee

Clean up compiler warnings for intel-2023.0.0 and mpich

1f0ea28

Clean up compiler warnings for intel-2021.8.0 with openmpi

b585f85

Fix debug builds.

3780738

Add some missing MPI_CXX_* types.

cb6a955

Delete log files in testing directory.

4901ece

Wrap MPI_User_functions passed to MPI.

de7576f

This commit wraps MPI_User_functions in lambdas, so we can revert the MPI_Datatype back to a YogiMPI_Datatype before passing it to the callback so that the callback can be called from the native MPI routines (e.g. MPI_Reduce, MPI_Scan, etc.).

selamberson requested review from bpittman and rtrigg June 16, 2023 22:14

selamberson assigned bpittman, rtrigg and selamberson Jun 16, 2023

selamberson added 3 commits June 16, 2023 16:34

Fix intel oneapi to use mpiicpc

c4d3dba

Update default YVERSION for README.

6b92964

Add another missing const

7e995f3

selamberson added 2 commits September 15, 2023 09:39

Revert "Wrap MPI_User_functions passed to MPI."

53c3c0c

This reverts commit de7576f.

Revert "Wrap some function pointers passed to MPI"

cc8ec8d

This reverts commit a9bf7ae.

update Makefile reference to README.md

5cb3f28

rtrigg approved these changes Dec 14, 2023

View reviewed changes

selamberson added 10 commits December 28, 2023 15:59

Remove "v" from YOGIMPI_VERSION.

10b4690

Add diagnostic output to intel-oneapi action.

c467791

Another attempt to get intel-onapi working.

6556240

Another attempt to get intel-onapi working.

375c775

Add the MPI_UINT*_T types needed by Sierra.

53d24ee

Revert "Revert "Wrap some function pointers passed to MPI""

b8a83c5

This reverts commit cc8ec8d.

Revert "Revert "Wrap MPI_User_functions passed to MPI.""

dc58318

This reverts commit 53c3c0c.

WIP ... Attempt to fix performance problem.

17bef3b

Still have more testing to do and a bit more cleanup, but I need to move my exploration to another machine.

Another atempted fix for datatypes.

c98a691

bpittman approved these changes Apr 11, 2024

View reviewed changes

selamberson merged commit 90d7866 into master Apr 11, 2024
6 checks passed

selamberson deleted the sierra branch April 11, 2024 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sierra #4

Sierra #4

selamberson commented Jun 16, 2023

rtrigg commented Jun 22, 2023 •

edited

Loading

selamberson commented Jun 22, 2023

selamberson commented Oct 30, 2023

selamberson commented Nov 27, 2023

rtrigg left a comment

rtrigg commented Dec 14, 2023 •

edited

Loading

Sierra #4

Sierra #4

Conversation

selamberson commented Jun 16, 2023

rtrigg commented Jun 22, 2023 • edited Loading

selamberson commented Jun 22, 2023

selamberson commented Oct 30, 2023

selamberson commented Nov 27, 2023

rtrigg left a comment

Choose a reason for hiding this comment

rtrigg commented Dec 14, 2023 • edited Loading

rtrigg commented Jun 22, 2023 •

edited

Loading

rtrigg commented Dec 14, 2023 •

edited

Loading