Threading + MPI #974

Open
antoine-levitt opened this issue May 23, 2024 · 6 comments

@antoine-levitt
Member

I've had this happen when running DFTK from within threads. I'm not too clear on what we should do here.

ERROR: LoadError: TaskFailedException

    nested task error: UndefRefError: access to undefined reference
    Stacktrace:
      [1] getindex
        @ ./essentials.jl:892 [inlined]
      [2] popfirst!
        @ ./array.jl:1706 [inlined]
      [3] run_init_hooks()
        @ MPI ~/.julia/packages/MPI/rwDDn/src/environment.jl:65
      [4] Init(; threadlevel::Symbol, finalize_atexit::Bool, errors_return::Bool)
        @ MPI ~/.julia/packages/MPI/rwDDn/src/environment.jl:155
      [5] Init
        @ ~/.julia/packages/MPI/rwDDn/src/environment.jl:114 [inlined]
      [6] PlaneWaveBasis(model::Model{…}, Ecut::Float64, fft_size::Tuple{…}, variational::Bool, kgrid::MonkhorstPack, symmetries_respect_rgrid::Bool, use_symmetries_for_kpoint_reduction::Bool, comm_kpts::MPI.Comm, architecture::DFTK.CPU)
        @ DFTK ~/.julia/dev/DFTK/src/PlaneWaveBasis.jl:247
      [7] #PlaneWaveBasis#141
        @ ~/.julia/dev/DFTK/src/PlaneWaveBasis.jl:399 [inlined]
      [8] setup_calculation(s::Int64, n_electrons::Int64, b::Int64, α::Int64; scaling::Symbol, α_q::Int64, α_r::Int64)
        @ Main ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:239
      [9] setup_calculation
        @ ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:207 [inlined]
     [10] 
        @ Main ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:244
     [11] macro expansion
        @ ~/Dropbox/recherche/2020-11-anyons/new/compute.jl:25 [inlined]
     [12] (::var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}})(tid::Int64)
        @ Main ./threadingconstructs.jl:209
     [13] (::Base.Threads.var"#1#2"{var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}}, Int64})()
        @ Base.Threads ./threadingconstructs.jl:154
    Some type information was truncated. Use `show(err)` to see complete types.

...and 5 more exceptions.

Stacktrace:
 [1] threading_run(fun::var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:172
 [2] macro expansion
   @ ./threadingconstructs.jl:189 [inlined]
 [3] top-level scope
   @ ~/Dropbox/recherche/2020-11-anyons/new/compute.jl:21
@epolack
Collaborator

epolack commented May 23, 2024

I remember being able to launch it in a quick and dirty way, but I am not so sure anymore…

On a local branch, I made it possible to switch off the three places where Threads is used.

@antoine-levitt
Member Author

It works most of the time, but I just had this happen once. By switching off, do you mean this? #972

@epolack
Collaborator

epolack commented May 23, 2024

Right now, for me, it works none of the time on something else I am doing…

Yes, I was indeed looking at #972, and it looks a lot like what I am using for parallel phonons.

(I think I gave up looking at how to do thread in thread because of the @timing stuff.)

@antoine-levitt
Member Author

(I think I gave up looking at how to do thread in thread because of the @timing stuff.)

Yeah, should we just disable this by default?

@epolack
Collaborator

epolack commented May 23, 2024

I have never used the fact that it's enabled by default. I've always found this surprising.

@mfherbst
Member

I've had this happen when running DFTK from within threads.

I think this is because MPI is initialised twice. We should either guard the initialisation call with a semaphore, or tell MPI when we initialise it that it may be called from multiple threads (I think it has a flag to do that).
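
To make the first suggestion concrete, here is a minimal sketch of a guarded, run-once initialisation. It is not DFTK's or MPI.jl's actual code: `MPI_INIT_LOCK`, `init_once` and `fake_init` are hypothetical names introduced only for illustration, and the demonstration uses a stand-in counter instead of MPI so that it runs without the MPI package. The commented-out lines show how it might be wired to MPI.jl, using `MPI.Initialized` and the `threadlevel` keyword of `MPI.Init` that appears in the stack trace above; the choice of `:multiple` is an assumption. As far as I understand, the thread level only governs MPI calls made after initialisation, so the lock would still be needed to stop several tasks from entering `MPI.Init` at the same time.

```julia
# Hypothetical helper, not part of DFTK or MPI.jl.
const MPI_INIT_LOCK = ReentrantLock()

# Run `init()` at most once, even when several tasks call this concurrently.
# `is_initialized()` must report whether initialisation has already happened.
function init_once(init::Function, is_initialized::Function)
    lock(MPI_INIT_LOCK) do
        # Check inside the lock so that two tasks cannot both observe `false`.
        is_initialized() || init()
    end
    return nothing
end

# With MPI.jl this might be used roughly as follows
# (assumption: the required thread level is :multiple):
#   using MPI
#   init_once(() -> MPI.Init(; threadlevel=:multiple), MPI.Initialized)

# Stand-in demonstration without MPI: count how often "init" really runs.
n_init = Threads.Atomic{Int}(0)
done = Threads.Atomic{Bool}(false)
fake_init() = (Threads.atomic_add!(n_init, 1); done[] = true)

Threads.@threads for i in 1:100
    init_once(fake_init, () -> done[])
end
```

Checking the flag inside the lock rather than before taking it is the point of the design: a check outside the lock would leave a window in which two tasks both decide to initialise.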
