
GPU broken #966

Closed
mfherbst opened this issue Mar 19, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@mfherbst
Member

@antoine-levitt The GPU CI returns an error after merging #964. I think you should sort first, then transfer.

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:48 [inlined]
  [6] iterate
    @ ./abstractarray.jl:1222 [inlined]
  [7] iterate
    @ ./abstractarray.jl:1220 [inlined]
  [8] issorted(itr::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, order::Base.Order.ForwardOrdering)
    @ Base.Sort ./sort.jl:51
  [9] #issorted#1
    @ ./sort.jl:86 [inlined]
 [10] issorted(itr::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer})
    @ Base.Sort ./sort.jl:86
 [11] final_retval(X::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, AX::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, BX::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, resid_history::Matrix{Float64}, niter::Int64, n_matvec::Int64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/lobpcg_hyper_impl.jl:305
 [12] macro expansion
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/lobpcg_hyper_impl.jl:449 [inlined]
 [13] LOBPCG(A::DFTK.DftHamiltonianBlock, X::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, B::LinearAlgebra.UniformScaling{Bool}, precon::PreconditionerTPA{Float64}, tol::Float64, maxiter::Int64; miniter::Int64, ortho_tol::Float64, n_conv_check::Int64, display_progress::Bool)
    @ DFTK /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:237
 [14] LOBPCG
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:230 [inlined]
 [15] lobpcg_hyper(A::DFTK.DftHamiltonianBlock, X0::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}; maxiter::Int64, prec::PreconditionerTPA{Float64}, tol::Float64, largest::Bool, n_conv_check::Int64, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:miniter,), Tuple{Int64}}})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/diag_lobpcg_hyper.jl:9
 [16] diagonalize_all_kblocks(eigensolver::typeof(lobpcg_hyper), ham::Hamiltonian, nev_per_kpoint::Int64; ψguess::Nothing, prec_type::Type{PreconditionerTPA}, interpolate_kpoints::Bool, tol::Float64, miniter::Int64, maxiter::Int64, n_conv_check::Int64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/diag.jl:50
 [17] next_density(ham::Hamiltonian, nbandsalg::AdaptiveBands, fermialg::FermiTwoStage; eigensolver::Function, ψ::Nothing, eigenvalues::Nothing, occupation::Nothing, kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:miniter, :tol), Tuple{Int64, Float64}}})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:74
 [18] (::DFTK.var"#fixpoint_map#784"{Float64, ScfConvergenceDensity, χ0Mixing, Float64, typeof(lobpcg_hyper), AdaptiveDiagtol, AdaptiveBands, FermiTwoStage, ScfDefaultCallback, Bool, PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Float64}, Dates.DateTime, UInt64})(ρin::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:177
 [19] (::DFTK.var"#fp_solver#733"{DFTK.var"#fp_solver#732#734"})(f::DFTK.var"#fixpoint_map#784"{Float64, ScfConvergenceDensity, χ0Mixing, Float64, typeof(lobpcg_hyper), AdaptiveDiagtol, AdaptiveBands, FermiTwoStage, ScfDefaultCallback, Bool, PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Float64}, Dates.DateTime, UInt64}, x0::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, maxiter::Int64; tol::Float64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/scf_solvers.jl:17
 [20] macro expansion
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:209 [inlined]
 [21] self_consistent_field(basis::PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}; ρ::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, ψ::Nothing, tol::Float64, is_converged::ScfConvergenceDensity, maxiter::Int64, maxtime::Dates.Year, mixing::χ0Mixing, damping::Float64, solver::DFTK.var"#fp_solver#733"{DFTK.var"#fp_solver#732#734"}, eigensolver::typeof(lobpcg_hyper), diagtolalg::AdaptiveDiagtol, nbandsalg::AdaptiveBands, fermialg::FermiTwoStage, callback::ScfDefaultCallback, compute_consistent_energies::Bool, response::ResponseOptions)
    @ DFTK /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:237
 [22] run_problem(; architecture::DFTK.GPU{CuArray})
    @ Main.var"##293" /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:14
 [23] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:18
 [24] eval
    @ ./boot.jl:370 [inlined]
 [25] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1903
 [26] include_string(m::Module, txt::String, fname::String)
    @ Base ./loading.jl:1913
 [27] #invokelatest#2
    @ ./essentials.jl:819 [inlined]
 [28] invokelatest
    @ ./essentials.jl:816 [inlined]
 [29] #3
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:102 [inlined]
 [30] withpath(f::TestItemRunner.var"#3#4"{String, String, Module}, path::String)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/vendored_code.jl:7
 [31] run_testitem(filepath::String, use_default_usings::Bool, setups::Vector{Symbol}, package_name::String, original_code::String, line::Int64, column::Int64, test_setup_module_set::TestItemRunner.TestSetupModuleSet)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:101
 [32] run_tests(path::String; filter::typeof(dftk_testfilter), verbose::Bool)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:185
 [33] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests_runner.jl:32
 [34] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [35] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests.jl:28
 [36] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [37] top-level scope
    @ none:6
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:18
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests_runner.jl:32
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests.jl:22
ERROR: Package DFTK errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Types.jl:69
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:2021
 [3] test
   @ /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:1902 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Vector{String}, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{IOStream}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{IOStream}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:441
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IOStream}, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :test_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:156
 [6] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :test_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:171
 [7] top-level scope
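
The trace shows `issorted` being called on a `CuArray` (frames [8]–[10]), which iterates the array element by element on the host. A minimal sketch of the failure and the two usual workarounds (this is illustrative, not DFTK code; it assumes a CUDA-capable setup with CUDA.jl and GPUArrays.jl installed):

```julia
using CUDA, GPUArrays

λ = CuArray([3.0, 1.0, 2.0])

# `Base.issorted` iterates elementwise, which triggers scalar indexing
# on a GPU array and raises exactly the error above:
# issorted(λ)              # ERROR: Scalar indexing is disallowed.

# Workaround 1: transfer to the host first, then check on the CPU.
issorted(Array(λ))

# Workaround 2: explicitly allow scalar iteration for this one call
# (slow, but fine for a small diagnostic check).
GPUArrays.@allowscalar issorted(λ)
```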
@mfherbst mfherbst added the bug Something isn't working label Mar 19, 2024
@antoine-levitt
Member

OK, I fixed it directly on master (hopefully). Can you test?

@antoine-levitt
Member

Also, should we report this to one of the GPU packages? Which one?

@mfherbst
Member Author

That does not fix it yet:

ERROR: LoadError: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}} which is not isbits.
    .2 is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}} which is not isbits.
      .args is of type Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
        .2 is of type Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
          .x is of type LinearAlgebra.Adjoint{Float64, Vector{Float64}} which is not isbits.
            .parent is of type Vector{Float64} which is not isbits.
Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/validation.jl:92
  [2] macro expansion
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:123 [inlined]
  [3] macro expansion
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:121
  [5] codegen
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:110 [inlined]
  [6] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:106
  [7] compile
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:98 [inlined]
  [8] #1072
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/compilation.jl:247 [inlined]
  [9] JuliaContext(f::CUDA.var"#1072#1075"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/driver.jl:47
 [10] compile(job::GPUCompiler.CompilerJob)
    @ CUDA /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/compilation.jl:246
 [11] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/execution.jl:125
 [12] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUCompiler/U36Ed/src/execution.jl:103
 [13] macro expansion
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/execution.jl:367 [inlined]
 [14] macro expansion
    @ ./lock.jl:267 [inlined]
 [15] cufunction(f::GPUArrays.var"#broadcast_kernel#38", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/execution.jl:362
 [16] cufunction(f::GPUArrays.var"#broadcast_kernel#38", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{ComplexF64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float64, Vector{Float64}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}, Int64}})
    @ CUDA /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/execution.jl:359
 [17] macro expansion
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/compiler/execution.jl:112 [inlined]
 [18] #launch_heuristic#1122
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/gpuarrays.jl:17 [inlined]
 [19] launch_heuristic
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/CUDA/htRwP/src/gpuarrays.jl:15 [inlined]
 [20] _copyto!
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
 [21] copyto!
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:37 [inlined]
 [22] copy
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:28 [inlined]
 [23] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(-), Tuple{CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.Mem.DeviceBuffer}, Nothing, typeof(*), Tuple{CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, LinearAlgebra.Adjoint{Float64, Vector{Float64}}}}}})
    @ Base.Broadcast ./broadcast.jl:873
 [24] final_retval(X::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, AX::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, BX::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, resid_history::Matrix{Float64}, niter::Int64, n_matvec::Int64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/lobpcg_hyper_impl.jl:303
 [25] macro expansion
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/lobpcg_hyper_impl.jl:449 [inlined]
 [26] LOBPCG(A::DFTK.DftHamiltonianBlock, X::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, B::LinearAlgebra.UniformScaling{Bool}, precon::PreconditionerTPA{Float64}, tol::Float64, maxiter::Int64; miniter::Int64, ortho_tol::Float64, n_conv_check::Int64, display_progress::Bool)
    @ DFTK /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:237
 [27] LOBPCG
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:230 [inlined]
 [28] lobpcg_hyper(A::DFTK.DftHamiltonianBlock, X0::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}; maxiter::Int64, prec::PreconditionerTPA{Float64}, tol::Float64, largest::Bool, n_conv_check::Int64, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:miniter,), Tuple{Int64}}})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/diag_lobpcg_hyper.jl:9
 [29] diagonalize_all_kblocks(eigensolver::typeof(lobpcg_hyper), ham::Hamiltonian, nev_per_kpoint::Int64; ψguess::Nothing, prec_type::Type{PreconditionerTPA}, interpolate_kpoints::Bool, tol::Float64, miniter::Int64, maxiter::Int64, n_conv_check::Int64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/eigen/diag.jl:50
 [30] next_density(ham::Hamiltonian, nbandsalg::AdaptiveBands, fermialg::FermiTwoStage; eigensolver::Function, ψ::Nothing, eigenvalues::Nothing, occupation::Nothing, kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:miniter, :tol), Tuple{Int64, Float64}}})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:74
 [31] (::DFTK.var"#fixpoint_map#784"{Float64, ScfConvergenceDensity, χ0Mixing, Float64, typeof(lobpcg_hyper), AdaptiveDiagtol, AdaptiveBands, FermiTwoStage, ScfDefaultCallback, Bool, PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Float64}, Dates.DateTime, UInt64})(ρin::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer})
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:177
 [32] (::DFTK.var"#fp_solver#733"{DFTK.var"#fp_solver#732#734"})(f::DFTK.var"#fixpoint_map#784"{Float64, ScfConvergenceDensity, χ0Mixing, Float64, typeof(lobpcg_hyper), AdaptiveDiagtol, AdaptiveBands, FermiTwoStage, ScfDefaultCallback, Bool, PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Float64}, Dates.DateTime, UInt64}, x0::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, maxiter::Int64; tol::Float64)
    @ DFTK /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/scf_solvers.jl:17
 [33] macro expansion
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/src/scf/self_consistent_field.jl:209 [inlined]
 [34] self_consistent_field(basis::PlaneWaveBasis{Float64, Float64, DFTK.GPU{CuArray}, CuArray{StaticArraysCore.SVector{3, Int64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Float64}, 3, CUDA.Mem.DeviceBuffer}, CuArray{StaticArraysCore.SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}}; ρ::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, ψ::Nothing, tol::Float64, is_converged::ScfConvergenceDensity, maxiter::Int64, maxtime::Dates.Year, mixing::χ0Mixing, damping::Float64, solver::DFTK.var"#fp_solver#733"{DFTK.var"#fp_solver#732#734"}, eigensolver::typeof(lobpcg_hyper), diagtolalg::AdaptiveDiagtol, nbandsalg::AdaptiveBands, fermialg::FermiTwoStage, callback::ScfDefaultCallback, compute_consistent_energies::Bool, response::ResponseOptions)
    @ DFTK /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:237
 [35] run_problem(; architecture::DFTK.GPU{CuArray})
    @ Main.var"##293" /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:14
 [36] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:18
 [37] eval
    @ ./boot.jl:370 [inlined]
 [38] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1903
 [39] include_string(m::Module, txt::String, fname::String)
    @ Base ./loading.jl:1913
 [40] #invokelatest#2
    @ ./essentials.jl:819 [inlined]
 [41] invokelatest
    @ ./essentials.jl:816 [inlined]
 [42] #3
    @ /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:102 [inlined]
 [43] withpath(f::TestItemRunner.var"#3#4"{String, String, Module}, path::String)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/vendored_code.jl:7
 [44] run_testitem(filepath::String, use_default_usings::Bool, setups::Vector{Symbol}, package_name::String, original_code::String, line::Int64, column::Int64, test_setup_module_set::TestItemRunner.TestSetupModuleSet)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:101
 [45] run_tests(path::String; filter::typeof(dftk_testfilter), verbose::Bool)
    @ TestItemRunner /scratch/hpc-prf-dftkjl/dftkjl01/.julia-ci/packages/TestItemRunner/uEMJE/src/TestItemRunner.jl:185
 [46] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests_runner.jl:32
 [47] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [48] top-level scope
    @ /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests.jl:28
 [49] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [50] top-level scope
    @ none:6
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/gpu.jl:18
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests_runner.jl:32
in expression starting at /scratch/hpc-prf-dftkjl/ci-jacamar/data/dftkjl01/builds/s-UUqT3R/000/herbstm/DFTK.jl/test/runtests.jl:22
ERROR: Package DFTK errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Types.jl:69
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:2021
 [3] test
   @ /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/Operations.jl:1902 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Vector{String}, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{IOStream}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{IOStream}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:441
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IOStream}, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :test_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:156
 [6] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :test_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /opt/software/pc2/EB-SW/software/JuliaHPC/1.9.3-foss-2022a-CUDA-11.7.0/share/julia/stdlib/v1.9/Pkg/src/API.jl:171
 [7] top-level scope
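
This second failure is a kernel-argument issue rather than scalar indexing: the broadcast in `final_retval` captures a host-side `LinearAlgebra.Adjoint{Float64, Vector{Float64}}`, and a `Vector` is not isbits, so it cannot be passed to a GPU kernel. A minimal sketch of the pattern and a fix (illustrative only, not the actual DFTK code; array names are made up):

```julia
using CUDA, LinearAlgebra

X = CUDA.rand(ComplexF64, 4, 3)  # device matrix
λ = rand(3)                      # host vector (e.g. eigenvalues)

# Mixing the host adjoint into a device broadcast fails to compile:
# X .- X .* λ'          # KernelError: passing non-bitstype argument

# Moving the vector to the device first makes the broadcast isbits-clean:
X .- X .* CuArray(λ)'
```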

By the way, can you see the output at https://git.uni-paderborn.de/herbstm/DFTK.jl/-/jobs/258733 ?

If yes, you can debug this yourself by making a branch whose name ends in gpu; the GPU CI then runs on every commit.

@mfherbst
Member Author

Also, I think your current fix does not offload to the GPU when the eigenvalues are already sorted, right?

@mfherbst
Member Author

mfherbst commented May 9, 2024

Fixed now.

@mfherbst mfherbst closed this as completed May 9, 2024
@antoine-levitt
Member

Sorry, I completely forgot about this. Great that it's fixed!
