Skip to content

Kokkoskernels

Siva Rajamanickam edited this page Mar 14, 2018 · 2 revisions

Kokkoskernels Overview

Kokkos C++ Performance Portability Programming EcoSystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

KokkosKernels implements local computational kernels for linear algebra and graph operations, using the Kokkos shared-memory parallel programming model. "Local" means not using MPI, or running within a single MPI process without knowing about MPI. "Computational kernels" are coarse-grained operations; they take a lot of work and make sense to parallelize inside using Kokkos. KokkosKernels can be the building block of a parallel linear algebra library like Tpetra that uses MPI and threads for parallelism, or it can be used stand-alone in your application.

Computational kernels in this subpackage include the following:

  • (Multi)vector dot products, norms, and updates (AXPY-like operations that add vectors together entry-wise)
  • Sparse matrix-vector multiply and other sparse matrix / dense vector kernels
  • Sparse matrix-matrix multiply
  • Graph coloring
  • Gauss-Seidel with coloring (generalization of red-black)
  • Other linear algebra and graph operations

KokkosKernels is licensed under standard 3-clause BSD terms of use. For specifics, please refer to the LICENSE file contained in the repository or distribution.

We organize this directory as follows:

  1. Public interfaces to computational kernels live in the src/ subdirectory (kokkos-kernels/src):

    • Kokkos_Blas1_MV.hpp: (Multi)vector operations that Tpetra::MultiVector uses
    • Kokkos_Sparse_CrsMatrix.hpp: Declaration and definition of KokkosSparse::CrsMatrix, the sparse matrix data structure used for the computational kernels below
    • KokkosSparse_spmv.hpp: Sparse matrix-vector multiply with a single vector, stored in a 1-D View + Sparse matrix-vector multiply with multiple vectors at a time (multivectors), stored in a 2-D View
  2. Implementations of computational kernels live in the src/impl/ subdirectory (kokkos-kernels/src/impl)

  3. Correctness tests live in the unit_test/ subdirectory, and performance tests live in the perf_test/ subdirectory

  4. Simple example scripts to build Kokkoskernels are in example/buildlib/

Do NOT use or rely on anything in the KokkosBlas::Impl namespace, or on anything in the impl/ subdirectory.

This separation of interface and implementation lets the interface assign the users' Views to View types with the desired attributes (e.g., read-only, RandomRead). This also makes it easier to provide full specializations of the implementation. "Full specializations" mean that all the template parameters are fixed, so that the compiler can actually compile the code. This technique keeps your library's or application's build times down, since kernels are already precompiled for certain template parameter combinations. It also improves performance, since compilers have an easier time optimizing code in shorter .cpp files.

Building Kokkoskernels

  1. Modify example/buildlib/compileKokkosKernelsSimple.sh or example/buildlib/compileKokkosKernels.sh for your environment and run it to generate the required makefiles.
    • KOKKOS_DEVICES can be as below. You can remove any backend that you don't need. If cuda backend is used, CXX compiler should point to ${KOKKOS_PATH}/config/nvcc_wrapper. If you enable Cuda, a host space, either OpenMP or Serial should be enabled. KOKKOS_DEVICES=OpenMP,Serial,Cuda

    • For the best performance give the architecture flag to proper architecture. e.g. KNLs: KOKKOS_ARCHS=KNL, KOKKOS_ARCHS=HSW. If you compile for P100 GPUs with Power8 Processor, give both architectures. KOKKOS_ARCHS=Pascal60,Power8

      For the architecture flags, run below command. %: scripts/generate_makefile.bash --help

  2. Run "make build-test" to compile the tests.

Using Kokkoskernels Test Drivers

In perf_test there are test drivers.

-- KokkosGraph_triangle.exe : Triangle counting driver. -- KokkosSparse_spgemm.exe : Sparse Matrix Sparse Matrix Multiply: *NOTE: KKMEM is outdated. Use default algorithm: KKSPGEMM = KKDEFAULT = DEFAULT Or within the code: kh.create_spgemm_handle(KokkosSparse::SPGEMM_KK); -- KokkosSparse_spmv.exe : Sparse matvec. -- KokkosSparse_pcg.exe: CG method with Gauss Seidel as preconditioner. -- KokkosGraph_color.exe: Distance-1 Graph coloring -- KokkosKernels_MatrixConverter.exe: given a matrix market format, converts it ".bin" binary format for fast input output readings, which can be read by other test drivers.

Please report bugs or performance issues to: https://github.com/kokkos/kokkos-kernels/issues

Clone this wiki locally