Tpetra: cusparse misalignment error in spmv on subvectors #11926

Closed
maartenarnst opened this issue May 30, 2023 · 10 comments
Labels
MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@maartenarnst
Contributor

Bug Report

@tpetra, @kokkos-kernels, @csiefer2

Description

We're using Anasazi's GeneralizedDavidson solver to solve an eigenproblem with an odd number of rows.

On CPU, it works. However, in a cuda build with cusparse, the solver aborts with a "cudaErrorMisalignedAddress" error.

error( cudaErrorMisalignedAddress): misaligned address  ...

void KokkosSparse::Impl::spmv_cusparse<KokkosSparse::CrsMatrix<double const, int const, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, 

We tracked down the issue to an spmv call on the following line

What appears to be happening is that Anasazi creates two multivectors with multiple columns (d_V and d_AV). The first stores certain vectors; the second stores the result of multiplying the matrix with these vectors. The issue appears to arise because Anasazi performs the spmv on subsets of the columns (V_new and AV_new). In particular, we see the abort when it extracts the second vector from each multivector and runs the spmv on those (V_new contains the second vector from d_V and AV_new contains the second vector from d_AV).

It appears that when the eigenproblem has an odd number of rows, V_new and AV_new are not aligned in the way cusparse expects. I wasn't able to find in the cusparse documentation what exactly cusparse expects for spmv. However, certain other cusparse functions expect alignment to 16 bytes. If that is also the case here, then with an odd number of rows of 8-byte doubles, the second column starts at an odd multiple of 8 bytes from the start of the allocation, so it is only 8-byte aligned even if the multivector itself is 16-byte aligned.

We're able to reproduce the issue using just functions from Tpetra (getVectorNonConst and apply). So even though the issue arises in a computation with Anasazi, it appears this is not really an Anasazi issue, but rather an issue concerning Tpetra's multivector alignment in connection with cusparse spmv.

We tried passing Kokkos::AllowPadding to the constructor of the (dual) view, but this didn't solve the problem.

We are unsure how to analyze/solve the issue further.

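#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>
#include <Teuchos_RCP.hpp>
#include <Teuchos_Tuple.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_MultiVector.hpp>

// TEUCHOS_SERIAL_COMM is not defined in this snippet; it is presumably a helper
// from the reporter's test setup that yields a serial Teuchos communicator.

using scalar_t        = double;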
using execution_space = Kokkos::DefaultExecutionSpace;

using local_ordinal_t  = Tpetra::Details::DefaultTypes::local_ordinal_type;
using global_ordinal_t = Tpetra::Details::DefaultTypes::global_ordinal_type;

using node = Tpetra::KokkosCompat::KokkosDeviceWrapperNode<execution_space>;

using multivector_t = Tpetra::MultiVector<scalar_t, local_ordinal_t, global_ordinal_t, node>;
using matrix_t      = Tpetra::CrsMatrix  <scalar_t, local_ordinal_t, global_ordinal_t, node>;
using map_t         = multivector_t::map_type;

TEST(SPMV, odd_number_of_rows)
{
    constexpr size_t num_rows = 5;
    constexpr size_t num_values_per_row = 3;

    auto contiguous_map = Teuchos::make_rcp<const map_t>(num_rows, 0, TEUCHOS_SERIAL_COMM);

    auto mat_A = Teuchos::make_rcp<matrix_t>(contiguous_map, num_values_per_row);

    mat_A->insertGlobalValues(
        0,
        Teuchos::tuple<global_ordinal_t>( 0 , 1 ),
        Teuchos::tuple<scalar_t        >( 1., 2.)
    );

    mat_A->fillComplete();

    auto jvec  = Teuchos::make_rcp<multivector_t>(contiguous_map, 2);
    auto kvec  = Teuchos::make_rcp<multivector_t>(contiguous_map, 2);

    mat_A->apply(*jvec->getVectorNonConst(0), *kvec->getVectorNonConst(0)); // works
    mat_A->apply(*jvec->getVectorNonConst(0), *kvec->getVectorNonConst(1)); // cusparse misaligned address for odd num_rows

    using view_t = Kokkos::View<scalar_t**, execution_space>;
    const auto lvec_values = view_t(Kokkos::view_alloc("values", Kokkos::AllowPadding), num_rows, 2);
    const auto lvec = Teuchos::make_rcp<multivector_t>(contiguous_map, lvec_values);

    mat_A->apply(*jvec->getVectorNonConst(0), *lvec->getVectorNonConst(1)); // cusparse misaligned address for odd num_rows
}

TEST(SPMV, even_number_of_rows)
{
    constexpr size_t num_rows = 6;
    constexpr size_t num_values_per_row = 3;

    auto contiguous_map = Teuchos::make_rcp<const map_t>(num_rows, 0, TEUCHOS_SERIAL_COMM);

    auto mat_A = Teuchos::make_rcp<matrix_t>(contiguous_map, num_values_per_row);

    mat_A->insertGlobalValues(
        0,
        Teuchos::tuple<global_ordinal_t>( 0 , 1 ),
        Teuchos::tuple<scalar_t        >( 1., 2.)
    );

    mat_A->fillComplete();

    auto jvec  = Teuchos::make_rcp<multivector_t>(contiguous_map, 2);
    auto kvec  = Teuchos::make_rcp<multivector_t>(contiguous_map, 2);

    mat_A->apply(*jvec->getVectorNonConst(0), *kvec->getVectorNonConst(0)); // works
    mat_A->apply(*jvec->getVectorNonConst(0), *kvec->getVectorNonConst(1)); // works for even number of rows

    using view_t = Kokkos::View<scalar_t**, execution_space>;
    const auto lvec_values = view_t(Kokkos::view_alloc("values", Kokkos::AllowPadding), num_rows, 2);
    const auto lvec = Teuchos::make_rcp<multivector_t>(contiguous_map, lvec_values);

    mat_A->apply(*jvec->getVectorNonConst(0), *lvec->getVectorNonConst(1)); // works for even number of rows
}
@maartenarnst maartenarnst added the type: bug The primary issue is a bug in Trilinos code or tests label May 30, 2023
@jhux2
Member

jhux2 commented May 30, 2023

@trilinos/tpetra

@jhux2
Member

jhux2 commented May 30, 2023

@trilinos/kokkos-kernels

@brian-kelley
Contributor

Hi @maartenarnst Thanks for reporting this.

I'm having trouble replicating this on CUDA 11.6 and 12.0, though. Could you say which version you are using?

@brian-kelley
Contributor

Also, in the cudaErrorMisalignedAddress error message, I assume the pointer it gives is actually aligned to 8 bytes and not something smaller? Just to rule out any incorrect casting from a smaller type like float somewhere.

@maartenarnst
Contributor Author

Hi @brian-kelley,

We're using Cuda 12.1. The environment variable with the Cusparse version is NV_LIBCUSPARSE_DEV_VERSION=12.0.2.55-1.

The error message begins like

cudaFree(dBuffer) error( cudaErrorMisalignedAddress): misaligned address /home/costmo-user/Trilinos/packages/kokkos-kernels/sparse/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp:127
Backtrace:

It's referring to cudaFree(dBuffer), but since dBuffer is allocated by CUDA itself, it seems unlikely that this line is the source of the issue. This line may instead be reporting an error that actually occurred on the line before:

KOKKOS_CUSPARSE_SAFE_CALL(cusparseSpMV(cusparseHandle, myCusparseOperation,
                                         &alpha, A_cusparse, vecX, &beta, vecY,
                                         myCudaDataType, alg, dBuffer));

I'm not sure what you mean by the alignment of the pointer in the message. There seems to be no address in the message. Do you know what I should do to get this address to rule out the incorrect casting?

@brian-kelley
Contributor

Hi @maartenarnst Never mind about the offending pointer - I thought the Cuda misaligned error message included it, but it doesn't.

I'm talking to a cuSPARSE expert to try to understand the alignment requirements of SpMV, and why I'm not replicating it on CUDA 12.0. I think a likely fix for this will have 2 parts though:

  • in Tpetra::MultiVector, pad out the number of rows to get 16-byte alignment. However, this will only be possible for MultiVectors constructed by map and number of columns (like the ones in your example). If a user constructs an MV from a View or DualView, we can't control the alignment.
  • in KokkosKernels, check the alignment at runtime and call our own SpMV implementation in cases where cuSPARSE won't work.

Together, these changes should make it so the error never happens, but you still get the performance of cuSPARSE in the vast majority of cases.

@brian-kelley
Contributor

After trying some things, I think the first change (add padding in Tpetra) will only be possible after Kokkos core addresses kokkos/kokkos#2995. Right now, you can create a mirror of a padded view, but the mirror will be contiguous (meaning you can't deep copy between it and the original, as the strides are different). Several things in KokkosKernels require copying data between host and device, so currently those would break with padded views.

@brian-kelley
Contributor

Hi @maartenarnst I was never able to replicate this in the end, but I did talk back and forth with the cuSPARSE developers to understand when 16-byte alignment should be necessary, and put in the patch #12004. Would you mind checking that this fixes the original issue in Anasazi, or your smaller reproducer? Thanks!


This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Jun 29, 2024
@brian-kelley
Contributor

Fixed via #12004

@jhux2 jhux2 added this to Tpetra Aug 12, 2024
@jhux2 jhux2 moved this to Done in Tpetra Aug 12, 2024