Tpetra::defaultArgNode fails in serial build #3033
Comments
Nobody -- including Matrix Market I/O -- should be creating Node instances explicitly. I can fix Matrix Market I/O not to do this, and I think I can also get rid of the Node instances so they have no power to affect things.
Oh wait, nothing in the Matrix Market I/O routines calls @kddevin The issue could be that you're creating Tpetra objects at
#include <Teuchos_Comm.hpp>
#include <Teuchos_DefaultComm.hpp>
#include <Teuchos_RCP.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <MatrixMarket_Tpetra.hpp>
#include <exception>
#include <iostream>
#include <string>
#include <sstream>
int main(int narg, char *arg[])
{
  Teuchos::GlobalMPISession session(&narg, &arg, NULL);
  {
    // Inner scope: all Tpetra objects are destroyed before the session ends.
    Teuchos::RCP<const Teuchos::Comm<int> > comm = Teuchos::DefaultComm<int>::getComm();
    int rank = comm->getRank();
    typedef Tpetra::CrsMatrix<> tcrsMatrix_t;
    typedef typename tcrsMatrix_t::node_type node_t;
    typedef Tpetra::MatrixMarket::Reader<tcrsMatrix_t> reader_t;
    // Call defaultArgNode directly to isolate the failure.
    try {
      std::cout << rank << " Calling defaultArgNode" << std::endl;
      Teuchos::RCP<node_t> defNode = Tpetra::defaultArgNode<node_t>();
    }
    catch (std::exception &e) {
      std::cout << "FAIL Exception caught: " << e.what() << std::endl;
      return -1;
    }
    std::string basename("simple");
    std::ostringstream fname;
    fname << basename << ".mtx";
    Teuchos::RCP<tcrsMatrix_t> mat;
    // Read the Matrix Market file; the reader creates Maps internally.
    try {
      if (rank == 0)
        std::cout << "Trying to read file " << fname.str() << std::endl;
      mat = reader_t::readSparseFile(fname.str(), comm, true, false, false);
    }
    catch (std::exception &e) {
      std::cout << "FAIL Exception caught: " << e.what() << std::endl;
      return -1;
    }
    std::cout << "PASS Matrix #rows=" << mat->getGlobalNumRows()
              << " " << mat->getNodeNumRows() << std::endl;
  }
  return 0;
}
Thanks, @mhoemmen. No, the problem isn't the scoping. With the suggestion above, I still get the error (and, I assume, I would have gotten it in the MPI case as well if scoping were the problem). readSparseFile creates the node as an argument to the map constructor. How does the Kokkos node differ between MPI builds and non-MPI builds? If the node instances are not needed in readSparseFile, I can remove them. Were they there just to ensure Kokkos::initialize got called?
This serial test on the dashboard is passing. Let me see where my environment differs.
@kddevin wrote:
It doesn't at all. Tpetra's Node creation initializes Kokkos if it hasn't already been initialized. It tries to get command-line arguments from
Oddly, the code works without error if I build with -DBUILD_SHARED_LIBS:BOOL=ON as on the nightly test dashboard, but fails without that build option. Is BUILD_SHARED_LIBS=ON now required for serial builds? (If so, backward compatibility was broken somewhere along the line.) And if it is required, why is the default FALSE?
Here's the configuration that worked; without the BUILD_SHARED_LIBS line, the test throws an exception. Note that I am not using #3044 for these tests.
@kddevin wrote:
That's really quite weird. The current
After further investigation, I think the problem is that I have TPL_ENABLE_Pthread=OFF. I tested #3044 with and without shared libraries; both cases threw an unknown error. Is Pthread now required to build Trilinos? If so, backward compatibility was broken somewhere along the line. Also, we shouldn't give users the option to disable it if it is now required (especially since the code compiles without it). If not, perhaps there is still something wrong with my configuration.
@kddevin There should be no need to set
Thanks, @mhoemmen. Are there cases where std::call_once is needed to handle threading? If not, could we instead check whether Kokkos is initialized and, if not, initialize it?
@kddevin wrote:
The only use case that would require
Closing; #3057 contains the true issue. The Tpetra behavior was only a result.
@trilinos/tpetra @trilinos/teuchos Tpetra::Map was and still is responsible for initializing Kokkos, if the user hasn't done it already. This commit moves the initialization code out of Teuchos into Tpetra. It also removes the dependency on std::call_once. It appears that with GCC, std::call_once only works if linking with libpthread. Thus, setting TPL_ENABLE_Pthread=OFF (which we don't recommend -- Trilinos autodetects this) could break std::call_once. This change could break a possible use case in which Kokkos::initialize has not been called, and different user threads each create different Tpetra::Map instances. However, Trilinos does not test this use case, nor do the applications we support appear to exercise it. I also took the liberty to purge some unnecessary header includes.
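The check-then-initialize idea described in this commit message can be sketched roughly as follows. This is only an illustration, not the actual Tpetra code: the `fake_runtime` namespace and the name `initializeIfNeeded` are hypothetical stand-ins (for Kokkos::is_initialized and Kokkos::initialize) so that the sketch compiles without Trilinos. As the commit message notes for the real change, this pattern gives no protection if several threads race to perform the first initialization.

```cpp
#include <cassert>

// Hypothetical stand-in for the Kokkos runtime, so the sketch builds
// without Trilinos. These are NOT real Kokkos or Tpetra symbols.
namespace fake_runtime {
  bool initialized = false;
  int initCount = 0;  // how many times initialize() actually ran

  bool is_initialized() { return initialized; }
  void initialize() { initialized = true; ++initCount; }
}

// Initialize the runtime only if nobody has done so already.
// There is no std::call_once here, hence no dependency on libpthread,
// but also no guard against concurrent first calls from several threads.
void initializeIfNeeded() {
  if (!fake_runtime::is_initialized()) {
    fake_runtime::initialize();
  }
  assert(fake_runtime::is_initialized());
}
```

Calling `initializeIfNeeded()` repeatedly from a single thread runs the stand-in `initialize()` only on the first call; later calls see the flag and return immediately.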
@trilinos/tpetra @trilinos/teuchos @trilinos/kokkos
Expectations
Calls to defaultArgNode should run whether Teuchos::Comm is serial or MPI.
Current Behavior
A simple test program (attached) calling defaultArgNode works when built with MPI, but not when built without MPI.
Without MPI, I get
terminate called after throwing an instance of 'std::system_error'
what(): Unknown error 18446744073709551615
Here are the relevant bits of the stack trace:
0x0000000001862a1c in KokkosCompat::Details::initializeKokkos() ()
at /home/.../packages/teuchos/kokkoscompat/src/KokkosCompat_Details_KokkosInit.cpp:95
0x000000000185fbdb in Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>::KokkosDeviceWrapperNode(Teuchos::ParameterList&) ()
at /home/.../packages/teuchos/kokkoscompat/src/KokkosCompat_ClassicNodeAPI_Wrapper.cpp:171
0x00000000016aaaf9 in Teuchos::RCP<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > KokkosClassic::Details::getNode<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >(Teuchos::RCP<Teuchos::ParameterList> const&) ()
at /home/.../packages/tpetra/classic/NodeAPI/Kokkos_DefaultNode.cpp:56
0x00000000010cd808 in Teuchos::RCP<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > Tpetra::defaultArgNode<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >() ()
at /home/.../packages/tpetra/core/src/Tpetra_Map_decl.hpp:91
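The thread attributes this `std::system_error` to `std::call_once` failing when GCC's standard library is used without linking libpthread. A small probe along the following lines, using only the standard library, may help check whether a particular toolchain and link line are affected. The function name `probeCallOnce` is introduced here for illustration and is not part of Trilinos.

```cpp
#include <cassert>
#include <mutex>
#include <system_error>

// Returns how many times the callable ran under std::call_once
// (1 on a working setup), or -1 if the runtime threw std::system_error.
int probeCallOnce() {
  std::once_flag flag;
  int calls = 0;
  try {
    std::call_once(flag, [&calls] { ++calls; });
    std::call_once(flag, [&calls] { ++calls; });  // same flag: must not run again
  } catch (const std::system_error&) {
    return -1;
  }
  assert(calls <= 1);
  return calls;
}
```

Building a tiny program that calls `probeCallOnce()` with the same compiler and link flags as the failing application, once with and once without the pthread library, should indicate whether the link line is the cause.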
Motivation and Context
Tpetra::MatrixMarket::Reader::readSparseFile calls this function when it attempts to create maps (e.g., makeRangeMap()).
Thus, the reader does not work for my serial build.
Definition of Done
Maybe my serial environment is wrong -- please advise. My script used to work, so if the environment is wrong, backward compatibility was lost somewhere along the line.
Otherwise, the test program should run with TPL_ENABLE_MPI=ON or OFF.
Possible Solution
Steps to Reproduce
See attached test program, which demonstrates the fault.
It reads a matrix-market file simple.mtx; you can use any matrix-market file with this name.
The test calls defaultArgNode directly, which a user wouldn't usually do. readSparseFile calls it internally when it creates Maps.
mmReader.cpp.txt
Your Environment
module purge
module load sems-env
module load sems-gcc/4.9.3
cmake \
-D TPL_ENABLE_Pthread:BOOL=OFF \
-D CMAKE_BUILD_TYPE:STRING="DEBUG" \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=OFF \
-D TPL_ENABLE_MPI:BOOL=OFF \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_TESTS:BOOL=OFF \
-D Trilinos_ENABLE_EXAMPLES:BOOL=OFF \
-D Trilinos_VERBOSE_CONFIGURE:BOOL=OFF \
-D Trilinos_ENABLE_Zoltan2:BOOL=ON \
-D Zoltan2_ENABLE_TESTS:BOOL=ON \
..
Related Issues
Additional Information