Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testSymbolicFactorGraph unit test failure #181

Closed
jlblancoc opened this issue Dec 7, 2019 · 5 comments · Fixed by #183 or #186
Closed

testSymbolicFactorGraph unit test failure #181

jlblancoc opened this issue Dec 7, 2019 · 5 comments · Fixed by #183 or #186
Labels
bug Bug report

Comments

@jlblancoc
Copy link
Member

As part of debugging #168, I found a new issue running the unit tests for i386 (for some reason it works right on amd64):

 68/237 Test  #68: testSymbolicFactorGraph ................***Failed    0.08 sec
Not equal:
expected:
: cliques: 4, variables: 6
expected:
- P( e0 l0 b0)
expected:
| - P( s0 | b0 l0)
expected:
| - P( t0 | e0 l0)
expected:
| - P( x0 | e0)
actual:
: cliques: 4, variables: 6
actual:
- P( e0 l0 b0)
actual:
| - P( t0 | e0 l0)
actual:
| - P( x0 | e0)
actual:
| - P( s0 | b0 l0)
/home/jlblanco/code/gtsam/gtsam/symbolic/tests/testSymbolicFactorGraph.cpp:100: Failure: "assert_equal(asiaBayesTree, actual2)" 
There were 1 failures

@dellaert Any idea where to start looking at to debug?

@dellaert
Copy link
Member

dellaert commented Dec 7, 2019

Interesting: I did encounter this before; it looks like it is only the order of the cliques that does not match, which is probably non-determinism in std::map?

@jlblancoc
Copy link
Member Author

Hmm... I'm almost sure it must be related with an overflow in 32bit integers in the symbolic names macros. Will nail it down...

@jlblancoc
Copy link
Member Author

jlblancoc commented Dec 9, 2019

Ok, here's the problem:

#  define CONCURRENT_MAP_BASE tbb::concurrent_unordered_map<KEY, VALUE>

Using a TBB hash-based index makes the order of nodes in a BayesTree not deterministic if a Key is above the 32bit limits (I'm talking of 32 bit architectures). At the bottom, the problem is inside tbb:

inline size_t tbb_hasher( const T& t ) {
    return static_cast<size_t>( t ) * internal::hash_multiplier;

They use size_t, which is not portable across architectures.
I have verified that building WITHOUT tbb fixes the problem.

There are few good solutions I can see... The only way I could fix this is by changing keyBits here:

// Symbol.cpp
//static const size_t keyBits = sizeof(Key) * 8;
static const size_t keyBits = std::min(sizeof(Key),sizeof(std::size_t)) * 8;
static const size_t chrBits = sizeof(unsigned char) * 8;
static const size_t indexBits = keyBits - chrBits;
static const Key chrMask = Key(UCHAR_MAX)  << indexBits; // For some reason, std::numeric_limits<unsigned char>::max() fails
static const Key indexMask = ~chrMask;

which effectively halves the number of possible Key values, wasting half of the Key 64bit space...
It could be enabled with #if only for 32 bit builds, and/or when TBB is enabled, but that would break binary compatibility of serialized objects, I guess...

What do you think, @dellaert ?

@dellaert
Copy link
Member

dellaert commented Dec 9, 2019

Good find!
But, Key has to stay 64bit, for the reason you suggested and because some applications need it. Can we provide a different hash function? Or simply say we don't guarantee the order, either?

@jlblancoc
Copy link
Member Author

Can we provide a different hash function?

That is the best option.

jlblancoc added a commit that referenced this issue Dec 10, 2019
jlblancoc added a commit that referenced this issue Dec 10, 2019
jlblancoc added a commit that referenced this issue Dec 10, 2019
jlblancoc added a commit that referenced this issue Dec 13, 2019
jlblancoc added a commit that referenced this issue Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report
Projects
None yet
2 participants