UBSAN issue with Hamming due to misalignment #456

Open
jlmelville opened this issue Feb 24, 2020 · 1 comment

@jlmelville

../inst/include/annoylib.h:662:18: runtime error: load of misaligned address 0x55d2429c5424 for type 'const long unsigned int', which requires 8 byte alignment

Actually this PR won't fix that issue, and I'm not sure how to even fix it. Some thoughts:

  • Make sure that the memory is always aligned to 8 byte offsets, possibly using something like http://man7.org/linux/man-pages/man3/posix_memalign.3.html
  • Rewrite the hamming code to use 4 byte ints instead of 8 byte ints
  • Change the few places where we access v[i] to not use an array dereference

None of these are entirely trivial.
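For illustration, the third option (avoiding a direct array dereference) could look roughly like the sketch below. It is not taken from annoylib.h: the names `load_u64`, `popcount64` and `hamming_distance` are hypothetical, and the real code may store and index its vectors differently. The idea is to read each 64-bit word with `std::memcpy`, which is well defined for any alignment and which optimizing compilers typically lower to a single load on common targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of annoylib.h): read one 64-bit word from a
// byte pointer that may not be 8-byte aligned. memcpy has no alignment
// requirement, so UBSan does not report a misaligned load here.
inline uint64_t load_u64(const unsigned char* p) {
  uint64_t w;
  std::memcpy(&w, p, sizeof(w));
  return w;
}

// Portable bit count: each iteration clears the lowest set bit.
inline int popcount64(uint64_t x) {
  int n = 0;
  while (x) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Hamming distance between two bit vectors of `words` 64-bit words each.
// Both pointers may be misaligned.
inline int hamming_distance(const unsigned char* a, const unsigned char* b,
                            std::size_t words) {
  int d = 0;
  for (std::size_t i = 0; i < words; ++i) {
    const std::size_t off = i * sizeof(uint64_t);
    d += popcount64(load_u64(a + off) ^ load_u64(b + off));
  }
  return d;
}
```

With this shape, only the load sites change; the storage layout and the 8-byte word size stay as they are, so the first two options are not required.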

Originally posted by @erikbern in #455 (comment)

@alexey-milovidov

It should be trivial to fix, and every reasonable C++ library should be tested with ASan+MSan+TSan+UBSan+Fuzzing.

rschu1ze added a commit to rschu1ze/ClickHouse that referenced this issue May 12, 2024
- this method is rarely, if ever, found in other vector search offerings
- it works reasonably well for low dimensions but suffers badly from the
  curse of dimensionality, which makes it unsuitable for a high number of
  dimensions
- now that Annoy with spotify/annoy#456 is
  gone, we can drop 'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from
  tests
rschu1ze added a commit to rschu1ze/ClickHouse that referenced this issue Jun 10, 2024
Annoy indexes are not popular these days, at least when it comes to vector
databases. Such indexes work okay-ish in low dimensions but they suffer
badly from the curse of dimensionality, which makes them unsuitable for a
high number of dimensions.

Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.

(*) spotify/annoy#456
rschu1ze added a commit to rschu1ze/ClickHouse that referenced this issue Aug 9, 2024
Annoy indexes fell out of favor in the community, at least when it comes
to vector databases. Such indexes work okay-ish in low dimensions but they
suffer badly from the curse of dimensionality, which makes them unsuitable
for a high number of dimensions.

Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.

(*) spotify/annoy#456