Check failure happens in SsdFile.h #10098

Open
yma11 opened this issue Jun 7, 2024 · 6 comments

@yma11
Contributor

yma11 commented Jun 7, 2024

Bug description

When using AsyncDataCache with the SSD cache enabled, I get a size check failure: the cache entry size (13269190 bytes) exceeds the 8 MB (8388608-byte) limit. Here is the full error and stack trace:

E0606 22:20:34.059942 1534672 Exceptions.h:67] Line: /root/workspace/gluten-rebase/ep/build-velox/build/velox_ep/./velox/common/caching/SsdFile.h:42, Function:SsdRun, Expression: size <= 1 << kSizeBits (13269190 vs. 8388608), Source: RUNTIME, ErrorCode: INVALID_STATE
W0606 22:20:34.061712 1534672 SsdCache.cpp:134] [SSDCA] Ignoring error in SsdFile::write: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (13269190 vs. 8388608)
Retriable: False
Expression: size <= 1 << kSizeBits
Function: SsdRun
File: /root/workspace/gluten-rebase/ep/build-velox/build/velox_ep/./velox/common/caching/SsdFile.h
Line: 42
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::cache::SsdRun::SsdRun(unsigned long, unsigned int, unsigned int)
# 5  facebook::velox::cache::SsdFile::write(std::vector<facebook::velox::cache::CachePin, std::allocator<facebook::velox::cache::CachePin> >&)

System information

Velox System Info v0.0.2
Commit: 6ea98b6
CMake Version: 3.28.3
System: Linux-5.4.0-156-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

@yma11 yma11 added bug Something isn't working triage Newly created issue that needs attention. labels Jun 7, 2024
@yma11
Contributor Author

yma11 commented Jun 7, 2024

@oerling Do you have any idea about this? I think a file cache entry can easily be larger than 8 MB. Why do we limit it to this size here?

@yma11
Contributor Author

yma11 commented Jun 14, 2024

@xiaoxmeng @zacw7 Do you guys happen to know about this? Thanks.

@zacw7
Contributor

zacw7 commented Jun 14, 2024

SsdRun only reserves 23 bits (out of 64 bits) for size. Maybe we can expand it to 128 bits.
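
To make the limit concrete, here is a minimal sketch of packing an offset and a size into one 64-bit word with 23 bits for the size. It is not the actual SsdRun code: the class name `PackedRun`, the `fits` helper, and the choice to store `size - 1` (so that a size of exactly 1 << 23 still fits) are assumptions for illustration. Only the 23-bit width and the `size <= 1 << kSizeBits` check come from the error message above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a packed (offset, size) descriptor.
// Not Velox's SsdRun; names and layout are assumptions.
class PackedRun {
 public:
  static constexpr int kSizeBits = 23;
  static constexpr uint64_t kMaxSize = 1ULL << kSizeBits; // 8388608 bytes

  // True if 'size' can be represented in the 23-bit size field.
  static bool fits(uint64_t size) {
    return size != 0 && size <= kMaxSize;
  }

  PackedRun(uint64_t offset, uint32_t size) {
    if (!fits(size)) {
      throw std::invalid_argument("size does not fit in kSizeBits");
    }
    if (offset >= (1ULL << (64 - kSizeBits))) {
      throw std::invalid_argument("offset does not fit in remaining bits");
    }
    // Low 23 bits hold size - 1, the upper 41 bits hold the offset.
    bits_ = (offset << kSizeBits) | (static_cast<uint64_t>(size) - 1);
    assert(this->size() == size);
  }

  uint64_t offset() const {
    return bits_ >> kSizeBits;
  }

  uint32_t size() const {
    return static_cast<uint32_t>((bits_ & (kMaxSize - 1)) + 1);
  }

 private:
  uint64_t bits_;
};
```

With this layout, `PackedRun::fits(8388608)` is true, while `PackedRun::fits(13269190)` (the size from the log) is false, so constructing a run for that entry throws, which mirrors the failed check in the report.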

@xiaoxmeng
Contributor

velox/common/caching/SsdFile.h

@yma11 what's the loadQuantum size used in the query? Thanks!

@yma11
Contributor Author

yma11 commented Jun 17, 2024

It's 256 MB. So I need to set it to 8 MB if I want to enable the SSD cache?

@xiaoxmeng
Contributor

It's 256 MB. So I need to set it to 8 MB if I want to enable the SSD cache?

@yma11 that's a limitation of the current implementation that needs to be fixed, @zacw7. We should also enforce a limit on the maximum loadQuantum size that we support.
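
One way such a limit could be enforced is sketched below. This is only an illustration under stated assumptions, not the change made in #10242: the function name `effectiveLoadQuantum`, the constant `kMaxSsdEntryBytes`, and the choice to clamp (rather than reject) an oversized value are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

// Assumed maximum SSD cache entry size: 1 << 23 bytes (8 MB),
// taken from the failed check in the report.
constexpr uint64_t kMaxSsdEntryBytes = 1ULL << 23;

// Hypothetical helper: returns the load quantum to use. When the SSD
// cache is enabled, the requested value is clamped to the largest entry
// the SSD cache can describe; otherwise it is returned unchanged.
inline uint64_t effectiveLoadQuantum(uint64_t requested, bool ssdCacheEnabled) {
  if (requested == 0) {
    throw std::invalid_argument("loadQuantum must be positive");
  }
  return ssdCacheEnabled ? std::min(requested, kMaxSsdEntryBytes) : requested;
}
```

Under these assumptions, a 256 MB request with the SSD cache enabled would be reduced to 8 MB, and left as is when the SSD cache is disabled. Rejecting the configuration with an error instead of clamping would be an equally reasonable design.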

@zacw7 zacw7 self-assigned this Jun 18, 2024
zacw7 added a commit to zacw7/velox that referenced this issue Jun 18, 2024
Summary:
SsdRun only reserves 23 bits (out of 64 bits) for entry size. loadQuantum larger than that will result in cache error.
Fixing facebookincubator#10098

Pull Request resolved: facebookincubator#10242

Reviewed By: xiaoxmeng

Differential Revision: D58711635

Pulled By: zacw7
facebook-github-bot pushed a commit that referenced this issue Jun 18, 2024
Summary:
SsdRun only reserves 23 bits (out of 64 bits) for entry size. loadQuantum larger than that will result in cache error.
Fixing #10098

Pull Request resolved: #10242

Reviewed By: xiaoxmeng

Differential Revision: D58711635

Pulled By: zacw7

fbshipit-source-id: 70327443c21d8c6d1d537c145143d46ad012501d