Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Clang] Add support for device image compression #15124

Open
wants to merge 59 commits into
base: sycl
Choose a base branch
from

Conversation

uditagarwal97
Copy link
Contributor

@uditagarwal97 uditagarwal97 commented Aug 18, 2024

This PR adds support for device image compression for the old offloading model. I'll make another follow-up PR to extend support for the new offload model.

Design summary:

ZSTD (compression algo)   ----> LLVMSupport (Interface)  ------> clang-offload-wrapper (For compression)
 |
 ----------------------------------------------------- --------> SYCL RT (For decompression)

This PR introduces ZSTD (https://github.com/facebook/zstd) as a 3rd party dependency of DPCPP. Similar to upstream LLVM, we expect user to have zstd-dev package installed on their machine - we won't be installing zstd from sources.

How to use
To compress device images, add --offload-compress CLI option to your clang invocation. Note that we compress device images only if the size of device images exceeds a threshold, which is 512 bytes by default. Moreover, by default, we use ZSTD level 10 for compression. ZSTD compression levels provides a tradeoff between (de)compression time and compression ratio, and the compression level can be changed using --offload-compression-level=<int> CLI option.

@uditagarwal97 uditagarwal97 self-assigned this Aug 18, 2024
@uditagarwal97
Copy link
Contributor Author

uditagarwal97 commented Aug 21, 2024

Some initial performance stats:

zstd_spirv_smol-v_dataset

Dataset: https://github.com/aras-p/smol-v/tree/master/tests/spirv-dumps
Dataset size: 275 SPIR-V files

Conclusion:
Overall, for SPIR-V files < 50KB, the decompression time is below 0.1ms, compression time <0.15ms, and compression ratio is ~3 (compressed image is 1/3 the original size).
For very small images (<512 bytes), I don't see much benefit of image compression.

Note:- Most of the SPIR-V files I have in the dataset are <50KB. I'm working on extending the performance evaluation to larger workloads. Also, the (de)compression performance will vary with the format of the file being compressed, so for AOT, where device images consists of target assembly, the performance stats might differ.

@jbrodman
Copy link
Contributor

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

@jbrodman
Copy link
Contributor

Also guessing this feature may not make sense when combined with the native cpu device, but need to think more about that.

@uditagarwal97
Copy link
Contributor Author

uditagarwal97 commented Aug 21, 2024

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

I think they are covered by the "none" binary image format. This is because clang driver (in SYCL offload mode) never specifies the image format in call to clang-offload-wrapper. So, by default, the BinaryImageFormat is "none" and it is upto the SYCL runtime to determine the format (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.cpp#L170).

I tested my changes with PTX, and they seem to work fine, so, we'd likely not require additional formats.

Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for driver

@uditagarwal97
Copy link
Contributor Author

@bso-intel @intel/llvm-reviewers-runtime ping!

Comment on lines 1135 to 1139
llvm::compression::zstd::compress(
ArrayRef<unsigned char>(
(const unsigned char *)(Bin->getBufferStart()),
Bin->getBufferSize()),
CompressedBuffer, OffloadCompressLevel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the crash occurs when compression failed, the error message that the SYCL end-user will receive may not be useful.
For example, "Failed to create ZSTD_CCtx", ""Failed to set ZSTD_c_compressionLevel", etc.
It would be much better user experience if they get a message like "Device image compression failed." + e.what().

Comment on lines +1117 to +1119
"'--offload-compress' option is specified but zstd "
"is not available. The device image will not be "
"compressed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of error message is good since the SYCL end-user can understand what went wrong.

@uditagarwal97
Copy link
Contributor Author

@bso-intel

When the crash occurs when compression failed, the error message that the SYCL end-user will receive may not be useful.
For example, "Failed to create ZSTD_CCtx", ""Failed to set ZSTD_c_compressionLevel", etc.
It would be much better user experience if they get a message like "Device image compression failed." + e.what().

In 946a738, I've wrapped zstd::compress in try/catch to throw a more meaningful error message. Note that this will only work if DPC++ is built with LLVM_ENABLE_EH.

@@ -731,6 +733,14 @@ setSpecializationConstants(const std::shared_ptr<device_image_impl> &InputImpl,
}
}

static inline void CheckAndDecompressImage(RTDeviceBinaryImage *Img) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add [[maybe_unused]]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants