-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenCL][Textures] Always use SSA for texture loading #14397
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot |
0c182fd
to
9d12480
Compare
cc: @elvin-n, @masahi, @csullivan please review it. |
// Only use local SSA if texture is not already being stored | ||
if (need_texture_ssa_) { | ||
std::string rhs = SSAGetID(ss.str(), op->dtype.with_lanes(4)); | ||
if (op->args.back().as<RampNode>()) { | ||
os << rhs; | ||
} else { | ||
os << "(("; | ||
this->PrintType(op->dtype.with_lanes(1), os); | ||
os << "*)&" << rhs << ")["; | ||
this->PrintExpr(op->args.back(), os); | ||
os << "]"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, without checking if the data is already stored, this ended up repeated stores, resulting in less optimal opencl code. It might be optimized by the runtime compiler but I recall a non-trivial perf difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review! I have added several new test cases. They cover case when the data was storage and local variable is reused. Also add a new one which check that writing float [4]
to image2d_t
will be handled correctly:
float out_local[4];
// ...
write_imagef(out, (int2)((((((int)get_group_id(0)) * 14) + ((int)get_local_id(0))) % 38), (((((int)get_group_id(0)) * 64) + (((int)get_local_id(0)) >> 1)) / 19)), vload4(0, out_local + 0));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, if you find a case with this in the future please consider adding it as a follow up test. Otherwise all LGTM, thanks @echuraev!
f9ac7ce
to
4925686
Compare
@tvm-bot rerun |
// Only use local SSA if texture is not already being stored | ||
if (need_texture_ssa_) { | ||
std::string rhs = SSAGetID(ss.str(), op->dtype.with_lanes(4)); | ||
if (op->args.back().as<RampNode>()) { | ||
os << rhs; | ||
} else { | ||
os << "(("; | ||
this->PrintType(op->dtype.with_lanes(1), os); | ||
os << "*)&" << rhs << ")["; | ||
this->PrintExpr(op->args.back(), os); | ||
os << "]"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, if you find a case with this in the future please consider adding it as a follow up test. Otherwise all LGTM, thanks @echuraev!
In some cases we must use SSA for textures loading but we didn't do that. Example of such cases: 1. Storing texture (NCHW4c) directly (w/o temporary buffer) to the output buffer (NCHW). In this case we have to use SSA because we need to get only one channel from the pixel. In case of storing to the local buffer the SSA was used because the buffer was allocated in kernel and the logic was written that if the buffer was allocated then we should use SSA. But if we store the same texture directly to the output buffer then SSA wasn't used and this OpenCL code wasn't compiled. 2. Casting texture (NCHW4c) to another data type and then storing it to the buffer (NCHW). The SSA for textures was disabled in case of cast operation. As a result it was necessary to take an channel from the pixel but we got the vector data type (e.g. float4) and then we tried to cast it to scalar data type. This code also wasn't compiled. In this PR SSA form was enabled for all cases when `texture2d_load` is used. The relevant tests cases were added.
93a11bf
to
be82808
Compare
@tvm-bot rerun |
1 similar comment
@tvm-bot rerun |
In some cases we must use SSA for textures loading but we didn't do
that. Example of such cases:
output buffer (NCHW). In this case we have to use SSA because we
need to get only one channel from the pixel. In case of storing to
the local buffer the SSA was used because the buffer was allocated
in kernel and the logic was written that if the buffer was allocated
then we should use SSA. But if we store the same texture directly to
the output buffer then SSA wasn't used and this OpenCL code wasn't
compiled.
the buffer (NCHW). The SSA for textures was disabled in case of cast
operation. As a result it was necessary to take a channel from the
pixel but we got the vector data type (e.g. float4) and then we
tried to cast it to scalar data type. This code also wasn't
compiled.
In this PR SSA form was enabled for all cases when
texture2d_load
isused. The relevant tests cases were added.