merge latest upstream bits into our fork #7

Merged
merged 152 commits into from
Aug 24, 2022


brent-carmer

No description provided.

LebedevRI and others added 30 commits March 31, 2022 15:23
* `-mtune=`/`-mcpu=` support for x86 AMD CPUs

* Move processor tune into its own enum, out of features

* clang-format

* Target: make Processor more optional

* Processor: add explanatory comments which CPU is what

* Drop outdated changes

* Make comments in Processor more readable / fix BtVer2 comment

* Target: don't require passing Processor

* Make processor more optional in the features string serialization/verification

* Address review notes

* Undo introduction of halide_target_processor_t

* Fix year for btver2/jaguar
* Scalarize predicated Loads

* Cleanup

* Fix gpu_vectorize scalarization for D3D12

* Fix OpenCL scalarization

* Minor fixes

* Formatting

* Address review comments

* Move Shuffle impl to CodeGen_GPU_C class

* Extra space removal

Co-authored-by: Shoaib Kamil <kamil@adobe.com>
For vector-of-Buffers, the ctor took a non-const ref to the argument, which was weird and nonsensical. Replaced with a const-ref version and an rvalue-ref version; it turns out that literally *all* of the internal calls were able to use the latter, trivially saving some copies.
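The const-ref/rvalue-ref pairing described above can be illustrated with a minimal sketch. `BufferList`, `make_names`, and `demo_overloads` are hypothetical names invented for this example, and `std::string` stands in for a buffer type; this is not the actual Halide class.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class that stores a vector of buffers.
struct BufferList {
    std::vector<std::string> items;
    bool moved_in = false;

    // Const-ref overload: the caller keeps its vector, so the elements are copied.
    explicit BufferList(const std::vector<std::string> &v)
        : items(v) {
    }

    // Rvalue-ref overload: the caller gives up its vector, so its storage is taken over.
    explicit BufferList(std::vector<std::string> &&v)
        : items(std::move(v)), moved_in(true) {
    }
};

// Hypothetical helper; its return value is a temporary, which selects the rvalue overload.
inline std::vector<std::string> make_names() {
    return {"a", "b", "c"};
}

// Usage: a named vector is copied, a temporary is moved.
inline void demo_overloads() {
    std::vector<std::string> names = make_names();
    BufferList copied(names);
    BufferList moved(make_names());
    assert(!copied.moved_in && names.size() == 3);
    assert(moved.moved_in && moved.items.size() == 3);
}
```

Call sites that pass a temporary or wrap the argument in `std::move` pick the second overload automatically, which is presumably how the internal calls avoided their copies.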
* `-mtune=native` CPU autodetection for AMD Zen 3 CPU

* Address review notes.

* Fix MSVC build

* Address review notes
* Bump development Halide version to 15.0.0

* trigger buildbots
* Remove the nobuild/partialbuildmethod tests from python_bindings/

They no longer serve a purpose and are redundant to other tests.

* WIP

* Update pystub.py

* wip

* wip

* wip

* Update TargetExportScript.cmake

* Update PythonExtensionHelpers.cmake

* PyExtensionGen didn't handle zero-dimensional buffers
* Fix "set but not used" warnings/errors

Apparently XCode 13.3 has smarter warnings about unused code and emits warnings/errors for these, so let's clean them up.

* Also fix missing `ssize_t` usage
It was deprecated (in favor of `OutputFileType`) in Halide 14; let's remove it entirely for Halide 15.
This was deprecated in Halide 14; let's remove it entirely for Halide 15.
* Drop support for LLVM12

Halide 15 only needs to support LLVM13 and LLVM13. Drop all the special-casing for LLVM12.

* Update packaging.yml

* Update presubmit.yml

* 13

* more

* Update presubmit.yml

* woo

* Update presubmit.yml

* Update run-clang-tidy.sh

* Update run-clang-tidy.sh

* Update .clang-tidy

* Update .clang-tidy

* wer

* Update Random.cpp

* wer

* sdf

* sdf

* Update packaging.yml
Goal here: eliminate the need for a local version of llvm/clang-12, and don't stay too far behind the toolchain.

As always, clang-format doesn't promise backwards compatibility, but the main differences in formatting are:
- more regularization of spaces at the start of comments (I like this change)
- minor difference in formatting of function-pointer-type declarations (not a fan of this, but I can't find a way to disable it and it's only really used in a handful of places in the Python bindings)
* Always mark _ucon as 'unused' in Codegen_C, even if asserts are enabled, since generated closure functions may not use it

* halide_unused -> halide_maybe_unused

* fix test_internal

* More halide_unused -> halide_maybe_unused
Clang 13 removed the `return-std-move-in-c++11` warning entirely, so specifying it now warns that the warning is unknown.
…2) (halide#6677)

* add widening_mul using vpmaddwd for AVX2

* add vpmaddwd/pmaddwd test

* add widening_mul with pmaddwd for SSE2
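For readers unfamiliar with the instruction, here is a scalar model of what `pmaddwd` computes on one 128-bit register: adjacent pairs of signed 16-bit lanes are multiplied at 32-bit width and each pair of products is summed. The function names are hypothetical, and the sketch shows only the instruction's arithmetic, not how the commit pattern-matches `widening_mul` onto it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of pmaddwd on one 128-bit register:
// eight int16 lanes in each operand, four int32 lanes out.
inline std::array<int32_t, 4> pmaddwd_model(const std::array<int16_t, 8> &a,
                                            const std::array<int16_t, 8> &b) {
    std::array<int32_t, 4> out{};
    for (int i = 0; i < 4; i++) {
        // Widen to 32 bits before multiplying so the products cannot overflow.
        int32_t lo = int32_t(a[2 * i]) * int32_t(b[2 * i]);
        int32_t hi = int32_t(a[2 * i + 1]) * int32_t(b[2 * i + 1]);
        // Add with unsigned wraparound; only the all -32768 case can wrap.
        out[i] = int32_t(uint32_t(lo) + uint32_t(hi));
    }
    return out;
}

// Usage: with all-ones in b, each output lane is the sum of one input pair.
inline void demo_pmaddwd() {
    std::array<int16_t, 8> a{1, 2, 3, 4, 5, 6, 7, 8};
    std::array<int16_t, 8> ones{1, 1, 1, 1, 1, 1, 1, 1};
    std::array<int32_t, 4> r = pmaddwd_model(a, ones);
    assert(r[0] == 3 && r[1] == 7 && r[2] == 11 && r[3] == 15);
}
```

If the odd lanes of both operands are zero, each output lane reduces to a plain 16-bit by 16-bit widening product. That is one way such an instruction could serve a widening multiply, stated here as an assumption rather than as the commit's approach.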
* Drop support for Matlab extensions

Anecdotally, this hasn't been used in ~years, and the original author (@dsharletg) had suggested dropping it a while back. I'm going to propose we go ahead and drop it for Halide 15 and see who complains.

* Fixes for top-of-tree LLVM

* Update force_include_types.cpp

* trigger buildbots

* Update CodeGen_LLVM.cpp
* llvm no longer wants a type suffix on vst intrinsics

* Fix silly mistake

* Change 64-bit only

Co-authored-by: Andrew Adams <anadams@adobe.com>
…de#6704)

This allows for `compute_with` and `rfactor` to work more seamlessly in Python.

Also:
- Move two compute_with() variant bindings from PyFunc and PyStage to PyScheduleMethods, as they are identical between the two
- drive-by removal of redundant `py::implicitly_convertible<ImageParam, Func>();` call
* Remove the last remaining call to getPointerElementType()

LLVM is moving to opaque pointers; we must have missed this one in previous work.

* ARM vst mangling needs to be conditional on opaque ptrs

The fixes from last week regarding mangling of arm vst intrinsics need to be made conditional on whether the pointer is opaque or not; this will change based on whether `-D CLANG_ENABLE_OPAQUE_POINTERS=ON|OFF` is defined when LLVM is built, but should be sniffed via this API, according to my LLVM contact.

* Revert "ARM vst mangling needs to be conditional on opaque ptrs"

This reverts commit 9901314.
The fixes from last week regarding mangling of arm vst intrinsics need to be made conditional on whether the pointer is opaque or not; this will change based on whether `-D CLANG_ENABLE_OPAQUE_POINTERS=ON|OFF` is defined when LLVM is built, but should be sniffed via this API, according to my LLVM contact.
* Combine string constants in combine_strings()

This is a pretty trivial optimization, but when printing (or enabling `debug`), it cuts the number of `halide_string_to_string()` calls we generate by ~half.

* Update IROperator.cpp
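The idea behind combining string constants can be sketched as follows: when a message is assembled from a sequence of pieces, runs of adjacent constants are merged so that each run needs a single append. `Piece` and `merge_adjacent_constants` are hypothetical names for this illustration, with an `int` standing in for any runtime value; the real `combine_strings()` operates on Halide expressions and is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in: a piece is either a string constant or a runtime value.
using Piece = std::variant<std::string, int>;

// Merge runs of adjacent string constants into one constant per run.
inline std::vector<Piece> merge_adjacent_constants(const std::vector<Piece> &pieces) {
    std::vector<Piece> out;
    for (const Piece &p : pieces) {
        const std::string *s = std::get_if<std::string>(&p);
        if (s && !out.empty() && std::holds_alternative<std::string>(out.back())) {
            // Previous piece is also a constant: extend it instead of adding a new piece.
            std::get<std::string>(out.back()) += *s;
        } else {
            out.push_back(p);
        }
    }
    return out;
}

// Usage: six pieces collapse to five because ", " and "y = " are adjacent constants.
inline void demo_merge() {
    std::vector<Piece> in = {std::string("x = "), 3, std::string(", "),
                             std::string("y = "), 4, std::string("\n")};
    std::vector<Piece> out = merge_adjacent_constants(in);
    assert(out.size() == 5);
    assert(std::get<std::string>(out[2]) == ", y = ");
}
```

The saving grows with the number of back-to-back literals, which are common in formatted debug output where labels and separators sit next to each other.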
* Update CodeGen_PTX_Dev to use new PassManager

This was still using the LegacyPassManager for optimization, which will be going away at some point. (Code changes by @alinas; I'm just opening this PR on her behalf)

* Fixes after review
steven-johnson and others added 16 commits August 1, 2022 13:08
)

* recognize the patterns used for the RHS matrix

* make 1d tile matcher more robust

* put getting rhs tile's index into a separate func

* expand the tests used in correctness check

* add exclamation mark

* remove unused vars

* run format and tidy

* check for null before using IR in the next step

* check if the broadcast was found

* llvm below 13 is no longer supported

* replace single pattern with commutative permutations

* check if the stride is an `IntImm`, otherwise reject pattern

* apply clang-format-13

* rename wild_i32 -> v2

* check if v1 could be the stride value

* add more detail when receiving a bad type

* added short explanation of the right-hand matrix layout

* added explanation for where the 4 comes from

* provide further documentation as to the layout of AMX

* add comments for expected patterns to get_3d_rhs_tile_index

* Document the matched pattern

Co-authored-by: Steven Johnson <srj@google.com>
* Fix autoscheduling trivial lut wrappers

Fixes halide#6899

* trigger buildbots

Co-authored-by: Steven Johnson <srj@google.com>
* Fix broken Makefile rules for autoschedulers on OSX

A few issues here:
- Make was building the plugins as .dylib on OSX, but they should have been .so to match Linux (and just on general principles)
- On OSX, explicitly linking libHalide.dylib into a plugin means that it will load its own copy of libHalide, which is bad, because it means the plugin doesn't share the same set of globals. We need to omit that explicit dependency and allow it to just find the exported symbols at load time.
- Add a test to verify the fix; run it everywhere even though it should only have been failing for Make-build OSX builds.

Finally, let me add that we really need to set a sunset date for supporting Make in Halide. The Makefiles aren't really maintained properly anymore, and when something subtle goes wrong, it takes an unreasonable amount of time to debug for something that is no longer our canonical build tool.

* Use order-only prerequisites

* Remove new load_plugin.cpp test

Not worth the complexity for the extra test coverage.
Co-authored-by: Lukas Trümper <lukas.truemper@outlook.de>
- variable 'count' set but not used
- warning: use of bitwise '|' with boolean operands
If the realization is tuple-valued, and the condition on the realization
uses a tuple call (index != 0), then the condition wasn't getting
resolved during the split_tuples pass. The cause was a missing mutate
call.
* Fix wrong install path for *.py files

We were looking in a nonexistent dir, so we never copied `__init__.py` as we should have.

* Update CMakeLists.txt
* Remove AddCudaToTarget.cmake
* Remove MakeShellPath.cmake
* Use CheckLinkerFlag in TargetExportScript
* Use DEPFILE for all generators
* Use REQUIRED with find_program, where applicable
* Use REQUIRED with find_library, where applicable
* Use CMake 3.21 cache behavior in HalideTargetHelpers.cmake
* Replace uses of get_filename_component with cmake_path
* Rework BLAS detection in linear_algebra app
* Drive-by: fix autotune_loop.sh install rule.
* Fix CBLAS header in linear_algebra test_halide_blas
* Make saturating_cast an intrinsic

* handle saturating_cast in Bounds.cpp + add bounds tests

* update saturating_cast CodeGen

* with_lanes should work on intrinsics as well

* lift to saturating_cast in FindIntrinsics

* update intrinsics test for u16_sat

* better sat_cast(widen(expr)) handling in find_intrinsics

* simplify bounds of saturating_cast + update is_monotonic
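As a reminder of the semantics these commits work with, a saturating cast clamps the value to the destination type's range before converting, instead of wrapping. The scalar model below is a sketch under that definition; `saturating_cast_model` is a hypothetical name, and the sketch is limited to integer sources that fit in `int64_t` and destinations of at most 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Scalar model of a saturating integer cast: clamp to the destination range, then convert.
template<typename To>
inline To saturating_cast_model(int64_t x) {
    static_assert(std::is_integral<To>::value && sizeof(To) <= 4,
                  "sketch only covers integer destinations of at most 32 bits");
    const int64_t lo = std::numeric_limits<To>::min();
    const int64_t hi = std::numeric_limits<To>::max();
    if (x < lo) return std::numeric_limits<To>::min();
    if (x > hi) return std::numeric_limits<To>::max();
    return static_cast<To>(x);
}

// Usage: out-of-range inputs pin to the nearest representable value.
inline void demo_saturating_cast() {
    assert(saturating_cast_model<uint16_t>(70000) == 65535);
    assert(saturating_cast_model<uint16_t>(-5) == 0);
    assert(saturating_cast_model<int8_t>(-200) == -128);
    assert(saturating_cast_model<uint8_t>(42) == 42);
}
```

Because clamping never reverses the order of two inputs, the function is monotonically non-decreasing, so an interval `[a, b]` maps into the interval between the casts of `a` and `b`. This is the property that bounds inference and a monotonicity check can exploit.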
brent-carmer and others added 12 commits August 9, 2022 14:17
* Halide::Error should not extend std::runtime_error

Unfortunately, the std error/exception classes aren't marked for DLLEXPORT under MSVC; we need our Error classes to be DLLEXPORT for libHalide (and python bindings). The current situation basically causes MSVC to generate another version of `std::runtime_error` marked for DLLEXPORT, which can lead to ODR violations, which are bad. AFAICT we don't really rely on this inheritance anywhere, so this just eliminates the inheritance entirely.

(Note that I can't point to a specific malfunction resulting from this, but casual googling based on the many warnings MSVC emits about the current situation has me convinced that it needs addressing.)

* noexcept
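One consequence of dropping the inheritance is worth spelling out: a handler for `std::exception` no longer catches the error type. The sketch below uses a hypothetical `SketchError` class that keeps a `what()` accessor with the same shape as the standard one; the actual layout of Halide's error classes is not shown in this log and is not assumed here.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Hypothetical error type that deliberately does not derive from std::runtime_error.
class SketchError {
    std::string msg_;

public:
    explicit SketchError(std::string msg)
        : msg_(std::move(msg)) {
    }
    // Same accessor shape as std::exception::what(), without the inheritance.
    const char *what() const noexcept {
        return msg_.c_str();
    }
};

// Throws a SketchError and reports which handler caught it.
inline std::string run_and_report() {
    try {
        throw SketchError("bad schedule");
    } catch (const std::exception &) {
        return "caught as std::exception";  // not reached: the types are unrelated
    } catch (const SketchError &e) {
        return e.what();
    }
}

// Usage: the dedicated handler is the one that fires.
inline void demo_error() {
    assert(run_and_report() == "bad schedule");
}
```

Code that relied on a generic `catch (const std::exception &)` would need a dedicated handler after a change of this kind.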
* Rework PYTHONPATH
* Move pure-Python file copying logic to build time.
* Use TARGET_RUNTIME_DLLS to copy all DLLs instead of just Halide.
* Ensure that the last path component for Halide_Python is always `halide`
* Simplify __init__.py now that it's copied to build tree
* Add helper to de-duplicate PYTHONPATH test logic

Fixes halide#6870

Co-authored-by: Alex Reinking <alex.reinking@gmail.com>
Co-authored-by: Alex Reinking <reinking@google.com>
…as non-Python does) (halide#6932)

* Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does)

* fixes

* Update CMakeLists.txt
)

Also, drive-by rename of 'default' to 'base' to better imply the inheritance
Add ASAN support

Co-authored-by: Alex Reinking <reinking@google.com>
halide#6928)

* Minimal approach to making Deinterleave correct for Reinterpret

* Add minimal useful implementation of extracting and concatenating bits

* clang-tidy

* More clang-tidy fixes

* Add missing error message

* Add low-bit-depth noise test

* Add test to cmake build

* Fix power-of-two check

* Remove dead object

* Add little-endian comment to reinterpret IR node

* Simplify concat_bits of single arg

* Add missing second arg

* Fix concat_bits call

Co-authored-by: Andrew Adams <anadams@adobe.com>
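To make the bit-level operations concrete, here is a scalar sketch of concatenating and extracting bits. It assumes a little-endian convention in which the first argument supplies the least significant bits, as the "little-endian comment" bullet suggests; the function names are hypothetical and the real intrinsics' signatures are not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Concatenate two 8-bit values into one 16-bit value.
// Assumption: little-endian ordering, so the first argument becomes the low byte.
inline uint16_t concat_bits_model(uint8_t low, uint8_t high) {
    return uint16_t(uint16_t(low) | uint16_t(uint16_t(high) << 8));
}

// Extract 8 bits from a 16-bit value, starting at bit position lsb.
inline uint8_t extract_bits_model(uint16_t x, unsigned lsb) {
    assert(lsb <= 8);  // keep the extracted window inside the 16-bit source
    return uint8_t(x >> lsb);
}

// Usage: extraction at offsets 0 and 8 undoes the concatenation.
inline void demo_bits() {
    uint16_t x = concat_bits_model(0x34, 0x12);
    assert(x == 0x1234);
    assert(extract_bits_model(x, 0) == 0x34);
    assert(extract_bits_model(x, 8) == 0x12);
}
```

Under this convention the pair behaves like reinterpreting two adjacent bytes in memory on a little-endian machine, which is why the ordering matters for reinterpret-style operations.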
@DawnStone DawnStone merged commit 141a3ec into inteon-latest Aug 24, 2022