{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":762181983,"defaultBranch":"main","name":"BitBLAS","ownerLogin":"microsoft","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2024-02-23T08:50:03.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6154722?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1725416207.0","currentOid":""},"activityList":{"items":[{"before":"4106902702ba4c6e75f004f94e3889999c1a2023","after":"6333d3b7c119f31f5aec6832392ce1b4651c09c5","ref":"refs/heads/main","pushedAt":"2024-09-20T15:09:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Improve Dequant performance on CUDA Simt (#189)\n\n* fix for int8 gemm\r\n\r\n* T4\r\n\r\n* Refactor code to handle loading of runtime module and handle exceptions","shortMessageHtmlLink":"[Dev] Improve Dequant performance on CUDA Simt (#189)"}},{"before":"ce7466c9bebe6054d1114e91c49d8892437a9df2","after":"4106902702ba4c6e75f004f94e3889999c1a2023","ref":"refs/heads/main","pushedAt":"2024-09-20T07:12:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Dequante SIMT Matmul Implementation. (#188)\n\n* fix for int8 gemm\r\n\r\n* T4","shortMessageHtmlLink":"[Dev] Dequante SIMT Matmul Implementation. (#188)"}},{"before":"916a54cb0b1fa41f0d79af06852b8c927725ab40","after":"ce7466c9bebe6054d1114e91c49d8892437a9df2","ref":"refs/heads/main","pushedAt":"2024-09-18T07:09:07.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"fix for int8 gemm (#185)","shortMessageHtmlLink":"fix for int8 gemm (#185)"}},{"before":"2f6d316be9f9d70f2845c2f319ac2f348d0cd6a6","after":"916a54cb0b1fa41f0d79af06852b8c927725ab40","ref":"refs/heads/main","pushedAt":"2024-09-17T10:21:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Bug fix for Block Reduce Template and improve TL (#183)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix\r\n\r\n* gemm_ss\r\n\r\n* doc fix\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* remove debug print\r\n\r\n* remove debug print\r\n\r\n* vectorization init\r\n\r\n* lint fix\r\n\r\n* prelude update\r\n\r\n* update tvm\r\n\r\n* bug fix for reduce_k with shared memory\r\n\r\n* bug fix\r\n\r\n* bug fix\r\n\r\n* Enhance Macro Generation\r\n\r\n* Lift Layout to reduce load time\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* red fix","shortMessageHtmlLink":"[Dev] Bug fix for Block Reduce Template and improve TL (#183)"}},{"before":"5e3da9b64e39a80ee83604ec580af3677cbd1f03","after":"2f6d316be9f9d70f2845c2f319ac2f348d0cd6a6","ref":"refs/heads/main","pushedAt":"2024-09-07T04:55:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Enhance TL to import customized c headers (#179)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix\r\n\r\n* gemm_ss\r\n\r\n* doc fix\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* remove debug print\r\n\r\n* remove debug print\r\n\r\n* vectorization init\r\n\r\n* lint fix\r\n\r\n* prelude update","shortMessageHtmlLink":"[TL] Enhance TL to import customized c headers (#179)"}},{"before":"11649f0272f786b086d5f6e3b1c1caaaecbcc41c","after":"5e3da9b64e39a80ee83604ec580af3677cbd1f03","ref":"refs/heads/main","pushedAt":"2024-09-06T12:18:19.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Allow T.clear be applied on a \"local\" Buffer and improve L2 Swizzle (#178)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix\r\n\r\n* gemm_ss\r\n\r\n* doc fix\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* remove debug print\r\n\r\n* remove debug print\r\n\r\n* vectorization init\r\n\r\n* lint fix","shortMessageHtmlLink":"[TL] Allow T.clear be applied on a \"local\" Buffer and improve L2 Swiz…"}},{"before":"b9fab25cb69a41768a5d1ecf14531c107d7955b1","after":"11649f0272f786b086d5f6e3b1c1caaaecbcc41c","ref":"refs/heads/main","pushedAt":"2024-09-06T07:57:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Inject Storage Sync Scope Automatically for TL (#177)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix\r\n\r\n* gemm_ss\r\n\r\n* doc fix\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* remove debug print\r\n\r\n* remove debug print","shortMessageHtmlLink":"[TL] Inject Storage Sync Scope Automatically for TL (#177)"}},{"before":"c15744ed2ea2cb0c673e4503c59d825525ed572b","after":"b9fab25cb69a41768a5d1ecf14531c107d7955b1","ref":"refs/heads/main","pushedAt":"2024-09-06T05:42:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Support GEMM_SS Macro to perform gemm directly from shared memory (#176)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix\r\n\r\n* gemm_ss","shortMessageHtmlLink":"[TL] Support GEMM_SS Macro to perform gemm directly from shared memory ("}},{"before":"3aa943975577a18f725a542f45c0e2ed98559857","after":"c15744ed2ea2cb0c673e4503c59d825525ed572b","ref":"refs/heads/main","pushedAt":"2024-09-04T03:26:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Add TL Layout and Macro utils (#174)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix\r\n\r\n* optimize layout\r\n\r\n* update tl utils.\r\n\r\n* macro optimization\r\n\r\n* test fix","shortMessageHtmlLink":"[TL] Add TL Layout and Macro utils (#174)"}},{"before":"6f845daa5b008789ea5ef4c2b4991890814ca3d9","after":null,"ref":"refs/heads/dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7","pushedAt":"2024-09-04T02:16:47.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"}},{"before":"ab979132153deaaf264fa74be1bf47dff5ad1907","after":"3aa943975577a18f725a542f45c0e2ed98559857","ref":"refs/heads/main","pushedAt":"2024-09-04T02:16:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"chore(deps): bump actions/download-artifact in /.github/workflows (#175)\n\nBumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4.1.7.\r\n- [Release notes](https://github.com/actions/download-artifact/releases)\r\n- [Commits](https://github.com/actions/download-artifact/compare/v3...v4.1.7)\r\n\r\n---\r\nupdated-dependencies:\r\n- dependency-name: actions/download-artifact\r\n dependency-type: direct:production\r\n...\r\n\r\nSigned-off-by: dependabot[bot] \r\nCo-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>","shortMessageHtmlLink":"chore(deps): bump actions/download-artifact in /.github/workflows (#175)"}},{"before":null,"after":"6f845daa5b008789ea5ef4c2b4991890814ca3d9","ref":"refs/heads/dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7","pushedAt":"2024-09-03T22:54:21.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"chore(deps): bump actions/download-artifact in /.github/workflows\n\nBumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4.1.7.\n- [Release notes](https://github.com/actions/download-artifact/releases)\n- [Commits](https://github.com/actions/download-artifact/compare/v3...v4.1.7)\n\n---\nupdated-dependencies:\n- dependency-name: actions/download-artifact\n dependency-type: direct:production\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"chore(deps): bump actions/download-artifact in /.github/workflows"}},{"before":"6ffeae1b423ff12c12004bfa57893c8522c93b9b","after":"2fa6f03ab391d76e8a7f772e428df49f960c7128","ref":"refs/heads/ladder_amd","pushedAt":"2024-09-03T03:49:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[AMD] Ladder End2End Update for Matrix Code and HIP Codegen (#171)\n\n* amd hip update\r\n\r\n* install cmd\r\n\r\n---------\r\n\r\nCo-authored-by: root ","shortMessageHtmlLink":"[AMD] Ladder End2End Update for Matrix Code and HIP Codegen (#171)"}},{"before":"9b3b73b2c4ce0447aff909b1dc40fdbd86247e8d","after":"ab979132153deaaf264fa74be1bf47dff5ad1907","ref":"refs/heads/main","pushedAt":"2024-09-03T03:49:18.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Enhance Layout Annotate Pass to handle PTX Inst (#170)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples\r\n\r\n* Enhance Swizzle\r\n\r\n* lint fix\r\n\r\n* test fix\r\n\r\n* lint fix","shortMessageHtmlLink":"[TL] Enhance Layout Annotate Pass to handle PTX Inst (#170)"}},{"before":null,"after":"6ffeae1b423ff12c12004bfa57893c8522c93b9b","ref":"refs/heads/ladder_amd","pushedAt":"2024-09-03T03:39:36.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"Change ONNX opset number to 14 for operator 'aten::scaled_dot_product_attention'. (#115)","shortMessageHtmlLink":"Change ONNX opset number to 14 for operator 'aten::scaled_dot_product…"}},{"before":"cdd2244bca359db9af126e2264da37c62e6d6f9a","after":null,"ref":"refs/heads/tl_tmac","pushedAt":"2024-09-02T16:30:06.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"}},{"before":"c55600f82cfdda329e139a0858d62ba463ae7ce4","after":"9b3b73b2c4ce0447aff909b1dc40fdbd86247e8d","ref":"refs/heads/main","pushedAt":"2024-09-02T09:48:19.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[TL] Update several TL Examples (#168)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix\r\n\r\n* tl examples","shortMessageHtmlLink":"[TL] Update several TL Examples (#168)"}},{"before":"1b242b9f111377492715152b89df9ffcd7912826","after":"c55600f82cfdda329e139a0858d62ba463ae7ce4","ref":"refs/heads/main","pushedAt":"2024-09-01T13:03:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Revert Hack impl for memory caching (#167)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts\r\n\r\n* remove shared mem hack\r\n\r\n* revert change for swizzling\r\n\r\n* bug fix","shortMessageHtmlLink":"[Dev] Revert Hack impl for memory caching (#167)"}},{"before":"b1f5e7915553b5cce4720aa373fdf97dcfc60a99","after":"1b242b9f111377492715152b89df9ffcd7912826","ref":"refs/heads/main","pushedAt":"2024-08-31T17:18:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Enhance Thread Sync Injector for Stream-K Implementation (#166)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands\r\n\r\n* import scripts","shortMessageHtmlLink":"[Dev] Enhance Thread Sync Injector for Stream-K Implementation (#166)"}},{"before":"f284c32be99a52fcf1e93218aa72d42d597b8f25","after":"b1f5e7915553b5cce4720aa373fdf97dcfc60a99","ref":"refs/heads/main","pushedAt":"2024-08-31T06:45:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"update tvm (#165)\n\nCo-authored-by: leiwang1999 ","shortMessageHtmlLink":"update tvm (#165)"}},{"before":"872d6d71b2c6caee294544b1364132f413be5262","after":"f284c32be99a52fcf1e93218aa72d42d597b8f25","ref":"refs/heads/main","pushedAt":"2024-08-30T04:30:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[BugFix] Fix BitBLAS Linear with BFloat16 input (#164)\n\n* Merge branch 'main' of https://github.com/microsoft/BitBLAS into main\r\n\r\n* remove debug print\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* revert set device\r\n\r\n* lint fix\r\n\r\n* register fp8 for dynamic\r\n\r\n* Linear Fix","shortMessageHtmlLink":"[BugFix] Fix BitBLAS Linear with BFloat16 input (#164)"}},{"before":"2c091f8eb29edb4405a0d56d820af7933105565f","after":"872d6d71b2c6caee294544b1364132f413be5262","ref":"refs/heads/main","pushedAt":"2024-08-30T03:27:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Docs] Update install command from github repo (#163)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes\r\n\r\n* update install commands","shortMessageHtmlLink":"[Docs] Update install command from github repo (#163)"}},{"before":"ad1d7aea12a2f2c57e8965b02c74524c2dcae26b","after":"2c091f8eb29edb4405a0d56d820af7933105565f","ref":"refs/heads/main","pushedAt":"2024-08-30T03:19:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[BUGFix] Resgiter missing FP8 LDMATRIX Instructions for dynamic shared memory (#162)\n\n* Merge branch 'main' of https://github.com/microsoft/BitBLAS into main\r\n\r\n* remove debug print\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* revert set device\r\n\r\n* lint fix\r\n\r\n* register fp8 for dynamic","shortMessageHtmlLink":"[BUGFix] Resgiter missing FP8 LDMATRIX Instructions for dynamic share…"}},{"before":"2c6a1e87e6044924c877b7147cdae1c846192c2a","after":"ad1d7aea12a2f2c57e8965b02c74524c2dcae26b","ref":"refs/heads/main","pushedAt":"2024-08-29T11:32:16.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[BUGFix] Disable tensorcore when shape is really small (#159)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix\r\n\r\n* dispatch tensor core based on shapes","shortMessageHtmlLink":"[BUGFix] Disable tensorcore when shape is really small (#159)"}},{"before":"393c53e2ed136782540a6d2a0234bdc517d0b812","after":"2c6a1e87e6044924c877b7147cdae1c846192c2a","ref":"refs/heads/main","pushedAt":"2024-08-29T09:30:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Benchmark] Fast Decoding Benchmark (#158)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* buf fix for matrix support\r\n\r\n* lint fix","shortMessageHtmlLink":"[Benchmark] Fast Decoding Benchmark (#158)"}},{"before":"f40d9bab15425b938933468b51e704074e48e526","after":"393c53e2ed136782540a6d2a0234bdc517d0b812","ref":"refs/heads/main","pushedAt":"2024-08-28T17:27:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[BUG] Set Device when kernel be applied into Multiple GPUs. (#155)\n\n* Merge branch 'main' of https://github.com/microsoft/BitBLAS into main\r\n\r\n* remove debug print\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* Refactor Matmul class for improved readability and maintainability\r\n\r\n* revert set device\r\n\r\n* lint fix","shortMessageHtmlLink":"[BUG] Set Device when kernel be applied into Multiple GPUs. (#155)"}},{"before":"1e1a9571a96ed637bf9bff2556d1924aadea7412","after":"f40d9bab15425b938933468b51e704074e48e526","ref":"refs/heads/main","pushedAt":"2024-08-24T07:16:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Serialize Generated Kernel Name with Operator Config and Hint (#153)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Kernel Name\r\n\r\n* Refactor TIR CUDA source wrapper for improved readability and maintainability\r\n\r\n* bug fix","shortMessageHtmlLink":"[Dev] Serialize Generated Kernel Name with Operator Config and Hint (#…"}},{"before":"673290ba6d64071d7629cfd33d09f4374d94a482","after":"1e1a9571a96ed637bf9bff2556d1924aadea7412","ref":"refs/heads/main","pushedAt":"2024-08-23T06:48:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Version] Bump Version to 0.0.1.dev15 (#149)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* bug fix in test\r\n\r\n* lint fix.\r\n\r\n* test cuda i4 kernel\r\n\r\n* Refactor copyright notice in i4matmul.hpp\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* refactor test as version below python 3.9 cannot handle int32 overflow.\r\n\r\n* format lint for test\r\n\r\n* Refactor test_int4b_fp16_convert.py for improved readability and maintainability\r\n\r\n* remove unused design file\r\n\r\n* move tile device from package to base\r\n\r\n* dummy impl for codegen\r\n\r\n* Refactor file structure for ladder_permutate module\r\n\r\n* Refactor backend class and fix typos in comments\r\n\r\n* Deep refactor Lib related code.\r\n\r\n* remove ci pull.\r\n\r\n* LintFix\r\n\r\n* refactor builder for whl build\r\n\r\n* Refactor TIRWrapper.wrap() method to include an assertion for the optimized module\r\n\r\n* Refactor lib_generator to set library and source paths\r\n\r\n* lint fix\r\n\r\n* BitNet vllm integration\r\n\r\n* chore: update codespell to version 2.3.0\r\n\r\n* Lintfix\r\n\r\n* Bump version to 0.0.1.dev13\r\n\r\n* lint fix\r\n\r\n* disable fast decoding [u]int4xint8 by default.\r\n\r\n* optimize from dict design in Hint\r\n\r\n* Implement SplitK\r\n\r\n* bitnet benchmark generation.\r\n\r\n* Add benchmark script for BitNet integration\r\n\r\n* AtomicAdd Support\r\n\r\n* LintFix\r\n\r\n* ci fix when 3rdparty tvm is initialized.\r\n\r\n* bug fix for setup\r\n\r\n* fix a bug in block reduce\r\n\r\n* typo fix\r\n\r\n* BUG Fix for block reduce.\r\n\r\n* Lint fix\r\n\r\n* Refactor block reduce schedule template\r\n\r\n* transform branch from bitblas to bitblas_tl\r\n\r\n* Fix subproject commit reference in 3rdparty/tvm\r\n\r\n* chore: update submodule branch from bitblas to bitblas_tl\r\n\r\n* force update config.cmake\r\n\r\n* Bug fix\r\n\r\n* Fix subproject commit reference in 3rdparty/cutlass\r\n\r\n* chore: Add submodule for cutlass library\r\n\r\n* update tl cutlass path\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* format fix\r\n\r\n* Copy CUTLASS to the package directory\r\n\r\n* Refactor setup.py to include additional TVM header files\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* Implement Matmul Benchmark Design\r\n\r\n* chore: Update BitBLAS Matmul benchmark script\r\n\r\n* lint fix\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark for improved readability and maintainability\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* lint fix\r\n\r\n* Benchmark bot test\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* int8 test case\r\n\r\n* Refactor compare_benchmark.py to handle missing benchmark results gracefully\r\n\r\n* ci fix\r\n\r\n* disable ci for test benchmark\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* remove cli installation\r\n\r\n* chore: Create virtual environment and install dependencies for benchmark\r\n\r\n* chore: Update benchmark workflow to include comparison step\r\n\r\n* Lint fix\r\n\r\n* upodate tvm cmmit\r\n\r\n* Imporve lower warp memory pass\r\n\r\n* Bug fix\r\n\r\n* Enhance to support warp schedule.\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* add test for stage3 propagate\r\n\r\n* implement propagate func\r\n\r\n* Stage3 Ladder Permutate integration\r\n\r\n* get_ladder_stage3_propagate\r\n\r\n* comments benchmark scirpts as the setting is too big\r\n\r\n* ci fix for benchmark\r\n\r\n* lint fix\r\n\r\n* chore: Update benchmark workflow to trigger on pull request comments\r\n\r\n* Add LDMatrix Transform 3\r\n\r\n* Support GPTQ Test\r\n\r\n* Fuse BlockReduce Schedule\r\n\r\n* Support mma propagate 3\r\n\r\n* Support MMA Propagate Stage 3\r\n\r\n* Lint Fix\r\n\r\n* Merge block reduce for dequantze config.\r\n\r\n* fix codeql\r\n\r\n* chore: Update submodule reference to latest commit\r\n\r\n* chore: Disable common subexpression elimination in TIR passes\r\n\r\n* Lint Fix\r\n\r\n* 4bit related lop3 updates.\r\n\r\n* lint fix\r\n\r\n* gptq test fix\r\n\r\n* Fix for test\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* typofix\r\n\r\n* QuantCompress Test\r\n\r\n* chore: Refactor quant_compress_impl.py for readability and maintainability\r\n\r\n* Enhance docs to update latest works.\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* removed legacy operator\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* LintFix\r\n\r\n* Fix GPTQ Repack with the latest weight transform\r\n\r\n* lint fix\r\n\r\n* bug fix for rescale dequantize\r\n\r\n* test fix\r\n\r\n* typo fix\r\n\r\n* lint fix\r\n\r\n* Set default weight propagate kind into LDMatrixTransform\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* bug fix for test\r\n\r\n* set default to stage3\r\n\r\n* revert change\r\n\r\n* lint fix\r\n\r\n* case fix\r\n\r\n* bug fix\r\n\r\n* fix for legalize\r\n\r\n* bug fix\r\n\r\n* chore: Clear global operator cache before running tests\r\n\r\n* revert optimize_stratety into SingleBatchDecodeOnly\r\n\r\n* typofix\r\n\r\n* update benchmark scripts\r\n\r\n* chore: Refactor benchmark scripts and fix typos\r\n\r\n* fix for testing\r\n\r\n* lint fix\r\n\r\n* fix import.\r\n\r\n* typo\r\n\r\n* operator benchmark\r\n\r\n* optimize\r\n\r\n* always with shared.dyn\r\n\r\n* optimize cache.\r\n\r\n* dsl fix\r\n\r\n* tqdm\r\n\r\n* chore: Add serialize_results method to benchmark_matmul_strategies.py\r\n\r\n* fix performance issue for dynamic async copy\r\n\r\n* chore: Refactor benchmark_matmul_strategies.py for improved performance and code readability\r\n\r\n* bug fix\r\n\r\n* update readme\r\n\r\n* disable block reduce for int8\r\n\r\n* bugfix for bitnet\r\n\r\n* annotatte todo.\r\n\r\n* lint fix\r\n\r\n* regist fast_decode for int8xint4\r\n\r\n* Refactor CUDA code to use sm architecture instead of compute architecture\r\n\r\n* compress qkv and gate up for bitnet\r\n\r\n* improve elementwise schedule\r\n\r\n* Refactor BitNet model checkpoint generation scripts\r\n\r\n* cross thread reduce for tl\r\n\r\n* fix scale only lop3 tensorize instructions.\r\n\r\n* bug fix for scale only case\r\n\r\n* fix scale for warp memory dequantize\r\n\r\n* lint fix\r\n\r\n* bug fox\r\n\r\n* format\r\n\r\n* fix repack from gptqv2\r\n\r\n* chore: Enable large files for Hugging Face models\r\n\r\n* bump version to dev14\r\n\r\n* BF 16 Update\r\n\r\n* lint fix\r\n\r\n* chore: Update BitBLAS benchmark scripts and fix typos\r\n\r\n* chore: Add gptqmodel to test requirements\r\n\r\n* remove gptqmodel dep for test\r\n\r\n* chore: Remove gptqmodel dependency for test\r\n\r\n* Bump version to dev15","shortMessageHtmlLink":"[Version] Bump Version to 0.0.1.dev15 (#149)"}},{"before":"ef28a5d23a2ad2d449d86f91701a088cb5b22315","after":"673290ba6d64071d7629cfd33d09f4374d94a482","ref":"refs/heads/main","pushedAt":"2024-08-23T06:32:38.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Dev] Support Numeric Precision BFloat16 as activation type (#148)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* bug fix in test\r\n\r\n* lint fix.\r\n\r\n* test cuda i4 kernel\r\n\r\n* Refactor copyright notice in i4matmul.hpp\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* refactor test as version below python 3.9 cannot handle int32 overflow.\r\n\r\n* format lint for test\r\n\r\n* Refactor test_int4b_fp16_convert.py for improved readability and maintainability\r\n\r\n* remove unused design file\r\n\r\n* move tile device from package to base\r\n\r\n* dummy impl for codegen\r\n\r\n* Refactor file structure for ladder_permutate module\r\n\r\n* Refactor backend class and fix typos in comments\r\n\r\n* Deep refactor Lib related code.\r\n\r\n* remove ci pull.\r\n\r\n* LintFix\r\n\r\n* refactor builder for whl build\r\n\r\n* Refactor TIRWrapper.wrap() method to include an assertion for the optimized module\r\n\r\n* Refactor lib_generator to set library and source paths\r\n\r\n* lint fix\r\n\r\n* BitNet vllm integration\r\n\r\n* chore: update codespell to version 2.3.0\r\n\r\n* Lintfix\r\n\r\n* Bump version to 0.0.1.dev13\r\n\r\n* lint fix\r\n\r\n* disable fast decoding [u]int4xint8 by default.\r\n\r\n* optimize from dict design in Hint\r\n\r\n* Implement SplitK\r\n\r\n* bitnet benchmark generation.\r\n\r\n* Add benchmark script for BitNet integration\r\n\r\n* AtomicAdd Support\r\n\r\n* LintFix\r\n\r\n* ci fix when 3rdparty tvm is initialized.\r\n\r\n* bug fix for setup\r\n\r\n* fix a bug in block reduce\r\n\r\n* typo fix\r\n\r\n* BUG Fix for block reduce.\r\n\r\n* Lint fix\r\n\r\n* Refactor block reduce schedule template\r\n\r\n* transform branch from bitblas to bitblas_tl\r\n\r\n* Fix subproject commit reference in 3rdparty/tvm\r\n\r\n* chore: update submodule branch from bitblas to bitblas_tl\r\n\r\n* force update config.cmake\r\n\r\n* Bug fix\r\n\r\n* Fix subproject commit reference in 3rdparty/cutlass\r\n\r\n* chore: Add submodule for cutlass library\r\n\r\n* update tl cutlass path\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* format fix\r\n\r\n* Copy CUTLASS to the package directory\r\n\r\n* Refactor setup.py to include additional TVM header files\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* Implement Matmul Benchmark Design\r\n\r\n* chore: Update BitBLAS Matmul benchmark script\r\n\r\n* lint fix\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark for improved readability and maintainability\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* lint fix\r\n\r\n* Benchmark bot test\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* int8 test case\r\n\r\n* Refactor compare_benchmark.py to handle missing benchmark results gracefully\r\n\r\n* ci fix\r\n\r\n* disable ci for test benchmark\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* remove cli installation\r\n\r\n* chore: Create virtual environment and install dependencies for benchmark\r\n\r\n* chore: Update benchmark workflow to include comparison step\r\n\r\n* Lint fix\r\n\r\n* upodate tvm cmmit\r\n\r\n* Imporve lower warp memory pass\r\n\r\n* Bug fix\r\n\r\n* Enhance to support warp schedule.\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* add test for stage3 propagate\r\n\r\n* implement propagate func\r\n\r\n* Stage3 Ladder Permutate integration\r\n\r\n* get_ladder_stage3_propagate\r\n\r\n* comments benchmark scirpts as the setting is too big\r\n\r\n* ci fix for benchmark\r\n\r\n* lint fix\r\n\r\n* chore: Update benchmark workflow to trigger on pull request comments\r\n\r\n* Add LDMatrix Transform 3\r\n\r\n* Support GPTQ Test\r\n\r\n* Fuse BlockReduce Schedule\r\n\r\n* Support mma propagate 3\r\n\r\n* Support MMA Propagate Stage 3\r\n\r\n* Lint Fix\r\n\r\n* Merge block reduce for dequantze config.\r\n\r\n* fix codeql\r\n\r\n* chore: Update submodule reference to latest commit\r\n\r\n* chore: Disable common subexpression elimination in TIR passes\r\n\r\n* Lint Fix\r\n\r\n* 4bit related lop3 updates.\r\n\r\n* lint fix\r\n\r\n* gptq test fix\r\n\r\n* Fix for test\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* typofix\r\n\r\n* QuantCompress Test\r\n\r\n* chore: Refactor quant_compress_impl.py for readability and maintainability\r\n\r\n* Enhance docs to update latest works.\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* removed legacy operator\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* LintFix\r\n\r\n* Fix GPTQ Repack with the latest weight transform\r\n\r\n* lint fix\r\n\r\n* bug fix for rescale dequantize\r\n\r\n* test fix\r\n\r\n* typo fix\r\n\r\n* lint fix\r\n\r\n* Set default weight propagate kind into LDMatrixTransform\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* bug fix for test\r\n\r\n* set default to stage3\r\n\r\n* revert change\r\n\r\n* lint fix\r\n\r\n* case fix\r\n\r\n* bug fix\r\n\r\n* fix for legalize\r\n\r\n* bug fix\r\n\r\n* chore: Clear global operator cache before running tests\r\n\r\n* revert optimize_stratety into SingleBatchDecodeOnly\r\n\r\n* typofix\r\n\r\n* update benchmark scripts\r\n\r\n* chore: Refactor benchmark scripts and fix typos\r\n\r\n* fix for testing\r\n\r\n* lint fix\r\n\r\n* fix import.\r\n\r\n* typo\r\n\r\n* operator benchmark\r\n\r\n* optimize\r\n\r\n* always with shared.dyn\r\n\r\n* optimize cache.\r\n\r\n* dsl fix\r\n\r\n* tqdm\r\n\r\n* chore: Add serialize_results method to benchmark_matmul_strategies.py\r\n\r\n* fix performance issue for dynamic async copy\r\n\r\n* chore: Refactor benchmark_matmul_strategies.py for improved performance and code readability\r\n\r\n* bug fix\r\n\r\n* update readme\r\n\r\n* disable block reduce for int8\r\n\r\n* bugfix for bitnet\r\n\r\n* annotatte todo.\r\n\r\n* lint fix\r\n\r\n* regist fast_decode for int8xint4\r\n\r\n* Refactor CUDA code to use sm architecture instead of compute architecture\r\n\r\n* compress qkv and gate up for bitnet\r\n\r\n* improve elementwise schedule\r\n\r\n* Refactor BitNet model checkpoint generation scripts\r\n\r\n* cross thread reduce for tl\r\n\r\n* fix scale only lop3 tensorize instructions.\r\n\r\n* bug fix for scale only case\r\n\r\n* fix scale for warp memory dequantize\r\n\r\n* lint fix\r\n\r\n* bug fox\r\n\r\n* format\r\n\r\n* fix repack from gptqv2\r\n\r\n* chore: Enable large files for Hugging Face models\r\n\r\n* bump version to dev14\r\n\r\n* BF 16 Update\r\n\r\n* lint fix\r\n\r\n* chore: Update BitBLAS benchmark scripts and fix typos\r\n\r\n* chore: Add gptqmodel to test requirements\r\n\r\n* remove gptqmodel dep for test\r\n\r\n* chore: Remove gptqmodel dependency for test","shortMessageHtmlLink":"[Dev] Support Numeric Precision BFloat16 as activation type (#148)"}},{"before":"01c7a80cb0c11e053f967825b99149b961909fbc","after":"ef28a5d23a2ad2d449d86f91701a088cb5b22315","ref":"refs/heads/main","pushedAt":"2024-08-20T08:27:34.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"LeiWang1999","name":"Lei Wang","path":"/LeiWang1999","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/34334180?s=80&v=4"},"commit":{"message":"[Fix] Fix scale and zero scopes for scale only template (#147)\n\n* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* Refactor import statements for improved readability and maintainability\r\n\r\n* disable failure email for ci\r\n\r\n* remove email notifications.\r\n\r\n* move relax pass from testing to mlc_llm\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* Lint Fix\r\n\r\n* Refactor scripts with se check_eual_ref_scripts_with_emitter function\r\n\r\n* bug fix in test\r\n\r\n* lint fix.\r\n\r\n* test cuda i4 kernel\r\n\r\n* Refactor copyright notice in i4matmul.hpp\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* refactor test as version below python 3.9 cannot handle int32 overflow.\r\n\r\n* format lint for test\r\n\r\n* Refactor test_int4b_fp16_convert.py for improved readability and maintainability\r\n\r\n* remove unused design file\r\n\r\n* move tile device from package to base\r\n\r\n* dummy impl for codegen\r\n\r\n* Refactor file structure for ladder_permutate module\r\n\r\n* Refactor backend class and fix typos in comments\r\n\r\n* Deep refactor Lib related code.\r\n\r\n* remove ci pull.\r\n\r\n* LintFix\r\n\r\n* refactor builder for whl build\r\n\r\n* Refactor TIRWrapper.wrap() method to include an assertion for the optimized module\r\n\r\n* Refactor lib_generator to set library and source paths\r\n\r\n* lint fix\r\n\r\n* BitNet vllm integration\r\n\r\n* chore: update codespell to version 2.3.0\r\n\r\n* Lintfix\r\n\r\n* Bump version to 0.0.1.dev13\r\n\r\n* lint fix\r\n\r\n* disable fast decoding [u]int4xint8 by default.\r\n\r\n* optimize from dict design in Hint\r\n\r\n* Implement SplitK\r\n\r\n* bitnet benchmark generation.\r\n\r\n* Add benchmark script for BitNet integration\r\n\r\n* AtomicAdd Support\r\n\r\n* LintFix\r\n\r\n* ci fix when 3rdparty tvm is initialized.\r\n\r\n* bug fix for setup\r\n\r\n* fix a bug in block reduce\r\n\r\n* typo fix\r\n\r\n* BUG Fix for block reduce.\r\n\r\n* Lint fix\r\n\r\n* Refactor block reduce schedule template\r\n\r\n* transform branch from bitblas to bitblas_tl\r\n\r\n* Fix subproject commit reference in 3rdparty/tvm\r\n\r\n* chore: update submodule branch from bitblas to bitblas_tl\r\n\r\n* force update config.cmake\r\n\r\n* Bug fix\r\n\r\n* Fix subproject commit reference in 3rdparty/cutlass\r\n\r\n* chore: Add submodule for cutlass library\r\n\r\n* update tl cutlass path\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* format fix\r\n\r\n* Copy CUTLASS to the package directory\r\n\r\n* Refactor setup.py to include additional TVM header files\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* Refactor BitBLASLinear test module for improved readability and maintainability\r\n\r\n* Implement Matmul Benchmark Design\r\n\r\n* chore: Update BitBLAS Matmul benchmark script\r\n\r\n* lint fix\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark for improved readability and maintainability\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* lint fix\r\n\r\n* Benchmark bot test\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* int8 test case\r\n\r\n* Refactor compare_benchmark.py to handle missing benchmark results gracefully\r\n\r\n* ci fix\r\n\r\n* disable ci for test benchmark\r\n\r\n* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run\r\n\r\n* remove cli installation\r\n\r\n* chore: Create virtual environment and install dependencies for benchmark\r\n\r\n* chore: Update benchmark workflow to include comparison step\r\n\r\n* Lint fix\r\n\r\n* upodate tvm cmmit\r\n\r\n* Imporve lower warp memory pass\r\n\r\n* Bug fix\r\n\r\n* Enhance to support warp schedule.\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* Enhance LOP3 Instructions\r\n\r\n* add test for stage3 propagate\r\n\r\n* implement propagate func\r\n\r\n* Stage3 Ladder Permutate integration\r\n\r\n* get_ladder_stage3_propagate\r\n\r\n* comments benchmark scirpts as the setting is too big\r\n\r\n* ci fix for benchmark\r\n\r\n* lint fix\r\n\r\n* chore: Update benchmark workflow to trigger on pull request comments\r\n\r\n* Add LDMatrix Transform 3\r\n\r\n* Support GPTQ Test\r\n\r\n* Fuse BlockReduce Schedule\r\n\r\n* Support mma propagate 3\r\n\r\n* Support MMA Propagate Stage 3\r\n\r\n* Lint Fix\r\n\r\n* Merge block reduce for dequantze config.\r\n\r\n* fix codeql\r\n\r\n* chore: Update submodule reference to latest commit\r\n\r\n* chore: Disable common subexpression elimination in TIR passes\r\n\r\n* Lint Fix\r\n\r\n* 4bit related lop3 updates.\r\n\r\n* lint fix\r\n\r\n* gptq test fix\r\n\r\n* Fix for test\r\n\r\n* lint fix\r\n\r\n* lint fix\r\n\r\n* typofix\r\n\r\n* QuantCompress Test\r\n\r\n* chore: Refactor quant_compress_impl.py for readability and maintainability\r\n\r\n* Enhance docs to update latest works.\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* removed legacy operator\r\n\r\n* Refactor weight executors in Matmul class for improved readability and maintainability\r\n\r\n* LintFix\r\n\r\n* Fix GPTQ Repack with the latest weight transform\r\n\r\n* lint fix\r\n\r\n* bug fix for rescale dequantize\r\n\r\n* test fix\r\n\r\n* typo fix\r\n\r\n* lint fix\r\n\r\n* Set default weight propagate kind into LDMatrixTransform\r\n\r\n* lint fix\r\n\r\n* bug fix\r\n\r\n* bug fix for test\r\n\r\n* set default to stage3\r\n\r\n* revert change\r\n\r\n* lint fix\r\n\r\n* case fix\r\n\r\n* bug fix\r\n\r\n* fix for legalize\r\n\r\n* bug fix\r\n\r\n* chore: Clear global operator cache before running tests\r\n\r\n* revert optimize_stratety into SingleBatchDecodeOnly\r\n\r\n* typofix\r\n\r\n* update benchmark scripts\r\n\r\n* chore: Refactor benchmark scripts and fix typos\r\n\r\n* fix for testing\r\n\r\n* lint fix\r\n\r\n* fix import.\r\n\r\n* typo\r\n\r\n* operator benchmark\r\n\r\n* optimize\r\n\r\n* always with shared.dyn\r\n\r\n* optimize cache.\r\n\r\n* dsl fix\r\n\r\n* tqdm\r\n\r\n* chore: Add serialize_results method to benchmark_matmul_strategies.py\r\n\r\n* fix performance issue for dynamic async copy\r\n\r\n* chore: Refactor benchmark_matmul_strategies.py for improved performance and code readability\r\n\r\n* bug fix\r\n\r\n* update readme\r\n\r\n* disable block reduce for int8\r\n\r\n* bugfix for bitnet\r\n\r\n* annotatte todo.\r\n\r\n* lint fix\r\n\r\n* regist fast_decode for int8xint4\r\n\r\n* Refactor CUDA code to use sm architecture instead of compute architecture\r\n\r\n* compress qkv and gate up for bitnet\r\n\r\n* improve elementwise schedule\r\n\r\n* Refactor BitNet model checkpoint generation scripts\r\n\r\n* cross thread reduce for tl\r\n\r\n* fix scale only lop3 tensorize instructions.\r\n\r\n* bug fix for scale only case\r\n\r\n* fix scale for warp memory dequantize\r\n\r\n* lint fix\r\n\r\n* bug fox\r\n\r\n* format","shortMessageHtmlLink":"[Fix] Fix scale and zero scopes for scale only template (#147)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMFQxNTowOTo1Ny4wMDAwMDBazwAAAAS8GZVS","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMFQxNTowOTo1Ny4wMDAwMDBazwAAAAS8GZVS","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0yMFQwODoyNzozNC4wMDAwMDBazwAAAASe2DxO"}},"title":"Activity · microsoft/BitBLAS"}