{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":817098195,"defaultBranch":"master","name":"nccl","ownerLogin":"hongbilu","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2024-06-19T02:46:21.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/36907893?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1718765189.033632","currentOid":""},"activityList":{"items":[{"before":"529ee691c36c5a43656b80c32c049b9ae976d5c0","after":"178b6b759074597777ce13438efb0e0ba625e429","ref":"refs/heads/master","pushedAt":"2024-06-23T14:17:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hongbilu","name":"bean","path":"/hongbilu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36907893?s=80&v=4"},"commit":{"message":"2.22.3-1\n\nRework core for NVIDIA Trusted Computing\n * Compress work structs so that they are shared between channels\n * Utilize the full amount of kernel argument space permitted (4k)\n   before resorting to work fifo.\n * Rework the task preprocessing phase.\n * Use a separate abortDevFlag which is kept in sync with abortFlag\n   using cudaMemcpy operations.\n * Rename src/include/align.h to src/include/bitops.h\n\nAdd lazy connection establishment for collective operations\n * Move buffer allocation and connection establishment to the first\n   collective operation using that algorithm.\n * Accelerate init time and reduce memory usage.\n * Avoid allocating NVLS buffers if all calls are registered.\n * Compute algo/proto in ncclLaunchCollTasksInfo early on.\n * Connect peers in ncclCollPreconnectFunc if not connected already.\n * Also move shared buffer creation to the first send/recv call.\n\nAccelerate intra-node NVLink detection\n * Make each rank only detect NVLinks attached to its GPU.\n * Fuse XMLs to reconstruct the full NVLink topology\n\nAdd init profiling to report time spent in different init phases.\n * Report 
timings of bootstrap, allgather, search, connect, etc.\n * Add new \"PROFILE\" category for NCCL_DEBUG_SUBSYS.\n\nAdd support for PCI p2p on split PCI switches\n * Detect split PCI switches through a kernel module exposing\n   switch information.\n * Update the topology XML and graph to add those inter-switch\n   connections.\n\nAdd cost estimation API\n * Add a new ncclGroupEndSimulate primitive to return the estimated\n   time a group would take.\n\nNet/IB: Add separate traffic class for fifo messages\n * Add NCCL_IB_FIFO_TC to control the traffic class of fifo messages\n   independently from NCCL_IB_TC.\n   Merges PR #1194\n\nNet/IB: Add support for IB router\n * Use flid instead of lid if subnets do not match\n * Warn if flid is 0\n\nOptimizations and fixes for device network offload (unpack)\n * Double the default number of channels\n * Cache netDeviceType\n * Fix save/increment head logic to enable Tree support.\n\nSupport ncclGroupStart/End for ncclCommAbort/Destroy\n * Allow Abort/Destroy to be called within a group when managing\n   multiple GPUs with a single process.\n\nImprove Tuner API\n * Provide to the plugin the original cost table so that the plugin\n   can leave unknown or disabled algo/proto combinations untouched.\n * Remove nvlsSupport and collnetSupport.\n\nDo not print version to stdout when using a debug file\n * Also print version from all processes with INFO debug level.\n   Fixes issue #1271\n\nFix clang warnings in NVTX headers\n * Update NVTX headers to the latest version\n   Fixes issue #1270\n\nDisable port fusion in heterogeneous systems\n * Do not fuse ports if a mix of multi-port and single port are detected.\n\nFix NVLS graphs search for dual NICs.\n * Fix NVLS graph search when we have more than one NIC per GPU.\n\nFix crash with collnetDirect\n * Add separate graph search for collnetDirect, testing alltoall paths\n   and working similarly to the NVLS search.\n\nFix hang when nodes have different CPU types\n * Add the CPU type to 
the rank peer info.\n * Align all ranks on the CPU type after the first allgather.\n * Only use the aligned CPU type for all tuning operations.\n   Fixes issue #1136\n   Fixes issue #1184\n\nFix performance of registered send/recv operations\n * Allow for single full size operations\n * Add INFO to confirm the registration of send/recv buffers.\n\nMove all sync ops to finalize stage\n * Ensure ncclCommDestroy is non-blocking if ncclCommFinalize has\n   been called.\n\nImprove error reporting during SHM segment creation\n\nImprove support of various compilers\n   Merges PR #1177\n   Merges PR #1228\n\nAllow net and tuner plugins to be statically linked\n * Search for ncclNet or ncclTuner symbols in the main binary.\n   Merges PR #979\n\nPlugin examples includes cleanup\n * Harmonize err.h and common.h usage.\n * Add mixed plugin with both net and tuner.","shortMessageHtmlLink":"2.22.3-1"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wNi0yM1QxNDoxNzoyMS4wMDAwMDBazwAAAARsx_Na","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wNi0yM1QxNDoxNzoyMS4wMDAwMDBazwAAAARsx_Na","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wNi0yM1QxNDoxNzoyMS4wMDAwMDBazwAAAARsx_Na"}},"title":"Activity · hongbilu/nccl"}