Incorrect all_gather result in the interpreter #1933

Closed
linuxlonelyeagle opened this issue Jan 19, 2024 · 0 comments · Fixed by #1934

What happened?

Repro:

module @cross_replica {
  func.func public @all_gather(%arg0 : tensor<2x2xi64>) -> tensor<2x8xi64> {
    %result = "stablehlo.all_gather"(%arg0) {
      all_gather_dim = 1 : i64,
      replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
      channel_handle = #stablehlo.channel_handle<handle=1, type=0>
    } : (tensor<2x2xi64>) -> tensor<2x8xi64>
    return %result : tensor<2x8xi64>
  }
  func.func public @main() {
    %0 = stablehlo.constant dense<[[1, 2], [3, 4]]> : tensor<2x2xi64>
    %1 = stablehlo.constant dense<[[5, 6], [7, 8]]> : tensor<2x2xi64>
    %results:4 = "interpreter.run_parallel"(%1, %1, %0, %1) {
      programs=[[@all_gather, @all_gather], [@all_gather, @all_gather]]
    } : (tensor<2x2xi64>, tensor<2x2xi64>, tensor<2x2xi64>, tensor<2x2xi64>) -> (tensor<2x8xi64>, tensor<2x8xi64>, tensor<2x8xi64>, tensor<2x8xi64>)
    check.expect_eq_const %results#0, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#1, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#2, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#3, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>

    func.return
  }
}

Expected:

    check.expect_eq_const %results#0, dense<[[5, 6, 1, 2, 5, 6, 5, 6],
                                             [7, 8, 3, 4, 7, 8, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#1, dense<[[5, 6, 1, 2, 5, 6, 5, 6],
                                             [7, 8, 3, 4, 7, 8, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#2, dense<[[5, 6, 1, 2, 5, 6, 5, 6],
                                             [7, 8, 3, 4, 7, 8, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#3, dense<[[5, 6, 1, 2, 5, 6, 5, 6],
                                             [7, 8, 3, 4, 7, 8, 7, 8]]> : tensor<2x8xi64>

Actual result:

    check.expect_eq_const %results#0, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#1, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#2, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>
    check.expect_eq_const %results#3, dense<[[5, 6, 5, 6, 1, 2, 5, 6],
                                             [7, 8, 7, 8, 3, 4, 7, 8]]> : tensor<2x8xi64>

ghpvnist self-assigned this Jan 19, 2024
ghpvnist added a commit that referenced this issue Jan 23, 2024
Currently, the implementation concatenates the rendezvous tensors in
order of sorted process IDs. However, the spec implies that the tensors
are concatenated in the order of the generated `processGroup`, which is
not necessarily sorted. This change makes the implementation follow the
order implied by the spec.

Thanks @linuxlonelyeagle for filing the issue!

closes #1933
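
The difference between the two orderings can be sketched in plain Python. This is an illustration rather than the interpreter's code: the loop order in `cross_replica_and_partition` (partitions outer, replicas inner) is an assumed reading of the StableHLO spec, the mapping of `interpreter.run_parallel` operands to `(replica_id, partition_id)` in row-major order over the `programs` grid is also an assumption, and `concat_dim1` is a hypothetical helper.

```python
def cross_replica_and_partition(replica_groups, num_partitions):
    """Build process groups in generation order.

    Assumed reading of the spec: partition ids form the outer loop and
    replica ids the inner loop, so the group is not sorted by process id.
    """
    groups = []
    for replica_group in replica_groups:
        group = []
        for partition_id in range(num_partitions):
            for replica_id in replica_group:
                group.append((replica_id, partition_id))
        groups.append(group)
    return groups


def concat_dim1(tensors):
    """Hypothetical helper: concatenate 2-D nested lists along dimension 1."""
    return [sum((t[row] for t in tensors), []) for row in range(len(tensors[0]))]


a = [[1, 2], [3, 4]]  # %0 in the repro
b = [[5, 6], [7, 8]]  # %1 in the repro

# Operands keyed by (replica_id, partition_id); assumes run_parallel hands
# out (%1, %1, %0, %1) row-major over the 2x2 programs grid.
operands = {(0, 0): b, (0, 1): b, (1, 0): a, (1, 1): b}

group = cross_replica_and_partition([[0, 1]], num_partitions=2)[0]

# Concatenating in processGroup order gives the expected result.
in_group_order = concat_dim1([operands[p] for p in group])

# Concatenating in sorted process-id order gives the reported result.
in_sorted_order = concat_dim1([operands[p] for p in sorted(group)])

print(group)
print(in_group_order)
print(in_sorted_order)
```

Under these assumptions, `in_group_order` matches the values listed under the expected output in the issue, and `in_sorted_order` matches the values the interpreter produced before the fix.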