Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to stop on first failure to traversal SDK #186

Merged
merged 3 commits into from
Jul 16, 2020

Conversation

safern
Copy link
Member

@safern safern commented Jun 6, 2020

If you have 2 projects, let's say A and B, where B depends from A and you build them sequentially (not in parallel), if A fails, then B shouldn't be built.

We have cases like this in dotnet/runtime, where we have different partitions of the repo, like the runtime and libraries. Libraries build depends on the runtime build. For each partition we have a project that drives the build of that partition, and we have a "Driver" project which adds whatever partitions you're trying to build as projects to build using the Traversal SDK and we don't build them in parallel because they're dependent, so we need the ability of failing if any partition failed and stopping to get reasonable errors instead of errors due to the previous partition failing.

I still need to add tests for this, but wanted to get early feedback

cc: @rainersigwald

FYI: @ViktorHofer

@safern
Copy link
Member Author

safern commented Jun 8, 2020

cc: @jeffkl

@jeffkl
Copy link
Contributor

jeffkl commented Jun 8, 2020

I'm still wrapping my head around this, here's a question I have:

StopOnFirstFailure doesn't seem to work at all if building in parallel. Why not just set BuildInParallel to false if you set StopOnFirstFailure to true? Is this because we end up grouping projects that are parallel and ones that are not?

I'm mainly looking for a simpler implementation.

@safern
Copy link
Member Author

safern commented Jun 8, 2020

StopOnFirstFailure doesn't seem to work at all if building in parallel. Why not just set BuildInParallel to false if you set StopOnFirstFailure to true? Is this because we end up grouping projects that are parallel and ones that are not?

Yeah, I wanted to give the option for customers to set StopOnFirstFailure for non-parallel traversals. I'm thinking on cases like where you have multiple orchestrators (which is the case for dotnet/runtime). For example, we have project Build.proj which does a build of project A and B and both are non-parallel projects. However, project A orchestrates another traversal of multiple projects; for example src\libraries\**\*.csproj (same for B but with different projects), and those traversals contain parallel projects. So, I wanted to respect, ProjectToBuild.BuildInParallel properties for this scenarios, where you have a global setting for all projects, but don't honor it if it is not supported in the current node.

Does that make sense?

@@ -114,11 +114,23 @@
DependsOnTargets="$(BuildDependsOn)"
Condition=" '$(IsGraphBuild)' != 'true' "
Returns="@(CollectedBuildOutput)">

<ItemGroup>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary, can't you just call AnyHaveMetadataValue on @(ProjectReference) on line 125?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but then we wouldn't be honoring BuildInParallel fallback property. Because we could do AnyHaveMetadataValue('BuildInParallel', 'true') but then I realized that would be a more complicated implementation because if a project doesn't have metadata value then we need to honor BuildInParallel.

Then for the other cases where we have more properties like CleanInParallel... it gets more complicated, so I figured this was the simplest, to basically do the same as we do when dispatching to the MSBuild task.

@jeffkl
Copy link
Contributor

jeffkl commented Jun 9, 2020

@rainersigwald Can you have a look at this and see if you know a way to simplify the logic?

@safern
Copy link
Member Author

safern commented Jun 15, 2020

Friendly ping 😃 -- this would be really helpful for dotnet/runtime repo.

@rainersigwald
Copy link
Member

I agree that this feels very complicated for the benefit.

StopOnFirstFailure doesn't seem to work at all if building in parallel. Why not just set BuildInParallel to false if you set StopOnFirstFailure to true? Is this because we end up grouping projects that are parallel and ones that are not?

Yeah, I wanted to give the option for customers to set StopOnFirstFailure for non-parallel traversals. I'm thinking on cases like where you have multiple orchestrators (which is the case for dotnet/runtime). For example, we have project Build.proj which does a build of project A and B and both are non-parallel projects. However, project A orchestrates another traversal of multiple projects; for example src\libraries\**\*.csproj (same for B but with different projects), and those traversals contain parallel projects. So, I wanted to respect, ProjectToBuild.BuildInParallel properties for this scenarios, where you have a global setting for all projects, but don't honor it if it is not supported in the current node.

I'm not sure I understand the scenario here. If Build.proj builds A, then B (with BuildInParallel="false" and StopOnFirstFailure="true", as properties set in the project), why would that have anything to do with A building its dependencies in parallel? That's a different project.

@safern
Copy link
Member Author

safern commented Jun 17, 2020

What I was trying to say was to not force customers to have to configure StopOnFirstFailure in such granularity on every project that uses the Traversal SDK.

For example if I have:

- Directory.Build.props -> StopOnFirstFailure=true
- A.proj -> BuildInParallel=false
  - AA.proj
  - AB.proj -> BuildInParallel=true
  - AC.proj -> BuildInParallel=true
- B.proj -> BuildInParallel=false
 - BA.proj -> BuildInParallel=true
 - BB.proj -> BuildInParallel=false

I would like the SDK to just build the right nodes for me with StopOnFirstFailure when it makes sense because I set it in a global shared location to true. If we do it for each project evaluation or metadata then I will have to endup doing something like this:

- Directory.Build.props -> StopOnFirstFailure=true
- A.proj -> BuildInParallel=false
  - Directory.Build.props -> StopOnFirstFailure=false
  - AA.proj
  - AB.proj -> BuildInParallel=true
  - AC.proj -> BuildInParallel=true
- B.proj -> BuildInParallel=false
 - Directory.Build.props -> StopOnFirstFailure=false
 - BA.proj -> BuildInParallel=true
 - BB.proj -> BuildInParallel=false

and then that complexity will kick back on the users where they will need to calculate were can they StopOnFirstFailure without loosing build parallelism. Does that make sense?

@safern
Copy link
Member Author

safern commented Jun 24, 2020

@rainersigwald does that make sense?

@jaredpar
Copy link
Member

@rainersigwald

I agree that this feels very complicated for the benefit

Consider this issue as a concrete example: dotnet/runtime#38342

This morning I tried building runtime on a new WSL2 enlistment and I forgot to install the right version of cmake. The runtime build even correctly identified that cmake wasn't installed and bailed out of the native portion of our build early. Yet the build continued on to other sections of the build, those predictably failed because the first project failed. I ended up scratching my head for 15 minutes because the errors were non-sensical. Eventually @safern pointed out that the build had failed much earlier due to cmake missing and that was the root cause.

If we don't take a feature like this how do we structure runtime to have sensible behavior here?

@safern
Copy link
Member Author

safern commented Jul 16, 2020

I spoke with @rainersigwald and we decided that indeed the complex implementation of batching on all the metadata was not worth it as MSBuild already handles the case where StopOnFirstFailure==true and BuildInParallel==true and BuildInParallel wins in that case. So basically what I was doing is not needed.

We decided to just default StopOnFirstFailure to true whenever it is not set by the user.

@jeffkl @rainersigwald PTAL

@safern safern force-pushed the AddStopOnFirstFailureSupport branch from 9a3b577 to a9a99d9 Compare July 16, 2020 22:30
Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a lot cleaner and still covers the main goals:

  1. If you were building things sequentially, presumably that's because of implicit dependencies. So the new behavior is better.
  2. The only negative to setting StopOnFirstFailure + BuildInParallel is a message.
  3. No confusion about batching and ordering between projects that have/don't have the StopOnFirstFailure/BuildInParallel metadata.

Copy link
Contributor

@jeffkl jeffkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the contribution!

And thanks to @rainersigwald for consulting on this.

@safern
Copy link
Member Author

safern commented Jul 16, 2020

Thank you guys for reviewing and the discussion!

@jeffkl jeffkl merged commit 0ba2860 into microsoft:master Jul 16, 2020
@jeffkl jeffkl added the Feature Request New feature or request label Jul 16, 2020
@safern safern deleted the AddStopOnFirstFailureSupport branch July 16, 2020 22:41
@safern
Copy link
Member Author

safern commented Jul 16, 2020

@jeffkl would you mind sharing the produced SDK version with this change whenever available, so I can update it in dotnet/runtime?

@jeffkl
Copy link
Contributor

jeffkl commented Jul 16, 2020

Fresh out of the oven: https://www.nuget.org/packages/Microsoft.Build.Traversal/2.0.52

@safern
Copy link
Member Author

safern commented Jul 16, 2020

Thanks 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants