CoreClr Official runs are overwriting their test list JSONs, which break the repro tool #9902

Closed
MattGal opened this issue Mar 8, 2018 · 11 comments


@MattGal
Member

MattGal commented Mar 8, 2018

@RussKeldorph , @maririos FYI

Russ just showed me this problem, which you can see using the Helix API here:
https://helix.dot.net/api/jobs/422dc66a-f9f3-4e85-9cc2-c85b292521da/details?access_token=**your token**

The Job list is getting written here:
https://dotnetbuildoutput.blob.core.windows.net/coreclr-master-20180304-01/TestList.json
... but note that nothing about this job list's URL distinguishes it from a Windows, OSX, other Linux, etc. run; apparently some or all test builds are uploading the test list to the same path.

Helix doesn't really care, as long as you leave the file alone for 4-5 seconds so we can parse it and distribute the work, but if two builds were submitted simultaneously this would break regular runs too. It does break the repro tool, though, because we need to keep this file around to see where the test bits are stored.

To fix this:

There are a few ways to do this:

I'd recommend the distinct test list name, assuming the blobs aren't all overwriting each other as well.

@RussKeldorph
Contributor

@jashook FYI.

@wtgodbe
Member

wtgodbe commented Mar 8, 2018

Seems like the fix would be to insert something like:

<TestListFileName>$(Rid)-$(BuildType)-TestList.json</TestListFileName>

Somewhere around here: https://github.com/dotnet/coreclr/blob/master/tests/helixpublish.proj#L44-L50

@MattGal seem reasonable? I can put up a PR if needed

@MattGal
Member Author

MattGal commented Mar 9, 2018

@wtgodbe that's the more elegant (IMO) fix, yes, but you'd want to make sure that your other payload blobs don't collide either; if anyone writes to a blob in those lists after the list is written, it could result in chaos. I was unsure from the Windows runtime correlation payload whether this would be the case; reviewing all the blobs that the specific runs upload should settle it.

@wtgodbe
Member

wtgodbe commented Mar 26, 2018

For reference, here's how I would go about checking if test blobs have unique names:

Looks like everything (but the testlist file) is named with an RID included, which is a unique identifier.
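One way to automate this kind of check is sketched below: given the blob URLs each run uploads, it reports any blob written by more than one run. This is a minimal sketch, not the procedure used in this thread; how the URLs are gathered is left out, and the run names and payload file names are hypothetical. Only the container URL comes from the issue.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_shared_blobs(uploads_by_run):
    """Return blobs (host + path) that more than one run uploads to."""
    owners = defaultdict(set)
    for run_name, blob_urls in uploads_by_run.items():
        for url in blob_urls:
            parts = urlsplit(url)
            # Ignore the query string so differing access tokens do not hide a collision.
            owners[parts.netloc + parts.path].add(run_name)
    return {blob: sorted(runs) for blob, runs in owners.items() if len(runs) > 1}


base = "https://dotnetbuildoutput.blob.core.windows.net/coreclr-master-20180304-01"

# Hypothetical upload lists; the payload names are assumptions for illustration.
uploads_by_run = {
    "windows": [f"{base}/TestList.json", f"{base}/win-x64-payload.zip"],
    "linux": [f"{base}/TestList.json?sig=abc", f"{base}/linux-x64-payload.zip"],
}
shared = find_shared_blobs(uploads_by_run)
```

Here only `TestList.json` shows up in `shared`, which matches the observation that everything else already carries the RID in its name.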

@RussKeldorph
Contributor

@wtgodbe Are you saying you went through your list and don't see anything that isn't already uniquely named, so we should be good to just fix the test list file name?

@maririos
Member

@wtgodbe do we have an ETA of when this is going to get fixed?

@MattGal
Member Author

MattGal commented Mar 28, 2018

I chatted with @RussKeldorph; I think Will's fix is probably sufficient, and we have an easy way to check (which I will assist with) once the fix is in.

@wtgodbe
Member

wtgodbe commented Mar 28, 2018

@RussKeldorph Yes, as far as I can tell everything except the test list file name is already uniquely identified, so my fix should be sufficient. The only thing I can think of that would break that is if two jobs in the same build had the exact same RID (OS-Arch), which should never happen anyway.

@maririos
Member

maririos commented Apr 2, 2018

@RussKeldorph I went and tried the Repro tool on a random CoreClr official build test failure and was able to get a RedHat6.9 machine with the right bits :) . Thank you so much for fixing this

@RussKeldorph
Contributor

@maririos Yeah, it worked for me too. Credit goes to @MattGal and @wtgodbe.

@maririos
Member

maririos commented Apr 2, 2018

Awesome! Thanks everyone!

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020