CoreClr Official runs are overwriting their test list JSONs, which break the repro tool #9902
Comments
@jashook FYI.
Seems like the fix would be to insert something like:
Somewhere around here: https://github.com/dotnet/coreclr/blob/master/tests/helixpublish.proj#L44-L50
@MattGal does that seem reasonable? I can put up a PR if needed.
@wtgodbe that's the more elegant (IMO) fix, yes, but you'd want to make sure that your other payload blobs don't collide either. If anyone writes to a blob in those lists after the list is written, it's potentially going to result in chaos. I couldn't tell from the Windows runtime correlation payload whether this would be the case; reviewing all the blobs that the specific runs upload should settle it.
For reference, here's how I would go about checking if test blobs have unique names:
Looks like everything (but the testlist file) is named with an RID included, which is a unique identifier.
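As a rough illustration of that kind of check (not the snippet from the original comment, which isn't reproduced here), the sketch below takes the blob names each run uploads and reports any name used by more than one run. The run labels and blob names are made up for the example; in practice they would come from whatever listing of uploads is available for each run.

```python
from collections import defaultdict


def find_colliding_blobs(uploads_by_run):
    """Return {blob_name: sorted list of runs} for names uploaded by more than one run."""
    runs_by_blob = defaultdict(set)
    for run, blob_names in uploads_by_run.items():
        for name in blob_names:
            runs_by_blob[name].add(run)
    return {
        name: sorted(runs)
        for name, runs in runs_by_blob.items()
        if len(runs) > 1
    }


# Hypothetical upload lists for two runs of the same build.
uploads = {
    "rhel.6-x64": ["TestList.json", "Payload.rhel.6-x64.zip"],
    "win-x64": ["TestList.json", "Payload.win-x64.zip"],
}

collisions = find_colliding_blobs(uploads)
print(collisions)  # {'TestList.json': ['rhel.6-x64', 'win-x64']}
```

An empty result would mean every blob name is unique across the runs that were compared.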
@wtgodbe Are you saying you went through your list and don't see anything that isn't already uniquely named, so we should be good to just fix the test list file name?
@wtgodbe do we have an ETA of when this is going to get fixed?
I chatted with @RussKeldorph, I think Will's fix is probably sufficient and we have an easy way to check (which I will assist with) once the fix is in.
@RussKeldorph Yes, as far as I can tell everything except the test list file name is already uniquely identified, so my fix should be sufficient. The only thing I can think of that would break that is if two jobs in the same build had the exact same RID (OS-Arch), which should never happen anyway.
@RussKeldorph I went and tried the Repro tool on a random CoreClr official build test failure and was able to get a RedHat6.9 machine with the right bits :) Thank you so much for fixing this.
Awesome! Thanks everyone!
@RussKeldorph, @maririos FYI
Russ just showed me this problem, which you can see using the Helix API here:
https://helix.dot.net/api/jobs/422dc66a-f9f3-4e85-9cc2-c85b292521da/details?access_token=**your token**
The Job list is getting written here:
https://dotnetbuildoutput.blob.core.windows.net/coreclr-master-20180304-01/TestList.json
... but you'll note that there's nothing about this job list's URL that separates it from a Windows, OSX, other Linux, etc. run; apparently some or all test builds are uploading the test list to the same path.
Helix doesn't really care, as long as the file is left alone for the 4-5 seconds we need to parse it and distribute the work, but if two builds were submitted simultaneously this would break regular runs. It does break the repro tool, though, because we need to keep this file around to see where the test bits are stored.
There are a few ways to fix this:
I'd recommend the distinct test list name, assuming the blobs aren't all overwriting each other as well.
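To make the recommendation concrete, here is a minimal sketch of a distinct test list name that includes the run's RID. The helper name `distinct_test_list_name` and the `TestList.<rid>.json` pattern are assumptions made for illustration; the real change would live in the build's publish step, and its exact naming is not specified in this issue.

```python
def distinct_test_list_name(rid, base_name="TestList", extension=".json"):
    """Build a per-run test list blob name by embedding the RID (hypothetical scheme)."""
    if not rid:
        raise ValueError("rid must be a non-empty string")
    return f"{base_name}.{rid}{extension}"


# Container URL taken from the issue; the RIDs are example values.
container = "https://dotnetbuildoutput.blob.core.windows.net/coreclr-master-20180304-01"

urls = [
    f"{container}/{distinct_test_list_name(rid)}"
    for rid in ("rhel.6-x64", "win-x64")
]
for url in urls:
    print(url)
```

With a scheme like this, two runs in the same build write to different paths as long as their RIDs differ, so neither run overwrites the other's list.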