Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DONOTMERGE] Revert "Put off major collections to improve GC times." #44631

Closed
wants to merge 2 commits into from

Conversation

Keno
Copy link
Member

@Keno Keno commented Mar 15, 2022

As far as I can tell, win32 CI started failing with OOM after this PR. Putting up the revert to investigate whether it is indeed the root cause.

Reverts #44215

@chflood
Copy link
Member

chflood commented Mar 16, 2022

Aw man.. I didn't test on win32. If this is the problem I can come up with a less aggressive heuristic for 32bit builds.

@DilumAluthge
Copy link
Member

So, it looks like tester_win32 is no longer out-of-memorying, but it is failing with some Profile failure.

@IanButterworth Any idea why Profile is failing? I keep retrying the job, but the test failure keeps happening.

@IanButterworth
Copy link
Member

I don't see an error?

@Keno
Copy link
Member Author

Keno commented Mar 16, 2022

See e.g. https://build.julialang.org/#/builders/42/builds/3478/steps/5/logs/stdio. Probably the first thing to do is to change the test to print out what the output was on failure, so we can see if it's something dumb.

@IanButterworth
Copy link
Member

This design fix might help #44639

Otherwise, I don't think there's been a recent change on windows profiling

@IanButterworth
Copy link
Member

#44639 tester_win32 didn't get OOM-ed and failed with the same error, so it seems neither PR are fixes?

@Keno
Copy link
Member Author

Keno commented Mar 16, 2022

The OOM error is somewhat intermittent. It would be good to fix the Profile error if we can to get a cleaner signal on the OOM.

@IanButterworth
Copy link
Member

I added some debug logs to #44639 the subprocess is either taking ages or frozen and gets killed by the watchdog.

      From worker 4:	KILLING BY PROFILE TEST WATCHDOG
      From worker 4:	
      From worker 4:	120.057066 seconds (6.76 k allocations: 268.992 KiB, 0.07% compilation time)
      From worker 4:	  0.000782 seconds (19 allocations: 544 bytes)
      From worker 4:	s = ""
      From worker 4:	read(p, String) = ""

so this test appears to be going really slow on win32

using Profile
f(::Val) = GC.safepoint()
@profile for i = 1:10^3; f(Val(i)); end
print(Profile.len_data())

@DilumAluthge
Copy link
Member

Should we have a Sys.iswindows, and either reduce the number of iterations on Windows (so that the test takes less time to run), or just skip that test entirely on Windows?

@IanButterworth
Copy link
Member

More debug in #44639

The profile test just passed and the subprocess took 20s. On MacOS it takes ~3s, on my Windows VM with 32-bit julia it takes 10s. When it fails it means it's being killed after 120s, which seems like a big deviation from normal CI behavior.

Also, that run got OOM-ed.

Could it be that too many parallel test processes are being run on the win32 machine?

@Keno Keno closed this Apr 26, 2022
@giordano giordano deleted the revert-44215-update branch April 26, 2022 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants