Fix omrsysinfo_get_limit/omrsysinfo_set_limit API reporting of hard limits on macOS #4990

Closed
fjeremic opened this issue Mar 26, 2020 · 2 comments · Fixed by #5108

@fjeremic

During a Slack discussion [1] which I'll quote here:

I just got a hold of my partner's macOS laptop to try and debug this failure. It appears that this is a macOS issue that others have run into as well, for example the Golang community:
golang/go#30401

I was able to reproduce the failure seen on Azure Pipelines by doing the following:
sudo launchctl limit maxfiles 10240 524288

Then close the terminal, and open a new session. ulimit -n should now print the new soft limit of 10240. We then run the test and fail in the same location as on Azure Pipelines:
https://jeremic.visualstudio.com/OMR/_build/results?buildId=314&view=logs&j=196c4937-6990-58b5-7d18-234ac2f0888b&t=66cbe35e-0143-5803-340e-78910b4c9fcb

It appears that at some point macOS changed the maximum soft limit that one can set to OPEN_MAX, which is defined in sys/syslimits.h. It seems macOS doesn't care what your hard limit is set to, which causes problems for our test. The macOS version also seems to matter. Not sure if there is a simple solution to this problem.

@rwy0717 @youngar any suggestions on how to proceed? Should we just disable this particular test on macOS given these restrictions and document the observed behavior in an issue?

By the way, the reason it seems to pass on the Jenkins farm is that the hard limit there is set to 9223372036854775807, which is 2^63 - 1:
https://ci.eclipse.org/omr/job/Build-osx_x86-64/1331/consoleFull

11:57:14 31: [----------] 38 tests from PortSysinfoTest
11:57:14 31: originalSoftLimit=10240
11:57:14 31: originalHardLimit=9223372036854775807
11:57:14 31: soft set to hard limit=10240
11:57:24 31: [----------] 38 tests from PortSysinfoTest (8353 ms total)

It seems we have special code to handle this case:
https://github.com/eclipse/omr/blob/24a81ac88f98f4ae46ed289a05c8e1d74e301251/port/unix/omrsysinfo.c#L3364-L3379

I also tried setting the hard limit to 2^63 - 1, but naturally macOS refuses to let us set it to such a large value, so that doesn't fix the problem. Even if it did, I'd prefer we didn't hide the underlying issue, which could affect users who clone the repository and run the tests locally. I'd rather either fix the underlying problem or disable the test on macOS.

It was identified that the hard limit value returned by omrsysinfo_get_limit may not be the true limit to which we can set the soft file descriptor limit. As per the above example, calling omrsysinfo_set_limit to update the soft limit to the reported hard limit would fail on macOS, depending on system configuration.
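To make the failure mode concrete, here is a minimal sketch that bypasses the port library and calls getrlimit/setrlimit directly. It is not OMR code: `try_soft_to_hard` is a hypothetical helper, and it assumes the port library call ultimately ends up in setrlimit(RLIMIT_NOFILE, ...). On an affected macOS configuration the raise is expected to fail; on most other systems it should succeed.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of OMR): try to raise the RLIMIT_NOFILE
 * soft limit to the reported hard limit, then restore the original limits.
 * Returns 0 if the raise succeeded, otherwise the errno of the failing call.
 */
int try_soft_to_hard(void) {
	struct rlimit orig;
	if (getrlimit(RLIMIT_NOFILE, &orig) != 0) {
		return errno;
	}

	struct rlimit raised = orig;
	raised.rlim_cur = orig.rlim_max; /* soft := reported hard */
	if (setrlimit(RLIMIT_NOFILE, &raised) != 0) {
		/* Expected path on affected macOS setups: the reported hard
		 * limit is larger than what the kernel accepts as a soft limit. */
		return errno;
	}

	/* Put the original soft limit back so the caller is unaffected. */
	if (setrlimit(RLIMIT_NOFILE, &orig) != 0) {
		return errno;
	}
	return 0;
}
```

A non-zero return here, while getrlimit reports a hard limit above the current soft limit, would be the same symptom the failing test shows.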

Various solutions were discussed, and there seemed to be general consensus that we should report the hard limit which we can actually set, regardless of what the system reports as the hard limit. Another quote from the discussion:

Interestingly enough this code is new. There is some very relevant discussion here:
#3579

The PR which merged the above code was this one:
#3641

Seems @charliegracie and @youngar were involved in the discussions. Perhaps we can revisit this with some new perspective?

We should investigate a path forward, and revert the change which disables this test in #4980.
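One possible shape for "report the hard limit we can actually set" is sketched below. This is only an illustration, not the fix that was eventually merged: `clamp_limit` and `settable_nofile_hard_limit` are hypothetical names, and the sketch assumes OPEN_MAX is the effective ceiling on macOS, as the quoted discussion suggests. It ignores kern.maxfilesperproc, which the test program in the comments below also probes.

```c
#include <limits.h>
#include <sys/resource.h>
#if defined(__APPLE__)
#include <sys/syslimits.h> /* OPEN_MAX */
#endif

/* Hypothetical helper: clamp a reported limit to a ceiling. */
rlim_t clamp_limit(rlim_t reported, rlim_t ceiling) {
	return (reported < ceiling) ? reported : ceiling;
}

/*
 * Hypothetical helper: store in *out the largest RLIMIT_NOFILE value the
 * soft limit can plausibly be raised to. Returns 0 on success, -1 if
 * getrlimit fails.
 */
int settable_nofile_hard_limit(rlim_t *out) {
	struct rlimit rl;
	if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
		return -1;
	}
#if defined(__APPLE__) && defined(OPEN_MAX)
	/* Assumption: macOS rejects soft limits above OPEN_MAX, whatever
	 * rlim_max says, so report min(rlim_max, OPEN_MAX). */
	*out = clamp_limit(rl.rlim_max, (rlim_t)OPEN_MAX);
#else
	/* Elsewhere, trust the hard limit the system reports. */
	*out = rl.rlim_max;
#endif
	return 0;
}
```

With something like this behind omrsysinfo_get_limit, setting the soft limit to the reported hard limit would stay within what macOS accepts, at the cost of no longer exposing the raw rlim_max value.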

[1] https://eclipse-omr.slack.com/archives/C010F8XPRT9/p1584990621015600

@fjeremic

@rwy0717 you played around with a C test case. Could you upload it to this issue so we don't lose track of that once we get some resources to be able to fix this?

@rwy7

rwy7 commented Mar 26, 2020

It's very very rough, but here:

#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <limits.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <fcntl.h>

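static size_t files_opened = 3; /* stdin, stdout and stderr are already open */

/* Open files under /tmp until n descriptors are in use; returns 0 on success, -1 on the first failure. */
int open_to(size_t n) {
	printf("opening to: %zu\n", n);
	while (files_opened < n) {
		char name[PATH_MAX];
		snprintf(name, sizeof(name), "/tmp/max_open-%zu", files_opened);
		/* O_CREAT requires an explicit mode argument */
		int fd = open(name, O_CREAT | O_RDWR, 0600);
		if (fd < 0) {
			printf("failed to open file no=%zu\n", files_opened + 1);
			return -1;
		}
		files_opened += 1;
	}
	printf("success\n");
	return 0;
}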

int main(void) {

	struct rlimit rlim;
	getrlimit(RLIMIT_NOFILE, &rlim);

	int name[2] = { CTL_KERN, KERN_MAXFILESPERPROC };
	int sysctl_max = 1;
	size_t len = sizeof(sysctl_max);
	sysctl(name, 2, &sysctl_max, &len, NULL, 0);

	printf("** initial report\n");
	printf("OPEN_MAX      = %d\n", OPEN_MAX);
	printf("rlim.rlim_cur = %llu\n", rlim.rlim_cur);
	printf("rlim.rlim_max = %llu\n", rlim.rlim_max);
	printf("sysctl_max    = %d\n", sysctl_max);

	///////////////

	{
		struct rlimit new_lim = rlim;
		new_lim.rlim_max = rlim.rlim_max - 32;
		int err = setrlimit(RLIMIT_NOFILE, &new_lim);
		printf("** set rlim_max to (rlim_max - 32): %d\n", err);
	}

	getrlimit(RLIMIT_NOFILE, &rlim);
	sysctl(name, 2, &sysctl_max, &len, NULL, 0);

	printf("OPEN_MAX      = %d\n", OPEN_MAX);
	printf("rlim.rlim_cur = %llu\n", rlim.rlim_cur);
	printf("rlim.rlim_max = %llu\n", rlim.rlim_max);
	printf("sysctl_max    = %d\n", sysctl_max);

	//////////////////

	{
		struct rlimit new_lim = rlim;
		new_lim.rlim_max = sysctl_max + 32;
		int err = setrlimit(RLIMIT_NOFILE, &new_lim);
		printf("** set rlim_max to sysctl_max + 32: %d\n", err);
	}

	getrlimit(RLIMIT_NOFILE, &rlim);
	sysctl(name, 2, &sysctl_max, &len, NULL, 0);

	printf("OPEN_MAX      = %d\n", OPEN_MAX);
	printf("rlim.rlim_cur = %llu\n", rlim.rlim_cur);
	printf("rlim.rlim_max = %llu\n", rlim.rlim_max);
	printf("sysctl_max    = %d\n", sysctl_max);

	//////////////////

	{
		struct rlimit new_lim = rlim;
		new_lim.rlim_max = rlim.rlim_cur + 32;
		int err = setrlimit(RLIMIT_NOFILE, &new_lim);
		printf("** set rlim_max to rlim_cur + 32: %d\n", err);
	}

	getrlimit(RLIMIT_NOFILE, &rlim);
	sysctl(name, 2, &sysctl_max, &len, NULL, 0);

	printf("OPEN_MAX      = %d\n", OPEN_MAX);
	printf("rlim.rlim_cur = %llu\n", rlim.rlim_cur);
	printf("rlim.rlim_max = %llu\n", rlim.rlim_max);
	printf("sysctl_max    = %d\n", sysctl_max);

	//////////////////

	open_to(sysctl_max);
	return 0; /* early exit: the experiments below are currently unreachable */

	//////////////////

	struct rlimit new_lim = rlim;

	new_lim.rlim_cur = OPEN_MAX;
	int err = setrlimit(RLIMIT_NOFILE, &new_lim);
	printf("set to OPEN_MAX: %d\n", err);

	open_to(OPEN_MAX);

	new_lim.rlim_cur = sysctl_max;
	err = setrlimit(RLIMIT_NOFILE, &new_lim);
	printf("set to sysctl_max: %d\n", err);

	open_to(sysctl_max);

	new_lim.rlim_cur = sysctl_max + 1;
	err = setrlimit(RLIMIT_NOFILE, &new_lim);
	printf("set to sysctl_max + 1: %d\n", err);

	open_to(sysctl_max + 1);

	new_lim.rlim_cur = rlim.rlim_max;
	err = setrlimit(RLIMIT_NOFILE, &new_lim);
	printf("set to rlim_max: %d\n", err);

	open_to(rlim.rlim_max);

	return 0;
}
