Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.posix: Rewriting enums to seperate clock IDs for clock_gettime() and timerfd_create() #21347

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

chrboesch
Copy link
Contributor

@chrboesch chrboesch commented Sep 8, 2024

In the linux kernel header file (time.h) is the note, that the driver implementing the clock IDs

  • SGI_CYCLE
  • TAI
    got removed. Therefore, calling timerfd_create() with these IDs will result in an error. So I disabled them.

In the linux kernel header file (time.h) is the note,
that the driver implementing the clock IDs
- SGI_CYCLE
- TAI
got removed. Therefore, calling timerfd_create() with these IDs
will result in an error. So I disabled them.
@chrboesch chrboesch changed the title Disable no longer used clock IDs in Linux std.os.linux: Disable no longer used clock IDs Sep 8, 2024
Copy link
Member

@alexrp alexrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me.

However, I do not believe this fixes #20334 because there are still more clocks defined here than timerfd_create() will accept. And it seems like there needs to be some discussion on that issue as to how that should be addressed.

@chrboesch
Copy link
Contributor Author

I have tried every other clock ID on my linux computer and it works fine and as expected. And since the clock IDs are defined as enums, contrary to the assumption of the thread creator, you can only select the ones entered. So it will definitely fix the issue.

@alexrp
Copy link
Member

alexrp commented Sep 10, 2024

So just to be clear, even clocks like PROCESS_CPUTIME_ID and THREAD_CPUTIME_ID work with timerfd_create()?

@chrboesch
Copy link
Contributor Author

The fact that you ask means it doesn't work on your computer. Let me do some testing on different computers.

@alexrp
Copy link
Member

alexrp commented Sep 10, 2024

No no, I haven't actually tried. I'm only asking because those specifically are somewhat special clocks, so I just wouldn't take it for granted that they work with timerfd_create(). I'm happy to take your word for it if you've tested them and they worked.

@chrboesch
Copy link
Contributor Author

chrboesch commented Sep 10, 2024

First of all, thanks for your skepticism. It was my bad that I did some quick tests in C on my Gentoo with self-compiled kernel. And that I mixed timerfd_create with getclock_time because there is no separation in the header file time.h. Interestingly, the timers even work in Zig even though they shouldn't. But only on my self-compiled kernel. And sometimes only with root rights.

However, the man page timerfd_create(2) says there are actually only two valid timers: CLOCK_REALTIME and CLOCK_MONOTONIC. So, I think we should limit the enum of these clock IDs, because it doesn't make sense if a timer sometimes works and sometimes doesn't. And it's hard to explain that you can achieve this with some special kernel settings. 😉

What surprises me a little, however, is that the equivalent POSIX command is 'timer_create'. And it also knows CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, but is triggered by different Linux commands. In other words, the over all change should not only be limited to restricting the enum, but also to correctly mapping the POSIX command.

TL;DR
My suggestion: Firstly, we limit the enum to the actually possible clock IDs (that fixes the issue) and secondly, we should correct the POSIX command later. What do you think?

Addendum: Since clock_id is also used in Zig for clock_gettime, this is a bit more complicated. We therefore need to have a second enum just for timercreate_fd. That's probably what the issue creator meant?

Addendum 2: To add to the confusion, Zig uses clockid and clockid_t equivalently, which is incorrect, as Linux distinguishes between them. This makes it difficult to simply add an enum clockid.

Addendum 3: It's not Zig's fault, it's Linux's fault. There, the term 'clockid' is used in the same way in different contexts, which contributes to this confusion. These are the things that contribute to instability and errors in operation because developers mistakenly confuse the two. I'll think about how we can make this better for Zig.

@alexrp
Copy link
Member

alexrp commented Sep 10, 2024

First of all, thanks for your skepticism. It was my bad that I did some quick tests in C on my Gentoo with self-compiled kernel. And that I mixed timerfd_create with getclock_time because there is no separation in the header file time.h. Interestingly, the timers even work in Zig even though they shouldn't. But only on my self-compiled kernel. And sometimes only with root rights.

It wouldn't surprise me if some distros disable those clocks I mentioned for non-superusers out of security concerns or something like that. But I'm just guessing here.

However, the man page timerfd_create(2) says there are actually only two valid timers: CLOCK_REALTIME and CLOCK_MONOTONIC. So, I think we should limit the enum of these clock IDs, because it doesn't make sense if a timer sometimes works and sometimes doesn't. And it's hard to explain that you can achieve this with some special kernel settings. 😉

Generally speaking, the Linux kernel code is the source of truth; the man pages are sometimes incomplete or wrong. I think we should base our decision on what the kernel code is written to allow. If it is the case (and I don't know if it is) that the kernel can return EINVAL for some clock IDs depending on factors outside of the programmer's control, then EINVAL should be considered an expected error and be converted to a Zig error accordingly.

I think all the Linux system calls that take clock IDs should be checked to see what's allowed before we decide what to do here (other than removing SGI_CYCLE and TAI, which should be uncontroversial).

@chrboesch
Copy link
Contributor Author

Ok, I removed the "fixes #20334", because it doesn't. So the PR is valid. And then we should discuss in the issue what we should do to solve the problem in general.

@chrboesch chrboesch changed the title std.os.linux: Disable no longer used clock IDs std.posix: Rewriting enums to seperate clock IDs for clock_gettime() and timerfd_create() Sep 11, 2024
Copy link
Contributor

@rootbeer rootbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the cleanup of clock_gettime(). I've got a couple drive-by comments, but feel free to ignore them if they don't make sense to you.

// * The driver implementing this got removed. The clock ID is kept as a
// * place holder. Do not reuse!
// Therefore, calling clock_gettime() with these IDs will result in an error.
// SGI_CYCLE = 10,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically these could be useful if someone wants to compile a Zig program to run against an old Linux kernel that supports these old clocks? (I suspect that's really unlikely in this case though?) Anyway, it seems a bit odd for Zig to only track what is currently supported, so maybe its just sufficient to add a comment that these clocks are not supported on modern systems?

    SGI_CYCLE = 10, // Linux dropped support for this clock in the v4.13 timeframe
    TAI = 11, // Linux dropped support for this clock in the v4.13 timeframe

I may be missing some context on these clocks, so feel free to ignore this if its not compelling to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TAI_CLOCK was designed as CLOCK_REALTIME(UTC) + tai_offset, but tai_offset was always 0. So there is no point in using this clock, and no one did.
SGI_CYCLE was for Silicon Graphics (SGI) workstations, which are probably no longer in use, so it makes sense to disable these two.


pub const CLOCK_ID = clock_id;

pub const clock_id = enum(u32) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a short comment here and on clockid_t differentiating them? (I think just labelling which API/syscall they're aimed at would be sufficient? clock_gettime vs timerfd_create?)

Copy link
Contributor Author

@chrboesch chrboesch Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, makes sense.

@@ -1121,7 +1121,7 @@ test "access smoke test" {
test "timerfd" {
if (native_os != .linux) return error.SkipZigTest;

const tfd = try posix.timerfd_create(.MONOTONIC, .{ .CLOEXEC = true });
const tfd = try posix.timerfd_create(posix.clock_id.MONOTONIC, .{ .CLOEXEC = true });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The posix.clock_id. prefix shouldn't be necessary here, Zig should pick the right .MONOTONIC from the type signature of timerfd_create.

Copy link
Contributor Author

@chrboesch chrboesch Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@chrboesch
Copy link
Contributor Author

Now this PR fixes #20334.

@@ -2054,10 +2054,10 @@ pub fn eventfd(count: u32, flags: u32) usize {
return syscall2(.eventfd2, count, flags);
}

pub fn timerfd_create(clockid: clockid_t, flags: TFD) usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the PR in a few days so I might be missing some context here - why are we changing this to an integer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based my work on the kernel sources, there it is an integer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean it quite that literally when I said the kernel code should be the basis for our decision here. 🙂 We do use enums in syscall wrappers when the set of acceptable values is known ahead of time - which I think is the case here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was unsure but can change that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants