Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for UEFI #19486

Closed

Conversation

RossComputerGuy
Copy link
Contributor

@RossComputerGuy RossComputerGuy commented Mar 30, 2024

Just run: zig build -Dtarget=x86_64-uefi -Dsingle-threaded --zig-lib-dir lib

This PR introduces several improvements for UEFI, so far:

  • std.process.ArgvIterator support
  • std.process.totalSystemMemory support
  • UEFI Shell Protocol
  • Merged @truemedian's uefi-rework branch

Planned:

  • std.process.EnvMap support

@squeek502
Copy link
Collaborator

squeek502 commented Mar 30, 2024

Correct me if I'm wrong, but my assumption would be that UEFI should be using WTF-16 <-> WTF-8 conversion instead of UTF-16 <-> UTF-8 (see #19005 for context).

(links to any resources about these sorts of UEFI details would be appreciated if anyone has them handy)

Either way, the doc comments of either the InvalidUtf8 or InvalidWtf8 error in ChangeCurDirError will need to be updated accordingly.

@RossComputerGuy
Copy link
Contributor Author

@squeek502 The UEFI spec mentions UTF-16 and UTF-8 but not WTF-8 or WTF-16.

@llogick
Copy link
Contributor

llogick commented Mar 30, 2024

https://uefi.org/specs/UEFI/2.10/02_Overview.html?highlight=char16#data-types

CHAR16

2-byte Character.

Unless otherwise specified all characters and strings are stored in the UCS-2 encoding format as defined by Unicode 2.1 and ISO/IEC 10646 standards.
UCS-2 is a fixed width encoding that uses two bytes for each character; meaning, it can represent up to a total of 216 characters or slightly over 65 thousand. On the other hand, UTF-16 is a variable width encoding scheme that uses a minimum of 2 bytes and a maximum of 4 bytes for each character. This lets UTF-16 represent any character in Unicode while using minimal space for the most commonly used characters. For majority of the 65,000+ characters, UCS-2 and UTF-16 have identical [code](http://www.differencebetween.net/business/structure-systems/difference-between-universal-product-code-upc-and-stock-keeping-unit-sku/) points; so they are largely equivalent. This lets UTF-16 capable applications to correctly interpret UCS-2 codes. But the other way around would not work due to the many enhancements in UTF-16.

@RossComputerGuy
Copy link
Contributor Author

https://www.ibm.com/docs/en/i/7.4?topic=unicode-ucs-2-its-relationship-utf-16

The UCS-2 standard, an early version of Unicode, is limited to 65 535 characters. However, the data processing industry needs over 94 000 characters; the UCS-2 standard has been superseded by the Unicode UTF-16 standard.

So should we implement UCS-2 specific functions in the Unicode module or stick with the UTF-16 methods?

@squeek502
Copy link
Collaborator

squeek502 commented Mar 30, 2024

The situation with UEFI sounds like it is the same as the situation with Windows (arbitrary sequences of u16), so I think the right way to go is WTF-16 <-> WTF-8.

(WTF-16 is an informal name for potentially ill-formed UTF-16, or UTF-16 with potentially unpaired surrogates).

See the motivation section of the WTF-8 spec: https://simonsapin.github.io/wtf-8/#motivation and the description of #19005

@RossComputerGuy
Copy link
Contributor Author

The situation with UEFI sounds like it is the same as the situation with Windows

Yeah, well UEFI takes similarities from Windows so no surprise there.

so I think the right way to go is WTF-16 <-> WTF-8.

Has anyone tried WTF-8/16 encodings on UEFI? I've only seen UTF-8/16.

@squeek502
Copy link
Collaborator

squeek502 commented Mar 30, 2024

From the linked specs, UEFI uses UCS-2 which is essentially just sequences of u16. The only difference to UTF-16 is that well-formed UTF-16 disallows unpaired surrogate codepoints, meaning the code units 0xD800 through 0xDFFF. Because UCS-2 allows any u16 (including 0xD800 through 0xDFFF), this means that not all valid UCS-2 can be represented as UTF-16 (and also that not all valid UCS-2 can be represented as UTF-8).

WTF-16 is functionally equivalent to UCS-2 for our purposes, and WTF-8 is a superset of UTF-8 that allows the codepoints U+D800 to U+DFFF (surrogate codepoints) to be encoded using the normal UTF-8 encoding algorithm. Since U+D800 to U+DFFF are the only WTF-16 code units that are normally unrepresentable in UTF-8, this alone is sufficient to be able to losslessly roundtrip from WTF-8 to WTF-16.

If you'd like to test whether or not WTF-16 is needed, the way to go would be to create a file with any value from 0xD800 to 0xDFFF as one of the u16s in the filename. If the filesystem supports creating a file with such a name (which I'm assuming it does), then UTF-16 <-> UTF-8 is not going to cut it.

@RossComputerGuy
Copy link
Contributor Author

If you'd like to test this, the way to go would be to create a file with any value from 0xD800 to 0xDFFF as one of the u16s. If the filesystem supports creating a file with such a name (which I'm assuming it does), then UTF-16 <-> UTF-8 is not going to cut it.

Well, I can give it a try. I'm currently porting over @truemedian's UEFI stuff from their own fork so this PR will include file system operations.

@RossComputerGuy
Copy link
Contributor Author

I've merged @truemedian's Zig branch for the uefi-rework into this one and added a couple of improvements along the way.

@RossComputerGuy
Copy link
Contributor Author

Ok, so I got the compiler to build for UEFI
image
This PR still needs a lot of work though.

@RossComputerGuy RossComputerGuy force-pushed the feat/uefi-shell branch 3 times, most recently from 44da3f7 to 17a8713 Compare March 31, 2024 21:20
}

/// Returns the time in nanoseconds since 1900-01-01 00:00:00.
pub fn toEpochNanoseconds(self: Time) u128 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is causing an integer overflow panic.

const std = @import("std");

pub fn main() !void {
    const stat = try std.posix.fstat(std.fs.cwd().fd);

    std.debug.print("{} ", .{stat.atim});
    std.debug.print("{}\n", .{stat.atime()});

    std.debug.print("{} ", .{stat.mtim});
    std.debug.print("{}\n", .{stat.mtime()});

    std.debug.print("{} ", .{stat.ctim});
    std.debug.print("{}\n", .{stat.ctime()});
}
os.uefi.bits.Time{ .year = 2024, .month = 3, .day = 31, .hour = 0, .minute = 0, .second = 0, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1282{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } } 3920832000
os.uefi.bits.Time{ .year = 2024, .month = 3, .day = 31, .hour = 22, .minute = 28, .second = 8, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1282{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } } 3920912888
os.uefi.bits.Time{ .year = 1980, .month = 0, .day = 0, .hour = 0, .minute = 0, .second = 0, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1282{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } } panic: integer overflow 3186494919

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: this is with toEpochSeconds and not toEpochNanoseconds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suspicion would be days += self.day - 1;. Based on the doc comment for day, it should be between 1 and 31, but you're passing a struct with .day = 0, so something looks to be out of sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am not sure why it is like that. My implementation didn't do this (the one currently in upstream). This comes from @truemedian's branch which I incorporated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going further with this test program, I get this output:

os.uefi.bits.Time{ .year = 2024, .month = 3, .day = 31, .hour = 0, .minute = 0, .second = 0, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1283{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } } 3920832000
os.uefi.bits.Time{ .year = 2024, .month = 3, .day = 31, .hour = 22, .minute = 51, .second = 46, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1283{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } } 3920914306
os.uefi.bits.Time{ .year = 1980, .month = 0, .day = 0, .hour = 0, .minute = 0, .second = 0, .nanosecond = 0, .timezone = 2047, .daylight = os.uefi.bits.Time.Time__struct_1283{ ._pad1 = 0, .in_daylight = false, .adjust_daylight = false } }
Days in years: 29219

Days in months: 29219
panic: integer overflow 3186465173
Panicked during a panic. Aborting
const std = @import("std");

fn toEpochSeconds(self: std.os.uefi.bits.Time) u64 {
    var year: u16 = 1900;
    var days: u32 = 0;

    while (year < self.year) : (year += 1) {
        days += std.time.epoch.getDaysInYear(year);
    }
    std.debug.print("\nDays in years: {}\n", .{days});

    var month: u8 = 1;
    while (month < self.month) : (month += 1) {
        const leap_kind: std.time.epoch.YearLeapKind = if (std.time.epoch.isLeapYear(self.year)) .leap else .not_leap;

        days += std.time.epoch.getDaysInMonth(leap_kind, @enumFromInt(month));
    }
    std.debug.print("\nDays in months: {}\n", .{days});

    days += self.day - 1;
    std.debug.print("Days: {}\n", .{days});

    const hours = @as(u32, self.hour) * std.time.s_per_hour;
    std.debug.print("Hours: {}\n", .{hours});

    const minutes = @as(u16, self.minute) * std.time.s_per_min;
    std.debug.print("Minutes: {}\n", .{minutes});

    const days_as_sec = days * std.time.s_per_day;
    std.debug.print("Days as seconds: {}\n", .{days_as_sec});

    return days_as_sec +
        hours +
        minutes +
        self.second;
}

pub fn main() !void {
    const stat = try std.posix.fstat(std.fs.cwd().fd);

    std.debug.print("{} ", .{stat.atim});
    std.debug.print("{}\n", .{stat.atime()});

    std.debug.print("{} ", .{stat.mtim});
    std.debug.print("{}\n", .{stat.mtime()});

    std.debug.print("{} ", .{stat.ctim});
    std.debug.print("{}\n", .{toEpochSeconds(stat.ctim)});
}

So I think @ianprime0509 is right on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RossComputerGuy You don't even have to @intCast() to convert from u64 to u128:

pub fn main() void {
    const v: u64 = std.math.maxInt(u64);
    std.debug.print("{d}\n", .{ @as(u128, v) });
}

If you give a full stack trace I can help debug.

You may be able to avoid some integer casts if you refactor all your second and subsecond methods to be in terms of nanoseconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you give a full stack trace I can help debug.

Word of warning: we still cannot produce stack traces on UEFI, we're missing the information required to walk the stack backwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even int casting. @nektro has tried helping me with this problem as well on Discord. I traced the issue to the @as(u128, v) call with print calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's very strange. Is there a way to run behavior and stdlib tests on UEFI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda, however there are more important issues we need to get through first. Like zig actually being able to find itself in UEFI.

@RossComputerGuy RossComputerGuy force-pushed the feat/uefi-shell branch 2 times, most recently from a9bc276 to d877e8a Compare April 1, 2024 04:04
@RossComputerGuy RossComputerGuy force-pushed the feat/uefi-shell branch 6 times, most recently from d1723fc to 4ec3b02 Compare April 5, 2024 20:17
@RossComputerGuy RossComputerGuy force-pushed the feat/uefi-shell branch 2 times, most recently from 7b079e7 to 471889c Compare April 19, 2024 18:52
@RossComputerGuy
Copy link
Contributor Author

Some progress has been made. zig env works (libc hangs) but crashes at the end, zig targets prints but crashes which an OOB error.

@andrewrk
Copy link
Member

andrewrk commented Jun 8, 2024

draft status for > 30 days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants