Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Posix-based windows implementation #497

Merged
merged 18 commits into from
May 8, 2023

Conversation

patricoferris
Copy link
Collaborator

@patricoferris patricoferris commented Apr 27, 2023

This is a revised version of #490 (as part of #125). As mentioned in the previous PR, for an Eio 1.0 we're looking to have functional support for windows in a rigorous manner. To that end this PR implements a Windows backend using a mostly unchanged Eio_posix backend (directly copied over) with the following modifications and points of note:

  • Filesystem operations (mkdir, open etc.) are not currently supported.
  • Subprocess support is not currently supported either.
  • The "polling" mechanism has been changed to use Unix.select I started looking into adding a Windows backend to iomux then I stumbled across poll and the use of Wepoll. I tried to simply use WSApoll but of course that completely doesn't work for anything but sockets, so to minimise the diff and just get something working I went with Unix.select which I believe should just work for files and sockets a like and is pretty well tested.
  • I ran into some issues with MDX so I couldn't just copy the tests over from Eio_main tests. I'll look into that, it seemed to be something about a dll and flexlink or something.
  • It uses this PR for Unix_cstruct unix: add Cstruct_unix.{read,write,writev,send,recv} mirage/ocaml-cstruct#302 -- if this can't be merged, we could probably vendor the stubs. We still need other functions implemented like pread and pwrite which is a bit awkward because of the differences in how the FD current offset is changed compared to Unix systems...

This is an initial draft, I want to go over the Unix.select stuff and of course do more testing. Maybe then a follow-up PR can come in to add FS operations and the other missing pieces.

@samwgoldman
Copy link

This issue is relevant here. Unix.select works well with sockets on Windows, but the emulation for non-sockets is not 100%.

Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Are some of the examples able to run on Windows (hello, or even net?).

Deleting everything marked as deprecated would shrink it a bit.

CI is failing to find Eio. Probably just needs opam pin --with-version=dev.

lib_eio_windows/children.mli Outdated Show resolved Hide resolved
lib_eio_windows/dune Show resolved Hide resolved
lib_eio_windows/eio_windows.ml Outdated Show resolved Hide resolved
lib_eio_windows/eio_windows.mli Outdated Show resolved Hide resolved
lib_eio_windows/fd.ml Outdated Show resolved Hide resolved
lib_eio_windows/fd.mli Outdated Show resolved Hide resolved
lib_eio_windows/low_level.ml Outdated Show resolved Hide resolved
let fd_map = Hashtbl.create 10 in
let t = { run_q; poll; poll_maxi = (-1); fd_map; eventfd; eventfd_r;
active_ops = 0; need_wakeup = Atomic.make false; sleep_q } in
let eventfd_ri : int = Obj.magic eventfd_r in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think file descriptors are ints on Windows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a little embarrassing ^^" Will fix soon

Copy link
Collaborator Author

@patricoferris patricoferris Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incorrect use of Obj.magic has been removed in 3ab48ea -- I'm now relying on comparing Unix.file_descr for the set of things to check to read and write.

@patricoferris
Copy link
Collaborator Author

patricoferris commented Apr 28, 2023

Thanks! Are some of the examples able to run on Windows (hello, or even net?).

Yep both work :)) And yes, I was a little too copy-paste happy and can remove all the bits that are not needed shortly.

With regards to CI, I'm trying to get it working over on a different branch to avoid noise here https://github.com/patricoferris/eio/actions/runs/4829634872/jobs/8604900630. Setup-ocaml uses an old version of opam I think so I can't use --with-version :( -- there are just lots of blockers to having OCaml 5 and windows including:

These are all probably prerequisites before an Eio_windows release?

@avsm
Copy link
Contributor

avsm commented Apr 29, 2023

Where's the dependency on ocplib-endian coming from? We removed it from cstruct a while back, and it shouldn't be necessary these days.

@patricoferris
Copy link
Collaborator Author

Where's the dependency on ocplib-endian coming from?

Crowbar I think https://ocaml.org/p/crowbar/latest

@polytypic
Copy link
Contributor

I ran into some issues with MDX so I couldn't just copy the tests over from Eio_main tests. I'll look into that, it seemed to be something about a dll and flexlink or something.

On Windows OCaml uses flexdll / flexlink for DLLs. So, on Windows, if you want to be able to load a C library dynamically, e.g. to use it from the OCaml toplevel, utop, or mdx, for example, then that library needs to be compiled (with __declspec(dllimport) annotations where needed) and linked (using flexlink) in a special way to support it.

I ran into this problem when trying to run tests with the luv backend. See here for an ugly hack with which I managed to actually be able to build luv to support dynamic loading on Windows.

@patricoferris
Copy link
Collaborator Author

Thanks @polytypic, I think that explains some of the errors I was seeing (which I've disabled I think). I'm not sure yet what's causing the new mdx errors in the CI (the stackoverflows with pretty bizarre line numbers).

@patricoferris
Copy link
Collaborator Author

Hmm are we going to be bitten here by FD_SETSIZE in select ? https://github.com/ocaml/ocaml/blob/23dab79a4e42856aa33816b9c79c3d4d79959cb9/otherlibs/unix/select_win32.c#L942

The default size FD_SETSIZE (currently 1024) is somewhat smaller than the current kernel limit to the number of open
files. However, in order to accommodate programs which might potentially use a larger number of open files with select, it is possible to increase this size within a program by providing a larger definition of FD_SETSIZE before the inclusion of <sys/types.h>

@patricoferris
Copy link
Collaborator Author

@polytypic just for extra clarity this was the kind of error I was seeing with MDX -- it looks very similar to the one you had with Luv only but not quite the same maybe.

+Cannot load required shared library dllcstruct_unix_stubs.
+Reason: C:/OCaml64/home/PatrickFerris/.opam/5.0/lib/stublibs\dllcstruct_unix_st
ubs.dll: flexdll error: cannot relocate caml_failwith RELOC_REL32, target is too
 far: fffffffd958da134  ffffffff958da134.

@polytypic
Copy link
Contributor

flexdll error: cannot relocate caml_failwith RELOC_REL32, target is too far: fffffffd958da134 ffffffff958da134.

Yes, I saw those kinds of errors also. The functions that may/need to be dynamically loaded should be marked with __declspec(dllimport).

@avsm
Copy link
Contributor

avsm commented Apr 30, 2023

We shouldn't have to manually annotate declspec(dllimport) in user stubs like cstruct these days. The full background is in ocaml/ocaml#1633, but I'm not sure what the current state of those tools are as Cygwin64 itself has been a moving target.

Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments in-line, but basically good to merge once CI is passing. Currently, it says:

Error: Library "eio_windows" in _build/default/lib_eio_windows is hidden

Probably ocaml/dune#5505 again. The solution is to use (* -*- tuareg -*- *) mode in the dune test file and disable the tests when not on Windows (see https://github.com/ocaml-multicore/eio/pull/475/files).

.github/workflows/main.yml Show resolved Hide resolved
lib_eio/unix/fork/dune Outdated Show resolved Hide resolved
lib_eio/unix/fork_action.c Outdated Show resolved Hide resolved
lib_eio_windows/eio_windows_stubs.c Outdated Show resolved Hide resolved
lib_eio_windows/fs.ml Outdated Show resolved Hide resolved
lib_eio_windows/sched.ml Outdated Show resolved Hide resolved
Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet what's causing the new mdx errors in the CI (the stackoverflows with pretty bizarre line numbers

[mdx] Fatal error: File "ctf.md", lines 33570790-33570791: Stack overflow

Does it work locally on Windows?

Given that ctf.md is using almost no features of Eio (just Eio.Private.Ctf.mint_id), this seems likely to be an MDX bug. If we can reproduce it without the #require "eio", we can report it there.

It's a good sign that the CI is able to build on Windows though, even if the tests fail. Just checking the build works is probably OK for this PR. We know many of the real tests will fail anyway.

lib_eio/unix/fork_action.c Outdated Show resolved Hide resolved
@patricoferris
Copy link
Collaborator Author

Does it work locally on Windows?

It's working locally for me (it would be nice to have a second person verify that). I agree it seems to be something going wrong with MDX and Windows, but only appearing in the CI. I'll see if I can make a demo repo with MDX failing like that to report (and maybe fix ^^") upstream -- we really need mdx to work on Windows in the long run !

@patricoferris
Copy link
Collaborator Author

patricoferris commented May 3, 2023

I threw a very simple md file into the tests that has zero dependencies and it also fails with the stackoverflow error see https://github.com/patricoferris/eio/actions/runs/4872160326/jobs/8689998313#step:5:22

EDIT: And so does macOS ^^" https://github.com/patricoferris/eio/actions/runs/4872160326/jobs/8689998021

EDIT EDIT: Also, I was probably meant to have done something with CRLF... Whilst investigating MDX issues I wrote a markdown file with this and it just hangs when you try to do things it seems.

@patricoferris
Copy link
Collaborator Author

For the CRLF bits see realworldocaml/mdx#294

@patricoferris
Copy link
Collaborator Author

patricoferris commented May 4, 2023

Yeah, so to summarise my thoughts, the MDX issues are:

So perhaps for this PR we can disable all MDX tests on Windows in the meantime and work to fix the declspec(dllimport) issues? To make up for the disable tests I can add more to the Alcotest tests on the windows backend?

@talex5
Copy link
Collaborator

talex5 commented May 4, 2023

So perhaps for this PR we can disable all MDX tests on Windows in the meantime and work to fix the declspec(dllimport) issues?

Sounds good to me!

@polytypic
Copy link
Contributor

I just got this PR to build and test on my Windows machine. I'll try to take a moment to read through the changes today.

@@ -67,13 +70,17 @@ CAMLprim value eio_unix_fork_execve(value v_unit) {
}

static void action_fchdir(int errors, value v_config) {
#ifdef _WIN32
uerror("Unsupported operation on windows", Nothing);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't allocate OCaml exceptions here (or in the other actions) as we're in the child and anything involving GC isn't going to work.

Suggested change
uerror("Unsupported operation on windows", Nothing);
eio_unix_fork_error(errors, "action_fchdir", "Unsupported operation on windows");
_exit(1);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good catch thanks! Fixed the others too in cab124d

@patricoferris
Copy link
Collaborator Author

patricoferris commented May 8, 2023

I do wonder if (when it is ready) we should merge this into a stable Windows branch instead of the main branch, as it will probably be a pain to cut a release of Eio if this is in there?

@talex5
Copy link
Collaborator

talex5 commented May 8, 2023

I do wonder if (when it is ready) we should merge this into a stable Windows branch instead of the main branch, as it will probably be a pain to cut a release of Eio if this is in there?

We already can't cut a release, because we got rid of the Windows support from libuv and didn't replace it with anything yet.

@talex5 talex5 merged commit 439e538 into ocaml-multicore:main May 8, 2023
@talex5 talex5 mentioned this pull request May 8, 2023
9 tasks
@nojb
Copy link
Contributor

nojb commented May 23, 2023

I understand that the POSIX backend for Windows is just a stopgap measure while waiting for a more performant IOCP backend, but just for future reference here's another issue with the emulation in Unix.select on Windows that we ran into some time ago: ocaml/ocaml#7935

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants