Make GVFS available for Linux and macOS #4
I agree with this. I would love to give this a spin, but I'm not willing to switch to Windows for it.
The announcement (and the GitHub README) emphasize "GVFS relies on a protocol extension that any service can implement" (https://github.com/Microsoft/gvfs/blob/master/Protocol.md). Any client tooling can support this as long as the protocol extension is added.
@nasserd That protocol extension is just for lazy fetching of objects, which is the minority of the work in this repo. The VFS allows file change tracking to prevent working tree scans. Both are valuable, but the protocol extension alone doesn't provide much.
+1 for non-Windows support
Yes, we definitely want to support Mac and Linux, and we are looking for people with file systems expertise on those platforms.
I guess it doesn't work like a regular file system: you need additional permissions to load a file system driver (like loading a kernel module on Linux) or to install FUSE. So please just make git itself support the new GVFS protocol.
There are two kinds of git servers: over HTTP/HTTPS and over SSH. It seems it will take more and more work if GVFS has to be implemented for both.
Dokan can serve as a conceptual basis for how a VFS can be bridged between Windows and FUSE-capable systems. Also, with WSL evolving, it could conceivably be taken the opposite way, with raw support for mounting FUSE-based filesystems directly within Windows. I for one would love to see that day.
GVFS.FltWrapper could be abstracted so that people could substitute their own implementation for their OSes, or build a portable tool to de-virtualize files without dealing with a custom FS.
We actually built our first internal version of GVFS on something that looks a lot like FUSE. The challenge is that performance is absolutely critical on the file system on which you're running your builds, and context switching from kernel to user mode for every bit of IO can never be as fast as just doing it down in the kernel. What GvFlt does for us is that we only do the kernel-to-user-mode context switch the first time the file is opened, and after that it just becomes a normal NTFS file. I'll be very curious to see how we can replicate that on Mac and Linux, but I don't yet know enough about those systems. If anyone has ideas, I'd be very happy to discuss them!
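A minimal user-space sketch of the "switch to user mode only on first open" model, in Python. This is an assumption-laden toy, not GvFlt: the hypothetical `HydratingTree` class plays the role of the driver, the `fetch` callback plays the role of the user-mode GVFS process, and "hydrating" simply means writing an ordinary local file.

```python
import os
import tempfile
from typing import Callable, Dict


class HydratingTree:
    """Hypothetical model of hydrate-on-first-open (not the GvFlt API).

    A path starts out as a placeholder. The first read calls the
    user-mode `fetch` provider and writes the content as an ordinary
    local file; later reads never call the provider again.
    """

    def __init__(self, root: str, fetch: Callable[[str], bytes]) -> None:
        self.root = root
        self.fetch = fetch
        self.provider_calls: Dict[str, int] = {}  # path -> provider calls

    def read(self, path: str) -> bytes:
        local = os.path.join(self.root, path)
        if not os.path.exists(local):
            # First access: the only round trip to the user-mode provider.
            data = self.fetch(path)
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as f:
                f.write(data)
            self.provider_calls[path] = self.provider_calls.get(path, 0) + 1
        # From here on this is a plain local-file read.
        with open(local, "rb") as f:
            return f.read()


tree = HydratingTree(tempfile.mkdtemp(), lambda p: f"contents of {p}".encode())
tree.read("src/main.c")
tree.read("src/main.c")
print(tree.provider_calls)  # {'src/main.c': 1}
```

The second read never reaches the provider, which is the property that lets every later access run at local-disk speed.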
How about using git-annex, which is already packaged for most GNU/Linux distributions?
I took a look at porting the code to Mac with Mono. I ran into a number of NuGet packages that I could not restore on the Mac; where can I get these from?
@joudinet annex, LFS, and similar solutions help (though only partially in our case) with the size problem, but they do nothing about the fact that it takes a long time for git to operate on a large number of files. When you have 3.5 million files, a basic "git status" takes 8 minutes, because it has to enumerate every single file, at the very least compare its timestamp with the index, and in the worst case open the file and calculate its hash. By virtualizing, we are tackling both of those problems.
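To illustrate why `git status` scales with the number of files, here is a simplified Python model of that per-file check. The `IndexEntry` fields and the `modified_files` function are hypothetical simplifications (the real index stores more stat fields and git handles many more cases); the blob hashing follows git's `blob <size>\0` object format.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


def blob_id(data: bytes) -> str:
    # Git's object ID for a blob: SHA-1 over b"blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


@dataclass
class IndexEntry:
    mtime_ns: int  # cached stat data recorded when the file was staged
    size: int
    oid: str       # blob ID of the staged content


def modified_files(root: str, index: Dict[str, IndexEntry]) -> List[str]:
    """Every tracked path is stat()ed, so cost grows with the file count;
    only when the stat data differs is the file read and hashed."""
    changed = []
    for path, entry in sorted(index.items()):
        full = os.path.join(root, path)
        try:
            st = os.stat(full)
        except FileNotFoundError:
            changed.append(path)  # deleted from the working tree
            continue
        if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
            continue  # cheap case: timestamp and size match the index
        with open(full, "rb") as f:  # expensive case: read and hash
            if blob_id(f.read()) != entry.oid:
                changed.append(path)
    return changed


repo = tempfile.mkdtemp()
with open(os.path.join(repo, "a.txt"), "wb") as f:
    f.write(b"one\n")
st = os.stat(os.path.join(repo, "a.txt"))
index = {"a.txt": IndexEntry(st.st_mtime_ns, st.st_size, blob_id(b"one\n"))}
print(modified_files(repo, index))  # []
with open(os.path.join(repo, "a.txt"), "wb") as f:
    f.write(b"two two\n")
print(modified_files(repo, index))  # ['a.txt']
```

Even when nothing changed, the loop still performs one `stat` per tracked file; virtualization avoids that by knowing which files were ever touched.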
@migueldeicaza All of those packages are available on nuget.org. However, I don't think you can do a straight port of this to Mac, because one of the key components is the GvFlt filter driver, which only works on Windows. That's the key piece that we need to figure out how to develop on Mac in an efficient way.
Once I get this building, I will try the next step. What I would do is plug the code from GVFS into FUSE on macOS, and that is the area that will need some code changes.
@migueldeicaza if you get this working on Mac w/ Mono, is there any chance it will work on Linux w/ Mono?
Some other challenges to keep in mind (Mikayla is helping me get my packages sorted out). The
@DwordPtr if we get a Unix port, it should work on Linux with little effort.
For those following at home: start by updating Xamarin Studio to the beta channel (you will need the upgraded Mono). Then you will need a newer NuGet on your system:
With that, you can get your chainsaw and start cutting.
@migueldeicaza That sounds great, let me know if you have any questions about the code as you're going through it!
The filter driver of Windows => the
fanotify: fscking all notification and file access system
I wonder how much looking at gitfs, aka SlothFS, could help in getting started with the port: "SlothFS is a FUSE filesystem that provides light-weight, lazily downloaded, read-only checkouts of manifest-based Git projects. It is intended for use with Android".
I'm heavily over-simplifying here, but there have been two /really hard/ problems to solve in the file system portions of GVFS: 1) enabling writes and 2) making the file system fast, ideally as fast as a local disk on the second access. A read-only FUSE filesystem doesn't solve either of those very well :-). We actually had GVFS up and running almost a year and a half ago using a FUSE-like solution, and it worked great if you just wanted to read files and have them download on demand. Everything we've done since then is to allow you to also do things like run a build on top of GVFS, modify any file you want, and have "git status" and "git checkout" do the right thing, but fast. I'd love to be proven wrong on this, but as we go to port this to Mac and Linux, I'd be very surprised if FUSE alone ends up being the correct solution. We started with a similar solution for our Windows implementation, and what we learned is that there's just no way to make it fast enough if every single bit of IO has to transition from kernel mode to user mode. The way we've solved this on Windows is with the GvFlt driver, which only has to transition to user-mode GVFS on the first file access, and after that lays the file down on disk as a normal NTFS file, so your second access is as fast as normal. I still haven't gone very deep on this, but I'm currently thinking that the solution for Mac/Linux will potentially be some combination of a read-only FUSE filesystem, OverlayFS, and some sort of on-disk caching. But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too. And I've completely skipped all the challenges of ensuring that git operations do the right things, even after you've made the file system writable and fast. That raises a whole set of other challenges.
Thanks for the writeup. I wonder if an extension to FUSE would be suited here. That way you could keep the userspace implementation. It would provide something similar to what you described: on first access it could return a file handle to a file on a different filesystem, and from then on the kernel would proxy all read/write operations to that handle directly. However, it seems that this idea has been proposed before without much buy-in from the FUSE devs:
Although there was some interest each time, the responses often cited that the performance was "good enough" or that there weren't clear wins. So maybe with such a clear use case and some good benchmarks this could be supported.
@sanoursa wrote:
I am the author of WinFsp, which is a FUSE solution for Windows. What I have found is that a user mode file system that enables caching on Windows (i.e. uses the NTOS Cache Manager) can be almost as fast as NTFS. This is because the cache manager satisfies a lot of the I/O and the context switches are minimized. [The reason that NTFS is fast is the cache manager, not that disk accesses are fast; besides, context switches are faster than disk accesses.] I link to some performance tests that show that a user mode file system can be very fast: https://github.com/billziss-gh/winfsp/wiki/WinFsp-Performance-Testing. NTFS has a slight edge on cached reads/writes in these tests, but this is because WinFsp does not implement FastIO (yet). If I were doing this (and I am tempted), I would actually start with a cross-platform FUSE implementation, so the whole git world can benefit. I would then port to Windows using the WinFsp-FUSE layer (or its native API for maximum Windows compatibility).
+1
That would probably be a bad design decision. I exceed inotify limits (upwards of 1M) on a daily basis due to software like JetBrains *, Bazel, ...
Due to the above, this seems the most reliable way.
Reading into OverlayFS, it seems to be a good choice.
I think the problem we will face on Linux is that git is already really fast there, and any solution that doesn't "feel" as swift will probably not be accepted.
Do we want to move the discussion somewhere with proper threaded replies, like Reddit, Slack, ...?
I think discussion makes more sense here.
On 22/08/17 10:14, Andreas Bergmeier wrote:
1. OverlayFS seems to also return a file handle directly from one of
the underlying FS. While this is great and all, it seems to force
us to copy large files from the lower to the upper FS as soon as
we modify the file in any way (even attributes). This /may/ be
ReallySlow™️ and "waste" space.
In practice it turns out that files in source repositories are rarely
modified in place; they are almost always entirely rewritten. I don't think
this should be considered a blocker.
The way git works, git objects are not diffs but full copies of the modified files, so in any case the full file would need to be transferred.
On 22/08/17 10:31, Jesús Leganés-Combarro wrote:
  The way git works, git objects are not diffs but instead full copies
  of the modified files, so in any case full file would need to be
  transferred.
I should have been more clear. I wasn't talking about the files in .git,
but rather the working tree that the user is editing.
IIUC the tools modifying .git wouldn't have to go through the caching
overlay filesystem in most cases. Although I guess that that depends on
the implementation.
@gitster Do you have any opinion about these questions?
For those of you who are interested, you may also want to watch Eden. It is a low-level FUSE file system that (I believe) supports sparse checkout for Mercurial and git. It currently does not build, but my understanding is that the Facebook folks have great plans for it.
@billziss-gh Is there any further information about Eden? As mentioned earlier, a pure FUSE FS probably will not scale.
My understanding is that they plan to open things up more as time passes. You may want to reach out to @wez for further information.
As I mentioned earlier in the thread, context switches are slow, but not as slow as disk accesses. A user mode file system that uses the OS file cache (cache manager, page cache, etc.) can be fast. This is not idle speculation; it is something that I have done successfully on Windows.
Just wanting to point out that Jonathan Tan from Google is working on adding a native sparse checkout feature to git, which might render GVFS and others obsolete: https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/T/#u
@est31 that's the hope :-). We're trying to push as much of this functionality into core git as we can. GVFS currently provides two main features: 1) partial clone + on-demand object download (which could be done in a platform-agnostic way in core git) and 2) working directory virtualization and dynamic expansion of sparse-checkout, which requires a file system driver and must be platform-specific.
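Feature 1, partial clone with on-demand object download, is the part that can be platform-agnostic. In this minimal Python sketch the hypothetical `LazyObjectStore` starts empty, fetches an object only on its first lookup, and verifies the content against git's blob ID before caching it. The `fetch` callback is an assumption standing in for whatever server protocol is actually used.

```python
import hashlib
from typing import Callable, Dict


def blob_id(data: bytes) -> str:
    # Git blob ID: SHA-1 over b"blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class LazyObjectStore:
    """Sketch of partial clone with on-demand object download.

    `fetch` is a hypothetical stand-in for a network request to a server
    that can return single objects by ID; it is not a real git or GVFS API.
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch
        self.local: Dict[str, bytes] = {}  # objects already downloaded
        self.downloads = 0

    def get(self, oid: str) -> bytes:
        data = self.local.get(oid)
        if data is None:
            data = self.fetch(oid)  # only missing objects hit the network
            self.downloads += 1
            if blob_id(data) != oid:  # never trust content without hashing
                raise ValueError(f"content does not match object {oid}")
            self.local[oid] = data
        return data


source = b"int main(void) { return 0; }\n"
server = {blob_id(source): source}
store = LazyObjectStore(server.__getitem__)
oid = blob_id(source)
store.get(oid)
store.get(oid)
print(store.downloads)  # 1
```

Because objects are content-addressed, the client can safely accept them from any server or cache, which is why this half does not need a file system driver.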
@abergmeier-dsfishlabs: we don't have any public news or timeline for Eden today. What I said in facebook/sapling#4 (comment) is still broadly true. The grand vision is that it will be cross-platform (Linux, macOS, Windows), but it will take a bit of time to get there for all systems. We're prioritizing Mercurial support, as that is what we're using for our largest repositories.
Great summary. For Linux I would think:
Working directory virtualization: should be fairly "easy" to implement, whether in FUSE or in the kernel.
Dynamic expansion: this can probably be split into
This is probably best done by a daemon process that listens on a directory (on the kernel FS). This directory would then contain a list of files that trigger a fetch of promised objects. So perhaps have working directory virtualization in the kernel and dynamic expansion in userspace?
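The request-directory daemon idea could look roughly like this single polling pass, sketched in Python. Everything here is hypothetical: the directory layout (one empty request file per promised object ID), the `fetch` callback, and the `process_requests` name. A real daemon would more likely block on inotify or fanotify events than rescan the directory.

```python
import os
import tempfile
from typing import Callable


def process_requests(request_dir: str, object_dir: str,
                     fetch: Callable[[str], bytes]) -> int:
    """One pass of a hypothetical expansion daemon.

    Each file in `request_dir` is named after a promised object ID. The
    object is fetched, written under `object_dir` via a temp file plus
    rename, and the request file is removed. Returns requests handled.
    """
    handled = 0
    for name in sorted(os.listdir(request_dir)):
        dest = os.path.join(object_dir, name)
        if not os.path.exists(dest):
            tmp = dest + ".tmp"
            with open(tmp, "wb") as f:
                f.write(fetch(name))
            os.replace(tmp, dest)  # atomic rename (POSIX, same file system)
        os.remove(os.path.join(request_dir, name))
        handled += 1
    return handled


requests, objects = tempfile.mkdtemp(), tempfile.mkdtemp()
for oid in ("1111", "2222"):
    open(os.path.join(requests, oid), "w").close()
print(process_requests(requests, objects, lambda oid: oid.encode()))  # 2
```

Writing to a temp file and renaming means a reader in the kernel-side tree should never observe a half-downloaded object.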
What about integrating the working directory virtualization part directly into the kernel, since this is not the only project that could profit from such features:
@darkdragon-001 @abergmeier @abergmeier-dsfishlabs has any more work been done to support Linux and Mac?
Please see https://blogs.msdn.microsoft.com/devops/2018/03/15/gvfs-for-mac/ for the latest
It looks like this is on GitHub's radar, folks.
MS developed it for their use case, which was managing the Windows source on Windows machines. It's like asking why so many open-source Linux projects don't have Windows versions.
I would definitely suggest taking a look at WinFsp. Also, maybe the easiest way to do this would be to use libfuse on Linux systems.
Tell me one, and if so, tell me why it would be difficult to port to Windows...
I can’t think of any open-source project that would need this, whether on Linux or elsewhere. Maybe Microsoft should simply refactor their Windows code and get rid of all the legacy cruft? Then standard Git would work just fine.
If you read the reasons why they developed GVFS, you'll see they have a single huge repo with all the source code of all their projects since MS-DOS 1.0, so their commit history and commit diffs are humongous. You would think this is unreasonable for normal use cases, but I'm eagerly waiting for their changes to get into mainstream git and for GVFS to come to Linux, so I can use it as a versioned distributed filesystem.
@piranna Because MS needed to scratch their own itch first, which is their huge code base, so there was no reason for them to port to Linux immediately. @ido git submodules are a pain to manage. Sometimes a giant monorepo is easiest: you don't have to continually manage multiple artifacts, an artifact repo, and version-tracking files. Never mind that support for that is seriously lacking in the C/C++ world. Annex doesn't solve the problem, and other projects only tackle parts of it, like large file support.
Closing this issue because it is being replaced by many finer-grained issues covering the actual porting work. GVFS for Mac is now under active development and we have a bunch of issues covering that work. GVFS for Linux will follow soon as well, so we'll open Linux-focused issues at that time.
Is there any intent to port GVFS over to Linux or macOS?