Make GVFS available for Linux and macOS #4

Closed
martinseener opened this issue Feb 3, 2017 · 55 comments

Comments

@martinseener

Is there any intent to port GVFS over to Linux or macOS?

@DwordPtr

DwordPtr commented Feb 3, 2017

I agree with this. I would love to give this a spin, but I'm not willing to switch to Windows for it.

@nasserd

nasserd commented Feb 3, 2017

The announcement (and GitHub readme) emphasizes that "GVFS relies on a protocol extension that any service can implement" (https://github.com/Microsoft/gvfs/blob/master/Protocol.md). Any client tooling can support this as long as the protocol extension is added.

@kevincox

kevincox commented Feb 3, 2017

@nasserd That protocol extension is just for lazy fetching of objects. That is the minority of the work in this repo. The VFS allows file change tracking to prevent working tree scans. Both are valuable, but the protocol extension alone doesn't provide much.

@ktf

ktf commented Feb 3, 2017

+1 for non Windows support

@sanoursa
Contributor

sanoursa commented Feb 3, 2017

Yes we definitely want to support Mac and Linux, and we are looking for people with file systems expertise on those platforms.

@ghost

ghost commented Feb 4, 2017

I guess it would not work like a regular file system: you need additional permissions to load a file system driver (like loading a kernel module on Linux) or to install FUSE. So please just make git itself support the new GVFS protocol.

@ghost

ghost commented Feb 4, 2017

There are two kinds of git servers: over HTTP/HTTPS and over SSH. It seems it will take more and more work if both need to implement GVFS.

@Dessix
Member

Dessix commented Feb 6, 2017

Dokan can serve as a conceptual basis for how a VFS can be bridged between Windows and FUSE-capable systems. Also, with WSL evolving, it could conceivably be taken the opposite way, with raw support for mounting FUSE-based filesystems directly within Windows. I for one would love to see that day.

@max630

max630 commented Feb 6, 2017

GVFS.FltWrapper could be abstracted so that people could substitute it with their own implementation for their OSes, or make a portable tool to de-virtualize files without dealing with a custom FS.

@sanoursa
Contributor

sanoursa commented Feb 6, 2017

We actually built our first internal version of GVFS on something that looks a lot like FUSE. The challenge is that performance is absolutely critical on the file system on which you're running your builds, and context switching from kernel to user mode for every bit of IO can never be as fast as just doing it down in the kernel. What GvFlt does for us is that we only do the kernel to user mode context switch the first time the file is opened, and after that it just becomes a normal NTFS file. I'll be very curious to see how we can replicate that on Mac and Linux, but I don't yet know enough about those systems. If anyone has ideas, I'd be very happy to discuss them!
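
The behavior described above (pay the round trip to user mode only on the first open, then serve a normal on-disk file) can be sketched roughly as follows. This is only a Python illustration, not GvFlt's actual mechanism; `HydratingStore`, `fetch`, and the call counter are hypothetical names.

```python
import os
import tempfile


class HydratingStore:
    """Toy model of first-access hydration (hypothetical, not GvFlt's API)."""

    def __init__(self, root, fetch):
        self.root = root          # directory where hydrated files live
        self.fetch = fetch        # slow callback standing in for the user-mode provider
        self.fetch_calls = 0      # how many times the slow path was taken

    def read(self, name):
        path = os.path.join(self.root, name)
        if not os.path.exists(path):
            # First access: ask the provider once and lay the file down on disk.
            self.fetch_calls += 1
            data = self.fetch(name)
            with open(path, "wb") as f:
                f.write(data)
        # Every access, including the first, now reads a normal local file.
        with open(path, "rb") as f:
            return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        store = HydratingStore(root, fetch=lambda name: b"contents of " + name.encode())
        first = store.read("a.txt")
        second = store.read("a.txt")
        return first, second, store.fetch_calls
```

In `demo()`, the second `read` never reaches `fetch`, which is the property the comment is after: only the first access is expensive.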

@joudinet

joudinet commented Feb 7, 2017

How about using git-annex, which is already packaged for most GNU/Linux distributions?
https://git-annex.branchable.com/

@migueldeicaza

I took a look at porting the code to Mac with Mono.

I ran into a number of NuGet packages that I could not restore on the Mac. Where can I get these from?

bash$ nuget restore
Unable to find version '1.9.4' of package 'ManagedEsent'.
Unable to find version '1.9.4' of package 'Microsoft.Database.Collections.Generic'.
Unable to find version '1.9.4' of package 'Microsoft.Database.Isam'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventRegister'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventSource'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventSource.Redist'.
Unable to find version '1.0.0' of package 'StyleCop.Error.MSBuild'.
Unable to find version '4.7.54.0' of package 'StyleCop.MSBuild'.
Unable to find version '0.17131.2-preview' of package 'Microsoft.GVFS.GVFlt'.
Unable to find version '2.0.275-beta' of package 'CommandLineParser'.
Unable to find version '3.5.0' of package 'NUnitLite'.

@sanoursa
Contributor

sanoursa commented Feb 7, 2017

@joudinet annex, LFS, and similar solutions help (though only partially in our case) with the size problem, but they do nothing to help with the issue that it takes a long time for git to operate on a large number of files. When you have 3.5 million files, a basic "git status" takes 8 minutes, and that's because it has to enumerate every single file, at the very least compare its timestamp with the index, and worst case open the file and calculate its hash. By virtualizing, we are tackling both of those problems.
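
A rough sketch of the scan described above: visit every tracked file, compare its size and timestamp against the index, and hash the content only when that cheap check fails. The index layout used here (a dict mapping path to `(size, mtime_ns, sha1)`) is a simplification assumed for illustration; it is not git's real index format.

```python
import hashlib
import os
import tempfile


def blob_sha1(data):
    # Git hashes a blob as sha1(b"blob <size>\0" + content).
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def build_index(root, paths):
    index = {}
    for rel in paths:
        full = os.path.join(root, rel)
        st = os.stat(full)
        with open(full, "rb") as f:
            index[rel] = (st.st_size, st.st_mtime_ns, blob_sha1(f.read()))
    return index


def scan_status(root, index):
    """Return (modified_paths, number_of_files_hashed)."""
    modified, hashed = [], 0
    for rel, (size, mtime_ns, sha) in index.items():   # every tracked file is visited
        full = os.path.join(root, rel)
        st = os.stat(full)
        if (st.st_size, st.st_mtime_ns) == (size, mtime_ns):
            continue                                   # cheap path: stat data matches
        hashed += 1                                    # expensive path: read and hash
        with open(full, "rb") as f:
            if blob_sha1(f.read()) != sha:
                modified.append(rel)
    return modified, hashed


def demo():
    with tempfile.TemporaryDirectory() as root:
        names = ["a.txt", "b.txt", "c.txt"]
        for name in names:
            with open(os.path.join(root, name), "wb") as f:
                f.write(b"original\n")
        index = build_index(root, names)
        with open(os.path.join(root, "b.txt"), "wb") as f:
            f.write(b"changed content\n")   # different size, so the stat check fails
        return scan_status(root, index)
```

Even in the best case the loop still issues one `stat` per tracked file, which is why the cost grows with the number of files rather than with the number of changes.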

@sanoursa
Contributor

sanoursa commented Feb 7, 2017

@migueldeicaza All of those packages are available on nuget.org.

However, I don't think you can do just a straight port of this to Mac, because one of the key components is the GvFlt filter driver that only works on Windows. That's the key piece that we need to figure out how to develop on Mac in an efficient way.

@migueldeicaza

Once I get this building, I will try the next step.

What I would do is plug the code from GVFS into FUSE on macOS, and that is the area that will need some code changes.

@DwordPtr

DwordPtr commented Feb 7, 2017

@migueldeicaza if you get this working on Mac w/ Mono, is there any chance it will work on Linux w/ Mono?

@migueldeicaza

Some other challenges to keep in mind (Mikayla is helping me get my packages sorted out). The Microsoft.Database.Isam and some of the other Isam libraries contain P/Invokes into native code that will not work on Unix.

@migueldeicaza

@DwordPtr if we get a Unix port, it should work with little effort on Linux.

@migueldeicaza

For those following at home:

Start by updating Xamarin Studio to the beta channel (you will need the upgraded Mono).

Then you will need a newer Nuget on your system:

$ curl https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -o nuget.exe; mono nuget.exe restore

With that, you can get your chainsaw and start cutting.

@sanoursa
Contributor

sanoursa commented Feb 7, 2017

@migueldeicaza That sounds great, let me know if you have any questions about the code as you're going through it!

@YueLinHo

The filter driver of Windows => the fanotify/inotify on Linux? (On-Access Scanning)

@YueLinHo

fanotify: fscking all notification and file access system
The event definitions:

Definition          Meaning
FAN_ACCESS          File was accessed
FAN_MODIFY          File was modified
FAN_CLOSE_WRITE     Writable file closed
FAN_CLOSE_NOWRITE   Non-writable file closed
FAN_OPEN            File was opened
FAN_OPEN_PERM       File open in perm check
FAN_ACCESS_PERM     File accessed in perm check

@sschuberth

I wonder how much looking at gitfs aka SlothFS could be of help for getting started with the port: "SlothFS is a FUSE filesystem that provides light-weight, lazily downloaded, read-only checkouts of manifest-based Git projects. It is intended for use with Android".

@sanoursa
Contributor

sanoursa commented Mar 9, 2017

I'm heavily over-simplifying here, but there have been two /really hard/ problems to solve in the file system portions of GVFS. 1) Enabling writes and 2) Making the file system fast, ideally as fast as a local disk for the second access. A read-only FUSE filesystem doesn't solve either of those very well :-). We actually had GVFS up and running almost a year and a half ago using a FUSE-like solution, and it worked great if you just wanted to read files and have them download on demand. Everything we've done since then is to allow you to also do things like run a build on top of GVFS, modify any file you want, have "git status" and "git checkout" do the right thing but do it fast, etc.

I'd love to be proven wrong on this, but as we go to port this to Mac and Linux, I'd be very surprised if FUSE alone ends up being the correct solution. We had started with a similar solution for our Windows implementation, and what we learned is that there's just no way to make it fast enough if every single bit of IO has to transition from kernel mode to user mode. The way we've solved this on Windows is using the GvFlt driver, which only has to transition to user mode GVFS for the first file access, and after that lays it down on disk as a normal NTFS file, which enables your second access to be as fast as normal.

I still haven't gone very deep on this, but I'm currently thinking that the solution for Mac/Linux will potentially be some combination of a FUSE read-only filesystem, combined with OverlayFS, combined with some sort of on-disk caching. But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.

And I've also completely skipped talking about all the challenges of ensuring that git operations do the right things, even after you've made the file system writable and fast. That raises a whole set of other challenges.

@kevincox

kevincox commented Mar 9, 2017

Thanks for the writeup. I wonder if an extension to FUSE would be suited here. This way you can maintain the userspace implementation. It would provide something similar to what you described: on first access it could return a file handle to a file on a different filesystem, and from then on the kernel would proxy all read/write operations to that handle directly.

However it seems that this idea has been proposed before without too much buy in from the FUSE devs:

There was some interest each time, but the responses often cited that the performance was "good enough" or that there weren't clear wins. So maybe with such a clear use case and some good benchmarks this could be supported.

@billziss-gh

billziss-gh commented Mar 11, 2017

@sanoursa wrote:

I'd love to be proven wrong on this, but as we go to port this to Mac and Linux, I'd be very surprised if FUSE alone ends up being the correct solution. We had started with a similar solution for our Windows implementation, and what we learned is that there's just no way to make it fast enough if every single bit of IO has to transition from kernel mode to user mode. The way we've solved this on Windows is using the GvFlt driver, which only has to transition to user mode GVFS for the first file access, and after that lays it down on disk as a normal NTFS file, which enables your second access to be as fast as normal.

I am the author of WinFsp which is a FUSE solution for Windows. What I have found is that a user mode file system that enables caching on Windows (i.e. uses the NTOS Cache Manager), can be almost as fast as NTFS. This is because the cache manager satisfies a lot of the I/O and the context switches are minimized. [The reason that NTFS is fast is because of the cache manager and not because disk accesses are fast; besides context switches are faster than disk accesses.]

I link to some performance tests that show that a user mode file system can be very fast: https://github.com/billziss-gh/winfsp/wiki/WinFsp-Performance-Testing. NTFS has a slight edge on cached reads/writes in these tests, but this is because WinFsp does not implement FastIO (yet).

If I was doing this (and I am tempted) I would actually start with a cross-platform FUSE implementation, so the whole git world can benefit. I would then port to Windows using the WinFsp-FUSE layer (or its native API for maximum Windows compatibility).

@kmahyyg

kmahyyg commented May 13, 2017

+1

@CJHarmath

We actually had GVFS up and running almost a year and a half ago using a FUSE-like solution, and it worked great if you just wanted to read files and have them download on demand.

@sanoursa This might still be interesting work for a few folks with read-only patterns.
icsfs is one example: https://sourceforge.net/projects/icfs/files/
Another one could be something similar to OpenAFS with its local file cache.

@abergmeier-dsfishlabs

The filter driver of Windows => the fanotify/inotify on Linux? (On-Access Scanning)

That would probably be a bad design decision. I exceed inotify limits (upward of 1m) on a daily basis due to software like JetBrains *, Bazel, ...

But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.

Due to the above this seems the most reliable way.

@abergmeier-dsfishlabs

abergmeier-dsfishlabs commented Aug 22, 2017

I still haven't gone very deep on this, but I'm currently thinking that the solution for Mac/Linux will potentially be some combination of a FUSE read-only filesystem, combined with OverlayFS, combined with some sort of on-disk caching. But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.

Reading up on OverlayFS, it seems to be a good choice.
Currently I can imagine 2 problems:

  1. OverlayFS also seems to return a file handle directly from one of the underlying FS. While this is great and all, it seems to force us to copy large files from the lower to the upper FS as soon as we modify the file in any way (even attributes). This may be ReallySlow™ and "waste" space.
  2. I imagine reading directories to be slowish if we have FUSE as a lower FS. We can certainly start with that approach, but we will probably need to reimplement the userspace code in the kernel to avoid a whole lot of context switching.

I think the problem that we will face on Linux is that git is really fast there. And any solution "feeling" not as swift will probably not be accepted.
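
The copy-up cost from point 1 can be illustrated with a toy model. This is not how OverlayFS is implemented; `ToyOverlay` and its dict-backed layers are made up purely to show why a tiny change to a large lower-layer file ends up duplicating the whole file in the upper layer.

```python
class ToyOverlay:
    """Dict-backed stand-in for a read-only lower layer and a writable upper layer."""

    def __init__(self, lower):
        self.lower = dict(lower)   # never modified after construction
        self.upper = {}
        self.bytes_copied = 0      # total bytes moved by copy-up

    def read(self, path):
        # The upper layer wins when a file exists in both.
        if path in self.upper:
            return bytes(self.upper[path])
        return self.lower[path]

    def _copy_up(self, path):
        if path not in self.upper:
            data = self.lower[path]
            self.upper[path] = bytearray(data)   # the whole file is duplicated
            self.bytes_copied += len(data)

    def write_at(self, path, offset, data):
        self._copy_up(path)                      # even a 1-byte write pays for it
        self.upper[path][offset:offset + len(data)] = data


def demo():
    fs = ToyOverlay({"big.bin": bytes(1_000_000)})
    fs.write_at("big.bin", 0, b"x")
    return fs.bytes_copied, fs.read("big.bin")[:1], fs.lower["big.bin"][:1]
```

Here a one-byte write copies the full 1,000,000 bytes into the upper layer, while the lower layer stays untouched.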

@abergmeier-dsfishlabs

Do we want to move the discussion somewhere else with actual topic replies? Like Reddit, Slack, ...

@piranna

piranna commented Aug 22, 2017

I think discussion makes more sense here.

@kevincox

kevincox commented Aug 22, 2017 via email

@piranna

piranna commented Aug 22, 2017

In practice it turns out that files in source repositories are rarely modified, almost always entirely rewritten.

The way git works, git objects are not diffs but full copies of the modified files, so in any case the full file would need to be transferred.
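
To illustrate the point about objects being whole-file snapshots: a blob's ID is computed from the entire file content, so any edit yields a new object covering the full file. A small Python sketch (the helper name `git_blob_id` is just for illustration):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # A blob object is the header "blob <size>\0" followed by the full content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


old = git_blob_id(b"hello\n")
new = git_blob_id(b"hello!\n")   # one-character edit, entirely new object ID
```

Packfiles may later store objects as deltas against each other, but that is a storage detail; the object model itself is snapshot-based.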

@kevincox

kevincox commented Aug 22, 2017 via email

@abergmeier-dsfishlabs

@gitster Do you have any opinion about these questions?

@billziss-gh

For those of you who are interested you may also want to watch Eden. It is a low-level FUSE file system that (I believe) supports sparse checkout for mercurial and git. It currently does not build, but my understanding is that the Facebook folks have great plans for it.

@abergmeier-dsfishlabs

@billziss-gh Is there any further information about Eden? As mentioned earlier, a pure FUSE fs probably will not scale.

@billziss-gh

billziss-gh commented Sep 18, 2017

Is there any further information about Eden?

My understanding is that they plan to open things up more as time passes. You may want to reach out to @wez for further information.

a pure FUSE fs probably will not scale.

As I mentioned earlier in the thread, context switches are slow, but not as slow as disk accesses. A user mode file system that uses the OS file cache (cache manager, page cache, etc.) can be fast. This is not idle speculation; it is something that I have done successfully on Windows.

@est31

est31 commented Sep 18, 2017

Just wanting to point out that Jonathan Tan from Google is working on adding a native sparse checkout feature to git which might render gvfs and others obsolete: https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/T/#u

@sanoursa
Contributor

@est31 that's the hope :-). We're trying to push as much of this functionality into core git as we can. GVFS is currently providing two main features: 1) partial clone + on demand object download (which could be done in a platform-agnostic way in core git) and 2) working directory virtualization and dynamic expansion of sparse-checkout, which requires a file system driver and must be platform-specific.

@wez

wez commented Sep 25, 2017

@abergmeier-dsfishlabs: we don't have any public news or timeline for Eden today. What I said in facebook/sapling#4 (comment) is still broadly true. The grand vision is that it will be cross platform (linux, macos, windows) but it will take a bit of time to get there for all systems. We're prioritizing supporting Mercurial as that is what we're using for our largest repositories.

@malix0

malix0 commented Nov 11, 2017

@abergmeier

GVFS is currently providing two main features: [...] working directory virtualization and dynamic expansion of sparse-checkout

Great summary. For Linux I would think:

working directory virtualization

Should be fairly "easy" to implement, whether in FUSE or in the kernel.
Perhaps it only needs to overlay a fs with fake paths for promised objects. When a faked path is accessed, a direct expansion may need to be triggered.
I would assume that, since it needs to access a fs, this part would probably be much faster if implemented as a kernel fs (to get rid of context switches).

dynamic expansion

This probably can be split into

  • direct expansion (when operation on file is done via fs)
  • indirect expansion (some logic to fetch objects by heuristics)

This probably is done best by a daemon process, which listens on a directory (on the kernel fs). This directory would then have a list of files, which trigger a fetch of promised objects.

So perhaps have working directory virtualization in the kernel and dynamic expansion in userspace?
Or am I completely wrong?
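
One way to picture the userspace half of this split: a daemon looks at a request directory and fetches whatever object each request file names. Everything here is hypothetical (the directory layout, one object ID per file name, the `fetch_object` callback), and a real implementation would more likely block on a kernel notification than poll the directory.

```python
import os
import tempfile


def process_requests(request_dir, fetch_object, store):
    """Handle every pending request file once; return the IDs that were fetched."""
    fetched = []
    for name in sorted(os.listdir(request_dir)):
        if name not in store:
            store[name] = fetch_object(name)         # expand the promised object
            fetched.append(name)
        os.remove(os.path.join(request_dir, name))   # mark the request as done
    return fetched


def demo():
    with tempfile.TemporaryDirectory() as request_dir:
        store = {}
        for object_id in ("aaa111", "bbb222"):
            # In this sketch the kernel side would create these files on access.
            open(os.path.join(request_dir, object_id), "w").close()
        first = process_requests(request_dir, lambda oid: b"data:" + oid.encode(), store)
        second = process_requests(request_dir, lambda oid: b"", store)
        return first, second, sorted(store)
```

The second pass finds an empty directory and fetches nothing, so each promised object is expanded at most once.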

@darkdragon-001

What about integrating the working directory virtualization part directly in the kernel as this is not the only project which could profit from such features:
https://lkml.org/lkml/2017/12/13/669

@pratikpparikh

@darkdragon-001 @abergmeier @abergmeier-dsfishlabs has any more work been done to support Linux and Mac?

@sanoursa
Contributor

Please see https://blogs.msdn.microsoft.com/devops/2018/03/15/gvfs-for-mac/ for the latest

@wheelerlaw

wheelerlaw commented Apr 22, 2018

It looks like this is on GitHub's radar, folks.

@DanielJoyce

MS developed it for their use case, which was managing Windows source on Windows machines.

It's like asking why so many Linux open-source projects don't have Windows versions.

@sam0x17

sam0x17 commented Jun 7, 2018

I would definitely suggest taking a look at WinFSP.

Also, maybe the easiest way to do this would be to use libfuse on linux systems.

@piranna

piranna commented Jun 7, 2018

It's like asking why so many Linux open-source projects don't have Windows versions.

Tell me one, and if so, tell me why it would be difficult to port to Windows...

@ldo

ldo commented Jun 7, 2018

I can’t think of any open-source project that would need this, whether on Linux or elsewhere.

Maybe Microsoft should simply refactor their Windows code and get rid of all the legacy cruft? Then standard Git would work just fine.

@piranna

piranna commented Jun 8, 2018

Maybe Microsoft should simply refactor their Windows code and get rid of all the legacy cruft? Then standard Git would work just fine.

If you read the reasons why they developed GVFS, you'll see they have a single huge repo with all the source code of all their projects since MS-DOS 1.0, so their commit history and commit diffs are humongous. You might think this is unreasonable for normal use cases, but I'm eagerly waiting for their changes to get into mainstream git and for GVFS to get to Linux so I can use it as a versioned distributed filesystem.

@DanielJoyce

@piranna Because MS needed to scratch their own itch first, which is their huge code base. So there was no reason for them to immediately port to Linux.

@ldo git submodules are a pain to manage. Sometimes a giant monorepo is easiest. You don't have to continually manage multiple artifacts, an artifact repo, and version tracking files. Never mind that support for that is seriously lacking in the C/C++ world. Annex doesn't solve the problem, and other projects only tackle parts, like large file support.

@sanoursa
Contributor

Closing this issue because it is being replaced by many, finer grained issues covering the actual porting work. GVFS for Mac is now under active development and we have a bunch of issues covering that work. GVFS for Linux will be soon as well, so we'll open Linux-focused issues at that time.

alameenshah added a commit that referenced this issue Dec 14, 2018
Teach upgraders to control their own upgrade actions