Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

squashfs: read target of symlinks #200

Closed
ncw opened this issue Dec 17, 2023 · 5 comments
Closed

squashfs: read target of symlinks #200

ncw opened this issue Dec 17, 2023 · 5 comments

Comments

@ncw
Copy link
Contributor

ncw commented Dec 17, 2023

I'd like to be able to read the target of symlinks.

I can think of three methods to do this. My favourite is method 2 I think.

I'm happy to make a PR for whichever method we decide is best.

method 1

One way of doing this would be to add a Readlink() (string, error) method to FileStat. This would return an error for non symlinks.

// FileStat is the extended data underlying a single file, similar to https://golang.org/pkg/syscall/#Stat_t
type FileStat struct {
uid uint32
gid uint32
xattrs map[string]string
}

This could be implemented by adding a target field and setting it here

sys: FileStat{
uid: fs.uidsGids[header.uidIdx],
gid: fs.uidsGids[header.gidIdx],
xattrs: xattrs,

method 2

An alternative would be to add a new method to directoryEntry (which is passed to the caller as an os.FileInfo), say Readlink() (string, error) which you'd have to type assert to have access to. This would have the advantage that we don't need to put stuff in the FileStat the user isn't going to use.

// directoryEntry is a single directory entry
// it combines information from inode and the actual entry
// also fulfills os.FileInfo
//
// Name() string // base name of the file
// Size() int64 // length in bytes for regular files; system-dependent for others
// Mode() FileMode // file mode bits
// ModTime() time.Time // modification time
// IsDir() bool // abbreviation for Mode().IsDir()
// Sys() interface{} // underlying data source (can return nil)
type directoryEntry struct {

If you like this concept then we could potentially implement an OpenFile method too which would save a lot of directory traversals when reading files when it is very likely you've just listed the directory and have a FileInfo to hand.

method 3

A third method would be to implement FileSystem.Readlink to mimic os.Readlink. I like this method the least though because squashfs doesn't have an efficient way of finding a file other than iterating all the directories.

@deitch
Copy link
Collaborator

deitch commented Dec 18, 2023

I don't love method 2, in that you need to typecast to get it to work. I get it, sometimes you need to do such things, but within a squashfs, I would think you shouldn't need to do so.

Method 1 is pretty good; the amount of information you are carrying around is tiny. I might be worried if we were doing this on real-time systems, but who runs go on microcontrollers?

Method 3 is most like os.Readlink(), and so remains consistent with the go way of doing things (even if sometimes that go way is less efficient).

an efficient way of finding a file other than iterating all the directories

And what happens when you try to get the FileInfo for method 1 or directory entry for method 2? You still have to walk the directories. The issue is that now you have to do it twice: once to get to the point that you know it is a symlink, and a second time to read the symlink. That part does bother me. I suspect go did it that way because not everything has symlinks, so they wanted it out of band.

I think method 1 is your best bet. You already are there, you walked the tree, and the amount of additional information is tiny. I could be convinced otherwise, though.

@ncw
Copy link
Contributor Author

ncw commented Dec 18, 2023

I don't love method 2, in that you need to typecast to get it to work. I get it, sometimes you need to do such things, but within a squashfs, I would think you shouldn't need to do so.

I'll just note that you already need a typecast to turn the result of the Sys() call into the FileStat.

Method 1 is pretty good; the amount of information you are carrying around is tiny. I might be worried if we were doing this on real-time systems, but who runs go on microcontrollers?

Method 3 is most like os.Readlink(), and so remains consistent with the go way of doing things (even if sometimes that go way is less efficient).

Just an idea...The FileState could just contain a pointer to the directoryEntry...

In fact the FileStat could be an alias of directoryEntry (you can even make it backwards compatible with type aliases) and then we'd just add new methods to directoryEntry and they would appear on the FileStat also.

So

type FileStat = directoryEntry

Then move the Uid Gid and Xattrs methods onto dirEntry. This then makes adding Readlink quite natural and implements methods 1 and 2!

an efficient way of finding a file other than iterating all the directories

And what happens when you try to get the FileInfo for method 1 or directory entry for method 2? You still have to walk the directories. The issue is that now you have to do it twice: once to get to the point that you know it is a symlink, and a second time to read the symlink. That part does bother me. I suspect go did it that way because not everything has symlinks, so they wanted it out of band.

I think they were just following the unix syscalls which is readlink.

This is quite annoying though in general. From having implemented this many times in rclone I think the best interface is something like this:

// Node represents either a directory (*Dir) or a file (*File)
type Node interface {
	os.FileInfo
	IsFile() bool
	Inode() uint64
	SetModTime(modTime time.Time) error
	Sync() error
	Remove() error
	RemoveAll() error
	DirEntry() fs.DirEntry
	VFS() *VFS
	Open(flags int) (Handle, error)
	Truncate(size int64) error
	Path() string
	SetSys(interface{})
}

(With the addition of ReadLink!)

https://github.com/rclone/rclone/blob/8503282a5adffc992e1834eed2cd8aeca57c01dd/vfs/vfs.go#L51C1-L66C2

@deitch
Copy link
Collaborator

deitch commented Dec 19, 2023

From having implemented this many times in rclone

Oh rclone is really nice. I didn't know that was you. I have some other projects that are in a similar space, but not really competitive. We should have a real-world discussion. Send me a Twitter DM @avideitcher or just @ me on one and we can then connect via email. If you are interested, of course.

Do you remember the seminal "rsync for backup" paper, by Mike Rubel, all those years ago? So much can be done with the basic structures at hand.

In fact the FileStat could be an alias of directoryEntry (you can even make it backwards compatible with type aliases) and then we'd just add new methods to directoryEntry and they would appear on the FileStat also.

Ooh, this is interesting. As long as it implements the right interface, why not? I rather like that. Does it work?

Since we already are talking about these, this was part of the work I did for a company early this year.

ncw added a commit to ncw/go-diskfs that referenced this issue Dec 21, 2023


This moves the auxillary methods UID(),GID() and Xattr() onto the
directoryEntry directly and makes the FileStat a type alias for a
pointer to a directory entry.

This enables more methods to be added to the directory entry easily.
ncw added a commit to ncw/go-diskfs that referenced this issue Dec 21, 2023
@deitch deitch closed this as completed in 07630e2 Dec 22, 2023
ncw added a commit to ncw/go-diskfs that referenced this issue Dec 23, 2023


This moves the auxillary methods UID(),GID() and Xattr() onto the
directoryEntry directly and makes the FileStat a type alias for a
pointer to a directory entry.

This enables more methods to be added to the directory entry easily.
ncw added a commit to ncw/go-diskfs that referenced this issue Dec 23, 2023
@ncw
Copy link
Contributor Author

ncw commented Dec 23, 2023

Thanks for merging this. I'll send the next part shortly which implements Open - this helps a lot in the performance stakes.

Oh rclone is really nice. I didn't know that was you. I have some other projects that are in a similar space, but not really competitive. We should have a real-world discussion. Send me a Twitter DM @avideitcher or just @ me on one and we can then connect via email. If you are interested, of course.

You can find me at nick@craig-wood.com also @njcw on twitter (I have followed you!). Yes I agree we have lots of interests in common and a discussion would be great!

Do you remember the seminal "rsync for backup" paper, by Mike Rubel, all those years ago? So much can be done with the basic structures at hand.

I don't think I saw that, but I'm a big fan of rsync! The text UI of rclone is modeled around rsync and the philosophy is very similar. The only bit rclone doesn't implement is the very clever delta encoding transfer, but there aren't many cloud providers which do that!

Since we already are talking about these, this was part of the work I did for a company early this year.

Oh nice!

I'm currently working on an archive backend for rclone which enables reading of archives like .zip and .squashfs - that is why I've been so interested in squashfs.

You can see the code so far here

v1.66.0-beta.7581.1cee1e101.archive-backend on branch archive-backend

To tell rclone you want to look at an archive you do rclone ls :archive:/path/to/archive.sqfs

@deitch
Copy link
Collaborator

deitch commented Dec 24, 2023

To tell rclone you want to look at an archive you do rclone ls :archive:/path/to/archive.sqfs

Sometimes I think 50% of the things I get involved with in one way or another are, "how do we encode something that is kind of like a URL in a single string?" 😆

Will email you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants