use st_dev in conjunction with ino #2

Open
dlqqq opened this issue Sep 7, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@dlqqq
Collaborator

dlqqq commented Sep 7, 2022

As @kevin-bates pointed out, st_ino is only unique within a single storage device, and must be coupled with st_dev (device number) to be truly unique per server instance.
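A minimal sketch of the compound key described above, assuming a POSIX-like platform where `os.stat()` reports meaningful `st_dev` and `st_ino` values. The helper name `file_key` is hypothetical and is not part of this project's API.

```python
import os


def file_key(path):
    """Return the (st_dev, st_ino) pair for `path`.

    Hypothetical helper: st_ino alone is only unique within one
    filesystem, so the device number is included in the key.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Two hard links to the same file yield the same key, while two distinct files on the same filesystem yield different keys. Whether the key stays stable across remounts is a separate question.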

@dlqqq dlqqq added the bug Something isn't working label Sep 7, 2022
@dlqqq
Collaborator Author

dlqqq commented Oct 11, 2022

@kevin-bates Hey Kevin, I've finally gotten more time to tackle this issue. I had some thoughts about this that I wanted your input on.

  1. Is the (st_dev, st_ino) pair really guaranteed to identify a file uniquely on any multi-FS platform (ignoring NFS for now)? Reading the Linux man pages more closely, they state that an inode number is unique within a filesystem, not within a device. Depending on how precise their nomenclature is, this seems to imply that a single device number could host multiple filesystems. Anecdotally, others seem to agree that the pair (st_dev, st_ino) uniquely identifies a file on any system. However, since I don't see this stated in the Linux man pages, it seems guaranteed to me only if there is no way to create a separate filesystem on the same device without creating a new partition (which changes the device minor number). Is this true to the best of your knowledge?

  2. Is it known whether the device number for a partition is persistent across remounts and reboots? I can't find this behavior documented anywhere, and if it is not persistent, we will have serious difficulty supporting multi-FS platforms. I posted a Stack Exchange question about this; hopefully it will get some responses.

  3. I read the blog post you linked in the original PR about how the Linux NFS client handles device numbers. Do NFS partitions remount on reboot? If so, that would also make supporting NFS more difficult.
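One way to probe question 2 empirically is to record `st_dev` for a few paths, remount or reboot, and compare. The sketch below is only an illustration of that experiment; the function names and the JSON snapshot file are assumptions made for this example.

```python
import json
import os


def snapshot_devices(paths):
    """Map each path to the st_dev currently reported for it."""
    return {p: os.stat(p).st_dev for p in paths}


def save_snapshot(snapshot, filename):
    """Write a snapshot to disk so it survives a remount or reboot."""
    with open(filename, "w") as f:
        json.dump(snapshot, f)


def load_snapshot(filename):
    """Read back a snapshot written by save_snapshot()."""
    with open(filename) as f:
        return json.load(f)


def changed_devices(old, new):
    """Return the paths whose st_dev differs between two snapshots."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

A non-empty result from `changed_devices()` after a reboot would show that device numbers are not persistent on that machine. An empty result proves nothing in general.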

@kevin-bates
Member

Hi @dlqqq - It's been 30 years since I dealt with this area of the stack so my memory doesn't exactly serve me very well.

I agree there's some ambiguity regarding file system and st_dev + st_ino uniqueness in the referenced man page. That said, it does appear that st_dev can change across remounts and reboots (including NFS) - so it won't be reliable. It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

Perhaps taking a higher-level approach, one that persists the root directory, relative path, and hostname, might make this easier to support. This may also get things closer to ContentsManager-independence.
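A rough sketch of what such a higher-level record could look like, assuming the root directory is known to the caller. `FileLocation` and `locate` are hypothetical names used only for illustration.

```python
import os
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocation:
    """Path-based identity: no stat information involved."""
    hostname: str
    root_dir: str
    rel_path: str


def locate(path, root_dir):
    """Describe `path` relative to `root_dir` on the current host."""
    root = os.path.realpath(root_dir)
    full = os.path.realpath(path)
    rel = os.path.relpath(full, root)
    # Reject paths that escape the root directory.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is not inside {root_dir!r}")
    return FileLocation(socket.gethostname(), root, rel)
```

Such a record survives remounts and reboots, but on its own it cannot follow a file that is moved or renamed out-of-band.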

To address moves within a filesystem, perhaps (conditionally) capturing st_ino (if the file resides in the file system) and using that as a hint when reconciling the file path might be helpful.

Would it help to drive insertion "on-demand" by inserting entries only when get_id() doesn't find anything (and after checking if the inode entry exists to handle moves)? This might ease the pain of out-of-band updates by only persisting information "touched" by the application and not unconditionally.
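The on-demand idea, combined with the inode hint from the previous paragraph, can be sketched as follows. This is an in-memory stand-in written for illustration only: the class name, the dictionaries, and the UUID-based IDs are assumptions, and it treats `st_ino` purely as a hint within a single filesystem.

```python
import os
import uuid


class OnDemandIndex:
    """Hypothetical index that inserts records only when asked for an ID."""

    def __init__(self):
        self._by_path = {}  # path -> (file_id, st_ino)
        self._by_ino = {}   # st_ino -> last path seen with that inode

    def get_id(self, path):
        ino = os.stat(path).st_ino
        known = self._by_path.get(path)
        if known is not None and known[1] == ino:
            return known[0]  # same path, same inode: nothing to do

        # Path lookup missed: use the inode as a hint to detect a move.
        old_path = self._by_ino.get(ino)
        old = self._by_path.get(old_path) if old_path is not None else None
        if old is not None and old[1] == ino and not os.path.exists(old_path):
            file_id = old[0]  # the old path is gone, so treat this as a move
            del self._by_path[old_path]
        else:
            file_id = uuid.uuid4().hex  # genuinely new file: insert on demand

        self._by_path[path] = (file_id, ino)
        self._by_ino[ino] = path
        return file_id
```

Only files the application actually touches end up in the index, at the cost of not noticing out-of-band changes until the next `get_id()` call.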

@dlqqq
Collaborator Author

dlqqq commented Oct 12, 2022

> It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

To add to this, FAT32/NTFS don't really have inodes; instead they have file indexes, which, to my horror, appear to be possibly mutable over a file's lifetime. This is what os.stat().st_ino returns on Windows. But that's a separate issue.

With respect to your other comments, the existing design handles out-of-band updates very intelligently, and it would be really difficult to make such major design changes without losing that functionality. However, I'll keep those comments in mind if I'm unable to find a solution to supporting multi-FS platforms.

Thanks to Stack Overflow, it looks like blkid provides us with a persistent UUID per filesystem. Not sure how this works with NFS, but we'll see. I'm hopeful that this is exactly what we want.
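As a rough illustration, a filesystem UUID can sometimes be found without invoking blkid at all. The sketch below swaps in a different technique: it scans the `/dev/disk/by-uuid` symlinks that udev typically maintains on Linux and matches each block device's `st_rdev` against the file's `st_dev`. That match is an assumption that may not hold for NFS, overlay filesystems, or Btrfs subvolumes, and the directory may be missing entirely (for example in containers). `filesystem_uuid` is a hypothetical name.

```python
import os
import stat

BY_UUID_DIR = "/dev/disk/by-uuid"  # assumption: populated by udev on this system


def filesystem_uuid(path, by_uuid_dir=BY_UUID_DIR):
    """Return the UUID of the filesystem holding `path`, or None if unknown."""
    try:
        dev = os.stat(path).st_dev
        names = os.listdir(by_uuid_dir)
    except OSError:
        return None
    for name in names:
        try:
            # os.stat follows the symlink to the underlying block device.
            st = os.stat(os.path.join(by_uuid_dir, name))
        except OSError:
            continue
        if stat.S_ISBLK(st.st_mode) and st.st_rdev == dev:
            return name
    return None
```

A `None` result means only that this lookup could not determine a UUID, so callers would still need a fallback.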

@kevin-bates
Member

> it looks like blkid provides us with a persistent UUID per filesystem.

Cool - so there's your primary key! 😉

@dlqqq
Collaborator Author

dlqqq commented Oct 12, 2022

One immediate issue with the blkid approach is that it seems specific only to certain Linux distributions. I don't know of its equivalents for OS X and Windows.


2 participants