mfu: add --open-noatime to open files with O_NOATIME #561

Merged Oct 11, 2023 (1 commit)

@adammoody (Member) commented Oct 6, 2023

This adds an --open-noatime option to a number of tools, which adds the O_NOATIME flag when opening files to avoid updating the file last access time.

Many centers use last access time to filter files for purge operations, and they would prefer not to change file atime values when making backup copies with dsync or scanning the file system for duplicate files with ddup. Adding this flag may also improve read performance on some file systems.

The O_NOATIME flag is only allowed when the effective user id matches the owner of the file or when the process is running with the CAP_FOWNER capability. A normal user will encounter errors when using O_NOATIME when reading from a shared directory containing files owned by other users, even if the current user has read access to all files.

The following tools are affected:

ddup - when reading files to compute hash values
dcp and dsync - when reading source files during a copy
dcmp and dsync - when reading source and destination files while comparing their contents
dtar - while reading source files when creating an archive

When --open-noatime is specified with ddup, the tool checks the owner user id of each file and adds O_NOATIME only if the process effective user id matches. This allows normal users to specify the --open-noatime option even when running ddup on files that they don't own. The atime will still be updated on files that the user can read but does not own.
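As a rough illustration of the per-file check described for ddup, here is a minimal sketch. The helper names `noatime_flag_for` and `open_read_noatime_if_owner` are hypothetical and are not mpiFileUtils functions. The sketch assumes Linux with glibc, calls `stat()` itself to get the owner (the real tool may already have the uid from its file list), and ignores the CAP_FOWNER case entirely.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_NOATIME in <fcntl.h> with this */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return O_NOATIME only when it was requested and
 * the file owner matches the effective uid; otherwise return 0. */
static int noatime_flag_for(uid_t owner, uid_t euid, int want_noatime)
{
#ifdef O_NOATIME
    if (want_noatime && owner == euid) {
        return O_NOATIME;
    }
#else
    /* platform has no O_NOATIME: the request is silently ignored */
    (void) owner;
    (void) euid;
    (void) want_noatime;
#endif
    return 0;
}

/* Hypothetical helper: open path read-only, adding O_NOATIME only for
 * files owned by the caller. Returns a file descriptor or -1 (errno set). */
static int open_read_noatime_if_owner(const char* path, int want_noatime)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }

    int flags = O_RDONLY | noatime_flag_for(st.st_uid, geteuid(), want_noatime);
    return open(path, flags);
}
```

Note that the owner could change between the `stat()` and the `open()`, so a caller of a sketch like this would still need to handle EPERM from `open()`.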

For the remaining tools, the current algorithms do not expose the file owner id in a way to allow for an easy check. In this case, O_NOATIME is added when opening all files. Normal users will thus encounter an error if the tool attempts to open any file that they do not own.

Resolves:
#557
#534

@adammoody (Member Author)

@daltonbohning , I hope you're doing well.

We've had some requests to add O_NOATIME to some tools. I've opened this PR to do that. It'd be good to know whether these changes are valid for a DAOS system.

Is there someone who can help us check that?

@daltonbohning (Collaborator)

> @daltonbohning , I hope you're doing well.
>
> We've had some requests to add O_NOATIME to some tools. I've opened this PR to do that. It'd be good to know whether these changes are valid for a DAOS system.
>
> Is there someone who can help us check that?

Hey Adam. Yes, I'm doing well. Hope the same for you!

It looks like DAOS/DFS doesn't honor O_NOATIME, so atime is still updated. I tested and it doesn't break anything. Extra bits set in flags are just ignored. So this change is safe for us, and I'll create an internal ticket to see if we want to handle that flag when passed.

Thanks for the heads up!

@daltonbohning (Collaborator)

DAOS JIRA for reference: https://daosio.atlassian.net/browse/DAOS-14479

@adammoody adammoody changed the title ddup: open files with O_NOATIME mfu: open files with O_NOATIME Oct 6, 2023
@daltonbohning (Collaborator)

We discussed this in the context of DAOS, and we don't actually store atime with the file. It's only populated in the stat buf to the greater of mtime or ctime. So handling O_NOATIME wouldn't help anything because it would just get "reset" on the next file open.

@adammoody (Member Author)

TODO: we'll need to be a bit more clever when copying files that are readable but not owned by the user.

From man 2 open:

O_NOATIME (since Linux 2.6.8)
Do not update the file last access time (st_atime in the inode) when the file is read(2).

This flag can be employed only if one of the following conditions is true:

  • The effective UID of the process matches the owner UID of the file.
  • The calling process has the CAP_FOWNER capability in its user namespace and the owner UID of the file has a mapping in the namespace.

This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all filesystems. One example is NFS, where the server maintains the access time.

and potential error:

EPERM The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged.

@adammoody (Member Author)

> We discussed this in the context of DAOS, and we don't actually store atime with the file. It's only populated in the stat buf to the greater of mtime or ctime. So handling O_NOATIME wouldn't help anything because it would just get "reset" on the next file open.

Thanks, @daltonbohning . And thanks for your super fast response!

@adammoody (Member Author)

It sounds like tar updates source file atimes by default but one can attempt to preserve atime with an option:

https://www.gnu.org/software/tar/manual/html_section/Attributes.html

When tar reads files, it updates their access times. To avoid this, use the ‘--atime-preserve[=METHOD]’ option, which can either reset the access time retroactively or avoid changing it in the first place.

@adammoody (Member Author) commented Oct 6, 2023

For the ability to use O_NOATIME, we do a similar check in mfu_flist_chmod():

    /* cache current effective user id,
     * determines uid when considering owner ID of files */
    opts->geteuid = geteuid();

    /* whether process is running with CAP_FOWNER, allowing
     * changes to permissions of file even when effective user id
     * of the process does not match the owner of the file */
    opts->capfowner = false;
#ifdef HAVE_LIBCAP
    cap_rc = cap_get_bound(CAP_FOWNER);
    if (cap_rc > 0) {
        /* process is running with CAP_FOWNER capability */
        opts->capfowner = true;
    }
#endif

    /* don't bother changing permissions on files we don't own,
     * unless process has CAP_FOWNER capability */
    uid_t owner = (uid_t) mfu_flist_file_get_uid(list, idx);
    if (opts->geteuid != owner && !opts->capfowner) {
        /* don't attempt to change files we don't own */
        change = 0;
    }

TODO: this code doesn't really accomplish what it claims to do. I'll fix that later.
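The TODO does not say what is wrong, so the following is only one reading: `cap_get_bound()` reports whether a capability is in the process's bounding set, which is not the same as the capability being in the effective set that the kernel consults for the O_NOATIME check. A minimal sketch of testing the effective set without libcap, by parsing the `CapEff` line of `/proc/self/status` on Linux, might look like the code below. The helper names and the `SKETCH_CAP_FOWNER` constant are hypothetical; the value 3 is the CAP_FOWNER bit number from `<linux/capability.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit number of CAP_FOWNER as defined in <linux/capability.h> */
#define SKETCH_CAP_FOWNER 3

/* Hypothetical helper: parse a "CapEff:\t<hex mask>" line.
 * Returns true and stores the mask if the line matches. */
static bool parse_capeff_line(const char* line, unsigned long long* mask)
{
    assert(line != NULL && mask != NULL);
    return sscanf(line, "CapEff: %llx", mask) == 1;
}

/* Hypothetical helper: report whether CAP_FOWNER is in the effective
 * capability set of the calling process. Returns false if the
 * information cannot be read. */
static bool have_effective_cap_fowner(void)
{
    FILE* fp = fopen("/proc/self/status", "r");
    if (fp == NULL) {
        return false;
    }

    char line[256];
    unsigned long long mask = 0;
    bool found = false;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_capeff_line(line, &mask)) {
            found = true;
            break;
        }
    }
    fclose(fp);

    return found && ((mask >> SKETCH_CAP_FOWNER) & 1ULL) != 0;
}
```

When libcap is available, `cap_get_proc()` followed by `cap_get_flag()` with `CAP_EFFECTIVE` would be the library route to the same answer.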

@adammoody (Member Author)

Apparently, rsync v3.2.0 provides the following options for atime:

https://download.samba.org/pub/rsync/rsync.1

       --atimes, -U
              This tells rsync to set the access (use) times of the
              destination files to the same value as the source files.

              If repeated, it also sets the --open-noatime option, which can
              help you to make the sending and receiving systems have the same
              access times on the transferred files without needing to run
              rsync an extra time after a file is transferred.

              Note that some older rsync versions (prior to 3.2.0) may have
              been built with a pre-release --atimes patch that does not imply
              --open-noatime when this option is repeated.

       --open-noatime
              This tells rsync to open files with the O_NOATIME flag (on
              systems that support it) to avoid changing the access time of
              the files that are being transferred. If your OS does not
              support the O_NOATIME flag then rsync will silently ignore this
              option. Note also that some filesystems are mounted to avoid
              updating the atime on read access even without the O_NOATIME
              flag being set.

Tip from: https://unix.stackexchange.com/questions/630228/rsync-keep-access-time-atime-how

Signed-off-by: Adam Moody <moody20@llnl.gov>
@adammoody adammoody changed the title mfu: open files with O_NOATIME mfu: add --open-noatime to open files with O_NOATIME Oct 11, 2023
@adammoody adammoody merged commit 7c40bff into main Oct 11, 2023
@adammoody adammoody deleted the noatime branch October 11, 2023 00:29