-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
macOS - fclones does not preserve extended attributes #152
Comments
On macOS extended attributes are used extensively and this fclones behaviour can create real mess. It should be IMHO either fixed or clearly documented so there are no surprises when used. |
Now my two pennies' worth opinion about solution. I think you could "copy" what jdupes does (it is also released under MIT licence) but in addition fix one remiaining issue jdupes still has https://github.com/jbruchon/jdupes/issues/189 This would make fclones one step closer to be ultimate deduplication program for macOS:) |
GNU cp unfortunately does not work anymore: Seems that brew version was built without xattr support:
I did a bit poking around and seems that only way to do this properly is to use Apple Standard C Library copyfile I am not familiar wtih Rust but can see that std::fs::copy uses the native platform's implementation: Maybe it can help with fixing this issue if Rust implementation allows COPYFILE_METADATA. copyfile man pageCOPYFILE(3) BSD Library Functions Manual COPYFILE(3) NAME copyfile, fcopyfile, copyfile_state_alloc, copyfile_state_free, copyfile_state_get, copyfile_state_set — copy a file LIBRARY Standard C Library (libc, −lc) SYNOPSIS #include <copyfile.h> int copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags); int fcopyfile(int from, int to, copyfile_state_t state, copyfile_flags_t flags); copyfile_state_t copyfile_state_alloc(void); int copyfile_state_free(copyfile_state_t state); int copyfile_state_get(copyfile_state_t state, uint32_t flag, void * dst); int copyfile_state_set(copyfile_state_t state, uint32_t flag, const void * src); typedef int (*copyfile_callback_t)(int what, int stage, copyfile_state_t state, const char * src, const char * dst, void * ctx); DESCRIPTION These functions are used to copy a file’s data and/or metadata. (Metadata consists of permissions, extended attributes, access control lists, and so forth.) The copyfile_state_alloc() function initializes a copyfile_state_t object (which is an opaque data type). This object can be passed to copyfile() and fcopyfile(); copyfile_state_get() and copyfile_state_set() can be used to manipulate the state (see below). The copyfile_state_free() function is used to deallocate the object and its contents. The copyfile() function can copy the named from file to the named to file; the fcopyfile() function does the same, but using the file descriptors of already-opened files. If the state parameter is the return value from copyfile_state_alloc(), then copyfile() and fcopyfile() will use the information from the state object; if it is NULL, then both functions will work normally, but less control will be available to the caller. The flags parameter controls which contents are copied: COPYFILE_ACL Copy the source file’s access control lists. COPYFILE_STAT Copy the source file’s POSIX information (mode, modification time, etc.). COPYFILE_XATTR Copy the source file’s extended attributes. COPYFILE_DATA Copy the source file’s data. These values may be or’d together; several convenience macros are provided: COPYFILE_SECURITY Copy the source file’s POSIX and ACL information; equivalent to (COPYFILE_STAT|COPYFILE_ACL). COPYFILE_METADATA Copy the metadata; equivalent to (COPYFILE_SECURITY|COPYFILE_XATTR). COPYFILE_ALL Copy the entire file; equivalent to (COPYFILE_METADATA|COPYFILE_DATA). The copyfile() and fcopyfile() functions can also have their behavior modified by the following flags: COPYFILE_RECURSIVE Causes copyfile() to recursively copy a hierarchy. This flag is not used by fcopyfile(); see below for more information. COPYFILE_CHECK Return a bitmask (corresponding to the flags argument) indicating which contents would be copied; no data are actually copied. (E.g., if flags was set to COPYFILE_CHECK|COPYFILE_METADATA, and the from file had extended attributes but no ACLs, the return value would be COPYFILE_XATTR .) COPYFILE_PACK Serialize the from file. The to file is an AppleDouble-format file. COPYFILE_UNPACK Unserialize the from file. The from file is an AppleDouble-format file; the to file will have the extended attributes, ACLs, resource fork, and FinderInfo data from the to file, regardless of the flags argument passed in. COPYFILE_EXCL Fail if the to file already exists. (This is only applicable for the copyfile() function.) COPYFILE_NOFOLLOW_SRC Do not follow the from file, if it is a symbolic link. (This is only applicable for the copyfile() function.) COPYFILE_NOFOLLOW_DST Do not follow the to file, if it is a symbolic link. (This is only applicable for the copyfile() function.) COPYFILE_MOVE Unlink (using remove(3)) the from file. (This is only applicable for the copyfile() function.) No error is returned if remove(3) fails. Note that remove(3) removes a symbolic link itself, not the target of the link. COPYFILE_UNLINK Unlink the to file before starting. (This is only applicable for the copyfile() function.) COPYFILE_CLONE_FORCE Clone the file instead. This is a force flag i.e. if cloning fails, an error is returned. This flag is equivalent to (COPYFILE_EXCL | COPYFILE_ACL | COPYFILE_STAT | COPYFILE_XATTR | COPYFILE_DATA | COPYFILE_NOFOLLOW_SRC). Note that if cloning is successful, progress callbacks will not be invoked. Note also that there is no support for cloning directories: if a directory is provided as the source, an error will be returned. (This is only applicable for the copyfile() function.) COPYFILE_CLONE Try to clone the file instead. This is a best try flag i.e. if cloning fails, fallback to copying the file. This flag is equivalent to (COPYFILE_EXCL | COPYFILE_ACL | COPYFILE_STAT | COPYFILE_XATTR | COPYFILE_DATA | COPYFILE_NOFOLLOW_SRC). Note that if cloning is successful, progress callbacks will not be invoked. Note also that there is no support for cloning directories: if a directory is provided as the source and COPYFILE_CLONE_FORCE is not passed, this will instead copy the directory. Recursive copying however is supported, see below for more information. (This is only applicable for the copyfile() function.) COPYFILE_DATA_SPARSE Copy a file sparsely. This requires that the source and destination file systems support sparse files with hole sizes at least as large as their block sizes. This also requires that the source file is sparse, and for fcopyfile() the source file descriptor’s offset be a multiple of the minimum hole size. If COPYFILE_DATA is also specified, this will fall back to a full copy if sparse copying cannot be performed for any reason; otherwise, an error is returned. COPYFILE_NOFOLLOW This is a convenience macro, equivalent to (COPYFILE_NOFOLLOW_DST | COPYFILE_NOFOLLOW_SRC). COPYFILE_RUN_IN_PLACE If the src file has quarantine information, add the QTN_FLAG_DO_NOT_TRANSLOCATE flag to the quarantine information of the dst file. This allows a bundle to run in place instead of being translocated. The copyfile_state_get() and copyfile_state_set() functions can be used to manipulate the copyfile_state_t object returned by copyfile_state_alloc(). In both functions, the dst parameter’s type depends on the flag parameter that is passed in. COPYFILE_STATE_SRC_FD Get or set the file descriptor associated with the source (or destination) file. If this has not been initialized yet, the value will be -2. The dst (for copyfile_state_get()) and src (for copyfile_state_set()) parameters are pointers to int. COPYFILE_STATE_SRC_FILENAME Get or set the filename associated with the source (or destination) file. If it has not been initialized yet, the value will be NULL. For copyfile_state_set(), the src parameter is a pointer to a C string (i.e., char* ); copyfile_state_set() makes a private copy of this string. For copyfile_state_get() function, the dst parameter is a pointer to a pointer to a C string (i.e., char** ); the returned value is a pointer to the state ’s copy, and must not be modified or released. COPYFILE_STATE_STATUS_CB Get or set the callback status function (currently only used for recursive copies; see below for details). The src parameter is a pointer to a function of type copyfile_callback_t (see above). COPYFILE_STATE_STATUS_CTX Get or set the context parameter for the status call-back function (see below for details). The src parameter is a void *. COPYFILE_STATE_QUARANTINE Get or set the quarantine information with the source file. The src parameter is a pointer to an opaque object (type void * ). COPYFILE_STATE_COPIED Get the number of data bytes copied so far. (Only valid for copyfile_state_get(); see below for more details about callbacks.) If a COPYFILE_CLONE or COPYFILE_CLONE_FORCE operation successfully cloned the requested objects, then this value will be 0. The dst parameter is a pointer to off_t (type off_t * ). COPYFILE_STATE_XATTRNAME Get the name of the extended attribute during a callback for COPYFILE_COPY_XATTR (see below for details). This field cannot be set, and may be NULL. COPYFILE_STATE_WAS_CLONED True if a COPYFILE_CLONE or COPYFILE_CLONE_FORCE operation successfully cloned the requested objects. The dst parameter is a pointer to bool (type bool * ). Recursive Copies When given the COPYFILE_RECURSIVE flag, copyfile() (but not fcopyfile()) will use the fts(3) functions to recursively descend into the source file-system object. It then calls copyfile() on each of the entries it finds that way. If a call-back function is given (using copyfile_state_set() and COPYFILE_STATE_STATUS_CB ), the call-back function will be called four times for each directory object, and twice for all other objects. (Each directory will be examined twice, once on entry -- before copying each of the objects contained in the directory -- and once on exit -- after copying each object contained in the directory, in order to perform some final cleanup.) The call-back function will have one of the following values as the first argument, indicating what is being copied: COPYFILE_RECURSE_FILE The object being copied is a file (or, rather, something other than a directory). COPYFILE_RECURSE_DIR The object being copied is a directory, and is being entered. (That is, none of the filesystem objects contained within the directory have been copied yet.) COPYFILE_RECURSE_DIR_CLEANUP The object being copied is a directory, and all of the objects contained have been copied. At this stage, the destination directory being copied will have any extra permissions that were added to allow the copying will be removed. COPYFILE_RECURSE_ERROR There was an error in processing an element of the source hierarchy; this happens when fts(3) returns an error or unknown file type. (Currently, the second argument to the call-back function will always be COPYFILE_ERR in this case.) The second argument to the call-back function will indicate the stage of the copy, and will be one of the following values: COPYFILE_START Before copying has begun. The third parameter will be a newly-created copyfile_state_t object with the call-back function and context pre-loaded. COPYFILE_FINISH After copying has successfully finished. COPYFILE_ERR Indicates an error has happened at some stage. If the first argument to the call-back function is COPYFILE_RECURSE_ERROR, then an error occurred while processing the source hierarchy; otherwise, it will indicate what type of object was being copied, and errno will be set to indicate the error. The fourth and fifth parameters are the source and destination paths that are to be copied (or have been copied, or failed to copy, depending on the second argument). The last argument to the call-back function will be the value set by COPYFILE_STATE_STATUS_CTX, if any. The call-back function is required to return one of the following values: COPYFILE_CONTINUE The copy will continue as expected. COPYFILE_SKIP This object will be skipped, and the next object will be processed. (Note that, when entering a directory. returning COPYFILE_SKIP from the call-back function will prevent the contents of the directory from being copied.) COPYFILE_QUIT The entire copy is aborted at this stage. Any filesystem objects created up to this point will remain. copyfile() will return -1, but errno will be unmodified. The call-back function must always return one of the values listed above; if not, the results are undefined. The call-back function will be called twice for each object (and an additional two times for directory cleanup); the first call will have a stage parameter of COPYFILE_START; the second time, that value will be either COPYFILE_FINISH or COPYFILE_ERR to indicate a successful completion, or an error during processing. In the event of an error, the errno value will be set appropriately. Note that recursive cloning is also supported with the COPYFILE_CLONE flag (but not the COPYFILE_CLONE_FORCE flag). A recursive clone operation invokes copyfile() with COPYFILE_CLONE on every entry found in the source file-system object. Because copyfile() does not allow the cloning of directories, a recursive clone will instead copy any directory it finds (while cloning its contents). As symbolic links may point to directories, they are not followed during recursive clones even if the source is a symbolic link. Additionally, because the COPYFILE_CLONE flag implies the COPYFILE_EXCL flag, recursive clones require a nonexistent destination. The COPYFILE_PACK, COPYFILE_UNPACK, COPYFILE_MOVE, and COPYFILE_UNLINK flags are not used during a recursive copy, and will result in an error being returned. Progress Callback In addition to the recursive callbacks described above, copyfile() and fcopyfile() will also use a callback to report data (e.g., COPYFILE_DATA) progress. If given, the callback will be invoked on each write(2) call. The first argument to the callback function will be COPYFILE_COPY_DATA. The second argument will either be COPYFILE_PROGRESS (indicating that the write was successful), or COPYFILE_ERR (indicating that there was an error of some sort). The amount of data bytes copied so far can be retrieved using copyfile_state_get(), with the COPYFILE_STATE_COPIED requestor (the argument type is a pointer to off_t ). When copying extended attributes, the first argument to the callback function will be COPYFILE_COPY_XATTR. The other arguments will be as described for COPYFILE_COPY_DATA; the name of the extended attribute being copied may be retrieved using copyfile_state_get() and the parameter COPYFILE_STATE_XATTRNAME. When using COPYFILE_PACK, the callback may be called with COPYFILE_START for each of the extended attributes first, followed by COPYFILE_PROGRESS before getting and packing the data for each individual attribute, and then COPYFILE_FINISH when finished with each individual attribute. (That is, COPYFILE_START may be called for all of the extended attributes, before the first callback with COPYFILE_PROGRESS is invoked.) Any attribute skipped by returning COPYFILE_SKIP from the COPYFILE_START callback will not be placed into the packed output file. The return value for the data callback must be one of COPYFILE_CONTINUE The copy will continue as expected. (In the case of error, it will attempt to write the data again.) COPYFILE_SKIP The data copy will be aborted, but without error. COPYFILE_QUIT The data copy will be aborted; in the case of COPYFILE_PROGRESS, errno will be set to ECANCELED. While the src and dst parameters will be passed in, they may be NULL in the case of fcopyfile(). Note that progress callbacks are not invoked when a clone is requested (e.g. COPYFILE_CLONE) unless the clone cannot be performed and a copy is performed instead. RETURN VALUES Except when given the COPYFILE_CHECK flag, copyfile() and fcopyfile() return less than 0 on error, and 0 on success. All of the other functions return 0 on success, and less than 0 on error. WARNING Both copyfile() and fcopyfile() can copy symbolic links; there is a gap between when the source link is examined and the actual copy is started, and this can be a potential security risk, especially if the process has elevated privileges. When performing a recursive copy, if the source hierarchy changes while the copy is occurring, the results are undefined. fcopyfile() does not reset the seek position for either source or destination. This can result in the destination file being a different size than the source file. ERRORS copyfile() and fcopyfile() will fail if: [EINVAL] An invalid flag was passed in with COPYFILE_RECURSIVE. [EINVAL] The from or to parameter to copyfile() was a NULL pointer. [EINVAL] The from or to parameter to fcopyfile() was a negative number. [ENOMEM] A memory allocation failed. [ENOTSUP] The source file was not a directory, symbolic link, or regular file. [ENOTSUP] COPYFILE_CLONE_FORCE was specified and file cloning is not supported. [ENOTSUP] COPYFILE_DATA_SPARSE was specified, sparse copying is not supported, and COPYFILE_DATA was not specified. [ECANCELED] The copy was cancelled by callback. [EEXIST] The to parameter to copyfile() already existed and was passed in with COPYFILE_EXCL. [ENOENT] The from parameter to copyfile() did not exist. [EACCES] Search permission is denied for a component of the path prefix for the from or to parameters. [EACCES] Write permission is denied for a component of the path prefix for the to parameter. In addition, both functions may set errno via an underlying library or system call. EXAMPLES /* Initialize a state variable / SEE ALSO listxattr(2), getxattr(2), setxattr(2), acl(3) BUGS Both copyfile() functions lack a way to set the input or output block size. Recursive copies do not honor hard links. HISTORY The copyfile() API was introduced in Mac OS X 10.5. BSD August 30, 2017 BSD |
I can use specific C APIs from Apple SDK in Rust. |
FYI - this weekend I have also tested ref-link dedup behaviour on linux (BTRFS, Suse Linux) and here all is OK - xattr are preserved. So it is macOS specific problem. |
This is because on linux fclones performs reflink-in-place. On other systems, it removes the file and recreates it by doing a copy with reflink. I added a patch, but I can't test it because I have no access to macOS at the moment. |
I will test - no problem. Will report results today, |
if I build with |
I have tried and the answer is nope... I have never used rust toolchain - could you please tell me how to build local git cloned version? |
ok I think I managed crash learn how to do it:) Here you are:
and xattr results:
different than fclones v0.27.1:
but not yet what it should be:
|
Thank you for the test! I see, the problem is that it copied also the source file attributes. I need to clear them then before setting the originals. |
Try now. Now I clear the attributes before setting the originals. |
now have to do my Sunday chores but will do later today |
Sure, no hurry! :) Thank you for all the help. |
tested sucesfully. I concider reported issue fixed in v0.27.2 |
Thank you for fantastic tool and responsivness in dealing with bugs. |
Happy to hear that! Now when I think more about it, I wonder, maybe it would be actually better to make it a hard failure when metadata cannot be restored and it should attempt to rollback the change (by moving the original file back to its place)? Currently it is only a warning which means - if some metadata are not restored, they are lost forever. WDYT? |
Definitely hard error by default – but maybe with optional flag to overwrite it? |
I added the last commit to make it a hard error then. I also made error reporting a lot more detailed, so if it ever fails for someone, we'd know the reason. If all is ok, I'll merge and release - and btw feel free to review / test the changes once again. Thank you once again for collaborating on this! I couldn't fix that without your help! |
ok - testing |
aaah, the CI test failed.... investigating. Looks like I broke sth... |
figured that out, CI green |
I have tested xattr branch with latest commit (2c669c2) and all works in my tests - I included various xattr situations, files' ownership and groups and in addition in place encrytption files - the latest to make sure that it does not have issues similar to other deduplicators (namely jdupes - where it can lead to data loss - I mentioned it earlier #152 (comment)). All OK Also as I have seen in other projects problems with xattr copy when attribute size is bigger than 128KB e.g.: |
Looks for me that rust is very robust thing... maybe I will look into it closer:) |
If you would need help with future testing on macOS give me a shout. |
Thank you so much! |
when deduplicating (ref-link) fclones does not take care of extended attributes - it simply uses ones from the first file.
Example:
I created three identical files,
added sample extended attributes to two of them:
and now deduplicated using fclones:
as expected fclones found 2 duplicates and deduped them, however messing up external attr:
I guess you just
cp -c sourceFile destinationFile
creating clone of source file which unfortunately is not enough. Metadata should not be changed.External attributes should be preserved the same way like names - which are just basic attributes:).
The right way I would do it manually would be:
it unfortunately requires GNU cp (on macOS can be installed via brew
brew install coreutils
). I am sure there are maybe smarter ways to achieve the same:) What matters here is result.I also tried another deduplicator (jdupe) using the same dataset. Result:
So this is definitely doable.
The text was updated successfully, but these errors were encountered: