Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::fs::copy returns the combined size of all streams on NTFS on Windows #44532

Closed
stephaneyfx opened this issue Sep 13, 2017 · 12 comments
Closed
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@stephaneyfx
Copy link
Contributor

NTFS files may contain alternate data streams besides the main one. Since PR #26751, std::fs::copy returns the combined size of all streams, which is confusing and likely unexpected unless the user is aware of this Windows/NTFS peculiarity.

An example where Windows uses alternate data streams seems to be when storing whether a file was downloaded from the internet (e.g. to display a security section and an "Unblock" button in the file properties). Such alternate streams are more like metadata, and to be consistent with other file metadata (attributes, permissions, extended attributes, etc.), their size should not be reflected in the value returned by copy.

In summary, I think std::fs::copy should return the size of the main stream for the following reasons (alternate data streams are still copied, their size is just not reported, like other metadata):

  • This is the least confusing result for non-NTFS experts and what most people expect (no "copying more bytes than the file contains").
  • This is consistent with other file metadata whose size is not reported by copy.
  • Alternate streams are mostly invisible, unless you try to open them explicitly. Even the file properties don't show them and Windows reports only the size of the main stream in the dialog.

@retep998, what do you think?

Steps to reproduce

  1. On Windows, download an executable file to a NTFS partition and verify that the property dialog contains a security section with an "Unblock" button.
  2. Note the file size reported in the file properties.
  3. Copy the file to the same partition using std::fs::copy and note the number of bytes returned.

Expected result

The number of bytes copied reported by copy is equal to the file size.

Actual result

copy reports a number of bytes greater than the file size.

@stephaneyfx
Copy link
Contributor Author

I have a fix ready if this is deemed an issue.

@sfackler
Copy link
Member

cc @retep998

@retep998
Copy link
Member

Personally I don't mind whether we report the size of all streams or just the main stream, and really I wish fs::copy wouldn't return any size at all (some system APIs for copying a file don't provide any way to get the number of bytes copied so they can't be used to implement fs::copy). However changing the behavior would technically be a breaking change so I'll leave this up to @rust-lang/libs to decide on.

@retep998 retep998 added I-nominated O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 13, 2017
@stephaneyfx
Copy link
Contributor Author

Technically, the breaking change happened with PR #26751 which changed the return value of ’std::fs::copy’ for multi-stream files on Windows + NTFS. This truly makes the return value of ’copy’ useless for platform/filesystem-agnostic code in the sense that it cannot be compared reliably with the file size.

As noted in the doc of ’copy’, the underlying called function may change, so one should not rely on it to copy alternate streams and if it matters to them, they should use a platform specific function instead to be robust against implementation changes in the std library.

@stephaneyfx
Copy link
Contributor Author

Oops, I didn't mean to close.

@stephaneyfx stephaneyfx reopened this Sep 13, 2017
@retep998
Copy link
Member

retep998 commented Sep 13, 2017

As noted in the doc of ’copy’, the underlying called function may change, so one should not rely on it to copy alternate streams and if it matters to them, they should use a platform specific function instead to be robust against implementation changes in the std library.

Regardless of whether std is technically allowed to change these details, I'd very strongly prefer that fs::copy continues to ask the OS to copy the file with all the associated metadata. As fs::copy is the first function that people reach for when wanting to copy a file, it is imperative that it perform well and be robust. The alternative of manually copying bytes to a new file is significantly slower (CopyFile and friends can take advantage of hardware DMA transfers to avoid wasting memory bandwidth and CPU time), as well as missing out on a lot of metadata which is sometimes essential to preserve.

Fortunately in this case we don't have to change what OS function we use, we just have to change the callback that we pass to CopyFileEx to report a different number. That said, we currently report TotalBytesTransferred which does match up with what the documentation on fs::copy claims which is "On success, the total number of bytes copied is returned.", so there is that point in favor of the status quo.

@stephaneyfx
Copy link
Contributor Author

Sorry if my comment was unclear. I don't mean to change the underlying function called to perform the copy, just the callback implementation to use the stream index and stream byte count. I agree that the implementation of std::fs::copy should go on calling the OS function for the reasons you mentioned. I was just pointing out that if an application requires such a platform-specific behavior (i.e. ensuring all streams in a file are copied), relying on the std function to do so may not be a robust and future-proof solution and would not work correctly when copying a file on a NTFS partition from Linux for example.

@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Sep 17, 2017
@alexcrichton alexcrichton added P-medium Medium priority and removed I-nominated labels Sep 26, 2017
@alexcrichton
Copy link
Member

The libs team discussed this today and the conclusion was that we agreed with sticking with the current implementation but we should figure out how to get the size reported the same one as returned by metadata.

@stephaneyfx
Copy link
Contributor Author

I can work on this if it helps. I have a fix ready and just need to add a unit test.

@alexcrichton
Copy link
Member

Awesome, thanks @stephaneyfx!

Any help is greatly appreciated :)

@stephaneyfx
Copy link
Contributor Author

@alexcrichton, I modified the documentation of fs::copy to clarify that the returned length is the same as the length reported by metadata. Should l also add a platform-specific note regarding the behavior with streams on Windows-NTFS?

@alexcrichton
Copy link
Member

Sure yeah, sounds great!

stephaneyfx added a commit to stephaneyfx/rust that referenced this issue Sep 28, 2017
On Windows with the NTFS filesystem, `fs::copy` would return the sum of the
lengths of all streams, which can be different from the length reported by
`metadata` and thus confusing for users unaware of this NTFS peculiarity.

This makes `fs::copy` return the same length `metadata` reports which is the
value it used to return before PR rust-lang#26751. Note that alternate streams are still
copied; their length is just not included in the returned value.

This change relies on the assumption that the stream with index 1 is always the
main stream in the `CopyFileEx` callback. I could not find any official
document confirming this but empirical testing has shown this to be true,
regardless of whether the alternate stream is created before or after the main
stream.

Resolves rust-lang#44532
bors added a commit that referenced this issue Oct 3, 2017
Made `fs::copy` return the length of the main stream

On Windows with the NTFS filesystem, `fs::copy` would return the sum of the
lengths of all streams, which can be different from the length reported by
`metadata` and thus confusing for users unaware of this NTFS peculiarity.

This makes `fs::copy` return the same length `metadata` reports which is the
value it used to return before PR #26751. Note that alternate streams are still
copied; their length is just not included in the returned value.

This change relies on the assumption that the stream with index 1 is always the
main stream in the `CopyFileEx` callback. I could not find any official
document confirming this but empirical testing has shown this to be true,
regardless of whether the alternate stream is created before or after the main
stream.

Resolves #44532
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants