-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue-2214 Create File.ReadLinesAsync() #66492
issue-2214 Create File.ReadLinesAsync() #66492
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsFix #2214 , will mark it as ready when it is ready, thanks. But there are some notes, please correct my understand on its behavior in advance:
|
I feel like ReadAllLines should be optimized to use |
Why do you believe that's an improvement? |
Because it looks like it duplicates the logic from ReadLines() in part of the ReadAllLines() function. |
There's a maintainability argument, but it's not an optimization. ReadLines() strictly has more overhead. |
} | ||
} | ||
|
||
internal sealed class AsyncReadLinesEnumerator : IAsyncEnumerator<string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't implement this manually; as written the implementation is missing out on some important optimizations. We should instead use the async iterator support provided by the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, and this iterator has following behaviors, please check if it is our expected ones (they should be same behaviors of ReadLines
though):
Check and throw file related exception if occur upfront before GetEnumerator
So that leaking first stream reader if not even foreach
Share one stream reader for multiple enumerator
until first round iteration finishes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @stephentoub This PR (with async iterator by compiler as suggested) is ready for review now, please review, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lateapexearlyspeed thank you for another great contribution! PTAL at my comments, I would like to get a better understanding of why the StreamReader
can be disposed an re-created more than once
src/libraries/System.IO.FileSystem/tests/File/ReadWriteAllLinesAsync.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/tests/File/ReadWriteAllLinesAsync.cs
Outdated
Show resolved
Hide resolved
|
||
private static async IAsyncEnumerable<string> IterateFileLinesAsync(DisposingFlagStreamReader sr, string path, Encoding encoding, CancellationToken ctEnumerable, [EnumeratorCancellation] CancellationToken ctEnumerator = default) | ||
{ | ||
if (sr.IsDisposed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what scenario is it possible for the DisposingFlagStreamReader
to be disposed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL at my comments, I would like to get a better understanding of why the
StreamReader
can be disposed an re-created more than once
Answer together here :)
in what scenario is it possible for the
DisposingFlagStreamReader
to be disposed here?
Before start this code, I checked existing IEnumerable File.ReadLines() related code and related comment and got some background that we would move first streamReader
allocation before involve Enumerator
(so pass streamReader to this IterateFileLinesAsync()
) so that exceptions such as DirectoryNotFoundException and FileNotFoundException are thrown directly by File.ReadLinesAsync
(which the user probably expects), rather than throw until call Enumerator.MoveNext
.
And there was comment to suggest using compiler-generated Iterator to implement.
Because generated enumerable stores first time passed streamReader reference, so here we check if current streamReader reference has been disposed already for scenario: when previous user has consumed all lines and then call GetEnumerator
against same enumerable).
IAsyncEnumerable<string> enumerable = File.ReadLinesAsync("path");
await foreach (string line in enumerable)
{
// consume line;
} // streamReader is disposed now
// can use same enumerable to create another enumerator and work properly
await foreach (string line in enumerable)
{
// consume line;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lateapexearlyspeed thanks for the explanation! Now I can see that we do something similar for File.ReadLines
:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/ReadLinesIterator.cs
Lines 64 to 70 in a077f27
// NOTE: To maintain the same behavior with the previous yield-based | |
// iterator in 4.0, we have all the IEnumerator<T> instances share the same | |
// underlying reader. If we have already been disposed, _reader will be null, | |
// which will cause CreateIterator to simply new up a new instance to start up | |
// a new iteration. We cannot change this behavior due to compatibility | |
// concerns. | |
return CreateIterator(_path, _encoding, _reader); |
I wonder whether we need a new type for DisposingFlagStreamReader
. The ReadLinesAsync
method could create a FileStream
and pass it here and then we could use FileStream.CanRead
to figure out whether it has been disposed or not:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/OSFileStreamStrategy.cs
Line 73 in a077f27
public sealed override bool CanRead => !_fileHandle.IsClosed && (_access & FileAccess.Read) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub just to be extra sure I assume that we want to have this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by using FileStream.CanRead
. Use StreamReader.BaseStream
to get inner fileStream so can pass and reuse StreamReader method to read lines.
too bad we cant avoid the disposable and read it's contents line for line directly into a span and return it as an array (using async with linq). |
sr = new DisposingFlagStreamReader(path, encoding); | ||
} | ||
|
||
using (sr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be async?
Also note: this could be changed from block using form to normal using form from here (after it leaves scope it will be disposed).
@@ -343,7 +343,7 @@ public static IAsyncEnumerable<string> ReadLinesAsync(string path, Encoding enco | |||
{ | |||
Validate(path, encoding); | |||
|
|||
var sr = new DisposingFlagStreamReader(path, encoding); // Move first streamReader allocation here so to throw related file exception upfront, which will cause known leaking if user never actually foreach's over the enumerable | |||
StreamReader sr = AsyncStreamReader(path, encoding); // Move first streamReader allocation here so to throw related file exception upfront, which will cause known leaking if user never actually foreach's over the enumerable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if they used linq query/method style with the function instead of foreach (and they use .ToList() at the end of it to execute the linq query/methods)? Will it leak in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @lateapexearlyspeed !
Fix #2214 , will mark it as ready when it is ready, thanks.
But there are some notes, please correct my understand on its behavior in advance:
"File not found" related exceptions will be thrown at beginning of
ReadLinesAsync()
which is same asReadLines()
, but followings are different withReadLines()
if I assume they are different method so not consider compatible with 4.0:IEnumerator<T>
instances from the sameIEnumerable<T>
will not share same underlying reader