Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue-2214 Create File.ReadLinesAsync() #66492

Conversation

lateapexearlyspeed
Copy link
Contributor

Fix #2214 , will mark it as ready when it is ready, thanks.

But there are some notes, please correct my understand on its behavior in advance:
"File not found" related exceptions will be thrown at beginning of ReadLinesAsync() which is same as ReadLines(), but followings are different with ReadLines() if I assume they are different method so not consider compatible with 4.0:

  • IEnumerator<T> instances from the same IEnumerable<T> will not share same underlying reader
  • Not dispose just after read to end of stream reader.

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Mar 11, 2022
@ghost
Copy link

ghost commented Mar 11, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix #2214 , will mark it as ready when it is ready, thanks.

But there are some notes, please correct my understand on its behavior in advance:
"File not found" related exceptions will be thrown at beginning of ReadLinesAsync() which is same as ReadLines(), but followings are different with ReadLines() if I assume they are different method so not consider compatible with 4.0:

  • IEnumerator<T> instances from the same IEnumerable<T> will not share same underlying reader
  • Not dispose just after read to end of stream reader.
Author: lateapexearlyspeed
Assignees: -
Labels:

area-System.IO, new-api-needs-documentation

Milestone: -

@lateapexearlyspeed lateapexearlyspeed changed the title issue-2214 Initial commit: create an AsyncReadLinesEnumerable. issue-2214 Create File.ReadLinesAsync() Mar 11, 2022
@AraHaan
Copy link
Member

AraHaan commented Mar 11, 2022

I feel like ReadAllLines should be optimized to use ReadLines() and then call ToArray() on that, same with the async version of it.

@stephentoub
Copy link
Member

I feel like ReadAllLines should be optimized to use ReadLines() and then call ToArray() on that, same with the async version of it.

Why do you believe that's an improvement?

@AraHaan
Copy link
Member

AraHaan commented Mar 12, 2022

I feel like ReadAllLines should be optimized to use ReadLines() and then call ToArray() on that, same with the async version of it.

Why do you believe that's an improvement?

Because it looks like it duplicates the logic from ReadLines() in part of the ReadAllLines() function.

@stephentoub
Copy link
Member

I feel like ReadAllLines should be optimized to use ReadLines() and then call ToArray() on that

Why do you believe that's an improvement?

Because it looks like it duplicates the logic

There's a maintainability argument, but it's not an optimization. ReadLines() strictly has more overhead.

}
}

internal sealed class AsyncReadLinesEnumerator : IAsyncEnumerator<string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't implement this manually; as written the implementation is missing out on some important optimizations. We should instead use the async iterator support provided by the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, and this iterator has following behaviors, please check if it is our expected ones (they should be same behaviors of ReadLines though):

Check and throw file related exception if occur upfront before GetEnumerator
So that leaking first stream reader if not even foreach
Share one stream reader for multiple enumerator until first round iteration finishes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stephentoub This PR (with async iterator by compiler as suggested) is ready for review now, please review, thanks.

@lateapexearlyspeed lateapexearlyspeed marked this pull request as ready for review March 22, 2022 09:19
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lateapexearlyspeed thank you for another great contribution! PTAL at my comments, I would like to get a better understanding of why the StreamReader can be disposed an re-created more than once


private static async IAsyncEnumerable<string> IterateFileLinesAsync(DisposingFlagStreamReader sr, string path, Encoding encoding, CancellationToken ctEnumerable, [EnumeratorCancellation] CancellationToken ctEnumerator = default)
{
if (sr.IsDisposed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in what scenario is it possible for the DisposingFlagStreamReader to be disposed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL at my comments, I would like to get a better understanding of why the StreamReader can be disposed an re-created more than once

Answer together here :)

in what scenario is it possible for the DisposingFlagStreamReader to be disposed here?

Before start this code, I checked existing IEnumerable File.ReadLines() related code and related comment and got some background that we would move first streamReader allocation before involve Enumerator (so pass streamReader to this IterateFileLinesAsync()) so that exceptions such as DirectoryNotFoundException and FileNotFoundException are thrown directly by File.ReadLinesAsync (which the user probably expects), rather than throw until call Enumerator.MoveNext.
And there was comment to suggest using compiler-generated Iterator to implement.

Because generated enumerable stores first time passed streamReader reference, so here we check if current streamReader reference has been disposed already for scenario: when previous user has consumed all lines and then call GetEnumerator against same enumerable).

            IAsyncEnumerable<string> enumerable = File.ReadLinesAsync("path");
            await foreach (string line in enumerable)
            {
                // consume line;
            } // streamReader is disposed now

            // can use same enumerable to create another enumerator and work properly
            await foreach (string line in enumerable)
            {
                // consume line;
            }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lateapexearlyspeed thanks for the explanation! Now I can see that we do something similar for File.ReadLines:

// NOTE: To maintain the same behavior with the previous yield-based
// iterator in 4.0, we have all the IEnumerator<T> instances share the same
// underlying reader. If we have already been disposed, _reader will be null,
// which will cause CreateIterator to simply new up a new instance to start up
// a new iteration. We cannot change this behavior due to compatibility
// concerns.
return CreateIterator(_path, _encoding, _reader);

I wonder whether we need a new type for DisposingFlagStreamReader. The ReadLinesAsync method could create a FileStream and pass it here and then we could use FileStream.CanRead to figure out whether it has been disposed or not:

public sealed override bool CanRead => !_fileHandle.IsClosed && (_access & FileAccess.Read) != 0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub just to be extra sure I assume that we want to have this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by using FileStream.CanRead. Use StreamReader.BaseStream to get inner fileStream so can pass and reuse StreamReader method to read lines.

@adamsitnik adamsitnik self-assigned this Apr 26, 2022
@AraHaan
Copy link
Member

AraHaan commented Apr 27, 2022

too bad we cant avoid the disposable and read it's contents line for line directly into a span and return it as an array (using async with linq).

sr = new DisposingFlagStreamReader(path, encoding);
}

using (sr)
Copy link
Member

@AraHaan AraHaan Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be async?

Also note: this could be changed from block using form to normal using form from here (after it leaves scope it will be disposed).

@@ -343,7 +343,7 @@ public static IAsyncEnumerable<string> ReadLinesAsync(string path, Encoding enco
{
Validate(path, encoding);

var sr = new DisposingFlagStreamReader(path, encoding); // Move first streamReader allocation here so to throw related file exception upfront, which will cause known leaking if user never actually foreach's over the enumerable
StreamReader sr = AsyncStreamReader(path, encoding); // Move first streamReader allocation here so to throw related file exception upfront, which will cause known leaking if user never actually foreach's over the enumerable
Copy link
Member

@AraHaan AraHaan Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they used linq query/method style with the function instead of foreach (and they use .ToList() at the end of it to execute the linq query/methods)? Will it leak in that case?

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @lateapexearlyspeed !

@adamsitnik adamsitnik merged commit 8b6f09a into dotnet:main Apr 29, 2022
@adamsitnik adamsitnik added this to the 7.0.0 milestone Apr 29, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API Proposal: File.ReadLinesAsync
4 participants