Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadOnlySequence<T> GetOffset does not function correctly with non-zero starting indices #75866

Open
Tracked by #79006
NinoFloris opened this issue Sep 19, 2022 · 13 comments

Comments

@NinoFloris
Copy link
Contributor

NinoFloris commented Sep 19, 2022

Description

Calling long ReadOnlySequence<T>.GetOffset(SequencePosition position) does not function correctly if the sequence was constructed with a segment/array having a non-zero starting index.

Reproduction Steps

using System;
using System.Buffers;

// Create a sequence with a non-zero starting index, 1 here.
var seq = new ReadOnlySequence<byte>(new byte[10], 1, 9);
// GetOffset: Returns the offset of a position within this sequence from the *start*.
var offset = seq.GetOffset(seq.Start);
if (offset != 0) 
  throw new Exception("Bugged, was: " + offset);

https://dotnetfiddle.net/lpUi0v

Expected behavior

Given the xml docs specifically note "Returns the offset of a position within this sequence from the start" I would expect the offset to be 0 when passing in seq.Start.

We're using GetOffset to Rewind a SequenceReader to a SequencePosition previously saved. SequenceReader is missing a method to do so directly however (would an api proposal for this be accepted?). So we take the offset of the position, subtracting it from Consumed and pass in the expected count to SequenceReader.Rewind.
Care also needs to be taken not to pass in 0, see #68774

Configuration

This is an issue ever since ReadOnlySequence.GetOffset was shipped in #771.
A version of the roundtrip test I specified in the reproduction steps should have been in the test suite.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 19, 2022
@ghost
Copy link

ghost commented Sep 19, 2022

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

long ReadOnlySequence<T>.GetOffset(SequencePosition position) does not function correctly if the sequence was constructed with a segment/array having a non-zero starting index.

Reproduction Steps

using System;
using System.Buffers;

var seq = new ReadOnlySequence<byte>(new byte[10], 1, 10 - 1);
// GetOffset: Returns the offset of a position within this sequence from the *start*.
var offset = seq.GetOffset(seq.Start);
if (offset != 0) 
  throw new Exception("Bugged, was: " + offset);

https://dotnetfiddle.net/lpUi0v

Expected behavior

Effectively the roundtrip test I specified in the reproduction steps should have been in the test suite.

Given the xml docs on specifically note "Returns the offset of a position within this sequence from the start" I would expect the offset to be 0 here.

This does not just happen with seq.Start but any position from that moment on.

Actual behavior

You get the starting index.

Regression?

No response

Known Workarounds

No response

Configuration

This has been bugged ever since ReadOnlySequence was shipped.

Other information

No response

Author: NinoFloris
Assignees: -
Labels:

area-System.Buffers, untriaged

Milestone: -

@NinoFloris
Copy link
Contributor Author

@vcsjones I can make a PR to fix this if this is indeed unexpected behavior.

@vcsjones
Copy link
Member

@NinoFloris I would wait for one of the area owners to weigh in first.

@dakersnar
Copy link
Contributor

Looking into this now

@dakersnar
Copy link
Contributor

dakersnar commented Sep 20, 2022

In the code, what's happening is we are adding positionIndex to our result here https://source.dot.net/#System.Memory/System/Buffers/ReadOnlySequence.cs,580, which is equivalent to seq.Start.GetInteger() which returns 1. This index is relative to the array the ReadOnlySequence was created from, not the ReadOnlySequence itself.

There are a few possibilities here:

  1. The behavior is intended, GetOffset is relative to the start of the source data, not the ReadOnlySequence, and the documentation needs to be clarified.
  2. The documentation is correct, and the behavior is incorrectly returning an offset relative to the source data, so one of the following is the case:
    a. GetOffset should not be adding positionIndex to the result.
    b. seq.Start is initialized incorrectly and should be initialized with an index relative to the ReadOnlySequence (seq.Start.GetInteger() == 0), not the source array (in this case seq.Start.GetInteger() == 1).

@dakersnar dakersnar added this to the Future milestone Sep 20, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 20, 2022
@dakersnar dakersnar added the bug label Sep 20, 2022
@dakersnar
Copy link
Contributor

@NinoFloris After some discussion I think 2.a. is correct here. If you would like to put up a fix (with a unit test preferably), feel free, otherwise I can tackle it. Thanks for pointing out this bug!

@dakersnar dakersnar self-assigned this Sep 22, 2022
@NinoFloris
Copy link
Contributor Author

NinoFloris commented Sep 26, 2022

Additionally, I've found another angle where this goes wrong.

GetOffset in the multi segment branch uses currentSegment.RunningIndex instead of the segment's memory length (minus any starting indices).

The issue here is around advancing on a pipereader, at some point the running index can become fairly large due to readers being created over segments that are still part of a larger buffer segment stack. Given RunningIndex states: "The sum of node length before current." using this for the output of GetOffset doesn't seem correct either.

Screen Shot 2022-09-27 at 00 42 49

Doing GetOffset on Sequence.Start in that cases gives even wilder outputs:
Screen Shot 2022-09-27 at 00 46 00

Sequence.End has a running index of 32768. The output of GetOffset(Sequence.Start) being 32768 - 2 (the actual remaining bytes) is consistent but it obviously does not make sense if I want to know the offset, which is 8192.

And it obviously goes out of bounds of the ReadOnlySequence which only reports a length of 8194.

@dakersnar
Copy link
Contributor

Haven't had time to follow up on this yet. I just wanted to add a small update that I'm no longer confident this is a bug. When I tried to implement 2.a., there were a lot of test cases that failed, and most of them seem to be written in such a way that implies the current behavior is intended.

@NinoFloris
Copy link
Contributor Author

Its leaking a hell of a lot of internal indices etc for it to be intended IMO

@jmelkins
Copy link

jmelkins commented Jan 8, 2024

I was hit by this problem too. The confusion is acute because GetOffset returns a value that depends on the number of items before Sequence.Start in the first memory segment, which may change when the sequence is sliced and is not known by the caller (unless making extra efforts to keep track of it).

I.e. the returned value from GetOffset is non-deterministic from the caller's perspective and of no use on its own. The workaround is to do something like:

SequencePosition currentPosition = ...  // e.g. a saved position or position of SequenceReader
long distance = readOnlySequence.GetOffset(currentPosition) - readOnlySequence.GetOffset(readOnlySequence.Start);
// distance now has the value the caller was expecting to receive from readOnlySequence.GetOffset(currentPosition) alone

The current behaviour may not be ideal but some code may now depend upon it. A few possibilities I thought of for improvement:

  • Change the documentation on GetOffset to describe the actual behaviour and add a warning about requirements for usage.
  • Add a new method on ReadOnlySequence<T> to implement the above workaround e.g.
public long GetDistance(SequencePosition start, SequencePosition end)
  • If a suitable method is added, consider deprecating GetOffset.

@lilinus
Copy link
Contributor

lilinus commented Jul 12, 2024

Another issue with GetOffset returning offset from the original (unsliced) ReadOnlySequence is that it makes ReadOnlySequence.GetPosition(long offset) not roundtripping (or failing).

Example:

var original = new ReadOnlySequence<byte>(new byte[2]);
var slice = original.Slice(1);
var endOffset = slice.GetOffset(slice.End);
var posEndMinusOne = slice.GetPosition(endOffset - 1); // Returns slice.End
var posEnd = slice.GetPosition(endOffset); // throws ArgumentOutOfRangeException

@jeffhandley jeffhandley modified the milestones: 9.0.0, Future Aug 6, 2024
@PranavSenthilnathan PranavSenthilnathan added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed bug labels Nov 12, 2024
@PranavSenthilnathan
Copy link
Member

At the very least, the documentation should be clarified because it is not clear/consistent

I'm not sure about deprecating/removing GetOffset since it's the "inverse" (modulo the round trip issue) of GetPosition. Here is the API request for it where it was added for that purpose: #27158 (the video of the review).

Since it's unlikely we take these APIs away, I think GetDistance is not needed (you can convert to offset and subtract). But feel free to make an API request issue for GetDistance if you think it is still useful for avoiding confusion.

@PranavSenthilnathan PranavSenthilnathan removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants