-
-
Notifications
You must be signed in to change notification settings - Fork 940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a few issues with ShellStream #1322
Conversation
The main change is to replace the Queue<byte> with a byte[] and a couple of variables which index the start and end of the data. The remainder is mainly slightly more careful locking semantics. It also implements Expect(string) separately so that it can work on the bytes and skip a lot of encoding work (this is where I wish ShellStream derived from StreamReader). One possibly contentious point: in fixing the Write behaviour I chose to remove the "outgoing" buffer and immediately send the data across the channel. Write(string) and WriteLine(string) were already doing this, and I felt it was better to change Write(byte[]) to match rather than changing the string methods.
|
||
var match = regex.Match(result); | ||
var bufferText = _encoding.GetString(_buffer, _head, _tail - _head); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To integrate the expectSize, it would probably just be a case of restricting the search space here, e.g.
var expectHead = Math.Max(_head, _tail - _expectSize);
var bufferText = _encoding.GetString(_buffer, expectHead, _tail - expectHead);
The question is whether to put it as a parameter on the method or the constructor (I would probably prefer method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also prefer a method parameter with the default value it will work for everyone without any breaking change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the parameter and named it windowSize
, which felt to me a little more intuitive in meaning.
@jscarle, regretfully these changes (as they stand) reduce most of your recent contributions to be only in spirit rather than in code. I hope you consider the end result to be a net positive nonetheless. These recent changes wouldn't have happened without your motivating persistence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also on this line: unfortunately I think we still need to materialise the entire buffer as a string for Expect(Regex)
in order to correctly deal with multi-byte encoding. (Expect(string)
does not suffer from this because it works directly on the bytes). We can at least still use a reduced window for the regex match itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, what's important is that collaboratively we'll have managed to drastically improve Expect and fix bugs with ShellStream. These are issues that date back years now and made the previous implementation difficult to use, and in some cases impossible to use, in long running scenarios.
As for windowSize, I'd recommend against using the word window in this context. As we've been putting our hands in the plumbing, it may be obvious to us, however to someone consuming the API, it may not. I feel that it is too confusing and may lead people to misinterpret that as having some sort of relationship with the terminal window size.
In terms of performance, I'm glad that although my contribution has literally been obliterated by this PR, at least I was on par with your version in terms of execution time. My allocation improved, but your version is obviously much more efficient.
It would be a good idea to performance test Expect with a very large incoming buffer (in the several MB range) to ensure that there's no regression of the original performance issues that brought me to run a parallel queue to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, windowSize
is probably not a great name here. And I'd already been a bit confused by expectSize
. I'll change it later. So far my best attempt is rollingWindowSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use one of these:
- match
- search
- inspect / inspection
- observation
- peek
- frame
- analysis
- lookahead
- lookbehind
And/or a combination of above with Max, Size, Length, Range, or Bytes. ie: matchLength, peekSize.
Personally, I like matchLength, searchSize, inspect, and inspectSize. Though my most preferred is simply inspect. It's short, consice, and unambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions. I went with lookback
It looks great! I'm patiently waiting for the merge with the develop branch |
Is it ready for review? |
Yes, thanks |
Changing the behavior of the Write method is controversial, but we can try. |
Yeah, I think either edit: it could be flagged in the release notes as a behavioural breaking change |
I propose to do it the other way around. WriteLine could call Flush internally. And Write could buffer. |
Basically, my reasoning is that if I'm consuming ShellStream in my application, when I call WriteLine, I'm done. That's what I wanted to send. While using Write, maybe I have several different Writes to do before I'm done. |
Yep that makes sense, but note that |
I think that continues to make sense. If you're sending a full string, you probably are done as well. I suspect that people who are calling Write repeatedly will do so with byte[] and not strings. |
Thanks. I will restore the write buffer shortly |
Okej so then we will release the new version |
@WojciechNagorski I'd like to submit my PR before you release if you don't mind. |
@jscarle Ok, no problem, just let me know which one. |
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream PipeStream will now behave much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. It should also be a fair bit faster, but sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation.
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream PipeStream now behaves much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. It should also be a fair bit faster, but sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation. Some cleanup of its usage in the library followed.
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream PipeStream now behaves much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. It should also be a fair bit faster, but sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation. Some cleanup of its usage in the library followed.
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream PipeStream now behaves much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. It should also be a fair bit faster, but sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation. Some cleanup of its usage in the library followed.
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream PipeStream now behaves much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. Sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation, but a dumb throughput test (reading and writing simultaneously) shows about 5.2GB/s with this implementation compared to 140MB/s previously. Some cleanup of its usage in the library followed.
Apply similar treatment to PipeStream as #1322 did to ShellStream PipeStream now behaves much more Stream-like. In particular, it performs partial reads (instead of blocking until a certain amount of data is available), blocks until data is available (instead of returning 0 prematurely) and removes the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`. Sadly I gave up trying to make a benchmark compatible with all the quirks of the previous implementation, but a dumb throughput test (reading and writing simultaneously) shows about 5.2GB/s with this implementation compared to 140MB/s previously. Some cleanup of its usage in the library followed. Co-authored-by: Wojciech Nagórski <wojtpl2@gmail.com>
I have not yet rebased this on the expectSize changes merged today, pending further discussion. So there are merge conflicts from the start. Just wanted to get it out there.The main change is to replace the Queue with a byte[] and a couple of variables which index the start and end of the data. The remainder is mainly slightly more careful locking semantics.
One possibly contentious point: in fixing the Write behaviour (#301) I chose to remove the outgoing buffer and immediately send the data across the channel. Write(string) and WriteLine(string) were already doing this, and I felt it was better to change Write(byte[]) to match rather than changing the string methods. This is the reason for the number of deleted test classes for Write.
It also makes the definition of infinite timeout to consistently be -1 milliseconds instead of 0, as is standard in other .NET methods with a wait period. This was not documented previously.
The rest of the changes are hopefully fairly agreeable. Happy to clarify anything.
Here are the (allocation) numbers on 3bfac50, which this is based on:
And here is after:
(for reference, current develop d07827b)
closes #63
closes #453
closes #672
closes #917
closes #180
closes #301
closes #365 (the previous implementation would read for the full timeout on each update, rather than just how long we have left)
closes #714
closes #748
closes #302
closes #1320
closes #923
closes #1024