
Fix ReadAheadRemoteFileInputStream not reading the whole file if a buffer is too big #769

Merged
merged 1 commit into hierynomus:master on Mar 4, 2022

Conversation

vladimirlagunov
Contributor

Before this change, wrapping a ReadAheadRemoteFileInputStream in a BufferedInputStream with a big buffer made the SSH client request big packets from the server. If the server sent a response smaller than requested, the client did not adjust to the decreased window size and read the file incorrectly.

This change detects cases where the server cannot fulfil the client's requests. With this change, the client adjusts the maximum request length, sends new read-ahead requests, and ignores all read-ahead requests sent earlier.

Just specifying some allegedly safe, small constant buffer size wouldn't help in all possible cases: there is no way for a client to explicitly query the maximum request length, and the limits differ from server to server. For instance, OpenSSH defines SFTP_MAX_MSG_LENGTH as 256 * 1024, while Apache SSHD defines MAX_READDATA_PACKET_LENGTH as 63 * 1024 and allows that size to be redefined.

Interestingly, a similar issue #183 was fixed many years ago, but the bug was actually in the code introduced for that fix.
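The shrink-and-discard idea can be sketched in a few lines. This is a hedged illustration with hypothetical names (Request, onResponse, requestOffset), not the actual sshj implementation: when a response carries fewer bytes than requested, the client remembers the smaller limit, drops every in-flight read-ahead request issued with the old limit, and resumes right after the short chunk.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the adjustment; not the sshj source.
public class ReadAheadSketch {
    public static final class Request {
        public final long offset;
        public final int length;
        public Request(long offset, int length) { this.offset = offset; this.length = length; }
    }

    public int maxRequestLength = 1024 * 1024;              // start optimistic
    public long requestOffset = 0;                          // next offset to read ahead from
    public final Queue<Request> unconfirmed = new ArrayDeque<>();

    // Called when the server answers `req` with `received` bytes of data.
    public int onResponse(Request req, int received) {
        if (received < req.length) {
            maxRequestLength = received;                    // the server's real limit
            unconfirmed.clear();                            // ignore stale read-aheads
            requestOffset = req.offset + received;          // resume after the short chunk
        }
        return maxRequestLength;
    }
}
```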

@codecov-commenter

codecov-commenter commented Mar 4, 2022

Codecov Report

Merging #769 (cb9acdf) into master (50efeb6) will increase coverage by 0.44%.
The diff coverage is 85.18%.


@@             Coverage Diff              @@
##             master     #769      +/-   ##
============================================
+ Coverage     63.93%   64.37%   +0.44%     
- Complexity     1452     1458       +6     
============================================
  Files           211      211              
  Lines          8617     8615       -2     
  Branches        811      806       -5     
============================================
+ Hits           5509     5546      +37     
+ Misses         2677     2645      -32     
+ Partials        431      424       -7     
Impacted Files Coverage Δ
...rc/main/java/net/schmizz/sshj/sftp/RemoteFile.java 55.00% <85.18%> (+14.15%) ⬆️
...net/schmizz/sshj/transport/TransportException.java 33.33% <0.00%> (-11.12%) ⬇️
...ain/java/net/schmizz/sshj/common/SSHException.java 61.29% <0.00%> (-6.46%) ⬇️
...va/net/schmizz/sshj/connection/ConnectionImpl.java 59.34% <0.00%> (+3.25%) ⬆️
.../main/java/net/schmizz/sshj/sftp/PacketReader.java 95.74% <0.00%> (+4.25%) ⬆️
...zz/sshj/connection/channel/ChannelInputStream.java 86.15% <0.00%> (+10.76%) ⬆️
...t/schmizz/sshj/connection/ConnectionException.java 33.33% <0.00%> (+11.11%) ⬆️
...va/net/schmizz/sshj/connection/channel/Window.java 68.75% <0.00%> (+14.58%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vladimirlagunov
Copy link
Contributor Author

After writing this simple piece of code, I stumbled upon a question: why do I remove already received data chunks when the window is adjusted? Why not reuse them later? The answer came after a few hours of hacking. Indeed, it would be cool to build something as sophisticated as an interval tree. However, if we keep data chunks that can only be consumed after many other, smaller requests, the number of unconfirmed reads stays at maxUnconfirmedReads - 1 for a long time, and the read-ahead algorithm won't work as expected. Another option might be increasing maxUnconfirmedReads, but such behaviour would be unfair and unexpected.


    private final byte[] b = new byte[1];

    private final int maxUnconfirmedReads;
    private final long readAheadLimit;
-   private final Queue<Promise<Response, SFTPException>> unconfirmedReads = new LinkedList<Promise<Response, SFTPException>>();
-   private final Queue<Long> unconfirmedReadOffsets = new LinkedList<Long>();
+   private final Queue<UnconfirmedRead> unconfirmedReads = new LinkedList<>();
JdkObsolete: It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not. (details)

        this.readAheadLimit = readAheadLimit > 0 ? fileOffset + readAheadLimit : Long.MAX_VALUE;
    }

    private ByteArrayInputStream pending = new ByteArrayInputStream(new byte[0]);

    private boolean retrieveUnconfirmedRead(boolean blocking) throws IOException {
-       if (unconfirmedReads.size() <= 0) {
+       final UnconfirmedRead unconfirmedRead = unconfirmedReads.peek();
+       if (unconfirmedRead == null || !blocking && !unconfirmedRead.getPromise().isDelivered()) {
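Filled out as a self-contained, runnable sketch (stand-in types: a CompletableFuture plays the role of sshj's Promise, and the names mirror the snippet above; this is not the sshj source), the check reads roughly like this, with explicit grouping parentheses around the non-blocking condition:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Stand-in model of the retrieval step; not the sshj source.
public class RetrieveSketch {
    public static final class UnconfirmedRead {
        private final CompletableFuture<byte[]> promise = new CompletableFuture<>();
        public CompletableFuture<byte[]> getPromise() { return promise; }
    }

    public final Queue<UnconfirmedRead> unconfirmedReads = new ArrayDeque<>();

    // Take the head of the queue only if it is delivered, or if blocking is allowed.
    public boolean retrieveUnconfirmedRead(boolean blocking) {
        final UnconfirmedRead head = unconfirmedReads.peek();
        if (head == null || (!blocking && !head.getPromise().isDone())) {
            return false;                        // nothing ready; caller may issue more read-aheads
        }
        unconfirmedReads.remove();
        byte[] data = head.getPromise().join();  // would feed the pending buffer in sshj
        return data != null;
    }
}
```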
OperatorPrecedence: Use grouping parenthesis to make the operator precedence explicit (details)

@hierynomus hierynomus merged commit 9a939d0 into hierynomus:master Mar 4, 2022
@hierynomus
Owner

Thanks for the PR @vladimirlagunov. High quality as always!

vladimirlagunov added a commit to vladimirlagunov/sshj that referenced this pull request Apr 6, 2022
…oteFile

Due to a bug in logic introduced by hierynomus#769, RemoteFile.ReadAheadRemoteFileInputStream started to send new read ahead requests for file parts that had already been requested.

Every call to read() asked the server to send parts of the file starting from the point that had already been downloaded, when it should have requested the parts after the last requested part. This commit adds exactly that logic.

The bug didn't cause content corruption. It only affected performance, both on servers and on clients.
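The corrected issuing loop can be sketched as follows. All names here (fillReadAheadWindow, Range, sent) are hypothetical, chosen for illustration: the point is that the next request starts where the last requested range ended, not at the last downloaded byte, so no part of the file is requested twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the fixed read-ahead loop; not the sshj source.
public class ReadAheadLoop {
    public static final class Range {
        public final long offset;
        public final int length;
        public Range(long offset, int length) { this.offset = offset; this.length = length; }
    }

    public final List<Range> sent = new ArrayList<>();   // stands in for in-flight read requests
    public long requestOffset;                           // end of the last *requested* range
    public final int maxRequestLength = 32 * 1024;
    public final int maxUnconfirmedReads = 4;

    public ReadAheadLoop(long start) { this.requestOffset = start; }

    // Issue read-ahead requests until the in-flight limit or the read-ahead limit is hit.
    public void fillReadAheadWindow(long readAheadLimit) {
        while (sent.size() < maxUnconfirmedReads) {
            int len = (int) Math.min(maxRequestLength, readAheadLimit - requestOffset);
            if (len <= 0) break;
            sent.add(new Range(requestOffset, len));     // "send" the request
            requestOffset += len;                        // advance past the requested part
        }
    }
}
```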
hierynomus pushed a commit that referenced this pull request Apr 7, 2022