Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload: asynchronious operations #6614

Merged
merged 2 commits into from
Mar 20, 2019
Merged

Upload: asynchronious operations #6614

merged 2 commits into from
Mar 20, 2019

Conversation

ogoffart
Copy link
Contributor

Implements owncloud/core#31851

@ogoffart ogoffart requested a review from ckamm June 25, 2018 16:18
@ogoffart
Copy link
Contributor Author

(WIP)

@guruz
Copy link
Contributor

guruz commented Jul 5, 2018

@jnweiger btw next time better build with branch name in file name:)
not that it matters much in this case

@ogoffart ogoffart changed the title Upload: assynchronious operations Upload: asynchronious operations Sep 14, 2018
@ogoffart
Copy link
Contributor Author

@PVince81 I see something was merge in the server stable10. Does that mean we should now go forward with this client side?

@ckamm : do you think this is small enough to go into 2.5.1? or should it be in master?

@ogoffart ogoffart changed the base branch from master to 2.5 October 17, 2018 08:46
Copy link
Contributor

@ckamm ckamm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is merged for 2.5.1 it should be in disabled state. (OC-LazyOps=false? env variable toggle?) There are some good tests but this is too much for a patch release imo.

@@ -1029,6 +1029,7 @@ void CleanupPollsJob::slotPollFinished()
deleteLater();
return;
}
_journal->setUploadInfo(job->_item->_file, SyncJournalDb::UploadInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can PollJobs have a "there was an error, don't retry" result in any way? I'm just wondering that no other branch wipes the uploadinfo.

PollJob seems to wipe the pollinfo on non-fatal non-503

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of PollInfo: It's set on startPollJob but only removed in this specific error case. And then when a sync is started, PollJobs are rerun? That means polls run indefinitely and block further syncs as long as the server replies "init" or "started"?

I think the idea is that we can continue polling for completion in case the user aborts the sync? (I think you tested that abort() case in the unittest, excellent!) PollInfos need more documentation about their purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of error, the uploadinfo has to stay so the next sync will deal with it by retrying the final move, or use the upload info to resolve possible conflict, should the file have been updated after all (using checksum)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a server never finishing that always reply "started" will block the sync forever.
Do you think this is a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UploadInfo not being wiped on failure: So if the server wants to communicate "something failed permanently, stop polling", the poll job would fail, we'd try to resend the final MOVE during the next sync and there it'd fail. Alright, that works.

Infinite polling if server replies that way: Should be ok, would be a significant server bug if that happens and working around it is complicated.

});


auto successCallbak = [](TestCase *tc, const QNetworkRequest &request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Callback; decide whether CallBack or Callback

return;
}
_finished = true;
startPollJob(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the overall sync timeout work in the presence of polls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeouts are per jobs. A particular PollJob will timeout after the normal timeout time if there is not traffic on that connection. In this case, it will be re-tried again until it succeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to confirm that if the server needs 30 minutes to finish the upload then the sync will only finish once that last poll is done. (as it should be for 2.5.1)

@PVince81
Copy link
Contributor

note: there's a capability so if you deem suitable, the client could decide to enable the feature if the server does: https://github.com/owncloud/core/pull/32414/files#diff-920833889854d937ebeda804d3eb5b19R52

currently what's in 10.0.10 is in tech preview / alpha stage

@ckamm
Copy link
Contributor

ckamm commented Oct 18, 2018

My current understanding is that this feature is enabled at the server's discretion in the first place: It looks like the client announces its capability to the server with the header, and then it either replies with the poll url or does a synchonous action.

The concern about having it default to off for the patch release is about potential bugs in the client implementation. In addition there could be a way of switching this feature off in case bugs are discovered in the server implementation.

@ogoffart
Copy link
Contributor Author

@ckamm: I've updated the commit: Added comments, and disabled the behavior by default unless setting the env variable OWNCLOUD_LAZYOPS=1

@DeepDiver1975 or @PVince81 : How long does the poll url stay valid, after it once succeed. Imagine this scenario:

  1. The client send a MOVE, the server reply with code 202 and a poll URL.
  2. The server do the operation, the operation is finished
  3. The client request the pull url.
  4. The server send the "finished", but the client crashes before recieving it! (or the connection drops, or the user puts its laptop to sleep)
  5. The client restarts and poll the URL, will that work? What if it is a month later?

Also note that the client will set the OC-LazyOps header for all uploads (non-chunking PUT or chunking MOVE).
We expect the server to be smart about it and not go into asynchronous operation if it know the operation will be fast. Going into asynchronous mode for small PUT would kill performance!

@ckamm
Copy link
Contributor

ckamm commented Oct 18, 2018

minor: commit message says "assynchronious", should be "asynchronous"

@guruz guruz added this to the 2.6.0 milestone Oct 25, 2018
@guruz
Copy link
Contributor

guruz commented Oct 25, 2018

I'm actually fine with having this in 2.5.1 if we enable this feature only by capability. (We kind-of tested this in the wild with a previous customer when it was called asynchronous PUT with poll jobs).
It could be forcible disabled by clients for testing (even if capability is on) by setting OWNCLOUD_LAZYOPS=0

In the server it's disabled by default in config.sample.php so brave server deployers need to enable it anyway in server?

if @ckamm @ogoffart agree please change milestone.

@PVince81 @ogoffart Can we still get answers to the questions above. #6614 (comment)

@PVince81
Copy link
Contributor

I think the server is not smart and always go to lazy mode when requested.

Not sure how long the poll URL is available, @DeepDiver1975 ?

@ogoffart ogoffart changed the base branch from 2.5 to master March 14, 2019 09:39
This was not required with 2.5 because a size of 0 was ignorted when comparing
size by the csync updater, to be compatible with very old version of the database.
But the we discovery will still think the file is changed if the database contains
a size of 0
@ogoffart
Copy link
Contributor Author

Rebased and fixed for 2.6

@guruz guruz merged commit 269a7b6 into master Mar 20, 2019
@guruz guruz deleted the amd branch March 20, 2019 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants