
Add with_tokio_runtime to HTTP stores #4040

Closed
wants to merge 5 commits

Conversation

@tustvold (Contributor) commented Apr 9, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

This allows isolating IO onto a separate thread pool, and allows using object_store outside of a tokio context.
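For illustration, a minimal sketch of how this could be used, assuming the existing `HttpBuilder` from `object_store::http`; the `with_tokio_runtime` method name comes from this PR's title and its exact signature is an assumption:

```rust
use object_store::http::HttpBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dedicated runtime whose worker threads only perform network IO
    let io_runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;

    // Hypothetical: request futures are spawned onto `io_runtime`, so the
    // store can also be used from threads not managed by tokio.
    let _store = HttpBuilder::new()
        .with_url("https://example.com")
        .with_tokio_runtime(io_runtime.handle().clone())
        .build()?;

    Ok(())
}
```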

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the object-store (Object Store Interface) label Apr 9, 2023
use tracing::info;

#[derive(Debug)]
tustvold (Contributor Author)

This module is not public, and so these changes are not breaking

/// This is unlike the public [`ClientOptions`](crate::ClientOptions) which contains just
/// the properties used to construct [`Client`](reqwest::Client)
#[derive(Debug, Clone, Default)]
pub struct ClientConfig {
tustvold (Contributor Author)

The separation of ClientOptions and ClientConfig is perhaps a little contrived, but ClientConfig is a crate-private implementation detail and so I think this is fine.


match config.runtime.as_ref() {
Some(handle) => handle
.spawn(fut)
tustvold (Contributor Author)

It is worth highlighting that this only spawns the code that generates the Response; streaming the Response can and will take place in the calling context. This is perfectly acceptable, as the mio reactor registration will already have occurred, and the futures plumbing is runtime agnostic.
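A minimal sketch of the pattern being described, assuming a crate-private config holding an optional `tokio::runtime::Handle` (names are illustrative, not the exact code in this PR):

```rust
use reqwest::{Client, Request, Response};
use tokio::runtime::Handle;

/// Execute `request`, spawning the part that produces the `Response` onto the
/// optional IO runtime. The body stream inside the returned `Response` is
/// still polled by the caller, on whatever runtime the caller lives in.
async fn execute(
    client: &Client,
    runtime: Option<&Handle>,
    request: Request,
) -> Result<Response, Box<dyn std::error::Error + Send + Sync>> {
    let fut = client.execute(request);
    match runtime {
        // Connection setup and the header exchange happen on the IO runtime;
        // the socket is registered with that runtime's (mio) reactor here.
        Some(handle) => Ok(handle.spawn(fut).await??),
        // No dedicated runtime configured: run everything in the caller's context.
        None => Ok(fut.await?),
    }
}
```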

crepererum (Contributor)

It might be worth putting your entire comment into the code as a code comment.

crepererum (Contributor)

I'm not sure I follow your argument here. The underlying socket is registered with the IO runtime, and so is the mio reactor. However, we still cross-poll. So is our assumption that, when polling data from the IO runtime to which mio has just written, mio will never change its mind and "jump" to another runtime?

#[tokio::test]
async fn http_test() {
/// Deletes any directories left behind from previous tests
async fn cleanup_directories(integration: &HttpStore) {
tustvold (Contributor Author)

This is necessary because we now run the test twice, and the directories left behind cause tests of list_with_delimiter to fail.

I have confirmed that this behaviour of returning common prefixes for empty directories is consistent with LocalFileSystem. The reason we don't run into this with LocalFileSystem is that it creates a new temp directory for each test.

@crepererum (Contributor) left a comment

I think this is OK, but IMHO quite risky since it assumes behavior that I'm not sure counts as a stable interface.


@tustvold marked this pull request as draft April 13, 2023 16:12
@tustvold (Contributor Author) commented Apr 13, 2023

Marking this as a draft whilst I think a bit more on it. Another option might be to do something similar to https://docs.rs/async-compat/latest/async_compat/ and return decorated types.
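For reference, a rough sketch of what "decorated types" could look like: a wrapper future that enters a chosen runtime's context on every poll, similar in spirit to async-compat (this is illustrative, not the crate's actual implementation):

```rust
use std::{future::Future, pin::Pin, task::{Context, Poll}};
use tokio::runtime::Handle;

/// Future wrapper that enters `handle`'s runtime context whenever it is polled,
/// so IO and timers created by `inner` bind to that runtime's reactor.
struct WithRuntime<F> {
    handle: Handle,
    inner: F,
}

impl<F: Future + Unpin> Future for WithRuntime<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let handle = self.handle.clone();
        // Keep the runtime context entered for the duration of this poll
        let _guard = handle.enter();
        Pin::new(&mut self.inner).poll(cx)
    }
}
```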

@tustvold (Contributor Author) commented Sep 18, 2023

Looping back to this I think this problem is ill-formed. There are two major use-cases for this functionality:

  1. Supporting object_store on threads not managed by tokio
  2. Supporting object_store in systems containing multiple thread pools with different tail latencies

The first use-case is better served by integrating tokio at a higher level, e.g. using Handle::enter at the thread level.
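A sketch of that first option, using `tokio::runtime::Handle` to let a thread outside of tokio drive object_store calls (the names here are illustrative):

```rust
use tokio::runtime::Handle;

/// Runs on a plain OS thread that is not managed by tokio.
fn non_tokio_thread(handle: Handle) {
    // Make the runtime context available to code on this thread, so libraries
    // that register IO resources or call `Handle::current()` find this runtime.
    let _guard = handle.enter();

    // Drive individual async calls to completion from this thread; the actual
    // IO is performed by the runtime behind `handle`.
    handle.block_on(async {
        // e.g. `store.get(&path).await` for some `Arc<dyn ObjectStore>`
    });
}
```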

It is unclear how to handle the second use-case at a library level. The use of a second threadpool implies that the primary threadpool may have very high tail latencies. The problem is determining at what point this should result in back pressure on the underlying TCP connection. As written, this PR will not change the way that this backpressure occurs: should the task not get scheduled on the high-tail-latency threadpool, nothing will drain the TCP socket, and TCP backpressure will occur. The approach in #4015 instead uses a queue with capacity for a single chunk, which will delay this TCP backpressure very slightly. You could increase the queue size, or build a more sophisticated queue that buffers a given number of bytes, but it is unclear how users would control this buffering behaviour.
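For concreteness, a simplified sketch of the queue-based approach described above (modelled loosely on the single-chunk queue idea; the names and exact plumbing are assumptions):

```rust
use bytes::Bytes;
use futures::{Stream, StreamExt};
use tokio::{runtime::Handle, sync::mpsc};
use tokio_stream::wrappers::ReceiverStream;

/// Drain `body` on the IO runtime and hand chunks to the caller through a
/// bounded channel. The channel capacity decides how much is buffered before
/// the body stops being polled and TCP backpressure reaches the server.
fn bridge_stream(
    io_runtime: &Handle,
    mut body: impl Stream<Item = reqwest::Result<Bytes>> + Send + Unpin + 'static,
) -> impl Stream<Item = reqwest::Result<Bytes>> {
    // Capacity 1: at most one chunk sits between the two runtimes.
    let (tx, rx) = mpsc::channel(1);
    io_runtime.spawn(async move {
        while let Some(chunk) = body.next().await {
            // If the consumer is slow this `send` waits, the body is not
            // polled, the socket is not drained, and backpressure propagates.
            if tx.send(chunk).await.is_err() {
                break; // receiver dropped, stop draining
            }
        }
    });
    ReceiverStream::new(rx)
}
```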

Taking a step back, this feels like the wrong way to solve this problem: ultimately IO should be segregated from compute at a meaningful application task boundary, rather than at the object_store interface. For example, AsyncFileReader::get_bytes could perform the IO to fetch a given chunk of data on a separate thread pool. This avoids object_store having to make decisions about how much buffering is too much, etc.
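A sketch of that boundary, assuming an `ObjectStore` handle and a dedicated IO runtime (the function stands in for something like the parquet crate's `AsyncFileReader::get_bytes`; it is not that trait's actual implementation):

```rust
use std::{ops::Range, sync::Arc};
use bytes::Bytes;
use object_store::{path::Path, ObjectStore};
use tokio::runtime::Handle;

/// Fetch a byte range entirely on a dedicated IO runtime, so the caller's
/// (possibly high tail latency) compute pool never touches the socket.
async fn get_bytes(
    store: Arc<dyn ObjectStore>,
    path: Path,
    range: Range<usize>,
    io_runtime: Handle,
) -> object_store::Result<Bytes> {
    io_runtime
        // The whole request, including draining the response body, runs on the
        // IO runtime; only the final `Bytes` crosses back to the caller.
        .spawn(async move { store.get_range(&path, range).await })
        .await
        .expect("IO task panicked")
}
```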

I am therefore going to close this PR.

Labels
object-store Object Store Interface

2 participants