rewrite cp #498

Open
wants to merge 11 commits into
base: main
Conversation

ayushkamat
Contributor

No description provided.

Signed-off-by: Ayush Kamat <ayush@latch.bio>
try:
    parent.mkdir(exist_ok=True, parents=True)
    break
except NotADirectoryError:  # somewhere up the tree is a file
Contributor

huh, shouldn't this eat shit?

Contributor Author

not sure what this means

Comment on lines 63 to 71
    acc_info = execute(gql.gql("""
        query AccountInfo {
            accountInfoCurrent {
                id
            }
        }
    """))["accountInfoCurrent"]

    for src in srcs:
        src_remote = is_remote_path(src)
        acc_id = acc_info["id"]
Contributor

what is the purpose of this?

Contributor Author

It's used by the get_path_error function in the except block to print a nicer error message.

Contributor

I would definitely only do this query if we need to (if there is an error)

Contributor

I would entirely avoid network stuff in the error path if possible.
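
The first suggestion (run the query only when an error actually occurs) could be sketched as a cached, lazy lookup. This is a minimal illustration, not the PR's code: `fetch_account_id`, `copy_one`, and `copy_all` are hypothetical names, and the placeholder return value stands in for the real AccountInfo query. Note that it still makes a network call on the error path, which the second comment argues against.

```python
from functools import lru_cache
from typing import List


@lru_cache(maxsize=1)
def fetch_account_id() -> str:
    """Hypothetical stand-in for the AccountInfo GraphQL query."""
    return "acc-123"  # placeholder value, not a real account id


def copy_one(src: str) -> None:
    """Hypothetical stand-in for the per-source copy step."""
    if src.startswith("bad"):
        raise FileNotFoundError(src)


def copy_all(srcs: List[str]) -> List[str]:
    """Copy every source; look up the account id only when a copy fails."""
    errors = []
    for src in srcs:
        try:
            copy_one(src)
        except FileNotFoundError:
            # lru_cache ensures the lookup runs at most once, and only here.
            errors.append(f"{src}: not found for account {fetch_account_id()}")
    return errors
```

With this shape, a run with no failures never performs the lookup, and a run with many failures performs it once.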

Comment on lines 134 to 135
# jitter to not dos nuc-data
await asyncio.sleep(0.1 * random.random())
Contributor

Kinda odd. What did we see here before?

Contributor Author

For wide directories with small files, there's not enough time between start-upload calls, so we end up throttling nuc-data.

Contributor

But like shouldn't we use a semaphore on the call rather than adding jitter? Jitter is worse because it is not aware of how many calls are inflight or how long they are taking.
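
The semaphore approach suggested above could look roughly like the following. This is a sketch under assumptions: `start_upload` and `upload_all` are hypothetical stand-ins, the sleep simulates the network round trip, and the limit of 5 is an arbitrary illustrative value. The `stats` dict exists only to make the concurrency bound observable.

```python
import asyncio
from typing import Dict, List

MAX_IN_FLIGHT = 5  # assumed limit; tune against what the service tolerates


async def start_upload(path: str, sema: asyncio.Semaphore, stats: Dict[str, int]) -> str:
    """Hypothetical stand-in for the start-upload request."""
    async with sema:  # waits here while MAX_IN_FLIGHT calls are already running
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulates the network round trip
        stats["in_flight"] -= 1
    return path


async def upload_all(paths: List[str]) -> Dict[str, int]:
    """Start every upload at once, letting the semaphore bound concurrency."""
    sema = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(start_upload(p, sema, stats) for p in paths))
    return stats
```

Unlike a fixed random sleep, the bound adapts on its own: slow responses hold their slots longer, so fewer new calls are issued while the service is struggling.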

Comment on lines 168 to 193
# exception handling
resp = await sess.post(
    "https://nucleus.latch.bio/ldata/end-upload",
    headers={"Authorization": get_auth_header()},
    json={
        "path": work.dest,
        "upload_id": data["upload_id"],
        "parts": [
            {
                "ETag": part.etag,
                "PartNumber": part.part,
            }
            for part in parts
        ],
    },
)
resp.raise_for_status()

if print_file_on_completion:
    pbar.write(work.src.name)

pbar.reset()
total_pbar.update(1)

pbar.clear()

Contributor

you might want to do a smarter backoff with more tries given that we can 429 on this

Contributor Author

Makes sense for more retries. Two questions:

  1. What is a smarter backoff method? I'm not super familiar with any other than exponential.
  2. What does this backoff method lack that a smarter method would address?

Contributor

this one is kinda unbounded in retries but there should be a maximum. Ideally, we have a semaphore which bounds the number of concurrent calls to nuc-data and then backoffs are less important and we can keep as is.
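
For reference, one common way to bound an exponential backoff is to cap both the number of retries and the per-retry delay, and to randomize each delay so that many clients do not retry in lockstep. The sketch below only computes the delay schedule. `backoff_delays` and its default constants are assumptions for illustration, not values from this PR.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 10.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the sleep before each retry: capped exponential with full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2**attempt)  # exponential growth, bounded by cap
        delays.append(rng.uniform(0, ceiling))  # jitter spreads out simultaneous retries
    return delays
```

Because the list has a fixed length, a caller that sleeps through it and then gives up can never retry forever.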

Signed-off-by: Ayush Kamat <ayush@latch.bio>
Comment on lines 44 to 46
    "https://nucleus.latch.bio/ldata/start-upload": asyncio.BoundedSemaphore(2),
    "https://nucleus.latch.bio/ldata/end-upload": asyncio.BoundedSemaphore(2),
}
Contributor

You're probably fine with something like 5 or 10 each.

Comment on lines 71 to 72
start_upload_sema = asyncio.BoundedSemaphore(2)
end_upload_sema = asyncio.BoundedSemaphore(2)
Contributor

beefier?

Comment on lines 149 to 153
if resp.status == 429:
    raise RateLimitExceeded(
        "The service is currently under load and could not complete your"
        " request - please try again later."
    )
Contributor

Wait, this should just back off and retry. Why are we failing here?

Comment on lines 196 to 200
if resp.status == 429:
    raise RateLimitExceeded(
        "The service is currently under load and could not complete your"
        " request - please try again later."
    )
Contributor

very odd to die here.
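
One way to act on both comments is to treat a 429 as retryable and raise only once a bounded number of retries is exhausted. The following is a sketch under assumptions: `post_with_retry` is a hypothetical helper, `send` stands in for the actual request and returns just the status code, and the `RateLimitExceeded` class here is a local redefinition for the example rather than the PR's own.

```python
import asyncio
from typing import Awaitable, Callable


class RateLimitExceeded(Exception):
    """Raised only after every attempt has been rate limited."""


async def post_with_retry(
    send: Callable[[], Awaitable[int]],
    max_retries: int = 5,
    base_delay: float = 0.01,
) -> int:
    """Call send() until it returns a status other than 429, retrying at most max_retries times."""
    for attempt in range(max_retries + 1):
        status = await send()
        if status != 429:
            return status
        if attempt < max_retries:
            await asyncio.sleep(base_delay * 2**attempt)  # back off before retrying
    raise RateLimitExceeded("still rate limited after all retries")
```

The caller only sees `RateLimitExceeded` when the service stays overloaded for the whole retry window, so a single transient 429 no longer aborts the transfer.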

Signed-off-by: Ayush Kamat <ayush@latch.bio>
2 participants