
Benchmark filesystems with lots of changes #4

Open · matheus23 opened this issue Apr 16, 2021 · 4 comments

Comments

@matheus23 (Member)

One of my accounts has lots of history entries for public files in its filesystem. It might make sense to benchmark this. (Just wanted to write this down somewhere; this came up in fission-codes/fission#489.)

@expede (Member) commented Apr 16, 2021

We should 100% check this, but I'll be pretty surprised if it turns out to be the actual bottleneck. Here's my thinking:

  • Deep structures can require a lot of round trips with bitswap
  • But all (or most) of the files (i.e. the heavy stuff) will be available at the same level, even if they're in lower layers
    • i.e. the "heavy" stuff is most likely already available in the top layer
  • I've been able to reproduce >60s updates with a single small image

Which isn't to say that trying to sync a new really deep structure won't have a lot of round trips, but each of those should be pretty lightweight because it's just pointers. That of course compounds, so if you need to do 10k round trips because of bitswap, yeah that's pretty rough.
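
To make the compounding concrete, here's a back-of-envelope model (a sketch with illustrative numbers, not measurements): if bitswap round trips resolve sequentially, total sync time grows linearly with the number of round trips, no matter how small each pointer node is.

```ts
// Rough cost model: sequential round trips dominate when each node is
// just a small pointer block. The 50 ms RTT is illustrative.
function syncTimeSeconds(roundTrips: number, rttMs: number): number {
  return (roundTrips * rttMs) / 1000;
}

console.log(syncTimeSeconds(10, 50));     // 0.5 s — shallow structure
console.log(syncTimeSeconds(10_000, 50)); // 500 s — 10k sequential round trips
```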

I can absolutely be wrong — the above is pure theory — and we should definitely test this empirically 🔬🧪👍

@matheus23 (Member, Author)

> I've been able to reproduce >60s updates with a single small image

That sounds interesting! I'd love to hear more.

> I'll be pretty surprised if it turns out to be the actual bottleneck.

Yeah, I had this issue in my notes and talked to James about it. I just wanted it persisted somewhere. It's all theory, of course. Maybe this would've been more appropriate somewhere else 🤔

@expede (Member) commented Apr 17, 2021

> > I've been able to reproduce >60s updates with a single small image
>
> That sounds interesting! I'd love to hear more.

With the known-affected account, I opened the public directory in Drive and uploaded a ~400 kB image. It took over a minute to complete.

> Maybe this would've been more appropriate somewhere else 🤔

I think this is a great place to record it, and thanks for writing it down! We absolutely need to check this assumption.

My comments above were mainly about (1) stating my assumptions, (2) getting the conversation going, and (3) flagging that when we test this, we should account for the factors in my (and others') existing assumptions, for example (see the harness sketch after this list):

  • How many files are in a directory (number of flat / horizontal links)
  • Time to upload a single small file vs. a single large file (i.e. latency vs. bandwidth)
  • Time to sync deeply nested structures, when:
    • Files are present at the surface layer
    • Files are not present at the surface layer
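
Here's a minimal sketch of what such a harness could look like, assuming a promise-returning upload function. The `uploadFile` parameter is a hypothetical stand-in for whichever webnative call we actually benchmark, and the file sizes and counts are placeholders.

```ts
// Sketch of a benchmark harness covering the factors above.
// `uploadFile` is a hypothetical stand-in, not a real webnative API.
type UploadFn = (path: string, bytes: Uint8Array) => Promise<void>;

async function timeIt(label: string, fn: () => Promise<void>): Promise<void> {
  const start = performance.now();
  await fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(0)} ms`);
}

async function runBenchmarks(uploadFile: UploadFn): Promise<void> {
  // Flat fan-out: many small files in a single directory
  await timeIt("1000 small files, flat", async () => {
    for (let i = 0; i < 1000; i++) {
      await uploadFile(`flat/file-${i}.bin`, new Uint8Array(1024));
    }
  });

  // Latency vs. bandwidth: one small file vs. one large file
  await timeIt("1 x 1 kB file", () => uploadFile("small.bin", new Uint8Array(1024)));
  await timeIt("1 x 100 MB file", () =>
    uploadFile("large.bin", new Uint8Array(100 * 1024 * 1024)),
  );

  // Deep nesting: a single small file at the bottom of a 100-level tree
  const deepPath = Array.from({ length: 100 }, (_, i) => `d${i}`).join("/");
  await timeIt("1 file, 100 levels deep", () =>
    uploadFile(`${deepPath}/leaf.bin`, new Uint8Array(1024)),
  );
}
```

Running the deep-nesting case twice, once cold and once with the tree's blocks already available locally, would separate the "present at surface layer" case from the "not present" one.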

@matheus23 (Member, Author)

> With the known-affected account, I opened the public directory in Drive and uploaded a ~400 kB image. It took over a minute to complete.

Ah right, I thought you reproduced this on a new account 😅
