ipfs dag export is slower than ipfs cat #8004

Open
jsign opened this issue Mar 23, 2021 · 7 comments
Labels
  • kind/bug: A bug in existing code (including security flaws)
  • need/analysis: Needs further analysis before proceeding
  • P2 (Medium): Good to have, but can wait until someone steps up

Comments

@jsign

jsign commented Mar 23, 2021

Version information:

go-ipfs version: 0.8.0
Repo version: 11
System version: amd64/linux
Golang version: go1.15.8

Description:

I'm trying to get a sense of the throughput of exporting a DAG to a CAR file, mainly motivated by preparing data for Filecoin onboarding. A quick ipfs dag export test on a cloud VM showed a throughput of ~40MiB/s, which is pretty slow and most probably related to slow SSDs.

To take disk out of the equation, I ran the following experiment with the IPFS repo mounted in RAM:
Term 1:

$ IPFS_PATH=/dev/shm/.ipfs ipfs init
generating ED25519 keypair...done
peer identity: 12D3KooWFqjeUajS89iwFpn2z2m7ZMzdJiLZJ98LRvvje6mKEj4W
initializing IPFS node at /dev/shm/.ipfs
to get started, enter:

        ipfs cat /ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc/readme
$ IPFS_PATH=/dev/shm/.ipfs ipfs daemon
Initializing daemon...
...
Daemon is ready

Term 2:

➜ head -c 5G </dev/urandom > 5gb     
➜ ls -lh 5gb 
-rw-rw-r-- 1 ignacio ignacio 5,0G mar 23 11:07 5gb
➜ time IPFS_PATH=/dev/shm/.ipfs ipfs add  5gb 
added QmVo3NGUcj8MTGfLTSTL4W1BpmAAFfwk76rznK4Z9DUqW6 5gb
 5.00 GiB / 5.00 GiB [==============================================================================================================] 100.00%
IPFS_PATH=/dev/shm/.ipfs ipfs add 5gb  1,44s user 2,54s system 47% cpu 8,441 total
➜  time IPFS_PATH=/dev/shm/.ipfs ipfs cat QmVo3NGUcj8MTGfLTSTL4W1BpmAAFfwk76rznK4Z9DUqW6 > /dev/null
IPFS_PATH=/dev/shm/.ipfs ipfs cat  > /dev/null  2,20s user 7,87s system 83% cpu 12,096 total
➜  time IPFS_PATH=/dev/shm/.ipfs ipfs dag export QmVo3NGUcj8MTGfLTSTL4W1BpmAAFfwk76rznK4Z9DUqW6 > /dev/null
 19s  5.00 GiB / ? [-------------------------------------------------------------------------------------=----------------] 265.79 MiB/s 19s 
IPFS_PATH=/dev/shm/.ipfs ipfs dag export  > /dev/null  4,14s user 10,72s system 77% cpu 19,299 total

So, for a 5 GiB random file (a quick sanity check of these throughput figures is sketched after the list):

  • ipfs add: 8.4s (609MiB/s)
  • ipfs cat: 12.09s (423MiB/s)
  • ipfs dag export: 19.29s (265MiB/s)
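As a quick sanity check, the throughputs above are just the 5 GiB size divided by each command's wall-clock time. A minimal Go sketch of that arithmetic (timings copied from the runs above):

package main

import "fmt"

// Sanity check of the throughput figures: 5 GiB divided by the
// wall-clock time of each command, expressed in MiB/s.
func main() {
	const sizeMiB = 5.0 * 1024 // 5 GiB in MiB
	runs := []struct {
		cmd  string
		secs float64
	}{
		{"ipfs add", 8.441},
		{"ipfs cat", 12.096},
		{"ipfs dag export", 19.299},
	}
	for _, r := range runs {
		fmt.Printf("%-16s %6.1f MiB/s\n", r.cmd, sizeMiB/r.secs)
	}
}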

These results seem odd, since ipfs dag export is slower than ipfs cat. Intuitively, exporting the DAG should involve less work than unwrapping the UnixFS layer when cat-ing the file.

Also, note that the IPFS repo is mounted in RAM, so these throughputs are best-case scenarios.

The IPFS config file is just the default that gets created on a fresh ipfs init as shown above (so using FlatFS).
Other info about my box:

  • RAM: 32 GiB
  • CPU: AMD Ryzen 7 3800XT
  • Disk: this (although this should only be relevant for ipfs add and not the rest since it's in RAM)

Extra test with a 1 GiB file showing the same behavior:

$ time IPFS_PATH=/dev/shm/.ipfs ipfs cat QmTEsmWrxvzEhhPoiMkU2tMAfhwAsVpKQ8otCuHsFtTpmM > /dev/null
IPFS_PATH=/dev/shm/.ipfs ipfs cat  > /dev/null  0,70s user 1,60s system 91% cpu 2,510 total
$ time IPFS_PATH=/dev/shm/.ipfs ipfs dag export QmTEsmWrxvzEhhPoiMkU2tMAfhwAsVpKQ8otCuHsFtTpmM > /dev/null
 3s  1.00 GiB / ? [-----------------------------------------------------=--------------------------------------------------] 266.52 MiB/s 3s 
IPFS_PATH=/dev/shm/.ipfs ipfs dag export  > /dev/null  1,00s user 2,12s system 80% cpu 3,873 total

Let me know if I might be missing some pitfall here, or similar!

@jsign jsign added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Mar 23, 2021
@welcome

welcome bot commented Mar 23, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@hannahhoward
Contributor

FWIW @jsign, both cat and export have to do a DAG traversal from the blockstore (the CAR is not stored linearly on disk), and export has to write marginally more data, as it includes all blocks (not just the leaf nodes) plus the per-block framing for CAR blocks. One thing I'd be curious about: what's the generated CAR size for the 5 GiB flat file?
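(For reference, a CARv1 stream frames every block with a varint length prefix covering the CID plus the block bytes, followed by the CID and the raw data. Here's a rough, hand-rolled Go sketch of that framing just to illustrate the per-block overhead; it's not the actual go-car code, and the 34-byte CIDv0 and 256 KiB chunk size are assumed defaults.)

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// writeCarSection sketches how a CARv1 stream frames each block:
// varint(len(cid)+len(data)) || cid || data. The CAR header and the real
// go-car implementation are omitted; this only shows the per-block bytes
// export writes on top of the raw block data.
func writeCarSection(w *bytes.Buffer, cid, data []byte) {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(cid)+len(data)))
	w.Write(lenBuf[:n])
	w.Write(cid)
	w.Write(data)
}

func main() {
	// Hypothetical block: a 256 KiB chunk addressed by a 34-byte CIDv0.
	cid := make([]byte, 34)
	data := make([]byte, 256*1024)

	var buf bytes.Buffer
	writeCarSection(&buf, cid, data)
	fmt.Printf("framing overhead per block: %d bytes\n", buf.Len()-len(data)) // ~37 bytes
}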

@hannahhoward
Contributor

BTW, I just saw this and thought it was an interesting issue -- probably someone else on the IPFS team will be in charge of addressing it.

@jsign
Author

jsign commented Mar 23, 2021

Yep, both cases are doing random reads, so that shouldn't be relevant to explaining the difference.
ipfs cat has to do some extra work to unwrap the UnixFS data from the nodes, while ipfs dag export should just write out raw block data that's already in RAM (plus the CAR framing, but that should be quite fast).

Regarding the sizes, the original file is 5368709120 bytes and the CAR output is 5370748072 bytes, so a diff of ~2MiB (UnixFS + CAR overhead).
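That diff looks about right from a back-of-the-envelope estimate, assuming the go-ipfs defaults (256 KiB chunks, dag-pb leaves with 34-byte CIDv0s, ~174 links per intermediate node). A rough Go sketch of that arithmetic; the per-block byte costs are approximations, not exact protobuf sizes:

package main

import "fmt"

// Back-of-the-envelope estimate of the combined UnixFS + CAR overhead for a
// 5 GiB file, assuming go-ipfs defaults. All per-block costs are rough
// approximations.
func main() {
	const (
		fileSize   = 5368709120 // bytes, as reported above
		chunkSize  = 256 * 1024 // default chunker size
		cidLen     = 34         // CIDv0
		carFraming = 3 + cidLen // varint length prefix + CID per CAR section
		leafWrap   = 16         // approx. dag-pb/UnixFS wrapping per leaf
		linkCost   = 50         // approx. bytes per link in an intermediate node
		fanout     = 174        // default links per intermediate node
	)

	leaves := fileSize / chunkSize                  // 20480 leaf blocks
	intermediates := (leaves + fanout - 1) / fanout // ~118 parent nodes

	overhead := leaves*(carFraming+leafWrap) +
		intermediates*(fanout*linkCost+carFraming)

	fmt.Printf("leaves=%d intermediates=%d overhead ~%.1f MiB\n",
		leaves, intermediates, float64(overhead)/(1<<20))
}

That lands at roughly 2 MiB, in the same ballpark as the reported diff.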

@hannahhoward
Contributor

Yeah, I'm stumped then. I even went to look at the CAR export and cat traversal methods, and it's not at all clear to me why one would be faster or slower than the other.

@ribasushi
Contributor

@hannahhoward isn't this kinda-sorta related to ipld/go-ipld-prime#149, or a similar multiple-rehashing problem somewhere in the traversal?
cc @warpfork & @willscott as you've both worked in this area very recently too

@hannahhoward
Contributor

@ribasushi unless I'm missing something, none of these should be running into the hash-on-read problem, so probably not?

@BigLep BigLep added need/analysis Needs further analysis before proceeding P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Mar 29, 2021