Performance indexing "big directories" #7588

Open
kallisti5 opened this issue Aug 10, 2020 · 3 comments
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization

Comments

kallisti5 commented Aug 10, 2020

Version information:

$ ipfs version --all
go-ipfs version: 0.6.0
Repo version: 10
System version: amd64/linux
Golang version: go1.14.4

Description:

Usage:

  • Add a folder containing 35 GiB of data across ~7,108 files.
  • The documents exist only on a single node with a 200 Mbit/s up/down internet connection.
  • Navigate to the parent folder on a gateway such as gateway.ipfs.io (success).
  • Enter the folder (index) with 7,108 files. The gateway times out.

Does the IPFS gateway need to transfer a large amount of data to get a directory index?

The IPFS daemon has been transferring at about 30 Mbit/s for a while now.
[Image: bandwidth]

@kallisti5 kallisti5 added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Aug 10, 2020

kallisti5 commented Aug 10, 2020

I connected a remote node (a VM at an external hosting provider) directly to my local home IPFS node:
ipfs swarm connect /ip4/XX.XX.XX.XX/tcp/4001/p2p/QmdDXXX

and ran an ipfs ls...

time ipfs ls /ipfs/QmWApXeXXX/current/packages

It took over 7 minutes to perform an ls:

real 7m17.463s
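
For a sense of scale, the elapsed time can be averaged over the number of entries. This is only a rough sketch: it assumes the listed directory is the ~7,108-file folder from the original report, and the helper name per_entry_ms is made up for illustration.

```shell
# Average wall-clock time per directory entry, in milliseconds.
# $1 = elapsed seconds, $2 = number of entries (both taken from the report).
per_entry_ms() {
  awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.1f\n", secs * 1000 / n }'
}

# 7m17.463s = 7*60 + 17.463 = 437.463 seconds, spread over ~7,108 entries.
per_entry_ms 437.463 7108   # prints 61.5
```

Roughly 60 ms per entry is in the range of a single network round trip, which would be consistent with some per-entry remote lookup rather than bulk data transfer, although the numbers alone do not prove that.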

@utkarsh5k

I'm willing to take a look at this, but I'm a first-time contributor, so if someone could point me to the indexing logic, that'd be super helpful. Thanks!

@aschmahmann
Contributor

> Does the IPFS gateway need to transfer a large amount of data to get a directory index?

It currently needs to get the first block of each file in the directory to decide whether it's a file or a directory.

As for ipfs ls, if you set --resolve-type and --size to false, it only needs to fetch the directory data and does not reach into the individual files. Adding --stream also gives you output as it arrives, without waiting for the full results.
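
Put together, the suggestion above might look like the following sketch. The wrapper name ipfs_ls_fast is hypothetical, and the path in the usage comment is the redacted one from the earlier comment, so substitute a real CID before running it.

```shell
# List a directory without reaching into each entry:
#   --resolve-type=false  skip fetching each entry's first block to learn its type
#   --size=false          skip resolving each entry's size
#   --stream              print entries as they arrive instead of all at once
ipfs_ls_fast() {
  ipfs ls --resolve-type=false --size=false --stream "$1"
}

# Usage (path redacted in the report; replace with your own):
#   time ipfs_ls_fast /ipfs/QmWApXeXXX/current/packages
```

With type resolution and sizes turned off, the listing will not distinguish files from directories or show sizes, which is the trade-off for not fetching a block per entry.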
