
raise auto piece size selection limit to 16 MB in create_torrent() #2669

Merged · 1 commit · Jan 8, 2018

Conversation

@Chocobo1 (Contributor) commented Jan 5, 2018

Related issue in qbt: qbittorrent/qBittorrent#8205

I've read various guides on the net, but they don't offer suggestions for large torrents (tens or hundreds of GB in size).
I'm only partially aware of the consequences, so I definitely need some advice; comments welcome!

p.s. This one targets the RC_1_1 branch; I'll open another one for the master branch after this is merged.

@Chocobo1 (Contributor Author) commented Jan 5, 2018

I just realized that BEP 30 (Merkle hash torrent extension) is supposed to remedy this kind of issue...
However, is support widespread enough that we can recommend it to users?
Do the major torrent clients support it yet?


UPDATE: answering my own question:

  1. No one seems to be using BEP 30 and it might be broken: Bittorrent v2 #2197 (comment)
  2. There is BEP 52 (The BitTorrent Protocol Specification v2), which will probably mitigate this issue better than BEP 30.

Given the above, does this PR still make sense?

@arvidn (Owner) commented Jan 5, 2018

I agree that 2 MiB piece sizes are perhaps a bit small, but 16 MiB feels quite large as well.
Here's my model for reasoning about what a reasonable piece size is:

The larger the piece size, the longer it takes a peer to finish downloading a piece. The whole piece needs to be downloaded in order to check its hash, and the hash check is required before the piece can be offered to other peers. Larger piece sizes therefore mean longer delays for payload to propagate from one peer to the next, slowing down dissemination in general.

So, how much worse does it get as the piece size increases? It's hard to say without proper measurements, but it probably depends on the general download speed of the peers in the swarm; i.e. the most relevant property is probably not the piece size directly, but the piece download time.
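
To put rough numbers on it (assuming, purely for illustration, a peer downloading at 1 MiB/s): a 16 MiB piece takes about 16 seconds to complete before it can be hash-checked and re-shared, versus about 2 seconds for a 2 MiB piece.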

Do you have any references to what people on the internet recommend?

Another aspect to take into account is whether torrents are expected to primarily be distributed via magnet links or not. If they are, a large torrent file (i.e. small pieces) may not be that big of a deal.

Another aspect is to keep bittorrent v2 in mind. Although that's probably some way out, I would expect the piece size to be a lot smaller with a merkle tree. But perhaps these concerns are disconnected.

@ssiloti (Collaborator) commented Jan 6, 2018

Bittorrent v2 doesn't change the calculus for piece size selection that much. The only concerns v2 eliminates are the up-front overhead of downloading the piece hashes when using magnet links and the amount of data wasted on hash fails.

It seems to me that the current algorithm is too aggressive about increasing the piece size as torrents grow into the gigabytes and beyond. I wonder if we should scale the target piece list size with something like the square root of the total size, perhaps with a floor so that small torrents don't sacrifice piece size for insignificant savings in metadata size. For example, if we used target_size = sqrt(total_size) * 2:

| total size | piece size | piece list size |
|---|---|---|
| 4 GB | 1 MB | 80 KB |
| 16 GB | 2 MB | 160 KB |
| 128 GB | 4 MB | 640 KB |
| 1 TB | 16 MB | 1.25 MB |

I think it's a safe assumption that anyone on a 1 TB torrent has a connection fast enough for 16 MB pieces to be reasonable.
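
For concreteness, here's a minimal sketch of this scaling rule, assuming 20-byte SHA-1 piece hashes and rounding the piece size up to a power of 2, clamped to 16 KiB .. 16 MiB (the names are illustrative, not the actual create_torrent() code):

```cpp
#include <cmath>
#include <cstdio>

// Sketch: target piece-list size = sqrt(total_size) * 2, then round the
// piece size up to a power of 2, clamped to the 16 KiB .. 16 MiB range.
long long pick_piece_size(long long total_size)
{
    int const hash_size = 20; // one SHA-1 digest per piece
    double const target_list_size = std::sqrt(double(total_size)) * 2;

    // the piece size that would hit the target list size exactly
    double const raw_piece_size = total_size * hash_size / target_list_size;

    long long piece_size = 16 * 1024;
    while (piece_size < raw_piece_size && piece_size < 16 * 1024 * 1024)
        piece_size *= 2;
    return piece_size;
}

int main()
{
    long long const GiB = 1024LL * 1024 * 1024;
    long long const sizes[] = { 4 * GiB, 16 * GiB, 128 * GiB, 1024 * GiB };
    for (int i = 0; i < 4; ++i)
    {
        long long const ps = pick_piece_size(sizes[i]);
        std::printf("%4lld GB -> %5lld KB pieces, %4lld KB piece list\n",
            sizes[i] / GiB, ps / 1024, sizes[i] / ps * 20 / 1024);
    }
    return 0;
}
```

Run on the four sizes in the table above, this reproduces the quoted numbers: 1 MB / 2 MB / 4 MB / 16 MB pieces with 80 / 160 / 640 / 1280 KB (= 1.25 MB) piece lists.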

@Chocobo1 (Contributor Author) commented Jan 6, 2018

Do you have any references to what people on the internet recommend?

Did a quick search on Google and found these:
https://wiki.vuze.com/w/Torrent_Piece_Size
http://torrentinvites.org/f29/piece-size-guide-167985/

8mb and 16mb are useful for large file compilations to get around torrent file size limits on sites

Yeah, the second guide says a 1 MB torrent file is a common limitation; maybe that should be taken into consideration in this PR?
Though I also think a 1 MB limit is a bit silly for mega-sized content (1 TB); then again, that's the tracker operators' problem.
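
For scale: at 20 bytes per piece hash, a 1 MB .torrent file caps out at roughly 52,000 pieces, so a 1 TB torrent would need pieces of about 20 MB (32 MB once rounded up to a power of 2) to stay under that limit.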

I wonder if we should scale the target piece list size with something like the square root of the total size. Maybe put a floor on it so that small torrents don't sacrifice piece size for an insignificant savings on metadata size.

Agree, I was thinking about it too.

For example, if we used target_size = sqrt(total_size) * 2:

It seems you used 1.25 instead of 2 in the examples?
I think the result is not bad.

@arvidn (Owner) commented Jan 6, 2018

I also like the square root formula. @Chocobo1 what do you think about updating this PR to something like what @ssiloti suggested?

One thing to keep in mind: I think it's best to keep piece sizes to even powers of 2. Technically the specification doesn't mandate this, and libtorrent supports arbitrary piece sizes, but it's a bit funky.

The question then is what the coefficient should be. Is 2 right?

@ssiloti (Collaborator) commented Jan 6, 2018

@Chocobo1 It looks like the coefficient comes out smaller than 2 because we round the piece size up to the nearest power of 2, so the actual piece list size ends up smaller than the target. Bittorrent v2 requires the piece size to be a power of 2.
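
To spell out the 4 GB case: the target list is sqrt(4 GB) * 2 = 128 KB, which corresponds to a 640 KB piece size; rounding that up to 1 MB shrinks the actual list to 80 KB, exactly as if the coefficient had been 1.25.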

@Chocobo1 (Contributor Author) commented Jan 6, 2018

@Chocobo1 what do you think about updating this PR to something like what @ssiloti suggested?

Okay, I've updated the PR.

Some results for small torrents:

| torrent content | piece size | hash list size | piece count |
|---|---|---|---|
| 1 MB | 16 KB | 1 KB | 64 |
| 10 MB | 32 KB | 6 KB | 320 |
| 100 MB | 128 KB | 15 KB | 800 |
| 200 MB | 256 KB | 15 KB | 800 |
| 700 MB | 512 KB | 27 KB | 1400 |

UPDATED: fixed wrong values.

Maybe the coefficient can be raised for small torrents?

@arvidn (Owner) reviewed Jan 6, 2018

piece_size = int(fs.total_size() / (target_size / 20));
const int hash_size = 20;

double target_list_size = sqrt(fs.total_size()) * 2;

@arvidn (Owner):

this should be std::sqrt()

@Chocobo1 (Contributor Author):

fixed.

@arvidn (Owner):

std::sqrt() is not a template (see the documentation). The integral overloads of sqrt() weren't added until C++11, and since this patch is against RC_1_1 (which builds with C++98) the argument will need to be converted to a double. I don't think that conversion is great. In C++11 there are integer overloads for sqrt(), so that's a possibility for master.

I'm thinking that the value space here is quite small. We want to round the result up to a power of 2 anyway, so there are only 11 possible answers (16 KiB, 32 KiB, ..., 8 MiB, 16 MiB).

Would it make sense to just have a table instead?

@Chocobo1 (Contributor Author):

Would it make sense to just have a table instead?

Indeed, PR updated.
I'll try to keep sqrt() for the master branch.
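
For reference, one way the table idea can avoid floating point entirely: the condition "20 * total / piece <= 2 * sqrt(total)" rearranges to "total <= piece^2 / 100", so the thresholds can be precomputed. A minimal sketch, illustrative rather than the actual patch:

```cpp
// Sketch of the lookup-table idea. The sqrt rule
// "20 * total_size / piece_size <= 2 * sqrt(total_size)" is equivalent to
// "total_size <= piece_size^2 / 100", which needs no floating point.
long long pick_piece_size(long long const total_size)
{
    for (int i = 0; i < 11; ++i) // 16 KiB, 32 KiB, ..., 16 MiB
    {
        long long const piece = 16LL * 1024 << i;
        if (total_size <= piece * piece / 100) return piece;
    }
    return 16LL * 1024 * 1024; // cap at 16 MiB for anything larger
}
```

This reproduces both tables above (e.g. 700 MB -> 512 KB pieces, 1 TB -> 16 MB pieces).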

}
piece_size = i;
piece_size = 16*1024 * pow(2.0, i);

@arvidn (Owner):

I would really prefer not to have any floating point operations, especially when they're unnecessary.
In this case, for instance, you could just do piece_size = 0x4000 << i, if I'm not mistaken.
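
(0x4000 is 16 * 1024, and shifting left by i multiplies by 2^i, so the shift computes the same value as 16*1024 * pow(2.0, i) in pure integer arithmetic.)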

@Chocobo1 (Contributor Author):

Changed.

@arvidn (Owner) commented Jan 7, 2018

I think this table approach is better and simpler than using std::sqrt() in master, too.

@arvidn (Owner) left a review:

I would prefer this be done without any floating point operations. I don't think it's necessary, and I think it adds complexity.

Commit message: 16 MB is chosen to be a bit more future-proof. Also rewrite the auto piece size selection algorithm so that it scales with the torrent content size, as suggested by @ssiloti.