raise auto piece size selection limit to 16 MB in create_torrent() #2669
Conversation
I just realized that BEP 30 (Merkle hash torrent extension) is supposed to remedy this kind of issue... UPDATE: answering my own question:
Given the above, does this PR still make sense?
I agree that 2 MiB piece sizes are perhaps a bit small, but 16 MiB feels quite large as well. The larger the piece size, the longer it will take a peer to complete downloading a piece. The whole piece needs to be downloaded in order to check its hash, and checking its hash is required before offering it up to other peers. Larger piece sizes mean longer delays for payload to propagate from one peer to the next, slowing down dissemination in general.

So, how much worse does it get as the piece size increases? It's hard to say without proper measurements, but it probably depends on the general download speed of the peers in the swarm. i.e. the most relevant property is probably not the piece size directly, but the piece download time. Do you have any references to what people on the internet recommend?

Another aspect to take into account is whether torrents are expected to primarily be distributed via magnet links or not. If they are, a large torrent file (i.e. small pieces) may not be that big of a deal.

Another aspect is to keep bittorrent v2 in mind. Although that's probably some way out, I would expect the piece size to be a lot smaller with a merkle tree. But perhaps these concerns are disconnected.
Bittorrent v2 doesn't change the calculus for piece size selection that much. The only concerns v2 eliminates are the up-front overhead of downloading the piece hashes when using magnet links and the amount of data wasted on hash fails.

It seems to me that the current algorithm is too aggressive about increasing the piece size as torrents grow into the gigabytes and beyond. I wonder if we should scale the target piece list size with something like the square root of the total size. Maybe put a floor on it so that small torrents don't sacrifice piece size for an insignificant savings on metadata size. For example, with that scaling a 4 GB torrent would get a 1 MB piece size and an 80 KB piece list. I think it's a safe assumption that anyone on a 1 TB torrent has a connection fast enough for 16 MB pieces to be reasonable.
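The suggested scaling can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the function name `auto_piece_size`, the coefficient of 2, and the 16 KiB / 16 MiB clamp are assumptions taken from this thread.

```cpp
#include <cmath>
#include <cstdint>

// Sketch: scale the target size of the piece hash list with the square
// root of the total content size, then derive the piece size from it.
inline std::int64_t auto_piece_size(std::int64_t total_size)
{
    int const hash_size = 20; // SHA-1 hash per piece (bittorrent v1)

    // target size (in bytes) of the piece list; coefficient 2 as suggested
    double const target_list_size = std::sqrt(double(total_size)) * 2;

    // exact piece size that would hit the target exactly
    std::int64_t const exact
        = std::int64_t(total_size / (target_list_size / hash_size));

    // round up to a power of two, clamped to [16 KiB, 16 MiB]
    std::int64_t p = 16 * 1024;
    while (p < exact && p < 16 * 1024 * 1024) p <<= 1;
    return p;
}
```

With these assumptions, a 4 GiB torrent comes out at a 1 MiB piece size and a 1 TiB torrent at 16 MiB, matching the figures above.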
Did a quick search on google and found these:
Yeah, the second guide says a 1 MB torrent file is a common limitation; maybe that should be taken into consideration in this PR?
Agree, I was thinking about it too.
Seems you used
I also like the square root formula. @Chocobo1, what do you think about updating this PR to something like what @ssiloti suggested?

One thing to keep in mind: I think it's best to keep piece sizes to even powers of 2. Technically the specification doesn't mandate this, and libtorrent supports arbitrary piece sizes, but it's a bit funky. The question then is what the coefficient should be. Is 2 right?
@Chocobo1 It looks like the coefficient is smaller than 2 because we round the piece size up to the nearest power of 2, so the actual piece list size comes out smaller than the target. Bittorrent v2 requires the piece size to be a power of 2.
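To make the rounding effect concrete, here is a small worked check. The numbers and the helper name `effective_coefficient` are illustrative, not from the PR.

```cpp
#include <cmath>
#include <cstdint>

// Illustration (numbers assumed): rounding the piece size up to the next
// power of two shrinks the piece list below the target, so the effective
// coefficient ends up below 2. For a 4 GiB torrent with coefficient 2:
//   target list size  = sqrt(2^32) * 2        = 131072 bytes
//   exact piece size  = total * 20 / 131072   = 655360 bytes
//   rounded up        -> 1 MiB (1048576 bytes)
//   actual list size  = (2^32 / 2^20) * 20    = 81920 bytes
// and 81920 / sqrt(2^32) = 1.25, i.e. an effective coefficient of ~1.25.
inline double effective_coefficient(std::int64_t total_size
    , std::int64_t rounded_piece_size)
{
    // number of pieces after rounding, times 20 bytes of SHA-1 hash each
    std::int64_t const num_pieces
        = (total_size + rounded_piece_size - 1) / rounded_piece_size;
    return double(num_pieces * 20) / std::sqrt(double(total_size));
}
```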
Okay, I've updated the PR. Some results for small-size torrents: UPDATED: fixed wrong values. Maybe the coefficient can be raised for small torrents?
src/create_torrent.cpp (outdated)

```cpp
piece_size = int(fs.total_size() / (target_size / 20));
const int hash_size = 20;

double target_list_size = sqrt(fs.total_size()) * 2;
```
this should be std::sqrt()
fixed.
`std::sqrt` is not a template (see documentation). The integral overload of sqrt() wasn't added until C++11, and since this patch is against RC_1_1 (which builds with C++98), the argument will need to be converted to a double. I don't think such a conversion is that great. In C++11 there are integer overloads for sqrt() though, so that's a possibility for master.
I'm thinking that the value space here is quite small. We want to round the result up to a power of 2 anyway, so there are only 11 different answers (16 KiB, 32 KiB, ..., 8 MiB, 16 MiB).
Would it make sense to just have a table instead?
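One integer-only way to realize the "table of 11 answers" idea could look like this (a hypothetical sketch, not the PR's code; it relies on `piece >= 10*sqrt(total)` being equivalent to `piece*piece >= 100*total`, so no floating point is needed):

```cpp
#include <cstdint>

// Pick the smallest power-of-two piece size in [16 KiB, 16 MiB] that
// satisfies piece_size^2 >= 100 * total_size, which is the sqrt-based
// target rewritten in pure integer arithmetic.
inline std::int64_t piece_size_from_table(std::int64_t total_size)
{
    for (int i = 0; i < 10; ++i)
    {
        std::int64_t const p = std::int64_t(0x4000) << i;
        if (p * p >= 100 * total_size) return p;
    }
    // largest entry in the table: 16 MiB
    return std::int64_t(0x4000) << 10;
}
```

This yields the same answers as the sqrt formula (e.g. 1 MiB for a 4 GiB torrent, 16 MiB for a 1 TiB torrent) while keeping everything in integers.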
> Would it make sense to just have a table instead?

Indeed, PR updated. I'll try keeping sqrt() for the master branch.
src/create_torrent.cpp (outdated)

```cpp
}
piece_size = i;
piece_size = 16*1024 * pow(2.0, i);
```
I would really prefer to not have any floating point operations, especially if they're unnecessary. In this case, for instance, you could just do `piece_size = 0x4000 << i`, if I'm not mistaken.
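As a sanity check, the shift form and the floating-point form agree across the whole valid index range (the helper name `shift_matches_pow` is assumed for illustration):

```cpp
#include <cmath>
#include <cstdint>

// Verify that 16*1024 * pow(2.0, i) and 0x4000 << i produce identical
// values for every valid piece-size index (16 KiB at i = 0 up to
// 16 MiB at i = 10), so the shift can replace the floating-point form.
inline bool shift_matches_pow(int i)
{
    std::int64_t const fp = std::int64_t(16 * 1024 * std::pow(2.0, i));
    std::int64_t const shifted = std::int64_t(0x4000) << i;
    return fp == shifted;
}
```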
Changed.

I think this table approach is better and simpler than using
I would prefer this be done without any floating point operations. I don't think it's necessary, and I think it adds complexity.
16 MB is chosen to be a bit more future-proof. Also rewrite the auto piece size selection algorithm so that it scales with the torrent content size, as suggested by @ssiloti.
Related issue in qbt: qbittorrent/qBittorrent#8205
I've read various guides on the net, however they don't put out suggestions for large torrents (sizes in the tens/hundreds of GB).
I'm only partially aware of the consequences; I definitely need some advice, and comments are welcome!
ps. This one is targeting the RC_1_1 branch; I'll open another one for the master branch after this is merged.