Stop RAR Uploads
Many Usenet uploads are wrapped in split RAR archives, as it seems to be enshrined as standard practice for uploaders. This forces downloaders to do the reverse, with some going to the extent of having a dedicated ‘unpack drive’ just to speed up the process. This leads to the question: why do people still upload split RARs, and why do so few question the practice?
This article aims to question this common practice, and hopefully convince you that it simply doesn’t make sense in the vast majority of cases.
So why even bother changing an established procedure? From an engineering standpoint, getting rid of a completely unnecessary step simplifies processes, reduces failures, improves efficiency, and may even enable new possibilities.
More specifically, benefits include:
- Faster uploads/downloads, since you skip the RAR/unRAR step and its associated I/O overhead
- Files can be written in-place whilst downloading, with no need for temporary storage (read: don’t need a separate disk just for unpacking!). Also saves write cycles if you’re using an SSD, which extends the drive’s lifespan
- Similar to above, you don’t need extra disk space to store the intermediate RAR files (so this is no longer an issue, and direct unpack workarounds are no longer necessary)
- Because files can be written in-place, streaming off Usenet may become more of a possibility. Ideas like this also become simpler/more feasible
- Enables file selection if a group of files is posted, much like how you can choose which files you want from a torrent, as opposed to being forced to download all files packed into the RAR
- Uploader software could be simpler, so developers can focus efforts elsewhere. The same could apply to downloader software, but with the amount of RARs already out there, as well as those that insist on using encryption, I doubt they can drop support for it. Nonetheless, it’s a distant possibility
- This could also simplify uploader setup/config, since the RAR tool is no longer required and doesn’t need to be configured
- Less confusion for uploaders:
- what size should you split RARs into? “Not too big, but not too small” I hear, but I’m just trying to upload, not imitate Goldilocks. Of course, we don’t live in fairy tale land, and the correct answer is not to split at all
- it can also make PAR2 less confusing, as you don’t need to consider the effects of required file padding when picking appropriate parameters
- avoid mistakes like these
- RAR is unavailable on non-x86 platforms. The official RAR tool is the only application which can create RARs, and only 32/64-bit x86 builds are available. With the rise of non-x86 platforms, particularly ARM, there should be a push away from being tied to x86-only software. Raspberry Pi users take note!
- Potentially simpler/more efficient setups for indexers that display information on file contents. Due to not needing capacity to perform extractions (disk space and I/O load), and possibly more flexibility with selectively downloading articles (less bandwidth usage), it could be cheaper (or more efficient) to run an index. This could translate to cheaper prices for everyone else
- Greater compatibility with alternative sources. Can’t complete a download, but found the file elsewhere? It may be possible to leverage the two if they’re the same file, whereas if the Usenet version was split-RAR’d, you’d likely be SoL. This also enables the PAR2 to be used elsewhere (e.g. repairing an incomplete torrent)
- No password/encryption surprises. Ever downloaded a file, only to be disappointed that the RAR required a password (that you can’t obtain)? If only encrypted uploads are in RARs, it’ll be easier to know to avoid them if you don’t have a password
- Better resiliency against broken downloads. If a download is broken and cannot be repaired, you’ll likely have a better chance at salvaging what you can get if RAR isn’t in your way
Unfortunately, a lot of (holistically unnecessary) engineering effort has gone into supporting RARs, which can’t be taken back, but we can at least lessen the burden for the future.
So despite the benefits, why aren’t people seeking to take advantage of them, and still wrap uploads in split RARs? I’ve seen all sorts of questionable reasons given, along with various misconceptions, so let’s go through them and clear these up.
It actually makes it slower (often significantly so), as there’s a pack step that needs to occur during upload, and an unpack step during download. But even if packing/unpacking were instant, there’s no reason to believe that splitting files speeds up uploading/downloading in any way whatsoever. The misconception seems to stem from the idea that multiple parts enable parallel downloading/uploading where it otherwise wouldn’t be possible. The reality is that Usenet/NNTP doesn’t care about files, as they’re broken down into articles, and articles can be transferred in parallel regardless of the file organisation. In other words, files are already split, without you needing to add a second level of splitting, which makes doing so rather pointless.
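To illustrate the point, here’s a purely conceptual sketch of article-level parallel fetching. The server, credentials and message-IDs are placeholders, and it uses Python’s old nntplib module (removed in Python 3.13), so treat it as illustration only - the takeaway is that parallelism happens per article, so whether the underlying file was split into RAR parts is irrelevant.

```python
# Conceptual sketch only: parallel download works per article, not per file.
from concurrent.futures import ThreadPoolExecutor
from nntplib import NNTP_SSL

HOST = "news.example.com"                                             # placeholder server
MESSAGE_IDS = ["<seg1@example>", "<seg2@example>", "<seg3@example>"]  # e.g. from an NZB

def fetch(message_id):
    with NNTP_SSL(HOST) as conn:             # one connection per worker
        _resp, info = conn.body(message_id)  # grab just this article
        return b"\r\n".join(info.lines)      # still yEnc-encoded at this point

with ThreadPoolExecutor(max_workers=3) as pool:
    encoded_articles = list(pool.map(fetch, MESSAGE_IDS))
```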
Splitting makes no difference to the likelihood of corruption during upload. Uploading 1GB means you're transferring 1GB, regardless of whether it's a single file or 10x 100MB parts.
Also keep in mind that your file is already broken into parts via articles, so a second level of splitting really does nothing.
If anything, it makes recoverability worse, though the difference is typically insignificant. PAR2, which is often used to recover missing articles, doesn’t handle recovery by file, it does so by block. Thus, breaking your file into arbitrarily sized parts makes no difference to PAR2. As with the above misconception, introducing a second layer of splitting is rather pointless, since PAR2 already divvies your file up into blocks. It can actually make things worse, as archive headers/metadata add overhead, and PAR2 internally requires each file to be zero-padded to the next whole block size, which can slightly reduce the efficiency of PAR2.
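As a rough worked example of the padding point (the block size here is just an illustrative pick, not a recommendation), you can count how many PAR2 input blocks each layout needs once per-file padding is accounted for:

```python
# PAR2 pads each input file up to a whole block, so splitting one file into many
# parts adds a little padding overhead. 768000-byte blocks are an arbitrary choice.
BLOCK = 768000

def blocks_needed(file_sizes):
    # ceil(size / BLOCK) per file, since every file is padded to a block boundary
    return sum(-(-size // BLOCK) for size in file_sizes)

print(blocks_needed([1_000_000_000]))      # one 1GB file    -> 1303 blocks
print(blocks_needed([100_000_000] * 10))   # 10x 100MB parts -> 1310 blocks
```

A handful of extra blocks won’t ruin anything, but it’s overhead you get nothing in return for.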
I suspect that this may have been a carry over assumption from PAR1, which operated on files instead of blocks. However, no-one should be using PAR1 in this day and age.
Speed is about the same. As mentioned above, PAR2 doesn’t really care that you’ve split the file into smaller pieces, because computation doesn’t operate on files. If you’re knowledgeable enough about how repair works, you might argue that there could be an I/O benefit with only needing to write repairs to a single, smaller part (than rewrite a big file). There’s actually truth to this, however any gains here are more than lost by the need to unsplit the file.
Does that all sound suspect? I’ll work it out and list the I/O usage for you. This example compares a 1GB file against splitting it into 10x 100MB parts:
| Scenario: 1 error occurs | 1GB file (unsplit) | 10x 100MB parts |
|---|---|---|
| Download file | Write 1GB file | Write 10x 100MB files |
| PAR2 verify | Read 1GB file | Read 10x 100MB files |
| PAR2 repair | Write 1GB file, Read 1GB | Write 100MB part, Read 100MB |
| RAR extract/unsplit | - | Read 10x 100MB files, Write 1GB file |
| Total | Read: 2GB, Write: 2GB | Read: 2.1GB, Write: 2.1GB |
| Scenario: no error | 1GB file (unsplit) | 10x 100MB parts |
|---|---|---|
| Download file | Write 1GB file | Write 10x 100MB files |
| PAR2 verify\* | Read 1GB file | Read 10x 100MB files |
| RAR extract | - | Read 10x 100MB files, Write 1GB file |
| Total | Read: 1GB, Write: 1GB | Read: 2GB, Write: 2GB |

\* Assumes downloader doesn’t do verification during download
You can come up with other examples/scenarios if you wish, but in all cases, the unsplit scenario is always better. (also, for typical Usenet corruption (due to lost articles), the PAR2 application could engage in in-place repairing, which would favour the unsplit scenario even more, plus the post-repair verification read pass can theoretically be skipped)
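If you’d like to try other scenarios yourself, here’s a small sketch that reproduces the tallies above, with the same simplifications the tables make (e.g. ignoring I/O on the PAR2 files themselves):

```python
# Bytes of read/write traffic for a download; tweak the arguments for other scenarios.
GB = 10**9

def io_usage(file_size, parts, damaged_parts):
    part_size = file_size // parts
    read = write = 0
    write += file_size                      # download: write the file (or all parts)
    read += file_size                       # PAR2 verify: read everything
    if damaged_parts:
        read += damaged_parts * part_size   # PAR2 repair: re-read the damaged part(s)
        write += damaged_parts * part_size  #              and rewrite them
    if parts > 1:
        read += file_size                   # unrar/unsplit: read all parts
        write += file_size                  #                write the joined file
    return read, write

print(io_usage(GB, 1, 1))    # unsplit, 1 error   -> 2GB read, 2GB write
print(io_usage(GB, 10, 1))   # 10 parts, 1 error  -> 2.1GB read, 2.1GB write
print(io_usage(GB, 1, 0))    # unsplit, no error  -> 1GB read, 1GB write
print(io_usage(GB, 10, 0))   # 10 parts, no error -> 2GB read, 2GB write
```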
There is one case where there might be a speed gain - if you create a PAR2 set for each split file. In other words, if a file is split into 10 parts, you’d be creating 10 sets of PAR2s (note that ‘sets’ are not volumes). This, however, doesn’t seem to be commonplace (and may not be widely supported) and hurts recoverability. If, however, the notion of speed over recoverability sounds attractive, you’re likely better off just increasing the PAR2 block size and respectively reducing the recovery block count.
Splitting files can actually hinder this. Usenet already splits files into pieces (articles), so a downloader could already pull select parts. Articles are typically smaller than split RAR parts (~700KB vs several megabytes), so they already provide more flexibility in that regard; the split RAR parts, on the other hand, force you to start from a part boundary rather than at any article.
There’s also the question of why you’d only want part of a file. The only typical reason I can think of is if you want to preview a video without needing to download the whole thing - in such a case, it’s actually easier without RAR since you just need to pause the downloader (or not, if it doesn’t lock the file) and preview the video directly.
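For instance (a hypothetical helper, assuming ~700KB articles as mentioned above, though sizes vary by uploader), mapping a byte range to article indices is trivial, and that’s all the granularity a downloader needs for a partial fetch:

```python
ARTICLE_SIZE = 700 * 1024   # assumed article payload size

def articles_for_range(start_byte, end_byte, article_size=ARTICLE_SIZE):
    # indices of the articles that cover the requested byte range
    return list(range(start_byte // article_size, end_byte // article_size + 1))

# e.g. grabbing just the first 50MB of a video for a preview:
print(len(articles_for_range(0, 50 * 1024 * 1024 - 1)))   # 74 articles
```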
As with the above misconception, Usenet already breaks files into pieces (articles) that can be retried, so a second level of splitting doesn’t really help.
In particular, yEnc has provision for this by including an embedded checksum, allowing for corruption to be detected at the article level. This detection is more robust than what you could get via split RAR parts, and gives the downloader more flexibility in automatically fixing it up, which is what it should be doing instead of requiring manual intervention.
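As a rough sketch of what that looks like (a bare-bones decoder, not how any particular downloader implements it; it assumes the article body lines are already fetched with line endings stripped, and skips edge cases like NNTP dot-unstuffing and size checks):

```python
# Decode one yEnc-encoded article body and verify its embedded CRC32.
import re
import zlib

def decode_and_verify(lines):
    decoded = bytearray()
    expected_crc = None
    for line in lines:
        if line.startswith("=ybegin") or line.startswith("=ypart"):
            continue                              # header lines carry no data
        if line.startswith("=yend"):
            # pcrc32= (multi-part) or crc32= (single-part) holds the checksum
            m = re.search(r"(?:pcrc32|crc32)=([0-9a-fA-F]{8})", line)
            if m:
                expected_crc = int(m.group(1), 16)
            break
        data = line.encode("latin-1")
        i = 0
        while i < len(data):
            if data[i] == 0x3D:                   # '=' escape character
                i += 1
                decoded.append((data[i] - 64 - 42) & 0xFF)
            else:
                decoded.append((data[i] - 42) & 0xFF)
            i += 1
    ok = expected_crc is not None and (zlib.crc32(bytes(decoded)) & 0xFFFFFFFF) == expected_crc
    return bytes(decoded), ok
```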
It perhaps may help if you’re using a downloader with poor recovery capabilities and you’re on a particularly bad connection - neither of which, hopefully, should be common these days.
This is likely untrue, and even if it was, it’d still be less efficient overall. In regards to file allocation, behaviour will vary across OSes, file systems and downloaders, but modern file systems generally support sparse files, so, if anything, allocating a single large file should be faster than allocating multiple smaller files. Even if the file system doesn’t support sparse files, and your downloader insists on pre-allocating files, there’s no efficiency gain with using split files, as you’re just changing when the file allocation is performed (and paying a slight penalty for additional context switches and I/O requests).
Furthermore, even if none of the above was the case, you still have to allocate the complete file during the unrar step, so any (non-existent) saving you’d get would be more than offset by the cost of the unrar operation.
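For the curious, here’s roughly what “allocate one big file and fill it in as articles arrive” looks like - a minimal sketch assuming a filesystem with sparse-file support, where `articles` is a hypothetical iterable of already-decoded (offset, data) pairs:

```python
def write_in_place(path, total_size, articles):
    with open(path, "wb") as f:
        f.truncate(total_size)    # sparse: no data blocks actually written yet
        for offset, data in articles:
            f.seek(offset)
            f.write(data)         # articles can arrive and be written in any order
```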
Downloaders write articles as separate files, which is faster to reconstruct (or needs less disk space) if it's in parts
If a downloader operated this way, splitting would just force two levels of reconstruction to occur, which is obviously more costly than one level. Hence this misconception seems to be based off forgetting that the RAR parts themselves need to be extracted.
In reality though, downloaders shouldn't be writing articles to separate files (it's possible they did so in the past, but it's generally a bad idea), so the assumption here is simply untrue.
This is false - PAR2 doesn't care what type of file you're building a recovery set for.
Usenet binaries already have multiple checksums. yEnc includes a CRC32 checksum, which is verified during download. PAR2 also contains multiple checksums, which a number of downloaders verify as well. The checksum that RAR includes pretty much has no value in the context of Usenet.
Well yeah I suppose, but what are you planning to do with large files on a FAT32 volume? Keep them in pieces and never extract them?
The only case where I can think this makes sense is if you have a download machine that downloads to a FAT32 disk, from where you intend to move the file to a different file system. But in 2021, you should probably consider ditching FAT32 altogether.
Not if you use segmented downloading. Or just use HTTP and a multi-connection HTTP downloader (of which there are plenty out there). As a bonus, it’ll help with torrents and other downloads which aren’t split.
Likely no, but maybe. Obfuscation is a funny thing in that there’s no real standard - it’s just about doing weird and whacky things, so it’s a bit hard to reason about it. So in a way, RAR could obscure things. However, RAR is more or less widespread across Usenet, so really isn’t unusual for obfuscation purposes. At best, it could give you extra knobs to fiddle with, but I don’t really think they’re necessary considering that there’s plenty of other ways to do obfuscation.
An idea I’ve heard is that splitting files enables each file to be posted to a different group, as a form of obfuscation. Of course, you could just post each article to separate groups instead. Or use completely different subjects for each post. Or the multitude of other ways you could obfuscate that isn’t a silly justification for using split RARs.
Note: I don’t consider encryption (using a password) to be obfuscation, though they do often achieve the same end goal. Whether you should be using encryption for uploads targeting public consumption is beyond the scope of this page, nonetheless, if you’re adamant that encryption is required, you ultimately have little choice but to wrap your upload in some form.
I have not heard of such a case. I’ve been uploading hundreds of terabytes of content over the years, with no splitting or RARs, and haven’t heard of any issues related to that.
If such a downloader/indexer which has issues does exist however, it’s probably a good time for it to be updated.
Other than the obfuscation argument (see related point above), I fail to understand how (particularly since a lot of content already is in split RARs).
It's been pointed out that variance in RARing may help with articles to not be identified as being the same. I suspect that this is not likely an issue, but I have no evidence to suggest it either way. Regardless, if it's a concern, randomizing the article size likely serves the same purpose.
The scene distributes primarily via FTP, not Usenet. Why try to be some half-assed ‘scene but not scene’ uploader?
If you’re thinking that there must be some good reason as to why they do it, there really isn’t, other than legacy and the lack of willingness to change technically unsound practices.
If that’s really what your source is, just do what everyone else (e.g. most torrent uploaders) does and extract them. If it helps convince you, the energy/time savings of only one person doing it, vs hundreds/thousands of downloaders, is worth it.
Also, there’s no ‘purity points’ for keeping files as-is. (though if you really must, attach an SRR with the extracted content)
Of course, this does impose some cost to the uploader, which I personally think is worth it, but only the uploader can decide whether they're willing to go to the effort to extract for better accessibility amongst downloaders. It may also be possible for uploader applications to explore automatic unpacking with minimal cost.
Splitting gets around maximum Usenet file size
There is no such thing as a maximum file size on Usenet. I think this misconception stems from the fact that articles have a size limit. Note that Usenet operates on articles, not files, so the article having a size limit doesn't impose any restrictions on the file.
As an uploader, you shouldn't need to worry about article size limits, as your uploader should automatically deal with it for you.
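Here's a rough sketch of what the uploader already does for you: break the file into fixed-size segments, each posted as its own article. The ~700KB segment size is just an assumed common choice, not a protocol requirement - the point is that there's no file size limit to work around.

```python
ARTICLE_SIZE = 700 * 1024   # assumed segment size before yEnc encoding

def segment_file(path, article_size=ARTICLE_SIZE):
    """Yield (part_number, byte_offset, chunk) for each article to be posted."""
    with open(path, "rb") as f:
        part = 1
        while True:
            chunk = f.read(article_size)
            if not chunk:
                break
            yield part, (part - 1) * article_size, chunk
            part += 1
```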
Well... if you sail around in a ship with no internet connection, then “RAR” all you like.
How well is Windows 98 treating you?
Despite what this page is about, there are some legitimate reasons why you’d want to use an archive format, though they’re not typical for many cases of Usenet. Still, I’ll list them out to cover all bases:
- If compression is beneficial (generally not the case for already compressed media content; a quick way to check is sketched after this list)
- If there are many tiny files (i.e. several KB each) being distributed, as grouping them together reduces article fragmentation and PAR2 file fragmentation
- You want to force grouping amongst a few files. Unfortunately, indexers don’t handle a batch of files with different names particularly well (unless you provide an NZB with them grouped) and will likely display them as separate files. I wish there was a better standard for defining file organisation, but the haphazard construction of the Usenet ecosystem doesn’t make it easy to define such
- If encryption (i.e. a password) is being used, so that only a select few can see your ~~backups~~ super sensitive data (note that I don’t consider encryption for the purpose of obfuscation to be ‘encryption’ in this context, unless you’re purposefully keeping the contents private, i.e. for a private indexer)
- You need to preserve a folder hierarchy. yEnc doesn’t forbid paths in file names, but you may need to check support across uploaders and downloaders. If you do find such an application, it might be worth suggesting to the developer to support folder hierarchies
- If the RAR recovery feature is being used, and PAR2 isn’t (this isn’t standard Usenet practice, and is likely undesirable)
- You have custom scripts and don’t want to update them. Removing the RAR step should be a relatively easy change though, so I’d suggest giving it a try. If anything, you should get better speed and reliability
- If you believe that RARs are sacred and must be preserved to not receive divine punishment for going against the status quo
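On the compression point: if you’re unsure whether it would help, a quick-and-dirty check on a sample of the file usually settles it. In the sketch below, zlib is just a stand-in for whatever compressor your archiver would use, and the 8MB sample size is arbitrary.

```python
import zlib

def estimated_ratio(path, sample_size=8 * 1024 * 1024):
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return len(zlib.compress(sample, 6)) / max(len(sample), 1)

# Ratios close to 1.0 mean compression gains you essentially nothing
# (the usual result for video/audio that's already compressed).
```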
In short, use an archive only when it’d make sense to do so in the absence of Usenet being a factor.
Note: if you use RARs for any of the above reasons, splitting isn’t required, although some downloaders benefit from it with “direct unpack” functionality (a streaming unrar would make more sense, saving some I/O and not requiring files to be split, but to my knowledge, no downloader implements such a thing). Also consider using 7z instead, which has fewer licensing issues than RAR and is hence generally preferable.
- Your uploader doesn't provide suspend/resume, and you want to be able to manually do so. Even so, this isn’t straightforward to do as grouping counts may be off, unless manually adjusted.
- You’re uploading a >4GB file from a FAT32 disk, and absolutely refuse to upgrade it to something from this millennium (even though the unsplit file would have had to be sourced from a non-FAT32 disk)
If splitting makes sense, consider avoiding the use of RAR and just use plain split files (.001, .002 etc).
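Plain splitting needs no special tooling - any file splitter works, and `cat`/`copy /b` can rejoin the parts. A throwaway sketch, just to show how trivial it is (part size is up to you):

```python
def split_plain(path, part_size):
    with open(path, "rb") as src:
        n = 1
        while chunk := src.read(part_size):
            with open(f"{path}.{n:03d}", "wb") as dst:   # file.001, file.002, ...
                dst.write(chunk)
            n += 1
    return n - 1

def join_plain(path, part_count):
    with open(path, "wb") as dst:
        for n in range(1, part_count + 1):
            with open(f"{path}.{n:03d}", "rb") as src:
                dst.write(src.read())
```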
There seem to be many benefits to getting rid of the RAR/split step during upload, and few reasons to keep the practice. Many of the reasons cited for using split RARs seem questionable at best, whilst actually sound reasons for RAR are rarely cited. This suggests that most RAR usage is completely unnecessary, and that there’s real benefit to be had from changing it.
Unfortunately, there are a lot of old tools and guides out there that still suggest using split RARs. Furthermore, there are a lot of folk that are stuck in their ways, and will likely be resistant to any change, even if it’s very beneficial to themselves. And that’s not mentioning the general lack of enthusiasm for going against accepted practices. Due to how Usenet is, RARing during uploads is likely to stay the norm for quite a while. However, that’s no reason not to at least question whether split RARs should remain mainstream, and to consider improvement.
Whilst downloaders have the most to gain from abandoning RAR, sadly, it’s all in the hands of uploaders (and sometimes indexers if they impose certain rules). Fortunately, the bulk of Usenet uploads are done by a relative minority, so the practice can be impacted significantly if these folk can be convinced to change.
If you believe that there is much to gain from removing RARs from the picture, consider spreading the idea around, particularly towards uploaders and indexer admins, and hopefully convincing others to follow suit.
If you are an uploader, look to see what it would take to remove the RAR/split step from your upload process (which hopefully shouldn’t be more complex than deleting some lines from your script). If your script/tool doesn’t have that option, ask the developer to implement it, which should hopefully be an easy change for them. And if you’re convinced by my points here, perhaps actually experiment with uploading without RARing.
Not convinced? Consider sharing this idea around anyway, as I think it’s worth discussing. Do also let me know your reasoning though, so I can update this document.
If you're interested in trying out the idea, but confused at the above, here's an alternative explanation:
Most upload guides suggest you do the following to post files to Usenet:
- create a split RAR of the files you wish to post
- create a PAR2 over the files to be uploaded
- submit everything using an uploader
Basically, I'm saying that the first step should be skipped, so you'd just start at step 2.
Some upload applications/scripts will actually automate all three steps for you, in which case, you'd have to find out how to skip the first step in that application.
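In script form, the whole prep step can be as small as the following sketch. The `-r10` redundancy option is par2cmdline syntax (adjust to taste), the file name is hypothetical, and the uploader invocation is a placeholder for whatever tool you already use:

```python
# Simplified workflow: no RAR step, just PAR2 over the original file(s),
# then hand everything to your usual uploader.
import subprocess

files = ["My.Upload.mkv"]    # hypothetical file name; post it as-is

# par2cmdline: create a recovery set with 10% redundancy
subprocess.run(["par2", "create", "-r10", "My.Upload.par2", *files], check=True)

# ...then pass `files` plus the generated .par2 files to your uploader of choice
# (placeholder - substitute your actual tool and flags):
# subprocess.run(["your-uploader", *files, *generated_par2_files])
```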
There are many benefits to getting rid of split RARs, and practically no reason to either split your files or RAR them. This seems like a good case for getting rid of the practice, but that can only happen if you help spread the word.
People rarely use split RARs when creating torrents these days, and for good reason. The same should apply to Usenet.