Compress world saves, and write them to a single file each. #10784
Compression is fine, joining the files is not.
What compression scheme did you use? Since Cataclysm essentially demands selective loading, I imagine that a format without a file index (or with a very inefficient one) would butcher load performance. As far as saving goes, lightweight compression schemes are very light on the CPU, and files on the order of 4-8MiB are written in a fraction of a second on a modern HDD.

That said, I didn't consider that the JSON generation itself could be a bottleneck as well, and that rewriting the entire archive would necessitate reading everything back in anyway. Hmmm. Perhaps it could be bypassed with a clever indexing scheme that would let us get away with just reading the text from the original archive and recompressing it into the new one.

I can't say I'm a big fan of joining everything into one big file, but JSON is very repetitive (as is most text), whitespace particularly, so the benefit of compressing small files individually is very modest. A decent idea could be to join world tiles, but that would be a huge restructuring, and would only reduce the file count; whether that is worth it is dubious. Someone could sometime check what kind of effect it has on, say, NTFS/FAT32 fragmentation, I guess.

All in all, your response sort of makes me (once again) realize how much I care about some abstract concept of "elegance" over real issues.
IIRC it can take over a minute to save/load under the old system, and that's at one or two overmaps (an overmap is the area revealed by debug map reveal: a little smaller than the area revealed by a map item or lab download). I'll haul out an old version if you'd like--I've got a few kicking around.
I used gzip, I think I may have tried something else too, but can't remember what. Cataclysm was not doing selective loading, it was slurping all of the map data in at game start.
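To illustrate the point about repetitive JSON made elsewhere in this thread, here is a minimal sketch of how well a gzip pass does on map-like data. The payload below is hypothetical (a dict of repeated terrain ids, loosely in the spirit of a submap), not the project's actual save format:

```python
# Sketch: gzip on repetitive, map-like JSON. The payload is made up
# for illustration; it only mimics the repetitiveness of real save data.
import gzip
import json

submap = {"terrain": ["t_grass"] * 1000, "furniture": [], "items": []}
raw = json.dumps(submap).encode("utf-8")
packed = gzip.compress(raw)

# Repetitive text compresses dramatically; the exact ratio depends on the data.
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```

Round-tripping with `gzip.decompress` recovers the original bytes exactly, which is what makes a transparent compressed-save layer feasible.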
We have one, it's a tiered b* or similar tree structure indexed by a mangled set of map coordinates. This looks like a solution in search of a problem, there are MANY things we could do to speed up save read/write speeds and/or to hide said latency from the user, but frankly it doesn't come up, so I don't worry about it.
Have you witnessed this, or is it a theoretical issue? Every time this comes up it turns out that someone is concerned about it, but hasn't actually seen any problems.
As for severity, I encountered upwards of 10 minutes to load/save before
Is that a specially prepared save, something long-running, or just some sort of worst case coincidence?
Just for fun, here's a quick test I just ran on a save I sized up using debug reveal a few times:
For long-running worlds, copying takes approximately half of forever without compression beforehand. One of my worlds is currently sitting at 422MB, 486MB on disk. Compressed (LZMA2, normal compression, 8 threads), it took 10:46 and the file weighs in at 14106KB. Edit: 40336 files, 270 directories for this world. One character.
That sounds like it'd be a good benchmark save; can I get a copy of it?
One blatantly cheaty save, coming right up: https://dl.dropboxusercontent.com/u/3273892/Bull%20Valley.7z |
Thanks much, here's your stats:
Something of note is that even just collecting the submap files from each region still shows about the same compression ratio, but a region can only grow so big (it's naturally bounded in maximum size), so the worst-case scenario looks something like this:
I think 177 milliseconds for a (nearly?) completely full overmap's worth of submaps is okay, and would lead to only 269 files instead of over 40,000 in the case of Bull Valley, above. Does that sound valuable?
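A per-region bundle like the one measured above could be sketched as follows. The directory layout and file names here are assumptions for illustration, not the project's actual on-disk structure:

```python
# Sketch: collect every submap file of one region into a single
# gzip-compressed tar archive, reducing file count per region to one.
import os
import tarfile

def bundle_region(region_dir: str, archive_path: str) -> int:
    """Pack all files under region_dir into one .tar.gz; return file count."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(region_dir)):
            tar.add(os.path.join(region_dir, name), arcname=name)
            count += 1
    return count
```

Since a region is naturally bounded in size, each archive stays small, and reading one back is a single open plus an in-memory index lookup via `tarfile`'s member list.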
The worst case I mentioned was an actual save from a long-running game, in
See my comment about that earlier, use an archiver, they're designed to do
Other than on-disk size, what's the benefit here? It most likely isn't
I think having fewer files to copy over would be a massive improvement. My long-running (1.5 in-game years) save is ~80 MB and copying it over to the newest experimental can get annoying.
Regarding backups, one should make incremental backups. In long running games, a lot of the map files are unchanged and don't need to be copied again. Instruct the copy tool to only copy changed files (as indicated by the files last modification time). This will only copy the changed / new map files. This also works if one backups into an archive file (like zip or tar). |
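The mtime-based incremental copy described above can be sketched in a few lines. This is an illustrative sketch, not a recommendation over dedicated tools like rsync:

```python
# Sketch: copy only files that are missing in the destination or newer
# in the source (judged by last-modification time), as described above.
import os
import shutil

def incremental_copy(src: str, dst: str) -> int:
    """Mirror src into dst, copying only changed/new files; return count copied."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        out = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(out, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(out, name)
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)  # copy2 preserves the modification time
                copied += 1
    return copied
```

Because `shutil.copy2` preserves mtimes, a second run over an unchanged save copies nothing, which is exactly why this approach suits long-running worlds where most map files never change.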
Yeah, I use rsync for backups. I don't personally mind the massive directory size, since it's the only one I have of that size and I've still got a solid ~130GB left on this disk, but I imagine it could be troublesome for other people. |
Please do not report theoretical or 'problems other people might have'. We |
Alright, here's your real problem, then. That world save that I uploaded, when uncompressed:
But when compressed, it's all of 14MB. tl;dr: world saves for long-term games, especially with highly mobile characters (i.e. "mobile fortress"-style games), are stupidly huge.
Here's a thought: instead of copying over your save, why don't you just keep the archive of your old version? Copy the new one right over the old one and avoid the whole backup/copy mess. I remember the old, monolithic saves, and sighing, seeing the progress indicator x/7000 or some ridiculous number, knowing I would have to delete everything but my character file and let the world regenerate. This new way is MUCH, MUCH better.

You could also enable folder compression if size is that much of an issue. I don't know if Linux has something similar, but I just compress my whole DDA folder and forget about it.

One other thing, it might not apply to others, but I'm a bit of a munchkin hoarder and I noticed that crafted items take up a LOT of space since they store every single item from which they were crafted. And if those components were themselves crafted... After I removed the relevant section of crafting code, I noticed common savings of several hundred megabytes.
And then people complain that their crafted items lose what they're made from. Deus Ex syndrome: keeping the place exactly the way you left it is possible, but expensive. |
Using a strong compression that takes a long time to finish (on my linode:
I dunno, it only seems to take measurable time when you have a complete overmap, and you wouldn't be likely to save more than four overmaps at the same time. Noticeable, maybe, but still minimal. To be honest, I'm far more concerned by your other point -- the risk of save-breaking bugs. Which is a major flaw in my book.
Offtopic, but speaking of storing everything an item was made out of, I thought about splitting disassembly off into separate disassembly recipes, and craftables themselves into two categories - one being reversible crafts, the other being otherwise disassemblable ones.

And back onto the issue, would it be possible to avoid saving, and simply regenerate things if they were not modified? E.g. before generating a tile/chunk, randomize a couple of seeds to use for secondary RNG generators, and flag unmodified terrain/items/vehicles/etc. so they aren't saved and only regenerated.
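The seed idea above relies on mapgen being deterministic given a seed: an unmodified chunk then needs nothing saved but its seed. A minimal sketch of that property, with a made-up chunk generator (the terrain ids and chunk size are placeholders, not the game's):

```python
# Sketch: if generation is a pure function of a stored seed, unmodified
# chunks can be regenerated instead of saved.
import random

def generate_chunk(seed: int) -> list[str]:
    """Deterministically produce a toy 16-tile chunk from a seed."""
    rng = random.Random(seed)
    return [rng.choice(["t_grass", "t_dirt", "t_tree"]) for _ in range(16)]

chunk_seed = 1234
first = generate_chunk(chunk_seed)
again = generate_chunk(chunk_seed)
assert first == again  # same seed -> same terrain: nothing to write to disk
```

The hard part, as the reply below points out, is not the seeding itself but making all of mapgen and the map code route through such a pure path.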
Maybe they didn't realize how much it would cost. I'm not advocating anything, but keeping the component list isn't something I care about which is exactly why I removed it in my own version. I really can't imagine there are that many people willing to pay 100+ MB of disk and increased save/load times just so their items remember what they were crafted from though. But I hardly ever disassemble stuff I crafted, so maybe that's just me. Think about it though: That Steel frame in your car, you made that out of, what, 8 lumps of steel? And you made those lumps of steel out of what, 4 chunks? And you made those chunks out of what, 3 scrap, or some such? And how many of those do you have in your car? Things can quickly get out of hand like that.
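The back-of-envelope arithmetic in that steel-frame example (figures taken from the comment above) shows how fast the stored component tree fans out:

```python
# Nested component counts from the example above:
# 1 steel frame = 8 lumps, 1 lump = 4 chunks, 1 chunk = 3 scrap.
lumps_per_frame = 8
chunks_per_lump = 4
scrap_per_chunk = 3

# If every crafted item remembers its full component tree, a single
# frame's record ultimately references this many leaf items:
leaf_items = lumps_per_frame * chunks_per_lump * scrap_per_chunk
print(leaf_items)  # 96 scrap entries for one frame
```

Multiply that by every frame in a vehicle, and by intermediate items also carrying their own trees, and the "several hundred megabytes" figure stops being surprising.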
I think that would help. Does anyone REALLY care what a steel frame was made out of? Does it even make sense disassembling something like a steel frame?
Yeah, my mobile fortress save is 3 years in and is a good 800+ megs, heh. Compression would be gnarly, as it is less than a tenth of that.
As soon as you put it into a car, the vehicle and all its data vanish, there is no overhead anymore. When you remove the frame, you get a default frame item (with no components, it's like a world-gen item). |
I was talking about the frames stored in your trunk. ^_^ You're right, of course. I forgot about that. I can only speak from my own experience, but I personally saved 100-400 MB on my saves when I removed the component memory code, though none of that was in my car. You would be subject to that example WHILE you were constructing the car, however, if you save regularly and have finished but uninstalled parts lying around. And if you were to decide you didn't need that extra frame and just left it there... |
riiight...
Keep in mind a complete overmap will at some point have multiple z-levels in it. We're holding back some capacity for that so it remains possible, if we're not careful we'll find ourselves with a multi-level implementation we can't turn on because it's too slow.
Do you mean not storing components in an item if the crafting recipe isn't reversible? That's already done.
Sure, if you rewrite all of mapgen and most of the map code to handle regenerating unchanged things and saving only things that need to be saved. In case that doesn't make it obvious, I think this is a terrible idea. The state of the issue is exactly where it was when it started.
Is that strictly correct? Does "disassemble-able" (assuming constructible as well) equate to "reversible," logically speaking? Going back to the example of the steel frame, clearly if you can construct something like a steel frame, you should be able to melt it down and reuse the scraps. But we shouldn't need to store the information on what exactly went into that frame because we shouldn't be able to disassemble it that way. It's a fixed mass of steel. I suppose the work-around would be allowing the player to "cut up" the frame, rather than disassemble it. That would be more logical, though it sounds silly.

Also, there's a lot of redundancy in the component memory. The NX-17 for example has zero options for crafting components. It takes a vacutainer, 8 amplifier circuits, and 8 power converters. Storing its components is completely redundant information. Sure, you could make the argument that someone later on might alter the recipe, but is that worth the space NOW?
You're getting hung up on terminology; "disassemble" in DDA means "reverse the crafting recipe". It doesn't matter how that happens: if it's feasible to get the inputs back out, it's reversible. Yes, there's a lot of redundancy all over the place. Tuning the code to be minimal takes a lot of time and makes the code fragile, so we don't do it unless it's necessary. I haven't seen anything here indicating it's necessary.
I don't think I'm getting hung up on terminology which is precisely why I included the example. Now, you could argue that's just an exception and that steel frames shouldn't be disassemble-able at all, but that's an example of what I mean when I say there can be a difference between the concepts of "disassemble" and "reverse the crafting recipe." I don't think it's a foregone conclusion that there aren't other examples or that the solution is to simply make all of them irreversible. Is there nothing in between "cut-up" and "disassemble?" I don't think it makes sense to be able to disassemble a steel frame, but logically there should be a method of recovering the materials, whether it be adding a new item/recipe property (disassemble-able vs. reversible), expanding the 'B'utcher menu "cut-up" option (possibly renaming to something like "salvage"), or simply adding recipes to turn frames into scrap. |
On Fri, Jan 9, 2015 at 11:24 AM, ejseto notifications@github.com wrote:
Nope. Also, the internal term for "cut up" is salvage.
And actually, let's have a little history lesson:
To summarize, there are, at most, two ways to break down a given item:
My point was that I don't think it really makes sense to disassemble a steel frame, at least not in any traditional sense. You could hand-wave your way around it by pointing out that scrap, chunks and lumps are fairly ambiguous terms, but cutting it up makes more sense than disassembling it. But as you said, the salvage system is too lossy for something as basic as a steel frame, and you certainly wouldn't be able to do it with scissors. I'm aware of the internal name, that's exactly why I said salvage. What matters is what's visible to players. It doesn't make sense to "cut up" a steel frame, at least not with anything less than a blow torch. But it also doesn't really make sense to "disassemble" one either, since you're not getting from it exactly what you put into it without that hand-waving.
So the issue is that crafting then disassembling a steel frame gives you

A hybrid approach I just thought of for compression would be to have the
What about differentiating between "assembled" and "used" components, where assembled components are stored for retrieval on disassembly, whereas instead of retrieving used components, item-specific ones are returned? For the sake of argument, let's say you're making a utility backpack out of either rags or a rucksack, and adding to it one of a few knives, and other tools. In this case, information about the knife would be stored, but the rags/rucksack are "used up", and disassembly will only yield back a rucksack.
Just because I don't mind it doesn't mean it's not a problem.
On Jan 9, 2015 8:57 AM, "Asmageddon" notifications@github.com wrote:
The only sensible way I can think of to do that is to chain the recipes,
By saying you don't mind, you indicated that any issues you raise are
The object is manufactured once, from a single recipe. While this could be insufficient for some obscure cases, the disassembly could just yield item-specific items, and a list of non-used-up components. E.g. recipe uses up 4 scrap metal and 1 survival knife or 1 combat knife, with the knife being stored in a |
Sounds like you're proposing that base materials (rags, scrap/chunks/lumps

On Fri, Jan 9, 2015 at 10:48 PM, Asmageddon notifications@github.com
Yup. Basically anything that isn't just assembled into the thing(screws, sticks glued together, tools put in a toolbelt, puzzle) and is instead welded/cut/fried heavily, would lose its identity and become a part of the resulting item itself, rather than a component. |
The question is, how do you specify that, and how simple is it? Example json would be fine. |
Since I'm working on recipe stuff anyway, and changing the formats up a bit, primarily to more objects and fewer arrays (for extensibility and reusability), I could probably do that. What I've got currently is this, in which the way to do that would be either to add a per-component flag, or to have two separate lists. Neither is problematic, so that's a style decision, I guess.
I'm converting individual requirements into separately loadable bits, and objects from arrays. The upsides are primarily cleaner and less monolithic code, as well as extensibility. Recently, someone added a new flag for components, which, as of right now, is optional. If you added one more flag or number, the format would become ambiguous and impossible to parse properly. Won't happen? Maybe. But it's not really much more of a pain to write a JSON object than it is to write a JSON array. Also, if anyone desires it, this redundancy makes it possible to mix quality requirements into the alternatives vector with just a bit of fiddling. Keep arrays for arrays and lists, use values/objects for the rest, IMO.

Oh, and for the record, the min/difficulty was for a test of an optional skill requirement feature that doesn't impact existing stuff whatsoever; simply something I'd like available for the future, for "this is harder to do than it is to learn/attempt" recipes/actions.

As for an example, with two separate lists, this is what it could look like.
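The example attachment didn't survive in this archive, so here is a hypothetical sketch of the two-list idea being discussed -- one list of components returned on disassembly, one of components consumed for good. Every field name below is an assumption for illustration, not the project's actual recipe format:

```json
{
  "result": "utility_backpack",
  "components": [
    [ [ "knife_combat", 1 ], [ "knife_hunting", 1 ] ]
  ],
  "consumed": [
    [ [ "rag", 12 ], [ "rucksack", 1 ] ]
  ]
}
```

Here the knife would be stored on the item and recovered on disassembly, while the rags or rucksack lose their identity, so nothing about them needs to be remembered in the save.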
Going back to the original topic of compressed saves, how about some hybrid system? Where currently there is a separate X.Y.Z folder, and o.X.Y file, why not combine those into separate archives: X.Y.foo? Even just something like a .tar would probably help a lot. As for the json, if not being able to use new saves with old versions is not a problem, there is a lot of redundant information that could be cut out and made into optional aka default values. |
As I outlined earlier, we could 'archive' map data as the player moved |
It's not obvious why @KA101 closed the issue, at least not without reading the whole discussion; can we get at least a simple explanation? The title of this issue suggests a solution that I think can be seamlessly implemented by just first trying to load a compressed file and falling back to the uncompressed version if that fails.
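The fallback load path suggested here is small enough to sketch directly. File naming (`.gz` suffix alongside the plain file) is an assumption, not the project's scheme:

```python
# Sketch: prefer a gzip-compressed save file, fall back to plain text.
import gzip
import os

def load_save_file(path: str) -> bytes:
    """Read path + '.gz' if present, otherwise the uncompressed file."""
    gz = path + ".gz"
    if os.path.exists(gz):
        with gzip.open(gz, "rb") as f:
            return f.read()
    with open(path, "rb") as f:
        return f.read()
```

Because old saves simply lack the `.gz` file, they keep loading unchanged, which is what makes the rollout seamless for existing worlds.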
Look at the first reply. Everything else was bikeshedding.
The overall problems with this thread were:
Ok, I misread the title. What I meant was not putting the whole save into one file, just compressing individual files within the save folder. Thank you, I'll try looking into it later. |
Currently, Cataclysm savegames generate humongous messes of thousands of files, the whole thing compressible down to less than 10% of its original size. The file count is a problem for filesystem fragmentation and for copying. Even for longer-running worlds, zipped savegames rarely cross the threshold of 8MiB, and I believe that on all but the slowest of machines, even rewriting the entire world save (as opposed to incremental saving) would be faster than the current approach.