Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] World persistence over Database #92

Closed
Asurar0 opened this issue Sep 12, 2024 · 9 comments
Closed

[Discussion] World persistence over Database #92

Asurar0 opened this issue Sep 12, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request

Comments

@Asurar0
Copy link
Contributor

Asurar0 commented Sep 12, 2024

Introduction

Minecraft employs a dedicated file format, known as the Anvil file format, to store chunk regions on disk for subsequent retrieval. This file storage approach is consistently utilized across all Minecraft Java Edition server implementations.

This issue aims to discuss the advantages of utilizing a B+Tree database (notably LMDB) over a file I/O approach for storing the world, as well as provide tools for converting between these storage formats, specifically exporting .MCA files from a database and importing .MCA files into a database.

Assumptions

The following assumptions underlie the reasoning of this proposal:

  • The term "Database" refers specifically to "key-value storage engines" or "embeddable transactional NoSQL databases in the form of a key-value store". This definition excludes RDBMS and does not imply the same level of features as more popular databases such as SQLite, PostgreSQL, Redis, or MongoDB. The common paradigm among these databases is a collection of tables that serve as key-value pair storage repositories. They are designed to facilitate efficient and high-performance retrieval and updating of large structured datasets.
  • The performance of both file I/O and database systems is heavily influenced by the underlying filesystem. Given that Pumpkin is intended for universal use, it is prudent not to assume the superiority of one approach over another based on a specific filesystem, as different operating systems and linux distributions employ varying default filesystems (e.g., Windows primarily uses NTFS, some Linux distributions do not include XFS by default, and F2FS is a specialized option, etc...).
  • We distinguish between the terms "Memory map", "Memory mapping" and "Memory mapping I/O" and the term "I/O". Specifically, "Memory map" and "Memory mapping I/O" refer to the use of memory mapping capabilities provided by the kernel, such as the mmap() function in POSIX, whereas "I/O" refers to classical input/output operations over file descriptors (or equivalent), exemplified by the read() function in POSIX.
  • We refer to the in-memory storage of loaded chunks as "Cache", which may incorporate an eviction policy or a timeout mechanism to manage the retention of chunks data.

File I/O

The File I/O approach, utilized by the standard Minecraft Java Edition server implementation, involves storing data in a dedicated folder containing .MCA files, each named according to the location of the 32x32 chunk region it represents:

Name of the .MCA file:
r.{X region}.{Z region}

Pumpkin can derive the region coordinate from a specific chunk location as follows:

// division
let region_chunk_x: i32 = player_chunk_x.div_floor(32);
let region_chunk_z: i32 = player_chunk_z.div_floor(32);

// bit shifts
region_chunk_x = region_chunk_x >> 5;
region_chunk_z = region_chunk_x >> 5;

The logic for interacting with chunks involves the following steps:

  • Determine the player's block location.
  • Derive the chunk location from the block location.
  • Derive the region location from the chunk location.
  • Retrieve and parse the corresponding region file from disk.
  • Render all chunks within the render distance.
  • If a chunk to be rendered spans across regions, load the adjacent region file.
  • If the cache decides to unload a chunk, update the corresponding MCA region file with the changes.
  • Upon server shutdown, update the MCA region files with any pending changes.

Certain steps can be optimized. For instance, by calculating the distance between the player and the edge of the region, you can determine if it is less than the render distance. If so, you can proactively fetch both region files simultaneously, reducing the need for subsequent region file loads.

Pros:

  • Compatibility with vanilla Minecraft worlds
  • Simplified maintenance in the event of data corruption, as only the affected region file needs to be handled
  • Improved efficiency in smaller worlds

Cons:

  • Performance is heavily dependent on the filesystem and file descriptors, which can be a bottleneck.
  • It can become problematic under extreme parallel workloads, such as those experienced with a large player count (e.g., over a thousand players like Folia), as the OS's blocking I/O implementation can lead to issues with loading and unloading hundreds of files. The primary mitigation for this is through hardware solutions, such as a RAID setup.

Database Approach

The database approach utilizes LMDB, a B+Tree-based database, to store chunk information in tables, enabling logarithmic time retrieval and updates. This is the approach used
by FerrumC project.

The logic for interacting with chunks involves the following steps:

  • Determine the player's block location.
  • Derive the chunk location from the block location.
  • Calculate a key based on the chunk location (and optionally the dimension).
  • Use the key to retrieve chunk information from the relevant tables.
  • Render all chunks within the render distance.
  • If the cache decides to unload a chunk, update the database with the latest chunk information from the cache.
  • Upon server shutdown, remove all chunks from the cache and insert them into the database.

The performance of this approach also depends on the implementation of the database, moreover, parallelization of reads and single-threaded writes over a dedicated threadpool.

Why LMDB ?

LMDB is considered a gold standard due to its exceptional performance and reliability, surpassing that of other options in the ecosystem. MDBX is an improvement over LMDB, but its Rust crate support is unfortunately declining.

Pros:

  • Well-suited for extreme parallel workloads and efficient handling of large datasets (more than 50GB up to several TBs)
  • Utilizes memory mapping, resulting in significant I/O efficiency gains compared to classical I/O methods
  • Offers more flexible data modeling, enabling more efficient information retrieval (e.g., retrieving only chunk information without entities).

Cons:

  • More challenging maintenance in the event of data corruption (LMDB is resilient to power outage and OS failure but not to disk corruption).
  • Requires the implementation of import/export functionality to enable users to convert their existing worlds.

Context

This proposal follows my previous experience assisting FerrumC in migrating from RocksDB (LSM structure) to LMDB (B+Tree structure), and aims to address and clarify any misconceptions surrounding the original proposal I made on Discord.

Conclusion

The MCA Region file approach is a proven and suitable solution for small to medium-sized servers, running on consumer or dedicated hardware, respectively. However, Pumpkin aims to achieve performance and efficiency that surpasses even the largest single-map servers, such as Folia and MultiPaper, which have successfully handled around 1000 simultaneous players. Pumpkin's goal is to exceed this benchmark, targeting player counts of over 1500 on a single dedicated hardware setup, a scale that would push the limits of file-based approaches. It is well understood that file-based approaches have inherent limitations when it comes to handling high parallel workloads.

This issue propose three options for Pumpkin:

  • Adopt the MCA Region file approach
  • Adopt the LMDB storage approach
  • Develop a zero-cost abstraction layer that supports both MCA Region file and LMDB backends, allowing users to choose their preferred storage method.
@Asurar0 Asurar0 added the enhancement New feature or request label Sep 12, 2024
@Bryntet
Copy link
Contributor

Bryntet commented Sep 12, 2024

Offers more flexible data modeling, enabling more efficient information retrieval (e.g., retrieving only chunk information without entities).

This is not something that is only achievable via a DB. I feel it very necessary that we don't conflate "using file I/O" with "using Minecraft's Anvil format".

I am still not convinced of the fact that we need a DB. But I think discussion around this is good.

I also think that supporting the original Anvil file format is a must, if we want to actually see adoption. But this should be togglable in config, to instead use Pumpkins format, whichever we choose to make it.

Supporting saving files in the Anvil file format should not be development priority though, our own world-saving method should be a the gold standard in our code base.

@Asurar0
Copy link
Contributor Author

Asurar0 commented Sep 13, 2024

This is not something that is only achievable via a DB. I feel it very necessary that we don't conflate "using file I/O" with "using Minecraft's Anvil format".

My assumption that we will be using MCA region files came from @Snowiiii words on discord:

ofc
but i would prefer using mca not databases

that way people can easly download their worlds
also way would you do if someone drops in an vanilla world?

I agree that we could create our own binary format, but that would mean reinventing the wheel. That being said, I'm happy to hear any ideas you have for improving the Pumpkin format.

I think we're still missing the bigger issue here, which is handling extreme parallel workloads with thousands of players. One key benefit of a single file mmap database is not just I/O efficiency, but also that the OS caches the file descriptor and its memory region, making memory mapping essentially 'free'. If we go with a file-based approach and try to use memory mapping, we'd still incur the cost of filesystem lookups and page faults, which will slow things down.

@Snowiiii
Copy link
Member

Sounds like a cool idea. I'm not against using a database approche as long User's can export their World files. Im fine if you want to start implementing db storage, Im interested to see real Performance benchmarks between databases and just normal OS I/O (e.g. Windows, Linux).

@Bryntet
Copy link
Contributor

Bryntet commented Sep 17, 2024

I agree that we could create our own binary format, but that would mean reinventing the wheel

Is the Anvil ".mca" format compressed at all?

If not, then our wheel could be significantly better.

@Asurar0
Copy link
Contributor Author

Asurar0 commented Sep 17, 2024

I agree that we could create our own binary format, but that would mean reinventing the wheel

Is the Anvil ".mca" format compressed at all?

If not, then our wheel could be significantly better.

Yes it is. https://minecraft.wiki/w/Region_file_format#Payload

@Snowiiii
Copy link
Member

I agree that we could create our own binary format, but that would mean reinventing the wheel

Is the Anvil ".mca" format compressed at all?

If not, then our wheel could be significantly better.

It can be. It also can be not

@BGR360
Copy link

BGR360 commented Oct 17, 2024

Performance is heavily dependent on the filesystem and file descriptors, which can be a bottleneck.

It can become problematic under extreme parallel workloads, such as those experienced with a large player count (e.g., over a thousand players like Folia), as the OS's blocking I/O implementation can lead to issues with loading and unloading hundreds of files. The primary mitigation for this is through hardware solutions, such as a RAID setup.

If we go with a file-based approach and try to use memory mapping, we'd still incur the cost of filesystem lookups and page faults, which will slow things down

@Asurar0 do you have any data to back up any of these claims? If not, this is just speculative premature optimization, and we should wait until we have data that confirms that the OS / file system is a bottleneck for large servers.

I'm also confused by what some of your claims mean.

Performance is heavily dependent on [...] file descriptors

What do you mean by this? Are you suggesting that the OS slows down when you have more file descriptors open? Are you referring to the fact that the OS may have a limit for the number of file descriptors that can be open at once?

the OS's blocking I/O implementation can lead to issues with loading and unloading hundreds of files

I don't understand what you mean by this. Blocking file I/O affects both the Anvil and DB approaches. And in fact, the problem is kind of worse when you use memory mapping, because then you have no control over when your threads block due to page faults.

One key benefit of a single file mmap database is [...] that the OS caches [the] memory region

This is true without memory mapping as well. Any recently accessed anvil files will be in the OS page cache.

we'd still incur the cost of filesystem lookups and page faults

You'll have to incur the cost of page faults with the single file DB just as much as you would with many files, unless the DB is substantially more space efficient. It all depends on how much data the OS can keep in the page cache.

The file system lookups, though, sound like a plausible source of slowdown.

The primary mitigation for this is through hardware solutions, such as a RAID setup.

I've never heard of this as a solution for improving the performance of loading and unloading hundreds of files, but I'm curious to know more. Can you provide any sources for me to read on this?

@28Smiles
Copy link

28Smiles commented Nov 4, 2024

I just stumbled upon this and I have multiple Questions. If you use a Database, why not go with an SQL compatible one, to be able to tap into a greater Ecosystem, allowing multiple Vendors, Configurations and/or even external Databases? Why B+ Trees, if you want to access an chunk, you know exactly where it is, hence you are able to use an hash index (for max cache util a locality-sensitive one), which has superior lookup times. Storing chunkdata in files, roughly equals a hash based database (-the Featureset). One of the strongest arguments for using an off-the-shelf RDBMS is the ecosystem, for a small server you can just use an SQL-Lite database, for larger servers you can use postgres and for the largest, you can even use a cluster and caching out of the box. Using an organised Format would allow Devs to write (Web-)Apps for e.g. Worldmaps, Scoreboards and so on, without even touching the Server-Code.
This leads me to the questions:

If you want to use a Database, why not an SQL based one?
Why, at this stage of the Project, bother with such premature optimisation anyway, if it does not enrich the featureset?

some comments from an outsider, thank you all

@NoSQLKnowHow
Copy link

I am not sure on the MCA format and the benefits of storing that data in a database, but some things like the ban lists, whitelists, etc. they are all in plain JSON format. For example, to add a new object, you have to bring the entire JSON doc into memory just to append a new entry. Even a simple database like SQLite only brings the pages the query has to deal with in memory. So adding a new entry for an IP ban would be super performant. You could also design the data model, indexes, and views, to be optimized for the known access patterns of checking if someone's IP or user is banned, adding a new entry, and so on. IMO, it'd not be a huge lift to make this optimization early and if need be migrate this to a different DB later.

Storing game state is very different from the, but it is getting ahead of the problem. Before a database is chosen, be it SQL, NoSQL, or whatever, there has to be a target for what the server has to support. What is the MVP, and what does success look like? Even better would be to have targets for times too. For example, it should take no more than XXms from initial connection to entry into the world, and it should support 400 of those simultaneously. How many users should the server support concurrently? and so on. Otherwise, we risk solving for a problem that will not arise in the vast majority of uses and overcomplicate smaller installs. I saw it mentioned in one area about running on a tiny RaspPi, which is fantastic, but that is different solution than 50,000 players in a world and keeping all of game state going. There needs to be a product definition in this regard before technology solutions are chosen.

@Pumpkin-MC Pumpkin-MC locked and limited conversation to collaborators Jan 13, 2025
@Snowiiii Snowiiii converted this issue into discussion #475 Jan 13, 2025

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants