Threading notes
Just some ramblings on the thought process behind the threading model (explained to someone else on Discord).
First thing to hit there would be threading the generation and meshing (and GPU upload). However, threading comes with a lot of caveats. The data has to be guarded from multiple threads acting on it at the same time. The simplest threading that most people add is to mutex the voxel data and pass the chunks off to another thread to process. This comes with a lot of performance hits.
Anytime you want to access the chunk data (even just for reading) you have to take a mutex lock, which is not cheap (can be in the 100-300 cycle range). The next thing most people turn to is atomics, and although they are way faster than a mutex (can be in the 4-12 cycle range), they are not free either. The atomic instruction itself executes quickly, but it can cause large latencies on the processor. In order to execute an atomic, the processor needs to finish all instructions in its pipeline that might affect the atomic, and also stop taking any more instructions that might affect it (effectively stalling the CPU, which can cost 50+ cycles; a modern CPU can have 200+ instructions in flight and can be pretty conservative about what it thinks might affect the atomic).
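For contrast, a minimal sketch of that naive mutex-on-the-data approach (the `Chunk` layout and `sampleVoxel` helper here are illustrative, not from the original):

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Naive approach: the voxel data itself is guarded by a mutex, so every
// access, even a plain read on the main thread, pays the lock cost.
struct Chunk
{
    std::mutex voxelMutex;
    std::vector<uint8_t> voxels;
};

uint8_t sampleVoxel(Chunk &chunk, std::size_t index)
{
    std::lock_guard<std::mutex> guard(chunk.voxelMutex); // ~100-300 cycles, every time
    return chunk.voxels[index];
}
```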
So what I would do, rather than a mutex for the data, is use a flag `owned` per chunk that marks whether the data is accessible. This flag can only be accessed by one thread (the main thread/render thread normally). I also put in checks for debug builds to make sure that whoever is accessing the flag is on the main thread. I would then set up 2 queues, one `threadQueueProcessChunks` and one `threadQueueProcessedChunks`. These queues will be accessed across threads, so they need to be accessed inside of a mutex lock `threadMutex`. I would run through the main loop and queue up all the chunks that need to be processed into a `mainQueueProcessChunks` (that is only ever accessed by the main thread, again with debug checks). At the end of the main loop I would lock the `threadMutex`, loop through `mainQueueProcessChunks` setting the `owned` flag to false and pushing each chunk to `threadQueueProcessChunks`, then loop through `threadQueueProcessedChunks` setting the `owned` flag to true (updating whatever the thread was working on, i.e. the mesh, texture, etc...), and release the lock.
From the thread side it would just loop: lock the `threadMutex`, copy `threadQueueProcessChunks` to a local queue `threadLocalQueueProcessChunks`, copy `threadLocalQueueProcessedChunks` to `threadQueueProcessedChunks`, and release the lock. From there the thread can work on all the chunks in its local queue, pushing completed ones into `threadLocalQueueProcessedChunks` and occasionally taking the `threadMutex` lock to update the queues.
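A sketch of that worker loop, reusing the `Chunk`, `threadMutex`, and shared queues from the sketch above:

```cpp
// Worker loop; the local queues are owned by this thread alone.
void workerLoop()
{
    std::queue<Chunk*> threadLocalQueueProcessChunks;
    std::queue<Chunk*> threadLocalQueueProcessedChunks;

    for (;;)
    {
        {
            std::lock_guard<std::mutex> guard(threadMutex);

            // Pull pending work into the local queue...
            while (!threadQueueProcessChunks.empty())
            {
                threadLocalQueueProcessChunks.push(threadQueueProcessChunks.front());
                threadQueueProcessChunks.pop();
            }
            // ...and publish everything finished since the last exchange.
            while (!threadLocalQueueProcessedChunks.empty())
            {
                threadQueueProcessedChunks.push(threadLocalQueueProcessedChunks.front());
                threadLocalQueueProcessedChunks.pop();
            }
        } // threadMutex released here; all real work happens unlocked

        // Process the local queue without holding any lock.
        while (!threadLocalQueueProcessChunks.empty())
        {
            Chunk *chunk = threadLocalQueueProcessChunks.front();
            threadLocalQueueProcessChunks.pop();
            // generate / mesh the chunk here
            threadLocalQueueProcessedChunks.push(chunk);
        }

        // In practice you would wait on a condition variable (or sleep)
        // here when there is no work, instead of spinning.
    }
}
```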
This way the main loop really only ever needs to take a single mutex lock per frame to hand off new chunks and collect processed ones. Also, anything on the main thread can access or edit the chunk data whenever the `owned` flag is true.
I have a main thread (which handles game state and rendering), a process thread (which handles the queues for the thread pool and the IO thread, and also orders the requests based on distance from the player and request type), a thread pool (generally the number of physical cores minus 1 or 2) that executes the requests, and an IO thread. I have the following requests:
```cpp
enum Type
{
    UpdatePos,            // handled by process thread
    GenerateRegion,       // handled by threadpool
    CancelGenerateRegion, // handled by process thread
    Generate,             // handled by threadpool
    CancelGenerate,       // handled by process thread
    Read,                 // handled by IO thread
    CancelRead,           // handled by process thread
    Write,                // handled by IO thread
    CancelWrite,          // handled by process thread
    Mesh,                 // handled by threadpool
    CancelMesh,           // handled by process thread
    MeshReturn            // handled by process thread
};
```
Each request type has a priority:
```cpp
namespace Priority
{
    const size_t CancelRead     = 10;
    const size_t CancelWrite    = 10;
    const size_t CancelGenerate = 10;
    const size_t CancelMesh     = 10;
    const size_t MeshReturn     = 10;
    const size_t UpdatePos      = 15;
    const size_t Read           = 25;
    const size_t Generate       = 25;
    const size_t Mesh           = 25;
    const size_t Write          = 50;
}
```
with a lower number meaning a higher priority.
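A sketch of how the process thread might order its queue under those rules; the `Request` struct and its `distanceToPlayer` field are assumptions for illustration, not from the original:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical request record; only priority and distance matter here.
struct Request
{
    std::size_t priority;    // from the Priority namespace above
    float distanceToPlayer;  // e.g. chunk center to player position
    // ... request type, chunk coordinates, payload, etc.
};

// Order by priority first (lower number = higher priority),
// then by distance so requests near the player are served first.
struct RequestOrder
{
    bool operator()(const Request &a, const Request &b) const
    {
        if (a.priority != b.priority)
            return a.priority > b.priority; // priority_queue pops the "largest" element
        return a.distanceToPlayer > b.distanceToPlayer;
    }
};

using RequestQueue = std::priority_queue<Request, std::vector<Request>, RequestOrder>;
```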