Threading notes
Just some ramblings on the thought process behind the threading model (explained to someone else on Discord).
First thing to hit there would be threading the generation and meshing (and GPU upload). However, threading comes with a lot of caveats. The data has to be guarded from multiple threads acting on it at the same time. The simplest threading that most people add is to mutex the voxel data and pass the chunks off to another thread to process. This comes with a lot of performance hits.
Anytime you want to access the chunk data (even just for reading) you have to take a mutex lock, which is not cheap (can be in the 100-300 cycle range). The next thing most people turn to is atomics, and although they are way faster than a mutex (can be in the 4-12 cycle range), they are not free either. The atomic instruction itself executes quickly, but it can cause large latencies on the processor. In order to execute an atomic, the processor needs to finish all instructions in its pipeline that might affect the atomic, and also stop taking any more instructions that might affect it (effectively stalling the CPU, which can cost 50+ cycles; a modern CPU can have 200+ instructions in flight and can be pretty conservative about what it thinks might affect the atomic).
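For contrast, a minimal sketch of that naive mutex-on-the-data approach (the `Chunk` layout and `sampleVoxel` helper here are illustrative, not from the original):

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Naive approach: the voxel data itself is guarded by a mutex, so every
// access, even a plain read on the main thread, pays the lock cost.
struct Chunk
{
    std::mutex voxelMutex;
    std::vector<uint8_t> voxels;
};

uint8_t sampleVoxel(Chunk &chunk, std::size_t index)
{
    std::lock_guard<std::mutex> guard(chunk.voxelMutex); // ~100-300 cycles, every time
    return chunk.voxels[index];
}
```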
So what I would do, rather than a mutex for the data, is use a flag `owned` per chunk that marks whether the data is accessible. This flag can only be accessed by one thread (the main thread/render thread normally). I also put in checks for debug builds to make sure that whoever is accessing the flag is on the main thread. I would then set up 2 queues, one `threadQueueProcessChunks` and one `threadQueueProcessedChunks`. These queues will be accessed across threads, so they need to be accessed inside of a mutex lock `threadMutex`. I would run through the main loop and queue up all the chunks that need to be processed into a `mainQueueProcessChunks` (that is only ever accessed by the main thread, again with debug checks). At the end of the main loop I would lock the `threadMutex`, loop through `mainQueueProcessChunks` setting the `owned` flag to false and pushing each chunk to `threadQueueProcessChunks`, then loop through `threadQueueProcessedChunks` setting the `owned` flag to true (updating whatever the thread was working on, i.e. the mesh, texture, etc...), and release the lock.
From the thread side it would just loop: lock the `threadMutex`, copy `threadQueueProcessChunks` to a local queue `threadLocalQueueProcessChunks`, copy `threadLocalQueueProcessedChunks` to `threadQueueProcessedChunks`, and release the lock. From there the thread can work on all the chunks in its local queue, pushing completed ones into `threadLocalQueueProcessedChunks` and occasionally taking the `threadMutex` lock to update the queues.
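A sketch of that worker loop, reusing the `Chunk`, `threadMutex`, and shared queues from the sketch above:

```cpp
// Worker loop; the local queues are owned by this thread alone.
void workerLoop()
{
    std::queue<Chunk*> threadLocalQueueProcessChunks;
    std::queue<Chunk*> threadLocalQueueProcessedChunks;

    for (;;)
    {
        {
            std::lock_guard<std::mutex> guard(threadMutex);

            // Pull pending work into the local queue...
            while (!threadQueueProcessChunks.empty())
            {
                threadLocalQueueProcessChunks.push(threadQueueProcessChunks.front());
                threadQueueProcessChunks.pop();
            }
            // ...and publish everything finished since the last exchange.
            while (!threadLocalQueueProcessedChunks.empty())
            {
                threadQueueProcessedChunks.push(threadLocalQueueProcessedChunks.front());
                threadLocalQueueProcessedChunks.pop();
            }
        } // threadMutex released here; all real work happens unlocked

        // Process the local queue without holding any lock.
        while (!threadLocalQueueProcessChunks.empty())
        {
            Chunk *chunk = threadLocalQueueProcessChunks.front();
            threadLocalQueueProcessChunks.pop();
            // generate / mesh the chunk here
            threadLocalQueueProcessedChunks.push(chunk);
        }

        // In practice you would wait on a condition variable (or sleep)
        // here when there is no work, instead of spinning.
    }
}
```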
This way the main loop really only ever needs to take a single mutex lock per frame to hand off new chunks and collect processed ones. Also, anything on the main thread can access or edit the chunk data whenever the `owned` flag is true.
I have a main thread (which handles game state and rendering), a process thread (which handles the queues for the thread pool and the IO thread, and also orders the requests based on distance from the player and request type), a thread pool (generally the number of physical cores minus 1 or 2) that executes the requests, and an IO thread. I have the following requests:
```cpp
enum Type
{
    UpdatePos,            // handled by process thread
    GenerateRegion,       // handled by threadpool
    CancelGenerateRegion, // handled by process thread
    Generate,             // handled by threadpool
    CancelGenerate,       // handled by process thread
    Read,                 // handled by IO thread
    CancelRead,           // handled by process thread
    Write,                // handled by IO thread
    CancelWrite,          // handled by process thread
    Mesh,                 // handled by threadpool
    CancelMesh,           // handled by process thread
    MeshReturn            // handled by process thread
};
```
Each request type has a priority:
```cpp
namespace Priority
{
    const size_t CancelRead     = 10;
    const size_t CancelWrite    = 10;
    const size_t CancelGenerate = 10;
    const size_t CancelMesh     = 10;
    const size_t MeshReturn     = 10;
    const size_t UpdatePos      = 15;
    const size_t Read           = 25;
    const size_t Generate       = 25;
    const size_t Mesh           = 25;
    const size_t Write          = 50;
}
```
with a lower number meaning a higher priority.
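A sketch of how the process thread might order its queue under those rules; the `Request` struct and its `distanceToPlayer` field are assumptions for illustration, not from the original:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical request record; only priority and distance matter here.
struct Request
{
    std::size_t priority;    // from the Priority namespace above
    float distanceToPlayer;  // e.g. chunk center to player position
    // ... request type, chunk coordinates, payload, etc.
};

// Order by priority first (lower number = higher priority),
// then by distance so requests near the player are served first.
struct RequestOrder
{
    bool operator()(const Request &a, const Request &b) const
    {
        if (a.priority != b.priority)
            return a.priority > b.priority; // priority_queue pops the "largest" element
        return a.distanceToPlayer > b.distanceToPlayer;
    }
};

using RequestQueue = std::priority_queue<Request, std::vector<Request>, RequestOrder>;
```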