Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor architecture to prevent race conditions #322

Open
2 tasks
lanedirt opened this issue Aug 29, 2024 · 0 comments
Open
2 tasks

Refactor architecture to prevent race conditions #322

lanedirt opened this issue Aug 29, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@lanedirt
Copy link
Owner

Description

There have been multiple problems reported that seem to be caused by race conditions. This is a high-level issue to track bug reports and smaller individual issues/PR's that are associated with fixes.

What is basically comes down to: the way how OGameX works is that there are multiple events during a page load that can trigger database updates. And since multiple pageloads can happen at the same time because of AJAX requests but also because of other players interacting with the current players plantes.

Race condition example

All database updates are done through the Laravel Eloquent ORM. Example flow for the Planet object:

During page load:

  1. Planet object of current user gets loaded in memory.
  2. Planet gets updated with newly gained resources according to hourly production since last page load. This calls PlanetService::save() method which calls the Eloquent $this->model->save().
  3. Player missions that have arrived are processed. For example a returning transport mission adds resources and units (ships) to the current planet which then also saves the planet.
  4. Etc.

When multiple page loads happen at the same time these so called "Race conditions" can occur which means two processes are trying to update the same field in the database at the exact same time which causes one of the processes to get its data overwritten by the other. This causes data loss and can result in lost resources/units, or illegally gained resources/units.

Solution

In order to prevent race conditions there are multiple ways to solve this. At the time of writing the following solutions are (partly) implemented:

  1. Pessismistic locking = This means that before a record in the database is updated, it is 1) locked for updating exclusively 2) after the lock is gained the record is read from the database to get the newest content 3) Updated are saved to database 4) The lock is released... This should work well, however it requires all places that mutate the database to implement these exact 4 steps to ensure that all updates happen on a locked record. And not all places have been changed yet in the code to accommodate for this.
  2. Optimistic locking = This means that whenever a record is updated a WHERE timestamp = lastKnownTimestamp is added to the update query. When the timestamp in the database is changed by another process, the update will fail. This ensures that only one database update is committed, and other pending updates do not overwrite other update's changes. Benefit of this method is that it does not require any locking, but it does require that the database operations (transactions) that use this can support silent failures (0 records updated) without crashing the page or causing other out-of-date errors.

Reported issues

See earlier reported issues:

Code changes

Some work has already been done to combat this, see the following PR's:

In order to test for race conditions custom php artisan commands have been added that can be used to issue parallel requests in order to simulate race conditions. See:

  • php artisan test:race-condition-unitqueue
  • php artisan test:race-condition-game-mission

Todo

Here are the remaining (known) todo's:

  • Check all planet save calls to ensure all callers 1. acquire an update lock, 2. refresh the planet data 3. mutate the planet object. 4. release the update lock.
  • Investigate use of optimistic locking within the current architecture to determine if its feasible to support this. If not feasible, investigate what architecture changes are required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant