Discussion: Multi Region / Async Replication #4114

Open
benyanke opened this issue Jun 4, 2018 · 13 comments
Labels
type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@benyanke
Contributor

benyanke commented Jun 4, 2018

While DC-local clusters are relatively well understood and easy to implement (just share sessions/codebase/DB), multi-homed installs aren't as well understood.

Does Gitea have any plans for a tool to allow syncing across multiple regions in near-real time? GitLab is already working on one - and I'd like to open this ticket to track the progress of a similar effort here, if it's a direction the maintainers would like to go.

@techknowlogick
Member

Related or duplicate of #2959

@benyanke
Contributor Author

benyanke commented Jun 4, 2018

I'd say this is related but distinct. #2959 clearly discusses a single-homed, multi-server cluster - a single region HA setup - when they discuss a single load balancer.

> it would take to be able to run gitea on a multi-host setup, likely behind a single nginx load balancer.

I'm proposing a system to allow multiple autonomous Gitea installs (say - Oregon, Virginia, and London), which all share data in near-real time to allow a single frontend experience, but which are completely self-contained as well. In other words - multi-region async replication.

@techknowlogick
Member

@benyanke sorry for my terse message above, I hope it wasn't too unfriendly. As you can probably tell from the latest news, we are getting a lot more traffic than usual, so triage is important.

techknowlogick added the type/proposal label Jun 4, 2018
@benyanke
Contributor Author

benyanke commented Jun 4, 2018

No worries! I really value the Gitea project, and I'll be seriously looking at moving my main projects to a self-hosted instance of it given the news. I'm getting more comfortable with Golang, and I'd even be willing to work on this multi-region replication if I'm able.

Keep up the good work.

@techknowlogick
Member

Part of going multi-homed is using a DB that supports it, so I've opened cockroachdb/cockroach#24846 with the CockroachDB project to allow it to support Gitea. This of course doesn't fully achieve what you are looking for, but it is a starting place.

Thanks for the kind message.

@benyanke
Contributor Author

benyanke commented Jun 4, 2018

I was initially thinking distinct local DBs with data sync on the app layer, but using DB layer replication would indeed also handle it!

Looking forward to seeing this.

@sapk
Member

sapk commented Jun 5, 2018

I know that's also not exactly what you are talking about, but #1612 could achieve the same goal and offer a better experience across DCs.

For information, https://github.com/pingcap/tidb should also provide a distributed, MySQL-compatible database that Gitea could use.

@markuman

markuman commented Jun 5, 2018

hmm not sure if I understand the terminology correctly.

But what about MariaDB/MySQL galera cluster? https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/

It's a simple and scalable synchronous active-active multi-master topology.

So since it's already built in (batteries included) in MariaDB 10.2, it's ready to go.

@hrvoj3e

hrvoj3e commented Dec 18, 2018

Has someone tried this? @benyanke @markuman ?

> But what about MariaDB/MySQL galera cluster? https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/

@benyanke
Contributor Author

benyanke commented May 7, 2020

Just following up - a multi-master DB doesn't really solve the async/multi-region problem, as MySQL/Galera clusters assume low latency between nodes. That is, they're intended for use within the same datacenter/region, not across regions.
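
To put rough numbers on the latency point: as I understand it, a synchronous multi-master cluster cannot finish a commit until the write has been exchanged with the other nodes, so each commit costs at least about one round trip to the farthest node. The sketch below only does that arithmetic. The function name and the round-trip times are assumptions picked for illustration, not measurements from any real cluster.

```python
def max_serial_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on back-to-back commits from one client connection,
    assuming every commit waits for at least one network round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# Assumed, illustrative round-trip times (not measurements).
ASSUMED_RTT_MS = {
    "same datacenter": 0.5,
    "cross-region": 80.0,
}

for label, rtt_ms in ASSUMED_RTT_MS.items():
    bound = max_serial_commits_per_second(rtt_ms)
    print(f"{label}: at most {bound:.1f} serial commits/s per connection")
```

The bound is per connection and ignores everything except network latency, so it is only meant to show why the same topology behaves very differently once nodes are an ocean apart.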

This issue is for something like two distinct instances of Gitea, each with its own DB, doing its own application-layer replication.

@markuman

markuman commented May 7, 2020

Ok, point taken. A Galera and GlusterFS setup would give you synchronous replication and fault tolerance within a single region, across multiple AZs (DCs).

I think for multi region, you must do it like GitLab[1].
Set up a read replica of Gitea: a second instance which mirrors all repositories from your master.
In case of failover, all repositories in your second instance become the primary repositories, and your former primary Gitea instance must then mirror all repositories from the second instance.

[1] https://docs.gitlab.com/ee/administration/geo/replication/
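
The mirror-and-swap idea above can be sketched as a toy model. This is not Gitea code: `Instance`, `mirror_from`, and `fail_over` are hypothetical names, and a repository is reduced to a name mapped to its latest commit id.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one Gitea install."""
    name: str
    role: str  # "primary" or "replica"
    repos: dict = field(default_factory=dict)  # repo name -> latest commit id

    def push(self, repo: str, commit: str) -> None:
        # Only the primary accepts writes in this model.
        if self.role != "primary":
            raise PermissionError(f"{self.name} is a read replica")
        self.repos[repo] = commit

    def mirror_from(self, source: "Instance") -> None:
        # A replica copies every repository from its source.
        if self.role != "replica":
            raise RuntimeError(f"{self.name} is not a replica")
        self.repos = dict(source.repos)


def fail_over(old_primary: Instance, replica: Instance) -> None:
    """Swap roles: the replica becomes primary, the old primary a replica."""
    replica.role = "primary"
    old_primary.role = "replica"


us = Instance("us-east", "primary")
eu = Instance("eu-west", "replica")

us.push("org/app", "c1")
eu.mirror_from(us)      # normal operation: replica follows primary

fail_over(us, eu)       # failover: roles are swapped
eu.push("org/app", "c2")
us.mirror_from(eu)      # the former primary now follows the new one
```

Anything pushed to the old primary after the last mirror run would be lost in a real failover; the model does not try to handle that.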

@benyanke
Contributor Author

benyanke commented May 8, 2020

@markuman I'm currently a day-job gitlab-ee administrator, and the gitlab philosophy is excellent for this, I think. This is exactly how I'd suggest doing it in gitea:

  • read replicas as distinct but linked installs
  • proxying/caching reads/writes to/from master, so you can commit or pull from any of them
  • a switch you can flip to promote any replica to master
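
As a rough illustration of those three points, here is a minimal in-memory sketch. The `Cluster` class and its methods are hypothetical, not an existing Gitea or GitLab API, and the region names are just the ones mentioned earlier in this thread. Reads are served by whichever node is asked, writes are proxied to the current primary, replication is a separate asynchronous step, and `promote` is the switch.

```python
class Cluster:
    """Hypothetical model of linked installs with one writable primary."""

    def __init__(self, names, primary):
        self.data = {name: {} for name in names}  # per-node key/value store
        self.primary = primary
        self.pending = []  # writes applied on the primary, not yet shipped

    def write(self, via, key, value):
        """Accept a write at any node; it is always applied on the primary."""
        if via not in self.data:
            raise KeyError(via)
        self.data[self.primary][key] = value
        self.pending.append((key, value))
        return self.primary  # which node actually applied the write

    def read(self, via, key):
        """Serve a read from the local node, which may lag the primary."""
        return self.data[via].get(key)

    def replicate(self):
        """Ship pending writes to every replica (the asynchronous part)."""
        for key, value in self.pending:
            for name, store in self.data.items():
                if name != self.primary:
                    store[key] = value
        self.pending.clear()

    def promote(self, name):
        """Planned switchover: drain pending writes, then change primary."""
        if name not in self.data:
            raise KeyError(name)
        self.replicate()
        self.primary = name


cluster = Cluster(["oregon", "virginia", "london"], primary="oregon")
applied_on = cluster.write(via="london", key="refs/heads/main", value="abc123")
stale = cluster.read("london", "refs/heads/main")   # not replicated yet
cluster.replicate()
fresh = cluster.read("london", "refs/heads/main")
cluster.promote("virginia")
```

The gap between `stale` and `fresh` is the "near-real time" window. An unplanned failover could not drain `pending` first, so this covers only the planned case.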

Fully agreed on Galera and GlusterFS, or similar tools. These are all excellent tools for providing intra-region HA, just not inter-region HA. In fact, a really resilient setup probably uses all of these: intra-region tools to make each region's install redundant, then inter-region tools to make the entire stack redundant.

@CoryGH

CoryGH commented Jun 13, 2020

@benyanke I like the suggestion you made, but would suggest modifying it so that there's no hard master/slave relationship between nodes. Instead there's a master, but when it goes down and there are 2 or more slaves, they sync writes between one another, and their combined "vote" on conflicts then forces the master to merge those transactions in once it comes back online.
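
One way that vote could work, as a sketch only: each replica reports its view of every ref, and the returning master adopts a value only where a strict majority of replicas agree. `resolve_by_vote` and `rejoin` are hypothetical names, and strict majority is my assumption about what "combined vote" means.

```python
from collections import Counter


def resolve_by_vote(replica_views):
    """Return {key: value} for keys where a strict majority of replicas agree.

    replica_views is a list of dicts, one per replica, mapping key -> value.
    """
    if len(replica_views) < 2:
        raise ValueError("voting needs at least two replicas")
    agreed = {}
    all_keys = set().union(*replica_views)
    for key in sorted(all_keys):
        votes = Counter(view[key] for view in replica_views if key in view)
        value, count = votes.most_common(1)[0]
        if count * 2 > len(replica_views):  # strict majority of all replicas
            agreed[key] = value
    return agreed


def rejoin(master_state, replica_views):
    """State the master takes on when it comes back online."""
    merged = dict(master_state)
    merged.update(resolve_by_vote(replica_views))
    return merged


views = [
    {"main": "c2", "dev": "d1"},
    {"main": "c2", "dev": "d2"},
]
merged = rejoin({"main": "c1", "dev": "d0"}, views)
```

Here both replicas agree on `main`, so the master takes `c2`; they disagree on `dev`, so the master keeps its own `d0` and that conflict would still need some other resolution.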

Highly agree with this whole thread though, Gitea could seriously use replication functionality.

As an open-source alternative to a lot of the other git repo sharing/auth platforms out there, this kind of failover might even be best implemented at an organization level for master/slave relations, for example:
Say you have 3 nerds who want to have their source code backed up and who generally trust one another, but might still want some fine-grained control over access to individual repos. They each create an org which is treated as a master that they have absolute admin control over, with a global first-come-first-served registry of org names.

A bit better would be to encrypt org data on the replicas so it's there to pull if there's data loss on the master node, but otherwise secure (this would allow for things like contractors setting up shared repos between one another or for small client corporations.)

Or, if you wanted to go another step with some less-than-pleasant keychains, you could have each such org act as described in the previous paragraphs and have it fully encrypted on the replicas, so that each org gets offsite backup and its data remains secure, but they can still permit access if desired. (I might actually help implement it if you wanted to go this route, as I've done a bunch of this sort of thing in the past and have some code and schemas I could port in.) This gets hairier because then you have replication of the keychain to deal with, which isn't always as straightforward as files, but you can generally get around that by making collisions of things like usernames fail over to machine\user, or by having a check against a known value to determine which user their password matches.
