Discussion: Multi Region / Async Replication #4114
Comments
Related or duplicate of #2959
I'd say this is related but distinct. #2959 clearly discusses a single-homed, multi-server cluster (a single-region HA setup), as shown by its discussion of a single load balancer.
I'm proposing a system that allows multiple autonomous Gitea installs (say Oregon, Virginia, and London) to share data in near-real time. Together they would provide a single frontend experience, while each install stays completely self-contained. In other words: multi-region async replication.
@benyanke sorry for my terse message above, I hope it wasn't too unfriendly. As you can probably tell, with the latest news we are getting much more traffic than usual, so triage is important.
No worries! I really value the Gitea project, and given the news I'll be seriously looking at moving my main projects to a self-hosted instance of it. I'm getting more comfortable with Golang, and I'd even be willing to work on this multi-region replication if I'm able. Keep up the good work.
Part of multihoming is using a DB that supports it, so I've opened cockroachdb/cockroach#24846 with the CockroachDB project to allow it to support Gitea. This of course doesn't fully achieve what you are looking for, but it is a starting place. Thanks for the kind message.
I was initially thinking distinct local DBs with data sync on the app layer, but using DB layer replication would indeed also handle it! Looking forward to seeing this.
I know that's also not exactly what you are talking about, but #1612 could achieve the same goal and offer a better experience across DCs. For information, https://github.com/pingcap/tidb should also provide a distributed, MySQL-compatible database that Gitea could use.
Hmm, not sure if I understand the terminology correctly. But what about a MariaDB/MySQL Galera cluster? https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/ It's a simple and scalable synchronous active-active multi-master topology.
Since it's already built in (batteries included) in MariaDB 10.2, it's ready to go.
Has someone tried this? @benyanke @markuman
Just following up: a multi-master DB doesn't really solve the async/multi-region problem, because MySQL/Galera clusters assume low latency between nodes. That is, they're intended for use within the same datacenter/region, not across regions. This issue is about something like two distinct instances of Gitea, each with its own DB, doing its own application-layer replication.
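To make "application-layer replication" a bit more concrete, here is a minimal sketch of the idea, not anything Gitea implements. Each region accepts writes locally, appends them to its own log, and peers pull whatever they haven't seen yet. Every name here (`Region`, `Event`, `push`, `pull_from`) is hypothetical, and conflict handling is deliberately left out: the most recently applied event simply wins.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int      # position in the origin region's log (starts at 1)
    origin: str   # region that accepted the write
    repo: str
    ref: str
    sha: str


@dataclass
class Region:
    name: str
    refs: dict = field(default_factory=dict)     # (repo, ref) -> sha
    log: list = field(default_factory=list)      # events accepted locally
    cursors: dict = field(default_factory=dict)  # peer name -> last seq applied

    def push(self, repo, ref, sha):
        """Accept a write locally and record it for later replication."""
        event = Event(len(self.log) + 1, self.name, repo, ref, sha)
        self.log.append(event)
        self.refs[(repo, ref)] = sha
        return event

    def pull_from(self, peer):
        """Apply every event from `peer` that this region has not seen yet."""
        last = self.cursors.get(peer.name, 0)
        pending = [e for e in peer.log if e.seq > last]
        for e in pending:
            # No conflict detection: the last applied event wins.
            self.refs[(e.repo, e.ref)] = e.sha
            self.cursors[peer.name] = e.seq
        return len(pending)


oregon = Region("oregon")
london = Region("london")
oregon.push("org/app", "refs/heads/main", "a1b2c3")
applied = london.pull_from(oregon)  # london catches up asynchronously
```

Because each region only tracks a cursor per peer, a region that was offline can catch up later by pulling again, which is the property that low-latency synchronous clusters don't give you across regions.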
OK, point taken. A Galera and GlusterFS setup would give synchronous replication and fault tolerance within a single region (though across multiple AZs/DCs). I think for multi-region you must do it like GitLab [1]. [1] https://docs.gitlab.com/ee/administration/geo/replication/
@markuman I'm a GitLab EE administrator at my day job, and I think the GitLab philosophy is excellent for this. This is exactly how I'd suggest doing it in Gitea:
Fully agreed on Galera and GlusterFS, or similar tools. These are all excellent tools for providing intra-region HA, just not inter-region HA. In fact, a really resilient setup probably uses all of these: intra-region tools to make each region's install redundant, then inter-region tools to make the entire stack redundant.
@benyanke I like the suggestion you made, but would suggest modifying it so that there's no hard master/slave relationship between nodes. There would still be a master, but if it goes down and there are 2 or more slaves, they would sync writes between one another, and their combined "vote" on conflicts would force the master to merge those transactions in once it came back online.

Highly agree with this whole thread though; Gitea could seriously use replication functionality. As an open-source alternative to a lot of the other git repo sharing/auth platforms out there, this kind of failover might even be best implemented at an organization level for master/slave relations, for example:

A bit better would be to encrypt org data on the replicas, so it's there to pull if there's data loss on the master node but is otherwise secure. This would allow for things like contractors setting up shared repos between one another, or for small client corporations.

Or, if you wanted to go another step with some less-than-pleasant keychains, you could have each such org act as described in the previous paragraphs and have it fully encrypted on the replicas, so that each org gets offsite backup and their data remains secure, but they can still permit access if desired. (I might actually help implement it if you wanted to go this route, as I've done a bunch of this sort of thing in the past and have some code and schemas I could port in.) This gets hairier because then you have replication of the keychain to deal with, which isn't always as straightforward as files. You can generally get around that by making collisions of things like usernames fail over to machine\user, or by having a check against a known value to determine which user their password matches.
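The "combined vote" idea can be sketched roughly like this. It's only an illustration under my own assumptions: `resolve_by_vote` is a hypothetical helper, a strict majority is one possible rule among several, and a real system would also need to handle keychains, ordering, and partial failures.

```python
from collections import Counter


def resolve_by_vote(proposals):
    """Return the value backed by a strict majority of nodes, else None.

    `proposals` maps a node name to the value it holds for one contested key.
    """
    if not proposals:
        return None
    value, votes = Counter(proposals.values()).most_common(1)[0]
    # Require more than half of all nodes; a tie resolves to None.
    return value if votes * 2 > len(proposals) else None


# The returning master (oregon) is outvoted by the two replicas.
winner = resolve_by_vote({"virginia": "sha-b", "london": "sha-b", "oregon": "sha-a"})

# With only two nodes that disagree, nobody has a majority.
tie = resolve_by_vote({"virginia": "sha-b", "oregon": "sha-a"})
```

Returning `None` on a tie leaves the decision to something else (an admin, or a manual merge), which seems safer than silently picking a side.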
While DC-local clusters are relatively well understood and easy to implement (just share sessions/codebase/DB), multi-homed installs aren't as well understood.
Does Gitea have any plans for a tool that syncs across multiple regions in near-real time? GitLab is already working on one, and I'd like to open this ticket to track the progress of a similar effort here, if it's a direction the maintainers would like to go.