Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow dev i18n to be more concurrent #20159

Merged
merged 8 commits into from
Jul 4, 2022
Merged

Conversation

zeripath
Copy link
Contributor

@zeripath zeripath commented Jun 28, 2022

The recent changes to add live-reloading to the i18n translation files made the i18n code totally non-concurrent when using dev. This will make discovering other concurrency related issues far more difficult. This PR fixes these, adds some more comments to the code and slightly restructures a few functions.

Signed-off-by: Andrew Thornton art27@cantab.net

The recent changes to add live-reloading to the i18n translation files added a
few races. This PR fixes these, adds some more comments to the code and slightly
restructures a few functions.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath added type/bug skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. labels Jun 28, 2022
@zeripath zeripath added this to the 1.18.0 milestone Jun 28, 2022
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jun 28, 2022
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jun 28, 2022
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 28, 2022

Where is the race? In old code, the reloadMu locks the store and all locales.

Tr calls reloadLocaleByIni and tryTr, they are protected by reloadMu if the mutex is not nil.

Then, no other access to the maps.

For prod=true, the maps are prepared ahead, then only reads to them.

@zeripath
Copy link
Contributor Author

Where is the race?

Sorry I stand corrected there isn't a race because the previous code was locking the entire translation system for every single translation!

This PR suggests using a RWMutexes to allow for much more concurrent behaviour.

@wxiaoguang
Copy link
Contributor

Understand. My opinion is about Return on Investment (ROI).

When I was designing the module, IMO in development mode, there is no need for concurrent behavior. So I chose the simple approach. The less code, the fewer mistakes, and the less maintenance work in the future.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 28, 2022

Do you have strong preference to introduce these new mutexes? Personally I still like a simple approach (and there is always a chance to optimize the code if there is really some performance problems), while new mutexes also work and if people like it, either is fine.

If no need to add more mutexes at the moment, then the def reloading problem could be resolved by moving some lines of the code from Tr to tryTr, then everything works.


Update: the def reloading fix is #20165

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath changed the title Fix races in i18n code Allow dev i18n to be more concurrent Jun 28, 2022
@zeripath
Copy link
Contributor Author

When I was designing the module, IMO in development mode, there is no need for concurrent behavior. So I chose the simple approach. The less code, the fewer mistakes, and the less maintenance work in the future.

We need the i18n code to be concurrent because without it it will hide concurrency issues elsewhere.

@zeripath
Copy link
Contributor Author

Do you have strong preference to introduce these new mutexes? Personally I still like a simple approach (and there is always a chance to optimize the code if there is really some performance problems), while new mutexes also work and if people like it, either is fine.

I've already done all of the work to properly include the double mutex layer.

If no need to add more mutexes at the moment, then the def reloading problem could be resolved by moving some lines of the code from Tr to tryTr, then everything works.

Update: the def reloading fix is #20165

Your proposed PR involves recalculating the idx for the trKey thus involves relocking the main localestore.

Just accept this PR instead of making more PRs.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 28, 2022

When I was designing the module, IMO in development mode, there is no need for concurrent behavior. So I chose the simple approach. The less code, the fewer mistakes, and the less maintenance work in the future.

We need the i18n code to be concurrent because without it it will hide concurrency issues elsewhere.

If it's the case, I think there could be still a simple patch

diff --git a/modules/translation/i18n/i18n.go b/modules/translation/i18n/i18n.go
index 9c428ad88..4fddeab35 100644
--- a/modules/translation/i18n/i18n.go
+++ b/modules/translation/i18n/i18n.go
@@ -35,7 +35,7 @@ type locale struct {
 }
 
 type LocaleStore struct {
-       reloadMu *sync.Mutex // for non-prod(dev), use a mutex for live-reload. for prod, no mutex, no live-reload.
+       reloadMu *sync.RWMutex // for non-prod(dev), use a mutex for live-reload. for prod, no mutex, no live-reload.
 
        langNames []string
        langDescs []string
@@ -49,7 +49,7 @@ type LocaleStore struct {
 func NewLocaleStore(isProd bool) *LocaleStore {
        ls := &LocaleStore{localeMap: make(map[string]*locale), textIdxMap: make(map[string]int)}
        if !isProd {
-               ls.reloadMu = &sync.Mutex{}
+               ls.reloadMu = &sync.RWMutex{}
        }
        return ls
 }
@@ -136,8 +136,8 @@ func (ls *LocaleStore) Tr(lang, trKey string, trArgs ...interface{}) string {
 // Tr translates content to locale language. fall back to default language.
 func (l *locale) Tr(trKey string, trArgs ...interface{}) string {
        if l.store.reloadMu != nil {
-               l.store.reloadMu.Lock()
-               defer l.store.reloadMu.Unlock()
+               l.store.reloadMu.RLock()
+               defer l.store.reloadMu.RUnlock()
        }
        msg, _ := l.tryTr(trKey, trArgs...)
        return msg
@@ -147,6 +147,8 @@ func (l *locale) tryTr(trKey string, trArgs ...interface{}) (msg string, found b
        if l.store.reloadMu != nil {
                now := time.Now()
                if now.Sub(l.lastReloadCheckTime) >= time.Second && l.sourceFileInfo != nil && l.sourceFileName != "" {
+                       l.store.reloadMu.RUnlock() // if the locale file should be reloaded, then we release the read-lock
+                       l.store.reloadMu.Lock()    // and acquire the write-lock
                        l.lastReloadCheckTime = now
                        if sourceFileInfo, err := os.Stat(l.sourceFileName); err == nil && !sourceFileInfo.ModTime().Equal(l.sourceFileInfo.ModTime()) {
                                if err = l.store.reloadLocaleByIni(l.langName, l.sourceFileName); err == nil {
@@ -155,6 +157,8 @@ func (l *locale) tryTr(trKey string, trArgs ...interface{}) (msg string, found b
                                        log.Error("unable to live-reload the locale file %q, err: %v", l.sourceFileName, err)
                                }
                        }
+                       l.store.reloadMu.Unlock() // release the write-lock
+                       l.store.reloadMu.RLock()  // and re-acquire the read-lock, which was managed by outer Tr function
                }
        }
        trMsg := trKey

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 28, 2022

I've already done all of the work to properly include the double mutex layer.

Could a shared RWMutex work? If yes, then I think the simple approach doesn't have the hiding concurrency issues problem, they behave the same. IMO in most time, there is no write-lock, most calls to Tr causes read-lock.

Your proposed PR involves recalculating the idx for the trKey thus involves relocking the main localestore.

Yup, but they are protected by the mutex. Now using a RWMutex, reloadLocaleByIni is protected by the write-lock, are there still some problems?

Just accept this PR instead of making more PRs.

Sorry for noises, but I think it's clear to describe how one mutex works with a separate PR .....

@wxiaoguang
Copy link
Contributor

After some refactoring, finally I implemented a 100% write-lock-free live-reloading (if locale file is not changed), by using atomic.Value.

If we need to make a 100% lock-free live-reloading, it may be also feasible by using more atomic.Values, the question is whether it is worth. 🤔

@zeripath
Copy link
Contributor Author

Honestly. What is wrong with this PR?

I don't understand you.

I've spent more time arguing with you about this than I actually spent writing the PR.

Your proposed "simple" solution will involve locking every translation every second whilst the filestat is checked.

In fact instead of using filestats we should be using fsnotify but honestly you've put me off even looking any further at this.

Next you start suggesting atomic values? Have you looked at how RWMutex.RLock() is implemented:

func (rw *RWMutex) RLock() {
	if race.Enabled {
		_ = rw.w.state
		race.Disable()
	}
	if atomic.AddInt32(&rw.readerCount, 1) < 0 {
		// A writer is pending, wait for it.
		runtime_SemacquireMutex(&rw.readerSem, false, 0)
	}
	if race.Enabled {
		race.Enable()
		race.Acquire(unsafe.Pointer(&rw.readerSem))
	}
}

it's already using an atomic.

@zeripath zeripath closed this Jun 29, 2022
@zeripath zeripath reopened this Jun 29, 2022
Copy link
Member

@jolheiser jolheiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are resuming our lock states right before returning, I think this should simplify it, unless I missed something

modules/translation/i18n/i18n.go Outdated Show resolved Hide resolved
Co-authored-by: John Olheiser <john+github@jolheiser.com>
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jun 29, 2022
@codecov-commenter

This comment was marked as off-topic.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 30, 2022

I've spent more time arguing with you about this than I actually spent writing the PR.

The first argument in this PR is about race, which I do not understand and did not find it. (thank you for your time to confirm that there was no data-race)

Since there is no data-race, the second is about why the complex is necessary (if not, only a few line change will work), you explained that you want better concurrency. (thank you for your time)

However, this PR's concurrency is still affected by the locale's Lock (a write-lock).

So I tried to investigated on it and tried a 100% write-lock-free solution, isn't is better for your proposed better concurrency?

Your proposed "simple" solution will involve locking every translation every second whilst the filestat is checked.
In fact instead of using filestats we should be using fsnotify but honestly ...

The updated "simple" solution only require read-lock if the locale file is not changed.

The original PR about fsnotify has some flaws: it's too complex and many code are copied twice (personally I do not like duplicate code and always try to to avoid them .....), result in no benefit for production, it will require more maintain work in future if the locale module should be refactored. And it's not obvious that it should be closed, Gitea already has some resource leaks, if no necessary I think the Closer pattern could be replaced by something else.

Update: if you think the fsnotify is a must and brings real benefit, it could still be used in current code base, without splitting code into duplicate functions: using a goroutine to do write-lock and reload, keeping most other code as before.

you've put me off even looking any further at this.

OK, I am going to be impolite .... (sorry for I am not a native speaker, so I might misunderstand or mis-express something, you can assume that I have no malice for anyone, just express my feeling): I have been put off many times, especially for some frontend refactorings and some recent PRs, even no feedback at all, I have complained that in discord before.

Next you start suggesting atomic values? Have you looked at how RWMutex.RLock() is implemented:

I have read it before. Why the atomic values has been explained above, because I thought you like a better concurrency solution. Maybe I misunderstand your meaning.

zeripath added a commit to zeripath/gitea that referenced this pull request Jul 3, 2022
The recovery, API, Web and package frameworks all create their own HTML
Renderers. This increases the memory requirements of Gitea
unnecessarily with duplicate templates being kept in memory.

Further the reloading framework in dev mode for these involves locking
and recompiling all of the templates on each load. This will potentially
hide concurrency issues and it is inefficient.

This PR stores the templates renderer in the context and stores this
context in the NormalRoutes, it then creates a fsnotify.Watcher
framework to watch files.

The watching framework is then extended to the mailer templates which
were previously not being reloaded in dev.

Then the locales are simplified to a similar structure.

Fix go-gitea#20210, go-gitea#20211, go-gitea#20217
Replace go-gitea#20159

Signed-off-by: Andrew Thornton <art27@cantab.net>
@6543
Copy link
Member

6543 commented Jul 4, 2022

.

@6543 6543 merged commit ba0f927 into go-gitea:main Jul 4, 2022
@zeripath zeripath deleted the fix-races-in-i18n branch July 4, 2022 12:46
zjjhot added a commit to zjjhot/gitea that referenced this pull request Jul 7, 2022
* upstream/main:
  Modify milestone search keywords to be case insensitive (go-gitea#20266)
  Fix toolip on mobile notification bell (go-gitea#20270)
  Allow RSA 2047 bit keys (go-gitea#20272)
  Refix notification bell placement (go-gitea#20251)
  Bump mermaid from 9.1.1 to 9.1.2 (go-gitea#20256)
  EscapeFilter the group dn membership (go-gitea#20200)
  Only show Followers that current user can access (go-gitea#20220)
  Init popup for new code comment (go-gitea#20234)
  Bypass Firefox (iOS) bug (go-gitea#20244)
  Adjust max-widths for the repository file table (go-gitea#20243)
  Display full name (go-gitea#20171)
  Adjust class for mobile has the problem of double small bells (go-gitea#20236)
  Adjust template for go-gitea#20069 smallbell (go-gitea#20108)
  Add integration tests for the Gitea migration form (go-gitea#20121)
  Allow dev i18n to be more concurrent (go-gitea#20159)
  Allow enable LDAP source and disable user sync via CLI (go-gitea#20206)
dineshsalunke pushed a commit to dineshsalunke/gitea that referenced this pull request Jul 9, 2022
The recent changes to add live-reloading to the i18n translation files made the i18n code totally non-concurrent when using dev. This will make discovering other concurrency related issues far more difficult. This PR fixes these, adds some more comments to the code and slightly restructures a few functions.

Signed-off-by: Andrew Thornton <art27@cantab.net>
vsysoev pushed a commit to IntegraSDL/gitea that referenced this pull request Aug 10, 2022
The recent changes to add live-reloading to the i18n translation files made the i18n code totally non-concurrent when using dev. This will make discovering other concurrency related issues far more difficult. This PR fixes these, adds some more comments to the code and slightly restructures a few functions.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants