Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

models/issue.go NewIssue() could fail due to duplicate key #7887

Closed
guillep2k opened this issue Aug 16, 2019 · 5 comments · Fixed by #8307
Closed

models/issue.go NewIssue() could fail due to duplicate key #7887

guillep2k opened this issue Aug 16, 2019 · 5 comments · Fixed by #8307

Comments

@guillep2k
Copy link
Member

The current code for models/issue.go roughly does:

func getMaxIndexOfIssue(e Engine, repoID int64) (int64, error) {
    var (
        maxIndex int64
        has      bool
        err      error
    )

    has, err = e.SQL("SELECT COALESCE((SELECT MAX(`index`) FROM issue WHERE repo_id = ?),0)", repoID).Get(&maxIndex)

   [...]

    return maxIndex, nil
}

func newIssue(e *xorm.Session, doer *User, opts NewIssueOptions) (err error) {
	opts.Issue.Title = strings.TrimSpace(opts.Issue.Title)

	maxIndex, err := getMaxIndexOfIssue(e, opts.Issue.RepoID)
	if err != nil {
		return err
	}
	opts.Issue.Index = maxIndex + 1

   [...]

	if _, err = e.Insert(opts.Issue); err != nil {
		return err
	}

   [...]
}

It's possible that under heavy loads (e.g. a small server) two sessions may attempt to use the same index number to create issues; one of the two will fail due to a race condition:

  1. User A submits the form creating an issue for repo abc.
  2. User B submits the form creating another issue for repo abc.
  3. At issue.go, in the context of user A's request, getMaxIndexOfIssue() returns 99. It will be used at step 5.
  4. At issue.go, in the context of user B's request, getMaxIndexOfIssue() also returns 99. because at the moment that's the latest issue for the repo. It will be used at step 6.
  5. At issue.go, in the context of user A's request, newIssue() inserts A's issue with Index: 100.
  6. At issue.go, in the context of user B's request, newIssue() attempts to insert B's issue with Index: 100 and fails with duplicate key error, because 100 was used at step 5.

This will of course depend on the database locking model.

I think the best way to handle this is to use a mutex, thus covering most scenarios.

@zeripath
Copy link
Contributor

So a mutex would actually be at the wrong level - it needs to be at the DB level (Consider clusters). There's only really three options to fix it:

  1. Just retry and hope you can get it right once - Adding a mutex to the web server level will help prevent hitting the problem if you're not in a cluster but should also reduce the likelihood of hitting the problem if you are.
  2. Use a DB stored procedure to generate the issue number. (Not guaranteed to completely solve this issue on all DBs IIRC.)
  3. Change the transactional isolation mode on your DB. (Also not guaranteed to completely solve the issue on all DBs IIRC.)

I think option 1 is probably a reasonable thing to do, but someone with more specific DB knowledge could look into how much work it would be to add 2 or 3 - but I think you still need retries when I last looked into this.

@lunny
Copy link
Member

lunny commented Aug 17, 2019

If we want finish that on DB level, it's a complex SQL and specific for different databases. So I think the simple and dirty method is just retry it 3 times if it failed because of exist issue number.

@typeless
Copy link
Contributor

Ultimately, I would like to consider dropping the index column from the table. It should be something deducible from the other columns.

@lunny
Copy link
Member

lunny commented Oct 1, 2019

This should be resolved by #8309

@guillep2k
Copy link
Member Author

This should be resolved by #8309

Unfortunately, it won't. Consider the following scenario:

image

By the time T2 inserts the row in issue using MAX(index), T1 already picked the value 3 for that, but T2 can't possible know about it because T1 didn't commit the changes yet. So T2 also picks the value 3 and fails at the time of the commit.

I realized this when I created the stress test in #8005. It will always fail with this kind of method, unless we use some method of serialization like a semaphore or a lock in the table. But the semaphore will not scale in a multi-server setup as you've pointed out, and a database lock (either at row or table level) is dangerous because it's very easy to end up with a deadlock.

That's why I gave up and went for the retry route.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants