Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

On any change to external services, restart sync #6058

Closed
keegancsmith opened this issue Oct 16, 2019 · 9 comments · Fixed by #6667
Closed

On any change to external services, restart sync #6058

keegancsmith opened this issue Oct 16, 2019 · 9 comments · Fixed by #6667
Labels
good first issue Good for newcomers
Milestone

Comments

@keegancsmith
Copy link
Member

In the case of large instances a sync can take a long time. This mixed in with streaming can lead to unexpected behaviour. For example adding an external service, then deleting it. If it is adding a lot of repositories, we will continue to add the repositories even though the service is now gone.

@unknwon
Copy link
Member

unknwon commented Oct 16, 2019

Just experienced this during release testing, had to kill my instance.

@mrnugget
Copy link
Contributor

mrnugget commented Nov 7, 2019

After discussing this with @tsenart and @ryanslade, let me add some technical details...

Previously, without streaming, what repo-updater's Syncer would do in a Sync:

  1. Load list of all repositories on external service we want in our database
  2. Compare that with the repositories we have in our database
  3. Upsert the diff we want to keep

That's what happens here:

https://github.com/sourcegraph/sourcegraph/blob/6af8774e38a217c3690b2aa107b6141974d99073/cmd/repo-updater/repos/syncer.go#L103-L138

Now, in that non-streaming version what would happen when we delete an external service that we just added?

  1. The Syncer would run in the background, asynchronously, doing 1.-2. from above
  2. external service gets deleted
  3. Syncer would fail doing 3. from above, because the external service id forgeign key on the inserted repos does not match the database constraint

In the new, streaming version of the Syncer the Sync method does the following:

  1. Load list of all repositories on external service we want in our database
    1.b. While loading the list of repositories, check whether we have a new one that we can already, eagerly insert into the database
  2. Compare that with the repositories we have in our database
  3. Upsert the diff we want to keep

Now, I think what happens is that when an external service is deleted, (1.b.) will keep on running and not fail:

https://github.com/sourcegraph/sourcegraph/blob/6af8774e38a217c3690b2aa107b6141974d99073/cmd/repo-updater/repos/syncer.go#L395-L399

That in turn doesn't fail the whole syncing process in the background. We keep on listing repositories and trying to insert them in the background, until we reach the point of the upsert, where the whole sync should fail. But until then, we've effectively blocked the syncer.

And that means, that when you then add a new external service the sync it triggers doesn't run until the old one fails completely:

https://github.com/sourcegraph/sourcegraph/blob/6af8774e38a217c3690b2aa107b6141974d99073/cmd/repo-updater/repoupdater/server.go#L294-L304


What we could do to fix this is to abort a syncing process running in the background when an external service is deleted.

Or we could modify the makeNewRepoInserter to check when the insertion fails whether that's because the external service is gone. If so, it should abort:

https://github.com/sourcegraph/sourcegraph/blob/6af8774e38a217c3690b2aa107b6141974d99073/cmd/repo-updater/repos/syncer.go#L378-L401

@tsenart
Copy link
Contributor

tsenart commented Nov 7, 2019

Syncer would fail doing 3. from above, because the external service id forgeign key on the inserted repos does not match the database constraint

We actually don't have a foreign key constraint between the repo table and the external_services table. The relationship is stored in the sources column, but there is no integrity constraint in place.

@mrnugget
Copy link
Contributor

mrnugget commented Nov 7, 2019

We actually don't have a foreign key constraint between the repo table and the external_services table. The relationship is stored in the sources column, but there is no integrity constraint in place.

interesting. So then the question, and I guess part of this ticket, is: why did this not happen previously? Or did it?

@unknwon
Copy link
Member

unknwon commented Nov 7, 2019

Thanks for writing this! I got a lot context by reading it.

What we could do to fix this is to abort a syncing process running in the background when an external service is deleted.

Or we could modify the makeNewRepoInserter to check when the insertion fails whether that's because the external service is gone. If so, it should abort:

I think we also need to abort when the external service is updated (change of repo scopes).

@mrnugget
Copy link
Contributor

mrnugget commented Nov 7, 2019

I think we also need to abort when the external service is updated

Agree!

@beyang
Copy link
Member

beyang commented Nov 13, 2019

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.10 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly.
When in doubt, reach out!

Thank you

@unknwon
Copy link
Member

unknwon commented Nov 13, 2019

I'm moving this to 3.11 since we don't have an owner on this issue yet, and most likely miss the branch cut.

@unknwon unknwon modified the milestones: 3.10, 3.11 Nov 13, 2019
@tsenart
Copy link
Contributor

tsenart commented Nov 14, 2019

@unknwon: Until we know we'll tackle this in 3.11, it's better to backlog.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
good first issue Good for newcomers
Projects
None yet
5 participants