Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: rearrange order of file and directory operations #399

Merged
merged 2 commits into from
Mar 24, 2022

Conversation

alexanderleegs
Copy link
Contributor

Problem

This PR fixes a bug in the resource category renaming - the breadcrumb of renamed resource categories were not being modified correctly due to the index.html file not being modified correctly. The reason for this issue was that our directory level operations use a snapshot of the existing state when making operations - this means that any modification to individual files before a directory level operation gets overwritten.

Solution

This PR rearranges the order to fix this issue.

Copy link
Contributor

@seaerchin seaerchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR seems good to me but i have a few questions regarding how directory operations work

  1. why do services that depend on BaseDirectoryService also require GithubService (as the former already has the latter)
  2. for any mutation operation that works on a directory level, how will we prevent inconsistent state/data races? i see 1 way forward by always getting the current state from github (as you've done here) prior to mutation but this requires that it's always done, which could be better achieved through putting it in BaseDirectoryService.

wdyt?

@alexanderleegs
Copy link
Contributor Author

this PR seems good to me but i have a few questions regarding how directory operations work

  1. why do services that depend on BaseDirectoryService also require GithubService (as the former already has the latter)
  2. for any mutation operation that works on a directory level, how will we prevent inconsistent state/data races? i see 1 way forward by always getting the current state from github (as you've done here) prior to mutation but this requires that it's always done, which could be better achieved through putting it in BaseDirectoryService.

wdyt?

So BaseDirectoryService is more of an abstraction for github operations which use git trees - we use these when multiple files need to be moved, to avoid having to make a separate modification to each individual file. But some operations involve both moving files and editing the contents of some of the files, e.g. this resource category moving requires modification to the shifted index.html file, so they call both. Does that behaviour make sense?

I agree that the current way might be difficult to handle inconsistent state - we are relying on methods to always call directory level operations before individual ones, and all directory level operations need to be condensed into a single operation. The reason we don't query for git tree state before directory level operations is because of time - retrieving the git tree of a repo is an expensive operation and we want to minimise the amount of times we retrieve it, so we reuse the same git tree state we use for rollbacks so we only need to query the tree once. Would be happy to hear your thoughts on a better solution!

@seaerchin
Copy link
Contributor

seaerchin commented Mar 23, 2022

this PR seems good to me but i have a few questions regarding how directory operations work

  1. why do services that depend on BaseDirectoryService also require GithubService (as the former already has the latter)
  2. for any mutation operation that works on a directory level, how will we prevent inconsistent state/data races? i see 1 way forward by always getting the current state from github (as you've done here) prior to mutation but this requires that it's always done, which could be better achieved through putting it in BaseDirectoryService.

wdyt?

So BaseDirectoryService is more of an abstraction for github operations which use git trees - we use these when multiple files need to be moved, to avoid having to make a separate modification to each individual file. But some operations involve both moving files and editing the contents of some of the files, e.g. this resource category moving requires modification to the shifted index.html file, so they call both. Does that behaviour make sense?

I agree that the current way might be difficult to handle inconsistent state - we are relying on methods to always call directory level operations before individual ones, and all directory level operations need to be condensed into a single operation. The reason we don't query for git tree state before directory level operations is because of time - retrieving the git tree of a repo is an expensive operation and we want to minimise the amount of times we retrieve it, so we reuse the same git tree state we use for rollbacks so we only need to query the tree once. Would be happy to hear your thoughts on a better solution!

thanks for explaining the job of BaseDirectoryService! i think this behaviour is reasonable for abstraction over git trees!

for the second point - i think the problem is that all individual operations would be overwritten by directory level operations and that individual/directory operations can come in from many sources.

if this were purely sequential, we wouldn't have an issue - but because 2 different users can submit a request (eg user A requests to rename dir A to dir B and user B requests to move file A in dir A to dir C), it leads to complications. are we checking for such mutations that the state is consistent? i seem to recall that there's a toast warning if the state is outdated and asking to refresh but i'm not sure if this is true for all directory operations.

also, i'm not sure we have to query for the git tree state right? we can query for the commit hash of the directory in question and that should be sufficient to answer if the state has changed. we can even keep an in-memory store of the directoryName to commit hash if required so that reads are guaranteed to be safe and update this hash on writes

@alexanderleegs
Copy link
Contributor Author

this PR seems good to me but i have a few questions regarding how directory operations work

  1. why do services that depend on BaseDirectoryService also require GithubService (as the former already has the latter)
  2. for any mutation operation that works on a directory level, how will we prevent inconsistent state/data races? i see 1 way forward by always getting the current state from github (as you've done here) prior to mutation but this requires that it's always done, which could be better achieved through putting it in BaseDirectoryService.

wdyt?

So BaseDirectoryService is more of an abstraction for github operations which use git trees - we use these when multiple files need to be moved, to avoid having to make a separate modification to each individual file. But some operations involve both moving files and editing the contents of some of the files, e.g. this resource category moving requires modification to the shifted index.html file, so they call both. Does that behaviour make sense?
I agree that the current way might be difficult to handle inconsistent state - we are relying on methods to always call directory level operations before individual ones, and all directory level operations need to be condensed into a single operation. The reason we don't query for git tree state before directory level operations is because of time - retrieving the git tree of a repo is an expensive operation and we want to minimise the amount of times we retrieve it, so we reuse the same git tree state we use for rollbacks so we only need to query the tree once. Would be happy to hear your thoughts on a better solution!

thanks for explaining the job of BaseDirectoryService! i think this behaviour is reasonable for abstraction over git trees!

for the second point - i think the problem is that all individual operations would be overwritten by directory level operations and that individual/directory operations can come in from many sources.

if this were purely sequential, we wouldn't have an issue - but because 2 different users can submit a request (eg user A requests to rename dir A to dir B and user B requests to move file A in dir A to dir C), it leads to complications. are we checking for such mutations that the state is consistent? i seem to recall that there's a toast warning if the state is outdated and asking to refresh but i'm not sure if this is true for all directory operations.

also, i'm not sure we have to query for the git tree state right? we can query for the commit hash of the directory in question and that should be sufficient to answer if the state has changed. we can even keep an in-memory store of the directoryName to commit hash if required so that reads are guaranteed to be safe and update this hash on writes

We do actually check for race conditions - the attachWriteRouteHandlerWrapper and attachRollbackRouteHandlerWrapper are used for this purpose to lock the repo to prevent 2 simultaneous changes!

Agreed on the querying for commit hash - though if I remember correctly the retrieval of commits itself is also somewhat slow, since all commits are retrieved. We could relook into this in future, but I think this is out of scope of the current PR!

@seaerchin
Copy link
Contributor

before i forget, could i check that the current test case will fail in this scenario?

@alexanderleegs
Copy link
Contributor Author

before i forget, could i check that the current test case will fail in this scenario?

No actually - I don't think there's a way to catch this easily, since the appropriate methods are still called, and the issue is the ordering. It's hard to catch this even in production, since it doesn't show up on the CMS, which is why this bug hasn't been found for quite a while

@seaerchin
Copy link
Contributor

before i forget, could i check that the current test case will fail in this scenario?

No actually - I don't think there's a way to catch this easily, since the appropriate methods are still called, and the issue is the ordering. It's hard to catch this even in production, since it doesn't show up on the CMS, which is why this bug hasn't been found for quite a while

hmm would this be possible by asserting call order? (ie, that getResourceDirectoryPath is called before update) if so, the solution given here might work!

@alexanderleegs
Copy link
Contributor Author

before i forget, could i check that the current test case will fail in this scenario?

No actually - I don't think there's a way to catch this easily, since the appropriate methods are still called, and the issue is the ordering. It's hard to catch this even in production, since it doesn't show up on the CMS, which is why this bug hasn't been found for quite a while

hmm would this be possible by asserting call order? (ie, that getResourceDirectoryPath is called before update) if so, the solution given here might work!

As discussed offline, we'll tackle this after identity has been merged so as not to make the merge more messy - I've filed an issue for this #400

@alexanderleegs alexanderleegs merged commit 613cc07 into develop Mar 24, 2022
@alexanderleegs alexanderleegs deleted the fix/renamed-resource-breadcrumb branch March 24, 2022 03:00
This was referenced Mar 24, 2022
harishv7 pushed a commit that referenced this pull request Feb 17, 2023
* Fix: rearrange order of file and directory operations

* Fix: resource category rename tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants