Support multiple working copies #13

Closed
martinvonz opened this issue Apr 7, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@martinvonz
Member

No description provided.

@martinvonz martinvonz added the enhancement New feature or request label Apr 7, 2021
@martinvonz
Member Author

martinvonz commented Nov 13, 2021

I'm debating what the UX should be when a commit that's checked out in one working copy gets rewritten.

First some background: We currently keep track of the current checkout in the repo "view", which is part of the "operation" object in the operation log. This is considered the source of truth for which commit is checked out in the working copy. That record gets updated within a transaction, before the working copy itself gets updated. The working copy has its own record of what's currently checked out. That's supposed to follow the record in the view object. For example, if a transaction committed and then the power was cut, we'll end up with a mismatch. In such cases, next time jj is run, we should detect that the working copy commit isn't the intended commit, and automatically update to the new commit. That's not implemented yet, but it should be done regardless of support for multiple working copies.
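A minimal sketch of the startup check described above, treating the view's record as the source of truth. `CommitId`, `StartupAction`, and `reconcile` are hypothetical names for illustration, not jj's actual types:

```rust
/// Hypothetical stand-in for a commit ID; jj's real type differs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CommitId(pub String);

/// What a command should do with the working copy when it starts.
#[derive(Debug, PartialEq, Eq)]
pub enum StartupAction {
    /// Both records agree; nothing to repair.
    UpToDate,
    /// The records disagree; move the working copy to the view's commit.
    UpdateTo(CommitId),
}

/// Compare the checkout recorded in the repo view with the checkout
/// recorded in the working copy itself. The view wins on a mismatch.
pub fn reconcile(view_checkout: &CommitId, wc_checkout: &CommitId) -> StartupAction {
    if view_checkout == wc_checkout {
        StartupAction::UpToDate
    } else {
        StartupAction::UpdateTo(view_checkout.clone())
    }
}
```

In the power-cut example, the transaction has already recorded the new commit in the view while the working copy still records the old one, so `reconcile` would return `UpdateTo` with the view's commit.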

To support multiple working copies, I plan to extend the view object to keep track of the current checkout in each working copy. The question is what should happen if you run a command in one working copy that impacts the checkout in another working copy. For example, you might rebase a whole tree of commits, including the current working copy's checkout and another working copy's checkout. I see a few options:

  1. Update the other working copies' checked-out commits and also update the working copy files. This will of course only work if the other working copy is on a file system that's available (e.g. not on a USB stick or a network file system). It also requires the central repo to keep track of where every working copy is.
  2. Update the other working copies' checked-out commits, but don't update their working copy files. The working copy would then be stale and it would be updated later when the user runs a command in that working copy. There's a small risk that the old working copy's checkout would be GC'd at this point.
  3. Don't update the other working copies' checked-out commits but leave them visible. This would mean that you'd end up with two visible commits with the same change ID, which by definition means that you'll have divergence. The user would then have to manually update and hide the old commit, or run some command that we don't yet have.
  4. Don't update the other working copies' checked-out commits and hide them. This would mean that your working copy would not appear in jj log but jj log -r @ would still show it. The change would not be considered divergent. The user can recover by updating to the new commit. We would probably want to make jj status highlight the fact that the working copy's commit is not currently visible.
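To make option 2 concrete, here is a rough sketch under stated assumptions: the view is modeled as a map from workspace name to commit ID, and a rewrite as a single-step map from old to new commit ID. `View` and `apply_rewrites` are hypothetical names, not jj's API:

```rust
use std::collections::BTreeMap;

/// Hypothetical repo view holding one checkout (a commit ID) per workspace.
#[derive(Debug, Default)]
pub struct View {
    pub checkouts: BTreeMap<String, String>,
}

impl View {
    /// Apply a rewrite (old commit ID -> new commit ID) to every workspace's
    /// checkout. Returns the names of the workspaces whose checkout moved.
    /// Under option 2 their files on disk are not touched here, so those
    /// working copies are stale until a command is run in them.
    pub fn apply_rewrites(&mut self, rewrites: &BTreeMap<String, String>) -> Vec<String> {
        let mut moved = Vec::new();
        for (workspace, checkout) in self.checkouts.iter_mut() {
            if let Some(new_id) = rewrites.get(checkout.as_str()) {
                *checkout = new_id.clone();
                moved.push(workspace.clone());
            }
        }
        moved
    }
}
```

Options 3 and 4 would instead leave `checkouts` untouched for the other workspaces and differ only in whether the old commits stay visible.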

@arxanas
Contributor

arxanas commented Nov 13, 2021

It might be worth delineating two specific workflows that the user can pick from at a given time: live or on-demand sync. Similar to auto-fetch/auto-push systems for Git, or whatever Fossil does with auto-syncing. In particular, the same live sync workflow would extend to the use-case of syncing across multiple devices.

I think it would be hard to pick a single workflow that aligns with user expectations for multiple checkouts.

@martinvonz
Member Author

Interesting idea. I'll have to go read about what Fossil does.

For reference, Git does 3 or 4 (they're the same there since it has no concept of change ID and it only moves the current branch pointer on rebase [1]). Git still has the main repo keep track of each worktree's location. IIRC, it does that mostly to prevent GC of commits checked out in other worktrees. I don't know if the reflog for each worktree's HEAD is stored in the main repo or in the worktree (it's easy to check, of course, I just haven't yet).

[1] By the way, @arxanas, since git move can move multiple branches, I suppose that means you may want to check that the branches you're about to move are not checked out in other working copies (I haven't checked if you already do, but it seems like something that's easy to overlook).

martinvonz added a commit that referenced this issue Nov 26, 2021
Having a concept of a "workspace" will be useful for adding support
for multiple workspaces (#13). You can think of the "workspace" as a
repo combined with a working copy. A workspace corresponds 1:1 with a
`.jj/` directory. It's pretty close to what other VCS simply call a
"repo", but I've ended up using the word "repo" for what Git calls a
"bare repo".
martinvonz added a commit that referenced this issue Nov 26, 2021
The `Repo` doesn't do anything with the `WorkingCopy` except keeping a
reference to it for its users to use. In fact, the entire lib crate
doesn't do anything with the `WorkingCopy`. It therefore seems simpler
to have the users of the crate manage the `WorkingCopy` instance. This
patch does that by letting `Workspace` own it. By not keeping an
instance in `Repo`, which is `Sync`, we can also drop the
`Arc<Mutex<>>` wrapping.

I left `Repo::working_copy()` for convenience for now, but now it
creates a new instance every time. It's only used in tests.

This further decoupling should help us add support for multiple
working copies (#13).
martinvonz added a commit that referenced this issue Nov 26, 2021
This is another step towards removing coupling between the repo and
the working copy, so we can have multiple working copies for a single
repo (#13).
@martinvonz
Member Author

I just pushed some commits refactoring ReadonlyRepo so it no longer has working_copy() and working_copy_path(). The coupling was already very weak, but now it's gone completely. It wasn't a necessary change, as we could instead have made ReadonlyRepo::working_copy() simply depend on which working copy it was loaded from, but it is cleaner to separate it completely. It was always a bit ugly how the WorkingCopy had to be kept in a Mutex only because ReadonlyRepo is Sync. Now that ReadonlyRepo doesn't have a WorkingCopy, we don't need the Mutex and we can benefit from Rust's ownership rules.

I've added a "workspace" concept and a Workspace type, representing a working copy and a .jj/ directory. .jj/working_copy/ will keep the working copy state ("index"/"dirstate" in Git-/Mercurial-speak) for the workspace. All other directories in .jj/ (i.e. .jj/store/, .jj/op_store/, .jj/op_heads/, .jj/index/) will be shared between all workspaces backed by the same repo. Perhaps we should move them into .jj/repo/ or something to clarify that.

I still haven't decided which of the solutions for updating "other" workspaces I like best.

@martinvonz
Member Author

I've put this work on hold for a while but I hope to come back to it now. One idea I remember having earlier was to record the operation ID of the last successful update in .jj/working_copy/. I don't remember the details, but thinking a bit more about it now, it seems to make sense.

By having the operation ID there, we can detect if the working copy is stale. It may be stale because a process had crashed before it finished updating the working copy, or maybe it was a workspace that was stored on a disconnected USB stick. It could also be that it was stale because we went with solution 2 above and a process had rebased the commit in another workspace connected to the same repo.

The working copy's associated operation ID can also help us address a hack related to concurrent updates of the working copy. If we have recorded the operation ID there, then we simply reload at that operation instead of the current operation.

If we store the operation ID in the .jj/working_copy/, then the commit ID that's recorded there won't be needed anymore since it can be found in the operation. We may still want to store it so we don't need to look up the operation when updating the working copy, but it would be just a cache of the checkout recorded in the operation.
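As a rough illustration of the idea, the working copy state could hold the operation ID plus an optional cached commit ID. Everything here (`WorkingCopyState`, `OpCheckouts`, the field names) is hypothetical, and the staleness test is deliberately simplified to an ID comparison that ignores concurrent operations:

```rust
use std::collections::HashMap;

/// Hypothetical state stored under `.jj/working_copy/`: the operation that
/// last updated the files, plus an optional cached commit ID.
pub struct WorkingCopyState {
    pub operation_id: String,
    pub cached_commit_id: Option<String>,
}

/// Hypothetical stand-in for the operation store:
/// operation ID -> checkout recorded in that operation's view.
pub type OpCheckouts = HashMap<String, String>;

impl WorkingCopyState {
    /// Return the checked-out commit, preferring the cache and falling back
    /// to looking up the operation.
    pub fn checkout<'a>(&'a self, ops: &'a OpCheckouts) -> Option<&'a str> {
        match &self.cached_commit_id {
            Some(id) => Some(id.as_str()),
            None => ops.get(&self.operation_id).map(|s| s.as_str()),
        }
    }

    /// Simplification: consider the working copy stale whenever it was last
    /// updated at an operation other than the one the repo is currently at.
    pub fn is_stale(&self, current_operation_id: &str) -> bool {
        self.operation_id != current_operation_id
    }
}
```

The cache only saves the operation lookup; dropping it would not lose information, since the same commit ID can be found in the operation.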

martinvonz added a commit that referenced this issue Jan 20, 2022
When there are concurrent operations that want to update the working
copy, it's useful to know which operation was the last to successfully
update the working copy. That can help us decide how to resolve a
mismatch between the repo view's record and the working copy's
record. If we detect such a difference, we can look at the working
copy's operation ID to see if it was updated by an operation before or
after we loaded the repo.

If the working copy's record says that it was updated at operation A
and we have loaded the repo at operation B (after A), we know that the
working copy is stale, so we can automatically update it (or tell the
user to run some command to update it if we think that's more
user-friendly).

Conversely, if we have loaded the repo at operation A and the working
copy's record says that it was updated at operation B, we know that
there was some concurrent operation that updated it. We can then
decide to print a warning telling the user that we skipped updating
because of the conflict. We already have logic for not updating the
working copy if the repo is loaded at an earlier operation, but maybe
we can drop that if we record the operation in the working copy (as
this patch does).
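The two cases in this commit message can be sketched as an ancestry check in the operation log. This is only an illustration: `OpLog`, `WorkingCopyStatus`, and `classify` are hypothetical names, and the log is modeled as a plain map from operation ID to parent IDs:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical operation log: operation ID -> parent operation IDs.
pub type OpLog<'a> = HashMap<&'a str, Vec<&'a str>>;

#[derive(Debug, PartialEq, Eq)]
pub enum WorkingCopyStatus {
    /// The working copy was last updated at the operation we loaded.
    UpToDate,
    /// The working copy was last updated at an ancestor of the loaded
    /// operation, so it can be updated (or the user can be told to do so).
    Stale,
    /// The working copy was updated at an operation we did not see when we
    /// loaded the repo, so skip the update and warn.
    UpdatedConcurrently,
}

/// Return true if `ancestor` is reachable from `descendant` via parent edges.
fn is_ancestor<'a>(log: &OpLog<'a>, ancestor: &str, descendant: &'a str) -> bool {
    let mut stack = vec![descendant];
    let mut seen = HashSet::new();
    while let Some(op) = stack.pop() {
        if op == ancestor {
            return true;
        }
        // Only expand each operation once, in case of merges in the log.
        if seen.insert(op) {
            if let Some(parents) = log.get(op) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    false
}

pub fn classify<'a>(log: &OpLog<'a>, wc_op: &'a str, loaded_op: &'a str) -> WorkingCopyStatus {
    if wc_op == loaded_op {
        WorkingCopyStatus::UpToDate
    } else if is_ancestor(log, wc_op, loaded_op) {
        WorkingCopyStatus::Stale
    } else {
        WorkingCopyStatus::UpdatedConcurrently
    }
}
```

With a linear log `root <- a <- b`, a working copy at `a` with the repo loaded at `b` classifies as `Stale`, and the reverse as `UpdatedConcurrently`.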
martinvonz added a commit that referenced this issue Jan 20, 2022
Now that we have the operation ID recorded in the working copy state,
we can tell if the working copy is stale. When it is, we update it to
the repo view's checkout.
martinvonz added a commit that referenced this issue Jan 29, 2022
)

It's clearly `Workspace`'s job to create `.jj/working_copy/`, I must
have just forgotten to move it there.
martinvonz added a commit that referenced this issue Feb 2, 2022
The `.jj/` directory contains information about two distinct parts:
the repo and the working copy. Most subdirectories are related to the
repo; only `.jj/working_copy/` is about the working copy. Let's move
the repo-related bits into a new `.jj/repo/` subdirectory. That makes
it clearer that they're related to the repo. It will probably also be
easier to manage when we have support for multiple workspaces backed
by a single repo.
martinvonz added a commit that referenced this issue Feb 2, 2022
This patch teaches the `View` object to keep track of the checkout in
each workspace. It serializes that information into the `OpStore`. For
compatibility with existing repos, the existing field for a single
workspace's checkout is interpreted as being for the workspace called
"default".

This is just an early step towards support for multiple
workspaces. Remaining things to do:

 * Record the workspace ID somewhere in `.jj/` (maybe in
   `.jj/working_copy/`)

 * Update existing code to use the workspace ID instead of assuming
   it's always "default" as we do after this patch

 * Add a way of indicating in `.jj/` that the repo lives elsewhere and
   make it possible to load a repo from such workspaces

 * Add a command for creating additional workspaces

 * Show each workspace's checkout in log output
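
The compatibility rule for existing repos might look roughly like the following. `StoredView`, its fields, and `load_checkouts` are hypothetical, and letting an explicit "default" entry win over the legacy field is an assumption rather than something the commit message states:

```rust
use std::collections::BTreeMap;

/// Hypothetical serialized form of the view. Older repos only populate
/// `legacy_checkout`; newer ones populate `workspace_checkouts`.
#[derive(Default)]
pub struct StoredView {
    pub legacy_checkout: Option<String>,
    pub workspace_checkouts: BTreeMap<String, String>,
}

/// Build the per-workspace checkout map (workspace ID -> commit ID).
pub fn load_checkouts(stored: &StoredView) -> BTreeMap<String, String> {
    let mut checkouts = stored.workspace_checkouts.clone();
    if let Some(commit_id) = &stored.legacy_checkout {
        // Interpret the old single-checkout field as the "default"
        // workspace. Assumption: an explicit "default" entry takes priority.
        checkouts
            .entry("default".to_string())
            .or_insert_with(|| commit_id.clone());
    }
    checkouts
}
```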
martinvonz added a commit that referenced this issue Feb 2, 2022
This patch makes it so the workspace ID can be stored in
`.jj/working_copy/checkout`. The workspace ID is still always
"default".
martinvonz added a commit that referenced this issue Feb 2, 2022
When checking out a new commit, we look at the old checkout to see if
it's empty so we should abandon it. We currently use the default
workspace's checkout. We need to respect the workspace ID we're given
in `MutableRepo::check_out()`, and we need to be able to deal with
that workspace not existing yet (i.e. this being the first checkout in
that workspace).
martinvonz added a commit that referenced this issue Feb 2, 2022
We detect concurrent working copy changes by checking that the old
commit matches the repo's view. We should use the current workspace
when looking up the checkout in the view.
martinvonz added a commit that referenced this issue Feb 2, 2022
When updating the working copy after committing a transaction, we
should update it based on the right checkout.
martinvonz added a commit that referenced this issue Feb 2, 2022
Before committing the working copy, we check if the working copy is
checked out to the commit we expect based on the repo's view. We
always use the default workspace's checkout, so we need to fix that.
martinvonz added a commit that referenced this issue Feb 2, 2022
When importing Git HEAD, we already use the right workspace ID for the
new checkout, but the old checkout we abandon is always the default
workspace's. We should fix that even if we will never support sharing
a working copy with Git in a non-default workspace.
martinvonz added a commit that referenced this issue Feb 2, 2022
If the workspace is shared with a Git repo, we sometimes update Git's
HEAD ref. We should get the new checkout from the right workspace ID
when doing that (though I'm not sure we'll ever support sharing the
working copy with Git in a non-default workspace).
martinvonz added a commit that referenced this issue Feb 2, 2022
…#13)

`jj new` will update onto the new commit if the previous commit was
the current checkout. That code needs to use the current workspace's
checkout.
martinvonz added a commit that referenced this issue Feb 2, 2022
`jj status` shows the status for the default workspace. Make it use
the current workspace instead.
martinvonz added a commit that referenced this issue Feb 2, 2022
Because we record each workspace's checkout in the repo view, we can
-- unlike other VCSs -- let the user refer to any workspace's checkout
in revsets. This patch adds syntax for that, so you can show the
contents of the checkout in workspace "foo" with `jj show foo@`. That
won't automatically commit that workspace's working copy, however.
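
A small sketch of how a `foo@` symbol could be resolved against the checkouts recorded in the view. This is not jj's revset parser; `CheckoutRef`, `parse_checkout_symbol`, and `resolve_checkout` are hypothetical, and treating a bare `@` as the current workspace is assumed here for illustration:

```rust
use std::collections::HashMap;

/// Result of parsing a checkout symbol such as `foo@` or `@`.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckoutRef<'a> {
    /// Bare `@`: the current workspace's checkout.
    Current,
    /// `name@`: the checkout of the named workspace.
    Workspace(&'a str),
}

/// Parse a symbol of the form `@` or `<workspace>@`; anything else is not a
/// checkout symbol.
pub fn parse_checkout_symbol(symbol: &str) -> Option<CheckoutRef<'_>> {
    let name = symbol.strip_suffix('@')?;
    if name.is_empty() {
        Some(CheckoutRef::Current)
    } else if name.contains('@') {
        None
    } else {
        Some(CheckoutRef::Workspace(name))
    }
}

/// Look up the commit ID for a checkout symbol in the view's
/// workspace -> commit ID map.
pub fn resolve_checkout<'v>(
    checkouts: &'v HashMap<String, String>,
    current_workspace: &str,
    symbol: &str,
) -> Option<&'v str> {
    let name = match parse_checkout_symbol(symbol)? {
        CheckoutRef::Current => current_workspace,
        CheckoutRef::Workspace(name) => name,
    };
    checkouts.get(name).map(String::as_str)
}
```

Because the lookup only reads the view, it reflects the last recorded checkout of the other workspace, not any uncommitted changes in its files.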
martinvonz added a commit that referenced this issue Feb 2, 2022
We don't use the `current_checkout` keyword in our default templates,
but let's still fix it, so it refers to the current workspace.
martinvonz added a commit that referenced this issue Feb 2, 2022
We should highlight (with bright colors by default) the current
workspace's checkout, not the default workspace's checkout.
martinvonz added a commit that referenced this issue Feb 2, 2022
As part of creating a new repository, we create an open commit on top
of the root and set that as the current checkout. Now that we have
support for multiple checkouts in the model, we also have support for
zero checkouts, which means we don't need to create that commit on top
of the root when creating the repo. We can therefore move it out of
`ReadonlyRepo`'s initialization code and let `Workspace` instead take
care of it. A user-visible effect of this change is that we now create
one operation for initializing the repo and another one for checking
out the root commit. That seems fine, and will be consistent with the
additional operation we will create when adding further workspaces.
martinvonz added a commit that referenced this issue Feb 3, 2022
In workspaces added after the initial one, the idea is to have
`.jj/repo` be a file whose contents is a path to the location of the
repo directory in some other workspace.
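
One way to picture the lookup, as a sketch only: `resolve_repo_dir` is a hypothetical helper, and both using the stored path as-is and trimming trailing whitespace from the file are assumptions:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve the repo directory for the workspace rooted at `workspace_root`.
pub fn resolve_repo_dir(workspace_root: &Path) -> io::Result<PathBuf> {
    let repo_path = workspace_root.join(".jj").join("repo");
    if repo_path.is_dir() {
        // Initial workspace: the repo lives inside this workspace.
        return Ok(repo_path);
    }
    // Added workspace: `.jj/repo` is a file whose content is the path to
    // the repo directory in some other workspace.
    let contents = fs::read_to_string(&repo_path)?;
    // Assumption: ignore trailing whitespace such as a final newline.
    Ok(PathBuf::from(contents.trim_end()))
}
```

If `.jj/repo` is missing entirely, the `read_to_string` error is returned to the caller, which would indicate that the directory is not a workspace.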
martinvonz added a commit that referenced this issue Feb 3, 2022
With all the groundwork done, everything should just work with multiple
workspaces now. So let's add a command for creating workspaces.
martinvonz added a commit that referenced this issue Feb 3, 2022
It seems helpful to show in the log output which commit is checked out
in which workspace, so let's try that. I made it only show the
information if there are multiple checkouts for now.
martinvonz added a commit that referenced this issue Feb 3, 2022
…nce (#13)

When you run `jj co abc123` and that commit is already checked out, we
just print a message. The condition for that assumed that the checkout
existed, which it won't if you just ran `jj workspace forget`. Let's
avoid that crash, especially since `jj co` is an easy way to restore
the working copy if you had accidentally run `jj workspace forget`
(though `jj undo` is even easier).
@martinvonz
Member Author

You can now create additional workspaces with jj workspace add <path to new workspace>, list workspaces with jj workspace list, and forget workspaces with jj workspace forget. Each workspace's checked-out commit shows up in the default log template (they're recorded in the repo view, so that's easy and cheap to do). You can also refer to them using e.g. jj diff --from workspace1@ --to workspace2@ to show a diff from workspace1's checkout to workspace2's checkout. Plain @ is now a short form to refer to the current workspace's checkout.

I went with solution 2 above, meaning that all checkouts are rebased along with commits they point to, but the working copy is not updated. If you run a command in the workspace after its checkout has been updated from another workspace, then it'll get automatically updated. Perhaps we should change that to just print a warning, in case the user was running a build or tests, or maybe it's just confusing.

I consider this done now and we can open new bugs for any improvements to it.
