Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"failed to lock file" error #205

Closed
larsks opened this issue Sep 8, 2022 · 9 comments
Closed

"failed to lock file" error #205

larsks opened this issue Sep 8, 2022 · 9 comments

Comments

@larsks
Copy link
Contributor

larsks commented Sep 8, 2022

I accidentally aborted an stg goto operation (configured to gpg sign commits by default, fingers didn't anticipate the password prompt so I typed the wrong thing, causing the commit to fail, causing stg to fail), and subsequent stg operations failed with:

error: failed to lock file '/home/lars/projects/k8s-perf-tools/db-perf/.git/refs/stacks/main.lock' for writing: ; class=Os (2); code=Locked (-14)

Removing the lock file allowed things to continue, but this seems like a bug: stg should clean up lockfiles on exit, and even if it fails to clean up lock files, any locks it holds shouldn't persist when the process exits.

@jpgrayson
Copy link
Collaborator

jpgrayson commented Sep 10, 2022

Hmmm, this is an interesting case. Thanks for reporting it, @larsks.

There is one place in StGit's transaction code where the stack reference (e.g. refs/stacks/main) and patch references (refs/patches/main/xxx) are explicitly locked using a git2::Transaction. I think the first step would be to confirm that this is the critical section of code where the lock is created. If it is (it probably is), then we would need to figure out how to ensure the lock file(s) are cleaned up.

I'll note that it seems possible to ensure locks are cleaned up with a SIGTERM, but not with a SIGKILL and/or other arbitrary exit conditions. Also, not sure what mechanisms are available on Windows since that platform doesn't really do signals.

@larsks
Copy link
Contributor Author

larsks commented Sep 10, 2022

Here's a complete reproducer for this situation.

Prerequisites

You need git configured to automatically sign commits.

  • Ensure that commit.gpgsign=true.
  • Ensure that user.signkey is set to something useful

You need gpg configured to prompt for a passphrase on the terminal:

  • Set pinentry-mode loopback in ~/.gnupg/gpg.conf
  • Set allow-loopback-pinentry in ~/.gnupg/gpg-agent.conf

Initial repository state

We start with:

$ stg series
+ repair-reference-in-cmd-module
> add-lint-api-doc-target-to

Procedure

  1. Ensure that you don't have a cached passphrase (and that any
    configuration changes you made in gpg-agent.conf are activated):

    echo RELOADAGENT | gpg-connect-agent
    
  2. Try to change patches using e.g. stg pop. Abort the operation
    with CTRL-C when prompted for a passphrase.

    $ stg pop
    - add-lint-api-doc-target-to
    > repair-reference-in-cmd-module
    Enter passphrase:
    
  3. Try to repeat the above command:

    $ stg pop
    error: HEAD and stack top are not the same. This can happen if you modify the branch with git. See `stg repair --help` for next steps to take.
    
  4. Attempt to repair the branch. Enter the passphrase when prompted.

    $ stg repair
    info: `add-lint-api-doc-target-to` is now unapplied
    error: failed to lock file '/home/lars/src/repro/.git/refs/stacks/master.lock' for writing: ; class=Os (2); code=Locked (-14)
    

@jpgrayson
Copy link
Collaborator

This reproducing procedure is really helpful. I am able to reproduce the problem and have started taking a look at what's going on.

jpgrayson added a commit that referenced this issue Sep 12, 2022
When gpg signatures (stgit.gpgsign and/or commit.gpgsign) are enabled,
there is an opportunity for commits occurring at stack transaction execute
time to fail due to gpg failing. And gpg may fail for any number of
reasons, but some likely ones are either the user entering the wrong pass
phrase too many times or the user simply cancelling pass phrase entry.

In these cases where a transaction-time commit might fail, StGit should
leave the repository, working tree, and stack in a consistent state. But as
issue #205 demonstrates, there have been some problems where repo, working
tree, and stack state have not been consistent after such a failure.

This new test case exercises a known-problematic code path.

Reproduces #205.
@larsks
Copy link
Contributor Author

larsks commented Sep 12, 2022

Edit: Hey, thanks for taking the time to look at this and address it! Those changes seem to successfully clear up the locking problem.


With the changes in 08b0bd4, the above procedure still leaves the local repository in an inconsistent state:

$ stg pop
- add-test-for-gpg-failure
> robust-stack-transaction
Enter passphrase: # <CTRL-C> here
$ stg series
+ robust-stack-transaction
> add-test-for-gpg-failure
$ stg pop
error: Index not clean. Use `refresh` or `reset --hard`

Is that the intended behavior? stg reset --hard is the correct course of action here; running stg refresh would end up erroneously adding changes to the current patch.

@jpgrayson
Copy link
Collaborator

For the case where stg pop fails due to the gpg prompt being aborted, the transaction should now be fully rolled back such that the stack and branch state are consistent. So no more head/top mismatch state or needing to run stg repair afterward. Looks like you're getting that part of the new behavior. However, in my test cases, I have not seen the index be in an inconsistent state, so that's new/unexpected. I'm not immediately able to reproduce that problem either--could use some help with that.

@larsks
Copy link
Contributor Author

larsks commented Sep 12, 2022

I was just following the earlier procedure for reproducing the original problem. Here's a complete log:

$ git clone https://github.com/stacked-git/stgit
Cloning into 'stgit'...
remote: Enumerating objects: 17448, done.
remote: Counting objects: 100% (4016/4016), done.
remote: Compressing objects: 100% (882/882), done.
remote: Total 17448 (delta 3154), reused 3906 (delta 3103), pack-reused 13432
Receiving objects: 100% (17448/17448), 4.05 MiB | 16.64 MiB/s, done.
Resolving deltas: 100% (13713/13713), done.
$ cd stgit/
$ git log --oneline -2
97c44976 (HEAD -> master, origin/master, origin/HEAD) Add test for GPG failure
08b0bd49 Robust stack transaction rollback
$ stg init
$ stg uncommit -n2
> add-test-for-gpg-failure
$ stg series
+ robust-stack-transaction
> add-test-for-gpg-failure
$ echo RELOADAGENT | gpg-connect-agent
OK
$ stg pop
- add-test-for-gpg-failure
> robust-stack-transaction
Enter passphrase: # ENTER CTRL-C HERE
$ stg status
M  t/t1009-gpg.sh
$ stg pop
error: Index not clean. Use `refresh` or `reset --hard`

@jpgrayson
Copy link
Collaborator

I am able to reproduce the above when I configure gpg with pinentry-mode loopback. I had been using a different pinentry program (pinentry-qt) where cancelling the pinentry would result in StGit resuming control after its child git process exited with a failure such that rollback would occur.

$ stg pop                                                                                                                                                                                                            master
- add-test-for-gpg-failure
> robust-stack-transaction
@ add-test-for-gpg-failure (rolled back)
error: `git commit-tree`: error: gpg failed to sign the data
Command aborted (all changes rolled back)
$ stg status

So there is something about hitting ctrl-c with the loopback pinentry that seems to cause the parent stg process to be killed hard such that it does not get a chance to perform its new-and-improved rollback.

@larsks
Copy link
Contributor Author

larsks commented Sep 12, 2022

So there is something about hitting ctrl-c with the loopback pinentry that seems to cause the parent stg process to be killed hard such that it does not get a chance to perform its new-and-improved rollback.

The SIGINT here is delivered to gpg, and git, and stg (which wouldn't be the case when cancelling a GUI password prompt). Does the new transaction code get triggered when stg is interrupted by a signal?

@jpgrayson
Copy link
Collaborator

The updated transaction rollback code currently assumes that the subordinate git commit-tree process it calls will return. I am looking into adding a SIGINT handler using the ctrlc package.

jpgrayson added a commit that referenced this issue Sep 13, 2022
When the user sends a SIGINT signal (i.e. by pressing ctrl-c) while the
stack transaction is executing, the signal is now deferred such that the
rollback can occur as it does for other transaction execute time errors.
SIGINT signals outside of the execute-time critical section are not
deferred and will cause the stg process to exit immediately.

The implementation uses the ctrlc crate, which is cross platform and should
also work on Windows.

Also add test case that simulates the stg process being hit with a SIGINT
at transaction execute time, which is a thing that can happen depending on
the gpg pinentry configuration and if the user hits ctrl-c.

N.B. this change only handles SIGINT. The more violent SIGKILL will
immediately terminate the stg process, bypassing the rollback mechanism.

Fixes: #205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants