regression in v1.23.1 #697

Closed
vrothberg opened this issue Aug 20, 2020 · 10 comments · Fixed by #698

Comments

@vrothberg
Member

vrothberg commented Aug 20, 2020

v1.23.1 is causing regressions in Podman's apiv2 tests. I don't have the time to track it down, but I need to get work done that is currently blocked by the regression, so I am first reverting the bump in c/image (see containers/image#1031).

[+0064s] not ok 291 [40-pods] POST libpod/pods/bar/start [] : status
[+0064s] #  expected: 200
[+0064s] #    actual: 500
[+0064s]   expected: 200
@vrothberg
Member Author

It looks like commit 7005d07 causes it.

@vrothberg
Member Author

$ POST libpod/pods/bar/start []                                          
HTTP/1.1 500 Internal Server Error                                       
Content-Type: application/json                                           
Date: Thu, 20 Aug 2020 12:11:03 GMT                                      
Content-Length: 117                                                      
                                                                         
{                                                                        
  "cause": "some containers failed",                                     
  "message": "error starting some containers: some containers failed",   
  "response": 500                                                        
}                                                                        
not ok 277 [40-pods] POST libpod/pods/bar/start [] : status              

@vrothberg
Member Author

What I am seeing on the server side:

time="2020-08-20T14:27:33+02:00" level=debug msg="Received: -1"
time="2020-08-20T14:27:33+02:00" level=debug msg="Cleaning up container c6c60358e96e7ebf8ce7953644fef5a2bbb10ddeac9fd9c1b019162f59b51bc5"

@vrothberg
Member Author

@giuseppe @nalind @rhatdan ... no idea what's going on here. The only thing I noticed is that the error only occurs when we lock the graphlock before the layer store; if we lock it afterwards, CI passes again.

@rhatdan
Member

rhatdan commented Aug 20, 2020

@zvier PTAL

@vrothberg
Member Author

Note that it might very well be a bug in Podman. At least we cannot rule that out.

@nalind
Member

nalind commented Aug 20, 2020

@giuseppe @nalind @rhatdan ... no idea what's going on here. The only thing I noticed is that the error only occurs when we lock the graphlock before the layer store; if we lock it afterwards, CI passes again.

That differs from the order in which they're acquired in Shutdown(), which would be a potential lock ordering problem.
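
(For illustration only: a minimal Go sketch of why an inconsistent acquisition order is a hazard. The lock names here are hypothetical stand-ins, not the actual containers/storage fields.)

package main

import "sync"

// Hypothetical stand-ins for the two locks under discussion (graph lock and
// layer-store lock); not the real containers/storage fields.
var graphLock, layerLock sync.Mutex

// One code path takes graphLock first, then layerLock ...
func lockGraphThenLayers() {
	graphLock.Lock()
	defer graphLock.Unlock()
	layerLock.Lock()
	defer layerLock.Unlock()
}

// ... while another path (say, a shutdown path) takes them in the opposite order.
func lockLayersThenGraph() {
	layerLock.Lock()
	defer layerLock.Unlock()
	graphLock.Lock()
	defer graphLock.Unlock()
}

func main() {
	// With an unlucky interleaving, each goroutine ends up holding one lock
	// while waiting for the other, and the program hangs: the classic
	// lock-ordering deadlock that a consistent acquisition order avoids.
	go lockGraphThenLayers()
	lockLayersThenGraph()
}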

@giuseppe
Member

I don't think there is a regression in v1.23.1; IMO it only exposed a race condition that already existed.

I think the issue is that podman performs the unmount from the cleanup process, and that confuses the mounts RefCounter. The next time the podman daemon tries to do a mount, nothing happens, because the RefCounter believes the path is already mounted while in reality it was unmounted by the cleanup process.
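
As a rough illustration of that failure mode, here is a self-contained toy sketch (the names and details are made up for the example; this is not the real graphdriver RefCounter):

package main

import "fmt"

// Toy mount ref counter, loosely modeled on the idea behind a mount
// RefCounter; illustrative only.
type refCounter struct {
	counts    map[string]int
	checked   map[string]bool
	isMounted func(path string) bool // stand-in for the mount checker
}

func (c *refCounter) increment(path string) int {
	if !c.checked[path] {
		// First use of this path: trust the system's mount state once.
		c.checked[path] = true
		if c.isMounted(path) {
			c.counts[path]++
		}
	}
	c.counts[path]++
	return c.counts[path]
}

func main() {
	mounted := false
	c := &refCounter{
		counts:    map[string]int{},
		checked:   map[string]bool{},
		isMounted: func(string) bool { return mounted },
	}

	// Daemon mounts the layer: count 0 -> 1, a real mount is performed.
	if c.increment("/layer") == 1 {
		mounted = true
	}

	// An external cleanup process unmounts the path directly; the counter
	// never hears about it.
	mounted = false

	// Next start: count 1 -> 2, so a driver consulting the counter would
	// skip the real mount even though the path is no longer mounted.
	fmt.Println(c.increment("/layer"), c.isMounted("/layer")) // 2 false
}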

@giuseppe
Member

I've tried with:

diff --git a/vendor/github.com/containers/storage/drivers/counter.go b/vendor/github.com/containers/storage/drivers/counter.go
index 72551a38d..ec7a6090a 100644
--- a/vendor/github.com/containers/storage/drivers/counter.go
+++ b/vendor/github.com/containers/storage/drivers/counter.go
@@ -51,6 +51,8 @@ func (c *RefCounter) incdec(path string, infoOp func(minfo *minfo)) int {
                if c.checker.IsMounted(path) {
                        m.count++
                }
+       } else if !c.checker.IsMounted(path) {
+               m.count = 0
        }
        infoOp(m)
        count := m.count

I'm not sure it is the best fix, but at least the tests pass now.
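
Translated onto the toy counter above, the proposed check amounts to replacing its increment with something like this (a sketch of the diff's intent, not the actual drivers/counter.go):

// Variant of the toy increment with the proposed check: if the path has
// already been checked once but is no longer mounted on the system, reset
// the count so the next caller performs a real mount again.
func (c *refCounter) increment(path string) int {
	if !c.checked[path] {
		c.checked[path] = true
		if c.isMounted(path) {
			c.counts[path]++
		}
	} else if !c.isMounted(path) {
		// Unmounted behind our back (e.g. by the cleanup process):
		// drop the stale count instead of trusting it.
		c.counts[path] = 0
	}
	c.counts[path]++
	return c.counts[path]
}

With that check in place, the second increment in the toy example returns 1 again, so the driver performs a real mount instead of trusting the stale count.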

giuseppe added a commit to giuseppe/storage that referenced this issue Aug 21, 2020
if a previously mounted container was unmounted
externally (e.g. through conmon cleanup for Podman containers), the
ref counter will lose track of it and report it as still mounted.

Closes: containers#697

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

PR here: #698

mheon pushed a commit to mheon/storage that referenced this issue Aug 24, 2020

rhatdan pushed a commit to rhatdan/storage that referenced this issue Aug 25, 2020