
fatal error: concurrent map read and map write #368

Closed
andrewgdavis opened this issue Jul 20, 2016 · 26 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@andrewgdavis

running minikube logs shows the following:

fatal error: concurrent map read and map write

goroutine 274604 [running]:
runtime.throw(0x3ce8ca0, 0x21)
    /usr/lib/google-golang/src/runtime/panic.go:550 +0x99 fp=0xc8246efaf8 sp=0xc8246efae0 pc=0x42f639
runtime.mapaccess1_faststr(0x291e0a0, 0xc82225a840, 0xc820d91fb0, 0x25, 0x589a000)
    /usr/lib/google-golang/src/runtime/hashmap_fast.go:202 +0x5b fp=0xc8246efb58 sp=0xc8246efaf8 pc=0x40e18b
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).queueActionLocked(0xc821fe04d0, 0x38c58c8, 0x4, 0x3867a00, 0xc82752d700, 0x0, 0x0)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:303 +0x2e2 fp=0xc8246efcc0 sp=0xc8246efb58 pc=0x110f8e2
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Resync(0xc821fe04d0, 0x0, 0x0)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:498 +0x4f7 fp=0xc8246efe38 sp=0xc8246efcc0 pc=0x1112027
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc821547290, 0xc82006cf00, 0xc8219ae780, 0xc8232bd020, 0xc821547298)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:289 +0x252 fp=0xc8246eff88 sp=0xc8246efe38 pc=0x1127a72
runtime.goexit()
    /usr/lib/google-golang/src/runtime/asm_amd64.s:2002 +0x1 fp=0xc8246eff90 sp=0xc8246eff88 pc=0x464961
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

Steps to reproduce: unknown at this point, but this has occurred 3 times in the past 2 days.

minikube version: v0.6.0

kubectl version:
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"283137936a498aed572ee22af6774b6fb6e9fd94", GitTreeState:"clean", BuildDate:"2016-07-01T19:26:38Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"darwin/amd64"}
The connection to the server 192.168.99.100:8443 was refused - did you specify the right host or port? (The minikube VM is no longer responsive to kubectl commands.)

OS: Darwin Kernel 15.3.0

@dlorenc
Contributor

dlorenc commented Jul 21, 2016

There was a watch-related bug fixed in 1.3.1; maybe this will be resolved once we update in #380.

@andrewgdavis
Author

As an FYI, after the failure and a restart with minikube stop; minikube start, previously running pods start up fine, but they no longer show events from kubectl describe pods. To see the events, one needs to delete the running pod deployment and then recreate it.

@dlorenc Thanks for the info.

@andrewgdavis
Author

I looked at the watch fix in 1.3.1 / 1.3.2, but I don't think it addresses the problem.

Now that I have a few more stack dumps to look at, my guess is that there is a race condition that the following PR might address:

kubernetes/kubernetes#28744

@andrewgdavis
Author

I tried out the latest release minikube v0.7.0, but the problem persists.

fatal error: concurrent map read and map write

goroutine 460377 [running]:
runtime.throw(0x3ceee60, 0x21)
    /usr/local/go/src/runtime/panic.go:547 +0x90 fp=0xc824f57ae8 sp=0xc824f57ad0
runtime.mapaccess2_faststr(0x2924c00, 0xc821629b60, 0xc824b56830, 0xc, 0x1, 0x1)
    /usr/local/go/src/runtime/hashmap_fast.go:307 +0x5b fp=0xc824f57b48 sp=0xc824f57ae8
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).queueActionLocked(0xc82123abb0, 0x38cb990, 0x4, 0x384dba0, 0xc8245bb000, 0x0, 0x0)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:309 +0x4dc fp=0xc824f57cb0 sp=0xc824f57b48
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Resync(0xc82123abb0, 0x0, 0x0)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:498 +0x4f7 fp=0xc824f57e28 sp=0xc824f57cb0
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc820026468, 0xc8215d6900, 0xc82273ca00, 0xc8248234a0, 0xc820026478)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:289 +0x252 fp=0xc824f57f78 sp=0xc824f57e28
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc824f57f80 sp=0xc824f57f78
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

Next week I will try to cherry-pick kubernetes/kubernetes#28744 and see if that helps.

@andrewgdavis
Author

link to similar issue kubernetes/kubernetes#29638

@andrewgdavis
Author

I had a bit of time to try to work on this issue, but I think I may be missing some fundamental debugging steps.

I cloned the minikube repo and cherry-picked the relevant code I wanted to try. Then I ran make, which produced out/minikube.

I currently have the 0.7.1 (release) minikube VM installed. Does it need to be deleted before I run out/minikube stop; out/minikube start, or is a simple out/minikube stop/start OK?

I am trying not to delete the current VM because my internet speed is slow... does minikube always work with the same eval $(minikube docker-env) settings?

@dlorenc
Contributor

dlorenc commented Aug 11, 2016

Sorry for the slow response! Looks like this has finally been cherry-picked, but it didn't make it into 1.3.5: kubernetes/kubernetes#29960

We'll be able to pull in the fix once this makes it to an official Kubernetes release.

It looks like this is caused by having deployments with overlapping selectors, so removing the overlap might be a workaround for now: kubernetes/kubernetes#29960 (comment)

You can just re-run "minikube start" to update the VM/localkube binary after you recompile. You don't need to stop or delete.

@dlorenc
Contributor

dlorenc commented Aug 11, 2016

We should include those in our offline section: #391

@dlorenc dlorenc added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2016
@janetkuo
Member

@andrewgdavis just curious, did you have multiple deployments? Do those deployments have overlapping label selectors?

@andrewgdavis
Author

andrewgdavis commented Aug 15, 2016

At one time that may have been the case, but for the past week I have only been deploying one pod that makes use of pod.alpha.kubernetes.io/init-containers and different types of persistent volumes (and fsGroups) backing the containers. Things don't seem to work as advertised :/

The yaml looks like this.

kind: Pod
apiVersion: v1
metadata:
  name: hot
  labels:
    app: hot
  annotations:
    pod.alpha.kubernetes.io/init-containers:
  ...
  ... a couple of init-containers doing stuff to prep tomcat
spec:
  containers:
  - name: run
    image: mytomcat:8.0.36
    imagePullPolicy: "IfNotPresent"
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/tomcat/webapps/
  volumes:
    - name: workdir
      emptyDir:
        medium: "Memory"

The fatal error: concurrent map read and map write still occurs when running it (a simple kubectl delete -f tomcat.yaml followed by kubectl create -f tomcat.yaml).

Update: to be honest, the last time this occurred was Aug 9th; I may have been running deployments alongside this pod at the time.
I will make a note of it when I have multiple deployments started.

@janetkuo
Member

@dlorenc this looks different from kubernetes/kubernetes#29960, since the panic comes from the client cache DeltaFIFO instead of the deployment controller?

@andrewgdavis
Author

The minikube logs show a lot of goroutines that start with:
/go/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

for example:

grep "reflector.go:296" minikube-aug9err.log | wc -l
628

most of which look like this:

goroutine 1309 [select, 380 minutes]:
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc820100718, 0xc82006cea0, 0xc821f96780, 0xc8239da000, 0xc820100720)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:283 +0x3c8
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

goroutine 414518 [select, 172 minutes]:
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc821dd8cd0, 0xc822021740, 0xc821cff220, 0xc824550cc0, 0xc821dd8cd8)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:283 +0x3c8
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

goroutine 327671 [select, 212 minutes]:
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc82248a0d0, 0xc821f00fc0, 0xc820400460, 0xc827fd92c0, 0xc82248a0e0)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:283 +0x3c8
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

goroutine 1256 [select, 380 minutes]:
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc820100638, 0xc82006cea0, 0xc821fc45a0, 0xc822ccd680, 0xc820100640)
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:283 +0x3c8
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
    /usr/local/google/home/aprindle/go/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0xde3

Not sure if that is helpful or not. Let me know if there is anything else that I should look for.

@janetkuo
Member

Looks like DeltaFIFO's items map is read/written concurrently while the reflector is doing Resync in ListAndWatch: https://github.com/kubernetes/kubernetes/blob/b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5/pkg/client/cache/delta_fifo.go#L303 @lavalamp @wojtek-t maybe know more details about DeltaFIFO?
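To make that pattern concrete, here is a minimal, hypothetical Go sketch of this kind of race (this is not the actual DeltaFIFO code; the fifo type, its fields, and the helper names are invented for illustration). Both goroutines take only the read lock, which does not exclude each other, so the map write in the helper can race with a concurrent read or write; running it with go run -race flags the data race:

package main

import "sync"

// fifo is an invented stand-in for a DeltaFIFO-like structure.
type fifo struct {
	lock  sync.RWMutex
	items map[string][]string
}

// queueActionLocked assumes the caller holds the *write* lock,
// because it mutates the shared items map.
func (f *fifo) queueActionLocked(key string) {
	f.items[key] = append(f.items[key], "Sync")
}

// resyncBroken mirrors the buggy pattern: it takes only the read
// lock, so two callers can execute the map write above concurrently.
func (f *fifo) resyncBroken(key string) {
	f.lock.RLock()
	defer f.lock.RUnlock()
	f.queueActionLocked(key)
}

func main() {
	f := &fifo{items: map[string][]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				f.resyncBroken("default/some-pod")
			}
		}()
	}
	wg.Wait()
}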

@andrewgdavis do you know how to reproduce this?

@lavalamp
Member

DeltaFIFO has the queue locked, so I'd expect the culprit to actually be something else. @andrewgdavis, if you post the entire stack dump it might be helpful.

@andrewgdavis
Author

Here is one of the 3 logs (I think they were all using minikube 0.7.1):
https://www.dropbox.com/s/4ng4p9aq4w0djpg/aug9err.log?dl=0

Let me know if you also would like to see the others.

@janetkuo no, I don't know how this was reproduced, and I have not seen the issue since Aug 9. That said, my environment changed: I cherry-picked kubernetes/kubernetes#28744 and built minikube, and then instead of using VirtualBox I went with xhyve.

@andrewgdavis
Author

Just got another dump from today, so the cherry-pick and use of xhyve don't seem to matter.
If interested, here is the latest from today (Aug 17):
https://www.dropbox.com/s/qwbdqyntpaeeieq/aug17err.log?dl=0

@wojtek-t
Member

See this one:
kubernetes/kubernetes#30759 (comment)

It seems to be the same issue.

@lavalamp
Member

Thanks for the stack dump.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Aug 18, 2016
Automatic merge from submit-queue

queueActionLocked requires write lock

Fix kubernetes/minikube#368
Fix part of #30759

Hopefully. On stack dumps I couldn't see who was fighting with this.
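For readers skimming the thread, the shape of the fix described in that commit message is simply to take the exclusive lock on the resync path. Continuing the invented sketch from the earlier comment (again, not the actual upstream diff):

// resyncFixed takes the write lock, so queueActionLocked's map
// mutation can no longer race with other readers or writers.
func (f *fifo) resyncFixed(key string) {
	f.lock.Lock()
	defer f.lock.Unlock()
	f.queueActionLocked(key)
}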
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Aug 22, 2016
Automatic merge from submit-queue

Do not hold the lock for a long time

Followup to #30839.

I'm not convinced this is a super great idea but I'll throw it out and let others decide.

Ref kubernetes/minikube#368
Ref #30759
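The follow-up commit is about lock granularity rather than correctness. A hedged sketch of that general pattern, reusing the invented fifo type from the earlier sketch: snapshot the keys while holding the lock, then re-acquire it briefly per key instead of holding it for the whole resync:

// resyncShortHold avoids holding the exclusive lock for the entire
// resync: it copies the key set under the lock, then locks per key.
func (f *fifo) resyncShortHold() {
	f.lock.Lock()
	keys := make([]string, 0, len(f.items))
	for k := range f.items {
		keys = append(keys, k)
	}
	f.lock.Unlock()

	for _, k := range keys {
		f.lock.Lock()
		f.queueActionLocked(k)
		f.lock.Unlock()
	}
}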
@andrewgdavis
Author

For closure I think this PR finally fixed the issue:

kubernetes/kubernetes#30948

@lavalamp
Member

@andrewgdavis thanks for the confirmation!

@jezell

jezell commented Aug 29, 2016

@aaron-prindle would it be possible to get a 0.8.1 out with this in it? Restarting multiple times a day because of panics isn't much fun.

@dlorenc
Contributor

dlorenc commented Aug 29, 2016

Hey @jezell, we can do a release soon with this fix, but it would be much nicer if this made it into a real Kubernetes release first. It doesn't look like it's been cherry picked into a 1.3.* release yet, would you be fine with using a 1.4-alpha release to get this fix?

@jezell

jezell commented Aug 29, 2016

I would, definitely.

@dlorenc
Contributor

dlorenc commented Aug 29, 2016

Cool, we'll publish one soon. You won't need 0.8.1 for that, you'll be able to use it with a --k8s_version flag and your existing build.

@jezell

jezell commented Aug 29, 2016

awesome! thanks

@ichekrygin

Looks like this issue impacts the ingress controller, resulting in frequent restarts:

NAME                            READY     STATUS    RESTARTS   AGE
nginx-ingress-4280m             1/1       Running   25         5d
nginx-ingress-5dv2y             1/1       Running   25         5d
nginx-ingress-6pwtz             1/1       Running   16         3d
nginx-ingress-vt8cr             1/1       Running   24         5d

I am hitting this issue in v1.3.6:

fatal error: concurrent map read and map write

goroutine 140812 [running]:
runtime.throw(0x1b9f100, 0x21)
        /usr/local/go/src/runtime/panic.go:547 +0x90 fp=0xc82059eaf8 sp=0xc82059eae0
runtime.mapaccess1_faststr(0x148a4a0, 0xc820317f50, 0xc8208fd0e0, 0x15, 0x1)
        /usr/local/go/src/runtime/hashmap_fast.go:202 +0x5b fp=0xc82059eb58 sp=0xc82059eaf8
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).queueActionLocked(0xc8200e5b80, 0x1a3e8c8, 0x4, 0x19dde80, 0xc820d01348, 0x0, 0x0)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:305 +0x1dd fp=0xc82059ecc0 sp=0xc82059eb58
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Resync(0xc8200e5b80, 0x0, 0x0)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:511 +0x4f7 fp=0xc82059ee38 sp=0xc82059ecc0
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc8200c01e0, 0xc82008ec00, 0xc820612000, 0xc820cf04e0, 0xc8200c01e8)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:289 +0x252 fp=0xc82059ef88 sp=0xc82059ee38
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82059ef90 sp=0xc82059ef88
created by k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
