This repository has been archived by the owner on Apr 6, 2018. It is now read-only.

Crash in Go (presumably go-vncdriver) runtime, allocating more stack #111

Closed
tlbtlbtlb opened this issue Jan 13, 2017 · 2 comments

tlbtlbtlb commented Jan 13, 2017

Actual behavior

Started universe-starter-agent with:

$ python train.py --num-workers 8 --env-id flashgames.NeonRace-v0 --log-dir /mnt/kube-efs/universe-perfmon/usa-flashgames.NeonRace-v0-20170113-041927 \
  -m child

After about 2 hours playing NeonRace-v0, one universe-starter-agent worker crashes with:

runtime: newstack sp=0xc820da3380 stack=[0xc820d9c000, 0xc820da3fa0]
        morebuf={pc:0x7fadfdb93f6f sp:0xc820da3388 lr:0x0}
        sched={pc:0x7fadfdb7a617 sp:0xc820da3380 lr:0x0 ctxt:0x0}
runtime: failed to unwind through stackBarrier at SP 0xc820da3388; [ @@@ ==>]
fatal error: inconsistent state in stackBarrier

runtime stack:
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:547 +0x92
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.newstack()
        /usr/lib/go-1.6/src/runtime/stack.go:833 +0x56d
runtime.morestack()
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:359 +0x74

goroutine 5 [syscall, locked to thread]:
runtime: failed to unwind through stackBarrier at SP 0xc820da3388; [ @@@ ==>]
fatal error: inconsistent state in stackBarrier
panic during panic

runtime stack:
runtime.startpanic_m()
        /usr/lib/go-1.6/src/runtime/panic.go:604 +0x13e
runtime.systemstack(0x7fadfe02e4f8)
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:307 +0xa1
runtime.startpanic()
        /usr/lib/go-1.6/src/runtime/panic.go:525 +0x14
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:546 +0x85
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.tracebackothers(0xc820001200)
        /usr/lib/go-1.6/src/runtime/traceback.go:698 +0xb0
runtime.dopanic_m(0xc820001200, 0x7fadfdb65de2, 0x7fadf7ffe7c8)
        /usr/lib/go-1.6/src/runtime/panic.go:644 +0x1f5
runtime.dopanic.func1()
        /usr/lib/go-1.6/src/runtime/panic.go:534 +0x34
runtime.systemstack(0x7fadf7ffe7a0)
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:307 +0xa1
runtime.dopanic(0x0)
        /usr/lib/go-1.6/src/runtime/panic.go:535 +0x63
runtime.throw(0x7fadfe019ba0, 0x22)
        /usr/lib/go-1.6/src/runtime/panic.go:547 +0x92
runtime.gentraceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go-1.6/src/runtime/traceback.go:215 +0x1743
runtime.traceback1(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180, 0x0)
        /usr/lib/go-1.6/src/runtime/traceback.go:591 +0xca
runtime.traceback(0x7fadfdb93f6f, 0xc820da3388, 0x0, 0xc820000180)
        /usr/lib/go-1.6/src/runtime/traceback.go:568 +0x4a
runtime.newstack()
        /usr/lib/go-1.6/src/runtime/stack.go:833 +0x56d
runtime.morestack()
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:359 +0x74

Versions

Linux 0c0c02f2bfdb 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Python 3.5.2
Name: universe
Version: 0.21.1
Summary: Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
Home-page: https://github.com/openai/universe
Author: OpenAI
Author-email: universe@openai.com
License: UNKNOWN
Location: /experiment/universe
Requires: autobahn, docker-py, docker-pycreds, fastzbarlight, go-vncdriver, gym, Pillow, PyYAML, six, twisted, ujson
---
Name: gym
Version: 0.7.0
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: gym@openai.com
License: UNKNOWN
Location: /experiment/gym
Requires: numpy, requests, six, pyglet
---
Name: tensorflow
Version: 0.12.1
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python3.5/dist-packages
Requires: protobuf, six, wheel, numpy
---
Name: numpy
Version: 1.11.0
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /usr/lib/python3/dist-packages
Requires:
---
Name: go-vncdriver
Version: 0.4.19
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/local/lib/python3.5/dist-packages
Requires: numpy
---
Name: Pillow
Version: 4.0.0
Summary: Python Imaging Library (Fork)
Home-page: http://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: aclark@aclark.net
License: Standard PIL License
Location: /usr/local/lib/python3.5/dist-packages
Requires: olefile

tlbtlbtlb commented Jan 19, 2017

Another occurrence, after 4 hours of worker time. Details submitted as golang/go#18718.

They suggest upgrading Go. I'm trying some long runs with Go 1.7.4.

@tlbtlbtlb

After upgrading to Go 1.7.4, I haven't seen this in about 200 agent-hours of operation. Leaving this open until we make 1.7.4 the default, which isn't trivial: Ubuntu doesn't seem to have a package for it.
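
Until a newer Go is the default, one way to catch an old toolchain before building go-vncdriver is to check the output of `go version`. The following is a minimal sketch, not part of universe or go-vncdriver: the helper names `parse_go_version` and `go_is_new_enough` are hypothetical, and the 1.7.4 threshold is only the version that stopped crashing in the runs described above, not a documented requirement.

```python
import re
import shutil
import subprocess

# Assumption: 1.7.4 is the version reported in this thread as not reproducing
# the crash. It is not an official minimum for go-vncdriver.
MIN_GO = (1, 7, 4)


def parse_go_version(text):
    """Extract (major, minor, patch) from `go version` output, or None."""
    m = re.search(r"\bgo(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    # A missing patch component (e.g. "go1.7") is treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())


def go_is_new_enough(text, minimum=MIN_GO):
    """True if the version found in `text` is at least `minimum`."""
    version = parse_go_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    go = shutil.which("go")
    if go is None:
        print("go not found on PATH")
    else:
        out = subprocess.check_output([go, "version"]).decode()
        verdict = "ok" if go_is_new_enough(out) else "too old or unrecognized"
        print(out.strip(), "->", verdict)
```

With the Go 1.6 toolchain shown in the trace paths above, this check would flag the install as too old.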
