
test: locklinear.go is flaky #19276

Closed

mundaym opened this issue Feb 24, 2017 · 35 comments
Labels: FrozenDueToAge; Testing (an issue that has been verified to require only test changes, not just a test failure)
Milestone: Go1.9

Comments

@mundaym (Member) commented Feb 24, 2017:

Seen on the builders (page 1):

linux/s390x: https://build.golang.org/log/addcd4b8de793f6b79065f2dc9579f06115d4079
openbsd/386: https://build.golang.org/log/08035f048b8b36a58aeaf06a8fa90d75377f3417
dragonfly/amd64: https://build.golang.org/log/dfc83b1a92a75b3fbff84c1675ecffb0ab6fd214
dragonfly/amd64: https://build.golang.org/log/cb41b327d857187c92ec62aef17cde179d60db8f
dragonfly/amd64: https://build.golang.org/log/a7ac7ef8d0cdcffc3a52e1bb2751391e49d13ef7
dragonfly/amd64: https://build.golang.org/log/a8aec06e3e8be117ce5500d822f25bbaf2e78a86
dragonfly/amd64: https://build.golang.org/log/c36d2439a65b4614fe1ffb33f3f6aa0f637136c0

The error looks similar to:

##### ../test
# go run run.go -- locklinear.go
exit status 1
panic: lockmany: too slow: 64000 ops: 871.06786ms; 128000 ops: 13.33057512s


goroutine 1 [running]:
main.checkLinear(0xeacb0, 0x8, 0x3e8, 0xf1180)
	/data/golang/workdir/go/test/locklinear.go:58 +0x2ca
main.main()
	/data/golang/workdir/go/test/locklinear.go:102 +0x6c
exit status 2

I also can't get this test to pass at all on a linux/s390x machine set up for performance testing; the relationship doesn't look linear at all, with a 3-4x jump every time n is doubled. I'll open another issue for that once I have more information. EDIT: GOROOT was set and pointed to an old version of Go.

I'm not sure this test can be made to run reliably on virtual machines as-is; perhaps there is an alternative way to test this behavior?

@mundaym (Member, Author) commented Feb 24, 2017:

cc @rsc

@rsc (Contributor) commented Feb 24, 2017:

Either flaky or detecting a real problem. I'll investigate on the dragonfly/amd64, assuming I can gomote that.

@rsc (Contributor) commented Feb 24, 2017:

Can't gomote. Can you try this on your s390x system:
locklinear.go.txt

cp locklinear.go.txt locklinear.go
go build locklinear.go
./locklinear
go tool pprof locklinear lock.prof
top10

and see what you get?

@mundaym (Member, Author) commented Feb 24, 2017:

Will do.

The non-linearity I reported in the issue description appears to be operating-system related. That machine runs RHEL 7.2, whereas the other machines I have access to run SLES 12 SP1 (the builder runs this) and Ubuntu 16.04, both of which show linear behaviour.

@rsc (Contributor) commented Feb 24, 2017:

It crossed my mind that we might be tickling an OS problem, but I don't 100% see how, at least not in the locking being tested, which is all in user space. Unless the machine has memory for 64,000 goroutines but not memory for 128,000 goroutines. Seems unlikely, and the test tries smaller numbers before getting that far, and those failed too.

Another worthwhile experiment is to rerun with 'const debug = false' changed to true.

@mundaym (Member, Author) commented Feb 24, 2017:

Ran that 3 times on the builder (SLES 12 SP1); I don't think there are enough samples to see anything interesting:

Showing nodes accounting for 2200us, 78.57% of 2800us total
Showing top 10 nodes out of 73
      flat  flat%   sum%        cum   cum%
     600us 21.43% 21.43%      600us 21.43%  runtime.nanotime /data/munday/master/src/runtime/sys_linux_s390x.s
     400us 14.29% 35.71%      400us 14.29%  runtime.futex /data/munday/master/src/runtime/sys_linux_s390x.s
     300us 10.71% 46.43%     1000us 35.71%  sync.(*Mutex).Lock /data/munday/master/src/sync/mutex.go
     200us  7.14% 53.57%      200us  7.14%  runtime.casgstatus /data/munday/master/src/runtime/proc.go
     200us  7.14% 60.71%      400us 14.29%  runtime.execute /data/munday/master/src/runtime/proc.go
     100us  3.57% 64.29%      200us  7.14%  runtime.chansend /data/munday/master/src/runtime/chan.go
     100us  3.57% 67.86%      100us  3.57%  runtime.findfunc /data/munday/master/src/runtime/symtab.go
     100us  3.57% 71.43%      100us  3.57%  runtime.findmoduledatap /data/munday/master/src/runtime/symtab.go
     100us  3.57% 75.00%      200us  7.14%  runtime.lock /data/munday/master/src/runtime/lock_futex.go
     100us  3.57% 78.57%      100us  3.57%  runtime.memclrNoHeapPointers /data/munday/master/src/runtime/memclr_s390x.s
Showing nodes accounting for 1000us, 100% of 1000us total
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
     300us 30.00% 30.00%      300us 30.00%  runtime.futex /data/munday/master/src/runtime/sys_linux_s390x.s
     300us 30.00% 60.00%      400us 40.00%  sync.(*Mutex).Lock /data/munday/master/src/sync/mutex.go
     100us 10.00% 70.00%      100us 10.00%  runtime.clearpools /data/munday/master/src/runtime/mgc.go
     100us 10.00% 80.00%      100us 10.00%  runtime.nanotime /data/munday/master/src/runtime/sys_linux_s390x.s
     100us 10.00% 90.00%      100us 10.00%  runtime.shade /data/munday/master/src/runtime/mgcmark.go
     100us 10.00%   100%      100us 10.00%  runtime.stackpoolalloc /data/munday/master/src/runtime/stack.go
         0     0%   100%      400us 40.00%  main.checkLinear /data/munday/master/test/locklinear.go
         0     0%   100%      400us 40.00%  main.checkLinear.func1 /data/munday/master/test/locklinear.go
         0     0%   100%      400us 40.00%  main.main /data/munday/master/test/locklinear.go
         0     0%   100%      100us 10.00%  main.main.func1 /data/munday/master/test/locklinear.go
     200us 28.57% 28.57%      200us 28.57%  runtime.nanotime /data/munday/master/src/runtime/sys_linux_s390x.s
     100us 14.29% 42.86%      100us 14.29%  runtime.gcmarkwb_m /data/munday/master/src/runtime/mbarrier.go
     100us 14.29% 57.14%      200us 28.57%  runtime.runqgrab /data/munday/master/src/runtime/proc.go
     100us 14.29% 71.43%      100us 14.29%  runtime.scanobject /data/munday/master/src/runtime/mgcmark.go
     100us 14.29% 85.71%      300us 42.86%  runtime.schedule /data/munday/master/src/runtime/proc.go
     100us 14.29%   100%      100us 14.29%  runtime.usleep /data/munday/master/src/runtime/sys_linux_s390x.s
         0     0%   100%      100us 14.29%  main.checkLinear /data/munday/master/test/locklinear.go
         0     0%   100%      100us 14.29%  main.checkLinear.func1 /data/munday/master/test/locklinear.go
         0     0%   100%      100us 14.29%  main.main /data/munday/master/test/locklinear.go
         0     0%   100%      100us 14.29%  main.main.func1 /data/munday/master/test/locklinear.go

@rsc (Contributor) commented Feb 24, 2017:

Did the execution take 13 seconds but the profile only shows 2.2ms? That does suggest the OS. :-)

@rsc (Contributor) commented Feb 24, 2017:

Another option is Linux perf if you have kernel symbols.

@rsc (Contributor) commented Feb 24, 2017:

The openbsd problem appears to be real flakiness: I see it get a little too close to the maximum number of tries before succeeding when using a gomote, but I've never seen it fail. I did see it fail in a trybot just now on one of my own CLs. I sent a CL to deflake that, and also to add more debugging output in the event of a failure.

Because you are seeing consistent failures on s390x I don't expect my CL will help, but it may give us more information.

@gopherbot (Contributor) commented:

CL https://golang.org/cl/37348 mentions this issue.

@mundaym (Member, Author) commented Feb 24, 2017:

The non-linearity I was seeing on RHEL 7.2 was because I was accidentally using an old version of Go on that machine. When I re-ran with tip it passed the test very reliably, usually requiring only a couple of iterations.

If I run the test on its own on the s390x builder it seems to run reliably. When it is run as part of the test suite, however, it runs in parallel with other tests, including the compilation of those tests. The number of tests run in parallel equals the number of CPUs (4 on the s390x builder), so 3 other test files could be being compiled and run at the same time this one is executing. That seems bad for a test requiring consistent execution times; could it be the source of some of the flakiness?

@bradfitz (Contributor) commented:

@mundaym, we can make a "test/serial" or "test/isolated" directory and have a separate cmd/dist test unit just for that directory with no parallelism.

gopherbot pushed a commit that referenced this issue Feb 24, 2017
This should help on the openbsd systems where the test mostly passes.

I don't expect it to help on s390x where the test reliably fails.
But it should give more information when it does fail.

For #19276.

Change-Id: I496c291f2b4b0c747b8dd4315477d87d03010059
Reviewed-on: https://go-review.googlesource.com/37348
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@rsc (Contributor) commented Feb 24, 2017:

Let's see what the deflaking does before we add more mechanism to the test directory. We've been running maplinear with no problem at all for a long time.

@bradfitz (Contributor) commented:

A new type of locklinear failure, out of memory:

https://build.golang.org/log/73611f9141109e3c9dc158791a19783b64d2af90

##### ../test
# go run run.go -- locklinear.go
exit status 1
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x4c7748, 0x16)
	/tmp/workdir/go/src/runtime/panic.go:596 +0x95
runtime.sysMap(0xc481120000, 0x2df60000, 0xc431d15900, 0x551058)
	/tmp/workdir/go/src/runtime/mem_linux.go:216 +0x1d0
runtime.(*mheap).sysAlloc(0x538f20, 0x2df60000, 0xc431d6ec00)
	/tmp/workdir/go/src/runtime/malloc.go:424 +0xda
runtime.(*mheap).grow(0x538f20, 0x16fae, 0x0)
	/tmp/workdir/go/src/runtime/mheap.go:774 +0x62
runtime.(*mheap).allocSpanLocked(0x538f20, 0x16fae, 0xc431d3ad80)
	/tmp/workdir/go/src/runtime/mheap.go:678 +0x42b
runtime.(*mheap).alloc_m(0x538f20, 0x16fae, 0x100000000, 0xffffffffffffffff)
	/tmp/workdir/go/src/runtime/mheap.go:562 +0xda
runtime.(*mheap).alloc.func1()
	/tmp/workdir/go/src/runtime/mheap.go:627 +0x4b
runtime.systemstack(0xc420031f10)
	/tmp/workdir/go/src/runtime/asm_amd64.s:343 +0xab
runtime.(*mheap).alloc(0x538f20, 0x16fae, 0x10100000000, 0xc420018000)
	/tmp/workdir/go/src/runtime/mheap.go:626 +0xa0
runtime.largeAlloc(0x2df5a018, 0xc43d644701, 0x5)
	/tmp/workdir/go/src/runtime/malloc.go:790 +0x93
runtime.mallocgc.func1()
	/tmp/workdir/go/src/runtime/malloc.go:685 +0x3e
runtime.systemstack(0xc420019300)
	/tmp/workdir/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
	/tmp/workdir/go/src/runtime/proc.go:1119

goroutine 1 [running]:
runtime.systemstack_switch()
	/tmp/workdir/go/src/runtime/asm_amd64.s:281 fp=0xc42003cc00 sp=0xc42003cbf8
runtime.mallocgc(0x2df5a018, 0x4b8840, 0x440801, 0x3e258f13c0000000)
	/tmp/workdir/go/src/runtime/malloc.go:684 +0x927 fp=0xc42003cca0 sp=0xc42003cc00
runtime.makeslice(0x4b8840, 0x1ea3c01, 0x1ea3c01, 0x48483c6, 0xc42003cd20, 0x4408d8)
	/tmp/workdir/go/src/runtime/slice.go:54 +0x73 fp=0xc42003ccd0 sp=0xc42003cca0
main.main.func2(0x1f400)
	/tmp/workdir/go/test/locklinear.go:120 +0x50 fp=0xc42003cd58 sp=0xc42003ccd0
main.checkLinear.func1(0x1f400, 0x1e0d0de0)
	/tmp/workdir/go/test/locklinear.go:34 +0x62 fp=0xc42003cda8 sp=0xc42003cd58
main.checkLinear(0x4c53c9, 0x8, 0x3e8, 0x4cbd30)
	/tmp/workdir/go/test/locklinear.go:43 +0xd4 fp=0xc42003cf58 sp=0xc42003cda8
main.main()
	/tmp/workdir/go/test/locklinear.go:119 +0x79 fp=0xc42003cf88 sp=0xc42003cf58
runtime.main()
	/tmp/workdir/go/src/runtime/proc.go:185 +0x20a fp=0xc42003cfe0 sp=0xc42003cf88
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_amd64.s:2148 +0x1 fp=0xc42003cfe8 sp=0xc42003cfe0
exit status 2

FAIL	locklinear.go	15.262s

@mundaym (Member, Author) commented Mar 2, 2017:

More dragonfly failures:

https://build.golang.org/log/f504bb806ac24358bbdddb7e5e3e118f9f971a84
https://build.golang.org/log/b670e82c3919854e45573e96a02853d06918dfc0

The numbers are very noisy; quite often the 2n run takes less time than the n run. For example:

# go run run.go -- locklinear.go
exit status 1
panic: lockone: too slow: 64000 ops: 1.866604887s; 128000 ops: 2.663964272s

1000 15.157454ms 2000 13.483049ms
2000 19.934531ms 4000 29.374991ms
4000 22.055165ms 8000 78.889775ms
8000 92.101943ms 16000 607.091197ms
16000 1.417252823s 32000 1.135918603s
16000 485.606034ms 32000 692.090697ms
32000 533.792438ms 64000 1.899678729s
32000 924.877037ms 64000 3.323945381s
32000 1.704821728s 64000 1.702881825s
32000 610.926975ms 64000 2.389301244s
32000 1.00228947s 64000 1.375318566s
32000 810.626726ms 64000 702.461466ms
64000 1.385925378s 128000 3.672122249s
64000 4.858599721s 128000 3.577465618s
64000 2.705008136s 128000 3.644147616s
64000 1.866604887s 128000 2.663964272s

Maybe it would be better to drop the constraint that the 2n run take at least 1.5x as long as the n run. maplinear doesn't seem to have that constraint, which could be why it is more reliable in noisy environments.

@rsc (Contributor) commented Mar 2, 2017:

Before I had that constraint I saw the test passing incorrectly because the constant factors dominated the measurement. For example, with 1000 vs 2000 ops taking 15ms vs 13ms, we're clearly not testing linear growth at that point.

CL 37543 for another attempt. If that fails then I'll (reluctantly) remove the lower bound entirely.
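The point about constant factors can be made concrete with a toy cost model, time(n) = c + k·n: when the fixed overhead c dominates, the 2n/n ratio approaches 1 even for a perfectly linear workload, so a small-n ratio near 1 proves nothing. The model and the numbers below are illustrative assumptions, not measurements from the test.

```go
package main

import "fmt"

// modelRatio returns the expected 2n-vs-n timing ratio under the
// toy model time(n) = c + k*n (c = fixed overhead, k = per-op cost).
func modelRatio(c, k, n float64) float64 {
	return (c + k*2*n) / (c + k*n)
}

func main() {
	// Hypothetical numbers: 12ms of fixed overhead, 1µs per op.
	c, k := 12.0, 0.001 // milliseconds
	fmt.Printf("n=1000:  ratio %.2f\n", modelRatio(c, k, 1000))  // overhead dominates, ratio near 1
	fmt.Printf("n=64000: ratio %.2f\n", modelRatio(c, k, 64000)) // per-op cost dominates, ratio near 2
}
```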

@alexbrainman (Member) commented:

https://storage.googleapis.com/go-build-log/73068911/windows-386-gce_cdb5e3d0.log

487 is ERROR_INVALID_ADDRESS in

runtime: VirtualAlloc of 385548288 bytes failed with errno=487

Alex

@rsc (Contributor) commented Mar 3, 2017:

My pending CL should help: it should pass with smaller N. I also cut the max N (really the max time) by about 2X.

@bradfitz bradfitz added the Testing An issue that has been verified to require only test changes, not just a test failure. label Mar 21, 2017
@bradfitz bradfitz added this to the Go1.9 milestone Mar 21, 2017
@josharian (Contributor) commented:

Another one on the dragonfly builder just now: https://build.golang.org/log/a1a3edd60e8e373bc06f9ad09e74faae59491793

@josharian (Contributor) commented:

This just happened locally (darwin/amd64). Interestingly, when it happened, the rest of the test directory did not complete; all.bash hung. I wonder whether that was cause or symptom. Unfortunately, the stack traces all printed together, so they're pretty illegible, but here's the result of the SIGQUIT. The top stack trace (where the process was hung) is the panic line in locklinear; it appears the process did not exit as a result of the panic, perhaps due to some bad internal mutex state (?).

##### ../test
# go run run.go -- locklinear.go
exit status 1
panic: lockmany: too slow: 4000 ops: 17.315521ms; 8000 ops: 465.796945ms

1000 4.186967ms 2000 21.341457ms
1000 171.854355ms 2000 15.875607ms
1000 8.380933ms 2000 33.098731ms
1000 4.340298ms 2000 57.968891ms
1000 79.895221ms 2000 37.583062ms
2000 229.891317ms 4000 129.740395ms
2000 178.979218ms 4000 186.627779ms
2000 19.621144ms 4000 57.962043ms
2000 8.045203ms 4000 353.29689ms
2000 53.131247ms 4000 176.864036ms
4000 35.798087ms 8000 414.810857ms
4000 80.849778ms 8000 386.742847ms
4000 67.245059ms 8000 585.42688ms
4000 41.112874ms 8000 43.956649ms
4000 17.315521ms 8000 465.796945ms


goroutine 1 [running]:
main.checkLinear(0x10ca706, 0x8, 0x3e8, 0x10d0f80)
	/Users/josh/go/tip/test/locklinear.go:65 +0x564
main.main()
	/Users/josh/go/tip/test/locklinear.go:120 +0x79
exit status 2

FAIL	locklinear.go	6.767s
2017/04/04 07:22:54 Failed: exit status 1
^\SIGQUIT: quit
PC=0x1059443 m=0 sigcode=0

goroutine 0 [idle]:
runtime.kevent(0x4, 0x0, 0x0, 0x7fff5fbfef10, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/josh/go/tip/src/runtime/sys_darwin_amd64.s:526 +0x23
runtime.netpoll(0xc420026601, 0xc420022c01)
	/Users/josh/go/tip/src/runtime/netpoll_kqueue.go:79 +0xb4
runtime.findrunnable(0xc42001e000, 0x0)
	/Users/josh/go/tip/src/runtime/proc.go:2074 +0x602
runtime.schedule()
	/Users/josh/go/tip/src/runtime/proc.go:2212 +0x12f
runtime.park_m(0xc420000180)
	/Users/josh/go/tip/src/runtime/proc.go:2275 +0xb6
runtime.mcall(0x7fff5fbff880)
	/Users/josh/go/tip/src/runtime/asm_amd64.s:274 +0x5b

goroutine 1 [syscall]:
syscall.Syscall6(0x7, 0xcb3e, 0xc420127b5c, 0x0, 0xc420186000, 0x0, 0x0, 0xc420186000, 0x16f73e0, 0x0)
	/Users/josh/go/tip/src/syscall/asm_darwin_amd64.s:41 +0x5
syscall.wait4(0xcb3e, 0xc420127b5c, 0x0, 0xc420186000, 0x90, 0x1483340, 0x1)
	/Users/josh/go/tip/src/syscall/zsyscall_darwin_amd64.go:34 +0x88
syscall.Wait4(0xcb3e, 0xc420127bac, 0x0, 0xc420186000, 0x4, 0x4, 0x1a80c78)
	/Users/josh/go/tip/src/syscall/syscall_bsd.go:129 +0x51
os.(*Process).wait(0xc420142f60, 0xc42017c000, 0x160, 0xc420010140)
	/Users/josh/go/tip/src/os/exec_unix.go:38 +0x84
os.(*Process).Wait(0xc420142f60, 0x0, 0x0, 0x14bca60)
	/Users/josh/go/tip/src/os/exec.go:115 +0x2b
os/exec.(*Cmd).Wait(0xc42017c000, 0x0SIGQUIT: quit, SIGQUIT: quit0x0

)
PC=	PC=/Users/josh/go/tip/src/os/exec/exec.go0x40f5cae:0x105421b m=439 m=0 +0
0x62 sigcode=

0goroutine 
os/exec.(*Cmd).Run1
( [0xc42017c000goroutine syscall, ]:
0xc42017c0000,  [0x1idle)
]:
	/Users/josh/go/tip/src/os/exec/exec.go:284 +0x5c
cmd/go/internal/tool.runTool(0x16f09a0runtime.mach_semaphore_wait, (0xc4200101400x100000803, , 0x40x0, , 0x40x0)
, 	0x10c8b96/Users/josh/go/tip/src/cmd/go/internal/tool/tool.go, :0x3000716f878,  +0x1228bc00x56b, 
0x7fff5fbff4b0, 0x104ed13, main.main0xffffffffffffffff(, )
0x1074e38	, .../Users/josh/go/tip/src/cmd/go/main.go)
:	133/Users/josh/go/tip/src/runtime/sys_darwin_amd64.s +:0x66a445
 +0xb

goroutine 17 [syscall, locked to thread]:
runtime.semasleep1(runtime.goexit0xffffffffffffffff(, )
0x1074e38	)
/Users/josh/go/tip/src/runtime/asm_amd64.s	:/Users/josh/go/tip/src/runtime/os_darwin.go2152: +4130x1 +
0x4b

goroutine 5 [syscall]:
runtime.semasleep.func1()
	/Users/josh/go/tip/src/runtime/os_darwin.go:432 +0x33os/signal.signal_recv
(0x0runtime.systemstack)
(	0x7fff5fbff4d8/Users/josh/go/tip/src/runtime/sigqueue.go)
:	116/Users/josh/go/tip/src/runtime/asm_amd64.s +:0xa9348
 +0xab
os/signal.loopruntime.semasleep(()
0xffffffffffffffff	, /Users/josh/go/tip/src/os/signal/signal_unix.go0xc420061e38:)
22	 +/Users/josh/go/tip/src/runtime/os_darwin.go0x22:
431created by  +os/signal.init.10x44

	/Users/josh/go/tip/src/os/signal/signal_unix.goruntime.notesleep:(280x1229010 +)
0x41	
/Users/josh/go/tip/src/runtime/lock_sema.go
:167rax     +0x40xe9

rbx    0x0
rcx    runtime.stopm0x7fff5fbfeea0(
)
rdx    	0x0/Users/josh/go/tip/src/runtime/proc.go
:rdi    16370x4 +
0xadrsi    
0x0
rbp    0x7fff5fbff710
rsp    0x7fff5fbfeea0
r8     0x40
r9     0x0runtime.findrunnable
(r10    0xc42001f9000x7fff5fbfef10, 
0x0r11    )
0x246	
/Users/josh/go/tip/src/runtime/proc.gor12    :20920xc420010c38 +0x5c7

r13    0x0
runtime.scheduler14    (0x54)

	r15    /Users/josh/go/tip/src/runtime/proc.go0x7:
2212rip     +0x12f0x1059443

rflags runtime.park_m0x247(
0xc420061e00cs     )
0x7	
/Users/josh/go/tip/src/runtime/proc.gofs     :0x02275
 +gs     0xb60x0

runtime.mcall(0x7fff5fbff6c0)
	/Users/josh/go/tip/src/runtime/asm_amd64.s:274 +0x5b

goroutine 1 [chan receive]:
main.(*tester).runPending(0xc4201b2000, 0x0)
	/Users/josh/go/tip/src/cmd/dist/test.go:968 +0x2e9
main.(*tester).run(0xc4201b2000)
	/Users/josh/go/tip/src/cmd/dist/test.go:213 +0x8eb
main.cmdtest()
	/Users/josh/go/tip/src/cmd/dist/test.go:41 +0x2b8
main.xmain()
	/Users/josh/go/tip/src/cmd/dist/main.go:43 +0x215
main.main()
	/Users/josh/go/tip/src/cmd/dist/util.go:509 +0x2ae

goroutine 5 [select]:
main.bghelper()
	/Users/josh/go/tip/src/cmd/dist/util.go:150 +0x105
created by main.bginit
	/Users/josh/go/tip/src/cmd/dist/util.go:143 +0x64

goroutine 6 [select]:
main.bghelper()
	/Users/josh/go/tip/src/cmd/dist/util.go:150 +0x105
created by main.bginit
	/Users/josh/go/tip/src/cmd/dist/util.go:143 +0x64

goroutine 7 [select]:
main.bghelper()
	/Users/josh/go/tip/src/cmd/dist/util.go:150 +0x105
created by main.bginit
	/Users/josh/go/tip/src/cmd/dist/util.go:143 +0x64

goroutine 8 [select]:
main.bghelper()
	/Users/josh/go/tip/src/cmd/dist/util.go:150 +0x105
created by main.bginit
	/Users/josh/go/tip/src/cmd/dist/util.go:143 +0x64

goroutine 87 [IO wait]:
internal/poll.runtime_pollWait(0x12dfe50, 0x72, 0x1073f89)
	/Users/josh/go/tip/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202c8d88, 0x72, 0x1213820, 0x1212168)
	/Users/josh/go/tip/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc4202c8d88, 0xc4202a936a, 0x496)
	/Users/josh/go/tip/src/internal/poll/fd_poll_runtime.go:90 +0x34
internal/poll.(*FD).Read(0xc4202c8d70, 0xc4202a936a, 0x496, 0x496, 0x0, 0x0, 0x0)
	/Users/josh/go/tip/src/internal/poll/fd_unix.go:113 +0x15e
os.(*File).read(0xc4202c60f0, 0xc4202a936a, 0x496, 0x496, 0x1062dbb, 0x112eae0, 0x600)
	/Users/josh/go/tip/src/os/file_unix.go:203 +0x4e
os.(*File).Read(0xc4202c60f0, 0xc4202a936a, 0x496, 0x496, 0x16a, 0x0, 0x0)
	/Users/josh/go/tip/src/os/file.go:101 +0x77
bytes.(*Buffer).ReadFrom(0xc4202c4230, 0x12134e0, 0xc4202c60f0, 0x1477028, 0xc4202c4230, 0x1)
	/Users/josh/go/tip/src/bytes/buffer.go:188 +0x16f
io.copyBuffer(0x12132a0, 0xc4202c4230, 0x12134e0, 0xc4202c60f0, 0x0, 0x0, 0x0, 0xc420328420, 0xc42018cd01, 0xc4202c4230)
	/Users/josh/go/tip/src/io/io.go:384 +0x2c6
io.Copy(0x12132a0, 0xc4202c4230, 0x12134e0, 0xc4202c60f0, 0x101, 0xc420234fc8, 0x110e6d1)
	/Users/josh/go/tip/src/io/io.go:360 +0x68
os/exec.(*Cmd).writerDescriptor.func1(0x0, 0x0)
	/Users/josh/go/tip/src/os/exec/exec.go:258 +0x4d
os/exec.(*Cmd).Start.func1(0xc420328420, 0xc4202b1460)
	/Users/josh/go/tip/src/os/exec/exec.go:375 +0x27
created by os/exec.(*Cmd).Start
	/Users/josh/go/tip/src/os/exec/exec.go:374 +0x4c0

goroutine 98 [syscall]:
syscall.Syscall6(0x7, 0x103ec, 0xc420190dcc, 0x0, 0xc4202b85a0, 0x0, 0x0, 0xc4202b85a0, 0x14b9270, 0x0)
	/Users/josh/go/tip/src/syscall/asm_darwin_amd64.s:41 +0x5
syscall.wait4(0x103ec, 0xc420190dcc, 0x0, 0xc4202b85a0, 0x90, 0x1154f00, 0x1167901)
	/Users/josh/go/tip/src/syscall/zsyscall_darwin_amd64.go:34 +0x88
syscall.Wait4(0x103ec, 0xc420190e1c, 0x0, 0xc4202b85a0, 0x3, 0x3, 0x0)
	/Users/josh/go/tip/src/syscall/syscall_bsd.go:129 +0x51
os.(*Process).wait(0xc4202a0fc0, 0xc4202c4230, 0xc4202c6100, 0xc42031ce40)
	/Users/josh/go/tip/src/os/exec_unix.go:38 +0x84
os.(*Process).Wait(0xc4202a0fc0, 0x0, 0x0, 0x1167938)
	/Users/josh/go/tip/src/os/exec.go:115 +0x2b
os/exec.(*Cmd).Wait(0xc420328420, 0x0, 0x0)
	/Users/josh/go/tip/src/os/exec/exec.go:439 +0x62
os/exec.(*Cmd).Run(0xc420328420, 0xc4202c4230, 0xc420234fb7)
	/Users/josh/go/tip/src/os/exec/exec.go:284 +0x5c
os/exec.(*Cmd).CombinedOutput(0xc420328420, 0xc420234fb7, 0x0, 0x0, 0x0, 0x0)
	/Users/josh/go/tip/src/os/exec/exec.go:499 +0x127
main.(*tester).runPending.func1(0xc420326230)
	/Users/josh/go/tip/src/cmd/dist/test.go:934 +0x61
created by main.(*tester).runPending
	/Users/josh/go/tip/src/cmd/dist/test.go:930 +0x139

goroutine 99 [chan send]:
main.(*tester).runPending.func1(0xc420326280)
	/Users/josh/go/tip/src/cmd/dist/test.go:936 +0xc8
created by main.(*tester).runPending
	/Users/josh/go/tip/src/cmd/dist/test.go:930 +0x139

rax    0xe
rbx    0x1228f00
rcx    0x7fff5fbff450
rdx    0x7fff5fbff4d8
rdi    0x803
rsi    0x1
rbp    0x7fff5fbff488
rsp    0x7fff5fbff450
r8     0xc42018c1d8
r9     0xc420061e88
r10    0xc42018ce30
r11    0x286
r12    0xc42018ce28
r13    0x0
r14    0x54
r15    0x7
rip    0x105421b
rflags 0x286
cs     0x7
fs     0x0
gs     0x0
syscall.Syscall6(0xad, 0x1, 0xc855, 0xc42001a300, 0x24, 0x0, 0x0, 0x60, 0x0, 0x1)
	/Users/josh/go/1.7/src/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc420051970 sp=0xc420051968
os.(*Process).blockUntilWaitable(0xc420016fc0, 0x0, 0xc4200741e0, 0x0)
	/Users/josh/go/1.7/src/os/wait_waitid.go:28 +0xbc fp=0xc420051a08 sp=0xc420051970
os.(*Process).wait(0xc420016fc0, 0x0, 0x1ed, 0x0)
	/Users/josh/go/1.7/src/os/exec_unix.go:22 +0xab fp=0xc420051a98 sp=0xc420051a08
os.(*Process).Wait(0xc420016fc0, 0xc420051b00, 0x0, 0x0)
	/Users/josh/go/1.7/src/os/doc.go:49 +0x2b fp=0xc420051ac8 sp=0xc420051a98
os/exec.(*Cmd).Wait(0xc42008e6e0, 0x0, 0x0)
	/Users/josh/go/1.7/src/os/exec/exec.go:434 +0x6d fp=0xc420051b58 sp=0xc420051ac8
os/exec.(*Cmd).Run(0xc42008e6e0, 0xa, 0xc42000c2b0)
	/Users/josh/go/1.7/src/os/exec/exec.go:279 +0x48 fp=0xc420051b80 sp=0xc420051b58
main.runUtility(0xc42000c2b0, 0x1, 0x1, 0x100000000000060, 0x6, 0xc42001000d, 0x421d400)
	/Users/josh/src/github.com/variadico/noti/cmd/noti/noti.go:244 +0x193 fp=0xc420051c20 sp=0xc420051b80
main.utilityNotification(0xc420074060, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/Users/josh/src/github.com/variadico/noti/cmd/noti/noti.go:177 +0x80 fp=0xc420051cd0 sp=0xc420051c20
main.main()
	/Users/josh/src/github.com/variadico/noti/cmd/noti/noti.go:82 +0x761 fp=0xc420051f48 sp=0xc420051cd0
runtime.main()
	/Users/josh/go/1.7/src/runtime/proc.go:183 +0x1f4 fp=0xc420051fa0 sp=0xc420051f48
runtime.goexit()
	/Users/josh/go/1.7/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420051fa8 sp=0xc420051fa0

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
	/Users/josh/go/1.7/src/runtime/asm_amd64.s:2086 +0x1

rax    0x20000ad
rbx    0xc42001c00c
rcx    0xc420051968
rdx    0xc42001a300
rdi    0x1
rsi    0xc855
rbp    0xc4200519f8
rsp    0xc420051968
r8     0x0
r9     0x0
r10    0x24
r11    0x202
r12    0x80
r13    0x300
r14    0x7
r15    0x1ffffff
rip    0x40f5cae
rflags 0x202
cs     0x7
fs     0x0
gs     0x0

@rsc (Contributor) commented Apr 4, 2017:

@josharian Was the compiler broken? (Fair question since I assume you're working on it.)

@josharian (Contributor) commented:

Definitely a fair question. :)

I'm pretty confident that the answer was no; the only local modifications were well-trodden and uninteresting refactorings, and the rest of the tests had already passed.

@rsc (Contributor) commented Apr 4, 2017:

OK, are you saying that even though the ^\ appears later, you typed it before the locklinear panic stack trace appeared? Just a little confused about ordering. The transcript makes it look like locklinear exited (and the exit was reported) before the SIGQUIT.

@josharian (Contributor) commented:

> The transcript makes it look like locklinear exited (and the exit was reported) before the SIGQUIT.

Sorry. You're right, that's what happened. (Too much context-switching for my single-threaded brain.) So I guess there's nothing new/interesting here, just another failure.

@rsc (Contributor) commented Apr 5, 2017:

OK, I will expand the valid ratios from [2, 2.5) to [2, 3) and also just give up in the test if we consistently find that N takes longer than 2*N.
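That policy can be sketched as follows. This is illustrative only (the actual change is in CL 39591 and differs in detail): accept any trial whose 2n/n ratio lands in [2, 3), and if the n run repeatedly outlasts the 2n run, conclude the machine is too noisy to measure and give up instead of failing. The `verdict` helper and its inversion threshold are hypothetical names and numbers.

```go
package main

import "fmt"

// verdict applies the widened acceptance window: any ratio in [2, 3)
// passes; if most trials are inverted (the 2n run finishing faster
// than the n run), the environment is too noisy to measure and the
// test gives up. Thresholds are illustrative, not the real test's.
func verdict(ratios []float64) string {
	inverted := 0
	for _, r := range ratios {
		if r >= 2 && r < 3 {
			return "pass"
		}
		if r < 1 {
			inverted++
		}
	}
	if inverted*2 >= len(ratios) {
		return "give up: machine too noisy"
	}
	return "fail"
}

func main() {
	fmt.Println(verdict([]float64{1.3, 4.1, 2.2})) // one clean ~2x trial suffices
	fmt.Println(verdict([]float64{0.3, 0.9, 0.5})) // mostly inverted: give up
	fmt.Println(verdict([]float64{4.9, 6.0, 8.9})) // consistently superlinear: fail
}
```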

@gopherbot (Contributor) commented:

CL https://golang.org/cl/39591 mentions this issue.

@josharian (Contributor) commented:

Another one:

https://storage.googleapis.com/go-build-log/265ee2c5/linux-386_4695dbaa.log

# go run run.go -- locklinear.go
exit status 1
panic: lockmany: too slow: 1000 ops: 66.665166ms; 2000 ops: 592.885551ms

1000 220.124127ms 2000 281.298265ms (1.3X)
1000 39.274018ms 2000 160.658373ms (4.1X)
1000 200.569047ms 2000 50.339258ms (0.3X)
1000 68.838694ms 2000 334.122209ms (4.9X)
1000 55.557601ms 2000 330.584047ms (6.0X)
1000 66.665166ms 2000 592.885551ms (8.9X)

@josharian josharian reopened this Apr 13, 2017
lparth pushed a commit to lparth/go that referenced this issue Apr 13, 2017
Fixes golang#19276.

Change-Id: I64f8f80331d09956b6698c0b004ed7f7d70857fc
Reviewed-on: https://go-review.googlesource.com/39591
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@gopherbot (Contributor) commented:

CL https://golang.org/cl/42431 mentions this issue.

gopherbot pushed a commit that referenced this issue May 2, 2017
5 shards, each of which spins up NumCPU processes,
each of which is running at GOMAXPROCS=NumCPU,
is too much for one machine. It makes my laptop unusable.

It might also be in part responsible for test flakes
that require a moderately responsive system,
like #18589 (backedge scheduling) and #19276 (locklinear).

It's possible that Go should be a better neighbor in general;
that's #17969. In the meantime, fix this corner of the world.

Builders snapshot the world and run shards on different
machines, so keeping sharding high for them is good.

This is a partial reversion of CL 18199.

Fixes #20141.

Change-Id: I123cf9436f4f4da3550372896265c38117b78071
Reviewed-on: https://go-review.googlesource.com/42431
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
josharian added a commit to josharian/go that referenced this issue May 2, 2017
5 shards, each of which spins up NumCPU processes,
each of which is running at GOMAXPROCS=NumCPU,
is too much for one machine. It makes my laptop unusable.

It might also be in part responsible for test flakes
that require a moderately responsive system,
like golang#18589 (backedge scheduling) and golang#19276 (locklinear).

It's possible that Go should be a better neighbor in general;
that's golang#17969. In the meantime, fix this corner of the world.

Builders snapshot the world and run shards on different
machines, so keeping sharding high for them is good.

This is a partial reversion of CL 18199.

Fixes golang#20141.

Change-Id: I123cf9436f4f4da3550372896265c38117b78071
@bradfitz (Contributor) commented:

I haven't seen this in a while. Maybe fixed?

@bradfitz (Contributor) commented Jun 7, 2017:

I'm happy.

@bradfitz bradfitz closed this as completed Jun 7, 2017
@golang golang locked and limited conversation to collaborators Jun 7, 2018

8 participants