roachtest: tpccbench/nodes=3/cpu=4 failed #31446

Closed
cockroach-teamcity opened this issue Oct 16, 2018 · 4 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/a0b7cd4ebddf5ebc8f8c2119b119e57688f072f9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpccbench/nodes=3/cpu=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=968704&tab=buildLog

The test failed on master:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-968704-tpccbench-nodes-3-cpu-4 -n 4 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 05:28:25 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 05:28:25 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-west-2 --output json: exit status 255
		: exit status 1

cockroach-teamcity added this to the 2.2 milestone Oct 16, 2018
cockroach-teamcity added the C-test-failure (Broken test, automatically or manually discovered) and O-robot (Originated from a bot) labels Oct 16, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/083e4b3272338b6f1a0b0628a7854678aa32fa27

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpccbench/nodes=3/cpu=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=968157&tab=buildLog

The test failed on release-2.1:
	test.go:579,cluster.go:1450,tpcc.go:662,tpcc.go:333: unexpected node event: 1: dead

petermattis reopened this Oct 16, 2018
petermattis assigned tbg and unassigned andreimatei Oct 16, 2018
@petermattis
Collaborator

@tschottdorf Another OOM:

I181016 13:36:09.891160 159 server/status/runtime.go:465  [n1] runtime stats: 14 GiB RSS, 248 goroutines, 9.2 GiB/309 MiB/9.9 GiB GO alloc/idle/total, 3.7 GiB/4.3 GiB CGO alloc/total, 56621.8 CGO/sec, 39.2/20.0 %(u/s)time, 0.0 %gc (0x), 219 MiB/5.8 MiB (r/w)net
fatal error: runtime: out of memory

Something was clearly broken recently.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3e69f3acba8f66b4b8019f52890aaa3f63a848ee

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpccbench/nodes=3/cpu=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=969842&tab=buildLog

The test failed on release-2.1:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-969842-tpccbench-nodes-3-cpu-4 -n 4 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 15:22:15 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 15:22:15 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-east-2 --output json: exit status 255
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e6348bb4abbfd117424c382ce5ab42e8abbe88f0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpccbench/nodes=3/cpu=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=970034&tab=buildLog

The test failed on release-2.1:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-970034-tpccbench-nodes-3-cpu-4 -n 4 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 15:44:08 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 15:44:08 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-east-2 --output json: exit status 255
		: exit status 1

tbg added a commit to tbg/cockroach that referenced this issue Oct 22, 2018
The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should've. As a result, over
time, all proposals would be dropped. We're hitting this way earlier
in our import tests, which propose large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, we weren't ever hitting this case and so AddSSTable
commands were often dropped indefinitely.

Fixes cockroachdb#31184.
Fixes cockroachdb#28693.
Fixes cockroachdb#31642.

Optimistically:
Fixes cockroachdb#31675.
Fixes cockroachdb#31654.
Fixes cockroachdb#31446.

Release note: None
craig bot pushed a commit that referenced this issue Oct 22, 2018
31554: exec: initial commit of execgen tool r=solongordon a=solongordon

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir

Prior to this patch, all columns in the index were included instead of only the
ones being used in the foreign key reference.

Fixes #31545.

Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from
including columns that were not involved in the foreign key reference.

31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf

Waiting for the upstream PR

etcd-io/etcd#10199

to merge, but this is going to be what the result will look like.

----

The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should've. As a result, over
time, all proposals would be dropped. We're hitting this way earlier
in our import tests, which propose large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, we weren't ever hitting this case and so AddSSTable
commands were often dropped indefinitely.

Fixes #31184.
Fixes #28693.
Fixes #31642.

Optimistically:
Fixes #31675.
Fixes #31654.
Fixes #31446.

Release note: None

Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
craig bot closed this as completed in #31689 Oct 22, 2018