non-deterministic sql panic #34

Closed
dpark01 opened this issue May 12, 2020 · 12 comments

Comments

@dpark01

dpark01 commented May 12, 2020

Hi folks,

I've noticed that, occasionally and seemingly at random, jobs fail with the following panic immediately after localizing files with dxda:

panic: sql: database is closed
goroutine 62 [running]:
github.com/dnanexus/dxda.check(...)
	/go/src/github.com/dnanexus/dxda/util.go:147
github.com/dnanexus/dxda.(*State).queryDBIntegerResult(0xc0000fa120, 0x8e0f36, 0x50, 0x0)
	/go/src/github.com/dnanexus/dxda/dxda.go:216 +0x221
github.com/dnanexus/dxda.(*State).DownloadProgressOneTime(0xc0000fa120, 0x1bf08eb000, 0xdf72e0, 0x3)
	/go/src/github.com/dnanexus/dxda/dxda.go:511 +0x85
github.com/dnanexus/dxda.(*State).downloadProgressContinuous(0xc0000fa120)
	/go/src/github.com/dnanexus/dxda/dxda.go:546 +0x125
created by github.com/dnanexus/dxda.(*State).DownloadManifestDB
	/go/src/github.com/dnanexus/dxda/dxda.go:794 +0x7e3
download agent failed rc=2

These often succeed upon relaunch. Here is one example job that came from this larger analysis.

Do you know what causes that behavior?

If it matters, these workflows are all generated via dxWDL 1.46.4. Since this often succeeds after relaunch, would it make sense for dxWDLrt to capture exit code 2 from dxda and just retry up to some reasonable retry limit?
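
For illustration, the retry idea above could look roughly like the following Go sketch. It is not dxWDL's actual code: `exitCode` and `retryOnCode` are hypothetical helpers, and the retry limit is an arbitrary choice. Note that a Go program exits with status 2 on an unrecovered panic, which is consistent with the `rc=2` in the log above.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitCode runs a command and returns its exit status,
// or -1 if the command could not be started at all.
// (Hypothetical helper; not part of dxda or dxWDL.)
func exitCode(name string, args ...string) int {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	return -1
}

// retryOnCode calls run until it returns something other than retryCode
// or until maxAttempts (assumed >= 1) calls have been made. It returns
// the last exit code and the number of attempts used.
func retryOnCode(run func() int, retryCode, maxAttempts int) (code, attempts int) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		code = run()
		if code != retryCode {
			return code, attempts
		}
	}
	return code, maxAttempts
}

func main() {
	// Simulated flaky command: exits with 2 twice, then succeeds.
	calls := 0
	flaky := func() int {
		calls++
		if calls < 3 {
			return 2
		}
		return 0
	}
	code, attempts := retryOnCode(flaky, 2, 5)
	fmt.Printf("exit code %d after %d attempt(s)\n", code, attempts)
}
```

In real use, `run` would wrap `exitCode` with the actual download command line. A retry like this only hides the symptom and assumes that repeating the download is harmless, so it is at best a stopgap until the underlying panic is fixed.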

@dpark01
Author

dpark01 commented May 13, 2020

Sorry, it looks like this may already have been addressed in #12 and #22. Should I be reporting this issue on dxWDL instead (since it looks like it uses a dxda that predates #12)?

@orodeh
Contributor

orodeh commented May 13, 2020

dxWDL uses the latest dxda on master branch. This looks like a new variation of an old problem. We'll need to investigate.

Does this happen towards the end of the download phase?

@dpark01
Author

dpark01 commented May 13, 2020

Thanks @orodeh -- you can see the context from the job I linked above. It's always after downloads complete. Pasted here:

Obtained token using environment
number of machine cores: 8
memory size: 15 GiB
Downloading files using 16 threads
maximal memory chunk size: 64 MiB
Creating manifest database /home/dnanexus/meta/dxdaManifest.json.bz2.stats.db
Preparing files for download
Required disk space = 3.0GB, available = 181.1GB
Downloaded 3097/3097 MB	44/44 Parts (~51.6 MB/s written to disk estimated over the last 60s)
Download completed successfully.
To perform additional post-download integrity checks, please use the 'inspect' subcommand.
panic: sql: database is closed
goroutine 56 [running]:
github.com/dnanexus/dxda.check(...)
	/go/src/github.com/dnanexus/dxda/util.go:147
github.com/dnanexus/dxda.(*State).queryDBIntegerResult(0xc0000e4120, 0x8e0499, 0x46, 0x0)
	/go/src/github.com/dnanexus/dxda/dxda.go:216 +0x221
github.com/dnanexus/dxda.(*State).DownloadProgressOneTime(0xc0000e4120, 0x37e1324ee, 0xdf72e0, 0x0)
	/go/src/github.com/dnanexus/dxda/dxda.go:514 +0xed
github.com/dnanexus/dxda.(*State).downloadProgressContinuous(0xc0000e4120)
	/go/src/github.com/dnanexus/dxda/dxda.go:546 +0x125
created by github.com/dnanexus/dxda.(*State).DownloadManifestDB
	/go/src/github.com/dnanexus/dxda/dxda.go:794 +0x7e3
download agent failed rc=2
The download log is:

@orodeh
Contributor

orodeh commented May 13, 2020

Oh, I think I know what the problem is. The thread that periodically reports on download progress has not been shut down, and it continues to query the database after it has already been closed.
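
A minimal sketch of the shutdown ordering this diagnosis implies: signal the progress reporter to stop, wait until it has actually returned, and only then close the database. This is not dxda's code. `fakeDB`, `reportProgress`, and `runDownload` are hypothetical names, and the fake database stands in for the real one so the example runs without a SQL driver.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errClosed mimics the "sql: database is closed" error from database/sql.
var errClosed = errors.New("sql: database is closed")

// fakeDB is a hypothetical stand-in for the manifest database.
type fakeDB struct {
	mu     sync.Mutex
	closed bool
}

func (db *fakeDB) Close() {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.closed = true
}

// QueryProgress returns errClosed once the database has been closed.
func (db *fakeDB) QueryProgress() error {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.closed {
		return errClosed
	}
	return nil
}

// reportProgress polls the database on every tick until done is closed.
// It returns the number of queries that failed.
func reportProgress(db *fakeDB, interval time.Duration, done <-chan struct{}) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	failed := 0
	for {
		select {
		case <-done:
			return failed
		case <-ticker.C:
			if err := db.QueryProgress(); err != nil {
				failed++
			}
		}
	}
}

// runDownload starts the reporter, simulates a download, and shuts down.
// The reporter is stopped and joined before the database is closed, so
// no query can ever observe a closed database.
func runDownload() int {
	db := &fakeDB{}
	done := make(chan struct{})
	var wg sync.WaitGroup
	failed := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		failed = reportProgress(db, time.Millisecond, done)
	}()

	time.Sleep(20 * time.Millisecond) // simulated download work

	close(done) // 1. ask the reporter to stop
	wg.Wait()   // 2. wait until it has actually returned
	db.Close()  // 3. only now is it safe to close the database
	return failed
}

func main() {
	fmt.Println("failed queries:", runDownload()) // prints "failed queries: 0"
}
```

The important detail is step 2. Signalling the goroutine without waiting for it leaves a window in which a query that is already in flight, or a tick that fires just before the signal is seen, can still hit the closed database.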

@orodeh
Contributor

orodeh commented May 13, 2020

Should be an easy fix.

@orodeh
Contributor

orodeh commented May 14, 2020

Fixed on master.

orodeh closed this as completed May 14, 2020

@orodeh
Contributor

orodeh commented May 15, 2020

Please check it out. The new code is part of the new dxWDL release: https://github.com/dnanexus/dxWDL/releases/tag/v1.47

@dpark01
Author

dpark01 commented May 15, 2020

Hm... I still get this in dxWDL 1.47:

Analysis: https://platform.dnanexus.com/projects/F8PQ6380xf5bK0Qk0YPjB17P/monitor/analysis/Fpyz1B80xf5qY28X2X8YY3z4
Job: https://platform.dnanexus.com/projects/F8PQ6380xf5bK0Qk0YPjB17P/monitor/job/Fpyz26j0xf5j2vPq0p54fjkJ

Obtained token using environment
number of machine cores: 8
memory size: 62 GiB
Downloading files using 16 threads
maximal memory chunk size: 64 MiB
Creating manifest database /home/dnanexus/meta/dxdaManifest.json.bz2.stats.db
Preparing files for download
Required disk space = 34.9GB, available = 274.3GB
Downloaded 6619/35719 MB	140/479 Parts (~0.0 MB/s written to disk estimated over the last 15s)
Downloaded 15619/35719 MB	210/479 Parts (~300.6 MB/s written to disk estimated over the last 51s)
Downloaded 17119/35719 MB	270/479 Parts (~124.3 MB/s written to disk estimated over the last 66s)
Downloaded 25369/35719 MB	340/479 Parts (~240.7 MB/s written to disk estimated over the last 105s)
Downloaded 35719/35719 MB	479/479 Parts (~181.3 MB/s written to disk estimated over the last 60s)
Download completed successfully.
To perform additional post-download integrity checks, please use the 'inspect' subcommand.
panic: sql: database is closed
goroutine 59 [running]:
github.com/dnanexus/dxda.check(...)
	/go/src/github.com/dnanexus/dxda/util.go:149
github.com/dnanexus/dxda.(*State).queryDBIntegerResult(0xc0000d2100, 0xc0000fe000, 0x64, 0x0)
	/go/src/github.com/dnanexus/dxda/dxda.go:212 +0x221
github.com/dnanexus/dxda.(*State).calcBandwidth(0xc0000d2100, 0x1bf08eb000, 0x46)
	/go/src/github.com/dnanexus/dxda/dxda.go:482 +0x10d
github.com/dnanexus/dxda.(*State).DownloadProgressOneTime(0xc0000d2100, 0x1bf08eb000, 0xdada40, 0xbfa79a5ca634cc77)
	/go/src/github.com/dnanexus/dxda/dxda.go:513 +0x136
github.com/dnanexus/dxda.(*State).downloadProgressContinuous(0xc0000d2100, 0xc0005f23f0)
	/go/src/github.com/dnanexus/dxda/dxda.go:570 +0x1ff
created by github.com/dnanexus/dxda.(*State).DownloadManifestDB
	/go/src/github.com/dnanexus/dxda/dxda.go:884 +0x8a4
download agent failed rc=2
The download log is:
2020/05/15 01:33:46 Logging detailed output to: /home/dnanexus/meta/dxdaManifest.json.bz2.download.log
2020/05/15 01:33:46 Obtained token using environment
2020/05/15 01:33:46 Preparing files for download
2020/05/15 01:33:46 Required disk space = 34.9GB, available = 274.3GB
2020/05/15 01:35:59 Downloaded 35719/35719 MB	479/479 Parts (~181.3 MB/s written to disk estimated over the last 60s)
2020/05/15 01:35:59 Download completed successfully.
2020/05/15 01:35:59 To perform additional post-download integrity checks, please use the 'inspect' subcommand.

@orodeh
Contributor

orodeh commented May 15, 2020

This is strange, because the continuous reporting thread has been shut down at this point. I am not sure how it is still operating. This will have to be reopened.

orodeh reopened this May 15, 2020

@geetduggal
Contributor

Hi @dpark01, we believe we have found the issue and are testing a fix. We will let you know which dxWDL release contains it. Thanks for reporting.

@commandlinegirl

Hi @dpark01, we have released dxWDL with the fix. Please let us know if the issue persists or if you encounter any other issues. Thanks!

@sclan
Collaborator

sclan commented Jul 13, 2020

@dpark01 The v1.48.1 release fixed both the "database is closed" panic (this issue) and the 502 error.
Download it from https://github.com/dnanexus/dxWDL/releases
