Segmentation fault (core dumped) #29

Closed
grhogg opened this issue Dec 31, 2020 · 36 comments

Comments

grhogg commented Dec 31, 2020

Hello! Thank you so much for creating this awesome package. I am trying to run TRUST4 on a large set of BAM files; however, for most of the files the run fails with one of several errors, most commonly "Segmentation fault (core dumped)".

Here is my system call:

find /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer -name "*.bam" -print0 | while read -d $'\0' file
do
./run-trust4 -b "$file" -f ./hg38_bcrtcr.fa --ref ./human_IMGT+C.fa
done
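
A sketch of the same batch loop that also records which BAM files fail and with what return code, so the failures can be tabulated afterwards. The `-b`, `-f`, and `--ref` options are the ones used in the loop above; the function names, the `runner` parameter, and the result layout are illustrative assumptions, not part of TRUST4.

```python
import subprocess
from pathlib import Path


def find_bams(root):
    """Return every *.bam file under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.bam"))


def build_command(bam, trust4_dir="."):
    """Build the run-trust4 argv with the options used in the loop above."""
    # resolve() yields an absolute path, so the script is not looked up on PATH.
    trust4_dir = Path(trust4_dir).resolve()
    return [
        str(trust4_dir / "run-trust4"),
        "-b", str(bam),
        "-f", str(trust4_dir / "hg38_bcrtcr.fa"),
        "--ref", str(trust4_dir / "human_IMGT+C.fa"),
    ]


def run_all(root, trust4_dir=".", runner=subprocess.run):
    """Run every BAM in turn and return (bam, returncode) pairs.

    Like the shell loop, this keeps going after a failure.
    """
    results = []
    for bam in find_bams(root):
        completed = runner(build_command(bam, trust4_dir))
        results.append((bam, completed.returncode))
    return results
```

Calling `run_all("/scratch/ghogg/GDC_DATA/BAM/GDC_Transfer", "/home/ghogg/TRUST/TRUST4")` and filtering for non-zero return codes would give the list of samples worth sharing with the maintainers.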

For some files, the function works beautifully and completes:

[Wed Dec 30 15:42:37 2020] TRUST4 begins.
[Wed Dec 30 15:42:37 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/d2dcc520-547e-4c27-8037-5a4f41db67ae/5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead.bam -t 1 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_toassemble
[Wed Dec 30 15:42:37 2020] Start to extract candidate reads from bam file.
[Wed Dec 30 15:46:21 2020] Finish obtaining the candidate read ids.
[Wed Dec 30 15:51:18 2020] Finish extracting reads.
[Wed Dec 30 15:51:18 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead -1 TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_toassemble_1.fq -2 TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_toassemble_2.fq
[Wed Dec 30 15:51:18 2020] Read in and count kmers for 100000 reads.
[Wed Dec 30 15:51:19 2020] Read in and count kmers for 200000 reads.
[Wed Dec 30 15:51:19 2020] Read in and count kmers for 300000 reads.
[Wed Dec 30 15:51:19 2020] Read in and count kmers for 400000 reads.
[Wed Dec 30 15:51:20 2020] Read in and count kmers for 500000 reads.
[Wed Dec 30 15:51:21 2020] Found 563628 reads.
[Wed Dec 30 15:51:22 2020] Finish sorting the reads.
[Wed Dec 30 15:51:31 2020] Finish rough annotations.
[Wed Dec 30 15:51:31 2020] Processed 100000 reads (19175 are used for assembly).
[Wed Dec 30 15:51:31 2020] Processed 200000 reads (36304 are used for assembly).
[Wed Dec 30 15:51:31 2020] Processed 300000 reads (49649 are used for assembly).
[Wed Dec 30 15:51:31 2020] Processed 400000 reads (96331 are used for assembly).
[Wed Dec 30 15:51:33 2020] Processed 500000 reads (159387 are used for assembly).
[Wed Dec 30 15:51:38 2020] Assembled 200863 reads.
[Wed Dec 30 15:51:38 2020] Try to rescue 3811 reads for assembly.
[Wed Dec 30 15:51:38 2020] Rescued 1670 reads.
[Wed Dec 30 15:51:39 2020] Processed 100000 reads for extension.
[Wed Dec 30 15:51:42 2020] Processed 200000 reads for extension.
[Wed Dec 30 15:51:42 2020] Extend assemblies by mate pair information.
[Wed Dec 30 15:51:44 2020] Remove redundant assemblies.
[Wed Dec 30 15:51:45 2020] Finish assembly.
[Wed Dec 30 15:51:45 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_final.out -t 1 -o TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead -r TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_assembled_reads.fa > TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_annot.fa
[Wed Dec 30 15:51:45 2020] Start to annotate assemblies.
[Wed Dec 30 15:51:59 2020] Start to realign reads for CDR3 analysis.
[Wed Dec 30 15:52:01 2020] Realigned 100000 reads.
[Wed Dec 30 15:52:05 2020] Realigned 200000 reads.
[Wed Dec 30 15:52:05 2020] Compute CDR3 abundance.
[Wed Dec 30 15:52:05 2020] Finish annotation.
[Wed Dec 30 15:52:05 2020] SYSTEM CALL: perl /home/ghogg/TRUST/TRUST4/trust-simplerep.pl TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_cdr3.out > TRUST_5a683fae-9f67-4e74-95b7-586f8b0a820b_gdc_realn_rehead_report.tsv
[Wed Dec 30 15:52:05 2020] TRUST4 finishes.

However, for most of the BAM files, TRUST4 fails to complete due to a segmentation fault:

[Wed Dec 30 17:05:24 2020] TRUST4 begins.
[Wed Dec 30 17:05:24 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/ebab1fb1-1976-48c7-8110-d688f5c6c92a/a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead.bam -t 1 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_toassemble
[Wed Dec 30 17:05:24 2020] Start to extract candidate reads from bam file.
[Wed Dec 30 17:09:47 2020] Finish obtaining the candidate read ids.
[Wed Dec 30 17:13:42 2020] Finish extracting reads.
[Wed Dec 30 17:13:42 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead -1 TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_toassemble_1.fq -2 TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_toassemble_2.fq
[Wed Dec 30 17:13:43 2020] Read in and count kmers for 100000 reads.
[Wed Dec 30 17:13:44 2020] Found 188364 reads.
[Wed Dec 30 17:13:44 2020] Finish sorting the reads.
[Wed Dec 30 17:13:49 2020] Finish rough annotations.
[Wed Dec 30 17:13:49 2020] Processed 100000 reads (16934 are used for assembly).
[Wed Dec 30 17:13:51 2020] Assembled 68516 reads.
[Wed Dec 30 17:13:51 2020] Try to rescue 1126 reads for assembly.
[Wed Dec 30 17:13:51 2020] Rescued 387 reads.
[Wed Dec 30 17:13:53 2020] Extend assemblies by mate pair information.
[Wed Dec 30 17:13:54 2020] Remove redundant assemblies.
[Wed Dec 30 17:13:54 2020] Finish assembly.
[Wed Dec 30 17:13:55 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_final.out -t 1 -o TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead -r TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_assembled_reads.fa > TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_annot.fa
[Wed Dec 30 17:13:55 2020] Start to annotate assemblies.
sh: line 1: 24686 Segmentation fault (core dumped) /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_final.out -t 1 -o TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead -r TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_assembled_reads.fa > TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_annot.fa
system /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_final.out -t 1 -o TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead -r TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_assembled_reads.fa > TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_annot.fa failed: 35584 at /home/ghogg/TRUST/TRUST4/run-trust4 line 44.

Sometimes it instead aborts with a glibc "free(): corrupted unsorted chunks" error:

[Wed Dec 30 18:02:36 2020] TRUST4 begins.
[Wed Dec 30 18:02:36 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/271c5562-4fa4-403a-be2f-4bd42254a03e/94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead.bam -t 1 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_toassemble
[Wed Dec 30 18:02:37 2020] Start to extract candidate reads from bam file.
[Wed Dec 30 18:05:56 2020] Finish obtaining the candidate read ids.
[Wed Dec 30 18:12:19 2020] Finish extracting reads.
[Wed Dec 30 18:12:19 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead -1 TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_toassemble_1.fq -2 TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_toassemble_2.fq
[Wed Dec 30 18:12:20 2020] Read in and count kmers for 100000 reads.
[Wed Dec 30 18:12:20 2020] Read in and count kmers for 200000 reads.
[Wed Dec 30 18:12:20 2020] Read in and count kmers for 300000 reads.
[Wed Dec 30 18:12:21 2020] Read in and count kmers for 400000 reads.
[Wed Dec 30 18:12:21 2020] Read in and count kmers for 500000 reads.
[Wed Dec 30 18:12:21 2020] Read in and count kmers for 600000 reads.
[Wed Dec 30 18:12:22 2020] Read in and count kmers for 700000 reads.
[Wed Dec 30 18:12:22 2020] Read in and count kmers for 800000 reads.
[Wed Dec 30 18:12:23 2020] Read in and count kmers for 900000 reads.
[Wed Dec 30 18:12:23 2020] Read in and count kmers for 1000000 reads.
[Wed Dec 30 18:12:26 2020] Found 1012140 reads.
[Wed Dec 30 18:12:27 2020] Finish sorting the reads.
[Wed Dec 30 18:12:46 2020] Finish rough annotations.
[Wed Dec 30 18:12:46 2020] Processed 100000 reads (11455 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 200000 reads (27501 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 300000 reads (44703 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 400000 reads (46072 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 500000 reads (57038 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 600000 reads (75623 are used for assembly).
[Wed Dec 30 18:12:46 2020] Processed 700000 reads (121807 are used for assembly).
[Wed Dec 30 18:12:47 2020] Processed 800000 reads (190805 are used for assembly).
[Wed Dec 30 18:12:54 2020] Processed 900000 reads (255998 are used for assembly).
[Wed Dec 30 18:13:02 2020] Processed 1000000 reads (322787 are used for assembly).
[Wed Dec 30 18:13:03 2020] Assembled 328259 reads.
[Wed Dec 30 18:13:03 2020] Try to rescue 6462 reads for assembly.
[Wed Dec 30 18:13:04 2020] Rescued 3330 reads.
[Wed Dec 30 18:13:05 2020] Processed 100000 reads for extension.
[Wed Dec 30 18:13:07 2020] Processed 200000 reads for extension.
[Wed Dec 30 18:13:12 2020] Processed 300000 reads for extension.
[Wed Dec 30 18:13:13 2020] Extend assemblies by mate pair information.
[Wed Dec 30 18:13:17 2020] Remove redundant assemblies.
[Wed Dec 30 18:13:18 2020] Finish assembly.
[Wed Dec 30 18:13:19 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_final.out -t 1 -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead -r TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_assembled_reads.fa > TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_annot.fa
[Wed Dec 30 18:13:19 2020] Start to annotate assemblies.
*** glibc detected *** /home/ghogg/TRUST/TRUST4/annotator: free(): corrupted unsorted chunks: 0x00000000067ff050 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3694075f3e]
/lib64/libc.so.6[0x3694078dd0]
/home/ghogg/TRUST/TRUST4/annotator[0x41cd3f]
/home/ghogg/TRUST/TRUST4/annotator[0x405b64]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x369401ed1d]
/home/ghogg/TRUST/TRUST4/annotator[0x401979]
======= Memory map: ========
00400000-00429000 r-xp 00000000 b5:84fa2 144116146356946693 /home/ghogg/TRUST/TRUST4/annotator
00628000-00649000 rw-p 00028000 b5:84fa2 144116146356946693 /home/ghogg/TRUST/TRUST4/annotator
008c8000-06b5d000 rw-p 00000000 00:00 0 [heap]
3693c00000-3693c20000 r-xp 00000000 08:03 3280510 /lib64/ld-2.12.so
3693e1f000-3693e20000 r--p 0001f000 08:03 3280510 /lib64/ld-2.12.so
3693e20000-3693e21000 rw-p 00020000 08:03 3280510 /lib64/ld-2.12.so
3693e21000-3693e22000 rw-p 00000000 00:00 0
3694000000-369418a000 r-xp 00000000 08:03 3280523 /lib64/libc-2.12.so
369418a000-369438a000 ---p 0018a000 08:03 3280523 /lib64/libc-2.12.so
369438a000-369438e000 r--p 0018a000 08:03 3280523 /lib64/libc-2.12.so
369438e000-3694390000 rw-p 0018e000 08:03 3280523 /lib64/libc-2.12.so
3694390000-3694394000 rw-p 00000000 00:00 0
3694800000-3694883000 r-xp 00000000 08:03 3280583 /lib64/libm-2.12.so
3694883000-3694a82000 ---p 00083000 08:03 3280583 /lib64/libm-2.12.so
3694a82000-3694a83000 r--p 00082000 08:03 3280583 /lib64/libm-2.12.so
3694a83000-3694a84000 rw-p 00083000 08:03 3280583 /lib64/libm-2.12.so
3694c00000-3694c17000 r-xp 00000000 08:03 3280614 /lib64/libpthread-2.12.so
3694c17000-3694e17000 ---p 00017000 08:03 3280614 /lib64/libpthread-2.12.so
3694e17000-3694e18000 r--p 00017000 08:03 3280614 /lib64/libpthread-2.12.so
3694e18000-3694e19000 rw-p 00018000 08:03 3280614 /lib64/libpthread-2.12.so
3694e19000-3694e1d000 rw-p 00000000 00:00 0
369e000000-369e016000 r-xp 00000000 08:03 3280553 /lib64/libgcc_s-4.4.7-20120601.so.1
369e016000-369e215000 ---p 00016000 08:03 3280553 /lib64/libgcc_s-4.4.7-20120601.so.1
369e215000-369e216000 rw-p 00015000 08:03 3280553 /lib64/libgcc_s-4.4.7-20120601.so.1
369e400000-369e4e8000 r-xp 00000000 08:03 409528 /usr/lib64/libstdc++.so.6.0.13
369e4e8000-369e6e8000 ---p 000e8000 08:03 409528 /usr/lib64/libstdc++.so.6.0.13
369e6e8000-369e6ef000 r--p 000e8000 08:03 409528 /usr/lib64/libstdc++.so.6.0.13
369e6ef000-369e6f1000 rw-p 000ef000 08:03 409528 /usr/lib64/libstdc++.so.6.0.13
369e6f1000-369e706000 rw-p 00000000 00:00 0
3be5200000-3be5215000 r-xp 00000000 08:03 3280662 /lib64/libz.so.1.2.3
3be5215000-3be5414000 ---p 00015000 08:03 3280662 /lib64/libz.so.1.2.3
3be5414000-3be5415000 r--p 00014000 08:03 3280662 /lib64/libz.so.1.2.3
3be5415000-3be5416000 rw-p 00015000 08:03 3280662 /lib64/libz.so.1.2.3
7f665c000000-7f665c021000 rw-p 00000000 00:00 0
7f665c021000-7f6660000000 ---p 00000000 00:00 0
7f66629a3000-7f66630e0000 rw-p 00000000 00:00 0
7f666364f000-7f66691e3000 rw-p 00000000 00:00 0
7f66691fc000-7f66691fe000 rw-p 00000000 00:00 0
7ffc96ad1000-7ffc96ae6000 rw-p 00000000 00:00 0 [stack]
7ffc96b92000-7ffc96b93000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
sh: line 1: 719 Aborted (core dumped) /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_final.out -t 1 -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead -r TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_assembled_reads.fa > TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_annot.fa
system /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_final.out -t 1 -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead -r TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_assembled_reads.fa > TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_annot.fa failed: 34304 at /home/ghogg/TRUST/TRUST4/run-trust4 line 44.
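
The backtrace above only gives raw addresses inside `annotator`. As a sketch, those frames could be extracted and handed to binutils `addr2line` to map them to function names. This is only informative if the binary still carries symbols (ideally after rebuilding with `-g`), which is an assumption here, and the helper names are hypothetical.

```python
import re

# Matches glibc backtrace frames such as
#   /path/to/binary[0x41cd3f]
#   /lib64/libc.so.6(__libc_start_main+0xfd)[0x369401ed1d]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)(?:\([^)]*\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_backtrace(lines):
    """Return (binary, address) pairs for every frame line found."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("binary"), match.group("addr")))
    return frames


def addr2line_command(frames, binary):
    """Build an addr2line argv for the frames that belong to `binary`.

    -f prints function names, -C demangles C++ names, -e names the executable.
    """
    addrs = [addr for path, addr in frames if path == binary]
    return ["addr2line", "-f", "-C", "-e", binary] + addrs
```

For the backtrace above this would select the `annotator` frames `0x41cd3f`, `0x405b64`, and `0x401979`; the first of these is the caller of the failing `free()`.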

Sometimes it fails without any obvious clue as to why:

[Wed Dec 30 16:30:41 2020] TRUST4 begins.
[Wed Dec 30 16:30:41 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/045c1807-df2b-4cb8-8e37-269eba60d401/634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead.bam -t 1 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead_toassemble
[Wed Dec 30 16:30:41 2020] Start to extract candidate reads from bam file.
[Wed Dec 30 16:37:26 2020] Finish obtaining the candidate read ids.
[Wed Dec 30 16:45:52 2020] Finish extracting reads.
[Wed Dec 30 16:45:52 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead -1 TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead_toassemble_1.fq -2 TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead_toassemble_2.fq
[Wed Dec 30 16:45:52 2020] Read in and count kmers for 100000 reads.
[Wed Dec 30 16:45:53 2020] Read in and count kmers for 200000 reads.
[Wed Dec 30 16:45:53 2020] Read in and count kmers for 300000 reads.
[Wed Dec 30 16:45:54 2020] Read in and count kmers for 400000 reads.
[Wed Dec 30 16:45:55 2020] Found 436070 reads.
[Wed Dec 30 16:45:56 2020] Finish sorting the reads.
system /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead -1 TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead_toassemble_1.fq -2 TRUST_634dde08-b8f0-4519-944d-2997d5ee1ff3_gdc_realn_rehead_toassemble_2.fq failed: 139 at /home/ghogg/TRUST/TRUST4/run-trust4 line 44.

Any thoughts as to why I may be running into this problem would be greatly appreciated! Thank you again for creating this awesome tool!

Distributor ID: CentOS
Description: CentOS release 6.10 (Final)
Release: 6.10
Codename: Final
Linux login01.cluster 2.6.32-696.20.1.el6.centos.plus.x86_64 #1 SMP Sun Jan 28 07:56:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
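
The `failed: N` numbers in the logs above are easier to read once decoded. Assuming `run-trust4` prints Perl's raw `$?` wait status (an assumption based on the message format, not checked against the script), the usual POSIX layout puts the exit code in the high byte and the terminating signal in the low seven bits. A minimal sketch, with a hypothetical function name and a deliberately small signal table:

```python
# Only the signals relevant to this thread; anything else is shown by number.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}


def signal_name(number):
    """Return a readable name for a signal number."""
    return SIGNAL_NAMES.get(number, f"signal {number}")


def decode_wait_status(status):
    """Decode a raw 16-bit wait status into a short description."""
    sig = status & 0x7F          # low 7 bits: signal that killed the child
    core = bool(status & 0x80)   # bit 7: core dump flag
    code = status >> 8           # high byte: exit code
    if sig:
        suffix = " (core dumped)" if core else ""
        return f"killed by {signal_name(sig)}{suffix}"
    if code > 128:
        # Shell convention: 128 + N means the command died from signal N.
        return f"exit code {code} (shell reports {signal_name(code - 128)})"
    return f"exit code {code}"
```

Under that reading, `decode_wait_status(35584)` gives `exit code 139 (shell reports SIGSEGV)`, `decode_wait_status(34304)` gives `exit code 134 (shell reports SIGABRT)`, and `decode_wait_status(139)` gives `killed by SIGSEGV (core dumped)`. That would make all three failure modes memory errors, including the one without an explicit message.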

Collaborator

mourisl commented Dec 31, 2020

Which version of TRUST4 did you use? Can you share the file TRUST_a2468381-bf6c-4f08-911e-ec39c1253add_gdc_realn_rehead_final.out or TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_final.out so I can debug the issue? I suspect the last error is also related to the annotation issue, since it happens during the rough annotation stage. Thank you.

Author

grhogg commented Dec 31, 2020

Collaborator

mourisl commented Dec 31, 2020

I just tried the two files, and they both work on my system. Could you please pull the newest GitHub repo and give it a try? To test, you can run just the annotator step with a command like:

newest_trust4_path/annotator -f newest_trust4_path/human_IMGT+C.fa -a TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_final.out -t 1 -o TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead -r TRUST_94b39980-23a2-4178-9bec-8b63acbd4c71_gdc_realn_rehead_assembled_reads.fa > tmp.out

to make sure it works without rerunning all the stages. Please let me know whether it works.
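
To repeat that single-stage test for several samples, the command could be assembled from the output prefix. This sketch only reuses the options visible in the command above (`-f`, `-a`, `-t`, `-o`, `-r`) and the `_final.out` / `_assembled_reads.fa` suffixes seen in the logs; the function names and the `tmp.out` default are illustrative.

```python
import os
import subprocess


def annotator_command(trust4_dir, prefix, threads=1):
    """Build the annotator argv for one sample prefix (e.g. "TRUST_<id>")."""
    return [
        os.path.join(trust4_dir, "annotator"),
        "-f", os.path.join(trust4_dir, "human_IMGT+C.fa"),
        "-a", f"{prefix}_final.out",
        "-t", str(threads),
        "-o", prefix,
        "-r", f"{prefix}_assembled_reads.fa",
    ]


def run_annotator(trust4_dir, prefix, out_path="tmp.out"):
    """Run only the annotation stage, writing its stdout to out_path."""
    with open(out_path, "w") as out:
        completed = subprocess.run(
            annotator_command(trust4_dir, prefix), stdout=out
        )
    return completed.returncode
```

When the program is started directly like this, Python's `subprocess` reports death by a signal as a negative return code, so a segmentation fault would show up as `-11` rather than `139`.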

Author

grhogg commented Dec 31, 2020

This does indeed work! I will try it again with all my files and will let you know if I run into any more errors.

Author

grhogg commented Dec 31, 2020

So sorry, but I'm still running into the same Segmentation fault (core dumped) error as before.

./run-trust4 -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/efc0c2ff-63c6-4028-a1c2-78bf63d04445/24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead.bam -f hg38_bcrtcr.fa --ref human_IMGT+C.fa

[Wed Dec 30 23:06:45 2020] TRUST4 begins.
[Wed Dec 30 23:06:45 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/efc0c2ff-63c6-4028-a1c2-78bf63d04445/24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead.bam -t 1 -f hg38_bcrtcr.fa -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble
[Wed Dec 30 23:06:45 2020] Start to extract candidate reads from bam file.
[Wed Dec 30 23:10:32 2020] Finish obtaining the candidate read ids.
[Wed Dec 30 23:17:23 2020] Finish extracting reads.
[Wed Dec 30 23:17:24 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f hg38_bcrtcr.fa -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -1 TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_1.fq -2 TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_2.fq
[Wed Dec 30 23:17:24 2020] Read in and count kmers for 100000 reads.
[Wed Dec 30 23:17:25 2020] Read in and count kmers for 200000 reads.
[Wed Dec 30 23:17:25 2020] Read in and count kmers for 300000 reads.
[Wed Dec 30 23:17:25 2020] Read in and count kmers for 400000 reads.
[Wed Dec 30 23:17:26 2020] Read in and count kmers for 500000 reads.
[Wed Dec 30 23:17:26 2020] Read in and count kmers for 600000 reads.
[Wed Dec 30 23:17:26 2020] Read in and count kmers for 700000 reads.
[Wed Dec 30 23:17:27 2020] Read in and count kmers for 800000 reads.
[Wed Dec 30 23:17:27 2020] Read in and count kmers for 900000 reads.
[Wed Dec 30 23:17:27 2020] Read in and count kmers for 1000000 reads.
[Wed Dec 30 23:17:27 2020] Read in and count kmers for 1100000 reads.
[Wed Dec 30 23:17:28 2020] Read in and count kmers for 1200000 reads.
[Wed Dec 30 23:17:28 2020] Read in and count kmers for 1300000 reads.
[Wed Dec 30 23:17:28 2020] Read in and count kmers for 1400000 reads.
[Wed Dec 30 23:17:29 2020] Read in and count kmers for 1500000 reads.
[Wed Dec 30 23:17:29 2020] Read in and count kmers for 1600000 reads.
[Wed Dec 30 23:17:30 2020] Read in and count kmers for 1700000 reads.
[Wed Dec 30 23:17:30 2020] Read in and count kmers for 1800000 reads.
[Wed Dec 30 23:17:31 2020] Read in and count kmers for 1900000 reads.
[Wed Dec 30 23:17:36 2020] Found 1907655 reads.
[Wed Dec 30 23:17:39 2020] Finish sorting the reads.
[Wed Dec 30 23:18:12 2020] Finish rough annotations.
[Wed Dec 30 23:18:12 2020] Processed 100000 reads (39178 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 200000 reads (66571 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 300000 reads (102784 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 400000 reads (109073 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 500000 reads (134025 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 600000 reads (154734 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 700000 reads (165467 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 800000 reads (179884 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 900000 reads (199505 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 1000000 reads (220420 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 1100000 reads (249605 are used for assembly).
[Wed Dec 30 23:18:12 2020] Processed 1200000 reads (314346 are used for assembly).
[Wed Dec 30 23:18:13 2020] Processed 1300000 reads (386852 are used for assembly).
[Wed Dec 30 23:18:15 2020] Processed 1400000 reads (468734 are used for assembly).
[Wed Dec 30 23:18:21 2020] Processed 1500000 reads (551715 are used for assembly).
[Wed Dec 30 23:18:27 2020] Processed 1600000 reads (630018 are used for assembly).
[Wed Dec 30 23:18:35 2020] Processed 1700000 reads (708412 are used for assembly).
[Wed Dec 30 23:18:42 2020] Processed 1800000 reads (783021 are used for assembly).
[Wed Dec 30 23:18:50 2020] Processed 1900000 reads (855188 are used for assembly).
[Wed Dec 30 23:18:51 2020] Assembled 857230 reads.
[Wed Dec 30 23:18:51 2020] Try to rescue 13544 reads for assembly.
[Wed Dec 30 23:18:52 2020] Rescued 7552 reads.
[Wed Dec 30 23:18:54 2020] Processed 100000 reads for extension.
[Wed Dec 30 23:18:54 2020] Processed 200000 reads for extension.
[Wed Dec 30 23:18:55 2020] Processed 300000 reads for extension.
[Wed Dec 30 23:18:56 2020] Processed 400000 reads for extension.
[Wed Dec 30 23:18:59 2020] Processed 500000 reads for extension.
[Wed Dec 30 23:19:03 2020] Processed 600000 reads for extension.
[Wed Dec 30 23:19:08 2020] Processed 700000 reads for extension.
[Wed Dec 30 23:19:12 2020] Processed 800000 reads for extension.
[Wed Dec 30 23:19:15 2020] Extend assemblies by mate pair information.
[Wed Dec 30 23:19:22 2020] Remove redundant assemblies.
[Wed Dec 30 23:19:26 2020] Finish assembly.
[Wed Dec 30 23:19:28 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/annotator -f human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa
[Wed Dec 30 23:19:28 2020] Start to annotate assemblies.
sh: line 1: 22646 Segmentation fault (core dumped) /home/ghogg/TRUST/TRUST4/annotator -f human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa
system /home/ghogg/TRUST/TRUST4/annotator -f human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa failed: 35584 at ./run-trust4 line 47.

I'm now using TRUST4 v1.0.2-beta

Here is the final.out:

TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out.txt.zip

Let me know if you have any other thoughts. Thanks!

Collaborator

mourisl commented Dec 31, 2020

I think I just fixed a bug. Could you please pull/clone and recompile TRUST4 to test this sample? You need to rerun it with ./run-trust4, since the bug seems to come from the earlier stages.

Author

grhogg commented Dec 31, 2020

I'm so sorry, but after pulling the updated package and recompiling TRUST4, I am still running into the same issue.

It seems the program fails when the initial BAM files are too large. From what I've seen, the segmentation fault occurs when the extracted .fq files exceed about 100 MB, and runs with even larger .fq files (> 400 MB) do not even reach the final.out stage.

Do you think it would be helpful if I tried running TRUST4 in parallel on multiple nodes?
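
One way to check the size pattern described above is to list the extracted FASTQ files with their sizes and see whether the failing samples cluster above some threshold. The `_toassemble_1.fq` suffix comes from the logs; the 100 MB cutoff is only the rough figure quoted above, and the function name is hypothetical.

```python
from pathlib import Path

MB = 1000 * 1000  # decimal megabytes


def fq_sizes(workdir, threshold_mb=100):
    """Return (filename, size_mb, over_threshold) for each mate-1 FASTQ.

    Only files named TRUST_*_toassemble_1.fq directly inside workdir are
    considered; size_mb is rounded to one decimal place.
    """
    rows = []
    for fq in sorted(Path(workdir).glob("TRUST_*_toassemble_1.fq")):
        size_mb = fq.stat().st_size / MB
        rows.append((fq.name, round(size_mb, 1), size_mb > threshold_mb))
    return rows
```

Comparing this table against the list of samples that crashed would show whether input size alone explains the failures or whether some small samples fail too.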

[Thu Dec 31 09:55:00 2020] TRUST4 begins.
[Thu Dec 31 09:55:00 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/bam-extractor -b /scratch/ghogg/GDC_DATA/BAM/GDC_Transfer/efc0c2ff-63c6-4028-a1c2-78bf63d04445/24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead.bam -t 1 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble
[Thu Dec 31 09:55:00 2020] Start to extract candidate reads from bam file.
[Thu Dec 31 09:58:20 2020] Finish obtaining the candidate read ids.
[Thu Dec 31 10:04:15 2020] Finish extracting reads.
[Thu Dec 31 10:04:16 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/trust4 -f /home/ghogg/TRUST/TRUST4/hg38_bcrtcr.fa -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -1 TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_1.fq -2 TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_2.fq
[Thu Dec 31 10:04:16 2020] Read in and count kmers for 100000 reads.
[Thu Dec 31 10:04:16 2020] Read in and count kmers for 200000 reads.
[Thu Dec 31 10:04:17 2020] Read in and count kmers for 300000 reads.
[Thu Dec 31 10:04:17 2020] Read in and count kmers for 400000 reads.
[Thu Dec 31 10:04:17 2020] Read in and count kmers for 500000 reads.
[Thu Dec 31 10:04:18 2020] Read in and count kmers for 600000 reads.
[Thu Dec 31 10:04:18 2020] Read in and count kmers for 700000 reads.
[Thu Dec 31 10:04:18 2020] Read in and count kmers for 800000 reads.
[Thu Dec 31 10:04:18 2020] Read in and count kmers for 900000 reads.
[Thu Dec 31 10:04:19 2020] Read in and count kmers for 1000000 reads.
[Thu Dec 31 10:04:19 2020] Read in and count kmers for 1100000 reads.
[Thu Dec 31 10:04:19 2020] Read in and count kmers for 1200000 reads.
[Thu Dec 31 10:04:20 2020] Read in and count kmers for 1300000 reads.
[Thu Dec 31 10:04:20 2020] Read in and count kmers for 1400000 reads.
[Thu Dec 31 10:04:20 2020] Read in and count kmers for 1500000 reads.
[Thu Dec 31 10:04:21 2020] Read in and count kmers for 1600000 reads.
[Thu Dec 31 10:04:21 2020] Read in and count kmers for 1700000 reads.
[Thu Dec 31 10:04:22 2020] Read in and count kmers for 1800000 reads.
[Thu Dec 31 10:04:22 2020] Read in and count kmers for 1900000 reads.
[Thu Dec 31 10:04:27 2020] Found 1907655 reads.
[Thu Dec 31 10:04:29 2020] Finish sorting the reads.
[Thu Dec 31 10:05:01 2020] Finish rough annotations.
[Thu Dec 31 10:05:01 2020] Processed 100000 reads (39178 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 200000 reads (66571 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 300000 reads (102784 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 400000 reads (109073 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 500000 reads (134025 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 600000 reads (154734 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 700000 reads (165467 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 800000 reads (179884 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 900000 reads (199505 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 1000000 reads (220420 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 1100000 reads (249605 are used for assembly).
[Thu Dec 31 10:05:01 2020] Processed 1200000 reads (314346 are used for assembly).
[Thu Dec 31 10:05:02 2020] Processed 1300000 reads (386852 are used for assembly).
[Thu Dec 31 10:05:04 2020] Processed 1400000 reads (468733 are used for assembly).
[Thu Dec 31 10:05:09 2020] Processed 1500000 reads (551713 are used for assembly).
[Thu Dec 31 10:05:15 2020] Processed 1600000 reads (630005 are used for assembly).
[Thu Dec 31 10:05:23 2020] Processed 1700000 reads (708387 are used for assembly).
[Thu Dec 31 10:05:30 2020] Processed 1800000 reads (782977 are used for assembly).
[Thu Dec 31 10:05:38 2020] Processed 1900000 reads (855116 are used for assembly).
[Thu Dec 31 10:05:38 2020] Assembled 857156 reads.
[Thu Dec 31 10:05:38 2020] Try to rescue 13554 reads for assembly.
[Thu Dec 31 10:05:40 2020] Rescued 7554 reads.
[Thu Dec 31 10:05:41 2020] Processed 100000 reads for extension.
[Thu Dec 31 10:05:42 2020] Processed 200000 reads for extension.
[Thu Dec 31 10:05:42 2020] Processed 300000 reads for extension.
[Thu Dec 31 10:05:43 2020] Processed 400000 reads for extension.
[Thu Dec 31 10:05:47 2020] Processed 500000 reads for extension.
[Thu Dec 31 10:05:51 2020] Processed 600000 reads for extension.
[Thu Dec 31 10:05:55 2020] Processed 700000 reads for extension.
[Thu Dec 31 10:05:59 2020] Processed 800000 reads for extension.
[Thu Dec 31 10:06:02 2020] Extend assemblies by mate pair information.
[Thu Dec 31 10:06:09 2020] Remove redundant assemblies.
[Thu Dec 31 10:06:12 2020] Finish assembly.
[Thu Dec 31 10:06:13 2020] SYSTEM CALL: /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa
[Thu Dec 31 10:06:13 2020] Start to annotate assemblies.
sh: line 1: 3587 Segmentation fault (core dumped) /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa
system /home/ghogg/TRUST/TRUST4/annotator -f /home/ghogg/TRUST/TRUST4/human_IMGT+C.fa -a TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out -t 1 -o TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead -r TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_assembled_reads.fa > TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_annot.fa failed: 35584 at /home/ghogg/TRUST/TRUST4/run-trust4 line 47.

Final.out file:

TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_final.out.zip

Collaborator

mourisl commented Dec 31, 2020

We have this BAM from TCGA on our server, and TRUST4 processed the sample successfully. Here is my log for the main trust4 stage of this sample:

[Thu Dec 31 12:01:05 2020] Read in and count kmers for 100000 reads.
[Thu Dec 31 12:01:05 2020] Read in and count kmers for 200000 reads.
[Thu Dec 31 12:01:05 2020] Read in and count kmers for 300000 reads.
[Thu Dec 31 12:01:06 2020] Read in and count kmers for 400000 reads.
[Thu Dec 31 12:01:06 2020] Read in and count kmers for 500000 reads.
[Thu Dec 31 12:01:06 2020] Read in and count kmers for 600000 reads.
[Thu Dec 31 12:01:06 2020] Read in and count kmers for 700000 reads.
[Thu Dec 31 12:01:07 2020] Read in and count kmers for 800000 reads.
[Thu Dec 31 12:01:07 2020] Read in and count kmers for 900000 reads.
[Thu Dec 31 12:01:07 2020] Read in and count kmers for 1000000 reads.
[Thu Dec 31 12:01:07 2020] Read in and count kmers for 1100000 reads.
[Thu Dec 31 12:01:08 2020] Read in and count kmers for 1200000 reads.
[Thu Dec 31 12:01:08 2020] Read in and count kmers for 1300000 reads.
[Thu Dec 31 12:01:08 2020] Read in and count kmers for 1400000 reads.
[Thu Dec 31 12:01:09 2020] Read in and count kmers for 1500000 reads.
[Thu Dec 31 12:01:09 2020] Read in and count kmers for 1600000 reads.
[Thu Dec 31 12:01:10 2020] Read in and count kmers for 1700000 reads.
[Thu Dec 31 12:01:10 2020] Read in and count kmers for 1800000 reads.
[Thu Dec 31 12:01:10 2020] Read in and count kmers for 1900000 reads.
[Thu Dec 31 12:01:15 2020] Found 1907655 reads.
[Thu Dec 31 12:01:17 2020] Finish sorting the reads.
[Thu Dec 31 12:01:28 2020] Finish rough annotations.
[Thu Dec 31 12:01:29 2020] Processed 100000 reads (39178 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 200000 reads (66571 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 300000 reads (102784 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 400000 reads (109073 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 500000 reads (134025 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 600000 reads (154734 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 700000 reads (165467 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 800000 reads (179884 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 900000 reads (199505 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 1000000 reads (220420 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 1100000 reads (249605 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 1200000 reads (314346 are used for assembly).
[Thu Dec 31 12:01:29 2020] Processed 1300000 reads (386850 are used for assembly).
[Thu Dec 31 12:01:31 2020] Processed 1400000 reads (468734 are used for assembly).
[Thu Dec 31 12:01:36 2020] Processed 1500000 reads (551725 are used for assembly).
[Thu Dec 31 12:01:41 2020] Processed 1600000 reads (630037 are used for assembly).
[Thu Dec 31 12:01:48 2020] Processed 1700000 reads (708415 are used for assembly).
[Thu Dec 31 12:01:54 2020] Processed 1800000 reads (783005 are used for assembly).
[Thu Dec 31 12:02:02 2020] Processed 1900000 reads (855149 are used for assembly).
[Thu Dec 31 12:02:02 2020] Assembled 857189 reads.
[Thu Dec 31 12:02:02 2020] Try to rescue 13557 reads for assembly.
[Thu Dec 31 12:02:03 2020] Rescued 7560 reads.
[Thu Dec 31 12:02:10 2020] Extend assemblies by mate pair information.
[Thu Dec 31 12:02:17 2020] Remove redundant assemblies.
[Thu Dec 31 12:02:22 2020] Finish assembly.

The message "Found 1907655 reads." is the same as yours, suggesting the fq files are the same. However, the number in "Assembled 857189 reads." is slightly different from yours, which suggests there might be a bug in the assembly step. I'm currently checking this on a different system.

Can you show me the last few lines of the file TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_1.fq and do a "wc" on these two fastq files so I can make sure the content is the same between your results and mine? Thank you.
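
The comparison requested here could be gathered with something like the sketch below. It is not part of TRUST4; `fastq_fingerprint` is a hypothetical helper, and the read count assumes plain four-line FASTQ records.

```shell
#!/bin/sh
# Print a short fingerprint of a FASTQ file: line count, read count,
# byte count, and a CRC checksum, so that the same file on two machines
# can be compared without copying it around.
fastq_fingerprint() {
    file=$1
    lines=$(wc -l < "$file" | tr -d ' ')
    bytes=$(wc -c < "$file" | tr -d ' ')
    # cksum on stdin prints "<crc> <bytes>"; keep only the CRC.
    sum=$(cksum < "$file" | cut -d ' ' -f 1)
    # Assumption: four lines per read (header, sequence, "+", qualities).
    echo "lines=$lines reads=$(( lines / 4 )) bytes=$bytes cksum=$sum"
}
```

Running `fastq_fingerprint XXX_toassemble_1.fq` and `tail -n 8 XXX_toassemble_1.fq` on both machines and comparing the output would show whether the extracted reads are identical.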

@grhogg
Author

grhogg commented Dec 31, 2020

Thank you so much for all your help. I really appreciate it. Here are the last few lines:

CCCFFFFFHHGHHJJJJJJJJJJJJJJJJJJJIJJJDGHIJJJJJJJJ
@UNC11-SN627:380:C58DMACXX:5:1302:19841:81352
CTGTAAATATAAGTTAGTGAGGAGGCTGTTACATCCAGTTAGGTAGAC
+
@@CDFFFFHGHHGHHJJHGIJJJIIGIIJJJJIIIIIJHIJJJFHIIC
@UNC11-SN627:380:C58DMACXX:5:2113:8440:49566
CTGGTGTCTACCTAACTGGATGTAACAGCCTCCTCACTAACTTATATT
+
BBCFDFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJ

And here are the word counts:

wc /scratch/ghogg/GDC_DATA/OUT/TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_1.fq

3997976 145424996

wc /scratch/ghogg/GDC_DATA/OUT/TRUST_24c6110f-257e-49c1-9900-dd36a4d026b9_gdc_realn_rehead_toassemble_2.fq

3997976 145424996

@mourisl
Collaborator

mourisl commented Dec 31, 2020

I just tested the sample on a different system and also tried various gcc versions; they all gave the same result. Just to make sure: when you pulled the new version of TRUST4, did you run "make clean" before recompiling? If that is not the problem, can you try it on a different system, or allocate more memory if you are using a computational platform like Slurm?

For running on a different system, you can copy the two files toassemble*.fq out, and just run ./trust4 -f ../hg38_bcrtcr.fa -1 XXX_toassemble_1.fq -2 XXX_toassemble_2.fq to see whether the numbers in the log match. Thank you!

@gthunger

Hi, I got the same error as grhogg did. The end of the error message is the same as grhogg posted above (quoted below). Used TRUST4 v1.0.2-beta.

failed: 35584 at /home/ghogg/TRUST/TRUST4/run-trust4 line 47.

The difference is that the trust4 run finished successfully on the majority of the samples (50+). 15% of the runs failed with this error. Let me know if you need further info. Thanks!

@mourisl
Collaborator

mourisl commented Feb 19, 2021

Part of the issue is addressed in the newest version (v1.0.2), could you please give it a try? If it still fails at the "annotator" program, could you please share some of the "XXX_final.out" file from the failed samples? Thank you!

@gthunger

Will give it a try. Thanks!

@gthunger

Hi, it still failed on most of the previously failed samples; only one finished successfully. I've attached the *_final.out files from two failed samples for you to take a look. Thanks.
Archive.zip

@mourisl
Collaborator

mourisl commented Feb 25, 2021

Thanks for sharing the data. I just tested it on our system; it works fine and the memory access pattern also looks normal. I'm wondering whether this was fixed by the updates in the past week. Could you please get the most recent version from GitHub with "git clone" and give it a try? Thank you.

@gthunger

gthunger commented Feb 26, 2021

I did use the most recent version, v1.0.2, for the data above. For memory, 25 GB was allocated. Is that enough? Anything else I can look into? Thanks!

@mourisl
Collaborator

mourisl commented Feb 26, 2021

Could you please show me all the screen output from TRUST4 for one of the samples?

I just checked the lengths of the assemblies and some of them are quite short, suggesting there might be some very short reads or a read-through issue for some mate pairs. I fixed a bug related to such short reads after releasing v1.0.2. Just to make sure: did you get the code through "git clone", or did you download v1.0.2 from the release page? Thank you.
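
One way to check whether a sample contains very short reads is to tabulate the read lengths in the toassemble FASTQ files. The sketch below is a generic diagnostic, not something TRUST4 provides; `read_length_histogram` is a hypothetical helper and it assumes four-line FASTQ records.

```shell
#!/bin/sh
# Print "count length" pairs for the sequence lines of a FASTQ file,
# shortest reads first. Assumes 4-line records (sequence on line 2 of 4).
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print n[len], len }' "$1" | sort -k2,2n
}
```

For example, `read_length_histogram XXX_toassemble_1.fq | head` lists the shortest read lengths and how often each occurs.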

@gthunger

It was downloaded from the release page. I've attached a screen output from one sample.
8.log.zip

@mourisl
Collaborator

mourisl commented Feb 26, 2021

I just tried the v1.0.2 version from the release page, and it did indeed crash on your data. I think the newer updates have fixed this issue. Could you use "git clone" to get the newest version and give it a try? If this one works, I will create a new release. Thank you.

@gthunger

gthunger commented Mar 1, 2021

I tried using 'git clone' but still got the same segmentation error...

@mourisl
Collaborator

mourisl commented Mar 1, 2021

Is the log the same as the previous run? I just want to confirm it crashed at the same step. Thank you for your patience.

@gthunger

gthunger commented Mar 2, 2021

Yes, the log is the same. It failed at the step of annotating assemblies. Thanks.

@mourisl
Collaborator

mourisl commented Mar 2, 2021

Could you please share the file for the new 8_TCR_final.out or 16_TCR_final.out again? Those files could be slightly different from the v1.0.2 run. Thank you.

@gthunger

gthunger commented Mar 2, 2021

Sure, attached. Thanks.
8_TCR_final.out.zip

@mourisl
Collaborator

mourisl commented Mar 2, 2021

I'm so puzzled, it still works on my computer. Could you please share the two files, 8.TCR_toassemble_1.fq and 8.TCR_toassemble_2.fq? Thanks for your help on debugging TRUST4.

One unlikely reason is the -O3 optimization flag in the Makefile. Can you change it to just "-O", and do a "make clean; make" to recompile TRUST4, and then test it again?
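
The rebuild described here might look like the sketch below. It assumes the flag appears literally as "-O3" in the Makefile; `lower_optimization` is a hypothetical helper name, and the original file is kept as a backup so the change can be reverted.

```shell
#!/bin/sh
# Lower the optimization level in a Makefile from -O3 to -O.
# Assumption: the flag is written literally as "-O3".
# The untouched original is saved next to it with a .bak suffix.
lower_optimization() {
    makefile=$1
    cp "$makefile" "$makefile.bak" || return 1
    sed 's/-O3/-O/g' "$makefile.bak" > "$makefile"
}
```

From the TRUST4 source directory, `lower_optimization Makefile && make clean && make` would then rebuild every object file with the lower optimization level.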

@gthunger

gthunger commented Mar 3, 2021

I'm not sure if it's related to my system. But it finished with success on many other samples... I've attached the files you asked for. Thanks for testing them.
Archive 2.zip

@mourisl
Collaborator

mourisl commented Mar 3, 2021

It's strange. Here is the log of the assembly module in TRUST4 for v1.0.2 downloaded from the release page:

[Wed Mar 3 16:32:16 2021] Read in and count kmers for 100000 reads.
[Wed Mar 3 16:32:19 2021] Read in and count kmers for 200000 reads.
[Wed Mar 3 16:32:29 2021] Found 253483 reads.
[Wed Mar 3 16:32:32 2021] Finish sorting the reads.
[Wed Mar 3 16:32:39 2021] Finish rough annotations.
[Wed Mar 3 16:32:54 2021] Processed 100000 reads (59282 are used for assembly).
[Wed Mar 3 16:33:26 2021] Processed 200000 reads (109013 are used for assembly).
[Wed Mar 3 16:33:29 2021] Assembled 115437 reads.
[Wed Mar 3 16:33:29 2021] Try to rescue 21224 reads for assembly.
[Wed Mar 3 16:33:38 2021] Rescued 4503 reads.
[Wed Mar 3 16:33:55 2021] Processed 100000 reads for extension.
[Wed Mar 3 16:33:57 2021] Extend assemblies by mate pair information.
[Wed Mar 3 16:33:59 2021] Remove redundant assemblies.
[Wed Mar 3 16:34:02 2021] Finish assembly.

Comparing with your log file, the initial reads should be the same, but during assembly there are small discrepancies, such as:
[Thu Feb 25 16:44:51 2021] Processed 100000 reads (59289 are used for assembly)
vs
[Wed Mar 3 16:32:54 2021] Processed 100000 reads (59282 are used for assembly)

I guess there could be some difference in the system or compiler. What are your system and compiler versions? I will try to reproduce the issue. In the meantime, since the package on your system is in /share/apps/, can you confirm the program was compiled after a "make clean"? Thank you.
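
A log comparison like the one above can be automated by stripping the timestamps and diffing what remains. This is only a sketch: `strip_timestamps` and `diff_logs` are hypothetical helpers, and they assume each log line starts with a bracketed timestamp followed by a space, as in the logs in this thread.

```shell
#!/bin/sh
# Remove the leading "[Day Mon DD HH:MM:SS YYYY] " timestamp from each line.
strip_timestamps() {
    sed 's/^\[[^]]*] //' "$1"
}

# Print the lines that differ between two logs, ignoring timestamps.
# Returns 0 if the logs match, 1 if they differ (same as diff).
diff_logs() {
    a=$(mktemp) && b=$(mktemp) || return 2
    strip_timestamps "$1" > "$a"
    strip_timestamps "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

With the two runs saved to files, `diff_logs mine.log yours.log` (file names are placeholders) would print only lines such as the "59289" versus "59282" counts, making the first point of divergence easy to spot.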

@mourisl
Collaborator

mourisl commented Mar 7, 2021

@gthunger I just fixed an issue that seems to mostly affect macOS. Could you please "git pull" the new code and give it a try? Thank you.

@mourisl
Collaborator

mourisl commented Mar 10, 2021

Hi @grhogg , I think this update could also fix the crash issue on your data. Though it was a few months ago, could you please give it a try if you still need the results? Thank you.

@arquina

arquina commented Apr 19, 2021

I have the same issue using TRUST4. Larger BAM files seem to produce more errors. How can I get some help with this problem?

@mourisl
Collaborator

mourisl commented Apr 19, 2021

@arquina Is your TRUST4 version from "git clone" ? Could you please show me the screen output so that we can see which stage it crashed on? Thank you.

@arquina

arquina commented Apr 19, 2021

Yes! I downloaded TRUST4 using "git clone" and ran make clean and make.
I'll attach some of the errors. I'm running TRUST4 on TCGA data, and several files show memory errors,
such as free() or malloc errors.
image

@mourisl
Collaborator

mourisl commented Apr 19, 2021

@arquina Thank you for providing the information. I think I've found and fixed the bug. Could you please "git pull" the updates and give it a try? Thank you.

@arquina

arquina commented Apr 20, 2021

Thank you. I updated the code and reran TRUST4. The error has changed, but there is still an error. Here is the output.

double free or corruption (!prev)
system /home/seob/DIP/TRUST4/trust4 -f /home/seob/DIP/TRUST4/hg38_bcrtcr.fa -t 50 -o /home/seob/DIP/DIP_data/TCGA_new/COAD/cancer/trust_result/TCGA-3L-AA1B-01A/TCGA-3L-AA1B-01A -1 /home/seob/DIP/DIP_data/TCGA_new/COAD/cancer/trust_result/TCGA-3L-AA1B-01A/TCGA-3L-AA1B-01A_toassemble_1.fq -2 /home/seob/DIP/DIP_data/TCGA_new/COAD/cancer/trust_result/TCGA-3L-AA1B-01A/TCGA-3L-AA1B-01A_toassemble_2.fq failed: 134 at /home/seob/DIP/TRUST4/run-trust4 line 47.

image

@mourisl
Collaborator

mourisl commented Apr 20, 2021

That is strange. I just tested and checked the memory access pattern again; it looked fine on our server, and I obtained the same numbers as in your output. Just to make sure: did you run "make clean" before rebuilding TRUST4? Thank you.

@arquina

arquina commented Apr 20, 2021

Oh! It works! Thank you for your kind answer. I will ask again later if I run into other issues running TRUST4. Thank you.
