Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weird encoding for reference name #66

Open
twaddlac opened this issue Jun 26, 2023 · 8 comments
Open

weird encoding for reference name #66

twaddlac opened this issue Jun 26, 2023 · 8 comments

Comments

@twaddlac
Copy link

twaddlac commented Jun 26, 2023

I'm trying to run perbase depth-only in.bam and I am getting the following output:

image

The bam file isn't corrupt (according to samtools) and works with other parts of my pipeline.

This reproduced when running base-depth as well, and replicated by other users.

Running perbase v0.8.5

Please let me know if I can provide any more information!

@sstadick
Copy link
Owner

sstadick commented Oct 9, 2023

Hi! could you share the bamfile / part of the bamfile that leads to this? Or paste in the bam header output from samtools when used on that bamfile as well.

@davidecarlson
Copy link

I'm experiencing the same issue with a few different references I'm trying to get perbase to work with:

REF     POS     DEPTH   A       C       G       T       N       INS     DEL     REF_SKIP        FAIL    NEAR_MAX_DEPTH
^@^@^@^@^@^@^@^P<9B>^C<F0><FF>^?^@^@!   9       26      0       0       22      0       4       0       0       0       0       false
^@^@^@^@^@^@^@<E0>}^A<F0><FF>^?^@^@!    10      98      0       0       0       83      15      0       0       0       0       false
^@^@^@^@^@^@^@^P~^A<F0><FF>^?^@^@!      11      154     0       123     0       0       31      0       0       0       0       false
^@^@^@^@^@^@^@p<E1>^@<F0><FF>^?^@^@!    12      161     0       0       0       123     38      0       0       0       0       false
^@^@^@^@^@^@^@<80><CC>^@<F0><FF>^?^@^@! 13      167     134     0       0       0       33      0       0       0       0       false
^@^@^@^@^@^@^@`%^E<F0><FF>^?^@^@!       14      675     0       546     0       1       128     0       0       0       0       false
^@^@^@^@^@^@^@^@<B9>^C<F0><FF>^?^@^@!   15      716     0       0       581     0       135     0       0       0       0       false
^@^@^@^@^@^@^@<E0>:^E<F0><FF>^?^@^@!    16      782     0       0       1       588     193     0       0       0       0       false
^@^@^@^@^@^@^@^P;^E<F0><FF>^?^@^@!      17      870     0       1       556     0       313     0       0       0       0       false
^@^@^@^@^@^@^@^P^U^L<F0><FF>^?^@^@!     18      940     0       0       710     0       230     0       0       0       0       false
^@^@^@^@^@^@^@^P^F^L<F0><FF>^?^@^@!     19      973     832     0       0       0       141     0       0       0       0       false

Here is my bam file header:

@HD	VN:1.6	SO:coordinate
@SQ	SN:DENV3-E-W_Min_preMVS_P3_CDX430_v2	LN:10708
@PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa mem -t 96 /references/E3093-3_S3_ref.fasta /bbmerge_out/E3093-3_S3_merged_perfect.fastq
@PG	ID:samtools	PN:samtools	PP:bwa	VN:1.18	CL:samtools view -bS -
@PG	ID:samtools.1	PN:samtools	PP:samtools	VN:1.18	CL:samtools sort -@ 96 --output-fmt BAM -o ../bwa-mem_out/bbmerged/E3093-3_S3.sorted.bam ../bwa-mem_out/bbmerged/E3093-3_S3.bam
@PG	ID:MarkDuplicates	VN:3.1.0	CL:MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=10000 INPUT=[../bwa-mem_out/bbmerged/E3093-3_S3.sorted.bam] OUTPUT=../bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.bam METRICS_FILE=../bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.metrics REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[/gpfs/projects/GenomicsCore/codagenix/scripts/tmp] VALIDATION_STRINGENCY=LENIENT    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false FLOW_MODE=false FLOW_QUALITY_SUM_STRATEGY=false USE_END_IN_UNPAIRED_READS=false USE_UNPAIRED_CLIPPED_END=false UNPAIRED_END_UNCERTAINTY=0 FLOW_SKIP_FIRST_N_FLOWS=0 FLOW_Q_IS_KNOWN_END=false FLOW_EFFECTIVE_QUALITY_THRESHOLD=15 ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false	PN:MarkDuplicates
@PG	ID:samtools.2	PN:samtools	PP:samtools.1	VN:1.18	CL:samtools view -H E3093-3_S3.sorted.markdup.bam

I've also noticed that if I supply the reference fasta file in my perbase command, I get the following error:

[2023-11-13T19:38:56Z INFO  perbase_lib::par_granges] Using 94 worker threads.
[2023-11-13T19:38:56Z INFO  perbase_lib::par_granges] Creating channel of length 189246910 (* 120 bytes to get mem)
[2023-11-13T19:39:04Z INFO  perbase_lib::par_granges] Reading from "bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.bam"
[2023-11-13T19:39:04Z INFO  perbase_lib::par_granges] Processing TID 0:0-10708
thread '<unnamed>' panicked at 'Fetched reference sequence: Unknown sequence name: ����!.', src/commands/base_depth.rs:275:30
stack backtrace:
   0:     0x55555575ea10 - std::backtrace_rs::backtrace::libunwind::trace::ha9053a9a07ca49cb
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x55555575ea10 - std::backtrace_rs::backtrace::trace_unsynchronized::h9c2852a457ad564e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55555575ea10 - std::sys_common::backtrace::_print_fmt::h457936fbfaa0070f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x55555575ea10 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h5779d7bf7f70cb0c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x5555556b9bfe - core::fmt::write::h5a4baaff1bcd3eb5
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/fmt/mod.rs:1232:17
   5:     0x555555738ea4 - std::io::Write::write_fmt::h4bc1f301cb9e9cce
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/io/mod.rs:1684:15
   6:     0x55555576023f - std::sys_common::backtrace::_print::h5fcdc36060f177e8
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x55555576023f - std::sys_common::backtrace::print::h54ca9458b876c8bf
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x55555575fe3f - std::panicking::default_hook::{{closure}}::hbe471161c7664ed6
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:271:22
   9:     0x555555760e58 - std::panicking::default_hook::ha3500da57aa4ac4f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:290:9
  10:     0x555555760e58 - std::panicking::rust_panic_with_hook::h50c09d000dc561d2
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:692:13
  11:     0x555555760914 - std::panicking::begin_panic_handler::{{closure}}::h9e2b2176e00e0d9c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:583:13
  12:     0x555555760876 - std::sys_common::backtrace::__rust_end_short_backtrace::h5739b8e512c09d02
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x555555760861 - rust_begin_unwind
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
  14:     0x5555555d60b2 - core::panicking::panic_fmt::hf33a1475b4dc5c3e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
  15:     0x5555555d63e2 - core::result::unwrap_failed::hdff5465d74574b44
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1750:5
  16:     0x5555555f669a - <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::next::h4ec7ffc6086dfa1a
  17:     0x555555624d34 - rayon::iter::plumbing::bridge_producer_consumer::helper::h40d19706cb30b716
  18:     0x5555555ef2ca - rayon_core::join::join_context::{{closure}}::he92c561eced12757
  19:     0x5555555e6b56 - rayon_core::thread_pool::ThreadPool::install::{{closure}}::h185c0d76b416e80d
  20:     0x555555632daf - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb3e400f8a6fd84fd
  21:     0x5555555da5ed - rayon_core::registry::WorkerThread::wait_until_cold::h441bdd8e1d8196e3
  22:     0x5555556dc9ba - std::sys_common::backtrace::__rust_begin_short_backtrace::h933f37f9b851e73f
  23:     0x5555556dc74d - core::ops::function::FnOnce::call_once{{vtable.shim}}::h1b5aaf29acd6a35f
  24:     0x555555761b45 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h39990b24eedef2ab
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  25:     0x555555761b45 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h01a027258444143b
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  26:     0x555555761b45 - std::sys::unix::thread::Thread::new::thread_start::ha4f1cdd9c25884ba
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys/unix/thread.rs:108:17
  27:     0x7ffff79af1cf - start_thread
  28:     0x7ffff7094e73 - clone
  29:                0x0 - <unknown>

As far as I can tell, there is nothing wrong with either my reference fasta file or the input bam file. Other programs (samtools, bwa, bbmap, etc.) don't report any issues with them.

Any idea what the problem could be? I'm happy to share the full fasta and/or bam files if it would be useful.

Best,
Dave

@sstadick
Copy link
Owner

If you had a minimal example with a bamfile and reference that would be awesome!

I'm really not sure what the issue is. This function is what would generate the ref_seq name, which in these outputs looks very mangled (breadcrumb for myself looking at this again later)

@davidecarlson
Copy link

Sure thing. The reference is very small (~10 Kb) and I've extracted a subset of 10k BAM entries that produce the weird results.

Thanks for taking a look!

Best,
Dave
example.zip

@davidecarlson
Copy link

Quick update - I reinstalled the latest version of perbase, and I'm no longer seeing the same errors on the exact same reference fastas and bam files.

I don't know what caused the issue in the first place, but it's working now, so I'm happy! :)

Thanks!
Dave

@twaddlac
Copy link
Author

@sstadick Sorry for the delay here! @davidecarlson thanks for chiming in as well. I installed 0.9.0 with no avail. I'll share the data here once I have clearance to do so (hopefully I will soon).

@sstadick
Copy link
Owner

@davidecarlson I'm glad that worked!
@twaddlac 👍

@25shmeckles
Copy link

25shmeckles commented Nov 25, 2024

I am having a similar issue with the latest version (0.10.0). The REF encoding is weird... It is as expected at first, then it suddenly changes, then it flips back to normal. It does so a few times on my bam. The bam file appears to be ok.

perbase only-depth --keep-zeros input.bam > output.tsv

This is a section of the output file showing the sudden change in the REF column:

PIK3CA_8__chr3_179217801-179218900	1008	1009	21
PIK3CA_8__chr3_179217801-179218900	1009	1010	13
PIK3CA_8__chr3_179217801-179218900	1010	1011	10
PIK3CA_8__chr3_179217801-179218900	1011	1012	9
PIK3CA_8__chr3_179217801-179218900	1012	1013	7
PIK3CA_8__chr3_179217801-179218900	1013	1015	6
PIK3CA_8__chr3_179217801-179218900	1015	1017	5
PIK3CA_8__chr3_179217801-179218900	1017	1019	3
PIK3CA_8__chr3_179217801-179218900	1019	1021	1
PIK3CA_8__chr3_179217801-179218900	1021	1100	0
"�������À•	��`��%��"	1	1231	0
"������� H
��`��#�"	1	1118	0
PIK3CA_12-13__chr3_179223805-179225014	1	1210	0
"�������Ðl	��`��#�"	1	922	0
PIK3CA_15-17__chr3_179229049-179230492	1	1444	0
"����������`��#�"	1	73	0
"�������0��`��#�"	73	80	1
"�������`��`��#�"	80	81	3
"����������`��#�"	81	82	5
"�������ÀÀ
��`��#�"	82	83	10

If I run perbase base-depth instead I get the following error:

[2024-11-25T15:57:33Z INFO  perbase_lib::par_granges] Processing TID 0:0-1398
[W::hts_idx_load3] The index file is older than the data file: /xxx/xxx/xxx/wf-alignment_01JD9GWY0GVRWXA0EHERBSN1C2/output/fastq_pass.sorted.aligned.bam.bai
thread '<unnamed>' panicked at /Users/xxx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-htslib-0.38.2/src/bam/record.rs:1539:6:
index out of bounds: the len is 0 but the index is 616
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

with the RUST_BACKTRACE set to 1 I get:

[W::hts_idx_load3] The index file is older than the data file: /Volumes/xxx/xxx/wf-alignment_01JD9GWY0GVRWXA0EHERBSN1C2/output/fastq_pass.sorted.aligned.bam.bai
thread '<unnamed>' panicked at /Users/xxx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-htslib-0.38.2/src/bam/record.rs:1539:6:
index out of bounds: the len is 0 but the index is 616
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic_bounds_check
   3: <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::next
   4: rayon::iter::plumbing::bridge_producer_consumer::helper
   5: rayon_core::join::join_context::{{closure}}
   6: rayon_core::thread_pool::ThreadPool::install::{{closure}}
   7: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
   8: rayon_core::registry::WorkerThread::wait_until_cold
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

and with the full backtrace:

[W::hts_idx_load3] The index file is older than the data file: /xxx/xxx/xxx/wf-alignment_01JD9GWY0GVRWXA0EHERBSN1C2/output/fastq_pass.sorted.aligned.bam.bai
thread '<unnamed>' panicked at /Users/xxx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-htslib-0.38.2/src/bam/record.rs:1539:6:
index out of bounds: the len is 0 but the index is 616
stack backtrace:
   0:        0x1018d53a6 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hcaf66bc4c0c453df
   1:        0x1017f603b - core::fmt::write::hc9c5f1836b413410
   2:        0x1018a4dd2 - std::io::Write::write_fmt::h49df280499063c09
   3:        0x1018d6a88 - std::panicking::default_hook::{{closure}}::h52c0b2f44f6107c5
   4:        0x1018d6701 - std::panicking::default_hook::h5a6cf31501c161b2
   5:        0x1018d814d - std::panicking::rust_panic_with_hook::hda4640ee332466e9
   6:        0x1018d73f5 - std::panicking::begin_panic_handler::{{closure}}::haa3060694b34ea3d
   7:        0x1018d7359 - std::sys::backtrace::__rust_end_short_backtrace::h8eb44913cfe71457
   8:        0x1018d734c - _rust_begin_unwind
   9:        0x1019ea7ea - core::panicking::panic_fmt::h31edc3d6ff0aadca
  10:        0x1019ea796 - core::panicking::panic_bounds_check::h5432ed6a32628a0d
  11:        0x10171a1b1 - <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::next::h18a341bd5498cd28
  12:        0x10174e972 - rayon::iter::plumbing::bridge_producer_consumer::helper::h4211baf774743063
  13:        0x101712f53 - rayon_core::join::join_context::{{closure}}::ha9193897d34d74cf
  14:        0x10170d828 - rayon_core::thread_pool::ThreadPool::install::{{closure}}::h09ba94b1c5e72919
  15:        0x10175c11b - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h3ff36cf52e19afbd
  16:        0x1019ece82 - rayon_core::registry::WorkerThread::wait_until_cold::h82cbc48dcf54a2de
  17:        0x1018128bf - std::sys::backtrace::__rust_begin_short_backtrace::hf6c2d7c84a523c27
  18:        0x1018126d4 - core::ops::function::FnOnce::call_once{{vtable.shim}}::he6517396c648279b
  19:        0x1018d8c1b - std::sys::pal::unix::thread::Thread::new::thread_start::h55ff15b5f2276bcd
  20:     0x7ff8015ff18b - __pthread_start

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants