Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10X whitelist issue #74

Open
Nusob888 opened this issue Aug 28, 2021 · 36 comments
Open

10X whitelist issue #74

Nusob888 opened this issue Aug 28, 2021 · 36 comments

Comments

@Nusob888
Copy link

Hi,

Sorry for spamming the issues. I am having an error come up whenever I try to add a whitelist.

essentially the run report gives me no additional explanation other than:
failed: 11 at run-trust4 line 48

I have tried to look over the script but can't really identify why it should fail there.

If I remove the barcodeWhitelist parameter, it carries on as normal. I have also triple checked the whitelists and can match them to the dataset fine.

Is it possible to check if this is a reproducible error? or something specific to my run?

It occurs on all datasets I have tried it on.

Thanks

@mourisl
Copy link
Collaborator

mourisl commented Aug 28, 2021

Could you please show me the full command you used? Thanks.

@Nusob888
Copy link
Author

Hi,

sure thing, this was for chemistry v3, but I have tried it for V1 and V2, same thing happens. Each time I adjusted the ranges to fit the chemistries

`
run-trust4 -t 24 -f ~/hg38_bcrtcr.fa --ref ~/human_IMGT+C.fa -u ~/XXXXX_2 --barcode ~/XXXXX_1, --barcodeRange 0 15 + --barcodeWhitelist ~/3M-february-2018.txt --UMI ~/XXXXX_1 --umiRange 16 27 + --od ~/outdir/

`

@mourisl
Copy link
Collaborator

mourisl commented Aug 28, 2021

Could you please show me the full on-screen output from TRUST4? I need to make sure that the crash happened in the fastq-extractor program. Thanks.

@Nusob888
Copy link
Author

Sorry, our cluster locks out copy and pasting the output and I have to keep the study IDs of the samples confidential. But yes it occurs just as the fastq-extractor starts. It then prints out "system" followed by the input commands as above. Then followed by failed: 11 at run-trust4 line 48.

Is that enough? if not I can email you a screen capture

@mourisl
Copy link
Collaborator

mourisl commented Aug 28, 2021

Just want to make sure, the 3M-february-2018.txt file is compressed in the cellranger package, so have you decompressed it before running TRUST4?

Did you see the line " Start to extract candidate reads from read files." output on the screen?

@Nusob888
Copy link
Author

Hi, yes I took it from a local installation of cellranger and uploaded it on our cluster within a custom directory that I then supply the absolute path to.

The reason was because our cell ranger is a module installation that cannot be edited by end-users.

But the file is the decompressed txt.

What I see is:

[Sat Aug 28 19:42:51 2021] Start to extract candidate reads from read files. system ~/TRUST4/fastq-extractor -t 24 -f ~/hg38_bcrtcr.fa -o ~/out_dir//TRUST_XXXXX_2_toassemble --barcodeStart 0 --barcodeEnd 15 --barcodeWhitelist ~/3M-february-2018.txt --umiStart 16 --umiEnd 27 -u ~/XXXX_2.fastq.gz --barcode ~/XXXX_1.fastq.gz --UMI ~/XXXX_1.fastq.gz failed: 11 at ~/TRUST/run-trust4 line 48

@mourisl
Copy link
Collaborator

mourisl commented Aug 28, 2021

This is very strange. Could you please show me the first few lines for XXX_1.fastq.gz, XXX_2.fastq.gz and 3M-february-2018.txt files?

@Nusob888
Copy link
Author

it is just the absolute paths to the locations:

for our system its
/groupdir/users/username/projects/projectx/gex/XXX_1.fastq.gz /groupdir/users/username/projects/projectx/gex/XXX_2.fastq.gz /groupdir/users/username/reference/cellranger_whitelists/v3/3M-february-2018.txt

I would be surprised if that is an issue as trust4 runs to completion when I do not include the whitelist. It will extract the barcodes and umis and reads fine and generate the outputs.

It seems to be something to do with the whitelist matching and correction. The only thing might be if you call directly on any shared libraries or /bin located files, we do not have access to those. Or if under the hood you are calling on cell ranger? but I didn't see that in the script.

@mourisl
Copy link
Collaborator

mourisl commented Aug 28, 2021

Sorry, I mean the first few lines of the content in those files, so I can check whether the lengths, especially in the whitelist file, match the files on our server. Thanks.

@Nusob888
Copy link
Author

Oh right. sorry.

R1
@XXXXX.1 1 length=28 CNAAGCGAGCTTACGACGCCTTTCATGC +XXXXX.1 1 length=28 F#F,FF,FFFFFFF:,FFFFFFFFFFFF @XXXXX.2 2 length=28 AATGCCAAGAACGCGTAGACATCTAGCT +XXXXX.2 2 length=28 FFFFFFFFFFFFFFFFFFFFFFFFFFFF @XXXXX.3 3 length=28 CCTCACACACAACCGCTTCTGGTGCAGA

R2
@XXXXX.1 1 length=151 GAAGAATGGTACAAATCCAAGTTTGCTGACCTCTCTGAGGCTGCCAACCGGAACAATGACGCCCTGCGCCAGGCAAAGCAGGAGTCCACTGAGTACCGGAGACAGGTGCAGTACCTCACCTGTGAAGTGGATGCCCTTAAAGGAACCAAAA +XXXXX.1 1 length=151 FFFFFFF:FFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFF:F:FFFF:FFFFFFFFF::FFFFFFFFFFF,FFFFFFFFFFFFFFFFFF,FFFFF:FFFFFFFFFFF,FFFFFF:FFFFFFFFFF,FFF:FFFFFFFF:,,FF:F: @XXXXX.2 2 length=151 AGCCACTGCGCCCAGCCGGTCTTACCCACTTTTATATATATTTGAAACACAGTGGAGTAGATACCTAATAAATATTTGTTAGATTTATCTGAAGAAACTTAGATTAAATTCACTCGGAAGGTGCCTTATGTCAGGGATACTAAAAAATTAT +XXXXX.2 2 length=151 FFFFFFFFF,FFFFFFF,FFFFF:FFFFF::FFFFFFF:F:FFFFFFFFF,FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:FF,FF,FFFFFFF:FFFFFFFFFFF,FFFFFFF:FF,FF,,FFFFF,F,:,F::,F:FFFF @XXXXX.3 3 length=151 GCCTGGTGGCTCACACCTGTAATCCTAGCACTTTAGGAGGCCAAGGGGGGTGGATCACGAGGTCAGGGGTTCGAGACCAGCCTGACCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCTGGGCGTGGTGGCAGGCGCCT

whitelist:
AAACCCAAGAAACACT AAACCCAAGAAACCAT AAACCCAAGAAACCCA AAACCCAAGAAACCCG AAACCCAAGAAACCTG AAACCCAAGAAACGAA AAACCCAAGAAACGTC AAACCCAAGAAACTAC AAACCCAAGAAACTCA AAACCCAAGAAACTGC AAACCCAAGAAACTGT AAACCCAAGAAAGAAC AAACCCAAGAAAGACA AAACCCAAGAAAGCCT AAACCCAAGAAAGCGA AAACCCAAGAAAGGAT AAACCCAAGAAAGGTA AAACCCAAGAAAGTCT AAACCCAAGAAATAGG AAACCCAAGAAATCCA AAACCCAAGAAATGAG

@Nusob888
Copy link
Author

Sorry, can I just add a request onto the same issue chain.

Is it possible to index the annot.fa with the report.tsv or cdr3.out?

or Add read counts into the annot.fa names?

I am trying to re-annotate the annot.fa assembly contigs with Igblast but there is no way to index back to the read counts. Since annot.fa is a consensus contig, the read fragment count is lost.

This would be really helpful as a feature since I predominantly use Immcantation as my analysis workflow to get mutation counts etc. But in its current state, I cannot do it in the same way as I can with MIXCR. In MIXCR indexing is possible as they provide the "targetsequence" (consensus assembly) in their report file.

Thanks very much for the consideration.

@mourisl
Copy link
Collaborator

mourisl commented Aug 30, 2021

The first column in the cdr3.out file is the contig id in the annot.fa file, which can be used for index. Or do you mean other way of indexing?
The report.tsv file is the summary of cdr3.out file, and the second last column is the representative's contig id in the annot.fa file.

For the whitelist issue, it finished without any error on our data even when testing with different settings. I may need some more time to test. In our previous experiment, the barcode correction marginally improved the results.

@Nusob888
Copy link
Author

Thanks for the update on the whitelist issue and ongoing troubleshooting.

For the indexing, what I have noticed, is that there are multiple duplicated assemble ids. e.g. assemble0 occurs up to 20 times for me.

When I look closer, this is the case in the report also, and it seems this is down to the fact that trust4 allows for SMH, and therefore multiple cids can be assigned to different sequence contigs.

The cdr3. out file nicely contains a consensus_id index, but this is not present in the annot.fa and therefore we cannot index the exact sequence in the annot.fa file to the report or cdr3.out file. Also the cdr3.out file oddly has read counts that look like averages? 76.08 for example, whereas the report has round numericals

@mourisl
Copy link
Collaborator

mourisl commented Aug 30, 2021

The sequence contig in the annot.fa file is the consensus and encodes/compresses highly similar sequences, such as from SHMs as you mentioned. Therefore, in the _cdr3.out file, TRUST4 also outputs those minor CDR3s encoded in the consensus. In other words, the CDR3s are all from the sequence in annot.fa, and they are listed by the second column "index_within_consensus" in cdr3.out. I think for the CDR3s from the same contig/consensus, you can pick the one with highest abundance as representative.

As for the decimal numbers of the abundance, it is because that when TRUST4 tries to decode the CDR3 encoded in the consensus, a read partially overlapped with CDR3 region could support more than 1 CDR3 type. Therefore, TRUST4 applied EM algorithm to quantify the CDR3s from the same consensus.

Hope these explanations help.

@Nusob888
Copy link
Author

Thanks, that makes sense.

However, I remain a little confused how trust4 works in this case:

  • Is the consensus assembly then the most abundant assembled contig from a group of similar receptor sequences?
  • Why does trust4 not output a fasta of all assembled contigs? If you group sequences under a single consensus, are you not losing information about somatic hypermutation per recovered contig? For BCR analysis this is an important aspect when looking at clonal lineages. Many prefer to use custom clone definitions of varying stringency.

Lastly, I would prefer to index all annot.fa contigs with those in the cdr3.out or report.tsv files. The trouble with picking a highly abundant one is that the nucleotide sequences are not identical, therefore for diversity analysis, you are losing information if you just pick by consensus or abundance for use in other tools.

This seems like it could be easily resolved by adding a index_within_consensus ID into the annot.fa file?

@mourisl
Copy link
Collaborator

mourisl commented Aug 30, 2021

Yes, you can regard the consensus assembly being the most abundant clonotype.
The receptor sequence is much longer than a read, for example, it is difficult to decide whether two variations in V and J genes respectively are from the same receptor sequence. So TRUST4 only tries to recover the encoded sequence in CDR3, which in many cases can be fully covered by one read.

I think I can write a simple script to break up consensuses in annot.fa file with information from _cdr3.out file, so they have a one-to-one mapping.

@Nusob888
Copy link
Author

That would be great. Thank you.

Alternatively, if the fully assembled contig could be added to the report.tsv and cdr3.out file, that could be good as a user can then generate a fasta file directly from the contig.

re: how trust4 recovers sequences:
Does this mean that trust4 does not actually assemble a full contig per B cell receptor variation?

In a B cell receptor I would consider any variations of V, junction, J to be potentially relevant and warrants assembly into a full contig so long as the read fragments were of good quality and the overlaps with high confidence.

This is how Bracer and MIXCR handle their contigs.

I understand the logic from the methods section of trust4, that from a computational efficiency perspective the consensus system is faster. However, from an analysis side, grouping to a single assembly loses a lot of information and granularity on how diverse an individual clonal family is. Many BCR analysis specifically look at the phylogeny of individual clones of interest to understand clonal selection/affinity maturation.

Is there a possibility to generate variations of a consensus sequence by knitting reads of variance into their own individual contigs. e.g. if a read has 2 mutations in CDR1 region, but otherwise aligns exactly to the consensus, then annot.fa will add an additional contig that is the consensus but deviates in those 2 nucleotides in CDR1?

@mourisl
Copy link
Collaborator

mourisl commented Aug 30, 2021

If there are enough variations, TRUST4 will break them up into different contigs. The full contigs are for the highly expressed receptors.
I think Bracer is for single-cell assemblies. In the single-cell setting, TRUST4 will not mix the variations from different cells into the same consensus.
As for the last part, if a read has 2 mutations in CDR1, but there could be mutations in CDR2 or CDR3 from other reads, and it is very difficult to knit them together in bulk RNA-seq setting.

@Nusob888
Copy link
Author

Thanks for explaining, that makes sense now.

In that case the only desired addition, if possible, would be to break up annot.fa to match cdr3.out and report.tsv or to have consensus full length sequences as an additional column in the report.tsv. That would help a lot with indexing back to the read counts for all within consensus ids

@mourisl
Copy link
Collaborator

mourisl commented Aug 31, 2021

I just add a perl script AddSequenceToCDR3File.pl to the scripts folder in the github repo. You can run it as:
perl AddSequenceToCDR3File trust_cdr3.out trust_annot.fa > combined_cdr3.out
It will add the assembled contig sequence to each row in the cdr3 file and also change the CDR3 sequence in the contig accordingly.

Please let me know whether it works.

@cwxiix
Copy link

cwxiix commented Aug 31, 2021

Hello,
Sorry I left a comment here and hope didn't interrupt the discussion. I met the same error with whitelist. From my questions before I knew I need barcode whitelist correction. Now when I try to process it, I met same issue. Thus, I left a comment here. Next is my code and output message in terminal:

[000003 ~]$ ~/software/TRUST4/run-trust4 -t 8 -u ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R2_001.fastq.gz --barcode ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R1_001.fastq.gz --barcodeRange 0 15 + --barcodeWhitelist ~/project/sc/data/737K-august-2016.txt -f ~/software/TRUST4/hg38_bcrtcr.fa --ref ~/software/TRUST4/human_IMGT+C.fa -o ~/project/sc/output/trust4/gex_list/
[Tue Aug 31 10:36:49 2021] TRUST4 begins.
[Tue Aug 31 10:36:49 2021] SYSTEM CALL: ~/software/TRUST4/fastq-extractor -t 8 -f ~/TRUST4/hg38_bcrtcr.fa -o ~/project/sc/output/trust4/gex_list/_toassemble --barcodeStart 0 --barcodeEnd 15 --barcodeWhitelist ~/project/sc/data/737K-august-2016.txt -u ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R2_001.fastq.gz --barcode ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R1_001.fastq.gz
[Tue Aug 31 10:36:50 2021] Start to extract candidate reads from read files.
system ~/software/TRUST4/fastq-extractor -t 8 -f ~/software/TRUST4/hg38_bcrtcr.fa -o ~/project/sc/output/trust4/gex_list/_toassemble --barcodeStart 0 --barcodeEnd 15 --barcodeWhitelist ~/project/sc/data/737K-august-2016.txt -u ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R2_001.fastq.gz --barcode ~/project/sc/data/fastq/vdj_nextgem_hs_pbmc3_5gex_protein_fastqs/vdj_nextgem_hs_pbmc3_5gex_protein_gex_fastqs/R1_001.fastq.gz failed: 139 at ~/software/TRUST4/run-trust4 line 48.

Here I changed all the home directory to '~' to delete my names in it. Also sorry for the long folder names of data.

I noticed one thing that in README the example code used --barcodeWhiteList and the parameter is --barcodeWhitelist with lower case in L of List. Will this make any differences?

Please tell me more about these. Thank you.

@mourisl
Copy link
Collaborator

mourisl commented Aug 31, 2021

That is a typo in README. It should be --barcodeWhitelist. I will fix that right away. Thank you!

@Nusob888
Copy link
Author

Nusob888 commented Sep 1, 2021

Hi @mourisl , this works a treat! thank you so much for that!

I will keep the thread open for @cwxiix and the whitelist issue, but this solves the indexing

@mourisl
Copy link
Collaborator

mourisl commented Sep 7, 2021

I tried several different data sets on our server and could not reproduce this whitelist error. What system were you using? Was it MacOS?

@cwxiix
Copy link

cwxiix commented Sep 7, 2021

For me, I am using a Xshell terminal to connect to a server and the system is CentOS Linux release 7.5.1804 (Core) . Is this information helpful?

@Nusob888
Copy link
Author

Nusob888 commented Sep 7, 2021

Our cluster structure is a little complex, but each node runs a version of linux. Just to sanity check, are there any hardcoded parts in the script that require the whitelist to be in a cellranger library? because I stored the whitelists in a project specific directory. I had a look and it didn't appear to be the case, but thought I would check.

@ttab963
Copy link

ttab963 commented Sep 12, 2021

Hi,

I have a same problem with 'Nusob888'.
The running code is here.

run-trust4 -f /mnt/S3/data2/workbench/Users/ktkim/program/TRUST4/hg38_bcrtcr.fa \
--ref /mnt/S3/data2/workbench/Users/ktkim/ref/tenX/refdata-gex-GRCh38-2020-A/fasta/genome.fa \
-u /data/rawdata/scHCC/TBD190769_20200514/01_fastq_file/46_N/*_R2_*.fastq.gz \
--barcode /data/rawdata/scHCC/TBD190769_20200514/01_fastq_file/46_N/*_R1_*.fastq.gz \
--barcodeRange 0 15  \
--barcodeWhitelist /mnt/S3/data2/workbench/Users/ktkim/program/cellranger4.0.0/lib/python/cellranger/barcodes/translation/3M-february-2018.txt

and error is like this.

failed: 11 at /mnt/S3/data2/workbench/Users/ktkim/program/TRUST4/run-trust4 line 48.

Those fasta files came from 10X, and some infomation is here.

R1 :

@A00930:41:HWGVCDSXX:3:1101:5104:1000 1:N:0:AAGCAGTC
NCAATCTTCACGAAGGCCCAGCCGCA
+
#FFFFFFFFFFFFFFFFFFFFFFFFF

R2 :

@A00930:41:HWGVCDSXX:3:1101:5104:1000 2:N:0:AAGCAGTC
GACATCCGAGGATGGATTGAAGGTAGGATAGGGGCTCACCGCTGATCCGGGACCACCTTTGGATGACTTCACAGTTTGAACATATTCCTGCTCTTCAT
+
FFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFF

whitelist came from cellranger :

AAACCCAAGAAACACT        AAACCCATCAAACACT
AAACCCAAGAAACCAT        AAACCCATCAAACCAT
AAACCCAAGAAACCCA        AAACCCATCAAACCCA
AAACCCAAGAAACCCG        AAACCCATCAAACCCG
AAACCCAAGAAACCTG        AAACCCATCAAACCTG
AAACCCAAGAAACGAA        AAACCCATCAAACGAA
AAACCCAAGAAACGTC        AAACCCATCAAACGTC
AAACCCAAGAAACTAC        AAACCCATCAAACTAC
AAACCCAAGAAACTCA        AAACCCATCAAACTCA
AAACCCAAGAAACTGC        AAACCCATCAAACTGC

Thanks to read.

@mourisl
Copy link
Collaborator

mourisl commented Sep 12, 2021

@ttab963 I think the reason for your case might be different. There are two issues with your command:

  1. --barcodeRange option should also have a third value to specify the strand, so it should be "--barcodeRange 0 15 +"
  2. This issue does not relate to the barcode. The --ref option is not for the reference genome, it is for the IMGT reference, so it should be "--ref /mnt/S3/data2/workbench/Users/ktkim/program/TRUST4/human_IMGT+C.fa".

@ttab963
Copy link

ttab963 commented Sep 13, 2021

Thanks for your reply.
I changed the code, but same error occurs.

nohup run-trust4 -f /mnt/S3/data2/workbench/Users/ktkim/program/TRUST4/hg38_bcrtcr.fa \
--ref /mnt/S3/data2/workbench/Users/ktkim/program/TRUST4/human_IMGT+C.fa \
-u /data/rawdata/scHCC/TBD190769_20200514/01_fastq_file/46_N/*_R2_*.fastq.gz \
--barcode /data/rawdata/scHCC/TBD190769_20200514/01_fastq_file/46_N/*_R1_*.fastq.gz \
--barcodeRange 0 15 + \
--barcodeWhitelist /mnt/S3/data2/workbench/Users/ktkim/program/cellranger4.0.0/lib/python/cellranger/barcodes/translation/3M-february-2018.txt &

@mourisl
Copy link
Collaborator

mourisl commented Sep 13, 2021

@ttab963 Your whitelist is from the cellranger translation folder. It contains two columns maybe for the translation of multiome data. TRUST4 assumes each row contains one barcode, so you should use the barcode in the path /mnt/S3/data2/workbench/Users/ktkim/program/cellranger4.0.0/lib/python/cellranger/barcodes/ instead of that in the translation folder. (You still need to unzip the whitelist there).

@ttab963
Copy link

ttab963 commented Sep 13, 2021

Sorry for bothering you.

I realize my data is actually build with 10x 5' library, so I change whitelist to ~/cellranger-4.0.0/lib/python/cellranger/barcodes/737K-august-2016.txt.
so I fix again and check that whitelist file has just one column.
But same error again.

Cellranger process with those data has any problems so data is not corrupted.
And plus, I also run well with example file or without whitelist option.

Thanks again, @mourisl.

@jmostrom013
Copy link

Hello, I just wanted to mention that I am getting this same error when running TRUST4 with the whitelist (failed: 139 at ./TRUST4/run-trust4 line 48.). It only happens when I run on a remote server running Ubuntu 20.04 -- I was able to run the same analysis with the whitelist successfully on MacOS. Also, when I remove the -barcodeWhitelist option when running on the server, it runs successfully.

Thanks!

@rawlings-lab
Copy link

I have the exact same error at the fastq-extractor when running with my own whitelist, (when removed it works fine):
failed: 11 at /home/cthouv/miniconda3/envs/ngs_bcr/bin/run-trust4 line 48.

Here's my input:
run-trust4 -f /home/cthouv/references/trust4/bcrtcr.fa --ref /home/cthouv/references/trust4/IMGT+C.fa -1 /home/cthouv/bcrseq_t1/seqs/frag2min_S1_L001_R1_001.fastq -2 /home/cthouv/bcrseq_t1/seqs/frag2min_S1_L001_R2_001.fastq --read1Range 20 -1 --barcode /home/cthouv/bcrseq_t1/seqs/frag2min_S1_L001_R1_001.fastq --barcodeRange 0 7 + --barcodeWhitelist /home/cthouv/references/trust4/barcode_whitelist_24.txt --repseq

my barcode_whitelist_24.txt:

GAAGTGCT
CCTCGTTA
CGAACAAC
TCGTCTGA
ATCTGACC
TTGGACTG
CGCCTTAT
GACTACGA
TCGAACCT
TGAGCTGT
CCTAGAGA
GAGTAGAG
ATGAGTGC
ATGTGGAC
AACGTAGC
GGATTCAC
TCAATCCG
ATTCCGCT
CCTACCTA
CGTCTTCA
GTACACCT
TGTCAGTG
AACGCCTT
GACTTGTG

I'm using it on a remote server running CentOS Linux, version 7. Thanks!

@mourisl
Copy link
Collaborator

mourisl commented May 3, 2022

@rawlings-lab Which TRUST4 version are you using? This issue should be fixed with a recent update.

@rawlings-lab
Copy link

My fault on that. I installed through conda and it looks like it installed 1.05 and I hadn't noticed. Got it installed through github and the error is gone. Much appreciated for the quick response.

@czarnewski
Copy link

Hi @Nusob888 @mourisl @jmostrom013 @rawlings-lab ,
I also run into the same error yesterday:

[...] Start to extract candidate reads from read files.
system /domus/h1/paucza/.conda/envs/stag/bin/fastq-extractor -t 6 -f ../data/TRUST4/PRJNA549239/SRR9310044/TRUST_SRR9310044_3_toassemble  --barcodeStart 0 --barcodeEnd 15 -u ../data/FASTQ/PRJNA549239/SRR9310044/SRR9310044_3.fastq.gz --barcode ../data/FASTQ/PRJNA549239/SRR9310044/SRR9310044_2.fastq.gz failed: 139 at /home/paucza/.conda/envs/stag/bin/run-trust4 line 48.

It seems that the issue has nothing to do with the white list, as I was getting the error even on bulk-RNAseq datasets. The issue was that TRUST4 was not creating the output directory specified in --od (as normal for other tools). For me, creating the folder beforehand solved it:
mkdir -p $OUT/TRUST4/$i/$j

# FOR EACH SINGLE_CELL DATASET
for i in $DATASETS_SC; do echo $i 
  RUNS=$(basename -a $OUT/SRA/$i/*) 
  for j in $RUNS; do echo $j 
    mkdir -p $OUT/TRUST4/$i/$j   #  <--------------------------- ADDED THIS LINE 
    run-trust4 \ 
      -f TRUST4/mouse/GRCm38_bcrtcr.fa \
      --ref TRUST4/mouse/mouse_IMGT+C.fa\
      -u $OUT/FASTQ/$i/$j/${j}_3.fastq.gz\
      --barcode $OUT/FASTQ/$i/$j/${j}_2.fastq.gz\
      --od $OUT/TRUST4/$i/$j \  
      --barcodeRange 0 15 + \
      --barcodeWhitelist TRUST4/737K-august-2016.txt\
      -t 1 &
  done
done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants