UMI count question #254
Comments
If you provide UMIs for single-cell data, the reported count is the number of UMIs supporting the corresponding CDR3 rather than the number of reads. |
The assembly step does not use UMI information for single-cell data. After assembly, reads from the same UMI may be mapped to different contigs in the quantification step, so some UMIs will be overcounted. However, the count for the CDR3 of a specific contig from a cell is the number of unique UMIs mapped to that CDR3. In the screenshot, do you mean that the count of 461 (red square) exceeds 48 (the number of UMIs for the cell), so the UMI count is wrong here? |
Yes, why is the count so much higher at this position? Is it an error, possibly stemming from a mistake in my input? If it's an error, I'll reanalyze it. |
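To make the counting rule described above concrete, here is a minimal Python sketch. It is not TRUST4's implementation: the tuple layout `(barcode, contig, cdr3, umi)`, the function names, and the toy reads are all assumptions made for illustration. It shows that a UMI shared by two contigs is counted once per contig (the overcounting mentioned above), while the count for any single CDR3 can never exceed the number of distinct UMIs in the cell.

```python
from collections import defaultdict


def umi_counts_per_cdr3(assignments):
    """Count unique UMIs per (barcode, contig, CDR3).

    `assignments` is an iterable of (barcode, contig, cdr3, umi) tuples,
    one per read. This layout is hypothetical, chosen for illustration.
    """
    umis = defaultdict(set)
    for barcode, contig, cdr3, umi in assignments:
        umis[(barcode, contig, cdr3)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


def distinct_umis_per_cell(assignments):
    """Count distinct UMIs per barcode, ignoring contig and CDR3."""
    per_cell = defaultdict(set)
    for barcode, _contig, _cdr3, umi in assignments:
        per_cell[barcode].add(umi)
    return {barcode: len(seen) for barcode, seen in per_cell.items()}


# Toy data (made up): four reads from one cell.
reads = [
    ("cellA", "contig1", "CASSL", "AAAC"),
    ("cellA", "contig1", "CASSL", "AAAC"),  # duplicate read, same UMI
    ("cellA", "contig1", "CASSL", "GGTT"),
    ("cellA", "contig2", "CAVRD", "AAAC"),  # same UMI mapped to a second contig
]

per_cdr3 = umi_counts_per_cdr3(reads)
per_cell = distinct_umis_per_cell(reads)

# Summed over contigs, UMI "AAAC" is counted twice (overcounting) ...
total_over_contigs = sum(per_cdr3.values())
# ... but no single CDR3 count can exceed the cell's distinct UMI count.
max_single_cdr3 = max(per_cdr3.values())
```

Under this rule, a single CDR3 count of 461 in a cell with only 48 distinct UMIs cannot be explained by overcounting across contigs, which is why it points to a problem elsewhere.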
What was your running command? |
I split read1's cell barcode and unique molecular identifier (UMI) into two separate fastq files, and then I needed to convert the cell IDs, so I used barcodeTranslate. This shouldn't affect anything, right? |
This should be fine. One minor thing: the "-1" for fastq-extractor should be "-u". Otherwise, it will treat the input as a paired-end data set and throw an error about an unequal number of reads. How did you calculate that there were 48 UMIs for this cell? |
I used a script to compute statistics from tcrbcr_bc.fa and tcrbcr_umi.fa. |
Yes, it's "-u" in my pipeline. I made a mistake in the issue description. |
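The script itself was not posted. As a rough sketch of how such a per-cell UMI count could be computed, the Python below assumes that the barcode and UMI FASTA files hold one record per read, in the same order and with matching read IDs, and that reads without a barcode carry the placeholder "missing_barcode". These are assumptions about the file layout, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def read_fasta(path):
    """Yield (read_id, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first whitespace-delimited token as the ID.
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def umis_per_barcode(bc_path, umi_path, skip=("missing_barcode",)):
    """Count distinct UMIs per barcode from two parallel FASTA files.

    Assumes both files list the same reads in the same order; raises
    ValueError if the record counts or read IDs disagree.
    """
    umis = defaultdict(set)
    pairs = zip_longest(read_fasta(bc_path), read_fasta(umi_path))
    for bc_rec, umi_rec in pairs:
        if bc_rec is None or umi_rec is None:
            raise ValueError("barcode and UMI files differ in record count")
        (bc_id, barcode), (umi_id, umi) = bc_rec, umi_rec
        if bc_id != umi_id:
            raise ValueError(f"read ID mismatch: {bc_id} vs {umi_id}")
        if barcode in skip:
            continue  # placeholder barcodes are not real cells
        umis[barcode].add(umi)
    return {barcode: len(seen) for barcode, seen in umis.items()}
```

Calling `umis_per_barcode("tcrbcr_bc.fa", "tcrbcr_umi.fa")` would return a dictionary from barcode to distinct UMI count. Skipping the placeholder barcode by default keeps reads without a barcode from being pooled into one artificial cell.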
This looks right to me. I just checked my run with barcode+UMI and a quick peek did not find any discrepancies. Could you please run |
The numbers look similar to the statistics |
There might be a bug in the program then. Could you please share the _assembled_reads.fa and _final.out files with me? |
How about just the reads and the final.out entries (6 lines per contig) from the cell where you found the issue? You can send them either as an email attachment or as a Google Drive/Dropbox/Baidu Wangpan link. Thank you. |
OK, what is your email address? |
Thank you for sharing the file. I got a reasonable UMI count in the _cdr3.out file based on the files you provided:
Which version of TRUST4 did you use? |
TRUST4 v1.0.13-r473. |
Thank you for sharing the larger data set. I think I've found and fixed a bug that may assign a read to another barcode in the contig abundance estimation step. Could you please pull the GitHub repo again and give it a try? This is a pretty serious bug. If the fix works on your data set, I will draft a new release soon. |
I tested a larger dataset and obtained results for several cell IDs. Now the UMIs are working properly. |
Why do the numeric UMI values for the same UMI become inconsistent after assembly? |
They should be consistent. Do you see those issues from the cell barcode you shared with me? |
I found that the issue was caused by my own mistake. I included "missing_barcode" in the analysis, which caused the problem. Removing it fixes it. |
It's still quite strange. "E200004414L1C001R03004110063" and "E200004414L1C003R00300997369" have the same barcode and UMI, but their converted numeric UMI values are different. Their numeric UMIs should not be affected by the "missing_barcode" issue. Or do those UMIs correspond to other reads? |
I think I've found the issue. Could you please pull the updated GitHub repo and give it a try? Please let me know whether it works when there are "missing_barcode" entries in the data. Thank you again for scrutinizing TRUST4's results. |
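One way to check for this kind of inconsistency is to group reads by (barcode, UMI sequence) and flag any group that maps to more than one numeric ID. The Python below is only a sketch: the record layout `(read_id, barcode, umi_seq, umi_numeric_id)` is hypothetical and does not correspond to a specific TRUST4 output file, and the read names in the example are placeholders.

```python
from collections import defaultdict


def inconsistent_umi_ids(records):
    """Return (barcode, umi_seq) groups that have more than one numeric ID.

    `records` is an iterable of (read_id, barcode, umi_seq, umi_numeric_id)
    tuples. This layout is an assumption made for illustration.
    """
    ids = defaultdict(set)
    reads = defaultdict(list)
    for read_id, barcode, umi_seq, umi_id in records:
        key = (barcode, umi_seq)
        ids[key].add(umi_id)
        reads[key].append(read_id)
    return {
        key: (sorted(ids[key]), reads[key])
        for key in ids
        if len(ids[key]) > 1
    }


# Placeholder records (made up): readA and readB share barcode and UMI
# sequence but were given different numeric IDs, so they are flagged.
example = [
    ("readA", "cell1", "ACGT", 7),
    ("readB", "cell1", "ACGT", 12),
    ("readC", "cell1", "TTGA", 3),
]
flagged = inconsistent_umi_ids(example)
```

An empty result means every (barcode, UMI sequence) pair has a single numeric ID, which is the expected behavior.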
After using the new repo, the results match those obtained after removing the missing_barcode. |
Thank you for your software.
When analyzing single-cell 10x data, the output does not include UMI counts even though we provide UMI data.
How does TRUST4 use UMI data during the assembly process? In barcode_report.tsv, there is only "read_fragment_count".
Does it represent the number of reads used to assemble each sequence?