How to get the sequence of C region? #244

Open
ljy-sys opened this issue Jan 17, 2024 · 5 comments
ljy-sys commented Jan 17, 2024

Excuse me again. I analyzed the TCR information from Smart-seq3 data with trust-smartseq.pl, but I could not find the sequence of the constant (C) region in the report or AIRR files. Could you help me with how to obtain the C region sequence?

Collaborator

mourisl commented Jan 17, 2024

TRUST4 only assembles the first portion of C genes, maybe around 200 bp. To get those sequences, you need the AIRR-format output: use the "sequence" column and the J_align column, where everything after J_align may correspond to the C gene. Alternatively, extract the part of "sequence" that follows the "sequence_alignment" portion.

I can add a "c_cigar" column in TRUST4 later, which will give you a more accurate range of C gene on the sequence.
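The second approach (taking whatever follows the "sequence_alignment" portion inside "sequence") could be sketched roughly as below. This is only an illustration, not TRUST4 code: the function names and the file path passed in are hypothetical, the column names "sequence_id", "sequence" and "sequence_alignment" are assumed to be present in the tab-separated AIRR file, and stripping "-" and "." gap characters before searching is also an assumption. The returned tail is at best a partial C gene sequence.

```python
import csv


def c_region_tail(sequence, sequence_alignment):
    """Return the part of `sequence` after `sequence_alignment`, or "" if not found."""
    # Assumption: the alignment may carry gap characters that are absent from `sequence`.
    aligned = sequence_alignment.replace("-", "").replace(".", "")
    if not aligned:
        return ""
    start = sequence.upper().find(aligned.upper())
    if start < 0:
        return ""
    # Everything after the aligned V(D)J portion may belong to the C gene.
    return sequence[start + len(aligned):]


def c_tails_from_airr(path):
    """Yield (sequence_id, candidate C-region tail) for each row of an AIRR TSV file."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield row["sequence_id"], c_region_tail(
                row["sequence"], row["sequence_alignment"]
            )
```

For example, `c_region_tail("AAACCCGGGTTT", "CCCGGG")` returns `"TTT"`. An empty string means either that the contig ends at the J gene or that the alignment could not be located in the contig.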

Author

ljy-sys commented Jan 17, 2024

Yes, I got the C gene sequence by processing the AIRR file: in the "sequence" column, I extract whatever follows the sequence given in the "sequence_alignment" column, which is a partial sequence of the C gene. But I have two questions. First, is the FR4 region (the J gene part) not included in "sequence"? Second, with the current version we can only get a partial sequence of the C gene, right?

Collaborator

mourisl commented Jan 17, 2024

If the assembled contig contains the J gene part, it will be in both the "sequence" and "sequence_alignment" columns.

Right. The C gene is much less diverse, so full-length C gene assembly is not needed to identify it. Just curious, why do you need the full sequence of the C gene?

Collaborator

mourisl commented Jan 17, 2024

I forgot to mention that the header in the _annot.fa file from the smartseq wrapper also contains the coordinates of the C gene, which are probably more accurate than using all the sequence after the J gene.
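A rough sketch of reading those coordinates from an _annot.fa header follows. It rests on several assumptions that should be checked against your own file and the TRUST4 README: that the header fields are whitespace-separated in the order id, length, coverage, V, D, J, C; that each gene field looks like `gene(ref_len):(start-end):(ref_start-ref_end):identity`; and that the contig coordinates are 0-based and inclusive. The function names are hypothetical.

```python
import re

# Assumed gene-field layout: gene(ref_len):(start-end):(ref_start-ref_end):identity
GENE_FIELD = re.compile(
    r"^(?P<gene>[^()]+)\(\d+\):\((?P<start>\d+)-(?P<end>\d+)\):\(\d+-\d+\):[\d.]+$"
)


def c_gene_span(header):
    """Return (gene, start, end) for the C gene field, or None if it is absent."""
    fields = header.lstrip(">").split()
    # Assumed field order: id, length, coverage, V, D, J, C, ...
    if len(fields) < 7:
        return None
    # If several candidate genes are listed with commas, keep the first one.
    match = GENE_FIELD.match(fields[6].split(",")[0])
    if match is None:  # e.g. "*" when no C gene was annotated
        return None
    return match["gene"], int(match["start"]), int(match["end"])


def c_gene_sequence(header, contig):
    """Slice the C gene portion out of the contig, assuming 0-based inclusive coordinates."""
    span = c_gene_span(header)
    if span is None:
        return ""
    _, start, end = span
    return contig[start:end + 1]
```

With a made-up header such as `>assemble0 20 5.00 TRBV1*01(10):(0-5):(0-5):100.00 * TRBJ1*01(6):(6-11):(0-5):100.00 TRBC1*01(500):(12-19):(0-7):100.00`, `c_gene_span` returns `("TRBC1*01", 12, 19)`.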

Author

ljy-sys commented Jan 18, 2024

Thanks very much! I got it. The main reason I want the C gene sequence is to better understand Smart-seq data and to clarify how to use TRUST4. Thank you again for your timely reply! Hey hey^_^
