How to get the sequence of the C region? #244

ljy-sys · 2024-01-17T06:20:45Z

Excuse me again. I analyzed the TCR information of my Smart-seq3 data with trust-smartseq.pl, but I could not find the sequence of the constant (C) region in the report or AIRR files. Could you help me with how to obtain the C region sequence?

mourisl · 2024-01-17T07:04:37Z

TRUST4 only assembles the first portion of the C gene, roughly 200 bp. To get that sequence, use the AIRR-format output: take the "sequence" column together with the J_align column, where everything after the J alignment may correspond to the C gene. Alternatively, extract whatever follows the "sequence_alignment" portion within "sequence".
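As a rough illustration of the second approach, the sketch below reads an AIRR TSV and returns whatever follows the "sequence_alignment" substring inside "sequence". It assumes "sequence_alignment" appears verbatim (ungapped) within "sequence"; the helper names and the toy records are made up for the example and are not part of TRUST4.

```python
import csv
import io


def c_region_after_alignment(row):
    """Return the part of `sequence` that follows `sequence_alignment`, or None.

    Assumption: `sequence_alignment` is an exact, ungapped substring of `sequence`.
    """
    seq = row.get("sequence") or ""
    aln = row.get("sequence_alignment") or ""
    if not seq or not aln:
        return None
    start = seq.find(aln)
    if start < 0:
        # The alignment string is not a plain substring; this sketch gives up.
        return None
    tail = seq[start + len(aln):]
    return tail or None


def c_regions_from_airr(handle):
    """Map sequence_id -> candidate C region for every row of an AIRR TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sequence_id"]: c_region_after_alignment(row) for row in reader}


# Toy, hand-made records (not real TRUST4 output).
toy_airr = (
    "sequence_id\tsequence\tsequence_alignment\n"
    "contig_with_c\tAAACCCGGGTTTACGT\tCCCGGGTTT\n"
    "contig_without_c\tAAACCCGGGTTT\tCCCGGGTTT\n"
)
toy_result = c_regions_from_airr(io.StringIO(toy_airr))
print(toy_result)
```

For a real run, the `io.StringIO` handle would be replaced by an open file handle on the AIRR TSV. The extracted tail is only a candidate C region, since anything the assembler placed after the V(D)J alignment ends up in it.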

I can add a "c_cigar" column in TRUST4 later, which will give you a more accurate range of C gene on the sequence.

ljy-sys · 2024-01-17T08:09:43Z

Yes, I obtained the C gene sequence by processing the AIRR file: within the "sequence" column, I take everything after the substring given in the "sequence_alignment" column, which is the partial C gene sequence. I have two questions, though. First, is the FR4 region (the J gene part) not included in "sequence"? Second, with the current version we can only get a partial C gene sequence, right?

mourisl · 2024-01-17T15:54:47Z

If the assembled contig contains the J gene part, it will be in both the "sequence" and "sequence_alignment" columns.

Right. The C gene is much less diverse, so full-length C gene assembly is not needed to identify it. Just curious, why do you need the full sequence of the C gene?

mourisl · 2024-01-17T21:11:48Z

I forgot to mention that the header in the _annot.fa file from the smartseq wrapper also contains the coordinates of the C gene, which are probably more accurate than using all the sequence after the J gene.
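A minimal sketch of reading those coordinates, under an assumed header layout: whitespace-separated fields in the order contig id, length, coverage, V, D, J, C, with each gene field written as `gene(ref_len):(contig_start-contig_end):(ref_start-ref_end):identity`, 0-based inclusive coordinates, and `*` for a missing gene. Check this layout against the actual _annot.fa before relying on it; the function names and the toy header are invented for illustration.

```python
import re

# Assumed shape of one gene annotation field (see the assumptions above).
FIELD_RE = re.compile(
    r"^(?P<gene>[^()]+)\((?P<ref_len>\d+)\)"
    r":\((?P<start>\d+)-(?P<end>\d+)\)"
    r":\((?P<ref_start>\d+)-(?P<ref_end>\d+)\)"
    r":(?P<identity>[\d.]+)$"
)


def c_gene_span(header):
    """Return (gene, contig_start, contig_end) for the C gene, or None."""
    fields = header.lstrip(">").split()
    if len(fields) < 7:
        return None
    # Assumption: the C gene is the 4th annotation field; if several
    # comma-separated candidates are listed, only the first is used.
    c_field = fields[6].split(",")[0]
    match = FIELD_RE.match(c_field)
    if match is None:  # e.g. "*" when no C gene was annotated
        return None
    return match["gene"], int(match["start"]), int(match["end"])


def c_gene_sequence(header, contig):
    """Slice the C gene portion out of the contig (inclusive coordinates)."""
    span = c_gene_span(header)
    if span is None:
        return None
    _, start, end = span
    return contig[start:end + 1]


# Synthetic header and contig with made-up coordinates, not real output.
toy_header = (
    ">assemble0 20 5.00 TRBV1*01(10):(0-7):(0-7):100.00 * "
    "TRBJ1*01(5):(8-11):(0-3):100.00 TRBC1(12):(12-19):(0-7):100.00"
)
toy_contig = "AAAAAAAACCCCGGGGTTTT"
print(c_gene_span(toy_header), c_gene_sequence(toy_header, toy_contig))
```

If the real header uses 1-based or half-open coordinates, the slice in `c_gene_sequence` needs adjusting accordingly.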

ljy-sys · 2024-01-18T06:24:16Z

Thanks very much, I got it! The main reason I want the C gene sequence is to better understand Smart-seq data and to get clear on how to use TRUST4. Thank you again for your timely reply! Hey hey ^_^
