Bambu for plant data #465

Open
baibhav-bioinfo opened this issue Feb 15, 2025 · 3 comments

@baibhav-bioinfo

Hello,
Very helpful tool.
I am working with a Nanopore DRS dataset from sorghum.
(1) Can I use your general model directly to identify novel transcripts in my dataset, or do I need to train a plant-specific model?
(2) Sorghum has a fairly well-annotated reference transcriptome. What NDR value should I use if I am only interested in identifying novel transcripts, which I will add to the existing annotation to build an updated transcriptome?

@cying111
Collaborator

Hi,

Glad to hear you're using Bambu for sorghum plant DRS data—sounds exciting!

  1. For your first question, the answer depends on your data size. If you have enough data to train a model, that's ideal. By default, Bambu will attempt to train a model based on the provided data and will issue a warning if the data isn't sufficient.
  2. For your second question, you can leave NDR = NULL if you're unsure. Bambu will then recommend an NDR value corresponding to 10% of the human transcriptome.

Hope this helps! Let us know if you have any other questions.
Thanks!
Warm regards,
Ying

@baibhav-bioinfo
Author

baibhav-bioinfo commented Feb 19, 2025

Hello, thank you so much for the swift response.
The documentation says that the pretrained model also performs well on Arabidopsis data.
Can I use the same pretrained model in my case too?
I am using the following command for novel transcript discovery on data from 3 conditions with 3 replicates each:

se.multiSample_discovery_only_NDR_null <- bambu(
    reads = c("a6_r1.bam", "a6_r2.bam", "a6_r3.bam",
              "b6_r1.bam", "b6_r2.bam", "b6_r3.bam",
              "c6_r1.bam", "c6_r2.bam", "c6_r3.bam"),
    annotations = annotations,
    genome = "SbicolorRTx430_552_v2.0.fa",
    quant = FALSE,
    NDR = NULL)

Bambu recommends an NDR of 0.085 (as sorghum is well annotated).

Is this command good enough (even if not ideal) for novel transcript discovery, or do I need species-specific training?

If so, could you guide me through it? I would appreciate it.

@cying111
Collaborator

Hi,

The current command already does species-specific training by default. So you can go ahead with it.

Hope this clarifies your question!

Thank you
Ying
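
For anyone who wants to compare the default behaviour with the pretrained model mentioned in the documentation, here is a minimal sketch. It assumes the BAM and genome file names from the command above, a hypothetical annotation file `sorghum_annotations.gtf`, hypothetical output file names, and the `opt.discovery = list(fitReadClassModel = FALSE)` option that the Bambu README describes for switching off training. Treat it as an illustration rather than a recommended pipeline.

```r
library(bambu)

# Hypothetical path: replace with the sorghum GTF you actually use.
annotations <- prepareAnnotations("sorghum_annotations.gtf")

# File names taken from the command posted in this thread.
bam_files <- c("a6_r1.bam", "a6_r2.bam", "a6_r3.bam",
               "b6_r1.bam", "b6_r2.bam", "b6_r3.bam",
               "c6_r1.bam", "c6_r2.bam", "c6_r3.bam")
genome_fa <- "SbicolorRTx430_552_v2.0.fa"

# Default behaviour: Bambu tries to train a model on the supplied reads
# and, with NDR = NULL, recommends an NDR threshold itself.
extended_trained <- bambu(reads = bam_files,
                          annotations = annotations,
                          genome = genome_fa,
                          quant = FALSE,
                          NDR = NULL)

# Alternative: skip training and use the pretrained model instead.
extended_pretrained <- bambu(reads = bam_files,
                             annotations = annotations,
                             genome = genome_fa,
                             quant = FALSE,
                             NDR = NULL,
                             opt.discovery = list(fitReadClassModel = FALSE))

# With quant = FALSE the result is the extended annotation set, which can
# be written out as an updated transcriptome (output names are hypothetical).
writeToGTF(extended_trained, "sorghum_extended_trained.gtf")
writeToGTF(extended_pretrained, "sorghum_extended_pretrained.gtf")
```

Comparing the two GTF files shows how much the choice of model changes the set of novel transcripts for this dataset.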
