Expand the information provided in model.json #174

Open
afg1 opened this issue Jan 13, 2025 · 0 comments

Comments

@afg1
Contributor

afg1 commented Jan 13, 2025

When we run R2DT at RNAcentral, several steps parse the output and import the data into RNAcentral so that we can show diagrams and dot-bracket notation, and use hits to determine RNA type. This relies on a table that knows about all the models R2DT uses.

Right now, the file /rna/r2dt/data/models.json contains some of the information needed to keep the r2dt_models table at RNAcentral updated, but not everything.

models.json currently provides model_id, source, and description. To be able to update our table with R2DT's latest set of models, we need the following:

| Field | Description |
| --- | --- |
| model_name | Not always the same as model_id currently, e.g. RFXXXXX for Rfam; most others seem right |
| so_term_id | The SO ID for the corresponding RNA type, e.g. SO:0002344 for mt_SSU_rRNA |
| model_source | Exactly as in the current models.json |
| model_length | Currently we extract this from the model cm file using cmstat's clen column |
| model_basepair_count | Also extracted from the model cm file using cmstat's bps column |

If this could be provided by R2DT it would make updating this table a lot more robust, as right now it is quite manual and prone to going wrong.
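
To make the request concrete, here is a rough sketch of how the two cmstat-derived fields could be filled in for one entry. It is only an illustration, not how R2DT or the RNAcentral import actually works: `parse_cmstat` and `expand_entry` are hypothetical helpers, the sample text assumes the column layout of cmstat's summary table, and the model row and its values are invented. The model_name and so_term_id are passed in by hand because they cannot be derived from the cm file.

```python
import json

# Illustrative cmstat-style text. The column layout is an assumption based on
# cmstat's summary table, and the model row below is invented for this example.
SAMPLE_CMSTAT = """\
# idx   name           accession      nseq  eff_nseq   clen      W   bps  bifs  model     cm    hmm
# ----  -------------  ---------  --------  --------  -----  -----  ----  ----  -----  -----  -----
     1  example_model  -                10      1.50     71     85    21     2     cm  0.590  0.493
"""


def parse_cmstat(text):
    """Return one dict per model row, keyed by the column names in the header."""
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Treat the comment line naming both needed columns as the header.
            if "clen" in fields and "bps" in fields:
                header = fields
            continue
        if header is None or not line.strip():
            continue
        values = line.split()
        if len(values) != len(header):
            continue  # skip rows that do not match the header layout
        rows.append(dict(zip(header, values)))
    return rows


def expand_entry(entry, stats, model_name, so_term_id):
    """Combine an existing models.json entry with the extra requested fields."""
    return {
        "model_id": entry["model_id"],
        "model_name": model_name,            # supplied by hand (hypothetical)
        "so_term_id": so_term_id,            # supplied by hand (hypothetical)
        "model_source": entry["source"],     # exactly as in models.json
        "description": entry["description"],
        "model_length": int(stats["clen"]),
        "model_basepair_count": int(stats["bps"]),
    }


if __name__ == "__main__":
    # Hypothetical existing entry with the three fields models.json has today.
    entry = {
        "model_id": "example_model",
        "source": "ExampleSource",
        "description": "Hypothetical model",
    }
    stats = parse_cmstat(SAMPLE_CMSTAT)[0]
    print(json.dumps(expand_entry(entry, stats, "example_model", "SO:0002344"), indent=2))
```

Reading the column names from the header line rather than using fixed positions means the sketch should still work if cmstat adds or reorders columns, as long as clen and bps remain.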

Labels
None yet
Projects
Status: Backlog