Question about .feather output of create_cistarget_motif_databases.py #550
Comments
Motif names are in the last column of the dataframe. Every other column holds the scores or rankings for one specific region (one value per motif).
import polars as pl
# Get schema of Feather file.
feather_schema = pl.read_ipc_schema("/databases/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather")
In [14]: list(feather_schema)[0:10]
Out[14]:
['chr10:100000176-100000504',
'chr10:100001759-100001930',
'chr10:100004841-100005148',
'chr10:100005876-100006219',
'chr10:100006302-100006644',
'chr10:100007267-100007608',
'chr10:100007891-100008108',
'chr10:100008250-100008544',
'chr10:100008734-100009017',
'chr10:100009359-100009597']
In [15]: list(feather_schema)[-10:]
Out[15]:
['chrY:9623241-9623568',
'chrY:9805269-9805610',
'chrY:9812117-9812357',
'chrY:9853800-9854136',
'chrY:9920102-9920438',
'chrY:9924284-9924623',
'chrY:9954840-9955041',
'chrY:9959132-9959359',
'chrY:9997981-9998328',
'motifs']
In [16]: motifs_df = pl.read_ipc("/databases/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather", columns=["motifs"])
Could not memory_map compressed IPC file, defaulting to normal read. Toggle off 'memory_map' to silence this warning.
In [17]: motifs_df
Out[17]:
shape: (5_876, 1)
┌──────────────────────────┐
│ motifs                   │
│ ---                      │
│ str                      │
╞══════════════════════════╡
│ bergman__Su_H_           │
│ bergman__croc            │
│ bergman__pho             │
│ bergman__tll             │
│ c2h2_zfs__M0369          │
│ …                        │
│ yetfasco__TBP-TFIIA_1328 │
│ yetfasco__TBP-TFIIB_1329 │
│ yetfasco__YFL044C_1166   │
│ yetfasco__YGL192W_1000   │
│ yetfasco__YPR086W_1327   │
└──────────────────────────┘
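To make the layout concrete, here is a minimal sketch of how a region can be traced back to motif names by row position. It uses a plain-Python toy table rather than the real database: the region and motif names are copied from the session above, while the scores and the helper names `top_motifs_for_region` and `score_for` are made up for illustration.

```python
# Toy stand-in for the Feather layout described above: one column per region,
# plus a final "motifs" column holding the motif names. The region and motif
# names are copied from the session; the scores are invented for illustration.
toy_table = {
    "chr10:100000176-100000504": [0.8, 0.1, 2.5, 0.0],
    "chr10:100001759-100001930": [1.2, 3.4, 0.3, 0.9],
    "motifs": ["bergman__Su_H_", "bergman__croc", "bergman__pho", "bergman__tll"],
}


def top_motifs_for_region(table, region, n=2):
    """Return the n (motif, score) pairs with the highest score for one region."""
    # Row i of every region column belongs to row i of the "motifs" column.
    pairs = zip(table["motifs"], table[region])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


def score_for(table, region, motif):
    """Look up a single (region, motif) score by row position."""
    row = table["motifs"].index(motif)
    return table[region][row]


print(top_motifs_for_region(toy_table, "chr10:100000176-100000504"))
print(score_for(toy_table, "chr10:100001759-100001930", "bergman__croc"))
```

With the real file, the same pairing should be possible by passing the region of interest together with "motifs" in the `columns` argument of `pl.read_ipc`, as was done for the "motifs" column alone in the session above, so that only those two columns are loaded.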
Thank you so much! This is very helpful to confirm. I am indeed new to this feather format and I wondered if I might be missing something. This should resolve my questions. I imagine it can't hurt to have this documented here in an issue, either; someone will probably find it useful. =-)

Thanks again! Have a nice day!
Sara
Dear SCENIC+ folks,

I have a brief question about the .feather outputs of create_cistarget_motif_databases.py. In principle, these files contain information that maps genomic loci to a set of motifs/TFs. I expected to find metadata reflecting this, nominally in the row names. However, when I looked at the example hg38 files at https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/ I did not find any row-wise metadata relating to specific motifs or TFs. Is that expected? Am I looking in the wrong place in these .feather files, or should I be looking in other output files?

I'd like to trace specific genomic loci to motifs and TFs at this (human-readable) level for sanity checks. It didn't seem trivially possible at this step.

Thanks in advance,
Sara