options for selecting tcga + non-tcga studies #5831

Closed
jjgao opened this issue Mar 5, 2019 · 9 comments

@jjgao
Member

jjgao commented Mar 5, 2019

After we removed the option to select all studies and replaced it with TCGA PanCancer Atlas studies, a number of users complained.

Here are two users describing their use cases:

“I have been studying some genes with very low mutation frequencies, and had previously determined that some mutations were present in very specific, non-TCGA studies. So if the search is only limited to the TCGA, then those mutations are missed. Manually selecting all studies is really a chore to be sure of not missing some rare mutations. These mutations could hold functional clues to how the enzyme works in cancer, so missing out on them could be detrimental.”

"We generally select all studies when doing a query in cBioPortal. Usually the situation for when we are performing the query is related to variant curation; we are looking for reported cases of a particular variant in any type of cancer. It is easy for us to select all the studies and figure out which are duplicate cases based on the sample ID, rather than selecting certain studies which would limit acquisition of experience with the variant. The TCGA PanCancer studies alone would be insufficient for this same limitation."

To address the use cases above, one option would be to provide a button that lets users select non-redundant TCGA + non-TCGA studies, or a "good default set of studies" as proposed in #3395.
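
The second use case relies on spotting duplicate cases by sample ID after querying every study. A minimal TypeScript sketch of that check is below. The `MutationHit` shape and the `findDuplicateSamples` helper are hypothetical and not part of cBioPortal; the sketch also assumes that an identical sample ID in two studies means the same sample.

```typescript
// Hypothetical row from a cross-study query result: one variant hit in one sample.
interface MutationHit {
  studyId: string;
  sampleId: string;
}

// Return every sample ID that appears in more than one study,
// mapped to the sorted list of studies that contain it.
function findDuplicateSamples(hits: MutationHit[]): Map<string, string[]> {
  const studiesBySample = new Map<string, Set<string>>();
  hits.forEach(({ studyId, sampleId }) => {
    const studies = studiesBySample.get(sampleId) ?? new Set<string>();
    studies.add(studyId);
    studiesBySample.set(sampleId, studies);
  });

  const duplicates = new Map<string, string[]>();
  studiesBySample.forEach((studies, sampleId) => {
    if (studies.size > 1) {
      duplicates.set(sampleId, Array.from(studies).sort());
    }
  });
  return duplicates;
}
```

A curated, non-overlapping study list would make this step unnecessary for most queries, which is the motivation for the button proposed here.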

@cBioPortal/product @alisman @inodb

@jjgao jjgao self-assigned this Mar 5, 2019
@jjgao jjgao assigned alisman and inodb and unassigned jjgao and inodb Mar 17, 2019
@jjgao
Member Author

jjgao commented Mar 17, 2019

@alisman as commented in https://github.com/cBioPortal/cbioportal/issues/3395#issuecomment-473596022, here is a list of 155 studies: default_studies_list_20190326.txt

Please:

  • Make the "Quick select" buttons configurable, e.g. add the button names and studies to the properties or JSON configuration files
    • By default there are no quick select buttons
  • Add a new button (i.e. in properties) "A curated set of 155 studies" after "TCGA PanCancer Atlas studies"
    • @alisman @cBioPortal/product please propose a better name if you can
  • The MSK portal will only have the "TCGA PanCancer Atlas studies" button
  • The public portal will have both buttons
  • Change https://www.cbioportal.org/ln?q=TP53:MUT to use the new list

@ritikakundra since this is a whitelist, we will need to update it whenever new studies are released.
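
Because the list is a whitelist, it can drift from what the portal actually hosts. A rough TypeScript sketch of a check that could run at release time is below. `diffWhitelist` and both input arrays are hypothetical; how the portal's current study IDs would be fetched is left out, and the IDs in the usage example are only illustrative.

```typescript
interface WhitelistDiff {
  // Whitelisted IDs that the portal does not host (stale or mistyped entries).
  missingFromPortal: string[];
  // Hosted studies that nobody has yet added to, or excluded from, the whitelist.
  notWhitelisted: string[];
}

// Compare the curated whitelist with the study IDs currently loaded in the portal.
function diffWhitelist(whitelist: string[], portalStudyIds: string[]): WhitelistDiff {
  const inPortal = new Set(portalStudyIds);
  const inWhitelist = new Set(whitelist);
  return {
    missingFromPortal: whitelist.filter((id) => !inPortal.has(id)),
    notWhitelisted: portalStudyIds.filter((id) => !inWhitelist.has(id)),
  };
}

// Illustrative usage with placeholder inputs.
const exampleDiff = diffWhitelist(
  ["utuc_mskcc_2013", "brca_tcga_pan_can_atlas_2018"],
  ["utuc_mskcc_2015", "brca_tcga_pan_can_atlas_2018"],
);
// exampleDiff.missingFromPortal is ["utuc_mskcc_2013"]
// exampleDiff.notWhitelisted is ["utuc_mskcc_2015"]
```

Entries in `notWhitelisted` are not necessarily errors, since redundant studies are excluded on purpose; they only flag studies that still need a curation decision.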

@jjgao
Member Author

jjgao commented Mar 17, 2019

@inodb I am wondering if we should use this list for Quick Search too... It is slower than just TCGA pancancer studies, but not too bad: https://www.cbioportal.org/results/mutations?session_id=5c8daff7e4b046111fee2481

@alisman
Contributor

alisman commented Mar 19, 2019

@jjgao do you think these quick select buttons (pancan and now the curated set) are important for portal instances besides mskcc and public? The configuration is a little awkward: 1. because we have to use structured data to represent the list, and 2. because in the pancan case, the pancan studies button can really only be shown and defined at run time.

Simplest solution would be to just have a flag that turns these on for our portals. Let me know what you think.

@jjgao
Member Author

jjgao commented Mar 19, 2019

@alisman

  1. Maybe we should define them in the frontend config? e.g.
{
  "quickSelect": [
    {
      "name": "TCGA PanCancer Atlas studies",
      "studyIds": [...],
      "description": "33 TCGA PanCancer Atlas studies"
    },
    {
      "name": "Curated set of non-redundant studies",
      "studyIds": [...],
      "description": "155 studies that are manually curated including TCGA and non-TCGA studies with no overlapping samples"
    }
  ]
}
  2. I think we should put the name of the pancan studies as well.
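
For illustration, a frontend could type and validate a config of this shape roughly as follows. This is only a sketch: the `QuickSelectButton` interface and `parseQuickSelectConfig` function are hypothetical names, the key is assumed to be spelled `description`, and the study IDs in the usage example are placeholders.

```typescript
// Hypothetical shape of one "Quick select" button, mirroring the proposed JSON.
interface QuickSelectButton {
  name: string;
  studyIds: string[];
  description: string;
}

// Parse the config text and validate each button; throws on malformed input.
function parseQuickSelectConfig(jsonText: string): QuickSelectButton[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const buttons = (parsed as { quickSelect?: unknown }).quickSelect;
  // A missing key means no buttons, matching the "no buttons by default" request.
  if (buttons === undefined) return [];
  if (!Array.isArray(buttons)) {
    throw new Error('"quickSelect" must be an array');
  }
  return buttons.map((b, i) => {
    if (typeof b !== "object" || b === null) {
      throw new Error(`quickSelect[${i}] must be an object`);
    }
    const { name, studyIds, description } = b as Partial<QuickSelectButton>;
    if (typeof name !== "string" || typeof description !== "string") {
      throw new Error(`quickSelect[${i}]: "name" and "description" must be strings`);
    }
    if (!Array.isArray(studyIds) || !studyIds.every((s) => typeof s === "string")) {
      throw new Error(`quickSelect[${i}]: "studyIds" must be an array of strings`);
    }
    return { name, studyIds, description };
  });
}

// Illustrative usage; the study ID list is a placeholder, not the real curated list.
const exampleButtons = parseQuickSelectConfig(`{
  "quickSelect": [
    {
      "name": "Curated set of non-redundant studies",
      "studyIds": ["utuc_mskcc_2015"],
      "description": "Placeholder description"
    }
  ]
}`);
```

Failing fast on a malformed entry is one possible design choice; a portal might instead prefer to skip the bad button and log a warning.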

@alisman
Contributor

alisman commented Mar 19, 2019

@jjgao it is defined in the frontend config. The problem is that on dashi we could have a JSON file sitting on disk, but in AWS we cannot do this because instances are ephemeral. So the JSON configuration needs to live in a repository, just like portal.properties. Will discuss with ino tomorrow.

@jjgao
Member Author

jjgao commented Mar 19, 2019

@alisman @ino maybe we can keep some public configuration files (containing no private keys) on GitHub?

@alisman
Contributor

alisman commented Mar 20, 2019

@JJ do you think the tooltip for the curated set should indicate something about the non-overlapping nature? i.e. that's what drove the curation of the set?

@jjgao
Member Author

jjgao commented Mar 20, 2019

@alisman I've updated the name and description above.

@jjgao
Member Author

jjgao commented Mar 23, 2019

utuc_mskcc_2013 should be utuc_mskcc_2015

#5889

@jjgao jjgao closed this as completed Mar 23, 2019