At the start of the year I created a static list of samples of interest with either wild-type or mutant p53, drawn from the (at the time 217) non-overlapping case studies. Until recently, I was consistently able to generate the same final list of patients after many processing steps in R, one of which removes samples from two patients who had a germline mutation status. In the past few weeks, however, I've noticed that my germline filtering step no longer removes those samples, so I don't end up with my expected number of patients. The samples no longer carry a germline mutation status annotation, perhaps due to updates in the cBioPortal database.
Comparing patient sample sizes in figures generated before and after this discrepancy, I believe the issue arises in patients with liver cancer pooled from msk_impact_2017, pan_origimed_2020, pancan_pcawg_2020, and lihc_tcga_pan_can_atlas_2018. I've looked in the datahub repository for these datasets, but I can't tell which samples recently had their germline mutation status annotations removed. All the mutation status annotations in my filtered list of patients are now either a dot or NA, so I'm afraid I can't get further insights from my side.
Is it possible to get more information on which patient IDs in these studies have recently had an update to remove germline annotations? If it's helpful, I pull mutation data from cBioPortal through R using my list of samples and the get_mutations_by_sample() function in the cbioportalR R package, where I've also posted a question regarding this.
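To make the failure mode concrete, here is a minimal sketch of the filtering logic. It is written in Python with made-up records purely for illustration and is not my actual R pipeline; the field names `patientId`, `sampleId`, and `mutationStatus`, the helper `drop_germline_patients`, and all the IDs and status values are assumptions.

```python
def drop_germline_patients(records):
    """Drop every record of any patient with at least one germline-annotated mutation."""
    germline_patients = {
        r["patientId"]
        for r in records
        # Treat a missing annotation (None) like an empty string.
        if (r.get("mutationStatus") or "").strip().lower() == "germline"
    }
    return [r for r in records if r["patientId"] not in germline_patients]


# Hypothetical records standing in for an earlier pull (illustration only).
earlier_pull = [
    {"patientId": "P1", "sampleId": "S1", "mutationStatus": "Somatic"},
    {"patientId": "P2", "sampleId": "S2", "mutationStatus": "Germline"},
    {"patientId": "P3", "sampleId": "S3", "mutationStatus": "Germline"},
]

# The same static sample list pulled again, with the annotation now "." or missing.
recent_pull = [
    {"patientId": "P1", "sampleId": "S1", "mutationStatus": "."},
    {"patientId": "P2", "sampleId": "S2", "mutationStatus": None},
    {"patientId": "P3", "sampleId": "S3", "mutationStatus": "."},
]

kept_before = {r["patientId"] for r in drop_germline_patients(earlier_pull)}
kept_after = {r["patientId"] for r in drop_germline_patients(recent_pull)}

print(sorted(kept_before))               # ['P1']
print(sorted(kept_after))                # ['P1', 'P2', 'P3']
print(sorted(kept_after - kept_before))  # patients the filter no longer removes
```

With the same sample list as input, the filter silently keeps the two previously excluded patients once the annotation is gone, which matches the change in sample size I'm seeing.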
I'd be most grateful if someone could shed some light on this, as all my subsequent analyses with this list are now thrown off by the different sample sizes. Because my sample list is static, I expected that pulling the mutation data again in R would always yield the same final list of patients after processing. I unfortunately didn't anticipate that a mutation status could change a year later.