Identifying samples previously annotated as germline-mutant #2124

Open
nehawali21 opened this issue Dec 31, 2024 · 0 comments
nehawali21 commented Dec 31, 2024

At the start of the year I created a static list of samples of interest, with either wild-type or mutant p53, drawn from the non-overlapping studies (217 at the time). Since then I have consistently been able to generate the same final list of patients after a series of processing steps in R, one of which removes samples from two patients that had a germline mutation status. In the past few weeks, however, I've noticed that this germline filtering step no longer removes those samples, so I don't get my expected final number of patients. This is perhaps due to updates in the cBioPortal database: the samples no longer carry a germline mutation status annotation.

Comparing patient sample sizes in figures generated before and after this discrepancy, I believe the issue arises in liver cancer patients pooled from msk_impact_2017, pan_origimed_2020, pancan_pcawg_2020, and lihc_tcga_pan_can_atlas_2018. I've looked in the datahub repository for these datasets, but I can't tell which samples recently had their germline annotations removed. Every mutation status annotation in my filtered list of patients is now either a dot or NA, so I'm afraid I can't get further insight from my side.

Is it possible to find out which patient IDs in these studies have recently been updated to remove germline annotations? If it's helpful, I pull mutation data from cBioPortal in R using my list of samples and the get_mutations_by_sample() function in the cbioportalR R package, where I've also posted a question about this.

I'd be most grateful if someone could shed some light on this, as all my subsequent analyses with this list are now thrown off by the different sample sizes. Because I had a static sample list, I expected that re-pulling the mutation data in R would always yield the same final list of patients after processing; I didn't anticipate that a mutation status could change a year later.
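
To make the request concrete, here is a minimal sketch of the comparison I have in mind. It is written in Python purely for illustration and assumes that an earlier export of the mutation records is still available alongside a fresh pull. The field names `sampleId` and `mutationStatus` and the sample IDs are assumptions chosen for the example, not values taken from the studies above.

```python
def lost_germline_samples(before, after):
    """Return sorted sample IDs flagged germline in `before` but not in `after`.

    Each argument is a list of dicts, one per mutation record. The keys
    "sampleId" and "mutationStatus" are assumed names for this sketch.
    """
    def germline_ids(records):
        # Treat missing, None, "NA" or "." statuses as "not germline".
        return {
            r["sampleId"]
            for r in records
            if str(r.get("mutationStatus") or "").strip().lower() == "germline"
        }

    return sorted(germline_ids(before) - germline_ids(after))


# Hypothetical snapshots: an earlier export and a fresh pull of the same samples.
before = [
    {"sampleId": "SAMPLE-A", "mutationStatus": "GERMLINE"},
    {"sampleId": "SAMPLE-B", "mutationStatus": "SOMATIC"},
    {"sampleId": "SAMPLE-C", "mutationStatus": "Germline"},
]
after = [
    {"sampleId": "SAMPLE-A", "mutationStatus": "NA"},
    {"sampleId": "SAMPLE-B", "mutationStatus": "SOMATIC"},
    {"sampleId": "SAMPLE-C", "mutationStatus": None},
]

changed = lost_germline_samples(before, after)
print(changed)
```

Without an earlier export this comparison is not possible from my side, which is why I am asking whether the change can be traced in the source data.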
