GLASS Consortium. Longitudinal molecular trajectories of diffuse glioma in adults. Nature 2019 #933

Closed
9 tasks
jjgao opened this issue Dec 15, 2019 · 6 comments · Fixed by #1478

Comments

@jjgao
Member

jjgao commented Dec 15, 2019

https://www.nature.com/articles/s41586-019-1775-1

  • create an issue on datahub before curating a study (one issue per study) and copy this checklist to the issue tracker
  • List information of the dataset/paper in the issue, e.g. pmid, paper link, suppl file link
  • Document the curation process, e.g. how and by whom the data was transformed
  • Follow the data checklist
  • Create a pull request to datahub once the data is curated
  • Push to triage portal
  • Import into msk and public portal database
  • Update the default quick select study list (see "options for selecting tcga + non-tcga studies", cbioportal#5831); only for studies that do not overlap with existing ones, and only for human tumors, not cell lines
  • Update cBioPortal news
@stale

stale bot commented Aug 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Nov 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 8, 2020
@stale stale bot closed this as completed Nov 22, 2020
@rmadupuri rmadupuri reopened this Nov 23, 2020
@stale stale bot closed this as completed Dec 23, 2020
@rmadupuri rmadupuri reopened this Dec 23, 2020
@stale stale bot removed the wontfix label Dec 23, 2020
@tmazor
Contributor

tmazor commented Jun 4, 2021

Initial comments on review:

  1. Cancer Type Detailed is Glioblastoma for all samples, which is incorrect. Supp Table 8 lists histology and grade, which can be combined into OncoTree codes as follows:
    grade II --> ASTR, ODG or OAST, depending on histology
    grade III --> AASTR, AODG or AOAST, depending on histology
    grade IV --> Glioblastoma
  2. I selected IDH status = IDHmut (214 samples), then looked at IDH/codel subtype, and 55 samples are IDHwt. That seems like an error to me: at that point I only expect to see IDHmut-codel or IDHmut-noncodel.
  3. Similarly, I selected IDH status = IDHwt (301 samples), then looked at IDH/codel subtype, and there are 39 IDHmut-noncodel and 8 IDHmut-codel samples. That shouldn't be.
  4. Similarly, I selected 1p19q status = noncodel (466 samples) and when I look at IDH/codel subtype, there are 34 IDHmut-codel samples. That shouldn't be.
  5. Why is Initial Grade NA for so many samples? The data in Supp Table 7 seems mostly populated. Also shouldn't that be patient not sample data? Also if you're going to have this field, why not also Initial Tumor Histology?
  6. Do you have different data sources besides the supp tables? I picked a random field, e.g. MGMT Methylation, and tried to validate the numbers against Supp Table 8, and the numbers are different.
  7. What are 'Tumor Pairs' ?
  8. What is 'Tumor Sample Histology'? Is that Initial Tumor Histology? If so, same as question 5 above: why so many NAs, and why not a patient attribute?
  9. WHO classification of diagnostic tumor - is that actually for diagnostic tumor or is that for each sample?
  10. I selected samples with MGMT Methylation Method = array or PCR, and then look at MGMT Methylation and 63 samples are NA - that's surprising, I would expect if there is a method, then there should also be a result.
  11. Is Primary Tumor Laterality actually specific to the primary tumor? If I select samples with a value Right or Left, and then look at Sample Type, about half are actually the primary tumor and the other half are recurrences.
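
The grade and histology mapping proposed in item 1 can be sketched as a small lookup. This is only an illustration: the function name, the dictionary, and the lower-case histology spellings are assumptions, and the values in Supp Table 8 may be spelled differently.

```python
# Hypothetical sketch of the item 1 mapping; not the curation code used for the study.
# Histology spellings are assumed and may not match Supp Table 8 exactly.
ONCOTREE_BY_GRADE = {
    "II": {"astrocytoma": "ASTR", "oligodendroglioma": "ODG", "oligoastrocytoma": "OAST"},
    "III": {"astrocytoma": "AASTR", "oligodendroglioma": "AODG", "oligoastrocytoma": "AOAST"},
}


def cancer_type_detailed(grade, histology):
    """Return the OncoTree code for a grade/histology pair, or None if unmapped.

    Grade IV is returned as the label "Glioblastoma", as written in the review,
    regardless of histology.
    """
    grade = (grade or "").strip().upper()
    histology = (histology or "").strip().lower()
    if grade == "IV":
        return "Glioblastoma"
    return ONCOTREE_BY_GRADE.get(grade, {}).get(histology)
```

Returning None for unmapped pairs (for example a grade of NA) makes the gaps visible instead of silently defaulting every sample to one cancer type.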

@rmadupuri
Collaborator

rmadupuri commented Jun 30, 2021

Hi @tmazor, thank you for reviewing the data.

This study has multisector samples (data is at the aliquot level), and we had initially included the complete dataset, i.e. all the aliquots per sample, from Synapse (https://www.synapse.org/#!Synapse:syn17038081/wiki/). Hence the inconsistencies when compared with Supp Tables 7 and 8 of the paper.

We have now updated the study to the gold set: 222 patients with just two samples/aliquots each (444 samples), collected at the highest-quality timepoints per the QC filters stated here: https://www.nature.com/articles/s41586-019-1775-1/figures/5. The cohort is now in line with the analysis done in the Nature 2019 paper.
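
A quick sanity check for the gold-set shape described above (two samples per patient) could look like the following. The function name and the sample-to-patient dictionary are hypothetical; the real clinical files would need to be parsed into that shape first.

```python
from collections import Counter


def gold_set_problems(sample_to_patient, expected_per_patient=2):
    """Return {patient_id: sample_count} for patients that do not have the expected count.

    sample_to_patient is an assumed input: a dict mapping sample ID to patient ID.
    """
    per_patient = Counter(sample_to_patient.values())
    return {p: n for p, n in per_patient.items() if n != expected_per_patient}
```

An empty result, together with 222 distinct patients and 444 samples, would match the numbers quoted above.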

Fixes made:
1-4. Fixed.
5. There are both Initial Tumor Grade (patient level) and per-sample Grade attributes. There are also Initial Tumor Histology (patient level) and sample-level Histology attributes. We have included both for now.
6. Fixed.
7. Tumor pairs: each pair represents a combination of two tumor aliquots in the given patient. We have removed this attribute, as the gold set has one high-quality pair selected. An example table can be seen here: https://www.synapse.org/#!Synapse:syn18483769/tables/
9. This is a per-sample classification based on the sample histology.
10. The Methylation Status is given as NA for a few samples in the supplemental tables as well as in Synapse, even though there is a method. We will look into it for additional details.
11. Updated the attribute name. It is not specific to the primary tumor.
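
One way to confirm that review points 2-4 stay fixed is to check each sample's IDH/codel subtype against its IDH status and 1p19q status. The sketch below assumes the values quoted in the review (IDHmut, IDHwt, codel, noncodel, IDHmut-codel, IDHmut-noncodel); the dictionary keys are hypothetical, not the study's real column IDs.

```python
def subtype_conflicts(samples):
    """Return (sample_id, reason) for samples whose subtype contradicts its components.

    Each sample is an assumed dict with keys sample_id, idh_status, codel_status
    and idh_codel_subtype. Samples with an NA IDH status or subtype are skipped.
    """
    conflicts = []
    for s in samples:
        idh, codel, subtype = s["idh_status"], s["codel_status"], s["idh_codel_subtype"]
        if "NA" in (idh, subtype):
            continue
        # Split "IDHmut-noncodel" into ["IDHmut", "noncodel"]; "IDHwt" has no second part.
        parts = subtype.split("-", 1)
        if parts[0] != idh:
            conflicts.append((s["sample_id"], "IDH status disagrees with subtype"))
        elif len(parts) == 2 and codel != "NA" and parts[1] != codel:
            conflicts.append((s["sample_id"], "1p19q status disagrees with subtype"))
    return conflicts
```

The subtype is split on the hyphen rather than compared with a suffix test, because "noncodel" itself ends in "codel".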

@tmazor
Contributor

tmazor commented Jul 22, 2021

This looks much better! Thanks for the updates @rmadupuri .

I reviewed all my original comments, and everything seems to be resolved except for the first one. Cancer Type Detailed looks much better, but it looks like grade 2/3 tumors are all classified as Astro, Oligo or Oligoastro. Those OncoTree codes really refer to grade 2 tumors, and the grade 3 tumors should be the corresponding Anaplastic OncoTree codes. Unless there's a reason not to? I know there are studies in the public portal using the Anaplastic codes.

Also, upon further review, I'm actually thinking that Cancer Type Detailed should be populated based on the WHO Classification rather than Grade+Histology as I had originally suggested.

Some additional things I've noticed:

  1. Hypermutation seems to be a patient-level attribute, but I think it should be sample-specific.
  2. Mutation counts are the same for both samples from each patient. This can't be right.
    image
  3. I only looked at a handful of patients, but they all had multiple IDH1 mutations, e.g.
    http://private.cbioportal.org/private/patient?studyId=difg_glass_2019&caseId=GLSS-19-0273
    image
    That also can't be right.
    In fact, the Mutated Genes table makes clear that there are too many IDH1 mutations:
    image
    Not all samples should have IDH1 mutations (see the IDH Status chart), and samples that do should almost always have only one.
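
Both symptoms in items 2 and 3 can be screened for directly from MAF-style rows. The sketch below is an assumption-laden illustration: it uses the standard MAF columns Hugo_Symbol and Tumor_Sample_Barcode, while the function name and the sample-to-patient dictionary are hypothetical.

```python
from collections import Counter, defaultdict


def suspicious_mutation_patterns(maf_rows, sample_to_patient, gene="IDH1"):
    """Flag patients with identical mutation counts and samples with repeated gene hits.

    maf_rows: iterable of dicts with Hugo_Symbol and Tumor_Sample_Barcode.
    sample_to_patient: assumed dict mapping sample ID to patient ID.
    Returns (patients_with_identical_counts, samples_with_multiple_gene_mutations).
    """
    maf_rows = list(maf_rows)
    per_sample = Counter(r["Tumor_Sample_Barcode"] for r in maf_rows)
    gene_hits = Counter(
        r["Tumor_Sample_Barcode"] for r in maf_rows if r["Hugo_Symbol"] == gene
    )

    counts_by_patient = defaultdict(list)
    for sample, patient in sample_to_patient.items():
        counts_by_patient[patient].append(per_sample.get(sample, 0))

    # Identical counts across a patient's samples can happen by chance,
    # so treat these as candidates for manual review, not proof of an error.
    same_count = sorted(
        p for p, counts in counts_by_patient.items()
        if len(counts) > 1 and len(set(counts)) == 1
    )
    multi_hit = sorted(s for s, n in gene_hits.items() if n > 1)
    return same_count, multi_hit
```

If nearly every patient shows up in the first list, that points to variants being listed for all of a patient's samples rather than per sample.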

@rmadupuri
Collaborator

rmadupuri commented Nov 11, 2021

Hi @tmazor! Thank you again for the review and apologies for the delayed response. We have fixed the issues above.

  1. Cancer Type - Fixed. Updated based on WHO classification. Includes Anaplastic codes.

  2. Hypermutation Status - Fixed the attribute name/description. It represents the hypermutation status of the recurrent tumor only (i.e. whether the recurrent tumor in the tumor pair has a mutation burden (Mutations/Mb) > 10). Hence we added this as a patient-level attribute.

  3. Mutation counts: Fixed.
    We reached out to the authors and learned that the original MAF had single-sample mutect calls overlaid on the multi-sample mutect2 calls. For each patient, the variant information was listed for all samples (including normal blood). We have filtered the single-sample mutect calls based on a flag in the MAF (as directed by the author).
    I have added the detailed data transformation steps here: difg_glass_2019: Add new study #1478 (comment).

  4. IDH variants were a special case. Filtering the MAF as above produced very few IDH variants, even though the samples were clinically known to be IDHmut. The author suggested force-filtering the IDH variants, i.e. keeping them for samples that were known to be true IDH mutants from clinical tests and where the variant alt_counts > 0. This is resolved now.
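
The filtering described in items 3 and 4 can be sketched roughly as follows. Everything specific here is an assumption: the flag column name PASS_FLAG and its "PASS" value are placeholders for the real flag in the authors' MAF, the alt count is read from the standard MAF column t_alt_count, and the IDH gene set is assumed to be IDH1 and IDH2. The actual transformation steps are documented in the comment on #1478.

```python
# Assumption: "IDH variants" means variants in IDH1 or IDH2.
IDH_GENES = {"IDH1", "IDH2"}


def filter_maf(rows, clinical_idh_mutant_samples, flag_column="PASS_FLAG"):
    """Keep rows that pass the (hypothetical) flag, plus rescued IDH variants.

    An IDH variant is rescued when its sample is clinically IDH-mutant and the
    variant has at least one supporting alt read (t_alt_count > 0).
    """
    kept = []
    for row in rows:
        passes_flag = row.get(flag_column) == "PASS"  # placeholder flag semantics
        rescued = (
            row["Hugo_Symbol"] in IDH_GENES
            and row["Tumor_Sample_Barcode"] in clinical_idh_mutant_samples
            and int(row.get("t_alt_count") or 0) > 0
        )
        if passes_flag or rescued:
            kept.append(row)
    return kept
```

Restricting the rescue to clinically confirmed IDH-mutant samples with alt read support keeps it from reintroducing the patient-wide duplicate calls that the flag filter removes.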

We have released this study publicly here: https://www.cbioportal.org/study/summary?id=difg_glass_2019. We have shared the link with the authors too.

Thank you!
