Productionize LD code for gnomAD v4 SNVs/Indels #634

Open
wants to merge 9 commits into
base: main

matren395
Contributor

Ported the LD code from gnomAD v2. It does not run on SVs or for cross-population analyses.

@matren395 matren395 added the v4.1 label Sep 11, 2024
@matren395 matren395 self-assigned this Sep 11, 2024
@matren395 matren395 marked this pull request as ready for review September 24, 2024 19:35
@matren395
Contributor Author

Okay, I'm actually fine attaching my name to this now! Ready for review.

pop_freq = pop_mt.freq[meta_index]
pop_mt = pop_mt.annotate_rows(pop_freq=pop_freq)

pop_mt = pop_mt.filter_rows((hl.len(pop_mt.filters) == 0))
Member

Minor comment: this line duplicates line 88

pop_freq = pop_mt.freq[meta_index]
pop_mt = pop_mt.annotate_rows(pop_freq=pop_freq)

pop_mt = pop_mt.filter_rows((hl.len(pop_mt.filters) == 0))
Member

More of a note for future reference:
In my case, this filter line is replaced with the set of filters below:

  1. Entries: high quality variants (~VQSR)
  2. Entries: adj filters
  3. Rows: hl.agg.any(mt.GT.n_alt_alleles() > 0)
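To make the interaction between the entry-level and row-level filters concrete, here is a plain-Python toy (not Hail) covering steps 2 and 3 of the list above. The `passes_adj` flag and the helper names are hypothetical. The point it illustrates: once entries are blanked out, a variant can become monomorphic in the subset, and the row filter then drops it so that no constant (zero-variance) row reaches the correlation step.

```python
from typing import List, Optional

# Alt-allele count (0, 1, 2), or None once an entry has been filtered out.
Genotype = Optional[int]


def keep_entry(n_alt: int, passes_adj: bool) -> Genotype:
    """Entry-level filter: blank out calls that fail the (hypothetical) adj flag."""
    return n_alt if passes_adj else None


def has_non_ref(row: List[Genotype]) -> bool:
    """Row-level filter, analogous to hl.agg.any(mt.GT.n_alt_alleles() > 0)."""
    return any(g is not None and g > 0 for g in row)


# Toy data: each row is a variant, each column a sample, as (n_alt, passes_adj).
raw = [
    [(0, True), (1, True), (2, True)],   # polymorphic, stays
    [(0, True), (1, False), (0, True)],  # only alt call fails adj, so it becomes monomorphic
    [(0, True), (0, True), (0, True)],   # monomorphic to begin with
]

filtered = [[keep_entry(n, ok) for n, ok in row] for row in raw]
kept_rows = [i for i, row in enumerate(filtered) if has_non_ref(row)]
```

Only the first variant survives here; the second one is removed purely as a side effect of the entry filter, which is why the order of the two steps matters.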

return pop_mt


def generate_ld_pruned_set(
Member

Just a note to confirm: this function is not needed for computing LD scores.

),
overwrite,
)
ld = hl.ld_matrix(pop_mt.GT.n_alt_alleles(), pop_mt.locus, radius)
Member

In my old script, I ran BlockMatrix.write_from_entry_expr(mt.GT.n_alt_alleles(), tmp_bm_path, mean_impute=True, center=False, normalize=False, overwrite=args.overwrite). I wonder how much difference this will introduce.
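For reference, a small plain-Python sketch of what the `mean_impute`, `center` and `normalize` options correspond to arithmetically. It is a toy under the assumption that the LD matrix is a row-correlation matrix (to my understanding `hl.ld_matrix` mean-imputes, centers and normalizes internally, but that is worth confirming against the Hail docs). The helper names are made up for illustration.

```python
import math
from typing import List, Optional


def mean_impute(row: List[Optional[float]]) -> List[float]:
    """Replace missing calls with the mean of the observed calls in the row."""
    observed = [x for x in row if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in row]


def center_and_normalize(row: List[float]) -> List[float]:
    """Subtract the row mean, then scale the row to unit Euclidean norm."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Two toy variants across five samples; None marks a missing genotype.
v1 = [0, 1, 2, None, 1]
v2 = [0, 1, 1, 2, None]

imputed = [mean_impute(v) for v in (v1, v2)]
standardized = [center_and_normalize(v) for v in imputed]

# With center=False, normalize=False the stored rows are only mean-imputed,
# so their dot product is not yet a correlation.
raw_product = dot(imputed[0], imputed[1])

# After centering and normalizing, the dot product is the Pearson r.
correlation = dot(standardized[0], standardized[1])
```

So if the old script standardized the rows downstream, the two routes should agree up to floating-point error; if it did not, the resulting matrices are on different scales.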

)
ld = hl.ld_matrix(pop_mt.GT.n_alt_alleles(), pop_mt.locus, radius)
if data_type != "genomes_snv_sv":
    ld = ld.sparsify_triangle()
Member

Question: any thoughts on why this shouldn't be applied to all cases?
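As background for the question, a plain-Python toy of what triangle sparsification keeps for a symmetric matrix. It assumes the triangle retains the diagonal and says nothing about why the `genomes_snv_sv` branch is treated differently; it only shows that for a symmetric LD matrix the upper triangle is lossless.

```python
from typing import List

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> Matrix:
    """Zero out everything below the diagonal (the diagonal itself is kept)."""
    n = len(m)
    return [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def mirror(u: Matrix) -> Matrix:
    """Rebuild the full symmetric matrix from its upper triangle."""
    n = len(u)
    return [[u[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]


# Toy symmetric correlation matrix for three variants.
r = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

u = upper_triangle(r)
stored = sum(1 for row in u for x in row if x != 0.0)  # n * (n + 1) / 2 entries
restored = mirror(u)
```

Six of the nine entries are stored, and mirroring recovers the original matrix exactly.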


l2row = r2_adj.sum(axis=0).T
l2col = r2_adj.sum(axis=1)
l2 = l2row + l2col + 1
Member

One more note: I had this line as

r2_diag = checkpoint_tmp(r2_adj.diagonal()).T
l2 = l2row + l2col - r2_diag
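The two formulas agree only under different assumptions about the diagonal, which a toy in plain Python can show. This is a sketch, not the PR's code: it assumes `r2_adj` is a triangle of a symmetric matrix with a unit diagonal, and the helper names are illustrative. Whether the diagonal is actually present in `r2_adj` after sparsification is the thing to check.

```python
from typing import List

Matrix = List[List[float]]


def col_sums(m: Matrix) -> List[float]:
    """Equivalent of sum(axis=0): one total per column."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


def row_sums(m: Matrix) -> List[float]:
    """Equivalent of sum(axis=1): one total per row."""
    return [sum(row) for row in m]


# Toy symmetric r2 matrix with a unit diagonal.
r2 = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.125],
    [0.25, 0.125, 1.0],
]
n = len(r2)
expected = row_sums(r2)  # LD score of each variant, self term included

# Case A: upper triangle stored WITH the diagonal. The diagonal is counted in
# both the row sum and the column sum, so it is subtracted once.
with_diag = [[r2[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
diag = [with_diag[i][i] for i in range(n)]
l2_a = [c + r - d for c, r, d in zip(col_sums(with_diag), row_sums(with_diag), diag)]

# Case B: strictly upper triangle (diagonal dropped). The self term is missing
# from both sums, so it is added back as +1.
no_diag = [[r2[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]
l2_b = [c + r + 1.0 for c, r in zip(col_sums(no_diag), row_sums(no_diag))]

# Mixing the two (diagonal present, but +1 applied) overshoots by 2 per variant.
l2_mixed = [c + r + 1.0 for c, r in zip(col_sums(with_diag), row_sums(with_diag))]
```

Both `l2_a` and `l2_b` reproduce the full-matrix row sums, while `l2_mixed` is larger by exactly 2 for every variant, so the choice between `+ 1` and `- r2_diag` should follow from whether the triangle keeps the diagonal.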

2 participants