Running multiple traits #11

Open
swvanderlaan opened this issue Dec 1, 2020 · 4 comments

Comments

@swvanderlaan

Hi,
We wanted to run HDL using multiple traits (19 to be exact). Just checking, based on the instructions and the code: is it correct that one can only test two traits at a time?

Have you implemented a smart way to test 19 traits?

Thanks

Sander

@zhenin
Owner

zhenin commented Dec 2, 2020

Hi Sander,

Yes, the current version of HDL only supports testing two traits at a time. I do have some code that saves computational resources, but I have not yet cleaned it up and made it ready for users.

Thanks for your interest in HDL!

Best regards,
Zheng

@zhenin zhenin closed this as completed Jan 8, 2021
@swvanderlaan
Author

Hi,
Would be great for us (tagging @Kai6662) if there's code that will handle more than two traits. It would make things more efficient indeed.

Thanks!

Sander

@zhenin zhenin reopened this Jan 25, 2021
@zhenin
Owner

zhenin commented Jan 25, 2021

Hi Sander,

Thanks for the advice! I will try to summarize the code for multiple traits and write instructions in the Wiki. Parallelization is needed for sure, but running HDL for multiple traits takes much less time and memory with the (hopefully) smarter code.

To be more specific, loading the eigens of the reference panel takes a lot of time and memory in HDL. If we want to estimate rg between 10 traits (45 pairs), we only need to do that loading step once. But if we run normal HDL 45 times, we repeat the loading step 45 times, which is a huge waste.
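
The load-once idea described above can be sketched roughly as follows. This is only an illustration in Python, not the HDL package's code (HDL itself is an R package): `run_all_pairs`, `load_reference`, and `estimate_rg` are hypothetical names, and the two functions passed in at the bottom are stand-ins that do no real computation.

```python
from itertools import combinations


def run_all_pairs(traits, load_reference, estimate_rg):
    """Load the reference panel once, then estimate rg for every unordered pair.

    `load_reference` and `estimate_rg` are hypothetical callables supplied by
    the caller; they are not part of any real HDL API.
    """
    reference = load_reference()  # the expensive step, done a single time
    return {
        (a, b): estimate_rg(a, b, reference)
        for a, b in combinations(traits, 2)
    }


# Stand-ins so the sketch runs; they only count calls and return placeholders.
load_calls = 0


def fake_load_reference():
    global load_calls
    load_calls += 1
    return {"panel": "placeholder"}


def fake_estimate_rg(trait_a, trait_b, reference):
    return None  # a real estimator would return the rg estimate here


traits = [f"trait{i}" for i in range(1, 11)]  # 10 traits
results = run_all_pairs(traits, fake_load_reference, fake_estimate_rg)
print(len(results), "pairs,", load_calls, "reference load")
```

With 10 traits this yields 45 pairs while the reference is loaded only once, instead of once per pair.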

It is tricky to pack the code into a package because different steps need different degrees of parallelization. So I plan to share the code first, and we will see whether it can be improved :).

Best,
Zheng

@s-bell

s-bell commented May 5, 2021

Hi Zheng,

Thanks for all of your hard work on HDL.

Picking up on this topic again: suppose one isn't so much interested in the full genetic correlation matrix (10 traits = 45 pairs, as you describe above, which is challenging to implement in ldsc too) and instead "just" wants to compute the correlations between one trait and multiple others (for example, coronary artery disease versus HDL cholesterol, LDL cholesterol, systolic blood pressure [...]). Is there an easier way to implement this than using an array for the second trait, which would also require reloading the reference panel eigens repeatedly? I'm thinking of something akin to ldsc, where you specify --rg CAD,HDL,LDL,SBP and get a summary printed near the bottom of the output file along the lines of:

p1 p2 rg
CAD HDL -0.15
CAD LDL 0.32
CAD SBP 0.46

(n.b. no HDL - LDL, HDL - SBP, LDL - SBP rows; just genetic correlations for the first trait listed and all those after it)
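
A minimal sketch of the one-versus-many pattern described above, written in Python for illustration only. `one_vs_many`, `format_table`, and `estimate_rg` are hypothetical names rather than HDL or ldsc functions, and the lookup table simply replays the illustrative numbers from the example output above; it is not real data.

```python
def one_vs_many(traits, reference, estimate_rg):
    """Pair the first trait with each later trait, reusing one loaded reference."""
    first, *others = traits
    return [(first, other, estimate_rg(first, other, reference)) for other in others]


def format_table(rows):
    """Render rows as a small whitespace-separated summary table."""
    lines = ["p1 p2 rg"]
    lines += [f"{p1} {p2} {rg:.2f}" for p1, p2, rg in rows]
    return "\n".join(lines)


# Stand-in estimator: looks up the illustrative values from the example above.
EXAMPLE_RG = {("CAD", "HDL"): -0.15, ("CAD", "LDL"): 0.32, ("CAD", "SBP"): 0.46}


def fake_estimate_rg(trait_a, trait_b, reference):
    return EXAMPLE_RG[(trait_a, trait_b)]


reference = object()  # placeholder for a reference panel loaded once
rows = one_vs_many(["CAD", "HDL", "LDL", "SBP"], reference, fake_estimate_rg)
table = format_table(rows)
print(table)
```

Only pairs involving the first trait are produced, so four traits give three rows rather than six.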

This would make it much more tractable to perform high-throughput genetic correlation analyses for a single trait. It is also perhaps a little more in line with the analyses an investigator might typically wish to perform than a full matrix operation each time: carry out a GWAS of trait A and then look for correlations with traits B-Z (estimating 25 genetic correlations), rather than an Ai,i matrix, which would represent 325 genetic correlations for all A-Z pairings if continuing our example, 300 of which may not be informative or required to tackle the question at hand.
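
The counts quoted in this thread can be checked with a few lines of Python. This is just arithmetic on the number of traits; `pair_counts` is a helper name made up for this example.

```python
from math import comb


def pair_counts(n_traits):
    """Return (one-vs-rest count, all-pairs count) for n_traits traits."""
    return n_traits - 1, comb(n_traits, 2)


for n in (10, 19, 26):
    one_vs_rest, all_pairs = pair_counts(n)
    print(f"{n} traits: {one_vs_rest} one-vs-rest, {all_pairs} pairs, "
          f"{all_pairs - one_vs_rest} extra")
```

For 26 traits (A-Z) this gives 25 one-vs-rest correlations against 325 pairwise ones, a difference of 300; 10 traits give 45 pairs, and the 19 traits from the original question give 171.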

All the best,

Steven
