Running multiple traits #11

swvanderlaan · 2020-12-01T09:20:49Z

Hi,
We wanted to run HDL using multiple traits (19 to be exact). Just checking, based on the instructions and the code: is it correct that one can only test two traits at a time?

Have you implemented a smart way to test 19 traits?

Thanks

Sander

zhenin · 2020-12-02T08:23:29Z

Hi Sander,

Yes, the current version of HDL only supports testing two traits at a time. I do have some code that saves computational resources, but I have not yet cleaned it up and made it ready for users.

Thanks for your interest in HDL!

Best regards,
Zheng

swvanderlaan · 2021-01-19T21:25:17Z

Hi,
Would be great for us (tagging @Kai6662) if there's code that will handle more than two traits. It would make things more efficient indeed.

Thanks!

Sander

zhenin · 2021-01-25T10:00:16Z

Hi Sander,

Thanks for the advice! I will try to summarize the code for multiple traits and write instructions in the Wiki. Parallelization is needed for sure, but it takes much less time and memory to run HDL for multiple traits with the (hopefully) smarter code.

To be more specific, in HDL, it takes a lot of time and memory to load the eigens of the reference panel. If we want to estimate rg between 10 traits (45 pairs), we actually only need to do the above step once. But if we run a normal HDL 45 times, then we do the above step 45 times, which is indeed a huge waste.
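The saving described above can be sketched in a few lines of Python. This is only an illustration of the "load once, reuse for every pair" idea, not HDL code: `PanelLoader` and `estimate_pair` are hypothetical stand-ins, and no real rg is computed.

```python
from itertools import combinations


class PanelLoader:
    """Hypothetical stand-in for the expensive reference-panel load."""

    def __init__(self):
        self.loads = 0  # how many times the expensive step has run

    def load(self):
        self.loads += 1
        return object()  # placeholder for the loaded eigen data


def estimate_pair(trait_a, trait_b, eigens):
    """Hypothetical stand-in for one pairwise estimate (returns the pair only)."""
    return (trait_a, trait_b)


def run_naive(traits, loader):
    # Reloads the panel for every pair, as 45 separate runs would.
    results = []
    for a, b in combinations(traits, 2):
        eigens = loader.load()
        results.append(estimate_pair(a, b, eigens))
    return results


def run_shared(traits, loader):
    # Loads the panel once and reuses it for all pairs.
    eigens = loader.load()
    return [estimate_pair(a, b, eigens) for a, b in combinations(traits, 2)]


traits = [f"trait{i}" for i in range(1, 11)]  # 10 traits -> 45 pairs
naive_loader, shared_loader = PanelLoader(), PanelLoader()
naive = run_naive(traits, naive_loader)
shared = run_shared(traits, shared_loader)
print(len(naive), naive_loader.loads)    # 45 pairs, 45 loads
print(len(shared), shared_loader.loads)  # 45 pairs, 1 load
```

Both functions cover the same 45 pairs; only the number of panel loads differs.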

It is tricky to pack the code into a package because the need for parallelization differs between steps. So I plan to share the code first and we will see whether it can be improved :).

Best,
Zheng

s-bell · 2021-05-05T12:24:43Z

Hi Zheng,

Thanks for all of your hard work on HDL.

Picking up on this topic again: suppose one isn't so much interested in the full genetic correlation matrix (10 traits = 45 pairs as you describe above, which is challenging to implement in ldsc too) and instead "just" wants to compute the correlations between one trait and multiple others (for example, coronary artery disease and HDL cholesterol, LDL cholesterol, systolic blood pressure [...]). Is there an easier way to implement this than using an array for the second trait, which would also require reloading the reference panel eigens repeatedly? I'm thinking of something akin to ldsc, where you specify --rg CAD,HDL,LDL,SBP and get a summary printed near the bottom of the output file along the lines of:

p1	p2	rg
CAD	HDL	-0.15
CAD	LDL	0.32
CAD	SBP	0.46

(n.b. no HDL - LDL, HDL - SBP, LDL - SBP rows; just genetic correlations for the first trait listed and all those after it)

This would make high-throughput genetic correlation analyses for a single trait much more tractable, and it is perhaps a little closer to what an investigator typically wants than a full matrix operation each time. For example, one might carry out a GWAS of trait A and then look for correlations with traits B-Z (25 genetic correlations), rather than estimating the full matrix for A-Z, which would represent 325 pairwise genetic correlations, 300 of which may not be informative or required to tackle the question at hand.
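The counts in this example can be checked with a short Python sketch. It only enumerates trait pairs and does not call ldsc or HDL; the function names `all_pairs` and `one_vs_rest` are made up for illustration.

```python
from itertools import combinations
from string import ascii_uppercase


def all_pairs(traits):
    """Every unordered pair of traits (the full correlation matrix)."""
    return list(combinations(traits, 2))


def one_vs_rest(traits):
    """Only the pairs between the first trait and each trait after it."""
    first, *others = traits
    return [(first, other) for other in others]


letters = list(ascii_uppercase)  # traits A-Z
full = all_pairs(letters)
focused = one_vs_rest(letters)
print(len(full), len(focused), len(full) - len(focused))  # 325 25 300

# The four-trait example from above: only the CAD rows are produced.
print(one_vs_rest(["CAD", "HDL", "LDL", "SBP"]))
# [('CAD', 'HDL'), ('CAD', 'LDL'), ('CAD', 'SBP')]
```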

All the best,

Steven

zhenin closed this as completed Jan 8, 2021

zhenin reopened this Jan 25, 2021
