How to interpret OrthologuesStats*.tsv files? #259
Hi Julien,

I've just checked the matrix in your post and it is symmetric. For example, it reports 12540 one-to-one orthologs between Chaar and Latma, and you get the same number whether you look at the M(1,0) entry or the M(0,1) entry. So for each pair of species, the corresponding entry in the matrix is the number of one-to-one orthologs between that pair. You don't need to sum over the rows or columns.

All the best
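For anyone who wants to check the symmetry rather than eyeball it, here is a minimal sketch in plain Python. It parses the one-to-one matrix exactly as it was pasted in the original post (whitespace-separated, header row without a leading label); the actual .tsv file may be laid out slightly differently, so treat `parse_matrix` and `is_symmetric` as illustrative helper names rather than anything provided by OrthoFinder.

```python
# One-to-one matrix copied from the original post in this issue.
RAW = """
Chaar Latma Pagma Parch Perfl Sanlu Silsi
Chaar 0.0 12540.0 11361.0 13307.0 9480.0 6867.0 14564.0
Latma 12540.0 0.0 10242.0 11736.0 8457.0 6323.0 12891.0
Pagma 11361.0 10242.0 0.0 11388.0 7496.0 6068.0 12220.0
Parch 13307.0 11736.0 11388.0 0.0 9261.0 6840.0 13963.0
Perfl 9480.0 8457.0 7496.0 9261.0 0.0 5292.0 9781.0
Sanlu 6867.0 6323.0 6068.0 6840.0 5292.0 0.0 7035.0
Silsi 14564.0 12891.0 12220.0 13963.0 9781.0 7035.0 0.0
"""


def parse_matrix(text):
    """Return (species, matrix) where matrix[row][col] is a float."""
    rows = [line.split() for line in text.strip().splitlines()]
    species = rows[0]  # header row: column labels only
    matrix = {}
    for row in rows[1:]:
        row_species, values = row[0], row[1:]
        matrix[row_species] = {col: float(v) for col, v in zip(species, values)}
    return species, matrix


def is_symmetric(species, matrix):
    """True if M(a, b) == M(b, a) for every pair of species."""
    return all(matrix[a][b] == matrix[b][a] for a in species for b in species)


species, one_to_one = parse_matrix(RAW)
print(is_symmetric(species, one_to_one))
# The one-to-one count for a pair can be read from either entry.
print(one_to_one["Chaar"]["Latma"], one_to_one["Latma"]["Chaar"])
```

With the matrix above this prints `True`, followed by `12540.0 12540.0`.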
Hi David, Thanks for the helpful answer. I have now understood how to read the matrix. How about this one-to-many matrix?
Chaar 0.0 816.0 1269.0 1511.0 6740.0 597.0 871.0

Best,
Hi Julien,

The reason for this is that it's not a symmetrical relationship (unlike one-to-one). Thanks for bringing this up; below is an explanation of how this works. I'll add something to the README file to describe these results files more fully, as I realise now that there isn't enough information for users to interpret them.

For some gene trees you will have multiple duplication events post-speciation. This could lead to, for example, 2 genes in Latma being orthologs of 3 genes in Chaar. All of these occurrences are summed up in the many-to-many matrix. This case would add 2 to the entry for M(Latma, Chaar) and 3 to the entry for M(Chaar, Latma).

This is a tree showing 3 genes in arabidopsis (AT2G07671, ATMG01080, ATMG00040) that are orthologs of 2 genes in volvox (Vocar.0009s0017.1, Vocar.0009s0018.1):

For the one-to-many/many-to-one relationships, you might have matrices like this:

one-to-many, X=
many-to-one, Y=
This means there are 1693 genes in V. carteri that are in a one-to-many relationship with orthologs in A. thaliana, whereas there are only 115 genes in A. thaliana that are in a one-to-many relationship with genes in V. carteri. That matches what you would expect: the genome of A. thaliana is larger, and there have been more gene duplication events in the lineage leading to A. thaliana than in the one leading to the green alga V. carteri. A little care needs to be taken when reading these files though, as the 1693 genes in volvox are orthologs of the 5596 genes in arabidopsis (i.e.

All the best
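To make the counting rule concrete, here is a rough sketch of how such matrices could be tallied from a list of ortholog groups. This is my reading of the explanation above, not OrthoFinder's actual code: for each group, the entry M(A, B) of the matching relationship matrix grows by the number of genes from species A, and the relationship is named from the row species' point of view. The function names, the species labels `Athaliana` and `Vcarteri`, and the second group's gene IDs are all hypothetical.

```python
from collections import defaultdict


def relationship(n_row, n_col):
    """Name the relationship from the row species' point of view."""
    row = "one" if n_row == 1 else "many"
    col = "one" if n_col == 1 else "many"
    return f"{row}-to-{col}"


def tally(groups):
    """groups: iterable of (species_a, genes_a, species_b, genes_b).

    Returns counts[relationship][(row_species, col_species)], where each
    group adds the number of row-species genes to its entry (assumed rule).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sp_a, genes_a, sp_b, genes_b in groups:
        n_a, n_b = len(genes_a), len(genes_b)
        counts[relationship(n_a, n_b)][(sp_a, sp_b)] += n_a
        counts[relationship(n_b, n_a)][(sp_b, sp_a)] += n_b
    return counts


groups = [
    # The 3-vs-2 example from the tree mentioned in this thread.
    ("Athaliana", ["AT2G07671", "ATMG01080", "ATMG00040"],
     "Vcarteri", ["Vocar.0009s0017.1", "Vocar.0009s0018.1"]),
    # Hypothetical group: one volvox gene with two arabidopsis orthologs.
    ("Vcarteri", ["Vocar.example.1"],
     "Athaliana", ["AT.example.1", "AT.example.2"]),
]

counts = tally(groups)
for name, entries in sorted(counts.items()):
    for (row, col), n in sorted(entries.items()):
        print(f"{name}: M({row}, {col}) = {n}")
```

Under this rule the first group adds 3 to the many-to-many entry M(Athaliana, Vcarteri) and 2 to M(Vcarteri, Athaliana), which is why those matrices are not symmetric. The second group adds 1 to the one-to-many entry for volvox and 2 to the many-to-one entry for arabidopsis.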
Hi David,

Thank you for the comprehensive explanation. It would be great if you could edit the README to help users interpret those results.

Best,

PS: I deleted the previous post by mistake. I'll just repost the one-to-many matrix here for other readers who might be interested in this issue.
Hello,
It is not clear to me how to interpret the OrthologuesStats*.tsv files.
For instance, this matrix from the OrthologuesStats_one-to-one.tsv file is not symmetric. It is not clear how to infer the total number of one-to-one orthologs for each species. Is it the row sums or the column sums?
Chaar Latma Pagma Parch Perfl Sanlu Silsi
Chaar 0.0 12540.0 11361.0 13307.0 9480.0 6867.0 14564.0
Latma 12540.0 0.0 10242.0 11736.0 8457.0 6323.0 12891.0
Pagma 11361.0 10242.0 0.0 11388.0 7496.0 6068.0 12220.0
Parch 13307.0 11736.0 11388.0 0.0 9261.0 6840.0 13963.0
Perfl 9480.0 8457.0 7496.0 9261.0 0.0 5292.0 9781.0
Sanlu 6867.0 6323.0 6068.0 6840.0 5292.0 0.0 7035.0
Silsi 14564.0 12891.0 12220.0 13963.0 9781.0 7035.0 0.0
Thank you,
Julien