How to interpret OrthologuesStats*.tsv files? #259

Closed
bbalog87 opened this issue May 6, 2019 · 4 comments

@bbalog87

bbalog87 commented May 6, 2019

Hello,

It is not clear to me how to interpret the OrthologuesStats*.tsv files.

For instance, this matrix from the OrthologuesStats_one-to-one.tsv file is not symmetric. It is not clear how to infer the total number of one-to-one orthologs for each species. Is it the row sum or the column sum?
         Chaar    Latma    Pagma    Parch    Perfl    Sanlu    Silsi
Chaar      0.0  12540.0  11361.0  13307.0   9480.0   6867.0  14564.0
Latma  12540.0      0.0  10242.0  11736.0   8457.0   6323.0  12891.0
Pagma  11361.0  10242.0      0.0  11388.0   7496.0   6068.0  12220.0
Parch  13307.0  11736.0  11388.0      0.0   9261.0   6840.0  13963.0
Perfl   9480.0   8457.0   7496.0   9261.0      0.0   5292.0   9781.0
Sanlu   6867.0   6323.0   6068.0   6840.0   5292.0      0.0   7035.0
Silsi  14564.0  12891.0  12220.0  13963.0   9781.0   7035.0      0.0

Thank you,
Julien

@davidemms
Owner

Hi Julien

I've just checked the matrix in your post and it is symmetric. For example, the number of one-to-one orthologs between Chaar and Latma is 12540, whether you look at the M(1,0) entry of the matrix or the M(0,1) entry. For each pair of species, the corresponding entry in the matrix is the number of one-to-one orthologs between that pair. You don't need to sum over the rows or columns.
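The symmetry check described above can be reproduced with a short script. This is only a minimal sketch: the values are copied by hand from the matrix posted in this thread, and the helper names `is_symmetric` and `pair_count` are made up for illustration, not part of OrthoFinder.

```python
# Values copied from the OrthologuesStats_one-to-one.tsv excerpt in this thread.
species = ["Chaar", "Latma", "Pagma", "Parch", "Perfl", "Sanlu", "Silsi"]
one_to_one = [
    [0, 12540, 11361, 13307, 9480, 6867, 14564],
    [12540, 0, 10242, 11736, 8457, 6323, 12891],
    [11361, 10242, 0, 11388, 7496, 6068, 12220],
    [13307, 11736, 11388, 0, 9261, 6840, 13963],
    [9480, 8457, 7496, 9261, 0, 5292, 9781],
    [6867, 6323, 6068, 6840, 5292, 0, 7035],
    [14564, 12891, 12220, 13963, 9781, 7035, 0],
]


def is_symmetric(matrix):
    """Return True if M(i, j) == M(j, i) for every pair of indices (hypothetical helper)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(i + 1, n))


def pair_count(matrix, names, a, b):
    """Look up the entry for species pair (a, b) by name (hypothetical helper)."""
    return matrix[names.index(a)][names.index(b)]


print(is_symmetric(one_to_one))
print(pair_count(one_to_one, species, "Chaar", "Latma"))
print(pair_count(one_to_one, species, "Latma", "Chaar"))
```

Each entry is read on its own as the count for one pair of species, so no row or column sums are involved.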

All the best
David

@bbalog87
Author

bbalog87 commented May 7, 2019

Hi David,

Thanks for the helpful answer. I have now understood how to read the matrix.

How about this one-to-many matrix?

        Chaar   Latma   Pagma   Parch   Perfl   Sanlu   Silsi
Chaar     0.0   816.0  1269.0  1511.0  6740.0   597.0   871.0
Latma   431.0     0.0  1156.0  1388.0  5992.0   625.0   787.0
Pagma  1007.0  1218.0     0.0  2021.0  6088.0   738.0  1387.0
Parch   383.0   690.0  1110.0     0.0  6552.0   513.0   714.0
Perfl   197.0   441.0   772.0   706.0     0.0   448.0   430.0
Sanlu   239.0   426.0   719.0   827.0  3587.0     0.0   431.0
Silsi   441.0   822.0  1135.0  1492.0  6923.0   608.0     0.0

Best,
Julien

bbalog87 closed this as completed May 7, 2019
@davidemms
Owner

Hi Julien

The reason this matrix is not symmetric is that one-to-many, unlike one-to-one, is not a symmetrical relationship. Thanks for bringing this up. Below is an explanation of how this works. I'll add something to the README file to describe these results files more fully, as I realise now that there's not enough information for users to interpret them.

For some gene trees you will have multiple duplication events after speciation. This could lead to, for example, 2 genes in Latma being orthologs of 3 genes in Chaar. All of these occurrences are summed up in the many-to-many matrix. This case would add 2 to the entry for M(Latma, Chaar) and 3 to the entry for M(Chaar, Latma). This is a tree showing 3 genes in Arabidopsis (AT2G07671, ATMG01080, ATMG00040) that are orthologs of 2 genes in Volvox (Vocar.0009s0017.1, Vocar.0009s0018.1):

[Figure: many-to-many gene tree]
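The counting rule for the many-to-many matrix can be sketched as follows. This is an illustration of the bookkeeping described above, not OrthoFinder's actual code; the function name `add_many_to_many` and the dictionary-based matrix are assumptions made for the example.

```python
from collections import defaultdict


def add_many_to_many(matrix, species_a, genes_a, species_b, genes_b):
    """Record one many-to-many orthology event (hypothetical helper).

    Each species contributes its own gene count to its own row:
    M(a, b) grows by the number of genes in a, and M(b, a) by the number in b.
    """
    matrix[(species_a, species_b)] += len(genes_a)
    matrix[(species_b, species_a)] += len(genes_b)


# Matrix stored as {(row species, column species): count}.
many_to_many = defaultdict(int)

# The Arabidopsis / Volvox example from the tree: 3 genes versus 2 genes.
add_many_to_many(
    many_to_many,
    "A. thaliana", ["AT2G07671", "ATMG01080", "ATMG00040"],
    "V. carteri", ["Vocar.0009s0017.1", "Vocar.0009s0018.1"],
)

print(many_to_many[("A. thaliana", "V. carteri")])
print(many_to_many[("V. carteri", "A. thaliana")])
```

Because the two sides of an event usually contain different numbers of genes, the resulting matrix is generally not symmetric.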

For the one-to-many/many-to-one relationships, you might have matrices like this:

one-to-many, X=

             A. thaliana   O. sativa   P. patens  V. carteri
 A. thaliana           0        1601        1614         115
   O. sativa        1893           0        1686         108
   P. patens         906         880           0         123
  V. carteri        1693        1606        2155           0

many-to-one, Y=

             A. thaliana   O. sativa   P. patens  V. carteri
 A. thaliana           0        4683        2463        5596
   O. sativa        4135           0        2483        5510
   P. patens        4099        4347           0        6439
  V. carteri         282         269         329           0

This means there are 1693 genes in V. carteri that are in a one-to-many relationship with orthologs in A. thaliana, whereas there are only 115 genes in A. thaliana that are in a one-to-many relationship with genes in V. carteri. That is what should be expected: the genome of A. thaliana is larger, and there have been more gene duplication events in the lineage leading to A. thaliana than in the lineage leading to the green alga V. carteri.

A little care needs to be taken when reading these files, though, as the 1693 genes in Volvox are orthologs of the 5596 genes in Arabidopsis (i.e. the X(i,j) genes are orthologs of the Y(j,i) genes) and the 115 genes in Arabidopsis are orthologs of the 282 genes in Volvox. This makes sense in terms of the naming of the matrices and the ordering of the entries, but might be different from what you would naively expect.
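To make the X(i,j) / Y(j,i) pairing concrete, here is a small sketch using the two example matrices above. The values are copied from this comment; the function name `one_to_many_pair` is hypothetical and only illustrates the index swap.

```python
species = ["A. thaliana", "O. sativa", "P. patens", "V. carteri"]

# one-to-many matrix X and many-to-one matrix Y, as given in this comment.
X = [
    [0, 1601, 1614, 115],
    [1893, 0, 1686, 108],
    [906, 880, 0, 123],
    [1693, 1606, 2155, 0],
]
Y = [
    [0, 4683, 2463, 5596],
    [4135, 0, 2483, 5510],
    [4099, 4347, 0, 6439],
    [282, 269, 329, 0],
]


def one_to_many_pair(a, b):
    """Return (X(a, b), Y(b, a)) (hypothetical helper).

    X(a, b) is the number of genes in species a that are in a one-to-many
    relationship with species b; Y(b, a) is the number of genes in species b
    that those genes are orthologs of. Note the swapped indices for Y.
    """
    i, j = species.index(a), species.index(b)
    return X[i][j], Y[j][i]


print(one_to_many_pair("V. carteri", "A. thaliana"))
print(one_to_many_pair("A. thaliana", "V. carteri"))
```

The first call pairs the 1693 Volvox genes with the 5596 Arabidopsis genes, and the second pairs the 115 Arabidopsis genes with the 282 Volvox genes.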

All the best
David

@bbalog87
Author

bbalog87 commented May 8, 2019

Hi David,

Thank you for the comprehensive explanations. It would really be great if you could edit the README, in order to help users to better interpret those results.

Best,
Julien.

PS: I deleted the previous post by mistake. I'll just repost the one-to-many matrix here for other readers who might be interested in this issue.

        Chaar   Latma   Pagma   Parch   Perfl   Sanlu   Silsi
Chaar     0.0   816.0  1269.0  1511.0  6740.0   597.0   871.0
Latma   431.0     0.0  1156.0  1388.0  5992.0   625.0   787.0
Pagma  1007.0  1218.0     0.0  2021.0  6088.0   738.0  1387.0
Parch   383.0   690.0  1110.0     0.0  6552.0   513.0   714.0
Perfl   197.0   441.0   772.0   706.0     0.0   448.0   430.0
Sanlu   239.0   426.0   719.0   827.0  3587.0     0.0   431.0
Silsi   441.0   822.0  1135.0  1492.0  6923.0   608.0     0.0
