Post-treatments and warning messages #4

GENG126 · 2022-02-09T02:39:44Z

"data" is the species abundance matrix

##"site" is the environmental covariate matrix
##HU is a factor in site

fit PLN model

PLNMODEL<- PLN(data~site$HU)
Initialization...
Adjusting a PLN model with full covariance model
Post-treatments...
DONE!
Warning message:
In dpois(y, mu, log = TRUE) : non-integer x = 1.214953

My questions are as follows:

1. I am not sure whether the warning message appears because I have too little data or too many zeros in the data.
2. The model still runs and I can get the inferred network, so I am curious whether this result is robust.
3. If the result is not robust, what can I do to infer the species interaction network?
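
For context on the message itself: in base R, dpois() emits "non-integer x = ..." whenever it is asked for the probability of a value that is not a whole number. The sketch below reproduces the message and shows one way to look for non-integer entries in an abundance matrix. The toy matrix `counts` is hypothetical, and whether this is what triggers the warning inside PLN() is an assumption that this thread does not confirm.

```r
# dpois() warns whenever it is given a non-integer "count"
w <- tryCatch(dpois(1.214953, lambda = 2, log = TRUE),
              warning = function(w) conditionMessage(w))
print(w)

# Toy abundance matrix (hypothetical values): one entry is not a whole number
counts <- matrix(c(3, 0, 1.5, 7, 2, 0), nrow = 2)

# TRUE where an entry is not a whole number
non_integer <- counts != round(counts)
any(non_integer)

# Row/column positions of the non-integer entries
which(non_integer, arr.ind = TRUE)
```

Running the same check on the real "data" matrix would show whether it contains averaged or rescaled abundances rather than raw counts.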

Raphaellemomal · 2022-02-10T16:47:30Z

I do not have an intuition for this warning but your questions are very relevant. Could you please provide:

the dimension of your data?
the summary of the vector of percentage of zeros for each species ?

Also, the robustness of your network is linked to the stability of the edges. Could you please provide the stability curve for the stability selection step ? (plot given by the EMtree::StaATS() function, you will have an example in the vignette here).

GENG126 · 2022-02-14T09:18:57Z

1. data dimension
**"data"** is the species abundance matrix, with 8 obs. of 32 variables, so we have data on 32 species over 8 years.
QUESTION: Each row holds the species abundances for one year (actually the dry season of each of the 8 years; I want to see whether species interactions differ between seasons, and we have 8 years of data). Because we have a different number of plots each year, we summed the abundance of each species over all plots within a year. I don't know if this is feasible, or whether I need to change "years" to "plots"?

> dim(data)
[1]  8 32

**"site"** is the environmental covariate matrix, with 8 obs. of 8 variables, including disturbance variables such as human disturbance, temperature, etc.

> dim(site)
[1] 8 8

2. percentage of zeros for each species

0 0 37.5 75 50 0 87.5 87.5 87.5 87.5 0 0 87.5 75 0 25 0 0 0 0 0 25 0 0 62.5 37.5 87.5 87.5 0 12.5 100 100

But if I exclude species with more than 25% zeros, I get the same warning as before.
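
For reference, a minimal base-R sketch of how such percentages can be computed and used to filter species. The matrix `counts` and its species names are hypothetical stand-ins for the real "data" matrix; the 25% cut-off is the one mentioned above.

```r
# Hypothetical 8 x 4 abundance matrix (rows = years, columns = species)
counts <- cbind(sp1 = c(2, 5, 1, 3, 4, 2, 6, 1),
                sp2 = c(0, 0, 0, 1, 2, 0, 3, 4),
                sp3 = c(0, 0, 0, 0, 0, 0, 0, 1),
                sp4 = rep(0, 8))

# Percentage of zero entries for each species (column)
pct_zero <- colMeans(counts == 0) * 100
pct_zero

# Keep only species with at most 25% zeros
kept <- counts[, pct_zero <= 25, drop = FALSE]
colnames(kept)
```

`drop = FALSE` keeps the result a matrix even when a single species survives the filter.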

PLNMODEL<- PLN(data1~site$HU)
Initialization...
Adjusting a PLN model with full covariance model
Post-treatments...
DONE!
Warning message:
In dpois(y, mu, log = TRUE) : non-integer x = 1.214953


3. stability curve
I can't run the EMtree::StaATS() function; in the example the function is named "StATS()".

library(EMtree)
EMtree::StaATS
error: 'StaATS' is not an exported object from 'namespace:EMtree'
stab_selection=StATS(ResampEmtreeFit$Pmat, nlambda=50, stab.thresh=0.9,plot=TRUE)
Error in StATS(ResampEmtreeFit$Pmat, nlambda = 50, stab.thresh = 0.9, :
No "StATS" function

Raphaellemomal · 2022-02-15T13:37:24Z

The warning may be a result of your very small number of observations (rows).

I would recommend that you use all plots as rows in your data (so change "years" to "plots", as you say), and add a "year" covariate as a factor in your "site" matrix. You will have to include the year covariate in the PLN model.
You should not include species with no counts (100% zeros). Even if this does not solve the issue, having no counts at all is like having no data: these species are unnecessary dimensions.
Please update your version of EMtree; the StATS function is very helpful for setting an appropriate threshold on your edges with respect to the stability of your network.
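
The restructuring suggested here could look roughly like the base-R sketch below. All values, the `HU` levels, and the species names are hypothetical; only the overall shape (one row per plot, a "year" factor in "site", all-zero species dropped) follows the advice above.

```r
# Hypothetical plot-level data: 6 plots over 2 years, 3 species
counts <- cbind(sp1 = c(4, 0, 2, 1, 3, 5),
                sp2 = c(0, 0, 0, 0, 0, 0),   # never observed
                sp3 = c(1, 2, 0, 0, 7, 1))
site <- data.frame(
  HU   = factor(c("low", "high", "low", "high", "low", "high")),
  year = factor(c(2013, 2013, 2013, 2014, 2014, 2014))
)

# Drop species with no counts at all (100% zeros)
counts <- counts[, colSums(counts) > 0, drop = FALSE]

# Both objects must describe the same plots, one row per plot
stopifnot(nrow(counts) == nrow(site))
dim(counts)
levels(site$year)
```

With PLNmodels loaded, the model call from earlier in this thread would then presumably become something like `PLN(counts ~ site$HU + site$year)`, following the call style already used here.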

GENG126 · 2022-02-17T01:03:39Z

Thank you Momal, I will try, thank you very much!

GENG126 · 2022-02-27T12:35:49Z

Hi, Momal. I could not understand the threshold setting in "stability". My questions are:

freqs<-freq_selec(ResampEmtreeFit$Pmat,Pt=0.2)
a_first_idea_of_network<-1*(freqs>0.8)

1. Is "Pt" here the threshold that comes from "stability"? I am quite confused. For example, I need to run the function StATS() and set the stability threshold to 0.9.

stab_selection=StATS(ResampEmtreeFit$Pmat, nlambda=50, stab.thresh=0.9,plot=TRUE)
stab_selection$lambda_opt
##0.031

So should I set Pt to 0.031 here?
2. I also don't understand why "the optimal frequencies still need to be thresholded in order to obtain a network". If the network is stable according to the stability selection, can I just use the visualization function without thresholding the frequencies?

weighted_net=ToSym(stab_selection$freqs_opt)
g<-draw_network(weighted_net,nodes_label = species.names, title="Weighted", pal_edges="dodgerblue3",layout="stress",shade = TRUE, btw_rank=3)$G

Sorry for so many questions, I am a beginner in data analysis.
Looking forward to your reply.
Best wishes!

Raphaellemomal · 2022-03-03T14:57:14Z

Hi, sorry for my late answer.
I will try to explain better here with this diagram. To obtain a network with the resampling version of EMtree, you need to set two thresholds: one on the edges' conditional probabilities (output from the main method) and one on the frequencies.

But the problem is that we don't know how to set Pt a priori, because these are probabilities scaled for a tree structure (the sum of all probabilities is the number of edges in a tree graph); they are not really probabilities of edge presence. This is where StATS() helps: it provides the optimal Pt, which is the value for which we have a good stability value across resamples, meaning good agreement between the resamples on the selected edges.

In addition to providing a solution to the choice of Pt, you see here that we change the quantity of interest, going from conditional probabilities to easily understood selection frequencies. This is important for interpretation: now the weight on an edge is actually a measure of its robustness.

So after ResampleEMtree(), use StATS() to find the optimal Pt, and then you can either threshold the edge selection frequencies if you want a binary network, or just continue with the edge frequencies to work with a weighted network.
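
The two-threshold idea can be sketched in base R without calling EMtree at all. Everything below is an assumption made for illustration: `Pmat` is a simulated stand-in for `ResampEmtreeFit$Pmat` (assumed to hold one row per resample and one column per possible edge), the value of `Pt` is arbitrary rather than the one StATS() would return, and the simulated values are not tree-scaled.

```r
set.seed(1)
S <- 20        # number of resamples
n_edges <- 10  # possible edges for 5 species: 5 * 4 / 2

# Simulated stand-in for Pmat: 3 edges with high values, 7 with low values
strong <- matrix(runif(S * 3, min = 0.6, max = 1.0), nrow = S)
weak   <- matrix(runif(S * 7, min = 0.0, max = 0.4), nrow = S)
Pmat   <- cbind(strong, weak)

# Threshold 1 (Pt): which edges are selected in each resample
Pt <- 0.5
selected <- Pmat > Pt

# Selection frequency of each edge across resamples (the robustness measure)
freqs <- colMeans(selected)

# Threshold 2 (optional): binary network from the selection frequencies
binary_net <- 1 * (freqs > 0.8)
binary_net
```

Skipping the last step and keeping `freqs` as edge weights corresponds to the weighted-network option described above.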

Is that clearer?

GENG126 · 2022-03-04T01:28:50Z

Yes!!! I have never understood a model so clearly before!! Thank you so so so so so much Momal!!

GENG126 · 2022-03-15T07:59:01Z

Hi, Momal:
I am here again! I still have questions about the "PLN" step. Sorry, this problem is quite hard for me to handle.
As you first suggested, I need to consider "year" as a covariate and use "plots" as the rows of the matrix. But I changed my mind: I now build one network per year-season (e.g. the 2013 dry season).

data<- as.matrix(read.csv("2013dry.csv"))
dim(data)
##23  10
site<- as_tibble(read.csv("2013dry.cor.csv"))
dim(site)
##23 2

data
![5c844d1c9dc4f01f2e860b9bbb4414e](https://user-images.githubusercontent.com/99301649/158330484-65bb728a-7496-4e77-9a6c-ad0065d98ef3.png)

site

#Fit PLN model
PLNMODEL<- PLN(data~site$HL)

warning:
1: In dpois(y, mu, log = TRUE) : non-integer x = 8.333333
2: In dpois(y, mu, log = TRUE) : non-integer x = 83.333333
3: In dpois(y, mu, log = TRUE) : non-integer x = 4.545455
4: In dpois(y, mu, log = TRUE) : non-integer x = 3.351955
5: In dpois(y, mu, log = TRUE) : non-integer x = 2.500000
6: In dpois(y, mu, log = TRUE) : non-integer x = 2.777778
7: In dpois(y, mu, log = TRUE) : non-integer x = 83.333333
8: In dpois(y, mu, log = TRUE) : non-integer x = 11.111111
9: In dpois(y, mu, log = TRUE) : non-integer x = 5.263158
10: In dpois(y, mu, log = TRUE) : non-integer x = 66.666667
11: In dpois(y, mu, log = TRUE) : non-integer x = 11.904762
12: In dpois(y, mu, log = TRUE) : non-integer x = 1.818182
13: In dpois(y, mu, log = TRUE) : non-integer x = 13.888889
14: In dpois(y, mu, log = TRUE) : non-integer x = 4.166667

stab_selection=StATS(ResampEmtreeFit$Pmat, nlambda=50, stab.thresh=0.9,plot=TRUE)

This is the result of the stability selection.
And I can get the network:

Can I just ignore these warnings? I tried the data for each year-season: on the one hand, I excluded species with too many zeros; on the other hand, some year-seasons have many plots (e.g. 37). But all of them give the warnings.

Looking forward to your reply!
Best wishes.
Geng Ying.

Raphaellemomal · 2022-03-24T16:29:49Z

Hi, sorry again for the late answer.
I talked with Julien Chiquet (owner of PLNmodels) about this, and we think it is fine for you to proceed with your model as it is.
I just have a small question: how is it that you have data dimension 23 x 10, and 25 species in your network ?

Best,
Raphaëlle

GENG126 · 2022-03-25T01:23:40Z

Thank you so much for the reply! It helps me continue my work! The 23 x 10 data contains only the species that occur more than 3 times in the matrix, while the network I showed includes all species, including some that may occur only once (I checked whether including these species makes a difference, and the PLN result shows none). I thought the network with 25 species would be more complete to show you, so I used that one. Sorry that the inconsistency confused you.

GENG126 · 2022-05-31T02:34:32Z

Hi Momal:
Sorry I still have questions about EMtree.

ResampEmtreeFit<-ResampleEMtree(counts=data, S=999, covar_matrix = site$HL, maxIter=25,cond.tol=1e-8, cores=1)

I set different values of maxIter and get different network interactions, so I am curious how to choose maxIter. I have 8 years of data and I want to show a network for each year. The largest number of species I have is 21, in 2020. So I think maxIter should be 21 for all networks, so that all 8 networks follow a uniform standard. Do I understand correctly?
I also have no idea how to set cond.tol. Does it influence my result?

Looking forward to your reply!

Best wishes!
Geng Ying.

Rmomal · 2022-06-01T06:18:19Z

Hi Ying,

- maxIter is the "Maximum number of EMtree iterations at each sub-sampling". It controls the number of iterations of the algorithm, so setting it to 21 may be too low and stop the algorithm before it has a chance to converge. It would be better to set this parameter to at least 50.
- Your S parameter is very high. That is not a problem, but I think it is unnecessary. S is the "Total number of sub-samples", and as you have at most 21 samples, setting S to 500 is enough.
- cond.tol should be very low, as it is a precision threshold for the computations. The default value might be too low as you have a small number of samples, but I do not recommend setting it above 1e-6. Setting it higher creates a risk that the algorithm does not converge properly.

You see that it is mostly about the algorithm's convergence, which you can check by running the simple EMtree() function with plot=TRUE (as a first check for maxIter, before ResampleEMtree() for example).

Best,
Raphaëlle
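
To illustrate why an iteration cap that is too low can stop an algorithm before it converges, here is a toy base-R sketch. It is not EMtree's algorithm: the function `iterate()`, the update rule `x <- cos(x)`, and the argument names `max_iter` and `tol` are all hypothetical stand-ins chosen only to show the interplay between an iteration limit and a tolerance.

```r
# Toy fixed-point iteration x <- cos(x).
# Stops when successive values differ by less than `tol`,
# or after `max_iter` steps, whichever comes first.
iterate <- function(max_iter, tol = 1e-8) {
  x <- 1
  converged <- FALSE
  for (i in seq_len(max_iter)) {
    x_new <- cos(x)
    converged <- abs(x_new - x) < tol
    x <- x_new
    if (converged) break
  }
  list(value = x, iterations = i, converged = converged)
}

short <- iterate(max_iter = 21)   # cap reached before the tolerance is met
long  <- iterate(max_iter = 100)  # enough room to reach the tolerance

short$converged
long$converged
long$iterations
```

In this toy case the short run hits its cap while the long run stops early on its own, which is the behaviour a convergence plot is meant to reveal.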

GENG126 · 2022-06-01T08:09:04Z

Hi Momal:
Thank you very much for the reply!

Best wishes!
Geng.

GENG126 closed this as completed Feb 27, 2022

GENG126 reopened this Feb 27, 2022
