Feature Request: New function that returns covariance matrix #42

Closed
mdancho84 opened this issue Mar 22, 2017 · 7 comments
Labels
feature a feature request or enhancement

Comments

@mdancho84

Great package. Something to consider is a function similar to correlate() that would return the covariance matrix. This will greatly assist with financial analysis.

@drsimonj
Collaborator

drsimonj commented Apr 1, 2017

After some thinking, I have some questions:

  • If a separate function, what would you call it? E.g., for correlations we have correlate(), for covariance we have ... ?
  • Regardless, would a separate function be best? An alternative is to include a new argument in correlate(). E.g., could be a boolean like standardise that is TRUE by default for calculating correlations, but if FALSE computes covariance.
  • In general, how would you like the diagonal to be handled? correlate() returns the whole diagonal as missing, and a diagonal argument allows every diagonal entry to be set to a single value such as 1, which is typical of a correlation matrix. Covariance matrices typically have the variances on the diagonal, which are not uniform. If a new function is created, perhaps a boolean argument like diag_as_na could return the diagonal as missing if TRUE, or as the variances otherwise.
  • correlate() returns a tibble with a cor_df class. Would there be any reason to change this for a covariance structure? I.e., are there any other functions in corrr that you would want to behave differently with a covariance matrix? I can think of two: rplot and network_plot make colours range from -1 to 1 by default.
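The standardise and diag_as_na ideas above can be sketched in base R. This is only an illustration under assumptions: cov_matrix() is a hypothetical helper, not part of corrr, and diag_as_na is the argument name floated in this comment rather than an existing API.

```r
# Hypothetical helper (not a corrr function): covariance matrix with an
# optional missing diagonal, mirroring the proposed `diag_as_na` argument.
cov_matrix <- function(x, diag_as_na = FALSE) {
  m <- stats::cov(x, use = "pairwise.complete.obs")
  if (diag_as_na) {
    diag(m) <- NA  # hide the variances, as correlate() hides its diagonal
  }
  m
}

x <- mtcars[, c("mpg", "cyl", "disp")]

covs <- cov_matrix(x)                        # variances on the diagonal
covs_na <- cov_matrix(x, diag_as_na = TRUE)  # diagonal set to NA

# Standardising a covariance matrix gives the correlation matrix, which is
# the relationship a `standardise` switch would rely on.
same_as_cor <- all.equal(stats::cov2cor(covs), stats::cor(x))
```

Because the diagonal of a covariance matrix holds a different variance for each column, a single fill value (as correlate()'s diagonal argument provides) would not be enough here.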

@mdancho84
Author

First, I think this will be a very good addition to your package. From the financial side, the intended use will be with stock returns over time to understand how multiple stocks move together. This article explains more about how I will end up using it. http://www.investopedia.com/articles/financial-theory/11/calculating-covariance.asp

I'll collect some feedback from others as well, but here are some initial thoughts:

  • Function name is a tough one. covariate() might work since it follows the verb philosophy of the "tidyverse". Although covariation() is a noun, it might work well since users will know what it's doing. My preference is covariate.
  • A separate function makes sense to me since base R already has two separate functions cor() and cov(). However, I can see your point since the two are related. I actually like the standardise (and hopefully standardize) argument since essentially that is the essence. I would definitely support this approach.
  • The diagonal would need to be the variance for the covariance data frame. I'm ok with shaving off the top or bottom, but the diagonal should be present.
  • I think you can keep the cor_df class. I would not change this unless there is a specific reason to. However, you may have identified one such reason with the rplot and network_plot colors. I will investigate more.

Overall, I think correlate(standardise = TRUE) could be a nice solution. However, we'll need to investigate the interaction with rplot and network_plot. Otherwise, it may make sense to use a separate function with separate classes.
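As a rough base R sketch of the diagonal preference above (shave one triangle, keep the variances), assuming mtcars columns as stand-ins for a table of stock returns:

```r
# Stand-in data: three mtcars columns in place of per-stock return series.
x <- mtcars[, c("mpg", "hp", "wt")]

covs <- stats::cov(x)

# Shave the upper triangle only; upper.tri() excludes the diagonal by
# default, so the variances stay in place.
shaved <- covs
shaved[upper.tri(shaved)] <- NA
shaved
```

The lower triangle and diagonal still carry every pairwise covariance and each variance, so nothing is lost by dropping the mirrored half.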

@drsimonj
Collaborator

drsimonj commented Apr 2, 2017

I like covariate(). I think it's a better option.

@mdancho84
Author

I'm on board with that. It seems that it will be easier to handle for rplot and network_plot.

@edgararuiz-zz
Collaborator

Is this request still current?

@juliasilge
Member

Closed in #116 thanks to @jameslairdsmith:

library(corrr)
colpair_map(mtcars, cov)
#> # A tibble: 11 x 12
#>    rowname     mpg     cyl   disp      hp     drat      wt     qsec       vs
#>    <chr>     <dbl>   <dbl>  <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>
#>  1 mpg       NA     -9.17  -633.  -321.     2.20    -5.12    4.51     2.02  
#>  2 cyl       -9.17  NA      200.   102.    -0.668    1.37   -1.89    -0.730 
#>  3 disp    -633.   200.      NA   6721.   -47.1    108.    -96.1    -44.4   
#>  4 hp      -321.   102.    6721.    NA    -16.5     44.2   -86.8    -25.0   
#>  5 drat       2.20  -0.668  -47.1  -16.5   NA       -0.373   0.0871   0.119 
#>  6 wt        -5.12   1.37   108.    44.2   -0.373   NA      -0.305   -0.274 
#>  7 qsec       4.51  -1.89   -96.1  -86.8    0.0871  -0.305  NA        0.671 
#>  8 vs         2.02  -0.730  -44.4  -25.0    0.119   -0.274   0.671   NA     
#>  9 am         1.80  -0.466  -36.6   -8.32   0.190   -0.338  -0.205    0.0423
#> 10 gear       2.14  -0.649  -50.8   -6.36   0.276   -0.421  -0.280    0.0766
#> 11 carb      -5.36   1.52    79.1   83.0   -0.0784   0.676  -1.89    -0.464 
#> # … with 3 more variables: am <dbl>, gear <dbl>, carb <dbl>

Created on 2020-11-02 by the reprex package (v0.3.0.9001)

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021