Support for arbitrary functions #94

Closed
jameslairdsmith opened this issue Mar 7, 2020 · 4 comments
Labels
feature a feature request or enhancement

Comments

@jameslairdsmith
Contributor

I've really enjoyed using corrr; it is an excellent package! Thanks for all the great work! One thing that I think would make it even more useful is support for arbitrary functions. At the moment we are limited to creating data frames of correlations (of various types). That's useful, but there are many other kinds of pairwise statistics that can be calculated for the variables of a data frame. I can see that a separate issue (#42) already requests support for covariance.

A more robust and elegant solution, it seems to me, would be a function that takes an arbitrary function and returns a cor_df-like object, with values output by that function rather than correlations. A few changes would be needed, like those already mentioned in the covariance issue. For example, not all outputs would be on a scale from -1 to 1.

A relatively simple example would be a linear regression using lm(). With this form of the correlate() function, the arbitrary function could output the beta from regressing each variable on each of the others. (The beta from regressing y on x is not the same as the beta from regressing x on y, so the two halves of the matrix would not duplicate each other as they do for correlations.) If you wanted the p-values of those regressions, you could simply change the arbitrary function to output the p-values instead. The same goes for the R^2.
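To make the request concrete, here is a minimal base-R sketch of the idea. The names `pairwise_map()` and `slope()` are hypothetical helpers invented for illustration; they are not part of corrr, and the wide layout with a `term` column is only an assumption about what a cor_df-like result might look like.

```r
# Hypothetical helper (not part of corrr): apply f(x, y) to every ordered
# pair of distinct columns and return a wide data frame with a `term` column.
pairwise_map <- function(data, f) {
  vars <- names(data)
  # vals[x, y] holds f(data[[x]], data[[y]]); the diagonal is left as NA
  vals <- sapply(vars, function(y) {
    sapply(vars, function(x) {
      if (x == y) NA_real_ else f(data[[x]], data[[y]])
    })
  })
  out <- data.frame(term = vars, vals,
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Slope (beta) from regressing y on x
slope <- function(x, y) {
  unname(coef(lm(y ~ x))[2])
}

# Rows are the predictor (x), columns are the response (y)
res <- pairwise_map(mtcars[, c("mpg", "wt", "hp")], slope)
print(res)
```

Swapping `slope` for a function that pulls the p-value or R^2 out of `summary(lm(y ~ x))` would change the statistic without touching the pairwise machinery, which is the appeal of accepting an arbitrary function.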

@topepo topepo added the feature a feature request or enhancement label Apr 3, 2020
@topepo
Member

topepo commented Apr 3, 2020

I think that the solution here would be to make new S3 methods for correlate(). Would you like to make a pull request for some?

I like the idea of linear model covariance matrices here, but we have to make sure this doesn't lead to a problematic amount of feature creep.

@Athospd

Athospd commented Apr 24, 2020

That would be a powerful feature! Would it make sense to widen this idea to categorical features as well? For instance, a Cramér's V or chi-squared matrix.
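As a rough illustration of the categorical case, the sketch below builds a Cramér's V matrix in base R. `cramers_v()` is a hypothetical helper written for this example, not a corrr function, and treating the `mtcars` columns `cyl`, `gear`, and `am` as categorical is an assumption made only to have data to run on.

```r
# Hypothetical helper (not part of corrr): Cramér's V for two categorical vectors
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Small expected counts trigger a warning from chisq.test(); it is
  # suppressed here because only the statistic is used, not the p-value.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

cats <- mtcars[, c("cyl", "gear", "am")]
vars <- names(cats)

# v_mat[x, y] holds Cramér's V between columns x and y
v_mat <- sapply(vars, function(y) {
  sapply(vars, function(x) cramers_v(cats[[x]], cats[[y]]))
})
print(round(v_mat, 3))
```

Unlike the regression slope, this statistic is symmetric and bounded between 0 and 1, so it would fit a correlation-style display more directly.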

(actually it makes me imagine a list-matrix, a 2D version of list columns =P)

@juliasilge
Member

Closed in #116!!! 🚀

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021

4 participants