Limit color bar range in network_plot() #158

Closed
JOl-lN opened this issue Apr 29, 2022 · 6 comments

Comments

JOl-lN commented Apr 29, 2022

Feature

Issue #27 previously discussed manually limiting the color range of the network_plot() function. I believe this would be a useful feature. Consider a dataset that produces only positive correlation values: in the first plot, half of the color range goes unused.

library(tidyverse)
library(corrr)

mtcars |>
  select(cyl, disp, hp, wt, carb) |> 
  correlate() |>
  network_plot()
#> 
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'

You can work around this in a hacky way by setting the color scale as shown below, but the result is confusing to the reader: there are no correlations below 0, so why is that half of the scale shown? And even if there were any, they would all show as black, so you couldn't tell them apart anyway. The goal here was to emulate a scale that starts at 0 and goes to 1.

mtcars |> 
  select(cyl, disp, hp, wt, carb) |>
  correlate() |>
  network_plot(colors = c("black", "black", "skyblue"))
#> 
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'

Created on 2022-04-29 by the reprex package (v2.0.1)

Regarding

the user chooses to show moderate correlations while ignoring weak and strong relationships.

IMO, the software should allow people to do what they want; it's their job not to do something wrong, not the software's.

As a viewer of the plot, it seems weird that a plot might show correlations between 0.3 and 0.6 and then it would be left to the viewer to reason or guess whether the missing relationships are weaker (i.e., < 0.3) or stronger (i.e., > 0.6) than the relationships shown.

If a plot shows a particular range of data, for example, between 0.3 and 0.6, why should the reader assume that data is being hidden as opposed to that being the true range of the data?

@juliasilge
Member

Thanks for your thoughts @john-s-f! Overall we definitely aim to build software that can keep folks from making common mistakes (we see this as part of the software's job), but we are happy to reevaluate this decision. Can you tell us a little more about how you have run into this as a problem in your use case?

Author

JOl-lN commented May 3, 2022

This comes up when the data produce correlations of only one sign, as in the mtcars example above. In my case, that means RNAseq datasets. These are gene expression datasets, and while negative correlations are not impossible, they are highly unlikely in most use cases. Moreover, these datasets are generally highly correlated with each other (for example, between biological replicates in an experiment, or even between two treatment groups), so spreading the color scale over the entire -1 to 1 range makes it hard to display smaller differences. These kinds of correlation analyses are usually a sanity check that your data are OK and/or the start of other kinds of analyses. This plot in particular is helpful for showing people how their data group and relate to other data they have, for example, 3 replicates of a control group and 3 replicates each of two different treatments.

@thisisdaryn
Collaborator

Hi @john-s-f,

I think the way to address your situation is to give the user the option to map the color gradient to the range (i.e., min to max) of the correlations in the cor_df object, as opposed to mapping it to -1 to 1 (which will remain the default).

I will submit a PR soon that implements this through the existing legend argument, which could then be toggled between the options.

I'm generally skeptical about adding an independent capability to manipulate the range arbitrarily. So I think this is the way to go.
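
To make "the range of correlations in the cor_df object" concrete, here is a minimal sketch that computes the min and max the gradient would be mapped to for the mtcars example from the original post. It is only an illustration using existing corrr helpers (correlate() and stretch()), not the PR's implementation; the variable names cor_df and cor_range are chosen for this example.

```r
library(dplyr)
library(corrr)

# Same variables as in the reprex from the original post.
cor_df <- mtcars |>
  select(cyl, disp, hp, wt, carb) |>
  correlate(quiet = TRUE)

# stretch() reshapes the correlation data frame into long format
# (columns x, y, r); na.rm = TRUE drops the NA diagonal entries.
cor_range <- cor_df |>
  stretch(na.rm = TRUE) |>
  pull(r) |>
  range()

# Lower and upper ends of the observed correlations. For these
# variables every correlation is positive, so the lower end is above 0.
cor_range
```

With a range-based mapping, the gradient would span only these two values instead of the full -1 to 1 scale.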

Author

JOl-lN commented May 9, 2022

Thanks, this is also a viable solution.

Member

juliasilge commented May 15, 2022

The solution here (using legend = c("full", "range", "none")) is now merged in. Let us know if you have further issues! 🙌
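
For anyone landing here later, a short sketch of how the merged option might be used on the example from the original post. It assumes a version of corrr that includes this change; the object name p is only for illustration, and the default colors are left untouched.

```r
library(dplyr)
library(corrr)

# legend = "range" maps the color gradient to the min..max of the
# correlations in the data; legend = "full" (the default) keeps -1 to 1,
# and legend = "none" drops the legend.
p <- mtcars |>
  select(cyl, disp, hp, wt, carb) |>
  correlate(quiet = TRUE) |>
  network_plot(legend = "range")

# network_plot() returns a ggplot object, so it can be printed or
# modified further with ggplot2 functions.
p
```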

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2022