Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forced join - qenv #85

Closed
wants to merge 34 commits into from
Closed

forced join - qenv #85

wants to merge 34 commits into from

Conversation

Polkas
Copy link
Contributor

@Polkas Polkas commented Oct 31, 2022

closes #56

Add the force argument to add the optional full control around the join method.
The force name is inspired from bash and git where the unsecure options have to be forced.
The forced join is always attached with advice message and optional warning if the default join seems to be suited.
The warning and message could be turn off.

gogonzo and others added 25 commits September 2, 2022 11:07
* constructor, eval_code, lockEnvironment  - need validation

* prototype

* constructor for the list of data (possibly reactive)

* progress with join quosure

* - added more constructors 
- test constructors
- test join

* update WIP

* fix constructor from list

* Update Quosure.R

* add id to the code, resolve join conflicts, more test

* implement id field in Quosure to distinguish detect code entries
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@roche.com>
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@roche.com>
@Polkas Polkas added the core label Oct 31, 2022
@@ -9,6 +9,11 @@
#' - more cases to be done
#' @param x (`qenv`)
#' @param y (`qenv`)
#' @param force (`logical(1)`) whether to turn off automatic validation and duplicated code removal, by default `FALSE`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about explaining objects in y overwrite those in x?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Contributor

@nikolas-burkoff nikolas-burkoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments from me.

Also note - I can't see this functionality being used - we haven't needed it in our codebase - but I guess there's no harm if it's there for others

standardGeneric("join")
})

#' @rdname join
#' @export
setMethod("join", signature = c("qenv", "qenv"), function(x, y) {
setMethod("join", signature = c("qenv", "qenv"), function(x, y, force, verbose) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could force and verbose have default args and then just a simple checkmate::assert_flag() - or does S4 not allow that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for default values,.
Furthermore I suggest force = TRUE, value = force.

S4 does allow this. If I'm not mistaken, if assertions are put in a method with signature = "ANY" that does not call callNextMethod(), they will be applied to all methods with more specific signatures.

if (verbose) {
message("The join method of qenv was forced, please be careful and manually validate the results.")
if (isTRUE(join_validation)) {
warning("Forced join is not recomended for this scenario, the default mechanism seems to be suited.")
Copy link
Contributor

@nikolas-burkoff nikolas-burkoff Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't like the fact that join(x, y, force = TRUE) and join(x, y, force = FALSE) give different answers if check joinable is true (and both qenv's have some shared code at the beginning) as that's confusing -> we could instead change it so force = TRUE only changes the behaviour if it would have given an error.

Or maybe we could use a different function like concatenate or just c as that's what force join is effectively doing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the default behaviour is more surprising than the forced one.

"we could instead change it so force = TRUE only changes the behaviour if it would have given an error."
My idea about force it is that it works always and at the same predicted way.
We simply join what is provided.

Comment on lines +69 to +77
y@id <- c(x@id, y@id)
y@code <- c(x@code, y@code)
y@warnings <- c(x@warnings, y@warnings)
y@messages <- c(x@messages, y@messages)

# insert (and overwrite) objects from y to x
y@env <- rlang::env_clone(y@env, parent = parent.env(.GlobalEnv))
rlang::env_coalesce(env = y@env, from = x@env)
y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's correct but very confusing for this to be the other way round to the case above - I wonder if we need a few more code comments at the very least? I couldn't think of a nice way to neaten this up...

tests/testthat/test-qenv_join.R Show resolved Hide resolved
@gogonzo
Copy link
Contributor

gogonzo commented Nov 2, 2022

In the issue I left the comment "Make a decision if this is needed first" which makes me realize that we can undermine this need. In my opinion we can keep qenv unchanged since this change is not dictated by our need. Current version of qenv fits to our needs and protect us from the irreproducible calls, except this and probably more which we can't imagine yet. I'd close the issue and give opportunity to the users to shape the product in the future.
Implementation of the force is correct but I'd be careful what we enable initially because it will be harder to remove the code in the future.

@Polkas
Copy link
Contributor Author

Polkas commented Nov 2, 2022

This issue is in my opinion help users as gives an option of full control which is always good.
However I am open to close the issue if the rest of the team decided.

@gogonzo @nikolas-burkoff I am more concern about the rlang::env_coalesce(env = x@env, from = y@env) call for already existed join code. It looks like we detach the logical flow between .check_joinable and raised rlang::env_coalesce.
So the expected join of environments should be always y with x not x with y, but because of .check_joinable we assume/know that any x variable is never overwrite by y. Thus the information from .check_joinable is not so obviously transferred to the env_coalesce step.

@gogonzo
Copy link
Contributor

gogonzo commented Nov 2, 2022

For me .check_joinable and what is happening in join are compatible (including rlang::env_coalesce).

  • .check_joinable checks if everything y@env doesn't have any modified object comparing to x@env
  • join makes an union of these two environments by copying extra objects from y@env -> x@env. env_coalesce skips copying objects which has bindings already in x with the same name.

In general one couldn't modify/create the objects in y@env which are already in x@env. The issue proposes to relax this limitation by giving opportunity for someone to make something irreproducible. If one wants to modify the object shared by two qenv then it should be done after a join or before a split.

@Polkas does it address your comments?

It looks like we detach the logical flow between .check_joinable and raised rlang::env_coalesce.
...
Thus the information from .check_joinable is not so obviously transferred to the env_coalesce step.

@Polkas
Copy link
Contributor Author

Polkas commented Nov 2, 2022

  • Why we are reusing x@env
  • coalesce is not needed, I think we could make it more efficient/elegant with proper env parent tree.
  • Should .check_joinable return TRUE or string, where it evaluate already some stuff

COmments and idea for improvements:

#' @rdname join
#' @export
setMethod("join", signature = c("qenv", "qenv"), function(x, y) {
  join_validation <- .check_joinable(x, y)

  # join expressions
  if (!isTRUE(join_validation)) {
    stop(join_validation)
  } else {

    # this is the evaluated once more as is already known in .check_joinable
    # may be should be returned by .check_joinable
    # what is it, no description
    id_unique <- !y@id %in% x@id

    # why we overwrite the x here
    # we could create a new qenv
    qenv <- new_qenv()
    qenv@id <- c(x@id, y@id[id_unique])
    qenv@code <- c(x@code, y@code[id_unique])
    qenv@warnings <- c(x@warnings, y@warnings[id_unique])
    qenv@messages <- c(x@messages, y@messages[id_unique])
    # why y and why parent is x, give some info
    # have to be tested with join after join pipe
    qenv@env <- rlang::env_clone(y@env, parent = x@env)
    qenv
  }
})

Nikolas Burkoff added 6 commits November 2, 2022 13:52
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@tessella.com>
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@tessella.com>
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@tessella.com>
Signed-off-by: Nikolas Burkoff <nikolas.burkoff@tessella.com>
@gogonzo
Copy link
Contributor

gogonzo commented Nov 3, 2022

  • Why we are reusing x@env
  • coalesce is not needed, I think we could make it more efficient/elegant with proper env parent tree.
#' @rdname join
#' @export
setMethod("join", signature = c("qenv", "qenv"), function(x, y) {
    ...
    # have to be tested with join after join pipe
    qenv@env <- rlang::env_clone(y@env, parent = x@env)
    qenv
  }
})

Stuff highlighted in your comment was discussed here. This is neat solution but causes problems. Please consider the scenario step by step

  1. y <- join(a1, a2) results in y having a1 and a2 as ancestors:
  a1
    \
a2 - (y)
a1 - a2 - (y)
  1. x <- join(b1, b2) results in x having b1 and b2 as ancestors
  b1
    \
b2 - (x)
b1 - b2 - (x)
  1. Let's join now x and y
a1- a2 - (y) 
          \
            (z)
          /
b1- b2 - (x) 
b1 - b2 - (x) - (y)

so the a1 and a2 has been lost here.
4. Now we would like to eval_code(z) which creates a copy of object@env and changes the parent to .GlobalEnv

.GlobalEnv - z

So the problem is that only last environment is moved, and we should move all upstream environments up to the point where they have the same parent. If anyone has a better solution I'd be glad.

  • Should .check_joinable return TRUE or string, where it evaluate already some stuff

Yes, I'm open for suggestions. Your concern is similar to the previous there - let's continue discussion there

@nikolas-burkoff
Copy link
Contributor

My view would be to close this and not worry about force joining until we or a module developer explicitly needs this functionality.

I quite like the idea of forcing people to keep the contents of qenv's as simple as possible especially as there are plenty of edge cases where things won't work as expected

Base automatically changed from teal_refactor@main to main November 4, 2022 15:39
@mhallal1
Copy link
Contributor

mhallal1 commented Nov 7, 2022

I quite like the idea of forcing people to keep the contents of qenv's as simple as possible especially as there are plenty of edge cases where things won't work as expected

I second this opinion regarding this PR.

@donyunardi
Copy link
Contributor

I quite like the idea of forcing people to keep the contents of qenv's as simple as possible especially as there are plenty of edge cases where things won't work as expected

I agree on this statement.
Unless there's a heavy objection, I also voted that we close this topic for now.

Copy link
Contributor

@nikolas-burkoff nikolas-burkoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is not an implementation of force join - it is a concatenation. If we want a concatenate behaviour then that is not a problem but I would recommend naming the function something different.
It is very confusing and non-standard to have different behaviour when force = TRUE and force = FALSE when both cases work.

A force join would carry out the same action as a normal join but not bothering with the checks about whether it's joinable and just assuming it is. We haven't found a need for it in our code base so I wouldn't implement it but I guess you could use it for the edge case below as I the developer know that although the check_joinable checks fail - joining is (assuming i is not used later without being assigned) an OK option

image

And when you force join the qenv the code should be:

x <- 4
a <- numeric(0); for(i in 1:2) a <- c(a, i)
b <- numeric(0); for(i in 1:3) b <- c(b, i)

and not the concatenation of the qenv which would give:

x <- 4
b <- numeric(0); for(i in 1:3) b <- c(b, i)
x <- 4
a <- numeric(0); for(i in 1:2) a <- c(a, i)

@Polkas
Copy link
Contributor Author

Polkas commented Nov 15, 2022

@nikolas-burkoff thank you for your insight even that we know that we will not introduce this functionality.
"So this is not an implementation of force join - it is a concatenation.", Yes it is a simply and light concatenation so it is working always no matter what. That is beautiful about it, in my opinion.

@nikolas-burkoff I could add to your example a simpler case that somebody want to have a duplicated code:)

@Polkas Polkas closed this Nov 15, 2022
@m7pr m7pr deleted the 56_force_join@teal_refactor@main branch June 12, 2023 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Qenv - force join of two qenv
8 participants