Error in validateInputs #38

Open
Lewis-W-S-Fisher opened this issue Jul 27, 2023 · 8 comments

Comments

@Lewis-W-S-Fisher

Lewis-W-S-Fisher commented Jul 27, 2023

Hi there,

I'm currently trying something really niche with fishplot: making fishplots from the output of pairtree. I have formatted the data as specified. My samples drop to 0 across timepoints, but this is expected with what I'm working with, and fix.missing.clones=TRUE is set.

The main error is:
Error in eval(ei, envir) : clones with same nest level cannot have values that sum to more than 100%: Problem is in clusters 2,5,10,11,13

I went into the source code and copied the validateInputs function (the part where it's failing, anyway) into my code, then corrected the sample where it was going wrong. By this point the frac.table has been corrected (the clusters originally summed to 101% and now sum to 100%). The error still occurs, though.

#!/usr/bin/env Rscript

library(fishplot)
library(rjson)

getNestLevel <- function(parents,x){
  #sanity checks
  if(x > length(parents)){
    stop(paste("cannot have a parent that does not exist in list. parent =",x,", length(parents) =",length(parents)))
  }
  if(x < 0){
    stop("cannot have a value in parents of less than zero")
  }
  
  if(parents[x] == 0){
    return(0)
  } else {
    return(getNestLevel(parents,parents[x])+1)
  }
}

getAllNestLevels <- function(parents){
  nest.level=c()
  for(i in 1:length(parents)){
    nest.level=c(nest.level, getNestLevel(parents,i))
  }
  return(nest.level)
}

## splitting the clustering plot into a separate script using output tables ## 
## Takes an argument where the input directory is the same as the prefix from cluster_variant.R ##
json_file<- "pap004_pairtree_tree.json"

json_data<- fromJSON(paste(readLines(json_file), collapse=""))

cluster_file<- "pap004_pairtree_cluster_names.json"
tree_names<- fromJSON(paste(readLines(cluster_file), collapse=""))
length(names(tree_names))
##! get population frequencies 
frac.table<- Reduce(rbind, json_data$phi[1:length(json_data$phi)]) # not eta
frac.table<- 100 * round(frac.table, 2)
colnames(frac.table)<- json_data$samples
rownames(frac.table)<- seq(0, nrow(frac.table )-1)
print(frac.table)


## get the structure of the clones, aka parents
## need to insert 0 in the beginning since 0 is the founder clone for everything. 'parents + 1' because python is 0-index and R/fishplot is 1. 
parents<- c(0, json_data$parents + 1) 

parent_names<- c(0, names(tree_names))

#visualization. distribute samples at regular interval from 0 to 100.
timepoints<- seq(0, 100, 100/(ncol(frac.table) - 1) )    


# "16759" 
# frac.table[c(5), c("16759")]<- 90
# print(frac.table[c(2,5,10,11,13), c("16759")]) 
# # print(colSums(frac.table[c(2,5,10,11,13),]))
# print(colSums(frac.table))

clones =  1:dim(frac.table)[1]
timepts = 1:dim(frac.table)[2]
print(clones)
print(timepts)

nest.level<- getAllNestLevels(parents)
print(frac.table[c(2,5,10,11,13), 23])

for(timept in timepts){
  for(i in unique(nest.level)){
    neighbors = which(nest.level==i)
    if(sum(frac.table[neighbors,timept]) > 100){
      print("First Issue \n")
      cur_vector<- frac.table[neighbors,timept]
      index_array<- which(cur_vector == max(cur_vector), arr.ind=TRUE)
      cur_vector[unname(index_array)]<- cur_vector[unname(index_array)] -1
      print(frac.table[parents[i], timept])
      
      frac.table[neighbors,timept]<- cur_vector
      print(parent_names[parents[i+1]])
      stop(paste("clones with same nest level cannot have values that sum to more than 100%: Problem is in clusters ",
                 paste(neighbors,collapse=",")))
    }
  }

  for(i in unique(parents)){
    if(i > 0){
      neighbors = which(parents==i)
      if(sum(frac.table[neighbors,timept]) > frac.table[parents[neighbors[1]],timept]){
        print(paste("clusters:", paste(neighbors, collapse=","), "timepoint:",timept))
        print(parent_names[neighbors])
        print(frac.table[neighbors, timept])
        # parent[i] + 1 seems to be the parent of neighbours
        print(parent_names[parents[i+1]])

        print(frac.table[parents[i], timept])
        # print(parents)
        # print(parent_names)
        # stop(paste("clones with same parent cannot have values that sum to more than the percentage of the parent: Problem is in clusters ",paste(neighbors,collapse=","),"at timepoint",timept))
      }
    }
  }
}
parent_names
length(row.names(frac.table))
print(frac.table[c(2,5,10,11,13), 54])

#create a fish object
fish = createFishObject(frac.table, parents, timepoints=timepoints, fix.missing.clones=TRUE)

Apologies for the scrappy code, I've been trying just about everything to get it to work. It's frankensteined together from some issues on the pairtree GitHub :D so credit goes to them. I can send the files over if necessary.

All the best,
Lewis

@chrisamiller
Owner

Without a simple, reproducible example, I can't offer much help. Can you simplify your script down to something like the plots in the tests directory, substituting your values?

It also may help to draw out on a whiteboard the particular case that you're trying to plot. Does the nesting of samples make sense? If you annotate your drawing with fractions, is it an impossible situation (one where the fractions of all the clones don't "fit" into the 100% that can possibly be taken up by a tumor)?
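
The whiteboard check can also be scripted. The sketch below is a base-R approximation, not fishplot's own validateInputs, and `check_nesting` is a hypothetical helper name. It assumes `parents` follows the convention used in the script above (0 marks a founder clone, any other value is the 1-based row of the parent) and lists every sibling group whose fractions exceed what the parent, or 100% for founders, can hold.

```r
# Hypothetical helper (not part of fishplot): report sibling groups that do not
# fit inside their parent at some timepoint.
check_nesting <- function(frac.table, parents) {
  problems <- data.frame()
  for (p in unique(parents)) {
    kids <- which(parents == p)
    for (tp in seq_len(ncol(frac.table))) {
      children_sum <- sum(frac.table[kids, tp])
      # founders (parent 0) share the full 100%; other groups share their parent's fraction
      capacity <- if (p == 0) 100 else frac.table[p, tp]
      if (children_sum > capacity) {
        problems <- rbind(problems,
                          data.frame(parent = p, timepoint = tp,
                                     children_sum = children_sum,
                                     capacity = capacity))
      }
    }
  }
  problems
}

# Toy input: 3 clones (rows), 2 timepoints (columns); clones 2 and 3 nest in clone 1.
frac.table <- matrix(c(100, 60, 30,
                       100, 70, 40), ncol = 2)
parents <- c(0, 1, 1)
res <- check_nesting(frac.table, parents)
print(res)
```

In this toy input the second timepoint is the impossible one: clones 2 and 3 together claim 110% of a parent that only holds 100%.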

@Lewis-W-S-Fisher
Author

Lewis-W-S-Fisher commented Aug 2, 2023

Hi again,

Sorry for the messy question. I can remember rushing last week to ask this question before going home.

Basically, I looked into this further and found that when fix.missing.clones=TRUE, it adds in small numbers instead of 0s (as it says in your documentation). However, this causes the validateInputs function to fail: clones at the same nest level then add up to 100.0000002, which triggers the error. It also makes the sum of sibling clones marginally larger than their parent, which triggers a second error.

I was wondering if there's a simple fix for this?

All the best,
Lewis
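
The failure described above can be reproduced in a few lines of base R. This is only a sketch: the padding value `eps = 1e-7` is an assumption inferred from the 100.0000002 total reported in this comment, not a value read from the fishplot source, and the sibling values are made up for illustration.

```r
eps <- 1e-7                    # assumed padding value (see note above)
siblings <- c(60, 40, 0, 0)    # clones at the same nest level, summing to exactly 100

# replace zeros with the small padding value, as fix.missing.clones is described to do
padded <- ifelse(siblings == 0, eps, siblings)

raw_fails    <- sum(siblings) > 100   # strict check on the raw input
padded_fails <- sum(padded) > 100     # same strict check after padding

print(raw_fails)      # the raw input passes
print(padded_fails)   # the padded input does not
print(format(sum(padded), digits = 12))
```

Because the check is a strict `>` comparison with no tolerance, two padded zeros are enough to push an otherwise valid column over the limit.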

@chrisamiller
Owner

Ah, that makes sense. Fishplot is qualitative, not completely quantitative, as it's inferring data between the timepoints to make things look pretty. The easy solution would be to just subtract a very small amount from your input numbers to offset the very small amount being added. A more comprehensive solution would be to update the section that adds the small numbers such that it subtracts a corresponding amount from the parent clone. I'd welcome a pull request to that effect, or will add it to my to do list. Sorry that it's created some confusion, and thanks for chasing down the problem!
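
One way the "easy solution" might look is sketched below in base R. `offset_padding` is a hypothetical helper, not a fishplot function, and `eps = 1e-7` is an assumed padding value. The sketch also assumes that the raw input already nests correctly, that every zero could be padded (a worst case), and that non-zero fractions are far larger than `eps`. For each sibling group it subtracts the would-be overflow from the largest sibling, working from the founders down so that a parent is adjusted before its children are fitted into it.

```r
# Hypothetical helper (not part of fishplot); eps is an assumed padding value.
offset_padding <- function(frac.table, parents, eps = 1e-7) {
  # depth of a clone in the tree: 0 for founders, parent's depth + 1 otherwise
  depth <- function(i) if (parents[i] == 0) 0 else 1 + depth(parents[i])

  # process the founder group (parent 0) first, then groups under deeper parents
  parent_ids <- unique(parents)
  group_depth <- sapply(parent_ids, function(p) if (p == 0) -1 else depth(p))

  for (p in parent_ids[order(group_depth)]) {
    kids <- which(parents == p)
    for (tp in seq_len(ncol(frac.table))) {
      vals <- frac.table[kids, tp]
      capacity <- if (p == 0) 100 else frac.table[p, tp]
      # worst case: every zero among the siblings is padded to eps
      padded_sum <- sum(vals) + sum(vals == 0) * eps
      excess <- padded_sum - capacity
      if (excess > 0 && any(vals > 0)) {
        biggest <- kids[which.max(vals)]
        # remove the overflow plus one extra eps as a safety margin
        frac.table[biggest, tp] <- frac.table[biggest, tp] - (excess + eps)
      }
    }
  }
  frac.table
}

# Toy input: clone 1 is the founder, clones 2 and 3 nest in clone 1, clone 4 in clone 2.
frac.table <- matrix(c(100,  60, 40, 60,
                       100, 100,  0,  0), ncol = 2)
parents <- c(0, 1, 1, 2)
adjusted <- offset_padding(frac.table, parents)
print(format(adjusted[, 2], digits = 12))
```

Groups where every sibling is zero are left untouched, since there is nothing to subtract from; those cases would still need the more comprehensive fix inside the padding step itself.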

@Lewis-W-S-Fisher
Author

No problem, I have changed some of my samples to fix this problem. How long is your to do list? I could have a go at implementing a solution if you want me to open a request.
All the best,
Lewis

@chrisamiller
Owner

> How long is your to do list?

Always longer than I'd like 😆

If you want to take a stab at it, that would be the most expedient way, and I'm happy to take a look at a proposed solution for merging!

@Lewis-W-S-Fisher
Author

Yeah, I'll give it a crack. I tried to open a request but had no luck. If you could allow me, I'll try to make the changes :).

@chrisamiller
Owner

To do that, I think you need to fork the project to your own github, make some changes and commit them, then open a PR. If that doesn't work, let me know and we can try to troubleshoot. Thanks!

@alquamalok22

Hi, I just wanted to know whether it is possible to build fishplots on a gene and frequency basis. I don't have information about the cluster IDs or clones needed to generate the parents vector. Could you please help me with that?
