Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vignette update + function addition #19

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

vignette update + function addition #19

wants to merge 6 commits into from

Conversation

iciarfernandez
Copy link
Collaborator

  • updated vignette to reflect wording change from "caucasian" to "european"

  • added example usage of the predictPreeclampsia.R function to vignette

  • posted code for the predictPreeclampsia.R function

@wvictor14
Copy link
Owner

Hey @iciarfernandez thanks for the PR!

I made some changes onto main:

  • I relabelled nbeta to European for predictEthnicity, verified it works, but I left vignette unchanged since you have those changes here

@wvictor14
Copy link
Owner

wvictor14 commented May 22, 2023

Here is some code to use your model data I uploaded onto bioconductor. Note that you will need to install and use bioconductor devel

 library(ExperimentHub)

 eh = ExperimentHub()

  |======================================================================| 100%

 

snapshotDate(): 2023-04-04

 query(eh, c('eoPredData'))

ExperimentHub with 2 records

# snapshotDate(): 2023-04-04

# $dataprovider: University of British Columbia

# $species: Homo sapiens

# $rdataclass: mixo_splsda, matrix

# additional mcols(): taxonomyid, genome, description,

#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,

#   rdatapath, sourceurl, sourcetype

# retrieve records with, e.g., 'object[["EH8090"]]'

 

           title

  EH8090 | eoPredModel

  EH8091 | x_test

So those objects will be made available in the environment after loading with the code above.

Other things to do:

  • run document() to generate .rd files
  • run devtools::check() and BiocCheck::BiocCheck() and fix any errors and warnings (notes are ok)

After that I will take a more in depth review before merging.

Let me know if you have any questions or want help!
Thanks

@iciarfernandez
Copy link
Collaborator Author

Hey,

Thanks for the code! So it's working to load exactly what you copy-pasted above, but when I actually try to retrieve the eoPredModel object by running eh[["EH8090"]] as indicated, it starts downloading it but then just displays the following message:

Warning message:
package ‘eoPredData’ is not available for Bioconductor version '3.18'

Not sure if we need to fix something in the hub package before going forward then?

Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include

query(eh, c('eoPredModel'))

model=eh[['EH8080']]

at the start of the code (within the function itself) to basically serve the same purpose as data(ethnicityCpGs, envir=environment()) in your predictEthnicity function?

If that's the case I think I can figure out the rest by myself but I'll let you know if I have any more questions.

Thank you,
Iciar

@wvictor14
Copy link
Owner

Warning message:
package ‘eoPredData’ is not available for Bioconductor version '3.18'

Hey try downgrade bioconductor to 3.17:

BiocManager::install(version = "3.17")

Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include

query(eh, c('eoPredModel'))

model=eh[['EH8080']]

at the start of the code (within the function itself) to basically serve the same purpose as data(ethnicityCpGs, envir=environment()) in your predictEthnicity function?

Yes I think so

@wvictor14
Copy link
Owner

I relabelled "Caucasian" to "European" so now vignette continuesup to eoPred section:

processing file: planet.Rmd

Quitting from lines 333-366 [unnamed-chunk-12] (planet.Rmd)
Error in predictPreeclampsia():
! object 'trainLabels' not found
Backtrace:

  1. planet::predictPreeclampsia(t(val))
  2. mixOmics::splsda(trainBetas, trainLabels, ncomp = 1, keepX = 45)
    Execution halted

I updated the function to include the code where I load the model object (which yes, I think it basically replaces the equivalent of `data(ethnicityCpGs, envir=environment())` in the ethnicity function) and to require the mixOmics and ExperimentHub packages, ran `devtools::document()`, then `devtools::check()`.

I think everything with the function is working now - the error that I am getting now with `devtools::check()` is about the vignette; `valMeta` in the vignette is supposed to be the pData of the `val` object, but we didn't actually upload it to the ExperimentHub so I can't load it.

If you could update the ExperimentHub to 1) include the `valMeta` object, which I have added [to the dropbox](https://www.dropbox.com/scl/fi/byok0mkwl7iqwssexg5d2/valMeta.csv?rlkey=c0hle2lwgost83u6o6u4iwzku&dl=0) and 2) delete the `x_test` object and instead include a new `val` object valBMIQ, also [now in the dropbox](https://www.dropbox.com/scl/fi/uteq2joahew6wm9i0wl35/valBMIQ.rds?rlkey=er4trp4bg4xkskvey4yihgw4d&dl=0), then I can try running `devtools::check()` again and hopefully everything works!

Last but not least, I downgraded to Bioconductor 3.17 but still getting this warning when I try to load the data from ExperimentHub()

> Warning message:
> package ‘eoPredData’ is not available for Bioconductor version '3.17'

Even though it does seem to install - i.e., I am able to run `mod = eh[["EH8090"]]` and it works to load the data. Anyway, it doesn't seem to be a problem?
@iciarfernandez
Copy link
Collaborator Author

I updated the function to include the code where I load the model object (which yes, I think it basically replaces the equivalent of data(ethnicityCpGs, envir=environment()) in the ethnicity function) and to require the mixOmics and ExperimentHub packages, ran devtools::document(), then devtools::check().

I think everything with the function is working now - the error that I am getting now with devtools::check() is about the vignette; valMeta in the vignette is supposed to be the pData of the val object, but we didn't actually upload it to the ExperimentHub so I can't load it.

If you could update the ExperimentHub to 1) include the valMeta object, which I have added to the dropbox and 2) delete the x_test object and instead include a new val object valBMIQ, also now in the dropbox, then I can try running devtools::check() again and hopefully everything works!

Last but not least, I downgraded to Bioconductor 3.17 but still getting this warning when I try to load the data from ExperimentHub()

Warning message:
package ‘eoPredData’ is not available for Bioconductor version '3.17'

Even though it does seem to install - i.e., I am able to run mod = eh[["EH8090"]] and it works to load the data. Anyway, it doesn't seem to be a problem?

@iciarfernandez
Copy link
Collaborator Author

One more thing - it was discussed at lab meeting that instead of Ambiguous, it should say Other in the labels that predictEthnicity outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!

print(paste0("Input data must be a matrix or an array"))
}

subset <- betas[,colnames(betas) %in% trainCpGs]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider renaming "subset" to something else, since subset is a base R function

CP <- as.data.frame(CP) %>% tibble::rownames_to_column("Sample_ID")
CP$Pred_Class <- CP$comp1
CP <- CP %>%
dplyr::mutate(Pred_Class = dplyr::case_when(EOPE > 0.55 ~ "EOPE",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming Pred_Class to something more informative related to PE (e.g. eoPred_class, predicted_pe, pe_status)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - will make this change too!

.DS_Store Outdated
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's delete this

@wvictor14
Copy link
Owner

wvictor14 commented Aug 11, 2023

Hi Iciar! Thanks for comments. Let me know if I'm understanding correctly:

predictPreeclampsia does not seem to use the pretrained models we uploaded onto experimenthub anymore, but instead trains a model "on-the-fly" using internal data trainBetas and trainLabel.

So if this is the new plan, then we don't need the experimentHub package for the function to work.

If this is true, then we need the following r objects to be added to the R package itself as sysdata.rda (internal, not available to user) or as rds objects:

  • trainBetas, trainLabels for predictPreeclampsia,
  • it looks like you want to use peBetas as an example.

But need to ensure that they are the under the size limit -- the model object was over limit hence the original plan to use the experimenthub pretrained model object.

Let's also just focus on the function for now, and worry about vignette after. I suggest maybe setting the code chunks in vignette to eval = FALSE for now to get the checks to run through

@wvictor14 wvictor14 mentioned this pull request Aug 11, 2023
@wvictor14
Copy link
Owner

One more thing - it was discussed at lab meeting that instead of Ambiguous, it should say Other in the labels that predictEthnicity outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!

I am moving this to a new issue to organize our discussions more

@iciarfernandez
Copy link
Collaborator Author

It does! Not sure if it didn't push correctly, but from my end this is the code I am seeing for the start of the predictPreeclampsia function:

predictPreeclampsia <- function(betas, ...){
 
  eh = ExperimentHub()
  mod = eh[['EH8090']]
  trainCpGs = colnames(mod$X)
 
  peCpGs = mixOmics::selectVar(mod)$name
  pp <- intersect(colnames(betas), peCpGs)

The issue I am having that I described here is that I need a few objects updated in the ExperimentHub package to fix the errors that devtools::check() is throwing now - let me know if you see what I mean in that comment (I think maybe you missed it?) and otherwise we can discuss further.

@wvictor14
Copy link
Owner

Ah I was looking at the wrong branch!

Can you set the vignette to code chunks to false for now so we can focus on the function and related check errors?

In the meantime I'll add those objects to experimenthub and remove x_test, it will require bioconductor to give us access.

Comment on lines 25 to 26
#' @import mixOmics
#' @import ExperimentHub
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to import the whole packages, can we remove these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to fix the error that the devtools check was throwing other than by adding these as imports, is there a better way?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's fixed by importing the whole package, then likely there is some mixOmics function that ins't being called without double colons e.g. should be dplyr::slice instead of just slice.

Copy link
Owner

@wvictor14 wvictor14 Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think taking dependency on mixomics and experimenthub is overkill -> particularly mixomics seems like a large package and my worry is that it will 1) if mixomics has dependencies itself then many more subdependencies will be created

I can take a look

EDIT: basically I'm asking to remove these two import lines, and then add mixOmics::FUNCTION / ExperimentHub::FUNCTION whenever they are needed, then run the checks to see if any warnings / errors still there. If not then great.

Comment on lines 19 to 21
#' To predict early preeclampsia on 450k/850k samples
#'
#' Load data
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need comments (hashes #) or will cause errors

#' # Load Data

not
#' Load Data

other threshold, different labels can be assigned based on the output
probabilities.

```{r include=FALSE}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, we can set these to FALSE, and focus on vignette in a different PR / branch

@iciarfernandez
Copy link
Collaborator Author

In response to everything:

  1. I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being) devtools::check() is not throwing any errors but it does have one warning and two notes.

The warning is

Undocumented arguments in documentation object 'predictPreeclampsia'
    ‘...’

and the notes are

Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’
    See the note in ?`:::` about the use of this operator.
predictAge: no visible global function definition for ‘data’
  predictAge: no visible binding for global variable ‘ageCpGs’
  predictEthnicity: no visible global function definition for ‘data’
  predictEthnicity: no visible binding for global variable
    ‘ethnicityCpGs’
  Undefined global functions or variables:
    ageCpGs data ethnicityCpGs
  Consider adding
    importFrom("utils", "data")
  to your NAMESPACE file.

On the other hand, BiocCheck::BiocCheck() does throw 2 errors that I am not sure how to fix:

ERROR: At least 80% of man pages documenting exported objects must have runnable examples.

ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.

  1. Thank you for submitting the request to ExperimentHub!

  2. I added ExperimentHub and mixOmics as imports because an error that the check step was previously throwing was related to not having them as dependencies - is there a better way to do this?

Let me know what you think!

@wvictor14
Copy link
Owner

In response to everything:

  1. I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being) devtools::check() is not throwing any errors but it does have one warning and two notes.

The warning is

Undocumented arguments in documentation object 'predictPreeclampsia'
    ‘...’

the parameter ... needs documentation. Can write something like "feeds into this other function"

and the notes are

Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’
    See the note in ?`:::` about the use of this operator.

Am not sure if Notes need to be addressed for bioconductor/cran. I recall that they are ok to have but can you check?

On the other hand, BiocCheck::BiocCheck() does throw 2 errors that I am not sure how to fix:

ERROR: At least 80% of man pages documenting exported objects must have runnable examples.

Let's see if this is still there after addressing the examples error for predictPreclampsia.

ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.

Done. Reminds me, can you add yourself as a contributor or author to DESCRIPTION?

I see DS_STORE file is there image

@iciarfernandez
Copy link
Collaborator Author

Hi Victor,

Sorry that I dropped the ball on this, I got distracted with finishing projects and thesis writing - I think I made all the changes you requested. Please let me know of anything else I need to get done on my end and hopefully we can get this working soon!

Thanks,
Icíar

Copy link
Owner

@wvictor14 wvictor14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look and see what I can finish

predictPreeclampsia <- function(betas, ...){

# read in data to generate model
eh = ExperimentHub()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ::

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

.DS_Store Outdated
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this file / add to gitignore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added to gitignore

Comment on lines 25 to 26
#' @import mixOmics
#' @import ExperimentHub
Copy link
Owner

@wvictor14 wvictor14 Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think taking dependency on mixomics and experimenthub is overkill -> particularly mixomics seems like a large package and my worry is that it will 1) if mixomics has dependencies itself then many more subdependencies will be created

I can take a look

EDIT: basically I'm asking to remove these two import lines, and then add mixOmics::FUNCTION / ExperimentHub::FUNCTION whenever they are needed, then run the checks to see if any warnings / errors still there. If not then great.

@wvictor14
Copy link
Owner

wvictor14 commented Jun 15, 2024

  • Can you make sure your email is signed up for the support site? Otherwise cannot submit the eoPredData package to bioconductor:
 Checking for support site registration...
    * ERROR: Unable to find your email in the Support Site: HTTP 404 Not
      Found.

The email I have right now for you is Iciar.Fernandez@bcchr.ca, but lmk if you want to change it

@iciarfernandez
Copy link
Collaborator Author

I implemented the requested changes and check() didn't throw any errors or warnings. Re: your last comment, if you could change my email to iciarfernandez@outlook.com instead that would be great - I signed up to the support site with that email. Thank you!

@wvictor14
Copy link
Owner

Some merge conflicts, need to pull branch main into branch eoPred and locally fix merge conflicts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants