diff --git a/CINSignatureQuantification.Rproj b/CINSignatureQuantification.Rproj new file mode 100644 index 0000000..294675a --- /dev/null +++ b/CINSignatureQuantification.Rproj @@ -0,0 +1,23 @@ +Version: 1.0 + +RestoreWorkspace: No +SaveWorkspace: No +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 4 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX + +AutoAppendNewline: Yes +StripTrailingWhitespace: Yes +LineEndingConversion: Posix + +BuildType: Package +PackageUseDevtools: Yes +PackageInstallArgs: --no-multiarch --with-keep.source +PackageRoxygenize: rd,collate,namespace +DisableExecuteRprofile: Yes diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..2e59cca --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,51 @@ +Package: CINSignatureQuantification +Title: Simple and quick measuring of copy number signatures in cancers +Version: 0.0.9 +Authors@R: + c(person(given = "Philip S", + family = "Smith", + role = c("aut"), + email = "philip.smith@cruk.cam.ac.uk", + comment = c(ORCID = "0000-0002-9306-1747")), + person(given = "Ruben M", + family = "Drews", + role = c("aut", "cre"), + email = "Ruben.Drews@cruk.cam.ac.uk", + comment = c(ORCID = "0000-0001-7360-4970")), + person(given = "Cancer Research UK", + role = c("cph", "fnd"), + email = "Florian.Markowetz@cruk.cam.ac.uk", + comment = c(ORCID = "0000-0002-2784-5308")) + ) +Description: Allows the simple and quick quantification of copy number signatures in + cancer samples from copy number profiles. The signatures are a readout of mutational + processes resulting in chromosomal instability (CIN). It is intended as a one-stop + solution, combining multiple published solutions. At the moment the methods from + Drews et al. (Nature, 2022) and Macintyre et al. (Nature Genetics, 2018) are included. 
+URL: https://github.com/markowetzlab/CINSignatureQuantification +BugReports: https://github.com/markowetzlab/CINSignatureQuantification/issues +biocViews: +Depends: + R (>= 4.0.0) +Imports: + base, + Biobase (>= 2.46.0), + data.table (>= 1.14), + graphics (>= 1.3.7), + limSolve (>= 1.5.6), + methods, + parallel, + stats, + stringr (>= 1.4), + utils +Suggests: + doParallel (>= 1.0.16), + foreach (>= 1.5.1), + knitr, + rmarkdown +License: ASL + file LICENSE +Encoding: UTF-8 +LazyData: true +Roxygen: list(markdown = TRUE) +RoxygenNote: 7.1.2 +VignetteBuilder: knitr diff --git a/LICENSE.txt b/LICENSE.txt new file mode 100644 index 0000000..d612ce7 --- /dev/null +++ b/LICENSE.txt @@ -0,0 +1,68 @@ +Available Source Licence (“ASL”) + +Copyright (c) 2022, University of Cambridge and Spanish National Cancer Research Centre (CNIO) + +Preamble +ASL is a software license that proposes to offer copyleft style rights to use the software in an academic non-commercial setting. The purpose of ASL is to encourage academic cooperation and collaboration free-of-charge whilst enabling academic institutions to earn revenue from the parallel licensing of valuable bodies of software code. Significant proportions of such revenue are typically reinvested in academic research. +ASL is not an open-source licence because it does not allow commercial use – it is an “available source” licence, meaning that the source code is made available subject to the terms of this licence and only for academic non-commercial use. +ASL is a reciprocal licence very similar to GPL and the core terms are identical to those of GNU GPLv2 (except for the limitation to non-commercial use), making it easier for those who know the GPLv2 to understand the licensing*. 
As a reciprocal licence, if you redistribute any derivative works you have created based on ASL licenced code, then you are required to license the new work under the ASL - including making your source code available and ensuring that licensees are aware of the terms of the ASL licence. +The non-commercial limitation makes ASL incompatible with the GPL and other open-source licences with copyleft provisions because these licences require that modified versions of the original program are made available free of charge for any type of use, including commercial, and this is prevented by ASL. As the two sets of reciprocal terms are incompatible, any modified version of the original program combining ASL and copylefted open-source code can be used internally but cannot be licensed out. +The non-commercial restriction is an integral part of the license and you may not remove it without the consent of the rights holder in the ASL licensed code. Please contact Cambridge Enterprise for any questions and/or to enquire about commercial use rights. +*The changes to the GPLv2 are: removing the Preamble, replacing the reference to the “General Public License” in clause 0 with a reference to the “ASL”, removing clause 9 and adding clause 13 with the non-commercial restriction and limited patent grant + +TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION +0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". 
+Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. +1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. +You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. +2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: +a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. +b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. 
+c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) +These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. +Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. +In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. +3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: +a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, +b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, +c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) +The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
+If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. +4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. +5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. +6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. +7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. +If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. +It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. +This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. +8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. +9.Not used. +10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the licensor to ask for permission. +NO WARRANTY +11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. +12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. +13. 
Non Commercial Use; Limited Patent Rights Grant +The Preamble of the GPLv2 does not apply to this License +The Program is provided to you by the Licensor subject to the following conditions, which prevail over any clause or indication to the contrary in the GPLv2: +a) The grant of rights under the License is for academic non-commercial use only. Academic non-commercial use is defined as use for academic research or other not-for-profit scholarly purposes, which are undertaken at an educational, non-profit, charitable or governmental institution, and which does not involve and is not intended to lead to the production or manufacture of products for sale, or the enhancement of a product or service in or proposed for commerce, or the performance of services for a fee. +b) Subject to your compliance with the License, in the event the Licensor holds patent rights on the Program or any part of it, the Licensor grants you a perpetual, worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this section), limited patent licence to make, use, import and otherwise run, modify and distribute the Program. For the avoidance of doubt, this patent licence is limited to academic non-commercial use, as described above. If you institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program constitutes direct or contributory patent infringement, then any patent license granted to you under this License for the Program shall terminate as of the date such litigation is filed. +Use other than academic and non-commercial use as above is deemed to be commercial use and outside the scope of this License. If you intend to use the Program for a commercial use, then you must obtain a commercial use license for the Program. In that case, please contact the original licensor to enquire about commercial use licenses. 
+ +END OF TERMS AND CONDITIONS + +You may contact the original licensor at + +University of Cambridge +The Old Schools +Trinity Lane +Cambridge +CB2 1TN +United Kingdom + +and + +Spanish National Cancer Research Centre +Calle Melchor Fernández Almagro, 3 +28029 +Madrid + diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..0d84364 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,24 @@ +# Generated by roxygen2: do not edit by hand + +export(addsampleFeatures) +export(calculateActivity) +export(calculateFeatures) +export(calculateSampleByComponentMatrix) +export(clinPredictionDenovo) +export(clinPredictionPlatinum) +export(createCNQuant) +export(getExperiment) +export(getSampleByComponent) +export(getSamplefeatures) +export(getSamples) +export(getSegments) +export(plotActivities) +export(plotSampleByComponent) +export(plotSegments) +export(quantifyCNSignatures) +exportClasses(CNQuant) +exportClasses(ExpQuant) +exportClasses(SigQuant) +importFrom(data.table,rbindlist) +importFrom(data.table,setDT) +importFrom(limSolve,lsei) diff --git a/R/AllClasses.R b/R/AllClasses.R new file mode 100644 index 0000000..32adc26 --- /dev/null +++ b/R/AllClasses.R @@ -0,0 +1,70 @@ +#' ExpQuant object +#' +#' @slot experimentName character +#' @slot init.date character +#' @slot last.modified character +#' @slot samples.full numeric +#' @slot samples.current numeric +#' @slot build character +#' @slot feature.method character +#' +#' @export +ExpQuant <- setClass("ExpQuant", + slots = list(experimentName = "character", + init.date = "character", + last.modified = "character", + samples.full = "numeric", + samples.current = "numeric", + build = "character", + feature.method = "character" + ), + prototype = list(experimentName = NULL, + init.date = as.character(Sys.time()), + last.modified = "NA", + samples.full = NULL, + samples.current = NULL, + build="hg19", + feature.method = "NA") +) + + +#' CNQuant object +#' +#' @slot segments list +#' @slot featData list +#' @slot 
featFitting list +#' @slot samplefeatData data.frame +#' @slot ExpData ExpQuant +#' +#' @export +#' +CNQuant <- setClass("CNQuant", + slots = list(segments = "list", + featData = "list", + featFitting = "list", + samplefeatData = "data.frame", + ExpData = "ExpQuant") +) + +#' SigQuant +#' +#' @slot activities list +#' @slot signature.model character +#' @slot backup.signatures matrix +#' @slot backup.thresholds numeric +#' @slot backup.scale list +#' @slot backup.scale.model character +#' +#' @export +#' +SigQuant <- setClass("SigQuant", + contains = "CNQuant", + slots = list( + activities = "list", + signature.model = "character", + backup.signatures = "matrix", + backup.thresholds = "numeric", + backup.scale = "list", + backup.scale.model = "character" + ) +) diff --git a/R/AllGenerics.R b/R/AllGenerics.R new file mode 100644 index 0000000..679b802 --- /dev/null +++ b/R/AllGenerics.R @@ -0,0 +1,155 @@ +#' getSamples +#' +#' Extracts sample names from a CNQuant object. +#' +#' @param object CNQuant object +#' @return A character vector +#' @export +#' @docType methods +#' @rdname getSamples-methods +#' +setGeneric("getSamples", function(object) standardGeneric("getSamples")) + +#' getSegments +#' +#' Extracts copy number segment data from a CNQuant object. +#' +#' @param object CNQuant object +#' @return A data.frame +#' @export +#' @docType methods +#' @rdname getSegments-methods +#' +setGeneric("getSegments", function(object) standardGeneric("getSegments")) + +#' getSamplefeatures +#' +#' Extracts sample feature data from a CNQuant object. 
+#' +#' @param object CNQuant object +#' @return A data.frame +#' @export +#' @docType methods +#' @rdname getSamplefeatures-methods +#' +setGeneric("getSamplefeatures", function(object) standardGeneric("getSamplefeatures")) + +#' getSampleByComponent +#' +#' @param object CNQuant object +#' @return matrix containing the sample-by-component data +#' @export +#' @docType methods +#' @rdname getSampleByComponent-methods +#' +setGeneric("getSampleByComponent", function(object) standardGeneric("getSampleByComponent")) + +#' getExperiment +#' +#' Extracts the experiment metadata (the ExpQuant object stored in the "ExpData" slot) from a CNQuant object. +#' +#' @param object CNQuant object +#' @return An ExpQuant class object +#' @export +#' @docType methods +#' @rdname getExperiment-methods +#' +setGeneric("getExperiment", function(object) standardGeneric("getExperiment")) + +#' calculateFeatures +#' +#' Extracts and returns copy number features from copy number profiles in a CNQuant object. +#' +#' @param object CNQuant object +#' @param method Method to extract copy number features. Default is "drews". +#' @param smooth.diploid Logical value indicating whether segments close to 2 should be collapsed to 2 and merged together. Default is TRUE. +#' @param cores Number of CPU threads/cores to utilise via doParallel. Default is 1. +#' @return A CNQuant class object with extracted features stored in the "featData" slot +#' @export +#' @docType methods +#' @rdname calculateFeatures-methods +#' +setGeneric("calculateFeatures",function(object, method="drews", smooth.diploid=TRUE,cores=1) + standardGeneric("calculateFeatures")) + +#' calculateSampleByComponentMatrix +#' +#' Calculates and returns a sample-by-component matrix from copy number features in a CNQuant object. +#' +#' @param object CNQuant object +#' @param method Determines the mixture components used to calculate sum-of-posterior probabilities. Default is "drews". 
+#' @return A CNQuant class object with sum-of-posterior probabilities stored in the "featFitting" slot +#' @export +#' @docType methods +#' @rdname calculateSampleByComponentMatrix-methods +#' + +setGeneric("calculateSampleByComponentMatrix",function(object, method="drews") + standardGeneric("calculateSampleByComponentMatrix")) + +#' calculateActivity +#' +#' Calculates signature activities and returns them in a SigQuant object. Requires a prior call to calculateSampleByComponentMatrix. \cr \cr +#' The output of this function is a list of four matrices: the raw signature activities, the normalised activities, the normalised and +#' thresholded signature activities, and the normalised, thresholded and scaled activities with the scaling factors obtained from the TCGA cohort. +#' +#' +#' @param object CNQuant object +#' @param method Determines the signature definitions used to calculate signature activities. Default is "drews". +#' @return A SigQuant class object with four activity matrices stored in the "activities" slot +#' @export +#' @docType methods +#' @rdname calculateActivity-methods +#' + +setGeneric("calculateActivity",function(object, method="drews") + standardGeneric("calculateActivity")) + +#' quantifyCNSignatures +#' +#' This function takes a copy number profile as input and returns signature activities. +#' +#' @param object CNQuant object +#' @param experimentName A user-specified name of the experiment +#' @param method The method used for calculating the signature activities. 
Default is "drews" +#' @param cores Number of threads/cores to use for parallel processing +#' @return A SigQuant class object with four activity matrices stored in the "activities" slot +#' @export +#' @docType methods +#' @rdname quantifyCNSignatures-methods +#' + +setGeneric("quantifyCNSignatures",function(object, experimentName="Default", method="drews",cores=1) + standardGeneric("quantifyCNSignatures")) + +#' clinPredictionPlatinum +#' +#' The function takes signature activities based on the Drews et al. methodology and predicts the patient's response to platinum-based chemotherapies. +#' +#' @param object SigQuant object +#' @return A vector with "Predicted sensitive" or "Predicted resistant" for all samples in the input object. +#' @export +#' @docType methods +#' @rdname clinPredictionPlatinum-methods +#' + +setGeneric("clinPredictionPlatinum",function(object) + standardGeneric("clinPredictionPlatinum")) + +#' clinPredictionDenovo +#' +#' The function takes signature activities based on the Drews et al. methodology and predicts the patient's response based on +#' a user-specified pair of signatures. \cr \cr +#' The user should supply a vector of samples for training purposes. The function then trains the classifier on these samples before applying it to all samples and returning the labels. +#' +#' @param object SigQuant object +#' @param sampTrain Vector of sample names that should be used for training the classifier. +#' @param sigsTrain Vector with two signature names on which the prediction should be based. +#' @return A vector with "Signature <1> higher" or "Signature <2> higher" for all samples in the input object. 
+#' @export +#' @docType methods +#' @rdname clinPredictionDenovo-methods +#' + +setGeneric("clinPredictionDenovo",function(object, sampTrain, sigsTrain) + standardGeneric("clinPredictionDenovo")) diff --git a/R/CINSignatureQuantification.R b/R/CINSignatureQuantification.R new file mode 100644 index 0000000..b4bbebd --- /dev/null +++ b/R/CINSignatureQuantification.R @@ -0,0 +1,9 @@ +#' CINSignatureQuantification +#' @description Allows the simple and quick quantification of copy number signatures in +#' cancer samples from copy number profiles. The signatures are a readout of mutational +#' processes resulting in chromosomal instability (CIN). It is intended as a one-stop +#' solution, combining multiple published solutions. At the moment the methods from +#' Drews et al. (Nature, 2022) and Macintyre et al. (Nature Genetics, 2018) are included. +#' @name CINSignatureQuantification +#' @docType package +NULL diff --git a/R/LinCombDecompSigs.R b/R/LinCombDecompSigs.R new file mode 100644 index 0000000..57f187a --- /dev/null +++ b/R/LinCombDecompSigs.R @@ -0,0 +1,41 @@ +#' @importFrom limSolve lsei +LinCombDecompSigs = function (component_by_sample, component_by_signature) { + + # Failsafe: Convert signatures to matrix + mSignatures = as.matrix(component_by_signature) + + # Prepare LCD (needs matrix and vectors for each signature) + numSigs = ncol(mSignatures) + G = diag(numSigs) + H = rep(0, numSigs) + + # Prepare output + dfOutExp = data.frame() + + # Loop over samples + numSamps = ncol(component_by_sample) + for (i in seq_len(numSamps)) { + + # Perform the magic. 
LCD on a signature (vector of weights) and the input matrix + lLCDResult = lsei(A = mSignatures, + B = component_by_sample[, i], + G = G, + H = H, + verbose = FALSE) + + # Extract relevant vector + vExp = as.vector(lLCDResult$X) + + # Attach results to output data frame + dfOutExp[seq(1, numSigs, 1), i] = vExp + + # Clean up + rm(lLCDResult) + } + + # Transfer names and return + colnames(dfOutExp) = colnames(component_by_sample) + rownames(dfOutExp) = colnames(component_by_signature) + + return(dfOutExp) +} diff --git a/R/addSampleFeatures.R b/R/addSampleFeatures.R new file mode 100644 index 0000000..958810b --- /dev/null +++ b/R/addSampleFeatures.R @@ -0,0 +1,41 @@ +#' addsampleFeatures +#' +#' Adds custom sample-level data to the samplefeatData field of a CNQuant or SigQuant object. +#' This can be additional sample information (purity, tumour type, etc.) that can +#' be used in downstream analysis. +#' +#' @param object CNQuant or SigQuant class object +#' @param sample.data data.frame containing sample-level variables +#' @param id.col column containing sample identifiers +#' +#' @return CNQuant or SigQuant object with updated samplefeatData +#' @export +#' +addsampleFeatures <- function(object,sample.data=NULL,id.col = "sample"){ + if(!class(object) %in% c("CNQuant","SigQuant")){ + stop("this function requires a CNQuant or SigQuant class object") + } + if(is.null(sample.data)){ + stop("no sample.data provided") + } + if(!is.data.frame(sample.data)){ + stop("sample data is not a data.frame") + } + if(!id.col %in% colnames(sample.data)){ + stop("id.col not found in sample.data") + } + sampFeat <- object@samplefeatData + newDataSamples <- sample.data[,which(colnames(sample.data) == id.col)] + if(!all(newDataSamples %in% rownames(sampFeat))){ + stop("sample.data contains samples that are not present in the object") + } + mergedsampfeats <- merge.data.frame(sampFeat,sample.data,by.x = "row.names",by.y = id.col,all = TRUE) + rownames(mergedsampfeats) <- 
mergedsampfeats$Row.names + 
mergedsampfeats <- mergedsampfeats[,-1] + if(!all(rownames(mergedsampfeats) == names(object@segments))){ + stop("something terrible has happened with the data.frame order") + } + methods::initialize(object,samplefeatData=mergedsampfeats, + ExpData = methods::initialize(object@ExpData, + last.modified = as.character(Sys.time()))) +} diff --git a/R/applyThreshNormAndScaling.R b/R/applyThreshNormAndScaling.R new file mode 100644 index 0000000..b27d95b --- /dev/null +++ b/R/applyThreshNormAndScaling.R @@ -0,0 +1,25 @@ +applyThreshNormAndScaling = function(Hraw) { + + # Normalise matrix + H = t( apply(Hraw, 2, function(x) x/sum(x)) ) + + # Apply signature-specific thresholds (no renormalising happening in order to avoid inflation of signal) + #vThresh = get(load("data/Drews2022_TCGA_Signature_Thresholds.rda")) + vThresh = get(data("Drews2022_TCGA_Signature_Thresholds",envir = environment())) + threshH = sapply(names(vThresh), function(thisSig) { + + sigVals = H[,thisSig] + sigVals[ sigVals < vThresh[thisSig] ] = 0 + + return(sigVals) + }) + + # Scale according to TCGA-specific scaling factors + #lScales = get(load("data/Drews2022_TCGA_Scaling_Variables.rda")) + lScales = get(data("Drews2022_TCGA_Scaling_Variables",envir = environment())) + threshScaledH = scaleByModel(threshH, lScales) + + # Combine for return + lOut = list(rawAct0 = t(Hraw), normAct1 = H, thresholdAct2 = threshH, scaledAct3 = threshScaledH) + return(lOut) +} diff --git a/R/avoidMeasurementErrors.R b/R/avoidMeasurementErrors.R new file mode 100644 index 0000000..fe6c4fb --- /dev/null +++ b/R/avoidMeasurementErrors.R @@ -0,0 +1,5 @@ +avoidMeasurementErrors = function(dtSmooth) { + dtSmooth$segVal[ dtSmooth$segVal < 0 ] = 0 + dtSmooth$sample = factor(dtSmooth$sample) + return(dtSmooth) +} diff --git a/R/calculateActivity.R b/R/calculateActivity.R new file mode 100644 index 0000000..78a857e --- /dev/null +++ b/R/calculateActivity.R @@ -0,0 +1,60 @@ +#' @rdname calculateActivity-methods +#' @aliases 
calculateActivity +setMethod("calculateActivity", + signature=c(object="CNQuant"), + definition=function(object, method=NULL){ + if(length(object@featFitting) == 0){ + stop("Sample-by-component unavailable - run 'calculateSampleByComponentMatrix()'") + } + # Check method + if(is.null(method)){ + method <- getExperiment(object)@feature.method + } + switch(method, + mac={ + SigActs <- calculateActivityMac(object) + Hraw <- t(SigActs[[1]]) + SigActs <- t(SigActs[[2]]) + SigActs <- list(rawAct0=Hraw,normAct1=NULL,thresholdAct2=SigActs,scaledAct3=NULL) + + #W<-t(get(load("data/Macintyre2018_OV_Signatures_normalised.rda"))) + W <- t(get(data("Macintyre2018_OV_Signatures_normalised",envir = environment()))) + # Combine results + methods::new("SigQuant",object, + activities=SigActs, + signature.model = method, + backup.signatures=W, + backup.thresholds=0.01, + backup.scale=list(mean=c("NULL"),weight=c("NULL")), + backup.scale.model="NULL", + ExpData = methods::initialize(object@ExpData, + last.modified = as.character(Sys.time()), + feature.method = method)) + }, + drews={ + # Calculate activities + Hraw = calculateActivityDrews(object) + # Apply thresholds, normalisation and TCGA-specific scaling + lSigs = applyThreshNormAndScaling(Hraw) + + # Load data to be put into model as backup + #W = get(load("data/Drews2022_TCGA_Signatures.rda")) + W = get(data("Drews2022_TCGA_Signatures",envir = environment())) + #vThresh = get(load("data/Drews2022_TCGA_Signature_Thresholds.rda")) + vThresh = get(data("Drews2022_TCGA_Signature_Thresholds",envir = environment())) + #lScales = get(load("data/Drews2022_TCGA_Scaling_Variables.rda")) + lScales = get(data("Drews2022_TCGA_Scaling_Variables",envir = environment())) + # Combine results + methods::new("SigQuant",object, + activities=lSigs, + signature.model = method, + backup.signatures=W, + backup.thresholds=vThresh, + backup.scale=lScales, + backup.scale.model="TCGA", + ExpData = methods::initialize(object@ExpData, + last.modified = 
as.character(Sys.time()), + feature.method = method)) + + }) + }) diff --git a/R/calculateActivityDrews.R b/R/calculateActivityDrews.R new file mode 100644 index 0000000..90d2355 --- /dev/null +++ b/R/calculateActivityDrews.R @@ -0,0 +1,42 @@ +calculateActivityDrews = function(myData) { + + # Extract relevant information from object + V = myData@featFitting$sampleByComponent + nSamp = length(getSamples(myData)) + nFeat = ncol(myData@featFitting$sampleByComponent) + + # Load signatures + #W = get(load("data/Drews2022_TCGA_Signatures.rda")) + W = get(data("Drews2022_TCGA_Signatures",envir = environment())) + + # Sanity check mutational catalogue (not really necessary) + if(nSamp > nFeat) { + # Case 1: More samples than features + if(nrow(V) > ncol(V)) { V = t(V) } + } else if(nSamp < nFeat) { + # Case 2: Fewer samples than features + if(nrow(V) < ncol(V)) { V = t(V) } + } else { + # Case 3: Edge case where there are as many samples as features + if(sum(grepl("segsize", colnames(V))) > 0) { V = t(V) } + } + + + # Sanity check signature matrix + if(nrow(W) < ncol(W)) W = t(W) + # Check order of components and fix if necessary + if(! 
identical(rownames(W), rownames(V)) ) { + W = W[ match(rownames(V), rownames(W)), ] + } + + ### Functions needs: + ## Full matrix V mutCatalogue components (rows) by samples (cols) <= HAVE + ## Left matrix W sigCatalogue components (rows) by signature (cols) <= HAVE + ## Right matrix H expCatalogue signature (rows) by samples (cols) <= WANT + + # component_by_sample => NxM => N - Features, M - Samples => Component by Sample matrix + # component_by_signature => NxL => N - Features, L - Signatures => Component by Signature matrix + Hraw = as.matrix(LinCombDecompSigs(component_by_sample = V, component_by_signature = W)) + return(Hraw) + +} diff --git a/R/calculateActivityMac.R b/R/calculateActivityMac.R new file mode 100644 index 0000000..3acabc6 --- /dev/null +++ b/R/calculateActivityMac.R @@ -0,0 +1,41 @@ +calculateActivityMac <- function(object){ + + sample_by_component<-object@featFitting$sampleByComponent + #component_by_signature<-get(load("data/Macintyre2018_OV_Signatures_normalised.rda")) + component_by_signature <- get(data("Macintyre2018_OV_Signatures_normalised",envir = environment())) + + + # Do the magic. 
Calculate signature activities / exposures + signature_by_sample = LinCombDecompSigs(t(sample_by_component), component_by_signature) + + Hraw <- signature_by_sample + signature_by_sample<-normaliseMatrix(signature_by_sample) + signature_by_sample + return(list(Hraw,signature_by_sample)) +} + + +normaliseMatrix<-function(signature_by_sample,sig_thresh=0.01) +{ + norm_const<-colSums(signature_by_sample) + sample_by_signature<-apply(signature_by_sample,1,function(x){x/norm_const}) + sample_by_signature<-apply(sample_by_signature,1,lower_norm,sig_thresh) + signature_by_sample<-t(sample_by_signature) + norm_const<-apply(signature_by_sample,1,sum) + sample_by_signature<-apply(signature_by_sample,2,function(x){x/norm_const}) + signature_by_sample<-t(sample_by_signature) + signature_by_sample +} + +lower_norm<-function(x,sig_thresh=0.01) +{ + new_x<-x + for(i in 1:length(x)) + { + if(x[i] 1){ +# # require(foreach) +# # feats = c( "segsize", "bp10MB", "osCN", "changepoint", "copynumber", "bpchrarm" ) +# # doMC::registerDoMC(cores) +# # full_mat = foreach(feat=feats, .combine=cbind) %dopar% { +# # calculateSumOfPosteriors_MAC(CN_features[[feat]],all_components[[feat]], +# # feat, rowIter = rowIter, cores = subcores) +# # } +# # } else { +# full_mat<-cbind( +# calculateSumOfPosteriors_MAC(CN_features[["segsize"]],all_components[["segsize"]],"segsize"), +# calculateSumOfPosteriors_MAC(CN_features[["bp10MB"]],all_components[["bp10MB"]],"bp10MB"), +# calculateSumOfPosteriors_MAC(CN_features[["osCN"]],all_components[["osCN"]],"osCN"), +# calculateSumOfPosteriors_MAC(CN_features[["changepoint"]],all_components[["changepoint"]],"changepoint"), +# calculateSumOfPosteriors_MAC(CN_features[["copynumber"]],all_components[["copynumber"]],"copynumber"), +# calculateSumOfPosteriors_MAC(CN_features[["bpchrarm"]],all_components[["bpchrarm"]],"bpchrarm")) +# #} +# rownames(full_mat)<-unique(CN_features[["segsize"]][,1]) +# full_mat[is.na(full_mat)]<-0 +# 
list(method=method,sampleByComponent=full_mat,model=all_components) +# } +# +# calculateSumOfPosteriors_MAC<-function(CN_feature,components,name, rowIter = 1000, cores = 1) +# { +# # if(cores > 1){ +# # require(foreach) +# # require(doMC) +# # len = dim(CN_feature)[1] +# # iters = floor( len / rowIter ) +# # lastiter = iters[length(iters)] +# # registerDoMC(cores) +# # curr_posterior = foreach( i=0:iters, .combine=rbind) %dopar% { +# # start = i*rowIter+1 +# # if(i != lastiter) { end = (i+1)*rowIter } else { end = len } +# # flexmix::posterior(components,data.frame(dat=as.numeric(CN_feature[start:end,2]))) +# # } +# # } else { +# curr_posterior<-flexmix::posterior(components,data.frame(dat=as.numeric(CN_feature[,2]))) +# #} +# mat<-cbind(CN_feature,curr_posterior) +# posterior_sum<-c() +# ## foreach and parallelising doesn't make the following code faster. +# for(i in unique(mat$ID)) +# { +# posterior_sum<-rbind(posterior_sum,colSums(mat[mat$ID==i,c(-1,-2)])) +# } +# params<-flexmix::parameters(components) +# if(!is.null(nrow(params))) +# { +# posterior_sum<-posterior_sum[,order(params[1,])] +# } +# else +# { +# posterior_sum<-posterior_sum[,order(params)] +# } +# colnames(posterior_sum)<-paste0(name,1:ncol(posterior_sum)) +# rownames(posterior_sum)<-rownames(unique(mat$ID)) +# posterior_sum +# } diff --git a/R/checkChromosomeFormat.R b/R/checkChromosomeFormat.R new file mode 100644 index 0000000..43e7abb --- /dev/null +++ b/R/checkChromosomeFormat.R @@ -0,0 +1,3 @@ +checkChromosomeFormat <- function(){ + +} diff --git a/R/checkSegValRounding.R b/R/checkSegValRounding.R new file mode 100644 index 0000000..5c2f822 --- /dev/null +++ b/R/checkSegValRounding.R @@ -0,0 +1,8 @@ +checkSegValRounding <- function(x){ + x <- as.numeric(x) + if(all(x == round(x,digits = 0))){ + return(TRUE) + } else { + return(FALSE) + } +} diff --git a/R/checkbinned.R b/R/checkbinned.R new file mode 100644 index 0000000..1153d37 --- /dev/null +++ b/R/checkbinned.R @@ -0,0 +1,11 @@ 
+checkbinned <- function(segTable){ + t.chr <- unique(segTable$chromosome)[1] + t.end <- segTable$end[segTable$chromosome == t.chr] + t.start <- segTable$start[segTable$chromosome == t.chr] + startend.len <- length(unique(t.end - t.start)) + if(startend.len < 2){ + return(TRUE) + } else { + return(FALSE) + } +} diff --git a/R/clinPredictionDenovo.R b/R/clinPredictionDenovo.R new file mode 100644 index 0000000..cd92b54 --- /dev/null +++ b/R/clinPredictionDenovo.R @@ -0,0 +1,30 @@ +#' @rdname clinPredictionDenovo-methods +#' @aliases clinPredictionDenovo +setMethod("clinPredictionDenovo", + signature=c(object="SigQuant"), + definition=function(object, sampTrain, sigsTrain){ + if(getExperiment(object)@feature.method != "drews"){ + stop("This function is only applicable to objects using drews method.") + } + # Load normalised signature activities + mNorm = object@activities$normAct1 + + # Extract samples for training + if(is.null(sampTrain)) { stop("No sample names supplied.")} + if(is.null(sigsTrain)) { stop("No signature names supplied.")} + if(length(sigsTrain) != 2) { stop("So far only two signatures can be used.")} + mTrain = mNorm[ rownames(mNorm) %in% sampTrain, colnames(mNorm) %in% sigsTrain] + mTest = mNorm[ ! 
rownames(mNorm) %in% sampTrain, colnames(mNorm) %in% sigsTrain] + + # Scale training data and apply the resulting model to the test cohort + scaledTrain = scale(mTrain) + lModel = list(mean = attr(scaledTrain, "scaled:center"), + scale = attr(scaledTrain, "scaled:scale")) + + scaledTest = scaleByModel(mTest, lModel) + + # Do classification + vPred = ifelse(scaledTest[,sigsTrain[1]] >= scaledTest[,sigsTrain[2]], paste("Signature", sigsTrain[1], "higher"), + paste("Signature", sigsTrain[2], "higher")) + return(vPred) +}) diff --git a/R/clinPredictionPlatinum.R b/R/clinPredictionPlatinum.R new file mode 100644 index 0000000..81ade7b --- /dev/null +++ b/R/clinPredictionPlatinum.R @@ -0,0 +1,19 @@ +#' @rdname clinPredictionPlatinum-methods +#' @aliases clinPredictionPlatinum +setMethod("clinPredictionPlatinum", + signature=c(object="SigQuant"), + definition=function(object){ + if(getExperiment(object)@feature.method != "drews"){ + stop("This function is only applicable to objects using drews method.") + } + # Load normalised signature activities + mNorm = object@activities$normAct1 + + # Load and apply gBRCA1 scaling vars + lModel = get(data("Drews2022_CX3CX2_Clinical_classifier",envir = environment())) + mNormGBRCA1 = scaleByModel(mNorm[,names(lModel$mean)], lModel) + + # Do classification + vPred = ifelse(mNormGBRCA1[,"CX3"] >= mNormGBRCA1[,"CX2"], "Predicted sensitive", "Predicted resistant") + return(vPred) +}) diff --git a/R/createCNQuant.R b/R/createCNQuant.R new file mode 100644 index 0000000..0e1d2cd --- /dev/null +++ b/R/createCNQuant.R @@ -0,0 +1,87 @@ +#' createCNQuant +#' +#' @param data Unrounded absolute copy number data +#' @param experimentName A name for the experiment (default: defaultExperiment) +#' @param build A genome build specified as either hg19 or hg38 (default: hg19) +#' +#' @return A CNQuant class object +#' @export createCNQuant +#' +createCNQuant <- function(data=NULL,experimentName = "defaultExperiment",build = "hg19"){ + if(is.null(data)){ + stop("no data provided\n") + } + 
if(is.character(data)){ + if(!file.exists(data)){ + stop("File not found\n") + } + if(file.exists(data)){ + header <- colnames(data.table::fread(input = data, + header = TRUE, + colClasses = c("character","numeric","numeric","numeric","character"), + nrows = 1)) + if(!all(c("chromosome","start","end","segVal","sample") %in% header)){ + stop("Header does not match the required naming") + } + segTable <- data.table::fread(input = data, + header = TRUE, + colClasses = c("character","numeric","numeric","numeric","character")) + if(checkSegValRounding(segTable$segVal)){ + warning("segVal appears to be rounded, copy number signatures require unrounded absolute copy numbers") + } + if(checkbinned(segTable)){ + #segTable <- getSegTable() + # + # Not implemented + # + #split(segTable,f = as.factor(segTable$sample)) + } else { + segTable <- split(segTable,f = as.factor(segTable$sample)) + } + samplefeatData <- generateSampleFeatData(x = segTable) + methods::new("CNQuant",segments = segTable,samplefeatData = samplefeatData, + ExpData = methods::new("ExpQuant", + build = build, + samples.full = length(segTable), + samples.current = length(segTable), + experimentName = experimentName)) + } + } else if("QDNAseqCopyNumbers" %in% class(data)){ + segTable <- getSegTable(x = data) + if(checkSegValRounding(segTable$segVal)){ + warning("segVal appears to be rounded, copy number signatures require unrounded absolute copy numbers") + } + segTable <- split(segTable,f = as.factor(segTable$sample)) + samplefeatData <- generateSampleFeatData(x = segTable) + methods::new("CNQuant",segments = segTable,samplefeatData = samplefeatData, + ExpData = methods::new("ExpQuant", + build = build, + samples.full = length(segTable), + samples.current = length(segTable), + experimentName = experimentName)) + } else if(is.data.frame(data)){ + header <- colnames(data) + if(!all(c("chromosome","start","end","segVal","sample") %in% header)){ + stop("Header does not match the required naming") + } + segTable <- data + 
if(checkSegValRounding(segTable$segVal)){ + warning("segVal appears to be rounded, copy number signatures require unrounded absolute copy numbers") + } + if(checkbinned(segTable)){ + #segTable <- getSegTable() + #split(segTable,f = as.factor(segTable$sample)) + } else { + segTable <- split(segTable,f = as.factor(segTable$sample)) + } + samplefeatData <- generateSampleFeatData(x = segTable) + methods::new("CNQuant",segments=segTable,samplefeatData = samplefeatData, + ExpData = methods::new("ExpQuant", + build = build, + samples.full = length(segTable), + samples.current = length(segTable) + ,experimentName = experimentName)) + } else { + stop("Unknown input format\n") + } +} diff --git a/R/data.R b/R/data.R new file mode 100644 index 0000000..b9589ed --- /dev/null +++ b/R/data.R @@ -0,0 +1,114 @@ +#' gap_hg19 +#' +#' Chromosomal banding and position of genomic features in genome build hg19 +#' +#' @docType data +#' @keywords datasets +#' @name gap_hg19 +#' @usage data(gap_hg19) +#' @format A data frame with 457 rows and 9 variables +NULL + +#' hg19.chrom.sizes +#' +#' Chromosomal lengths for genome build hg19 +#' +#' @docType data +#' @keywords datasets +#' @name hg19.chrom.sizes +#' @usage data(hg19.chrom.sizes) +#' @format A data frame with 24 rows and 2 variables +NULL + +#' Drews2022_CX3CX2_Clinical_classifier +#' +#' List of mean and scaling factors for signatures 2 and 3 for the prediction of +#' platinum status used in Drews 2022 +#' +#' @docType data +#' @keywords datasets +#' @name Drews2022_CX3CX2_Clinical_classifier +#' @usage data(Drews2022_CX3CX2_Clinical_classifier) +#' @format A list of 2 numeric vectors of length 2 +NULL + +#' Drews2022_TCGA_Mixture_Models +#' +#' List of mixture model mean, standard deviation, and weight for each copy number +#' feature used in Drews 2022 +#' +#' @docType data +#' @keywords datasets +#' @name Drews2022_TCGA_Mixture_Models +#' @usage data(Drews2022_TCGA_Mixture_Models) +#' @format A list of 5 data.frames +NULL + +#' 
Drews2022_TCGA_Scaling_Variables +#' +#' List of mean and scaling factors for signatures used in Drews 2022 +#' +#' @docType data +#' @keywords datasets +#' @name Drews2022_TCGA_Scaling_Variables +#' @usage data(Drews2022_TCGA_Scaling_Variables) +#' @format A list of 2 numeric vectors of length 17 +NULL + +#' Drews2022_TCGA_Signature_Thresholds +#' +#' Numeric vector containing signature-specific thresholds used in activity +#' calculations used in Drews 2022 +#' +#' @docType data +#' @keywords datasets +#' @name Drews2022_TCGA_Signature_Thresholds +#' @usage data(Drews2022_TCGA_Signature_Thresholds) +#' @format A numeric vector of length 17 +NULL + +#' Drews2022_TCGA_Signatures +#' +#' Signature-by-component matrix for 17 derived signatures as used in Drews 2022 +#' +#' @docType data +#' @keywords datasets +#' @name Drews2022_TCGA_Signatures +#' @usage data(Drews2022_TCGA_Signatures) +#' @format A 17 by 43 numeric matrix +NULL + +#' Macintyre2018_OV_Mixture_Models +#' +#' List of mixture model mean, standard deviation, and weight for each copy number +#' feature used in Macintyre 2018 +#' +#' @docType data +#' @keywords datasets +#' @name Macintyre2018_OV_Mixture_Models +#' @usage data(Macintyre2018_OV_Mixture_Models) +#' @format A list of 6 data.frames +NULL + +#' Macintyre2018_OV_Signatures +#' +#' Signature-by-component matrix for 7 derived signatures as used in Macintyre 2018 +#' +#' @docType data +#' @keywords datasets +#' @name Macintyre2018_OV_Signatures +#' @usage data(Macintyre2018_OV_Signatures) +#' @format A 7 by 36 numeric matrix +NULL + +#' Macintyre2018_OV_Signatures_normalised +#' +#' Signature-by-component matrix for 7 derived signatures as used in Macintyre 2018 +#' where column sums are normalised to 1. 
+#' +#' @docType data +#' @keywords datasets +#' @name Macintyre2018_OV_Signatures_normalised +#' @usage data(Macintyre2018_OV_Signatures_normalised) +#' @format A 7 by 36 numeric matrix +NULL diff --git a/R/extractCopynumberFeaturesDrews.R b/R/extractCopynumberFeaturesDrews.R new file mode 100644 index 0000000..19b1a67 --- /dev/null +++ b/R/extractCopynumberFeaturesDrews.R @@ -0,0 +1,66 @@ +extractCopynumberFeaturesDrews = function(CN_data, cores = 1, allowedError = 0.1, rmNorm = FALSE) { + + # Get chromosome length and centromere locations + #chrlen <- get(load("data/hg19.chrom.sizes.rda")) + chrlen <- get(data("hg19.chrom.sizes",envir = environment())) + #gaps <- get(load("data/gap_hg19.rda")) + gaps <- get(data("gap_hg19",envir = environment())) + centromeres = gaps[gaps[,8]=="centromere",] + + if(cores > 1) { + if (!requireNamespace("doParallel", quietly = TRUE)) { + stop( + "Package \"doParallel\" must be installed to use multiple threads/cores.", + call. = FALSE + ) + } + # Multi-core usage + `%dopar%` <- foreach::`%dopar%` + doParallel::registerDoParallel(cores) + i <- NULL + temp_list = foreach::foreach(i=1:6) %dopar% { + if(i == 1){ + list(segsize = getSegsizeDrews(CN_data, rmNorm = rmNorm) ) + } else if (i == 2) { + list(bp10MB = getBPnumDrews(CN_data,chrlen) ) + } else if (i == 3) { + list(osCN = getOscillationDrews(CN_data,chrlen) ) + } else if (i == 4) { + list(bpchrarm = getCentromereDistCountsDrews(CN_data,centromeres,chrlen) ) + } else if (i == 5) { + list(changepoint = getChangepointCNDrews(CN_data, allowedError, rmNorm = rmNorm) ) + } else { + # Technically not needed but kept for backwards compatibility of the code + list(copynumber = getCNDrews(CN_data, rmNorm = rmNorm) ) + } + } + doParallel::stopImplicitCluster() + + # Another failsafe that the outcome is definitely numeric + temp_list = unlist( temp_list, recursive = FALSE ) + outList = lapply(temp_list, function(thisDF) { + thisDF[,2] = as.numeric(thisDF[,2]) + return(thisDF) + }) + return( 
outList ) + + } else { + # Single core usage + segsize<-getSegsizeDrews(CN_data, rmNorm = rmNorm) + bp10MB<-getBPnumDrews(CN_data,chrlen) + osCN<-getOscillationDrews(CN_data,chrlen) + bpchrarm<-getCentromereDistCountsDrews(CN_data,centromeres,chrlen) + changepoint<-getChangepointCNDrews(CN_data, allowedError, rmNorm = rmNorm) + copynumber<-getCNDrews(CN_data, rmNorm = rmNorm) + + temp_list = list(segsize=segsize,bp10MB=bp10MB,osCN=osCN,bpchrarm=bpchrarm,changepoint=changepoint,copynumber=copynumber) + #temp_list = unlist( temp_list, recursive = FALSE ) + outList = lapply(temp_list, function(thisDF) { + thisDF[,2] = as.numeric(thisDF[,2]) + return(thisDF) + }) + return( outList ) + + } + +} diff --git a/R/extractCopynumberFeaturesMac.R b/R/extractCopynumberFeaturesMac.R new file mode 100644 index 0000000..f810b84 --- /dev/null +++ b/R/extractCopynumberFeaturesMac.R @@ -0,0 +1,51 @@ +extractCopynumberFeaturesMac <- function(CN_data,cores = 1){ + #chrlen <- get(load("data/hg19.chrom.sizes.rda")) + chrlen <- get(data("hg19.chrom.sizes",envir = environment())) + #gaps <- get(load("data/gap_hg19.rda")) + gaps <- get(data("gap_hg19",envir = environment())) + centromeres <- gaps[gaps[,8]=="centromere",] + if(cores > 1) { + if (!requireNamespace("doParallel", quietly = TRUE)) { + stop( + "Package \"doParallel\" must be installed to use multiple threads/cores.", + call. 
= FALSE + ) + } + # Multi-core usage + `%dopar%` <- foreach::`%dopar%` + doParallel::registerDoParallel(cores) + i <- NULL + temp_list = foreach::foreach(i=1:6) %dopar% { + if(i == 1){ + list(segsize = getSegsizeMac(CN_data) ) + } else if (i == 2) { + list(bp10MB = getBPnumMac(CN_data,chrlen) ) + } else if (i == 3) { + list(osCN = getOscillationMac(CN_data,chrlen) ) + } else if (i == 4) { + list(bpchrarm = getCentromereDistCountsMac(CN_data,centromeres,chrlen) ) + } else if (i == 5) { + list(changepoint = getChangepointCNMac(CN_data) ) + } else { + list(copynumber = getCNMac(CN_data) ) + } + + } + doParallel::stopImplicitCluster() + unlist( temp_list, recursive = FALSE ) + } else { + segsize <- getSegsizeMac(CN_data) + bp10MB <- getBPnumMac(CN_data,chrlen) + osCN <- getOscillationMac(CN_data,chrlen) + bpchrarm <- getCentromereDistCountsMac(CN_data,centromeres,chrlen) + changepoint <- getChangepointCNMac(CN_data) + copynumber <- getCNMac(CN_data) + + list(segsize=segsize, + bp10MB=bp10MB, + osCN=osCN, + bpchrarm=bpchrarm, + changepoint=changepoint, + copynumber=copynumber) + } +} diff --git a/R/generateSampleFeatData.R b/R/generateSampleFeatData.R new file mode 100644 index 0000000..1ff6f2b --- /dev/null +++ b/R/generateSampleFeatData.R @@ -0,0 +1,8 @@ +generateSampleFeatData <- function(x){ + segCounts <- getSegCounts(x) + ploidy <- getPloidyfeat(x) + featData <- data.frame( + segCounts = segCounts, + ploidy = ploidy) + return(featData) +} diff --git a/R/getBPnumDrews.R b/R/getBPnumDrews.R new file mode 100644 index 0000000..cd332e2 --- /dev/null +++ b/R/getBPnumDrews.R @@ -0,0 +1,30 @@ +getBPnumDrews = function(abs_profiles,chrlen, SIZE = 10000000) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + + # Loop over samples + for(i in samps) { + # Retrieve segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + # Loop over chromosomes and identify breaks + chrs = unique(segTab$chromosome) + allBPnum = c() + for(c in chrs) { + 
currseg = segTab[segTab$chromosome == c,] + intervals = seq(1, chrlen[chrlen[,1] == paste0("chr",c),2]+SIZE, SIZE) + res = graphics::hist(as.numeric(currseg$end[-nrow(currseg)]),breaks=intervals,plot=FALSE)$counts + allBPnum = c(allBPnum,res) + } + # Make sure it's really numeric + out = rbind(out, cbind(ID = rep(i,length(allBPnum)), + value = as.numeric(allBPnum))) + } + + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) +} diff --git a/R/getBPnumMac.R b/R/getBPnumMac.R new file mode 100644 index 0000000..8f5b220 --- /dev/null +++ b/R/getBPnumMac.R @@ -0,0 +1,21 @@ +getBPnumMac <- function(abs_profiles,chrlen){ + out <- c() + samps <- names(abs_profiles) + for(i in samps) + { + segTab <- abs_profiles[[i]] + colnames(segTab)[4] <- "segVal" + chrs <- unique(segTab$chromosome) + allBPnum <- c() + for(c in chrs) + { + currseg <- segTab[segTab$chromosome==c,] + intervals <- seq(1,chrlen[chrlen[,1]==paste0("chr",c),2]+10000000,10000000) + res <- graphics::hist(as.numeric(currseg$end[-nrow(currseg)]),breaks=intervals,plot=FALSE)$counts + allBPnum <- c(allBPnum,res) + } + out <- rbind(out,cbind(ID=rep(i,length(allBPnum)),value=allBPnum)) + } + rownames(out) <- NULL + data.frame(out,stringsAsFactors = FALSE) +} diff --git a/R/getCNDrews.R b/R/getCNDrews.R new file mode 100644 index 0000000..d63c45f --- /dev/null +++ b/R/getCNDrews.R @@ -0,0 +1,23 @@ +getCNDrews = function(abs_profiles, rmNorm = FALSE) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + # Loop over samples + for(i in samps) { + + # Retrieve segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + segTab$segVal[as.numeric(segTab$segVal)<0] = 0 + # If requested, exclude normal (copy number 2) segments + if(rmNorm) { segTab = segTab[ segTab$segVal != 2, ] } + cn = as.numeric(segTab$segVal) + # Make sure it's really numeric. 
+ out = rbind(out,cbind(ID=rep(i,length(cn)),value=as.numeric(cn))) + } + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) +} diff --git a/R/getCNMac.R b/R/getCNMac.R new file mode 100644 index 0000000..d4399a7 --- /dev/null +++ b/R/getCNMac.R @@ -0,0 +1,21 @@ +getCNMac<-function(abs_profiles){ + out<-c() + samps<-names(abs_profiles) + for(i in samps) + { + if(class(abs_profiles)=="QDNAseqCopyNumbers") + { + segTab<-getSegTable(abs_profiles[,which(colnames(abs_profiles)==i)]) + } + else + { + segTab<-abs_profiles[[i]] + colnames(segTab)[4]<-"segVal" + } + segTab$segVal[as.numeric(segTab$segVal)<0]<-0 + cn<-as.numeric(segTab$segVal) + out<-rbind(out,cbind(ID=rep(i,length(cn)),value=cn)) + } + rownames(out)<-NULL + data.frame(out,stringsAsFactors = F) +} diff --git a/R/getCentromereDistCountsDrews.R b/R/getCentromereDistCountsDrews.R new file mode 100644 index 0000000..5d4e864 --- /dev/null +++ b/R/getCentromereDistCountsDrews.R @@ -0,0 +1,46 @@ +getCentromereDistCountsDrews = function(abs_profiles,centromeres,chrlen) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + # Loop over samples + for(i in samps) { + + # Retrieve segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + # Loop over chromosomes + chrs = unique(segTab$chromosome) + all_dists = c() + for(c in chrs) { + if(nrow(segTab) > 1) { + starts = as.numeric(segTab$start[segTab$chromosome==c])[-1] + segstart = as.numeric(segTab$start[segTab$chromosome==c])[1] + ends = as.numeric(segTab$end[segTab$chromosome==c]) + segend = ends[length(ends)] + ends = ends[-length(ends)] + centstart = as.numeric(centromeres[substr(centromeres[,2],4,5)==c,3]) + centend = as.numeric(centromeres[substr(centromeres[,2],4,5)==c,4]) + chrend = chrlen[substr(chrlen[,1],4,5)==c,2] + ndist = cbind(rep(NA,length(starts)),rep(NA,length(starts))) + ndist[starts<=centstart,1] = (centstart-starts[starts<=centstart])/(centstart-segstart)*-1 + 
ndist[starts>=centend,1] = (starts[starts>=centend]-centend)/(segend-centend) + ndist[ends<=centstart,2] = (centstart-ends[ends<=centstart])/(centstart-segstart)*-1 + ndist[ends>=centend,2] = (ends[ends>=centend]-centend)/(segend-centend) + ndist = apply(ndist,1,min) + + all_dists = rbind(all_dists,sum(ndist>0)) + all_dists = rbind(all_dists,sum(ndist<=0)) + } + } + if(nrow(all_dists)>0) { + # Make sure it's really numeric + out = rbind(out,cbind(ID=i,ct1=as.numeric(all_dists[,1]))) + } + } + + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) +} diff --git a/R/getCentromereDistCountsMac.R b/R/getCentromereDistCountsMac.R new file mode 100644 index 0000000..8219bbf --- /dev/null +++ b/R/getCentromereDistCountsMac.R @@ -0,0 +1,46 @@ +getCentromereDistCountsMac<-function(abs_profiles,centromeres,chrlen){ + out<-c() + samps<-names(abs_profiles) + for(i in samps) + { + if(class(abs_profiles)=="QDNAseqCopyNumbers") + { + segTab<-getSegTable(abs_profiles[,which(colnames(abs_profiles)==i)]) + }else + { + segTab<-abs_profiles[[i]] + colnames(segTab)[4]<-"segVal" + } + chrs<-unique(segTab$chromosome) + all_dists<-c() + for(c in chrs) + { + if(nrow(segTab)>1) + { + starts<-as.numeric(segTab$start[segTab$chromosome==c])[-1] + segstart<-as.numeric(segTab$start[segTab$chromosome==c])[1] + ends<-as.numeric(segTab$end[segTab$chromosome==c]) + segend<-ends[length(ends)] + ends<-ends[-length(ends)] + centstart<-as.numeric(centromeres[substr(centromeres[,2],4,5)==c,3]) + centend<-as.numeric(centromeres[substr(centromeres[,2],4,5)==c,4]) + chrend<-chrlen[substr(chrlen[,1],4,5)==c,2] + ndist<-cbind(rep(NA,length(starts)),rep(NA,length(starts))) + ndist[starts<=centstart,1]<-(centstart-starts[starts<=centstart])/(centstart-segstart)*-1 + ndist[starts>=centend,1]<-(starts[starts>=centend]-centend)/(segend-centend) + ndist[ends<=centstart,2]<-(centstart-ends[ends<=centstart])/(centstart-segstart)*-1 + 
ndist[ends>=centend,2]<-(ends[ends>=centend]-centend)/(segend-centend) + ndist<-apply(ndist,1,min) + + all_dists<-rbind(all_dists,sum(ndist>0)) + all_dists<-rbind(all_dists,sum(ndist<=0)) + } + } + if(nrow(all_dists)>0) + { + out<-rbind(out,cbind(ID=i,ct1=all_dists[,1])) + } + } + rownames(out)<-NULL + data.frame(out,stringsAsFactors = F) +} diff --git a/R/getChangepointCNDrews.R b/R/getChangepointCNDrews.R new file mode 100644 index 0000000..867b12c --- /dev/null +++ b/R/getChangepointCNDrews.R @@ -0,0 +1,46 @@ +getChangepointCNDrews = function(abs_profiles, allowedError = 0.1, rmNorm = FALSE) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + # Loop over samples + for(i in samps) { + + # Retrieve segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + # Initiate and prepare looping over chromosomes + segTab$segVal = as.numeric(segTab$segVal) + segTab$segVal[segTab$segVal<0] = 0 + chrs = unique(segTab$chromosome) + allcp = c() + + # Loop over chromosomes + for(c in chrs) { + currseg = as.numeric(segTab$segVal[segTab$chromosome==c]) + firstSeg = abs(2 - currseg[1] ) + # As we look only at the left end of a CNA, we might miss a changepoint at the beginning of the p-arm + # That's why we check manually but only regard this value if it is higher than an allowed error rate. 
+ if(firstSeg <= allowedError) { + theseChanges = abs(currseg[-1]-currseg[-length(currseg)]) + if(rmNorm) { theseChanges = theseChanges[ currseg[-1] != 2 ] } + allcp = c(allcp, theseChanges) + } else { + theseChanges = c( firstSeg, abs(currseg[-1]-currseg[-length(currseg)]) ) + if(rmNorm) { theseChanges = theseChanges[ currseg != 2 ] } + allcp = c(allcp, theseChanges) + } + + } + if(length(allcp)==0) { + allcp = 0 #if there are no changepoints + } + # Make sure it's really numeric + out = rbind(out,cbind(ID=rep(i,length(allcp)),value=as.numeric(allcp))) + } + + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) +} diff --git a/R/getChangepointCNMac.R b/R/getChangepointCNMac.R new file mode 100644 index 0000000..1f09aef --- /dev/null +++ b/R/getChangepointCNMac.R @@ -0,0 +1,31 @@ +getChangepointCNMac<-function(abs_profiles){ + out<-c() + samps<-names(abs_profiles) + for(i in samps) + { + if(class(abs_profiles)=="QDNAseqCopyNumbers") + { + segTab<-getSegTable(abs_profiles[,which(colnames(abs_profiles)==i)]) + } + else + { + segTab<-abs_profiles[[i]] + colnames(segTab)[4]<-"segVal" + } + segTab$segVal[as.numeric(segTab$segVal)<0]<-0 + chrs<-unique(segTab$chromosome) + allcp<-c() + for(c in chrs) + { + currseg<-as.numeric(segTab$segVal[segTab$chromosome==c]) + allcp<-c(allcp,abs(currseg[-1]-currseg[-length(currseg)])) + } + if(length(allcp)==0) + { + allcp<-0 #if there are no changepoints + } + out<-rbind(out,cbind(ID=rep(i,length(allcp)),value=allcp)) + } + rownames(out)<-NULL + data.frame(out,stringsAsFactors = F) +} diff --git a/R/getExperiment.R b/R/getExperiment.R new file mode 100644 index 0000000..0cadc16 --- /dev/null +++ b/R/getExperiment.R @@ -0,0 +1,5 @@ +#' @rdname getExperiment-methods +#' @aliases getExperiment +setMethod("getExperiment",signature = "CNQuant",function(object){ + object@ExpData +}) diff --git a/R/getOscillationDrews.R b/R/getOscillationDrews.R new file mode 100644 index 0000000..ef4349b --- /dev/null 
+++ b/R/getOscillationDrews.R @@ -0,0 +1,47 @@ +getOscillationDrews = function(abs_profiles, chrlen) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + # Loop over samples + for(i in samps) { + + # Retrieve segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + # Loop over chromosomes to identify oscillation + chrs = unique(segTab$chromosome) + oscCounts = c() + for(c in chrs) { + + currseg = as.numeric(segTab$segVal[segTab$chromosome == c]) + currseg = round(as.numeric(currseg)) + + # Only take chains into consideration with a length of more than 3 elements + if(length(currseg)>3) { + prevval = currseg[1] + count = 0 + for(j in 3:length(currseg)) { + if(currseg[j] == prevval & currseg[j] != currseg[j-1]) { + count = count+1 + } else { + oscCounts = c(oscCounts,count) + count = 0 + } + prevval = currseg[j-1] + } + } + } + # Make sure it's really numeric + out = rbind(out, cbind(ID = rep(i,length(oscCounts)), + value = as.numeric(oscCounts))) + if(length(oscCounts) == 0) { + out = rbind(out,cbind(ID = i, value = 0)) + } + } + + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) +} diff --git a/R/getOscillationMac.R b/R/getOscillationMac.R new file mode 100644 index 0000000..0cc1224 --- /dev/null +++ b/R/getOscillationMac.R @@ -0,0 +1,45 @@ +getOscillationMac<-function(abs_profiles,chrlen){ + out<-c() + samps<-names(abs_profiles) + for(i in samps) + { + if(class(abs_profiles)=="QDNAseqCopyNumbers") + { + segTab<-getSegTable(abs_profiles[,which(colnames(abs_profiles)==i)]) + }else + { + segTab<-abs_profiles[[i]] + colnames(segTab)[4]<-"segVal" + } + chrs<-unique(segTab$chromosome) + oscCounts<-c() + for(c in chrs) + { + currseg<-segTab$segVal[segTab$chromosome==c] + currseg<-round(as.numeric(currseg)) + if(length(currseg)>3) + { + prevval<-currseg[1] + count=0 + for(j in 3:length(currseg)) + { + if(currseg[j]==prevval&currseg[j]!=currseg[j-1]) + { + count<-count+1 + }else{ + 
oscCounts<-c(oscCounts,count) + count=0 + } + prevval<-currseg[j-1] + } + } + } + out<-rbind(out,cbind(ID=rep(i,length(oscCounts)),value=oscCounts)) + if(length(oscCounts)==0) + { + out<-rbind(out,cbind(ID=i,value=0)) + } + } + rownames(out)<-NULL + data.frame(out,stringsAsFactors = F) +} diff --git a/R/getPloidyfeat.R b/R/getPloidyfeat.R new file mode 100644 index 0000000..2cd8150 --- /dev/null +++ b/R/getPloidyfeat.R @@ -0,0 +1,8 @@ +getPloidyfeat <- function(x){ + featploidy <- unlist(lapply(x,FUN = function(y){ + segLen<-(as.numeric(y$end)-as.numeric(y$start)) + ploidy<-sum((segLen/sum(segLen))*as.numeric(y$segVal)) + })) + featploidy <- round(featploidy,digits = 3) + return(featploidy) +} diff --git a/R/getSampleByComponent.R b/R/getSampleByComponent.R new file mode 100644 index 0000000..ebce6d8 --- /dev/null +++ b/R/getSampleByComponent.R @@ -0,0 +1,5 @@ +#' @rdname getSampleByComponent-methods +#' @aliases getSampleByComponent +setMethod("getSampleByComponent",signature = "CNQuant",function(object){ + object@featFitting$sampleByComponent +}) diff --git a/R/getSamplefeatures.R b/R/getSamplefeatures.R new file mode 100644 index 0000000..ffa0dfd --- /dev/null +++ b/R/getSamplefeatures.R @@ -0,0 +1,5 @@ +#' @rdname getSamplefeatures-methods +#' @aliases getSamplefeatures +setMethod("getSamplefeatures",signature = "CNQuant",function(object){ + object@samplefeatData +}) diff --git a/R/getSamples.R b/R/getSamples.R new file mode 100644 index 0000000..02264f5 --- /dev/null +++ b/R/getSamples.R @@ -0,0 +1,5 @@ +#' @rdname getSamples-methods +#' @aliases getSamples +setMethod("getSamples",signature = "CNQuant",function(object){ + rownames(object@samplefeatData) +}) diff --git a/R/getSegCounts.R b/R/getSegCounts.R new file mode 100644 index 0000000..196ba41 --- /dev/null +++ b/R/getSegCounts.R @@ -0,0 +1,4 @@ +getSegCounts <- function(x){ + segCounts <- unlist(lapply(x,nrow)) + return(segCounts) +} diff --git a/R/getSegTable.R b/R/getSegTable.R new file mode 100644 
index 0000000..e8d1ea7 --- /dev/null +++ b/R/getSegTable.R @@ -0,0 +1,33 @@ +getSegTable<-function(x){ + if(class(x)=="QDNAseqCopyNumbers"){ + sn<-Biobase::assayDataElement(x,"segmented") + fd <- Biobase::fData(x) + fd$use -> use + fdfiltfull<-fd[use,] + sn<-sn[use,] + segTable<-c() + for(s in colnames(sn)){ + for(c in unique(fdfiltfull$chromosome)) + { + snfilt<-sn[fdfiltfull$chromosome==c,colnames(sn) == s] + fdfilt<-fdfiltfull[fdfiltfull$chromosome==c,] + sn.rle<-rle(snfilt) + starts <- cumsum(c(1, sn.rle$lengths[-length(sn.rle$lengths)])) + ends <- cumsum(sn.rle$lengths) + lapply(1:length(sn.rle$lengths), function(s) { + from <- fdfilt$start[starts[s]] + to <- fdfilt$end[ends[s]] + segValue <- sn.rle$value[s] + c(fdfilt$chromosome[starts[s]], from, to, segValue) + }) -> segtmp + segTableRaw <- data.frame(matrix(unlist(segtmp), ncol=4, byrow=T),sample = rep(s,times=nrow(matrix(unlist(segtmp), ncol=4, byrow=T))),stringsAsFactors=F) + segTable<-rbind(segTable,segTableRaw) + } + } + colnames(segTable) <- c("chromosome", "start", "end", "segVal","sample") + return(segTable) + } else { + # NON QDNASEQ BINNED DATA FUNCTION + + } +} diff --git a/R/getSegments.R b/R/getSegments.R new file mode 100644 index 0000000..67c3764 --- /dev/null +++ b/R/getSegments.R @@ -0,0 +1,6 @@ +#' @rdname getSegments-methods +#' @aliases getSegments +setMethod("getSegments",signature = "CNQuant",function(object){ + segTable <- do.call(rbind,object@segments) + data.frame(segTable,row.names = NULL) +}) diff --git a/R/getSegsizeDrews.R b/R/getSegsizeDrews.R new file mode 100644 index 0000000..4087ad4 --- /dev/null +++ b/R/getSegsizeDrews.R @@ -0,0 +1,33 @@ +getSegsizeDrews = function(abs_profiles, rmNorm = FALSE) { + + # Prepare looping + out = c() + samps = names(abs_profiles) + + # Loop over samples + for(i in samps) { + + # Get segments + segTab = abs_profiles[[i]] + colnames(segTab)[4] = "segVal" + + # Make sure segment values are numeric + segTab$segVal = as.numeric(segTab$segVal) + + # 
If wished, don't consider normal segments + if(rmNorm) { segTab = segTab[ segTab$segVal != 2, ] } + + # Avoiding potential artefact + segTab$segVal[segTab$segVal<0] = 0 + seglen = segTab$end-segTab$start + seglen = seglen[seglen>0] + + # Double tap: make sure it's really numeric + out = rbind(out,cbind(ID=rep(i,length(seglen)),value=as.numeric(seglen))) + } + + # Prepare return + rownames(out) = NULL + return(data.frame(out,stringsAsFactors = FALSE)) + +} diff --git a/R/getSegsizeMac.R b/R/getSegsizeMac.R new file mode 100644 index 0000000..3267756 --- /dev/null +++ b/R/getSegsizeMac.R @@ -0,0 +1,22 @@ +getSegsizeMac<-function(abs_profiles){ + out <- c() + samps <- names(abs_profiles) + for(i in samps) + { + if(class(abs_profiles)=="QDNAseqCopyNumbers") + { + segTab <- getSegTable(abs_profiles[,which(colnames(abs_profiles)==i)]) + } + else + { + segTab <- abs_profiles[[i]] + colnames(segTab)[4] <- "segVal" + } + segTab$segVal[as.numeric(segTab$segVal)<0]<-0 + seglen <- (as.numeric(segTab$end)-as.numeric(segTab$start)) + seglen <- seglen[seglen>0] + out <- rbind(out,cbind(ID=rep(i,length(seglen)),value=seglen)) + } + rownames(out) <- NULL + data.frame(out,stringsAsFactors = F) +} diff --git a/R/idSmoothingTargets.R b/R/idSmoothingTargets.R new file mode 100644 index 0000000..26fa9dd --- /dev/null +++ b/R/idSmoothingTargets.R @@ -0,0 +1,23 @@ +# Function for identifying neighbouring segments that have the same copy number and should be merged into one segment +idSmoothingTargets = function(dfAllSegs, WIGGLE, colNameSegVal, colNameChr, IGNOREDELS = TRUE) { + + ### Check column name + testSegVal = dfAllSegs[[colNameSegVal]][1] + testChr = dfAllSegs[[colNameChr]][1] + + # Quick sanity checks + if(! is.numeric(testSegVal)) { stop("Segment Value column has no numeric value in it. Supplied correct column name? Forgot conversion?")} + if(is.null(testChr)) { stop("Chromosome column is missing or empty. 
Supplied correct column name?")} + + # Take differences to segment down below + dfAllSegs$diffs = c( abs( dfAllSegs[[colNameSegVal]][1:(nrow(dfAllSegs)-1)] - dfAllSegs[[colNameSegVal]][2:nrow(dfAllSegs)] ), WIGGLE+1) + # Set TRUE if difference to next segment is smaller than the user supplied cutoff + dfAllSegs$smooth = dfAllSegs$diffs <= WIGGLE + # Set all segments which are last in a chromosome to FALSE. This also prevents leaking to other samples and cohorts. + dfAllSegs$smooth[ cumsum( rle(as.character(dfAllSegs[[colNameChr]]))$lengths ) ] = FALSE + + # Ignore deletions if wished + if(IGNOREDELS) { dfAllSegs$smooth[ dfAllSegs[[colNameSegVal]] == 0 ] = FALSE } + + return( dfAllSegs ) +} diff --git a/R/methods-CNQuant.R b/R/methods-CNQuant.R new file mode 100644 index 0000000..2d779db --- /dev/null +++ b/R/methods-CNQuant.R @@ -0,0 +1,94 @@ +setMethod("show", signature=c(object="CNQuant"), + definition=function(object){ + cat(class(object)," object (initialised: ",object@ExpData@init.date,")\n\n",sep = "") + cat("Experiment name:",object@ExpData@experimentName,"\n") + cat("Segmented samples: ",object@ExpData@samples.current," (On init: ",object@ExpData@samples.full,")\n",sep = "") + cat("CN feature data: ") + if(length(object@featData) == 0){ + cat("\n\tno data\n") + } else { + cat("\n\tcn feature count: ",length(object@featData),"\n",sep = "") + cat("\tcn features: ",paste0(names(object@featData),collapse = ","),"\n",sep = "") + } + cat("Feature fitting: ") + if(length(object@featFitting) == 0){ + cat("\n\tno data\n") + } else { + cat("\n\tsampleByComponent dim: ",dim(object@featFitting$sampleByComponent)[1]," x ",dim(object@featFitting$sampleByComponent)[2],"\n",sep = "") + cat("\tfitting method: ",object@featFitting$method,"\n",sep = "") + } + cat("Sample feature data:\n") + cat("\tdim: ",dim(object@samplefeatData)[1]," x ",dim(object@samplefeatData)[2],"\n",sep = "") + cat("\tfeatures: ",paste0(colnames(object@samplefeatData),collapse=","),"\n",sep = "") + 
cat("Experiment:\n") + cat("\tlast modified: ",object@ExpData@last.modified,"\n",sep = "") + cat("\tgenome build: ",object@ExpData@build,"\n",sep = "") + cat("`getExperiment(object)` for more details\n",sep = "") +}) + +#' Extract +#' @param x object to extract subset +#' @param i Indices of subset +#' @param j not used +#' @param ... not used +#' @param drop not used +#' +setMethod("[", signature=c("CNQuant", "numeric", "missing", "ANY"), + definition=function(x, i, j, ..., drop=TRUE){ + segs <- x@segments[i] + samplefeats <- x@samplefeatData[i,] + if(length(x@featData) == 0){ + feats <- x@featData + } else { + subsetSamples <- names(segs) + feats <- subsetCNfeatures(x = x@featData,s = subsetSamples) + } + + if(length(x@featFitting) == 0){ + featsfit <- x@featFitting + } else { + newfeatFitting <- subsetfeatFitting(x = x@featFitting,s = subsetSamples) + featsfit <- newfeatFitting + } + methods::initialize(x, + segments=segs, + featData=feats, + featFitting=featsfit, + samplefeatData=samplefeats, + ExpData = methods::initialize(x@ExpData, + samples.current = length(segs), + last.modified = as.character(Sys.time()))) + }) + +#' Extract +#' @param x object to extract subset +#' @param i Indices of subset +#' @param j not used +#' @param ... 
not used +#' @param drop not used +#' +setMethod("[", signature=c("CNQuant", "character", "missing", "ANY"), + definition=function(x, i, j, ..., drop=TRUE){ + segs <- x@segments[names(x@segments) %in% i] + samplefeats <- x@samplefeatData[rownames(x@samplefeatData) %in% i,] + subsetSamples <- names(segs) + if(length(x@featData) == 0){ + feats <- x@featData + } else { + feats <- subsetCNfeatures(x = x@featData,s = subsetSamples) + } + if(length(x@featFitting) == 0){ + featsfit <- x@featFitting + } else { + newfeatFitting <- subsetfeatFitting(x = x@featFitting,s = subsetSamples) + featsfit <- newfeatFitting + } + methods::initialize(x, + segments=segs, + featData=feats, + featFitting=featsfit, + samplefeatData=samplefeats, + ExpData = methods::initialize(x@ExpData, + samples.current = length(segs), + last.modified = as.character(Sys.time()))) + }) diff --git a/R/methods-ExpQuant.R b/R/methods-ExpQuant.R new file mode 100644 index 0000000..186f140 --- /dev/null +++ b/R/methods-ExpQuant.R @@ -0,0 +1,11 @@ +setMethod("show", signature=c(object="ExpQuant"), + definition=function(object){ + cat(class(object),"object\n") + cat("Experiment name:",object@experimentName,"\n") + cat("Initialisation:",object@init.date,"\n") + cat("Last modified:",object@last.modified,"\n") + cat("Sample count (full):",object@samples.full,"\n") + cat("Sample count (current):",object@samples.current,"\n") + cat("Genome build:",object@build,"\n") + cat("Feature method:",object@feature.method,"\n") + }) diff --git a/R/methods-SigQuant.R b/R/methods-SigQuant.R new file mode 100644 index 0000000..3f510ff --- /dev/null +++ b/R/methods-SigQuant.R @@ -0,0 +1,105 @@ +setMethod("show", signature=c(object="SigQuant"), + definition=function(object){ + cat(class(object)," object (initialised: ",object@ExpData@init.date,")\n\n",sep = "") + cat("Experiment name:",object@ExpData@experimentName,"\n") + cat("Segmented samples: ",object@ExpData@samples.current," (On init: ",object@ExpData@samples.full,")\n",sep = "") + 
cat("CN feature data: ") + if(length(object@featData) == 0){ + cat("\n\tno data\n") + } else { + cat("\n\tcn feature count: ",length(object@featData),"\n",sep = "") + cat("\tcn features: ",paste0(names(object@featData),collapse = ","),"\n",sep = "") + } + cat("Feature fitting: ") + if(length(object@featFitting) == 0){ + cat("\n\tno data\n") + } else { + cat("\n\tsampleByComponent dim: ",dim(object@featFitting$sampleByComponent)[1]," x ",dim(object@featFitting$sampleByComponent)[2],"\n",sep = "") + cat("\tfitting method: ",object@featFitting$method,"\n",sep = "") + } + cat("Sample feature data:\n") + cat("\tdim: ",dim(object@samplefeatData)[1]," x ",dim(object@samplefeatData)[2],"\n",sep = "") + cat("\tfeatures: ",paste0(colnames(object@samplefeatData),collapse=","),"\n",sep = "") + + cat("Signature activities:\n") + cat("\tdim: ",dim(object@activities$rawAct0)[1]," x ",dim(object@activities$rawAct0)[2],"\n",sep = "") + cat("\tsignature model: ",paste0(object@signature.model),"\n",sep = "") + + cat("Experiment:\n") + cat("\tlast modified: ",object@ExpData@last.modified,"\n",sep = "") + cat("\tgenome build: ",object@ExpData@build,"\n",sep = "") + cat("`getExperiment(object)` for more details\n",sep = "") +}) + +#' Extract +#' @param x object to extract subset +#' @param i Indices of subset +#' @param j not used +#' @param ... 
not used +#' @param drop not used +#' +setMethod("[", signature=c("SigQuant", "numeric", "missing", "ANY"), + definition=function(x, i, j, ..., drop=TRUE){ + segs <- x@segments[i] + samplefeats <- x@samplefeatData[i,] + subsetSamples <- names(segs) + if(length(x@featData) == 0){ + feats <- x@featData + } else { + feats <- subsetCNfeatures(x = x@featData,s = subsetSamples) + } + + if(length(x@featFitting) == 0){ + featsfit <- x@featFitting + } else { + newfeatFitting <- subsetfeatFitting(x = x@featFitting,s = subsetSamples) + featsfit <- newfeatFitting + } + newActivities <- subsetSigActivities(x=x@activities,s = subsetSamples) + + methods::initialize(x, + segments=segs, + featData=feats, + featFitting=featsfit, + samplefeatData=samplefeats, + activities=newActivities, + ExpData = methods::initialize(x@ExpData, + samples.current = length(segs), + last.modified = as.character(Sys.time()))) + }) + +#' Extract +#' @param x object to extract subset +#' @param i Indices of subset +#' @param j not used +#' @param ... 
not used +#' @param drop not used +#' +setMethod("[", signature=c("SigQuant", "character", "missing", "ANY"), + definition=function(x, i, j, ..., drop=TRUE){ + segs <- x@segments[names(x@segments) %in% i] + samplefeats <- x@samplefeatData[rownames(x@samplefeatData) %in% i,] + subsetSamples <- names(segs) + if(length(x@featData) == 0){ + feats <- x@featData + } else { + feats <- subsetCNfeatures(x = x@featData,s = subsetSamples) + } + if(length(x@featFitting) == 0){ + featsfit <- x@featFitting + } else { + newfeatFitting <- subsetfeatFitting(x = x@featFitting,s = subsetSamples) + featsfit <- newfeatFitting + } + newActivities <- subsetSigActivities(x=x@activities,s = subsetSamples) + + methods::initialize(x, + segments=segs, + featData=feats, + featFitting=featsfit, + samplefeatData=samplefeats, + activities=newActivities, + ExpData = methods::initialize(x@ExpData, + samples.current = length(segs), + last.modified = as.character(Sys.time()))) + }) diff --git a/R/plotActivities.R b/R/plotActivities.R new file mode 100644 index 0000000..b44e306 --- /dev/null +++ b/R/plotActivities.R @@ -0,0 +1,57 @@ +#' plotActivities +#' +#' Plot the copy number signature activites for a given CNQuant or SigQuant class object +#' containing copy number signature activities/exposures. 
Default ordering by +#' signature CX1 +#' +#' @param object A SigQuant class object +#' +#' @return plot +#' @export plotActivities +#' +plotActivities <- function(object){ + if(is.null(object)){ + stop("No object provided, object should be a object of class CNQuant or SigQuant") + } + if(!class(object) == "SigQuant"){ + stop("Object is not of class SigQuant") + } + ## May need to change which matrix is used + if(object@signature.model == "drews"){ + plotdata <- object@activities$thresholdAct2 + cols <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c', + '#fb9a99','#e31a1c','#fdbf6f','#ff7f00', + '#cab2d6','#6a3d9a','#ffff99','#b15928', + '#8dd3c7','#ffffb3','#bebada','#fb8072', + '#80b1d3') + clms <- 1 + l.pos <- -0.1 + } else { + plotdata <- object@activities$thresholdAct2 + cols <- c("#1B9E77","#D95F02","#7570B3","#E7298A", + "#66A61E","#E6AB02","#A6761D") + clms <- 1 + l.pos <- -0.1 + } + + plotdata <- plotdata[order(plotdata[,1],decreasing = T),] + tabl <- t(as.matrix(plotdata)) + + par(mar=c(5, 4, 4, 8), xpd=TRUE) + barplot(tabl, + main = paste0("Signature activities (","method: ",object@signature.model,")"), + col = cols, + xlab = "sample", + names.arg=rep("",ncol(tabl)), + ylab = "relative exposure", + axes=TRUE) + legend(x= "topright", + inset=c(l.pos, 0), + legend = rownames(tabl), + fill=cols, + cex=0.7, + ncol=clms, + x.intersp = 0.5, + y.intersp = 0.8, + box.col=NA) +} diff --git a/R/plotSampleByComponent.R b/R/plotSampleByComponent.R new file mode 100644 index 0000000..c63dff6 --- /dev/null +++ b/R/plotSampleByComponent.R @@ -0,0 +1,20 @@ +#' plotSampleByComponent +#' +#' Plots a heatmap of the sample-by-component matrix +#' +#' @param object a CNQuant or SigQuant class object +#' @param ... 
additional parameters passed to \link[stats]{heatmap} +#' +#' @return plot +#' @export plotSampleByComponent +#' +plotSampleByComponent <- function(object=NULL,...){ + if(is.null(object)){ + stop("no object provided") + } + if(length(object@featFitting) == 0){ + stop("feature fitting not calculated") + } + plotData <- object@featFitting$sampleByComponent + stats::heatmap(x = plotData,...) +} diff --git a/R/plotSegments.R b/R/plotSegments.R new file mode 100644 index 0000000..a8896b1 --- /dev/null +++ b/R/plotSegments.R @@ -0,0 +1,98 @@ +#' plotSegments +#' +#' Plot the segment data for a given sample stored in a CNQuant or SigQuant class object +#' +#' @param object A CNQuant or SigQuant class object +#' @param sample A vector of length 1 containing either a sample name or sample index +#' @param cn.max Maximum copy number to plot - Values over this are truncated to fit +#' +#' @return plot +#' @export plotSegments +#' +plotSegments <- function(object=NULL,sample=NULL,cn.max=15){ + if(is.null(object)){ + stop("No object provided, object should be a object of class CNQuant or SigQuant") + } + if(!class(object) %in% c("CNQuant","SigQuant")){ + stop("Object is not of class CNQuant or SigQuant") + } + if(is.null(sample)){ + stop("No sample specified; sample should be an integer index or name of sample contained within the provided object") + } + if(!is.numeric(sample) & !is.character(sample)){ + stop("Unknow sample value provided; sample should be an integer index or name of sample contained within the provided object") + } + samp <- getSamples(object = object) + samp.len <- length(samp) + if(is.numeric(sample)){ + if(sample > samp.len){ + stop(paste0("Sample index is out of bounds; Object contains ",samp.len," samples")) + } + } + if(is.character(sample)){ + if(!sample %in% samp){ + stop("Sample was not found in the object provided") + } + } + samp.name <- ifelse(is.numeric(sample),samp[sample],sample) + object <- object[samp.name] + ob.pl <- getSamplefeatures(object 
= object)$ploidy + segTab <- getSegments(object = object) + segTab$chromosome <- factor(segTab$chromosome,levels = stringr::str_sort(unique(segTab$chromosome),numeric = T)) + if(max(segTab$segVal) > cn.max){ + segTab$segVal[segTab$segVal > cn.max] <- cn.max + ylim <- c(0,cn.max) + } else { + ylim <- c(0,round(max(segTab$segVal))+1) + } + seg.n <- nrow(segTab) + chrom.len <- data.frame(Group.1=unique(segTab$chromosome)) + chrom.len$x.max <- stats::aggregate(segTab$end,by = list(segTab$chromosome),FUN = max)$x + chrom.len$x.min <- stats::aggregate(segTab$start,by = list(segTab$chromosome),FUN = min)$x + chrom.len$Group.1 <- factor(chrom.len$Group.1,levels = stringr::str_sort(unique(chrom.len$Group.1),numeric = T)) + chrom.len <- chrom.len[order(chrom.len$Group.1),] + + segTab$startf <- getcoordinates(chr = segTab$chromosome,pos = segTab$start,chrom.len = chrom.len) + segTab$endf <- getcoordinates(chr = segTab$chromosome,pos = segTab$end,chrom.len = chrom.len) + chrom.len$flatm <- getcoordinates(chr = chrom.len$Group.1,pos = chrom.len$x.min,chrom.len = chrom.len) + chrom.len$flats <- getcoordinates(chr = chrom.len$Group.1,pos = chrom.len$x.max/2,chrom.len = chrom.len) + chrom.len$flate <- getcoordinates(chr = chrom.len$Group.1,pos = chrom.len$x.max,chrom.len = chrom.len) + + title <- samp.name + sub.title <- paste0("ploidy: ",ob.pl," | segments: ",seg.n) + rect.col <- ifelse(seq_along(chrom.len$Group.1) %% 2 == 0,"white","grey95") + + graphics::par(mar=c(5, 4, 4, 4) + 0.2) + graphics::plot(NA, + xlab="chromosome", + ylab="absolute copy number", + las=1, + xlim=c(min(chrom.len$flatm), max(chrom.len$flate)), + ylim=ylim, + xaxs="i", + xaxt="n", + yaxp=c(ylim[1], ylim[2], ylim[2]-ylim[1]), + yaxs="i") + graphics::rect(xleft = chrom.len$flatm, + xright = chrom.len$flate, + ybottom = 0,ytop = 100, + col=rect.col, + border = NA) + graphics::axis(1, at=chrom.len$flats, labels=chrom.len$Group.1) + graphics::box() + graphics::mtext(side=3, line=2, at=-0.07, adj=0, cex=1.2, 
title) + graphics::mtext(side=3, line=1, at=-0.07, adj=0, cex=1, sub.title) + graphics::abline(h = seq.int(0,cn.max-1,1),lty="dashed",col="gray50") + graphics::segments(x0 = segTab$startf,y0 = segTab$segVal,x1 = segTab$endf,y1 = segTab$segVal,lwd=3,col="blue") +} + +getcoordinates <- function(chr, pos, chrom.len) { + posflat <- pos + offset <- 0 + for (contig_ix in 1:nrow(chrom.len)) { + on_contig <- chr == chrom.len$Group.1[contig_ix] + posflat[on_contig] <- pos[on_contig] + offset + offset <- offset + chrom.len$x.max[contig_ix] + } + posflat +} diff --git a/R/quantifyCNSignatures.R b/R/quantifyCNSignatures.R new file mode 100644 index 0000000..d7c9849 --- /dev/null +++ b/R/quantifyCNSignatures.R @@ -0,0 +1,29 @@ +#' @rdname quantifyCNSignatures-methods +#' @aliases quantifyCNSignatures +setMethod( + "quantifyCNSignatures", + signature = c(object = c("data.frame")), + definition = function(object, + experimentName = "Default", + method = "drews", + cores = 1) { + # Check method + if (is.null(method) | !(method %in% c("drews", "mac"))) { + stop("Method was neither 'drews' nor 'mac'.") + } + + # Create object from CN profiles + # TODO: Extend for QDNAseq + cigTCGA = createCNQuant(data = object, experimentName = experimentName) + # Extract features + cigTCGA = calculateFeatures(object = cigTCGA, + method = method, + cores = cores) + # Calculate sum-of-posterior matrix + cigTCGA = calculateSampleByComponentMatrix(object = cigTCGA, method = + method) + # Calculate signature activities + cigTCGA = calculateActivity(object = cigTCGA, method = method) + return(cigTCGA) + } +) diff --git a/R/removeQuietSamples.R b/R/removeQuietSamples.R new file mode 100644 index 0000000..ac65058 --- /dev/null +++ b/R/removeQuietSamples.R @@ -0,0 +1,12 @@ +removeQuietSamples = function(dtSmooth, DCIN = 20) { + + # Identify CNAs per sample + dtCNAs = dtSmooth[ dtSmooth$segVal != 2, ] + quietSamples = names(table(dtCNAs$sample))[ table(dtCNAs$sample) < DCIN ] + + # Remove quiet samples + 
dtSmooth = dtSmooth[ ! dtSmooth$sample %in% quietSamples, ] + dtSmooth$sample = factor(dtSmooth$sample) + + return(dtSmooth) +} diff --git a/R/scaleByModel.R b/R/scaleByModel.R new file mode 100644 index 0000000..24266b0 --- /dev/null +++ b/R/scaleByModel.R @@ -0,0 +1,8 @@ +scaleByModel = function(H, lModel) { + # First go over columns to subtract mean, then go over columns to divide by scale + Hscaled = sweep( + sweep(H, 2, lModel$mean, FUN = '-'), + 2, lModel$scale, FUN = "/") + + return(Hscaled) +} diff --git a/R/smoothAndMergeSegments.R b/R/smoothAndMergeSegments.R new file mode 100644 index 0000000..580f866 --- /dev/null +++ b/R/smoothAndMergeSegments.R @@ -0,0 +1,22 @@ +## Function for identifying and merging neighbouring segments that are close to each other (as defined by a user-supplied threshold) +smoothAndMergeSegments = function(dfAllSegs, CORES, WIGGLE = 0.1, colNameSegVal = "segVal", colNameChr = "chromosome", IGNOREDELS = FALSE) { + + # Explicit conversion to numeric. Just to be on the safe side. 
+ dfAllSegs$start = as.numeric( dfAllSegs$start ) + dfAllSegs$end = as.numeric( dfAllSegs$end ) + dfAllSegs$segVal = as.numeric( dfAllSegs$segVal ) + + # Set everything very close to 2 to 2 + dfAllSegs$segVal[ dfAllSegs$segVal > (2-WIGGLE) & dfAllSegs$segVal < (2+WIGGLE) ] = 2 + # Merge segments only when two normal follow each other -> SMOOTHINGFACTOR = 0 + dfAllSegs = idSmoothingTargets(dfAllSegs, WIGGLE = 0, colNameSegVal = "segVal", colNameChr = "chromosome", IGNOREDELS = IGNOREDELS) + # Split by sample name + lRaw = split(dfAllSegs, dfAllSegs$sample) + + # Smooth segments by taking the weighted average of the segVal and their lengths + lSmooth = smoothSegments(lRaw, CORES, SMOOTHINGFACTOR = 0, colNameMerge = "segVal", colNameChr = "chromosome", + colNameStart = "start", colNameEnd = "end", IGNOREDELS = IGNOREDELS, asDf = FALSE) + dtSmooth = rbindlist(lSmooth) + + return(dtSmooth) +} diff --git a/R/smoothSegments.R b/R/smoothSegments.R new file mode 100644 index 0000000..218c181 --- /dev/null +++ b/R/smoothSegments.R @@ -0,0 +1,151 @@ +#' @importFrom data.table setDT +## Smooth segments that are close to +smoothSegments = function(lRaw, CORES, SMOOTHINGFACTOR, colNameMerge, colNameChr, colNameStart, colNameEnd, + IGNOREDELS = TRUE, asDf = FALSE) { + + ### Check column names + test = lRaw[[1]] + testMerge = test[[colNameMerge]][1] + testChr = test[[colNameChr]][1] + testStart = test[[colNameStart]][1] + testEnd = test[[colNameEnd]][1] + if(! is.numeric(testMerge)) { stop("Merge column has no numeric value in it. Supplied correct column name?")} + if(is.null(testChr)) { stop("Chromosome column has no numeric value in it. Supplied correct column name?")} + if(! is.numeric(testStart)) { stop("Start column has no numeric value in it. Supplied correct column name?")} + if(! is.numeric(testEnd)) { stop("End column has no numeric value in it. 
Supplied correct column name?")} + + # Add diff column to names we want to keep when merging (comes from function "idSmoothingTargets"). + colNameMerge = c(colNameMerge, "diffs") + if(CORES > 1) { + if (!requireNamespace("doParallel", quietly = TRUE)) { + stop( + "Package \"doParallel\" must be installed to use multiple threads/cores.", + call. = FALSE + ) + } + `%dopar%` <- foreach::`%dopar%` + doParallel::registerDoParallel(CORES) + lSmooth = foreach::foreach(thisSample = lRaw, .final = function(x) setNames(x, names(lRaw)) ) %dopar% { + + thisOut = thisSample + stillSmoothing = sum(thisOut$smooth) + while( stillSmoothing > 0 ) { + # For the while loop: + # Read lines from thisSample and change in thisOut. Hence for a new iteration I need to sync the two. + thisSample = thisOut + + rleRaw = rle(thisSample$smooth) + # This takes the indeces of the FALSE chains and adds 1. This should give you the next segment which is TRUE. + # Two challenges: + # 1) Last segment always FALSE (see above), hence removal of the last number as this would indicate to a segment outside the df. + # 2) If it starts with a TRUE segment, this would not be found when looking at the FALSE chains. Hence, adding index 1 manually if chain starts with TRUE. + indRaw = cumsum(rleRaw$lengths)[ ! rleRaw$values ] + 1 + indRaw = indRaw[ -length(indRaw) ] + if( rleRaw$values[1] ) { indRaw = c(1, indRaw) } + + # loop over start indices of TRUE chains. + for(i in indRaw) { + # detect length of segments to smooth. add 1 as the last segment has a FALSE value in it but still belongs to this chain. + endOfStreak = i + rle(thisSample$smooth[i:nrow(thisSample)])$lengths[1] + # extract reads + dfMerge = thisSample[i:endOfStreak,] + + # too stupid to make this work with data.table + newElement = as.data.frame( dfMerge[1,] ) + # Get new end and check first wether valid number. + newEnd = dfMerge[nrow(dfMerge),][[colNameEnd]] + if(! 
is.null(newEnd)) { + newElement[[colNameEnd]] = newEnd + } else { + stop("New end coordinate is null. Supplied correct column name?") + } + ## Column "segVal" will be dealt with in a minute. Column "diffs" later when running again idSmoothingTargets. + + # Merge cn specifically by taking the length of the elements into consideration + widthWeights = dfMerge[[colNameEnd]] - dfMerge[[colNameStart]] + newElement[[colNameMerge[1]]] = weighted.mean(dfMerge[[colNameMerge[1]]], widthWeights) + # Replace all to merge segments with the new merged segment. Later delete duplicated. + thisOut[i:endOfStreak,] = newElement + } + + # as we have replaced all segments with the new mean segment, we need to remove the duplicates + thisOut = thisOut[ ! duplicated(thisOut), ] + # again detect segments which needs smoothing + thisOut = idSmoothingTargets(thisOut, SMOOTHINGFACTOR, colNameSegVal = colNameMerge[[1]], colNameChr = colNameChr, + IGNOREDELS = IGNOREDELS) + stillSmoothing = sum(thisOut$smooth) + } + + # after smoothing is finished, change name of cohort + thisOut$smooth = NULL + thisOut$diffs = NULL + return( thisOut ) + } + doParallel::stopImplicitCluster() + } else { + #stop("no single thread method for smoothing") + lSmooth <- lapply(lRaw,FUN = function(x){ + + thisOut = x + stillSmoothing = sum(thisOut$smooth) + while( stillSmoothing > 0 ) { + # For the while loop: + # Read lines from thisSample and change in thisOut. Hence for a new iteration I need to sync the two. + thisSample = thisOut + + rleRaw = rle(thisSample$smooth) + # This takes the indeces of the FALSE chains and adds 1. This should give you the next segment which is TRUE. + # Two challenges: + # 1) Last segment always FALSE (see above), hence removal of the last number as this would indicate to a segment outside the df. + # 2) If it starts with a TRUE segment, this would not be found when looking at the FALSE chains. Hence, adding index 1 manually if chain starts with TRUE. + indRaw = cumsum(rleRaw$lengths)[ ! 
rleRaw$values ] + 1 + indRaw = indRaw[ -length(indRaw) ] + if( rleRaw$values[1] ) { indRaw = c(1, indRaw) } + + # loop over start indices of TRUE chains. + for(i in indRaw) { + # detect length of segments to smooth. add 1 as the last segment has a FALSE value in it but still belongs to this chain. + endOfStreak = i + rle(thisSample$smooth[i:nrow(thisSample)])$lengths[1] + # extract reads + dfMerge = thisSample[i:endOfStreak,] + + # too stupid to make this work with data.table + newElement = as.data.frame( dfMerge[1,] ) + # Get new end and check first wether valid number. + newEnd = dfMerge[nrow(dfMerge),][[colNameEnd]] + if(! is.null(newEnd)) { + newElement[[colNameEnd]] = newEnd + } else { + stop("New end coordinate is null. Supplied correct column name?") + } + ## Column "segVal" will be dealt with in a minute. Column "diffs" later when running again idSmoothingTargets. + + # Merge cn specifically by taking the length of the elements into consideration + widthWeights = dfMerge[[colNameEnd]] - dfMerge[[colNameStart]] + newElement[[colNameMerge[1]]] = weighted.mean(dfMerge[[colNameMerge[1]]], widthWeights) + # Replace all to merge segments with the new merged segment. Later delete duplicated. + thisOut[i:endOfStreak,] = newElement + } + + # as we have replaced all segments with the new mean segment, we need to remove the duplicates + thisOut = thisOut[ ! 
duplicated(thisOut), ] + # again detect segments which needs smoothing + thisOut = idSmoothingTargets(thisOut, SMOOTHINGFACTOR, colNameSegVal = colNameMerge[[1]], colNameChr = colNameChr, + IGNOREDELS = IGNOREDELS) + stillSmoothing = sum(thisOut$smooth) + } + + # after smoothing is finished, change name of cohort + thisOut$smooth = NULL + thisOut$diffs = NULL + return( thisOut ) + }) + } + if( isTRUE(asDf) ) { + dfSmooth = setDT( rbindlist( lSmooth ) ) + return( dfSmooth ) + } else { + return( lSmooth ) + } + +} diff --git a/R/startCopynumberFeatureExtractionDrews.R b/R/startCopynumberFeatureExtractionDrews.R new file mode 100644 index 0000000..2302d6c --- /dev/null +++ b/R/startCopynumberFeatureExtractionDrews.R @@ -0,0 +1,11 @@ +startCopynumberFeatureExtractionDrews = function(dtSmooth, cores = 1, RMNORM = TRUE) { + + # Convert to data frame + dfBR = data.frame(dtSmooth) + # Split by sample + lBR = split( dfBR, dfBR$sample ) + # Extract features + brECNF = extractCopynumberFeaturesDrews(lBR, cores = cores, rmNorm = RMNORM) + + return(brECNF) +} diff --git a/R/subsetCNfeatures.R b/R/subsetCNfeatures.R new file mode 100644 index 0000000..a9fc793 --- /dev/null +++ b/R/subsetCNfeatures.R @@ -0,0 +1,6 @@ +subsetCNfeatures <- function(x,s){ + subCNfeats <- lapply(x, FUN = function(y){ + y <- y[y$ID %in% s,] + }) + subCNfeats +} diff --git a/R/subsetSigActivities.R b/R/subsetSigActivities.R new file mode 100644 index 0000000..416d1fd --- /dev/null +++ b/R/subsetSigActivities.R @@ -0,0 +1,6 @@ +subsetSigActivities <- function(x,s){ + subSigActivities <- lapply(x, FUN = function(y){ + y <- y[which(rownames(y) %in% s),] + }) + subSigActivities +} diff --git a/R/subsetfeatFitting.R b/R/subsetfeatFitting.R new file mode 100644 index 0000000..ba61335 --- /dev/null +++ b/R/subsetfeatFitting.R @@ -0,0 +1,6 @@ +subsetfeatFitting <- function(x,s){ + sxc <- x$sampleByComponent + sxc <- sxc[which(rownames(sxc) %in% s),] + x$sampleByComponent <- sxc + return(x) +} diff --git 
a/README.md b/README.md new file mode 100644 index 0000000..b2b33d4 --- /dev/null +++ b/README.md @@ -0,0 +1,115 @@ +# CINSignatureQuantification + +R package to quantify signatures of chromosomal instability on copy number profiles. + +## Check status + +TBD + +## Introduction + +Chromosomal instability (CIN) results in the accumulation of large-scale losses, gains, and rearrangements of DNA. +In our recent study [1], we present a systematic framework to measure different types of CIN and their impact on clinical phenotypes. +This R package allows you to quantify the activity of the 17 signatures presented. It also allows you to quantify signature activities from other publications [2]. + +First, copy number features are extracted from the copy number profiles. +Second, the features are assigned to components, and a probability is calculated for each assignment. These probabilities are then summed up for each patient. +Third, the probabilities across the components are used to quantify signature activities. +Fourth, the signature activities can be used to predict patient response to platinum-based chemotherapies. + +The `CINSignatureQuantification` package provides you with the functions and example data to automate this process of quantifying signature activities. + +## Quick start + +Install the package either from GitHub directly by using `devtools` +```r +devtools::install_github("markowetzlab/CINSignatureQuantification", build_vignettes = TRUE, dependencies = TRUE) +``` + +or from CRAN +```r +install.packages("CINSignatureQuantification", dependencies = TRUE) +``` + +Then load the package with: +```r +library(CINSignatureQuantification) +``` + +If you have a segmented copy number profile that looks like the following example, then you are good to go. Preferably use unrounded copy number data, but rounded data will do fine as well. 
+> chromosome start end segVal sample +> 1 61735 249224388 2.0 TCGA-BT-A20P +> 2 12784 82571206 2.0 TCGA-BT-A20P +> 2 82571664 85357333 0.843 TCGA-BT-A20P + +Then pass this profile (here a data frame with the placeholder name `dfSegments`) to this function to automate the process of feature extraction and signature quantification: +```r +mySigs = quantifyCNSignatures(dfSegments) +``` + +If you want to use the signature activities to predict response to platinum-based chemotherapies, use this function: +```r +vPredictions = clinPredictionPlatinum(mySigs) +``` + +## Requirements + +The `CINSignatureQuantification` package requires R version >= 4.0 and depends on the following packages: +* data.table +* stringr +* parallel +* foreach +* doParallel +Therefore, these packages need to be installed. + +## Functionality + +The `CINSignatureQuantification` package offers two main functions: `quantifyCNSignatures` and `clinPredictionPlatinum`. It also allows you to do the signature quantification step-by-step with these functions: `createCNQuant`, `calculateFeatures`, `calculateSampleByComponentMatrix`, `calculateActivity` and `clinPredictionDenovo`. + +## Getting help + +For more information on obtaining copy number profiles, please refer to the documentation of common copy number callers like [ASCAT](https://github.com/VanLoo-lab/ascat) or [ABSOLUTE](https://github.com/ShixiangWang/DoAbsolute). + +More information on how to work with and generate copy number signatures can be obtained from: [Drews et al. (Nature, 2022)](https://www.nature.com/articles/s41586-022-04789-9) or [Macintyre et al. (Nature Genetics, 2018)](https://www.nature.com/articles/s41588-018-0179-8). + +## Example data + +The package comes with a set of 478 samples that were part of both the TCGA and the PCAWG cohorts and have detectable levels of CIN [1]. 
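
As a minimal sketch of the input format described in the Quick start section, the following base R snippet builds a small segment table with the five columns from the example and checks it before use. The helper `checkSegmentTable` and the name `dfSegments` are hypothetical and not part of the package; the checks shown are assumptions about what a sensible profile looks like, not the package's own validation.

```r
# Hypothetical helper (not part of CINSignatureQuantification): checks that a
# segment table has the five columns shown in the Quick start example.
checkSegmentTable = function(df) {
    required = c("chromosome", "start", "end", "segVal", "sample")
    missingCols = setdiff(required, colnames(df))
    if (length(missingCols) > 0) {
        stop("Missing columns: ", paste(missingCols, collapse = ", "))
    }
    # Assumption: every segment has a positive length.
    if (any(df$end <= df$start)) {
        stop("Every segment needs end > start.")
    }
    invisible(TRUE)
}

# Placeholder data: the three example segments from the Quick start section.
dfSegments = data.frame(
    chromosome = c(1, 2, 2),
    start = c(61735, 12784, 82571664),
    end = c(249224388, 82571206, 85357333),
    segVal = c(2.0, 2.0, 0.843),
    sample = "TCGA-BT-A20P",
    stringsAsFactors = FALSE
)

checkSegmentTable(dfSegments)

# With a complete profile (not this three-segment toy table), the table could
# then be passed on, for example:
# mySigs = quantifyCNSignatures(dfSegments, experimentName = "myExperiment", method = "drews", cores = 1)
```

The quantification call is left commented out because the toy table only illustrates the column layout; a real profile would cover the whole genome for each sample.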
+ + +## Citation + +Please cite `CINSignatureQuantification` as: + +``` +TBD +``` + +## Authors + +[Ruben Drews](https://github.com/Martingales) Ruben.Drews 'at' cruk.cam.ac.uk + +[Philip Smith](https://github.com/Phil9S) Philip.Smith 'at' cruk.cam.ac.uk + + +## References + +[1] [Drews et al. (Nature, 2022)](https://www.nature.com/articles/s41586-022-04789-9) + +[2] [Macintyre et al. (Nature Genetics, 2018)](https://www.nature.com/articles/s41588-018-0179-8) + +## Maintenance + +If you encounter a problem, please open an issue in the GitHub repository. + + +## Licence +The contents of this repository are copyright (c) 2022, University of Cambridge and Spanish National Cancer Research Centre (CNIO). + +The contents of this repository are published and distributed under the GAP Available Source License v1.0 (ASL). + +The contents of this repository are distributed in the hope that it will be useful for non-commercial academic research, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ASL for more details. + +The methods implemented in the code are the subject of pending patent application GB 2114203.9. + +Any commercial use of this code is prohibited. 
diff --git a/data/Drews2022_CX3CX2_Clinical_classifier.rda b/data/Drews2022_CX3CX2_Clinical_classifier.rda new file mode 100644 index 0000000..95b128b Binary files /dev/null and b/data/Drews2022_CX3CX2_Clinical_classifier.rda differ diff --git a/data/Drews2022_TCGA_Mixture_Models.rda b/data/Drews2022_TCGA_Mixture_Models.rda new file mode 100644 index 0000000..1468783 Binary files /dev/null and b/data/Drews2022_TCGA_Mixture_Models.rda differ diff --git a/data/Drews2022_TCGA_Scaling_Variables.rda b/data/Drews2022_TCGA_Scaling_Variables.rda new file mode 100644 index 0000000..b524a77 Binary files /dev/null and b/data/Drews2022_TCGA_Scaling_Variables.rda differ diff --git a/data/Drews2022_TCGA_Signature_Thresholds.rda b/data/Drews2022_TCGA_Signature_Thresholds.rda new file mode 100644 index 0000000..73a9345 Binary files /dev/null and b/data/Drews2022_TCGA_Signature_Thresholds.rda differ diff --git a/data/Drews2022_TCGA_Signatures.rda b/data/Drews2022_TCGA_Signatures.rda new file mode 100644 index 0000000..344a86b Binary files /dev/null and b/data/Drews2022_TCGA_Signatures.rda differ diff --git a/data/Macintyre2018_OV_Mixture_Models.rda b/data/Macintyre2018_OV_Mixture_Models.rda new file mode 100644 index 0000000..eb2d427 Binary files /dev/null and b/data/Macintyre2018_OV_Mixture_Models.rda differ diff --git a/data/Macintyre2018_OV_Signatures.rda b/data/Macintyre2018_OV_Signatures.rda new file mode 100644 index 0000000..d8a81f8 Binary files /dev/null and b/data/Macintyre2018_OV_Signatures.rda differ diff --git a/data/Macintyre2018_OV_Signatures_normalised.rda b/data/Macintyre2018_OV_Signatures_normalised.rda new file mode 100644 index 0000000..1ad42a6 Binary files /dev/null and b/data/Macintyre2018_OV_Signatures_normalised.rda differ diff --git a/data/gap_hg19.rda b/data/gap_hg19.rda new file mode 100644 index 0000000..e02f377 Binary files /dev/null and b/data/gap_hg19.rda differ diff --git a/data/hg19.chrom.sizes.rda b/data/hg19.chrom.sizes.rda new file mode 100644 
index 0000000..23ea5c1 Binary files /dev/null and b/data/hg19.chrom.sizes.rda differ diff --git a/inst/CITATION b/inst/CITATION new file mode 100644 index 0000000..a9357ff --- /dev/null +++ b/inst/CITATION @@ -0,0 +1,37 @@ +citHeader("To cite CINSignatureQuantification in publications use:") + +citEntry( + entry = "Article", + title = "A pan-cancer compendium of chromosomal instability", + author = personList(as.person("Ruben Drews")), + journal = "Nature", + year = "2022", + volume = "", + number = "", + pages = "", + url = "", + textVersion = paste("Drews et al.", + "A pan-cancer compendium of chromosomal instability", + "Nature (2022)", + "." + ) +) + +citEntry( + entry = "Article", + title = "Copy number signatures and mutational processes in ovarian carcinoma", + author = personList(as.person("Geoff Macintyre")), + journal = "Nature Genetics", + year = "2018", + volume = "50", + number = "9", + pages = "1262-1270", + doi = "10.1038/s41588-018-0179-8", + textVersion = paste("Macintyre et al.", + "Copy number signatures and mutational processes in ovarian carcinoma.", + "Nature Genetics (2018)", + "." 
+ + ) +) + diff --git a/inst/Example_data.R b/inst/Example_data.R new file mode 100644 index 0000000..4d5497b --- /dev/null +++ b/inst/Example_data.R @@ -0,0 +1,58 @@ +## Testing ground for package and functions + +# - CHECK E/W/N +# - Non-standard license +# - pkgdown +# - clinPredictionPlatinum-methods uses normalised but not threshold adjusted + +# Load library +library(CINSignatureQuantification) + +# Load test data +dfTest = readRDS("inst/TCGA_478_Samples_SNP6_GOLD.rds") + +## Pipeline method +sigAct478.drews = quantifyCNSignatures(dfTest,experimentName = "478TCGAPCAWG",method = "drews",cores = 6) +sigAct478.mac = quantifyCNSignatures(dfTest,experimentName = "478TCGAPCAWG",method = "mac",cores = 6) + +## Individual functions +# Convert to CNQuant object +myData = createCNQuant(data = dfTest) + +## Feature extraction (includes smoothing and preparing data) +myData.drews = calculateFeatures(myData, method="drews",cores = 1) +myData.mac = calculateFeatures(myData, method="mac",cores = 1) + +## Get sum-of-posterior matrix +myData.drews = calculateSampleByComponentMatrix(myData.drews) +myData.mac = calculateSampleByComponentMatrix(myData.mac) + +## Get activities +myData.drews = calculateActivity(myData.drews) +myData.mac = calculateActivity(myData.mac) + +## Test clinical classifier (CX3/CX2 and De-novo for two self chosen signatures) +vPredPlat = clinPredictionPlatinum(sigAct478.drews) +vPredCX8CX9 = clinPredictionDenovo(sigAct478.drews, sampTrain = sample(getSamples(sigAct478.drews), 50), sigsTrain = c("CX9", "CX8")) + +## Additional functions +# Show and subsetting +sigAct478.drews +sigAct478.drews[1:10] +sigAct478.drews[getSamples(sigAct478.drews)[1:50]] + +# Sample feature/clinical information +getSamplefeatures(sigAct478.drews) +load("inst/test.sample.features.rda") +sigAct478.drews = addsampleFeatures(object = sigAct478.drews,sample.data = test.sample.features) +getSamplefeatures(sigAct478.drews) + +# plots +plotSampleByComponent(sigAct478.drews) 
+plotSegments(sigAct478.drews,sample = 1,cn.max = 8) +plotActivities(object = sigAct478.drews) + +# misc +getSampleByComponent(sigAct478.drews) +getExperiment(sigAct478.drews) +getSamples(sigAct478.drews) diff --git a/inst/FlexmixToRubenMethodDiff.rda b/inst/FlexmixToRubenMethodDiff.rda new file mode 100644 index 0000000..b63586f Binary files /dev/null and b/inst/FlexmixToRubenMethodDiff.rda differ diff --git a/inst/TCGA_478_Samples_SNP6_GOLD.rds b/inst/TCGA_478_Samples_SNP6_GOLD.rds new file mode 100644 index 0000000..399b572 Binary files /dev/null and b/inst/TCGA_478_Samples_SNP6_GOLD.rds differ diff --git a/inst/test.sample.features.rda b/inst/test.sample.features.rda new file mode 100644 index 0000000..494b30c Binary files /dev/null and b/inst/test.sample.features.rda differ diff --git a/man/CINSignatureQuantification.Rd b/man/CINSignatureQuantification.Rd new file mode 100644 index 0000000..6b99da2 --- /dev/null +++ b/man/CINSignatureQuantification.Rd @@ -0,0 +1,13 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CINSignatureQuantification.R +\docType{package} +\name{CINSignatureQuantification} +\alias{CINSignatureQuantification} +\title{CINSignatureQuantification} +\description{ +Allowing the simple and quick quantification of copy number signatures in +cancer samples from copy number profiles. The signatures are a readout of mutational +processes resulting in chromosomal instability (CIN). It is thought as a one-stop +solution, combining multiple published solutions. At the moment the methods from +Drews et al. (Nature, 2022) and Macintyre et al. (Nature Genetics, 2018) are included. 
+} diff --git a/man/CNQuant-class.Rd b/man/CNQuant-class.Rd new file mode 100644 index 0000000..4e5c936 --- /dev/null +++ b/man/CNQuant-class.Rd @@ -0,0 +1,24 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllClasses.R +\docType{class} +\name{CNQuant-class} +\alias{CNQuant-class} +\alias{CNQuant} +\title{CNQuant object} +\description{ +CNQuant object +} +\section{Slots}{ + +\describe{ +\item{\code{segments}}{list} + +\item{\code{featData}}{list} + +\item{\code{featFitting}}{list} + +\item{\code{samplefeatData}}{data.frame} + +\item{\code{ExpData}}{ExpQuant} +}} + diff --git a/man/Drews2022_CX3CX2_Clinical_classifier.Rd b/man/Drews2022_CX3CX2_Clinical_classifier.Rd new file mode 100644 index 0000000..e5593b4 --- /dev/null +++ b/man/Drews2022_CX3CX2_Clinical_classifier.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Drews2022_CX3CX2_Clinical_classifier} +\alias{Drews2022_CX3CX2_Clinical_classifier} +\title{Drews2022_CX3CX2_Clinical_classifier} +\format{ +A list of 2 numeric vectors of length 2 +} +\usage{ +data(Drews2022_CX3CX2_Clinical_classifier) +} +\description{ +List of mean and scaling factors for signatures 2 and 3 for the prediction of +platinum status used in Drews 2022 +} +\keyword{datasets} diff --git a/man/Drews2022_TCGA_Mixture_Models.Rd b/man/Drews2022_TCGA_Mixture_Models.Rd new file mode 100644 index 0000000..0b1bea7 --- /dev/null +++ b/man/Drews2022_TCGA_Mixture_Models.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Drews2022_TCGA_Mixture_Models} +\alias{Drews2022_TCGA_Mixture_Models} +\title{Drews2022_TCGA_Mixture_Models} +\format{ +A list of 5 data.frames +} +\usage{ +data(Drews2022_TCGA_Mixture_Models) +} +\description{ +List of mixture model mean, standard deviation, and weight for each copy number +feature used in Drews 2022 +} +\keyword{datasets} 
diff --git a/man/Drews2022_TCGA_Scaling_Variables.Rd b/man/Drews2022_TCGA_Scaling_Variables.Rd new file mode 100644 index 0000000..abfb522 --- /dev/null +++ b/man/Drews2022_TCGA_Scaling_Variables.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Drews2022_TCGA_Scaling_Variables} +\alias{Drews2022_TCGA_Scaling_Variables} +\title{Drews2022_TCGA_Scaling_Variables} +\format{ +A list of 2 numeric vectors of length 17 +} +\usage{ +data(Drews2022_TCGA_Scaling_Variables) +} +\description{ +List of mean and scaling factors for signatures used in Drews 2022 +} +\keyword{datasets} diff --git a/man/Drews2022_TCGA_Signature_Thresholds.Rd b/man/Drews2022_TCGA_Signature_Thresholds.Rd new file mode 100644 index 0000000..0aab687 --- /dev/null +++ b/man/Drews2022_TCGA_Signature_Thresholds.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Drews2022_TCGA_Signature_Thresholds} +\alias{Drews2022_TCGA_Signature_Thresholds} +\title{Drews2022_TCGA_Signature_Thresholds} +\format{ +A numeric vector of length 17 +} +\usage{ +data(Drews2022_TCGA_Signature_Thresholds) +} +\description{ +Numeric vector containing signature-specific thresholds used in activity +calculations used in Drews 2022 +} +\keyword{datasets} diff --git a/man/Drews2022_TCGA_Signatures.Rd b/man/Drews2022_TCGA_Signatures.Rd new file mode 100644 index 0000000..a628e41 --- /dev/null +++ b/man/Drews2022_TCGA_Signatures.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Drews2022_TCGA_Signatures} +\alias{Drews2022_TCGA_Signatures} +\title{Drews2022_TCGA_Signatures} +\format{ +A 17 by 43 numeric matrix +} +\usage{ +data(Drews2022_TCGA_Signatures) +} +\description{ +Signature-by-component matrix for 17 derived signatures as used in Drews 2022 +} +\keyword{datasets} diff --git 
a/man/ExpQuant-class.Rd b/man/ExpQuant-class.Rd new file mode 100644 index 0000000..1577bd3 --- /dev/null +++ b/man/ExpQuant-class.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllClasses.R +\docType{class} +\name{ExpQuant-class} +\alias{ExpQuant-class} +\alias{ExpQuant} +\title{ExpQuant object} +\description{ +ExpQuant object +} +\section{Slots}{ + +\describe{ +\item{\code{experimentName}}{character} + +\item{\code{init.date}}{character} + +\item{\code{last.modified}}{character} + +\item{\code{samples.full}}{numeric} + +\item{\code{samples.current}}{numeric} + +\item{\code{build}}{character} + +\item{\code{feature.method}}{character} +}} + diff --git a/man/Macintyre2018_OV_Mixture_Models.Rd b/man/Macintyre2018_OV_Mixture_Models.Rd new file mode 100644 index 0000000..817c62c --- /dev/null +++ b/man/Macintyre2018_OV_Mixture_Models.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Macintyre2018_OV_Mixture_Models} +\alias{Macintyre2018_OV_Mixture_Models} +\title{Macintyre2018_OV_Mixture_Models} +\format{ +A list of 6 data.frames +} +\usage{ +data(Macintyre2018_OV_Mixture_Models) +} +\description{ +List of mixture model mean, standard deviation, and weight for each copy number +feature used in Macintyre 2018 +} +\keyword{datasets} diff --git a/man/Macintyre2018_OV_Signatures.Rd b/man/Macintyre2018_OV_Signatures.Rd new file mode 100644 index 0000000..8ea0db5 --- /dev/null +++ b/man/Macintyre2018_OV_Signatures.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Macintyre2018_OV_Signatures} +\alias{Macintyre2018_OV_Signatures} +\title{Macintyre2018_OV_Signatures} +\format{ +A 7 by 36 numeric matrix +} +\usage{ +data(Macintyre2018_OV_Signatures) +} +\description{ +Signature-by-component matrix for 7 derived signatures as used in Macintyre 2018 +} 
+\keyword{datasets} diff --git a/man/Macintyre2018_OV_Signatures_normalised.Rd b/man/Macintyre2018_OV_Signatures_normalised.Rd new file mode 100644 index 0000000..f6c8ee6 --- /dev/null +++ b/man/Macintyre2018_OV_Signatures_normalised.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{Macintyre2018_OV_Signatures_normalised} +\alias{Macintyre2018_OV_Signatures_normalised} +\title{Macintyre2018_OV_Signatures_normalised} +\format{ +A 7 by 36 numeric matrix +} +\usage{ +data(Macintyre2018_OV_Signatures_normalised) +} +\description{ +Signature-by-component matrix for 7 derived signatures as used in Macintyre 2018 +where column sums are normalised to 1. +} +\keyword{datasets} diff --git a/man/SigQuant-class.Rd b/man/SigQuant-class.Rd new file mode 100644 index 0000000..9dbc4d1 --- /dev/null +++ b/man/SigQuant-class.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllClasses.R +\docType{class} +\name{SigQuant-class} +\alias{SigQuant-class} +\alias{SigQuant} +\title{SigQuant} +\description{ +SigQuant +} +\section{Slots}{ + +\describe{ +\item{\code{activities}}{list} + +\item{\code{signature.model}}{character} + +\item{\code{backup.signatures}}{matrix} + +\item{\code{backup.thresholds}}{numeric} + +\item{\code{backup.scale}}{list} + +\item{\code{backup.scale.model}}{character} +}} + diff --git a/man/addsampleFeatures.Rd b/man/addsampleFeatures.Rd new file mode 100644 index 0000000..cbf932e --- /dev/null +++ b/man/addsampleFeatures.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/addSampleFeatures.R +\name{addsampleFeatures} +\alias{addsampleFeatures} +\title{addsampleFeatures} +\usage{ +addsampleFeatures(object, sample.data = NULL, id.col = "sample") +} +\arguments{ +\item{object}{CNQuant or SigQuant class object} + +\item{sample.data}{data.frame containing sample-level variables} + 
+\item{id.col}{column containing sample identifiers} +} +\value{ +CNQuant or SigQuant object with updated samplefeatData +} +\description{ +Adds custom sample-level data to the samplefeatData field of a CNQuant or SigQuant object. +This can be additional sample information (purity, tumour type, etc.) that can +be used in downstream analysis. +} diff --git a/man/calculateActivity-methods.Rd b/man/calculateActivity-methods.Rd new file mode 100644 index 0000000..99da03c --- /dev/null +++ b/man/calculateActivity-methods.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/calculateActivity.R +\docType{methods} +\name{calculateActivity} +\alias{calculateActivity} +\alias{calculateActivity,CNQuant-method} +\title{calculateActivity} +\usage{ +calculateActivity(object, method = "drews") + +\S4method{calculateActivity}{CNQuant}(object, method = NULL) +} +\arguments{ +\item{object}{CNQuant object} + +\item{method}{Determines the mixture components used to calculate sum-of-posterior probabilities. Default is "drews".} +} +\value{ +A SigQuant class object with four activity matrices stored in the "activities" slot +} +\description{ +Calculates and returns signature activities in a SigQuant object. Works best when called after calculateSampleByComponentMatrix. \cr \cr +The output of this function is a list of four matrices: the raw signature activities, the normalised activities, the normalised and +thresholded signature activities, and the normalised, thresholded and scaled activities, with the scaling factors obtained from the TCGA cohort. 
+} diff --git a/man/calculateFeatures-methods.Rd b/man/calculateFeatures-methods.Rd new file mode 100644 index 0000000..763aca8 --- /dev/null +++ b/man/calculateFeatures-methods.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/calculateFeatures.R +\docType{methods} +\name{calculateFeatures} +\alias{calculateFeatures} +\alias{calculateFeatures,CNQuant-method} +\title{calculateFeatures} +\usage{ +calculateFeatures(object, method = "drews", smooth.diploid = TRUE, cores = 1) + +\S4method{calculateFeatures}{CNQuant}(object, method = NULL, smooth.diploid = TRUE, cores = 1) +} +\arguments{ +\item{object}{CNQuant object} + +\item{method}{Method to extract copy number features. Default is "drews".} + +\item{smooth.diploid}{Binary variable indicating whether segments close to 2 should be collapsed to 2 and merged together. Default is TRUE.} + +\item{cores}{Number of CPU threads/cores to utilise via doParallel. Default is 1.} +} +\value{ +A CNQuant class object with extracted features stored in the "featData" slot +} +\description{ +Extracts and returns copy number features from copy number profiles in a CNQuant object. 
+} diff --git a/man/calculateSampleByComponentMatrix-methods.Rd b/man/calculateSampleByComponentMatrix-methods.Rd new file mode 100644 index 0000000..12fcafe --- /dev/null +++ b/man/calculateSampleByComponentMatrix-methods.Rd @@ -0,0 +1,24 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, +% R/calculateSampleByComponentMatrix.R +\docType{methods} +\name{calculateSampleByComponentMatrix} +\alias{calculateSampleByComponentMatrix} +\alias{calculateSampleByComponentMatrix,CNQuant-method} +\title{calculateSampleByComponentMatrix} +\usage{ +calculateSampleByComponentMatrix(object, method = "drews") + +\S4method{calculateSampleByComponentMatrix}{CNQuant}(object, method = NULL) +} +\arguments{ +\item{object}{CNQuant object} + +\item{method}{Determines the mixture components used to calculate sum-of-posterior probabilities. Default is "drews".} +} +\value{ +A CNQuant class object with sum-of-posterior probabilities stored in the "featFitting" slot +} +\description{ +Calculates and returns a sample-by-component matrix from copy number features in a CNQuant object. 
+} diff --git a/man/clinPredictionDenovo-methods.Rd b/man/clinPredictionDenovo-methods.Rd new file mode 100644 index 0000000..8d5d9ce --- /dev/null +++ b/man/clinPredictionDenovo-methods.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/clinPredictionDenovo.R +\docType{methods} +\name{clinPredictionDenovo} +\alias{clinPredictionDenovo} +\alias{clinPredictionDenovo,SigQuant-method} +\title{clinPredictionDenovo} +\usage{ +clinPredictionDenovo(object, sampTrain, sigsTrain) + +\S4method{clinPredictionDenovo}{SigQuant}(object, sampTrain, sigsTrain) +} +\arguments{ +\item{object}{SigQuant object} + +\item{sampTrain}{Vector of sample names that should be used for training the classifier.} + +\item{sigsTrain}{Vector with two signature names on which the prediction should be based.} +} +\value{ +A vector with "Signature <1> higher" or "Signature <2> higher" for all samples in the input object. +} +\description{ +The function takes signature activities based on the Drews et al. methodology and predicts patient response based on +a user-specified pair of signatures. \cr \cr +The user should supply a vector of samples for training purposes. The function then trains the classifier on these samples before applying it to all samples and returns the labels. 
+} diff --git a/man/clinPredictionPlatinum-methods.Rd b/man/clinPredictionPlatinum-methods.Rd new file mode 100644 index 0000000..9b4fc14 --- /dev/null +++ b/man/clinPredictionPlatinum-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/clinPredictionPlatinum.R +\docType{methods} +\name{clinPredictionPlatinum} +\alias{clinPredictionPlatinum} +\alias{clinPredictionPlatinum,SigQuant-method} +\title{clinPredictionPlatinum} +\usage{ +clinPredictionPlatinum(object) + +\S4method{clinPredictionPlatinum}{SigQuant}(object) +} +\arguments{ +\item{object}{SigQuant object} +} +\value{ +A vector with "Predicted sensitive" or "Predicted resistant" for all samples in the input object. +} +\description{ +The function takes signature activities based on Drews et al. methodology and predicts patient's response to platinum-based chemotherapies. +} diff --git a/man/createCNQuant.Rd b/man/createCNQuant.Rd new file mode 100644 index 0000000..7ae5c69 --- /dev/null +++ b/man/createCNQuant.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/createCNQuant.R +\name{createCNQuant} +\alias{createCNQuant} +\title{createCNQuant} +\usage{ +createCNQuant( + data = NULL, + experimentName = "defaultExperiment", + build = "hg19" +) +} +\arguments{ +\item{data}{Unrounded absolute copy number data} + +\item{experimentName}{A name for the experiment (default: defaultExperiment)} + +\item{build}{A genome build specified as either hg19 or hg38 (default: hg19)} +} +\value{ +A CNQuant class object +} +\description{ +createCNQuant +} diff --git a/man/gap_hg19.Rd b/man/gap_hg19.Rd new file mode 100644 index 0000000..567c6f8 --- /dev/null +++ b/man/gap_hg19.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{gap_hg19} +\alias{gap_hg19} +\title{gap_hg19} +\format{ +A data frame with 457 rows and 9 variables +} 
+\usage{ +data(gap_hg19) +} +\description{ +Chromosomal banding and position of genomic features in genome build hg19 +} +\keyword{datasets} diff --git a/man/getExperiment-methods.Rd b/man/getExperiment-methods.Rd new file mode 100644 index 0000000..a93a6dc --- /dev/null +++ b/man/getExperiment-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/getExperiment.R +\docType{methods} +\name{getExperiment} +\alias{getExperiment} +\alias{getExperiment,CNQuant-method} +\title{getExperiment} +\usage{ +getExperiment(object) + +\S4method{getExperiment}{CNQuant}(object) +} +\arguments{ +\item{object}{CNQuant object} +} +\value{ +An ExpQuant class object +} +\description{ +Extracts and returns the experiment information (ExpQuant object) from a CNQuant object. +} diff --git a/man/getSampleByComponent-methods.Rd b/man/getSampleByComponent-methods.Rd new file mode 100644 index 0000000..cfc3640 --- /dev/null +++ b/man/getSampleByComponent-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/getSampleByComponent.R +\docType{methods} +\name{getSampleByComponent} +\alias{getSampleByComponent} +\alias{getSampleByComponent,CNQuant-method} +\title{getSampleByComponent} +\usage{ +getSampleByComponent(object) + +\S4method{getSampleByComponent}{CNQuant}(object) +} +\arguments{ +\item{object}{CNQuant object} +} +\value{ +matrix containing the sample-by-component data +} +\description{ +getSampleByComponent +} diff --git a/man/getSamplefeatures-methods.Rd b/man/getSamplefeatures-methods.Rd new file mode 100644 index 0000000..0200456 --- /dev/null +++ b/man/getSamplefeatures-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/getSamplefeatures.R +\docType{methods} +\name{getSamplefeatures} +\alias{getSamplefeatures} +\alias{getSamplefeatures,CNQuant-method} 
+\title{getSamplefeatures} +\usage{ +getSamplefeatures(object) + +\S4method{getSamplefeatures}{CNQuant}(object) +} +\arguments{ +\item{object}{CNQuant object} +} +\value{ +A data.frame +} +\description{ +Extracts sample feature data from a CNQuant object. +} diff --git a/man/getSamples-methods.Rd b/man/getSamples-methods.Rd new file mode 100644 index 0000000..36132e0 --- /dev/null +++ b/man/getSamples-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/getSamples.R +\docType{methods} +\name{getSamples} +\alias{getSamples} +\alias{getSamples,CNQuant-method} +\title{getSamples} +\usage{ +getSamples(object) + +\S4method{getSamples}{CNQuant}(object) +} +\arguments{ +\item{object}{CNQuant object} +} +\value{ +A character vector +} +\description{ +Extracts sample names from a CNQuant object. +} diff --git a/man/getSegments-methods.Rd b/man/getSegments-methods.Rd new file mode 100644 index 0000000..040e36f --- /dev/null +++ b/man/getSegments-methods.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/getSegments.R +\docType{methods} +\name{getSegments} +\alias{getSegments} +\alias{getSegments,CNQuant-method} +\title{getSegments} +\usage{ +getSegments(object) + +\S4method{getSegments}{CNQuant}(object) +} +\arguments{ +\item{object}{CNQuant object} +} +\value{ +A data.frame +} +\description{ +Extracts copy number segment data from a CNQuant object. 
+} diff --git a/man/hg19.chrom.sizes.Rd b/man/hg19.chrom.sizes.Rd new file mode 100644 index 0000000..870831f --- /dev/null +++ b/man/hg19.chrom.sizes.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{hg19.chrom.sizes} +\alias{hg19.chrom.sizes} +\title{hg19.chrom.sizes} +\format{ +A data frame with 24 rows and 2 variables +} +\usage{ +data(hg19.chrom.sizes) +} +\description{ +Chromosomal lengths for genome build hg19 +} +\keyword{datasets} diff --git a/man/plotActivities.Rd b/man/plotActivities.Rd new file mode 100644 index 0000000..14cb03d --- /dev/null +++ b/man/plotActivities.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/plotActivities.R +\name{plotActivities} +\alias{plotActivities} +\title{plotActivities} +\usage{ +plotActivities(object) +} +\arguments{ +\item{object}{A SigQuant class object} +} +\value{ +plot +} +\description{ +Plot the copy number signature activities for a given CNQuant or SigQuant class object +containing copy number signature activities/exposures. The default ordering is by +signature CX1. +} diff --git a/man/plotSampleByComponent.Rd b/man/plotSampleByComponent.Rd new file mode 100644 index 0000000..43718a7 --- /dev/null +++ b/man/plotSampleByComponent.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/plotSampleByComponent.R +\name{plotSampleByComponent} +\alias{plotSampleByComponent} +\title{plotSampleByComponent} +\usage{ +plotSampleByComponent(object = NULL, ...) 
+} +\arguments{ +\item{object}{a CNQuant or SigQuant class object} + +\item{...}{additional parameters passed to \link[stats]{heatmap}} +} +\value{ +plot +} +\description{ +Plots a heatmap of the sample-by-component matrix +} diff --git a/man/plotSegments.Rd b/man/plotSegments.Rd new file mode 100644 index 0000000..007c3a3 --- /dev/null +++ b/man/plotSegments.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/plotSegments.R +\name{plotSegments} +\alias{plotSegments} +\title{plotSegments} +\usage{ +plotSegments(object = NULL, sample = NULL, cn.max = 15) +} +\arguments{ +\item{object}{A CNQuant or SigQuant class object} + +\item{sample}{A vector of length 1 containing either a sample name or sample index} + +\item{cn.max}{Maximum copy number to plot - Values over this are truncated to fit} +} +\value{ +plot +} +\description{ +Plot the segment data for a given sample stored in a CNQuant or SigQuant class object +} diff --git a/man/quantifyCNSignatures-methods.Rd b/man/quantifyCNSignatures-methods.Rd new file mode 100644 index 0000000..75eadb0 --- /dev/null +++ b/man/quantifyCNSignatures-methods.Rd @@ -0,0 +1,37 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AllGenerics.R, R/quantifyCNSignatures.R +\docType{methods} +\name{quantifyCNSignatures} +\alias{quantifyCNSignatures} +\alias{quantifyCNSignatures,data.frame-method} +\title{quantifyCNSignatures} +\usage{ +quantifyCNSignatures( + object, + experimentName = "Default", + method = "drews", + cores = 1 +) + +\S4method{quantifyCNSignatures}{data.frame}( + object, + experimentName = "Default", + method = "drews", + cores = 1 +) +} +\arguments{ +\item{object}{CNQuant object} + +\item{experimentName}{A user-specified name of the experiment} + +\item{method}{The method used for calculating the signature activities. 
Default is "drews"} + +\item{cores}{Number of threads/cores to use for parallel processing} +} +\value{ +A SigQuant class object with four activity matrices stored in the "activities" slot +} +\description{ +This function takes a copy number profile as input and returns signature activities. +} diff --git a/man/sub-CNQuant-character-missing-method.Rd b/man/sub-CNQuant-character-missing-method.Rd new file mode 100644 index 0000000..7325eb0 --- /dev/null +++ b/man/sub-CNQuant-character-missing-method.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/methods-CNQuant.R +\name{[,CNQuant,character,missing-method} +\alias{[,CNQuant,character,missing-method} +\title{Extract} +\usage{ +\S4method{[}{CNQuant,character,missing}(x, i, j, ..., drop = TRUE) +} +\arguments{ +\item{x}{object to extract subset} + +\item{i}{Indices of subset} + +\item{j}{not used} + +\item{...}{not used} + +\item{drop}{not used} +} +\description{ +Extract +} diff --git a/man/sub-CNQuant-numeric-missing-method.Rd b/man/sub-CNQuant-numeric-missing-method.Rd new file mode 100644 index 0000000..820e92b --- /dev/null +++ b/man/sub-CNQuant-numeric-missing-method.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/methods-CNQuant.R +\name{[,CNQuant,numeric,missing-method} +\alias{[,CNQuant,numeric,missing-method} +\title{Extract} +\usage{ +\S4method{[}{CNQuant,numeric,missing}(x, i, j, ..., drop = TRUE) +} +\arguments{ +\item{x}{object to extract subset} + +\item{i}{Indices of subset} + +\item{j}{not used} + +\item{...}{not used} + +\item{drop}{not used} +} +\description{ +Extract +} diff --git a/man/sub-SigQuant-character-missing-method.Rd b/man/sub-SigQuant-character-missing-method.Rd new file mode 100644 index 0000000..5fdb5f7 --- /dev/null +++ b/man/sub-SigQuant-character-missing-method.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/methods-SigQuant.R 
+\name{[,SigQuant,character,missing-method} +\alias{[,SigQuant,character,missing-method} +\title{Extract} +\usage{ +\S4method{[}{SigQuant,character,missing}(x, i, j, ..., drop = TRUE) +} +\arguments{ +\item{x}{object to extract subset} + +\item{i}{Indices of subset} + +\item{j}{not used} + +\item{...}{not used} + +\item{drop}{not used} +} +\description{ +Extract +} diff --git a/man/sub-SigQuant-numeric-missing-method.Rd b/man/sub-SigQuant-numeric-missing-method.Rd new file mode 100644 index 0000000..fc0dd0f --- /dev/null +++ b/man/sub-SigQuant-numeric-missing-method.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/methods-SigQuant.R +\name{[,SigQuant,numeric,missing-method} +\alias{[,SigQuant,numeric,missing-method} +\title{Extract} +\usage{ +\S4method{[}{SigQuant,numeric,missing}(x, i, j, ..., drop = TRUE) +} +\arguments{ +\item{x}{object to extract subset} + +\item{i}{Indices of subset} + +\item{j}{not used} + +\item{...}{not used} + +\item{drop}{not used} +} +\description{ +Extract +} diff --git a/vignettes/.gitignore b/vignettes/.gitignore new file mode 100644 index 0000000..097b241 --- /dev/null +++ b/vignettes/.gitignore @@ -0,0 +1,2 @@ +*.html +*.R diff --git a/vignettes/CINSignaturesQuantification_vignette.Rmd b/vignettes/CINSignaturesQuantification_vignette.Rmd new file mode 100644 index 0000000..d713276 --- /dev/null +++ b/vignettes/CINSignaturesQuantification_vignette.Rmd @@ -0,0 +1,119 @@ +--- +title: "CINSignaturesQuantification_vignette" + +--- +title: "CINSignatureQuantification - Simple and quick measuring of copy number signatures in cancers" +author: "Philip Smith, Ruben Drews, Cancer Research UK" +date: "`r format(Sys.time(), '%d %B, %Y')`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{CINSignaturesQuantification_vignette} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r setup, include=FALSE} +#knitr::opts_chunk$set(echo = TRUE) +``` + 
+```{r load_lib} +#library(CINSignatureQuantification) +``` + +```{r test_data} +#test_data = readRDS("../inst//TCGA_478_Samples_SNP6_GOLD.rds") +#head(test_data) +``` + +### Load data + +```{r load_data} +#myData = createCignatures(data = test_data,experimentName = "TCGA/PCAWG overlap with dCIN") +#myData +``` + +### Class features + +#### experiment class slot + +```{r exp_data} +#getExperiment(object = myData) +``` + +#### Samples + +```{r get_samples} +#samples = getSamples(object = myData) +#samples[1:10] +``` + +#### Segment data + +```{r get_segs} +#segs = getSegments(object = myData) +#head(segs) +``` +#### Sample features + +```{r get_sampleFeats} +#sampleFeats = getSamplefeatures(object = myData) +#head(sampleFeats) +``` + +```{r add_sampleFeats} +#more_sample_data = read.table("data/test_sample_feature_data.tsv", +# header = TRUE, + # sep="\t") +#head(more_sample_data) +#myData = addsampleFeatures(object = myData,sample.data = more_sample_data,id.col = "sample") +#sampleFeats = getSamplefeatures(object = myData) +#head(sampleFeats) +``` +### Object subsetting + +```{r subsetting_data} +#myData[1:10] +``` + +```{r subsetting_data2} +#subSamples = getSamples(myData)[1:20] +#myData[subSamples] +#getExperiment(myData[subSamples]) +``` + +### CN features + +```{r calc_features} +#myData +#myData = calculateFeatures(object = myData,method = "mac") +#myData +#getExperiment(myData) +``` + +```{r subsetting_data3} +#myData = myData[1:100] +#myData +#length(unique(myData@featData$segsize$I)) +``` + +### SamplexComponent matrix +```{r sampCompMat} +#myData = calculateSampleByComponentMatrix(object = myData) +#myData +``` + +```{r sampCompMat_details} +#names(myData@featFitting) +#myData@featFitting$method +#myData@featFitting$model +``` + +```{r subsetting_data4} +#myData = myData[1:50] +#myData +``` + +```{r sxc_plotting} +#plotSampleByComponent(object = myData) +``` +