man/PACA.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/paca.R
\name{paca}
\alias{paca}
\title{Phenotype Aware Components Analysis (PACA)}
\usage{
paca(
  X,
  Y,
  k = NULL,
  scale = FALSE,
  rank = 5,
  thrsh = 10,
  ccweights = FALSE,
  info = 1
)
}
\arguments{
\item{X}{\eqn{m} by \eqn{n_1} matrix, where \eqn{m > n_1}; \cr
         Case (foreground) input data matrix. \cr

         Note: this input data needs to be scaled along the samples axis before being provided as input.
         This preprocessing can be done using the \code{\link{transformCCAinput}} function.}

\item{Y}{\eqn{m} by \eqn{n_0} matrix, where \eqn{m > n_0}; \cr
Control (background) input data matrix. \cr
Note: this input data needs to be scaled along the samples axis before being provided as input.
This preprocessing can be done using the \code{\link{transformCCAinput}} function.}

\item{k}{Positive integer, optional (default: \eqn{NULL}); \cr
Number of dimensions, \eqn{k}, of shared variation to be removed from the case data \code{X}. \cr
When \eqn{k = NULL} (default), \eqn{k} is inferred automatically, i.e., autoPACA is run.}

\item{scale}{bool, optional (default: \eqn{FALSE}); \cr
If \eqn{TRUE}, normalize (center and scale) each matrix column-wise.}

\item{rank}{Positive integer, optional (default: \eqn{5}); \cr
Number of dominant principal components to be computed for the corrected case data.}

\item{thrsh}{Positive real value, optional (default: \eqn{10}); \cr
Threshold for the maximum ratio between the variance of the PCs of the \emph{PACA} corrected \code{X} and the variance they explain in \code{Y},
used to detect residual shared variation in \code{X}.}

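\item{ccweights}{bool, optional (default: \eqn{FALSE}); \cr
If \eqn{TRUE}, additionally return the CCA loadings (\code{A}, \code{B}) and canonical variables (\code{U}, \code{V}) alongside the \emph{PACA} corrected case data (\code{Xtil}).}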

\item{info}{Integer, optional (default: \eqn{1}); \cr
Verbosity level for the log generated. \cr
0: Errors and warnings only \cr
1: Basic informational messages \cr
2: More detailed informational messages \cr
3: Debug mode, all informational log is dumped}
}
\value{
By default, \code{paca} returns a list containing the following components:
\describe{
   \item{Xtil}{     \eqn{m} by \eqn{n_1} matrix; \cr
                    the \emph{PACA} corrected case data, i.e., the data with the case-specific variation only.
   }
   \item{U0}{       \eqn{m} by \eqn{k} matrix; \cr
                    the \emph{PACA} shared components that are removed from \eqn{X}.
   }
   \item{x}{        \eqn{n_1} by \eqn{rank} matrix; \cr
                    the projections / scores of the \emph{PACA} corrected case data (\code{Xtil}).
   }
   \item{rotation}{  \eqn{m} by \eqn{rank} matrix; \cr
                     the rotation (eigenvectors)  of the \emph{PACA} corrected case data (\code{Xtil}).
   }
   \item{k}{         integer; \cr
                     the number of shared components removed.
   }
}

When \eqn{ccweights = TRUE}, \code{paca} returns a list containing the CCA directions and variates along with the \emph{PACA} principal components:
\describe{
   \item{Xtil}{     \eqn{m} by \eqn{n_1} matrix; \cr
                    the \emph{PACA} corrected case data, i.e., the data with the case-specific variation only.
   }
   \item{U0}{       \eqn{m} by \eqn{k} matrix; \cr
                    the \emph{PACA} shared components that are removed from \eqn{X}.
   }
   \item{x}{        \eqn{n_1} by \eqn{rank} matrix; \cr
                    the projections / scores of the \emph{PACA} corrected case data (\code{Xtil}).
   }
   \item{rotation}{\eqn{m} by \eqn{rank} matrix; \cr
                     the rotation (eigenvectors)  of the \emph{PACA} corrected case data (\code{Xtil}).
   }
   \item{k}{        integer; \cr
                    the number of shared components removed.
   }
   \item{A}{       the loadings for \eqn{X}
   }
   \item{B}{       the loadings for \eqn{Y}
   }
   \item{U}{       canonical variables of \eqn{X}, calculated by column centering \eqn{X} and projecting it on \eqn{A}
   }
   \item{V}{       canonical variables of \eqn{Y}, calculated by column centering \eqn{Y} and projecting it on \eqn{B}
   }
}
}
\description{
Phenotype Aware Components Analysis (PACA) is a
contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of
subphenotypic variation. Given case-control data of any modality, PACA highlights the dominant variation in a
subspace that is not affected by background variation as a putative representation of phenotypic heterogeneity. It does so by
removing the top \code{k} components of shared variation from the cases (or foreground) \code{X}.
In the context of complex disease, PACA learns a gradient of variation unique to cases \code{X} in
a given dataset, while using control samples \code{Y} to account for variation and imbalances in biological
and technical confounders between cases and controls.
}
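\examples{
## Illustrative sketch only, NOT the package implementation. It mirrors the
## idea in the Description using base R: run CCA between cases and controls
## (features as rows), remove the top k shared components from the cases,
## then run PCA on what remains. The simulated data, the variable names
## (S, Xc, pc, r), and the use of stats::cancor(), qr.resid(), and
## stats::prcomp() are assumptions made for illustration; paca() may
## differ in its details (e.g. how k is inferred when k = NULL).
set.seed(1)
m  <- 200   # number of features (rows)
n1 <- 30    # number of case samples (columns of X)
n0 <- 40    # number of control samples (columns of Y)
k  <- 2     # number of shared components to remove
r  <- 3     # number of PCs to keep (plays the role of 'rank')

## Simulated data: k shared factors plus independent noise.
S <- matrix(rnorm(m * k), m, k)
X <- S \%*\% matrix(rnorm(k * n1), k, n1) + matrix(rnorm(m * n1), m, n1)
Y <- S \%*\% matrix(rnorm(k * n0), k, n0) + matrix(rnorm(m * n0), m, n0)

## CCA with the m features treated as observations (hence m > n1, m > n0).
cc <- stats::cancor(X, Y)          # centers the columns of X and Y
Xc <- sweep(X, 2, cc$xcenter)      # column-centered X
U0 <- Xc \%*\% cc$xcoef[, seq_len(k), drop = FALSE]   # m x k shared components

## Remove the span of U0 from the cases, then run PCA on the residual.
Xtil <- qr.resid(qr(U0), Xc)       # m x n1 corrected case data
pc   <- stats::prcomp(t(Xtil), rank. = r)

stopifnot(all(dim(Xtil) == c(m, n1)),
          all(dim(pc$x) == c(n1, r)),          # scores
          all(dim(pc$rotation) == c(m, r)))    # rotation (eigenvectors)

## The corresponding package call, assuming X and Y have already been
## preprocessed as described under Arguments:
\dontrun{
res <- paca(X, Y, k = 2, rank = 3)
dim(res$Xtil)
}
}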