MultiAssayExperiment API

Executive summary

The MultiAssayExperiment class can be used to manage results of diverse assays on a collection of samples. Currently the class can handle assays that are organized as instances of RangedSummarizedExperiment, ExpressionSet, matrix, RangedRaggedAssay (which inherits from GRangesList), and RangedVcfStack (defined in the yriMulti package in bioc-devel). Create new MultiAssayExperiment instances with the eponymous constructor, minimally with the Elist argument and optionally with the pData and sampleMap arguments.

Other data classes can be used in the MultiAssayExperiment, as long as they provide four methods: colnames(), rownames(), [i, j], and dim(). See the Elist section for details on requirements for incorporating new data classes.

Note: For a brief visual summary of the classes and methods involved in the package, enter API(TRUE) after loading the package to launch the API explorer Shiny dashboard.

Overview

The most important class exported by this package is the MultiAssayExperiment, for coordinated representation of multiple experiments on partially overlapping samples, with associated metadata at the level of the entire study and at the level of the "biological unit". The biological unit may be a patient, plant, yeast strain, etc. This package is designed around the following hierarchy of information:

study (highest level). The study can encompass several different types of experiments performed on one set of biological units, for example cancer patients. A MultiAssayExperiment represents a whole study, containing:

metadata about the study as a whole
metadata about each biological unit: for example, age, grade, stage for cancer patients
results from a set of experiments performed on the biological units
a map for matching data from the experiments back to the corresponding biological units.
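The map in the last item is what ties each experiment's own column names back to biological units. Below is a minimal Python sketch of that translation, not the package's R API. It assumes the map is a list of (master, assay, assayname) rows, mirroring the sampleMap columns described under Validity. The sample IDs and the helper name `to_biological_units` are hypothetical.

```python
# Hypothetical sampleMap rows: (master, assay, assayname).
# "master" is the biological unit, "assay" the experiment-specific column
# name, and "assayname" the experiment the column belongs to.
sample_map = [
    ("patient1", "p1_rna", "RNASeq"),
    ("patient2", "p2_rna", "RNASeq"),
    ("patient1", "p1_cnv", "CNV"),
    ("patient3", "p3_cnv", "CNV"),
]

def to_biological_units(sample_map, assayname, colnames):
    """Translate one experiment's column names into biological-unit IDs."""
    lookup = {assay: master
              for master, assay, name in sample_map if name == assayname}
    return [lookup[c] for c in colnames]

# Column order in the experiment does not matter; each column is looked up.
print(to_biological_units(sample_map, "CNV", ["p3_cnv", "p1_cnv"]))
```

Because the lookup is keyed per experiment, two experiments may use unrelated column names for the same patient.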

experiment. A set of assays of a single type performed on some or all of the biological units. An experiment may be performed on only a subset of the biological units, and may be performed in duplicate on some of them. For example, an experiment could be somatic mutation calls for some or all of the biological units.

Data from multiple experiments are stored in a list object called the Elist, which provides flexibility for partially overlapping samples (column names) and features (row names), while keeping samples correctly matched to study-level metadata and to other experiments on the same samples.

Experiments may be ID-based, where measurements are indexed by identifiers of genes, microRNAs, proteins, microbes, etc. Alternatively, experiments may be range-based, where measurements correspond to genomic ranges that can be represented as GRanges objects, such as gene expression or copy number. For ID-based experiments, there is no requirement that the same IDs be present in different experiments. For range-based experiments, there is likewise no requirement that the same ranges be present in different experiments; furthermore, different samples within an experiment may be represented by different ranges. Note, however, that even range-based features must be named, so that genomic features can be referred to by character IDs. The following data classes have so far been tested to work as elements of Elist:

matrix: the most basic class for ID-based datasets, could be used for example for gene expression summarized per-gene, microRNA, metabolomics, or microbiome data.
ExpressionSet: A richer representation for ID-based datasets, could be used for the same types of data as matrix, but storing additional assay-level metadata.
RangedSummarizedExperiment: For rectangular range-based datasets, meaning that one set of genomic ranges is assayed for multiple samples. Could be used for gene expression, methylation, or other data types referring to genomic positions.
RangedRaggedAssay: For non-rectangular (ragged) range-based datasets, meaning that a potentially different set of genomic ranges is assayed for each sample. A typical example is segmented copy number, where copy number alterations are segmented at different genomic locations in each sample.
RangedVcfStack: For VCF archives broken up by chromosome (see the VcfStack class defined in the GenomicFiles package).

samples (lowest level). An individual set of measurements performed on a single biological unit. These measurements must be indexed by character IDs; however, datasets may be ID-based (such as matrix or ExpressionSet) or range-based (such as RangedSummarizedExperiment). In the experimental datasets, columns refer to samples, and rows refer to genomic features that are represented by IDs or ranges.

`MultiAssayExperiment` class

Overview

The MultiAssayExperiment class is the main representation of data from multiple experiments. It contains all the information required to subset experiments and to match sample identifiers with clinical records.

Structure

Elist - slot of class Elist (which extends the SimpleList class from S4Vectors) containing data for each experiment/assay
pData - slot of class DataFrame describing the clinical data available across all experiments
sampleMap - slot of class DataFrame of translatable identifiers of samples and participants
metadata - slot of any class providing additional information about the MultiAssayExperiment object
drops - slot of class list to keep a log of all residuals from subset operations
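The five slots can be pictured as a simple record. Below is a minimal Python sketch of that shape, not the R class itself. It assumes plain dicts and lists stand in for the Elist, the pData DataFrame and the sampleMap DataFrame. The class name `MultiAssaySketch` and the helper `experiments_for` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MultiAssaySketch:
    """Hypothetical stand-in for the MultiAssayExperiment slots."""
    elist: dict        # experiment name -> experiment data object
    pdata: dict        # biological unit ID -> clinical record (pData rows)
    sample_map: list   # (master, assay, assayname) rows
    metadata: Any = None                        # free-form study information
    drops: list = field(default_factory=list)  # log of subset residuals

    def experiments_for(self, unit):
        """Names of experiments holding at least one sample of `unit`."""
        return sorted({name for master, _, name in self.sample_map
                       if master == unit})

mae = MultiAssaySketch(
    elist={"RNASeq": None, "CNV": None},
    pdata={"p1": {"age": 60}, "p2": {"age": 55}},
    sample_map=[("p1", "p1_rna", "RNASeq"),
                ("p2", "p2_rna", "RNASeq"),
                ("p1", "p1_cnv", "CNV")],
)
print(mae.experiments_for("p1"))
```

The sampleMap is the only slot that links the other two. Questions such as "which experiments cover this patient" are answered from it alone.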

Validity

Elist
The length of the Elist should equal the number of unique values in the sampleMap "assayname" column.
Element names of the Elist should be found in the sampleMap "assayname" column.
For each Elist element (say, an element named "assay X"), the colnames of that element must be found in the "assay" column of the sampleMap, within the rows where "assayname" equals the name of that Elist element (in this example, "assay X"). The order does not need to be the same.
pData
The pData slot must be of class DataFrame.
sampleMap - validity checks include checks for consistency between the sampleMap and the pData (primary, or phenotype, data) slot:
All names in the sampleMap "master" column must be found in the rownames of the pData DataFrame.
Within the rows of the sampleMap that share a single value in the "assayname" column, there can be no duplicated values in the "assay" column.

Note. These validity checks apply only when at least the Elist argument is provided at MultiAssayExperiment object creation.
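The checks above can be collected into one function. Below is a minimal Python sketch, not the package's validity method. It assumes the Elist is reduced to a dict of experiment name to column names, pData to its list of rownames, and the sampleMap to (master, assay, assayname) rows. The function name `validate` and the message texts are hypothetical.

```python
from collections import Counter

def validate(elist_colnames, pdata_rownames, sample_map):
    """Return a list of validity problems (empty list means valid)."""
    problems = []
    assaynames = {name for _, _, name in sample_map}
    # Elist length vs. number of unique assaynames
    if len(elist_colnames) != len(assaynames):
        problems.append("Elist length differs from unique assayname count")
    for name, cols in elist_colnames.items():
        # Element names must appear in the assayname column
        if name not in assaynames:
            problems.append(f"{name!r} not in sampleMap assayname column")
            continue
        # Colnames must appear in the assay column for that assayname
        mapped = {assay for _, assay, n in sample_map if n == name}
        missing = [c for c in cols if c not in mapped]
        if missing:
            problems.append(f"{name!r} colnames not in sampleMap: {missing}")
    # Every master must be a pData rowname
    unknown = sorted({m for m, _, _ in sample_map} - set(pdata_rownames))
    for master in unknown:
        problems.append(f"master {master!r} not in pData rownames")
    # No duplicated assay values within one assayname
    counts = Counter((name, assay) for _, assay, name in sample_map)
    for (name, assay), n in counts.items():
        if n > 1:
            problems.append(f"duplicated assay {assay!r} within {name!r}")
    return problems

sm = [("p1", "a1", "X"), ("p2", "a2", "X"), ("p1", "b1", "Y")]
print(validate({"X": ["a2", "a1"], "Y": ["b1"]}, ["p1", "p2"], sm))
```

Returning all problems at once, rather than stopping at the first, matches the S4 convention where a validity method returns a character vector of messages.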

`Elist` class

Overview

The Elist slot and class are the driver for the MultiAssayExperiment class, as they contain the experimental data and sample identifiers. The purpose of the Elist is to store results from a set of experiments as a SimpleList, with one element per experiment performed.

Structure

Elist - inherits from SimpleList with no additions. Contains separate validity checks and a show method.

Validity

Elist elements
For the data class stored in each Elist element, ensure that the methods [ (bracket), colnames, rownames, and dim are defined.
For each Elist element, ensure that every dimension of non-zero length has non-NULL names: non-NULL rownames if there are rows, and non-NULL colnames if there are columns.
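Both element checks can be expressed as duck typing. Below is a minimal Python sketch, not the package's code. It assumes the four methods are called `colnames`, `rownames`, `dim` and `__getitem__` (Python's bracket operator), and that Python's `None` plays the role of R's NULL. The names `check_element` and `Unnamed` are hypothetical.

```python
REQUIRED = ("colnames", "rownames", "dim", "__getitem__")

def check_element(name, obj):
    """Return validity problems for one Elist element (empty if valid)."""
    problems = [f"{name}: missing method {m}" for m in REQUIRED
                if not callable(getattr(obj, m, None))]
    if problems:
        return problems  # dimnames cannot be checked without the methods
    nrow, ncol = obj.dim()
    # None stands in for R's NULL here
    if nrow > 0 and obj.rownames() is None:
        problems.append(f"{name}: {nrow} rows but no rownames")
    if ncol > 0 and obj.colnames() is None:
        problems.append(f"{name}: {ncol} columns but no colnames")
    return problems

class Unnamed:
    """Toy element with 2 rows and 3 columns but no dimnames."""
    def dim(self): return (2, 3)
    def rownames(self): return None
    def colnames(self): return None
    def __getitem__(self, key): return self

print(check_element("x", Unnamed()))
```

An element with zero rows or zero columns passes the dimnames check even when its names are absent. This mirrors the "non-zero length" wording above.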

Rationale

Elist element requirements
The required methods [ (bracket), colnames, rownames, and dim allow for predictable subsetting operations and metadata acquisition.
Standard subsetting by columns or rows matches character vectors to the colnames or rownames, so any Elist element with more than zero columns must have non-NULL colnames, and any element with more than zero rows must have non-NULL rownames.

Any data class that provides the following methods can be used as an element of Elist. RangedSummarizedExperiment provides the template behavior for Elist elements, as follows. These describe template behavior rather than explicit requirements:

colnames(), by returning a character vector of sample identifiers
rownames(), by returning a character vector of feature identifiers
[i, j], by returning the restriction of the instance to rows i and columns j
dim(), by returning an integer vector of length two giving the number of rows and columns
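A class meeting this template needs very little. Below is a minimal Python sketch of a matrix-like, ID-based experiment, not an R class. Following the Rationale above, it subsets by matching character names rather than positions. The class name `NamedMatrix` is hypothetical.

```python
class NamedMatrix:
    """Toy ID-based experiment following the template behavior."""

    def __init__(self, values, rownames, colnames):
        self.values = [list(row) for row in values]
        self._rows = list(rownames)
        self._cols = list(colnames)

    def rownames(self):
        return list(self._rows)   # feature identifiers

    def colnames(self):
        return list(self._cols)   # sample identifiers

    def dim(self):
        return (len(self._rows), len(self._cols))

    def __getitem__(self, key):
        """Restrict to rows i and columns j, both given as lists of names."""
        i, j = key
        ri = [self._rows.index(r) for r in i]
        cj = [self._cols.index(c) for c in j]
        vals = [[self.values[r][c] for c in cj] for r in ri]
        return NamedMatrix(vals, i, j)

m = NamedMatrix([[1, 2, 3], [4, 5, 6]], ["geneA", "geneB"], ["s1", "s2", "s3"])
sub = m[["geneB"], ["s3", "s1"]]
print(sub.dim(), sub.values)
```

The result of `[i, j]` is another instance of the same class. That is what lets subsets be chained and stored back into an Elist.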

`RangedRaggedAssay` class

Overview

The RangedRaggedAssay class is an extension of the GRangesList Bioconductor class, intended mainly to handle segmented copy number data. Its visual representation includes a ragged table in which columns represent samples and rows represent disjoint ranges. The class supports operations such as colnames and rownames, and the assay accessor returns the available experiment metadata columns.
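One way to build such a table is to cut every sample's segments at all segment boundaries, so that each resulting piece is either fully covered by a segment or not at all. Below is a minimal Python sketch of that disjoin-then-tabulate idea, not the class's R implementation. It assumes a single chromosome, closed 1-based intervals, and one value per segment. The names `disjoin` and `ragged_table` are hypothetical here.

```python
def disjoin(segments):
    """Split closed intervals (start, end) into non-overlapping pieces."""
    cuts = sorted({s for s, _ in segments} | {e + 1 for _, e in segments})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Keep only pieces covered by at least one input segment
        if any(s <= lo and hi - 1 <= e for s, e in segments):
            pieces.append((lo, hi - 1))
    return pieces

def ragged_table(assay):
    """assay: dict sample -> list of (start, end, value) on one chromosome.

    Returns the disjoint ranges (rows) and, for each row, the value of each
    sample (column), or None where that sample has no segment.
    """
    all_segments = [(s, e) for segs in assay.values() for s, e, _ in segs]
    rows = disjoin(all_segments)
    table = {}
    for lo, hi in rows:
        table[(lo, hi)] = {}
        for sample, segs in assay.items():
            hit = [v for s, e, v in segs if s <= lo and hi <= e]
            table[(lo, hi)][sample] = hit[0] if hit else None
    return rows, table

rows, table = ragged_table({"s1": [(1, 100, -0.5)], "s2": [(50, 150, 0.8)]})
print(rows)
```

The `None` entries are the "ragged" part. Each sample covers only some of the disjoint rows, unlike a rectangular RangedSummarizedExperiment.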

Optional validity checks for developers

`hasAssay` function

The standard assay functionality allows the user to obtain a numeric matrix of data. The current hasAssay function performs a "soft" check that all classes in an existing MultiAssayExperiment object have an assay method, using the hasMethods function. For convenience, the argument passed to hasAssay can be either a MultiAssayExperiment or a list object.
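The soft check amounts to asking every experiment whether it offers an `assay` method, without calling it. Below is a minimal Python sketch, not the R function. It assumes either a plain list/dict of experiments or any object with an `elist` attribute (standing in for a MultiAssayExperiment) is accepted. The names `has_assay`, `WithAssay`, `WithoutAssay` and `Holder` are hypothetical.

```python
def has_assay(obj):
    """Soft check: True if every experiment exposes a callable `assay`."""
    experiments = getattr(obj, "elist", obj)  # container or bare collection
    if isinstance(experiments, dict):
        experiments = experiments.values()
    return all(callable(getattr(e, "assay", None)) for e in experiments)

class WithAssay:
    def assay(self):
        return [[1.0, 2.0]]

class WithoutAssay:
    pass

class Holder:
    """Hypothetical container with an Elist-like attribute."""
    def __init__(self, elist):
        self.elist = elist

print(has_assay({"a": WithAssay(), "b": WithoutAssay()}))
```

The check is "soft" in that it only inspects method availability. It does not verify that `assay` actually returns a numeric matrix.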

Experimental approach to model and plot specifications

Given a MultiAssayExperiment instance with elements named 'a' and 'b', we propose that formulae can refer to the element names to select assays. The allLM_pw function operates on four basic inputs: a formula, a MultiAssayExperiment instance, and two optional transformation functions. mylms = allLM_pw(a~b, mae) computes all pairwise regressions of features in element a of mae on features in element b of mae. The result is a list with two components: the list of all lm() fits, and a list of t-statistics for the slopes.
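Below is a minimal Python sketch of what "all pairwise regressions" means, not allLM_pw itself. It fits every feature of a (response) on every feature of b (predictor), using only samples present in both, and reports each slope with its ordinary least-squares t-statistic. It assumes each experiment is a dict of feature to {biological unit: value}, with sample IDs already translated through the sampleMap. The optional transformation functions are omitted. The names `slope_t` and `all_pairwise_fits` are hypothetical.

```python
import math

def slope_t(x, y):
    """OLS slope of y on x and its t-statistic (needs >= 3 points)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t

def all_pairwise_fits(a, b):
    """Regress every feature of a on every feature of b over shared samples."""
    results = {}
    for fa, ya in a.items():
        for fb, xb in b.items():
            common = sorted(set(ya) & set(xb))
            x = [xb[s] for s in common]
            y = [ya[s] for s in common]
            results[(fa, fb)] = slope_t(x, y)
    return results

a = {"g1": {"p1": 1.0, "p2": 3.0, "p3": 2.0, "p4": 4.0}}
b = {"m1": {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0, "p5": 9.0}}
print(all_pairwise_fits(a, b))
```

Sample p5 is ignored because it appears only in b. The number of fits grows as the product of the two feature counts, so the full set is best kept for modest feature sets.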

pwplot(a~b, f1~f2, mae) will obtain feature f1 from element a of mae and f2 from element b of mae and form the scatterplot of feature values across all samples common to the two assays.
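Below is a minimal Python sketch of the data step behind such a plot, pairing the two features across samples common to both assays. Plotting itself is left out. By analogy with R formulas, it assumes the response side (f1 from a) goes on the vertical axis and f2 from b on the horizontal axis. It also assumes the same dict-of-dicts layout as a plain illustration. The function name `paired_values` is hypothetical.

```python
def paired_values(a, b, f1, f2):
    """(sample, x, y) triples: x = f2 from b, y = f1 from a, common samples only."""
    ya, xb = a[f1], b[f2]
    common = sorted(set(ya) & set(xb))
    return [(s, xb[s], ya[s]) for s in common]

a = {"g1": {"p1": 1.0, "p2": 2.0}}
b = {"m1": {"p2": 5.0, "p3": 7.0}}
print(paired_values(a, b, "g1", "m1"))
```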

Coming: `MultiAssayView` class

This is not yet part of the official API, but is here as a placeholder.

Overview

The MultiAssayView class represents an initial step in the subsetting operations of the MultiAssayExperiment object. The main purpose of this class is to provide an explicit mechanism for planned, sequential subsetting operations without manipulating the full MultiAssayExperiment object in memory.

Structure

query - Either a character vector or a range-based object
keeps - A list representation of matched queries for each assay in the MultiAssayExperiment
drops - A list representation of unmatched queries to be dropped from the MultiAssayExperiment object
type - An atomic character vector indicating the type of subset to be performed
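Since this class is only a placeholder, the following is a speculative sketch of how the four fields could fit together for a character query against feature IDs. The view records what would be kept and dropped per assay without copying any data. It is written in Python and assumes rownames are available per assay. The names `AssayViewSketch` and `make_view`, and the `"rownames"` type value, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AssayViewSketch:
    """Hypothetical stand-in for MultiAssayView; records a subset, copies no data."""
    query: list   # character query (feature IDs in this sketch)
    keeps: dict   # assay name -> query terms matched in that assay
    drops: dict   # assay name -> query terms not matched in that assay
    type: str     # kind of subset to perform, e.g. "rownames"

def make_view(rownames_by_assay, query):
    """Match a character query against each assay's rownames."""
    keeps, drops = {}, {}
    for name, rows in rownames_by_assay.items():
        present = set(rows)
        keeps[name] = [q for q in query if q in present]
        drops[name] = [q for q in query if q not in present]
    return AssayViewSketch(query, keeps, drops, "rownames")

view = make_view({"RNASeq": ["TP53", "BRCA1"], "CNV": ["TP53"]},
                 ["TP53", "BRCA1", "EGFR"])
print(view.keeps, view.drops)
```

Building the view touches only identifiers. The actual subsetting of each experiment can then be deferred until the view is applied.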

Revision

colnames
rownames
assay
