Merge branch 'master' of https://github.com/biocore/empress into quick-fixes
fedarko committed Oct 8, 2020
2 parents f6ceff3 + 47d03b5 commit 4789630
Showing 18 changed files with 1,163 additions and 173 deletions.
56 changes: 51 additions & 5 deletions README.md
@@ -1,5 +1,5 @@
# Empress
[![Build Status](https://travis-ci.org/biocore/empress.svg?branch=master)](https://travis-ci.org/biocore/empress)
[![Build Status](https://travis-ci.org/biocore/empress.svg?branch=master)](https://travis-ci.org/biocore/empress)

<!---Empress Logo--->

@@ -18,7 +18,10 @@ conda activate qiime2-2020.6

You can replace `qiime2-2020.6` above with whichever version of QIIME 2 you have currently installed.

Now we are ready to install Empress. Run the following commands:
Now we are ready to install Empress. Run the following commands to do so. (Note
that Emperor will be _re-installed_ when the second command is run; the reason
we uninstall it first is so that we can ensure the most up-to-date version of
it is available.)

```
pip uninstall --yes emperor
@@ -177,9 +180,15 @@ Similarly to other tree visualization tools like [iTOL](https://itol.embl.de/),
#### First: a small warning about barplots

Although barplots are very useful for identifying patterns, be wary of
reading too much into them! The ordering of tips / clades on the same level of
the tree is [currently arbitrary](https://github.com/biocore/empress/issues/170),
and this can impact the way barplots look in ways that might not be immediately
reading too much into them! The way the rectangular and circular layouts work
means that a tip that looks "next" to another tip may actually be somewhat far
away from that tip (e.g. in the rectangular layout if one tip is at the top of
its clade, and another tip just "above" it is at the bottom of its clade). An
example of this is shown below with the mustard and lavender clades:

![Example of this phenomenon on the moving pictures dataset](docs/moving-pictures/img/empress_funky_barplot_example.png)

This can impact the way barplots look in ways that might not be immediately
obvious. To quote "Inferring Phylogenies" (Felsenstein 2004), pages 573–574:

> It is worth noting that by reordering tips, you can change the viewer's impression of the closeness of relationships. [...] A little judicious flipping may create a Great Chain of Being marching nicely along the sequence of names, even though the tree supports no such thing.
@@ -322,6 +331,43 @@ This is a good example of when your data can tell you something about your metad

## Additional Considerations

### Providing multiple metadata files

QIIME 2 allows you to specify multiple metadata files at once by just
repeating `--m-feature-metadata-file` (or `--m-sample-metadata-file`). For
example, we may want to visualize feature importances on a tree
in addition to taxonomic annotations:

```bash
qiime empress community-plot \
--i-tree rooted-tree.qza \
--i-feature-table table.qza \
--m-sample-metadata-file sample_metadata.tsv \
--m-feature-metadata-file taxonomy.qza \
--m-feature-metadata-file feature_importance.qza \
--o-visualization empress-tree.qzv
```

However, what QIIME 2 will do internally ([as of writing](https://forum.qiime2.org/t/support-other-metadata-merging-strategies/15907))
is filter the metadata to
_just_ the entries contained in _all_ of the input metadata files. So, in the
example above, if the `feature_importance.qza` file only has entries for a
couple of features (compared to the `taxonomy.qza` file), then the feature
metadata Empress receives will be limited to just the features contained in
both the feature importance and taxonomy metadata files -- which will mean that
less taxonomy information will be available in the Empress interface!

In the interim, the way to get around this (and to include multiple sources of
feature or sample metadata in Empress) is to merge metadata yourself before
creating an Empress visualization. Of course, you'll need to determine what
value(s) to assign to indicate that a given entry is "missing"; for
quantitative metadata, `NaN` or an empty value are both reasonable options.

Merging metadata files should be doable in many different programming
languages or spreadsheet tools; see
[this GitHub issue](https://github.com/biocore/empress/issues/393) for some
example Python code that does this.
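
As a rough illustration of the difference between the intersection-style filtering described above and a manual merge, here is a minimal JavaScript sketch. The function names (`innerMerge`, `outerMerge`, `columnsOf`), the feature IDs, and the column names are all hypothetical. This is not how QIIME 2 or Empress handle metadata internally; it only mimics the behavior on plain objects keyed by feature ID.

```javascript
// Hypothetical metadata tables: feature ID -> { column: value }.
const taxonomy = {
    f1: { Taxon: "k__Bacteria; p__Firmicutes" },
    f2: { Taxon: "k__Bacteria; p__Bacteroidetes" },
    f3: { Taxon: "k__Bacteria; p__Proteobacteria" },
};
const importance = {
    f2: { importance: 0.7 },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Collect every column name used in a table.
function columnsOf(table) {
    const cols = new Set();
    Object.values(table).forEach((row) =>
        Object.keys(row).forEach((c) => cols.add(c))
    );
    return [...cols];
}

// Keep only the IDs present in *both* tables (the filtering described above).
function innerMerge(a, b) {
    const out = {};
    Object.keys(a).forEach((id) => {
        if (has(b, id)) {
            out[id] = { ...a[id], ...b[id] };
        }
    });
    return out;
}

// Keep the IDs present in *either* table, filling absent columns with a
// caller-chosen "missing" value (e.g. NaN for quantitative metadata).
function outerMerge(a, b, missing) {
    const cols = [...columnsOf(a), ...columnsOf(b)];
    const ids = new Set([...Object.keys(a), ...Object.keys(b)]);
    const out = {};
    ids.forEach((id) => {
        const row = { ...(has(a, id) ? a[id] : {}), ...(has(b, id) ? b[id] : {}) };
        cols.forEach((c) => {
            if (!has(row, c)) {
                row[c] = missing;
            }
        });
        out[id] = row;
    });
    return out;
}

const inner = innerMerge(taxonomy, importance);
const outer = outerMerge(taxonomy, importance, NaN);
console.log(Object.keys(inner)); // only f2 survives
console.log(Object.keys(outer)); // f1, f2, and f3 are all kept
```

In this sketch the same `missing` value is used for every column; a real merge would likely want a different placeholder for categorical columns such as taxonomy.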

### Filtered vs. raw table?

When your ordination was created from a subset of your original dataset (e.g. the feature table was rarefied, or certain low-frequency features or samples were otherwise filtered out), we recommend that you carefully consider *which* feature table you would like to visualize in Empress. You can use either:
245 changes: 233 additions & 12 deletions empress/support_files/js/bp-tree.js
@@ -83,13 +83,12 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
this._numTips = new Array(this.size + 1).fill(0);

/**
* @type{Float32Array}
* @type {Array}
* @private
* stores the length of the nodes in preorder. If lengths are not
* provided then lengths will be set to 0.
* Note: lengths are assumed to be smaller that 3.4 * 10^38
* Stores the length of the nodes in preorder. If lengths are not
* provided then lengths will be set to null.
*/
this.lengths_ = lengths ? new Float32Array(lengths) : null;
this.lengths_ = lengths ? lengths : null;

/**
* @type {Uint32Array}
@@ -165,14 +164,24 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
* Stores the order of nodes in an in-order traversal. Elements in this
* array are node ids
*
* Note: In-order is stored because bp-tree doesn't not have
* Note: In-order is stored because bp-tree doesn't have
* an efficient way to convert a node's in-order position to a tree
* index and vice versa, like it does with postorder through
* the use of postorderselect() and postorder(). So it is more
* efficient to cache an in-order tree traversal.
*/
this._inorder = null;

/**
* @type{Array}
* @private
*
* Stores the order of nodes in ascending and descending-leaf-sorted
* postorder traversals. Elements in this array are node IDs.
*/
this._ascendingLeafSorted = null;
this._descendingLeafSorted = null;

/**
* @type {Object}
* @private
@@ -183,6 +192,43 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
this._nameToNodes = {};
}

/**
* Returns an Object describing the minimum, maximum, and average of all
* non-root node lengths.
*
* @return {Object} Contains three keys: "min", "max", and "avg", mapping
* to Numbers representing the minimum, maximum, and
* average non-root node length in the tree.
* @throws {Error} If this tree does not have length information (i.e.
* this.lengths_ is null; this should only happen during
* testing).
*/
BPTree.prototype.getLengthStats = function () {
if (this.lengths_ !== null) {
var min = Number.POSITIVE_INFINITY,
max = Number.NEGATIVE_INFINITY,
sum = 0,
avg = 0;
// non-root lengths should be guaranteed to be nonnegative, and
// at least one non-root length should be positive.
//
// The x = 1, skips the first element (lengths_ is 1-indexed);
// the x < this.lengths_.length - 1 skips the last element
// (corresponding to the root length, which we ignore on purpose)
for (var x = 1; x < this.lengths_.length - 1; x++) {
min = Math.min(min, this.lengths_[x]);
max = Math.max(max, this.lengths_[x]);
sum += this.lengths_[x];
}
avg = sum / (this.size - 1);
return { min: min, max: max, avg: avg };
} else {
throw new Error("No length information defined for this tree.");
}
};
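
To make the indexing in the function above concrete, here is a standalone sketch of the same computation on a plain array. The function name `lengthStats` is hypothetical, and the layout is an assumption taken from the inline comment: index 0 is an unused placeholder and the last entry is the root's length, so the number of non-root nodes is `lengths.length - 2`.

```javascript
// Standalone sketch; assumes lengths[0] is a placeholder and the final
// entry is the root's length, which is deliberately ignored.
function lengthStats(lengths) {
    if (lengths === null) {
        throw new Error("No length information defined for this tree.");
    }
    let min = Number.POSITIVE_INFINITY;
    let max = Number.NEGATIVE_INFINITY;
    let sum = 0;
    // Start at 1 (skip the placeholder) and stop before the last element
    // (skip the root).
    for (let i = 1; i < lengths.length - 1; i++) {
        min = Math.min(min, lengths[i]);
        max = Math.max(max, lengths[i]);
        sum += lengths[i];
    }
    const numNonRoot = lengths.length - 2;
    return { min: min, max: max, avg: sum / numNonRoot };
}

// Three non-root nodes with lengths 1, 2, 3; the trailing 0 is the root.
const stats = lengthStats([null, 1, 2, 3, 0]);
console.log(stats); // { min: 1, max: 3, avg: 2 }
```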

/**
*
* Returns the number of times bit t was observed leading up to bit t
@@ -588,7 +634,7 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
};

/**
* Returns an array of nodes sorted by their inoder position.
* Returns an array of nodes sorted by their inorder position.
*
* Note: empress uses a node's postorder position as its key in _treeData
* so this method will use a node's postorder position to represent
@@ -610,11 +656,7 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
this._inorder.push(this.postorder(curNode));

// append children to stack
var child = this.fchild(curNode);
while (child !== 0) {
nodeStack.push(child);
child = this.nsibling(child);
}
nodeStack = nodeStack.concat(this.getChildren(curNode));
}
return this._inorder;
};
@@ -708,6 +750,185 @@ define(["ByteArray", "underscore"], function (ByteArray, _) {
return numTips;
};

/**
* Returns a list of nodes' postorder positions in the tree, using "leaf
* sorting" -- clades at the same level are sorted by the number of leaves
* they contain.
*
* This (usually) makes the tree look pretty.
*
* This code behaves similarly to this.inOrderNodes(), in that it caches
* the result once computed. Therefore, this function should be faster the
* second time it's called (assuming the sortingMethod is the same).
*
* @param {String} sortingMethod Either "ascending" or "descending". Any
* other values will trigger an error.
* -"ascending": Order clades starting with
* the clade with the fewest tips and ending
* with the clade with the most tips (ties
* broken arbitrarily).
* -"descending": Opposite of "ascending"
* (start with most-tip clades then go down;
* ties broken arbitrarily).
* @return {Array} Array of nodes' postorder positions in the tree, sorted
* as specified.
* @throws {Error} if sortingMethod is not "ascending" or "descending".
*
* REFERENCES
* ----------
* This was translated from this Python code to do leaf-sorted postorder
* traversal over a scikit-bio TreeNode:
* https://github.com/biocore/empress/commit/2b3d90f52fc3118b3641ddc6378d3659abdb0d05
*
* That code was in turn a slightly modified version of scikit-bio's
* postorder tree traversal code:
* https://github.com/biocore/scikit-bio/blob/6ccba4076f1b96843fa2428804cc5a91bf4b76b8/skbio/tree/_tree.py#L1085-L1154
*
* And this functionality was inspired by iTOL's use of it, of course:
* https://itol.embl.de/help.cgi (see the "Leaf sorting:" text)
*/
BPTree.prototype.postorderLeafSortedNodes = function (sortingMethod) {
if (sortingMethod === "ascending") {
if (this._ascendingLeafSorted !== null) {
return this._ascendingLeafSorted;
}
} else if (sortingMethod === "descending") {
if (this._descendingLeafSorted !== null) {
return this._descendingLeafSorted;
}
} else {
throw new Error(
"Unrecognized leaf sorting method " + sortingMethod
);
}
var outputNodes = [];
var childIdxStack = [0];
var rootNode = this.preorderselect(1);
var currNode = rootNode;
var currChildren = this.getSortedChildren(currNode, sortingMethod);
var currChildrenLen = currChildren.length;
var currIdx, currChild;
while (true) {
currIdx = childIdxStack[childIdxStack.length - 1];
// If children left, process them.
if (currIdx < currChildrenLen) {
currChild = currChildren[currIdx];
// If currChild has children, go there.
if (!this.isleaf(currChild)) {
childIdxStack.push(0);
currNode = currChild;
currChildren = this.getSortedChildren(
currNode,
sortingMethod
);
currChildrenLen = currChildren.length;
currIdx = 0;
}
// Otherwise, add this child.
else {
outputNodes.push(this.postorder(currChild));
childIdxStack[childIdxStack.length - 1]++;
}
}
// If no children left, add self and move on to self's parent
else {
outputNodes.push(this.postorder(currNode));
if (currNode === rootNode) {
break;
}
currNode = this.parent(currNode);
currChildren = this.getSortedChildren(currNode, sortingMethod);
currChildrenLen = currChildren.length;
childIdxStack.pop();
childIdxStack[childIdxStack.length - 1]++;
}
}
// Cache the results so we don't have to do all that again
if (sortingMethod === "ascending") {
this._ascendingLeafSorted = outputNodes;
} else {
this._descendingLeafSorted = outputNodes;
}
return outputNodes;
};
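
The traversal order produced by leaf sorting may be easier to see on a small nested-object tree. The sketch below is a recursive simplification, not the iterative, cached implementation above. The node shape (`{ name, children }`) and the helper names `countTips` and `leafSortedPostorder` are hypothetical.

```javascript
// Hypothetical tree representation: each node is { name, children: [...] }.
function countTips(node) {
    if (node.children.length === 0) {
        return 1;
    }
    return node.children.reduce((n, c) => n + countTips(c), 0);
}

// Postorder traversal in which siblings are visited in order of the number
// of tips in their subtrees (ties keep their original order, since
// Array.prototype.sort is stable in modern engines).
function leafSortedPostorder(node, sortingMethod) {
    if (sortingMethod !== "ascending" && sortingMethod !== "descending") {
        throw new Error("Unrecognized leaf sorting method " + sortingMethod);
    }
    const sign = sortingMethod === "descending" ? -1 : 1;
    const sorted = node.children
        .slice()
        .sort((a, b) => sign * (countTips(a) - countTips(b)));
    const out = [];
    sorted.forEach((child) => {
        out.push(...leafSortedPostorder(child, sortingMethod));
    });
    out.push(node.name); // a node is emitted after all of its descendants
    return out;
}

const tip = (name) => ({ name: name, children: [] });
const tree = {
    name: "root",
    children: [
        { name: "A", children: [tip("a1"), tip("a2"), tip("a3")] },
        tip("b"),
        { name: "C", children: [tip("c1"), tip("c2")] },
    ],
};

const ascending = leafSortedPostorder(tree, "ascending");
const descending = leafSortedPostorder(tree, "descending");
console.log(ascending.join(" "));  // b c1 c2 C a1 a2 a3 A root
console.log(descending.join(" ")); // a1 a2 a3 A c1 c2 C b root
```

The recursive form recomputes tip counts on every call, which is fine for a toy example but is one reason a real implementation would precompute them and avoid recursion on large trees.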

/**
* Returns an array containing the children of a node.
*
* The order of the array is based on fchild and nsibling, which I think
* should match what is done in the input Newick file.
*
* If the input node has no children (i.e. it's a leaf / tip), this will
* return an empty Array.
*
* @param {Number} node Index of the node. "Index" here refers to the
* 0-indexed position in the balanced parentheses of
* the opening paren for the node in question, e.g.
* 0 1 2 3 4 5
* ( () () ( () () ) )
* @return {Array} children Array of child indices, specified analogously
* to the node index above. As an example, the
* children of node 3 in the tree above would be
* 4 and 5.
*/
BPTree.prototype.getChildren = function (node) {
var children = [];
var child = this.fchild(node);
while (child !== 0) {
children.push(child);
child = this.nsibling(child);
}
return children;
};
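
The fchild / nsibling loop can be tried out on a toy balanced-parentheses array. The sketch below uses naive linear scans for `close`, `fchild`, and `nsibling`; these free functions are hypothetical stand-ins that take the bit array as an argument, and the real BPTree methods are assumed to be implemented differently (and more efficiently). As in the code above, 0 serves as the "no such node" sentinel, which works because index 0 is the root and can never be a child or a sibling.

```javascript
// 1 = open paren, 0 = close paren. This encodes ( () () ( () () ) ).
const bits = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0];

// Index of the close paren matching the open paren at index i.
function close(bits, i) {
    let depth = 0;
    for (let j = i; j < bits.length; j++) {
        depth += bits[j] === 1 ? 1 : -1;
        if (depth === 0) {
            return j;
        }
    }
    throw new Error("Unbalanced parentheses");
}

// First child: the next position, if it opens a new node. 0 if none.
function fchild(bits, i) {
    return bits[i + 1] === 1 ? i + 1 : 0;
}

// Next sibling: the position right after this node's close paren, if it
// opens a new node. 0 if none.
function nsibling(bits, i) {
    const j = close(bits, i) + 1;
    return j < bits.length && bits[j] === 1 ? j : 0;
}

function getChildren(bits, node) {
    const children = [];
    let child = fchild(bits, node);
    while (child !== 0) {
        children.push(child);
        child = nsibling(bits, child);
    }
    return children;
}

console.log(getChildren(bits, 0)); // [ 1, 3, 5 ]
console.log(getChildren(bits, 5)); // [ 6, 8 ]
console.log(getChildren(bits, 1)); // []
```

Note that the indices here are raw positions in the bit array, so the internal node that opens at position 5 has children at positions 6 and 8.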

/**
* Returns an array containing the children of a node, sorted by the number
* of tips their subtrees contain.
*
* If the input node has no children (i.e. it's a leaf / tip), this will
* return an empty Array.
*
* Ties (e.g. when an internal node's children are all tips, and thus
* "contain" 1 tip) should respect the ordering in the initial Newick file,
* since _.sortBy() is a stable sort: https://underscorejs.org/#sortBy
*
* (That said, we don't make any claims in the UI about this [at least not
* for the ascending/descending leaf sorting options], so it's not a huge
* deal.)
*
* @param {Number} node
* @param {String} sortingMethod Should be one of "ascending" or
* "descending". We don't bother validating
* it at this point -- if another string is
* passed in then the behavior of this
* function is undefined (realistically it'll
* probably just sort things in ascending
* order, which is what it currently does in
* that case, but we can't guarantee that
* won't change).
* @return {Array} children
*/
BPTree.prototype.getSortedChildren = function (node, sortingMethod) {
var scope = this;

var children = this.getChildren(node);
// Define a function mapping a node index to the number of tips its
// subtree contains. This function will be used to sort the array
// of children.
var child2numTips = function (childIdx) {
var numTips = scope.getNumTips(scope.postorder(childIdx));
if (sortingMethod === "descending") {
// Flip things around; sort in descending order. This should be
// quicker than sorting in ascending order and then reversing
// it afterwards.
return -numTips;
} else {
return numTips;
}
};
return _.sortBy(children, child2numTips);
};
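
One consequence of negating the sort key with a stable sort is that "descending" is not simply the reverse of "ascending": tied children keep their original relative order in both cases. The sketch below shows this with a small hand-written stable sort-by-key helper. `sortByKey`, `sortedChildren`, and the `tipCounts` table are hypothetical and stand in for `_.sortBy()` and `getNumTips()`.

```javascript
// Hypothetical tip counts for four sibling clades, listed in input order.
const tipCounts = { A: 3, B: 1, C: 3, D: 2 };

// Stable sort by a numeric key: ties fall back to the original index.
function sortByKey(items, keyFn) {
    return items
        .map((item, idx) => ({ item: item, key: keyFn(item), idx: idx }))
        .sort((a, b) => a.key - b.key || a.idx - b.idx)
        .map((entry) => entry.item);
}

function sortedChildren(children, sortingMethod) {
    return sortByKey(children, (child) =>
        // Negating the key sorts in descending order without a reverse pass.
        sortingMethod === "descending" ? -tipCounts[child] : tipCounts[child]
    );
}

const asc = sortedChildren(["A", "B", "C", "D"], "ascending");
const desc = sortedChildren(["A", "B", "C", "D"], "descending");
console.log(asc);  // [ 'B', 'D', 'A', 'C' ]
console.log(desc); // [ 'A', 'C', 'D', 'B' ]
```

In both outputs A precedes C, so reversing the ascending result would give a different order for the tied clades than sorting by the negated key.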

/**
* True if name is in the names array for the tree
*