
Stratify could infer implied parent nodes? #33

Closed
syntagmatic opened this issue Apr 17, 2016 · 11 comments · Fixed by #185

@syntagmatic

syntagmatic commented Apr 17, 2016

In the existing d3.stratify examples, it seems parent nodes must be included in the original CSV.

Example: http://bl.ocks.org/mbostock/9d0899acb5d3b8d839d9d613a9e1fe04

id,value
flare,
flare.analytics,
flare.analytics.cluster,
flare.analytics.cluster.AgglomerativeCluster,3938

It would be convenient if the first 3 rows in the above weren't required. In other words, this would be sufficient:

id,value
flare.analytics.cluster.AgglomerativeCluster,3938
flare.analytics.cluster.CommunityStructure,3812
flare.analytics.cluster.HierarchicalCluster,6714
flare.analytics.cluster.MergeEdge,743

The nodes flare, flare.analytics, and flare.analytics.cluster would be created automatically in the resulting hierarchy. Currently running the stratify function against the above CSV would throw an error like this:

Error: missing: flare.analytics.cluster

From this line of d3.stratify:

if (!parent) throw new Error("missing: " + nodeId);

@syntagmatic
Author

syntagmatic commented Apr 17, 2016

An alternative basic example that solves the problem with d3.nest or another method would work too.

@mbostock changed the title from "d3.stratify missing parent nodes" to "Stratify could infer implied parent nodes?" on Apr 21, 2016
@mbostock
Member

Yes, you have to specify all parent nodes in the tabular data. There’s no way for d3.stratify to infer the implied parent nodes in the example you gave because it doesn’t understand the semantics of the id: the identifier is opaque. So, this is flat:

id,value
flare.analytics.cluster.AgglomerativeCluster,3938
flare.analytics.cluster.CommunityStructure,3812
flare.analytics.cluster.HierarchicalCluster,6714
flare.analytics.cluster.MergeEdge,743

Just like this:

id,value
foo,3938
bar,3812
baz,6714
qux,743

I couldn’t think of an elegant way to create the implied parent nodes. But maybe d3.stratify could have a mode where it creates implied parent nodes automatically if you opt-in… but you’d need another accessor function to either specify all parents of a particular node (flare.analytics.cluster.AgglomerativeCluster ↦ [flare, flare.analytics, flare.analytics.cluster]), or equivalently an accessor function to compute the parent of an implied parent (flare.analytics.cluster ↦ flare.analytics, flare.analytics ↦ flare, flare ↦ null).
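For dot-delimited ids like the ones in this issue, the two accessor shapes described above might look roughly like this. This is only a sketch: the names `impliedParentId` and `impliedAncestorIds` are hypothetical and not part of d3.stratify, and the "." separator is an assumption taken from the flare example.

```javascript
// Hypothetical helpers, not part of d3. They assume ids are dot-delimited paths.

// Parent of an id: "flare.analytics.cluster" -> "flare.analytics", "flare" -> null.
function impliedParentId(id) {
  const i = id.lastIndexOf(".");
  return i < 0 ? null : id.slice(0, i);
}

// Every ancestor of an id, root first, built by asking for the parent repeatedly.
function impliedAncestorIds(id) {
  const ancestors = [];
  for (let p = impliedParentId(id); p !== null; p = impliedParentId(p)) {
    ancestors.unshift(p);
  }
  return ancestors;
}
```

The second function is derived from the first, which is the sense in which the two accessors are equivalent.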

At any rate, you wouldn’t want to use d3.stratify with d3.nest because d3.nest already returns a hierarchical data structure; you want to use d3.hierarchy and pass a children accessor. I’ll cook up an example for that.

@mbostock
Member

Here’s an example using d3.treemap and d3.nest:

[Image: screen shot 2016-04-20 at 9 36 18 pm]

@syntagmatic
Author

syntagmatic commented Apr 21, 2016

Take a look at the burrow function in this example. The advantage over nest is that each entry can be at an arbitrary depth. The advantage over stratify is that parent nodes need not be specified in advance.

In the tsv file that is the output of du, the parent nodes appear after the children, but that still works just fine. A burrow-like function seems too similar to stratify to be included in d3-hierarchy, but it's a convenient and forgiving way to generate hierarchies from flat files.
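A burrow-like function could be sketched as below. This is not the function from the linked example; the name `nestPaths` and the `path` and `value` fields are assumptions made for illustration. Each row carries a root-first array of path segments, and intermediate nodes are created the first time they are seen, so the order of rows does not matter.

```javascript
// A sketch in the spirit of "burrow", not the code from the linked example.
// Assumed input: rows of the form {path: ["a", "b", ...], value: ...}, root first.
function nestPaths(rows) {
  const root = { name: "root", children: [] };
  for (const row of rows) {
    let node = root;
    for (const segment of row.path) {
      // Reuse the child if it exists, otherwise create the implied node.
      let child = node.children.find(c => c.name === segment);
      if (!child) {
        child = { name: segment, children: [] };
        node.children.push(child);
      }
      node = child;
    }
    // The row describes the node at the end of its path, which may be a
    // parent that was already created implicitly by an earlier row.
    node.value = row.value;
  }
  return root;
}
```

Because the result nests its children under a `children` property, it should be usable with d3.hierarchy and its default children accessor.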

[Image: screen shot 2016-04-21 at 1 14 02 am]

The particular use case I came across was a collection of organisms with partial taxonomy per organism. These are a few sample values from that column (a single column with comma-delimited strings in a TSV):

taxonomy
Proteobacteria, Bacteria
Burkholderiales, Betaproteobacteria, Proteobacteria, Bacteria
Microbacterium testaceum, Microbacterium, Micrococcales, Actinobacteria, Actinobacteria, Bacteria
Novosphingobium sp. MD-1, Novosphingobium, Sphingomonadales, Alphaproteobacteria, Proteobacteria, Bacteria
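Judging from the sample values, each string appears to list the most specific rank first and the root (Bacteria) last. Under that assumption, turning a value into a root-first path could look like this; `taxonomyPath` is a hypothetical name.

```javascript
// Hypothetical helper. Assumes the column is comma-delimited and ordered
// from most specific to least specific, as the sample values suggest.
function taxonomyPath(taxonomy) {
  return taxonomy
    .split(",")
    .map(s => s.trim())
    .filter(s => s.length > 0)
    .reverse(); // root first, so it can be walked from the top down
}
```

Repeated names such as "Actinobacteria, Actinobacteria" are kept as separate levels, since the same name can occur at two ranks.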

@syntagmatic
Author

Looking at #34, it seems like creating implied parent nodes could be problematic with stratify if parents were to appear later in the dataset (like the output of du I posted).

@mbostock
Member

mbostock commented May 3, 2016

This part in your example:

data.forEach(function(row) {
  row.taxonomy = row.file.split("/");
});

Is functionally equivalent to what I was suggesting here with d3.stratify:

you’d need another accessor function to either specify all parents of a particular node (flare.analytics.cluster.AgglomerativeCluster ↦ [flare, flare.analytics, flare.analytics.cluster]), or equivalently an accessor function to compute the parent of an implied parent (flare.analytics.cluster ↦ flare.analytics, flare.analytics ↦ flare, flare ↦ null).

@mbostock
Member

mbostock commented Jan 11, 2017

Ordering shouldn’t be a problem. After a full pass of the input data, you’d have a set of missing parents. You’d then create nodes for each missing parent, and determine the next set of missing parents. When that set is empty, you’d be done.
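The passes described above can be sketched as follows. This is an illustration under assumptions, not d3 code: `addImpliedParents` and the `implied` flag are hypothetical, and `parentOf` is a caller-supplied function that maps an id to its parent id, or to null for the root.

```javascript
// Hypothetical sketch. rows: [{id, ...}]; parentOf(id) -> parent id, or null for the root.
function addImpliedParents(rows, parentOf) {
  const byId = new Map(
    rows.map(r => [r.id, { ...r, parentId: parentOf(r.id) }])
  );

  // Full pass over the input: collect parents that no row provides.
  let missing = new Set();
  for (const node of byId.values()) {
    if (node.parentId !== null && !byId.has(node.parentId)) missing.add(node.parentId);
  }

  // Create a node for each missing parent, then determine the next set of
  // missing parents. Stop when that set is empty.
  while (missing.size > 0) {
    for (const id of missing) {
      byId.set(id, { id, parentId: parentOf(id), implied: true });
    }
    const next = new Set();
    for (const id of missing) {
      const p = byId.get(id).parentId;
      if (p !== null && !byId.has(p)) next.add(p);
    }
    missing = next;
  }
  return Array.from(byId.values());
}
```

Since membership is only checked after the whole input has been indexed, a parent that appears after its children is found in the first pass and is not created twice. The returned rows carry `id` and `parentId` properties, which are what d3.stratify reads by default.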

Three options:

  1. If we required a datum.id, rather than having an arbitrary id accessor function (say, as with the d3.stratifyDot proposed in #75, “d3.stratifyDot? d3.stratifySlash?”), then we could generate an implied parent node of the form {id: …, children: […]}, and pass that to the parentId accessor function. But I don’t see an elegant way of opting in to this functionality, other than maybe stratify.parentImplied(true) and then throwing an error if stratify.id is set to anything other than the default.

  2. If we added a stratify.ancestorIds method, it could return the identifiers of every ancestor, and then we could create the missing ancestors as needed. This would be exclusive with stratify.parentId: setting one would clear the other.

  3. Alternatively, a stratify.ancestorId method, which takes an identifier rather than a datum, and this is called repeatedly to determine the implied ancestors. This would also be exclusive with stratify.parentId: setting one would clear the other. It’s arguably confusing that this method has a different signature than stratify.parentId, but it seems reasonable…
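To make option 2 concrete, here is a rough sketch of how an ancestor-ids accessor could be used to create missing ancestors. Nothing here is an existing d3 API: `rowsFromAncestorIds` is a hypothetical standalone function, and `ancestorIds` is assumed to take a datum and return all of its ancestor ids, root first.

```javascript
// Hypothetical sketch of option 2, written as a standalone function.
// id(d) -> the datum's id; ancestorIds(d) -> ids of all ancestors, root first.
function rowsFromAncestorIds(data, id, ancestorIds) {
  const rows = new Map();
  for (const d of data) {
    const chain = ancestorIds(d);
    chain.forEach((ancestorId, i) => {
      // Create an ancestor only if no row (explicit or implied) exists yet.
      if (!rows.has(ancestorId)) {
        rows.set(ancestorId, {
          id: ancestorId,
          parentId: i > 0 ? chain[i - 1] : null,
          data: null
        });
      }
    });
    // The datum itself always wins over an implied placeholder with the same id.
    const parentId = chain.length > 0 ? chain[chain.length - 1] : null;
    rows.set(id(d), { id: id(d), parentId, data: d });
  }
  return Array.from(rows.values());
}
```

Option 3 differs only in where the chain comes from: instead of one call per datum returning the whole list, a function of the identifier would be called repeatedly until it returns null.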

@mbostock
Member

mbostock commented Jan 11, 2017

If stratify.ancestorId is used, it’d be nice if we could automatically set a node.name, too. For example, if the id is flare.analytics.cluster.AgglomerativeCluster and the ancestor of this is flare.analytics.cluster, the implied local name should be AgglomerativeCluster. But, I don’t see a general way of inferring the name given an id and an ancestor id—I suppose there’s no requirement that the ancestor id is a prefix of the id, and there’s certainly no requirement that a single character is used as a separator.
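The special case where a name can be inferred is easy to sketch: when the ancestor id followed by a known separator is a prefix of the id, the remainder is the local name, and otherwise there is nothing to infer. `localName` is a hypothetical helper and the "." default separator is an assumption.

```javascript
// Hypothetical helper. Returns the local name when ancestorId + separator is a
// prefix of id, the id itself for a root (null ancestor), and null otherwise.
function localName(id, ancestorId, separator = ".") {
  if (ancestorId === null) return id;
  const prefix = ancestorId + separator;
  return id.startsWith(prefix) && id.length > prefix.length
    ? id.slice(prefix.length)
    : null;
}
```

The null result is the general case described above: without a prefix relationship and an agreed separator, the id and the ancestor id carry no information about a name.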

Maybe I’m trying too hard to build this into d3.stratify, rather than creating something like d3.stratify that requires that the hierarchy structure be derived solely from delimiter-separated strings. And in that case, we can compute the implied parents and local names automatically.

@timelyportfolio

timelyportfolio commented Jul 21, 2017

Would something like this hastily assembled flattree be useful here? @syntagmatic, this example demonstrates how parent levels are inferred when not explicitly provided. In this example, parents are explicitly specified.

I do understand though that the expected input structure is not sparse, and for this reason flattree might be better as a standalone entity.

@interwebjill

Eagerly looking forward to this solution.

@roveo

roveo commented Jul 20, 2018

Maybe I’m trying too hard to build this into d3.stratify, rather than creating something like d3.stratify that requires that the hierarchy structure be derived solely from delimiter-separated strings. And in that case, we can compute the implied parents and local names automatically.

I think it can go both ways: provide a general stratify.ancestorIds method and then a custom method for working with character-separated prefixed identifiers.

This was referenced Oct 24, 2021