Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add split treemap algorithm as tile (WIP) #72

Closed

Conversation

timelyportfolio
Copy link

@timelyportfolio timelyportfolio commented Dec 9, 2016

This pull (#71) seeks to add the split algorithm as d3.treemapSplit.

References:

Examples

http://bl.ocks.org/timelyportfolio/2bbd4cedf597eae6a5e90412d9df1c1f
http://bl.ocks.org/timelyportfolio/e480359ea953d96a38a4466187d3d62b

Todo

  • add tests (f73f12b)
  • refactor to more closely mirror d3 style
  • document (60cbfed)

@thomasp85
Copy link

Well, technically my implementation is in ggraph and not ggforce but you're forgiven😊

@timelyportfolio
Copy link
Author

@thomasp85, fixed

@mbostock mbostock self-assigned this Dec 11, 2016
@mbostock
Copy link
Member

This looks a lot like the binary treemap algorithm. I don’t have time at the moment to exactly describe the differences, but it looks to me like even if the output is different, we could reuse optimizations from the binary treemap implementation here. Specifically, that implementation computes the cumulative values of the nodes once (in sums) and then does a binary search to split the current partition. This avoids repeatedly recalculating the sums and doing a linear scan during each recursion; see #44 for context.

@timelyportfolio
Copy link
Author

I started with binary because it looked closest but then went into get-it-working mode. The sums seems like a very easy optimization, so I will try to implement after focusing on the binary implementation.

@timelyportfolio
Copy link
Author

timelyportfolio commented Dec 12, 2016

A missing return caused my problem. In 2c56288 I optimized by avoiding repeated calculation of the sum, since we already know the sum for each split list. I also added a check for non-empty list before recursing splitTree.

Results on a decent size random hierarchy seem to indicate no success.

optimized bl.ock

treemap split took 106.86500000000001 milliseconds.
treemap split took 52.39999999999998 milliseconds.
treemap split took 49.55000000000007 milliseconds.
treemap split took 49.04499999999996 milliseconds.
treemap split took 48.725000000000136 milliseconds.

non-optimized bl.ock

treemap split took 82.88999999999999 milliseconds.
treemap split took 62.835000000000036 milliseconds.
treemap split took 58.96500000000003 milliseconds.
treemap split took 60.08000000000004 milliseconds.
treemap split took 59.70999999999992 milliseconds.

@timelyportfolio
Copy link
Author

timelyportfolio commented Dec 12, 2016

However, if we know parent.value is the sum, then we can send that on the call to splitTree(parent.children, parent.value, ...) and that will avoid another recalculation of the sum. I pursued this optimization in fd832be, and those results seem promising.

treemap split took 26.19999999999999 milliseconds.
treemap split took 21.41500000000002 milliseconds.
treemap split took 12.194999999999993 milliseconds.
treemap split took 15.704999999999927 milliseconds.
treemap split took 10.965000000000032 milliseconds.

@mbostock
Copy link
Member

Regarding the optimization (and please take this with a grain of salt since I’ve only done a cursory look at the code and might not be understanding it correctly): I think you’ll want to avoid calling nodes.slice during each recursive call, too, and instead pass indexes i and j (representing the inclusive start and exclusive end index of the current partition of nodes), as is done in the binary implementation. Also looks like you could compute the initial sum before calling splitTree the first time, rather than checking if sum is null.

@timelyportfolio
Copy link
Author

timelyportfolio commented Dec 12, 2016

@mbostock, is it ok to assume that all hierarchies passed to treemap have been summed? If so, we can entirely eliminate the null check on sum.

In terms of removing the nodes.slice, I am not sure how I can perform the split with just the indexes, but now that split is working again I will revisit binary again specifically looking at the index pieces.

Thanks!

nodes[0].y0 = y0;
nodes[0].y1 = y1;
if (nodes[0].children && nodes[0].children.length > 0) {
splitTree(nodes[0].children, parent.value, x0, y0, x1, y1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it assumes that parent.value is equal to the sum of the children’s values, which won’t be the case if the parent has a self-value.

return;
}

splitTree(parent.children, parent.value, x0, y0, x1, y1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it assumes that parent.value is equal to the sum of the children’s values, which won’t be the case if the parent has a self-value. You’ll need to sum parent.children[i].value before calling splitTree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I wondered about this assumption. It sounds like my assumption could easily be violated in which case speed improvements will be dramatically reduced :( I'll revert back to the original sum handling.

w2 = 0;


if (sum === null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would sum ever be null?

Copy link
Author

@timelyportfolio timelyportfolio Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced the sum argument to splitTree() to avoid duplicating the sum calculation especially in the situation of recursing through the split list where we have already calculated the sum in the prior iteration. However, as you said above, there are cases where we might not know the sum in which we will call splitTree(nodes, null, ...) and expect the sum to be calculated.

@mbostock
Copy link
Member

is it ok to assume that all hierarchies passed to treemap have been summed? If so, we can entirely eliminate the null check on sum.

I’m confused but it’s probably because I am giving you drive-by feedback without looking too closely at the code. Yes, node.sum must be called on root before invoking the treemap. But no, the sum you were computing by adding together the children is not necessarily equal to the parent.value because nodes can have self-values (internal values), which should be visually represented as leftover whitespace for the parent without any overlaid child. My point about the null check was that it looked like the only time splitTree’s sum was null was the first time you called it (before recursing), so you could just compute that sum once when you call splitTree the first time, rather than checking for null on each recursion.

@timelyportfolio
Copy link
Author

timelyportfolio commented Dec 12, 2016

I added a quick speed result in this example for comparison.

Also, I added here bl.ock.

@mbostock, I discovered that treemap itself handles recursion into the children. That was my big disconnect. I removed this portion in 4f05ea2, and speed increased dramatically.

n = nodes.length;

for ( ; i < n; ++i) {
sum += nodes[i].value;
Copy link
Author

@timelyportfolio timelyportfolio Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbostock, should this sum include self value? I think no since I copied this from binary.

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

I think the main thing this is missing is some optimization. I believe that this implementation would benefit from the tricks used by d3.treemapBinary, which are:

(This is the same feedback I gave you in December.)

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

I feel like I still don’t have a grasp as to how this algorithm is different from the binary treemap algorithm. The implementations appear pretty similar… Can you give me a quick summary of how they differ?

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

So, this does appear to produce nearly identical results to the binary treemap algorithm, except the x and y positions are swapped. I added this to your linked example to compare:

data.each(function(d) {
  var x0 = d.x0, y0 = d.y0, x1 = d.x1, y1 = d.y1;
  d.x0 = y0, d.x1 = y1, d.y0 = x0, d.y1 = x1;
});

Still trying to understand why.

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

Okay, I have verified that these two algorithms are in fact identical¹, aside from the flipped x and y!

¹ I discovered a bug in my binary treemap implementation where it uses bisection to find the split point, but sometimes it is off by one because it doesn’t test the adjacent node. The fix will be something like this in binary.js:

if (Math.abs(sums[k - 1] - valueTarget) < Math.abs(sums[k] - valueTarget)) --k;

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

Just to demonstrate, this is the output from the binary treemap with the above fix and x and y swapped:

binary

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

Okay, the x and y were swapped because the binary treemap uses this test:

if ((y1 - y0) > (x1 - x0)) {
  // split vertically
} else {
  // split horizontally
}

Whereas the split treemap uses this test (rewritten slightly for parity):

if ((x1 - x0) > (y1 - y0)) {
  // split horizontally
} else {
  // split vertically
}

The difference is when x1 - x0 = y1 - y0: binary was using horizontal, split was using vertical. In our examples the layout size (and root node size) is a square 960×960. This seems like a nice improvement too since it will favor a vertical partition on square nodes.

@mbostock
Copy link
Member

mbostock commented Mar 2, 2017

Closing as fixed, since these turned out to be the same as binary treemaps. TIL!

@mbostock mbostock closed this Mar 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants