Next generation Tile Reduce #46

Merged: 70 commits, Nov 21, 2015

Conversation

@mourner (Member) commented Oct 30, 2015

An early version of the next-generation Tile Reduce that is thousands of times faster than the current one. This is not a joke.

For example, generating the GeoJSON road difference between OSM and TIGER 2015 for US West at z12 (~26% of the US) takes 2 minutes 45 seconds on my MacBook Pro; previously it took a day on a much more powerful EC2 c3.4xlarge instance while eating all of the available RAM.

The new architecture also makes Tile Reduce much more flexible and loosely coupled, e.g. easy to extend for raster support, weird tile formats, lazy decoding, etc.

I'll write a devlog about how I did this, but the main idea behind the rewrite is removing almost all communication between the main thread and the workers and avoiding unnecessary work:

  • responsibility for fetching and parsing tiles moves fully to the workers
  • vector tiles are not parsed and converted to GeoJSON by default; instead, VT layer objects are passed to the workers for lazy decoding (parsing only the features that are relevant to the current task)
  • big GeoJSON output like the road difference is never passed back to the main thread; instead, it's streamed directly from the workers to stdout; the reduce event is only for smaller values like statistics

This somewhat breaks API compatibility, but that's something we have to do to support the new architecture. Upgrading existing processors should be easy once we finalize the work; a rough sketch of what a worker script could look like under this model follows.
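To make that concrete, here is a rough sketch of a worker-side map script under this model. Everything in it is illustrative: the (data, tile, writeData, done) signature, the per-source layer objects, and the coveredByTiger helper are assumptions made for the example, not the finalized API.

// map.js: hypothetical worker script (illustrative names, not the final API)
'use strict';

module.exports = function (data, tile, writeData, done) {
  // data.osm.roads is assumed to be a lazily decoded vector tile layer;
  // only the features we actually touch get parsed
  var roads = data.osm.roads;
  var missing = 0;

  for (var i = 0; i < roads.length; i++) {
    var road = roads.feature(i); // decode a single feature on demand
    if (!coveredByTiger(road, data.tiger)) {
      // big GeoJSON output goes straight from the worker to stdout,
      // never back through the main thread
      writeData(JSON.stringify(road.toGeoJSON(tile[0], tile[1], tile[2])) + '\n');
      missing++;
    }
  }

  // only small values (statistics here) travel back for the reduce step
  done(null, missing);
};

// placeholder for the actual OSM vs TIGER diff logic
function coveredByTiger(road, tigerLayers) {
  return false;
}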

It also sports a fancy progress bar so that you can feel the unreal performance:

[GIF: tilereduce progress bar]

While all the basic stuff works, this is still a work in progress. Rough list of remaining tasks:

  • clean up and push the new diff processor example (today)
  • get feedback on the API changes
  • code review by the team
  • re-add remote tiles support
  • re-add non-bbox covers
  • streaming tile list input
  • avoid overloading the message queue
  • re-add overzoom (slicing to higher zoom levels than the sources provide)
  • fix the interleaved output issue @tcql
  • re-add tests (in progress) @morganherlocker
  • re-add rate limiting
  • support streaming the tile list from mbtiles directly @mourner
  • documentation & upgrade guide

I would love feedback and help on this. We could win so much with a blazing fast Tile Reduce, so let's make it happen sooner rather than later.

cc @morganherlocker @MateoV @aaronlidman @tcql @tmcw @lxbarth @ericfischer @batpad @Rub21 @geohacker @arunasank @isiyu

@mourner (Member, Author) commented Oct 30, 2015

Here's that US West OSM vs TIGER diff I generated in 3 minutes, shown on a map (disregard the weird boundaries).

@aaronlidman (Contributor)

This sounds fantastic, Vlad. I'll set aside some time this weekend to look at it more in depth.

@tcql (Contributor) commented Oct 30, 2015

Just adding my 👍 / proof: this is way fast. I ran tests using mapbox/osm-sidewalker, and I like it. The API on the map side feels slightly less friendly to newbies (since data comes through as a vector tile instead of GeoJSON), but I think it's advantageous overall.

I'm definitely in to sink some time into helping move this along too.

@mourner (Member, Author) commented Oct 30, 2015

> The API on the map side feels slightly less friendly to newbies (since it comes through as a vector tile instead of GeoJSON), but I think it's advantageous overall.

With the new architecture, it's pretty easy to make an option to pass GeoJSON instead of VT to the map function, and make it the default. It will still be relatively fast.
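As an illustration (the option name and the shape of this helper are assumptions, not the actual implementation), the conversion could live in one small helper that either passes the VT layer through for lazy decoding or eagerly converts it with vector-tile-js's toGeoJSON:

// hypothetical sketch: default to GeoJSON, allow opting into raw VT layers
function prepareLayer(layer, x, y, z, raw) {
  if (raw) return layer; // hand over the VectorTileLayer for lazy decoding

  // default path: eagerly convert every feature for a newbie-friendly API
  var features = [];
  for (var i = 0; i < layer.length; i++) {
    features.push(layer.feature(i).toGeoJSON(x, y, z));
  }
  return {type: 'FeatureCollection', features: features};
}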

@mourner mentioned this pull request Oct 30, 2015
@tcql (Contributor) commented Nov 4, 2015

Added support for tile queue pausing so we don't pass too many messages to the child workers.
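Conceptually it boils down to pausing the tile stream while too many tiles are in flight and resuming it as workers report back. A minimal sketch of the idea (the stream, the worker list, and the limit are all stand-ins, not the code in this PR):

var fork = require('child_process').fork;
var Readable = require('stream').Readable;

// stand-ins: a couple of forked workers and an object-mode stream of [x, y, z] tiles
var workers = [fork('worker.js'), fork('worker.js')]; // 'worker.js' is a placeholder
var tileStream = new Readable({objectMode: true});
tileStream._read = function () {}; // tiles are pushed below
[[654, 1583, 12], [655, 1583, 12]].forEach(function (t) { tileStream.push(t); });
tileStream.push(null);

var PAUSE_LIMIT = 5000; // illustrative; the limit gets tuned later in this thread
var inFlight = 0;
var next = 0;

tileStream.on('data', function (tile) {
  workers[next++ % workers.length].send(tile);
  if (++inFlight >= PAUSE_LIMIT) tileStream.pause();
});

workers.forEach(function (worker) {
  worker.on('message', function () {
    inFlight--;
    if (inFlight < PAUSE_LIMIT) tileStream.resume(); // resuming a flowing stream is harmless
  });
});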

@mourner (Member, Author) commented Nov 5, 2015

@tcql manual limiting broke tile-reduce: if you run the count example now, the total tile count and the results are different each time.

@mourner (Member, Author) commented Nov 5, 2015

Fixed the tile list text stream somehow, but the JS array tile stream is still broken — for some reason it finishes when it hits the 50k limit.

@mourner (Member, Author) commented Nov 5, 2015

Fixed the JS stream as well. Now I wonder whether there would be a benefit in switching to spawn and piping between the processes instead of send, like in the comments here: nodejs/node-v0.x-archive#4374

edit: also this

@mourner (Member, Author) commented Nov 5, 2015

Running count on the z12 world on my MBP, it goes super smoothly and fast for a while (getting to maybe 30% of the world tiles), then suddenly the Node workers' memory starts to grow out of control and the run starts to stutter, eventually filling all available memory and making the terminal unresponsive:

[screenshot: memory usage graph]

The flat green area in the middle is the job at first; then you can see the spike.

I really wonder what might be the cause. We want the runs to have flat memory consumption regardless of the job size.

@mourner (Member, Author) commented Nov 5, 2015

Ha, when I switched the pause limit from 50000 to 5000, the world count went smoothly without memory spikes and finished successfully in under 7 minutes!

2303777 / 2303777 tiles (100%), 401.8s elapsed [===============...
Features total: 436992923

Now let's try OSM coverage...

@mourner (Member, Author) commented Nov 5, 2015

> if we handled process communication with streams, backpressure would stop the problem, but we'd kind of need to do a file readstream => some round-robin selection of which worker to pipe to

Now I see what you mean here @tcql (I'm still a streams noob); let's explore this cool idea. Another benefit, as far as I understand, is that in pre-Node 4 a forked process's send is blocking while spawned-process piping is not, so we could also get some perf gains by avoiding spamming the event loop.
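A sketch of that direction, assuming line-delimited tile input and the binary-split module mentioned later in this thread (file names and the worker count are placeholders; this is not what the PR implements):

var spawn = require('child_process').spawn;
var split = require('binary-split');
var fs = require('fs');

// spawn workers with writable stdin streams instead of using fork() + send()
var workers = [];
for (var i = 0; i < 4; i++) {
  workers.push(spawn('node', ['worker.js'], {stdio: ['pipe', 'inherit', 'inherit']}));
}

// round-robin each tile line onto a worker's stdin; when a worker's stdin buffer
// fills up, write() returns false and we pause the producer until it drains,
// which is plain stream backpressure
var next = 0;
var tiles = fs.createReadStream('tiles.txt').pipe(split());

tiles.on('data', function (line) {
  var stdin = workers[next++ % workers.length].stdin;
  if (!stdin.write(line + '\n')) {
    tiles.pause();
    stdin.once('drain', function () { tiles.resume(); });
  }
});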

@mourner (Member, Author) commented Nov 9, 2015

@tcql nope, replacing isPaused() with paused still breaks everything in Node 0.12.

@morganherlocker (Contributor)

I am working on testing now. I have generated a very small mbtiles fixture weighing in at 1 MB total. I think it will simplify our process if we just check this in for testing, rather than dealing with S3 and caching fixture assets in our CI. For benchmarking (rarer), people can use whatever they deem appropriate, such as the world OSM QA mbtiles file, which is just a curl away.

https://api.mapbox.com/styles/v1/morganherlocker/cigshqusv005xa6ks7iua9b2l.html?title=true&access_token=pk.eyJ1IjoibW9yZ2FuaGVybG9ja2VyIiwiYSI6Ii1zLU4xOWMifQ.FubD68OEerk74AYCLduMZQ#15.4/36.9777/-122.0222/-0.3

[screenshot: map preview of the mbtiles test fixture]

@morganherlocker (Contributor)

I processed an old TIGER road file from 2009 for the same area so we can test data with multiple sources and have the diffing example be runnable without downloading and processing external data. These are published at a stable map ID, so we can use that URL for testing HTTP tile sources.

[screenshot: map preview of the TIGER 2009 test data]

Inline review on the source-loading code:

function loadSource(source, done) {
  if (source.mbtiles) sources.push({name: source.name, getTile: require('./mbtiles')(source, done)});
@morganherlocker (Contributor)

Are we waiting for the ready callback to be called anywhere? For the remote parser it appears that the getTile parser is not actually returned by the time the ready callback fires. I am not sure if this is a bug, but the flow is a bit difficult to follow.

@mourner (Member, Author)

@morganherlocker for the remote parser, ready is called immediately because there is no initialization step: it can receive requests right away, unlike mbtiles, which needs to open a DB connection before it can serve any tiles.

@morganherlocker (Contributor)

I wonder if there is a way to make this an interface where any source can be used the same way, without knowing the implementation details (this could help us with additional plugins down the road). Perhaps instead of returning the getTile function synchronously, we should send it back as the result of the ready callback. I believe that would allow users of the parse interfaces to treat them the same.
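A sketch of what that could look like for the mbtiles source, with getTile handed back through the ready callback (names and paths are illustrative, not the code in this branch):

var MBTiles = require('mbtiles');

// every source module exposes the same shape: ready(err, getTile)
function createSource(source, ready) {
  new MBTiles(source.mbtiles, function (err, db) {
    if (err) return ready(err);
    ready(null, function getTile(tile, callback) {
      // tile is [x, y, z]; parsing the returned buffer into layers is elided here
      db.getTile(tile[2], tile[0], tile[1], callback);
    });
  });
}

// usage is identical for mbtiles, remote tiles, or any future source plugin
createSource({mbtiles: './fixtures/roads.mbtiles'}, function (err, getTile) {
  if (err) throw err;
  getTile([5276, 12757, 15], function (err, data) {
    // getTile only exists once the source is actually ready to serve tiles
  });
});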

@morganherlocker (Contributor)

OK, I am going to proceed with writing tests for this in its current form and may refactor in a separate PR (that requires some refactoring of the worker implementation). The remote interface diverts the execution flow in a confusing way IMO, for example:

var getTile = remote(source, function(err) {
  // I need to wait until I get here to see if there is an error on initialization
  if (err) throw err;
  // the next line fails because getTile has not actually been returned yet, despite the parser being "ready"
  getTile([5276, 12757, 15], function(err, layers) {
    // yay, results!
  });
});

@mourner (Member, Author)

I agree, we should improve the flow here to make things easier to test and extend.

@mourner (Member, Author)

Implemented your suggestion.

@morganherlocker (Contributor)

@mourner it appears we return all layers in this branch, as opposed to only the requested layers in the source object. Could preserving the layers array provide any significant parsing gains (keep in mind most people will want GeoJSON most of the time)? Is selective parsing worth the effort on the VT parsing side?

Current source example:

{
  name: 'streets',
  url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
  layers: ['roads', 'tunnels']
}
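If the layers filter comes back, a minimal sketch of honoring it could look like this (illustrative only; with lazy decoding the win mostly shows up once layers are converted to GeoJSON):

// keep only the layers requested in the source definition; anything not
// listed is never converted or handed to the map script
function pickLayers(vt, wanted) {
  if (!wanted) return vt.layers; // default: expose every layer

  var picked = {};
  wanted.forEach(function (name) {
    if (vt.layers[name]) picked[name] = vt.layers[name];
  });
  return picked;
}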

@morganherlocker (Contributor)

The latest build is passing on Node 0.10 but failing on Node 0.12. I added 0.10 to the Travis config to track this behavior.

The issue is that the decoded vector tile under 0.10 looks like this with a raw data array:

{"buildings":{"version":1,"name":"buildings","extent":2048,"length":264,"_pbf":{"buf":[26,243,128,1,10,9,98,117, [...]

...while the 0.12 version returns a Buffer object with a wrapped data array:

{"buildings":{"version":1,"name":"buildings","extent":2048,"length":264,"_pbf":{"buf":{"type":"Buffer","data":[26,243,128,1,10,9,98,117,105,108,100,105,110,103,115,18,40,18,6,0,0,1,1,2,2,24,3 [...]

My hunch is that this issue is upstream in either the pbf lib or the vector-tile-js lib, but I will need to do some more digging.

@mourner (Member, Author) commented Nov 11, 2015

@morganherlocker regarding layers: I haven't re-added it yet because of lazy pbf decoding, but when we re-add automatic GeoJSON parsing by default, we should definitely also re-add the layers property at the same time for performance gains (but make it optional so that it parses all layers by default).

Regarding VT 0.10 vs 0.12, I'm looking into the issue now.

@mourner (Member, Author) commented Nov 11, 2015

@morganherlocker figured out the VT decoding issue: it happens simply because JSON.stringify behaves differently on Buffer objects:

console.log(JSON.stringify(new Buffer([0xc8,0xe8,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x01])));
[200,232,255,255,255,255,255,255,255,1] # node 0.10
{"type":"Buffer","data":[200,232,255,255,255,255,255,255,255,1]} # node 0.12

You just need to compare objects differently, e.g. using t.deepEqual on objects rather than comparing strings.
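For instance, a minimal tape sketch of the fix (the fixture object is made up; only the nested Buffer matters):

var test = require('tape');

test('decoded tile comparison works on 0.10 and 0.12', function (t) {
  var decoded = {version: 1, extent: 2048, _pbf: {buf: new Buffer([26, 243, 128, 1])}};
  var expected = {version: 1, extent: 2048, _pbf: {buf: new Buffer([26, 243, 128, 1])}};

  // fragile: the strings differ across node versions because of Buffer serialization
  // t.equal(JSON.stringify(decoded), JSON.stringify(expected));

  // robust: structural comparison handles the nested Buffer the same everywhere
  t.deepEqual(decoded, expected);
  t.end();
});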

@mourner (Member, Author) commented Nov 12, 2015

Added the ability to omit the area altogether if you use mbtiles sources: if the first source is mbtiles, tile numbers are streamed from it using node-mbtiles ZXYStream. We can update the API later to allow choosing which source to use.
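For reference, a bare-bones sketch of reading that tile list (the mbtiles path is a placeholder; node-mbtiles' createZXYStream emits newline-delimited z/x/y entries):

var MBTiles = require('mbtiles');

new MBTiles('./osm.mbtiles', function (err, src) {
  if (err) throw err;

  var leftover = '';
  src.createZXYStream()
    .on('data', function (chunk) {
      var lines = (leftover + chunk).split('\n');
      leftover = lines.pop(); // keep any partial trailing line for the next chunk
      lines.forEach(function (line) {
        if (line) console.log(line.split('/').map(Number)); // [z, x, y], ready for the tile queue
      });
    })
    .on('end', function () {
      if (leftover) console.log(leftover.split('/').map(Number));
    });
});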

@morganherlocker (Contributor)

I am noticing that binarysplit seems to crash when tile-reduce is run in the context of a test. I am pretty sure the main thread is grabbing TAP data somewhere, but I need to do some more digging.

@mourner (Member, Author) commented Nov 20, 2015

Warning: force-pushed a rebased branch. It merges cleanly now. Also squashed some tiny commits like "fix typo", and all merge commits were removed automatically.

@mourner (Member, Author) commented Nov 20, 2015

The diff script now lives at https://github.com/mourner/road-diff. I'll clean it up before blogging next week.

@tcql (Contributor) commented Nov 20, 2015

Working on adding a short doc section about events so we've got the bases covered.

Edit: 👍 done. @morganherlocker @mourner, review when you get a chance? I think we need to say more under the main Events header, but my brain doesn't want to do words right now.

@mourner (Member, Author) commented Nov 20, 2015

👍 let's merge and publish 3.0.0. We can improve as we go after that.

@morganherlocker merged commit 5fa9ef8 into master on Nov 21, 2015
@morganherlocker deleted the nextgen branch on Nov 21, 2015
@morganherlocker (Contributor)

:shipit:

@lxbarth commented Nov 21, 2015

🚂 !!
