Next generation Tile Reduce #46
Here's that US West OSM-Tiger diff I generated in 3 minutes on a map (disregard the weird boundaries).
This sounds fantastic, Vlad. I'll put aside some time this weekend to look more in-depth.
Just adding my 👍 / proof: this is way fast. I did tests using mapbox/osm-sidewalker. I like it. The API on the … I'm definitely in to sink some time to help move this too.
With the new architecture, it's pretty easy to make an option to pass GeoJSON instead of VT to the map function, and make it the default. It will still be relatively fast.
Added support for tile queue pausing, so we don't pass too many messages to child workers.
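A minimal sketch of how such queue pausing could work, assuming a pausable tile stream and a configurable in-flight limit (`makeThrottle` and the stream shape are illustrative, not the PR's actual code):

```javascript
// Pause the tile stream once `limit` tasks are in flight with workers,
// and resume as soon as a worker reports a result back.
function makeThrottle(stream, limit) {
  var inFlight = 0;
  return {
    sent: function () {
      inFlight++;
      if (inFlight >= limit) stream.pause();
    },
    done: function () {
      inFlight--;
      if (inFlight < limit) stream.resume();
    }
  };
}
```

The caller would invoke `sent()` whenever a tile is dispatched to a worker and `done()` when the worker's result arrives, keeping the queue bounded regardless of job size.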
@tcql manual limiting broke tile-reduce — now if you run the count example, the total tiles and results are different each time.
Fixed the tile list text stream somehow, but the JS array tile stream is still broken — for some reason it finishes when it hits the 50k limit.
Fixed the JS stream as well. Now I wonder if there would be a benefit to trying to switch to using … Edit: also this …
Running count on z12 world on my MBP, it goes super-smooth and fast for a while (getting to maybe 30% of the world tiles), then suddenly node workers' memory starts to grow out of control and it starts to stutter, eventually filling the memory and making the terminal unresponsive. The middle flat green area is the job at first, then you see the spike. I really wonder what the cause might be. We want runs to have flat memory consumption regardless of job size.
Ha, when I switched the pause limit from 50000 to 5000, the world count went smoothly without memory spikes and finished successfully in under 7 minutes!
Now let's try OSM coverage...
Now I see what you mean here @tcql (I'm still a streams noob), let's explore this cool idea. Another benefit is that, as far as I understand, in pre-Node4 a forked process's send is blocking while a spawned process's piping is not, so we could get some perf gains by fully avoiding spamming the event loop.
@tcql nope, replacing …
I am working on testing now. I have generated a very small mbtiles fixture weighing in at 1MB total. I think it will simplify our process to just check this in for testing, rather than dealing with s3 and caching fixture assets in our CI. For benchmarking (rarer), people can use whatever they deem appropriate, such as the world OSM QA mbtiles file, which is just a …
});

function loadSource(source, done) {
  if (source.mbtiles) sources.push({name: source.name, getTile: require('./mbtiles')(source, done)});
Are we waiting for the ready callback to be called anywhere? For the remote parser it appears that the getTile parser is not actually returned by the time the ready callback fires. I am not sure if this is a bug, but the flow is a bit difficult to follow.
@morganherlocker for the remote parser, ready is called immediately because there is no initialization step — it is ready to receive requests right away, unlike mbtiles, which needs to open a db connection before it can work on any tiles.
I wonder if there is a way to make this an interface where any source can be used the same way, without knowing the implementation details (this could help us with additional plugins down the road). Perhaps instead of returning the getTile function synchronously, we should send it back as the result of the ready callback. I believe that would allow users of the parse interfaces to treat them the same.
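A sketch of what that callback-based interface could look like, with a hypothetical `loadRemote` standing in for the remote source (names and the fake tile payload are illustrative, not the PR's code):

```javascript
// Every source loader, whether it needs setup (mbtiles) or not (remote),
// delivers getTile through its callback, so callers treat them identically.
function loadRemote(source, ready) {
  process.nextTick(function () {
    ready(null, function getTile(tile, done) {
      // a real implementation would fetch and decode the tile here
      done(null, {tile: tile});
    });
  });
}

loadRemote({name: 'osm'}, function (err, getTile) {
  if (err) throw err;
  // safe: getTile exists by the time ready fires
  getTile([5276, 12757, 15], function (err, layers) {
    console.log(layers.tile);
  });
});
```

This avoids the confusing flow shown below, where the synchronously returned `getTile` is not yet assigned when ready fires.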
Ok, I am going to proceed with writing tests for this in its current form and may refactor in a separate PR (requires some refactoring of the worker implementation). The remote interface diverts the execution flow in a confusing way IMO, for example:
var getTile = remote(source, function(err) {
  // i need to wait until i get here to see if there is an error on initialization
  if (err) throw err;
  // next line will fail because getTile has not actually returned yet, despite the parser being "ready"
  getTile([5276, 12757, 15], function(err, layers) {
    // yay, results!
  });
});
I agree, we should improve the flow here to make things easier to test and extend.
Implemented your suggestion.
@mourner it appears we return all layers in this branch, as opposed to only the requested layers in the source object. Could preserving the … Current source example:
The latest build is passing on node 0.10, but failing on node 0.12. I added 0.10 to the travis config to track this behavior. The issue is that the decoded vector tile under 0.10 looks like this with a raw data array:
…while the 0.12 version returns a Buffer object with a wrapped data array:
My hunch is that this issue is upstream in either the …
@morganherlocker regarding … Regarding VT 0.10 vs 0.12, I'm looking into the issue now.
@morganherlocker figured out the VT decoding issue — this happens simply because JSON.stringify serializes Buffers differently between node versions:

console.log(JSON.stringify(new Buffer([0xc8,0xe8,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x01])));
[200,232,255,255,255,255,255,255,255,1] # node 0.10
{"type":"Buffer","data":[200,232,255,255,255,255,255,255,255,1]} # node 0.12

You just need to compare objects differently, e.g. using …
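One version-independent way to do such a comparison is to compare the raw bytes rather than the JSON output (a sketch; `sameBuffer` is an illustrative helper, and `new Buffer` matches the node 0.10/0.12 era under discussion):

```javascript
// Compare raw bytes instead of JSON.stringify output, which differs
// between node versions for Buffer objects.
function sameBuffer(a, b) {
  return a.toString('hex') === b.toString('hex');
}

var a = new Buffer([0xc8, 0xe8, 0xff]);
var b = new Buffer([0xc8, 0xe8, 0xff]);
console.log(sameBuffer(a, b)); // true
```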
Added the ability to omit area altogether if you use mbtiles sources — it'll stream tile numbers from the first source if it's mbtiles, using node-mbtiles ZXYStream. We can update the API later to be able to choose which source to use.
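The ZXYStream emits newline-separated tile records; assuming they come as `z/x/y` (my reading of node-mbtiles, worth double-checking), a tiny hypothetical helper like this would turn each line into the `[x, y, z]` triples the tile queue uses:

```javascript
// Parse a "z/x/y" line from node-mbtiles' ZXYStream into an [x, y, z]
// tile triple (assumed record format; not code from this PR).
function parseZXYLine(line) {
  var parts = line.split('/');
  return [+parts[1], +parts[2], +parts[0]];
}

console.log(parseZXYLine('12/656/1582')); // [ 656, 1582, 12 ]
```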
I am noticing that binarysplit seems to crash when tile-reduce is run in the context of a test. I am pretty sure the main thread is grabbing TAP data somewhere, but need to do some more digging.
Warning: force-pushed a rebased branch. It merges cleanly now. Also squashed some tiny commits like "fix typo", and all merge commits were removed automatically.
The diff script now lives at https://github.com/mourner/road-diff. I'll clean it up before blogging next week.
Working on adding a short doc section about events, so we've got bases covered. Edit: 👍 done. @morganherlocker @mourner review when you get a chance? I think we need to say more under the main Events header, but my brain doesn't want to do words right now.
👍 let's merge and publish 3.0.0. We can improve as we go after that.
🚂 !!
An early version of the next-generation Tile Reduce that is thousands of times faster than the current one. This is not a joke.
For example, generating GeoJSON road difference between OSM & Tiger 2015 for US West z12 (~26% of US) takes 2 minutes 45 seconds on my MacBook Pro; previously it would take a day on a much more powerful EC2 c3.4xlarge instance eating all of the available RAM.
The new architecture also makes Tile Reduce much more flexible and loosely coupled, e.g. easy to extend for raster support, weird tile formats, lazy decoding, etc.
I'll devlog how I did this, but the main idea behind the rewrite is removing almost all communication between the main thread and the workers and avoiding unnecessary work:
- the reduce event is only for smaller values like statistics

Somewhat breaks API compatibility, but this is something we have to do to support the new architecture. Upgrading existing processors should be easy once we finalize the work.
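The "only small values cross process boundaries" idea can be sketched like this, assuming a simple count job (`mapTile`, `reduce`, and the fake layer objects are illustrative, not the actual tile-reduce API):

```javascript
// The worker does all heavy per-tile work locally...
function mapTile(layers) {
  return layers.roads ? layers.roads.length : 0;
}

// ...and only a tiny number crosses to the main thread, which aggregates.
var total = 0;
function reduce(value) {
  total += value;
}

[{roads: [1, 2]}, {roads: [3]}, {}].forEach(function (layers) {
  reduce(mapTile(layers));
});
console.log(total); // 3
```

Keeping the reduce payload small is what lets the main thread stay cheap while workers read and decode tiles themselves.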
Also sports a fancy progress bar so you can feel the unreal performance:
While all the basic stuff works, this is still a work in progress. Rough list of remaining tasks:
- re-add overzoom (slicing to higher zoom levels than the sources provide)

I would love feedback and help on this. We could win so much with a blazing fast Tile Reduce, so let's make it happen sooner.
cc @morganherlocker @MateoV @aaronlidman @tcql @tmcw @lxbarth @ericfischer @batpad @Rub21 @geohacker @arunasank @isiyu