-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database-reload #2105
Database-reload #2105
Conversation
This adds -G, --hstore and the lua tag transform to the documentation for the osm2pgsql import. This resolves gravitystorm#101.
We're already doing this by using our own tag transforms with Lua. At the very least we should drop columns which we don't use in the style, like |
|
||
# Columns defined in openstreetmap-carto.lua file | ||
way z_order int4 linear | ||
way osmcarto_z_order int4 linear | ||
|
||
# Area tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of this line and everything below if we're using a Lua transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pnorman Do you have a suggestion for a computationally efficient way to achieve this in lua? We have wildcards (asterisks) in the list of tags, and I'm afraid running 75 regexp-matches per tag will be a problem computationally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that this has been implemented in https://github.com/openstreetmap/osm2pgsql/blob/master/multi.lua (but not https://github.com/openstreetmap/osm2pgsql/blob/master/style.lua). I took that solution, we'll see what it does to performance.
I tried to keep the Lua transforms backwards compatible, or at least not to do any things that don't make sense for other styles. For example, the custom column name is prefixed with osm_carto, and z_order is still supposed to do what it always did.
Yes, I think that makes sense. Although we still have the issue of people using one database for multiple styles (which might need the poi columns). Not sure how big of a problem that is, and if we should take that into account? |
{'historic', 'citywalls'}, | ||
{'waterway', 'derelict_canal'}, | ||
{'waterway', 'ditch'}, | ||
{'waterway', 'drain'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be the place in this PR but this list misses waterway=canal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching, added it now.
@math1985 , what are you basing this statement on? Looking at all of the rendering results I made in the past three years across the globe, I know for certain that there are still hoards of old school multipolygons in the database, even on very prominent world wide known features like renowned musea. Even though the "official" policy nowadays may be to only put tags on the multipolygon relation, no editor actually totally prevents you from creating "old school" type multipolygons. Since nothing truly prevents it, I have also encountered quite a number of recently created multipolygons in "old school" style with tags on the outer instead of the multipolygon relation. What is the "gain" of dropping support? AFAIK, this is just a decision on import of features by osm2pgsql, and shouldn't affect the style's developement or be directly related to (hstore) tag storage. I think, or better know, that dropping support for old school type multipolygons, will lead to some really unexpected wide ranging rendering changes, with features small and big suddenly, and to most people completely unexpectedly, disappearing from the map. Think of entire forests or sections of rivers or giant lakes. I think it recommended to discuss this with a wider audience if you really intend to do this... |
Dropping old style multi polygon should have a proper QA tool in place. And main editors should convert old style automatically or with one click. |
It makes the code quite a lot more complex. However, if there is desire to keep this functionality, I'm happy to re-add it. On the other hand, dropping support for old-style multipolygons might speed up the conversion process. |
sent from a phone
\o/ (btw: isn't this "all keys of objects that are in the rendering db"?) |
sent from a phone
sometimes it is right to have tags on the outline and on the relation, and current way of processing this "the old style way" breaks these |
If you're storing z_order using the example code in style.lua, this is different than the standard osm2pgsql z_order. |
@pnorman I don't see any differences, am I overlooking something? |
The numbers in the layers structure are different than style.lua. Edit: There are probably other differences, but this is a clear one. |
return filter, keyvalues | ||
end | ||
|
||
-- Filter out all relations except route, multipolygon and boundary relations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keys like “FIXME” and “note” are filtered by a blacklist. But here, for the relations, a whitelist is used. Would it be very bad for performance to put all relations in the database (maybe excluding some blacklisted relations)? Maybe one day you want to use some of these relations, and given that database reloads seem to be seldom if ever…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, seems a logical next (or additional) step when adding hstore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - you can't just add arbitrary relations here, you have to write logic to do something with them
Thanks, I was still looking at an old version. It changed relatively recent. I would still like feedback from other developers and maintainers of tile servers - is it desirable that one database import can be used for multiple styles? Maybe we should ask on the dev list? @gravitystorm are you still running such a set-up? |
For me that is desirable but I would consider it as merely "nice to have". Note that I was never running public tile server, so it would be worth asking people who did this. |
cc @woodpeck as perhaps the only person doing this |
One database driving multiple rendering styles is something that I do on a couple of servers and I've also installed servers for customers that do this. Of course there were always limits to the practice - you couldn't run tonerlite and osm-carto from the same database because one is osm2pgsql and one is imposm. But running the German style and osm-carto and osm-bright from the same database is possible and I'd say it is desirable, too. Today I solve this by having a master database with a hstore column, and then a number of views that cater to each style's specific demands. In an ideal world, there would be one standard osm2pgsql way of importing things, and then each style would come with its set of "create view" statements... |
It would be nice if one database can drive multiple styles, using views as @woodpeck does would be fine, but I'm not sure if our lua script changes the data too much for that. In practice, I've only ever managed to make styles under my own control work together, and I had to make a small change to openstreetmap-carto to get it working on my existing databases. Nowadays I have a much smaller need to run multiple styles on one database layout (using vector tiles allows me to abstract away database layout changes). I would recommend we design the database we need for this style. If anyone has time to check the feasibility of running other styles (e.g. osm-bright), whether directly or using views, that would be great, but it's a much lesser priority. Let's not get too bogged down in compatibility discussions since this database reload is years overdue already. |
No, this is fairly easy to achieve. In general I think we should make sure that our lua script does not overwrite existing columns, but creates custom columns instead if we want something special (I use columns prefixed with osm_carto for that). That also makes sense from a clarity perspective. A choice we do need to make is whether we want to use the z_order from the current osm2pgsql, or from the osm2pgsql from last year (which is also the current default for openstreetmap-carto, and likely the current setting for most tile servers in the wild). I think i'd be best not to refer to the z_order column from within openstreetmap-carto (and use a custom osmcarto_z_order column instead), but it would be nice if the import could leave a sensible z_order column behind for use by other projects. |
A question about a line .style: node capital text linear With hstore enabled, would it be possible later to access this as polygon, or does hstore ignore it because of this line in the .style file? |
This makes sure the object is put in planet_osm_line, not in planet_osm_polygon. I think if you really wanted to you could still convert the object to a polygon manually? |
Okay, I understand. Thanks for clarificaton. (It was rather a theoretical question.) |
Note in addition to compatibility with other styles there is also the question of backwards/forwards compatibility within this style, i.e.:
This would not only be a matter of compatibility in itself for style users but also affect the ability of style developers to easily do cross-fork development. The idea of @woodpeck to use views as an abstraction layer could probably also help in that regard. Things like the multipolygon handling options are of course something that cannot be tackled with that. |
I would like to try to achieve this. If this won't work, debugging code (for example finding out when a particular bug was introduced) will become a nightmare.
If we want to make use of the new columns from hstore, I don't think we can achieve this. |
With views that should be possible at least formally - the German style on openstreetmap.de for example uses hstore but this is optional, you can also run it on a classical database layout: https://github.com/woodpeck/openstreetmap-carto-german/tree/master/views Of course any view columns based on hstore would be empty when the same view is created on an old database layout without those columns. The only disadvantage to me seems that every time the style starts using a new key from hstore the view needs to be changed to support this. |
If the columns are empty it means the code won't be backwards-compatible with old databases :) I don't see very strong arguments in favour of using views (as in here) by default, @pnorman @gravitystorm what do you think? |
With respect to multipolygons: I'd still prefer to drop support for old-style multipolygons. @dieterdreist correctly mentions that was long as we support old-style multipolygons, we cannot fully support multipolygons that rightfully have polygon tags on the outer way. In addition, old-style multipolygons are mostly seen as errors by now, and it's only an advantage that the rendering indicates that there is a problem. |
* Use lua instead of style file to set some tags as polygon that are only used in the hstore column. * Use lua for deleting tags instead. * Filtering out objects with 0 tags now takes into account tags that are dropped by the script.
Done. |
With respect to backward compatiblity / compatibility with other projects this proposal takes the following measures:
Apart from multipolygon behaviour and polygon-versus-line behaviour (which are bug fixed IMO), this should guarantee that the old code is forward compatible with the new database scheme (or equivalently, that the new database is backwards compatible with old code versions). It also should mean that projects that can now share a database with osmcarto can still do so in the future. |
From my side, this is ready now. I'd appreciate if people could help me testing this, or spot any potential problems! On terms of performance of the import process: loading of Luxembourg with the new settings (with .lua file) takes 103 seconds, as opposed to 307 seconds with the old settings (without .lua file). So that seems to be a positive improvement. At first sight, processing of ways seems to have gotten worse while processing of polygons seems to be significantly faster. No idea how this scales to larger datasets though! |
About 30-40% of multipolygons are old-style, so this isn't a viable option right now.
No, we shouldn't be using views like that. I still believe attempting to maintain backwards compatibility is a mistake. |
Thanks, I didn't expect it to be that high. How did you get this number? |
count of relations with tags of type=multipolygon over count of relations with tags including type=multipolygon The data is a couple months old, but the numbers haven't changed much over the last couple years. |
The main arguments are that it enables contributors of other styles to more easily contribute here, it reduces complexity of the SQL code and most importantly it makes the code here mostly independent of the database scheme actually used, esp. w.r.t. future changes. I don't see any arguments so far against this. |
Not entirely surprising to me, considering all I have seen in the past three years rendering OpenStreetMap data...
? It is not exactly clear to me what you are trying to say here (although I do get the general idea how you might achieve to get this number of old-style versus modern multipolygons).
Unless all editing tools start explicitly warning users when creating old-style multipolygons (or even blocking upload of them), and providing guidance how to create correct modern ones, I don't see this figure going down much either. Multipolygons are really confusing for new and even many seasoned OSM users, and difficult to get right for them. I have seen plenty of recent edits including old-style multipolygons, so for each one being fixed, new ones are being added. Also, there is the problem of the common workflow of "upgrading" multiple existing disjunct closed ways / polygons to a multipolygon. Correctly moving the tags from the outer ways to the multipolygon relation is often far more work than simply throwing everything together into one relation and setting the "type=multipolygon" tag, which is dead easy in most editors. |
In my ArcGIS Renderer, I have overcome these issues by having a full automatic detection of OSM keys used in SQL. This was made possible by ESRI implementing its own kind of "hstore", and adding a tool in ArcGIS that can extract data from this key-value storage. All database fields / OSM keys extracted this way use a prefix of "osm_", meaning any SQL statements must use this prefix as well. This makes it relatively easy to write a parser that can auto-detect used keys. Actually, I do only use what are called Definition Queries of layers in ArcGIS in my renderer. These can only represent the WHERE clause of a basic SQL statement. This is of course a limitation, but so far, I haven't had any real need for really complex SQL statements to develop useful render rules. I think the render results speak for themselves. Essentially, each render rule with its Definition Query, is a bit equivalent to a database view. It gets stored in an ArcGIS project. By auto-detection, any needed keys are extracted from the ESRI "hstore" automatically, so users never have to worry about "missing" keys. If they stick to writing Definition Queries only, the render rules will simply work, no matter what keys are present yet as true database fields, or missing and need to be extracted (note: in ESRI's implementation of a key-value store, you can't access the data in the storage directly, it must always be extracted to a normal database field using the custom build ESRI tool for that). I also extended this concept to auto-extract OSM key names from the user specified SQL statements and python expressions that the labelling engine of ArcGIS can use, meaning all relevant properties of a render rule are covered. |
Another interesting aspect of my renderer is that it is possible to both render from just 3 base tables for Points, Lines, and Polygons, or to render from a dedicated Feature Class (database table) for each and every render rule in a style. These options do not influence the outcome of the rendered data, the resulting styled data is exactly the same. With the latter option, I have successfully generated geodatabases with close to 350 dedicated tables for the 350 render rules of the main style I developed (users can however create styles as simple or complex as they like, so aren't bound to this humongous base style), each of them having a dedicated set of fields. Since the renderer also uses the auto-detection of the keys used in SQL statements, to automatically index any attribute field referenced in a SQL statement, an OSM geodatabase created using dedicated Feature Classes for each render rule, may have over a thousand individual auto-generated indexes for relevant fields in all the tables generated in case of the base style I developed. |
I guess if the support for old-style multipolygons is not dropped, than this will have to wait until the next database reload, right? Given that database reloads are extremly rare so far, this would force old-style multipolygon rendering for probably many, many years. I do not feel comfortable with this idea either… |
I think basically everybody agrees that from a data quality perspective, dropping old-style multipolygons is a good idea, and will lead to a sharp improvement once users are forced to create valid ones, as non-valid ones simply won't show up in a proper way. The question remains though, if we can live with a Standard map on the OpenStreetMap website, that suddenly sharply changes in appearance when this policy change is implemented... If the answer to that question is yes, and you accept a map that may take a couple of months to a year (more?) to "catch up" with its former "glory", then it may be acceptable to drop multipolygons now. |
i stand for dopping the old-style multipolygons. But we should inform the users early enough that they could start fixing the multipolygons before we dropping them. Maybe add an maproulette challenge |
I adapted to PR to re-support old-style multipolygons. Data shows there is still a very large amount of them in use, and the amount is so large that I don't think retagging them on short notice is feasible. This does come with a large performance cost, loading Luxembourg now takes 290 seconds versus 98 seconds without support for old-style multipolygons and 223 seconds currently. I think this is not great, but acceptable. As the old-style multipolygon handling is the bottleneck, I'd expect the performance hit to be linear in terms of database size. With respect to backwards-compatibility: as we don't need to make large sacrifices for now, I'd propose to stay backwards compatible at least this reload. For the next reload, we can consider dropping this requirement, but for now I don't see large gains. I would really like to have this tested by somebody else before rolling it out to production. @pnorman or @gravitystorm, would you perhaps be able to test this, ideally on a larger dataset? Or somebody else? |
Performance should be comparable to the C backend - you were getting big gains before because old-style MPs weren't being handled as pending, skipping a lot of the work required. |
# osm_timestamp - datatype timestamptz(0). | ||
# Used with the --extra-attributes option to include metadata in the database. | ||
# If importing with both --hstore and --extra-attributes the meta-data will | ||
# end up in the tags hstore column regardless of the style file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should skip the documentation on what a style file is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Some git and process notes
|
One thing I'm a bit worried about is that multipolygon-handling might lead to unexpected behaviour when importing with --append. I suspect this will be the same in the current situation and with the .lua-files, but I can't be sure. For example,
|
Thanks for the advice. Done, see #2128. |
This is a pull-request against the 4.0-branch
This PR proposes a database-reload. It changes the documentation on the use of
osm2pgsql, and adds a .lua file for preprocessing.
This database import aims to be backwards compatible with old versions of the
style, and with other styles making use a rendering database created with
default osm2pgsql rendering settings.
This resolves (at least part of) #1504.
Hstore
Adding the hstore option to osm2pgsql allows us to use all keys.
We could roll out hstore in two different ways. First, we could just use hstore
for new keys, and keep using traditional columns for old keys. Second, we could
drop some of the columns, and start using hstore for these, in addition to using
hstore for new keys.
I would prefer to go the first route. The second route might lead to some
performance improvements. However, it is not clear exactly how these can be
achieved. @pnorman has been working on this over a year ago (see #1504), but as
far as I know, he currently has no time to continue this project.
According to @pnorman's benchmarks,
using hstore without dropping columns would lead to a 9% size increase and a
0.38% speed decrease. Given the benefits of having all columns available, I
would consider this acceptable.
In addition, the second way will lead to loss of compatibility with other styles
that make use of the same rendering database.
I would therefore propose to go the first route, and for now enable hstore
without dropping columns.
Make polygon/linestring decision based on value
This is based on the following unmerged PR to osm2pgsql:
osm2pgsql-dev/osm2pgsql#346
It makes the polygon/linestring decision based on the tag value, in addition to
the tag key. In particular, highway=services and junction=yes are now treated as
polygon, while leisure=track, man_made=embankment, man_made=breakwater,
man_made=groyne, natural=cliff, natural=tree_row, historic=citywalls,
waterway=derelict_canal, waterway=ditch, waterway=drain, waterway=river,
waterway=stream, waterway=wadi, waterway=weir, power=line, and power=minor_line
are now treated as linestring.
This resolves #1362, resolves #137, resolves #268, and resolves #892.
Rendering order
The new .lua file creates a osmcarto_z_order value in the database, which stores
the rendering order we use. This will allow us to simplify the road queries.
Recommend -G flag for multipolygons
This resolves #59.
Import ele on buildings
This resolves #101.
Delete more meta-tags from imports
The new .style file drops some more meta-information from imports that is of no
need for rendering.
This is based on the following unmerged PR to osm2pgsql:
Simplify treatment of multipolygons
In the default .lua style, multipolygons with the tags on outer way were
accepted as a valid alternative for multipolygons with tags on the multipolygon.
Because almost all multipolygons have the tags now on the multipolygons, support
for multipolygons with tags on the outer multipolygon has been dropped in the
proposed .lua file.