geography proposal: do whatever GADM (et al.) does #5076
From @sharpphyl in #5063 (comment):
If it has things that can be made into postgis::geography, it can at least be discussed. (Natural Earth tends to be fairly low-resolution, which doesn't necessarily mean it's not useful.)
One additional consideration, illustrated nicely by the screenshot maps in #4857: the data strongly suggest that, a significant portion of the time, users choosing geography see what they're looking for and click. The label says Russia, they click Russia, and for whatever reason they fail to realize (or care??) that there are in fact three Russia[+continent] options (two of which are appropriate for any given point, though not necessarily any given shape, in Russia; the odds are in their favor!). This leads to many thousands of situations where the geography data disagree with the locality. Take https://arctos.database.museum/guid/MVZ:Bird:46708, for instance: the geography claims Asia, but the locality is clearly not in Asia, so what should I believe? I suspect users will be much more likely to choose what they intend in an environment where we're not filling in the blanks just because they exist, and which therefore has only one "Russia."
If you haven't yet considered geoBoundaries as a source, have a look at it.
Well, I got it, but I feel like I'm missing something fundamental: there's no parentage. I can figure it out until I get to ADM2 (municipalities), and then I run into... seven things called 'Benito Juárez', because the Mexicans forgot to put a unique key on [...]. I grabbed MX because GADM has a lot of gaps around Oaxaca; geoBoundaries has so far had what I need, thanks @tucotuco! @mkoo, do you have any magic to offer?? Also, maybe even if we decide to reject this proposal: geoBoundaries requires attribution. I can probably stuff something into remarks and make them happy, but that seems wrong on a few levels, so I suggest we consider some more-formal 'source of the shape' addition to the model.
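One way to recover the missing parentage is spatial containment: take a representative point of each ADM2 shape and find the single ADM1 shape that contains it, so the duplicate 'Benito Juárez' names become distinguishable by their parent. The sketch below is stdlib-only and assumes simple single-ring polygons; the shape IDs, the square toy geometries, and the STATE_A/STATE_B names are all hypothetical. Real boundaries (multipolygons, holes, non-convex shapes) would need a proper GIS library or PostGIS.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # a single outer ring; holes and multipolygons are ignored here


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting: toggle 'inside' each time a ray toward +x crosses an edge."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_mean(ring: Ring) -> Point:
    """Crude representative point; only guaranteed to fall inside convex shapes."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def assign_parents(adm2: Dict[str, Ring], adm1: Dict[str, Ring]) -> Dict[str, Optional[str]]:
    """Map each ADM2 shape ID to the one ADM1 containing its representative point.

    Returns None when zero or several ADM1 shapes match, leaving it for human review.
    """
    parents: Dict[str, Optional[str]] = {}
    for shape_id, ring in adm2.items():
        pt = vertex_mean(ring)
        hits = [name for name, parent_ring in adm1.items() if point_in_ring(pt, parent_ring)]
        parents[shape_id] = hits[0] if len(hits) == 1 else None
    return parents


# Hypothetical toy data: two square "states" and two municipalities sharing a name.
ADM1 = {
    "STATE_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "STATE_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
ADM2 = {
    "adm2-001": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "adm2-002": [(12, 1), (14, 1), (14, 3), (12, 3)],
}
ADM2_NAMES = {"adm2-001": "Benito Juárez", "adm2-002": "Benito Juárez"}

if __name__ == "__main__":
    for shape_id, parent in assign_parents(ADM2, ADM1).items():
        print(f"{ADM2_NAMES[shape_id]}, {parent}")
```

Returning None instead of guessing keeps ambiguous cases (shapes straddling a boundary, or a representative point that falls outside a non-convex shape) visible for someone to check by hand.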
From #5059 (comment): perhaps change the idea of this to "follow the pattern of" rather than outright "do what they do". Some of what they do would not be very usable in a system such as Arctos, and requires a little massaging.
In #5084 (comment) I proposed creating geography through gadm2 (county level) by default. Having now been through much of the data for Vietnam, and enough bits and pieces of other places to suspect it's relatively representative, I'd like to retract that and propose the opposite, for the following reasons. There are at least dozens of VN wiki pages similar to https://en.wikipedia.org/wiki/H%E1%BB%93ng_Ng%E1%BB%B1_district.
There are dozens more "stubs": things that Wikipedia acknowledges exist, but doesn't have much information on. We would not discover problems with, or changes to, them.
GADM data tend to be a few years old and generally aren't qualified; it would take significant research to figure out what 'Hồng Ngự' might refer to. (Other sources have other quirks, but AFAIK nothing at all is much more than a shape with some sorta-ambiguous names attached.)
Nobody (except perhaps the local politician) seems to care much about these subdivisions. (If there's much information on Wikipedia, it almost always relates to cultural and historical aspects, often the city or feature after which the administrative unit was named, not the unit itself.)
At least in VN, the local divisions are physically small. Hồng Ngự district is about 80 km² (vs. 2500 km² for Sacramento County, as a point of reference).
I don't think there are significant functionality differences between geography [...].
Essentially, I don't think we have the resources to adequately (much less properly!) manage 700+ second-level divisions of Vietnam, and I've come to believe that that pattern holds generally. I also don't think that pushing those data to locality has significant functional implications. I therefore propose to create GADM2-equivalent geography only when there is a particular reason and associated resources to do so: for example, someone familiar with the area who is willing to help manage geography. @sharpphyl, your thoughts on this would be very appreciated.
@dustymc I haven't read this, but it sounds like it might be helpful? |
That's all strings, so it's not very useful to me.
The part of GADM we're interested in involves
I'm sure that lots of those coincide with all sorts of other things, but that isn't recognized by GADM and so won't be recognized in our data. https://gadm.org/maps/USA/hawaii_2.html does not coincide with an island, but https://gadm.org/maps/USA/hawaii/hawaii.html does. (And the former probably coincides with an island group, but that's even more of a mess in our 'legacy' data and I'm wondering if it's ever useful or spatially represented in anything.) https://gadm.org/maps/PHL/batangas.html also doesn't correspond to an island - there are (or were, I think I cleaned this one up) lots of "island is most of the state-like-thing" with the island also listed in our data, and a fair number of them have records which map to the smaller islands. From here so far:
Some of that's detectable (by humans, probably) in the data
finds (8685 rows), e.g.: Yapen Island | Irian Jaya, Japen Island, Ambai Island
Accepted as #5138
Refs #5063, #4928, #5022, etc., all aimed at solving/avoiding #4836
Let's elevate #5063 (comment) to an actual proposal:
If I get to pick I'd go with the "just use GADM (and such)" approach because I don't see anything else that looks viable. That would work out a lot like the Kuwait issue [in which the country was created without a continent].
There's no particular limit to the sources - eg @mkoo is working on island data (and I have no idea what it'll look like), the only real "rule" is that we would not try to mix-n-match. (A guideline might be that we try to stick to accepted sources when possible, but can see no real problems with creating our own if someone wants to make that investment.)
(Or maybe we CAN mix-n-match, but that seems to inevitably lead to inconsistency: eg, we can add a continent to "US, Ohio", but then it's structured differently than "US, Hawaii". I suggest there is value in consistency, and there is very little, sometimes even negative, value in 'filling in the blanks.')
There would be many details to work out; I suggest we ignore them all for the moment and just decide whether we can agree in principle to follow things like GADM as our "geographic authority" data.
Some examples of the data I have available right now, and how it might be mapped.
Continents - hopefully these wouldn't be used much for cataloging, but they can be used for things like spatial search: it's possible to find all the things that map to eg Africa no matter what the geography assertion might be. The mapping to geog_auth_rec would be to continent.
Seas
"Marine areas" (some of which are inland) - lots of overlap with seas, which might result in eg "Beaufort Sea" (mapped to geog_auth_rec.sea) and "ARCTIC OCEAN|BEAUFORT SEA" (mapped to ocean + sea) both existing (and they probably have slightly different shapes; some of these seem to be arbitrarily drawn by hand). Not ideal, but it seems like something we can work with, one way or another. Note also that some of these do not seem to be 'natural' (regulatory areas, perhaps??), and mapping those to our model would require discussion.
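The ocean + sea mapping, and a way to spot the Beaufort-style overlaps, might look something like this. The sketch assumes marine-area names use the `OCEAN|SEA` form seen in "ARCTIC OCEAN|BEAUFORT SEA" and that a lone name ending in "OCEAN" is an ocean; both are guesses about the data, and the overlap check compares names only, not shapes.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def marine_area_to_terms(name: str) -> Dict[str, Optional[str]]:
    """Split a pipe-delimited marine-area name into ocean/sea terms.

    Heuristic (an assumption, not a documented rule of the source data):
    'OCEAN|SEA' -> both terms; a single part ending in 'OCEAN' -> ocean; else sea.
    """
    parts = [p.strip() for p in name.split("|")]
    if len(parts) == 2:
        return {"ocean": parts[0], "sea": parts[1]}
    if len(parts) == 1:
        if parts[0].upper().endswith("OCEAN"):
            return {"ocean": parts[0], "sea": None}
        return {"ocean": None, "sea": parts[0]}
    raise ValueError(f"unexpected marine area name: {name!r}")


def overlapping_seas(names: List[str]) -> Dict[str, List[str]]:
    """Group source names that resolve to the same sea (case-insensitive)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        sea = marine_area_to_terms(name)["sea"]
        if sea:
            groups[sea.casefold()].append(name)
    return {sea: srcs for sea, srcs in groups.items() if len(srcs) > 1}


if __name__ == "__main__":
    names = ["Beaufort Sea", "ARCTIC OCEAN|BEAUFORT SEA", "ARCTIC OCEAN"]
    print(overlapping_seas(names))
    # {'beaufort sea': ['Beaufort Sea', 'ARCTIC OCEAN|BEAUFORT SEA']}
```

Flagging such pairs doesn't decide which shape wins; it just lists the places where the two datasets would both produce a "Beaufort Sea".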
Parks - surely there's something better, but I do have data for some (very arbitrary) public lands. Note that it's just the park in these data, but as with continents, non-asserted geography could still be used to search. (That is, one could find some California things by searching 'Point Reyes NS', and all Point Reyes NS things by searching California.)
EEZ+Land - these are a weird mishmash of sovereign (country), sorta-sovereign (Guam), and spatially convenient (Alaska) areas, plus the (or bits of the??) associated EEZ. Mapping to geog_auth_rec would need to be discussed; https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10016359 exists, and I would not want to defend it.
And last but certainly not least, GADM. I've imported this as three levels, which would map to country (gadm0), country+state_prov (gadm1), and country+state_prov+county (gadm2).
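The three-level mapping can be sketched as a small function. This assumes each imported GADM feature carries `NAME_0`/`NAME_1`/`NAME_2`-style name attributes (check these against the GADM release actually imported); the target terms are the geog_auth_rec fields named in the proposal. In keeping with the "don't fill in the blanks" idea, no continent is added.

```python
from typing import Dict, Mapping

# geog_auth_rec term filled by each GADM level, in order: gadm0, gadm1, gadm2
LEVEL_TERMS = ("country", "state_prov", "county")


def gadm_to_geog_terms(feature: Mapping[str, str], level: int) -> Dict[str, str]:
    """Map a GADM feature's names at `level` (0, 1, or 2) to geog_auth_rec terms.

    Assumes NAME_0/NAME_1/NAME_2 attribute keys (an assumption about the import).
    Only what the source asserts is mapped; no continent is invented.
    """
    if level not in (0, 1, 2):
        raise ValueError(f"unsupported GADM level: {level}")
    return {LEVEL_TERMS[i]: feature[f"NAME_{i}"] for i in range(level + 1)}


if __name__ == "__main__":
    sacramento = {"NAME_0": "United States", "NAME_1": "California", "NAME_2": "Sacramento"}
    print(gadm_to_geog_terms(sacramento, 1))
    # {'country': 'United States', 'state_prov': 'California'}
```

Because the level, not the individual record, decides which terms get filled, every gadm1 record ends up with the same structure, Ohio and Hawaii alike.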
I believe @sharpphyl intends to discuss this at the next Arctos Office Hours, and I think at this point things like "this [ does | does not ] seem like a horrible idea" are very useful.
This looks like a workable idea to me, and that seems to be a hard bar to cross. I'm not sure I have an opinion beyond that, other than wanting to somehow end up in a situation where "geography authority" data all has a spatial representation (and I believe we all agreed to that in an AWG discussion).
Do note that there are no model changes proposed in here, simply guidelines for how we use (or, mostly, do not use) the long-existing model.
Alternative ideas which lead to spatial views of the world are of course most welcome.
Help!