What is geography: intersections #4836
Comments
Here is how we got there for the smaller quads: #2229. That was pre-locality attributes. I don't mind moving them over to locality attributes as long as we set up a code table. I was going to say that we should also transfer over all the map links that are in geography remark, but the links are dead 😢 If we did leave 7.5 and 15 minute quads in geography, we should be able to get spatial data from something like this: https://catalog.data.gov/dataset/usgs-map-indices-overlay-map-service-from-the-national-map
Makes sense, we could do that if we go there.
Noice....
I can probably do something with that, but it almost certainly doesn't include the intersections that lead me here - eg https://arctos.database.museum/place.cfm?sch=geog&higher_geog=North%20Lake%207.5%20minute I think there are three possibilities
Thanks, very helpful, we still need to have a big-picture "what is geography" discussion.... I think https://arctos.database.museum/place.cfm?sch=geog&quad=Seward is the quad-weird winner with two spelling variations and 42 sub-quad-thingees.
Honestly, this is what started the whole "quad" problem - if AK can have them why can't NM? Could we just put the AK quads in the "county" field to save having that argument with everyone? Maybe even consider the following?
So that we aren't cramming provinces and krugs into "state" and "county"? We are still going to run into issues of what is Division 1 and what is Division 2 for some countries....
Thinking about this a little bit, I wonder if this could be solved by restructuring geography a bit - we could have the hierarchical, non-intersecting geography as one thing, and have non-hierarchical, potentially intersecting geography as a separate thing. The latter would be "things that it is useful to have spatial data for that don't fit neatly into the main geography" - things like quads and features. For example, it would be helpful to have National Park spatial data in Arctos, and even if you were not aware your locality was in a National Park, something would pop up - hey, this locality is within current national park boundaries.
2 differences - AK quads are 1:250k while the "small quads" seem to be a bit of anything that ever got printed on a map, and NM has normal-sized counties. (AK now has county-like-thingees too, but they're very recent and one's bigger than UT so not all that useful.) I'm not really arguing if anything should exist or not, but the spatiopolitical landscapes don't line up very well and we probably have to acknowledge that.
I'm (obviously, I hope) willing to consider just about anything, but there are dozens if not hundreds of these conversations and none of them have led anywhere, so we're flopping around here in the middle. Can we fix that - somehow just come together as a Community, decide what is and isn't geography, and make that work?
I think our "pick something and stick with it for the country, at least" approach works fairly well - we don't have to have a perfect global solution to have something significantly more usable than what we have now.
Huh, neat. Would my half-baked proposal plus a trigger that allows only one "category" be functionally equivalent? (I'm not sure if that's a simplification or unnecessary complication). And I suppose we'd want to allow multiple (just two?) geographies per locality?
See also just_use_best_match - given coordinates I can magic WHATEVER, maybe there's some viable 'only asserts coordinates' model out there waiting to be found.
I think this idea of separating political and geography should be elevated to a full proposal. Given political stuff (US/HI) I can usually figure out the intent (state of Hawaii) and find appropriate spatial data. Given geographical stuff (HI/HI) I can usually figure out the intent (archipelago, island) and find appropriate spatial data. Given a random mix of those, everything conflicts with everything and nobody - especially us! - can figure out what we're talking about. (And spatial data isn't readily available for things like "European France" - there are practical reasons to do what the sources of spatial data have done, which generally involves not mixing concepts.) If someone wants to assert both then they can do so via two localities. If only one is asserted, I can use spatial tools to make everything discoverable anyway. Can that be forged into a workable model? Does anyone else do anything like this? (I think not - they just deal with strings and pretend that "Hawaii" means whatever's convenient at the moment.) I don't know what will work well, but I'm increasingly certain that our mishmash cannot be fully supported by spatial data, at least not without something like hiring a full-time GIS person (which doesn't seem remotely realistic). @sharpphyl I think most of your collection would be right at the border of those two things (where all the interesting stuff happens!) - your thoughts (proposals for a better model, whatever) would be very appreciated.
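For discussion only, here is a minimal Python sketch of what "not mixing concepts" could look like as a check. The field names and the way they are split into political and geographic groups are assumptions made up for illustration; the thread does not define these lists, and this is not how Arctos currently works.

```python
# Hypothetical grouping of field names; the thread does not define these lists.
POLITICAL = {"country", "state_prov", "county"}
GEOGRAPHIC = {"sea", "island_group", "island"}


def categories_used(record: dict) -> set[str]:
    """Return which categories ("political", "geographic") have any populated field."""
    populated = {key for key, value in record.items() if value not in (None, "")}
    used = set()
    if populated & POLITICAL:
        used.add("political")
    if populated & GEOGRAPHIC:
        used.add("geographic")
    return used


def mixes_categories(record: dict) -> bool:
    """True when a single geography record asserts more than one category."""
    return len(categories_used(record)) > 1
```

Under these assumptions, a US/HI record (country plus state) and an HI/HI record (island group plus island) would each pass, while one record carrying both a state and an island would be flagged as mixed.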
This makes sense to me.
In a single event? That would be nice - this thing was collected on this day in this political place and geographic place (one may be inside the other or they may overlap, providing a more granular "higher geography")? I like the idea of being able to select one, the other, or both.
Not what I had in mind, but I suppose I'm up for anything. I really don't think there's much reason to do that; if we can come up with a model which better fits reality, I should be able to use the spatial attributes to move back and forth across spaces. And I'm a little paranoid about the whole "Kenya's moved all the borders but reused half the names, again" thing at the moment, so isolating that seems like a Really Good Idea. (Wild guess: 20% of our 15K current geog entries carry some sort of temporally-involved ambiguity.)
That's #3018
That's the technical bit, but the whole picture also involves management. I'm staring up at an overwhelming mass of evidence which suggests that just never(ish) happens. Keeping current could technically be "part of Arctos," but from the social side of things I can see no safe way to do that while we're also allowing self-conflicting data. (And I can't see how that might be resolved at this point.)
Drainages are a large part of the intersectional data. Suggest removing drainage from the geography model in some way:
Or, alternatively:
(Or even more alternatively, propose some model in which drainage as geography makes sense. 'Drainage requires all other fields to be NULL' would do it.) The first choice could add no work, and could not be self-conflicting (to the extent the underlying data are accurate). It would not be available to records which aren't georeferenced, but that's a very low bar in Arctos (click one button or ask me). The second would allow 'verbatim assertions' and work with data of any quality.
A short hopefully-functional proposed solution to the quad problem: quads can only be accompanied by (continent_ocean,country,state_prov). This finds things which conflict:
currently (3611 rows), about 150 involving Alaska. That is not incompatible with moving quads (all or some) to locality attributes, from where they could be mixed with anything as desired.
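This is not the query referred to above. It is only a minimal Python sketch of the proposed rule ("quads can only be accompanied by continent_ocean, country, state_prov"), assuming a geography record is a plain dictionary. The names continent_ocean, country, and state_prov come from the proposal; `quad` and any other keys are hypothetical stand-ins for whatever the real columns are called.

```python
# Fields the proposal would allow on a record that has a quad.
# "quad" itself is a hypothetical column name used for illustration.
ALLOWED_WITH_QUAD = {"continent_ocean", "country", "state_prov", "quad"}


def quad_conflicts(record: dict) -> list[str]:
    """Return the populated fields that the proposed rule would not allow next to a quad.

    Records without a quad are never in conflict under this rule.
    """
    if not record.get("quad"):
        return []
    return sorted(
        field
        for field, value in record.items()
        if value not in (None, "") and field not in ALLOWED_WITH_QUAD
    )
```

A record with country, state_prov, and quad would return an empty list; adding a populated county (or island, drainage, and so on) would return those field names, which is the kind of row the conflict count above is about.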
BUT in the lower 48, one can narrow the spatial footprint with county + quad, as some quads extend over two or more counties.
The simple solution is "it ain't geography unless it's accompanied by spatial data" which would work for anyone who wants to calculate that overlap, but there's absolutely no way we're going to accomplish that and retain "legacy" geography, and so far nobody's stepped up offering to calculate those intersections.
I don't think that's a direction The Community is comfortable (or capable of) going - #4289 - and geography is certainly not a necessary component of "narrowing."
It's not that important for us to have quads and drainages in higher geography (it doesn't seem to make sense to me there anyway), but locality attributes would be useful. Carl doesn't think it's useful in higher geog either. Verbatim assertions could work. Just want it in there in some way, and easy to search by.
Intersecting geographies have been assigned single-component spatial data in production (probably when we first got WKT for quads). #4863 (reports) is trying to put https://arctos.database.museum/guid/UAM:Herb:109929 (Canada) in https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=4838 (claims Alaska, mostly maps to Canada). The inappropriate spatial data should be removed.
I think the two things you're referencing are geographical and political "stuff." I think of most of our borders as wet and dry - that is, in a water body (e.g., Gulf of Mexico) but needing to be linked to the dry land it's offshore of (e.g., Egmont Key, Florida). While it's interesting, I'm not sure resolving quads will do much for our marine records, but maybe I'm misinterpreting the intersection.
I'm bumping this up for AWG discussion, and will attempt to summarize here. We have access to cool data which supports cool tools; we need to modernize our geography model (and viewpoint) to access them. The slightly more technical core of the issue is mostly discussed above: the problematic geography largely consists of different "categories" overlapping each other. We can support 1:250K quads, we can support Counties, we can support Islands; mixing them all up results in placenames for which I cannot get spatial data, and so we cannot know if georeferences to those places are appropriate. Here are a few examples from the most-used current nonspatial geography and how I'd handle them.
Those 5 example records represent over 100,000 records excluded from many (most?) analyses, with another 7,000 plus geography records having the same limitations. I don't think we need to do anything radical to the model; we just need to constrain ourselves to using the pieces of the model which represent accompanying spatial data. Under that view, someone who wants to bring a shape for the 5 square feet of some (quad+county+feature+drainage) overlap would be welcome to do so; the answer to "what is geography?" would be "something with spatial data." (Radical refactoring of the model might result in more consistent data and such; I certainly am not opposed to a complete rethink, but I'll settle for low-hanging fruit at the moment. And I don't think any interim more-spatial thing we might do could much conflict with any sort of radical refactor.) Failing to do anything seems within the realm of possibilities; if we end up there I want my promised 48-point flashing red warnings - this seems like a 100% social problem to me. We can get spatial data and confine georeferences to it; if we do anything less it's because we've chosen to, and users deserve to know that Arctos has the technical capability to produce and recognize Research Grade place-data. (Semi-related, #4916 should be seen in the same vein - I've already georeferenced most everything; choosing not to expose it and see how it fits with other stuff also seems like something we should tell users, loudly.) EDIT: response to some comments from @sharpphyl in #4894
There are two distinguishing features that should make these completely unambiguous
The actual name is مِنْطَقَة مَكَّة; https://handbook.arctosdb.org/documentation/higher-geography.html#guidelines-for-geographic-terms-in-arctos requires ASCII, so whatever we do is a semi-arbitrary transcription, and one's about as good as another. (My spatial data has one Makka and one Makkah...)
AWG discussed; consensus is to move in a spatial direction. It is understood that doing so will require a lot of conversion and cleanup. I will be opening individual Issues to address specific problems as they're encountered. @mkoo is working on munging island data into something I can use, which will allow disentangling lots of localities. It was noted that Geography Shape Name provides a way to search by unasserted places; removing (island, continent, whatever) from "combo geography" does not affect discoverability. At the moment I don't think any model adjustments are necessary, but how we view/use the model does need to change (and the documentation needs to reflect that). Essentially, I think that means asserting only what's really required instead of filling in all of the blanks; adding continent to Russia just reduces accessibility, so don't.
Merge-->#5138
Is your feature request related to a problem? Please describe.
I'm trying to add spatial data to geography, some geography is intersections, I'm not sure how "geog-like" that is or why this has happened. (Because it's useful, or because things are sorted by eg quad - in which case those data might be just as useful elsewhere - locality attributes or part attribute location or ????)
Describe what you're trying to accomplish
Awesomeify data without wasting time on things that should be dealt with elsewhere.
Describe the solution you'd like
Magic, but I can't find any.
Describe alternatives you've considered
By way of example:
ST_Intersection
of those two things, but that's a completely manual process with a LOT of room for error. (Feature Request - parent geography #3108 might have changed that, but it didn't.)
Additional context
Something like 70% of the records using this particular (arbitrary) geography aren't contained by it, although the vast majority do intersect. Others may be better or worse, but I suspect that this is fairly typical. That leads me to believe these wouldn't be great 'best fit' candidates (if we ever get tired of manually fixing stuff and asserting things that can't be true and etc.), but I'm happy to figure this out if there's some use case.
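To make the "contained by" versus "intersects" distinction concrete, here is a minimal Python sketch. It assumes both the geography and a record's footprint can be approximated as latitude/longitude bounding boxes, which is a big simplification (real counties and georeferences are not rectangles); the `Box` type and function names are hypothetical and are not Arctos or PostGIS code.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in decimal degrees (a stand-in for real shapes)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box, or None when there is no area of overlap."""
    box = Box(
        max(a.min_lon, b.min_lon),
        max(a.min_lat, b.min_lat),
        min(a.max_lon, b.max_lon),
        min(a.max_lat, b.max_lat),
    )
    # Boxes that only touch along an edge or corner have no area in common.
    if box.min_lon >= box.max_lon or box.min_lat >= box.max_lat:
        return None
    return box


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies entirely within outer (shared edges allowed)."""
    return (
        outer.min_lon <= inner.min_lon
        and outer.min_lat <= inner.min_lat
        and outer.max_lon >= inner.max_lon
        and outer.max_lat >= inner.max_lat
    )
```

In these terms, a record whose footprint straddles the geography's edge has a non-empty `intersection` but fails `contains`, which is the situation described for most of the records above.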
@Nicole-Ridgwell-NMMNHS your perspective would be useful here - what's the point of all those quads (which overlap with counties)?
Priority
Functional impact largely depends where #4834 leads, but ANY sort of decision regarding what is or isn't "geography" would be absolutely amazing.