Location Table Updates #91

Closed
clairblacketer opened this issue Aug 2, 2017 · 13 comments

@clairblacketer
Contributor

Location Table Updates


Proposal

Relevant table: Location

  • Add a country field
  • Add a way to capture region
  • Add latitude and longitude
  • The GIS workgroup will also have some changes they are interested in

| Field | Required | Type | Description |
|---|---|---|---|
| location_id | Yes | integer | A unique identifier for each geographic location. |
| country | No | varchar(3) | The ISO country code |
| address_1 | No | varchar(50) | The address field 1, typically used for the street address, as it appears in the source data. |
| address_2 | No | varchar(50) | The address field 2, typically used for additional detail such as buildings, suites, floors, as it appears in the source data. |
| city | No | varchar(50) | The city field as it appears in the source data. |
| state | No | varchar(2) | The state field as it appears in the source data. |
| zip | No | varchar(9) | The zip or postal code. |
| county | No | varchar(20) | The county. |
| region | No | varchar(50) | The region field as it appears in the source data. |
| location_source_value | No | varchar(50) | The verbatim information that is used to uniquely identify the location as it appears in the source data. |
| latitude | No | float | The geocoded latitude |
| longitude | No | float | The geocoded longitude |

Conventions

  • The country field will be based on the ISO 3166-1 vocabulary, which officially designates 2- and 3-character codes for 249 countries. The alpha-3 set, which uses three characters, is the suggested version.
  • For standardized geospatial visualization and analysis, addresses need, at a minimum, to be geocoded into latitude and longitude. This allows each address to be plotted as a point on a map. This proposal adds two fields, latitude and longitude, to the location table.
  • We should allow regions to propose region-specific extensions to the CDM/vocabulary (needs elaboration)
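
A minimal sketch, in Python, of how the country and latitude/longitude conventions above could be checked during ETL. The function names are hypothetical and not part of the CDM. The country check only tests the three-letter shape of an alpha-3 code; it does not verify membership in ISO 3166-1.

```python
import re

# Shape of an ISO 3166-1 alpha-3 code: exactly three uppercase ASCII letters.
# Assumption: this is a shape check only, not a lookup against the real code list.
ALPHA3_SHAPE = re.compile(r"[A-Z]{3}")


def is_plausible_alpha3(code: str) -> bool:
    """Return True if `code` looks like an alpha-3 country code."""
    return ALPHA3_SHAPE.fullmatch(code) is not None


def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Return True if the pair lies within the usual geographic ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

A value such as "USA" passes the shape check, while "US" or "usa" would be flagged for review before loading.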
@JaehyeongCho

I think recording the location information in the observation table would be an alternative to 'Add a way to capture region'.

@JaehyeongCho

I would like to know when the extended location model presented in this issue will be incorporated into an OMOP CDM release.

Will it be reflected in the next OMOP-CDM update?

@gowthamrao
Member

I second that. Let's at least get the least controversial fields, latitude and longitude, into the location table.

@cgreich
Contributor

cgreich commented Feb 23, 2018

@gowthamrao: What's controversial about the others? I wouldn't call the field "zip" but "postalcode" (zip is an utterly American thing), and I might challenge whether we can merge the region or county fields, but that's not very controversial. If you think it's ready, bring it on. @clairblacketer holds the meeting agenda strings.

@gowthamrao
Copy link
Member

Exactly. You just mentioned all the controversies: region, zip, etc. Lat and long are noncontroversial. We need to propose them ASAP and try to incorporate them into the CDM.

@cgreich
Contributor

cgreich commented Feb 23, 2018

Bring it on.

@clairblacketer
Contributor Author

@gowthamrao @cgreich I'll add this to the agenda for next Tuesday and I will invite Robert Miller as well from the GIS workgroup.

@cgreich
Contributor

cgreich commented Mar 2, 2018

Well, all of those guys say they just want country (though I don't understand why, since the value will always be the same within a database, with the rarest exceptions). Only Robert and the Koreans have GIS. Want to invite those?

@clairblacketer
Contributor Author

Sure - I invited Robert Miller already and I will extend the invite to the Koreans as well.

@rtmill

rtmill commented Mar 6, 2018

We didn't have time during the CDM call today to discuss this proposal, so I figured I'd put my two cents down while the idea is still fresh. To preface, this sets aside the separate discussions of how to make the location table internationally friendly and of adding a field to represent altitude (as Melanie Philofsky has suggested), as I assume it's best to keep the topics distinct.

In short, my revised proposal is nearly the exact format that @gowthamrao put forward in his proposal (happy 1 year anniversary Gowtham's post!). The only difference would be to increase the size limit of the country field to 100 characters or so, just in case. TODO: Data mine Gowtham's forum post history, pass off ideas as my own

Is there anyone who disagrees with adding latitude and longitude columns? It appears to be low-hanging fruit that we could push forward even if we get hung up on the other modifications to the location table.

Differences from current proposal

1) Use free text instead of ISO codes to represent countries

  • Do we have use cases where institutions have data from enough distinct countries that they need to put countries into universal coding/concepts? For the small percentage of users that do have data from multiple countries, as long as they are consistent with how they represent the country as a string it should be sufficient for querying (what sort of analysis are we using country for?)
  • Fits pattern currently used in location table for street address fields (string as appears in source)
  • If you are using your location data to do spatial analysis, you will most likely have your data geocoded (latitude/longitude fields populated) and could thus infer the country rather than rely on the field

2) Do not maintain a reference to a region within the location table

  • Locations are not equivalent to regions (discussed here)
  • We should avoid forcing many-to-many relationships into one-to-one representations
    • Why continue this pattern of design that keeps biting us later on? For instance, currently there's no way to represent location history (though a proposal is ready for review) due to the CDM assumption that only one location is relevant throughout a person's life
      • e.g. Person to Location, Person to Care site, Person to Provider, Provider to Care Site, Care Site to Location, Provider to Location, etc.
      • We can avoid these issues by:
        • Restricting temporally qualified data to the tables that have temporal qualifiers (date fields)
        • Refraining from storing dynamic data in static tables
      • In other words, restrict the type of data we keep in entity tables (person, location, provider, care site) to one-to-one, static information (e.g. date of birth and race in PERSON, street address and lat/long in LOCATION)
    • The relationship between location and region is many-to-many
    • Which region type would be the standard? How do we reconcile institutions' varying levels of granularity? What about international data?
    • Regions do not always roll up to one another (e.g. census block to hospital service area), so a single location-to-region reference would not be sufficient
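
To illustrate the many-to-many point above, here is a rough sketch in Python using the standard-library sqlite3 module. The region and location_region tables, their columns, and the sample rows are all hypothetical; they are not part of this proposal or the CDM. The sketch only shows that one location can belong to several regions that do not roll up to one another.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical region tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            latitude REAL,
            longitude REAL
        );
        -- Hypothetical table: one row per region of any type.
        CREATE TABLE region (
            region_id INTEGER PRIMARY KEY,
            region_type TEXT NOT NULL,
            region_name TEXT NOT NULL
        );
        -- Hypothetical bridge table carrying the many-to-many relationship.
        CREATE TABLE location_region (
            location_id INTEGER NOT NULL REFERENCES location (location_id),
            region_id INTEGER NOT NULL REFERENCES region (region_id),
            PRIMARY KEY (location_id, region_id)
        );
        """
    )
    # Illustrative rows only: one location that falls in two unrelated regions.
    conn.execute("INSERT INTO location VALUES (1, 39.74, -104.99)")
    conn.executemany(
        "INSERT INTO region VALUES (?, ?, ?)",
        [
            (10, "census block", "Example block"),
            (20, "hospital service area", "Example HSA"),
        ],
    )
    conn.executemany(
        "INSERT INTO location_region VALUES (?, ?)", [(1, 10), (1, 20)]
    )
    conn.commit()
    return conn


def regions_for_location(conn: sqlite3.Connection, location_id: int) -> list:
    """Return (region_type, region_name) pairs for one location."""
    return conn.execute(
        """
        SELECT r.region_type, r.region_name
        FROM location_region AS lr
        JOIN region AS r ON r.region_id = lr.region_id
        WHERE lr.location_id = ?
        ORDER BY r.region_type
        """,
        (location_id,),
    ).fetchall()
```

With this shape, a single region column on the location table is unnecessary: the same location row resolves to both a census block and a hospital service area through the bridge table.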

| Field | Required | Type | Description |
|---|---|---|---|
| location_id | Yes | integer | A unique identifier for each geographic location. |
| address_1 | No | varchar(50) | The address field 1, typically used for the street address, as it appears in the source data. |
| address_2 | No | varchar(50) | The address field 2, typically used for additional detail such as buildings, suites, floors, as it appears in the source data. |
| city | No | varchar(50) | The city field as it appears in the source data. |
| state | No | varchar(2) | The state field as it appears in the source data. |
| zip | No | varchar(9) | The zip or postal code. |
| county | No | varchar(20) | The county. |
| country | No | varchar(100) | The country as it appears in the source data |
| location_source_value | No | varchar(50) | The verbatim information that is used to uniquely identify the location as it appears in the source data. |
| latitude | No | float | The geocoded latitude |
| longitude | No | float | The geocoded longitude |

@gowthamrao
Member

gowthamrao commented Mar 6, 2018

@rtmill good post.

Altitude - can't it be derived at analytics time? Is there any use case for having altitude on the location table without lat and long? If lat and long are prerequisites, can't altitude be derived from them through a lookup?
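
One crude way to picture such a lookup, sketched in Python: given a latitude/longitude pair, return the elevation of the nearest point in a reference table. The ELEVATION_GRID table and the lookup_altitude function are hypothetical, and the two entries are rough illustrative values rather than real reference data. A real implementation would presumably query a proper elevation dataset and use a geodesic distance rather than a comparison in raw degrees.

```python
from typing import Dict, Tuple

# Hypothetical reference table: (latitude, longitude) -> elevation in metres.
# Approximate, illustrative values only.
ELEVATION_GRID: Dict[Tuple[float, float], float] = {
    (39.74, -104.99): 1609.0,  # roughly Denver
    (25.76, -80.19): 2.0,      # roughly Miami
}


def lookup_altitude(
    latitude: float,
    longitude: float,
    grid: Dict[Tuple[float, float], float] = ELEVATION_GRID,
) -> float:
    """Return the elevation of the grid point nearest to the given coordinates.

    Nearest is judged by squared difference in degrees, which is a
    simplification that ignores the curvature of the earth.
    """
    nearest = min(
        grid,
        key=lambda point: (point[0] - latitude) ** 2 + (point[1] - longitude) ** 2,
    )
    return grid[nearest]
```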

@MelaniePhilofsky
Collaborator

We, the University of Colorado Denver, are interested in altitude data as a criterion for including patients in a cohort.

@clairblacketer
Contributor Author

added in v6.0
