Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add city field to research_organizations table #34

Merged
merged 3 commits into from
Jan 8, 2024

Conversation

dtgupta
Copy link
Contributor

@dtgupta dtgupta commented Jan 5, 2024

Add city field to research_organizations table and update corresponding tests. Adding city field to this table would make querying affiliations from specific cities efficient.

@dspinellis
Copy link
Owner

The PR is well done; thanks! However, while reviewing the code I saw that you're using addresses[0], so this should come from a one-to-many relationship. I then looked at the code, and saw that there is already a ror_addresses table containing a city field. Am I missing something?

@dtgupta
Copy link
Contributor Author

dtgupta commented Jan 5, 2024

Indeed, it may look as if the addresses field has a one-to-many relationship with multiple other addresses, but it's deceiving. I have looked at the ROR dataset and none of the organizations has a 2nd element under addresses.

I added city as a field to simply querying for the process I'm working on. The database cursor is unable to perform JOIN operation to get the city field from ror_addresses table. So it would require an additional time to narrow down the search results while matching. If you're still unsure about this, I am planning on creating another PR with the process soon, you can merge this branch after viewing the changes.

@dspinellis
Copy link
Owner

You are absolutely right:

$ jq '.[].addresses|length' v1.39-2023-12-20-ror-data.json  | sort -u
1

In this case all the elements of the addresses table, not just city should be folded into research_organizations, probably with field names prefixed by address_. A comment should indicate why this is the case.

@dtgupta
Copy link
Contributor Author

dtgupta commented Jan 8, 2024

Yes, that is a good idea! I have folded the addresses of research organizations into the main table, deleted the ror_addresses table and updated the tests. The previous commit reflects these changes.

Copy link
Owner

@dspinellis dspinellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@dspinellis
Copy link
Owner

Well done!

@dspinellis dspinellis merged commit 4a635a7 into dspinellis:main Jan 8, 2024
5 checks passed
@dtgupta dtgupta deleted the add_city_field branch January 8, 2024 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants