Find geoJson files that aren't recognized as geoJson files; use redetect endpoint so that they're recognized as geoJson files #200

Open
jggautier opened this issue Nov 9, 2022 · 2 comments


jggautier commented Nov 9, 2022

In #168 I wrote about a geoJson file that the Harvard repository didn't recognize as a geoJson file. Once the redetect API endpoint had been worked on, running it against that file made the repository label it as a geoJson file. One reason for doing this is so that the file can be previewed in web browsers.

There may be other geoJson files that the Harvard repository does not recognize as such. This issue is to recommend and track the work of finding those files and using the redetect API endpoint to make the Harvard repository recognize the files as geoJson files.


pdurbin commented Nov 18, 2022

Detecting GeoJSON files is currently based entirely on the file extension of .geojson (see IQSS/dataverse#8262) so you should be able to find candidates for redetection with this call to the Search API (with some iteration):

https://dataverse.harvard.edu/api/search?q=name:*.geojson

You'll know if the redetection is successful if file_content_type changes from application/octet-stream to application/geo+json.
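The "some iteration" mentioned above could look roughly like the following sketch, which pages through the Search API results and keeps only the files whose file_content_type is not yet application/geo+json. The `type`, `start`, and `per_page` query parameters and the `data.items` / `data.total_count` response fields are assumptions based on the general Search API documentation, not something confirmed in this thread, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dataverse.harvard.edu"
GEOJSON_TYPE = "application/geo+json"


def build_search_url(query, start=0, per_page=100, base_url=BASE_URL):
    """Return a Search API URL for one page of file results.

    Assumption: the Search API accepts type, start and per_page parameters.
    """
    params = urllib.parse.urlencode(
        {"q": query, "type": "file", "start": start, "per_page": per_page}
    )
    return f"{base_url}/api/search?{params}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (this performs a network call)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_candidates(query="name:*.geojson", per_page=100, fetch=fetch_json):
    """Page through the results and yield files not yet typed as GeoJSON."""
    start = 0
    while True:
        data = fetch(build_search_url(query, start, per_page))["data"]
        items = data["items"]
        for item in items:
            if item.get("file_content_type") != GEOJSON_TYPE:
                yield item
        start += len(items)
        # Stop on an empty page or once every result has been seen.
        if not items or start >= data["total_count"]:
            break
```

Running `list(find_candidates())` before and after redetection would give a rough way to confirm that the list of unrecognized files is shrinking.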

@jggautier

I've revisited this GitHub issue as part of an effort to review and prioritize work proposed in GitHub issues in the IQSS/Dataverse repo that have been open for years (IQSS/dataverse-pm#114).

Because GeoJSON detection is based only on the .geojson file extension, the Search API call Phil shared, https://dataverse.harvard.edu/api/search?q=name:*.geojson, shows the files that have that extension.

We can search for files that already have the file type of GeoJSON with another Search API call: https://dataverse.harvard.edu/api/search?q=fileType:GeoJSON

And we can see which files have the extension of .geojson but not the file type GeoJSON with this Search API call: https://dataverse.harvard.edu/api/search?q=name:*.geojson+NOT+fileType:GeoJSON.

There are 84 such published files as of this writing (late July 2024). Adding a superuser API token should return any unpublished files, too.
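For anyone picking this up, here is a minimal sketch of how the authenticated search and the redetect calls might be built. It assumes the API token is sent in the `X-Dataverse-key` header and that the redetect endpoint is `POST /api/files/{id}/redetect` with a `dryRun` parameter, as I understand the Dataverse API guides; the helper names are hypothetical, and nothing here sends a request until it is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dataverse.harvard.edu"
QUERY = "name:*.geojson NOT fileType:GeoJSON"


def search_request(api_token=None, start=0, per_page=100, base_url=BASE_URL):
    """Build a Search API request for .geojson files not typed as GeoJSON.

    Assumption: a token in the X-Dataverse-key header lets the search
    return unpublished files that the token's account can see.
    """
    params = urllib.parse.urlencode(
        {"q": QUERY, "type": "file", "start": start, "per_page": per_page}
    )
    request = urllib.request.Request(f"{base_url}/api/search?{params}")
    if api_token:
        request.add_header("X-Dataverse-key", api_token)
    return request


def redetect_request(file_id, api_token, dry_run=True, base_url=BASE_URL):
    """Build a POST request to the redetect endpoint for one file.

    Assumption: dryRun=true reports the detected type without saving it.
    """
    flag = "true" if dry_run else "false"
    url = f"{base_url}/api/files/{file_id}/redetect?dryRun={flag}"
    return urllib.request.Request(
        url, method="POST", headers={"X-Dataverse-key": api_token}
    )
```

Defaulting to a dry run seems like the safer choice here: the detected types can be reviewed for all of the files before anything is changed in the repository.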
