Setting object_pairs_hook for json.loads #313
Thanks for pointing this issue out. It's an interesting edge case; I'm curious how the dataset maintainer ended up with JSON with duplicate keys in the first place, as most implementations of JSON won't allow this.

There's a change coming to STAC_IO to refactor it to be more flexible, after some difficulties in dealing with authorization headers in pystac-client. This change will likely move to an object-based IO class that can be overridden from the default, instead of a top-level package STAC_IO. Not sure that's applicable here, but something to keep in mind while thinking through changes to the current STAC_IO structure.

My current thinking around the STAC_IO refactor is to have an optional IO object passed into any read or write operation. If the user doesn't provide the optional implementation, then the operation will try to use:

A. the IO implementation that was defined for a previous IO operation against a parent/root (which means STAC objects will keep track of and reuse the IO implementation that was used to create them, e.g. for reading links), or
B. a default implementation, which can be passed into the root catalog or perhaps set at the package level if a user wants to set it once and forget about it.

In the above setting, we could provide a few standard implementations, e.g. one that uses … This might be a bit hard to parse through without code to look at (a rough sketch of the fallback order follows below), but I was planning on tackling the IO refactor as part of a larger set of refactors in #309, which is making some bigger changes to the library in preparation for the final STAC 1.0 release. I can take this …
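A minimal sketch of the fallback order described above. All names here (`StacIO`, `DEFAULT_STAC_IO`, `resolve_stac_io`) are illustrative assumptions, not the planned pystac API:

```python
from typing import Optional


class StacIO:
    """Hypothetical IO object that read/write operations delegate to."""

    def read_text(self, href: str) -> str:
        raise NotImplementedError


DEFAULT_STAC_IO: Optional[StacIO] = None  # the "set once and forget" package-level default


def resolve_stac_io(explicit: Optional[StacIO], parent_io: Optional[StacIO]) -> StacIO:
    # 1. An IO object passed directly into the read/write operation wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise reuse the implementation that was used against the
    #    parent/root, so links are read the same way the catalog was.
    if parent_io is not None:
        return parent_io
    # 3. Fall back to the default set on the root catalog or at the package level.
    if DEFAULT_STAC_IO is not None:
        return DEFAULT_STAC_IO
    raise RuntimeError("No StacIO implementation available")
```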
It was a case of copy-pasting an asset entry, which I guess is pretty common in manually edited metadata.
For sure. It's also an application-specific handler. That said, it could easily be generalised to just throw an exception when detecting a duplicate. Looking forward to seeing the final version of #309; let me know if you want a review of it.
@l0b0 this is implemented in #309, particularly d0c6cc5. I still need to add a test case (or accept a PR into that branch with a test case :-)), but you should be able to do this now:

```python
import pystac as ps
from pystac.stac_io import DefaultStacIO, DuplicateKeyReportingMixin


class ReportingStacIO(DefaultStacIO, DuplicateKeyReportingMixin):
    pass


ps.StacIO.set_default(ReportingStacIO)
```

Thanks for reviewing!
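For illustration, a hypothetical use of the snippet above once the default is set. The file path is made up, and since the exact exception type is whatever the mixin in #309 raises, this sketch catches `Exception` to stay agnostic:

```python
import pystac as ps

# With ReportingStacIO set as the default, reads go through the mixin's
# duplicate-key check instead of json.loads' silent last-value-wins behaviour.
try:
    item = ps.read_file("item-with-duplicate-asset-keys.json")  # hypothetical file
except Exception as err:  # the mixin's exception type is defined in #309
    print(f"Refused to load STAC object: {err}")
```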
I'm working on a project which reads, parses and validates STAC files on AWS S3 recursively, creates a list of asset and metadata files, and copies all of them across to some reference storage. In other words, if a file is not mentioned in the metadata, it will not be copied.
The STAC documentation specifies that assets in collections have unique keys. Unfortunately, the STAC JSON schema can't encode this and the Python JSON parser does not treat duplicate keys as an error.
Combining the above means the following scenario is likely to happen (and has already happened during a pre-production test): a collection ends up with an `assets` property containing two or more identical keys, for example `"assets": {"foo": {"href": "s3://bucket/first"}, "foo": {"href": "s3://bucket/second"}}`. At this point there is no indication that anything went wrong, but in fact the way the JSON parser works means it has silently dropped one of the assets. Detecting and reporting such issues is highly desirable, and can be done by setting the `object_pairs_hook` in `json.loads` (implementation, test); a sketch of the technique follows below.
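A minimal sketch of the `object_pairs_hook` technique, assuming we simply want to raise on the first duplicate key; the hook name and exception type here are illustrative, not the linked implementation:

```python
import json


def reject_duplicate_keys(pairs):
    """Fail loudly instead of letting the last duplicate key win."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"Duplicate key in JSON object: {key!r}")
        obj[key] = value
    return obj


# The second "foo" would normally overwrite the first one without any warning.
doc = '{"assets": {"foo": {"href": "s3://bucket/first"}, "foo": {"href": "s3://bucket/second"}}}'
json.loads(doc, object_pairs_hook=reject_duplicate_keys)  # raises ValueError
```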
We're evaluating whether to use pystac for this project, and this is one of the features of our current code base we would like to keep. Would you be interested in a PR to enable this? It would presumably be very similar to the `read_text_method` mechanism you already have in `STAC_IO`.