Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simpler extension interface #1051

Closed
jsignell opened this issue Mar 17, 2023 · 16 comments · Fixed by #1243
Closed

Simpler extension interface #1051

jsignell opened this issue Mar 17, 2023 · 16 comments · Fixed by #1243
Assignees
Milestone

Comments

@jsignell
Copy link
Member

I just started interacting with the extensions in pystac and I was surprised by the API. As an xarray user I was expecting to just do item.proj. I think we can implement that by copying the _CachedAccessor code from xarray (this seems to be what pandas has done as well)

Getting extension and setting fields on it

Existing API

from pystac.extensions.proj import ProjectionExtension
import pystac 

PROJ_EXT_EXAMPLE_URL = "https://raw.githubusercontent.com/stac-extensions/projection/v1.1.0/examples/item.json"
item = pystac.Item.from_file(PROJ_EXT_EXAMPLE_URL)

proj_ext = ProjectExtension.ext(item)
proj_ext.proj

Proposed API

import pystac 

PROJ_EXT_EXAMPLE_URL = "https://raw.githubusercontent.com/stac-extensions/projection/v1.1.0/examples/item.json"
item = pystac.Item.from_file(PROJ_EXT_EXAMPLE_URL)

item.proj.epsg

Adding extension to item

I'm not sure what adding new extensions to existing items should look like.

Existing API

from pystac.extensions.eo import EOExtension

eo_ext = EOExtension.add_to(item)
eo_ext.cloud_cover = 0.7

Options

In the most magical world (probably too magical) just setting an extension field on the item could add the extension to the item:

item.eo.cloud_cover = 0.7

Another option would be to preserver the existing explicit add_to and remove_from functionality but maybe with the functions defined on the STACObject class so you could do:

item.add_extension("eo")  
item.eo.cloud_cover = 0.7

Either way, I think trying to set the field directly: item.proj = ProjectionExtension() should fail.

xref: #448 where extensions are also being discussed.

@jsignell
Copy link
Member Author

I did a little implementation (no types yet) just to get a sense of what this feels like and I am pretty sure I like it in the get case.

@gadomski
Copy link
Member

A couple of comments/questions.

Extension names as enum

item.add_extension("eo")

This should probably be an enum, e.g.:

class Extension(enum):  # Extensions would then inherit from a `BaseExtension` or `AbstractExtension` or something
    Eo = "eo"
    Projection = "projection"
    ...

Bulk update

Either way, I think trying to set the field directly: item.proj = ProjectionExtension() should fail.

I think having some sort of interface for bulk-updating an extension's values is important. It can be useful to store constant values as dictionaries (e.g. this example for the classification extension).

I'm not sure what that would look like, maybe:

item.proj.update(epsg=4326, shape=[42, 16])

Assets

Would assets have a similar interface? E.g.:

asset = item.assets["data"]
asset.raster.bands[0].nodata = 42

Sub structures

Many extensions have sub-structures, e.g. raster:bands. Would those still be their own standalone data structures? E.g.:

asset.raster.bands = [RasterBand(nodata=42, data_type="int16")]

@gadomski gadomski added this to the 2.0 milestone Mar 24, 2023
@jsignell
Copy link
Member Author

I was definitely approaching this more from the perspective of someone reading rather than writing STAC. I just spent some time looking over stactools/sentinel2 to get a sense of what the write patterns look like. So with that in mind I am going to try to answer Pete's questions. But if there is just no appetite for this type of change, feel free to close this ticket.

Extension names as enum

item.add_extension("eo")

This should probably be an enum, e.g.:

I am leaning more towards the very magic solution of not requiring these at all. So how you add an extension to an item is just by starting to write fields on it.

item.eo.cloud_cover = 0.7

Bulk update

Either way, I think trying to set the field directly: item.proj = ProjectionExtension() should fail.

I think having some sort of interface for bulk-updating an extension's values is important. It can be useful to store constant values as dictionaries (e.g. this example for the classification extension).

I'm not sure what that would look like, maybe:

item.proj.update(epsg=4326, shape=[42, 16])

I think that is pretty much right and already implemented in *Extension.apply. We can add enforcement of this method in the base class.

Assets

Would assets have a similar interface? E.g.:

asset = item.assets["data"]
asset.raster.bands[0].nodata = 42

Yes I think all pystac objects would have this interface.

Sub structures

Many extensions have sub-structures, e.g. raster:bands. Would those still be their own standalone data structures? E.g.:

asset.raster.bands = [RasterBand(nodata=42, data_type="int16")]

I think those would still be standalone structures yeah.

@gadomski
Copy link
Member

gadomski commented Apr 11, 2023

Cool, I'm on board with your suggestions (with my one comment below). How do you propose handling breaking changes to extensions? Currently we're (you're) building support for non-breaking changes, but how will pystac handle a world where there's (e.g.) two versions of proj in the wild that aren't directly compatible from a data structure sense?

I am leaning more towards the very magic solution of not requiring these at all. So how you add an extension to an item is just by starting to write fields on it

We should retain some way to check to see if an extension is present on an object. Maybe something like this?

assert not item.has_extension("eo")
item.eo.cloud_cover = 42
assert item.has_extension("eo")
assert item.extension_url("eo") == "https://stac-extensions.github.io/eo/v1.1.0/schema.json"

@jsignell
Copy link
Member Author

This is what I was imagining:

assert item.eo is None  # or maybe it raises an error
item.eo.cloud_cover = 42
assert item.eo is not None

Too weird?

How do you propose handling breaking changes to extensions? Currently we're (you're) building support for non-breaking changes, but how will pystac handle a world where there's (e.g.) two versions of proj in the wild that aren't directly compatible from a data structure sense?

I haven't really thought that through. It feels like a separate issue. My original thinking was that if extensions had a simpler interface we could have a separate class per schema version (Option 2 from #448) and it would be less disruptive for the user since they don't really interact with the class directly. From what I've seen so far in new schema versions it seems like newer classes could inherit from older ones and then older ones could be retired on some timeline and when people try to use older one they could get an error and a suggestion to use stac-migrate

@gadomski
Copy link
Member

Too weird?

How would we check (e.g.) the version number/url? I think it's a good thing to inspect w/o having to dig into stac_extensions.

My original thinking was that if extensions had a simpler interface we could have a separate class per schema version

👍🏼 yup, that all makes sense, just wanted to make sure we at least were thinking about it as a part of this redesign (even if we're not fully implementing).

@jsignell
Copy link
Member Author

Too weird?

How would we check (e.g.) the version number/url? I think it's a good thing to inspect w/o having to dig into stac_extensions.

Once an extension has been added to an object you can access all properties and methods on the extension class via the namespace like item.eo.get_schema_uri()

@gadomski
Copy link
Member

Once an extension has been added to an object you can access all properties and methods on the extension class via the namespace like item.eo.get_schema_uri()

That does get a little weird with our recent support for multiple schema uris (#1091). item.proj.get_schema_uri() would return the latest version, but the underlying item might have an older version?

@jsignell
Copy link
Member Author

I think this whole idea lives in the world where every version of the schema has its own class. So get_schema_uri() (or possibly it's an attr schema_uri) would always be correct.

@gadomski
Copy link
Member

I think this whole idea lives in the world where every version of the schema has its own class. So get_schema_uri() (or possibly it's an attr schema_uri) would always be correct.

Even if there's no changes to the data structure? And how do we handle versions we don't have a data structure for (e.g. a new proj version that hasn't been ported to PySTAC yet).

Sorry for the slew of questions :-)

@jsignell
Copy link
Member Author

I feel like I'm missing something. What is the current mechanism for getting the version of an extension?

Even if there's no changes to the data structure?

Why would there be a release with no changes? But yes I think there would be a new class for every version. Even if it's just

class ProjectionExtension(ProjectionExtension_1_0_0):
    schema_uri = "https://stac-extensions.github.io/projection/v1.1.0/schema.json"

And how do we handle versions we don't have a data structure for (e.g. a new proj version that hasn't been ported to PySTAC yet).

I don't have a clear idea of how to handle this. My first inclination is to bake in an assumption about the schema_uri structure and do a match, but I'm not sure if that is a good idea or not.

@gadomski
Copy link
Member

What is the current mechanism for getting the version of an extension?

Right now, there isn't one other than get_schema_uri(). But as we expand to support multiple versions, IMO it would be a good idea to let a user inspect what version their object has.

Why would there be a release with no changes?

proj is an example. We relaxed the epsg requirement, and updated some underlying jsonschema versions. None of that required changes to the pystac code base other than the extension schema url.

My first inclination is to bake in an assumption about the schema_uri structure and do a match, but I'm not sure if that is a good idea or not.

This has been my instinct as well. I haven't built anything, but my idea was something like:

  • If it's a MINOR change, any unrecognized fields (i.e. attribute adds, say if we add proj:authority) get auto-added as attributes to the extension object.
  • If it's a MAJOR change, a warning is issued, but all attributes are still available on the object.

That way we can avoid breaking the ecosystem if pystac is slow for adding support for new extension versions.

@jsignell
Copy link
Member Author

Why would there be a release with no changes?

proj is an example. We relaxed the epsg requirement, and updated some underlying jsonschema versions. None of that required changes to the pystac code base other than the extension schema url.

I guess I was thinking that we were just not doing the minor data changes because there is no mechanism. For instance with the new proj version if we had an easy way to extend the existing class we probably would have made proj:epsg not required in pystac.

@gadomski
Copy link
Member

For instance with the new proj version if we had an easy way to extend the existing class we probably would have made proj:epsg not required in pystac.

Gotcha, and each extension data structure (e.g. ProjectionExtensionV1) would support multiple schema urls, a la #1091?

@jsignell
Copy link
Member Author

Gotcha, and each extension data structure (e.g. ProjectionExtensionV1) would support multiple schema urls, a la #1091?

Hmm yeah I guess so we have to support multiple schema urls per class if we want to not have a ceiling.

@jsignell
Copy link
Member Author

We were just talking about this at the STAC sprint and @lossyrob pointed out that if you only register the extension attributes dynamically then the types aren't very useful in a static environment. So it would be better to explicitly add at least the "core" extensions. These would be non-nullable getters that throw an error if the extension in question is not implemented for a particular object.

Additionally these extensions should be nested under the .ext attribute to indicate that they have a different level of support and potentially availability.

Note that we still want to let external libraries register extensions dynamically. Even if this means that static typing doesn't work for those extension classes.

@jsignell jsignell self-assigned this Sep 26, 2023
@jsignell jsignell moved this from Todo to In Progress in STAC Sprint 2023 Sep 27, 2023
@gadomski gadomski modified the milestones: 2.0, 1.9 Sep 27, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in STAC Sprint 2023 Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants