Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust JSON-LD deserialisation #489

Closed
tesaguri opened this issue Feb 14, 2024 · 3 comments · Fixed by #492
Closed

More robust JSON-LD deserialisation #489

tesaguri opened this issue Feb 14, 2024 · 3 comments · Fixed by #492
Labels
enhancement New feature or request

Comments

@tesaguri
Copy link
Contributor

tesaguri commented Feb 14, 2024

Is your feature request related to a problem? Please describe.

The Properties section of Activity Streams Vocabulary specifies the following:

Items not marked as "Functional" can have multiple values.

And many properties are not marked as Functional: even the type and attributedTo properties can take any number of values, for example.

Also, in the JSON-LD semantics, non-array values and single-value arrays are considered equivalent, unless the term is marked as "@container": "@list" in the JSON-LD context (which only applies to the orderedItems term in the current normative context of Activity Streams).

In fact, a completely compliant implementation MUST produce single values for any terms that are not marked as "@container": "@list" or "@container": "@set", including to, cc and items, wherever applicable, due to Activity Streams' requirement of producing "compacted" JSON-LD serialisations (cf. w3c/activitystreams#535), even though that's far less compatible with other implementations in practice.

Also, in the JSON-LD semantics (again), wherever a term takes an @id, the serialisation can embed the referenced object instead of just referencing it with an IRI.

Currently, the deserialisers in kitsune-type only partially handle the embedded quirks representations and don't handle the array quirks behaviors at all. I believe we want them to be more tolerant to these serialisations for compatibility with the wider ecosystem.

Describe the solution you'd like

Mastodon and Misskey, for example, handle unexpected multiple values by just taking the first value and ignoring the rest (cf. JsonLdHelper#first_of_value of Mastodon, getOneApId of Misskey) and I think we can follow these precedents.

I expect we can handle this efficiently by defining a couple of helper deserialiser functions that only takes the first value in the visit_seq method and takes the id in the visit_map method respectively, and annotate the struct fields with the #[serde(deserialize_with(..)] attribute.

Describe alternatives you've considered

We could make every struct field a Vec, which isn't very efficient given that we have no use for the additional elements.

@tesaguri tesaguri added the enhancement New feature or request label Feb 14, 2024
@aumetra
Copy link
Member

aumetra commented Feb 15, 2024

That would probably iron out some ActivityPub federation quirks we have at the moment, such as the federation issue with Mitra as reported by @silverpill

@silverpill
Copy link

@aumetra Sorry I don't remember which one... Something related to Accept activity?

Regarding deserialisers, that's a good idea. In my experience the most common scenarios are:

  • Value needs to be transformed into a single URL (actor, object, attributedTo, url).
  • Value needs to be transformed into an array of URLs (to, cc).
  • Value needs to be transformed into an array of objects (examples: tag, attachment).

Over time I developed several helpers similar to what @tesaguri proposed: https://codeberg.org/silverpill/mitra/src/tag/v2.9.0/mitra_federation/src/deserialization.rs

@aumetra
Copy link
Member

aumetra commented Feb 16, 2024

Sorry I don't remember which one... Something related to Accept activity?

I think it was something like that.. I also don't remember. No problem. We can just redo federation tests at some point

Over time I developed several helpers similar to what @tesaguri proposed

Those look nice! I'm just thinking if we should make them function on the Deserializer trait level and just make Serializer essentially transient.

That way we could get around the issue of having to first deserialize into the DOM form before storing it into the struct, which is a layer of indirection I'd like to avoid for every incoming ActivityPub object (since that will most likely be the most busy code path, compared to the rest of it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants