Enable full use of relative URLs in Solid #194

Open
bblfish opened this issue Aug 20, 2020 · 14 comments

@bblfish
Member

bblfish commented Aug 20, 2020

Solid Containers are based on LDP Basic Containers. Those make it difficult to use relative URLs reliably when publishing a document. Indeed the only relative URL that can reliably be used in a document POSTed to an ldp:BasicContainer is the "this document" <> relative URL. The same is true of the current Solid spec, as it inherits that definition.

A bit of history will help to understand how LDP got to this compromise. It was an uphill battle at the time to get relative URLs (expressed in Turtle with <>) to be accepted at all, as various members of the LDP Working Group (WG) claimed that RDF documents with relative URLs were not bona fide RDF documents at all. It took a while to overcome that inhibition. (The main argument is that POSTing is a speech/document act.) Having finally reached consensus on the use of <> to refer to the newly created document, members of the WG did not want to go further for fear of losing what they had achieved.

Over 7 years have passed and I think everyone is quite happy now with relative URLs. Indeed the Solid spec shows a desire to go further (as shown below). We can make publishing documents a lot easier by also allowing documents to be published containing any relative URL, even ones linking up the hierarchy with <../>. For people publishing HTML documents this would allow them to edit files locally and to publish those as they appear on their file system, with as little transformation as possible needed. It would also make it easier to publish new formats without the server or client needing to parse documents to absolutize relative URLs.

This would easily be feasible by creating a new type of Container. In a proposal from 2012 I called these intuitive Containers iContainers. Of course nothing in this argument hangs on the name. Here is what I wrote in issue 50: Intuitive Containers: better support for relative URIs in 2012:

Currently when creating a resource by POSTing a Turtle document to a container one cannot use the following relative URIs in the POSTed content:

<.> to refer to the creating container
<sibling> to refer to a sibling of the content created
<../other> to refer to a child of the parent of this container
<sister/child> to refer to a child of a sister content created in this container

The reason one cannot use any of these URIs that are allowed by Turtle and allowed by the URI spec, is that one cannot know what the URL of the resource created by a container will be given the current definition of an LDPC. Will it have any relation at all to the container URL? Since we are in the process of creating resources, it will help a lot to know when we are dealing with containers that have those properties.

Given that we accepted "ISSUE-29: Relative URIs are crucial" we therefore need to define a Container class that makes this guarantee to the client. This can be a subclass of an LDPC or it could be a new requirement on an LDPC.
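
The uncertainty described in the 2012 text can be illustrated with plain RFC 3986 resolution. The following is a minimal sketch, not part of the original proposal: it uses Python's standard `urllib.parse.urljoin`, and the container URL and the two candidate locations for the created resource are hypothetical examples.

```python
from urllib.parse import urljoin

CONTAINER = "http://example.org/foo/"            # hypothetical container POSTed to
REFS = [".", "sibling", "../other", "sister/child"]

def resolve_all(base, refs=REFS):
    """Resolve each relative reference against the URL of the created resource."""
    return {ref: urljoin(base, ref) for ref in refs}

# Case 1: the server creates the resource directly under the container.
intuitive = resolve_all("http://example.org/foo/bar")
# Case 2: the server creates the resource somewhere else (allowed by LDP).
opaque = resolve_all("http://example.org/baz/qux/quxx")

for ref in REFS:
    print(f"<{ref}>", intuitive[ref], opaque[ref], sep="\t")
```

Only in the first case does <.> resolve to the container that received the POST; in the second case every path-relative reference points somewhere the client did not intend.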

So how does this apply to the current Solid Spec? Currently in §2.4.1 Shared slash semantics it is written that:

The slash character in the URI path indicates hierarchical relationship segments, and enables relative referencing [RFC3986]. The semantics of the slash character is shared by servers and clients. Paths ending with a slash denote a container resource.

The problem with this is that this is not guaranteed by any of the ldp Containers. In order to allow this behavior to be explicit and so for there not to be clashes with existing ldp implementations from IBM, Oracle, OpenLink or others, which could lead to errors in publications, Solid needs to define a subclass of ldp:Container with the desired properties.

Furthermore there are very good use cases for the less precisely specified Containers as defined by ldp. So it would be a pity to lose those capabilities whilst also creating difficult-to-debug confusion in the wider community.

@csarven
Member

csarven commented Aug 20, 2020

Is there something about iContainer or specifically its relative URI use that's different than how relative URIs are resolved with a base URI? As RDF 1.1 Turtle uses RFC 3986's relative resolution: https://www.ietf.org/rfc/rfc3986.html#section-5.2 , I don't see why a server wouldn't be able to process a POST's payload containing valid relative references. Ditto JSON-LD, HTML.. How is this not guaranteed to the client? Which criteria in the LDP spec limits this possibility?

@bblfish
Member Author

bblfish commented Aug 21, 2020

Thanks for bringing that up. I was hoping to have such a discussion on the LDP group 8 years ago.

In the §5.2.3.7 of the LDP spec we find

LDP servers creating a LDP-RS via POST MUST interpret the null relative URI for the subject of triples in the LDP-RS representation in the request entity body as identifying the entity in the request body. Commonly, that entity is the model for the "to be created" LDPR, so triples whose subject is the null relative URI result in triples in the created resource whose subject is the created resource.

It does not say anything about other relative URLs. In particular the client cannot know if the server would interpret <.> URLs to refer to the container in which it was posted, or the URL space where it ends up being placed. Not knowing this, the client MUST absolutize all URLs except the local doc ones if it wants reliable behavior.

Note that with LDP the server does not need to parse the content to change relative URLs. It can just create a new resource somewhere, because document relative URLs will function out of the box as indicated by the LDP spec.
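
Why <> and fragment references survive any naming scheme can be seen with a small sketch (using Python's `urllib.parse.urljoin`; the two example locations and the helper name `resolve_document_refs` are assumptions for illustration). An empty reference always resolves to the base itself, wherever the server decides to mint the resource.

```python
from urllib.parse import urljoin

def resolve_document_refs(created_url):
    """Resolve the document-relative references <> and <#i> against the created resource."""
    return urljoin(created_url, ""), urljoin(created_url, "#i")

# Two hypothetical locations a server might choose for the same POSTed document.
for created in ("http://example.org/foo/bar", "http://example.org/baz/qux/quxx"):
    doc, topic = resolve_document_refs(created)
    print(doc, topic)
```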

With an iContainer the same would be true too: the server would not need to parse any of the content, because the container would guarantee that the names chosen for a new resource would result in all relative URLs functioning in a way that can be relied on by the client. As a result the client would not need to absolutize relative URLs referring up and down the URL tree to existing documents either.

@csarven
Member

csarven commented Aug 21, 2020

I do not know why 5.2.3.7 is specifically called for, but while it doesn't say more about other kinds of relative references, does it need to? Does Solid need to? Why can't a server while required to be able to parse some RDF bearing documents (re RDF 1.1 Concepts) not be equipped to resolve all valid relative IRIs in its description when producing its RDF graph? If not, wouldn't the server be using a non-conforming parser?

Is the issue you want to highlight about serializing in that relative IRIs in resource description may not be preserved and that it would be useful to make sure that happens? If so, would this be better documented in non-normative text or as a best practice, eg. along the lines of https://www.w3.org/TR/ldp-bp/#use-relative-uris if anything above and beyond?

@bblfish
Member Author

bblfish commented Aug 21, 2020

I do not know why 5.2.3.7 is specifically called for, but while it doesn't say more about other kinds of relative references, does it need to?

The LDP group started off with a lot of people who did not understand the value of relative URLs, because they were mesmerized by RDF semantics being defined in terms of absolute URLs. So they thought the only thing that would be legal to POST would be fully defined RDF documents, such as NTriples. This then led them to the following impasse: how can you POST a document that can refer to itself? I.e. how do you POST

<> a foaf:PersonalProfileDocument;
  foaf:primaryTopic <#i> .

if you are not allowed relative URLs in the POSTed content? To be precise: how should the client name the <#i> resource since it does not yet know the name of the yet-to-be created document?
Quite a conundrum!

Some went on to suggest that the client should name those nodes with UUIDs so that after creation they could PATCH the created document and change the mistaken URLs. Others tried to think of complex protocols to do the same thing.

Luckily pragmatism prevailed and it was resolved that it was possible to send documents with relative URLs to the server. The text you saw above in §5.2.3.7 is testament to that.

But it was not possible to get to the next stage of the discussion - the one we are having now - as Solid wants to specify (quite rightly) a more intuitive type of container. If you don't specify it you're left with the LDP one which does not provide the guarantees you want. So yes, Solid needs to create a subclass of ldp Containers if it wants

  • the behavior with slash URLs denoting containers
  • for the resources created by those containers to be the URL of the container + string that does not contain a /.
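
The two properties in the list above can be stated as a simple check on the URL a server returns after a POST. This is a sketch only: `is_direct_member` is a hypothetical helper name, the URLs are compared as plain strings, and treating a created child container (ending in a slash) as a valid member is an assumption made here.

```python
def is_direct_member(container_url, created_url):
    """True if created_url is container_url plus one non-empty segment without an inner '/'."""
    if not container_url.endswith("/"):
        return False  # by the shared slash semantics, containers end with a slash
    if not created_url.startswith(container_url):
        return False
    name = created_url[len(container_url):]
    # Assumption: allow one trailing slash so that a newly created child container qualifies.
    if name.endswith("/"):
        name = name[:-1]
    return name != "" and "/" not in name

print(is_direct_member("http://example.org/foo/", "http://example.org/foo/bar"))
print(is_direct_member("http://example.org/foo/", "http://example.org/foo/bar/baz/bam"))
```

A client talking to a container that guarantees this property would never need to run such a check; it is shown only to make the guarantee concrete.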

Why can't a server while required to be able to parse some RDF bearing documents (re RDF 1.1 Concepts) not be equipped to resolve all valid relative IRIs in its description when producing its RDF graph? If not, wouldn't the server be using a non-conforming parser?

How does the client know what the path relative URLs (eg. <.> ) refer to in the POSTed document, since it does not know where that document is going to land? And where is the behavior you are suggesting specified? (nowhere is the answer)

Furthermore there would be no need for the server to parse content to make sure the relative URLs do the right thing with an iContainer, since that follows from the definition §5.2 of RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. As a result you save yourself work on the client and on the server thereby removing a whole class of bugs whilst making the protocol a lot more intuitive.

Is there a problem with that?

@bblfish
Member Author

bblfish commented Aug 21, 2020

btw. you don't need to specify a subclass iContainer of ldp:BasicContainer.
You could also specify solid:Resource to have the desired properties.
Then it would be possible to write:

<>  a ldp:BasicContainer, solid:Resource;
    ...

as well as

<>  a ldp:DirectContainer, solid:Resource;
   ...

I am not sure how non Container resources for solid differ from ldp yet. If they don't then solid:Container would be enough.
There are quite a number of other logically equivalent ways of doing the same thing. The advantage of solid:Resource is that clients written for ldp would continue to work with solid resources.

@bblfish
Member Author

bblfish commented Aug 22, 2020

Let me know if you discuss this on the spec meetup, so I can make time to attend.

@bblfish
Member Author

bblfish commented Aug 31, 2020

For an example of the types of problems that the LDP group had to surmount with relative URLs, and that are, it seems, still alive, see the recent discussion on gitter with Martynas.

@bblfish
Member Author

bblfish commented Sep 3, 2020

In the discussions the following question came up, which is an interesting one asked by @csarven among others:

Should the LDP container not in fact correctly deal with documents that have relative URLs of all kinds by enforcing that <> in the posted document refers to the created document and that <.> refers to the container?

This is interesting because it suggests that perhaps no changes are needed. It would just follow from the many other specs that this is the right behavior.

The following would need to be investigated:

  • does any LDP container that does not behave the intuitive way do this?
  • Should they?

Here are reasons to think that this is very unlikely. Having contributed to banana-rdf -- a Scala abstraction over Java libraries such as Jena and Sesame, and over rdflib for JS -- I am pretty sure that these libraries are only written to parse rdf content with relative URLs by giving it the URL of the document from which it was fetched.

But the requirement that <.> refer to the container (e.g. /foo/) and <> to the created resource (e.g. /foo/bar/baz/bam) would require a parser to be given 2 URLs to correctly resolve the document: one URL for the container and one for the newly created resource.

This would need to be done across all document types (e.g. HTML) that can contain relative URLs. This seems like a lot of work and it seems very unlikely to have been implemented.

@csarven
Member

csarven commented Sep 18, 2020

Solid and LDP both refer to RFC 3986 for relative referencing. They are compatible in that regard.

As clients assign a URI to a resource when using PUT and PATCH, relative references in representation content are deterministic.

LDP doesn't specify URI patterns for resources when using the POST method eg. POST http://example.org/foo/ may result in creating a resource with URI http://example.org/foo/bar , http://example.org/baz/qux/quxx, http://example.org/{uuid} or something else. In Solid however, new resource URI must be in context of the request URI, so http://example.org/foo/bar will be created, and it is not possible to have the newly created resource anywhere but as a direct member of http://example.org/foo/.

Solid servers' behaviour in that regard is still compatible with LDP even if only ldp:Container is advertised in the HTTP Link header of the request URI (ie. http://example.org/foo/ as the effective request URI of POST). This entails that it is possible to have a Solid server as a particular specialisation of LDP with respect to behaviours around POST to containers. Clients that are built to speak with LDP servers would also be compatible with Solid servers re relative referencing - LDP clients don't assure deterministic referencing when communicating with LDP servers. So the fact that a Solid server happens to provide it is a non-issue for LDP clients.

A Solid client can't fully communicate with a server that only implements LDP, eg. it is possible for an LDP server implementation to assign URIs to new resources that are not compatible with Solid (re examples above). An LDP server must be specialised and extended to fulfil Solid server requirements in order to participate in the Solid ecosystem.

The Solid spec currently includes normative text - which is open to improvements - that's needed for clients to both generate and parse representations using relative referencing (as per RFC 3986). This information is currently not advertised by a Solid server. I agree that a type like solid:Resource (on any resource) would directly communicate the interaction possibilities as described in the Solid spec. Essentially this is particularly (only?) useful to Solid clients. In that regard, a Solid server claiming itself to be Solid-conforming can be done in a number of ways, including for example, response to OPTIONS * including Link: <http://www.w3.org/solid/terms#Resource>; rel="type" in the HTTP header to indicate how all URIs and requests targeting them will be interpreted by a server.
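
As a rough sketch of how a client might look for such an advertisement, the snippet below extracts rel="type" targets from a Link header value. It is an assumption-laden illustration: the parsing is deliberately simplistic (it does not handle commas or semicolons inside quoted parameter values), `type_links` is a hypothetical helper, and the solid/terms#Resource URI is the one used as an example above, not a term confirmed by any spec.

```python
import re

# One link-value: <target> followed by zero or more ;param=value pairs.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')

def type_links(header_value):
    """Return the set of link targets whose rel parameter includes 'type'."""
    targets = set()
    for target, params in LINK_RE.findall(header_value):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip().strip('"').split()
            if name.strip().lower() == "rel" and "type" in rels:
                targets.add(target)
    return targets

header = ('<http://www.w3.org/ns/ldp#Container>; rel="type", '
          '<http://www.w3.org/solid/terms#Resource>; rel="type"')
advertises_solid = "http://www.w3.org/solid/terms#Resource" in type_links(header)
print(advertises_solid)
```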

@kjetilk
Member

kjetilk commented Jun 17, 2021

But the requirement that <.> refer to the container (e.g. /foo/) and <> to the created resource (e.g. /foo/bar/baz/bam) would require a parser to be given 2 URLs to correctly resolve the document: one URL for the container and one for the newly created resource.

But surely, <.> to /foo/bar/baz/bam doesn't mean /foo/, it means /foo/bar/baz/? You'd need to have <../../> for it to mean just /foo/?

FWIW, I tested this with the old and largely obsolete RDF::Trine framework and the new Attean framework, which are the two main Perl frameworks for RDF, and both did the right thing right out of the box. I only had to pass the URL for the newly created resource, the URI resolution then figured out that <.> meant the container.

It would be nice if you could try it out in these frameworks @bblfish as it should do it as normal URI resolution.

@bblfish
Member Author

bblfish commented Jun 17, 2021

@bblfish wrote 10 months ago

But the requirement that <.> refer to the container (e.g. /foo/) and <> to the created resource (e.g. /foo/bar/baz/bam) would require a parser to be given 2 URLs to correctly resolve the document: one URL for the container and one for the newly created resource.

@kjetilk answered 7 hours ago:

But surely, <.> to /foo/bar/baz/bam doesn't mean /foo/, it means /foo/bar/baz/? You'd need to have <../../> for it to mean just /foo/?

That was exactly my point. I was saying that we need to introduce a subClass of ldp:Container, which I called a solid:iContainer (for intuitive Container), which works correctly with relative URLs.

ldp:Containers don't work correctly with relative URLs other than <> - and it was a huge debate to get those into the spec.
For example they don't work for <.>, since posting a document into </foo/> stating in N3 that

<.> ldp:contains <> .

could end up creating a resource at </foo/bar/baz/bam> and, as you point out @kjetilk, <.> there would refer to </foo/bar/baz/>, not the intended </foo/>.
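
The mismatch can be reproduced with ordinary single-base resolution. A minimal sketch, assuming Python's `urllib.parse.urljoin` and a hypothetical example.org host:

```python
from urllib.parse import urljoin

posted_to = "http://example.org/foo/"               # container the client POSTed to
created_at = "http://example.org/foo/bar/baz/bam"   # location an LDP server may choose

# A parser is given a single base URL: that of the created resource.
subject = urljoin(created_at, ".")   # what <.> resolves to
obj = urljoin(created_at, "")        # what <> resolves to

# Does the stored triple still say that the POSTed-to container contains the document?
claims_intended_container = (subject == posted_to)
print(subject, obj, claims_intended_container)
```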

So there one could just lift one's hand 🤷 and swear about LDP!

(pause for swearing 🤬)

But actually there is a use for the LDP behavior, that we documented in the Use Case and Requirements document in the Privacy Section under §2.6.4 Limit information disclosure through URI. A server may want to have UUID encoded names for its containers and resources so that nothing can be gleaned from the URL structure. Then if you found a document with those URLs and an .onion domain name the URI would be completely opaque. In that case you actually want the LDP behavior: you cannot guarantee anything about the structure of the name of a newly constructed resource: you MUST follow your nose to find out where objects are contained. That is a pretty good use case, but it does not make for easy learning and not all situations require such protective measures: most public web sites don't. So the Solid intuitive container is also useful.

Therefore my suggestion is: let us distinguish them by name, and allow for both ldp:Container and solid:iContainer :-)

@kjetilk
Member

kjetilk commented Jun 17, 2021

🤬 indeed!

I find that discussions starting with LDP orthodoxy usually aren't very fruitful. Yet, it seems like they simply underdefined the behaviour, which makes me think that we shouldn't bother too much about it, all relative URIs must be resolved relative to the retrieval URI as per RFC 3986, which is really the only sane thing to do. It doesn't seem to break the LDP spec AFAICS, but I suppose it may break existing LDP implementations that do insane things. Whether or not that means we need a new container type then depends on the actual practice out there, it seems to me.

I'm also not so sure about that use case... If your requirements are such that you cannot even expose container structure, then I would think twice about having the data on the Web at all. I would at least put it behind several protective layers that could have their own semantics...

@bblfish
Member Author

bblfish commented Jun 17, 2021

all relative URIs must be resolved relative to the retrieval URI as per RFC 3986,

yes, LDP changes nothing about that and could not.
What LDP does not guarantee is that POSTing to an ldp:Container /foo/ will result in a resource /foo/bar; it could also result in a resource /bar. Therefore before POSTing to an ldp:Container all URI paths in the POSTed content must be absolutized to the root /, as otherwise they would not necessarily refer to the expected resource.
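
One way a client could do this is sketched below. It is an illustration under assumptions, not a prescribed algorithm: `absolutize_for_post` is a hypothetical helper, references are passed in as a plain list rather than extracted from a parsed document, and the client is assumed to have authored its path-relative references as if the document were a direct child of the container.

```python
from urllib.parse import urljoin

def absolutize_for_post(container_url, refs):
    """Absolutize path-relative references; keep <> and fragment-only ones relative."""
    result = []
    for ref in refs:
        if ref == "" or ref.startswith("#"):
            # Document-relative: safe wherever the resource lands.
            result.append(ref)
        else:
            # For path-relative references, resolving against the container gives
            # the same result as resolving against any direct child of it.
            result.append(urljoin(container_url, ref))
    return result

print(absolutize_for_post("http://example.org/foo/", ["", "#i", ".", "../other", "sibling"]))
```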

The Limit information Disclosure through URI is going to be very important to sell Solid to the paranoid security community. I really want to be able to implement that and go to the Chaos Communication Congress and show Solid working with Tor, which would be very cool. So for me the "underdefined" behavior of ldp is just right there.

But I also want the Solid community to grow by being easy to learn, so I propose we also have a solid:iContainer type, which provides the guarantees that Solid has been specifying. That is what people building publicly readable web pages expect and I like that convention too. For solid:iContainers there is no need to make any changes to the relative URLs of the POSTed content. That is the behavior I am implementing right now on my server.

There is actually no need to oppose these two schemes, they are complementary.

@kjetilk
Member

kjetilk commented Jun 17, 2021

Hmmm, right. I guess a Solid-specific container would work for me :-)
