Should the server always advertise the owner link relation? #266

Open
kjetilk opened this issue Jun 1, 2021 · 14 comments

Comments

@kjetilk
Member

kjetilk commented Jun 1, 2021

Another outstanding issue from #264 was whether the link relation should always be advertised by the server.

This discussion includes whether the owner can be anonymous, and if so, what is the nature of anonymity with respect to WebIDs.

@kjetilk
Member Author

kjetilk commented Jun 1, 2021

To state my opinion further: the same URI, https://example.org/card#me, can directly identify a physical person, it can be a pseudonym, or it can link to nothing, in which case one might argue that it is anonymous.

The URI itself may give away that it identifies a physical person (https://example.org/kjernsmo#kjetil wouldn't leave much doubt), but for the most part, the real test is what it yields when it is dereferenced. There is also not much doubt if the response contains more personally identifiable information.

For pseudonyms, one must at the very least ensure that neither the URI nor any of the responses that may be obtained by dereferencing it or any resources it leads to can cause re-identification. To really maintain pseudonymity, you actually need to ensure that re-identification cannot occur by using this information in conjunction with any information from the Web. I'd say that's pretty hard...

If the actual requirement is anonymity and there is a dereferenceable URI, then ensuring anonymity is probably even harder. Just dereferencing a URI can lead to data ending up in some server log somewhere.

I'd therefore argue that it doesn't make sense in Solid to talk about URIs for anonymous things: if an identifier exists, it is at least a pseudonym, with the security practices and legal requirements that follow (GDPR Recital 26 makes clear that GDPR applies to pseudonyms but not anonymized data).

This leads me to conclude that we should not require the owner link relation to be advertised. If the owner requires anonymity, the way to achieve it is by not advertising the owner relation.

It is unclear to me what the argument for this feature is. If it is to allow some data to be found for the owner in all situations, then I think that it carries too much risk of re-identification. I propose that in this context, any dereferenceable resource is a pseudonym.

@justinwb
Member

justinwb commented Jun 2, 2021

I propose that in this context, any dereferenceable resource is a pseudonym.

Generally I agree with this, if a server were always to advertise an owner link relation with no other limiting factors.

That said, I'd like to respond to the primary question - Should the server always advertise the owner link relation? I think the pod owner should have the ability to choose.

One way to enact this choice would be to use a mechanism that we already have readily available: authorization. We could then say that the server always advertises the owner link relation, but the agent making the request must have been granted read access to the pod root to receive it. The owner would have the ability to pick and choose who can discover the owner identity URI, which could in turn be pseudonymous or not (again, the owner's choice).

I'm in full support of having an owner for a given pod, and advertising it on requests to the pod root. However, given the possibility for inadvertent information disclosure (e.g. someone puts PII in a webid that is advertised publicly), I think that we should make it protected by default, so that owners have a choice in the matter.
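
To make the proposal concrete, here is a minimal sketch of "protected by default" advertising. It is only an illustration under stated assumptions: the function name `owner_link_headers` and the `readers` set are hypothetical, and a real server would consult its own authorization system rather than a plain set of agent URIs.

```python
OWNER_REL = "http://www.w3.org/ns/solid/terms#owner"


def owner_link_headers(owner_uri, agent, readers):
    """Return extra response headers for a request to the pod root.

    `agent` is the authenticated requester's URI (None if unauthenticated).
    `readers` is a hypothetical set of agent URIs that were granted read
    access to the pod root; it stands in for a real authorization check.
    """
    if agent is not None and agent in readers:
        # Only agents with read access get to see the owner relation.
        return {"Link": f'<{owner_uri}>; rel="{OWNER_REL}"'}
    # Everyone else receives no owner advertisement at all.
    return {}
```

With this shape, an unauthenticated request and a request from an agent outside `readers` are indistinguishable from the outside: neither sees the owner relation.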

@d-a-v-i--

Authorization would inherently be in conflict with concepts of anonymity and thus very complicated to implement. The more elegant solution could be to allow an owner link to be whatever someone wants (random or meaningless, not missing). A lack of integrity requirement preserves confidentiality better than forcing identification through authorization, while trying to create anonymity.

@kjetilk
Member Author

kjetilk commented Jun 2, 2021

The more elegant solution could be to allow an owner link to be whatever someone wants (random or meaningless, not missing).

Right, and we could even provide a URI for it, like https://id.solidproject.org/anonymous#user which any user could use for anonymity. However, I'd like to understand the use case for such a thing. For the most part, we want URIs to give people something when they GET it, in the case of an owner possibly a way for them to ask for permission to do something, file a complaint, etc. If the URI doesn't resolve to anything they can use for that, I don't see the point, but I'd be interested in use cases.

@justinwb
Member

justinwb commented Jun 3, 2021

A lack of integrity requirement preserves confidentiality better than forcing identification through authorization, while trying to create anonymity.

I'm not sure that the intention of advertising the owner is to create anonymity. The need for anonymity was only raised as a new problem created by advertising the owner to everyone.

For the most part, we want URIs to give people something when they GET it, in the case of an owner possibly a way for them to ask for permission to do something, file a complaint, etc. If the URI doesn't resolve to anything they can use for that, I don't see the point

Agreed. Having the owner established and available can be useful, but I'm not sure how useful that would be if the information were inaccessible.

but I'd be interested in use cases.

+1 on use cases.

@kjetilk
Member Author

kjetilk commented Jun 3, 2021

So, rereading @csarven's and @acoburn's comments, then yes, I see that a case could be made that the simplest solution might be to require the owner relation to be advertised and leave it to the implementation to determine what that URI might be, and to ensure that no identifying information can be gleaned from it.

If we allow this, then people could say in the response that it is an anonymous user, and so indicate to clients why there isn't anything there. Continuing the idea of providing a URI for a generic anonymous user, its description could say something like


<https://id.solidproject.org/anonymous#user> a solid:AnonymousAgent .

If we don't require that relation, then we cut off that opportunity. So, I'm sliding towards a 👍 on this as a requirement anyway.

@justinwb
Member

justinwb commented Jun 3, 2021

I see that a case could be made that the simplest solution might be to require the owner relation to be advertised and leave it to the implementation to determine what that URI might be, and to ensure that no identifying information can be gleaned from it.

If the utility of advertising an owner relation only exists if the owner opts out of anonymity, we may be eliminating useful scenarios where the owner wants only a limited group to know they are the owner.

  • What features won't work if someone chooses to use an anonymous URI?
  • Will people end up giving away a bit of privacy in exchange for a feature?
  • Do we have to leave them with that choice? How many users are prepared to understand the tradeoff?
  • Is there another avenue beyond this advertisement in the Link header to discover the owner? If so, it may address the previous concerns.

@kjetilk
Member Author

kjetilk commented Jun 3, 2021

Good questions.

I'm not sure which features will not work if they choose an anonymous URI, but I'll try to answer the last question:

Is there another avenue beyond this advertisement in the Link header to discover the owner? If so, it may address the previous concerns.

I imagine that people will put a triple in the body of a representation of some resource, where it would be subject to access control. Possibly in the root container or something.

@justinwb
Member

justinwb commented Jun 4, 2021

I imagine that people will put a triple in the body of a representation of some resource, where it would be subject to access control. Possibly in the root container or something.

I think it's a sensible approach. If the owner was advertised based on that stored value, who the owner was would still be something that was accessible for certain workflows, subject to regular authorization, while at the same time publicly advertising an anonymous identity. I think it makes sense to have that as a server-managed value, so that it would only be changed through administrative interaction (i.e. facilitating an account transfer).

@kjetilk
Member Author

kjetilk commented Jun 4, 2021

Yeah, and sometimes I feel that we may be over-using the Link header a bit... Back in the day, it was largely up to the application programmer to supply the body of the HTTP message, while the headers were more the domain of the framework programmer, and therefore the logic behind the headers was more expensive to change. I suppose framework developers of Solid servers are inclined to allow simple access to add Link headers, but still, there is a middle ground where access control applies and there are server-managed triples, and it seems to me that we should be better at exploiting that.

@csarven
Member

csarven commented Jun 7, 2021

Let me first say that I'm not an expert on privacy by/or data minimization, so take the below with a grain of salt. I also do not think that anything in this issue should be generalised to the Solid ecosystem, especially when the discussion includes fewer than five people with one (mine) implementation. The general topic would deserve its own panel with experts.


The requirement in #264 is essentially:

Link: <URI-Reference>; rel="http://www.w3.org/ns/solid/terms#owner"

Full stop.

The Protocol is not defining a URI Template for the URI-Reference. Why? Simply so that wherever, whenever, however.. the Protocol is used, it can be applied and used in a way that accommodates the needs of the environment and users. Neither is the Protocol implying anything beyond what a URI in context of the owner entails - we don't have a Solid URI scheme either.

There are ample reasons why the owner URI can use any pattern eg. http://example.org/{uuid}#i, http://example.org/TON-618#i... https://id.solidproject.org/anonymous#user.. https://csarven.ca/#i.
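
As an illustration of that freedom, the sketch below builds the header value from the requirement for two of the patterns listed above. It is a hedged example, not part of the Protocol: the helper names `opaque_owner_uri` and `owner_link_header` are made up for this sketch, and the random UUID is just one possible way to mint an opaque identifier.

```python
import uuid

OWNER_REL = "http://www.w3.org/ns/solid/terms#owner"


def opaque_owner_uri(base="http://example.org/"):
    """Mint an owner URI following the http://example.org/{uuid}#i pattern."""
    # uuid4() is random, so the URI itself carries no identifying hint.
    return f"{base}{uuid.uuid4()}#i"


def owner_link_header(owner_uri):
    """Format the Link header value for any owner URI-Reference."""
    return f'<{owner_uri}>; rel="{OWNER_REL}"'


# The same header shape serves an opaque identifier and a personal one.
print("Link:", owner_link_header(opaque_owner_uri()))
print("Link:", owner_link_header("https://csarven.ca/#i"))
```

The header format is identical in both cases; only the choice of URI differs, which is exactly the part left to the implementation.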

Aside: https://id.solidproject.org/anonymous#user is less anonymous than http://example.org/{uuid}#i simply because the former can be easily identified (or uniquely characterised) in a set of identifiers. So, it is actually counter to what's desired. Neither is it the case that it is "anonymous" because it says so in the identifier or because it is claimed to be an AnonymousAgent. Asideception: an anonymous agent may want to reveal their identity at a later date.

Aside: My original point about owner being linkable was to enable consistent discovery. I contrasted it to owner being unlinkable, i.e. the server not advertising the link relation. Linking does not eliminate anonymity. Varying degrees of anonymity can still be achieved, i.e. the identifier doesn't need to be distinguishable from a set of identifiers, nor does any expectation or association need to be drawn to any identity or its attributes.

This is where Variability in Specifications plays an important role. The Protocol defines as much as it needs to and leaves sufficient space for implementations to vary. It can be revisited at a later date eg. based on implementation experience as well as how those implementations work out in real life - being informed by observable.. "utility".

The question is not "what's the use case for linking to the owner without a representation?" The use case is owner linkability. The Protocol states the requirements for that. Allowing representation or no-representation is a design decision (variability).

I think much of what's discussed here is most suitable for a Best Practices and Guidelines document - considerations that implementations should take into account. If there are stronger reasons to have orthogonal specs covering different aspects of data minimization in Solid, we can look into that - via dedicated panel perhaps.

As for people wondering about whether there is an alternative way of discovering the owner information besides the link relation... I'm not sure if that's a genuine question at this point in time because one doesn't know or can't think of any possibilities... or forgot existing issues that cover the general topic, e.g. a resource for server/storage (meta)data. Immutable and server-controlled information about the resource, e.g. creator, date issued/created/modified, is in the same category or should be considered at the same time.

@kjetilk
Member Author

kjetilk commented Jun 7, 2021

The Protocol is not defining a URI Template for the URI-Reference. Why? Simply so that wherever, whenever, however.. the Protocol is used, it can be applied and used in a way that accommodates the needs of the environment and users. Neither is the Protocol implying anything beyond what a URI in context of the owner entails - we don't have a Solid URI scheme either.

Yes, I don't think anybody suggested otherwise.

The question is not "what's the use case for linking to the owner without a representation?" The use case is owner linkability. The Protocol states the requirements for that. Allowing representation or no-representation is a design decision (variability).

OK, I think I am confused about the terms here. I think the terms "requirement" and "use case" are being used in very different ways.

To me owner linkability isn't a use case. It can't be, because it is not something a user would ask for. The user would ask for a way to interact with the owner, or to hold the owner accountable for something. A use case could be an expansion of something like "To protect myself, I need to be able to file a complaint about foo". From there, you can derive a set of requirements of different kinds: "the owner must be an identifiable entity", "it must be possible to contact that identifiable entity", and so on. From there, designers have to get creative, and may then find that some or all of the requirements can be satisfied with the concept of linkability. With that, designers write a spec that turns the concept of linkability into a set of requirements on the implementations. You seem to only use the term "requirement" in the latter sense, i.e. the requirement set by the specification on the implementations, but I tend to use it throughout the lengthy process before that.

Indeed, you could rephrase the title of this issue in terms of linkability. Yes, for identifiable agents and pseudonymous agents, linkability is a definite advantage. However, it makes it very difficult to draw a line between pseudonymous agents and anonymous agents, and it is important to draw that line for security and legal reasons.

As for people wondering about whether there is an alternative way of discovering the owner information besides the link relation... I'm not sure if that's a genuine question at this point in time because one doesn't know or can't think of any possibilities... or forgot existing issues that cover the general topic, e.g. a resource for server/storage (meta)data. Immutable and server-controlled information about the resource, e.g. creator, date issued/created/modified, is in the same category or should be considered at the same time.

Of course it is a genuine question, I don't understand why you don't think so. I have just detailed a possibility too. And yes, it could go into the whole server-controlled information scope.

@csarven
Member

csarven commented Jun 7, 2021

I agree with what you're saying on use cases and requirements. When I've shortcutted to "owner linkability", that was to distinguish from what you two were going on with and building on from:

If the URI doesn't resolve to anything they can use for that, I don't see the point, but I'd be interested in use cases.

+1 on use cases.

I'm not sure/convinced yet that the difference between anonymous and pseudonymous agents is something in the scope of the protocol. If that distinction is desired, implementations can already make it so. If you tell me that's not possible by looking at the URI, I'll totally accept that (opaqueness..) but then you'll have to accept that it is equally not possible to distinguish either from an identifier that's identifying anything in particular. The actual distinction can be done at the resource description level, if need be.

To me it comes down to the degree of anonymity that we are willing to accept/allow for owner and the trade offs. But to keep it simple, what are the concrete/critical security implications of discoverable owner identifier (with no or random/noise descriptions) vs. unlinked? I'm fine to allow servers to not link - which is in the original PR:

When a server wants to advertise the owner of a storage..

I just figured the simple approach is preferable, as opposed to bringing boatloads of complexity (and unnecessary coupling) to make it possible to discover the owner based on some conditions.

@timbl
Contributor

timbl commented Jun 22, 2021

The publication of the URI of the owner of the pod is needed to prevent a kludge where the current home page code just guesses it! It was felt so necessary to provide a link to the owner that the current code guesses it is /profile/card#me! This is the only place AFAIK where any client code makes that assumption. But it uses it to say something like

"This is a public homepage of Tim BL on Inrupt.net, whose WebID is https://timbl.inrupt.net/profile/card#me."

or "Tim Berners-Lee (solid.community)'s Pod"

For a professional person, their Solid pod is their home page on the web. It is very like their profile, and should be something they are proud of, with a clear link to the profile.

From the solid onboarding perspective, a crucial sequence which should work smoothly is for example:

  • Alice says, "Bob, check me out on alice.solidcommunity.net"
  • Bob types in the domain name into his browser
  • Bob gets to see Alice's public stuff [after clicking on her picture or directly]
  • Bob starts a new chat with Alice, having to get a Solid account somewhere in the process

So, this functionality is needed even for Alice's own Pod home page to work properly.

It is no real security to hide the owner URI when it is /profile/card#me! So in that case I suggest we don't offer the option. In the case where the pod owner URI is external, then sure, do offer a checkbox to reduce its visibility by suppressing this link header, as a 'nice to have' level down the line.

For now let's add the link, and let's remove the assumptions which guess the owner's URI from the client code!
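
A client-side sketch of what replacing the guess could look like: read the owner from the advertised Link header and return nothing when it is absent, rather than falling back to /profile/card#me. This is only an illustration; `discover_owner` is a hypothetical helper, and the small regex parser below handles common Link header shapes rather than every corner of the header grammar.

```python
import re

OWNER_REL = "http://www.w3.org/ns/solid/terms#owner"

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def discover_owner(link_header):
    """Return the advertised owner URI, or None when no owner is advertised."""
    for target, params in _LINK_VALUE.findall(link_header or ""):
        match = _REL_PARAM.search(params)
        if not match:
            continue
        # A rel value may hold several space-separated relation types.
        rels = (match.group(1) or match.group(2) or "").split()
        if OWNER_REL in rels:
            return target
    # Deliberately no fallback guess such as /profile/card#me.
    return None
```

Because the header carries a URI-Reference, a relative target would still need to be resolved against the request URL (for example with `urllib.parse.urljoin`) before use.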

@kjetilk added and then removed the "status: Nominated" label (an issue that has been nominated for the next monthly milestone) on Oct 6, 2021