Profile negotiation [RPFN] #74
Comments
👍 |
I have put up a proposal at https://github.com/w3c/dxwg/tree/profiledesc-working/profiledesc/profileneg |
Nick, profile negotiation is its own deliverable, as per the charter, and is so far based on a proposal by Lars and Ruben: https://profilenegotiation.github.io/I-D-Accept--Schema/I-D-accept-schema. It would be best not to start a separate effort, but to further what is already proposed. Also note that any "solutions" must be based on use cases and requirements. As I have mentioned before, we appear to be lacking use cases that would lead to the profileDesc work and this profile negotiation proposal. |
I think that the work I’ve outlined above is compatible with Lars’ & Ruben’s work. In the implementations we’ve used before, a _format Query String Argument (QSA) is used as an override for the Accept header, and a _view QSA is effectively the equivalent of Accept-Profile. I would be able to implement Profile headers in the 6 or so APIs delivering different profiles in operation now if I can get persistent URIs for the profiles. We have discussed the registration of Profiles within our Govt Linked Data WG, as registration would give them a persistent URI. We will likely register a series of Profiles for purposes such as an energy sector profile of DCAT (2018), but currently we are unclear about whether a catalogue of known profiles is needed or even possible. We may make such a thing for Aust Gov-approved profiles. |
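A minimal sketch of the override behaviour described above, assuming a server that reads `_view` and `_format` from the query string and falls back to the `Accept-Profile` and `Accept` headers. The function name `resolve_preferences` and the example.org URLs are hypothetical and not taken from any of the APIs mentioned.

```python
from urllib.parse import urlsplit, parse_qs


def resolve_preferences(url, headers):
    """Return (profile, media_type); query string arguments win over headers."""
    query = parse_qs(urlsplit(url).query)
    # _view overrides Accept-Profile, _format overrides Accept (assumed precedence).
    profile = query.get("_view", [headers.get("Accept-Profile")])[0]
    media_type = query.get("_format", [headers.get("Accept")])[0]
    return profile, media_type


# Hypothetical request: the header asks for one profile, the QSA asks for another.
headers = {"Accept": "text/html", "Accept-Profile": "<https://example.org/profile/dcat>"}
print(resolve_preferences("https://example.org/dataset/1?_view=alternates", headers))
```

The raw header values are returned unparsed here; a real server would still need to parse weights and lists from them.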
I think we should be careful about trying to standardise a way of putting profile information into URIs/URLs by mandating the use of _format or _view. I agree that it's one way of doing it, but there are others as well. The URLs to the specific resource versions can be propagated using http |
I agree that URI QSAs are only one of many ways of doing it, and perhaps even a secondary way with HTTP headers being the primary. However, I think such easy, human-usable methods are very useful, hence my Use Case https://github.com/w3c/dxwg/issues/239. Since we are providing profile guidance, not just a single standard, I think we can base URI methods on (to be compatible with) HTTP methods. |
I don't disagree that we need easy ways for humans to address profiled versions of documents. What I disagree with is to say that we should mandate the use of |
Let's not break the Web; no spec should mandate the URL structure of a server. A secondary way can just be to follow links, i.e., opening the main profile URI in the browser results in an HTML document with links to other representations (for which the server can determine the URIs of its own). |
+1 to @RubenVerborgh |
I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles. Can anyone explain why some users want that? It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above). |
No, the motivation is to have the same resource available in different profiles. Note that each representation still can have its own URL. We will just provide the mechanism to get from resource to representation.
Both models are exactly the same, really. To understand this, it's important to see that the "representation" concept is a relative notion. E.g., in the sentence "A is a representation of B", B is the resource that A is the representation of. However, A is a resource in its own right. An example to clarify:
Regardless of whether 2 has its own URL, all of the following hold:
|
I'm talking about the motivation to use negotiation. If the only motivation is to have the same resource available in conformance to different profiles, I don't see any particular reason to have profile negotiation that works like content negotiation. Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?). |
|
Negotiation is what gets clients to the representation with their preferred profile.
No, that's not the motivation. We can do that with existing technologies already. What existing technologies don't do is automatically get a resource represented in a profile the client understands.
It's just like negotiating between XML or JSON, except more fine-grained:
But how does the client get from one to the other? |
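The comparison with XML/JSON negotiation can be sketched from the client side. This is a minimal illustration, assuming the `Accept-Profile` header takes profile URIs in angle brackets with optional `q` weights, as in the draft proposal linked earlier in the thread; the helper name and the example.org profile URIs are hypothetical.

```python
def negotiation_headers(media_types, profiles):
    """Build request headers from lists of (value, q) preferences."""
    accept = ", ".join(f"{mt};q={q}" for mt, q in media_types)
    # Profile URIs are wrapped in angle brackets (assumed header syntax).
    accept_profile = ", ".join(f"<{uri}>;q={q}" for uri, q in profiles)
    return {"Accept": accept, "Accept-Profile": accept_profile}


# The client asks for the same resource URL, stating both dimensions of preference.
headers = negotiation_headers(
    [("text/turtle", 1.0), ("application/ld+json", 0.8)],
    [("https://example.org/profile/a", 1.0), ("https://example.org/profile/b", 0.5)],
)
print(headers)
```

The media type preference and the profile preference travel in separate headers, so the client states each one independently and the server picks the best available combination.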
Can we use DCAT as an example? I'm going to toss one out but it may not be correct. What if you have a dataset that has a whole lot of census-type data, which includes a wide range of elements that can be seen as about people (age, race, employment, location). Not every use of the data wants to make use of all of the columns in the table. Would different profiles be the way to get the view of the data that you desire? If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service? |
Yes, my serialization is your media type. "It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted? It's fine to have a "work" identifier (although again I caution that one needs to think very hard about what that identifier identifies), but any resource on the web has an identifier for the resource, not just the work. This is why I recommend that this work vs. actual thing be thought through carefully, and the relationship between those be clear. I don't know DCAT terribly well but this seems to be a difference between dataset and distribution. Obviously, the response to content negotiation is some form of distribution (in DCAT terms). In the FRBR sense, the work is an abstract concept with no physical/digital presence, and it is only when it is manifested (distributed) is there a non-abstract thing. So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of. |
Yes, my serialization is your media type.
That might be a bit confusing then, because a serialization
(as in "a concrete series of bytes representing a dataset")
would be determined by multiple factors,
such as media type, language, and profile.
"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?
Access through the non-negotiated identifier;
indicate your preferences in headers.
The server replies with the negotiated response.
but any resource on the web has an identifier for the resource, not just the work.
Any resource on the Web *can* have an identifier.
I don't know DCAT terribly well but this seems to be a difference between dataset and distribution.
A distribution is a representation of a dataset.
So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.
It refers to the dataset.
Ruben
|
I think you misunderstood my question about non-abstractions, so let me make it clearer. As I understand it: Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence) or some other "thing" that is returned from content negotiation. What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question? Adding (from DCAT): |
Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence)
OK.
What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?
A distribution.
Ruben
|
@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it? If so, I think this is problematic. I see many more types of Resources and Profiles of them than DCAT will allow for. E.g., a Sample identified by URI with profiles of metadata for different purposes. The Resource + Profiles pattern holds here but not Dataset + Distributions. I can think of other cases: Datasets are just too “big” a thing for many Resources to be sensibly interpreted as them. |
I'm saying that a dataset is a resource, and that representations of that dataset conforming to certain profiles and serialized in a certain media type are distributions.
That's fine. The mechanism is more generic than that. Just because a dataset is a resource does not mean that all resources are datasets. |
The alignment of DCAT and FRBR [1] is incomplete -
In order to fully match FRBR we would need a way to indicate different schematic representations of a dataset (i.e. conforming to different profiles), alongside the different serializations (media-types). Maybe add [1] https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records |
@kcoyle - now we have added an explicit class for services ( |
In an email that has not made it into this GitHub thread, @agreiner takes us back to Fielding's analysis of web architecture, which distinguishes only Resource and Representation. The issue with that is that it conflates schematic representation and serialization into one step. As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit. Meanwhile, @kcoyle has pointed out how this correlates with the FRBR conceptualization, which I've attempted to make more explicit two comments up. |
Not conflates, but combines. Why is that an issue? A representation can be negotiated over multiple dimensions, including media type, profile, language, etc.
Yes. |
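A minimal sketch of negotiation over several dimensions at once, assuming the server holds a list of available representations and treats an unspecified dimension as "no preference". The `Representation` class, the `select` function, and the profile tokens and URLs are hypothetical illustrations only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    media_type: str
    profile: str
    language: str
    url: str


def select(representations, media_type=None, profile=None, language=None):
    """Return the first representation matching every dimension the client constrained."""
    wanted = {"media_type": media_type, "profile": profile, "language": language}
    for rep in representations:
        if all(v is None or getattr(rep, k) == v for k, v in wanted.items()):
            return rep
    return None  # nothing matches; a server might answer with HTTP 406 here


# One resource, three representations that differ in profile and/or media type.
available = [
    Representation("text/turtle", "skos", "en", "https://example.org/gold.skos.ttl"),
    Representation("text/turtle", "crm", "en", "https://example.org/gold.crm.ttl"),
    Representation("application/ld+json", "crm", "en", "https://example.org/gold.crm.jsonld"),
]
print(select(available, media_type="application/ld+json", profile="crm"))
```

Media type and profile are independent fields here, which mirrors the point that they are combined in one representation rather than conflated.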
Yes, combines - that is a better word. Not a problem, but an issue that is being teased out in the discussion here. Yes, multiple dimensions. FRBR privileges schematic representation very high up the conceptual stack, with its own class, while somehow the web had neglected it until now! |
Actually, FRBR is based on documents and doesn't really fit well with data - the whole "work/expression" thing is very text-based, and even librarians complain that they can't fit it well into music, film, etc. Rather than reference FRBR, why not simply say that there is an abstraction of the dataset which has certain metadata functionality (e.g. describes the dataset apart from any specific instances of it), and there are one or more distributions which have byte-presence. |
@nicholascar In my mind, a profile defines a distribution. Presumably conneg requests a distribution that conforms to a profile. I'm not sure what you mean by "a Sample identified by URI with profiles of metadata for different purposes." This seems to be analogous to the library case, where there is a physical thing (book) that is described by metadata; and there can be profiles governing what metadata is distributed. Is that the same? |
I am a little lost in the multiple things being considered here. Can I check this RDF as an assertion, based on Ruben's comment #74 (comment):
You could get Distribution_Y by asking for Dataset_X with a distribution conforming to Profile_Z. Interpretation using ProfileDesc:
|
The RDF snippet works for me. |
@kcoyle in #74 (comment): I think there is an analog of sorts between my Sample example and your Book example but I'm keen to avoid any inferencing whereby someone then thinks that a Sample (or a Book) is then a We can achieve this by having ProfileDesc as the general purpose ontology and ProfileDesc-like functionality allowed in DCAT, as indicated in my comment immediately above. |
The test implementation of the Media Types Linked Data API I just set up implements both QSA & HTTP format & language negotiation within QSA & HTTP profile negotiation, e.g.:
Format:
Entry for https://w3id.org/mediatype/text/csv in HTML, ‘alternates’ profile (‘view’ as the API calls it) requested using the URI https://promsns.org/def/alt:
As above but in RDF (JSON-LD):
Demo of weighted profile neg with not available view being ignored (not receiving HTTP 406):
Entry for https://w3id.org/mediatype/text/csv, ‘alternates’ profile indicated by QSA using token & Media Type also indicated by QSA:
Entry for https://w3id.org/mediatype/text/csv default profile with format indicated by QSA using token overriding HTTP Accept header:
Language:
A Media Type, default view, HTML, in Polish (preferred), using HTTP headers
In this configuration, both the format and language dimensions of the resource are dependent on (configured for a particular) profile. The alternates view of a Media Type shows all the options: https://w3id.org/mediatype/audio/3gpp?_view=alternates
Note that the alternates view itself is only available in English and that the non-HTML serialisations of the “mt” view, while supposedly being in Polish, actually are not. This is an error for the dataset implementer (me) to fix with RDF lang mappings, but the API is operating correctly now with both format & lang within profile QSA and HTTP-based negotiation.
Not Implemented yet:
This is just a start. |
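The weighted profile negotiation mentioned above, where an unavailable view is skipped instead of producing HTTP 406, could look roughly like the following. This is a sketch under stated assumptions, not the API's actual code: the header syntax (angle-bracketed URIs with `q` weights), the function names, and the example.org URIs are hypothetical, and the naive splitting would break on URIs that contain commas or semicolons.

```python
def parse_weighted(header):
    """Parse '<a>;q=0.9, <b>' into [(value, q)], highest weight first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a missing weight defaults to 1.0
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip() == "q":
                q = float(raw)
        prefs.append((value.strip("<>"), q))
    # sorted() is stable, so equal weights keep the client's original order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_profile(header, available, default):
    """Pick the highest-weighted available profile; skip unknown ones, else use default."""
    for profile, q in parse_weighted(header):
        if q > 0 and profile in available:
            return profile
    return default


available = {"https://example.org/profile/alt", "https://example.org/profile/default"}
header = "<https://example.org/profile/missing>;q=1.0, <https://example.org/profile/alt>;q=0.5"
print(choose_profile(header, available, "https://example.org/profile/default"))
```

Falling back to a default profile when nothing matches is one design choice; a stricter server could return 406 in that case.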
A concrete use case that we have at the Getty today, that might help some of the commenters or at least provide an avenue for further clarifications: The Getty Vocabularies are available as Linked Open Data. We currently provide exactly one schema which is a large super-set of SKOS. This schema is appropriate if you want to know absolutely every last thing that we know about the thesaurus terms. This is true for almost no one, it turns out ;) We also manage data in the institution using a profile of CIDOC-CRM, with which SKOS is not very well-aligned natively but is trivially mappable. For consistency with these other holdings, we would like to make the vocabularies available at the same URIs using this profile. This demonstrates two points:
We also intend to have a pure SKOS profile for consumers that don't care about everything, but do need SKOS. Again, the format and profile are orthogonal in the same way, and the URI being the same is critical. Please compare:
|
Rob's example above is what I would call the output from a "cross-walk" - data is converted from some database or metadata schema to another, and these schemas, in some cases, may be application profiles depending on their contents and functionality. It isn't clear to me if every use of metadata is a profile, however, so referring to profiles in the conneg work may not meet our definition of "profile", which is not (AFAIK) "any metadata schema." And not including non-profile metadata schemas may not satisfy the needs of conneg. We are going to have to spend some time on definitions. Note that we have (so far) defined profiles as: A profile is a named set of constraints on one or more identified base specifications, I think this is more restrictive than "arbitrary metadata schema". Wanting to serve the same data using a different metadata schema has the reputation of being lossy (in terms of absolute semantics). Rob says: "a different URI would mean a different concept." But I'm not so sure that we aren't talking about different concepts, although I realize that this becomes philosophical at a point. I believe this is what is bothering @agreiner. These are different datasets. That doesn't mean that you can't give an identifier to your data in all of its forms, but the same data served with different metadata schemas as a result of a conversion process is indeed a different dataset. But what is really troubling me is the use of "profile". (I know that "schema" isn't a great word to use here - substitute "model" or whatever you prefer if it bothers you.) |
I believe that our use case falls under that definition, in that both profiles have multiple base specifications, with subclasses, specific interpretations, identified vocabularies for the data instances and are there to accomplish particular functions. We are not talking about two different real world concepts of "gold", and hence the URI must be the same. If RDF/XML and Turtle are not different datasets, but SKOS and CIDOC-CRM are, then it seems the philosophy of the content negotiation deliverable is not aligned with the DCAT deliverable. As a reductio ad absurdum, if in model (A) the requirement is to use |
Rob, I do see the problem as the alignment between the use of the term "profile" in the two different deliverables. Whether we can align them, we'll have to see. The use of "application profile" in deliverable 2 (guidance for APs) becomes quite broad if we are to cover ANY metadata. Yet the conneg use case may need to allow for any metadata schema, not just those that meet our definition of "profile." As for if (A) and (B) are different datasets, the definition that I find in the DCAT document is: "A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset is a conceptual entity, and may be represented by one or more distributions that serialize the dataset for transfer. " Earlier discussion has likened DCAT datasets to FRBR:work (lots of warts there), so your definition of dataset coincides with the DCAT one, and I used "dataset" perhaps more in line with DCAT's "distribution" which reads: "Definition: | Connects a dataset to its available distributions." That definition seems to be undergoing discussion, and the emphasis on "serialization" may be an issue. I also note that "format" is dct:format, aka IANA media type. However, I'll try to be more in line with DCAT definitions in the future. |
The Use Case that @azaroth42 presents sounds very similar to the one we have in the DNB where we want to and was described above. Good to hear we're not alone! |
de-tagging as Profile Negotiation |
Profile negotiation [RPFN]
Create a way to negotiate choice of profile between clients and servers
Related requirements: Profile definition [RPFDF]
Related use cases: Detailing and requesting additional constraints (profiles) beyond content types [ID2] Standard APIs for metadata profile negotiation [ID30]