Make sh:prefixes optional in SPARQL queries #59

Open
HolgerKnublauch opened this issue Jul 18, 2024 · 6 comments

@HolgerKnublauch
Contributor

Basically every user of SHACL-SPARQL ever has run into this limitation: Not only is the syntax of sh:prefixes complex to use, the expectation is that the namespace prefixes from the shapes graph apply automatically.

When this was designed, the WG decided not to rely on automatic definition of prefixes because prefixes are not persisted in the RDF Graph data model, but are just a temporary feature during parsing. This is despite the fact that many APIs such as Jena do persist the prefixes together with the graph object. But yeah, the problem remains that when SHACL-SPARQL shapes are moved around the prefixes may get lost.

A solution here may be to automatically prepend all prefixes from the shapes graph when no sh:prefixes triple exists. We need to come up with some creative solution for how to define what these prefixes must be. But it's a huge obstacle, so we cannot just leave the current solution in place. At least I believe we can define a list of default prefixes such as rdf:, owl: and sh: that should always be present.
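
To make the proposal concrete, here is a minimal Python sketch of the fallback idea, under stated assumptions. The function name `prepend_fallback_prefixes` and both of its mapping arguments are hypothetical; the sketch assumes an implementation has already collected the prefixes declared via sh:prefixes (or `None` when the triple is absent) and whatever prefixes it retained from parsing the shapes graph. It is not taken from any SHACL engine.

```python
def prepend_fallback_prefixes(query, declared_prefixes, graph_prefixes):
    """Return `query` with a PREFIX header prepended.

    declared_prefixes: dict of prefix -> namespace gathered from sh:prefixes,
        or None when the constraint has no sh:prefixes triple (assumption).
    graph_prefixes: dict of prefix -> namespace that the implementation kept
        from parsing the shapes graph (assumption; not part of the RDF model).
    """
    # Explicit sh:prefixes always wins; graph prefixes are only a fallback.
    prefixes = declared_prefixes if declared_prefixes is not None else graph_prefixes
    # Sort for a deterministic header.
    header = "".join(
        f"PREFIX {prefix}: <{namespace}>\n"
        for prefix, namespace in sorted(prefixes.items())
    )
    return header + query


query = "SELECT $this WHERE { $this a ex:Person }"
print(prepend_fallback_prefixes(query, None, {"ex": "http://example.org/ns#"}))
```

The sketch sidesteps the open question in this issue, namely where `graph_prefixes` comes from once the shapes have been moved between systems.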

@VladimirAlexiev
Contributor

👍

Repo namespaces are not 100% "reliable" because if there's a conflicting prefix, the repo can remember only one of the namespaces.
(i.e. if you load file1.ttl that defines foo: and the repo remembers its namespace, then when you load file2.ttl with the same prefix but different namespace, the repo won't remember it).

But if there are such conflicts, the user can override with sh:prefixes.
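
The "remembers only one namespace" behavior can be illustrated with a small Python sketch. The class `FirstWinsPrefixRegistry` is hypothetical and only models the first-declaration-wins policy described above; it is not the API of any particular repository.

```python
class FirstWinsPrefixRegistry:
    """Toy model of a repo that keeps the first namespace seen per prefix."""

    def __init__(self):
        self._namespaces = {}

    def load(self, file_prefixes):
        # setdefault keeps an existing entry, so later conflicting
        # declarations of the same prefix are silently ignored.
        for prefix, namespace in file_prefixes.items():
            self._namespaces.setdefault(prefix, namespace)

    def lookup(self, prefix):
        return self._namespaces.get(prefix)


file1 = {"foo": "http://example.org/a#"}  # stands in for file1.ttl
file2 = {"foo": "http://example.org/b#"}  # stands in for file2.ttl

repo = FirstWinsPrefixRegistry()
repo.load(file1)
repo.load(file2)
print(repo.lookup("foo"))
```

Loading the same two files in the opposite order yields the other namespace, so the remembered mapping depends on load order.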

@TallTed
Member

TallTed commented Aug 5, 2024

I do not believe we can, nor should, define such a list. There is no reason why SHACL users should be forbidden from using owl: elsewhere (and perhaps owl-ns: for http://www.w3.org/2002/07/owl#).

This "problem" is not specific to SHACL, and should not be addressed by SHACL-related specs. If anywhere, this "problem" should be addressed in the context of SPARQL, as there would then need to be a "SPARQL prefixes registry" or similar.

At present, there is no IANA nor W3C "prefix registry", which would be necessary to prevent collisions. The closest thing of which I'm aware is the prefix.cc lookup service — which does not prevent collisions!

> This is despite the fact that many APIs such as Jena do persist the prefixes together with the graph object.

Those APIs are acting outside of all prefixed-name specifications of which I am aware, which specify that the prefixes be declared in the same document in which they are used. I certainly hope that Jena and any other software that acts this way treats prefix declarations they encounter as overriding their persisted prefixes in the context of the live declaration.

How does Jena handle a Turtle document that includes multiple declarations of the same prefix? Which declaration does Jena persist, and for how long? Does Jena persist the first declaration it encounters for any given prefix, over-riding later declarations in the same Turtle document? Or does Jena persist the "last declaration from the first document that contains a declaration for that prefix"?

Note that users have the option of setting any prefix they like in a given document, and may use the same prefix with multiple expansions in a single Turtle document, among other places. Note that this is explicitly permitted by the Turtle spec, and each declaration is active for the prefixed names following that declaration and preceding any other declaration of the same prefix.
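
The scoping rule described here can be sketched in Python as follows. This is a simplified model, not a Turtle parser: the function `expand_in_order` and its event tuples are hypothetical stand-ins for a document's prefix directives and prefixed names in document order.

```python
def expand_in_order(events):
    """Expand prefixed names using whichever declaration is active at that point.

    events: list of ("prefix", label, namespace) or ("name", "label:local")
    tuples, in document order (a hypothetical stand-in for parsed Turtle).
    """
    active = {}
    expanded = []
    for kind, *rest in events:
        if kind == "prefix":
            label, namespace = rest
            active[label] = namespace  # a later declaration replaces the earlier one
        else:
            label, local = rest[0].split(":", 1)
            expanded.append(active[label] + local)
    return expanded


events = [
    ("prefix", "ex", "http://example.org/one#"),
    ("name", "ex:a"),
    ("prefix", "ex", "http://example.org/two#"),
    ("name", "ex:a"),
]
print(expand_in_order(events))
```

The same prefixed name `ex:a` expands to two different IRIs, which is why a single persisted mapping per prefix cannot represent such a document.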

rdf: could be used just as well for http://example.org/rough-data-format# as for http://www.w3.org/1999/02/22-rdf-syntax-ns# or https://cacax.fun/.

sh: could be http://shell.example# and/or http://example.sh# as well as http://www.w3.org/ns/shacl# or http://purl.org/skos-history/.

> At least I believe we can define a list of default prefixes such as rdf:, owl: and sh: that should always be present.

Such a requirement would conflict with the rest of the universe of RDF specifications, even if nowhere else — but I believe it would conflict with many other specs as well.

@HolgerKnublauch
Contributor Author

Whatever solution we want to implement here, SOMETHING has to be done. The current syntax of SHACL with the sh:prefixes has not passed the test of time and basically every user has problems with this. This is an obstacle to wider adoption of SHACL-SPARQL. We can find many reasons not to do something, yet we should have an open discussion with pragmatics as one of the main drivers.

@VladimirAlexiev
Contributor

VladimirAlexiev commented Aug 11, 2024

@HolgerKnublauch
Contributor Author

Yes, or they can put explicit PREFIX declarations into the sh:select string. The defaults would only serve as fallback.

@TallTed
Member

TallTed commented Aug 12, 2024

@VladimirAlexiev — Whether "[remembering] a prefix the first time they see it" is "a great usability feature" depends greatly on whether that first-use is optimal for the users of that deployment. Indeed, given that, for instance, Turtle mandates that the latest declaration of a given prefix cover any given occurrence within the instance data, "Only the first occurrence is remembered" seems like a great usability negative. At the least, there should be some way to tell such an "auto-remembering" system that this declaration should replace a previously remembered declaration and/or to forget all previously remembered declarations.

Indeed, unless there's some point of user confirmation, "the SPARQL editor can auto-insert such prefix when used in a query" is likely to lead to confusing results that may not match results on any other SPARQL processor, including other deployments of the same software with the same loaded data, just because user queries were run in a different order following data load (i.e., Query_1 on Server_1 uses prefix sh: with one namespace, while Query_2 on Server_1 uses prefix sh: with a different namespace, and these two queries are run in reverse order on Server_2).

Where do you learn what prefix declarations were used for a given query?

This seems like a pure landmine to me.

fwiw, Virtuoso has a table of prefix/namespace registrations (visible here for the DBpedia instance). These can be set for use in exports to media types that support prefixed names (such as N3 or Turtle), and to be a fallback for a SPARQL query that doesn't include one or more declarations for prefixed names found in that query. Declarations that occur in a query over-ride those in the table, when the same prefix is found in both places. There are rules in SPARQL and Turtle (among other places) that govern handling of duplicate prefixes that are declared with different namespaces within the same query or document.
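
The override order described above (declarations in the query take precedence over stored registrations) can be sketched in Python. This is not Virtuoso's implementation: the function `effective_prefixes`, the `stored` mapping, and the naive regular expression are all assumptions for illustration, and the regex would also match PREFIX-like text inside string literals or comments.

```python
import re

# Naive matcher for SPARQL PREFIX declarations (assumption: no PREFIX-like
# text appears inside string literals or comments in the query).
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>", re.IGNORECASE)


def effective_prefixes(query, stored):
    """Merge stored registrations with the query's own declarations.

    stored: dict of prefix -> namespace standing in for a server-side table.
    Declarations found in the query override stored entries for the same prefix.
    """
    declared = {m.group(1) or "": m.group(2) for m in PREFIX_DECL.finditer(query)}
    merged = dict(stored)
    merged.update(declared)
    return merged


stored = {
    "sh": "http://www.w3.org/ns/shacl#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
query = "PREFIX sh: <http://shell.example#>\nSELECT * WHERE { ?s a sh:Thing, owl:Class }"
print(effective_prefixes(query, stored))
```

Here `sh:` resolves to the namespace declared in the query, while `owl:` falls back to the stored registration.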

I know that these stored namespaces can lead to user confusion because they have. This is another downside to the optimistic SPARQL interaction that lacks any useful way to report errors or communicate other things (such as predefined namespace prefixes that it applied to execution of a given query) to the user outside of HTTP headers (which many users never see). We provide the facility to predefine declarations because of user demand; I have hopes that this feature will be improved over time to decrease such user confusion.

3 participants