Blank nodes #19
Comments
Personally, I like the idea of having expressions avoid the need for blank nodes. Note that there could be a semantic layer on top of RDF to support such constructs, without needing to adapt the triple-based data model - perhaps this could be tied in with one of the big ideas (?) But that kind of solution would still involve blank nodes behind-the-scenes, of course. |
As more background, TriG currently allows blank node labels to span multiple graphs within the same document: "BlankNodes sharing the same label in differently labeled graph statements are considered to be the same BlankNode." |
One of the practical difficulties of bnodes is their use in structures such as RDF lists or quantity+unit values, because these structures are fragile: a list can be broken in some way, or a value can end up with two units. Checking is "whole graph", not at the level of the input stream, where feedback is more useful. With guaranteed-correct data, systems can store and handle it in optimal form. Scoping bnodes to small sections of the document would help. Something for N-Triples is needed as well, but also for Turtle, because determining pretty printing is the same as checking, and is expensive on large data (larger than RAM). A solution that allows streaming the graph out is needed (experience from both Turtle and RDF/XML output: users like pretty output, but at scale they have to accept a reduced form). |
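To make the "whole graph" point concrete, here is a minimal Python sketch of a well-formedness check for one RDF list. It assumes triples are plain `(subject, predicate, object)` string tuples held in memory; the function name `check_list` is made up for illustration. Note that every step scans the full triple set, which is exactly the cost described above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def check_list(triples, head):
    """Walk an RDF list starting at `head`; return (items, problems)."""
    items, problems, seen = [], [], set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            problems.append(f"cycle at {node}")
            break
        seen.add(node)
        # Each list cell needs exactly one rdf:first and one rdf:rest.
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1:
            problems.append(f"{node} has {len(firsts)} rdf:first values")
        if len(rests) != 1:
            problems.append(f"{node} has {len(rests)} rdf:rest values")
            break  # cannot continue the walk without a unique tail
        items.extend(firsts[:1])
        node = rests[0]
    return items, problems
```

A streaming parser cannot run this check as triples arrive, because a second `rdf:first` for the same cell may appear anywhere later in the input.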
In support of: IDEA: Separate existential quantifier (blank node) logic from RDF
Carroll, Jeremy J. "Signing RDF graphs." International Semantic Web Conference. Springer, Berlin, Heidelberg, 2003. RDF needs to be processable in polynomial time, otherwise critical (business) use cases are nearly impossible if 100% compliance with the RDF spec is expected. @jeremycarroll: any further insights on this matter? |
I think Aidan Hogan and co-authors have done the most research on blank nodes more recently:
I do not know a github handle for Aidan, but I will email him to see if I can get his attention on this. I spoke to him by phone earlier today and he is super busy with teaching right now, but intends to follow up in the next couple of days. |
There is a risk of "selection bias". People who like what there is or who are "getting on with stuff" don't write papers for journals! From https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0170.html
From https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0053.html
|
We shouldn't be banning stuff that exists in the existing RDF standard just because a specific programmer or end-user profile struggles with its essence and/or utility. A Blank Node is how RDF (a Language) delivers the functionality of an Indefinite Pronoun. |
So you are saying that "cannot be done in polynomial time" is a struggle of "a specific programmer or end-user"? It's not about banning blank nodes, it's about separating them into an on-top layer, so that the core layer can be properly supported and widespread. Blank nodes should be limited to use cases actually requiring them (e.g. OWL vocabularies), but only as an optional opt-in feature. RDF is not a language, it is "a framework for representing information in the Web", so I'm not sure of the necessity of indefinite pronouns at its core. It would be helpful for the discussion if you could elaborate on that. |
That wasn't the point of anything I said.
I don't know how I or anyone else has indicated otherwise re. blank nodes i.e., they are supposed to be used when required. Use them where a pronoun would be applied in a structured sentence.
RDF is an Abstract Language. You can create RDF sentences using a variety of notations and serialize for persistent storage to a variety of document types. RDF has nothing to do with the Web in its most basic sense i.e., it is a framework that makes systematic use of signs, syntax, and semantics for encoding and decoding information. The subject, predicate, and object roles of an RDF sentence are basically a rendition of "parts of speech" in natural language. An HTTP-based Web comes into play when you apply Linked Data principles to RDF sentence construction, along the following lines:
Related
|
Notwithstanding a lot of this comment, I want to pick up on one part:
I want to challenge the final sentence of this strongly. |
To be crystal clear on this matter, regarding my fundamental point: A Blank Node brings Indefinite Pronoun functionality (and power) to RDF sentence construction. I've [provided examples](http://kingsley.idehen.net/public_home/kidehen/Public/Linked%20Data%20Documents/Tutorials/conceptual-graphs-to-turtle-examples/), as I always do.
I hope I've clarified my position, which simply boils down to this: leave RDF as is, based on its ability to deal with complex "horses for courses" matters regarding structured data representation. Prematurely deprecating existing functionality on the basis of style or the usual subjective "make it simpler" argument isn't recommended. Related |
Yeah, I'm pretty sure we have a strong agreement on a lot of this. I see no need to actually change RDF. I accept that there is stuff that some people want to say that needs existential quantification. So I want to see the use of blank nodes discouraged, rather than treated as what often seems to be the go-to answer when people can't be bothered to actually say what they want in the RDF. Mind you, for your example, I don't see which of the blank nodes you have could not have an ID. |
On 19 Dec 2018, at 15:46, Hugh Glaser ***@***.***> wrote:
We might use "this", "that", "it", "she" etc. as pronouns extensively.
It is deeply unhelpful if, when representing the same knowledge in RDF, we choose to use blank nodes.
In natural language, we have some words with a widely agreed meaning - these are words you can look up in a dictionary. We also have names that can only be resolved in context, e.g. my friends know me as Dave, but there are lots of other people called Dave. In yet other cases you are given a description rather than a name, e.g. the third door on the left.
RDF should allow for the same flexibility.
Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things
|
@draggett If I stretch the natural language parallel further, to say: In NL we don't usually bother to name things because the overheads are not so much work. [I view it like the overheads of doing eager evaluation in a lazy evaluation context, in fact.] |
I've opened issue #48 to take a step back from the specifics of blank nodes and discuss different kinds of identifiers, in terms of how they are used, as requirements for the underlying framework. |
Remember, you can apply reasoning and inference to produce 5-Star Linked Data (on a forward-chained basis) from basic RDF that may or may not be littered with Blank Nodes. Blank Nodes are a key part of RDF flexibility. Like most things, misuse leads to problems. This is why I look at all of this through the "horses for courses" context-lenses. I think we both agree that RDF is good as is i.e., it doesn't need to be tinkered with, especially not on the basis of compatibility with a fundamentally different approach to modeling as espoused by "Property Graphs" (a totally confusing moniker to me!). Personally, I believe we just need more tooling, educational collateral, dog-fooding, and cooperation :) |
Good yes, but not good enough. To quote: "1. The goal is to make RDF -- or some RDF-based successor -- easy enough for average developers (middle 33%), who are new to RDF, to be consistently successful. 2. Solutions may involve anything in the RDF ecosystem: standards, tools, guidance, etc. All options are on the table. 3. Backward compatibility is highly desirable, but less important than ease of use."
Agreed. But we should consider deprecating something if: (a) it raises the entry barrier to RDF adoption; and (b) reasonable alternatives are available. Please do not conflate blank nodes (as existential variables) with the syntactic conventions of () and [] in Turtle that currently generate implicit blank nodes. I fully agree that we need those syntactic conveniences. But we do not necessarily need the underlying blank nodes that are generated. There is ample evidence showing that blank nodes as existential variables are not actually needed in the vast majority of cases: URIs could be used instead. Obviously we would not want to force users to manually create a URI everywhere they wish to use () or []. That would be far too tedious. But just as tools auto-generate blank node labels, they could auto-generate URIs similarly. Details would have to be worked out, of course, but it is a realistic possibility. But I think the most important point around blank nodes is that users should not have to ever think about them or know about them. If blank nodes exist at all, they should be invisible to the user. |
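As a rough illustration of tools auto-generating URIs in the same way they auto-generate blank node labels, here is a small Python sketch. It assumes terms are plain strings, with blank nodes written as `_:label` and IRIs in angle brackets; the function name `skolemize` and the `example.org` base are placeholders. The `/.well-known/genid/` path follows the Skolem IRI convention described in RDF 1.1 Concepts.

```python
import uuid


def skolemize(triples, base="https://example.org/.well-known/genid/"):
    """Replace each blank node label with a freshly generated IRI."""
    mapping = {}  # blank node label -> generated IRI, consistent within one call

    def name(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"<{base}{uuid.uuid4().hex}>"
            return mapping[term]
        return term  # IRIs and literals pass through unchanged

    return [(name(s), p, name(o)) for s, p, o in triples]
```

The same label always maps to the same generated IRI within one call, so the structure of the input is preserved. A random identifier per call is only one possible policy; how such URIs should be minted is one of the details that would have to be worked out.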
Hi @kidehen
I think the problem can then be characterised as "what is the content of the educational collateral"? I thought I would look at |
I totally agree with that, and that is exactly why splitting RDF into profiles is a valid idea, which would lower the entry barrier both for consumers and for tooling producers. Something along the lines of RDF 2.0 with an opt-in RDFbn profile, which would itself provide backwards compatibility with RDF 1.1.
I agree that the lack of tooling is one of the main issues, but a high entry barrier together with big issues (non-referable "Linked Data", non-polynomial processing times for basic business use cases) produced by the current blank nodes approach hinder (imo) the development of such tools. |
I accept the notion that RDF is a problem. We are inadvertently blaming RDF for the issues arising from the missing RDF Applications Web (or Knowledge Graph). There are no tweaks to RDF that will fix the issue outlined above. Every other technology that's negated the problems afflicting RDF has done so via Application Directories and Catalogs combined with a library of educational literature. I am not speculating here, I am speaking from experience over the last 24+ years. There is a set pattern which simply hasn't happened with RDF en masse:
Deprecating is basically banning in my world. Why can't we let stuff evolve naturally? For instance, those who don't want to work with Blank Nodes simply don't use them.
You've lost me on that one. My blank node examples include pictorials. RDF-Turtle is just a preferred notation I use for my examples. An RDF-processor translates RDF sentences crafted using a notation. That's what I demonstrate with our OSDS tool.
See my comment above.
"Horses for courses" is a powerful feature of RDF. We shouldn't tamper with this. IMHO. |
Let me further explain this:
The syntactic conventions of () and [] in Turtle are a very important convenience. Currently they generate implicit blank nodes at the triple level -- "implicit" because they have no visible label at the Turtle level, in contrast with explicit blank nodes such as _:b1 . Two important things to note about implicit blank nodes:
Furthermore, URIs are just plain better than blank nodes in all but a vanishingly few cases. They are stable names that can be used reliably in follow-up SPARQL queries, and they prevent duplicate (non-lean) triples when the same data is loaded twice. In other words, the convenience that we crave is not because blank nodes provide existential variables. It is almost entirely the convenience of the syntactic conventions of () and [], which actually have nothing to do with blank-nodes-as-existential-variables. If we tease these features apart, I believe we can have our cake and eat it too: the convenience of () and [] without the problems that unrestricted blank nodes bring. |
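The duplicate-triple point can be shown with a short Python sketch. It assumes a store is just a set of string triples and that blank node labels are scoped to a single load, so each load gives them a fresh identity; the `load` helper and its renaming scheme are illustrative only.

```python
import itertools

_fresh = itertools.count()  # source of store-wide unique blank node ids


def load(store, triples):
    """Add triples to `store`, renaming blank nodes so they are unique to this load."""
    renamed = {}

    def term(t):
        if t.startswith("_:"):
            if t not in renamed:
                renamed[t] = f"_:b{next(_fresh)}"
            return renamed[t]
        return t

    for s, p, o in triples:
        store.add((term(s), p, term(o)))


bnode_store, uri_store = set(), set()
for _ in range(2):  # load the same data twice
    load(bnode_store, [("_:x", "<name>", '"Alice"')])
    load(uri_store, [("<alice>", "<name>", '"Alice"')])
```

After both loads, `bnode_store` holds two triples that say the same thing about two different blank nodes, while `uri_store` still holds one.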
I don't know why you are assuming that my point is about syntax. My point is about the fact that I can whimsically scribble RDF sentences on-the-fly. Likewise, I can construct transformations informed by inference rules as and when required. This discussion is a classic example of issues arising from the lack of awareness afflicting existing RDF productivity tools, which creates the illusion of their non-existence. You've made assumptions about my intentions because my intentions aren't clear to you. Everything you've described about producing URIs from sentences crafted using RDF-Turtle notation ultimately belongs to the Applications rather than Language (and associated inscription notations and content serialization formats) bucket. RDF isn't the problem. The problem with RDF boils down to difficulty finding Applications (various genres) and Educational Literature (for various audiences) that help folks better understand and appreciate its value proposition. BTW -- Nothing stops the creation of a new research area that creates something like a Markdown for RDF. That's where a lot of these stylistic issues belong. IMHO. Related
|
I think this is an important comment about "deprecation" as a term:
We need to be clear about terms. Of course, I may be the odd one out - I would like to know if I am, please. |
Every time I've encountered deprecation in the real-world it has amounted to banning i.e., newer tools treat what's deprecated as invalid which breaks existing stuff. Personally, speaking about "best practices" for using tech in a specific context is much safer. For example, you don't need Blank Nodes when publishing Linked Data if the use-case is dataset publication. |
They are an important convenience for RDF
authors, but they cause insidious downstream complications.
They have subtle, confusing semantics. (As Nathan Rixham
once aptly put it, a blank node is "a name that is not
a name".) Blank nodes are special second-class citizens
in RDF. They cannot be used as predicates, and they are not
stable identifiers. A blank node label cannot be used in
a follow-up SPARQL query to refer to the same node, which
is justifiably viewed as completely broken by RDF newbies.
Blank nodes also cause duplicate triples (non-lean) when the
same data is loaded more than once, which can easily happen
when data is merged from different sources. And they cause
difficulties with canonicalization.
"A problem we have with blank nodes that might make us banish them is
the impossibility to use them in reified statements."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0092.html
IDEA: Allow expressions as first-class entities
"Allowing expressions to be predicate arguments eliminates most cases
where blank nodes are required. In bio-ontologies, we have large numbers
of simple EL expressions that create huge numbers of blank nodes that
complicate SPARQL queries. Similarly, for representing equations like
E=mc^2, it's blank nodes or some kind of awful (from a programmer-pov)
unnecessary IDs."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0045.html
IDEA: Separate existential quantifier (blank node) logic from RDF
"I'm starting to believe an idea of separating the existential quantifier
(blank node) logic from RDF itself to a separate semantic extension on
top of RDF should be explored. As evidenced by this discussion it is
difficult to understand and talk about. If separate, it could be
expanded by negation to have the full power of FOL as Pat suggested. If
such separation was possible and made the basic operations (merges,
canonicalization) on RDF data sets easier to reason about and implement,
it would be of quite beneficial."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0172.html
IDEA: Add explicit scope mechanism for blank nodes
"Bnodes introduced to encode
structures like n-ary relational assertions, or lists, or some
complicated piece of OWL syntax, should have a very narrow scope
corresponding to the exact boundaries of those structures, and
hence should be ‘invisible’ from outside (which is why it is fine
to make them vanish in a higher-level syntax using [ ] or ( ).) . . . .
imagine a variant of NTriples in which a subset of
triples can be enclosed in brackets, say [ ] (or something else
if these are already taken) to indicate that any bnode ID in a
triple inside the bracket is local to those triples".
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0218.html
"A better system, which would allow for more elaborate structures, would be to have convention of labelled scope brackets of the form [ID ]"
https://lists.w3.org/Archives/Public/semantic-web/2018Dec/att-0018/00-part
https://lists.w3.org/Archives/Public/semantic-web/2018Dec/0018.html
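A toy Python sketch of the scoping idea quoted above. The concrete syntax is an assumption made for illustration: a line containing only `[` opens a scope, a line containing only `]` closes it, scopes do not nest, and terms contain no spaces. The function name `parse_scoped` and the `_:s<n>_` renaming scheme are likewise hypothetical.

```python
def parse_scoped(lines):
    """Parse a toy N-Triples variant where '[' and ']' lines delimit a bnode scope."""
    triples = []
    scope_count = 0  # number of scopes opened so far
    current = None   # id of the open scope, or None at document level

    def localise(term):
        # Inside a scope, make the blank node label unique to that scope.
        if current is not None and term.startswith("_:"):
            return f"_:s{current}_{term[2:]}"
        return term

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "[":
            scope_count += 1
            current = scope_count
        elif line == "]":
            current = None
        else:
            s, p, o = line.rstrip(".").split()
            triples.append((localise(s), p, localise(o)))
    return triples
```

Because a scope's blank nodes cannot be referenced after its closing bracket, a streaming consumer could validate and emit each bracketed structure as soon as it ends.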
IDEA: Define equality and hash functions on types
"For a common approach to addresses maybe a group like Schema org could
publish ==() and hash() functions on their
https://schema.org/PostalAddress page, possibly open sourced. In the
interim they could nominate an existing service like
https://smartystreets.com/, which is an address validation API I've just
discovered, there seem to be several. At a later stage they could publish a
fuzzy matching function there too."
https://lists.w3.org/Archives/Public/semantic-web/2018Dec/0001.html
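A minimal Python sketch of what published `==()` and `hash()` functions for a type might look like. The field names mirror the schema.org `streetAddress`, `postalCode`, and `addressCountry` properties; the normalization rule (case folding and whitespace collapsing) is purely an assumption, not anything schema.org defines.

```python
from dataclasses import dataclass


def _norm(value):
    """Case-fold and collapse runs of whitespace."""
    return " ".join(value.casefold().split())


@dataclass(frozen=True, eq=False)
class PostalAddress:
    street_address: str
    postal_code: str
    address_country: str

    def _key(self):
        # Hypothetical identity key: normalized fields, postal code without spaces.
        return (
            _norm(self.street_address),
            _norm(self.postal_code).replace(" ", ""),
            _norm(self.address_country),
        )

    def __eq__(self, other):
        if not isinstance(other, PostalAddress):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())
```

With type-level equality like this, two descriptions of the same address can be merged without either of them needing a shared name, which is the situation where blank nodes are typically used.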