I think you are right that we should do some additional processing of properties that could be URLs and ensure that strings that look like URLs or CURIEs are treated as URLs.
The question of whether properties that are not expected to have a URL should also be processed is an interesting one. For simplicity, at this point I would say no. It could be something that we provide as a configuration parameter if the need arises.
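To make the proposal above concrete, here is a minimal sketch of such a check, limited to properties that could hold a URL. Everything in it is an assumption for illustration: the URL_CAPABLE set, the PREFIXES map, the CURIE pattern and the to_iri helper are hypothetical names, not part of the scraper.

```python
import re

# Hypothetical: properties whose expected values include a URL.
URL_CAPABLE = {"additionalType", "identifier", "taxonRank"}

# Hypothetical prefix map used to expand CURIEs such as "schema:Thing".
PREFIXES = {"schema": "https://schema.org/"}

# A simple CURIE shape: prefix, colon, then a local part that does not
# start with "/" and contains no whitespace.
CURIE_RE = re.compile(r"^([A-Za-z][\w.-]*):([^/\s][^\s]*)$")


def to_iri(prop, value):
    """Return an IRI string if value should be treated as a URL, else None."""
    if prop not in URL_CAPABLE:
        # Properties not expected to hold a URL are left untouched.
        return None
    if value.startswith(("http://", "https://")):
        return value
    match = CURIE_RE.match(value)
    if match and match.group(1) in PREFIXES:
        # Expand a CURIE only when its prefix is known.
        return PREFIXES[match.group(1)] + match.group(2)
    return None
```

With this sketch, to_iri("taxonRank", "species") returns None and the value stays a string, while a CURIE with a known prefix is expanded to a full IRI.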
I agree. Besides, since this post-processing may be time-consuming, we could make it configurable with a postprocess = true|false option, plus additional optional parameters to fine-tune post-processing.
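One possible shape for that configuration, as a rough sketch only: the postprocess flag comes from the comment above, while all_properties, url_properties, PostprocessOptions and postprocess_values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostprocessOptions:
    # Master switch, as suggested above (postprocess = true|false).
    postprocess: bool = True
    # Hypothetical fine-tuning option: also convert URL-looking values
    # of properties that are not expected to hold a URL.
    all_properties: bool = False
    # Hypothetical list of properties whose values may be URLs.
    url_properties: frozenset = frozenset(
        {"additionalType", "identifier", "taxonRank"}
    )


def postprocess_values(pairs, options):
    """Tag each (property, string value) pair as "iri" or "literal"."""
    result = []
    for prop, value in pairs:
        kind = "literal"
        if options.postprocess:
            eligible = options.all_properties or prop in options.url_properties
            if eligible and value.startswith(("http://", "https://")):
                kind = "iri"
        result.append((prop, value, kind))
    return result
```

With postprocess set to False the function returns every value as a literal, so the extra cost is only paid when the option is enabled.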
Hi, this is an issue that we've started to discuss in issue #54.
When scraping page https://inpn.mnhn.fr/espece/cd_nom/60878, some URIs are turned into strings:
Property additionalType is defined in the Schema.org context as:
The "@type": "@id" suggests that the object should always be interpreted as a URL. Still, both values are turned into strings. Weird, huh?
Another, trickier case concerns properties that can take several object types. For instance, schema:identifier can take a text, URL or PropertyValue. Therefore you never know whether you'll get a URL or not.
That will also be the case for the property taxonRank, which is not yet in schema.org and can take a text or URL:
So I'm wondering whether the scraper should look for a usual URL scheme (typically anything starting with http:// or https://) and turn the value into a URL.
And should this be done only for properties whose object can be a URL, or for any property, so that misuses are tolerated? If one uses a property that normally takes a text value and provides a URL, should we still turn this into a URL or keep it as a string?
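For illustration, the scheme check described above could look like the sketch below. It is only an assumption about how such a test might be written: looks_like_url is a hypothetical helper, and it relies on Python's standard urllib.parse rather than on anything in the scraper.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value is an absolute http(s) URL with a host part."""
    if not isinstance(value, str):
        return False
    candidate = value.strip()
    # A URL embedded in a sentence is not treated as a URL value.
    if any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlparse(candidate)
    except ValueError:
        # urlparse can reject malformed input such as a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Under this sketch the page address https://inpn.mnhn.fr/espece/cd_nom/60878 is recognised as a URL, while a plain value such as "species" or a sentence that merely contains a link stays a string.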