Add an index command #199
Could also limit this to `$ preston index` to index all provenance, not allowing any flexibility. But this is significantly less fun.
I like the piping of things for sure! And, I was wondering . . . building an index is just another transformation of some provenance logs, and has a specific result (the index). So I was wondering whether you had in mind to be able to do things like:

`preston history | preston index | preston process`

where `preston process` takes the nquads generated by the indexing and adds them to the provenance log, and the index would generate some dataset containing a bunch of Lucene index files (or insert your favorite indexing method). The neat thing about this would be that a provenance log would be securely linked to a specific version of an index.

With this, you can ask questions like: "Ok Google, can you find me an alias index derived from hash://sha256/abc123?" or "Hey Siri, can you ask Google to find me a taxonomic name index derived from hash://sha256/abc123?" Would be fun to say out loud, right? And, no need to spin those CPUs unnecessarily to regenerate an index that has already been baked somewhere.
Piping is great! I figured
“hash colon slash slash sha 2 5 6 slash alpha beta 1 2 3 …” sounds great. Everyone loves convenient voice commands.
So, to clarify, I imagined building the index in temp/, then zipping everything and tossing it into data/ automatically. Then commands that make use of it (thinking of server commands like
Nice! I want it!
using

```bash
#!/bin/bash
#
# index a patched version of provenance graph associated with an anchor
# into oxigraph
#
preston ls \
  --anchor hash://sha256/5b7fa37bf8b64e7c935c4ff3389e36f8dd162f0705410dd719fd089e1ea253cd \
  --remote https://linker.bio \
  | sed -E 's/(<)([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})([^ ]*)(>)/<urn:uuid:\2>/g' \
  | pv -l \
  | ./oxigraph_server_v0.3.22_x86_64_linux_gnu load --lenient --format nq --location preston-gib
```

I was able to load:
with

and then, with

yielding
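The `sed` step in the indexing script above rewrites IRIs that *begin* with a bare UUID into absolute `urn:uuid:` IRIs before loading. A minimal illustration on a fabricated nquad (the subject and object IRIs here are made up for demonstration):

```bash
#!/bin/bash
# rewrite IRIs starting with a bare UUID into urn:uuid: form,
# using the same sed expression as the indexing script above
echo '<4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> <https://example.org/archive.zip> .' \
  | sed -E 's/(<)([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})([^ ]*)(>)/<urn:uuid:\2>/g'
# -> <urn:uuid:4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> <https://example.org/archive.zip> .
```

Note that IRIs that merely *contain* a UUID (e.g., https://gbif.org/dataset/4fa7b334-…) are left alone, because the pattern anchors the UUID right after the opening `<`.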
@mielliott perhaps we have found our indexer in oxigraph . . .
Looking up content associated with a GBIF dataset id https://gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e

see also

```sparql
SELECT ?archiveUrl ?seenAt ?contentId
WHERE {
  graph ?g1 {
    <urn:uuid:4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> ?archiveUrl .
    ?archiveUrl <http://purl.org/dc/elements/1.1/format> "application/dwca" .
  }
  graph ?activity {
    ?activity <http://www.w3.org/ns/prov#used> ?archiveUrl .
    ?activity <http://www.w3.org/ns/prov#generatedAtTime> ?seenAt .
    ?contentId <http://www.w3.org/ns/prov#qualifiedGeneration> ?activity .
  }
} LIMIT 10
```

yielding
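One way to run lookups like the query above against the loaded store is to save the query to a file and send it to oxigraph's HTTP endpoint. This is a sketch: it assumes the same oxigraph binary and `--location` as the indexing script, and oxigraph's default port 7878 with the standard SPARQL 1.1 protocol `/query` endpoint.

```bash
#!/bin/bash
# store the dataset-archive lookup query in a file so it can be
# reused and versioned alongside the provenance it queries
cat > dataset-archives.rq <<'EOF'
SELECT ?archiveUrl ?seenAt ?contentId
WHERE {
  graph ?g1 {
    <urn:uuid:4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> ?archiveUrl .
    ?archiveUrl <http://purl.org/dc/elements/1.1/format> "application/dwca" .
  }
  graph ?activity {
    ?activity <http://www.w3.org/ns/prov#used> ?archiveUrl .
    ?activity <http://www.w3.org/ns/prov#generatedAtTime> ?seenAt .
    ?contentId <http://www.w3.org/ns/prov#qualifiedGeneration> ?activity .
  }
} LIMIT 10
EOF

# with the store served via:
#   ./oxigraph_server_v0.3.22_x86_64_linux_gnu serve --location preston-gib
# the saved query can be sent to the SPARQL endpoint (needs the
# server running locally):
#   curl -s -H 'Content-Type: application/sparql-query' \
#        -H 'Accept: text/csv' \
#        --data-binary @dataset-archives.rq http://localhost:7878/query
```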
here's a query for, and resulting list of, contentIds associated with our eBird friends. Note that this accounts for the introduction of activity namespaces in 2020 #41 .

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?contentId ?seenAt ?archiveUrl WHERE
{
  {
    SELECT ?contentId ?seenAt ?archiveUrl
    WHERE {
      graph ?g1 {
        <urn:uuid:4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> ?archiveUrl .
        ?archiveUrl <http://purl.org/dc/elements/1.1/format> "application/dwca" .
      }
      graph ?activity {
        ?activity <http://www.w3.org/ns/prov#used> ?archiveUrl .
        ?activity <http://www.w3.org/ns/prov#generatedAtTime> ?seenAt .
        ?contentId <http://www.w3.org/ns/prov#qualifiedGeneration> ?activity .
      }
    }
  }
  UNION
  {
    SELECT ?contentId ?seenAt ?archiveUrl
    WHERE {
      <urn:uuid:4fa7b334-ce0d-4e88-aaae-2e0c138d049e> <http://www.w3.org/ns/prov#hadMember> ?archiveUrl .
      ?archiveUrl <http://purl.org/dc/elements/1.1/format> "application/dwca" .
      ?activity <http://www.w3.org/ns/prov#used> ?archiveUrl .
      ?activity <http://www.w3.org/ns/prov#generatedAtTime> ?seenAt .
      ?contentId <http://www.w3.org/ns/prov#qualifiedGeneration> ?activity .
    }
  }
} ORDER BY ?seenAt
```

with the first 10 and last 10 results attached
At long last! Hopefully with no fun surprises like Jena's demanding

I should probably mention that I did implement the indexing functionality described in #199 (comment) in the registry branch, using Lucene. It never made its way into main, though. A big limitation with just using Lucene was the lack of a query language like SPARQL, so instead of writing a

Do you plan on packaging oxigraph with preston, or keeping it separate as in your examples?
@mielliott great question! Not sure yet . . . am almost tempted to treat the oxigraph binaries as assets and add them to the content graph, along with functionality to execute workflows defined in that graph. But other than that, I do not see a compelling reason to merge preston with oxigraph and make it available in a single cli tool. But . . . if we did add a

Any ideas? What do you think, @mielliott ?
I've added some configuration to query the indexed provenance graph of GIB (GBIF, iDigBio, BioCase). The syntax is a bit weird, but grlc was quite helpful to get a usable API in front of the sparql endpoint.

Example query by UUID

Using GBIF's uuid for the eBird dataset (most of GBIF's volume), https://www.gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e reformatted to

Example query by DOI

Using GBIF's assigned DOI https://doi.org/10.15468/aomfnb the following can be retrieved:

Query by URL

Query activity by known location of a darwin core archive https://hosted-datasets.gbif.org/eBird/2022-eBird-dwca-1.0.zip .

Query by ContentId (aka hash)

Querying for a known dwc archive hash hash://sha256/1e2b7436fce1848f41698e5a9c193f311abaf0ee051bec1a2e48b5106d29524d

yields
…associations with doi/uuid/url/hashes. Related to #199 (comment) .
…rects. Related to #199 (comment) .
After some tinkering, I ended up implementing a redirection service. The idea is that the service uses a content registry of known provenance, then redirects resolved content ids to a repository. Currently, the resolver resolves identifiers to their associated darwin core archives. You can resolve by:

For identifiers that are not uniquely tied to content (e.g., uuid, doi, url), the resolver picks the most recent darwin core archive associated with the identifier. So, this implements a kind of wayback machine for darwin core archives registered in the GBIF/iDigBio universe. For now, you can find provenance information for the redirect in the 302 http redirect response headers.

Example 1. resolve by eBird dataset uuid

yields

Where,

Example 2. resolve by eBird dataset DOI

resulting in the same redirection, as expected.

Example 3. resolve by eBird dataset original resource location

resulting in the same redirection as in examples 1 and 2, as expected.

The index is built using oxigraph (see https://github.com/bio-guoda/preston-service/blob/9466c7ac601902b28ff64e7ac83ed6a9a74624a5/query/index-provenance-graph.sh ) and results in a ~30GiB index. This index is then run as a read-only service using https://github.com/bio-guoda/preston-service/blob/9466c7ac601902b28ff64e7ac83ed6a9a74624a5/systemd/system/preston-registry.service . The redirect service is configured to query the index, and redirect to a known content repository, via the configuration defined at https://github.com/bio-guoda/preston-service/blob/main/systemd/system/preston-redirect.service .

With this, we have a service that uses a well-defined relation between identifiers and their associated content. No longer do we have to rely on DNS, or dynamic databases, because our redirection is anchored in a specific provenance graph (in this case, the provenance graph with version hash://sha256/5b7fa37bf8b64e7c935c4ff3389e36f8dd162f0705410dd719fd089e1ea253cd).

@seltmann @mielliott @cboettig - Can you feel the excitement? Curious to hear your thoughts. You should be able to resolve any url/uuid/doi associated with darwin core archives registered with idigbio and gbif. At least, as recorded monthly since late 2018 / early 2019.
For a UCSB example . . . I am noticing how there are various ids / locations associated with a specific versioned piece of content - the DwC-A containing the digital collection records and their associated metadata.

So, to cite an exact version of a dataset, you can now say something like:

Cheadle Center for Biodiversity and Ecological Restoration (2023). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv as derived from the DwC-A defined in hash://sha256/5b7fa37bf8b64e7c935c4ff3389e36f8dd162f0705410dd719fd089e1ea253cd as gathered through activity urn:uuid:603cb45b-c23e-4d3e-a0bf-604d8537296d at 2023-12-03T06:16:07.462Z

Quite the mouthful, and precise.
Now, with added redirect badges for embedding on web pages . . . with the pattern being https://linker.bio/badge/[some known url / uuid / doi]

Example:

which renders to:

which would redirect to the associated content via https://linker.bio/urn:uuid:d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0
@seltmann you can check whether your UCSB collection is tracked by Preston by embedding DwC-A and EML download buttons on your respective pages, using a GBIF Dataset DOI, DwC-A endpoint url, GBIF Dataset UUID, or iDigBio recordset UUID - e.g., urn:uuid:d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0 can be used to get the most recent archived/tracked related DwC-A content via https://linker.bio/urn:uuid:d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0 , with badge uri https://linker.bio/badge/urn:uuid:d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0 .
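Putting the badge and resolver URLs together, an embed snippet might look like the following. This is a sketch: the markdown wrapping and alt text are my own choices for illustration, not something linker.bio prescribes; only the uuid, badge uri, and resolver uri come from the comment above.

```bash
#!/bin/bash
# emit a markdown snippet that shows the badge image and links it to
# the resolver, so a click redirects to the associated DwC-A content;
# the id is the recordset uuid from the example above
id='urn:uuid:d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0'
snippet="[![tracked by Preston](https://linker.bio/badge/$id)](https://linker.bio/$id)"
echo "$snippet"
```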
@seltmann please note that I've redesigned the badge to be a FAIR assessment badge. So, without further ado: drum roll . . .

Congratulations to @seltmann and colleagues: UCSB-IZC is FAIR!

Accessed from https://linker.bio/#use-case-4-assessing-fairness-of-biodiversity-data on 2024-01-03 -
Amazing stuff @jhpoelen, very fun! I noticed the badges default to calling stuff a DwC-A if the content type is unknown or the content doesn't exist:

preston-serve/src/main/java/bio/guoda/preston/server/RedirectingServlet.java (lines 81 to 89 in 8a912a4)
And this kinda confused me when toying around with the new badge feature, asking for badges of silly things like RSS feeds or fake IDs. I can see this causing some confusion if, for example, something goes wrong for someone's EML/etc. badge, causing linker.bio to instead make a "DwC-A" badge. May I suggest a more unassuming badge when the content type can't be determined? A more general "Content", "Error", or just blank? Or maybe there's an "unknown" MimeType or similar.
@mielliott thanks for sharing your thoughts. I can see how a badge with "DwC-A unknown" can be confusing, especially when plugging in any kind of stuff like https://linker.bio/badge/10.12/345 .

So requesting:

https://linker.bio/badge/10.12/345

is equivalent to asking:

https://linker.bio/badge/10.12/345?type=application/dwca

With this information, would you have any suggestions on how to make the "DwC-A unknown" badge less confusing and more informative?
How about like https://linker.bio/badge/10.12/345?type=cats?
PS - I really like the feature to specify the content type 🙌 |
…lacing requested content type (e.g., DwC-A) with the word "content". As suggested by @mielliott (thank you!) in #199 (comment)
#196 suggests allowing preston to look up URLs associated with a hash. Doing this quickly requires building an index. I can imagine two ways this could work:

1. `preston index` … for indexing
2. `preston index` … and index their content

Option 1 is simpler and makes it easier for the user to pick and choose what goes into the index.

Option 2 has the advantage of being able to record where statements came from, which is a big part of what "indexes" do, and also keeps the provenance chain going, which is great. E.g., in a Lucene index where "documents" represent RDF statements, we could record each statement's origin as a line in a provenance log ( line:hash://sha256/abc!/L52 ).

(There's also an option 3: do both option 1 and option 2.)

Option 1 is tempting, but I think I favor option 2. @jhpoelen thoughts? Or better ideas?