<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Data on the Web Best Practices Use Cases &amp; Requirements</title>
<!--[if lt IE 9]>
<script src="http://www.w3.org/2008/site/js/html5shiv.js"></script> <![endif]-->
<script class="remove" src="https://www.w3.org/Tools/respec/respec-w3c-common"></script>
<script class="remove">
var respecConfig = {
// specification status (e.g. WD, LC, WG-NOTE, etc.). If in doubt use ED.
specStatus: "ED",
//specStatus: "CR",
// the specification's short name, as in http://www.w3.org/TR/short-name/
shortName: "dwbp-ucr",
// if your specification has a subtitle that goes below the main
// formal title, define it here
// subtitle : "an excellent document",
// if you wish the publication date to be other than today, set this
publishDate: "2015-03-27",
// prEnd: "2014-01-12",
// lcEnd: "2013-11-26",
// crEnd: "2013-11-26",
// if the specification's copyright date is a range of years, specify
// the start date here:
copyrightStart: "2014",
// if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
// and its maturity status
previousPublishDate: "2014-06-05",
//previousPublishDate: "2013-08-01",
//previousMaturity: "CR",
previousMaturity: "WD",
// if there a publicly available Editor's Draft, this is the link
edDraftURI: "http://w3c.github.io/dwbp/usecasesv1.html",
// if this is a LCWD, uncomment and set the end of its review period
// lcEnd: "2013-09-06",
// if there is an earlier version of this specification at the Recommendation level,
// set this to the shortname of that version. This is optional and not usually
// necessary.
// prevRecShortname: "rdf-concepts",
// editors, add as many as you like
// only "name" is required
editors: [
{ name: "Deirdre Lee", url: "mailto:deirdre.lee@insight-centre.org", company: "Insight@NUIG, Ireland", companyURL: "http://www.insight-centre.org/"},
{ name: "Bernadette Farias Lóscio", url: "mailto:bfl@cin.ufpe.br", company: "Centro de Informática - Universidade Federal de Pernambuco, Brazil", companyURL: "http://www.cin.ufpe.br/" },
{name: "Phil Archer", url: "mailto:phila@w3.org", company: "W3C/ERCIM", companyURL: "http://www.w3.org/"}
],
// authors, add as many as you like.
// This is optional, uncomment if you have authors as well as editors.
// only "name" is required. Same format as editors.
//authors: [
// { name: "Your Name", url: "http://example.org/",
// company: "Your Company", companyURL: "http://example.com/" },
//],
// name of the WG
wg: "Data on the Web Best Practices Working Group",
// URI of the public WG page
wgURI: "http://www.w3.org/2013/dwbp/",
// name (WITHOUT the @w3.org) of the public mailing to which comments are due
wgPublicList: "public-dwbp-comments",
// URI of the patent status for this WG, for Rec-track documents
// !!!! IMPORTANT !!!!
// This is important for Rec-track documents, do not copy a patent URI from a random
// document unless you know what you're doing. If in doubt ask your friendly neighbourhood
// Team Contact.
wgPatentURI: "http://www.w3.org/2004/01/pp-impl/68239/status",
// if this parameter is set to true, ReSpec.js will embed various RDFa attributes
// throughout the generated specification. The triples generated use vocabulary items
// from the dcterms, foaf, and bibo. The parameter defaults to false.
doRDFa: "1.1",
alternateFormats: [ { uri: "diff-20131105.html", label: "diff to previous version" } ],
// implementationReportURI: "http://www.w3.org/2011/gld/wiki/DCAT_Implementations",
maxTocLevel: 2,
};
</script>
<link href="http://www.w3.org/TR/2014/WD-dwbp-ucr-20140605/localstyles.css"
rel="stylesheet" />
<style type="text/css">
caption {
caption-side:bottom;
margin-top:0.3em;
font-style:italic;
}
</style>
</head>
<body>
<section id="abstract">
<p>This document lists use cases, compiled by the Data on the
Web Best Practices Working Group, that represent scenarios of how data
is commonly published on the Web and how it is used. This document also
provides a set of requirements derived from these use cases that will be
used to guide the development of the set of Data on the Web Best
Practices and the development of two new vocabularies: Quality and
Granularity Description Vocabulary and Data Usage Description
Vocabulary. </p>
</section>
<section id="sotd"> </section>
<section class="informative">
<h2 id="intro">Introduction</h2>
<p>There is a growing interest in publishing and consuming data on the
Web. Both government and non-government organizations already make a
variety of data available on the Web, some openly, some with access
restrictions, covering many domains like education, the economy,
security, cultural heritage, eCommerce and scientific data. Developers,
journalists and others manipulate this data to create visualizations and
to perform data analysis. Experience in this field shows that several
important issues need to be addressed in order to meet the requirements
of both data publishers and data consumers. </p>
<p>To address these issues, the Data on the Web Best Practices Working
Group seeks to provide guidance to all stakeholders that will improve
consistency in the way data is published, managed, referenced and used
on the Web. The guidance will take two forms: a set of best practices
that apply to multiple technologies, and vocabularies that are currently
missing but that are needed to support the data ecosystem on the Web.</p>
<p>In order to determine the scope of the best practices and the
requirements for the new vocabularies, a set of use cases has been
compiled. Each use case provides a narrative describing an experience of
publishing and using Data on the Web. The use cases cover different
domains and illustrate some of the main challenges faced by data
publishers and data consumers. A set of requirements, used to guide the
development of the set of best practices as well as the development of
the vocabularies, have been derived from the compiled use cases.</p>
<p>Interpretations of each use case could lead to an unmanageably large
number of requirements and so before including them, each potential
requirement has been assessed against three specific criteria:</p>
<ol>
<li>Is the requirement specifically relevant to data published on the
Web?</li>
<li>Does the requirement encourage reuse or publication of data on the
Web?</li>
<li>Is the requirement testable?</li>
</ol>
<p>Only requirements meeting those three criteria have been included.</p>
</section>
<section>
<h2 id="use-cases">Use Cases</h2>
<p>A use case illustrates an experience of publishing and using Data on
the Web. The information gathered from the use cases should be helpful
for the identification of the best practices that will guide the
publishing and usage of Data on the Web. In general, a use case will be
described at least by a statement and a discussion of how the use case
is currently implemented. Use case descriptions demonstrate some of the
main challenges faced by publishers or developers. Information about
challenges will be helpful to identify areas where Best Practices are
necessary. Based on these challenges, a set of requirements is
abstracted in such a way that a requirement motivates the creation of
one or more best practices.</p>
<!-- ASO: Airborne Snow Observatory -->
<section rel="bibo:Chapter" resource="#UC-ASO" typeof="bibo:Chapter" id="UC-ASO">
<h3 id="h3_UC-ASO" role="heading" aria-level="2">ASO: Airborne Snow
Observatory</h3>
<p class="contributor">(Contributed by Lewis John McGibbney, NASA Jet
Propulsion Laboratory/California Institute of Technology)<br />
URL: <a href="http://aso.jpl.nasa.gov/">http://aso.jpl.nasa.gov/</a></p>
<p>The two most critical properties for understanding snowmelt runoff
and timing are the spatial and temporal distributions of snow water
equivalent (SWE) and snow albedo. Despite their importance in
controlling the volume and timing of runoff, snowpack albedo and SWE are
still largely unquantified in the US and not quantified at all in most
of the globe, leaving runoff models poorly constrained. NASA/JPL, in
partnership with the California Department of Water Resources, has
developed the Airborne Snow Observatory (ASO), an imaging spectrometer
and scanning Lidar system, to quantify SWE and snow albedo, generate
unprecedented knowledge of snow properties for cutting edge
cryospheric science, and provide complete, robust inputs to water
management models and systems of the future.</p>
<p><strong>Elements:</strong></p>
<ul>
<li><strong>Domains:</strong> Digital Earth Modeling, Digital Surface
Modeling, Spatial Distribution Measurement, Snow Depth, Snow Water
Equivalent, Snow Albedo.</li>
<li><strong>Obligation/motivation:</strong> Funding provided by NASA
Terrestrial Hydrology, NASA Applied Sciences, and California
Department of Water Resources.</li>
<li><strong>Usage:</strong> Examples of data usage include &lt;24hr
turnaround of flight data, which is passed on to numerous Water
Resource Managers to aid water conservation, policy and
decision-making processes. Accurate, weekly, spatially distributed
SWE has never been produced before, and is highly informative to
reservoir managers who must make tradeoffs between storing water for
summer water supply versus using water before snowmelt recedes for
generation of clean hydropower. Accurate SWE information, when
coupled with runoff forecasting models, can also have ecological
benefits through avoidance of late-spring high flows released from
reservoirs that are not part of the natural seasonal variability.</li>
<li><strong>Quality:</strong> Available in a number of scientific
formats to customers and stakeholders based on customer
requirements.</li>
<li><strong>Lineage:</strong> All ASO data stems directly from
on-board imaging spectrometer and scanning Lidar system instruments.</li>
<li><strong>Size:</strong> Many terabytes in size. Raw data acquisition
is dependent on the basin/survey size. Recent individual flights
generate on the order of ~500GB, which includes imaging spectrometer
and Lidar data. This does, however, shrink considerably if we consider
only the data that we would distribute.</li>
<li><strong>Type/format:</strong> Digital Elevation Model / binary
image (not public atm), Lidar (Raw Point Clouds)/ las (not public
atm), Raster Zonal Stats / text (not public atm), Snow Water
Equivalent / tiff, Snow Albedo / tiff</li>
<li><strong>Rate of change:</strong> Recent weekly flights have
provided information on a scale and timing that has never occurred
before. Distributed SWE increases after storms, and decreases during
melt events in patterns that have never before been measured and
will be studied by snow hydrologists for years to come. Once data is
captured it is not updated; however, subsequent data is generated
from the original data within processing pipelines, which include
screening for data quality control and assurance.</li>
<li><strong>Data lifespan:</strong> For immediate operational
purposes, the last flight's data become obsolete when a new flight
is made. However, the annual sequence of data sets will be leveraged
by snow hydrologists and runoff forecasters during the next decade
as they are used to improve models and understanding of the spatial
nature of the mountain snowpack.</li>
<li><strong>Potential audience:</strong> (snow) hydrologists,
hydrologic modelers, runoff forecasters, and reservoir operators and
managers.</li>
</ul>
<p><strong>Positive aspects:</strong></p>
<p>This use case provides insight into what a NASA funded demonstration
mission looks like (from a data provenance, archival point of view).</p>
<p>It is an excellent opportunity to delve into an earth science mission
which is actively addressing the global problem of water resource
management. Recently, senior officials declared a statewide (CA)
drought emergency and asked all Californians to reduce their
water use by 20 percent. California and other U.S. states are
experiencing a serious drought, and the state will be challenged to
meet its water needs in the upcoming year. Calendar year 2013 was the
driest year in recorded history for many areas of California, and
current conditions suggest no change is in sight for 2014. ASO is at
the front line of cutting-edge scientific research, meaning that the
data that backs the mission, as well as the practices adopted within
the project execution, are extremely important to addressing this
issue.</p>
<p>Project collaborators and stakeholders are sent data and information
when it is produced and curated. For some stakeholders, the data (in
an operational sense) they require is very small in size and in such
cases ASO emphasizes speed. For the short-term turnaround of
information, this is more a sharing of information than the delivery
of a product.</p>
<p><strong>Negative aspects:</strong></p>
<p>Demonstration missions of this caliber also have downsides. With
regards to data best practices, more work is required in the following
areas:</p>
<ul>
<li>Documentation of processes including data acquisition, provenance
tracking, curation of data products such as bare earth digital earth
models (DEM), full surface digital surface models (DSM), snow
products, snow water equivalents (SWE), etc.</li>
<li>Currently the data is not searchable; this makes retrieval of specific
data difficult when data volumes grow to this size and nature.</li>
<li>There is no publicly available guidance regarding suggested tools
which can be used to interact with the data sources.</li>
<li>Quick turnarounds of operational data may be compromised when ASO
moves beyond a demonstration mission and picks up new customers.
This will most likely be attributable to the time associated with the
generation and distribution of science-grade products.</li>
</ul>
<p><strong>Challenges: </strong></p>
<ul>
<li>Data volumes are large and will grow year on year. The volume
of generated data grew by 50% between 2013 and 2014.</li>
<li>On many occasions we require a very quick turnaround on
inferences which can be made from the data. This sometimes (but not
always) comes at the cost of reducing the emphasis on best practices
for the generation, storage and archival of the project's data.</li>
<li>The data takes the form of science-oriented representational
formats. Such formats are atypical of the data many people
publish on the Web. A lot of thought needs to be put into
how this data can be better accessed.</li>
</ul>
<p><strong>Requires:</strong> <a href="#R-AccessUptodate">R-AccessUptodate</a>,
<a href="#R-Citable">R-Citable</a>, <a href="#R-DataIrreproducibility">R-DataIrreproducibility</a>,
<a href="#R-DataMissingIncomplete">R-DataMissingIncomplete</a>, <a href="#R-FormatMachineRead">R-FormatMachineRead</a>,
<a href="#R-GeographicalContext">R-GeographicalContext</a>, <a href="#R-GranularityLevels">R-GranularityLevels</a>,
<a href="#R-LicenseLiability">R-LicenseLiability</a>, <a href="#R-MetadataAvailable">R-MetadataAvailable</a>,
<a href="#R-ProvAvailable">R-ProvAvailable</a>, <a href="#R-QualityCompleteness">R-QualityCompleteness</a>,
<a href="#R-QualityMetrics">R-QualityMetrics</a>, <a href="#R-TrackDataUsage">R-TrackDataUsage</a>,
<a href="#R-UsageFeedback">R-UsageFeedback</a>, <a href="#R-VocabDocum">R-VocabDocum</a>,
</p>
</section>
<!-- BBC -->
<section rel="bibo:Chapter" resource="#UC-BBC" typeof="bibo:Chapter" id="UC-BBC">
<h3 id="h3_UC-BBC" role="heading" aria-level="2">BBC</h3>
<p class="contributor"><strong>Contributors: </strong>Ghislain
Atemezing (EURECOM)</p>
<p><strong>URL: </strong><a href="http://www.bbc.co.uk/ontologies">http://www.bbc.co.uk/ontologies</a></p>
<p><strong>Overview:</strong> the BBC provides a <a href="http://www.bbc.co.uk/ontologies">list
of the ontologies</a> they implement and use for their Linked Data
platform. The site provides access to the ontologies the BBC is using
to support its audience using their applications, such as <a href="http://www.bbc.co.uk/sport">BBC
Sport</a> or <a href="http://www.bbc.co.uk/education">BBC Education</a>.
Each ontology has a short description with metadata information, an
introduction, sample data, an ontology diagram and the terms used in
the ontology. The metadata includes six fields that are generally
filled: authors (mailto links), created date, version (current version
number), prior version (decimal), license (a link to the license) and
a link for downloading the RDF version. For example, see the
description of the “<a href="http://www.bbc.co.uk/ontologies/coreconcepts">Core
concepts ontology</a>.” However, this metadata that is available in
the HTML page is NOT present in a machine-readable format, i.e. in the
ontology itself.</p>
<p><strong>Versioning:</strong> each ontology uses a decimal notation
for the version and the URL for accessing each version file of the
ontology is constructed as {BASE-URI}/{ONTO-PREFIX}/{VERSION}.ttl;
where {BASE-URI} is <a href="http://www.bbc.co.uk/ontologies/">http://www.bbc.co.uk/ontologies/</a>.
For example: the file of version 1.9 of the “core concepts” ontology
is located at <a href="http://www.bbc.co.uk/ontologies/coreconcepts/1.9.ttl">http://www.bbc.co.uk/ontologies/coreconcepts/1.9.ttl</a>.
However, between different versions, the URI of the ontology used is
the same and is of the form: {BASE-URI}/{ONTO-PREFIX}/.</p>
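<p>As an illustration only, the version-file pattern above can be
captured in a few lines of Python. This is a hypothetical sketch, not
BBC-provided tooling; the prefix and version values are the examples
taken from the text.</p>
<pre><code>
BASE_URI = "http://www.bbc.co.uk/ontologies/"

def version_file_url(onto_prefix, version):
    """Turtle file URL of one version: {BASE-URI}/{ONTO-PREFIX}/{VERSION}.ttl"""
    return "%s%s/%s.ttl" % (BASE_URI, onto_prefix, version)

def ontology_uri(onto_prefix):
    """Version-independent URI of the ontology: {BASE-URI}/{ONTO-PREFIX}/"""
    return "%s%s/" % (BASE_URI, onto_prefix)

# Version 1.9 of the "core concepts" ontology:
print(version_file_url("coreconcepts", "1.9"))
# http://www.bbc.co.uk/ontologies/coreconcepts/1.9.ttl
</code></pre>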
<p><strong>Elements:</strong></p>
<ul>
<li><strong>Domains:</strong> vocabulary catalog, versioning, metadata
</li>
<li><strong>Obligation/motivation:</strong> Provide a single access
point for the vocabularies built within the BBC </li>
<li><strong>Usage:</strong> The site provides access to the ontologies
the BBC is using to support its audience through its applications </li>
<li><strong>Quality:</strong> High-level and domain vocabularies
adapted to BBC applications. </li>
<li><strong>Size:</strong> Currently there are 12 ontologies of
different sizes, from 40 to 750 triples. </li>
<li><strong>Type/format:</strong> RDF/Turtle, and HTML pages
describing each ontology </li>
<li><strong>Rate of change:</strong> Depends on the vocabulary and may
differ between versions, although no such metadata is
provided </li>
<li><strong>Data lifespan:</strong> n/a </li>
<li><strong>Potential audience:</strong> BBC applications and any user
interested in the domains of the vocabularies (publishers,
researchers or developers) </li>
</ul>
<p><strong>Challenges</strong></p>
<ul>
<li>It would be useful, and more consistent, to systematically add the
metadata provided in the HTML pages describing each BBC ontology to
the RDF vocabulary itself. </li>
<li>How to dereference, from a single URI, different versions of the
ontology in different flavors of RDF (XML, Turtle, etc.). </li>
<li>Need to add the modified date along with the version of each
ontology. </li>
</ul>
<p><strong>Requires:</strong> <a href="#R-MetadataDocum">R-MetadataDocum</a>,
<a href="#R-MetadataMachineRead">R-MetadataMachineRead</a>, <a href="#R-FormatMultiple">R-FormatMultiple</a>,
<a href="#R-MetadataStandardized">R-MetadataStandardized</a> and <a href="#R-VocabVersion">R-VocabVersion</a>.</p>
</section>
<!-- Bio2RDF -->
<section rel="bibo:Chapter" resource="#UC-Bio2RDF" typeof="bibo:Chapter" id="UC-Bio2RDF">
<h3 id="h3_UC-Bio2RDF" role="heading" aria-level="2">Bio2RDF</h3>
<p class="contributor">(Contributed by Carlos Laufer)<br />
URL: <a href="http://bio2rdf.org/">http://bio2rdf.org/</a></p>
<p> <a href="http://bio2rdf.org/">Bio2RDF</a><sup><a href="#bio1">1</a></sup>
is an open source project that uses Semantic Web technologies to make
possible the distributed querying of integrated life sciences data.
Since its inception<sup><a href="#bio2">2</a></sup>, Bio2RDF has made
use of the Resource Description Framework (RDF) and the RDF Schema
(RDFS) to unify the representation of data obtained from diverse
fields (molecules, enzymes, pathways, diseases, etc.) and
heterogeneously formatted biological data (e.g. flat-files,
tab-delimited files, SQL, dataset specific formats, XML etc.). Once
converted to RDF, this biological data can be queried using the SPARQL
Protocol and RDF Query Language (SPARQL), which can be used to
federate queries across multiple SPARQL endpoints. </p>
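<p>For example, a data consumer might combine data from two Bio2RDF
dataset endpoints in a single federated query using the SPARQL SERVICE
keyword. The sketch below uses the Python SPARQLWrapper library; the
endpoint URLs and the query itself are illustrative assumptions rather
than a tested Bio2RDF query.</p>
<pre><code>
from SPARQLWrapper import SPARQLWrapper, JSON

# Illustrative endpoint URL; Bio2RDF exposes one SPARQL endpoint per dataset.
sparql = SPARQLWrapper("http://drugbank.bio2rdf.org/sparql")
sparql.setQuery("""
    PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
    SELECT ?s ?title WHERE {
      ?s dcterms:title ?title .
      # SERVICE forwards this part of the query to a second endpoint
      SERVICE &lt;http://kegg.bio2rdf.org/sparql&gt; { ?s ?p ?o . }
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["title"]["value"])
</code></pre>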
<p> <strong>Elements:</strong> </p>
<ul>
<li><b><i>Domains:</i></b> Biological data </li>
<li><b><i>Obligation/motivation:</i></b> Biological researchers are
often confronted with the inevitable and unenviable task of having
to integrate their experimental results with those of others. This
task usually involves a tedious manual search and assimilation of
often isolated and diverse collections of life sciences data hosted
by multiple independent providers including organizations such as
the National Center for Bio-technology Information (<a href="http://www.ncbi.nlm.nih.gov/">NCBI</a>)
and the European Bioinformatics Institute (<a href="http://www.ebi.ac.uk/">EBI</a>)
that provide dozens of user-submitted and curated datasets, as well
as smaller institutions such as the Donaldson group that publishes <a
href="http://irefindex.org/">iRefIndex</a><sup><a href="#bio3">3</a></sup>,
a database of molecular interactions aggregated from 13 data
sources. While these mostly isolated silos of biological information
occasionally provide links between their records (e.g. <a href="http://www.uniprot.org">UniProt</a>
links its entries to hundreds of other datasets), they are typically
serialized in either HTML elements or in flat file data dumps that
lack the semantic richness required to serialize the intent of the
linkage between data records. With thousands of biological databases
and hundreds of thousands of datasets, the ability to find relevant
data is hampered by non-standard database interfaces and an enormous
number of haphazard data formats<sup><a href="#bio4">4</a></sup>.
Moreover, metadata about these biological data providers (dataset
source data information, dataset versioning, licensing information,
date of creation, etc.) is often difficult to obtain. Taken
together, the inability to easily navigate through available data
presents an overwhelming barrier to their reuse. </li>
<li><b><i>Usage:</i></b> Biological research</li>
<li><b><i>Quality:</i></b> Bio2RDF scripts generate provenance records
using the <abbr title="World Wide Web Consortium">W3C</abbr>
Vocabulary of Interlinked Datasets (<a href="http://www.w3.org/TR/void/">VoID</a>),
the Provenance vocabulary (<a href="http://www.w3.org/TR/prov-overview/">PROV</a>)
and <a href="http://dublincore.org/">Dublin Core</a> vocabulary.
Each data item is linked to a provenance object that indicates the
source of the data, the time at which the RDF was generated,
licensing (if available from the data source provider), the SPARQL
endpoint in which the resource can be found, and the downloadable
RDF file where the data item is located. Each dataset provenance
object has a unique IRI and label based on the dataset name and
creation date. The date-specific dataset IRI is linked to a unique
dataset IRI using the PROV predicate <code>wasDerivedFrom</code>
such that one can query the dataset SPARQL endpoint to retrieve all
provenance records for datasets created on different dates. Each
resource in the dataset is linked to the date-unique dataset IRI that
is part of the provenance record using the VoID <code>inDataset</code>
predicate. Other important features of the provenance record include
the use of the Dublin Core <code>creator</code> term to link a
dataset to the script on Github that was used to generate it, the
VoID predicate <code>sparqlEndpoint</code> to point to the dataset
SPARQL endpoint, and VoID predicate <code>dataDump</code> to point
to the data download URL. A sketch of such a provenance record is
given after the lists below.
<p>Dataset metrics:</p>
<ol>
<li>total number of triples </li>
<li>number of unique subjects </li>
<li>number of unique predicates </li>
<li>number of unique objects </li>
<li>number of unique types </li>
<li>unique predicate-object links and their frequencies </li>
<li>unique predicate-literal links and their frequencies </li>
<li>unique subject type-predicate-object type links and their
frequencies </li>
<li>unique subject type-predicate-literal links and their
frequencies </li>
<li>total number of references to a namespace </li>
<li>total number of inter-namespace references </li>
<li>total number of inter-namespace-predicate references </li>
</ol>
</li>
<li><b><i>Size:</i></b> At the time of writing, thirty-five datasets
have been generated as part of the <a href="http://download.bio2rdf.org/release/3/release.html">Bio2RDF
3 release</a>. Several of the datasets are themselves collections
of datasets that are now available as one resource. Each dataset has
been loaded into a dataset-specific SPARQL endpoint using Openlink
Virtuoso. All updated Bio2RDF linked data and their corresponding
Virtuoso DB files are available for <a href="http://download.bio2rdf.org/current/release.html">download</a>.</li>
<li><b><i>Type/format:</i></b> RDF </li>
<li><b><i>Rate of change:</i></b> depends on data source</li>
<li><b><i>Data lifespan:</i></b> depends on data source</li>
<li><b><i>Potential audience:</i></b> Biological researchers</li>
</ul>
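<p>A minimal sketch of such a provenance record, built with the Python
rdflib library, is shown below. All IRIs are hypothetical stand-ins
following the patterns described in the <i>Quality</i> element above,
not actual Bio2RDF identifiers.</p>
<pre><code>
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import DCTERMS

VOID = Namespace("http://rdfs.org/ns/void#")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
# Hypothetical IRIs: a dataset, its date-specific counterpart, one record.
dataset = URIRef("http://bio2rdf.org/dataset/example")
dated   = URIRef("http://bio2rdf.org/dataset/example-2014-06-05")
record  = URIRef("http://bio2rdf.org/example:0001")

# The date-specific dataset IRI is derived from the version-independent one.
g.add((dated, PROV.wasDerivedFrom, dataset))
# Each resource is linked to the date-specific dataset IRI.
g.add((record, VOID.inDataset, dated))
# Link the dataset to its generating script, SPARQL endpoint and dump.
g.add((dated, DCTERMS.creator,
       URIRef("https://github.com/bio2rdf/bio2rdf-scripts")))
g.add((dated, VOID.sparqlEndpoint,
       URIRef("http://example.bio2rdf.org/sparql")))
g.add((dated, VOID.dataDump,
       URIRef("http://download.bio2rdf.org/example.nq.gz")))

print(g.serialize(format="turtle"))
</code></pre>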
<p><strong>References:</strong></p>
<ol>
<li id="bio1">Callahan A, Cruz-Toledo J, Ansell P, Klassen D,
Tumarello G, Dumontier M: <a href="http://ceur-ws.org/Vol-952/paper_18.pdf">Improved
dataset coverage and interoperability with Bio2RDF Release 2</a>
(PDF). SWAT4LS 2012, Proceedings of the 5th International Workshop
on Semantic Web Applications and Tools for Life Sciences, Paris,
France, November 28-30, 2012.</li>
<li id="bio2">Belleau F, Nolin MA, Tourigny N, Rigault P, Morissette
J: Bio2RDF: towards a mashup to build bioinformatics knowledge
systems. J Biomed Inform 2008, 41(5):706-716.</li>
<li id="bio3">Razick S, Magklaras G, Donaldson IM: iRefIndex: a
consolidated protein interaction database with provenance. BMC
Bioinformatics 2008, 9:405.</li>
<li id="bio4">Goble C, Stevens R: State of the nation in data
integration for bioinformatics. J Biomed Inform 2008, 41(5):687-693.</li>
</ol>
<p><strong>Challenges:</strong></p>
<ul>
<li>Lack of human-readable metadata. </li>
<li>Data variability (models, sources, etc.). </li>
<li>RDFization of datasets. </li>
<li>Wide variety of formats and technologies. </li>
</ul>
<p><strong>Potential Requirements:</strong></p>
<ul>
<li>Dataset versioning and updating mechanisms </li>
<li>Standardization of schemas </li>
<li>Integration with other platforms/services </li>
<li>Data Persistence </li>
</ul>
<p><strong>Requires:</strong> <a href="#R-AccessLevel">R-AccessLevel</a>,
<a href="#R-AccessUptodate">R-AccessUptodate</a>, <a href="#R-DataLifecyclePrivacy">R-DataLifecyclePrivacy</a>,
<a href="#R-FormatMultiple">R-FormatMultiple</a>, <a href="#R-FormatStandardized">R-FormatStandardized
</a>, <a href="#R-PersistentIdentification">R-PersistentIdentification</a>
and <a href="#R-VocabReference">R-VocabReference</a> .</p>
</section>
<!-- Building Eye -->
<section rel="bibo:Chapter" resource="#UC-BuildingEye" typeof="bibo:Chapter"
id="UC-BuildingEye">
<h3 id="h3_UC-BuildingEye" role="heading" aria-level="2">BuildingEye:
SME use of public data</h3>
<p class="contributor">(Contributed by Deirdre Lee)<br />
URL: <a href="http://mypp.ie/">http://mypp.ie/</a></p>
<p>Buildingeye.com makes building and planning information easier to
find and understand by mapping what's happening in your city. In
Ireland local authorities handle planning applications and usually
provide some customized views of the data (PDFs, maps, etc.) on their
own Web site. However, there isn't an easy way to get a nationwide view
of the data. BuildingEye, an independent SME, built <a href="http://mypp.ie/">http://mypp.ie/</a>
to achieve this. However, as each local authority didn't have an Open
Data portal, BuildingEye had to ask each local authority directly for
its data. It was granted access by some authorities, but not all. The
data it did receive was in different formats and of varying
quality/detail. BuildingEye harmonized this data for its own system.
However, if another SME wanted to use this data, they would have to go
through the same process and again go to each local authority asking
for the data. </p>
<p> <strong>Elements:</strong> </p>
<ul>
<li><b><i>Domains:</i></b> Planning data</li>
<li><b><i>Obligation/motivation:</i></b> demand from SME</li>
<li><b><i>Usage:</i></b> Commercial usage</li>
<li><b><i>Quality:</i></b> standardized, interoperable across local
authorities</li>
<li><b><i>Size:</i></b> medium</li>
<li><b><i>Type/format:</i></b> structured according to legacy system
schema</li>
<li><b><i>Rate of change:</i></b> daily</li>
<li><b><i>Potential audience:</i></b> Business, citizens</li>
<li><b><i>Governance:</i></b> local authorities</li>
</ul>
<p> <strong>Challenges:</strong> </p>
<ul>
<li>Access to data is currently a manual process, on a case-by-case
basis</li>
<li>Data is provided in different formats, e.g. database dumps,
spreadsheets</li>
<li>Data is structured differently depending on the legacy system
schema; concepts and terms are not interoperable</li>
<li>No official Open license associated with the data</li>
<li>Data is not available for further reuse by other parties</li>
</ul>
<p> <strong>Potential Requirements:</strong> </p>
<ul>
<li>Creation of top-down policy on open data to ensure common
understanding and approach</li>
<li>Top-down guidance on recommended Open license usage</li>
<li>Standardized, non-proprietary formats</li>
<li>Availability of recommended domain-specific vocabularies.</li>
</ul>
<p> <strong>Requires:</strong> <a href="#R-AccessBulk">R-AccessBulk</a>,
<a href="#R-AccessRealTime">R-AccessRealTime</a>, <a href="#R-DataLifecyclePrivacy">R-DataLifecyclePrivacy</a>,
<a href="#R-DataMissingIncomplete">R-DataMissingIncomplete</a>, <a href="#R-DataProductionContext">R-DataProductionContext</a>,
<a href="#R-DataUnavailabilityReference">R-DataUnavailabilityReference</a>,
<a href="#R-FormatMachineRead">R-FormatMachineRead</a>, <a href="#R-FormatOpen">R-FormatOpen</a>,
<a href="#R-FormatStandardized">R-FormatStandardized</a>, <a href="#R-GeographicalContext">R-GeographicalContext</a>,
<a href="#R-LicenseAvailable">R-LicenseAvailable</a>, <a href="#R-MetadatAvailable">R-MetadatAvailable</a>,
<a href="#R-MetadataDocum">R-MetadataDocum</a>, <a href="#R-QualityCompleteness">R-QualityCompleteness</a>,
<a href="#R-QualityComparable">R-QualityComparable</a>, <a href="#R-SensitivePrivacy">R-SensitivePrivacy</a>,
<a href="R-SensitiveSecurity">R-SensitiveSecurity</a> and <a href="#R-VocabDocum">R-VocabDocum</a>.</p>
</section>
<!-- Dados Gov BR -->
<section rel="bibo:Chapter" resource="#UC-DadosGovBr" typeof="bibo:Chapter"
id="UC-DadosGovBr">
<h3 id="h3_UC-DadosGovBr" role="heading" aria-level="2">Dados.gov.br</h3>
<p class="contributor">(Contributed by Yasodara)<br />
URL: <a href="http://dados.gov.br/">http://dados.gov.br/</a></p>
<p> Dados.gov.br is the open data portal of Brazil's Federal Government.
The site was built by a community network pulled together by three
technicians from the Ministry of Planning. They managed the group through
<a href="http://wiki.gtinda.ibge.gov.br/Tecnologia.ashx">INDA</a>, the
"National Infrastructure for Open Data." CKAN was chosen because it is
free software and provides an independent solution for publishing the
Federal Government's data catalog on the internet.</p>
<p> <strong>Elements:</strong> </p>
<ul>
<li><b><i>Domains:</i></b> federal budget, addresses, Infrastructure
information, e-gov tools usage, social data, geographic information,
political information, Transport information.</li>
<li><b><i>Obligation/motivation:</i></b> Data that must be provided to
the public under a legal obligation, the so-called LAI or Brazilian
Information Access Act, enacted in 2012.</li>
<li><b><i>Usage: </i></b>Data that is the basis for services to the
public; Data that has commercial reuse potential.</li>
<li><b><i>Quality:</i></b> Authoritative, clean data, vetted and
guaranteed.</li>
<li><b><i>Lineage/Derivation:</i></b> Data came from various
publishers. As a catalog, the site has faced several challenges, one
of them was to integrate the various technologies and formulas used
by publishers to provide datasets in the portal.</li>
<li><b><i>Type/format:</i></b> Tabular data, text data.</li>
<li><b><i>Rate of change:</i></b> There is fixed data and data with a
high rate of change.</li>
</ul>
<p> <strong>Challenges:</strong> </p>
<ul>
<li>Data integration (lack of vocabularies).</li>
<li>Collaborative construction of the portal: managing online sprints
and balancing public expectations.</li>
<li>Licensing the data of the portal. Most of the data in the
portal does not have a specific license, so different types of license
are applied to different datasets.</li>
</ul>
<p> <strong>Requires:</strong> <a href="#R-AccessLevel">R-AccessLevel</a>,
<a href="#R-DataLifecyclePrivacy">R-DataLifecyclePrivacy</a>, <a href="#R-DataLifecycleStage">R-DataLifecycleStage</a>,
<a href="#R-DataMissingIncomplete">R-DataMissingIncomplete</a>, <a href="#R-FormatStandardized">R-FormatStandardized</a>,
<a href="#R-LicenseAvailable">R-LicenseAvailable</a>, <a href="#R-MetadataAvailable">R-MetadataAvailable</a>,
<a href="#R-GeographicalContext">R-GeographicalContext</a>, <a href="#R-MetadataDocum">R-MetadataDocum</a>,
<a href="#R-ProvAvailable">R-ProvAvailable</a>, <a href="#R-QualityOpinions">R-QualityOpinions</a>,
<a href="#R-UsageFeedback">R-UsageFeedback</a>, <a href="#R-VocabReference">R-VocabReference</a>
and <a href="#R-VocabVersion">R-VocabVersion</a>.</p>
</section>
<!-- Digital Archiving -->
<section rel="bibo:Chapter" resource="#UC-DigitalArchiving" typeof="bibo:Chapter"
id="UC-DigitalArchiving">
<h3 id="h3_UC-DigitalArchiving" role="heading" aria-level="2">Digital
archiving of Linked Data</h3>
<p class="contributor">(Contributed by Christophe Guéret)<br />
URL: <a href="http://dans.knaw.nl/">http://dans.knaw.nl/</a></p>
<p>Digital archives, such as <abbr title="Data Archiving and Networked Services"><a
href="http://dans.knaw.nl/">DANS</a></abbr> in the Netherlands,
have so far been concerned with the preservation of what could be
defined as "frozen" datasets. A frozen dataset is a finished,
self-contained set of data that does not evolve after it has been
constituted. The goal of the preserving institution is to ensure this
dataset remains available and readable for as many years as possible.
This can for example concern an audio recording, a digitized image,
e-books or database dumps. Consumers of the data are expected to look
for specific content based on its associated identifier, download it
from the archive and use it. Now comes the question of the
preservation of Linked Open Data. In opposition to "frozen" datasets,
linked data can be qualified as "live" data. The resources it contains
are part of a larger entity to which third parties contribute; one of
the design principles indicates that other data producers and consumers
should be able to point to the data. When <abbr title="Linked Data">LD</abbr>
publishers stop offering their data (e.g. at the end of a project),
taking the LD off-line as a dump and putting it in an archive
effectively turns it into a frozen dataset, just like SQL dumps and
other kinds of databases. The question then is to what extent this is
an issue.</p>
<p> <strong>Challenges:</strong> The archive has to decide whether
dereferencing for resources found in preserved datasets is required,
and whether to provide a SPARQL endpoint. If data
consumers and publishers are fine with RDF data dumps being
downloaded from the archive prior to usage - just like any other
digital item so far - the technical challenges could be limited to
handling the size of the dumps and taking care of serialization
evolution over time (e.g. from N-Triples to TriG, or from RDF/XML to <a
href="http://www.rdfhdt.org/">HDT</a>) as the preference for these
formats evolves. Turning a live dataset into a frozen dump also raises
the question of scope. Considering that LD items are only part of
a much larger graph that gives them meaning through context, the only
valid dump would be a complete snapshot of the entire connected
component of the Web of Data graph that the target dataset is part of. </p>
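<p>To illustrate the serialization-evolution point: converting an
archived dump between RDF formats is mechanical as long as a parser for
the legacy serialization survives. Below is a minimal sketch with the
Python rdflib library; the file names are placeholders, and HDT would
require a dedicated library not shown here.</p>
<pre><code>
from rdflib import Graph

# A hypothetical archived dump in a legacy serialization (RDF/XML).
g = Graph()
g.parse("archived-dataset.rdf", format="xml")

# Re-serialize in a format preferred at retrieval time.
g.serialize(destination="archived-dataset.nt", format="nt")
# (TriG involves named graphs and would use rdflib's Dataset class;
#  HDT needs a dedicated codec library.)
</code></pre>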
<p> <strong>Potential Requirements:</strong> Decide on the importance
of the dereferenceability of resources and the potential implications
for domain names and the naming of resources. Decide on the scope of the
step that will turn a connected sub-graph into an isolated data dump.</p>
<p> <strong>Requires:</strong> <a href="#R-AccessLevel">R-AccessLevel</a>,
<a href="#R-PersistentIdentification">R-PersistentIdentification</a>,
<a href="#R-UniqueIdentifier">R-UniqueIdentifier</a> and <a href="#R-VocabReference">R-VocabReference</a>
.</p>
</section>
<!-- Dutch Base registers -->
<section typeof="bibo:Chapter" resource="#h3_UC-DutchBasicReg" rel="bibo:Chapter"
id="UC-DutchBasicReg">
<h3 id="h3_UC-DutchBasicReg" role="heading" aria-level="2">Dutch Base
Registers</h3>
<p class="contributor">(Contributed by Christophe Guéret)<br />
URL: <a href="http://www.e-overheid.nl/onderwerpen/stelselinformatiepunt/stelsel-van-basisregistraties">http://www.e-overheid.nl/onderwerpen/stelselinformatiepunt/stelsel-van-basisregistraties</a></p>
<p> The Netherlands has a <a href="http://e-overheid.nl/onderwerpen/stelselinformatiepunt/stelsel-van-basisregistraties/basisregistraties">
set of registers</a> that are under consideration for exposure as
Linked (Open) Data in the context of the <a href="http://www.pilod.nl/wiki/Hoofdpagina">"PiLOD"</a>
project. The registers contain information about buildings, people,
businesses that other individual public bodies may want to refer to
for their daily activities. One of them is, for instance, the service
of public taxes ("BelastingDienst") which regularly pulls out data
from several registers, stores this data in a big Oracle instance and
curates it. This costly and time consuming process could be optimized
by providing on-demand access to up-to-date descriptions provided by
the register owners.</p>
<p> <strong>Challenges:</strong> In terms of challenges, linking is for once not much of an issue, as <a
href="http://www.e-overheid.nl/onderwerpen/stelselinformatiepunt/stelselthemas/verbindingen/verbindingen-tussen-basisregistraties">
registers already cross-reference unique identifiers</a> (see also <a
href="http://www.wikixl.nl/wiki/gemma/index.php/Ontsluiting_basisgegevens">
http://www.wikixl.nl/wiki/gemma/index.php/Ontsluiting_basisgegevens</a>).
A <a href="http://www.pilod.nl/wiki/Boek/URI-strategie">URI scheme</a>
with predictable and persistent URIs is being considered for
implementation. Actual challenges include:</p>
<ul>
<li>Capacity: at this point, it is considered unreasonable to ask
every register to publish its own data. Some of them export what
they have on the national open data portal. This data has been used
to do some testing with third-party publications from PiLOD project
members but this is rather sensitive as a long-term strategy
(governmental data has to be traceable/trustable as such). The middle
ground solution currently deployed is the PiLOD platform, a
(semi)-official platform for publishing register data.</li>
<li>Privacy: some of the register data is personal or may become so
when linked to others (e.g. when addresses are used to disambiguate
personal data). Some registers will require secure access to some of
their data for some people only (an example of non-open Linked Data).
Some others can go along with open data as long as they get a
precise log of who is using what.</li>
<li>Revenue: institutions working under mixed
government/non-government funding generate part of their revenue by
selling some of the data they curate. Switching to an open data
model will cause a direct loss in revenue that has to be compensated
for by other means. This does not have to mean closing the data,
e.g. a model of open dereferencing plus paid dumps can be
considered, as well as other indirect revenue streams. </li>
</ul>
<p> <strong>Requires:</strong> <a href="#R-AccessLevel">R-AccessLevel</a>,
<a href="#R-FormatMultiple">R-FormatMultiple</a>, <a href="#R-PersistentIdentification">R-PersistentIdentification</a>,
<a href="#R-SensitivePrivacy">R-SensitivePrivacy</a>, <a href="#R-UniqueIdentifier">R-UniqueIdentifier</a>
and <a href="#R-VocabReference">R-VocabReference</a>.</p>
</section>
<!-- GS1 Digital -->
<section rel="bibo:Chapter" resource="#UC-GS1Digital" typeof="bibo:Chapter"
id="UC-GS1Digital">
<h3 id="h3_UC-GS1Digital" role="heading" aria-level="2">GS1 Digital</h3>
<p class="contributor">(Contributed by Mark Harrison (University of
Cambridge) & Eric Kauz (GS1) )<br />
URL: <a href="http://www.gs1.org/digital">http://www.gs1.org/digital</a></p>
<p>Retailers and Manufacturers / Brand Owners are beginning to
understand that there can be benefits to openly publishing structured
data about products and product offerings on the Web as Linked Open
Data. Some of the initial benefits may be enhanced search listing
results (e.g. Google Rich Snippets) that improve the likelihood of
consumers choosing such a product or product offer over an alternative
product that lacks the enhanced search results. However, the longer
term vision is that an ecosystem of new product-related services can
be enabled if such data is available. Many of these will be
consumer-facing and might be accessed via smartphones and other mobile
devices, to help consumers to find the products and product offers
that best match their search criteria and personal preferences or
needs — and to alert them if a particular product is incompatible with
their dietary preferences or other criteria such as ethical /
environmental impact considerations — and to suggest an alternative
product that may be a more suitable match. A more <a href="https://www.w3.org/2013/dwbp/wiki/Use_Cases#GS1:_GS1_Digital">complete
description</a> of this use case is available.</p>
<p><strong>Elements:</strong></p>
<ul>
<li><b><i>Domains:</i></b>
<ul>
<li>Product master data (e.g. technical specifications,
ingredients, nutritional information, dimensions, weight,
packaging).</li>
<li>Product offerings (e.g. sales price, availability (online,
locally), payment options, delivery/collection options).</li>
<li>Ethical / environmental claims about a product and its
production process.</li>
</ul>
</li>
<li><b><i>Obligation/motivation:</i></b>
<ul>
<li>initially, enhanced search result listings (e.g. Google Rich
Snippets);</li>
<li>vision is to enable an ecosystem of new digital apps around
product data;</li>
<li>the food sector in the EU is already obliged under new food
labelling legislation (EU 1169 / 2011, Article 14) to provide
the same amount of information about a food product that is sold
online to consumers as the information that would be available
to them from the product packaging if they picked up the product
in-store. Although the legislation does not suggest that Linked
Open Data technology should be used to make the same information
available in a machine-readable format, there is currently
significant investment and effort to upgrade Web sites to
provide accurate and detailed information about food products;
the GS1 Digital team consider that for a relatively small amount
of effort, these companies could gain some tangible benefits
(e.g. enhanced search results) from such compliance efforts by
using Linked Open Data technology within their Web pages.</li>
</ul>
</li>
<li><b><i>Usage:</i></b>
<ul>
<li>data providing transparency about product characteristics</li>
<li>data used to help consumers make informed choices about which
products to buy/consume</li>
</ul>
</li>
<li><b><i>Quality:</i></b> Very important to have trustworthy
authoritative data from respective organizations.</li>
<li><b><i>Size:</i></b> Typically 20+ factual claims per product -
probably 40+ RDF triples.</li>
<li><b><i>Type/format:</i></b> HTML + RDFa / JSON-LD / Microdata.</li>
<li><b><i>Rate of change:</i></b> mostly static data initially — but
subject to some variation over time</li>
<li><b><i>Data lifespan:</i></b> data should remain accessible until
products are no longer considered to be in circulation; this
represents a challenge for deprecated product lines. Data that is
stated authoritatively by one organization might be embedded /
referenced in the data asserted by another organization; this raises
concerns that embedded data becomes stale if it is
inadequately synchronized, and that referenced data is not dereferenced
(and therefore not discovered / gathered) by consumers of the data.
From a liability perspective, there also needs to be clarity about
which organization asserted which factual information — and also
information about which organization has the authority to assert
specific factual claims.</li>
<li><b><i>Potential audience:</i></b> machine-readable (search
engines, data aggregators, mobile apps etc.)</li>
</ul>
<p> <strong>Challenges:</strong> </p>
<ul>
<li>Linked Open Data about products is likely to be highly distributed
in nature and various parties have authority over specific claims.</li>
<li>Accreditation agencies have authority over ethical/environmental
claims.</li>
<li>Brand owners / manufacturers have authority over product master
data.</li>
<li>Retailers have authority over facts related to product offerings
(price, availability etc.).</li>
<li>An organization (e.g. retailer) might embed authoritative data
asserted by another organization (e.g. brand owner) and there is the
risk that such embedded information becomes stale if it is not
continuously synchronized.</li>
<li>An organization (e.g. retailer) might reference a graph of
authoritative data that can be retrieved via an HTTP request to a
remote HTTP URI. There is a risk that software or search engines
consuming Linked Open Data containing such references may fail to
dereference such HTTP URIs and in doing so may fail to gather all of
the relevant data.</li>
<li>Organizations are currently faced with a choice of whether to
embed machine-readable structured data in their Web pages using a
block approach (e.g. using JSON-LD) or using an inline approach
(e.g. using RDFa, RDFa Lite or Microdata). A block approach
(JSON-LD) may be simpler and less brittle than inline annotation,
especially as it can be easily decoupled from structural changes to
the body of the Web page that may happen over time in the redesign
of a Web site. At present, tool support for the 3 major markup
approaches for embedded Linked Open Data (RDFa, JSON-LD, Microdata)
is unequal across the three formats and some tools may not export or
import / ingest all 3 formats - some tools even fail to extract data
from JSON-LD markup created by their corresponding export tool.
There are some significant challenges to ensure that the structured
data embedded within a Web page is correctly linked to form coherent
RDF triples, without any dangling nodes that should be connected to
the subject or other nodes. (A sketch of the block approach is given
after this list.)</li>
<li>Only through the provision of best-in-class tool support that
recognizes all three major formats on a completely equal footing can
organizations have any confidence that they can use any of the 3
major markup formats, with the ability to verify / validate that their
own markup does result in the correct RDF triples.</li>
</ul>
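<p>As a sketch of the block approach, the snippet below builds a
schema.org product description and serializes it as the JSON-LD block
that would be embedded in a page. All product values are invented
placeholders, and the property set shown is a small subset of what a
real GS1 publication would carry.</p>
<pre><code>
import json

# Invented placeholder values using the schema.org vocabulary.
product = {
    "@context": "http://schema.org/",
    "@type": "Product",
    "name": "Example Wholegrain Cereal 500g",
    "gtin13": "5012345678900",  # placeholder GTIN-13
    "offers": {
        "@type": "Offer",
        "price": "2.49",
        "priceCurrency": "EUR",
        "availability": "http://schema.org/InStock",
    },
}

# The block would be embedded in the page as:
#   &lt;script type="application/ld+json"&gt; ... &lt;/script&gt;
print(json.dumps(product, indent=2))
</code></pre>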
<p> <strong>Potential Requirements:</strong> </p>
<ul>
<li>The ability to determine who asserted various facts — and whether
they are the organization that can assert those facts
authoritatively.</li>
<li>Where data from other sources is embedded, there is a risk that
the embedded data might be stale. It is therefore helpful to
indicate which graph of triples is a snapshot in time from data from
another source - and to provide a link to the original source, so
that the consumer of the data has the opportunity to obtain a fresh
version of the live data rather than relying on a potentially stale
snapshot graph of data. <abbr title="Data on the Web Best Practices">DWBP</abbr>
could provide guidance about how to indicate which graph of data is
a snapshot and where it came from.</li>
<li>Consumers of Linked Open Data about products might rely on it for
making decisions — not only about purchase but even consumption. If
the data about a product is inaccurate or out-of-date, we might need
to provide some guidance about how liability terms and disclaimers
can be expressed in Linked Open Data. We’re not suggesting that we
define such terms from a legal perspective, but perhaps there is an
existing framework for this, analogous to the existing
frameworks for expressing various licences of the data? If not,
perhaps such a framework needs to be developed - but outside of the
DWBP group? Licensing generally says what you’re allowed to do with
the data - but it does not appear to say anything about liability for
using the data or making decisions based on that data. This area
probably needs some clarification, particularly if there is a risk
of injury or death (due to inaccurate information about allergens in
a food product).</li>
</ul>
<p> <strong>Requires:</strong> <a href="#R-AccessUptodate">R-AccessUptodate</a>,
<a href="#R-Citable">R-Citable</a>, <a href="#R-FormatMultiple">R-FormatMultiple</a>,
<a href="#R-FormatStandardized">R-FormatStandardized</a>, <a href="#R-LicenseLiability">R-LicenseLiability</a>,
<a href="#R-PersistentIdentification">R-PersistentIdentification</a>
and <a href="#R-ProvAvailable">R-ProvAvailable</a> .</p>
</section>
<!-- ISOGEO Story -->
<section rel="bibo:Chapter" resource="#UC-ISOGeo" typeof="bibo:Chapter" id="UC-ISOGeo">
<h3 id="h3_UC-ISOGeo" role="heading" aria-level="2">ISO GEO Story</h3>
<p class="contributor">(Contributed by Ghislain Atemezing)</p>
<p> ISO GEO manages catalog records of geographic information in XML
that conform to ISO-19139, a French adaptation of ISO-19115 (<a href="ISO-19139-Isogeo-set.zip">data
sample</a>). They export thousands of records like this today, but
they need to manage them better. In their platform, they store the
information in a more conventional manner and use this standard to
export datasets compliant with the <a href="http://inspire.ec.europa.eu/">INSPIRE</a>
standards or via the <abbr title="Open Geospatial Consortium">OGC</abbr>'s
<abbr title="Catalog Service for the Web">CSW</abbr> protocol.
Sometimes, they have to enrich their metadata using tools like
GeoSource, accessed through an <abbr title="Spatial Data Infrastructure">SDI</abbr>
with their own metadata records. ISO GEO wants to be able to integrate
all the different implementations of ISO-19139 in different tools in a
single framework to better understand the thousands of metadata
records they use in their day-to-day business. Types of information
recorded in each file include: contact info (metadata) [data issued],
spatial representation, reference system info [code space], spatial
resolution, geographic extension of the data, file distribution, data
quality and process step (<a href="http://www.eurecom.fr/%7Eatemezin/datalift/isogeo/5cb5cbeb-fiche1.xml">example</a>).</p>
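<p>To give a flavour of these records, the sketch below extracts the
title and reference-system code from an ISO-19139 file using Python's
standard XML library. The file name matches the example record linked
above; the element paths are the usual ISO-19139 ones and are shown
for illustration only.</p>
<pre><code>
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

tree = ET.parse("5cb5cbeb-fiche1.xml")  # an ISO-19139 metadata record

# Dataset title: identificationInfo > CI_Citation > title
title = tree.find(
    ".//gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString", NS)
# Reference system code (e.g. an EPSG identifier)
code = tree.find(
    ".//gmd:referenceSystemInfo//gmd:code/gco:CharacterString", NS)

print(title.text if title is not None else "no title found")
print(code.text if code is not None else "no reference system code found")
</code></pre>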
<p> <strong>Challenges:</strong> </p>
<ul>
<li>Achieve interoperability between supporting applications, e.g.
validation and discovery services built over a metadata repository.</li>
<li>Capture the semantics of the current metadata records with respect
to ISO-19139.</li>
<li>A unified way to have access to each record within the catalog at
different levels: local, regional, national or <abbr title="European Union">EU</abbr>
level.</li>
</ul>
<p> <strong>Requires:</strong> <a href="#R-AccessUpToDate">R-AccessUpToDate</a>,
<a href="#R-DataEnrichment">R-DataEnrichment</a>, <a href="#R-FormatLocalize">R-FormatLocalize</a>,
<a href="#R-FormatMachineRead">R-FormatMachineRead</a>, <a href="#R-GranularityLevels">R-GranularityLevels</a>,
<a href="#R-LicenseAvailable">R-LicenseAvailable</a>, <a href="#R-MetadataMachineRead">R-MetadataMachineRead</a>,
<a href="#R-MetadataStandardized">R-MetadataStandardized</a>, <a href="#R-PersisentIdentification">R-PersisentIdentification</a>,
<a href="#R-ProvAvailable">R-ProvAvailable</a> and <a href="#R-VocabReference">R-VocabReference</a>.
</p>
</section>
<!-- The Land Portal -->
<section typeof="bibo:Chapter" resource="#h3_UC-LandPortal" rel="bibo:Chapter"
id="UC-LandPortal">
<h3 id="h3_UC-LandPortal" role="heading" aria-level="2">The Land Portal</h3>
<p class="contributor">(Contributed by Carlos Iglesias)<br />
URL: <a href="http://landportal.info/">http://landportal.info/</a></p>
<p>The IFAD Land Portal platform has been completely rebuilt as an Open
Data collaborative platform for the Land Governance community. Among
the new features, the Land Portal provides access to more than 100
indicators from more than 25 different sources on land governance
issues for more than 200 countries around the world, as well as a
repository of land-related content and documentation. Thanks to the
new platform, people can:</p>
<ol>
<li>curate and incorporate new data and metadata by means of different
data importers and making use of the underlying common data model;</li>
<li>search, explore and compare the data through countries and
indicators; and</li>
<li>consume and reuse the data by different means (i.e. raw data
download at the data catalog; linked data and SPARQL endpoint at the RDF
triplestore; RESTful API; and built-in graphic visualization
framework); a sketch of API-based consumption follows this list.</li>
</ol>
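<p>As an example of the third consumption route, the sketch below
retrieves indicator data over a REST interface with the Python
requests library. The endpoint path and parameter names are
hypothetical; the actual Land Portal API defines its own.</p>
<pre><code>
import requests

# Hypothetical endpoint and parameters, for illustration only.
BASE = "http://landportal.info/api"

resp = requests.get(
    BASE + "/indicators",
    params={"country": "BR", "indicator": "land-tenure", "year": 2014},
    timeout=30,
)
resp.raise_for_status()

for row in resp.json():
    print(row)
</code></pre>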
<p><strong>Elements:</strong></p>
<ul>
<li><b><i>Domains:</i></b> Land Governance; Development</li>
<li><b><i>Obligation/motivation:</i></b> To find reliable data-driven
indicators on land governance and put them all together to
facilitate access, study, analysis, comparison and the detection of
data gaps.</li>
<li><b><i>Usage:</i></b> Research; Policy Making, Journalism;
Development; Investments; Governance; Food security; Poverty; Gender
issues.</li>
<li><b><i>Quality:</i></b> Every sort of data, from high quality to
unverified.</li>
<li><b><i>Size:</i></b> Varies, but low-medium in general.</li>
<li><b><i>Type/format:</i></b> Varies: APIs; JSON; spreadsheets; CSV;
HTML; XML; PDF...</li>
<li><b><i>Rate of change:</i></b> Usually yearly, but also higher
rates (monthly, quarterly...).</li>
<li><b><i>Data lifespan:</i></b> Unlimited.</li>
<li><b><i>Potential audience:</i></b> Practitioners; Policy makers;
Activists; Researchers; Journalists.</li>
</ul>
<p><strong>Challenges:</strong></p>
<ul>
<li>Data coverage.</li>
<li>Quality of data and metadata.</li>
<li>Lack of machine-readable metadata.</li>
<li>Inconsistency between different data sources.</li>
<li>Wide variety of formats and technologies.</li>