A metadata standard for regulatory documents #81

DidacFB-CDDO · 2024-05-03T08:01:52Z

Note that this is a refreshed February 2024 version of the original challenge request published in July 2021, which has now been closed.

Challenge Owner

Kevin Xu
Technical Architect | Smarter Regulation Executive | Department of Business and Trade (DBT)
Amilta Stephen Boyd
Policy Advisor | Smarter Regulation Executive | Department of Business and Trade (DBT)
Catherine Tabone
Data Scientist | The National Archives

Short Description

Cross-government data standards work is focused on data discoverability: https://cddo.blog.gov.uk/2023/04/12/how-we-are-improving-data-discoverability/
Key to this is the development of ‘a consistent metadata implementation approach’, with a view to making datasets more discoverable in government, and to support the development of a Government Data Marketplace. That data can be of many types, including statistics, operational data, public sector publications, and others.
An important set of data which does not currently follow a consistent metadata implementation approach is regulations. Regulations are documents published principally by regulators, of which there are over 80, either on their own websites or on GOV.UK. Those documents may be legally enforceable regulations, or guidance documents about the regulations.
Legislation (primary and secondary) is already published with consistent metadata by the National Archives, on legislation.gov.uk. However, because of the independence and number of regulators, there is a lack of metadata consistency in the publishing of regulation. This represents a large gap in how the public sector publishes content.
The hypothesis of this challenge is that there will be value in regulators using common metadata fields to publish their regulation documents.
Our secondary hypothesis is that the best approach to agreeing a standard will be to implement existing standards and approaches in the regulatory context, rather than create an entirely new standard.
We have reviewed common data standards used to represent legislative and official documents published online, including the Dublin Core Metadata vocabulary, the Data Catalog Vocabulary (DCAT), the National Archives Crown Legislation Mark-up Language (CLML), OASIS LegalDocML (Akoma Ntoso) and the European Legislation Identifier (ELI).
In preparation for this challenge, the Better Regulation Executive Digital Team has developed version 0.2 of a metadata standard, for consultation with regulators and other interested users.
As part of this standard, we also want to allow regulators to provide the content of their regulation documents in marked up format. This will not be an obligatory part of the standard required for compliance.
Our working name for the standard is ‘Open Regulation Document Standard’ or ORDS.

Short technical summary

Metadata
ORDS was developed by reviewing the following metadata standards:
Dublin Core - international metadata standard for describing digital or physical resources.
Functional Requirements for Bibliographic Records (FRBR) - entity-relationship model developed by librarians.
Data Catalog Vocabulary (DCAT) - metadata standard recommended by CDDO for data interchange within government.
European Legislation Identifier (ELI) - EU standard for identifiers and metadata for European legislation publishers to describe legal documents online.

We have chosen to base ORDS on selected values from the Dublin Core metadata standard, with a few additional regulation-specific properties. Dublin Core is an established and widely used standard. It aligns with best practice for UK government data publishing and provides properties to express most of the relevant values for regulatory documents. We expect ORDS to act as a specification for how Dublin Core properties should be applied to regulator data in a consistent and logical way.
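To make the idea of "Dublin Core plus a few regulation-specific properties" concrete, here is a minimal sketch of what a single document record might look like. The `dcterms:` keys are real Dublin Core terms, but the values, the `regulator.example` URL and the `ords:legalStatus` property are assumptions made up for illustration; they are not taken from the ORDS draft.

```python
import json

# Illustrative record only. The dcterms:* keys are real Dublin Core terms;
# every value below is invented for the example.
record = {
    "dcterms:identifier": "https://regulator.example/handbook/rule-1",  # hypothetical URL
    "dcterms:title": "Example handbook rule",
    "dcterms:publisher": "Example Regulator",
    "dcterms:issued": "2024-02-01",
    "dcterms:modified": "2024-03-15",
    "dcterms:type": "Text",
    "dcterms:format": "text/html",
    "dcterms:language": "en",
    "dcterms:subject": ["consumer credit"],
    # Hypothetical regulation-specific extension, NOT an actual ORDS field.
    "ords:legalStatus": "guidance",
}


def to_json(rec):
    """Serialise a record to stable, human-readable JSON."""
    return json.dumps(rec, indent=2, sort_keys=True)


print(to_json(record))
```

Keeping the regulation-specific additions in their own prefix, as sketched here, is one way to keep them visibly separate from the reused Dublin Core terms.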
SRD Data Standards framework
The larger aim of SRD Digital is to improve the machine readability of regulatory content. We have defined a framework (or vision) for how this would be done. The different levels represent our recognition that regulators (and therefore regulatory content) are at different stages in their digital transformation journey and have different levels of resource to commit to content publishing. We want to encourage best practice and help regulators take steps in this journey, no matter where they might be.

Consistency: ORDS metadata standard; open document file format
Meaning: XML / HTML structure; XML / HTML semantic mark-up; Rules as Code (when appropriate)

With most regulators, our goal is to achieve consistency of metadata, i.e. the adoption of ORDS and the use of open document file formats.

User Needs

Strategic alignment: There is a cross-government drive to improve publishing processes, enable data discoverability and implement a common approach to recording metadata. ORDS supports these initiatives in the regulatory publishing space, helping 80+ regulators make their data interoperable with one another.
Data providers: Regulators need to publish their regulatory documents in a way that makes them easy to find and access. They need to be able to manage their documents easily, see how their documents relate to others and ensure they are kept up to date. Providing these maintenance and analytical options requires consistent metadata.
Intermediary data consumers: Better structured data and improved discoverability enables the creation of services and software dealing with regulatory information (RegTech). This includes:
Software developers
Data Scrapers
Insight generators
Legal and regulatory advisors
Regulatory consumers: Better services and software make it easier to identify and comply with regulation. This supports:
Individuals
Businesses

Functional Needs

Provide alignment with broader cross-gov data discoverability.
Provide consistency or alignment with pre-existing relevant data standards: Dublin Core, DCAT, Akoma Ntoso, Crown XML.
Account for regulation-specific concepts through creating additional fields as appropriate.
Strike the right balance between mandatory fields, which support consistency, and optional fields, which allow context-specific information to be captured.
Support for content mark up.
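
The mandatory / optional balance described above could be checked mechanically. The following is a minimal sketch, assuming a hypothetical split of fields; the source does not say which fields ORDS treats as mandatory, so `MANDATORY` and `OPTIONAL` below are placeholders to be replaced by whatever the working group agrees.

```python
# Hypothetical split: the challenge text does not specify which fields
# are mandatory in ORDS, so these sets are placeholders.
MANDATORY = {
    "dcterms:identifier",
    "dcterms:title",
    "dcterms:publisher",
    "dcterms:issued",
}
OPTIONAL = {
    "dcterms:modified",
    "dcterms:subject",
    "dcterms:language",
    "dcterms:format",
}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    present = set(record)
    problems = []
    # Mandatory fields must all be present.
    for field in sorted(MANDATORY - present):
        problems.append(f"missing mandatory field: {field}")
    # Anything outside the two agreed sets is flagged rather than silently kept.
    for field in sorted(present - MANDATORY - OPTIONAL):
        problems.append(f"unknown field: {field}")
    return problems


incomplete = {"dcterms:title": "Example rule", "dcterms:language": "en"}
for problem in validate(incomplete):
    print(problem)
```

Whether unknown fields should be rejected or merely tolerated is a design choice; a stricter check favours consistency, a looser one favours context-specific information.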

Process

Our first step was to engage with regulators and RegTech companies to validate the user need for a data standard for regulation documents.
Our second step was to consider pre-existing standards to determine whether they could be directly used in the regulatory context. We looked at Dublin Core, DCAT, Akoma Ntoso, and Crown XML. These standards are highly aligned with each other and included a large majority of the fields which would seem appropriate in the regulatory context. However, none were suitable to use exactly in their current form.
Our third step was to create a version 0.1 of ORDS, applying pre-existing standards to the regulatory context. For the metadata fields, this involved cross-referencing between DCAT and Dublin Core to come up with a consolidated set of fields (some mandatory, some optional), with a small number of additional fields specific to the regulation context. We also developed an initial proposal for content mark up.
Our fourth step was to form a working group comprising representatives from DBT, the regulators, the National Archives and the Data Standards Authority to review and provide feedback on ORDS v0.2. We have now presented the standard to the Data Standards Authority Steering Board and Peer Review Group, which recognise the value of ORDS.
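
The cross-referencing between DCAT and Dublin Core described in the third step can be pictured as simple set arithmetic. This is a sketch under stated assumptions: the property names are real terms from the two vocabularies (DCAT itself reuses many `dcterms:` properties), but the particular selection is illustrative and is not the team's actual working list.

```python
# Illustrative selections of real property names; NOT the actual lists
# used to produce ORDS v0.1.
DUBLIN_CORE = {
    "dcterms:title", "dcterms:description", "dcterms:identifier",
    "dcterms:publisher", "dcterms:issued", "dcterms:modified",
    "dcterms:language", "dcterms:subject", "dcterms:type", "dcterms:format",
}
DCAT = {
    "dcterms:title", "dcterms:description", "dcterms:identifier",
    "dcterms:publisher", "dcterms:issued", "dcterms:modified",
    "dcterms:language", "dcat:keyword", "dcat:theme",
    "dcat:contactPoint", "dcat:landingPage",
}


def cross_reference(a, b):
    """Split two field sets into shared fields and fields unique to each."""
    return {
        "shared": a & b,
        "only_first": a - b,
        "only_second": b - a,
        "consolidated": a | b,
    }


result = cross_reference(DUBLIN_CORE, DCAT)
for name in ("shared", "only_first", "only_second"):
    print(name, sorted(result[name]))
```

Fields in the shared set are natural candidates for the consolidated core, while fields unique to one vocabulary need a case-by-case decision.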

Questions

Does our current approach seem sensible?
While our priority is to get the metadata right, we also want to account for more detailed (e.g. paragraph-level) mark-up. What would be the best way to do this?
What elements should be considered mandatory / optional?
How can we encourage greater uptake of the standard?

MattiSG · 2024-05-04T10:10:00Z

Thanks for sharing this in the open.

As explained three years ago in #79 (comment), I believe the ability to unambiguously reference regulatory documents is critical for interconnection, even before discoverability through metadata. This challenge description covers metadata but not referencing, even though ELI is mentioned among the reviewed standards. Establishing a URI scheme, preferably based on existing standards, is a priority for modeling from the perspective of tools such as @openfisca.

Regarding both referencing and “content markup”, you may be interested in @verbman’s Parliamentary Love Letter format, which shows strong promise of simplicity, efficiency and reusability.
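
As a rough illustration of what an unambiguous reference scheme could involve, here is a minimal sketch of building and parsing a document URI. The base URL and the `regulator/doc_type/year/number` template are hypothetical; they are loosely inspired by the idea of template-based identifiers, and are neither ELI's actual template nor anything proposed by ORDS.

```python
import re

# Hypothetical base URL and path template, for illustration only.
BASE = "https://regulation.example/id"
_SLUG = re.compile(r"[a-z0-9-]+")
_URI = re.compile(
    re.escape(BASE)
    + r"/(?P<regulator>[a-z0-9-]+)/(?P<doc_type>[a-z0-9-]+)"
    + r"/(?P<year>\d{4})/(?P<number>\d+)"
)


def build_uri(regulator, doc_type, year, number):
    """Build a stable URI from the parts that identify a document."""
    for part in (regulator, doc_type):
        if not _SLUG.fullmatch(part):
            raise ValueError(f"not a valid slug: {part!r}")
    return f"{BASE}/{regulator}/{doc_type}/{year:04d}/{number}"


def parse_uri(uri):
    """Recover the identifying parts; raise ValueError if the URI does not fit."""
    match = _URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a recognised document URI: {uri!r}")
    parts = match.groupdict()
    return parts["regulator"], parts["doc_type"], int(parts["year"]), int(parts["number"])


uri = build_uri("example-regulator", "guidance", 2024, 17)
print(uri)
print(parse_uri(uri))
```

The property worth preserving in any real scheme is the round trip: every valid URI parses back to exactly the parts that produced it.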
