diff --git a/rfc-116-store-attachment-data-in-content-items.md b/rfc-116-store-attachment-data-in-content-items.md index 5ec78bbb..db017d04 100644 --- a/rfc-116-store-attachment-data-in-content-items.md +++ b/rfc-116-store-attachment-data-in-content-items.md @@ -66,37 +66,61 @@ shouldn't conflate the two. I propose we do this: -1. Split `asset_link` into two new types: `attachment_asset` and +1. Split `asset_link` into four new types: `base_attachment_asset`, + `attachment_asset`, `specialist_publisher_attachment_asset` and `image_asset`, updating the schemas which use `asset_link` - accordingly. - - **Migration concerns:** This can be done without breaking - compatibility, as it's an internal change to the schemas. + accordingly, with these fields: + + - `base_attachment_asset`: + - `url` (required) + - `content_type` (required) + - `title` + - `attachment_asset` (extends `base_attachment_asset`): + - no extra fields at this time + - `specialist_publisher_attachment_asset` (extends `base_attachment_asset`): + - `content_id` (required) + - `created_at` + - `updated_at` + - `image_asset`: + - `url` (required) + - `content_type` (required) + + **Migration concerns:** This would need to be done in three steps: + 1. Update manuals-publisher to not set the `created_at` and `updated_at` for attachments. + 2. Republish all manual sections. + 3. Update the schemas. 1. Add new optional fields for extra metadata: - - Attachments: + - `base_attachment_asset`: - `accessible` - `alternative_format_contact_email` - - `attachment_type` - - `command_paper_number` - - `file_size` - `filename` - - `hoc_paper_number` - - `id` - - `isbn` + - `file_size` - `locale` - `number_of_pages` - - `parliamentary_session` - `preview_url` - - `unique_reference` - - `unnumbered_command_paper` - - `unnumbered_hoc_paper` - - Images: + - `attachment_asset` + - `id` + - `image_asset` - `alt_text` - `caption` - `credit` - - `id` + + **Migration concerns:** This can be done without breaking + compatibility, as all new fields are optional. + +1. Add new attachment type for publication attachments: + + `publication_attachment_asset` (extends `attachment_asset`) with fields: + - `attachment_type` (required) + - `command_paper_number` + - `hoc_paper_number` + - `isbn` + - `parliamentary_session` + - `unique_reference` + - `unnumbered_command_paper` + - `unnumbered_hoc_paper` I've left out the following fields because they can be deduced from others: `csv?`, `external?`, `external_url`, `file_extension`, `html?`, @@ -113,45 +137,44 @@ I propose we do this: `ordering`, `slug`. **Migration concerns:** This can be done without breaking - compatibility, as it doesn't add any new mandatory fields. - -1. Replace `content_id` in attachments with `id`. - - **Migration concerns:** this would need to be done in five steps: - add `id`; update app to read from both fields but write to `id`; - republish docs; remove `content_id`; update apps to not use - `content_id`. + compatibility, as the new type will be unused for now. -1. Remove non-useful metadata fields: +1. Make `id` in `attachment_asset` mandatory. - - Attachments: `created_at` and `updated_at` - - **Migration concerns:** this would need to be done in two steps: - update apps; remove fields. + **Migration concerns:** this would need to be done in three steps: + 1. Update travel advice publisher and manuals publisher to set an arbitrary ID for attachments. + 2. Republish all travel advice pages and manual sections. + 3. Make the field mandatory. 1. Add a *mandatory* `details.attachments` field to all formats which can have attachments. - **Migration concerns:** this would need to be done in four steps: add - *optional* field; update apps to write to it; republish docs; make - field mandatory. + **Migration concerns:** this would need to be done in four steps: + 1. Add the field as optional. + 2. Update publishing apps to write to it. + 3. Republish all affected documents. + 4. Make the field mandatory. + +1. Add a *mandatory* `details.document_attachments` list to + publication formats, which is an ordered list of govspeak or HTML + strings (using the `#/definitions/multiple_content_types` schema) + rendering attachments from `details.attachments`. -1. Change the `details.documents` list to store govspeak hashes. + **Migration concerns:** this would need to be done in four steps: + 1. Add the field as optional. + 2. Update publishing apps (just Whitehall and Content Publisher) to write to it. + 3. Republish all affected documents. + 4. Make the field mandatory. - **Migration concerns:** this would need to be done in five steps: - change schema to allow both types; update frontend apps; update - publishing apps; republish docs; change schema to only allow new - type. +1. Remove the `details.documents` field. -1. Document the relationship between the `attachment_asset` type in - our content schemas and the data Govspeak attachments use. For - example, all fields required in one must be required in the other. - Any pairs of fields which do the same job in both should have the - same name. Implement any changes arising from this, for example, - more Whitehall-like attachment rendering in Govspeak. + **Migration concerns:** this would need to be done in four steps: + 1. Update frontend apps to render from `details.document_attachments`. + 2. Update publishing apps to not set `details.documents`. + 3. Republish all affected documents. + 4. Remove the field. -The final `attachment_asset_list` / `attachment_asset` / `image_asset` -schema will be: +The final schemas will be: ``` { @@ -159,50 +182,98 @@ schema will be: type: "object", additionalProperties: false, required: [ + "content_type", "url", ], properties: { - id: { type: "string", }, alt_text: { type: "string", }, caption: { type: "string", }, + content_type: { type: "string", }, credit: { type: "string", }, url: { type: "string", format: "uri", }, }, }, - attachment_asset: { + + base_attachment_asset: { type: "object", additionalProperties: false, required: [ - "url", "content_type", + "url", ], properties: { accessible: { type: "boolean", }, alternative_format_contact_email: { type: "string", }, - attachment_type: { type: "string", enum: ["external", "file", "html"], }, - command_paper_number: { type: "string", }, content_type: { type: "string", }, file_size: { type: "integer", }, filename: { type: "string", }, - hoc_paper_number: { type: "string", }, - id: { type: "string", }, - isbn: { type: "string", }, locale: { "$ref": "#/definitions/locale", }, number_of_pages: { type: "integer", }, - parliamentary_session: { type: "string", }, preview_url: { type: "string", format: "uri", }, title: { type: "string", }, - unique_reference: { type: "string", }, - unnumbered_command_paper: { type: "boolean", }, - unnumbered_hoc_paper: { type: "boolean", }, url: { type: "string", format: "uri", }, }, }, - attachment_asset_list: { - description: "An ordered list of attachment links", - type: "array", - items: { - "$ref": "#/definitions/attachment_asset", + + attachment_asset: { + type: "object", + additionalProperties: false, + required: [ + "content_type", + "id", + "url", + ], + properties: { + allOf: [ + { "$ref": "#/definitions/base_attachment_asset/properties" }, + { id: { type: "string" } }, + ], + }, + }, + + specialist_publisher_attachment_asset: { + type: "object", + additionalProperties: false, + required: [ + "content_id", + "content_type", + "url", + ], + properties: { + allOf: [ + { "$ref": "#/definitions/base_attachment_asset/properties" }, + { + content_id: { "$ref": "#/definitions/guid", }, + created_at: { format: "date-time", }, + updated_at: { format: "date-time", }, + }, + ], + }, + }, + + publication_attachment_asset: { + type: "object", + additionalProperties: false, + required: [ + "attachment_type", + "content_type", + "id", + "url", + ], + properties: { + allOf: [ + { "$ref": "#/definitions/attachment_asset/properties" }, + { + attachment_type: { type: "string", enum: ["external", "file", "html"], }, + command_paper_number: { type: "string", }, + hoc_paper_number: { type: "string", }, + isbn: { type: "string", }, + parliamentary_session: { type: "string", }, + unique_reference: { type: "string", }, + unnumbered_command_paper: { type: "boolean", }, + unnumbered_hoc_paper: { type: "boolean", }, + }, + ], }, }, } @@ -212,17 +283,16 @@ schema will be: **Why two lists?** -There are really two different types of attachments in documents. -There are attachments which may be referenced in Govspeak but don't -otherwise appear on the document; in this case the attachment hashes -are more like metadata than actual document content. There are also -attachments which appear differently to the normal document text, like -in publications. +As mentioned earlier, there are really two different types of +attachments in documents. There are attachments which appear in the +body, and attachments which appear separately to the normal document +text, like in publications. The `details.attachments` list would contain both types of attachments, so there is one list with everything in. The -`details.documents` list would fulfil its current purpose: providing -the ordered list of attachments for publication-like document types. +`details.document_attachments` list fulfils its current purpose: +providing the ordered list of document-level attachments for +publication-like document types. **Why split `asset_link`?** @@ -231,6 +301,13 @@ attachments. However images have strictly less permissable metadata than attachments, it doesn't make sense to give a page image an ISBN for example. These should be separate types. +**Why have multiple attachment types?** + +Different publishing apps have different allowable metadata, so it +makes sense to have different types constraining what a publishing app +is able to provide. However, we still want some unifying structure +behind the types, so we use inheritance. + **Why make the lists mandatory?** Is there a difference between a missing list and a present, but empty,