
RFC 42: Data transfer between Wagtail installations #42

Merged

Conversation

gasman (Contributor) commented Oct 1, 2019

gasman force-pushed the rfc/data-transfer-between-installations branch from 1a3df8a to 81060c3 on October 1, 2019 at 15:49
gasman changed the title from "RFC xx: Data transfer between Wagtail installations" to "RFC 42: Data transfer between Wagtail installations" on Oct 1, 2019
The list of instances offered as import source sites will be configurable through the Django settings file. In cases where sites are locked down by basic authentication, IP range or similar, it is the implementor's responsibility to ensure that each instance is able to make HTTP / HTTPS requests to its 'sibling' sites.

WAGTAIL_IMPORT_SOURCE_INSTANCES = [
{
kaedroho (Contributor) commented Oct 1, 2019:
Any reason to not require an identifier for each source? (similar to how we identify database and search backends)

WAGTAIL_IMPORT_SOURCE_INSTANCES = {
    'staging': {
        'label': "Staging",   # Could default to the identifier.title()
    }
}
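
If the identifier-keyed form were adopted, lookup code could resolve a source by its key; a rough sketch, with all names assumed rather than taken from the RFC:

from django.conf import settings

def get_source_config(identifier):
    # Resolve a configured import source by its identifier.
    config = settings.WAGTAIL_IMPORT_SOURCE_INSTANCES[identifier]
    # Fall back to a titled identifier when no label is configured,
    # as suggested in the comment above.
    label = config.get('label', identifier.title())
    return {'identifier': identifier, 'label': label, **config}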


WAGTAIL_IMPORT_SOURCE_INSTANCES = [
A reviewer (Contributor) commented:
We tend to omit the underscore between WAGTAIL and the module name; for example, WAGTAILSEARCH_BACKENDS.

Also maybe _SOURCE_INSTANCES could be replaced with _SOURCES?


To avoid the need to modify Wagtail's core data model, the import-export app will define a new model / database table to store the mappings from UIDs to local installation-specific IDs:

class IDMapping(models.Model):
A reviewer (Contributor) commented:
Should we add a unique_together constraint on content_type and local_id?
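
For illustration, a minimal sketch of the IDMapping model with that constraint applied; the field types are assumptions, since the RFC's quoted definition is truncated above:

import uuid

from django.contrib.contenttypes.models import ContentType
from django.db import models

class IDMapping(models.Model):
    # Stable identifier shared across installations.
    uid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    # The model and primary key of the object on this installation.
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    local_id = models.CharField(max_length=255)

    class Meta:
        # The constraint suggested above: one mapping per local object.
        unique_together = [('content_type', 'local_id')]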

cyface commented Oct 1, 2019

I think in the spirit of the use case that I have for this, the pull approach should be fine, but the push approach is very much preferred. :)

In cases where the reason for doing data transfer is security related, it may not be possible to log in directly to 'publish' instances as opposed to 'author' instances. UX-wise, the push approach also saves the author from having to log in to multiple instances and figure out which instance they are on.

Perhaps a menu item in the author instance that jumps you over to publish and initiates the import for that page?

kaedroho added the 1:New label Oct 23, 2019
chosak (Member) commented Oct 30, 2019

This would be a great feature to have and should definitely be discussed as an RFC here.

In cases where the reason for doing data transfer is security related, it may not be possible to log in directly to 'publish' instances as opposed to 'author' instances.

+1 @cyface. A reasonable and common pattern is to have the production environment in a network purposefully isolated from staging and development environments. You might not want to expose those environments to the production one because that opens the possibility of external requests making it down to an internal network.

In those cases it would be preferable to be able to do some kind of file-based export, where users on the "source" Wagtail can generate a transfer artifact which can be imported into the "destination" Wagtail. This would be akin to a super dumpdata which dumped only the desired models, along with a super loaddata which figured out how to smartly ingest them.

This file-based export could potentially even live alongside a direct connection-based import/export, sharing the same logic underneath. This might depend a bit on how the logic works: if an import is "make a bunch of API calls, then manipulate and ingest the data", that seems like it'd work just as well with an interim file-based step. If instead the import logic were iterative (meaning, a query is made to the API, then other queries are made depending on how the destination is set up), this wouldn't fit so well with the proposed approach.
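
For illustration, the "super dumpdata / super loaddata" analogy expressed with Django's real management commands; blog.BlogPage is a placeholder model label, and a real transfer tool would also need to resolve UIDs and walk references:

from django.core.management import call_command

# On the source installation: serialize just the desired objects to a file.
with open('transfer.json', 'w') as f:
    call_command('dumpdata', 'blog.BlogPage', indent=2, stdout=f)

# On the destination installation: ingest the artifact.
call_command('loaddata', 'transfer.json')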

gasman (Contributor, Author) commented Oct 30, 2019

@chosak I think it is an iterative process, and one that can potentially vary depending on what objects exist at the destination. For any item that has been explicitly chosen to be imported, there's a tree of related items to be imported along with it, potentially of any depth (for example, a page may reference a snippet which references an image).

For every such relation in that tree, there's a business rule (i.e. one of the bullet points given under https://github.com/gasman/weps/blob/81060c34c190482e262be6b7595ac4d213276704/text/042-data-transfer-between-installations.md#configuration) which determines whether the related object will be imported or skipped. Some of these business rules have a 'get-or-create' behaviour, which means that they may or may not end up importing the related object (and possibly pulling in even more related objects associated with it) depending on whether it exists already. This means that the dataset exported from the source site has to be larger in the 'create' scenario than the 'get' scenario.

As a concrete example: suppose we're importing a blog section, and the blog posts have categories, identified by a slug. These categories are snippet/modeladmin models, and should be created at the destination whenever we encounter one that doesn't exist. Now, suppose that a category object isn't just a plain slug + label, but includes some 'rich' fields, such as an image or a rich-text description. The import logic has to consider the object references in these fields if, and only if, the category doesn't exist already at the destination - which is something we can't know if we're preparing the export dataset up-front.

This isn't insurmountable - the exporter logic could take a "pessimistic" view and always output the extra records whenever there's a chance of needing them - but it's added complexity that can be left until later.
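
A simplified sketch of the iterative resolution described above; every function name here is illustrative rather than any real API:

def import_object(source, ref, imported=None):
    """Resolve one object reference, recursing into its relations.

    'source' wraps the remote API; 'ref' identifies an object by UID.
    All helpers (find_by_uid, should_import, create_local) are hypothetical.
    """
    imported = imported if imported is not None else set()
    if ref in imported:
        return
    imported.add(ref)

    if find_by_uid(ref) is not None:
        # 'get' branch: the object already exists locally, so its own
        # related objects are never fetched from the source.
        return

    data = source.fetch(ref)  # extra round trip, only on the 'create' branch
    for related_ref in data.related_refs:
        if should_import(related_ref):  # per-relation business rule
            import_object(source, related_ref, imported)
    create_local(data)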

kaedroho added the 3:Review label and removed the 1:New label Oct 30, 2019
kaedroho (Contributor) commented:
Discussed in Wagtail core meeting. Everyone was in favour of the approach as it stands, since it could be extended to support a push workflow and file-based transfer later on.

We also discussed implementing a mechanism in Wagtail core to discover all objects that a page references. This would scrape foreign keys, chooser blocks and rich text features to find all linked pages, images, documents and snippets. Everyone was in favour of this change too.
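
A minimal sketch of that discovery mechanism, covering only the foreign-key case; a real implementation would also walk chooser blocks in StreamFields and references in rich text:

from django.db import models

def get_object_references(instance):
    # Collect model instances referenced via forward foreign keys.
    refs = []
    for field in instance._meta.get_fields():
        if isinstance(field, models.ForeignKey):
            target = getattr(instance, field.name)
            if target is not None:
                refs.append(target)
    return refs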

zerolab (Contributor) commented Jan 24, 2020

kaedroho merged commit 6bdb994 into wagtail:master Aug 4, 2020
kaedroho (Contributor) commented Aug 4, 2020

Merged, since this is implemented now and the RFC generally reflects what made it into Wagtail Transfer.

SidharajYadav commented:
I want to start issues
