Using migrate plus to import xml as nodes #819

Closed
Natkeeran opened this issue Mar 21, 2018 · 27 comments

Comments

@Natkeeran
Contributor

Natkeeran commented Mar 21, 2018

On the March 21, 2018 CLAW Call, some people wondered whether it would be possible to import MODS XML into Drupal. This ticket is to explore that further.

Importing XML (e.g., MODS) into Drupal 8 is a reasonably straightforward process. Please see this module for an example: islandora_migrate_mods. The XML parser supports XPath selectors. Example: https://github.com/Natkeeran/islandora_migrate_mods/blob/master/config/install/migrate_plus.migration.import_mods.yml#L16.

In that example, the MODS records are combined into one data file. However, multiple XML files can be imported as well, without combining them.
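To illustrate the idea of mapping XPath selectors to flat field names, here is a minimal Python sketch. It is not the migrate_plus XML parser and not the module's configuration; the sample record, the `FIELDS` mapping, and the `extract_fields` helper are all hypothetical and only show the selector-to-field pattern.

```python
import xml.etree.ElementTree as ET

# MODS v3 namespace, bound to the "mods" prefix used in the selectors below.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record (not taken from the module's data file).
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Mt. Baker ice school</title></titleInfo>
  <originInfo><dateIssued>1951-07-15</dateIssued></originInfo>
</mods>"""

# Hypothetical field name -> XPath selector mapping.
FIELDS = {
    "title": "mods:titleInfo/mods:title",
    "date_issued": "mods:originInfo/mods:dateIssued",
}


def extract_fields(xml_text, fields=FIELDS):
    """Return one flat row: the first match of each selector, or None."""
    root = ET.fromstring(xml_text)
    row = {}
    for name, selector in fields.items():
        node = root.find(selector, MODS_NS)
        row[name] = node.text.strip() if node is not None and node.text else None
    return row
```

Calling `extract_fields(SAMPLE)` yields one row keyed by field name, which is roughly the shape a migration source hands to its process pipeline.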

Drupal Migrate has a UI and is quite flexible. We can also import files, inline entity nodes, etc.

Once in Drupal, we can create tooling to generate MODS using Twig templates.

@dannylamb
Contributor

Nothing better than going to make a ticket only to find @Natkeeran has already done so, and with more info/detail than I could. 🤗

@dannylamb
Contributor

dannylamb commented Mar 22, 2018

Ok, so looking at sandbox's MODS form for basic image, the only thing that allows multiple nested entries is name. Everything else could just be flattened. Now obviously that doesn't capture the full scope of MODS, but being able to import a record from the 7.x sandbox is a pretty compelling proof of concept. For the price of having to make a single content type with a handful of fields, it's worth exploring.
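A rough Python sketch of that shape, assuming a record where everything is single-valued except `name`: the title is flattened to one string, while names stay a list of small nested entries. The sample XML, the `flatten` helper, and the output keys are hypothetical, not the sandbox form's actual fields.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record with two repeated, nested <name> entries.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample image</title></titleInfo>
  <name><namePart>Doe, Jane</namePart><role><roleTerm>photographer</roleTerm></role></name>
  <name><namePart>Roe, Richard</namePart><role><roleTerm>donor</roleTerm></role></name>
</mods>"""


def flatten(xml_text):
    """Flatten single-valued elements; keep names as a list of dicts."""
    root = ET.fromstring(xml_text)
    names = []
    for name in root.findall("mods:name", NS):
        names.append({
            "name": name.findtext("mods:namePart", default="", namespaces=NS),
            "role": name.findtext("mods:role/mods:roleTerm", default="", namespaces=NS),
        })
    return {
        "title": root.findtext("mods:titleInfo/mods:title", default="", namespaces=NS),
        "names": names,
    }
```

On the Drupal side, the list would presumably map to a multi-valued field on the single content type, and the flat values to ordinary fields.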

@ajs6f

ajs6f commented Mar 22, 2018

being able to import a record from the 7.x sandbox is a pretty compelling proof of concept

This times ten.

@dannylamb
Contributor

@Natkeeran So how do I actually run the migration in https://github.com/Natkeeran/islandora_migrate_mods? All I get is a button to "Add a migration" that whitescreens. Am I missing something?

@Natkeeran
Contributor Author

@dannylamb
Sorry, just seeing this message. Did it install properly? We can go over it together sometime this week.

@dannylamb
Contributor

@Natkeeran Seems to have installed properly. I composer'd down migrate_tools and migrate_plus and then cloned down your repo. It also doesn't seem to be related to anything you're doing. I uninstalled your module and retried the "Add a migration" button and it still whitescreened. So it's gotta be a migrate thing.

I'm getting this error from the Drupal logs:

Drupal\Component\Plugin\Exception\InvalidPluginDefinitionException: The "migration" entity type did not specify a "add" form class. in Drupal\Core\Entity\EntityTypeManager->getFormObject() (line 184 of /var/www/html/drupal/web/core/lib/Drupal/Core/Entity/EntityTypeManager.php).

Ever run into that?

@DiegoPino
Contributor

Different Drupal core versions? @Natkeeran what version are you running? I guess Danny is on 8.5.0 right?

@dannylamb
Contributor

@DiegoPino Yeah, about time to do the version check dance.

@Natkeeran I'm on 8.4.5 for Drupal.

@Natkeeran
Contributor Author

@dannylamb

Not sure what you mean by Add a migration.

Please go to localhost:8000/admin/structure/migrate/manage/islandora_mods/migrations, and you will see the available migration there.

@dannylamb
Contributor

@Natkeeran Yep, I can see it. But I've got an action button above the migration that whitescreens. I guess the migrate_tools UI is rough around the edges.

How are you kicking off the migration? From the README in the migration example module, it says you have to use drush to run the migration, but I have no migrate commands when I check what's available with drush. The drupal console does appear to have some functionality, though it seems to be geared towards migrating from an earlier Drupal?

@seth-shaw-unlv
Contributor

I've been playing with the Migrate API a lot lately and have used the drush commands from migrate_tools v.4 almost exclusively. I would double-check your drush and migrate_tools install.

@DiegoPino
Contributor

@Natkeeran @dannylamb @seth-shaw-unlv @whikloj @jonathangreen @mjordan I just had an idea. What if we migrate from Solr?
And what if we do it like this?
https://www.previousnext.com.au/blog/migrating-content-from-solr-drupal

@dannylamb
Contributor

After lots of exploration, yes, this will do just fine. We can even stage multiple migrations that are interdependent, and the migrate module figures it out.
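As a conceptual sketch only (this is not Drupal's code), ordering interdependent migrations amounts to a topological sort over their declared dependencies. The migration names below are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical migrations: each key maps to the migrations it depends on.
DEPENDENCIES = {
    "mods_nodes": {"mods_files", "mods_names"},
    "mods_files": set(),
    "mods_names": set(),
}


def run_order(dependencies):
    """Return an order in which every migration follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

With these dependencies, `run_order(DEPENDENCIES)` places `mods_nodes` after both `mods_files` and `mods_names`; a circular dependency raises `graphlib.CycleError`.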

I've tested by using XML files on the filesystem, so if we want to pull from an actual Islandora 7.x site we'll need a source plugin (probably Solr, as @DiegoPino is suggesting) that will get us the list to migrate, and then we can start requesting individual datastreams using islandora_rest or the fcrepo3 API 🤢

@mjordan
Contributor

mjordan commented Apr 6, 2018

Islandora REST provides a list of all datastreams on an object:

{
   "pid":"alping:756",
   "label":"Mt. [Mount] Baker ice school, July 15, 1951",
   "owner":"admin",
   "models":[
      "islandora:sp_large_image_cmodel",
      "fedora-system:FedoraObject-3.0"
   ],
   "state":"A",
   "created":"2016-06-07T14:09:40.056Z",
   "modified":"2016-06-07T17:44:02.068Z",
   "datastreams":[
      {
         "dsid":"RELS-EXT",
         "label":"Fedora Object to Object Relationship Metadata.",
         "state":"A",
         "size":553,
         "mimeType":"application\/rdf+xml",
         "controlGroup":"X",
         "created":"2016-06-07T14:09:40.056Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"MODS",
         "label":"MODS Record",
         "state":"A",
         "size":4561,
         "mimeType":"application\/xml",
         "controlGroup":"M",
         "created":"2016-06-07T14:09:40.056Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"DC",
         "label":"DC Record",
         "state":"A",
         "size":2117,
         "mimeType":"application\/xml",
         "controlGroup":"M",
         "created":"2016-06-07T14:09:40.056Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"OBJ",
         "label":"OBJ Datastream",
         "state":"A",
         "size":1651496,
         "mimeType":"image\/jp2",
         "controlGroup":"M",
         "created":"2016-06-07T14:09:40.056Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"TECHMD",
         "label":"TECHMD",
         "state":"A",
         "size":6725,
         "mimeType":"application\/xml",
         "controlGroup":"M",
         "created":"2016-06-07T17:43:47.724Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"TN",
         "label":"Thumbnail",
         "state":"A",
         "size":5527,
         "mimeType":"image\/jpeg",
         "controlGroup":"M",
         "created":"2016-06-07T17:43:52.991Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"JPG",
         "label":"Medium sized JPEG",
         "state":"A",
         "size":33129,
         "mimeType":"image\/jpeg",
         "controlGroup":"M",
         "created":"2016-06-07T17:43:59.265Z",
         "versionable":true,
         "versions":[

         ]
      },
      {
         "dsid":"JP2",
         "label":"JPEG 2000",
         "state":"A",
         "size":1651496,
         "mimeType":"image\/jp2",
         "controlGroup":"M",
         "created":"2016-06-07T17:44:02.068Z",
         "versionable":true,
         "versions":[

         ]
      }
   ]
}

Even custom datastreams are included in this list, so we wouldn't need to rely on content models to determine the list of datastreams.
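A small Python sketch of reading that list, assuming the response has the shape shown above. `RESPONSE` is a trimmed copy of it (three datastreams kept, versions omitted) and `datastream_ids` is a hypothetical helper; fetching the JSON from the REST endpoint is left out.

```python
import json

# Trimmed copy of the response above; only the keys used here are kept.
RESPONSE = """{
  "pid": "alping:756",
  "datastreams": [
    {"dsid": "RELS-EXT", "mimeType": "application/rdf+xml", "size": 553},
    {"dsid": "MODS", "mimeType": "application/xml", "size": 4561},
    {"dsid": "OBJ", "mimeType": "image/jp2", "size": 1651496}
  ]
}"""


def datastream_ids(object_json, mime_prefix=None):
    """List the DSIDs on an object, optionally filtered by MIME type prefix."""
    obj = json.loads(object_json)
    return [
        ds["dsid"]
        for ds in obj.get("datastreams", [])
        if mime_prefix is None or ds.get("mimeType", "").startswith(mime_prefix)
    ]
```

Because the list comes from the object itself, custom DSIDs show up without any content-model lookup; `datastream_ids(RESPONSE, "image/")` would narrow it to image datastreams.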

If we want a list of objects to migrate, an option would be to use OAI-PMH to get the objects.
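A hedged sketch of the OAI-PMH side in Python: build a `ListIdentifiers` request and pull identifiers plus the resumption token out of a response. The base URL, the sample response, and both helper names are hypothetical; how an OAI identifier maps back to a PID depends on the repository's configuration and is not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response body, trimmed to the parts used below.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:alping_756</identifier></header>
    <header><identifier>oai:example.org:alping_757</identifier></header>
    <resumptionToken>token123</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListIdentifiers request URL (base_url is an assumption)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_identifiers(xml_text):
    """Return (identifiers, resumption_token or None) from one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, token or None
```

A harvester would keep requesting pages with the returned token until it comes back empty.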

@whikloj
Member

whikloj commented Apr 13, 2018

I'm late to this party, so you might have covered this already, but in the example code I read this:

The migration framework keeps track of the relationships between source and destination IDs in map tables, and the migration plugin is the means of performing a lookup in those map tables during processing.

I'm wondering if we could export this mapping table after the fact, as we will need a way to redirect users from the old PID URIs to the new Drupal URIs.

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Apr 13, 2018

@whikloj I don't know if you can do it programmatically, but you certainly could run a SQL query against the db to get it. I used SQL queries against the migration mapping and message tables several times while troubleshooting my migration development.
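A minimal sketch of that kind of query, using an in-memory SQLite table as a stand-in for the real database. It assumes the map table is named `migrate_map_<migration id>` with `sourceid1` and `destid1` columns, which is worth verifying against your own schema; the old and new URL patterns and the helper names are also assumptions.

```python
import sqlite3


def demo_connection():
    """Stand-in database with a map table shaped like the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE migrate_map_import_mods "
        "(sourceid1 TEXT PRIMARY KEY, destid1 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO migrate_map_import_mods VALUES (?, ?)",
        [("alping:756", 12), ("alping:757", 13), ("alping:758", None)],
    )
    return conn


def build_redirects(conn, map_table="migrate_map_import_mods"):
    """Map old PID paths to new node paths, skipping rows with no destination."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not map_table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {map_table!r}")
    rows = conn.execute(
        f"SELECT sourceid1, destid1 FROM {map_table} "
        "WHERE destid1 IS NOT NULL ORDER BY sourceid1"
    )
    return {f"/islandora/object/{pid}": f"/node/{nid}" for pid, nid in rows}
```

The resulting dictionary could be written out as a redirect list; rows whose destination is empty (for example, failed imports) are left out.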

@whikloj
Member

whikloj commented Apr 13, 2018

Oooooo we can make our own ID mapping code. So we could store it wherever we want, like a Redis cache or a text file. https://cgit.drupalcode.org/drupal/tree/core/modules/migrate/src/Plugin/MigrateIdMapInterface.php

@whikloj
Member

whikloj commented May 4, 2018

This is a work-in-progress, but it does query a remote Solr instance for the PIDs of items of a specific content-model and then use a modified XML data fetcher to grab the objectXML straight from Fedora.
https://github.com/whikloj/migrate_7x_claw
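The two requests described here can be sketched in Python as URL builders. This is not the code in the linked repository: the Solr field `RELS_EXT_hasModel_uri_ms`, the base URLs, and all three helper names are assumptions, and the objectXML path follows the Fedora 3 REST API as I understand it.

```python
from urllib.parse import quote, urlencode


def solr_pid_query_url(solr_base, content_model, rows=100, start=0):
    """Build a Solr select URL returning PIDs for one content model."""
    params = {
        # Assumed field name; check the source site's Solr schema.
        "q": f'RELS_EXT_hasModel_uri_ms:"info:fedora/{content_model}"',
        "fl": "PID",
        "wt": "json",
        "rows": rows,
        "start": start,
    }
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def pids_from_solr(response):
    """Pull PIDs out of a decoded Solr JSON response."""
    return [doc["PID"] for doc in response["response"]["docs"]]


def object_xml_url(fedora_base, pid):
    """Build the objectXML URL for a PID (requires API-M credentials)."""
    return f"{fedora_base.rstrip('/')}/objects/{quote(pid, safe=':')}/objectXML"
```

Paging through Solr with `start` and `rows` gives the list of PIDs; each PID then becomes one authenticated objectXML request.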

@mjordan
Contributor

mjordan commented May 4, 2018

@whikloj this is awesome, but what if a site has its Solr and Fedora firewalled off (like we do)? Ima use your code as the basis for similar functionality via the 7.x REST module.

@mjordan
Contributor

mjordan commented May 4, 2018

The more the merrier!

@mjordan
Contributor

mjordan commented May 4, 2018

Maybe at next week's CLAW call we can focus on migrations? @dannylamb any objections?

@whikloj
Member

whikloj commented May 4, 2018

@mjordan Depends on the firewall. If you can't access one machine from the other, then obviously there is nothing you can do. The HTTP data fetcher plugin allows for authentication, and I am using that for accessing Fedora (as you need API-M access to get the objectXML). What I have determined here is that I could be re-using the data fetcher plugin so long as Solr doesn't need different credentials.

@whikloj
Member

whikloj commented May 4, 2018

Basically right now I am testing this by running a 7.x vagrant and a CLAW playbook on my laptop and having one harvest the other. Hence the 10.0.2.2 Fedora URL.

@mjordan
Contributor

mjordan commented May 4, 2018

You can run both vagrants at the same time? I'm jealous. 💚

@mjordan
Contributor

mjordan commented May 6, 2018

@whikloj I got a simple migration working using D8's built-in JSON source plugin. It requires installing the REST module on the source 7.x. I've put the configuration up at https://github.com/mjordan/7x_claw_migration_over_REST.

@dannylamb
Contributor

@mjordan No objections to focusing on migration at CLAW calls. It seems to be the next big frontier for us.

@whikloj
Member

whikloj commented Aug 27, 2018

Do we consider this ticket as closed?

I think we have established that this is a viable migration framework. While there is a lot of work to be done, this is the path, short of writing your own 7.x module to use the new CLAW REST endpoints and push to them (which is also a viable solution).

7 participants