Drupal Migration Tips

Nov 6th 2015

Credit: This blog post was written by Oliver Davies during his time at Microserve.

As part of our recent work on road.cc, we performed a large data migration and transformation of hundreds of thousands of rows of data into their new Drupal 7 site, including users, taxonomy terms, nodes and comments. We did this using a combination of the migrate, migrate_d2d and migrate_extras modules, as well as a custom module to house all of our own migration code. During this process, I’ve collated some tips and tricks that I found useful.

Use Drush

I’d suggest that you use Drush to run the migrate commands rather than using the Migrate UI. I’ve found it to be more robust because I’ve had migrations fail when being run via the migrate UI, only to run successfully when executed via Drush.

These are the main Drush migrate commands that you can run:

$ drush migrate-import (mi)
$ drush migrate-stop (mst)
$ drush migrate-reset-status (mrs)
$ drush migrate-rollback (mr)

To see a full list of the available Drush Migrate commands, run $ drush --filter=migrate.

Use prepareRow() and drush_print_r()

If you’re used to using functions like dpm(), dsm() or kpr() in your module code to find out the value of a variable, or which properties an array or object has, there's a similar function in Drush: drush_print_r(). This outputs data to the terminal in the same way that PHP’s print_r() function does.

I tend to use it within the prepareRow() method to see what data is available within the $row object.

public function prepareRow($row) {
  // Print every property on the current source row to the terminal.
  drush_print_r($row);
}
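Because drush_print_r() mirrors print_r(), the same debugging call can be made to work outside Drush too. The following is a minimal sketch rather than anything from the project: roadcc_debug_dump() is a hypothetical helper name, and the $row object is made up for illustration.

```php
<?php

/**
 * Hypothetical helper: dump a value with drush_print_r() when running under
 * Drush, falling back to PHP's print_r() everywhere else.
 */
function roadcc_debug_dump($value) {
  if (function_exists('drush_print_r')) {
    drush_print_r($value);
  }
  else {
    print_r($value);
  }
}

// Made-up row for illustration; a real $row comes from the migration source.
$row = new stdClass();
$row->nid = 123456;
$row->title = 'Example page';

roadcc_debug_dump($row);
```

Calling roadcc_debug_dump($row) from prepareRow() would then behave the same whether the migration is started from Drush or from the UI.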

If you are using migrate_d2d or extending another class, remember to call parent::prepareRow() so that you add to the preparations in the parent class rather than overriding them, and skip the row if it was skipped in the parent class.

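class RoadccPageNodeMigration extends RoadccNodeMigration {
  ...

  public function prepareRow($row) {
    // Run the parent class's preparations first, and skip this row if the
    // parent class skipped it.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }

    // Custom preparations go here.
  }
}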

Limit the Number of Items that You Are Importing

Rather than waiting for an entire migration to run to confirm whether your latest change worked, you can run a migration on a reduced number of items by using the --limit option.

drush mi RoadccPageNode --limit="10 items"

You can limit by the number of items, such as "10 items", or by the amount of time, such as "60 seconds".

Update, not Rollback

You can also save time by using the --update option to update any already-imported rows, rather than rolling back and removing them, then re-importing.

drush mi RoadccPageNode --limit="10 items" --update

Use addSimpleMappings()

As part of each migration, you need to map the source values to the appropriate destination using the addFieldMapping() method. For example:

$this->addFieldMapping('destination', 'source');

If, however, the source and destination names are the same, you can use the addSimpleMappings() method. This just takes a list of property names in an array and automatically uses each one as both the source and the destination.

$this->addSimpleMappings(
  array(
    'uid',
    'created',
    'changed',
    'field_foo',
    ...
  )
);

If you are using migrate_d2d then some of the common properties - e.g. uid, created, changed - will already be mapped in this way in the parent class.
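To make the relationship between the two methods concrete, here is a toy stand-in class. It is not the Migrate module's code: ToyMigration is a hypothetical name, and it only records mappings so that the long and short forms can be compared.

```php
<?php

/**
 * Toy stand-in for a migration class, NOT the Migrate module's code.
 * It only records mappings so the two methods can be compared.
 */
class ToyMigration {
  public $mappings = array();

  // Map one source field to one destination field.
  public function addFieldMapping($destination, $source) {
    $this->mappings[$destination] = $source;
  }

  // Use each name as both the source and the destination.
  public function addSimpleMappings(array $fields) {
    foreach ($fields as $field) {
      $this->addFieldMapping($field, $field);
    }
  }
}

// Long form: one call per field.
$long = new ToyMigration();
$long->addFieldMapping('uid', 'uid');
$long->addFieldMapping('created', 'created');

// Short form: one call for all fields with matching names.
$short = new ToyMigration();
$short->addSimpleMappings(array('uid', 'created'));

// Both approaches record the same mappings.
var_dump($long->mappings === $short->mappings);
```

In this sketch, the short form is just a loop over the long form, which is why it only fits fields whose source and destination names match exactly.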

Use addUnmigratedSources() and addUnmigratedDestinations()

If you use the Migrate UI, then you may see messages warning about unmapped properties. For example, the UI might report 108 unmapped destination properties, although the same can happen for sources (properties attached to the data being imported). Properties may be unmapped intentionally, or because a newer source database has added more sources following a schema update, or because a newly installed module has added more destinations.

If you intentionally do not want to map a source or a destination, then use the addUnmigratedSources() and/or the addUnmigratedDestinations() method within your constructor after declaring your field mappings.

Both methods take an array of property names to declare as unmigrated, and will therefore mark them as mapped and remove the error.

public function __construct(array $arguments = array()) {
  ...

  // These fields are not being migrated, so mark them as such.
  $this->addUnmigratedDestinations(
    array(
      'field_one',
      'field_two',
      'field_three',
    )
  );
}

This makes it much clearer when you or a colleague revisits the migration at a later date: these properties were left unmapped on purpose, rather than being forgotten about or not yet existing when the migration class was written.
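The bookkeeping behind those messages can be pictured as a simple set difference. The sketch below is an illustration only, not how Migrate implements it: toy_unaccounted_destinations() is a hypothetical function, and the field lists are made up.

```php
<?php

/**
 * Hypothetical helper, not part of Migrate: lists destination fields that
 * are neither mapped nor explicitly declared as unmigrated.
 */
function toy_unaccounted_destinations(array $available, array $mapped, array $unmigrated) {
  return array_values(array_diff($available, $mapped, $unmigrated));
}

$available = array('title', 'body', 'field_one', 'field_two', 'field_three');
$mapped = array('title', 'body');

// Before declaring anything as unmigrated, three fields are unaccounted for.
print_r(toy_unaccounted_destinations($available, $mapped, array()));

// After declaring them as unmigrated, nothing is left to report.
print_r(toy_unaccounted_destinations($available, $mapped, array('field_one', 'field_two', 'field_three')));
```

Anything that still shows up after the intentional omissions are declared is then a genuine candidate for a missing mapping.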

Write your own base Migration Classes

Because migrate is based on object-oriented classes, these can be extended and customised as needed, making them extremely flexible. I’ve found this to be very useful when I needed to do something that applied to all migrations, such as getting the database connection, or something that affected all migrated nodes, such as replacing full URLs with relative ones so that they work on different environments, or mapping the source node ID values to the destination ones.

This is done by writing our own abstract classes that extend the default ones such as Migration or DrupalNode6Migration. Because we’re using the abstract keyword before the class name, we ensure that these classes cannot be instantiated directly, and must be extended by another class.

Extending a Normal Migration Class

abstract class RoadccMigration extends DrupalMigration {
  protected function getConnection($connection = 'migrate') {
    return Database::getConnection('default', $connection);
  }
}

In this example, we can use $this->getConnection() as the starting point for queries in any class that extends RoadccMigration, for example $query = $this->getConnection()->select('node', 'n'), and then continue building the query using the same syntax as db_select(). This means that there is less duplication within our custom classes, and the connection is easy to update if needed as it’s only declared once.

Extending a migrate_d2d Class

abstract class RoadccNodeMigration extends DrupalNode6Migration {
  public function __construct(array $arguments = array()) {
    parent::__construct($arguments);
  }

  public function prepareRow($row) {
    // Run the parent preparations, and skip the row if the parent skipped it.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }

    // Strip the production domain so that absolute URLs become relative.
    // str_replace() leaves the text unchanged when nothing matches, so no
    // strpos() check is needed. A bare strpos() check would also miss a URL
    // at the very start of the text, where it returns 0.
    foreach (array('body', 'teaser') as $property) {
      if (isset($row->{$property})) {
        $row->{$property} = str_replace(array('http://www.road.cc', 'http://road.cc'), '', $row->{$property});
      }
    }
  }
}

In this example, we’ve extended the DrupalNode6Migration class from the migrate_d2d module, and are performing some transformations on the body and teaser values - removing the full URL so that users aren’t redirected back to the original production site rather than to their intended destination.

As all of our node migrations extend RoadccNodeMigration, this automatically applies to all nodes imported via the migration.
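The URL clean-up itself can be tried outside Drupal. This is a minimal sketch under stated assumptions: roadcc_strip_base_urls() is a hypothetical helper name, and the sample text is made up. One thing to watch with this kind of code is that strpos() returns 0 when the match is at the very start of the string, so a plain if (strpos(...)) check treats that case as "not found"; passing the URLs straight to str_replace() avoids the check altogether.

```php
<?php

/**
 * Hypothetical helper: removes known absolute base URLs from a string so
 * that links become relative.
 */
function roadcc_strip_base_urls($text, array $base_urls = array('http://www.road.cc', 'http://road.cc')) {
  // str_replace() accepts an array of search strings and returns the text
  // unchanged when none of them match.
  return str_replace($base_urls, '', $text);
}

// Made-up sample: one URL at position 0, one inside a link.
$body = 'http://www.road.cc/news/1 and <a href="http://road.cc/forum">forum</a>';

// strpos() finds the first URL at position 0, which is falsy.
var_dump(strpos($body, 'http://www.road.cc'));

echo roadcc_strip_base_urls($body), PHP_EOL;
```

Keeping the list of base URLs as a parameter means that other domains or an https variant could be added later without touching the replacement logic.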

Limit your Result Set

If you need to test something, such as whether all of your field mappings are working, I found it beneficial to find a small collection of examples that covers all use cases, and then limit the query so that the migration only affects those nodes, rather than regularly searching for the right examples to test against.

If you’re writing normal migrations you can do this in your __construct() method as part of your query. If you’re extending a migrate_d2d class, then you’ll need to add your own query() method and add the additional conditions to the query from the parent class.

For example:

class RoadccPageNodeMigration extends RoadccNodeMigration {
  ...

  protected function query() {
    // Get the query from the parent class.
    $query = parent::query();

    // Add any new conditions. In this case, just filter on this single node.
    $query->condition('n.nid', 123456);

    // Return the new, full query.
    return $query;
  }
}

This means that you can quickly re-run that migration and see how your changes affected the result, if at all, rather than waiting for the entire migration to be re-run on hundreds or thousands of nodes.

Just remember to remove the test conditions when they are no longer needed. If you use Git for version control, I’d suggest using git add -p to interactively add chunks of code to the staging area, allowing you to review each one and keep your code repository clean of any test conditions.
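To filter on a small collection of nodes rather than a single one, the condition can take an array of IDs. The sketch below assumes a Drupal 7 style query object with a condition($field, $value, $operator) method; roadcc_limit_to_test_nodes() is a hypothetical helper name and not part of Migrate or migrate_d2d.

```php
<?php

/**
 * Hypothetical helper: restricts a node query to a fixed set of test nids.
 *
 * $query is assumed to be a Drupal 7 SelectQuery that uses "n" as the alias
 * for the node table, but any object with a compatible condition() method
 * will work.
 */
function roadcc_limit_to_test_nodes($query, array $nids) {
  // Leave the query untouched when no test nids are given, so that the full
  // migration still runs.
  if (!empty($nids)) {
    $query->condition('n.nid', $nids, 'IN');
  }

  return $query;
}
```

Inside query(), this could be called as return roadcc_limit_to_test_nodes(parent::query(), array(123456, 234567)); where the second node ID is made up for illustration. Passing an empty array restores the full result set, which keeps the test condition easy to switch off.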


Comments

Yet Another Drupal Developer

Nov 6th 2015 - 5:11pm

I am working on a similar project with a bit more nodes than what you show in the screenshot and I feel really good when reading your suggestions comparing them with my experiences. There is one suggestion I will ask you to write about, how do you handle the unprocess (stuck) nodes or terms? how to find our why they are not been moved over.

Lady1973

Aug 30th 2016 - 1:08am

someone knows if there a RSS feed to signup on this website?

Mark Pavlitski

Sep 7th 2016 - 4:09pm

We publish most of our content to Drupal Planet and you can subscribe to our planet feed.
