Drupal Data Importing as a Thrill-seeking Behavior

Tom's picture

Yesterday Ethan, Chris and I found ourselves needing to move some data into Drupal — we're building an existing client a new site and they need some blog posts moved over from their old one. It sure sounds simple. But as anyone who's actually done data migration from one blogging platform to another (or just from one version to another) can attest, it's rarely that easy. There are a few options, though, which I present here in increasing order of their likelihood to unexpectedly spiral into a huge, awful mess:

Interns/Temps
You'd be surprised how often this is the most efficient data migration technology available to a project. How many nodes are we talking about, anyway? Aren't they going to need a manual review after any sort of automated migration? How sure are you that you won't run into weird Unicode problems?
You should really take a good long look at the problem and figure out the hours involved. Frequently the cheapest and easiest option will be hiring a less skilled worker to manually move the data around. It's not glamorous, but it's the truth. Just be sure they don't have MS Word installed on their computers or else they'll inevitably copy and paste high ASCII garbage into your lovely new database.
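If you do go the manual route, a cleanup pass over whatever gets pasted in can catch the worst of the Word damage. What follows is only a minimal sketch in Python, not anything we actually ran: it assumes the pasted text arrives as bytes that are either valid UTF-8 or Windows-1252, and the function names and the (deliberately short) character map are made up for illustration.

```python
# Common Word "smart" punctuation mapped to plain ASCII stand-ins.
# Illustrative only; extend the map for whatever your content contains.
WORD_PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def decode_pasted_bytes(raw: bytes) -> str:
    """Decode as UTF-8 when possible; otherwise assume Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 came from Word.
        # Bytes that cp1252 leaves undefined become U+FFFD.
        return raw.decode("cp1252", errors="replace")


def flatten_word_punctuation(text: str) -> str:
    """Replace the mapped punctuation; leave every other character alone."""
    return text.translate(str.maketrans(WORD_PUNCTUATION))
```

Legitimate non-ASCII text (accented names, for instance) passes through untouched, since only the characters in the map are rewritten.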
node_import
This is the only other userspace method of moving Drupal content around that I know of: you export a CSV, upload it to your new server, do some field mapping and hopefully get a bunch of new nodes. I gave it a look a few projects ago and came away frustrated, but Ethan has reported recent success. It handles CCK types and taxonomy, supports event and location data, and generally seems to be the best hope for Drupal having really robust import/export functionality in the future. But as far as I know it can't handle comments or attachments, and it probably chokes on a number of other module integration points. Import is a hard problem, and it's no knock on the module's authors to say that their work can't be all things to all people. But sometimes you're going to need more.
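The "export a CSV" step is where a lot of the pain hides, since blog bodies are full of commas, quotes and line breaks. As a rough sketch (in Python, with hypothetical column names, and assuming the old posts are already in hand as dictionaries), letting a real CSV library do the quoting might look like this. The columns here are placeholders; node_import's field-mapping step is where they would be matched to actual node fields.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
FIELDNAMES = ["title", "body", "created", "author"]


def posts_to_csv(posts):
    """Serialize a list of post dictionaries to CSV text, quoting every field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for post in posts:
        # Missing keys become empty strings rather than raising an error.
        writer.writerow({name: post.get(name, "") for name in FIELDNAMES})
    return buffer.getvalue()
```

Quoting every field is a defensive choice: it costs a few bytes, but embedded newlines and commas in post bodies survive the round trip.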
drupal_execute()
The drupal_execute() function lets you summon a Drupal form, populate its values and then submit it with all of Drupal's hooks firing just as they would for manually-entered content. Chris is our drupal_execute() guru, and he's pretty well convinced me that it's the premier tool for difficult data importing problems. Sure, you might be irritated by the occasional HTTP timeout if you forget your ini_set() calls, and writing your first Drupal-bootstrapping wrapper might seem a little weird (although Drupal 6's command line mode should make this all much less painful), but in general it's an awfully powerful technique. Jeff Eaton has a nice example of it over at Lullabot's blog.
SQL!
But there are times when you just can't be bothered. You've got the old database; you've got the new one. Why are you being forced to screw around with an intermediary layer of abstraction?

Well, there are a few reasons. First, there's the sequences table. Ethan pointed out to me that it's disappearing in Drupal 6, but for earlier versions you'll need to update it after any direct SQL butchery. Second, there are the endless (and yet inconsistent!) JOINs of post-CCK Drupal: teasing out where all that data is supposed to go can take longer than jumping through Drupal's API hoops. Having taken this route with our Greenpeace UK project, writing a rat's nest of ColdFusion-to-Drupal scripts in Perl, I can say with confidence that if your eyes immediately gravitated to this section of the post, you should probably go give drupal_execute() a closer look. Seriously.

But sometimes the temptation is impossible to resist. Need to move over a bunch of legacy path aliases? Have a big batch of comments that need to get into the system (and no fear of the node_comment_statistics table)? Then you may as well make sure you're good & backed up, clear some time afterward to look for odd bugs, and dive in. Sometimes it's worth doing just to remind yourself that you know web technologies beyond the scope of api.drupal.org.
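For the pre-Drupal 6 sequences bookkeeping mentioned above, the fix-up after a direct insert amounts to raising each counter to the highest ID now in its table. Here's a hedged sketch of that idea in Python. It assumes a sequences table with name and id columns, counters named like node_nid and comments_cid, and no table prefix. It also uses sqlite3 purely as a stand-in connection; a real Drupal site would be on MySQL or PostgreSQL, so check all of this against your own schema before trusting it.

```python
import sqlite3

# Assumed (counter name, table, id column) triples; adjust to your schema.
COUNTERS = [
    ("node_nid", "node", "nid"),
    ("comments_cid", "comments", "cid"),
]


def resync_sequences(conn, counters=COUNTERS):
    """Raise each sequences counter to at least the highest id in its table."""
    for name, table, column in counters:
        # Table and column names come from the fixed list above, never user input.
        max_id = conn.execute(
            f"SELECT COALESCE(MAX({column}), 0) FROM {table}"
        ).fetchone()[0]
        existing = conn.execute(
            "SELECT id FROM sequences WHERE name = ?", (name,)
        ).fetchone()
        if existing is None:
            conn.execute(
                "INSERT INTO sequences (name, id) VALUES (?, ?)", (name, max_id)
            )
        elif existing[0] < max_id:
            conn.execute(
                "UPDATE sequences SET id = ? WHERE name = ?", (max_id, name)
            )
    conn.commit()
```

The counters are only ever raised, never lowered, so running it twice (or running it when nothing was imported) should be harmless.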

Those are the options we considered, anyway. Am I missing anything? Have you done something unusual, like a large-scale import via Development Seed's FeedAPI? Let us know in the comments.

Weird! There was a great comment here from Tim but now it's mysteriously gone. Anybody know what happened to it?

yeah!! where did my great comment go?

Hmm. Well, it's a mystery. I can't imagine accidentally deleting it given the number of clicks required, and it's not in the spam folder.

Well, uh... hmm. Let's try to recreate it: there's a Movable Type importing plugin! Although that link is 4.x only, so maybe I've got the wrong one.

You guys don't do backups? For small- to medium-ish sized sites I recommend the Backup and Migrate module.

Of course we do backups! We do nightly backups to an offsite location and realtime intra-NOC replication to a hot spare. Nothing against the Backup and Migrate module, but we've got a genuine enterprise system for backing up our databases and filesystem.

But it's not worth digging into that system just to restore a single comment on a non-client site (no offense to Tim) -- especially when we can remember the gist of it.
