<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://labs.echoditto.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>EchoDitto Labs - Node Import Part 0: the prequel - Comments</title>
 <link>http://labs.echoditto.com/node/66</link>
 <description>Comments for &quot;Node Import Part 0: the prequel&quot;</description>
 <language>en</language>
<item>
 <title>Node Import Part 0: the prequel</title>
 <link>http://labs.echoditto.com/node/66</link>
 <description>&lt;p&gt;I thought I&#039;d share some lessons learned in the node creation wild as a follow-up to Tom&#039;s recent post surveying the options available to prospective content importers who need to get a slew of nodes into Drupal from some far off place such as the land of flat HTML, custom CMSes or even the fabled Drupal 4.6.&lt;/p&gt;
&lt;p&gt;For a very exciting upcoming site we had to import a few hundred old blog posts from a Drupal 4.6 install, along with 30 authors and around 2000 taxonomy associations.  While these numbers are nowhere near as big as content migration figures can crawl, the complexity of the data structure was multiplied by the specific requirements of the various node importing schemes available in Drupal. This multiplicative factor yielded a somewhat dizzying array of alternatives that had to be evaluated, with the vast majority of them leading down paths that involved exporting nodes, importing with old legacy data attached then iteratively importing the author and taxonomy data one piece at a time using custom scripts and translation tables.&lt;/p&gt;
&lt;p&gt;Scary.&lt;/p&gt;
&lt;p&gt;While the Macro module definitely made the job a great deal easier -- mainly because it was able to handle authorship, taxonomy and creation timestamps all as part of the node creation process -- one of the real gems I uncovered was a MySQL function I wish I&#039;d found years ago: &lt;code class=&quot;geshifilter php&quot;&gt;GROUP_CONCAT&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;GROUP_CONCAT concatenates all records returned by a single field query, breaking up the results according to the GROUP BY clause, if one is present.   As an example, this means that this statement:&lt;/p&gt;
&lt;p&gt;SELECT GROUP_CONCAT(t_d.name) AS tags, t_n.nid FROM term_data as t_d JOIN term_node AS t_n ON t_n.tid = t_d.tid WHERE t_d.vid =2 GROUP BY t_n.nid&lt;/p&gt;
&lt;p&gt;will return a two column result set, in the first column will be a comma-delimited listing of all taxonomy terms from the vocabulary with vid=2 and  in the second column we&#039;ll have the nid of the post which has all these tags.  In the past, this step would require my writing some intermediary script to reduce the fully-normalized result in which each term was on a separate row.&lt;/p&gt;
&lt;p&gt;The real advantage of doing this de-normalization in pure SQL comes when we use this expression as a sub-select to construct a single table with all the information needed to construct full drupal_execute form value arrays for each node. This can be done with SQL that looks something like this:&lt;/p&gt;
&lt;p&gt;SELECT n.nid, a.dst, n.title, n.created, n.body, tags.tags, u.name, u.uid, u.mail FROM node as n JOIN&lt;br /&gt;
	(SELECT GROUP_CONCAT(t_d.name) AS tags, t_n.nid FROM term_data as t_d JOIN term_node AS t_n ON t_n.tid = t_d.tid WHERE t_d.vid =2 GROUP BY t_n.nid) as tags ON tags.nid = n.nid&lt;br /&gt;
	JOIN users as u on u.uid = n.uid&lt;br /&gt;
	JOIN url_alias as a ON a.src = CONCAT(&#039;node/&#039;,n.nid);&lt;/p&gt;
&lt;p&gt;This statement makes use of the second column in our GROUP_CONCAT query to join those results to the node data via a subselect. What comes out of this query can then be processed by a PHP script and used to generate the drupal_execute calls or Macro PHP code needed to migrate the nodes into the new database, taxonomy, users and all.&lt;/p&gt;
</description>
 <comments>http://labs.echoditto.com/node/66#comments</comments>
 <pubDate>Wed, 05 Mar 2008 22:36:25 -0800</pubDate>
 <dc:creator>Ethan</dc:creator>
 <guid isPermaLink="false">66 at http://labs.echoditto.com</guid>
</item>
</channel>
</rss>
