Drupal Internationalization: Part I, an Introduction

Ethan's picture

We've recently begun putting together the infrastructure for a number of upcoming projects which will organize people around the world toward some pretty powerful, ambitious goals. While the details of those sites are still in the works, working with Drupal to create multi-language sites has been a great experience involving a great deal of learning. This is the first of a two-part series covering how Drupal works with multiple languages and the best practices/tricks of the trade for making the most of the Drupal Locale, i18n and L10n systems.

What the h3l is this i18n/L10n c2p all about, anyway?

Don’t be scared by the geekspeak! “i18n” and “L10n” are just abbreviations for “internationalization” and “localization”, the engineering methodologies used to create software packages and websites which can be used by speakers of different languages. i18n is the process of engineering an application so that it can be adapted for use by speakers of different languages or in different regions without needing major modifications and L10n is the related step of providing conversions between languages, date formats, etc. so the software or site can be accessed in any particular language (see the Wikipedia article on i18n and L10n for more).
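To make the distinction concrete, here is a minimal sketch in Python (not Drupal code). The function `format_date` and the `DATE_FORMATS` table are hypothetical names chosen for illustration. Writing the function so that it takes a locale instead of hard-coding one format is the i18n part; filling in the table with each region's conventions is the L10n part.

```python
from datetime import date

# L10n: per-locale data. The locale codes and patterns here are illustrative.
DATE_FORMATS = {
    "en_US": "%m/%d/%Y",
    "fr_FR": "%d/%m/%Y",
    "de_DE": "%d.%m.%Y",
}


def format_date(value: date, locale: str, fallback: str = "en_US") -> str:
    """i18n: the code never assumes a format; it looks one up by locale."""
    pattern = DATE_FORMATS.get(locale, DATE_FORMATS[fallback])
    return value.strftime(pattern)


day = date(2009, 3, 14)
for code in ("en_US", "fr_FR", "de_DE"):
    print(code, format_date(day, code))
```

Supporting a new region in this sketch means adding one entry to the table, with no change to the function itself.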

How Drupal Handles Internationalization

Internationalizing an application is no trivial process, and the engineering challenges of architecting a Content Management System (CMS) capable of handling content authored in any of some set of languages and translated into any number of other languages are especially formidable. Not only does the content of translated pages need to be easily accessible via a unique URL, but the site’s navigation must correctly reflect the content available in each language; the interface elements and HTML generated by the CMS code and any plugin modules (form element labels, validation errors, and such) need to be translated; and templates must be able to adjust for right-to-left languages, just to name a few.

Drupal’s internationalization is provided by two groups of modules. The Locale module, part of Drupal’s core distribution, allows the “interface strings”, those bits of HTML output by the Drupal code, to be translated into languages other than English (English is the base language of the Drupal codebase). The Content Translate module in core, along with the i18n suite of contrib modules, on the other hand, supports the translation of a site’s content, as well as adapting the menu and taxonomy systems for multilingual sites. Other contrib modules, such as the localization module, provide features similar to those of the i18n modules but won’t be covered here.

The Locale module’s handling of text strings produced within the Drupal codebase is fairly straightforward: any words which will be output as part of a page’s content are wrapped in a t() function call. The t() function is not, strictly speaking, provided by the Locale module; it is defined in the all-powerful common.inc. The t() function proper merely checks the $custom_strings variable for a match to the given string and substitutes the replacement specified in the settings.php file. If the Locale module is present, however, t() delegates to the locale() function, which fetches translations of strings from the database and, depending on the module’s settings, maintains a cache of used translations. These translations of interface strings are supplied either as gettext-format .po files, which provide translation strings for each language and each module, or, by default, through the Locale module’s “Translate Interface” admin page, which allows individual interface string translations to be entered (though not as elegantly as with the Localization Client module).
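The lookup order described above can be sketched in a few lines. This is a simplified Python model, not Drupal's PHP source: the names `translate`, `CUSTOM_STRINGS` and `LocaleStore`, as well as the sample strings, are hypothetical, and details such as placeholder substitution are left out.

```python
from typing import Dict, Optional

# Hypothetical stand-in for per-language overrides configured in settings.php.
CUSTOM_STRINGS: Dict[str, Dict[str, str]] = {
    "fr": {"Home": "Accueil"},
}


class LocaleStore:
    """Toy model of a database-backed translation table with a cache."""

    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self._rows = rows  # pretend this dictionary is the database
        self._cache: Dict[str, Dict[str, str]] = {}
        self.db_hits = 0  # counts simulated database queries

    def lookup(self, string: str, langcode: str) -> str:
        cached = self._cache.setdefault(langcode, {})
        if string not in cached:
            self.db_hits += 1
            # Fall back to the source string when no translation exists.
            cached[string] = self._rows.get(langcode, {}).get(string, string)
        return cached[string]


def translate(string: str, langcode: str,
              locale: Optional[LocaleStore] = None) -> str:
    """Check configured overrides first, then delegate to the locale store."""
    overrides = CUSTOM_STRINGS.get(langcode, {})
    if string in overrides:
        return overrides[string]
    # English is treated as the source language, so it is never looked up.
    if locale is not None and langcode != "en":
        return locale.lookup(string, langcode)
    return string


store = LocaleStore({"fr": {"Save": "Enregistrer"}})
print(translate("Home", "fr", store))  # override takes precedence
print(translate("Save", "fr", store))  # fetched from the simulated database
print(translate("Save", "fr", store))  # second call is served from the cache
print(translate("Save", "fr"))         # no locale store: source string returned
```

The point of the sketch is the ordering: overrides are consulted before the stored translations, and an untranslated string falls through unchanged, so code that wraps its output this way keeps working on a site with no translations installed.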

While this approach works well for strings within Drupal code, for various reasons it does not work nearly as well for user-generated content (for starters, consider the complexity of storing and retrieving translations for strings originally authored in any one of a site’s supported languages). The core requirement of an internationalization framework for a web CMS is that it support the translation of any page’s content. The Content Translate module allows every post not only to have a specified language but also to have its translations stored and associated with each other in the database. Content Translate achieves this by technically storing translations as separate nodes from the original with a “translation bit” set. This approach allows for a very elegant interface for entering and maintaining translations of any content: simply click the “translate” tab when logged in as an administrator, choose the language you want to enter the translation for, and enter the translated title and content. If path aliases are used, the translated content for any node can be reached by prepending the language code to the page’s path (e.g. fr/about/contact would give the French version of about/contact).
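The idea of translations stored as separate but linked nodes, reachable through a language-prefixed path, can be modelled roughly as follows. This is an illustrative Python sketch rather than Drupal's schema or routing code: the `Node` fields, the `translation_set` identifier, the `ALIASES` table and the helper names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    nid: int
    language: str
    title: str
    translation_set: int  # hypothetical id shared by all translations of a post


# Each translation is its own node; the shared id ties them together.
NODES = [
    Node(1, "en", "Contact us", translation_set=1),
    Node(2, "fr", "Contactez-nous", translation_set=1),
]

# Hypothetical alias table: (language, alias) -> node id.
ALIASES: Dict[Tuple[str, str], int] = {
    ("en", "about/contact"): 1,
    ("fr", "about/contact"): 2,
}

SITE_LANGUAGES = {"en", "fr"}
DEFAULT_LANGUAGE = "en"


def split_language_prefix(path: str) -> Tuple[str, str]:
    """Split 'fr/about/contact' into ('fr', 'about/contact')."""
    head, _, rest = path.partition("/")
    if head in SITE_LANGUAGES and rest:
        return head, rest
    return DEFAULT_LANGUAGE, path


def resolve(path: str) -> Optional[Node]:
    """Find the node for a path, honouring an optional language prefix."""
    language, alias = split_language_prefix(path)
    nid = ALIASES.get((language, alias))
    return next((node for node in NODES if node.nid == nid), None)


def translations_of(node: Node) -> Dict[str, Node]:
    """Return every node in the same translation set, keyed by language."""
    return {n.language: n for n in NODES
            if n.translation_set == node.translation_set}


print(resolve("about/contact").title)
print(resolve("fr/about/contact").title)
print(sorted(translations_of(resolve("about/contact"))))
```

Because every translation is a full node in this model, each one carries its own title and body, which is what makes the editing interface simple; it is also why other per-node data can drift apart between translations.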

The complexity comes in when one considers the other data associated with a given node, such as taxonomy terms, CCK fields, menu items and publication/comment settings. The Content Translate module’s approach is a very simple and elegant solution to the internationalization problem, but it has an unfortunate side effect: it’s possible for different translations of the same piece of content to have very different taxonomy terms, menu item hierarchies and post settings. As an example, consider the menu hierarchy: in essence Content Translate maintains separate menu trees for each language, which means that a translated node could live in a very different location on the site from the original and that users visiting the site in different languages may see different menu options depending on the coverage of translations in any one language. This can be problematic. Taking our earlier about section example, if the about page has not yet been translated, the contact page on the French site may end up in the primary navigation, and there is no way for French-speaking visitors to read the mission statement. In addition, this introduces a host of maintenance and administrator-user-error scenarios: an admin may make a navigation change that is not carried over through all translations or might mistakenly mis-parent a single translation’s menu item. Similar problems arise with taxonomy terms (the management of translations of folksonomy/freetagged terms, for instance) and other post settings (though these are mostly a bit more straightforward).
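The menu drift in the about/contact example can be reproduced with a toy model. The Python sketch below is not how Drupal builds menus: `MENU_PARENTS`, `TRANSLATED` and both helper functions are hypothetical, and the rule that an item with an untranslated parent is promoted to its nearest translated ancestor is an assumption made purely for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical source-language menu: item -> parent (None means top level).
MENU_PARENTS: Dict[str, Optional[str]] = {
    "about": None,
    "about/contact": "about",
    "about/mission": "about",
}

# Hypothetical record of which items have been translated, per language.
TRANSLATED: Dict[str, Set[str]] = {
    "fr": {"about/contact"},
}


def build_language_menu(langcode: str) -> Dict[str, Optional[str]]:
    """Build item -> parent for one language from the translated items only.

    Assumption: an item whose parent is untranslated is attached to its
    nearest translated ancestor, or to the top level if there is none.
    """
    available = TRANSLATED.get(langcode, set())
    menu: Dict[str, Optional[str]] = {}
    for item, parent in MENU_PARENTS.items():
        if item not in available:
            continue
        while parent is not None and parent not in available:
            parent = MENU_PARENTS[parent]
        menu[item] = parent
    return menu


def missing_items(langcode: str) -> List[str]:
    """List source menu items that visitors in this language cannot reach."""
    available = TRANSLATED.get(langcode, set())
    return sorted(item for item in MENU_PARENTS if item not in available)


print(build_language_menu("fr"))
print(missing_items("fr"))
```

A report along the lines of `missing_items` is one way an administrator could spot coverage gaps before visitors do, although nothing here implies that Drupal or the i18n modules provide such a report.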

In the next installment, we’ll go over some clever solutions to these issues which we’ve learned through practice and from our good friends at Development Seed.

I am currently working on multi language site, waiting for next instalments. Keep it up.
