04 Jun 2012 » Pinternationalization
We've translated our site into Spanish, and we'll continue translating it into other languages.
One day, I came into work and they said, "You are in charge of Internationalization now." I won't lie: it wasn't the most exciting news, because for most developers localization is a daunting task. I brought this upon myself by emailing my fellow engineers about how to do localization. At Mozilla WebDev, localization is a core competency. You work closely with a team of volunteer translators, and you learn to extract a wide variety of messages correctly. Mozilla even built some useful tools to help with this.
I adapted this process for Pinterest and I've come to enjoy localization.
How localization works
In general, localizing a website involves a few steps:
- Message marking: Any message on the site (e.g. "Hello Dave", "Login", "Repin") has to be "marked" as localizable.
- Message extraction: We have a tool that extracts any messages found in our codebase and builds a translation template.
- Translation: This involves taking a file filled with English messages and adding a translation for each one in multiple languages.
- Compilation: The translated messages are compiled into a binary format that allows for fast translation lookups.
It's deceptively simple, and as you'll see, it can get very complicated.
How it worked at Mozilla
At Mozilla, any text we wrote had to be localized; it's very much a global organization.
All text had to be wrapped in special tags. These special tags served two purposes:
- They look up the translation in a message database.
- They let our internal tools find these messages, so our translators can translate them.
Step 2 is what we call extraction. Some teams at Mozilla automated this process and automatically emailed the localizers when messages were ready for translation.
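Extraction can be sketched with Python's standard `ast` module. This is a minimal illustration, not Mozilla's actual tooling (real projects use Babel's extractors): it assumes messages are marked with `_()` for single strings and `ngettext()` for plural pairs, and the helpers `extract_messages`, `po_quote` and `build_template` are hypothetical. The output is a bare-bones gettext template (.pot) with an empty `msgstr` for translators to fill in.

```python
import ast


def extract_messages(source, filename="<string>"):
    """Return (lineno, msgid, msgid_plural) for each marked string literal.

    Assumes the convention that _("...") marks a single message and
    ngettext("...", "...", n) marks a plural pair.
    """
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        # Only string literals can be extracted: _(some_variable) is
        # invisible to the tool, which is why messages must be literals.
        literals = [arg.value for arg in node.args
                    if isinstance(arg, ast.Constant) and isinstance(arg.value, str)]
        if node.func.id == "_" and len(literals) == 1:
            found.append((node.lineno, literals[0], None))
        elif node.func.id == "ngettext" and len(literals) == 2:
            found.append((node.lineno, literals[0], literals[1]))
    # ast.walk is breadth-first, so restore source order by line number.
    return sorted(found, key=lambda entry: entry[0])


def po_quote(text):
    # Escape backslashes, quotes and newlines the way .po files expect.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped


def build_template(messages, filename="<string>"):
    """Render extracted messages as a minimal .pot template."""
    blocks = []
    for lineno, msgid, plural in messages:
        lines = ["#: %s:%d" % (filename, lineno), "msgid " + po_quote(msgid)]
        if plural is None:
            lines.append('msgstr ""')
        else:
            lines += ["msgid_plural " + po_quote(plural),
                      'msgstr[0] ""', 'msgstr[1] ""']
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Running `build_template(extract_messages(code, "pins.py"), "pins.py")` on a file containing `_("Login")` and an unmarked `"Repin"` string would list only "Login"; the unmarked string never reaches the translators, which is why marking comes first.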
We had a tool that allowed volunteers to begin translating. These translators had leaders and the leaders worked with people at Mozilla to make sure the process was working.
The translated strings would automatically be saved to our code repository, and we'd compile them before deploying a website.
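To make the compilation step concrete, here is a minimal sketch that writes a gettext `.mo` file in memory and loads it with Python's standard `gettext.GNUTranslations`. It is an illustration of the binary format, not Mozilla's build step (in practice you'd run `msgfmt` or Babel's compiler); `compile_mo` and `load` are hypothetical helpers, and the Spanish strings are placeholders.

```python
import gettext
import io
import struct

# Header entry (msgid "") telling gettext how to decode the strings.
HEADER = "Content-Type: text/plain; charset=UTF-8\n"


def compile_mo(catalog):
    """Serialize {msgid: msgstr} into the binary GNU .mo format.

    Ids are sorted so C gettext can binary-search them; no hash table
    is written (its size field is 0).
    """
    entries = dict(catalog)
    entries[""] = HEADER
    keys = sorted(entries)
    ids = b"".join(k.encode("utf-8") + b"\0" for k in keys)
    strs = b"".join(entries[k].encode("utf-8") + b"\0" for k in keys)

    n = len(keys)
    key_table_start = 7 * 4                      # after the 7-word header
    value_table_start = key_table_start + 8 * n  # (length, offset) pairs
    id_off = value_table_start + 8 * n           # original strings follow
    str_off = id_off + len(ids)                  # then the translations

    key_table, value_table = [], []
    for k in keys:
        kb, vb = k.encode("utf-8"), entries[k].encode("utf-8")
        key_table += [len(kb), id_off]
        value_table += [len(vb), str_off]
        id_off += len(kb) + 1   # +1 for the NUL terminator
        str_off += len(vb) + 1

    header = struct.pack("<7I", 0x950412DE, 0, n,
                         key_table_start, value_table_start, 0, 0)
    tables = struct.pack("<%dI" % (4 * n), *(key_table + value_table))
    return header + tables + ids + strs


def load(mo_bytes):
    # GNUTranslations parses the catalog into a dict for fast lookups.
    return gettext.GNUTranslations(io.BytesIO(mo_bytes))
```

For example, `load(compile_mo({"Login": "Iniciar sesión"})).gettext("Login")` returns the Spanish string, while a message missing from the catalog, such as "Repin", falls back to the English text.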
How it works (for now) at Pinterest
The Mozilla process worked well, but there were a lot of awkward steps that I didn't want to replicate. Unfortunately, we still had to mark up strings, and that's a lot more difficult on a social networking site than on Mozilla's websites, because you get messages like this:
Bob and 6 others liked your pin.
This involved writing a separate translatable message for each of:
- Bob liked your pin.
- Bob and 1 other liked your pin.
- Bob and n others liked your pin.
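With gettext, the three variants above become one plain message plus one plural pair, so each full sentence reaches the translator intact. A minimal sketch using Python's standard `gettext` module; the `liked_your_pin` function and its exact wording are illustrative assumptions, and `NullTranslations` stands in for a real catalog by returning the English text.

```python
import gettext

# With no catalog loaded, NullTranslations returns the English msgids;
# a real deployment would load the user's language instead.
translations = gettext.NullTranslations()
_ = translations.gettext
ngettext = translations.ngettext


def liked_your_pin(actor, others):
    """Build the notification without gluing sentence fragments together."""
    if others == 0:
        return _("%(actor)s liked your pin.") % {"actor": actor}
    # ngettext picks the plural form for the active language; named
    # placeholders let a translator move the name and count around.
    return ngettext(
        "%(actor)s and %(count)d other liked your pin.",
        "%(actor)s and %(count)d others liked your pin.",
        others,
    ) % {"actor": actor, "count": others}
```

Here `liked_your_pin("Bob", 6)` produces "Bob and 6 others liked your pin.", and the choice between "other" and "others" moves out of the code and into the catalog.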
The hardest part was that Pinterest was built without translation in mind, so the code made assumptions about how sentences could be constructed dynamically. For example, in simplified form:
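```python
# Simplified: an English sentence glued together from fragments.
others = 6  # number of other people who liked the pin
message = "Bob"
if others:
    message += " and %d other" % others
    if others != 1:
        message += "s"
message += " liked your pin."
```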
Localizing those four fragments ("Bob", "and n other", "s" and "liked your pin") wouldn't work, because different languages have different rules for plurals, for the ordering of subjects, and for sentence construction in general.
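Plural rules alone show why an English-only rule like "add an s when n != 1" breaks. The sketch below transcribes three `Plural-Forms` expressions as given in the GNU gettext manual into Python functions that return which plural form a translation should use. The function names are hypothetical; real gettext reads the expression from each catalog's header.

```python
def plural_english(n):
    # nplurals=2; plural=(n != 1);  Spanish uses the same rule
    return 0 if n == 1 else 1


def plural_french(n):
    # nplurals=2; plural=(n > 1);  zero takes the singular form
    return 1 if n > 1 else 0


def plural_polish(n):
    # nplurals=3;
    # plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2
```

For zero, English picks the plural form while French picks the singular, and Polish needs a third form: 2 and 22 share one form while 5 and 12 take another. A single "s" fragment cannot express any of that.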
I spent a lot of time correcting these kinds of messages, and doing in-person code reviews to help other developers construct their own sentences. This is an ongoing process and probably the most difficult part of translation.
For a lot of these strings the change was simple, so I leaned heavily on vim macros (:set paste is your friend).
A tool that could find unmarked messages in our codebase would have been immensely useful. Instead, I wrote a script to build a "!!!YELLING!!!" translation, which uppercases every message and adds exclamation marks; any text that still appears in normal case with that translation active was never marked. Many teams, including the Firefox Add-ons team, use a similar strategy to find untranslated text.
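A pseudo-translation like that is easy to sketch. The functions below are a hypothetical reconstruction, not the original script: `yell` uppercases a message and wraps it in "!!!", but leaves placeholders such as `%(name)s`, `%d` and `{count}` untouched, because uppercasing `%(name)s` to `%(NAME)S` would break string formatting. `yell_catalog` applies it to every extracted msgid to produce a fake locale.

```python
import re

# Python %-style placeholders (e.g. %(name)s, %d, %%) and {}-style fields.
PLACEHOLDER = re.compile(
    r"%(?:\([^)]*\))?[-#0 +]*\d*(?:\.\d+)?[diouxXeEfFgGcrsa%]|\{[^{}]*\}"
)


def yell(message):
    """Return a '!!!YELLING!!!' pseudo-translation of message."""
    parts = []
    last = 0
    for match in PLACEHOLDER.finditer(message):
        parts.append(message[last:match.start()].upper())
        parts.append(match.group(0))  # placeholders must stay unchanged
        last = match.end()
    parts.append(message[last:].upper())
    return "!!!" + "".join(parts) + "!!!"


def yell_catalog(msgids):
    """Map every extracted msgid to its pseudo-translation."""
    return {msgid: yell(msgid) for msgid in msgids}
```

Formatting still works on the result: `yell("Hello %(name)s") % {"name": "Dave"}` gives "!!!HELLO Dave!!!", so any lowercase text left on a page points to a string that was never marked.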
My colleague, Sarah, has been building a translation team. Together we figured out how we wanted to start, and we decided to hire native translators who are familiar with our site. The feedback and discussion we've had around Latin American Spanish has been great. It helps us identify things that are difficult to translate and write better message strings and context, so that a translator can effectively help write copy for our site.
Context can be a screenshot (we use CloudApp heavily) or just a lengthy comment explaining where a word is used on the site.
To facilitate the actual translation, we use Transifex as a translation hub:
- We upload our extracted message template to Transifex.
- Transifex merges those strings into templates for each language.
- Translators can download those language files and translate them, or they can translate them on the Transifex site directly.
- We download the translated strings.
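The merge in the second step is conceptually similar to what gettext's `msgmerge` tool does; we don't depend on how Transifex implements it internally. A simplified sketch, assuming each language's catalog is a plain `{msgid: msgstr}` dict (real tools also handle plurals, comments and fuzzy matching): existing translations are kept, new messages arrive untranslated, and translated messages that left the template are set aside as obsolete. `merge_template` and `untranslated` are hypothetical names.

```python
def merge_template(template, existing):
    """Update one language's catalog from a freshly extracted template.

    template: iterable of msgids extracted from the codebase.
    existing: {msgid: msgstr} for one language ("" means untranslated).
    Returns (updated_catalog, obsolete_entries).
    """
    updated = {msgid: existing.get(msgid, "") for msgid in template}
    # Keep translations of removed messages aside instead of losing them.
    obsolete = {msgid: msgstr for msgid, msgstr in existing.items()
                if msgid not in updated and msgstr}
    return updated, obsolete


def untranslated(catalog):
    # What translators still need to work on for this language.
    return sorted(msgid for msgid, msgstr in catalog.items() if not msgstr)
```

Merging a template of "Login" and "Repin" into a Spanish catalog that only translated "Login" keeps that translation and leaves "Repin" as the one remaining item for translators.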
We've automated this process: we upload our messages weekly and download translations twice a day. Developers only need to worry about marking new strings; everything routine is done for them.
This process has gone well. We really liked focusing on one language to begin with; it helped us narrow our focus. We learned that it's very useful to pick a small set of translators and eliminate as many layers as you can between engineer and translator. Keeping that loop tight keeps translation quality high.
We had a lot of outside help. My former team, the web devs at Mozilla, and specifically Wil Clouser, helped create some useful tools like tower, which builds on gettext and Babel; those libraries make light work of supporting internationalization. Many people, including Dan from Dropbox and Dimitri from Transifex, have given us great advice, both technical and operational. We've also had a lot of help from translators, volunteers and friendly Pinterest users, so thanks!
I'm excited to have Pinterest available to Spanish speakers (European Spanish is coming soon), and I'm excited to continue to bring this to the rest of the world.