Manage Localization of .NET Apps
Learn how to localize your applications easily with this robust localization and resource-management tool.
- By Liewen Huang
Technology Toolbox: VB.NET, C#, ASP.NET, C++.NET
Suppose a salesperson on the other side of the world says he needs your enterprise application to support another culture by tomorrow morning so he can demonstrate it to prospective customers in a potentially large market. Whether you respond "No way" or "No problem" depends on how you manage the localization of your application.
The two major steps to internationalize your application are globalization and localization. In the globalization step, you write a culture-neutral and language-neutral application, with the executable code separated from its resources. Visual Studio .NET supports localization of UI element properties in a convenient way. When you set the Localizable property of a Windows Form to true, VS.NET puts the property values of the form's UI elements in an XML-based resource file (.resx). For example, if a form contains a button whose text is Save, the Form1.resx file contains a data element named button1.Text whose value is Save.
VS.NET also modifies the form's InitializeComponent method to use the ResourceManager class to access strings or other resources from the .resx file. You can put the localizable string tables, such as error messages, in the .resx files and use the ResourceManager class to access these values at run time.
Localization, on the other hand, involves translating the strings and creating the language-specific resource files with the translated strings for all target languages. You can also use the VS.NET IDE to create language-specific resource files. After you set the language in the Windows Forms designer, the localized value for that language is stored in a language-specific resource file that contains the culture as part of its name. For example, if you set the Language property to Spanish and type Guarde as the text for the same button used in the previous example, the Form1.es.resx file contains a data element with the same name, button1.Text, whose value is Guarde.
At run time, the application extracts and uses this translation when the thread's CurrentUICulture property is set to Spanish.
There are several challenges to the localization process. The first involves gathering the translatable strings. Manually documenting all the strings that need translation (and keeping the documentation up to date as developers continue working on the application) is time-consuming and prone to error. Most of the values in the VS.NET-generated Windows Forms resource files are localizable, but only some of them need translation. For example, both the Text and the Anchor properties of a button are localizable, meaning they can have different values in different cultures. However, usually only the Text value needs to be translated into the target language. If an Anchor property value, such as "Top, Left", is mistakenly translated into another language, VS.NET throws an error. Ideally, all strings stored in the developer-created string table resource files would be translatable, but developers might put non-translatable strings in these files, such as SQL statements. Translating "Select * from Books" into another language would certainly break your application.
The other challenge is creating language-specific resource files from the translated strings. VS.NET tools, such as the IDE or WinRes.exe, require manual input of translated strings, so using these tools can be cumbersome and slow even for small applications. You should focus only on writing the code in the invariant culture when developing the applications; dealing with localized strings negatively affects the schedule to develop and deliver your applications. Outsourcing to a vendor to generate the resource files is also not desirable because it is expensive and time-consuming, and the process is beyond your control. Localizing your application also becomes troublesome as your application grows and more localizable strings are added, or as you need to support more languages, if you don't have a systematic and automatic way to manage it.
Get Help From an LRM
You can solve these problems by managing the localization process through a systematic approach, and controlling this process with a robust and sophisticated localization and resource management tool (LRM). An LRM can turn your application's localization into an automatic and highly efficient process by gathering translatable strings programmatically and generating all the language-specific resource files dynamically (see Figure 1 for a breakdown of the LRM workflow).
There are many benefits to managing your localization process with an LRM, in addition to saving time and money. For example, an LRM generates resource files on the fly, so you don't need to freeze the source code before you localize your application. An LRM analyzes the strings in the resource files and eliminates the strings that don't need translation to lower the translation cost. An LRM can also give feedback on the globalization and the localizability of your applications while it works on the resource strings, which saves testing and code-reviewing time.
An LRM manages any source code updates automatically. For example, if a menu item is added to the application, an LRM will detect it and put a new resource string for the text of the menu item into the Resource Repository. Moreover, when this systematic localization approach is coupled with machine translation, you can easily add one or more languages to the set your application supports, even if your application is an enterprise suite.
An LRM works with a Resource Repository that holds all the translated resource strings for the supported cultures. When the LRM runs, it navigates the source-code hierarchy to read all .NET project files to gather the resource files included in each .NET project. For each resource file, it analyzes all the resource data to determine if that resource needs to be translated. If a resource needs to be translated, the LRM checks the Resource Repository for each target culture to determine if a string with the same context (resource data name and resource filename) exists. If it does, the LRM uses the translation to create the language-specific resource file.
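The per-file analysis step starts by reading every data element out of a .resx file. The following is a minimal sketch in Python (chosen only for brevity; the same logic ports directly to C# or VB.NET). The sample XML, the ResxEntry type, and the read_resx function are illustrative assumptions, not part of the LRM described here.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

class ResxEntry(NamedTuple):
    name: str
    value: str
    type: Optional[str]      # present for non-string resources such as Anchor
    mimetype: Optional[str]  # present for serialized or binary resources

# A trimmed, hypothetical Form1.resx body (schema and headers omitted).
SAMPLE_RESX = """<root>
  <data name="button1.Text">
    <value>Save</value>
  </data>
  <data name="button1.Anchor" type="System.Windows.Forms.AnchorStyles, System.Windows.Forms">
    <value>Top, Left</value>
  </data>
  <data name="&gt;&gt;button1.Name">
    <value>button1</value>
  </data>
</root>"""

def read_resx(xml_text: str) -> list:
    """Return one ResxEntry per <data> element in the .resx content."""
    root = ET.fromstring(xml_text)
    entries = []
    for data in root.findall("data"):
        entries.append(ResxEntry(
            name=data.get("name"),
            value=data.findtext("value", default=""),
            type=data.get("type"),
            mimetype=data.get("mimetype"),
        ))
    return entries

for entry in read_resx(SAMPLE_RESX):
    print(entry.name, "=", entry.value)
```

Keeping the type and mimetype attributes alongside each name and value gives the later rule checks everything they need without re-reading the file.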
A Resource Repository can be a single file or multiple files of any desired format. For example, you could use Excel files for the Resource Repository by using a file for each culture. On each spreadsheet, four columns represent the Invariant String, Localized Value, string Name, and resource file Path. The combination of Name and Path is the context of the resource string, and the Localized Value is the translation for the particular language. For example, a row in a spreadsheet for the Spanish culture might have Save for the invariant string, Guarde for the localized value, button1.Text for the name, and c:\myproject\Form1.resx for the Path.
Use the XML Spreadsheet format if you use an Excel file for the Resource Repository. You can version files in XML format easily, but the default Excel Workbook format doesn't allow that. Excel spreadsheets provide useful editing functionality, but each cell can only hold a string of limited length. Using native XML format for a Resource Repository would be a better choice if your resources contain long strings and if you don't want to spread the strings to multiple rows, or if resources contain image files.
The LRM determines if any resource data in the resource file needs to be translated through two kinds of rules: name-based (used for checking the name of resource data) and value-based (used for checking the value).
Name-based rules are based on the patterns of the names. For example, if a resource name ends with .Text, you need to translate it. If a resource name starts with ">>" or ends with .Filter, you do not need to translate it. Usually, name-based rules can return either true or false. You should also define name-based rules for the developer-defined string table resource files to indicate explicitly whether a resource string needs to be translated. For example, you might define that a resource string needs to be translated if its name ends with _tr, and doesn't need to be if it ends with _nt. Resource data that doesn't match any name-based rule will be evaluated with value-based rules.
Unlike name-based rules, value-based rules usually return only false; that is, if a resource value meets a value-based rule, you don't need to translate it. Almost all the resources in the VS.NET-generated Windows Forms resource files do not need to be translated if their type or mimetype is not null. You also don't need to translate a resource if its string doesn't contain letters, if it fits a SQL statement pattern, if it fits a URL pattern, and so on. You need to translate any resource string that doesn't match any name-based or value-based rule.
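A minimal sketch of the name-based rules just described, written in Python for brevity. The function name and the three-state return value (True, False, or None for "no name rule applies") are assumptions made for illustration; the suffixes and prefix come from the examples in the text.

```python
from typing import Optional

def name_rule(name: str) -> Optional[bool]:
    """Return True (translate), False (don't translate), or None (undecided)."""
    # Designer metadata and file-dialog filters are never translated.
    if name.startswith(">>") or name.endswith(".Filter"):
        return False
    # Explicit developer marker: not translatable.
    if name.endswith("_nt"):
        return False
    # UI text and explicit developer marker: translatable.
    if name.endswith(".Text") or name.endswith("_tr"):
        return True
    # No name-based rule matched; fall through to value-based rules.
    return None

print(name_rule("button1.Text"))    # True
print(name_rule(">>button1.Name"))  # False
print(name_rule("button1.Anchor"))  # None
```

Returning None rather than False for unmatched names keeps the two rule families separate: only names that no rule recognizes move on to the value checks.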
You can design the rules that work for you based on the contents of the string table resource files in your application. It's important to ensure that the rules work as expected, so the LRM writes all the excluded resources that match the value-based rules to the log for review. For example, the LRM can write the excluded strings to an Excel file named excluded.xls that contains several spreadsheets, where each spreadsheet is dedicated to a value-based rule.
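The value-based rules and the per-rule exclusion log could be sketched as follows, in Python for brevity. The regular expressions, rule labels, and function names are assumptions for illustration; real rules would be tuned to your own resource files, and the grouped result stands in for the per-rule spreadsheets in excluded.xls.

```python
import re
from collections import defaultdict

# Deliberately simple patterns; both are illustrative assumptions.
SQL_PATTERN = re.compile(r"^\s*(select|insert|update|delete)\b", re.IGNORECASE)
URL_PATTERN = re.compile(r"^(https?|ftp)://", re.IGNORECASE)

def matching_value_rule(value, type_name=None, mimetype=None):
    """Return the label of the value-based rule that excludes the resource,
    or None if no rule matches (meaning the string should be translated)."""
    if type_name is not None or mimetype is not None:
        return "typed"
    if not any(ch.isalpha() for ch in value):
        return "no-letters"
    if SQL_PATTERN.search(value):
        return "sql"
    if URL_PATTERN.search(value):
        return "url"
    return None

def partition(entries):
    """Split (name, value, type, mimetype) tuples into strings to translate
    and a per-rule log of excluded resources for later review."""
    to_translate = []
    excluded = defaultdict(list)
    for name, value, type_name, mimetype in entries:
        rule = matching_value_rule(value, type_name, mimetype)
        if rule is None:
            to_translate.append((name, value))
        else:
            excluded[rule].append((name, value))
    return to_translate, dict(excluded)
```

Grouping the exclusions by rule label makes it easy to spot a rule that is too aggressive, because all of its matches appear together.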
You can review the exceptions and modify your rules accordingly, if desired. For example, the SQL statement pattern value-based rule might mistakenly exclude a string that needs to be translated, such as "Select your option from the following list." In this case, you might consider adding more conditions to the SQL statement pattern value-based rule. One possible option is to check whether the string ends with a period. For resources that even your best value-based rules classify incorrectly, you must change the resource names to fit the name-based rules, so the LRM recognizes them explicitly as needing (or not needing) translation. In this example, add _tr to the end of the resource name so the string matches the name-based rule and will be translated. Publish all the rules to your developers after you finish setting them up.
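As a sketch of the refinement suggested above, the SQL rule can treat a string that ends with a period as prose rather than as a statement. The pattern and function name are illustrative assumptions, shown in Python for brevity.

```python
import re

# Illustrative pattern: a string that starts with a common SQL verb.
SQL_START = re.compile(r"^\s*(select|insert|update|delete)\b", re.IGNORECASE)

def looks_like_sql(value: str) -> bool:
    """Return True if the value should be excluded as a SQL statement."""
    if not SQL_START.search(value):
        return False
    # Extra condition: a trailing period suggests an ordinary sentence.
    return not value.rstrip().endswith(".")

print(looks_like_sql("Select * from Books"))                          # True
print(looks_like_sql("Select your option from the following list."))  # False
```

No heuristic of this kind is complete, which is why the explicit _tr and _nt name suffixes remain the fallback for strings the value rules get wrong.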
Generate Language-Specific Resource Files
For each target culture and for each localizable resource string, the LRM determines if the string with the same context has been translated already and recorded in the Resource Repository. If it has, the LRM will use the translation for this string to create the language-specific resource file. If it has not, the LRM will check the repository to see if the same string with a different context has been translated. If so, this translation will be re-used, and a new row with the same translation, but with new context, will be written to the Resource Repository.
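The lookup order described above (same context first, then the same string in a different context) might be sketched like this in Python. The Row type and resolve function are hypothetical names; comparing the invariant string in the first pass is an added assumption that guards against stale translations.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    invariant: str   # Invariant String column
    localized: str   # Localized Value column
    name: str        # resource name, e.g. button1.Text
    path: str        # resource file path

def resolve(rows: list, invariant: str, name: str, path: str) -> Optional[str]:
    """Find a translation for one string in one culture's repository rows."""
    # Pass 1: same context (name + path) and same invariant string.
    for row in rows:
        if row.name == name and row.path == path and row.invariant == invariant:
            return row.localized
    # Pass 2: same string in another context; reuse it and record a new row.
    for row in rows:
        if row.invariant == invariant:
            rows.append(Row(invariant, row.localized, name, path))
            return row.localized
    # Never translated in any context.
    return None

path = r"c:\myproject\Form1.resx"
rows = [Row("Save", "Guarde", "button1.Text", path)]
print(resolve(rows, "Save", "menuItem1.Text", path))  # Guarde (reused)
print(len(rows))                                      # 2
```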
For each resource file and for each target culture, the LRM creates a text-based resource file that contains a <name>=<localized value> pair for each resource string. For example, a text-based resource file for the Chinese culture would contain one line per resource string, such as button1.Text followed by an equal sign and the Chinese translation of Save.
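A minimal sketch of producing the name=value text, in Python for brevity. The function name is hypothetical, the Chinese string is only an illustrative translation of Save, and the escaping of backslashes, newlines, and tabs is an assumption you should verify against the ResGen documentation for your framework version.

```python
def to_resgen_text(pairs: dict) -> str:
    """Render {name: localized value} as name=value lines, one per resource."""
    lines = []
    for name, value in pairs.items():
        # Keep each pair on a single line by escaping control characters.
        escaped = (value.replace("\\", "\\\\")
                        .replace("\n", "\\n")
                        .replace("\t", "\\t"))
        lines.append(f"{name}={escaped}")
    return "\n".join(lines) + "\n"

print(to_resgen_text({"button1.Text": "保存", "label1.Text": "Line 1\nLine 2"}))
```

When you write this text to disk, choose a Unicode encoding such as UTF-8 so that non-ASCII translations survive the round trip.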
The LRM then uses the ResGen command to create the XML-based resource file. The LRM also modifies the project files, such as .csproj for C# projects or .vbproj for VB.NET projects, to include the newly created language-specific resource files. The project is then ready to be compiled. In practice, you can include the LRM in the build process after retrieving the source code from the source control and before the compilation begins.
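The conversion step can be driven from a script. The sketch below (Python, for brevity) builds the basic ResGen command line of an input file followed by an output file; the helper names are hypothetical, and it assumes resgen is on the PATH of the build machine.

```python
import subprocess
from pathlib import Path

def resgen_command(text_file: str) -> list:
    """Build the command: resgen <input .txt> <output .resx>."""
    output = str(Path(text_file).with_suffix(".resx"))
    return ["resgen", text_file, output]

def run_resgen(text_file: str) -> None:
    """Run ResGen and raise CalledProcessError if it reports a failure."""
    subprocess.run(resgen_command(text_file), check=True)

print(resgen_command("Form1.zh.txt"))
```

Separating command construction from execution lets a build script log or dry-run the commands before invoking the tool.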
Running ResGen to generate thousands of resource files could be expensive. You can improve the LRM's performance greatly by first checking the date-time stamps of the default resource file, language-specific text file, language-specific resource file, and Resource Repository to determine if the LRM needs to create the language-specific text file. Next, compare the contents of the new text file with the previous text file to determine if the LRM needs to generate a new XML-based resource file.
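The two-stage check might look like the following sketch, in Python for brevity. The function names and the exact comparison policy (any input newer than the oldest output triggers regeneration) are assumptions; the text above only names which files to compare.

```python
import os
from pathlib import Path

def needs_new_text_file(default_resx, repository, text_file, localized_resx) -> bool:
    """Stage 1: compare timestamps of inputs against existing outputs."""
    outputs = [text_file, localized_resx]
    if not all(os.path.exists(p) for p in outputs):
        return True
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    newest_input = max(os.path.getmtime(p) for p in (default_resx, repository))
    return newest_input > oldest_output

def write_if_changed(text_file, new_content: str) -> bool:
    """Stage 2: rewrite the text file only when its content differs.
    Returns True when ResGen must run again for this file."""
    path = Path(text_file)
    if path.exists() and path.read_text(encoding="utf-8") == new_content:
        return False
    path.write_text(new_content, encoding="utf-8")
    return True
```

The second stage matters because a repository edit for one string updates the repository's timestamp for every resource file, yet most generated text files will come out identical.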
The LRM writes a row to the Resource Repository if a string has not been translated in any context. The Localized Value for this row can be the string in the invariant culture if a translation engine is not available for immediate translation. (You can still use machine translation to translate the strings after the LRM finishes its process.) For example, if a menu item called Save is added to the application, the LRM detects it and adds a row to the Resource Repository in which both the Invariant String and the Localized Value are Save, the Name could be menuItem1.Text, and the Path could be c:\myproject\Form1.resx. A human interpreter or a machine translation engine will later replace the string in the Localized Value column (Save) with the translation.
In certain situations, you might need to modify the string before the LRM saves it to the Resource Repository. For Asian languages, if an access key "&" is included in a string in the invariant culture, you might want to move the access key to the end and put parentheses around it. For instance, if the invariant string is "&Save", the localized value saved by the LRM would be "Save (&S)". Then the human interpreter or machine translator only needs to translate Save and keep the access key. In this case, the translated value for the Chinese culture would be the Chinese word for Save followed by " (&S)".
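A sketch of the access-key transformation, in Python for brevity. The function name is hypothetical, and treating a doubled ampersand as a literal "&" is an assumption based on the usual Windows Forms convention.

```python
def move_access_key(text: str) -> str:
    """Move the first access key to the end, e.g. '&Save' -> 'Save (&S)'."""
    i = 0
    while i < len(text) - 1:
        if text[i] == "&":
            if text[i + 1] == "&":
                # '&&' is an escaped literal ampersand; skip both characters.
                i += 2
                continue
            key = text[i + 1].upper()
            # Drop the marker in place and append the key in parentheses.
            return f"{text[:i]}{text[i + 1:]} (&{key})"
        i += 1
    return text  # no access key found

print(move_access_key("&Save"))     # Save (&S)
print(move_access_key("Save &As"))  # Save As (&A)
```

The caller decides which target cultures get this treatment; the function itself is culture-agnostic.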
Manage Translations and Add Cultures
There are two situations in which the LRM will write new rows to the Resource Repository. One situation occurs when a row reuses the existing translation for the same invariant string value but a different context. The other situation occurs when the string has never been translated. In either case, the new row is subject to review or translation by a human interpreter. The LRM can help the interpreter find these strings easily by marking the resource rows when writing them to the Resource Repository. For example, you can add another column, Needs To Review, to the spreadsheet. When the LRM writes a resource row to the spreadsheet, it will put a Y in the column. After a human interpreter translates the localized value, or after a reused translation is reviewed, the interpreter changes the Y to an N. However, if the translation is done by machine translation, which is not always accurate, the Y flag should be kept and the translation should be reviewed by a human.
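The review flag could be modeled as one more field on each repository row. The sketch below is in Python for brevity; the Row fields mirror the spreadsheet columns, while the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Row:
    invariant: str
    localized: str
    name: str
    path: str
    needs_review: str = "Y"  # every row the LRM writes starts as Y

def record_translation(row: Row, translation: str, by_machine: bool) -> None:
    """Store a translation; only human work clears the review flag."""
    row.localized = translation
    row.needs_review = "Y" if by_machine else "N"

def pending_review(rows: list) -> list:
    """Rows a human interpreter still has to translate or check."""
    return [row for row in rows if row.needs_review == "Y"]

path = r"c:\myproject\Form1.resx"
rows = [Row("Save", "Save", "menuItem1.Text", path),
        Row("Open", "Open", "menuItem2.Text", path)]
record_translation(rows[0], "Guarde", by_machine=False)
print([row.name for row in pending_review(rows)])  # ['menuItem2.Text']
```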
For a new neutral culture, the LRM will gather all the strings that need to be translated and add them to the Resource Repository. For instance, it could create a new Excel file with a spreadsheet for each neutral culture. Both the file and the spreadsheet will have the same name as the culture name. For example, if French is added, the LRM creates a file named fr.xls and a spreadsheet named fr. For demonstrating or testing purposes, you can use machine translation, such as the Translate feature in Microsoft Word, to translate the strings in the Localized Value column.
The LRM in the online example code supports two ways to add a new neutral culture: by hard-coding the culture in the LRM, or by creating an empty Excel file in the folder that houses the Resource Repository files. Adding this folder and the files to the source control allows anyone having access to these files to review the translation and make changes without touching the source code.
An LRM can also support specific cultures with an approach similar to that for supporting neutral cultures. You can add a spreadsheet for a specific culture to its parent-neutral culture's Excel file. For example, if a translation in a specific culture, such as German (Luxembourg), is different from what it is in the neutral culture (German), you could create a new spreadsheet, de-LU, in the de.xls file and put the translation in this spreadsheet.
The difference between the spreadsheets for the neutral culture and the specific culture is that an LRM reads and writes to the spreadsheets for the neutral culture and only reads from the spreadsheets for the specific culture. That is, the spreadsheets for the neutral culture contain all the strings that need to be translated for the entire application, whether they have been translated or not, but the spreadsheets for the specific culture contain only the resource strings that you input.
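The read-only overlay for specific cultures might be sketched as a two-level lookup, in Python for brevity. The sheets dictionary, the lookup function, and the German strings are illustrative placeholders rather than real repository data.

```python
from typing import Optional

def lookup(sheets: dict, culture: str, context: tuple) -> Optional[str]:
    """Prefer the specific culture's sheet (such as de-LU), then fall back
    to its parent neutral culture's sheet (de). Nothing is written here."""
    specific = sheets.get(culture, {})
    if context in specific:
        return specific[context]
    neutral = culture.split("-")[0]
    return sheets.get(neutral, {}).get(context)

path = r"c:\myproject\Form1.resx"
sheets = {
    "de": {("button1.Text", path): "Speichern",
           ("button2.Text", path): "Abbrechen"},
    # Hand-entered override; the value is a placeholder for illustration.
    "de-LU": {("button1.Text", path): "Sichern"},
}
print(lookup(sheets, "de-LU", ("button1.Text", path)))  # Sichern
print(lookup(sheets, "de-LU", ("button2.Text", path)))  # Abbrechen
```

Because the specific-culture sheet holds only overrides, it stays small and every other string continues to track the neutral culture automatically.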
Support Localization Rather Than Translation
You use the LRM mainly for translating strings; however, you can easily expand it to localize other values, such as properties of a UI control. For example, you might want the size of a button in one culture to be different from its size in the other cultures. One way to support this is to add another column, Enforced, to the Resource Repository to indicate whether a row in the repository is enforced, and then modify the LRM accordingly. For example, if the width of button1 on Form1 is 4 for the invariant culture and 5 for a particular language, you could put a row in the spreadsheet with 4 for the Invariant String, 5 for the Localized Value, button1.Width for the Name, c:\myproject\Form1.resx for the Path, and Y for Enforced.
Then, when the LRM reads the resource file c:\myproject\Form1.resx, it would use 5 as the localized value for button1.Width, completely bypassing the name-based or value-based rules.
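A sketch of how the Enforced column could short-circuit the rules, in Python for brevity. The dictionary keys mirror the spreadsheet columns; choose_value and the needs_translation callback are hypothetical names standing in for the LRM's rule evaluation.

```python
from typing import Callable, Optional

def choose_value(name: str, value: str, row: Optional[dict],
                 needs_translation: Callable[[str, str], bool]) -> Optional[str]:
    """Return the value to write to the language-specific file, or None
    when the resource should be left to fall back to the default file."""
    if row is None:
        return None
    # Enforced rows bypass the name-based and value-based rules entirely.
    if row.get("Enforced") == "Y":
        return row["Localized Value"]
    if needs_translation(name, value):
        return row["Localized Value"]
    return None

row = {"Invariant String": "4", "Localized Value": "5",
       "Name": "button1.Width", "Path": r"c:\myproject\Form1.resx",
       "Enforced": "Y"}
rules = lambda name, value: name.endswith(".Text")  # stand-in for real rules
print(choose_value("button1.Width", "4", row, rules))  # 5
```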
You can also write a utility that adds customized localized values to the Resource Repository programmatically. After you've run the LRM and created the language-specific resource files, you can use the VS.NET IDE to customize the controls (by resizing them, for example) for a particular culture. The IDE saves the customized value (such as the size) in the language-specific resource file. Then, you can run the utility to compare the entries in the language-specific resource file with the data in the Resource Repository and create new rows in the Resource Repository for the new data in the language-specific resource file.
You can support localizing images by adding another column to the Resource Repository to indicate that the localized value for the row is the path for the image file. You need to modify the LRM to use the ResourceWriter class to add the image to the language-specific resource file after LRM generates the file with the ResGen command.
Using an LRM offers you many different options for localizing your VS.NET applications quickly and efficiently. For example, you might want to know the context of a translated string, or you might simply want to know if translation breaks your application. You can find these things out by having your application support a rare language that your company is not currently planning to support. In the Localized Value column on the spreadsheet for that language, you could input a number concatenated with the invariant string instead of a translation. The number would be the sequence number of the resource on the spreadsheet. That is, if you have two Open strings, they might become 78-Open and 315-Open if they appear on the 78th and 315th rows of the spreadsheet. When you run your application, you could figure out the context for each Open string from the sequence number at the front of the string and by referring back to the spreadsheet.
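Filling the Localized Value column with numbered strings takes only a few lines. This Python sketch uses a hypothetical function name and assumes the strings are passed in spreadsheet row order.

```python
def pseudo_localize(invariant_strings: list, first_row: int = 1) -> list:
    """Prefix each invariant string with its spreadsheet row number."""
    return [f"{row}-{text}"
            for row, text in enumerate(invariant_strings, start=first_row)]

print(pseudo_localize(["Open", "Save"], first_row=78))  # ['78-Open', '79-Save']
```

Any string that still appears without a number when you run the application under the test culture was either excluded by a rule or never placed in a resource file.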