Content Localization Strategies

Hello, Habr! I present to you the translation of the article "Strategien zur Lokalisierung von Content" by Nicolai Goschin.



Setting up content localization, and with it the language of the product interface, so that each user sees the right language is extremely important for every digital platform.



Background and Preliminary Considerations



Digital projects intended for audiences in different countries or language areas benefit from a localization strategy. This raises the following question: which users should be shown which content, and in which languages? At first glance the question seems simple. Later in this article we will show why the topic is, in fact, complicated, and how to deal with that complexity.



As we delve deeper into the topic of localization in this article, there are two mechanisms that we need to understand from the very beginning. The first is the browser language setting, and the second is the user's IP address.



Browser Language Setting



Each time a website is requested, the web browser automatically sends its language setting to the server. The user can configure this setting in the browser; by default it is the language of the operating system. Keep in mind that most users are unaware that they can change it. A language setting usually consists of two parts: the language itself and the region. Germany uses de-de (German as used in Germany), Austria uses de-at (German as used in Austria), and the US uses en-us.



In addition, the user can specify a list of languages in order of preference, for example: en-us, en, de. In this case, the user's first choice is English for the US region, region-independent English is the second choice, and region-independent German is the least preferred option.
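Browsers transmit this preference list in the Accept-Language request header, where optional q-values express the order. The following is a minimal sketch of how a server could turn such a header value into an ordered list; the function name parse_accept_language is hypothetical, and a production system would more likely rely on the parsing built into its web framework.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unparsable weight: treat the tag as not acceptable
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality; ties keep the order given in the header.
    return [tag for _, _, tag in sorted(weighted)]


print(parse_accept_language("en-US,en;q=0.9,de;q=0.8"))  # ['en-us', 'en', 'de']
```

Tags are lowercased here only to match the notation used in this article.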



User IP



An IP address (“IP” for short) is a user's “Internet address”: an assigned number that identifies the user's connection on the Internet and carries information about its location. For example, the IP address reveals the country from which a visitor is accessing the site. This is possible because certain IP ranges are assigned to individual countries. For example, IP addresses between 2.16.240.0 and 2.16.255.255 are assigned to Germany. If the user has an IP address of 2.16.250.100, we know that this person is connecting to the Internet from Germany.
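The range check itself can be sketched with Python's standard ipaddress module. The table below contains only the single range mentioned above and is purely illustrative; the names COUNTRY_RANGES and country_for_ip are hypothetical, and a real site would query a maintained GeoIP database instead of a hand-written list.

```python
import ipaddress
from typing import Optional

# Illustrative table: (first address, last address, country code).
COUNTRY_RANGES = [
    (ipaddress.ip_address("2.16.240.0"), ipaddress.ip_address("2.16.255.255"), "DE"),
]


def country_for_ip(ip: str) -> Optional[str]:
    """Return the country code whose range contains ip, or None if unknown."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return None  # not a valid address, e.g. an octet above 255
    for first, last, country in COUNTRY_RANGES:
        # Only compare addresses of the same IP version (IPv4 vs. IPv6).
        if address.version == first.version and first <= address <= last:
            return country
    return None


print(country_for_ip("2.16.250.100"))  # DE
print(country_for_ip("8.8.8.8"))       # None (not in the illustrative table)
```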



It should be noted that there are other methods that can be used to determine the location of the user. However, at this point we omit them, because they ultimately provide the same information as the IP address.



Thus, we now know that there are two sources from which information about the language or location (country) of the user can be obtained. At this stage, we will consider how we can use this information for localization, that is, adaptation of content to different languages.



Linguistic localization



The simplest and most common form of localization is linguistic, based on the browser language setting. This method assumes that the user has set the desired language in their browser settings.



In Germany, most users have de-de, de and en configured. This combination means that German content intended for Germany (de-de) is preferred. If such content is not available on a given website, German content for any other region is used, even if it is not aimed at Germany (de). If there is no German content at all, English is used as the final fallback.



Consider the following scenario: an online magazine with German, English, and Arabic versions. Here, all users whose language code starts with de should receive German content. In other words, these are all users whose main language is de-de, de-at, de-ch, de and so on.
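This fallback logic can be sketched as a small matching function: try each preferred tag exactly, then its region-independent base language, and only then fall back to a default. The name pick_language and the choice of English as the default are assumptions made for this illustration.

```python
from typing import Iterable, List


def pick_language(preferred: List[str], available: Iterable[str], default: str = "en") -> str:
    """Return the first available language that matches the user's preference list."""
    available_set = {tag.lower() for tag in available}
    for tag in preferred:
        tag = tag.lower()
        if tag in available_set:
            return tag  # exact match, e.g. "de-de" if the site offers it
        base = tag.split("-")[0]
        if base in available_set:
            return base  # regional tag such as "de-at" falls back to "de"
    return default


site_languages = ["de", "en", "ar"]
print(pick_language(["de-de", "de", "en"], site_languages))  # de
print(pick_language(["de-at"], site_languages))              # de
print(pick_language(["fr"], site_languages))                 # en
```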



For users who understand English or Arabic, the situation is less straightforward. The German-speaking countries (grouped together as the so-called DACH region) border each other geographically, but this is not true of English-speaking or Arabic-speaking countries. For example, English is spoken in the USA, England and Australia. In addition, in most countries of the world English is the language people understand best after their mother tongue. That is why it is often listed as an additional language in browser settings.



Thus, if in our described scenario we configured the site to be localized solely based on the browser language, users from the USA and Australia will receive our English-language content. Users from Egypt and the United Arab Emirates will see Arabic content. So far, so good.



Disadvantages of localization depending on the language setting of the browser

This type of language detection becomes problematic if the language set in the browser does not match the user's native language. This may be the case, for example, when a German-speaking user works in Germany for an international company in which the operating system and, by default, the browser are set to English (en). This user will see the content in English, even though their native language is German.



A similar problem occurs in countries where the official language or business language is usually English, but the population speaks a different language. This is the case, for example, in countries such as the United Arab Emirates.



IP-based geographic localization



The disadvantages of language localization are partially offset by IP-based localization. In the latter method, the language is determined based on the country from which the user has access to the Internet.



At first glance, IP-based localization seems to be a watertight solution, because it covers the case described above in which the browser is configured with a different language. With this method, a user in Germany always receives content in German, even if their browser is set, for example, to English as the main language.



Disadvantages of IP Localization



So, is IP-based localization a panacea? Anyone who thinks so is wrong. The underlying assumption is that all users located in a country are native speakers of that country's language. And this, of course, is far from reality. For example, someone who is in Germany but only speaks English will be shown the German version of every site, even if the site is also available in their native language.



Finally, IP-based localization ignores browser language settings and is based solely on location. For example, we encounter this drawback when we work on the Internet while on vacation and do not see any content in our native language. Instead, web pages are displayed only in the language of the country in which we are located.



Combined localization



To arrive at a better solution, the two approaches described above can be combined so that these borderline cases are handled properly. These are the cases in which we should not rely solely on the IP address or solely on the browser language. As described above, this applies to users who do not speak the language of the country they are in and to users whose browser language preferences do not reflect the language they actually want.



And here is how we handle such cases:



  1. We use IP localization as the main criterion, that is, we start from the geographical location of the user, for example Germany.
  2. Then we check whether the language of that country is also set in the browser language settings. If there is a match, we display the content in that language. If the two data sources do not match, we still use IP localization. The main assumption here is that a user in a given country probably has at least some command of the national language.
  3. Finally, we check whether the content is available in other languages listed in the browser. If so, we display a pop-up window (similar to a cookie notification) informing the user that the web page is also available in the alternative languages listed in their browser settings. Visitors can then switch to another language or close the pop-up window with one click.
  4. Cookies are used to record whether the user has switched the language or dismissed the pop-up window, so that in the next session the content is displayed in the selected language.
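The four steps above can be sketched as a single decision function. Everything named here is hypothetical: the COUNTRY_LANGUAGE table, the AVAILABLE languages (taken from the online magazine scenario), the default of English, and the two cookie-derived parameters. It is a sketch of the described rules under those assumptions, not a complete implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical configuration for the magazine scenario.
COUNTRY_LANGUAGE = {"DE": "de", "AT": "de", "CH": "de", "US": "en",
                    "AU": "en", "EG": "ar", "AE": "ar"}
AVAILABLE = ("de", "en", "ar")
DEFAULT = "en"  # assumption: used when the country is unknown


@dataclass
class Decision:
    display: str                                     # language the page is rendered in
    offer: List[str] = field(default_factory=list)  # languages offered in the pop-up


def decide(country: Optional[str], browser_languages: List[str],
           cookie_language: Optional[str] = None,
           popup_dismissed: bool = False) -> Decision:
    # Step 4: a language chosen in an earlier session (stored in a cookie) wins.
    if cookie_language in AVAILABLE:
        return Decision(cookie_language)
    # Steps 1 and 2: the IP-derived country decides, whether or not the
    # browser languages agree with it.
    display = COUNTRY_LANGUAGE.get(country, DEFAULT)
    if popup_dismissed:
        return Decision(display)  # the user closed the pop-up before
    # Step 3: offer other browser languages for which content exists.
    offer: List[str] = []
    for tag in browser_languages:
        base = tag.lower().split("-")[0]
        if base in AVAILABLE and base != display and base not in offer:
            offer.append(base)
    return Decision(display, offer)


print(decide("EG", ["de-de", "de"]))  # Decision(display='ar', offer=['de'])
```

Returning the pop-up candidates alongside the display language keeps the rule logic separate from rendering, so the same function can serve both the first visit and later sessions.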


For example, a user accessing the Internet from Egypt, but using a browser with German set as the main language, will see such a pop-up window. Content will initially be displayed in Arabic. In addition, the user will see the following message in German: “This website is also available in German. Want to switch to the German version?”



Now we can apply the same logic to various alternative languages (languages that are displayed if the required language is not available) by defining specific rules.



Access via Google



Another advantage of this differentiated method is that it gives you better control over access to the website through search engines such as Google. Search engines take into account the language of the browser and not necessarily the location of the user. Thus, a user who reaches the site through a search engine is always directed to a version that matches the language of the browser, even if another version would be a better match for the location (based on the IP address). The user can still switch to other relevant language versions through the pop-up window described above.



Conclusion



The combination “content-language-user” matters not only for usability and user convenience, but also for marketing and strategy. The approach described above therefore does not claim to be the only correct one: the specific goal of the project is the decisive factor. However, if you take into account both the location and the language (that is, the IP address and the browser language setting), the results will be much better, since borderline cases can also be handled correctly.


