[Return to Library] [Contents] [Previous Chapter] [Next Section] [Next Chapter] [Index] [Help]


10    Internationalization


10.1    Overview

The term "internationalization" is formally defined by X/Open as a "provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets," which means, essentially, that internationalized programs can run in any supported locale without having to be modified. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and a language as it is used in that area. For example, when a user selects a Chinese locale, all commands, system messages, and keystrokes can be in Chinese characters and displayed in a way idiomatic to Chinese dialects.
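
As an illustration of what "running in any supported locale without modification" looks like from a program's point of view, the following is a minimal Python sketch, not Digital UNIX code. The helper name select_locale is hypothetical, and the locale names passed to it are taken from Table 10-1; whether they are installed depends on the host, so the sketch falls back to the C locale.

```python
import locale


def select_locale(candidates):
    """Activate the first locale in `candidates` that this host supports.

    Returns the name that was accepted. The "C" locale is tried last
    because every conforming system provides it.
    """
    for name in list(candidates) + ["C"]:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            continue  # locale not installed on this host; try the next one
    raise RuntimeError("no usable locale, not even C")


if __name__ == "__main__":
    chosen = select_locale(["fr_FR.ISO8859-1", "en_US.ISO8859-1"])
    conventions = locale.localeconv()
    print("active locale:", chosen)
    print("decimal point:", repr(conventions["decimal_point"]))
```

The program text stays the same for every locale; only the name handed to setlocale changes, and the library then supplies the matching conventions.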

Digital UNIX Version 4.0 is an internationalized operating system that not only allows users to interact with Digital UNIX Version 4.0 in their native language, but also supports a full set of application interfaces, referred to as the X/Open Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The code came from the OSF and was enhanced by Digital.

The internationalization subsystem in Digital UNIX Version 4.0 is based on the POSIX 1003.2 and Single UNIX specifications. Commands, utilities, and libraries (including the curses library) have been internationalized, and the base system includes a set of enhanced US English message catalogs as well as message catalogs that support Asian languages. In addition, Digital UNIX Version 4.0 supports the X Input Method (XIM) and X Output Method (XOM), implemented according to the X11R6 specification, to facilitate input of local language characters, text drawing, measurement, and inter-client communication.

Note that Digital UNIX Version 4.0 also supports a 32-bit wchar_t datatype, which in turn enables support for a wide array of codesets, including the full ISO 10646 standard.


10.2    Supported Languages

Table 10-1 lists the languages supported in Digital UNIX Version 4.0 and their corresponding locales. The locales are built using new internationalization utilities and are more robust than those offered on previous versions of the operating system. Note that in Digital UNIX Version 4.0, the content of the locale definitions has changed to align with new national profiles and registered locale definitions. Those marked with an asterisk are available as part of language-variant subsets which can be optionally installed.

Table 10-1: Languages and Locales

Language                      Locale Name
Catalan *                     ca_ES.ISO8859-1
Simplified Chinese/PRC *      zh_CN.dechanzi
                              zh_CN.dechanzi@pinyin
                              zh_CN.dechanzi@radical
                              zh_CN.dechanzi@stroke
Chinese/Hong Kong *           zh_HK.big5
                              zh_HK.dechanyu
                              zh_HK.dechanzi
                              zh_HK.eucTW
Traditional Chinese/Taiwan *  zh_TW.big5
                              zh_TW.big5@chuyin
                              zh_TW.big5@radical
                              zh_TW.big5@stroke
                              zh_TW.dechanyu
                              zh_TW.dechanyu@chuyin
                              zh_TW.dechanyu@radical
                              zh_TW.dechanyu@stroke
                              zh_TW.eucTW
                              zh_TW.eucTW@chuyin
                              zh_TW.eucTW@radical
                              zh_TW.eucTW@stroke
Czech *                       cs_CZ.ISO8859-2
Danish                        da_DK.ISO8859-1
Dutch                         nl_NL.ISO8859-1
Belgian Dutch                 nl_BE.ISO8859-1
US English/ASCII              en_US.8859-1 (C/POSIX)
US English/ISO8859-1          en_US.ISO8859-1
GB English                    en_GB.ISO8859-1
Finnish                       fi_FI.ISO8859-1
German                        de_DE.ISO8859-1
Swiss German                  de_CH.ISO8859-1
Greek                         el_GR.ISO8859-7
French                        fr_FR.ISO8859-1
Belgian French                fr_BE.ISO8859-1
Canadian French               fr_CA.ISO8859-1
Swiss French                  fr_CH.ISO8859-1
Hebrew *                      iw_IL.ISO8859-8
Hungarian                     hu_HU.ISO8859-2
Icelandic                     is_IS.ISO8859-1
Italian                       it_IT.ISO8859-1
Japanese *                    ja_JP.eucJP
                              ja_JP.SJIS
                              ja_JP.deckanji
                              ja_JP.sdeckanji
Korean *                      ko_KR.deckorean
                              ko_KR.eucKR
Lithuanian *                  lt_LT.ISO8859-4
Norwegian                     no_NO.ISO8859-1
Polish                        pl_PL.ISO8859-2
Portuguese                    pt_PT.ISO8859-1
Russian                       ru_RU.ISO8859-5
Slovak                        sk_SK.ISO8859-2
Slovene *                     sl_SI.ISO8859-2
Spanish                       es_ES.ISO8859-1
Swedish                       sv_SE.ISO8859-1
Thai *                        th_TH.TACTIS
Turkish                       tr_TR.ISO8859-9

Note that you can switch languages or character sets as necessary and can even operate multiple processes in different languages or codesets in the same system at the same time.
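
The locale names in Table 10-1 follow the usual X/Open pattern of a language code, an optional territory, an optional codeset, and optional modifiers (language_TERRITORY.codeset@modifier). The following Python sketch splits a name into those parts; the LocaleName type and the parse_locale_name function are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple, Optional, Tuple


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifiers: Tuple[str, ...]


def parse_locale_name(name: str) -> LocaleName:
    """Split language[_TERRITORY][.codeset][@modifier...] into its parts."""
    base, *modifiers = name.split("@")        # everything after each "@"
    base, _, codeset = base.partition(".")    # codeset follows the first "."
    language, _, territory = base.partition("_")
    return LocaleName(language, territory or None, codeset or None,
                      tuple(modifiers))


if __name__ == "__main__":
    for name in ("fr_CA.ISO8859-1", "zh_TW.big5@stroke", "ja_JP.SJIS"):
        print(name, "->", parse_locale_name(name))
```

The modifiers are kept as a tuple because a name can carry more than one, as the UCS-4 locale names in Section 10.4 do.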

For information on supported character sets, see the guide Writing Software for the International Market and reference pages for individual languages and codesets.


10.3    Code Conversion and the iconv Utility

Digital UNIX Version 4.0 extends the base tty terminal driver subsystem to include additional BSD line disciplines and STREAMS tty modules for processing data in all languages. The line discipline or STREAMS modules used to process Japanese, Chinese, and Korean, for example, provide the following support:

Digital UNIX Version 4.0 supports the iconv utility, which converts text from one locale's codeset to another, thereby assisting programmers in the writing of international applications.

Code conversion is also implemented in the terminal driver and printing subsystem to allow the use of terminals and printers with different codesets. Additionally, code conversion is implemented in mail utilities for mail interchange with systems using different codesets (see the man command for reference page displays) and in the X Window Toolkit for text input, drawing, and interclient communication. For more information on the iconv utility, see iconv_intro(5).
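
The idea behind codeset conversion can be sketched in a few lines of Python. This uses Python's built-in codecs rather than the iconv utility or its library interface, and the function name convert_codeset is hypothetical; the codec names are Python's spellings, not Digital UNIX codeset names.

```python
def convert_codeset(data: bytes, from_code: str, to_code: str) -> bytes:
    """Re-encode `data` from one codeset to another.

    Raises UnicodeError if the input is not valid in `from_code` or if a
    character has no representation in `to_code`.
    """
    text = data.decode(from_code)   # bytes in the source codeset -> characters
    return text.encode(to_code)     # characters -> bytes in the target codeset


if __name__ == "__main__":
    latin1 = b"caf\xe9"  # "café" encoded in ISO 8859-1
    print(convert_codeset(latin1, "iso8859-1", "utf-8"))

    euc = "日本語".encode("euc_jp")
    print(convert_codeset(euc, "euc_jp", "shift_jis"))
```

Conversion goes through an intermediate character representation, so any pair of codesets that can both express the text can be bridged without a dedicated pairwise table.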

For information on all the languages supported by the international terminal subsystem, see the guide, Writing Software for the International Market.

The following sections briefly discuss additional internationalization functionality.


10.4    Unicode Support

Digital UNIX provides a set of locales and codeset convertors that support the Unicode and ISO 10646 standards. The codeset convertor modules enable an application to convert between other supported codesets and UCS-4. The following UCS-4 locales are supported:

Table 10-2: Languages and Locales

Language                      Locale Name
Simplified Chinese/PRC *      zh_CN.dechanzi@pinyin@ucs4
                              zh_CN.dechanzi@radical@ucs4
                              zh_CN.dechanzi@stroke@ucs4
Chinese/Hong Kong *           zh_HK.dechanyu@ucs4
                              zh_HK.dechanzi@ucs4
                              zh_HK.eucTW@ucs4
Traditional Chinese *         zh_TW.dechanyu@ucs4
                              zh_TW.dechanyu@chuyin@ucs4
                              zh_TW.dechanyu@radical@ucs4
                              zh_TW.dechanyu@stroke@ucs4
                              zh_TW.eucTW@ucs4
                              zh_TW.eucTW@chuyin@ucs4
                              zh_TW.eucTW@radical@ucs4
                              zh_TW.eucTW@stroke@ucs4
Czech *                       cs_CZ.ISO8859-2@ucs4
Danish                        da_DK.ISO8859-1@ucs4
Dutch                         nl_NL.ISO8859-1@ucs4
Belgian Dutch                 nl_BE.ISO8859-1@ucs4
US English/ASCII              en_US.8859-1@ucs4i@ucs4
US English/ISO8859-1          en_US.ISO8859-1@ucs4
GB English                    en_GB.ISO8859-1@ucs4
Finnish                       fi_FI.ISO8859-1@ucs4
German                        de_DE.ISO8859-1@ucs4
Swiss German                  de_CH.ISO8859-1@ucs4
Greek                         el_GR.ISO8859-7@ucs4
French                        fr_FR.ISO8859-1@ucs4
Belgian French                fr_BE.ISO8859-1@ucs4
Canadian French               fr_CA.ISO8859-1@ucs4
Swiss French                  fr_CH.ISO8859-1@ucs4
Hebrew *                      iw_IL.ISO8859-8@ucs4
Hungarian                     hu_HU.ISO8859-2@ucs4
Icelandic                     is_IS.ISO8859-1@ucs4
Italian                       it_IT.ISO8859-1@ucs4
Japanese *                    ja_JP.SJIS@ucs4
                              ja_JP.deckanji@ucs4
Korean *                      ko_KR.deckorean@ucs4
Norwegian                     no_NO.ISO8859-1@ucs4
Polish                        pl_PL.ISO8859-2@ucs4
Portuguese                    pt_PT.ISO8859-1@ucs4
Russian                       ru_RU.ISO8859-5@ucs4
Slovak                        sk_SK.ISO8859-2@ucs4
Slovene *                     sl_SI.ISO8859-2@ucs4
Spanish                       es_ES.ISO8859-1@ucs4
Swedish                       sv_SE.ISO8859-1@ucs4
Turkish                       tr_TR.ISO8859-9a@ucs4
Universal                     universal.utf8@ucs4

Digital UNIX also provides a function called fold_string_w(), which maps one Unicode string to another performing the specified Unicode transformations. For more information on the fold_string_w() function, see fold_string_w(3). For more information on Unicode support, see Unicode(5).
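
To make the two ideas in this section concrete, here is a small Python sketch. It is only an analogy: to_ucs4 shows the fixed four-byte-per-character form that UCS-4 uses, and fold shows the kind of string-to-string Unicode transformation that a folding function performs. Both function names are hypothetical, and nothing here reflects the actual signature or options of fold_string_w(), which are documented in fold_string_w(3).

```python
import unicodedata


def to_ucs4(text: str) -> bytes:
    """Encode text as big-endian UCS-4: exactly four bytes per character."""
    return text.encode("utf-32-be")


def fold(text: str) -> str:
    """Apply two common transformations: compatibility normalization
    (NFKC) followed by case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


if __name__ == "__main__":
    sample = "A\u4e2d"  # Latin capital A and the ideograph U+4E2D
    print(to_ucs4(sample).hex())
    print(fold("\uff21BC"))  # fullwidth "A" followed by ASCII "BC"
```

Because every character occupies the same number of bytes in UCS-4, indexing and length calculations are simple arithmetic, which is the main attraction of the encoding for internal processing.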


10.5    ISO-C

Digital UNIX provides support for the new ISO-C 1994 standard. This includes support for several new interfaces within libc as well as support within the new DEC C compiler.

The addition of these new routines provides a more complete coverage of routines that are wchar_t aware, which in turn allows Unicode to be more easily supported on the platform.


10.6    Internationalized Curses

Digital UNIX supplies an internationalized Curses library in conformance with X/Open Curses, Issue 4. This provides functions for processing single-byte and multibyte characters. Multibyte characters may be in either wide-character (wchar_t) or complex-character (cchar_t) formats. The complex-character format provides for a single logical character made up of multiple wide characters. Some of the components of the complex character may be nonspacing characters.
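
The notion of a complex character, one spacing character plus any nonspacing characters attached to it, can be illustrated without Curses at all. The Python sketch below groups a string into such units using the Unicode general categories for nonspacing and enclosing marks. The function name group_complex_chars is hypothetical, and the grouping rule is a simplification for illustration, not the cchar_t implementation.

```python
import unicodedata


def group_complex_chars(text: str) -> list:
    """Group each spacing character with the nonspacing marks that follow it."""
    cells = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if cells and is_mark:
            cells[-1] += ch      # nonspacing mark: attach to the previous cell
        else:
            cells.append(ch)     # spacing character: start a new cell
    return cells


if __name__ == "__main__":
    word = "e\u0301te\u0301"  # "été" written with combining acute accents
    cells = group_complex_chars(word)
    print(len(word), "code points in", len(cells), "screen cells")
```

Each resulting unit corresponds to one position on the screen, which is why a character-cell library needs a type that can hold more than one wide character per cell.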

For information on the syntax and effect of all Curses interfaces, see curses(3). For a description of the enhancements provided by the internationalized Curses routines, and their relationship to previous Curses routines, see the guide, Writing Software for the International Market.


10.7    Printing

Digital UNIX Version 4.0 supports the printing of plain text and PostScript files for a variety of languages and provides outline fonts for high quality printing on PostScript printers. For the printing of Asian languages whose font files are typically too large to fit in printer memory, Digital UNIX Version 4.0 makes use of a unique font-faulting technology which substantially reduces memory requirements on the supported PostScript printers. For more information on printing, see i18n_printing(5) and Writing Software for the International Market.


10.8    Creating Locales and the localedef Utility

The localedef utility allows programmers to create their own locales, compile their source, and generate a unique name for their new locale.
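
As a rough sketch of how a build script might drive the utility, the Python function below assembles a localedef command line using the standard POSIX options -i (locale source file) and -f (charmap file). It only builds the argument list and does not run anything. The function name, the file names, and the locale name in the example are all hypothetical; consult localedef(1) for the options available on a given system.

```python
import shlex


def localedef_command(locale_name, source_file, charmap_file=None):
    """Build the argument list for compiling a locale source file."""
    argv = ["localedef", "-i", source_file]
    if charmap_file is not None:
        argv += ["-f", charmap_file]   # charmap describing the codeset
    argv.append(locale_name)           # name of the locale to generate
    return argv


if __name__ == "__main__":
    argv = localedef_command("en_US.ISO8859-1@custom",
                             "custom.src", "ISO8859-1.cmap")
    print(" ".join(shlex.quote(arg) for arg in argv))
```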

For more information on localedef, see the localedef(1) reference page.


10.9    I18N Configuration Tool

The I18N Configuration Tool, available via the CDE Application Manager, is one of the System Administration Configuration tools. It provides a graphical interface for the system administrator to configure I18N-specific settings. It also provides a convenient way to see what countries, locales, fonts, and keymaps are supported on the host. I18nconfig can be used to remove unused fonts and country support installed on the system.


10.10    Special Support for Ideogrammatic Languages

The following sections discuss special support in Digital UNIX Version 4.0 for ideogrammatic languages, like Chinese and Japanese.


10.10.1    Sorting and the asort Utility

Digital UNIX Version 4.0 supports the asort utility, an extension of the sort command, which allows characters of ideogrammatic languages, like Chinese and Japanese, to be sorted according to multiple collation sequences. For more information on the asort utility, see asort(1).
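
To show why one set of characters can have several valid orderings, the following Python sketch sorts five common ideographs by three different sequences. It does not use asort; the function name sort_chars is hypothetical, and the two small tables are hand-written illustrative data (toneless pinyin readings and stroke counts), not the collation tables that ship with the locales.

```python
# Illustrative data only: toneless pinyin readings and stroke counts for
# five characters. Real collation data lives in the locale definitions.
PINYIN = {"一": "yi", "二": "er", "三": "san", "人": "ren", "大": "da"}
STROKES = {"一": 1, "二": 2, "三": 3, "人": 2, "大": 3}


def sort_chars(chars, sequence):
    """Sort characters by the named collation sequence.

    Ties within a sequence are broken by code point so the result is
    deterministic.
    """
    if sequence == "pinyin":
        key = lambda ch: (PINYIN[ch], ch)
    elif sequence == "stroke":
        key = lambda ch: (STROKES[ch], ch)
    elif sequence == "codepoint":
        key = ord
    else:
        raise ValueError("unknown collation sequence: " + sequence)
    return sorted(chars, key=key)


if __name__ == "__main__":
    for sequence in ("codepoint", "pinyin", "stroke"):
        print(sequence, "".join(sort_chars("三人一大二", sequence)))
```

The same distinction appears in the locale names of Table 10-1, where modifiers such as @pinyin, @radical, and @stroke select the collation sequence for a given language and codeset.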


10.10.2    Multilingual EMACS

Digital UNIX Version 4.0 supports the Multilingual EMACS editor (MULE) for Asian languages. See MULE(1) for more information.


10.10.3    Mail and 8-Bit Support

Digital UNIX Version 4.0 provides support for ideogrammatic languages in mailx, dtmail, MH, and comsat.

For more information on these mail utilities, see the corresponding reference pages.


10.10.4    User-Defined Characters

Digital UNIX Version 4.0 provides support for creating user-defined characters (UDCs) for ideogrammatic languages, so that users can create and define character fonts and their attributes, including DECwindows fonts, with the cgen and cedit utilities. For more information on these utilities, see the appropriate reference pages.

Digital UNIX Version 4.0 also provides font rendering facilities so that X clients can use UDC databases through the X Server or font server to obtain bitmap fonts for user-defined characters.

For more information on user-defined characters, see Writing Software for the International Market.


10.11    Internationalization and Motif

Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports the use of alternate input methods, which allow input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget that supports multibyte and wide characters and the on-the-spot input style.

Motif supports multibyte and wide characters through the use of the X multibyte functions, and the localized C run-time functions (such as strlen). In addition, the compound string routines have been modified to include the X11R6 XFontSet functionality to allow for the creation of localized strings.

The User Interface Language (UIL) supports the creation of localized UID files through the -s compile-time switch on the UIL compiler, which causes the compiler to construct localized strings.

Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves as using a specific method for input.

The following sections discuss additional Motif internationalization functionality.


10.11.1    Internationalized Motif Widgets

The following lists contain the widgets in the Motif Toolkit and in the DECwindows Extensions to the Motif Toolkit that support local-language character I/O and local-language message displays.

Note that the Motif UIL compiler has been extended to support local language characters in UIL files.


10.11.2    Internationalized Common Desktop Environment (CDE)

CDE is the default desktop in Digital UNIX Version 4.0. Digital UNIX provides internationalization support for the following CDE clients:


10.11.3    Internationalized DECwindows X Clients

Digital UNIX Version 4.0 provides internationalization support for the following DECwindows X clients: