6 Using Internationalized Software

This chapter explains how setup tasks and software features vary among language environments other than English. The chapter is aimed at programmers who are familiar with Digital UNIX in an English-language environment and who need to work with other languages, particularly those that use multibyte characters, to run and test their applications.

6.1 Working in a Multilanguage Environment: Introduction

To enable input and display in any language other than English, you must always set the locale in which your process runs. Depending on the language, you may need to perform additional tasks, for example, to:

Select keyboard type
Define search paths for specialized data and executable files that are language specific
Set terminal code, application code, and other characteristics of the terminal driver to be appropriate for the codeset or codesets where a language's characters are defined
Load the fonts required for displaying the characters in a particular language
Enable one or more of the data input and editing methods used for defining and entering characters, words, and phrases
Apply printer-control characters, filters, and fonts that are appropriate for local-language printers
Note that printing text in languages other than English, particularly Asian languages, may require specialized printer hardware.

This chapter discusses these topics as they apply to particular languages or groups of languages. The chapter also describes command and DECwindows environment features that English-language speakers do not normally use and that allow you to display, enter, print, and mail text in languages other than English. For information about using internationalization features of applications that run in the Common Desktop Environment (CDE), see the CDE Companion.

Language-specific user guides provide additional information about customization and use of software provided for a particular language. These user guides are on the CD-ROM titled "Digital UNIX Online Documentation." If one or more of the language variant subsets are installed on your system, you can use the following command to read language variant guides using Bookreader. If you did not mount the CD-ROM device to the /mnt directory, replace /mnt in the following example with the directory to which you mounted the CD-ROM device.


% dxbook /mnt/DOCUMENTATION/WORLDWIDE/L10N_guides.decw_bookshelf &

PostScript files for the language variant user guides are also available on the CD-ROM. The directories that contain the PostScript versions of these guides have pathnames that adhere to the following format:

/mnt/DOCUMENTATION/WORLDWIDE/language_territory/POSTSCRIPT
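
As a rough sketch, the directory for a given locale can be derived by dropping the codeset portion of the locale name. The fragment below uses Bourne shell syntax rather than the C shell used elsewhere in this chapter. The function name `ps_guide_dir` is hypothetical, and the fragment assumes that the language_territory node matches the first part of the locale name (for example, `ja_JP`) and that the CD-ROM is mounted on /mnt unless you say otherwise.

```sh
# Hypothetical helper: print the PostScript guide directory for a locale.
# Assumes the language_territory node equals the locale name without its
# codeset (.codeset) and variant (@suffix) parts.
ps_guide_dir() {
    locale_name=$1
    mount_point=${2:-/mnt}           # default mount point used in this chapter
    lang_terr=${locale_name%%[.@]*}  # strip ".codeset" and "@variant"
    echo "$mount_point/DOCUMENTATION/WORLDWIDE/$lang_terr/POSTSCRIPT"
}

ps_guide_dir ja_JP.deckanji
```

Check the directory names on your CD-ROM before relying on a derived path; the actual names may not follow this assumption for every language.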

Non-English characters are embedded in the tables and text of these guides. Therefore, to print a guide in PostScript format, you must first:

Install the corresponding language variant software to obtain the appropriate printer support files

Set up a DEClaser 1152, DEClaser 5100, or PrintServer 17 printer to use the print filters and fonts that are appropriate for the language
Refer to Section 6.12 and i18n_printing(5) for information about setting up printers for local languages.

Digital UNIX documentation also provides introductory reference pages on the topics of internationalization (i18n_intro(5)) and localization (l10n_intro(5)), along with reference pages for all supported languages and codesets.

6.2 Setting Locale and Language

System software that supports different language environments may provide translated message files, application resource files, help files, or some combination of these. If translations are available for message files, you can vary the language of software messages and other text by selecting a locale.

For system software, you set locale by defining the LANG environment variable. For example:


% setenv LANG en_US.ISO8859-1

Refer to the discussion of internationalization in the System Administration book and in the Command and Shell User's Guide for more detailed information on using locales and defining the associated variables for system and user setup. You can also refer to the i18n_intro(5) reference page for a discussion of locale variables such as LANG. If these locale variables are not defined, internationalized applications assume the POSIX (C) locale, which supports only English.

Note
Locales sometimes have multiple variants, usually to support different sort orders. These variants have the same name as the base locale but include a file name suffix that begins with the at sign (@). You usually assign locale names with an @ suffix to variables for specific locale categories, such as LC_COLLATE, and not to the LANG environment variable. The exceptions to this restriction are @ suffixes associated with codeset variants, such as @ucs4.
Many locale-specific files reside in directories whose names are constructed from the language, territory, and codeset portions of a locale name. Commands and other system applications insert the setting of the LANG variable into search paths that contain %L as one of the directory nodes. This makes it possible for software programs to find the correct set of files, such as fonts, resource files, user-defined character files, and translated reference pages, that should be used with the current locale. An @ suffix related to collation, if included in an assignment to the LANG variable, may result in applications being unable to find certain locale-specific files.
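
The %L substitution described above can be illustrated with a short Bourne shell sketch. This is not the mechanism that the system libraries use internally; it only mimics the text substitution. The function name `expand_locale_path`, the path template, and the `@variant` suffix are all made up for the illustration.

```sh
# Hypothetical helper: expand %L in a search-path template with a locale name.
expand_locale_path() {
    template=$1
    locale_name=$2
    # Replace every %L node with the locale name.
    printf '%s\n' "$template" | sed "s|%L|$locale_name|g"
}

# Directory names are built from language, territory, and codeset only,
# so the second path (with a collation-style @ suffix) may not exist.
expand_locale_path '/usr/lib/X11/%L/app-defaults' 'ja_JP.deckanji'
expand_locale_path '/usr/lib/X11/%L/app-defaults' 'ja_JP.deckanji@variant'
```

The second call shows why a collation-related @ suffix belongs in LC_COLLATE rather than in LANG: the suffix becomes part of the directory node that is searched.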

On a workstation, you also need to select a language to take advantage of text translations and local-language features available with Common Desktop Environment (CDE) and DECwindows Motif applications. For Asian languages, the correct language selection is particularly important because it enables:

Support for the appropriate input method in these applications

Entry of file names and other parameters that use ideographic characters

Cursor positioning on correct character and word boundaries

Line wrapping at correct word boundaries

See the CDE Companion for information about setting language in the Common Desktop Environment. Use the following steps to select a language in the DECwindows environment:

From the Session Manager's Options menu, select Language....

In the pop-up Language Options window, click on one of the displayed languages. If you set the LANG environment variable before starting your current session, you can click on Default to set the language to be consistent with the value of that variable. Note, however, that LANG settings made during your current session do not affect the setting of Default. If you click on an entry other than Default, the selected language overrides the value of the LANG environment variable or the system default locale, whichever applies.

Click on the Apply button.

Click on the OK button to dismiss the Language Options window.

If there is an input method that supports the selected language, you should also start the input method server before starting a DECterm window or other window where you want to work in that language (see Section 6.4). Some languages also require a keyboard setting before you begin entering text in the window (see Section 6.3).

Note
When you set the language, the change applies to all DECterm windows or other DECwindows Motif applications that start after you make the setting. The setting change does not apply to windows that are already started. You can therefore have windows running in different languages at the same time during a DECwindows Motif session.
There is a cut and paste restriction to keep in mind if you simultaneously run windows in different languages. Cutting from one window and pasting to another is supported only when both windows are set to the same language. DECterm windows emulate terminals, so data is transferred as a byte stream that has no embedded language information. Data appears on the target (paste) window according to the language applied to the target window, not according to the language applied to the source (cut) window. For example, data will be meaningless if you cut text from a Chinese window and paste it in a German window. For Chinese or Japanese, codeset converters support cut and paste operations between windows set to the same language but different codesets.

6.3 Selecting Keyboard Type

To enter English text, a standard keyboard provides a sufficient number of keys (combined with shift states) to enter all uppercase and lowercase letters, numerals, and punctuation marks. For many other languages, the default keyboard does not provide enough keys and shift states to enter all characters.

Terminal users must be using a localized keyboard or, if their Digital keyboard includes the Compose key, using Compose-key sequences to enter non-English characters from single-byte codesets. Many Digital terminals also provide software emulation of a number of keyboard layouts for languages that are based on single-byte codesets. The user guide for each terminal explains how you can use its keyboard to enter non-English characters. Entry of multibyte characters in Asian languages requires special terminal hardware.

Workstation users can set keyboard type to be appropriate for languages for which there are standard keyboard types when appropriate support files are installed on the system. You need to set keyboard type for Western and Eastern European languages, Japanese, Thai, and Hebrew. Keyboard setting is not required for Chinese and Korean languages.

If you are using the Common Desktop Environment, refer to the CDE Companion for information about changing the keyboard setting. If you are using the DECwindows environment, you can change keyboard type by performing the following steps:

From the Session Manager window, select Keyboard... from the Options menu.

In the Keyboard Options dialog box, click on a keyboard choice.

Click on Apply and then on OK to dismiss the dialog box.

Unlike the language setting, keyboard setting is a global attribute that applies to all windows. Therefore, if you are working in windows created with different language settings, you may need to change the keyboard setting as you move from one window to another.

6.3.1 Determining Keyboard Layout

If you change your keyboard from the one whose characters are printed on the hardware keys, you need to know how characters are mapped to keys and whether any characters must be entered by using a mode-switch key or mode-switch key sequence. For some languages, such as Czech, up to four different characters can be mapped to the same key. In such cases, you use the key defined as the mode switch to toggle among different sets of characters mapped to the same key. Note that mode switching is a character entry mechanism that is different from Compose sequences. A particular keyboard setting may support Compose sequences (which require one key to be defined as a multi-key), mode switching (which requires at least one key to be defined as a mode-switch key), both, or neither of these input mechanisms.

You can access a keyboard layout for your current keyboard setting in one of two ways:

By clicking on the Edit Keycaps button in the Keyboard Options dialog box (DECwindows environment) or in the Keyboard application window (Common Desktop Environment)
Refer to dxkeycaps(1X) for more information on the application invoked by the Edit Keycaps button.
By using a command similar to the following to create a PostScript file that you can print:
```
% /usr/bin/X11/xkbprint -label symbols -o mykeyboard.ps :0
```
Refer to xkbprint(1X) for more information about the xkbprint command.

6.4 Determining Input Method

For some languages, such as Japanese, Chinese, and Korean, you use an input method to enter characters, phrases, or both. An input method lets you input a character by taking multiple editing actions on entry data. The data entered at intermediate stages of character entry is called the preediting string. The X Input Method specification defines four user interaction styles:

On-the-spot
Data being edited is displayed directly in the application window. Application data is moved to allow the preediting string to display at the point of character insertion.

Over-the-spot
The preediting string is displayed in a window that is positioned over the point of insertion.

Off-the-spot
The preediting string is displayed in a window that is within the application window but not over the point of insertion. Often, the window for the preediting string appears at the bottom of the application window. In this case, the preediting window may occlude the last line of text in the application window. You can resize the application window to make this last line visible.

Root-window
The preediting string is displayed in a child window of the application RootWindow.

For some of the input styles selected in an application, the preediting and status windows are not redrawn correctly if the application window is occluded by other windows. To correct this problem, click on or refocus on the application window.

Input methods for different locales typically support more than one user interaction style but not all of them. If you are working in languages that are supported by an input method, you can specify styles in priority order through the VendorShell resource XmNpreeditType. By default, this resource is defined to be:

OnTheSpot,OverTheSpot,OffTheSpot,Root

The preceding value means that the on-the-spot interaction style is used if the input method supports it; otherwise, the over-the-spot style is used if the input method supports it, and so forth.
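
The priority rule can be sketched in Bourne shell as a search for the first preferred style that the input method also supports. The function name `pick_preedit_style` is hypothetical, and the case-insensitive comparison is a convenience of this sketch, not a statement about how the toolkit parses the resource.

```sh
# Hypothetical helper: choose the first style in the preference list
# (comma-separated, as in XmNpreeditType) that also appears in the list
# of styles the input method supports (space-separated).
pick_preedit_style() {
    # Lowercase both lists; turn commas in the preference list into spaces.
    preferred=$(printf '%s' "$1" | tr 'A-Z,' 'a-z ')
    supported=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    for style in $preferred; do
        for s in $supported; do
            if [ "$style" = "$s" ]; then
                echo "$style"
                return 0
            fi
        done
    done
    echo none       # no style in common
    return 1
}

# Default preference order against a server that offers only
# off-the-spot and root-window styles.
pick_preedit_style 'OnTheSpot,OverTheSpot,OffTheSpot,Root' 'OffTheSpot Root'
```

In this example the first two preferences are skipped because the server does not offer them, so the sketch prints the off-the-spot style.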

There are several ways to supply the XmNpreeditType resource value to an application:

In an application-specific resource file

On the command line that invokes an application

For example:


% dxnotepad -xrm '*preeditType: offthespot,onthespot' &

In the DECwindows environment, through the Session Manager, as follows:

From the Session Manager's Options menu, select Input Style....
In the Input Style Options dialog box, select a preference for interaction style from the Style Preference list.
To position your selection in the list, click on the up-arrow or down-arrow button.
Click on the Apply button and then the OK button to apply the new position to your selection and dismiss the dialog box.
Note that clicking on the Default button of the Input Style Options box restores the system default order to entries on the list.
Repeat the preceding steps until the Style Preference list is in the order you want.

In the Common Desktop Environment, by using the Input Methods application. See the CDE Companion for information on using this application.

Input styles are supported by specialized input method servers. An input method server runs as an independent process and communicates with an application to handle input operations. An input method server does not have to be running on the same system as the application but must be running and made accessible to the application before the application starts. It is therefore important to start an input method server for the DECwindows Motif environment before starting a DECterm window or any other DECwindows Motif application where you want to input characters in a language that requires the server. Following are the input method servers available in the operating system, along with the input styles that each server supports:

dxhangulim, the Korean input server, which supports all four input styles (on-the-spot, over-the-spot, off-the-spot, and root-window)

dxhanyuim, the Traditional Chinese input server, which supports the off-the-spot and root-window input styles

dxhanziim, the Simplified Chinese input server, which supports the off-the-spot and root-window input styles

dxjim, the Japanese input server, which supports the on-the-spot, over-the-spot, and root-window input styles

Each of these servers has a corresponding reference page.

The applications that you run may support more, fewer, or none of the input styles supported by a particular input server. The preedit option "None" applies when an input server rejects all input styles supported by the application.

In the DECwindows environment, if an input method server is not defined as an application and started through your .Xdefaults file at login time (see Section 6.6), you have to start the server from the command line. The following example starts the input server for the Korean language:


% /usr/bin/X11/dxhangulim &
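
The choice of server can be sketched as a lookup keyed on the locale name. The following Bourne shell fragment is an illustration only: the function name `input_server_for` is hypothetical, the locale-name prefixes (`ko_`, `zh_TW`, `zh_CN`, `ja_`) are assumptions about how the locales are named on your system, and the fragment prints the command line rather than running it.

```sh
# Hypothetical helper: map a locale name to one of the input method
# servers listed in this section. The locale prefixes are assumptions.
input_server_for() {
    case $1 in
        ko_*)   echo dxhangulim ;;  # Korean
        zh_TW*) echo dxhanyuim ;;   # Traditional Chinese
        zh_CN*) echo dxhanziim ;;   # Simplified Chinese
        ja_*)   echo dxjim ;;       # Japanese
        *)      return 1 ;;         # no input method server listed
    esac
}

# Dry run: show the command that would start the server for $LANG.
if server=$(input_server_for "${LANG:-C}"); then
    echo "/usr/bin/X11/$server &"
else
    echo "no input method server for locale ${LANG:-C}"
fi
```

Remember that the server must be running before you start the application that needs it.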

Note
The Asian-language input methods are not supported for use in xterm windows. Therefore, always choose the DECterm application to start windows where you want to display and input Asian-language characters.

In the Common Desktop Environment, the appropriate input server is automatically started when you select the language.

6.5 Determining the Input Mode Switch State

The keyboard layout for an Asian language provides keys for only a small number of characters. For Asian languages, you also use an input methodology (incorporating control-key sequences, keypad-key sequences, or options in a DECwindows application) to convert one or more characters that you can input directly from the keyboard to other kinds of characters. Section 6.4 and the language-specific user guides discuss input methods for Asian languages.

If you are using a terminal and your keyboard has a mode-switch LED (light-emitting diode), the Keyboard Indicator utility switches the LED on or off, depending on whether you last toggled the special input mode on or off. When using a terminal, invoke the Keyboard Indicator utility with the following command:


% /usr/bin/X11/kb_indicator &

If you are using a workstation and your language is set to an Asian language, invoke the Keyboard Indicator utility with the -map option, as follows:


% /usr/bin/X11/kb_indicator -map &

The -map option starts a DECwindows Motif application that emulates a mode-switch LED. The application window contains one button, which is displayed as on or off, corresponding to the input mode state. You can click on this button to toggle in and out of input mode. The window is insensitive if input mode switching is not supported for your current language setting.

You can have only one Keyboard Indicator application running during your session. To stop the application, enter Ctrl/C in the window from which you started the application or enter the following kill command with the application's process ID:

% kill -INT process_id

If Keyboard Indicator is stopped by any other means, you must enter the following command before restarting the application:


% /usr/bin/X11/kb_indicator -clear

The preceding command erases the server status for the application so that it can be restarted cleanly.

If your language is set to Hebrew, the Keyboard Manager application (/usr/bin/X11/decwkm) provides the same function as the Keyboard Indicator window provides for Asian languages.

6.6 Setting Parameters in the .Xdefaults File

In the DECwindows environment, if you want your session to be started with a particular language, input method, or keyboard setting as the default, you can manually edit the .Xdefaults file in your home directory to add appropriate entries for language, input method (if applicable), and keyboard. Alternatively, you can select the language and keyboard options you want from DECwindows Motif Session Manager menus, quit the session, and click on the affirmative answer when asked whether you want to save current settings. Saving current settings adds lines to or modifies existing entries in your .Xdefaults file. When you log back in to start a new session, the changed defaults take effect.

Example 6-1 shows an .Xdefaults file, modified by the choice to save current settings when quitting the session. The language and keyboard settings are Japanese (DECkanji) and LK401aj, respectively. The string dxjim has been added to several lines to define the Japanese input method server as a DECwindows Motif application and automatically start the server process.

Example 6-1: Sample .Xdefaults File

DXsession.x:    3
DXsession.y:    40
DXsession.AutoStart:    dxjim
DXsession.applications: Bookreader,CDA Viewer,Calculator,\
Calendar,Cardfiler,Clock,DECterm,Differences,\
Mail,Notepad,Paint,Print Screen,XTerm,dxjim
DXsession.dxjim.command:        /usr/bin/X11/dxjim
DXsession.num_AutoStart:        1
DXsession.num_applications:     15
DXsession.AppMenu:      Bookreader,CDA Viewer,Calculator,\
Calendar,Cardfiler,Clock,DECterm,Differences,Mail,Notepad,\
Paint,XTerm,dxjim
DXsession.num_AppMenu:  13
*xnlLanguage:   ja_JP
*keyboard_dialect:      japanese lk401aj

Refer to Section 6.18 for information about using specific DECwindows Motif applications with Asian languages. Section 6.18 also discusses X Server customization that is important when ideographic fonts are used in local and remote displays.

For information about customizing session defaults in the Common Desktop Environment, see the CDE Companion.

6.7 Defining the Search Path for Specialized Components

European languages are supported by data and executable files installed at system default locations. Asian-language support for some commands and programming libraries requires files that are subordinate to the directory /usr/i18n. These files supplement or replace files in system default locations. When you install one or more of the Asian language subsets, the installation procedure makes the following adjustments to variable settings on a systemwide basis:

I18NPATH
The I18NPATH variable defines the location of files that provide Asian-language support and that are not in system default locations. This variable is set to:
/usr/i18n
Your system administrator can choose to install files for Asian-language support at a location different from /usr/i18n; however, there must be a link to the other location in the /usr/i18n directory.
PATH
The PATH variable points to the location of commands and is set to:
$I18NPATH/usr/bin:$PATH

The file /etc/i18n_profile includes the preceding variable assignments on a systemwide basis for Bourne and Korn shell users. For C shell users, the installation process includes the file /etc/i18n_login in the file /etc/csh.login to set search paths correctly for Hebrew and Asian languages. Unless specifically noted in descriptions of particular commands or utilities, individual users do not need to change process-specific search paths to find localized binaries and utilities.
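
For illustration only, the assignments described above look roughly like this in Bourne shell syntax. The actual contents of /etc/i18n_profile may differ, and the guard against adding the directory twice is an addition in this sketch, not something the source describes.

```sh
# Sketch of the systemwide assignments, in Bourne shell syntax.
# The real /etc/i18n_profile may differ from this fragment.
I18NPATH=/usr/i18n
case ":$PATH:" in
    *":$I18NPATH/usr/bin:"*) ;;          # already in PATH; do nothing
    *) PATH=$I18NPATH/usr/bin:$PATH ;;   # search localized commands first
esac
export I18NPATH PATH
```

Because the localized directory comes first in PATH, commands found there take precedence over the versions in system default locations.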

6.8 Using Terminal Interface Features for Asian Languages

The Digital UNIX Asian terminal driver (atty) and Thai terminal driver (ttty) support input and output of English and other language characters over asynchronous terminal lines. When one or both of these drivers are installed, you can set terminal line characteristics to be appropriate for the language you are using. The drivers' local-language capabilities are supported in the following terminal configurations:

Terminal connected directly to the host machine via a serial line
Terminal connected through LAT to the host system
Terminal connected through TCP/IP to the host system

Refer to the atty(7) and ttty(7) reference pages for more information about these terminal drivers.

Asian-language software subsets provide an enhanced stty command that can enable support for multibyte codesets and special character manipulation capabilities, such as the following:

Automatic codeset conversion between terminal and application
Line editing of multibyte characters
Japanese input method (Kana-Kanji conversion)
User-defined character (UDC) databases and on-demand loading (ODL) of associated fonts
Chinese phrase input method

This section provides general information about using the stty command to enable features added to the terminal subsystem for Asian languages.

The stty utility sets or reports on terminal input/output characteristics of the device that is the utility's standard input. Table 6-1 shows the stty options that set line discipline for Asian languages.

Table 6-1: The stty Command Options for Controlling Terminal Line Discipline
stty Option	Description
`adec`	Sets the terminal line discipline to handle multibyte data and the processing environment appropriate for simplified Chinese (Hanzi), traditional Chinese (Hanyu), and Korean codesets. This option is supported for both the STREAMS and BSD terminal drivers.
`jdec`	Sets the terminal line discipline to handle multibyte data and the processing environment appropriate for Japanese codesets. This option sets terminal code to `dec` and application code to `eucJP`. The `jdec` option is supported for both the STREAMS and BSD terminal drivers.
`tdec`	Sets the terminal line discipline to handle Thai characters and the processing environment appropriate for the Thai codeset. This option is supported for only the BSD terminal driver.
`dec`	Sets the terminal line discipline back to the default, or standard, `tty` line discipline and clears characteristics that preceding `stty` commands may have set for application and terminal code. This option is supported for both the STREAMS and BSD terminal drivers.

The stty command requires an appropriate locale setting to be in effect before changing terminal line discipline to support that locale. For example, to set your terminal line discipline to handle Korean, enter:


% setenv LANG ko_KR.deckorean

% stty adec

To set your terminal line discipline back to the tty default, enter:


% stty dec

Note
When your terminal line discipline is not set to the tty default and you want to switch to another nondefault option (to switch from jdec to adec, for example), first enter the stty dec command to clear any application or terminal characteristics that may not be appropriate for the new setting. The following example shows how to switch a terminal line discipline from its current setting of adec to jdec:
% stty dec

% stty jdec
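
The reset-first rule in the preceding note can be sketched as a small Bourne shell function that prints the stty commands to enter, without running them. The function name `stty_switch_plan` is hypothetical, and the sketch assumes you already know which option is currently in effect.

```sh
# Hypothetical helper: print the stty commands needed to move from one
# line discipline option to another (dec, adec, jdec, or tdec).
stty_switch_plan() {
    current=$1
    target=$2
    if [ "$current" = "$target" ]; then
        return 0                    # already set; nothing to do
    fi
    # Moving between two nondefault options: reset to the default first.
    if [ "$current" != dec ] && [ "$target" != dec ]; then
        echo "stty dec"
    fi
    echo "stty $target"
}

stty_switch_plan adec jdec
```

When either the current or the target option is dec, the sketch prints a single command, because no intermediate reset is needed.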

The stty command entered with the all option displays all settings for the current terminal line discipline:


% stty adec

% stty all
atty disc;speed 9600 baud; 24 rows; 80 columns
erase = ^?; werase = ^W; kill = ^U; intr = ^C; quit = ^\; susp = ^Z
dsusp = ^Y; eof = ^D; eol <undef>; eol2 <undef>; stop = ^S; start = ^Q
lnext = ^V; discard = ^O; reprint = ^R; status <undef>; time = 0
min = 1
-parenb -parodd cs8 -cstopb hupcl cread -clocal
-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -iuclc
ixon -ixany -ixoff imaxbel
isig icanon -xcase echo echoe echok -echonl -noflsh -mdmbuf -nohang
-tostop echoctl -echoprt echoke -altwerase iexten -nokerninfo
opost -olcuc onlcr -ocrnl -onocr -onlret -ofill -ofdel tabs -onoeot
-odl lru size=256
-sim key= class=
tcode=dec acode=deckanji

6.8.1 Converting Between Application and Terminal Codesets

Many terminals support only one codeset, which is a problem when you work on one terminal and need to run applications in locales (particularly Asian locales) that are based on a variety of codesets. Therefore, the atty driver provides a mechanism for converting between the codeset that an application uses and the codeset that a terminal supports. You control codeset conversion by using options on the stty command line.

Note that the adec, jdec, and dec options of the stty command set terminal code and application code appropriately for Digital terminals and workstations. You need to explicitly use the tcode option, for example, if you are logging on from a Japanese terminal that does not support the standard codeset for Digital terminals.

Table 6-2 specifies stty options that explicitly set terminal and application code.

Table 6-2: The stty Options to Explicitly Set Application and Terminal Code
stty Option	Description
`acode codeset`	Sets application code to codeset.
`tcode codeset`	Sets terminal code to codeset. For almost all Digital terminals and for workstations, `tcode` should be set to `dec`.
`code codeset`	Sets both terminal code and application code to codeset.

The following command lets you run an application that uses DEC Kanji (the default codeset for Japanese) on a terminal that supports only Shifted JIS (a codeset prevalent in the Japanese personal computer market):


% stty acode deckanji tcode sjis
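
As a minimal sketch of how the options in Table 6-2 relate, the following Bourne shell function prints a suitable stty command for a pair of codesets: the single `code` option when both are the same, and separate `acode` and `tcode` options otherwise. The function name `build_code_command` is hypothetical, the sketch does not check that a codeset name is valid for your terminal or application, and it prints the command instead of running it.

```sh
# Hypothetical helper: print the stty command that sets application and
# terminal code. Codeset names are passed through unchecked.
build_code_command() {
    acode=$1    # codeset used by the application
    tcode=$2    # codeset supported by the terminal
    if [ "$acode" = "$tcode" ]; then
        echo "stty code $acode"                   # one option sets both
    else
        echo "stty acode $acode tcode $tcode"     # conversion is needed
    fi
}

build_code_command deckanji sjis
build_code_command sjis sjis
```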

The user guides for the Asian-language subsets provide additional details about supported application codesets and terminal codesets.

6.8.2 Command Line Editing That Supports Multibyte Characters

This section discusses how you enable and use command-line editing when Asian-language support is installed on your system.

When the terminal line discipline and terminal codeset characteristics are set appropriately for multibyte codesets, the atty driver handles command-line editing appropriately for languages supported by those codesets. For example, when you enter the control sequence to delete a character (assuming you have defined the control sequence), the entire character is deleted, regardless of how many bytes it occupies. The character being erased can be either a single-byte English character or a multibyte Asian character when both occur on the same command line.

Word deletion is also supported, even when words combine single-byte and multibyte characters. The atty driver accepts single-byte space characters, two-byte space characters (if applicable to the terminal code setting), or tab characters as word delimiters.

The erase and werase options of the stty command line let you define the control sequence for character and word deletion. For example:


% stty erase ^H

% stty werase ^J

The preceding example specifies that Ctrl/H deletes the character that precedes the cursor and Ctrl/J deletes the word preceding the cursor.

History mode is a mode of command-line editing that allows you to recall and optionally modify a command entered previously. The history mode implementation discussed here is one that is customized for Asian-language input and supported only for the BSD terminal driver. Table 6-3 specifies the stty options that enable or disable history mode editing.

Table 6-3: The stty Options to Enable/Disable History Mode
stty Option	Description
`history key`	Sets the toggle key for the history mechanism and enables it.
`-history`	Disables the history mechanism.

The atty driver can maintain a history of up to 32 commands, each with a maximum length of 127 characters. Table 6-4 describes the commands you can use to edit command lines after entering the history key.

Table 6-4: Command Line Editing in History Mode
Command/Key	Description
`Ctrl/A`	Move to the beginning of the line
`Ctrl/D`	Delete the character under the cursor
`Ctrl/E`	Move to the end of the line
Up-arrow	Recall the previous command line in the history list
Down-arrow	Recall the next command in the history list
Left-arrow	Move the cursor left by one character
Right-arrow	Move the cursor right by one character
erase_sequence	Delete the character preceding the cursor
werase_sequence	Delete the word preceding the cursor

In the preceding table, erase_sequence and werase_sequence indicate the control sequences defined by the stty options erase and werase, respectively.

When editing a command line in history mode, you insert characters as follows:

Press the arrow keys to move the cursor to the position immediately to the right of the point where you want to insert characters.

Enter the characters you want to insert.

If you enter the control characters that represent "kill," "interrupt," or "suspend," the atty driver breaks out of history mode and cancels the command line being edited.

6.8.3 Kana-Kanji Conversion: Customization of Japanese Input Options

In the Japanese language, a particular language element, such as a vowel, can be represented by more than one character. These characters can have both phonetic and ideographic variants; furthermore, the phonetic character variants can print in either two-column or single-column width. The different classes of characters, listed in the following table, require different input schemes:

Character Class	Description
Kanji	Ideographic
Hiragana	Phonetic
Katakana	Phonetic. Katakana characters exist in full-width (two-column) and half-width (single-column) formats. The single-column format of Katakana is referred to as Hankaku.

During a single session, a Japanese user can work with Kanji, Hiragana, and Katakana characters in various combinations. The user therefore must be able to customize terminal input mode to suit the character being entered. When the input device is a JIS terminal rather than a workstation, the user must adjust line discipline and terminal code settings in the software to match hardware capabilities (for example, whether the terminal uses 7-bit or 8-bit encoding).

The tty driver supports a mechanism known as Kana-Kanji conversion. This term refers to the conversion between phonetic and ideographic character encoding and the support for keyboard entry sequences that make Japanese character selection more efficient for the user. You use the stty command to enable or disable the Kana-Kanji conversion method and other aspects of Japanese input support. The stty options that support Japanese input are described in Table 6-6 and, unless noted otherwise, are used in conjunction with the jdec option. For example, the following command sets the terminal line discipline to support Japanese character encoding and also enables Kana-Kanji conversion:


% stty jdec ikk

Table 6-6: The stty Options to Enable and Customize Japanese Input
stty Option	Description
`clause mode`	Sets the character attribute for marking a clause that results from Kana-Kanji conversion. The mode argument can be `bold`, `underline`, `reverse`, or `none`.
`esc.alw`	Changes the terminal state to "shift out" whenever a newline character is output. This option applies only when the `tcode` (terminal code) `stty` option is set to `jis7` or `jis8`.
`-esc.alw`	Does not change the current terminal state when a newline character is output. This option applies only when the `tcode` (terminal code) option is set to `jis7` or `jis8`.
`henkan mode`	Sets the character attribute for marking a Henkan, or conversion, region that results from Kana-Kanji conversion. The mode argument can be `bold`, `underline`, `reverse`, or `none`.
`ikk`	Enables the Japanese input method and spawns the Kana-Kanji conversion daemon, `kkcd`, if it does not already exist. Depending on the terminal driver (BSD or STREAMS), use either the `jx` or `jinkey` option before using the `ikk` option to enable the input method. By default, key map information is taken from (in highest to lowest priority order): (1) the file specified for the `kkseq` option of the `stty` command; (2) the file defined for the environment variable `JSYKKSEQ`; (3) the file `$HOME/.jsykkseq`. System default key map files for the Japanese input method reside in the directory `/usr/i18n/skel/ja_JP`. Dictionaries used with the Japanese input method are taken from (in highest to lowest priority order): (1) the files defined for the environment variables `JSYTANGO`, `JSYKOJIN`, and `JSYLEARN`; (2) the dictionary files `/usr/i18n/jsy/jsytango.dic`, `$HOME/jsykojin.dic`, and `$HOME/jsylearn.dic`.
`-ikk`	Disables the Japanese input method and kills the `kkcd` daemon.
`jinkey` sequence	Defines the escape sequence to activate the extended Japanese input method used with the STREAMS terminal driver. The parameter for this option can be more than one character.
`imode mode`	Sets the mode for handling 8-bit code or Hankaku (single-column) Kana code when the terminal line discipline is set to `dec`. The mode argument can be one of the following keywords: `kanji`, where the 8-bit code is treated as encoding for Kanji; `hiragana`, where the 8-bit code is converted to 2-column Hiragana format; `katakana`, where the 8-bit code is converted to 2-column Katakana format; or `hankaku`, where the 8-bit code is handled in Hankaku (1-column) Katakana format.
`jx character`	Sets the toggle character for entering the extended, or `cbreak`, Kana-Kanji conversion mode used with the BSD terminal driver. Users need to enter `cbreak` mode when working in utilities, such as `dbx`, that do not support the full range of Japanese input options.
`-jx`	Undefines the toggle character for entering the extended Kana-Kanji conversion mode.
`kin esc_sequence`	Sets the JIS Kanji "shift in" escape sequence for the JIS terminal.
`kkmap`	Displays the current key map for Kana-Kanji conversion. The display is a traversal tree with a maximum length of 15 characters for each key sequence.
`kkseq file`	Sets the Kana-Kanji conversion key map file for the terminal (see also the table entry for the `ikk` option).
`knj.bsl`	Uses only one backspace to erase one Kanji character.
`-knj.bsl`	Uses two backspaces to erase one Kanji character.
`knj.sp`	Uses one 2-byte (zenkaku) space to blank out one Kanji character.
`-knj.sp`	Uses two ASCII spaces to blank out one Kanji character.
`kout esc_sequence`	Sets the JIS Kanji "shift out" escape sequence for the JIS terminal.
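The priority orders given in the ikk entry of Table 6-6 can be summarized as a lookup. The following Python sketch is a model of those rules only; the function names are hypothetical, and the sketch does not check whether any file actually exists, which the real input method may well do.

```python
import os

# Defaults named in the ikk table entry; "$HOME" is expanded by the helper below.
DICT_DEFAULTS = {
    "JSYTANGO": "/usr/i18n/jsy/jsytango.dic",
    "JSYKOJIN": "$HOME/jsykojin.dic",
    "JSYLEARN": "$HOME/jsylearn.dic",
}


def resolve_keymap(kkseq_file=None, environ=None, home=None):
    """Return the key map path with the highest priority, or None."""
    environ = os.environ if environ is None else environ
    home = environ.get("HOME") if home is None else home

    if kkseq_file:                    # 1. file given to "stty kkseq"
        return kkseq_file
    if environ.get("JSYKKSEQ"):       # 2. JSYKKSEQ environment variable
        return environ["JSYKKSEQ"]
    if home:                          # 3. $HOME/.jsykkseq
        return home.rstrip("/") + "/.jsykkseq"
    return None


def resolve_dictionaries(environ, home):
    """Map each dictionary variable to the path that would be used."""
    paths = {}
    for var, default in DICT_DEFAULTS.items():
        # An environment variable, when set, overrides the default file.
        paths[var] = environ.get(var) or default.replace("$HOME", home)
    return paths
```

For example, resolve_keymap(None, {"JSYKKSEQ": "/tmp/my.seq"}, "/usr/users/wang") returns "/tmp/my.seq", because the environment variable outranks the file in the home directory.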

6.9 Setting Up and Using User-Defined Character Databases

The national character sets for Japan, Taiwan, and China do not include some of the characters that can appear in Asian place and personal names. Such characters are defined by users and reside in site-specific databases. These databases are called user-defined character (UDC) or character-attribute databases. When users define ideographic characters, they must also define font glyphs, collating files, and other support files for the characters. You create characters with the cedit application, discussed in Section 6.9.1. You use the cgen utility, discussed in Section 6.9.2, to create font, collation, and other support files for user-defined characters. X applications can also obtain fonts for user-defined characters directly from a UDC database by using font renderers. Refer to Section 6.18.2 for information about font renderers.

Note
The system default sort command does not access the collation files created for user-defined characters. Refer to Section 6.14 for information on sorting strings that may contain these characters.

The rest of this section discusses some setup that is necessary before terminals or workstation monitors can display user-defined characters.

The tty driver includes a mechanism to allow on-demand loading of files associated with user-defined characters. You enable this mechanism and can change some of its default parameter values with the stty command. Table 6-7 describes the stty options that you use with on-demand loading.

Table 6-7: The stty Options for On-Demand Loading of UDC Support Files
stty Option	Description
`odl`	Enables the software on-demand loading (SoftODL) service.
`-odl`	Disables the software on-demand loading (SoftODL) service.
`odlsize size`	Sets the maximum size of the ODL buffer. This size should be the same as a terminal's font-cache size. By default, size is 256 characters.
`odltype type`	Sets the ODL buffer replacement strategy. Valid values for type are `fifo` (first-in-first-out) and `lru` (least recently used).
`odldb path`	Sets the path to the database and other files that support user-defined characters. If this path is not specified, either the system default files are used or, if users are allowed to create personal UDC databases, the process default files are used. Default pathnames for various databases are specified in the file `/var/i18n/conf/cp_dirs`, which is described in a subsequent section of this chapter. The `cp_dirs` file specifies, for example, that the systemwide defaults are `/var/i18n/udc` and `/var/i18n/odl`, and that the process defaults are `$HOME/.udc` and `$HOME/.odl`. Use the `odldb` option when you want to change the default `odl` file.
`odlreset`	Resets the ODL service and clears the internal ODL buffers.
`odlall`	Displays the current settings for the ODL service.

Figure 6-1 shows the relationship among components mentioned in the preceding table and the SoftODL service.

Figure 6-1: Components That Support User-Defined Characters
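The difference between the two replacement strategies that the odltype option selects can be seen in a small model. The following Python sketch is an illustration under stated assumptions, not the SoftODL implementation: the class name, the load callback, and the method names are hypothetical. Only the default size of 256 characters and the strategy names fifo and lru come from Table 6-7.

```python
from collections import OrderedDict


class OdlBuffer:
    """Toy font buffer with FIFO or LRU replacement."""

    def __init__(self, size=256, policy="fifo"):
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.policy = policy
        self._slots = OrderedDict()  # entry at the front is evicted first

    def get(self, code, load):
        """Return the glyph for code, calling load(code) on a miss."""
        if code in self._slots:
            if self.policy == "lru":
                # A hit makes the entry the most recently used one.
                self._slots.move_to_end(code)
            return self._slots[code]
        if len(self._slots) >= self.size:
            self._slots.popitem(last=False)  # evict the front entry
        self._slots[code] = load(code)
        return self._slots[code]

    def reset(self):
        """Clear the buffer, in the spirit of the odlreset option."""
        self._slots.clear()

    def codes(self):
        """Character codes currently held, in eviction order."""
        return list(self._slots)
```

With a buffer of two entries and the access sequence 1, 2, 1, 3, the fifo strategy evicts character 1 (the oldest load), whereas the lru strategy evicts character 2 (the least recently referenced).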

6.9.1 Creating User-Defined Characters

The user-defined character editor (cedit) is a curses application for managing attributes of user-defined characters. The character attributes that you usually manipulate with the cedit application include:

Styles and sizes (16x18, 24x24, 32x32, and 40x40) for bitmap fonts
Codeset values
Collating values
Input key sequences

Each user-defined character has a character attribute record, which is stored in a character attribute, or UDC, database. A UDC database can be systemwide or private. There can be only one systemwide database that all users share; however, any user can have a private database as well. The following command invokes the user-defined character editor:


% cedit

The preceding command, which includes no options, uses the default database. If you are superuser, the default database is /var/i18n/udc. If you are an unprivileged user, the default database is $HOME/.udc. There are a number of problems you can encounter when using user-defined characters that are maintained in private databases; therefore, Digital recommends that user-defined characters be maintained only in a systemwide database by a privileged user. The cedit command has a number of options and an argument, which are described in Table 6-8.

Table 6-8: The cedit Command Options
cedit Options and Arguments	Description
`-c old_db`	Converts a Japanese ULTRIX `fedit` font file or an Asian ULTRIX character attribute database file to the format used by `cedit`.
cur_db	Specifies the path of a character attribute database (to override the default path).
`-h`	Displays `cedit` syntax.
`-r ref_db`	Specifies the path of the reference character attribute database (to override the default path). This database provides a model for the UDC database on which you are working with the `cedit` utility. The Reference Database item on the `cedit` File menu is an alternative to specifying the `-r` option on the `cedit` command line.

The following command displays the cedit syntax format:


% cedit -h
Usage : cedit [-h] [-c <old_db>] [-r <ref_db>] [<cur_db>]

The cedit command returns an error message if your locale setting is one that is not supported for creation of user-defined characters. Locales supported for user-defined characters include those for the Chinese and Japanese languages. After you invoke cedit, you can use the Options menu on the cedit user interface screen to change the language of user interface messages and help text back to English.

The following sections discuss the screens, menu items, editing modes, and function keys of the cedit utility.

6.9.1.1 Working on the cedit User Interface Screen

When the LANG variable is set to a supported locale, such as zh_TW.big5, the cedit command displays the user interface screen shown in Figure 6-2.

Figure 6-2: The cedit User Interface Screen

The user interface screen is divided into three areas:

Menu area
This area contains a bar of menu names. When you select and activate a particular menu, its items appear in the portion of the menu area below the menu bar.
Status area
Below the menu area is the status area, which displays the current language and codeset.
Input and message area
The bottom two lines of the screen accept user input and display warning or informational messages.

To navigate the menu interface, you can use the four arrow keys to select a menu and then press either Return or the space bar to see items on that menu. You can accomplish the same goal more directly by pressing the key for the letter that is underlined in the title of the menu.

Menu items are displayed in one of the following states:

Active
An active item is one that you can select. Active items appear with one letter highlighted and underlined. You can press the key for that letter to start the function represented by the item.
Inactive
You cannot select inactive items. Inactive items do not contain underlined and highlighted letters.
Selected
If you press the down arrow key rather than the key for a highlighted letter, you can select items without starting the functions they represent. The currently selected item is shown in reverse video.
Activated
You activate an item when you press the key for a highlighted letter or when you press Return or the space bar after selecting the item with the down arrow key. Activating an item usually displays a pop-up menu, causes a particular function to start, or both. Activating an item that is followed by the characters >> displays a cascade menu.
In the text that follows, when you are told to choose an item, you should activate it.

To return to a higher menu level without activating items, press Ctrl/X.

Menus on the user interface screen provide the following options for managing user-defined characters and their attributes:

File
Use the File menu to:
- Save changes made to the character you are currently working on
- Cancel changes made to the current character
- Change the reference character attribute database
- Exit from or quit the cedit program
Edit
Use the Edit menu to select a character and create or change its font glyph, codeset value, collating value, input key sequence, class, or name.
Section 6.9.1.2 discusses editing a character's font glyph.
Delete
Use the Delete menu to delete a character or some of its attributes.

Show

Use the Show menu to display attributes of the character you are working on or the status of databases (current character attribute database or reference character attribute database).

The cedit utility keeps track of a character through its attribute record. This record contains fields to identify the following attributes:

Character number (unique for each character in the UDC database)
Codeset values (one for each codeset supported by a particular language/territory combination)
Font styles and sizes
Collation values (one for each collation sequence supported by the language)
Input key sequences (one for each input method supported by the language)
Class identifiers (reserved for future use)
Character mnemonic (reserved for future use)

There is some variation among Asian codesets in terms of support for UDC attributes. For example, you cannot define an input key sequence through cedit for a Japanese user-defined character. For Chinese, you can define an input key sequence for use only with the DEC Hanyu codeset and TsangChi and QuickTsangChi input modes.
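As a reading aid, the attribute record described above can be pictured as a simple data structure. The following Python sketch is not the on-disk format that cedit uses; every field name, key name, and code value in it is hypothetical and chosen only to mirror the list of attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels


@dataclass
class UdcRecord:
    """Illustrative model of one character attribute record."""

    number: int                                   # unique within the database
    # One value per codeset supported by the language/territory.
    codeset_values: Dict[str, int] = field(default_factory=dict)
    # One glyph per (style, width, height) combination.
    glyphs: Dict[Tuple[str, int, int], Bitmap] = field(default_factory=dict)
    # One value per collation sequence supported by the language.
    collation_values: Dict[str, int] = field(default_factory=dict)
    # One key sequence per input method supported by the language.
    input_key_sequences: Dict[str, str] = field(default_factory=dict)
    class_ids: Tuple[str, ...] = ()               # reserved for future use
    mnemonic: str = ""                            # reserved for future use


# Example record; the codeset key and the code value are made up.
record = UdcRecord(number=1)
record.codeset_values["big5"] = 0xFA40
record.glyphs[("normal", 24, 24)] = [[0] * 24 for _ in range(24)]
```

Keeping each per-codeset, per-collation, and per-input-method value in its own mapping reflects the "one for each" wording in the list; a character that lacks an attribute for some codeset simply has no entry for it.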

Commands
Use the Commands menu to:
- Copy character records from the reference character attribute database to the current character attribute database or, within the current character attribute database, copy records from one range of characters to another
  You can implement the copy operation blindly (No Confirm), confirm the copy operation for each character in the range (Confirm All), or confirm the copy operation only for characters that will overwrite other characters (Confirm Conflict).

Options
Use the Options menu to change the current setting for language and codeset that is applied to your work on user-defined characters. You can also independently set the language of messages and help text in the cedit user interface. By default, the language of the cedit user interface is the same as the locale setting in effect when you invoked cedit.

Help

Use the Help menu to display introductory text for cedit functions. Help is also available for menu items through the Help key when this key is provided on your terminal or, for workstation users, enabled by your terminal setting. In other words, you can first select a menu item with the arrow keys and then press the Help key for a short description of the selected item.

6.9.1.2 Editing Font Glyphs

To create or change the font glyph of a user-defined character, you must invoke the font editing screen of cedit as follows:

Select a character by choosing the Character item from the Edit menu.

The cedit program then prompts you to enter the hexadecimal code value (without the \x prefix) for the character to be edited. The range of valid codes for UDC characters is defined in locales for Asian languages. When more than one codeset is supported for the language and territory of your current locale, cedit attempts to supply values for the additional codesets so the character can be used with all the associated locales.

If cedit cannot determine the character's value in other codesets, you can change the codeset setting through the Options menu and then explicitly specify the character's encoding in the additional codeset. In general, it is a good idea to define user-defined characters to have values that can be mapped to other codesets supported for the language. For more information on codes for user-defined characters in specific Asian languages, refer to the language-specific user guides available with the Asian-language subsets of Digital UNIX.

The cedit utility first searches your current UDC database for the code that you enter. If a character with that code is not found in the UDC database, the utility searches the current reference character database.

Choose the Font item from the Edit menu to see options for font style/size.

Choose one of the font style/size options.
If you are creating a font glyph for use in a DECwindows Motif application, the available size options may not be appropriate for the window area where you intend to use the font. In this case, choose the smallest size option that will accommodate both dimensions of your DECwindows font.

The cedit program then displays the full-screen font editor interface as shown in Figure 6-3.

Figure 6-3: The cedit Font Editing Screen

The cedit font-editing screen has several windows:

The large window on the right-hand side of the screen is where you edit the UDC font glyph. To edit, use the cursor movements and editing functions that cedit supports.
Each dot on the editing window represents one pixel.
The three small windows immediately under the Reference title display other font glyphs that you can refer to while editing the current one. You use the cedit Refer function to control which font glyphs appear in these windows.
The small window under the three reference windows is called the display window. The display window shows the font glyph you are editing in its actual size. The display window does not automatically reflect changes you make in the editing window. You must press the KP. key to update the font glyph in the display window.

Note
There are some hardware restrictions regarding font glyph displays in the small windows.
Font glyph displays in the reference and display windows are enabled only on certain terminals, specifically, on local-language terminals that support the Dynamic Replacement Character Set (DRCS) function.
On DECterm windows, the font glyph in the Display window does not appear in its actual size.

Fonts created in the editing window for use with system software are processed to occupy the size dimensions you selected before the editor interface screen appeared.

You can also create a font for use with DECwindows software whose dimensions are smaller than those selected. In this case, you confine your editing operations to a rectangle that originates at the upper-left corner of the editing window and has dimensions smaller than the available editing space (see Figure 6-4). The UDC font converter that supports DECwindows considers the upper-left corner of the editing window as the font origin, generates dimensions needed to encompass the glyph based on this origin, and discards unused space outside these dimensions. This utility also allows you to explicitly specify the size dimensions for the compiled font glyphs.

Figure 6-4: Interpretation of Font Editing Screen for Sizing DECwindows Font
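The sizing rule that Figure 6-4 illustrates can also be expressed as a short calculation. The following Python sketch is one plausible reading of that rule, not the font converter's code: the function names are hypothetical, and a glyph is modeled as a list of rows of 0/1 pixels.

```python
def encompassing_size(bitmap):
    """Smallest (width, height), measured from the upper-left origin,
    that still covers every pixel that is on."""
    width = height = 0
    for y, row in enumerate(bitmap):
        for x, pixel in enumerate(row):
            if pixel:
                width = max(width, x + 1)
                height = max(height, y + 1)
    return width, height


def resize_from_origin(bitmap, width, height):
    """Crop or pad to width x height, keeping the upper-left corner fixed.
    Space added on the right or at the bottom is filled with off pixels."""
    result = []
    for y in range(height):
        row = bitmap[y] if y < len(bitmap) else []
        result.append([row[x] if x < len(row) else 0 for x in range(width)])
    return result


# A 4x4 editing area in which only the upper-left 3x2 region is used.
glyph = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
width, height = encompassing_size(glyph)
trimmed = resize_from_origin(glyph, width, height)
```

Because the origin is fixed at the upper-left corner, blank rows and columns above or to the left of the drawing are kept, while unused space below and to the right is discarded. The same helper pads with off pixels when the requested size is larger than the source, which matches how the -osiz option is described in Table 6-15.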

All functions in cedit are bound to keys; in other words, you press a key to invoke a function. Press either the PF2 or the Help key to see a diagram of how keys are bound to editing functions. Note that your online diagram may vary from the one shown here due to differences in keypad design on some systems. There are four kinds of editing modes for the cedit editing screen:

Cursor modes
Cursor modes determine what happens to pixels when you move the cursor with the keypad keys. Moving the cursor with the arrow keys never affects the pixel state.
- On: Turns on the pixel under the cursor.
- Off: Turns off the pixel under the cursor.
- On/Off: Toggles the pixel under the cursor.
  You can also toggle the pixel under the cursor without any movement by pressing KP5.
- Move: Moves the cursor without changing the pixel state.

Paste modes

Paste modes control the pixel operation when you perform the paste function.

Overlay: Sets a pixel on if it or its corresponding pixel in the paste buffer is on.
Overwrite: Sets the pixel to the state of the corresponding pixel in the paste buffer.

Type modes

Type modes determine whether a margin of one pixel width is maintained around the character.

Body: Allows you to edit the entire font glyph area.
Letter: Prevents you from editing the pixel value of the boundary area.
Letter mode means that you cannot set pixels to the on state when at the boundary of the editing window.

Wrap modes

Wrap modes enable or disable cursor wrapping.

On: Causes the cursor to wrap to the leftmost pixel when you move the cursor beyond the rightmost pixel in the editing area.
Similar wrapping behavior occurs when you move the cursor beyond the leftmost, uppermost, and lowermost pixels in the editing area.
Off: Causes the bell to ring and stops cursor movement on attempts to move the cursor beyond the leftmost, rightmost, uppermost, and lowermost pixels in the editing area.
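The Paste and Wrap modes lend themselves to a compact illustration. The following Python sketch models the behavior described above and is not taken from cedit: the function names are hypothetical, bitmaps are lists of rows of 0/1 pixels, and clipping a pasted block at the edge of the editing area is an assumption.

```python
def paste(edit, buf, top, left, mode="overlay"):
    """Paste buf into edit, in place, with its upper-left corner at
    row top and column left. Returns edit for convenience."""
    if mode not in ("overlay", "overwrite"):
        raise ValueError("mode must be 'overlay' or 'overwrite'")
    for dy, row in enumerate(buf):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            # Assumption: pixels that fall outside the editing area are dropped.
            if 0 <= y < len(edit) and 0 <= x < len(edit[y]):
                if mode == "overlay":
                    # On if either the existing or the pasted pixel is on.
                    edit[y][x] = edit[y][x] | pixel
                else:
                    # Overwrite: take the state of the pasted pixel.
                    edit[y][x] = pixel
    return edit


def step(pos, delta, size, wrap=True):
    """Move one cursor coordinate by delta within 0..size-1.
    Returns (new_position, bell), where bell is True when movement
    was refused because wrapping is off."""
    new = pos + delta
    if 0 <= new < size:
        return new, False
    if wrap:
        return new % size, False   # reappear at the opposite edge
    return pos, True               # stay put and ring the bell
```

For a 24-pixel-wide editing area, step(23, 1, 24) returns (0, False) with wrapping on, and step(23, 1, 24, wrap=False) returns (23, True).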

The cedit font editor uses four buffers to store bitmap data. Some of these buffers are used by editing functions, which are discussed following the buffer descriptions.

Edit buffer
This is the buffer whose contents normally appear in the editing window.

Use buffer

This buffer is associated with the Use function and contains a font glyph you retrieved from a UDC database or one of the reference windows.

Cut-and-Paste buffer

Use this buffer when pasting bitmap data in the editing window. The bitmap data being pasted is copied either from a Use buffer or the Edit buffer (if you are copying something from one section of the editing window to another).

Undo buffer

This buffer contains the changes made during the last edit operation and is used by the cedit Undo function to delete those changes.

When you are working on windows in the font-editing screen, you invoke editing functions by using keystrokes or, in some cases, through a pop-up menu that appears when you press the Do key. The following functions are available on the pop-up menu:

Scale
This function lets you scale the current font glyph to another size supported by the system. The SCALE function does not have a keystroke alternative and is available only on the pop-up menu.

Use

This function retrieves a font glyph from a UDC database or from one of the reference windows.

Refer

This function saves a font glyph copied from a UDC database into one of the reference windows.

Figure 6-5 shows the keypad keymaps for invoking different editing functions. The keypad functions, along with the letter keys used for drawing, are described in the following tables.

Figure 6-5: Keymap for cedit Functions

Table 6-9: Keys for Miscellaneous Font Editing Functions
Key	Description
Help or PF2	Shows you which keys are bound to which editing functions. Press Help along with another key in the diagram for more information on a particular key's editing function.
PF1	Toggles the GOLD state. Some keypad keys represent more than one function; in this case, one of those functions is invoked by pressing PF1 and then the other keypad key.
KP.	Displays the font glyph in actual size on the display window.
GOLD KP.	Clears the font glyph displayed in the editing window.
U or u	Undoes the previous operation.
Ctrl/L	Redraws the screen.
Ctrl/Z	Suspends the `cedit` program.
Do	Displays the pop-up menu for invoking SCALE, USE, and REFER functions.
Enter	Saves changes and exits from the font editor.
GOLD Enter	Quits the font editor without saving changes.

Table 6-10: Keys for cedit Mode Switching
Key	Description
PF3	Toggles Cursor mode.
PF4	Toggles Paste mode.
KP-	Toggles Type mode.
KP,	Toggles Wrap mode.

Table 6-11: Keys for Fine Control of Cursor Movement
Key	Description
Up-arrow	Moves the cursor up.
Down-arrow	Moves the cursor down.
Left-arrow	Moves the cursor left.
Right-arrow	Moves the cursor right.
KP7	Depending on Cursor mode, moves the cursor up and left.
KP8	Depending on Cursor mode, moves the cursor up.
KP9	Depending on Cursor mode, moves the cursor up and right.
KP4	Depending on Cursor mode, moves the cursor left.
KP6	Depending on Cursor mode, moves the cursor right.
KP1	Depending on Cursor mode, moves the cursor down and left.
KP2	Depending on Cursor mode, moves the cursor down.
KP3	Depending on Cursor mode, moves the cursor down and right.
KP5	Toggles the pixel under the cursor without moving the cursor.

Table 6-12: Keys for Moving Cursor to Window Areas
Key	Description
GOLD KP7	Moves the cursor to the upper-left corner.
GOLD KP8	Moves the cursor to the top row.
GOLD KP9	Moves the cursor to the upper-right corner.
GOLD KP4	Moves the cursor to the leftmost column.
GOLD KP5	Moves the cursor to the center of the window.
GOLD KP6	Moves the cursor to the rightmost column.
GOLD KP1	Moves the cursor to the lower-left corner.
GOLD KP2	Moves the cursor to the bottom row.
GOLD KP3	Moves the cursor to the lower-right corner.

Table 6-13: Keys for Drawing Font Glyphs
Key	Description
L or l	Draws a line connecting two selected points.
C or c	Draws a circle centered at a selected point.
r	Draws an open rectangle in a selected area.
R	Draws a solid rectangle in a selected area.
e	Draws an open ellipse in a selected area.
E	Draws a solid ellipse in a selected area.
X or x	Mirrors the font glyph along the horizontal axis (X-axis).
Y or y	Mirrors the font glyph along the vertical axis (Y-axis).
/	Mirrors the font glyph along the 45-degree diagonal axis.
\	Mirrors the font glyph along the 135-degree diagonal axis.
F or f	Depending on cursor mode, fills an area.
T or t	Inverts the state of all pixels.
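Several of the whole-glyph operations in Table 6-13 are simple bitmap transformations. The following Python sketch shows three of them with hypothetical function names. It takes "along the horizontal axis" to mean a top-to-bottom flip and "along the vertical axis" to mean a left-to-right flip; that reading is an interpretation, because the table does not spell out the geometry, and the diagonal mirrors are omitted for the same reason.

```python
def mirror_x(bitmap):
    """X key (as interpreted here): flip top to bottom."""
    return [row[:] for row in reversed(bitmap)]


def mirror_y(bitmap):
    """Y key (as interpreted here): flip left to right."""
    return [row[::-1] for row in bitmap]


def invert(bitmap):
    """T key: turn every on pixel off and every off pixel on."""
    return [[1 - pixel for pixel in row] for row in bitmap]


glyph = [
    [1, 0, 0],
    [1, 1, 0],
]
flipped = mirror_x(glyph)
```

Each function returns a new bitmap and leaves its argument unchanged, which is convenient when an undo copy of the previous state must be kept.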

Table 6-14: Keys for Editing Font Glyphs
Key	Description
KP0	Changes the display in the Edit window from the font glyph in the Edit buffer to the font glyph in the Use buffer.
GOLD KP.	Displays font glyphs in the reference windows.
GOLD KP0	Changes the display in the Edit window from the font glyph in the Use buffer to the font glyph in the Edit buffer.
Select	Starts or cancels a selected area.
Insert	Inserts the contents of the CUT-AND-PASTE buffer.
Remove	Cuts a selected area to the CUT-AND-PASTE buffer.
GOLD Remove	Copies a selected area to the CUT-AND-PASTE buffer.
GOLD Up-arrow	Shifts the font glyph up by one line.
GOLD Down-arrow	Shifts the font glyph down by one line.
GOLD Left-arrow	Shifts the font glyph left by one column.
GOLD Right-arrow	Shifts the font glyph right by one column.

There is often more than one way to perform the same editing operation. The following summary discusses one method to accomplish various operations:

Drawing the glyph
Use the keys KP1 to KP9 to draw and navigate in the editing window. These keys are bound to cursor movement. With the exception of KP5, you can think of these keys as points on a compass; each point represents the direction in which drawing occurs. Drawing is affected by Cursor mode, which is controlled using the PF3 key. When Cursor mode is set to Move, the drawing keys move the cursor without drawing anything.
Use the KP5 key (in the middle of the compass) to toggle the pixel state on or off.
Cursor movement is affected by Type and Wrap modes, which are bound to the KP- and KP, keys, respectively.

Editing the glyph

Drawing keys change pixels one at a time. Several operations (cut, paste, and copy) affect pixels as a block. Use the Select function to define a select area. Then use Cut or Copy to move the block of pixels to a paste buffer. You can then move the cursor to another position and use the Paste function to move the pixels in the paste buffer to the new position. The paste operation is affected by the Paste mode setting.

To move the entire glyph in a particular direction, you can press the GOLD or PF1 key and the appropriate arrow key.

To undo the last editing operation, press the U key.

Displaying the glyph in actual size

If you are working on an Asian terminal rather than in a DECterm window, you can press the KP. key to display the glyph in actual size. This operation is not supported through DECterm windows.

Creating multiple prototypes of a glyph

You can create several versions of a glyph, storing earlier versions in reference windows, and later choose the one you like best. Press the KP. key to move a glyph from the editing window to a reference window. The three reference windows are used in round-robin fashion, from left to right.

Note that the Refer function available from the pop-up menu allows you to move an existing glyph from the current or reference database to a reference window.

Replacing the glyph in the editing window with another glyph

The Use function moves a glyph into the editing window. The Use function bound to the keypad copies a glyph from another codepoint in the current or reference database. The Use function accessed from the pop-up menu moves a glyph from one of the reference windows into the editing window.

The Use function saves a copy of the current glyph in the editing window to the Use buffer. You can retrieve the glyph from this buffer by pressing the KP0 key. Unlike the contents of the Undo buffer, the glyph in the Use buffer is available across editing operations.

Creating multiple sizes of glyphs

The Scale option in the cedit main menu creates multiple sizes of all glyphs in the database with the currently selected size. The Scale option available for the font-editing screen creates multiple sizes of only the character currently being edited. If you are working with an existing UDC database, use the Scale option from the font-editing screen rather than the cedit main menu. When scaling is implemented from the cedit main menu and affects an entire database, the operation undoes any manual refinements that may have been made to fonts after scaling.

Quitting the font-editing screen

Press the Enter key to save your edits and to exit from the font editing screen.

Press the GOLD or PF1 key and then the Enter key to quit without saving your edits.

After you create a font glyph, you need to specify its name, input key sequence, collating value, and, optionally, the name of the class to which the character belongs. Use the Edit menu items on the cedit user interface screen to specify these attributes.

6.9.2 Creating UDC Support Files That System Software Uses

The character attributes stored in the UDC database must be directed to specific kinds of files to meet the needs of different kinds of system software. Terminal driver software and the asort utility, for example, must recognize user-defined character attributes but cannot directly access information in UDC databases. Therefore, after you create or change character attributes in a UDC database, you use the cgen command to create the following support files:

Font files that the SoftODL (software on-demand loading) service uses

Font files that can be directly loaded to the device

Collating value tables for sorting characters

Files of input key sequences for user-defined characters

Font files that X and Motif applications use

The following command creates some of these files for the UDC database ~wang/.udc:


% cgen -odl -pre -col -iks ~wang/.udc

If you enter the cgen command without specifying options, it displays statistical information about the specified database. If you enter the command without specifying a UDC database, the private user database is used for a nonprivileged user and the system database for the superuser. In other words, the database specification in the preceding example would not be needed if the user who entered the command was logged on as wang.

Table 6-15 describes cgen command options.

Table 6-15: The cgen Command Options
Option	Description
`-bdf`	Creates `.bdf` files needed for X and DECwindows Motif applications.
`-col`	Creates collating value tables. You must use the `asort` command, rather than the `sort` command, if you want to apply these tables during sort operations.
`-dpi 75|100`	Sets resolution to either `75` or `100` when creating `.bdf` and `.pcf` files with the `-bdf` and `-pcf` options.
`-fprop property`	Sets the font property when creating `.bdf` and `.pcf` files with the `-bdf` and `-pcf` options.
`-iks`	Creates the input key sequence file.
`-merge font_pattern`	Invokes the `fontconverter` command to merge the UDC fonts with an existing `pcf` font file that matches the specified font_pattern (for example, `'-140-jisx0208'`). If you specify the `-merge` option, you must also specify the `-pcf` and `-size` options. The output `.pcf` file is named in the form `registry_width_height.pcf`, where registry is the font registry field of the specified font file.
`-osiz widthxheight`	Specifies the font size for `bdf` output format. The font size in `bdf` format may be different from the size of the font defined in the UDC database. The font sizes that the `cedit` command supports are limited; the `-osiz` option lets you override these size restrictions both in the `.bdf` file and the `.pcf` file generated from the `.bdf` file. If the size parameters specified for the `-osiz` option are smaller than the size parameters specified for the `-size` option, only the upper-left portion of the UDC font glyph is used. If the size parameters specified for the `-osiz` option are larger than the size parameters specified for the `-size` option, the lower-right portion of the resulting font glyph is filled with OFF pixels.
`-pcf`	Invokes the `bdftopcf` command to create the `.pcf` files needed for X and Motif applications. When you use this option, the `cgen` command also invokes the `mkfontdir` and `xset` commands to make the fonts known to the font server and available to applications.
`-pre`	Creates preload font files. Preload font files are files that are directly and completely loaded to a terminal and some printers. Preload files are not useful when UDC databases are large because of the limited memory available on most devices. On-demand loading (ODL), which uses ODL font files, is an alternative to using preload font files.
`-odl`	Creates ODL font files. The terminal driver handles loading of fonts from ODL font files on an incremental basis, according to need and available memory.
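The cropping and padding behavior described for the `-osiz` option can be pictured with a small Python sketch. It is an illustration only, under the assumption that a glyph is a list of pixel rows with 0 for OFF and 1 for ON; the function name `resize_glyph` is hypothetical and does not correspond to any cgen interface.

```python
def resize_glyph(glyph, out_width, out_height):
    """Crop or pad a bitmap glyph, anchored at its upper-left corner.

    glyph is a list of rows; each row is a list of 0 (OFF) or 1 (ON) pixels.
    """
    resized = []
    for y in range(out_height):
        # Rows beyond the original height are new and start out empty.
        row = glyph[y] if y < len(glyph) else []
        # Keep the leftmost pixels, then pad on the right with OFF pixels.
        new_row = list(row[:out_width])
        new_row += [0] * (out_width - len(new_row))
        resized.append(new_row)
    return resized
```

A smaller output size keeps only the upper-left portion of the glyph; a larger one leaves the added area to the right and below filled with OFF pixels.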

6.9.3 Processing UDC Fonts for Use with DECwindows

The preload font files created with the -pre option of the cgen utility must be converted to bdf (Bitmap Distribution Format) or pcf (Portable Compiled Format) for use by X11 or DECwindows applications. The fontconverter command performs this conversion and can do one of two things with the converted output:

Create independent pcf and bdf font files, which you must then install on your workstation for application use
Merge the fonts into an existing DECwindows (pcf) font file

The remainder of this section discusses the fontconverter command and when to use its available options. The cgen command has comparable options; in other words, you can perform fontconverter operations indirectly by using similar options on the cgen command line.

6.9.3.1 Using fontconverter Command Options

The following example shows the simplest form of the fontconverter command, which relies on defaults for file locations, output file names, input file name extensions, and font dimensions. Assume for this example and the following discussion that the locale is set to a Japanese locale when the command is entered and that 24x24 was specified in the cedit utility when the font glyphs were created.


```
% fontconverter my_fonts
```

The preceding command converts fonts in the ~/.fonts/my_fonts.pre file. By default, the command creates the font files ~/.fonts/jisx.udc_24_24.pcf and ~/.fonts/jisx.udc_24_24.bdf. For the fonts to be available to applications, you can perform one of the following actions with the compiled (.pcf) fonts:

In the directory where the fonts reside, enter the following commands:
```
% /usr/bin/X11/mkfontdir

% /usr/bin/X11/xset +fp `pwd`
```
These commands make the fonts available for testing until a server restart or system shutdown occurs.
Including the -bdf and -pcf options on the cgen command line is a one-step alternative to executing the fontconverter and the preceding commands as separate operations.

Perform the following actions to make the fonts available on a more permanent basis (that is, after a server restart or system shutdown):

Copy the .pcf fonts to an existing font directory, for example, /usr/i18n/usr/lib/X11/fonts/decwin/100dpi.
```
% cp ~/.fonts/jisx.udc_24_24.pcf \
/usr/i18n/usr/lib/X11/fonts/decwin/100dpi
```

Change to that directory.


```
% cd /usr/i18n/usr/lib/X11/fonts/decwin/100dpi
```

Enter the mkfontdir command at that location.
```
% /usr/bin/X11/mkfontdir
```
Enter the following command:
```
% /usr/bin/X11/xset fp rehash
```
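The permanent-installation steps above can be collected into a small helper. The following Python sketch only builds the command lines and does not run them. The function name `install_commands` is hypothetical, and the sketch assumes that passing the font directory to mkfontdir as an argument is an acceptable substitute for changing to that directory first.

```python
def install_commands(pcf_file, font_dir):
    """Return the command lines for the permanent-installation steps, in order."""
    return [
        # Copy the compiled font into an existing font directory.
        ["cp", pcf_file, font_dir],
        # Rebuild the fonts.dir index for that directory.
        ["/usr/bin/X11/mkfontdir", font_dir],
        # Ask the X server to reread its font path.
        ["/usr/bin/X11/xset", "fp", "rehash"],
    ]
```

To execute the steps, pass each list in turn to `subprocess.run(cmd, check=True)` from an account that can write to the font directory.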

Table 6-16 lists and describes options of the fontconverter command. With the exception of -preload, the options are listed in command-line order. See Section 6.9.3.2 for examples that use these options.

Table 6-16: Options and Arguments of the fontconverter Command
Argument or Option	Description
`-merge`	Specifies that command output be merged with an existing DECwindows font file. See also the entry for the `-font` option.
`-w`	Specifies the font width. Use this option when the fonts are created with a width smaller than the one specified for the `cedit` font editing window.
`-h`	Specifies the font height. Use this option when the fonts are created with a height smaller than the one specified for the `cedit` font editing window.
`-udc udc_font`	Specifies the name of the output UDC font file. This name is also used in the XLFD (X Logical Font Description) as the registry name. Use this option when you are creating a standalone output file (you are not merging output into an existing file) and you do not want your output file to have a default name. The default base names for output files vary according to language, as follows: Japanese, `jisx.udc`; Hanyu, `dec.cns.udc`; Hanzi, `gb.udc`. The `fontconverter` command automatically appends font width and height to the output file base name.
`-font reference_font`	Specifies a reference DECwindows font file. If you use this option with the `-merge` option, reference_font indicates the `pcf` format file with which converted font glyphs are merged. If you use this option without the `-merge` option, the header of reference_font is used as a reference for generating the header of the standalone output file. Information in reference_font is also used to determine default characters in the standalone output file. A default character is a glyph (usually a square) that appears when the font does not contain any glyphs for a specified code.
`-preload preload_font`	Specifies the input file (created by the `cgen -pre` command). Use this option when you want to specify the preload_font argument at an arbitrary position in the `fontconverter` command line. You can omit `-preload` when placing preload_font at the end of the command line.
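The naming rule for standalone output files (base name, then width and height) can be expressed as a short Python sketch. It is an illustration of the rule as described in Table 6-16, not part of fontconverter. The function name `output_file_names` and the use of language names as dictionary keys are assumptions made for the example.

```python
# Default base names by language, as listed for the -udc option.
DEFAULT_BASE_NAMES = {
    "Japanese": "jisx.udc",
    "Hanyu": "dec.cns.udc",
    "Hanzi": "gb.udc",
}


def output_file_names(language, width, height, udc_name=None):
    """Return the (pcf, bdf) file names for a standalone conversion."""
    # An explicit -udc name overrides the language default.
    base = udc_name if udc_name is not None else DEFAULT_BASE_NAMES[language]
    # Width and height are appended to the base name.
    stem = f"{base}_{width}_{height}"
    return stem + ".pcf", stem + ".bdf"
```

For example, `output_file_names("Japanese", 24, 24)` returns `('jisx.udc_24_24.pcf', 'jisx.udc_24_24.bdf')`, the default names produced by the simplest form of the command.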

6.9.3.2 Controlling Output File Format

X and DECwindows Motif applications require loadable fonts in pcf format.

If you do not use the -merge option, the fontconverter command creates standalone font files in both pcf and bdf format. When you specify the -merge option, the converted fonts are merged into the pcf file specified by the -font option and a bdf file is not created.

When you merge UDC fonts with standard DECwindows fonts, you can use the combined file with all DECwindows Motif applications.

When you create independent font files, you can use the fonts with applications that explicitly load the file. If the font registry is one of the UDC registries for a particular locale, you can also use the files with standard system applications.

Note that fontconverter processing time is longer when you merge fonts into an existing file as compared to when you create independent files.

The following example:

Converts preload format fonts in the file udc_font.pre
Merges output into the DECwindows font JISX.UDC_*
Generates the output file JISX.UDC_16_16.pcf


```
% fontconverter -merge -font 'JISX.UDC_*' \
udc_font.pre
```

The following command:

Creates the files deckanji.udc_16_16.bdf and deckanji.udc_16_16.pcf
Obtains the default characters and most header information for these files from the DECwindows font JISX.UDC_*
Sets the font registry field to deckanji.udc


```
% fontconverter -udc deckanji.udc -font \
'JISX.UDC_*' udc_font.pre
```

6.10 Setting Up and Using the Chinese Phrase Input Method

In Korea, Taiwan, and China, users can input a complete phrase by typing a keyword, abbreviation, or acronym. This capability is provided by a phrase database and one of the following:

The VT382-D Traditional Chinese terminal
When you use this terminal, the phrase database is loaded in its entirety to the terminal. Memory limitations restrict the size of the database to 100 phrases. The last line on the screen (line 26) is reserved for different input methods, phrase input being one of them, and users are prompted to enter phrase codes on this line.

The SIM (Software Input Method) service
This service, which is enabled through the -adec option of the stty command, extends support of phrase input to other Asian terminals in the VT382 series. The SIM service loads phrases dynamically to the terminal; therefore, the size of the phrase database is not limited by memory restrictions of terminal hardware. When using a terminal supported by the SIM service, you press a user-defined key sequence to toggle in and out of phrase input mode. Entering phrase input mode shifts the site of user input to the 26th line of the terminal screen where you are prompted to enter phrase codes.

The phrase input mechanism available in the DECwindows Motif environment on workstations
DECterm windows do not implement the 26th line of a terminal screen, so the SIM service does not work correctly on workstations. Phrase input, along with other kinds of input methods, is supported by the input method server for the Chinese and Korean languages. On workstations, you enter phrases by invoking the Input Method window and selecting the phrase item.

The phrase utility allows you to create and maintain a phrase database and, when using the VT382-D terminal, to load the database to the terminal.

Table 6-17 lists and describes basic terms associated with phrase input.

Table 6-17: Chinese Phrase Input Definitions
Term	Description
phrase	The string for the phrase that the user wants to retrieve. Each phrase is a string of any characters in the codeset of the current locale and can be a maximum of 80 bytes in length.
phrase code	The keyword entered by the user to retrieve a phrase. Each phrase code is a string of up to 8 ASCII alphanumeric characters.
class	A group of logically related phrases. Each class has an identifier that is a string of up to 8 ASCII characters.
database	A set of two files: the phrase data file `phrase.dat` and the class data file `class.dat`. If a phrase database is moved from one directory to another, the two data files must be moved together. There are two types of phrase databases: system and user. The system database is shared by all users on the system and is maintained by the system administrator. User databases are defined and maintained by individual users. Pathnames for the system and user phrase database directories are set in the file `/var/i18n/conf/cp_dirs`, which is described in a subsequent section of this chapter. By default, this file sets the pathname for the system phrase database directory to be `/var/i18n/sim` and for the user phrase database directory to be `$HOME/.sim`. Phrase database files are locale specific and reside in locale directories subordinate to the default path. For example, an individual user might create and maintain the following sets of files to support two different locales: `$HOME/.sim/zh_TW.big5/phrase.dat`, `$HOME/.sim/zh_TW.big5/class.dat`, `$HOME/.sim/zh_TW.dechanyu/phrase.dat`, and `$HOME/.sim/zh_TW.dechanyu/class.dat`.
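The size limits in Table 6-17 can be checked with a short Python sketch. It is an illustration under stated assumptions, not the validation that the phrase utility performs: the function names are hypothetical, empty strings are assumed to be invalid, and the Python codec name passed as `encoding` (for example, `big5`) stands in for the codeset of the current locale.

```python
import string

ASCII_ALNUM = set(string.ascii_letters + string.digits)


def is_valid_phrase_code(code):
    """A phrase code is up to 8 ASCII alphanumeric characters."""
    return 1 <= len(code) <= 8 and all(ch in ASCII_ALNUM for ch in code)


def is_valid_class_name(name):
    """A class identifier is up to 8 ASCII characters."""
    return 1 <= len(name) <= 8 and name.isascii()


def is_valid_phrase(phrase, encoding):
    """A phrase is at most 80 bytes when encoded in the locale's codeset."""
    try:
        encoded = phrase.encode(encoding)
    except UnicodeEncodeError:
        # The phrase contains characters outside the codeset.
        return False
    return 1 <= len(encoded) <= 80
```

Because the limit on a phrase is counted in bytes, a phrase made up entirely of two-byte characters can hold at most 40 characters.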

6.10.1 Enabling the SIM Service

Table 6-18 lists and describes the options on the stty command line that enable and set certain characteristics for Chinese phrase input through the VT382 series of Asian terminals. These options do not apply to DECterm windows, for which phrase input is supported using mechanisms other than SIM.

Table 6-18: The stty Options Used for the SIM Service
stty Option	Description
`sim`	Enables the Software Input Method (SIM) service.
`-sim`	Disables the Software Input Method (SIM) service.
`simkey key`	Sets the toggle key for entering phrase input mode.
`simclass class`	Sets the current class name for locating the appropriate phrase in the phrase database. Classes identify subsets of information in the phrase database and are defined by using the `phrase` utility.
`simdb path`	Sets the path for the phrase database.
`simall`	Displays current SIM service settings.

6.10.2 Creating and Maintaining a Chinese Phrase Database

You can create or maintain a phrase database by using the phrase utility. On workstations, you invoke this utility with the following command:


```
% phrase
```

The command assumes that you are using a private phrase database if you are a nonprivileged user and the systemwide phrase database if you are superuser. You can change these defaults by using the utility's menu interface.

If you are working on a VT382-D Traditional Chinese terminal, you may also include one of the options described in Table 6-19. These options allow you to use the hardware phrase input method supported by your terminal.

Table 6-19: The phrase Command Options for the VT382-D Terminal
phrase Option	Description
`-user class_name`	Downloads the phrase definitions for the specified class from your private phrase database to the terminal.
`-system class_name`	Downloads the phrase definitions for the specified class from the systemwide phrase database to the terminal.

On startup, the phrase utility displays a full-screen, menu-driven interface like the one in Figure 6-6.

Figure 6-6: User Interface Screen of the phrase Utility

Take the following steps to change the language of messages and other text on the user interface to English:

Press the L key.
This action displays items on the LANGUAGE menu.
Press the E key.
This action specifies English for the user interface.

The phrase utility is a curses application. To navigate the phrase utility user interface, use the following guidelines:

Select a menu and menu items without activating them by using the arrow keys.

Press either Return or the space bar to activate the selected menu or menu item.

To select and activate in one operation, press the key for the underlined letter in the name of a menu or menu item, depending on your current level in the menu hierarchy.

Press Ctrl/X to return to a higher level of the menu hierarchy without activating a selection.

Pressing Ctrl/X when a menu is not activated causes the phrase utility to exit.

The phrase user interface screen includes:

A menu bar (upper-left corner of the screen)

An area that specifies the current phrase database and class (to the right of the menu bar)

Two lines for warning and informational messages (bottom of screen)

A large area for menu expansion and user dialog (center of screen)

The different menus allow you to perform the following operations:

FILE menu
- Override the default path for the phrase database with which you want to work
- Load phrases to a VT382-D terminal
- Exit from the phrase utility and save any changes made to the database

CLASS menu
- Create a class
- View phrases in the selected class
- Rename a class
- Delete a class
- Select (change) the current class

PHRASE menu
- Create a phrase within the selected class
  If you do not explicitly select a class, class DEFAULT is assumed.
- Modify a phrase
- Delete a phrase

LANGUAGE menu
- Choose English or Chinese as the language in which screen text and messages appear

The following guidelines and restrictions apply to the phrase-management operations that you can perform:

Creating and maintaining phrases
- Phrases are always manipulated within the context of a phrase class. If you have not explicitly selected a class, the phrase is assumed to be in class DEFAULT. Otherwise, the phrase applies to the last class name you explicitly selected.
- When you choose options that manipulate phrase definitions, a two-part window appears. The left side displays phrase codes while the right side displays phrases.
  You input phrase names and definitions in an area below the two-part display window. Choose your phrase name carefully. This is the code used to invoke the phrase later. You cannot modify the phrase name without deleting and reentering the entire phrase definition.
- Phrase names must be unique within a given class, but you can use the same phrase name in different phrase classes.
- The phrase itself can contain up to 80 bytes of data, which correspond roughly to 80 columns on the screen. All 80 bytes of data appear in the user input area; however, the display window provides fewer than 80 columns to display the phrase. As a result, long phrase definitions are truncated at the right boundary of the display window. In such cases, the right angle bracket (>) appears in the rightmost position to indicate that the phrase definition contains more data. This truncation is a restriction of the display window and does not apply to the phrase when it is invoked.

Creating and maintaining classes
- Classes are created and maintained within the context of a particular database. If you have not explicitly specified a database, the class operation applies to your default database.
- Class names must be unique within a database.
- Creating a new class causes that class to be the selected class and then automatically invokes the function to create new phrases for the class.
- The hardware phrase input method used on the VT382-D terminal can load up to 100 phrases in a class. Keep this limitation in mind if you use one of these terminals or are maintaining a database accessed by others who log on through these terminals.
  There are no restrictions on the number of phrases in a class when phrases are retrieved through other Asian terminals in the VT382 series or through the Input Method window in the DECwindows Motif environment.

Using multiple phrase databases
- Phrase databases are locale specific. You cannot invoke the phrase utility without setting the LANG environment variable to a locale; however, you can create phrase databases for any locale. Be sure that the LANG environment variable is set to the locale you want to create phrases for before invoking the phrase utility. Otherwise, you will be working with (or creating) phrase databases for a locale different from the one you want.
- You can copy phrase definitions to your private database from the systemwide database and from databases of other users (assuming their file protections allow you read access). If you choose to copy phrases from another user's database, you are prompted for the absolute path of the database from which you want to copy. If the specified database is accessible to you, all its phrase definitions are listed and you select the ones you want to copy.
- You must own a database to create, delete, or modify classes in that database. Unprivileged users can perform write operations on their private databases. Only the superuser can perform write operations on the systemwide database.

6.10.3 Using a Chinese Phrase Database

How you use a phrase database depends on whether you are using the hardware input method or the software input method (SIM) service. You can use either the hardware input method or the SIM service on a VT382-D Traditional Chinese terminal. For other terminals in the VT382 series of Asian terminals, you use the SIM service. In a DECterm window on a workstation, where the SIM service does not work correctly, you use the input method server instead (see Section 6.10.3.2).

If you are using the hardware input method with a VT382-D Traditional Chinese terminal, refer to your terminal user guide for phrase input instructions.

6.10.3.1 Phrase Input Supported Through the SIM Service

Before you can use a phrase database, you use the stty command to:

Enable the SIM service
```
% stty sim
```
To enable the SIM service, make sure your locale is set to one that supports the Hanzi, Hanyu, or Korean codeset and that your terminal line discipline is set to adec.
Define the key sequence for toggling in and out of phrase input mode
The following example sets this key sequence to be Ctrl/B:
```
% stty simkey ^B
```
When you define the key sequence to toggle in and out of phrase mode, pick one that you do not already use at the command line or in other applications. For example, do not define the key sequence to be Ctrl/C (abort operation) or Ctrl/Z (suspend operation).

If you do not want to use phrases from the class DEFAULT or from your default phrase database, use the stty command to:

Specify the phrase class that the SIM service or specialized terminal software will use to interpret phrase codes
```
% stty simclass CORP
```
Specify the database that specialized terminal software will access
The SIM service always searches your private phrase database first for a phrase name and, if the name is not found, then searches the systemwide phrase database. However, terminals that support the hardware phrase input method can load phrases from only one database at a time. Therefore, a nonprivileged user using the terminal hardware input method might enter the following command:
```
% stty simdb /var/i18n/sim
```
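The search order that the SIM service applies (private database first, then the systemwide database) can be sketched in Python as follows. The sketch is an illustration only. The function name `lookup_phrase` is hypothetical, and each database is assumed to be a dictionary that maps class names to dictionaries of phrase codes and phrases, which is not the on-disk format of phrase.dat and class.dat.

```python
def lookup_phrase(code, phrase_class, private_db, system_db):
    """Return the phrase for a code, or None if neither database defines it.

    Each database is a dict of the form {class_name: {phrase_code: phrase}}.
    """
    # The private database is always searched before the systemwide one.
    for db in (private_db, system_db):
        phrase = db.get(phrase_class, {}).get(code)
        if phrase is not None:
            return phrase
    return None
```

A code defined in both databases therefore resolves to the private definition, and the systemwide definition is used only as a fallback.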

When the terminal setup is complete, you can perform the following actions to retrieve a phrase:

Press the key sequence specified for the simkey option of the stty command.
```
% [Ctrl/B]
```
At the bottom of your screen, you are then prompted to enter a phrase code.
Type the phrase code and then press either Return or the space bar.
The phrase is returned to the screen or, if the phrase code was not found, an error message appears.

When you want to exit from phrase input mode, press the simkey key sequence again.

While in phrase input mode, the characters that you enter are subject to the following rules:

Lowercase alphanumeric characters, which are valid characters for phrase codes, are converted to uppercase.
A space or Return character entered when the phrase code buffer is empty is sent directly to the application from which you entered phrase input mode.
This behavior means that you do not have to exit from phrase mode to enter a space or newline between phrases.
If you enter printable characters other than alphanumeric ones, the bell rings to signal that they are invalid characters for a phrase code.
Control key sequences other than the one used to toggle in and out of phrase mode are sent directly to the application from which you entered phrase input mode.
This behavior means that control sequences such as Ctrl/Z and Ctrl/C are handled as you would expect for the system command line, editor, or other application where the phrases are being entered.
Pressing a function or arrow key produces undefined results.
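The rules above amount to a small keystroke classifier, sketched here in Python. This is an illustration of the rules as written, not the terminal driver's implementation. The function name `handle_key`, the action labels, and the choice to clear the code buffer on exit are assumptions; function and arrow keys are not modeled because their results are undefined, and the 8-character limit on phrase codes is not enforced.

```python
BELL = "\a"


def handle_key(ch, buffer, toggle_key="\x02"):
    """Classify one keystroke typed in phrase input mode.

    Returns (action, payload, new_buffer), where action is one of
    "exit", "pass", "lookup", "bell", or "append".
    """
    if ch == toggle_key:
        # The simkey sequence (Ctrl/B here) leaves phrase input mode.
        return "exit", None, ""
    if ch in (" ", "\r", "\n"):
        if buffer == "":
            # Empty code buffer: send the space or Return to the application.
            return "pass", ch, buffer
        # Otherwise the buffered phrase code is looked up.
        return "lookup", buffer, ""
    if ch.isascii() and ch.isalnum():
        # Valid phrase-code character; lowercase is converted to uppercase.
        return "append", ch.upper(), buffer + ch.upper()
    if ch.isprintable():
        # Any other printable character is invalid in a phrase code.
        return "bell", BELL, buffer
    # Remaining control characters go directly to the application.
    return "pass", ch, buffer
```

Feeding the keystrokes `c`, `o`, `r`, `p`, and a space through the function in sequence builds the buffer `CORP` and then yields a `lookup` action for that code.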

6.10.3.2 Phrase Input in the DECwindows Motif Environment

When phrase input is supported by your language setting and the associated input method server is running, your DECwindows Motif environment includes an Input Options window. Click on the Options button in this window to:

Select the phrase database (user or system)
Select the phrase class within the database
Start phrase input
To start phrase input, select Input Method Customization from the Input Options menu and, in the pop-up dialog box, select Phrase.

6.11 Modifying the Database Location Configuration File

This section discusses the content and format of the file /var/i18n/conf/cp_dirs. Software services or hardware use this file to locate various kinds of databases that support input of Asian user-defined characters and phrases.

Example 6-2 shows the default entries in the cp_dirs file. You can edit these entries to change the default locations.

Example 6-2: Default cp_dirs File

```
#
# Attribute directory configuration file
#
#                       System location         User location
#                       ===============         =============
udc     -               /var/i18n/udc           ~/.udc
odl     -               /var/i18n/odl           ~/.odl
sim     -               /var/i18n/sim           ~/.sim
cdb     /usr/i18n/.cdb  /var/i18n/cdb           ~/.cdb
iks     -               /var/i18n/iks           ~/.iks
pre     -               /var/i18n/fonts         ~/.fonts
bdf     -               /var/i18n/fonts         ~/.fonts
pcf     -               /var/i18n/fonts         ~/.fonts
```

Each line in the cp_dirs file represents one entry and consists of the following format:

service_name standard_path system_path user_path

The service_name can be one of the following:

bdf (for font files in bdf format)

cdb (for collating value databases used with the asort command)

iks (for input key sequence files)

odl (for databases of fonts and input key sequences that the SoftODL service uses)

pcf (for font files in pcf format)
These files, depending on their font resolution, reside in either the 75dpi or 100dpi subdirectory.

pre (for font files in preload format created by the cgen utility)
These are raw font files used to preload multibyte-character terminals.

sim (for phrase databases)

udc (for UDC databases)

The cp_dirs file can contain only one entry for each service named. Remaining fields in the entry line consist of the following:

standard_path specifies the location of the collating values database for the standard character sets (applies only to the cdb entry)
system_path specifies the location of systemwide databases
user_path specifies the location of users' private databases

The preceding locations are specified as one of the following:

An absolute pathname (starting with /)
A pathname (starting with ~/) that is relative to a user's home directory
- (minus sign or hyphen) to indicate that the entry is not used
For example, you can specify - to be user_path for all services related to user-defined characters if you want these characters supported only through systemwide databases.

Comment lines in the cp_dirs file begin with the number sign (#).
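The file format described in this section is simple enough to read with a few lines of Python. The following sketch is an illustration of the format, not the code that system software uses. The function names `parse_cp_dirs` and `_resolve` are hypothetical, and rejecting a repeated service name is one possible way to handle the one-entry-per-service rule.

```python
import posixpath


def _resolve(field, home):
    """Turn one location field into a path, or None if the entry is unused."""
    if field == "-":
        return None
    if field.startswith("~/"):
        # Relative to the user's home directory.
        return posixpath.join(home, field[2:])
    return field  # absolute pathname


def parse_cp_dirs(text, home):
    """Parse cp_dirs content into {service: {"standard", "system", "user"}}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        service, standard, system, user = line.split()
        if service in entries:
            raise ValueError(f"duplicate entry for service {service!r}")
        entries[service] = {
            "standard": _resolve(standard, home),
            "system": _resolve(system, home),
            "user": _resolve(user, home),
        }
    return entries
```

To try the sketch on a live system, read `/var/i18n/conf/cp_dirs` into a string and pass it together with the value of the HOME environment variable.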

6.12 Using Printer Interface Features That Support Local Languages

When you install Digital UNIX language variant subsets, your printing subsystem is enhanced with the following features:

A set of print filters that support escape sequences used by local-language printers
Entries in the /etc/printcap file to support printer code conversion and on-demand loading of font files
An enhanced lprsetup command that lets you add entries for local-language printers to the /etc/printcap file
lp, lpr, lpc, lpq, lprm, and lpstat commands that support additional options for printing and printer control
Support for on-demand loading in the lpd printer daemon
The pfsetup command and associated daemon for downloading fonts to PostScript printers

The following sections discuss these features.

6.12.1 Print Filters for Local Language Printers

A print filter processes text data for a particular model of printer. The filter handles the device dependencies of the printer and performs device accounting functions. When each print job is completed, the print filter writes an accounting record to the file specified by the af field of the printer's entry in the /etc/printcap file.

The print filters for local-language text printers can handle text files that contain ASCII and local-language characters, or output files created by the nroff command. When processing nroff output, the filter removes multibyte characters that extend beyond the page boundary and translates nroff control sequences for underlining, superscripting, and subscripting to control sequences appropriate for the printer. However, the filter does not support multiple nroff control sequences on the same character.

The PostScript print filters can print PostScript files in addition to text and nroff output files. The memory requirement for some Asian fonts exceeds what is available on most printers, so there are specific font-loading mechanisms for loading these fonts on PostScript printers (see Section 6.12.5).

A local-language print filter can be the specified filter in both the of and if fields in the /etc/printcap file. For general information on /etc/printcap entries, refer to System Administration and the printcap(4) reference page. Supplementary information is provided in the i18n_printing(5) reference page. A reference page for a specific language (for example, Japanese(5)) lists the names of print filters that support printing characters in that language.

The following print filters process text data for Asian languages:

Language	Filter	Printer
Japanese	`la84of`	LA84-J
Japanese	`la86of`	LA86-J
Japanese	`la90of`	LA90-J
Japanese	`la280of`	LA280-J
Japanese	`la380of`	LA380-J
Japanese	`ln03jaof`	LN03-J
Japanese	`ln05jaof`	LN05-J
Hanzi	`la88cof`	LA88-C
Hanzi	`la380cbof`	LA380-CB
Korean	`la380kof`	LA380-K
Korean	`dl510kaof`	DL510-KA
Hanyu	`cp382dof`	CP382-D
Thai	`thailpof`	EP1050+

The following print filters process PostScript and text data for Asian languages and for some of the languages supported by locales using the ISO8859-2, ISO8859-5, ISO8859-7, and ISO8859-9 codesets:

Language	Filter	Printer
Japanese	`ln82rof`	LN82R
Czech, Hanyu, Hanzi, Hungarian, Greek, Korean, Polish, Russian, Slovak, Slovene, and Turkish	`dl1152wrof`	DEClaser 1152
Thai	`dl1152trof`, `dl1152ttmrof`	DEClaser 1152
Czech, Hanyu, Hanzi, Hungarian, Greek, Korean, Polish, Russian, Slovak, Slovene, and Turkish	`dl5100wrof`	DEClaser 5100
Thai	`dl5100trof`, `dl5100ttmrof`	DEClaser 5100

See the reference page for a specific language (for example, Japanese(5)) to find the names of print filters that support printing characters in that language.

6.12.2 Support for Local Language Printers in /etc/printcap

The /etc/printcap file describes characteristics of each printer on the system. Printer characteristics are specified by symbol/value pairs, where each symbol is a 2-character mnemonic. Each time a user submits a print job, the lpd printer daemon and printer spooling system use information in the /etc/printcap file to determine how that job is handled.

Table 6-22 lists and describes /etc/printcap symbols that are specific to support for local-language printers. Refer to printcap(4) for descriptions of other symbols used in the /etc/printcap file. Refer to Section 6.12.3 for an example of using the lprsetup command to add several of these options to the /etc/printcap file for a local-language printer.

Note
Some printers, such as the DEClaser 5100, support printing in a variety of Asian languages. In such cases, if you want to use the same printer for printing different Asian languages, the following restrictions apply:

You must set up different print queues and use different spool areas for each language.

You must differentiate the queues by defining different sd (spool directory) and plocale (printer locale) values in the /etc/printcap file.

If the preceding requirements are not met, files may occasionally be printed in the wrong locale, resulting in meaningless output. Setting up multiple print queues and directories for the same printer can itself cause one problem: if two or more jobs are sent to different queues for the same printer within a very short time, some jobs may be blocked so that they do not print. If this happens, the system manager must use the lpc command to restart the blocked jobs.

Table 6-22 lists and describes /etc/printcap symbols that are specific to local-language requirements.

Table 6-22: Symbols in /etc/printcap File for Local Language Printers
Symbol	Type	Default	Description
`ya`	`str`	None	Double-quoted list of keyword value assignments. This assignment list specifies most of the printer options related to country-specific support. The option keywords, which are explained following this table, include `flocale`, `font`, `line`, `odldb`, `odlstyle`, `onehalf`, `plocale`, `spcom`, `tacdata`, and `tm`.
`yd`	`str`	None	Secondary tty line or channel for font faulting. Specify this entry for the DEClaser 1152 printer to support the font-faulting mechanism. The font-faulting mechanism, which is enabled by the `alpc` and `ffserver` commands, allows the printer to use fonts that are installed but not downloaded. Font faulting is required to support Chinese, Korean, and some other fonts. The font-faulting daemon (`ffd`) uses the secondary tty line to send font information to the printer.
`yj`	`str`	`NULL`	If `on` is specified as a value, restarts the filter specified for the `of` symbol for every print job. You need to define this symbol only for printers that are not country-specific and only if non-ASCII characters need to be printed on the flag page of printed output.
`ys`	`num`	`NULL`	Size of the SoftODL character cache. The `ys` entry is applied to text print filters. It must be present and its value must be greater than zero to enable on-demand loading of font files. These font files are the ODL support files created by the `cgen` utility for user-defined characters. The location of the SoftODL support files is identified by the path for systemwide ODL files in the database location configuration file `/usr/var/i18n/conf/cp_dirs`. ODL files for private UDC databases are not downloaded to printers. For optimal performance, the cache value specified for the `ys` field should match the printer cache size. To find out the cache size for a particular printer, refer to the printer's manual.
`yt`	`str`	`fifo`	The SoftODL character replacement method. The `yt` entry applies to text print filters. The value for this entry can be either `fifo` (first-in-first-out) or `lru` (least recently used). You can type either uppercase or lowercase letters for these values. To find out which value is appropriate for a particular printer, refer to the printer's manual.

The value assigned to the ya symbol is a quoted string that can include one or more of the following options:

flocale=locale_name
Specifies the locale for interpretation of file text. The print filter uses this locale to validate characters in the text. For an Asian language that is supported by more than one codeset, a difference between the flocale and plocale values determines whether codeset conversion is done before the file is printed. If flocale is not specified, the filter interprets the file in the current locale.

font=font_name

Specifies the name of the outline font for printing PostScript files. This font must be appropriate for the specified plocale value.

line=number_of_lines

Specifies the number of lines per page. When used in combination with the -w flag of the lpr command, the line number can control the font size and orientation of printed output.

odldb=odl_database_path

Specifies the pathname of the software on-demand loading (SoftODL) database. By default, the printer uses the systemwide database as specified in the cp_dirs file.

odlstyle=style-NxN

Specifies the SoftODL font style and size to use, for example, normal-24x24. If odlstyle is not specified, the default style and size set for the systemwide database are used.

onehalf

For the Thai language, specifies that characters be printed on one and a half lines, rather than three lines, to produce more compressed and natural-looking output. The onehalf option is valid only for the thailpof print filter.

plocale=locale_name

Specifies the printer locale. Some printers, such as the LA380-CB printer, are country-specific and have built-in fonts that are encoded in a particular codeset. For these printers, the codeset part of locale_name should match the codeset of the built-in fonts. Other printers, such as the DEClaser 5100, are generic and suitable for printing files in a variety of languages. For these printers, the codeset part of locale_name should match the codeset of the font needed to print files in a particular language (or set of languages). Remember that to use the same generic printer for printing files in different languages, you must define a separate print queue and spool directory for each language (codeset) in which print jobs will be submitted.

spcom

Enables space-compensation mode for languages, such as Thai, that contain nonspacing characters. These characters can combine with other characters for display and therefore do not occupy space. Many of the existing tools that align text do not handle nonspacing characters correctly. If you want to print the Thai output that these tools generate, you should specify the spcom option to ensure proper text alignment in the printed file. This option is valid only when used with a Thai print filter or the th_TH.TACTIS plocale value.

tacdata=tac_data_path

Specifies the location of the character code tables used with the thailpof print filter. By default, tac_data_path is /usr/lbin/tac_data.

tm

Enables text morphing for printing Thai characters. Text morphing replaces some characters with others to produce better printed output. Refer to the Thai(5) reference page for information on text morphing.

6.12.3 Enhancements to the lprsetup Command

The lprsetup command helps you manage the printers on your system. The command queries you for answers to questions about adding, deleting, or changing the characteristics of any printers on your system. The questions have default answers, which are delimited by brackets ([ ]). Online help is available for each question. Either press only the Return key to choose the default answer or enter a valid alternative. Follow instructions displayed by lprsetup to see the help message for each question.

After you enter characteristics for a particular printer and verify that your entries are correct, the lprsetup command creates the printer spooling directory, links the filters, and writes the entry for the printer in the /etc/printcap file.

Example 6-3 shows how you use the lprsetup command to set up a local-language printer, in this case, ln05ja.

Example 6-3: Setting Up a Local Language Printer with lprsetup


# /usr/sbin/lprsetup   (1)
Digital OSF/1 Printer Setup Program

Command < add modify delete exit view quit help >: add

Adding printer entry, type '?' for help.


Enter printer name to add [0] : ln05    (2)

For more information on the specific printer types Enter
`printer?'

Enter the FULL name of one of the following printer
types:

cp382d  dl1152w dl510ka   dl5100w  ep1050+  fx80    fx1050    hpIIP
hpIIIP  hpIIID  hpIV      hp4M     ibmpro   la50    la70      la75
la84    la86    la88      la88c    la90     la280   la324     la380
la380cb la380k  la424     lf01r    lg02     lg06    lg12      lg31
lj250   ln03    ln03ja    ln03r    ln03s    ln05    ln05ja    ln05r
ln06    ln06r   ln07      ln07r    ln08     ln08r   ln09      ln10ja
ln82r   nec290  remote   unknown


or press RETURN for [unknown] : ln05ja    (3)

.
.
.

Enter the name of the printcap symbol you wish to modify.
Other valid entries are:

        'q' to quit (no more changes)
        'p' to print the symbols you have specified so far.
        'l' to list all of the possible symbols and defaults.

The names of the printcap symbols are:

 af  br  cf  ct  df  dn  du  fc  ff  fo  fs  gf  ic  if  lf  lo
 lp  mc  mx  nc  nf  of  op  os  pl  pp  ps  pw  px  py  rf  rm
 rp  rs  rw  sb  sc  sd  sf  sh  st  tf  tr  ts  uv  vf  xc  xf
 xs  ya  yd  yj  yp  ys  yt  Da  Dl  It  Lf  Lu  Ml  Nu  Or  Ot
 Ps  Sd  Si  Ss  Ul  Xf


Enter symbol name: ya    (4)

Enter a new value for symbol 'ya'? ["plocale=ja_JP.sdeckanji"]

Do you want to enable ODL? [n] y    (5)


Enter symbol name: yt    (6)

Enter a new value for symbol 'yt'? [fifo]


Enter symbol name: q    (7)

.
.
.

(1) Invokes the lprsetup program.
(2) Selects a name for the printer (see Table 6-23).
(3) Selects the printer type.
(4) Specifies the printer locale.
(5) Enables on-demand loading (ODL) of printer fonts for user-defined characters. An affirmative response also sets the cache size that the SoftODL service uses. This value, by default the appropriate cache size for the printer, is stored as the value of the ys symbol in the /etc/printcap file.
(6) Specifies the character replacement method that the SoftODL service uses.
(7) Quits the program to indicate that no more changes are needed to the /etc/printcap file.

Table 6-23 lists Asian languages and the associated printer choices as displayed by the lprsetup script.

Table 6-23: Local Language Printers Supported by the lprsetup Command
Language	Printer
Japanese (text only)	`la84j`, `la86j`, `la90j`, `la280j`, `la380j`, `ln03ja`, `ln05ja`
Japanese (PostScript)	`ln83r`
Traditional Chinese (text only)	`cp382d`
Simplified Chinese (text only)	`la88c`, `la380c`
Korean (text only)	`la380k`, `dl510k`
Czech, Hanyu, Hanzi, Hungarian, Greek, Korean, Polish, Russian, Slovak, Slovene, and Turkish (PostScript)	`dl1152w`, `dl5100w`
Thai (text only)	`dp1050+`
Thai (PostScript)	`dl1152t`, `dl1152ttm`, `dl5100t`, `dl5100ttm`

Hungarian, Czech, Slovak, Slovene (*.ISO8859-2)

Arial-Bold-ISOLatin2
Arial-BoldItalic-ISOLatin2
Arial-Italic-ISOLatin2
Arial-ISOLatin2
ArialNarrow-Bold-ISOLatin2
ArialNarrow-BoldItalic-ISOLatin2
ArialNarrow-Italic-ISOLatin2
ArialNarrow-ISOLatin2
BookAntiqua-Bold-ISOLatin2
BookAntiqua-BoldItalic-ISOLatin2
BookAntiqua-Italic-ISOLatin2
BookAntiqua-ISOLatin2
BookmanOldStyle-Bold-ISOLatin2
BookmanOldStyle-BoldItalic-ISOLatin2
BookmanOldStyle-Italic-ISOLatin2
BookmanOldStyle-ISOLatin2
CenturyGothic-Bold-ISOLatin2
CenturyGothic-BoldItalic-ISOLatin2
CenturyGothic-Italic-ISOLatin2
CenturyGothic-ISOLatin2
CenturySchoolbook-Bold-ISOLatin2
CenturySchoolbook-BoldItalic-ISOLatin2
CenturySchoolbook-Italic-ISOLatin2
CenturySchoolbook-ISOLatin2
Courier-Bold-ISOLatin2
Courier-BoldItalic-ISOLatin2
Courier-Italic-ISOLatin2
Courier-ISOLatin2
MonotypeCorsiva-ISOLatin2
TimesNewRoman-Bold-ISOLatin2
TimesNewRoman-BoldItalic-ISOLatin2
TimesNewRoman-Italic-ISOLatin2
TimesNewRoman-ISOLatin2

Russian (*.ISO8859-5)

Arial-Bold-ISOLatinCyrillic
Arial-BoldInclined-ISOLatinCyrillic
Arial-Inclined-ISOLatinCyrillic
Arial-ISOLatinCyrillic
Courier-Bold-ISOLatinCyrillic
Courier-BoldInclined-ISOLatinCyrillic
Courier-Inclined-ISOLatinCyrillic
Courier-ISOLatinCyrillic
Nimrod-Bold-ISOLatinCyrillic
Nimrod-BoldInclined-ISOLatinCyrillic
Nimrod-Inclined-ISOLatinCyrillic
Nimrod-ISOLatinCyrillic
Plantin-Bold-ISOLatinCyrillic
Plantin-BoldInclined-ISOLatinCyrillic
Plantin-Inclined-ISOLatinCyrillic
Plantin-ISOLatinCyrillic
TimesNewRoman-Bold-ISOLatinCyrillic
TimesNewRoman-BoldInclined-ISOLatinCyrillic
TimesNewRoman-Inclined-ISOLatinCyrillic
TimesNewRoman-ISOLatinCyrillic

Greek (*.ISO8859-7)

Arial-Bold-ISOLatinGreek
Arial-BoldInclined-ISOLatinGreek
Arial-Inclined-ISOLatinGreek
Arial-ISOLatinGreek
Courier-Bold-ISOLatinGreek
Courier-BoldInclined-ISOLatinGreek
Courier-Inclined-ISOLatinGreek
Courier-ISOLatinGreek
TimesNewRoman-Bold-ISOLatinGreek
TimesNewRoman-BoldInclined-ISOLatinGreek
TimesNewRoman-Inclined-ISOLatinGreek
TimesNewRoman-ISOLatinGreek

Turkish (*.ISO8859-9)

Arial-Bold-ISOLatin5
Arial-BoldItalic-ISOLatin5
Arial-Italic-ISOLatin5
Arial-ISOLatin5
ArialNarrow-Bold-ISOLatin5
ArialNarrow-BoldItalic-ISOLatin5
ArialNarrow-Italic-ISOLatin5
ArialNarrow-ISOLatin5
BookAntiqua-Bold-ISOLatin5
BookAntiqua-BoldItalic-ISOLatin5
BookAntiqua-Italic-ISOLatin5
BookAntiqua-ISOLatin5
BookmanOldStyle-Bold-ISOLatin5
BookmanOldStyle-BoldItalic-ISOLatin5
BookmanOldStyle-Italic-ISOLatin5
BookmanOldStyle-ISOLatin5
CenturyGothic-Bold-ISOLatin5
CenturyGothic-BoldItalic-ISOLatin5
CenturyGothic-Italic-ISOLatin5
CenturyGothic-ISOLatin5
CenturySchoolbook-Bold-ISOLatin5
CenturySchoolbook-BoldItalic-ISOLatin5
CenturySchoolbook-Italic-ISOLatin5
CenturySchoolbook-ISOLatin5
Courier-Bold-ISOLatin5
Courier-BoldItalic-ISOLatin5
Courier-Italic-ISOLatin5
Courier-ISOLatin5
MonotypeCorsiva-ISOLatin5
TimesNewRoman-Bold-ISOLatin5
TimesNewRoman-BoldItalic-ISOLatin5
TimesNewRoman-Italic-ISOLatin5
TimesNewRoman-ISOLatin5

Traditional Chinese (*.dechanyu)

Sung-Light-CNS11643
Hei-Light-CNS11643

Simplified Chinese (*.dechanzi)

XiSong-GB2312-80
Hei-GB2312-80

Korean (*.deckorean)

Munjo

Japanese (*.deckanji)

None (uses printer built-in fonts)

Thai (*.TACTIS)

AngsanaUPC-Bold
AngsanaUPC-BoldItalic
AngsanaUPC-Italic
AngsanaUPC-Light
CordiaUPC-Bold
CordiaUPC-BoldItalic
CordiaUPC-Italic
CordiaUPC-Light
EucrosiaUPC-Bold
EucrosiaUPC-BoldItalic
EucrosiaUPC-Italic
EucrosiaUPC-Light
FreesiaUPC-Bold
FreesiaUPC-BoldItalic
FreesiaUPC-Italic
FreesiaUPC-Light
IrisUPC-Bold
IrisUPC-BoldItalic
IrisUPC-Italic
IrisUPC-Light
JasmineUPC-Bold
JasmineUPC-BoldItalic
JasmineUPC-Italic
JasmineUPC-Light
KodchiangUPC-Bold
KodchiangUPC-BoldItalic
KodchiangUPC-Italic
KodchiangUPC-Light
LilyUPC-Bold
LilyUPC-BoldItalic
LilyUPC-Italic
LilyUPC-Light
WaterlilyUPC-Bold
WaterlilyUPC-BoldItalic
WaterlilyUPC-Italic
WaterlilyUPC-Light
YuccaUPC-Bold
YuccaUPC-BoldItalic
YuccaUPC-Italic
YuccaUPC-Light

6.12.5.2 Setting Up Print Queues With the pfsetup Command

The pfsetup utility is available to manage font downloading for print queues. This command identifies the correct downloading mechanism through the print filter name. The pfsetup command has the following format:

pfsetup [-s | -d] [queue_name]...

You can use the pfsetup command in the following ways:

If you enter the pfsetup command without options, it displays setup information for the specified print queues or, if you specify none, for all print queues.
The -s option runs the utility in setup mode. In this mode, the utility lists all printer fonts available for downloading to the specified print queues or, if you specify none, to all print queues.
The -d option runs the utility in download mode. In this mode, you can download fonts for locales that are not supported by the built-in fonts.

6.12.5.3 Downloading Fonts to the DEClaser 1152

A mechanism called font faulting works around the problem of downloading very large fonts to the DEClaser 1152 printer. Font faulting is similar to the on-demand loading (ODL) mechanism used to load user-defined characters; in other words, a subset of fonts is in the device's memory at any particular time and new fonts are swapped in as needed.

For font faulting to work, there must be two channels for printer communication. The primary channel transfers file data from the host system to the printer. The secondary channel transfers font requests and responses between the printer and the host system. You specify the secondary channel through the yd entry for the printer in the /etc/printcap file.

When the printer receives unrecognized characters, it sends font requests through its secondary communication channel. The ffd daemon serves this channel and responds to the font requests from the printer. The daemon searches the font files for the requested fonts and sends back the requested data.

You can manually start and stop the ffd daemon with the following commands:

/usr/sbin/init.d/ffserver start &

/usr/sbin/init.d/ffserver stop &

You have to download at least one font using the pfsetup command to activate the font-faulting mechanism (refer to Section 6.12.5.1 for lists of fonts and to Section 6.12.5.2 for information about the pfsetup command). After the font-faulting mechanism is activated and until the printer is turned off, the mechanism automatically sends information for any font to the printer as required. Therefore, the printer can use all fonts that are installed on the printer's host system, including fonts that are not explicitly downloaded.

Note
Although the font-faulting mechanism allows the printer to use any installed font, there is some overhead cost when a print job uses fonts that are not downloaded to the printer. Therefore, Digital recommends that you use the pfsetup command to download fonts that print jobs most frequently use.

6.12.5.4 Downloading Fonts to the DEClaser 5100

For a DEClaser 5100 printer with a font disk, you can use the pfsetup command to download any fonts installed on the printer's host system. The command prompts you to verify that the printer has a font disk and then downloads the fonts you choose (refer to Section 6.12.5.1 for lists of fonts and to Section 6.12.5.2 for information about the pfsetup command). The number of fonts you can download is limited by the amount of space available on the font disk. After fonts are downloaded, the printer requires no additional setup to use them.

6.13 Using Mail in a Multilanguage Environment

Digital UNIX provides enhanced versions of the following commands and utilities to handle languages based on multibyte-character codesets:

sendmail
mailx
MH (mail handler)

The following sections discuss enhancements to these components and the codeset conversion done by the comsat server. Refer to the sendmail(8), mailx(1), mh(1), and comsat(8) reference pages for more complete software descriptions.

6.13.1 The sendmail Utility

The sendmail utility, which is a back end to several user commands, can be configured to pass only 7-bit data in accordance with the Simple Mail Transfer Protocol (SMTP) or to pass 8-bit data as required for multibyte-character support. By default, sendmail supports only 7-bit data. You can configure sendmail to pass mail messages in 8-bit format by using the /usr/sbin/wwsetup script or, in the Common Desktop Environment, by clicking on the Mail option of the I18N Configuration application. (The navigation path to the I18N Configuration application is Application Manager -> System Administration -> Configuration -> I18N.)

Note
Digital recommends that you not configure sendmail to use 8-bit data format because the SMTP protocol, which is widely used, does not support this format.

6.13.2 The mailx Command and MH Commands

Both the mailx command and all applicable commands in the MH system support the conversion of mail messages between the mail interchange codeset (used to transfer messages to some hosts) and a user's application codeset. For example, if the mail interchange codeset is ISO-2022-JP and the application codeset is eucJP, the mailx or MH command converts incoming messages to the Japanese EUC codeset before displaying them.

To prevent data loss, when incoming messages are stored in mail folders, the messages are encoded in the codeset in which they are received. Codeset conversion takes place when users extract or display the messages.

To communicate mail interchange code information to other systems, outgoing messages include two additional header lines like the following:

Mime-Version: 1.0

Content-Type: TEXT/PLAIN; charset=ISO-2022-JP

The charset field in the preceding example specifies the mail interchange codeset, in this case, ISO-2022-JP. This codeset is an ISO 7-bit state-dependent codeset for Japanese characters. Codesets other than those that are part of the ISO standard are identified by the prefix X- in the codeset name. For example, when DEC Hanyu is the codeset used for mail interchange, the following header lines are included in outgoing mail messages:

Mime-Version: 1.0

Content-Type: TEXT/PLAIN; charset=X-dechanyu
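
The header lines shown above can be inspected with standard tools. The following POSIX shell sketch is illustrative only and is not part of mailx or MH; the function names mail_charset and strip_x_prefix are hypothetical. It extracts the charset value from a message header and removes the X- prefix, on the assumption that the remaining text is the codeset name. It does not handle quoted charset values or header names written entirely in uppercase.

```shell
# Print the charset value of the Content-Type header read from stdin.
# Only the header is examined: sed quits at the first empty line.
mail_charset() {
    sed -n -e '/^$/q' \
        -e 's/^[Cc]ontent-[Tt]ype:.*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]=\([^ ;]*\).*$/\1/p'
}

# Remove a leading X- (or x-) prefix from a charset name, if present.
strip_x_prefix() {
    case $1 in
        X-*|x-*) printf '%s\n' "${1#??}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Usage with the header lines from the example above.
charset=$(printf 'Mime-Version: 1.0\nContent-Type: TEXT/PLAIN; charset=X-dechanyu\n\n' | mail_charset)
strip_x_prefix "$charset"
```

With the sample header, mail_charset yields X-dechanyu and strip_x_prefix reduces it to dechanyu.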

The mailx command or MH commands use the following values (listed in order of highest to lowest priority) to determine or set the mail interchange and application codesets for a particular message:

The mail interchange codeset applied to incoming messages is determined from:
1. The codeset specified as the systemwide mail interchange default in the file /usr/lib/mail-codesets
  If you create this file, it contains a single entry, which is the name of a locale.
If neither of the preceding values is available, codeset conversion does not occur.

The mail interchange codeset applied to outgoing messages is determined from:
If a codeset is not determined for outgoing mail interchange, the mail is sent with no codeset identifier.

The application codeset is determined from:
1. The setting of the LANG environment variable
2. The value of the lang component in the $HOME/.mailrc file (for the mailx command) or the $HOME/.mh_profile file (for MH commands)

6.13.3 The comsat Server

The comsat server, which notifies users of incoming mail messages, always attempts to convert incoming mail messages from the mail interchange codeset to the user's application codeset. The comsat server uses the following values (in order of highest to lowest priority) to determine the codesets that apply to a message:

For the mail interchange codeset
1. The charset field, if included in the mail message header
2. The codeset specified as the systemwide mail interchange default in the file /usr/lib/mail-codesets
  If neither of the preceding values is available, codeset conversion does not occur.

For the application codeset
1. The application codeset defined for the atty driver of the user's system
2. The codeset name in the file $HOME/.codeset_device_name, where device_name is the name of the terminal device for the current session
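
The priority order given above for the mail interchange codeset can be sketched as a small POSIX shell function. This is an illustration of the documented order only, not the comsat implementation; the function name interchange_source and its output format are hypothetical.

```shell
# Report where the mail interchange codeset would come from.
#   $1  charset value taken from the message header (may be empty)
#   $2  systemwide default file (defaults to /usr/lib/mail-codesets)
interchange_source() {
    header_charset=$1
    defaults_file=${2:-/usr/lib/mail-codesets}
    if [ -n "$header_charset" ]; then
        # Priority 1: the charset field in the message header.
        printf 'header:%s\n' "$header_charset"
    elif [ -r "$defaults_file" ] && [ -s "$defaults_file" ]; then
        # Priority 2: the single entry in the systemwide default file.
        printf 'default:%s\n' "$(head -n 1 "$defaults_file")"
    else
        # Neither value is available, so no codeset conversion occurs.
        printf 'none\n'
    fi
}

interchange_source ISO-2022-JP
```

When the header supplies a charset value, the default file is never consulted, which matches the stated priority.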

6.14 Applying Sort Orders to Non-English Characters

The sort command sorts characters according to the collation sequence defined for the current locale. A particular locale can apply one set of collation rules to the associated character set. Multiple locale names do exist, however, for the same combination of language, territory, and character set. Most often, these variations exist to offer users the choice of more than one collating sequence.

When there is more than one locale available for a given combination of language, territory, and codeset, some of the locale names include a suffix with the format @variant. To avoid problems with pathnames constructed using the %L specifier, you usually assign a locale name with an @ suffix only to the appropriate locale category variable (or variables). For example:


% setenv LANG zh_TW.eucTW

% setenv LC_COLLATE zh_TW.eucTW@radical

Supporting different collation orders through one or more locales is adequate for most languages. However, collation orders for Asian languages require additional support for the following reasons:

Asian languages include user-defined characters, which are not specified in a locale. These characters can be defined with a collation weight. In this case, the collation weight needs to be applied when the user-defined characters are encountered in the strings being sorted.
Ideographic characters can be sorted on more than one dimension (radical, stroke, phonetic, and internal code). Some users need to combine these dimensions during sort operations. In one operation the user may need to sort characters first by radical and then according to the number of strokes. For another operation, the user may need to put characters first in phonetic order, then according to the number of strokes, and so on. Sorting by combinations of dimensions requires breadth-first sorting, rather than the depth-first sorting implemented through locales.

For the preceding reasons, the asort command was developed and is available when you install language variant subsets that support Asian languages. The asort command uses, by default, the collating order defined for the LC_COLLATE variable and supports all the flags supported by the sort command. In addition, the asort command includes the following flags:

-C
This flag indicates that the sort operation should use special system sort tables, along with sort tables produced by the cgen utility to support user-defined characters. This flag overrides the sort sequence defined in the locale specified by the LC_COLLATE variable.
-v
This flag, which you can use only when you also specify the -C flag, implements breadth-first sorting.

Refer to the asort(1) reference page for more information about using this command.

6.15 Processing Reference Pages in Languages Other Than English

Programmers who supply software applications for UNIX systems frequently supply online reference pages (manpages) to document the application and its components. UNIX text-processing commands and utilities must be able to process translated versions of these reference pages for applications sold to the international market. Enhanced versions of the nroff, tbl, and man commands are included in Digital UNIX to support this requirement.

6.15.1 The nroff Command

The nroff command includes the following capabilities to support locales:

Formats reference page source files written in any language whose locale is installed on the system

Supports characters of any supported languages in the string arguments of macros and requests

Supports character mapping of characters for any supported language through the .tr request in reference page source files

Allows you to set the escape character (\), command control character (.), and nobreak control character (') to local language, as well as ASCII, characters

Maps each 2-byte space character, which is defined in most codesets for Asian languages, to two ASCII spaces in output

When formatting reference pages that contain ideographic characters, the nroff command treats each character as a single word. A string of ideographic characters, including 2-byte letters and punctuation characters, can be wrapped to the next line subject to the following constraints:

The last character on the text line cannot be defined as a no-last character by either the standard or private list of no-last characters

The first character on the text line cannot be defined as a no-first character by either the standard or private list of no-first characters

The standard no-first, no-last character lists are defined in nroff catalog files. For lists of these characters, refer to the language-specific user guides that are available on the CD-ROM from which you install subsets for Asian-language support.

The no-first and no-last constraints exist to prevent nroff from placing a punctuation mark or right parenthesis at the beginning of a text line or placing a left parenthesis at the end of a text line. You can turn the standard constraints on and off in source files with the .ki and .ko commands, respectively.

You can also define a private set of no-first and no-last characters with the following command:

.kl 'no-first-list'no-last-list'

The parameters no-first-list and no-last-list are strings of characters you should include in the no-first and no-last categories. You cancel a private no-first and no-last list by entering a .kl command with null strings as the parameters. For example:

.kl '''

Note
The characters specified in the .kl command override, rather than supplement, the characters in the standard set of no-first and no-last characters. Therefore, you cannot use the standard set of no-first and no-last characters together with a private set.
Using the command .kl ''' restores use of the standard set of no-first and no-last characters for the current locale.

The nroff command can format text so that it is justified or not justified to the right margin. When text is justified to the right margin, nroff inserts spaces between words in the line. Ideographic characters, although treated as words in most stages of the formatting process, differ in terms of whether they can be delimited by spaces. The characters that can be preceded by a space, followed by a space, or both are listed in the language-specific user guides that are available online when you install language variant subsets of Digital UNIX. When right-justifying text, the nroff command inserts spaces only at the following places:

Where 1-byte or 2-byte spaces already occur

Between English and ideographic characters

Before characters defined as can-space-before

After characters defined as can-space-after

In other cases, no space is inserted between consecutive ideographic characters. Therefore, if a text line contains only ideographic characters, it may not be justified to the right margin.

6.15.2 The tbl Command

The tbl command preprocesses table formatting commands within blocks delimited by the .TS and .TE macros. The tbl command handles multibyte characters that can occur in text of languages other than English.

The tbl command is frequently used along with the neqn command (the equation-formatting preprocessor) to filter input passed to the nroff command. In such cases, specify tbl first to minimize the volume of data passed through the pipes. For example:


% cd /usr/share/ja_JP.deckanji/man/man1

% tbl od.1 | neqn | nroff -Tlpr -man -h | \
lpr -Pmyprinter

When printing text of an Asian language, you must use printer hardware that supports the language.

6.15.3 The man Command

The man command can handle multibyte characters in reference page files. By default, the man command automatically searches for reference pages in the /usr/share/locale_name/man directory before searching the /usr/share/man and /usr/local/man directories. Therefore, if the LANG environment variable is set to an installed locale and if reference page translations are available for that locale, the man command automatically displays reference pages in the appropriate language.

In addition, the man command automatically applies codeset conversion (assuming the availability of appropriate converters) when reference page translations for a particular language are encoded in a codeset that does not match the codeset of the user's locale. Refer to the man(1) reference page for information about redefining the man command search path and for more details about codeset conversion.
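
The default search order described in this section can be written out as a short POSIX shell sketch. The function name man_search_dirs is hypothetical, and the sketch only lists directories; it does not model a redefined search path or codeset conversion.

```shell
# Print, in default search order, the directories that man checks
# for the locale named in $1 (or in LANG when no argument is given).
man_search_dirs() {
    locale_name=${1:-${LANG:-}}
    if [ -n "$locale_name" ]; then
        # The locale-specific directory is searched first.
        printf '/usr/share/%s/man\n' "$locale_name"
    fi
    printf '%s\n' /usr/share/man /usr/local/man
}

man_search_dirs ja_JP.deckanji
```

For ja_JP.deckanji, the first directory listed is /usr/share/ja_JP.deckanji/man, followed by the two language-independent directories.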

6.16 Converting Data Files from One Codeset to Another

Each locale is based on a specific codeset. Therefore, when an application uses a file whose data is coded in one codeset and runs in a locale based on another codeset, character interpretation may be meaningless. For example, assume that a fictional language includes a character named "quo", which is encoded as \031 in one codeset and \042 in another codeset. If the "quo" character is stored in a data file as \031, the application that reads data from that file should be running in the locale based on the same codeset. Otherwise, \031 identifies a character other than "quo".

Users, the applications they run, or both may need to set the process environment to a particular locale and use a data file created with a codeset different from the one on which the locale is based. The data file in question might be appropriate for a given language and in a codeset different from the user's locale for one of the following reasons:

The data file might have been created on another vendor's system by using a locale based on a vendor-specific codeset.
The locale could be one of several that support the same Asian language, such as Japanese. Asian languages are typically supported by a variety of locales, each based on a different codeset.

You can convert a data file from one codeset to another by using the iconv command or the iconv_open, iconv, and iconv_close functions. For example, the following command reads data in the file accounts_local, which is encoded in the deckorean codeset; converts the data to the eucKR codeset; and appends the results to the file accounts_central:


% iconv -f deckorean -t eucKR accounts_local \
>> accounts_central

The iconv command and associated functions can use either an algorithmic converter or a table converter to convert data. Algorithmic converters, if installed on your system, reside in the /usr/lib/nls/loc/iconv directory; this directory is the one searched first for a converter. Table converters, if installed on your system, reside in the /usr/lib/nls/loc/iconvTable directory. The value of the LOCPATH variable, if defined, overrides the command's default search path.

The iconv command assumes that a converter name adheres to the following format:

from-codeset_to-codeset

For the preceding example, the iconv command would search for and use the /usr/lib/nls/loc/iconv/deckorean_eucKR converter.
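
The naming convention and default search order can be sketched as follows. This Python fragment is a simplified model, not the iconv implementation: the function names are hypothetical, the LOCPATH override is deliberately left out, and the `exists` parameter is there only so the lookup can be tried without the converter files being installed.

```python
import os
import posixpath

ALGORITHMIC_DIR = "/usr/lib/nls/loc/iconv"     # searched first
TABLE_DIR = "/usr/lib/nls/loc/iconvTable"      # searched second

def converter_name(from_codeset, to_codeset):
    """Build a converter name in the from-codeset_to-codeset format."""
    return f"{from_codeset}_{to_codeset}"

def candidate_paths(from_codeset, to_codeset):
    """Return the default locations in the order they are searched."""
    name = converter_name(from_codeset, to_codeset)
    return [posixpath.join(d, name) for d in (ALGORITHMIC_DIR, TABLE_DIR)]

def find_converter(from_codeset, to_codeset, exists=os.path.exists):
    """Return the first candidate path that exists, or None."""
    for path in candidate_paths(from_codeset, to_codeset):
        if exists(path):
            return path
    return None

print(candidate_paths("deckorean", "eucKR")[0])
```

The first candidate printed for the deckorean-to-eucKR conversion is the path named in the preceding paragraph.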

Table 6-24 specifies the codeset conversions that Digital UNIX supports for English data. The user guides for the language variant subsets include tables with codeset conversions supported for Asian languages.

For detailed information about the iconv command, refer to the iconv(1) and iconv_intro(5) reference pages. For information on functions that programs can use to perform codeset conversion, refer to the iconv_open(3), iconv(3), and iconv_close(3) reference pages.

Table 6-24: Supported Codeset Conversions for English
Codeset	ASCII-GR	ISO8859-1	ISO8859-1-GL	ISO8859-1-GR
ASCII-GR	-	Yes	No	No
ISO8859-1	Yes	-	Yes	Yes
ISO8859-1-GL	No	Yes	-	No
ISO8859-1-GR	No	Yes	No	-

6.17 Miscellaneous Information for Base System Commands

The following list includes information about features and restrictions that apply when using traditional UNIX commands in local-language environments:

rlogin
When using the rlogin command to log in to a Digital UNIX system from an ULTRIX system, be sure to specify the -8 flag to pass 8-bit data without stripping. Otherwise, you will have problems entering non-ASCII characters from your terminal.
To view a large data file while logged in to the remote system, use a pager command, such as pg, rather than the Hold Screen key. The -8 flag sets the terminal mode of the original host to RAW, which disables flow control. If data is sent to the terminal at a rate faster than the terminal can handle, some data is lost when you use the Hold Screen key.
This rlogin restriction applies not only when logging in from an ULTRIX system, but also when logging in from any UNIX system whose software does not fully support the 8-bit data format.

Emacs editor
The operating system includes the multilingual Emacs software from the Free Software Foundation. Before using this editor, you must add the /usr/i18n/mule/bin directory to your process-specific search path. You can then invoke this editor by using the mule command.

vi and more
The vi and more commands discard text that follows an invalid multibyte character. If you encounter this problem, it is likely that your locale setting is not correct for the text being viewed or edited. In this case, reset your locale to one that matches the text and invoke the command again.
When used with Thai characters, vi may wrap lines before the right boundary of the screen. This happens because Thai text includes nonspacing characters, which contribute to the character count but not to display width. The editor wraps lines based on character count. For example, vi may wrap a line after entry of 80 characters, even though these characters do not occupy 80 columns on the screen.
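
The difference between character count and display width can be illustrated with a short Python sketch. It works on Unicode strings and treats every character in the Unicode category Mn (nonspacing mark) as occupying no column; both choices are assumptions for the example and do not describe how vi itself counts characters.

```python
import unicodedata

def display_columns(text):
    """Count the characters that occupy a screen column.

    Nonspacing marks (Unicode category Mn) are drawn above or below the
    preceding character, so they add to the character count but not to
    the width.
    """
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")

# THO THAHAN followed by the vowel SARA II and the tone mark MAI EK.
syllable = "\u0e17\u0e35\u0e48"

print(len(syllable))              # characters stored
print(display_columns(syllable))  # columns occupied
```

The syllable contains three characters but fills a single column, which is why a line of Thai text can reach a character-count limit well before it reaches the right edge of the screen.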

Using local-language user names and file names
It is a limitation of UNIX file systems that you cannot use a multibyte character whose second or subsequent byte is an ASCII slash (/) in names of files, users, or other objects. For portability reasons, Digital recommends that you avoid using multibyte characters in these names.
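
The following Python sketch shows how such a byte can arise. It uses the ISO-2022-JP encoding purely as an illustration of a multibyte encoding in which the second byte of a character can be 0x2F; that encoding is an assumption for the example and is not one of the locale codesets described in this chapter. The helper name is hypothetical.

```python
def hidden_slash_bytes(name, codeset):
    """Count 0x2F bytes in the encoded name that are not '/' characters."""
    return name.encode(codeset).count(b"/") - name.count("/")

# Decode raw bytes to obtain a two-byte character whose second byte is 0x2F.
# ESC $ B switches to the two-byte set; ESC ( B switches back to ASCII.
ideograph = b"\x1b$B0/\x1b(B".decode("iso2022_jp")

print(len(ideograph))                                  # one character
print(hidden_slash_bytes(ideograph, "iso2022_jp"))     # one embedded 0x2F byte
print(hidden_slash_bytes("report.txt", "iso2022_jp"))  # none
```

A file system that scans a name byte by byte sees the embedded 0x2F as a path separator, even though the user typed no slash.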

6.18 Using Language Support Enhancements for DECwindows Motif Applications

In the DECwindows Motif environment, you use versions of DECwindows Motif fonts, codesets, servers, and applications that support features discussed in earlier sections of this chapter. This section provides more detail on using DECwindows Motif with Asian languages. Topics include:

Tuning the cache and unit size of the X Display Server for languages with ideographic characters
Using font renderers for multibyte PostScript fonts
Changing the language of the Start Session (login) and Pause windows
Setting fonts in the Motif Window Manager for local and remote display
Customizing a DECterm window for local languages
Using the CDA viewer and converters with Asian-language text files

6.18.1 Tuning the X Server for Ideographic Languages

Asian languages have large ideographic character sets, so not all characters needed for display are loaded into memory at the same time. Instead, only as many characters as will fit in the memory cache are simultaneously loaded. When characters needed for display are not currently cached in memory, the least recently used font glyphs are removed from the cache to make room. The font-cache mechanism allows you to display ideographic text in multiple typefaces, font sizes, and font styles without increasing the amount of memory that systems must have to support ideographic languages.

The X Server font-cache mechanism allows you to change the number of cache units and the size of these units to best accommodate the character sets used in displays. You will probably need to change the default values set for cache parameters to achieve the best performance from your system if it will display Asian-language text. Consider the following criteria when deciding on the optimal values for font caching:

The number of ideographic languages that you want to display
If you intend to work with several ideographic languages during the same DECwindows Motif session, you need larger values for acceptable performance.
The number of fonts that will be used simultaneously
Variation in font number and size depends partly on the kinds of applications you run. A desktop publishing application typically requires more fonts than other types of applications whereas a software development tool requires fewer.
The number of frequently used characters in the languages you want to display
In Asian languages, only a subset of characters is used frequently. The size of this subset varies from one language to another. For example, around 20,000 standard characters are supported for Taiwan but only 5,000 of those characters are used frequently. Estimates for the number of frequently used characters for other Asian countries are as follows: People's Republic of China (3000), Korea (2000), and Japan (2000). Font-cache parameters are tuned to accommodate the subset of characters that are used frequently.

To change the cache size (which is the number of cache units) and the size of each cache unit, you must modify the X Server configuration file /usr/lib/X11/xdm/Xservers. This file contains a line, similar to the following one, that starts the X Server:

:0 local /usr/bin/X11/X

You can modify this line to add definitions for cache size and unit size. For example:

:0 local /usr/bin/X11/X -cs cache_size -cu unit_size

Table 6-25 describes the options that tune the font-cache mechanism.

Table 6-25: X Server Options for Tuning the Font-Cache Mechanism
Option	Description
`-cs cache_size`	Defines the number of cache units. The minimum (and also default) value for this parameter is 1024. If you specify a cache size smaller than 1024, font caching is disabled. For one ideographic language, the recommended value is the lowest multiple of 1024 that accommodates the number of frequently used characters in that language. If a workstation displays multiple ideographic languages simultaneously, you have to add together the values required for each language. Specify an even larger value if you intend to run applications, such as desktop publishing software, that require multiple font styles and sizes for each ideographic character.
`-cu unit_size`	Defines the size of each cache unit. The minimum value for unit size is 31 bytes and the default value is 128 bytes. If you specify a value smaller than 31 bytes, the value has no effect. If a particular font requires more memory space than 128 bytes, the font-cache mechanism automatically allocates one or more additional units to store its glyphs.

Note
Font caching applies only to uncompressed fonts in pcf format. Font caching is not applied to any compressed fonts or to fonts in bdf format. Because font caching cannot be used with compressed fonts, the 2-byte fonts for Asian languages are not installed in compressed format.
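
The cache-size recommendation in Table 6-25 can be worked through with a short Python sketch. The function names are hypothetical, and the character counts are the estimates quoted earlier in this section; treat the result as a starting point for tuning rather than a required setting.

```python
CACHE_STEP = 1024  # minimum and default value of -cs

# Estimated numbers of frequently used characters, from this section.
FREQUENT_CHARS = {"Taiwan": 5000, "PRC": 3000, "Korea": 2000, "Japan": 2000}

def cache_size_for(frequent_chars):
    """Lowest multiple of 1024 that accommodates frequent_chars."""
    multiples = -(-frequent_chars // CACHE_STEP)  # ceiling division
    return max(1, multiples) * CACHE_STEP

def combined_cache_size(regions):
    """Add together the values required for each language."""
    return sum(cache_size_for(FREQUENT_CHARS[r]) for r in regions)

def xservers_line(cache_size, unit_size=128):
    """Format the Xservers entry with the font-cache options."""
    return f":0 local /usr/bin/X11/X -cs {cache_size} -cu {unit_size}"

print(xservers_line(combined_cache_size(["Taiwan", "Japan"])))
```

For a workstation that displays Traditional Chinese and Japanese text in the same session, the sketch adds 5120 and 2048 to give a cache size of 7168. A larger value would still be appropriate for applications that use many font styles and sizes.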

You can calculate cache unit size with the following formula:

unit_size =
((floor(ceil((double)WIDTH / 8.0) /4.0)) + 1.0) * 4.0 * (double)HEIGHT

Consider the following calculation for a typical font size of 24x24:

unit_size in bytes
= ((floor(ceil((double) 24 / 8.0) / 4.0)) + 1.0) * 4.0 * (double) 24
= 96

For 34x34 fonts, the unit size calculation would yield 272 bytes.
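
The formula can be checked with a direct Python transcription. The function name is hypothetical; the arithmetic follows the formula term by term.

```python
import math

def unit_size(width, height):
    """Bytes needed to cache one glyph of the given pixel dimensions."""
    row_bytes = math.ceil(width / 8.0)                    # bytes per pixel row
    padded_row = (math.floor(row_bytes / 4.0) + 1) * 4    # row length after padding
    return padded_row * height

print(unit_size(24, 24))  # 96
print(unit_size(34, 34))  # 272
```

For a 24x24 glyph, each row takes 3 bytes and is padded to 4, giving 4 * 24 = 96 bytes. For a 34x34 glyph, each row takes 5 bytes and is padded to 8, giving 8 * 34 = 272 bytes.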

Given that 96 bytes are needed to cache a 24x24 font glyph and 272 bytes are needed to cache a 34x34 font glyph, the default unit size of 128 has the following implications:

For 24x24 fonts, each character needs only one cache unit. If cache size is set at 4096, the cache can accommodate 4096 characters.

For 34x34 fonts, each character needs three cache units. If cache size is set at 4096, the cache can accommodate 1365 characters.

Small fonts (whose characters require a single, 128-byte unit) are used more frequently for displaying ideographic characters. Therefore, you usually have to change only the cache size to achieve acceptable performance in text displays of languages with ideographic characters.
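
The capacity figures above follow from dividing the cache size by the number of units each glyph occupies. The following Python sketch makes that arithmetic explicit; the function names are hypothetical, and the sketch assumes that a glyph always occupies a whole number of cache units.

```python
import math

def units_per_glyph(glyph_bytes, unit_bytes=128):
    """Number of cache units one glyph occupies."""
    return math.ceil(glyph_bytes / unit_bytes)

def glyph_capacity(cache_size, glyph_bytes, unit_bytes=128):
    """Number of glyphs that fit in cache_size units."""
    return cache_size // units_per_glyph(glyph_bytes, unit_bytes)

print(glyph_capacity(4096, 96))   # 24x24 glyphs: 4096
print(glyph_capacity(4096, 272))  # 34x34 glyphs: 1365
```

If large fonts dominate your displays, the same functions show the trade-off of raising the unit size instead: with a unit size of 272 bytes, each 34x34 glyph needs one unit, at the cost of more memory per unit.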

6.18.2 Using Font Renderers for Multibyte PostScript Fonts

The operating system includes font renderers that allow any X application to use the PostScript fonts available for the Chinese and Korean languages. The system administrator can set up font renderers for the following kinds of fonts for use through the X Server or the font server:

Double-byte PostScript fonts
UDC fonts

6.18.2.1 Setting Up the Font Renderer for Double-Byte PostScript Fonts

The font renderer for Chinese and Korean PostScript fonts can be set up for use either through the X Server or the font server by editing the appropriate configuration file:

For the X Server, the font renderer is automatically added at installation time to the font_renderers list in the X Server's configuration file.
For a font server, you must manually add the following entry to the renderers list in the font server's configuration file:
```
renderers = other_renderer, other_renderer,...
     libfr_DECpscf.so;DECpscfRegisterFontFileFunctions
```
In addition, you must specify the paths for the PostScript font files in the catalogue list in the same configuration file. Double-byte PostScript fonts for the Asian languages are available in the following directories:
```
/usr/i18n/lib/X11/fonts/KoreanPS
/usr/i18n/lib/X11/fonts/SChinesePS
/usr/i18n/lib/X11/fonts/TChinesePS
```
Each font in these directories has the following components:
- A Type1 font header with the .pfa2 file name extension
  This header file is the only file that must be listed in the fonts.dir file in the font directory.
- A data file with the .csdata file name extension
- A binary metrics file with the .xafm file name extension

The renderer for Asian double-byte PostScript fonts uses its own configuration file that specifies the following information:

Cache size (number of cache units)

Cache unit size

File handler (names associated with font-rendering software)

Default character (character that is printed in place of any character for which there is no glyph)

The default pathname for this configuration file is /var/X11/renderer/DECpscf_config; however, you can change this path by setting the DECPSCF_CONFIG_PATH environment variable.
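
The precedence between the environment variable and the default path can be modeled as follows. This Python sketch is an assumption about the lookup order, based only on the preceding sentence; in particular, treating an empty value as unset is a guess, and the function name is hypothetical.

```python
import os

DEFAULT_CONFIG = "/var/X11/renderer/DECpscf_config"

def renderer_config_path(environ=os.environ):
    """Return the configuration path the renderer would be expected to use."""
    # An empty or missing DECPSCF_CONFIG_PATH falls back to the default.
    return environ.get("DECPSCF_CONFIG_PATH") or DEFAULT_CONFIG

print(renderer_config_path({}))
print(renderer_config_path({"DECPSCF_CONFIG_PATH": "/tmp/DECpscf_test"}))
```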

6.18.2.2 Setting Up the Font Renderer for UDC Fonts

The UDC font renderer accesses the UDC database directly to obtain font glyphs. Therefore, X applications that use this renderer do not need to use .pcf files generated by the cgen utility.

The UDC font renderer can be set up for use either through the X Server or the font server as follows:

For the X Server, the font renderer is automatically added at installation time to the font_renderers list in the X Server's configuration file.

For a font server, you must manually add the following entry to the renderers list in the font server's configuration file:

renderers = other_renderer, other_renderer,...
     libfr_UDC.so;UDCRegisterFontFileFunctions

In addition, you must specify the path to the UDC database in the catalogue list of the same configuration file. This path should be set to the top directory for the UDC database. For example, /var/i18n/udc is the correct path for a systemwide UDC database if the database was set up in the default directory.

To process UDC characters in a particular language, the font renderer also requires entries in the fonts.dir file in the appropriate PostScript font directory from the following list:

/usr/i18n/lib/X11/fonts/SChinesePS
/usr/i18n/lib/X11/fonts/TChinesePS

Edit the fonts.dir file to specify virtual file names in the format locale_name.udc followed by the corresponding XLFD names registered for the codesets. Table 6-26 lists the XLFD registry name that corresponds to each Asian codeset.

Table 6-26: XLFD Registry Names for UDC Characters
Codeset	XLFD Registry Name
`dechanyu`, `eucTW`	`DEC.CNS11643.1986-UDC`
`big5`	`BIG5-UDC`
`dechanzi`	`GB2312.1980-UDC`
`deckanji`, `sdeckanji`, `eucJP`	`JISX.UDC-1`

The following example entry is appropriate for the fonts.dir file in the /usr/i18n/lib/X11/fonts/TChinesePS directory:

2
zh_TW.dechanyu.udc -system-decwin-normal-r--24-240-75-75-m-24-DEC.CNS11643.1986-UDC
zh_TW.big5.udc -system-decwin-normal-r--24-240-75-75-m-24-BIG5-UDC
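
Entries of this kind can be generated from Table 6-26. The following Python sketch is illustrative: the function name is hypothetical, and it assumes that the XLFD fields preceding the registry name are the same as in the preceding example for every codeset, which may not hold for other fonts.

```python
# Codeset to XLFD registry name, from Table 6-26.
UDC_REGISTRY = {
    "dechanyu": "DEC.CNS11643.1986-UDC",
    "eucTW": "DEC.CNS11643.1986-UDC",
    "big5": "BIG5-UDC",
    "dechanzi": "GB2312.1980-UDC",
    "deckanji": "JISX.UDC-1",
    "sdeckanji": "JISX.UDC-1",
    "eucJP": "JISX.UDC-1",
}

# Leading XLFD fields copied from the example entry (assumed constant).
XLFD_PREFIX = "-system-decwin-normal-r--24-240-75-75-m-24-"

def udc_fonts_dir(locale_names):
    """Return fonts.dir text: an entry count, then one line per locale."""
    lines = [str(len(locale_names))]
    for locale_name in locale_names:
        codeset = locale_name.split(".", 1)[1]
        lines.append(f"{locale_name}.udc {XLFD_PREFIX}{UDC_REGISTRY[codeset]}")
    return "\n".join(lines) + "\n"

print(udc_fonts_dir(["zh_TW.dechanyu", "zh_TW.big5"]), end="")
```

The first line of the output is the number of entries that follow, which is why the example entry begins with 2.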

6.18.3 Changing the Language of the Start Session Window

The language of the window used to resume your session when it is in pause state is determined by the current language setting for your session. However, you must set the language of the Start Session window where you log in to your workstation by modifying the X Display Manager configuration file /usr/var/X11/xdm/xdm-config. In this file, define the entry for the DisplayManager*language resource to be a locale for the language you want. The following example sets this resource to a locale for Japanese:

DisplayManager*language: ja_JP.sdeckanji

6.18.4 Setting Fonts for Display of Local Languages

The system where you install language variant subsets is automatically updated with fonts required for text display in the supported languages. Usually, the new fonts are also added to the font list in the systemwide resource file /usr/lib/X11/app-defaults/Mwm that the local Motif Window Manager uses. This automatic update procedure is sufficient, except when:

A language-specific version of the systemwide Mwm resource file is installed as part of the local-language support
The system where language variant subsets are installed is a client in a client-server display environment

The following sections explain how to work around the preceding restrictions for the DECwindows Motif environment.

6.18.4.1 Using MwmFontSetup to Update a Private Mwm File

Currently, the subsets that support Japanese and Hebrew install a language-specific version of the systemwide Mwm resource file. Therefore, if you need access to Japanese or Hebrew fonts along with access to fonts that support other languages, you cannot rely on the systemwide Mwm file and must update the font list in your private Mwm file. You can run the /usr/i18n/usr/bin/X11/MwmFontSetup script to add or remove language-specific fonts from the font list in $HOME/Mwm.

The MwmFontSetup script:

Creates a backup copy of your current Mwm file
Displays the fonts that are listed in the current file
Asks if you want to remove or add fonts
If you choose to add fonts, displays a list of languages for which support is installed and asks you to select the language whose fonts should be added to your Mwm file
Allows you to repeat the steps for adding or removing fonts until you are satisfied with the font list
At this point, you can select the EXIT option to exit from the procedure.

Note that the MwmFontSetup script is useful only in the DECwindows Motif environment. In the Common Desktop Environment, applications access fonts through alias names that are mapped to the real names of the fonts. Font alias files must exist for each supported locale. For example, the font alias files for Japanese Extended UNIX Code are /usr/dt/config/xfonts/ja_JP.eucJP/75dpi/fonts.alias, /usr/dt/config/xfonts/ja_JP.eucJP/100dpi/fonts.alias, and /usr/dt/config/ja_JP.eucJP/sys.font. These alias files are installed when Digital UNIX software for Japanese language support is installed.

6.18.4.2 Accessing Local Language Fonts for Remote Displays

The information in this section is appropriate for the DECwindows Motif environment. In the Common Desktop Environment, fonts are mapped to generic alias names.

The system where Asian-language subsets are installed may function as a client in a client-server display environment. In this case, the local-language fonts must also be available to the Motif Window Managers for all the server systems where native language text is displayed. You need either to install the fonts for other locales on each system used for remote login to the system where language variant subsets are installed, or to make the fonts known to those systems through a font server. Table 6-27 through Table 6-33 describe the fonts used to display text in various local languages. You can use the /usr/bin/X11/xlsfonts command to determine which fonts are currently installed on a system.

Table 6-27: Bitmap Fonts for Asian Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Japanese	Gothic (ISO Latin-1)	Normal	8, 10, 12, 14, 18, 24	x	x
	Gothic (Kanji)	Normal	8, 10, 12, 14, 18, 24	x	x
	Gothic (Roman Kana)	Normal	8, 10, 12, 14, 18, 24	x	x
	kmenu (ISO Latin-1)	Normal	12	x	x
	kmenu (Roman Kana)	Normal	12	x	x
	Mincho (ISO Latin-1)	Normal	8, 10, 12, 14, 18, 24	x	x
	Mincho (Kanji)	Normal	8, 10, 12, 14, 18, 24	x	x
	Mincho (Roman Kana)	Normal	8, 10, 12, 14, 18, 24	x	x
	Screen (DECsuppl)	Normal	14, 18, 24	x
	Screen (DECtech)	Normal	14, 18, 24	x
	Screen (ISO Latin-1)	Normal	14, 18, 24	x
	Screen (Kanji00)	Normal	10, 14, 16, 18, 24	x
	Screen (Kanji11)	Normal	10, 14, 18, 24	x
	Screen (Roman Kana)	Normal	10, 14, 18, 24	x
Korean	Gotic	Normal	16, 24	x
	Myungcho	Normal	16, 24, 32	x
	Screen	Normal	18, 24	x
	KS Roman	Normal	18, 24	x
Simplified Chinese	FangSongTi	Normal	24, 34	x
	HeiTi	Normal	16, 24, 34	x
	KaiTi	Normal	24, 34	x
	Screen	Normal	18, 24	x
	SongTi	Normal	16, 24, 34	x
Traditional Chinese	Hei (CNS11643)	Normal	16, 24	x
	Hei (DTSCS)	Normal	16, 24	x
	Screen (CNS11643)	Normal	18, 24	x
	Screen (DTSCS)	Normal	18, 24	x
	Sung (CNS11643)	Normal	24, 32	x
	Sung (DTSCS)	Normal	24, 32	x
Thai	Screen	Normal	14, 18, 24	x
Asia (Misc.)	Screen (DEC Ctrl)	Normal	14, 18, 24	x
	Screen (DRCS)	Normal	18, 24	x

Table 6-28: Bitmap Fonts for *.ISO8859-2 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Czech, Hungarian, Polish, Slovak, Slovene	Arial	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Arial Narrow	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Book Antiqua	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Bookman Old Style	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Gothic	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Schoolbook	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Courier	Normal	8, 10, 12, 14, 18, 24, 36	x	x
		Italic	8, 10, 12, 14, 18, 24, 36	x	x
		Bold	8, 10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24, 36	x	x
	Monotype Corsiva	Normal	10, 12, 14, 18, 24, 36	x	x
	Times New Roman	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

Table 6-29: Bitmap Fonts for *.ISO8859-4 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Lithuanian	Arial	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Arial Narrow	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Book Antiqua	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Bookman Old Style	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Gothic	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Schoolbook	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Courier	Normal	8, 10, 12, 14, 18, 24, 36	x	x
		Italic	8, 10, 12, 14, 18, 24, 36	x	x
		Bold	8, 10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24, 36	x	x
	Monotype Corsiva	Normal	10, 12, 14, 18, 24, 36	x	x
	Times New Roman	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

Table 6-30: Bitmap Fonts for *.ISO8859-5 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Russian	Arial	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Courier	Normal	8, 10, 12, 14, 18, 24, 36	x	x
		Italic	8, 10, 12, 14, 18, 24, 36	x	x
		Bold	8, 10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24, 36	x	x
	Nimrod	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Plantin	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Times New Roman	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

Table 6-31: Bitmap Fonts for *.ISO8859-7 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Greek	Arial	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Courier	Normal	8, 10, 12, 14, 18, 24, 36	x	x
		Italic	8, 10, 12, 14, 18, 24, 36	x	x
		Bold	8, 10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24, 36	x	x
	Times New Roman	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

Table 6-32: Bitmap Fonts for *.ISO8859-8 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Hebrew	David	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	Frankruhl	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	Gam	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	menu	Normal	10, 12	x	x
	Miriam	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	Miriam Fixed	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	Narkiss Tam	Normal	8, 10, 12, 14, 18, 24	x	x
		Italic	8, 10, 12, 14, 18, 24	x	x
		Bold	8, 10, 12, 14, 18, 24	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

Table 6-33: Bitmap Fonts for *.ISO8859-9 Locales
Language	Typeface	Style	Sizes	75dpi	100dpi
Turkish	Arial	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Arial Narrow	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Book Antiqua	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Bookman Old Style	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Gothic	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Century Schoolbook	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Courier	Normal	8, 10, 12, 14, 18, 24, 36	x	x
		Italic	8, 10, 12, 14, 18, 24, 36	x	x
		Bold	8, 10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	8, 10, 12, 14, 18, 24, 36	x	x
	Monotype Corsiva	Normal	10, 12, 14, 18, 24, 36	x	x
	Times New Roman	Normal	10, 12, 14, 18, 24, 36	x	x
		Italic	10, 12, 14, 18, 24, 36	x	x
		Bold	10, 12, 14, 18, 24, 36	x	x
		Bold-Italic	10, 12, 14, 18, 24, 36	x	x
	Terminal	Normal	14, 18	x	x
		Double-Width	14, 18	x	x
		Double-Width, Double-Height	28, 36	x	x
		Narrow	14, 18	x	x
		Double-Width, Narrow	14, 18	x	x
		Double-Width, Double-Height, Narrow	28, 36	x	x
		Bold	14, 18	x	x
		Double-Width, Bold	14, 18	x	x
		Double-Width, Double-Height, Bold	28, 36	x	x
		Narrow, Bold	14, 18	x	x
		Double-Width, Narrow, Bold	14, 18	x	x
		Double-Width, Double-Height, Narrow, Bold	28, 36	x	x

In the DECwindows Motif environment, fonts used to display local-language text in window titles, menu bars, menus, and so forth must also be added to one or more of the Motif Window Manager resource files on the server systems. These resource files are:

Systemwide
Color monitor: /usr/lib/X11/app-defaults/Mwm
Black-and-white monitor: /usr/lib/X11/app-defaults/Mwm_bw
Gray-scale monitor: /usr/lib/X11/app-defaults/Mwm_gray
Private
Color monitor: $HOME/Mwm
Black-and-white monitor: $HOME/Mwm_bw
Gray-scale monitor: $HOME/Mwm_gray

If users remotely log on to their home systems, where language variant subsets are installed and where they have run the MwmFontSetup script, their private Mwm resource files may already list the fonts moved to the display systems. Refer to Section 6.18.4.1 for information about the MwmFontSetup script.

6.18.5 Customizing the DECterm Window for Local Languages

The following features and restrictions apply to DECterm windows that you create when an Asian language is specified for the language setting:

You cannot customize the National Replacement Character Set (NRCS) when a DECterm window is emulating a terminal type that supports ideographic character sets.

Depending on the language setting, additional menu items, push buttons, toggle switches, and text entry fields may be available to you for customizing DECterm features.

Additional terminal identifiers may be available for terminal emulation. Note that terminal emulation always follows the selected language. For example, you cannot have a DECterm window emulate a Japanese terminal when the user interface language for the DECterm window is set to English.

From the DECterm Window Options dialog box, you can select the following font sizes for ideographic character sets:
- Big Font: 24-point font
- Little Font: 18-point font
- Fine Font: 14-point font
  Fine Font is available for Japanese and Thai only.
German Standard Font is not supported when a DECterm window emulates a terminal that supports ideographic characters. For other ISO Latin character sets, the size options are the same as those offered for standard DECterm software.

By default, the DECterm application saves its options in a file named DXterm_%l_%t, where %l is replaced by the language and %t is replaced by the territory (country) of the current locale.
Resource files should not be shared among locales. Therefore, do not save any application's resource file in one locale and attempt to use the same resource file when invoking the application in another locale.
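
The file name substitution can be sketched in Python. The function name is hypothetical, and the parsing assumes locale names of the form language_territory.codeset, as used elsewhere in this chapter.

```python
def dxterm_options_file(locale_name):
    """Expand DXterm_%l_%t for a locale such as ja_JP.sdeckanji."""
    language_territory = locale_name.split(".", 1)[0]      # drop the codeset
    language, _, territory = language_territory.partition("_")
    return f"DXterm_{language}_{territory}"

print(dxterm_options_file("ja_JP.sdeckanji"))
print(dxterm_options_file("zh_TW.big5"))
```

Because the codeset is not part of the name, two locales for the same language and territory map to the same file name under this expansion.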

For a language supported by an input method server, you must be sure the input method server is connected to the DECterm window where you input characters in that language. Otherwise, you cannot use the input method for character entry. The connection between a DECterm window and an input method server does not exist if:
- The DECterm window was started before the input method server started
- The input method server was killed for some reason
  For example, an input method server is killed if it was being run on a remote system that shut down.
If the connection between a DECterm window and the input method server was broken, you can first try to reconnect to the server by selecting the Reset Terminal item from the window's Commands menu. Alternatively, you can start the input method server and then create another DECterm window where you can use the input method.

For information about terminal programming enhancements that applications can use to draw ruled lines on a DECterm window, see Section 4.2.

6.18.6 Using the CDA Viewer and Converters with Asian Language Text

The CDA viewer is a DECwindows Motif application that lets you display the contents of compound documents and graphics, image files, and text files that contain ideographic characters. The viewer also supports PostScript files; however, PostScript display is supported only for languages with single-byte characters.

The viewer works with converters that convert files from one data format to another. If you want to view or convert text files that contain Asian-language characters, you must specify an option file to the CDA viewer and converters. This file must contain an entry to identify the codeset that applies to the text file being viewed, converted, or both. An option entry for text files starts with the keyword text. For Asian-language text files, this entry line must specify the appropriate character encoding (text_encoding). The following example is appropriate for a Japanese text file encoded in DEC Kanji:

text text_encoding dec_kanji

By convention, options files use the file extension .cda_options, so an appropriate name for an options file that contains the preceding entry might be japanese.cda_options.

The following table lists the supported encodings for text files used with CDA viewers and converters.

Language	Codeset	text_encoding Keyword
Chinese (Simplified)	DEC Hanzi	`dec_hanzi`
Chinese (Traditional)	DEC Hanyu	`dec_hanyu`
Japanese	DEC Kanji	`dec_kanji`
Korean	DEC Hangul (Korean)	`dec_hangul`
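
As a minimal sketch, an options file for any encoding in the preceding table could be generated from the shell. The function name write_cda_options and the check against the four keywords are illustrative assumptions, not part of the CDA software; only the entry format (text text_encoding keyword) comes from this section.

```shell
#!/bin/sh
# Sketch: write a one-line CDA options file for an Asian-language text file.
# write_cda_options is a hypothetical helper, not a CDA command.
write_cda_options() {
    encoding=$1     # text_encoding keyword, for example dec_kanji
    outfile=$2      # options file name, for example japanese.cda_options

    # Accept only the keywords listed in the table of supported encodings.
    case $encoding in
        dec_hanzi|dec_hanyu|dec_kanji|dec_hangul) ;;
        *) echo "unsupported text_encoding: $encoding" >&2; return 1 ;;
    esac

    # The entry starts with the keyword text, followed by the encoding.
    printf 'text text_encoding %s\n' "$encoding" > "$outfile"
}

write_cda_options dec_kanji japanese.cda_options
cat japanese.cda_options
```

Rejecting unknown keywords up front keeps a typing mistake from producing an options file that has no supported encoding entry.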

You specify an options file to CDA commands with the -O flag. The following example shows how to invoke the CDA viewer for the DECwindows Motif environment to display a Japanese text file named ja_document.txt:


% dxvdoc -f text -O japanese.cda_options \
ja_document.txt

The following example shows how to invoke a CDA converter to convert the same Japanese text file to ddif format:


% cdoc -s text -O japanese.cda_options \
-o ja_document.ddif ja_document.txt

After the text file is converted to ddif format, you can convert the ddif file to a PostScript file, as follows:


% cdoc -d ps -o ja_document.ps ja_document.ddif
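
The two conversion steps shown above can be chained so that the PostScript step runs only when the ddif step succeeds. The following is a sketch under stated assumptions: the function name text_to_ps and the CDOC variable are hypothetical conveniences, and the cdoc flags are only those used in the preceding examples.

```shell
#!/bin/sh
# Sketch: convert a text file to PostScript by way of ddif.
# text_to_ps and the CDOC variable are illustrative names, not part of CDA.
text_to_ps() {
    options=$1                 # options file, for example japanese.cda_options
    input=$2                   # text file, for example ja_document.txt
    base=${input%.txt}         # strip the .txt suffix to build output names
    cdoc_cmd=${CDOC:-cdoc}     # point CDOC at a stub to preview the commands

    # Step 1: text to ddif; the options file identifies the text encoding.
    "$cdoc_cmd" -s text -O "$options" -o "$base.ddif" "$input" || return 1
    # Step 2: ddif to PostScript; reached only if step 1 succeeded.
    "$cdoc_cmd" -d ps -o "$base.ps" "$base.ddif"
}

# Example call (requires the CDA converter to be installed):
#   text_to_ps japanese.cda_options ja_document.txt
```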

If you use the CDA converter to produce a PostScript file that contains Asian-language characters, the required fonts should be installed on your system. The following table lists the basic fonts that the CDA converter uses for different Asian languages.

Language	Basic Font
Korean	Munjo
Hanyu	Sung-Light-CNS11643 or Sung-Light-DTSCS
Hanzi	XiSong-GB2312-80
Japanese	Ryumin-Light-EUC-H or Ryumin-Light.Hankaku

If the preceding fonts are not installed on the system, the converter uses the Courier font.

As an alternative to using an options file to specify the encoding of input text files, you can define the environment variables DDIF_READ_TEXT_GL and DDIF_READ_TEXT_GR. The following table lists the supported pairs of values for these variables and the encoding that each pair selects:

DDIF_READ_TEXT_GL	DDIF_READ_TEXT_GR	Encoding
LATIN1	MCS	MCS
LATIN1	LATIN1	ISO Latin-1
LATIN1	KATAKANA	ASCII-Kana
LATIN1	KANJI	DEC Kanji
ROMAN	MCS	Roman-MCS
ROMAN	LATIN1	Roman
ROMAN	KANJI	Roman-Kanji
ROMAN	KATAKANA	Roman-Kana
LATIN1	HANZI	DEC Hanzi
LATIN1	HANGUL	DEC Hangul (Korean)
LATIN1	HANYU	DEC Hanyu
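
For example, the DEC Kanji row of the preceding table might be applied as in the following sketch, which uses Bourne shell syntax. The commented cdoc invocation without the -O flag is an assumption about how the variables would be used in place of an options file; C shell users would use setenv (for example, setenv DDIF_READ_TEXT_GR KANJI) instead of export.

```shell
#!/bin/sh
# Sketch: select the DEC Kanji encoding through environment variables
# instead of an options file (values taken from the table above).
DDIF_READ_TEXT_GL=LATIN1
DDIF_READ_TEXT_GR=KANJI
export DDIF_READ_TEXT_GL DDIF_READ_TEXT_GR

# Show the settings that CDA commands started from this shell would inherit.
echo "GL=$DDIF_READ_TEXT_GL GR=$DDIF_READ_TEXT_GR"

# Assumed usage: with the variables exported, run the converter without -O.
#   cdoc -s text -o ja_document.ddif ja_document.txt
```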

Note
The CDA converter does not support vertical writing. Therefore, vertical text prints horizontally in files produced by the converter.

For complete information about CDA viewers and converters, refer to the cda(4) reference page. The cda(4) reference page also lists additional reference pages that describe specific CDA commands. Only a few of those commands and their options have been described here.
