
JDK 1.1 Internationalization Overview


Revision Date: 1/26/98

The global Internet demands global software, that is, software that can be developed independently of the countries or languages of its users and then localized for multiple countries or regions. Version 1.1 of the JDK (Java Development Kit) provides a rich set of APIs for developing global applications. These internationalization APIs are based on the Unicode 2.0 character encoding and include the ability to adapt text, numbers, dates, currency, and user-defined objects to any country's conventions.

This document summarizes the internationalization APIs and features of JDK 1.1. For coding examples and step-by-step instructions, see the Java Tutorial. The detailed APIs are found at The Java Platform Core APIs Specification.


The internationalization APIs are concentrated in three packages: java.util, java.text, and java.io.


On the Java platform, a locale is simply an identifier for a particular combination of language and region. It is not a collection of locale-specific attributes. Instead, each locale-sensitive class maintains its own locale-specific information. With this design, there is no difference in how user and system objects maintain their locale-specific resources. Both use the standard localization mechanism.

Java programs are not assigned a single global locale. All locale-sensitive operations may be explicitly given a locale as an argument. This greatly simplifies multilingual programs. While a global locale is not enforced, a default locale is available for programs that do not wish to manage locales explicitly. A default locale also makes it possible to affect the behavior of the entire presentation with a single choice.

Java locales act as requests for certain behavior from another object. For example, a French Canadian locale passed to a Calendar object asks that the Calendar behave correctly for the customs of Quebec. It is up to the object accepting the locale to do the right thing. If the object has not been localized for a particular locale, it will try to find a "close" match with a locale for which it has been localized. Thus if a Calendar object was not localized for French Canada, but was localized for the French language in general, it would use the French localization instead.
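The "close match" search described above can be sketched as a list of candidate locale names, tried from most to least specific. This is a simplified illustration only: the class LocaleFallbackSketch and its candidates() method are hypothetical, and the actual lookup performed by the JDK classes considers more cases (variants and the default locale, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleFallbackSketch {
    // Lists the locale names an object might try, from most to least specific.
    static List<String> candidates(Locale requested) {
        List<String> names = new ArrayList<String>();
        String language = requested.getLanguage();
        String country = requested.getCountry();
        if (language.length() > 0 && country.length() > 0) {
            names.add(language + "_" + country);   // for example fr_CA
        }
        if (language.length() > 0) {
            names.add(language);                   // for example fr
        }
        names.add("");                             // locale-neutral fallback
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates(Locale.CANADA_FRENCH));
    }
}
```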

Locale Class

A Locale object represents a specific geographical, political, or cultural region. An operation that requires a locale to perform its task is called locale-sensitive and uses the Locale object to tailor information for the user. For example, displaying a number is a locale-sensitive operation: the number should be formatted according to the customs and conventions of the user's native country, region, or culture.
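As a minimal sketch of a Locale acting purely as an identifier, the following uses only the predefined constants and accessor methods of java.util.Locale. The class name LocaleBasics and the describe() helper are illustrative, not part of the JDK.

```java
import java.util.Locale;

class LocaleBasics {
    // A locale is only an identifier: a language code plus a country code.
    static String describe(Locale locale) {
        return locale.getLanguage() + "/" + locale.getCountry();
    }

    public static void main(String[] args) {
        Locale quebec = Locale.CANADA_FRENCH;
        System.out.println(describe(quebec));
        // The display name is itself locale-sensitive: ask for it in English.
        System.out.println(quebec.getDisplayName(Locale.ENGLISH));
        // Programs that do not manage locales explicitly can use the default.
        System.out.println(describe(Locale.getDefault()));
    }
}
```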

Supported Locales

On the Java platform, there does not have to be a single set of supported locales, since each class maintains its own localizations. Nevertheless, there is a consistent set of localizations supported by the JDK classes. Other implementations of the Java platform may support different locales than the JDK. Those supported by the JDK are summarized in the following table. Subsequent releases of the JDK may include additional locales.

Table 1 Locales Supported by JDK 1.1

Locale Language Country
da_DK Danish Denmark
de_AT German Austria
de_CH German Switzerland
de_DE German Germany
el_GR Greek Greece
en_CA English Canada
en_GB English United Kingdom
en_IE English Ireland
en_US English United States
es_ES Spanish Spain
fi_FI Finnish Finland
fr_BE French Belgium
fr_CA French Canada
fr_CH French Switzerland
fr_FR French France
it_CH Italian Switzerland
it_IT Italian Italy
ja_JP Japanese Japan
ko_KR Korean Korea
nl_BE Dutch Belgium
nl_NL Dutch Netherlands
no_NO Norwegian (Nynorsk) Norway
no_NO_B Norwegian (Bokmål) Norway
pt_PT Portuguese Portugal
sv_SE Swedish Sweden
tr_TR Turkish Turkey
zh_CN Chinese (Simplified) China
zh_TW Chinese (Traditional) Taiwan

Localized Resources

All locale-sensitive classes must be able to access resources customized for the locales they support. To aid in the process of localization, it helps to have these resources grouped together by locale and separated from the locale-neutral parts of the program.

ResourceBundle Class

The class ResourceBundle is an abstract base class representing containers of resources. Programmers create subclasses of ResourceBundle that contain resources for a particular locale. New resources can be added to an instance of ResourceBundle, or new instances of ResourceBundle can be added to a system without affecting the code that uses them. Packaging resources as classes allows developers to take advantage of Java's class loading mechanism to find resources.

Resource bundles contain locale-specific objects. When a program needs a locale-specific resource, a String object for example, the program can load it from the resource bundle that is appropriate for the current user's locale. In this way, the programmer can write code that is largely independent of the user's locale, isolating most, if not all, of the locale-specific information in resource bundles.

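This allows Java programmers to write code that can be easily localized into different languages, handle multiple locales at once, and be easily modified later to support more locales.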

ListResourceBundle Class

ListResourceBundle is an abstract subclass of ResourceBundle that manages resources for a locale in a convenient, easy-to-use list.
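A minimal sketch of two list-based bundles follows. The class names and keys are hypothetical. To keep the example self-contained, the French bundle is instantiated directly and its parent is set by hand; a real program would normally call ResourceBundle.getBundle(), which locates the bundle classes by name and links the parent chain itself.

```java
import java.util.ListResourceBundle;

class ListBundleSketch {
    // Base (locale-neutral) resources.
    static class Labels extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] {
                { "ok", "OK" },
                { "cancel", "Cancel" },
            };
        }
    }

    // French resources; keys missing here are looked up in the parent bundle.
    static class LabelsFr extends ListResourceBundle {
        LabelsFr() {
            setParent(new Labels());
        }

        protected Object[][] getContents() {
            return new Object[][] {
                { "cancel", "Annuler" },
            };
        }
    }

    public static void main(String[] args) {
        LabelsFr french = new LabelsFr();
        System.out.println(french.getString("cancel")); // found in LabelsFr
        System.out.println(french.getString("ok"));     // falls back to Labels
    }
}
```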

PropertyResourceBundle Class

PropertyResourceBundle is a concrete subclass of ResourceBundle that manages resources for a locale using a set of static strings from a property file.
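The following sketch loads a property-based bundle from an in-memory stream instead of a file, so that it runs without any files on disk. The class PropertyBundleSketch, the load() helper, and the keys are assumptions made for illustration; in practice the property file would normally be found by ResourceBundle.getBundle().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class PropertyBundleSketch {
    // Builds a bundle from text in property-file syntax (key=value lines).
    static ResourceBundle load(String propertiesText) {
        InputStream in = new ByteArrayInputStream(
                propertiesText.getBytes(StandardCharsets.ISO_8859_1));
        try {
            return new PropertyResourceBundle(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load("greeting=Hello\nfarewell=Goodbye\n");
        System.out.println(bundle.getString("greeting"));
    }
}
```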

Calendar and Time Zone Support

JDK 1.0 introduced the java.util.Date class for the representation of dates and times. The java.util.Date class allowed for the interpretation of dates as year, month, day, hour, minute, and second values, and it formatted and parsed date strings. Unfortunately, the API for these functions was not amenable to internationalization. Only the "representation" part of this class is retained in JDK 1.1.

As of JDK 1.1, the Date class should only be used as a wrapper for a date or time. That is, Date objects represent a specific instant in time with millisecond precision. Programmers should instead use the Calendar class to convert between date and time fields, and the DateFormat class to format and parse date strings. The corresponding methods in the JDK 1.0 version of the Date class have been deprecated.

Calendar Class

The class Calendar is an abstract base class which can convert between a point in time (represented as milliseconds from 00:00:00 GMT, Jan 1, 1970) and a set of integers representing the year, month, week and so on. GregorianCalendar is a concrete subclass of Calendar that does this according to the rules of the Gregorian calendar.

Calendar and its subclasses are useful for doing various manipulations with time values. Arithmetic can be performed on a Calendar object's fields and the resulting date determined. A Calendar object can produce all the time field values needed to implement the date-time formatting for a particular language and calendar style.
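Field arithmetic of this kind might look like the sketch below, which adds a number of days to a date and reads the resulting fields back. The class CalendarSketch and the addDays() helper are hypothetical; the time zone is fixed to GMT so the result does not depend on the machine's settings.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CalendarSketch {
    // Returns {year, month (1-12), day} for the date 'days' after the given date.
    static int[] addDays(int year, int month, int day, int days) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();                       // reset all fields, including time of day
        cal.set(year, month - 1, day);     // Calendar months are zero-based
        cal.add(Calendar.DATE, days);      // rolls over month and year as needed
        return new int[] {
            cal.get(Calendar.YEAR),
            cal.get(Calendar.MONTH) + 1,
            cal.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        int[] result = addDays(1998, 1, 26, 10);
        System.out.println(result[0] + "-" + result[1] + "-" + result[2]);
    }
}
```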

TimeZone Class

The abstract class TimeZone encapsulates a time zone offset from UTC (Coordinated Universal Time) and a possible daylight-savings time offset. The class SimpleTimeZone is a concrete subclass that encapsulates some simple rules about daylight-savings time. These rules do not take into account historical changes in the laws affecting daylight-savings time. The Calendar class and its subclasses use the TimeZone and SimpleTimeZone classes to convert between local time and UTC, which is the internal representation used by Date objects. Most programs will not have to deal with TimeZone objects directly.
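For programs that do need a TimeZone, the sketch below shows one instant interpreted in two zones. The zone ID "Example/UTC-8" is a made-up label for a fixed offset of eight hours behind UTC with no daylight-savings rules; the class and method names are likewise illustrative.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.SimpleTimeZone;
import java.util.TimeZone;

class TimeZoneSketch {
    static final int HOUR_MS = 60 * 60 * 1000;

    // Hour of day (0-23) on which the given instant falls in the given zone.
    static int hourIn(TimeZone zone, Date instant) {
        Calendar cal = new GregorianCalendar(zone);
        cal.setTime(instant);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);  // 00:00:00 GMT, Jan 1, 1970
        TimeZone gmt = TimeZone.getTimeZone("GMT");
        TimeZone fixed = new SimpleTimeZone(-8 * HOUR_MS, "Example/UTC-8");
        System.out.println(hourIn(gmt, epoch));
        System.out.println(hourIn(fixed, epoch));
    }
}
```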


It is in formatting data for output that many cultural conventions are applied. Numbers, dates, times, and messages may all require formatting before they can be displayed. The Java platform provides a set of flexible formatting classes that can handle both the standard locale formats and programmer-defined custom formats. These formatting classes are also able to parse formatted strings back into their constituent objects.

Format Class

The class Format is an abstract base class for formatting locale-sensitive information such as dates, times, messages, and numbers. Three main subclasses are provided: DateFormat, NumberFormat, and MessageFormat. These three also provide subclasses of their own.

DateFormat Class

Dates and times are stored internally in a locale-independent way, but should be formatted so that they can be displayed in a locale-sensitive manner. For example, the same date might be formatted as:

The class DateFormat is an abstract base class for formatting and parsing date and time values. It lets code that formats and parses dates remain independent of the conventions of any particular locale. It has a number of static factory methods for getting standard time formats for a given locale.

The DateFormat object uses Calendar and TimeZone objects in order to interpret time values. By default, a DateFormat object for a given locale will use the appropriate Calendar object for that locale and the system's default TimeZone object. The programmer can override these choices if desired.
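A minimal sketch of the factory methods follows, formatting one Date for two locales and overriding the default time zone. The class DateFormatSketch and the longDate() helper are hypothetical, and the exact text produced depends on the locale data shipped with the Java runtime in use, so no output is shown.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatSketch {
    // Formats the date portion of an instant in the LONG style for a locale.
    static String longDate(Date date, Locale locale) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, locale);
        format.setTimeZone(TimeZone.getTimeZone("GMT")); // override the default zone
        return format.format(date);
    }

    public static void main(String[] args) {
        Date revision = new Date(885816000000L); // noon GMT, January 26, 1998
        System.out.println(longDate(revision, Locale.US));
        System.out.println(longDate(revision, Locale.FRANCE));
    }
}
```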

SimpleDateFormat Class

The class SimpleDateFormat is a concrete class for formatting and parsing dates and times in a locale-sensitive manner. It allows for formatting (milliseconds to text), parsing (text to milliseconds), and normalization.
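As an illustration of formatting and parsing with an explicit pattern, the sketch below round-trips a date through the pattern "yyyy-MM-dd". The class and helper names are assumptions; the locale and time zone are pinned so that the result does not vary between machines.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class SimpleDateFormatSketch {
    // A fresh formatter per call: SimpleDateFormat instances are not thread-safe.
    static SimpleDateFormat isoDay() {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        format.setLenient(false); // reject impossible dates such as 1998-02-30
        return format;
    }

    // Formatting: milliseconds to text.
    static String format(Date date) {
        return isoDay().format(date);
    }

    // Parsing: text to milliseconds.
    static Date parse(String text) {
        try {
            return isoDay().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L)));
        System.out.println(parse("1970-01-02").getTime());
    }
}
```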

DateFormatSymbols Class

The class DateFormatSymbols is used to encapsulate localizable date-time formatting data, such as the names of the months, the names of the days of the week, time of day, and the time zone data. The DateFormat and SimpleDateFormat classes both use the DateFormatSymbols class to encapsulate this information.

Usually, programmers will not use the DateFormatSymbols class directly. Rather, they will implement formatting with the DateFormat class's factory methods.

NumberFormat Class

The class NumberFormat is an abstract base class for formatting and parsing numeric data. It contains a number of static factory methods for getting different kinds of locale-specific number formats.

The NumberFormat class helps programmers to format and parse numbers for any locale. Code using this class can be completely independent of the locale conventions for decimal points, thousands-separators, the particular decimal digits used, or whether the number format is even decimal. The application can also display a number as a normal decimal number, currency, or percentage:
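The three factory methods can be sketched as follows. The class NumberFormatSketch and its helpers are illustrative names; only the U.S. locale is exercised in main() so that the example stays predictable, but any Locale can be passed.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatSketch {
    static String asNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String asCurrency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String asPercent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(asNumber(1234.5, Locale.US));
        System.out.println(asCurrency(1234.5, Locale.US));
        System.out.println(asPercent(0.25, Locale.US));
    }
}
```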

DecimalFormat Class

Numbers are stored internally in a locale-independent way, but should be formatted so that they can be displayed in a locale-sensitive manner. For example, when using "#,###.00" as a pattern, the same number might be formatted as:

The class DecimalFormat, which is a concrete subclass of the NumberFormat class, can format decimal numbers. Programmers generally will not instantiate this class directly but will use the factory methods provided.

The DecimalFormat class has the ability to take a pattern string to specify how a number should be formatted. The pattern specifies attributes such as the precision of the number, whether leading zeros should be printed, and what currency symbols are used. The pattern string can be altered if a program needs to create a custom format.
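A minimal sketch of a pattern applied with the symbols of different locales is shown below. It constructs DecimalFormat directly for brevity, although the factory methods are the usual route; the class name and the format() helper are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DecimalPatternSketch {
    // Applies one pattern using the separators of the given locale.
    static String format(double value, String pattern, Locale locale) {
        DecimalFormat format =
                new DecimalFormat(pattern, new DecimalFormatSymbols(locale));
        return format.format(value);
    }

    public static void main(String[] args) {
        // Same number, same pattern, different locale symbols.
        System.out.println(format(1234.5, "#,###.00", Locale.US));
        System.out.println(format(1234.5, "#,###.00", Locale.GERMANY));
    }
}
```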

DecimalFormatSymbols Class

The class DecimalFormatSymbols represents the set of symbols (such as the decimal separator, the grouping separator, and so on) needed by DecimalFormat to format numbers. DecimalFormat creates for itself an instance of DecimalFormatSymbols from its locale data. A programmer needing to change any of these symbols can get the DecimalFormatSymbols object from the DecimalFormat object and then modify it.
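Changing the symbols might look like the sketch below. Note that getDecimalFormatSymbols() returns a copy, so the modified object has to be handed back with setDecimalFormatSymbols(). The class name, the method name, and the choice of separators are illustrative only.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DecimalSymbolsSketch {
    static String withCustomSymbols(double value) {
        DecimalFormat format =
                new DecimalFormat("#,##0.00", new DecimalFormatSymbols(Locale.US));
        DecimalFormatSymbols symbols = format.getDecimalFormatSymbols(); // a copy
        symbols.setGroupingSeparator('\'');
        symbols.setDecimalSeparator(',');
        format.setDecimalFormatSymbols(symbols); // install the modified copy
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(withCustomSymbols(1234.5));
    }
}
```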

ChoiceFormat Class

The class ChoiceFormat is a concrete subclass of the NumberFormat class. The ChoiceFormat class allows the programmer to attach a format to a range of numbers. It is generally used in a MessageFormat object for handling plurals. See the "MessageFormat Class" section that follows for more information.
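Attaching text to ranges of numbers can be sketched as follows. The limits and phrases are made up for illustration: each phrase applies from its limit up to, but not including, the next limit.

```java
import java.text.ChoiceFormat;

class ChoiceFormatSketch {
    static String fileCount(long n) {
        double[] limits = { 0, 1, 2 };
        String[] phrases = { "no files", "one file", "many files" };
        // 0 <= n < 1 selects "no files", 1 <= n < 2 "one file", n >= 2 "many files".
        return new ChoiceFormat(limits, phrases).format(n);
    }

    public static void main(String[] args) {
        System.out.println(fileCount(0));
        System.out.println(fileCount(1));
        System.out.println(fileCount(7));
    }
}
```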

MessageFormat Class

Programs often need to build messages from sequences of strings, numbers and other data. For example, the text of a message displaying the number of files on a disk drive will vary:

If a message built from sequences of strings and numbers is hard-coded, it cannot be translated into other languages. For example, note the different positions of the parameters "3" and "G" in the following translations:

The class MessageFormat provides a means to produce concatenated messages in a language-neutral way. The MessageFormat object takes a set of objects, formats them, and then inserts the formatted strings into the pattern at the appropriate places.
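A sketch of pattern-based messages follows. The patterns are invented for illustration (in a real program they would come from a resource bundle), and the fill() helper is hypothetical. The numbered placeholders let a translator reorder the arguments, and the last pattern embeds a choice format to handle plurals.

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageFormatSketch {
    static final String ENGLISH = "The disk {1} contains {0} files.";
    static final String FRENCH = "Le disque {1} contient {0} fichiers.";
    static final String PLURALS =
            "There {0,choice,0#are no files|1#is one file|1<are {0} files}.";

    // Formats the arguments and inserts them at the numbered placeholders.
    static String fill(String pattern, Locale locale, Object... arguments) {
        return new MessageFormat(pattern, locale).format(arguments);
    }

    public static void main(String[] args) {
        System.out.println(fill(ENGLISH, Locale.US, Integer.valueOf(3), "G"));
        System.out.println(fill(FRENCH, Locale.FRENCH, Integer.valueOf(3), "G"));
        System.out.println(fill(PLURALS, Locale.US, Integer.valueOf(1)));
    }
}
```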

ParsePosition Class

The class ParsePosition is used by the Format class and its subclasses to keep track of the current position during parsing. One version of the parseObject() method in the Format class requires a ParsePosition object as an argument.
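The following sketch uses a ParsePosition to parse a number from the front of a string and find where parsing stopped. The class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParsePositionSketch {
    // Parses a leading number and returns whatever text follows it.
    static String remainderAfterNumber(String text) {
        ParsePosition position = new ParsePosition(0);
        Number number = NumberFormat.getNumberInstance(Locale.US).parse(text, position);
        if (number == null) {
            return text; // nothing could be parsed; the index was not advanced
        }
        return text.substring(position.getIndex());
    }

    public static void main(String[] args) {
        System.out.println(remainderAfterNumber("42 apples"));
    }
}
```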

FieldPosition Class

The FieldPosition class is used by the Format class and its subclasses to identify fields in formatted output. One version of the format() method in the Format class requires a FieldPosition object as an argument.
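A FieldPosition can be used, for example, to locate the integer part of a formatted number so that columns of numbers can be aligned on the decimal separator. The sketch below is one way this might be done; the class and method names are illustrative.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

class FieldPositionSketch {
    // Returns only the integer part of the formatted number.
    static String integerPart(double value) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        FieldPosition field = new FieldPosition(NumberFormat.INTEGER_FIELD);
        StringBuffer formatted = format.format(value, new StringBuffer(), field);
        // The formatter records where the requested field begins and ends.
        return formatted.substring(field.getBeginIndex(), field.getEndIndex());
    }

    public static void main(String[] args) {
        System.out.println(integerPart(1234.5));
    }
}
```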

Locale-Sensitive String Operations

Programs frequently need to manipulate strings. Common operations on strings include searching and sorting. Some tasks, such as collating strings or finding various boundaries in text, are surprisingly difficult to get right and are even more difficult when multiple languages must be considered. The JDK provides classes for handling many of these common string manipulations in a locale-sensitive manner.

Collator Class

The Collator class performs locale-sensitive string comparison. Programmers use this class to build searching and alphabetical sorting routines for natural language text. Collator is an abstract base class. Its subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is currently provided with the JDK and is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs.
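A sorting routine built on Collator might look like this sketch. It contrasts with plain String comparison, which orders by Unicode value and therefore places all uppercase ASCII letters before lowercase ones. The class name and the sorted() helper are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSketch {
    // Returns a copy of the words sorted by the rules of the given locale.
    static String[] sorted(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "cherry", "Banana", "apple" };
        System.out.println(Arrays.toString(sorted(words, Locale.US)));

        String[] byCodePoint = words.clone();
        Arrays.sort(byCodePoint); // binary order: "Banana" sorts first
        System.out.println(Arrays.toString(byCodePoint));
    }
}
```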

RuleBasedCollator Class

The RuleBasedCollator class, which is a concrete subclass of the Collator class, provides a simple, data-driven, table collator. Using RuleBasedCollator, a programmer can create a customized table-based collator. For example, a programmer can build a collator that will ignore (or notice) uppercase letters, accents, and Unicode combining characters.
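The two ideas above can be sketched as follows: a collator built from a custom rule string, and a standard collator whose strength is lowered so that case and accent differences are ignored. The rule string "< c < b < a" is a toy example that reverses the order of three letters; the class and method names are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

class RuleCollatorSketch {
    // A table-based collator in which c sorts before b, and b before a.
    static RuleBasedCollator reversedAbc() {
        try {
            return new RuleBasedCollator("< c < b < a");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // At PRIMARY strength only base letters matter; case and accents are ignored.
    static boolean sameBaseLetters(String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        System.out.println(reversedAbc().compare("a", "b") > 0);
        System.out.println(sameBaseLetters("resume", "R\u00c9SUM\u00c9"));
    }
}
```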

CollationElementIterator Class

The CollationElementIterator class is used as an iterator to walk through each character of an international string. Programmers use the iterator to return the ordering priority of the positioned character. The ordering priority of a character, or key, defines how a character is collated in the given Collator object. The CollationElementIterator class is used by the compare() method of the RuleBasedCollator class.
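As a rough sketch, the iterator can be used to list the primary ordering priority of each character in a string. The toy rule string "< a < b < c" and the names below are assumptions for illustration; the numeric values returned are implementation details, and only their relative order is meaningful.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

class CollationElementsSketch {
    // Returns the primary order of each collation element in the text.
    static int[] primaryOrders(String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator("< a < b < c");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        CollationElementIterator elements = collator.getCollationElementIterator(text);
        List<Integer> orders = new ArrayList<Integer>();
        for (int element = elements.next();
                element != CollationElementIterator.NULLORDER;
                element = elements.next()) {
            orders.add(Integer.valueOf(CollationElementIterator.primaryOrder(element)));
        }
        int[] result = new int[orders.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = orders.get(i).intValue();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] orders = primaryOrders("cab");
        for (int i = 0; i < orders.length; i++) {
            System.out.println(orders[i]);
        }
    }
}
```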

CollationKey Class

A CollationKey object represents a string under the rules of a specific Collator object. Comparing two CollationKey objects returns the relative order of the strings they represent. Using CollationKey objects to compare strings is generally faster than using the Collator.compare() method. Thus, when the strings must be compared multiple times, for example when sorting a list of strings, it is more efficient to use CollationKey objects.
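Sorting with keys might be sketched as follows: each string is converted to a key once, the keys are sorted, and the original strings are recovered afterwards. The class name and the sortWithKeys() helper are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationKeySketch {
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]); // computed once per string
        }
        Arrays.sort(keys); // CollationKey is Comparable; comparisons are bitwise
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "cherry", "Banana", "apple" };
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
    }
}
```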

BreakIterator Class

The BreakIterator class indirectly implements methods for finding the position of the following types of boundaries in a string of text: character, word, sentence, and potential line-break boundaries.

The conventions on where to break lines, sentences, words, and characters vary from one language to another. Since the BreakIterator class is locale-sensitive, it can be used by programs that perform text operations. For example, consider a word processing program that can highlight a character, cut a word, move the cursor to the next sentence, or word-wrap at a line ending. This word processing program would use break iterators to determine the logical boundaries in text, enabling it to perform text operations in a locale-sensitive manner.
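A minimal sketch of word-boundary detection is shown below. The iterator reports every boundary, including those around spaces and punctuation, so the hypothetical words() helper keeps only the pieces that start with a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorSketch {
    static List<String> words(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<String> words = new ArrayList<String>();
        int start = boundary.first();
        for (int end = boundary.next();
                end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            String piece = text.substring(start, end);
            // Skip the pieces that are only whitespace or punctuation.
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US));
    }
}
```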

StringCharacterIterator Class

The StringCharacterIterator class provides the ability to iterate over a string of Unicode characters in a bidirectional manner. This class uses a cursor to move within a range of text, and can return individual characters or their index values. The StringCharacterIterator class implements the character iterator functionality of the CharacterIterator interface.
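Iteration in both directions can be sketched as follows. The helper names are hypothetical. CharacterIterator.DONE marks the end of the range in either direction.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class CharIteratorSketch {
    // Walks backward from the last character to build the reversed string.
    static String reversed(String text) {
        StringBuilder out = new StringBuilder();
        CharacterIterator it = new StringCharacterIterator(text);
        for (char c = it.last(); c != CharacterIterator.DONE; c = it.previous()) {
            out.append(c);
        }
        return out.toString();
    }

    // Walks forward and returns the index of the first match, or -1.
    static int indexOf(String text, char target) {
        CharacterIterator it = new StringCharacterIterator(text);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            if (c == target) {
                return it.getIndex();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(reversed("abc"));
        System.out.println(indexOf("abc", 'c'));
    }
}
```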

CharacterIterator Interface

The CharacterIterator interface defines a protocol for bidirectional iteration over Unicode characters. Classes should implement this interface if they want to move about within a range of text and return individual Unicode characters or their index values. CharacterIterator is useful when performing character searches.

Character Set Conversion

The Java platform uses Unicode as its native character encoding; however, many Java programs still need to handle text data in other encodings. Java therefore provides a set of classes that convert many standard character encodings to and from Unicode. Java programs that need to deal with non-Unicode text data will typically convert that data into Unicode, process the data as Unicode, then convert the result back to the external character encoding. The InputStreamReader and OutputStreamWriter classes provide methods that can convert between other character encodings and Unicode.
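The convert, process, and convert-back cycle might be sketched as below, using in-memory streams and the encoding name ISO8859_1. The class EncodingSketch and its encode() and decode() helpers are illustrative names, not JDK APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class EncodingSketch {
    // Unicode text to bytes in the named external encoding.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(bytes, encoding);
            writer.write(text);
            writer.close(); // flushes the converted bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes in the named external encoding back to Unicode text.
    static String decode(byte[] data, String encoding) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuilder text = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                text.append((char) c);
            }
            reader.close();
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = encode("caf\u00e9", "ISO8859_1");
        System.out.println(latin1.length);
        System.out.println(decode(latin1, "ISO8859_1").equals("caf\u00e9"));
    }
}
```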

Supported Encodings

The InputStreamReader and OutputStreamWriter classes can convert between Unicode and the following set of character encodings:

Table 2 JDK 1.1 Character Encodings

Encoding Description


ISO8859_1 ISO 8859-1
ISO8859_2 ISO 8859-2
ISO8859_3 ISO 8859-3
ISO8859_4 ISO 8859-4
ISO8859_5 ISO 8859-5
ISO8859_6 ISO 8859-6
ISO8859_7 ISO 8859-7
ISO8859_8 ISO 8859-8
ISO8859_9 ISO 8859-9
Big5 Big5, Traditional Chinese
Cp037 USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
Cp1006 IBM AIX Pakistan (Urdu)
Cp1025 IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
Cp1026 IBM Latin-5, Turkey
Cp1046 IBM Open Edition US EBCDIC
Cp1097 IBM Iran(Farsi)/Persian
Cp1098 IBM Iran(Farsi)/Persian (PC)
Cp1112 IBM Latvia, Lithuania
Cp1122 IBM Estonia
Cp1123 IBM Ukraine
Cp1124 IBM AIX Ukraine
Cp1250 Windows Eastern European
Cp1251 Windows Cyrillic
Cp1252 Windows Latin-1
Cp1253 Windows Greek
Cp1254 Windows Turkish
Cp1255 Windows Hebrew
Cp1256 Windows Arabic
Cp1257 Windows Baltic
Cp1258 Windows Vietnamese
Cp1381 IBM OS/2, DOS People's Republic of China (PRC)
Cp1383 IBM AIX People's Republic of China (PRC)
Cp273 IBM Austria, Germany
Cp277 IBM Denmark, Norway
Cp278 IBM Finland, Sweden
Cp280 IBM Italy
Cp284 IBM Catalan/Spain, Spanish Latin America
Cp285 IBM United Kingdom, Ireland
Cp297 IBM France
Cp33722 IBM-eucJP - Japanese (superset of 5050)
Cp420 IBM Arabic
Cp424 IBM Hebrew
Cp437 MS-DOS United States, Australia, New Zealand, South Africa
Cp500 EBCDIC 500V1
Cp737 PC Greek
Cp775 PC Baltic
Cp838 IBM Thailand extended SBCS
Cp850 MS-DOS Latin-1
Cp852 MS-DOS Latin-2
Cp855 IBM Cyrillic
Cp857 IBM Turkish
Cp860 MS-DOS Portuguese
Cp861 MS-DOS Icelandic
Cp862 PC Hebrew
Cp863 MS-DOS Canadian French
Cp864 PC Arabic
Cp865 MS-DOS Nordic
Cp866 MS-DOS Russian
Cp868 MS-DOS Pakistan
Cp869 IBM Modern Greek
Cp870 IBM Multilingual Latin-2
Cp871 IBM Iceland
Cp874 IBM Thai
Cp875 IBM Greek
Cp918 IBM Pakistan (Urdu)
Cp921 IBM Latvia, Lithuania (AIX, DOS)
Cp922 IBM Estonia (AIX, DOS)
Cp930 Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
Cp933 Korean Mixed with 1880 UDC, superset of 5029
Cp935 Simplified Chinese Host mixed with 1880 UDC, superset of 5031
Cp937 Traditional Chinese Host mixed with 6204 UDC, superset of 5033
Cp939 Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
Cp942 Japanese (OS/2) superset of 932
Cp948 OS/2 Chinese (Taiwan) superset of 938
Cp949 PC Korean
Cp950 PC Chinese (Hong Kong, Taiwan)
Cp964 AIX Chinese (Taiwan)
Cp970 AIX Korean
EUC_CN GB2312, EUC encoding, Simplified Chinese
EUC_JP JIS0201, 0208, 0212, EUC Encoding, Japanese
EUC_KR KS C 5601, EUC Encoding, Korean
EUC_TW CNS11643 (Plane 1-3), T. Chinese, EUC encoding
GBK GBK, Simplified Chinese
ISO2022CN ISO 2022 CN, Chinese
ISO2022CN_CNS CNS 11643 in ISO-2022-CN form, T. Chinese
ISO2022CN_GB GB 2312 in ISO-2022-CN form, S. Chinese
ISO2022JP JIS0201, 0208, 0212, ISO2022 Encoding, Japanese
ISO2022KR ISO 2022 KR, Korean
JIS0201 JIS 0201, Japanese
JIS0208 JIS 0208, Japanese
JIS0212 JIS 0212, Japanese
KOI8_R KOI8-R, Russian
MS874 Windows Thai
MacArabic Macintosh Arabic
MacCentralEurope Macintosh Latin-2
MacCroatian Macintosh Croatian
MacCyrillic Macintosh Cyrillic
MacDingbat Macintosh Dingbat
MacGreek Macintosh Greek
MacHebrew Macintosh Hebrew
MacIceland Macintosh Iceland
MacRoman Macintosh Roman
MacRomania Macintosh Romania
MacSymbol Macintosh Symbol
MacThai Macintosh Thai
MacTurkish Macintosh Turkish
MacUkraine Macintosh Ukraine
SJIS Shift-JIS, Japanese

AWT Attributes

To aid in the internationalization of a program's GUI, JDK 1.1 provides two additional attributes for the Component class: Name and Locale.

Name Attribute

The Name attribute is a String object which serves as a non-localizable identifier for a Component object. New constructors for the Component class and its subclasses allow the Name attribute to be set. If these constructors are not used, Component objects are assigned a default Name. The method Component.getName() can be used to examine a Component object's Name attribute.

The Name attribute is particularly useful in writing Action handling routines in which a reference to the target is not known ahead of time. Such Action handlers are often generated by GUI builders. Previously, these routines tried to identify the target Component by looking at its label string. This approach fails when the label strings are localized. As of JDK 1.1, programmers should use the Component.getName() method instead.

Locale Attribute

The Component class now contains a Locale attribute. This attribute is accessed by the getLocale() and setLocale() methods. If a Component object's Locale is not explicitly set, its value defaults to the Locale of the Component object's parent. If no Component in the hierarchy has an explicit Locale, the default is the value of Locale.getDefault().

The Locale attribute of Component allows the GUI (or portions of the GUI) to maintain its own default locale. This would be useful, for example, in an applet that used the Japanese locale even when the rest of the browser was using the USA locale.
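The Name and Locale attributes can be sketched together as follows. The example assumes the setName() and setLocale() methods of Component and uses Panel and Canvas because they can be created without a display; the component name "statusArea" and the helper methods are hypothetical.

```java
import java.awt.Canvas;
import java.awt.Component;
import java.awt.Panel;
import java.util.Locale;

class ComponentAttributesSketch {
    // Builds a small hierarchy: the panel has an explicit locale, the child does not.
    static Panel buildPanel() {
        Panel panel = new Panel();
        panel.setLocale(Locale.JAPAN);
        Canvas status = new Canvas();
        status.setName("statusArea"); // stable identifier, never translated
        panel.add(status);
        return panel;
    }

    // Finds a child by its Name attribute rather than by a localized label.
    static Component findByName(Panel panel, String name) {
        Component[] children = panel.getComponents();
        for (int i = 0; i < children.length; i++) {
            if (name.equals(children[i].getName())) {
                return children[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Component status = findByName(buildPanel(), "statusArea");
        // The child has no locale of its own, so it reports its parent's.
        System.out.println(status.getLocale());
    }
}
```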

Stream I/O

JDK 1.1 provides two major enhancements to the java.io package to improve the handling of character data: the new Reader and Writer classes, and an enhancement to the PrintStream class.

Reader and Writer Classes

The Reader and Writer class hierarchies provide the ability to perform I/O operations on character streams. These hierarchies parallel the InputStream and OutputStream class hierarchies, but operate on streams of characters rather than streams of bytes. Character streams make it easy to write programs that are not dependent upon a specific character encoding, and are therefore easier to internationalize. The Reader and Writer classes also have the ability to convert between Unicode and other character encodings. Please refer to the Character Streams document for more information about the Reader and Writer class hierarchies.
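Code written against Reader and Writer deals only in characters, as in this sketch of a copy routine that works the same way whatever encoding lies behind the streams. The class name and both helpers are illustrative; StringReader and StringWriter stand in for real sources and sinks.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class CharacterStreamSketch {
    // Copies characters (not bytes) and returns how many were copied.
    static int copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[1024];
        int total = 0;
        for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Convenience wrapper for in-memory text; returns the number of chars copied.
    static int copiedLength(String text) {
        try {
            return copy(new StringReader(text), new StringWriter());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters count as three chars, whatever their byte length.
        System.out.println(copiedLength("\u65e5\u672c\u8a9e"));
    }
}
```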

PrintStream Class

The PrintStream class has been enhanced to produce output using the system's default character encoding and line terminator. This change allows methods such as System.out.println() to act more reasonably with non-ASCII data.
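The line terminator behavior can be observed with the small sketch below, which prints into a byte buffer and checks that the output ends with the platform's line.separator property. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class PrintStreamSketch {
    // Captures what println() writes for the given line.
    static String printed(String line) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        out.println(line);
        out.flush();
        return bytes.toString(); // decoded with the platform default encoding
    }

    public static void main(String[] args) {
        String separator = System.getProperty("line.separator");
        System.out.println(printed("hi").endsWith(separator));
    }
}
```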

Character Classification

The Java platform stores character data in Unicode, an international character set standard. The Unicode Standard uses a 16-bit encoding to support all of the major scripts of the world, as well as common technical symbols. Most Java code is written in ASCII, a 7-bit standard, or ISO-Latin-1, an 8-bit standard, but is translated into Unicode before processing. Therefore, the Java character set is always represented in Unicode.

JDK 1.0 introduced the Character class as an object wrapper to the char primitive type. The Character class also contained some static methods such as isLowerCase() and isDigit() for determining the properties of a character. This set of methods has been extended in JDK 1.1 to allow access to all the Unicode 2.0 defined properties for a character.
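The classification methods work on any Unicode character, not only ASCII, as this sketch shows. The classify() helper and its category labels are made up for illustration; the characters are written as Unicode escapes so the source file itself stays ASCII.

```java
class CharacterSketch {
    static String classify(char c) {
        if (Character.isDigit(c)) {
            return "digit " + Character.digit(c, 10);
        }
        if (Character.isUpperCase(c)) {
            return "uppercase letter";
        }
        if (Character.isLowerCase(c)) {
            return "lowercase letter";
        }
        if (Character.isLetter(c)) {
            return "other letter"; // letters in scripts without case
        }
        if (Character.isWhitespace(c)) {
            return "whitespace";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify('7'));
        System.out.println(classify('\u0663')); // ARABIC-INDIC DIGIT THREE
        System.out.println(classify('\u00c9')); // LATIN CAPITAL LETTER E WITH ACUTE
        System.out.println(classify('\u65e5')); // a CJK ideograph
    }
}
```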

Copyright © 1996, 1997, 1998 Sun Microsystems, Inc. All rights reserved.