1026 Part VI ✦ National Language Support
✦ Support for date and time formats according to ISO standards worldwide.
✦ Support for Julian, Gregorian, Japanese Imperial, and Thai Buddha calendars so that local requirements can be satisfied.
The first step in supporting the languages in which database data will be stored is to pick a character set and a National Language Support (NLS) character set for your database. This is not optional. As you saw in Chapter 4, the CREATE DATABASE command requires a value for both the CHARACTER SET and NATIONAL CHARACTER SET clauses. You were also advised at that time to keep the character set and NLS character set fairly similar so as not to introduce inefficiencies into your database when processing data in NCHAR, NVARCHAR2, and NCLOB columns, which use the NLS character set. Of course, the $64,000 question is which character set and NLS character set you should pick.
When choosing a character set, you need to be aware of what a character set and an NLS character set actually mean for the type of character data that can be stored and where it can be stored. To understand this, you need to consider the data types supported by Oracle columns and the encoding schemes that computers and operating systems use to organize character data.
A character set encoding scheme determines how character data is physically stored in the database and how many bits and bytes are needed to store each character. Essentially, the encoding scheme maps a numeric value (expressed in binary, decimal, or hexadecimal) to a specific letter or character in the character set. A single encoding scheme can support many character sets, but a character set uses only one of the available encoding schemes. The encoding scheme also affects the number of characters that the character set can support, so choosing one appropriate to the languages that will be stored in the database is important.
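As a rough illustration of this mapping, the following Python sketch uses Python's built-in codecs rather than Oracle itself. The codec names (ascii, latin-1, utf-8) are Python's names, not Oracle character set names, and the helper function stored_bytes is hypothetical, introduced only for this example.

```python
def stored_bytes(text, codec):
    """Return the hex value of each byte the codec uses to store text."""
    return [format(b, "02X") for b in text.encode(codec)]

# The letter A maps to the same single byte in both schemes.
print(stored_bytes("A", "ascii"))         # ['41']
print(stored_bytes("A", "latin-1"))       # ['41']

# "\u00e9" is e with an acute accent. It needs one byte in an
# 8-bit scheme but two bytes in UTF-8.
print(stored_bytes("\u00e9", "latin-1"))  # ['E9']
print(stored_bytes("\u00e9", "utf-8"))    # ['C3', 'A9']

# A scheme with too few values cannot represent the character at all.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError:
    print("not representable in ascii")
```

The same character can therefore occupy a different number of bytes, or be impossible to store, depending on the encoding scheme behind the chosen character set.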
When using a single-byte encoding scheme, each character in the character set takes up one byte of data. Oracle supports two types of single-byte encoding schemes: 7-bit (which supports 2^7 = 128 characters, from decimal 0 to decimal 127) and 8-bit (which supports 2^8 = 256 characters, from decimal 0 to decimal 255). The character sets that correspond to the 7-bit and 8-bit single-byte encoding schemes include US7ASCII (a 7-bit character set that is the default character set for Oracle when creating a database), ISO 8859-1 Western European (WE8ISO8859P1, the recommended character set for databases whose data will contain character data in any of the Western European languages such as English, German, French, and so on), EBCDIC Code Page 500 8-bit West European (WE8EBCDIC500, used on EBCDIC-based platforms such as AS/400 or IBM mainframes), DEC 8-bit West European (WE8DEC, used on OpenVMS and other Compaq/Digital platforms), and others. The character set in use determines which character appears on the screen when a specific numeric value (binary, decimal, or hex) is stored in the database. For example, with the WE8ISO8859P1 character set, the stored hex values correspond to the characters displayed in Figure 20-1. A different character set might assign a different character to hex value D4, but all ASCII-based character sets assign the same characters to hex values less than or equal to 7F.
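The arithmetic and the hex D4 example can be checked with a small Python sketch. It relies on Python's own codecs as stand-ins, on the assumption that iso8859_1 corresponds to WE8ISO8859P1 and cp500 to EBCDIC Code Page 500; iso8859_7 (Greek) is simply a convenient second 8-bit set, and decode_byte is a hypothetical helper. Note that the agreement on values up to 7F holds for ASCII-based sets; an EBCDIC code page lays out even the low values differently.

```python
# Number of distinct values available to 7-bit and 8-bit schemes.
SEVEN_BIT_SIZE = 2 ** 7   # 128 values, decimal 0 through 127
EIGHT_BIT_SIZE = 2 ** 8   # 256 values, decimal 0 through 255

def decode_byte(value, codec):
    """Return the character that a codec assigns to a single byte value."""
    return bytes([value]).decode(codec)

# Hex D4 depends on the character set in use.
western = decode_byte(0xD4, "iso8859_1")   # Western European
greek = decode_byte(0xD4, "iso8859_7")     # Greek
print(f"U+{ord(western):04X}", f"U+{ord(greek):04X}")

# Hex 00 through 7F agree between these two ASCII-based sets.
low_range_matches = all(
    decode_byte(v, "iso8859_1") == decode_byte(v, "iso8859_7")
    for v in range(SEVEN_BIT_SIZE)
)
print(low_range_matches)

# An EBCDIC code page assigns hex 41 to something other than the letter A.
ebcdic_matches = decode_byte(0x41, "iso8859_1") == decode_byte(0x41, "cp500")
print(ebcdic_matches)
```

Printing code points instead of the characters themselves keeps the output independent of the terminal's own character set, which is the same display issue the paragraph describes.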