Generic Charset Conversion

The conversion functions mentioned so far in this chapter all have in
common that they operate on character sets which are not directly
specified by the functions. The multibyte encoding used is specified by
the currently selected locale for the LC_CTYPE category. The wide
character set is fixed by the implementation (in the case of the GNU C
library it is always UCS-4 encoded ISO 10646).

This of course causes several problems when it comes to general
character conversion:

* For every conversion where neither the source nor the destination
  character set is the character set of the locale for the LC_CTYPE
  category, one has to change the LC_CTYPE locale using setlocale.

  This introduces major problems for the rest of the program since
  several more functions (e.g., the character classification functions,
  see Classification of Characters) use the LC_CTYPE category.

* Parallel conversions to and from different character sets are not
  possible since the LC_CTYPE selection is global and shared by all
  threads.

* If neither the source nor the destination character set is the
  character set used for the wchar_t representation, at least a two-step
  process is necessary to convert a text using the functions above. One
  would have to select the source character set as the multibyte
  encoding, convert the text into a wchar_t text, select the destination
  character set as the multibyte encoding, and convert the wide
  character text to the multibyte (= destination) character set.

  Even if this is possible (which is not guaranteed) it is very tedious
  work. It also suffers even more from the two problems raised above
  because the locale has to be changed repeatedly.
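
The two-step process can be sketched as follows. This is only an
illustration of the problem, not a recommended technique: the function
name convert_via_wchar and its parameters are hypothetical, and the
sketch assumes that mbstowcs and wcstombs accept a null destination
pointer to compute the required length (as POSIX and the GNU C library
allow).

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string IN from the
   encoding of SRC_LOCALE to the encoding of DST_LOCALE by way of
   wchar_t.  Returns 0 on success and -1 on failure.  */
int
convert_via_wchar (const char *src_locale, const char *dst_locale,
                   const char *in, char *out, size_t outsize)
{
  int result = -1;
  wchar_t *wide = NULL;
  char *saved;
  const char *current;
  size_t n, m;

  assert (in != NULL && out != NULL);

  /* Remember the global LC_CTYPE setting so it can be restored.  */
  current = setlocale (LC_CTYPE, NULL);
  if (current == NULL)
    return -1;
  saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  /* Step 1: select the source character set and convert to wchar_t.
     This changes the locale for every thread in the process.  */
  if (setlocale (LC_CTYPE, src_locale) == NULL)
    goto done;
  n = mbstowcs (NULL, in, 0);
  if (n == (size_t) -1)
    goto done;
  wide = malloc ((n + 1) * sizeof (wchar_t));
  if (wide == NULL)
    goto done;
  mbstowcs (wide, in, n + 1);

  /* Step 2: select the destination character set and convert back.  */
  if (setlocale (LC_CTYPE, dst_locale) == NULL)
    goto done;
  m = wcstombs (NULL, wide, 0);
  if (m == (size_t) -1 || m >= outsize)
    goto done;
  wcstombs (out, wide, m + 1);
  result = 0;

done:
  /* Restore the original locale, whatever happened above.  */
  setlocale (LC_CTYPE, saved);
  free (saved);
  free (wide);
  return result;
}
```

Between the two setlocale calls every other thread, and every
classification function called in the meantime, sees the temporary
LC_CTYPE setting. The sketch also fails whenever no installed locale
uses the desired source or destination character set.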

The following text first describes the interface to iconv, the
conversion function, which is not coupled to the selected locale.
Comparisons with other implementations then show what pitfalls lie in
the way of portable applications. Finally, the implementation is
described as far as it is of interest to the advanced user who wants to
extend the conversion capabilities.
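
For orientation, a minimal sketch of a conversion with iconv_open,
iconv, and iconv_close might look like this. The helper name
convert_with_iconv and its parameters are hypothetical; the sketch
assumes a byte-oriented destination encoding (so that a single null
byte terminates the result) and an output buffer large enough for the
whole text.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the null-terminated string IN from the
   character set FROM to the character set TO.  Returns 0 on success
   and -1 on failure.  No locale is consulted or changed.  */
int
convert_with_iconv (const char *from, const char *to,
                    const char *in, char *out, size_t outsize)
{
  iconv_t cd;
  char *inptr = (char *) in;   /* iconv does not modify the input.  */
  size_t inleft;
  char *outptr = out;
  size_t outleft;
  int result = 0;

  assert (in != NULL && out != NULL);
  if (outsize == 0)
    return -1;
  inleft = strlen (in);
  outleft = outsize - 1;       /* Reserve room for the terminator.  */

  /* Note the argument order: destination first, then source.  */
  cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
    result = -1;
  /* For stateful encodings, emit the sequence that returns the output
     to the initial shift state.  */
  else if (iconv (cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
    result = -1;

  *outptr = '\0';
  if (iconv_close (cd) != 0)
    result = -1;
  return result;
}
```

A call such as convert_with_iconv ("ISO-8859-1", "UTF-8", text, buf,
sizeof buf) selects both character sets by name. Which names are
accepted depends on the implementation, so a failing iconv_open has to
be expected in portable programs.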

* A complete iconv example.
* Differences to other iconv implementations.
* The iconv implementation in the GNU C library.