Selecting the Conversion
As mentioned above, the locale currently selected for the LC_CTYPE
category determines the conversion performed by the functions described
below. Each locale uses its own character set (given as an argument to
localedef), and this character set is assumed to be the external
multibyte encoding. The wide character set is always UCS-4, at least on
GNU systems.
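As a rough illustration of how the LC_CTYPE selection fixes the external character set, the following sketch selects a locale and asks which codeset it uses. The helper name ctype_codeset is hypothetical; setlocale is ISO C and nl_langinfo with CODESET is POSIX, so this assumes a POSIX system.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select NAME for the LC_CTYPE category and return
   the name of the character set that locale uses as its multibyte
   encoding, or NULL if the locale cannot be selected.  The returned
   string may be overwritten by later setlocale or nl_langinfo calls.  */
const char *
ctype_codeset (const char *name)
{
  assert (name != NULL);

  /* Only LC_CTYPE matters for the multibyte conversion functions.  */
  if (setlocale (LC_CTYPE, name) == NULL)
    return NULL;

  return nl_langinfo (CODESET);
}
```

Calling ctype_codeset ("") selects the locale named by the environment, which is the usual choice at program startup. The exact string returned is system dependent.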
A characteristic of each multibyte character set is the maximum number
of bytes that can be needed to represent one character. This
information is important when writing code that uses the conversion
functions, as the examples below show.
The ISO C standard defines two macros that provide this information.
Macro: int MB_LEN_MAX
This macro specifies the maximum number of bytes in the multibyte
sequence for a single character in any of the supported locales. It is
a compile-time constant and is defined in limits.h.
Macro: int MB_CUR_MAX
MB_CUR_MAX expands into a positive integer expression that is the
maximum number of bytes in a multibyte character in the current locale.
The value is never greater than MB_LEN_MAX. Unlike MB_LEN_MAX, this
macro need not be a compile-time constant, and in the GNU C library it
is not.
MB_CUR_MAX is defined in stdlib.h.
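The difference between the two macros can be sketched as follows: MB_LEN_MAX sizes a static array at compile time, while MB_CUR_MAX is read at run time after a locale has been selected. The names scratch and current_max_bytes are hypothetical and serve only as illustration.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>

/* MB_LEN_MAX is a compile-time constant, so it can size a static array.  */
static char scratch[MB_LEN_MAX];

/* Hypothetical helper: select NAME for LC_CTYPE and return the value of
   MB_CUR_MAX in that locale, or 0 if the locale cannot be selected.  */
size_t
current_max_bytes (const char *name)
{
  if (setlocale (LC_CTYPE, name) == NULL)
    return 0;

  /* MB_CUR_MAX is evaluated here, at run time, for the locale just
     selected.  It never exceeds MB_LEN_MAX.  */
  size_t cur = MB_CUR_MAX;
  assert (cur <= sizeof scratch);
  return cur;
}
```

In the "C" locale the result is 1. In a UTF-8 locale it is larger, but the exact value depends on the C library.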
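The following incomplete fragment shows how the two macros work together:

    {
      char buf[MB_LEN_MAX];
      ssize_t len = 0;   /* bytes in buf not yet processed */

      while (! feof (fp))
        {
          /* Top up buf so that up to MB_CUR_MAX bytes are available.  */
          len += fread (&buf[len], 1, MB_CUR_MAX - len, fp);
          /* ... process buf; used is the number of bytes consumed */
          memmove (buf, &buf[used], len - used);
          len -= used;
        }
    }

The code in the loop expects buf to always hold enough bytes to convert
one multibyte character. The array buf has to be sized statically, using
MB_LEN_MAX, since many compilers do not allow arrays of variable size.
The fread call tops buf up to MB_CUR_MAX bytes, as long as the input
still has that many bytes left. Note that it is not a problem that
MB_CUR_MAX is not a compile-time constant.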
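The fragment above leaves the processing step open. One possible way to fill it in, given here only as a minimal sketch, is to call mbrtowc on the buffer and count the characters converted. The function name count_mb_chars is hypothetical, and the handling of the null character assumes an encoding in which it is a single zero byte.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the multibyte characters read from FP,
   using the current LC_CTYPE locale.  Returns -1 if the input contains
   an invalid sequence or ends in the middle of a character.  */
long
count_mb_chars (FILE *fp)
{
  char buf[MB_LEN_MAX];     /* sized with the compile-time bound */
  size_t len = 0;           /* bytes in buf not yet converted */
  long count = 0;
  mbstate_t state;

  memset (&state, 0, sizeof state);   /* initial conversion state */

  for (;;)
    {
      /* Top up buf to at most MB_CUR_MAX bytes (a run-time value).  */
      len += fread (&buf[len], 1, MB_CUR_MAX - len, fp);
      if (len == 0)
        break;                        /* no more input */

      wchar_t wc;
      size_t used = mbrtowc (&wc, buf, len, &state);

      if (used == (size_t) -1)
        return -1;                    /* invalid byte sequence */
      if (used == (size_t) -2)
        {
          /* Incomplete character: all LEN bytes were absorbed into
             STATE, so the buffer is empty again.  */
          len = 0;
          continue;
        }
      if (used == 0)
        used = 1;   /* null character; assumed to be one zero byte */

      assert (used <= len);
      memmove (buf, &buf[used], len - used);   /* keep the leftover bytes */
      len -= used;
      ++count;
    }

  /* A non-initial state here means the input stopped mid-character.  */
  return mbsinit (&state) ? count : -1;
}
```

The caller is expected to have selected a locale with setlocale beforehand. Without that, the program runs in the "C" locale, where every byte converts as one character.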