Keeping the state
The introduction of this chapter noted that certain character sets use a stateful encoding; that is, the encoded values depend in some way on the previous bytes in the text. Since the conversion functions allow converting a text in more than one step, we must have a way to pass this information from one call of the functions to another.
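As a rough illustration of converting a text in more than one step, the sketch below counts the characters in text that arrives in separate chunks. The helper name count_chars_in_chunk is hypothetical and not part of any library; it relies on mbrtowc and on the mbstate_t type described in this section, and it assumes the caller has cleared the state object before the first chunk.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.  Count the complete
   characters in one chunk of multibyte text.  *STATE carries the
   shift state and any partial character over from the previous
   chunk.  Returns (size_t) -1 on an invalid byte sequence.  */
size_t
count_chars_in_chunk (const char *chunk, size_t len, mbstate_t *state)
{
  assert (chunk != NULL && state != NULL);
  size_t count = 0;

  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, chunk, len, state);

      if (n == (size_t) -1)
        return (size_t) -1;   /* Invalid byte sequence.  */
      if (n == (size_t) -2)
        break;                /* Chunk ends inside a character; the
                                 partial input is kept in *STATE.  */
      if (n == 0)
        break;                /* Null character: treat as end of text.  */

      chunk += n;
      len -= n;
      ++count;
    }
  return count;
}
```

Calling the helper once per chunk with the same state object lets a character that is split across two chunks be counted exactly once, in the call that supplies its final byte.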
mbstate_t | Data type |
A variable of type mbstate_t can contain all the information
about the shift state needed from one call to a conversion
function to another.
This type is defined in wchar.h. It was introduced in
Amendment 1 to ISO C90.
|
Before its first use the object must represent the initial state, which is achieved by clearing the whole variable with code such as this:

     {
       mbstate_t state;
       memset (&state, '\0', sizeof (state));
       /* From now on state can be used.  */
       ...
     }

When using the conversion functions to generate output, it is often necessary to test whether the current state corresponds to the initial state. This is necessary, for example, to decide whether or not to emit escape sequences to set the state to the initial state at certain sequence points. Communication protocols often require this.
int mbsinit (const mbstate_t *ps) | Function |
This function determines whether the state object pointed to by ps
is in the initial state or not. If ps is a null pointer or the
object is in the initial state the return value is nonzero. Otherwise
it is zero.
This function was introduced in Amendment 1 to ISO C90 and
is declared in wchar.h.
|
Code using mbsinit often looks similar to this:

     {
       mbstate_t state;
       memset (&state, '\0', sizeof (state));
       /* Use state.  */
       ...
       if (! mbsinit (&state))
         {
           /* Emit code to return to initial state.  */
           const wchar_t empty[] = L"";
           const wchar_t *srcp = empty;
           wcsrtombs (outbuf, &srcp, outbuflen, &state);
         }
       ...
     }

The code to emit the escape sequence to get back to the initial state is interesting. The wcsrtombs function can be used to determine the necessary output code (see Converting Strings). Note that on GNU systems this extra action is not necessary for the conversion from multibyte text to wide character text, since the wide character encoding is not stateful. But nothing in any standard prohibits wchar_t from using a stateful encoding.
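
The fragment above can be wrapped into a small self-contained helper, sketched here under some assumptions. The name reset_shift_state and its return convention are hypothetical and chosen only for illustration; the calls to mbsinit and wcsrtombs are the standard ones declared in wchar.h.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.  If *STATE is not the
   initial state, store in OUTBUF the bytes needed to return to it,
   followed by a terminating null byte.  Returns the number of bytes
   stored, not counting the null byte, or (size_t) -1 on error.  */
size_t
reset_shift_state (char *outbuf, size_t outbuflen, mbstate_t *state)
{
  assert (outbuf != NULL && outbuflen > 0 && state != NULL);

  if (mbsinit (state))
    return 0;   /* Already in the initial state; nothing to emit.  */

  /* Converting an empty wide character string produces only the
     sequence that leads back to the initial state.  */
  const wchar_t empty[] = L"";
  const wchar_t *srcp = empty;
  return wcsrtombs (outbuf, &srcp, outbuflen, state);
}
```

With an encoding that is not stateful, the state object never leaves the initial state, so the helper returns 0 without touching the buffer.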