| CONVCS(2) | System Calls Manual | CONVCS(2) |
Convcs, Btos, Stob - character set conversion suite
include "convcs.m";
convcs := load Convcs Convcs->PATH;
Btos: module {
	init: fn(arg: string): string;
	btos: fn(s: Convcs->State, b: array of byte, nchars: int)
		: (Convcs->State, string, int);
};
Stob: module {
	init: fn(arg: string): string;
	stob: fn(s: Convcs->State, str: string)
		: (Convcs->State, array of byte);
};
Convcs: module {
	State: type string;
	Startstate: con "";
	init: fn(csfile: string): string;
	getbtos: fn(cs: string): (Btos, string);
	getstob: fn(cs: string): (Stob, string);
	enumcs: fn(): list of (string, string, int);
	aliases: fn(cs: string): (string, list of string);
};
The Convcs suite is a collection of modules for converting various standard coded character sets and character encoding schemes to and from Limbo strings.
The Convcs module provides an entry point to the suite, mapping character set names and aliases to their associated converter implementation.
The Btos module returned by getbtos() is already initialised and is ready to start the conversion. Conversions can be made on an individual basis, or in a `streamed' mode.
converter->btos(s, b, nchars)
The argument s is the converter state returned by the previous call to btos on the same input stream. The first call to btos on a particular input stream should pass Convcs->Startstate (or nil) as the value for s. The argument b holds the bytes to be converted. The argument nchars is the maximum length of the string to be returned. If nchars is -1 then as much of b will be consumed as possible. A value of 0 tells the converter that there is no more data and that any pending state should be flushed.
The return value of btos is the tuple (state, str, nbytes) where state is the new state of the converter, str is the converted string, and nbytes is the number of bytes from b consumed by the conversion.
The same converter module can be used for multiple conversion streams by maintaining a separate state variable for each stream.
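The streamed use of btos described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the program name Btoscat and the helper fatal are hypothetical, the character set name ibm866 is taken from the charsets example in this page, and the sketch assumes that passing nil to Convcs->init selects the default charsets file and that bytes left unconsumed at end of input may be handed to the final flushing call.

```limbo
implement Btoscat;

include "sys.m";
	sys: Sys;
include "draw.m";
include "convcs.m";
	convcs: Convcs;

Btoscat: module
{
	init: fn(nil: ref Draw->Context, nil: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	convcs = load Convcs Convcs->PATH;
	if(convcs == nil)
		fatal(sys->sprint("cannot load %s: %r", Convcs->PATH));
	# Assumption: a nil csfile selects the default charsets file.
	err := convcs->init(nil);
	if(err != nil)
		fatal("convcs init: " + err);
	(conv, e) := convcs->getbtos("ibm866");
	if(conv == nil)
		fatal("getbtos: " + e);

	state := Convcs->Startstate;	# first call on this stream
	buf := array[Sys->ATOMICIO] of byte;
	nb := 0;	# bytes held in buf and not yet consumed
	s: string;
	used: int;
	stdin := sys->fildes(0);
	while((n := sys->read(stdin, buf[nb:], len buf - nb)) > 0){
		nb += n;
		# -1: consume as much of the buffer as possible
		(state, s, used) = conv->btos(state, buf[0:nb], -1);
		sys->print("%s", s);
		# keep any unconsumed tail for the next call
		if(used < nb)
			buf[0:] = buf[used:nb];
		nb -= used;
	}
	# 0: no more data, flush any pending converter state
	(state, s, used) = conv->btos(state, buf[0:nb], 0);
	sys->print("%s", s);
}

# hypothetical helper: report an error and terminate
fatal(msg: string)
{
	sys->fprint(sys->fildes(2), "btoscat: %s\n", msg);
	raise "fail:error";
}
```

To convert a second stream with the same converter module, keep a second state variable that also starts at Convcs->Startstate and thread it through that stream's calls only.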
The Stob module returned by getstob() is already initialised and is ready to start the conversion.
converter->stob(s, str)
The return value of stob is the tuple (state, bytes) where state is the new state of the converter and bytes is the result of the conversion.
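A single stob conversion might look like the sketch below. The program name Stobdemo is hypothetical, ibm866 is borrowed from the charsets example in this page, and the sketch assumes that passing nil to Convcs->init selects the default charsets file. It makes one call only; a stateful encoding may need a further terminating call, which is not shown here.

```limbo
implement Stobdemo;

include "sys.m";
	sys: Sys;
include "draw.m";
include "convcs.m";
	convcs: Convcs;

Stobdemo: module
{
	init: fn(nil: ref Draw->Context, nil: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	stderr := sys->fildes(2);
	convcs = load Convcs Convcs->PATH;
	if(convcs == nil){
		sys->fprint(stderr, "cannot load %s: %r\n", Convcs->PATH);
		raise "fail:load";
	}
	# Assumption: a nil csfile selects the default charsets file.
	if((err := convcs->init(nil)) != nil){
		sys->fprint(stderr, "convcs init: %s\n", err);
		raise "fail:init";
	}
	(conv, e) := convcs->getstob("ibm866");
	if(conv == nil){
		sys->fprint(stderr, "getstob: %s\n", e);
		raise "fail:getstob";
	}

	state := Convcs->Startstate;	# first call on this stream
	bytes: array of byte;
	# stob returns the new state and the encoded bytes
	(state, bytes) = conv->stob(state, "привет\n");
	sys->write(sys->fildes(1), bytes, len bytes);
}
```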
When using converter->btos() to convert data to Limbo strings, any byte sequences that are not valid for the specific character encoding scheme will be converted to the Unicode error character 16rFFFD.
When using converter->stob() to convert Limbo strings, any Unicode characters that cannot be mapped into the character set will normally be substituted by the US-ASCII code for `?'. This may be inappropriate for certain conversions; such converters will use a suitable error character for their particular character set and encoding scheme.
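Because btos reports invalid input only through the error character, a caller that wants to detect bad data has to scan the result. The following sketch shows one way to do so; the names Badchars, Runeerror and badchars are hypothetical, and the test string merely stands in for a string returned by btos. Input that legitimately encodes 16rFFFD cannot be distinguished from an error by this method.

```limbo
implement Badchars;

include "sys.m";
	sys: Sys;
include "draw.m";

Badchars: module
{
	init: fn(nil: ref Draw->Context, nil: list of string);
};

Runeerror: con 16rFFFD;	# Unicode error character produced by btos

# Count the error characters in a string returned by btos.
badchars(s: string): int
{
	n := 0;
	for(i := 0; i < len s; i++)
		if(s[i] == Runeerror)
			n++;
	return n;
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	s := "ab";
	s[len s] = Runeerror;	# stand-in for a string returned by btos
	sys->print("%d error character(s)\n", badchars(s));
}
```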
The file /lib/convcs/charsets provides the mapping between character set names and their implementation modules. The file format conforms to that supported by cfg(2). The following description relies on terms defined in the cfg(2) manual page.
Each record name defines a character set name. If the primary value of the record is non-empty then the name is an alias, the value being the real name. An alias record must point to an actual converter record, not to another alias, as Convcs follows only one level of aliasing.
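Each converter record consists of a set of tuples with the following primary attributes: desc (a descriptive name for the character set), stob (the path of the Stob module implementation) and btos (the path of the Btos module implementation).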
Both the btos and stob tuples can have an optional arg attribute which is passed to the init() function of the converter when initialised by Convcs. If a converter record has neither an stob nor a btos tuple, then it is ignored.
The following example is an extract from the standard Inferno charsets file:
cp866=ibm866
866=ibm866
ibm866=
	desc='Russian MS-DOS CP 866'
	stob=/dis/lib/convcs/cp_stob.dis arg=/lib/convcs/ibm866.cp
	btos=/dis/lib/convcs/cp_btos.dis arg=/lib/convcs/ibm866.cp
This entry defines Stob and Btos converters for the character set called ibm866. The converters are actually the generic codepage converters cp_stob and cp_btos parameterized with a codepage file. The entry also defines the aliases cp866 and 866 for the name ibm866.