module Utf8_char: sig .. end
Defines an abstraction and some utilities for UTF-8 encoded characters.
type t
The type of UTF-8 encoded characters.
exception Bad_encoding
Raised if an invalid UTF-8 encoding is encountered.
val of_char : char -> t
Creates a UTF-8 character from an ASCII character.
Raises Bad_encoding if the character code is outside the range 0-127.
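Because every ASCII character (code 0-127) is encoded in UTF-8 as the same single byte, of_char only needs a range check. The sketch below assumes a hypothetical representation `type t = string` holding the raw encoded bytes; the real `Utf8_char.t` is abstract and may be stored differently.

```ocaml
(* Hypothetical representation: the encoded bytes of one character.
   The actual abstract type may differ; this is only an assumption. *)
type t = string

exception Bad_encoding

(* Sketch of of_char: an OCaml char is a byte (0-255), but only codes
   0-127 are ASCII, and those encode to one identical byte in UTF-8. *)
let of_char (c : char) : t =
  if Char.code c > 127 then raise Bad_encoding
  else String.make 1 c
```

With this sketch, `of_char 'A'` yields the one-byte string `"A"`, while `of_char '\200'` raises `Bad_encoding`.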
val of_bytes : string -> t
Creates a UTF-8 character from a string of bytes.
Warning: the bytes are not checked for validity. Use to_U_char to validate them.
val to_bytes : t -> string
Returns the byte string that encodes the UTF-8 character.
val size : t -> int
Determines the size, in bytes, of the UTF-8 character.
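The three functions of_bytes, to_bytes and size can be sketched together under the same hypothetical assumption that `t` is just the string of encoded bytes. This is an illustration of their documented behaviour, not the library's actual implementation.

```ocaml
(* Hypothetical representation: the raw encoded bytes (an assumption). *)
type t = string

(* of_bytes wraps the bytes without any validation, as documented *)
let of_bytes (s : string) : t = s

(* to_bytes hands the encoded bytes back unchanged *)
let to_bytes (c : t) : string = c

(* size is the byte length of the encoding: 1 to 4 for valid UTF-8 *)
let size (c : t) : int = String.length c
```

For example, `size (of_bytes "\xc3\xa9")` is 2 (the encoding of U+00E9). Because of_bytes performs no check, `of_bytes "abc"` is also accepted here even though it holds three characters, which is why the documentation points to to_U_char for validation.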
val to_U_char : t -> U_char.t
Decodes the character to a Unicode character.
Raises Bad_encoding if decoding fails.
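Decoding reads the sequence length and payload bits from the first byte, then appends 6 bits from each continuation byte (`10xxxxxx`). The sketch below uses a hypothetical `to_code_point` returning a plain `int`, since the representation of `U_char.t` is not documented here; it also assumes `t` is the raw byte string and that a length mismatch counts as a decoding error. It rejects overlong forms, surrogates (U+D800 to U+DFFF) and values above U+10FFFF, as RFC 3629 requires.

```ocaml
(* Hypothetical representation of a UTF-8 character: its raw bytes. *)
type t = string

exception Bad_encoding

(* Hypothetical stand-in for to_U_char: returns the code point as an int. *)
let to_code_point (c : t) : int =
  let n = String.length c in
  if n = 0 then raise Bad_encoding;
  let b0 = Char.code c.[0] in
  (* Expected length and payload bits taken from the lead byte *)
  let len, init =
    if b0 < 0x80 then (1, b0)
    else if b0 land 0xE0 = 0xC0 then (2, b0 land 0x1F)
    else if b0 land 0xF0 = 0xE0 then (3, b0 land 0x0F)
    else if b0 land 0xF8 = 0xF0 then (4, b0 land 0x07)
    else raise Bad_encoding (* stray continuation byte or invalid lead *)
  in
  if n <> len then raise Bad_encoding;
  let cp = ref init in
  for i = 1 to len - 1 do
    let b = Char.code c.[i] in
    (* Continuation bytes must have the form 10xxxxxx *)
    if b land 0xC0 <> 0x80 then raise Bad_encoding;
    cp := (!cp lsl 6) lor (b land 0x3F)
  done;
  (* Smallest code point that legitimately needs [len] bytes *)
  let min_cp = [| 0; 0; 0x80; 0x800; 0x10000 |].(len) in
  if !cp < min_cp || !cp > 0x10FFFF || (!cp >= 0xD800 && !cp <= 0xDFFF)
  then raise Bad_encoding;
  !cp
```

For instance, `to_code_point "\xe2\x82\xac"` gives `0x20AC` (the euro sign), while the overlong `"\xc0\x80"` raises `Bad_encoding`.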
val utf8_character_size : int -> int
Decoding utility: given just the first byte of a UTF-8 character (as an integer), determines how many bytes the whole character occupies. (In UTF-8, this length is encoded in the high bits of the first byte.)
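The length can be read off the high bits of the lead byte: `0xxxxxxx` is 1 byte, `110xxxxx` is 2, `1110xxxx` is 3 and `11110xxx` is 4. In the sketch below, raising `Bad_encoding` for a continuation byte or other invalid lead byte is an assumption, since the documented behaviour for such input is not stated. The helper `split_chars` is hypothetical and shows how the function can drive a scan over a byte string.

```ocaml
exception Bad_encoding

(* Sketch: sequence length from the lead byte's high-bit pattern.
   Raising on invalid lead bytes is an assumption, not documented. *)
let utf8_character_size (b : int) : int =
  if b land 0x80 = 0x00 then 1        (* 0xxxxxxx: ASCII *)
  else if b land 0xE0 = 0xC0 then 2   (* 110xxxxx *)
  else if b land 0xF0 = 0xE0 then 3   (* 1110xxxx *)
  else if b land 0xF8 = 0xF0 then 4   (* 11110xxx *)
  else raise Bad_encoding             (* 10xxxxxx or 11111xxx *)

(* Hypothetical helper: split a string into per-character byte strings *)
let split_chars (s : string) : string list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let n = utf8_character_size (Char.code s.[i]) in
      (* A truncated final character is treated as an encoding error *)
      if i + n > String.length s then raise Bad_encoding;
      go (i + n) (String.sub s i n :: acc)
  in
  go 0 []
```

Here `split_chars "a\xc3\xa9\xe2\x82\xac"` returns `["a"; "\xc3\xa9"; "\xe2\x82\xac"]`. Each chunk could then be passed to of_bytes.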