Re: Is copy_string_contents in emacs-module.h give us a proper UTF-8 str
From: Eli Zaretskii
Subject: Re: Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
Date: Thu, 08 Oct 2020 10:38:13 +0300

> Date: Thu, 8 Oct 2020 14:09:53 +0800 (CST)
> From: "Zhu Zihao" <all_but_last@163.com>
>
> To support this multitude of characters and scripts, Emacs closely
> follows the “Unicode Standard”. The Unicode Standard assigns a unique
> number, called a “codepoint”, to each and every character. The range of
> codepoints defined by Unicode, or the Unicode “codespace”, is
> ‘0..#x10FFFF’ (in hexadecimal notation), inclusive. Emacs extends this
> range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses
> for representing characters that are not unified with Unicode and “raw
> 8-bit bytes” that cannot be interpreted as characters. Thus, a
> character codepoint in Emacs is a 22-bit integer.
>
> Will "copy_string_contents" always give us a proper UTF-8 string? Or will it
> give us a mix of bytevector and UTF8?

If the original string includes raw bytes, copy_string_contents will
signal an error.
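
For illustration only, here is a minimal C sketch of the codepoint ranges from the manual text quoted above. It classifies a codepoint as Unicode, Emacs extension, or raw byte, and checks the 22-bit claim. The names (classify_codepoint, fits_in_22_bits, the CP_* constants) are hypothetical and not part of emacs-module.h. The raw-byte sub-range #x3FFF80..#x3FFFFF is an assumption taken from the Emacs Lisp manual, not from the text quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bounds taken from the quoted manual text. */
#define UNICODE_MAX    0x10FFFFu  /* end of the Unicode codespace */
#define EMACS_CHAR_MAX 0x3FFFFFu  /* end of the Emacs extension range */

/* Assumption (not stated in this thread): raw 8-bit bytes 0x80..0xFF
   are represented by the top 128 codepoints of the Emacs range. */
#define RAW_BYTE_MIN   0x3FFF80u

typedef enum {
  CP_UNICODE,          /* 0..#x10FFFF */
  CP_EMACS_EXTENSION,  /* #x110000..#x3FFF7F, not unified with Unicode */
  CP_RAW_BYTE,         /* #x3FFF80..#x3FFFFF, raw 8-bit byte */
  CP_INVALID           /* above #x3FFFFF, not an Emacs character */
} cp_class;

/* Classify a codepoint according to the ranges above. */
static cp_class classify_codepoint(uint32_t c)
{
  if (c <= UNICODE_MAX)
    return CP_UNICODE;
  if (c > EMACS_CHAR_MAX)
    return CP_INVALID;
  if (c >= RAW_BYTE_MIN)
    return CP_RAW_BYTE;
  return CP_EMACS_EXTENSION;
}

/* True if the value needs no more than 22 bits, i.e. c < 2^22. */
static bool fits_in_22_bits(uint32_t c)
{
  return (c >> 22) == 0;
}
```

Since #x3FFFFF is 2^22 - 1, every valid Emacs character passes fits_in_22_bits. Under the assumption above, a string is affected by the error only if it holds at least one character for which classify_codepoint returns CP_RAW_BYTE.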