emacs-devel



From: Eli Zaretskii
Subject: Re: Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
Date: Thu, 08 Oct 2020 10:38:13 +0300

> Date: Thu, 8 Oct 2020 14:09:53 +0800 (CST)
> From: "Zhu Zihao" <all_but_last@163.com>
> 
>    To support this multitude of characters and scripts, Emacs closely
> follows the “Unicode Standard”.  The Unicode Standard assigns a unique
> number, called a “codepoint”, to each and every character.  The range of
> codepoints defined by Unicode, or the Unicode “codespace”, is
> ‘0..#x10FFFF’ (in hexadecimal notation), inclusive.  Emacs extends this
> range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses
> for representing characters that are not unified with Unicode and “raw
> 8-bit bytes” that cannot be interpreted as characters.  Thus, a
> character codepoint in Emacs is a 22-bit integer.
> 
> Will "copy_string_contents" always give us a proper UTF-8 string, or will
> it give us a mix of bytevector and UTF-8?

If the original string includes raw bytes, copy_string_contents will
signal an error.
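
To make the ranges in the quoted manual text concrete, here is a minimal
sketch in C that sorts an integer into the Unicode codespace, the
Emacs-only extension range, or neither.  The names classify_codepoint,
cp_kind, UNICODE_MAX and EMACS_CHAR_MAX are hypothetical and chosen for
illustration only; they are not part of emacs-module.h, and this is not
how Emacs itself performs the check.

```c
/* Hypothetical helper, for illustration only: not part of emacs-module.h. */

/* Upper bounds taken from the quoted manual text. */
#define UNICODE_MAX    0x10FFFFL  /* last codepoint of the Unicode codespace */
#define EMACS_CHAR_MAX 0x3FFFFFL  /* last Emacs codepoint (22-bit integer)   */

enum cp_kind {
    CP_UNICODE,          /* 0..#x10FFFF: inside the Unicode codespace         */
    CP_EMACS_EXTENSION,  /* #x110000..#x3FFFFF: Emacs-only, incl. raw bytes   */
    CP_INVALID           /* negative or above #x3FFFFF: not an Emacs char     */
};

/* Classify CP according to the ranges described in the manual. */
static enum cp_kind classify_codepoint(long cp)
{
    if (cp < 0 || cp > EMACS_CHAR_MAX)
        return CP_INVALID;
    if (cp <= UNICODE_MAX)
        return CP_UNICODE;
    return CP_EMACS_EXTENSION;
}
```

Only codepoints in the first category belong to the Unicode codespace.
Raw 8-bit bytes live in the extension range, which is why a string
containing them cannot be handed over as proper UTF-8.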


