Re: indent mangles UTF-8

bug-indent


From:	Petr Pisar
Subject:	Re: indent mangles UTF-8
Date:	Mon, 27 Mar 2023 10:53:23 +0200

On Fri, Mar 24, 2023 at 11:01:04AM -0700, Adam Wozniak wrote:
> Using "indent" on a C file with structure members with UTF-8 names (as
> allowed under C99 and later).
> 
> indent completely mangles these member names, inserting spaces between UTF8
> bytes.
> 
> -double ə14(double GST, struct φλ φλ) {

C99 leaves Unicode characters in identifiers as an implementation-defined
option:

    An implementation may allow multibyte characters that are not part of the
    basic source character set to appear in identifiers; which characters and
    their correspondence to universal character names is
    implementation-defined.

You have probably confused Unicode characters with universal character names
(sequences like \uNNNN and \UNNNNNNNN):

    Universal character names may be used in identifiers, character constants,
    and string literals to designate characters that are not in the basic
    character set.

Hence a C99-conforming compiler must support:

    double \u025914(double GST, struct \u03c6\u03bb \u03c6\u03bb);

but may support:

    double ə14(double GST, struct φλ φλ);

while the interoperability of the latter (e.g. linking two compilation units
together) is completely unspecified.
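
To make the relation between the two spellings concrete, here is a minimal
sketch that rewrites raw UTF-8 characters as universal character names. The
function names utf8_to_ucn and decode_utf8 are hypothetical, not part of
indent or of any library. The sketch assumes UTF-8 input, converts every
non-ASCII character wherever it occurs (identifiers, strings and comments
alike), and does not check whether the resulting name is one that C99 permits
in an identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s and store the code point in *cp.
   Returns the number of bytes consumed, or 0 on malformed input.
   Overlong forms and surrogates are not rejected in this sketch. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* also catches a premature NUL */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Copy in to out, replacing each non-ASCII character with \uNNNN or
   \UNNNNNNNN. Returns 0 on success, -1 if the input is malformed or
   out is too small. */
int utf8_to_ucn(const char *in, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);

    /* Worst case: a 2-byte sequence becomes 6 characters (3x growth). */
    if (outsz < 3 * strlen(in) + 1)
        return -1;

    while (*s != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        int w;

        if (n == 0)
            return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsz - used, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            w = snprintf(out + used, outsz - used, "\\u%04lx", cp);
        else
            w = snprintf(out + used, outsz - used, "\\U%08lx", cp);
        used += (size_t)w;
        s += n;
    }
    out[used] = '\0';
    return 0;
}
```

Given the UTF-8 encoding of "struct φλ φλ" as input, utf8_to_ucn() writes
"struct \u03c6\u03bb \u03c6\u03bb", which is the spelling every C99 compiler
has to accept.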

I am not saying that indent could not support Unicode characters (probably
depending on the locale, because indent needs to understand them to align
columns properly), only that your claim about UTF-8 support in C99 is
misleading.
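
As a rough illustration of the alignment problem, the sketch below compares
the byte length of a string with its number of code points. The function
names utf8_codepoints and column_drift are hypothetical. It assumes
well-formed UTF-8, and it treats one code point as one column, which is only
an approximation: wide and combining characters would need something like
POSIX wcwidth() in a suitable locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
   (those of the form 10xxxxxx). Assumes well-formed input. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* How many columns a tool overestimates if it counts bytes instead of
   characters when measuring the width of s. */
size_t column_drift(const char *s)
{
    return strlen(s) - utf8_codepoints(s);
}
```

For "struct φλ" the byte length is 11 but the code point count is 9, so a
byte-oriented tool would place anything aligned after it two columns too far
to the right.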

-- Petr
