Re: indent mangles UTF-8
From: Petr Pisar
Subject: Re: indent mangles UTF-8
Date: Mon, 27 Mar 2023 10:53:23 +0200
On Fri, Mar 24, 2023 at 11:01:04AM -0700, Adam Wozniak wrote:
> Using "indent" on a C file with structure members with UTF-8 names (as
> allowed under C99 and later).
>
> indent completely mangles these member names, inserting spaces between UTF8
> bytes.
>
> -double ə14(double GST, struct φλ φλ) {
C99 leaves Unicode characters in identifiers as an implementation-defined
option:
An implementation may allow multibyte characters that are not part of the
basic source character set to appear in identifiers; which characters and
their correspondence to universal character names is
implementation-defined.
You have probably confused Unicode characters with universal character names
(sequences like \uNNNN and \UNNNNNNNN):
Universal character names may be used in identifiers, character constants,
and string literals to designate characters that are not in the basic
character set.
Hence a C99-conforming compiler must support:
double \u025914(double GST, struct \u03c6\u03bb \u03c6\u03bb);
but may support:
double ə14(double GST, struct φλ φλ);
while the interoperability of the latter (e.g. linking two compilation units
together) is completely unspecified.
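As a small illustration of the portable spelling, here is a sketch that uses
only universal character names for the non-ASCII identifiers. It assumes
a compiler that accepts them in identifiers (recent GCC and Clang releases are
expected to in their C99 and later modes). The member names and the function
phi_lambda_sum are hypothetical and do not come from the original report.

```c
/* Tag spelled with universal character names:
   \u03c6 = GREEK SMALL LETTER PHI, \u03bb = GREEK SMALL LETTER LAMDA. */
struct \u03c6\u03bb {
    double phi;     /* hypothetical member */
    double lambda;  /* hypothetical member */
};

/* Hypothetical function. The parameter reuses the tag's spelling, which is
   legal because tags and ordinary identifiers live in separate name spaces. */
double phi_lambda_sum(struct \u03c6\u03bb \u03c6\u03bb)
{
    return \u03c6\u03bb.phi + \u03c6\u03bb.lambda;
}
```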
I am not saying that indent could not support Unicode characters (probably
depending on the locale, because indent needs to understand them to align
columns properly), only that your claim about UTF-8 support in C99 is
misleading.
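To make the column-alignment point concrete: a byte-oriented tool sees four
bytes in φλ, while a reader sees two characters. Below is a rough sketch,
assuming valid UTF-8 input and one display column per code point (which
ignores wide and combining characters). utf8_codepoints is a hypothetical
helper, not something taken from indent's source.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated string assumed to be valid UTF-8.
   Continuation bytes have the bit pattern 10xxxxxx and are skipped, so only
   the first byte of each encoded character is counted. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For φλ, encoded as the bytes CF 86 CE BB, strlen reports 4 while this helper
reports 2. A real implementation would more likely go through the locale's
multibyte conversion functions so that it is not tied to UTF-8.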
-- Petr