bug#25987: 25.2; support gcc fixit notes

From: Eli Zaretskii
Subject: bug#25987: 25.2; support gcc fixit notes
Date: Thu, 12 Nov 2020 15:54:31 +0200

> From: David Malcolm <dmalcolm@redhat.com>
> Cc: 25987@debbugs.gnu.org
> Date: Wed, 11 Nov 2020 14:36:49 -0500
> On Tue, 2020-10-20 at 18:54 +0300, Eli Zaretskii wrote:
> > > From: David Malcolm <dmalcolm@redhat.com>
> > > Cc: 25987@debbugs.gnu.org
> > > Date: Tue, 20 Oct 2020 10:52:05 -0400
> > > 
> > > One possible issue: in the final diagnostic, there's a fix-it hint
> > > with non-ASCII replacement text, replacing "two_pi" with "two_π"
> > > (where the final char in the latter is GREEK SMALL LETTER PI,
> > > U+03C0).
> > > 
> > > This replacement is currently expressed as encoded bytes, i.e.:
> > > 
> > > fix-it:"demo.c":{51:10-51:16}:"two_\317\200"
> > > 
> > > where \317\200 is the octal-escaped representation of the two bytes
> > > of the UTF-8 encoding of the character.
> > > 
> > > Is this going to work for Emacs?
> > 
> > You mean, GCC doesn't actually emit the UTF-8 encoding of π, it emits
> > its ASCII-fied representation?  We'd need to decode that, but is that
> > really justified?  Why not emit UTF-8?
> I have an implementation that simply emits UTF-8 in quotes, escaping
> backslash, tab, newline, and doublequotes as before.  (we have to
> escape at least newline, given that fix-it hint replacement text can
> contain them, and we're using newline to terminate the parseable hint).
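
To make the two quoting schemes concrete, here is a minimal Python
sketch.  It is not GCC's or Emacs's code.  The function names are
hypothetical, and the escape spellings \\, \t, \n and \" are an
assumption about what "as before" means.  The octal decoder also
ignores the case of an escaped backslash directly followed by three
octal digits.

```python
import re

# Matches a backslash followed by exactly three octal digits, e.g. \317.
_OCTAL = re.compile(rb"\\([0-7]{3})")


def decode_octal_escaped(raw: bytes) -> str:
    """Decode text whose non-ASCII bytes were emitted as octal escapes."""
    # Turn each \ooo escape back into the single byte it stands for,
    # then decode the resulting byte string as UTF-8.
    unescaped = _OCTAL.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode("utf-8")


def escape_utf8_hint(text: str) -> str:
    """Escape only backslash, tab, newline and doublequote; keep UTF-8."""
    # Backslash must be handled first so later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace('"', '\\"'))


print(decode_octal_escaped(rb"two_\317\200"))
print(escape_utf8_hint('two_π\n'))
```

With the second scheme the consumer needs no octal decoding step: it
only has to undo the four escapes and decode the whole line once.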

Sorry, I've lost the context: where did those non-ASCII names come
from?  Are they names of variables in the user's program?  If so, in
what encoding does GCC quote portions of the source code in its
warning/error messages?  Does it use the exact byte stream it found in
the source, or does it perform any conversions of the encoding?

> However, the filename also needs to be escaped.  Currently I'm applying
> the same escaping rules to both filename and replacement text.
> What is the encoding of the filename?  What if the bytes in a filename
> aren't UTF-8 encoded?  How does emacs handle this case?

Emacs has a separate variable for the encoding of file names, which
gets set from the locale settings.  But this is not necessarily
relevant to the issue at hand, because we are talking about processing
output from a sub-process (GCC) which includes both file names and
other stuff, such as fragments of the source code.  When Emacs
processes sub-process output, it generally assumes all of it is
encoded in the same encoding.  So if, for example, you encode
non-ASCII variables in UTF-8 while the file names are emitted in some
other encoding (perhaps because the locale's codeset is not UTF-8),
then there will be complications: we will have to read the output from
GCC in its raw form, and then decode "by hand" (in Lisp) each part of
it as appropriate (which means we will need to be able to identify
each such part).
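
To illustrate what decoding "by hand" would involve, here is a rough
Python sketch (the real thing would be in Lisp).  It assumes the
fix-it line has the shape shown earlier in this thread, that the file
name is in a locale encoding passed in by the caller, and that the
replacement text is UTF-8.  The regexp, the function name and the
returned dictionary are all hypothetical, and backslash escapes are
left in place to keep the sketch short.

```python
import re

# One parseable fix-it line, matched on raw bytes so that no decoding
# happens before the individual parts have been identified.
FIXIT = re.compile(
    rb'^fix-it:"(?P<file>(?:[^"\\]|\\.)*)":'
    rb'\{(?P<l1>\d+):(?P<c1>\d+)-(?P<l2>\d+):(?P<c2>\d+)\}:'
    rb'"(?P<text>(?:[^"\\]|\\.)*)"$'
)


def parse_fixit(raw: bytes, filename_encoding: str,
                text_encoding: str = "utf-8"):
    """Split a raw fix-it line and decode each part separately."""
    m = FIXIT.match(raw)
    if m is None:
        return None
    return {
        # The file name uses the locale's encoding (an assumption).
        "file": m.group("file").decode(filename_encoding),
        "range": tuple(int(m.group(g)) for g in ("l1", "c1", "l2", "c2")),
        # The replacement text is assumed to be UTF-8.
        "text": m.group("text").decode(text_encoding),
    }


line = b'fix-it:"caf\xe9.c":{51:10-51:16}:"two_\xcf\x80"'
print(parse_fixit(line, filename_encoding="latin-1"))
```

If everything in the output used a single encoding, the whole line
could be decoded in one step and none of this splitting would be
needed.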

So it's important to understand the situation and its limitations
before proposing the best solution.

> I tried creating a file with the name "byte 0xff" .txt, and files with
> valid UTF-8 non-ASCII names; emacs reported them as \377.txt and with
> the UTF-8 names respectively, so perhaps I should simply emit the
> bytes and pretend they are UTF-8?

What do you mean by "pretend" in this context?
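
For reference, the behavior described in the quoted text (valid UTF-8
shown as characters, a stray 0xff byte shown as \377) can be mimicked
with the Python sketch below.  This is only an illustration of that
display convention, not of how Emacs implements it; the error handler
name "octalreplace" and the function display_name are hypothetical.

```python
import codecs


def _octal_replace(err):
    """Render each undecodable byte as a backslash plus three octal digits."""
    bad = err.object[err.start:err.end]
    return "".join(f"\\{b:03o}" for b in bad), err.end


# Register the handler under a made-up name for use with bytes.decode().
codecs.register_error("octalreplace", _octal_replace)


def display_name(raw: bytes) -> str:
    """Decode as UTF-8 where possible, octal-escape whatever is invalid."""
    return raw.decode("utf-8", errors="octalreplace")


print(display_name(b"\xff.txt"))
print(display_name("π.txt".encode("utf-8")))
```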
