From: Richard Frith-Macdonald
Subject: Re: Incompatible compiler option fexec-charset
Date: Wed, 7 Dec 2011 13:19:32 +0000

On 7 Dec 2011, at 12:35, David Chisnall wrote:
> On 20 Nov 2011, at 19:45, Richard Frith-Macdonald wrote:
>
>>
>> On 20 Nov 2011, at 11:38, David Chisnall wrote:
>>
>>> This flag also isn't recognised by clang. What does GCC 4.x need it for?
>>
>> The -fexec-charset=UTF-8 tells the compiler to encode string literals as
>> UTF-8 in the binary. This allows developers to put any character they like
>> in a string literal and have GNUstep get things right at runtime because
>> base knows the compiler will have encoded all literals as UTF-8
>>
>> It's not actually clear what the compiler did prior to that option being
>> introduced ... from what I've read it seems likely that it simply used
>> whatever string encoding was set in the locale that was in use at the time
>> when the code was compiled, with no mechanism to know what that encoding was
>> at the point when the executable would run.
>>
>> So the only drawback to removing the option for older compilers is that
>> non-ascii string literals would malfunction (but such literals have simply
>> been illegal up to now anyway) ... so it would be reasonable to have an
>> autoconf check to see if the option works, and disable it and print a
>> warning. I hate writing autoconf stuff though, so I'd rather someone who's
>> interested in supporting old compilers did it.
>
>
> I misunderstood why we were using this option. I was under the impression
> that it was related to the encoding of NSConstantString objects, which should
> be UTF-8 by default.
That's right.
> The check in the configure script (which breaks the build with clang now -
> apparently it was not tested before being committed)
The script provides instructions on how to ignore the check for compilers which
don't support the 'standard' gcc behaviors. That worked on my system when I
tested it. I put that option in for old versions of gcc, and because the latest
info I managed to find for clang was that it didn't support character set
specifier flags and didn't check what character set it was using for
string literals.
> is testing for something very different - it is checking whether we can put a
> latin1 character in a source file and have the compiler magically know that
> the source is latin1 and translate it to UTF-8.
It's testing to see if we need to use the command line options (to force the
use of UTF-8) or not ... by seeing if the compiler stores the string correctly
(i.e. as UTF-8) in the executable without them.
> This is amazingly fragile, because it requires that the compiler guess that
> the source file is latin1.
> If you want UTF-8 characters in C string literals then you should save the
> file in UTF-8 format or (better) you should use the correct escape sequences.
Great idea ... but not what the gcc documentation says ... how would we enforce
it on our users?
The gcc documentation says the source character set is (by default) whatever the
current locale says it is (or UTF-8 if the compiler can't determine it from the
locale) ... unless overridden by the -finput-charset= command line option.
The check tests whether the compiler follows those rules (in which case no
command line options are needed), or whether the compiler supports the options
to specify the character sets (in which case we use those options). If you
don't want the check (either you don't have any non-ASCII literals, or you
are sure your compiler will be generating UTF-8 output) you can disable it.
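
To make the "stored as UTF-8" part concrete, here is a minimal sketch in C of
the kind of byte comparison such a check comes down to. It is not the code
from the configure script; the helper name stored_as_utf8_e_acute is made up
for illustration. A real probe would pass it a literal written with the raw
accented character in the source file, so that the result depends on how the
compiler converted it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from the GNUstep configure script):
 * returns 1 if the bytes of `literal` are exactly the UTF-8 encoding of
 * U+00E9 (e with acute accent), i.e. 0xC3 0xA9, and 0 otherwise. */
static int stored_as_utf8_e_acute(const char *literal)
{
    static const unsigned char expected[] = { 0xC3, 0xA9 };

    assert(literal != NULL);
    return strlen(literal) == sizeof expected
        && memcmp(literal, expected, sizeof expected) == 0;
}
```

A literal that the compiler left as a single latin1 byte (0xE9) has length 1
and so fails the comparison, which is the case where the command line options
would be needed.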
> If we are depending on the compiler doing this translation anywhere in
> GNUstep then we should fix that. Are we?
We shouldn't be using non-ASCII string literals anywhere in our source.
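
For code that does need such characters, the escape sequence approach
suggested in the quoted text keeps the source file pure ASCII. The following
is only a sketch with made-up example strings (cafe, nee), relying on my
understanding that gcc does not convert hexadecimal escapes to the execution
character set, so the bytes written are the bytes stored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example strings, not taken from the GNUstep sources. */

/* "cafe" with an acute accent on the e: \xc3\xa9 spells out the two
 * UTF-8 bytes of U+00E9, so the source file itself stays pure ASCII. */
static const char cafe[] = "caf\xc3\xa9";

/* "nee" with an acute accent on the first e: a hex escape consumes every
 * hex digit that follows it, so the literal is split in two to keep the
 * trailing 'e' out of the escape. */
static const char nee[] = "n\xc3\xa9" "e";

/* Byte lengths, not character counts: the accented letter takes 2 bytes. */
static void check_lengths(void)
{
    assert(strlen(cafe) == 5);
    assert(strlen(nee) == 4);
}
```

The split literal is the detail most easily got wrong: written as one piece,
the final 'e' would be read as part of the hex escape.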