Re: [PATCH] Use UTF-8 active code page for Windows host.

From: Eli Zaretskii
Subject: Re: [PATCH] Use UTF-8 active code page for Windows host.
Date: Mon, 20 Mar 2023 13:54:31 +0200

> From: Costas Argyris <costas.argyris@gmail.com>
> Date: Sun, 19 Mar 2023 21:25:30 +0000
> Cc: bug-make@gnu.org, Paul Smith <psmith@gnu.org>
> I create a file src.β first:
>
> touch src.β
>
> and then run the following UTF-8 encoded Makefile:
>
> hello :
> 	@gcc ©\src.c -o ©\src.exe
> ifneq ("$(wildcard src.β)","")
> 	@echo src.β exists
> else
> 	@echo src.β does NOT exist
> endif
> ifneq ("$(wildcard src.Β)","")
> 	@echo src.Β exists
> else
> 	@echo src.Β does NOT exist
> endif
> ifneq ("$(wildcard src.βΒ)","")
> 	@echo src.βΒ exists
> else
> 	@echo src.βΒ does NOT exist
> endif
>
> and the output of Make is:
>
> C:\Users\cargyris\temp>make -f utf8.mk
> src.β exists
> src.Β exists
> src.βΒ does NOT exist
>
> which shows that it finds the one with the upper case extension as well,
> despite the fact that it exists in the file system as a lower case extension.

That's most probably because $(wildcard) calls a Win32 API that is
case-insensitive.  So the jury is still out on this matter, and I
still believe that the below is true:

> > My guess would be that only characters within the locale, defined by
> > the ANSI codepage, are supported by locale-aware functions in the C
> > runtime.  That's because this is what happens even if you use "wide"
> > Unicode APIs and/or functions like _wcsicmp that accept wchar_t
> > characters: they all support only the characters of the current locale
> > set by 'setlocale'.  I don't expect that to change just because UTF-8
> > is used on the outside: internally, everything is converted to UTF-16,
> > i.e. to the Windows flavor of wchar_t.
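
To make the distinction concrete: in UTF-8, β (U+03B2) is the byte pair CE B2 and Β (U+0392) is CE 92, so a comparison that folds only ASCII letters cannot consider src.β and src.Β equal.  The sketch below is illustrative only; ascii_casecmp and ascii_lower are hypothetical helpers, not functions from Make or from any C runtime.  It suggests that a match on src.Β has to come from something that knows about Greek case pairs, such as the file system lookup, rather than from a byte-level comparison.

```c
#include <stddef.h>

/* Hypothetical helper: fold only the ASCII letters A-Z to lower case;
   every other byte, including UTF-8 lead and trail bytes, is unchanged. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: compare two NUL-terminated byte strings,
   ignoring case for ASCII letters only.  Returns 0 when they match. */
int ascii_casecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p != 0 && ascii_lower(*p) == ascii_lower(*q)) {
        p++;
        q++;
    }
    return (int)ascii_lower(*p) - (int)ascii_lower(*q);
}

/* The two file names from the test, spelled out as UTF-8 bytes. */
static const char name_lower_beta[] = "src.\xCE\xB2";  /* src.β */
static const char name_upper_beta[] = "src.\xCE\x92";  /* src.Β */
```

With these definitions, ascii_casecmp("SRC.C", "src.c") returns 0, while ascii_casecmp(name_lower_beta, name_upper_beta) returns a nonzero value, because the second bytes B2 and 92 are never folded.
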
> But this one looks most relevant to your point:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support
> "Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C
> Runtime supports using a UTF-8 code page. The change means that char
> strings passed to C runtime functions can expect strings in the UTF-8
> encoding. To enable UTF-8 mode, use ".UTF8" as the code page when
> using setlocale. For example, setlocale(LC_ALL, ".UTF8") will use the
> current default Windows ANSI code page (ACP) for the locale and UTF-8
> for the code page."

This is about UCRT specifically, so I wonder whether MSVCRT will
behave the same.
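
One way to find out is to ask the runtime directly.  The following is a minimal sketch, assuming only the standard setlocale interface; try_utf8_locale is a hypothetical helper name.  It does not claim anything about what MSVCRT does: it only reports whether the runtime it is linked against accepts ".UTF8", and a NULL return from setlocale means the request was rejected.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: ask the C runtime for the ".UTF8" locale quoted
   above.  Returns 1 if the runtime accepted it, 0 if it refused. */
int try_utf8_locale(void)
{
    const char *name = setlocale(LC_ALL, ".UTF8");

    if (name != NULL) {
        printf("locale set to: %s\n", name);
        return 1;
    }

    /* The request was rejected; fall back to the environment's default
       locale so that later locale-aware calls still behave sensibly. */
    setlocale(LC_ALL, "");
    printf("\".UTF8\" was not accepted by this C runtime\n");
    return 0;
}
```

Building the same file once against UCRT and once against MSVCRT and comparing the two results would answer the question for a given Windows version.
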

> My point is, with the manifest embedded at build time, ACP will be UTF-8
> already when the program (Make) runs, so no need to do anything more.

Only on Windows versions that support this.
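
A program can check at run time whether the manifest actually took effect.  The sketch below assumes the Win32 GetACP function and the CP_UTF8 constant (65001) from windows.h; active_code_page_is_utf8 is a hypothetical helper name, and the non-Windows branch exists only so the file compiles elsewhere.  As far as I know, Microsoft documents the activeCodePage manifest setting as effective from Windows 10 version 1903; on older systems the setting is expected to be ignored and the ACP stays at the legacy code page.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: report whether the process's active code page
   is UTF-8.  Returns 1 if it is, 0 if it is some other code page, and
   -1 when not built for Windows (where the question does not apply). */
int active_code_page_is_utf8(void)
{
#ifdef _WIN32
    return GetACP() == CP_UTF8 ? 1 : 0;
#else
    return -1;
#endif
}
```

A result of 0 on Windows would mean the code that assumes UTF-8 strings needs a fallback path, for example converting through the wide-character APIs.
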
