That's not a good experiment, IMO: the only non-ASCII character here
is U+274E, which has no case variants. And the characters whose
letter-case you tried to change are all ASCII, so their case
conversions are unaffected by the locale.
OK, I think this is a better one: it uses U+03B2 and U+0392, which
are the lower-case and upper-case forms of the same letter (β and Β).
I create a file src.β first:
and then run the following UTF-8 encoded Makefile:
	@gcc ©\src.c -o ©\src.exe
ifneq ("$(wildcard src.β)","")
	@echo src.β exists
else
	@echo src.β does NOT exist
endif
ifneq ("$(wildcard src.Β)","")
	@echo src.Β exists
else
	@echo src.Β does NOT exist
endif
ifneq ("$(wildcard src.βΒ)","")
	@echo src.βΒ exists
else
	@echo src.βΒ does NOT exist
endif
and the output of Make is:
C:\Users\cargyris\temp>make -f utf8.mk
src.βΒ does NOT exist
which shows that it also finds the file under the upper-case extension,
even though the file exists in the file system only with the lower-case
extension.
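To illustrate why this result is interesting, here is a minimal C sketch (not code from Make; the function and variable names are made up for illustration). It shows that a comparison which folds only ASCII letters treats src.β and src.Β as different names, because the two letters differ in their second UTF-8 byte (CE B2 vs. CE 92). So a match between them has to come from something that knows about non-ASCII case pairs, presumably the file system or the runtime.

```c
#include <stddef.h>

/* Fold only ASCII 'A'..'Z' to lower case; every other byte is left
   alone.  This mimics a comparison that knows nothing about
   non-ASCII letters.  */
unsigned char
ascii_fold (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Return 1 if A and B are equal when only ASCII letters are
   case-folded, 0 otherwise.  */
int
ascii_ieq (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (ascii_fold ((unsigned char) *a) != ascii_fold ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* UTF-8 bytes are spelled out so the result does not depend on how
   this source file is encoded: U+03B2 is CE B2, U+0392 is CE 92.  */
const char name_lower[] = "src.\xCE\xB2";        /* src.β */
const char name_upper[] = "src.\xCE\x92";        /* src.Β */
const char name_ascii_upper[] = "SRC.\xCE\xB2";  /* SRC.β */
```

With these definitions, ascii_ieq (name_lower, name_ascii_upper) returns 1, while ascii_ieq (name_lower, name_upper) returns 0.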
My guess would be that only characters within the locale, defined by
the ANSI codepage, are supported by locale-aware functions in the C
runtime. That's because this is what happens even if you use "wide"
Unicode APIs and/or functions like _wcsicmp that accept wchar_t
characters: they all support only the characters of the current locale
set by 'setlocale'. I don't expect that to change just because UTF-8
is used on the outside: internally, everything is converted to UTF-16,
i.e. to the Windows flavor of wchar_t.
When the manifest is used to set the active code page of the process
to UTF-8, the current ANSI code page does become UTF-8, so that
might explain why the above example is working.
As mentioned in:
"Also, the run-time library might obtain and use the value of the operating system code page, which is constant for the duration of the program's execution."
This seems to offer some confirmation.
But this one looks most relevant to your point:
"Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C Runtime supports using a UTF-8 code page. The change means that char strings passed to C runtime functions can expect strings in the UTF-8 encoding. To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page."
setlocale (LC_ALL, "");
so this could be changed to:
setlocale (LC_ALL, ".UTF8");
when running on the Windows version mentioned above or later, but I'm not
sure that is even necessary, given the UTF-8 manifest change.
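If the call were changed, one way to avoid an explicit Windows version check might be to rely on setlocale's return value. The following is only a sketch: the function name set_preferred_locale is hypothetical, and it assumes that a C runtime which does not understand ".UTF8" rejects it by returning NULL, which I have not verified on older Windows versions.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of Make.  Try the UTF-8 code page
   first; if the runtime rejects it, fall back to the locale from the
   environment (what the existing call does), and finally to "C",
   which every conforming runtime accepts.  Returns the locale string
   reported by setlocale.  */
const char *
set_preferred_locale (void)
{
  const char *loc = setlocale (LC_ALL, ".UTF8");

  if (loc == NULL)
    loc = setlocale (LC_ALL, "");
  if (loc == NULL)
    loc = setlocale (LC_ALL, "C");
  return loc;
}
```

Whether this buys anything over the manifest alone is exactly the open question here.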
From reading the above doc my understanding is that embedding the UTF-8
manifest has an effect that covers the C runtime as well. For example:
"UTF-8 mode is also enabled for functions that have historically translated char strings using the default Windows ANSI code page (ACP). For example, calling _mkdir("😊") while using a UTF-8 code page will correctly produce a directory with that emoji as the folder name, instead of requiring the ACP to be changed to UTF-8 before running your program. Likewise, calling _getcwd() in that folder will return a UTF-8 encoded string. For compatibility, the ACP is still used if the C locale code page isn't set to UTF-8."
I have highlighted the important parts in bold.
My point is, with the manifest embedded at build time, ACP will be UTF-8
already when the program (Make) runs, so no need to do anything more.
This advice is for how to use UTF-8 in the C runtime if you don't have
ACP == UTF-8.
The Unicode -W APIs differ from the -A APIs in that they don't even
look at the current ANSI code page; they just use UTF-16.
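For reference, here is roughly what the UTF-8 to UTF-16 step amounts to. This is a hand-written, portable sketch with a made-up function name; on Windows one would normally call MultiByteToWideChar with CP_UTF8 instead, and I am not claiming this is how the runtime does it internally. It is deliberately simplified: it does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert the NUL-terminated UTF-8 string SRC into UTF-16 code units
   stored in DST, which has room for CAP units.  Returns the number of
   units written, or (size_t) -1 on malformed input or lack of space.
   No terminating NUL is written.  */
size_t
utf8_to_utf16 (const char *src, uint16_t *dst, size_t cap)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t n = 0;

  while (*s)
    {
      uint32_t cp;
      int extra;   /* number of continuation bytes expected */

      if (*s < 0x80)                { cp = *s;        extra = 0; }
      else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
      else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
      else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
      else
        return (size_t) -1;
      s++;

      while (extra-- > 0)
        {
          /* A NUL here means truncated input and is rejected too.  */
          if ((*s & 0xC0) != 0x80)
            return (size_t) -1;
          cp = (cp << 6) | (uint32_t) (*s++ & 0x3F);
        }

      if (cp < 0x10000)
        {
          if (n + 1 > cap)
            return (size_t) -1;
          dst[n++] = (uint16_t) cp;
        }
      else
        {
          /* Outside the BMP: emit a surrogate pair.  */
          if (cp > 0x10FFFF || n + 2 > cap)
            return (size_t) -1;
          cp -= 0x10000;
          dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
          dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
  return n;
}
```

With this, the bytes CE B2 (β) become the single unit 0x03B2, and the bytes F0 9F 98 8A (the emoji from the quoted doc, U+1F60A) become the surrogate pair 0xD83D 0xDE0A.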