bug-bash


Re: [PATCH] normalization tweaks for macOS


From: Grisha Levit
Subject: Re: [PATCH] normalization tweaks for macOS
Date: Mon, 17 Jul 2023 18:12:45 -0400

On Mon, Jul 17, 2023 at 3:29 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 7/7/23 5:05 PM, Grisha Levit wrote:
> > A few small tweaks for the macOS-specific normalization handling to
> > handle the issues below:
>
> The issue is that the behavior has to be different between cases where
> the shell is reading input from the terminal and gets NFC characters
> that need to be converted to NFD (which is how HFS+ and APFS store them)
> and when the shell is reading input from a file and doesn't need to (and
> should not) do anything with NFD characters.

NB: while HFS+ stores NFD names, APFS preserves normalization, so we
can get either NFC or NFD text back from readdir.  Both are
normalization-insensitive: "Being normalization-insensitive ensures
that normalization variants of a filename cannot be created in the
same directory, and that a filename can be found with any of its
normalization variants." [1]
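For readers who want to see the two forms side by side, here is a minimal sketch using Python's standard unicodedata module (an illustration only, not anything Bash itself does).  The helper name octal_bytes is made up for this example; it prints bytes in the same octal notation as od -b.

```python
import unicodedata

def octal_bytes(text: str) -> str:
    """Render the UTF-8 bytes of text as space-separated octal, like od -b."""
    return " ".join(f"{b:03o}" for b in text.encode("utf-8"))

# U+00E9 is the precomposed "e with acute"; its NFD form is "e" + U+0301.
nfc = unicodedata.normalize("NFC", "\u00e9")
nfd = unicodedata.normalize("NFD", "\u00e9")

print(octal_bytes(nfc))   # 303 251
print(octal_bytes(nfd))   # 145 314 201

# The two spellings differ byte for byte...
print(nfc == nfd)         # False
# ...but are equal once both are brought to the same form.
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
```

A normalization-insensitive filesystem treats both spellings as the same name, which is why a plain byte comparison in the shell is not enough.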

Currently, Bash never actually converts to NFD.  The fnx_tofs()
function is there but it is never used.  Instead, Bash converts
filenames to NFC with fnx_fromfs() before comparing with either the
glob pattern or the completion hint text (which is never converted).

Since access is normalization-insensitive, we just need to normalize
to _some_ form, so going to NFC is fine, but if we're going to do that
we should normalize both the filesystem name and the text being
compared.

If there's a match, globs expand to the filenames (NFC or NFD) as
returned by readdir(), and Readline completes with NFC-normalized
versions of the names.  I think this makes sense.

What doesn't currently work is that glob patterns containing NFD text
never match anything, and completion prefixes containing NFD text never
expand to anything, because only the filesystem name is normalized
before the comparison.
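The mismatch described above can be modelled in a few lines of Python.  This is a sketch under stated assumptions, not Bash's code: fnmatch stands in for the shell's glob matcher, unicodedata's NFC stands in for the UTF-8-MAC to UTF-8 conversion, and the function names (match_one_sided, match_both_sides, complete) are hypothetical.

```python
import fnmatch
import unicodedata

def nfc(text: str) -> str:
    """Normalize text to NFC (stand-in for converting UTF-8-MAC to UTF-8)."""
    return unicodedata.normalize("NFC", text)

def match_one_sided(name_from_fs: str, pattern: str) -> bool:
    # Models the behavior described above: only the filesystem name is normalized.
    return fnmatch.fnmatchcase(nfc(name_from_fs), pattern)

def match_both_sides(name_from_fs: str, pattern: str) -> bool:
    # Models the suggested fix: normalize the name and the pattern alike.
    return fnmatch.fnmatchcase(nfc(name_from_fs), nfc(pattern))

def complete(prefix: str, names: list[str]) -> list[str]:
    # Prefix completion with both sides normalized; results are returned as NFC.
    wanted = nfc(prefix)
    return [nfc(n) for n in names if nfc(n).startswith(wanted)]

nfd_name = "cafe\u0301.txt"      # name as an NFD-storing filesystem might return it
nfd_pattern = "cafe\u0301*"      # pattern typed or pasted as NFD text
nfc_pattern = "caf\u00e9*"       # the same pattern as NFC text

print(match_one_sided(nfd_name, nfc_pattern))   # True
print(match_one_sided(nfd_name, nfd_pattern))   # False: the NFD pattern never matches
print(match_both_sides(nfd_name, nfd_pattern))  # True
print(complete("cafe\u0301", [nfd_name]))       # one NFC-normalized completion
```

Normalizing the pattern as a whole string is a simplification; a real matcher would need to take care with patterns where a wildcard or bracket expression sits between a base character and its combining mark.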

[1]: https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html

> Does iconv work when taking NFD input that came from the file system and
> trying to convert it to NFD (UTF-8-MAC)? I've honestly never checked.

Converting to UTF-8-MAC always normalizes to NFD:

$ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8-MAC | od -b -An
          145 314 201 000 145 314 201

$ printf '\303\251\0\145\314\201' | iconv -f UTF-8     -t UTF-8-MAC | od -b -An
          145 314 201 000 145 314 201

But Bash only converts from UTF-8-MAC to UTF-8, which always normalizes to NFC:

$ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8     | od -b -An
          303 251 000 303 251


