bug-bash

Re: [PATCH] normalization tweaks for macOS


From: alex xmb ratchev
Subject: Re: [PATCH] normalization tweaks for macOS
Date: Tue, 18 Jul 2023 11:29:08 +0200

I don't know much about this topic, so just briefly: I found that uconv,
from icu-devtools, has more options than iconv, including some
transliteration options. I mention it only in case you don't know it.
I'm no pro, and I still can't achieve what I had to do with it.
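
A minimal sketch of what I mean, assuming uconv is installed and that
its -x (transliteration) option accepts the ICU transform names Any-NFD
and Any-NFC. It is guarded so it does nothing harmful where uconv is
missing, and the sample character is just an illustration:

```shell
# e-acute in precomposed (NFC) form: U+00E9, bytes 303 251 in octal
nfc=$(printf '\303\251')

if command -v uconv >/dev/null 2>&1; then
    # Decompose to NFD and dump the resulting bytes in octal
    printf '%s' "$nfc" | uconv -f UTF-8 -t UTF-8 -x Any-NFD | od -b -An
    # Recompose a decomposed sequence (e + combining acute) to NFC
    printf '\145\314\201' | uconv -f UTF-8 -t UTF-8 -x Any-NFC | od -b -An
else
    echo "uconv not found; skipping" >&2
fi
```

Unlike the UTF-8-MAC encoding in iconv, this should not depend on
running on macOS.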

On Tue, Jul 18, 2023, 12:13 AM Grisha Levit <grishalevit@gmail.com> wrote:

> On Mon, Jul 17, 2023 at 3:29 PM Chet Ramey <chet.ramey@case.edu> wrote:
> >
> > On 7/7/23 5:05 PM, Grisha Levit wrote:
> > > A few small tweaks for the macOS-specific normalization handling to
> > > handle the issues below:
> >
> > The issue is that the behavior has to be different between cases where
> > the shell is reading input from the terminal and gets NFC characters
> > that need to be converted to NFD (which is how HFS+ and APFS store them)
> > and when the shell is reading input from a file and doesn't need to (and
> > should not) do anything with NFD characters.
>
> NB: while HFS+ stores NFD names, APFS preserves normalization, so we
> can get either NFC or NFD text back from readdir.  Both are
> normalization-insensitive: "Being normalization-insensitive ensures
> that normalization variants of a filename cannot be created in the
> same directory, and that a filename can be found with any of its
> normalization variants." [1]
>
> Currently, Bash never actually converts to NFD.  The fnx_tofs()
> function is there but it is never used.  Instead, Bash converts
> filenames to NFC with fnx_fromfs() before comparing with either the
> glob pattern or the completion hint text (which is never converted).
>
> Since access is normalization-insensitive, we just need to normalize
> to _some_ form, so going to NFC is fine, but if we're going to do that
> we should normalize both the filesystem name and the text being
> compared.
>
> If there's a match, globs expand to the filenames (NFC or NFD) as
> returned by readdir(), and Readline completes with NFC-normalized
> versions of the names.  I think this makes sense.
>
> What doesn't work quite right currently though is that glob patterns
> with NFD text never match anything, and completion prefixes with NFD
> text never expand to anything.
>
> [1]: https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html
>
> > Does iconv work when taking NFD input that came from the file system and
> > trying to convert it to NFD (UTF-8-MAC)? I've honestly never checked.
>
> Converting to UTF-8-MAC always normalizes to NFD:
>
> $ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8-MAC | od -b -An
>           145 314 201 000 145 314 201
>
> $ printf '\303\251\0\145\314\201' | iconv -f UTF-8     -t UTF-8-MAC | od -b -An
>           145 314 201 000 145 314 201
>
> But Bash only converts from UTF-8-MAC to UTF-8, which always normalizes to
> NFC:
>
> $ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8     | od -b -An
>           303 251 000 303 251
>
>
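
For anyone following along, here is a small sketch of why text that is
only normalized on one side fails to match. It uses the same two byte
sequences as the quoted iconv examples and a plain byte-wise string
comparison; the variable names are only for illustration, and it does
not call any of the Bash functions discussed above:

```shell
nfc=$(printf '\303\251')        # U+00E9, precomposed (NFC)
nfd=$(printf '\145\314\201')    # U+0065 U+0301, decomposed (NFD)

# Both render as the same character, but the bytes are not the same,
# so a comparison without normalizing both sides cannot succeed.
if [ "$nfc" = "$nfd" ]; then
    result=same
else
    result=different
fi
echo "$result"    # prints "different"
```

This is the situation when the filesystem name has been converted to
NFC but the pattern or completion text is still NFD.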

