Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
From: Ken Hornstein
Subject: Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects
Date: Tue, 17 Jun 2014 16:11:22 -0400
>> if not for file names?
>
>The Unix kernel stores filenames as a run of bytes, not including `/'
>and NUL.
That's not universally true anymore. Some newer filesystems mandate that
filenames be UTF-8 and enforce normalization rules (MacOS X and Solaris
are two notable examples). Obviously some charset conversion has to
happen for non-UTF-8 locales. I think that's inevitable, given the
issues with composed and decomposed characters.
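To make the contrast concrete, here is a minimal Python sketch of two hypothetical validation policies for a single path component given as raw bytes. `traditional_unix_ok` models the classic kernel rule quoted above (any bytes except `/` and NUL), and `utf8_only_ok` models, loosely and as an assumption, a filesystem that also requires valid UTF-8. Neither is any real filesystem's code. The Latin-1 example shows why a non-UTF-8 locale needs conversion before such a filesystem will accept the name.

```python
# Sketch only: two hypothetical policies for validating one path component.

def traditional_unix_ok(name: bytes) -> bool:
    """Classic rule: any bytes except '/' and NUL (empty names rejected here)."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name

def utf8_only_ok(name: bytes) -> bool:
    """Stricter, assumed policy: the classic rule plus 'must decode as UTF-8'."""
    if not traditional_unix_ok(name):
        return False
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

latin1_name = "Résumé.txt".encode("latin-1")  # é is the single byte 0xE9
utf8_name = "Résumé.txt".encode("utf-8")      # é is the two bytes 0xC3 0xA9

print(traditional_unix_ok(latin1_name), utf8_only_ok(latin1_name))  # True False
print(traditional_unix_ok(utf8_name), utf8_only_ok(utf8_name))      # True True
```

The Latin-1 name is a perfectly legal traditional Unix filename, but a UTF-8-only filesystem would refuse it unless something converts it first.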
For example, let's say you see this:
% ls
Résumé.txt Résumé.txt
How can that be? Well, they aren't the same sequence of bytes. In the
first one, the “é” is U+00E9 (a single precomposed character). In the
second, it's U+0065 U+0301 (a regular “e” followed by a combining acute
accent). The only way of resolving this is to apply the Unicode
normalization rules and do filename lookups on the normalized form.
MacOS X actually rewrites all filenames into (a slight variant of)
Normalization Form D, where every character is fully decomposed into the
base character followed by its combining accents. I think that sucks,
but they didn't ask me. Solaris is better: the original bytes are
preserved, but lookup is done using normalized names, so you can't have
two filenames that differ only in how their characters are composed.
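A minimal Python sketch of the two approaches, using the standard `unicodedata` module. `RewritingDir` and `PreservingDir` are hypothetical names for illustration; they model only a set of names in one directory, not any real filesystem. The real HFS+ decomposition is a variant of NFD, so `RewritingDir` is only an approximation of it.

```python
import unicodedata

composed = "R\u00e9sum\u00e9.txt"      # é as U+00E9
decomposed = "Re\u0301sume\u0301.txt"  # e followed by U+0301 COMBINING ACUTE ACCENT

# They look the same but differ in code points and therefore in UTF-8 bytes.
print(composed == decomposed)                                # False
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
# Normalizing both to the same form makes them equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

class RewritingDir:
    """Mac-like sketch: every name is stored rewritten to NFD."""
    def __init__(self):
        self.names = set()

    def create(self, name):
        stored = unicodedata.normalize("NFD", name)
        if stored in self.names:
            raise FileExistsError(name)
        self.names.add(stored)
        return stored  # the caller's original spelling is lost

class PreservingDir:
    """Solaris-like sketch: keep the original name, index it by a normalized key."""
    def __init__(self):
        self.by_key = {}

    def create(self, name):
        key = unicodedata.normalize("NFD", name)
        if key in self.by_key:
            raise FileExistsError(name)
        self.by_key[key] = name  # original spelling preserved
        return name

    def lookup(self, name):
        return self.by_key.get(unicodedata.normalize("NFD", name))

pdir = PreservingDir()
pdir.create(composed)
print(pdir.lookup(decomposed) == composed)  # True: found, original spelling kept
try:
    pdir.create(decomposed)
except FileExistsError:
    print("refused: a canonically equivalent name already exists")
```

Which normalization form the lookup key uses doesn't matter for collision detection, as long as every name goes through the same form.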
--Ken