bug-coreutils

Re: Alignment bug in ls with UTF-8 filenames under Mac OS X


From: Vincent Lefevre
Subject: Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Date: Thu, 18 Jan 2007 03:50:26 +0100
User-agent: Mutt/1.5.13-vl-r14963 (2007-01-09)

On 2007-01-18 03:14:37 +0100, Bruno Haible wrote:
> Conclusion: What you see is not an ls bug, but an Apple Terminal bug
> with tabs.

I never use the Apple Terminal. As I said in my bug report, I'm
using uxterm here. More precisely:

prunille:~> uxterm -version
XFree86 4.3.99.903(184)

With the same uxterm, after a ssh to a Linux machine:

vin:~tmp/blah> LC_ALL=en_US.UTF-8 \ls -C | hd
00000000  45 cc 81 09 09 09 09 20  79 31 32 33 34 35 36 37  |E...... y1234567|
00000010  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000020  34 35 36 37 38 39 30 0a  78 31 32 33 34 35 36 37  |4567890.x1234567|
00000030  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000040  34 35 36 37 38 39 30 20  20 7a 31 32 33 34 35 36  |4567890  z123456|
00000050  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000060  33 34 35 36 37 38 39 30  0a                       |34567890.|
00000069
vin:~tmp/blah> LC_ALL=en_US.UTF-8 \ls -C
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

No problem.
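In this dump the name É is stored as the three bytes 45 cc 81: the letter E followed by the two-byte UTF-8 encoding of U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed form. The helper below is a hypothetical sketch that decodes only the one- and two-byte UTF-8 cases, just enough to recover those code points. It is not a complete or validating UTF-8 decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence of at most 2 bytes
   starting at s (s must be NUL-terminated).  Returns the code point
   and stores the number of bytes consumed in *len.  3- and 4-byte
   sequences and malformed input yield U+FFFD; overlong forms are not
   rejected.  Sketch only. */
static uint32_t decode_utf8_short(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {                      /* 0xxxxxxx: ASCII */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        /* 110xxxxx 10yyyyyy -> xxxxxyyyyyy */
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    }
    *len = 1;
    return 0xFFFD;                          /* unsupported or malformed */
}
```

Applied to 45 cc 81, the first call returns 0x45 ('E', 1 byte) and the second returns 0x301: (0xCC & 0x1F) << 6 is 0x300, and 0x81 & 0x3F is 0x01.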

Hmm... I forgot that ls was an alias (the same one on all my accounts).
So, back on Mac OS X:

prunille:~/blah> \ls
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
prunille:~/blah> \ls --color=always
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

prunille:~/blah> \ls -C | hexdump -C
00000000  45 cc 81 09 09 09 09 20  79 31 32 33 34 35 36 37  |E�..... y1234567|
00000010  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000020  34 35 36 37 38 39 30 0a  78 31 32 33 34 35 36 37  |4567890.x1234567|
00000030  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000040  34 35 36 37 38 39 30 20  20 7a 31 32 33 34 35 36  |4567890  z123456|
00000050  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000060  33 34 35 36 37 38 39 30  0a                       |34567890.|
00000069

prunille:~/blah> \ls -C --color=always | hexdump -C
00000000  1b 5b 30 30 6d 1b 5b 30  6d 45 cc 81 1b 5b 30 30  |.[00m.[0mE�..[00|
00000010  6d 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |m               |
00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000030  1b 5b 30 6d 79 31 32 33  34 35 36 37 38 39 30 31  |.[0my12345678901|
00000040  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
00000050  38 39 30 1b 5b 30 30 6d  0a 1b 5b 30 6d 78 31 32  |890.[00m..[0mx12|
00000060  33 34 35 36 37 38 39 30  31 32 33 34 35 36 37 38  |3456789012345678|
00000070  39 30 31 32 33 34 35 36  37 38 39 30 1b 5b 30 30  |901234567890.[00|
00000080  6d 20 20 1b 5b 30 6d 7a  31 32 33 34 35 36 37 38  |m  .[0mz12345678|
00000090  39 30 31 32 33 34 35 36  37 38 39 30 31 32 33 34  |9012345678901234|
000000a0  35 36 37 38 39 30 1b 5b  30 30 6d 0a 1b 5b 6d     |567890.[00m..[m|
000000af
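The two Mac dumps differ in how the padding after É is written. Plain `\ls -C` emits four tabs and a space (09 09 09 09 20), while `--color=always` emits 31 spaces and no tabs. With tab stops every 8 columns, tab padding ends on the same column whether the width of É is counted as 1 or 2, so a miscounted width only shows up with space padding. The sketch below illustrates this with a hypothetical `pad_to` helper (not ls's actual code). The target column 33 comes from the dumps: 31 columns for the longest name plus a two-space gap.

```c
#include <stddef.h>

/* Hypothetical helper: write into buf the padding that moves the
   cursor from column `from` to column `to` (0-based), NUL-terminate
   it, and return the number of padding bytes written.
   With tabsize > 0, a tab is emitted whenever the next tab stop is at
   or before `to`; the remainder is filled with spaces.
   With tabsize == 0, only spaces are used.
   buf must hold at least (to - from + 1) bytes. */
static size_t pad_to(char *buf, size_t from, size_t to, size_t tabsize)
{
    char *p = buf;
    while (from < to) {
        size_t next_stop = tabsize ? (from / tabsize + 1) * tabsize : to + 1;
        if (next_stop <= to) {
            *p++ = '\t';
            from = next_stop;
        } else {
            *p++ = ' ';
            from++;
        }
    }
    *p = '\0';
    return (size_t)(p - buf);
}
```

Starting from column 1 (correct width) or column 2 (width overcounted by one), `pad_to(buf, from, 33, 8)` produces the same four tabs and a space in both cases. `pad_to(buf, from, 33, 0)` produces 32 spaces in the first case and 31 in the second.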

> But there is an ls bug:
> 
> $ ls -C -T0
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890
> $ ls -C -T0 | hd
> 000000  45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20  E..             
> 000010  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20                  
> 000020  20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33    y1234567890123
[...]

OK, so I think I was seeing this bug.

> What 'ls' here outputs is: an E, a combining accent and 31 spaces - text
> that moves to column 32, not 33. When I set a breakpoint in wcwidth,
> I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
> U+0301 is COMBINING ACUTE ACCENT. So here is the problem: MacOS'
> wcwidth is buggy for combining characters like accents.
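Bruno's diagnosis can be illustrated by summing per-character widths. In the sketch below, `sketch_width` and `buggy_width` are hypothetical stand-ins, not real wcwidth implementations. The first treats the Combining Diacritical Marks block (U+0300 to U+036F) as zero-width and everything else as one column. The second returns 1 for every character, which matches the reported Mac OS X result wcwidth(0x0301) = 1 for the characters in this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width function, for illustration only: U+0300..U+036F
   (Combining Diacritical Marks) take no column, everything else one.
   A real wcwidth must also handle other combining ranges, wide
   characters and non-printable characters. */
static int sketch_width(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F) ? 0 : 1;
}

/* Hypothetical stand-in for the reported buggy behaviour:
   every character, including U+0301, counts as one column. */
static int buggy_width(uint32_t cp)
{
    (void)cp;
    return 1;
}

/* Display width of a sequence of code points, summing per-character
   widths in the spirit of wcswidth(). */
static size_t name_width(const uint32_t *cps, size_t n,
                         int (*width)(uint32_t))
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (size_t)width(cps[i]);
    return total;
}
```

For {U+0045, U+0301}, `sketch_width` gives a total of 1 and `buggy_width` gives 2. With the column width of 33 used in these listings, the buggy count leaves 33 - 2 = 31 spaces of padding, which is exactly the "E, a combining accent and 31 spaces" shown by `ls -C -T0`.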

OK. Can't autoconf detect that and use another implementation?
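Such a check is possible in principle: a configure script can compile and run a small program that selects a UTF-8 locale and asks wcwidth about U+0301. The sketch below only illustrates such a probe. The function name and the locale names it tries are assumptions, and it is not an existing autoconf or gnulib test.

```c
#define _XOPEN_SOURCE 700   /* expose wcwidth() in <wchar.h> on glibc */
#include <locale.h>
#include <wchar.h>

/* Hypothetical probe: returns 1 if wcwidth() treats U+0301 COMBINING
   ACUTE ACCENT as zero-width, 0 if it does not (the behaviour reported
   for Mac OS X), and -1 if none of the UTF-8 locales below could be
   selected.  The locale names are assumptions; adjust per system. */
static int wcwidth_combining_ok(void)
{
    const char *names[] = { "en_US.UTF-8", "C.UTF-8" };
    int found = 0;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (setlocale(LC_ALL, names[i]) != NULL) {
            found = 1;
            break;
        }
    }
    if (!found)
        return -1;
    int ok = wcwidth((wchar_t)0x0301) == 0;
    setlocale(LC_ALL, "C");   /* restore the default locale */
    return ok;
}
```

In autoconf, a program like this would be run with AC_RUN_IFELSE, and a result of 0 would select a replacement wcwidth. Because the program must be executed, a cross-compiling build would have to fall back on a default answer instead.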

> (*) 'hd' is a shell script:
> #!/bin/sh
> hexdump -e '"%06.6_ax  " 16/1 "%02X "' -e '"  " 16/1 "%_p" "\n"' "$@"
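For reference, the format strings in that script print, per line, a 6-digit hex offset, up to 16 bytes in uppercase hex (each followed by a space), two more spaces, and the bytes as characters with non-printable ones shown as '.'. `hexdump -C` instead uses an 8-digit offset and lowercase hex with an extra gap after 8 bytes, and puts the characters between `|` delimiters, as in the Mac dumps above. The following is a rough C equivalent of one `hd` line, using a hypothetical `hd_line` function.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of one output line of the `hd` script: offset
   as 6 hex digits, up to 16 bytes as "%02X ", two spaces, then the
   bytes as characters, '.' for non-printable ones.  Uses isprint() in
   the current locale (the "C" locale by default); hexdump's handling
   of other locales and of a short final line may differ.
   out must hold at least 80 bytes. */
static void hd_line(char *out, unsigned long offset,
                    const unsigned char *bytes, size_t n)
{
    char *p = out;
    p += sprintf(p, "%06lx  ", offset);
    for (size_t i = 0; i < n && i < 16; i++)
        p += sprintf(p, "%02X ", bytes[i]);
    p += sprintf(p, "  ");
    for (size_t i = 0; i < n && i < 16; i++)
        *p++ = isprint(bytes[i]) ? (char)bytes[i] : '.';
    *p++ = '\n';
    *p = '\0';
}
```

Fed the first 16 bytes of the `ls -C -T0` output (45 CC 81 followed by thirteen 20 bytes), this yields a line starting `000000  45 CC 81 20` whose character column begins with `E..`.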

It's a bit like (or identical to) "hexdump -C", then.

Regards,

-- 
Vincent Lefèvre <address@hidden> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)



