bug-bash


From: Eduardo Bustamante
Subject: Bash doesn't handle C with acute accent properly during readline's rl_change_case
Date: Thu, 11 May 2017 07:56:06 -0500

The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86

- Upper case
dualbus@debian:~$ printf '\U0106\n'
Ć

- Lower case
dualbus@debian:~$ printf '\U0107\n'
ć

Now, in bash, if you type ć and then run readline's `upcase-word' (M-u)
on it, the line buffer ends up holding 0x07 0x87 instead of the UTF-8
multibyte sequence for U+0106 (0xC4 0x86).
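
For reference, the byte sequences involved can be dumped with a short sketch like the one below. The helper name `hex_bytes` is hypothetical (it is not part of the report), and the input uses octal escapes so the result does not depend on the current locale.

```shell
# hex_bytes is a hypothetical helper, not part of the original report.
hex_bytes() {
    # Leaving $(...) unquoted word-splits od's output, dropping its padding.
    echo $(od -An -tx1)
}

# Octal escapes keep the input independent of the current locale.
upper=$(printf '\304\206' | hex_bytes)   # U+0106, what upcase-word should produce
lower=$(printf '\304\207' | hex_bytes)   # U+0107, the character that was typed

echo "U+0106: $upper"   # U+0106: c4 86
echo "U+0107: $lower"   # U+0107: c4 87
```

The two encodings share the lead byte 0xC4 and differ only in the trailing byte, so a correct upcase should change 0x87 to 0x86 and leave 0xC4 alone.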

Parameter expansion doesn't seem to have that problem, so I think
it's a bug in readline:

dualbus@debian:~/src/gnu/bash$ a=ć; echo ${a^^}
Ć

dualbus@debian:~/src/gnu/bash$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

(gdb) bt
#0  rl_change_case (count=1, op=1) at text.c:1339
#1  0x0000000000525cad in rl_upcase_word (count=1, key=117) at text.c:1304
#2  0x00000000004fe7a7 in _rl_dispatch_subseq (key=117, map=0x771d80 <emacs_meta_keymap>, got_subseq=0) at readline.c:851
#3  0x00000000004fed6f in _rl_dispatch_subseq (key=27, map=0x772d90 <emacs_standard_keymap>, got_subseq=0) at readline.c:985
#4  0x00000000004fe149 in _rl_dispatch (key=27, map=0x772d90 <emacs_standard_keymap>) at readline.c:797
#5  0x00000000004fe0b9 in readline_internal_char () at readline.c:629
#6  0x00000000004ff6a2 in readline_internal_charloop () at readline.c:656
#7  0x00000000004fda12 in readline_internal () at readline.c:670
#8  0x00000000004fd8d0 in readline (prompt=0x899ce8 "bash-4.4$ ") at readline.c:374
#9  0x000000000042cae8 in yy_readline_get () at ./parse.y:1456
#10 0x0000000000431a8b in yy_getc () at ./parse.y:1389
#11 0x0000000000432328 in shell_getc (remove_quoted_newline=1) at ./parse.y:2289
#12 0x0000000000430bb7 in read_token (command=0) at ./parse.y:3138
#13 0x000000000042c14e in yylex () at ./parse.y:2675
#14 0x0000000000428abe in yyparse () at y.tab.c:1827
#15 0x00000000004285ab in parse_command () at eval.c:294
#16 0x0000000000428392 in read_command () at eval.c:338
#17 0x0000000000428091 in reader_loop () at eval.c:140
#18 0x00000000004253bb in main (argc=1, argv=0x7fffffffe498, env=0x7fffffffe4a8) at shell.c:794

(gdb) p rl_line_buffer
$1 = 0x83a408 "ć"

(gdb) finish
Run till exit from #0  rl_change_case (count=1, op=1) at text.c:1339
0x0000000000525cad in rl_upcase_word (count=1, key=117) at text.c:1304
1304      return (rl_change_case (count, UpCase));
Value returned is $2 = 0

(gdb) p rl_line_buffer
$3 = 0x83a408 "\a\207"
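
gdb prints non-printable bytes as C escapes, so "\a\207" is 0x07 followed by octal 207 (0x87). The sketch below compares the buffer contents before and after the call. The helper `hex_bytes` is a hypothetical name, and the "before" bytes are written out as the UTF-8 encoding of ć rather than read from a live session.

```shell
# hex_bytes is a hypothetical helper, not part of the original report.
hex_bytes() { echo $(od -An -tx1); }

before=$(printf '\304\207' | hex_bytes)   # ć as typed (U+0107 in UTF-8)
after=$(printf '\a\207' | hex_bytes)      # gdb's "\a\207" after rl_change_case

echo "before: $before"   # before: c4 87
echo "after:  $after"    # after:  07 87
```

Only the first byte changes (0xC4 becomes 0x07). The trailing byte 0x87 is left as it was.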

For some reason, rl_change_case thinks `c` is ASCII:

(gdb) call isascii((unsigned char)c)
$8 = 1
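
One possible reading of this result, assuming `c` holds the wide-character value 0x107 at that point (the session above does not print `c` itself): casting to `unsigned char` keeps only the low 8 bits, which gives 0x07, and that is within the ASCII range. It would also match the `\a` that shows up as the first byte of the buffer. The sketch below mimics the cast with a bit mask in shell arithmetic.

```shell
# Assumption: c holds the code point U+0107 as a wide-character value.
c=$(( 0x107 ))

# (unsigned char)c keeps only the low 8 bits; mimic that with a mask.
low_byte=$(( c & 0xFF ))
printf 'low byte: 0x%02x\n' "$low_byte"   # low byte: 0x07

# isascii() is true for values 0 through 127.
if [ "$low_byte" -le 127 ]; then
    echo "truncated value looks ASCII"
else
    echo "truncated value is not ASCII"
fi
```

If that assumption holds, the ASCII check would need to look at the untruncated value, or at the first byte of the multibyte sequence (0xC4), to avoid taking the single-byte path.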


