stop octet storm in --debug (UTF-8)
From: Hideo Haga
Subject: stop octet storm in --debug (UTF-8)
Date: Mon, 17 Jun 2019 06:15:11 +0900
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0
Sample (adapted from the online manual, section 5.9 "Multibyte characters and Locale Considerations").
Please set LANG=en_US.UTF-8 (or LANG=ja_JP.UTF-8).
In UTF-8, \u03A3 is encoded as the two bytes 0xCE 0xA3.
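For reference, here is a minimal sketch of how a code point in the two-byte UTF-8 range turns into those bytes. The function name `utf8_encode_2byte` is hypothetical and is not part of sed; it only handles U+0080 through U+07FF.

```c
#include <stddef.h>

/* Hypothetical helper, not from sed: encode a code point in the range
   U+0080..U+07FF as two UTF-8 bytes.  Returns 2 on success, or 0 if the
   code point does not belong to the two-byte range. */
static size_t utf8_encode_2byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80 || cp > 0x7FF)
        return 0;
    out[0] = (unsigned char) (0xC0 | (cp >> 6));   /* 110xxxxx: upper 5 bits */
    out[1] = (unsigned char) (0x80 | (cp & 0x3F)); /* 10xxxxxx: lower 6 bits */
    return 2;
}
```

Calling it with 0x03A3 fills `out` with 0xCE and 0xA3, the two bytes that show up in the --debug output.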
```
$ printf 'a\u03A3b' | sed 's/./X/g'
XXX
$ printf 'a\u03A3b' | sed --debug 's/./X/g'
SED PROGRAM:
s/./X/g
INPUT: 'STDIN' line 1
PATTERN: a\o37777777716\o37777777643b
COMMAND: s/./X/g
MATCHED REGEX REGISTERS
regex[0] = 0-1 'a'
PATTERN: XXX
END-OF-CYCLE:
XXX
```
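As shown above, every byte of a UTF-8 multibyte character is displayed as an 11-digit octal number.
I want to stop this octal storm.
A bigger ambition: Unicode users would rather see hex than octal,
because almost all Unicode tables list code points in hex.
Stopping the storm alone only requires adding a 0xff mask; the patch below adds the mask and also switches the output to hex.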
```
diff --git a/sed/debug.c b/sed/debug.c
index 9ec37b6..4c40b97 100644
--- a/sed/debug.c
+++ b/sed/debug.c
@@ -66,7 +66,7 @@ debug_print_char (char c)
break;
default:
- printf ("o%03o", (unsigned int) c);
+ printf ("x%02x", (unsigned int) c & 0xff);
}
}
```
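...but the l command does not produce an octal storm.
Why? The l command (do_list function) reads c from the buffer as "unsigned char"
and then converts it to int, so the value stays in the range 0..255.
The debug_print_char function, however, receives c as a (signed) "char". A byte such as 0xCE is negative there, so it is sign-extended to the full width of int, and the (unsigned int) cast then yields 0xffffffce (octal 37777777716, a 32-bit value), which is what printf prints.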
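The difference can be reproduced outside sed with a small sketch. The three function names below are hypothetical and only mimic the casts described above; they are not sed's code. The sketch assumes a 32-bit unsigned int and uses `signed char` explicitly, because plain `char` is unsigned on some platforms.

```c
#include <stdio.h>

/* Hypothetical stand-in for the unpatched debug path: a negative
   signed char is sign-extended before the cast to unsigned int. */
static int format_sign_extended(char *out, size_t n, signed char c)
{
    return snprintf(out, n, "\\o%03o", (unsigned int) c);
}

/* Hypothetical stand-in for the patched path: masking with 0xff
   discards the sign-extension bits, leaving only the low byte. */
static int format_masked(char *out, size_t n, signed char c)
{
    return snprintf(out, n, "\\x%02x", (unsigned int) c & 0xff);
}

/* Hypothetical stand-in for the l command path: the byte is read as
   unsigned char, so it is never negative and needs no mask. */
static int format_unsigned(char *out, size_t n, unsigned char c)
{
    return snprintf(out, n, "\\%03o", (unsigned int) c);
}
```

Under those assumptions, the byte 0xCE (which is -50 as a signed char) formats as `\o37777777716` through the first function, `\xce` through the second, and `\316` through the third.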
--
------------------------------
Hideo Haga<address@hidden>