bug-coreutils
bug#23302: mention what are nonprinting characters


From: Assaf Gordon
Subject: bug#23302: mention what are nonprinting characters
Date: Wed, 31 Oct 2018 20:51:22 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 2018-10-31 12:34 p.m., 積丹尼 Dan Jacobson wrote:
> Yes but every program has slightly different sets of non-printing
> characters, so they need to list them exactly.


To my understanding, printable characters in the C/POSIX locale
are strictly defined here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01_01
where it says:
    "print" is by definition "alnum", "punct", and the <space>
and alnum/punct/space are defined on that page.
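As a rough sanity check of that definition, the sketch below compares what tr keeps for '[:print:]' with what it keeps for "alnum" + "punct" + <space>. It assumes GNU coreutils (printf, seq, tr, od) and pins the C locale; gen_octets is just a helper name made up for this example.

```shell
# Hypothetical helper: emit all 256 octets (same trick as used in this mail).
gen_octets() { env printf "$(env printf '\\x%02x' $(seq 0 255))"; }

# Bytes kept by [:print:], dumped as hex so they are safe to compare.
by_print=$(gen_octets | LC_ALL=C tr -cd '[:print:]' | od -An -tx1)

# Bytes kept by alnum + punct + a literal space.
by_parts=$(gen_octets | LC_ALL=C tr -cd '[:alnum:][:punct:] ' | od -An -tx1)

if [ "$by_print" = "$by_parts" ]; then
  echo "same set"
else
  echo "different sets"
fi
```

In the C locale this should report "same set"; in other locales the classes may differ.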

From that, every C program uses isprint(3) to determine
if an octet (value 0 to 255) is printable or not.
http://man7.org/linux/man-pages/man3/isprint.3p.html

And all coreutils programs use said logic.
(All bets are off in a non-C locale, of course.)
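Since the locale matters, here is a small sketch that pins LC_ALL=C for tr and counts how many of the 256 octets are classified as printable. It assumes GNU coreutils; gen_octets is a helper name invented for this example.

```shell
# Hypothetical helper: emit all 256 octets.
gen_octets() { env printf "$(env printf '\\x%02x' $(seq 0 255))"; }

# Force the C locale for tr only, delete everything that is not
# printable, and count what is left.
count=$(gen_octets | LC_ALL=C tr -cd '[:print:]' | wc -c)

# $((...)) normalizes any padding some wc implementations add.
echo "printable octets in the C locale: $((count))"
```

With the C locale this should print 95, i.e. octets 0x20 through 0x7e.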

For example,
Let's generate a file containing all 256 octets:

  env printf "$(env printf '\\x%02x' $(seq 0 255))" > 1

od's "z" type shows only printable characters:

  $ od -An -tx1z 1
  00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f  >................<
  10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f  >................<
  20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f  > !"#$%&'()*+,-./<
  30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f  >0123456789:;<=>?<
  40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f  >@ABCDEFGHIJKLMNO<
  50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f  >PQRSTUVWXYZ[\]^_<
  60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f  >`abcdefghijklmno<
  70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f  >pqrstuvwxyz{|}~.<
  80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f  >................<
  90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f  >................<
  a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af  >................<
  b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf  >................<
  c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf  >................<
  d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df  >................<
  e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef  >................<
  f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff  >................<

od's "c" type shows non-printable characters as octal values or escape sequences:

  $ od -An -tc 1
   \0 001 002 003 004 005 006  \a  \b  \t  \n  \v  \f  \r 016 017
  020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
        !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ 177
  200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
  220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
  240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
  260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
  300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
  320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
  340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
  360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377

tr can delete non-printables using a character class:

  $ tr -cd '[:print:]' < 1 ; echo
   !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~



and printf's "%q" type will also escape all non-printables as octal values or escape sequences:

  $ env printf "%q\n" "$(cat 1)"
  -bash: warning: command substitution: ignored null byte in input

'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037'' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'$'\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377'

So it seems all these programs agree on what a printable (and
non-printable) character is, based on an external definition.


Is there another instance you are aware of that behaves differently?

-assaf





