bug-coreutils

Re: fold and multibyte characters


From: Eric Blake
Subject: Re: fold and multibyte characters
Date: Fri, 05 May 2006 01:31:41 +0000

> Hi,

Sorry about the last email; my finger slipped and hit send too soon.

> 
> It seems to me that fold (5.94) doesn't support multibyte locales. I have
> LANG=hu_HU.UTF-8, no other LC_ variables set, fully UTF-8 environment. When
> I launch "fold -s" to format a UTF-8 encoded text file, it produces lines
> much shorter than 80 characters for those lines that have plenty of accented
> letters. If I omit the "-s" it even easily breaks a character, places a
> newline between its two bytes, which leads to invalid UTF-8 output.

Yes, this is a well-known deficiency of coreutils.  From the TODO document:

Adapt tools like wc, tr, fmt, etc. (most of the textutils) to be
  multibyte aware.  The problem is that I want to avoid duplicating
  significant blocks of logic, yet I also want to incur only minimal
  (preferably `no') cost when operating in single-byte mode.

Coreutils currently operates only on bytes, so text in multibyte
encodings does not fold correctly.  Some vendors carry patches that
attempt multibyte handling, but none has been considered clean
enough for upstream inclusion yet.
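
To illustrate the difference, here is a minimal Python sketch (not
coreutils code, and not how any vendor patch works) that contrasts
breaking at byte offsets with breaking at character offsets.  The
function names are made up for this example, and counting code points
is itself a simplification: a real fix would have to count display
columns, which differ from code points for wide and combining
characters.

```python
from typing import List


def fold_bytes(data: bytes, width: int) -> List[bytes]:
    """Break every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def fold_chars(text: str, width: int) -> List[str]:
    """Break every `width` code points, so no character is split."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def is_valid_utf8(chunk: bytes) -> bool:
    """Report whether a chunk decodes cleanly as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    line = "árvíztűrő"  # 9 characters, 13 bytes in UTF-8
    for chunk in fold_bytes(line.encode("utf-8"), 4):
        print(chunk, "valid" if is_valid_utf8(chunk) else "INVALID")
    for piece in fold_chars(line, 4):
        print(piece)
```

With a width of 4, the byte-oriented version fits fewer characters on
each line than the width suggests and eventually splits a two-byte
character across a line break, which matches both symptoms in the
report.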

> 
> Oh, and by the way it would be cool if fold could automatically find out the
> terminal's width if stdout is a terminal.

Interesting idea.  However, it would require a command-line option,
since POSIX requires that fold use 80 columns by default.
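
As a rough sketch of what such opt-in behavior could look like (in
Python rather than C, and with a purely hypothetical `auto` switch that
does not correspond to any existing fold option): an explicit width
wins, the terminal is consulted only when the caller asks for it, and
everything else falls back to the 80 columns POSIX requires.

```python
import os
import sys

POSIX_DEFAULT_WIDTH = 80  # the default width POSIX requires for fold


def choose_width(explicit_width=None, auto=False, stream=None):
    """Pick a fold width; `auto` is a hypothetical opt-in switch."""
    if explicit_width is not None:
        return explicit_width
    if auto:
        stream = sys.stdout if stream is None else stream
        try:
            if stream.isatty():
                columns = os.get_terminal_size(stream.fileno()).columns
                if columns > 0:
                    return columns
        except (OSError, ValueError, AttributeError):
            pass  # not a real terminal; fall through to the default
    return POSIX_DEFAULT_WIDTH
```

Without the opt-in the result is always 80, so the default behavior
stays as POSIX specifies.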

-- 
Eric Blake



