bug-coreutils

bug#21395: Bug with cut and Spanish characters from text file with UTF-8


From: Michael Lee
Subject: bug#21395: Bug with cut and Spanish characters from text file with UTF-8 encoding
Date: Wed, 2 Sep 2015 00:41:09 +0000 (UTC)

To whom it may concern:

To preface the explanation of this possible bug, the following was tested:

The encoding of each Spanish text file was determined by opening it in vi and using ":set" to view the encoding.

Text files containing Spanish letters/characters were used in this test.  First, the locale in the bash shell was set to UTF-8 (the default on Ubuntu) and the first test file was encoded in Latin1.  Under these conditions, head and tail were used to try to output several Spanish letters with accents.  Running "head spanish.txt" and "tail spanish.txt" produced output with what appeared to be spaces in place of the accented letters.

After spanish.txt was converted from Latin1 to UTF-8 with iconv, the test was repeated with head and tail, and the output was correct: the Spanish letters displayed properly instead of as what had appeared to be blank spaces.  When the "cut" command was added to the pipeline, the blank spaces returned.

For example, "head -n 50 spanish.txt | cut -c 1" or "tail -n 50 spanish.txt | cut -c 1" prints the first character of each line, but shows a blank space wherever that character is an accented Spanish letter.  head or tail alone shows the Spanish letters correctly; adding cut does not.
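One possible explanation, offered as an assumption rather than a confirmed diagnosis: the GNU coreutils manual describes "cut -c" as currently behaving the same as "cut -b", so selecting "character 1" may really select byte 1. In UTF-8 an accented letter such as á takes two bytes (0xC3 0xA1), and the first byte alone is not a valid character, which a terminal may render as a blank or a replacement glyph. The sketch below makes the byte split visible with od. The helper name first_byte_hex is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print, in hex, what cut keeps when asked for the
# first byte of each input line (the kept byte plus the newline cut adds).
first_byte_hex() {
  LC_ALL=C cut -b 1 | od -An -tx1 | tr -d ' \n'
}

# '\303\241' is the UTF-8 encoding of "á", written as octal escapes so the
# script does not depend on the encoding of its own source file.
printf '\303\241rbol\n' | first_byte_hex
echo
# prints: c30a   (0xC3 is only the first half of "á"; 0x0A is the newline)
```

If the output of "cut -c 1" on the same input is also c30a on a given system, that would suggest cut is operating on bytes there.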

When cut is run as "cut -c 1" on a text file with Spanish characters, it does not display those characters.

For example, the character ã or á will not display if it is the first character and the file is trimmed using the cut command.

Converting the file from Latin1 to UTF-8 solved the problem with head and tail, but not cut.

The cut command does not seem to output the special letters/characters correctly.
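A possible workaround, sketched under the assumption that cut is selecting bytes and that every character in the file exists in Latin1: convert to a single-byte encoding so that one byte equals one character, cut, then convert back to UTF-8. The function name first_char_via_latin1 is hypothetical, and iconv will report an error on any character that Latin1 cannot represent.

```shell
#!/bin/sh
# Hypothetical helper: take the first character of each UTF-8 input line by
# round-tripping through ISO-8859-1, where each character is exactly one byte.
first_char_via_latin1() {
  iconv -f UTF-8 -t ISO-8859-1 \
    | LC_ALL=C cut -c 1 \
    | iconv -f ISO-8859-1 -t UTF-8
}

# Input lines are "árbol" and "ñandú", written as octal escapes for UTF-8.
printf '\303\241rbol\n\303\261and\303\272\n' | first_char_via_latin1
# On a UTF-8 terminal this should show "á" and "ñ" on separate lines.
```

This is not a fix for cut itself; it only sidesteps multibyte input for text that fits in Latin1.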

Is there an environment variable that could fix this or could it possibly be a bug?

Thank you for your time.

Sincerely,
Michael Lee

