


From: Thomas Schmitt
Subject: Re: [Libcdio-devel] How tolerant to be towards CD-TEXT character set mislabeling ?
Date: Mon, 29 Apr 2019 00:07:44 +0200

Hi,

Serge Pouliquen wrote:
> https://savannah.gnu.org/file/bulle_bob_plage_cd-info.txt?file_id=46851
> with CP1252
>  failing to display for track 8
>
> ++ WARN: Iconv failed: Invalid or incomplete multibyte or wide character

Uh, oh. Why is it talking about multibyte characters ?
Shouldn't CP1252 be a single-byte character set ?


>  track2 has an error for accent, correct title is    Tiens voilà la pelle
> https://savannah.gnu.org/file/bulle_bob_plage_cdrskin_output.txt?file_id=46853
>   2 : 80 01 02 0a  o  b 00  T  i  e  n  s     v  o  i d9 65
>   3 : 80 02 03 09  l 88     l  a     p  e  l  l  e 00 54 f2

So the a-accent-grave is encoded as 0x88.
That's not ISO-8859-1, where it would be 0xe0.
0x88 is in the unused area of ISO-8859.
In CP1252 it is Modifier Letter Circumflex Accent, U+02C6.

Hrmpf. That's a problem. The "incomplete character" is an accent which
lacks a base letter to sit on.

It looks like ISO-8859-1 does not have incomplete characters.
... and that you hit the only one in CP1252. Congrats !


>   29 : 8f 00 1d 00 00

It claims to be ISO-8859-1. But I could not find any character set
where a-accent-grave is 0x88. (I even looked at HP and IBM code pages.)


>  track7 has an error for accent, correct title is    Douce et salée
>   9 : 80 06 09 05     p  l  u  m  e 00  D  o  u  c  e 01 dc
>  10 : 80 07 0a 05     e  t     s  a  l 8e  e 00  I  l 3b 5f

0x8e for e-accent-aigu. Undefined in ISO-8859-1. In CP1252 it is Z-hacek, U+017D.

So the text encoding in this CD is hopelessly wrong.


But it shows that CP1252 might be too tolerant as a base assumption.
There are no incomplete characters in ASCII and ISO-8859-1. So the
conversion should not be able to throw this error.

So back to the original proposal:

          case CDTEXT_CHARCODE_ISO_8859_1:
            charset = (char *) "ISO-8859-1";
            break;
          case CDTEXT_CHARCODE_ASCII:
            /* ASCII is a subset of ISO-8859-1. Some CDs announce it but then
             * have 8-bit characters in their text. Trying ISO-8859-1 gives
             * more hope for a readable result than telling iconv to be picky.
             */
            charset = (char *) "ISO-8859-1";
            break;

Serge, may I impose another round of tests on you ?
You have the best ill CDs ever. :))

(The "Bulle et Bob" CD will of course not display any better than it
 already does with the unchanged character set choice.)


Have a nice day :)

Thomas



