Re: [Libcdio-devel] How tolerant to be towards CD-TEXT character set mis
From: |
Thomas Schmitt |
Subject: |
Re: [Libcdio-devel] How tolerant to be towards CD-TEXT character set mislabeling ? |
Date: |
Mon, 29 Apr 2019 00:07:44 +0200 |
Hi,
Serge Pouliquen wrote:
> https://savannah.gnu.org/file/bulle_bob_plage_cd-info.txt?file_id=46851
> with CP1252
> failing to display for track 8
>
> ++ WARN: Iconv failed: Invalid or incomplete multibyte or wide character
Uh, oh. Why is it talking of such characters?
Shouldn't CP1252 be a single-byte character set?
> track2 has an error for accent, corret title is Tiens voilà la pelle
> https://savannah.gnu.org/file/bulle_bob_plage_cdrskin_output.txt?file_id=46853
> 2 : 80 01 02 0a o b 00 T i e n s v o i d9 65
> 3 : 80 02 03 09 l 88 l a p e l l e 00 54 f2
So the a-accent-grave is encoded as 0x88.
That's not ISO-8859-1, where it would be 0xe0.
0x88 lies in the range 0x80 to 0x9f, which ISO-8859-1 leaves without
printable characters.
In CP1252 it is Modifier Letter Circumflex Accent, U+02C6.
Hrmpf. That's a problem. The "incomplete character" is presumably an accent
which misses an applicable main character.
It looks like ISO-8859-1 does not have incomplete characters.
... and that you hit the only one in CP1252. Congrats !
> 29 : 8f 00 1d 00 00
It pretends to be ISO-8859-1. But i could not find any character set
where a-accent-grave is 0x88. (I even looked for HP or IBM.)
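
To make the check above repeatable, here is a minimal sketch which counts the
bytes of a text payload that fall into 0x80 to 0x9f. The function name
count_c1_bytes() is made up for this mail and is not part of libcdio.

```c
#include <assert.h>
#include <stddef.h>

/* Count the bytes in the range 0x80..0x9f. ISO-8859-1 has no printable
 * characters there, so any hit hints at a mislabeled character set.
 * count_c1_bytes() is a hypothetical helper, not a libcdio function.
 */
static size_t count_c1_bytes(const unsigned char *text, size_t len)
{
    size_t i, hits = 0;

    assert(text != NULL);
    for (i = 0; i < len; i++) {
        if (text[i] >= 0x80 && text[i] <= 0x9f)
            hits++;
    }
    return hits;
}
```

Fed with the 12 payload bytes of pack 3 (assuming that the dump shows the
blanks of the title as empty columns), it counts one byte, the 0x88.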
> track7 has an error for accent, corret title is Douce et salée
> 9 : 80 06 09 05 p l u m e 00 D o u c e 01 dc
> 10 : 80 07 0a 05 e t s a l 8e e 00 I l 3b 5f
0x8e stands for e-accent-aigu here. In ISO-8859-1 it is undefined.
In CP1252 it is a Z-hacek, U+017D.
So the text encoding in this CD is hopelessly wrong.
But it shows that CP1252 might be too tolerant as a base assumption.
There are no incomplete characters in ASCII and ISO-8859-1. So the
conversion should not be able to throw this error.
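
Why the conversion cannot fail can be seen from a hand-written sketch: each
ISO-8859-1 byte is a Unicode code point of the same value, so each input byte
yields one or two UTF-8 bytes and never needs a follow-up byte. This is only
an illustration, not what iconv does internally. The name latin1_to_utf8() is
made up for this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Convert len bytes of ISO-8859-1 to UTF-8.
 * out must offer room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the trailing 0.
 * latin1_to_utf8() is a hypothetical helper, not a libcdio function.
 */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* 7-bit characters are the same in UTF-8 */
            out[o++] = c;
        } else {
            /* U+0080 to U+00FF need two bytes: 110000xx 10xxxxxx */
            out[o++] = (unsigned char) (0xc0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3f));
        }
    }
    out[o] = 0;
    return o;
}
```

There is no branch which reports an error. A byte like 0x88 just becomes the
control character U+0088 (bytes c2 88): not readable, but not a reason for
the conversion to give up.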
So back to the original proposal:
    case CDTEXT_CHARCODE_ISO_8859_1:
      charset = (char *) "ISO-8859-1";
      break;
    case CDTEXT_CHARCODE_ASCII:
      /* ASCII is a subset of ISO-8859-1. Some CDs announce it but then
       * have 8-bit characters in their text. Trying ISO-8859-1 gives
       * more hope for a readable result than telling iconv to be picky.
       */
      charset = (char *) "ISO-8859-1";
      break;
Serge, may i impose another round of tests on you ?
You have the best ill CDs ever. :))
(The "Bulle et Bob" CD will of course not be shown better than already
seen with unchanged character set choice.)
Have a nice day :)
Thomas