[Libcdio-devel] Joliet bug?


From: R. Bernstein
Subject: [Libcdio-devel] Joliet bug?
Date: Thu, 4 May 2006 19:50:48 -0400

I just had a chance to look at this Joliet problem in more detail.

First, I'd like to summarize to make sure I understand this correctly.

The problem observed is that the routine ucs2be_to_locale() in
iso9660_fs.c expects a stream of bytes that is UCS-2BE (Universal
Character Set, 16-bit, big-endian), but is probably getting something
else when it is called via

    iso9660_ifs_get_preparer_id(), 
    iso9660_ifs_get_publisher_id(),
    iso9660_ifs_get_volumeset_id(), or 
    iso9660_ifs_get_application_id().

However things are okay when ucs2be_to_locale() is called via 
    iso9660_ifs_get_system_id();
    iso9660_ifs_get_volume_id();

If this is correct, I'm not yet sure whether the problem lies in the
way the ISO 9660 image was created or (more likely) in libcdio's
interpretation of it. And to fix libcdio's interpretation, I guess I
need to understand in which way libcdio messed up.

For the ISO 9660 standard, I've been using the European draft ECMA 119
which is freely available at
http://www.ecma-international.org/publications/standards/Ecma-119.htm

However, the relevant portion is probably the excerpt below (where BP
refers to the byte position within the descriptor):

...
8.4.4 Unused Field (BP 8)
8.4.5 System Identifier (BP 9 to 40)
8.4.6 Volume Identifier (BP 41 to 72)
...
8.4.19 Volume Set Identifier (BP 191 to 318)
8.4.20 Publisher Identifier (BP 319 to 446)
8.4.21 Data Preparer Identifier (BP 447 to 574)
8.4.22 Application Identifier (BP 575 to 702)
8.4.23 Copyright File Identifier (BP 703 to 739)
8.4.24 Abstract File Identifier (BP 740 to 776)
8.4.25 Bibliographic File Identifier (BP 777 to 813)
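
The positions quoted above can be tabulated quickly. The sketch below
assumes that BP numbering starts at 1, which is how the ECMA-119 field
tables read (the Volume Descriptor Type sits at BP 1); under that
assumption an odd BP corresponds to an even zero-based offset into the
descriptor. The struct and function names are made up for illustration
and are not libcdio identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One descriptor field, given by its byte positions (BP) as quoted
   from ECMA-119.  ASSUMPTION: BP values are 1-based. */
typedef struct {
    const char *name;
    unsigned bp_first;
    unsigned bp_last;
} bp_field_t;

static const bp_field_t bp_fields[] = {
    { "System Identifier",          9,  40 },
    { "Volume Identifier",         41,  72 },
    { "Volume Set Identifier",    191, 318 },
    { "Publisher Identifier",     319, 446 },
    { "Data Preparer Identifier", 447, 574 },
    { "Application Identifier",   575, 702 },
};

#define BP_FIELD_COUNT (sizeof bp_fields / sizeof bp_fields[0])

/* Zero-based offset of the field's first byte. */
static unsigned bp_offset(const bp_field_t *f)
{
    assert(f != NULL && f->bp_first >= 1);
    return f->bp_first - 1;
}

/* Field length in bytes (both BP bounds are inclusive). */
static unsigned bp_length(const bp_field_t *f)
{
    assert(f != NULL && f->bp_last >= f->bp_first);
    return f->bp_last - f->bp_first + 1;
}

/* Print offset, length and 16-bit alignment for every field. */
static void bp_print_table(void)
{
    size_t i;
    for (i = 0; i < BP_FIELD_COUNT; i++) {
        const bp_field_t *f = &bp_fields[i];
        printf("%-26s offset %3u  length %3u  %s\n", f->name,
               bp_offset(f), bp_length(f),
               bp_offset(f) % 2 == 0 ? "even offset" : "odd offset");
    }
}
```

Calling bp_print_table() lists the six fields side by side. Under the
1-based assumption the offsets come out as 8, 40, 190, 318, 446 and 574,
so the fields that fail and the fields that work have the same parity.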

One thing I notice is that none of the fields are 16-bit word
aligned. That is, they all start on odd byte positions, even the ones
that work! However, notice that just before the "System Identifier"
there is an unused field which is generally 0. (Or that's what
iso9660.h says, even if it doesn't seem to be specified in the ECMA
119 excerpt above.)

So I guess I'm wondering whether a zero byte is presumed to be
inserted before the first byte, or (less likely, but I suppose
possible) whether the following byte is the "upper byte" of the 16-bit
word. (In other words, the values are really little-endian with
non-aligned 16-bit words.) Yet another possibility is that for those
odd-byte-aligned fields one goes back a byte, but I hope this is not
the case, since the previous byte almost always really belongs to some
other field.
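
To make these readings concrete, here is a minimal sketch that applies
them to the first eight bytes of the application-id dump quoted further
down in this message. The helper names are hypothetical and nothing here
is taken from iso9660_fs.c. The "go back a byte" reading is the same as
the "leading zero" one, except that the first high byte would come from
the preceding field instead of being assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First bytes of the application-id data quoted in this thread. */
static const uint8_t sample[] = {
    0x20, 0x00, 0x4b, 0x00, 0x33, 0x00, 0x42, 0x00
};

/* Plain big-endian read, built byte by byte (no alignment needed). */
static uint16_t read_u16be(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Little-endian reading: the following byte is the upper byte. */
static uint16_t read_u16le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* "Leading zero" reading: pretend a 0 byte precedes buf[0], so code
   unit i is made of (buf[2*i - 1], buf[2*i]), big-endian. */
static uint16_t unit_with_leading_zero(const uint8_t *buf, size_t i)
{
    uint8_t hi, lo;
    assert(buf != NULL);
    hi = (i == 0) ? 0 : buf[2 * i - 1];
    lo = buf[2 * i];
    return (uint16_t)((hi << 8) | lo);
}
```

On this sample the plain big-endian read yields 0x2000 and 0x4b00,
while both the little-endian read and the leading-zero read yield
0x0020 (' ') and 0x004b ('K'), so these bytes alone cannot tell those
two readings apart.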

Of less concern, but still of some importance, is whether iconv() cares
about 16-bit alignment and/or whether the replacement code does,
because, as mentioned above, the data here doesn't seem to be 16-bit
aligned.
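
One way to take alignment out of the picture is to copy the field into
an aligned scratch buffer before converting. The following is only a
sketch under assumptions: it uses the standard iconv_open(), iconv() and
iconv_close() calls, it assumes the local iconv knows the encoding name
"UCS-2BE" (glibc and GNU libiconv do), and the function name is
hypothetical. Whether a given iconv() actually requires aligned input is
not something this sketch settles; the copy just sidesteps the question.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert n_in bytes of UCS-2BE found at src (any alignment) to a
   NUL-terminated UTF-8 string in dst.  Returns the number of bytes
   written, not counting the NUL, or -1 on any error. */
static long ucs2be_bytes_to_utf8(const unsigned char *src, size_t n_in,
                                 char *dst, size_t dst_size)
{
    uint16_t scratch[128];          /* 16-bit aligned copy of the input */
    char *in = (char *)scratch;
    char *out = dst;
    size_t in_left = n_in;
    size_t out_left;
    size_t rc;
    iconv_t cd;

    assert(src != NULL && dst != NULL);
    if (n_in > sizeof scratch || dst_size == 0)
        return -1;
    memcpy(scratch, src, n_in);     /* memcpy has no alignment demands */

    cd = iconv_open("UTF-8", "UCS-2BE");
    if (cd == (iconv_t)-1)
        return -1;

    out_left = dst_size - 1;        /* keep room for the trailing NUL */
    rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return (long)(out - dst);
}
```

The 128-element scratch buffer is sized for the largest fields discussed
here (128 bytes) with room to spare; a caller with longer input would
need to convert in chunks.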

I'll try to do some more digging into the iconv() specs and into K3b.
Perhaps I'll send a query to the K3b folks, since my guess is that K3b
was the ISO 9660 creator that was used. (Of course I welcome thoughts,
comments, and guidance from others.)


Thanks for investigating and reporting the problem.

address@hidden writes:
 > Hi,
 > 
 > I worked on UTF-8 support and stumbled across the following issue:
 > In iso9660_fs.c I replaced the function ucs2be_to_locale()
 > by a new and more generic cdio_charset_to_utf8().
 > 
 > While trying to figure out why my routine fails, I found that the
 > original version also fails. Digging deeper into it, it seems that
 > there is a +/- 1 byte offset when reading several strings.
 > 
 > When called by iso9660_ifs_get_application_id(), ucs2be_to_locale() gets
 > the following data:
 > 
 > 20 00 4b 00 33 00 42 00 20 00 54 00 48 00 45 00  .K.3.B. .T.H.E.
 > 20 00 43 00 44 00 20 00 4b 00 52 00 45 00 41 00  .C.D. .K.R.E.A.
 > 54 00 4f 00 52 00 20 00 28 00 43 00 29 00 20 00 T.O.R. .(.C.). .
 > 31 00 39 00 39 00 38 00 2d 00 32 00 30 00 30 00 1.9.9.8.-.2.0.0.
 > 35 00 20 00 53 00 45 00 42 00 41 00 53 00 54 00 5. .S.E.B.A.S.T.
 > 49 00 41 00 4e 00 20 00 54 00 52 00 55 00 45 00 I.A.N. .T.R.U.E.
 > 47 00 20 00 41 00 4e 00 44 00 20 00 54 00 48 00 G. .A.N.D. .T.H.
 > 45 00 20 00 4b 00 33 00 42 00 20 00 54 00 45 00 E. .K.3.B. .T.E.
 > 
 > Now my knowledge about iso9660 is near zero, but I know for sure that
 > the above sequence is not UCS-2BE. In big-endian, a space ' ' would be
 > 0x00 0x20 instead of 0x20 0x00.
 > The same issue seems to be in the functions:
 > 
 > iso9660_ifs_get_preparer_id();
 > iso9660_ifs_get_publisher_id();
 > iso9660_ifs_get_volumeset_id();
 > 
 > The following functions work here:
 > 
 > iso9660_ifs_get_system_id();
 > iso9660_ifs_get_volume_id();
 > 
 > In iso-info, these bugs don't show up because the respective strings
 > are either not shown or they come from "somewhere else".
 > 
 > Can anyone help here?
 > My UTF-8 patch is practically finished but I would like to get these
 > issues resolved.
 > 
 > Thanks
 > 
 > Burkhard



