libcdio-devel

Re: [Libcdio-devel] Iconv usage and string handling


From: Peter Creath
Subject: Re: [Libcdio-devel] Iconv usage and string handling
Date: Wed, 26 Apr 2006 10:46:01 -0400

Yeah, after a little bit of digging into UTF-8, it looks like actual
type-safety was specifically avoided to allow for maximum
backwards-compatibility.  And, as you pointed out, a char typedef
doesn't make the compiler bark.

Given the above, I think it's a good idea to use that typedef
("typedef char utf8").  It just keeps people conscious of the fact
that we're dealing with a utf8 string and not ASCII.  After all, you
know how much developers like to read documentation.  :)

That way any developer working with libcdio can immediately see from
the header file that he's not necessarily getting ASCII.  And he'll
have some idea why he's getting slightly odd behavior from the string
routines.
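
A minimal sketch of what I mean (get_disc_title and byte_length are
made-up names for illustration, not libcdio functions).  The typedef is
only an alias, so the compiler accepts a utf8 pointer anywhere a char
pointer is expected, but the prototype still tells the reader what
encoding to expect:

```c
#include <string.h>

/* Alias only: the compiler treats utf8 exactly like char. */
typedef char utf8;

/* Hypothetical accessor, for illustration only (not a libcdio call).
 * Returns "Café" encoded as UTF-8: the é takes two bytes (0xC3 0xA9). */
static const utf8 *get_disc_title(void)
{
    return "Caf\xc3\xa9";
}

/* An ordinary char-based routine.  Passing a utf8 pointer needs no cast
 * and produces no warning, because utf8 and char are the same type. */
static size_t byte_length(const char *s)
{
    return strlen(s);
}
```

Calling byte_length(get_disc_title()) compiles cleanly and counts bytes,
not characters, which is exactly the "slightly odd behavior" the typedef
is meant to hint at.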

I agree with you that strlen's behavior (counting bytes rather than
characters), while not exactly "standard", is probably how it's
generally used.  The harder trick with UTF-8 is finding the byte
position of the Nth character (since individual characters are encoded
with varying numbers of bytes).
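
For what it's worth, here is a rough sketch of that lookup, assuming
the input is valid, NUL-terminated UTF-8.  The names utf8_char_count and
utf8_byte_offset are hypothetical, and "character" here means a Unicode
code point, so combining sequences that render as one glyph still count
separately.  The trick is that continuation bytes always look like
10xxxxxx, so every other byte starts a new code point:

```c
#include <stddef.h>
#include <string.h>

typedef char utf8;

/* Count code points: every byte that is NOT a continuation byte
 * (bit pattern 10xxxxxx) starts a new one.  Assumes valid UTF-8. */
static size_t utf8_char_count(const utf8 *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Byte offset of the n-th (0-based) code point in s, or (size_t)-1
 * if the string holds n or fewer code points.  Assumes valid UTF-8. */
static size_t utf8_byte_offset(const utf8 *s, size_t n)
{
    size_t i;
    for (i = 0; s[i] != '\0'; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return (size_t)-1;
}
```

Both routines walk the string from the start, so the lookup is linear
in the string length, unlike indexing into plain ASCII.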

On 4/25/06, Burkhard Plaum <address@hidden> wrote:
> I can't think of many situations anywhere near the libcdio usage,
> where the number of characters actually matters, except text rendering
> in GUI toolkits and the stdio implementation in glibc.
>
> Another issue is alphabetic sorting, but
> AFAIK this is strongly locale dependent anyway.

Agreed.

> They document that xmlChar must be UTF-8 and then have xmlStrcat(),
> xmlStrdup() etc. IMO completely unnecessary.

I agree that it's not necessary here, but I can see why they might
want to do that, to make all the string utilities look alike.  E.g.
"xmlStrchr" would probably be UTF-8 aware, and rather than have some
calls to strlen() and others to xmlStrchr(), they just force the xml-
prefix on everything.
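
To illustrate what a UTF-8 aware character search involves (a sketch
only; utf8_find_char is a made-up name, not a libxml2 or libcdio call):
strchr() takes a single byte value, so a multi-byte character has to be
searched for as a byte sequence, e.g. with strstr().  This assumes both
arguments are valid UTF-8, in which case a match can never start in the
middle of another character:

```c
#include <string.h>

typedef char utf8;

/* Find the first occurrence of one UTF-8 encoded character in s.
 * ch is the character as a NUL-terminated byte sequence (1 to 4 bytes).
 * Returns a pointer into s, or NULL if the character is not present. */
static const utf8 *utf8_find_char(const utf8 *s, const utf8 *ch)
{
    return strstr(s, ch);
}
```

For pure ASCII characters plain strchr() still works on a UTF-8 string,
since ASCII bytes never appear inside a multi-byte sequence.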

> I won't fight to the death over this issue, but I would like to see a
> case where a dedicated utf8 datatype is absolutely necessary. Otherwise,
> I prefer to keep things simple.

It's not strictly necessary, it just makes the code more self-documenting.

    -P



