bug-gnulib

Re: strstr, strcase, strcasestr, and i18n


From: Bruno Haible
Subject: Re: strstr, strcase, strcasestr, and i18n
Date: Sun, 4 Feb 2007 15:28:40 +0100
User-agent: KMail/1.5.4

Paul Eggert wrote:
> >   - strstr: This function's behaviour is not clearly defined. POSIX says
> >     that it compares a "string" with a "sequence of bytes". Which a priori
> >     is nonsense, since the elements of strings are characters.
> 
> No, elements of "character strings" are characters.  Elements of "strings"
> are bytes.  See:
> 
> http://www.opengroup.org/susv3/basedefs/xbd_chap03.html#tag_03_92
> http://www.opengroup.org/susv3/basedefs/xbd_chap03.html#tag_03_367

It's hard to know POSIX as well as you do :-)

> So strstr's behavior is clearly defined: it operates on strings (i.e.,
> byte strings), not character strings.

Indeed. And strstr cannot be specified to consider "character strings",
without breaking backward compatibility :-(
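
To illustrate the byte/character distinction, here is a minimal sketch in C.
It is not the gnulib implementation: toy_char_len and toy_mbsstr are
hypothetical names, and the encoding rule is a simplified Shift_JIS-like
assumption (bytes 0x81..0x9F and 0xE0..0xFC start a two-byte character),
chosen so that the example does not depend on an installed locale. A real
implementation would presumably iterate with the locale's own multibyte
decoder instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the character starting at p,
   under a simplified Shift_JIS-like rule (lead bytes 0x81..0x9F and
   0xE0..0xFC start a two-byte character).  Not a gnulib function. */
static size_t toy_char_len(const char *p)
{
    unsigned char c = (unsigned char) p[0];
    int is_lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
    /* A lead byte directly followed by NUL counts as one (invalid) byte. */
    return (is_lead && p[1] != '\0') ? 2 : 1;
}

/* Hypothetical character-aware search: like strstr, but it only accepts
   matches that begin on a character boundary of the haystack. */
static const char *toy_mbsstr(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t nlen = strlen(needle);
    for (const char *p = haystack; ; p += toy_char_len(p)) {
        if (strncmp(p, needle, nlen) == 0)
            return p;           /* match starts on a character boundary */
        if (*p == '\0')
            return NULL;        /* reached the end without a match */
    }
}
```

With the haystack "\x95\x5C" "abc" (one two-byte character whose second byte
is 0x5C, i.e. an ASCII backslash, followed by "abc"), strstr reports a match
for the needle "\\a" at byte offset 1, in the middle of the two-byte
character. The character-aware sketch returns NULL for the same needle and
still finds "abc" at offset 2.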

> > It was tempting to make a clear API nomenclature: c-str* for the C locale
> > emulation, str* for the internationalized functions. But if you're right
> > with strstr, then we should find new names for the internationalized
> > versions of these functions.
> 
> I think we have to find new names, yes.

Yup. It appears that Microsoft did their homework regarding str* functions
and multibyte strings, while the ISO C and POSIX communities didn't. I'll be
adding the following functions to gnulib, attempting to fix the hole that
ISO C and POSIX left.

  mbschr      like strchr
  mbsrchr     like strrchr
  mbsstr      like strstr
  mbscasecmp  like strcasecmp
  mbscasestr  like strcasestr
  mbscspn     like strcspn
  mbspbrk     like strpbrk
  mbsspn      like strspn
  mbstok_r    like strtok_r

The prefix "mbs" follows the precedent of "mbswidth" in gnulib and of
"mbspbrk" and "mbsrchr" on HP-UX.

It does not conflict with the Microsoft names, since Microsoft uses "_mbs",
but the functions have the same calling convention as Microsoft's functions,
except that MS uses 'unsigned char *' as multibyte string type.

Bruno




