pspp-dev

Re: [bug #15820] Can not read sav file


From: John Darrington
Subject: Re: [bug #15820] Can not read sav file
Date: Wed, 22 Feb 2006 09:10:08 +0800
User-agent: Mutt/1.5.9i

On Tue, Feb 21, 2006 at 02:57:30PM -0800, Ben Pfaff wrote:
     
     Follow-up Comment #2, bug #15820 (project pspp):
     
     > PSPP needs to set and respect the LC_CTYPE locale category.
     
     I'm a little nervous about doing that, because not every usage of
     the ctype functions should honor LC_CTYPE.  If we start setting
     LC_CTYPE, we need to audit for that and change the ones that do
     not want to honor LC_CTYPE to use the gnulib modules designed for
     the purpose, such as c-ctype and c-strcase.

Well, perhaps we need to consider it more carefully.

Since we have absolutely no idea of the locale in which a system file
was created, I think we should simply take it on trust that the
variable names and strings within a file are valid ones.  Thus lines
such as 

      /* Copy first character of variable name. */
      if (!isalpha ((unsigned char) sv.name[0])
          && sv.name[0] != '@' && sv.name[0] != '#')
        lose ((ME, _("%s: position %d: Variable name begins with invalid "
                     "character."),
               fh_get_filename (r->fh), i));
      if (islower ((unsigned char) sv.name[0]))
        msg (MW, _("%s: position %d: Variable name begins with lowercase letter "
                   "%c."),


need to be excised from sfm-read.c.  We don't know the locale in
which the file was written, so we don't know how isalpha, islower etc.
ought to behave when reading.  Similarly, I think that sfm-write should
not use any ctype functions either.  Let's just assume that the
dictionary and casefiles are valid.


Instead, let's do all that sort of checking in the lexer and the
output routines.  Thus,

 DATA LIST LIST /Äpfel *.

will give an error (or perhaps just a warning) in the default "C"
locale, but continue happily if the LC_CTYPE locale has been set to,
say, "de_DE".  Similarly, if I generate output from a system file which
was created in the "de_DE" locale, but my current locale is "en_US",
then the output routine will generate a warning when it encounters a
variable name for which isalpha returns false.
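
To make the locale dependence concrete, here is a minimal sketch of
the kind of check the lexer could do.  The helper names is_id_start
and id_start_in_locale are made up for illustration and are not PSPP
code; the sketch also assumes the name is in a single-byte encoding,
so that 0xC4 is "Ä" as in ISO-8859-1.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not PSPP code): nonzero if BYTE may begin a
   variable name under the LC_CTYPE locale currently in effect. */
static int
is_id_start (unsigned char byte)
{
  return isalpha (byte) != 0 || byte == '@' || byte == '#';
}

/* Hypothetical helper: switches LC_CTYPE to LOCALE_NAME, then tests
   BYTE.  Returns 1 or 0, or -1 if the locale is not available on
   this system.  Note that the locale stays switched afterwards. */
static int
id_start_in_locale (const char *locale_name, unsigned char byte)
{
  assert (locale_name != NULL);
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  return is_id_start (byte);
}
```

In the "C" locale, id_start_in_locale ("C", 0xC4) yields 0, so "Äpfel"
would be rejected.  With a Latin-1 German locale (the exact name, e.g.
"de_DE.ISO-8859-1", varies between systems and it may not be installed
at all) I would expect the same call to yield 1.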


So you're probably right: we'd need to audit the code for files which
currently use ctype (I had a look; it's about 12 files) and decide
whether they really should honour LC_CTYPE.  Based on the above, once
we've finished the current directory reorganisation, the decision rule
will probably be:

src/data, src/libpspp   : Does not honour LC_CTYPE.
src/language/*, src/ui  : Honours LC_CTYPE.
src/math                : Probably a mistake if anything here uses ctype.
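
For the directories that should not honour LC_CTYPE, the tests have to
give the same answer whatever the locale is.  gnulib's c-ctype module
provides c_isalpha, c_islower and friends for exactly this purpose.
The sketch below is only a hand-written stand-in to show the idea;
the names ascii_isalpha and ascii_islower are hypothetical, and it
assumes an ASCII-compatible execution character set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a locale-independent test: true only for
   the 52 ASCII letters, whatever LC_CTYPE is set to.  Assumes an
   ASCII-compatible execution character set. */
static bool
ascii_isalpha (int c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Likewise for lowercase: true only for 'a' through 'z'. */
static bool
ascii_islower (int c)
{
  return c >= 'a' && c <= 'z';
}
```

Neither function consults the locale, so a byte such as 0xC4 is never
treated as a letter here, even after setlocale has been called.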

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

