Re: [bug #15820] Can not read sav file


From: Ben Pfaff
Subject: Re: [bug #15820] Can not read sav file
Date: Thu, 23 Feb 2006 13:39:38 -0800
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

John Darrington <address@hidden> writes:

> Since we have absolutely no idea of the locale in which a system file
> was created, I think we should simply take it on trust that the
> variable names and strings within a file are valid ones.  

Do you think we can assume that variable names are encoded in
UTF-8?  Then it is fairly easy to convert variable names to/from
the current locale on system file input/output.
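
A minimal sketch of that conversion, assuming (and this is only an
assumption, not something established about system files) that names
are stored as UTF-8.  It uses POSIX iconv(); the function name
recode_name and its interface are made up for illustration and are
not existing PSPP code.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the null-terminated string IN from
   encoding FROM to encoding TO, storing a null-terminated result in
   OUT, which has room for OUT_SIZE bytes.
   Returns 0 on success, -1 if the encodings are unsupported, the
   input is invalid in FROM, or OUT is too small. */
int
recode_name (const char *from, const char *to,
             const char *in, char *out, size_t out_size)
{
  iconv_t cd;
  char *in_p = (char *) in;     /* iconv() wants a non-const pointer. */
  size_t in_left;
  char *out_p = out;
  size_t out_left;
  size_t rc;

  assert (in != NULL && out != NULL);
  if (out_size == 0)
    return -1;
  in_left = strlen (in);
  out_left = out_size - 1;      /* Reserve room for the terminator. */

  cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  rc = iconv (cd, &in_p, &in_left, &out_p, &out_left);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &out_p, &out_left); /* Flush state. */
  iconv_close (cd);

  if (rc == (size_t) -1)
    return -1;
  *out_p = '\0';
  return 0;
}
```

On input, TO would be the current locale's encoding, which
nl_langinfo (CODESET) reports after setlocale (LC_CTYPE, "") has been
called; on output the two encodings would be swapped.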

I have not experimented with non-ASCII variable names in SPSS.  A
few experiments might turn up the encoding.

> Thus lines such as
[...]
> need to be excised from sfm-read.c --- we don't know the locale in
> which the file was written, so we don't know how isalpha/islower etc ought
> to behave when reading.  

I think it'd still be a good idea to sanity-check variable names,
assuming that we can figure out the variable name encoding used
in system files.
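
If the encoding did turn out to be UTF-8 (again, only an assumption
here), one locale-independent sanity check would be to reject names
that are not well-formed UTF-8.  The sketch below shows such a check
without any ctype calls; utf8_is_valid is a hypothetical name, not a
function in sfm-read.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the N bytes at S form
   well-formed UTF-8 (no stray continuation bytes, no truncated or
   overlong sequences, no surrogates, nothing above U+10FFFF). */
bool
utf8_is_valid (const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  assert (s != NULL);
  while (i < n)
    {
      unsigned char c = p[i];
      unsigned long cp, min;
      size_t len, k;

      if (c < 0x80)
        {
          i++;                  /* Plain ASCII byte. */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        len = 2, cp = c & 0x1F, min = 0x80;
      else if ((c & 0xF0) == 0xE0)
        len = 3, cp = c & 0x0F, min = 0x800;
      else if ((c & 0xF8) == 0xF0)
        len = 4, cp = c & 0x07, min = 0x10000;
      else
        return false;           /* Stray continuation or bad lead byte. */

      if (n - i < len)
        return false;           /* Sequence truncated. */
      for (k = 1; k < len; k++)
        {
          if ((p[i + k] & 0xC0) != 0x80)
            return false;       /* Not a continuation byte. */
          cp = (cp << 6) | (p[i + k] & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;           /* Overlong, out of range, or surrogate. */
      i += len;
    }
  return true;
}
```

With this check, "Äpfel" stored as UTF-8 (bytes C3 84 70 66 65 6C)
passes, while the same name stored as ISO-8859-1 (C4 70 66 65 6C)
fails, so a wrong guess about the encoding would at least be
detectable for names with non-ASCII characters.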

> Similarly, I think that sfm-write should also not use any
> ctype functions. Let's just assume that the dictionary and
> casefiles are valid ones.

I don't think sfm-write validates anything in the dictionary
currently.

> Instead, let's do all that sort of checking in the lexer, and the
> output routines.  Thus, 
>
>  DATA LIST LIST /Äpfel *.
>
> Will give an error (or perhaps just a warning) in the default "C"
> locale, but continue happily if the LC_CTYPE locale has been set to
> say "de_DE".  Similarly, if I generate output from a system file which
> was created in the "de_DE" locale, but my current locale is "en_US",
> then the output routine will generate a warning when it encounters a
> variable name for which isalpha returns false.
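
The locale dependence described in the quoted text can be seen in a
small sketch like the following.  It only inspects the first byte of
a name, so it is meaningful only for single-byte encodings such as
ISO-8859-1; both function names are hypothetical and not part of
PSPP.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if the first byte of NAME is
   alphabetic according to the current LC_CTYPE locale. */
bool
first_byte_is_alpha (const char *name)
{
  assert (name != NULL);
  /* The cast avoids undefined behavior for bytes >= 0x80 on
     platforms where plain char is signed. */
  return name[0] != '\0' && isalpha ((unsigned char) name[0]) != 0;
}

/* Hypothetical helper: switches LC_CTYPE to LOCALE_NAME and repeats
   the check.  Returns 1 or 0 for the result, or -1 if the locale is
   not available on this system.  The locale is left changed. */
int
name_starts_alpha_in_locale (const char *locale_name, const char *name)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  return first_byte_is_alpha (name) ? 1 : 0;
}
```

In the "C" locale only the 26 unaccented letters in each case count
as alphabetic, so a name starting with byte 0xC4 (Ä in ISO-8859-1) is
rejected there.  In a German locale with a single-byte encoding the
same byte would normally be accepted, but which locale names are
installed varies by system.  In a UTF-8 locale isalpha sees only the
individual bytes of a multibyte character, so this byte-wise approach
does not carry over.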

Is that the way that other languages with support for
internationalization parse variable names?  e.g. how does Java
work?  I must admit that I have a pretty weak grasp of how this
sort of thing is supposed to work.

> So you're probably right, we'd need to audit the code for files which
> currently use ctype (I had a look, it's about 12 files), and decide
> whether they really should honour LC_CTYPE.  [...]
-- 
"To the engineer, the world is a toy box full of sub-optimized and
 feature-poor toys."
--Scott Adams



