pspp-dev


Re: [bug #15820] Can not read sav file


From: Ben Pfaff
Subject: Re: [bug #15820] Can not read sav file
Date: Thu, 23 Feb 2006 21:19:56 -0800
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

John Darrington <address@hidden> writes:

> Reading between the lines in the SPSS documentation, it seems to
> suggest that the encoding is that of the environment of the machine
> which created it.

That's an unpleasant situation, in my opinion.

The SPSS documentation implies that SPSS uses something like
UTF-16 for variable names in certain cases.  At any rate it
mentions a 32-character limit for "double-byte character sets".

>      I think it'd still be a good idea to sanity-check variable names,
>      assuming that we can figure out the variable name encoding used
>      in system files.
>
> It would be nice, but in view of the above, I don't think we know what
> "sane"  is.  We just have to presume sanity unless proved otherwise.

We have the opportunity to translate between character sets,
using iconv, if we can figure out what character sets to
translate between.
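
A minimal sketch of that kind of translation, using Python's codec
machinery as a stand-in for iconv.  The function name and the sample
variable name are hypothetical, and it assumes both encodings are
already known:

```python
def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from src to dst, failing loudly on bad input."""
    # Strict error handling: bytes that are invalid in src raise
    # UnicodeDecodeError instead of being silently replaced.
    text = raw.decode(src, errors="strict")
    return text.encode(dst, errors="strict")


# "ålder" as it would be stored by an ISO-8859-1 environment.
name_latin1 = b"\xe5lder"
name_utf8 = transcode(name_latin1, "iso-8859-1", "utf-8")
```

The hard part is still choosing src: the same bytes decode without
error under many single-byte encodings, so a wrong guess is not
always detectable.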

We could assume that files we read or write are in the current
locale.  We could also add an "encoding" option to the SAVE and GET
commands, to allow translating to/from foreign locales.
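
Roughly, the rule would be "explicit option wins, otherwise fall back
to the locale".  A sketch in Python, where resolve_encoding is a
hypothetical helper and the option argument stands for whatever value
such an "encoding" option would carry:

```python
import codecs
import locale


def resolve_encoding(option=None):
    """Pick the encoding for a file: explicit option, else the locale's."""
    if option is not None:
        # codecs.lookup raises LookupError for an unknown encoding name,
        # so a mistyped option is rejected up front.
        return codecs.lookup(option).name
    # No option given: assume the file is in the current locale's encoding.
    return codecs.lookup(locale.getpreferredencoding(False)).name
```

Normalizing through codecs.lookup means aliases such as "latin1" and
"ISO-8859-1" resolve to the same codec name.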

>      Is that the way that other languages with support for
>      internationalization parse variable names?  e.g. how does Java
>      work?  I must admit that I have a pretty weak grasp of how this
>      sort of thing is supposed to work.
>
> Most languages that I've encountered insist on ASCII for identifiers.
> The only exception I know of is TeX, which allows one to change it at
> will.

C99 adds support for internationalized identifiers.  It has a
whole appendix listing the characters that are allowed in
identifiers.  I think Java has the same features.
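
Once the encoding is known, a sanity check on names becomes possible.
The sketch below borrows Python's own identifier rule
(str.isidentifier) purely as a stand-in.  It is not the rule for
system file variable names, and looks_sane is a hypothetical helper:

```python
def looks_sane(raw: bytes, encoding: str) -> bool:
    """Decode a raw name and test it against Python's identifier rule."""
    try:
        name = raw.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not even valid in the assumed encoding.
        return False
    # Stand-in rule: accepts non-ASCII letters, rejects a leading
    # digit and the empty string.
    return name.isidentifier()
```

With this rule, b"\xe5lder" passes when decoded as ISO-8859-1 but
fails as UTF-8, because the byte sequence is not valid UTF-8.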

>      I found out what GCC does.  It assumes input files are in the
>      locale's character set, or UTF-8 if there's no locale, and
>      there's a command line option to override.  Maybe we should do
>      the same.
>
> Seems to be reasonable except that I don't see how there can be "no
> locale" on any *NIX system.  I suppose that section of the gcc manual
> is just describing what the code does if setlocale(LC_CTYPE, 0) returns
> NULL.

I was assuming it meant what happened in the C locale.
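
For illustration, here is what a pure query reports in the C locale.
This uses Python's locale module, where passing None plays the role
of the null argument in setlocale(LC_CTYPE, 0); the variable names
are mine:

```python
import locale

# Passing None only queries the current setting; it changes nothing.
previous = locale.setlocale(locale.LC_CTYPE, None)

# Switch to the C locale and query again.
locale.setlocale(locale.LC_CTYPE, "C")
in_c_locale = locale.setlocale(locale.LC_CTYPE, None)

# Restore whatever was in effect before.
locale.setlocale(locale.LC_CTYPE, previous)
```

The query in the C locale yields the string "C" rather than nothing,
which fits reading "no locale" as "the C locale".
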
-- 
"I consider that the golden rule requires that if I like a program
 I must share it with other people who like it."
--Richard Stallman



