pspp-dev

Re: fix for reading funny compressed data, for review


From: Ben Pfaff
Subject: Re: fix for reading funny compressed data, for review
Date: Thu, 15 Oct 2009 21:12:54 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Thanks.  I pushed it out.

John Darrington <address@hidden> writes:

> It seems reasonable to me.
>
> J'
>
> On Wed, Oct 14, 2009 at 09:44:52PM -0700, Ben Pfaff wrote:
>      I'd like to push this to the stable branch.  Comments
>      appreciated.
>      
>      commit e624e2da6ea68d22e6d4fba4eaa96d37d07a6730
>      Author: Ben Pfaff <address@hidden>
>      Date:   Wed Oct 14 21:20:44 2009 -0700
>      
>          sys-file-reader: Tolerate nonsensical opcodes in compressed data.
>          
>          Compressed data in .sav files uses a set of 256 opcodes, some of which make
>          sense only for numeric data and others of which only make sense for string
>          data.  However, Jereme Thomas <address@hidden> has provided one
>          file, written by SPSS 14, that uses an opcode that seems to make sense
>          only for numeric data in a string field.  So this commit adds support for
>          these opcodes, although it still warns about the ones other than the exact
>          one found in the file provided by Jereme.
>      
>      diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
>      index 70fa385..b1be385 100644
>      --- a/doc/dev/system-file-format.texi
>      +++ b/doc/dev/system-file-format.texi
>      @@ -884,6 +884,9 @@ value @var{code} - @var{bias}, where
>       variable @code{bias} from the file header.  For example,
>       code 105 with bias 100.0 (the normal value) indicates a numeric variable
>       of value 5.
>      +One file has been seen written by SPSS 14 that contained such a code
>      +in a @emph{string} field with the value 0 (after the bias is
>      +subtracted) as a way of encoding null bytes.
>       
>       @item 252
>       End of file.  This code may or may not appear at the end of the data
>      diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c
>      index fe7b533..8d973e4 100644
>      --- a/src/data/sys-file-reader.c
>      +++ b/src/data/sys-file-reader.c
>      @@ -86,6 +86,7 @@ struct sfm_reader
>           double bias;                /* Compression bias, usually 100.0. */
>           uint8_t opcodes[8];         /* Current block of opcodes. */
>           size_t opcode_idx;          /* Next opcode to interpret, 8 if none left. */
>      +    bool corruption_warning;    /* Warned about possible corruption? */
>         };
>       
>       static const struct casereader_class sys_file_casereader_class;
>      @@ -192,6 +193,7 @@ sfm_open_reader (struct file_handle *fh, struct dictionary **dict,
>         r->oct_cnt = 0;
>         r->has_long_var_names = false;
>         r->opcode_idx = sizeof r->opcodes;
>      +  r->corruption_warning = false;
>       
>         /* TRANSLATORS: this fragment will be interpolated into
>            messages in fh_lock() that identify types of files. */
>      @@ -1374,7 +1376,14 @@ read_compressed_number (struct sfm_reader *r, double *d)
>             break;
>       
>           case 254:
>      -      sys_error (r, _("Compressed data is corrupt."));
>      +      float_convert (r->float_format, "        ", FLOAT_NATIVE_DOUBLE, d);
>      +      if (!r->corruption_warning)
>      +        {
>      +          r->corruption_warning = true;
>      +          sys_warn (r, _("Possible compressed data corruption: "
>      +                         "compressed spaces appear in numeric field."));
>      +        }
>      +      break;
>       
>           case 255:
>             *d = SYSMIS;
>      @@ -1395,7 +1404,8 @@ read_compressed_number (struct sfm_reader *r, double *d)
>       static bool
>       read_compressed_string (struct sfm_reader *r, char *dst)
>       {
>      -  switch (read_opcode (r))
>      +  int opcode = read_opcode (r);
>      +  switch (opcode)
>           {
>           case -1:
>           case 252:
>      @@ -1410,7 +1420,25 @@ read_compressed_string (struct sfm_reader *r, char *dst)
>             break;
>       
>           default:
>      -      sys_error (r, _("Compressed data is corrupt."));
>      +      {
>      +        double value = opcode - r->bias;
>      +        float_convert (FLOAT_NATIVE_DOUBLE, &value, r->float_format, dst);
>      +        if (value == 0.0)
>      +          {
>      +            /* This has actually been seen "in the wild".  The submitter of the
>      +               file that showed that the contents decoded as spaces, but they
>      +               were at the end of the field so it's possible that the null
>      +               bytes just acted as null terminators. */
>      +          }
>      +        else if (!r->corruption_warning)
>      +          {
>      +            r->corruption_warning = true;
>      +            sys_warn (r, _("Possible compressed data corruption: "
>      +                           "string contains compressed integer (opcode %d)"),
>      +                      opcode);
>      +          }
>      +      }
>      +      break;
>           }
>       
>         return true;
>      
>      -- 
>      Peter Seebach on managing engineers:
>      "It's like herding cats, only most of the engineers are already
>       sick of laser pointers."
>      
>      
>      _______________________________________________
>      pspp-dev mailing list
>      address@hidden
>      http://lists.gnu.org/mailman/listinfo/pspp-dev
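
For anyone reading the patch without the tree at hand, here is a
minimal sketch of what the new default case in read_compressed_string()
does with a "numeric" opcode.  It is not the PSPP code: the function
name decode_string_opcode is made up for illustration, and it assumes
the file's float format is the native double, so a plain memcpy stands
in for float_convert().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The sketch only makes sense if a double fills one 8-byte string
   segment. */
_Static_assert (sizeof (double) == 8, "expected 8-byte double");

/* Hypothetical stand-in for the default case of
   read_compressed_string().  Stores the 8-byte representation of
   OPCODE - BIAS in DST.  Assumes the file uses the native double
   format (the real code calls float_convert() instead of memcpy).
   Returns true for the benign case seen in the wild, in which the
   value is 0 after the bias is subtracted. */
static bool
decode_string_opcode (int opcode, double bias, char dst[8])
{
  assert (opcode >= 0 && opcode <= 255);

  double value = opcode - bias;
  memcpy (dst, &value, sizeof value);
  return value == 0.0;
}
```

With the usual bias of 100.0, opcode 100 yields 0.0, whose IEEE 754
representation is eight zero bytes, which matches the "null bytes"
reading in the documentation change.  Any other opcode would return
false here, which corresponds to the case the patch warns about.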

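The corruption_warning flag in the patch is a plain warn-once latch, so
a file full of odd opcodes produces one message instead of one per
case.  A stripped-down sketch of that pattern follows; struct reader
and warn_corruption_once are hypothetical names, and fprintf stands in
for sys_warn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reduced reader holding only the latch from the patch. */
struct reader
  {
    bool corruption_warning;    /* Warned about possible corruption? */
  };

/* Prints MESSAGE to stderr the first time it is called for R and sets
   the latch.  Later calls print nothing.  Returns true if a warning
   was printed by this call. */
static bool
warn_corruption_once (struct reader *r, const char *message)
{
  assert (r != NULL);

  if (r->corruption_warning)
    return false;

  r->corruption_warning = true;
  fprintf (stderr, "warning: %s\n", message);
  return true;
}
```

Note that the latch is shared between the numeric and string paths in
the patch, so whichever odd opcode shows up first is the only one
reported.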
-- 
Peter Seebach on managing engineers:
"It's like herding cats, only most of the engineers are already
 sick of laser pointers."



