pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: New branch ready for review.


From: Ben Pfaff
Subject: Re: New branch ready for review.
Date: Fri, 17 Jul 2009 22:31:16 -0700
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (gnu/linux)

John Darrington <address@hidden> writes:

[data-encoding branch]
> Comments about this branch are invited before I merge with master.

At a high level, it bothers me how much allocation and
deallocation and copying this adds to simple operations.
I guess that it's probably not worth worrying about without
profiler output showing that it's significant, though.

In "Store variable names, labels and value labels as UTF8": The
function recode_strings() in sys-file-reader.c will assert-fail
if renaming variables causes a duplicate variable name, even if
the duplication would only be temporary, that is, if variables
named A and B will be renamed B and C in the process of recoding.
This is because dict_rename_var assert-fails if a duplicate would
be created.  I can think of two solutions: either check whether a
variable by the recoded name exists before calling
dict_rename_var, or use the dict_rename_vars function to
bulk-rename all of the variables in the dictionary at once, which
doesn't have this problem.  Probably the former solution makes
sense if you think the possibility of recoding causing this
problem is absurdly unlikely, and the latter otherwise.

In "data_out function to dynamically allocate return value": The
changes to table_value_missing are unsafe because they allocate a
fixed-size buffer and then strcpy the data_out return value into
that, even though data_out might return arbitrarily large data.
Instead, it should not allocate its buffers (via tab_alloc) until
it has received its formatted data from data_out, because only
then does it know how much data there is.  Similarly in
format_cell_entry.  Actually, tab_alloc just allocates from a
pool, so these functions could use data_out_pool.

The loop in tab_double is wrong, since there might be more bytes
in the string than fmt->w.  data_out_pool always null-terminates
its output (presumably?) so the "&& cp < s + fmt->w" test can
just be deleted.

I think that the call to g_string_append_len in data_out_g_string
is now wrong, because fs->w is not necessarily the length of s.
Probably this should just be g_string_append now?

In "Convert to utf8 in data_out function.": There should be a
comment on data_out and data_out_pool explaining the meaning of
the 'encoding' argument.  There is a tendency to assuming that it
is the encoding output by the function, whereas I believe it is
actually the encoding of the function's input.

As I read "Recode strings when writing system files." it occurred
to me that a sensible way to deal with recoding on system file
read and write would be to write a dict_recode function that both
sys-file-reader and sys-file-writer could use.  Then neither file
would have to worry about recoding much; they could just call
this function.  Does that make sense?

I didn't have time to build and test this at all.  If you feel
confident about it, go ahead and merge into master.
-- 
Ben Pfaff 
http://benpfaff.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]