bug-recutils

Indicate if field is multi-value in record set header


From: Ben Mather
Subject: Indicate if field is multi-value in record set header
Date: Mon, 21 Nov 2022 10:53:20 +0000
User-agent: Evolution 3.46.1

Hi Jose,

Would it be possible to add support for indicating whether a
field can contain multiple values as part of the record-set header?

Possible options might be, ordered from most to least likely to break
older versions of recutils:
 * Require the type of fields with multiple values to be prefixed with
   `array`, e.g. `%type: Author array line`.
 * Keep current behaviour for unprefixed types, but allow single values
   to be enforced by adding a prefix, e.g. `%type: Id scalar uuid`.
 * Add new record-set properties in the style of `%mandatory`,
   `%allowed` and `%prohibit`, e.g. `%array: Author\n%scalar: Id`.
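
To make the third option a bit more concrete, here is a rough Python
sketch of how a reader might interpret it. Note that `%array` and
`%scalar` are the hypothetical properties proposed above, not something
recutils supports today, and the parser is a deliberately simplified
stand-in (blank-line separated records, no continuation lines), not a
full recfile reader.

```python
def parse_records(text):
    """Split recfile-like text into records, each a list of (field, value) pairs.

    Simplified assumption: blank lines separate records, lines starting
    with '#' are comments, and multi-line values are not handled.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def cardinality_from_descriptor(descriptor):
    """Collect field names listed under the hypothetical %array / %scalar properties."""
    arrays, scalars = set(), set()
    for name, value in descriptor:
        if name == "%array":
            arrays.update(value.split())
        elif name == "%scalar":
            scalars.update(value.split())
    return arrays, scalars


def check_scalars(records, scalars):
    """Return (record_index, field) pairs where a scalar field appears more than once."""
    violations = []
    for index, record in enumerate(records):
        counts = {}
        for name, _ in record:
            counts[name] = counts.get(name, 0) + 1
        violations.extend(
            (index, name)
            for name in sorted(counts)
            if name in scalars and counts[name] > 1
        )
    return violations


SAMPLE = """\
%rec: Book
%array: Author
%scalar: Id

Id: 1
Author: A. Writer
Author: B. Writer

Id: 2
Id: 3
Author: C. Writer
"""

# The first record is treated as the record-set descriptor.
descriptor, *data = parse_records(SAMPLE)
arrays, scalars = cardinality_from_descriptor(descriptor)
violations = check_scalars(data, scalars)
```

With a declaration like this, a reader could pick list or scalar column
types up front, and a writer or checker could reject the second record
for repeating `Id`.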

There are likely to be other options. I'm not able to judge
implementation difficulty. I am also not sure what the recutils policy
on backwards compatibility is.



For background, I've been trying to load some data that I have stored
in recfiles into pandas (a python data-frame library). Some fields in
the files are multi-value and some are not.

As far as I can tell, there currently isn't a reliable way to determine
if a field is expected to be multi-value (and therefore mapped to an
array type) or not:
 * Auto-detection based on whether a record with multiple values for a
   field exists is brittle and requires multiple passes.
 * Mapping everything to arrays on the off-chance that a field is
   multi-value is inefficient and not ergonomic.
 * Specifying at read time is only possible with known data, and means
   that the property isn't validated at write.
 * Dropping extra values is obviously undesirable.
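
For illustration, the auto-detection workaround from the first bullet
might look roughly like the sketch below. It is plain Python with a
simplified, hand-rolled parser (an assumption for the example, not the
recutils library), and all function names are made up. It needs one pass
to find repeated fields and a second to build the rows.

```python
from collections import Counter


def parse_records(text):
    """Simplified parser: blank-line separated records of 'Field: value' lines."""
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = []
            continue
        name, _, value = line.partition(":")
        current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def detect_multi_value(records):
    """First pass: a field counts as multi-value if any record repeats it."""
    multi = set()
    for record in records:
        counts = Counter(name for name, _ in record)
        multi.update(name for name, n in counts.items() if n > 1)
    return multi


def to_rows(records, multi):
    """Second pass: one dict per record, using lists only for multi-value fields."""
    rows = []
    for record in records:
        row = {name: [] for name in multi}
        for name, value in record:
            if name in multi:
                row[name].append(value)
            else:
                row[name] = value
        rows.append(row)
    return rows


DATA = """\
Id: 1
Author: A. Writer

Id: 2
Author: B. Writer
Author: C. Writer
"""

records = parse_records(DATA)
multi = detect_multi_value(records)
rows = to_rows(records, multi)

# The brittleness: if the only record that repeats a field is absent,
# the same field is detected as single-value instead.
multi_without_second = detect_multi_value(records[:1])
```

The resulting list of dicts could then be handed to `pandas.DataFrame`,
but the column type for `Author` depends entirely on which records
happen to be in the file.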

I think that this being implicit is also a problem for some of the
tools in the recutils suite, for example `recsel` when joining on a
multi-value field, and that they would benefit from being able to flag
unexpectedly multi-value fields early.


Best regards,
Ben


