Indicate if field is multi-value in record set header
From: Ben Mather
Subject: Indicate if field is multi-value in record set header
Date: Mon, 21 Nov 2022 10:53:20 +0000
User-agent: Evolution 3.46.1
Hi Jose,

Would it be possible to add support for indicating whether a
field can contain multiple values as part of the record-set header?
Possible options might be, ordered from most to least likely to
break older versions of recutils:
* Require the type of fields with multiple values to be prefixed with
`array`, e.g. `%type: Author array line`.
* Keep current behaviour for unprefixed types, but allow single values
to be enforced by adding a prefix, e.g. `%type: Id scalar uuid`.
* Add new record-set properties in the style of `%mandatory`,
`%allowed` and `%prohibit`, e.g. `%array: Author\n%scalar: Id`.
There are likely to be other options. I'm not able to judge
implementation difficulty. I am also not sure what the recutils policy
on backwards compatibility is.
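
To make the third option concrete, here is a minimal Python sketch of how a reader might consume it. The `%array` and `%scalar` properties are the hypothetical ones proposed above and are not supported by recutils today; the sketch assumes they take a whitespace-separated list of field names in the style of `%mandatory`. The helper names `parse_cardinality` and `shape_record` are invented for illustration, and keeping undeclared fields as lists is just one conservative choice.

```python
def parse_cardinality(descriptor_text):
    """Collect field names listed under the HYPOTHETICAL %array / %scalar
    record-set properties (proposed in this message, not part of recutils)."""
    cardinality = {}
    for line in descriptor_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip()
        if name in ("%array", "%scalar"):
            # Assumed: a whitespace-separated field list, as for %mandatory.
            for field in value.split():
                cardinality[field] = name[1:]  # "array" or "scalar"
    return cardinality


def shape_record(fields, cardinality):
    """Turn a list of (name, value) pairs into a dict, using the declared
    cardinality to decide between a single value and a list."""
    shaped = {}
    for name, value in fields:
        if cardinality.get(name) == "scalar":
            if name in shaped:
                # The declaration lets us flag the problem immediately.
                raise ValueError(f"field {name!r} is declared scalar but repeats")
            shaped[name] = value
        else:
            # Declared arrays, and (conservatively) undeclared fields,
            # are always collected into a list.
            shaped.setdefault(name, []).append(value)
    return shaped


header = "%rec: Book\n%array: Author\n%scalar: Id"
cardinality = parse_cardinality(header)
record = shape_record([("Id", "1"), ("Author", "A"), ("Author", "B")], cardinality)
```

With a declaration like this, the column types are known after reading the descriptor alone, before any data record has been seen.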
For background, I've been trying to load some data that I have stored
in recfiles into pandas (a python data-frame library). Some fields in
the files are multi-value and some are not.
As far as I can tell, there currently isn't a reliable way to determine
if a field is expected to be multi-value (and therefore mapped to an
array type) or not:
* Auto-detection based on whether a record with multiple values for a
field exists is brittle and requires multiple passes.
* Mapping everything to arrays on the off-chance that a field is
multi-value is inefficient and not ergonomic.
* Specifying at read time is only possible with known data, and means
that the property isn't validated at write.
* Dropping extra values is obviously undesirable.
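
As an illustration of the first workaround, the following Python sketch auto-detects multi-value fields in two passes. It is a deliberately simplified reader, not a full recfile parser: it assumes records are separated by blank lines, and it skips comments, descriptor lines and `+` continuation lines. The function names are invented for this example.

```python
from collections import Counter


def parse_records(text):
    """Split recfile-like text into records, each a list of (name, value) pairs.
    Simplification: comment (#), descriptor (%) and continuation (+) lines
    are skipped rather than interpreted."""
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith(("#", "%", "+")):
            continue
        name, sep, value = line.partition(":")
        if sep:
            current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def detect_multi_value(records):
    """First pass: a field counts as multi-value if any record repeats it."""
    multi = set()
    for record in records:
        counts = Counter(name for name, _ in record)
        multi.update(name for name, n in counts.items() if n > 1)
    return multi


def to_rows(records, multi):
    """Second pass: build one dict per record, using lists only for the
    fields detected as multi-value."""
    rows = []
    for record in records:
        row = {}
        for name, value in record:
            if name in multi:
                row.setdefault(name, []).append(value)
            else:
                row[name] = value
        rows.append(row)
    return rows


sample = "Id: 1\nAuthor: A\n\nId: 2\nAuthor: B\nAuthor: C\n"
records = parse_records(sample)
rows = to_rows(records, detect_multi_value(records))
```

The resulting list of dicts can be handed to `pandas.DataFrame`. The brittleness is easy to see: if the second record is removed from `sample`, `Author` is detected as single-valued and the column type silently changes.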
I think that this being implicit is also a problem for some of the
tools in the recutils suite, for example `recsel` when joining on a
multi-value field, and that they would benefit from being able to flag
unexpectedly multi-value fields early.
Best regards,
Ben