Re: groff supports Italian input documents now
Sat, 3 Jul 2021 16:01:34 +0200
G. Branden Robinson wrote on Sat, Jul 03, 2021 at 12:50:07PM +1000:
[ autodetection ]
> Important to note here--it doesn't. groff doesn't detect this--it has
> to be told.
Which is a good thing. Even when a document contains only a single
language, detecting it automatically may not be reliable. Even if
a document contains mostly text in one language, that doesn't imply
the author designed it for use with that language's macro set.
Relying on a specific macro set is a choice by the author of a
document and has to be treated as such.
> I revamped groff input localization a few months ago. It occurred to me
> that the mechanism groff had innovated for this purpose (specify options
> like -mfr for French) was duplicative of an existing and much more
> widely understood infrastructure for tackling such issues: locale(7).
That's not duplication at all but a totally different topic which
has almost nothing to do with what we are talking about.
The locale(7) system is a system for users to specify user
preferences, for example which character set and encoding they
want to use *when interacting with programs* and which language
they want programs to use when displaying messages and when
parsing user input.
That is not at all related to which macro set a document author
decided to use for a document that the user wishes to process.
For example, I almost always work with an en_US.UTF-8 locale, with
some exceptions for low-level work where I use the POSIX locale
instead. But that doesn't mean that I never want to process French
or German documents. Yes, setting a fake locale when calling a
program is possible, so a *workaround* does exist, even though it
certainly feels awkward.
Besides, this is a bad trap. Why should any user expect that whatever
locale they may have set according to their personal preferences
silently cripples formatting of documents they process, and that
they have to go the extra mile to modify the locale in the
environment of their formatting commands?
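The trap comes from ordinary environment inheritance: a child process sees whatever locale the user exported unless its caller explicitly overrides it. A minimal Python sketch, assuming a POSIX-like system, makes this visible; the tiny inline child script is a hypothetical stand-in for a formatter, not groff itself.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a formatter: it only reports the LANG it sees.
CHILD = "import os; print(os.environ.get('LANG', '<unset>'))"


def run_child(env=None):
    """Run the stand-in child and return the LANG value it observed.

    With env=None the child inherits the parent's environment unchanged.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo():
    # Simulate a user whose personal preference is English.
    os.environ["LANG"] = "en_US.UTF-8"

    # 1. Doing nothing: the child silently inherits the user's preference.
    inherited = run_child()

    # 2. The workaround: build a modified environment for this one child
    #    only; the parent's own environment stays untouched.
    overridden = run_child(env={**os.environ, "LANG": "it"})

    return inherited, overridden, os.environ["LANG"]


if __name__ == "__main__":
    print(demo())
```

Only the second call sees "it", and only because the caller went out of its way to construct a separate environment for that single process.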
> I have anticipated, but not yet heard, a protest
The reason you didn't is trivial: I missed your change... :-(
> along the lines that just because a (for instance) French document
> is being typeset, the user might not want to change their locale
> to begin with "fr".
You have this argument backwards.
I don't think "let's allow users to be lazy" is a good argument.
Instead, my point would be that you are abusing the locale system
for the wrong purpose.
> C. Instead of saying something like "groff -mit", we can use a standard
> environment variable to assert the locale. For groff's purposes,
> simply "LANG=it" will suffice.
How is "LANG=it groff" better than "groff -mit"?
It is neither shorter nor clearer.
I can easily tell you how it is worse.
- There is a risk that it inadvertently creeps in from the user's
environment even if the user never intended to set it.
- The roff ecosystem is famous for using pipelines, and making
sure that in a pipeline, the right programs run with the right
environment variables can be tricky and error-prone, whereas
setting command line options on programs in a pipeline is easy.
- There is a risk that the environment variables have undesirable
and unintended side effects on some programs in the pipeline
because not all programs run in a roff pipeline must necessarily
be programs distributed with the respective core roff package.
- The LC_ variables are unreasonably powerful for this purpose
because they have never been designed for it. The only decision
needed here is whether to run a macro package, and which one,
whereas the LC_ variables carry much more information.
Accepting and parsing irrelevant information and requiring
needlessly complicated syntax both cause complexity, which in
general increases the risk of both user confusion and program
misbehaviour and bugs.
- The LANG variable is considered a legacy feature, and advertising
legacy features is usually not a good idea. Advertising a
more modern syntax like "LC_ALL=it_IT.UTF-8 groff" exacerbates
the previous problem, making the user wonder whether the "_IT"
part matters and what effect it might have, and whether ".UTF-8"
is the right choice and if so, whether ".UTF-8" here is sufficient
to ensure correct processing of the character encoding in the
file - which it likely isn't. The user might also wonder which
effect, if any, the LC_TIME and LC_NUMERIC categories contained
in LC_ALL might have, and if those effects, if any, are beneficial
or detrimental, and whether it might be better to set one of the
other LC_* variables instead, and if so, which one. It's not
readily apparent which of the variables to set because none of
them are designed for the purpose.
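To make the last two points concrete, here is a hedged Python sketch of what any locale-driven macro selection would have to do. It applies the POSIX precedence rule (LC_ALL, then the category's own LC_* variable, then LANG, then the "C" default) and parses the common language[_territory][.codeset][@modifier] name form. The function names, and the idea that the formatter consults one particular category, are hypothetical assumptions; this is not how groff implements it.

```python
import re

# Common XPG-style locale name: language[_territory][.codeset][@modifier]
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def effective_locale(env, category):
    """Resolve the locale for one category using POSIX precedence:
    LC_ALL, then the category's own variable, then LANG, then "C"."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty values both fall through
            return value
    return "C"


def macro_package_for(env, category="LC_MESSAGES"):
    """Hypothetical mapping from a locale to the name given to -m.

    Which category to consult is an assumption: no LC_* category was
    designed to answer "which macro package does this document need?".
    """
    parsed = LOCALE_RE.match(effective_locale(env, category))
    if parsed is None or parsed["language"] in ("C", "POSIX"):
        return None  # no language-specific macro package
    # Territory, codeset, and modifier are parsed only to be discarded.
    return parsed["language"].lower()
```

With LANG=it and LC_MESSAGES=en_US.UTF-8, the same environment yields "en" when LC_MESSAGES is consulted and "it" when LC_CTYPE is. An explicit -m option never has to answer that question.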
This is not an outright request for a revert, but an invitation
to reconsider whether this is really a useful and desirable change.
- groff supports Italian input documents now, G. Branden Robinson, 2021/07/02
- Re: groff supports Italian input documents now, Oliver Corff, 2021/07/02
- Re: groff supports Italian input documents now, G. Branden Robinson, 2021/07/02
- Re: groff supports Italian input documents now, Oliver Corff, 2021/07/03
- Re: groff supports Italian input documents now, Ingo Schwarze, 2021/07/03 (this message)
- Re: groff supports Italian input documents now, G. Branden Robinson, 2021/07/03
- Re: groff supports Italian input documents now, John Ankarström, 2021/07/03
- Re: groff supports Italian input documents now, Dave Kemper, 2021/07/05
- Re: groff supports Italian input documents now, Ingo Schwarze, 2021/07/05
- groff and multilingual documents, G. Branden Robinson, 2021/07/10
- Re: groff and multilingual documents, Dave Kemper, 2021/07/12
- Re: groff supports Italian input documents now, James K. Lowden, 2021/07/06