octave-maintainers

Re: Encoding issues : the DESCRIPTION file


From: Oliver Heimlich
Subject: Re: Encoding issues : the DESCRIPTION file
Date: Tue, 20 Jan 2015 19:14:52 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 20.01.2015 at 14:27, Julien Bect wrote:
> Hello everyone,
>
> I have started to investigate encoding issues in the generate_html
> package, following this discussion:
>
> http://octave.1599824.n4.nabble.com/generate-html-breaks-documentation-encoding-tp4668154.html
>
> ...
>
> I would like to come up with a solution that is clear and consistent for
> the *automatic* processing of DESCRIPTION files (no more manual editing
> should be needed).
>
> Here are some options.
>
> A) Assume US-ASCII. Error if any byte > 0x7F is encountered.
>
> A') Same as A, unless an optional ENCODING file is present, in which case
> DESCRIPTION (and COPYING, and NEWS) is assumed to have the encoding
> indicated in that file.
>
> B) Assume ISO-8859-1. For "ø" and "ë" this wouldn't be a problem (F8 and
> EB), but sooner or later a package manager whose name cannot be written
> in ISO-8859-1 will join the project...
>
> B') Assume ISO-8859-1 with an optional ENCODING file.
>
> C) Assume UTF-8.
>
> C') Assume UTF-8 with an optional ENCODING file (for package managers
> who *really* don't want to use UTF-8).
>
> D) In A', B' or C', use a new optional field in DESCRIPTION instead of
> an ENCODING file.
>
> I would vote for A' (it just requires a small number of package managers
> to add an ENCODING file) or C (which doesn't seem to require any
> additional work at all).
>
> Any thoughts?
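
A minimal Python sketch of how a tool could decode DESCRIPTION under options A, B and C, and under the primed variants when an ENCODING file is present. The helper name `decode_description`, the policy table, and passing the ENCODING file's contents as a string are assumptions for illustration only; the sample maintainer name is hypothetical.

```python
from typing import Optional

# Default encoding assumed by each option (hypothetical table).
POLICIES = {"A": "ascii", "B": "iso-8859-1", "C": "utf-8"}


def decode_description(raw: bytes, policy: str,
                       encoding_file: Optional[str] = None) -> str:
    """Decode the raw bytes of a DESCRIPTION file (hypothetical helper).

    policy        -- "A", "B" or "C"
    encoding_file -- contents of an optional ENCODING file (options A', B', C'),
                     or None if the package has no such file
    """
    encoding = POLICIES[policy]
    if encoding_file is not None:
        # Primed variants: the ENCODING file overrides the default encoding.
        encoding = encoding_file.strip()
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid in
    # `encoding`; for option A that means any byte > 0x7F.
    # ISO-8859-1 maps every byte to a character, so option B never raises.
    return raw.decode(encoding)


raw = "Maintainer: Bjørn".encode("utf-8")  # hypothetical UTF-8 DESCRIPTION line
print(decode_description(raw, "C"))                            # Maintainer: Bjørn
print(decode_description(raw, "B"))                            # Maintainer: BjÃ¸rn
print(decode_description(raw, "A", encoding_file="utf-8\n"))   # Maintainer: Bjørn
try:
    decode_description(raw, "A")
except UnicodeDecodeError as err:
    print("option A rejects the file:", err.reason)
```

Note that option B never signals an error: a UTF-8 file read as ISO-8859-1 silently turns into mojibake, whereas A and C fail loudly on unexpected input.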



Let me add another option to the list.

E) Assume ISO-8859-1, but switch to UTF-8 if a byte order mark is present.
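
Option E could be sketched in Python as follows, assuming a hypothetical helper that receives the raw bytes of DESCRIPTION. Only the UTF-8 byte order mark (EF BB BF) is checked; the sample name is hypothetical.

```python
import codecs


def decode_option_e(raw: bytes) -> str:
    """Option E: assume ISO-8859-1 unless a UTF-8 byte order mark is present."""
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the BOM bytes EF BB BF and decode the remainder strictly as UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: ISO-8859-1 assigns a character to every byte, so this cannot fail.
    return raw.decode("iso-8859-1")


name = "Bjørn"  # hypothetical maintainer name
print(decode_option_e(codecs.BOM_UTF8 + name.encode("utf-8")))  # Bjørn
print(decode_option_e(name.encode("iso-8859-1")))               # Bjørn
print(decode_option_e(name.encode("utf-8")))                    # BjÃ¸rn
```

The last call shows the weak spot of this option: a UTF-8 file saved without a BOM, which is what many editors write by default, is silently misread as ISO-8859-1.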


My favorite option is C, because it's simple and future-proof. IMHO, UTF-8 can be considered the default encoding nowadays.

I would be okay with B. I dislike A', B', C', and D, because these would be uncommon solutions. E would require advanced text editors and is therefore hard to maintain. The pre-Unicode 20th century called, and they want their option A back ;-)


P.S. During generation of index.html, please add calls to insert_char_entities as well (e.g., for the author and maintainer information, which contains e-mail addresses in angle brackets).
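
The kind of escaping requested in the P.S. can be illustrated with Python's standard library. This is an analogue only, not generate_html's insert_char_entities, whose exact behavior is not described here; the helper name `to_html_text` and the sample name and address are hypothetical.

```python
import html


def to_html_text(s: str) -> str:
    """Make a DESCRIPTION value safe to embed in HTML (hypothetical helper)."""
    # html.escape turns &, <, > and quotes into entities, so an address like
    # "<bjorn@example.org>" is shown as text instead of being parsed as a tag.
    escaped = html.escape(s)
    # Replace any remaining non-ASCII character with a numeric reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html_text("Bjørn <bjorn@example.org>"))
# Bj&#248;rn &lt;bjorn@example.org&gt;
```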


