From: | Oliver Heimlich |
Subject: | Re: Encoding issues : the DESCRIPTION file |
Date: | Tue, 20 Jan 2015 19:14:52 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0 |
On 20.01.2015 at 14:27, Julien Bect wrote:
Hello everyone, I have started to investigate encoding issues in the generate_html package, following this discussion: http://octave.1599824.n4.nabble.com/generate-html-breaks-documentation-encoding-tp4668154.html
...
I would like to come up with a solution that is clear and consistent for the *automatic* processing of DESCRIPTION files (no more manual editing should be needed). Here are some options.

A) Assume US-ASCII. Error if any character > 0x7F is encountered.

A') Same as A, unless an optional ENCODING file is present, in which case DESCRIPTION (and COPYING, and NEWS) is assumed to have the encoding indicated in that file.

B) Assume ISO-8859-1. For "ø" and "ë" this wouldn't be a problem (F8 and EB) but sooner or later a package manager whose name cannot be written in ISO-8859-1 will join the project...

B') Assume ISO-8859-1 with an optional ENCODING file.

C) Assume UTF-8.

C') Assume UTF-8 with an optional ENCODING file (for package managers that *really* don't want to use UTF-8).

D) In A', B' or C', use a new optional field in DESCRIPTION instead of an ENCODING file.

I would vote for A' (just requires a small number of package managers to add an ENCODING file) or C (doesn't seem to require any additional work at all). Any thoughts?
Let me add another option to the list.

E) Assume ISO-8859-1, but switch to UTF-8 if a byte order mark is present.

My favorite option is C, because it's simple and future-proof. IMHO UTF-8 can be considered the default encoding nowadays.
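To make option E concrete, here is a minimal sketch in Python (not the Octave code used by generate_html). The function name `decode_option_e` is made up for illustration; the only assumption is that the raw bytes of DESCRIPTION are available.

```python
import codecs


def decode_option_e(raw: bytes) -> str:
    """Hypothetical helper illustrating option E.

    Decode as UTF-8 when the data starts with the UTF-8 byte order mark
    (EF BB BF), otherwise fall back to ISO-8859-1.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the BOM itself so it does not end up in the decoded text.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode("iso-8859-1")


if __name__ == "__main__":
    name = "S\u00f8ren"  # contains "ø"
    print(decode_option_e(codecs.BOM_UTF8 + name.encode("utf-8")))
    print(decode_option_e(name.encode("iso-8859-1")))
    # A UTF-8 file saved without a BOM is silently misread as ISO-8859-1.
    print(decode_option_e(name.encode("utf-8")))
```

The last call shows the maintenance problem: a UTF-8 file written by an editor that does not emit a BOM decodes without error, but the "ø" comes out as two wrong characters.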
I would be okay with B. I dislike A', B', C', and D, because these would be uncommon solutions. E would require advanced text editors and is therefore hard to maintain. The pre-Unicode 20th century called, and they want their option A back ;-)
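For comparison, the difference between A, B and C (and the primed variants) can be sketched in a few lines of Python. This is only an illustration under assumptions: `decode_description` and its `declared` parameter are hypothetical names, and `declared` stands in for whatever an optional ENCODING file would contain.

```python
from typing import Optional

# Default encoding assumed by each option from the list above.
DEFAULTS = {"A": "ascii", "B": "iso-8859-1", "C": "utf-8"}


def decode_description(raw: bytes, option: str = "C",
                       declared: Optional[str] = None) -> str:
    """Hypothetical helper: decode DESCRIPTION bytes strictly.

    `declared` models the content of an optional ENCODING file
    (options A', B', C'); when given, it overrides the default.
    Strict decoding raises UnicodeDecodeError on invalid input.
    """
    encoding = declared.strip() if declared else DEFAULTS[option]
    return raw.decode(encoding)


if __name__ == "__main__":
    latin1_bytes = b"\xf8"      # "ø" in ISO-8859-1
    utf8_bytes = b"\xc3\xb8"    # "ø" in UTF-8

    print(decode_description(latin1_bytes, "B"))   # decodes fine
    print(decode_description(utf8_bytes, "C"))     # decodes fine
    print(decode_description(utf8_bytes, "B"))     # no error, but mojibake

    for option, raw in [("A", latin1_bytes), ("C", latin1_bytes)]:
        try:
            decode_description(raw, option)
        except UnicodeDecodeError as err:
            print("option", option, "rejects the input:", err.reason)
```

One practical point in favour of C over B: ISO-8859-1 assigns a character to every byte value, so B can never report a wrongly encoded file, whereas strict UTF-8 decoding fails loudly on most non-UTF-8 input.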
P.S. During generation of the index.html, please add calls to insert_char_entities as well (e.g. for author and maintainer information with an e-mail address in angle brackets).
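To illustrate why this matters, here is a small Python analogue. It uses the standard library's `html.escape`, not generate_html's insert_char_entities, whose exact behaviour is not assumed here; the maintainer string is a made-up example.

```python
import html

# Hypothetical Maintainer field as it might appear in a DESCRIPTION file.
maintainer = "Jane Doe <jane@example.org>"

# Without escaping, a browser treats "<jane@example.org>" as an unknown
# tag and does not display the address. Escaping turns the angle
# brackets into character entities.
escaped = html.escape(maintainer)
print(escaped)  # Jane Doe &lt;jane@example.org&gt;
```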