bug-texinfo

Re: Displaying characters in user's locale


From: Eli Zaretskii
Subject: Re: Displaying characters in user's locale
Date: Sat, 01 Feb 2014 10:11:44 +0200

> Date: Fri, 31 Jan 2014 21:33:39 +0000
> From: Gavin Smith <address@hidden>
> Cc: Karl Berry <address@hidden>, address@hidden
> 
> I've attached a patch which uses iconv as you suggested. I've tested
> it with the two files attached under both utf-8 and iso8859-1 locales.
> (I did this by, e.g. running "LANG=en_US.UTF8" to get a UTF-8
> terminal.) I haven't been able to figure out how to get an ASCII-only
> terminal yet.

Thank you for doing this.

Allow me a few comments about the patch.

First, I think a configure-time test for libiconv availability should
be added, and the code that uses libiconv should be made conditional
on HAVE_LIBICONV or some such, computed by that test.
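
Something like the following is what I have in mind (a sketch; it
assumes the gettext m4 files are available, which provide the AM_ICONV
macro -- that macro defines HAVE_ICONV and substitutes LIBICONV with
the needed link flags):

```m4
dnl In configure.ac: probe for a usable iconv.  AM_ICONV (from
dnl gettext's m4 files) defines HAVE_ICONV in config.h and sets
dnl LIBICONV to the link flags, if any, that iconv needs.
AM_ICONV
```

Then $(LIBICONV) would be added to the program's LDADD in Makefile.am,
and the conversion code wrapped in #ifdef HAVE_ICONV.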

> +/* Look for local variables section in FB and set encoding */
> +static void
> +set_file_lc_ctype (FILE_BUFFER *fb)

I think this function should return UTF-8 if it doesn't find any
coding: cookies in the file.  UTF-8 is probably the best default
nowadays.
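
(For reference, the cookie in question is the Emacs-style local
variables block at the end of the Info file, e.g.:

```
Local Variables:
coding: iso-8859-1
End:
```

When no such block is present, falling back to UTF-8 seems right.)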

> +static void
> +degrade_utf8 (char **from, size_t *from_left, char **to, size_t *to_left)
> +{
> +  struct encoding_replacement er[] = {
> +  {"\xe2\x80\x98","'"}, /* Opening quote */
> +  {"\xe2\x80\x99","'"}, /* Closing quote */
> +  {0, 0}};

This list should include all the Unicode characters used by makeinfo.
Opening and closing double quotes and the right arrow come to mind.
Perhaps Patrice could point to the places in texi2any that could be
used to glean all those characters.
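
To make the idea concrete, here is a sketch of an extended table and a
simplified degrade function (the two single quotes are from your
patch; the other entries are my guesses at characters texi2any emits,
and the function name is mine -- the real interface with its
from_left/to_left counters would also have to check that the
replacement fits in the output buffer, since "->" and "..." are longer
than one byte):

```c
#include <string.h>

struct encoding_replacement { const char *from; const char *to; };

/* Candidate replacements for degrading UTF-8 to ASCII.  Only the two
   single quotes are confirmed from the patch; the rest are guesses at
   other Unicode characters makeinfo produces.  */
struct encoding_replacement utf8_replacements[] = {
  {"\xe2\x80\x98", "'"},    /* U+2018 opening single quote */
  {"\xe2\x80\x99", "'"},    /* U+2019 closing single quote */
  {"\xe2\x80\x9c", "\""},   /* U+201C opening double quote */
  {"\xe2\x80\x9d", "\""},   /* U+201D closing double quote */
  {"\xe2\x86\x92", "->"},   /* U+2192 right arrow */
  {"\xe2\x87\x92", "=>"},   /* U+21D2 @result{} arrow */
  {"\xe2\x80\xa6", "..."},  /* U+2026 @dots{} ellipsis */
  {"\xe2\x80\x94", "--"},   /* U+2014 em dash */
  {"\xe2\x80\x93", "-"},    /* U+2013 en dash */
  {"\xc2\xa9", "(C)"},      /* U+00A9 @copyright{} */
  {0, 0}
};

/* Degrade UTF-8 text IN to an ASCII approximation in OUT, which must
   be large enough; bytes with no table entry pass through unchanged. */
void
degrade_utf8_sketch (const char *in, char *out)
{
  while (*in)
    {
      int i, matched = 0;
      for (i = 0; utf8_replacements[i].from; i++)
        {
          size_t n = strlen (utf8_replacements[i].from);
          if (strncmp (in, utf8_replacements[i].from, n) == 0)
            {
              strcpy (out, utf8_replacements[i].to);
              out += strlen (utf8_replacements[i].to);
              in += n;
              matched = 1;
              break;
            }
        }
      if (!matched)
        *out++ = *in++;
    }
  *out = '\0';
}
```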

> +static void
> +convert_characters (FILE_BUFFER *fb)
> +{
> +  long node = 0, nextnode;
> +  SEARCH_BINDING binding;
> +  char *to_locale;
> +
> +  iconv_t iconv_state;
> +  int iconv_available = 0;
> +
> +  void (*degrade_funcs[5])(char **, size_t *,
> +                           char **, size_t *) = {
> +    degrade_dummy, degrade_utf8, degrade_dummy,
> +    degrade_dummy, degrade_dummy };

Why do we need any degrade_* functions except degrade_utf8?  Can you
tell what possible features can benefit from this?

> +  /* Read environment locale */
> +  to_locale = nl_langinfo(CODESET);
> +
> +  /* Don't degrade the contents if we are in fact
> +   * in the right locale for the file */
> +  if (!strcasecmp(to_locale, encoding_names[fb->lc_ctype]))
> +    return;
> +
> +  degrade = degrade_funcs [fb->lc_ctype];

One of the disadvantages of those degrade_* functions is that you must
match each encoding with a function, and there are an awful lot of
possible encodings out there.

> +  /* Check if an iconv conversion from file locale to system
> +   * locale exists - if so we will try to use it. */
> +  iconv_state = iconv_open (to_locale, encoding_names[fb->lc_ctype]);
> +  if (iconv_state != (iconv_t) -1)
> +    iconv_available = 1;

I would suggest using to_encoding here, as to_locale is misleading
(e.g., the locale is en_US.UTF-8, but the encoding you care about here
is UTF-8).
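
A minimal sketch of the distinction (the function name is mine):

```c
#include <locale.h>
#include <langinfo.h>

/* Return the codeset of the user's environment locale.  Note that
   nl_langinfo (CODESET) yields the encoding name (e.g. "UTF-8" under
   LANG=en_US.UTF-8), never the full locale name -- which is why
   "to_encoding" describes the value better than "to_locale".  Without
   the setlocale call, nl_langinfo reports the "C" locale's codeset
   regardless of the environment.  */
const char *
current_codeset (void)
{
  setlocale (LC_CTYPE, "");
  return nl_langinfo (CODESET);
}
```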

> +  /* Convert sections of the file separated by node separators. These
> +   * will be preambles, nodes, tag tables, or local variable sections.
> +   * We convert all of them, although probably only the nodes need to
> +   * be converted. 

I would indeed suggest converting only the node that is about to be
displayed.  Some manuals are very large, so converting them in their
entirety might produce an annoying delay at startup.  Did you try the
Emacs Lisp manual, for example?

> +  while ((nextnode = find_node_separator (&binding)) != -1
> +    || (node != fb->filesize && (nextnode = fb->filesize)))

In this loop, I suggest an optimization: only call iconv for portions
of text that include bytes above 127, unless the file's encoding is
known to require conversion even in that case (some CJK encodings,
like the ISO-2022 family, belong to this latter class).  This could
save you some cycles.
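
Something along these lines (a sketch; the function name is mine):

```c
#include <stddef.h>

/* Return 1 if BUF contains only ASCII bytes.  For encodings that are
   strict supersets of ASCII (UTF-8, the ISO-8859 family), a
   pure-ASCII section needs no conversion at all; stateful encodings
   such as the ISO-2022 family must still go through iconv even when
   every byte is below 128.  */
int
section_is_ascii (const char *buf, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    if ((unsigned char) buf[i] & 0x80)
      return 0;
  return 1;
}
```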

Of course, optimizations can wait until the rest is known to work
correctly.

Thanks again for working on this.


