emacs-devel

Re: CSV parsing and other issues (Re: LC_NUMERIC)


From: Maxim Nikulin
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Mon, 14 Jun 2021 23:38:19 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 12/06/2021 01:04, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Fri, 11 Jun 2021 23:58:24 +0700
>>
>> On 10/06/2021 23:57, Eli Zaretskii wrote:
>>  >> From: Maxim Nikulin
>>  >> Date: Thu, 10 Jun 2021 23:28:59 +0700
>>  >
>>  > For processing CSV, if there's a need to know whether the
>>  > locale uses the comma as a decimal separator, we could
>>  > indeed extend locale-info.  But such an extension is almost
>>  > trivial and doesn't even touch on the significant problems
>>  > in the rest of the discussion.
>>
>> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
>
> No, I didn't.  Adding a call to setlocale to locale-info, even if we
> want to add an argument for the caller to control the locale, is
> trivial.

I would avoid such manipulations, and the reason is not the efficiency of a particular implementation. The locale set by `setlocale' is not thread-local, so changing it in a *getter* is a source of rare but really obscure, hardly reproducible problems. I do not like such output

1234.567890
1234,567890
1234.567890

of the following program, in which the main thread changes the locale while a parallel thread is printing:

  #include <locale.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <time.h>

  #define DELAY_NS 40000000

  /* Prints the same number at 0.5, 1.5 and 2.5 multiples of DELAY_NS,
     i.e. between the locale changes made by main(). */
  void* other_thread(void *arg) {
          (void)arg;
          struct timespec delay = { 0, DELAY_NS/2 };
          nanosleep(&delay, NULL);
          printf("%f\n", 1234.56789);
          delay.tv_nsec = DELAY_NS;
          nanosleep(&delay, NULL);
          printf("%f\n", 1234.56789);
          nanosleep(&delay, NULL);
          printf("%f\n", 1234.56789);
          return NULL;
  }

  int main(void) {
          setlocale(LC_NUMERIC, "C");
          pthread_t thread_id;
          pthread_create(&thread_id, NULL, &other_thread, NULL);
          struct timespec delay = { 0, DELAY_NS };
          nanosleep(&delay, NULL);
          /* Take the locale from the environment; the comma appears
             only if that locale uses it as the decimal separator. */
          setlocale(LC_NUMERIC, "");
          nanosleep(&delay, NULL);
          /* Restore "C"; the other thread has already seen the change. */
          setlocale(LC_NUMERIC, "C");
          void *res;
          pthread_join(thread_id, &res);
          return 0;
  }

Explicit locale objects decoupled from application-wide global preferences are safer and more flexible.
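
A minimal sketch of that idea, assuming a POSIX.1-2008 system: `newlocale' creates a locale object and `uselocale' installs it for the calling thread only, so the process-wide locale managed by `setlocale' is left alone. The helper name `format_double_c' is hypothetical and exists only for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE using "C" numeric conventions
   without touching the global locale.
   Returns 0 on success, -1 on failure or truncation. */
int format_double_c(char *buf, size_t size, double value) {
        /* Locale object that overrides only LC_NUMERIC. */
        locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
        if (c_loc == (locale_t)0)
                return -1;
        /* uselocale affects the calling thread only. */
        locale_t old = uselocale(c_loc);
        int n = snprintf(buf, size, "%f", value);
        /* Restore whatever this thread used before, then free the object. */
        uselocale(old);
        freelocale(c_loc);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With such a helper, the thread that prints numbers no longer depends on what other threads do with `setlocale', and the getter does not have to change global state.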

>>  > Here's a trivial example:
>>  >
>>  >     (insert (downcase (buffer-substring POS1 POS2)))
>>  >
>>  > Contrast with
>>  >
>>  >     (insert (downcase "FOO"))
>>
>> Either `set-text-properties' should be called on "FOO" before passing
>> it to `downcase'
>
> Which property will help here?  We don't have such properties.  They
> need to be designed and implemented.

Let's name it "locale". Its value is some object that represents either a "solid" locale such as de_DE or a combination such as LC_NUMERIC=en_GB + LC_TIME=de_DE + default fr_FR. The data required for particular operations may be loaded on demand.

>> or `locale-downcase' with LOCALE first argument should be added.
>
> How would you implement locale-downcase?  Are you familiar with how
> Emacs case tables work?

No, I am not familiar with the Emacs internals dealing with case conversion. I already wrote that I do not even know how to handle Turkish properly. For the scripts I am familiar with, it is enough to have a default table for normalization and conversion. I admit that conversion may sometimes depend on the language, and that the language cannot be determined from the code point. In such cases I expect an additional override table that has higher priority than the default one.
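
The override idea could be sketched roughly as below. This is only an illustration in C and says nothing about how Emacs case tables actually work; the names `case_override', `turkish_lower' and `downcase_with_overrides' are hypothetical. Turkish serves as the example because its rules differ from the default mapping: capital I lowercases to dotless ı (U+0131), and dotted capital İ (U+0130) lowercases to plain i.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* One language-specific exception to the default mapping. */
struct case_override {
        wint_t from;
        wint_t to;
};

/* Hypothetical override table for Turkish lowercasing. */
static const struct case_override turkish_lower[] = {
        { 0x0049, 0x0131 },     /* I -> dotless i */
        { 0x0130, 0x0069 },     /* dotted capital I -> i */
};

/* Consult the override table first; fall back to the default mapping
   (towlower stands in for the default table here). */
wint_t downcase_with_overrides(wint_t c, const struct case_override *tab,
                               size_t n) {
        for (size_t i = 0; i < n; i++)
                if (tab[i].from == c)
                        return tab[i].to;
        return towlower(c);
}
```

Passing an empty table (`NULL' with length 0) gives the default behaviour, so callers that do not care about the language pay nothing extra.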

> And even if we had locale-downcase, which locale would you
> pass to it in any given use case?

I already mentioned the chain of responsibility: an explicit value or set of overrides passed by the user, a text property for a particular span of characters, buffer-local variables, global environment variables. A locale may be instantiated from its name, e.g. "it_IT". Convenience functions to obtain the locale at point will likely be useful as well. (Actually, I have number parsing and formatting in mind rather than case conversion.)
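
The lookup order above can be sketched as a simple first-match search. Everything here is hypothetical (the struct, its field names, the function name and the final "C" fallback); it only illustrates the priority order and is not a proposal for an actual Emacs API.

```c
#include <stddef.h>

/* Hypothetical sources of a locale name, highest priority first.
   NULL means "nothing specified at this level". */
struct locale_sources {
        const char *explicit_arg;       /* passed by the caller */
        const char *text_property;      /* set on the span of characters */
        const char *buffer_local;       /* buffer-local variable */
        const char *global_env;         /* global environment */
};

/* Return the first locale name found along the chain,
   or "C" (an assumed fallback) if no level specifies one. */
const char *resolve_locale(const struct locale_sources *s) {
        const char *chain[] = {
                s->explicit_arg, s->text_property,
                s->buffer_local, s->global_env,
        };
        for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++)
                if (chain[i] != NULL)
                        return chain[i];
        return "C";
}
```

For example, a span carrying the text property "it_IT" in a buffer whose local setting is "de_DE" would resolve to "it_IT", unless the caller passes an explicit value.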



