bug-gnu-emacs

Re: command fill-paragraph deletes leading Umlauts if line begins with space


From: Ralf Angeli
Subject: Re: command fill-paragraph deletes leading Umlauts if line begins with space
Date: Thu, 23 Dec 2004 11:19:11 +0100
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

* Ulrich Scholz (2004-12-22) writes:

>   value of $LANG: en_US.ISO-8859-15
>   locale-coding-system: nil
>   default-enable-multibyte-characters: nil
>
> Please describe exactly what actions triggered the bug
> and the precise symptoms of the bug:
>
> The command changes the following paragraph
>
>  Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
> Probleme:
>
> to the paragraph
>
> bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
> auch Probleme:
>
> Note that the Ü of Übersetzung is missing in the second version.  The
> bug eats any number of Umlauts, but only as first characters of the line after
> some spaces.  Umlauts after the first non-Umlaut or in lines that begin with a
> non-space remain.
>
> I don't know how to get a list of all active modes.  The bug occurs while
> editing an LaTeX-file.  I use auc-tex and reftex.  iso-accents-mode does not
> seem to cause the bug.

I can reproduce the behavior with CVS AUCTeX, but only if I force
Emacs (21.3 or CVS) to open the file in unibyte mode by using
`find-file-literally'.  The problem is that in unibyte mode umlauts
are treated as having whitespace syntax.  For example, typing `C-u
C-x =' on the first umlaut in your example gives

  character: � (0334, 220, 0xdc)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 220
     syntax:    which means: whitespace
buffer code: 0xDC
  file code: 0xDC (encoded by coding system no-conversion)
    display: by display table entry [?�] (see below)

(Instead of the control char one actually sees a "Ü".)
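
The syntax class can also be queried from Lisp.  A minimal sketch,
assuming point is on the character in question and not at the end of
the buffer (where `char-after' returns nil):

```elisp
;; Evaluate with point on the character, e.g. via M-: in that buffer.
;; `char-syntax' returns the syntax class designator as a character:
;; a space (code 32) for whitespace, ?w for word constituents.
(char-syntax (char-after))
```

A result of 32 on the umlaut would match the "whitespace" line in the
`C-u C-x =' output above.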

An indentation function in AUCTeX uses whitespace syntax to find the
first non-whitespace character on a line (and so does
`back-to-indentation' in CVS Emacs).  That means it skips over the "Ü"
and deletes everything from the beginning of the line up to and
including the "Ü".
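
To illustrate the difference between the two ways of skipping
indentation, here is a small sketch.  It is not the AUCTeX code; it
merely models the unibyte situation by giving "Ü" whitespace syntax
in a scratch syntax table (an assumption made for the demonstration)
and then compares a syntax-based skip with a literal skip over spaces
and tabs.

```elisp
;; Sketch only: a scratch syntax table stands in for what the unibyte
;; buffer does to the character 0xDC.
(with-temp-buffer
  (let ((table (make-syntax-table)))
    ;; Assumption for the demo: force "Ü" into the whitespace class.
    (modify-syntax-entry ?Ü " " table)
    (set-syntax-table table)
    (insert " Übersetzung")
    ;; Syntax-based skip: treats "Ü" as part of the indentation.
    (goto-char (point-min))
    (skip-syntax-forward " ")
    (message "syntax-based skip stops before: %c" (char-after))
    ;; Literal skip over spaces and tabs only: stops at the "Ü".
    (goto-char (point-min))
    (skip-chars-forward " \t")
    (message "literal skip stops before: %c" (char-after))))
```

The first message reports "b", the second "Ü".  Code that deletes
everything up to the position found by the syntax-based skip would
therefore remove the umlaut together with the leading space.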

I removed this code in CVS AUCTeX, which now uses only
`back-to-indentation'.  In Emacs 21.3 this function does not look at
character syntax but simply skips spaces and tab characters at the
beginning of a line.  So unless you are using CVS Emacs (i.e. the
upcoming Emacs 21.4) your umlauts should be safe.

Anyway, do you really need unibyte mode?  If you want to use latin-1,
latin-9 and other non-ASCII encodings it is better to run Emacs in
multibyte mode.  That means you should get rid of any --unibyte
command line option, any nil value for
`default-enable-multibyte-characters' and settings like
`(standard-display-european t)'.  Among other things, this will make
`M-f' work correctly, i.e. it will no longer stop at every umlaut.
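
To check which mode a given buffer is in, something like the
following can be evaluated in that buffer (e.g. with `M-:').  This is
only a sketch; it relies on the buffer-local variable
`enable-multibyte-characters'.

```elisp
;; Report whether the current buffer is unibyte or multibyte.
;; `enable-multibyte-characters' is nil in a unibyte buffer.
(message "Buffer %s is %s"
         (buffer-name)
         (if enable-multibyte-characters "multibyte" "unibyte"))
```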

-- 
Ralf



