bug-gnu-emacs


bug#61726: [PATCH] Eglot: Support positionEncoding capability


From: Augusto Stoffel
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 14:31:52 +0100

On Thu, 23 Feb 2023 at 14:54, Eli Zaretskii wrote:

>> But just to confirm: position-bytes and byte-to-position are always with
>> respect to Emacs's internal extended UTF-8 representation and have
>> nothing to do with the buffer file encoding, right?
>
> Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
> jump through to get it right, even just with UTF-8.

Okay, then we're on the same page.  Just to emphasize, the buffer file
is totally irrelevant for Eglot's purposes.  The only thing that matters
is the representation of the buffer text when it's serialized as a
UTF-8-encoded string inside a JSON object.
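
To make this concrete, here is a minimal sketch of how a codepoint index
within a line maps to an LSP `character' offset under each of the three
position encodings the protocol defines.  It is written in Python rather
than Emacs Lisp so that it does not depend on Emacs's internal text
representation.  The function name `lsp_character_offset' is made up for
illustration and is not part of Eglot.

```python
def lsp_character_offset(line: str, index: int, encoding: str = "utf-16") -> int:
    """Return the LSP `character` offset of the codepoint at `index` in `line`.

    `index` counts Unicode codepoints, which is also how a Python str is indexed.
    """
    prefix = line[:index]
    if encoding == "utf-32":
        return len(prefix)                            # one unit per codepoint
    if encoding == "utf-16":
        return len(prefix.encode("utf-16-le")) // 2   # 16-bit code units
    if encoding == "utf-8":
        return len(prefix.encode("utf-8"))            # bytes
    raise ValueError(f"unknown position encoding: {encoding}")


line = "x⇒😃y"
for enc in ("utf-8", "utf-16", "utf-32"):
    # Offset of "y", i.e. the position just past x, ⇒ and 😃.
    print(enc, lsp_character_offset(line, 3, enc))
```

Nothing in this computation involves the coding system of the visited
file; only the text of the line matters.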

>> > What does this stuff do with double-width or zero-width characters?
>> > Emacs takes character-width into consideration when it counts columns,
>> > but it is unclear to me what LSP servers do in those cases.
>> > Likewise with characters that are composed on display.
>> 
>> `eglot-move-to-column' is supposed to count Unicode codepoints, so
>> e.g. x, ⇒ and 😃 all contribute 1 unit.
>
> But if the resulting column is then used in move-to-column etc., it
> might go to the wrong column, because in Emacs each column is not
> necessarily a single codepoint.  The simplest example is a TAB
> character, but there are more examples, some of which are quite
> complicated (see below).

There's only one function that uses `move-to-column'.  It's very old and
I didn't touch it.

>> On the other hand, the Emoji
>> 🧛‍♀️ contributes 4 units. This is independent of the screen display.
>
> Not in Emacs.

Sorry, I don't understand what you mean.  Emacs has no say as to how
Emoji are represented as sequences of codepoints.  The female vampire
Emoji is 4 codepoints, if I'm counting it right.

Of course I understand that the Emoji occupies 1 column on my screen.
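
For reference, the count can be checked outside Emacs.  The sketch below
(Python, assuming the sequence is spelled out by its codepoints rather
than pasted as a glyph) lists the codepoints of the Emoji and counts the
units it contributes under each encoding.

```python
import unicodedata

# The woman-vampire Emoji, written out as its individual codepoints.
vampire = "\U0001F9DB\u200D\u2640\uFE0F"

for ch in vampire:
    # Fall back to a placeholder if this Python's Unicode tables lack a name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")

codepoints = len(vampire)                            # UTF-32 units
utf16_units = len(vampire.encode("utf-16-le")) // 2  # UTF-16 code units
utf8_bytes = len(vampire.encode("utf-8"))            # UTF-8 bytes
print(codepoints, utf16_units, utf8_bytes)           # prints: 4 5 13
```

None of these three numbers is the number of screen columns, which is a
display matter.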

>> By the way, I don't understand your claim about column counting.  If I
>> move point over 🧛‍♀️, the mode line column count increments by 3 units,
>> which seems to make no sense: this Emoji is 4 codepoints long and
>> occupies 1 screen column.  What's the logic here?
>
> If that is what you see, it could be a bug.  Does current-column agree
> with what you see in the mode line?

Yes.

> In general, characters (codepoints) that are composed on display into
> a single glyph or "grapheme cluster" are supposed to be counted as a
> single column.  Try typing this in "emacs -Q"
>
>   a C-x 8 RET COMBINING ACUTE ACCENT RET
>
> If your default font is capable enough, you will see a single glyph of
> 'a' with acute accent (á), and it will count as 1 column, although
> there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
> will move past both codepoints.  Now imagine that we get such sequences
> from the LSP server -- what will Eglot do in terms of column counting?

Right, I understand the Unicode business (thanks for the pointers in any
case).

If you look carefully at the Eglot code, you will see that
`move-to-column' only appears in the code pertaining to the “UTF-16 way of
counting offsets”, which

1. is old and I didn't touch in this patch,
2. seems to work correctly, despite looking suspicious, and
3. will not be used anymore when both Eglot and the LSP server support
   the positionEncoding capability.
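
A rough sketch of the negotiation behind point 3, using the LSP 3.17
field names `general.positionEncodings' (client side) and
`positionEncoding' (server side).  The helper names and the preference
order shown are assumptions for illustration, not what the patch
necessarily does.

```python
DEFAULT_ENCODING = "utf-16"  # what LSP assumes when nothing is negotiated


def client_capabilities(preferred):
    """Build the part of the `initialize` params that advertises encodings."""
    return {"general": {"positionEncodings": list(preferred)}}


def negotiated_encoding(initialize_result: dict) -> str:
    """Read the server's choice, falling back to UTF-16 if it made none."""
    capabilities = initialize_result.get("capabilities", {})
    return capabilities.get("positionEncoding", DEFAULT_ENCODING)


# Hypothetical preference order: codepoints first, UTF-16 as last resort.
offer = client_capabilities(["utf-32", "utf-8", "utf-16"])

# A server that understands the capability picks one of the offered values...
print(negotiated_encoding({"capabilities": {"positionEncoding": "utf-32"}}))
# ...while an older server says nothing, so the UTF-16 code path remains in use.
print(negotiated_encoding({"capabilities": {}}))
```

With a server that answers "utf-32", offsets are plain codepoint counts
and the UTF-16 code path is never entered.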

I hope this motivates you to add this feature 🙂.




