bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator c

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Date:	Mon, 17 Jul 2017 18:09:46 +0300

> Date: Tue, 22 Mar 2016 18:13:15 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 23086@debbugs.gnu.org
> 
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Tue, 22 Mar 2016 11:42:46 +0100
> > 
> > Type some characters
> > C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> > Type some more characters
> > M-q
> > 
> > Expected behavior: Emacs treats these characters as line and paragraph
> > separators: they are displayed as line breaks, M-q doesn't remove them,
> > and forward-paragraph etc. treat the paragraph separator as paragraph
> > end.
> > 
> > Actual behavior: These characters are displayed as one-pixel horizontal
> > whitespace and otherwise ignored.
> > 
> > Also discussed in
> > https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> > https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> > support for these characters, but I think proper treatment of Unicode
> > separators should be part of Emacs.
> 
> It is not clear to me what exactly is the requested feature.  Can you
> propose a detailed list of requirements?
> 
> I'm asking because these characters come in Unicode with a non-trivial
> baggage, that is a far cry from just breaking the line; see
> 
>   http://unicode.org/reports/tr14/
>   http://unicode.org/reports/tr29/
> 
> There are also implications on the bidirectional display (it is
> sensitive to where the line and the paragraph begin and end).
> 
> If we want to support these two characters, we should think about
> which parts of the relevant functionality we want to see in Emacs,
> because users will expect that.  In addition, there are other
> white-space characters defined by Unicode, and it would make sense to
> treat them all alike.  I'm not sure it makes sense to support just the
> line-breaking and paragraph-separator parts of only these two
> characters.
> 
> Then there are Emacs-specific issues, for example:
> 
>  . do we treat u+2028 and u+2029 as literal characters, or as a form
>    of EOL encoding?
>  . if the former, how do we distinguish them from newlines on display?
>  . should Isearch find these when looking for "\n"? how about regexp
>    search for "$"?
> 
> There are probably more implications, these are just the ones that
> popped into my mind in 5 sec.  IOW, I think Someone™ should think this over and
> present a detailed proposal.

So I've dusted off this year-old bug report and decided to improve
Emacs in this area.  Here's what I propose:

 . u+2028 and u+2029 (and also perhaps u+0085) will be treated as a
   form of EOL encoding, which means they will not appear on display,
   and will cause the next character to be displayed on the next
   screen line
 . M-q will remove u+2028, as it removes newlines, and put newlines
   at all EOLs as part of filling
 . M-q will NOT remove u+2029, unless the user wants to refill several
   paragraphs as a single paragraph, and there happens to be a u+2029
   between some of the paragraphs
 . forward-paragraph etc. will treat u+2029 as paragraph end
 . bidi reordering will treat u+2029 as paragraph end
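
As a rough illustration of the behavior proposed in the list above,
here is a minimal Python sketch (not Emacs code).  The function names
display_lines and fill are hypothetical, u+0085 is assumed to be
included among the EOL characters, and paragraphs are assumed to be
delimited only by u+2029 (blank-line paragraph boundaries are ignored
for brevity):

```python
import re
import textwrap

LS = "\u2028"   # LINE SEPARATOR
PS = "\u2029"   # PARAGRAPH SEPARATOR
NEL = "\u0085"  # NEXT LINE (assumed here to be treated as an EOL too)


def display_lines(text):
    """Split text into screen lines: newline, NEL, LS and PS all end a
    line and are not themselves shown."""
    return re.split("[\n\u0085\u2028\u2029]", text)


def fill(text, width=70):
    """Refill each paragraph separately.

    LS, NEL and newline are treated as soft line breaks and are
    replaced by spaces before wrapping; PS ends a paragraph and is
    kept between the refilled paragraphs.
    """
    filled = []
    for paragraph in text.split(PS):
        joined = re.sub("[\n\u0085\u2028]", " ", paragraph)
        # textwrap.fill puts plain newlines at the new line ends.
        filled.append(textwrap.fill(joined, width))
    return PS.join(filled)
```

For example, fill("aaa" + LS + "bbb" + PS + "ccc\nddd") joins each
paragraph onto one line and leaves the u+2029 between them in place.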

There are some compromises in these decisions, but they make the job
much easier and less intrusive, and I think they will advance the
level of our Unicode support quite a bit.

Comments?

I think we should also make $ match these two characters, in addition
to the newline, but that could be more difficult.  Would someone who
knows their way around regex.c want to work on this part?
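
To make the intended semantics of $ concrete, here is a small Python
sketch.  It uses Python's re module, not Emacs's regex.c, and the
pattern EOL and the helper line_end_positions are hypothetical names
chosen for illustration.  The lookahead matches the empty position
before a newline, u+2028 or u+2029, or at the very end of the text:

```python
import re

# Hypothetical "extended $": an empty match just before a newline,
# U+2028 or U+2029, or at the end of the text.
EOL = r"(?=[\n\u2028\u2029]|\Z)"


def line_end_positions(text):
    """Return the offsets at which the extended end-of-line matches."""
    return [m.start() for m in re.finditer(EOL, text)]


def builtin_line_end_positions(text):
    """Offsets matched by Python's own $ in multiline mode, which
    recognizes only the newline, for comparison."""
    return [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]
```

On the text "ab" + u+2028 + "cd\ne", the extended pattern reports an
end of line before the u+2028 as well as before the newline and at the
end of the text, whereas the stock $ reports only the latter two.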
