bug-gnu-emacs

bug#70914: 29.3; Crashes often on Windows


From: Simen Endsjø
Subject: bug#70914: 29.3; Crashes often on Windows
Date: Wed, 15 May 2024 13:24:23 +0200

This issue is solved by setting `(prefer-coding-system 'utf-8)`.
Not sure if this is the preferred fix or if this gets me into trouble
later though.
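
For reference, a minimal sketch of how this could look in an init file.
Restricting it to Windows with `system-type` is my own assumption; the
only part confirmed here is the bare `prefer-coding-system` call.

```elisp
;; Sketch only: prefer UTF-8 whenever Emacs has to guess an encoding.
;; The `system-type' guard is an assumption; drop it to apply everywhere.
(when (eq system-type 'windows-nt)
  (prefer-coding-system 'utf-8))
```

Since `prefer-coding-system` also changes the default coding system for
subprocess I/O, it is worth checking `default-process-coding-system`
afterwards with `M-:`.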

On Wed, May 15, 2024 at 1:19 PM Simen Endsjø <simendsjo@gmail.com> wrote:
>
> I found another issue. My files are stored in UTF-8 on Windows too.
> Unless I enable the Windows setting "Beta: Use UTF-8 everywhere", tools
> like ripgrep somehow interpret the files as latin-1. So I cannot
> search for special characters in my language, and I even remember
> crashes happening when searching documents that include them.
>
> On Wed, May 15, 2024 at 12:25 PM Simen Endsjø <simendsjo@gmail.com> wrote:
> >
> > > I suggest removing them and seeing whether the crashes keep happening.
> >
> > No crashes yet at least, so let's hope.
> >
> > > If removing these hacks makes something stop working, describe the
> > > problems in detail: there are definitely ways to solve them
> > > without these dangerous customizations.
> >
> > Nothing has stopped working per se, but I encounter encoding problems,
> > which are probably why I added these in the first place.
> > I tested using `emacs -Q`, so with the default settings.
> >
> > When running in a regular terminal, I get the output:
> >
> >     ┌───────────────────────────────────────────────────────────┬───────────────────────────────┬─────────────────┬───────────┬─────────────────┬───────────┐
> >     │ Package                                                   │ Installed                     │ Released        │ Latest    │ Released        │ Age (y)   │
> >
> > Tested with Git Bash, msys2, Powershell 5, Powershell 7 in Windows
> > Terminal, Powershell 7, Command Prompt.
> >
> > But in eshell, I get:
> >
> >     ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄ¿
> >     ³ Package        ³ Installed      ³ Released   ³ Latest ³ Released   ³ Age (y) ³
> >
> > And in shell:
> >
> >     +------------------------------------------------------------------------------+
> >       Package            Installed          Released   Latest Released   Age (y)
> >
> >
> > Guess I'll have to dig into encoding in emacs and integration with Windows.
> >
> > ┌ and Ú:
> >
> >                  position: 1 of 155 (0%), column: 0
> >                 character: ┌ (displayed as ┌) (codepoint 9484, #o22414, #x250c)
> >                   charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
> >     code point in charset: 0x250C
> >                    script: symbol
> >                    syntax: _     which means: symbol
> >                  category: .:Base, P:Haskell symbol constituent characters, c:Chinese, h:Korean, j:Japanese
> >                  to input: type "C-x 8 RET 250c" or "C-x 8 RET BOX DRAWINGS LIGHT DOWN AND RIGHT"
> >               buffer code: #xE2 #x94 #x8C
> >                 file code: #xE2 #x94 #x8C (encoded by coding system utf-8-dos)
> >                   display: by this font (glyph code):
> >         harfbuzz:-outline-Iosevka Slab Regular-regular-normal-normal-mono-24-*-*-*-c-*-iso8859-1 (#x605F)
> >
> >     Character code properties: customize what to show
> >       name: BOX DRAWINGS LIGHT DOWN AND RIGHT
> >       old-name: FORMS LIGHT DOWN AND RIGHT
> >       general-category: So (Symbol, Other)
> >       decomposition: (9484) ('┌')
> >
> >
> >                  position: 155 of 1140 (14%), column: 0
> >                 character: Ú (displayed as Ú) (codepoint 218, #o332, #xda)
> >                   charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
> >     code point in charset: 0xDA
> >                    script: latin
> >                    syntax: w     which means: word
> >                  category: .:Base, L:Strong L2R, j:Japanese, l:Latin, v:Viet
> >                  to input: type "C-x 8 RET da" or "C-x 8 RET LATIN CAPITAL LETTER U WITH ACUTE"
> >               buffer code: #xC3 #x9A
> >                 file code: #xC3 #x9A (encoded by coding system utf-8-dos)
> >                   display: by this font (glyph code):
> >         harfbuzz:-outline-Iosevka Slab Regular-regular-normal-normal-mono-24-*-*-*-c-*-iso8859-1 (#x9B)
> >
> >     Character code properties: customize what to show
> >       name: LATIN CAPITAL LETTER U WITH ACUTE
> >       old-name: LATIN CAPITAL LETTER U ACUTE
> >       general-category: Lu (Letter, Uppercase)
> >       decomposition: (85 769) ('U' '́')
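> >
> > The eshell output is consistent with bytes from an OEM code page being
> > decoded as Latin-1: in cp437, ┌ is the byte #xDA, which is the code
> > point reported for Ú above. A minimal sketch to reproduce the mix-up
> > (that the tool writes cp437 rather than another OEM code page is an
> > assumption, not something I have verified):
> >
> > ```elisp
> > ;; Encode the box-drawing characters the way a cp437 console would
> > ;; (assumed code page), then decode those bytes as Latin-1.
> > (let* ((box "┌─┬┐│")
> >        (bytes (encode-coding-string box 'cp437))
> >        (misread (decode-coding-string bytes 'latin-1)))
> >   (message "%s -> %s" box misread))
> > ```
> >
> > Evaluating this with M-: or in *scratch* shows whether the result
> > matches the ÚÄÂ¿³ characters from eshell.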
> >
> > On Tue, May 14, 2024 at 4:18 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > >
> > > > From: Simen Endsjø <simendsjo@gmail.com>
> > > > Date: Tue, 14 May 2024 15:58:48 +0200
> > > > Cc: 70914@debbugs.gnu.org
> > > >
> > > > I'm not really sure why I've added these anymore. I've added them over
> > > > time since 2016, first using Spacemacs, then Doom Emacs.
> > > >
> > > > >>   ;; Windows doesn't set this, but some packages might depend on the variable
> > > > >>   (setenv "LANG" "en_US")
> > > > >
> > > > > The comment is not correct.  To see for yourself, ensure LANG is not
> > > > > set in the system-wide environment, start "emacs -Q", and then type
> > > > >
> > > > >  M-: (getenv "LANG") RET
> > > >
> > > > That's interesting. I usually just { M-x getenv }, and LANG isn't
> > > > listed there. (getenv "LANG") returns "ENU" though. Looking at the
> > > > environment variables for the process, I see LANG listed there. How is
> > > > getenv *not* listing the variable? Has it been marked as special
> > > > somehow and filtered out?
> > >
> > > It's a Windows-specific trick: we add a few environment variables at
> > > startup such that getenv can access them, but don't want them to appear
> > > in process-environment explicitly, and so the function that prompts
> > > for the variable when you invoke getenv interactively doesn't know
> > > about them.
> > >
> > > > > This is a very bad idea, IME.  The clipboard on Windows uses UTF-16,
> > > > > and Emacs knows how to decode it correctly.  Customizing
> > > > > clipboard-coding-system to something else just gets in the way.
> > > >
> > > > Probably something I did after changing Windows to use utf-8, which also
> > > > includes the clipboard.
> > > >
> > > > > I don't know where the comment about latin-1 by default comes from
> > > > > (maybe from Windows 9X days?), but it has not been true on Windows
> > > > > for a very long time.  The default value of selection-coding-system on
> > > > > Windows is utf-16le-dos, you can again verify that in "emacs -Q".
> > > >
> > > > Maybe I broke something else when trying to get text to work properly
> > > > and added that hack as a workaround..? I really have no idea. Don't
> > > > want to dig through my git commits to find out ;)
> > > >
> > > > > Again, I'm not sure this is relevant to the crashes.  But it doesn't
> > > > > do any harm to make your Emacs configuration healthier ;-)
> > > >
> > > > Yes, thanks a lot for the help! I'm a bit scared to remove these hacks
> > > > I've accumulated over time, as I probably added them for a reason.
> > > > But hopefully the workarounds were just for some symptoms and not the
> > > > root cause -- we'll see.
> > >
> > > I suggest removing them and seeing whether the crashes keep happening.
> > >
> > > If removing these hacks makes something stop working, describe the
> > > problems in detail: there are definitely ways to solve them
> > > without these dangerous customizations.




