bug#5797: 23.1; search-forward in unibyte buffer for \377
From: rasmith
Subject: bug#5797: 23.1; search-forward in unibyte buffer for \377
Date: Mon, 29 Mar 2010 10:09:19 -0500 (CDT)
search-forward fails to find a single \377 byte in a raw unibyte buffer.
I use "cgreek", a package written by Naoto Takahashi for handling
polytonic (ancient, fully accented) Greek. It includes a file,
cgreek-tlg.el, for processing the files in the Thesaurus Linguae
Graecae, which have their own unique formats. In these files, the
byte \377 is used as a string terminator. Prior to Emacs 23, these
files could be processed by reading the file in with
insert-file-contents-literally, making the buffer unibyte with
(set-buffer-multibyte nil), and searching for the string terminator
with (search-forward (char-to-string ?\xff)). However, that search
now fails to find a single byte \377 and instead matches the
two-byte sequence \231\277.
Changing the search function to (search-forward (unibyte-string ?\377))
has the same result.
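A self-contained way to reproduce the unibyte case is sketched below. It assumes that a few raw bytes inserted into a temporary unibyte buffer are a fair stand-in for a TLG file read with insert-file-contents-literally; the function name my-tlg-check-search is hypothetical. It returns a cons of what search-forward reports and the position just past the \377 byte as found by a byte-level scan. For this buffer the second element is 5; the first element should also be 5 if search-forward finds the byte, and nil if it does not.

```elisp
;; Sketch of a reproduction.  Assumption: inserting raw bytes into a
;; temporary unibyte buffer stands in for a TLG file read with
;; `insert-file-contents-literally'.  The function name is hypothetical.
(defun my-tlg-check-search ()
  "Return (SEARCH-POS . BYTE-POS) for the first \\377 in a test buffer.
SEARCH-POS is what `search-forward' returns (nil if no match).
BYTE-POS is the position just after the \\377 byte, found by a
byte-level scan with `skip-chars-forward'."
  (with-temp-buffer
    ;; Make the (still empty) buffer unibyte, as the TLG code does.
    (set-buffer-multibyte nil)
    ;; "abc\377def" is a unibyte string, so \377 goes in as one raw byte.
    (insert "abc\377def")
    (cons
     ;; The search the TLG code originally relied on.
     (progn (goto-char (point-min))
            (search-forward (char-to-string ?\xff) nil t))
     ;; Byte-level scan: stop before \377, then report the next position.
     (progn (goto-char (point-min))
            (skip-chars-forward "^\377")
            (and (not (eobp)) (1+ (point)))))))
```

Comparing the two elements makes the discrepancy visible without needing a TLG file.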
After further investigation, I'm not certain this is a bug: it may be an
intentional part of the changes made to accommodate UTF-8. Here are
the details:
In a multibyte buffer (set-buffer-multibyte t):
  (search-forward (char-to-string ?\xff))  matches UTF-8 "ÿ" (i.e. \303\277)
  (search-forward (char-to-string ?\377))  matches UTF-8 "ÿ"
  (search-forward (unibyte-string ?\377))  matches byte \377
In a unibyte buffer (set-buffer-multibyte nil):
  (search-forward (char-to-string ?\xff))  matches \231\277
  (search-forward (char-to-string ?\377))  matches \231\277
  (search-forward (unibyte-string ?\377))  matches \231\277
In other words, search-forward cannot find byte \377 when searching in
a *unibyte* buffer, but it can find that same byte once the buffer is
changed to multibyte. The reason seems to be that in a unibyte buffer,
search-forward converts byte \377 to a two-byte representation
(\231\277), which is not its UTF-8 encoding (\303\277).
This may be exactly the intended behavior of search-forward, but it
breaks scripts that expect search-forward to find a single high
(8-bit) byte in a unibyte buffer. In this context, changing the buffer
to multibyte is not a solution.
The code in which I found this problem can be fixed by replacing

  (search-forward (char-to-string ?\xff))

with

  (skip-chars-forward "^\377")
  (forward-char 1)

(fix provided by Naoto Takahashi)
However, that means that scripts counting on the old behavior of
search-forward will have to be modified.
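Where many scripts need this change, the fix can be wrapped in a small helper. The sketch below is hypothetical (the name my-tlg-forward-past-terminator is invented here); unlike the bare two-line fix, it avoids the end-of-buffer error that (forward-char 1) signals when no \377 follows point.

```elisp
;; Hypothetical helper wrapping the skip-chars-forward workaround.
(defun my-tlg-forward-past-terminator ()
  "Move point just past the next \\377 byte in the current buffer.
Return the new position.  If no \\377 follows point, leave point
at the end of the buffer and return nil."
  ;; Skip every byte that is not \377; this stops just before the terminator.
  (skip-chars-forward "^\377")
  ;; Unless we ran off the end, point is now on \377: step over it.
  (unless (eobp)
    (forward-char 1)
    (point)))
```

For example, in a unibyte buffer containing abc\377def\377 with point at the beginning, successive calls return 5, 9, and then nil.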
In GNU Emacs 23.1.1 (amd64-portbld-freebsd8.0, GTK+ Version 2.18.7)
of 2010-03-25 on aristotle.tamu.edu
Windowing system distributor `The X.Org Foundation', version 11.0.10605000
configured using `configure '--with-x-toolkit=gtk'
'--x-libraries=/usr/local/lib' '--x-includes=/usr/local/include'
'--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/'
'--build=amd64-portbld-freebsd8.0' 'build_alias=amd64-portbld-freebsd8.0'
'CC=cc' 'CFLAGS=-O2 -pipe -fno-strict-aliasing' 'LDFLAGS=-L/usr/local/lib
-lintl' 'CPPFLAGS=-I/usr/local/include''
Important settings:
value of $LC_ALL: en_US.UTF-8
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
tool-bar-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
global-auto-composition-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
o <down> <down> <down> <return> C-q 0 0 0 <return>
C-q 3 7 7 <return> <up> <up> <up> <left> <up> C-x C-e
C-x o <down> <down> <down> <down> <backspace> <backspace>
C-q 2 3 1 <return> ] <backspace> C-q 2 7 7 <return>
<up> <up> <up> <up> C-e C-x C-e <up> <up> <left> C-x
C-e <up> <up> <switch-frame> <down-mouse-1> <mouse-movement>
<switch-frame> <mouse-1> <help-echo> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <help-echo>
<up> <up> <left> <up> <right> C-k C-y <return> C-y
<left> <backspace> <backspace> <backspace> t <right>
C-x C-e <down> <right> <right> <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right>
<right> <right> <right> C-x C-e C-x o <down> C-x C-e
<up> <up> <up> <left> <left> <left> <left> <return>
<up> ( s e a r c h - f o r w a r d SPC ( c h a r -
t o - s t r i o n g <backspace> <backspace> <backspace>
g <backspace> g SPC <backspace> <backspace> n g SPC
? \ x f f ) ) C-x C-e C-x o <up> <up> <down> <up> C-x
C-e <down> <down> C-e C-x C-e <up> <up> <up> <up> C-e
C-x C-e <up> <up> <left> C-x C-e <up> <up> <up> <up>
<up> <up> C-e C-x C-e <down> C-e C-x C-e C-x o <down>
<down> <down> <down> <down> <down> <return> C-q 3 7
7 <return> <up> <up> <up> <up> <up> <up> <left> <left>
C-x C-e <up> <up> <up> <up> <up> <up> <down> <left>
<left> C-x C-e <up> <up> <up> <up> <left> C-x C-e <up>
<up> <up> <up> <up> <left> <left> <left> <left> <left>
C-x C-e <down> <down> C-e C-x C-e <up> <up> <up> <up>
C-e C-x C-e <up> <up> <up> C-e C-x C-e <down> <switch-frame>
<switch-frame> <help-echo> <help-echo> <help-echo>
M-x r e p o r t <tab> b <tab> <return>
Recent messages:
Entering debugger...
326
Entering debugger...
nil
369 [3 times]
t
Entering debugger...
374 [2 times]
366
nil
369 [3 times]