Re: [Nano-devel] massive updates: UTF-8 support, etc.
David Lawrence Ramsey
Wed, 19 Jan 2005 17:47:37 +0100
--- Jordi Mallach <address@hidden> wrote:
>This is great news, half of Debian was waiting for this mail. :)
>Do you think the current code is packageable, or should I wait for
>1.3.6? I'm concerned about the 8bit nasty bugs in vanilla 1.3.5 which I
>have in experimental.
It's not quite packageable yet, unfortunately. In the meantime, the
attached patch removes the buggy UTF-8 support from 1.3.5, so that
typing 8-bit characters should at least work again. It also
fixes a mismatched prototype found by Jeremy Huddleston in Gentoo that's
been fixed in CVS (break_line() should return a ssize_t and not an int).
(Those of you using the old patch that adds the "noutf8" flag should be
aware that UTF-8 support in CVS is now autodetected based on locale, so
the flag is no longer needed.)
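
For those wondering what locale-based autodetection can look like, here
is a minimal sketch using only standard POSIX calls (setlocale() and
nl_langinfo()). The helper names codeset_is_utf8() and locale_is_utf8()
are made up for illustration, and this is not necessarily how the code
in CVS does it.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: does a codeset name, as returned by
 * nl_langinfo(CODESET), denote UTF-8?  Some systems spell it "utf8",
 * so compare case-insensitively against both spellings. */
bool codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
        (strcasecmp(codeset, "UTF-8") == 0 ||
         strcasecmp(codeset, "UTF8") == 0);
}

/* Hypothetical helper: adopt the user's locale from the environment
 * (LANG, LC_ALL, LC_CTYPE) and report whether its encoding is UTF-8. */
bool locale_is_utf8(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return false;   /* Unsupported locale: stay in single-byte mode. */

    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

With something like this, a program would call locale_is_utf8() once at
startup and pick its multibyte or single-byte code paths accordingly,
with no user-visible flag involved.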
In CVS, the areas that still need UTF-8 support are:
* Tab completion of filenames and display of filenames in the file
browser. These are both in DB's old UTF-8 patch, but the changes needed
for them are on top of his changes in his old behemoth patch, which I
haven't gotten to porting over yet.
* revstrcasestr() and the equivalent of strcasestr(), since
case-insensitive searches for UTF-8 strings won't work without support,
and also strncmp(), since its length argument needs to be in terms of
* All of the help browser code, which currently cuts UTF-8 lines off
prematurely. (To see this, open a UTF-8 terminal using DejaVu fonts and
the ru_RU.UTF-8 locale, and compare the appearance of the help menu to
its appearance in an ru_RU locale.) The use of help_line_len() may have
to be changed, too, since calculating it for every line on the fly is
much slower if it contains calls to strlenpt() to get the number of
columns the line actually takes up in preparation for breaking it at a
space. And should it break at Unicode spacing characters as well as
ASCII spaces?
* Possibly parts of the justify routine. Should justified lines of a
paragraph break at Unicode spacing characters as well as ASCII spaces?
If so, what do we do when spaces need to be added to the end of a line?
Do we just add ASCII spaces as we do now, or do we try to guess what
spacing character should be there (which could be prone to error)?
* All of the rcfile code, since it needs to read string arguments in
some cases and hence needs to parse the characters correctly.
* Whitespace display mode, since it partially relies on the rcfile code.
* The calculation of totsize, since it still maintains a count of
single-byte characters instead of multibyte characters in UTF-8 mode.
* The use of NO_CONVERT. Since it's normally used when opening binary
files, it should treat even UTF-8 as binary when it's used and display
the edit window and statusbar text as a raw stream of bytes. I'm not
yet sure where to put the hooks for this in a way that won't break
display of other things and will stay consistent when switching between
multiple buffers, though.
* What should be done about slang, since its UTF-8 support in 2.0
doesn't seem to work with its curses emulation, meaning that nano
binaries built with it can't support UTF-8 properly?
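
To illustrate the byte-versus-character distinction behind the strncmp()
and totsize items above, here is a rough sketch. It assumes valid UTF-8
input and works directly on the encoding rather than through the locale
functions; all of the function names are hypothetical and none of this
is taken from the code in CVS.

```c
#include <stddef.h>
#include <string.h>

/* A byte of the form 10xxxxxx continues a multibyte UTF-8 sequence. */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: number of characters (not bytes) in a
 * NUL-terminated string, assuming it is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++)
        if (!is_continuation((unsigned char)*s))
            count++;

    return count;
}

/* Hypothetical helper: number of bytes taken up by the first n
 * characters of s (or by all of s, if it has fewer characters). */
size_t utf8_prefix_bytes(const char *s, size_t n)
{
    size_t bytes = 0;

    while (s[bytes] != '\0') {
        /* Each non-continuation byte starts a new character. */
        if (!is_continuation((unsigned char)s[bytes])) {
            if (n == 0)
                break;
            n--;
        }
        bytes++;
    }

    return bytes;
}

/* Hypothetical strncmp() variant whose n counts characters. */
int utf8_strncmp(const char *a, const char *b, size_t n)
{
    size_t len_a = utf8_prefix_bytes(a, n);
    size_t len_b = utf8_prefix_bytes(b, n);
    size_t shorter = len_a < len_b ? len_a : len_b;
    int cmp = memcmp(a, b, shorter);

    if (cmp != 0 || len_a == len_b)
        return cmp;

    /* One string ended early, so it sorts first. */
    return len_a < len_b ? -1 : 1;
}
```

For a string such as "h\xc3\xa9" "llo", strlen() reports 6 bytes while
utf8_char_count() reports 5 characters, and a plain strncmp() with a
length of 2 would stop in the middle of the two-byte character.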
Other long-standing non-UTF-8 and non-TODO list issues that need to be
solved eventually and which I haven't forgotten:
* Improvements to the color code and related improvements to the rcfile
code (specifying regexes in separate files, specifying different regexes
for different filenames of the same type, etc.), as mentioned in posts
to the list a few months ago and earlier.
* Port over more of Gentoo's color regexes, as posted by Mike Frysinger
* Modifying the shortcut list display on the last two lines of the
screen so that it can handle overly long shortcuts in e.g. the Ukrainian
* Adding the keypad flag back in if the issues with the numeric keypad
and PuTTY still exist. (There's currently only room for one more flag
before the 32-bit limit is reached again, and bit fields don't work
because there's no way to subscript them and hence no way to associate
them with toggles, so another approach may have to be used.)
* Adding support for having the Alt key act as a Meta key when using
PDCurses, as suggested by Tom Haller back in October. Related
questions: How do the KEY_ALT_L and KEY_ALT_R keys actually work? Are
they generated when you press those keys alone, or are they generated as
part of a sequence when you press them as part of a combination (i.e.,
does [Left-]Alt-G produce "KEY_ALT_L" "g"?)
* Breaking the WriteOut routine into several separate routines for ease
of maintenance, as suggested by Jay Carlson back in 2002, and possibly
working with file descriptors instead of filenames in the process so
that safe_tempnam() is no longer needed and mkstemp() can be used
* Fixing the (now-overhauled) statusbar code so that it only refreshes
when it needs to, as the edit window code already does.
* Fixing the history code so that scrolling works properly in all cases.
Problems occasionally pop up, although I haven't figured out how to
reproduce them reliably; adding code to just peek at the next history
entry without moving to it would be easier to deal with. There should
also be no blank line produced when scrolling up at the top of the list,
since text typed there is not preserved when scrolling as it is at the
bottom of the screen, and having a blank line only at the end is
consistent with the idea of the magicline. Finally, break the history
shortcut into two shortcuts, one for moving up and one for moving down,
and set their key values properly so that mouse clicks work on them.
* Adding mouse support to the statusbar prompt, in terms of moving the
cursor to where the user clicks.
* Possibly finding a generic way to merge the edit window and statusbar
routines in some cases, in order to cut down on the amount of duplicated
code between the two. (Related: How should filename searches in the
file browser be implemented, since findnextstr() works only with
filestructs and filenames are a simple char* array? Creating a fake
filestruct containing all filenames would be easiest, but seems
* Going over the list of potential memory problems that Rocco sent
a while back, in case any of those could be latent bugs.
* Porting over the last of the useful code from DB's behemoth patch
(other than what's needed for UTF-8), such as the improved rcfile
parsing and the ability to handle smaller window sizes in nano.c.
* Possibly more rearrangement of source files, such as putting most edit
window-specific functions in edit.c and most statusbar prompt-specific
functions in status.c (not statusbar.c, since it doesn't fit in the 8.3
filename limit and could cause problems on Cygwin).
* Fribidi support, as mentioned ages ago (although this has to wait
until UTF-8 support is done and probably until we actually have a
translation in an RTL language so that it can be tested properly).
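
On the flag limit mentioned in the keypad item above: one possible way
around both the 32-bit ceiling and the bit-field subscripting problem is
to number the flags and store them in an array of words. This is only a
sketch of that idea, not a decision; every name and flag number below is
hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag numbers; the values are purely illustrative. */
enum { FLAG_KEYPAD = 31, FLAG_MORESPACE = 32, FLAG_COUNT = 64 };

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define FLAG_WORDS ((FLAG_COUNT + WORD_BITS - 1) / WORD_BITS)

/* One bit per flag, spread over as many words as needed. */
static unsigned flag_words[FLAG_WORDS];

void set_flag(unsigned flag)
{
    flag_words[flag / WORD_BITS] |= 1U << (flag % WORD_BITS);
}

void toggle_flag(unsigned flag)
{
    flag_words[flag / WORD_BITS] ^= 1U << (flag % WORD_BITS);
}

bool flag_is_set(unsigned flag)
{
    return (flag_words[flag / WORD_BITS] >> (flag % WORD_BITS)) & 1U;
}

/* Because a flag is now a plain number, a toggle can simply store it. */
struct toggle {
    int key;         /* Key value that triggers the toggle. */
    unsigned flag;   /* Flag number to flip, e.g. FLAG_MORESPACE. */
};

void do_toggle(const struct toggle *t)
{
    toggle_flag(t->flag);
}
```

Raising FLAG_COUNT then adds room for more flags without touching the
accessors, at the cost of replacing direct bitmask tests with calls.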
The changes in CVS since my last email are:
* UTF-8 support added to the strcasecmp() equivalent, the strncasecmp()
equivalent, do_next_word(), and do_prev_word()
* UTF-8 support fixed in mbstrnlen()
* String functions in utils.c moved to chars.c, as almost all of them
need multibyte versions to support UTF-8, and they all deal with
* Support added for the -O/--morespace option, Meta-O toggle, and
"morespace" rcfile option, which allows the blank line below the
titlebar to be used as part of the edit window, as suggested back in
* Support added for the CUT_TO_END flag when pressing Ctrl-K at the
statusbar prompt, for consistency with the edit window.
* Support added for moving to the next or previous word at the statusbar
prompt, since it may be useful when long strings of words are there.
* Updated documentation for the manual pages and info pages, syncing
their descriptions with those in nanorc.sample where necessary and