Re: [Nano-devel] massive updates: UTF-8 support, etc.

From: David Lawrence Ramsey
Subject: Re: [Nano-devel] massive updates: UTF-8 support, etc.
Date: Wed, 19 Jan 2005 17:47:37 +0100
User-agent: Mutt/1.5.6+20040907i

--- Jordi Mallach <address@hidden> wrote:
>Hi David!
>This is great news, half of Debian was waiting for this mail. :)

Oh good.

>Do you think the current code is packageable, or should I wait for 
>1.3.6? I'm concerned about the 8bit nasty bugs in vanilla 1.3.5 which I 
>have in experimental.

It's not quite packageable yet, unfortunately.  In the meantime, the 
attached patch should remove the buggy UTF-8 support from 1.3.5, so that 
the typing of 8-bit characters should at least work again.  It also 
fixes a mismatched prototype found by Jeremy Huddleston in Gentoo that's 
been fixed in CVS (break_line() should return a ssize_t and not an int).

(Those of you using the old patch that adds the "noutf8" flag should be 
aware that UTF-8 support in CVS is now autodetected based on locale, so 
the flag is no longer needed.)

In CVS, the areas that still need UTF-8 support are:

* Tab completion of filenames and display of filenames in the file 
browser.  These are both in DB's old UTF-8 patch, but the changes needed 
for them are on top of his changes in his old behemoth patch, which I 
haven't gotten to porting over yet.

* revstrcasestr() and the equivalent of strcasestr(), since 
case-insensitive searches for UTF-8 strings won't work without multibyte 
support, and also the equivalent of strncmp(), since its length argument 
needs to be in terms of multibyte characters rather than bytes.

* All of the help browser code, which currently cuts UTF-8 lines off 
prematurely.  (To see this, open a UTF-8 terminal using DejaVu fonts and 
the ru_RU.UTF-8 locale, and compare the appearance of the help menu to 
its appearance in an ru_RU locale.)  The use of help_line_len() may have 
to be changed, too, since calculating it for every line on the fly is 
much slower if it contains calls to strlenpt() to get the number of 
columns the line actually takes up in preparation for breaking it at a 
space.  And should it break at Unicode spacing characters as well as 
ASCII spaces?

* Possibly parts of the justify routine.  Should justified lines of a 
paragraph break at Unicode spacing characters as well as ASCII spaces?  
If so, what do we do when spaces need to be added to the end of a line?  
Do we just add ASCII spaces as we do now, or do we try to guess what 
spacing character should be there (which could be prone to error)?

* All of the rcfile code, since it needs to read string arguments in 
some cases and hence needs to parse the characters correctly.

* Whitespace display mode, since it partially relies on the rcfile code.

* The calculation of totsize, since it still maintains a count of 
single-byte characters instead of multibyte characters in UTF-8 mode.

* The use of NO_CONVERT.  Since it's normally used when opening binary 
files, it should treat even UTF-8 as binary when it's used and display 
the edit window and statusbar text as a raw stream of bytes.  I'm not 
yet sure where to put the hooks for this in a way that won't break 
display of other things and will stay consistent when switching between 
multiple buffers, though.

* What should be done about slang, since its UTF-8 support in 2.0 
doesn't seem to work with its curses emulation, meaning that nano 
binaries built with it can't support UTF-8 properly?
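
To illustrate the byte-versus-character distinction behind the totsize 
and strncmp() items above, here is a minimal sketch.  It assumes valid 
UTF-8 input and decodes it by hand instead of going through the 
locale-aware multibyte functions; the names utf8_strlen(), 
utf8_prefix_bytes(), and utf8_strncmp() are hypothetical and are not 
functions from nano.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes taken up by the first n UTF-8
 * characters of s, or by all of s if it has fewer than n characters.
 * Assumes s is valid UTF-8. */
static size_t utf8_prefix_bytes(const char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL);

    while (s[i] != '\0' && n > 0) {
        /* Step over the lead byte, then over any continuation bytes
         * (those of the form 10xxxxxx). */
        i++;
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }

    return i;
}

/* Hypothetical helper: count characters instead of bytes by counting
 * every byte that is not a continuation byte. */
static size_t utf8_strlen(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }

    return count;
}

/* Hypothetical strncmp() variant whose length argument n is measured in
 * characters, not bytes.  Returns <0, 0, or >0 like strncmp(). */
static int utf8_strncmp(const char *a, const char *b, size_t n)
{
    size_t la = utf8_prefix_bytes(a, n);
    size_t lb = utf8_prefix_bytes(b, n);
    size_t shorter = (la < lb) ? la : lb;
    int result = memcmp(a, b, shorter);

    if (result != 0)
        return result;

    /* Equal so far: the string with the shorter prefix sorts first. */
    return (la > lb) - (la < lb);
}
```

With a four-character Cyrillic word encoded in UTF-8, strlen() reports 
8 bytes while utf8_strlen() reports 4 characters, which is the gap a 
per-character totsize would have to account for.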

Other long-standing non-UTF-8 and non-TODO list issues that need to be 
solved eventually and which I haven't forgotten:

* Improvements to the color code and related improvements to the rcfile 
code (specifying regexes in separate files, specifying different regexes 
for different filenames of the same type, etc.), as mentioned in posts 
to the list a few months ago and earlier.

* Porting over more of Gentoo's color regexes, as posted by Mike 
Frysinger a while back.

* Modifying the shortcut list display on the last two lines of the 
screen so that it can handle overly long shortcuts in e.g. the Ukrainian 

* Adding the keypad flag back in if the issues with the numeric keypad 
and PuTTY still exist.  (There's currently only room for one more flag 
before the 32-bit limit is reached again, and bit fields don't work 
because there's no way to subscript them and hence no way to associate 
them with toggles, so maybe another approach may have to be used 

* Adding support for having the Alt key act as a Meta key when using 
PDCurses, as suggested by Tom Haller back in October.  Related 
questions: How do the KEY_ALT_L and KEY_ALT_R keys actually work?  Are 
they generated when you press those keys alone, or are they generated as 
part of a sequence when you press them as part of a combination (i.e., 
does [Left-]Alt-G produce "KEY_ALT_L" "g")?

* Breaking the WriteOut routine into several separate routines for ease 
of maintenance, as suggested by Jay Carlson back in 2002, and possibly 
working with file descriptors instead of filenames in the process so 
that safe_tempnam() is no longer needed and mkstemp() can be used 

* Fixing the (now-overhauled) statusbar code so that it only refreshes 
when it needs to, as the edit window code already does.

* Fixing the history code so that scrolling works properly in all cases.  
Problems occasionally pop up, although I haven't figured out how to 
reproduce them reliably; adding code to just peek at the next history 
entry without moving to it would be easier to deal with.  There should 
also be no blank line produced when scrolling up at the top of the list, 
since text typed there is not preserved when scrolling as it is at the 
bottom of the screen, and having a blank line only at the end is 
consistent with the idea of the magicline.  Also break the history 
shortcut into two shortcuts, one for moving up and one for moving down, 
and set their key values properly so that mouse clicks work on them.

* Adding mouse support to the statusbar prompt, in terms of moving the 
cursor to where the user clicks.

* Possibly finding a generic way to merge the edit window and statusbar 
routines in some cases, in order to cut down on the amount of duplicated 
code between the two.  (Related: How should filename searches in the 
file browser be implemented, since findnextstr() works only with 
filestructs and filenames are a simple char* array?  Creating a fake 
filestruct containing all filenames would be easiest, but seems 

* Going over the list of potential memory problems that Rocco sent 
a while back, in case any of those could be latent bugs.

* Porting over the last of the useful code from DB's behemoth patch 
(other than what's needed for UTF-8), such as the improved rcfile 
parsing and the ability to handle smaller window sizes in nano.c.

* Possibly more rearrangement of source files, such as putting most edit 
window-specific functions in edit.c and most statusbar prompt-specific 
functions in status.c (not statusbar.c, since it doesn't fit in the 8.3 
filename limit and could cause problems on Cygwin).

* Fribidi support, as mentioned ages ago (although this has to wait 
until UTF-8 support is done and probably until we actually have a 
translation in an RTL language so that it can be tested properly).
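
On the flag-limit item above: one possible approach, sketched here 
purely as an illustration and not as what nano does, is to keep the 
flags in a small array of unsigned words and refer to each flag by 
number.  Unlike bit fields, a flag number can be subscripted and stored 
in a toggle table.  MAX_FLAGS, flag_words, and the flag_*() helpers are 
all hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of flags; it can grow without
 * changing any of the code below. */
#define MAX_FLAGS 64
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define FLAG_WORDS ((MAX_FLAGS + WORD_BITS - 1) / WORD_BITS)

/* All flags start out unset (static storage is zero-initialized). */
static unsigned flag_words[FLAG_WORDS];

/* Turn flag number n on. */
static void flag_set(size_t n)
{
    assert(n < MAX_FLAGS);
    flag_words[n / WORD_BITS] |= 1u << (n % WORD_BITS);
}

/* Turn flag number n off. */
static void flag_unset(size_t n)
{
    assert(n < MAX_FLAGS);
    flag_words[n / WORD_BITS] &= ~(1u << (n % WORD_BITS));
}

/* Flip flag number n, as a toggle shortcut would. */
static void flag_toggle(size_t n)
{
    assert(n < MAX_FLAGS);
    flag_words[n / WORD_BITS] ^= 1u << (n % WORD_BITS);
}

/* Report whether flag number n is currently on. */
static bool flag_isset(size_t n)
{
    assert(n < MAX_FLAGS);
    return (flag_words[n / WORD_BITS] >> (n % WORD_BITS)) & 1u;
}
```

A toggle entry would then only need to carry the flag's number, so a 
call such as flag_toggle(40) works the same way for flags past the 
32-bit boundary as for those below it.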

The changes in CVS since my last email are:

* UTF-8 support added to the strcasecmp() equivalent, the strncasecmp() 
equivalent, do_next_word(), and do_prev_word()

* UTF-8 support fixed in mbstrnlen()

* String functions in utils.c moved to chars.c, as almost all of them 
need multibyte versions to support UTF-8, and they all deal with 
characters anyway.

* Support added for the -O/--morespace option, Meta-O toggle, and 
"morespace" rcfile option, which allows the blank line below the 
titlebar to be used as part of the edit window, as suggested back in 

* Support added for the CUT_TO_END flag when pressing Ctrl-K at the 
statusbar prompt, for consistency with the edit window.

* Support added for moving to the next or previous word at the statusbar 
prompt, since it may be useful when long strings of words are there.

* Updated documentation for the manual pages and info pages, syncing 
their descriptions with those in nanorc.sample where necessary and 
documenting -O/--morespace.

Attachment: nano135noutf8.patch
Description: Text document
