[wdiff-dev] [patch #7121] New, per-character diff, mode
From: Georgios Zarkadas
Subject: [wdiff-dev] [patch #7121] New, per-character diff, mode
Date: Mon, 29 Mar 2010 23:04:41 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; el-GR; rv:1.9.1.8) Gecko/20100214 Ubuntu/9.10 (karmic) Firefox/3.5.8
Follow-up Comment #3, patch #7121 (project wdiff):
Hi,
My answers to the remarks follow, in the same order.
-1- Yes, to both questions.
-2- Ok, it fits with (-3-)'s time frame; we could also keep just the long
option to avoid any ambiguity.
-3- Yes, it is; it was a quick hack in order to drive development of a tool
for the trans-coord project, which uses wdiff for showing changes in fuzzy
translations (see
http://lists.gnu.org/archive/html/trans-coord-devel/2010-03/msg00014.html).
However, it is fast, it preserves the stream, and after an error it resets
to normal operation. If, as I chose, one wants to keep all bytes of the
stream, then on an input error it will inevitably emit something
non-printable; but validating the stream is not wdiff's responsibility, IMHO.
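
To illustrate the behaviour described above (keep every byte, resynchronise
after an error), here is a minimal sketch of a UTF-8 unit splitter. It is not
the code from patch #7121; the function name utf8_unit_len is hypothetical,
and the sketch deliberately does not reject overlong or otherwise
non-canonical sequences, since no validation is intended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: return how many bytes
   starting at s (n bytes available, n > 0) form the next unit to diff.  */
size_t
utf8_unit_len (const unsigned char *s, size_t n)
{
  size_t len, i;

  assert (s != NULL && n > 0);

  if (s[0] < 0x80)
    return 1;                   /* plain ASCII */
  else if ((s[0] & 0xE0) == 0xC0)
    len = 2;                    /* 110xxxxx: two-byte sequence */
  else if ((s[0] & 0xF0) == 0xE0)
    len = 3;                    /* 1110xxxx: three-byte sequence */
  else if ((s[0] & 0xF8) == 0xF0)
    len = 4;                    /* 11110xxx: four-byte sequence */
  else
    return 1;                   /* stray continuation or invalid lead byte */

  if (len > n)
    return 1;                   /* truncated input: pass the byte through */

  for (i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return 1;                 /* broken sequence: emit lead byte, resync */

  return len;
}
```

On malformed input the function returns 1, so the offending byte is copied to
the output unchanged and decoding restarts cleanly at the next byte.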
This is not my final word, however; in order to arrive at a more general
solution I have started studying other encodings, such as UTF-16, and the
Unicode routines available in glibc, and I have also had a quick look at
the Coreutils sources.
I believe that the following apply:
--- Calling `setlocale(LC_CTYPE, "")' at program startup to pick up the
environment's default encoding (plus a command-line option to supply it
explicitly) and branching to either single-byte or the appropriate
multi-byte mode will be easy.
--- Splitting words/chars will need some thought in order not to lose too
much speed and to handle errors gracefully, but it is rather
straightforward; you just follow the selected encoding's rules. Most probably
it will require changing all getc/putc calls to the appropriate multi-byte
versions.
--- Doing more elaborate things, such as truly supporting the -i,
--ignore-case option in all encodings, will most probably require quite a lot
of code (actually, I do not know yet how much).
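
As a rough illustration of the first two points, the sketch below selects the
mode from the locale and then finds character boundaries with the standard
mbrtowc routine. The function names init_multibyte_mode and next_char_len,
and the idea of passing the option's value as a locale name, are assumptions
made for this example only. The sketch passes "" to setlocale because a NULL
argument only queries the current locale without changing it.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: adopt the environment's locale, or the one named by
   an explicit override, and report whether its encoding is multi-byte.  */
int
init_multibyte_mode (const char *locale_override)
{
  const char *name = locale_override != NULL ? locale_override : "";

  if (setlocale (LC_CTYPE, name) == NULL)
    setlocale (LC_CTYPE, "C");  /* unknown locale: fall back to plain bytes */
  return MB_CUR_MAX > 1;
}

/* Hypothetical helper: length in bytes of the next character in s (n bytes
   available, n > 0).  Invalid or incomplete input yields 1, so the offending
   byte is passed through and decoding restarts at the next byte.  */
size_t
next_char_len (const char *s, size_t n, mbstate_t *state)
{
  size_t len;

  assert (s != NULL && n > 0 && state != NULL);

  len = mbrtowc (NULL, s, n, state);
  if (len == (size_t) -1 || len == (size_t) -2)
    {
      memset (state, 0, sizeof *state);   /* back to the initial state */
      return 1;
    }
  return len == 0 ? 1 : len;    /* an embedded NUL still occupies one byte */
}
```

A real stream reader would probably refill its buffer when mbrtowc reports an
incomplete sequence instead of giving up immediately; the sketch treats that
case as an error only to stay short.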
Thus, a pass-through implementation (one that just breaks words/chars
correctly in any encoding) is feasible IMO in a few months; I will try it.
If you are aware of other GNU software that handles Unicode, please point me
to it; there may be suitable ready-made code to use for this purpose.
-4- It is better to postpone any such activity for now (cf. -3-).
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/patch/?7121>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/