Re: [wdiff-bugs] Bug#553490: wdiff: Does not handle UTF-8 properly (fwd)

From: Martin von Gagern
Subject: Re: [wdiff-bugs] Bug#553490: wdiff: Does not handle UTF-8 properly (fwd)
Date: Thu, 20 Oct 2011 21:05:56 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20111003 Thunderbird/7.0.1

Dear Santiago, Dear Josh,

I've already noticed that bug in your bug tracker, and added it to the
wdiff bug tracker at Savannah: https://savannah.gnu.org/bugs/?34224

Right now, I'm not sure how best to handle this case. Unicode support is
a big problem for the current wdiff implementation, in many ways. For
example, I guess that the most sensible way to really simulate
overstrike printing would be to detect grapheme clusters, i.e. to treat
even sequences of multiple code points as a single entity if some of the
code points are combining.
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries has the
details on this, but I don't think I'll implement this in wdiff myself.
I've been toying with the idea of writing wdiff up from scratch with
stuff like this in mind, using ICU break iterators or similar. Won't
happen too soon, though.
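
As a rough illustration of the grapheme cluster point above, here is a minimal Python sketch. It is not wdiff code, and the function names are hypothetical. It assumes the usual overstrike convention (bold as "X, backspace, X", underline as "_, backspace, X") and approximates clusters by attaching combining marks to the preceding base character. That is much cruder than the full UAX #29 rules; a real implementation would more likely use something like ICU break iterators.

```python
import unicodedata


def approx_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification (an assumption of this sketch): only code points with a
    nonzero canonical combining class are attached to the previous unit.
    Hangul jamo, ZWJ emoji sequences and other UAX #29 cases are ignored.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # combining mark joins the previous unit
        else:
            clusters.append(ch)     # start a new unit
    return clusters


def overstrike_bold(text):
    # Overstrike bold: print the unit, back up once, print it again.
    return "".join(c + "\b" + c for c in approx_clusters(text))


def overstrike_underline(text):
    # Overstrike underline: print "_", back up once, print the unit.
    return "".join("_\b" + c for c in approx_clusters(text))
```

With this grouping, "e" followed by U+0301 is overstruck as one unit instead of two separate code points. Whether a given pager turns that back into a character attribute is a separate question.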

I'm also not sure what versions of less are behaving in what ways. For
one, I doubt that all of them will know about grapheme clusters when
reading their input, so they might fail to turn it back into character
attributes as expected. I also think that most less implementations
these days will handle terminal control codes just fine, particularly if
called as "less -R". So that overstriking thing might be obsolete in any
case.

Therefore I hope to roll a release soon which will pass terminal control
sequences to less, thus avoiding that overstrike stuff. I'll have to
give a bit more thought to the best combination of configure switches,
environment variables and command line options, though.
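
To make the alternative concrete, here is a small Python sketch of marking changed words with terminal control sequences instead of overstriking. It is not the planned wdiff behaviour: the choice of ANSI SGR bold and underline, the mapping of deleted and inserted text to those attributes, and all names in the code are assumptions made for illustration only.

```python
# ANSI SGR sequences; "less -R" passes these through to the terminal.
BOLD = "\x1b[1m"
UNDERLINE = "\x1b[4m"
RESET = "\x1b[0m"

# Hypothetical mapping from change kind to attribute (arbitrary choice).
STYLES = {"deleted": UNDERLINE, "inserted": BOLD}


def mark(text, start):
    """Wrap text in a start sequence and a reset, leaving the text intact."""
    return f"{start}{text}{RESET}"


def render(segments):
    """Render (kind, text) pairs; unchanged text is emitted as-is."""
    out = []
    for kind, text in segments:
        style = STYLES.get(kind)
        out.append(mark(text, style) if style else text)
    return "".join(out)
```

Because the attribute brackets a whole run of text instead of being interleaved with every character, combining code points inside the run need no special treatment. The output of such a program could then be viewed by piping it into "less -R".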

 Martin von Gagern

Attachment: signature.asc
Description: OpenPGP digital signature
