Re: [wdiff-bugs] [patch] html support
Thu, 10 Jan 2008 13:10:22 -0500
On Jan 6, 2008 10:40 PM, Andrew Clausen <address@hidden> wrote:
> Hi Denver,
> On Sun, Jan 06, 2008 at 01:58:18PM -0500, Denver Gingerich wrote:
> > Thanks for the patch. I'm not sure what the conventions are for GNU
> > command-line tools with respect to outputting HTML. It seems that it
> > might be better to have an HTML post-processor that takes the normal
> > output of wdiff and converts it to HTML. That way wdiff only has to
> > worry about one type of formatting (plain text). This appears to be
> > what the GNU diff people expect since GNU diff doesn't natively
> > support HTML output.
> From a usability point of view, I think it's desirable to have a single
> wdiff front-end command to handle everything. (It's easier to find
> what you want with a single front-end, and the options in all the various
> output formats are likely to overlap substantially.)
> From a maintainability point of view, I don't see a big advantage from having
> HTML output generated via a wdiff post-processor. Most of the code would be
> parsing wdiff's output rather than generating html.
One should consider where HTML output would be used most. In most
cases, HTML-ized diff output is used in web-based version control
viewing systems (such as ViewVC). Since wdiff works better than diff
for long lines (which are more common in written text than in source
code), it might also be used in web-based document histories, such as
those provided by MediaWiki.
In both cases, the request is made via a web interface (i.e., by
clicking a link that says "compare with previous revision") and the
response is provided via a web interface (the HTML-ized diff output).
As a result, implementing HTML-ized output in wdiff does not make
sense because its input is from a command line and its output is to a
Now you could make the argument that wdiff could be run from within a
web scripting language (e.g., in PHP: "$diff = `wdiff --html a.txt
b.txt`"). However, this is generally considered to be a hack, and for
good reason. First, it adds significant processing overhead: the
input data must be written to files and a new process started on the
web server for each request. Second, it makes dependencies
difficult to trace because a web application using wdiff needs to
specify that it requires wdiff to be installed on the web server.
Web server administrators generally prefer installing plugins for the
web server to installing command-line applications. Additionally, not
all web servers allow running command-line applications (i.e., the
above PHP command will not work), for security reasons.
I believe it is best for this sort of thing to be done in a library or
a dynamically-loadable web server module. Examples of this are the
use of Python's difflib in ViewVC and MediaWiki's use of wikidiff2
(a dynamically-loadable module). wikidiff2 does include a
command-line version that prints HTML to standard output, but it is
exclusively for testing purposes.
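
For comparison, here is a minimal sketch of the in-process approach
using Python's standard difflib module. It is only an illustration and
not ViewVC's actual code; the function name html_diff_table and the
column labels are made up for this example.

```python
import difflib


def html_diff_table(old_text, new_text):
    """Return an HTML <table> comparing two revisions, with no subprocess."""
    # HtmlDiff works on lists of lines and highlights intraline changes.
    differ = difflib.HtmlDiff(wrapcolumn=72)
    return differ.make_table(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc="previous revision",  # hypothetical column labels
        todesc="current revision",
    )


if __name__ == "__main__":
    print(html_diff_table("the quick brown fox\n", "the quick red fox\n"))
```

Nothing is written to disk and no new process is started, which is the
property the library approach is after. Note that difflib compares
lines first and only then marks changes within a line, so it is not a
drop-in replacement for wdiff's word-by-word comparison.
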
So the best bet for getting this into wdiff is to abstract out the
diffing part of wdiff into a library and then make wdiff use the
library and make a dynamically-loadable module that uses the library
to produce HTML output. Unfortunately, this is unlikely to work
properly until wdiff uses diff as a library. wdiff currently
exec()s the diff command directly (which I consider a hack), so
using the current wdiff code as a dynamically-loaded module would be
just as ugly as running wdiff from the command line: your web server
would simply depend on having the command-line "diff" tool installed
instead of "wdiff".
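
A rough sketch of what that split might look like, written in Python
for brevity. This is not wdiff's code or its algorithm: the diffing
core here is borrowed from Python's difflib, and every name (tokenize,
word_ops, TextFormatter, HtmlFormatter, render) is hypothetical. The
point is only that the core reports what changed and each output
format is a small, separate back end.

```python
import difflib
import html
import re


def tokenize(text):
    # Words and whitespace runs are separate tokens so spacing is preserved.
    return re.findall(r"\s+|\S+", text)


def word_ops(old, new):
    """Library part: yield ('same' | 'del' | 'ins', text) pairs."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            yield "same", "".join(a[i1:i2])
        if tag in ("delete", "replace"):
            yield "del", "".join(a[i1:i2])
        if tag in ("insert", "replace"):
            yield "ins", "".join(b[j1:j2])


class TextFormatter:
    """Plain-text back end using wdiff's default [-...-] and {+...+} markers."""

    def same(self, text):
        return text

    def deleted(self, text):
        return "[-" + text + "-]"

    def inserted(self, text):
        return "{+" + text + "+}"


class HtmlFormatter:
    """HTML back end; escapes the text and wraps changes in <del>/<ins>."""

    def same(self, text):
        return html.escape(text)

    def deleted(self, text):
        return "<del>" + html.escape(text) + "</del>"

    def inserted(self, text):
        return "<ins>" + html.escape(text) + "</ins>"


def render(old, new, formatter):
    """Front-end part: feed the library's output through one back end."""
    handlers = {
        "same": formatter.same,
        "del": formatter.deleted,
        "ins": formatter.inserted,
    }
    return "".join(handlers[kind](text) for kind, text in word_ops(old, new))


if __name__ == "__main__":
    print(render("the cat sat", "the dog sat", TextFormatter()))
    print(render("the cat sat", "the dog sat", HtmlFormatter()))
```

A command-line front end would call render() with the text back end,
while a web server module would call it with the HTML back end, and
neither would need to start an external process.
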
If you are interested in doing this (making wdiff use a diff library
instead of calling diff directly), I would encourage you and could
probably provide some help. This is on my long-term todo list for
> > I would appreciate comments from anyone on this list that can suggest
> > the best way of providing different types of output for wdiff. I
> > suppose ideally there would be a wdiff library that could be linked
> > into programs that specify a particular type of output, but that
> > doesn't really seem practical at this point.
> How about a vtable for copy_word() and copy_whitespace()?
Perhaps. To be honest, I haven't spent a lot of time reading the
wdiff code to fully understand what all the functions do since I
started maintaining it a few months ago. As a result, I can't really
make a good assessment of how well this would generalize the code.
But that definitely does appear to move into the library style that
> > Finally, your patch must apply cleanly to the latest version of wdiff
> > in CVS. The patch you sent appears to be made from an older version
> > of wdiff (perhaps 0.5) and does not apply cleanly to the latest
> > version in CVS.
> You still use CVS?!
Actually wdiff started using CVS when I began maintaining it. The
reason I use CVS is not because the previous maintainer used it. I'm
actually not sure which version control system the previous maintainer
used, if any at all.
I chose to use CVS mainly because I was familiar with it and it was
convenient to set up through Savannah. Had it been just as easy to
set up, I would have used Subversion instead, but the few improvements
Subversion has over CVS were not enough to make me go to the extra
work of setting it up.
Ideally I would have spent a bunch of time researching different
version control systems out there, especially newer ones like git and
Bazaar, and then used whichever one I deemed to be the best.
Hopefully I will have time to do this in the future, at which point I
will likely migrate wdiff to a new system.
> I got wdiff from the latest Ubuntu with "apt-get source".
> Are they a long way behind CVS?
Yes, a very long way. What you get with "apt-get source wdiff" (at
least on Ubuntu 7.10) is wdiff 0.5, which was released in 1994, along
with some patches the Debian package maintainers have made since then
(the Ubuntu wdiff package is really just the Debian wdiff package).
But this is understandable since wdiff 0.5 was the last stable release
of GNU wdiff. Following that release, a fork of GNU wdiff called Free
wdiff was created, which was developed outside of the umbrella of the
GNU project. The last version of Free wdiff (version 0.5g) was
released in 1999. Most package maintainers, including the Debian
maintainers, have chosen to stick with GNU wdiff, although some (such
as the Fink maintainers) have opted to switch to Free wdiff.
Since I started maintaining wdiff, I have merged the changes from Free
wdiff back into GNU wdiff with the help of the Free wdiff maintainer
and made some other changes like moving to Gnulib. All of these
changes are in CVS. However, I have not had the time to test these
changes in a sufficiently thorough manner to be comfortable with
releasing a new official version of wdiff. So for the time being,
package maintainers have no choice but to use the 1994 GNU wdiff 0.5
release or the 1999 Free wdiff 0.5g release.
In general, the code you get with "apt-get source" for a particular
piece of software (or, more generally, the code that a distribution
chooses to use in production) lags several months behind the latest
stable code for that software. I suspect this is done because
distribution maintainers don't have time to test every new version of
every piece of software that comes out. Instead, they test particular
versions of supported software packages for a distribution release
(such as Ubuntu 7.10) and then more or less stick with those versions
until the next release, making security updates as necessary. As a
result, it is almost always best to create patches for the latest
version of the software checked out from version control.
It is also important to know that the code you get with "apt-get
source" (or similar commands) is not only older than the latest
version, but it usually contains additional changes that weren't
present in the official version of that software. For example, the
Debian maintainers have made at least 15 changes to wdiff 0.5 in the
Debian wdiff package. These are usually merged back into the main
tree, but packages for software that hasn't been maintained in a while
(such as GNU wdiff) tend to accumulate a large number of changes so
they deviate significantly from the official source tree.
With Ubuntu or other Debian-based distributions, you can view these
distribution-specific changes by looking at the diff.gz file that you
get when you run "apt-get source". For example, for wdiff under
Ubuntu 7.10 these are in wdiff_0.5-17build1.diff.gz.
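
To get a quick overview of what the distribution changed, a few lines
of Python can list the files that the diff.gz touches. This is only a
sketch: the function name is made up, and it assumes the file is a
gzip-compressed unified diff.

```python
import gzip
from typing import List


def files_in_debian_diff(path: str) -> List[str]:
    """Return the file paths patched by a gzip-compressed unified diff."""
    files = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Each patched file is introduced by a "+++ <path>" header line,
            # optionally followed by a tab and a timestamp.
            if line.startswith("+++ "):
                files.append(line[4:].split("\t")[0].strip())
    return files


if __name__ == "__main__":
    for name in files_in_debian_diff("wdiff_0.5-17build1.diff.gz"):
        print(name)
```

The check is deliberately simple: an added line whose own text begins
with "++ " would also match, so treat the result as a rough overview
rather than an exact list.
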