[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-diffutils] bug: diff -r swaps the files it should check
From: |
Paul Eggert |
Subject: |
Re: [bug-diffutils] bug: diff -r swaps the files it should check |
Date: |
Fri, 13 Aug 2010 03:06:20 +0200 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6 |
On 08/07/10 18:50, Christoph Anton Mitterer wrote:
> I found a rather strange bug in diff (Debian version 1:3.0-1) when doing
> large backups of millions of files.
> First I've copied the data to an external HDD, then I've did an "diff -q
> -r >out 2> err" in order to find any errors.
A wise precaution! I do that sort of thing a lot.
> The strange unicode characters had their character codes inside which read
> as follows:
> 1CEE 1CAE
> 1CAE 1CEE
> ...
> Any ideas?
My idea is that you've found and reported a bug that has been in GNU
diff ever since 2.8 was released back in 2002. Thanks very much for
reporting this: you've undoubtedly saved others some work and
hair-pulling.
I've installed the following patch, which I hope fixes things. If you
can test it on your data, that'd be nice. I have verified this bug on
a much simpler test case, and this patch includes the test case to try
to make sure that a similar bug doesn't creep back in later.
>From 29ac7f191c444400c6df067647e3158a1e188ddf Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Thu, 12 Aug 2010 17:55:05 -0700
Subject: [PATCH] diff: avoid spurious diffs when two distinct dir entries
compare equal
Problem reported by Christoph Anton Mitterer in:
http://lists.gnu.org/archive/html/bug-diffutils/2010-08/msg00000.html
* NEWS: Mention this bug fix.
* src/dir.c (compare_names_for_qsort): Fall back on file_name_cmp
if two distinct entries in the same directory compare equal.
(diff_dirs): Prefer a file_name_cmp match when available.
* tests/Makefile.am (TESTS): New test colliding-file-names.
* tests/colliding-file-names: New file.
---
NEWS | 5 +++++
src/dir.c | 41 +++++++++++++++++++++++++++++++++++++++--
tests/Makefile.am | 1 +
tests/colliding-file-names | 20 ++++++++++++++++++++
4 files changed, 65 insertions(+), 2 deletions(-)
create mode 100644 tests/colliding-file-names
diff --git a/NEWS b/NEWS
index 27b9886..1c41e9e 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU diffutils NEWS -*-
outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ diff no longer reports spurious differences merely because two entries
+ in the same directory have names that compare equal in the current
+ locale, or compare equal because --ignore-file-name-case was given.
* Noteworthy changes in release 3.0 (2010-05-03) [stable]
diff --git a/src/dir.c b/src/dir.c
index 5b4eaec..1b078ab 100644
--- a/src/dir.c
+++ b/src/dir.c
@@ -166,14 +166,16 @@ compare_names (char const *name1, char const *name2)
: file_name_cmp (name1, name2));
}
-/* A wrapper for compare_names suitable as an argument for qsort. */
+/* Compare names FILE1 and FILE2 when sorting a directory.
+ Prefer filtered comparison, breaking ties with file_name_cmp. */
static int
compare_names_for_qsort (void const *file1, void const *file2)
{
char const *const *f1 = file1;
char const *const *f2 = file2;
- return compare_names (*f1, *f2);
+ int diff = compare_names (*f1, *f2);
+ return diff ? diff : file_name_cmp (*f1, *f2);
}
/* Compare the contents of two directories named in CMP.
@@ -253,6 +255,41 @@ diff_dirs (struct comparison const *cmp,
pretend the "next name" in that dir is very large. */
int nameorder = (!*names[0] ? 1 : !*names[1] ? -1
: compare_names (*names[0], *names[1]));
+
+ /* Prefer a file_name_cmp match if available. This algorithm is
+ O(N**2), where N is the number of names in a directory
+ that compare_names says are all equal, but in practice N
+ is so small it's not worth tuning. */
+ if (nameorder == 0)
+ {
+ int raw_order = file_name_cmp (*names[0], *names[1]);
+ if (raw_order != 0)
+ {
+ int greater_side = raw_order < 0;
+ int lesser_side = 1 - greater_side;
+ char const **lesser = names[lesser_side];
+ char const *greater_name = *names[greater_side];
+ char const **p;
+
+ for (p = lesser + 1;
+ *p && compare_names (*p, greater_name) == 0;
+ p++)
+ {
+ int cmp = file_name_cmp (*p, greater_name);
+ if (0 <= cmp)
+ {
+ if (cmp == 0)
+ {
+ memmove (lesser + 1, lesser,
+ (char *) p - (char *) lesser);
+ *lesser = greater_name;
+ }
+ break;
+ }
+ }
+ }
+ }
+
int v1 = (*handle_file) (cmp,
0 < nameorder ? 0 : *names[0]++,
nameorder < 0 ? 0 : *names[1]++);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 6a4858c..242d501 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1,6 +1,7 @@
TESTS = \
basic \
binary \
+ colliding-file-names \
help-version \
function-line-vs-leading-space \
label-vs-func \
diff --git a/tests/colliding-file-names b/tests/colliding-file-names
new file mode 100644
index 0000000..f14dce7
--- /dev/null
+++ b/tests/colliding-file-names
@@ -0,0 +1,20 @@
+#!/bin/sh
+# Check that diff responds well if a directory has multiple file names
+# that compare equal.
+
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+mkdir d1 d2 || fail=1
+
+for i in abc abC aBc aBC Abc AbC ABc ABC; do
+ echo $i >d1/$i || fail=1
+done
+
+for i in ABC ABc AbC Abc aBC aBc abC abc; do
+ echo $i >d2/$i || fail=1
+done
+
+diff -r --ignore-file-name-case d1 d2 || fail=1
+
+Exit $fail
--
1.7.2