bug-diffutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-diffutils] bug: diff -r swaps the files it should check


From: Paul Eggert
Subject: Re: [bug-diffutils] bug: diff -r swaps the files it should check
Date: Fri, 13 Aug 2010 03:06:20 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

On 08/07/10 18:50, Christoph Anton Mitterer wrote:

> I found a rather strange bug in diff (Debian version 1:3.0-1) when doing
> large backups of millions of files.
> First I've copied the data to an external HDD, then I've did an "diff -q
> -r >out 2> err" in order to find any errors.

A wise precaution!  I do that sort of thing a lot.

> The strange unicode characters had their character codes inside which read
> as follows:
> 1CEE      1CAE
> 1CAE      1CEE
> ...
> Any ideas?

My idea is that you've found and reported a bug that has been in GNU
diff ever since 2.8 was released back in 2002.  Thanks very much for
reporting this: you've undoubtedly saved others some work and
hair-pulling.

I've installed the following patch, which I hope fixes things.  If you
can test it on your data, that'd be nice.  I have verified this bug on
a much simpler test case, and this patch includes the test case to try
to make sure that a similar bug doesn't creep back in later.



>From 29ac7f191c444400c6df067647e3158a1e188ddf Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Thu, 12 Aug 2010 17:55:05 -0700
Subject: [PATCH] diff: avoid spurious diffs when two distinct dir entries 
compare equal

Problem reported by Christoph Anton Mitterer in:
http://lists.gnu.org/archive/html/bug-diffutils/2010-08/msg00000.html

* NEWS: Mention this bug fix.
* src/dir.c (compare_names_for_qsort): Fall back on file_name_cmp
if two distinct entries in the same directory compare equal.
(diff_dirs): Prefer a file_name_cmp match when available.
* tests/Makefile.am (TESTS): New test colliding-file-names.
* tests/colliding-file-names: New file.
---
 NEWS                       |    5 +++++
 src/dir.c                  |   41 +++++++++++++++++++++++++++++++++++++++--
 tests/Makefile.am          |    1 +
 tests/colliding-file-names |   20 ++++++++++++++++++++
 4 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 tests/colliding-file-names

diff --git a/NEWS b/NEWS
index 27b9886..1c41e9e 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU diffutils NEWS                                    -*- 
outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  diff no longer reports spurious differences merely because two entries
+  in the same directory have names that compare equal in the current
+  locale, or compare equal because --ignore-file-name-case was given.
 
 * Noteworthy changes in release 3.0 (2010-05-03) [stable]
 
diff --git a/src/dir.c b/src/dir.c
index 5b4eaec..1b078ab 100644
--- a/src/dir.c
+++ b/src/dir.c
@@ -166,14 +166,16 @@ compare_names (char const *name1, char const *name2)
          : file_name_cmp (name1, name2));
 }
 
-/* A wrapper for compare_names suitable as an argument for qsort.  */
+/* Compare names FILE1 and FILE2 when sorting a directory.
+   Prefer filtered comparison, breaking ties with file_name_cmp.  */
 
 static int
 compare_names_for_qsort (void const *file1, void const *file2)
 {
   char const *const *f1 = file1;
   char const *const *f2 = file2;
-  return compare_names (*f1, *f2);
+  int diff = compare_names (*f1, *f2);
+  return diff ? diff : file_name_cmp (*f1, *f2);
 }
 
 /* Compare the contents of two directories named in CMP.
@@ -253,6 +255,41 @@ diff_dirs (struct comparison const *cmp,
             pretend the "next name" in that dir is very large.  */
          int nameorder = (!*names[0] ? 1 : !*names[1] ? -1
                           : compare_names (*names[0], *names[1]));
+
+         /* Prefer a file_name_cmp match if available.  This algorithm is
+            O(N**2), where N is the number of names in a directory
+            that compare_names says are all equal, but in practice N
+            is so small it's not worth tuning.  */
+         if (nameorder == 0)
+           {
+             int raw_order = file_name_cmp (*names[0], *names[1]);
+             if (raw_order != 0)
+               {
+                 int greater_side = raw_order < 0;
+                 int lesser_side = 1 - greater_side;
+                 char const **lesser = names[lesser_side];
+                 char const *greater_name = *names[greater_side];
+                 char const **p;
+
+                 for (p = lesser + 1;
+                      *p && compare_names (*p, greater_name) == 0;
+                      p++)
+                   {
+                     int cmp = file_name_cmp (*p, greater_name);
+                     if (0 <= cmp)
+                       {
+                         if (cmp == 0)
+                           {
+                             memmove (lesser + 1, lesser,
+                                      (char *) p - (char *) lesser);
+                             *lesser = greater_name;
+                           }
+                         break;
+                       }
+                   }
+               }
+           }
+
          int v1 = (*handle_file) (cmp,
                                   0 < nameorder ? 0 : *names[0]++,
                                   nameorder < 0 ? 0 : *names[1]++);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 6a4858c..242d501 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1,6 +1,7 @@
 TESTS = \
   basic \
   binary \
+  colliding-file-names \
   help-version \
   function-line-vs-leading-space \
   label-vs-func        \
diff --git a/tests/colliding-file-names b/tests/colliding-file-names
new file mode 100644
index 0000000..f14dce7
--- /dev/null
+++ b/tests/colliding-file-names
@@ -0,0 +1,20 @@
+#!/bin/sh
+# Check that diff responds well if a directory has multiple file names
+# that compare equal.
+
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+mkdir d1 d2 || fail=1
+
+for i in abc abC aBc aBC Abc AbC ABc ABC; do
+ echo $i >d1/$i || fail=1
+done
+
+for i in ABC ABc AbC Abc aBC aBc abC abc; do
+ echo $i >d2/$i || fail=1
+done
+
+diff -r --ignore-file-name-case d1 d2 || fail=1
+
+Exit $fail
-- 
1.7.2





reply via email to

[Prev in Thread] Current Thread [Next in Thread]