From: Vincent Lefevre
Subject: [bug-diffutils] bug#44838: diff 3.7 incorrectly reports added lines and can generate huge diffs
Date: Tue, 24 Nov 2020 12:33:52 +0100
User-agent: Mutt/1.14.5+76 (bb407ec3) vl-127292 (2020-06-24)

I've attached an archive with 2 files "file1" and "file2"; "file2"
is "file1" with some lines removed, so that a diff should report
only removed lines.
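
The premise that "file2" contains only lines removed from "file1" can be checked independently of diff. The following is a minimal sketch, assuming a POSIX shell and awk; the function name is_subsequence is hypothetical. It walks "file1" once and greedily matches the lines of "file2" in order, which is sufficient to decide whether one file is a line-wise subsequence of the other.

```shell
# is_subsequence A B: succeed if B can be obtained from A by deleting
# lines only (hypothetical helper, not part of diffutils).
is_subsequence() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    awk '
        BEGIN {
            # Load B (passed as the first operand) into want[1..n], then
            # blank the operand so that awk only scans A as its input.
            b = ARGV[1]; ARGV[1] = ""
            n = 0; i = 0
            while ((getline line < b) > 0) want[++n] = line
        }
        # Appending "" forces a string comparison, never a numeric one.
        i < n && ($0 "") == (want[i + 1] "") { i++ }
        END { exit (i == n ? 0 : 1) }
    ' "$2" "$1"
}
```

Called as is_subsequence file1 file2, a zero exit status would confirm that a correct diff needs no added lines. The whole of "file2" is held in memory, which should be acceptable for files of this size.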

Here are some tests done under Debian/sid (x86_64) with diff 3.7
(Debian package diffutils 1:3.7-3).

First, for reference, the size of the initial diff:

$ diff -u file1 file2 | wc -l
22319

But this diff reports added lines, though "file2" has only removed
lines compared to "file1".

──────────────────────────────────────────────────────────────────
$ diff -u file1 file2 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent@vinc17.net> 1404215412 +0000
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
-
-blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent@vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
 
 blob
 mark :37951
@@ -9910,21 +467,6 @@
 M 100644 :38018 src/round_raw_generic.c
 
 blob
──────────────────────────────────────────────────────────────────

In particular, one can see:

-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c

and

+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c

while these lines should have been regarded as unmodified.

This problem disappears if I shorten "file2" a bit (these lines are
at the very beginning of "file2", so it is surprising that truncating
the end of the file changes the behavior here):

$ head -n 129410 file2 > file3
$ diff -u file1 file3 | grep '^\+'
+++ file3       2020-11-24 11:58:17.922462693 +0100

So now no added lines are reported (the only match is the "+++"
header line). This is fine.
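
Because grep '^\+' also matches the "+++" header, the raw count is always one too high. A small sketch that counts only added content lines, assuming the two-line header of unified output from diff -u; the function name count_added is hypothetical:

```shell
# count_added OLD NEW: print the number of added content lines in the
# unified diff, skipping the two-line "---"/"+++" file header.
count_added() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    diff -u "$1" "$2" | tail -n +3 | grep -c '^+' || true
}
```

Skipping the header by position rather than by pattern means that an added line whose own content starts with "++" is still counted.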

And here's what diff now gives around these lines:

──────────────────────────────────────────────────────────────────
$ diff -u file1 file3 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent@vinc17.net> 1404215412 +0000
 data 55
 [tests/trandom_deviate.c] Correction (fprintf format).
 from :37946
 M 100644 :37947 tests/trandom_deviate.c
 
 blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent@vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
-
-blob
 mark :37951
 data 15
 Blob at :37951
@@ -9910,21 +467,6 @@
 M 100644 :38018 src/round_raw_generic.c
 
 blob
-mark :38020
-data 15
──────────────────────────────────────────────────────────────────

This is now OK, but stranger things happen when I reduce "file2"
even more:

$ head -n 120200 file2 > file4
$ diff -u file1 file4 | grep -c '^\+'
7
$ diff -u file1 file4 | wc -l
31251

So, with "file2" reduced to 120200 lines, 7 − 1 = 6 added lines
are reported (one of the 7 matches is the "+++" header), though this
new file has only removed lines. This is incorrect, but if I remove
100 more lines at the end, it gets much worse, with 81120 added lines
reported and a huge diff:

$ head -n 120100 file2 > file5
$ diff -u file1 file5 | grep -c '^\+'
81121
$ diff -u file1 file5 | wc -l
231111
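
The experiments above all follow the same pattern: truncate "file2" to N lines and diff the result against "file1". A sketch of a helper that repeats this for several values of N, assuming a POSIX shell with mktemp available; the name sweep_prefixes and its output format are hypothetical:

```shell
# sweep_prefixes ORIG SHORT N...: for each N, diff ORIG against the first
# N lines of SHORT and print "N added-lines total-diff-lines".
sweep_prefixes() {
    sp_orig=$1
    sp_short=$2
    shift 2
    sp_tmp=$(mktemp) || return 1
    for sp_n in "$@"; do
        head -n "$sp_n" "$sp_short" > "$sp_tmp"
        # Skip the two header lines so that "+++" is not counted as added.
        sp_added=$(diff -u "$sp_orig" "$sp_tmp" | tail -n +3 | grep -c '^+' || true)
        # Some wc implementations pad the count with spaces; strip them.
        sp_total=$(diff -u "$sp_orig" "$sp_tmp" | wc -l | tr -d ' ')
        printf '%s %s %s\n' "$sp_n" "$sp_added" "$sp_total"
    done
    rm -f "$sp_tmp"
}
```

For example, sweep_prefixes file1 file2 129410 120200 120100 would rerun the three truncations tried above. Since every prefix of "file2" still contains only lines from "file1", any non-zero value in the second column indicates the problem.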

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Attachment: files.tar.xz
Description: Binary data

