Re: Merging bug (wrong conflicts)

From: Karl Tomlinson
Subject: Re: Merging bug (wrong conflicts)
Date: Mon, 19 Feb 2001 14:54:06 +1300

"Derek R. Price" wrote:
> The short test cases might hold clues as to common causes for these errors?
> Karl, you managed to fix the algorithm without knowing what file
> configurations cause spurious conflicts?

The short answer is yes, but I may be able to provide some help.
I didn't have much success analysing the problem by trying to determine
how the problem files were different, so I looked at the algorithm and
noticed some things that I thought should be improved.
Having seen the algorithm, I can provide some theories on the problem
cases.  This was a while ago but my memory's coming back slowly.

I can now think of 3 possible configurations that could cause problems.

M(ine) O(lder) Y(ours)



M  O  Y

x  x  x
s  s  s
c  a  a
s  b  b
b  s  s
s  y  b
y     s

The erroneous merge with no conflicts may depend on the change
from O to Y repeating a portion of text.


The other two configurations both depend on a difference between files
that can be represented by more than one minimal hunk.

An example is

Difference A:

 1  2

 x  y
 m  m
 x  y

Does the m in file 1 match the first or second m in file 2?

The logic in diag and compareseq (in analyze.c) is not consistent.
The match selected depends on where in the file this difference occurs
and on the surrounding differences.
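To make "more than one minimal hunk" concrete, here is a rough Python sketch
(not the diag/compareseq code from analyze.c) that enumerates every way of
aligning two line sequences along a longest common subsequence.  The function
name and the sample files are made up for illustration; the sample is a
repeated-line case where the single m in one file can pair with either m in
the other.

```python
from functools import lru_cache

def all_lcs_alignments(a, b):
    """Return every alignment, as a tuple of (i, j) index pairs,
    that realises a longest common subsequence of a and b."""
    @lru_cache(maxsize=None)
    def lcs_len(i, j):
        # Length of the LCS of a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        best = max(lcs_len(i + 1, j), lcs_len(i, j + 1))
        if a[i] == b[j]:
            best = max(best, 1 + lcs_len(i + 1, j + 1))
        return best

    results = set()

    def walk(i, j, pairs):
        # Follow every choice that still leads to an optimal alignment.
        if lcs_len(i, j) == 0:
            results.add(tuple(pairs))
            return
        if a[i] == b[j] and 1 + lcs_len(i + 1, j + 1) == lcs_len(i, j):
            walk(i + 1, j + 1, pairs + [(i, j)])
        if lcs_len(i + 1, j) == lcs_len(i, j):
            walk(i + 1, j, pairs)
        if lcs_len(i, j + 1) == lcs_len(i, j):
            walk(i, j + 1, pairs)

    walk(0, 0, [])
    return sorted(results)

# Hypothetical files: the only m in file 1 can match either m in file 2.
file1 = ["s", "m", "e"]
file2 = ["s", "m", "y", "m", "e"]
alignments = all_lcs_alignments(file1, file2)
```

Both alignments are equally short edit scripts, so which one a real diff
reports comes down to tie-breaking inside the search.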

If a difference such as this exists between O and Y but O and M are
the same in this region of the file,
then this should be recognized as a change from O to Y with no conflict.
However, when diff3 compared M and Y, it should have found the same
difference as between O and Y, but it sometimes got a different hunk and
reported a conflict.
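As a toy illustration of why a different hunk turns into a conflict, the
Python sketch below compares the hunks implied by two alignments of the same
pair of files.  This is not diff3's real logic; the helper names and the data
(O = M = s m e, Y = s m y m e) are assumptions chosen so that both alignments
are equally minimal.

```python
def unmatched_lines(alignment, length):
    """Indices in the second file left unmatched by the alignment,
    i.e. the lines that would be reported as the inserted hunk."""
    matched = {j for _, j in alignment}
    return [j for j in range(length) if j not in matched]

def same_hunk(align_oy, align_my, len_y):
    """True if both diffs describe the change to Y with the same hunk."""
    return unmatched_lines(align_oy, len_y) == unmatched_lines(align_my, len_y)

# Hypothetical alignments as (index in O or M, index in Y) pairs.
oy = ((0, 0), (1, 1), (2, 4))  # m matched to the first m in Y
my = ((0, 0), (1, 3), (2, 4))  # m matched to the second m in Y

hunk_oy = unmatched_lines(oy, 5)  # lines 2 and 3 of Y: "y", "m"
hunk_my = unmatched_lines(my, 5)  # lines 1 and 2 of Y: "m", "y"
```

Here O and M are identical, yet the two diffs disagree about which lines of Y
are new, which is the sort of mismatch that can get reported as a conflict.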

Careful when trying to create a test example from Difference A.
If y never occurs in file 1 and/or x never occurs in file 2, then
y and/or x are discarded before compareseq and diag.  The remaining

 1  2

 m  m

may produce much more consistent hunks.
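The discarding step can be sketched in Python as below.  This is a
simplification under my own assumptions: the real code in diffutils is more
elaborate than a plain membership test, and the function name is made up.

```python
def discard_unmatched(a, b):
    """Drop every line that has no equal line anywhere in the other file."""
    in_a, in_b = set(a), set(b)
    kept_a = [line for line in a if line in in_b]
    kept_b = [line for line in b if line in in_a]
    return kept_a, kept_b

# Difference A: x never occurs in file 2 and y never occurs in file 1.
kept1, kept2 = discard_unmatched(["x", "m", "x"], ["y", "m", "y"])
```

With both files reduced to the single line m there is only one possible match,
so a test case needs x and y to appear somewhere in the other file as well.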


2) A different hunk may occur if there are changes from O to M in a
different region of text.

I think it should be fairly easy to find a small test example
for this case.


3) This is the case where O and M are the same but different hunks are still
generated for diffs between O and Y, and between M and Y.  Things are more
complicated here but it may be possible to generate a smallish (<50 line)
test example.

This happens when the --horizon-lines option changes between two diffs run
by diff3.  The value of the argument is 10 for the first diff, but for the
second diff I think it is based on the sizes of the hunks for the first
diff.  So the first diff needs a hunk of more than 10 lines.

The horizon is the number of matching lines at the beginning or end of the
file that are not trimmed before performing the diff algorithm.
There need to be more than 10 matching lines at the beginning or end of the
file for a different --horizon-lines option to produce different hunks.

If there is a line in the extra horizon lines that is unique in its own
file, but exists in the other file (as a line that is not trimmed)
then its inclusion will prevent the line in the other file from being
discarded before diag and compareseq.  My theory is that this can cause a
change in the hunk reported.
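A small Python sketch of this theory follows.  It is only a model: the
function names, the tiny files and the horizon values 1 and 3 (instead of 10
or more) are my own assumptions, and the real trimming and discarding code in
diffutils is more involved.

```python
def trim_with_horizon(a, b, horizon):
    """Trim the matching lines at both ends of a and b, but keep up to
    `horizon` of them next to the region that differs."""
    n = min(len(a), len(b))
    prefix = 0
    while prefix < n and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < n - prefix and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]:
        suffix += 1
    start = prefix - min(prefix, horizon)     # first line kept in both files
    dropped = suffix - min(suffix, horizon)   # lines trimmed from the end
    return a[start:len(a) - dropped], b[start:len(b) - dropped]

def discard_unmatched(a, b):
    """Drop every line that has no equal line anywhere in the other file."""
    in_a, in_b = set(a), set(b)
    return [l for l in a if l in in_b], [l for l in b if l in in_a]

# Hypothetical files: y is unique in file 1 and sits in the matching tail.
file1 = ["x", "k1", "k2", "y"]
file2 = ["y", "k1", "k2", "y"]

small = discard_unmatched(*trim_with_horizon(file1, file2, 1))
large = discard_unmatched(*trim_with_horizon(file1, file2, 3))
```

With the small horizon the trailing y is trimmed, so the leading y in file 2
has nothing to match and is discarded.  With the larger horizon it survives,
and the comparison then runs on different input.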

An example may be:
O = M.  There is a difference between M and Y like Difference A.
There is also a difference (potentially the same difference) producing
a hunk of more than 10 lines.
There are more than 10 matching lines at the end of the file.
The (10+i)th (i>1) matching line (past the last non-matching line) is y.

It is going to be quite random whether a different hunk is produced.
You may need to play with extra non-matching lines elsewhere in the file
to get a different hunk produced.

Perhaps you could look at your current large failed examples, see if they
satisfy the prerequisites that I describe here, then remove lines that don't
affect these prerequisites.  You could possibly find that reducing the size
of the files a little removes the conflicts, and reducing them further brings
the conflict back.

Sorry, I can't think of anything better than trial and error.

Thanks Jacob for your efforts.

If some of this needs more explaining then please let me know specifically
what doesn't make sense, and I'll see what I can do.


Don't forget that O is the common file after the patch.
So configuration 3) should be tested with M=Y.

Configuration 2) with Y and O interchanged could be a problem after the patch,
but this represents the same change made on branch and trunk.  i.e. a difference
where branch and trunk are the same but older is different.  But this should
be rarer than a difference between older and branch with trunk the same as
older.  I think this can actually be fixed also, but requires significant
changes to diag and compareseq, and turning off heuristics.
I have made these changes in diffutils but haven't transferred them to cvs.

