Re: sync repositories

info-cvs

From:	Mike Ayers
Subject:	Re: sync repositories
Date:	Fri, 09 Aug 2002 13:51:00 -0700
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1b) Gecko/20020721

Zieg, Mark wrote:

I have to sync two CVS repositories located on two non-connected networks.


If you MUST do this (and it is almost certain that you do
not need to, but that's another story)



I assume that you've never had to develop under DOD-enforced contract
requirements, or you wouldn't have written that.  Anyway, the justification

Nope. I was thinking of DoD, that's why I wrote "almost". The DoD leads the list of "industries that don't understand digital security". (And no, I'm not thumbing my nose at you for working there.)

Duly noted.  For the record, does anyone have a suggestion for an
open-source CM tool which _is_ designed for use in this manner?  Again,
assuming that the repositories must be on physically disjunct networks, such
that any synchronization would have to be via hand-ported media in
human-readable format (i.e., diff patches on CD-R, etc.).

I suspect you won't find one, since, as mentioned before, this is fundamentally bad practice with a simple solution: single-repository operation. (A "real" solution for this is mindbogglingly complicated, due to all the special cases that can arise.) I know that the subject comes up a lot, but I've never seen anyone come back with a solid answer. I think most of the solutions that get implemented are painful but workable ones like the one I suggested, or the one you outline below.

I don't debate that this is stretching the intended functionality of CVS,
but I would nonetheless prefer finding a way to use an open-source CM tool
such as CVS than rely on a proprietary commercial vendor "solution".

Sounds good. However, you aren't "stretching", you're "redefining". Please keep in mind that CVS will only be a component in a system of your own design.

I'm still working on a satisfactory algorithm for this, but my current
thinking bends toward a classic master-slave synchronization effort, i.e.
treat one repository as the "master" (main trunk) and the other as a slave
("branch").  Then all we have to do is merge the branch back into the main
trunk, then re-spawn a fresh copy of the master to start a new branch.

That should work OK, but you will have to shut down the slave during synchronization and force an update (and possibly a merge) of all slave clients when you bring it back up.

(Although I'm using the term "branch", I'm not currently planning to make
use of actual CVS branches...should I?  Is there room for an efficient
optimization by using that feature?)

Almost certainly not, though perhaps someone else is more inspired than I. The one thing I think might help would be per-file branches, which I mention below.

This visualizes my approach:

(RepoA and RepoB are Repositories on Networks NetA and NetB.)

RepoA
-------------
foo.c @  1.1
foo.c -> 1.2
foo.c -> 1.3

copy RepoA to RepoB

    |
    | \
    |   \
    |     \
    |       \
    |         \
    |           \
    \/           _|

RepoA           RepoB
-------------   ------------
foo.c @  1.3  | foo.c @  1.3
foo.c -> 1.4  |
              | foo.c -> 1.4 (alpha mod)
foo.c -> 1.5  |
foo.c -> 1.6  |
              | foo.c -> 1.5 (bravo mod)
foo.c -> 1.7  |

time to sync changes!


              | collect all diffs
              |    to all files
              |    (2 diffs for foo.c),
              |
             <--  transport to RepoA
              |
foreach file, |
foreach diff, |
apply & comm. |
              |
foo.c -> 1.8  |
foo.c -> 1.9  |
              |
foo.c now has |
  alpha and   |
  bravo mods  |
              |
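
The "foreach file, foreach diff" step above only works if each file's diffs are replayed in the order they were committed on RepoB. A minimal sketch of that ordering, assuming the collected diffs arrive as a hypothetical list of (filename, RepoB revision) pairs; the names `rev_key` and `replay_order` are illustrative and not part of CVS. Revisions are compared numerically so that 1.10 sorts after 1.9.

```python
from collections import defaultdict


def rev_key(rev):
    """Turn a CVS revision string such as '1.10' into a tuple of ints for sorting."""
    return tuple(int(part) for part in rev.split("."))


def replay_order(diffs):
    """Group (filename, revision) pairs by file, each list in commit order.

    `diffs` is a hypothetical list of (filename, RepoB revision) pairs.
    """
    by_file = defaultdict(list)
    for filename, rev in diffs:
        by_file[filename].append(rev)
    return {name: sorted(revs, key=rev_key) for name, revs in by_file.items()}


# Hypothetical diffs as they might come off the transport media, unordered.
collected = [("foo.c", "1.5"), ("bar.c", "1.10"), ("foo.c", "1.4"), ("bar.c", "1.9")]
for name, revs in sorted(replay_order(collected).items()):
    print(name, revs)
```

A plain string sort would place "1.10" before "1.9", which is why the sketch converts each revision to a tuple of integers first.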

WHOOT! WHOOT! WHOOT! Danger, Will Robinson! Are you ABSOLUTELY CERTAIN that the person merging the databases will be able to merge the files? If not, you're either going to have to shuttle developers into and out of the RepoA location (with both repos down in the meantime), or prepare for long periods of broken trees.

Another thing that you could do here would be per-file branches, only for files that have conflicts like this. So your file's history would now look like:


   foo.c
     |
     | \
     |   \
     |     \
     |       \
     |         \
     |           \
     \/           _|

    trunk           diff_branch_2002_08_09
 -------------   ------------
 foo.c @  1.3  | foo.c @  1.3.1.1
 foo.c -> 1.4  |
               | foo.c -> 1.3.1.2 (alpha mod)
 foo.c -> 1.5  |
 foo.c -> 1.6  |
               | foo.c -> 1.3.1.3 (bravo mod)
 foo.c -> 1.7  |

     |           /
     |         /
     |       /
     |     /
     |   /
     | /
     |
     V

  foo.c -> 1.8 (merged)

With this, you still have a viable archive (provided foo.c isn't the file that everything else depends on), and may be able to overwrite RepoB with this and proceed for the next cycle while the foo.c developers do the merge.

Structuring your files to minimize simultaneous alternate-location development will pay off big here.
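
To see which files would need a per-file branch, it may help to compare each repository's head revisions against the last common baseline. The following is a rough sketch only: the three dictionaries are hypothetical inputs (filename to revision) that you would have to gather yourself, and `changed_on_both_sides` is an illustrative name, not a CVS feature.

```python
def changed_on_both_sides(baseline, head_a, head_b):
    """Return files whose head revision differs from the baseline in both repos."""
    return sorted(
        name
        for name, base_rev in baseline.items()
        if head_a.get(name, base_rev) != base_rev
        and head_b.get(name, base_rev) != base_rev
    )


# Hypothetical revisions; foo.c matches the diagram above.
baseline = {"foo.c": "1.3", "bar.c": "1.2", "baz.c": "1.1"}
head_a = {"foo.c": "1.7", "bar.c": "1.2", "baz.c": "1.4"}
head_b = {"foo.c": "1.5", "bar.c": "1.3", "baz.c": "1.1"}
print(changed_on_both_sides(baseline, head_a, head_b))
```

Files changed on only one side (bar.c and baz.c here) can be carried across without a merge; only the files the function returns are candidates for a per-file branch.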

copy RepoA to RepoB

    |
    | \
    |   \
    |     \
    |       \
    |         \
    |           \
    \/           _|

RepoA           RepoB
-------------   ------------
foo.c @  1.9  | foo.c @  1.9
              |
              V

...of course, this is of less help to Piet, who already has split
repositories, and may or may not have a common base version from which to
apply a "merge".  However, I have the advantage of having not yet split my
development, so I still have a chance to plan things out and initialize the
sets accordingly...

That does help you a lot, but a method that can synchronize two trees that diverged last week can synchronize two trees that diverged last year, if a bit more painfully.

I haven't worked out the mechanics of extracting the diffs, but with a bit
of Perl and what-not it shouldn't be difficult to preserve log messages.  I
thought about trying to override the author/date attributes of the diffs,
but even if that were feasible and convenient, it would be a little weird if
"rev 1.8" seemed to be datestamped before "1.7"...therefore, I'll probably
just append the original (RepoB) author and date onto the log message as
each diff is re-applied.
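
Appending the original author and date to the log message could look roughly like the sketch below. The bracketed trailer format and the function name `annotate_log` are assumptions made for illustration; any format that you can recognize later would do.

```python
def annotate_log(message, author, date):
    """Append the original RepoB author and date to a commit log message."""
    # The trailer layout is an arbitrary choice, not a CVS convention.
    return f"{message.rstrip()}\n\n[RepoB commit by {author} on {date}]"


# Hypothetical author and timestamp for the "alpha mod" commit.
print(annotate_log("alpha mod\n", "alice", "2002/08/09 13:51:00"))
```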

OK, you're going to want to apply tags (e.g. Sync_2002_08_09) that you can use to determine which version was the last common baseline. From there, you'll be able to use cvs diff and cvs log with revisions specified to yield per-commit patches, which you can use to commit the changes to RepoA as they were committed to RepoB.
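
As a rough sketch of the per-commit extraction, the code below only builds the cvs log and cvs diff command lines for each step between the baseline revision and the head; it does not run them. It assumes you have already looked up which revision the sync tag points to, and that the revisions on that line of development are numbered consecutively with none skipped. The helper names `trunk_steps` and `patch_commands` are illustrative.

```python
def trunk_steps(base, head):
    """List consecutive (old, new) revision pairs from base up to head.

    Assumes both revisions share a prefix (e.g. '1.') and that no
    revision numbers were skipped in between.
    """
    prefix, start = base.rsplit(".", 1)
    head_prefix, end = head.rsplit(".", 1)
    if prefix != head_prefix:
        raise ValueError("base and head are not on the same line of development")
    return [(f"{prefix}.{n}", f"{prefix}.{n + 1}") for n in range(int(start), int(end))]


def patch_commands(filename, base, head):
    """Build one 'cvs log' and one 'cvs diff' command line per commit."""
    commands = []
    for old, new in trunk_steps(base, head):
        commands.append(["cvs", "log", f"-r{new}", filename])  # log message for this commit
        commands.append(["cvs", "diff", "-u", "-r", old, "-r", new, filename])  # its patch
    return commands


# RepoB's foo.c from the diagram: baseline 1.3, head 1.5.
for cmd in patch_commands("foo.c", "1.3", "1.5"):
    print(" ".join(cmd))
```

Keeping the commands as argument lists makes it straightforward to hand them to whatever runs on the RepoB side, and to pair each patch with its log message when it is re-committed on RepoA.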

Comments vigorously solicited!


        Ugh - brayn hurtz.


/|/|ike
