monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Cvssync status


From: Christof Petig
Subject: Re: [Monotone-devel] Cvssync status
Date: Wed, 21 Sep 2005 17:08:59 +0200
User-agent: Mozilla Thunderbird 1.0.6 (X11/20050912)

Nathaniel Smith schrieb:
> On Mon, Sep 19, 2005 at 01:35:07PM +0200, Christof Petig wrote:
>>>What does the cvs_admin command do?
>>
>>... looking into the code ...
>>
>>monotone cvs_admin manifest <revision>
>>
>>gives you each CVS revision and path name of a manifest. [I mostly used
>>this for debugging, it's handy to have. But if you think it unnecessary
>>I will drop it (or otherwise document it).]
> 
> 
> Hmm, nothing wrong with having debugging commands, but should probably
> give _some_ description somewhere, and put it in the "debug" section
> of help output.  Maybe name it something that will better indicate to
> average end users that they don't want it, like "cvssync_debug" or
> something.

Once upon a time cvs_admin offered much more functionality. cvs_debug
should be a better name ... done ...

>>I will document it, but I did only take cvs_import as a design pattern,
>>my code is not identical.
> 
> 
> Cool.

Not always. I completely left out the interner part of cvs_import. After
many difficulties figuring out the syntax and semantics of that
(undocumented) part. I see that it might save space on huge repositories
(compared to std::string).

Easy to port to using it later.

> I don't know anything other than astyle or indent.  Sorry.

I was hoping for you to know the correct command line options offhand ;-)

>>I decided to give up tricking here by adding another temporary in one place.
> 
> 
> Okay.
> 
> It looks like this stringtok thing should just be designed to take an
> output iterator, and then you could pass a std::inserter or a
> std::back_inserter depending on whether it should write into a set or
> a sequence...

You score. (I didn't think about output iterators here)

> Ah, there's a much better trick -- take a time_t, run gmtime, run
> mktime, compare the resulting time_t to the original time_t.  This
> gives you the timezone offset; just apply this correction factor to
> the result of running mktime on your original time.
> 
> Example code, with more detailed description:
>   http://lists.debian.org/deity/2002/04/msg00082.html

Cool! Do you think that migrating from time_t to boost::some_time_class
is worth the effort? (I know that time_t is space efficient and fast to
compare but nothing about the boost classes).

>>>In monname2month, would it have been so horrible to just have a 12-way
>>>if (x == "Jan") ... else if (x == "Feb") ... etc.? :-)
>>
>>No, but it's faster this way ;-)
> 
> 
> Err... is this a bottleneck, then? :-)

No. This was just too much creativity spent making the shortest possible
code for this stupid task.

>>Agreed. But writing comments to an imaginative audience is no fun
>>(especially if you have to redesign some times because of CVS server flaws).
> 
> 
> Yah.  Hopefully you don't expect to have to redesign again now?

Most of the functionality is in place and working. Yesterday I started
to write side branch support (because this might need adding some code
here and there) before starting to document it.

Apart from factoring out something or rearranging it I don't plan any
redesigns anymore.

[for the record: I had to patch netxx/probe.h today because of a Linux
kernel bug (?): selecting for read on the _write_ side of a pipe
returned instantly with "data available" 8-O ]

>>The CVS server has some flaws! Each revision has it's own set. What
>>started as a nice idea ended in a nightmare. Some commands work as
>>advertized on 1.11, some on 1.12. I can not fix every running CVS server
>>in the world so I had to get creative.
> 
> 
> Yeah, I understand this part.  It's what would make me terrified to
> touch the code, because all I know is that there are mysterious Weird
> Bugs out there that are being worked around, but I have no idea what
> they actually are, and thus can't tell what parts of the code are
> workaround, which are themselves bugs, etc.
> 
> The only way that I know to deal with this sort of situation is to
> build up communal knowledge -- every time someone finds a weirdness,
> write it down, so the next guy will know too.  That way, while you
> never really know whether your code is correct, you at least know that
> each version is _more_ correct than anything that came before it,
> because you can see that you still handle all the old cases, plus the
> new one.

For now only some code which had been abandoned by me uncovered flaws in
the server. The current design is free of workarounds (though sometimes
the comment outlived the functionality). I'll try to delete all cruft.

>>An especially the fact that the CVS server was designed to handle
>>exactly one command per connection (gasp!) is deadly to anyone trying to
>>contact it efficiently.
> 
> 
> Oh, ew.  How does one do anything efficient at all, then?

I ignore this design. Most of the time it works anyway (sometimes
certain flags (overwrite changes) help). But asking for specific
information is not easy sometimes (like "give me every revision which
belongs to this branch" => "unchanged files inherited from earlier
branches are never mentioned" ... back to start ... if you start using
more complex queries other revisions/files vanish ...)

>>>+#ifdef WIN32
>>>+#define sleep(x) _sleep(x)
>>>+#endif
>>>^^ Ick.  If this is a problem, put a sleep() in platform.hh.  New code
>>>should, basically, never ever say "#ifdef WIN32".
>>
>>This damned Win32 platform compatibility has caused more than half of
>>the work on this branch. I'm not sure I want to invest more to make a
>>platform work that I don't need at all.
> 
> 
> Well, that makes sense... though this particular fix is pretty
> trivial.
> 
> In the broad scheme of things, I'm not quite sure what to say.  I
> don't really want to drop more things on our few valiant win32
> maintainers; they already do a heroic enough job keeping things
> running at all... but perhaps someone else will find this
> functionality intriguing enough to fix it up.
> 
> In any case, basically, it's fine if the cross-platform support is a
> bit shaky for an experimental feature, but this will probably have to
> be fixed up at some point before it becomes "non-experimental"...

Agreed. I simply didn't want to spent more work on creating a platform
abstraction when theres enough to spent on the code itself.

> Oh, okay.  So, making sure I understand here, the idea is -- we have
> these certs that describe how to match up a given monotone revision
> with a CVS revision.  These certs are primarily for machine reading,
> but all else being equal, it'd be nice if humans could read it too.
> 
> Therefore, \t is chosen as a delimiter.  If a CVS repo directory (or
> module name or branch name, but I doubt(?) this is possible) ever
> contains a tab, then we will dump some corrupted data that we
> cannot read?  (What is 'host', by the way?  Could it ever contain a
> colon, say if we were talking to the CVS repo via ssh?)
> 
> I'm a little leery of the guess-and-pray method of lossless string
> quoting.  The safest character to use as a split delimiter is \0 --
> though, of course, that breaks textual display.  Your other usual
> options are length-prefixing (e.g., djb's "netstrings"), or some sort
> of proper quoting -- maybe basic_io would be the best thing to use
> here, it has proper quoting already.  It's also _much_ easier to
> figure out from looking at -- I can't figure out what the full
> cert_cvs() format is from looking at the code, never mind looking at
> some arbitrary strings strung together by tabs...
> 
> But I guess you're worried about size?  If so, probably worth running
> the numbers to see what the overhead would be...

If you look at the elements involved (host name, paths, cvs tag names) a
tab is very very unlikely to occur in the real world. Using space or /
to separate the elements turned out to be wrong.

> Is there any checking in the code for the possibility of multiple
> conflicting cvs certs?  Is there a test to make sure that this case is
> properly detected and handled/reported?

No. One can say that the current certification based storage scheme is
fragile at best. E.g.

cvs.manuproc.berlios.de:/cvsroot/manuproc       c++
+de2773c7a781190f9a4f6e4b6d38b283e6e3bd90
1.13 Basic/src/Interval.cc
1.12 Basic/src/Interval.h

This means: host cvs.manuproc.berlios.de directory /cvsroot/manuproc
accessed via CVS_RSH (this is the repository part), module c++, HEAD
branch (no third element).

Take the matching cvs cert from revision
de2773c7a781190f9a4f6e4b6d38b283e6e3bd90 and change the following two
cvs revision numbers.

A file/file_deltas based storage would be much more robust and efficient
but needs a change within monotone. E.g. a cert which refers to a file.

>>>+ // because some manifests might have been absolute (not delta encoded)
>>>+  // we possibly did not notice removes. check for them
>>>^^ what is this about?  (whether a manifest is delta-encoded or not
>>>should be completely hidden anywhere except database.cc, should it?)
>>
>>This talks about CVS manifests. And since CVS manifests are stored in
>>certificates
> 
> 
> Ah.  Missed that.  It's a bit confusing ATM...

I think the above example makes it much clearer.

>>>Why are all of cvs_edge's fields marked 'mutable'?  There are hardly
>>>_any_ good reasons to use 'mutable', certainly none that don't also
>>>have explanatary comments...
>>
>>Will have to take a look ...

Because set<>::iterator is the same as set<>::const_iterator (see STL
reference). The mutable parts take no part in the ordering and are mutable.

>>Perhaps the above mentioned pipe_test_client.c program is a good idea
>>since it can test for binary transparency (i fear problems with line
>>endings), input output dependance etc. I simply ran out of motivation
>>for the Win32 platform.
> 
> 
> I don't understand why you can't do just as much with 'cat' -- see if
> it losslessly pases \n, \r\n, \r?
> 
> Such a test would be just as important on POSIX as Win32, really.

I agree. (Where did I have my thoughts?)

>>>(And maybe also "echo", to test command line passing to the spawned
>>>program?)
>>>
>>>Your command-line quoting for win32 is wrong (as your comment notes).
>>>win32/process.cc already has correct win32 quoting code, duplicating
>>>this finicky task is quite silly, we should find a way not to do that.
>>
>>I did not want to make netxx/* candidate code rely on win32/*. But this
>>might be a better idea.
> 
> 
> Yeah, I'm not quite sure what the best layout would be, but it can't
> be that hard to find some sort of reasonable one...

A good idea might be to make
std::string munge_argv_into_cmdline(const char* const argv[])
publicly visible in platform.hh if WIN32 ?

Since I need to pass different STARTUPINFO arguments (stdio
descriptors), I can not use spawn_process.


>>The worst problems I have encountered in daily work is that cvs_pull was
>>unable to fetch a revision from the CVS server because of some protocol
>>oddity. Theoretically monotone could go amok and commit unwanted (or
>>broken) revisions (or insane changelogs) but it never touches the CVS
>>files directly, so this should be reversible.
> 
> 
> Yes.  But IMO we need to be _extremely_ cautious here.  There is no
> sin worse than losing/corrupting data, and many users are not
> prepared/capable of backing out unwanted changes to CVS, even if they
> are reversible in principle.  (I know _I_ wouldn't feel very
> comfortable doing that.)
> 
> Or if that's not convincing, consider that cvssync's selling point is
> sneaking monotone into groups from the grassroots, and dumping a
> bunch of bad data into the shared repo would give all the other
> non-monotone-using developers a _great_ first impression of monotone's
> reliability etc....

Agreed. The point had been convincing from the start. It's just that the
worst things that had happened to me while developing cvssync were
- synchronization between monotone and CVS lost (e.g. partial checkin
etc.) => pull again, merge, push
- unforeseen situation happened => think about the implications,
implement it before I you can pull/push again

>>Thank you for such a thorough review of my code (before I had reviewed
>>it <blush>). I hope we can address the problems, most likely some of the
>>code has to get restructured once git + other VCS synchronization go
>>mainline (SVN seems most interesting next).
> 
> 
> Yes, interesting point... maybe we should get the current stuff stable
> before tackling that ;-).

Even More stable? Monotone has already gotten more stable than CVS can
ever get. ;-) [Leaving merge algorithms aside]

  Christof

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]