monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Cvssync status


From: Christof Petig
Subject: Re: [Monotone-devel] Cvssync status
Date: Mon, 19 Sep 2005 13:35:07 +0200
User-agent: Mozilla Thunderbird 1.0.6 (X11/20050912)

Nathaniel Smith schrieb:
> On Sun, Sep 18, 2005 at 01:23:07AM +0200, Christof Petig wrote:
> 
>>So I would consider it ready for inclusion into mainline (to get more
>>testing by other people)

It had been ready for review.

> Cool!  Looking over the code:
> 
> = General issues =
> 
> What are these files?:
>  - WhyThisWay
>  - CVS_prot
>  - pipe_test_client.c
>  - tests.cvs/*

These should remain in the side branch (deleted in mainline). I deleted
WhyThisWay, pipe_test_client.c was a planned pipe test for Win32 and was
not yet used :-(, CVS_prot still holds some notes needed for branch
support implementation. And tests.cvs predates tests/t_*.at . I deleted
all but CVS_prot and pipe_test_client.c .

> I notice you added botan/md5.*.  Botan was upgraded on mainline
> recently; are these still from the correct version of botan?

I did not update it. But given the simplicity of the files I did not
suspect any problems (or even changes in mainline). I checked with
1.4.6, they are unchanged.

> Why is there a cvs_sync.hh but no cvs_sync.cc, and a
> cvs_repository.cc but no cvs_repository.hh?

I will rename cvs_repository.cc to cvs_sync.cc (cvs_sync is the
namespace, cvs_repository is the largest class).

> What does the cvs_admin command do?

... looking into the code ...

monotone cvs_admin manifest <revision>

gives you each CVS revision and path name of a manifest. [I mostly used
this for debugging, it's handy to have. But if you think it unnecessary
I will drop it (or otherwise document it).]

> Can you give more details on how you reconstruct whole tree-states
> from CVS's file-state information?  I ask in particular because I
> notice code that looks like the old cvs importer, but the cvs importer
> was recently rewritten because the tree state reconstruction stuff
> turned out to be fundamentally broken.  (Graydon is the person to talk
> to for details.)

I will document it, but I did only take cvs_import as a design pattern,
my code is not identical.

> We tend to say "FIXME" instead of "2DO".  Not that critical, but might
> be nice to be consistent.  (It would never occur to me to grep for
> "2DO".)

done

> What is the testing status of the cvssync code?  I see some tests in
> tests/, some tests in tests.cvs... what's going on here, and roughly
> how much of the cvssync functionality has tests?

see below

> == Formatting ==
> 
> Please format code appropriatly (= GNU style).  Really.  I know it
> seems trivial, and I hate to be annoying about it.  But it's
> important, and it _is_ trivial, so I hope this isn't a big deal.
> Code on mainline needs to be easily maintainable by anyone.  

I agree, but changing this by hand seems a dull task. Can you recommend
a program to do this task? [I will see if astyle can convert it easily.]

> = cvs_client.cc =
> 
> Can you clean up some of these #if 0's?  they're confusing to read;
> can't tell what their status is.

Oh sorry, I'll do it.

> There are some commented out headers in cvs_client.cc that can
> probably be deleted, as well.
> 
> Why do cvs_client::writestr() (and friends) use zlib directly, instead
> of via Botan?

I need _stream_ compression and decompression (for -zN accessing cvs)
and did not find that possible with botan. I'd like to get proven wrong
here.

> Why does cvs_client.hh include zlib.h, instead of cvs_client.cc?  And
> same question about stdarg.h.

The class declaration refers to va_list and z_stream.

> I am dubious about the workaround in cvs_client.cc:push_back2insert_cl
> where you make a struct with members marked mutable to avoid some
> warning from the compiler.  Are you sure you're not just silencing the
> warning while preserving the brokenness the compiler was trying to
> warn you against?  What exactly is going on here?

For the record: I was trying to avoid a temporary variable for the
adaptor (which mandates specifying the exact type of the adaptor) by
passing a temporary adaptor (which manipulates a non temporary
container)). So I cheated by making a const object (which gives no
warning on passing by reference to strtok) manipulate the mutable
reference. ["Use a hack in _one_ place to make it more easy to use this
adaptor"]

I decided to give up tricking here by adding another temporary in one place.

> 
> +// the creator function (so you don't need to specify the type
> ^^ this comment is missing a close paren
> 
> +// "  AA  " s=2 e=3
> ^^ this comment could probably be more helpful to the naive reader
> :-).

better this way?

// "  AA  " gives start=2 end=3, so substr(2,1) is correct

> Why are some methods in cvs_client capitalized and some not?

Because I like creativity ;-) ?  No good reason I suspect.

> Your putenv/getenv stuff in timezone2time_t does not look portable.
> It is simply not true that WIN32 has putenv while everywhere else has
> setenv...

I agree. But making this function portable to platforms I don't know
anything about was beyond my interest (to be honest). All I want to do
here is convert a broken down timestamp in UTC (!!!) to a time_t.
Perhaps boost::date can offer something here? (I did not find anything)

To be honest I was glad when this function worked on Win32 and Linux
(but has unwanted side effects on Win32 [it does not restore the
originial TZ]).

> Use lexical_cast<int>() instead of atoi.

done.

> In monname2month, would it have been so horrible to just have a 12-way
> if (x == "Jan") ... else if (x == "Feb") ... etc.? :-)

No, but it's faster this way ;-)

> In general, would it be better to use boost's time library instead of
> the libc functions?

Probably, I knew the libc functions by detail and boost was unknown to
me. As I see now matching class (local_date_time) has arrived in 1.33 in
boost. The time I designed it was unable.

> It's not clear to me what your varargs functions are being used for,
> probably just because I haven't looked over the code closely enough;
> can you explain a bit why they need to be varargs?

It's for sending commands with a variable number of parameters to the
CVS server.

> Generally, it'd be nice if there were a few more comments in
> cvs_client.cc explaining what's going on, for anyone trying to find
> their way around that has spent less time staring at the CVS protocol
> docs (such as they are).  Again a question of, mainline code is public
> code that needs to be publically maintainable...

Agreed. But writing comments to an imaginative audience is no fun
(especially if you have to redesign some times because of CVS server flaws).

> This is especially important for any needed quirks/workarounds that
> were only discovered through testing against real servers; e.g.,
> there's a comment:
> +// cater for encountered bugs ...
> +// actually this code should not be necessary with the current options
> +// without -C some revisions interacted with each other badly
> Reading this, I have no idea what it means, except I'm scared to touch
> the next code, because I have no idea how to even tell whether my
> changes broke anything.

The CVS server has some flaws! Each revision has it's own set. What
started as a nice idea ended in a nightmare. Some commands work as
advertized on 1.11, some on 1.12. I can not fix every running CVS server
in the world so I had to get creative.

And especially the fact that the CVS server was designed to handle
exactly one command per connection (gasp!) is deadly to anyone trying to
contact it efficiently.

> This file really needs a big comment at the top giving a rough
> overview of how things work.  Similar to the ones in, e.g., paths.hh,
> or netsync.cc, etc.
> 
> = cvs_repository.cc =
> 
> This file also needs at least one big huge comment explaining how it
> all works.
> 
> More #if 0's.

Most of them can probably deleted. Some had been nice ideas that turned
out to be impractical or unreliable due to CVS server bugs.

> 
> +#ifdef WIN32
> +#define sleep(x) _sleep(x)
> +#endif
> ^^ Ick.  If this is a problem, put a sleep() in platform.hh.  New code
> should, basically, never ever say "#ifdef WIN32".

This damned Win32 platform compatibility has caused more than half of
the work on this branch. I'm not sure I want to invest more to make a
platform work that I don't need at all.

> +#define BACKWARD_COMPATIBLE
> ^^ huh?

Can die in mainline. Backward compatible to the first revision ever
posted on net.venge.monotone.cvssync.

> +// since the piece methods in rcs_import depend on rcs_file I cannot reuse 
> them
> +// I rely on string handling reference counting (which is not that bad IIRC)
> ^^ Hrm, can they be refactored?
> 
> Don't use 'throw oops' in new code.  Use I/N/E, with P/L/MM to include
> more information if necessary on I()'s...

It's not new code ;-) It had been there before I . Agreed.

> +{ // I assume that at least TAB is uncommon in path names - even on Windows
> +  std::string content=host+":"+root+"\t"+module+"\n";
> ^^ not quite sure what this means, but it makes me nervous.  Can you
> elaborate?

Yes, I used to separate root and module by a space. Space in pathnames
is a common disease on windows :-(. Since ; : etc are not uncommon in
strange path names either I decided that nobody in his sane mind would
use tabs in CVS path names and used that as the new separator.

> +static void test_key_availability(app_state &app)
> ^^ this function looks misnamed -- shouldn't it be
> "ensure_key_availability" or "require_key_availability" or such?

yes.

> + // because some manifests might have been absolute (not delta encoded)
> +  // we possibly did not notice removes. check for them
> ^^ what is this about?  (whether a manifest is delta-encoded or not
> should be completely hidden anywhere except database.cc, should it?)

This talks about CVS manifests. And since CVS manifests are stored in
certificates

> +  if (global_sanity.debug) L(F("%s") % debug());
> ^^ huh?  why do you test global_sanity.debug here?

it enables much much more debugging output (computing which is expansive).

> I think we now have _four_ implementations of "build a changeset given
> manifests" -- in 'db rebuild', in cvs_import, in git_import, in
> cvs_pull.  Surely we can do better than this...

Of course!

> Why is the stringtok code included verbatim in _both_ cvs_client.cc
> and cvs_repository.cc?

Bad idea. I probably should create another common used header.

> = cvs_sync.hh ==
> 
> We can't have a public struct cvs_edge and a different private struct
> cvs_edge in cvs_import.  Just... no.

Agreed.

> Why are all of cvs_edge's fields marked 'mutable'?  There are hardly
> _any_ good reasons to use 'mutable', certainly none that don't also
> have explanatary comments...

Will have to take a look ...

> +  mutable unsigned cm_delta_depth; // we store a full manifest every N 
> revisions
> +  static const unsigned cm_max_delta_depth=50;
> ^^ it seems like a very bad idea for cvssync to be making storage
> decisions like this.  If this is a good idea, we should be doing it in
> a global location (like database.cc); if it isn't a good idea, we
> shouldn't be doing it here (and doing it here will make it harder to
> implement any future good ideas).

The certificate used for storing the CVS revision for each file is delta
encoded. Since some of my projects easily reached 3000 revisions, I
decided to store the full map every 50 revisions (instead of traversing
every stored revision to reconstruct the current value). Certificates
are stored uncompressed, base64 encoded so space is precious here :-( .

> = options.hh =
> 
> +#define OPT_SINCE 99 // use a custom number ...
> ^^ huh?

I had to change this number too often on merges, so I decided to take
one from an unused domain. Should get 37 once included. Or better can we
already reserve a number for netsync in mainline ;-) ?

> = netxx_pipe.cc =
> 
> Would it make more sense to add these into netxx itself?  (E.g., make
> Probe itself handle pipes?)  Normally modifying included libraries is
> a bit distasteful, but netxx is orphaned upstream and we already have
> some local enhancements included.

So pushing it into netxx seems logical. But PipeCompatibleProbe is
designed to be minimally invasive, so inclusion might recommend a
rewrite of both (which is beyond my current interest).

> We already require a unix shell environment on windows for building
> and testing, so your test can assume the existence of "ls" there.
> 
> Perhaps it would be better to use 'cat' (with no args) as the program
> spawned for the test, then you could test writing to stdin, reading
> from stdout, probe blocking until a timeout when there is no output,
> maybe others?

Perhaps the above mentioned pipe_test_client.c program is a good idea
since it can test for binary transparency (i fear problems with line
endings), input output dependance etc. I simply ran out of motivation
for the Win32 platform.

> (And maybe also "echo", to test command line passing to the spawned
> program?)
> 
> Your command-line quoting for win32 is wrong (as your comment notes).
> win32/process.cc already has correct win32 quoting code, duplicating
> this finicky task is quite silly, we should find a way not to do that.

I did not want to make netxx/* candidate code rely on win32/*. But this
might be a better idea.

> = Overall =
> 
> This is a big chunk of code, something like 10% of the total size of
> monotone, and it attempts a very complex and error-prone task, that if
> it goes wrong can corrupt people's production CVS repos and confuse
> users utterly.  Merging it to mainline means the development team as a
> whole assuming a responsibility to support and maintain it, so this is
> a big risk to take.

The worst problems I have encountered in daily work is that cvs_pull was
unable to fetch a revision from the CVS server because of some protocol
oddity. Theoretically monotone could go amok and commit unwanted (or
broken) revisions (or insane changelogs) but it never touches the CVS
files directly, so this should be reversible.

> 
> That doesn't mean we don't want to merge it; it provides some really
> fantastic and powerful functionality, that'd I'd love to see included.
> But it does mean that to be comfortable with that risk, we need to
> reduce it as much as possible, if that makes sense.
> 
> The main things I see as helping this:
>   1) basic tree cleanup.  we all live and work in the mainline tree,
>      can't have clutter about the place.

I already addressed some points. I'll try more ...

>   2) basic code cleanup.  readability and consistency are easy, and
>      important.

see above.

>   3) understandable code -- this mostly means enough commentary and
>      explanation in the code that we can figure out roughly what's
>      going on.
>   4) really good docs, clearly laying out the use cases, relation
>      between cvs_import and cvssync, limitations of cvssync, big fat
>      EXPERIMENTAL: MAY EAT HISTORY warnings all over the place until
>      well battle-tested, etc.

I thought that the docs should be in a good shape for a starting point.
This was the reason for my post (I should have cleaned up the code
first, though)

>   5) a good set of test cases, covering (at least) all the major
>      possibilities

This is nearly impossible since most problems arise against different
(buggy) versions of CVS. And writing a CVS bug emulator is not appealing.

> Not all of this needs to happen before it gets into mainline for
> early-adopter testing.  (1) and (2) are necessities.  (4) doesn't have
> to be perfect, but I think needs a bit more work (I know I _will_ have
> to field support questions on this code, even if marked EXPERIMENTAL
> and with you doing all actual hacking...).  (3) and (5) are not as
> critical so long as the code is marked experimental, but do need to be
> fixed sooner or later (preferably sooner, to make the code more
> accessible to others wanting to improve it).  All code in mainline
> needs to either fulfill (3) and (5) eventually or be removed, as a
> rule of thumb, and until cvssync meets this I personally wouldn't feel
> confident in committing to support it in the long term.

Testing covers:
- creating new files, changing files, deleting files in CVS and pulling
these changes in one step and step by step.
- taking over a changed directory and pushing back into CVS
- taking over an unchanged directory, creating a new file in one branch,
deleting a file in a different branch, merging both branches and pushing
the whole history into CVS.
- a strange setup (module name = directory name) which caused problems
with old, no longer used code paths on pull [I no longer use Checkout
but Update for everything since Update allows more precise path
specification]

> Overall, while I've been focusing a lot on negative things, I also
> want to say thanks a lot for working on this!  It's clearly a huge
> investment of effort, and your dedication is really impressive, and
> appreciated.  Now let's see if we can figure out how to get it over
> this hump...

Thank you for such a thorough review of my code (before I had reviewed
it <blush>). I hope we can address the problems, most likely some of the
code has to get restructured once git + other VCS synchronization go
mainline (SVN seems most interesting next).

Can we please agree to tackle win32 after inclusion in mainline? I don't
want to spend hours on that again.

I will contact Graydon after cleaning up and commenting the code. This
way it is easier to talk about file-state -> tree-state conversion. We
might also want to share some test code. Note that I currently ignore
tags (I planned to separate CVS tags from monotone revisions) and side
branches.

   Christof

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]