monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Cvssync status


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Cvssync status
Date: Sun, 18 Sep 2005 03:03:35 -0700
User-agent: Mutt/1.5.9i

On Sun, Sep 18, 2005 at 01:23:07AM +0200, Christof Petig wrote:
> So I would consider it ready for inclusion into mainline (to get more
> testing by other people)

Cool!  Looking over the code:

= General issues =

What are these files?:
 - WhyThisWay
 - CVS_prot
 - pipe_test_client.c
 - tests.cvs/*

I notice you added botan/md5.*.  Botan was upgraded on mainline
recently; are these still from the correct version of botan?

Why is there a cvs_sync.hh but no cvs_sync.cc, and a
cvs_repository.cc but no cvs_repository.hh?

What does the cvs_admin command do?

Can you give more details on how you reconstruct whole tree-states
from CVS's file-state information?  I ask in particular because I
notice code that looks like the old cvs importer, but the cvs importer
was recently rewritten because the tree state reconstruction stuff
turned out to be fundamentally broken.  (Graydon is the person to talk
to for details.)

We tend to say "FIXME" instead of "2DO".  Not that critical, but might
be nice to be consistent.  (It would never occur to me to grep for
"2DO".)

What is the testing status of the cvssync code?  I see some tests in
tests/, some tests in tests.cvs... what's going on here, and roughly
how much of the cvssync functionality has tests?

== Formatting ==

Please format code appropriatly (= GNU style).  Really.  I know it
seems trivial, and I hate to be annoying about it.  But it's
important, and it _is_ trivial, so I hope this isn't a big deal.
Code on mainline needs to be easily maintainable by anyone.  Some
examples:
+  if (args.size() == 2)
+  { repository = idx(args, 0)();
+    module = idx(args, 1)();
+  }

Should be:

if (args.size() == 2)
  {
    repository = idx(args, 0)();
    module = idx(args, 1)();
  }

+      (*byte_out_ticker)+=stream->write(s.c_str(),s.size());
should be
+      (*byte_out_ticker) += stream->write(s.c_str(), s.size());
(And why doesn't write() take a std::string...?)

I'd also prefer single-line if statements to be split over two lines:
  if (foo)
    bar;
rather than the current:
  if (foo) bar;

Two things that are not part of GNU style, but that we are consistent
about:
  - always say "Foo const & f" rather than "const Foo &f".
  - when defining functions, format like
static void
do_it()
    rather than
static void do_it()

= cvs_client.cc =

Can you clean up some of these #if 0's?  they're confusing to read;
can't tell what their status is.

There are some commented out headers in cvs_client.cc that can
probably be deleted, as well.

Why do cvs_client::writestr() (and friends) use zlib directly, instead
of via Botan?

Why does cvs_client.hh include zlib.h, instead of cvs_client.cc?  And
same question about stdarg.h.

I am dubious about the workaround in cvs_client.cc:push_back2insert_cl
where you make a struct with members marked mutable to avoid some
warning from the compiler.  Are you sure you're not just silencing the
warning while preserving the brokenness the compiler was trying to
warn you against?  What exactly is going on here?

+// the creator function (so you don't need to specify the type
^^ this comment is missing a close paren

+// "  AA  " s=2 e=3
^^ this comment could probably be more helpful to the naive reader
:-).

Why are some methods in cvs_client capitalized and some not?

Your putenv/getenv stuff in timezone2time_t does not look portable.
It is simply not true that WIN32 has putenv while everywhere else has
setenv...

Use lexical_cast<int>() instead of atoi.

In monname2month, would it have been so horrible to just have a 12-way
if (x == "Jan") ... else if (x == "Feb") ... etc.? :-)

In general, would it be better to use boost's time library instead of
the libc functions?

It's not clear to me what your varargs functions are being used for,
probably just because I haven't looked over the code closely enough;
can you explain a bit why they need to be varargs?

Generally, it'd be nice if there were a few more comments in
cvs_client.cc explaining what's going on, for anyone trying to find
their way around that has spent less time staring at the CVS protocol
docs (such as they are).  Again a question of, mainline code is public
code that needs to be publically maintainable...

This is especially important for any needed quirks/workarounds that
were only discovered through testing against real servers; e.g.,
there's a comment:
+// cater for encountered bugs ...
+// actually this code should not be necessary with the current options
+// without -C some revisions interacted with each other badly
Reading this, I have no idea what it means, except I'm scared to touch
the next code, because I have no idea how to even tell whether my
changes broke anything.

This file really needs a big comment at the top giving a rough
overview of how things work.  Similar to the ones in, e.g., paths.hh,
or netsync.cc, etc.

= cvs_repository.cc =

This file also needs at least one big huge comment explaining how it
all works.

More #if 0's.

+#ifdef WIN32
+#define sleep(x) _sleep(x)
+#endif
^^ Ick.  If this is a problem, put a sleep() in platform.hh.  New code
should, basically, never ever say "#ifdef WIN32".

+#define BACKWARD_COMPATIBLE
^^ huh?

+// since the piece methods in rcs_import depend on rcs_file I cannot reuse them
+// I rely on string handling reference counting (which is not that bad IIRC)
^^ Hrm, can they be refactored?

Don't use 'throw oops' in new code.  Use I/N/E, with P/L/MM to include
more information if necessary on I()'s...

+{ // I assume that at least TAB is uncommon in path names - even on Windows
+  std::string content=host+":"+root+"\t"+module+"\n";
^^ not quite sure what this means, but it makes me nervous.  Can you
elaborate?

+static void test_key_availability(app_state &app)
^^ this function looks misnamed -- shouldn't it be
"ensure_key_availability" or "require_key_availability" or such?

+ // because some manifests might have been absolute (not delta encoded)
+  // we possibly did not notice removes. check for them
^^ what is this about?  (whether a manifest is delta-encoded or not
should be completely hidden anywhere except database.cc, should it?)

+  if (global_sanity.debug) L(F("%s") % debug());
^^ huh?  why do you test global_sanity.debug here?

I think we now have _four_ implementations of "build a changeset given
manifests" -- in 'db rebuild', in cvs_import, in git_import, in
cvs_pull.  Surely we can do better than this...

Why is the stringtok code included verbatim in _both_ cvs_client.cc
and cvs_repository.cc?

= cvs_sync.hh ==

We can't have a public struct cvs_edge and a different private struct
cvs_edge in cvs_import.  Just... no.

Why are all of cvs_edge's fields marked 'mutable'?  There are hardly
_any_ good reasons to use 'mutable', certainly none that don't also
have explanatary comments...

+  mutable unsigned cm_delta_depth; // we store a full manifest every N 
revisions
+  static const unsigned cm_max_delta_depth=50;
^^ it seems like a very bad idea for cvssync to be making storage
decisions like this.  If this is a good idea, we should be doing it in
a global location (like database.cc); if it isn't a good idea, we
shouldn't be doing it here (and doing it here will make it harder to
implement any future good ideas).

= options.hh =

+#define OPT_SINCE 99 // use a custom number ...
^^ huh?

= netxx_pipe.cc =

Would it make more sense to add these into netxx itself?  (E.g., make
Probe itself handle pipes?)  Normally modifying included libraries is
a bit distasteful, but netxx is orphaned upstream and we already have
some local enhancements included.

We already require a unix shell environment on windows for building
and testing, so your test can assume the existence of "ls" there.

Perhaps it would be better to use 'cat' (with no args) as the program
spawned for the test, then you could test writing to stdin, reading
from stdout, probe blocking until a timeout when there is no output,
maybe others?

(And maybe also "echo", to test command line passing to the spawned
program?)

Your command-line quoting for win32 is wrong (as your comment notes).
win32/process.cc already has correct win32 quoting code, duplicating
this finicky task is quite silly, we should find a way not to do that.

= Overall =

This is a big chunk of code, something like 10% of the total size of
monotone, and it attempts a very complex and error-prone task, that if
it goes wrong can corrupt people's production CVS repos and confuse
users utterly.  Merging it to mainline means the development team as a
whole assuming a responsibility to support and maintain it, so this is
a big risk to take.

That doesn't mean we don't want to merge it; it provides some really
fantastic and powerful functionality, that'd I'd love to see included.
But it does mean that to be comfortable with that risk, we need to
reduce it as much as possible, if that makes sense.

The main things I see as helping this:
  1) basic tree cleanup.  we all live and work in the mainline tree,
     can't have clutter about the place.
  2) basic code cleanup.  readability and consistency are easy, and
     important.
  3) understandable code -- this mostly means enough commentary and
     explanation in the code that we can figure out roughly what's
     going on.
  4) really good docs, clearly laying out the use cases, relation
     between cvs_import and cvssync, limitations of cvssync, big fat
     EXPERIMENTAL: MAY EAT HISTORY warnings all over the place until
     well battle-tested, etc.
  5) a good set of test cases, covering (at least) all the major
     possibilities
Not all of this needs to happen before it gets into mainline for
early-adopter testing.  (1) and (2) are necessities.  (4) doesn't have
to be perfect, but I think needs a bit more work (I know I _will_ have
to field support questions on this code, even if marked EXPERIMENTAL
and with you doing all actual hacking...).  (3) and (5) are not as
critical so long as the code is marked experimental, but do need to be
fixed sooner or later (preferably sooner, to make the code more
accessible to others wanting to improve it).  All code in mainline
needs to either fulfill (3) and (5) eventually or be removed, as a
rule of thumb, and until cvssync meets this I personally wouldn't feel
confident in committing to support it in the long term.

Overall, while I've been focusing a lot on negative things, I also
want to say thanks a lot for working on this!  It's clearly a huge
investment of effort, and your dedication is really impressive, and
appreciated.  Now let's see if we can figure out how to get it over
this hump...

Cheers,
-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephan Spender




reply via email to

[Prev in Thread] Current Thread [Next in Thread]