Re: renaming under CVS

From: Paul Sander
Subject: Re: renaming under CVS
Date: Sun, 10 Mar 2002 02:29:15 -0800

>--- Forwarded mail from address@hidden

>--- Paul Sander <address@hidden> wrote:
>> Keep in mind that "permissions" include not only the file's mode,
>> but also its user and group ownerships, plus whatever additional
>> mechanisms (e.g. ACLs) are provided by the operating system.  As
>> files propagate around the repository, the permissions will have
>> to behave in ways that the user expects but that are not
>> necessarily easy to implement.

>Permissions need to work correctly in order to satisfy
>their primary goal, security.

>I'll have to check whether mv'ing the archive file
>into the new location will follow the SGID bit setting
>of the new directory.  If not, ln'ing or cp'ing the
>archive file to its new location, then rm'ing the old
>one will have to do.
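For what it's worth: on POSIX systems rename(2) and link(2) both keep the existing inode, so neither one picks up the new directory's group.  Only a copy creates a new file, and a new file takes the directory's group when the directory's SGID bit is set (Linux and BSD-derived systems).  So of the three options, cp-then-rm is the one that follows SGID.  A minimal sketch of that route, assuming source and destination may be on different filesystems; move_archive is a hypothetical helper, not existing CVS code:

```python
import os
import shutil
import stat
import tempfile


def move_archive(src: str, dst_dir: str) -> str:
    """Re-create an archive inside dst_dir, then remove the original.

    rename() and link() would reuse the old inode and keep its group.
    Creating a new file in dst_dir lets the kernel assign dst_dir's group
    when that directory has its SGID bit set.  (Hypothetical helper.)
    """
    dst = os.path.join(dst_dir, os.path.basename(src))
    # Copy into a temporary name in the destination directory first, so a
    # crash never leaves a half-written archive under the real name.
    fd, tmp = tempfile.mkstemp(prefix=",", dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        # Keep the original archive's permission bits (RCS files are
        # usually read-only).
        os.chmod(tmp, stat.S_IMODE(os.stat(src).st_mode))
        # Same directory, hence same filesystem: this rename is atomic.
        os.rename(tmp, dst)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    os.unlink(src)
    return dst
```

The copy goes to a temporary name first so that the real archive name only ever refers to a complete file.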

>> >Assuming that a file is moved from one directory to
>> >another with a different group, and the SGID bit is
>> >set on the repo directory, the archive file will be
>> >moved adopting the new group.  An update to the
>> >repo-wide archive location mapping will also be
>> >necessary.
>> How do you expect this to affect retrieval by tag or datestamp?
>> The person performing that task may not be a member of the new
>> group.

>Then they shouldn't be allowed to access the file
>(unless other permissions are set).  If this isn't the
>desired behaviour, the cvs admin will need to change
>permissions within the repo.

Hmmm...  I have two problems with this.  First, renaming a file
should not break the ability to check out prior releases by tag
or branch/datestamp pair.  Second, renaming a file on one branch
should not break another user's ability to work on the same file
on a different branch.  It's hard to say how much of a problem this
would be in practice, though.

On the other hand, once an RCS file is created, it need never
move around the repository; only the mapping between RCS files and
sandbox copies needs to change.  This opens up another avenue for
access control, which is to dedicate directories within the repository
to each group (in the access control sense).  Changing group ownership
of a file could then equate to moving the RCS file to a new directory
with different access controls.
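To make the idea concrete, here is a minimal sketch of a repository-wide location map in which each access-control group owns one directory.  The class name, the group names, and the paths are hypothetical.  Changing an archive's group changes only the map entry and tells the caller which file move to perform; sandbox names never change because they refer to the archive name, not its path.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical repo-wide archive location map, one directory per group."""
    group_dirs: dict[str, str]                                # group -> directory
    location: dict[str, str] = field(default_factory=dict)   # archive -> group

    def add_archive(self, archive: str, group: str) -> None:
        if group not in self.group_dirs:
            raise KeyError(f"unknown group {group!r}")
        self.location[archive] = group

    def path_of(self, archive: str) -> str:
        return f"{self.group_dirs[self.location[archive]]}/{archive}"

    def change_group(self, archive: str, new_group: str) -> tuple[str, str]:
        """Re-home an archive under another group's directory.

        Returns (old_path, new_path) so the caller can move the RCS file.
        The archive name, and thus every sandbox mapping, stays the same.
        """
        if new_group not in self.group_dirs:
            raise KeyError(f"unknown group {new_group!r}")
        old = self.path_of(archive)
        self.location[archive] = new_group
        return old, self.path_of(archive)
```

Access control then comes entirely from the ordinary permissions on each group directory.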

>> What about this case?
>> mod-a a
>> mod-b a/b
>> Anyway, disallowing this type of sharing and that in the example
>> above is not acceptable to me.  Code reuse by sharing source code
>> isn't going away any time soon (no matter that the buildmeisters
>> want it to), and the version control system can't get in the way
>> of that.

>At first, I was thinking that all the archive files
>will reside within one directory in the repo.  This is
>not viable if we want to have permission
>responsibility owned by repo directories.  I'll have
>to rethink this (and other "complex" module
>definitions) in light of the fact that the archive
>files will need to reside within their repo

I think that RCS files can indeed live in one directory, though
we could consider using different directories strictly as an
access control mechanism.  The type of sharing in this example
can easily be done if directories are described by text files
that map RCS file names to sandbox files.
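A minimal sketch of such a text file, assuming a hypothetical format of one "sandbox-name archive-name" pair per line (so sandbox names cannot contain whitespace).  Sharing falls out naturally: the same archive simply appears in more than one directory's file.

```python
def parse_manifest(text: str) -> dict[str, str]:
    """Parse a hypothetical directory file: 'sandbox-name archive' per line.

    Blank lines and lines starting with '#' are ignored.
    """
    entries = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(parts)}")
        name, archive = parts
        if name in entries:
            raise ValueError(f"line {lineno}: duplicate name {name!r}")
        entries[name] = archive
    return entries


def locations(archive: str, dirs: dict[str, dict[str, str]]) -> list[str]:
    """Every sandbox path at which an archive appears, across all directories."""
    return sorted(
        f"{d}/{name}"
        for d, entries in dirs.items()
        for name, a in entries.items()
        if a == archive
    )
```

For example, if proj-a/src and proj-b/lib both list "util.c 01ef,v", then locations("01ef,v", ...) reports both sandbox paths for the one RCS file.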

>> Also note that retrieval by tag or datestamp is required to retain
>> the old shape of the tree.  Making that work means that a single
>> RCS file maps to multiple locations in workspaces anyway, so the
>> kind of sharing that I demand should come for free.

>The new scheme I'm thinking of should take care of

>> My question had been: if a project reaches the end of its life,
>> should a user be permitted to delete its definition from the top
>> level?  Re-thinking this, I don't think it matters.  It's still
>> retrievable by tag or datestamp, and the effect is no different
>> from moving the project into a new directory and removing it from
>> there.  The question remaining is how to query the system for
>> existing modules, because the ones contained at the top level of
>> the repository are really just a subset of the total possible.
>> There's also the issue of ambiguity of names, because a project
>> might be removed and a new one created with the same name.

>It sounds like a job for "cvs ls".  I'll leave this
>for another time.
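One way a "cvs ls" could deal with reused names is to give every module a unique id that is never reused and to record when each name was bound to it.  A minimal sketch, assuming hypothetical record fields and integer timestamps; a name is then only ambiguous without a date.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModuleRecord:
    module_id: str            # hypothetical unique id, never reused
    name: str
    created: int              # e.g. seconds since the epoch
    removed: Optional[int]    # None while the module still exists


def _live(r: ModuleRecord, when: int) -> bool:
    return r.created <= when and (r.removed is None or when < r.removed)


def resolve(records: list[ModuleRecord], name: str, when: int) -> Optional[ModuleRecord]:
    """The module that carried this name at time `when`, if any."""
    for r in records:
        if r.name == name and _live(r, when):
            return r
    return None


def live_modules(records: list[ModuleRecord], when: int) -> list[str]:
    """What a 'cvs ls' at time `when` might list."""
    return sorted(r.name for r in records if _live(r, when))
```

A checkout by datestamp would call resolve() with that date, so an old "widget" and a newer "widget" never collide.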

>> Let's say a user does a "cvs add f; cvs commit f".  Obviously, this
>> commits the initial contents of f to the repository.  But does it
>> also commit the addition of f to the parent directory, or is a
>> separate "cvs commit ." needed?  If the latter, what happens to the
>> RCS file if the sandbox is released before the directory is
>> committed?
>> Now suppose that the "cvs add f" is really resurrecting a file, or
>> linking (sharing) a file from another part of the repository.  This
>> introduces a condition where the contents of the file are already
>> up to date, but the contents of the directory have been modified.
>> What should "cvs commit f" do, and is a separate "cvs commit ."
>> required?

>These are good points.  I'll have to mull them over.

>There's another problem I'm thinking of:  What happens
>if developer A modifies a file, developer B checks in
>a move of that file, then developer A updates just the
>directory of that file (rather than the entire
>sandbox)?  My first inclination would be to say, "This
>is exactly the same situation as when developer B
>checks in a removal of that file".  But then, what if
>developer A updates the entire sandbox?  In order for
>this to behave properly, something in CVS must know
>in which directory the update started.

Yeah, that's an interesting problem.  But CVS knows where
the update started.  It's either the present working directory
or given on the command line when the main() function is
invoked.  I would hope that a new implementation would not
chdir around, so it would be easy to carry that info around.
(I have found that it's rarely a good idea to call chdir within a
program, unless there's a very good reason to do so or it's isolated
in a child process, specifically because it invalidates relative
paths passed into the program.)
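A minimal sketch of carrying the starting point around instead of calling chdir, assuming a hypothetical UpdateScope class.  It resolves the start directory once and then decides how someone else's committed move looks from this update: a move if both ends are inside the scope, and just a removal (the case developer A describes) if only the old location is.

```python
import os


class UpdateScope:
    """Remember where an update began, without ever calling chdir().

    `start` is the working directory at startup or the directory named on
    the command line.  (Hypothetical class for illustration.)
    """

    def __init__(self, start: str = "."):
        # Resolve once, at startup, while relative paths still mean what
        # the user intended.
        self.root = os.path.abspath(start)

    def resolve(self, path: str) -> str:
        return os.path.normpath(os.path.join(self.root, path))

    def contains(self, path: str) -> bool:
        # commonpath compares whole components, so /proj does not
        # accidentally contain /project.
        return os.path.commonpath([self.root, self.resolve(path)]) == self.root

    def classify_move(self, old: str, new: str) -> str:
        """How a move committed by another developer looks from here."""
        if self.contains(old) and self.contains(new):
            return "move"       # both ends visible: relocate the local copy
        if self.contains(old):
            return "removal"    # same as a committed "cvs remove"
        if self.contains(new):
            return "addition"
        return "ignore"
```

Updating only src/ sees a move to ../lib as a removal; updating the whole sandbox sees the same commit as a move.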

>> >I was thinking that the users' sandboxes would have
>> >the filename mapping within their CVS/Entries files.
>> >For example, let's say file is mapped to
>> >01ef,v (I'll use four-nibble archive names to save on
>> >bandwidth).  CVS/Entries will store this information.
>> >The client will look up the archive name from
>> >CVS/Entries and will use only that name when
>> >communicating with the server.
>> >The server will then look up the location of that
>> >archive within the repo using the repo-wide archive
>> >location mapping.  Everything else should work as it
>> >does now, more or less.
>> That will probably work, and it would seem to solve the evil
>> twin problem well enough.
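A minimal sketch of both lookups.  Today's CVS/Entries file lines have the form /name/revision/timestamp/options/tagdate; the trailing archive field here is the proposal under discussion and purely hypothetical, as are the function names and repository paths.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    revision: str
    timestamp: str
    options: str
    tagdate: str
    archive: str        # hypothetical extra field, e.g. "01ef,v"


def parse_entries(text: str) -> dict[str, Entry]:
    """Client side: read file lines of an Entries file with an archive field.

    Directory lines (starting with 'D') are skipped in this sketch.
    """
    entries = {}
    for line in text.splitlines():
        if not line.startswith("/"):
            continue
        fields = line[1:].split("/")
        if len(fields) != 6:
            raise ValueError(f"malformed entry: {line!r}")
        entry = Entry(*fields)
        entries[entry.name] = entry
    return entries


def server_path(archive: str, archive_locations: dict[str, str]) -> str:
    """Server side: the repo-wide map from archive name to repo directory."""
    return f"{archive_locations[archive]}/{archive}"
```

The client sends only entry.archive; the sandbox name never has to reach the server.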

>Let's add one more mapping in the above.  The
>directory repo will need to remap the archive name
>back to the actual filename.  This would take care of
>renames within the same directory.

Suppose you keep an RCS file in the repository whose versions
represent the contents of a sandbox directory.  It maps the RCS
files to the names of the sandbox copies of each file, plus
whatever child directories are attached (and perhaps a reference
to the parent directory).  That RCS file is tagged and branched
just like any source file, but manifests itself as the contents
of the Entries file rather than a source file.
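A minimal in-memory sketch of that per-directory RCS file, assuming each revision is simply a dictionary from sandbox names to archive names (a real version would store these as RCS deltas).  A rename commits a new directory revision and leaves the archive alone, so a checkout by an older tag still produces the old name.

```python
from typing import Optional


class DirectoryHistory:
    """Stand-in for a directory's RCS file: each revision is a manifest
    mapping sandbox names to archive names.  (Hypothetical model.)
    """

    def __init__(self, manifest: dict[str, str]):
        self.revisions = [dict(manifest)]
        self.tags: dict[str, int] = {}

    def tag(self, name: str) -> None:
        self.tags[name] = len(self.revisions) - 1

    def rename(self, old: str, new: str) -> None:
        """Commit a rename: only the manifest changes, not the archive."""
        manifest = dict(self.revisions[-1])
        if new in manifest:
            raise ValueError(f"{new!r} already exists")
        manifest[new] = manifest.pop(old)
        self.revisions.append(manifest)

    def checkout(self, tag: Optional[str] = None) -> dict[str, str]:
        """Contents of the Entries file for the tagged (or head) revision."""
        rev = self.revisions[self.tags[tag]] if tag is not None else self.revisions[-1]
        return dict(rev)
```

Because the directory is tagged and branched like any other file, old releases keep their old shape with no extra bookkeeping.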

>> On a single-user basis, this is true.  But granting exclusive
>> locks on the entire repository to a single user at any given time
>> will reduce throughput a lot.  Some kind of concurrent locking
>> is necessary, but for demonstration purposes (i.e. the first hack)
>> a repository-level lock should be sufficient.

>I wasn't going to abandon reader-writer locks.  But
>doing this introduces the problem of having a writer
>block forever.  I suppose we could add better
>heuristics to the locking mechanism (e.g. if the writer
>has let three readers cut in already, let him
>through).  Like you said, though, a normal
>reader-writer lock mechanism is good enough for now.
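The "let three readers cut in, then let the writer through" heuristic is easy to express with a condition variable.  A minimal sketch, assuming threads within one server process; the class name is hypothetical and the limit of three is taken from the text above.

```python
import threading


class BoundedCutInRWLock:
    """Reader-writer lock in which a waiting writer lets at most
    `max_cut_ins` new readers in ahead of it; after that, new readers
    block until the writer has had its turn.  (Illustrative sketch.)
    """

    def __init__(self, max_cut_ins: int = 3):
        self._cond = threading.Condition()
        self._readers = 0           # readers currently holding the lock
        self._writer = False        # a writer currently holds the lock
        self._waiting_writers = 0
        self._cut_ins = 0           # readers admitted while a writer waited
        self._max_cut_ins = max_cut_ins

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer or (
                self._waiting_writers and self._cut_ins >= self._max_cut_ins
            ):
                self._cond.wait()
            if self._waiting_writers:
                self._cut_ins += 1
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self) -> None:
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True
            self._cut_ins = 0             # budget resets once a writer gets in

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Readers still share the lock freely when no writer is waiting; the cut-in counter only matters while a writer is queued, which bounds how long it can be starved.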

>> Yeah, that's why I propose a system in which checkouts are done
>> after the versions are identified (either by tag, version number,
>> or branch/timestamp pair).  That precludes the need for read locks
>> altogether.

>I'll have to think about this a bit more.
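A minimal sketch of "identify the versions first, then fetch", assuming that committed revisions are write-once.  Only the short step that pins a tag to a list of (archive, revision) pairs, and the commit that publishes new revisions, take a lock; the fetches that follow take none.  The class and method names are hypothetical, and the in-memory dictionaries stand in for RCS files.

```python
import threading
from typing import Optional


class SnapshotStore:
    """Illustrates lock-free fetching of already-pinned revisions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tags: dict[str, dict[str, str]] = {}           # tag -> {archive: rev}
        self._contents: dict[tuple[str, str], bytes] = {}    # (archive, rev) -> text

    def commit(self, archive: str, rev: str, data: bytes,
               tag: Optional[str] = None) -> None:
        with self._lock:
            self._contents[(archive, rev)] = data
            if tag is not None:
                self._tags.setdefault(tag, {})[archive] = rev

    def resolve(self, tag: str) -> dict[str, str]:
        # Brief critical section: copy the version list, nothing more.
        with self._lock:
            return dict(self._tags[tag])

    def fetch(self, archive: str, rev: str) -> bytes:
        # No lock: an (archive, rev) pair, once committed, never changes.
        return self._contents[(archive, rev)]

    def checkout(self, tag: str) -> dict[str, bytes]:
        pinned = self.resolve(tag)
        return {a: self.fetch(a, r) for a, r in pinned.items()}
```

On disk the analogous guarantee would come from writers replacing an archive by renaming a new file over it, so a reader that already has the old file open keeps seeing a consistent copy.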

>> >> Fair enough.  The hard link thing is a workaround for RCS' lack
>> >> of a two-phase commit hook.  The hard links can be eliminated by
>> >> adding to RCS the ability to leave the ,*, file in place and
>> >> rename it back to the *,v file at a later time using the
>> >> existing RCS mechanism.
>> >> Would that be satisfactory?
>> >I'm not sure if "mv" is supposed to be atomic, but we
>> >can discuss this on another thread.
>> I'm inclined to think that "mv" should be (it just updates a
>> sandbox, after all), but that the commits that follow make the
>> entire rename process non-atomic.

>I think my statement was taken out of context.  We
>were talking about using the OS "mv" command to create

Ah, I understand now.  I'm inclined to swipe, er, reuse the
code that's in RCS.  Paul Eggert and company have spent a lot
of time minimizing race conditions on lock files.
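For illustration, the "leave the ,*, file and rename it later" idea as a two-phase commit, modeled loosely on the RCS convention that ,foo.c, is both the lock and the new archive text for foo.c,v.  This is a minimal sketch with hypothetical function names, not RCS's code.  Phase 1 creates every lock file with O_CREAT|O_EXCL, so taking a lock is atomic; phase 2 renames each one over its archive.  Each rename is atomic, but the set of renames as a whole is not.

```python
import os


def lock_name(archive_path: str) -> str:
    """foo.c,v -> ,foo.c, in the same directory."""
    d, base = os.path.split(archive_path)
    if not base.endswith(",v"):
        raise ValueError(f"not an archive: {archive_path}")
    return os.path.join(d, "," + base[:-2] + ",")


def prepare(updates: dict[str, bytes]) -> list[str]:
    """Phase 1: take every lock and write the new archive text into it.

    If any lock is already held (or a write fails), release the locks
    taken so far and re-raise.
    """
    taken: list[str] = []
    try:
        for path, data in updates.items():
            lock = lock_name(path)
            # O_EXCL fails with FileExistsError if ,name, already exists.
            fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
            taken.append(lock)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
    except BaseException:
        abort(taken)
        raise
    return taken


def commit(updates: dict[str, bytes]) -> None:
    """Phase 2: rename each lock file over its archive."""
    for path in updates:
        os.rename(lock_name(path), path)


def abort(locks: list[str]) -> None:
    """Discard prepared lock files, leaving the archives untouched."""
    for lock in locks:
        try:
            os.unlink(lock)
        except FileNotFoundError:
            pass
```

A crash between the phases leaves only lock files behind, which is exactly what a crash recovery procedure would have to find and clean up.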

Alternatively, we could build an in-memory locking system in
the CVS server and discontinue supporting local mode.  This
would be faster than a filesystem-based approach, and it would
not require a crash recovery procedure to clean up abandoned

>--- End of forwarded message from address@hidden
