Re: invalid change text again

From: Larry Jones
Subject: Re: invalid change text again
Date: Tue, 27 Feb 2001 00:36:25 -0500 (EST)

Benjamin Dodge writes:
> I've experienced a couple of corrupted RCS files resulting in 'invalid
> change text' errors. I've looked around the web and in this archive,
> but I've not seen good explanations of why it happens, so I did some
> testing and investigation. Here's what I've found.
> A. No access method is safe.
>   1. People like to blame this on locking over NFS. I've found that
>   corruption with NFS is worse than other kinds of corruption. However,
>   I don't think NFS needs to be labeled unsafe (more on this later). The
>   basic failure happens when two or more users attempt to change
>   a file: both think they have exclusive write access because of the
>   flawed locking mechanism, and the file ends up with invalid data.

The NFS problem is *not* a locking problem, but actual interoperability
problems between different vendors' implementations of NFS.  The typical
symptom is blocks of NULs in the file in lieu of real data which never
made it from the NFS client to the NFS server.  I've never heard of it
happening when the NFS client and server are the same kind of system,
only when they are different.

>   2. rsh (ext) and pserver are actually not much better than NFS. Two
>   users can STILL get a lock on an RCS file. I have a test script that
>   can recreate this by having one machine constantly commit a file while
>   other machines try to tag the latest rev of the file (they update
>   before tagging). Both operations (commit and tag) require exclusive
>   write access to the RCS file. In this case, only one server is
>   modifying the file, so the corruption seems to just drop a revision
>   or two. That's not too bad.

This is a long-standing bug in tag -- some metadata was cached before
the file was locked which could cause stale data to be written to the
file after it was locked.  This has been fixed in the current
development version of CVS.

> B. The problem seems to be in the cvs code that does file locking.
>   1. cvs uses 'SIG_beginCrSect' and 'SIG_endCrSect' to make lock files.
>   These functions turn off sensitivity to signals (INT, TERM, ...)
>   between the begin and end calls. This does not work. The basic
>   algorithm is:
>   SIG_beginCrSect()
>   create_some_dir_or_file
>   SIG_endCrSect()
>   However, the OS can still context-switch the process out, especially
>   during file I/O operations. Being in this sort of critical section
>   does not seem to constitute true mutual exclusion.

It isn't intended to.  SIG_beginCrSect and SIG_endCrSect are used to
prevent a race condition where a caught signal could cause corruption
*in a single process*.  All interprocess synchronization is done by lock
files (actually directories).

>   fcntl seems to be a better way to do locks. Why is it not used? It is NFS
>   safe (when lockd is running on the clients).

Not all systems support fcntl locking.  Not all systems that do support
it support it across NFS.  Not all systems that support it across NFS
work correctly.

> Am I wrong? Is it possible to run CVS in client/server without ever getting
> RCS file corruption?

Given the locking bug in tag, it isn't even possible to run CVS in local
mode without *ever* getting RCS file corruption (although that bug has
been in the code for a couple of years at least and no one has ever
reported seeing corruption because of it in practice).  I believe the
current development version is safe in both local and all client/server
modes; but I wouldn't run it on an NFS-mounted repository unless the
clients and server were the same kind of system or I had tested it
extensively (and even then I'd be nervous).

-Larry Jones
