bug-coreutils

bug#18681: cp Specific fail example


From: Bob Proulx
Subject: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 17:53:31 -0600
User-agent: Mutt/1.5.23 (2014-03-12)

Linda Walsh wrote:
> Bob Proulx wrote:
> > Also consider that if cp were to acquire all of the enhancements
> > that have been requested for cp as time has gone by then cp would
> > be just as featureful (bloated!) as rsync and likely just as slow
> > as rsync too.
>
>       Nope...rsync is slow because it does everything over a client
> server model --- even when it is local.  So everything is written through
> a pipe .. that's why it can't come close to cp -- and why cp would never
> be so slow -- I can't imagine it using a pipe to copy a file anywhere!

The client-server structure of rsync is required for copying between
systems.  Saying that cp doesn't have it isn't fair if cp were to add
every requested feature.  I am sure that if I search the archives I
would find a request to add client-server structure to cp to support
copying from system to system. :-)

Now I will proactively agree that it would be nice if rsync detected
that it was all running locally and didn't fork and instead ran
everything in one process like cp does.  But I could see that coming
to rsync at some time in the future.  It is an often requested
feature.

> > This is something to consider every time someone asks for a
> > creeping feature to cp.  Especially if they say they want the feature
> > in cp because it is faster than rsync.  The natural progression is
> > that cp would become rsync.
>
>       Not even!  Note.  cp already has a comparison function
> built in that it uses during "cp -u"...

I am not convinced of the robustness of 'cp -u ...' under interrupt,
repeat, interrupt, repeat.  It wasn't intended for that mode.  I am
suspicious.  Is there any code path that could leave a new file in the
target area that a later run would then skip copying?  Not sure.  A
newer file in the target passes the -u test even when it isn't an
exact copy of an original with an older time stamp.  But with rsync I
know it will correct for this during a subsequent run.

> built in that it uses during "cp -u"... but it doesn't go through
> pipes.  It used to use larger buffer sizes or maybe tell posix
> to pre-alloc the destination space, dunno, but it used to be
> faster.. I can't say for certain, but it seems to be using

Often the data sizes we work with grow larger over time making the
same task feel slower because we are actually dealing with more data
now.  Files include audio.  Files include video.  Standard def becomes
high def.  "Difficult to see.  Always in motion is the future."

> smaller buffer sizes.  Another reason rsync is so slow -- uses
> a relatively small i/o size 1-4k last I looked. I've asked them
> to increase it, but going through a pipe it won't help alot.

Nod.  Rsync was designed for the network use case.  It could benefit
from some tuning for the local case.  A topic for the rsync list.

> Also in rsync, they've added the posix calls to reserve
> space in the target location for a file being copied in.
> Specifically, this is to lower disk fragmentation (does
> cp do anything like that, been a while since I looked).

I don't know.  It would be worth a look.

> > The advantage of rsync is that it can be interrupted and restarted and
> > the restarted process will efficiently avoid doing work that is
> > already done.  An interrupted and restarted cp will perform the same
> > work again from start to finish.
>
>       I wouldn't trust that it would.  If you interrupt it at exactly
> the wrong time, I'd be afraid some file might get set with the right
> data but the wrong Meta info (acls, primarily).

The design of rsync is to copy the file to a temporary name beside the
intended target.  After the copy the timestamps are set.  After the
timestamps are set the file is renamed into place.  An interrupt
that happens before that rename time will cause the temporary file to
be removed.  An interrupt that happens after the rename is, well,
after that and the copy is already done.  Since rename on the local
file system is atomic this is guaranteed to function robustly.  (As
long as you aren't using a buggy file system that changes the order of
operations.  That isn't cool.  But of course it was famously seen in
ext4 for a while.  Fortunately sanity has prevailed and ext4 doesn't
do that for this operation anymore.  Okay to use now.)
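
The sequence described above (copy to a temporary name, set the
timestamps, rename into place) can be sketched in shell.  This is only
an illustration of the pattern, not rsync's actual code.  The function
name atomic_copy and the temporary-file template are made up for this
sketch, mktemp is assumed to be available (it is not part of POSIX),
and removing the temporary file when a signal arrives is left out.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST through a temporary file.
atomic_copy() {
    src=$1
    dst=$2
    # Create the temporary file beside the target so that the final
    # rename stays within one file system.
    tmp=$(mktemp "$(dirname -- "$dst")/.atomic_copy.XXXXXX") || return 1
    # Copy data and metadata (-p keeps mode and timestamps) to the
    # temporary name, then rename it over the target.
    if cp -p -- "$src" "$tmp" && mv -f -- "$tmp" "$dst"; then
        return 0
    fi
    # On failure remove the partial temporary file; DST is untouched.
    rm -f -- "$tmp"
    return 1
}
```

With this ordering the target name only ever refers to the old file or
to a complete new one with its timestamps already set.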

> > If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
> > am doing it the second time then I will use rsync to avoid repeating
> > previously done work 'rsync -av A B'.
>
>       Wouldn't cp -auv A B do the same?

Do I have to go look at the source code to verify that it doesn't? :-(

I assume it doesn't without looking.  I assume cp copies in place.  I
assume that cp does not make a temporary file off to the side and
rename it into place once it is done and has set the timestamps.  I
assume that cp copies to the named destination directly and updates
the timestamps afterward.  That creates a window of time when the file
is in place but has not had the timestamp placed on it yet.

This means that if cp is interrupted on a large file, it will have
started the copy but will not have finished it at the moment that
it is interrupted.  The new file will be in place with a new
timestamp.  The second run with cp -u will avoid overwriting the file
because the timestamp is newer.  However the contents of the file will
be incomplete, or at least not matching the source copy at the time of
the second copy.
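
The end state described above can be reproduced by hand.  The sketch
below does not interrupt cp and so says nothing about whether a real
interrupted cp leaves this state; it only builds a partial destination
with a newer timestamp and then runs 'cp -u' against it.  It assumes
GNU cp (for -u) and mktemp; the function name cp_u_gap_demo and the
file names are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical demo: does "cp -u" repair a partial, newer destination?
cp_u_gap_demo() {
    dir=$(mktemp -d) || return 1
    printf 'complete contents\n' > "$dir/src"
    # Give the source an old modification time (2014-01-01 00:00).
    touch -t 201401010000 "$dir/src"
    # Simulate the leftover of an interrupted copy: partial data,
    # modification time "now".
    printf 'comp' > "$dir/dst"
    # Second run with -u: the destination is newer than the source.
    cp -u -- "$dir/src" "$dir/dst"
    if cmp -s "$dir/src" "$dir/dst"; then
        echo "destination repaired"
    else
        echo "destination still partial"
    fi
    rm -rf -- "$dir"
}
```

With GNU cp the -u test compares modification times only, so this is
expected to report "destination still partial".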

If my assumptions in the above are wrong please correct me.  I will
learn something.  But the operating model would need to be the same
across all systems covered by POSIX before I would consider it
actually safe to use.

> > If I want progress indication...  If I want placement of backup files
> > in a particular directory...  If I want other fancy features that are
> > provided by rsync then it is worth it to use rsync.
> > ...trimmed simple benchmark...
> >  $ time cp -a coreutils junk/
>
> By default cp -a transfers acls and ext-attrs and preserves
> hard links.   Rsync doesn't do any of that by default.
> You need to  use "-aHAX" to compare them ...

Good catch.  :-)

> you have to call them
> out as 'extra' with rsync, so the above test may not be what it seems.
> Though if you don't use ACL's (which I do), then maybe the above
> is almost reasonable.  Still.. should use -aHAX

I didn't have any hard links, ACLs, or extended attributes in the test
case so it shouldn't matter for the above.
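
Whether -H would make a difference for a given tree can be checked
before benchmarking.  A minimal sketch, using only POSIX find; the
function name has_hard_links is made up here, and it looks at hard
links only, not at ACLs or extended attributes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any regular file under directory $1
# has a link count greater than one.
has_hard_links() {
    [ -n "$(find "$1" -type f -links +1 | head -n 1)" ]
}
```

For example, 'has_hard_links coreutils && echo "compare with -H"'.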

> Is your rsync newer? i.e. does it have the posix-pre-alloc
> hints?... Mine has a pre-alloc patch, but I think that was
> suse-added and not the one in the mainline code.  Not sure.
> 
> rsync --version
> rsync  version 3.1.0  protocol version 31
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I happened to run that test on Debian Sid and it is 3.1.1.  However
Debian Stable, which I have most widely deployed, has 3.0.9.  So you
are both ahead of and behind me at the same time. :-)

>       Throw a few TB copies at rsync -- where all the data
> won't fit in memory.... it also, I'm told, has problems with
> hardlinks, acls and xattrs slowing it down, so it may be a
> matter of usage...

I have had problems running rsync with -H for large data sets.  Bad
enough that I recommend against it.  Don't do it!  I don't know
anything about -A and -X.  But rsync -a is fine for very large data
sets.

>       BUT all that said... note that I DO USE it... for the
> job I'm doing in my snapper script, nothing else will.

Yes.  It is too useful to be without!

> (don't ya just love performance talk?)

Except that we should have moved all of this to the discussion list.
I feel guilty to have continued it.  We have drifted well away from
the original bug report.  The one with the terrible title.  If this
continues let's take it over to the coreutils discussion list for
further conversation about it.

Bob




