[coreutils] RE: cp command performance

From: Hemant Rumde
Subject: [coreutils] RE: cp command performance
Date: Thu, 23 Dec 2010 09:34:52 -0500

Hi Bob

Thanks for your quick response; I really appreciate it!
We are using HP storage. I guess our infrastructure is OK.

Let's discuss "cp A1 A1.bk". Correct me if I am wrong:
in this cp, the OS needs to read all of A1.bk's data blocks from
storage into the cache in order to overwrite them with A1's blocks.
I guess some time would be spent on this.

However, if A1.bk is new, then it would take free data
blocks from the superblock. I guess this should be faster.
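Whether a method writes into the existing A1.bk or creates a new file can be checked by comparing inode numbers. A minimal sketch, assuming Linux with GNU coreutils (for `cp --remove-destination`); the functions `inode_of` and `demo_methods` and the tiny temporary files are hypothetical stand-ins for the real 60GB data files.

```shell
# Print the inode number of a file by parsing `ls -i` output.
inode_of() {
  ls -i -- "$1" | awk '{print $1}'
}

# Hypothetical demo: does each method keep or replace the backup file?
demo_methods() {
  local dir before after
  dir=$(mktemp -d)
  printf 'new data\n' > "$dir/A1"
  printf 'old backup\n' > "$dir/A1.bk"

  # Method-1: cp opens the existing A1.bk and rewrites it in place.
  before=$(inode_of "$dir/A1.bk")
  cp -- "$dir/A1" "$dir/A1.bk"
  after=$(inode_of "$dir/A1.bk")
  if [ "$before" = "$after" ]; then
    echo "method1: same inode"
  else
    echo "method1: new inode"
  fi

  # Method-3 (GNU cp): the old A1.bk is unlinked and a new file is created,
  # as with Method-2 (rm -f, then cp). The filesystem may reuse the freed
  # inode number, so only the contents are checked here.
  cp --remove-destination -- "$dir/A1" "$dir/A1.bk"
  if cmp -s "$dir/A1" "$dir/A1.bk"; then
    echo "method3: contents match"
  fi

  rm -rf -- "$dir"
}
```

Running `demo_methods` should print `method1: same inode` followed by `method3: contents match`, i.e. Method-1 reuses the existing file while Methods 2 and 3 start from a fresh one.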

Apart from this, cache read/write hits can make some difference
in performance. When you use dd, I guess most of your data
would be in the buffer cache, so the read-hit rate would be higher
and very few calls would go to the backend storage.
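One way to test the buffer-cache theory is to time a copy after dropping the Linux page cache. A minimal sketch, assuming Linux and root access; `time_copy` is a hypothetical helper, and when the cache cannot be dropped it reports `cache=unknown`, because the timing may then include cache hits.

```shell
# Hypothetical helper: time one copy, first trying to empty the page cache.
time_copy() {
  local src="$1" dst="$2" cache start end
  sync  # write out dirty pages so they do not skew the measurement
  # Writing 3 to drop_caches frees clean page cache plus dentries and inodes.
  # It needs root and a writable /proc/sys, so failure is tolerated here.
  if { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
    cache=cold
  else
    cache=unknown
  fi
  start=$(date +%s)
  cp -- "$src" "$dst"
  end=$(date +%s)
  echo "cache=$cache elapsed=$((end - start))s"
}
```

Comparing this with a plain `time cp` run made right after another copy of the same file gives a rough idea of how much the cache helps.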

Does this make any sense?


-----Original Message-----
From: Bob Proulx [mailto:address@hidden 
Sent: Wednesday, December 22, 2010 9:17 PM
To: Hemant Rumde
Cc: address@hidden; address@hidden
Subject: Re: cp command performance

Hemant Rumde wrote:
> I am not reporting a bug in the cp command.

In that case I will close the bug report that you have opened.

Let's have the discussion on the discussion mailing list
address@hidden  That is the more appropriate place.  I have set the
mail headers to direct discussion there, but if your mailer doesn't
comply, please redirect it manually.

> In our company, we copy huge COBOL files before processing data, so
> that we can roll back our data files. Suppose A1 is my huge 60GB file and
> A1.bk is its backup, taken before we process (change) the data in A1.
> Which of our methods would be faster?
> 1. Method-1 (A1.bk exists)
>     $ cp A1 A1.bk
> 2. Method-2
>     $ rm -f A1.bk
>     $ cp A1 A1.bk
> 3. Method-3
>     $ cp --remove-destination A1 A1.bk

All three of those should be virtually the same, especially the last
two.  But benchmarking it is always good.  I created a 10GB test file
using dd, copied it once to set up the test, and then performed the
following operations on an ext3 filesystem.

  $ time cp testdata testdata.bak
  real    3m34.435s
  user    0m0.108s
  sys     0m30.950s

  $ time ( rm -f testdata.bak ; cp testdata testdata.bak )
  real    3m27.941s
  user    0m0.092s
  sys     0m30.914s

  $ time cp --remove-destination testdata testdata.bak
  real    3m36.931s
  user    0m0.068s
  sys     0m30.862s

As you can see, the times for all three operations are the same
within the limits of run-to-run variation.
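The benchmark above can be scripted roughly as follows. This is a sketch, not Bob's actual procedure: `bench_methods` is a hypothetical helper that measures whole seconds with `date +%s` instead of the shell's `time`, and Method-3 assumes GNU cp.

```shell
# Hypothetical helper: time the three backup methods on one file.
bench_methods() {
  local f="$1" start
  cp -- "$f" "$f.bak"    # initial copy, so Method-1 has an existing target

  start=$(date +%s)
  cp -- "$f" "$f.bak"                          # Method-1: overwrite
  echo "method1 $(( $(date +%s) - start ))s"

  start=$(date +%s)
  rm -f -- "$f.bak"
  cp -- "$f" "$f.bak"                          # Method-2: remove, then copy
  echo "method2 $(( $(date +%s) - start ))s"

  start=$(date +%s)
  cp --remove-destination -- "$f" "$f.bak"     # Method-3: GNU cp option
  echo "method3 $(( $(date +%s) - start ))s"
}
```

For example, `dd if=/dev/zero of=testdata bs=1M count=10240` followed by `bench_methods testdata` roughly mirrors a 10GB setup; the all-zero contents are an assumption, since the data written into the original test file is not stated.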

> This operation is very simple, but our operators say that in some cases
> cp takes longer. How can we reduce the copying time?

I do not doubt that there will be differences in the time consumed by
just the raw command.  With such a large file I think this will depend
upon outside influences, such as what filesystem you are using for the
copy, how much RAM you have available for buffer cache, whether
extraneous sync and fsync calls are happening at the same time, and so
forth.  I could send examples, but I don't want to send you off in the
wrong direction and so will resist.
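Short of specific tuning advice, a neutral first step is to record some of these influences alongside each slow or fast run. A Linux-only sketch; `report_env` is a hypothetical helper, and the filesystem type comes from GNU `stat -f`.

```shell
# Hypothetical helper: record filesystem type and memory/cache state.
report_env() {
  local dir="${1:-.}"
  # Filesystem type of the directory (GNU stat; e.g. ext2/ext3, xfs, nfs).
  echo "filesystem: $(stat -f -c %T -- "$dir")"
  # Total RAM, page cache size, and data still waiting to be written back.
  grep -E '^(MemTotal|MemAvailable|Cached|Dirty|Writeback):' /proc/meminfo
}
```

Large `Dirty` or `Writeback` values at the start of a copy would suggest that other writers are competing for the same storage.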




