coreutils

Re: [RFC/PATCH] cp: Add option to pre-allocate space for files


From: Pádraig Brady
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 16:45:46 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 05/11/2012 04:03 PM, Mark wrote:
> Hi,
> 
> Here's a patch for cp which adds a new --preallocate option. When
> specified, cp allocates disk space for the destination file before writing
> data. It uses fallocate() with FALLOC_FL_KEEP_SIZE on Linux, falling back
> to posix_fallocate() if that fails.

Thanks for taking the time to do this.
This feature is already under consideration.
See the comments at: http://bugs.gnu.org/9500
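
For reference, here is a minimal sketch of the approach the patch describes. It assumes Linux and glibc, where fallocate() needs _GNU_SOURCE. This is not the code from the patch, and the helper name preallocate_dest() is hypothetical. One design choice here is also an assumption: it falls back to posix_fallocate() only when fallocate() reports "not supported" (EOPNOTSUPP/ENOSYS), not on any failure. That way a real error such as ENOSPC is returned instead of being retried.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: reserve LEN bytes for the destination FD
   before copying.  Returns 0 on success, or an errno value.  */
int
preallocate_dest (int fd, off_t len)
{
  /* Nothing to reserve; fallocate(2) fails with EINVAL for len <= 0.  */
  if (len <= 0)
    return 0;

  /* Linux-specific: allocate blocks without changing st_size, so
     "ls -l" on the destination still shows the copy's progress.  */
  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return 0;
  if (errno != EOPNOTSUPP && errno != ENOSYS)
    return errno;   /* e.g. ENOSPC or EBADF: a real failure */

  /* Portable fallback.  It extends st_size to LEN, and glibc emulates
     it by writing to each block when the filesystem lacks native
     support.  It returns an error number rather than setting errno.  */
  return posix_fallocate (fd, 0, len);
}
```

If the fallback path runs, the destination's apparent size jumps to len immediately. That is the "ls -l" caveat mentioned in the patch notes.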

> Benefits of preallocation:
>  - Disk fragmentation can be greatly reduced. That means faster file
> access and less filesystem overhead (fewer extents).
>  - Recovering data after filesystem corruption should be more successful,
> since files are more likely to be contiguous.
>  - If you're e.g. copying a virtual machine disk image file, the
> destination should be (almost) contiguous, meaning that running a disk
> optimiser/defragmenter in the guest OS would work as it should (i.e.
> improve performance).
> 
> This is a very preliminary patch for testing. Hopefully someone will find
> it useful. And hopefully someone who (a) has a clue when it comes to C
> programming, and (b) is familiar with the coreutils source (I'm neither)
> can work from this to produce something which could be included in a
> future release.
> 
> Note that posix_fallocate() sets the destination file size. If your system
> doesn't support fallocate() with FALLOC_FL_KEEP_SIZE, you can't e.g. do
> "ls -l destfilename" to monitor the progress of a large file copy; the
> length shown will always be the final length.
> 
> Pre-allocating space can defeat the object of --sparse=always (or the
> default sparse-checking heuristic). If copying files with large holes you
> probably won't want to use --preallocate. If you do, regions in the
> destination corresponding to holes in the source will be allocated but
> unwritten. You'll lose the disk-space-saving benefit, but keep the
> fast-reading-of-holes benefit. On the other hand, that feature could be
> useful sometimes.
> 
> In the general case of copying non-sparse files, it should be beneficial
> to use --preallocate. However on some systems, when the destination
> filesystem does not support pre-allocation (e.g. FAT32), the
> implementation of posix_fallocate() might try to fill the region to be
> pre-allocated with zeros. That would double copy time for no benefit.
> 
> To-do list:
>  - Add --preallocate option to mv as well
>  - Should the option name be changed to --pre-allocate?
>  - Maybe have an option to tell cp to pre-allocate space for all
> destination files in one go, rather than pre-allocating space for each
> individual file before copying?

I don't think there should be an option at all.
cp should have enough information to do the right thing.
Why would you ever not want to preallocate?
That said, using fallocate with XFS triggers
alignment behavior that causes fragmentation.
But this might change, and the user can't be expected to know this.
BTW, I'm thinking of adding a new FALLOC_FL_ALIGN flag
to the kernel, which XFS could use in its tools to enable that
separate alignment functionality.

>  - Check the error code that fallocate() returns. If it says the
> filesystem does not support fallocate(), don't call it again for every
> other file being copied.
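
One way to do that is to remember the "unsupported" result in a single process-wide flag. This is a sketch under that simplifying assumption. cp can write to several filesystems in one run, so a real version might track this per st_dev instead. The names try_preallocate() and fallocate_unsupported are hypothetical, and the code assumes Linux with glibc.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical: set once fallocate() reports it is unsupported,
   so later files skip the syscall entirely.  */
bool fallocate_unsupported = false;

/* Returns 0 on success or when preallocation is skipped,
   otherwise an errno value (e.g. ENOSPC).  */
int
try_preallocate (int fd, off_t len)
{
  if (fallocate_unsupported || len <= 0)
    return 0;
  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return 0;
  if (errno == EOPNOTSUPP || errno == ENOSYS)
    {
      fallocate_unsupported = true;   /* don't ask again */
      return 0;
    }
  return errno;
}
```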

>  - Better handling of sparse files, e.g. don't call fallocate() if source
> file is sparse and --sparse=always is given.

That's an important consideration.

>  - If pre-allocation fails due to insufficient disk space, cp prints a
> message and continues. So typically it will fill up the disk then abort
> with an out-of-disk-space error. It would be nice to be able to tell cp
> to abort when a pre-allocation fails, so it can exit without wasting
> time.

Yes, it should exit immediately on ENOSPC.
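
Here is a sketch of failing early. It assumes the caller stops the copy and removes the partial destination when it gets PREALLOC_FATAL. The enum and function names are hypothetical, the message only imitates cp's style, and the code assumes Linux with glibc.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical outcome of the preallocation step for one file.  */
enum prealloc_result { PREALLOC_OK, PREALLOC_SKIPPED, PREALLOC_FATAL };

enum prealloc_result
preallocate_or_fail (int fd, off_t len, const char *name)
{
  if (len <= 0 || fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return PREALLOC_OK;

  int err = errno;
  if (err == ENOSPC)
    {
      /* The data cannot fit: stop now rather than copying
         until the disk fills up.  */
      fprintf (stderr, "cp: cannot preallocate %s: %s\n",
               name, strerror (err));
      return PREALLOC_FATAL;
    }

  /* Other failures (e.g. EOPNOTSUPP): copy without preallocating.  */
  return PREALLOC_SKIPPED;
}
```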

cheers,
Pádraig.


