coreutils

Re: cp: behavior regression in 8.23


From: Pádraig Brady
Subject: Re: cp: behavior regression in 8.23
Date: Sat, 31 Jan 2015 12:38:54 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 31/01/15 06:48, TAMUKI Shoichi wrote:
> Hello Padraig,
> 
> From: Padraig Brady <address@hidden>
> Subject: Re: cp: behavior regression in 8.23
> Date: Fri, 30 Jan 2015 12:46:16 +0000
> 
>> This change was made for performance reasons:
>>   https://oss.oracle.com/~mason/acp/
>>   http://home.ifi.uio.no/paalh/publications/files/ipccc09.pdf
> 
> Yes, I know.  In a particular case (depending on filesystems or using
> way of directory structures,) using directory lists ordered by inode
> can speed up.
> 
>> What's the particular problem you have with the order
>> of the files in the tar archive, so I understand your issue completely?
> 
> The point is cp should keep the function to preserve the deterministic
> directory structure of the original files/directories in the copy.
> That was possible with the cp in coreutils-8.22 or earlier.

There is no such guarantee from the system though.
The order can change depending on the file system, the number of files,
the structure of the underlying tree, etc.
cp uses savedir() (similar to scandir()), which calls readdir(),
and readdir() does not guarantee any particular order.
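
To illustrate the difference between raw directory order, inode order and
name order, here is a minimal Python sketch. It is not how cp or savedir()
is implemented; list_entries() and its "order" parameter are hypothetical
names used only for this example, and os.scandir() stands in for readdir().

```python
import os
import tempfile


def list_entries(path, order="none"):
    """Return entry names in raw directory order, or sorted by inode or name.

    "none" keeps whatever order the system returns, which is not guaranteed.
    """
    with os.scandir(path) as it:
        entries = [(entry.name, entry.inode()) for entry in it]
    if order == "inode":
        entries.sort(key=lambda e: e[1])
    elif order == "name":
        entries.sort(key=lambda e: e[0])
    elif order != "none":
        raise ValueError("unknown order: %r" % (order,))
    return [name for name, _ in entries]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("January", "February", "March"):
            open(os.path.join(d, name), "w").close()
        # Only the "name" result is the same on every file system.
        for order in ("none", "inode", "name"):
            print(order, list_entries(d, order))
```

Only the "name" ordering is reproducible across file systems; the other
two depend on how the file system lays out and allocates entries.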

>> I see that tar 1.28 has the --sort option.
>> Perhaps if that supported --sort=mtime it would cater for your use case
>> of reproducible tar archives with a specific order.
> 
> Ah, the case (January, February, ...) is just an example to explain
> the issue.  It is a mere coincidence they are sorted in mtime order.
> 
> Here is the more practical example using cp in coreutils-8.23:
> 
> tamuki@wombat:~/work32$ ls -fl coreutils-8.22
> total 8409088
> drwxr-xr-x  6 tamuki users    4096 Jan 31 14:48 ./
> drwxr-xr-x  3 tamuki users    4096 Jan 31 14:40 ../
> -rwxr-xr-x  1 tamuki users    8753 Dec 24 15:07 PlamoBuild.coreutils-8.22*
> -rw-------  1 tamuki users  186031 Jan 31 14:48 nohup.out
> -rw-r--r--  1 tamuki users 5335124 Dec 14  2013 coreutils-8.22.tar.xz
> drwxrwxr-x 12 tamuki users    4096 Dec 14  2013 coreutils-8.22/
> drwxrwxr-x 12 tamuki users    4096 Jan 31 14:47 build/
> drwxr-xr-x  6 root   root     4096 Jan 31 14:48 work/
> drwxr-xr-x  2 root   root     4096 Jan 31 14:48 pivot/
> -rw-r--r--  1 root   root        0 Jan 31 14:48 i.st
> -rw-r--r--  1 root   root        0 Jan 31 14:48 i.et
> -rw-r--r--  1 root   root  2843748 Jan 31 14:48 coreutils-8.22-i686-P2.txz
> tamuki@wombat:~/work32$ sudo cp -a coreutils-8.22 coreutils-8.23
> tamuki@wombat:~/work32$ ls -fl coreutils-8.23
> total 8409088
> drwxr-xr-x  6 tamuki users    4096 Jan 31 14:48 ./
> drwxr-xr-x  4 tamuki users    4096 Jan 31 14:50 ../
> -rw-------  1 tamuki users  186031 Jan 31 14:48 nohup.out
> -rw-r--r--  1 tamuki users 5335124 Dec 14  2013 coreutils-8.22.tar.xz
> drwxrwxr-x 12 tamuki users    4096 Dec 14  2013 coreutils-8.22/
> drwxrwxr-x 12 tamuki users    4096 Jan 31 14:47 build/
> -rw-r--r--  1 root   root        0 Jan 31 14:48 i.st
> -rw-r--r--  1 root   root        0 Jan 31 14:48 i.et
> -rwxr-xr-x  1 tamuki users    8753 Dec 24 15:07 PlamoBuild.coreutils-8.22*
> -rw-r--r--  1 root   root  2843748 Jan 31 14:48 coreutils-8.22-i686-P2.txz
> drwxr-xr-x  6 root   root     4096 Jan 31 14:48 work/
> drwxr-xr-x  2 root   root     4096 Jan 31 14:48 pivot/

I still don't understand why this is an issue TBH.
Directory listing programs like ls normally sort results.
If you want reproducible builds then tar has the --sort=name option.
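
For cases where the archive is built from a script rather than with
tar --sort=name, the same idea (sort in user space, never rely on
directory order) can be sketched with Python's tarfile module. This is an
illustration under assumptions, not tar's implementation: add_sorted() and
make_sorted_tar() are hypothetical helpers, and only member order is
addressed (mtimes and ownership are still recorded as-is).

```python
import os
import tarfile


def add_sorted(tar, path, arcname):
    """Add path to the archive, then its children in sorted name order."""
    tar.add(path, arcname=arcname, recursive=False)
    # Descend into real directories only; symlinks are stored as links.
    if os.path.isdir(path) and not os.path.islink(path):
        for name in sorted(os.listdir(path)):
            add_sorted(tar, os.path.join(path, name), arcname + "/" + name)


def make_sorted_tar(src_dir, tar_path):
    """Create tar_path from src_dir with a name-sorted member order."""
    src_dir = os.path.normpath(src_dir)
    with tarfile.open(tar_path, "w") as tar:
        add_sorted(tar, src_dir, os.path.basename(src_dir))
```

With this, two copies of a tree whose directory order differs still
produce archives that list their members in the same order.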

> Anyway, in some cases, the copying directory tree will need to be done
> as fast as possible, even ignoring the order of the readdir calls.
> However, I don't think changing the specification will be a good idea
> because cp has been used in the same manner as before for close to
> three decades.
> 
> So, I propose to add --sort={none,name,inode} option to cp command.

I'm inclined to think an option is not appropriate here,
as it wouldn't guarantee the order in which the system
subsequently returns the copied entries.

I do agree that sorting by inode is a bit of a hack,
though it's also used by all the coreutils programs that use FTS.

It's such a widely used hack that I think file systems
would continue to have inode order somewhat related to
locality on disk (which is still important for SSDs).

Ideally one could pass 'FAST_MODE' down to a readdir_mode syscall,
so that the system could return entries in the optimum order, and also
avoid the need for user space to read all entries at once
and sort them.

Alternatively file systems may adjust to using 'FAST_MODE' implicitly,
assuming that user space will sort if it cares about order/reproducibility.
That would be another reason to avoid an explicit --sort order for cp.

thanks,
Pádraig.


