
Re: split overwriting already existing files


From: Pádraig Brady
Subject: Re: split overwriting already existing files
Date: Thu, 03 Jul 2014 09:24:01 +0100

On 07/03/2014 07:12 AM, Bernhard Voelker wrote:
> Analyzing bug#17904, I came across the idea that split(1) could
> possibly do something weird, i.e. delete the "aa" file, when
> an output file already exists. Well split(1) doesn't delete it,
> but rather overwrites it:
> 
>   $ wc -l file
>   25000 file
> 
>   $ cp -p file file-newaa
> 
>   $ ls -log file*
>   total 5864
>   -rw-r--r-- 1 2999930 Jul  3 07:47 file
>   -rw-r--r-- 1 2999930 Jul  3 07:47 file-newaa
> 
>   $ find . -size +1000 -exec ~/coreutils/src/split --verbose -l 10000 {} {}-new \;
>   creating file ‘./file-newaa-newaa’
>   creating file ‘./file-newaa-newab’
>   creating file ‘./file-newaa-newac’
>   creating file ‘./file-newaa’
>   creating file ‘./file-newab’
>   creating file ‘./file-newac’
> 
> find(1) was obviously passing "file-newaa" first to split(1).
> But the second split(1) run has silently overwritten the
> already existing "file-newaa"!
> 
>   $ ls -log
>   total 8796
>   -rw-r--r-- 1 2999930 Jul  3 07:47 file
>   -rw-r--r-- 1 1194980 Jul  3 07:48 file-newaa
>   -rw-r--r-- 1 1194980 Jul  3 07:48 file-newaa-newaa
>   -rw-r--r-- 1 1203284 Jul  3 07:48 file-newaa-newab
>   -rw-r--r-- 1  601666 Jul  3 07:48 file-newaa-newac
>   -rw-r--r-- 1 1203284 Jul  3 07:48 file-newab
>   -rw-r--r-- 1  601666 Jul  3 07:48 file-newac
> 
> The Texinfo manual says nothing explicit about overwriting,
> but as it always says "the output file is created", I would assume
> that O_CREAT is used.
> 
> This is what POSIX [1] says about the output files:
> 
>   [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/split.html
> 
>   The output files contain portions of the original input file;
>   otherwise, unchanged.
> 
> I'm not sure whether the latter mandates using O_CREAT, but I
> think failing here would be better than losing data.
> 
> Before looking into the code, do you think we should change this?
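
As an aside to the question above: O_CREAT on its own does not refuse an
existing file; it is the combination O_CREAT | O_EXCL that makes open(2)
fail with EEXIST. The following is a minimal Python sketch of that
difference, not split's actual code. The helper names and the use of
O_TRUNC for the overwriting case are assumptions made for illustration.

```python
import os
import tempfile


def create_truncating(path):
    """Open path for writing; an existing file is silently emptied."""
    # Assumption: models an overwriting open as O_CREAT | O_TRUNC.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)


def create_exclusive(path):
    """Open path for writing; fail if the file already exists."""
    # O_EXCL together with O_CREAT makes the open fail with EEXIST,
    # which Python reports as FileExistsError.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)


def demo():
    """Return (size after truncating open, whether exclusive open refused)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file-newaa")
        with open(path, "w") as f:
            f.write("precious data\n")

        # The truncating open succeeds and the old content is gone.
        os.close(create_truncating(path))
        size_after = os.path.getsize(path)

        # The exclusive open refuses to touch the existing file.
        try:
            os.close(create_exclusive(path))
            refused = False
        except FileExistsError:
            refused = True
        return size_after, refused


if __name__ == "__main__":
    print(demo())
```

Failing on EEXIST is the behaviour the question asks about; whether it is
desirable for split is what the reply below addresses.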

I would say no, because you would often want split
to overwrite an existing output set.
A related protection was added recently
to keep split from overwriting its input files:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ae584644
Beyond that, I can't think of other protections we could provide,
apart from adding a --no-clobber option, and I'm not sure that's warranted.

thanks,
Pádraig.
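
To make the two protections mentioned in this reply concrete, here is a
rough Python sketch of a check a caller could run before invoking split.
It is not how commit ae584644 is implemented and it is not a coreutils
feature: output_names, existing_outputs and overwrites_input are
hypothetical helpers, and only the default two-letter suffix scheme
(aa, ab, ...) is modelled.

```python
import itertools
import os
import string
import tempfile


def output_names(prefix, count, suffix_len=2):
    """Yield the first `count` output names: prefix + aa, ab, ac, ..."""
    suffixes = itertools.product(string.ascii_lowercase, repeat=suffix_len)
    for letters in itertools.islice(suffixes, count):
        yield prefix + "".join(letters)


def existing_outputs(prefix, count):
    """Names that already exist and would be overwritten (no-clobber idea)."""
    return [n for n in output_names(prefix, count) if os.path.exists(n)]


def overwrites_input(input_path, prefix, count):
    """True if a predicted output is the same file as the input."""
    return any(
        os.path.exists(n) and os.path.samefile(n, input_path)
        for n in output_names(prefix, count)
    )


def demo():
    """Recreate the layout from the report: 'file' plus a copy 'file-newaa'."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("file", "file-newaa"):
            with open(os.path.join(d, name), "w") as f:
                f.write("data\n")
        prefix = os.path.join(d, "file-new")
        # 25000 lines split with -l 10000 gives 3 output files.
        clobbered = [os.path.basename(n) for n in existing_outputs(prefix, 3)]
        hits_input = overwrites_input(os.path.join(d, "file"), prefix, 3)
        return clobbered, hits_input


if __name__ == "__main__":
    print(demo())
```

In the reported scenario the input-file check does not trigger, because
the file being overwritten is a pre-existing copy rather than the input
itself; only a no-clobber style check on the predicted names catches it.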



