bug-coreutils

Re: split behavior


From: Pádraig Brady
Subject: Re: split behavior
Date: Mon, 14 Sep 2009 09:59:54 +0100
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Roger McNichols wrote:
> 
> Thanks for the feedback.
> 
> 
>> Do you mean select the appropriate suffix length based on size,
>> or do you mean the zzaa, zzab scheme? The former wouldn't
>> help when processing a pipe for example so I'd probably
>> stick with the latter method for consistency.
> 
> Currently, split (at least 5.2.1) DOES pick the suffix size based on the file 
> size when used as "split -<#> file" and the file size is known.

I checked the repo and can't see code supporting that.
Perhaps you've got a locally modified `split` ?

> But as you point out, if the file is a pipe you may still run out of
> suffixes if the file size changes after invocation of split, or if split is
> used in the "split -<#> -" (reads stdin) mode, a 2-letter suffix is all you
> get unless you specify a length.
> Now I suppose that maybe the discussion went something like:
>   >> what if an unknown-sized input stream is the input?
>   >> well then just use -a 100  and you will never* run out...
>      (*note 26^100 is pretty big)
> 
> Anyway, I propose to develop a new command-line option that would invoke the
> 'old' suffix formation behavior.  And even though aa ... zaa ... zzaa ...
> instead of aa .. zzaa ... zzzzaa (as well as many other schemes) would work
> just as well,

Bzzt. zaa would sort before zb.
In general one needs to append 'z'*suffix_len, which would default to 2 if not
specified.
One would need to consider this behaviour with digit suffixes also.
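
To illustrate the ordering argument, here is a minimal Python sketch of the
"aa .. zz, zzaa .. zzzz, zzzzaa .." scheme as I read it from this thread. It is
not code from split; the function name `suffix` and its parameters are
hypothetical, chosen only for illustration. Each time a block of
26^suffix_len names is exhausted, another run of suffix_len 'z' characters is
added in front of a fresh counter, which keeps the names in sorted order.

```python
import string


def suffix(index, suffix_len=2, alphabet=string.ascii_lowercase):
    """Return the suffix for the index-th output file (0-based).

    Hypothetical helper: block k uses a prefix of suffix_len * k copies of
    the last alphabet character, followed by a fixed-width counter.
    """
    base = len(alphabet)
    block, offset = divmod(index, base ** suffix_len)
    letters = []
    for _ in range(suffix_len):
        offset, digit = divmod(offset, base)
        letters.append(alphabet[digit])
    counter = "".join(reversed(letters))
    return alphabet[-1] * (suffix_len * block) + counter


# The names come out already in lexicographic order, across block boundaries.
names = [suffix(i) for i in range(3 * 26 ** 2)]
print(names[675], names[676], names[677])   # zz zzaa zzab
print(names == sorted(names))               # True

# The "aa ... zaa" alternative breaks ordering: zaa sorts before zb.
print(sorted(["zb", "zaa"]))                # ['zaa', 'zb']
```

Passing `alphabet=string.digits` gives the analogous digit sequence
(00 .. 99, 9900 .. 9999, ...), assuming the same rule were applied to digit
suffixes.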

> I propose to utilize the 'old' one for the added advantage of reverse
> compatibility.

OK. While I like the scheme, it would be really nice to see what we're being
compatible with. I.e., it would be great if you found where the old split you
used came from.

> That way any code that relied on the old scheme for counting would be able
> to be re-functionalized with a simple addition of a commandline argument.
> 
>> if the suffix len is specified and is too small.
>> Otherwise we use the zzaa, zzab method as described before.
> 
> This is also a good idea, but it might override the user's intention, which
> could be to use split to detect a file that was more than 676*N lines long
> or to use it with the -1 option and only write out the first 676 lines of
> the input

That's exceedingly unlikely. It would be great to have the "unlimited" behaviour
by default I think. As mentioned before we could have the "limited" behaviour
if POSIXLY_CORRECT is set.
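
As a rough sketch of that proposal (not the behaviour of any released split,
and with a hypothetical function name), the choice between the two modes could
hinge on nothing more than whether POSIXLY_CORRECT is present in the
environment:

```python
import os
from itertools import count, islice


def suffix_indices(suffix_len=2, env=None):
    """Yield the file indices split would be allowed to use.

    Hypothetical helper: "limited" mode stops after 26**suffix_len files
    when POSIXLY_CORRECT is set; otherwise the sequence is unbounded.
    """
    env = os.environ if env is None else env
    if "POSIXLY_CORRECT" in env:
        return iter(range(26 ** suffix_len))
    return count()


limited = list(suffix_indices(env={"POSIXLY_CORRECT": "1"}))
print(len(limited))                                   # 676
print(len(list(islice(suffix_indices(env={}), 1000))))  # 1000
```

The 676 here is the same 26^2 limit mentioned above for the default
two-letter suffix.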

> (who knows why, but we're fixing a fix that broke something else, right?)

I can't see the code for the old behaviour so I wouldn't assume that.

cheers,
Pádraig.



