coreutils

Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines


From: Pádraig Brady
Subject: Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines
Date: Fri, 01 Nov 2013 22:51:15 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 11/01/2013 06:53 PM, Pádraig Brady wrote:
> On 11/01/2013 06:20 PM, Eric Blake wrote:
>> On 11/01/2013 11:03 AM, Pádraig Brady wrote:
>>
>>>>
>>>> Escape the output (marking with a leading '\' and backslash-escaping
>>>> both '\' and '\n') only when the file name contains a newline.
>>>> Before, we would do that for a file name containing either newline or 
>>>> backslash.
>>>>
>>>> This probably deserves a NEWS entry, since it is user-visible.
>>>
>>> I debated that as I thought it could have no impact on anything,
>>> but it could actually if one was comparing old and new outputs?
>>>
>>> newsum=$(md5sum my file set | md5sum)
>>> [ "$newsum" = "$(cat ./oldsum)" ] || error
>>
>> Not just that, but the new format is not necessarily parseable by older
>> md*sum.  Your patch didn't show (but probably should be enhanced) what
>> happens for a file named 'a\nb'; pre-patch, it gave '\sum  a\\nb',
>> post-patch it gives 'sum  a\nb'
> 
> Right.
> 
>> - but if the older utility assumes that
>> the missing leading \ was a mistake and unescapes the file name, it
>> results in looking for a file with the three-byte name "a<newline>b", which
>> is also part of the user-visible change.
> 
> Right, but that's a big if.
> So you're referring to non-GNU utils parsing these checksum files,
> and not honoring the leading \ escape marker.
> That's quite unlikely I would think.
> 
>> Breaking output so that older versions can't parse newer output has been
>> one of the reasons that I have only threatened to patch \r handling,
>> rather than actually doing it, because it's tricky to think about
>> old/new interactions and what might break.  Depending on how
>> conservative we are trying to be, we may need to add a command line
>> option that will let the user forcefully revert to the older-style
>> output for intentional interaction with older checksum tools regardless
>> of filename.  For 99% of the cases, the output is identical, since files
>> with \n or \\ in the name are already rare.  Thinking aloud, it may be
>> appropriate to have such a mode option be tri-state (old, new, or warn;
>> with default being warn), where the warning mode gives the new output
>> but ALSO flags to the user that their output may not be parseable by
>> older summing utilities.
> 
> Well any change here isn't worth a flag I think.
> Even for \r one can always `tr -d '\r'` the DOS files before processing.

Or dos2unix to be careful to only process EOLs:

$ printf 'a\rb\r\r\n' | dos2unix | od -tx1
0000000 61 0d 62 0d 0a
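
To make the difference from `tr -d '\r'` concrete, here is a minimal sketch that does not need dos2unix installed. The function names strip_all_cr and strip_eol_cr are made up for illustration, and the sed version only approximates dos2unix by dropping a single CR directly before each newline:

```shell
# Hypothetical helpers, not part of coreutils or dos2unix.

# Remove every CR in the stream, including ones in the middle of a line.
strip_all_cr() {
    tr -d '\r'
}

# Remove only one CR that sits directly before the end of each line.
strip_eol_cr() {
    cr=$(printf '\r')        # a literal CR, for seds that lack the \r escape
    sed "s/${cr}\$//"
}

printf 'a\rb\r\r\n' | strip_all_cr | od -An -tx1
printf 'a\rb\r\r\n' | strip_eol_cr | od -An -tx1
```

The first pipeline loses the CRs that belong to the data, while the second leaves them alone and only converts the line ending.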

> The only reason I was avoiding the redundant '\' escaping
> was to avoid having to do the unescaping like in cleanup_sum()
> here for example http://fslint.googlecode.com/svn/trunk/fslint/findup
> But I suppose even that's not general.
> 
> OK I think it's not worth changing the output format now,
> given the possibility of non-GNU tools parsing incorrectly,
> and the edge case where the output is directly compared
> to older output.
> 
> I'll just do a maint commit to optimize/document a bit.
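
For anyone parsing these files by hand, the escaping convention discussed above (a leading '\' marker on the line, with '\\' and '\n' escapes inside the name) can be undone roughly as follows. This is only a sketch in POSIX shell, not the cleanup_sum() code from findup; the helper names unescape_name and sum_line_name are invented here, and it assumes the usual "hash, space, mode character, name" line layout:

```shell
# Hypothetical helpers for illustration only.

# Undo the '\\' and '\n' escapes, scanning left to right so that
# an escaped backslash followed by 'n' is not mistaken for a newline.
unescape_name() {
    s=$1
    out=
    while [ -n "$s" ]; do
        case $s in
            '\\'*) out="$out\\"; s=${s#??} ;;   # escaped backslash
            '\n'*) out="$out
"; s=${s#??} ;;                                 # escaped newline
            *) c=${s%"${s#?}"}; out="$out$c"; s=${s#?} ;;
        esac
    done
    printf '%s' "$out"
}

# Print the file name from one checksum line, unescaping it only
# when the line carries the leading '\' marker.
sum_line_name() {
    line=$1
    case $line in
        '\'*)
            line=${line#?}                  # drop the marker
            rest=${line#"${line%% *}"}      # drop the hash
            unescape_name "${rest#??}" ;;   # drop space + mode char
        *)
            rest=${line#"${line%% *}"}
            printf '%s' "${rest#??}" ;;
    esac
}
```

With this, a line without the marker keeps its name byte for byte, so 'sum  a\nb' still names the four-byte file a\nb, while '\sum  a\\nb' decodes to that same name.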

Pushed the non user visible adjustment at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=4d94e65

cheers,
Pádraig.
