Re: Please advise work around or bug fix
From: Bernhard Voelker
Subject: Re: Please advise work around or bug fix
Date: Wed, 24 Mar 2021 22:13:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1
Hi Kam,
On 3/24/21 6:17 PM, Yuen, Kam-Kuen CIV USARMY DEVCOM SC (USA) via Bug reports
for the GNU find utilities wrote:
> I am running the following command and the "ls" command gives error message
> that the file cannot be found. The problem is that the filename has spaces
> as part of the filename.
> The purpose is to find all files that exceeding file size of 1k. Filename
> might include spaces, special character like '
>
> find . -size +1k -print | xargs ls -sd
There is no bug in any of the tools involved in this command line, find(1),
xargs(1) and ls(1).
It is merely a wrong assumption about how they work (together).
Assuming the above search matches these 2 files:
$ touch 'This is a Test'
$ touch ' This is another Test'
$ ls -log
total 0
-rw-r--r-- 1 0 Mar 24 21:36 ' This is another Test'
-rw-r--r-- 1 0 Mar 24 21:35 'This is a Test'
find(1) will print the file names matching the criteria, separated by a newline
character.
E.g.:
This is a Test <newline>
This is another Test <newline>
Shown as hex output:
$ find . -type f | od -tx1z
0000000 2e 2f 20 54 68 69 73 20 69 73 20 20 20 61 6e 6f >./ This is ano<
0000020 74 68 65 72 20 54 65 73 74 0a 2e 2f 54 68 69 73 >ther Test../This<
0000040 20 69 73 20 61 20 54 65 73 74 0a > is a Test.<
0000053
xargs(1) reads the entries from standard input and assumes by default that they
are separated by a <blank> character or a <newline>. See POSIX:
  [...] arguments in the standard input are separated by unquoted <blank> characters,
  unescaped <blank> characters, or <newline> characters.
Also 'man xargs' documents this quite at the top:
[...] delimited by blanks [...] or newlines
With the above input from find(1), this means that xargs(1) recognizes the
following entries:
- 'This'
- 'is'
- 'a'
- 'Test'
- 'This'
- 'is'
- 'another'
- 'Test'
Note that blanks in the file names printed by find(1) will lead to separate
entries, with extra blanks already ignored.
As all of the above 8 entries can easily be packed into one invocation of the
command to run, ls(1), it is started with those 8 separate arguments.
strace(1) shows what will be executed:
$ find . -type f | strace -ve execve xargs ls -logd
[...]
execve("/usr/bin/ls", ["ls", "-logd", "./", "This", "is", "another", "Test",
"./This", "is", "a", "Test"], ...) = 0
Obviously, ls(1) will most likely not be able to stat(2) the intended files
(or, in the worst case, it will accidentally list other files which happen to
have one of the shorter names).
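The splitting can also be observed without strace(1). The following is only a
minimal sketch: the file name is the example from above, fed in by hand, and
printf(1) stands in for ls(1) so that each argument xargs(1) passes on shows up
in brackets on its own line.

```shell
# Feed one newline-terminated file name containing blanks to xargs and
# print every argument it hands to the command, one per line.
split=$(printf './This is a Test\n' | xargs printf '[%s]\n')
printf '%s\n' "$split"
```

With default settings this prints four bracketed words instead of one file name.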
> 1) The env is cygwin64 on Windows 10
>
> 2) Filename include space or special character
>
> 3) When running "ls" command directly on the folder, the screen show "
> ' " character surrounding the filename e.g. 'This is a Test Case With spaces
> in Filename.pdf'
As the output is a terminal, ls(1) defaults to quoting each file name properly
so that it could be copy+pasted safely to another command. Although there are
discussions about this feature on the GNU coreutils mailing list, I personally
consider this a good thing.
> 4) In the case the filename already has ' special character, the "ls"
> command shows the filename with double " around the filename e.g. " This is a
> Tester's File.pdf"
The same here: ls(1) quotes the file name so that it can be copy+pasted safely.
And note that this also includes the leading blank in the file name: " This
....".
> 5) When saving simple "ls" output to a file, do not see the surrounding
> character
Indeed, when the output goes to a file rather than to a terminal, ls(1) prints
only the original characters of the file names, without quoting.
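Both behaviours can be requested explicitly with the --quoting-style option of
GNU ls, regardless of where the output goes. A small sketch, assuming GNU
coreutils 8.25 or newer and a scratch directory created only for this
illustration:

```shell
# Scratch directory with one file whose name contains blanks.
dir=$(mktemp -d)
touch "$dir/This is a Test"

# 'literal' prints the raw name; 'shell-escape' prints the quoted form
# that ls shows by default on a terminal.
literal=$(ls --quoting-style=literal "$dir")
quoted=$(ls --quoting-style=shell-escape "$dir")
printf '%s\n' "$literal" "$quoted"

rm -rf "$dir"
```

The quoting is purely a display matter: the name stored in the file system is
the same in both cases.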
> 6) Trying to use the -0 option with xargs but it complains the argument
> line too long
When using 'xargs -0', the producer of the input also has to adhere to the
chosen convention and separate the entries by a NUL character instead of
newlines.
'man xargs' says:
-0, --null
...
The GNU find -print0 option produces input suitable for this mode.
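A likely explanation for the "argument line too long" message is that, with -0,
newline-separated input contains no separator at all, so xargs(1) treats the
whole input as one single, possibly huge, argument. The sketch below uses
made-up input and a tiny sh(1) one-liner that merely reports how many arguments
it received:

```shell
# With -0 only NUL separates entries, so this newline-separated input
# arrives as a single argument (embedded newlines included).
nargs=$(printf 'a b\nc d\n' | xargs -0 sh -c 'echo "$#"' sh)
echo "$nargs"
```

With real find(1) output of many file names, that single argument can exceed
the length limit xargs(1) enforces.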
> Can you advise How to handle filename with hidden character like ' or space
> or to report file size of current and subdirectories
There are several safer alternatives, all of them documented in the GNU
findutils manual.
https://www.gnu.org/software/findutils/manual/find.html
E.g.
# Tell find(1) to also use the NUL character as a separator: use -print0.
# This is safe for really all possible file names, including those with
# single or double quotes, tabs and blanks, and finally also newlines.
# Yes, the only character which cannot occur is the NUL character.
$ find . -size +1k -print0 | xargs -0 ls -sd
Note that xargs(1) will invoke ls(1) also if find(1) didn't match any file in
the above example.
Better to use the -r, --no-run-if-empty option:
$ find . -size +1k -print0 | xargs -r0 ls -sd
FWIW: One drawback is that there is a small race condition between the time
find(1) examines the file and the time ls(1) sees it: one has to be aware that
the file system is constantly changing.
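To convince oneself that the -print0 / -0 pairing copes with awkward names, one
can count the arguments the command receives. This is only a sketch: the
scratch directory and the three file names (blanks, an apostrophe, an embedded
newline) are invented for the test.

```shell
# Scratch directory with hypothetical, deliberately awkward file names.
dir=$(mktemp -d)
for name in 'This is a Test' " Tester's File" 'two
lines'; do
    # Write 4 KiB of zeros so that each file matches -size +1k.
    dd if=/dev/zero of="$dir/$name" bs=1024 count=4 2>/dev/null
done

# Count the arguments passed on by xargs: one per file is expected.
count=$(find "$dir" -type f -size +1k -print0 | xargs -r0 sh -c 'echo "$#"' sh)
echo "$count"

# The same pipeline with ls(1) as the command.
find "$dir" -type f -size +1k -print0 | xargs -r0 ls -sd

rm -rf "$dir"
```

Replacing -print0 and -0 with -print and no option makes the count jump well
above the number of files.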
Another alternative is to let find(1) directly print the file size and file
name. This avoids the race condition.
$ find . -size +1k -printf '%s %f\n'
Obviously, the output is not safe to process by another tool when a file name
contains a newline, but for the human eye it's probably good enough.
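If the output does have to be processed further, one possible approach is to
end each record with a NUL instead of a newline, since GNU find's -printf
understands the \0 escape. The following is a sketch under assumptions: the
scratch directory and file are made up, and %p (full path) is swapped in for
%f here so that a consumer could locate the file.

```shell
# Scratch directory with one 4 KiB file whose name contains blanks.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/This is a Test" bs=1024 count=4 2>/dev/null

# %s = size in bytes, %p = path as find sees it, \0 = NUL record terminator.
# xargs -0 -n1 hands each complete record to printf as one argument.
records=$(find "$dir" -type f -size +1k -printf '%s %p\0' |
          xargs -r0 -n1 printf '%s\n')
printf '%s\n' "$records"

rm -rf "$dir"
```

Each record then starts with the size, followed by one blank and the untouched
path, however odd the file name may be.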
Furthermore, there are also alternatives with other tools, e.g. the du(1)
command from the GNU coreutils has a -t, --threshold option to filter files by
their sizes (but it also outputs directories):
$ du -at +1k
Hope this helps.
Have a nice day,
Berny