Re: line buffering in pipes
From: Assaf Gordon
Subject: Re: line buffering in pipes
Date: Thu, 2 May 2019 13:40:47 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
Follow-up for completeness:
On 2019-05-02 1:14 p.m., Assaf Gordon wrote:
> If you do need to worry about special characters in filenames (or file
> names with ':'), see file's "-print0" option in addition to "find
> -print0 | xargs -0".
This might not be trivial, so here's one solution to using "file" to
detect and act on file types, even on files with special characters
(including ":" and new lines):
find [DIRECTORY] -type f -print0 \
    | xargs -0r \
        file --raw --no-buffer --no-pad \
             --mime-type --print0 --print0 \
    | sed -zn 'h;n;/application\/x-archive/{x;p}' \
    | xargs -0 -n1 echo == processing file:
Several things here:
1. Using "find -print0 | xargs -0" - that's well known.
2. Using "file ... --raw --print0 --print0" (--print0 must be given TWICE).
"--raw" tells "file" not to encode special characters as octal.
The doubled "--print0" makes it print a NUL after the file name
and a second NUL after the mime-type, with no other field separator
(e.g. no ":" is printed).
For example:
$ touch $'hello\nworld .txt'
$ file --mime-type --print0 --print0 h* | od -taz
0000000   h   e   l   l   o  nl   w   o   r   l   d  sp   .   t   x   t  >hello.world .txt<
0000020 nul   i   n   o   d   e   /   x   -   e   m   p   t   y nul      >.inode/x-empty.<
The output is now "filename<NUL>mime-type<NUL>".
3. Using "sed -z" (must be GNU sed) - use NUL as line-terminator
instead of new-line.
Based on file's output (above), every odd line is a file name
and every even line is a mime type.
4. The sed program:
'h' copies the current line (an odd line, the file name) into the hold
buffer.
'n' reads the next line (an even line, the mime type).
'/application\/x-archive/' is a regex match to check if the mime-type
matches.
If it does match, 'x' swaps the content of the hold buffer (the
file name) back into the pattern space and 'p' prints it.
Because of "-n", nothing else is printed.
The result of the sed program is a NUL-terminated list of file names
whose mime-type matched the regular expression.
5. Since it is a NUL-terminated list of file names,
we can feed it to "xargs -0" again and execute anything we want
on these files, safely.
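
To try steps 3 to 5 without touching real files, here is a rough sketch
that feeds the same sed program a synthetic stand-in for file's output.
The helper name "fake_file_output" and the file names in it are made up
for illustration. It assumes GNU sed (for "-z") and an xargs that
supports "-0" and "-r".

```shell
# Synthetic stand-in for "file --mime-type --print0 --print0" output:
# alternating NUL-terminated records, file name first, then mime type.
# The names are hypothetical; the first one contains ':' and a newline.
fake_file_output() {
    printf 'lib:one\nv2.a\0application/x-archive\0'
    printf 'notes.txt\0text/plain\0'
    printf 'libtwo.a\0application/x-archive\0'
}

# Same sed program as in the pipeline above (GNU sed needed for -z).
# tr turns each NUL into '|' only so the result can be shown as text.
matches=$(fake_file_output \
    | sed -zn 'h;n;/application\/x-archive/{x;p}' \
    | tr '\0' '|')
printf '%s\n' "$matches"

# Step 5: pass the NUL-terminated names to a command, one name per call.
# A name with an embedded newline spans two output lines, so count only
# the lines carrying the echo prefix to get the number of invocations.
count=$(fake_file_output \
    | sed -zn 'h;n;/application\/x-archive/{x;p}' \
    | xargs -0 -r -n1 echo '== processing file:' \
    | grep -c '^== processing file:')
printf 'invocations: %s\n' "$count"
```

The name with the embedded newline and ':' stays in one piece through
sed and reaches xargs as a single argument. That is the whole point of
keeping NUL as the only separator from start to finish.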
Hope this helps,
- assaf