bug-gawk

Re: Read a fixed length of input each time


From: Andrew J. Schorr
Subject: Re: Read a fixed length of input each time
Date: Tue, 23 Jun 2020 14:02:44 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jun 23, 2020 at 12:48:29PM -0500, Neil R. Ormos wrote:
> I've used this for a few different things.  I don't suggest these use cases 
> justify any changes to gawk or extensions.
> 
> 1.  Detecting file type.  Yes, I understand
>     there's the file(1) utility, but file(1)'s
>     behavior is not consistent across all platforms
>     and versions, and at least in the past, there
>     was a significant delay between the time a
>     particular new file type was seen in the wild
>     and the availability of a file(1) (or magic
>     entry) that could detect it.  For portably
>     detecting among a tiny universe of possible
>     file types, it can be easier in gawk to
>     directly inspect the file's content than to
>     deal with the output of file(1), especially
>     when the user does not control when file(1) is
>     updated.
> 
> 2.  Extracting version information from Android APK
>     files on systems where Android Asset Packaging
>     Tool is not available.
> 
> 3.  Detecting groups of files having common
>     initial chunks of N bytes.  There are a few
>     different applications for this. One is
>     identifying probably essentially-duplicate
>     media files--e.g., video or audio files that
>     have the same substantive content and differ
>     only in metadata placed near the end of the
>     file.  Although there may be "better" ways to
>     do it using the shell or common utilities, the
>     function of those utilities can vary by
>     platform, and if you will need to process some
>     of the content of the file, orchestrating a
>     shell pipeline may not be more convenient or
>     efficient.

Do these all rely upon examining the beginning of the file?
If so, could one instead use "head -c <n>" to read the first <n>
bytes into gawk?
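
For concreteness, here is a minimal sketch of that idea applied to the
file-type case. The function name sniff_type and the two signatures it
checks are only illustrative assumptions, not anything from the thread.
Passing the bytes through od is likewise an assumption of this sketch:
it lets any awk compare them as hex text without having to cope with
NUL bytes itself.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): classify a file by its
# first four bytes.  head -c limits the read to those bytes; od renders
# them as hex so that awk only ever sees plain text.
sniff_type() {
    head -c 4 "$1" | od -An -v -tx1 | awk '
        # Join the hex byte values from od into one string.
        { for (i = 1; i <= NF; i++) hex = hex $i }
        END {
            if (hex == "504b0304")      print "zip"      # "PK\003\004"; APK files are zip archives
            else if (hex == "25504446") print "pdf"      # "%PDF"
            else                        print "unknown"  # includes empty files
        }'
}
```

Something like "sniff_type some-file" then prints one word, which the
calling script can branch on. Note that head -c is widely available but
is not required by POSIX.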

Regards,
Andy


