bug-gawk

Re: Read a fixed length of input each time


From: Neil R. Ormos
Subject: Re: Read a fixed length of input each time
Date: Tue, 23 Jun 2020 09:53:57 -0500 (CDT)

Peng Yu wrote:

> I'd like to read a binary file in chunks of n
> bytes. awk by default uses RS to split the input
> into records. Is there a way to split the input
> into chunks of fixed length instead of using a
> delimiter? Thanks.

> P.S. I see the following, but I am not sure
> whether it is appropriate for my specific question.

> https://www.gnu.org/software/gawk/manual/html_node/Fixed-width-data.html

I think I understand what the OP is asking.

I do this [*1] via a loop that uses getline explicitly, e.g.,

  getline rec < (file)

(see manual, Using getline into a Variable from a File [*2]).  I haven't tried 
it using the regular awk input mechanism.

Set

  RS="................"

or the equivalent interval expression (e.g., RS=".{16}" for 16-character 
chunks), where the number of characters to be matched is the chunk size.

Then each getline will place chunk-size characters in RT (the variable itself 
receives the empty string that precedes the match), provided there are enough 
characters left to match RS.  Otherwise, on the final getline, when fewer than 
chunk-size characters remain before end-of-file, those residual characters are 
placed in the variable specified to receive the results from getline, and RT is 
set to the empty string.

Because I don't know what multi-byte characters are, I always use the C locale, 
and in my experience, the characters in RT have always been /bytes/. I have no 
idea what happens in other locales.

I believe I have done this successfully with RS as large as about 1 MB. 
Setting RS seems to be time-consuming (presumably because gawk compiles RS as a 
regexp each time it is assigned), so if you intend to build a large RS, e.g., 
in a loop, it is better to build it in a temporary variable and then assign 
that to RS once.


[*1] I acknowledge the warnings from the developers and their suggestions that 
reading binary data can best be done with an extension or some pre-processing 
step.  But those solutions may not be available or uniform in all environments 
where gawk is available.  So, even if this RS-based method is not as good, it 
might allow the user to write a relatively portable program intended for 
several heterogeneous environments.

[*2] https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable_002fFile.html


