Re: Read a fixed length of input each time
From: Neil R. Ormos
Subject: Re: Read a fixed length of input each time
Date: Tue, 23 Jun 2020 09:53:57 -0500 (CDT)
Peng Yu wrote:
> I'd like to read a binary in chunks of n
> bytes. awk by default uses RS to split the input
> as records. Is there a way to split the input in
> chunks with fixed length instead of using a
> delimiter? Thanks.
> PS. I see the following. But I am not sure if it
> is appropriate for my specific question.
> https://www.gnu.org/software/gawk/manual/html_node/Fixed-width-data.html
I think I understand what the OP is asking.
I do this [*1] via a loop that uses getline explicitly, e.g.,
    getline rec < (file)
(see manual, Using getline into a Variable from a File [*2]). I haven't tried
it using the regular awk input mechanism.
Set
    RS="................"
or the equivalent interval expression, where the number of characters to be
matched is the chunk size.
Then each getline will place chunk-size characters in RT, provided enough
characters remain to match RS. Otherwise, on the final getline before
end-of-file, the residual characters are placed in the variable named in the
getline call (rec in the example above), and RT is left empty.
Because I don't know what multi-byte characters are, I always use the C locale,
and in my experience, the characters in RT have always been /bytes/. I have no
idea what happens in other locales.
I believe I have done this successfully with RS as large as about 1 M bytes.
Setting RS seems to be time-consuming, so if you intend to build a large RS,
e.g., in a loop, it is better to do so in a temporary variable and then assign
that to RS.
[*1] I acknowledge the warnings from the developers and their suggestions that
reading binary data can best be done with an extension or some pre-processing
step. But those solutions may not be available or uniform in all environments
where gawk is available. So, even if this RS-based method is not as good, it
might allow the user to write a relatively portable program intended for
several heterogeneous environments.
[*2] https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable_002fFile.html