bug-gawk
Re: Forcibly-unbuffered redirect-to-pipe yields terrible perf


From: arnold
Subject: Re: Forcibly-unbuffered redirect-to-pipe yields terrible perf
Date: Thu, 09 Feb 2023 00:59:14 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thank you for your note.

> 3697ec5c  Arnold D. Robbins  Thu Jul 15 23:12:49 2010 +0300  Moved to gawk 2.11.

Although this is dated 2010, you'll note the comment that mentions
gawk 2.11.  It was in 2010 that I built the Git repo based on older
versions. 2.11 dates from approximately 1989, so the change is around 33
years old!

Unsurprisingly, I don't remember the details from that long ago.

I suspect that it was to ensure correct semantics when doing
things like

        print "some stuff that goes to stdout"
        print ... | "some command that sends to stdout"
        print "more stuff that goes to stdout"

In such a case, the interleaved output has to come out in the
correct order.
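
A minimal sketch of how a script can enforce that ordering itself, without relying on how the implementation buffers pipes. It assumes `awk` resolves to gawk or another awk that provides `fflush()` and `close()`; the command `cat` and the sample strings are illustrative only.

```shell
# Capture the combined output so the ordering can be inspected.
out=$(awk 'BEGIN {
    print "first line on stdout"
    fflush()                 # push our own buffered stdout before the child writes
    print "middle line via pipe" | "cat"
    close("cat")             # flush the pipe and wait for the child to finish
    print "last line on stdout"
}')
printf '%s\n' "$out"
```

With the explicit `fflush()` and `close()`, the three lines should appear in program order even when standard output is a file or a pipe rather than a terminal.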

I will investigate possible changes that would enable buffered
output to pipes while not breaking any semantics.
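
In the meantime, one possible script-level workaround (a sketch, not a tested recommendation) is to batch records in a string and send each batch to the pipe with a single printf, so that a flush-per-print policy costs roughly one write per batch rather than one per line. The helper names `emit` and `flush_batch`, the variable `batch`, and the command `cat` are all hypothetical choices for illustration.

```shell
out=$(awk -v batch=4 '
# Append one record to the in-memory buffer; write it out when the batch is full.
function emit(line) {
    buf = buf line "\n"
    if (++pending >= batch) flush_batch()
}
# Send the whole buffer to the pipe with one printf, then reset it.
function flush_batch() {
    if (pending > 0) printf "%s", buf | cmd
    buf = ""; pending = 0
}
BEGIN {
    cmd = "cat"
    for (i = 1; i <= 10; i++) emit("record " i)
    flush_batch()            # write the final partial batch
    close(cmd)               # wait for the child before exiting
}')
printf '%s\n' "$out"
```

A larger `batch` value trades memory and latency for fewer writes; whether this actually helps depends on the awk implementation and the workload.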

> Before this commit, the programmer had the choice; they could call fflush() or not [...]

Actually, this is incorrect; fflush() wasn't added to gawk until 3.0,
well after the above change.

By the way, you mention that you are using gawk for scientific
computing. I'm curious, can you give more detail?

Thanks,

Arnold

<alexandre.ferrieux@orange.com> wrote:

> Hi,
>
> When writing into a pipe redirection:
>
>       gawk '{print | "cat > /tmp/foo"}'
>
> ... gawk *always* handles the pipe as unbuffered. This can be witnessed with
> an external "tail -f /tmp/foo".
>
> This makes gawk completely unusable for any heavy-duty multipipe output, as
> CPU time is dominated by single-line write() syscalls.
>
> By contrast, heavy-duty multifile output *is* supported:
>
>       gawk '{print  > "/tmp/foo"}'
>
> ... is fully buffered. What is the logic behind this difference?
> Note: it can be traced to this commit:
>
> 3697ec5c  Arnold D. Robbins  Thu Jul 15 23:12:49 2010 +0300  Moved to gawk 2.11.
>
> .. with the following comment:
>
> >Improved handling of output bufferring:  now all print[f]s redirected to a tty
> >or pipe are flushed immediately and non-redirected output to a tty is flushed
> >before the next input record is read.
>
> Before this commit, the programmer had the choice; they could call fflush() or
> not, so that both "interactive" and "efficient" use cases were supported.
> Afterwards, the choice disappeared: any write to a pipe is deemed
> "interactive", incurring a syscall per write and terrible performance.
>
> Can someone explain why this is an improvement?
>
> PS: I do realize this has been the case for 13 years. But maybe it wasn't
> spotted before, precisely because Awk was too slow for such heavy-duty tasks
> back in the day. Now things are different: Awk is a serious candidate for
> scientific computing, and such details are just starting to be a problem.


