Re: Forcibly-unbuffered redirect-to-pipe yields terrible perf

From: arnold
Subject: Re: Forcibly-unbuffered redirect-to-pipe yields terrible perf
Date: Thu, 09 Feb 2023 00:59:14 -0700
User-agent: Heirloom mailx 12.5 7/5/10


Thank you for your note.

> 3697ec5c  Arnold D. Robbins  Thu Jul 15 23:12:49 2010 +0300  Moved to gawk 2.11.

Although this is dated 2010, you'll note the comment that mentions
gawk 2.11.  It was in 2010 that I built the Git repo based on older
versions. 2.11 dates from approximately 1989, so the change is around 33
years old!

Unsurprisingly, I don't remember the details from that long ago.

I suspect that it was to ensure correct semantics when doing
things like

        print "some stuff that goes to stdout"
        print ... | "some command that sends to stdout"
        print "more stuff that goes to stdout"

In such a case, the interleaved output has to come out in the
correct order.
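
To make the ordering concern concrete, here is a minimal sketch; it is
not taken from the gawk sources. It assumes a POSIX shell and any awk
that provides fflush(). The function name ordered_demo and the use of
"cat" as the child command are purely illustrative. The sketch forces
the order explicitly with fflush() and close(), so the result does not
depend on how awk buffers the pipe.

```shell
# Hypothetical demo: interleave awk's own output with output from a child
# command that writes to the same stdout.
ordered_demo() {
    awk 'BEGIN {
        print "first: written by awk itself"
        fflush()        # push the pending stdout data before the child writes
        print "second: written by cat" | "cat"
        close("cat")    # wait until cat has finished and exited
        print "third: written by awk itself"
    }'
}

ordered_demo
```

With the fflush() and close() calls in place, the three lines should
appear in program order even when stdout is a file or a pipe.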

I will investigate possible changes that would enable buffered
output to pipes while not breaking any semantics.
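
In the meantime, a user-level workaround might look like the sketch
below. It is an assumption on my part, not measured and not a gawk
feature: the script collects lines in a string and hands them to the
pipe in batches, so that each flush carries many records instead of
one. The helper name batched_pipe, the variable names, and the batch
size of 1000 are all hypothetical. The output path is pasted into the
command string, so it must not contain double quotes or other shell
metacharacters.

```shell
# Hypothetical helper: copy stdin to the file named by $1 through a
# "cat > file" pipe, writing 1000 lines per printf instead of one.
batched_pipe() {
    awk -v out="$1" -v batch=1000 '
        BEGIN { cmd = "cat > \"" out "\"" }
        {
            buf = buf $0 "\n"              # accumulate the record
            if (++n >= batch) {            # batch is full: one printf to the pipe
                printf "%s", buf | cmd
                buf = ""
                n = 0
            }
        }
        END {
            if (n > 0)                     # write the final partial batch
                printf "%s", buf | cmd
            close(cmd)                     # wait for cat to finish
        }
    '
}
```

For example, "seq 1 2500 | batched_pipe /tmp/foo" should leave 2500
lines in /tmp/foo after three writes to the pipe rather than 2500. The
cost is extra memory for the batch and delayed output, so it is
unsuitable when something is watching the pipe interactively.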

> Before this commit, the programmer had the choice; they could call fflush() or

Actually, this is incorrect; fflush() wasn't added to gawk until 3.0,
well after the above change.

By the way, you mention that you are using gawk for scientific
computing. I'm curious, can you give more detail?



<alexandre.ferrieux@orange.com> wrote:

> Hi,
> When writing into a pipe redirection:
>       gawk '{print | "cat > /tmp/foo"}'
> ... gawk *always* handles the pipe as unbuffered. This can be witnessed with an
> external "tail -f /tmp/foo".
> This makes gawk completely unusable for any heavy-duty multipipe output, as CPU
> time is dominated by single-line write() syscalls.
> By contrast, heavy-duty multifile output *is* supported:
>       gawk '{print  > "/tmp/foo"}'
> ... is fully buffered. What is the logic behind this difference ?
> Note: it can be traced to this commit:
> 3697ec5c  Arnold D. Robbins  Thu Jul 15 23:12:49 2010 +0300  Moved to gawk 2.11.
> .. with the following comment:
> >Improved handling of output bufferring:  now all print[f]s redirected to a tty
> >or pipe are flushed immediately and non-redirected output to a tty is flushed
> >before the next input record is read.
> Before this commit, the programmer had the choice; they could call fflush() or
> not, so that both "interactive" and "efficient" use cases were supported.
> Afterwards, the choice has disappeared: any write to a pipe is deemed 
> "interactive", incurring a syscall, and terrible performance.
> Can someone explain why this is an improvement ?
> PS: I do realize this has been the case for 13 years. But maybe it wasn't
> spotted before, precisely because Awk was too slow for such heavy-duty tasks
> back in the day. Now things are different: Awk is a serious candidate for
> scientific computing, and such details are just starting to be a problem.
