bug#35350: Some compile output still leaks through with --verbosity=1

From: Ludovic Courtès
Subject: bug#35350: Some compile output still leaks through with --verbosity=1
Date: Sat, 04 May 2019 11:33:51 +0200

Hi Mark,

Mark H Weaver <address@hidden> skribis:

> Ludovic Courtès <address@hidden> writes:


>> So there are two things.  To fix the issue you reported (build output
>> that goes through), I think we must simply turn off UTF-8 decoding from
>> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.
> Can we assume that UTF-8 is the appropriate encoding for
> (current-build-output-port)?  My interpretation of the Guix manual entry
> for 'current-build-output-port' suggests that the answer should be "no".

What goes to ‘current-build-output-port’ comes from build processes.
It’s usually UTF-8 but it can be anything, including binary garbage,
which should be gracefully handled.

That’s why ‘process-stderr’ currently uses ‘read-maybe-utf8-string’.
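
To illustrate what “gracefully handled” means here, below is a minimal
sketch in Python (not Guix code).  It only shows the general idea of
lenient decoding: valid UTF-8 is decoded as usual, and invalid bytes are
replaced by U+FFFD instead of raising an error.  The name
‘decode_leniently’ is hypothetical, and this is not a description of how
‘read-maybe-utf8-string’ is actually implemented.

```python
def decode_leniently(data: bytes) -> str:
    """Decode DATA as UTF-8, substituting U+FFFD for invalid sequences.

    Hypothetical helper for illustration only; it is an assumption about
    the general approach, not a port of any Guix procedure.
    """
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Typical case: build output that really is UTF-8.
    print(decode_leniently("naïve build log\n".encode("utf-8")), end="")
    # Pathological case: arbitrary bytes; no exception is raised.
    print(repr(decode_leniently(b"\xff\xfe binary \x80")))
```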

> Also, in your previous message you wrote:
>   The problem is the first layer of UTF-8 decoding that happens in
>   ‘process-stderr’, in the ‘%stderr-next’ case.  We would need to
>   disable it, but only if the build output port is
>   ‘build-event-output-port’ (i.e., it’s capable of interpreting
>   “multiplexed build output” correctly.)
> It sounds like you're suggesting that 'process-stderr' should look to
> see if (current-build-output-port) is a 'build-event-output-port', and
> in that case it should use binary I/O primitives to write raw binary
> data to it, otherwise it should use text I/O primitives and write
> characters to it.  Do I understand correctly?

Yes.  (Actually, rather than guessing if (current-build-output-port) is
a ‘build-event-output-port’, there could be a fluid to ask for the use
of raw binary primitives.)
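
A rough sketch of that idea, in Python rather than Scheme: a
context-local variable plays the role of the fluid, and the forwarding
code consults it to choose between raw bytes and decoded text.  All
names here (‘raw_build_output’, ‘with_raw_build_output’,
‘forward_stderr_chunk’) are made up for the example; none of them exist
in Guix.

```python
import contextvars
import io
from contextlib import contextmanager

# Hypothetical stand-in for a fluid: False means "decode before writing".
raw_build_output = contextvars.ContextVar("raw_build_output", default=False)


@contextmanager
def with_raw_build_output(enabled=True):
    """Dynamically bind the flag for the extent of the 'with' block."""
    token = raw_build_output.set(enabled)
    try:
        yield
    finally:
        raw_build_output.reset(token)


def forward_stderr_chunk(chunk: bytes, port) -> None:
    """Write CHUNK to PORT, raw or leniently decoded depending on the flag."""
    if raw_build_output.get():
        port.write(chunk)  # PORT is expected to be a binary stream
    else:
        port.write(chunk.decode("utf-8", errors="replace"))  # textual stream


if __name__ == "__main__":
    binary_port, text_port = io.BytesIO(), io.StringIO()
    with with_raw_build_output():
        forward_stderr_chunk(b"h\xc3", binary_port)  # bytes pass through untouched
    forward_stderr_chunk(b"h\xc3", text_port)        # truncated sequence is replaced
    print(binary_port.getvalue(), repr(text_port.getvalue()))
```

The point of the dynamic binding is that the caller, not
‘process-stderr’, decides which mode applies, so there is no need to
inspect the port itself.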

> IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
> and specifically as a textual port of unknown encoding.

(You mean ‘current-build-output-port’, right?)

I think you’re right.  I’m not yet entirely sure what the implications
are.  There are a couple of tests in tests/store.scm for UTF-8
interpretation that describe behavior that I think we should preserve.

> I would suggest changing 'build-event-output-port' to create an R6RS
> custom *textual* output port, so that it wouldn't have to worry about
> encodings at all, and it would only be given whole characters.
> Internally, it would be doing exactly what you suggest above, but those
> details would be encapsulated within the custom textual port.
> However, I don't think we can use Guile's current implementation of R6RS
> custom textual output ports, which are currently built on Guile's legacy
> soft ports, which I suspect have a similar bug with multibyte characters
> sometimes being split (see 'soft_port_write' in vports.c).
> Having said all of this, my suggestions would ultimately entail having
> two separate places along the stderr pipeline where 'utf8->string!'
> would be used, and maybe that's too much until we have a more optimized
> C implementation of it.

Yeah, it looks like we don’t yet have custom textual output ports that we
could rely on, do we?
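
For reference, the kind of bug being discussed (a multibyte character
split across two writes) can be reproduced outside Guile.  The sketch
below is Python, purely as an illustration: decoding each chunk
independently corrupts a character that straddles the chunk boundary,
whereas a stateful incremental decoder buffers the partial sequence
until the rest arrives.  The function names are hypothetical.

```python
import codecs


def decode_chunks_naively(chunks):
    """Decode every chunk on its own; split characters become U+FFFD."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_chunks_incrementally(chunks):
    """Decode with one stateful decoder that keeps partial sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(c) for c in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(parts)


if __name__ == "__main__":
    data = "héllo".encode("utf-8")   # 'é' is encoded as two bytes
    chunks = [data[:2], data[2:]]    # the boundary falls inside 'é'
    print(repr(decode_chunks_naively(chunks)))
    print(repr(decode_chunks_incrementally(chunks)))
```

Any layer that decodes on chunk boundaries needs this kind of state,
which is why doing the decoding in two separate places is delicate.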

I support your work to add that in Guile proper!


