Re: Converting a part of byte vector to UTF-8 string
From: Mark H Weaver
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 13:29:55 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Panicz Maciej Godek <address@hidden> writes:
> Your solution seems reasonable, but I have found another way, which
> led me to some new problems.
> I realised that since sockets are ports in Guile, I could process them
> with the plain "read" (which is what I have been using them for
> anyway).
>
> However, this approach caused some new problems. The thing is that if
> I'm trying to read a message from a port, and that message does not
> end with a delimiter (like whitespace or a balancing closing
> parenthesis), then the read would wait forever, possibly gluing the
> message together with the next one.
>
> The solution I came up with uses soft ports. The idea is to have
> a port proxy that, if a read would block, returns an eof-object
> instead.
This is terribly inefficient, and also not robust. Guile's native soft
ports do not support efficient reading, because they handle everything
one character at a time. Also, Guile's 'char-ready?' currently does the
job of 'u8-ready?', i.e. it only checks whether a _byte_ is available,
not a whole character, so 'read-char' might still block when only part
of a multi-byte character has arrived. Anyway, if this is a socket, what
if the data isn't available simply because of network latency? Then
you'll generate a spurious EOF.
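The 'char-ready?' problem can be seen without any sockets: UTF-8 encodes every non-ASCII character in two to four bytes. The sketch below is plain Guile using only the R6RS bytevector procedures (the variable names are illustrative). It shows a single character encoding to two bytes, so a port that has received only the first of them has a byte ready but not a character.

```scheme
(use-modules (rnrs bytevectors))

;; GREEK SMALL LETTER LAMBDA, U+03BB: a single character...
(define lambda-string (string #\x3bb))

;; ...that UTF-8 encodes as two bytes, #xCE followed by #xBB.
;; If only #xCE has arrived, a byte is "ready" but a character is not.
(define lambda-bytes (string->utf8 lambda-string))

(display (string-length lambda-string))    ; prints 1 (characters)
(newline)
(display (bytevector-length lambda-bytes)) ; prints 2 (bytes)
(newline)
```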
To offer my own answer to your original question: R7RS-small provides an
API that does precisely what you asked for. Its 'utf8->string'
procedure accepts optional 'start' and 'end' byte positions. I
implemented this on the 'r7rs-wip' branch of Guile git as follows:
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=module/scheme/base.scm;h=f110d4c2b241ec0941b4223cece05c309db5308a;hb=r7rs-wip#l327
  (import (rename (rnrs bytevectors)
                  (utf8->string r6rs-utf8->string)
                  (string->utf8 r6rs-string->utf8)
                  (bytevector-copy r6rs-bytevector-copy)
                  (bytevector-copy! r6rs-bytevector-copy!)))
  [...]
  ;; R7RS 'bytevector-copy': copy all of BV, or only the bytes from
  ;; START (inclusive) to END (exclusive, defaulting to the end of BV).
  (define bytevector-copy
    (case-lambda
      ((bv)
       (r6rs-bytevector-copy bv))
      ((bv start)
       (let* ((len (- (bytevector-length bv) start))
              (result (make-bytevector len)))
         ;; R6RS order: source, source-start, target, target-start, count.
         (r6rs-bytevector-copy! bv start result 0 len)
         result))
      ((bv start end)
       (let* ((len (- end start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))))

  ;; R7RS 'utf8->string': decode all of BV, or only the byte range
  ;; [START, END), by copying that range and passing it to the R6RS
  ;; decoder.
  (define utf8->string
    (case-lambda
      ((bv) (r6rs-utf8->string bv))
      ((bv start)
       (r6rs-utf8->string (bytevector-copy bv start)))
      ((bv start end)
       (r6rs-utf8->string (bytevector-copy bv start end)))))
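On a Guile release without the 'r7rs-wip' code, the same copy-then-decode approach can be written directly on top of (rnrs bytevectors). What follows is a minimal standalone sketch, not part of any Guile API. The helper name 'utf8-slice->string' is hypothetical, and START and END are byte offsets that are assumed to fall on character boundaries.

```scheme
;; Rename R6RS 'bytevector-copy!' (as the library code above does) so its
;; argument order cannot be confused with the R7RS procedure of that name.
(use-modules ((rnrs bytevectors)
              #:select (make-bytevector bytevector-length
                        utf8->string string->utf8
                        (bytevector-copy! . r6rs-bytevector-copy!))))

;; Hypothetical helper: decode the bytes of BV in [START, END) as UTF-8.
;; START and END are byte offsets, not character indices; START defaults
;; to 0 and END to the length of BV.
(define* (utf8-slice->string bv #:optional (start 0)
                             (end (bytevector-length bv)))
  (let* ((len (- end start))
         (slice (make-bytevector len)))
    ;; R6RS order: source, source-start, target, target-start, count.
    (r6rs-bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

;; Usage: "hello, world" is pure ASCII, so bytes 7..12 are "world".
(define buf (string->utf8 "hello, world"))
(display (utf8-slice->string buf 7 12)) ; prints world
(newline)
```

Like the R7RS version, this pays for one intermediate copy of the slice in exchange for using only the standard R6RS decoder. A slice that cuts a multi-byte sequence in half will not decode to the intended text.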