guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)


From: Ludovic Courtès
Subject: Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)
Date: Wed, 03 Apr 2013 22:11:32 +0200
User-agent: Gnus/5.130005 (Ma Gnus v0.5) Emacs/24.3 (gnu/linux)

Mark H Weaver <address@hidden> skribis:

> address@hidden (Ludovic Courtès) writes:
>> Woow, well thought out.  The semantics seem good.  (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work-in-progess, but I've added quite a few
> more since you last looked.

Nice.

>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.

Perfect.

>>> +{
>>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> +  int result;
>>> +  int i = 0;
>>> +
>>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> +    {
>>> +      pt->read_pos++;
>>> +      i++;
>>> +    }
>>> +
>>> +  result = (i == len);
>>> +
>>> +  while (i > 0)
>>> +    scm_unget_byte (bytes[--i], port);
>>> +
>>> +  return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes.  Bytes are only consumed if are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF).  Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)

Ah yes, indeed.

[...]

> Here's the new patch.  Any more suggestions?

Not from me!  OK to commit as far as I’m concerned.

Thank you!

Ludo’.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]