[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)
From: |
Ludovic Courtès |
Subject: |
Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs) |
Date: |
Wed, 03 Apr 2013 22:11:32 +0200 |
User-agent: |
Gnus/5.130005 (Ma Gnus v0.5) Emacs/24.3 (gnu/linux) |
Mark H Weaver <address@hidden> skribis:
> address@hidden (Ludovic Courtès) writes:
>> Woow, well thought out. The semantics seem good. (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me. The test suite is nice. It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work-in-progess, but I've added quite a few
> more since you last looked.
Nice.
>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.
Perfect.
>>> +{
>>> + scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> + int result;
>>> + int i = 0;
>>> +
>>> + while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> + {
>>> + pt->read_pos++;
>>> + i++;
>>> + }
>>> +
>>> + result = (i == len);
>>> +
>>> + while (i > 0)
>>> + scm_unget_byte (bytes[--i], port);
>>> +
>>> + return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes. Bytes are only consumed if are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1? What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF). Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)
Ah yes, indeed.
[...]
> Here's the new patch. Any more suggestions?
Not from me! OK to commit as far as I’m concerned.
Thank you!
Ludo’.
- [PATCH] Improve handling of Unicode byte-order marks (BOMs), Mark H Weaver, 2013/04/03
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Mark H Weaver, 2013/04/03
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Ludovic Courtès, 2013/04/03
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Andy Wingo, 2013/04/04
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Mark H Weaver, 2013/04/05
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Mike Gran, 2013/04/05
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Ludovic Courtès, 2013/04/05
- Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs), Mark H Weaver, 2013/04/05