
From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Mon, 30 Mar 2015 09:59:12 -0000

#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
  Reporter:  syn         |       Owner:  ashinn 
      Type:  defect      |      Status:  closed 
  Priority:  major       |   Milestone:  someday
 Component:  extensions  |     Version:  4.9.x  
Resolution:  invalid     |    Keywords:  utf8   
-------------------------+--------------------------------------------------

Comment(by syn):

 Replying to [comment:6 ashinn]:
 > That's not a complete test,

 What's missing?


 > and you're using different code now.

 I was using `string-for-each` in my initial example to illustrate the
 general issue, but that doesn't lend itself well to a test, so I
 switched to `string->list` instead. As both procedures rely on the same
 UTF-8 decoder internally, the code is essentially equivalent AFAICT.


 > (use utf8) puts the standard procedures in utf8 mode.  If you
 > pass valid inputs to those procedures and get an invalid output
 > it's a bug, and I will fix it.  If you pass invalid inputs, you get
 > undefined results.  Both of your examples are of invalid inputs,
 > created outside of utf8.

 Yep, that's exactly the point: passing strings that were created without
 any of the `utf8` string constructors. Please also read my second-to-last
 reply again: I agree with you about preserving the current behavior of the
 decoder procedures. Instead, we should provide validation procedures for
 users who need to deal with strings they received from untrusted sources
 (e.g. from third-party libraries which don't use the `utf8` procedures).

 I think the issue boils down to the fact that the `utf8` egg overloads /
 re-uses the core string type but currently doesn't provide a predicate to
 check whether a string is actually valid for use with its API.
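
 For illustration, a minimal sketch of what such a predicate might look
 like, assuming CHICKEN 4 (as in this ticket) with the `srfi-4` unit. The
 name `valid-utf8-string?` is hypothetical and not part of the `utf8` egg.
 It copies the string's raw bytes via `string->blob`, so the check never
 goes through a `utf8`-overridden `string-ref`, and accepts only
 well-formed UTF-8 as defined by RFC 3629 (no overlong forms, no
 surrogates, nothing above U+10FFFF).

 ```scheme
 ;; Hypothetical helper (not part of the utf8 egg): #t if the raw bytes
 ;; of STR form well-formed UTF-8 per RFC 3629. Assumes CHICKEN 4.
 (use srfi-4) ; blob->u8vector, u8vector-ref, u8vector-length

 (define (valid-utf8-string? str)
   ;; Work on a byte copy so utf8-mode string-ref is never involved.
   (let* ((bytes (blob->u8vector (string->blob str)))
          (len (u8vector-length bytes)))
     (define (byte i) (u8vector-ref bytes i))
     (define (in? b lo hi) (and (<= lo b) (<= b hi)))
     ;; #t if N bytes follow position I, the first of them lies in
     ;; [LO2, HI2] and the remaining ones are continuation bytes.
     (define (tail-ok? i n lo2 hi2)
       (and (< (+ i n) len)
            (in? (byte (+ i 1)) lo2 hi2)
            (let loop ((k 2))
              (or (> k n)
                  (and (in? (byte (+ i k)) #x80 #xbf)
                       (loop (+ k 1)))))))
     ;; Walk the lead bytes, using the ranges of well-formed sequences.
     (let loop ((i 0))
       (if (= i len)
           #t
           (let ((b (byte i)))
             (cond ((<= b #x7f) (loop (+ i 1)))
                   ((in? b #xc2 #xdf)
                    (and (tail-ok? i 1 #x80 #xbf) (loop (+ i 2))))
                   ((= b #xe0)
                    (and (tail-ok? i 2 #xa0 #xbf) (loop (+ i 3))))
                   ((or (in? b #xe1 #xec) (in? b #xee #xef))
                    (and (tail-ok? i 2 #x80 #xbf) (loop (+ i 3))))
                   ((= b #xed) ; excludes UTF-16 surrogates
                    (and (tail-ok? i 2 #x80 #x9f) (loop (+ i 3))))
                   ((= b #xf0)
                    (and (tail-ok? i 3 #x90 #xbf) (loop (+ i 4))))
                   ((in? b #xf1 #xf3)
                    (and (tail-ok? i 3 #x80 #xbf) (loop (+ i 4))))
                   ((= b #xf4) ; caps code points at U+10FFFF
                    (and (tail-ok? i 3 #x80 #x8f) (loop (+ i 4))))
                   (else #f)))))))
 ```

 A caller could run this on strings obtained from third-party code before
 handing them to the `utf8`-mode procedures; for example,
 `(valid-utf8-string? (blob->string (u8vector->blob (u8vector #xc3 #x28))))`
 returns `#f` because `#x28` is not a continuation byte.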

 I hope that clarifies my point :-) So again: Would you be interested in
 integrating such a validation predicate with the `utf8` egg? I think it
 would belong there but I can also make it a separate egg if you prefer.

-- 
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:8>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.
