
Re: [Chicken-hackers] substring function and bounds checks


From: Peter Bex
Subject: Re: [Chicken-hackers] substring function and bounds checks
Date: Wed, 6 Feb 2013 10:21:56 +0100
User-agent: Mutt/1.4.2.3i

On Wed, Feb 06, 2013 at 12:55:16AM +0100, Michele La Monaca wrote:
> What I can say... Well, maybe one day I will see the light, in the
> meanwhile I would just have preferred a more useful substring
> function. I really think that the one provided by chicken is simply
> not on par with other languages, sorry.

This is not a question of being "on par" or not.  Some other languages
choose to let nonsensical requests like "give me characters 0
through 10 from this 3-character string" return the original string.

Substring is a silly and trivial example but it clearly illustrates the
deep fundamental differences in philosophy between the different
cultures of these languages.

The most valuable gift a computer programming language can give you is
the ability to reason about a piece of code without extra information.
For example, if you see (substring s 0 10) you immediately *know* in
any code that follows:

- The variable s is a string
- The string s is at least 10 characters long
- The returned value is a string
- The returned string is exactly 10 characters long

If any of the above fails to hold, you'll have an error situation and
the code following the substring call will not be executed.
In "sloppy" languages, you lose several of these important footholds,
which means you can't reason so well about your code's correctness
anymore, except by reading back and dragging in more context.
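
A minimal sketch of those guarantees, assuming CHICKEN 5 (on CHICKEN 4 the
import line should not be needed).  The name first-ten and the symbol
'rejected are made up for illustration only:

```scheme
;; Assumption: CHICKEN 5, where handle-exceptions lives in (chicken condition).
(import (chicken condition))

;; Hypothetical helper: returns the first 10 characters of s, or the
;; symbol 'rejected when substring signals an error.
(define (first-ten s)
  (handle-exceptions exn
    'rejected
    (substring s 0 10)))

;; 12 characters long, so the call succeeds with exactly 10 characters.
(define ok (first-ten "hello, world"))   ; => "hello, wor"

;; Only 3 characters long, so substring signals an error and nothing
;; after it inside first-ten's body would have run.
(define bad (first-ten "abc"))           ; => rejected
```

Either you get a 10-character string back, or control never reaches the
code that depends on it.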

The above guarantees mean, for example, that if later you see
(string-ref s 8) this will *always* return a character.  In those
other languages, you won't know what exactly it'll do.  You'll have two
lines of possible future traces through your program: one where there
was a character and one where there wasn't.  Multiply this by every
sloppy operator you use and you end up with a tangled web of possible
futures.  Often, only one of those futures is what you had in mind when
writing code.  The other possibilities produce wrong computations, which
may result in data corruption or security problems.
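
To make the "two futures" concrete, here is a rough sketch.  Both
procedure names are hypothetical, and the lenient one only imitates the
"at most N characters" behavior with min; it is not taken from any
particular language:

```scheme
;; Strict: substring signals an error unless s has at least 10
;; characters, so the string-ref below can only be reached when
;; index 8 is valid.
(define (ninth-char/strict s)
  (let ((prefix (substring s 0 10)))
    (string-ref prefix 8)))

;; Lenient imitation: the prefix may be shorter than 10 characters,
;; so the caller has to add the length check back by hand and decide
;; what the "no character" future should return (#f here).
(define (ninth-char/lenient s)
  (let ((prefix (substring s 0 (min 10 (string-length s)))))
    (if (> (string-length prefix) 8)
        (string-ref prefix 8)
        #f)))
```

With the strict version there is one path to think about after the call.
With the lenient one, every caller of ninth-char/lenient inherits the #f
case as well.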

Sloppiness promotes muddled thinking and imprecise code.  Imprecise
code is prone to bugs, which may lead to security problems.  This may
not be fair, but I think a particularly good example is the two recent
extremely dangerous vulnerabilities in Ruby on Rails which allowed for
remote code execution.

What caused these problems to happen?  Misplaced convenience and
sloppiness: Rails allowed YAML data to be embedded in XML and JSON,
which automatically got parsed.  Of course, XML and JSON are their own
formats and their specifications don't mention anything about YAML.
This means a YAML parser has nothing, and I mean *nothing* to do in
an XML parser.  It may be convenient in a small set of cases, but
it's just another case of adding more magic: it's not something you
asked for, so it shouldn't happen.

For another good example of the deep confusion caused by code that
is sloppy by design, read https://bugs.php.net/bug.php?id=54547
Read it, and also note that the list of bizarre and unexpected
consequences includes, yet again, security implications.
At least, it's unexpected to users from different programming cultures,
who expect programming languages to be precise.

Still not convinced?  Look at the incredulity of the Postgres community
in response to being shown MySQL's sloppy behavior:
http://www.postgresql.org/message-id/address@hidden
Fundamentally. Different. Cultures.

I understand you're only asking about strings, but this sort of thing
needs to be considered in the wider cultural context, and Schemers
prefer precision and correctness over sloppiness and second-guessing.

> The semantic of a
> commonly-found substring function "give me at most N chars starting
> from a certain position in the string" is the most useful according
> me. I don't see any evil in that.

See above; consider the profound implications of widespread culturally
encouraged sloppiness.  This is a really important insight that should
not be dismissed out of hand for its apparent triviality.  It's
fundamental to every program you write in a language.

> The chicken (scheme?) alternative
> "give me exactly N chars or blow up" is rather limited in scope to me.

It's a substring procedure.  It *should* be limited in scope.  It should
"do one thing, and do it well".  The Unix community learned that lesson
well after the deluge of flags that Berkeley added to cat(1).
If you want a slicing procedure that includes a kitchen sink, that can
be easily added on top of such precise, simple basics, as shown by the
"slice" egg.  If you think that's too bloated, you can include the
trivial substring/n procedure Jim posted.
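
I don't have Jim's code in front of me, so the following is only my own
guess at what such a helper could look like, not his version.  The
argument order and the decision to clamp the start index are
assumptions:

```scheme
;; Hypothetical lenient helper layered on top of the strict substring.
;; Returns at most n characters of s, beginning at index start.
;; Both ends are clamped to the length of s; a negative start or n
;; still ends up as an error from the underlying substring call.
(define (substring/n s start n)
  (let* ((len  (string-length s))
         (from (min start len))
         (to   (min (+ from n) len)))
    (substring s from to)))

(define whole  (substring/n "abc" 0 10))          ; => "abc"
(define middle (substring/n "hello, world" 7 3))  ; => "wor"
(define empty  (substring/n "abc" 5 2))           ; => ""
```

The point is that the lenient behavior is a few lines on top of the
precise primitive, while going the other way would mean putting the
checks back at every call site.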

> > Scheme is about correctness.  If you provide invalid indices, you get 
> > errors.
> 
> Well, in the real world you can't always predict input and therefore
> you must do checks.

As has been pointed out before, you'll need to add back extra checks if
you rely on the sloppy behavior and want to make it more precise, in
order to reason about it properly.  The main problem is that it's
awfully tempting to "forget" to add proper checks, and now you have
another security problem that snuck into your system.

Being precise is a virtue when telling a computer what to do.  If you
omit important details, an attacker will provide them to the computer
for you.

> Either you leave this burden to the user or you kindly provide this
> even-cobol-has-it feature.

It's a misfeature.  This is like arguing that "1" + 2 not raising an
error is an "even JavaScript provides it" feature and Scheme should add it.
We disagree, it's a bad idea.  This is fundamental to the culture of
a particular programming language and you're simply not going to get
one culture to agree that the values of another culture must be adopted
wholesale.
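
For comparison, a small sketch of how this looks in Scheme, assuming
CHICKEN 5 for the import.  The names mixed and explicit and the symbol
'type-error are just for illustration:

```scheme
;; Assumption: CHICKEN 5, where handle-exceptions lives in (chicken condition).
(import (chicken condition))

;; Adding a string to a number is reported as an error rather than
;; being guessed at.
(define mixed
  (handle-exceptions exn
    'type-error
    (+ "1" 2)))                              ; => type-error

;; If a conversion is what you want, you ask for it explicitly.
(define explicit (+ (string->number "1") 2)) ; => 3
```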

If you fundamentally disagree with the culture, you're programming in
the wrong language.  If you're still learning the language, like when
visiting a foreign country, you should delay judgement based on
preconceptions and try to understand the language's underlying culture
first before suggesting radical changes to the language.

> >This will help you detect bugs early on instead of just keep going on with a
> >bad result of an incorrect computation until some other thing fails much
> >farther along.  This kind of thing also tends to sneak in vulnerabilities, as
> >you never *really* know what your code will do in the face of 
> >inconsistencies.
> >"fail early and noisily" is good design.
> 
> To me, this only means more bloated, unreadable code in the best case,
> unfriendly crashes in the worst one.

As was pointed out elsewhere, a controlled error situation is not a
crash.  Also, errors are good; when asking nonsensical questions you
should get "no can do" as an answer, rather than nonsensical answers.
Doing anything else is just building on quicksand.  But that's just my
opinion, and the reason I like (Chicken) Scheme so much: most of its
community seems to have agreed on these values long ago.

Cheers,
Peter
-- 
http://sjamaan.ath.cx


