Re: Worrying development
From: Tom Lord
Subject: Re: Worrying development
Date: Fri, 23 Jan 2004 14:18:25 -0800 (PST)
> From: Marius Vollmer <address@hidden>
> Tom Lord <address@hidden> writes:
> > Mutation-sharing shared substrings are an upwards compatible extension
> > to the Scheme standard. They break no correct programs. They enable
> > new kinds of programs.
> I'd say that the real 'trouble' is that strings are mutable at
> all.
Worried mostly about variable-length character encodings in strings?
Or you'd just rather be programming in an ML-family language? :-)
If it's variable-length encodings that irk you: if strings were
read-only you'd want to optimize the heck out of STRING-APPEND and
SUBSTRING and, once you did that, you'd have essentially enough
machinery to do mutations efficiently.
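One way to read that claim, as a minimal sketch (Python for brevity, not Guile or Pika code): if STRING-APPEND and SUBSTRING share structure instead of copying, a "mutation" can be built from those two primitives alone. The names Rope, append, substring, with_char and to_str are hypothetical, and the tree is left unbalanced to keep the example short.

```python
class Rope:
    """Immutable string: a leaf holding text, or the concatenation of two ropes."""
    __slots__ = ("text", "left", "right", "length")

    def __init__(self, text=None, left=None, right=None):
        self.text, self.left, self.right = text, left, right
        self.length = len(text) if text is not None else left.length + right.length


def append(a, b):
    """STRING-APPEND without copying: the result shares both operands."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Rope(left=a, right=b)


def substring(r, start, end):
    """SUBSTRING that reuses untouched subtrees of r; only leaves are sliced."""
    if not 0 <= start <= end <= r.length:
        raise IndexError((start, end))
    if start == 0 and end == r.length:
        return r  # whole node requested: share it
    if r.text is not None:
        return Rope(r.text[start:end])
    mid = r.left.length
    if end <= mid:
        return substring(r.left, start, end)
    if start >= mid:
        return substring(r.right, start - mid, end - mid)
    return append(substring(r.left, start, mid), substring(r.right, 0, end - mid))


def with_char(r, i, ch):
    """Functional STRING-SET!: built only from append and substring."""
    return append(append(substring(r, 0, i), Rope(ch)),
                  substring(r, i + 1, r.length))


def to_str(r):
    """Flatten a rope back into an ordinary string (for inspection only)."""
    return r.text if r.text is not None else to_str(r.left) + to_str(r.right)
```

An in-place STRING-SET! could then be a mutable cell whose contents are swapped for the rope returned by with_char. Whether that counts as "efficient" depends on keeping the tree balanced, which this sketch does not attempt.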
> Also, I still like the idea of using mutation-sharing substrings as
> markers that allow O(1) access into variable-width encoded strings.
Interesting. The interaction with STRING-SET! will be tricky. I
think you'll either have to "timestamp" strings (one tick per mutation
-- and you'll likely have to use a GC'ed value rather than an inline
integer for timestamps) or wind up with O(K) for mutations where K is
the number of shared substrings.
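A rough sketch of the timestamp variant, under stated assumptions (Python for brevity; VarString, Marker and _char_len are hypothetical names, and UTF-8 stands in for whatever variable-width encoding is used). Each mutation installs a fresh stamp object, which is one reading of the "GC'ed value" remark; a marker caches a byte offset together with the stamp it saw and rescans only when the stamp has changed.

```python
def _char_len(first_byte):
    """Length in bytes of a UTF-8 sequence, judged from its first byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    return 4


class VarString:
    """Toy string stored as UTF-8 bytes, plus a stamp replaced on every mutation."""

    def __init__(self, text):
        self.data = bytearray(text.encode("utf-8"))
        self.stamp = object()  # heap-allocated stamp, compared by identity

    def offset(self, index):
        """O(n) scan for the byte offset of character number `index`."""
        pos = 0
        for _ in range(index):
            pos += _char_len(self.data[pos])
        return pos

    def ref(self, index):
        pos = self.offset(index)
        return self.data[pos:pos + _char_len(self.data[pos])].decode("utf-8")

    def set(self, index, ch):
        """STRING-SET!: may change the byte length, so every cached offset is suspect."""
        pos = self.offset(index)
        self.data[pos:pos + _char_len(self.data[pos])] = ch.encode("utf-8")
        self.stamp = object()  # one tick per mutation


class Marker:
    """Cached position into a VarString: O(1) while the string is unmodified."""

    def __init__(self, string, index):
        self.string, self.index = string, index
        self._pos = string.offset(index)
        self._stamp = string.stamp

    def ref(self):
        s = self.string
        if self._stamp is not s.stamp:  # stale: rescan once, then cache again
            self._pos = s.offset(self.index)
            self._stamp = s.stamp
        return s.data[self._pos:self._pos + _char_len(s.data[self._pos])].decode("utf-8")
```

The O(K) alternative would instead have set() walk a list of all K live markers and adjust each cached offset eagerly, which keeps marker reads O(1) at the cost of slower mutations.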
The same problem comes up if you add STRING-RESIZE!. I keep going
back and forth on whether or not strings should be the same thing as,
or a subset of, buffers vs. making buffers a completely separate type.
(The latter certainly seems to be easier to implement.)
> Also, there is the possibility on the horizon that we turn
> string-ref etc into 'primitive generics' which means that people
> could implement new kinds of strings using GOOPS.
Well, heck. In that case, maybe consider what I'm planning for Pika
(at least initially). Purely ASCII strings are stored 1 byte per
character. Most other strings, 2 bytes per character. Strings using
characters outside the Basic Multilingual Plane, 4 bytes per
character.
You want some fancier-than-libc string functions in C for that -- but
it gives you an expected-case O(1) for STRING-REF and STRING-SET! and
pretty good space efficiency. It also gives you some performance
glitches as when you store a U+0100 character in an otherwise purely
ASCII 10MB string. (We're working on providing such fancier-than-libc
functions in libhackerlab -- so they'd be available independently of
Pika if you went this route.)
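To make the representation concrete, here is a minimal sketch under stated assumptions (Python for brevity; it is not Pika or libhackerlab code, and WidthString and _width_for are hypothetical names). Every character in a given string occupies the same number of bytes, so indexing is a multiplication; storing a character that does not fit forces a one-time O(n) widening copy, which is the performance glitch described above.

```python
def _width_for(codepoint):
    """Bytes per character needed to hold this code point: 1, 2, or 4."""
    if codepoint < 0x80:
        return 1  # purely ASCII
    if codepoint <= 0xFFFF:
        return 2  # rest of the Basic Multilingual Plane
    return 4


class WidthString:
    """Toy string with one fixed character width per string."""

    def __init__(self, text):
        self.width = max((_width_for(ord(c)) for c in text), default=1)
        self.buf = bytearray()
        for c in text:
            self.buf += ord(c).to_bytes(self.width, "little")

    def __len__(self):
        return len(self.buf) // self.width

    def ref(self, i):
        """STRING-REF: O(1), a plain offset computation."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        w = self.width
        return chr(int.from_bytes(self.buf[i * w:(i + 1) * w], "little"))

    def set(self, i, ch):
        """STRING-SET!: O(1) unless ch forces a wider representation."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        need = _width_for(ord(ch))
        if need > self.width:
            self._widen(need)  # O(n) copy of the whole string
        w = self.width
        self.buf[i * w:(i + 1) * w] = ord(ch).to_bytes(w, "little")

    def _widen(self, new_width):
        """Re-store every character at new_width bytes (little-endian zero-extend)."""
        old = self.width
        out = bytearray()
        for off in range(0, len(self.buf), old):
            out += self.buf[off:off + old] + bytes(new_width - old)
        self.buf = out
        self.width = new_width
```

For example, calling set(1, "\u0100") on WidthString("hello") doubles the buffer from 5 to 10 bytes; on a 10MB ASCII string the same call would copy the entire string.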
-t