From: Raymond Toy
Subject: Re: [Gcl-devel] [Maxima-discuss] [Maxima-commits] [git] Maxima CAS branch, master, updated. branch-5_37-base-91-gd9bf6ff
Date: Sat, 10 Oct 2015 08:13:20 -0700
User-agent: Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b33 (darwin)
>>>>> "Camm" == Camm Maguire <address@hidden> writes:
Camm> Greetings!
Camm> Raymond Toy <address@hidden> writes:
>> I, unfortunately, don't have great hope of seeing gcl with unicode any
>> time soon because the plan for supporting unicode is really
>> complicated. [1][2]
>>
>> --
>> Ray
>>
>> [1] UTF-8 strings with 21-bit Lisp character. I don't know how that's
>> going to work reliably when you can index at random points in the
>> string and also insert random characters into a utf-8 code
>> sequence.
>> [2] I suggested a really simple utf-16 with 16-bit chars to simplify
>> the implementation and still cover 99-44/100% of the use cases.
>> This is way easier to do with very minimal code changes.
>>
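A small illustration of the two footnotes above, using plain Python bytes rather than anything from GCL (all variable names here are made up for the example). It shows why overwriting one character in a UTF-8 buffer can shift everything after it, and why 16-bit UTF-16 units cover the Basic Multilingual Plane but need two units beyond it.

```python
# Illustration only: ordinary Python byte buffers, not GCL internals.
buf = bytearray("naive".encode("utf-8"))   # 5 characters, 5 bytes

# Replace character 2 ('i', 1 byte) with 'ï' (2 bytes in UTF-8).
# This is not a one-byte overwrite: the tail of the buffer has to move.
buf[2:3] = "ï".encode("utf-8")
utf8_len_after = len(buf)
decoded = buf.decode("utf-8")

# Character index and byte offset no longer agree:
# 'v' is character 3 but sits at byte offset 4.
byte_offset_of_v = buf.index(b"v")

# With 16-bit code units, a BMP character is one unit,
# but a character outside the BMP needs a surrogate pair (two units).
bmp_units = len("ï".encode("utf-16-le")) // 2
astral_units = len("\U0001F600".encode("utf-16-le")) // 2
```

Footnote [2]'s trade-off is visible in the last two lines: one unit per character holds for everything in the BMP and fails only for the supplementary planes.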
Camm> Perhaps I should weigh in here. I do have a branch starting utf8
Camm> unicode character support, but it will have to wait until post 2.6.13.
That's really great news!
Camm> Emacs takes this strategy, so I know it's doable, and the performance is
Camm> probably a net win as the gc overhead of the larger strings will
Camm> outweigh the string access times, I'm guessing. We also had a
Camm> discussion on gcl-devel that the current approach of defining a
Camm> character to be a byte, and relying on terminals etc. to do the
Camm> translation, is legal, although not desirable as a permanent solution.
Camm> I can outline the algorithm if there is interest, but essentially a
Camm> simple one entry cache to cover the vast majority of cases of sequential
Camm> access (utf8 can do this backwards as well) together with a log(N)
Camm> special character counting from the beginning, cache, or end (making
Camm> use of parallelism in long integers) for random access, appears quite
Camm> serviceable. This is not that complicated, and can be source inlined
Camm> escaping out the most common case of no special bytes, which can be
Camm> indicated by a flag in the header.
O(log(n)) access on strings certainly breaks people's assumptions on
O(1) array access. Keeping the cache consistent seems error prone,
but I suppose most strings aren't modified at all. Strings are
probably composed of shorter strings and not modified in-place.
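To make the idea under discussion concrete, here is a rough sketch of a UTF-8 string with a one-entry cache mapping a character index to a byte offset. It is not Camm's branch or GCL code: the class `Utf8String` and its methods are hypothetical, and a plain byte-by-byte walk stands in for the word-parallel counting he mentions. The `ascii_only` flag plays the role of the header flag for strings with no special bytes.

```python
class Utf8String:
    """Hypothetical sketch: UTF-8 bytes plus a one-entry index cache."""

    def __init__(self, text):
        self.data = bytearray(text.encode("utf-8"))
        # "Header flag": with no byte >= 0x80, char index == byte offset.
        self.ascii_only = all(b < 0x80 for b in self.data)
        # Characters = bytes that are not continuation bytes (10xxxxxx).
        self.length = sum(1 for b in self.data if (b & 0xC0) != 0x80)
        self.cache = (0, 0)  # (character index, byte offset)

    def _is_cont(self, off):
        return (self.data[off] & 0xC0) == 0x80

    def _char_end(self, off):
        end = off + 1
        while end < len(self.data) and self._is_cont(end):
            end += 1
        return end

    def _offset(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        if self.ascii_only:
            return index  # common case: O(1), no scanning
        # Start from whichever of beginning, cache, or end is nearest.
        starts = [(0, 0), self.cache, (self.length, len(self.data))]
        ci, off = min(starts, key=lambda s: abs(s[0] - index))
        while ci < index:  # walk forward one character at a time
            off = self._char_end(off)
            ci += 1
        while ci > index:  # UTF-8 can also be walked backwards
            off -= 1
            while self._is_cont(off):
                off -= 1
            ci -= 1
        self.cache = (index, off)
        return off

    def char_at(self, index):
        off = self._offset(index)
        return self.data[off:self._char_end(off)].decode("utf-8")

    def set_char(self, index, ch):
        off = self._offset(index)
        # The replacement may have a different byte length, so splice.
        self.data[off:self._char_end(off)] = ch.encode("utf-8")
        self.ascii_only = self.ascii_only and ord(ch) < 0x80
        # The cached entry points at the start of the replaced character,
        # which has not moved, so it stays consistent.
        self.cache = (index, off)
```

In a sequential loop such as `for i in range(s.length): s.char_at(i)`, each lookup starts from the cached previous position and advances one character. A jump to an arbitrary index costs a scan from the nearest of the three starting points, which is the part that stops being O(1). In-place modification is where consistency matters: this sketch re-points the cache at the modified character, whose starting offset is unchanged.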
Best of luck! I'm looking forward to this.
Camm> (BTW, I've also put in open-stream-p for you in 2.6.13pre.)
Great!
--
Ray