Re: [Chicken-hackers] CHICKEN in production


From: Oleg Kolosov
Subject: Re: [Chicken-hackers] CHICKEN in production
Date: Wed, 8 Oct 2014 01:31:30 +0400

On Oct 7, 2014, at 10:04 PM, Peter Bex <address@hidden> wrote:
> 
> On Tue, Oct 07, 2014 at 01:13:09AM +0400, Oleg Kolosov wrote:
> 
> Hello Oleg,
> 
> Thanks for providing some more information about your project!
> I think these kinds of postmortem analyses are very interesting
> and we should take to heart all the lessons learned, and use them
> to improve CHICKEN.
> 
>> We are trying to avoid using Chicken as a ‘glue’ because we figured that FFI 
>> transitions can be major bottleneck (especially strings).
> 
> The overhead of calling C should be pretty minimal in the usual cases,
> unless strings are the only problem.  If it's the only dealbreaker,
> I think that should be fixable.

Yes, FFI overhead is within 5% of a pure C program for simple cases: passing 
around immediate values and pointers. But we have a very important use case: 
fuzzy search in the song info database, which holds tens of thousands of 
records. We use a custom, highly tuned indexing algorithm. The initial 
implementation was written in Scheme; it was small and beautiful (according 
to its author) but unusably slow. We tried to tune it, but measured that the 
cost of passing strings through the FFI is still too big, and it is 
unavoidable because of copying. Additionally, there were some performance 
problems with Unicode handling. So now we use libc locale functions for 
conversions and do the indexing and processing in C. This is a pain, but it 
is at least 3 times faster than Chicken. There is still a lot of trickery on 
the GUI side to provide responsive incremental search, because the amount of 
data returned is still quite large. 
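
To illustrate the point about string copying, here is a minimal sketch of the 
general approach: the records stay on the C side, and only immediate values 
(indices and a count) cross the FFI. This is not our actual indexing code. 
The name search_records is hypothetical, and a plain strstr substring match 
stands in for the real fuzzy matcher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan records that already live in C memory and
   write the indices of matching records into out_idx (at most out_cap).
   Returns the number of indices written. A substring test stands in for
   the real fuzzy matcher, which is not shown here. */
size_t search_records(const char *const *records, size_t n_records,
                      const char *needle,
                      size_t *out_idx, size_t out_cap)
{
    assert(records != NULL && needle != NULL && out_idx != NULL);

    size_t found = 0;
    for (size_t i = 0; i < n_records && found < out_cap; i++) {
        if (strstr(records[i], needle) != NULL)
            out_idx[found++] = i;   /* only an index is handed back */
    }
    return found;
}
```

The caller then fetches the strings only for the few rows it actually 
displays, so the per-string copy is paid for a screenful of results rather 
than for the whole database.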

> 
>> And adding Chicken to a C program makes normal analysis and debugging tools 
>> pretty much useless (for finding memory leaks and such), so hardware 
>> interfacing layer is pure C with separate high level FFI bindings on top.
> 
> It takes some more practice, but debugging C code called from CHICKEN is
> quite doable in my experience, but then I've never done huge C & CHICKEN
> projects, only smaller libraries.  Could you explain a bit more what the
> problems are you ran into?

Yes, I’ve done some debugging of the generated code for the Windows port. It 
is possible in principle, but it requires some familiarity with the 
implementation and is used only as a last resort (mysterious crashes and 
such). In reality, call stacks are almost infinite, and it is hard to 
pinpoint the interesting parts within the wall of f1234 functions. Useful 
info about passed arguments and such is left in the generated comments, so 
you need to inspect the sources with the ‘list’ command to view it. We tried 
to improve this by inserting #line directives, without much success: the code 
generator is too complex, especially where the FFI is involved. So we insert 
logging statements everywhere. Unfortunately, logging considerably uglifies 
the code and makes some functional programming idioms much harder to use 
(like map/fold/cut one-liners). Also, various analysis tools like Valgrind 
and the libc malloc checkers fall flat when Chicken is involved.

> 
>> We also struggled with posix and process control functions a lot (long 
>> story), trying to be functional here backfires badly, so we ended up with 
>> straightforward and ugly code (looking like verbose C with parentheses), 
>> replacing some functions from standard library (namely process-run) and 
>> customized error handling.
> 
> Would you care to unpack this a little?

We are trying to simulate parallel processing and separate responsibilities 
with worker processes communicating through sockets. There are also message 
passing threads involved for monitoring and control. Judging by the history, 
this may be the buggiest part of the project, with numerous workarounds and 
special case handling. SIGINT handling is still buggy, but that is not 
critical for production. Yes, the task is complex, but the API is also too 
confusing and fragile. It might be adequate for C, but in Scheme a lot of 
feet got shot off.
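
For readers unfamiliar with the pattern, a minimal C sketch of one worker 
process talking to its parent over a socket pair follows. It is only an 
illustration, not our production code: ask_worker and worker_loop are 
hypothetical names, the "job" is a toy (doubling an integer), and there is no 
signal handling or monitoring thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: read int requests until EOF, reply with the double.
   Messages are a single int, so short reads are not handled here. */
static void worker_loop(int fd)
{
    int req;
    while (read(fd, &req, sizeof req) == (ssize_t)sizeof req) {
        int reply = req * 2;
        if (write(fd, &reply, sizeof reply) != (ssize_t)sizeof reply)
            break;
    }
}

/* Fork one worker, send it one request, store the answer in *reply,
   then shut the worker down and reap it. Returns 0 on success, -1 on error. */
int ask_worker(int request, int *reply)
{
    assert(reply != NULL);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {             /* child: the worker process */
        close(sv[0]);
        worker_loop(sv[1]);
        _exit(0);
    }

    close(sv[1]);               /* parent keeps only its own end */
    int ok = write(sv[0], &request, sizeof request) == (ssize_t)sizeof request
          && read(sv[0], reply, sizeof *reply) == (ssize_t)sizeof *reply;

    close(sv[0]);               /* EOF makes the worker leave its loop */
    waitpid(pid, NULL, 0);      /* reap the worker, no zombie left behind */
    return ok ? 0 : -1;
}
```

Even in this toy version, the order of the close calls and the final waitpid 
matter, which hints at why the real thing collects workarounds.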

> 
>> There was a few problems (I don’t remember clearly) with preemptive 
>> scheduling, so we are using strategically placed carefully adjusted sleeps 
>> with manual yields. I’ve borrowed a few ideas from Chicken implementation 
>> and made a video player (used for background: pure C, no FFI, no GUI) 
>> abusing libuv event loop for CPS trampoline. The code looks strange for 
>> casual observer but performs surprisingly well. I’ve not yet figured out how 
>> to wrap this for an egg (managing C callbacks is hard).
> 
> Sounds interesting.  So at least you got something out of it aside from
> just frustration ;)

There were some discussions about replacing the Chicken scheduler with the 
libuv event loop and providing a filesystem and socket API on top of it. The 
scheduler modification is necessary to block green threads in order to 
simulate synchronous calls. There is a lot of custom and confusing code in 
Chicken around the select function, with workarounds for Windows. We think 
the libuv implementation is superior. There is some proof-of-concept code, 
but we have not progressed very far with this yet.
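
As a rough illustration of the CPS trampoline idea mentioned earlier in the 
thread, here is a sketch in plain C without libuv. Each step does a little 
work and returns the next step instead of calling it, so the C stack never 
grows. All names (step, trampoline, sum_to) are made up for this example, and 
the simple while loop only stands in for the role an event loop would play.

```c
#include <assert.h>
#include <stddef.h>

/* A step is a function that does some work on the shared state and
   returns the next step to run. A NULL fn means "finished". */
typedef struct step {
    struct step (*fn)(void *state);
} step;

/* Example state: add n, n-1, ..., 1 into acc. */
typedef struct {
    long long n;
    long long acc;
} sum_state;

/* One unit of work: consume one number, then name the continuation. */
static step sum_step(void *p)
{
    sum_state *s = p;
    if (s->n == 0)
        return (step){ NULL };      /* nothing left to do */
    s->acc += s->n;
    s->n -= 1;
    return (step){ sum_step };      /* continue with the same step */
}

/* The trampoline: the only place where steps are actually called. */
void trampoline(step next, void *state)
{
    while (next.fn != NULL)
        next = next.fn(state);
}

/* Sum the integers 1..n using a constant amount of C stack. */
long long sum_to(long long n)
{
    assert(n >= 0);
    sum_state s = { n, 0 };
    trampoline((step){ sum_step }, &s);
    return s.acc;
}
```

In an event-loop variant, each step would be registered as a callback and the 
loop would decide when to run it, which is what makes blocking and resuming a 
green thread possible.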

> 
>> So, in the end, there are some great things (see video in this thread) to 
>> showcase, but for me (low-level and performance stuff mostly) it was more 
>> pain than joy.
> 
> If you can pinpoint the exact places where performance is particularly
> bad we can (at least attempt to) fix them.

Passing a large number of C strings back and forth through the FFI, and 
UTF-8 handling (we tested uppercase conversion and trimming, AFAIR). Updating 
with defstruct is horribly slow - I don’t know all the details, I just heard 
the conversation.

The scheduler is still active even with disable-interrupts. This is very hard 
to diagnose, but mysterious bugs get fixed by going down to C and not 
returning until everything is settled (like fork -> exec). It would be nice 
to have an option to get rid of it, i.e. for performance-critical parts we 
would like to have complete manual control, without interrupt handling and 
similar code being inserted.
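
A minimal sketch of what "going down to C" for fork -> exec can look like is 
shown below. The helper name spawn_and_wait is hypothetical. The point is 
that nothing returns to Scheme between fork and exec, so the scheduler never 
gets a chance to run in the child.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with `argv`, block until
   it exits, and return its exit status. Returns -1 if fork or waitpid
   fails, or if the child did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Child: go straight to exec, touching no runtime state. */
        execv(path, argv);
        _exit(127);                 /* reached only if execv failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This version blocks the caller until the child exits. A non-blocking variant 
would return the pid and leave the waitpid call to the caller.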

> 
>> There are hot internal discussions currently about migrating to something 
>> more widely supported (with proper debugger, profiler, and other useful 
>> tools) for our next big project, because a new hardware is more powerful and 
>> there are fewer restrictions.
> 
> This is a bit of a tidal function: sure, hardware gets faster every year,
> but then they invent some new class of device which is more constrained
> than the previous generation, or a new niche of computation evolves where
> every CPU cycle is precious (bitcoin mining? 3D games?).  So even though
> there are lots of people falling over eachother trying to tell you that
> "hardware doesn't matter" and you should use their slow-ass language,
> that's just bullshit: performance will *always* matter.

This is true. But our new platform is even more customized for the given use 
cases and contains various specialized hardware to assist the CPU (like DSPs 
and ADCs/DACs). It is still an early prototype, but we are discussing how 
many cycles we are ready to burn for a supposedly faster and more 
straightforward development process.

-- 
Regards, Oleg
Art-System

