
Re: Proposal: Making datamash extendable

From: Shawn Wagner
Subject: Re: Proposal: Making datamash extendable
Date: Fri, 20 May 2022 05:37:00 -0700

I need to generate a few gigs of random data and do some actual profiling and benchmarks, but I doubt there'd be a noticeable performance hit on modern (or even not-so-modern) computers. And Guile scripts, at least, can be compiled to bytecode ahead of time so they load faster. But moving existing operations out into scripts isn't something I'm really committed to; it was just a thought. No matter what, I do want to refactor the internals to make it simpler to add new operations in C. That refactoring can certainly be done with an eye to making it possible to load operations from dynamic libraries; there's no reason you can't have both options available.
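To make the "simpler to add new operations in C" goal concrete, here is a minimal sketch of a table-driven registry, where adding an operation means writing one function and adding one table row. Everything in it is hypothetical and for illustration only: `struct operation`, `scalar_fn`, `find_operation`, and the `negate` entry are assumptions, not datamash's actual internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a scalar numeric operation. */
typedef double (*scalar_fn)(double);

/* One self-describing entry per operation (illustrative layout only). */
struct operation {
    const char *name;   /* name used on the command line */
    const char *help;   /* one-line description */
    scalar_fn   apply;  /* implementation */
};

static double op_add1(double n)   { return n + 1.0; }
static double op_negate(double n) { return -n; }

/* Adding an operation means adding one function and one row here. */
static const struct operation operations[] = {
    { "add1",   "Add 1 to the value", op_add1   },
    { "negate", "Negate the value",   op_negate },
};

/* Look up an operation by name; returns NULL if it is unknown. */
const struct operation *find_operation(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof operations / sizeof operations[0]; i++) {
        if (strcmp(operations[i].name, name) == 0)
            return &operations[i];
    }
    return NULL;
}
```

With a layout like this, the lookup miss (a `NULL` return) is the natural place to fall back to a script or a dynamic library, which is how both options could coexist.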

On Wed, May 18, 2022 at 2:25 PM Tim Rice <trice@posteo.net> wrote:
Hey Shawn,

I like the idea of making datamash more easily extendable. On the other hand, I have concerns about the performance hit of moving core functionality out to any scripting language.

An idea that comes to mind is using something like Bash's dynamically-loadable builtins. We could have it so that datamash is able to read extra object files from a particular directory. Since they are dynamically linked after being compiled, I believe (correct me if I'm wrong) they would or could be language-agnostic. People could then write extensions with C, Fortran, or whatever. Even assembly if that's the way they like to party :)
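A minimal sketch of how such loading could work on a POSIX system, using `dlopen` and `dlsym`. The function name `load_operation`, the `scalar_fn` signature, and the convention of one exported symbol per operation are all assumptions made for illustration; nothing here is existing datamash code.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical signature every extension is assumed to export. */
typedef double (*scalar_fn)(double);

/*
 * Load `symbol` from the shared object at `path`.
 * Returns NULL if the file cannot be loaded or the symbol is missing.
 * On success the library handle is deliberately left open, because the
 * returned function pointer is only valid while the library is loaded.
 */
scalar_fn load_operation(const char *path, const char *symbol)
{
    assert(path != NULL && symbol != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return NULL;
    }

    void *sym = dlsym(handle, symbol);
    if (sym == NULL) {
        fprintf(stderr, "no symbol %s in %s\n", symbol, path);
        dlclose(handle);
        return NULL;
    }

    /* Convert the object pointer to a function pointer the way the
       POSIX dlsym documentation suggests. */
    scalar_fn fn;
    *(void **)(&fn) = sym;
    return fn;
}
```

On the language-agnostic point: this works for any language whose compiler can produce a shared object exporting a symbol with the C calling convention, so the extension author only has to match the agreed signature. Older glibc versions need `-ldl` at link time.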

Another option would be to do what Git does: a "core" program which basically just searches the path for any other program prefixed with `git-` and farms out the rest of the arguments to that subprogram. This would make datamash very easy to extend, with the main problem being it would certainly destroy backwards compatibility in heavy-handed ways.
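A rough sketch of that dispatch pattern, assuming a hypothetical `datamash-<operation>` naming scheme modeled on Git's `git-` prefix. The helper names `subcommand_name` and `run_subcommand` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Build "datamash-<op>" into buf.
   Returns 0 on success, -1 if the name does not fit or formatting fails. */
int subcommand_name(const char *op, char *buf, size_t size)
{
    assert(op != NULL && buf != NULL);
    int n = snprintf(buf, size, "datamash-%s", op);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Replace the current process with datamash-<op>, searched for on PATH.
   `args` is the NULL-terminated argument vector handed to the subprogram
   (args[0] is conventionally the program name).
   Only returns, with -1, if the subprogram could not be started. */
int run_subcommand(const char *op, char *const args[])
{
    char name[256];
    if (subcommand_name(op, name, sizeof name) != 0)
        return -1;
    execvp(name, args);   /* does not return on success */
    perror(name);
    return -1;
}
```

Because `execvp` searches `PATH`, anything executable with the right name would count as an extension, regardless of the language it was written in.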

If people do want to use scripting languages with datamash, our refactoring work for v2.0 could aim to establish a "libdatamash" that people could then create language bindings for. Then datamash could be scripted not only with Guile or Tcl but also with Python, Perl, Ruby, Lua, etc., depending on who wants to create the bindings for their favorite language.


~ Tim

On Wed, May 18, 2022 at 05:52:46AM -0700, Shawn Wagner wrote:
>(This is a datamash 2.0 idea)
>Currently, adding a new operation is an annoying pain - you have to
>touch 3 or 4 different source files, making sure the order of
>different things all match up, etc.
>I want to embed a scripting language in it so that if an unknown
>operation is encountered, it can just load a source file that
>implements it - and maybe rewrite some/all of the existing operations
>to use this framework. It'll make for easier additions of new
>features, and allow user-contributed ones without needing to patch and
>My preference for a language to use is Guile, since it's GNU's
>official extension language and I'm quite fond of Scheme, with tcl a
>close second. There are some who like lua for an embedded scripting
>language, but they're silly people who should be treated kindly.
>A simple example of what defining a new operation might look like:
>(define-scalar add1 #:type 'numeric #:help "Add 1 to the value"
>    (lambda (n) (+ n 1)))
