nano-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Nano-devel] WIP: Use caching for regex


From: Devin Hussey
Subject: Re: [Nano-devel] WIP: Use caching for regex
Date: Wed, 31 Oct 2018 12:03:37 -0400

I updated the gist. Tests are now compiled in order, and I fixed
nmatch/pmatch, adding a test based on an IBM example.

I also decided to attach it here if that is helpful.

Again, this is a standalone test with macros to wrap nanoisms.

Assuming regex_hash.c is in nano/src, your pwd is nano/src, and gnulib
is already configured and built, run
    gcc/clang regex_hash.c -I.. -I. -I../lib -DHAVE_CONFIG_H
-include../config.h ../lib/libgnu.a -O2 -Wall -std=gnu99 -o regex_hash
&& ./regex_hash

– Devin Hussey
On Tue, Oct 30, 2018 at 5:32 PM Devin Hussey <address@hidden> wrote:
>
> Sorry about the HTML from before. I send a lot of things from my phone
> and apparently, the Gmail app doesn't have an option for plain text,
> despite the fact that there are open feature requests dating back to
> 2012. I do a lot of coding in Termux on my phone.
>
> I was trying to figure out how to make RC file parsing faster. I
> initially thought it was strcmp hell, but I made a gperf and there was
> a whopping 0.010 second speedup, even with the nano syntax files
> included 8 times.
>
> So, I profiled nano, and I found the real bottleneck. Nano regcomps
> and regfrees aggressively (see: found_in_list). regcomp is a pretty
> expensive function, and that is the real reason that parsing is slow.
>
> Nano has a lot of syntax rules that use the same regex, so I was
> thinking we should reuse the regexes whenever possible.
>
> Here is what I have now (it is a GitHub Gist, because I am not sure
> whether attaching stuff is a good idea):
>
>     https://gist.github.com/easyaspi314/07e667636fde11216dae190f03d362c6
>
> That is a standalone test program with a bunch of logging. Nano-isms
> are defined to macros, and the code should fit right in once the
> macros are removed and proto.h is included.
>
> I use the FNV-1a (32-bit) hashing algorithm and a hash table to do the
> caching, and I use reference counting to allow code to call regfree
> safely.
> Collisions are handled by using a doubly-linked list (I am thinking of
> only a one-way list, I am not sure yet) and comparing the following
> information:
>  - The full 32-bit hash
>  - The regex flags
>  - The regex string itself
>
> I also check if any regex special characters exist in the string. If
> so, I skip compiling the regex altogether, instead opting for a
> strstr/strcasestr approach.
>
> Obviously, it isn't complete and it needs more testing (especially
> nmatch and pmatch), but do you have any thoughts?
>
> Would it be better to just not free the regexes until the program
> closes instead of reference counting? I don't know what effect it
> would have on the memory usage. If so, we can easily remove the
> backward link and refcount because we don't need it.
>
> Either way, I think this would be better than what we have now.
>
> -- Devin Hussey

Attachment: regex_hash.c
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]