sed-devel

Re: Regex library


From: Assaf Gordon
Subject: Re: Regex library
Date: Sun, 27 Jun 2021 00:24:35 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0

Hello,

On 2021-06-18 1:42 a.m., Pietro Paolini wrote:
In the sed source code there is a folder called lib/ which seems to include
gnulib, or maybe I am flat wrong and that isn't gnulib?

The content of "lib/" in sed-4.8.tar.gz is indeed a subset of gnulib.

Another question concerns the regex library in use, I can see the code
using regex functions defined as part of gnulib
[...]

Yet when I ldd the sed binary I can observe that PCRE is dynamically linked
[...]

What library is used for regex in GNU sed? I am inclined to say that PCRE
isn't used; after all, libpthread gets linked too and it is not used.

First,
You're correct - PCRE is not used by gnu sed.

On my system, it is "libselinux" that depends on PCRE (and sed is built with selinux support by default):

   $ ldd /lib/x86_64-linux-gnu/libselinux.so.1 | grep -i pcre
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3


Second,
As for which regex code is used, the answer is a bit nuanced.

The source code file which does the actual regex matching is "sed/regexp.c":
https://git.savannah.gnu.org/cgit/sed.git/tree/sed/regexp.c

Inside, two main functions are used: re_compile_pattern() and re_search().

These are defined in gnulib's "regcomp.c" and "regexec.c" files:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regcomp.c
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regexec.c
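
As a rough illustration of how these two calls fit together, here is a
minimal sketch. It is not the code from sed/regexp.c; it assumes a
<regex.h> that exposes the GNU interface (glibc with _GNU_SOURCE, or
gnulib's replacement header), and first_match_offset() is a made-up
helper name used only for this example.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* expose re_compile_pattern()/re_search() in glibc */
#endif
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of sed: return the byte offset of the
   first match of PATTERN (POSIX extended syntax) in TEXT, -1 if there
   is no match, or -2 on a compile or internal matcher error. */
long first_match_offset(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    /* A zeroed buffer tells the library to allocate everything itself. */
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);

    /* The syntax is a global setting consulted by re_compile_pattern(). */
    re_set_syntax(RE_SYNTAX_POSIX_EXTENDED);

    /* Returns NULL on success, or a static error message on failure. */
    const char *err = re_compile_pattern(pattern, strlen(pattern), &buf);
    if (err != NULL) {
        fprintf(stderr, "regex compile error: %s\n", err);
        return -2;
    }

    /* Try every start position from 0 to len; no registers requested. */
    regoff_t len = (regoff_t) strlen(text);
    regoff_t pos = re_search(&buf, text, len, 0, len, NULL);

    regfree(&buf);
    return (long) pos;
}
```

For instance, first_match_offset("b+", "aabbb") should return 2, and a
pattern that matches nowhere returns -1. Whether the calls end up in
glibc's or gnulib's implementation depends on how the program is built.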


However,
These functions are also defined in glibc (although internal).

The glibc and gnulib sources are often synchronized, so these
functions might be identical; with an old glibc, the gnulib
version that is bundled with GNU sed might be newer.

During "./configure", if the system's glibc is detected to have a
new-enough version of these functions, they will be used.
Otherwise, the gnulib version will be used.

You can force the build to use glibc's version with:
    ./configure --without-included-regex
But that's not recommended, unless you are certain of what you're
doing.


Third,
To add another layer, GNU sed employs some regex optimizations using a
faster engine (gnulib's DFA engine, https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/dfa.c ).
That code is not available in glibc, and so it is always taken from gnulib.



Hope this answers the question.

regards,
 - assaf



