[lmi-commits] [lmi] master a167c11 3/7: Unit-test std::regex instead of boost::regex
From: Greg Chicares
Date: Sat, 2 Oct 2021 17:56:48 -0400 (EDT)
branch: master
commit a167c110b772a0d6cb72f987940755cea3dd8758
Author: Gregory W. Chicares <gchicares@sbcglobal.net>
Commit: Gregory W. Chicares <gchicares@sbcglobal.net>
Unit-test std::regex instead of boost::regex
As this i686-w64-mingw32-gcc-10 comparison indicates, std::regex's
performance is woeful (the "0" lines are just string searches that
use no regex code):

  std::regex
    early 0: 2.520e-07 s mean; 0 us least of 39686 runs
    early 1: 1.258e-04 s mean; 125 us least of 100 runs
    early 2: 1.101e-04 s mean; 110 us least of 100 runs
    early 3: 1.100e-04 s mean; 110 us least of 100 runs
    late 0: 2.501e-07 s mean; 0 us least of 39981 runs
    late 1: 1.254e-03 s mean; 1168 us least of 100 runs
    late 2: 1.023e-03 s mean; 1012 us least of 100 runs
    late 3: 1.020e-03 s mean; 1012 us least of 100 runs
    never 0: 6.424e-08 s mean; 0 us least of 155674 runs
    never 1: 1.168e-03 s mean; 1142 us least of 100 runs
    never 2: 9.960e-04 s mean; 987 us least of 100 runs
    never 3: 9.933e-04 s mean; 987 us least of 100 runs

  boost-1.33.1
    early 0: 2.498e-07 s mean; 0 us least of 40037 runs
    early 1: 2.875e-05 s mean; 28 us least of 348 runs
    early 2: 1.437e-05 s mean; 14 us least of 696 runs
    early 3: 1.043e-05 s mean; 10 us least of 960 runs
    late 0: 2.526e-07 s mean; 0 us least of 39587 runs
    late 1: 3.622e-04 s mean; 361 us least of 100 runs
    late 2: 1.895e-04 s mean; 189 us least of 100 runs
    late 3: 1.260e-04 s mean; 125 us least of 100 runs
    never 0: 8.592e-08 s mean; 0 us least of 116388 runs
    never 1: 3.325e-04 s mean; 331 us least of 100 runs
    never 2: 1.119e-04 s mean; 111 us least of 100 runs
    never 3: 8.351e-05 s mean; 83 us least of 120 runs
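
A comparison of this kind could be approximated with a sketch along the following lines. It is not lmi's code: mean_seconds(), sample_text(), contains_plain(), and contains_std_regex() are hypothetical names, std::chrono stands in for lmi's "timer.hpp", and the sample text is invented. Absolute timings will differ by compiler, standard library, and machine.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical helper (not lmi's timer): mean wall-clock seconds per
// call of f, averaged over 'runs' calls.
template<typename F>
double mean_seconds(F f, int runs)
{
    bool volatile sink = false; // Discourage optimizing the calls away.
    auto const t0 = std::chrono::steady_clock::now();
    for(int i = 0; i < runs; ++i)
        {
        sink = f();
        }
    auto const t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count() / runs;
}

// Invented sample data: n filler lines, then one line with "needle",
// so that a match is found only late in the text.
std::string sample_text(int n)
{
    std::string s;
    for(int i = 0; i < n; ++i)
        {
        s += "lorem ipsum dolor sit amet\n";
        }
    s += "the needle is here\n";
    return s;
}

// Plain substring search: uses no regex code, like the "0" lines.
bool contains_plain(std::string const& text, std::string const& s)
{
    return std::string::npos != text.find(s);
}

// Regex search over the whole text; the regex is constructed on every
// call, so construction cost is part of what gets timed.
bool contains_std_regex(std::string const& text, std::string const& s)
{
    return std::regex_search(text, std::regex(s, std::regex::optimize));
}
```

Calling mean_seconds() with a lambda that wraps contains_plain() or contains_std_regex() yields a figure comparable in kind to the "s mean" column above.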
---
regex_test.cpp | 142 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 78 insertions(+), 64 deletions(-)
diff --git a/regex_test.cpp b/regex_test.cpp
index 12b8378..23f9ed0 100644
--- a/regex_test.cpp
+++ b/regex_test.cpp
@@ -21,12 +21,11 @@
#include "pchfile.hpp"
-#include "boost_regex.hpp"
-
#include "contains.hpp"
#include "test_tools.hpp"
#include "timer.hpp"
+#include <regex>
#include <sstream>
#include <string>
#include <vector>
@@ -105,6 +104,8 @@ bool contains_regex0(std::string const& regex)
/// Match a regex line by line.
///
+/// Historical notes: std::regex vs. boost.
+///
/// Perl 5 has 'm' and 's' modifiers that affect how
/// {caret, dollar, dot} match newlines:
///
@@ -115,6 +116,11 @@ bool contains_regex0(std::string const& regex)
/// m logical lines delimited by '\n' no
/// ms logical lines delimited by '\n' yes
///
+/// std::regex has an equivalent for perl's 'm' metacharacter (i.e.,
+/// std::regex::multiline flag), but the behavior of '.' is fixed
+/// and can only be changed by switching to a non-default regex
+/// syntax (demonstrated in some examples below).
+///
/// Perl's 's' is the default for boost regex; boost offers
/// http://boost.org/libs/regex/doc/syntax_option_type.html
/// "no_mod_s" and
@@ -162,10 +168,10 @@ bool contains_regex0(std::string const& regex)
bool contains_regex1(std::string const& regex)
{
- boost::regex const r(regex, boost::regex::sed);
+ std::regex const r(regex, std::regex::basic | std::regex::optimize);
for(auto const& i : lines)
{
- if(boost::regex_search(i, r))
+ if(std::regex_search(i, r))
{
return true;
}
@@ -173,18 +179,26 @@ bool contains_regex1(std::string const& regex)
return false;
}
-/// Match a regex as with Perl's '-s'.
+/// Match a regex as with Perl's '-s' (dot does not match newline).
+///
+/// This is the default (ECMAScript) behavior.
bool contains_regex2(std::string const& regex)
{
- return boost::regex_search(text, boost::regex("(?-s)" + regex));
+ return std::regex_search(text, std::regex(regex, std::regex::optimize));
}
-/// Match a regex as with Perl's 's'.
+/// Match a regex as with Perl's 's' (dot matches newline).
+///
+/// This is the behavior with various nondefault syntax options,
+/// of which BRE is arbitrarily chosen here.
bool contains_regex3(std::string const& regex)
{
- return boost::regex_search(text, boost::regex(regex));
+ return std::regex_search
+ (text
+ ,std::regex(regex, std::regex::basic | std::regex::optimize)
+ );
}
void mete_vectorize()
@@ -318,88 +332,88 @@ void test_input_sequence_regex()
// This is intended to be useful with xml schema languages, which
// implicitly anchor the entire regex, so '^' and '$' aren't used.
- boost::regex const r(R);
+ std::regex const r(R);
// Tests that are designed to succeed.
// Simple scalars.
- LMI_TEST( boost::regex_match("1234" , r));
- LMI_TEST( boost::regex_match("glp" , r));
+ LMI_TEST( std::regex_match("1234" , r));
+ LMI_TEST( std::regex_match("glp" , r));
// Semicolon-delimited values, as expected in inforce extracts.
- LMI_TEST( boost::regex_match("123;456;0" , r));
+ LMI_TEST( std::regex_match("123;456;0" , r));
// Same, with whitespace.
- LMI_TEST( boost::regex_match("123; 456; 0" , r));
- LMI_TEST( boost::regex_match("123 ;456 ;0" , r));
- LMI_TEST( boost::regex_match("123; 456; 0" , r));
- LMI_TEST( boost::regex_match("123 ;456 ;0" , r));
- LMI_TEST( boost::regex_match(" 123 ; 456 ; 0 " , r));
- LMI_TEST( boost::regex_match(" 123 ; 456 ; 0 " , r));
+ LMI_TEST( std::regex_match("123; 456; 0" , r));
+ LMI_TEST( std::regex_match("123 ;456 ;0" , r));
+ LMI_TEST( std::regex_match("123; 456; 0" , r));
+ LMI_TEST( std::regex_match("123 ;456 ;0" , r));
+ LMI_TEST( std::regex_match(" 123 ; 456 ; 0 " , r));
+ LMI_TEST( std::regex_match(" 123 ; 456 ; 0 " , r));
// Same, with optional terminal semicolon.
- LMI_TEST( boost::regex_match(" 123 ; 456 ; 0 ;" , r));
- LMI_TEST( boost::regex_match(" 123 ; 456 ; 0 ; " , r));
+ LMI_TEST( std::regex_match(" 123 ; 456 ; 0 ;" , r));
+ LMI_TEST( std::regex_match(" 123 ; 456 ; 0 ; " , r));
// Single scalar with terminal semicolon and various whitespace.
- LMI_TEST( boost::regex_match("123;" , r));
- LMI_TEST( boost::regex_match("123 ;" , r));
- LMI_TEST( boost::regex_match("123; " , r));
- LMI_TEST( boost::regex_match(" 123 ; " , r));
+ LMI_TEST( std::regex_match("123;" , r));
+ LMI_TEST( std::regex_match("123 ;" , r));
+ LMI_TEST( std::regex_match("123; " , r));
+ LMI_TEST( std::regex_match(" 123 ; " , r));
// Negatives (e.g., "negative" loans representing repayments).
- LMI_TEST( boost::regex_match("-987; -654" , r));
+ LMI_TEST( std::regex_match("-987; -654" , r));
// Decimals.
- LMI_TEST( boost::regex_match("0.;.0;0.0;1234.5678" , r));
+ LMI_TEST( std::regex_match("0.;.0;0.0;1234.5678" , r));
// Decimals, along with '#' and '@'.
- LMI_TEST( boost::regex_match("0.,2;.0,#3;0.0,@75;1234.5678" , r));
+ LMI_TEST( std::regex_match("0.,2;.0,#3;0.0,@75;1234.5678" , r));
// Same, with whitespace.
- LMI_TEST( boost::regex_match(" 0. , 2 ; .0 , # 3 ; 0.0 , @ 75 ; 1234.5678 " , r));
+ LMI_TEST( std::regex_match(" 0. , 2 ; .0 , # 3 ; 0.0 , @ 75 ; 1234.5678 " , r));
// No numbers--only keywords.
- LMI_TEST( boost::regex_match("salary,retirement;corridor,maturity" , r));
+ LMI_TEST( std::regex_match("salary,retirement;corridor,maturity" , r));
// Same, with whitespace.
- LMI_TEST( boost::regex_match(" salary , retirement; corridor , maturity" , r));
- LMI_TEST( boost::regex_match(" salary , retirement; corridor , maturity " , r));
- LMI_TEST( boost::regex_match(" salary , retirement ; corridor , maturity" , r));
- LMI_TEST( boost::regex_match(" salary , retirement ; corridor , maturity " , r));
+ LMI_TEST( std::regex_match(" salary , retirement; corridor , maturity" , r));
+ LMI_TEST( std::regex_match(" salary , retirement; corridor , maturity " , r));
+ LMI_TEST( std::regex_match(" salary , retirement ; corridor , maturity" , r));
+ LMI_TEST( std::regex_match(" salary , retirement ; corridor , maturity " , r));
// Empty except for zero or more blanks.
- LMI_TEST( boost::regex_match("" , r));
- LMI_TEST( boost::regex_match(" " , r));
- LMI_TEST( boost::regex_match(" " , r));
+ LMI_TEST( std::regex_match("" , r));
+ LMI_TEST( std::regex_match(" " , r));
+ LMI_TEST( std::regex_match(" " , r));
// Interval notation.
- LMI_TEST( boost::regex_match("1 [2,3);4 (5,6]" , r));
+ LMI_TEST( std::regex_match("1 [2,3);4 (5,6]" , r));
// User-manual examples. See: https://www.nongnu.org/lmi/sequence_input.html
- LMI_TEST( boost::regex_match("sevenpay 7; 250000 retirement; 100000 #10; 75000 @95; 50000", r));
- LMI_TEST( boost::regex_match("100000; 110000; 120000; 130000; 140000; 150000" , r));
- LMI_TEST( boost::regex_match("target; maximum" , r)); // [Modified example.]
- LMI_TEST( boost::regex_match("10000 20; 0" , r));
- LMI_TEST( boost::regex_match("10000 10; 5000 15; 0" , r));
- LMI_TEST( boost::regex_match("10000 @70; 0" , r));
- LMI_TEST( boost::regex_match("10000 retirement; 0" , r));
- LMI_TEST( boost::regex_match("0 retirement; 5000" , r));
- LMI_TEST( boost::regex_match("0 retirement; 5000 maturity" , r));
- LMI_TEST( boost::regex_match("0 retirement; 5000 #10; 0" , r));
- LMI_TEST( boost::regex_match("0,[0,retirement);10000,[retirement,#10);0" , r));
+ LMI_TEST( std::regex_match("sevenpay 7; 250000 retirement; 100000 #10; 75000 @95; 50000", r));
+ LMI_TEST( std::regex_match("100000; 110000; 120000; 130000; 140000; 150000" , r));
+ LMI_TEST( std::regex_match("target; maximum" , r)); // [Modified example.]
+ LMI_TEST( std::regex_match("10000 20; 0" , r));
+ LMI_TEST( std::regex_match("10000 10; 5000 15; 0" , r));
+ LMI_TEST( std::regex_match("10000 @70; 0" , r));
+ LMI_TEST( std::regex_match("10000 retirement; 0" , r));
+ LMI_TEST( std::regex_match("0 retirement; 5000" , r));
+ LMI_TEST( std::regex_match("0 retirement; 5000 maturity" , r));
+ LMI_TEST( std::regex_match("0 retirement; 5000 #10; 0" , r));
+ LMI_TEST( std::regex_match("0,[0,retirement);10000,[retirement,#10);0" , r));
// Tests that are designed to fail.
// Naked semicolon.
- LMI_TEST(!boost::regex_match(";" , r));
- LMI_TEST(!boost::regex_match(" ; " , r));
+ LMI_TEST(!std::regex_match(";" , r));
+ LMI_TEST(!std::regex_match(" ; " , r));
// Missing required semicolon.
- LMI_TEST(!boost::regex_match("7 24 25" , r));
- LMI_TEST(!boost::regex_match("7,24,25" , r));
- LMI_TEST(!boost::regex_match("7, 24, 25" , r));
- LMI_TEST(!boost::regex_match("7 , 24 , 25" , r));
+ LMI_TEST(!std::regex_match("7 24 25" , r));
+ LMI_TEST(!std::regex_match("7,24,25" , r));
+ LMI_TEST(!std::regex_match("7, 24, 25" , r));
+ LMI_TEST(!std::regex_match("7 , 24 , 25" , r));
// Extraneous commas.
- LMI_TEST(!boost::regex_match(",1" , r));
- LMI_TEST(!boost::regex_match("1," , r));
- LMI_TEST(!boost::regex_match("1,2," , r));
- LMI_TEST(!boost::regex_match("1,,2" , r));
+ LMI_TEST(!std::regex_match(",1" , r));
+ LMI_TEST(!std::regex_match("1," , r));
+ LMI_TEST(!std::regex_match("1,2," , r));
+ LMI_TEST(!std::regex_match("1,,2" , r));
// Impermissible character.
- LMI_TEST(!boost::regex_match("%" , r));
+ LMI_TEST(!std::regex_match("%" , r));
// Uppercase in keywords.
- LMI_TEST(!boost::regex_match("Glp" , r));
- LMI_TEST(!boost::regex_match("GLP" , r));
+ LMI_TEST(!std::regex_match("Glp" , r));
+ LMI_TEST(!std::regex_match("GLP" , r));
// Misppellings.
- LMI_TEST(!boost::regex_match("gdp" , r));
- LMI_TEST(!boost::regex_match("glpp" , r));
- LMI_TEST(!boost::regex_match("gglp" , r));
+ LMI_TEST(!std::regex_match("gdp" , r));
+ LMI_TEST(!std::regex_match("glpp" , r));
+ LMI_TEST(!std::regex_match("gglp" , r));
X = "(\\-?[0-9.]+)";
R = " *| *" + X + Y + "? *(; *" + X + Y + "? *)*;? *";
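
The difference between contains_regex2() and contains_regex3() in the diff above comes down to how '.' treats a newline under each std::regex grammar. The following is a minimal sketch of that behavior, not part of the commit: search_ecmascript() and search_basic() are hypothetical names, and the behavior shown for the BRE grammar is what libstdc++ and libc++ are understood to do; other implementations may differ.

```cpp
#include <regex>
#include <string>

// Default grammar (ECMAScript): '.' does not match a line terminator,
// which corresponds to Perl without the 's' modifier.
bool search_ecmascript(std::string const& text, std::string const& regex)
{
    return std::regex_search(text, std::regex(regex));
}

// POSIX basic grammar (BRE): '.' is expected to match '\n' as well,
// which corresponds to Perl's 's' modifier.
bool search_basic(std::string const& text, std::string const& regex)
{
    return std::regex_search(text, std::regex(regex, std::regex::basic));
}
```

With the text "first\nsecond" and the pattern "first.second", search_ecmascript() is expected to find no match while search_basic() finds one. Because switching grammars also changes the rest of the regex syntax, patterns that rely on ECMAScript features would need rewriting for BRE.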