From: Greg Chicares
Subject: [lmi-commits] [lmi] master a167c11 3/7: Unit-test std::regex instead of boost::regex
Date: Sat, 2 Oct 2021 17:56:48 -0400 (EDT)

branch: master
commit a167c110b772a0d6cb72f987940755cea3dd8758
Author: Gregory W. Chicares <gchicares@sbcglobal.net>
Commit: Gregory W. Chicares <gchicares@sbcglobal.net>

    Unit-test std::regex instead of boost::regex
    
    As this i686-w64-mingw32-gcc-10 comparison indicates, std::regex's
    performance is woeful (the "0" lines are just string searches that
    use no regex code):
    
    std::regex
    
      early 0:   2.520e-07 s mean;          0 us least of 39686 runs
      early 1:   1.258e-04 s mean;        125 us least of 100 runs
      early 2:   1.101e-04 s mean;        110 us least of 100 runs
      early 3:   1.100e-04 s mean;        110 us least of 100 runs
    
      late  0:   2.501e-07 s mean;          0 us least of 39981 runs
      late  1:   1.254e-03 s mean;       1168 us least of 100 runs
      late  2:   1.023e-03 s mean;       1012 us least of 100 runs
      late  3:   1.020e-03 s mean;       1012 us least of 100 runs
    
      never 0:   6.424e-08 s mean;          0 us least of 155674 runs
      never 1:   1.168e-03 s mean;       1142 us least of 100 runs
      never 2:   9.960e-04 s mean;        987 us least of 100 runs
      never 3:   9.933e-04 s mean;        987 us least of 100 runs
    
    boost-1.33.1
    
      early 0:   2.498e-07 s mean;          0 us least of 40037 runs
      early 1:   2.875e-05 s mean;         28 us least of 348 runs
      early 2:   1.437e-05 s mean;         14 us least of 696 runs
      early 3:   1.043e-05 s mean;         10 us least of 960 runs
    
      late  0:   2.526e-07 s mean;          0 us least of 39587 runs
      late  1:   3.622e-04 s mean;        361 us least of 100 runs
      late  2:   1.895e-04 s mean;        189 us least of 100 runs
      late  3:   1.260e-04 s mean;        125 us least of 100 runs
    
      never 0:   8.592e-08 s mean;          0 us least of 116388 runs
      never 1:   3.325e-04 s mean;        331 us least of 100 runs
      never 2:   1.119e-04 s mean;        111 us least of 100 runs
      never 3:   8.351e-05 s mean;         83 us least of 120 runs
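
    The "0" rows time a plain substring search, so they show the cost of
    not using a regex at all. A minimal C++ sketch of that comparison
    follows. The sample text, the function names, and the mean_seconds()
    helper are hypothetical (they are not lmi's mete_*() functions), and
    no timings are claimed for it.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical sample input (not lmi's test data): the sought text comes
// late, so each search scans most of the input, as in the "late" rows.
std::string const sample_text =
    std::string(2000, 'x') + "\nlate target\n" + std::string(100, 'y');

// Plain substring search: no regex code at all, like the "0" rows.
bool contains_plain(std::string const& s)
{
    return std::string::npos != sample_text.find(s);
}

// Search with a regex that the caller compiled once and reuses.
bool contains_precompiled(std::regex const& r)
{
    return std::regex_search(sample_text, r);
}

// Search with a regex compiled afresh on every call. std::regex::optimize
// asks the implementation to favor matching speed over construction
// speed; it is only a hint.
bool contains_compiled_each_call(std::string const& s)
{
    return std::regex_search(sample_text, std::regex(s, std::regex::optimize));
}

// Mean wall-clock seconds per call of f() over n calls (n > 0).
template<typename F>
double mean_seconds(F f, int n)
{
    volatile bool sink = false; // discourages optimizing the calls away
    auto const start = std::chrono::steady_clock::now();
    for(int i = 0; i < n; ++i)
        {
        sink = f();
        }
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;
    static_cast<void>(sink);
    return elapsed.count() / n;
}
```

    Timing [&r]{return contains_precompiled(r);} separately from
    []{return contains_compiled_each_call("late target");} isolates the
    cost of regex construction, which contains_regex1() through
    contains_regex3() in the diff pay on every call.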
---
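contains_regex1() in the diff below searches `lines` one at a time, so
'^' and '$' act at line boundaries without any multiline flag. A minimal
sketch of that idea, assuming a hypothetical split_lines() helper in place
of the test's precomputed `lines` vector; any_line_matches() and
whole_text_matches() are likewise illustrative names, not lmi code.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into logical lines delimited by '\n'
// (each '\n' is dropped; a trailing '\n' yields no empty final line).
std::vector<std::string> split_lines(std::string const& text)
{
    std::vector<std::string> lines;
    std::istringstream iss(text);
    for(std::string line; std::getline(iss, line);)
        {
        lines.push_back(line);
        }
    return lines;
}

// Line-by-line search with BRE syntax, in the spirit of contains_regex1():
// each line is a separate target sequence, so '^' and '$' mean start and
// end of line.
bool any_line_matches(std::string const& text, std::string const& regex)
{
    std::regex const r(regex, std::regex::basic | std::regex::optimize);
    for(auto const& line : split_lines(text))
        {
        if(std::regex_search(line, r))
            {
            return true;
            }
        }
    return false;
}

// Whole-text search with the default ECMAScript grammar and no multiline
// flag: '^' matches only at the very beginning of the text.
bool whole_text_matches(std::string const& text, std::string const& regex)
{
    return std::regex_search(text, std::regex(regex));
}
```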
 regex_test.cpp | 142 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 78 insertions(+), 64 deletions(-)
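The comments on contains_regex2() and contains_regex3() rest on a grammar
difference: ECMAScript's '.' excludes line terminators, whereas POSIX '.'
matches newline too. A minimal sketch with hypothetical function names:

```cpp
#include <regex>
#include <string>

// Default ECMAScript grammar: '.' does not match line terminators such as
// '\n' and '\r', which corresponds to Perl without the 's' modifier.
bool dot_search_ecmascript(std::string const& text, std::string const& regex)
{
    return std::regex_search(text, std::regex(regex));
}

// POSIX basic grammar (BRE): '.' matches '\n' as well, which corresponds
// to Perl's 's' modifier.
bool dot_search_basic(std::string const& text, std::string const& regex)
{
    return std::regex_search(text, std::regex(regex, std::regex::basic));
}
```

Because BRE also changes other syntax (grouping is written `\(...\)`, and
'+' is an ordinary character), switching grammars to get Perl's 's'
behavior may require rewriting the pattern itself.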

diff --git a/regex_test.cpp b/regex_test.cpp
index 12b8378..23f9ed0 100644
--- a/regex_test.cpp
+++ b/regex_test.cpp
@@ -21,12 +21,11 @@
 
 #include "pchfile.hpp"
 
-#include "boost_regex.hpp"
-
 #include "contains.hpp"
 #include "test_tools.hpp"
 #include "timer.hpp"
 
+#include <regex>
 #include <sstream>
 #include <string>
 #include <vector>
@@ -105,6 +104,8 @@ bool contains_regex0(std::string const& regex)
 
 /// Match a regex line by line.
 ///
+/// Historical notes: std::regex vs. boost.
+///
 /// Perl 5 has 'm' and 's' modifiers that affect how
 /// {caret, dollar, dot} match newlines:
 ///
@@ -115,6 +116,11 @@ bool contains_regex0(std::string const& regex)
 ///       m       logical lines delimited by '\n'          no
 ///      ms       logical lines delimited by '\n'         yes
 ///
+/// std::regex has an equivalent for perl's 'm' metacharacter (i.e.,
+/// std::regex::multiline flag), but the behavior of '.' is fixed
+/// and can only be changed by switching to a non-default regex
+/// syntax (demonstrated in some examples below).
+///
 /// Perl's 's' is the default for boost regex; boost offers
 ///   http://boost.org/libs/regex/doc/syntax_option_type.html
 /// "no_mod_s" and
@@ -162,10 +168,10 @@ bool contains_regex0(std::string const& regex)
 
 bool contains_regex1(std::string const& regex)
 {
-    boost::regex const r(regex, boost::regex::sed);
+    std::regex const r(regex, std::regex::basic | std::regex::optimize);
     for(auto const& i : lines)
         {
-        if(boost::regex_search(i, r))
+        if(std::regex_search(i, r))
             {
             return true;
             }
@@ -173,18 +179,26 @@ bool contains_regex1(std::string const& regex)
     return false;
 }
 
-/// Match a regex as with Perl's '-s'.
+/// Match a regex as with Perl's '-s' (dot does not match newline).
+///
+/// This is the default (ECMAScript) behavior.
 
 bool contains_regex2(std::string const& regex)
 {
-    return boost::regex_search(text, boost::regex("(?-s)" + regex));
+    return std::regex_search(text, std::regex(regex, std::regex::optimize));
 }
 
-/// Match a regex as with Perl's 's'.
+/// Match a regex as with Perl's 's' (dot matches newline).
+///
+/// This is the behavior with various nondefault syntax options,
+/// of which BRE is arbitrarily chosen here.
 
 bool contains_regex3(std::string const& regex)
 {
-    return boost::regex_search(text, boost::regex(regex));
+    return std::regex_search
+        (text
+        ,std::regex(regex, std::regex::basic | std::regex::optimize)
+        );
 }
 
 void mete_vectorize()
@@ -318,88 +332,88 @@ void test_input_sequence_regex()
 
     // This is intended to be useful with xml schema languages, which
     // implicitly anchor the entire regex, so '^' and '$' aren't used.
-    boost::regex const r(R);
+    std::regex const r(R);
 
     // Tests that are designed to succeed.
 
     // Simple scalars.
-    LMI_TEST( boost::regex_match("1234"                                                       , r));
-    LMI_TEST( boost::regex_match("glp"                                                        , r));
+    LMI_TEST( std::regex_match("1234"                                                       , r));
+    LMI_TEST( std::regex_match("glp"                                                        , r));
     // Semicolon-delimited values, as expected in inforce extracts.
-    LMI_TEST( boost::regex_match("123;456;0"                                                  , r));
+    LMI_TEST( std::regex_match("123;456;0"                                                  , r));
     // Same, with whitespace.
-    LMI_TEST( boost::regex_match("123; 456; 0"                                                , r));
-    LMI_TEST( boost::regex_match("123 ;456 ;0"                                                , r));
-    LMI_TEST( boost::regex_match("123;  456;  0"                                              , r));
-    LMI_TEST( boost::regex_match("123  ;456  ;0"                                              , r));
-    LMI_TEST( boost::regex_match(" 123  ;  456  ;  0 "                                        , r));
-    LMI_TEST( boost::regex_match("  123  ;  456  ;  0  "                                      , r));
+    LMI_TEST( std::regex_match("123; 456; 0"                                                , r));
+    LMI_TEST( std::regex_match("123 ;456 ;0"                                                , r));
+    LMI_TEST( std::regex_match("123;  456;  0"                                              , r));
+    LMI_TEST( std::regex_match("123  ;456  ;0"                                              , r));
+    LMI_TEST( std::regex_match(" 123  ;  456  ;  0 "                                        , r));
+    LMI_TEST( std::regex_match("  123  ;  456  ;  0  "                                      , r));
     // Same, with optional terminal semicolon.
-    LMI_TEST( boost::regex_match("  123  ;  456  ;  0  ;"                                     , r));
-    LMI_TEST( boost::regex_match("  123  ;  456  ;  0  ;  "                                   , r));
+    LMI_TEST( std::regex_match("  123  ;  456  ;  0  ;"                                     , r));
+    LMI_TEST( std::regex_match("  123  ;  456  ;  0  ;  "                                   , r));
     // Single scalar with terminal semicolon and various whitespace.
-    LMI_TEST( boost::regex_match("123;"                                                       , r));
-    LMI_TEST( boost::regex_match("123 ;"                                                      , r));
-    LMI_TEST( boost::regex_match("123; "                                                      , r));
-    LMI_TEST( boost::regex_match(" 123 ; "                                                    , r));
+    LMI_TEST( std::regex_match("123;"                                                       , r));
+    LMI_TEST( std::regex_match("123 ;"                                                      , r));
+    LMI_TEST( std::regex_match("123; "                                                      , r));
+    LMI_TEST( std::regex_match(" 123 ; "                                                    , r));
     // Negatives (e.g., "negative" loans representing repayments).
-    LMI_TEST( boost::regex_match("-987; -654"                                                 , r));
+    LMI_TEST( std::regex_match("-987; -654"                                                 , r));
     // Decimals.
-    LMI_TEST( boost::regex_match("0.;.0;0.0;1234.5678"                                        , r));
+    LMI_TEST( std::regex_match("0.;.0;0.0;1234.5678"                                        , r));
     // Decimals, along with '#' and '@'.
-    LMI_TEST( boost::regex_match("0.,2;.0,#3;0.0,@75;1234.5678"                               , r));
+    LMI_TEST( std::regex_match("0.,2;.0,#3;0.0,@75;1234.5678"                               , r));
     // Same, with whitespace.
-    LMI_TEST( boost::regex_match(" 0. , 2 ; .0 , # 3 ; 0.0 , @ 75 ; 1234.5678 "               , r));
+    LMI_TEST( std::regex_match(" 0. , 2 ; .0 , # 3 ; 0.0 , @ 75 ; 1234.5678 "               , r));
     // No numbers--only keywords.
-    LMI_TEST( boost::regex_match("salary,retirement;corridor,maturity"                        , r));
+    LMI_TEST( std::regex_match("salary,retirement;corridor,maturity"                        , r));
     // Same, with whitespace.
-    LMI_TEST( boost::regex_match("  salary  ,  retirement;  corridor  ,  maturity"            , r));
-    LMI_TEST( boost::regex_match("  salary  ,  retirement;  corridor  ,  maturity  "          , r));
-    LMI_TEST( boost::regex_match("  salary  ,  retirement  ;  corridor  ,  maturity"          , r));
-    LMI_TEST( boost::regex_match("  salary  ,  retirement  ;  corridor  ,  maturity  "        , r));
+    LMI_TEST( std::regex_match("  salary  ,  retirement;  corridor  ,  maturity"            , r));
+    LMI_TEST( std::regex_match("  salary  ,  retirement;  corridor  ,  maturity  "          , r));
+    LMI_TEST( std::regex_match("  salary  ,  retirement  ;  corridor  ,  maturity"          , r));
+    LMI_TEST( std::regex_match("  salary  ,  retirement  ;  corridor  ,  maturity  "        , r));
     // Empty except for zero or more blanks.
-    LMI_TEST( boost::regex_match(""                                                           , r));
-    LMI_TEST( boost::regex_match(" "                                                          , r));
-    LMI_TEST( boost::regex_match("  "                                                         , r));
+    LMI_TEST( std::regex_match(""                                                           , r));
+    LMI_TEST( std::regex_match(" "                                                          , r));
+    LMI_TEST( std::regex_match("  "                                                         , r));
     // Interval notation.
-    LMI_TEST( boost::regex_match("1 [2,3);4 (5,6]"                                            , r));
+    LMI_TEST( std::regex_match("1 [2,3);4 (5,6]"                                            , r));
     // User-manual examples. See: https://www.nongnu.org/lmi/sequence_input.html
-    LMI_TEST( boost::regex_match("sevenpay 7; 250000 retirement; 100000 #10; 75000 @95; 50000", r));
-    LMI_TEST( boost::regex_match("100000; 110000; 120000; 130000; 140000; 150000"             , r));
-    LMI_TEST( boost::regex_match("target; maximum"                                            , r)); // [Modified example.]
-    LMI_TEST( boost::regex_match("10000 20; 0"                                                , r));
-    LMI_TEST( boost::regex_match("10000 10; 5000 15; 0"                                       , r));
-    LMI_TEST( boost::regex_match("10000 @70; 0"                                               , r));
-    LMI_TEST( boost::regex_match("10000 retirement; 0"                                        , r));
-    LMI_TEST( boost::regex_match("0 retirement; 5000"                                         , r));
-    LMI_TEST( boost::regex_match("0 retirement; 5000 maturity"                                , r));
-    LMI_TEST( boost::regex_match("0 retirement; 5000 #10; 0"                                  , r));
-    LMI_TEST( boost::regex_match("0,[0,retirement);10000,[retirement,#10);0"                  , r));
+    LMI_TEST( std::regex_match("sevenpay 7; 250000 retirement; 100000 #10; 75000 @95; 50000", r));
+    LMI_TEST( std::regex_match("100000; 110000; 120000; 130000; 140000; 150000"             , r));
+    LMI_TEST( std::regex_match("target; maximum"                                            , r)); // [Modified example.]
+    LMI_TEST( std::regex_match("10000 20; 0"                                                , r));
+    LMI_TEST( std::regex_match("10000 10; 5000 15; 0"                                       , r));
+    LMI_TEST( std::regex_match("10000 @70; 0"                                               , r));
+    LMI_TEST( std::regex_match("10000 retirement; 0"                                        , r));
+    LMI_TEST( std::regex_match("0 retirement; 5000"                                         , r));
+    LMI_TEST( std::regex_match("0 retirement; 5000 maturity"                                , r));
+    LMI_TEST( std::regex_match("0 retirement; 5000 #10; 0"                                  , r));
+    LMI_TEST( std::regex_match("0,[0,retirement);10000,[retirement,#10);0"                  , r));
 
     // Tests that are designed to fail.
 
     // Naked semicolon.
-    LMI_TEST(!boost::regex_match(";"                                                          , r));
-    LMI_TEST(!boost::regex_match(" ; "                                                        , r));
+    LMI_TEST(!std::regex_match(";"                                                          , r));
+    LMI_TEST(!std::regex_match(" ; "                                                        , r));
     // Missing required semicolon.
-    LMI_TEST(!boost::regex_match("7 24 25"                                                    , r));
-    LMI_TEST(!boost::regex_match("7,24,25"                                                    , r));
-    LMI_TEST(!boost::regex_match("7, 24, 25"                                                  , r));
-    LMI_TEST(!boost::regex_match("7 , 24 , 25"                                                , r));
+    LMI_TEST(!std::regex_match("7 24 25"                                                    , r));
+    LMI_TEST(!std::regex_match("7,24,25"                                                    , r));
+    LMI_TEST(!std::regex_match("7, 24, 25"                                                  , r));
+    LMI_TEST(!std::regex_match("7 , 24 , 25"                                                , r));
     // Extraneous commas.
-    LMI_TEST(!boost::regex_match(",1"                                                         , r));
-    LMI_TEST(!boost::regex_match("1,"                                                         , r));
-    LMI_TEST(!boost::regex_match("1,2,"                                                       , r));
-    LMI_TEST(!boost::regex_match("1,,2"                                                       , r));
+    LMI_TEST(!std::regex_match(",1"                                                         , r));
+    LMI_TEST(!std::regex_match("1,"                                                         , r));
+    LMI_TEST(!std::regex_match("1,2,"                                                       , r));
+    LMI_TEST(!std::regex_match("1,,2"                                                       , r));
     // Impermissible character.
-    LMI_TEST(!boost::regex_match("%"                                                          , r));
+    LMI_TEST(!std::regex_match("%"                                                          , r));
     // Uppercase in keywords.
-    LMI_TEST(!boost::regex_match("Glp"                                                        , r));
-    LMI_TEST(!boost::regex_match("GLP"                                                        , r));
+    LMI_TEST(!std::regex_match("Glp"                                                        , r));
+    LMI_TEST(!std::regex_match("GLP"                                                        , r));
     // Misppellings.
-    LMI_TEST(!boost::regex_match("gdp"                                                        , r));
-    LMI_TEST(!boost::regex_match("glpp"                                                       , r));
-    LMI_TEST(!boost::regex_match("gglp"                                                       , r));
+    LMI_TEST(!std::regex_match("gdp"                                                        , r));
+    LMI_TEST(!std::regex_match("glpp"                                                       , r));
+    LMI_TEST(!std::regex_match("gglp"                                                       , r));
 
     X = "(\\-?[0-9.]+)";
     R = " *| *" + X + Y + "? *(; *" + X + Y + "? *)*;? *";
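
The sequence regex R is applied with std::regex_match(), which succeeds
only if the entire string matches; that supplies the implicit anchoring
that the comment above `std::regex const r(R);` attributes to xml schema
languages. A sketch under stated assumptions: X and the shape of R are
copied from the diff, but Y is defined elsewhere in regex_test.cpp and is
not shown, so the Y below is a hypothetical stand-in (an optional
blank-separated integer, such as the "20" in "10000 20; 0").

```cpp
#include <regex>
#include <string>

// X, copied from the diff: an optionally negative decimal number.
std::string const X = "(\\-?[0-9.]+)";
// Y is HYPOTHETICAL: the diff does not show lmi's definition.
std::string const Y = "( +[0-9]+)";
// R, as in the diff: empty-or-blanks, or semicolon-separated X[Y] items
// with optional blanks and an optional terminal semicolon.
std::string const R = " *| *" + X + Y + "? *(; *" + X + Y + "? *)*;? *";

// regex_match() requires the whole input to match R.
bool is_valid_sequence(std::string const& s)
{
    static std::regex const r(R);
    return std::regex_match(s, r);
}
```

With std::regex_search() instead, R would accept any input at all,
because its first alternative " *" matches the empty string at position 0.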


