Re: [bug-gawk] funcsub func needed
From: Kjetil Flovild-Midtlie
Subject: Re: [bug-gawk] funcsub func needed
Date: Tue, 09 Dec 2014 11:33:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
Benchmarked on a 4-core, 1-year-old i5 laptop with ca. 2300 different regex rules:
with gensub (plus a 2nd-pass awk to adjust the group values) == 5 min
with awk funcsub (1 pass) == 50 minutes !!!
PROBLEM:
The funcsub version is VERY easy to understand and modify
for "dummy" regex personnel, but it is not usable: at 50 minutes per run
it could take forever to find a bug in the output.
gensub + 2pass is complex and needs a hardcore "awky" to modify the 2pass awk
and keep the placeholder symbols in sync and in the correct position (see
@%@ and @=@ below).
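
For readers who have not seen the idea, here is a minimal sketch of what a one-pass "funcsub" style rule could look like in plain awk. It assumes funcsub means "replace each match with the return value of a user function applied to the matched text". The names funcsub_demo, funcsub_a, rewrite and the label table are hypothetical, and the pattern is simplified (no \< word boundary, literal spaces instead of \s) so that it runs in any POSIX awk, not only gawk. It is not the implementation that was benchmarked above.

```shell
# Hypothetical demo: one rule, one pass, replacement text computed by a function.
funcsub_demo() {
    awk '
    # Hypothetical lookup table: digit -> label.
    BEGIN { label["1"] = "Alp"; label["2"] = "Bet" }

    # Hypothetical callback: m is the matched text "a <digit> <letter>".
    function rewrite(m,    digit, letter) {
        digit  = substr(m, 3, 1)
        letter = substr(m, 5, 1)
        return "/" ((digit in label) ? label[digit] : digit) letter
    }

    # Emulated funcsub for one rule: scan left to right and never
    # rescan text that has already been replaced.
    function funcsub_a(s,    out) {
        out = ""
        while (match(s, /a [0-9] [A-Za-z]/)) {
            out = out substr(s, 1, RSTART - 1) rewrite(substr(s, RSTART, RLENGTH))
            s = substr(s, RSTART + RLENGTH)
        }
        return out s
    }

    /^data/ { $0 = funcsub_a($0) }
    { print }
    '
}

# Usage: only lines starting with "data" are rewritten.
printf 'data a 1 x and a 2 y\nother a 1 x\n' | funcsub_demo
```

The lookup and any IF-ELSE logic live inside rewrite(), so no placeholder symbols and no 2nd pass are needed. The price is one match()/substr() loop per rule, interpreted in awk, for every input line.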
# --------------------------------------------------------------------------------------------------------------
# sample awk file, 1pass [auto-generated from another awk file + tsv datafile]
#
/^data/ { $0 = gensub(/\<a\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<b\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<c\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<d\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<e\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
# ..(2295 more) !!!
{ print $0 }
# end 1pass file
------------------------------------------------------------------------------------------
# sample awk file 2pass (typically small)
---------------------------------------------------------
{
# gensub() returns the new string and leaves $0 untouched, so assign the result back
$0 = gensub( /@address@hidden/,"/Alp","g")
$0 = gensub( /@address@hidden/,"/Bet","g")
$0 = gensub( /@address@hidden/,"/0","g")
$0 = gensub( /@address@hidden/,"/1","g")
$0 = gensub( /@address@hidden/,"X","g")
$0 = gensub( /@address@hidden/,"IV","g")
$0 = gensub( /@address@hidden/,"I","g")
# etc
# often more advanced IF-ELSE is needed !!
#
print $0
}
# end 2pass file
--------------------------------------------------------------------------------------------
awk -f 1pass.awk input.txt | awk -f 2pass.awk > output.txt