bug-gawk

Re: [bug-gawk] funcsub func needed


From: Kjetil Flovild-Midtlie
Subject: Re: [bug-gawk] funcsub func needed
Date: Tue, 09 Dec 2014 11:33:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Benchmarked on a one-year-old quad-core i5 laptop with about 2300 different regex rules:

with gensub (plus a second awk pass to adjust group values): 5 minutes
with an awk funcsub (one pass): 50 minutes!

PROBLEM:
The funcsub version is VERY easy for "dummy" (non-awk-expert) regex maintainers to understand and modify, but it is not usable in practice: at 50 minutes per run, it could take forever to track down a bug in the output.

The gensub + 2-pass version is complex: it needs a hardcore "awky" to modify the second-pass script and keep the placeholder symbols in sync and in the correct position (see @%@ and @=@ below).


# --------------------------------------------------------------------------------------------------------------
# sample awk file, 1pass [auto-generated from another awk file + tsv datafile]
#
/^data/ { $0 = gensub(/\<a\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<b\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<c\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<d\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }
/^data/ { $0 = gensub(/\<e\s([[:digit:]])\s([[:alpha:]])/,"/address@hidden@address@hidden@\\2","g") }

# ..(2295 more) !!!

{ print $0 }

# end 1pass file ------------------------------------------------------------------------------------------

# sample awk file 2pass (typically small) ---------------------------------------------------------
{
    # gensub() returns the new string; assign it back to $0 so each step takes effect
    $0 = gensub( /@address@hidden/,"/Alp","g")
    $0 = gensub( /@address@hidden/,"/Bet","g")
    $0 = gensub( /@address@hidden/,"/0","g")
    $0 = gensub( /@address@hidden/,"/1","g")

    $0 = gensub( /@address@hidden/,"X","g")
    $0 = gensub( /@address@hidden/,"IV","g")
    $0 = gensub( /@address@hidden/,"I","g")

    # etc.
    # often more advanced if/else logic is needed!
    #
    print $0
}
# end 2pass file --------------------------------------------------------------------------------------------

awk -f 1pass.awk input.txt | awk -f 2pass.awk > output.txt








