bug-gawk

Re: [bug-gawk] feature request: expanding escape sequences


From: Ed Morton
Subject: Re: [bug-gawk] feature request: expanding escape sequences
Date: Sun, 06 Jul 2014 17:11:17 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

OK, so for processing single-character escapes I now have the script below, which seems to work just fine, and I'm sure I could expand it if necessary. Thanks!

      Ed.

$ cat tst2.awk
function expandEscapes(old,     segNr, segs, seps, new, idx) {
    # gawk-only: the 4th argument to split() receives the matched separators
    split(old,segs,/\\./,seps)
    for (segNr=1; segNr in segs; segNr++) {
        # map a recognized escape letter to its literal character
        if ( idx = index( "abfnrtv", substr(seps[segNr],2,1) ) )
            seps[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] seps[segNr]
    }
    return new
}

{
    printf "\"%s\"\n", expandEscapes($0)
}
----------------------------------------
$ cat file
a\tb\nc\jd\\te
---------------------------------------
$ awk -f tst2.awk file
"a      b
c\jd\\te"


On 7/6/2014 4:32 PM, Aharon Robbins wrote:
Date: Sun, 06 Jul 2014 13:25:31 -0500
From: Ed Morton <address@hidden>
To: Manuel Collado <address@hidden>, address@hidden
Subject: Re: [bug-gawk] feature request: expanding escape sequences

On 7/6/2014 1:05 PM, Manuel Collado wrote:
On 06/07/2014 16:26, Ed Morton wrote:
Arnold - I don't believe it can be done fairly easily in any way. For
example, wrt your suggestion of splitting on an RE, let's say I define my
escape-sequence matching RE as '\[[:alpha:]]'. Well, that's wrong:

Please try this RE:

   /\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/

It should match every valid escape sequence, as well as every unnecessarily
escaped normal character.

Hope this helps. Regards.
That does identify the escape sequences in my small set of test cases, so
assuming it works for every case (and it looks like it should) then all that'd
be left to do is mapping them to their equivalent literal characters. Any
suggestions on a concise way of doing that?

Thanks,

       Ed.
Manuel, thanks for the regexp.

Ed, if there is exactly one character after the \, you can use

        i = index("abfnrtv", c)

if i > 0 then use

        c = substr("\a\b\f\n\r\t\v", i, 1)

to get the corresponding real character.  For a hex value after
the \, use

        c = sprintf("%c", strtonum("0x" rest_of_str))

and for an octal value use

        c = sprintf("%c", strtonum("0" rest_of_str))

You should be able to figure out how to put it all together.

If you want me to write the function for you from scratch,
I will charge a high consulting fee. :-)

Good luck,

Arnold




