help-gnu-emacs

Re: regular expression


From: Emanuel Berg
Subject: Re: regular expression
Date: Mon, 30 Jun 2014 22:04:39 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

address@hidden writes:

> Hi, I'm newbe on this group.
>
> I know, that I can use Regexp in emacs. And, I would
> do that. Can someone help me?
>
> I have a text file, that is a converted .pdf
> file. So, I have many dirty character inside.
>
> I've found some reg- expression.
>
> i.e.: in a line like this:
>
> 40 STREET DW...
>
> I want to made substitution like this:
>
> 40#STREEDW...
>
> can someone help me to build this expression?

I suspect it is better to use gnu.emacs.help for this
kind of question as that group is much more
active. Therefore, I post this on both groups. You can
later remove the crosspost depending on where the
action is from now on.

As for your question, you only give one example so I
had to guess a bit what the general case is. For just
one example, you might as well use one (non-regexp)
search-and-replace, right? But I suspect you want to do
this on all cases like this:

40 STREET DW
6 ROAD EW
666 A Z
666 a z

So put point before those lines and evaluate the
command below (it works from point to the end of the
buffer):

(replace-regexp
 "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"
 "\\1#\\2\\3")

Here is how it works:

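[...] is a character set - [0-9] matches any one digit,
and [A-Z] any one letter from A to Z

+ is "one, or many (but never zero) of the previous"

a space in the regexp matches exactly one space
character

\\(...\\) is a group - those are used in the "replace
with" expression - \\1 means insert group 1 (counting
from left to right), and so on. The backslashes are
doubled only because the regexp is written as a Lisp
string; typed interactively it would be \(...\) and \1.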
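
To see the groups at work on a single line, here is a
small sketch that applies the same regexp to a string
instead of a buffer. The function name my-join-fields
is made up for this example; replace-regexp-in-string
is the standard Emacs function. Treat it as an
illustration under those assumptions, not as the one
right way to do it.

```elisp
;; Sketch only: `my-join-fields' is a made-up name, not part of Emacs.
(defun my-join-fields (line)
  "Return LINE with \"NUMBER WORD WORD\" rewritten as \"NUMBER#WORDWORD\"."
  (let ((case-fold-search nil))                 ; [A-Z] matches uppercase only
    (replace-regexp-in-string
     "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"   ; three groups, one space between each
     "\\1#\\2\\3"                               ; group 1, a literal #, groups 2 and 3
     line
     t)))                                       ; FIXEDCASE: leave the replacement's case alone

(my-join-fields "40 STREET DW")   ; => "40#STREETDW"
(my-join-fields "666 a z")        ; => "666 a z" (no match, returned unchanged)
```

Evaluating the two calls at the bottom (C-x C-e after
each closing paren) shows the result in the echo area,
which is a quick way to try a regexp before running it
on the real file.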

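Note that with the plain search functions (such as
re-search-forward), [A-Z] matches [a-z] as well (the
lowercase equivalent) unless the variable
case-fold-search is nil. replace-regexp also consults
search-upper-case, and with the default settings a
regexp that contains uppercase letters is already
matched case-sensitively. If you want case-sensitive
replacement (where [A-Z] makes sense) no matter how
those variables are set, you can enclose the command
like this: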

(let ((case-fold-search nil))
  (replace-regexp
   "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"
   "\\1#\\2\\3"))

You can watch this in action by running it on the
examples above - see how the "a z" one is left alone!
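
If you end up doing this often, the docstring of
replace-regexp itself suggests a loop over
re-search-forward and replace-match for use in Lisp
code. Below is a minimal sketch of that loop for the
same regexp. The command name my-join-fields-in-buffer
is hypothetical, and starting from the top of the
buffer (rather than from point) is my choice here, so
adjust to taste.

```elisp
;; Sketch only: `my-join-fields-in-buffer' is a made-up name, not part of Emacs.
(defun my-join-fields-in-buffer ()
  "Rewrite every \"NUMBER WORD WORD\" in the buffer as \"NUMBER#WORDWORD\".
Return the number of replacements made."
  (interactive)
  (let ((case-fold-search nil)   ; [A-Z] matches uppercase only
        (count 0))
    (save-excursion              ; put point back where it was afterwards
      (goto-char (point-min))    ; start at the top, not at point
      (while (re-search-forward
              "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)" nil t)
        ;; Second argument t is FIXEDCASE: insert the replacement as written.
        (replace-match "\\1#\\2\\3" t)
        (setq count (1+ count))))
    count))
```

After evaluating the defun you can run it with M-x
my-join-fields-in-buffer in the buffer that holds the
converted text. On the four example lines it changes
the first three and leaves "666 a z" alone.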

Yes, you can do this without writing code - but it is
easier to write it as code and evaluate it. You get a
better overview, and it is easier to adjust the regexp
(both the match and the replacement parts) - something
you often have to do a couple of times to get it
right. That beats typing it all in again and again
interactively.

Come back with more questions if you have any.
Otherwise tell us if you got it to work. Good luck!

-- 
underground experts united:
http://user.it.uu.se/~embe8573

