help-gnu-emacs

Re: How to get title of web page by url?


From: Teemu Likonen
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:53:17 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2.50 (gnu/linux)

* 2010-07-28 16:12 (+0200), Deniz Dogan wrote:

> 2010/7/28 Thamer Mahmoud <thamer.mahmoud@gmail.com>:
>>    (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)

> By the way, this will not work in scenarios where the title is spread
> out across multiple lines:
>
> <title>
>   Hello
> </title>
>
> How would you solve this in Emacs Lisp?

Regexps can match whitespace too. Just skip spaces, tabs and newlines
at the beginning and end of the title text. Also note that the title
text itself may contain newlines, so we should probably replace them
with spaces in the matched string.
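A minimal sketch of that idea, reusing the re-search-forward approach
quoted above. The function name my-html-title-in-buffer is
hypothetical, and the regexp assumes the title contains no literal `<'
character:

```elisp
(defun my-html-title-in-buffer ()
  "Return the text of the first <title> element in the current buffer.
Leading and trailing spaces, tabs and newlines are skipped, and any
internal run of whitespace is collapsed to one space.  Return nil if
no title is found.  Assumes the title text contains no `<'."
  (save-excursion
    (goto-char (point-min))
    ;; Match <title>, <TITLE>, <Title lang=\"en\"> etc.
    (let ((case-fold-search t))
      (when (re-search-forward
             ;; [^<] also matches newlines, so the title may span lines;
             ;; the non-greedy *? leaves trailing whitespace to the
             ;; following [ \t\n\r]* so it is not captured.
             "<title[^>]*>[ \t\n\r]*\\([^<]*?\\)[ \t\n\r]*</title>"
             nil t)
        ;; Replace newlines (and other whitespace runs) with one space.
        (replace-regexp-in-string
         "[ \t\n\r]+" " " (match-string-no-properties 1))))))
```

For example, in a buffer containing "<title>\n  Hello\n</title>" it
returns "Hello".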

The real solution for extracting the title from HTML text is not
regular expressions but a dedicated HTML parser. The Lisp way to write
such a parser would be to turn the document (or only the head part)
into nested lists and other s-expressions and then walk the lists to
find the title element. Such parsers already exist for Common Lisp, but
I'm not sure about Emacs Lisp.
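To illustrate the s-expression idea: assuming the document has already
been parsed into nested lists of the form (TAG ATTRIBUTES . CHILDREN),
with text nodes as plain strings, finding the title is a simple
depth-first walk. The function name my-sexp-find-title is hypothetical.
That list shape is the one libxml-parse-html-region returns in Emacs
24.1 and later when built with libxml2, but the sketch itself does not
call any parser:

```elisp
(require 'subr-x)  ; for `string-trim'

(defun my-sexp-find-title (node)
  "Return the text of the first `title' element under NODE, or nil.
NODE is a string (a text node) or a list (TAG ATTRIBUTES . CHILDREN)."
  (when (consp node)
    (if (eq (car node) 'title)
        ;; Concatenate the text children, collapse whitespace runs
        ;; (including newlines) to one space, and trim both ends.
        (string-trim
         (replace-regexp-in-string
          "[ \t\n\r]+" " "
          (mapconcat (lambda (child) (if (stringp child) child ""))
                     (cddr node) "")))
      ;; Not a title: search the children depth-first, left to right,
      ;; stopping at the first match.
      (let ((children (cddr node))
            (found nil))
        (while (and children (not found))
          (setq found (my-sexp-find-title (car children))
                children (cdr children)))
        found))))
```

In an Emacs with libxml2 support, something like
(my-sexp-find-title (libxml-parse-html-region (point-min) (point-max)))
should then work on a buffer holding the fetched page.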


