help-gnu-emacs

Re: How to get title of web page by url?


From: Lennart Borgman
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:44:45 +0200

On Wed, Jul 28, 2010 at 5:34 PM, Thamer Mahmoud
<address@hidden> wrote:
> filebat Mark <address@hidden> writes:
>
>> Thanks, Thamer. It works.
>>
>> Below is the code snippet.
>>
>> Well, I still have an encoding problem.
>> To get the title of "http://www.baidu.com", the title we get is displayed as
>> unrecognizable codes.
>>
>> I have tried to encode it, in the way of "(setq web_title_str
>> (encode-coding-string  web_title_str 'utf-8-dos))", but it fails.
>
> I'm also new to Elisp (well sort of).
>
> But here is a modified version that should handle both charsets and
> newlines (and other issues noticed by Deniz Dogan. Thanks).
>
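> (defun www-get-page-title (url)
>   "Return the <title> text of the page at URL, or nil if none is found."
>   (with-current-buffer (url-retrieve-synchronously url)
>     (goto-char (point-min))
>     ;; Match tags case-insensitively; [^<] also matches newlines.
>     (let ((case-fold-search t) title)
>       (when (re-search-forward "<title[^>]*>\\([^<]*\\)</title>" nil t)
>         (setq title (match-string 1))
>         (goto-char (point-min))
>         ;; Decode only when a charset is declared and Emacs knows it.
>         (if (re-search-forward "charset=[\"']?\\([-0-9a-zA-Z_]+\\)" nil t)
>             (let ((cs (intern (downcase (match-string 1)))))
>               (if (coding-system-p cs) (decode-coding-string title cs) title))
>           title)))))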
>
> The robustness of this code would still depend on whether the HTML is
> well-formed, but it should be good enough I think.


Have a look at url-copy-file for how to get this correct. (Or
web-vcs-url-copy-file in nXhtml which is a little bit more careful.)
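
For illustration only, here is a rough sketch of one way to make the
charset handling sturdier. It is not taken from url-copy-file or from
nXhtml. It assumes the buffer returned by url-retrieve-synchronously
holds the raw HTTP headers, a blank line, and then the body, and that
UTF-8 is an acceptable fallback when no charset is declared. The names
my-charset-from-headers, my-title-from-response and
my-www-get-page-title are made up for this example.

```elisp
(require 'url)
(require 'subr-x)                       ; string-trim on older Emacs

(defun my-charset-from-headers (headers)
  "Return the coding system named in the Content-Type line of HEADERS.
Return nil if there is none or Emacs does not know it.  Hypothetical helper."
  (let ((case-fold-search t))
    (when (string-match
           "^content-type:[^\r\n]*charset=[\"']?\\([-0-9a-z_]+\\)" headers)
      (let ((cs (intern (downcase (match-string 1 headers)))))
        (and (coding-system-p cs) cs)))))

(defun my-title-from-response (response)
  "Return the decoded <title> text in RESPONSE, or nil if there is none.
RESPONSE is a unibyte string: HTTP headers, a blank line, then the body."
  (let* ((case-fold-search t)
         (split (string-match "\r?\n\r?\n" response))
         (headers (substring response 0 (or split 0)))
         (body (if split (substring response (match-end 0)) response))
         ;; Prefer the HTTP header, then a charset in the markup.
         ;; Falling back to UTF-8 is an assumption of this sketch.
         (cs (or (my-charset-from-headers headers)
                 (and (string-match "charset=[\"']?\\([-0-9a-z_]+\\)" body)
                      (let ((c (intern (downcase (match-string 1 body)))))
                        (and (coding-system-p c) c)))
                 'utf-8)))
    (when (string-match "<title[^>]*>\\([^<]*\\)</title>" body)
      (string-trim (decode-coding-string (match-string 1 body) cs)))))

(defun my-www-get-page-title (url)
  "Fetch URL synchronously and return its decoded title, or nil."
  (let ((buf (url-retrieve-synchronously url)))
    (when buf
      (unwind-protect
          (with-current-buffer buf
            (my-title-from-response (buffer-string)))
        ;; Do not leave the retrieval buffer behind.
        (kill-buffer buf)))))
```

Something like (my-www-get-page-title "http://www.baidu.com") would then
return the title as a decoded string, or nil when no title is found.
Keeping the parsing in a function that takes a string makes it possible
to try it on canned responses without touching the network. It is still
regexp matching on HTML, so it will miss unusual markup.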


