help-gnu-emacs

Re: How to get title of web page by url?


From: Thamer Mahmoud
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:34:56 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

filebat Mark <filebat.mark@gmail.com> writes:

> Thanks, Thamer. It works.
>
> Below is the code snippet.
>
> Well, I still have an encoding problem.
> When I get the title of "http://www.baidu.com", it is displayed as
> unrecognizable characters.
>
> I have tried to encode it with "(setq web_title_str
> (encode-coding-string web_title_str 'utf-8-dos))", but it fails.

I'm also new to Elisp (well, sort of).

But here is a modified version that should handle both charsets and
newlines (and other issues noticed by Deniz Dogan; thanks).

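(require 'url)

(defun www-get-page-title (url)
  "Return the title of the web page at URL, decoded using its charset."
  (let ((buffer (url-retrieve-synchronously url))
        title coding)
    (with-current-buffer buffer
      (goto-char (point-min))
      (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
        (setq title (match-string 1)))
      (goto-char (point-min))
      ;; Emacs coding system names are lower case, e.g. `utf-8'.
      (when (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t)
        (setq coding (intern (downcase (match-string 1))))))
    (kill-buffer buffer)
    (if (and title coding (coding-system-p coding))
        (decode-coding-string title coding)
      title)))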
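
To check the newline handling without fetching a page, here is a minimal sketch. `www-title-from-html` is a hypothetical helper (not part of url.el) that runs the same title regexp over a string instead of the retrieved buffer:

```elisp
;; `www-title-from-html' is a hypothetical helper for testing the
;; title regexp offline; it is not part of url.el.
(defun www-title-from-html (html)
  "Return the text between <title> and </title> in the string HTML, or nil."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; [^<] also matches newlines, so a title split across lines is found.
    (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
      (match-string-no-properties 1))))

(www-title-from-html "<head><title>EmacsWiki:\nGit</title></head>")
;; => "EmacsWiki:\nGit"
```

The older "\\(.*\\)" pattern finds nothing on the same input, because `.` does not match a newline in Emacs regexps.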

Because this relies on regexps rather than an HTML parser, its
robustness still depends on the HTML being well-formed, but I think it
should be good enough.
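
On the encoding question in the quoted message: bytes fetched from the network have to be *decoded* into characters with `decode-coding-string`; `encode-coding-string` goes the other way (characters to bytes), so it cannot make the title readable. A minimal sketch, assuming a hypothetical helper `www-charset-to-coding-system` (not part of Emacs) that maps an HTML charset name to an Emacs coding system:

```elisp
;; `www-charset-to-coding-system' is a hypothetical helper, not an
;; Emacs built-in.
(defun www-charset-to-coding-system (charset)
  "Return the Emacs coding system named by the string CHARSET, or nil.
HTML often spells charsets in upper case (\"UTF-8\"), while Emacs coding
system symbols are lower case, so the name is downcased first."
  (let ((coding (and charset (intern (downcase charset)))))
    (and (coding-system-p coding) coding)))

;; Simulate raw page bytes: "café" encoded as UTF-8, then decode them
;; back into characters using the charset name found in the page.
(let ((raw (encode-coding-string "caf\u00e9" 'utf-8)))
  (decode-coding-string raw (www-charset-to-coding-system "UTF-8")))
;; => "café"
```

An unknown name such as "no-such-charset" yields nil, in which case the title is best returned undecoded.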

--
Thamer

> Since I am a newbie to Emacs encoding, could you please help me
> figure out what the problem is?
>
> ;; -------------------------- separator --------------------------
> (defun get-page-title()
>   "Get title of web page, whose url can be found in current line"
>   (interactive)
>   ;; Get url from current line
>   (copy-region-as-kill (re-search-backward "^") (re-search-forward "$"))
>   (setq url (substring-no-properties (current-kill 0)))
>   ;; Get title of web page, with the help of functions in url.el
>   (with-current-buffer (url-retrieve-synchronously url)
>     (goto-char 0)
>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>     (setq web_title_str (match-string 1)))
>     (setq web_title_str (encode-coding-string web_title_str 'utf-8-dos))
>   ;; Insert the title in the next line
>   (reindent-then-newline-and-indent)
>   (insert web_title_str)
>   )
>
>
> On 7/28/10, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:
>>
>> filebat Mark <filebat.mark@gmail.com> writes:
>>
>> > Such as, given "http://www.emacswiki.org/emacs/Git", we will get the
>> > title of this web page, which is "EmacsWiki: Git:".
>> >
>> > Function of w3m-current-title is quite close, but a standalone lisp
>> > function is much preferred.
>>
>>
>> Using the url.el package,
>>
>> (defun www-get-page-title (url)
>>   (with-current-buffer (url-retrieve-synchronously url)
>>     (goto-char 0)
>>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>>     (match-string 1)))
>>
>> (www-get-page-title "http://www.emacswiki.org/emacs/Git")
>> => "EmacsWiki: Git"
>>
>> hth,
>>
>> Thamer
>>