Re: How to get title of web page by url?

From: filebat Mark
Subject: Re: How to get title of web page by url?
Date: Thu, 29 Jul 2010 23:07:43 +0800

Thank you very much, Thamer! It serves my needs very well.

Although an HTML parser would be more powerful, grepping for the string is good enough for my requirements.
Thank you all for your attention and the valuable discussion.
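
For anyone who would rather try the parser route, here is a minimal sketch. It assumes an Emacs built with libxml2 support (so that `libxml-parse-html-region' is available) and dom.el (Emacs 25 or later). The name `my-title-from-html' is hypothetical and does not come from this thread.

```elisp
(require 'dom)
(require 'subr-x)  ; provides `string-trim' on older Emacs versions

(defun my-title-from-html (html)
  "Return the text of the first <title> element in the string HTML, or nil.
Hypothetical helper; assumes Emacs was built with libxml2."
  (with-temp-buffer
    (insert html)
    (let* ((dom (libxml-parse-html-region (point-min) (point-max)))
           (node (car (dom-by-tag dom 'title))))
      (when node
        ;; Join the string children of the node, skipping nested elements.
        (string-trim
         (mapconcat #'identity
                    (delq nil (mapcar (lambda (child)
                                        (and (stringp child) child))
                                      (dom-children node)))
                    ""))))))
```

It takes the page source as a string, so the fetching and charset decoding discussed in this thread would still have to happen before calling it.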

Here is the complete Lisp function, in case someone else needs it.
;; -------------------------- separator --------------------------
(defun get-page-title ()
  "Insert the title of the web page whose URL is on the current line."
  (interactive)
  ;; Get the URL from the current line without touching the kill ring.
  (let ((url (buffer-substring-no-properties
              (line-beginning-position) (line-end-position)))
        title charset)
    ;; Fetch the page with the help of url.el.
    (with-current-buffer (url-retrieve-synchronously url)
      ;; Find the title by grepping the HTML.
      (goto-char (point-min))
      (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
        (setq title (match-string 1)))
      ;; Find the charset by grepping the response.
      (goto-char (point-min))
      (when (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t)
        ;; Downcase the charset: as noted in this thread, UTF-8 was not
        ;; accepted by Emacs, while utf-8 is fine.
        (setq charset (intern (downcase (match-string 1))))))
    ;; Decode the title if the charset is a coding system Emacs knows.
    (when (and title (coding-system-p charset))
      (setq title (decode-coding-string title charset)))
    ;; Insert the title on the next line.
    (when title
      (end-of-line)
      (insert "\n" title))))

On Thu, Jul 29, 2010 at 2:14 AM, Thamer Mahmoud <address@hidden> wrote:

> (defun www-get-page-title (url)
>   (let ((title))
>     (with-current-buffer (url-retrieve-synchronously url)
>       (goto-char (point-min))
>       (re-search-forward "<title>\\([^<]*\\)</title>" nil t 1)
>       (setq title (match-string 1))
>       (goto-char (point-min))
>       (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t 1)
>       (decode-coding-string title (intern (match-string 1))))))

Just did a test on a wikipedia page, and looks like
`decode-coding-string' doesn't handle upper-case charsets, like UTF-8,
only utf-8.

So the last line should be:

(decode-coding-string title (intern (downcase (match-string 1)))))))
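
A page can also declare a charset that Emacs does not know at all, in which case downcasing alone will not help. A minimal sketch of a more defensive conversion follows; the helper name `my-charset-to-coding-system' and the fallback to utf-8 are assumptions for illustration, not something taken from this thread.

```elisp
(defun my-charset-to-coding-system (name)
  "Return a coding system symbol for the charset string NAME.
Hypothetical helper: NAME is downcased first, and utf-8 is used as an
assumed fallback when Emacs does not recognize the result."
  (let ((sym (intern (downcase name))))
    ;; `coding-system-p' also accepts nil, so require a non-nil symbol.
    (if (and sym (coding-system-p sym))
        sym
      'utf-8)))
```

With it, the last line could become (decode-coding-string title (my-charset-to-coding-system (match-string 1))), provided the charset search actually matched.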


Thanks & Regards

Denny Zhang
