Re: How to get title of web page by url?

From: Andreas Röhler
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:03:58 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; de; rv: Gecko/20100711 Thunderbird/3.0.6

[ ... ]

The real solution for extracting the title from HTML text is not regular
expressions but a dedicated HTML parser. The Lisp way to write such a
parser would be to turn the document (or only the head part) into nested
lists and other s-expressions and then dive into the list to find the
title. Such parsers already exist for Common Lisp, but I'm not sure about
Emacs Lisp.
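
To make the "nested lists" idea concrete, here is a minimal sketch in Python rather than Emacs Lisp, using only the standard library's html.parser. The names TreeBuilder, parse_html, find_first and page_title are made up for this illustration and do not come from any of the packages mentioned in this thread. Each element becomes a list of the form [tag, attrs, child, child, ...], which is roughly the shape an s-expression tree would take.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Build a nested-list tree: [tag, attrs, child, child, ...]."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["#document", {}]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].append(data)


def parse_html(text):
    """Parse TEXT and return the root of the nested-list tree."""
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def find_first(node, tag):
    """Depth-first search for the first element named TAG, or None."""
    if node[0] == tag:
        return node
    for child in node[2:]:
        if isinstance(child, list):
            found = find_first(child, tag)
            if found is not None:
                return found
    return None


def page_title(text):
    """Return the stripped text of the first <title> element, or None."""
    node = find_first(parse_html(text), "title")
    if node is None:
        return None
    return "".join(c for c in node[2:] if isinstance(c, str)).strip()
```

Because comments are consumed by the parser instead of being matched as text, a commented-out title does not fool this approach the way it can fool a naive regular expression. Fetching the page from its URL is left out here; page_title only expects the HTML as a string.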



is an attempt at such a parser.

See also thing-at-point-markup.el, which supports markup languages such as XML and HTML.

thing-at-point-utils.el offers functions that grab everything between angle brackets, and they take nesting into account.

Try ar-angled-lesser-atpt, for example.
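
What "counting nesting" means can be shown with a small sketch. This is not the code behind ar-angled-lesser-atpt, whose exact behaviour I am not describing here; it is a Python illustration with a made-up function name, enclosing_angles, that finds the innermost <...> span around a position by tracking bracket depth in both directions.

```python
def enclosing_angles(text, pos):
    """Return (start, end) of the innermost <...> span containing POS.

    END is exclusive. Return None when POS is not inside a balanced span.
    """
    if not 0 <= pos < len(text):
        return None

    # Scan left for the first '<' that is not closed before POS.
    depth = 0
    start = None
    for i in range(pos, -1, -1):
        ch = text[i]
        if ch == ">" and i != pos:
            depth += 1          # a nested span closed here; skip its opener
        elif ch == "<":
            if depth == 0:
                start = i
                break
            depth -= 1
    if start is None:
        return None

    # Scan right from the opener for its matching '>'.
    depth = 0
    for j in range(start + 1, len(text)):
        ch = text[j]
        if ch == "<":
            depth += 1          # a nested span opens; its '>' is not ours
        elif ch == ">":
            if depth == 0:
                return (start, j + 1)
            depth -= 1
    return None
```

For the text "a <b <c> d> e", a position on the "d" yields the outer span "<b <c> d>", while a position on the "c" yields only "<c>". Without the depth counter, the scan from "d" would stop at the first ">" or "<" it meets and return the wrong span.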

all this needs


where the core routines reside.

Have a look at how the parser mentioned above is used via beginning-of-form-base and end-of-form-base from there.



