lynx-dev
Re: lynx-dev Using other HTML parsers in Lynx


From: pAb-032871
Subject: Re: lynx-dev Using other HTML parsers in Lynx
Date: Wed, 26 Jul 2000 01:04:34 -0700

In "lynx-dev Using other HTML parsers in Lynx"
[25/Jul/2000 Tue 19:30:40]
Mooneer Salem wrote:

I don't know much about programming, but some of this is of interest
to me.

> For fun, I decided to write a library that parses HTML. :)
> 
> After writing the library I decided to write a demo application using it
> (which prints an English representation of the HTML passed into it and
> which also acts as a benchmark app.) According to the demonstration
> program, it parsed the PostgreSQL-HOWTO (300kb in HTML format) in
> 0.16 seconds, which is pretty impressive for a Celeron 500 with 192MB RAM
> which also acts as a pretty busy DNS, Web and database server.
> 
> Here's the question: how hard would it be to implement this parser into
> Lynx?

Depends how integrated you want it to be.  Could be as easy as
defining a new DOWNLOADER entry [your app] in lynx.cfg.
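
As a rough illustration of the lynx.cfg route: a DOWNLOADER line takes
a display name, a shell command with %s standing in for the file name,
and a trailing TRUE/FALSE field (which, as I understand it, controls
whether the entry is offered to restricted accounts). The install path
and the way the demo program takes its argument are assumptions here,
not something the library documents:

```
# Hypothetical entry: assumes the demo program is installed as
# /usr/local/bin/prettyHTML and accepts the file name as its only argument.
DOWNLOADER:Reformat with libhtmlparse demo:/usr/local/bin/prettyHTML %s:TRUE
```

That would only hand a downloaded file to the program; it would not
replace the parser Lynx uses to render pages.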

What really interests me is: who feels like teaching it to parse
JavaScript as well [if you're releasing the source and leaving
it open to changes]?

People seem to think adding support for JavaScript isn't practical,
and maybe impossible, in Lynx because of its one-pass HTML parsing.
A separate application that *could* make sense of JavaScript,
then pass the results to Lynx, popped into my head some time ago
but I don't know how to do it.

Here's an old message about it:
        http://www.flora.org/lynx-dev/html/month072000/msg00034.html

The follow-ups are probably more interesting than my input.
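
The separate-application idea above might look roughly like the sketch
below, written in Python only for brevity. It assumes Lynx is on the
PATH and that its -stdin and -dump options are available on your
system. The preprocess() function is a hypothetical placeholder: it
does not interpret any JavaScript, it only swaps each script element
for a visible marker so that the hand-off to Lynx can be shown.

```python
import re
import subprocess

# Matches a whole <script>...</script> element, ignoring case,
# and allowing the script body to span several lines.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def preprocess(html: str) -> str:
    """Hypothetical stand-in for a real script-aware filter.

    A real filter would have to evaluate the script; this one only
    replaces each script element with a placeholder paragraph.
    """
    return SCRIPT_RE.sub("<p>[script removed]</p>", html)


def render_with_lynx(html: str) -> str:
    """Pipe already-filtered HTML into Lynx and return its text dump."""
    result = subprocess.run(
        ["lynx", "-stdin", "-dump"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling render_with_lynx(preprocess(page)) on a fetched page would then
give the rendered text. The hard part, making sense of the script, is
exactly what this sketch leaves out.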


> The parser can be found at
> http://devel.usnuk.net/libhtmlparse-0.1-alpha1.tar.gz,
> and you can see the uncompressed archive at
> http://devel.usnuk.net/libhtmlparse-0.1-alpha1/.
> demo/test.c is the benchmark application I ran, while demo/prettyHTML is
> another demo app I wrote for the library (it makes HTML more clear and
> readable by adding tabs)
> 
> --
> Mooneer Salem
> Sysadmin, Ultraspeed UK (http://www.ultraspeed.co.uk/)
> GPLTrans (http://www.translator.cx/)
> Personal Home Page (http://msalem.translator.cx/)



                          Patrick
                <mailto:address@hidden>
 

