[Lynx-dev] Lynx as a filter?
Fri, 4 Aug 2006 14:17:54 -0600 (MDT)
I should like to be able to use lynx as a filter to translate HTML
to plain text. That is to say, I should like to be able to send
HTML to lynx's stdin and have plain text (no HTML) appear at lynx's
stdout.
I am able to make this translation with lynx but only if the input
is given as the name of a file containing the HTML on the command
line:
/usr/local/bin/lynx -dump -force_html file.html
Before I continue:
PC% lynx --version
Lynx Version 2.8.5rel.1 (04 Feb 2004)
Built on freebsd4.11 Jan 4 2005 04:58:20
According to the lynx man page it should be possible to convince
lynx to accept input via its stdin -- unless I'm misunderstanding
what is being said:
- If the argument is only '-', then Lynx expects to
receive the arguments from stdin. This is to allow
for the potentially very long command line that
can be associated with the -get_data or -post_data
arguments (see below). It can also be used to
avoid having sensitive information in the invoking
command line (which would be visible to other
processes on most systems), especially when the
-auth or -pauth options are used.
But when I try something like:
PC% cat file.html | /usr/local/bin/lynx -dump -force_html -
I get something like the following:
Can't Access `file://localhost/home/crs/</HTML>'
Alert!: Unable to access document.
lynx: Can't access startfile
Clearly, that is only a test; in that situation, I could just use
the command line described earlier with the filename as an argument
on the command line.
For those who may be curious, I want to be able to convert HTML to
text when it is not in a file. Specifically, I want to be able to
use vi's capability to apply an external command to a unit of text
(e.g. a paragraph or paragraphs). I want to make a simple-minded
shell script (say in file, html2txt):
/usr/local/bin/lynx -dump -force_html -
So that when I receive one of those damnable e-mails full of HTML,
I can run vi on the message (my mail client allows me to do that),
go to the start of the body of the message and tell vi
!99} html2txt
and have the script run lynx on the next 99 (or fewer) paragraphs,
converting it into readable text very much as I'm able to do
to format text with long lines to shorter lines.
As mentioned earlier, the shell script:
/usr/local/bin/lynx -dump -force_html "$@"
works fine as long as I feed that HTML to it from a file named on
the script's command line. But that means that, instead of simply
being able to run vi on the e-mail message, moving to the start of
the HTML, and doing the "!99} html2txt" on the remainder of the
message to replace the HTML with the actual content of the message,
I must, instead, save the message to a file, delete the e-mail
headers from that file, and then run html2txt on that filename,
either saving the output to a file or piping it to a pager to read.
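
One possible workaround, offered as a sketch and not tested against lynx
2.8.5rel.1: since the file-name form of the command already works, a
wrapper can spool its stdin to a temporary file and hand that file to
lynx. The function name html2txt matches the script name above; the LYNX
environment variable and the h2t_ variable names are hypothetical,
introduced only for this illustration.

```shell
#!/bin/sh
# html2txt (sketch): make lynx usable as a stdin-to-stdout filter by
# spooling stdin to a temporary file first.
# Assumption: $LYNX is a hypothetical override for the lynx binary;
# it defaults to whatever "lynx" is found on PATH.

html2txt() {
    # Create a private temporary file to hold the HTML read from stdin.
    h2t_tmp=$(mktemp "${TMPDIR:-/tmp}/html2txt.XXXXXX") || return 1

    # Copy everything arriving on stdin into the temporary file.
    cat > "$h2t_tmp"

    # Run the invocation that is known to work with a file name argument.
    "${LYNX:-lynx}" -dump -force_html "$h2t_tmp"
    h2t_status=$?

    # Clean up and report lynx's exit status to the caller.
    rm -f "$h2t_tmp"
    return "$h2t_status"
}
```

With a final line that simply calls html2txt added to the script file,
"!99} html2txt" in vi should then replace the selected HTML with lynx's
plain-text rendering. The lynx man page also lists a -stdin option
("read the startfile from standard input"); whether the build above
accepts it is an assumption that would need checking.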
Thanks for any help (or even for letting me know that I'm
misunderstanding the lynx man page so I can start looking elsewhere
for a solution).
P. O. Box 1225
Edgewood, NM 87015
Why HTML in e-mail is evil: http://www.birdhouse.org/etc/evilmail.html
and (possibly) how to turn it off: http://www.expita.com/nomime.html
- [Lynx-dev] Lynx as a filter?,
Charlie Sorsby <=