Re: LYNX-DEV URL grabber for lynx
From: Axel C. Frinke
Subject: Re: LYNX-DEV URL grabber for lynx
Date: Sun, 12 Oct 1997 23:38:27 +0200 (MET DST)
Roesberg, Sun 12.10.97
David,
A>> I've just written a small C program to filter URLs from stdin, sort
D>
D> Wouldn't it have been easier to borrow the routine from wget (Gnu sites)
D> that does this (assuming that you mean filter them from an HTML source)?
Yes, if wget sorts the URLs in a useful way.
(Sorry, I had never heard of wget before.)
In the meantime, I've enhanced my program to filter URLs from files
already generated by 'lynx -dump'.
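For reference, 'lynx -dump' appends a numbered "References" list to its
output, with entries like "   1. http://host/path". A minimal sketch of
picking those URLs out of a dump file might look like this (a hypothetical
illustration; the function name and the restriction to http/ftp URLs are my
assumptions, not Axel's actual code):

```c
#include <ctype.h>
#include <string.h>

/* If the line looks like a lynx -dump reference entry, e.g.
 *   "   3. http://www.example.com/page.html"
 * copy the URL into buf and return 1; otherwise return 0. */
int extract_ref_url(const char *line, char *buf, size_t buflen)
{
    const char *p = line;
    size_t n;

    while (*p == ' ' || *p == '\t')     /* skip indentation */
        p++;
    if (!isdigit((unsigned char)*p))    /* entry must start with a number */
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != '.')                      /* number is followed by a dot */
        return 0;
    p++;
    while (*p == ' ')
        p++;
    if (strncmp(p, "http://", 7) != 0 && strncmp(p, "ftp://", 6) != 0)
        return 0;                       /* only grab absolute URLs */
    n = strcspn(p, " \t\r\n");          /* URL ends at whitespace */
    if (n == 0 || n >= buflen)
        return 0;
    memcpy(buf, p, n);
    buf[n] = '\0';
    return 1;
}
```

A main that reads the dump file line by line with fgets() and prints every
match would then only need to sort the result (e.g. with qsort() and
strcmp()) to get a sorted, duplicate-free URL list.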
A>> batch or at) to retrieve all found URLs using "lynx -dump" (or
D> For -dump, and with recursion allowed, Lynx already does this.
You mean the "-crawl" or "-traversal" options?
Anyway, I was not satisfied with them. URLs other than 'text/html'
and URLs on a different host were not retrieved.
Well, I can accept that. After all, lynx must be able to terminate. ;-)
However, a program that generates a script to retrieve the URLs makes
it possible to select URLs before retrieving them.
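To illustrate the point (again a hypothetical sketch, not the actual UrlGrab
code): emitting one "lynx -dump" command per URL into a shell script lets the
user delete unwanted lines in an editor before running the script via sh,
batch, or at.

```c
#include <stdio.h>

/* Write one "lynx -dump 'URL' > urlNNN.txt" line per URL to out.
 * The resulting script can be edited by hand to drop unwanted
 * fetches, then run via sh, batch, or at. */
void write_fetch_script(FILE *out, const char *const urls[], int n)
{
    int i;

    fputs("#!/bin/sh\n", out);
    for (i = 0; i < n; i++)
        fprintf(out, "lynx -dump '%s' > url%03d.txt\n", urls[i], i + 1);
}
```

Single quotes around each URL keep shell metacharacters such as '&' and '?'
in query strings from being interpreted by the shell.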
D> wget has options to insert delays between each fetch to avoid overloading
D> the server and also respects robots.txt to avoid fetching private or
D> dynamic information.
OK, I will take a look at wget.
But then, I've already made the effort.
So, if there are no further concerns about publishing a URL grabber
despite the existence of wget, I will upload it to the web
(at http://titan.informatik.uni-bonn.de/~frinke/UrlGrab.zip).
Regards,
Axel.
;
; To UNSUBSCRIBE: Send a mail message to address@hidden
; with "unsubscribe lynx-dev" (without the
; quotation marks) on a line by itself.
;