Re: lynx-dev Non-interactive lynx

From:	Duncan Simpson
Subject:	Re: lynx-dev Non-interactive lynx
Date:	Sun, 18 Mar 2001 04:16:01 +0000

> In "lynx-dev Non-interactive lynx"
> I think it would still provoke those who spend time and consideration
> on which of their files have;
>       <META NAME="robots" CONTENT="all/none/nofollow/noindex">
> and so forth.  Also bear in mind that no robot can read copyright
> notices in the body of a page.
>
Does wget notice this? The robot exclusion protocol I know about is different:
/robots.txt contains a set of glob patterns covering the pages that robots
should not read. I am pretty sure this is what wget knows about. If a site
features the META tags you suggest then it almost certainly provides
robots.txt as well.
 
> Just wondered: how easy/hard would it be to make Lynx obey robot
> exclusion protocols in non-interactive mode?  This is also done
> with HTTP headers?
> 
Obeying robots.txt should be eminently possible given a C library with a
working version of fnmatch in it. It might make sense to provide an
implementation of fnmatch for those with a deficient C library; for example,
the C library M$ Visual C++ provides probably lacks fnmatch (I know both
alloca and getopt are absent). If you need a symbol table mapping sites to
their robots.txt then I have a splay tree implementation that should be
fairly easy to adapt for that purpose.
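
To make the fnmatch idea concrete, here is a minimal sketch of the check
itself. It assumes the Disallow entries of a site's robots.txt have already
been parsed into an array of strings and are treated as glob patterns, as
described above. Note that the original robots.txt convention matches Disallow
values as plain path prefixes, so a real implementation might need a prefix
comparison instead. The function name robots_path_disallowed is hypothetical
and is not part of Lynx.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Lynx code): return 1 if `path` matches any of
 * the `n` glob patterns taken from robots.txt, 0 otherwise.
 * Assumes the patterns were already extracted from the Disallow lines.
 */
int robots_path_disallowed(const char *const *patterns, size_t n,
                           const char *path)
{
    size_t i;

    assert(patterns != NULL && path != NULL);

    for (i = 0; i < n; i++) {
        /* flags == 0: FNM_PATHNAME is off, so '*' may also match '/'
         * and "/private/*" covers everything below /private/. */
        if (fnmatch(patterns[i], path, 0) == 0)
            return 1;
    }
    return 0;
}
```

In non-interactive mode the caller would look up the pattern list for the
target host (for instance in the per-site symbol table mentioned above) and
skip the fetch whenever this function returns 1.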

We can steal an implementation of fnmatch from glibc, since lynx is 
distributed under the GPL.

-- 
Duncan (-:
"software industry, the: unique industry where selling substandard goods is
legal and you can charge extra for fixing the problems."


