Re: lynx-dev making lynx traversal crawl download html, not text
From: Bob
Subject: Re: lynx-dev making lynx traversal crawl download html, not text
Date: Fri, 22 Mar 2002 22:48:41 -0500
I can't find anywhere that -traversal or -crawl uses srcmode_for_next_retrieval,
where we could get html instead of text by calling srcmode_for_next_retrieval(1)
instead of (0) or (-1). I'm looking elsewhere now.
OR
Since all I need is to have lynx open a URL, satisfy the cookie
demands, then request the same URL a second time to get around
yahoo's ad page with its "Continue to message" link (which just
requests the same URL again), could I feed a GET for the URL on
stdin twice, or give it once on the command line and GET it again?
OR
If view mode were set to default to "source" rather than "presentation"
text mode, -traversal -crawl might download html.
OR
If -source was changed in the following way, -traversal -crawl -source
might not quit on the first link like -dump, and might keep on going in
source mode download to the *.dat files.
The way it is now, -source makes lynx quit after the first download:
/* -source */
PRIVATE int source_fun ARGS1(
        char *,                 next_arg GCC_UNUSED)
{
    dump_output_immediately = TRUE;
    HTOutputFormat = (LYPrependBase ?
                      HTAtom_for("www/download") : HTAtom_for("www/dump"));
    LYcols = MAX_COLS;
    return 0;
}
could be
/* -source */
PRIVATE int source_fun ARGS1(
        char *,                 next_arg GCC_UNUSED)
{
    /* dump and quit only when neither -traversal nor -crawl is set */
    dump_output_immediately = FALSE;
    if (traversal != TRUE && crawl != TRUE) {
        dump_output_immediately = TRUE;
    }
    HTOutputFormat = (LYPrependBase ?
                      HTAtom_for("www/download") : HTAtom_for("www/dump"));
    LYcols = MAX_COLS;
    return 0;
}
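To make the intended condition concrete, here is a minimal standalone sketch of it outside lynx. The struct and function names are hypothetical stand-ins, not lynx code, and it assumes both flags have already been parsed by the time -source is handled (which may not hold if -source comes first on the command line).

```c
#include <stdbool.h>

/* Hypothetical stand-in for lynx's global option flags (illustration only). */
struct crawl_opts {
    bool traversal;   /* set when -traversal was given */
    bool crawl;       /* set when -crawl was given */
};

/*
 * Value the proposed -source handler would store in
 * dump_output_immediately: TRUE only when neither flag is set,
 * so a plain -source still dumps one document and quits.
 */
static bool source_dumps_immediately(const struct crawl_opts *o)
{
    return !o->traversal && !o->crawl;
}
```

With either flag set the function returns false, which is the case where lynx would keep going instead of quitting after the first document.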
That's not enough, though, since -traversal and -crawl would
be downloading files, not just sending to stdout as -source.
-traversal and -crawl build a links table
    links[curdoc.link].lname
    add_to_table(curdoc.address)
which they download into *.dat files via
    sprintf(cfile,"lnk%08d.dat",ccount);
Are the curdocs referenced in that table in source format?
Not since they are sprintfable?
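For reference, a small sketch of how that file name comes out, assuming the intended format string is "lnk%08d.dat" (a zero-padded 8-digit counter). The helper name crawl_filename is made up for illustration, and snprintf is used here only to bound the write; this is not the lynx code itself.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the crawl output file name for a given
 * counter value into buf. Returns the number of characters that the
 * full name needs (as snprintf does), not counting the terminating NUL.
 */
static int crawl_filename(char *buf, size_t size, int count)
{
    return snprintf(buf, size, "lnk%08d.dat", count);
}
```

Calling crawl_filename(buf, sizeof buf, 7) with a large enough buffer leaves "lnk00000007.dat" in buf. Note that the sprintf only builds the name of the output file; it says nothing about whether the document written to it is rendered text or html source.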
-Bob
Thomas Dickey wrote:
> On Fri, Mar 22, 2002 at 07:30:19PM -0500, Bob wrote:
> > Either -dump or -source restrict the download to one file
> > only, correct?
> >
> > I was hoping to iterate the crawl with downloading in
> > html format.
> >
> > Perhaps there is a mode=1 set somewhere, instead of
> > mode=0, if srcmode_for_next_retrieval() is called from
> > somewhere? Or?
>
> I only see srcmode_for_next_retrieval() called with constant parameters:
So if one of those places where the call is made with parameter
(0) or (-1) is in a code path that runs under -traversal or -crawl,
I would put (1) there instead. I'll start looking at:
src/LYMainLoop.c:3819: srcmode_for_next_retrieval(0);
src/LYMainLoop.c:4380: srcmode_for_next_retrieval(-1);
src/LYMainLoop.c:4407: srcmode_for_next_retrieval(0);
src/LYMainLoop.c:4472: srcmode_for_next_retrieval(0);
src/LYOptions.c:3039: srcmode_for_next_retrieval(0);
src/LYOptions.c:3049: srcmode_for_next_retrieval(0);
-Bob
> src/LYGetFile.c:1118:PUBLIC void srcmode_for_next_retrieval ARGS1(
> src/LYGetFile.h:11:extern void srcmode_for_next_retrieval PARAMS((int));
> src/LYMainLoop.c:3802: srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:3819: srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4236: srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4380: srcmode_for_next_retrieval(-1);
> src/LYMainLoop.c:4385: srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4407: srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4447: srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4469: srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4472: srcmode_for_next_retrieval(0);
> src/LYOptions.c:3032: srcmode_for_next_retrieval(1);
> src/LYOptions.c:3039: srcmode_for_next_retrieval(0);
> src/LYOptions.c:3049: srcmode_for_next_retrieval(0);
>
> --
> Thomas E. Dickey <address@hidden>
> http://invisible-island.net
> ftp://invisible-island.net