-source change !work Re: lynx-dev making lynx traversal crawl download
From: Bob
Subject: -source change !work Re: lynx-dev making lynx traversal crawl download html, not text
Date: Sat, 23 Mar 2002 02:57:54 -0500
The source_fun change alone (dump_output_immediately = FALSE)
didn't let -traversal -crawl -source download html source.
-source still downloaded one file and then quit, as usual.
Next I'll look at how -traversal and -crawl build a links table

links[curdoc.link].lname
add_to_table(curdoc.address)

which they download into *.dat files via

sprintf(cfile,"lnk%08d.dat",ccount);

Are the curdocs referenced in that table in source format?
Presumably not, since they are sprintfable?
Bob wrote:
> I don't find anywhere -traversal or -crawl use srcmode_for_next_retrieval,
> so that we could get html instead of text by srcmode_for_next_retrieval(1)
> instead of (0) or (-1). I'm looking elsewhere now.
>
> OR
>
> Since all I need to do is have lynx try to open a URL, satisfy cookies
> demands, then request the same URL a second time to go around
> yahoo's ad page with its "Continue to message" link (which just requests
> the same URL a second time), could I send a GET for the URL twice on
> stdin, or once on the command line and then GET it again?
>
> OR
>
> If view mode were set to default to "source" rather than "presentation"
> text mode, -traversal -crawl might download html.
>
> OR
>
> If -source was changed in the following way, -traversal -crawl -source
> might not quit on the first link like -dump, and might keep on going in
> source mode download to the *.dat files.
>
> the way it is now -source will make lynx quit on the first download
>
> /* -source */
> PRIVATE int source_fun ARGS1(
>         char *,         next_arg GCC_UNUSED)
> {
>     dump_output_immediately = TRUE;
>     HTOutputFormat = (LYPrependBase ?
>                       HTAtom_for("www/download") : HTAtom_for("www/dump"));
>     LYcols = MAX_COLS;
>     return 0;
> }
>
> could be
>
> /* -source */
> PRIVATE int source_fun ARGS1(
>         char *,         next_arg GCC_UNUSED)
> {
>     /* dump and quit only when neither -traversal nor -crawl is set */
>     dump_output_immediately = FALSE;
>     if (traversal != TRUE && crawl != TRUE) {
>         dump_output_immediately = TRUE;
>     }
>     HTOutputFormat = (LYPrependBase ?
>                       HTAtom_for("www/download") : HTAtom_for("www/dump"));
>     LYcols = MAX_COLS;
>     return 0;
> }
>
> That's not enough, though, since -traversal and -crawl would
> be downloading files, not just sending to stdout as -source.
>
> -traversal and -crawl build a links table
>
> links[curdoc.link].lname
> add_to_table(curdoc.address)
>
> which they download into *.dat files via
>
> sprintf(cfile,"lnk%08d.dat",ccount);
>
> Are the curdocs referenced in that table in source format?
> Not since they are sprintfable?
>
> -Bob
>
> Thomas Dickey wrote:
>
> > On Fri, Mar 22, 2002 at 07:30:19PM -0500, Bob wrote:
> > > Either -dump or -source restrict the download to one file
> > > only, correct?
> > >
> > > I was hoping to iterate the crawl with downloading in
> > > html format.
> > >
> > > Perhaps there is a mode=1 set somewhere, instead of
> > > mode=0, if srcmode_for_next_retrieval() is called from
> > > somewhere? Or?
> >
> > I only see srcmode_for_next_retrieval() called with constant parameters:
>
> So, in one of those places where the call is made with parameter
> (0) or (-1) it might be nice if that was in a process under -traversal
> or -crawl. Then I would put (1) there instead. I'll start looking at--
>
> src/LYMainLoop.c:3819: srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4380: srcmode_for_next_retrieval(-1);
> src/LYMainLoop.c:4407: srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4472: srcmode_for_next_retrieval(0);
> src/LYOptions.c:3039: srcmode_for_next_retrieval(0);
> src/LYOptions.c:3049: srcmode_for_next_retrieval(0);
>
> -Bob
>
> > src/LYGetFile.c:1118:PUBLIC void srcmode_for_next_retrieval ARGS1(
> > src/LYGetFile.h:11:extern void srcmode_for_next_retrieval PARAMS((int));
> > src/LYMainLoop.c:3802: srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:3819: srcmode_for_next_retrieval(0);
> > src/LYMainLoop.c:4236: srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4380: srcmode_for_next_retrieval(-1);
> > src/LYMainLoop.c:4385: srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4407: srcmode_for_next_retrieval(0);
> > src/LYMainLoop.c:4447: srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4469: srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4472: srcmode_for_next_retrieval(0);
> > src/LYOptions.c:3032: srcmode_for_next_retrieval(1);
> > src/LYOptions.c:3039: srcmode_for_next_retrieval(0);
> > src/LYOptions.c:3049: srcmode_for_next_retrieval(0);
> >
> > --
> > Thomas E. Dickey <address@hidden>
> > http://invisible-island.net
> > ftp://invisible-island.net
>
> ; To UNSUBSCRIBE: Send "unsubscribe lynx-dev" to address@hidden