[Groff] (no subject)
From: Gaius Mulley
Subject: [Groff] (no subject)
Date: Fri, 3 Dec 99 16:39 GMT
Hi Werner,
Here are the patches as of 16:30 GMT
many thanks,
Gaius
diff -r -c groff/grohtml/ChangeLog groff.gaius/grohtml/ChangeLog
*** groff/grohtml/ChangeLog Thu Dec 2 20:39:53 1999
--- groff.gaius/grohtml/ChangeLog Wed Dec 1 16:35:48 1999
***************
*** 1,12 ****
! 1999-11-16 Gaius Mulley <address@hidden>
! * design.ms, grohtml.man: Updated.
! * html.cc, ordered_list.h: Fixed many bugs in the table handling
! code. Reverted the -t switch so that table handling code is used
! by default and users must turn it off with -t.
!
! Manual page generation using `groff -Thtml -man' is much better
due in large part to the table code and minor alterations in
tmac.an.
--- 1,15 ----
! 1999-11-29 Gaius Mulley <address@hidden>
! * fixed more bugs mainly in the table handling code. Making
! the code terminate a table at the correct position. Indented
! .IPs appear to work now. Region ends also correctly terminate
! tables.
! 1999-11-05 Gaius Mulley <address@hidden>
! * fixed many bugs in the table handling code. Reverted the -t
! switch so that table handling code is used by default and
! users must turn it off with -t.
! Manual page generation using groff -Thtml -man is much better
due in large part to the table code and minor alterations in
tmac.an.
diff -r -c groff/grohtml/design.ms groff.gaius/grohtml/design.ms
*** groff/grohtml/design.ms Thu Dec 2 20:39:53 1999
--- groff.gaius/grohtml/design.ms Wed Dec 1 16:35:48 1999
***************
*** 2,20 ****
.nr VS 14
.LP
.TL
! Design
.sp 1i
.SH
Overview of html.cc
.LP
This file briefly provides an overview of how html.cc operates.
The html device driver works as follows:
! .IP (i)
firstly it creates a linked list of all words on a page.
! .IP (ii)
it runs through the page and finds the left most margin. Later
on when generating the page it removes the margin.
! .IP (iii)
scans a page and builds two kinds of regions ascii text and graphical.
The graphical regions consist of tbl's, eqn's, pic's
(basically anything that cannot be textually displayed).
--- 2,41 ----
.nr VS 14
.LP
.TL
! Design of grohtml
.sp 1i
.SH
+ What is grohtml
+ .LP
+ Grohtml is a back end for groff which generates html.
+ The aim of grohtml is to produce respectible html given
+ fairly typical groff input.
+ .SH
+ Limitations of grohtml
+ .LP
+ Although basic text can be translated
+ in a straightforward fashion there are some areas where grohtml
+ has to try and guess text relationship. In particular whenever
+ grohtml encounters text tables and indented paragraphs or
+ two column mode it will try and utilize the html table construct
+ to preserve columns. Grohtml also attempts to work out which
+ lines should be automatically formatted by the browser.
+ Ultimately in trying to make reasonable guesses most of the time
+ it will make mistakes occasionally.
+ .PP
+ Tbl, pic, eqn's are also generated using images which may be
+ considered a limitation.
+ .SH
Overview of html.cc
.LP
This file briefly provides an overview of how html.cc operates.
The html device driver works as follows:
! .IP (i) .5i
firstly it creates a linked list of all words on a page.
! .IP (ii) .5i
it runs through the page and finds the left most margin. Later
on when generating the page it removes the margin.
! .IP (iii) .5i
scans a page and builds two kinds of regions ascii text and graphical.
The graphical regions consist of tbl's, eqn's, pic's
(basically anything that cannot be textually displayed).
***************
*** 22,29 ****
and places these into tiny graphical regions. Certain fonts
also are treated as a graphical region - as html has no easy
equivalent. For example Greek math symbols.
! .PP
! Finally all graphical regions are translated into gif files and
all text regions into html text.
.PP
To give grohtml a sporting chance of accuratly deciding which
--- 43,50 ----
and places these into tiny graphical regions. Certain fonts
also are treated as a graphical region - as html has no easy
equivalent. For example Greek math symbols.
! .LP
! Finally all graphical regions are translated into png files and
all text regions into html text.
.PP
To give grohtml a sporting chance of accuratly deciding which
***************
*** 31,83 ****
tbl, eqn, pic have all been tweeked to encapsulate pictures, tables
and equations with the following lines:
.sp
- .RS
.nf
\f[CR]\&.if '\\*(.T'html' \\X(graphic-start(\c
\&.if '\\*(.T'html' \\X(graphic-end(\c
\fP
.fi
- .RE
.sp
these appear to grohtml as:
.sp
- .RS
.nf
\f[CR]\&x X graphic-start
! and
\&x X graphic-end\fP
.fi
- .RE
.sp
! .PP
In addition to graphic-start and graphic-end there are two
other "special characters" which are used.
! .RS
! .nf
\f[CR]\&x X index:N\fP
! .fi
! .RE
where N is a number. The purpose of this sequence is to stop
devhtml from automatically producing links to headings which
! have a header level >N
! .RS
! .nf
\f[CR]\&x X html:STRING\fR
! .fi
! .RE
allows a STRING to be passed through to the output file with
no processing whatsoever. Ie it allows users to include html
commands, via macro, such as:
! .RS
! .nf
\f[CR]\&.URL "Latest Emacs" "ftp://somewonderful.gnu.software"\fP
! .fi
! .RE
Where the URL macro bundles the info into STRING above.
! For more info consult \f[CR]tmac/tmac.arkup\fP.
.PP
While scanning through a page the html device copies headings and titles
into a list of links which are later written to the beginning
--- 52,97 ----
tbl, eqn, pic have all been tweeked to encapsulate pictures, tables
and equations with the following lines:
.sp
.nf
\f[CR]\&.if '\\*(.T'html' \\X(graphic-start(\c
\&.if '\\*(.T'html' \\X(graphic-end(\c
\fP
.fi
.sp
these appear to grohtml as:
.sp
.nf
\f[CR]\&x X graphic-start
! \&...
\&x X graphic-end\fP
.fi
.sp
! .LP
In addition to graphic-start and graphic-end there are two
other "special characters" which are used.
! .sp
\f[CR]\&x X index:N\fP
! .sp
where N is a number. The purpose of this sequence is to stop
devhtml from automatically producing links to headings which
! have a header level >N.
! The line:
! .sp
\f[CR]\&x X html:STRING\fR
! .sp
! .LP
allows a STRING to be passed through to the output file with
no processing whatsoever. Ie it allows users to include html
commands, via macro, such as:
! .sp
\f[CR]\&.URL "Latest Emacs" "ftp://somewonderful.gnu.software"\fP
! .sp
! .LP
Where the URL macro bundles the info into STRING above.
! For more info consult: \f[CR]tmac/tmac.arkup\fP.
.PP
While scanning through a page the html device copies headings and titles
into a list of links which are later written to the beginning
***************
*** 92,124 ****
has to examine the troff output and \fIguess\fR when a table starts and
finishes. It is well to know the limitations of this approach as it
sometimes makes the wrong decision.
! .PP
Here are some of the rules that grohtml uses for terminating a html table:
! .RS
! .IP "\(bu"
A table will be terminated when grohtml finds line which is all in bold
font (it believes that this is a header which is outside of a table).
This might be considered incorrect behaviour especially if you use .2C
which generates a heading on the left column when the corresponding
right row is blank.
! .IP "\(bu"
A table is terminated when grohtml sees that the complete line is
has been spanned by words. Ie no gaps exist.
.SH
To do
.LP
! .IP (i)
finish working out the max and min x, y, extents for splines.
! .IP (ii)
check and test thoroughly all the character descriptions in devhtml
(originally taken from devX100)
! .IP (iii)
improve tmac.arkup
! .IP (vi)
also improve documentation.
.SH
Dependencies
.LP
! Grohtml is dependent upon grops, gs, ppmtogif and ppmquant which are invoked to
! generate all gif files. Gif files are generated whenever a table, picture,
equation or line is encountered.
--- 106,156 ----
has to examine the troff output and \fIguess\fR when a table starts and
finishes. It is well to know the limitations of this approach as it
sometimes makes the wrong decision.
! .LP
Here are some of the rules that grohtml uses for terminating a html table:
! .LP
! .IP "(i)" .5i
A table will be terminated when grohtml finds line which is all in bold
font (it believes that this is a header which is outside of a table).
This might be considered incorrect behaviour especially if you use .2C
which generates a heading on the left column when the corresponding
right row is blank.
! .IP "(ii)" .5i
A table is terminated when grohtml sees that the complete line is
has been spanned by words. Ie no gaps exist.
+ .IP "(nb)" .5i
+ the documentation about these rules is particularly incomplete and needs finishing
+ when time prevails.
.SH
To do
.LP
! .IP (i) .5i
finish working out the max and min x, y, extents for splines.
! .IP (ii) .5i
check and test thoroughly all the character descriptions in devhtml
(originally taken from devX100)
! .IP (iii) .5i
improve tmac.arkup
! .IP (vi) .5i
also improve documentation.
+ .IP (v) .5i
+ fix the bugs which are exposed by Eric Raymonds pic guide,
+ \fBMaking Pictures With GNU PIC\fR. It appears that grohtml becomes confused
+ about which sections of the document are text and which sections need
+ to be rendered as an image.
+ .IP (vi) .5i
+ it would be nice to modularise the source. A natural division might be
+ to extract the table handling code from html.cc into table.cc.
+ The table.cc could be expanded to recognise output from tbl and try
+ and generate html tables with lines/rules/boxes. The code as it stands
+ should cope with very simple plain text tables. But of course at present
+ it does not get a chance to do this because the output of gtbl is
+ bracketed by \fCgraphic-start\fR and \fCgraphic-end\fR.
+ .IP (vii) .5i
+ introduce anti aliasing for the images as mentioned by Werner.
.SH
Dependencies
.LP
! Grohtml is dependent upon grops, gs which are invoked to
! generate all png files. Png files are generated whenever a table, picture,
equation or line is encountered.
Only in groff.gaius/grohtml/: grohtml
Only in groff.gaius/grohtml/: grohtml.n
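
The two table termination rules described in design.ms above (a line set wholly in bold, or a line spanned by words with no gaps) can be illustrated with a small stand-alone sketch. This is not code from html.cc: the Word struct, the function names and the max_gap tolerance are all assumptions made for illustration, and the real driver walks its own list of text_glob objects.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a word on one output line (not grohtml's text_glob).
struct Word {
  int minh;   // left edge of the word
  int maxh;   // right edge of the word
  bool bold;  // true if the word is set in a bold font
};

// Rule (i): every word on the line is bold, so the line is taken to be a header.
bool whole_line_bold(const std::vector<Word>& line) {
  return !line.empty() &&
         std::all_of(line.begin(), line.end(),
                     [](const Word& w) { return w.bold; });
}

// Rule (ii): words cover the line from left to right with no horizontal gap
// wider than max_gap (an assumed tolerance, e.g. the width of a space).
bool line_spanned_by_words(std::vector<Word> line, int left, int right,
                           int max_gap) {
  if (line.empty()) return false;
  std::sort(line.begin(), line.end(),
            [](const Word& a, const Word& b) { return a.minh < b.minh; });
  int reached = left;  // rightmost position covered so far
  for (const Word& w : line) {
    if (w.minh - reached > max_gap) return false;  // found a column gap
    reached = std::max(reached, w.maxh);
  }
  return right - reached <= max_gap;
}

// A line ends the current table if either rule applies.
bool terminates_table(const std::vector<Word>& line, int left, int right,
                      int max_gap) {
  return whole_line_bold(line) ||
         line_spanned_by_words(line, left, right, max_gap);
}
```

A line with a wide gap between two plain words would not terminate the table under this sketch, which matches the idea that the gap is what marks a column boundary.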
diff -r -c groff/grohtml/html.cc groff.gaius/grohtml/html.cc
*** groff/grohtml/html.cc Thu Dec 2 20:39:53 1999
--- groff.gaius/grohtml/html.cc Wed Dec 1 16:35:48 1999
***************
*** 645,658 ****
// this code is only present for safety sake
if (g->maxh < g->minh) {
if (debug_on) {
! fprintf(stderr, "assert failed minh > maxh\n");
stop();
}
g->maxh = g->minh;
}
if (g->maxv < g->minv) {
if (debug_on) {
! fprintf(stderr, "assert failed minv > maxv\n");
stop();
}
g->maxv = g->minv;
--- 645,658 ----
// this code is only present for safety sake
if (g->maxh < g->minh) {
if (debug_on) {
! fprintf(stderr, "assert failed minh > maxh\n"); fflush(stderr);
stop();
}
g->maxh = g->minh;
}
if (g->maxv < g->minv) {
if (debug_on) {
! fprintf(stderr, "assert failed minv > maxv\n"); fflush(stderr);
stop();
}
g->maxv = g->minv;
***************
*** 2772,2782 ****
line[0].left = 0;
line[0].right = 0;
if (start != 0) {
! graphic_glob *limit = 0;
! int graphic_limit = end_region_vpos;
if (is_whole_line_bold(t) && (t->minh == left_margin_indent)) {
// found header therefore terminate indentation table
} else {
int i =0;
int j =0;
--- 2772,2782 ----
line[0].left = 0;
line[0].right = 0;
if (start != 0) {
! int graphic_limit = end_region_vpos;
if (is_whole_line_bold(t) && (t->minh == left_margin_indent)) {
// found header therefore terminate indentation table
+ upper_limit = -t->minv; // so we know a header has stopped the column
} else {
int i =0;
int j =0;
***************
*** 3451,3457 ****
int limit;
#if 0
! if (strcmp(start->text_string, "(*") == 0) {
stop();
}
#endif
--- 3451,3457 ----
int limit;
#if 0
! if (strcmp(start->text_string, "(x)") == 0) {
stop();
}
#endif
***************
*** 3464,3470 ****
copy_line(all_words, last_guess);
indentation.vertical_limit = limit;
! if (page_contents->words.is_equal_to_head()) {
next_line[0].left = 0;
next_line[0].right = 0;
} else {
--- 3464,3470 ----
copy_line(all_words, last_guess);
indentation.vertical_limit = limit;
! if (page_contents->words.is_equal_to_head() || (limit == 0)) {
next_line[0].left = 0;
next_line[0].right = 0;
} else {
***************
*** 3500,3527 ****
combine_line(last_guess, next_line);
// subtract any columns which are bridged by a sequence of words
do {
- #if 0
- if (is_subset_of_columns(next_guess, last_guess)) {
- if (debug_table_on) {
- display_columns("[s]", "last_guess", last_guess);
- display_columns("[s]", "next_guess", next_guess);
- fprintf(stderr, "next_guess is a subset of last_guess - do nothing\n");
- fflush(stderr);
- }
- } else {
- if (debug_table_on) {
- display_columns("[s]", "last_guess", last_guess);
- display_columns("[s]", "next_guess", next_guess);
- fprintf(stderr, "next_guess is not a subset of last_guess\n");
- fflush(stderr);
- }
- copy_line(last_guess, next_guess);
- }
- #else
copy_line(prev_guess, next_guess);
! combine_line(last_guess, next_guess); // was copy_line
! #endif
!
if (debug_table_on) {
t = page_contents->words.get_data();
display_columns(t->text_string, "[l] last_guess", last_guess);
--- 3500,3508 ----
combine_line(last_guess, next_line);
// subtract any columns which are bridged by a sequence of words
do {
copy_line(prev_guess, next_guess);
! combine_line(last_guess, next_guess);
!
if (debug_table_on) {
t = page_contents->words.get_data();
display_columns(t->text_string, "[l] last_guess", last_guess);
***************
*** 3546,3576 ****
if (debug_table_on) {
display_columns(t->text_string, "[l] next_line", next_line);
}
} while ((! remove_white_using_words(next_guess, last_guess, next_line)) &&
(! conflict_with_words(next_guess, all_words)) &&
(continue_searching_column(next_guess, last_guess, all_words)) &&
((is_continueous_column(prev_guess, last_raw)) ||
(is_exact_left(last_guess, next_line))) &&
! (! page_contents->words.is_equal_to_head()) && (limit != 0));
}
lines--;
! if (page_contents->words.is_equal_to_head()) {
! // end of page reached - therefore include everything
! indentation.vertical_limit = limit+1;
}
! rewind_text_to(start);
! if (debug_table_on) {
! display_columns(start->text_string, "[x] last_guess", last_guess);
}
- #if 0
- count_hits(last_guess);
- rewind_text_to(start);
if (debug_table_on) {
display_columns(start->text_string, "[x] last_guess", last_guess);
}
! #endif
i = count_columns(last_guess);
if (((lines > 1) && ((i>1) || (continue_searching_column(last_guess, last_guess, all_words)))) ||
--- 3527,3577 ----
if (debug_table_on) {
display_columns(t->text_string, "[l] next_line", next_line);
}
+ t = page_contents->words.get_data();
+ #if 0
+ if (strcmp(t->text_string, "market,") == 0) {
+ stop();
+ }
+ #endif
+
} while ((! remove_white_using_words(next_guess, last_guess, next_line)) &&
(! conflict_with_words(next_guess, all_words)) &&
(continue_searching_column(next_guess, last_guess, all_words)) &&
((is_continueous_column(prev_guess, last_raw)) ||
(is_exact_left(last_guess, next_line))) &&
! (! page_contents->words.is_equal_to_head()) &&
! ((end_region_vpos <= 0) || (t->minv < end_region_vpos)) &&
! (limit >= 0));
}
lines--;
! if (limit < 0) {
! indentation.vertical_limit = limit;
}
! if (page_contents->words.is_equal_to_head()) {
! // end of page check whether we should include everything
! if ((! conflict_with_words(next_guess, all_words)) &&
! (continue_searching_column(next_guess, last_guess, all_words)) &&
! ((is_continueous_column(prev_guess, last_raw)) ||
(is_exact_left(last_guess, next_line)))) {
! // end of page reached - therefore include everything
! page_contents->words.start_from_tail();
! t = page_contents->words.get_data();
! indentation.vertical_limit = t->minv;
! }
! } else {
! t = page_contents->words.get_data();
! if ((end_region_vpos > 0) && (t->minv > end_region_vpos)) {
! indentation.vertical_limit = end_region_vpos+1;
! } else if (indentation.vertical_limit < 0) {
! // -1 as we don't want to include section heading itself
! indentation.vertical_limit = -indentation.vertical_limit-1;
! }
}
if (debug_table_on) {
display_columns(start->text_string, "[x] last_guess", last_guess);
}
! rewind_text_to(start);
i = count_columns(last_guess);
if (((lines > 1) && ((i>1) || (continue_searching_column(last_guess, last_guess, all_words)))) ||
***************
*** 3705,3717 ****
if (left == right) {
return( right );
} else {
! int rightmost=-1;
text_glob *start = page_contents->words.get_data();
text_glob *g = start;
while ((g != 0) && (g->minv <= indentation.vertical_limit)) {
if ((left <= g->minh) && (g->minh<right)) {
! rightmost = max(g->maxh, rightmost);
}
page_contents->words.move_right();
if (page_contents->words.is_equal_to_head()) {
--- 3706,3734 ----
if (left == right) {
return( right );
} else {
! int rightmost =-1;
! int count = 0;
text_glob *start = page_contents->words.get_data();
text_glob *g = start;
while ((g != 0) && (g->minv <= indentation.vertical_limit)) {
if ((left <= g->minh) && (g->minh<right)) {
! if (debug_on) {
! fprintf(stderr, "right word = %s %d\n", g->text_string, g->maxh); fflush(stderr);
! }
! if (g->maxh == rightmost) {
! count++;
! } else if (g->maxh > rightmost) {
! count = 1;
! rightmost = g->maxh;
! }
! if (g->maxh > right) {
! if (debug_on) {
! fprintf(stderr, "problem as right word = %s %d [%d..%d]\n",
! g->text_string, right, g->minh, g->maxh); fflush(stderr);
! stop();
! }
! }
}
page_contents->words.move_right();
if (page_contents->words.is_equal_to_head()) {
***************
*** 3725,3731 ****
if (rightmost == -1) {
return( right ); // no words in this column
} else {
! return( rightmost );
}
}
}
--- 3742,3752 ----
if (rightmost == -1) {
return( right ); // no words in this column
} else {
! if (count == 1) {
! return( rightmost+1 );
! } else {
! return( rightmost );
! }
}
}
}
***************
*** 3898,3904 ****
{
if (indentation.no_of_columns>0) {
int i;
! int left;
int limit=-1;
assign_used_columns(start);
--- 3919,3925 ----
{
if (indentation.no_of_columns>0) {
int i;
! int left, right;
int limit=-1;
assign_used_columns(start);
***************
*** 3908,3914 ****
do {
limit = determine_row_limit(start, limit); // find the bottom of the next row
html.put_string("<tr valign=\"top\" align=\"left\">\n");
- left=left_margin_indent;
i=0;
start = page_contents->words.get_data();
while (i<indentation.no_of_columns) {
--- 3929,3934 ----
***************
*** 3917,3928 ****
output_vpos = start->minv;
output_hpos = indentation.columns[i].left;
// and display each column until limit
! column_display_word(limit,
! column_calculate_left_margin(indentation.columns[i].left,
! indentation.columns[i].right),
! column_calculate_right_margin(indentation.columns[i].left,
! indentation.columns[i].right),
! indentation.columns[i].right);
i++;
}
--- 3937,3962 ----
output_vpos = start->minv;
output_hpos = indentation.columns[i].left;
// and display each column until limit
! right = column_calculate_right_margin(indentation.columns[i].left,
! indentation.columns[i].right);
! left = column_calculate_left_margin(indentation.columns[i].left,
! indentation.columns[i].right);
!
! if (right>indentation.columns[i].right) {
! if (debug_on) {
! fprintf(stderr, "assert calculated right column edge is greater than column\n"); fflush(stderr);
! stop();
! }
! }
!
! if (left<indentation.columns[i].left) {
! if (debug_on) {
! fprintf(stderr, "assert calculated left column edge is less than column\n"); fflush(stderr);
! stop();
! }
! }
!
! column_display_word(limit, left, right, indentation.columns[i].right);
i++;
}
Only in groff.gaius/grohtml/: html.o
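
The change to the right margin calculation in the html.cc hunks above may be easier to follow in isolation. The sketch below mirrors the patched logic as I read it: the rightmost word edge in the column is tracked together with a count of how many words share that edge, and the margin is pushed out by one unit when only a single word reaches it. The Word struct and the right_margin function are hypothetical; the patched function walks page_contents->words and also stops at indentation.vertical_limit, which is left out here.

```cpp
#include <vector>

// Hypothetical stand-in for grohtml's text_glob: horizontal extent only.
struct Word {
  int minh;  // left edge of the word
  int maxh;  // right edge of the word
};

// Returns the right margin for the column [left, right).
// Only words that start inside the column are considered.
int right_margin(const std::vector<Word>& words, int left, int right) {
  if (left == right) return right;
  int rightmost = -1;  // largest maxh seen so far
  int count = 0;       // number of words whose maxh equals rightmost
  for (const Word& w : words) {
    if (left <= w.minh && w.minh < right) {
      if (w.maxh == rightmost) {
        ++count;
      } else if (w.maxh > rightmost) {
        count = 1;
        rightmost = w.maxh;
      }
    }
  }
  if (rightmost == -1) return right;  // no words in this column
  // A lone word at the edge gets one unit of slack, as in the patch.
  return count == 1 ? rightmost + 1 : rightmost;
}
```

With words ending at 40, 40 and 30 in a column spanning 10 to 50 this returns 40; if only one word ends at 40 it returns 41.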
diff -r -c groff/doc/Makefile groff.gaius/doc/Makefile
*** groff/doc/Makefile Sun Sep 12 20:35:36 1999
--- groff.gaius/doc/Makefile Wed Dec 1 16:35:48 1999
***************
*** 47,56 ****
sed -e "s;@VERSION@;`cat ../VERSION`;" $< \
| ../groff/groff -p -e -t -Tps $(FFLAG) -ms >$@
install:
clean:
! -rm -f *.ps *.dit core
distclean: clean
--- 47,60 ----
sed -e "s;@VERSION@;`cat ../VERSION`;" $< \
| ../groff/groff -p -e -t -Tps $(FFLAG) -ms >$@
+ pic.html: pic.ms
+ sed -e "s;@VERSION@;`cat ../VERSION`;" $< \
+ | ../groff/groff -p -e -t -Thtml $(FFLAG) -ms -mhtml >$@
+
install:
clean:
! -rm -f *.ps *.html *.png *.gif *.dit core
distclean: clean
diff -r -c groff/doc/pic.ms groff.gaius/doc/pic.ms
*** groff/doc/pic.ms Fri May 21 05:50:47 1999
--- groff.gaius/doc/pic.ms Wed Dec 1 16:35:48 1999
***************
*** 299,305 ****
unit. Setting \fBscale = 2.54\fP will effectively change the internal
unit to centimeters (all other size variable valuess will be scaled
correspondingly).
! .NH 2 Default Sizes of Objects
.PP
Here are the default sizes for \fBpic\fP objects:
.RS
--- 299,306 ----
unit. Setting \fBscale = 2.54\fP will effectively change the internal
unit to centimeters (all other size variable valuess will be scaled
correspondingly).
! .NH 2
! Default Sizes of Objects
.PP
Here are the default sizes for \fBpic\fP objects:
.RS
diff -r -c groff/tmac/tmac.html groff.gaius/tmac/tmac.html
*** groff/tmac/tmac.html Thu Dec 2 20:41:46 1999
--- groff.gaius/tmac/tmac.html Wed Dec 1 16:35:48 1999
***************
*** 45,58 ****
.if !\n(_C .mso tmac.pspic
.cp \n(_C
.\" now turn off all headers and footers for ms, me and mm macro sets
! .EF '''
! .EH '''
! .OF '''
! .OH '''
! .ef '''
! .of '''
! .oh '''
! .eh '''
.\" it doesn't make sense to use hyphenation with html, so we turn it off.
.hy 0
.nr HY 0
--- 45,58 ----
.if !\n(_C .mso tmac.pspic
.cp \n(_C
.\" now turn off all headers and footers for ms, me and mm macro sets
! .if d EF .EF '''
! .if d EH .EH '''
! .if d OF .OF '''
! .if d OH .OH '''
! .if d ef .ef '''
! .if d of .of '''
! .if d oh .oh '''
! .if d eh .eh '''
.\" it doesn't make sense to use hyphenation with html, so we turn it off.
.hy 0
.nr HY 0