[bug #64279] Proposal to rename roff(7)
From: G. Branden Robinson
Subject: [bug #64279] Proposal to rename roff(7)
Date: Sat, 3 Jun 2023 13:15:41 -0400 (EDT)
Update of bug #64279 (project groff):
Category: None => General
Severity: 3 - Normal => 1 - Wish
Item Group: None => Documentation
Status: None => Need Info
_______________________________________________________
Follow-up Comment #1:
Are you sure you're looking at a copy of roff(7) from the latest release
candidate, 1.23.0.rc4?
The page covers much more than just *roff history.
[...]
Below we present typographical concepts that form the background of
all roff implementations, narrate the development history of some
roff systems, detail the command pipeline managed by groff(1),
survey the formatting language, suggest tips for editing roff input,
and recommend further reading materials.
Concepts
roff input files contain text interspersed with instructions to
control the formatter. Even in the absence of such instructions, a
roff formatter still processes its input in several ways, by
filling, hyphenating, breaking, and adjusting it, and supplementing
it with inter-sentence space. These processes are basic to
typesetting, and can be controlled at the input document's
discretion.
When a device-independent roff formatter starts up, it obtains
information about the device for which it is preparing output from
the latter's description file (see groff_font(5)). An essential
property is the length of the output line, such as "6.5 inches".
The formatter interprets plain text files employing the Unix line-
ending convention. It reads input a character at a time, collecting
words as it goes, and fits as many words together on an output line
as it can--this is known as filling. To a roff system, a word is
any sequence of one or more characters that aren't spaces, tabs, or
newlines. The exceptions separate words.
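As a rough illustration of filling (not groff's actual implementation), the following Python sketch packs words greedily onto lines. It assumes every character is one cell wide, as on a terminal, and it ignores hyphenation, adjustment, and inter-sentence space. The function name `fill` and its parameters are made up for this example.

```python
import re


def fill(text, line_length):
    """Greedily pack words onto lines of at most line_length characters.

    Assumption: one character equals one unit of width. A real formatter
    measures glyph widths taken from the device description.
    """
    # A word is any run of characters that are not spaces, tabs, or newlines.
    words = re.findall(r"[^ \t\n]+", text)
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= line_length or not current:
            # The word fits, or it is alone on the line and cannot be split.
            current = candidate
        else:
            # The line is full: break and start the next output line.
            lines.append(current)
            current = word
    if current:
        lines.append(current)  # the end of input causes a break
    return lines
```

For instance, `fill("the quick brown\nfox\tjumps", 10)` yields `["the quick", "brown fox", "jumps"]`: the input's own line break and tab merely separate words.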
A roff formatter attempts to detect boundaries between sentences,
and supplies additional inter-sentence space between them. It flags
certain characters (normally "!", "?", and ".") as potentially
ending a sentence. When the formatter encounters one of these end-
of-sentence characters at the end of an input line, or one of them
is followed by two (unescaped) spaces on the same input line, it
appends an inter-word space followed by an inter-sentence space in
the output. The dummy character escape sequence \& can be used
after an end-of-sentence character to defeat end-of-sentence
detection on a per-instance basis. Normally, the occurrence of a
visible non-end-of-sentence character (as opposed to a space or tab)
immediately after an end-of-sentence character cancels detection of
the end of a sentence. However, several characters are treated
transparently after the occurrence of an end-of-sentence character.
That is, a roff does not cancel end-of-sentence detection when it
processes them. This is because such characters are often used as
footnote markers or to close quotations and parentheticals. The
default set is ", ', ), ], *, \[dg], \[dd], \[rq], and \[cq]. The
last four are examples of special characters, escape sequences whose
purpose is to obtain glyphs that are not easily typed at the
keyboard, or which have special meaning to the formatter (like \).
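The detection rule described above can be approximated with a regular expression. This is a simplified Python sketch, not the formatter's code: it handles only the ASCII members of the default transparent set (omitting \[dg], \[dd], \[rq], and \[cq]), works on one input line at a time without its trailing newline, and the name `sentence_end_positions` is invented here.

```python
import re

END_CHARS = ".?!"          # characters that may end a sentence
TRANSPARENT = "\"')]*"     # ASCII subset of the default transparent set

# An end-of-sentence character, then any run of transparent characters,
# then either the end of the input line or two spaces.
PATTERN = re.compile(
    "[" + re.escape(END_CHARS) + "]"
    + "[" + re.escape(TRANSPARENT) + "]*"
    + "(?=$|  )"
)


def sentence_end_positions(line):
    """Return the index just past each detected sentence end in one line."""
    return [m.end() for m in PATTERN.finditer(line)]
```

With this sketch, `sentence_end_positions('He said "Stop."  Then left.')` gives `[15, 27]`, whereas `"e.g. this"` and `r"etc.\&"` give `[]`: a single space does not trigger detection, and the backslash that begins `\&` is a visible non-transparent character in this model.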
When an output line is nearly full, it is uncommon for the next word
collected from the input to exactly fill it--typically, there is
room left over only for part of the next word. The process of
splitting a word so that it appears partially on one line (with a
hyphen to indicate to the reader that the word has been broken) with
its remainder on the next is hyphenation. Hyphenation points can be
manually specified; groff also uses a hyphenation algorithm and
language-specific pattern files to decide which words can be
hyphenated and where. Hyphenation does not always occur even when
the hyphenation rules for a word allow it; it can be disabled, and
when not disabled there are several parameters that can prevent it
in certain circumstances.
Once an output line is full, the next word (or remainder of a
hyphenated one) is placed on a different output line; this is called
a break. In this document and in roff discussions generally, a
"break" if not further qualified always refers to the termination of
an output line. When the formatter is filling text, it introduces
breaks automatically to keep output lines from exceeding the
configured line length. After an automatic break, a roff formatter
adjusts the line if applicable (see below), and then resumes
collecting and filling text on the next output line.
Sometimes, a line cannot be broken automatically. This usually does
not happen with natural language text unless the output line length
has been manipulated to be extremely short, but it can with
specialized text like program source code. groff provides a means
of telling the formatter where the line may be broken without
hyphens. This is done with the non-printing break point escape
sequence \:.
There are several ways to cause a break at a predictable location.
A blank input line not only causes a break, but by default it also
outputs a one-line vertical space (effectively a blank output line).
Macro packages may discourage or disable this "blank line method" of
paragraphing in favor of their own macros. A line that begins with
one or more spaces causes a break. The spaces are output at the
beginning of the next line without being adjusted (see below).
Again, macro packages may provide other methods of producing
indented paragraphs. Trailing spaces on text lines (see below) are
discarded. The end of input causes a break.
After the formatter performs an automatic break, it may then adjust
the line, widening inter-word spaces until the text reaches the
right margin. Extra spaces between words are preserved. Leading
and trailing spaces are handled as noted above. Text can be aligned
to the left or right margin only, or centered, using requests.
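Adjustment to both margins can be pictured as distributing the leftover width among the gaps between words. The Python sketch below assumes one-cell-wide characters and hands any remainder to the leftmost gaps; that distribution rule is an arbitrary choice for the example and is not claimed to match what groff does. The name `adjust` is hypothetical.

```python
def adjust(words, line_length):
    """Widen inter-word spaces so the line is exactly line_length wide.

    Assumptions: each character is one cell wide, and surplus space goes
    to the leftmost gaps first (an illustrative choice only).
    """
    if len(words) < 2:
        return " ".join(words)  # no inter-word space to widen
    gaps = len(words) - 1
    natural = sum(len(w) for w in words) + gaps
    extra = line_length - natural
    if extra <= 0:
        return " ".join(words)  # already at or beyond the line length
    base, remainder = divmod(extra, gaps)
    out = words[0]
    for i, word in enumerate(words[1:]):
        # Every gap gets one space plus its share of the extra width.
        out += " " * (1 + base + (1 if i < remainder else 0)) + word
    return out
```

For example, `adjust(["a", "b", "c"], 8)` returns `"a   b  c"`, which is eight characters wide.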
A roff formatter translates horizontal tab characters, also called
simply "tabs", in the input into movements to the next tab stop.
These tab stops are by default located every half inch measured from
the current position on the input line. With them, simple tables
can be made. However, this method can be deceptive, as the
appearance (and width) of the text in an editor and the results from
the formatter can vary greatly, particularly when proportional
typefaces are used. A tab character does not cause a break and
therefore does not interrupt filling. The formatter provides
facilities for sophisticated table composition; there are many
details to track when using the "tab" and "field" low-level
features, so most users turn to the tbl(1) preprocessor for table
construction.
Requests and macros
A request is an instruction to the formatter that occurs after a
control character, which is recognized at the beginning of an input
line. The regular control character is a dot ".". Its counterpart,
the no-break control character, a neutral apostrophe "'", suppresses
the break implied by some requests. These characters were chosen
because it is uncommon for lines of text in natural languages to
begin with them. If you require a formatted period or apostrophe
(closing single quotation mark) where the formatter is expecting a
control character, prefix the dot or neutral apostrophe with the
dummy character escape sequence, "\&".
An input line beginning with a control character is called a control
line. Every line of input that is not a control line is a text
line.
Requests often take arguments, words (separated from the request
name and each other by spaces) that specify details of the action
the formatter is expected to perform. If a request is meaningless
without arguments, it is typically ignored. Of key importance are
the requests that define macros. Macros are invoked like requests,
enabling the request repertoire to be extended or overridden.
A macro can be thought of as an abbreviation you can define for a
collection of control and text lines. When the macro is called by
giving its name after a control character, it is replaced with what
it stands for. The process of textual replacement is known as
interpolation. Interpolations are handled as soon as they are
recognized, and once performed, a roff formatter scans the
replacement for further requests, macro calls, and escape sequences.
In roff systems, the "de" request defines a macro.
Page geometry
roff systems format text under certain assumptions about the size of
the output medium, or page. For the formatter to correctly break a
line it is filling, it must know the line length, which it derives
from the page width. For it to decide whether to write an output
line to the current page or wait until the next one, it must know
the page length. A device's resolution converts practical units
like inches or centimeters to basic units, a convenient length
measure for the output device or file format. The formatter and
output driver use basic units to reckon page measurements. The
device description file defines its resolution and page dimensions
(see groff_font(5)).
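To make the role of the resolution concrete, here is a small Python sketch that converts a measurement to basic units. The resolution of 72000 units per inch used in the usage note is an illustrative value, the rounding is a choice made for this example, and `to_basic_units` is a made-up helper, not a groff interface.

```python
from fractions import Fraction

# Inches per scaling unit: i = inch, c = centimeter, p = point (1/72 inch).
INCHES_PER_UNIT = {
    "i": Fraction(1),
    "c": Fraction(100, 254),
    "p": Fraction(1, 72),
}


def to_basic_units(value, unit, resolution):
    """Convert value (int, Fraction, or decimal string) to basic units.

    resolution is the number of basic units per inch for the device.
    Rounding to the nearest unit is an assumption of this sketch.
    """
    return round(Fraction(value) * INCHES_PER_UNIT[unit] * resolution)
```

At an assumed resolution of 72000, a line length of "6.5 inches" is `to_basic_units("6.5", "i", 72000)`, or 468000 basic units, and 12 points is 12000.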
A page is a two-dimensional structure upon which a roff system
imposes a rectangular coordinate system with its upper left corner
as the origin. Coordinate values are in basic units and increase
down and to the right. Useful ones are therefore always positive
and within numeric ranges corresponding to the page boundaries.
While the formatter (and, later, output driver) is processing a
page, it keeps track of its drawing position, which is the location
at which the next glyph will be written, from which the next motion
will be measured, or where a geometric primitive will commence
rendering. Notionally, glyphs are drawn from the text baseline
upward and to the right. (groff does not yet support right-to-left
scripts.) The text baseline is a (usually invisible) line upon
which the glyphs of a typeface are aligned. A glyph therefore
"starts" at its bottom-left corner. If drawn at the origin, a
typical letter glyph would lie partially or wholly off the page,
depending on whether, like "g", it features a descender below the
baseline.
Such a situation is nearly always undesirable. It is furthermore
conventional not to write or draw at the extreme edges of the page.
Therefore the initial drawing position of a roff formatter is not at
the origin, but below and to the right of it. This rightward shift
from the left edge is known as the page offset. (groff's terminal
output devices have page offsets of zero.) The downward shift
leaves room for a text output line.
Text is arranged on a one-dimensional lattice of text baselines from
the top to the bottom of the page. Vertical spacing is the distance
between adjacent text baselines. Typographic tradition sets this
quantity to 120% of the type size. The initial vertical drawing
position is one unit of vertical spacing below the page top.
Typographers term this unit a vee.
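The arithmetic behind the lattice of baselines is simple enough to sketch in Python. The example assumes lengths in points, the traditional 120% ratio, and a page with no headers, footers, or extra margins; the function names are invented for illustration.

```python
from fractions import Fraction


def vertical_spacing(type_size):
    """Return the vee for a type size, using the traditional 120% ratio."""
    return Fraction(type_size) * Fraction(6, 5)


def baselines_on_page(page_length, type_size):
    """Count the text baselines that fit on a page.

    The first baseline lies one vee below the page top, and each further
    one lies one vee lower. Assumption: a baseline exactly at the page
    bottom still counts as fitting.
    """
    vee = vertical_spacing(type_size)
    return int(Fraction(page_length) // vee)
```

With 10-point type the vee is 12 points, so an 11-inch (792-point) page holds `baselines_on_page(792, 10)`, that is 66, text baselines.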
Vertical spacing has an impact on page-breaking decisions.
Generally, when a break occurs, the formatter moves the drawing
position to the next text baseline automatically. If the formatter
were already writing to the last line that would fit on the page,
advancing by one vee would place the next text baseline off the
page. Rather than let that happen, roff formatters instruct the
output driver to eject the page, start a new one, and again set the
drawing position to one vee below the page top; this is a page
break.
When the last line of input text corresponds to the last output line
that fits on the page, the break caused by the end of input will
also break the page, producing a useless blank one. Macro packages
keep users from having to confront this difficulty by setting
"traps"; moreover, all but the simplest page layouts tend to have
headers and footers, or at least bear vertical margins larger than
one vee.
Other language elements
Escape sequences start with the escape character, a backslash \, and
are followed by at least one additional character. They can appear
anywhere in the input.
With requests, the escape and control characters can be changed;
further, escape sequence recognition can be turned off and back on.
Strings store character sequences. In groff, they can be
parameterized as macros can.
Registers store numerical values, including measurements. The
latter are generally in basic units; scaling units can be appended
to numeric expressions to clarify their meaning when stored or
interpolated. Some read-only predefined registers interpolate text.
Fonts are identified either by a name or by a mounting position (a
non-negative number). Four styles are available on all devices. R
is "roman": normal, upright text. B is bold, an upright typeface
with a heavier weight. I is italic, a face that is oblique on
typesetter output devices and usually underlined instead on terminal
devices. BI is bold-italic, combining both of the foregoing style
variations. Typesetting devices group these four styles into
families of text fonts; they also typically offer one or more
special fonts that provide unstyled glyphs; see groff_char(7).
groff supports named colors for glyph rendering and drawing of
geometric primitives. Stroke and fill colors are distinct; the
stroke color is used for glyphs.
Glyphs are visual representation forms of characters. In groff, the
distinction between those two elements is not always obvious (and a
full discussion is beyond our scope). In brief, "A" is a character
when we consider it in the abstract: to make it a glyph, we must
select a typeface with which to render it, and determine its type
size and color. The formatting process turns input characters into
output glyphs. A few characters commonly seen on keyboards are
treated specially by the roff language and may not look correct in
output if used unthinkingly; they are the (double) quotation mark
("), the neutral apostrophe ('), the minus sign (-), the backslash
(\), the caret or circumflex accent (^), the grave accent (`), and
the tilde (~). All of these and more can be produced with special
character escape sequences; see groff_char(7).
groff offers streams, identifiers for writable files, but for
security reasons this feature is disabled by default.
A further few language elements arise as page layouts become more
sophisticated and demanding. Environments collect formatting
parameters like line length and typeface. A diversion stores
formatted output for later use. A trap is a condition on the input
or output, tested automatically by the formatter, that is associated
with a macro, calling it when that condition is fulfilled.
Footnote support often exercises all three of the foregoing
features. A simple implementation might work as follows. A pair of
macros is defined: one starts a footnote and the other ends it. The
author calls the first macro where a footnote marker is desired.
The macro establishes a diversion so that the footnote text is
collected at the place in the body text where its corresponding
marker appears. An environment is created for the footnote so that
it is set at a smaller typeface. The footnote text is formatted in
the diversion using that environment, but it does not yet appear in
the output. The document author calls the footnote end macro, which
returns to the previous environment and ends the diversion. Later,
after much more body text in the document, a trap, set a small
distance above the page bottom, is sprung. The macro called by the
trap draws a line across the page and emits the stored diversion.
Thus, the footnote is rendered.
History
[...]
The "History" section is only about one third of the page by line count:
significant, but not even a majority of the content.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?64279>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/