bug-groff

From: G. Branden Robinson
Subject: [bug #66051] [troff] permit special characters to have bespoke hyphenation codes
Date: Wed, 7 Aug 2024 11:13:27 -0400 (EDT)

Update of bug #66051 (group groff):

                  Status:                    None => Postponed              

    _______________________________________________________

Follow-up Comment #7:

Rut-roh, Raggy.

So, yeah, let's put in the missing bits.


diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 44e0981f6..6d9ee85c6 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -7275,12 +7275,15 @@ static void set_character_flags()
   skip_line();
 }
 
+// We use this to mint hyphenation codes for special characters.
+static int hyphenation_code_counter = 256;
+
 static void set_hyphenation_codes()
 {
   tok.skip();
   if (tok.is_newline() || tok.is_eof()) {
-    warning(WARN_MISSING, "hyphenation code configuration request"
-           " expects arguments");
+    warning(WARN_MISSING, "hyphenation code assignment request expects"
+           " arguments");
     skip_line();
     return;
   }
@@ -7309,24 +7312,38 @@ static void set_hyphenation_codes()
       error("cannot use the hyphenation code of a numeral");
       break;
     }
-    unsigned char new_code = 0; // TODO: int
+    int new_code = 0;
     charinfo *cisrc = tok.get_char();
-    if (csrc != 0)
-      new_code = csrc;
-    else {
+    if (cisrc != 0 /* nullptr */)
+      // Common case: assign destination character the hyphenation code
+      // of another character that already has one.
+      new_code = cisrc->get_hyphenation_code();
+    if (0 == csrc) {
       if (0 /* nullptr */ == cisrc) {
        error("expected ordinary or special character, got %1",
              tok.description());
        break;
       }
        break;
       }
       // source character is special
-      if (0 == cisrc->get_hyphenation_code()) {
-       error("second member of hyphenation code pair must be an"
-             " ordinary character, or a special character already"
-             " assigned a hyphenation code");
-       break;
+      if (strcmp(cidst->nm.contents(), cisrc->nm.contents()) == 0) {
+       debug("GBR: mint new");
+       if (hyphenation_code_counter == INT_MAX) {
+         error("supply of hyphenation codes exhausted");
+         break;
+       }
+       new_code = hyphenation_code_counter++;
+       debug("GBR: new code is %1", int(new_code));
       }
-      new_code = cisrc->get_hyphenation_code();
+      else {
+       debug("GBR: copy code");
+       new_code = cisrc->get_hyphenation_code();
+      }
+    }
+    else {
+      // If assigning a code to itself, use its character encoding value
+      // to mint a new code.
+      if (csrc == cdst)
+       new_code = tok.ch();
     }
     cidst->set_hyphenation_code(new_code);
     if (cidst->get_translation()


No problem.  Compiles fine.
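
For readers without the troff source at hand, the minting logic in the patch boils down to a counter that starts just above the 8-bit range and refuses to wrap. What follows is only a minimal sketch: the names `code_minter` and `mint` are hypothetical and do not exist in groff, and returning 0 on exhaustion stands in for the `error()` call in the patch.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the patch's file-scope counter; not groff code.
// Codes 0..255 are assumed to belong to ordinary 8-bit characters, so
// freshly minted codes start at 256.
struct code_minter {
  int next = 256;

  // Hand out the next unused code, or 0 ("no hyphenation code") once the
  // counter has reached INT_MAX, mirroring the exhaustion check in the patch.
  int mint()
  {
    if (next == INT_MAX)
      return 0;
    return next++;
  }
};

// Small self-check of the behavior described above.
inline void check_minting()
{
  code_minter m;
  assert(m.mint() == 256);
  assert(m.mint() == 257);
}
```

Starting at 256 keeps minted codes from colliding with codes derived from a character's own encoding value, which is what the `tok.ch()` branch of the patch assigns.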

Let's test it.


$ nl EXPERIMENTS/phcode-works.groff
     1  .phcode a
     2  .phcode A
     3  .phcode $
     4  .tm invoking .hcode \['a] $
     5  .hcode \['a] $
     6  .phcode \['a]
     7  .tm invoking .hcode \['a] \['a]
     8  .hcode \['a] \['a]
     9  .phcode \['a]
    10  .tm invoking .hcode \['a] \['A]
    11  .hcode \['a] \['A]
    12  .phcode \['a]
    13  .phcode \[vS]
$ ./build/test-groff EXPERIMENTS/phcode-works.groff
a       97
A       97
$       0
invoking .hcode \['a] $
\['a]   0
invoking .hcode \['a] \['a]
troff:EXPERIMENTS/phcode-works.groff:8: debug: GBR: mint new
troff:EXPERIMENTS/phcode-works.groff:8: debug: GBR: new code is 256
\['a]   0
invoking .hcode \['a] \['A]
troff:EXPERIMENTS/phcode-works.groff:11: debug: GBR: copy code
\['a]   225
\[vS]   0


"new code is 256"--great!

Next line...


\['a]   0


Wha..?  Well, yeah.  The hyphenation code is an `unsigned char`, so when we
assigned an `int` to it, the upper bits got masked off (256 becomes 0), and
this isn't warned about because pfffffffft, we're C/C++ programmers--we live
close to the metal where no one ever worries about narrowing type
conversions.  😐

All right, fine, a hyphenation code is just a numeric type.  Let's widen it.


diff --git a/src/roff/troff/charinfo.h b/src/roff/troff/charinfo.h
index 02be5aa95..6b05c1dc2 100644
--- a/src/roff/troff/charinfo.h
+++ b/src/roff/troff/charinfo.h
@@ -76,7 +76,7 @@ public:
   int prohibit_break_before();
   int prohibit_break_after();
   int inter_char_space();
-  unsigned char get_hyphenation_code();
+  int get_hyphenation_code();
   unsigned char get_ascii_code();
   unsigned char get_asciify_code();
   int get_unicode_code();
@@ -212,7 +212,7 @@ inline charinfo *charinfo::get_translation(int transparent_throughput)
          : translation);
 }
 
-inline unsigned char charinfo::get_hyphenation_code()
+inline int charinfo::get_hyphenation_code()
 {
   return hyphenation_code;
 }
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index ca69e38fa..8d80ad2ba 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -3634,7 +3634,7 @@ static void add_hyphenation_exceptions()
     skip_line();
     return;
   }
-  char buf[WORD_MAX + 1];
+  int buf[WORD_MAX + 1];
   unsigned char pos[WORD_MAX + 2];
   for (;;) {
     tok.skip();
@@ -3655,7 +3655,7 @@ static void add_hyphenation_exceptions()
          pos[npos++] = i;
       }
       else {
-       unsigned char c = ci->get_hyphenation_code();
+       int c = ci->get_hyphenation_code();
        if (c == 0)
          break;
        buf[i++] = c;
@@ -3664,9 +3664,9 @@ static void add_hyphenation_exceptions()
     if (i > 0) {
       pos[npos] = 0;
       buf[i] = 0;
-      unsigned char *tem = new unsigned char[npos + 1];
+      int *tem = new int[npos + 1];
       memcpy(tem, pos, npos + 1);
-      tem = (unsigned char *)current_language->exceptions.lookup(symbol(buf),
+      tem = (int *)current_language->exceptions.lookup(symbol(buf),
                                                                 tem);
       if (tem)
        delete[] tem;


Easy-peasy.  Now...


$ make -C build troff
make: Entering directory '/home/branden/src/GIT/groff/build'
  CXX      src/roff/troff/env.o
../src/roff/troff/env.cpp: In function ‘void add_hyphenation_exceptions()’:
../src/roff/troff/env.cpp:3669:66: error: no matching function for call to ‘symbol::symbol(int [257])’
 3669 |       tem = (int *)current_language->exceptions.lookup(symbol(buf),
      |                                                                  ^
In file included from ../src/include/color.h:22,
                 from ../src/roff/troff/troff.h:32,
                 from ../src/roff/troff/env.cpp:19:
../src/include/symbol.h:46:8: note: candidate: ‘symbol::symbol()’
   46 | inline symbol::symbol() : s(0)
      |        ^~~~~~
../src/include/symbol.h:46:8: note:   candidate expects 0 arguments, 1 provided
../src/include/symbol.h:32:3: note: candidate: ‘symbol::symbol(const char*, int)’
   32 |   symbol(const char *p, int how = 0);
      |   ^~~~~~
../src/include/symbol.h:32:22: note:   no known conversion for argument 1 from ‘int [257]’ to ‘const char*’
   32 |   symbol(const char *p, int how = 0);
      |          ~~~~~~~~~~~~^
../src/include/symbol.h:24:7: note: candidate: ‘constexpr symbol::symbol(const symbol&)’
   24 | class symbol {
      |       ^~~~~~
../src/include/symbol.h:24:7: note:   no known conversion for argument 1 from ‘int [257]’ to ‘const symbol&’
../src/include/symbol.h:24:7: note: candidate: ‘constexpr symbol::symbol(symbol&&)’
../src/include/symbol.h:24:7: note:   no known conversion for argument 1 from ‘int [257]’ to ‘symbol&&’
make: *** [Makefile:8689: src/roff/troff/env.o] Error 1
make: Leaving directory '/home/branden/src/GIT/groff/build'


All right, it's our old friend the `symbol` class, which is used all over the
place.
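
The compiler's complaint can be reduced to a few lines. In this sketch, `symbol_like` is a hypothetical stand-in that copies only the constructor signature quoted in the diagnostics above (`symbol(const char *p, int how = 0)`); it is not the real groff class and does nothing with its arguments.

```cpp
#include <type_traits>

// Hypothetical stand-in mirroring only the constructor signature shown in
// the error output; the real class lives in src/include/symbol.h.
struct symbol_like {
  symbol_like(const char *p, int how = 0) { (void)p; (void)how; }
};

// A char buffer decays to char *, which converts to const char *.
static_assert(std::is_constructible<symbol_like, char (&)[257]>::value,
              "char buffers are accepted");

// An int buffer decays to int *, which has no implicit conversion to
// const char *, so no constructor matches.
static_assert(!std::is_constructible<symbol_like, int (&)[257]>::value,
              "int buffers are rejected");
```

Any fix therefore has to change either the buffer's element type back to `char` or the key type the class accepts, and the second option is the wide-character change described next.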

Suddenly following this thread around the corner reveals the hulking beast
known as "change _groff_'s internal character representation to a wide type",
which is also a prerequisite for internal Unicode support.

So, uh, this one's screeching to a halt for 1.24.

I'll land what I can (without changing any classes), declare defeat, and move
on to the next formatter change.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66051>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
