branch master updated: Handle blocks of word characters in XS paragraph formatter

texinfo-commits



From:	Gavin D. Smith
Subject:	branch master updated: Handle blocks of word characters in XS paragraph formatter
Date:	Sat, 21 Oct 2023 11:20:30 -0400

This is an automated email from the git hooks/post-receive script.

gavin pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new d813c10de2 Handle blocks of word characters in XS paragraph formatter
d813c10de2 is described below

commit d813c10de2fc06c691a90b96db00e6087e20d03e
Author: Gavin Smith <gavinsmith0123@gmail.com>
AuthorDate: Sat Oct 21 16:20:21 2023 +0100

    Handle blocks of word characters in XS paragraph formatter
    
    * tp/Texinfo/XS/xspara.c (xspara__add_next): Allow for 'word'
    argument not to be null-terminated.
    (xspara__add_text): Call xspara__add_next with block of word
    characters, rather than one at a time.  This has a minor performance
    benefit and matches the Perl code better.
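    A minimal sketch of the two changes described above, under simplifying
    assumptions: the names slice_has_newline, is_word_byte and add_text are
    hypothetical stand-ins (not functions in xspara.c), and a "word byte" is
    taken to be any byte other than a space, tab or newline, whereas xspara.c
    classifies whole UTF-8 characters into types such as type_regular.  It
    shows why a word passed as (pointer, length) must be searched with memchr
    rather than strchr, and how a run of word characters can be handed over
    in a single call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the newline check in xspara__add_next.
   WORD points into a larger buffer and is NOT null-terminated, so at
   most WORD_LEN bytes may be examined.  strchr (word, '\n') would keep
   reading past the slice until it reached a NUL byte; memchr stops
   after WORD_LEN bytes. */
static int
slice_has_newline (const char *word, int word_len)
{
  assert (word_len >= 0);
  return memchr (word, '\n', (size_t) word_len) != NULL;
}

/* Simplified character classification (an assumption for this sketch):
   anything other than a space, tab or newline counts as a word byte. */
static int
is_word_byte (char c)
{
  return c != ' ' && c != '\t' && c != '\n';
}

/* Hypothetical model of the grouping in xspara_add_text: each maximal
   run of word bytes is handed over as one (pointer, length) slice
   instead of one character at a time.  Returns the number of slices. */
static int
add_text (const char *text, int len)
{
  const char *p = text;
  const char *end = text + len;
  int blocks = 0;

  while (p < end)
    {
      if (!is_word_byte (*p))
        {
          p++;                  /* whitespace handling omitted */
          continue;
        }
      const char *q = p;
      while (q < end && is_word_byte (*q))
        q++;                    /* extend while the type is unchanged */
      /* "%.*s" prints exactly q - p bytes; no terminating NUL needed. */
      printf ("block: '%.*s'\n", (int) (q - p), p);
      blocks++;
      p = q;
    }
  return blocks;
}
```

    For instance, add_text ("Hello world. Bye.", 17) prints the three blocks
    Hello, world. and Bye., and returns 3.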
---
 ChangeLog              | 10 ++++++++
 tp/Texinfo/XS/xspara.c | 70 ++++++++++++++++++++++++++------------------------
 2 files changed, 46 insertions(+), 34 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index be247ef46b..07383ff161 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2023-10-21  Gavin Smith <gavinsmith0123@gmail.com>
+
+       Handle blocks of word characters in XS paragraph formatter
+
+       * tp/Texinfo/XS/xspara.c (xspara__add_next): Allow for 'word'
+       argument not to be null-terminated.
+       (xspara__add_text): Call xspara__add_next with block of word
+       characters, rather than one at a time.  This has a minor performance
+       benefit and matches the Perl code better.
+
 2023-10-21  Gavin Smith <gavinsmith0123@gmail.com>
 
        * tp/Texinfo/Convert/ParagraphNonXS.pm (add_next, add_text):
diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
index 75f74c59cd..c7d507811e 100644
--- a/tp/Texinfo/XS/xspara.c
+++ b/tp/Texinfo/XS/xspara.c
@@ -735,7 +735,7 @@ xspara__add_next (TEXT *result, char *word, int word_len, int transparent)
         }
     }
 
-  if (strchr (word, '\n'))
+  if (memchr (word, '\n', word_len))
     {
       /* If there was a newline in the word we just added, put the entire
          pending ouput in the results string, and start a new line. */
@@ -975,7 +975,8 @@ xspara_add_text (char *text, int len)
           /* TODO: test just one character at a time to start.  then
              we can gradually work on the various blocks of
              code to operate on multiple characters. */
-          if (1 || next_type != type || next_type == type_finished)
+          if ((type != type_regular)
+              || next_type != type || next_type == type_finished)
             break;
 
           q += next_len; len -= next_len;
@@ -1130,42 +1131,43 @@ xspara_add_text (char *text, int len)
       /*************** Word character ******************************/
       else if (type == type_regular)
         {
-          static char added_word[8]; /* long enough for one UTF-8 character */
-          memcpy (added_word, p, q - p);
-          added_word[q - p] = '\0';
+          xspara__add_next (&result, p, q - p, 0);
 
-          xspara__add_next (&result, added_word, q - p, 0);
-
-          /* Now check if it is considered as an end of sentence, and
-             set state.end_sentence if it is. */
-
-          if (strchr (end_sentence_characters, *p) && !state.unfilled)
+          /* Now check for an end of sentence.  We can iterate backwards
+             by bytes as all the end-sentence characters or punctuation are
+             ASCII. */
+          char *q2 = q;
+          while (q2 > p)
             {
-              /* Doesn't count if preceded by an upper-case letter. */
-              if (!iswupper (state.last_letter))
+              q2--;
+              if (strchr (end_sentence_characters, *q2) && !state.unfilled)
                 {
-                  if (state.french_spacing)
-                    state.end_sentence = -1;
-                  else
-                    state.end_sentence = 1;
-                  if (debug)
-                    fprintf (stderr, "END_SENTENCE\n");
+                  /* Doesn't count if preceded by an upper-case letter. */
+                  if (!iswupper (state.last_letter))
+                    {
+                      if (state.french_spacing)
+                        state.end_sentence = -1;
+                      else
+                        state.end_sentence = 1;
+                      if (debug)
+                        fprintf (stderr, "END_SENTENCE\n");
+                      break;
+                    }
+                }
+              else if (strchr (after_punctuation_characters, *q2))
+                {
+                  /* These characters are ignored when checking for the end
+                     of a sentence. */
+                }
+              else
+                {
+                  /* Not at the end of a sentence. */
+                  if (debug && state.end_sentence != -2)
+                    fprintf (stderr, "delete END_SENTENCE(%d)\n",
+                                      state.end_sentence);
+                  state.end_sentence = -2;
+                  break;
                 }
-            }
-          else if (strchr (after_punctuation_characters, *p))
-            {
-              /* '"', '\'', ']' and ')' are ignored for the purpose
-               of deciding whether a full stop ends a sentence. */
-            }
-          else
-            {
-              /* Otherwise reset the end of sentence marker: a full stop in
-                 a string like "aaaa.bbbb" doesn't mark an end of
-                 sentence. */
-              if (debug && state.end_sentence != -2)
-                fprintf (stderr, "delete END_SENTENCE(%d)\n",
-                                  state.end_sentence);
-              state.end_sentence = -2;
             }
         }
       else if (type == type_unknown)
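
The backward scan in the last hunk can be exercised in isolation.  This is a
sketch, not the xspara.c code: struct sentence_state, in_set and
check_end_sentence are hypothetical, only ASCII letters are considered for
the upper-case test (xspara.c uses iswupper on state.last_letter), and the
end-of-sentence set is assumed to be ".?!"; the after-punctuation set
follows the characters listed in the removed comment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Character sets assumed for this sketch. */
static const char end_sentence_characters[] = ".?!";
static const char after_punctuation_characters[] = "\"')]";

/* Hypothetical subset of the formatter state. */
struct sentence_state
{
  int end_sentence;     /* 1 or -1: end of sentence; -2: not an end */
  int french_spacing;   /* nonzero: mark ends with -1 instead of 1 */
  int unfilled;         /* nonzero: never detect sentence ends */
  char last_letter;     /* last letter output (ASCII in this sketch) */
};

/* Nonzero if C is a non-NUL member of SET.  strchr alone would also
   match the terminating NUL of SET. */
static int
in_set (const char *set, char c)
{
  return c != '\0' && strchr (set, c) != NULL;
}

/* Mirror of the backward scan: after a block [P, Q) of word characters
   has been output, walk back over it one byte at a time.  Byte-wise
   iteration is safe because every character tested for is ASCII, and
   no byte of a multi-byte UTF-8 sequence is in the ASCII range. */
static void
check_end_sentence (struct sentence_state *state, const char *p,
                    const char *q)
{
  const char *q2 = q;

  assert (p <= q);
  while (q2 > p)
    {
      q2--;
      if (in_set (end_sentence_characters, *q2) && !state->unfilled)
        {
          /* Doesn't count if preceded by an upper-case letter;
             in that case keep scanning backwards. */
          if (!isupper ((unsigned char) state->last_letter))
            {
              state->end_sentence = state->french_spacing ? -1 : 1;
              break;
            }
        }
      else if (in_set (after_punctuation_characters, *q2))
        {
          /* Closing quotes and brackets are skipped. */
        }
      else
        {
          /* Any other character: not at the end of a sentence. */
          state->end_sentence = -2;
          break;
        }
    }
}
```

With last_letter 'e', scanning "done.)" skips the ')' and sets end_sentence
to 1 (or -1 when french_spacing is set).  With last_letter 'A', "USA." steps
past the '.' because of the upper-case rule, reaches 'A', and sets -2.  A
block made only of closing characters such as "))" leaves end_sentence
unchanged.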
