From: Gavin D. Smith
Subject: branch master updated: Handle blocks of word characters in XS paragraph formatter
Date: Sat, 21 Oct 2023 11:20:30 -0400
This is an automated email from the git hooks/post-receive script.
gavin pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new d813c10de2 Handle blocks of word characters in XS paragraph formatter
d813c10de2 is described below
commit d813c10de2fc06c691a90b96db00e6087e20d03e
Author: Gavin Smith <gavinsmith0123@gmail.com>
AuthorDate: Sat Oct 21 16:20:21 2023 +0100
Handle blocks of word characters in XS paragraph formatter
* tp/Texinfo/XS/xspara.c (xspara__add_next): Allow for 'word'
argument not to be null-terminated.
(xspara__add_text): Call xspara__add_next with block of word
characters, rather than one at a time. This has a minor performance
benefit and matches the Perl code better.
---
ChangeLog | 10 ++++++++
tp/Texinfo/XS/xspara.c | 70 ++++++++++++++++++++++++++------------------------
2 files changed, 46 insertions(+), 34 deletions(-)
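The commit message says that 'word' may no longer be null-terminated, which is why the newline test has to be bounded by word_len. The following is a minimal sketch of that idea, not code from xspara.c: the helper name block_has_newline is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of xspara.c): report whether the first
   LEN bytes of WORD contain a newline.  memchr stops after LEN bytes,
   so WORD does not need a terminating null byte.  strchr would keep
   reading until it found a null byte, which may lie well past the end
   of the block being examined. */
static int
block_has_newline (const char *word, size_t len)
{
  return memchr (word, '\n', len) != NULL;
}
```

For example, with the buffer "alpha beta\ngamma", block_has_newline (buf, 5) looks only at "alpha" and returns 0, while block_has_newline (buf, 11) covers the newline and returns 1. A strchr call on the same pointer would find the newline in both cases, because it ignores the block length.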
diff --git a/ChangeLog b/ChangeLog
index be247ef46b..07383ff161 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2023-10-21 Gavin Smith <gavinsmith0123@gmail.com>
+
+ Handle blocks of word characters in XS paragraph formatter
+
+ * tp/Texinfo/XS/xspara.c (xspara__add_next): Allow for 'word'
+ argument not to be null-terminated.
+ (xspara__add_text): Call xspara__add_next with block of word
+ characters, rather than one at a time. This has a minor performance
+ benefit and matches the Perl code better.
+
2023-10-21 Gavin Smith <gavinsmith0123@gmail.com>
* tp/Texinfo/Convert/ParagraphNonXS.pm (add_next, add_text):
diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
index 75f74c59cd..c7d507811e 100644
--- a/tp/Texinfo/XS/xspara.c
+++ b/tp/Texinfo/XS/xspara.c
@@ -735,7 +735,7 @@ xspara__add_next (TEXT *result, char *word, int word_len,
int transparent)
}
}
- if (strchr (word, '\n'))
+ if (memchr (word, '\n', word_len))
{
/* If there was a newline in the word we just added, put the entire
pending ouput in the results string, and start a new line. */
@@ -975,7 +975,8 @@ xspara_add_text (char *text, int len)
/* TODO: test just one character at a time to start. then
we can gradually work on the various blocks of
code to operate on multiple characters. */
- if (1 || next_type != type || next_type == type_finished)
+ if ((type != type_regular)
+ || next_type != type || next_type == type_finished)
break;
q += next_len; len -= next_len;
@@ -1130,42 +1131,43 @@ xspara_add_text (char *text, int len)
/*************** Word character ******************************/
else if (type == type_regular)
{
- static char added_word[8]; /* long enough for one UTF-8 character */
- memcpy (added_word, p, q - p);
- added_word[q - p] = '\0';
+ xspara__add_next (&result, p, q - p, 0);
- xspara__add_next (&result, added_word, q - p, 0);
-
- /* Now check if it is considered as an end of sentence, and
- set state.end_sentence if it is. */
-
- if (strchr (end_sentence_characters, *p) && !state.unfilled)
+ /* Now check for an end of sentence. We can iterate backwards
+ by bytes as all the end-sentence characters or punctuation are
+ ASCII. */
+ char *q2 = q;
+ while (q2 > p)
{
- /* Doesn't count if preceded by an upper-case letter. */
- if (!iswupper (state.last_letter))
+ q2--;
+ if (strchr (end_sentence_characters, *q2) && !state.unfilled)
{
- if (state.french_spacing)
- state.end_sentence = -1;
- else
- state.end_sentence = 1;
- if (debug)
- fprintf (stderr, "END_SENTENCE\n");
+ /* Doesn't count if preceded by an upper-case letter. */
+ if (!iswupper (state.last_letter))
+ {
+ if (state.french_spacing)
+ state.end_sentence = -1;
+ else
+ state.end_sentence = 1;
+ if (debug)
+ fprintf (stderr, "END_SENTENCE\n");
+ break;
+ }
+ }
+ else if (strchr (after_punctuation_characters, *q2))
+ {
+ /* These characters are ignored when checking for the end
+ of a sentence. */
+ }
+ else
+ {
+ /* Not at the end of a sentence. */
+ if (debug && state.end_sentence != -2)
+ fprintf (stderr, "delete END_SENTENCE(%d)\n",
+ state.end_sentence);
+ state.end_sentence = -2;
+ break;
}
- }
- else if (strchr (after_punctuation_characters, *p))
- {
- /* '"', '\'', ']' and ')' are ignored for the purpose
- of deciding whether a full stop ends a sentence. */
- }
- else
- {
- /* Otherwise reset the end of sentence marker: a full stop in
- a string like "aaaa.bbbb" doesn't mark an end of
- sentence. */
- if (debug && state.end_sentence != -2)
- fprintf (stderr, "delete END_SENTENCE(%d)\n",
- state.end_sentence);
- state.end_sentence = -2;
}
}
else if (type == type_unknown)
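The backwards scan introduced in the last hunk can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the code in xspara.c: the function name, the enum, and the last_letter_is_upper parameter are hypothetical stand-ins for the state fields used in the diff, the contents of end_sentence_chars are assumed, and the unfilled and french_spacing handling is left out.

```c
#include <stddef.h>
#include <string.h>

/* Assumed character sets for illustration only.  The closing
   punctuation follows the comment removed in the diff ('"', '\'',
   ']' and ')'); the end-of-sentence set is a guess. */
static const char end_sentence_chars[] = ".?!";
static const char after_punct_chars[] = "\"')]";

enum sentence_state
{
  SENTENCE_UNCHANGED,   /* block was empty or only closing punctuation */
  SENTENCE_END,         /* block ends a sentence */
  SENTENCE_NOT_END      /* block ends in an ordinary word character */
};

/* Scan the LEN bytes at P from the end towards the start.  Scanning
   byte by byte is safe because every character tested for is ASCII,
   and bytes of multibyte UTF-8 sequences never equal an ASCII byte. */
static enum sentence_state
scan_block_for_sentence_end (const char *p, size_t len,
                             int last_letter_is_upper)
{
  const char *q = p + len;

  while (q > p)
    {
      q--;
      /* The '\0' guard is needed because strchr also matches the
         terminating null byte of its first argument. */
      if (*q != '\0' && strchr (end_sentence_chars, *q))
        {
          /* Doesn't count if preceded by an upper-case letter; in
             that case keep scanning, as the loop in the diff does. */
          if (!last_letter_is_upper)
            return SENTENCE_END;
        }
      else if (*q != '\0' && strchr (after_punct_chars, *q))
        {
          /* Closing punctuation is skipped. */
        }
      else
        return SENTENCE_NOT_END;
    }
  return SENTENCE_UNCHANGED;
}
```

Under these assumptions, the block "end.)" yields SENTENCE_END because the ')' is skipped before the full stop is reached, while "a.b" yields SENTENCE_NOT_END because the scan stops at the trailing 'b'.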