groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] input.cpp: Remove use of strncat(3)


From: Alejandro Colomar
Subject: Re: [RFC] input.cpp: Remove use of strncat(3)
Date: Tue, 6 Dec 2022 14:28:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1

Hi Larry,

On 12/6/22 00:45, Larry McVoy wrote:
On Mon, Dec 05, 2022 at 11:39:37PM +0100, Alejandro Colomar wrote:
See the strncat(3) Linux manual page for details about why strncat(3)
is actively harmful.

Is there a a known instance of strncpy() causing a problem in the groff
source?

Well, that's the only one instance of strncat(3) in groff, so we should be in a position to tell with certainty. :)


I get it, you are fixing a possible problem and there may not (yet)
be any places where the problem actually occurs but it still could.
I get it.

Yep.


I used to run a company that did source management tools and we strongly
discouraged these sorts of "clean ups" because they change the author
of that line of code.  Our system made it painless to go from a line of
code to the commit that added that line of code.  Which, if it goes back
to the original commit, is super pleasant for debugging.  If I have go
to your commit, then realize, oh, it used to be the commit before that
and so on, it takes longer to figure out the bug.

For the time I've been contributing/maintaining the Linux man-pages (just a bit over 2 years), I've had very many occasions where I've had to do exactly, to investigate who were the authors of some documentation and ask them to clarify some details, or fix some errors.

git-blame(1) is _really_ easy to use. I just followed the entire history of changes that I applied to the prototype of strncat(3) in SYNOPSIS. I didn't measure the time, but probably under a minute. See how:

$ git blame -- man3/strncat.3 | grep BI.*strncat
9baab44e7d (Alejandro Colomar 2022-12-06 02:15:24 +0100 16) .BI "char *strncat(char " dest "[restrict strlen(." dest ") + strnlen(." src ") + 1],"
$ git show --color 9baab44e7d | less -R
$ git blame 9baab44e7d^ -- man3/strncat.3 | grep BI.*strncat
5c6f6e6767 (Alejandro Colomar 2022-12-05 16:46:44 +0100 16) .BI "char *strncat(char " dest "[restrict strlen(." dest ") + strnlen(." n ") + 1],"
$ git show --color 5c6f6e6767 | less -R
$ git blame 5c6f6e6767^ -- man3/strcat.3 | grep BI.*strncat
Blaming lines: 100% (227/227), done.
7c19ad66c8 (Alejandro Colomar 2022-12-05 16:09:47 +0100 23) .BI "char *strncat(char " dest "[restrict strlen(." dest ") + strnlen(." n ") + 1],"
$ git show --color 7c19ad66c8 | less -R
$ git blame 7c19ad66c8^ -- man3/strcat.3 | grep BI.*strncat
Blaming lines: 100% (228/228), done.
1eed67e75d (Alejandro Colomar 2022-08-26 22:48:26 +0200 23) .BI "char *strncat(char " dest "[restrict ." n "], \
$ git show --color 1eed67e75d | less -R
$ git blame 1eed67e75d^ -- man3/strcat.3 | grep BI.*strncat
Blaming lines: 100% (227/227), done.
e4f4e683e5 (Alejandro Colomar 2021-03-10 19:31:37 +0100 23) .BI "char *strncat(char *restrict " dest ", const char *restrict " src \
$ git show --color e4f4e683e5 | less -R
$ git blame e4f4e683e5^ -- man3/strcat.3 | grep BI.*strncat
^fea681daf (Michael Kerrisk 2004-11-03 13:51:07 +0000 40) .BI "char *strncat(char *" dest ", const char *" src ", size_t " n );


And for those who are not so used to it, or for any reasons prefer a web UI, cgit-pink provides this functionality in clicky version. Github also does.



Before you say I'm counting angels on the head of a ping, that ability
to debug lead us to knowing, on average, 24x7x365, the root cause of a
bug in under 24 minutes.  If you limited it to business hours, it was
under 5 minutes.  We sold to big companies, they *loved* our support.

Well, can't argue too much with that. I don't have numbers of how much git-blame(1) or cgit-pink have been sold; being GPLv2 I can guess 0... :P I understand that in the past, this was a deal breaker. But today we have tools that eliminate that problem.

And this is hiding the cost of having not-self-documented code. Did anybody measure the time to understand a given random line of code?

The history of that line of code in my local repo is:

$ git blame -- src/roff/troff/input.cpp | grep strlcat
3e41a50dc8 src/roff/troff/input.cpp (Alejandro Colomar 2022-12-05 23:30:15 +0100 7895) strlcat(s, fn, sizeof(MACRO_PREFIX));
$ git blame 3e41a50dc8^ -- src/roff/troff/input.cpp | grep strncat
fca2b723e7 src/roff/troff/input.cpp (G. Branden Robinson 2018-11-06 05:53:52 -0500 7895) strncat(s, fn, fnlen - sizeof(MACRO_POSTFIX) + 1);
$ git show --color fca2b723e7 | less -R
$ git blame fca2b723e7^ -- src/roff/troff/input.cpp | grep strncat
1460ceee3a src/roff/troff/input.cc (Werner LEMBERG 2000-10-25 21:06:47 +0000 7716) strncat(s, fn, strlen(fn) - sizeof(MACRO_POSTFIX) + 1);
$ git show --color 1460ceee3a | less -R


I just found that I got my patch wrong (and in the process, I found what strncat(3) really is for: concatenating an unterminated string into a string).

This means that we really can't use strlcat(3) in this case, since we only want to copy a substring of 'fn'. strncat(3) is really the only standard function that does this, so we either continue using it, _and_ document what strncat(3) really is for (that will be my job in the manual page), or move to a safer function (but we would have to write it ourselves).

So, strncpy(3) is similar to ustr2str(), as defined in a proposal for the shadow project:
<https://github.com/shadow-maint/shadow/pull/603/files#diff-a2b8f8568ad025c2aeb4c64d3510f42d874e9ef0a64a89c2dd750aa93aabcd97>
Except that strncat(3) appends instead of a straight copy. That made me realize that I can improve the design of ustr2str() to allow for easier concatenation, similar to stpecpy() (or stpcpy(3)):
<https://software.codidact.com/posts/285946>

But defining and using such a function might be too much work for groff. I'll leave the readability questions up to the maintainers, which are those who will suffer them. Considering strncat(3) is being used correctly in the single instance it's used, I think it's fine.


However, and agreeing with Ralph's suggestion from a few weeks ago, I'll propose an alternative patch:



diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index ade608563..705af7334 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -7880,19 +7880,19 @@ void do_macro_source(bool quietly)
     if (!fp) {
       const char *fn = nm.contents();
       size_t fnlen = strlen(fn);
-      if (strncasecmp(fn, MACRO_PREFIX, sizeof(MACRO_PREFIX) - 1) == 0) {
+      if (strncasecmp(fn, MACRO_PREFIX, strlen(MACRO_PREFIX)) == 0) {
        char *s = new char[fnlen + sizeof(MACRO_POSTFIX)];
-       strcpy(s, fn + sizeof(MACRO_PREFIX) - 1);
+       strcpy(s, fn + strlen(MACRO_PREFIX));
        strcat(s, MACRO_POSTFIX);
        fp = mac_path->open_file(s, &path);
        delete[] s;
       }
       if (!fp) {
-       if (strncasecmp(fn + fnlen - sizeof(MACRO_POSTFIX) + 1,
+       if (strncasecmp(fn + fnlen - strlen(MACRO_POSTFIX),
                        MACRO_POSTFIX, sizeof(MACRO_POSTFIX) - 1) == 0) {
          char *s = new char[fnlen + sizeof(MACRO_PREFIX)];
          strcpy(s, MACRO_PREFIX);
-         strncat(s, fn, fnlen - sizeof(MACRO_POSTFIX) + 1);
+         strncat(s, fn, fnlen - strlen(MACRO_POSTFIX));
          fp = mac_path->open_file(s, &path);
          delete[] s;
        }


The compiler should be able to transform those into compile-time constants, 
anyway.


So there is real value to asking yourself "Am I actually fixing a real
bug?"

Depending on your definition of bug. If you mean observable behavior, no, the current groff is fine. If you include self-documentation problems in the definition of bug, then, the above diff is fixing a bug.


And it is groff, not some mission critical database.  If I were in
charge, I veto these sorts of commits and fix the problems if/when
they arise and have a cleaner history.

Some servers offer online manual pages. I guess most of them generate HTML files once, and then serve them, for security and performance reasons. But if anybody out there is serving the pages on-demand, generated from whatever the system has at the moment, then groff(1) is probably being exposed, and bugs may be security holes. Okay, not a recommended setup.


But I'm not in charge, so...

Me neither :)

Cheers,

Alex

--
<http://www.alejandro-colomar.es/>

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]