
Re: coreutils & building w/C++


From: L A Walsh
Subject: Re: coreutils & building w/C++
Date: Sat, 04 Feb 2017 14:29:40 -0800
User-agent: Thunderbird

Eric Blake wrote:
On 02/03/2017 07:31 PM, L A Walsh wrote:
I was wondering if there has ever been any consideration given to migrating the coreutils to C++, or at least making it such that it would build cleanly under either?
No, and there probably is no interest in it either. Finding a standards-compliant C++ compiler on as many platforms as we already have a working C compiler is unlikely to happen, but would be a necessary prerequisite.
-----

Hmmm... Seems like the GNU C++ compiler shares a lot of C's infrastructure -- even the same man pages. It certainly seems likely that if gcc runs on a given platform, g++ would as well.
Sorry, but I don't see any reason to rewrite in a different language. Most of the core contributors are more familiar with C than with C++,
-----

That's good, since C++ is mostly a superset of C, so no rewriting should be necessary. Usually, I find most standards-compliant C programs will build with few changes. C++ does disallow some dubious C constructs, but C++'s runtime includes all of C's library functions. Given that C++ was designed to be C-compatible, I certainly don't see any _need_ for rewriting unless one wants to refactor or improve the code.
and so even if C++ were used, it would look a lot more like weird C than it would like proper C++.
-----

Well, "proper" is usually an academic matter (unless it becomes "vehemently proper", in which case religion may be involved). ;-) Many of the C++ programs I've seen or worked with look like standard C programs whose authors wanted specific C++ features, like its standard library, or used namespaces solely to keep library or module calls in their own space without impinging on the global namespace.
I also don't buy the argument that being more object-oriented will help things; coreutils is not a large corpus of multiple inter-related pieces, but a bunch of individual utilities that do one thing.
----

Really? Looking at the source, I see 316 source files (.c or .h) with 69,000 lines in a shared library, and only 30 files and 9,000 lines of separate tool front-ends in src. In fact, it seems all of the core utils can be built as one binary, with all code shared and the individual tools being symlinks to that one binary. There looks to be over 7x as much common code in the library as there is code supporting the different tool interfaces.

Looks like Gcc and binutils are better projects for using C++, and it shows, as both of them have already made progress towards that front.
----

I don't know how long it's been going on, but it looks like there is an ongoing effort to find the commonalities in the core utils and merge them, with the common code going into the library. There may be a score or more of individual tools, but many of them handle the exact same or similar issues -- like file-tree traversal, date+time manipulation and formatting, and file access. With changes being added to coreutils to support various security needs, even more commonality between the tools is growing, as they all get the framework to deal with more specific end cases -- and that's the sort of thing C++ can be very useful in organizing.

I was thinking of how much more flexible coreutils would be if it were organized more like the Linux kernel, in that different, orthogonal features could be developed in loadable modules. They could conceivably be organized along the lines of the kernel's security modules, where only the modules that are wanted/needed/used are loaded at runtime -- OR such modules could be linked in at build time, limiting loadables to lesser-used features, or to none at all.

No, no one's ever been interested enough to even bother with it, because it is probably not worth doing.
----

One could look at history for the correlation between something not having been done and whether it was worth doing. Many people didn't think it was worth trying to circumnavigate the globe, because it wasn't worth building a ship you knew would fall off the edge of a flat world. Many thought flying was impossible or couldn't be done, as was breaking the sound barrier. History is filled with more examples of "undone things" proving useful once completed than not. The benefits of a particular course of action are usually not visible before the action is taken.

FWIW, I gave it a spin, and there seem to be several "__THROW" keywords. I'm not familiar with those in the C-standard.
That's because they're not in the C standard. That particular macro is defined by glibc (misc/sys/cdefs.h):

# define __THROW __attribute__ ((__nothrow__ __LEAF))

as an optimization hint for gcc, and as a no-op for other compilers.
-----

Wow... it was gcc that gave the errors (though with different switches than are normally used in the build).
But this is all open source - if you are questioning what the code means, rather than following the source and finding the answer yourself, then you're already facing an uphill battle at trying to rewrite the source.
-----

Especially in areas with non-standard APIs or formats. In this case, I was looking at the "low-hanging fruit": the keyword gcc claimed it didn't recognize, which was responsible for multiple errors. C++ has standard language features to control error propagation and allow related optimizations. It's a good example of where C++ would be more useful, in that it has standard features for areas where ad-hoc methods must be used in C.

With C++, those features could be looked up in a C++ language or library reference. Perhaps it might, at least, be possible to use the C++ compiler as a type of diagnostic -- one that could help clarify existing C code and make it more robust. For example, in C one can initialize a character array as is done in "base32.c": char b32str[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; However, depending on how it is used, this can cause problems:

 Compiled with:

 > gcc -Wall -o b32 b32.c

-----

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  /* 32 initializer characters exactly fill the array, so C silently
     drops the terminating NUL.  */
  static const char b32str[32]  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const int bufsiz              = sizeof(b32str)*3/2;   /* 48 bytes */
  char * buff                   = calloc(bufsiz, 1);
  memset(buff, 'X', bufsiz-1);                 /* 47 X's; buff[47] stays 0 */
  strncpy(buff, b32str, sizeof(b32str) );      /* copies 32 chars, no NUL */
  printf("Len of b32str, '%s', is %d characters.\n",
    buff, (int) strlen(buff));
  free(buff);
  exit (0);
}

-----

 When run, it produces:

Len of b32str, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567XXXXXXXXXXXXXXX', is 47 characters.

 Under C, it produces no warnings, as C allows the NUL terminator to
be dropped from the initialization when the initializer exactly fills the
array -- 'valid', but a potential problem if the array is later treated as
a NUL-terminated string like its initializer.

 C++ generates errors:

> g++ -Wall -o b32 b32.c
b32.c: In function ‘int main(int, char**)’:
b32.c:7:33: error: initializer-string for array of chars is too long [-fpermissive]
   static const char b32str[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
                                  ^
b32.c:9:32: error: invalid conversion from ‘void*’ to ‘char*’ [-fpermissive]
   char * buff = calloc(bufsiz, 1);
                                ^

 So, it seems it might be of some benefit in promoting safer programming
practices (or not, as people often find ways to work around restrictions!)

;-)


