
Re: Warn on mid-input line sentence endings

From: Alejandro Colomar
Subject: Re: Warn on mid-input line sentence endings
Date: Mon, 1 May 2023 00:15:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

Hi Branden,

On 4/30/23 14:34, G. Branden Robinson wrote:
>> Well, formally yes.  And a regex can't find C function definitions in
>> a source tree, at least not if you try to fool it by writing the most
>> horrible code in the universe.  But I wrote a relatively small
>> script[1] that finds a lot of C code with pcre2grep(1), and it works
>> most of the time.  It has limitations: some can be fixed by improving
>> the regexes (read: making them even more unreadable), while others are
>> likely impossible to fix with a regex.  The biggest limitation I've
>> met is K&R-style function definitions: I don't think a regex can cope
>> with them.
> I don't know if you have to cope with "the lexer hack", but you might.

No, I don't.  The script is very dumb by design.  It has no database
or index, and it doesn't involve a compiler either.  It works very fast
on any source tree without any preparatory step (e.g., I can clone a
repository and immediately run the program to search for a function).

It's literally just a wrapper around pcre2grep(1), which is just grep(1)
on steroids.  I find it more usable than existing tools like ctags(1).

You could try it (though C++ will only work insofar as it resembles C,
and you need to specify the file suffix).
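The script itself isn't reproduced here, so what follows is only a minimal Python sketch of the same idea: one regular expression applied to every file with a given suffix, with no index, no database and no compiler.  The pattern, the helper names (`definition_pattern`, `find_in_text`) and the `NAME SUFFIX [DIR]` command line are assumptions for illustration, not the actual script or its pcre2grep(1) invocation.

```python
import re
import sys
from pathlib import Path


def definition_pattern(name):
    # Heuristic only (an assumption, not the original script's regex):
    # at the start of a line, an optional return type made of words, blanks
    # and asterisks (it may span a line break, as in GNU style), then the
    # function name, a parameter list WITHOUT nested parentheses, and "{".
    # K&R definitions fail: parameter declarations sit between ")" and "{".
    return re.compile(
        rf"^(?:\w[\w\s*]*?)?\b{re.escape(name)}\s*\([^()]*\)\s*\{{",
        re.MULTILINE,
    )


def find_in_text(name, text):
    """Return the 1-based line numbers where a definition of NAME seems to start."""
    pattern = definition_pattern(name)
    return [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]


def main(argv):
    # Hypothetical usage: find_def.py NAME SUFFIX [DIR]
    if len(argv) not in (3, 4):
        print(f"usage: {argv[0]} NAME SUFFIX [DIR]", file=sys.stderr)
        return 2
    name, suffix = argv[1], argv[2]
    root = Path(argv[3]) if len(argv) == 4 else Path(".")
    found = False
    # No index or database: every matching file is read and scanned afresh.
    for path in sorted(root.rglob("*" + suffix)):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for line in find_in_text(name, text):
            print(f"{path}:{line}")
            found = True
    return 0 if found else 1


# Act as a command-line tool only when arguments are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Run right after cloning a repository, e.g. `python3 find_def.py foo .c`, it prints `path:line` for each hit.  Function-pointer parameters (nested parentheses) and K&R-style definitions are missed, which mirrors the limitations described above.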

> How much grief might have been saved if objects in C had been prefixed
> with a sigil like $, or if types had been prefixed with %?

With sane coding styles, my script works well.  Of course, if you feed
it code from an obfuscated-code contest, it will find garbage, but I'm
writing a small tool for finding things in code that people actually
write, not a compiler that has to handle the weirdest stuff one can
think of.

> In my imagination, Thompson vetoed this, but when I consider it more
> seriously, I reckon the truth is more complicated, and arises from C's
> origins in the wholly untyped B language.  The dialect of C we see in
> Version 6 Unix (q.v. the Lions book) is shockingly loosely typed to
> modern eyes.  I once ground the productivity of my workplace to a halt
> for an entire afternoon by presenting my colleagues with the attached
> exhibit of "legal C".  (It remained legal in AT&T USG Unix for many,
> many years.)
>> I believe a regex-based script can be good enough for some purposes,
>> even if it's not perfect.
> All of this is true, and I like programming languages that are dead
> simple to lexically analyze.  (But I spend next to no time working in
> them.)
> I'm strident on this point because I'm opposed to putting a diagnostic
> into the formatter that throws false positives.

Bjarni didn't propose adding such a thing to groff.  He was rather
suggesting that I call such a script from my Makefile, where I want the
diagnostics.  I think that would be reasonable (assuming I can get
something readable out of that script), especially since I already have
other scripts for similar purposes (like the one Ralph suggested for the
80-column margin, which I find very useful).
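A lint script of that kind could be as small as the following Python sketch.  It is not Bjarni's proposal or Ralph's script; the regex, the names (`check_lines`, `MID_LINE_END`, `MAX_COLUMNS`), the per-input-line 80-character limit and the exit-status convention are all assumptions made for illustration.

```python
import re
import sys

MAX_COLUMNS = 80  # limit per input line; tabs are NOT expanded here

# Heuristic for a sentence that ends in the middle of an input line:
# ".", "!" or "?", then any of a subset of the closing characters groff
# treats as transparent to sentence endings (" ' ) ] *), then blanks and
# an uppercase letter.  Abbreviations such as "e.g. Foo" are false
# positives, which is why this belongs in a lint target, not the formatter.
MID_LINE_END = re.compile(r"[.!?][\"')\]*]*[ \t]+[A-Z]")


def check_lines(lines):
    """Yield (line_number, message) pairs for suspicious roff input lines."""
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if len(line) > MAX_COLUMNS:
            yield n, f"line exceeds {MAX_COLUMNS} characters ({len(line)})"
        # Control lines (requests, macro calls, .\" comments) are skipped.
        if line.startswith((".", "'")):
            continue
        if MID_LINE_END.search(line):
            yield n, "sentence ends mid-line; start the next one on a new input line"


def main(argv):
    """Check each file named in ARGV[1:]; return 1 if anything was reported."""
    status = 0
    for name in argv[1:]:
        with open(name, encoding="utf-8", errors="replace") as f:
            for n, message in check_lines(f):
                print(f"{name}:{n}: {message}")
                status = 1
    return status


# Act as a command-line tool only when files are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Invoked from a hypothetical `lint` target (e.g. `python3 check_roff.py $(MANPAGES)`), the non-zero exit status makes make(1) stop, so the diagnostics appear only where they were asked for and never in groff itself.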


>  That would disserve
> users.
> Regards,
> Branden

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
