Re: Warn on mid-input line sentence endings
From: Alejandro Colomar
Subject: Re: Warn on mid-input line sentence endings
Date: Mon, 1 May 2023 00:15:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

Hi Branden,
On 4/30/23 14:34, G. Branden Robinson wrote:
>> Well, formally yes. And a regex can't find C function definitions in
>> a source tree; at least if you try to fool it by writing the most
>> horrible code in the universe. But I wrote a relatively small
>> script[1] that finds a lot of C code with pcre2grep(1), and works most
>> of the time. It has limitations; some of which can be fixed by
>> improving the regexes (read: making them even more unreadable); some
>> others are likely impossible to fix with a regex. The biggest
>> limitation I think I've met is K&R-style functions: I don't think a
>> regex can cope with them.
>
> I don't know if you have to cope with "the lexer hack", but you might.
>
> https://en.wikipedia.org/wiki/Lexer_hack
No, I didn't. The script is by design very dumb. It doesn't have a
database or index, and it doesn't involve any compiler. It works very
fast on any source tree, with no preparatory step (e.g., I can clone a
repository and immediately run the program to search for a function).
It's literally just a wrapper around pcre2grep(1), which is just grep(1)
on steroids. I find it more usable than existing tools like ctags(1).
You could try it (but C++ will only work as long as it resembles C; and
you need to specify the file suffix).
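To give an idea of the approach, here is a minimal sketch in Python. It is
not the script discussed above, which wraps pcre2grep(1); the function
name find_definitions, the sample input, and the regex itself are
assumptions made for illustration. It only shows how far a single naive
pattern can go when the code follows a conventional style.

```python
import re

# Illustrative sample: a prototype, a definition with the return type on
# its own line, and a one-line definition that contains a call.
SAMPLE = "\n".join([
    "int foo(int x);",
    "",
    "static int",
    "foo(int x)",
    "{",
    "\treturn bar(x);",
    "}",
    "",
    "int main(void) { return foo(1); }",
    "",
])


def find_definitions(source: str, name: str) -> list[int]:
    """Return the 1-based line numbers where `name` appears to be defined.

    Deliberately naive: the definition must start in column one with the
    return type, the parameter list must not contain nested parentheses,
    and the opening brace must follow the closing parenthesis.
    """
    pattern = re.compile(
        r"^[A-Za-z_][\w \t*]*[ \t*\n]"   # return type, qualifiers, '*'
        + re.escape(name)                 # the function name itself
        + r"[ \t]*\([^(){};]*\)\s*\{",    # parameter list, then '{'
        re.MULTILINE,
    )
    # Convert each match offset into a line number.
    return [source.count("\n", 0, m.start()) + 1
            for m in pattern.finditer(source)]


if __name__ == "__main__":
    for fn in ("foo", "main", "bar"):
        print(fn, find_definitions(SAMPLE, fn))
```

The prototype and the calls are skipped because no opening brace follows
their closing parenthesis. K&R-style definitions, function-pointer
parameters, and macros that expand to definitions will not match this
pattern.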
>
> How much grief might have been saved if objects in C had been prefixed
> with a sigil like $, or if types had been prefixed with %?
With sane coding styles, my script works well. Of course, if you
take code from an obfuscated-code contest, it will find garbage, but
I'm writing a small tool for finding code in real-world source trees,
not a compiler that needs to be able to actually compile the weirdest
stuff that one can think of.
>
> In my imagination, Thompson vetoed this, but when I consider it more
> seriously, I reckon the truth is more complicated, and arises from C's
> origins in the wholly untyped B language. The dialect of C we see in
> Version 6 Unix (q.v. the Lions book) is shockingly loosely typed to
> modern eyes. I once ground the productivity of my workplace to a halt
> for an entire afternoon by presenting my colleagues with the attached
> exhibit of "legal C". (It remained legal in AT&T USG Unix for many,
> many years.)
>
>> I believe a regex-based script can be good enough for some purposes,
>> even if it's not perfect.
>
> All of this is true, and I like programming languages that are dead
> simple to lexically analyze. (But I spend next to no time working in
> them.)
>
> I'm strident on this point because I'm opposed to putting a diagnostic
> into the formatter that throws false positives.
Bjarni didn't propose adding such a thing to groff. Rather, he was
suggesting that I call such a script from my Makefile wherever I want
the diagnostics. I think that would be fair (assuming I can get a
readable thing out of that script), especially since I already have
other scripts for similar purposes (like the one suggested by Ralph,
for the 80-column margin, which I find very useful).
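As a rough sketch of what such an external check might look like (this
is not Bjarni's script; the names MID_LINE_END, mid_line_sentence_endings,
and main are hypothetical, and the heuristic is an assumption): flag any
text line where sentence-ending punctuation is followed by whitespace and
an uppercase letter on the same line, and skip roff control lines.

```python
import re
import sys

# Heuristic (an assumption, not a rule taken from groff): '.', '!' or '?',
# optionally followed by closing quotes or brackets, then whitespace and
# an uppercase letter, all on one input line.
MID_LINE_END = re.compile(r"[.!?][\"')\]]*[ \t]+[A-Z]")


def mid_line_sentence_endings(lines):
    """Return (line_number, text) pairs for lines that look suspicious."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip roff control lines (requests and macro calls).
        if line.startswith((".", "'")):
            continue
        if MID_LINE_END.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def main(paths):
    """Print diagnostics to stderr; return 1 if anything was flagged."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for number, text in mid_line_sentence_endings(f):
                print(f"{path}:{number}: sentence ends mid-line: {text}",
                      file=sys.stderr)
                status = 1
    return status
```

A Makefile rule could run this over the page sources by calling
main(sys.argv[1:]) and using the result as the exit status. The
heuristic does throw false positives: an abbreviation followed by a
capitalized word, such as "e.g. Thompson", is flagged too, which is one
reason to keep a check like this outside the formatter.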
Cheers,
Alex
> That would disserve
> users.
>
> Regards,
> Branden
--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5