[lmi] Raw strings [Was: PATCH: tests build fixes for clang]


From: Greg Chicares
Subject: [lmi] Raw strings [Was: PATCH: tests build fixes for clang]
Date: Mon, 8 Mar 2021 22:11:29 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 3/8/21 6:19 PM, Vadim Zeitlin wrote:
> On Mon, 8 Mar 2021 16:20:30 +0000 Greg Chicares <gchicares@sbcglobal.net> 
> wrote:
[...]
> GC> The root cause is using raw string literals, whose syntax is just
> GC> frankly horrible.
> 
>  Yes, it's regrettable that it's so limited. But there is nothing we can do
> about it.
> 
> GC> Here, the benefit (not having to write "\n" for newline)
> 
>  I think you're drastically underestimating the benefit. Even though not
> having to write "\n"s is very much appreciable on its own, this is not just
> about the newlines, but also about the quotes and backslashes that can be
> difficult to properly quote/unquote in arbitrarily long strings.

True.

>  Practically speaking, raw strings are great because they let you just
> copy-paste between the editor and the terminal.

The concept is great. The syntax is what I object to.

I'm no fan of Java, but they did a better job, first with
"raw string literals":

http://openjdk.java.net/jeps/326

String html = `<html>
                   <body>
                       <p>Hello World.</p>
                   </body>
               </html>
              `;

(since withdrawn) and then with their replacement, "text blocks":

https://blogs.oracle.com/javamagazine/text-blocks-come-to-java

String html = """
<HTML>
  <BODY>
    <H1>"Java 13 is here!"</H1>
  </BODY>
</HTML>""";

Both are better than C++'s implementation, largely because of its
awful user-defined delimiter that's written "inside the quotes".

This example from 'md5sum_cli.cpp' is about as simple as C++ allows:

R"(Usage: lmi_md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.
...
)";

The decision to put the delimiter inside the double quotes was
a stunning mistake that harms all users all the time. It's simply
against nature. Quotes have a natural meaning everywhere else:
everything between quotes is what's quoted.
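
To make the complaint concrete, here is a small sketch (not taken from the lmi sources; the variable names and the delimiter 'x' are made up for illustration). The payload of a raw literal is what lies between the parentheses, not between the quotes, and a nonempty d-char-sequence is needed only when the payload itself contains ')"':

```cpp
#include <cassert>
#include <string>

// Empty d-char-sequence: the payload is everything between '(' and ')'.
// Embedded double quotes need no escaping.
std::string const plain = R"(say "hi")";

// The payload a)"b contains )" so an empty delimiter would end the literal
// right after 'a'. The (arbitrary, hypothetical) delimiter x prevents that.
std::string const tricky = R"x(a)"b)x";

// Compare each raw literal with its ordinary, escaped spelling.
void check_delimiters()
{
    assert(plain  == "say \"hi\"");
    assert(tricky == "a)\"b");
}
```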

We might try to work around this with preprocessor macros, but
I fear that even if we came up with something that's valid
according to the language specification, it'd be an exotic use
that would expose preprocessor defects.

Can we devise an lmi rule for handling raw literals uniformly?

The issue with the 'md5sum_cli.cpp' example above is alignment:
its first line starts right after the opening 'R"(' and so is shifted
relative to the lines below it, which would be terrible here:

std::string const simple_table_values =
R"(  0  0.12345
  1  0.23456
)";

...so a uniform rule would prefer something more like your
original code:

std::string const simple_table_values(1 + R"table(
  0  0.12345
  1  0.23456
)table");

To avoid the multiple levels of parentheses, I'd propose:

std::string const simple_table_values = 
1 + R"table(
  0  0.12345
  1  0.23456
)table";

which is of the form:

  BEGIN-MARKER
    text to
    be quoted
  END-MARKER-WITH-SEMICOLON
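
A minimal sketch of why the "1 + " prefix works, using the table from above (the checking function is hypothetical, added only for illustration). The raw literal is an array of char whose first element is the newline that follows the opening parenthesis; adding 1 decays the array to a pointer and steps past that newline:

```cpp
#include <cassert>
#include <string>

// The literal itself begins with '\n'; "1 + " skips it, so the string
// starts with the first table row.
std::string const simple_table_values =
1 + R"table(
  0  0.12345
  1  0.23456
)table";

// Compare with the same text written as an ordinary escaped literal.
void check_no_leading_newline()
{
    assert(simple_table_values == "  0  0.12345\n  1  0.23456\n");
    assert(simple_table_values.front() == ' ');
}
```

Clang may warn about adding an integer to a string literal here; the code is nevertheless well defined, because the offset stays inside the array.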

Then we make the markers stand out, so they can't possibly be
mistaken for anything else:

std::string const simple_table_values = 
1 + R"--8<--8<--8<--(
  0  0.12345
  1  0.23456
)--8<--8<--8<--";

Let's count carefully, in hex:

   123456789ABCDE
  )--8<--8<--8<--"

Our uniform d-char-sequence is fourteen characters. We're allowed
sixteen. A preprocessor author might count the ')' or the '"' in
that sixteen, even though that's not what the standard says, but
using fourteen sidesteps that plausible defect.
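
The count can also be left to the compiler. This is only a sketch: the constant names are invented here, and the delimiter is spelled as an ordinary string so that sizeof can measure it.

```cpp
#include <cstddef>

// The proposed d-char-sequence, written as an ordinary literal.
constexpr char delimiter[] = "--8<--8<--8<--";

// sizeof counts the terminating '\0', so subtract one.
constexpr std::size_t delimiter_length = sizeof delimiter - 1;

// The standard permits at most sixteen characters in a d-char-sequence.
constexpr std::size_t standard_limit = 16;

static_assert(14 == delimiter_length, "delimiter is fourteen characters");

// Even a preprocessor that wrongly counted both ')' and '"' would stay
// within the limit.
static_assert(delimiter_length + 2 <= standard_limit, "margin of two");
```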

And we always write either "1 + ", or some macro if you like:

std::string const simple_table_values = 
NO_LEADING_NEWLINE R"--8<--8<--8<--(
  0  0.12345
  1  0.23456
)--8<--8<--8<--";

but I think we'll agree that "1 + " is better. Corollary:
inhibit that clang warning, because we know we aren't going
to make the mistake that it guards against:
  std::cout << 37 + " is the error number" << std::endl;
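
For comparison, a sketch of what that faulty line presumably intended, next to the deliberate idiom. Assumptions: the warning meant here is clang's -Wstring-plus-int (this message does not name it), and the names error_message and skip_first are invented for illustration. The pragma shows a local way to silence the warning; passing -Wno-string-plus-int in the makefile would be the global equivalent.

```cpp
#include <cassert>
#include <string>

// What the mistaken line wanted: the number as text, then the message.
// std::to_string makes the conversion explicit.
std::string error_message(int error_number)
{
    return std::to_string(error_number) + " is the error number";
}

// The deliberate idiom, with the diagnostic silenced for clang only.
#if defined __clang__
#   pragma clang diagnostic push
#   pragma clang diagnostic ignored "-Wstring-plus-int"
#endif
char const* const skip_first = 1 + R"(
text)";
#if defined __clang__
#   pragma clang diagnostic pop
#endif

void check_both_forms()
{
    assert(error_message(37) == "37 is the error number");
    assert(std::string(skip_first) == "text");
}
```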

And I guess we have no choice but to align everything to the
left margin, which makes grepping harder, but that's the
price we have to pay, because we don't have Java's special
rules for stripping uniform initial whitespace.
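
A short sketch of what goes wrong otherwise (the function names are hypothetical). Whatever indentation precedes a payload line in the source becomes part of the string, so indenting the payload to match the surrounding code changes the data:

```cpp
#include <cassert>
#include <string>

// Payload flush with the left margin: the string is exactly the table.
std::string flush_left()
{
    return 1 + R"--cut-here--(
  0  0.12345
  1  0.23456
)--cut-here--";
}

// Payload indented four extra columns to match the function body:
// those four spaces end up at the start of every line of the string.
std::string indented()
{
    return 1 + R"--cut-here--(
      0  0.12345
      1  0.23456
)--cut-here--";
}

void check_indentation_is_kept()
{
    assert(flush_left() == "  0  0.12345\n  1  0.23456\n");
    assert(indented()   == "      0  0.12345\n      1  0.23456\n");
}
```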

Thus, my proposal is:

std::string const simple_table_values = 
1 + R"--8<--8<--8<--(
  0  0.12345
  1  0.23456
)--8<--8<--8<--";

or, even better IMO:

std::string const simple_table_values = 
1 + R"--cut-here--(
  0  0.12345
  1  0.23456
)--cut-here--";

Exception: one-line definitions can use an empty d-char-sequence.

> [...] would strongly prefer to live without this clang warning (i.e.
> disable it globally at makefile level) rather than not to use raw string
> literals.

Agreed. It's a garbage warning.

> They absolutely have their problems, but they're still much
> better than normal strings for multiline chunks of text. IMO f5d41a09e
> (Replace raw string literals, 2021-03-08) makes the source much less
> readable and maintainable and I wish it could be reverted.

Please let me know what you think of this:

acc47e4f2 (HEAD -> odd/eraseme_raw, origin/odd/eraseme_raw)
Establish the One True nonempty d-char-sequence

>  Please reconsider proscribing raw strings!

Now I feel like Mr. Rosewater in that book.

