bug-apl

Re: SVN 1704 completely broke libedif


From: Dr. Jürgen Sauermann
Subject: Re: SVN 1704 completely broke libedif
Date: Wed, 7 Jun 2023 16:24:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

Hi Chris,

wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is
the proper way to go. Consider the differences between:

1.  UCS_string yyy(UTF8_string(xxx));     // almost proper, but ambiguous (most vexing parse)
2.  UCS_string yyy(xxx);                  // now private, so never use it
3a. UTF8_string utf(xxx);                 // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx)));   // also proper (this is 1. without the most vexing parse)
5.  UCS_ASCII_string yyy(xxx);
 
If xxx is entirely ASCII then all of the above are equivalent.

Otherwise the difference is that 1. properly decodes UTF8-encoded
strings while the old 2. (which is now disabled by private:) did not,
and the compiler has no way to detect an incorrect usage of 2.
Even worse, C++ would sometimes do 2. automatically (and incorrectly)
and without notice. Probably some of the recent Tokenization Errors
reported on bug-apl were caused by this.
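
The automatic conversion can be sketched with stand-in types. LooseWide and StrictWide are hypothetical names, not GNU APL classes: the first has a public, non-explicit const char * constructor, and the second marks it explicit (making it private, as in the thread, blocks outside callers in a similar way).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in: public, non-explicit constructor from const char *.
struct LooseWide
{
    std::u32string chars;
    LooseWide(const char * bytes)          // implicit conversion is allowed
    {
        for (const char * p = bytes; *p; ++p)
            chars.push_back(static_cast<unsigned char>(*p));  // no decoding
    }
};

// Hypothetical stand-in: the same constructor, but explicit.
struct StrictWide
{
    std::u32string chars;
    explicit StrictWide(const char * bytes)
    {
        for (const char * p = bytes; *p; ++p)
            chars.push_back(static_cast<unsigned char>(*p));
    }
};

// The compiler may build a LooseWide silently at any call site...
static_assert(std::is_convertible<const char *, LooseWide>::value,
              "LooseWide converts implicitly");
// ...but must be told explicitly to build a StrictWide.
static_assert(!std::is_convertible<const char *, StrictWide>::value,
              "StrictWide does not convert implicitly");

// A function that takes the wide type by const reference.
std::size_t length_of(const LooseWide & w) { return w.chars.size(); }
```

A call such as length_of("\xE2\x8D\xB3") compiles without any visible constructor call and reports 3 characters for what is really one, which is the kind of silent mistake described above.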

Although 1. was throwing an assertion when used incorrectly, some
people wrapped a try {} catch {} around it which caused the error
to slip through unnoticed (at least up to the tokenizer).
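
A sketch of how such a wrapper hides the problem. The thread does not say how the assertion surfaces in GNU APL, so this assumes it arrives as a C++ exception; ascii_to_wide(), convert_quietly() and is_rejected() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical strict conversion: throws if any byte is outside ASCII.
std::u32string ascii_to_wide(const std::string & bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
    {
        if (b > 0x7F) throw std::invalid_argument("non-ASCII byte");
        out.push_back(b);
    }
    return out;
}

// Anti-pattern: the catch-all swallows the failure, so the caller only
// sees an (empty) result and the real cause is lost.
std::u32string convert_quietly(const std::string & bytes)
{
    try { return ascii_to_wide(bytes); }
    catch (...) { return std::u32string(); }
}

// Helper: returns true if ascii_to_wide() rejects the input.
bool is_rejected(const std::string & bytes)
{
    try { ascii_to_wide(bytes); return false; }
    catch (const std::invalid_argument &) { return true; }
}
```

With the catch-all in place, bad input no longer stops the program at the conversion. It shows up later as a wrong or empty string, far from the place where the mistake was made.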

A somewhat unfortunate decision in the C++11 ff. standards was to
resolve yyy in 1. (which is ambiguous at a closer look) into a declaration
of a function yyy() and not (as gcc still does) into two constructor calls:
UTF8_string(xxx), followed by UCS_string() applied to the result. This problem
can apparently be avoided by using 4. instead of 1. (note the extra pair
of () which is NOT redundant).
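
The parse can be checked with stand-in types. Narrow and Wide below are hypothetical and only mirror the shape of UTF8_string and UCS_string. The sketch relies on the language's disambiguation rule that a statement which can be read as a declaration is treated as one, so on a conforming compiler form 1 is expected to declare a function (many compilers also warn about it).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the two string classes.
struct Narrow
{
    std::string bytes;
    explicit Narrow(const std::string & s) : bytes(s) {}
};

struct Wide
{
    std::size_t length;
    explicit Wide(const Narrow & n) : length(n.bytes.size()) {}
};

// Form 1: the line below is read as "Wide yyy(Narrow xxx);", that is,
// a declaration of a function yyy, not the construction of an object.
bool form1_declares_function()
{
    Wide yyy(Narrow(xxx));
    return std::is_function<decltype(yyy)>::value;
}

// Form 4: the extra parentheses force an expression, so yyy is an object.
std::size_t form4_length(const std::string & xxx)
{
    Wide yyy((Narrow(xxx)));
    return yyy.length;
}

// Forms 3a and 3b: two separate statements, no ambiguity at all.
std::size_t form3_length(const std::string & xxx)
{
    Narrow utf(xxx);
    Wide yyy(utf);
    return yyy.length;
}
```

Note that form 1 does not even need a variable named xxx in scope, because xxx is taken as the parameter name of the declared function.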

Finally, 5. is a safe replacement for 2. The comment in the .hh file
is still valid (so xxx MUST be ASCII), and using 5. should hopefully avoid
the automatic use of 2. by the compiler. It is also easier to find with grep
in order to spot the (still possible) incorrect usage of 5.

Hope this helps,
Jürgen


On 6/6/23 22:13, Chris Moller wrote:
Yeah, I saw your comment in one of the .hh files.  What I did was wrap all the edif ASCII strings in UTF8_string() calls.  That works, but if it's circumventing what you're trying to do, let me know and I'll think of something else. 

Even after a lot of years, I'm still not sure of the differences between UTF, UCS, Unicode, etc, etc.

--cm

On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,

sorry for that. The reason for making it private is to entirely prevent its usage.
The former implementation of it only worked for ASCII strings. There was
a note about that in the header file, but I have seen quite a few incorrect
usages of it (read: with UTF8-encoded strings) which then caused other, difficult
to find, errors later on.

Best Regards,
Jürgen


On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,

Just pushed a fix for edif if you want to give it a try.  Works for me on SVN 1706 and yesterday's SVN 1708.

--cm

On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif

Juergen made UCS_string (const char *)  a private member of the class
so a lot of compile errors in edif.cc ...

Not sure if this can be fixed. I reverted to SVN 1702 meanwhile. There is no way I'll revert to the "DEL Editor" !


Xtian.





