Re: [Denemo-devel] New Binaries?


From: Richard Shann
Subject: Re: [Denemo-devel] New Binaries?
Date: Wed, 17 Jun 2015 08:12:54 +0100

On Tue, 2015-06-16 at 21:09 -0500, Jeremiah Benham wrote:
> I placed some code to debug what is going on in edit_label_for_button:
> 
>         gboolean isvalid = g_utf8_validate (label,
>                  sizeof(label),
>                  NULL);
>         printf("\nIs label valid utf8 == %i\n", isvalid);
>         printf("\nnewlabel= %s\n",label);
> 
>         isvalid = g_utf8_validate (newlabel,
>                  sizeof(newlabel),
>                  NULL);
>         printf("\nIs newlabel valid utf8 == %i\n", isvalid);
>         printf("\nnewlabel= %s\n",newlabel);
> 
> 
> I copied in a treble clef symbol and label was not utf-8 and newlabel
> was in fact utf8. 

So, what you have found is that newlabel on the Mac is UTF-8, as it is
on Debian, which means that the GtkEntry widget is returning UTF-8
encoded strings. So newlabel will point to the bytes 0xF0 0x9D 0x84 0x9E
0x00 when you have pasted a 𝄞 into the GtkEntry that is popped up when
you do "Edit Label".

You don't say in the snippet of debug code above where exactly the

g_utf8_validate (label, sizeof(label), NULL);

was placed - was the value in "label" one attached to the button widget
at line 241

label = g_object_get_data (G_OBJECT(button), "icon");

or was it one retrieved from the button's GtkLabel at line 243

 label = gtk_button_get_label (GTK_BUTTON(button));

? If it is attached to the button widget as data, it could be that the
libxml2 library is bringing in the strings as UTF-16. (I noticed that
the XML file itself stores them as Unicode character references, but on
reading palettes.xml the library will convert them to some encoding, I
would guess dependent on the machine; perhaps we have a version of
libxml2 that is producing UTF-16.)

It might be useful as well to do other validations - when you say "label
was not utf8" can you put in a check to see if it is UTF-16 or the
Unicode code point?

It also occurred to me that the rasterized box with D834 in it may not
be intended to convey that two bytes (0xD8 0x34) are present at this
point in the string; it may be that it just displays the first two bytes
of anything it can't find a glyph for. However, what you have written
above points the other way, namely that Gtk is consistently working with
UTF-8 and that it is somewhere in the backend that the conversion to
UTF-16 is being done, resulting in the appearance of those bytes.

Richard





> Jeremiah
> 
> 
> 
> On Tue, Jun 16, 2015 at 1:55 PM, Richard Shann
> <address@hidden> wrote:
>         On Tue, 2015-06-16 at 12:14 -0500, Jeremiah Benham wrote:
>         >
>         >
>         > On Tue, Jun 16, 2015 at 10:22 AM, Richard Shann
>         > <address@hidden> wrote:
>         >
>         >
>         >         >         Another example is the first button in the
>         general
>         >         palette,
>         >         >         the one at
>         >         >         the top of the display. That is a treble
>         clef sign
>         >         in a large
>         >         >         size, its
>         >         >         label is this
>         >         >
>         >         >         <span font='16'> 𝄞   </span>
>         >         >
>         >         >         That one displays as D834 in your
>         screenshot. I can
>         >         only guess
>         >         >         that on
>         >         >         the Mac these embedded characters are
>         being expected
>         >         in a
>         >         >         different
>         >         >         format (UTF-16 instead of UTF-8 ?).
>         >         >         Looking in the source of this label, the
>         file
>         >         >         actions/palettes.xml I see
>         >         >
>         >         >         label="&lt;span font='16'&gt; &#x1D11E;
>         >          &lt;/span&gt;"
>         >         >
>         >         >         which means that 0x1D11E is the character
>         code being
>         >         inserted,
>         >         >         this is
>         >         >         what is called the unicode codepoint (I
>         think what
>         >         would be
>         >         >         written U
>         >         >         +1D11E). I don't know what else might work
>         in that
>         >         position.
>         >         >         Looking up
>         >         >         this unicode value I see that its UTF-16
>         >         representation is
>         >         >
>         >         >
>         >         >         D8 34 DD 1E
>         >         >
>         >         >         which hints to me that the (gtk routines
>         for) the
>         >         mac is just
>         >         >         seeing the
>         >         >         D834 bit - which would explain why your
>         screenshots
>         >         seem to
>         >         >         show this
>         >         >         same code on several buttons - they are
>         all in the
>         >         musical
>         >         >         instruments
>         >         >         block, which is perhaps what the D834
>         refers to (the
>         >         bass
>         >         >         clef, for
>         >         >         example, is D8 34 DD 22 in UTF-16).
>         >         >
>         >
>         >         > Can we convert the UTF-16 to UTF-8? Something
>         like:
>         >         >
>         >
>          
> https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#g-utf16-to-utf8
>         >         >
>         >         > Are these characters expected to be UTF-16 in
>         windows?
>         >         >
>         >
>         >         I used gdb to stop Denemo just as it is making the
>         call to
>         >         write the
>         >         label on a palette button.
>         >         this is in palettes.c at line 257
>         >
>         >         257             gtk_label_set_markup (GTK_LABEL
>         >         (label_widget), newlabel);
>         >
>         >         I then enquired what bytes were contained in the
>         string
>         >         newlabel that is
>         >         being passed to that function. On my Debian windows
>         system,
>         >         the bytes
>         >         are these:
>         >
>         >         0xF0 0x9D 0x84 0x9E
>         >
>         >         Looking this up, I see that this is the UTF-8
>         encoding for the
>         >         treble
>         >         clef sign (𝄞) which has the unicode value U+1D11E
>         >
>         >         So, the text entry widget is returning a UTF-8
>         string
>         >         representation for
>         >         the text you enter into it on Debian. Specifically
>         if you
>         >         paste 𝄞 the
>         >         text entry widget returns a pointer to the bytes
>         0xF0 0x9D
>         >         0x84 0x9E
>         >         0x00.
>         >
>         >         We don't know what bytes that widget is returning on
>         the Mac
>         >         but one guess is that it is returning 0xD8 0x34 0xDD
>         0x1E 0x00
>         >         that is
>         >         it is returning the UTF-16 encoding.
>         >
>         >         I tried setting newlabel to have this value 0xD8
>         0x34 0xDD
>         >         0x1E 0x00
>         >         from inside gdb and this caused a warning
>         >
>         >          Gtk - WARNING : Failed to set text from markup due
>         to error
>         >         parsing
>         >         markup: Error on line 1 char 13: Invalid UTF-8
>         encoded text in
>         >         name -
>         >         not valid '�4�"
>         >
>         >         Because of this, it fails to update the label. So
>         (in Debian)
>         >         the call
>         >         to gtk_label_set_markup() is expecting a UTF-8
>         encoded string
>         >         and fails
>         >         when given the string 0xD8 0x34 0xDD 0x1E 0x00
>         (label is not
>         >         written
>         >         to).
>         >
>         >         So, if you are able to test on the Mac, do
>         >
>         >         Right click on a palette button
>         >         Edit Label
>         >         delete all the text and paste in a single 𝄞
>         character
>         >         press enter and see if the label updates to a box
>         with D834 in
>         >         it, or if
>         >         it fails to update.
>         >
>         >
>         > I have done this and it fails.  It has the letters D834 like
>         the
>         > others. I thought we tested this already.
>         
>         
>         Sorry, what you have written is ambiguous: did it fail to
>         update the
>         label, or did it update it to become D834 in a box?
>         (That is, to test, start with a label that works, just ascii,
>         and then
>         try to edit it to be the single 𝄞 character).
>         
>         If it fails to update (stays as the ascii you had before) then
>         we can't be sure what the text entry widget is returning.
>         
>         If it updates to D834 in a box, then we can guess that it is
>         the
>         text_entry widget that is returning a UTF_16 string which the
>         gtk_label_set_markup() function is failing to display.
>         
>         Perhaps, nailing down what the mis-match is won't be as
>         important as
>         getting the right set of libraries that work together on the
>         Mac. We
>         don't know if the Mac code is supposed to be using UTF_16 or
>         8, whatever
>         it is, should be consistent between the GtkLabel and GtkEntry
>         widgets.
>         
>         Richard
>         
>         
>         
>         
>         
>         
>         >
>         >
>         > Jeremiah
>         >
>         >
>         >         Richard
>         >
>         >         >
>         >         > Jeremiah
>         >         >
>         >         >
>         >         >         I'm not sure what the way through all this
>         is,
>         >         perhaps asking
>         >         >         someone in
>         >         >         the gtk mac world about the representation
>         of
>         >         characters - or,
>         >         >         if gtk2
>         >         >         works, then something in the upgrade
>         documentation
>         >         for gtk3
>         >         >         might help.
>         >         >
>         >         >         Richard
>         >         >
>         >         >         >
>         >         >
>         >         >
>         >         >
>         >
>         >
>         >
>         >
>         >
>         
>         
>         
> 
> 




