Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs
Date: Fri, 17 Aug 2018 11:36:14 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 08/17/2018 10:05 AM, Markus Armbruster wrote:
The JSON parser treats each half of a surrogate pair as an unpaired
surrogate.  Fix it to recognize surrogate pairs.

Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Eric Blake <address@hidden>

I might have dropped the R-b, to ensure the changes since v1 get re-reviewed.
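
For readers following along, the arithmetic behind "recognize surrogate pairs" can be sketched on its own. This is only an illustration: combine_surrogates() is a hypothetical helper name, not something in the patch, and it takes the two 16-bit values already converted from hex rather than walking the input string as parse_string() does.

```c
/*
 * Hypothetical helper, not part of the patch: combine a leading and a
 * trailing surrogate into one code point, or return -1 if the two
 * values do not form a valid pair.
 */
static int combine_surrogates(int lead, int trail)
{
    if (lead < 0xD800 || lead > 0xDBFF) {
        return -1;              /* not a leading surrogate */
    }
    if (trail < 0xDC00 || trail > 0xDFFF) {
        return -1;              /* not a trailing surrogate */
    }
    /* 10 payload bits from each half, offset by 0x10000 */
    return 0x10000 + ((lead & 0x3FF) << 10) + (trail & 0x3FF);
}
```

With this sketch, the pair \uD83D\uDE00 combines to U+1F600, while a leading surrogate followed by an ordinary escape such as \u0041 is rejected with -1.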

---
  qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
  tests/check-qjson.c   |  3 +--
  2 files changed, 40 insertions(+), 23 deletions(-)


@@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                  qstring_append_chr(str, '\t');
                  break;
              case 'u':
-                cp = 0;
-                for (i = 0; i < 4; i++) {
-                    if (!qemu_isxdigit(*ptr)) {
-                        parse_error(ctxt, token,
-                                    "invalid hex escape sequence in string");
-                        goto out;
+                cp = cvt4hex(ptr);
+                ptr += 4;
+
+                /* handle surrogate pairs */
+                if (cp >= 0xD800 && cp <= 0xDBFF
+                    && ptr[0] == '\\' && ptr[1] == 'u') {
+                    /* leading surrogate followed by \u */
+                    cp = 0x10000 + ((cp & 0x3FF) << 10);
+                    trailing = cvt4hex(ptr + 2);
+                    if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
+                        /* followed by trailing surrogate */
+                        cp |= trailing & 0x3FF;
+                        ptr += 6;
+                    } else {
+                        cp = -1; /* invalid */
                      }
-                    cp <<= 4;
-                    cp |= hex2decimal(*ptr);
-                    ptr++;
                  }
                  if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
                      parse_error(ctxt, token,
-                                "\\u%.4s is not a valid Unicode character",
-                                ptr - 3);
+                                "%.*s is not a valid Unicode character",
+                                (int)(ptr - beg), beg);

The error reporting here has indeed been improved over v1.
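
As a rough illustration of why the "%.*s" form reads better: the precision argument bounds how many bytes are printed, so the message can quote exactly the span of input consumed so far. The sketch below assumes that beg points at the backslash starting the escape (the hunk does not show where beg is set), and format_bad_escape() is a hypothetical stand-in for parse_error(), not QEMU code.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for parse_error(): format the message into buf.
 * Assumption: beg points at the backslash that starts the escape and
 * end points just past the last character consumed.
 */
static int format_bad_escape(char *buf, size_t size,
                             const char *beg, const char *end)
{
    /* "%.*s" prints at most (end - beg) bytes starting at beg */
    return snprintf(buf, size, "%.*s is not a valid Unicode character",
                    (int)(end - beg), beg);
}
```

Under that assumption, a rejected lone escape is quoted as its six characters, and a rejected pair that was fully consumed is quoted as all twelve, without needing a terminating NUL after the escape.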

Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org