From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v2 28/60] json: Fix \uXXXX for surrogate pairs
Date: Fri, 17 Aug 2018 11:36:14 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 08/17/2018 10:05 AM, Markus Armbruster wrote:
> The JSON parser treats each half of a surrogate pair as unpaired
> surrogate.  Fix it to recognize surrogate pairs.
>
> Signed-off-by: Markus Armbruster <address@hidden>
> Reviewed-by: Eric Blake <address@hidden>

I might have dropped the R-b, to ensure the changes since v1 get re-reviewed.

> ---
>  qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
>  tests/check-qjson.c   |  3 +--
>  2 files changed, 40 insertions(+), 23 deletions(-)

> @@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>                  qstring_append_chr(str, '\t');
>                  break;
>              case 'u':
> -                cp = 0;
> -                for (i = 0; i < 4; i++) {
> -                    if (!qemu_isxdigit(*ptr)) {
> -                        parse_error(ctxt, token,
> -                                    "invalid hex escape sequence in string");
> -                        goto out;
> +                cp = cvt4hex(ptr);
> +                ptr += 4;
> +
> +                /* handle surrogate pairs */
> +                if (cp >= 0xD800 && cp <= 0xDBFF
> +                    && ptr[0] == '\\' && ptr[1] == 'u') {
> +                    /* leading surrogate followed by \u */
> +                    cp = 0x10000 + ((cp & 0x3FF) << 10);
> +                    trailing = cvt4hex(ptr + 2);
> +                    if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
> +                        /* followed by trailing surrogate */
> +                        cp |= trailing & 0x3FF;
> +                        ptr += 6;
> +                    } else {
> +                        cp = -1; /* invalid */
>                      }
> -                    cp <<= 4;
> -                    cp |= hex2decimal(*ptr);
> -                    ptr++;
>                  }
>
>                  if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>                      parse_error(ctxt, token,
> -                                "\\u%.4s is not a valid Unicode character",
> -                                ptr - 3);
> +                                "%.*s is not a valid Unicode character",
> +                                (int)(ptr - beg), beg);

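For readers following along without the tree at hand, here is a minimal standalone sketch of the surrogate-pair arithmetic in the hunk above. It is not the QEMU code: hex4() and decode_u_escape() are hypothetical names chosen for illustration, the input is assumed to be a NUL-terminated string pointing at the backslash, and lone surrogates are returned as-is (in the patch they are rejected later by mod_utf8_encode()).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse exactly four hex digits, or return -1.
 * Stops at the first non-hex character, so it never reads past a NUL. */
static int hex4(const char *s)
{
    int cp = 0;

    assert(s != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isxdigit(c)) {
            return -1;
        }
        cp = (cp << 4) | (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    return cp;
}

/*
 * Hypothetical helper: decode one "\uXXXX" escape at *pp, combining a
 * leading surrogate with a directly following "\uXXXX" trailing
 * surrogate.  Returns the code point, or -1 if the escape is malformed
 * or the leading surrogate is followed by a \u escape that is not a
 * trailing surrogate.  On malformed hex digits *pp is left untouched;
 * otherwise it is advanced past the consumed text.
 */
static int decode_u_escape(const char **pp)
{
    const char *p = *pp;
    int cp, trailing;

    if (p[0] != '\\' || p[1] != 'u') {
        return -1;
    }
    cp = hex4(p + 2);
    if (cp < 0) {
        return -1;
    }
    p += 6;

    if (cp >= 0xD800 && cp <= 0xDBFF && p[0] == '\\' && p[1] == 'u') {
        /* leading surrogate followed by \u */
        trailing = hex4(p + 2);
        if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
            /* top 10 bits from the leading half, low 10 from the trailing */
            cp = 0x10000 + ((cp & 0x3FF) << 10) + (trailing & 0x3FF);
            p += 6;
        } else {
            cp = -1; /* invalid pair */
        }
    }
    *pp = p;
    return cp;
}
```

Fed the twelve characters \uD83D\uDE00, decode_u_escape() returns 0x1F600 and advances the pointer by 12. Fed \uD83D\u0041, it returns -1, which corresponds to the "cp = -1; /* invalid */" branch in the patch.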
The error reporting here has indeed been improved over v1.

Reviewed-by: Eric Blake <address@hidden>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org