bug#24784: 26.0.50; JSON strings with utf-16 escape codes
From: Dmitry Gutov
Subject: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Tue, 25 Oct 2016 02:19:18 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Thunderbird/50.0

Philipp,
Thanks. Some comments:
On 24.10.2016 22:57, Philipp Stephani wrote:
+(defsubst json--decode-utf-16-surrogates (high low)
IIRC, there might be no actual benefit from making it a defsubst. If
someone could benchmark it, I'd like to see the result.
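
One way to run such a benchmark is sketched below. All `my-` names are hypothetical and not part of the patch, and the function body is the standard UTF-16 surrogate formula, assumed here rather than copied from the patch. The inputs come from special variables so that the byte compiler cannot fold the call into a constant, which would make the inlined version look artificially fast.

```elisp
(require 'benchmark)

;; Hypothetical inputs: the surrogate pair for U+1D11E.  They are
;; special (dynamic) variables, so the compiler cannot constant-fold them.
(defvar my-bench-high #xD834)
(defvar my-bench-low #xDD1E)

;; Two hypothetical variants with identical bodies; only the defining
;; form differs.
(defsubst my-decode-surrogates-subst (high low)
  (+ #x10000 (ash (- high #xD800) 10) (- low #xDC00)))

(defun my-decode-surrogates-fun (high low)
  (+ #x10000 (ash (- high #xD800) 10) (- low #xDC00)))

(defun my-bench-subst ()
  "Time one million calls of the inlined variant."
  (benchmark-run-compiled 1000000
    (my-decode-surrogates-subst my-bench-high my-bench-low)))

(defun my-bench-fun ()
  "Time one million calls of the ordinary function variant."
  (benchmark-run-compiled 1000000
    (my-decode-surrogates-fun my-bench-high my-bench-low)))
```

Each function returns a list of elapsed seconds, number of garbage collections, and time spent in garbage collection. Comparing `(my-bench-subst)` with `(my-bench-fun)` over a few runs would show whether inlining makes a measurable difference.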
+          ;; Special-case UTF-16 surrogate pairs,
+          ;; cf. https://tools.ietf.org/html/rfc7159#section-7
+          ((looking-at
+            (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
+                "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
+           (json-advance 10)
+           (json--decode-utf-16-surrogates
+            (string-to-number (match-string 1) 16)
+            (string-to-number (match-string 2) 16)))
Shouldn't this go below the UTF-8 case, as the less-frequent one?
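
For reference, the quoted hunk can be illustrated with a small standalone sketch. The `my-` names are hypothetical, and the arithmetic is the standard UTF-16 surrogate-pair formula, which is assumed to be what `json--decode-utf-16-surrogates` computes since its body is not quoted here. The regexp is the one from the hunk, anchored at the start of a string instead of at point; it assumes the leading `\u` of the first escape has already been consumed.

```elisp
(defun my-decode-utf-16-surrogates (high low)
  "Return the code point encoded by the surrogate pair HIGH and LOW.
HIGH must be in the range D800..DBFF and LOW in DC00..DFFF."
  (+ #x10000
     (ash (- high #xD800) 10)
     (- low #xDC00)))

(defun my-decode-surrogate-escape (text)
  "Decode TEXT of the form \"XXXX\\uYYYY\" if it is a surrogate pair.
Return the code point, or nil if TEXT does not start with such a pair."
  (when (string-match
         (rx bos
             (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
             "\\u"
             (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f"))))
         text)
    (my-decode-utf-16-surrogates
     (string-to-number (match-string 1 text) 16)
     (string-to-number (match-string 2 text) 16))))

;; The JSON escape \uD834\uDD1E encodes U+1D11E:
;; (0x34 << 10) + 0x11E + 0x10000 = 0x1D11E.
(my-decode-surrogate-escape "D834\\uDD1E") ; => #x1D11E
;; A lone BMP escape is not a surrogate pair and is left to the other case.
(my-decode-surrogate-escape "0444")        ; => nil
```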
 (ert-deftest test-json-encode-string ()
   (should (equal (json-encode-string "foo") "\"foo\""))
   (should (equal (json-encode-string "a\n\fb") "\"a\\n\\fb\""))
-  (should (equal (json-encode-string "\nasdфыв\u001f\u007ffgh\t")
-                 "\"\\nasdфыв\\u001f\u007ffgh\\t\"")))
+  (should (equal (json-encode-string "\nasdфыв�\u001f\u007ffgh\t")
+                 "\"\\nasdфыв�\\u001f\u007ffgh\\t\"")))
Why are we testing string encoding here?