bug-gnu-emacs

bug#31138: Native json slower than json.el


From: Eli Zaretskii
Subject: bug#31138: Native json slower than json.el
Date: Sun, 15 Apr 2018 18:19:26 +0300

> From: Sebastien Chapuis <sebastien@chapu.is>
> Cc: 31138@debbugs.gnu.org
> Date: Sun, 15 Apr 2018 16:40:18 +0200
> 
> 
> > I'm surprised that the slowdown due to the conversion is so large,
> > though.  It doesn't feel right, even with a 4MB string.
> 
> I've dug into why it is so slow, and I've found that if I wrap
> `json-parse-string` in a `with-temp-buffer`, it becomes much
> faster:
> 
> results of benchmark-run with a string of 4043212 characters
> ```
> (with-temp-buffer (json-parse-string str)):
> (0.814315554 1 0.11941178500000005)
> 
> (json-parse-string str):
> (11.542233167 1 0.14954429599999997)
> 
> (with-temp-buffer (json-read-from-string str)):
> (5.9781185610000005 29 4.967349412000001)
> 
> (json-read-from-string str):
> (5.601267 24 4.723292248000001)
> ```

Interesting.

> Any idea why?

Where did str come from?  Did you insert it into the buffer or
something?  Could that explain the difference in performance?

More generally, can you post the string you are using for the
benchmarking, and the benchmark code as well?  That would make the
discussion less abstract.

> > Yes, it's necessary, because the input string may include raw bytes,
> > which will crash Emacs if not handled properly.
> 
> The Jansson documentation guarantees that the strings returned
> from the library are always UTF-8 encoded [1].

You assume that the library has no bugs, yes?  Because if it does,
then we might crash Emacs by trusting it so much.  Letting invalid
bytes creep into Emacs buffers and strings is a sure recipe for an
eventual crash.

> Given that guarantee, is it possible to reconsider the use of
> code_convert_string?

Since it's already much faster than a Lisp implementation, why would
we want to risk crashing an Emacs session by omitting the decoding?

> Encoding a string to UTF-8 when it is already UTF-8 encoded seems
> useless.

It's decoding, not encoding, and the process of decoding examines
every sequence in the byte stream and ensures they are valid UTF-8.
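
To make "examines every sequence" concrete, here is a minimal sketch in
C of the kind of check a UTF-8 decoder has to perform. This is not the
code in code_convert_string; the function name utf8_valid_p is
hypothetical and used only for illustration, and the sketch only
validates, whereas real decoding also produces the internal
representation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT Emacs's decoder: return true if the LEN
   bytes at BUF are well-formed UTF-8 (no overlong forms, no
   surrogates, nothing above U+10FFFF).  */
bool
utf8_valid_p (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;        /* continuation bytes expected after C */
      unsigned long cp;   /* code point being assembled */
      unsigned long min;  /* smallest code point legal at this length */

      if (c < 0x80)
        {
          i++;            /* ASCII: always valid */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte, or 0xF8..0xFF */

      if (len - i <= need)
        return false;     /* sequence truncated at end of input */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char t = buf[i + k];
          if ((t & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (t & 0x3F);
        }

      /* Reject overlong encodings, surrogates, and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }
  return true;
}
```

Every byte has to be looked at at least once, so the cost is linear in
the size of the input, but a single bad byte (say, a lone 0xFF) is
caught before it can reach a buffer or string.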

Emacs never trusts external data to be what the user or Lisp says it
is; I see no reason why we should make an exception in this
particular case.

Thanks.




