Conversion to unibyte, magic latin-1?

From: Julian Scheid
Subject: Conversion to unibyte, magic latin-1?
Date: Sun, 5 May 2019 03:48:24 +1200

I'm trying to work out how to calculate the SHA-256 for a binary
string reliably (and efficiently) in Elisp.

Consider this binary string:

    $ printf '\x52\xbc\xdd\x9e' | openssl dgst -sha256

`secure-hash' doesn't produce the same result (all tested in Emacs 26.2):

    (secure-hash 'sha256 (concat [#x52 #xbc #xdd #x9e]))

After studying the C source code I've figured out that this is because
it does multibyte conversion behind the scenes (by the way, C-h f
secure-hash RET doesn't tell you this).
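One way to check which representation a string has before hashing it is sketched below. `my/describe-string' is a hypothetical helper name, not something from Emacs itself; it relies only on `multibyte-string-p', `length' and `string-bytes'. The second call uses `unibyte-string' to build the same four bytes directly, for comparison.

```elisp
;; Hypothetical helper: report how S is stored internally.
(defun my/describe-string (s)
  "Return a plist describing the representation of string S."
  (list :multibyte (multibyte-string-p s)  ; non-nil if S holds characters
        :chars (length s)                  ; number of characters (or bytes)
        :bytes (string-bytes s)))          ; size of the internal representation

;; The string from the example above, built from a vector of integers.
(my/describe-string (concat [#x52 #xbc #xdd #x9e]))

;; The same four values given explicitly as raw bytes.
(my/describe-string (unibyte-string #x52 #xbc #xdd #x9e))
```

If the first call reports a multibyte string and the second does not, that would be consistent with the conversion described above applying only to the first form.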

Armed with this knowledge, and seeing in the code that no conversion
is done for unibyte strings, I've got it to work with

    (secure-hash 'sha256 (string-make-unibyte (concat [#x52 #xbc #xdd #x9e])))

Alas, `string-make-unibyte' is declared obsolete.  The help page tells
me that I should use `encode-coding-string' instead, so I tried that
with a few obvious encodings, but no luck:

    (secure-hash 'sha256 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'raw-text))

    (secure-hash 'sha256 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'binary))

In the end I searched for a coding system that works:

    (let* ((data (concat [#x52 #xbc #xdd #x9e]))
           (ref (secure-hash 'sha256 (string-make-unibyte data))))
       (lambda (coding-system)
         (string= (secure-hash 'sha256 (encode-coding-string data
    (latin-1 iso-8859-1 iso-latin-1)

    (secure-hash 'sha256 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'latin-1))

This works, but I'm confused... why does latin-1 work when raw-text and
binary don't?  More importantly, how do I know that it works
everywhere and will continue to work in the future?  Is latin-1 a
"magic" encoding, or does it only happen to work because it matches
some default coding system set somewhere in my config?
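To see what each coding system actually produces, rather than comparing hashes, the encoded bytes can be listed directly. This is only a diagnostic sketch: `my/encoded-bytes' is a hypothetical helper name, and the list of coding systems is just the ones tried above plus utf-8 for comparison.

```elisp
;; Hypothetical helper: encode STRING and return the resulting bytes.
(defun my/encoded-bytes (string coding-system)
  "Return the bytes of STRING encoded with CODING-SYSTEM as a list of integers."
  ;; `encode-coding-string' returns a unibyte string; `append' with nil
  ;; turns it into a list of its byte values.
  (append (encode-coding-string string coding-system) nil))

;; For each coding system, show the bytes and whether they match the input.
(let ((data (concat [#x52 #xbc #xdd #x9e]))
      (wanted '(#x52 #xbc #xdd #x9e)))
  (mapcar (lambda (cs)
            (let ((bytes (my/encoded-bytes data cs)))
              (list cs bytes (equal bytes wanted))))
          '(latin-1 raw-text binary utf-8)))
```

A coding system gives the expected hash only if its byte list matches the original four values, so the output shows which encodings change the data and how.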

For what it's worth, I can't see a mention of latin-1 anywhere in my
coding system settings (which are all defaults, afaik):

     (car coding-category-list))
    (utf-8-unix (utf-8-unix . utf-8-unix) utf-8-unix (utf-8-unix .
utf-8-unix) utf-8-unix nil coding-category-raw-text)

Could someone shed light on this?
