help-gnu-emacs

Re: Parsing of multibyte strings from process output


From: Michael Albinus
Subject: Re: Parsing of multibyte strings from process output
Date: Tue, 08 May 2018 14:01:22 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Helmut Eller <eller.helmut@gmail.com> writes:

Hi Helmut,

>> However, I don't know how to parse it so that I can retrieve it.
>> Everything I have tried always returns the *two* characters ?\xc2 ?\x9a,
>> multibyte encoded. How could I get just the multibyte character ?\x9a
>> from this?
>
> You could use (set-process-coding-system <proc> 'utf-8) if you know that
> all the output of the process is indeed utf-8 encoded.

I've done this already, for other purposes. But it doesn't help: the
string /home/albinus/tmp/\xc2\x9abung is written literally into the
output buffer.

> Alternatively, you could use 'binary as coding system and manually call
> decode-coding-string on the parts that are utf-8 encoded.  However keep
> in mind, that "raw bytes" in multibyte strings have char codes in the
> range #x3FFF00..#x3FFFFF.

I tried that, with no luck. But I didn't know that "raw" bytes are in
that range.

>   (decode-coding-string (string #x3FFFc3 #x3FFF9c) 'utf-8) => "Ü"

That's it! The following code works for me (res-symlink-target keeps the
file name from process output, as shown above):

--8<---------------cut here---------------start------------->8---
(setq res-symlink-target
      ;; Parse multibyte codings.
      (decode-coding-string
       (replace-regexp-in-string
        "\\\\x\\([[:xdigit:]]\\{2\\}\\)"
        (lambda (x)
          (string
           (string-to-number (concat "3FFF" (match-string 1 x)) 16)))
        res-symlink-target)
       'utf-8))
--8<---------------cut here---------------end--------------->8---
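
For trying this outside of the surrounding code, here is a minimal
sketch of the same technique wrapped in a function. The name
my-decode-hex-escapes and the sample input are assumptions made up for
illustration, and it presumes that every \xNN escape stands for a byte
in the range #x80..#xFF that is part of valid utf-8:

```elisp
;; Hypothetical helper; same approach as the snippet above.
(defun my-decode-hex-escapes (str)
  "Replace literal \\xNN escapes in STR by raw bytes and decode as utf-8."
  (decode-coding-string
   (replace-regexp-in-string
    "\\\\x\\([[:xdigit:]]\\{2\\}\\)"
    (lambda (x)
      ;; "3FFF" + NN gives the char code of the raw byte NN.
      ;; Assumption: NN is in #x80..#xFF.
      (string
       (string-to-number (concat "3FFF" (match-string 1 x)) 16)))
    str)
   'utf-8))

;; Sample input (an assumption): the escapes \xc3\x9c spell "Ü" in utf-8.
(my-decode-hex-escapes "/home/albinus/tmp/\\xc3\\x9cbung")
```

A string without any \xNN escape passes through the replacement step
unchanged, so plain ASCII file names are not affected.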

Thanks a lot!

> Helmut

Best regards, Michael.


