emacs-devel

Re: eww doesn't decode %AA%BB%CC URL names


From: Eli Zaretskii
Subject: Re: eww doesn't decode %AA%BB%CC URL names
Date: Thu, 24 Dec 2015 21:34:24 +0200

> From: Lars Ingebrigtsen <address@hidden>
> Cc: Yuri Khan <address@hidden>,  address@hidden
> Date: Thu, 24 Dec 2015 20:18:47 +0100
> 
> Eli Zaretskii <address@hidden> writes:
> 
> >> From: Yuri Khan <address@hidden>
> >> Date: Fri, 25 Dec 2015 00:07:40 +0600
> >> Cc: Eli Zaretskii <address@hidden>, Emacs developers <address@hidden>
> >> 
> >> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <address@hidden> wrote:
> >> > (decode-coding-string (url-unhex-string
> >> > "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
> >> > 'utf-8)
> >> > => "Сердце"
> >> >
> >> > Right.  What charset do we choose?  I guess using the charset of the
> >> > document we're in doesn't make much sense (because it's linking to
> >> > something off-site which may be in a different charset)...
> >> 
> >> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
> >> URL does not decode into a valid UTF-8 string, it is ok to fall back
> >> to a heuristic, though.
> 
> That's basically just (car (decode-coding-string ...))

I believe you meant detect-coding-string.

> though, since it'll return utf-8 first if that's a possible charset,
> won't it?

You cannot rely on it returning UTF-8: that depends on the
coding-system priorities (which are subject to customization) and
other things.

I think you should use UTF-8 literally as the first choice.
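
A minimal sketch of that policy, for illustration only. The function name `my-url-decode-name` is hypothetical and is not part of eww or url.el. It assumes that bytes which are not valid UTF-8 come out of `decode-coding-string` as raw eight-bit characters, which is what the `eight-bit` charset check looks for, and it falls back to `detect-coding-string` only in that case:

```elisp
(require 'url-util)

(defun my-url-decode-name (encoded)
  "Decode percent-encoded string ENCODED, preferring UTF-8.
Hypothetical helper, not an existing eww or url.el function."
  (let* ((bytes (url-unhex-string encoded))
         ;; First choice: UTF-8, named literally rather than detected.
         (utf8 (decode-coding-string bytes 'utf-8)))
    (if (memq 'eight-bit (find-charset-string utf8))
        ;; Some bytes were not valid UTF-8; fall back to the heuristic.
        ;; With a non-nil second argument, `detect-coding-string'
        ;; returns only its highest-priority guess.
        (decode-coding-string bytes (detect-coding-string bytes t))
      utf8)))

(my-url-decode-name "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
;; => "Сердце"
```

The fallback result still depends on the user's coding priorities, but only for input that is not valid UTF-8.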

> > Yes, I think this is a good policy, thanks.  Bonus points for
> > implementing the command in a way that it will be able to accept user
> > choice of the encoding via "C-x RET c", like file operations do.
> 
> Let's see...  that function basically just binds
> `coding-system-for-{read,write}' and then calls the command
> interactively?

Yes.

> Do the commands just look at those variables, and if they're bound,
> then they use that coding system instead?

Yes, they use these in preference to everything else, something like
this:

  (let ((coding (or coding-system-for-read
                    document-encoding
                    locale-coding-system
                    ...)))
    (decode-coding-string ... coding))
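
To make the precedence concrete, here is a runnable sketch under stated assumptions. `my-document-encoding`, `my-pick-coding-system` and `my-decode-bytes` are hypothetical names, and the final `utf-8` default is my own addition rather than anything eww does. The `let` around the call simulates what "C-x RET c" does before it runs the next command:

```elisp
(defvar my-document-encoding nil
  "Hypothetical variable: the charset declared by the current document.")

(defun my-pick-coding-system ()
  "Return the coding system to decode with, most specific choice first."
  (or coding-system-for-read   ; bound by C-x RET c, so it wins
      my-document-encoding
      locale-coding-system
      'utf-8))                 ; assumed last-resort default

(defun my-decode-bytes (bytes)
  "Decode the unibyte string BYTES with the preferred coding system."
  (decode-coding-string bytes (my-pick-coding-system)))

;; Simulate the user's explicit choice overriding the document's charset.
(let ((coding-system-for-read 'latin-1)
      (my-document-encoding 'utf-8))
  (my-decode-bytes "caf\351"))
;; => "café"
```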



