emacs-orgmode

Re: [O] [bug] Org link dialog escapes URL spaces incorrectly


From: David Maus
Subject: Re: [O] [bug] Org link dialog escapes URL spaces incorrectly
Date: Sat, 05 Nov 2011 15:04:32 +0100
User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.8 Emacs/23.2 (i486-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

At Fri, 04 Nov 2011 14:25:42 -0400,
Nick Dokos wrote:
>
> Nick Dokos <address@hidden> wrote:
>
> > It probably does, but that's probably not the best place to do it: it might
> > be better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > the *other* case (editing the link at point), we'll end up unescaping twice:
> > probably not a problem, since unescaping should be idempotent (in contrast
> > to escaping ;-) ) but why do it twice?
> >
>
> Brian Wightman pointed out to me that the idempotent part of the
> statement above is definitely wrong (d'oh). The original URL that Jeff
> Horn posted, when unescaped once, would be completely free of % signs.
> But if the second (doubly-escaped) form is pasted into the minibuffer,
> then unescaping once would not be enough. So I presume the thing to do
> is to take the URL and unescape it repeatedly until it loses all
> escapes, and then escape it *once* before inserting it in the org
> buffer.
>
> Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> and 2) is there a better one?

No, this wouldn't be a solution. Consider a link containing the sequence
%2525: unescaping until no more escapes (or rather "escapes") remain will
produce a single `%', not %25. Either escape once, or not at all.

What roughly happens is this:

1. The user enters a link via `org-insert-link'
2. Org escapes the link and writes it to the buffer
3. The user opens the link with `org-open-at-point'
4. Org reads the link from the buffer and unescapes it
5. The link gets escaped and passed to the consuming application (e.g. a browser)
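The five steps can be sketched as a small pipeline. This assumes Python's `quote`/`unquote` as stand-ins for `org-link-escape`/`org-link-unescape`. The `SAFE` set and the helper names `insert_link` and `open_link` are hypothetical illustrations, not Org's actual escape table or functions.

```python
from urllib.parse import quote, unquote

# Assumption: illustrative set of characters left unescaped; not Org's table.
SAFE = ":/?#@=&"


def escape(link: str) -> str:
    return quote(link, safe=SAFE)


def unescape(link: str) -> str:
    return unquote(link)


def insert_link(typed: str) -> str:
    """Steps 1-2 (hypothetical): escape the typed link and store it in the buffer."""
    return escape(typed)


def open_link(buffer_text: str) -> str:
    """Steps 4-5 (hypothetical): unescape the stored text, then escape it for the consumer."""
    return escape(unescape(buffer_text))


typed = "http://example.tld/a b"
stored = insert_link(typed)
print(stored)             # http://example.tld/a%20b
print(open_link(stored))  # http://example.tld/a%20b
```

When the typed link is really unescaped, as here, the consumer receives a correctly escaped URL.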

For steps 2 and 4 it is guaranteed that

(string= link (org-link-unescape (org-link-escape link)))

Thus, the problem is not in 2 or 4, but in 1 or 5.
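The guarantee for steps 2 and 4 is that unescaping undoes escaping. Here is a quick check under the same assumption (Python's `quote`/`unquote` standing in for Org's functions, with an illustrative `SAFE` set and a hypothetical `round_trips` helper). The reverse composition is not the identity, which is why the trouble sits at the edges, in steps 1 and 5.

```python
from urllib.parse import quote, unquote

SAFE = ":/?#@=&"  # illustrative safe set, not Org's escape table


def round_trips(link: str) -> bool:
    """The step 2/4 property: unescape(escape(link)) == link."""
    return unquote(quote(link, safe=SAFE)) == link


samples = [
    "http://example.tld/a b",
    "http://example.tld/foo%40bar",
    "http://example.tld/rate%2525",
    "http://de.wikipedia.org/wiki/Menge_(Mathematik)",
]
print(all(round_trips(s) for s in samples))  # True

# The other order, escape(unescape(x)), can change the link.
x = "http://example.tld/foo%40bar"
print(quote(unquote(x), safe=SAFE))  # http://example.tld/foo@bar
```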

Step 5 assumes that a link entered by the user in step 1 was an
unescaped link and thus needs escaping before it is passed to the
consuming application. If you enter a link in step 1 that is already
escaped, this assumption fails and you end up with a double-escaped
link being passed to the consumer.
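A sketch of the failure, again assuming Python's `quote`/`unquote` as stand-ins for Org's escaping and an illustrative `SAFE` set. A link pasted from a browser already contains `%20`. Step 2 escapes the `%` itself, so after steps 4 and 5 the consumer receives `%2520`.

```python
from urllib.parse import quote, unquote

SAFE = ":/?#@=&"  # illustrative safe set, not Org's escape table


def escape(link: str) -> str:
    return quote(link, safe=SAFE)


# Step 1: the user pastes a link that a browser had already escaped.
pasted = "http://example.tld/foo%20bar"

stored = escape(pasted)         # step 2: "%" becomes "%25"
sent = escape(unquote(stored))  # steps 4-5: unescape, then escape again

print(stored)  # http://example.tld/foo%2520bar
print(sent)    # http://example.tld/foo%2520bar
```

The server is then asked for a path containing the literal text `%20` instead of a space.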

In other words, the question is: How to decide whether an arbitrary
URL is percent-escaped or not?

Now here's the problem: You can't. Is

"http://example.tld/foo%40bar"

already escaped or not? You can't tell for sure. It depends on the
application you copied the link from.[1]

What we could do in step 5 is... guess. If the (unescaped) link
produced by step 4 does contain characters that need escaping, we
escape the link. Otherwise we don't.
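A minimal sketch of the guessing heuristic for step 5, under a stated assumption: the "characters that need escaping" are taken to be anything outside the RFC 3986 unreserved and reserved sets, with `%` also treated as allowed so existing escapes are left alone. The helpers `needs_escaping` and `prepare_for_consumer` are hypothetical, and Python's `quote` stands in for Org's escaping.

```python
import string
from urllib.parse import quote

# Assumption: RFC 3986 unreserved + reserved characters, plus "%".
RESERVED = ":/?#[]@!$&'()*+,;="
ALLOWED = set(string.ascii_letters + string.digits + "-._~" + RESERVED + "%")


def needs_escaping(link: str) -> bool:
    """Guess: does the link contain any character that is never valid raw in a URL?"""
    return any(ch not in ALLOWED for ch in link)


def prepare_for_consumer(link: str) -> str:
    """Step 5 under the heuristic: escape only when the link clearly needs it."""
    if needs_escaping(link):
        return quote(link, safe=RESERVED)
    return link


print(prepare_for_consumer("http://example.tld/a b"))        # http://example.tld/a%20b
print(prepare_for_consumer("http://example.tld/foo%40bar"))  # unchanged
print(prepare_for_consumer("http://example.tld/foo%20bar baz"))
# http://example.tld/foo%2520bar%20baz
```

The last line shows one possible impact: a link that mixes an existing escape with a raw space still gets its `%20` double-escaped. The heuristic only avoids the common case.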

Not quite sure about the impact of such a change.

Best,
 -- David

[1] Even worse: It may even depend on /how/ or /where/ you copied the
link. E.g. the link to a wikipedia page about set theory is copied as

http://de.wikipedia.org/wiki/Menge_%28Mathematik%29

if C-c'ed from the address bar but copied as

http://de.wikipedia.org/wiki/Menge_(Mathematik)

if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
3.6.23).
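The two Wikipedia spellings name the same page, but escaping each of them once gives different results. This sketch assumes Python's `quote`/`unquote` and an illustrative `safe=":/"` setting, not Org's escape table.

```python
from urllib.parse import quote, unquote

address_bar = "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"
copied_link = "http://de.wikipedia.org/wiki/Menge_(Mathematik)"

print(unquote(address_bar) == copied_link)  # True: same page
print(quote(copied_link, safe=":/"))
# http://de.wikipedia.org/wiki/Menge_%28Mathematik%29
print(quote(address_bar, safe=":/"))
# http://de.wikipedia.org/wiki/Menge_%2528Mathematik%2529
```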
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... address@hidden
Email..... address@hidden
