Re: Unicode support for the MS Windows clipboard
From: Benjamin Riefenstahl
Subject: Re: Unicode support for the MS Windows clipboard
Date: Fri, 28 May 2004 15:26:10 +0200
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)
Hi all,
I have been doing some testing, research and thinking. First here is
a quick response on a few points. Next I'll do a new patch based on
my current ideas.
> "Eli Zaretskii" <address@hidden> writes:
>> Couldn't this be done without introducing Windows-specific options?
Jason Rumney <address@hidden> writes:
> Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
> to indicate the coding we have used.
Thanks for the pointer, researching locales actually led me to a
solution for deriving codepage properties (OEM vs "ANSI") via locales.
I think I have an algorithm that works. It makes a few assumptions
about the coding system names and based on that it derives the
requested clipboard type automatically.
How does this sound:
- If selection-coding-system has the form /(.*-)?utf-16.*/, I assume
CF_UNICODETEXT is wanted.
- If selection-coding-system has the form /cp[0-9]+.*/ or
/windows-[0-9]+.*/, I derive the codepage from that.
- Check if the codepage is identical to GetACP() or GetOEMCP().
If it is, use CF_TEXT or CF_OEMTEXT accordingly.
- Else get a corresponding LCID (reverse mapping via
EnumSystemLocales()) which has the codepage as OEM or "ANSI". In this
case we also need to set CF_LOCALE accordingly.
The last step takes a small performance hit, but the results can
easily be cached.
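The steps above can be sketched roughly as follows. This is only an illustration in Python, not the patch itself (which would be C in the Emacs sources). The function name `choose_clipboard_format`, the parameters `acp` and `oemcp` (standing in for the results of GetACP() and GetOEMCP()), and the `locale_for_codepage` table (standing in for the cached reverse mapping over the system locales) are all hypothetical. Returning `(None, None)` when nothing matches is also an assumption; the proposal above does not say what the fallback should be.

```python
import re

# Patterns taken from the naming conventions proposed above.
UTF16_RE = re.compile(r"(.*-)?utf-16.*")
CODEPAGE_RE = re.compile(r"(?:cp|windows-)([0-9]+).*")


def choose_clipboard_format(coding_system, acp, oemcp, locale_for_codepage):
    """Return (clipboard_format, lcid_or_None) for a coding system name.

    acp / oemcp: hypothetical stand-ins for GetACP() / GetOEMCP().
    locale_for_codepage: hypothetical cache mapping a codepage number to
    (lcid, "ansi" or "oem"), i.e. the result of the reverse lookup.
    """
    if UTF16_RE.fullmatch(coding_system):
        return ("CF_UNICODETEXT", None)

    match = CODEPAGE_RE.fullmatch(coding_system)
    if match is None:
        return (None, None)  # assumption: caller picks a fallback

    codepage = int(match.group(1))
    if codepage == acp:
        return ("CF_TEXT", None)
    if codepage == oemcp:
        return ("CF_OEMTEXT", None)

    # Codepage belongs to some other locale: the LCID must accompany
    # the text (via CF_LOCALE) so readers know how to interpret it.
    hit = locale_for_codepage.get(codepage)
    if hit is None:
        return (None, None)
    lcid, kind = hit
    return ("CF_TEXT" if kind == "ansi" else "CF_OEMTEXT", lcid)
```

For example, on a system where the "ANSI" codepage is 1252 and the OEM codepage is 437, `choose_clipboard_format("windows-1251", 1252, 437, {1251: (0x0419, "ansi")})` yields CF_TEXT together with the LCID 0x0419. Keeping the reverse mapping in a table like this is one way to cache the result of the expensive last step.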
I am also thinking of custom coding systems, like e.g. for doing
automatic remapping of private characters or locale specific
pre-/postprocessing. This is why I am not completely comfortable with
hardcoding coding systems or using heuristics based on the coding
system symbol names. If such concerns are completely misplaced,
please just tell me ;-).
Anyway, I have no problem with dictating the above naming conventions
for selection-coding-system for now.
Jason Rumney <address@hidden> writes:
> Andrew Innes always had the intention to make the clipboard work
> on-demand, the same way it does on X. So the memory would only be
> used if the clipboard text was actually pasted (and then only for
> the format the client wanted).
We could do that using WM_RENDERFORMAT. But then we absolutely need a
valid HWND to get a target for that message. I don't know anything
about the Emacs message loop and the windows that are available. It
would probably be best to allocate a custom hidden window for this.
I'll postpone that idea for now and just assume that we don't use
Unicode on 9x/Me.
Jason Rumney <address@hidden> writes:
> Another thing worth considering, if we are making major changes to the
> clipboard code, is that Kenichi Handa pointed out some time ago that
> the encoding part of the X clipboard support is now done in Lisp
> (xselect.el). Windows could do this too.
At the moment this is done via {de,en}code_coding() and a couple of
friends. Is that the same thing?
Benjamin Riefenstahl <address@hidden> writes:
>> Anyway, what happens to the MULE problem in this unified scenario?
>> Do all problems go away with unify-8859-on-{de,en}coding?
Jason Rumney <address@hidden> writes:
> What MULE problem?
Disjoint charsets leading to the introduction of unwanted characters
(similar to that SHIFT-JIS <-> Chinese confusion that you just
mentioned). One of the last times the discussion came up,
somebody mentioned that this could still be a serious problem.
Jason Rumney <address@hidden> writes:
> The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe
> -be) is the only coding-system that is appropriate.
Actually at the moment that would be utf-16le-dos, not utf-16-le-dos.
The latter includes a BOM, which we really don't want here. The
non-intuitive naming difference makes me wonder, though, whether this
is just some unintended confusion. There are also currently
utf-16-le-with-signature-* and mule-utf-16-*.
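To make concrete what "no BOM" means for the clipboard data, here is a small Python sketch. Note that the codec names below are Python's, not Emacs coding system names, and they behave differently from the Emacs names discussed above: in Python, "utf-16-le" writes no BOM while plain "utf-16" prepends one. The helper name `to_cf_unicodetext` is hypothetical, and the sketch assumes the input uses bare "\n" line ends.

```python
import codecs


def to_cf_unicodetext(text):
    """Build a CF_UNICODETEXT-style payload from a Python string.

    Assumes text uses "\n" line ends only. The result is UTF-16
    little-endian with CR-LF line ends, no BOM, and a terminating NUL.
    """
    crlf = text.replace("\n", "\r\n")
    return crlf.encode("utf-16-le") + b"\x00\x00"


payload = to_cf_unicodetext("a\nb")

# For contrast: Python's generic "utf-16" codec prepends a BOM,
# which is what we want to avoid on the clipboard.
with_bom = "a".encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
```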
> "Eli Zaretskii" <address@hidden> writes:
>> Also, AFAIK CF_UNICODETEXT _can_ be used on Windows 9x, as any
>> program like clipbrd.exe or ClipConvert will show you.
I tested Win95 and Win98SE. On both systems, the clipboard viewer and
Notepad couldn't make use of CF_UNICODETEXT. Cut-and-paste between
two Emacs instances via CF_UNICODETEXT works, so I assume other
applications that support CF_UNICODETEXT would work, too. No
automatic conversion by Windows, though.
Benjamin Riefenstahl <address@hidden> writes:
>>> - Drop optimizations for ASCII-only text.
> "Eli Zaretskii" <address@hidden> writes:
>> Is that optimization indeed an optimization?
Getting data from the clipboard is indeed quite a bit faster with this
optimization. Putting something on the clipboard doesn't benefit, but
that's probably because the detection of this case is inefficient: it
uses find_charset_in_text(), although the result is not really
used. So that can probably be made better, too. I'll try to get this
integrated in the next version of the patch.
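The idea behind the ASCII-only fast path on the reading side can be illustrated like this. It is a Python sketch, not the Emacs code; the function name `decode_clipboard_bytes` is hypothetical, and it assumes the clipboard codepage is ASCII-compatible, so that bytes below 0x80 mean the same thing whatever the codepage.

```python
def decode_clipboard_bytes(data, codec):
    """Decode clipboard bytes, bypassing the codec for pure-ASCII data."""
    if data.isascii():
        # Fast path: every byte is below 0x80, so the codepage-specific
        # decoder is not needed (assumes an ASCII-compatible codepage).
        return data.decode("ascii")
    # Slow path: run the full decoder for the given codepage.
    return data.decode(codec)
```

A cheap byte scan like this is the kind of check that could replace a more expensive charset search when the only question is "is this all ASCII?".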
benny