help-gnu-emacs

Re: those funny non-ASCII characters


From: Xah Lee
Subject: Re: those funny non-ASCII characters
Date: Fri, 25 May 2012 11:33:51 -0700 (PDT)
User-agent: G2/1.0

hope Eli answered all your questions.

here are some additions.

• embrace unicode, because it's only going to get more common.
Many languages and formats are unicode by spec (e.g. HTML, CSS,
JavaScript, Java, Haskell, …). Most OSes (Windows, Mac) and their file
systems default to unicode now (not sure about linux).
Even emacs, starting with emacs 23, uses unicode as its default
internal encoding.

〈Unicode Popularity on Web by Google〉
http://xahlee.org/comp/unicode_on_web.html

• Unicode is about 2 things: ① a char set with an integer ID (the
code point) for each char. ② several encodings for the char set, most
popular being utf-8 and utf-16 (the latter is what Windows and Mac use
internally). (an encoding is a standard that maps each char of a char
set to a byte sequence)
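
to make the char set vs. encoding distinction concrete, here's a small sketch in Python (not elisp; Python is used only because it makes the bytes easy to print). It takes U+2013, the char from the question quoted below, and shows its integer ID, its name, and the byte sequences it becomes under utf-8 and utf-16. The variable names are made up for illustration.

```python
import unicodedata

ch = "\u2013"  # the char pasted from the web page

code_point = ord(ch)                   # integer ID in the char set
name = unicodedata.name(ch)            # canonical Unicode name
utf8_bytes = ch.encode("utf-8")        # byte sequence under utf-8
utf16_bytes = ch.encode("utf-16-be")   # utf-16, big-endian, no byte order mark

print(f"U+{code_point:04X} {name}")    # U+2013 EN DASH
print("utf-8 :", utf8_bytes.hex(" "))  # e2 80 93
print("utf-16:", utf16_bytes.hex(" ")) # 20 13
```

same char, same code point, but 3 bytes in utf-8 and 2 bytes in utf-16. The code point belongs to the char set; the bytes belong to the encoding.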

• in emacs, just put this in your init:
(set-language-environment "UTF-8")

that should set all encoding defaults to utf-8, and shouldn't cause
you any problem if all your current files and elisp files are ascii,
because ascii is a subset of utf-8: ascii text has the same bytes in
both encodings.
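
a quick way to convince yourself of the subset claim, again sketched in Python rather than elisp. The sample strings are hypothetical; the point is only that pure-ascii text encodes to identical bytes either way, and that a utf-8 file can hold ascii and non-ascii chars side by side, so replacing U+2013 with an ascii hyphen is an ordinary string replace.

```python
# a pure-ascii line, like what most elisp files contain (hypothetical sample)
ascii_text = '(replace-string "--" "-")'

# ascii text has identical bytes under ascii and utf-8
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# utf-8 text can mix ascii chars with a non-ascii char like U+2013
mixed = "pages 10\u201320"
fixed = mixed.replace("\u2013", "-")  # swap the en dash for an ascii hyphen

print(same_bytes)                  # True
print(len(mixed.encode("utf-8")))  # 13: ten 1-byte ascii chars + one 3-byte char
print(len(fixed.encode("utf-8")))  # 11: all ascii now
```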

• in emacs, call describe-char (the same command that C-u C-x = runs).
That'll show the current char's encoding as well as the byte sequence
used for that particular encoding. (this is emacs 24. Emacs 23 may not
show the byte sequence... i don't recall.)

my unicode tutorial covers all these… feel free to ask me, or here, of
course.

 Xah


On May 25, 6:40 am, "Buchs, Kevin" <buchs.ke...@mayo.edu> wrote:
> Thanks, Xah and Eli, for contributing to my further understanding. I
> went to a specific website where I got the content I copied and pasted
> and I can see from the HTML that it has a charset=UTF-8, so I understand
> that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
> character I pasted has a code point of 0x2013 (U+2013). I didn't see,
> however, what the UTF-8 encoding of that code point was. Should I be
> able to read that somewhere on the buffer of information I get with C-u
> C-x = ? I was poking around the www.unicode.org website, trying to
> understand how this U+2013 code point is encoded into UTF-8, but I
> haven't determined that yet.
>
> A fresh buffer in emacs for me on my Win-7 box has an encoding system of
> iso-latin-1-dos. The coding system used to open and save files is the
> same.
>
> So, help me piece together what happens as I paste the UTF-8 text into a
> buffer. First, the paste buffer must define that it is in UTF-8. Emacs
> reads this information and inserts it into the byte string that defines
> the buffer. Now, how does emacs record that it was a UTF-8 encoded
> character? Does it translate it into a different internal encoding
> instead of just recording the 8 bits transferred? Is this encoding used
> as a superset of all possible encoding systems that emacs supports?
>
> Now,  Xah, you suggest I embrace Unicode. What does that mean? Would it
> involve marking all my lisp library files and my org-mode files with the
> file variable -*- coding: utf-8 -*- ? Or is there another way to go
> Unicode automatically?
>
> I assume that if my lisp library files are encoded utf-8, then I can
> paste that character from the web page into my call to replace-string in
> order to substitute the longer dash of Unicode U+2013 with an ascii
> hyphen or double hyphen. But, how does that really work? If the lisp
> file is encoded utf-8, then how can I put an ascii character in the
> replacement string?
>
> I would appreciate it if someone could help me open this new door in my
> brain a bit further.
>
> Kevin Buchs | Senior Engineer | SPPDG | 507-538-5459 |
> buchs.ke...@mayo.edu
> Mayo Clinic | 200 First Street SW | Rochester, MN 55905 
> |http://www.mayo.edu/sppdg
>
> -----Original Message-----
>
> With cursor on that character, type "C-u C-x =", and Emacs will show
> everything it knows about that character, including its canonical
> name.

