help-gnu-emacs

Re: coding system


From: B.T. Raven
Subject: Re: coding system
Date: Sun, 27 Mar 2005 00:56:37 -0600

"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:877jju9dog.fsf-monnier+gnu.emacs.help@gnu.org...
> > However it seems that the coding system for keyboard input is latin-1.
> > This is a unibyte coding system; why does emacs see a multibyte character
> > when I press é? To what corresponds this 2281?
>
> Inside Emacs, there's no such thing as unibyte characters and
> multibyte characters.  There are just characters, which are represented
> by integers.  When loading/saving a file, characters are decoded/encoded
> into sequences of bytes which can be unibyte or multibyte.  This same "é"
> can be represented in some files with a single byte (e.g. if it's a latin-1
> file) or as two bytes (e.g. if it's a utf-8 file), or ...
>
>
>         Stefan
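
To make the quoted point concrete, here is a minimal sketch in Python rather
than Emacs Lisp. It assumes Python's Unicode code points as a stand-in for
Emacs's internal character integers, so the integer it shows is 233 and not
the 2281 that this Emacs reports. The variable names are only illustrative.

```python
# One character, one integer, several possible byte sequences on disk.
ch = "\u00e9"                    # e with acute

code = ord(ch)                   # the integer that identifies the character
latin1 = ch.encode("latin-1")    # one byte in a latin-1 file
utf8 = ch.encode("utf-8")        # two bytes in a utf-8 file

print(code, latin1.hex(), utf8.hex())

# Decoding either byte sequence with the matching coding system
# gives back the same character.
same = latin1.decode("latin-1") == utf8.decode("utf-8") == ch
print(same)
```

The byte sequence belongs to the file and its coding system; the integer
belongs to the character.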

That "or ..." is pregnant with meaning.  It seems that the same
character can be represented within a single buffer by three or more
different byte sequences.  Here is the C-u C-x = report for three e with
acute and two e with macron.
(Sorry about the munged characters.  I don't know how to use gnus under
w32, so I have to copy and paste from emacs to Outlook.)

Notice that the e with macron expands from a 2-byte to a 4-byte
representation in the buffer after being saved and then reloaded.  The
font charset it uses also changes (iso8859-4 before, iso10646-1 after).
Even if unification on decoding were working, could it overcome this
great a difference in the representation of the characters?

Ed.


éééēē

  character: é (04351, 2281, 0x8e9)
    charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
 code point: 105
     syntax: word
   category: l:Latin
buffer code: 0x81 0xE9
  file code: E9 (encoded by coding system iso-latin-1-dos)
       font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-1

  character: é (04551, 2409, 0x969)
    charset: latin-iso8859-2 (Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101)
 code point: 105
     syntax: word
   category: l:Latin
buffer code: 0x82 0xE9
  file code: 0xC3 0xA9 (encoded by coding system mule-utf-8-dos)
       font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-2

  character: é (05151, 2665, 0xa69)
    charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
 code point: 105
     syntax: word
   category: l:Latin
buffer code: 0x84 0xE9
  file code: E9 (encoded by coding system iso-latin-1-dos)
       font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4

  character: ē (05072, 2618, 0xa3a)
    charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
 code point: 58
     syntax: word
   category: l:Latin
buffer code: 0x84 0xBA
  file code: 0xC4 0x93 (encoded by coding system utf-8-dos)
       font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4

  character: ē (01210063, 331827, 0x51033)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 32 51
     syntax: word
   category: l:Latin
buffer code: 0x9C 0xF4 0xA0 0xB3
  file code: 0xC4 0x93 (encoded by coding system mule-utf-8-dos)
       font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso10646-1
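
As a cross-check on the reports above, the following Python sketch decodes the
bytes they list and compares the resulting Unicode code points. It rests on an
assumption of mine: that the last byte of each two-byte "buffer code" is the
character's 8-bit position in the charset named on the "charset" line. That
reading agrees with the "code point" fields, which equal that byte minus 128
(0xE9 gives 105, 0xBA gives 58). The list name `reports` and the labels are
illustrative only.

```python
# (label, Python codec matching the reported charset or file coding, raw bytes)
reports = [
    ("e-acute, latin-iso8859-1", "iso8859_1", bytes([0xE9])),
    ("e-acute, latin-iso8859-2", "iso8859_2", bytes([0xE9])),
    ("e-acute, latin-iso8859-4", "iso8859_4", bytes([0xE9])),
    ("e-macron, latin-iso8859-4", "iso8859_4", bytes([0xBA])),
    ("e-macron, utf-8 file code", "utf-8", bytes([0xC4, 0x93])),
]

# Decode each byte sequence and format its Unicode code point.
decoded = [raw.decode(codec) for _, codec, raw in reports]
code_points = ["U+%04X" % ord(c) for c in decoded]

for (label, _, raw), cp in zip(reports, code_points):
    print(f"{label:28} {raw.hex():6} {cp}")

# The "code point" field of the 8-bit charsets: byte value minus 128.
positions = [raw[0] - 0x80 for _, _, raw in reports[:4]]
print(positions)
```

Under that assumption, the three e with acute all map to one Unicode code
point, and so do the two e with macron, which is the identity that unification
would have to recover.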


