[Emacs-diffs] emacs/doc/lispref nonascii.texi
From: Eli Zaretskii
Subject: [Emacs-diffs] emacs/doc/lispref nonascii.texi
Date: Sat, 29 Nov 2008 12:18:15 +0000
CVSROOT: /cvsroot/emacs
Module name: emacs
Changes by: Eli Zaretskii <eliz> 08/11/29 12:18:15
Modified files:
doc/lispref : nonascii.texi
Log message:
(Explicit Encoding): Update for Emacs 23.
(Character Codes): Document `max-char'.
CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/emacs/doc/lispref/nonascii.texi?cvsroot=emacs&r1=1.11&r2=1.12
Patches:
Index: nonascii.texi
===================================================================
RCS file: /cvsroot/emacs/emacs/doc/lispref/nonascii.texi,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -b -r1.11 -r1.12
--- nonascii.texi 28 Nov 2008 13:26:17 -0000 1.11
+++ nonascii.texi 29 Nov 2008 12:18:14 -0000 1.12
@@ -298,12 +298,36 @@
@code{nil} otherwise.
@example
+@group
(characterp 65)
@result{} t
+@end group
+@group
(characterp 4194303)
@result{} t
+@end group
+@group
(characterp 4194304)
@result{} nil
+@end group
+@end example
+@end defun
+
+@cindex maximum value of character codepoint
+@cindex codepoint, largest value
+@defun max-char
+This function returns the largest value that a valid character
+codepoint can have.
+
+@example
+@group
+(characterp (max-char))
+ @result{} t
+@end group
+@group
+(characterp (1+ (max-char)))
+ @result{} nil
+@end group
@end example
@end defun
@@ -579,48 +603,51 @@
@subsection Basic Concepts of Coding Systems
@cindex character code conversion
- @dfn{Character code conversion} involves conversion between the encoding
-used inside Emacs and some other encoding. Emacs supports many
-different encodings, in that it can convert to and from them. For
-example, it can convert text to or from encodings such as Latin 1, Latin
-2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
-cases, Emacs supports several alternative encodings for the same
-characters; for example, there are three coding systems for the Cyrillic
-(Russian) alphabet: ISO, Alternativnyj, and KOI8.
+ @dfn{Character code conversion} involves conversion between the
+internal representation of characters used inside Emacs and some other
+encoding. Emacs supports many different encodings, in that it can
+convert to and from them. For example, it can convert text to or from
+encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and
+several variants of ISO 2022. In some cases, Emacs supports several
+alternative encodings for the same characters; for example, there are
+three coding systems for the Cyrillic (Russian) alphabet: ISO,
+Alternativnyj, and KOI8.
+@c I think this paragraph is no longer correct.
+@ignore
Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified---to be chosen
heuristically for each file, based on the data.
+@end ignore
In general, a coding system doesn't guarantee roundtrip identity:
decoding a byte sequence using coding system, then encoding the
resulting text in the same coding system, can produce a different byte
-sequence. However, the following coding systems do guarantee that the
-byte sequence will be the same as what you originally decoded:
+sequence. But some coding systems do guarantee that the byte sequence
+will be the same as what you originally decoded. Here are a few
+examples:
@quotation
-chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
-greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
-iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
-japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+iso-8859-1, utf-8, big5, shift_jis, euc-jp
@end quotation
Encoding buffer text and then decoding the result can also fail to
-reproduce the original text. For instance, if you encode Latin-2
-characters with @code{utf-8} and decode the result using the same
-coding system, you'll get Unicode characters (of charset
-@code{mule-unicode-0100-24ff}). If you encode Unicode characters with
-@code{iso-latin-2} and decode the result with the same coding system,
-you'll get Latin-2 characters.
+reproduce the original text. For instance, if you encode a character
+with a coding system which does not support that character, the result
+is unpredictable, and thus decoding it using the same coding system
+may produce a different text. Currently, Emacs can't report errors
+that result from encoding unsupported characters.
@cindex EOL conversion
@cindex end-of-line conversion
@cindex line end conversion
- @dfn{End of line conversion} handles three different conventions used
-on various systems for representing end of line in files. The Unix
-convention is to use the linefeed character (also called newline). The
-DOS convention is to use a carriage-return and a linefeed at the end of
-a line. The Mac convention is to use just carriage-return.
+ @dfn{End of line conversion} handles three different conventions
+used on various systems for representing end of line in files. The
+Unix convention, used on GNU and Unix systems, is to use the linefeed
+character (also called newline). The DOS convention, used on
+MS-Windows and MS-DOS systems, is to use a carriage-return and a
+linefeed at the end of a line. The Mac convention is to use just
+carriage-return.
@cindex base coding system
@cindex variant coding system
@@ -639,7 +666,8 @@
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.
- The coding system @code{emacs-mule} specifies that the data is
+@vindex emacs-internal@r{ coding system}
+ The coding system @code{emacs-internal} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.
@@ -647,20 +675,20 @@
@defun coding-system-get coding-system property
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
-purposes, but one that you might find useful is @code{mime-charset}.
+purposes, but one that you might find useful is @code{:mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:
@example
-(coding-system-get 'iso-latin-1 'mime-charset)
+(coding-system-get 'iso-latin-1 :mime-charset)
@result{} iso-8859-1
-(coding-system-get 'iso-2022-cn 'mime-charset)
+(coding-system-get 'iso-2022-cn :mime-charset)
@result{} iso-2022-cn
-(coding-system-get 'cyrillic-koi8 'mime-charset)
+(coding-system-get 'cyrillic-koi8 :mime-charset)
@result{} koi8-r
@end example
-The value of the @code{mime-charset} property is also defined
+The value of the @code{:mime-charset} property is also defined
as an alias for the coding system.
@end defun
@@ -763,9 +791,11 @@
@end defun
@defun check-coding-system coding-system
-This function checks the validity of @var{coding-system}.
-If that is valid, it returns @var{coding-system}.
-Otherwise it signals an error with condition @code{coding-system-error}.
+This function checks the validity of @var{coding-system}. If that is
+valid, it returns @var{coding-system}. If @var{coding-system} is
+@code{nil}, the function returns @code{nil}. For any other values, it
+signals an error whose @code{error-symbol} is @code{coding-system-error}
+(@pxref{Signaling Errors, signal}).
@end defun
@defun coding-system-eol-type coding-system
@@ -837,8 +867,9 @@
@defun detect-coding-region start end &optional highest
This function chooses a plausible coding system for decoding the text
-from @var{start} to @var{end}. This text should be a byte sequence
-(@pxref{Explicit Encoding}).
+from @var{start} to @var{end}. This text should be a byte sequence,
+i.e.@: unibyte text or multibyte text with only @acronym{ASCII} and
+eight-bit characters (@pxref{Explicit Encoding}).
Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
@@ -1160,10 +1191,12 @@
The result of encoding, and the input to decoding, are not ordinary
text. They logically consist of a series of byte values; that is, a
-series of characters whose codes are in the range 0 through 255. In a
-multibyte buffer or string, character codes 128 through 159 are
-represented by multibyte sequences, but this is invisible to Lisp
-programs.
+series of @acronym{ASCII} and eight-bit characters. In unibyte
+buffers and strings, these characters have codes in the range 0
+through 255. In a multibyte buffer or string, eight-bit characters
+have character codes higher than 255 (@pxref{Text Representations}),
+but Emacs transparently converts them to their single-byte values when
+you encode or decode such text.
The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
@@ -1181,19 +1214,28 @@
Here are the functions to perform explicit encoding or decoding. The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
-discard text properties.
+discard text properties. They also set @code{last-coding-system-used}
+to the precise coding system they used.
-@deffn Command encode-coding-region start end coding-system
+@deffn Command encode-coding-region start end coding-system &optional destination
This command encodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The encoded text replaces the
-original text in the buffer. The result of encoding is logically a
-sequence of bytes, but the buffer remains multibyte if it was multibyte
-before.
-
-This command returns the length of the encoded text.
+to coding system @var{coding-system}. Normally, the encoded text
+replaces the original text in the buffer, but the optional argument
+@var{destination} can change that. If @var{destination} is a buffer,
+the encoded text is inserted in that buffer after point (point does
+not move); if it is @code{t}, the command returns the encoded text as
+a unibyte string without inserting it.
+
+If encoded text is inserted in some buffer, this command returns the
+length of the encoded text.
+
+The result of encoding is logically a sequence of bytes, but the
+buffer remains multibyte if it was multibyte before, and any 8-bit
+bytes are converted to their multibyte representation (@pxref{Text
+Representations}).
@end deffn
-@defun encode-coding-string string coding-system &optional nocopy
+@defun encode-coding-string string coding-system &optional nocopy buffer
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text, except when @var{nocopy} is non-@code{nil}, in which
@@ -1201,24 +1243,36 @@
operation is trivial. The result of encoding is a unibyte string.
@end defun
-@deffn Command decode-coding-region start end coding-system
+@deffn Command decode-coding-region start end coding-system destination
This command decodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The decoded text replaces the
-original text in the buffer. To make explicit decoding useful, the text
-before decoding ought to be a sequence of byte values, but both
-multibyte and unibyte buffers are acceptable.
+to coding system @var{coding-system}. To make explicit decoding
+useful, the text before decoding ought to be a sequence of byte
+values, but both multibyte and unibyte buffers are acceptable (in the
+multibyte case, the raw byte values should be represented as eight-bit
+characters). Normally, the decoded text replaces the original text in
+the buffer, but the optional argument @var{destination} can change
+that. If @var{destination} is a buffer, the decoded text is inserted
+in that buffer after point (point does not move); if it is @code{t},
+the command returns the decoded text as a multibyte string without
+inserting it.
-This command returns the length of the decoded text.
+If decoded text is inserted in some buffer, this command returns the
+length of the decoded text.
@end deffn
-@defun decode-coding-string string coding-system &optional nocopy
-This function decodes the text in @var{string} according to coding
-system @var{coding-system}. It returns a new string containing the
-decoded text, except when @var{nocopy} is non-@code{nil}, in which
-case the function may return @var{string} itself if the decoding
-operation is trivial. To make explicit decoding useful, the contents
-of @var{string} ought to be a sequence of byte values, but a multibyte
-string is acceptable.
+@defun decode-coding-string string coding-system &optional nocopy buffer
+This function decodes the text in @var{string} according to
+@var{coding-system}. It returns a new string containing the decoded
+text, except when @var{nocopy} is non-@code{nil}, in which case the
+function may return @var{string} itself if the decoding operation is
+trivial. To make explicit decoding useful, the contents of
+@var{string} ought to be a unibyte string with a sequence of byte
+values, but a multibyte string is also acceptable (assuming it
+contains 8-bit bytes in their multibyte form).
+
+If optional argument @var{buffer} specifies a buffer, the decoded text
+is inserted in that buffer after point (point does not move). In this
+case, the return value is the length of the decoded text.
@end defun
@defun decode-coding-inserted-region from to filename &optional visit beg end replace
@@ -1236,10 +1290,10 @@
@subsection Terminal I/O Encoding
Emacs can decode keyboard input using a coding system, and encode
-terminal output. This is useful for terminals that transmit or display
-text using a particular encoding such as Latin-1. Emacs does not set
-@code{last-coding-system-used} for encoding or decoding for the
-terminal.
+terminal output. This is useful for terminals that transmit or
+display text using a particular encoding such as Latin-1. Emacs does
+not set @code{last-coding-system-used} for encoding or decoding of
+terminal I/O.
@defun keyboard-coding-system
This function returns the coding system that is in use for decoding