/* Coding system handler (conversion, detection, etc.).
   Copyright (C) 2001, 2002, 2003, 2004, 2005,
     2006, 2007, 2008 Free Software Foundation, Inc.
   Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
     National Institute of Advanced Industrial Science and Technology (AIST)
     Registration Number H14PRO021

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.  */

/*** TABLE OF CONTENTS ***

  0. General comments
  1. Preamble
  2. Emacs' internal format (emacs-mule) handlers
  3. ISO2022 handlers
  4. Shift-JIS and BIG5 handlers
  6. End-of-line handlers
  7. C library functions
  8. Emacs Lisp library functions

*/

/*** 0. General comments ***/


/*** GENERAL NOTE on CODING SYSTEMS ***

  A coding system is an encoding mechanism for one or more character
  sets.  Here's a list of coding systems which Emacs can handle.  When
  we say "decode", it means converting some other coding system to
  Emacs' internal format (emacs-mule), and when we say "encode",
  it means converting the coding system emacs-mule to some other
  coding system.

  0. Emacs' internal format (emacs-mule)

  Emacs itself holds a multi-lingual character in buffers and strings
  in a special format.  Details are described in section 2.

  1. ISO2022

  The most famous coding system for multiple character sets.  X's
  Compound Text, various EUCs (Extended Unix Code), and coding
  systems used in Internet communication such as ISO-2022-JP are
  all variants of ISO2022.  Details are described in section 3.

  2. SJIS (or Shift-JIS or MS-Kanji-Code)

  A coding system to encode the character sets ASCII, JISX0201, and
  JISX0208.  Widely used for PCs in Japan.  Details are described in
  section 4.

  3. BIG5

  A coding system to encode the character sets ASCII and Big5.  Widely
  used for Chinese (mainly in Taiwan and Hong Kong).  Details are
  described in section 4.  In this file, when we write "BIG5"
  (all uppercase), we mean the coding system, and when we write
  "Big5" (capitalized), we mean the character set.

  4. Raw text

  A coding system for text containing random 8-bit code.  Emacs does
  no code conversion on such text except for end-of-line format.

  5. Other

  If a user wants to read/write text encoded in a coding system not
  listed above, the user can supply a decoder and an encoder for it
  as CCL (Code Conversion Language) programs.  Emacs executes the CCL
  program while reading/writing.

  Emacs represents a coding system by a Lisp symbol that has a property
  `coding-system'.  But, before actually using the coding system, the
  information about it is set in a structure of type `struct
  coding_system' for rapid processing.  See section 6 for more details.

*/

/*** GENERAL NOTES on END-OF-LINE FORMAT ***

  How the end-of-line of text is encoded depends on the operating
  system.  For instance, Unix's format is just one byte of `line-feed'
  code, whereas DOS's format is a two-byte sequence of
  `carriage-return' and `line-feed' codes.  MacOS's format is usually
  one byte of `carriage-return'.

  Since text character encoding and end-of-line encoding are
  independent, any coding system described above can have any
  end-of-line format.  So Emacs has information about end-of-line
  format in each coding-system.  See section 6 for more details.

*/
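
/* A minimal sketch of the end-of-line encoding described above, not
   the routine this file actually uses.  The names `sketch_eol' and
   `sketch_encode_eol' are hypothetical, as is the convention of
   returning (size_t) -1 when the destination is too small; the real
   code records such conditions in `coding->result'.  The point is
   only that the end-of-line format is applied independently of the
   character encoding: every byte other than `\n' is copied as is.  */

```c
#include <stddef.h>

/* Hypothetical stand-ins for CODING_EOL_LF, CODING_EOL_CRLF and
   CODING_EOL_CR.  */
enum sketch_eol { SKETCH_EOL_LF, SKETCH_EOL_CRLF, SKETCH_EOL_CR };

/* Copy SRC_BYTES bytes from SRC to DST, rewriting each '\n' in the
   requested end-of-line format.  Return the number of bytes written,
   or (size_t) -1 if DST_BYTES is too small.  */
size_t
sketch_encode_eol (const unsigned char *src, size_t src_bytes,
                   unsigned char *dst, size_t dst_bytes,
                   enum sketch_eol eol)
{
  size_t produced = 0;
  size_t i;

  for (i = 0; i < src_bytes; i++)
    {
      unsigned char c = src[i];
      /* Only LF -> CRLF makes the output longer than the input.  */
      size_t need = (c == '\n' && eol == SKETCH_EOL_CRLF) ? 2 : 1;

      if (dst_bytes - produced < need)
        return (size_t) -1;
      if (c == '\n' && eol == SKETCH_EOL_CRLF)
        {
          dst[produced++] = '\r';
          dst[produced++] = '\n';
        }
      else if (c == '\n' && eol == SKETCH_EOL_CR)
        dst[produced++] = '\r';
      else
        dst[produced++] = c;
    }
  return produced;
}
```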

/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***

  These functions check if a text between SRC and SRC_END is encoded
  in the coding system category XXX.  Each returns an integer value in
  which appropriate flag bits for the category XXX are set.  The flag
  bits are defined in macros CODING_CATEGORY_MASK_XXX.  Below is the
  template for these functions.  If MULTIBYTEP is nonzero, 8-bit codes
  of the range 0x80..0x9F are in multibyte form.  */

detect_coding_emacs_mule (src, src_end, multibytep)
     unsigned char *src, *src_end;

/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***

  These functions decode SRC_BYTES length of unibyte text at SOURCE
  encoded in CODING to Emacs' internal format.  The resulting
  multibyte text goes to a place pointed to by DESTINATION, the length
  of which should not exceed DST_BYTES.

  These functions set the information about original and decoded texts
  in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the decoding
  finished.

  DST_BYTES zero means that the source area and destination area
  overlap, which means that we can produce decoded text until it
  reaches the head of the not-yet-decoded source text.

  Below is a template for these functions.  */

decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;

/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***

  These functions encode SRC_BYTES length text at SOURCE from Emacs'
  internal multibyte format to CODING.  The resulting unibyte text
  goes to a place pointed to by DESTINATION, the length of which
  should not exceed DST_BYTES.

  These functions set the information about original and encoded texts
  in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the encoding
  finished.

  DST_BYTES zero means that the source area and destination area
  overlap, which means that we can produce encoded text until it
  reaches the head of the not-yet-encoded source text.

  Below is a template for these functions.  */

encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;

/*** COMMONLY USED MACROS ***/

/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
   get one and two bytes from the source text, respectively.
   If there are not enough bytes in the source, they jump to
   `label_end_of_loop'.  The caller should set variables `coding',
   `src' and `src_end' to appropriate pointers in advance.  These
   macros are called from decoding routines `decode_coding_XXX', thus
   it is assumed that the source text is unibyte.  */

#define ONE_MORE_BYTE(c1)                                       \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
  } while (0)

#define TWO_MORE_BYTES(c1, c2)                                  \
  do {                                                          \
    if (src + 1 >= src_end)                                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    c2 = *src++;                                                \
  } while (0)


/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
   form if MULTIBYTEP is nonzero.  In addition, if SRC is not less
   than SRC_END, return with RET.  */

#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep, ret)      \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        return ret;                                             \
      }                                                         \
    c1 = *src++;                                                \
    if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL)         \
      c1 = *src++ - 0x20;                                       \
  } while (0)

/* Set C to the next character at the source text pointed by `src'.
   If there are not enough characters in the source, jump to
   `label_end_of_loop'.  The caller should set variables `coding',
   `src', `src_end', and `translation_table' to appropriate pointers
   in advance.  This macro is used in encoding routines
   `encode_coding_XXX', thus it assumes that the source text is in
   multibyte form except for 8-bit characters.  8-bit characters are
   in multibyte form if coding->src_multibyte is nonzero, else they
   are represented by a single byte.  */

#define ONE_MORE_CHAR(c)                                        \
  do {                                                          \
    int len = src_end - src;                                    \
    int bytes;                                                  \
    if (len <= 0)                                               \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    if (coding->src_multibyte                                   \
        || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes))        \
      c = STRING_CHAR_AND_LENGTH (src, len, bytes);             \
    else                                                        \
      c = *src, bytes = 1;                                      \
    if (!NILP (translation_table))                              \
      c = translate_char (translation_table, c, -1, 0, 0);      \
    src += bytes;                                               \
  } while (0)


/* Produce a multibyte form of character C to `dst'.  Jump to
   `label_end_of_loop' if there's not enough space at `dst'.

   If we are now in the middle of a composition sequence, the decoded
   character may be ALTCHAR (for the current composition).  In that
   case, the character goes to coding->cmp_data->data instead of
   `dst'.

   This macro is used in decoding routines.  */

#define EMIT_CHAR(c)                                                    \
  do {                                                                  \
    if (! COMPOSING_P (coding)                                          \
        || coding->composing == COMPOSITION_RELATIVE                    \
        || coding->composing == COMPOSITION_WITH_RULE)                  \
      {                                                                 \
        int bytes = CHAR_BYTES (c);                                     \
        if ((dst + bytes) > (dst_bytes ? dst_end : src))                \
          {                                                             \
            coding->result = CODING_FINISH_INSUFFICIENT_DST;            \
            goto label_end_of_loop;                                     \
          }                                                             \
        dst += CHAR_STRING (c, dst);                                    \
        coding->produced_char++;                                        \
      }                                                                 \
                                                                        \
    if (COMPOSING_P (coding)                                            \
        && coding->composing != COMPOSITION_RELATIVE)                   \
      {                                                                 \
        CODING_ADD_COMPOSITION_COMPONENT (coding, c);                   \
        coding->composition_rule_follows                                \
          = coding->composing != COMPOSITION_WITH_ALTCHARS;             \
      }                                                                 \
  } while (0)


#define EMIT_ONE_BYTE(c)                                        \
  do {                                                          \
    if (dst >= (dst_bytes ? dst_end : src))                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c;                                                 \
  } while (0)

#define EMIT_TWO_BYTES(c1, c2)                                  \
  do {                                                          \
    if (dst + 2 > (dst_bytes ? dst_end : src))                  \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c1, *dst++ = c2;                                   \
  } while (0)

#define EMIT_BYTES(from, to)                                    \
  do {                                                          \
    if (dst + (to - from) > (dst_bytes ? dst_end : src))        \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    while (from < to)                                           \
      *dst++ = *from++;                                         \
  } while (0)
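
/* The EMIT_* macros above all test the write position against the
   same limit, `dst_bytes ? dst_end : src'.  The sketch below isolates
   that test as a plain function so the overlapped case is easier to
   see.  `sketch_room_for' is a hypothetical helper, not part of this
   file, and it assumes that DST and the chosen limit point into the
   same buffer.  */

```c
#include <stddef.h>

/* Return nonzero if N more bytes can be written at DST.  When
   DST_BYTES is zero the source and destination areas overlap, so
   the limit is the next unread source byte SRC rather than DST_END;
   writing past it would clobber text not yet converted.  */
int
sketch_room_for (const unsigned char *dst, size_t n,
                 const unsigned char *dst_end,
                 const unsigned char *src, size_t dst_bytes)
{
  const unsigned char *limit = dst_bytes ? dst_end : src;

  return dst <= limit && (size_t) (limit - dst) >= n;
}
```

/* With N == 2 this accepts exactly the positions that EMIT_TWO_BYTES
   accepts, namely those where `dst + 2' does not exceed the limit.  */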


/*** 1. Preamble ***/

#include "composite.h"
#include "intervals.h"
#include "termhooks.h"

#else  /* not emacs */

#endif /* not emacs */

Lisp_Object Qcoding_system, Qeol_type;
Lisp_Object Qbuffer_file_coding_system;
Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
Lisp_Object Qno_conversion, Qundecided;
Lisp_Object Qcoding_system_history;
Lisp_Object Qsafe_chars;
Lisp_Object Qvalid_codes;
Lisp_Object Qascii_incompatible;

extern Lisp_Object Qinsert_file_contents, Qwrite_region;
Lisp_Object Qcall_process, Qcall_process_region;
Lisp_Object Qstart_process, Qopen_network_stream;
Lisp_Object Qtarget_idx;

extern Lisp_Object Qcompletion_ignore_case;

/* If a symbol has this property, evaluate the value to define the
   symbol as a coding system.  */
Lisp_Object Qcoding_system_define_form;

Lisp_Object Vselect_safe_coding_system_function;

int coding_system_require_warning;

/* Mnemonic string for each format of end-of-line.  */
Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
/* Mnemonic string to indicate format of end-of-line is not yet
   decided.  */
Lisp_Object eol_mnemonic_undecided;

/* Format of end-of-line decided by system.  This is CODING_EOL_LF on
   Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac.
   This has an effect only for external encoding (i.e. for output to
   file and process), not for in-buffer or Lisp string encoding.  */

/* Information about which coding system is safe for which chars.
   The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).

   GENERIC-LIST is a list of generic coding systems which can encode
   any characters.

   NON-GENERIC-ALIST is an alist of non-generic coding systems vs the
   corresponding char table that contains safe chars.  */
Lisp_Object Vcoding_system_safe_chars;

Lisp_Object Vcoding_system_list, Vcoding_system_alist;

Lisp_Object Qcoding_system_p, Qcoding_system_error;

/* Coding systems emacs-mule and raw-text are for converting only
   end-of-line format.  */
Lisp_Object Qemacs_mule, Qraw_text;

/* Coding-systems are handed between Emacs Lisp programs and C internal
   routines by the following three variables.  */
/* Coding-system for reading files and receiving data from process.  */
Lisp_Object Vcoding_system_for_read;
/* Coding-system for writing files and sending data to process.  */
Lisp_Object Vcoding_system_for_write;
/* Coding-system actually used in the latest I/O.  */
Lisp_Object Vlast_coding_system_used;

/* A vector of length 256 which contains information about special
   Latin codes (especially for dealing with Microsoft codes).  */
Lisp_Object Vlatin_extra_code_table;

/* Flag to inhibit code conversion of end-of-line format.  */
int inhibit_eol_conversion;

/* Flag to inhibit ISO2022 escape sequence detection.  */
int inhibit_iso_escape_detection;

/* Flag to make buffer-file-coding-system inherit from process-coding.  */
int inherit_process_coding_system;

/* Coding system to be used to encode text for terminal display when
   terminal coding system is nil.  */
struct coding_system safe_terminal_coding;

/* Default coding system to be used to write a file.  */
struct coding_system default_buffer_file_coding;

Lisp_Object Vfile_coding_system_alist;
Lisp_Object Vprocess_coding_system_alist;
Lisp_Object Vnetwork_coding_system_alist;

Lisp_Object Vlocale_coding_system;

Lisp_Object Qcoding_category, Qcoding_category_index;

/* List of symbols `coding-category-xxx' ordered by priority.  */
Lisp_Object Vcoding_category_list;

/* Table of coding categories (Lisp symbols).  */
Lisp_Object Vcoding_category_table;

/* Table of names of symbol for each coding-category.  */
char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
  "coding-category-emacs-mule",
  "coding-category-sjis",
  "coding-category-iso-7",
  "coding-category-iso-7-tight",
  "coding-category-iso-8-1",
  "coding-category-iso-8-2",
  "coding-category-iso-7-else",
  "coding-category-iso-8-else",
  "coding-category-ccl",
  "coding-category-big5",
  "coding-category-utf-8",
  "coding-category-utf-16-be",
  "coding-category-utf-16-le",
  "coding-category-raw-text",
  "coding-category-binary"
};

/* Table of pointers to coding systems corresponding to each coding
   category.  */
struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];

/* Table of coding category masks.  Nth element is a mask for a coding
   category of which priority is Nth.  */
int coding_priorities[CODING_CATEGORY_IDX_MAX];

/* Flag to tell if we look up translation table on character code
   conversion.  */
Lisp_Object Venable_character_translation;
/* Standard translation table to look up on decoding (reading).  */
Lisp_Object Vstandard_translation_table_for_decode;
/* Standard translation table to look up on encoding (writing).  */
Lisp_Object Vstandard_translation_table_for_encode;

Lisp_Object Qtranslation_table;
Lisp_Object Qtranslation_table_id;
Lisp_Object Qtranslation_table_for_decode;
Lisp_Object Qtranslation_table_for_encode;

/* Alist of charsets vs revision number.  */
Lisp_Object Vcharset_revision_alist;

/* Default coding systems used for process I/O.  */
Lisp_Object Vdefault_process_coding_system;

/* Char table for translating Quail and self-inserting input.  */
Lisp_Object Vtranslation_table_for_input;

/* Global flag to tell that we can't call post-read-conversion and
   pre-write-conversion functions.  Usually the value is zero, but it
   is set to 1 temporarily while such functions are running.  This is
   to avoid infinite recursive calls.  */
static int inhibit_pre_post_conversion;

Lisp_Object Qchar_coding_system;

/* Return `safe-chars' property of CODING_SYSTEM (symbol).  Don't check
   validity of CODING_SYSTEM.  */

Lisp_Object
coding_safe_chars (coding_system)
     Lisp_Object coding_system;
{
  Lisp_Object coding_spec, plist, safe_chars;

  coding_spec = Fget (coding_system, Qcoding_system);
  plist = XVECTOR (coding_spec)->contents[3];
  safe_chars = Fplist_get (plist, Qsafe_chars);
  return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
}

#define CODING_SAFE_CHAR_P(safe_chars, c) \
  (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))


/*** 2. Emacs internal format (emacs-mule) handlers ***/

/* Emacs' internal format for representation of multiple character
   sets is a kind of multi-byte encoding, i.e. characters are
   represented by variable-length sequences of one-byte codes.

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters of the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters of the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit code.

   The other characters are represented by a sequence of `base
   leading-code', optional `extended leading-code', and one or two
   `position-code's.  The length of the sequence is determined by the
   base leading-code.  Leading-code takes the range 0x81 through 0x9D,
   whereas extended leading-code and position-code take the range 0xA0
   through 0xFF.  See `charset.h' for more details about leading-code
   and position-code.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9D + [0xA0..0xFF]+
   ---------------------------------------------

   As this is the internal character representation, the format is
   usually not used externally (i.e. in a file or in data sent to a
   process).  But it is possible to have text externally in this
   format (i.e. by encoding with the coding system `emacs-mule').

   In that case, a sequence of one-byte codes has a slightly different
   form.

   Firstly, all characters in eight-bit-control are represented by
   one-byte sequences which are their 8-bit code.

   Next, character composition data are represented by the byte
   sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
   where,
        METHOD is 0xF0 plus one of the composition methods (enum
        composition_method),

        BYTES is 0xA0 plus the byte length of these composition data,

        CHARS is 0xA0 plus the number of characters composed by these
        data,

        COMPONENTs are characters in multibyte form or composition
        rules encoded by two bytes of ASCII codes.

   In addition, for backward compatibility, the following formats are
   also recognized as composition data on decoding.

   0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ

   Here,
        MSEQ is a multibyte form but in this special format:
          ASCII: 0xA0 ASCII_CODE+0x80,
          other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
        RULE is a one-byte code of the range 0xA0..0xF0 that
        represents a composition rule.
  */

enum emacs_code_class_type emacs_code_class[256];
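
/* As an illustration of the 8-bit rules in the note above, here is a
   minimal sketch that maps one 8-bit code to its in-buffer byte
   sequence.  It is not taken from this file: the function name is
   hypothetical, and the value 0x9E for LEADING_CODE_8_BIT_CONTROL is
   an assumption (the real definition is in `charset.h').  Characters
   that need a leading-code and position-codes are out of scope.  */

```c
#include <stddef.h>

/* Assumed value; see `charset.h' for the real definition.  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Store the internal form of the 8-bit code C at BUF, which must
   have room for two bytes.  Return the number of bytes stored.  */
size_t
sketch_eight_bit_to_internal (unsigned char c, unsigned char *buf)
{
  if (c >= 0x80 && c < 0xA0)
    {
      /* eight-bit-control: leading code, then the code + 0x20.  */
      buf[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
      buf[1] = c + 0x20;
      return 2;
    }
  /* ASCII (0x00..0x7F) and eight-bit-graphic (0xA0..0xFF) are
     stored as a single byte holding the code itself.  */
  buf[0] = c;
  return 1;
}
```

/* When the same text is written out with the coding system
   `emacs-mule', the two-byte form is not used: eight-bit-control
   characters are again single bytes, as the note says.  */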

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in Emacs' internal format.  If it is,
   return CODING_CATEGORY_MASK_EMACS_MULE, else return 0.  */

detect_coding_emacs_mule (src, src_end, multibytep)
     unsigned char *src, *src_end;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep,
                                     CODING_CATEGORY_MASK_EMACS_MULE);
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, 0);
          if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
      else if (c >= 0x80 && c < 0xA0)
            /* Old leading code for a composite character.  */
              unsigned char *src_base = src - 1;
              if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
              src = src_base + bytes;


/* Record the starting position START and METHOD of one composition.  */

#define CODING_ADD_COMPOSITION_START(coding, start, method)     \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + cmp_data->used;                \
    coding->cmp_data_start = cmp_data->used;                    \
    data[1] = cmp_data->char_offset + start;                    \
    data[3] = (int) method;                                     \
    cmp_data->used += 4;                                        \

/* Record the ending position END of the current composition.  */

#define CODING_ADD_COMPOSITION_END(coding, end)                 \
  do {                                                          \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + coding->cmp_data_start;        \
    data[0] = cmp_data->used - coding->cmp_data_start;          \
    data[2] = cmp_data->char_offset + end;                      \
  } while (0)

/* Record one COMPONENT (alternate character or composition rule).  */

#define CODING_ADD_COMPOSITION_COMPONENT(coding, component)             \
  do {                                                                  \
    coding->cmp_data->data[coding->cmp_data->used++] = component;       \
    if (coding->cmp_data->used - coding->cmp_data_start                 \
        == COMPOSITION_DATA_MAX_BUNCH_LENGTH)                           \
      {                                                                 \
        CODING_ADD_COMPOSITION_END (coding, coding->produced_char);     \
        coding->composing = COMPOSITION_NO;                             \
      }                                                                 \
  } while (0)

/* Get one byte from the data pointed to by SRC and increment SRC.  If
   SRC is not less than SRC_END, return -1 without incrementing SRC.  */

#define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)


/* Decode a character represented as a component of composition
   sequence of Emacs 20 style at SRC.  Set C to that character, store
   its multibyte form sequence at P, and set P to the end of that
   sequence.  If no valid character is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p)                \
    c = SAFE_ONE_MORE_BYTE ();                                  \
    if (CHAR_HEAD_P (c))                                        \
    else if (c == 0xA0)                                         \
        c = SAFE_ONE_MORE_BYTE ();                              \
    else if (BASE_LEADING_CODE_P (c - 0x20))                    \
        unsigned char *p0 = p;                                  \
        bytes = BYTES_BY_CHAR_HEAD (c);                         \
            c = SAFE_ONE_MORE_BYTE ();                          \
        if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes)      \
            || (coding->flags /* We are recovering a file.  */  \
                && p0[0] == LEADING_CODE_8_BIT_CONTROL          \
                && ! CHAR_HEAD_P (p0[1])))                      \
          c = STRING_CHAR (p0, bytes);                          \


/* Decode a composition rule represented as a component of composition
   sequence of Emacs 20 style at SRC.  Set C to the rule.  If no
   valid rule is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_RULE(c)           \
  do {                                                  \
    c = SAFE_ONE_MORE_BYTE ();                          \
    c -= 0xA0;                                          \
    if (c < 0 || c >= 81)                               \
      c = -1;                                           \
    else                                                \
      {                                                 \
        gref = c / 9, nref = c % 9;                     \
        c = COMPOSITION_ENCODE_RULE (gref, nref);       \
      }                                                 \
  } while (0)
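
/* The arithmetic in DECODE_EMACS_MULE_COMPOSITION_RULE can be tried
   in isolation with the sketch below.  `sketch_decode_rule_byte' is
   a hypothetical helper, not part of this file.  Removing the 0xA0
   offset before the range check is an assumption taken from the
   format note in section 2, which gives the RULE byte the range
   0xA0..0xF0, i.e. 81 values, or 9 x 9 pairs of reference points.  */

```c
/* Split the Emacs 20 style rule byte B into the reference points
   *GREF and *NREF, each in the range 0..8.  Return 0 on success, or
   -1 (leaving the outputs untouched) if B is not a rule byte.  */
int
sketch_decode_rule_byte (int b, int *gref, int *nref)
{
  int c = b - 0xA0;

  if (c < 0 || c >= 81)
    return -1;
  *gref = c / 9;
  *nref = c % 9;
  return 0;
}
```

/* The real macro goes one step further and packs the two reference
   points back into a single value with COMPOSITION_ENCODE_RULE.  */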


/* Decode composition sequence encoded by `emacs-mule' at the source
   pointed by SRC.  SRC_END is the end of source.  Store information
   of the composition in CODING->cmp_data.

   For backward compatibility, decode also a composition sequence of
   Emacs 20 style.  In that case, the composition sequence contains
   characters that should be extracted into a buffer or string.  Store
   those characters at *DESTINATION in multibyte form.

   If we encounter an invalid byte sequence, return 0.
   If we encounter an insufficient source or destination, or
   insufficient space in CODING->cmp_data, return 1.
   Otherwise, return consumed bytes in the source.

decode_composition_emacs_mule (coding, src, src_end,
                               destination, dst_end, dst_bytes)
     struct coding_system *coding;
     const unsigned char *src, *src_end;
     unsigned char **destination, *dst_end;
  unsigned char *dst = *destination;
  int method, data_len, nchars;
  const unsigned char *src_base = src++;
  /* Store components of composition.  */
  int component[COMPOSITION_DATA_MAX_BUNCH_LENGTH];
  /* Store multibyte form of characters to be composed.  This is for
     Emacs 20 style composition sequence.  */
  unsigned char buf[MAX_COMPOSITION_COMPONENTS * MAX_MULTIBYTE_LENGTH];
  unsigned char *bufp = buf;
  int c, i, gref, nref;

  if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
      >= COMPOSITION_DATA_SIZE)
      coding->result = CODING_FINISH_INSUFFICIENT_CMP;
  if (c - 0xF0 >= COMPOSITION_RELATIVE
      && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
      with_rule = (method == COMPOSITION_WITH_RULE
                   || method == COMPOSITION_WITH_RULE_ALTCHARS);
          || src_base + data_len > src_end)
      for (ncomponent = 0; src < src_base + data_len; ncomponent++)
          /* If it is longer than this, it can't be valid.  */
          if (ncomponent >= COMPOSITION_DATA_MAX_BUNCH_LENGTH)
          if (ncomponent % 2 && with_rule)
              ONE_MORE_BYTE (gref);
              ONE_MORE_BYTE (nref);
              c = COMPOSITION_ENCODE_RULE (gref, nref);
              if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
                  || (coding->flags /* We are recovering a file.  */
                      && src[0] == LEADING_CODE_8_BIT_CONTROL
                      && ! CHAR_HEAD_P (src[1])))
                c = STRING_CHAR (src, bytes);
          component[ncomponent] = c;
      /* This may be an old Emacs 20 style format.  See the comment at
         the section 2 of this file.  */
      while (src < src_end && !CHAR_HEAD_P (*src)) src++;
          && !(coding->mode & CODING_MODE_LAST_BLOCK))
        goto label_end_of_loop;
          method = COMPOSITION_RELATIVE;
          for (ncomponent = 0; ncomponent < MAX_COMPOSITION_COMPONENTS;)
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
          method = COMPOSITION_WITH_RULE;
          DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
               ncomponent < MAX_COMPOSITION_COMPONENTS * 2 - 1;)
              DECODE_EMACS_MULE_COMPOSITION_RULE (c);
              component[ncomponent++] = c;
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
          nchars = (ncomponent + 1) / 2;
  if (buf == bufp || dst + (bufp - buf) <= (dst_bytes ? dst_end : src))
      CODING_ADD_COMPOSITION_START (coding, coding->produced_char, method);
      for (i = 0; i < ncomponent; i++)
        CODING_ADD_COMPOSITION_COMPONENT (coding, component[i]);
      CODING_ADD_COMPOSITION_END (coding, coding->produced_char + nchars);
          unsigned char *p = buf;
          EMIT_BYTES (p, bufp);
          *destination += bufp - buf;
          coding->produced_char += nchars;
      return (src - src_base);

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code, or
     when there's not enough destination area to produce a
     character.  */
  const unsigned char *src_base;

  coding->produced_char = 0;
  while ((src_base = src) < src_end)
      unsigned char tmp[MAX_MULTIBYTE_LENGTH];
      const unsigned char *p;
          if (coding->eol_type == CODING_EOL_CR)
          else if (coding->eol_type == CODING_EOL_CRLF)
          coding->produced_char++;
      else if (*src == '\n')
          if ((coding->eol_type == CODING_EOL_CR
               || coding->eol_type == CODING_EOL_CRLF)
              && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
          coding->produced_char++;
      else if (*src == 0x80 && coding->cmp_data)
          /* Start of composition data.  */
          int consumed = decode_composition_emacs_mule (coding, src, src_end,
            goto label_end_of_loop;
          else if (consumed > 0)
          bytes = CHAR_STRING (*src, tmp);
      else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
               || (coding->flags /* We are recovering a file.  */
                   && src[0] == LEADING_CODE_8_BIT_CONTROL
                   && ! CHAR_HEAD_P (src[1])))
          bytes = BYTES_BY_CHAR_HEAD (*src);
          for (i = 1; i < bytes; i++)
              if (CHAR_HEAD_P (c))
              bytes = CHAR_STRING (*src_base, tmp);
      if (dst + bytes >= (dst_bytes ? dst_end : src))
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
      while (bytes--) *dst++ = *p++;
      coding->produced_char++;
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;


/* Encode composition data stored at DATA into a special byte sequence
   starting by 0x80.  Update CODING->cmp_data_start and maybe
   CODING->cmp_data for the next call.  */

#define ENCODE_COMPOSITION_EMACS_MULE(coding, data)                     \
    unsigned char buf[1024], *p0 = buf, *p;                             \
    int len = data[0];                                                  \
    buf[1] = 0xF0 + data[3];    /* METHOD */                            \
    buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */           \
    if (data[3] == COMPOSITION_WITH_RULE                                \
        || data[3] == COMPOSITION_WITH_RULE_ALTCHARS)                   \
        p += CHAR_STRING (data[4], p);                                  \
        for (i = 5; i < len; i += 2)                                    \
            COMPOSITION_DECODE_RULE (data[i], gref, nref);              \
            *p++ = 0x20 + gref;                                         \
            *p++ = 0x20 + nref;                                         \
            p += CHAR_STRING (data[i + 1], p);                          \
        for (i = 4; i < len; i++)                                       \
          p += CHAR_STRING (data[i], p);                                \
    buf[2] = 0xA0 + (p - buf);  /* COMPONENTS-BYTES */                  \
    if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src))              \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;                \
        goto label_end_of_loop;                                         \
    coding->cmp_data_start += data[0];                                  \
    if (coding->cmp_data_start == coding->cmp_data->used                \
        && coding->cmp_data->next)                                      \
        coding->cmp_data = coding->cmp_data->next;                      \
        coding->cmp_data_start = 0;                                     \


static void encode_eol P_ ((struct coding_system *, const unsigned char *,
                            unsigned char *, int, int));

encode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  const unsigned char *src_base;
  Lisp_Object translation_table;

  translation_table = Qnil;

  /* Optimization for the case that there's no composition.  */
  if (!coding->cmp_data || coding->cmp_data->used == 0)
      encode_eol (coding, source, destination, src_bytes, dst_bytes);

  char_offset = coding->cmp_data->char_offset;
  data = coding->cmp_data->data + coding->cmp_data_start;
      /* If SRC starts a composition, encode the information about the
         composition in advance.  */
      if (coding->cmp_data_start < coding->cmp_data->used
          && char_offset + coding->consumed_char == data[1])
          ENCODE_COMPOSITION_EMACS_MULE (coding, data);
          char_offset = coding->cmp_data->char_offset;
          data = coding->cmp_data->data + coding->cmp_data_start;
      if (c == '\n' && (coding->eol_type == CODING_EOL_CRLF
                        || coding->eol_type == CODING_EOL_CR))
          if (coding->eol_type == CODING_EOL_CRLF)
            EMIT_TWO_BYTES ('\r', c);
            EMIT_ONE_BYTE ('\r');
      else if (SINGLE_BYTE_CHAR_P (c))
          if (coding->flags && ! ASCII_BYTE_P (c))
              /* As we are auto saving, retain the multibyte form for
                 8-bit chars.  */
              unsigned char buf[MAX_MULTIBYTE_LENGTH];
              int bytes = CHAR_STRING (c, buf);
                EMIT_ONE_BYTE (buf[0]);
                EMIT_TWO_BYTES (buf[0], buf[1]);
        EMIT_BYTES (src_base, src);
      coding->consumed_char++;
  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;
/*** 3. ISO2022 handlers ***/

/* The following note describes the coding system ISO2022 briefly.
   Since the intention of this note is to help understand the
   functions in this file, some parts are NOT ACCURATE or are OVERLY
   SIMPLIFIED.  For thorough understanding, please refer to the
   original document of ISO2022.  This is equivalent to the standard
   ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).

   ISO2022 provides many mechanisms to encode several character sets
   in 7-bit and 8-bit environments.  For 7-bit environments, all text
   is encoded using bytes less than 128.  This may make the encoded
   text a little bit longer, but the text passes more easily through
   several types of gateway, some of which strip off the MSB (Most
   Significant Bit).

   There are two kinds of character sets: control character sets and
   graphic character sets.  The former contain control characters such
   as `newline' and `escape' to provide control functions (control
   functions are also provided by escape sequences).  The latter
   contain graphic characters such as 'A' and '-'.  Emacs recognizes
   two control character sets and many graphic character sets.

   Graphic character sets are classified into one of the following
   four classes, according to the number of bytes (DIMENSION) and
   number of characters in one dimension (CHARS) of the set:
     - DIMENSION1_CHARS94
     - DIMENSION1_CHARS96
     - DIMENSION2_CHARS94
     - DIMENSION2_CHARS96

   In addition, each character set is assigned an identification tag,
   unique for each set, called the "final character" (denoted as <F>
   hereafter).  The <F> of each character set is decided by ECMA(*)
   when it is registered in ISO.  The code range of <F> is 0x30..0x7F
   (0x30..0x3F are for private use only).

   Note (*): ECMA = European Computer Manufacturers Association

   Here are examples of graphic character sets [NAME(<F>)]:
     o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
     o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
     o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
     o DIMENSION2_CHARS96 -- none for the moment

   A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
     C0 [0x00..0x1F] -- control character plane 0
     GL [0x20..0x7F] -- graphic character plane 0
     C1 [0x80..0x9F] -- control character plane 1
     GR [0xA0..0xFF] -- graphic character plane 1

   A control character set is directly designated and invoked to C0 or
   C1 by an escape sequence.  The most common case is that:
     - ISO646's control character set is designated/invoked to C0, and
     - ISO6429's control character set is designated/invoked to C1,
   and usually these designations/invocations are omitted in encoded
   text.  In a 7-bit environment, only C0 can be used, and a control
   character for C1 is encoded by an appropriate escape sequence to
   fit into the environment.  All control characters for C1 are
   defined to have corresponding escape sequences.

   A graphic character set is at first designated to one of four
   graphic registers (G0 through G3), then these graphic registers are
   invoked to GL or GR.  These designations and invocations can be
   done independently.  The most common case is that G0 is invoked to
   GL, G1 is invoked to GR, and ASCII is designated to G0.  Usually
   these invocations and designations are omitted in encoded text.
   In a 7-bit environment, only GL can be used.

   When a graphic character set of CHARS94 is invoked to GL, codes
   0x20 and 0x7F of the GL area work as control characters SPACE and
   DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
   be used.

   There are two ways of invocation: locking-shift and single-shift.
   With locking-shift, the invocation lasts until the next different
   invocation, whereas with single-shift, the invocation affects the
   following character only and doesn't affect the locking-shift
   state.  Invocations are done by the following control characters or
   escape sequences:

   ----------------------------------------------------------------------
   abbrev  function                 cntrl  escape seq  description
   ----------------------------------------------------------------------
   SI/LS0  (shift-in)               0x0F   none        invoke G0 into GL
   SO/LS1  (shift-out)              0x0E   none        invoke G1 into GL
   LS2     (locking-shift-2)        none   ESC 'n'     invoke G2 into GL
   LS3     (locking-shift-3)        none   ESC 'o'     invoke G3 into GL
   LS1R    (locking-shift-1 right)  none   ESC '~'     invoke G1 into GR (*)
   LS2R    (locking-shift-2 right)  none   ESC '}'     invoke G2 into GR (*)
   LS3R    (locking-shift-3 right)  none   ESC '|'     invoke G3 into GR (*)
   SS2     (single-shift-2)         0x8E   ESC 'N'     invoke G2 for one char
   SS3     (single-shift-3)         0x8F   ESC 'O'     invoke G3 for one char
   ----------------------------------------------------------------------
   (*) These are not used by any known coding system.

   Control characters for these functions are defined by macros
   ISO_CODE_XXX in `coding.h'.

   Designations are done by the following escape sequences:
   ----------------------------------------------------------------------
   escape sequence    description
   ----------------------------------------------------------------------
   ESC '(' <F>        designate DIMENSION1_CHARS94<F> to G0
   ESC ')' <F>        designate DIMENSION1_CHARS94<F> to G1
   ESC '*' <F>        designate DIMENSION1_CHARS94<F> to G2
   ESC '+' <F>        designate DIMENSION1_CHARS94<F> to G3
   ESC ',' <F>        designate DIMENSION1_CHARS96<F> to G0 (*)
   ESC '-' <F>        designate DIMENSION1_CHARS96<F> to G1
   ESC '.' <F>        designate DIMENSION1_CHARS96<F> to G2
   ESC '/' <F>        designate DIMENSION1_CHARS96<F> to G3
   ESC '$' '(' <F>    designate DIMENSION2_CHARS94<F> to G0 (**)
   ESC '$' ')' <F>    designate DIMENSION2_CHARS94<F> to G1
   ESC '$' '*' <F>    designate DIMENSION2_CHARS94<F> to G2
   ESC '$' '+' <F>    designate DIMENSION2_CHARS94<F> to G3
   ESC '$' ',' <F>    designate DIMENSION2_CHARS96<F> to G0 (*)
   ESC '$' '-' <F>    designate DIMENSION2_CHARS96<F> to G1
   ESC '$' '.' <F>    designate DIMENSION2_CHARS96<F> to G2
   ESC '$' '/' <F>    designate DIMENSION2_CHARS96<F> to G3
   ----------------------------------------------------------------------

   In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
   of dimension 1, chars 94, and final character <F>, etc...

   Note (*): Although these designations are not allowed in ISO2022,
   Emacs accepts them on decoding, and produces them on encoding
   CHARS96 character sets in a coding system which is characterized as
   7-bit environment, non-locking-shift, and non-single-shift.

   Note (**): If <F> is '@', 'A', or 'B', the intermediate character
   '(' can be omitted.  We refer to this as "short-form" hereafter.

   Now you may notice that there are a lot of ways of encoding the
   same multilingual text in ISO2022.  Actually, there exist many
   coding systems such as Compound Text (used in X11's inter-client
   communication), ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
   (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
   localized platforms), and all of these are variants of ISO2022.

   In addition to the above, Emacs handles two more kinds of escape
   sequences: ISO6429's direction specification and Emacs' private
   sequence for specifying character composition.

   ISO6429's direction specification takes the following form:
     o CSI ']'      -- end of the current direction
     o CSI '0' ']'  -- end of the current direction
     o CSI '1' ']'  -- start of left-to-right text
     o CSI '2' ']'  -- start of right-to-left text
   The control character CSI (0x9B: control sequence introducer) is
   abbreviated to the escape sequence ESC '[' in a 7-bit environment.

   Character composition specification takes the following form:
     o ESC '0' -- start relative composition
     o ESC '1' -- end composition
     o ESC '2' -- start rule-base composition (*)
     o ESC '3' -- start relative composition with alternate chars (**)
     o ESC '4' -- start rule-base composition with alternate chars (**)
   Since these are not standard escape sequences of any ISO standard,
   the use of them with these meanings is restricted to Emacs only.

   (*) This form is used only in Emacs 20.5 and older versions,
   but the newer versions can safely decode it.
   (**) This form is used only in Emacs 21.1 and newer versions,
   and the older versions can't decode it.

   Here's a list of example usages of these composition escape
   sequences (categorized by `enum composition_method').

   COMPOSITION_RELATIVE:
     ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE:
     ESC 2 CHAR [ RULE CHAR ] ESC 1
   COMPOSITION_WITH_ALTCHARS:
     ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE_ALTCHARS:
     ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
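
/* Illustration only, not part of coding.c.  The code-area split
   (C0, GL, C1, GR) and the designation table in the note above can be
   sketched as two small helpers.  The names `iso2022_area_of' and
   `iso2022_designation' are hypothetical, and the short-form rule is
   assumed to apply only to DIMENSION2_CHARS94 sets designated to G0,
   as Note (**) describes.  For example, designating JISX0208('B') to
   G0 yields ESC '$' 'B' in short form and ESC '$' '(' 'B' otherwise.  */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; none of this exists in coding.c.  */
enum iso2022_area { AREA_C0, AREA_GL, AREA_C1, AREA_GR };

/* Classify one byte into the four code areas listed in the note.  */
static enum iso2022_area
iso2022_area_of (unsigned char byte)
{
  if (byte < 0x20)
    return AREA_C0;
  if (byte < 0x80)
    return AREA_GL;
  if (byte < 0xA0)
    return AREA_C1;
  return AREA_GR;
}

/* Write the designation sequence for a charset of DIMENSION (1 or 2)
   and CHARS (94 or 96) with final character FINAL to register REG
   (0..3) into BUF, which must hold at least 4 bytes.  If SHORT_FORM is
   nonzero, omit '(' from ESC '$' '(' <F> when <F> is '@', 'A', or 'B'.
   Return the number of bytes written, or 0 for invalid arguments.  */
static size_t
iso2022_designation (unsigned char *buf, int dimension, int chars,
                     int reg, unsigned char final, int short_form)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  if ((dimension != 1 && dimension != 2)
      || (chars != 94 && chars != 96)
      || reg < 0 || reg > 3)
    return 0;

  buf[n++] = 0x1B;              /* ESC */
  if (dimension == 2)
    buf[n++] = '$';
  /* The intermediate character selects both CHARS and the register.  */
  if (! (short_form && dimension == 2 && chars == 94 && reg == 0
         && final >= '@' && final <= 'B'))
    buf[n++] = (unsigned char) (chars == 94
                                ? intermediate_94[reg]
                                : intermediate_96[reg]);
  buf[n++] = final;
  return n;
}
```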
enum iso_code_class_type iso_code_class[256];
1384 #define CHARSET_OK(idx, charset, c) \
1385 (coding_system_table[idx] \
1386 && (charset == CHARSET_ASCII \
1387 || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
1388 CODING_SAFE_CHAR_P (safe_chars, c))) \
1389 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
1391 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
#define SHIFT_OUT_OK(idx) \
  (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)

#define COMPOSITION_OK(idx) \
  (coding_system_table[idx]->composing != COMPOSITION_DISABLED)
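
/* Illustration only, not part of coding.c.  The difference between
   locking-shift and single-shift described in the ISO2022 note of
   this section can be sketched with a tiny state tracker for a 7-bit
   environment.  `struct shift_state', `apply_shift', and
   `register_for_next_char' are hypothetical names, and the LSnR
   functions are left out because the note says no known coding
   system uses them.  */

```c
#include <assert.h>

/* Hypothetical, simplified shift state; not the coding.c layout.  */
struct shift_state
{
  int gl;       /* register locked into GL: 0..3 */
  int single;   /* register for the next character only, or -1 */
};

/* Apply a control byte, or the final byte of an ESC sequence when
   IS_ESC is nonzero.  Return 1 if BYTE was a shift function.  */
static int
apply_shift (struct shift_state *s, int is_esc, unsigned char byte)
{
  if (!is_esc)
    switch (byte)
      {
      case 0x0F: s->gl = 0; return 1;   /* SI/LS0 */
      case 0x0E: s->gl = 1; return 1;   /* SO/LS1 */
      default: return 0;
      }
  switch (byte)
    {
    case 'n': s->gl = 2; return 1;      /* LS2 */
    case 'o': s->gl = 3; return 1;      /* LS3 */
    case 'N': s->single = 2; return 1;  /* SS2 */
    case 'O': s->single = 3; return 1;  /* SS3 */
    default: return 0;
    }
}

/* Return the register that interprets the next graphic character and
   clear any pending single shift, leaving the locking shift intact.  */
static int
register_for_next_char (struct shift_state *s)
{
  int reg = s->single >= 0 ? s->single : s->gl;

  s->single = -1;
  return reg;
}
```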
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in ISO2022.  If it is, return an
   integer in which the appropriate flag bits, any of:
     CODING_CATEGORY_MASK_ISO_7
     CODING_CATEGORY_MASK_ISO_7_TIGHT
     CODING_CATEGORY_MASK_ISO_8_1
     CODING_CATEGORY_MASK_ISO_8_2
     CODING_CATEGORY_MASK_ISO_7_ELSE
     CODING_CATEGORY_MASK_ISO_8_ELSE
   are set.  If a code which should never appear in ISO2022 is found,
   return 0.

   If *latin_extra_code_state is zero and Latin extra codes are found,
   set *latin_extra_code_state to 1 and return 0.  If it is nonzero,
   accept Latin extra codes.  */
detect_coding_iso2022 (src, src_end, multibytep, latin_extra_code_state)
     unsigned char *src, *src_end;
     int *latin_extra_code_state;
1421 int mask
= CODING_CATEGORY_MASK_ISO
;
1423 int reg
[4], shift_out
= 0, single_shifting
= 0;
1425 /* Dummy for ONE_MORE_BYTE. */
1426 struct coding_system dummy_coding
;
1427 struct coding_system
*coding
= &dummy_coding
;
1428 Lisp_Object safe_chars
;
1430 reg
[0] = CHARSET_ASCII
, reg
[1] = reg
[2] = reg
[3] = -1;
1433 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
, mask
& mask_found
);
1438 if (inhibit_iso_escape_detection
)
1440 single_shifting
= 0;
1441 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
, mask
& mask_found
);
1442 if (c
>= '(' && c
<= '/')
1444 /* Designation sequence for a charset of dimension 1. */
1445 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1
, multibytep
, mask
& mask_found
);
1446 if (c1
< ' ' || c1
>= 0x80
1447 || (charset
= iso_charset_table
[0][c
>= ','][c1
]) < 0)
1448 /* Invalid designation sequence. Just ignore. */
1450 reg
[(c
- '(') % 4] = charset
;
1454 /* Designation sequence for a charset of dimension 2. */
1455 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
, mask
& mask_found
);
1456 if (c
>= '@' && c
<= 'B')
1457 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
1458 reg
[0] = charset
= iso_charset_table
[1][0][c
];
1459 else if (c
>= '(' && c
<= '/')
1461 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1
, multibytep
,
1463 if (c1
< ' ' || c1
>= 0x80
1464 || (charset
= iso_charset_table
[1][c
>= ','][c1
]) < 0)
1465 /* Invalid designation sequence. Just ignore. */
1467 reg
[(c
- '(') % 4] = charset
;
1470 /* Invalid designation sequence. Just ignore. */
1473 else if (c
== 'N' || c
== 'O')
1475 /* ESC <Fe> for SS2 or SS3. */
1476 mask
&= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1479 else if (c
>= '0' && c
<= '4')
1481 /* ESC <Fp> for start/end composition. */
1482 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7
))
1483 mask_found
|= CODING_CATEGORY_MASK_ISO_7
;
1485 mask
&= ~CODING_CATEGORY_MASK_ISO_7
;
1486 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT
))
1487 mask_found
|= CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1489 mask
&= ~CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1490 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1
))
1491 mask_found
|= CODING_CATEGORY_MASK_ISO_8_1
;
1493 mask
&= ~CODING_CATEGORY_MASK_ISO_8_1
;
1494 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2
))
1495 mask_found
|= CODING_CATEGORY_MASK_ISO_8_2
;
1497 mask
&= ~CODING_CATEGORY_MASK_ISO_8_2
;
1498 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
))
1499 mask_found
|= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1501 mask
&= ~CODING_CATEGORY_MASK_ISO_7_ELSE
;
1502 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
))
1503 mask_found
|= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1505 mask
&= ~CODING_CATEGORY_MASK_ISO_8_ELSE
;
1509 /* Invalid escape sequence. Just ignore. */
1512 /* We found a valid designation sequence for CHARSET. */
1513 mask
&= ~CODING_CATEGORY_MASK_ISO_8BIT
;
1514 c
= MAKE_CHAR (charset
, 0, 0);
1515 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7
, charset
, c
))
1516 mask_found
|= CODING_CATEGORY_MASK_ISO_7
;
1518 mask
&= ~CODING_CATEGORY_MASK_ISO_7
;
1519 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT
, charset
, c
))
1520 mask_found
|= CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1522 mask
&= ~CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1523 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
, charset
, c
))
1524 mask_found
|= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1526 mask
&= ~CODING_CATEGORY_MASK_ISO_7_ELSE
;
1527 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
, charset
, c
))
1528 mask_found
|= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1530 mask
&= ~CODING_CATEGORY_MASK_ISO_8_ELSE
;
1534 if (inhibit_iso_escape_detection
)
1536 single_shifting
= 0;
1539 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
)
1540 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
)))
1542 /* Locking shift out. */
1543 mask
&= ~CODING_CATEGORY_MASK_ISO_7BIT
;
1544 mask_found
|= CODING_CATEGORY_MASK_ISO_SHIFT
;
1549 if (inhibit_iso_escape_detection
)
1551 single_shifting
= 0;
1554 /* Locking shift in. */
1555 mask
&= ~CODING_CATEGORY_MASK_ISO_7BIT
;
1556 mask_found
|= CODING_CATEGORY_MASK_ISO_SHIFT
;
1561 single_shifting
= 0;
1565 int newmask
= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1567 if (inhibit_iso_escape_detection
)
1569 if (c
!= ISO_CODE_CSI
)
1571 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1572 & CODING_FLAG_ISO_SINGLE_SHIFT
)
1573 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1574 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1575 & CODING_FLAG_ISO_SINGLE_SHIFT
)
1576 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1577 single_shifting
= 1;
1579 if (VECTORP (Vlatin_extra_code_table
)
1580 && !NILP (XVECTOR (Vlatin_extra_code_table
)->contents
[c
]))
1582 if (! *latin_extra_code_state
)
1584 *latin_extra_code_state
= 1;
1587 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1588 & CODING_FLAG_ISO_LATIN_EXTRA
)
1589 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1590 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1591 & CODING_FLAG_ISO_LATIN_EXTRA
)
1592 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1595 mask_found
|= newmask
;
1602 single_shifting
= 0;
1607 single_shifting
= 0;
1608 if (VECTORP (Vlatin_extra_code_table
)
1609 && !NILP (XVECTOR (Vlatin_extra_code_table
)->contents
[c
]))
1613 if (! *latin_extra_code_state
)
1615 *latin_extra_code_state
= 1;
1618 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1619 & CODING_FLAG_ISO_LATIN_EXTRA
)
1620 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1621 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1622 & CODING_FLAG_ISO_LATIN_EXTRA
)
1623 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1625 mask_found
|= newmask
;
1632 mask
&= ~(CODING_CATEGORY_MASK_ISO_7BIT
1633 | CODING_CATEGORY_MASK_ISO_7_ELSE
);
1634 mask_found
|= CODING_CATEGORY_MASK_ISO_8_1
;
          /* Check the length of succeeding codes of the range
             0xA0..0xFF.  If the byte length is odd, we exclude
             CODING_CATEGORY_MASK_ISO_8_2.  We can check this only
             when we are not single shifting.  */
1639 if (!single_shifting
1640 && mask
& CODING_CATEGORY_MASK_ISO_8_2
)
1645 while (src
< src_end
)
1647 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
,
1654 if (i
& 1 && src
< src_end
)
1655 mask
&= ~CODING_CATEGORY_MASK_ISO_8_2
;
1657 mask_found
|= CODING_CATEGORY_MASK_ISO_8_2
;
1659 /* This means that we have read one extra byte. */
1666 return (mask
& mask_found
);
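
/* Illustration only, not part of coding.c.  The run-length check used
   above to exclude CODING_CATEGORY_MASK_ISO_8_2 can be sketched in
   isolation: a two-byte charset invoked to GR produces runs of bytes
   in 0xA0..0xFF whose length is even.  `gr_runs_look_two_byte' is a
   hypothetical name.  Like the detector, this sketch assumes that an
   odd run reaching the end of the input proves nothing, because the
   text may be cut off there.  */

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every run of bytes in 0xA0..0xFF within SRC[0..LEN) has
   even length, else 0.  An odd run that reaches the end of the input
   is tolerated.  */
static int
gr_runs_look_two_byte (const unsigned char *src, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      size_t run = 0;

      /* Count consecutive GR bytes starting at I.  */
      while (i < len && src[i] >= 0xA0)
        {
          i++;
          run++;
        }
      if ((run & 1) && i < len)
        return 0;
      if (i < len)
        i++;                    /* skip the byte that ended the run */
    }
  return 1;
}
```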
/* Decode a character whose charset is CHARSET, whose 1st position
   code is C1, and whose 2nd position code is C2, and return the
   decoded character code.  If the variable `translation_table' is
   non-nil, return the translated code.  */

#define DECODE_ISO_CHARACTER(charset, c1, c2) \
  (NILP (translation_table) \
   ? MAKE_CHAR (charset, c1, c2) \
   : translate_char (translation_table, -1, charset, c1, c2))
1679 /* Set designation state into CODING. */
1680 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1684 if (final_char < '0' || final_char >= 128) \
1685 goto label_invalid_code; \
1686 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1687 make_number (chars), \
1688 make_number (final_char)); \
1689 c = MAKE_CHAR (charset, 0, 0); \
1691 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
1692 || CODING_SAFE_CHAR_P (safe_chars, c))) \
1694 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1696 && charset == CHARSET_ASCII) \
1698 /* We should insert this designation sequence as is so \
1699 that it is surely written back to a file. */ \
1700 coding->spec.iso2022.last_invalid_designation_register = -1; \
1701 goto label_invalid_code; \
1703 coding->spec.iso2022.last_invalid_designation_register = -1; \
1704 if ((coding->mode & CODING_MODE_DIRECTION) \
1705 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1706 charset = CHARSET_REVERSE_CHARSET (charset); \
1707 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1711 coding->spec.iso2022.last_invalid_designation_register = reg; \
1712 goto label_invalid_code; \
/* Allocate a memory block for storing information about compositions.
   The block is chained to the already allocated blocks.  */

coding_allocate_composition_data (coding, char_offset)
     struct coding_system *coding;
  struct composition_data *cmp_data
    = (struct composition_data *) xmalloc (sizeof *cmp_data);
  cmp_data->char_offset = char_offset;
  cmp_data->prev = coding->cmp_data;
  cmp_data->next = NULL;
  if (coding->cmp_data)
    coding->cmp_data->next = cmp_data;
  coding->cmp_data = cmp_data;
  coding->cmp_data_start = 0;
  coding->composing = COMPOSITION_NO;
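
/* Illustration only, not part of coding.c.  The chaining done by
   coding_allocate_composition_data can be sketched with a stand-alone
   doubly linked list whose tail pointer plays the role of
   coding->cmp_data.  `struct block', `chain_new_block', and
   `free_chain' are hypothetical names, and plain malloc stands in
   for xmalloc, so this sketch has to handle a NULL return itself.  */

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct composition_data.  */
struct block
{
  int char_offset;
  int used;
  struct block *prev, *next;
};

/* Allocate a block, chain it after *TAIL (which may be NULL), and make
   it the new tail.  Return the new block, or NULL if malloc fails.  */
static struct block *
chain_new_block (struct block **tail, int char_offset)
{
  struct block *b = malloc (sizeof *b);

  if (!b)
    return NULL;
  b->char_offset = char_offset;
  b->used = 0;
  b->prev = *tail;
  b->next = NULL;
  if (*tail)
    (*tail)->next = b;
  *tail = b;
  return b;
}

/* Free the whole chain, walking backward from TAIL.  */
static void
free_chain (struct block *tail)
{
  while (tail)
    {
      struct block *prev = tail->prev;

      free (tail);
      tail = prev;
    }
}
```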
/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
   ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
   ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
   ESC 3 : altchar composition :  ESC 3 ALT ... ESC 0 CHAR ... ESC 1
   ESC 4 : alt&rule composition : ESC 4 ALT RULE ... ALT ESC 0 CHAR ... ESC 1
  */
1745 #define DECODE_COMPOSITION_START(c1) \
1747 if (coding->composing == COMPOSITION_DISABLED) \
1749 *dst++ = ISO_CODE_ESC; \
1750 *dst++ = c1 & 0x7f; \
1751 coding->produced_char += 2; \
1753 else if (!COMPOSING_P (coding)) \
1755 /* This is surely the start of a composition. We must be sure \
1756 that coding->cmp_data has enough space to store the \
1757 information about the composition. If not, terminate the \
1758 current decoding loop, allocate one more memory block for \
1759 coding->cmp_data in the caller, then start the decoding \
1760 loop again. We can't allocate memory here directly because \
1761 it may cause buffer/string relocation. */ \
1762 if (!coding->cmp_data \
1763 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1764 >= COMPOSITION_DATA_SIZE)) \
1766 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1767 goto label_end_of_loop; \
1769 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1770 : c1 == '2' ? COMPOSITION_WITH_RULE \
1771 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1772 : COMPOSITION_WITH_RULE_ALTCHARS); \
1773 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1774 coding->composing); \
1775 coding->composition_rule_follows = 0; \
1779 /* We are already handling a composition. If the method is \
1780 the following two, the codes following the current escape \
1781 sequence are actual characters stored in a buffer. */ \
1782 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1783 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1785 coding->composing = COMPOSITION_RELATIVE; \
1786 coding->composition_rule_follows = 0; \
1791 /* Handle composition end sequence ESC 1. */
1793 #define DECODE_COMPOSITION_END(c1) \
1795 if (! COMPOSING_P (coding)) \
1797 *dst++ = ISO_CODE_ESC; \
1799 coding->produced_char += 2; \
1803 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1804 coding->composing = COMPOSITION_NO; \
1808 /* Decode a composition rule from the byte C1 (and maybe one more byte
1809 from SRC) and store one encoded composition rule in
1810 coding->cmp_data. */
1812 #define DECODE_COMPOSITION_RULE(c1) \
1816 if (c1 < 81) /* old format (before ver.21) */ \
1818 int gref = (c1) / 9; \
1819 int nref = (c1) % 9; \
1820 if (gref == 4) gref = 10; \
1821 if (nref == 4) nref = 10; \
1822 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1824 else if (c1 < 93) /* new format (after ver.21) */ \
1826 ONE_MORE_BYTE (c2); \
1827 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1829 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1830 coding->composition_rule_follows = 0; \
1834 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1837 decode_coding_iso2022 (coding
, source
, destination
, src_bytes
, dst_bytes
)
1838 struct coding_system
*coding
;
1839 const unsigned char *source
;
1840 unsigned char *destination
;
1841 int src_bytes
, dst_bytes
;
1843 const unsigned char *src
= source
;
1844 const unsigned char *src_end
= source
+ src_bytes
;
1845 unsigned char *dst
= destination
;
1846 unsigned char *dst_end
= destination
+ dst_bytes
;
1847 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1848 int charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1849 int charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
1850 /* SRC_BASE remembers the start position in source in each loop.
1851 The loop will be exited when there's not enough source code
1852 (within macro ONE_MORE_BYTE), or when there's not enough
1853 destination area to produce a character (within macro
1855 const unsigned char *src_base
;
1857 Lisp_Object translation_table
;
1858 Lisp_Object safe_chars
;
1860 safe_chars
= coding_safe_chars (coding
->symbol
);
1862 if (NILP (Venable_character_translation
))
1863 translation_table
= Qnil
;
1866 translation_table
= coding
->translation_table_for_decode
;
1867 if (NILP (translation_table
))
1868 translation_table
= Vstandard_translation_table_for_decode
;
1871 coding
->result
= CODING_FINISH_NORMAL
;
1880 /* We produce no character or one character. */
1881 switch (iso_code_class
[c1
])
1883 case ISO_0x20_or_0x7F
:
1884 if (COMPOSING_P (coding
) && coding
->composition_rule_follows
)
1886 DECODE_COMPOSITION_RULE (c1
);
1889 if (charset0
< 0 || CHARSET_CHARS (charset0
) == 94)
1891 /* This is SPACE or DEL. */
1892 charset
= CHARSET_ASCII
;
1895 /* This is a graphic character, we fall down ... */
1897 case ISO_graphic_plane_0
:
1898 if (COMPOSING_P (coding
) && coding
->composition_rule_follows
)
1900 DECODE_COMPOSITION_RULE (c1
);
1906 case ISO_0xA0_or_0xFF
:
1907 if (charset1
< 0 || CHARSET_CHARS (charset1
) == 94
1908 || coding
->flags
& CODING_FLAG_ISO_SEVEN_BITS
)
1909 goto label_invalid_code
;
1910 /* This is a graphic character, we fall down ... */
1912 case ISO_graphic_plane_1
:
1914 goto label_invalid_code
;
1919 if (COMPOSING_P (coding
))
1920 DECODE_COMPOSITION_END ('1');
1922 /* All ISO2022 control characters in this class have the
1923 same representation in Emacs internal format. */
1925 && (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
1926 && (coding
->eol_type
== CODING_EOL_CR
1927 || coding
->eol_type
== CODING_EOL_CRLF
))
1929 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
1930 goto label_end_of_loop
;
1932 charset
= CHARSET_ASCII
;
1936 if (COMPOSING_P (coding
))
1937 DECODE_COMPOSITION_END ('1');
1938 goto label_invalid_code
;
1940 case ISO_carriage_return
:
1941 if (COMPOSING_P (coding
))
1942 DECODE_COMPOSITION_END ('1');
1944 if (coding
->eol_type
== CODING_EOL_CR
)
1946 else if (coding
->eol_type
== CODING_EOL_CRLF
)
1949 if (c1
!= ISO_CODE_LF
)
1955 charset
= CHARSET_ASCII
;
1959 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
1960 || CODING_SPEC_ISO_DESIGNATION (coding
, 1) < 0)
1961 goto label_invalid_code
;
1962 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 1;
1963 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1967 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
))
1968 goto label_invalid_code
;
1969 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 0;
1970 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1973 case ISO_single_shift_2_7
:
1974 case ISO_single_shift_2
:
1975 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
))
1976 goto label_invalid_code
;
1977 /* SS2 is handled as an escape sequence of ESC 'N' */
1979 goto label_escape_sequence
;
1981 case ISO_single_shift_3
:
1982 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
))
1983 goto label_invalid_code
;
          /* SS3 is handled as an escape sequence of ESC 'O' */
1986 goto label_escape_sequence
;
1988 case ISO_control_sequence_introducer
:
1989 /* CSI is handled as an escape sequence of ESC '[' ... */
1991 goto label_escape_sequence
;
1995 label_escape_sequence
:
1996 /* Escape sequences handled by Emacs are invocation,
1997 designation, direction specification, and character
1998 composition specification. */
2001 case '&': /* revision of following character set */
2003 if (!(c1
>= '@' && c1
<= '~'))
2004 goto label_invalid_code
;
2006 if (c1
!= ISO_CODE_ESC
)
2007 goto label_invalid_code
;
2009 goto label_escape_sequence
;
2011 case '$': /* designation of 2-byte character set */
2012 if (! (coding
->flags
& CODING_FLAG_ISO_DESIGNATION
))
2013 goto label_invalid_code
;
2015 if (c1
>= '@' && c1
<= 'B')
2016 { /* designation of JISX0208.1978, GB2312.1980,
2018 DECODE_DESIGNATION (0, 2, 94, c1
);
2020 else if (c1
>= 0x28 && c1
<= 0x2B)
2021 { /* designation of DIMENSION2_CHARS94 character set */
2023 DECODE_DESIGNATION (c1
- 0x28, 2, 94, c2
);
2025 else if (c1
>= 0x2C && c1
<= 0x2F)
2026 { /* designation of DIMENSION2_CHARS96 character set */
2028 DECODE_DESIGNATION (c1
- 0x2C, 2, 96, c2
);
2031 goto label_invalid_code
;
2032 /* We must update these variables now. */
2033 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2034 charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
2037 case 'n': /* invocation of locking-shift-2 */
2038 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
2039 || CODING_SPEC_ISO_DESIGNATION (coding
, 2) < 0)
2040 goto label_invalid_code
;
2041 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 2;
2042 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2045 case 'o': /* invocation of locking-shift-3 */
2046 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
2047 || CODING_SPEC_ISO_DESIGNATION (coding
, 3) < 0)
2048 goto label_invalid_code
;
2049 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 3;
2050 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2053 case 'N': /* invocation of single-shift-2 */
2054 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2055 || CODING_SPEC_ISO_DESIGNATION (coding
, 2) < 0)
2056 goto label_invalid_code
;
2057 charset
= CODING_SPEC_ISO_DESIGNATION (coding
, 2);
2059 if (c1
< 0x20 || (c1
>= 0x80 && c1
< 0xA0))
2060 goto label_invalid_code
;
2063 case 'O': /* invocation of single-shift-3 */
2064 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2065 || CODING_SPEC_ISO_DESIGNATION (coding
, 3) < 0)
2066 goto label_invalid_code
;
2067 charset
= CODING_SPEC_ISO_DESIGNATION (coding
, 3);
2069 if (c1
< 0x20 || (c1
>= 0x80 && c1
< 0xA0))
2070 goto label_invalid_code
;
2073 case '0': case '2': case '3': case '4': /* start composition */
2074 DECODE_COMPOSITION_START (c1
);
2077 case '1': /* end composition */
2078 DECODE_COMPOSITION_END (c1
);
2081 case '[': /* specification of direction */
2082 if (coding
->flags
& CODING_FLAG_ISO_NO_DIRECTION
)
2083 goto label_invalid_code
;
2084 /* For the moment, nested direction is not supported.
2085 So, `coding->mode & CODING_MODE_DIRECTION' zero means
2086 left-to-right, and nonzero means right-to-left. */
2090 case ']': /* end of the current direction */
2091 coding
->mode
&= ~CODING_MODE_DIRECTION
;
2093 case '0': /* end of the current direction */
2094 case '1': /* start of left-to-right direction */
2097 coding
->mode
&= ~CODING_MODE_DIRECTION
;
2099 goto label_invalid_code
;
2102 case '2': /* start of right-to-left direction */
2105 coding
->mode
|= CODING_MODE_DIRECTION
;
2107 goto label_invalid_code
;
2111 goto label_invalid_code
;
2116 if (COMPOSING_P (coding
))
2117 DECODE_COMPOSITION_END ('1');
2121 /* CTEXT extended segment:
2122 ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
2123 We keep these bytes as is for the moment.
2124 They may be decoded by post-read-conversion. */
2129 ONE_MORE_BYTE (dim
);
2132 size
= ((M
- 128) * 128) + (L
- 128);
2133 required
= 8 + size
* 2;
2134 if (dst
+ required
> (dst_bytes
? dst_end
: src
))
2135 goto label_end_of_loop
;
2136 *dst
++ = ISO_CODE_ESC
;
2141 dst
+= CHAR_STRING (M
, dst
), produced_chars
++;
2142 dst
+= CHAR_STRING (L
, dst
), produced_chars
++;
2146 dst
+= CHAR_STRING (c1
, dst
), produced_chars
++;
2148 coding
->produced_char
+= produced_chars
;
2152 unsigned char *d
= dst
;
2155 /* XFree86 extension for embedding UTF-8 in CTEXT:
2156 ESC % G --UTF-8-BYTES-- ESC % @
2157 We keep these bytes as is for the moment.
2158 They may be decoded by post-read-conversion. */
2159 if (d
+ 6 > (dst_bytes
? dst_end
: src
))
2160 goto label_end_of_loop
;
2161 *d
++ = ISO_CODE_ESC
;
2165 while (d
+ 1 < (dst_bytes
? dst_end
: src
))
2168 if (c1
== ISO_CODE_ESC
2169 && src
+ 1 < src_end
2176 d
+= CHAR_STRING (c1
, d
), produced_chars
++;
2178 if (d
+ 3 > (dst_bytes
? dst_end
: src
))
2179 goto label_end_of_loop
;
2180 *d
++ = ISO_CODE_ESC
;
2184 coding
->produced_char
+= produced_chars
+ 3;
2187 goto label_invalid_code
;
2191 if (! (coding
->flags
& CODING_FLAG_ISO_DESIGNATION
))
2192 goto label_invalid_code
;
2193 if (c1
>= 0x28 && c1
<= 0x2B)
2194 { /* designation of DIMENSION1_CHARS94 character set */
2196 DECODE_DESIGNATION (c1
- 0x28, 1, 94, c2
);
2198 else if (c1
>= 0x2C && c1
<= 0x2F)
2199 { /* designation of DIMENSION1_CHARS96 character set */
2201 DECODE_DESIGNATION (c1
- 0x2C, 1, 96, c2
);
2204 goto label_invalid_code
;
2205 /* We must update these variables now. */
2206 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2207 charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
2212 /* Now we know CHARSET and 1st position code C1 of a character.
2213 Produce a multibyte sequence for that character while getting
2214 2nd position code C2 if necessary. */
2215 if (CHARSET_DIMENSION (charset
) == 2)
2218 if (c1
< 0x80 ? c2
< 0x20 || c2
>= 0x80 : c2
< 0xA0)
2219 /* C2 is not in a valid range. */
2220 goto label_invalid_code
;
2222 c
= DECODE_ISO_CHARACTER (charset
, c1
, c2
);
2228 if (COMPOSING_P (coding
))
2229 DECODE_COMPOSITION_END ('1');
2232 if (! NILP (translation_table
))
2233 c
= translate_char (translation_table
, c
, 0, 0, 0);
2238 coding
->consumed
= coding
->consumed_char
= src_base
- source
;
2239 coding
->produced
= dst
- destination
;
/* ISO2022 encoding stuff.  */

/*
   It is not enough to say just "ISO2022" on encoding; we have to
   specify more details.  In Emacs, each ISO2022 coding system
   variant has the following specifications:
     1. Initial designation to G0 through G3.
     2. Allows short-form designation?
     3. ASCII should be designated to G0 before control characters?
     4. ASCII should be designated to G0 at end of line?
     5. 7-bit environment or 8-bit environment?
     6. Use locking-shift?
     7. Use single-shift?
   And the following two are only for Japanese:
     8. Use ASCII in place of JISX0201-1976-Roman?
     9. Use JISX0208-1983 in place of JISX0208-1978?
   These specifications are encoded in `coding->flags' as flag bits
   defined by macros CODING_FLAG_ISO_XXX.  See `coding.h' for more
   details.
*/
/* Produce codes (escape sequence) for designating CHARSET to graphic
   register REG at DST, and increment DST.  If <final-char> of CHARSET is
   '@', 'A', or 'B' and the coding system CODING allows, produce
   designation sequence of short-form.  */

#define ENCODE_DESIGNATION(charset, reg, coding) \
    unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
    char *intermediate_char_94 = "()*+"; \
    char *intermediate_char_96 = ",-./"; \
    int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
    if (revision < 255) \
	*dst++ = ISO_CODE_ESC; \
	*dst++ = '@' + revision; \
    *dst++ = ISO_CODE_ESC; \
    if (CHARSET_DIMENSION (charset) == 1) \
	if (CHARSET_CHARS (charset) == 94) \
	  *dst++ = (unsigned char) (intermediate_char_94[reg]); \
	  *dst++ = (unsigned char) (intermediate_char_96[reg]); \
	if (CHARSET_CHARS (charset) == 94) \
	    if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
		|| final_char < '@' || final_char > 'B') \
	      *dst++ = (unsigned char) (intermediate_char_94[reg]); \
	  *dst++ = (unsigned char) (intermediate_char_96[reg]); \
    *dst++ = final_char; \
    CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
/* The following two macros produce codes (control character or escape
   sequence) for ISO2022 single-shift functions (single-shift-2 and

#define ENCODE_SINGLE_SHIFT_2 \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
      *dst++ = ISO_CODE_SS2; \
    CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \

#define ENCODE_SINGLE_SHIFT_3 \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
      *dst++ = ISO_CODE_SS3; \
    CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
/* The following four macros produce codes (control character or
   escape sequence) for ISO2022 locking-shift functions (shift-in,
   shift-out, locking-shift-2, and locking-shift-3).  */

#define ENCODE_SHIFT_IN \
    *dst++ = ISO_CODE_SI; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \

#define ENCODE_SHIFT_OUT \
    *dst++ = ISO_CODE_SO; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \

#define ENCODE_LOCKING_SHIFT_2 \
    *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \

#define ENCODE_LOCKING_SHIFT_3 \
    *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
/* Produce codes for a DIMENSION1 character whose character set is
   CHARSET and whose position-code is C1.  Designation and invocation
   sequences are also produced in advance if necessary.  */

#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
    if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
	if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
	  *dst++ = c1 & 0x7F; \
	  *dst++ = c1 | 0x80; \
	CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
	*dst++ = c1 & 0x7F; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
	*dst++ = c1 | 0x80; \
      /* Since CHARSET is not yet invoked to any graphic planes, we \
	 must invoke it, or, at first, designate it to some graphic \
	 register.  Then repeat the loop to actually produce the \
      dst = encode_invocation_designation (charset, coding, dst); \
/* Produce codes for a DIMENSION2 character whose character set is
   CHARSET and whose position-codes are C1 and C2.  Designation and
   invocation codes are also produced in advance if necessary.  */

#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
    if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
	if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
	  *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
	  *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
	CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
	*dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
	*dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
      /* Since CHARSET is not yet invoked to any graphic planes, we \
	 must invoke it, or, at first, designate it to some graphic \
	 register.  Then repeat the loop to actually produce the \
      dst = encode_invocation_designation (charset, coding, dst); \
#define ENCODE_ISO_CHARACTER(c) \
    int charset, c1, c2; \
    SPLIT_CHAR (c, charset, c1, c2); \
    if (CHARSET_DEFINED_P (charset)) \
	if (CHARSET_DIMENSION (charset) == 1) \
	    if (charset == CHARSET_ASCII \
		&& coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
	      charset = charset_latin_jisx0201; \
	    ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
	    if (charset == charset_jisx0208 \
		&& coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
	      charset = charset_jisx0208_1978; \
	    ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \

/* Instead of encoding character C, produce one or two `?'s.  */

#define ENCODE_UNSAFE_CHARACTER(c) \
    ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
    if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
      ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
/* Produce designation and invocation codes at a place pointed by DST
   to use CHARSET.  The element `spec.iso2022' of *CODING is updated.
encode_invocation_designation (charset, coding, dst)
     struct coding_system *coding;
int reg; /* graphic register number */
/* At first, check designations.  */
for (reg = 0; reg < 4; reg++)
  if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
/* CHARSET is not yet designated to any graphic registers.  */
/* At first check the requested designation.  */
reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
  /* Since CHARSET requests no special designation, designate it
     to graphic register 0.  */
ENCODE_DESIGNATION (charset, reg, coding);
if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
    && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
  /* Since the graphic register REG is not invoked to any graphic
     planes, invoke it to graphic plane 0.  */
case 0: /* graphic register 0 */
case 1: /* graphic register 1 */
case 2: /* graphic register 2 */
  if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
    ENCODE_SINGLE_SHIFT_2;
    ENCODE_LOCKING_SHIFT_2;
case 3: /* graphic register 3 */
  if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
    ENCODE_SINGLE_SHIFT_3;
    ENCODE_LOCKING_SHIFT_3;
/* Produce 2-byte codes for encoded composition rule RULE.  */

#define ENCODE_COMPOSITION_RULE(rule) \
    COMPOSITION_DECODE_RULE (rule, gref, nref); \
    *dst++ = 32 + 81 + gref; \
    *dst++ = 32 + nref; \

/* Produce codes for indicating the start of a composition sequence
   (ESC 0, ESC 3, or ESC 4).  DATA points to an array of integers
   which specify information about the composition.  See the comment
   in coding.h for the format of DATA.  */

#define ENCODE_COMPOSITION_START(coding, data) \
    coding->composing = data[3]; \
    *dst++ = ISO_CODE_ESC; \
    if (coding->composing == COMPOSITION_RELATIVE) \
	*dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
	coding->cmp_data_index = coding->cmp_data_start + 4; \
	coding->composition_rule_follows = 0; \

/* Produce codes for indicating the end of the current composition.  */

#define ENCODE_COMPOSITION_END(coding, data) \
    *dst++ = ISO_CODE_ESC; \
    coding->cmp_data_start += data[0]; \
    coding->composing = COMPOSITION_NO; \
    if (coding->cmp_data_start == coding->cmp_data->used \
	&& coding->cmp_data->next) \
	coding->cmp_data = coding->cmp_data->next; \
	coding->cmp_data_start = 0; \

/* Produce composition start sequence ESC 0.  Here, this sequence
   doesn't mean the start of a new composition but means that we have
   just produced components (alternate chars and composition rules) of
   the composition and the actual text follows in SRC.  */

#define ENCODE_COMPOSITION_FAKE_START(coding) \
    *dst++ = ISO_CODE_ESC; \
    coding->composing = COMPOSITION_RELATIVE; \
/* The following three macros produce codes for indicating direction
#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = '['; \
      *dst++ = ISO_CODE_CSI; \
#define ENCODE_DIRECTION_R2L \
  ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'

#define ENCODE_DIRECTION_L2R \
  ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'

/* Produce codes for designation and invocation to reset the graphic
   planes and registers to initial state.  */
#define ENCODE_RESET_PLANE_AND_REGISTER \
    if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
    for (reg = 0; reg < 4; reg++) \
      if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
	  && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
	      != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
	ENCODE_DESIGNATION \
	  (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
/* Produce designation sequences of charsets in the line started from
   SRC to a place pointed by DST, and return updated DST.

   If the current block ends before any end-of-line, we may fail to
   find all the necessary designations.  */

static unsigned char *
encode_designation_at_bol (coding, translation_table, src, src_end, dst)
     struct coding_system *coding;
     Lisp_Object translation_table;
     const unsigned char *src, *src_end;
int charset, c, found = 0, reg;
/* Table of charsets to be designated to each graphic register.  */
for (reg = 0; reg < 4; reg++)
charset = CHAR_CHARSET (c);
reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
for (reg = 0; reg < 4; reg++)
    && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
  ENCODE_DESIGNATION (r[reg], reg, coding);
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".  */

encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
const unsigned char *src = source;
const unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* Since the maximum bytes produced by each loop is 20, we subtract 19
   from DST_END to assure overflow checking is necessary only at the
unsigned char *adjusted_dst_end = dst_end - 19;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES).  */
const unsigned char *src_base;
Lisp_Object translation_table;
Lisp_Object safe_chars;
if (coding->flags & CODING_FLAG_ISO_SAFE)
  coding->mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
safe_chars = coding_safe_chars (coding->symbol);
if (NILP (Venable_character_translation))
  translation_table = Qnil;
translation_table = coding->translation_table_for_encode;
if (NILP (translation_table))
  translation_table = Vstandard_translation_table_for_encode;
coding->consumed_char = 0;
if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
coding->result = CODING_FINISH_INSUFFICIENT_DST;
if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
    && CODING_SPEC_ISO_BOL (coding))
/* We have to produce designation sequences if any now.  */
dst = encode_designation_at_bol (coding, translation_table,
CODING_SPEC_ISO_BOL (coding) = 0;
/* Check composition start and end.  */
if (coding->composing != COMPOSITION_DISABLED
    && coding->cmp_data_start < coding->cmp_data->used)
struct composition_data *cmp_data = coding->cmp_data;
int *data = cmp_data->data + coding->cmp_data_start;
int this_pos = cmp_data->char_offset + coding->consumed_char;
if (coding->composing == COMPOSITION_RELATIVE)
if (this_pos == data[2])
ENCODE_COMPOSITION_END (coding, data);
cmp_data = coding->cmp_data;
data = cmp_data->data + coding->cmp_data_start;
else if (COMPOSING_P (coding))
/* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHARS */
if (coding->cmp_data_index == coding->cmp_data_start + data[0])
/* We have consumed components of the composition.
   What follows in SRC is the composition's base
ENCODE_COMPOSITION_FAKE_START (coding);
int c = cmp_data->data[coding->cmp_data_index++];
if (coding->composition_rule_follows)
ENCODE_COMPOSITION_RULE (c);
coding->composition_rule_follows = 0;
if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
    && ! CODING_SAFE_CHAR_P (safe_chars, c))
  ENCODE_UNSAFE_CHARACTER (c);
  ENCODE_ISO_CHARACTER (c);
if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
  coding->composition_rule_follows = 1;
if (!COMPOSING_P (coding))
if (this_pos == data[1])
ENCODE_COMPOSITION_START (coding, data);
/* Now encode the character C.  */
if (c < 0x20 || c == 0x7F)
if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
  ENCODE_RESET_PLANE_AND_REGISTER;
/* fall through to treat '\r' as '\n' ...  */
if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
  ENCODE_RESET_PLANE_AND_REGISTER;
if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
  bcopy (coding->spec.iso2022.initial_designation,
	 coding->spec.iso2022.current_designation,
	 sizeof coding->spec.iso2022.initial_designation);
if (coding->eol_type == CODING_EOL_LF
    || coding->eol_type == CODING_EOL_UNDECIDED)
  *dst++ = ISO_CODE_LF;
else if (coding->eol_type == CODING_EOL_CRLF)
  *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
  *dst++ = ISO_CODE_CR;
CODING_SPEC_ISO_BOL (coding) = 1;
if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
  ENCODE_RESET_PLANE_AND_REGISTER;
else if (ASCII_BYTE_P (c))
  ENCODE_ISO_CHARACTER (c);
else if (SINGLE_BYTE_CHAR_P (c))
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
	 && ! CODING_SAFE_CHAR_P (safe_chars, c))
  ENCODE_UNSAFE_CHARACTER (c);
  ENCODE_ISO_CHARACTER (c);
coding->consumed_char++;
coding->consumed = src_base - source;
coding->produced = coding->produced_char = dst - destination;
/*** 4. SJIS and BIG5 handlers ***/

/* Although SJIS and BIG5 are not ISO coding systems, they are used
   quite widely.  So, for the moment, Emacs supports them in the bare
   C code.  But, in the future, they may be supported only by CCL.  */

/* SJIS is a coding system encoding three character sets: ASCII, right
   half of JISX0201-Kana, and JISX0208.  An ASCII character is encoded
   as is.  A character of charset katakana-jisx0201 is encoded by
   "position-code + 0x80".  A character of charset japanese-jisx0208
   is encoded in two bytes, but the two position-codes are divided and
   shifted so that they fit in the ranges below.

   --- CODE RANGE of SJIS ---
   (character set)	(range)
   KATAKANA-JISX0201	0xA1 .. 0xDF
   JISX0208 (1st byte)	0x81 .. 0x9F and 0xE0 .. 0xEF
	    (2nd byte)	0x40 .. 0x7E and 0x80 .. 0xFC
   -------------------------------
/* BIG5 is a coding system encoding two character sets: ASCII and
   Big5.  An ASCII character is encoded as is.  Big5 is a two-byte
   character set and is encoded in two bytes.

   --- CODE RANGE of BIG5 ---
   (character set)	(range)
   Big5 (1st byte)	0xA1 .. 0xFE
	(2nd byte)	0x40 .. 0x7E and 0xA1 .. 0xFE
   --------------------------

   Since the number of characters in Big5 is larger than the maximum
   number of characters in an Emacs charset (96x96), it can't be
   handled as one charset.  So, in Emacs, Big5 is divided into two:
   `charset-big5-1' and `charset-big5-2'.  Both are DIMENSION2 and
   CHARS94.  The former contains frequently used characters and the
   latter contains less frequently used characters.  */
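
/* Illustration only, not part of coding.c: a standalone sketch of how
   a Big5 byte pair in the ranges above can be folded into a linear
   index (157 second bytes per first byte) and then split into two
   94x94 planes at first byte 0xC9, mirroring the arithmetic used by
   the Big5 macros in this section.  All sketch_* names are
   hypothetical.  */

```c
#include <assert.h>

enum
{
  SKETCH_BIG5_TRAILS = 63 + 94,   /* 0x40..0x7E plus 0xA1..0xFE */
  SKETCH_LEVEL2_LEAD = 0xC9       /* first byte where the 2nd plane starts */
};

/* Return nonzero if (B1, B2) lies in the Big5 code ranges.  */
static int
sketch_big5_valid_p (unsigned b1, unsigned b2)
{
  return b1 >= 0xA1 && b1 <= 0xFE
    && ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));
}

/* Map a Big5 byte pair to a plane number (*LEVEL is 1 or 2) and two
   position codes *C1, *C2 in 0x21..0x7E.  */
static void
sketch_big5_to_94x94 (unsigned b1, unsigned b2,
                      int *level, unsigned *c1, unsigned *c2)
{
  unsigned idx;

  assert (sketch_big5_valid_p (b1, b2));
  idx = (b1 - 0xA1) * SKETCH_BIG5_TRAILS
        + (b2 < 0x7F ? b2 - 0x40 : b2 - 0x62);
  if (b1 < SKETCH_LEVEL2_LEAD)
    *level = 1;
  else
    {
      *level = 2;
      idx -= (SKETCH_LEVEL2_LEAD - 0xA1) * SKETCH_BIG5_TRAILS;
    }
  *c1 = idx / 94 + 0x21;
  *c2 = idx % 94 + 0x21;
}

/* Inverse of sketch_big5_to_94x94.  */
static void
sketch_94x94_to_big5 (int level, unsigned c1, unsigned c2,
                      unsigned *b1, unsigned *b2)
{
  unsigned idx = (c1 - 0x21) * 94 + (c2 - 0x21);
  unsigned trail;

  if (level == 2)
    idx += (SKETCH_LEVEL2_LEAD - 0xA1) * SKETCH_BIG5_TRAILS;
  *b1 = idx / SKETCH_BIG5_TRAILS + 0xA1;
  trail = idx % SKETCH_BIG5_TRAILS;
  *b2 = trail + (trail < 63 ? 0x40 : 0x62);
}
```

/* For example, under this mapping the pair 0xA4 0x40 lands in plane 1
   at position codes 0x26 0x22, and 0xC9 0xA1 lands in plane 2 at
   0x21 0x60; both convert back to the original bytes.  */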
/* Macros to decode or encode a character of Big5 in BIG5.  B1 and B2
   are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
   C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
   format.  CHARSET is `charset_big5_1' or `charset_big5_2'.  */

/* Number of Big5 characters which have the same code in 1st byte.  */
#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)

#define DECODE_BIG5(b1, b2, charset, c1, c2) \
      = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
	charset = charset_big5_1; \
	charset = charset_big5_2; \
	temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
    c1 = temp / (0xFF - 0xA1) + 0x21; \
    c2 = temp % (0xFF - 0xA1) + 0x21; \

#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
    unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
    if (charset == charset_big5_2) \
      temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
    b1 = temp / BIG5_SAME_ROW + 0xA1; \
    b2 = temp % BIG5_SAME_ROW; \
    b2 += b2 < 0x3F ? 0x40 : 0x62; \
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in SJIS.  If it is, return
   CODING_CATEGORY_MASK_SJIS, else return 0.  */

detect_coding_sjis (src, src_end, multibytep)
     unsigned char *src, *src_end;
/* Dummy for ONE_MORE_BYTE.  */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, CODING_CATEGORY_MASK_SJIS);
if (c == 0x80 || c == 0xA0 || c > 0xEF)
if (c <= 0x9F || c >= 0xE0)
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, 0);
if (c < 0x40 || c == 0x7F || c > 0xFC)
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in BIG5.  If it is, return
   CODING_CATEGORY_MASK_BIG5, else return 0.  */

detect_coding_big5 (src, src_end, multibytep)
     unsigned char *src, *src_end;
/* Dummy for ONE_MORE_BYTE.  */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, CODING_CATEGORY_MASK_BIG5);
if (c < 0xA1 || c > 0xFE)
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, 0);
if (c < 0x40 || (c > 0x7F && c < 0xA1) || c > 0xFE)
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-8.  If it is, return
   CODING_CATEGORY_MASK_UTF_8, else return 0.  */

#define UTF_8_1_OCTET_P(c)         ((c) < 0x80)
#define UTF_8_EXTRA_OCTET_P(c)     (((c) & 0xC0) == 0x80)
#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
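
/* Illustration only, not part of coding.c: a standalone sketch of the
   structural check that bit tests like the ones above make possible.
   Each lead octet announces how many 10xxxxxx octets must follow.  The
   sketch_* names are hypothetical.  Unlike the detector in this file,
   the sketch simply rejects a sequence truncated at the end of the
   buffer, and like the macros it does not reject overlong forms or
   surrogate code points.  */

```c
#include <assert.h>
#include <stddef.h>

/* Number of trailing octets announced by lead octet C in the original
   (up to 6-octet) UTF-8 scheme, or -1 if C cannot start a sequence.  */
static int
sketch_utf8_trailing_octets (unsigned char c)
{
  if (c < 0x80)
    return 0;
  if ((c & 0xE0) == 0xC0)
    return 1;
  if ((c & 0xF0) == 0xE0)
    return 2;
  if ((c & 0xF8) == 0xF0)
    return 3;
  if ((c & 0xFC) == 0xF8)
    return 4;
  if ((c & 0xFE) == 0xFC)
    return 5;
  return -1;
}

/* Return 1 if the LEN bytes at P consist only of lead octets each
   followed by the announced number of 10xxxxxx octets, else 0.  */
static int
sketch_utf8_wellformed_p (const unsigned char *p, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      int n = sketch_utf8_trailing_octets (p[i++]);

      if (n < 0)
        return 0;
      while (n-- > 0)
        {
          if (i >= len || (p[i] & 0xC0) != 0x80)
            return 0;
          i++;
        }
    }
  return 1;
}
```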
detect_coding_utf_8 (src, src_end, multibytep)
     unsigned char *src, *src_end;
int seq_maybe_bytes;
/* Dummy for ONE_MORE_BYTE.  */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, CODING_CATEGORY_MASK_UTF_8);
if (UTF_8_1_OCTET_P (c))
else if (UTF_8_2_OCTET_LEADING_P (c))
  seq_maybe_bytes = 1;
else if (UTF_8_3_OCTET_LEADING_P (c))
  seq_maybe_bytes = 2;
else if (UTF_8_4_OCTET_LEADING_P (c))
  seq_maybe_bytes = 3;
else if (UTF_8_5_OCTET_LEADING_P (c))
  seq_maybe_bytes = 4;
else if (UTF_8_6_OCTET_LEADING_P (c))
  seq_maybe_bytes = 5;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, 0);
if (!UTF_8_EXTRA_OCTET_P (c))
while (seq_maybe_bytes > 0);
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
   Little Endian (otherwise).  If it is, return
   CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
#define UTF_16_INVALID_P(val) \
  (((val) == 0xFFFE) \
   || ((val) == 0xFFFF))

#define UTF_16_HIGH_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xD800)

#define UTF_16_LOW_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xDC00)
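
/* Illustration only, not part of coding.c: a standalone sketch of
   byte-order-mark detection and surrogate handling for UTF-16.  The
   sketch_* names are hypothetical.  A surrogate test has to compare
   the top six bits (mask 0xFC00), since high surrogates occupy
   0xD800..0xDBFF and low surrogates 0xDC00..0xDFFF.  */

```c
#include <assert.h>
#include <stddef.h>

enum sketch_utf16_order
{
  SKETCH_UTF16_UNKNOWN,
  SKETCH_UTF16_BE,
  SKETCH_UTF16_LE
};

/* Classify the first two bytes at P as a big-endian BOM (FE FF), a
   little-endian BOM (FF FE), or neither.  */
static enum sketch_utf16_order
sketch_utf16_bom (const unsigned char *p, size_t len)
{
  if (len >= 2)
    {
      if (p[0] == 0xFE && p[1] == 0xFF)
        return SKETCH_UTF16_BE;
      if (p[0] == 0xFF && p[1] == 0xFE)
        return SKETCH_UTF16_LE;
    }
  return SKETCH_UTF16_UNKNOWN;
}

static int
sketch_high_surrogate_p (unsigned val)
{
  return (val & 0xFC00) == 0xD800;
}

static int
sketch_low_surrogate_p (unsigned val)
{
  return (val & 0xFC00) == 0xDC00;
}

/* Combine a high/low surrogate pair into the code point it encodes.  */
static unsigned long
sketch_combine_surrogates (unsigned hi, unsigned lo)
{
  assert (sketch_high_surrogate_p (hi) && sketch_low_surrogate_p (lo));
  return 0x10000UL + ((unsigned long) (hi - 0xD800) << 10) + (lo - 0xDC00);
}
```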
detect_coding_utf_16 (src, src_end, multibytep)
     unsigned char *src, *src_end;
unsigned char c1, c2;
/* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep, 0);
ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep, 0);
if ((c1 == 0xFF) && (c2 == 0xFE))
  return CODING_CATEGORY_MASK_UTF_16_LE;
else if ((c1 == 0xFE) && (c2 == 0xFF))
  return CODING_CATEGORY_MASK_UTF_16_BE;
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
   If SJIS_P is 1, decode SJIS text, else decode BIG5 text.  */

decode_coding_sjis_big5 (coding, source, destination,
			 src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
const unsigned char *src = source;
const unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source code
   (within macro ONE_MORE_BYTE), or when there's not enough
   destination area to produce a character (within macro
const unsigned char *src_base;
Lisp_Object translation_table;
if (NILP (Venable_character_translation))
  translation_table = Qnil;
translation_table = coding->translation_table_for_decode;
if (NILP (translation_table))
  translation_table = Vstandard_translation_table_for_decode;
coding->produced_char = 0;
int c, charset, c1, c2 = 0;
charset = CHARSET_ASCII;
if (coding->eol_type == CODING_EOL_CRLF)
/* To process C2 again, SRC is subtracted by 1.  */
else if (coding->eol_type == CODING_EOL_CR)
    && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
    && (coding->eol_type == CODING_EOL_CR
	|| coding->eol_type == CODING_EOL_CRLF))
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
if (c1 == 0x80 || c1 == 0xA0 || c1 > 0xEF)
  goto label_invalid_code;
if (c1 <= 0x9F || c1 >= 0xE0)
/* SJIS -> JISX0208 */
if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
  goto label_invalid_code;
DECODE_SJIS (c1, c2, c1, c2);
charset = charset_jisx0208;
/* SJIS -> JISX0201-Kana */
charset = charset_katakana_jisx0201;
if (c1 < 0xA0 || c1 > 0xFE)
  goto label_invalid_code;
if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
  goto label_invalid_code;
DECODE_BIG5 (c1, c2, charset, c1, c2);
c = DECODE_ISO_CHARACTER (charset, c1, c2);
coding->consumed = coding->consumed_char = src_base - source;
coding->produced = dst - destination;
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
   This function can encode charsets `ascii', `katakana-jisx0201',
   `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'.  We
   are sure that all these charsets are registered as official charset
   (i.e. do not have extended leading-codes).  Characters of other
   charsets are produced without any encoding.  If SJIS_P is 1, encode
   SJIS text, else encode BIG5 text.  */

encode_coding_sjis_big5 (coding, source, destination,
			 src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
unsigned char *src = source;
unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES).  */
unsigned char *src_base;
Lisp_Object translation_table;
if (NILP (Venable_character_translation))
  translation_table = Qnil;
translation_table = coding->translation_table_for_encode;
if (NILP (translation_table))
  translation_table = Vstandard_translation_table_for_encode;
int c, charset, c1, c2;
/* Now encode the character C.  */
if (SINGLE_BYTE_CHAR_P (c))
if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
if (coding->eol_type == CODING_EOL_CRLF)
EMIT_TWO_BYTES ('\r', c);
else if (coding->eol_type == CODING_EOL_CR)
SPLIT_CHAR (c, charset, c1, c2);
if (charset == charset_jisx0208
    || charset == charset_jisx0208_1978)
ENCODE_SJIS (c1, c2, c1, c2);
EMIT_TWO_BYTES (c1, c2);
else if (charset == charset_katakana_jisx0201)
  EMIT_ONE_BYTE (c1 | 0x80);
else if (charset == charset_latin_jisx0201)
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
if (CHARSET_WIDTH (charset) > 1)
  EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
/* There's no way other than producing the internal
EMIT_BYTES (src_base, src);
if (charset == charset_big5_1 || charset == charset_big5_2)
ENCODE_BIG5 (charset, c1, c2, c1, c2);
EMIT_TWO_BYTES (c1, c2);
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
if (CHARSET_WIDTH (charset) > 1)
  EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
/* There's no way other than producing the internal
EMIT_BYTES (src_base, src);
coding->consumed_char++;
coding->consumed = src_base - source;
coding->produced = coding->produced_char = dst - destination;
/*** 5. CCL handlers ***/

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in a coding system of which
   encoder/decoder are written in CCL program.  If it is, return
   CODING_CATEGORY_MASK_CCL, else return 0.  */

detect_coding_ccl (src, src_end, multibytep)
     unsigned char *src, *src_end;
unsigned char *valid;
/* Dummy for ONE_MORE_BYTE.  */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
/* No coding system is assigned to coding-category-ccl.  */
if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep, CODING_CATEGORY_MASK_CCL);
/*** 6. End-of-line handlers ***/

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

decode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
const unsigned char *src = source;
unsigned char *dst = destination;
const unsigned char *src_end = src + src_bytes;
unsigned char *dst_end = dst + dst_bytes;
Lisp_Object translation_table;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source code
   (within macro ONE_MORE_BYTE), or when there's not enough
   destination area to produce a character (within macro
const unsigned char *src_base;
translation_table = Qnil;
switch (coding->eol_type)
case CODING_EOL_CRLF:
    && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
default: /* no need for EOL handling */
coding->consumed = coding->consumed_char = src_base - source;
coding->produced = dst - destination;
/* See "GENERAL NOTES about `encode_coding_XXX ()' functions".  Encode
   format of end-of-line according to `coding->eol_type'.  It also
   converts multibyte form 8-bit characters to unibyte if
   CODING->src_multibyte is nonzero.  If `coding->mode &
   CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
   also means end-of-line.  */

encode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
const unsigned char *src = source;
unsigned char *dst = destination;
const unsigned char *src_end = src + src_bytes;
unsigned char *dst_end = dst + dst_bytes;
Lisp_Object translation_table;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES).  */
const unsigned char *src_base;
int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
translation_table = Qnil;
if (coding->src_multibyte
    && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
coding->result = CODING_FINISH_INSUFFICIENT_SRC;
3460 if (coding
->eol_type
== CODING_EOL_CRLF
)
3462 while (src
< src_end
)
3468 else if (c
== '\n' || (c
== '\r' && selective_display
))
3469 EMIT_TWO_BYTES ('\r', '\n');
3479 if (!dst_bytes
|| src_bytes
<= dst_bytes
)
3481 safe_bcopy (src
, dst
, src_bytes
);
3487 if (coding
->src_multibyte
3488 && *(src
+ dst_bytes
- 1) == LEADING_CODE_8_BIT_CONTROL
)
3490 safe_bcopy (src
, dst
, dst_bytes
);
3491 src_base
= src
+ dst_bytes
;
3492 dst
= destination
+ dst_bytes
;
3493 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
3495 if (coding
->eol_type
== CODING_EOL_CR
)
3497 for (tmp
= destination
; tmp
< dst
; tmp
++)
3498 if (*tmp
== '\n') *tmp
= '\r';
3500 else if (selective_display
)
3502 for (tmp
= destination
; tmp
< dst
; tmp
++)
3503 if (*tmp
== '\r') *tmp
= '\n';
3506 if (coding
->src_multibyte
)
3507 dst
= destination
+ str_as_unibyte (destination
, dst
- destination
);
3509 coding
->consumed
= src_base
- source
;
3510 coding
->produced
= dst
- destination
;
3511 coding
->produced_char
= coding
->produced
;
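
/* Illustration only, not part of Emacs: a minimal sketch of the
   '\n' rewriting that encode_eol performs, assuming plain unibyte
   input and ignoring selective display and multibyte 8-bit forms.
   The names sketch_encode_eol and enum sketch_eol are hypothetical.  */

```c
#include <stddef.h>

enum sketch_eol { SKETCH_EOL_LF, SKETCH_EOL_CRLF, SKETCH_EOL_CR };

/* Copy SRC to DST, rewriting each '\n' according to EOL.  Return the
   number of bytes produced, or (size_t) -1 if DST is too small.  */
size_t
sketch_encode_eol (const unsigned char *src, size_t src_bytes,
                   unsigned char *dst, size_t dst_bytes,
                   enum sketch_eol eol)
{
  size_t i, produced = 0;

  for (i = 0; i < src_bytes; i++)
    {
      unsigned char c = src[i];
      /* Only LF -> CRLF grows the output.  */
      size_t need = (c == '\n' && eol == SKETCH_EOL_CRLF) ? 2 : 1;

      if (dst_bytes - produced < need)
        return (size_t) -1;
      if (c == '\n' && eol == SKETCH_EOL_CRLF)
        {
          dst[produced++] = '\r';
          dst[produced++] = '\n';
        }
      else if (c == '\n' && eol == SKETCH_EOL_CR)
        dst[produced++] = '\r';
      else
        dst[produced++] = c;
    }
  return produced;
}
```

/* Unlike the real function, the sketch reports a full destination
   with a sentinel return value instead of recording
   CODING_FINISH_INSUFFICIENT_DST in the coding structure.  */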

/*** 7. C library functions ***/

/* In Emacs Lisp, a coding system is represented by a Lisp symbol which
   has a property `coding-system'.  The value of this property is a
   vector of length 5 (called the coding-vector).  Among the elements
   of this vector, the first (element[0]) and the fifth (element[4])
   carry important information for decoding/encoding.  Before
   decoding/encoding, this information should be set in fields of a
   structure of type `coding_system'.

   The value of the property `coding-system' can be a symbol of another
   subsidiary coding-system.  In that case, Emacs gets the
   coding-vector from that symbol.

   `element[0]' contains information to be set in `coding->type'.  The
   values and their meanings are as follows:

   0 -- coding_type_emacs_mule
   1 -- coding_type_sjis
   2 -- coding_type_iso2022
   3 -- coding_type_big5
   4 -- coding_type_ccl encoder/decoder written in CCL
   nil -- coding_type_no_conversion
   t -- coding_type_undecided (automatic conversion on decoding,
                               no-conversion on encoding)

   `element[4]' contains information to be set in `coding->flags' and
   `coding->spec'.  The meaning varies by `coding->type'.

   If `coding->type' is `coding_type_iso2022', element[4] is a vector
   of length 32 (of which the first 17 sub-elements are used now).
   Meanings of these sub-elements are:

   sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
        If the value is an integer of valid charset, the charset is
        assumed to be designated to graphic register N initially.

        If the value is negative, its absolute value is a charset
        which reserves graphic register N, which means that the
        charset is not designated initially but should be designated
        to graphic register N just before encoding a character in
        that charset.

        If the value is nil, graphic register N is never used on
        encoding.

   sub-element[N] where N is 4 through 16: to be set in `coding->flags'
        Each value takes t or nil.  See the section ISO2022 of
        `coding.h' for more information.

   If `coding->type' is `coding_type_big5', element[4] is t to denote
   BIG5-ETen or nil to denote BIG5-HKU.

   If `coding->type' takes any other value, element[4] is ignored.

   Emacs Lisp's coding systems also carry information about the format
   of end-of-line in the value of the property `eol-type'.  If the
   value is an integer, 0 means CODING_EOL_LF, 1 means
   CODING_EOL_CRLF, and 2 means CODING_EOL_CR.  If it is not an
   integer, it should be a vector of subsidiary coding systems whose
   property `eol-type' has one of the above values.

*/

/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
   and set it in CODING.  If CODING_SYSTEM_SYMBOL is invalid, CODING
   is set up so that no conversion is necessary and -1 is returned;
   otherwise return 0.  */

setup_coding_system (coding_system, coding)
     Lisp_Object coding_system;
     struct coding_system *coding;
  Lisp_Object coding_spec, coding_type, eol_type, plist;

  /* At first, zero clear all members.  */
  bzero (coding, sizeof (struct coding_system));

  /* Initialize some fields required for all kinds of coding systems.  */
  coding->symbol = coding_system;
  coding->heading_ascii = -1;
  coding->post_read_conversion = coding->pre_write_conversion = Qnil;
  coding->composing = COMPOSITION_DISABLED;
  coding->cmp_data = NULL;

  if (NILP (coding_system))
    goto label_invalid_coding_system;

  coding_spec = Fget (coding_system, Qcoding_system);

  if (!VECTORP (coding_spec)
      || XVECTOR (coding_spec)->size != 5
      || !CONSP (XVECTOR (coding_spec)->contents[3]))
    goto label_invalid_coding_system;

  eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
  if (VECTORP (eol_type))
      coding->eol_type = CODING_EOL_UNDECIDED;
      coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
      if (system_eol_type != CODING_EOL_LF)
        coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
  else if (XFASTINT (eol_type) == 1)
      coding->eol_type = CODING_EOL_CRLF;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
  else if (XFASTINT (eol_type) == 2)
      coding->eol_type = CODING_EOL_CR;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      coding->common_flags = 0;
      coding->eol_type = CODING_EOL_LF;

  coding_type = XVECTOR (coding_spec)->contents[0];
  /* Try short cut.  */
  if (SYMBOLP (coding_type))
      if (EQ (coding_type, Qt))
          coding->type = coding_type_undecided;
          coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
        coding->type = coding_type_no_conversion;
      /* Initialize this member.  Any thing other than
         CODING_CATEGORY_IDX_UTF_16_BE and
         CODING_CATEGORY_IDX_UTF_16_LE are ok because they have
         special treatment in detect_eol.  */
      coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;

  /* Get values of coding system properties:
     `post-read-conversion', `pre-write-conversion',
     `translation-table-for-decode', `translation-table-for-encode'.  */
  plist = XVECTOR (coding_spec)->contents[3];
  /* Pre & post conversion functions should be disabled if
     inhibit_pre_post_conversion is nonzero.  This is the case when a
     code conversion function is called while those functions are
     running.  */
  if (! inhibit_pre_post_conversion)
      coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
      coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
  val = Fplist_get (plist, Qtranslation_table_for_decode);
    val = Fget (val, Qtranslation_table_for_decode);
  coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qtranslation_table_for_encode);
    val = Fget (val, Qtranslation_table_for_encode);
  coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qcoding_category);
      val = Fget (val, Qcoding_category_index);
        coding->category_idx = XINT (val);
        goto label_invalid_coding_system;
    goto label_invalid_coding_system;

  /* If the coding system has non-nil `composition' property, enable
     composition handling.  */
  val = Fplist_get (plist, Qcomposition);
    coding->composing = COMPOSITION_NO;

  /* If the coding system is ascii-incompatible, record it in
     common_flags.  */
  val = Fplist_get (plist, Qascii_incompatible);
    coding->common_flags |= CODING_ASCII_INCOMPATIBLE_MASK;

  switch (XFASTINT (coding_type))
      coding->type = coding_type_emacs_mule;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      if (!NILP (coding->post_read_conversion))
        coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
      if (!NILP (coding->pre_write_conversion))
        coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;

      coding->type = coding_type_sjis;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;

      coding->type = coding_type_iso2022;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        Lisp_Object val, temp;
        int i, charset, reg_bits = 0;

        val = XVECTOR (coding_spec)->contents[4];

        if (!VECTORP (val) || XVECTOR (val)->size != 32)
          goto label_invalid_coding_system;

        flags = XVECTOR (val)->contents;
          = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
             | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
             | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
             | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
             | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
             | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
             | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
             | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
             | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
             | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
             | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
             | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
             | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)

        /* Invoke graphic register 0 to plane 0.  */
        CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
        /* Invoke graphic register 1 to plane 1 if we can use full 8-bit.  */
        CODING_SPEC_ISO_INVOCATION (coding, 1)
          = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
        /* Not single shifting at first.  */
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
        /* Beginning of buffer should also be regarded as bol.  */
        CODING_SPEC_ISO_BOL (coding) = 1;

        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
        val = Vcharset_revision_alist;
            charset = get_charset_id (Fcar_safe (XCAR (val)));
                && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
                && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
              CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;

        /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
           FLAGS[REG] can be one of below:
                integer CHARSET: CHARSET occupies register REG,
                t: designate nothing to REG initially, but can be used
                  by any charsets,
                list of integer, nil, or t: designate the first
                  element (if integer) to REG initially, the remaining
                  elements (if integer) are designated to REG on request,
                  if an element is t, REG can be used by any charsets,
                nil: REG is never used.  */
        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
            = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
        for (i = 0; i < 4; i++)
            if ((INTEGERP (flags[i])
                 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset)))
                || (charset = get_charset_id (flags[i])) >= 0)
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
            else if (EQ (flags[i], Qt))
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
            else if (CONSP (flags[i]))
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
                if ((INTEGERP (XCAR (tail))
                     && (charset = XINT (XCAR (tail)),
                         CHARSET_VALID_P (charset)))
                    || (charset = get_charset_id (XCAR (tail))) >= 0)
                    CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
                  CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                while (CONSP (tail))
                    if ((INTEGERP (XCAR (tail))
                         && (charset = XINT (XCAR (tail)),
                             CHARSET_VALID_P (charset)))
                        || (charset = get_charset_id (XCAR (tail))) >= 0)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                    else if (EQ (XCAR (tail), Qt))
              CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;

            CODING_SPEC_ISO_DESIGNATION (coding, i)
              = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);

        if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
            /* REG 1 can be used only by locking shift in 7-bit env.  */
            if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
            if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
              /* Without any shifting, only REG 0 and 1 can be used.  */

            for (charset = 0; charset <= MAX_CHARSET; charset++)
                if (CHARSET_DEFINED_P (charset)
                    && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                        == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
                    /* There exist some default graphic registers to be
                       used by CHARSET.  */
                    /* We had better avoid designating a charset of
                       CHARS96 to REG 0 as far as possible.  */
                    if (CHARSET_CHARS (charset) == 96)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                           ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                           ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));

      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.iso2022.last_invalid_designation_register = -1;

      coding->type = coding_type_big5;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        = (NILP (XVECTOR (coding_spec)->contents[4])
           ? CODING_FLAG_BIG5_HKU
           : CODING_FLAG_BIG5_ETEN);

      coding->type = coding_type_ccl;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        val = XVECTOR (coding_spec)->contents[4];
            || setup_ccl_program (&(coding->spec.ccl.decoder),
            || setup_ccl_program (&(coding->spec.ccl.encoder),
          goto label_invalid_coding_system;

        bzero (coding->spec.ccl.valid_codes, 256);
        val = Fplist_get (plist, Qvalid_codes);
            for (; CONSP (val); val = XCDR (val))
                    && XINT (this) >= 0 && XINT (this) < 256)
                  coding->spec.ccl.valid_codes[XINT (this)] = 1;
                else if (CONSP (this)
                         && INTEGERP (XCAR (this))
                         && INTEGERP (XCDR (this)))
                    int start = XINT (XCAR (this));
                    int end = XINT (XCDR (this));

                    if (start >= 0 && start <= end && end < 256)
                      while (start <= end)
                        coding->spec.ccl.valid_codes[start++] = 1;
      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.ccl.cr_carryover = 0;
      coding->spec.ccl.eight_bit_carryover[0] = 0;

      coding->type = coding_type_raw_text;

      goto label_invalid_coding_system;

 label_invalid_coding_system:
  coding->type = coding_type_no_conversion;
  coding->category_idx = CODING_CATEGORY_IDX_BINARY;
  coding->common_flags = 0;
  coding->eol_type = CODING_EOL_UNDECIDED;
  coding->pre_write_conversion = coding->post_read_conversion = Qnil;
  return NILP (coding_system) ? 0 : -1;
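
/* Illustration only, not part of Emacs: a minimal sketch of how the
   CCL case above fills the 256-entry `valid_codes' table from a list
   of code ranges.  Here the Lisp list is replaced by a plain array;
   struct sketch_code_range and sketch_mark_valid_codes are
   hypothetical names.  A single code N is written as the range
   {N, N}.  */

```c
#include <string.h>

struct sketch_code_range { int start, end; };

/* Clear VALID_CODES, then set the entry of every code covered by one
   of the N ranges in RANGES.  */
void
sketch_mark_valid_codes (unsigned char valid_codes[256],
                         const struct sketch_code_range *ranges, int n)
{
  int i, code;

  memset (valid_codes, 0, 256);
  for (i = 0; i < n; i++)
    {
      int start = ranges[i].start, end = ranges[i].end;

      /* Same guard as the original loop: entries that are reversed
         or fall outside 0..255 are silently ignored.  */
      if (start >= 0 && start <= end && end < 256)
        for (code = start; code <= end; code++)
          valid_codes[code] = 1;
    }
}
```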

/* Free memory blocks allocated for storing composition information.  */

coding_free_composition_data (coding)
     struct coding_system *coding;
  struct composition_data *cmp_data = coding->cmp_data, *next;

  /* Memory blocks are chained.  At first, rewind to the first, then,
     free blocks one by one.  */
  while (cmp_data->prev)
    cmp_data = cmp_data->prev;
      next = cmp_data->next;
  coding->cmp_data = NULL;
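
/* Illustration only, not part of Emacs: a minimal sketch of the
   "rewind to the first block, then free one by one" pattern described
   above, on a generic doubly linked chain.  struct sketch_block and
   sketch_free_chain are hypothetical names; the real blocks are
   struct composition_data and may be released differently.  */

```c
#include <stdlib.h>

struct sketch_block { struct sketch_block *prev, *next; };

/* Free every block of the chain that contains BLOCK and return how
   many blocks were freed.  BLOCK may point anywhere in the chain.  */
int
sketch_free_chain (struct sketch_block *block)
{
  int freed = 0;

  if (!block)
    return 0;
  /* Rewind to the head of the chain.  */
  while (block->prev)
    block = block->prev;
  /* Save the successor before freeing the current block.  */
  while (block)
    {
      struct sketch_block *next = block->next;

      free (block);
      freed++;
      block = next;
    }
  return freed;
}
```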

/* Set `char_offset' member of all memory blocks pointed by
   coding->cmp_data to POS.  */

coding_adjust_composition_offset (coding, pos)
     struct coding_system *coding;
  struct composition_data *cmp_data;

  for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
    cmp_data->char_offset = pos;

/* Set up raw-text or one of its subsidiaries in the structure
   coding_system CODING according to the eol_type value already set
   up in CODING.  CODING should be set up for some coding system in
   advance.  */

setup_raw_text_coding_system (coding)
     struct coding_system *coding;
  if (coding->type != coding_type_raw_text)
      coding->symbol = Qraw_text;
      coding->type = coding_type_raw_text;
      if (coding->eol_type != CODING_EOL_UNDECIDED)
          Lisp_Object subsidiaries;
          subsidiaries = Fget (Qraw_text, Qeol_type);

          if (VECTORP (subsidiaries)
              && XVECTOR (subsidiaries)->size == 3)
              = XVECTOR (subsidiaries)->contents[coding->eol_type];
      setup_coding_system (coding->symbol, coding);

/* Emacs has a mechanism to automatically detect a coding system if it
   is one of Emacs' internal format, ISO2022, SJIS, or BIG5.  But
   it's impossible to distinguish some coding systems accurately
   because they use the same range of codes.  So, at first, coding
   systems are grouped into the following categories:

   o coding-category-emacs-mule

        The category for a coding system which has the same code range
        as Emacs' internal format.  Assigned the coding-system (Lisp
        symbol) `emacs-mule' by default.

   o coding-category-sjis

        The category for a coding system which has the same code range
        as SJIS.  Assigned the coding-system (Lisp
        symbol) `japanese-shift-jis' by default.

   o coding-category-iso-7

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment.  This doesn't use any locking
        shift and single shift functions.  This can encode/decode all
        charsets.  Assigned the coding-system (Lisp symbol)
        `iso-2022-7bit' by default.

   o coding-category-iso-7-tight

        Same as coding-category-iso-7 except that this can
        encode/decode only the specified charsets.

   o coding-category-iso-8-1

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION1 charset.  This doesn't use any locking shift
        and single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-latin-1' by default.

   o coding-category-iso-8-2

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION2 charset.  This doesn't use any locking shift
        and single shift functions.  Assigned the coding-system (Lisp
        symbol) `japanese-iso-8bit' by default.

   o coding-category-iso-7-else

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-7bit-lock' by default.

   o coding-category-iso-8-else

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-8bit-ss2' by default.

   o coding-category-big5

        The category for a coding system which has the same code range
        as BIG5.  Assigned the coding-system (Lisp symbol)
        `cn-big5' by default.

   o coding-category-utf-8

        The category for a coding system which has the same code range
        as UTF-8 (cf. RFC3629).  Assigned the coding-system (Lisp
        symbol) `utf-8' by default.

   o coding-category-utf-16-be

        The category for a coding system in which a text has an
        Unicode signature (cf. Unicode Standard) in the order of BIG
        endian at the head.  Assigned the coding-system (Lisp symbol)
        `utf-16-be' by default.

   o coding-category-utf-16-le

        The category for a coding system in which a text has an
        Unicode signature (cf. Unicode Standard) in the order of
        LITTLE endian at the head.  Assigned the coding-system (Lisp
        symbol) `utf-16-le' by default.

   o coding-category-ccl

        The category for a coding system of which encoder/decoder is
        written in CCL programs.  The default value is nil, i.e., no
        coding system is assigned.

   o coding-category-binary

        The category for a coding system not categorized in any of the
        above.  Assigned the coding-system (Lisp symbol)
        `no-conversion' by default.

   Each of them is a Lisp symbol and the value is an actual
   `coding-system' (this is also a Lisp symbol) assigned by a user.
   What Emacs does actually is to detect a category of coding system.
   Then, it uses a `coding-system' assigned to it.  If Emacs can't
   decide a single possible category, it selects a category of the
   highest priority.  Priorities of categories are also specified by a
   user in a Lisp variable `coding-category-list'.

*/

int ascii_skip_code[256];

/* Detect how a text of length SRC_BYTES pointed to by SOURCE is
   encoded.  If it detects possible coding systems, return an integer
   in which appropriate flag bits are set.  Flag bits are defined by
   macros CODING_CATEGORY_MASK_XXX in `coding.h'.  If PRIORITIES is
   non-NULL, it should point to the table `coding_priorities'.  In
   that case, only the flag bit for a coding system of the highest
   priority is set in the returned value.  If MULTIBYTEP is nonzero,
   8-bit codes of the range 0x80..0x9F are in multibyte form.

   The number of ASCII characters at the head is returned in *SKIP.  */

detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
     unsigned char *source;
     int src_bytes, *priorities, *skip;
  register unsigned char c;
  unsigned char *src = source, *src_end = source + src_bytes;
  unsigned int mask, utf16_examined_p, iso2022_examined_p;
  int null_byte_found;
  int latin_extra_code_state = 1;

  /* At first, skip all ASCII characters and control characters except
     for three ISO2022 specific control characters.  */
  ascii_skip_code[ISO_CODE_SO] = 0;
  ascii_skip_code[ISO_CODE_SI] = 0;
  ascii_skip_code[ISO_CODE_ESC] = 0;

 label_loop_detect_coding:
  null_byte_found = 0;
  while (src < src_end && ascii_skip_code[*src])
    null_byte_found |= (! *src++);
  if (! null_byte_found)
      unsigned char *p = src + 1;
        null_byte_found |= (! *p++);
  *skip = src - source;

    /* We found nothing other than ASCII (and NULL byte).  There's
       nothing to do.  */
  /* The text seems to be encoded in some multilingual coding system.
     Now, try to find in which coding system the text is encoded.  */
  if (! null_byte_found && c < 0x80)
      /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
      /* C is an ISO2022 specific control code of C0.  */
      latin_extra_code_state = 1;
      mask = detect_coding_iso2022 (src, src_end, multibytep,
                                    &latin_extra_code_state);
          /* No valid ISO2022 code follows C.  Try again.  */
          if (c == ISO_CODE_ESC)
            ascii_skip_code[ISO_CODE_ESC] = 1;
            ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
          goto label_loop_detect_coding;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;
      if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)

      if (null_byte_found)
          try = (CODING_CATEGORY_MASK_UTF_16_BE
                 | CODING_CATEGORY_MASK_UTF_16_LE);
          /* C is the first byte of SJIS character code,
             or a leading-code of Emacs' internal format (emacs-mule),
             or the first byte of UTF-16.  */
          try = (CODING_CATEGORY_MASK_SJIS
                 | CODING_CATEGORY_MASK_EMACS_MULE
                 | CODING_CATEGORY_MASK_UTF_16_BE
                 | CODING_CATEGORY_MASK_UTF_16_LE);

          /* Or, if C is a special latin extra code,
             or is an ISO2022 specific control code of C1 (SS2 or SS3),
             or is an ISO2022 control-sequence-introducer (CSI),
             we should also consider the possibility of ISO2022 codings.  */
          if ((latin_extra_code_state
               && VECTORP (Vlatin_extra_code_table)
               && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
              || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
              || (c == ISO_CODE_CSI
                      || ((*src == '0' || *src == '1' || *src == '2')
                          && src + 1 < src_end
                          && src[1] == ']')))))
            try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
                    | CODING_CATEGORY_MASK_ISO_8BIT);
        /* C is a character of ISO2022 in graphic plane right,
           or a SJIS's 1-byte character code (i.e. JISX0201),
           or the first byte of BIG5's 2-byte code,
           or the first byte of UTF-8/16.  */
        try = (CODING_CATEGORY_MASK_ISO_8_ELSE
               | CODING_CATEGORY_MASK_ISO_8BIT
               | CODING_CATEGORY_MASK_SJIS
               | CODING_CATEGORY_MASK_BIG5
               | CODING_CATEGORY_MASK_UTF_8
               | CODING_CATEGORY_MASK_UTF_16_BE
               | CODING_CATEGORY_MASK_UTF_16_LE);

      /* Or, we may have to consider the possibility of CCL.  */
      if (! null_byte_found
          && coding_system_table[CODING_CATEGORY_IDX_CCL]
          && (coding_system_table[CODING_CATEGORY_IDX_CCL]
              ->spec.ccl.valid_codes)[c])
        try |= CODING_CATEGORY_MASK_CCL;

          /* At first try detection with Latin extra codes not-allowed.
             If no proper coding system is found because of Latin extra
             codes, try detection with Latin extra codes allowed.  */
          latin_extra_code_state = 0;
          utf16_examined_p = iso2022_examined_p = 0;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (!iso2022_examined_p
                  && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
                  mask |= detect_coding_iso2022 (src, src_end, multibytep,
                                                 &latin_extra_code_state);
                  iso2022_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
                mask |= detect_coding_sjis (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
                mask |= detect_coding_utf_8 (src, src_end, multibytep);
              else if (!utf16_examined_p
                       && (priorities[i] & try &
                           CODING_CATEGORY_MASK_UTF_16_BE_LE))
                  mask |= detect_coding_utf_16 (src, src_end, multibytep);
                  utf16_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
                mask |= detect_coding_big5 (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
                mask |= detect_coding_emacs_mule (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
                mask |= detect_coding_ccl (src, src_end, multibytep);
              else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
                  if (latin_extra_code_state == 1)
                      /* Detection of ISO-2022 based coding system
                         failed because of Latin extra codes.  Before
                         falling back to raw-text, try again with
                         Latin extra codes allowed.  */
                      latin_extra_code_state = 2;
                      try = (mask | CODING_CATEGORY_MASK_ISO_8_ELSE
                             | CODING_CATEGORY_MASK_ISO_8BIT);
                  mask |= CODING_CATEGORY_MASK_RAW_TEXT;
              else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
                  if (latin_extra_code_state == 1)
                      /* See the above comment.  */
                      latin_extra_code_state = 2;
                      try = (mask | CODING_CATEGORY_MASK_ISO_8_ELSE
                             | CODING_CATEGORY_MASK_ISO_8BIT);
                  mask |= CODING_CATEGORY_MASK_BINARY;
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;
      if (try & CODING_CATEGORY_MASK_ISO)
        mask |= detect_coding_iso2022 (src, src_end, multibytep,
                                       &latin_extra_code_state);
      if (try & CODING_CATEGORY_MASK_SJIS)
        mask |= detect_coding_sjis (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_BIG5)
        mask |= detect_coding_big5 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_8)
        mask |= detect_coding_utf_8 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
        mask |= detect_coding_utf_16 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_EMACS_MULE)
        mask |= detect_coding_emacs_mule (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_CCL)
        mask |= detect_coding_ccl (src, src_end, multibytep);
  return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
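
/* Illustration only, not part of Emacs: a minimal sketch of the
   "return the highest-priority category present in MASK" step that
   detect_coding_mask applies when PRIORITIES is given.  The function
   name and its FALLBACK parameter are hypothetical; in the code above
   the fallback is CODING_CATEGORY_MASK_RAW_TEXT.  */

```c
/* PRIORITIES holds N single-bit category masks, highest priority
   first.  Return the first one whose bit is set in MASK, or FALLBACK
   if none of them matches.  */
unsigned int
sketch_pick_by_priority (unsigned int mask, const unsigned int *priorities,
                         int n, unsigned int fallback)
{
  int i;

  for (i = 0; i < n; i++)
    if (mask & priorities[i])
      return priorities[i];
  return fallback;
}
```

/* The real loop interleaves this test with the detectors, so it can
   return as soon as a high-priority category matches without running
   the remaining detectors.  */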

/* Detect how a text of length SRC_BYTES pointed by SRC is encoded.
   The information of the detected coding system is set in CODING.  */

detect_coding (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
  val = Vcoding_category_list;
  mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip,
                             coding->src_multibyte);
  coding->heading_ascii = skip;

      /* We found a single coding system of the highest priority in MASK.  */
      while (mask && ! (mask & 1)) mask >>= 1, idx++;
        idx = CODING_CATEGORY_IDX_RAW_TEXT;

      val = find_symbol_value (XVECTOR (Vcoding_category_table)->contents[idx]);

      if (coding->eol_type != CODING_EOL_UNDECIDED)
          tmp = Fget (val, Qeol_type);
            val = XVECTOR (tmp)->contents[coding->eol_type];

  /* Setup this new coding system while preserving some slots.  */
    int src_multibyte = coding->src_multibyte;
    int dst_multibyte = coding->dst_multibyte;

    setup_coding_system (val, coding);
    coding->src_multibyte = src_multibyte;
    coding->dst_multibyte = dst_multibyte;
    coding->heading_ascii = skip;

/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
   SOURCE is encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF,
   CODING_EOL_CR, CODING_EOL_UNDECIDED, or CODING_EOL_INCONSISTENT.

   The number of non-eol characters at the head is returned in *SKIP.  */

#define MAX_EOL_CHECK_COUNT 3

detect_eol_type (source, src_bytes, skip)
     const unsigned char *source;
     int src_bytes, *skip;
  const unsigned char *src = source, *src_end = src + src_bytes;
  int total  = 0;               /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;

  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
      if (c == '\n' || c == '\r')
            *skip = src - 1 - source;
            this_eol_type = CODING_EOL_LF;
          else if (src >= src_end || *src != '\n')
            this_eol_type = CODING_EOL_CR;
            this_eol_type = CODING_EOL_CRLF, src++;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what found before.  */
              eol_type = CODING_EOL_INCONSISTENT;

    *skip = src_end - source;
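
/* Illustration only, not part of Emacs: a self-contained sketch of
   the detection loop above, assuming the same rules (examine at most
   three end-of-lines, report a mix as inconsistent) and leaving out
   the *SKIP bookkeeping.  The SK_* constants and sketch_detect_eol
   are hypothetical stand-ins for the CODING_EOL_* values.  */

```c
#include <stddef.h>

enum
{
  SK_EOL_UNDECIDED, SK_EOL_LF, SK_EOL_CRLF, SK_EOL_CR, SK_EOL_INCONSISTENT
};

#define SK_MAX_EOL_CHECK_COUNT 3

int
sketch_detect_eol (const unsigned char *source, size_t src_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  int total = 0;                /* End-of-lines found so far.  */
  int eol_type = SK_EOL_UNDECIDED;

  while (src < src_end && total < SK_MAX_EOL_CHECK_COUNT)
    {
      unsigned char c = *src++;
      int this_eol_type;

      if (c != '\n' && c != '\r')
        continue;
      total++;
      if (c == '\n')
        this_eol_type = SK_EOL_LF;
      else if (src >= src_end || *src != '\n')
        this_eol_type = SK_EOL_CR;      /* Lone CR.  */
      else
        this_eol_type = SK_EOL_CRLF, src++;     /* Consume the LF.  */

      if (eol_type == SK_EOL_UNDECIDED)
        eol_type = this_eol_type;       /* First end-of-line seen.  */
      else if (eol_type != this_eol_type)
        return SK_EOL_INCONSISTENT;
    }
  return eol_type;
}
```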

/* Like detect_eol_type, but detect EOL type in 2-octet
   big-endian/little-endian format for coding systems utf-16-be and
   utf-16-le.  */

detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
     const unsigned char *source;
     int src_bytes, *skip, big_endian_p;
  const unsigned char *src = source, *src_end = src + src_bytes;
  unsigned int c1, c2;
  int total = 0;                /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;

  while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
      c1 = (src[msb] << 8) | (src[lsb]);

      if (c1 == '\n' || c1 == '\r')
            *skip = src - 2 - source;
              this_eol_type = CODING_EOL_LF;
              if ((src + 1) >= src_end)
                  this_eol_type = CODING_EOL_CR;
                  c2 = (src[msb] << 8) | (src[lsb]);
                    this_eol_type = CODING_EOL_CRLF, src += 2;
                    this_eol_type = CODING_EOL_CR;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what found before.  */
              eol_type = CODING_EOL_INCONSISTENT;

    *skip = src_end - source;
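
/* Illustration only, not part of Emacs: a minimal sketch of how one
   2-octet code unit is assembled from a byte pair, as in the
   `(src[msb] << 8) | src[lsb]' expressions above.  It assumes MSB
   and LSB are simply the byte indexes selected by BIG_ENDIAN_P; the
   function name is hypothetical.  */

```c
/* Return the 16-bit code unit stored in the two bytes at P.  */
unsigned int
sketch_read_2_octet_unit (const unsigned char *p, int big_endian_p)
{
  int msb = big_endian_p ? 0 : 1;
  int lsb = big_endian_p ? 1 : 0;

  return ((unsigned int) p[msb] << 8) | p[lsb];
}
```

/* With this helper, the bytes 0x00 0x0A read as '\n' in big-endian
   order but as 0x0A00 in little-endian order, which is why the byte
   order must be known before looking for end-of-lines.  */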
4518 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
4519 is encoded. If it detects an appropriate format of end-of-line, it
4520 sets the information in *CODING. */
4523 detect_eol (coding
, src
, src_bytes
)
4524 struct coding_system
*coding
;
4525 const unsigned char *src
;
4532 switch (coding
->category_idx
)
4534 case CODING_CATEGORY_IDX_UTF_16_BE
:
4535 eol_type
= detect_eol_type_in_2_octet_form (src
, src_bytes
, &skip
, 1);
4537 case CODING_CATEGORY_IDX_UTF_16_LE
:
4538 eol_type
= detect_eol_type_in_2_octet_form (src
, src_bytes
, &skip
, 0);
4541 eol_type
= detect_eol_type (src
, src_bytes
, &skip
);
4545 if (coding
->heading_ascii
> skip
)
4546 coding
->heading_ascii
= skip
;
4548 skip
= coding
->heading_ascii
;
4550 if (eol_type
== CODING_EOL_UNDECIDED
)
4552 if (eol_type
== CODING_EOL_INCONSISTENT
)
/* This code is suppressed until we find a better way to
distinguish a raw text file from a binary file. */
4558 /* If we have already detected that the coding is raw-text, the
4559 coding should actually be no-conversion. */
4560 if (coding
->type
== coding_type_raw_text
)
4562 setup_coding_system (Qno_conversion
, coding
);
4565 /* Else, let's decode only text code anyway. */
4567 eol_type
= CODING_EOL_LF
;
4570 val
= Fget (coding
->symbol
, Qeol_type
);
4571 if (VECTORP (val
) && XVECTOR (val
)->size
== 3)
4573 int src_multibyte
= coding
->src_multibyte
;
4574 int dst_multibyte
= coding
->dst_multibyte
;
4575 struct composition_data
*cmp_data
= coding
->cmp_data
;
4577 setup_coding_system (XVECTOR (val
)->contents
[eol_type
], coding
);
4578 coding
->src_multibyte
= src_multibyte
;
4579 coding
->dst_multibyte
= dst_multibyte
;
4580 coding
->heading_ascii
= skip
;
4581 coding
->cmp_data
= cmp_data
;
4585 #define CONVERSION_BUFFER_EXTRA_ROOM 256
4587 #define DECODING_BUFFER_MAG(coding) \
4588 (coding->type == coding_type_iso2022 \
4590 : (coding->type == coding_type_ccl \
4591 ? coding->spec.ccl.decoder.buf_magnification \
/* Return the maximum size (in bytes) of a buffer large enough for
decoding SRC_BYTES of text encoded in CODING. */
4598 decoding_buffer_size (coding
, src_bytes
)
4599 struct coding_system
*coding
;
4602 return (src_bytes
* DECODING_BUFFER_MAG (coding
)
4603 + CONVERSION_BUFFER_EXTRA_ROOM
);
/* Return the maximum size (in bytes) of a buffer large enough for
encoding SRC_BYTES of text to CODING. */
4610 encoding_buffer_size (coding
, src_bytes
)
4611 struct coding_system
*coding
;
4616 if (coding
->type
== coding_type_ccl
)
4618 magnification
= coding
->spec
.ccl
.encoder
.buf_magnification
;
4619 if (coding
->eol_type
== CODING_EOL_CRLF
)
4622 else if (CODING_REQUIRE_ENCODING (coding
))
4627 return (src_bytes
* magnification
+ CONVERSION_BUFFER_EXTRA_ROOM
);
4630 /* Working buffer for code conversion. */
4631 struct conversion_buffer
4633 int size
; /* size of data. */
4634 int on_stack
; /* 1 if allocated by alloca. */
4635 unsigned char *data
;
4638 /* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
4639 #define allocate_conversion_buffer(buf, len) \
4641 if (len < MAX_ALLOCA) \
4643 buf.data = (unsigned char *) alloca (len); \
4648 buf.data = (unsigned char *) xmalloc (len); \
4654 /* Double the allocated memory for *BUF. */
4656 extend_conversion_buffer (buf
)
4657 struct conversion_buffer
*buf
;
4661 unsigned char *save
= buf
->data
;
4662 buf
->data
= (unsigned char *) xmalloc (buf
->size
* 2);
4663 bcopy (save
, buf
->data
, buf
->size
);
4668 buf
->data
= (unsigned char *) xrealloc (buf
->data
, buf
->size
* 2);
4673 /* Free the allocated memory for BUF if it is not on stack. */
4675 free_conversion_buffer (buf
)
4676 struct conversion_buffer
*buf
;
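/* Illustrative sketch only, not part of the original file: the
   allocate/extend/free trio above follows a "start on the stack, move
   to the heap when doubling" pattern.  The version below is a minimal
   standalone rendering of that idea.  All sketch_* names are
   hypothetical, and plain malloc, realloc and memcpy are assumed in
   place of xmalloc, xrealloc and bcopy, so allocation failure is
   reported through a return value instead of being handled by the
   allocator.  */

```c
#include <stdlib.h>
#include <string.h>

struct sketch_buffer
{
  size_t size;            /* capacity of DATA in bytes */
  int on_stack;           /* 1 if DATA is caller-provided storage */
  unsigned char *data;
};

/* Start with caller-provided (for example, stack) storage of LEN bytes.  */
static void
sketch_init (struct sketch_buffer *buf, unsigned char *storage, size_t len)
{
  buf->size = len;
  buf->on_stack = 1;
  buf->data = storage;
}

/* Double the capacity of *BUF.  Return 0 on success, -1 on failure.  */
static int
sketch_extend (struct sketch_buffer *buf)
{
  unsigned char *grown;

  if (buf->on_stack)
    {
      /* Caller storage cannot be resized: copy it to the heap.  */
      grown = malloc (buf->size * 2);
      if (!grown)
        return -1;
      memcpy (grown, buf->data, buf->size);
      buf->on_stack = 0;
    }
  else
    {
      grown = realloc (buf->data, buf->size * 2);
      if (!grown)
        return -1;
    }
  buf->data = grown;
  buf->size *= 2;
  return 0;
}

/* Release the memory of *BUF unless it is caller-provided.  */
static void
sketch_free (struct sketch_buffer *buf)
{
  if (!buf->on_stack)
    free (buf->data);
  buf->data = NULL;
  buf->size = 0;
}
```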
4683 ccl_coding_driver (coding
, source
, destination
, src_bytes
, dst_bytes
, encodep
)
4684 struct coding_system
*coding
;
4685 unsigned char *source
, *destination
;
4686 int src_bytes
, dst_bytes
, encodep
;
4688 struct ccl_program
*ccl
4689 = encodep
? &coding
->spec
.ccl
.encoder
: &coding
->spec
.ccl
.decoder
;
4690 unsigned char *dst
= destination
;
4692 ccl
->suppress_error
= coding
->suppress_error
;
4693 ccl
->last_block
= coding
->mode
& CODING_MODE_LAST_BLOCK
;
/* On encoding, EOL format is converted within ccl_driver. For
that, set up the proper information in the structure CCL. */
4698 ccl
->eol_type
= coding
->eol_type
;
4699 if (ccl
->eol_type
==CODING_EOL_UNDECIDED
)
4700 ccl
->eol_type
= CODING_EOL_LF
;
4701 ccl
->cr_consumed
= coding
->spec
.ccl
.cr_carryover
;
4702 ccl
->eight_bit_control
= coding
->dst_multibyte
;
4705 ccl
->eight_bit_control
= 1;
4706 ccl
->multibyte
= coding
->src_multibyte
;
4707 if (coding
->spec
.ccl
.eight_bit_carryover
[0] != 0)
4709 /* Move carryover bytes to DESTINATION. */
4710 unsigned char *p
= coding
->spec
.ccl
.eight_bit_carryover
;
4713 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
4715 dst_bytes
-= dst
- destination
;
4718 coding
->produced
= (ccl_driver (ccl
, source
, dst
, src_bytes
, dst_bytes
,
4719 &(coding
->consumed
))
4720 + dst
- destination
);
4724 coding
->produced_char
= coding
->produced
;
4725 coding
->spec
.ccl
.cr_carryover
= ccl
->cr_consumed
;
4727 else if (!ccl
->eight_bit_control
)
/* The produced bytes form a valid multibyte sequence. */
4730 coding
->produced_char
4731 = multibyte_chars_in_text (destination
, coding
->produced
);
4732 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
/* On decoding, the destination should always be multibyte. But
the CCL program might have generated an invalid multibyte
sequence. Here we make such a sequence valid as
4741 = dst_bytes
? dst_bytes
: source
+ coding
->consumed
- destination
;
4743 if ((coding
->consumed
< src_bytes
4744 || !ccl
->last_block
)
4745 && coding
->produced
>= 1
4746 && destination
[coding
->produced
- 1] >= 0x80)
/* We should not convert the trailing 8-bit codes to
multibyte form even if they don't form a valid
multibyte sequence. They may form a valid sequence in
4754 if (destination
[coding
->produced
- 1] < 0xA0)
4756 else if (coding
->produced
>= 2)
4758 if (destination
[coding
->produced
- 2] >= 0x80)
4760 if (destination
[coding
->produced
- 2] < 0xA0)
4762 else if (coding
->produced
>= 3
4763 && destination
[coding
->produced
- 3] >= 0x80
4764 && destination
[coding
->produced
- 3] < 0xA0)
4770 BCOPY_SHORT (destination
+ coding
->produced
- carryover
,
4771 coding
->spec
.ccl
.eight_bit_carryover
,
4773 coding
->spec
.ccl
.eight_bit_carryover
[carryover
] = 0;
4774 coding
->produced
-= carryover
;
4777 coding
->produced
= str_as_multibyte (destination
, bytes
,
4779 &(coding
->produced_char
));
4782 switch (ccl
->status
)
4784 case CCL_STAT_SUSPEND_BY_SRC
:
4785 coding
->result
= CODING_FINISH_INSUFFICIENT_SRC
;
4787 case CCL_STAT_SUSPEND_BY_DST
:
4788 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
4791 case CCL_STAT_INVALID_CMD
:
4792 coding
->result
= CODING_FINISH_INTERRUPT
;
4795 coding
->result
= CODING_FINISH_NORMAL
;
4798 return coding
->result
;
4801 /* Decode EOL format of the text at PTR of BYTES length destructively
4802 according to CODING->eol_type. This is called after the CCL
4803 program produced a decoded text at PTR. If we do CRLF->LF
4804 conversion, update CODING->produced and CODING->produced_char. */
4807 decode_eol_post_ccl (coding
, ptr
, bytes
)
4808 struct coding_system
*coding
;
4812 Lisp_Object val
, saved_coding_symbol
;
4813 unsigned char *pend
= ptr
+ bytes
;
4816 /* Remember the current coding system symbol. We set it back when
4817 an inconsistent EOL is found so that `last-coding-system-used' is
4818 set to the coding system that doesn't specify EOL conversion. */
4819 saved_coding_symbol
= coding
->symbol
;
4821 coding
->spec
.ccl
.cr_carryover
= 0;
4822 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
4824 /* Here, to avoid the call of setup_coding_system, we directly
4825 call detect_eol_type. */
4826 coding
->eol_type
= detect_eol_type (ptr
, bytes
, &dummy
);
4827 if (coding
->eol_type
== CODING_EOL_INCONSISTENT
)
4828 coding
->eol_type
= CODING_EOL_LF
;
4829 if (coding
->eol_type
!= CODING_EOL_UNDECIDED
)
4831 val
= Fget (coding
->symbol
, Qeol_type
);
4832 if (VECTORP (val
) && XVECTOR (val
)->size
== 3)
4833 coding
->symbol
= XVECTOR (val
)->contents
[coding
->eol_type
];
4835 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
4838 if (coding
->eol_type
== CODING_EOL_LF
4839 || coding
->eol_type
== CODING_EOL_UNDECIDED
)
4841 /* We have nothing to do. */
4844 else if (coding
->eol_type
== CODING_EOL_CRLF
)
4846 unsigned char *pstart
= ptr
, *p
= ptr
;
4848 if (! (coding
->mode
& CODING_MODE_LAST_BLOCK
)
4849 && *(pend
- 1) == '\r')
4851 /* If the last character is CR, we can't handle it here
4852 because LF will be in the not-yet-decoded source text.
4853 Record that the CR is not yet processed. */
4854 coding
->spec
.ccl
.cr_carryover
= 1;
4856 coding
->produced_char
--;
4863 if (ptr
+ 1 < pend
&& *(ptr
+ 1) == '\n')
4870 if (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4871 goto undo_eol_conversion
;
4875 else if (*ptr
== '\n'
4876 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4877 goto undo_eol_conversion
;
4882 undo_eol_conversion
:
/* We have encountered an inconsistent EOL format at PTR.
Convert all LFs before PTR back to CRLFs. */
4885 for (p
--, ptr
--; p
>= pstart
; p
--)
4888 *ptr
-- = '\n', *ptr
-- = '\r';
4892 /* If carryover is recorded, cancel it because we don't
4893 convert CRLF anymore. */
4894 if (coding
->spec
.ccl
.cr_carryover
)
4896 coding
->spec
.ccl
.cr_carryover
= 0;
4898 coding
->produced_char
++;
4902 coding
->eol_type
= CODING_EOL_LF
;
4903 coding
->symbol
= saved_coding_symbol
;
4907 /* As each two-byte sequence CRLF was converted to LF, (PEND
4908 - P) is the number of deleted characters. */
4909 coding
->produced
-= pend
- p
;
4910 coding
->produced_char
-= pend
- p
;
4913 else /* i.e. coding->eol_type == CODING_EOL_CR */
4915 unsigned char *p
= ptr
;
4917 for (; ptr
< pend
; ptr
++)
4921 else if (*ptr
== '\n'
4922 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4924 for (; p
< ptr
; p
++)
4930 coding
->eol_type
= CODING_EOL_LF
;
4931 coding
->symbol
= saved_coding_symbol
;
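/* Illustrative sketch only, not part of the original file: the core
   of the CRLF branch above is an in-place compaction that turns each
   CR LF pair into a single LF.  The helper name sketch_crlf_to_lf is
   hypothetical.  The CR carryover across blocks and the "undo on
   inconsistent EOL" path of decode_eol_post_ccl are intentionally
   omitted, so a lone CR or LF is simply copied through.  */

```c
#include <stddef.h>

/* Collapse every CR LF pair in BUF[0..LEN) to LF, in place.
   Return the new length, which is LEN minus the number of pairs.  */
static size_t
sketch_crlf_to_lf (unsigned char *buf, size_t len)
{
  size_t r = 0;   /* read index */
  size_t w = 0;   /* write index, never ahead of R */

  while (r < len)
    {
      if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
        r++;                    /* drop the CR; the LF is copied below */
      buf[w++] = buf[r++];
    }
  return w;
}
```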
/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
decoding, it may detect the coding system and the format of end-of-line
if those are not yet decided. The source should be unibyte; the
result is multibyte if CODING->dst_multibyte is nonzero, else
unibyte. */
4944 decode_coding (coding
, source
, destination
, src_bytes
, dst_bytes
)
4945 struct coding_system
*coding
;
4946 const unsigned char *source
;
4947 unsigned char *destination
;
4948 int src_bytes
, dst_bytes
;
4952 if (coding
->type
== coding_type_undecided
)
4953 detect_coding (coding
, source
, src_bytes
);
4955 if (coding
->eol_type
== CODING_EOL_UNDECIDED
4956 && coding
->type
!= coding_type_ccl
)
4958 detect_eol (coding
, source
, src_bytes
);
4959 /* We had better recover the original eol format if we
4960 encounter an inconsistent eol format while decoding. */
4961 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
4964 coding
->produced
= coding
->produced_char
= 0;
4965 coding
->consumed
= coding
->consumed_char
= 0;
4967 coding
->result
= CODING_FINISH_NORMAL
;
4969 switch (coding
->type
)
4971 case coding_type_sjis
:
4972 decode_coding_sjis_big5 (coding
, source
, destination
,
4973 src_bytes
, dst_bytes
, 1);
4976 case coding_type_iso2022
:
4977 decode_coding_iso2022 (coding
, source
, destination
,
4978 src_bytes
, dst_bytes
);
4981 case coding_type_big5
:
4982 decode_coding_sjis_big5 (coding
, source
, destination
,
4983 src_bytes
, dst_bytes
, 0);
4986 case coding_type_emacs_mule
:
4987 decode_coding_emacs_mule (coding
, source
, destination
,
4988 src_bytes
, dst_bytes
);
4991 case coding_type_ccl
:
4992 if (coding
->spec
.ccl
.cr_carryover
)
4994 /* Put the CR which was not processed by the previous call
4995 of decode_eol_post_ccl in DESTINATION. It will be
4996 decoded together with the following LF by the call to
4997 decode_eol_post_ccl below. */
4998 *destination
= '\r';
5000 coding
->produced_char
++;
5002 extra
= coding
->spec
.ccl
.cr_carryover
;
5004 ccl_coding_driver (coding
, source
, destination
+ extra
,
5005 src_bytes
, dst_bytes
, 0);
5006 if (coding
->eol_type
!= CODING_EOL_LF
)
5008 coding
->produced
+= extra
;
5009 coding
->produced_char
+= extra
;
5010 decode_eol_post_ccl (coding
, destination
, coding
->produced
);
5015 decode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
5018 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
5019 && coding
->mode
& CODING_MODE_LAST_BLOCK
5020 && coding
->consumed
== src_bytes
)
5021 coding
->result
= CODING_FINISH_NORMAL
;
5023 if (coding
->mode
& CODING_MODE_LAST_BLOCK
5024 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
5026 const unsigned char *src
= source
+ coding
->consumed
;
5027 unsigned char *dst
= destination
+ coding
->produced
;
5029 src_bytes
-= coding
->consumed
;
5031 if (COMPOSING_P (coding
))
5032 DECODE_COMPOSITION_END ('1');
5036 dst
+= CHAR_STRING (c
, dst
);
5037 coding
->produced_char
++;
5039 coding
->consumed
= coding
->consumed_char
= src
- source
;
5040 coding
->produced
= dst
- destination
;
5041 coding
->result
= CODING_FINISH_NORMAL
;
5044 if (!coding
->dst_multibyte
)
5046 coding
->produced
= str_as_unibyte (destination
, coding
->produced
);
5047 coding
->produced_char
= coding
->produced
;
5050 return coding
->result
;
5053 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
5054 multibyteness of the source is CODING->src_multibyte, the
5055 multibyteness of the result is always unibyte. */
5058 encode_coding (coding
, source
, destination
, src_bytes
, dst_bytes
)
5059 struct coding_system
*coding
;
5060 const unsigned char *source
;
5061 unsigned char *destination
;
5062 int src_bytes
, dst_bytes
;
5064 coding
->produced
= coding
->produced_char
= 0;
5065 coding
->consumed
= coding
->consumed_char
= 0;
5067 coding
->result
= CODING_FINISH_NORMAL
;
5068 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
5069 coding
->eol_type
= CODING_EOL_LF
;
5071 switch (coding
->type
)
5073 case coding_type_sjis
:
5074 encode_coding_sjis_big5 (coding
, source
, destination
,
5075 src_bytes
, dst_bytes
, 1);
5078 case coding_type_iso2022
:
5079 encode_coding_iso2022 (coding
, source
, destination
,
5080 src_bytes
, dst_bytes
);
5083 case coding_type_big5
:
5084 encode_coding_sjis_big5 (coding
, source
, destination
,
5085 src_bytes
, dst_bytes
, 0);
5088 case coding_type_emacs_mule
:
5089 encode_coding_emacs_mule (coding
, source
, destination
,
5090 src_bytes
, dst_bytes
);
5093 case coding_type_ccl
:
5094 ccl_coding_driver (coding
, source
, destination
,
5095 src_bytes
, dst_bytes
, 1);
5099 encode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
5102 if (coding
->mode
& CODING_MODE_LAST_BLOCK
5103 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
5105 const unsigned char *src
= source
+ coding
->consumed
;
5106 unsigned char *dst
= destination
+ coding
->produced
;
5108 if (coding
->type
== coding_type_iso2022
)
5109 ENCODE_RESET_PLANE_AND_REGISTER
;
5110 if (COMPOSING_P (coding
))
5111 *dst
++ = ISO_CODE_ESC
, *dst
++ = '1';
5112 if (coding
->consumed
< src_bytes
)
5114 int len
= src_bytes
- coding
->consumed
;
5116 BCOPY_SHORT (src
, dst
, len
);
5117 if (coding
->src_multibyte
)
5118 len
= str_as_unibyte (dst
, len
);
5120 coding
->consumed
= src_bytes
;
5122 coding
->produced
= coding
->produced_char
= dst
- destination
;
5123 coding
->result
= CODING_FINISH_NORMAL
;
5126 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
5127 && coding
->consumed
== src_bytes
)
5128 coding
->result
= CODING_FINISH_NORMAL
;
5130 return coding
->result
;
/* Scan the text in the region between *BEG and *END (byte positions),
skip characters at the head and tail which we don't have to decode
by coding system CODING, then set *BEG and *END to the region
of the text we actually have to convert. The caller should move
the gap out of the region in advance if the region is from a
buffer.

If STR is not NULL, *BEG and *END are indices into STR. */
5143 shrink_decoding_region (beg
, end
, coding
, str
)
5145 struct coding_system
*coding
;
5148 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
, c
;
5150 Lisp_Object translation_table
;
5152 if (coding
->type
== coding_type_ccl
5153 || coding
->type
== coding_type_undecided
5154 || coding
->eol_type
!= CODING_EOL_LF
5155 || !NILP (coding
->post_read_conversion
)
5156 || coding
->composing
!= COMPOSITION_DISABLED
)
5158 /* We can't skip any data. */
5161 if (coding
->type
== coding_type_no_conversion
5162 || coding
->type
== coding_type_raw_text
5163 || coding
->type
== coding_type_emacs_mule
)
5165 /* We need no conversion, but don't have to skip any data here.
5166 Decoding routine handles them effectively anyway. */
5170 translation_table
= coding
->translation_table_for_decode
;
5171 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5172 translation_table
= Vstandard_translation_table_for_decode
;
5173 if (CHAR_TABLE_P (translation_table
))
5176 for (i
= 0; i
< 128; i
++)
5177 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5180 /* Some ASCII character should be translated. We give up
5185 if (coding
->heading_ascii
>= 0)
5186 /* Detection routine has already found how much we can skip at the
5188 *beg
+= coding
->heading_ascii
;
5192 begp_orig
= begp
= str
+ *beg
;
5193 endp_orig
= endp
= str
+ *end
;
5197 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5198 endp_orig
= endp
= begp
+ *end
- *beg
;
5201 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5202 || coding
->eol_type
== CODING_EOL_CRLF
);
5204 switch (coding
->type
)
5206 case coding_type_sjis
:
5207 case coding_type_big5
:
5208 /* We can skip all ASCII characters at the head. */
5209 if (coding
->heading_ascii
< 0)
5212 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\r') begp
++;
5214 while (begp
< endp
&& *begp
< 0x80) begp
++;
5216 /* We can skip all ASCII characters at the tail except for the
5217 second byte of SJIS or BIG5 code. */
5219 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\r') endp
--;
5221 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5222 /* Do not consider LF as ascii if preceded by CR, since that
5223 confuses eol decoding. */
5224 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5226 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] >= 0x80)
5230 case coding_type_iso2022
:
5231 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5232 /* We can't skip any data. */
5234 if (coding
->heading_ascii
< 0)
5236 /* We can skip all ASCII characters at the head except for a
5237 few control codes. */
5238 while (begp
< endp
&& (c
= *begp
) < 0x80
5239 && c
!= ISO_CODE_CR
&& c
!= ISO_CODE_SO
5240 && c
!= ISO_CODE_SI
&& c
!= ISO_CODE_ESC
5241 && (!eol_conversion
|| c
!= ISO_CODE_LF
))
5244 switch (coding
->category_idx
)
5246 case CODING_CATEGORY_IDX_ISO_8_1
:
5247 case CODING_CATEGORY_IDX_ISO_8_2
:
5248 /* We can skip all ASCII characters at the tail. */
5250 while (begp
< endp
&& (c
= endp
[-1]) < 0x80 && c
!= '\r') endp
--;
5252 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5253 /* Do not consider LF as ascii if preceded by CR, since that
5254 confuses eol decoding. */
5255 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5259 case CODING_CATEGORY_IDX_ISO_7
:
5260 case CODING_CATEGORY_IDX_ISO_7_TIGHT
:
/* We can skip all characters at the tail except for 8-bit
codes, and ESC and the two bytes following it. */
5264 unsigned char *eight_bit
= NULL
;
5268 && (c
= endp
[-1]) != ISO_CODE_ESC
&& c
!= '\r')
5270 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5275 && (c
= endp
[-1]) != ISO_CODE_ESC
)
5277 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5280 /* Do not consider LF as ascii if preceded by CR, since that
5281 confuses eol decoding. */
5282 if (begp
< endp
&& endp
< endp_orig
5283 && endp
[-1] == '\r' && endp
[0] == '\n')
5285 if (begp
< endp
&& endp
[-1] == ISO_CODE_ESC
)
if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
5288 /* This is an ASCII designation sequence. We can
5289 surely skip the tail. But, if we have
5290 encountered an 8-bit code, skip only the codes
5292 endp
= eight_bit
? eight_bit
: endp
+ 2;
5294 /* Hmmm, we can't skip the tail. */
5306 *beg
+= begp
- begp_orig
;
5307 *end
+= endp
- endp_orig
;
5311 /* Like shrink_decoding_region but for encoding. */
5314 shrink_encoding_region (beg
, end
, coding
, str
)
5316 struct coding_system
*coding
;
5319 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
;
5321 Lisp_Object translation_table
;
5323 if (coding
->type
== coding_type_ccl
5324 || coding
->eol_type
== CODING_EOL_CRLF
5325 || coding
->eol_type
== CODING_EOL_CR
5326 || (coding
->cmp_data
&& coding
->cmp_data
->used
> 0))
5328 /* We can't skip any data. */
5331 if (coding
->type
== coding_type_no_conversion
5332 || coding
->type
== coding_type_raw_text
5333 || coding
->type
== coding_type_emacs_mule
5334 || coding
->type
== coding_type_undecided
)
5336 /* We need no conversion, but don't have to skip any data here.
5337 Encoding routine handles them effectively anyway. */
5341 translation_table
= coding
->translation_table_for_encode
;
5342 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5343 translation_table
= Vstandard_translation_table_for_encode
;
5344 if (CHAR_TABLE_P (translation_table
))
5347 for (i
= 0; i
< 128; i
++)
5348 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5351 /* Some ASCII character should be translated. We give up
5358 begp_orig
= begp
= str
+ *beg
;
5359 endp_orig
= endp
= str
+ *end
;
5363 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5364 endp_orig
= endp
= begp
+ *end
- *beg
;
5367 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5368 || coding
->eol_type
== CODING_EOL_CRLF
);
5370 /* Here, we don't have to check coding->pre_write_conversion because
5371 the caller is expected to have handled it already. */
5372 switch (coding
->type
)
5374 case coding_type_iso2022
:
5375 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5376 /* We can't skip any data. */
5378 if (coding
->flags
& CODING_FLAG_ISO_DESIGNATE_AT_BOL
)
5380 unsigned char *bol
= begp
;
5381 while (begp
< endp
&& *begp
< 0x80)
5384 if (begp
[-1] == '\n')
5388 goto label_skip_tail
;
5392 case coding_type_sjis
:
5393 case coding_type_big5
:
5394 /* We can skip all ASCII characters at the head and tail. */
5396 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\n') begp
++;
5398 while (begp
< endp
&& *begp
< 0x80) begp
++;
5401 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\n') endp
--;
5403 while (begp
< endp
&& *(endp
- 1) < 0x80) endp
--;
5410 *beg
+= begp
- begp_orig
;
5411 *end
+= endp
- endp_orig
;
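/* Illustrative sketch only, not part of the original file: both
   shrink functions above narrow [*BEG, *END) by stepping over ASCII
   bytes at the head and tail, since those bytes need no conversion.
   The helper below shows just that narrowing on a plain byte string.
   The name sketch_shrink_ascii is hypothetical, and the per-coding
   exceptions handled above (CR/LF under EOL conversion, ESC/SO/SI for
   ISO 2022, the ASCII-range second byte of SJIS or BIG5 codes) are
   assumed away.  */

```c
#include <stddef.h>

/* Advance *BEG past leading ASCII bytes of STR and move *END back
   over trailing ASCII bytes, keeping *BEG <= *END.  */
static void
sketch_shrink_ascii (const unsigned char *str, size_t *beg, size_t *end)
{
  while (*beg < *end && str[*beg] < 0x80)
    (*beg)++;
  while (*beg < *end && str[*end - 1] < 0x80)
    (*end)--;
}
```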
5415 /* As shrinking conversion region requires some overhead, we don't try
5416 shrinking if the length of conversion region is less than this
5418 static int shrink_conversion_region_threshhold
= 1024;
5420 #define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
5422 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
5424 if (encodep) shrink_encoding_region (beg, end, coding, str); \
5425 else shrink_decoding_region (beg, end, coding, str); \
/* ARG is (CODING BUFFER ...) where CODING is the value to be set in
Vlast_coding_system_used and the remaining elements are buffers to
kill. */
5433 code_convert_region_unwind (arg
)
5436 struct gcpro gcpro1
;
5439 inhibit_pre_post_conversion
= 0;
5440 Vlast_coding_system_used
= XCAR (arg
);
5441 for (arg
= XCDR (arg
); CONSP (arg
); arg
= XCDR (arg
))
5442 Fkill_buffer (XCAR (arg
));
/* Store information about all compositions in the range FROM to TO
of OBJ in memory blocks pointed to by CODING->cmp_data. OBJ is a
buffer or a string, and defaults to the current buffer. */
5453 coding_save_composition (coding
, from
, to
, obj
)
5454 struct coding_system
*coding
;
5461 if (coding
->composing
== COMPOSITION_DISABLED
)
5463 if (!coding
->cmp_data
)
5464 coding_allocate_composition_data (coding
, from
);
5465 if (!find_composition (from
, to
, &start
, &end
, &prop
, obj
)
5469 && (!find_composition (end
, to
, &start
, &end
, &prop
, obj
)
5472 coding
->composing
= COMPOSITION_NO
;
5475 if (COMPOSITION_VALID_P (start
, end
, prop
))
5477 enum composition_method method
= COMPOSITION_METHOD (prop
);
5478 if (coding
->cmp_data
->used
+ COMPOSITION_DATA_MAX_BUNCH_LENGTH
5479 >= COMPOSITION_DATA_SIZE
)
5480 coding_allocate_composition_data (coding
, from
);
5481 /* For relative composition, we remember start and end
5482 positions, for the other compositions, we also remember
5484 CODING_ADD_COMPOSITION_START (coding
, start
- from
, method
);
5485 if (method
!= COMPOSITION_RELATIVE
)
/* We must also store the components. */
5488 Lisp_Object val
, ch
;
5490 val
= COMPOSITION_COMPONENTS (prop
);
5494 ch
= XCAR (val
), val
= XCDR (val
);
5495 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5497 else if (VECTORP (val
) || STRINGP (val
))
5499 int len
= (VECTORP (val
)
5500 ? XVECTOR (val
)->size
: SCHARS (val
));
5502 for (i
= 0; i
< len
; i
++)
5505 ? Faref (val
, make_number (i
))
5506 : XVECTOR (val
)->contents
[i
]);
5507 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5510 else /* INTEGERP (val) */
5511 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (val
));
5513 CODING_ADD_COMPOSITION_END (coding
, end
- from
);
5518 && find_composition (start
, to
, &start
, &end
, &prop
, obj
)
5521 /* Make coding->cmp_data point to the first memory block. */
5522 while (coding
->cmp_data
->prev
)
5523 coding
->cmp_data
= coding
->cmp_data
->prev
;
5524 coding
->cmp_data_start
= 0;
5527 /* Reflect the saved information about compositions to OBJ.
5528 CODING->cmp_data points to a memory block for the information. OBJ
5529 is a buffer or a string, defaults to the current buffer. */
5532 coding_restore_composition (coding
, obj
)
5533 struct coding_system
*coding
;
5536 struct composition_data
*cmp_data
= coding
->cmp_data
;
5541 while (cmp_data
->prev
)
5542 cmp_data
= cmp_data
->prev
;
5548 for (i
= 0; i
< cmp_data
->used
&& cmp_data
->data
[i
] > 0;
5549 i
+= cmp_data
->data
[i
])
5551 int *data
= cmp_data
->data
+ i
;
5552 enum composition_method method
= (enum composition_method
) data
[3];
5553 Lisp_Object components
;
5555 if (data
[0] < 0 || i
+ data
[0] > cmp_data
->used
)
5556 /* Invalid composition data. */
5559 if (method
== COMPOSITION_RELATIVE
)
5563 int len
= data
[0] - 4, j
;
5564 Lisp_Object args
[MAX_COMPOSITION_COMPONENTS
* 2 - 1];
5566 if (method
== COMPOSITION_WITH_RULE_ALTCHARS
5570 /* Invalid composition data. */
5572 for (j
= 0; j
< len
; j
++)
5573 args
[j
] = make_number (data
[4 + j
]);
5574 components
= (method
== COMPOSITION_WITH_ALTCHARS
5575 ? Fstring (len
, args
)
5576 : Fvector (len
, args
));
5578 compose_text (data
[1], data
[2], components
, Qnil
, obj
);
5580 cmp_data
= cmp_data
->next
;
5584 /* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
5585 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
5586 coding system CODING, and return the status code of code conversion
5587 (currently, this value has no meaning).
5589 How many characters (and bytes) are converted to how many
5590 characters (and bytes) are recorded in members of the structure
5593 If REPLACE is nonzero, we do various things as if the original text
5594 is deleted and a new text is inserted. See the comments in
5595 replace_range (insdel.c) to know what we are doing.
5597 If REPLACE is zero, it is assumed that the source text is unibyte.
5598 Otherwise, it is assumed that the source text is multibyte. */
5601 code_convert_region (from
, from_byte
, to
, to_byte
, coding
, encodep
, replace
)
5602 int from
, from_byte
, to
, to_byte
, encodep
, replace
;
5603 struct coding_system
*coding
;
5605 int len
= to
- from
, len_byte
= to_byte
- from_byte
;
5606 int nchars_del
= 0, nbytes_del
= 0;
5607 int require
, inserted
, inserted_byte
;
5608 int head_skip
, tail_skip
, total_skip
= 0;
5609 Lisp_Object saved_coding_symbol
;
5611 unsigned char *src
, *dst
;
5612 Lisp_Object deletion
;
5613 int orig_point
= PT
, orig_len
= len
;
5615 int multibyte_p
= !NILP (current_buffer
->enable_multibyte_characters
);
5618 saved_coding_symbol
= coding
->symbol
;
5620 if (from
< PT
&& PT
< to
)
5622 TEMP_SET_PT_BOTH (from
, from_byte
);
5628 int saved_from
= from
;
5629 int saved_inhibit_modification_hooks
;
5631 prepare_to_modify_buffer (from
, to
, &from
);
5632 if (saved_from
!= from
)
5635 from_byte
= CHAR_TO_BYTE (from
), to_byte
= CHAR_TO_BYTE (to
);
5636 len_byte
= to_byte
- from_byte
;
/* The code conversion routine cannot preserve text properties
for now. So, we must remove all text properties in the
region. Here, we must suppress all modification hooks. */
5642 saved_inhibit_modification_hooks
= inhibit_modification_hooks
;
5643 inhibit_modification_hooks
= 1;
5644 Fset_text_properties (make_number (from
), make_number (to
), Qnil
, Qnil
);
5645 inhibit_modification_hooks
= saved_inhibit_modification_hooks
;
5648 coding
->heading_ascii
= 0;
5650 if (! encodep
&& CODING_REQUIRE_DETECTION (coding
))
5652 /* We must detect encoding of text and eol format. */
5654 if (from
< GPT
&& to
> GPT
)
5655 move_gap_both (from
, from_byte
);
5656 if (coding
->type
== coding_type_undecided
)
5658 detect_coding (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5659 if (coding
->type
== coding_type_undecided
)
5661 /* It seems that the text contains only ASCII, but we
5662 should not leave it undecided because the deeper
5663 decoding routine (decode_coding) tries to detect the
5664 encodings again in vain. */
5665 coding
->type
= coding_type_emacs_mule
;
5666 coding
->category_idx
= CODING_CATEGORY_IDX_EMACS_MULE
;
5667 /* As emacs-mule decoder will handle composition, we
5668 need this setting to allocate coding->cmp_data
5670 coding
->composing
= COMPOSITION_NO
;
5673 if (coding
->eol_type
== CODING_EOL_UNDECIDED
5674 && coding
->type
!= coding_type_ccl
)
5676 detect_eol (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5677 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
5678 coding
->eol_type
= CODING_EOL_LF
;
5679 /* We had better recover the original eol format if we
5680 encounter an inconsistent eol format while decoding. */
5681 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
5685 /* Now we convert the text. */
5687 /* For encoding, we must process pre-write-conversion in advance. */
5688 if (! inhibit_pre_post_conversion
5690 && SYMBOLP (coding
->pre_write_conversion
)
5691 && ! NILP (Ffboundp (coding
->pre_write_conversion
)))
5693 /* The function in pre-write-conversion may put a new text in a
5695 struct buffer
*prev
= current_buffer
;
5698 record_unwind_protect (code_convert_region_unwind
,
5699 Fcons (Vlast_coding_system_used
, Qnil
));
5700 /* We should not call any more pre-write/post-read-conversion
5701 functions while this pre-write-conversion is running. */
5702 inhibit_pre_post_conversion
= 1;
5703 call2 (coding
->pre_write_conversion
,
5704 make_number (from
), make_number (to
));
5705 inhibit_pre_post_conversion
= 0;
5706 /* Discard the unwind protect. */
5709 if (current_buffer
!= prev
)
5712 new = Fcurrent_buffer ();
5713 set_buffer_internal_1 (prev
);
5714 del_range_2 (from
, from_byte
, to
, to_byte
, 0);
5715 TEMP_SET_PT_BOTH (from
, from_byte
);
5716 insert_from_buffer (XBUFFER (new), 1, len
, 0);
5718 if (orig_point
>= to
)
5719 orig_point
+= len
- orig_len
;
5720 else if (orig_point
> from
)
5724 from_byte
= CHAR_TO_BYTE (from
);
5725 to_byte
= CHAR_TO_BYTE (to
);
5726 len_byte
= to_byte
- from_byte
;
5727 TEMP_SET_PT_BOTH (from
, from_byte
);
5733 if (! EQ (current_buffer
->undo_list
, Qt
))
5734 deletion
= make_buffer_string_both (from
, from_byte
, to
, to_byte
, 1);
5737 nchars_del
= to
- from
;
5738 nbytes_del
= to_byte
- from_byte
;
5742 if (coding
->composing
!= COMPOSITION_DISABLED
)
5745 coding_save_composition (coding
, from
, to
, Fcurrent_buffer ());
5747 coding_allocate_composition_data (coding
, from
);
/* Try to skip the leading and trailing ASCII characters. We can't skip
them if we must run a CCL program or there are compositions to
5753 if (coding
->type
!= coding_type_ccl
5754 && (! coding
->cmp_data
|| coding
->cmp_data
->used
== 0))
5756 int from_byte_orig
= from_byte
, to_byte_orig
= to_byte
;
5758 if (from
< GPT
&& GPT
< to
)
5759 move_gap_both (from
, from_byte
);
5760 SHRINK_CONVERSION_REGION (&from_byte
, &to_byte
, coding
, NULL
, encodep
);
5761 if (from_byte
== to_byte
5762 && (encodep
|| NILP (coding
->post_read_conversion
))
5763 && ! CODING_REQUIRE_FLUSHING (coding
))
5765 coding
->produced
= len_byte
;
5766 coding
->produced_char
= len
;
5768 /* We must record and adjust for this new text now. */
5769 adjust_after_insert (from
, from_byte_orig
, to
, to_byte_orig
, len
);
5770 coding_free_composition_data (coding
);
5774 head_skip
= from_byte
- from_byte_orig
;
5775 tail_skip
= to_byte_orig
- to_byte
;
5776 total_skip
= head_skip
+ tail_skip
;
5779 len
-= total_skip
; len_byte
-= total_skip
;
  /* For conversion, we must put the gap before the text in addition to
     making the gap larger for efficient decoding.  The required gap
     size starts from 2000, which is the magic number used in make_gap.
     But, after one batch of conversion, it will be incremented if we
     find that it is not enough.  */
  if (GAP_SIZE < require)
    make_gap (require - GAP_SIZE);
  move_gap_both (from, from_byte);
  inserted = inserted_byte = 0;
  GAP_SIZE += len_byte;
  ZV_BYTE -= len_byte;
  if (GPT - BEG < BEG_UNCHANGED)
    BEG_UNCHANGED = GPT - BEG;
  if (Z - GPT < END_UNCHANGED)
    END_UNCHANGED = Z - GPT;
  if (!encodep && coding->src_multibyte)
      /* Decoding routines expect that the source text is unibyte.
         We must convert 8-bit characters of multibyte form to
      int len_byte_orig = len_byte;
      len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
      if (len_byte < len_byte_orig)
        safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
      coding->src_multibyte = 0;
      /* The buffer memory is now:
         +--------+converted-text+---------+-------original-text-------+---+
         |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
                  |<---------------------- GAP ----------------------->|  */
      src = GAP_END_ADDR - len_byte;
      dst = GPT_ADDR + inserted_byte;
        result = encode_coding (coding, src, dst, len_byte, 0);
          if (coding->composing != COMPOSITION_DISABLED)
            coding->cmp_data->char_offset = from + inserted;
          result = decode_coding (coding, src, dst, len_byte, 0);
      /* The buffer memory is now:
         +--------+-------converted-text----+--+------original-text----+---+
         |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
                  |<---------------------- GAP ----------------------->|  */
      inserted += coding->produced_char;
      inserted_byte += coding->produced;
      len_byte -= coding->consumed;
      if (result == CODING_FINISH_INSUFFICIENT_CMP)
          coding_allocate_composition_data (coding, from + inserted);
      src += coding->consumed;
      dst += coding->produced;
      if (result == CODING_FINISH_NORMAL)
      if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
          unsigned char *pend = dst, *p = pend - inserted_byte;
          Lisp_Object eol_type;
          /* Encode LFs back to the original eol format (CR or CRLF).  */
          if (coding->eol_type == CODING_EOL_CR)
              while (p < pend) if (*p++ == '\n') p[-1] = '\r';
              while (p < pend) if (*p++ == '\n') count++;
              if (src - dst < count)
                  /* We don't have sufficient room for encoding LFs
                     back to CRLF.  We must record converted and
                     not-yet-converted text back to the buffer
                     content, enlarge the gap, then record them out of
                     the buffer contents again.  */
                  int add = len_byte + inserted_byte;
                  ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
                  GPT += inserted_byte; GPT_BYTE += inserted_byte;
                  make_gap (count - GAP_SIZE);
                  ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
                  GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
                  /* Don't forget to update SRC, DST, and PEND.  */
                  src = GAP_END_ADDR - len_byte;
                  dst = GPT_ADDR + inserted_byte;
              inserted_byte += count;
              coding->produced += count;
              p = dst = pend + count;
                  if (*p == '\n') count--, *--p = '\r';
          /* Suppress eol-format conversion in the further conversion.  */
          coding->eol_type = CODING_EOL_LF;
          /* Set the coding system symbol to that for Unix-like EOL.  */
          eol_type = Fget (saved_coding_symbol, Qeol_type);
          if (VECTORP (eol_type)
              && XVECTOR (eol_type)->size == 3
              && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
            coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
            coding->symbol = saved_coding_symbol;
          if (coding->type != coding_type_ccl
              || coding->mode & CODING_MODE_LAST_BLOCK)
          coding->mode |= CODING_MODE_LAST_BLOCK;
      if (result == CODING_FINISH_INSUFFICIENT_SRC)
          /* The source text ends in invalid codes.  Let's just
             make them valid buffer contents, and finish conversion.  */
              unsigned char *start = dst;
              inserted += len_byte;
                  dst += CHAR_STRING (c, dst);
              inserted_byte += dst - start;
              inserted += len_byte;
              inserted_byte += len_byte;
      if (result == CODING_FINISH_INTERRUPT)
          /* The conversion procedure was interrupted by a user.  */
      /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST  */
      if (coding->consumed < 1)
          /* It's quite strange to require more memory without
             consuming any bytes.  Perhaps CCL program bug.  */
          /* We have just done the first batch of conversion which was
             stopped because of insufficient gap.  Let's reconsider the
             required gap size (i.e. SRC - DST) now.

             We have converted ORIG bytes (== coding->consumed) into
             NEW bytes (coding->produced).  To convert the remaining
             LEN bytes, we may need REQUIRE bytes of gap, where:
                REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
                REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
             Here, we are sure that NEW >= ORIG.  */
          if (coding->produced <= coding->consumed)
              /* This happens because of CCL-based coding system with
              float ratio = coding->produced - coding->consumed;
              ratio /= coding->consumed;
              require = len_byte * ratio;
      if ((src - dst) < (require + 2000))
          /* See the comment above the previous call of make_gap.  */
          int add = len_byte + inserted_byte;
          ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
          GPT += inserted_byte; GPT_BYTE += inserted_byte;
          make_gap (require + 2000);
          ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
          GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
  if (src - dst > 0) *dst = 0; /* Put an anchor.  */
  if (encodep && coding->dst_multibyte)
      /* The output is unibyte.  We must convert 8-bit characters to
      if (inserted_byte * 2 > GAP_SIZE)
          GAP_SIZE -= inserted_byte;
          ZV += inserted_byte; Z += inserted_byte;
          ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
          GPT += inserted_byte; GPT_BYTE += inserted_byte;
          make_gap (inserted_byte - GAP_SIZE);
          GAP_SIZE += inserted_byte;
          ZV -= inserted_byte; Z -= inserted_byte;
          ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
          GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
      inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
  /* If we shrank the conversion area, adjust it now.  */
      safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
      inserted += total_skip; inserted_byte += total_skip;
      GAP_SIZE += total_skip;
      GPT -= head_skip; GPT_BYTE -= head_skip;
      ZV -= total_skip; ZV_BYTE -= total_skip;
      Z -= total_skip; Z_BYTE -= total_skip;
      from -= head_skip; from_byte -= head_skip;
      to += tail_skip; to_byte += tail_skip;
      if (! EQ (current_buffer->undo_list, Qt))
        adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
        adjust_after_replace_noundo (from, from_byte, nchars_del, nbytes_del,
                                     inserted, inserted_byte);
  inserted = Z - prev_Z;
  if (!encodep && coding->cmp_data && coding->cmp_data->used)
    coding_restore_composition (coding, Fcurrent_buffer ());
  coding_free_composition_data (coding);
  if (! inhibit_pre_post_conversion
      && ! encodep && ! NILP (coding->post_read_conversion))
      Lisp_Object saved_coding_system;
        TEMP_SET_PT_BOTH (from, from_byte);
      record_unwind_protect (code_convert_region_unwind,
                             Fcons (Vlast_coding_system_used, Qnil));
      saved_coding_system = Vlast_coding_system_used;
      Vlast_coding_system_used = coding->symbol;
      /* We should not call any more pre-write/post-read-conversion
         functions while this post-read-conversion is running.  */
      inhibit_pre_post_conversion = 1;
      val = call1 (coding->post_read_conversion, make_number (inserted));
      inhibit_pre_post_conversion = 0;
      coding->symbol = Vlast_coding_system_used;
      Vlast_coding_system_used = saved_coding_system;
      /* Discard the unwind protect.  */
      inserted += Z - prev_Z;
  if (orig_point >= from)
      if (orig_point >= from + orig_len)
        orig_point += inserted - orig_len;
      TEMP_SET_PT (orig_point);
      signal_after_change (from, to - from, inserted);
      update_compositions (from, from + inserted, CHECK_BORDER);
    coding->consumed = to_byte - from_byte;
    coding->consumed_char = to - from;
    coding->produced = inserted_byte;
    coding->produced_char = inserted;
/* Name (or base name) of work buffer for code conversion.  */
static Lisp_Object Vcode_conversion_workbuf_name;

/* Set the current buffer to the working buffer prepared for
   code-conversion.  MULTIBYTE specifies the multibyteness of the
   buffer.  Return the buffer we set if it must be killed after use.
   Otherwise return Qnil.  */
set_conversion_work_buffer (multibyte)
  Lisp_Object buffer, buffer_to_kill;
  buffer = Fget_buffer_create (Vcode_conversion_workbuf_name);
  buf = XBUFFER (buffer);
  if (buf == current_buffer)
      /* As we are already in the work buffer, we must generate a new
         buffer for the work.  */
      name = Fgenerate_new_buffer_name (Vcode_conversion_workbuf_name, Qnil);
      buffer = buffer_to_kill = Fget_buffer_create (name);
      buf = XBUFFER (buffer);
    buffer_to_kill = Qnil;
  delete_all_overlays (buf);
  buf->directory = current_buffer->directory;
  buf->read_only = Qnil;
  buf->filename = Qnil;
  buf->undo_list = Qt;
  eassert (buf->overlays_before == NULL);
  eassert (buf->overlays_after == NULL);
  set_buffer_internal (buf);
  if (BEG != BEGV || Z != ZV)
  del_range_2 (BEG, BEG_BYTE, Z, Z_BYTE, 0);
  buf->enable_multibyte_characters = multibyte ? Qt : Qnil;
  return buffer_to_kill;
run_pre_post_conversion_on_str (str, coding, encodep)
     struct coding_system *coding;
  int count = SPECPDL_INDEX ();
  struct gcpro gcpro1, gcpro2;
  int multibyte = STRING_MULTIBYTE (str);
  Lisp_Object old_deactivate_mark;
  Lisp_Object buffer_to_kill;
  Lisp_Object unwind_arg;

  record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
  /* It is not crucial to specbind this.  */
  old_deactivate_mark = Vdeactivate_mark;
  GCPRO2 (str, old_deactivate_mark);

  /* We must insert the contents of STR as is without
     unibyte<->multibyte conversion.  For that, we adjust the
     multibyteness of the working buffer to that of STR.  */
  buffer_to_kill = set_conversion_work_buffer (multibyte);
  if (NILP (buffer_to_kill))
    unwind_arg = Fcons (Vlast_coding_system_used, Qnil);
    unwind_arg = list2 (Vlast_coding_system_used, buffer_to_kill);
  record_unwind_protect (code_convert_region_unwind, unwind_arg);

  insert_from_string (str, 0, 0,
                      SCHARS (str), SBYTES (str), 0);
  inhibit_pre_post_conversion = 1;
      struct buffer *prev = current_buffer;
      call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
      if (prev != current_buffer)
        /* We must kill the current buffer too.  */
        Fsetcdr (unwind_arg, Fcons (Fcurrent_buffer (), XCDR (unwind_arg)));
      Vlast_coding_system_used = coding->symbol;
      TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
      call1 (coding->post_read_conversion, make_number (Z - BEG));
      coding->symbol = Vlast_coding_system_used;
  inhibit_pre_post_conversion = 0;
  Vdeactivate_mark = old_deactivate_mark;
  str = make_buffer_string (BEG, Z, 1);
  return unbind_to (count, str);
/* Run pre-write-conversion function of CODING on NCHARS/NBYTES
   text in *STR.  *SIZE is the allocated bytes for STR.  As it
   is intended that this function is called from encode_terminal_code,
   the pre-write-conversion function is run by safe_call and thus
   "Error during redisplay: ..." is logged when an error occurs.

   Store the resulting text in *STR and set CODING->produced_char and
   CODING->produced to the number of characters and bytes
   respectively.  If the size of *STR is too small, enlarge it by
   xrealloc and update *STR and *SIZE.  */
run_pre_write_conversin_on_c_str (str, size, nchars, nbytes, coding)
     unsigned char **str;
     int *size, nchars, nbytes;
     struct coding_system *coding;
  struct gcpro gcpro1, gcpro2;
  struct buffer *cur = current_buffer;
  struct buffer *prev;
  Lisp_Object old_deactivate_mark, old_last_coding_system_used;
  Lisp_Object args[3];
  Lisp_Object buffer_to_kill;

  /* It is not crucial to specbind this.  */
  old_deactivate_mark = Vdeactivate_mark;
  old_last_coding_system_used = Vlast_coding_system_used;
  GCPRO2 (old_deactivate_mark, old_last_coding_system_used);

  /* We must insert the contents of STR as is without
     unibyte<->multibyte conversion.  For that, we adjust the
     multibyteness of the working buffer to that of STR.  */
  buffer_to_kill = set_conversion_work_buffer (coding->src_multibyte);
  insert_1_both (*str, nchars, nbytes, 0, 0, 0);

  inhibit_pre_post_conversion = 1;
  prev = current_buffer;
  args[0] = coding->pre_write_conversion;
  args[1] = make_number (BEG);
  args[2] = make_number (Z);
  safe_call (3, args);
  inhibit_pre_post_conversion = 0;
  Vdeactivate_mark = old_deactivate_mark;
  Vlast_coding_system_used = old_last_coding_system_used;
  coding->produced_char = Z - BEG;
  coding->produced = Z_BYTE - BEG_BYTE;
  if (coding->produced > *size)
      *size = coding->produced;
      *str = xrealloc (*str, *size);
  if (BEG < GPT && GPT < Z)
  bcopy (BEG_ADDR, *str, coding->produced);
  coding->src_multibyte
    = ! NILP (current_buffer->enable_multibyte_characters);
  if (prev != current_buffer)
    Fkill_buffer (Fcurrent_buffer ());
  set_buffer_internal (cur);
  if (! NILP (buffer_to_kill))
    Fkill_buffer (buffer_to_kill);
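/* Illustration only, not part of coding.c: the "enlarge *STR and
   update *STR and *SIZE" contract described above, sketched with the
   standard realloc in place of Emacs' xrealloc.  The names
   ensure_capacity and store_result are hypothetical, and returning
   -1 on allocation failure is an assumption of this sketch (xrealloc
   does not return on failure).  */

```c
#include <stdlib.h>
#include <string.h>

/* Make sure *STR can hold NEEDED bytes, updating *STR and *SIZE.
   Return 0 on success, -1 if the allocation fails.  */
static int
ensure_capacity (unsigned char **str, int *size, int needed)
{
  if (needed > *size)
    {
      unsigned char *p = realloc (*str, needed);
      if (p == NULL)
        return -1;
      *str = p;
      *size = needed;
    }
  return 0;
}

/* Copy PRODUCED bytes from SRC into *STR, enlarging *STR first if it
   is too small.  Return PRODUCED, or -1 on allocation failure.  */
static int
store_result (unsigned char **str, int *size,
              const unsigned char *src, int produced)
{
  if (ensure_capacity (str, size, produced) < 0)
    return -1;
  memcpy (*str, src, produced);
  return produced;
}
```

/* The buffer only ever grows: a later, shorter result reuses the
   enlarged allocation and leaves *SIZE unchanged.  */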
decode_coding_string (str, coding, nocopy)
     struct coding_system *coding;
  struct conversion_buffer buf;
  Lisp_Object saved_coding_symbol;
  int require_decoding;
  int shrinked_bytes = 0;
  int consumed, consumed_char, produced, produced_char;
  to_byte = SBYTES (str);
  saved_coding_symbol = coding->symbol;
  coding->src_multibyte = STRING_MULTIBYTE (str);
  coding->dst_multibyte = 1;
  coding->heading_ascii = 0;
  if (CODING_REQUIRE_DETECTION (coding))
      /* See the comments in code_convert_region.  */
      if (coding->type == coding_type_undecided)
          detect_coding (coding, SDATA (str), to_byte);
          if (coding->type == coding_type_undecided)
              coding->type = coding_type_emacs_mule;
              coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
              /* As emacs-mule decoder will handle composition, we
                 need this setting to allocate coding->cmp_data
              coding->composing = COMPOSITION_NO;
      if (coding->eol_type == CODING_EOL_UNDECIDED
          && coding->type != coding_type_ccl)
          saved_coding_symbol = coding->symbol;
          detect_eol (coding, SDATA (str), to_byte);
          if (coding->eol_type == CODING_EOL_UNDECIDED)
            coding->eol_type = CODING_EOL_LF;
          /* We had better recover the original eol format if we
             encounter an inconsistent eol format while decoding.  */
          coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
  if (coding->type == coding_type_no_conversion
      || coding->type == coding_type_raw_text)
    coding->dst_multibyte = 0;
  require_decoding = CODING_REQUIRE_DECODING (coding);
  if (STRING_MULTIBYTE (str))
      /* Decoding routines expect the source text to be unibyte.  */
      str = Fstring_as_unibyte (str);
      to_byte = SBYTES (str);
      coding->src_multibyte = 0;
  /* Try to skip the leading and trailing ASCII characters.  */
  if (require_decoding && coding->type != coding_type_ccl)
      SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
      if (from == to_byte)
        require_decoding = 0;
      shrinked_bytes = from + (SBYTES (str) - to_byte);
  if (!require_decoding
      && !(SYMBOLP (coding->post_read_conversion)
           && !NILP (Ffboundp (coding->post_read_conversion))))
      coding->consumed = SBYTES (str);
      coding->consumed_char = SCHARS (str);
      if (coding->dst_multibyte)
          str = Fstring_as_multibyte (str);
      coding->produced = SBYTES (str);
      coding->produced_char = SCHARS (str);
      return (nocopy ? str : Fcopy_sequence (str));
  if (coding->composing != COMPOSITION_DISABLED)
    coding_allocate_composition_data (coding, from);
  len = decoding_buffer_size (coding, to_byte - from);
  allocate_conversion_buffer (buf, len);
  consumed = consumed_char = produced = produced_char = 0;
      result = decode_coding (coding, SDATA (str) + from + consumed,
                              buf.data + produced, to_byte - from - consumed,
                              buf.size - produced);
      consumed += coding->consumed;
      consumed_char += coding->consumed_char;
      produced += coding->produced;
      produced_char += coding->produced_char;
      if (result == CODING_FINISH_NORMAL
          || result == CODING_FINISH_INTERRUPT
          || (result == CODING_FINISH_INSUFFICIENT_SRC
              && coding->consumed == 0))
      if (result == CODING_FINISH_INSUFFICIENT_CMP)
        coding_allocate_composition_data (coding, from + produced_char);
      else if (result == CODING_FINISH_INSUFFICIENT_DST)
        extend_conversion_buffer (&buf);
      else if (result == CODING_FINISH_INCONSISTENT_EOL)
          Lisp_Object eol_type;
          /* Recover the original EOL format.  */
          if (coding->eol_type == CODING_EOL_CR)
              for (p = buf.data; p < buf.data + produced; p++)
                if (*p == '\n') *p = '\r';
          else if (coding->eol_type == CODING_EOL_CRLF)
              unsigned char *p0, *p1;
              for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
                if (*p0 == '\n') num_eol++;
              if (produced + num_eol >= buf.size)
                extend_conversion_buffer (&buf);
              for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
                  if (*p0 == '\n') *--p1 = '\r';
              produced += num_eol;
              produced_char += num_eol;
          /* Suppress eol-format conversion in the further conversion.  */
          coding->eol_type = CODING_EOL_LF;
          /* Set the coding system symbol to that for Unix-like EOL.  */
          eol_type = Fget (saved_coding_symbol, Qeol_type);
          if (VECTORP (eol_type)
              && XVECTOR (eol_type)->size == 3
              && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
            coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
            coding->symbol = saved_coding_symbol;
  coding->consumed = consumed;
  coding->consumed_char = consumed_char;
  coding->produced = produced;
  coding->produced_char = produced_char;
  if (coding->dst_multibyte)
    newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
                                           produced + shrinked_bytes);
    newstr = make_uninit_string (produced + shrinked_bytes);
    STRING_COPYIN (newstr, 0, SDATA (str), from);
  STRING_COPYIN (newstr, from, buf.data, produced);
  if (shrinked_bytes > from)
    STRING_COPYIN (newstr, from + produced,
                   SDATA (str) + to_byte,
                   shrinked_bytes - from);
  free_conversion_buffer (&buf);
  coding->consumed += shrinked_bytes;
  coding->consumed_char += shrinked_bytes;
  coding->produced += shrinked_bytes;
  coding->produced_char += shrinked_bytes;
  if (coding->cmp_data && coding->cmp_data->used)
    coding_restore_composition (coding, newstr);
  coding_free_composition_data (coding);
  if (SYMBOLP (coding->post_read_conversion)
      && !NILP (Ffboundp (coding->post_read_conversion)))
    newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
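/* Illustration only, not part of coding.c: recovering CRLF line
   endings in an already decoded buffer, as the
   CODING_FINISH_INCONSISTENT_EOL case of decode_coding_string does,
   amounts to counting the LFs and then copying backward so that no
   unread byte is overwritten.  The function name and its
   capacity-check convention are hypothetical; the real code extends
   its conversion buffer instead of failing.  */

```c
#include <stddef.h>

/* Expand every '\n' in BUF[0..LEN) to "\r\n" in place.  CAP is the
   capacity of BUF.  Return the new length, or (size_t) -1 if CAP is
   too small to hold the expanded text.  */
static size_t
lf_to_crlf_in_place (unsigned char *buf, size_t len, size_t cap)
{
  size_t num_eol = 0, i;
  unsigned char *p0, *p1;

  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      num_eol++;
  if (len + num_eol > cap)
    return (size_t) -1;

  p0 = buf + len;       /* one past the last source byte */
  p1 = p0 + num_eol;    /* one past the last destination byte */
  while (p0 > buf)
    {
      *--p1 = *--p0;    /* move the byte to its final place */
      if (*p0 == '\n')
        *--p1 = '\r';   /* and put a CR in front of each LF */
    }
  return len + num_eol;
}
```

/* Walking from the end keeps the destination pointer at or ahead of
   the source pointer, so the expansion needs no second buffer.  */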
encode_coding_string (str, coding, nocopy)
     struct coding_system *coding;
  struct conversion_buffer buf;
  int from, to, to_byte;
  int shrinked_bytes = 0;
  int consumed, consumed_char, produced, produced_char;
  if (SYMBOLP (coding->pre_write_conversion)
      && !NILP (Ffboundp (coding->pre_write_conversion)))
      str = run_pre_post_conversion_on_str (str, coding, 1);
      /* As STR is just newly generated, we don't have to copy it
  to_byte = SBYTES (str);
  /* Encoding routines determine the multibyteness of the source text
     by coding->src_multibyte.  */
  coding->src_multibyte = SCHARS (str) < SBYTES (str);
  coding->dst_multibyte = 0;
  if (! CODING_REQUIRE_ENCODING (coding))
    goto no_need_of_encoding;
  if (coding->composing != COMPOSITION_DISABLED)
    coding_save_composition (coding, from, to, str);
  /* Try to skip the leading and trailing ASCII characters.  We can't
     skip them if we must run CCL program or there are compositions to
  coding->heading_ascii = 0;
  if (coding->type != coding_type_ccl
      && (! coding->cmp_data || coding->cmp_data->used == 0))
      SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
      if (from == to_byte)
          coding_free_composition_data (coding);
          goto no_need_of_encoding;
      shrinked_bytes = from + (SBYTES (str) - to_byte);
  len = encoding_buffer_size (coding, to_byte - from);
  allocate_conversion_buffer (buf, len);
  consumed = consumed_char = produced = produced_char = 0;
      result = encode_coding (coding, SDATA (str) + from + consumed,
                              buf.data + produced, to_byte - from - consumed,
                              buf.size - produced);
      consumed += coding->consumed;
      consumed_char += coding->consumed_char;
      produced += coding->produced;
      produced_char += coding->produced_char;
      if (result == CODING_FINISH_NORMAL
          || result == CODING_FINISH_INTERRUPT
          || (result == CODING_FINISH_INSUFFICIENT_SRC
              && coding->consumed == 0))
      /* Now result should be CODING_FINISH_INSUFFICIENT_DST.  */
      extend_conversion_buffer (&buf);
  coding->consumed = consumed;
  coding->consumed_char = consumed_char;
  coding->produced = produced;
  coding->produced_char = produced_char;
  newstr = make_uninit_string (produced + shrinked_bytes);
    STRING_COPYIN (newstr, 0, SDATA (str), from);
  STRING_COPYIN (newstr, from, buf.data, produced);
  if (shrinked_bytes > from)
    STRING_COPYIN (newstr, from + produced,
                   SDATA (str) + to_byte,
                   shrinked_bytes - from);
  free_conversion_buffer (&buf);
  coding_free_composition_data (coding);
 no_need_of_encoding:
  coding->consumed = SBYTES (str);
  coding->consumed_char = SCHARS (str);
  if (STRING_MULTIBYTE (str))
        /* We are sure that STR doesn't contain a multibyte
        STRING_SET_UNIBYTE (str);
          str = Fstring_as_unibyte (str);
  coding->produced = SBYTES (str);
  coding->produced_char = SCHARS (str);
  return (nocopy ? str : Fcopy_sequence (str));
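/* Illustration only, not part of coding.c: both string converters
   above skip the ASCII head and tail of the text, convert only the
   middle, and copy the skipped bytes around the result.  The sketch
   below uses a plain "byte < 0x80" test as a simplified stand-in;
   the real SHRINK_CONVERSION_REGION macro consults the coding system
   and is not reproduced here.  Both function names are
   hypothetical.  */

```c
#include <stddef.h>

/* Advance *FROM past leading bytes below 0x80 and pull *TO back past
   trailing ones.  On return, DATA[*FROM..*TO) is the part that still
   needs conversion; *FROM == *TO means nothing is left to convert.  */
static void
shrink_ascii_region (const unsigned char *data, size_t *from, size_t *to)
{
  while (*from < *to && data[*from] < 0x80)
    (*from)++;
  while (*to > *from && data[*to - 1] < 0x80)
    (*to)--;
}

/* Number of bytes copied verbatim around the converted middle part,
   mirroring shrinked_bytes = from + (SBYTES (str) - to_byte).  */
static size_t
skipped_bytes (size_t total, size_t from, size_t to)
{
  return from + (total - to);
}
```

/* The result string is then laid out as: the first FROM source
   bytes, the converted bytes, and the remaining source bytes from TO
   onward, which is what the three STRING_COPYIN calls above do.  */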
/*** 8. Emacs Lisp library functions ***/

DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
       doc: /* Return t if OBJECT is nil or a coding-system.
See the documentation of `make-coding-system' for information
about coding-system objects.  */)
  if (! NILP (Fget (obj, Qcoding_system_define_form)))
  /* Get coding-spec vector for OBJ.  */
  obj = Fget (obj, Qcoding_system);
  return ((VECTORP (obj) && XVECTOR (obj)->size == 5)

DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
       Sread_non_nil_coding_system, 1, 1, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.  */)
      val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                              Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
  while (SCHARS (val) == 0);
  return (Fintern (val, Qnil));

DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.
If the user enters null input, return second argument DEFAULT-CODING-SYSTEM.
Ignores case when completing coding systems (all Emacs coding systems
are lower-case).  */)
     (prompt, default_coding_system)
     Lisp_Object prompt, default_coding_system;
  int count = SPECPDL_INDEX ();
  if (SYMBOLP (default_coding_system))
    default_coding_system = SYMBOL_NAME (default_coding_system);
  specbind (Qcompletion_ignore_case, Qt);
  val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                          Qt, Qnil, Qcoding_system_history,
                          default_coding_system, Qnil);
  unbind_to (count, Qnil);
  return (SCHARS (val) == 0 ? Qnil : Fintern (val, Qnil));

DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
       doc: /* Check validity of CODING-SYSTEM.
If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
It is valid if it is nil or a symbol with a non-nil `coding-system' property.
The value of this property should be a vector of length 5.  */)
     Lisp_Object coding_system;
  Lisp_Object define_form;
  define_form = Fget (coding_system, Qcoding_system_define_form);
  if (! NILP (define_form))
      Fput (coding_system, Qcoding_system_define_form, Qnil);
      safe_eval (define_form);
  if (!NILP (Fcoding_system_p (coding_system)))
    return coding_system;
  xsignal1 (Qcoding_system_error, coding_system);
detect_coding_system (src, src_bytes, highest, multibytep)
     const unsigned char *src;
     int src_bytes, highest;
  int coding_mask, eol_type;
  Lisp_Object val, tmp;
  coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
  eol_type = detect_eol_type (src, src_bytes, &dummy);
  if (eol_type == CODING_EOL_INCONSISTENT)
    eol_type = CODING_EOL_UNDECIDED;
      if (eol_type != CODING_EOL_UNDECIDED)
          val2 = Fget (Qundecided, Qeol_type);
            val = XVECTOR (val2)->contents[eol_type];
      return (highest ? val : Fcons (val, Qnil));
  /* At first, gather possible coding systems in VAL.  */
  for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
      Lisp_Object category_val, category_index;
      category_index = Fget (XCAR (tmp), Qcoding_category_index);
      category_val = Fsymbol_value (XCAR (tmp));
      if (!NILP (category_val)
          && NATNUMP (category_index)
          && (coding_mask & (1 << XFASTINT (category_index))))
          val = Fcons (category_val, val);
    val = Fnreverse (val);
  /* Then, replace the elements with subsidiary coding systems.  */
  for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
      if (eol_type != CODING_EOL_UNDECIDED
          && eol_type != CODING_EOL_INCONSISTENT)
          eol = Fget (XCAR (tmp), Qeol_type);
            XSETCAR (tmp, XVECTOR (eol)->contents[eol_type]);
  return (highest ? XCAR (val) : val);
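/* Illustration only, not part of coding.c: the gathering loop in
   detect_coding_system keeps, in priority order, every category
   whose bit is set in the detection mask and whose value is non-nil.
   The sketch below flattens Vcoding_category_list into a plain C
   array.  struct category, collect_candidates, and the bit indexes
   used in any example call are hypothetical.  */

```c
#include <stddef.h>

/* One entry of the flattened category list: the coding system bound
   to the category (NULL stands for a nil value) and the bit index of
   that category in the detection mask.  */
struct category
{
  const char *coding_system;
  int index;
};

/* Store in OUT, in list order, the coding systems of LIST[0..N)
   whose category bit is set in MASK.  OUT must have room for N
   entries.  Return how many were stored.  */
static int
collect_candidates (const struct category *list, int n,
                    unsigned int mask, const char **out)
{
  int i, count = 0;

  for (i = 0; i < n; i++)
    if (list[i].coding_system != NULL
        && list[i].index >= 0
        && (mask & (1u << list[i].index)))
      out[count++] = list[i].coding_system;
  return count;
}
```

/* With HIGHEST non-nil the caller would return only the first
   candidate; otherwise it returns the whole list.  */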
DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
       doc: /* Detect how the byte sequence in the region is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in the region between START and END when
the coding system `undecided' is specified.  The list is ordered by
priority decided in the current language environment.

If only ASCII characters are found (except for such ISO-2022 control
characters as ESC), it returns a list of single element
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority.  */)
     (start, end, highest)
     Lisp_Object start, end, highest;
  int from_byte, to_byte;
  int include_anchor_byte = 0;
  CHECK_NUMBER_COERCE_MARKER (start);
  CHECK_NUMBER_COERCE_MARKER (end);
  validate_region (&start, &end);
  from = XINT (start), to = XINT (end);
  from_byte = CHAR_TO_BYTE (from);
  to_byte = CHAR_TO_BYTE (to);
  if (from < GPT && to >= GPT)
    move_gap_both (to, to_byte);
  /* If an anchor byte `\0' follows the region, we include it in
     the detecting source.  Then code detectors can handle the trailing
     byte sequence more accurately.

     Fix me: This is not a perfect solution.  It is better that we
     add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
  if (to == Z || (to == GPT && GAP_SIZE > 0))
    include_anchor_byte = 1;
  return detect_coding_system (BYTE_POS_ADDR (from_byte),
                               to_byte - from_byte + include_anchor_byte,
                               !NILP (current_buffer
                                      ->enable_multibyte_characters));

DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
       doc: /* Detect how the byte sequence in STRING is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in STRING when the coding system
`undecided' is specified.  The list is ordered by priority decided in
the current language environment.

If only ASCII characters are found (except for such ISO-2022 control
characters as ESC), it returns a list of single element
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority.  */)
     Lisp_Object string, highest;
  CHECK_STRING (string);
  return detect_coding_system (SDATA (string),
                               /* "+ 1" is to include the anchor byte
                                  `\0'.  With this, code detectors can
                                  handle the trailing bytes more
                               SBYTES (string) + 1,
                               STRING_MULTIBYTE (string));
6806 /* Subroutine for Ffind_coding_systems_region_internal.
6808 Return a list of coding systems that safely encode the multibyte
6809 text between P and PEND. SAFE_CODINGS, if non-nil, is an alist of
6810 possible coding systems. If it is nil, it means that we have not
6811 yet found any coding systems.
6813 WORK_TABLE a char-table of which element is set to t once the
6814 element is looked up.
6816 If a non-ASCII single byte char is found, set
6817 *single_byte_char_found to 1. */
find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
     unsigned char *p, *pend;
     Lisp_Object safe_codings, work_table;
     int *single_byte_char_found;
  Lisp_Object val, ch;
  Lisp_Object prev, tail;
  if (NILP (safe_codings))
    goto done_safe_codings;
      c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
      if (ASCII_BYTE_P (c))
        /* We can ignore ASCII characters here.  */
      if (SINGLE_BYTE_CHAR_P (c))
        *single_byte_char_found = 1;
      /* Check the safe coding systems for C.  */
      ch = make_number (c);
      val = Faref (work_table, ch);
        /* This element was already checked.  Ignore it.  */
      /* Remember that we checked this element.  */
      Faset (work_table, ch, Qt);
      for (prev = tail = safe_codings; CONSP (tail); tail = XCDR (tail))
          Lisp_Object elt, translation_table, hash_table, accept_latin_extra;
          if (CONSP (XCDR (elt)))
              /* This entry has this format now:
                 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
                          ACCEPT-LATIN-EXTRA ) */
              encodable = ! NILP (Faref (XCAR (val), ch));
                  translation_table = XCAR (val);
                  hash_table = XCAR (XCDR (val));
                  accept_latin_extra = XCAR (XCDR (XCDR (val)));
              /* This entry has this format now: ( CODING . SAFE-CHARS) */
              encodable = ! NILP (Faref (XCDR (elt), ch));
                  /* Transform the format to:
                     ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
                       ACCEPT-LATIN-EXTRA )  */
                  val = Fget (XCAR (elt), Qcoding_system);
                    = Fplist_get (AREF (val, 3),
                                  Qtranslation_table_for_encode);
                  if (SYMBOLP (translation_table))
                    translation_table = Fget (translation_table,
                                              Qtranslation_table);
                    = (CHAR_TABLE_P (translation_table)
                       ? XCHAR_TABLE (translation_table)->extras[1]
                    = ((EQ (AREF (val, 0), make_number (2))
                        && VECTORP (AREF (val, 4)))
                       ? AREF (AREF (val, 4), 16)
                  XSETCAR (tail, list5 (XCAR (elt), XCDR (elt),
                                        translation_table, hash_table,
                                        accept_latin_extra));
              && ((CHAR_TABLE_P (translation_table)
                   && ! NILP (Faref (translation_table, ch)))
                  || (HASH_TABLE_P (hash_table)
                      && ! NILP (Fgethash (ch, hash_table, Qnil)))
                  || (SINGLE_BYTE_CHAR_P (c)
                      && ! NILP (accept_latin_extra)
                      && VECTORP (Vlatin_extra_code_table)
                      && ! NILP (AREF (Vlatin_extra_code_table, c)))))
              /* Exclude this coding system from SAFE_CODINGS.  */
              if (EQ (tail, safe_codings))
                  safe_codings = XCDR (safe_codings);
                  if (NILP (safe_codings))
                    goto done_safe_codings;
                XSETCDR (prev, XCDR (tail));
  /* If the above loop was terminated before P reaches PEND, it means
     SAFE_CODINGS was set to nil.  If we have not yet found a
     non-ASCII single-byte char, check it now.  */
  if (! *single_byte_char_found)
        c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
        if (! ASCII_BYTE_P (c)
            && SINGLE_BYTE_CHAR_P (c))
            *single_byte_char_found = 1;
  return safe_codings;
DEFUN ("find-coding-systems-region-internal",
       Ffind_coding_systems_region_internal,
       Sfind_coding_systems_region_internal, 2, 2, 0,
       doc: /* Internal use only.  */)
     Lisp_Object start, end;
  Lisp_Object work_table, safe_codings;
  int non_ascii_p = 0;
  int single_byte_char_found = 0;
  const unsigned char *p1, *p1end, *p2, *p2end, *p;
  if (STRINGP (start))
      if (!STRING_MULTIBYTE (start))
      p1 = SDATA (start), p1end = p1 + SBYTES (start);
      if (SCHARS (start) != SBYTES (start))
      CHECK_NUMBER_COERCE_MARKER (start);
      CHECK_NUMBER_COERCE_MARKER (end);
      if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
        args_out_of_range (start, end);
      if (NILP (current_buffer->enable_multibyte_characters))
      from = CHAR_TO_BYTE (XINT (start));
      to = CHAR_TO_BYTE (XINT (end));
      stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
      p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
        p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
      if (XINT (end) - XINT (start) != to - from)
      /* We are sure that the text contains no multibyte character.
         Check if it contains eight-bit-graphic.  */
      for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
          for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
  /* The text contains non-ASCII characters.  */
  work_table = Fmake_char_table (Qchar_coding_system, Qnil);
  safe_codings = Fcopy_sequence (XCDR (Vcoding_system_safe_chars));
  safe_codings = find_safe_codings (p1, p1end, safe_codings, work_table,
                                    &single_byte_char_found);
    safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
                                      &single_byte_char_found);
  if (EQ (safe_codings, XCDR (Vcoding_system_safe_chars)))
      /* Turn safe_codings to a list of coding systems... */
      if (single_byte_char_found)
        /* ... and append these for eight-bit chars.  */
        val = Fcons (Qraw_text,
                     Fcons (Qemacs_mule, Fcons (Qno_conversion, Qnil)));
        /* ... and append generic coding systems.  */
        val = Fcopy_sequence (XCAR (Vcoding_system_safe_chars));
      for (; CONSP (safe_codings); safe_codings = XCDR (safe_codings))
        val = Fcons (XCAR (XCAR (safe_codings)), val);
  return safe_codings;
/* Search from position POS for such characters that are unencodable
   according to SAFE_CHARS, and return a list of their positions.  P
   points where in the memory the character at POS exists.  Limit the
   search at PEND or when N unencodable characters are found.

   If SAFE_CHARS is a char table, an element for an unencodable

   If SAFE_CHARS is nil, all non-ASCII characters are unencodable.

   Otherwise, SAFE_CHARS is t, and only eight-bit-control and
   eight-bit-graphic characters are unencodable.  */
unencodable_char_position (safe_chars, pos, p, pend, n)
     Lisp_Object safe_chars;
     unsigned char *p, *pend;
  Lisp_Object pos_list;
      int c = STRING_CHAR_AND_LENGTH (p, MAX_MULTIBYTE_LENGTH, len);
          && (CHAR_TABLE_P (safe_chars)
              ? NILP (CHAR_TABLE_REF (safe_chars, c))
              : (NILP (safe_chars) || c < 256)))
          pos_list = Fcons (make_number (pos), pos_list);
  return Fnreverse (pos_list);
DEFUN ("unencodable-char-position", Funencodable_char_position,
       Sunencodable_char_position, 3, 5, 0,
Return position of first un-encodable character in a region.
START and END specify the region and CODING-SYSTEM specifies the
encoding to check.  Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how
many un-encodable characters to search.  In this case, the value is a

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters.  In that case, START and END are indexes
     (start, end, coding_system, count, string)
     Lisp_Object start, end, coding_system, count, string;
  Lisp_Object safe_chars;
  struct coding_system coding;
  Lisp_Object positions;
  unsigned char *p, *pend;
      validate_region (&start, &end);
      from = XINT (start);
      if (NILP (current_buffer->enable_multibyte_characters))
      p = CHAR_POS_ADDR (from);
        pend = CHAR_POS_ADDR (to);
      CHECK_STRING (string);
      CHECK_NATNUM (start);
      from = XINT (start);
          || to > SCHARS (string))
        args_out_of_range_3 (string, start, end);
      if (! STRING_MULTIBYTE (string))
      p = SDATA (string) + string_char_to_byte (string, from);
      pend = SDATA (string) + string_char_to_byte (string, to);
  setup_coding_system (Fcheck_coding_system (coding_system), &coding);
      CHECK_NATNUM (count);
  if (coding.type == coding_type_no_conversion
      || coding.type == coding_type_raw_text)
  if (coding.type == coding_type_undecided)
    safe_chars = coding_safe_chars (coding_system);
  if (STRINGP (string)
      || from >= GPT || to <= GPT)
    positions = unencodable_char_position (safe_chars, from, p, pend, n);
      Lisp_Object args[2];
      args[0] = unencodable_char_position (safe_chars, from, p, GPT_ADDR, n);
      n -= XINT (Flength (args[0]));
        positions = args[0];
          args[1] = unencodable_char_position (safe_chars, GPT, GAP_END_ADDR,
          positions = Fappend (2, args);
  return (NILP (count) ? Fcar (positions) : positions);
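/* The region scan above splits the search at the buffer gap: the text
   before the gap is searched first, the hit budget N is reduced by
   what was found, and only then is the text after the gap searched.
   The following is a minimal standalone sketch of that pattern.  It
   is not the Emacs implementation: the names `scan_segment' and
   `scan_two_segments' are hypothetical, every byte is treated as one
   character, and "unencodable" is simplified to "byte >= 0x80".  */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store in OUT the positions of bytes >= 0x80
   found in [P, PEND), numbering positions from POS and stopping after
   N hits.  Return the number of hits stored.  */
size_t
scan_segment (const unsigned char *p, const unsigned char *pend,
              size_t pos, size_t *out, size_t n)
{
  size_t found = 0;

  assert (p <= pend);
  for (; p < pend && found < n; p++, pos++)
    if (*p >= 0x80)
      out[found++] = pos;
  return found;
}

/* Scan text held as two segments (the parts before and after a gap).
   The second segment is searched only if the first one did not use up
   the whole budget N, and then only for the remaining N - FOUND hits.  */
size_t
scan_two_segments (const unsigned char *seg1, size_t len1,
                   const unsigned char *seg2, size_t len2,
                   size_t *out, size_t n)
{
  size_t found = scan_segment (seg1, seg1 + len1, 0, out, n);

  if (found < n)
    found += scan_segment (seg2, seg2 + len2, len1,
                           out + found, n - found);
  return found;
}
```

/* Positions reported for the second segment continue from LEN1, so the
   caller sees one contiguous numbering regardless of where the gap is.  */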
code_convert_region1 (start, end, coding_system, encodep)
     Lisp_Object start, end, coding_system;
  struct coding_system coding;
  CHECK_NUMBER_COERCE_MARKER (start);
  CHECK_NUMBER_COERCE_MARKER (end);
  CHECK_SYMBOL (coding_system);
  validate_region (&start, &end);
  from = XFASTINT (start);
  to = XFASTINT (end);
  if (NILP (coding_system))
    return make_number (to - from);
  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
  coding.mode |= CODING_MODE_LAST_BLOCK;
  coding.src_multibyte = coding.dst_multibyte
    = !NILP (current_buffer->enable_multibyte_characters);
  code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
                       &coding, encodep, 1);
  Vlast_coding_system_used = coding.symbol;
  return make_number (coding.produced_char);
DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Decode the current region from the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the decoded text.  */)
     (start, end, coding_system)
     Lisp_Object start, end, coding_system;
  return code_convert_region1 (start, end, coding_system, 0);
DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Encode the current region into the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the encoded text.  */)
     (start, end, coding_system)
     Lisp_Object start, end, coding_system;
  return code_convert_region1 (start, end, coding_system, 1);
code_convert_string1 (string, coding_system, nocopy, encodep)
     Lisp_Object string, coding_system, nocopy;
  struct coding_system coding;
  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);
  if (NILP (coding_system))
    return (NILP (nocopy) ? Fcopy_sequence (string) : string);
  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
  coding.mode |= CODING_MODE_LAST_BLOCK;
            ? encode_coding_string (string, &coding, !NILP (nocopy))
            : decode_coding_string (string, &coding, !NILP (nocopy)));
  Vlast_coding_system_used = coding.symbol;
DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
       doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the decoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
  return code_convert_string1 (string, coding_system, nocopy, 0);
DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
       doc: /* Encode STRING to CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the encoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
  return code_convert_string1 (string, coding_system, nocopy, 1);
/* Encode or decode STRING according to CODING_SYSTEM.
   Do not set Vlast_coding_system_used.

   This function is called only from macros DECODE_FILE and
   ENCODE_FILE, thus we ignore character composition.  */
code_convert_string_norecord (string, coding_system, encodep)
     Lisp_Object string, coding_system;
  struct coding_system coding;
  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);
  if (NILP (coding_system))
  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
  coding.composing = COMPOSITION_DISABLED;
  coding.mode |= CODING_MODE_LAST_BLOCK;
          ? encode_coding_string (string, &coding, 1)
          : decode_coding_string (string, &coding, 1));
DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
       doc: /* Decode a Japanese character which has CODE in shift_jis encoding.
Return the corresponding character.  */)
  unsigned char c1, c2, s1, s2;
  CHECK_NUMBER (code);
  s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
        XSETFASTINT (val, s2);
      /* Both bounds must hold; with `||' this test was always true and
         the error below could never be reached.  */
      else if (s2 >= 0xA0 && s2 <= 0xDF)
        XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
      if ((s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
          || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
      DECODE_SJIS (s1, s2, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
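/* DECODE_SJIS is defined elsewhere in this file and is not shown in
   this part.  As an illustration only, here is a standalone sketch of
   the usual arithmetic for turning a two-byte Shift JIS code into
   JIS X 0208 row/column bytes, guarded by the same range checks as the
   code above.  The function name `sjis_to_jis' is hypothetical, and
   that the macro uses exactly this arithmetic is an assumption.  */

```c
#include <assert.h>

/* Convert the Shift JIS byte pair (S1, S2) into JIS X 0208 bytes
   (*J1, *J2), each in the range 0x21..0x7E.  Return 0 on success and
   -1 if the pair fails the range checks.  */
int
sjis_to_jis (unsigned s1, unsigned s2, unsigned *j1, unsigned *j2)
{
  assert (j1 && j2);

  if (s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF
      || s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
    return -1;

  if (s2 >= 0x9F)
    {
      /* Second half of the lead byte: even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* First half of the lead byte: odd JIS row.  The trail byte
         skips 0x7F, hence the two different offsets.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
  return 0;
}
```

/* For example, Shift JIS 0x82A0 (HIRAGANA LETTER A) corresponds to
   JIS X 0208 0x2422 under this arithmetic.  */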
DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
       doc: /* Encode a Japanese character CH to shift_jis encoding.
Return the corresponding code in SJIS.  */)
  int charset, c1, c2, s1, s2;
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
  else if (charset == charset_jisx0208
           && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
      ENCODE_SJIS (c1, c2, s1, s2);
      XSETFASTINT (val, (s1 << 8) | s2);
  else if (charset == charset_katakana_jisx0201
           && c1 > 0x20 && c2 < 0xE0)
      XSETFASTINT (val, c1 | 0x80);
    error ("Can't encode to shift_jis: %d", XFASTINT (ch));
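/* ENCODE_SJIS is likewise defined elsewhere.  The following standalone
   sketch shows the usual inverse arithmetic, from JIS X 0208 row/column
   bytes to a Shift JIS byte pair, using the same 0x21..0x7E bounds that
   the jisx0208 branch above tests.  The function name `jis_to_sjis' is
   hypothetical, and that the macro matches it is an assumption.  */

```c
#include <assert.h>

/* Convert JIS X 0208 bytes (J1, J2) into the Shift JIS pair
   (*S1, *S2).  Return 0 on success, -1 if either byte is outside
   0x21..0x7E.  */
int
jis_to_sjis (unsigned j1, unsigned j2, unsigned *s1, unsigned *s2)
{
  assert (s1 && s2);

  if (j1 <= 0x20 || j1 >= 0x7F || j2 <= 0x20 || j2 >= 0x7F)
    return -1;

  if (j1 & 1)
    {
      /* Odd row: trail byte lands in 0x40..0x9E, skipping 0x7F.  */
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      /* Even row: trail byte lands in 0x9F..0xFC.  */
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      *s2 = j2 + 0x7E;
    }
  return 0;
}
```

/* The two bytes are then packed as (s1 << 8) | s2, as the code above
   does when it stores the result in VAL.  */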
DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
       doc: /* Decode a Big5 character which has CODE in BIG5 coding system.
Return the corresponding character.  */)
  unsigned char b1, b2, c1, c2;
  CHECK_NUMBER (code);
  b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      if ((b1 < 0xA1 || b1 > 0xFE)
          || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      DECODE_BIG5 (b1, b2, charset, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
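/* The range test above accepts a lead byte in 0xA1..0xFE and a trail
   byte in 0x40..0x7E or 0xA1..0xFE.  A minimal standalone sketch of
   that check follows.  The function name `big5_split' is hypothetical;
   unlike the code above, it keeps the bytes in `unsigned' rather than
   `unsigned char', so codes wider than 16 bits are rejected instead of
   being truncated.  */

```c
#include <assert.h>

/* Split the 16-bit Big5 CODE into lead byte *B1 and trail byte *B2.
   Return 0 if both bytes are in the ranges tested above, else -1.  */
int
big5_split (unsigned code, unsigned *b1, unsigned *b2)
{
  assert (b1 && b2);

  *b1 = code >> 8;
  *b2 = code & 0xFF;

  if (*b1 < 0xA1 || *b1 > 0xFE)
    return -1;
  if (*b2 < 0x40 || (*b2 > 0x7E && *b2 < 0xA1) || *b2 > 0xFE)
    return -1;
  return 0;
}
```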
DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
       doc: /* Encode the Big5 character CH to BIG5 coding system.
Return the corresponding character code in Big5.  */)
  int charset, c1, c2, b1, b2;
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
  else if ((charset == charset_big5_1
            && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
           || (charset == charset_big5_2
               && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
      ENCODE_BIG5 (charset, c1, c2, b1, b2);
      XSETFASTINT (val, (b1 << 8) | b2);
    error ("Can't encode to Big5: %d", XFASTINT (ch));
DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal,
       Sset_terminal_coding_system_internal, 1, 2, 0,
       doc: /* Internal use only.  */)
     (coding_system, terminal)
     Lisp_Object coding_system;
     Lisp_Object terminal;
  struct coding_system *terminal_coding = TERMINAL_TERMINAL_CODING (get_terminal (terminal, 1));
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system), terminal_coding);
  /* We had better not send unsafe characters to terminal.  */
  terminal_coding->mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
  /* Character composition should be disabled.  */
  terminal_coding->composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  terminal_coding->suppress_error = 1;
  terminal_coding->src_multibyte = 1;
  terminal_coding->dst_multibyte = 0;
DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal,
       Sset_safe_terminal_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     Lisp_Object coding_system;
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system),
                       &safe_terminal_coding);
  /* Character composition should be disabled.  */
  safe_terminal_coding.composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  safe_terminal_coding.suppress_error = 1;
  safe_terminal_coding.src_multibyte = 1;
  safe_terminal_coding.dst_multibyte = 0;
DEFUN ("terminal-coding-system", Fterminal_coding_system,
       Sterminal_coding_system, 0, 1, 0,
       doc: /* Return coding system specified for terminal output on the given terminal.
TERMINAL may be a terminal id, a frame, or nil for the selected
frame's terminal device.  */)
     Lisp_Object terminal;
  return TERMINAL_TERMINAL_CODING (get_terminal (terminal, 1))->symbol;
DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal,
       Sset_keyboard_coding_system_internal, 1, 2, 0,
       doc: /* Internal use only.  */)
     (coding_system, terminal)
     Lisp_Object coding_system;
     Lisp_Object terminal;
  struct terminal *t = get_terminal (terminal, 1);
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system),
                       TERMINAL_KEYBOARD_CODING (t));
  /* Character composition should be disabled.  */
  TERMINAL_KEYBOARD_CODING (t)->composing = COMPOSITION_DISABLED;
DEFUN ("keyboard-coding-system", Fkeyboard_coding_system,
       Skeyboard_coding_system, 0, 1, 0,
       doc: /* Return coding system for decoding keyboard input on TERMINAL.
TERMINAL may be a terminal id, a frame, or nil for the selected
frame's terminal device.  */)
     Lisp_Object terminal;
  return TERMINAL_KEYBOARD_CODING (get_terminal (terminal, 1))->symbol;
DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
       Sfind_operation_coding_system, 1, MANY, 0,
       doc: /* Choose a coding system for an operation based on the target name.
The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
DECODING-SYSTEM is the coding system to use for decoding
\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
for encoding (in case OPERATION does encoding).

The first argument OPERATION specifies an I/O primitive:
  For file I/O, `insert-file-contents' or `write-region'.
  For process I/O, `call-process', `call-process-region', or `start-process'.
  For network I/O, `open-network-stream'.

The remaining arguments should be the same arguments that were passed
to the primitive.  Depending on which primitive, one of those arguments
is selected as the TARGET.  For example, if OPERATION does file I/O,
whichever argument specifies the file name is TARGET.

TARGET has a meaning which depends on OPERATION:
  For file I/O, TARGET is a file name (except for the special case below).
  For process I/O, TARGET is a process name.
  For network I/O, TARGET is a service name or a port number.

This function looks up what is specified for TARGET in
`file-coding-system-alist', `process-coding-system-alist',
or `network-coding-system-alist' depending on OPERATION.
They may specify a coding system, a cons of coding systems,
or a function symbol to call.
In the last case, we call the function with one argument,
which is a list of all the arguments given to this function.
If the function can't decide a coding system, it can return
`undecided' so that the normal code-detection is performed.

If OPERATION is `insert-file-contents', the argument corresponding to
TARGET may be a cons (FILENAME . BUFFER).  In that case, FILENAME is a
file name to look up, and BUFFER is a buffer that contains the file's
contents (not yet decoded).  If `file-coding-system-alist' specifies a
function to call for FILENAME, that function should examine the
contents of BUFFER instead of reading the file.

usage: (find-operation-coding-system OPERATION ARGUMENTS...)  */)
  Lisp_Object operation, target_idx, target, val;
  register Lisp_Object chain;
    error ("Too few arguments");
  operation = args[0];
  if (!SYMBOLP (operation)
      || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
    error ("Invalid first argument");
  if (nargs < 1 + XINT (target_idx))
    error ("Too few arguments for operation: %s",
           SDATA (SYMBOL_NAME (operation)));
  /* For write-region, if the 6th argument (i.e. VISIT, the 5th
     argument to write-region) is string, it must be treated as a
     target file name.  */
  if (EQ (operation, Qwrite_region)
      && STRINGP (args[5]))
    target_idx = make_number (4);
  target = args[XINT (target_idx) + 1];
  if (!(STRINGP (target)
        || (EQ (operation, Qinsert_file_contents) && CONSP (target)
            && STRINGP (XCAR (target)) && BUFFERP (XCDR (target)))
        || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
    error ("Invalid argument %d", XINT (target_idx) + 1);
    target = XCAR (target);
  chain = ((EQ (operation, Qinsert_file_contents)
            || EQ (operation, Qwrite_region))
           ? Vfile_coding_system_alist
           : (EQ (operation, Qopen_network_stream)
              ? Vnetwork_coding_system_alist
              : Vprocess_coding_system_alist));
  for (; CONSP (chain); chain = XCDR (chain))
          && ((STRINGP (target)
               && STRINGP (XCAR (elt))
               && fast_string_match (XCAR (elt), target) >= 0)
              || (INTEGERP (target) && EQ (target, XCAR (elt)))))
          /* Here, if VAL is both a valid coding system and a valid
             function symbol, we return VAL as a coding system.  */
          if (! SYMBOLP (val))
          if (! NILP (Fcoding_system_p (val)))
            return Fcons (val, val);
          if (! NILP (Ffboundp (val)))
              /* We use call1 rather than safe_call1
                 so as to get bug reports about functions called here
                 which don't handle the current interface.  */
              val = call1 (val, Flist (nargs, args));
              if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
                return Fcons (val, val);
DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
       Supdate_coding_systems_internal, 0, 0, 0,
       doc: /* Update internal database for ISO2022 and CCL based coding systems.
When values of any coding categories are changed, you must
call this function.  */)
  for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
      val = find_symbol_value (XVECTOR (Vcoding_category_table)->contents[i]);
          if (! coding_system_table[i])
            coding_system_table[i] = ((struct coding_system *)
                                      xmalloc (sizeof (struct coding_system)));
          setup_coding_system (val, coding_system_table[i]);
      else if (coding_system_table[i])
          xfree (coding_system_table[i]);
          coding_system_table[i] = NULL;
DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
       Sset_coding_priority_internal, 0, 0, 0,
       doc: /* Update internal database for the current value of `coding-category-list'.
This function is for internal use only.  */)
  val = Vcoding_category_list;
  while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
      if (! SYMBOLP (XCAR (val)))
      idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
      if (idx >= CODING_CATEGORY_IDX_MAX)
      coding_priorities[i++] = (1 << idx);
  /* If coding-category-list is valid and contains all coding
     categories, `i' should be CODING_CATEGORY_IDX_MAX now.  If not,
     the following code saves Emacs from crashing.  */
  while (i < CODING_CATEGORY_IDX_MAX)
    coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
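/* The two loops above turn an ordered list of category indexes into an
   array of one-bit masks and then pad any unused slots with a fallback
   mask, so that later readers of the array never see an uninitialized
   entry.  Below is a minimal standalone sketch of that pattern.  The
   names `CATEGORY_MAX', `FALLBACK_MASK' and `fill_priorities' are
   hypothetical stand-ins, and skipping an out-of-range index is a
   choice made for this sketch, not necessarily what the code above
   does at that point.  */

```c
#include <assert.h>

enum { CATEGORY_MAX = 8 };        /* hypothetical number of categories */
#define FALLBACK_MASK (1u << 0)   /* hypothetical fallback category mask */

/* Fill PRIORITIES from the LEN indexes in IDX_LIST, highest priority
   first.  Slots left over after the list is exhausted get
   FALLBACK_MASK.  */
void
fill_priorities (const int *idx_list, int len,
                 unsigned priorities[CATEGORY_MAX])
{
  int i = 0, k;

  assert (len >= 0);
  for (k = 0; k < len && i < CATEGORY_MAX; k++)
    {
      if (idx_list[k] < 0 || idx_list[k] >= CATEGORY_MAX)
        continue;               /* ignore indexes that name no category */
      priorities[i++] = 1u << idx_list[k];
    }

  /* Pad so that every slot holds a valid mask.  */
  while (i < CATEGORY_MAX)
    priorities[i++] = FALLBACK_MASK;
}
```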
DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
       Sdefine_coding_system_internal, 1, 1, 0,
       doc: /* Register CODING-SYSTEM as a base coding system.
This function is for internal use only.  */)
     Lisp_Object coding_system;
  Lisp_Object safe_chars, slot;
  if (NILP (Fcheck_coding_system (coding_system)))
    xsignal1 (Qcoding_system_error, coding_system);
  safe_chars = coding_safe_chars (coding_system);
  if (! EQ (safe_chars, Qt) && ! CHAR_TABLE_P (safe_chars))
    error ("No valid safe-chars property for %s",
           SDATA (SYMBOL_NAME (coding_system)));
  if (EQ (safe_chars, Qt))
      if (NILP (Fmemq (coding_system, XCAR (Vcoding_system_safe_chars))))
        XSETCAR (Vcoding_system_safe_chars,
                 Fcons (coding_system, XCAR (Vcoding_system_safe_chars)));
      slot = Fassq (coding_system, XCDR (Vcoding_system_safe_chars));
        XSETCDR (Vcoding_system_safe_chars,
                 nconc2 (XCDR (Vcoding_system_safe_chars),
                         Fcons (Fcons (coding_system, safe_chars), Qnil)));
        XSETCDR (slot, safe_chars);
/*** 9. Post-amble ***/
  /* Emacs' internal format specific initialize routine.  */
  for (i = 0; i <= 0x20; i++)
    emacs_code_class[i] = EMACS_control_code;
  emacs_code_class[0x0A] = EMACS_linefeed_code;
  emacs_code_class[0x0D] = EMACS_carriage_return_code;
  for (i = 0x21 ; i < 0x7F; i++)
    emacs_code_class[i] = EMACS_ascii_code;
  emacs_code_class[0x7F] = EMACS_control_code;
  for (i = 0x80; i < 0xFF; i++)
    emacs_code_class[i] = EMACS_invalid_code;
  emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
  emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;

  /* ISO2022 specific initialize routine.  */
  for (i = 0; i < 0x20; i++)
    iso_code_class[i] = ISO_control_0;
  for (i = 0x21; i < 0x7F; i++)
    iso_code_class[i] = ISO_graphic_plane_0;
  for (i = 0x80; i < 0xA0; i++)
    iso_code_class[i] = ISO_control_1;
  for (i = 0xA1; i < 0xFF; i++)
    iso_code_class[i] = ISO_graphic_plane_1;
  iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
  iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
  iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
  iso_code_class[ISO_CODE_SO] = ISO_shift_out;
  iso_code_class[ISO_CODE_SI] = ISO_shift_in;
  iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
  iso_code_class[ISO_CODE_ESC] = ISO_escape;
  iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
  iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
  iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
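/* The initialization above builds 256-entry tables that map each byte
   value to a class, so that decoders can classify a byte with a single
   array lookup.  As an illustration, here is a standalone sketch of the
   ISO 2022 range assignment only, without the per-control-code
   overrides (CR, SO, SI, ESC and so on).  The enum `byte_class' and the
   function `init_byte_classes' are hypothetical names for this
   sketch.  */

```c
#include <assert.h>

enum byte_class
{
  CLASS_CONTROL_0,        /* 0x00..0x1F */
  CLASS_GRAPHIC_0,        /* 0x21..0x7E */
  CLASS_CONTROL_1,        /* 0x80..0x9F */
  CLASS_GRAPHIC_1,        /* 0xA1..0xFE */
  CLASS_0x20_OR_0x7F,     /* space and DEL */
  CLASS_0xA0_OR_0xFF      /* their counterparts in the upper half */
};

/* Fill TABLE so that TABLE[b] is the class of byte value B.  */
void
init_byte_classes (enum byte_class table[256])
{
  int i;

  assert (table);
  for (i = 0; i < 0x20; i++)
    table[i] = CLASS_CONTROL_0;
  for (i = 0x21; i < 0x7F; i++)
    table[i] = CLASS_GRAPHIC_0;
  for (i = 0x80; i < 0xA0; i++)
    table[i] = CLASS_CONTROL_1;
  for (i = 0xA1; i < 0xFF; i++)
    table[i] = CLASS_GRAPHIC_1;
  /* The four boundary bytes get classes of their own.  */
  table[0x20] = table[0x7F] = CLASS_0x20_OR_0x7F;
  table[0xA0] = table[0xFF] = CLASS_0xA0_OR_0xFF;
}
```

/* Every one of the 256 entries is assigned exactly once by the four
   loops and the two boundary assignments.  */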
  setup_coding_system (Qnil, &safe_terminal_coding);
  setup_coding_system (Qnil, &default_buffer_file_coding);

  bzero (coding_system_table, sizeof coding_system_table);

  bzero (ascii_skip_code, sizeof ascii_skip_code);
  for (i = 0; i < 128; i++)
    ascii_skip_code[i] = 1;

#if defined (MSDOS) || defined (WINDOWSNT)
  system_eol_type = CODING_EOL_CRLF;
  system_eol_type = CODING_EOL_LF;

  inhibit_pre_post_conversion = 0;
7802 staticpro (&Vcode_conversion_workbuf_name
);
7803 Vcode_conversion_workbuf_name
= build_string (" *code-conversion-work*");
7805 Qtarget_idx
= intern ("target-idx");
7806 staticpro (&Qtarget_idx
);
7808 Qcoding_system_history
= intern ("coding-system-history");
7809 staticpro (&Qcoding_system_history
);
7810 Fset (Qcoding_system_history
, Qnil
);
7812 /* Target FILENAME is the first argument. */
7813 Fput (Qinsert_file_contents
, Qtarget_idx
, make_number (0));
7814 /* Target FILENAME is the third argument. */
7815 Fput (Qwrite_region
, Qtarget_idx
, make_number (2));
7817 Qcall_process
= intern ("call-process");
7818 staticpro (&Qcall_process
);
7819 /* Target PROGRAM is the first argument. */
7820 Fput (Qcall_process
, Qtarget_idx
, make_number (0));
7822 Qcall_process_region
= intern ("call-process-region");
7823 staticpro (&Qcall_process_region
);
7824 /* Target PROGRAM is the third argument. */
7825 Fput (Qcall_process_region
, Qtarget_idx
, make_number (2));
7827 Qstart_process
= intern ("start-process");
7828 staticpro (&Qstart_process
);
7829 /* Target PROGRAM is the third argument. */
7830 Fput (Qstart_process
, Qtarget_idx
, make_number (2));
7832 Qopen_network_stream
= intern ("open-network-stream");
7833 staticpro (&Qopen_network_stream
);
7834 /* Target SERVICE is the fourth argument. */
7835 Fput (Qopen_network_stream
, Qtarget_idx
, make_number (3));
7837 Qcoding_system
= intern ("coding-system");
7838 staticpro (&Qcoding_system
);
7840 Qeol_type
= intern ("eol-type");
7841 staticpro (&Qeol_type
);
7843 Qbuffer_file_coding_system
= intern ("buffer-file-coding-system");
7844 staticpro (&Qbuffer_file_coding_system
);
7846 Qpost_read_conversion
= intern ("post-read-conversion");
7847 staticpro (&Qpost_read_conversion
);
7849 Qpre_write_conversion
= intern ("pre-write-conversion");
7850 staticpro (&Qpre_write_conversion
);
7852 Qno_conversion
= intern ("no-conversion");
7853 staticpro (&Qno_conversion
);
7855 Qundecided
= intern ("undecided");
7856 staticpro (&Qundecided
);
7858 Qcoding_system_p
= intern ("coding-system-p");
7859 staticpro (&Qcoding_system_p
);
7861 Qcoding_system_error
= intern ("coding-system-error");
7862 staticpro (&Qcoding_system_error
);
7864 Fput (Qcoding_system_error
, Qerror_conditions
,
7865 Fcons (Qcoding_system_error
, Fcons (Qerror
, Qnil
)));
7866 Fput (Qcoding_system_error
, Qerror_message
,
7867 build_string ("Invalid coding system"));
7869 Qcoding_category
= intern ("coding-category");
7870 staticpro (&Qcoding_category
);
7871 Qcoding_category_index
= intern ("coding-category-index");
7872 staticpro (&Qcoding_category_index
);
7874 Vcoding_category_table
7875 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX
), Qnil
);
7876 staticpro (&Vcoding_category_table
);
7879 for (i
= 0; i
< CODING_CATEGORY_IDX_MAX
; i
++)
7881 XVECTOR (Vcoding_category_table
)->contents
[i
]
7882 = intern (coding_category_name
[i
]);
7883 Fput (XVECTOR (Vcoding_category_table
)->contents
[i
],
7884 Qcoding_category_index
, make_number (i
));
7888 Vcoding_system_safe_chars
= Fcons (Qnil
, Qnil
);
7889 staticpro (&Vcoding_system_safe_chars
);
7891 Qtranslation_table
= intern ("translation-table");
7892 staticpro (&Qtranslation_table
);
7893 Fput (Qtranslation_table
, Qchar_table_extra_slots
, make_number (2));
7895 Qtranslation_table_id
= intern ("translation-table-id");
7896 staticpro (&Qtranslation_table_id
);
7898 Qtranslation_table_for_decode
= intern ("translation-table-for-decode");
7899 staticpro (&Qtranslation_table_for_decode
);
7901 Qtranslation_table_for_encode
= intern ("translation-table-for-encode");
7902 staticpro (&Qtranslation_table_for_encode
);
7904 Qsafe_chars
= intern ("safe-chars");
7905 staticpro (&Qsafe_chars
);
7907 Qchar_coding_system
= intern ("char-coding-system");
7908 staticpro (&Qchar_coding_system
);

  /* Intern this now in case it isn't already done.
     Setting this variable twice is harmless.
     But don't staticpro it here--that is done in alloc.c.  */
  Qchar_table_extra_slots = intern ("char-table-extra-slots");
  Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
  Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (0));

  Qvalid_codes = intern ("valid-codes");
  staticpro (&Qvalid_codes);

  Qascii_incompatible = intern ("ascii-incompatible");
  staticpro (&Qascii_incompatible);

  Qemacs_mule = intern ("emacs-mule");
  staticpro (&Qemacs_mule);

  Qraw_text = intern ("raw-text");
  staticpro (&Qraw_text);

  Qutf_8 = intern ("utf-8");
  staticpro (&Qutf_8);

  Qcoding_system_define_form = intern ("coding-system-define-form");
  staticpro (&Qcoding_system_define_form);

  defsubr (&Scoding_system_p);
  defsubr (&Sread_coding_system);
  defsubr (&Sread_non_nil_coding_system);
  defsubr (&Scheck_coding_system);
  defsubr (&Sdetect_coding_region);
  defsubr (&Sdetect_coding_string);
  defsubr (&Sfind_coding_systems_region_internal);
  defsubr (&Sunencodable_char_position);
  defsubr (&Sdecode_coding_region);
  defsubr (&Sencode_coding_region);
  defsubr (&Sdecode_coding_string);
  defsubr (&Sencode_coding_string);
  defsubr (&Sdecode_sjis_char);
  defsubr (&Sencode_sjis_char);
  defsubr (&Sdecode_big5_char);
  defsubr (&Sencode_big5_char);
  defsubr (&Sset_terminal_coding_system_internal);
  defsubr (&Sset_safe_terminal_coding_system_internal);
  defsubr (&Sterminal_coding_system);
  defsubr (&Sset_keyboard_coding_system_internal);
  defsubr (&Skeyboard_coding_system);
  defsubr (&Sfind_operation_coding_system);
  defsubr (&Supdate_coding_systems_internal);
  defsubr (&Sset_coding_priority_internal);
  defsubr (&Sdefine_coding_system_internal);

  DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
               doc: /* List of coding systems.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_list = Qnil;

  DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
               doc: /* Alist of coding system names.
Each element is a one-element list containing a coding system name.
This variable is given to `completing-read' as the TABLE argument.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_alist = Qnil;

  DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
               doc: /* List of coding-categories (symbols) ordered by priority.

On detecting a coding system, Emacs tries code detection algorithms
associated with each coding-category one by one in this order.  When
one algorithm agrees with a byte sequence of source text, the coding
system bound to the corresponding coding-category is selected.

Don't modify this variable directly, but use `set-coding-priority'.  */);
  {
    int i;

    Vcoding_category_list = Qnil;
    for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
      Vcoding_category_list
        = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
                 Vcoding_category_list);
  }

  DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
               doc: /* Specify the coding system for read operations.
It is useful to bind this variable with `let', but do not set it globally.
If the value is a coding system, it is used for decoding on read operations.
If not, an appropriate element is used from one of the coding system alists.
There are three such tables: `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.  */);
  Vcoding_system_for_read = Qnil;

  DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
               doc: /* Specify the coding system for write operations.
Programs bind this variable with `let', but you should not set it globally.
If the value is a coding system, it is used for encoding of output,
when writing it to a file and when sending it to a file or subprocess.

If this does not specify a coding system, an appropriate element
is used from one of the coding system alists.
There are three such tables: `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.
For output to files, if the above procedure does not specify a coding system,
the value of `buffer-file-coding-system' is used.  */);
  Vcoding_system_for_write = Qnil;

  DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
               doc: /* Coding system used in the latest file or process I/O.
Also set by `encode-coding-region', `decode-coding-region',
`encode-coding-string' and `decode-coding-string'.  */);
  Vlast_coding_system_used = Qnil;

  DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
               doc: /* *Non-nil means always inhibit code conversion of end-of-line format.
See info node `Coding Systems' and info node `Text and Binary' concerning
such conversion.  */);
  inhibit_eol_conversion = 0;

  DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
               doc: /* Non-nil means process buffer inherits coding system of process output.
Bind it to t if the process output is to be treated as if it were a file
read from some filesystem.  */);
  inherit_process_coding_system = 0;

  DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a file I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a file name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding and encoding
the file contents.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.  The function is
called with an argument that is a list of the arguments with which
`find-operation-coding-system' was called.  If the function can't decide
a coding system, it can return `undecided' so that the normal
code-detection is performed.

See also the function `find-operation-coding-system'
and the variable `auto-coding-alist'.  */);
  Vfile_coding_system_alist = Qnil;

  DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a process I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a program name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the program and encoding what is sent to the program.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vprocess_coding_system_alist = Qnil;

  DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a network I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a network service name
or is a port number to connect to,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the network stream and encoding what is sent to the network stream.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vnetwork_coding_system_alist = Qnil;

  DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
               doc: /* Coding system to use with system messages.
Also used for decoding keyboard input on the X Window System.  */);
  Vlocale_coding_system = Qnil;

  /* The eol mnemonics are reset in startup.el system-dependently.  */
  DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
               doc: /* *String displayed in mode line for UNIX-like (LF) end-of-line format.  */);
  eol_mnemonic_unix = build_string (":");

  DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
               doc: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format.  */);
  eol_mnemonic_dos = build_string ("\\");

  DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
               doc: /* *String displayed in mode line for MAC-like (CR) end-of-line format.  */);
  eol_mnemonic_mac = build_string ("/");

  DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
               doc: /* *String displayed in mode line when end-of-line format is not yet determined.  */);
  eol_mnemonic_undecided = build_string (":");

  DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
               doc: /* *Non-nil enables character translation while encoding and decoding.  */);
  Venable_character_translation = Qt;

  DEFVAR_LISP ("standard-translation-table-for-decode",
               &Vstandard_translation_table_for_decode,
               doc: /* Table for translating characters while decoding.  */);
  Vstandard_translation_table_for_decode = Qnil;

  DEFVAR_LISP ("standard-translation-table-for-encode",
               &Vstandard_translation_table_for_encode,
               doc: /* Table for translating characters while encoding.  */);
  Vstandard_translation_table_for_encode = Qnil;

  DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
               doc: /* Alist of charsets vs revision numbers.
While encoding, if a charset (car part of an element) is found,
designate it with the escape sequence identifying the revision
\(cdr part of the element).  */);
  Vcharset_revision_alist = Qnil;

  DEFVAR_LISP ("default-process-coding-system",
               &Vdefault_process_coding_system,
               doc: /* Cons of coding systems used for process I/O by default.
The car part is used for decoding process output,
the cdr part is used for encoding text to be sent to a process.  */);
  Vdefault_process_coding_system = Qnil;

  DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
               doc: /* Table of extra Latin codes in the range 128..159 (inclusive).
This is a vector of length 256.
If the Nth element is non-nil, the existence of code N in a file
\(or output of subprocess) doesn't prevent it from being detected as
a coding system of ISO 2022 variant which has a flag
`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file
or reading output of a subprocess.
Only the 128th through 159th elements have a meaning.  */);
  Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);

  DEFVAR_LISP ("select-safe-coding-system-function",
               &Vselect_safe_coding_system_function,
               doc: /* Function to call to select safe coding system for encoding a text.

If set, this function is called to force a user to select a proper
coding system which can encode the text in the case that a default
coding system used in each operation can't encode the text.

The default value is `select-safe-coding-system' (which see).  */);
  Vselect_safe_coding_system_function = Qnil;

  DEFVAR_BOOL ("coding-system-require-warning",
               &coding_system_require_warning,
               doc: /* Internal use only.
If non-nil, on writing a file, `select-safe-coding-system-function' is
called even if `coding-system-for-write' is non-nil.  The command
`universal-coding-system-argument' binds this variable to t temporarily.  */);
  coding_system_require_warning = 0;

  DEFVAR_BOOL ("inhibit-iso-escape-detection",
               &inhibit_iso_escape_detection,
               doc: /* If non-nil, Emacs ignores ISO2022's escape sequence on code detection.

By default, on reading a file, Emacs tries to detect how the text is
encoded.  This code detection is sensitive to escape sequences.  If
the sequence is valid as ISO2022, the code is determined as one of
the ISO2022 encodings, and the file is decoded by the corresponding
coding system (e.g. `iso-2022-7bit').

However, there may be a case that you want to read escape sequences in
a file as is.  In such a case, you can set this variable to non-nil.
Then, as the code detection ignores any escape sequences, no file is
detected as encoded in some ISO2022 encoding.  The result is that all
escape sequences become visible in a buffer.

The default value is nil, and it is strongly recommended not to change
it.  That is because many Emacs Lisp source files that contain
non-ASCII characters are encoded by the coding system `iso-2022-7bit'
in Emacs's distribution, and they won't be decoded correctly on
reading if you suppress escape sequence detection.

The other way to read escape sequences in a file without decoding is
to explicitly specify some coding system that doesn't use ISO2022's
escape sequence (e.g. `latin-1') on reading by \\[universal-coding-system-argument].  */);
  inhibit_iso_escape_detection = 0;

  DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input,
               doc: /* Char table for translating self-inserting characters.
This is applied to the result of input methods, not their input.  See also
`keyboard-translate-table'.  */);
  Vtranslation_table_for_input = Qnil;
}

char *
emacs_strerror (error_number)
     int error_number;
{
  char *str;

  synchronize_system_messages_locale ();
  str = strerror (error_number);

  if (! NILP (Vlocale_coding_system))
    {
      /* Decode the system message into Emacs' internal format.  */
      Lisp_Object dec = code_convert_string_norecord (build_string (str),
                                                      Vlocale_coding_system,
                                                      0);
      str = (char *) SDATA (dec);
    }

  return str;
}

/* arch-tag: 3a3a2b01-5ff6-4071-9afe-f5b808d9229d
   (do not change this comment) */