(ONE_MORE_BYTE_CHECK_MULTIBYTE): New macro.
[bpt/emacs.git] / src / coding.c
CommitLineData
4ed46869 1/* Coding system handler (conversion, detection, and etc).
4a2f9c6a 2 Copyright (C) 1995, 1997, 1998 Electrotechnical Laboratory, JAPAN.
203cb916 3 Licensed to the Free Software Foundation.
4ed46869 4
369314dc
KH
5This file is part of GNU Emacs.
6
7GNU Emacs is free software; you can redistribute it and/or modify
8it under the terms of the GNU General Public License as published by
9the Free Software Foundation; either version 2, or (at your option)
10any later version.
4ed46869 11
369314dc
KH
12GNU Emacs is distributed in the hope that it will be useful,
13but WITHOUT ANY WARRANTY; without even the implied warranty of
14MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15GNU General Public License for more details.
4ed46869 16
369314dc
KH
17You should have received a copy of the GNU General Public License
18along with GNU Emacs; see the file COPYING. If not, write to
19the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
20Boston, MA 02111-1307, USA. */
4ed46869
KH
21
22/*** TABLE OF CONTENTS ***
23
b73bfc1c 24 0. General comments
4ed46869 25 1. Preamble
0ef69138 26 2. Emacs' internal format (emacs-mule) handlers
4ed46869
KH
27 3. ISO2022 handlers
28 4. Shift-JIS and BIG5 handlers
1397dc18
KH
29 5. CCL handlers
30 6. End-of-line handlers
31 7. C library functions
32 8. Emacs Lisp library functions
33 9. Post-amble
4ed46869
KH
34
35*/
36
b73bfc1c
KH
37/*** 0. General comments ***/
38
39
4ed46869
KH
40/*** GENERAL NOTE on CODING SYSTEM ***
41
42 Coding system is an encoding mechanism of one or more character
43 sets. Here's a list of coding systems which Emacs can handle. When
44 we say "decode", it means converting some other coding system to
0ef69138
KH
45 Emacs' internal format (emacs-internal), and when we say "encode",
46 it means converting the coding system emacs-mule to some other
47 coding system.
4ed46869 48
0ef69138 49 0. Emacs' internal format (emacs-mule)
4ed46869
KH
50
51 Emacs itself holds a multi-lingual character in a buffer and a string
f4dee582 52 in a special format. Details are described in section 2.
4ed46869
KH
53
54 1. ISO2022
55
56 The most famous coding system for multiple character sets. X's
f4dee582
RS
57 Compound Text, various EUCs (Extended Unix Code), and coding
58 systems used in Internet communication such as ISO-2022-JP are
59 all variants of ISO2022. Details are described in section 3.
4ed46869
KH
60
61 2. SJIS (or Shift-JIS or MS-Kanji-Code)
62
63 A coding system to encode character sets: ASCII, JISX0201, and
64 JISX0208. Widely used for PC's in Japan. Details are described in
f4dee582 65 section 4.
4ed46869
KH
66
67 3. BIG5
68
69 A coding system to encode character sets: ASCII and Big5. Widely
70 used by Chinese (mainly in Taiwan and Hong Kong). Details are
f4dee582
RS
71 described in section 4. In this file, when we write "BIG5"
72 (all uppercase), we mean the coding system, and when we write
73 "Big5" (capitalized), we mean the character set.
4ed46869 74
27901516
KH
75 4. Raw text
76
4608c386
KH
77 A coding system for a text containing random 8-bit code. Emacs does
78 no code conversion on such a text except for end-of-line format.
27901516
KH
79
80 5. Other
4ed46869 81
f4dee582 82 If a user wants to read/write a text encoded in a coding system not
4ed46869
KH
83 listed above, he can supply a decoder and an encoder for it in CCL
84 (Code Conversion Language) programs. Emacs executes the CCL program
85 while reading/writing.
86
d46c5b12
KH
87 Emacs represents a coding system by a Lisp symbol that has a property
88 `coding-system'. But, before actually using the coding system, the
4ed46869 89 information about it is set in a structure of type `struct
f4dee582 90 coding_system' for rapid processing. See section 6 for more details.
4ed46869
KH
91
92*/
93
94/*** GENERAL NOTES on END-OF-LINE FORMAT ***
95
96 How end-of-line of a text is encoded depends on a system. For
97 instance, Unix's format is just one byte of `line-feed' code,
f4dee582 98 whereas DOS's format is two-byte sequence of `carriage-return' and
d46c5b12
KH
99 `line-feed' codes. MacOS's format is usually one byte of
100 `carriage-return'.
4ed46869 101
f4dee582
RS
102 Since text characters encoding and end-of-line encoding are
103 independent, any coding system described above can take
4ed46869 104 any format of end-of-line. So, Emacs has information of format of
f4dee582 105 end-of-line in each coding-system. See section 6 for more details.
4ed46869
KH
106
107*/
108
109/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
110
111 These functions check if a text between SRC and SRC_END is encoded
112 in the coding system category XXX. Each returns an integer value in
113 which appropriate flag bits for the category XXX is set. The flag
114 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
0a28aafb
KH
115 template of these functions. If MULTIBYTEP is nonzero, 8-bit codes
116 of the range 0x80..0x9F are in multibyte form. */
4ed46869
KH
117#if 0
118int
0a28aafb 119detect_coding_emacs_mule (src, src_end, multibytep)
4ed46869 120 unsigned char *src, *src_end;
0a28aafb 121 int multibytep;
4ed46869
KH
122{
123 ...
124}
125#endif
126
127/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
128
b73bfc1c
KH
129 These functions decode SRC_BYTES length of unibyte text at SOURCE
130 encoded in CODING to Emacs' internal format. The resulting
131 multibyte text goes to a place pointed to by DESTINATION, the length
132 of which should not exceed DST_BYTES.
d46c5b12 133
b73bfc1c
KH
134 These functions set the information of original and decoded texts in
135 the members produced, produced_char, consumed, and consumed_char of
136 the structure *CODING. They also set the member result to one of
137 CODING_FINISH_XXX indicating how the decoding finished.
d46c5b12
KH
138
139 DST_BYTES zero means that source area and destination area are
140 overlapped, which means that we can produce a decoded text until it
141 reaches at the head of not-yet-decoded source text.
142
143 Below is a template of these functions. */
4ed46869 144#if 0
b73bfc1c 145static void
d46c5b12 146decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
147 struct coding_system *coding;
148 unsigned char *source, *destination;
149 int src_bytes, dst_bytes;
4ed46869
KH
150{
151 ...
152}
153#endif
154
155/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
156
0ef69138 157 These functions encode SRC_BYTES length text at SOURCE of Emacs'
b73bfc1c
KH
158 internal multibyte format to CODING. The resulting unibyte text
159 goes to a place pointed to by DESTINATION, the length of which
160 should not exceed DST_BYTES.
d46c5b12 161
b73bfc1c
KH
162 These functions set the information of original and encoded texts in
163 the members produced, produced_char, consumed, and consumed_char of
164 the structure *CODING. They also set the member result to one of
165 CODING_FINISH_XXX indicating how the encoding finished.
d46c5b12
KH
166
167 DST_BYTES zero means that source area and destination area are
b73bfc1c
KH
168 overlapped, which means that we can produce a encoded text until it
169 reaches at the head of not-yet-encoded source text.
d46c5b12
KH
170
171 Below is a template of these functions. */
4ed46869 172#if 0
b73bfc1c 173static void
d46c5b12 174encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
175 struct coding_system *coding;
176 unsigned char *source, *destination;
177 int src_bytes, dst_bytes;
4ed46869
KH
178{
179 ...
180}
181#endif
182
183/*** COMMONLY USED MACROS ***/
184
b73bfc1c
KH
185/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
186 get one, two, and three bytes from the source text respectively.
187 If there are not enough bytes in the source, they jump to
188 `label_end_of_loop'. The caller should set variables `coding',
189 `src' and `src_end' to appropriate pointer in advance. These
190 macros are called from decoding routines `decode_coding_XXX', thus
191 it is assumed that the source text is unibyte. */
4ed46869 192
b73bfc1c
KH
193#define ONE_MORE_BYTE(c1) \
194 do { \
195 if (src >= src_end) \
196 { \
197 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
198 goto label_end_of_loop; \
199 } \
200 c1 = *src++; \
4ed46869
KH
201 } while (0)
202
b73bfc1c
KH
203#define TWO_MORE_BYTES(c1, c2) \
204 do { \
205 if (src + 1 >= src_end) \
206 { \
207 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
208 goto label_end_of_loop; \
209 } \
210 c1 = *src++; \
211 c2 = *src++; \
4ed46869
KH
212 } while (0)
213
4ed46869 214
0a28aafb
KH
215/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
216 form if MULTIBYTEP is nonzero. */
217
218#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep) \
219 do { \
220 if (src >= src_end) \
221 { \
222 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
223 goto label_end_of_loop; \
224 } \
225 c1 = *src++; \
226 if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL) \
227 c1 = *src++ - 0x20; \
228 } while (0)
229
b73bfc1c
KH
230/* Set C to the next character at the source text pointed by `src'.
231 If there are not enough characters in the source, jump to
232 `label_end_of_loop'. The caller should set variables `coding'
233 `src', `src_end', and `translation_table' to appropriate pointers
234 in advance. This macro is used in encoding routines
235 `encode_coding_XXX', thus it assumes that the source text is in
236 multibyte form except for 8-bit characters. 8-bit characters are
237 in multibyte form if coding->src_multibyte is nonzero, else they
238 are represented by a single byte. */
4ed46869 239
b73bfc1c
KH
240#define ONE_MORE_CHAR(c) \
241 do { \
242 int len = src_end - src; \
243 int bytes; \
244 if (len <= 0) \
245 { \
246 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
247 goto label_end_of_loop; \
248 } \
249 if (coding->src_multibyte \
250 || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
251 c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
252 else \
253 c = *src, bytes = 1; \
254 if (!NILP (translation_table)) \
39658efc 255 c = translate_char (translation_table, c, -1, 0, 0); \
b73bfc1c 256 src += bytes; \
4ed46869
KH
257 } while (0)
258
4ed46869 259
b73bfc1c
KH
260/* Produce a multibyte form of characater C to `dst'. Jump to
261 `label_end_of_loop' if there's not enough space at `dst'.
262
263 If we are now in the middle of composition sequence, the decoded
264 character may be ALTCHAR (for the current composition). In that
265 case, the character goes to coding->cmp_data->data instead of
266 `dst'.
267
268 This macro is used in decoding routines. */
269
270#define EMIT_CHAR(c) \
4ed46869 271 do { \
b73bfc1c
KH
272 if (! COMPOSING_P (coding) \
273 || coding->composing == COMPOSITION_RELATIVE \
274 || coding->composing == COMPOSITION_WITH_RULE) \
275 { \
276 int bytes = CHAR_BYTES (c); \
277 if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
278 { \
279 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
280 goto label_end_of_loop; \
281 } \
282 dst += CHAR_STRING (c, dst); \
283 coding->produced_char++; \
284 } \
ec6d2bb8 285 \
b73bfc1c
KH
286 if (COMPOSING_P (coding) \
287 && coding->composing != COMPOSITION_RELATIVE) \
288 { \
289 CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
290 coding->composition_rule_follows \
291 = coding->composing != COMPOSITION_WITH_ALTCHARS; \
292 } \
4ed46869
KH
293 } while (0)
294
4ed46869 295
b73bfc1c
KH
296#define EMIT_ONE_BYTE(c) \
297 do { \
298 if (dst >= (dst_bytes ? dst_end : src)) \
299 { \
300 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
301 goto label_end_of_loop; \
302 } \
303 *dst++ = c; \
304 } while (0)
305
306#define EMIT_TWO_BYTES(c1, c2) \
307 do { \
308 if (dst + 2 > (dst_bytes ? dst_end : src)) \
309 { \
310 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
311 goto label_end_of_loop; \
312 } \
313 *dst++ = c1, *dst++ = c2; \
314 } while (0)
315
316#define EMIT_BYTES(from, to) \
317 do { \
318 if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
319 { \
320 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
321 goto label_end_of_loop; \
322 } \
323 while (from < to) \
324 *dst++ = *from++; \
4ed46869
KH
325 } while (0)
326
327\f
328/*** 1. Preamble ***/
329
68c45bf0
PE
330#ifdef emacs
331#include <config.h>
332#endif
333
4ed46869
KH
334#include <stdio.h>
335
336#ifdef emacs
337
4ed46869
KH
338#include "lisp.h"
339#include "buffer.h"
340#include "charset.h"
ec6d2bb8 341#include "composite.h"
4ed46869
KH
342#include "ccl.h"
343#include "coding.h"
344#include "window.h"
345
346#else /* not emacs */
347
348#include "mulelib.h"
349
350#endif /* not emacs */
351
352Lisp_Object Qcoding_system, Qeol_type;
353Lisp_Object Qbuffer_file_coding_system;
354Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
27901516 355Lisp_Object Qno_conversion, Qundecided;
bb0115a2 356Lisp_Object Qcoding_system_history;
05e6f5dc 357Lisp_Object Qsafe_chars;
1397dc18 358Lisp_Object Qvalid_codes;
4ed46869
KH
359
360extern Lisp_Object Qinsert_file_contents, Qwrite_region;
361Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
362Lisp_Object Qstart_process, Qopen_network_stream;
363Lisp_Object Qtarget_idx;
364
d46c5b12
KH
365Lisp_Object Vselect_safe_coding_system_function;
366
7722baf9
EZ
367/* Mnemonic string for each format of end-of-line. */
368Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
369/* Mnemonic string to indicate format of end-of-line is not yet
4ed46869 370 decided. */
7722baf9 371Lisp_Object eol_mnemonic_undecided;
4ed46869 372
9ce27fde
KH
373/* Format of end-of-line decided by system. This is CODING_EOL_LF on
374 Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
375int system_eol_type;
376
4ed46869
KH
377#ifdef emacs
378
4608c386
KH
379Lisp_Object Vcoding_system_list, Vcoding_system_alist;
380
381Lisp_Object Qcoding_system_p, Qcoding_system_error;
4ed46869 382
d46c5b12
KH
383/* Coding system emacs-mule and raw-text are for converting only
384 end-of-line format. */
385Lisp_Object Qemacs_mule, Qraw_text;
9ce27fde 386
4ed46869
KH
387/* Coding-systems are handed between Emacs Lisp programs and C internal
388 routines by the following three variables. */
389/* Coding-system for reading files and receiving data from process. */
390Lisp_Object Vcoding_system_for_read;
391/* Coding-system for writing files and sending data to process. */
392Lisp_Object Vcoding_system_for_write;
393/* Coding-system actually used in the latest I/O. */
394Lisp_Object Vlast_coding_system_used;
395
c4825358 396/* A vector of length 256 which contains information about special
94487c4e 397 Latin codes (especially for dealing with Microsoft codes). */
3f003981 398Lisp_Object Vlatin_extra_code_table;
c4825358 399
9ce27fde
KH
400/* Flag to inhibit code conversion of end-of-line format. */
401int inhibit_eol_conversion;
402
74383408
KH
403/* Flag to inhibit ISO2022 escape sequence detection. */
404int inhibit_iso_escape_detection;
405
ed29121d
EZ
406/* Flag to make buffer-file-coding-system inherit from process-coding. */
407int inherit_process_coding_system;
408
c4825358 409/* Coding system to be used to encode text for terminal display. */
4ed46869
KH
410struct coding_system terminal_coding;
411
c4825358
KH
412/* Coding system to be used to encode text for terminal display when
413 terminal coding system is nil. */
414struct coding_system safe_terminal_coding;
415
416/* Coding system of what is sent from terminal keyboard. */
4ed46869
KH
417struct coding_system keyboard_coding;
418
6bc51348
KH
419/* Default coding system to be used to write a file. */
420struct coding_system default_buffer_file_coding;
421
02ba4723
KH
422Lisp_Object Vfile_coding_system_alist;
423Lisp_Object Vprocess_coding_system_alist;
424Lisp_Object Vnetwork_coding_system_alist;
4ed46869 425
68c45bf0
PE
426Lisp_Object Vlocale_coding_system;
427
4ed46869
KH
428#endif /* emacs */
429
d46c5b12 430Lisp_Object Qcoding_category, Qcoding_category_index;
4ed46869
KH
431
432/* List of symbols `coding-category-xxx' ordered by priority. */
433Lisp_Object Vcoding_category_list;
434
d46c5b12
KH
435/* Table of coding categories (Lisp symbols). */
436Lisp_Object Vcoding_category_table;
4ed46869
KH
437
438/* Table of names of symbol for each coding-category. */
439char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
0ef69138 440 "coding-category-emacs-mule",
4ed46869
KH
441 "coding-category-sjis",
442 "coding-category-iso-7",
d46c5b12 443 "coding-category-iso-7-tight",
4ed46869
KH
444 "coding-category-iso-8-1",
445 "coding-category-iso-8-2",
7717c392
KH
446 "coding-category-iso-7-else",
447 "coding-category-iso-8-else",
89fa8b36 448 "coding-category-ccl",
4ed46869 449 "coding-category-big5",
fa42c37f
KH
450 "coding-category-utf-8",
451 "coding-category-utf-16-be",
452 "coding-category-utf-16-le",
27901516 453 "coding-category-raw-text",
89fa8b36 454 "coding-category-binary"
4ed46869
KH
455};
456
66cfb530 457/* Table of pointers to coding systems corresponding to each coding
d46c5b12
KH
458 categories. */
459struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
460
66cfb530
KH
461/* Table of coding category masks. Nth element is a mask for a coding
462 cateogry of which priority is Nth. */
463static
464int coding_priorities[CODING_CATEGORY_IDX_MAX];
465
f967223b
KH
466/* Flag to tell if we look up translation table on character code
467 conversion. */
84fbb8a0 468Lisp_Object Venable_character_translation;
f967223b
KH
469/* Standard translation table to look up on decoding (reading). */
470Lisp_Object Vstandard_translation_table_for_decode;
471/* Standard translation table to look up on encoding (writing). */
472Lisp_Object Vstandard_translation_table_for_encode;
84fbb8a0 473
f967223b
KH
474Lisp_Object Qtranslation_table;
475Lisp_Object Qtranslation_table_id;
476Lisp_Object Qtranslation_table_for_decode;
477Lisp_Object Qtranslation_table_for_encode;
4ed46869
KH
478
479/* Alist of charsets vs revision number. */
480Lisp_Object Vcharset_revision_alist;
481
02ba4723
KH
482/* Default coding systems used for process I/O. */
483Lisp_Object Vdefault_process_coding_system;
484
b843d1ae
KH
485/* Global flag to tell that we can't call post-read-conversion and
486 pre-write-conversion functions. Usually the value is zero, but it
487 is set to 1 temporarily while such functions are running. This is
488 to avoid infinite recursive call. */
489static int inhibit_pre_post_conversion;
490
05e6f5dc
KH
491/* Char-table containing safe coding systems of each character. */
492Lisp_Object Vchar_coding_system_table;
493Lisp_Object Qchar_coding_system;
494
495/* Return `safe-chars' property of coding system CODING. Don't check
496 validity of CODING. */
497
498Lisp_Object
499coding_safe_chars (coding)
500 struct coding_system *coding;
501{
502 Lisp_Object coding_spec, plist, safe_chars;
503
504 coding_spec = Fget (coding->symbol, Qcoding_system);
505 plist = XVECTOR (coding_spec)->contents[3];
506 safe_chars = Fplist_get (XVECTOR (coding_spec)->contents[3], Qsafe_chars);
507 return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
508}
509
510#define CODING_SAFE_CHAR_P(safe_chars, c) \
511 (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
512
4ed46869 513\f
0ef69138 514/*** 2. Emacs internal format (emacs-mule) handlers ***/
4ed46869
KH
515
516/* Emacs' internal format for encoding multiple character sets is a
f4dee582 517 kind of multi-byte encoding, i.e. characters are encoded by
b73bfc1c
KH
518 variable-length sequences of one-byte codes.
519
520 ASCII characters and control characters (e.g. `tab', `newline') are
521 represented by one-byte sequences which are their ASCII codes, in
522 the range 0x00 through 0x7F.
523
524 8-bit characters of the range 0x80..0x9F are represented by
525 two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
526 code + 0x20).
527
528 8-bit characters of the range 0xA0..0xFF are represented by
529 one-byte sequences which are their 8-bit code.
530
531 The other characters are represented by a sequence of `base
532 leading-code', optional `extended leading-code', and one or two
533 `position-code's. The length of the sequence is determined by the
534 base leading-code. Leading-code takes the range 0x80 through 0x9F,
535 whereas extended leading-code and position-code take the range 0xA0
536 through 0xFF. See `charset.h' for more details about leading-code
537 and position-code.
f4dee582 538
4ed46869 539 --- CODE RANGE of Emacs' internal format ---
b73bfc1c
KH
540 character set range
541 ------------- -----
542 ascii 0x00..0x7F
543 eight-bit-control LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
544 eight-bit-graphic 0xA0..0xBF
545 ELSE 0x81..0x9F + [0xA0..0xFF]+
4ed46869
KH
546 ---------------------------------------------
547
548 */
549
550enum emacs_code_class_type emacs_code_class[256];
551
4ed46869
KH
552/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
553 Check if a text is encoded in Emacs' internal format. If it is,
d46c5b12 554 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
4ed46869 555
0a28aafb
KH
556static int
557detect_coding_emacs_mule (src, src_end, multibytep)
b73bfc1c 558 unsigned char *src, *src_end;
0a28aafb 559 int multibytep;
4ed46869
KH
560{
561 unsigned char c;
562 int composing = 0;
b73bfc1c
KH
563 /* Dummy for ONE_MORE_BYTE. */
564 struct coding_system dummy_coding;
565 struct coding_system *coding = &dummy_coding;
4ed46869 566
b73bfc1c 567 while (1)
4ed46869 568 {
0a28aafb 569 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
570
571 if (composing)
572 {
573 if (c < 0xA0)
574 composing = 0;
b73bfc1c
KH
575 else if (c == 0xA0)
576 {
0a28aafb 577 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
578 c &= 0x7F;
579 }
4ed46869
KH
580 else
581 c -= 0x20;
582 }
583
b73bfc1c 584 if (c < 0x20)
4ed46869 585 {
4ed46869
KH
586 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
587 return 0;
b73bfc1c
KH
588 }
589 else if (c >= 0x80 && c < 0xA0)
590 {
591 if (c == 0x80)
592 /* Old leading code for a composite character. */
593 composing = 1;
594 else
595 {
596 unsigned char *src_base = src - 1;
597 int bytes;
4ed46869 598
b73bfc1c
KH
599 if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
600 bytes))
601 return 0;
602 src = src_base + bytes;
603 }
604 }
605 }
606 label_end_of_loop:
607 return CODING_CATEGORY_MASK_EMACS_MULE;
608}
4ed46869 609
4ed46869 610
b73bfc1c 611/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 612
b73bfc1c
KH
613static void
614decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
615 struct coding_system *coding;
616 unsigned char *source, *destination;
617 int src_bytes, dst_bytes;
618{
619 unsigned char *src = source;
620 unsigned char *src_end = source + src_bytes;
621 unsigned char *dst = destination;
622 unsigned char *dst_end = destination + dst_bytes;
623 /* SRC_BASE remembers the start position in source in each loop.
624 The loop will be exited when there's not enough source code, or
625 when there's not enough destination area to produce a
626 character. */
627 unsigned char *src_base;
4ed46869 628
b73bfc1c 629 coding->produced_char = 0;
8a33cf7b 630 while ((src_base = src) < src_end)
b73bfc1c
KH
631 {
632 unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
633 int bytes;
ec6d2bb8 634
4af310db
EZ
635 if (*src == '\r')
636 {
2bcdf662 637 int c = *src++;
4af310db 638
4af310db
EZ
639 if (coding->eol_type == CODING_EOL_CR)
640 c = '\n';
641 else if (coding->eol_type == CODING_EOL_CRLF)
642 {
643 ONE_MORE_BYTE (c);
644 if (c != '\n')
645 {
646 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
647 {
648 coding->result = CODING_FINISH_INCONSISTENT_EOL;
649 goto label_end_of_loop;
650 }
651 src--;
652 c = '\r';
653 }
654 }
655 *dst++ = c;
656 coding->produced_char++;
657 continue;
658 }
659 else if (*src == '\n')
660 {
661 if ((coding->eol_type == CODING_EOL_CR
662 || coding->eol_type == CODING_EOL_CRLF)
663 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
664 {
665 coding->result = CODING_FINISH_INCONSISTENT_EOL;
666 goto label_end_of_loop;
667 }
668 *dst++ = *src++;
669 coding->produced_char++;
670 continue;
671 }
672 else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
b73bfc1c
KH
673 {
674 p = src;
675 src += bytes;
676 }
677 else
678 {
679 bytes = CHAR_STRING (*src, tmp);
680 p = tmp;
681 src++;
682 }
683 if (dst + bytes >= (dst_bytes ? dst_end : src))
684 {
685 coding->result = CODING_FINISH_INSUFFICIENT_DST;
4ed46869
KH
686 break;
687 }
b73bfc1c
KH
688 while (bytes--) *dst++ = *p++;
689 coding->produced_char++;
4ed46869 690 }
4af310db 691 label_end_of_loop:
b73bfc1c
KH
692 coding->consumed = coding->consumed_char = src_base - source;
693 coding->produced = dst - destination;
4ed46869
KH
694}
695
b73bfc1c
KH
696#define encode_coding_emacs_mule(coding, source, destination, src_bytes, dst_bytes) \
697 encode_eol (coding, source, destination, src_bytes, dst_bytes)
698
699
4ed46869
KH
700\f
701/*** 3. ISO2022 handlers ***/
702
703/* The following note describes the coding system ISO2022 briefly.
39787efd
KH
704 Since the intention of this note is to help understand the
705 functions in this file, some parts are NOT ACCURATE or OVERLY
706 SIMPLIFIED. For thorough understanding, please refer to the
4ed46869
KH
707 original document of ISO2022.
708
709 ISO2022 provides many mechanisms to encode several character sets
39787efd
KH
710 in 7-bit and 8-bit environments. For 7-bite environments, all text
711 is encoded using bytes less than 128. This may make the encoded
712 text a little bit longer, but the text passes more easily through
713 several gateways, some of which strip off MSB (Most Signigant Bit).
b73bfc1c 714
39787efd 715 There are two kinds of character sets: control character set and
4ed46869
KH
716 graphic character set. The former contains control characters such
717 as `newline' and `escape' to provide control functions (control
39787efd
KH
718 functions are also provided by escape sequences). The latter
719 contains graphic characters such as 'A' and '-'. Emacs recognizes
4ed46869
KH
720 two control character sets and many graphic character sets.
721
722 Graphic character sets are classified into one of the following
39787efd
KH
723 four classes, according to the number of bytes (DIMENSION) and
724 number of characters in one dimension (CHARS) of the set:
725 - DIMENSION1_CHARS94
726 - DIMENSION1_CHARS96
727 - DIMENSION2_CHARS94
728 - DIMENSION2_CHARS96
729
730 In addition, each character set is assigned an identification tag,
731 unique for each set, called "final character" (denoted as <F>
732 hereafter). The <F> of each character set is decided by ECMA(*)
733 when it is registered in ISO. The code range of <F> is 0x30..0x7F
734 (0x30..0x3F are for private use only).
4ed46869
KH
735
736 Note (*): ECMA = European Computer Manufacturers Association
737
738 Here are examples of graphic character set [NAME(<F>)]:
739 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
740 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
741 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
742 o DIMENSION2_CHARS96 -- none for the moment
743
39787efd 744 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
4ed46869
KH
745 C0 [0x00..0x1F] -- control character plane 0
746 GL [0x20..0x7F] -- graphic character plane 0
747 C1 [0x80..0x9F] -- control character plane 1
748 GR [0xA0..0xFF] -- graphic character plane 1
749
750 A control character set is directly designated and invoked to C0 or
39787efd
KH
751 C1 by an escape sequence. The most common case is that:
752 - ISO646's control character set is designated/invoked to C0, and
753 - ISO6429's control character set is designated/invoked to C1,
754 and usually these designations/invocations are omitted in encoded
755 text. In a 7-bit environment, only C0 can be used, and a control
756 character for C1 is encoded by an appropriate escape sequence to
757 fit into the environment. All control characters for C1 are
758 defined to have corresponding escape sequences.
4ed46869
KH
759
760 A graphic character set is at first designated to one of four
761 graphic registers (G0 through G3), then these graphic registers are
762 invoked to GL or GR. These designations and invocations can be
763 done independently. The most common case is that G0 is invoked to
39787efd
KH
764 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
765 these invocations and designations are omitted in encoded text.
766 In a 7-bit environment, only GL can be used.
4ed46869 767
39787efd
KH
768 When a graphic character set of CHARS94 is invoked to GL, codes
769 0x20 and 0x7F of the GL area work as control characters SPACE and
770 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
771 be used.
4ed46869
KH
772
773 There are two ways of invocation: locking-shift and single-shift.
774 With locking-shift, the invocation lasts until the next different
39787efd
KH
775 invocation, whereas with single-shift, the invocation affects the
776 following character only and doesn't affect the locking-shift
777 state. Invocations are done by the following control characters or
778 escape sequences:
4ed46869
KH
779
780 ----------------------------------------------------------------------
39787efd 781 abbrev function cntrl escape seq description
4ed46869 782 ----------------------------------------------------------------------
39787efd
KH
783 SI/LS0 (shift-in) 0x0F none invoke G0 into GL
784 SO/LS1 (shift-out) 0x0E none invoke G1 into GL
785 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL
786 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL
787 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*)
788 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*)
789 LS3R (locking-shift 3 right) none ESC '|' invoke G3 into GR (*)
790 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char
791 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char
4ed46869 792 ----------------------------------------------------------------------
39787efd
KH
793 (*) These are not used by any known coding system.
794
795 Control characters for these functions are defined by macros
796 ISO_CODE_XXX in `coding.h'.
4ed46869 797
39787efd 798 Designations are done by the following escape sequences:
4ed46869
KH
799 ----------------------------------------------------------------------
800 escape sequence description
801 ----------------------------------------------------------------------
802 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0
803 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1
804 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2
805 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3
806 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*)
807 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1
808 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2
809 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3
810 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**)
811 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1
812 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2
813 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3
814 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*)
815 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1
816 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2
817 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3
818 ----------------------------------------------------------------------
819
820 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
39787efd 821 of dimension 1, chars 94, and final character <F>, etc...
4ed46869
KH
822
823 Note (*): Although these designations are not allowed in ISO2022,
824 Emacs accepts them on decoding, and produces them on encoding
39787efd 825 CHARS96 character sets in a coding system which is characterized as
4ed46869
KH
826 7-bit environment, non-locking-shift, and non-single-shift.
827
828 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
39787efd 829 '(' can be omitted. We refer to this as "short-form" hereafter.
4ed46869
KH
830
831 Now you may notice that there are a lot of ways for encoding the
39787efd
KH
832 same multilingual text in ISO2022. Actually, there exist many
833 coding systems such as Compound Text (used in X11's inter client
834 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR
835 (used in Korean internet), EUC (Extended UNIX Code, used in Asian
4ed46869
KH
836 localized platforms), and all of these are variants of ISO2022.
837
838 In addition to the above, Emacs handles two more kinds of escape
839 sequences: ISO6429's direction specification and Emacs' private
840 sequence for specifying character composition.
841
39787efd 842 ISO6429's direction specification takes the following form:
4ed46869
KH
843 o CSI ']' -- end of the current direction
844 o CSI '0' ']' -- end of the current direction
845 o CSI '1' ']' -- start of left-to-right text
846 o CSI '2' ']' -- start of right-to-left text
847 The control character CSI (0x9B: control sequence introducer) is
39787efd
KH
848 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
849
850 Character composition specification takes the following form:
ec6d2bb8
KH
851 o ESC '0' -- start relative composition
852 o ESC '1' -- end composition
853 o ESC '2' -- start rule-base composition (*)
854 o ESC '3' -- start relative composition with alternate chars (**)
855 o ESC '4' -- start rule-base composition with alternate chars (**)
b73bfc1c
KH
856 Since these are not standard escape sequences of any ISO standard,
857 the use of them for these meaning is restricted to Emacs only.
ec6d2bb8 858
b73bfc1c
KH
859 (*) This form is used only in Emacs 20.5 and the older versions,
860 but the newer versions can safely decode it.
861 (**) This form is used only in Emacs 21.1 and the newer versions,
862 and the older versions can't decode it.
ec6d2bb8 863
b73bfc1c
KH
864 Here's a list of examples usages of these composition escape
865 sequences (categorized by `enum composition_method').
ec6d2bb8 866
b73bfc1c 867 COMPOSITION_RELATIVE:
ec6d2bb8 868 ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 869 COMPOSITOIN_WITH_RULE:
ec6d2bb8 870 ESC 2 CHAR [ RULE CHAR ] ESC 1
b73bfc1c 871 COMPOSITION_WITH_ALTCHARS:
ec6d2bb8 872 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 873 COMPOSITION_WITH_RULE_ALTCHARS:
ec6d2bb8 874 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
4ed46869
KH
875
876enum iso_code_class_type iso_code_class[256];
877
05e6f5dc
KH
878#define CHARSET_OK(idx, charset, c) \
879 (coding_system_table[idx] \
880 && (charset == CHARSET_ASCII \
881 || (safe_chars = coding_safe_chars (coding_system_table[idx]), \
882 CODING_SAFE_CHAR_P (safe_chars, c))) \
883 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
884 charset) \
885 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
886
887#define SHIFT_OUT_OK(idx) \
888 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
889
4ed46869
KH
890/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
891 Check if a text is encoded in ISO2022. If it is, returns an
892 integer in which appropriate flag bits any of:
893 CODING_CATEGORY_MASK_ISO_7
d46c5b12 894 CODING_CATEGORY_MASK_ISO_7_TIGHT
4ed46869
KH
895 CODING_CATEGORY_MASK_ISO_8_1
896 CODING_CATEGORY_MASK_ISO_8_2
7717c392
KH
897 CODING_CATEGORY_MASK_ISO_7_ELSE
898 CODING_CATEGORY_MASK_ISO_8_ELSE
4ed46869
KH
899 are set. If a code which should never appear in ISO2022 is found,
900 returns 0. */
901
0a28aafb
KH
902static int
903detect_coding_iso2022 (src, src_end, multibytep)
4ed46869 904 unsigned char *src, *src_end;
0a28aafb 905 int multibytep;
4ed46869 906{
d46c5b12
KH
907 int mask = CODING_CATEGORY_MASK_ISO;
908 int mask_found = 0;
f46869e4 909 int reg[4], shift_out = 0, single_shifting = 0;
d46c5b12 910 int c, c1, i, charset;
b73bfc1c
KH
911 /* Dummy for ONE_MORE_BYTE. */
912 struct coding_system dummy_coding;
913 struct coding_system *coding = &dummy_coding;
05e6f5dc 914 Lisp_Object safe_chars;
3f003981 915
d46c5b12 916 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
3f003981 917 while (mask && src < src_end)
4ed46869 918 {
0a28aafb 919 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
920 switch (c)
921 {
922 case ISO_CODE_ESC:
74383408
KH
923 if (inhibit_iso_escape_detection)
924 break;
f46869e4 925 single_shifting = 0;
0a28aafb 926 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
d46c5b12 927 if (c >= '(' && c <= '/')
4ed46869 928 {
bf9cdd4e 929 /* Designation sequence for a charset of dimension 1. */
0a28aafb 930 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
931 if (c1 < ' ' || c1 >= 0x80
932 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
933 /* Invalid designation sequence. Just ignore. */
934 break;
935 reg[(c - '(') % 4] = charset;
bf9cdd4e
KH
936 }
937 else if (c == '$')
938 {
939 /* Designation sequence for a charset of dimension 2. */
0a28aafb 940 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
bf9cdd4e
KH
941 if (c >= '@' && c <= 'B')
942 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
d46c5b12 943 reg[0] = charset = iso_charset_table[1][0][c];
bf9cdd4e 944 else if (c >= '(' && c <= '/')
bcf26d6a 945 {
0a28aafb 946 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
947 if (c1 < ' ' || c1 >= 0x80
948 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
949 /* Invalid designation sequence. Just ignore. */
950 break;
951 reg[(c - '(') % 4] = charset;
bcf26d6a 952 }
bf9cdd4e 953 else
d46c5b12
KH
954 /* Invalid designation sequence. Just ignore. */
955 break;
956 }
ae9ff118 957 else if (c == 'N' || c == 'O')
d46c5b12 958 {
ae9ff118
KH
959 /* ESC <Fe> for SS2 or SS3. */
960 mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
d46c5b12 961 break;
4ed46869 962 }
ec6d2bb8
KH
963 else if (c >= '0' && c <= '4')
964 {
965 /* ESC <Fp> for start/end composition. */
966 mask_found |= CODING_CATEGORY_MASK_ISO;
967 break;
968 }
bf9cdd4e 969 else
d46c5b12
KH
970 /* Invalid escape sequence. Just ignore. */
971 break;
972
973 /* We found a valid designation sequence for CHARSET. */
974 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
05e6f5dc
KH
975 c = MAKE_CHAR (charset, 0, 0);
976 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
d46c5b12
KH
977 mask_found |= CODING_CATEGORY_MASK_ISO_7;
978 else
979 mask &= ~CODING_CATEGORY_MASK_ISO_7;
05e6f5dc 980 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
d46c5b12
KH
981 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
982 else
983 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
05e6f5dc 984 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
ae9ff118
KH
985 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
986 else
d46c5b12 987 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
05e6f5dc 988 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
ae9ff118
KH
989 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
990 else
d46c5b12 991 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
4ed46869
KH
992 break;
993
4ed46869 994 case ISO_CODE_SO:
74383408
KH
995 if (inhibit_iso_escape_detection)
996 break;
f46869e4 997 single_shifting = 0;
d46c5b12
KH
998 if (shift_out == 0
999 && (reg[1] >= 0
1000 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
1001 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
1002 {
1003 /* Locking shift out. */
1004 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1005 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1006 }
e0e989f6
KH
1007 break;
1008
d46c5b12 1009 case ISO_CODE_SI:
74383408
KH
1010 if (inhibit_iso_escape_detection)
1011 break;
f46869e4 1012 single_shifting = 0;
d46c5b12
KH
1013 if (shift_out == 1)
1014 {
1015 /* Locking shift in. */
1016 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1017 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1018 }
1019 break;
1020
4ed46869 1021 case ISO_CODE_CSI:
f46869e4 1022 single_shifting = 0;
4ed46869
KH
1023 case ISO_CODE_SS2:
1024 case ISO_CODE_SS3:
3f003981
KH
1025 {
1026 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
1027
74383408
KH
1028 if (inhibit_iso_escape_detection)
1029 break;
70c22245
KH
1030 if (c != ISO_CODE_CSI)
1031 {
d46c5b12
KH
1032 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1033 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1034 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1035 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1036 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1037 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
f46869e4 1038 single_shifting = 1;
70c22245 1039 }
3f003981
KH
1040 if (VECTORP (Vlatin_extra_code_table)
1041 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
1042 {
d46c5b12
KH
1043 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1044 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1045 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1046 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1047 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1048 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1049 }
1050 mask &= newmask;
d46c5b12 1051 mask_found |= newmask;
3f003981
KH
1052 }
1053 break;
4ed46869
KH
1054
1055 default:
1056 if (c < 0x80)
f46869e4
KH
1057 {
1058 single_shifting = 0;
1059 break;
1060 }
4ed46869 1061 else if (c < 0xA0)
c4825358 1062 {
f46869e4 1063 single_shifting = 0;
3f003981
KH
1064 if (VECTORP (Vlatin_extra_code_table)
1065 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
c4825358 1066 {
3f003981
KH
1067 int newmask = 0;
1068
d46c5b12
KH
1069 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1070 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1071 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1072 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1073 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1074 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1075 mask &= newmask;
d46c5b12 1076 mask_found |= newmask;
c4825358 1077 }
3f003981
KH
1078 else
1079 return 0;
c4825358 1080 }
4ed46869
KH
1081 else
1082 {
d46c5b12 1083 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
7717c392 1084 | CODING_CATEGORY_MASK_ISO_7_ELSE);
d46c5b12 1085 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
f46869e4
KH
1086 /* Check the length of succeeding codes of the range
1087 0xA0..0FF. If the byte length is odd, we exclude
1088 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1089 when we are not single shifting. */
b73bfc1c
KH
1090 if (!single_shifting
1091 && mask & CODING_CATEGORY_MASK_ISO_8_2)
f46869e4 1092 {
e17de821 1093 int i = 1;
b73bfc1c
KH
1094 while (src < src_end)
1095 {
0a28aafb 1096 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
1097 if (c < 0xA0)
1098 break;
1099 i++;
1100 }
1101
1102 if (i & 1 && src < src_end)
f46869e4
KH
1103 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1104 else
1105 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
1106 }
4ed46869
KH
1107 }
1108 break;
1109 }
1110 }
b73bfc1c 1111 label_end_of_loop:
d46c5b12 1112 return (mask & mask_found);
4ed46869
KH
1113}
1114
b73bfc1c
KH
1115/* Decode a character of which charset is CHARSET, the 1st position
1116 code is C1, the 2nd position code is C2, and return the decoded
1117 character code. If the variable `translation_table' is non-nil,
1118 returned the translated code. */
ec6d2bb8 1119
b73bfc1c
KH
1120#define DECODE_ISO_CHARACTER(charset, c1, c2) \
1121 (NILP (translation_table) \
1122 ? MAKE_CHAR (charset, c1, c2) \
1123 : translate_char (translation_table, -1, charset, c1, c2))
4ed46869
KH
1124
1125/* Set designation state into CODING. */
d46c5b12
KH
1126#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1127 do { \
05e6f5dc 1128 int charset, c; \
944bd420
KH
1129 \
1130 if (final_char < '0' || final_char >= 128) \
1131 goto label_invalid_code; \
1132 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1133 make_number (chars), \
1134 make_number (final_char)); \
05e6f5dc 1135 c = MAKE_CHAR (charset, 0, 0); \
d46c5b12 1136 if (charset >= 0 \
704c5781 1137 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
05e6f5dc 1138 || CODING_SAFE_CHAR_P (safe_chars, c))) \
d46c5b12
KH
1139 { \
1140 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1141 && reg == 0 \
1142 && charset == CHARSET_ASCII) \
1143 { \
1144 /* We should insert this designation sequence as is so \
1145 that it is surely written back to a file. */ \
1146 coding->spec.iso2022.last_invalid_designation_register = -1; \
1147 goto label_invalid_code; \
1148 } \
1149 coding->spec.iso2022.last_invalid_designation_register = -1; \
1150 if ((coding->mode & CODING_MODE_DIRECTION) \
1151 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1152 charset = CHARSET_REVERSE_CHARSET (charset); \
1153 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1154 } \
1155 else \
1156 { \
1157 coding->spec.iso2022.last_invalid_designation_register = reg; \
1158 goto label_invalid_code; \
1159 } \
4ed46869
KH
1160 } while (0)
1161
ec6d2bb8
KH
1162/* Allocate a memory block for storing information about compositions.
1163 The block is chained to the already allocated blocks. */
d46c5b12 1164
33fb63eb 1165void
ec6d2bb8 1166coding_allocate_composition_data (coding, char_offset)
d46c5b12 1167 struct coding_system *coding;
ec6d2bb8 1168 int char_offset;
d46c5b12 1169{
ec6d2bb8
KH
1170 struct composition_data *cmp_data
1171 = (struct composition_data *) xmalloc (sizeof *cmp_data);
1172
1173 cmp_data->char_offset = char_offset;
1174 cmp_data->used = 0;
1175 cmp_data->prev = coding->cmp_data;
1176 cmp_data->next = NULL;
1177 if (coding->cmp_data)
1178 coding->cmp_data->next = cmp_data;
1179 coding->cmp_data = cmp_data;
1180 coding->cmp_data_start = 0;
1181}
d46c5b12 1182
ec6d2bb8
KH
1183/* Record the starting position START and METHOD of one composition. */
1184
1185#define CODING_ADD_COMPOSITION_START(coding, start, method) \
1186 do { \
1187 struct composition_data *cmp_data = coding->cmp_data; \
1188 int *data = cmp_data->data + cmp_data->used; \
1189 coding->cmp_data_start = cmp_data->used; \
1190 data[0] = -1; \
1191 data[1] = cmp_data->char_offset + start; \
1192 data[3] = (int) method; \
1193 cmp_data->used += 4; \
1194 } while (0)
1195
1196/* Record the ending position END of the current composition. */
1197
1198#define CODING_ADD_COMPOSITION_END(coding, end) \
1199 do { \
1200 struct composition_data *cmp_data = coding->cmp_data; \
1201 int *data = cmp_data->data + coding->cmp_data_start; \
1202 data[0] = cmp_data->used - coding->cmp_data_start; \
1203 data[2] = cmp_data->char_offset + end; \
1204 } while (0)
1205
1206/* Record one COMPONENT (alternate character or composition rule). */
1207
1208#define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
1209 (coding->cmp_data->data[coding->cmp_data->used++] = component)
1210
1211/* Handle compositoin start sequence ESC 0, ESC 2, ESC 3, or ESC 4. */
1212
33fb63eb
KH
1213#define DECODE_COMPOSITION_START(c1) \
1214 do { \
1215 if (coding->composing == COMPOSITION_DISABLED) \
1216 { \
1217 *dst++ = ISO_CODE_ESC; \
1218 *dst++ = c1 & 0x7f; \
1219 coding->produced_char += 2; \
1220 } \
1221 else if (!COMPOSING_P (coding)) \
1222 { \
1223 /* This is surely the start of a composition. We must be sure \
1224 that coding->cmp_data has enough space to store the \
1225 information about the composition. If not, terminate the \
1226 current decoding loop, allocate one more memory block for \
1227 coding->cmp_data in the calller, then start the decoding \
1228 loop again. We can't allocate memory here directly because \
1229 it may cause buffer/string relocation. */ \
1230 if (!coding->cmp_data \
1231 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1232 >= COMPOSITION_DATA_SIZE)) \
1233 { \
1234 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1235 goto label_end_of_loop; \
1236 } \
1237 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1238 : c1 == '2' ? COMPOSITION_WITH_RULE \
1239 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1240 : COMPOSITION_WITH_RULE_ALTCHARS); \
1241 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1242 coding->composing); \
1243 coding->composition_rule_follows = 0; \
1244 } \
1245 else \
1246 { \
1247 /* We are already handling a composition. If the method is \
1248 the following two, the codes following the current escape \
1249 sequence are actual characters stored in a buffer. */ \
1250 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1251 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1252 { \
1253 coding->composing = COMPOSITION_RELATIVE; \
1254 coding->composition_rule_follows = 0; \
1255 } \
1256 } \
ec6d2bb8
KH
1257 } while (0)
1258
1259/* Handle compositoin end sequence ESC 1. */
1260
1261#define DECODE_COMPOSITION_END(c1) \
1262 do { \
1263 if (coding->composing == COMPOSITION_DISABLED) \
1264 { \
1265 *dst++ = ISO_CODE_ESC; \
1266 *dst++ = c1; \
1267 coding->produced_char += 2; \
1268 } \
1269 else \
1270 { \
1271 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1272 coding->composing = COMPOSITION_NO; \
1273 } \
1274 } while (0)
1275
1276/* Decode a composition rule from the byte C1 (and maybe one more byte
1277 from SRC) and store one encoded composition rule in
1278 coding->cmp_data. */
1279
1280#define DECODE_COMPOSITION_RULE(c1) \
1281 do { \
1282 int rule = 0; \
1283 (c1) -= 32; \
1284 if (c1 < 81) /* old format (before ver.21) */ \
1285 { \
1286 int gref = (c1) / 9; \
1287 int nref = (c1) % 9; \
1288 if (gref == 4) gref = 10; \
1289 if (nref == 4) nref = 10; \
1290 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1291 } \
b73bfc1c 1292 else if (c1 < 93) /* new format (after ver.21) */ \
ec6d2bb8
KH
1293 { \
1294 ONE_MORE_BYTE (c2); \
1295 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1296 } \
1297 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1298 coding->composition_rule_follows = 0; \
1299 } while (0)
88993dfd 1300
d46c5b12 1301
4ed46869
KH
1302/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1303
b73bfc1c 1304static void
d46c5b12 1305decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
1306 struct coding_system *coding;
1307 unsigned char *source, *destination;
1308 int src_bytes, dst_bytes;
4ed46869
KH
1309{
1310 unsigned char *src = source;
1311 unsigned char *src_end = source + src_bytes;
1312 unsigned char *dst = destination;
1313 unsigned char *dst_end = destination + dst_bytes;
4ed46869
KH
1314 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1315 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1316 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
b73bfc1c
KH
1317 /* SRC_BASE remembers the start position in source in each loop.
1318 The loop will be exited when there's not enough source code
1319 (within macro ONE_MORE_BYTE), or when there's not enough
1320 destination area to produce a character (within macro
1321 EMIT_CHAR). */
1322 unsigned char *src_base;
1323 int c, charset;
1324 Lisp_Object translation_table;
05e6f5dc
KH
1325 Lisp_Object safe_chars;
1326
1327 safe_chars = coding_safe_chars (coding);
bdd9fb48 1328
b73bfc1c
KH
1329 if (NILP (Venable_character_translation))
1330 translation_table = Qnil;
1331 else
1332 {
1333 translation_table = coding->translation_table_for_decode;
1334 if (NILP (translation_table))
1335 translation_table = Vstandard_translation_table_for_decode;
1336 }
4ed46869 1337
b73bfc1c
KH
1338 coding->result = CODING_FINISH_NORMAL;
1339
1340 while (1)
4ed46869 1341 {
b73bfc1c
KH
1342 int c1, c2;
1343
1344 src_base = src;
1345 ONE_MORE_BYTE (c1);
4ed46869 1346
ec6d2bb8 1347 /* We produce no character or one character. */
4ed46869
KH
1348 switch (iso_code_class [c1])
1349 {
1350 case ISO_0x20_or_0x7F:
ec6d2bb8
KH
1351 if (COMPOSING_P (coding) && coding->composition_rule_follows)
1352 {
1353 DECODE_COMPOSITION_RULE (c1);
b73bfc1c 1354 continue;
ec6d2bb8
KH
1355 }
1356 if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
4ed46869
KH
1357 {
1358 /* This is SPACE or DEL. */
b73bfc1c 1359 charset = CHARSET_ASCII;
4ed46869
KH
1360 break;
1361 }
1362 /* This is a graphic character, we fall down ... */
1363
1364 case ISO_graphic_plane_0:
ec6d2bb8 1365 if (COMPOSING_P (coding) && coding->composition_rule_follows)
b73bfc1c
KH
1366 {
1367 DECODE_COMPOSITION_RULE (c1);
1368 continue;
1369 }
1370 charset = charset0;
4ed46869
KH
1371 break;
1372
1373 case ISO_0xA0_or_0xFF:
d46c5b12
KH
1374 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1375 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
fb88bf2d 1376 goto label_invalid_code;
4ed46869
KH
1377 /* This is a graphic character, we fall down ... */
1378
1379 case ISO_graphic_plane_1:
b73bfc1c 1380 if (charset1 < 0)
fb88bf2d 1381 goto label_invalid_code;
b73bfc1c 1382 charset = charset1;
4ed46869
KH
1383 break;
1384
b73bfc1c 1385 case ISO_control_0:
ec6d2bb8
KH
1386 if (COMPOSING_P (coding))
1387 DECODE_COMPOSITION_END ('1');
1388
4ed46869
KH
1389 /* All ISO2022 control characters in this class have the
1390 same representation in Emacs internal format. */
d46c5b12
KH
1391 if (c1 == '\n'
1392 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1393 && (coding->eol_type == CODING_EOL_CR
1394 || coding->eol_type == CODING_EOL_CRLF))
1395 {
b73bfc1c
KH
1396 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1397 goto label_end_of_loop;
d46c5b12 1398 }
b73bfc1c 1399 charset = CHARSET_ASCII;
4ed46869
KH
1400 break;
1401
b73bfc1c
KH
1402 case ISO_control_1:
1403 if (COMPOSING_P (coding))
1404 DECODE_COMPOSITION_END ('1');
1405 goto label_invalid_code;
1406
4ed46869 1407 case ISO_carriage_return:
ec6d2bb8
KH
1408 if (COMPOSING_P (coding))
1409 DECODE_COMPOSITION_END ('1');
1410
4ed46869 1411 if (coding->eol_type == CODING_EOL_CR)
b73bfc1c 1412 c1 = '\n';
4ed46869
KH
1413 else if (coding->eol_type == CODING_EOL_CRLF)
1414 {
1415 ONE_MORE_BYTE (c1);
b73bfc1c 1416 if (c1 != ISO_CODE_LF)
4ed46869 1417 {
d46c5b12
KH
1418 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1419 {
b73bfc1c
KH
1420 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1421 goto label_end_of_loop;
d46c5b12 1422 }
4ed46869 1423 src--;
b73bfc1c 1424 c1 = '\r';
4ed46869
KH
1425 }
1426 }
b73bfc1c 1427 charset = CHARSET_ASCII;
4ed46869
KH
1428 break;
1429
1430 case ISO_shift_out:
d46c5b12
KH
1431 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1432 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1433 goto label_invalid_code;
4ed46869
KH
1434 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
1435 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1436 continue;
4ed46869
KH
1437
1438 case ISO_shift_in:
d46c5b12
KH
1439 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1440 goto label_invalid_code;
4ed46869
KH
1441 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
1442 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1443 continue;
4ed46869
KH
1444
1445 case ISO_single_shift_2_7:
1446 case ISO_single_shift_2:
d46c5b12
KH
1447 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1448 goto label_invalid_code;
4ed46869
KH
1449 /* SS2 is handled as an escape sequence of ESC 'N' */
1450 c1 = 'N';
1451 goto label_escape_sequence;
1452
1453 case ISO_single_shift_3:
d46c5b12
KH
1454 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1455 goto label_invalid_code;
4ed46869
KH
1456 /* SS2 is handled as an escape sequence of ESC 'O' */
1457 c1 = 'O';
1458 goto label_escape_sequence;
1459
1460 case ISO_control_sequence_introducer:
1461 /* CSI is handled as an escape sequence of ESC '[' ... */
1462 c1 = '[';
1463 goto label_escape_sequence;
1464
1465 case ISO_escape:
1466 ONE_MORE_BYTE (c1);
1467 label_escape_sequence:
1468 /* Escape sequences handled by Emacs are invocation,
1469 designation, direction specification, and character
1470 composition specification. */
1471 switch (c1)
1472 {
1473 case '&': /* revision of following character set */
1474 ONE_MORE_BYTE (c1);
1475 if (!(c1 >= '@' && c1 <= '~'))
d46c5b12 1476 goto label_invalid_code;
4ed46869
KH
1477 ONE_MORE_BYTE (c1);
1478 if (c1 != ISO_CODE_ESC)
d46c5b12 1479 goto label_invalid_code;
4ed46869
KH
1480 ONE_MORE_BYTE (c1);
1481 goto label_escape_sequence;
1482
1483 case '$': /* designation of 2-byte character set */
d46c5b12
KH
1484 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1485 goto label_invalid_code;
4ed46869
KH
1486 ONE_MORE_BYTE (c1);
1487 if (c1 >= '@' && c1 <= 'B')
1488 { /* designation of JISX0208.1978, GB2312.1980,
88993dfd 1489 or JISX0208.1980 */
4ed46869
KH
1490 DECODE_DESIGNATION (0, 2, 94, c1);
1491 }
1492 else if (c1 >= 0x28 && c1 <= 0x2B)
1493 { /* designation of DIMENSION2_CHARS94 character set */
1494 ONE_MORE_BYTE (c2);
1495 DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
1496 }
1497 else if (c1 >= 0x2C && c1 <= 0x2F)
1498 { /* designation of DIMENSION2_CHARS96 character set */
1499 ONE_MORE_BYTE (c2);
1500 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
1501 }
1502 else
d46c5b12 1503 goto label_invalid_code;
b73bfc1c
KH
1504 /* We must update these variables now. */
1505 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1506 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1507 continue;
4ed46869
KH
1508
1509 case 'n': /* invocation of locking-shift-2 */
d46c5b12
KH
1510 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1511 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1512 goto label_invalid_code;
4ed46869 1513 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
e0e989f6 1514 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1515 continue;
4ed46869
KH
1516
1517 case 'o': /* invocation of locking-shift-3 */
d46c5b12
KH
1518 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1519 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1520 goto label_invalid_code;
4ed46869 1521 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
e0e989f6 1522 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1523 continue;
4ed46869
KH
1524
1525 case 'N': /* invocation of single-shift-2 */
d46c5b12
KH
1526 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1527 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1528 goto label_invalid_code;
4ed46869 1529 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
b73bfc1c 1530 ONE_MORE_BYTE (c1);
e7046a18
KH
1531 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1532 goto label_invalid_code;
4ed46869
KH
1533 break;
1534
1535 case 'O': /* invocation of single-shift-3 */
d46c5b12
KH
1536 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1537 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1538 goto label_invalid_code;
4ed46869 1539 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
b73bfc1c 1540 ONE_MORE_BYTE (c1);
e7046a18
KH
1541 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1542 goto label_invalid_code;
4ed46869
KH
1543 break;
1544
ec6d2bb8
KH
1545 case '0': case '2': case '3': case '4': /* start composition */
1546 DECODE_COMPOSITION_START (c1);
b73bfc1c 1547 continue;
4ed46869 1548
ec6d2bb8
KH
1549 case '1': /* end composition */
1550 DECODE_COMPOSITION_END (c1);
b73bfc1c 1551 continue;
4ed46869
KH
1552
1553 case '[': /* specification of direction */
d46c5b12
KH
1554 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
1555 goto label_invalid_code;
4ed46869 1556 /* For the moment, nested direction is not supported.
d46c5b12
KH
1557 So, `coding->mode & CODING_MODE_DIRECTION' zero means
1558 left-to-right, and nozero means right-to-left. */
4ed46869
KH
1559 ONE_MORE_BYTE (c1);
1560 switch (c1)
1561 {
1562 case ']': /* end of the current direction */
d46c5b12 1563 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869
KH
1564
1565 case '0': /* end of the current direction */
1566 case '1': /* start of left-to-right direction */
1567 ONE_MORE_BYTE (c1);
1568 if (c1 == ']')
d46c5b12 1569 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869 1570 else
d46c5b12 1571 goto label_invalid_code;
4ed46869
KH
1572 break;
1573
1574 case '2': /* start of right-to-left direction */
1575 ONE_MORE_BYTE (c1);
1576 if (c1 == ']')
d46c5b12 1577 coding->mode |= CODING_MODE_DIRECTION;
4ed46869 1578 else
d46c5b12 1579 goto label_invalid_code;
4ed46869
KH
1580 break;
1581
1582 default:
d46c5b12 1583 goto label_invalid_code;
4ed46869 1584 }
b73bfc1c 1585 continue;
4ed46869
KH
1586
1587 default:
d46c5b12
KH
1588 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1589 goto label_invalid_code;
4ed46869
KH
1590 if (c1 >= 0x28 && c1 <= 0x2B)
1591 { /* designation of DIMENSION1_CHARS94 character set */
1592 ONE_MORE_BYTE (c2);
1593 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
1594 }
1595 else if (c1 >= 0x2C && c1 <= 0x2F)
1596 { /* designation of DIMENSION1_CHARS96 character set */
1597 ONE_MORE_BYTE (c2);
1598 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
1599 }
1600 else
b73bfc1c
KH
1601 goto label_invalid_code;
1602 /* We must update these variables now. */
1603 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1604 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1605 continue;
4ed46869 1606 }
b73bfc1c 1607 }
4ed46869 1608
b73bfc1c
KH
1609 /* Now we know CHARSET and 1st position code C1 of a character.
1610 Produce a multibyte sequence for that character while getting
1611 2nd position code C2 if necessary. */
1612 if (CHARSET_DIMENSION (charset) == 2)
1613 {
1614 ONE_MORE_BYTE (c2);
1615 if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
1616 /* C2 is not in a valid range. */
1617 goto label_invalid_code;
4ed46869 1618 }
b73bfc1c
KH
1619 c = DECODE_ISO_CHARACTER (charset, c1, c2);
1620 EMIT_CHAR (c);
4ed46869
KH
1621 continue;
1622
b73bfc1c
KH
1623 label_invalid_code:
1624 coding->errors++;
1625 if (COMPOSING_P (coding))
1626 DECODE_COMPOSITION_END ('1');
4ed46869 1627 src = src_base;
b73bfc1c
KH
1628 c = *src++;
1629 EMIT_CHAR (c);
4ed46869 1630 }
fb88bf2d 1631
b73bfc1c
KH
1632 label_end_of_loop:
1633 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 1634 coding->produced = dst - destination;
b73bfc1c 1635 return;
4ed46869
KH
1636}
1637
b73bfc1c 1638
f4dee582 1639/* ISO2022 encoding stuff. */
4ed46869
KH
1640
1641/*
f4dee582 1642 It is not enough to say just "ISO2022" on encoding, we have to
d46c5b12 1643 specify more details. In Emacs, each coding system of ISO2022
4ed46869
KH
1644 variant has the following specifications:
1645 1. Initial designation to G0 thru G3.
1646 2. Allows short-form designation?
1647 3. ASCII should be designated to G0 before control characters?
1648 4. ASCII should be designated to G0 at end of line?
1649 5. 7-bit environment or 8-bit environment?
1650 6. Use locking-shift?
1651 7. Use Single-shift?
1652 And the following two are only for Japanese:
1653 8. Use ASCII in place of JIS0201-1976-Roman?
1654 9. Use JISX0208-1983 in place of JISX0208-1978?
1655 These specifications are encoded in `coding->flags' as flag bits
1656 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
f4dee582 1657 details.
4ed46869
KH
1658*/
1659
1660/* Produce codes (escape sequence) for designating CHARSET to graphic
b73bfc1c
KH
1661 register REG at DST, and increment DST. If <final-char> of CHARSET is
1662 '@', 'A', or 'B' and the coding system CODING allows, produce
1663 designation sequence of short-form. */
4ed46869
KH
1664
1665#define ENCODE_DESIGNATION(charset, reg, coding) \
1666 do { \
1667 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
1668 char *intermediate_char_94 = "()*+"; \
1669 char *intermediate_char_96 = ",-./"; \
70c22245 1670 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
b73bfc1c 1671 \
70c22245
KH
1672 if (revision < 255) \
1673 { \
4ed46869
KH
1674 *dst++ = ISO_CODE_ESC; \
1675 *dst++ = '&'; \
70c22245 1676 *dst++ = '@' + revision; \
4ed46869 1677 } \
b73bfc1c 1678 *dst++ = ISO_CODE_ESC; \
4ed46869
KH
1679 if (CHARSET_DIMENSION (charset) == 1) \
1680 { \
1681 if (CHARSET_CHARS (charset) == 94) \
1682 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1683 else \
1684 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
1685 } \
1686 else \
1687 { \
1688 *dst++ = '$'; \
1689 if (CHARSET_CHARS (charset) == 94) \
1690 { \
b73bfc1c
KH
1691 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
1692 || reg != 0 \
1693 || final_char < '@' || final_char > 'B') \
4ed46869
KH
1694 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1695 } \
1696 else \
b73bfc1c 1697 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
4ed46869 1698 } \
b73bfc1c 1699 *dst++ = final_char; \
4ed46869
KH
1700 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1701 } while (0)
1702
1703/* The following two macros produce codes (control character or escape
1704 sequence) for ISO2022 single-shift functions (single-shift-2 and
1705 single-shift-3). */
1706
1707#define ENCODE_SINGLE_SHIFT_2 \
1708 do { \
1709 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1710 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
1711 else \
b73bfc1c 1712 *dst++ = ISO_CODE_SS2; \
4ed46869
KH
1713 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1714 } while (0)
1715
fb88bf2d
KH
1716#define ENCODE_SINGLE_SHIFT_3 \
1717 do { \
4ed46869 1718 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
fb88bf2d
KH
1719 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
1720 else \
b73bfc1c 1721 *dst++ = ISO_CODE_SS3; \
4ed46869
KH
1722 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1723 } while (0)
1724
1725/* The following four macros produce codes (control character or
1726 escape sequence) for ISO2022 locking-shift functions (shift-in,
1727 shift-out, locking-shift-2, and locking-shift-3). */
1728
b73bfc1c
KH
1729#define ENCODE_SHIFT_IN \
1730 do { \
1731 *dst++ = ISO_CODE_SI; \
4ed46869
KH
1732 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
1733 } while (0)
1734
b73bfc1c
KH
1735#define ENCODE_SHIFT_OUT \
1736 do { \
1737 *dst++ = ISO_CODE_SO; \
4ed46869
KH
1738 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
1739 } while (0)
1740
1741#define ENCODE_LOCKING_SHIFT_2 \
1742 do { \
1743 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
1744 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
1745 } while (0)
1746
b73bfc1c
KH
1747#define ENCODE_LOCKING_SHIFT_3 \
1748 do { \
1749 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
4ed46869
KH
1750 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
1751 } while (0)
1752
f4dee582
RS
1753/* Produce codes for a DIMENSION1 character whose character set is
1754 CHARSET and whose position-code is C1. Designation and invocation
4ed46869
KH
1755 sequences are also produced in advance if necessary. */
1756
6e85d753
KH
1757#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
1758 do { \
1759 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1760 { \
1761 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1762 *dst++ = c1 & 0x7F; \
1763 else \
1764 *dst++ = c1 | 0x80; \
1765 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1766 break; \
1767 } \
1768 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1769 { \
1770 *dst++ = c1 & 0x7F; \
1771 break; \
1772 } \
1773 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1774 { \
1775 *dst++ = c1 | 0x80; \
1776 break; \
1777 } \
6e85d753
KH
1778 else \
1779 /* Since CHARSET is not yet invoked to any graphic planes, we \
1780 must invoke it, or, at first, designate it to some graphic \
1781 register. Then repeat the loop to actually produce the \
1782 character. */ \
1783 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1784 } while (1)
1785
f4dee582
RS
1786/* Produce codes for a DIMENSION2 character whose character set is
1787 CHARSET and whose position-codes are C1 and C2. Designation and
4ed46869
KH
1788 invocation codes are also produced in advance if necessary. */
1789
6e85d753
KH
1790#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
1791 do { \
1792 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1793 { \
1794 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1795 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
1796 else \
1797 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
1798 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1799 break; \
1800 } \
1801 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1802 { \
1803 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
1804 break; \
1805 } \
1806 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1807 { \
1808 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
1809 break; \
1810 } \
6e85d753
KH
1811 else \
1812 /* Since CHARSET is not yet invoked to any graphic planes, we \
1813 must invoke it, or, at first, designate it to some graphic \
1814 register. Then repeat the loop to actually produce the \
1815 character. */ \
1816 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1817 } while (1)
1818
05e6f5dc
KH
1819#define ENCODE_ISO_CHARACTER(c) \
1820 do { \
1821 int charset, c1, c2; \
1822 \
1823 SPLIT_CHAR (c, charset, c1, c2); \
1824 if (CHARSET_DEFINED_P (charset)) \
1825 { \
1826 if (CHARSET_DIMENSION (charset) == 1) \
1827 { \
1828 if (charset == CHARSET_ASCII \
1829 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
1830 charset = charset_latin_jisx0201; \
1831 ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
1832 } \
1833 else \
1834 { \
1835 if (charset == charset_jisx0208 \
1836 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
1837 charset = charset_jisx0208_1978; \
1838 ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
1839 } \
1840 } \
1841 else \
1842 { \
1843 *dst++ = c1; \
1844 if (c2 >= 0) \
1845 *dst++ = c2; \
1846 } \
1847 } while (0)
1848
1849
1850/* Instead of encoding character C, produce one or two `?'s. */
1851
1852#define ENCODE_UNSAFE_CHARACTER(c) \
6f551029 1853 do { \
05e6f5dc
KH
1854 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
1855 if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
1856 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
84fbb8a0 1857 } while (0)
bdd9fb48 1858
05e6f5dc 1859
4ed46869
KH
1860/* Produce designation and invocation codes at a place pointed by DST
1861 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
1862 Return new DST. */
1863
1864unsigned char *
1865encode_invocation_designation (charset, coding, dst)
1866 int charset;
1867 struct coding_system *coding;
1868 unsigned char *dst;
1869{
1870 int reg; /* graphic register number */
1871
1872 /* At first, check designations. */
1873 for (reg = 0; reg < 4; reg++)
1874 if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
1875 break;
1876
1877 if (reg >= 4)
1878 {
1879 /* CHARSET is not yet designated to any graphic registers. */
1880 /* At first check the requested designation. */
1881 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1ba9e4ab
KH
1882 if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
1883 /* Since CHARSET requests no special designation, designate it
1884 to graphic register 0. */
4ed46869
KH
1885 reg = 0;
1886
1887 ENCODE_DESIGNATION (charset, reg, coding);
1888 }
1889
1890 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
1891 && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
1892 {
1893 /* Since the graphic register REG is not invoked to any graphic
1894 planes, invoke it to graphic plane 0. */
1895 switch (reg)
1896 {
1897 case 0: /* graphic register 0 */
1898 ENCODE_SHIFT_IN;
1899 break;
1900
1901 case 1: /* graphic register 1 */
1902 ENCODE_SHIFT_OUT;
1903 break;
1904
1905 case 2: /* graphic register 2 */
1906 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1907 ENCODE_SINGLE_SHIFT_2;
1908 else
1909 ENCODE_LOCKING_SHIFT_2;
1910 break;
1911
1912 case 3: /* graphic register 3 */
1913 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1914 ENCODE_SINGLE_SHIFT_3;
1915 else
1916 ENCODE_LOCKING_SHIFT_3;
1917 break;
1918 }
1919 }
b73bfc1c 1920
4ed46869
KH
1921 return dst;
1922}
1923
ec6d2bb8
KH
1924/* Produce 2-byte codes for encoded composition rule RULE. */
1925
1926#define ENCODE_COMPOSITION_RULE(rule) \
1927 do { \
1928 int gref, nref; \
1929 COMPOSITION_DECODE_RULE (rule, gref, nref); \
1930 *dst++ = 32 + 81 + gref; \
1931 *dst++ = 32 + nref; \
1932 } while (0)
1933
1934/* Produce codes for indicating the start of a composition sequence
1935 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
1936 which specify information about the composition. See the comment
1937 in coding.h for the format of DATA. */
1938
1939#define ENCODE_COMPOSITION_START(coding, data) \
1940 do { \
1941 coding->composing = data[3]; \
1942 *dst++ = ISO_CODE_ESC; \
1943 if (coding->composing == COMPOSITION_RELATIVE) \
1944 *dst++ = '0'; \
1945 else \
1946 { \
1947 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
1948 ? '3' : '4'); \
1949 coding->cmp_data_index = coding->cmp_data_start + 4; \
1950 coding->composition_rule_follows = 0; \
1951 } \
1952 } while (0)
1953
1954/* Produce codes for indicating the end of the current composition. */
1955
1956#define ENCODE_COMPOSITION_END(coding, data) \
1957 do { \
1958 *dst++ = ISO_CODE_ESC; \
1959 *dst++ = '1'; \
1960 coding->cmp_data_start += data[0]; \
1961 coding->composing = COMPOSITION_NO; \
1962 if (coding->cmp_data_start == coding->cmp_data->used \
1963 && coding->cmp_data->next) \
1964 { \
1965 coding->cmp_data = coding->cmp_data->next; \
1966 coding->cmp_data_start = 0; \
1967 } \
1968 } while (0)
1969
1970/* Produce composition start sequence ESC 0. Here, this sequence
1971 doesn't mean the start of a new composition but means that we have
1972 just produced components (alternate chars and composition rules) of
1973 the composition and the actual text follows in SRC. */
1974
1975#define ENCODE_COMPOSITION_FAKE_START(coding) \
1976 do { \
1977 *dst++ = ISO_CODE_ESC; \
1978 *dst++ = '0'; \
1979 coding->composing = COMPOSITION_RELATIVE; \
1980 } while (0)
4ed46869
KH
1981
1982/* The following three macros produce codes for indicating direction
1983 of text. */
b73bfc1c
KH
1984#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
1985 do { \
4ed46869 1986 if (coding->flags == CODING_FLAG_ISO_SEVEN_BITS) \
b73bfc1c
KH
1987 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
1988 else \
1989 *dst++ = ISO_CODE_CSI; \
4ed46869
KH
1990 } while (0)
1991
1992#define ENCODE_DIRECTION_R2L \
b73bfc1c 1993 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'
4ed46869
KH
1994
1995#define ENCODE_DIRECTION_L2R \
b73bfc1c 1996 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'
4ed46869
KH
1997
1998/* Produce codes for designation and invocation to reset the graphic
1999 planes and registers to initial state. */
e0e989f6
KH
2000#define ENCODE_RESET_PLANE_AND_REGISTER \
2001 do { \
2002 int reg; \
2003 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
2004 ENCODE_SHIFT_IN; \
2005 for (reg = 0; reg < 4; reg++) \
2006 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
2007 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
2008 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
2009 ENCODE_DESIGNATION \
2010 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
4ed46869
KH
2011 } while (0)
2012
bdd9fb48 2013/* Produce designation sequences of charsets in the line started from
b73bfc1c 2014 SRC to a place pointed by DST, and return updated DST.
bdd9fb48
KH
2015
2016 If the current block ends before any end-of-line, we may fail to
d46c5b12
KH
2017 find all the necessary designations. */
2018
b73bfc1c
KH
2019static unsigned char *
2020encode_designation_at_bol (coding, translation_table, src, src_end, dst)
e0e989f6 2021 struct coding_system *coding;
b73bfc1c
KH
2022 Lisp_Object translation_table;
2023 unsigned char *src, *src_end, *dst;
e0e989f6 2024{
bdd9fb48
KH
2025 int charset, c, found = 0, reg;
2026 /* Table of charsets to be designated to each graphic register. */
2027 int r[4];
bdd9fb48
KH
2028
2029 for (reg = 0; reg < 4; reg++)
2030 r[reg] = -1;
2031
b73bfc1c 2032 while (found < 4)
e0e989f6 2033 {
b73bfc1c
KH
2034 ONE_MORE_CHAR (c);
2035 if (c == '\n')
2036 break;
bdd9fb48 2037
b73bfc1c 2038 charset = CHAR_CHARSET (c);
e0e989f6 2039 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
d46c5b12 2040 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
bdd9fb48
KH
2041 {
2042 found++;
2043 r[reg] = charset;
2044 }
bdd9fb48
KH
2045 }
2046
b73bfc1c 2047 label_end_of_loop:
bdd9fb48
KH
2048 if (found)
2049 {
2050 for (reg = 0; reg < 4; reg++)
2051 if (r[reg] >= 0
2052 && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
2053 ENCODE_DESIGNATION (r[reg], reg, coding);
e0e989f6 2054 }
b73bfc1c
KH
2055
2056 return dst;
e0e989f6
KH
2057}
2058
4ed46869
KH
2059/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
2060
b73bfc1c 2061static void
d46c5b12 2062encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2063 struct coding_system *coding;
2064 unsigned char *source, *destination;
2065 int src_bytes, dst_bytes;
4ed46869
KH
2066{
2067 unsigned char *src = source;
2068 unsigned char *src_end = source + src_bytes;
2069 unsigned char *dst = destination;
2070 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c 2071 /* Since the maximum bytes produced by each loop is 20, we subtract 19
4ed46869
KH
2072 from DST_END to assure overflow checking is necessary only at the
2073 head of loop. */
b73bfc1c
KH
2074 unsigned char *adjusted_dst_end = dst_end - 19;
2075 /* SRC_BASE remembers the start position in source in each loop.
2076 The loop will be exited when there's not enough source text to
2077 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2078 there's not enough destination area to produce encoded codes
2079 (within macro EMIT_BYTES). */
2080 unsigned char *src_base;
2081 int c;
2082 Lisp_Object translation_table;
05e6f5dc
KH
2083 Lisp_Object safe_chars;
2084
2085 safe_chars = coding_safe_chars (coding);
bdd9fb48 2086
b73bfc1c
KH
2087 if (NILP (Venable_character_translation))
2088 translation_table = Qnil;
2089 else
2090 {
2091 translation_table = coding->translation_table_for_encode;
2092 if (NILP (translation_table))
2093 translation_table = Vstandard_translation_table_for_encode;
2094 }
4ed46869 2095
d46c5b12 2096 coding->consumed_char = 0;
b73bfc1c
KH
2097 coding->errors = 0;
2098 while (1)
4ed46869 2099 {
b73bfc1c
KH
2100 src_base = src;
2101
2102 if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
2103 {
2104 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2105 break;
2106 }
4ed46869 2107
e0e989f6
KH
2108 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
2109 && CODING_SPEC_ISO_BOL (coding))
2110 {
bdd9fb48 2111 /* We have to produce designation sequences if any now. */
b73bfc1c
KH
2112 dst = encode_designation_at_bol (coding, translation_table,
2113 src, src_end, dst);
e0e989f6
KH
2114 CODING_SPEC_ISO_BOL (coding) = 0;
2115 }
2116
ec6d2bb8
KH
2117 /* Check composition start and end. */
2118 if (coding->composing != COMPOSITION_DISABLED
2119 && coding->cmp_data_start < coding->cmp_data->used)
4ed46869 2120 {
ec6d2bb8
KH
2121 struct composition_data *cmp_data = coding->cmp_data;
2122 int *data = cmp_data->data + coding->cmp_data_start;
2123 int this_pos = cmp_data->char_offset + coding->consumed_char;
2124
2125 if (coding->composing == COMPOSITION_RELATIVE)
4ed46869 2126 {
ec6d2bb8
KH
2127 if (this_pos == data[2])
2128 {
2129 ENCODE_COMPOSITION_END (coding, data);
2130 cmp_data = coding->cmp_data;
2131 data = cmp_data->data + coding->cmp_data_start;
2132 }
4ed46869 2133 }
ec6d2bb8 2134 else if (COMPOSING_P (coding))
4ed46869 2135 {
ec6d2bb8
KH
2136 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHAR */
2137 if (coding->cmp_data_index == coding->cmp_data_start + data[0])
2138 /* We have consumed components of the composition.
2139 What follows in SRC is the compositions's base
2140 text. */
2141 ENCODE_COMPOSITION_FAKE_START (coding);
2142 else
4ed46869 2143 {
ec6d2bb8
KH
2144 int c = cmp_data->data[coding->cmp_data_index++];
2145 if (coding->composition_rule_follows)
2146 {
2147 ENCODE_COMPOSITION_RULE (c);
2148 coding->composition_rule_follows = 0;
2149 }
2150 else
2151 {
05e6f5dc
KH
2152 if (coding->flags & CODING_FLAG_ISO_SAFE
2153 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2154 ENCODE_UNSAFE_CHARACTER (c);
2155 else
2156 ENCODE_ISO_CHARACTER (c);
ec6d2bb8
KH
2157 if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
2158 coding->composition_rule_follows = 1;
2159 }
4ed46869
KH
2160 continue;
2161 }
ec6d2bb8
KH
2162 }
2163 if (!COMPOSING_P (coding))
2164 {
2165 if (this_pos == data[1])
4ed46869 2166 {
ec6d2bb8
KH
2167 ENCODE_COMPOSITION_START (coding, data);
2168 continue;
4ed46869 2169 }
4ed46869
KH
2170 }
2171 }
ec6d2bb8 2172
b73bfc1c 2173 ONE_MORE_CHAR (c);
4ed46869 2174
b73bfc1c
KH
2175 /* Now encode the character C. */
2176 if (c < 0x20 || c == 0x7F)
2177 {
2178 if (c == '\r')
19a8d9e0 2179 {
b73bfc1c
KH
2180 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2181 {
2182 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2183 ENCODE_RESET_PLANE_AND_REGISTER;
2184 *dst++ = c;
2185 continue;
2186 }
2187 /* fall down to treat '\r' as '\n' ... */
2188 c = '\n';
19a8d9e0 2189 }
b73bfc1c 2190 if (c == '\n')
19a8d9e0 2191 {
b73bfc1c
KH
2192 if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
2193 ENCODE_RESET_PLANE_AND_REGISTER;
2194 if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
2195 bcopy (coding->spec.iso2022.initial_designation,
2196 coding->spec.iso2022.current_designation,
2197 sizeof coding->spec.iso2022.initial_designation);
2198 if (coding->eol_type == CODING_EOL_LF
2199 || coding->eol_type == CODING_EOL_UNDECIDED)
2200 *dst++ = ISO_CODE_LF;
2201 else if (coding->eol_type == CODING_EOL_CRLF)
2202 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
2203 else
2204 *dst++ = ISO_CODE_CR;
2205 CODING_SPEC_ISO_BOL (coding) = 1;
19a8d9e0 2206 }
b73bfc1c 2207 else
19a8d9e0 2208 {
b73bfc1c
KH
2209 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2210 ENCODE_RESET_PLANE_AND_REGISTER;
2211 *dst++ = c;
19a8d9e0 2212 }
4ed46869 2213 }
b73bfc1c 2214 else if (ASCII_BYTE_P (c))
05e6f5dc 2215 ENCODE_ISO_CHARACTER (c);
b73bfc1c 2216 else if (SINGLE_BYTE_CHAR_P (c))
88993dfd 2217 {
b73bfc1c
KH
2218 *dst++ = c;
2219 coding->errors++;
88993dfd 2220 }
05e6f5dc
KH
2221 else if (coding->flags & CODING_FLAG_ISO_SAFE
2222 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2223 ENCODE_UNSAFE_CHARACTER (c);
b73bfc1c 2224 else
05e6f5dc 2225 ENCODE_ISO_CHARACTER (c);
b73bfc1c
KH
2226
2227 coding->consumed_char++;
84fbb8a0 2228 }
b73bfc1c
KH
2229
2230 label_end_of_loop:
2231 coding->consumed = src_base - source;
d46c5b12 2232 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2233}
2234
2235\f
2236/*** 4. SJIS and BIG5 handlers ***/
2237
f4dee582 2238/* Although SJIS and BIG5 are not ISO's coding system, they are used
4ed46869
KH
2239 quite widely. So, for the moment, Emacs supports them in the bare
2240 C code. But, in the future, they may be supported only by CCL. */
2241
2242/* SJIS is a coding system encoding three character sets: ASCII, right
2243 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2244 as is. A character of charset katakana-jisx0201 is encoded by
2245 "position-code + 0x80". A character of charset japanese-jisx0208
2246 is encoded in 2-byte but two position-codes are divided and shifted
2247 so that it fit in the range below.
2248
2249 --- CODE RANGE of SJIS ---
2250 (character set) (range)
2251 ASCII 0x00 .. 0x7F
2252 KATAKANA-JISX0201 0xA0 .. 0xDF
c28a9453 2253 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF
d14d03ac 2254 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC
4ed46869
KH
2255 -------------------------------
2256
2257*/
2258
2259/* BIG5 is a coding system encoding two character sets: ASCII and
2260 Big5. An ASCII character is encoded as is. Big5 is a two-byte
2261 character set and is encoded in two-byte.
2262
2263 --- CODE RANGE of BIG5 ---
2264 (character set) (range)
2265 ASCII 0x00 .. 0x7F
2266 Big5 (1st byte) 0xA1 .. 0xFE
2267 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE
2268 --------------------------
2269
2270 Since the number of characters in Big5 is larger than maximum
2271 characters in Emacs' charset (96x96), it can't be handled as one
2272 charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
2273 and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
2274 contains frequently used characters and the latter contains less
2275 frequently used characters. */
2276
2277/* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
2278 are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
2279 C1 and C2 are the 1st and 2nd position-codes of of Emacs' internal
2280 format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
2281
2282/* Number of Big5 characters which have the same code in 1st byte. */
2283#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
2284
2285#define DECODE_BIG5(b1, b2, charset, c1, c2) \
2286 do { \
2287 unsigned int temp \
2288 = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
2289 if (b1 < 0xC9) \
2290 charset = charset_big5_1; \
2291 else \
2292 { \
2293 charset = charset_big5_2; \
2294 temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
2295 } \
2296 c1 = temp / (0xFF - 0xA1) + 0x21; \
2297 c2 = temp % (0xFF - 0xA1) + 0x21; \
2298 } while (0)
2299
2300#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
2301 do { \
2302 unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
2303 if (charset == charset_big5_2) \
2304 temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
2305 b1 = temp / BIG5_SAME_ROW + 0xA1; \
2306 b2 = temp % BIG5_SAME_ROW; \
2307 b2 += b2 < 0x3F ? 0x40 : 0x62; \
2308 } while (0)
2309
2310/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2311 Check if a text is encoded in SJIS. If it is, return
2312 CODING_CATEGORY_MASK_SJIS, else return 0. */
2313
0a28aafb
KH
2314static int
2315detect_coding_sjis (src, src_end, multibytep)
4ed46869 2316 unsigned char *src, *src_end;
0a28aafb 2317 int multibytep;
4ed46869 2318{
b73bfc1c
KH
2319 int c;
2320 /* Dummy for ONE_MORE_BYTE. */
2321 struct coding_system dummy_coding;
2322 struct coding_system *coding = &dummy_coding;
4ed46869 2323
b73bfc1c 2324 while (1)
4ed46869 2325 {
0a28aafb 2326 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fd6f711b 2327 if (c >= 0x81)
4ed46869 2328 {
fd6f711b
KH
2329 if (c <= 0x9F || (c >= 0xE0 && c <= 0xEF))
2330 {
0a28aafb 2331 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fd6f711b
KH
2332 if (c < 0x40 || c == 0x7F || c > 0xFC)
2333 return 0;
2334 }
2335 else if (c > 0xDF)
4ed46869
KH
2336 return 0;
2337 }
2338 }
b73bfc1c 2339 label_end_of_loop:
4ed46869
KH
2340 return CODING_CATEGORY_MASK_SJIS;
2341}
2342
2343/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2344 Check if a text is encoded in BIG5. If it is, return
2345 CODING_CATEGORY_MASK_BIG5, else return 0. */
2346
0a28aafb
KH
2347static int
2348detect_coding_big5 (src, src_end, multibytep)
4ed46869 2349 unsigned char *src, *src_end;
0a28aafb 2350 int multibytep;
4ed46869 2351{
b73bfc1c
KH
2352 int c;
2353 /* Dummy for ONE_MORE_BYTE. */
2354 struct coding_system dummy_coding;
2355 struct coding_system *coding = &dummy_coding;
4ed46869 2356
b73bfc1c 2357 while (1)
4ed46869 2358 {
0a28aafb 2359 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
2360 if (c >= 0xA1)
2361 {
0a28aafb 2362 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
2363 if (c < 0x40 || (c >= 0x7F && c <= 0xA0))
2364 return 0;
2365 }
2366 }
b73bfc1c 2367 label_end_of_loop:
4ed46869
KH
2368 return CODING_CATEGORY_MASK_BIG5;
2369}
2370
fa42c37f
KH
2371/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2372 Check if a text is encoded in UTF-8. If it is, return
2373 CODING_CATEGORY_MASK_UTF_8, else return 0. */
2374
2375#define UTF_8_1_OCTET_P(c) ((c) < 0x80)
2376#define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
2377#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
2378#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
2379#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
2380#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
2381#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
2382
0a28aafb
KH
2383static int
2384detect_coding_utf_8 (src, src_end, multibytep)
fa42c37f 2385 unsigned char *src, *src_end;
0a28aafb 2386 int multibytep;
fa42c37f
KH
2387{
2388 unsigned char c;
2389 int seq_maybe_bytes;
b73bfc1c
KH
2390 /* Dummy for ONE_MORE_BYTE. */
2391 struct coding_system dummy_coding;
2392 struct coding_system *coding = &dummy_coding;
fa42c37f 2393
b73bfc1c 2394 while (1)
fa42c37f 2395 {
0a28aafb 2396 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fa42c37f
KH
2397 if (UTF_8_1_OCTET_P (c))
2398 continue;
2399 else if (UTF_8_2_OCTET_LEADING_P (c))
2400 seq_maybe_bytes = 1;
2401 else if (UTF_8_3_OCTET_LEADING_P (c))
2402 seq_maybe_bytes = 2;
2403 else if (UTF_8_4_OCTET_LEADING_P (c))
2404 seq_maybe_bytes = 3;
2405 else if (UTF_8_5_OCTET_LEADING_P (c))
2406 seq_maybe_bytes = 4;
2407 else if (UTF_8_6_OCTET_LEADING_P (c))
2408 seq_maybe_bytes = 5;
2409 else
2410 return 0;
2411
2412 do
2413 {
0a28aafb 2414 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fa42c37f
KH
2415 if (!UTF_8_EXTRA_OCTET_P (c))
2416 return 0;
2417 seq_maybe_bytes--;
2418 }
2419 while (seq_maybe_bytes > 0);
2420 }
2421
b73bfc1c 2422 label_end_of_loop:
fa42c37f
KH
2423 return CODING_CATEGORY_MASK_UTF_8;
2424}
2425
2426/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2427 Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
2428 Little Endian (otherwise). If it is, return
2429 CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
2430 else return 0. */
2431
2432#define UTF_16_INVALID_P(val) \
2433 (((val) == 0xFFFE) \
2434 || ((val) == 0xFFFF))
2435
2436#define UTF_16_HIGH_SURROGATE_P(val) \
2437 (((val) & 0xD800) == 0xD800)
2438
2439#define UTF_16_LOW_SURROGATE_P(val) \
2440 (((val) & 0xDC00) == 0xDC00)
2441
0a28aafb
KH
2442static int
2443detect_coding_utf_16 (src, src_end, multibytep)
fa42c37f 2444 unsigned char *src, *src_end;
0a28aafb 2445 int multibytep;
fa42c37f 2446{
b73bfc1c
KH
2447 unsigned char c1, c2;
2448 /* Dummy for TWO_MORE_BYTES. */
2449 struct coding_system dummy_coding;
2450 struct coding_system *coding = &dummy_coding;
fa42c37f 2451
0a28aafb
KH
2452 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
2453 ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep);
b73bfc1c
KH
2454
2455 if ((c1 == 0xFF) && (c2 == 0xFE))
fa42c37f 2456 return CODING_CATEGORY_MASK_UTF_16_LE;
b73bfc1c 2457 else if ((c1 == 0xFE) && (c2 == 0xFF))
fa42c37f
KH
2458 return CODING_CATEGORY_MASK_UTF_16_BE;
2459
b73bfc1c 2460 label_end_of_loop:
fa42c37f
KH
2461 return 0;
2462}
2463
4ed46869
KH
2464/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
2465 If SJIS_P is 1, decode SJIS text, else decode BIG5 test. */
2466
b73bfc1c 2467static void
4ed46869 2468decode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2469 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2470 struct coding_system *coding;
2471 unsigned char *source, *destination;
2472 int src_bytes, dst_bytes;
4ed46869
KH
2473 int sjis_p;
2474{
2475 unsigned char *src = source;
2476 unsigned char *src_end = source + src_bytes;
2477 unsigned char *dst = destination;
2478 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2479 /* SRC_BASE remembers the start position in source in each loop.
2480 The loop will be exited when there's not enough source code
2481 (within macro ONE_MORE_BYTE), or when there's not enough
2482 destination area to produce a character (within macro
2483 EMIT_CHAR). */
2484 unsigned char *src_base;
2485 Lisp_Object translation_table;
a5d301df 2486
b73bfc1c
KH
2487 if (NILP (Venable_character_translation))
2488 translation_table = Qnil;
2489 else
2490 {
2491 translation_table = coding->translation_table_for_decode;
2492 if (NILP (translation_table))
2493 translation_table = Vstandard_translation_table_for_decode;
2494 }
4ed46869 2495
d46c5b12 2496 coding->produced_char = 0;
b73bfc1c 2497 while (1)
4ed46869 2498 {
b73bfc1c
KH
2499 int c, charset, c1, c2;
2500
2501 src_base = src;
2502 ONE_MORE_BYTE (c1);
2503
2504 if (c1 < 0x80)
4ed46869 2505 {
b73bfc1c
KH
2506 charset = CHARSET_ASCII;
2507 if (c1 < 0x20)
4ed46869 2508 {
b73bfc1c 2509 if (c1 == '\r')
d46c5b12 2510 {
b73bfc1c 2511 if (coding->eol_type == CODING_EOL_CRLF)
d46c5b12 2512 {
b73bfc1c
KH
2513 ONE_MORE_BYTE (c2);
2514 if (c2 == '\n')
2515 c1 = c2;
2516 else if (coding->mode
2517 & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2518 {
2519 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2520 goto label_end_of_loop;
2521 }
2522 else
2523 /* To process C2 again, SRC is subtracted by 1. */
2524 src--;
d46c5b12 2525 }
b73bfc1c
KH
2526 else if (coding->eol_type == CODING_EOL_CR)
2527 c1 = '\n';
2528 }
2529 else if (c1 == '\n'
2530 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2531 && (coding->eol_type == CODING_EOL_CR
2532 || coding->eol_type == CODING_EOL_CRLF))
2533 {
2534 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2535 goto label_end_of_loop;
d46c5b12 2536 }
4ed46869 2537 }
4ed46869 2538 }
54f78171 2539 else
b73bfc1c 2540 {
4ed46869
KH
2541 if (sjis_p)
2542 {
b73bfc1c
KH
2543 if (c1 >= 0xF0)
2544 goto label_invalid_code;
2545 if (c1 < 0xA0 || c1 >= 0xE0)
fb88bf2d 2546 {
54f78171
KH
2547 /* SJIS -> JISX0208 */
2548 ONE_MORE_BYTE (c2);
b73bfc1c
KH
2549 if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
2550 goto label_invalid_code;
2551 DECODE_SJIS (c1, c2, c1, c2);
2552 charset = charset_jisx0208;
5e34de15 2553 }
fb88bf2d 2554 else
b73bfc1c
KH
2555 /* SJIS -> JISX0201-Kana */
2556 charset = charset_katakana_jisx0201;
4ed46869 2557 }
fb88bf2d 2558 else
fb88bf2d 2559 {
54f78171 2560 /* BIG5 -> Big5 */
b73bfc1c
KH
2561 if (c1 < 0xA1 || c1 > 0xFE)
2562 goto label_invalid_code;
2563 ONE_MORE_BYTE (c2);
2564 if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
2565 goto label_invalid_code;
2566 DECODE_BIG5 (c1, c2, charset, c1, c2);
4ed46869
KH
2567 }
2568 }
4ed46869 2569
b73bfc1c
KH
2570 c = DECODE_ISO_CHARACTER (charset, c1, c2);
2571 EMIT_CHAR (c);
fb88bf2d
KH
2572 continue;
2573
b73bfc1c
KH
2574 label_invalid_code:
2575 coding->errors++;
4ed46869 2576 src = src_base;
b73bfc1c
KH
2577 c = *src++;
2578 EMIT_CHAR (c);
fb88bf2d 2579 }
d46c5b12 2580
b73bfc1c
KH
2581 label_end_of_loop:
2582 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 2583 coding->produced = dst - destination;
b73bfc1c 2584 return;
4ed46869
KH
2585}
2586
2587/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
b73bfc1c
KH
2588 This function can encode charsets `ascii', `katakana-jisx0201',
2589 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
2590 are sure that all these charsets are registered as official charset
4ed46869
KH
2591 (i.e. do not have extended leading-codes). Characters of other
2592 charsets are produced without any encoding. If SJIS_P is 1, encode
2593 SJIS text, else encode BIG5 text. */
2594
b73bfc1c 2595static void
4ed46869 2596encode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2597 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2598 struct coding_system *coding;
2599 unsigned char *source, *destination;
2600 int src_bytes, dst_bytes;
4ed46869
KH
2601 int sjis_p;
2602{
2603 unsigned char *src = source;
2604 unsigned char *src_end = source + src_bytes;
2605 unsigned char *dst = destination;
2606 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2607 /* SRC_BASE remembers the start position in source in each loop.
2608 The loop will be exited when there's not enough source text to
2609 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2610 there's not enough destination area to produce encoded codes
2611 (within macro EMIT_BYTES). */
2612 unsigned char *src_base;
2613 Lisp_Object translation_table;
4ed46869 2614
b73bfc1c
KH
2615 if (NILP (Venable_character_translation))
2616 translation_table = Qnil;
2617 else
4ed46869 2618 {
39658efc 2619 translation_table = coding->translation_table_for_encode;
b73bfc1c 2620 if (NILP (translation_table))
39658efc 2621 translation_table = Vstandard_translation_table_for_encode;
b73bfc1c 2622 }
a5d301df 2623
b73bfc1c
KH
2624 while (1)
2625 {
2626 int c, charset, c1, c2;
4ed46869 2627
b73bfc1c
KH
2628 src_base = src;
2629 ONE_MORE_CHAR (c);
2630
2631 /* Now encode the character C. */
2632 if (SINGLE_BYTE_CHAR_P (c))
2633 {
2634 switch (c)
4ed46869 2635 {
b73bfc1c
KH
2636 case '\r':
2637 if (!coding->mode & CODING_MODE_SELECTIVE_DISPLAY)
2638 {
2639 EMIT_ONE_BYTE (c);
2640 break;
2641 }
2642 c = '\n';
2643 case '\n':
2644 if (coding->eol_type == CODING_EOL_CRLF)
2645 {
2646 EMIT_TWO_BYTES ('\r', c);
2647 break;
2648 }
2649 else if (coding->eol_type == CODING_EOL_CR)
2650 c = '\r';
2651 default:
2652 EMIT_ONE_BYTE (c);
2653 }
2654 }
2655 else
2656 {
2657 SPLIT_CHAR (c, charset, c1, c2);
2658 if (sjis_p)
2659 {
2660 if (charset == charset_jisx0208
2661 || charset == charset_jisx0208_1978)
2662 {
2663 ENCODE_SJIS (c1, c2, c1, c2);
2664 EMIT_TWO_BYTES (c1, c2);
2665 }
39658efc
KH
2666 else if (charset == charset_katakana_jisx0201)
2667 EMIT_ONE_BYTE (c1 | 0x80);
fc53a214
KH
2668 else if (charset == charset_latin_jisx0201)
2669 EMIT_ONE_BYTE (c1);
b73bfc1c
KH
2670 else
2671 /* There's no way other than producing the internal
2672 codes as is. */
2673 EMIT_BYTES (src_base, src);
4ed46869 2674 }
4ed46869 2675 else
b73bfc1c
KH
2676 {
2677 if (charset == charset_big5_1 || charset == charset_big5_2)
2678 {
2679 ENCODE_BIG5 (charset, c1, c2, c1, c2);
2680 EMIT_TWO_BYTES (c1, c2);
2681 }
2682 else
2683 /* There's no way other than producing the internal
2684 codes as is. */
2685 EMIT_BYTES (src_base, src);
2686 }
4ed46869 2687 }
b73bfc1c 2688 coding->consumed_char++;
4ed46869
KH
2689 }
2690
b73bfc1c
KH
2691 label_end_of_loop:
2692 coding->consumed = src_base - source;
d46c5b12 2693 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2694}
2695
2696\f
1397dc18
KH
2697/*** 5. CCL handlers ***/
2698
2699/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2700 Check if a text is encoded in a coding system of which
2701 encoder/decoder are written in CCL program. If it is, return
2702 CODING_CATEGORY_MASK_CCL, else return 0. */
2703
0a28aafb
KH
2704static int
2705detect_coding_ccl (src, src_end, multibytep)
1397dc18 2706 unsigned char *src, *src_end;
0a28aafb 2707 int multibytep;
1397dc18
KH
2708{
2709 unsigned char *valid;
b73bfc1c
KH
2710 int c;
2711 /* Dummy for ONE_MORE_BYTE. */
2712 struct coding_system dummy_coding;
2713 struct coding_system *coding = &dummy_coding;
1397dc18
KH
2714
2715 /* No coding system is assigned to coding-category-ccl. */
2716 if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
2717 return 0;
2718
2719 valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
b73bfc1c 2720 while (1)
1397dc18 2721 {
0a28aafb 2722 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
2723 if (! valid[c])
2724 return 0;
1397dc18 2725 }
b73bfc1c 2726 label_end_of_loop:
1397dc18
KH
2727 return CODING_CATEGORY_MASK_CCL;
2728}
2729
2730\f
2731/*** 6. End-of-line handlers ***/
4ed46869 2732
b73bfc1c 2733/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 2734
b73bfc1c 2735static void
d46c5b12 2736decode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2737 struct coding_system *coding;
2738 unsigned char *source, *destination;
2739 int src_bytes, dst_bytes;
4ed46869
KH
2740{
2741 unsigned char *src = source;
4ed46869 2742 unsigned char *dst = destination;
b73bfc1c
KH
2743 unsigned char *src_end = src + src_bytes;
2744 unsigned char *dst_end = dst + dst_bytes;
2745 Lisp_Object translation_table;
2746 /* SRC_BASE remembers the start position in source in each loop.
2747 The loop will be exited when there's not enough source code
2748 (within macro ONE_MORE_BYTE), or when there's not enough
2749 destination area to produce a character (within macro
2750 EMIT_CHAR). */
2751 unsigned char *src_base;
2752 int c;
2753
2754 translation_table = Qnil;
4ed46869
KH
2755 switch (coding->eol_type)
2756 {
2757 case CODING_EOL_CRLF:
b73bfc1c 2758 while (1)
d46c5b12 2759 {
b73bfc1c
KH
2760 src_base = src;
2761 ONE_MORE_BYTE (c);
2762 if (c == '\r')
fb88bf2d 2763 {
b73bfc1c
KH
2764 ONE_MORE_BYTE (c);
2765 if (c != '\n')
2766 {
2767 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2768 {
2769 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2770 goto label_end_of_loop;
2771 }
2772 src--;
2773 c = '\r';
2774 }
fb88bf2d 2775 }
b73bfc1c
KH
2776 else if (c == '\n'
2777 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
d46c5b12 2778 {
b73bfc1c
KH
2779 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2780 goto label_end_of_loop;
d46c5b12 2781 }
b73bfc1c 2782 EMIT_CHAR (c);
d46c5b12 2783 }
b73bfc1c
KH
2784 break;
2785
2786 case CODING_EOL_CR:
2787 while (1)
d46c5b12 2788 {
b73bfc1c
KH
2789 src_base = src;
2790 ONE_MORE_BYTE (c);
2791 if (c == '\n')
2792 {
2793 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2794 {
2795 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2796 goto label_end_of_loop;
2797 }
2798 }
2799 else if (c == '\r')
2800 c = '\n';
2801 EMIT_CHAR (c);
d46c5b12 2802 }
4ed46869
KH
2803 break;
2804
b73bfc1c
KH
2805 default: /* no need for EOL handling */
2806 while (1)
d46c5b12 2807 {
b73bfc1c
KH
2808 src_base = src;
2809 ONE_MORE_BYTE (c);
2810 EMIT_CHAR (c);
d46c5b12 2811 }
4ed46869
KH
2812 }
2813
b73bfc1c
KH
2814 label_end_of_loop:
2815 coding->consumed = coding->consumed_char = src_base - source;
2816 coding->produced = dst - destination;
2817 return;
4ed46869
KH
2818}
2819
2820/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
b73bfc1c
KH
2821 format of end-of-line according to `coding->eol_type'. It also
2822 convert multibyte form 8-bit characers to unibyte if
2823 CODING->src_multibyte is nonzero. If `coding->mode &
2824 CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
2825 also means end-of-line. */
4ed46869 2826
b73bfc1c 2827static void
d46c5b12 2828encode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2829 struct coding_system *coding;
2830 unsigned char *source, *destination;
2831 int src_bytes, dst_bytes;
4ed46869
KH
2832{
2833 unsigned char *src = source;
2834 unsigned char *dst = destination;
b73bfc1c
KH
2835 unsigned char *src_end = src + src_bytes;
2836 unsigned char *dst_end = dst + dst_bytes;
2837 Lisp_Object translation_table;
2838 /* SRC_BASE remembers the start position in source in each loop.
2839 The loop will be exited when there's not enough source text to
2840 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2841 there's not enough destination area to produce encoded codes
2842 (within macro EMIT_BYTES). */
2843 unsigned char *src_base;
2844 int c;
2845 int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
2846
2847 translation_table = Qnil;
2848 if (coding->src_multibyte
2849 && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
2850 {
2851 src_end--;
2852 src_bytes--;
2853 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
2854 }
fb88bf2d 2855
d46c5b12
KH
2856 if (coding->eol_type == CODING_EOL_CRLF)
2857 {
b73bfc1c 2858 while (src < src_end)
d46c5b12 2859 {
b73bfc1c 2860 src_base = src;
d46c5b12 2861 c = *src++;
b73bfc1c
KH
2862 if (c >= 0x20)
2863 EMIT_ONE_BYTE (c);
2864 else if (c == '\n' || (c == '\r' && selective_display))
2865 EMIT_TWO_BYTES ('\r', '\n');
d46c5b12 2866 else
b73bfc1c 2867 EMIT_ONE_BYTE (c);
d46c5b12 2868 }
ff2b1ea9 2869 src_base = src;
b73bfc1c 2870 label_end_of_loop:
005f0d35 2871 ;
d46c5b12
KH
2872 }
2873 else
4ed46869 2874 {
78a629d2 2875 if (!dst_bytes || src_bytes <= dst_bytes)
4ed46869 2876 {
b73bfc1c
KH
2877 safe_bcopy (src, dst, src_bytes);
2878 src_base = src_end;
2879 dst += src_bytes;
d46c5b12 2880 }
d46c5b12 2881 else
b73bfc1c
KH
2882 {
2883 if (coding->src_multibyte
2884 && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
2885 dst_bytes--;
2886 safe_bcopy (src, dst, dst_bytes);
2887 src_base = src + dst_bytes;
2888 dst = destination + dst_bytes;
2889 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2890 }
993824c9 2891 if (coding->eol_type == CODING_EOL_CR)
d46c5b12 2892 {
b73bfc1c
KH
2893 for (src = destination; src < dst; src++)
2894 if (*src == '\n') *src = '\r';
d46c5b12 2895 }
b73bfc1c 2896 else if (selective_display)
d46c5b12 2897 {
b73bfc1c
KH
2898 for (src = destination; src < dst; src++)
2899 if (*src == '\r') *src = '\n';
4ed46869 2900 }
4ed46869 2901 }
b73bfc1c
KH
2902 if (coding->src_multibyte)
2903 dst = destination + str_as_unibyte (destination, dst - destination);
4ed46869 2904
b73bfc1c
KH
2905 coding->consumed = src_base - source;
2906 coding->produced = dst - destination;
78a629d2 2907 coding->produced_char = coding->produced;
4ed46869
KH
2908}
2909
2910\f
1397dc18 2911/*** 7. C library functions ***/
4ed46869
KH
2912
2913/* In Emacs Lisp, coding system is represented by a Lisp symbol which
2914 has a property `coding-system'. The value of this property is a
2915 vector of length 5 (called as coding-vector). Among elements of
2916 this vector, the first (element[0]) and the fifth (element[4])
2917 carry important information for decoding/encoding. Before
2918 decoding/encoding, this information should be set in fields of a
2919 structure of type `coding_system'.
2920
2921 A value of property `coding-system' can be a symbol of another
2922 subsidiary coding-system. In that case, Emacs gets coding-vector
2923 from that symbol.
2924
2925 `element[0]' contains information to be set in `coding->type'. The
2926 value and its meaning is as follows:
2927
0ef69138
KH
2928 0 -- coding_type_emacs_mule
2929 1 -- coding_type_sjis
2930 2 -- coding_type_iso2022
2931 3 -- coding_type_big5
2932 4 -- coding_type_ccl encoder/decoder written in CCL
2933 nil -- coding_type_no_conversion
2934 t -- coding_type_undecided (automatic conversion on decoding,
2935 no-conversion on encoding)
4ed46869
KH
2936
2937 `element[4]' contains information to be set in `coding->flags' and
2938 `coding->spec'. The meaning varies by `coding->type'.
2939
2940 If `coding->type' is `coding_type_iso2022', element[4] is a vector
2941 of length 32 (of which the first 13 sub-elements are used now).
2942 Meanings of these sub-elements are:
2943
2944 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
2945 If the value is an integer of valid charset, the charset is
2946 assumed to be designated to graphic register N initially.
2947
2948 If the value is minus, it is a minus value of charset which
2949 reserves graphic register N, which means that the charset is
2950 not designated initially but should be designated to graphic
2951 register N just before encoding a character in that charset.
2952
2953 If the value is nil, graphic register N is never used on
2954 encoding.
2955
2956 sub-element[N] where N is 4 through 11: to be set in `coding->flags'
2957 Each value takes t or nil. See the section ISO2022 of
2958 `coding.h' for more information.
2959
2960 If `coding->type' is `coding_type_big5', element[4] is t to denote
2961 BIG5-ETen or nil to denote BIG5-HKU.
2962
2963 If `coding->type' takes the other value, element[4] is ignored.
2964
2965 Emacs Lisp's coding system also carries information about format of
2966 end-of-line in a value of property `eol-type'. If the value is
2967 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
2968 means CODING_EOL_CR. If it is not integer, it should be a vector
2969 of subsidiary coding systems of which property `eol-type' has one
2970 of above values.
2971
2972*/
2973
2974/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
2975 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
2976 is setup so that no conversion is necessary and return -1, else
2977 return 0. */
2978
2979int
e0e989f6
KH
2980setup_coding_system (coding_system, coding)
2981 Lisp_Object coding_system;
4ed46869
KH
2982 struct coding_system *coding;
2983{
d46c5b12 2984 Lisp_Object coding_spec, coding_type, eol_type, plist;
4608c386 2985 Lisp_Object val;
70c22245 2986 int i;
4ed46869 2987
d46c5b12 2988 /* Initialize some fields required for all kinds of coding systems. */
774324d6 2989 coding->symbol = coding_system;
d46c5b12
KH
2990 coding->common_flags = 0;
2991 coding->mode = 0;
2992 coding->heading_ascii = -1;
2993 coding->post_read_conversion = coding->pre_write_conversion = Qnil;
ec6d2bb8
KH
2994 coding->composing = COMPOSITION_DISABLED;
2995 coding->cmp_data = NULL;
1f5dbf34
KH
2996
2997 if (NILP (coding_system))
2998 goto label_invalid_coding_system;
2999
4608c386 3000 coding_spec = Fget (coding_system, Qcoding_system);
1f5dbf34 3001
4608c386
KH
3002 if (!VECTORP (coding_spec)
3003 || XVECTOR (coding_spec)->size != 5
3004 || !CONSP (XVECTOR (coding_spec)->contents[3]))
4ed46869 3005 goto label_invalid_coding_system;
4608c386 3006
d46c5b12
KH
3007 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
3008 if (VECTORP (eol_type))
3009 {
3010 coding->eol_type = CODING_EOL_UNDECIDED;
3011 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
3012 }
3013 else if (XFASTINT (eol_type) == 1)
3014 {
3015 coding->eol_type = CODING_EOL_CRLF;
3016 coding->common_flags
3017 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
3018 }
3019 else if (XFASTINT (eol_type) == 2)
3020 {
3021 coding->eol_type = CODING_EOL_CR;
3022 coding->common_flags
3023 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
3024 }
3025 else
3026 coding->eol_type = CODING_EOL_LF;
3027
3028 coding_type = XVECTOR (coding_spec)->contents[0];
3029 /* Try short cut. */
3030 if (SYMBOLP (coding_type))
3031 {
3032 if (EQ (coding_type, Qt))
3033 {
3034 coding->type = coding_type_undecided;
3035 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
3036 }
3037 else
3038 coding->type = coding_type_no_conversion;
9b96232f
KH
3039 /* Initialize this member. Any thing other than
3040 CODING_CATEGORY_IDX_UTF_16_BE and
3041 CODING_CATEGORY_IDX_UTF_16_LE are ok because they have
3042 special treatment in detect_eol. */
3043 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
3044
d46c5b12
KH
3045 return 0;
3046 }
3047
d46c5b12
KH
3048 /* Get values of coding system properties:
3049 `post-read-conversion', `pre-write-conversion',
f967223b 3050 `translation-table-for-decode', `translation-table-for-encode'. */
4608c386 3051 plist = XVECTOR (coding_spec)->contents[3];
b843d1ae
KH
3052 /* Pre & post conversion functions should be disabled if
3053 inhibit_eol_conversion is nozero. This is the case that a code
3054 conversion function is called while those functions are running. */
3055 if (! inhibit_pre_post_conversion)
3056 {
3057 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
3058 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
3059 }
f967223b 3060 val = Fplist_get (plist, Qtranslation_table_for_decode);
4608c386 3061 if (SYMBOLP (val))
f967223b
KH
3062 val = Fget (val, Qtranslation_table_for_decode);
3063 coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
3064 val = Fplist_get (plist, Qtranslation_table_for_encode);
4608c386 3065 if (SYMBOLP (val))
f967223b
KH
3066 val = Fget (val, Qtranslation_table_for_encode);
3067 coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
d46c5b12
KH
3068 val = Fplist_get (plist, Qcoding_category);
3069 if (!NILP (val))
3070 {
3071 val = Fget (val, Qcoding_category_index);
3072 if (INTEGERP (val))
3073 coding->category_idx = XINT (val);
3074 else
3075 goto label_invalid_coding_system;
3076 }
3077 else
3078 goto label_invalid_coding_system;
4608c386 3079
ec6d2bb8
KH
3080 /* If the coding system has non-nil `composition' property, enable
3081 composition handling. */
3082 val = Fplist_get (plist, Qcomposition);
3083 if (!NILP (val))
3084 coding->composing = COMPOSITION_NO;
3085
d46c5b12 3086 switch (XFASTINT (coding_type))
4ed46869
KH
3087 {
3088 case 0:
0ef69138 3089 coding->type = coding_type_emacs_mule;
c952af22
KH
3090 if (!NILP (coding->post_read_conversion))
3091 coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
3092 if (!NILP (coding->pre_write_conversion))
3093 coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3094 break;
3095
3096 case 1:
3097 coding->type = coding_type_sjis;
c952af22
KH
3098 coding->common_flags
3099 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3100 break;
3101
3102 case 2:
3103 coding->type = coding_type_iso2022;
c952af22
KH
3104 coding->common_flags
3105 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3106 {
70c22245 3107 Lisp_Object val, temp;
4ed46869 3108 Lisp_Object *flags;
d46c5b12 3109 int i, charset, reg_bits = 0;
4ed46869 3110
4608c386 3111 val = XVECTOR (coding_spec)->contents[4];
f44d27ce 3112
4ed46869
KH
3113 if (!VECTORP (val) || XVECTOR (val)->size != 32)
3114 goto label_invalid_coding_system;
3115
3116 flags = XVECTOR (val)->contents;
3117 coding->flags
3118 = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
3119 | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
3120 | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
3121 | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
3122 | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
3123 | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
3124 | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
3125 | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
e0e989f6
KH
3126 | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
3127 | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
c4825358
KH
3128 | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3129 | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
3f003981 3130 | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
c4825358 3131 );
4ed46869
KH
3132
3133 /* Invoke graphic register 0 to plane 0. */
3134 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
3135 /* Invoke graphic register 1 to plane 1 if we can use full 8-bit. */
3136 CODING_SPEC_ISO_INVOCATION (coding, 1)
3137 = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
3138 /* Not single shifting at first. */
6e85d753 3139 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
e0e989f6 3140 /* Beginning of buffer should also be regarded as bol. */
6e85d753 3141 CODING_SPEC_ISO_BOL (coding) = 1;
4ed46869 3142
70c22245
KH
3143 for (charset = 0; charset <= MAX_CHARSET; charset++)
3144 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
3145 val = Vcharset_revision_alist;
3146 while (CONSP (val))
3147 {
03699b14 3148 charset = get_charset_id (Fcar_safe (XCAR (val)));
70c22245 3149 if (charset >= 0
03699b14 3150 && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
70c22245
KH
3151 && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
3152 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
03699b14 3153 val = XCDR (val);
70c22245
KH
3154 }
3155
4ed46869
KH
3156 /* Checks FLAGS[REG] (REG = 0, 1, 2 3) and decide designations.
3157 FLAGS[REG] can be one of below:
3158 integer CHARSET: CHARSET occupies register I,
3159 t: designate nothing to REG initially, but can be used
3160 by any charsets,
3161 list of integer, nil, or t: designate the first
3162 element (if integer) to REG initially, the remaining
3163 elements (if integer) is designated to REG on request,
d46c5b12 3164 if an element is t, REG can be used by any charsets,
4ed46869 3165 nil: REG is never used. */
467e7675 3166 for (charset = 0; charset <= MAX_CHARSET; charset++)
1ba9e4ab
KH
3167 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3168 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
4ed46869
KH
3169 for (i = 0; i < 4; i++)
3170 {
3171 if (INTEGERP (flags[i])
e0e989f6
KH
3172 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset))
3173 || (charset = get_charset_id (flags[i])) >= 0)
4ed46869
KH
3174 {
3175 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3176 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
3177 }
3178 else if (EQ (flags[i], Qt))
3179 {
3180 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
d46c5b12
KH
3181 reg_bits |= 1 << i;
3182 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
4ed46869
KH
3183 }
3184 else if (CONSP (flags[i]))
3185 {
84d60297
RS
3186 Lisp_Object tail;
3187 tail = flags[i];
4ed46869 3188
d46c5b12 3189 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
03699b14
KR
3190 if (INTEGERP (XCAR (tail))
3191 && (charset = XINT (XCAR (tail)),
e0e989f6 3192 CHARSET_VALID_P (charset))
03699b14 3193 || (charset = get_charset_id (XCAR (tail))) >= 0)
4ed46869
KH
3194 {
3195 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3196 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) =i;
3197 }
3198 else
3199 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
03699b14 3200 tail = XCDR (tail);
4ed46869
KH
3201 while (CONSP (tail))
3202 {
03699b14
KR
3203 if (INTEGERP (XCAR (tail))
3204 && (charset = XINT (XCAR (tail)),
e0e989f6 3205 CHARSET_VALID_P (charset))
03699b14 3206 || (charset = get_charset_id (XCAR (tail))) >= 0)
70c22245
KH
3207 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3208 = i;
03699b14 3209 else if (EQ (XCAR (tail), Qt))
d46c5b12 3210 reg_bits |= 1 << i;
03699b14 3211 tail = XCDR (tail);
4ed46869
KH
3212 }
3213 }
3214 else
3215 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
3216
3217 CODING_SPEC_ISO_DESIGNATION (coding, i)
3218 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
3219 }
3220
d46c5b12 3221 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
4ed46869
KH
3222 {
3223 /* REG 1 can be used only by locking shift in 7-bit env. */
3224 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
d46c5b12 3225 reg_bits &= ~2;
4ed46869
KH
3226 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
3227 /* Without any shifting, only REG 0 and 1 can be used. */
d46c5b12 3228 reg_bits &= 3;
4ed46869
KH
3229 }
3230
d46c5b12
KH
3231 if (reg_bits)
3232 for (charset = 0; charset <= MAX_CHARSET; charset++)
6e85d753 3233 {
96148065
KH
3234 if (CHARSET_VALID_P (charset)
3235 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3236 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
3237 {
3238 /* There exist some default graphic registers to be
96148065 3239 used by CHARSET. */
d46c5b12
KH
3240
3241 /* We had better avoid designating a charset of
3242 CHARS96 to REG 0 as far as possible. */
3243 if (CHARSET_CHARS (charset) == 96)
3244 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3245 = (reg_bits & 2
3246 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
3247 else
3248 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3249 = (reg_bits & 1
3250 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
3251 }
6e85d753 3252 }
4ed46869 3253 }
c952af22 3254 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
d46c5b12 3255 coding->spec.iso2022.last_invalid_designation_register = -1;
4ed46869
KH
3256 break;
3257
3258 case 3:
3259 coding->type = coding_type_big5;
c952af22
KH
3260 coding->common_flags
3261 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3262 coding->flags
4608c386 3263 = (NILP (XVECTOR (coding_spec)->contents[4])
4ed46869
KH
3264 ? CODING_FLAG_BIG5_HKU
3265 : CODING_FLAG_BIG5_ETEN);
3266 break;
3267
3268 case 4:
3269 coding->type = coding_type_ccl;
c952af22
KH
3270 coding->common_flags
3271 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3272 {
84d60297 3273 val = XVECTOR (coding_spec)->contents[4];
ef4ced28
KH
3274 if (! CONSP (val)
3275 || setup_ccl_program (&(coding->spec.ccl.decoder),
03699b14 3276 XCAR (val)) < 0
ef4ced28 3277 || setup_ccl_program (&(coding->spec.ccl.encoder),
03699b14 3278 XCDR (val)) < 0)
4ed46869 3279 goto label_invalid_coding_system;
1397dc18
KH
3280
3281 bzero (coding->spec.ccl.valid_codes, 256);
3282 val = Fplist_get (plist, Qvalid_codes);
3283 if (CONSP (val))
3284 {
3285 Lisp_Object this;
3286
03699b14 3287 for (; CONSP (val); val = XCDR (val))
1397dc18 3288 {
03699b14 3289 this = XCAR (val);
1397dc18
KH
3290 if (INTEGERP (this)
3291 && XINT (this) >= 0 && XINT (this) < 256)
3292 coding->spec.ccl.valid_codes[XINT (this)] = 1;
3293 else if (CONSP (this)
03699b14
KR
3294 && INTEGERP (XCAR (this))
3295 && INTEGERP (XCDR (this)))
1397dc18 3296 {
03699b14
KR
3297 int start = XINT (XCAR (this));
3298 int end = XINT (XCDR (this));
1397dc18
KH
3299
3300 if (start >= 0 && start <= end && end < 256)
e133c8fa 3301 while (start <= end)
1397dc18
KH
3302 coding->spec.ccl.valid_codes[start++] = 1;
3303 }
3304 }
3305 }
4ed46869 3306 }
c952af22 3307 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
aaaf0b1e 3308 coding->spec.ccl.cr_carryover = 0;
4ed46869
KH
3309 break;
3310
27901516
KH
3311 case 5:
3312 coding->type = coding_type_raw_text;
3313 break;
3314
4ed46869 3315 default:
d46c5b12 3316 goto label_invalid_coding_system;
4ed46869
KH
3317 }
3318 return 0;
3319
3320 label_invalid_coding_system:
3321 coding->type = coding_type_no_conversion;
d46c5b12 3322 coding->category_idx = CODING_CATEGORY_IDX_BINARY;
c952af22 3323 coding->common_flags = 0;
dec137e5 3324 coding->eol_type = CODING_EOL_LF;
d46c5b12 3325 coding->pre_write_conversion = coding->post_read_conversion = Qnil;
4ed46869
KH
3326 return -1;
3327}
3328
ec6d2bb8
KH
3329/* Free memory blocks allocated for storing composition information. */
3330
3331void
3332coding_free_composition_data (coding)
3333 struct coding_system *coding;
3334{
3335 struct composition_data *cmp_data = coding->cmp_data, *next;
3336
3337 if (!cmp_data)
3338 return;
3339 /* Memory blocks are chained. At first, rewind to the first, then,
3340 free blocks one by one. */
3341 while (cmp_data->prev)
3342 cmp_data = cmp_data->prev;
3343 while (cmp_data)
3344 {
3345 next = cmp_data->next;
3346 xfree (cmp_data);
3347 cmp_data = next;
3348 }
3349 coding->cmp_data = NULL;
3350}
3351
3352/* Set `char_offset' member of all memory blocks pointed by
3353 coding->cmp_data to POS. */
3354
3355void
3356coding_adjust_composition_offset (coding, pos)
3357 struct coding_system *coding;
3358 int pos;
3359{
3360 struct composition_data *cmp_data;
3361
3362 for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
3363 cmp_data->char_offset = pos;
3364}
3365
54f78171
KH
3366/* Setup raw-text or one of its subsidiaries in the structure
3367 coding_system CODING according to the already setup value eol_type
3368 in CODING. CODING should be setup for some coding system in
3369 advance. */
3370
3371void
3372setup_raw_text_coding_system (coding)
3373 struct coding_system *coding;
3374{
3375 if (coding->type != coding_type_raw_text)
3376 {
3377 coding->symbol = Qraw_text;
3378 coding->type = coding_type_raw_text;
3379 if (coding->eol_type != CODING_EOL_UNDECIDED)
3380 {
84d60297
RS
3381 Lisp_Object subsidiaries;
3382 subsidiaries = Fget (Qraw_text, Qeol_type);
54f78171
KH
3383
3384 if (VECTORP (subsidiaries)
3385 && XVECTOR (subsidiaries)->size == 3)
3386 coding->symbol
3387 = XVECTOR (subsidiaries)->contents[coding->eol_type];
3388 }
716e0b0a 3389 setup_coding_system (coding->symbol, coding);
54f78171
KH
3390 }
3391 return;
3392}
3393
4ed46869
KH
3394/* Emacs has a mechanism to automatically detect a coding system if it
3395 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But,
3396 it's impossible to distinguish some coding systems accurately
3397 because they use the same range of codes. So, at first, coding
3398 systems are categorized into 7, those are:
3399
0ef69138 3400 o coding-category-emacs-mule
4ed46869
KH
3401
3402 The category for a coding system which has the same code range
3403 as Emacs' internal format. Assigned the coding-system (Lisp
0ef69138 3404 symbol) `emacs-mule' by default.
4ed46869
KH
3405
3406 o coding-category-sjis
3407
3408 The category for a coding system which has the same code range
3409 as SJIS. Assigned the coding-system (Lisp
7717c392 3410 symbol) `japanese-shift-jis' by default.
4ed46869
KH
3411
3412 o coding-category-iso-7
3413
3414 The category for a coding system which has the same code range
7717c392 3415 as ISO2022 of 7-bit environment. This doesn't use any locking
d46c5b12
KH
3416 shift and single shift functions. This can encode/decode all
3417 charsets. Assigned the coding-system (Lisp symbol)
3418 `iso-2022-7bit' by default.
3419
3420 o coding-category-iso-7-tight
3421
3422 Same as coding-category-iso-7 except that this can
3423 encode/decode only the specified charsets.
4ed46869
KH
3424
3425 o coding-category-iso-8-1
3426
3427 The category for a coding system which has the same code range
3428 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3429 for DIMENSION1 charset. This doesn't use any locking shift
3430 and single shift functions. Assigned the coding-system (Lisp
3431 symbol) `iso-latin-1' by default.
4ed46869
KH
3432
3433 o coding-category-iso-8-2
3434
3435 The category for a coding system which has the same code range
3436 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3437 for DIMENSION2 charset. This doesn't use any locking shift
3438 and single shift functions. Assigned the coding-system (Lisp
3439 symbol) `japanese-iso-8bit' by default.
4ed46869 3440
7717c392 3441 o coding-category-iso-7-else
4ed46869
KH
3442
3443 The category for a coding system which has the same code range
7717c392
KH
3444 as ISO2022 of 7-bit environemnt but uses locking shift or
3445 single shift functions. Assigned the coding-system (Lisp
3446 symbol) `iso-2022-7bit-lock' by default.
3447
3448 o coding-category-iso-8-else
3449
3450 The category for a coding system which has the same code range
3451 as ISO2022 of 8-bit environemnt but uses locking shift or
3452 single shift functions. Assigned the coding-system (Lisp
3453 symbol) `iso-2022-8bit-ss2' by default.
4ed46869
KH
3454
3455 o coding-category-big5
3456
3457 The category for a coding system which has the same code range
3458 as BIG5. Assigned the coding-system (Lisp symbol)
e0e989f6 3459 `cn-big5' by default.
4ed46869 3460
fa42c37f
KH
3461 o coding-category-utf-8
3462
3463 The category for a coding system which has the same code range
3464 as UTF-8 (cf. RFC2279). Assigned the coding-system (Lisp
3465 symbol) `utf-8' by default.
3466
3467 o coding-category-utf-16-be
3468
3469 The category for a coding system in which a text has an
3470 Unicode signature (cf. Unicode Standard) in the order of BIG
3471 endian at the head. Assigned the coding-system (Lisp symbol)
3472 `utf-16-be' by default.
3473
3474 o coding-category-utf-16-le
3475
3476 The category for a coding system in which a text has an
3477 Unicode signature (cf. Unicode Standard) in the order of
3478 LITTLE endian at the head. Assigned the coding-system (Lisp
3479 symbol) `utf-16-le' by default.
3480
1397dc18
KH
3481 o coding-category-ccl
3482
3483 The category for a coding system of which encoder/decoder is
3484 written in CCL programs. The default value is nil, i.e., no
3485 coding system is assigned.
3486
4ed46869
KH
3487 o coding-category-binary
3488
3489 The category for a coding system not categorized in any of the
3490 above. Assigned the coding-system (Lisp symbol)
e0e989f6 3491 `no-conversion' by default.
4ed46869
KH
3492
3493 Each of them is a Lisp symbol and the value is an actual
3494 `coding-system's (this is also a Lisp symbol) assigned by a user.
3495 What Emacs does actually is to detect a category of coding system.
3496 Then, it uses a `coding-system' assigned to it. If Emacs can't
3497 decide only one possible category, it selects a category of the
3498 highest priority. Priorities of categories are also specified by a
3499 user in a Lisp variable `coding-category-list'.
3500
3501*/
3502
66cfb530
KH
3503static
3504int ascii_skip_code[256];
3505
d46c5b12 3506/* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded.
4ed46869
KH
3507 If it detects possible coding systems, return an integer in which
3508 appropriate flag bits are set. Flag bits are defined by macros
fa42c37f
KH
3509 CODING_CATEGORY_MASK_XXX in `coding.h'. If PRIORITIES is non-NULL,
3510 it should point the table `coding_priorities'. In that case, only
3511 the flag bit for a coding system of the highest priority is set in
0a28aafb
KH
3512 the returned value. If MULTIBYTEP is nonzero, 8-bit codes of the
3513 range 0x80..0x9F are in multibyte form.
4ed46869 3514
d46c5b12
KH
3515 How many ASCII characters are at the head is returned as *SKIP. */
3516
3517static int
0a28aafb 3518detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
d46c5b12
KH
3519 unsigned char *source;
3520 int src_bytes, *priorities, *skip;
0a28aafb 3521 int multibytep;
4ed46869
KH
3522{
3523 register unsigned char c;
d46c5b12 3524 unsigned char *src = source, *src_end = source + src_bytes;
fa42c37f
KH
3525 unsigned int mask, utf16_examined_p, iso2022_examined_p;
3526 int i, idx;
4ed46869
KH
3527
3528 /* At first, skip all ASCII characters and control characters except
3529 for three ISO2022 specific control characters. */
66cfb530
KH
3530 ascii_skip_code[ISO_CODE_SO] = 0;
3531 ascii_skip_code[ISO_CODE_SI] = 0;
3532 ascii_skip_code[ISO_CODE_ESC] = 0;
3533
bcf26d6a 3534 label_loop_detect_coding:
66cfb530 3535 while (src < src_end && ascii_skip_code[*src]) src++;
d46c5b12 3536 *skip = src - source;
4ed46869
KH
3537
3538 if (src >= src_end)
3539 /* We found nothing other than ASCII. There's nothing to do. */
d46c5b12 3540 return 0;
4ed46869 3541
8a8147d6 3542 c = *src;
4ed46869
KH
3543 /* The text seems to be encoded in some multilingual coding system.
3544 Now, try to find in which coding system the text is encoded. */
3545 if (c < 0x80)
bcf26d6a
KH
3546 {
3547 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
3548 /* C is an ISO2022 specific control code of C0. */
0a28aafb 3549 mask = detect_coding_iso2022 (src, src_end, multibytep);
1b2af4b0 3550 if (mask == 0)
d46c5b12
KH
3551 {
3552 /* No valid ISO2022 code follows C. Try again. */
3553 src++;
66cfb530
KH
3554 if (c == ISO_CODE_ESC)
3555 ascii_skip_code[ISO_CODE_ESC] = 1;
3556 else
3557 ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
d46c5b12
KH
3558 goto label_loop_detect_coding;
3559 }
3560 if (priorities)
fa42c37f
KH
3561 {
3562 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3563 {
3564 if (mask & priorities[i])
3565 return priorities[i];
3566 }
3567 return CODING_CATEGORY_MASK_RAW_TEXT;
3568 }
bcf26d6a 3569 }
d46c5b12 3570 else
c4825358 3571 {
d46c5b12 3572 int try;
4ed46869 3573
0a28aafb
KH
3574 if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)
3575 c = *src++ - 0x20;
3576
d46c5b12
KH
3577 if (c < 0xA0)
3578 {
3579 /* C is the first byte of SJIS character code,
fa42c37f
KH
3580 or a leading-code of Emacs' internal format (emacs-mule),
3581 or the first byte of UTF-16. */
3582 try = (CODING_CATEGORY_MASK_SJIS
3583 | CODING_CATEGORY_MASK_EMACS_MULE
3584 | CODING_CATEGORY_MASK_UTF_16_BE
3585 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12
KH
3586
3587 /* Or, if C is a special latin extra code,
3588 or is an ISO2022 specific control code of C1 (SS2 or SS3),
3589 or is an ISO2022 control-sequence-introducer (CSI),
3590 we should also consider the possibility of ISO2022 codings. */
3591 if ((VECTORP (Vlatin_extra_code_table)
3592 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
3593 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
3594 || (c == ISO_CODE_CSI
3595 && (src < src_end
3596 && (*src == ']'
3597 || ((*src == '0' || *src == '1' || *src == '2')
3598 && src + 1 < src_end
3599 && src[1] == ']')))))
3600 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
3601 | CODING_CATEGORY_MASK_ISO_8BIT);
3602 }
c4825358 3603 else
d46c5b12
KH
3604 /* C is a character of ISO2022 in graphic plane right,
3605 or a SJIS's 1-byte character code (i.e. JISX0201),
fa42c37f
KH
3606 or the first byte of BIG5's 2-byte code,
3607 or the first byte of UTF-8/16. */
d46c5b12
KH
3608 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
3609 | CODING_CATEGORY_MASK_ISO_8BIT
3610 | CODING_CATEGORY_MASK_SJIS
fa42c37f
KH
3611 | CODING_CATEGORY_MASK_BIG5
3612 | CODING_CATEGORY_MASK_UTF_8
3613 | CODING_CATEGORY_MASK_UTF_16_BE
3614 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12 3615
1397dc18
KH
3616 /* Or, we may have to consider the possibility of CCL. */
3617 if (coding_system_table[CODING_CATEGORY_IDX_CCL]
3618 && (coding_system_table[CODING_CATEGORY_IDX_CCL]
3619 ->spec.ccl.valid_codes)[c])
3620 try |= CODING_CATEGORY_MASK_CCL;
3621
d46c5b12 3622 mask = 0;
fa42c37f 3623 utf16_examined_p = iso2022_examined_p = 0;
d46c5b12
KH
3624 if (priorities)
3625 {
3626 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3627 {
fa42c37f
KH
3628 if (!iso2022_examined_p
3629 && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
3630 {
3631 mask |= detect_coding_iso2022 (src, src_end);
3632 iso2022_examined_p = 1;
3633 }
5ab13dd0 3634 else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
0a28aafb 3635 mask |= detect_coding_sjis (src, src_end, multibytep);
fa42c37f 3636 else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
0a28aafb 3637 mask |= detect_coding_utf_8 (src, src_end, multibytep);
fa42c37f
KH
3638 else if (!utf16_examined_p
3639 && (priorities[i] & try &
3640 CODING_CATEGORY_MASK_UTF_16_BE_LE))
3641 {
0a28aafb 3642 mask |= detect_coding_utf_16 (src, src_end, multibytep);
fa42c37f
KH
3643 utf16_examined_p = 1;
3644 }
5ab13dd0 3645 else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
0a28aafb 3646 mask |= detect_coding_big5 (src, src_end, multibytep);
5ab13dd0 3647 else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
0a28aafb 3648 mask |= detect_coding_emacs_mule (src, src_end, multibytep);
89fa8b36 3649 else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
0a28aafb 3650 mask |= detect_coding_ccl (src, src_end, multibytep);
5ab13dd0 3651 else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
fa42c37f 3652 mask |= CODING_CATEGORY_MASK_RAW_TEXT;
5ab13dd0 3653 else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
fa42c37f
KH
3654 mask |= CODING_CATEGORY_MASK_BINARY;
3655 if (mask & priorities[i])
3656 return priorities[i];
d46c5b12
KH
3657 }
3658 return CODING_CATEGORY_MASK_RAW_TEXT;
3659 }
3660 if (try & CODING_CATEGORY_MASK_ISO)
0a28aafb 3661 mask |= detect_coding_iso2022 (src, src_end, multibytep);
d46c5b12 3662 if (try & CODING_CATEGORY_MASK_SJIS)
0a28aafb 3663 mask |= detect_coding_sjis (src, src_end, multibytep);
d46c5b12 3664 if (try & CODING_CATEGORY_MASK_BIG5)
0a28aafb 3665 mask |= detect_coding_big5 (src, src_end, multibytep);
fa42c37f 3666 if (try & CODING_CATEGORY_MASK_UTF_8)
0a28aafb 3667 mask |= detect_coding_utf_8 (src, src_end, multibytep);
fa42c37f 3668 if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
0a28aafb 3669 mask |= detect_coding_utf_16 (src, src_end, multibytep);
d46c5b12 3670 if (try & CODING_CATEGORY_MASK_EMACS_MULE)
0a28aafb 3671 mask |= detect_coding_emacs_mule (src, src_end, multibytep);
1397dc18 3672 if (try & CODING_CATEGORY_MASK_CCL)
0a28aafb 3673 mask |= detect_coding_ccl (src, src_end, multibytep);
c4825358 3674 }
5ab13dd0 3675 return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
4ed46869
KH
3676}
3677
3678/* Detect how a text of length SRC_BYTES pointed by SRC is encoded.
3679 The information of the detected coding system is set in CODING. */
3680
3681void
3682detect_coding (coding, src, src_bytes)
3683 struct coding_system *coding;
3684 unsigned char *src;
3685 int src_bytes;
3686{
d46c5b12
KH
3687 unsigned int idx;
3688 int skip, mask, i;
84d60297 3689 Lisp_Object val;
4ed46869 3690
84d60297 3691 val = Vcoding_category_list;
0a28aafb 3692 mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip, 0);
d46c5b12 3693 coding->heading_ascii = skip;
4ed46869 3694
d46c5b12
KH
3695 if (!mask) return;
3696
3697 /* We found a single coding system of the highest priority in MASK. */
3698 idx = 0;
3699 while (mask && ! (mask & 1)) mask >>= 1, idx++;
3700 if (! mask)
3701 idx = CODING_CATEGORY_IDX_RAW_TEXT;
4ed46869 3702
d46c5b12
KH
3703 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value;
3704
3705 if (coding->eol_type != CODING_EOL_UNDECIDED)
27901516 3706 {
84d60297 3707 Lisp_Object tmp;
d46c5b12 3708
84d60297 3709 tmp = Fget (val, Qeol_type);
d46c5b12
KH
3710 if (VECTORP (tmp))
3711 val = XVECTOR (tmp)->contents[coding->eol_type];
4ed46869 3712 }
b73bfc1c
KH
3713
3714 /* Setup this new coding system while preserving some slots. */
3715 {
3716 int src_multibyte = coding->src_multibyte;
3717 int dst_multibyte = coding->dst_multibyte;
3718
3719 setup_coding_system (val, coding);
3720 coding->src_multibyte = src_multibyte;
3721 coding->dst_multibyte = dst_multibyte;
3722 coding->heading_ascii = skip;
3723 }
4ed46869
KH
3724}
3725
d46c5b12
KH
3726/* Detect how end-of-line of a text of length SRC_BYTES pointed by
3727 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
3728 CODING_EOL_CR, and CODING_EOL_UNDECIDED.
3729
3730 How many non-eol characters are at the head is returned as *SKIP. */
4ed46869 3731
bc4bc72a
RS
3732#define MAX_EOL_CHECK_COUNT 3
3733
d46c5b12
KH
3734static int
3735detect_eol_type (source, src_bytes, skip)
3736 unsigned char *source;
3737 int src_bytes, *skip;
4ed46869 3738{
d46c5b12 3739 unsigned char *src = source, *src_end = src + src_bytes;
4ed46869 3740 unsigned char c;
bc4bc72a
RS
3741 int total = 0; /* How many end-of-lines are found so far. */
3742 int eol_type = CODING_EOL_UNDECIDED;
3743 int this_eol_type;
4ed46869 3744
d46c5b12
KH
3745 *skip = 0;
3746
bc4bc72a 3747 while (src < src_end && total < MAX_EOL_CHECK_COUNT)
4ed46869
KH
3748 {
3749 c = *src++;
bc4bc72a 3750 if (c == '\n' || c == '\r')
4ed46869 3751 {
d46c5b12
KH
3752 if (*skip == 0)
3753 *skip = src - 1 - source;
bc4bc72a
RS
3754 total++;
3755 if (c == '\n')
3756 this_eol_type = CODING_EOL_LF;
3757 else if (src >= src_end || *src != '\n')
3758 this_eol_type = CODING_EOL_CR;
4ed46869 3759 else
bc4bc72a
RS
3760 this_eol_type = CODING_EOL_CRLF, src++;
3761
3762 if (eol_type == CODING_EOL_UNDECIDED)
3763 /* This is the first end-of-line. */
3764 eol_type = this_eol_type;
3765 else if (eol_type != this_eol_type)
d46c5b12
KH
3766 {
3767 /* The found type is different from what found before. */
3768 eol_type = CODING_EOL_INCONSISTENT;
3769 break;
3770 }
4ed46869
KH
3771 }
3772 }
bc4bc72a 3773
d46c5b12
KH
3774 if (*skip == 0)
3775 *skip = src_end - source;
85a02ca4 3776 return eol_type;
4ed46869
KH
3777}
3778
fa42c37f
KH
3779/* Like detect_eol_type, but detect EOL type in 2-octet
3780 big-endian/little-endian format for coding systems utf-16-be and
3781 utf-16-le. */
3782
3783static int
3784detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
3785 unsigned char *source;
3786 int src_bytes, *skip;
3787{
3788 unsigned char *src = source, *src_end = src + src_bytes;
3789 unsigned int c1, c2;
3790 int total = 0; /* How many end-of-lines are found so far. */
3791 int eol_type = CODING_EOL_UNDECIDED;
3792 int this_eol_type;
3793 int msb, lsb;
3794
3795 if (big_endian_p)
3796 msb = 0, lsb = 1;
3797 else
3798 msb = 1, lsb = 0;
3799
3800 *skip = 0;
3801
3802 while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
3803 {
3804 c1 = (src[msb] << 8) | (src[lsb]);
3805 src += 2;
3806
3807 if (c1 == '\n' || c1 == '\r')
3808 {
3809 if (*skip == 0)
3810 *skip = src - 2 - source;
3811 total++;
3812 if (c1 == '\n')
3813 {
3814 this_eol_type = CODING_EOL_LF;
3815 }
3816 else
3817 {
3818 if ((src + 1) >= src_end)
3819 {
3820 this_eol_type = CODING_EOL_CR;
3821 }
3822 else
3823 {
3824 c2 = (src[msb] << 8) | (src[lsb]);
3825 if (c2 == '\n')
3826 this_eol_type = CODING_EOL_CRLF, src += 2;
3827 else
3828 this_eol_type = CODING_EOL_CR;
3829 }
3830 }
3831
3832 if (eol_type == CODING_EOL_UNDECIDED)
3833 /* This is the first end-of-line. */
3834 eol_type = this_eol_type;
3835 else if (eol_type != this_eol_type)
3836 {
3837 /* The found type is different from what found before. */
3838 eol_type = CODING_EOL_INCONSISTENT;
3839 break;
3840 }
3841 }
3842 }
3843
3844 if (*skip == 0)
3845 *skip = src_end - source;
3846 return eol_type;
3847}
3848
4ed46869
KH
3849/* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
3850 is encoded. If it detects an appropriate format of end-of-line, it
3851 sets the information in *CODING. */
3852
3853void
3854detect_eol (coding, src, src_bytes)
3855 struct coding_system *coding;
3856 unsigned char *src;
3857 int src_bytes;
3858{
4608c386 3859 Lisp_Object val;
d46c5b12 3860 int skip;
fa42c37f
KH
3861 int eol_type;
3862
3863 switch (coding->category_idx)
3864 {
3865 case CODING_CATEGORY_IDX_UTF_16_BE:
3866 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
3867 break;
3868 case CODING_CATEGORY_IDX_UTF_16_LE:
3869 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
3870 break;
3871 default:
3872 eol_type = detect_eol_type (src, src_bytes, &skip);
3873 break;
3874 }
d46c5b12
KH
3875
3876 if (coding->heading_ascii > skip)
3877 coding->heading_ascii = skip;
3878 else
3879 skip = coding->heading_ascii;
4ed46869 3880
0ef69138 3881 if (eol_type == CODING_EOL_UNDECIDED)
4ed46869 3882 return;
27901516
KH
3883 if (eol_type == CODING_EOL_INCONSISTENT)
3884 {
3885#if 0
3886 /* This code is suppressed until we find a better way to
992f23f2 3887 distinguish raw text file and binary file. */
27901516
KH
3888
3889 /* If we have already detected that the coding is raw-text, the
3890 coding should actually be no-conversion. */
3891 if (coding->type == coding_type_raw_text)
3892 {
3893 setup_coding_system (Qno_conversion, coding);
3894 return;
3895 }
3896 /* Else, let's decode only text code anyway. */
3897#endif /* 0 */
1b2af4b0 3898 eol_type = CODING_EOL_LF;
27901516
KH
3899 }
3900
4608c386 3901 val = Fget (coding->symbol, Qeol_type);
4ed46869 3902 if (VECTORP (val) && XVECTOR (val)->size == 3)
d46c5b12 3903 {
b73bfc1c
KH
3904 int src_multibyte = coding->src_multibyte;
3905 int dst_multibyte = coding->dst_multibyte;
3906
d46c5b12 3907 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
b73bfc1c
KH
3908 coding->src_multibyte = src_multibyte;
3909 coding->dst_multibyte = dst_multibyte;
d46c5b12
KH
3910 coding->heading_ascii = skip;
3911 }
3912}
3913
3914#define CONVERSION_BUFFER_EXTRA_ROOM 256
3915
b73bfc1c
KH
3916#define DECODING_BUFFER_MAG(coding) \
3917 (coding->type == coding_type_iso2022 \
3918 ? 3 \
3919 : (coding->type == coding_type_ccl \
3920 ? coding->spec.ccl.decoder.buf_magnification \
3921 : 2))
d46c5b12
KH
3922
3923/* Return maximum size (bytes) of a buffer enough for decoding
3924 SRC_BYTES of text encoded in CODING. */
3925
3926int
3927decoding_buffer_size (coding, src_bytes)
3928 struct coding_system *coding;
3929 int src_bytes;
3930{
3931 return (src_bytes * DECODING_BUFFER_MAG (coding)
3932 + CONVERSION_BUFFER_EXTRA_ROOM);
3933}
3934
3935/* Return maximum size (bytes) of a buffer enough for encoding
3936 SRC_BYTES of text to CODING. */
3937
3938int
3939encoding_buffer_size (coding, src_bytes)
3940 struct coding_system *coding;
3941 int src_bytes;
3942{
3943 int magnification;
3944
3945 if (coding->type == coding_type_ccl)
3946 magnification = coding->spec.ccl.encoder.buf_magnification;
b73bfc1c 3947 else if (CODING_REQUIRE_ENCODING (coding))
d46c5b12 3948 magnification = 3;
b73bfc1c
KH
3949 else
3950 magnification = 1;
d46c5b12
KH
3951
3952 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
3953}
3954
73be902c
KH
3955/* Working buffer for code conversion. */
3956struct conversion_buffer
3957{
3958 int size; /* size of data. */
3959 int on_stack; /* 1 if allocated by alloca. */
3960 unsigned char *data;
3961};
d46c5b12 3962
73be902c
KH
3963/* Don't use alloca for allocating memory space larger than this, lest
3964 we overflow their stack. */
3965#define MAX_ALLOCA 16*1024
d46c5b12 3966
73be902c
KH
3967/* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
3968#define allocate_conversion_buffer(buf, len) \
3969 do { \
3970 if (len < MAX_ALLOCA) \
3971 { \
3972 buf.data = (unsigned char *) alloca (len); \
3973 buf.on_stack = 1; \
3974 } \
3975 else \
3976 { \
3977 buf.data = (unsigned char *) xmalloc (len); \
3978 buf.on_stack = 0; \
3979 } \
3980 buf.size = len; \
3981 } while (0)
d46c5b12 3982
73be902c
KH
3983/* Double the allocated memory for *BUF. */
3984static void
3985extend_conversion_buffer (buf)
3986 struct conversion_buffer *buf;
d46c5b12 3987{
73be902c 3988 if (buf->on_stack)
d46c5b12 3989 {
73be902c
KH
3990 unsigned char *save = buf->data;
3991 buf->data = (unsigned char *) xmalloc (buf->size * 2);
3992 bcopy (save, buf->data, buf->size);
3993 buf->on_stack = 0;
d46c5b12 3994 }
73be902c
KH
3995 else
3996 {
3997 buf->data = (unsigned char *) xrealloc (buf->data, buf->size * 2);
3998 }
3999 buf->size *= 2;
4000}
4001
4002/* Free the allocated memory for BUF if it is not on stack. */
4003static void
4004free_conversion_buffer (buf)
4005 struct conversion_buffer *buf;
4006{
4007 if (!buf->on_stack)
4008 xfree (buf->data);
d46c5b12
KH
4009}
4010
4011int
4012ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
4013 struct coding_system *coding;
4014 unsigned char *source, *destination;
4015 int src_bytes, dst_bytes, encodep;
4016{
4017 struct ccl_program *ccl
4018 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
4019 int result;
4020
ae9ff118 4021 ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
aaaf0b1e
KH
4022 if (encodep)
4023 ccl->eol_type = coding->eol_type;
7272d75c 4024 ccl->multibyte = coding->src_multibyte;
d46c5b12
KH
4025 coding->produced = ccl_driver (ccl, source, destination,
4026 src_bytes, dst_bytes, &(coding->consumed));
b73bfc1c
KH
4027 if (encodep)
4028 coding->produced_char = coding->produced;
4029 else
4030 {
4031 int bytes
4032 = dst_bytes ? dst_bytes : source + coding->consumed - destination;
4033 coding->produced = str_as_multibyte (destination, bytes,
4034 coding->produced,
4035 &(coding->produced_char));
4036 }
69f76525 4037
d46c5b12
KH
4038 switch (ccl->status)
4039 {
4040 case CCL_STAT_SUSPEND_BY_SRC:
73be902c 4041 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
d46c5b12
KH
4042 break;
4043 case CCL_STAT_SUSPEND_BY_DST:
73be902c 4044 coding->result = CODING_FINISH_INSUFFICIENT_DST;
d46c5b12 4045 break;
9864ebce
KH
4046 case CCL_STAT_QUIT:
4047 case CCL_STAT_INVALID_CMD:
73be902c 4048 coding->result = CODING_FINISH_INTERRUPT;
9864ebce 4049 break;
d46c5b12 4050 default:
73be902c 4051 coding->result = CODING_FINISH_NORMAL;
d46c5b12
KH
4052 break;
4053 }
73be902c 4054 return coding->result;
4ed46869
KH
4055}
4056
aaaf0b1e
KH
4057/* Decode EOL format of the text at PTR of BYTES length destructively
4058 according to CODING->eol_type. This is called after the CCL
4059 program produced a decoded text at PTR. If we do CRLF->LF
4060 conversion, update CODING->produced and CODING->produced_char. */
4061
4062static void
4063decode_eol_post_ccl (coding, ptr, bytes)
4064 struct coding_system *coding;
4065 unsigned char *ptr;
4066 int bytes;
4067{
4068 Lisp_Object val, saved_coding_symbol;
4069 unsigned char *pend = ptr + bytes;
4070 int dummy;
4071
4072 /* Remember the current coding system symbol. We set it back when
4073 an inconsistent EOL is found so that `last-coding-system-used' is
4074 set to the coding system that doesn't specify EOL conversion. */
4075 saved_coding_symbol = coding->symbol;
4076
4077 coding->spec.ccl.cr_carryover = 0;
4078 if (coding->eol_type == CODING_EOL_UNDECIDED)
4079 {
4080 /* Here, to avoid the call of setup_coding_system, we directly
4081 call detect_eol_type. */
4082 coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
74b01b80
EZ
4083 if (coding->eol_type == CODING_EOL_INCONSISTENT)
4084 coding->eol_type = CODING_EOL_LF;
4085 if (coding->eol_type != CODING_EOL_UNDECIDED)
4086 {
4087 val = Fget (coding->symbol, Qeol_type);
4088 if (VECTORP (val) && XVECTOR (val)->size == 3)
4089 coding->symbol = XVECTOR (val)->contents[coding->eol_type];
4090 }
aaaf0b1e
KH
4091 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4092 }
4093
74b01b80
EZ
4094 if (coding->eol_type == CODING_EOL_LF
4095 || coding->eol_type == CODING_EOL_UNDECIDED)
aaaf0b1e
KH
4096 {
4097 /* We have nothing to do. */
4098 ptr = pend;
4099 }
4100 else if (coding->eol_type == CODING_EOL_CRLF)
4101 {
4102 unsigned char *pstart = ptr, *p = ptr;
4103
4104 if (! (coding->mode & CODING_MODE_LAST_BLOCK)
4105 && *(pend - 1) == '\r')
4106 {
4107 /* If the last character is CR, we can't handle it here
4108 because LF will be in the not-yet-decoded source text.
4109 Recorded that the CR is not yet processed. */
4110 coding->spec.ccl.cr_carryover = 1;
4111 coding->produced--;
4112 coding->produced_char--;
4113 pend--;
4114 }
4115 while (ptr < pend)
4116 {
4117 if (*ptr == '\r')
4118 {
4119 if (ptr + 1 < pend && *(ptr + 1) == '\n')
4120 {
4121 *p++ = '\n';
4122 ptr += 2;
4123 }
4124 else
4125 {
4126 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4127 goto undo_eol_conversion;
4128 *p++ = *ptr++;
4129 }
4130 }
4131 else if (*ptr == '\n'
4132 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4133 goto undo_eol_conversion;
4134 else
4135 *p++ = *ptr++;
4136 continue;
4137
4138 undo_eol_conversion:
4139 /* We have faced with inconsistent EOL format at PTR.
4140 Convert all LFs before PTR back to CRLFs. */
4141 for (p--, ptr--; p >= pstart; p--)
4142 {
4143 if (*p == '\n')
4144 *ptr-- = '\n', *ptr-- = '\r';
4145 else
4146 *ptr-- = *p;
4147 }
4148 /* If carryover is recorded, cancel it because we don't
4149 convert CRLF anymore. */
4150 if (coding->spec.ccl.cr_carryover)
4151 {
4152 coding->spec.ccl.cr_carryover = 0;
4153 coding->produced++;
4154 coding->produced_char++;
4155 pend++;
4156 }
4157 p = ptr = pend;
4158 coding->eol_type = CODING_EOL_LF;
4159 coding->symbol = saved_coding_symbol;
4160 }
4161 if (p < pend)
4162 {
4163 /* As each two-byte sequence CRLF was converted to LF, (PEND
4164 - P) is the number of deleted characters. */
4165 coding->produced -= pend - p;
4166 coding->produced_char -= pend - p;
4167 }
4168 }
4169 else /* i.e. coding->eol_type == CODING_EOL_CR */
4170 {
4171 unsigned char *p = ptr;
4172
4173 for (; ptr < pend; ptr++)
4174 {
4175 if (*ptr == '\r')
4176 *ptr = '\n';
4177 else if (*ptr == '\n'
4178 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4179 {
4180 for (; p < ptr; p++)
4181 {
4182 if (*p == '\n')
4183 *p = '\r';
4184 }
4185 ptr = pend;
4186 coding->eol_type = CODING_EOL_LF;
4187 coding->symbol = saved_coding_symbol;
4188 }
4189 }
4190 }
4191}
4192
4ed46869
KH
4193/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4194 decoding, it may detect coding system and format of end-of-line if
b73bfc1c
KH
4195 those are not yet decided. The source should be unibyte, the
4196 result is multibyte if CODING->dst_multibyte is nonzero, else
4197 unibyte. */
4ed46869
KH
4198
4199int
d46c5b12 4200decode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4201 struct coding_system *coding;
4202 unsigned char *source, *destination;
4203 int src_bytes, dst_bytes;
4ed46869 4204{
0ef69138 4205 if (coding->type == coding_type_undecided)
4ed46869
KH
4206 detect_coding (coding, source, src_bytes);
4207
aaaf0b1e
KH
4208 if (coding->eol_type == CODING_EOL_UNDECIDED
4209 && coding->type != coding_type_ccl)
4ed46869
KH
4210 detect_eol (coding, source, src_bytes);
4211
b73bfc1c
KH
4212 coding->produced = coding->produced_char = 0;
4213 coding->consumed = coding->consumed_char = 0;
4214 coding->errors = 0;
4215 coding->result = CODING_FINISH_NORMAL;
4216
4ed46869
KH
4217 switch (coding->type)
4218 {
4ed46869 4219 case coding_type_sjis:
b73bfc1c
KH
4220 decode_coding_sjis_big5 (coding, source, destination,
4221 src_bytes, dst_bytes, 1);
4ed46869
KH
4222 break;
4223
4224 case coding_type_iso2022:
b73bfc1c
KH
4225 decode_coding_iso2022 (coding, source, destination,
4226 src_bytes, dst_bytes);
4ed46869
KH
4227 break;
4228
4229 case coding_type_big5:
b73bfc1c
KH
4230 decode_coding_sjis_big5 (coding, source, destination,
4231 src_bytes, dst_bytes, 0);
4232 break;
4233
4234 case coding_type_emacs_mule:
4235 decode_coding_emacs_mule (coding, source, destination,
4236 src_bytes, dst_bytes);
4ed46869
KH
4237 break;
4238
4239 case coding_type_ccl:
aaaf0b1e
KH
4240 if (coding->spec.ccl.cr_carryover)
4241 {
4242 /* Set the CR which is not processed by the previous call of
4243 decode_eol_post_ccl in DESTINATION. */
4244 *destination = '\r';
4245 coding->produced++;
4246 coding->produced_char++;
4247 dst_bytes--;
4248 }
4249 ccl_coding_driver (coding, source,
4250 destination + coding->spec.ccl.cr_carryover,
b73bfc1c 4251 src_bytes, dst_bytes, 0);
aaaf0b1e
KH
4252 if (coding->eol_type != CODING_EOL_LF)
4253 decode_eol_post_ccl (coding, destination, coding->produced);
d46c5b12
KH
4254 break;
4255
b73bfc1c
KH
4256 default:
4257 decode_eol (coding, source, destination, src_bytes, dst_bytes);
4258 }
4259
4260 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
e7c9eef9 4261 && coding->mode & CODING_MODE_LAST_BLOCK
b73bfc1c
KH
4262 && coding->consumed == src_bytes)
4263 coding->result = CODING_FINISH_NORMAL;
4264
4265 if (coding->mode & CODING_MODE_LAST_BLOCK
4266 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
4267 {
4268 unsigned char *src = source + coding->consumed;
4269 unsigned char *dst = destination + coding->produced;
4270
4271 src_bytes -= coding->consumed;
bb10be8b 4272 coding->errors++;
b73bfc1c
KH
4273 if (COMPOSING_P (coding))
4274 DECODE_COMPOSITION_END ('1');
4275 while (src_bytes--)
d46c5b12 4276 {
b73bfc1c
KH
4277 int c = *src++;
4278 dst += CHAR_STRING (c, dst);
4279 coding->produced_char++;
d46c5b12 4280 }
b73bfc1c
KH
4281 coding->consumed = coding->consumed_char = src - source;
4282 coding->produced = dst - destination;
73be902c 4283 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4284 }
4285
b73bfc1c
KH
4286 if (!coding->dst_multibyte)
4287 {
4288 coding->produced = str_as_unibyte (destination, coding->produced);
4289 coding->produced_char = coding->produced;
4290 }
4ed46869 4291
b73bfc1c
KH
4292 return coding->result;
4293}
52d41803 4294
b73bfc1c
KH
4295/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4296 multibyteness of the source is CODING->src_multibyte, the
4297 multibyteness of the result is always unibyte. */
4ed46869
KH
4298
4299int
d46c5b12 4300encode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4301 struct coding_system *coding;
4302 unsigned char *source, *destination;
4303 int src_bytes, dst_bytes;
4ed46869 4304{
b73bfc1c
KH
4305 coding->produced = coding->produced_char = 0;
4306 coding->consumed = coding->consumed_char = 0;
4307 coding->errors = 0;
4308 coding->result = CODING_FINISH_NORMAL;
4ed46869 4309
d46c5b12
KH
4310 switch (coding->type)
4311 {
4ed46869 4312 case coding_type_sjis:
b73bfc1c
KH
4313 encode_coding_sjis_big5 (coding, source, destination,
4314 src_bytes, dst_bytes, 1);
4ed46869
KH
4315 break;
4316
4317 case coding_type_iso2022:
b73bfc1c
KH
4318 encode_coding_iso2022 (coding, source, destination,
4319 src_bytes, dst_bytes);
4ed46869
KH
4320 break;
4321
4322 case coding_type_big5:
b73bfc1c
KH
4323 encode_coding_sjis_big5 (coding, source, destination,
4324 src_bytes, dst_bytes, 0);
4325 break;
4326
4327 case coding_type_emacs_mule:
4328 encode_coding_emacs_mule (coding, source, destination,
4329 src_bytes, dst_bytes);
4ed46869
KH
4330 break;
4331
4332 case coding_type_ccl:
b73bfc1c
KH
4333 ccl_coding_driver (coding, source, destination,
4334 src_bytes, dst_bytes, 1);
d46c5b12
KH
4335 break;
4336
b73bfc1c
KH
4337 default:
4338 encode_eol (coding, source, destination, src_bytes, dst_bytes);
4339 }
4340
73be902c
KH
4341 if (coding->mode & CODING_MODE_LAST_BLOCK
4342 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
b73bfc1c
KH
4343 {
4344 unsigned char *src = source + coding->consumed;
4345 unsigned char *src_end = src + src_bytes;
4346 unsigned char *dst = destination + coding->produced;
4347
4348 if (coding->type == coding_type_iso2022)
4349 ENCODE_RESET_PLANE_AND_REGISTER;
4350 if (COMPOSING_P (coding))
4351 *dst++ = ISO_CODE_ESC, *dst++ = '1';
4352 if (coding->consumed < src_bytes)
d46c5b12 4353 {
b73bfc1c
KH
4354 int len = src_bytes - coding->consumed;
4355
4356 BCOPY_SHORT (source + coding->consumed, dst, len);
4357 if (coding->src_multibyte)
4358 len = str_as_unibyte (dst, len);
4359 dst += len;
4360 coding->consumed = src_bytes;
d46c5b12 4361 }
b73bfc1c 4362 coding->produced = coding->produced_char = dst - destination;
73be902c 4363 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4364 }
4365
bb10be8b
KH
4366 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4367 && coding->consumed == src_bytes)
4368 coding->result = CODING_FINISH_NORMAL;
4369
b73bfc1c 4370 return coding->result;
4ed46869
KH
4371}
4372
fb88bf2d
KH
4373/* Scan text in the region between *BEG and *END (byte positions),
4374 skip characters which we don't have to decode by coding system
4375 CODING at the head and tail, then set *BEG and *END to the region
4376 of the text we actually have to convert. The caller should move
b73bfc1c
KH
4377 the gap out of the region in advance if the region is from a
4378 buffer.
4ed46869 4379
d46c5b12
KH
4380 If STR is not NULL, *BEG and *END are indices into STR. */
4381
4382static void
4383shrink_decoding_region (beg, end, coding, str)
4384 int *beg, *end;
4385 struct coding_system *coding;
4386 unsigned char *str;
4387{
fb88bf2d 4388 unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
d46c5b12 4389 int eol_conversion;
88993dfd 4390 Lisp_Object translation_table;
d46c5b12
KH
4391
4392 if (coding->type == coding_type_ccl
4393 || coding->type == coding_type_undecided
b73bfc1c
KH
4394 || coding->eol_type != CODING_EOL_LF
4395 || !NILP (coding->post_read_conversion)
4396 || coding->composing != COMPOSITION_DISABLED)
d46c5b12
KH
4397 {
4398 /* We can't skip any data. */
4399 return;
4400 }
b73bfc1c
KH
4401 if (coding->type == coding_type_no_conversion
4402 || coding->type == coding_type_raw_text
4403 || coding->type == coding_type_emacs_mule)
d46c5b12 4404 {
fb88bf2d
KH
4405 /* We need no conversion, but don't have to skip any data here.
4406 Decoding routine handles them effectively anyway. */
d46c5b12
KH
4407 return;
4408 }
4409
88993dfd
KH
4410 translation_table = coding->translation_table_for_decode;
4411 if (NILP (translation_table) && !NILP (Venable_character_translation))
4412 translation_table = Vstandard_translation_table_for_decode;
4413 if (CHAR_TABLE_P (translation_table))
4414 {
4415 int i;
4416 for (i = 0; i < 128; i++)
4417 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4418 break;
4419 if (i < 128)
fa46990e 4420 /* Some ASCII character should be translated. We give up
88993dfd
KH
4421 shrinking. */
4422 return;
4423 }
4424
b73bfc1c 4425 if (coding->heading_ascii >= 0)
d46c5b12
KH
4426 /* Detection routine has already found how much we can skip at the
4427 head. */
4428 *beg += coding->heading_ascii;
4429
4430 if (str)
4431 {
4432 begp_orig = begp = str + *beg;
4433 endp_orig = endp = str + *end;
4434 }
4435 else
4436 {
fb88bf2d 4437 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4438 endp_orig = endp = begp + *end - *beg;
4439 }
4440
fa46990e
DL
4441 eol_conversion = (coding->eol_type == CODING_EOL_CR
4442 || coding->eol_type == CODING_EOL_CRLF);
4443
d46c5b12
KH
4444 switch (coding->type)
4445 {
d46c5b12
KH
4446 case coding_type_sjis:
4447 case coding_type_big5:
4448 /* We can skip all ASCII characters at the head. */
4449 if (coding->heading_ascii < 0)
4450 {
4451 if (eol_conversion)
de9d083c 4452 while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
d46c5b12
KH
4453 else
4454 while (begp < endp && *begp < 0x80) begp++;
4455 }
4456 /* We can skip all ASCII characters at the tail except for the
4457 second byte of SJIS or BIG5 code. */
4458 if (eol_conversion)
de9d083c 4459 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
d46c5b12
KH
4460 else
4461 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4462 /* Do not consider LF as ascii if preceded by CR, since that
4463 confuses eol decoding. */
4464 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4465 endp++;
d46c5b12
KH
4466 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
4467 endp++;
4468 break;
4469
b73bfc1c 4470 case coding_type_iso2022:
622fece5
KH
4471 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4472 /* We can't skip any data. */
4473 break;
d46c5b12
KH
4474 if (coding->heading_ascii < 0)
4475 {
d46c5b12
KH
4476 /* We can skip all ASCII characters at the head except for a
4477 few control codes. */
4478 while (begp < endp && (c = *begp) < 0x80
4479 && c != ISO_CODE_CR && c != ISO_CODE_SO
4480 && c != ISO_CODE_SI && c != ISO_CODE_ESC
4481 && (!eol_conversion || c != ISO_CODE_LF))
4482 begp++;
4483 }
4484 switch (coding->category_idx)
4485 {
4486 case CODING_CATEGORY_IDX_ISO_8_1:
4487 case CODING_CATEGORY_IDX_ISO_8_2:
4488 /* We can skip all ASCII characters at the tail. */
4489 if (eol_conversion)
de9d083c 4490 while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
d46c5b12
KH
4491 else
4492 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4493 /* Do not consider LF as ascii if preceded by CR, since that
4494 confuses eol decoding. */
4495 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4496 endp++;
d46c5b12
KH
4497 break;
4498
4499 case CODING_CATEGORY_IDX_ISO_7:
4500 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
de79a6a5
KH
4501 {
4502 /* We can skip all charactes at the tail except for 8-bit
4503 codes and ESC and the following 2-byte at the tail. */
4504 unsigned char *eight_bit = NULL;
4505
4506 if (eol_conversion)
4507 while (begp < endp
4508 && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
4509 {
4510 if (!eight_bit && c & 0x80) eight_bit = endp;
4511 endp--;
4512 }
4513 else
4514 while (begp < endp
4515 && (c = endp[-1]) != ISO_CODE_ESC)
4516 {
4517 if (!eight_bit && c & 0x80) eight_bit = endp;
4518 endp--;
4519 }
4520 /* Do not consider LF as ascii if preceded by CR, since that
4521 confuses eol decoding. */
4522 if (begp < endp && endp < endp_orig
4523 && endp[-1] == '\r' && endp[0] == '\n')
4524 endp++;
4525 if (begp < endp && endp[-1] == ISO_CODE_ESC)
4526 {
4527 if (endp + 1 < endp_orig && end[0] == '(' && end[1] == 'B')
4528 /* This is an ASCII designation sequence. We can
4529 surely skip the tail. But, if we have
4530 encountered an 8-bit code, skip only the codes
4531 after that. */
4532 endp = eight_bit ? eight_bit : endp + 2;
4533 else
4534 /* Hmmm, we can't skip the tail. */
4535 endp = endp_orig;
4536 }
4537 else if (eight_bit)
4538 endp = eight_bit;
4539 }
d46c5b12 4540 }
b73bfc1c
KH
4541 break;
4542
4543 default:
4544 abort ();
d46c5b12
KH
4545 }
4546 *beg += begp - begp_orig;
4547 *end += endp - endp_orig;
4548 return;
4549}
4550
4551/* Like shrink_decoding_region but for encoding. */
4552
4553static void
4554shrink_encoding_region (beg, end, coding, str)
4555 int *beg, *end;
4556 struct coding_system *coding;
4557 unsigned char *str;
4558{
4559 unsigned char *begp_orig, *begp, *endp_orig, *endp;
4560 int eol_conversion;
88993dfd 4561 Lisp_Object translation_table;
d46c5b12 4562
b73bfc1c
KH
4563 if (coding->type == coding_type_ccl
4564 || coding->eol_type == CODING_EOL_CRLF
4565 || coding->eol_type == CODING_EOL_CR
4566 || coding->cmp_data && coding->cmp_data->used > 0)
d46c5b12 4567 {
b73bfc1c
KH
4568 /* We can't skip any data. */
4569 return;
4570 }
4571 if (coding->type == coding_type_no_conversion
4572 || coding->type == coding_type_raw_text
4573 || coding->type == coding_type_emacs_mule
4574 || coding->type == coding_type_undecided)
4575 {
4576 /* We need no conversion, but don't have to skip any data here.
4577 Encoding routine handles them effectively anyway. */
d46c5b12
KH
4578 return;
4579 }
4580
88993dfd
KH
4581 translation_table = coding->translation_table_for_encode;
4582 if (NILP (translation_table) && !NILP (Venable_character_translation))
4583 translation_table = Vstandard_translation_table_for_encode;
4584 if (CHAR_TABLE_P (translation_table))
4585 {
4586 int i;
4587 for (i = 0; i < 128; i++)
4588 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4589 break;
4590 if (i < 128)
4591 /* Some ASCII character should be tranlsated. We give up
4592 shrinking. */
4593 return;
4594 }
4595
d46c5b12
KH
4596 if (str)
4597 {
4598 begp_orig = begp = str + *beg;
4599 endp_orig = endp = str + *end;
4600 }
4601 else
4602 {
fb88bf2d 4603 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4604 endp_orig = endp = begp + *end - *beg;
4605 }
4606
4607 eol_conversion = (coding->eol_type == CODING_EOL_CR
4608 || coding->eol_type == CODING_EOL_CRLF);
4609
4610 /* Here, we don't have to check coding->pre_write_conversion because
4611 the caller is expected to have handled it already. */
4612 switch (coding->type)
4613 {
d46c5b12 4614 case coding_type_iso2022:
622fece5
KH
4615 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4616 /* We can't skip any data. */
4617 break;
d46c5b12
KH
4618 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
4619 {
4620 unsigned char *bol = begp;
4621 while (begp < endp && *begp < 0x80)
4622 {
4623 begp++;
4624 if (begp[-1] == '\n')
4625 bol = begp;
4626 }
4627 begp = bol;
4628 goto label_skip_tail;
4629 }
4630 /* fall down ... */
4631
b73bfc1c
KH
4632 case coding_type_sjis:
4633 case coding_type_big5:
d46c5b12
KH
4634 /* We can skip all ASCII characters at the head and tail. */
4635 if (eol_conversion)
4636 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
4637 else
4638 while (begp < endp && *begp < 0x80) begp++;
4639 label_skip_tail:
4640 if (eol_conversion)
4641 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
4642 else
4643 while (begp < endp && *(endp - 1) < 0x80) endp--;
4644 break;
b73bfc1c
KH
4645
4646 default:
4647 abort ();
d46c5b12
KH
4648 }
4649
4650 *beg += begp - begp_orig;
4651 *end += endp - endp_orig;
4652 return;
4653}
4654
88993dfd
KH
4655/* As shrinking conversion region requires some overhead, we don't try
4656 shrinking if the length of conversion region is less than this
4657 value. */
4658static int shrink_conversion_region_threshhold = 1024;
4659
4660#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
4661 do { \
4662 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
4663 { \
4664 if (encodep) shrink_encoding_region (beg, end, coding, str); \
4665 else shrink_decoding_region (beg, end, coding, str); \
4666 } \
4667 } while (0)
4668
b843d1ae
KH
4669static Lisp_Object
4670code_convert_region_unwind (dummy)
4671 Lisp_Object dummy;
4672{
4673 inhibit_pre_post_conversion = 0;
4674 return Qnil;
4675}
4676
ec6d2bb8
KH
4677/* Store information about all compositions in the range FROM and TO
4678 of OBJ in memory blocks pointed by CODING->cmp_data. OBJ is a
4679 buffer or a string, defaults to the current buffer. */
4680
4681void
4682coding_save_composition (coding, from, to, obj)
4683 struct coding_system *coding;
4684 int from, to;
4685 Lisp_Object obj;
4686{
4687 Lisp_Object prop;
4688 int start, end;
4689
91bee881
KH
4690 if (coding->composing == COMPOSITION_DISABLED)
4691 return;
4692 if (!coding->cmp_data)
4693 coding_allocate_composition_data (coding, from);
ec6d2bb8
KH
4694 if (!find_composition (from, to, &start, &end, &prop, obj)
4695 || end > to)
4696 return;
4697 if (start < from
4698 && (!find_composition (end, to, &start, &end, &prop, obj)
4699 || end > to))
4700 return;
4701 coding->composing = COMPOSITION_NO;
ec6d2bb8
KH
4702 do
4703 {
4704 if (COMPOSITION_VALID_P (start, end, prop))
4705 {
4706 enum composition_method method = COMPOSITION_METHOD (prop);
4707 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
4708 >= COMPOSITION_DATA_SIZE)
4709 coding_allocate_composition_data (coding, from);
4710 /* For relative composition, we remember start and end
4711 positions, for the other compositions, we also remember
4712 components. */
4713 CODING_ADD_COMPOSITION_START (coding, start - from, method);
4714 if (method != COMPOSITION_RELATIVE)
4715 {
4716 /* We must store a*/
4717 Lisp_Object val, ch;
4718
4719 val = COMPOSITION_COMPONENTS (prop);
4720 if (CONSP (val))
4721 while (CONSP (val))
4722 {
4723 ch = XCAR (val), val = XCDR (val);
4724 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4725 }
4726 else if (VECTORP (val) || STRINGP (val))
4727 {
4728 int len = (VECTORP (val)
4729 ? XVECTOR (val)->size : XSTRING (val)->size);
4730 int i;
4731 for (i = 0; i < len; i++)
4732 {
4733 ch = (STRINGP (val)
4734 ? Faref (val, make_number (i))
4735 : XVECTOR (val)->contents[i]);
4736 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4737 }
4738 }
4739 else /* INTEGERP (val) */
4740 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
4741 }
4742 CODING_ADD_COMPOSITION_END (coding, end - from);
4743 }
4744 start = end;
4745 }
4746 while (start < to
4747 && find_composition (start, to, &start, &end, &prop, obj)
4748 && end <= to);
4749
4750 /* Make coding->cmp_data point to the first memory block. */
4751 while (coding->cmp_data->prev)
4752 coding->cmp_data = coding->cmp_data->prev;
4753 coding->cmp_data_start = 0;
4754}
4755
4756/* Reflect the saved information about compositions to OBJ.
4757 CODING->cmp_data points to a memory block for the informaiton. OBJ
4758 is a buffer or a string, defaults to the current buffer. */
4759
33fb63eb 4760void
ec6d2bb8
KH
4761coding_restore_composition (coding, obj)
4762 struct coding_system *coding;
4763 Lisp_Object obj;
4764{
4765 struct composition_data *cmp_data = coding->cmp_data;
4766
4767 if (!cmp_data)
4768 return;
4769
4770 while (cmp_data->prev)
4771 cmp_data = cmp_data->prev;
4772
4773 while (cmp_data)
4774 {
4775 int i;
4776
78108bcd
KH
4777 for (i = 0; i < cmp_data->used && cmp_data->data[i] > 0;
4778 i += cmp_data->data[i])
ec6d2bb8
KH
4779 {
4780 int *data = cmp_data->data + i;
4781 enum composition_method method = (enum composition_method) data[3];
4782 Lisp_Object components;
4783
4784 if (method == COMPOSITION_RELATIVE)
4785 components = Qnil;
4786 else
4787 {
4788 int len = data[0] - 4, j;
4789 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];
4790
4791 for (j = 0; j < len; j++)
4792 args[j] = make_number (data[4 + j]);
4793 components = (method == COMPOSITION_WITH_ALTCHARS
4794 ? Fstring (len, args) : Fvector (len, args));
4795 }
4796 compose_text (data[1], data[2], components, Qnil, obj);
4797 }
4798 cmp_data = cmp_data->next;
4799 }
4800}
4801
d46c5b12 4802/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
fb88bf2d
KH
4803 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
4804 coding system CODING, and return the status code of code conversion
4805 (currently, this value has no meaning).
4806
4807 How many characters (and bytes) are converted to how many
4808 characters (and bytes) are recorded in members of the structure
4809 CODING.
d46c5b12 4810
6e44253b 4811 If REPLACE is nonzero, we do various things as if the original text
d46c5b12 4812 is deleted and a new text is inserted. See the comments in
b73bfc1c
KH
4813 replace_range (insdel.c) to know what we are doing.
4814
4815 If REPLACE is zero, it is assumed that the source text is unibyte.
4816 Otherwize, it is assumed that the source text is multibyte. */
4ed46869
KH
4817
4818int
6e44253b
KH
4819code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
4820 int from, from_byte, to, to_byte, encodep, replace;
4ed46869 4821 struct coding_system *coding;
4ed46869 4822{
fb88bf2d
KH
4823 int len = to - from, len_byte = to_byte - from_byte;
4824 int require, inserted, inserted_byte;
4b39528c 4825 int head_skip, tail_skip, total_skip = 0;
84d60297 4826 Lisp_Object saved_coding_symbol;
fb88bf2d 4827 int first = 1;
fb88bf2d 4828 unsigned char *src, *dst;
84d60297 4829 Lisp_Object deletion;
e133c8fa 4830 int orig_point = PT, orig_len = len;
6abb9bd9 4831 int prev_Z;
b73bfc1c
KH
4832 int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);
4833
4834 coding->src_multibyte = replace && multibyte_p;
4835 coding->dst_multibyte = multibyte_p;
84d60297
RS
4836
4837 deletion = Qnil;
4838 saved_coding_symbol = Qnil;
d46c5b12 4839
83fa074f 4840 if (from < PT && PT < to)
e133c8fa
KH
4841 {
4842 TEMP_SET_PT_BOTH (from, from_byte);
4843 orig_point = from;
4844 }
83fa074f 4845
6e44253b 4846 if (replace)
d46c5b12 4847 {
fb88bf2d 4848 int saved_from = from;
e077cc80 4849 int saved_inhibit_modification_hooks;
fb88bf2d 4850
d46c5b12 4851 prepare_to_modify_buffer (from, to, &from);
fb88bf2d
KH
4852 if (saved_from != from)
4853 {
4854 to = from + len;
b73bfc1c 4855 from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
fb88bf2d
KH
4856 len_byte = to_byte - from_byte;
4857 }
e077cc80
KH
4858
4859 /* The code conversion routine can not preserve text properties
4860 for now. So, we must remove all text properties in the
4861 region. Here, we must suppress all modification hooks. */
4862 saved_inhibit_modification_hooks = inhibit_modification_hooks;
4863 inhibit_modification_hooks = 1;
4864 Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
4865 inhibit_modification_hooks = saved_inhibit_modification_hooks;
d46c5b12 4866 }
d46c5b12
KH
4867
4868 if (! encodep && CODING_REQUIRE_DETECTION (coding))
4869 {
12410ef1 4870 /* We must detect encoding of text and eol format. */
d46c5b12
KH
4871
4872 if (from < GPT && to > GPT)
4873 move_gap_both (from, from_byte);
4874 if (coding->type == coding_type_undecided)
4875 {
fb88bf2d 4876 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
d46c5b12 4877 if (coding->type == coding_type_undecided)
62b3ef1d
KH
4878 {
4879 /* It seems that the text contains only ASCII, but we
d9aef30f 4880 should not leave it undecided because the deeper
62b3ef1d
KH
4881 decoding routine (decode_coding) tries to detect the
4882 encodings again in vain. */
4883 coding->type = coding_type_emacs_mule;
4884 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
4885 }
d46c5b12 4886 }
aaaf0b1e
KH
4887 if (coding->eol_type == CODING_EOL_UNDECIDED
4888 && coding->type != coding_type_ccl)
d46c5b12
KH
4889 {
4890 saved_coding_symbol = coding->symbol;
4891 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
4892 if (coding->eol_type == CODING_EOL_UNDECIDED)
4893 coding->eol_type = CODING_EOL_LF;
4894 /* We had better recover the original eol format if we
4895 encounter an inconsitent eol format while decoding. */
4896 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4897 }
4898 }
4899
d46c5b12
KH
4900 /* Now we convert the text. */
4901
4902 /* For encoding, we must process pre-write-conversion in advance. */
b73bfc1c
KH
4903 if (! inhibit_pre_post_conversion
4904 && encodep
d46c5b12
KH
4905 && SYMBOLP (coding->pre_write_conversion)
4906 && ! NILP (Ffboundp (coding->pre_write_conversion)))
4907 {
2b4f9037
KH
4908 /* The function in pre-write-conversion may put a new text in a
4909 new buffer. */
0007bdd0
KH
4910 struct buffer *prev = current_buffer;
4911 Lisp_Object new;
b843d1ae 4912 int count = specpdl_ptr - specpdl;
d46c5b12 4913
b843d1ae
KH
4914 record_unwind_protect (code_convert_region_unwind, Qnil);
4915 /* We should not call any more pre-write/post-read-conversion
4916 functions while this pre-write-conversion is running. */
4917 inhibit_pre_post_conversion = 1;
b39f748c
AS
4918 call2 (coding->pre_write_conversion,
4919 make_number (from), make_number (to));
b843d1ae
KH
4920 inhibit_pre_post_conversion = 0;
4921 /* Discard the unwind protect. */
4922 specpdl_ptr--;
4923
d46c5b12
KH
4924 if (current_buffer != prev)
4925 {
4926 len = ZV - BEGV;
0007bdd0 4927 new = Fcurrent_buffer ();
d46c5b12 4928 set_buffer_internal_1 (prev);
7dae4502 4929 del_range_2 (from, from_byte, to, to_byte, 0);
e133c8fa 4930 TEMP_SET_PT_BOTH (from, from_byte);
0007bdd0
KH
4931 insert_from_buffer (XBUFFER (new), 1, len, 0);
4932 Fkill_buffer (new);
e133c8fa
KH
4933 if (orig_point >= to)
4934 orig_point += len - orig_len;
4935 else if (orig_point > from)
4936 orig_point = from;
4937 orig_len = len;
d46c5b12 4938 to = from + len;
b73bfc1c
KH
4939 from_byte = CHAR_TO_BYTE (from);
4940 to_byte = CHAR_TO_BYTE (to);
d46c5b12 4941 len_byte = to_byte - from_byte;
e133c8fa 4942 TEMP_SET_PT_BOTH (from, from_byte);
d46c5b12
KH
4943 }
4944 }
4945
12410ef1
KH
4946 if (replace)
4947 deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
4948
ec6d2bb8
KH
4949 if (coding->composing != COMPOSITION_DISABLED)
4950 {
4951 if (encodep)
4952 coding_save_composition (coding, from, to, Fcurrent_buffer ());
4953 else
4954 coding_allocate_composition_data (coding, from);
4955 }
fb88bf2d 4956
b73bfc1c 4957 /* Try to skip the heading and tailing ASCIIs. */
4956c225
KH
4958 if (coding->type != coding_type_ccl)
4959 {
4960 int from_byte_orig = from_byte, to_byte_orig = to_byte;
ec6d2bb8 4961
4956c225
KH
4962 if (from < GPT && GPT < to)
4963 move_gap_both (from, from_byte);
4964 SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
4965 if (from_byte == to_byte
4966 && (encodep || NILP (coding->post_read_conversion))
4967 && ! CODING_REQUIRE_FLUSHING (coding))
4968 {
4969 coding->produced = len_byte;
4970 coding->produced_char = len;
4971 if (!replace)
4972 /* We must record and adjust for this new text now. */
4973 adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);
4974 return 0;
4975 }
4976
4977 head_skip = from_byte - from_byte_orig;
4978 tail_skip = to_byte_orig - to_byte;
4979 total_skip = head_skip + tail_skip;
4980 from += head_skip;
4981 to -= tail_skip;
4982 len -= total_skip; len_byte -= total_skip;
4983 }
d46c5b12 4984
fb88bf2d
KH
4985 /* For converion, we must put the gap before the text in addition to
4986 making the gap larger for efficient decoding. The required gap
4987 size starts from 2000 which is the magic number used in make_gap.
4988 But, after one batch of conversion, it will be incremented if we
4989 find that it is not enough . */
d46c5b12
KH
4990 require = 2000;
4991
4992 if (GAP_SIZE < require)
4993 make_gap (require - GAP_SIZE);
4994 move_gap_both (from, from_byte);
4995
d46c5b12 4996 inserted = inserted_byte = 0;
fb88bf2d
KH
4997
4998 GAP_SIZE += len_byte;
4999 ZV -= len;
5000 Z -= len;
5001 ZV_BYTE -= len_byte;
5002 Z_BYTE -= len_byte;
5003
d9f9a1bc
GM
5004 if (GPT - BEG < BEG_UNCHANGED)
5005 BEG_UNCHANGED = GPT - BEG;
5006 if (Z - GPT < END_UNCHANGED)
5007 END_UNCHANGED = Z - GPT;
f2558efd 5008
b73bfc1c
KH
5009 if (!encodep && coding->src_multibyte)
5010 {
5011 /* Decoding routines expects that the source text is unibyte.
5012 We must convert 8-bit characters of multibyte form to
5013 unibyte. */
5014 int len_byte_orig = len_byte;
5015 len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
5016 if (len_byte < len_byte_orig)
5017 safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
5018 len_byte);
5019 coding->src_multibyte = 0;
5020 }
5021
d46c5b12
KH
5022 for (;;)
5023 {
fb88bf2d 5024 int result;
d46c5b12 5025
ec6d2bb8 5026 /* The buffer memory is now:
b73bfc1c
KH
5027 +--------+converted-text+---------+-------original-text-------+---+
5028 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
5029 |<---------------------- GAP ----------------------->| */
ec6d2bb8
KH
5030 src = GAP_END_ADDR - len_byte;
5031 dst = GPT_ADDR + inserted_byte;
5032
d46c5b12 5033 if (encodep)
fb88bf2d 5034 result = encode_coding (coding, src, dst, len_byte, 0);
d46c5b12 5035 else
fb88bf2d 5036 result = decode_coding (coding, src, dst, len_byte, 0);
ec6d2bb8
KH
5037
5038 /* The buffer memory is now:
b73bfc1c
KH
5039 +--------+-------converted-text----+--+------original-text----+---+
5040 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
5041 |<---------------------- GAP ----------------------->| */
ec6d2bb8 5042
d46c5b12
KH
5043 inserted += coding->produced_char;
5044 inserted_byte += coding->produced;
d46c5b12 5045 len_byte -= coding->consumed;
ec6d2bb8
KH
5046
5047 if (result == CODING_FINISH_INSUFFICIENT_CMP)
5048 {
5049 coding_allocate_composition_data (coding, from + inserted);
5050 continue;
5051 }
5052
fb88bf2d 5053 src += coding->consumed;
3636f7a3 5054 dst += coding->produced;
d46c5b12 5055
9864ebce
KH
5056 if (result == CODING_FINISH_NORMAL)
5057 {
5058 src += len_byte;
5059 break;
5060 }
d46c5b12
KH
5061 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
5062 {
fb88bf2d 5063 unsigned char *pend = dst, *p = pend - inserted_byte;
38edf7d4 5064 Lisp_Object eol_type;
d46c5b12
KH
5065
5066 /* Encode LFs back to the original eol format (CR or CRLF). */
5067 if (coding->eol_type == CODING_EOL_CR)
5068 {
5069 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
5070 }
5071 else
5072 {
d46c5b12
KH
5073 int count = 0;
5074
fb88bf2d
KH
5075 while (p < pend) if (*p++ == '\n') count++;
5076 if (src - dst < count)
d46c5b12 5077 {
38edf7d4 5078 /* We don't have sufficient room for encoding LFs
fb88bf2d
KH
5079 back to CRLF. We must record converted and
5080 not-yet-converted text back to the buffer
5081 content, enlarge the gap, then record them out of
5082 the buffer contents again. */
5083 int add = len_byte + inserted_byte;
5084
5085 GAP_SIZE -= add;
5086 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5087 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5088 make_gap (count - GAP_SIZE);
5089 GAP_SIZE += add;
5090 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5091 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5092 /* Don't forget to update SRC, DST, and PEND. */
5093 src = GAP_END_ADDR - len_byte;
5094 dst = GPT_ADDR + inserted_byte;
5095 pend = dst;
d46c5b12 5096 }
d46c5b12
KH
5097 inserted += count;
5098 inserted_byte += count;
fb88bf2d
KH
5099 coding->produced += count;
5100 p = dst = pend + count;
5101 while (count)
5102 {
5103 *--p = *--pend;
5104 if (*p == '\n') count--, *--p = '\r';
5105 }
d46c5b12
KH
5106 }
5107
5108 /* Suppress eol-format conversion in the further conversion. */
5109 coding->eol_type = CODING_EOL_LF;
5110
38edf7d4
KH
5111 /* Set the coding system symbol to that for Unix-like EOL. */
5112 eol_type = Fget (saved_coding_symbol, Qeol_type);
5113 if (VECTORP (eol_type)
5114 && XVECTOR (eol_type)->size == 3
5115 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
5116 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
5117 else
5118 coding->symbol = saved_coding_symbol;
fb88bf2d
KH
5119
5120 continue;
d46c5b12
KH
5121 }
5122 if (len_byte <= 0)
944bd420
KH
5123 {
5124 if (coding->type != coding_type_ccl
5125 || coding->mode & CODING_MODE_LAST_BLOCK)
5126 break;
5127 coding->mode |= CODING_MODE_LAST_BLOCK;
5128 continue;
5129 }
d46c5b12
KH
5130 if (result == CODING_FINISH_INSUFFICIENT_SRC)
5131 {
5132 /* The source text ends in invalid codes. Let's just
5133 make them valid buffer contents, and finish conversion. */
fb88bf2d 5134 inserted += len_byte;
d46c5b12 5135 inserted_byte += len_byte;
fb88bf2d 5136 while (len_byte--)
ee59c65f 5137 *dst++ = *src++;
d46c5b12
KH
5138 break;
5139 }
9864ebce
KH
5140 if (result == CODING_FINISH_INTERRUPT)
5141 {
5142 /* The conversion procedure was interrupted by a user. */
9864ebce
KH
5143 break;
5144 }
5145 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5146 if (coding->consumed < 1)
5147 {
5148 /* It's quite strange to require more memory without
5149 consuming any bytes. Perhaps CCL program bug. */
9864ebce
KH
5150 break;
5151 }
fb88bf2d
KH
5152 if (first)
5153 {
5154 /* We have just done the first batch of conversion which was
5155 stoped because of insufficient gap. Let's reconsider the
5156 required gap size (i.e. SRT - DST) now.
5157
5158 We have converted ORIG bytes (== coding->consumed) into
5159 NEW bytes (coding->produced). To convert the remaining
5160 LEN bytes, we may need REQUIRE bytes of gap, where:
5161 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5162 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5163 Here, we are sure that NEW >= ORIG. */
6e44253b
KH
5164 float ratio = coding->produced - coding->consumed;
5165 ratio /= coding->consumed;
5166 require = len_byte * ratio;
fb88bf2d
KH
5167 first = 0;
5168 }
5169 if ((src - dst) < (require + 2000))
5170 {
5171 /* See the comment above the previous call of make_gap. */
5172 int add = len_byte + inserted_byte;
5173
5174 GAP_SIZE -= add;
5175 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5176 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5177 make_gap (require + 2000);
5178 GAP_SIZE += add;
5179 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5180 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
fb88bf2d 5181 }
d46c5b12 5182 }
fb88bf2d
KH
5183 if (src - dst > 0) *dst = 0; /* Put an anchor. */
5184
b73bfc1c
KH
5185 if (encodep && coding->dst_multibyte)
5186 {
5187 /* The output is unibyte. We must convert 8-bit characters to
5188 multibyte form. */
5189 if (inserted_byte * 2 > GAP_SIZE)
5190 {
5191 GAP_SIZE -= inserted_byte;
5192 ZV += inserted_byte; Z += inserted_byte;
5193 ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
5194 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5195 make_gap (inserted_byte - GAP_SIZE);
5196 GAP_SIZE += inserted_byte;
5197 ZV -= inserted_byte; Z -= inserted_byte;
5198 ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
5199 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5200 }
5201 inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
5202 }
7553d0e1 5203
12410ef1
KH
5204 /* If we have shrinked the conversion area, adjust it now. */
5205 if (total_skip > 0)
5206 {
5207 if (tail_skip > 0)
5208 safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
5209 inserted += total_skip; inserted_byte += total_skip;
5210 GAP_SIZE += total_skip;
5211 GPT -= head_skip; GPT_BYTE -= head_skip;
5212 ZV -= total_skip; ZV_BYTE -= total_skip;
5213 Z -= total_skip; Z_BYTE -= total_skip;
5214 from -= head_skip; from_byte -= head_skip;
5215 to += tail_skip; to_byte += tail_skip;
5216 }
5217
6abb9bd9 5218 prev_Z = Z;
12410ef1 5219 adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
6abb9bd9 5220 inserted = Z - prev_Z;
4ed46869 5221
ec6d2bb8
KH
5222 if (!encodep && coding->cmp_data && coding->cmp_data->used)
5223 coding_restore_composition (coding, Fcurrent_buffer ());
5224 coding_free_composition_data (coding);
5225
b73bfc1c
KH
5226 if (! inhibit_pre_post_conversion
5227 && ! encodep && ! NILP (coding->post_read_conversion))
d46c5b12 5228 {
2b4f9037 5229 Lisp_Object val;
b843d1ae 5230 int count = specpdl_ptr - specpdl;
4ed46869 5231
e133c8fa
KH
5232 if (from != PT)
5233 TEMP_SET_PT_BOTH (from, from_byte);
6abb9bd9 5234 prev_Z = Z;
b843d1ae
KH
5235 record_unwind_protect (code_convert_region_unwind, Qnil);
5236 /* We should not call any more pre-write/post-read-conversion
5237 functions while this post-read-conversion is running. */
5238 inhibit_pre_post_conversion = 1;
2b4f9037 5239 val = call1 (coding->post_read_conversion, make_number (inserted));
b843d1ae
KH
5240 inhibit_pre_post_conversion = 0;
5241 /* Discard the unwind protect. */
5242 specpdl_ptr--;
6abb9bd9 5243 CHECK_NUMBER (val, 0);
944bd420 5244 inserted += Z - prev_Z;
e133c8fa
KH
5245 }
5246
5247 if (orig_point >= from)
5248 {
5249 if (orig_point >= from + orig_len)
5250 orig_point += inserted - orig_len;
5251 else
5252 orig_point = from;
5253 TEMP_SET_PT (orig_point);
d46c5b12 5254 }
4ed46869 5255
ec6d2bb8
KH
5256 if (replace)
5257 {
5258 signal_after_change (from, to - from, inserted);
e19539f1 5259 update_compositions (from, from + inserted, CHECK_BORDER);
ec6d2bb8 5260 }
2b4f9037 5261
fb88bf2d 5262 {
12410ef1
KH
5263 coding->consumed = to_byte - from_byte;
5264 coding->consumed_char = to - from;
5265 coding->produced = inserted_byte;
5266 coding->produced_char = inserted;
fb88bf2d 5267 }
7553d0e1 5268
fb88bf2d 5269 return 0;
d46c5b12
KH
5270}
5271
5272Lisp_Object
b73bfc1c
KH
5273run_pre_post_conversion_on_str (str, coding, encodep)
5274 Lisp_Object str;
5275 struct coding_system *coding;
5276 int encodep;
5277{
5278 int count = specpdl_ptr - specpdl;
5279 struct gcpro gcpro1;
5280 struct buffer *prev = current_buffer;
5281 int multibyte = STRING_MULTIBYTE (str);
5282
5283 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
5284 record_unwind_protect (code_convert_region_unwind, Qnil);
5285 GCPRO1 (str);
5286 temp_output_buffer_setup (" *code-converting-work*");
5287 set_buffer_internal (XBUFFER (Vstandard_output));
5288 /* We must insert the contents of STR as is without
5289 unibyte<->multibyte conversion. For that, we adjust the
5290 multibyteness of the working buffer to that of STR. */
5291 Ferase_buffer ();
5292 current_buffer->enable_multibyte_characters = multibyte ? Qt : Qnil;
5293 insert_from_string (str, 0, 0,
5294 XSTRING (str)->size, STRING_BYTES (XSTRING (str)), 0);
5295 UNGCPRO;
5296 inhibit_pre_post_conversion = 1;
5297 if (encodep)
5298 call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
5299 else
6bac5b12
KH
5300 {
5301 TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
5302 call1 (coding->post_read_conversion, make_number (Z - BEG));
5303 }
b73bfc1c 5304 inhibit_pre_post_conversion = 0;
78108bcd 5305 str = make_buffer_string (BEG, Z, 1);
b73bfc1c
KH
5306 return unbind_to (count, str);
5307}
5308
5309Lisp_Object
5310decode_coding_string (str, coding, nocopy)
d46c5b12 5311 Lisp_Object str;
4ed46869 5312 struct coding_system *coding;
b73bfc1c 5313 int nocopy;
4ed46869 5314{
d46c5b12 5315 int len;
73be902c 5316 struct conversion_buffer buf;
b73bfc1c 5317 int from, to, to_byte;
d46c5b12 5318 struct gcpro gcpro1;
84d60297 5319 Lisp_Object saved_coding_symbol;
d46c5b12 5320 int result;
78108bcd 5321 int require_decoding;
73be902c
KH
5322 int shrinked_bytes = 0;
5323 Lisp_Object newstr;
2391eaa4 5324 int consumed, consumed_char, produced, produced_char;
4ed46869 5325
b73bfc1c
KH
5326 from = 0;
5327 to = XSTRING (str)->size;
5328 to_byte = STRING_BYTES (XSTRING (str));
4ed46869 5329
b73bfc1c
KH
5330 saved_coding_symbol = Qnil;
5331 if (CODING_REQUIRE_DETECTION (coding))
d46c5b12
KH
5332 {
5333 /* See the comments in code_convert_region. */
5334 if (coding->type == coding_type_undecided)
5335 {
5336 detect_coding (coding, XSTRING (str)->data, to_byte);
5337 if (coding->type == coding_type_undecided)
5338 coding->type = coding_type_emacs_mule;
5339 }
aaaf0b1e
KH
5340 if (coding->eol_type == CODING_EOL_UNDECIDED
5341 && coding->type != coding_type_ccl)
d46c5b12
KH
5342 {
5343 saved_coding_symbol = coding->symbol;
5344 detect_eol (coding, XSTRING (str)->data, to_byte);
5345 if (coding->eol_type == CODING_EOL_UNDECIDED)
5346 coding->eol_type = CODING_EOL_LF;
5347 /* We had better recover the original eol format if we
5348 encounter an inconsitent eol format while decoding. */
5349 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5350 }
5351 }
4ed46869 5352
e2c06b17
KH
5353 coding->src_multibyte = 0;
5354 coding->dst_multibyte = (coding->type != coding_type_no_conversion
5355 && coding->type != coding_type_raw_text);
78108bcd 5356 require_decoding = CODING_REQUIRE_DECODING (coding);
ec6d2bb8 5357
b73bfc1c 5358 if (STRING_MULTIBYTE (str))
d46c5b12 5359 {
b73bfc1c
KH
5360 /* Decoding routines expect the source text to be unibyte. */
5361 str = Fstring_as_unibyte (str);
86af83a9 5362 to_byte = STRING_BYTES (XSTRING (str));
b73bfc1c 5363 nocopy = 1;
b73bfc1c 5364 }
ec6d2bb8 5365
b73bfc1c 5366 /* Try to skip the heading and tailing ASCIIs. */
78108bcd 5367 if (require_decoding && coding->type != coding_type_ccl)
4956c225 5368 {
4956c225
KH
5369 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5370 0);
5371 if (from == to_byte)
78108bcd 5372 require_decoding = 0;
73be902c 5373 shrinked_bytes = from + (STRING_BYTES (XSTRING (str)) - to_byte);
4956c225 5374 }
b73bfc1c 5375
78108bcd
KH
5376 if (!require_decoding)
5377 {
5378 coding->consumed = STRING_BYTES (XSTRING (str));
5379 coding->consumed_char = XSTRING (str)->size;
5380 if (coding->dst_multibyte)
5381 {
5382 str = Fstring_as_multibyte (str);
5383 nocopy = 1;
5384 }
5385 coding->produced = STRING_BYTES (XSTRING (str));
5386 coding->produced_char = XSTRING (str)->size;
5387 return (nocopy ? str : Fcopy_sequence (str));
5388 }
5389
5390 if (coding->composing != COMPOSITION_DISABLED)
5391 coding_allocate_composition_data (coding, from);
b73bfc1c 5392 len = decoding_buffer_size (coding, to_byte - from);
73be902c 5393 allocate_conversion_buffer (buf, len);
4ed46869 5394
2391eaa4 5395 consumed = consumed_char = produced = produced_char = 0;
73be902c 5396 while (1)
4ed46869 5397 {
73be902c
KH
5398 result = decode_coding (coding, XSTRING (str)->data + from + consumed,
5399 buf.data + produced, to_byte - from - consumed,
5400 buf.size - produced);
5401 consumed += coding->consumed;
2391eaa4 5402 consumed_char += coding->consumed_char;
73be902c
KH
5403 produced += coding->produced;
5404 produced_char += coding->produced_char;
2391eaa4
KH
5405 if (result == CODING_FINISH_NORMAL
5406 || (result == CODING_FINISH_INSUFFICIENT_SRC
5407 && coding->consumed == 0))
73be902c
KH
5408 break;
5409 if (result == CODING_FINISH_INSUFFICIENT_CMP)
5410 coding_allocate_composition_data (coding, from + produced_char);
5411 else if (result == CODING_FINISH_INSUFFICIENT_DST)
5412 extend_conversion_buffer (&buf);
5413 else if (result == CODING_FINISH_INCONSISTENT_EOL)
5414 {
5415 /* Recover the original EOL format. */
5416 if (coding->eol_type == CODING_EOL_CR)
5417 {
5418 unsigned char *p;
5419 for (p = buf.data; p < buf.data + produced; p++)
5420 if (*p == '\n') *p = '\r';
5421 }
5422 else if (coding->eol_type == CODING_EOL_CRLF)
5423 {
5424 int num_eol = 0;
5425 unsigned char *p0, *p1;
5426 for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
5427 if (*p0 == '\n') num_eol++;
5428 if (produced + num_eol >= buf.size)
5429 extend_conversion_buffer (&buf);
5430 for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
5431 {
5432 *--p1 = *--p0;
5433 if (*p0 == '\n') *--p1 = '\r';
5434 }
5435 produced += num_eol;
5436 produced_char += num_eol;
5437 }
5438 coding->eol_type = CODING_EOL_LF;
5439 coding->symbol = saved_coding_symbol;
5440 }
4ed46869 5441 }
d46c5b12 5442
2391eaa4
KH
5443 coding->consumed = consumed;
5444 coding->consumed_char = consumed_char;
5445 coding->produced = produced;
5446 coding->produced_char = produced_char;
5447
78108bcd 5448 if (coding->dst_multibyte)
73be902c
KH
5449 newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
5450 produced + shrinked_bytes);
78108bcd 5451 else
73be902c
KH
5452 newstr = make_uninit_string (produced + shrinked_bytes);
5453 if (from > 0)
5454 bcopy (XSTRING (str)->data, XSTRING (newstr)->data, from);
5455 bcopy (buf.data, XSTRING (newstr)->data + from, produced);
5456 if (shrinked_bytes > from)
5457 bcopy (XSTRING (str)->data + to_byte,
5458 XSTRING (newstr)->data + from + produced,
5459 shrinked_bytes - from);
5460 free_conversion_buffer (&buf);
b73bfc1c
KH
5461
5462 if (coding->cmp_data && coding->cmp_data->used)
73be902c 5463 coding_restore_composition (coding, newstr);
b73bfc1c
KH
5464 coding_free_composition_data (coding);
5465
5466 if (SYMBOLP (coding->post_read_conversion)
5467 && !NILP (Ffboundp (coding->post_read_conversion)))
73be902c 5468 newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
b73bfc1c 5469
73be902c 5470 return newstr;
b73bfc1c
KH
5471}
5472
5473Lisp_Object
5474encode_coding_string (str, coding, nocopy)
5475 Lisp_Object str;
5476 struct coding_system *coding;
5477 int nocopy;
5478{
5479 int len;
73be902c 5480 struct conversion_buffer buf;
b73bfc1c
KH
5481 int from, to, to_byte;
5482 struct gcpro gcpro1;
5483 Lisp_Object saved_coding_symbol;
5484 int result;
73be902c
KH
5485 int shrinked_bytes = 0;
5486 Lisp_Object newstr;
2391eaa4 5487 int consumed, consumed_char, produced, produced_char;
b73bfc1c
KH
5488
5489 if (SYMBOLP (coding->pre_write_conversion)
5490 && !NILP (Ffboundp (coding->pre_write_conversion)))
6bac5b12 5491 str = run_pre_post_conversion_on_str (str, coding, 1);
b73bfc1c
KH
5492
5493 from = 0;
5494 to = XSTRING (str)->size;
5495 to_byte = STRING_BYTES (XSTRING (str));
5496
5497 saved_coding_symbol = Qnil;
e2c06b17
KH
5498
5499 /* Encoding routines determine the multibyteness of the source text
5500 by coding->src_multibyte. */
5501 coding->src_multibyte = STRING_MULTIBYTE (str);
5502 coding->dst_multibyte = 0;
b73bfc1c 5503 if (! CODING_REQUIRE_ENCODING (coding))
826bfb8b 5504 {
2391eaa4
KH
5505 coding->consumed = STRING_BYTES (XSTRING (str));
5506 coding->consumed_char = XSTRING (str)->size;
b73bfc1c
KH
5507 if (STRING_MULTIBYTE (str))
5508 {
5509 str = Fstring_as_unibyte (str);
5510 nocopy = 1;
5511 }
2391eaa4
KH
5512 coding->produced = STRING_BYTES (XSTRING (str));
5513 coding->produced_char = XSTRING (str)->size;
b73bfc1c 5514 return (nocopy ? str : Fcopy_sequence (str));
826bfb8b
KH
5515 }
5516
b73bfc1c
KH
5517 if (coding->composing != COMPOSITION_DISABLED)
5518 coding_save_composition (coding, from, to, str);
ec6d2bb8 5519
b73bfc1c 5520 /* Try to skip the heading and tailing ASCIIs. */
4956c225
KH
5521 if (coding->type != coding_type_ccl)
5522 {
4956c225
KH
5523 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5524 1);
5525 if (from == to_byte)
5526 return (nocopy ? str : Fcopy_sequence (str));
73be902c 5527 shrinked_bytes = from + (STRING_BYTES (XSTRING (str)) - to_byte);
4956c225 5528 }
b73bfc1c
KH
5529
5530 len = encoding_buffer_size (coding, to_byte - from);
73be902c
KH
5531 allocate_conversion_buffer (buf, len);
5532
2391eaa4 5533 consumed = consumed_char = produced = produced_char = 0;
73be902c
KH
5534 while (1)
5535 {
5536 result = encode_coding (coding, XSTRING (str)->data + from + consumed,
5537 buf.data + produced, to_byte - from - consumed,
5538 buf.size - produced);
5539 consumed += coding->consumed;
2391eaa4 5540 consumed_char += coding->consumed_char;
13004bef 5541 produced += coding->produced;
2391eaa4
KH
5542 produced_char += coding->produced_char;
5543 if (result == CODING_FINISH_NORMAL
5544 || (result == CODING_FINISH_INSUFFICIENT_SRC
5545 && coding->consumed == 0))
73be902c
KH
5546 break;
5547 /* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
5548 extend_conversion_buffer (&buf);
5549 }
5550
2391eaa4
KH
5551 coding->consumed = consumed;
5552 coding->consumed_char = consumed_char;
5553 coding->produced = produced;
5554 coding->produced_char = produced_char;
5555
73be902c 5556 newstr = make_uninit_string (produced + shrinked_bytes);
b73bfc1c 5557 if (from > 0)
73be902c
KH
5558 bcopy (XSTRING (str)->data, XSTRING (newstr)->data, from);
5559 bcopy (buf.data, XSTRING (newstr)->data + from, produced);
5560 if (shrinked_bytes > from)
5561 bcopy (XSTRING (str)->data + to_byte,
5562 XSTRING (newstr)->data + from + produced,
5563 shrinked_bytes - from);
5564
5565 free_conversion_buffer (&buf);
ec6d2bb8 5566 coding_free_composition_data (coding);
b73bfc1c 5567
73be902c 5568 return newstr;
4ed46869
KH
5569}
5570
5571\f
5572#ifdef emacs
1397dc18 5573/*** 8. Emacs Lisp library functions ***/
4ed46869 5574
4ed46869
KH
5575DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
5576 "Return t if OBJECT is nil or a coding-system.\n\
3a73fa5d
RS
5577See the documentation of `make-coding-system' for information\n\
5578about coding-system objects.")
4ed46869
KH
5579 (obj)
5580 Lisp_Object obj;
5581{
4608c386
KH
5582 if (NILP (obj))
5583 return Qt;
5584 if (!SYMBOLP (obj))
5585 return Qnil;
5586 /* Get coding-spec vector for OBJ. */
5587 obj = Fget (obj, Qcoding_system);
5588 return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
5589 ? Qt : Qnil);
4ed46869
KH
5590}
5591
9d991de8
RS
5592DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
5593 Sread_non_nil_coding_system, 1, 1, 0,
e0e989f6 5594 "Read a coding system from the minibuffer, prompting with string PROMPT.")
4ed46869
KH
5595 (prompt)
5596 Lisp_Object prompt;
5597{
e0e989f6 5598 Lisp_Object val;
9d991de8
RS
5599 do
5600 {
4608c386
KH
5601 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
5602 Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
9d991de8
RS
5603 }
5604 while (XSTRING (val)->size == 0);
e0e989f6 5605 return (Fintern (val, Qnil));
4ed46869
KH
5606}
5607
9b787f3e
RS
5608DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
5609 "Read a coding system from the minibuffer, prompting with string PROMPT.\n\
5610If the user enters null input, return second argument DEFAULT-CODING-SYSTEM.")
5611 (prompt, default_coding_system)
5612 Lisp_Object prompt, default_coding_system;
4ed46869 5613{
f44d27ce 5614 Lisp_Object val;
9b787f3e
RS
5615 if (SYMBOLP (default_coding_system))
5616 XSETSTRING (default_coding_system, XSYMBOL (default_coding_system)->name);
4608c386 5617 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
9b787f3e
RS
5618 Qt, Qnil, Qcoding_system_history,
5619 default_coding_system, Qnil);
e0e989f6 5620 return (XSTRING (val)->size == 0 ? Qnil : Fintern (val, Qnil));
4ed46869
KH
5621}
5622
5623DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
5624 1, 1, 0,
5625 "Check validity of CODING-SYSTEM.\n\
3a73fa5d
RS
5626If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.\n\
5627It is valid if it is a symbol with a non-nil `coding-system' property.\n\
4ed46869
KH
5628The value of property should be a vector of length 5.")
5629 (coding_system)
5630 Lisp_Object coding_system;
5631{
5632 CHECK_SYMBOL (coding_system, 0);
5633 if (!NILP (Fcoding_system_p (coding_system)))
5634 return coding_system;
5635 while (1)
02ba4723 5636 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
4ed46869 5637}
3a73fa5d 5638\f
d46c5b12 5639Lisp_Object
0a28aafb 5640detect_coding_system (src, src_bytes, highest, multibytep)
d46c5b12
KH
5641 unsigned char *src;
5642 int src_bytes, highest;
0a28aafb 5643 int multibytep;
4ed46869
KH
5644{
5645 int coding_mask, eol_type;
d46c5b12
KH
5646 Lisp_Object val, tmp;
5647 int dummy;
4ed46869 5648
0a28aafb 5649 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
d46c5b12
KH
5650 eol_type = detect_eol_type (src, src_bytes, &dummy);
5651 if (eol_type == CODING_EOL_INCONSISTENT)
25b02698 5652 eol_type = CODING_EOL_UNDECIDED;
4ed46869 5653
d46c5b12 5654 if (!coding_mask)
4ed46869 5655 {
27901516 5656 val = Qundecided;
d46c5b12 5657 if (eol_type != CODING_EOL_UNDECIDED)
4ed46869 5658 {
f44d27ce
RS
5659 Lisp_Object val2;
5660 val2 = Fget (Qundecided, Qeol_type);
4ed46869
KH
5661 if (VECTORP (val2))
5662 val = XVECTOR (val2)->contents[eol_type];
5663 }
80e803b4 5664 return (highest ? val : Fcons (val, Qnil));
4ed46869 5665 }
4ed46869 5666
d46c5b12
KH
5667 /* At first, gather possible coding systems in VAL. */
5668 val = Qnil;
fa42c37f 5669 for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5670 {
fa42c37f
KH
5671 Lisp_Object category_val, category_index;
5672
5673 category_index = Fget (XCAR (tmp), Qcoding_category_index);
5674 category_val = Fsymbol_value (XCAR (tmp));
5675 if (!NILP (category_val)
5676 && NATNUMP (category_index)
5677 && (coding_mask & (1 << XFASTINT (category_index))))
4ed46869 5678 {
fa42c37f 5679 val = Fcons (category_val, val);
d46c5b12
KH
5680 if (highest)
5681 break;
4ed46869
KH
5682 }
5683 }
d46c5b12
KH
5684 if (!highest)
5685 val = Fnreverse (val);
4ed46869 5686
65059037 5687 /* Then, replace the elements with subsidiary coding systems. */
fa42c37f 5688 for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5689 {
65059037
RS
5690 if (eol_type != CODING_EOL_UNDECIDED
5691 && eol_type != CODING_EOL_INCONSISTENT)
4ed46869 5692 {
d46c5b12 5693 Lisp_Object eol;
03699b14 5694 eol = Fget (XCAR (tmp), Qeol_type);
d46c5b12 5695 if (VECTORP (eol))
03699b14 5696 XCAR (tmp) = XVECTOR (eol)->contents[eol_type];
4ed46869
KH
5697 }
5698 }
03699b14 5699 return (highest ? XCAR (val) : val);
d46c5b12 5700}
4ed46869 5701
d46c5b12
KH
5702DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
5703 2, 3, 0,
5704 "Detect coding system of the text in the region between START and END.\n\
5705Return a list of possible coding systems ordered by priority.\n\
5706\n\
80e803b4
KH
5707If only ASCII characters are found, it returns a list of single element\n\
5708`undecided' or its subsidiary coding system according to a detected\n\
5709end-of-line format.\n\
d46c5b12
KH
5710\n\
5711If optional argument HIGHEST is non-nil, return the coding system of\n\
5712highest priority.")
5713 (start, end, highest)
5714 Lisp_Object start, end, highest;
5715{
5716 int from, to;
5717 int from_byte, to_byte;
6289dd10 5718
d46c5b12
KH
5719 CHECK_NUMBER_COERCE_MARKER (start, 0);
5720 CHECK_NUMBER_COERCE_MARKER (end, 1);
4ed46869 5721
d46c5b12
KH
5722 validate_region (&start, &end);
5723 from = XINT (start), to = XINT (end);
5724 from_byte = CHAR_TO_BYTE (from);
5725 to_byte = CHAR_TO_BYTE (to);
6289dd10 5726
d46c5b12
KH
5727 if (from < GPT && to >= GPT)
5728 move_gap_both (to, to_byte);
4ed46869 5729
d46c5b12
KH
5730 return detect_coding_system (BYTE_POS_ADDR (from_byte),
5731 to_byte - from_byte,
0a28aafb
KH
5732 !NILP (highest),
5733 !NILP (current_buffer
5734 ->enable_multibyte_characters));
d46c5b12 5735}
6289dd10 5736
d46c5b12
KH
5737DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
5738 1, 2, 0,
5739 "Detect coding system of the text in STRING.\n\
5740Return a list of possible coding systems ordered by priority.\n\
5741\n\
80e803b4
KH
5742If only ASCII characters are found, it returns a list of single element\n\
5743`undecided' or its subsidiary coding system according to a detected\n\
5744end-of-line format.\n\
d46c5b12
KH
5745\n\
5746If optional argument HIGHEST is non-nil, return the coding system of\n\
5747highest priority.")
5748 (string, highest)
5749 Lisp_Object string, highest;
5750{
5751 CHECK_STRING (string, 0);
4ed46869 5752
d46c5b12 5753 return detect_coding_system (XSTRING (string)->data,
fc932ac6 5754 STRING_BYTES (XSTRING (string)),
0a28aafb
KH
5755 !NILP (highest),
5756 STRING_MULTIBYTE (string));
4ed46869
KH
5757}
5758
05e6f5dc
KH
5759/* Return an intersection of lists L1 and L2. */
5760
5761static Lisp_Object
5762intersection (l1, l2)
5763 Lisp_Object l1, l2;
5764{
5765 Lisp_Object val;
5766
5767 for (val = Qnil; CONSP (l1); l1 = XCDR (l1))
5768 {
5769 if (!NILP (Fmemq (XCAR (l1), l2)))
5770 val = Fcons (XCAR (l1), val);
5771 }
5772 return val;
5773}
5774
5775
5776/* Subroutine for Fsafe_coding_systems_region_internal.
5777
5778 Return a list of coding systems that safely encode the multibyte
5779 text between P and PEND. SAFE_CODINGS, if non-nil, is a list of
5780 possible coding systems. If it is nil, it means that we have not
5781 yet found any coding systems.
5782
5783 WORK_TABLE is a copy of the char-table Vchar_coding_system_table. An
5784 element of WORK_TABLE is set to t once the element is looked up.
5785
5786 If a non-ASCII single byte char is found, set
5787 *single_byte_char_found to 1. */
5788
5789static Lisp_Object
5790find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
5791 unsigned char *p, *pend;
5792 Lisp_Object safe_codings, work_table;
5793 int *single_byte_char_found;
5794{
5795 int c, len, idx;
5796 Lisp_Object val;
5797
5798 while (p < pend)
5799 {
5800 c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
5801 p += len;
5802 if (ASCII_BYTE_P (c))
5803 /* We can ignore ASCII characters here. */
5804 continue;
5805 if (SINGLE_BYTE_CHAR_P (c))
5806 *single_byte_char_found = 1;
5807 if (NILP (safe_codings))
5808 continue;
5809 /* Check the safe coding systems for C. */
5810 val = char_table_ref_and_index (work_table, c, &idx);
5811 if (EQ (val, Qt))
5812 /* This element was already checked. Ignore it. */
5813 continue;
5814 /* Remember that we checked this element. */
975f250a 5815 CHAR_TABLE_SET (work_table, make_number (idx), Qt);
05e6f5dc
KH
5816
5817 /* If there are some safe coding systems for C and we have
5818 already found the other set of coding systems for the
5819 different characters, get the intersection of them. */
5820 if (!EQ (safe_codings, Qt) && !NILP (val))
5821 val = intersection (safe_codings, val);
5822 safe_codings = val;
5823 }
5824 return safe_codings;
5825}
5826
5827
5828/* Return a list of coding systems that safely encode the text between
5829 START and END. If the text contains only ASCII or is unibyte,
5830 return t. */
5831
5832DEFUN ("find-coding-systems-region-internal",
5833 Ffind_coding_systems_region_internal,
5834 Sfind_coding_systems_region_internal, 2, 2, 0,
5835 "Internal use only.")
5836 (start, end)
5837 Lisp_Object start, end;
5838{
5839 Lisp_Object work_table, safe_codings;
5840 int non_ascii_p = 0;
5841 int single_byte_char_found = 0;
5842 unsigned char *p1, *p1end, *p2, *p2end, *p;
5843 Lisp_Object args[2];
5844
5845 if (STRINGP (start))
5846 {
5847 if (!STRING_MULTIBYTE (start))
5848 return Qt;
5849 p1 = XSTRING (start)->data, p1end = p1 + STRING_BYTES (XSTRING (start));
5850 p2 = p2end = p1end;
5851 if (XSTRING (start)->size != STRING_BYTES (XSTRING (start)))
5852 non_ascii_p = 1;
5853 }
5854 else
5855 {
5856 int from, to, stop;
5857
5858 CHECK_NUMBER_COERCE_MARKER (start, 0);
5859 CHECK_NUMBER_COERCE_MARKER (end, 1);
5860 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
5861 args_out_of_range (start, end);
5862 if (NILP (current_buffer->enable_multibyte_characters))
5863 return Qt;
5864 from = CHAR_TO_BYTE (XINT (start));
5865 to = CHAR_TO_BYTE (XINT (end));
5866 stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
5867 p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
5868 if (stop == to)
5869 p2 = p2end = p1end;
5870 else
5871 p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
5872 if (XINT (end) - XINT (start) != to - from)
5873 non_ascii_p = 1;
5874 }
5875
5876 if (!non_ascii_p)
5877 {
5878 /* We are sure that the text contains no multibyte character.
5879 Check if it contains eight-bit-graphic. */
5880 p = p1;
5881 for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
5882 if (p == p1end)
5883 {
5884 for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
5885 if (p == p2end)
5886 return Qt;
5887 }
5888 }
5889
5890 /* The text contains non-ASCII characters. */
5891 work_table = Fcopy_sequence (Vchar_coding_system_table);
5892 safe_codings = find_safe_codings (p1, p1end, Qt, work_table,
5893 &single_byte_char_found);
5894 if (p2 < p2end)
5895 safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
5896 &single_byte_char_found);
5897
5898 if (!single_byte_char_found)
5899 {
5900 /* Append generic coding systems. */
5901 Lisp_Object args[2];
5902 args[0] = safe_codings;
5903 args[1] = Fchar_table_extra_slot (Vchar_coding_system_table,
5904 make_number (0));
975f250a 5905 safe_codings = Fappend (2, args);
05e6f5dc
KH
5906 }
5907 else
109a5acb
KH
5908 safe_codings = Fcons (Qraw_text,
5909 Fcons (Qemacs_mule,
5910 Fcons (Qno_conversion, safe_codings)));
05e6f5dc
KH
5911 return safe_codings;
5912}
5913
5914
4031e2bf
KH
5915Lisp_Object
5916code_convert_region1 (start, end, coding_system, encodep)
d46c5b12 5917 Lisp_Object start, end, coding_system;
4031e2bf 5918 int encodep;
3a73fa5d
RS
5919{
5920 struct coding_system coding;
4031e2bf 5921 int from, to, len;
3a73fa5d 5922
d46c5b12
KH
5923 CHECK_NUMBER_COERCE_MARKER (start, 0);
5924 CHECK_NUMBER_COERCE_MARKER (end, 1);
3a73fa5d
RS
5925 CHECK_SYMBOL (coding_system, 2);
5926
d46c5b12
KH
5927 validate_region (&start, &end);
5928 from = XFASTINT (start);
5929 to = XFASTINT (end);
5930
3a73fa5d 5931 if (NILP (coding_system))
d46c5b12
KH
5932 return make_number (to - from);
5933
3a73fa5d 5934 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d46c5b12 5935 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3a73fa5d 5936
d46c5b12 5937 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5938 coding.src_multibyte = coding.dst_multibyte
5939 = !NILP (current_buffer->enable_multibyte_characters);
fb88bf2d
KH
5940 code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
5941 &coding, encodep, 1);
f072a3e8 5942 Vlast_coding_system_used = coding.symbol;
fb88bf2d 5943 return make_number (coding.produced_char);
4031e2bf
KH
5944}
5945
5946DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
5947 3, 3, "r\nzCoding system: ",
5948 "Decode the current region by specified coding system.\n\
5949When called from a program, takes three arguments:\n\
5950START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5951This function sets `last-coding-system-used' to the precise coding system\n\
5952used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5953not fully specified.)\n\
5954It returns the length of the decoded text.")
4031e2bf
KH
5955 (start, end, coding_system)
5956 Lisp_Object start, end, coding_system;
5957{
5958 return code_convert_region1 (start, end, coding_system, 0);
3a73fa5d
RS
5959}
5960
5961DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
5962 3, 3, "r\nzCoding system: ",
d46c5b12 5963 "Encode the current region by specified coding system.\n\
3a73fa5d 5964When called from a program, takes three arguments:\n\
d46c5b12 5965START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5966This function sets `last-coding-system-used' to the precise coding system\n\
5967used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5968not fully specified.)\n\
5969It returns the length of the encoded text.")
d46c5b12
KH
5970 (start, end, coding_system)
5971 Lisp_Object start, end, coding_system;
3a73fa5d 5972{
4031e2bf
KH
5973 return code_convert_region1 (start, end, coding_system, 1);
5974}
3a73fa5d 5975
4031e2bf
KH
5976Lisp_Object
5977code_convert_string1 (string, coding_system, nocopy, encodep)
5978 Lisp_Object string, coding_system, nocopy;
5979 int encodep;
5980{
5981 struct coding_system coding;
3a73fa5d 5982
4031e2bf
KH
5983 CHECK_STRING (string, 0);
5984 CHECK_SYMBOL (coding_system, 1);
4ed46869 5985
d46c5b12 5986 if (NILP (coding_system))
4031e2bf 5987 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4ed46869 5988
d46c5b12
KH
5989 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
5990 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
5f1cd180 5991
d46c5b12 5992 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5993 string = (encodep
5994 ? encode_coding_string (string, &coding, !NILP (nocopy))
5995 : decode_coding_string (string, &coding, !NILP (nocopy)));
f072a3e8 5996 Vlast_coding_system_used = coding.symbol;
ec6d2bb8
KH
5997
5998 return string;
4ed46869
KH
5999}
6000
4ed46869 6001DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
e0e989f6
KH
6002 2, 3, 0,
6003 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\
fe487a71 6004Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
6005if the decoding operation is trivial.\n\
6006This function sets `last-coding-system-used' to the precise coding system\n\
6007used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
6008not fully specified.)")
e0e989f6
KH
6009 (string, coding_system, nocopy)
6010 Lisp_Object string, coding_system, nocopy;
4ed46869 6011{
f072a3e8 6012 return code_convert_string1 (string, coding_system, nocopy, 0);
4ed46869
KH
6013}
6014
6015DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
e0e989f6
KH
6016 2, 3, 0,
6017 "Encode STRING to CODING-SYSTEM, and return the result.\n\
fe487a71 6018Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
6019if the encoding operation is trivial.\n\
6020This function sets `last-coding-system-used' to the precise coding system\n\
6021used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
6022not fully specified.)")
e0e989f6
KH
6023 (string, coding_system, nocopy)
6024 Lisp_Object string, coding_system, nocopy;
4ed46869 6025{
f072a3e8 6026 return code_convert_string1 (string, coding_system, nocopy, 1);
4ed46869 6027}
4031e2bf 6028
ecec61c1 6029/* Encode or decode STRING according to CODING_SYSTEM.
ec6d2bb8
KH
6030 Do not set Vlast_coding_system_used.
6031
6032 This function is called only from macros DECODE_FILE and
6033 ENCODE_FILE, thus we ignore character composition. */
ecec61c1
KH
6034
6035Lisp_Object
6036code_convert_string_norecord (string, coding_system, encodep)
6037 Lisp_Object string, coding_system;
6038 int encodep;
6039{
6040 struct coding_system coding;
6041
6042 CHECK_STRING (string, 0);
6043 CHECK_SYMBOL (coding_system, 1);
6044
6045 if (NILP (coding_system))
6046 return string;
6047
6048 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
6049 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
6050
ec6d2bb8 6051 coding.composing = COMPOSITION_DISABLED;
ecec61c1 6052 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
6053 return (encodep
6054 ? encode_coding_string (string, &coding, 1)
6055 : decode_coding_string (string, &coding, 1));
ecec61c1 6056}
3a73fa5d 6057\f
4ed46869 6058DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
55ab7be3 6059 "Decode a Japanese character which has CODE in shift_jis encoding.\n\
4ed46869
KH
6060Return the corresponding character.")
6061 (code)
6062 Lisp_Object code;
6063{
6064 unsigned char c1, c2, s1, s2;
6065 Lisp_Object val;
6066
6067 CHECK_NUMBER (code, 0);
6068 s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
55ab7be3
KH
6069 if (s1 == 0)
6070 {
c28a9453
KH
6071 if (s2 < 0x80)
6072 XSETFASTINT (val, s2);
6073 else if (s2 >= 0xA0 || s2 <= 0xDF)
b73bfc1c 6074 XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
c28a9453 6075 else
9da8350f 6076 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3
KH
6077 }
6078 else
6079 {
6080 if ((s1 < 0x80 || s1 > 0x9F && s1 < 0xE0 || s1 > 0xEF)
6081 || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
9da8350f 6082 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3 6083 DECODE_SJIS (s1, s2, c1, c2);
b73bfc1c 6084 XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
55ab7be3 6085 }
4ed46869
KH
6086 return val;
6087}
6088
6089DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
55ab7be3
KH
6090 "Encode a Japanese character CHAR to shift_jis encoding.\n\
6091Return the corresponding code in SJIS.")
4ed46869
KH
6092 (ch)
6093 Lisp_Object ch;
6094{
bcf26d6a 6095 int charset, c1, c2, s1, s2;
4ed46869
KH
6096 Lisp_Object val;
6097
6098 CHECK_NUMBER (ch, 0);
6099 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6100 if (charset == CHARSET_ASCII)
6101 {
6102 val = ch;
6103 }
6104 else if (charset == charset_jisx0208
6105 && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
4ed46869
KH
6106 {
6107 ENCODE_SJIS (c1, c2, s1, s2);
bcf26d6a 6108 XSETFASTINT (val, (s1 << 8) | s2);
4ed46869 6109 }
55ab7be3
KH
6110 else if (charset == charset_katakana_jisx0201
6111 && c1 > 0x20 && c2 < 0xE0)
6112 {
6113 XSETFASTINT (val, c1 | 0x80);
6114 }
4ed46869 6115 else
55ab7be3 6116 error ("Can't encode to shift_jis: %d", XFASTINT (ch));
4ed46869
KH
6117 return val;
6118}
6119
6120DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
c28a9453 6121 "Decode a Big5 character which has CODE in BIG5 coding system.\n\
4ed46869
KH
6122Return the corresponding character.")
6123 (code)
6124 Lisp_Object code;
6125{
6126 int charset;
6127 unsigned char b1, b2, c1, c2;
6128 Lisp_Object val;
6129
6130 CHECK_NUMBER (code, 0);
6131 b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
c28a9453
KH
6132 if (b1 == 0)
6133 {
6134 if (b2 >= 0x80)
9da8350f 6135 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453
KH
6136 val = code;
6137 }
6138 else
6139 {
6140 if ((b1 < 0xA1 || b1 > 0xFE)
6141 || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
9da8350f 6142 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453 6143 DECODE_BIG5 (b1, b2, charset, c1, c2);
b73bfc1c 6144 XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
c28a9453 6145 }
4ed46869
KH
6146 return val;
6147}
6148
6149DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
d46c5b12 6150 "Encode the Big5 character CHAR to BIG5 coding system.\n\
4ed46869
KH
6151Return the corresponding character code in Big5.")
6152 (ch)
6153 Lisp_Object ch;
6154{
bcf26d6a 6155 int charset, c1, c2, b1, b2;
4ed46869
KH
6156 Lisp_Object val;
6157
6158 CHECK_NUMBER (ch, 0);
6159 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6160 if (charset == CHARSET_ASCII)
6161 {
6162 val = ch;
6163 }
6164 else if ((charset == charset_big5_1
6165 && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
6166 || (charset == charset_big5_2
6167 && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
4ed46869
KH
6168 {
6169 ENCODE_BIG5 (charset, c1, c2, b1, b2);
bcf26d6a 6170 XSETFASTINT (val, (b1 << 8) | b2);
4ed46869
KH
6171 }
6172 else
c28a9453 6173 error ("Can't encode to Big5: %d", XFASTINT (ch));
4ed46869
KH
6174 return val;
6175}
3a73fa5d 6176\f
1ba9e4ab
KH
6177DEFUN ("set-terminal-coding-system-internal",
6178 Fset_terminal_coding_system_internal,
6179 Sset_terminal_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
6180 (coding_system)
6181 Lisp_Object coding_system;
6182{
6183 CHECK_SYMBOL (coding_system, 0);
6184 setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
70c22245 6185 /* We had better not send unsafe characters to terminal. */
6e85d753 6186 terminal_coding.flags |= CODING_FLAG_ISO_SAFE;
ec6d2bb8
KH
6187 /* Characer composition should be disabled. */
6188 terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
6189 terminal_coding.src_multibyte = 1;
6190 terminal_coding.dst_multibyte = 0;
4ed46869
KH
6191 return Qnil;
6192}
6193
c4825358
KH
6194DEFUN ("set-safe-terminal-coding-system-internal",
6195 Fset_safe_terminal_coding_system_internal,
6196 Sset_safe_terminal_coding_system_internal, 1, 1, 0, "")
6197 (coding_system)
6198 Lisp_Object coding_system;
6199{
6200 CHECK_SYMBOL (coding_system, 0);
6201 setup_coding_system (Fcheck_coding_system (coding_system),
6202 &safe_terminal_coding);
ec6d2bb8
KH
6203 /* Characer composition should be disabled. */
6204 safe_terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
6205 safe_terminal_coding.src_multibyte = 1;
6206 safe_terminal_coding.dst_multibyte = 0;
c4825358
KH
6207 return Qnil;
6208}
6209
4ed46869
KH
6210DEFUN ("terminal-coding-system",
6211 Fterminal_coding_system, Sterminal_coding_system, 0, 0, 0,
3a73fa5d 6212 "Return coding system specified for terminal output.")
4ed46869
KH
6213 ()
6214{
6215 return terminal_coding.symbol;
6216}
6217
1ba9e4ab
KH
6218DEFUN ("set-keyboard-coding-system-internal",
6219 Fset_keyboard_coding_system_internal,
6220 Sset_keyboard_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
6221 (coding_system)
6222 Lisp_Object coding_system;
6223{
6224 CHECK_SYMBOL (coding_system, 0);
6225 setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
ec6d2bb8
KH
6226 /* Characer composition should be disabled. */
6227 keyboard_coding.composing = COMPOSITION_DISABLED;
4ed46869
KH
6228 return Qnil;
6229}
6230
6231DEFUN ("keyboard-coding-system",
6232 Fkeyboard_coding_system, Skeyboard_coding_system, 0, 0, 0,
3a73fa5d 6233 "Return coding system specified for decoding keyboard input.")
4ed46869
KH
6234 ()
6235{
6236 return keyboard_coding.symbol;
6237}
6238
6239\f
a5d301df
KH
6240DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
6241 Sfind_operation_coding_system, 1, MANY, 0,
6242 "Choose a coding system for an operation based on the target name.\n\
69f76525 6243The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).\n\
9ce27fde
KH
6244DECODING-SYSTEM is the coding system to use for decoding\n\
6245\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system\n\
6246for encoding (in case OPERATION does encoding).\n\
ccdb79f5
RS
6247\n\
6248The first argument OPERATION specifies an I/O primitive:\n\
6249 For file I/O, `insert-file-contents' or `write-region'.\n\
6250 For process I/O, `call-process', `call-process-region', or `start-process'.\n\
6251 For network I/O, `open-network-stream'.\n\
6252\n\
6253The remaining arguments should be the same arguments that were passed\n\
6254to the primitive. Depending on which primitive, one of those arguments\n\
6255is selected as the TARGET. For example, if OPERATION does file I/O,\n\
6256whichever argument specifies the file name is TARGET.\n\
6257\n\
6258TARGET has a meaning which depends on OPERATION:\n\
4ed46869
KH
6259 For file I/O, TARGET is a file name.\n\
6260 For process I/O, TARGET is a process name.\n\
6261 For network I/O, TARGET is a service name or a port number\n\
6262\n\
02ba4723
KH
6263This function looks up what specified for TARGET in,\n\
6264`file-coding-system-alist', `process-coding-system-alist',\n\
6265or `network-coding-system-alist' depending on OPERATION.\n\
6266They may specify a coding system, a cons of coding systems,\n\
6267or a function symbol to call.\n\
6268In the last case, we call the function with one argument,\n\
9ce27fde 6269which is a list of all the arguments given to this function.")
4ed46869
KH
6270 (nargs, args)
6271 int nargs;
6272 Lisp_Object *args;
6273{
6274 Lisp_Object operation, target_idx, target, val;
6275 register Lisp_Object chain;
6276
6277 if (nargs < 2)
6278 error ("Too few arguments");
6279 operation = args[0];
6280 if (!SYMBOLP (operation)
6281 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
6282 error ("Invalid first arguement");
6283 if (nargs < 1 + XINT (target_idx))
6284 error ("Too few arguments for operation: %s",
6285 XSYMBOL (operation)->name->data);
6286 target = args[XINT (target_idx) + 1];
6287 if (!(STRINGP (target)
6288 || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
6289 error ("Invalid %dth argument", XINT (target_idx) + 1);
6290
2e34157c
RS
6291 chain = ((EQ (operation, Qinsert_file_contents)
6292 || EQ (operation, Qwrite_region))
02ba4723 6293 ? Vfile_coding_system_alist
2e34157c 6294 : (EQ (operation, Qopen_network_stream)
02ba4723
KH
6295 ? Vnetwork_coding_system_alist
6296 : Vprocess_coding_system_alist));
4ed46869
KH
6297 if (NILP (chain))
6298 return Qnil;
6299
03699b14 6300 for (; CONSP (chain); chain = XCDR (chain))
4ed46869 6301 {
f44d27ce 6302 Lisp_Object elt;
03699b14 6303 elt = XCAR (chain);
4ed46869
KH
6304
6305 if (CONSP (elt)
6306 && ((STRINGP (target)
03699b14
KR
6307 && STRINGP (XCAR (elt))
6308 && fast_string_match (XCAR (elt), target) >= 0)
6309 || (INTEGERP (target) && EQ (target, XCAR (elt)))))
02ba4723 6310 {
03699b14 6311 val = XCDR (elt);
b19fd4c5
KH
6312 /* Here, if VAL is both a valid coding system and a valid
6313 function symbol, we return VAL as a coding system. */
02ba4723
KH
6314 if (CONSP (val))
6315 return val;
6316 if (! SYMBOLP (val))
6317 return Qnil;
6318 if (! NILP (Fcoding_system_p (val)))
6319 return Fcons (val, val);
b19fd4c5
KH
6320 if (! NILP (Ffboundp (val)))
6321 {
6322 val = call1 (val, Flist (nargs, args));
6323 if (CONSP (val))
6324 return val;
6325 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
6326 return Fcons (val, val);
6327 }
02ba4723
KH
6328 return Qnil;
6329 }
4ed46869
KH
6330 }
6331 return Qnil;
6332}
6333
1397dc18
KH
6334DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
6335 Supdate_coding_systems_internal, 0, 0, 0,
6336 "Update internal database for ISO2022 and CCL based coding systems.\n\
fa42c37f
KH
6337When values of any coding categories are changed, you must\n\
6338call this function")
d46c5b12
KH
6339 ()
6340{
6341 int i;
6342
fa42c37f 6343 for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
d46c5b12 6344 {
1397dc18
KH
6345 Lisp_Object val;
6346
6347 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value;
6348 if (!NILP (val))
6349 {
6350 if (! coding_system_table[i])
6351 coding_system_table[i] = ((struct coding_system *)
6352 xmalloc (sizeof (struct coding_system)));
6353 setup_coding_system (val, coding_system_table[i]);
6354 }
6355 else if (coding_system_table[i])
6356 {
6357 xfree (coding_system_table[i]);
6358 coding_system_table[i] = NULL;
6359 }
d46c5b12 6360 }
1397dc18 6361
d46c5b12
KH
6362 return Qnil;
6363}
6364
66cfb530
KH
6365DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
6366 Sset_coding_priority_internal, 0, 0, 0,
6367 "Update internal database for the current value of `coding-category-list'.\n\
6368This function is internal use only.")
6369 ()
6370{
6371 int i = 0, idx;
84d60297
RS
6372 Lisp_Object val;
6373
6374 val = Vcoding_category_list;
66cfb530
KH
6375
6376 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
6377 {
03699b14 6378 if (! SYMBOLP (XCAR (val)))
66cfb530 6379 break;
03699b14 6380 idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
66cfb530
KH
6381 if (idx >= CODING_CATEGORY_IDX_MAX)
6382 break;
6383 coding_priorities[i++] = (1 << idx);
03699b14 6384 val = XCDR (val);
66cfb530
KH
6385 }
6386 /* If coding-category-list is valid and contains all coding
6387 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
fa42c37f 6388 the following code saves Emacs from crashing. */
66cfb530
KH
6389 while (i < CODING_CATEGORY_IDX_MAX)
6390 coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
6391
6392 return Qnil;
6393}
6394
4ed46869
KH
6395#endif /* emacs */
6396
6397\f
1397dc18 6398/*** 9. Post-amble ***/
4ed46869 6399
dfcf069d 6400void
4ed46869
KH
6401init_coding_once ()
6402{
6403 int i;
6404
0ef69138 6405 /* Emacs' internal format specific initialize routine. */
4ed46869
KH
6406 for (i = 0; i <= 0x20; i++)
6407 emacs_code_class[i] = EMACS_control_code;
6408 emacs_code_class[0x0A] = EMACS_linefeed_code;
6409 emacs_code_class[0x0D] = EMACS_carriage_return_code;
6410 for (i = 0x21 ; i < 0x7F; i++)
6411 emacs_code_class[i] = EMACS_ascii_code;
6412 emacs_code_class[0x7F] = EMACS_control_code;
ec6d2bb8 6413 for (i = 0x80; i < 0xFF; i++)
4ed46869
KH
6414 emacs_code_class[i] = EMACS_invalid_code;
6415 emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
6416 emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
6417 emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
6418 emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;
6419
6420 /* ISO2022 specific initialize routine. */
6421 for (i = 0; i < 0x20; i++)
b73bfc1c 6422 iso_code_class[i] = ISO_control_0;
4ed46869
KH
6423 for (i = 0x21; i < 0x7F; i++)
6424 iso_code_class[i] = ISO_graphic_plane_0;
6425 for (i = 0x80; i < 0xA0; i++)
b73bfc1c 6426 iso_code_class[i] = ISO_control_1;
4ed46869
KH
6427 for (i = 0xA1; i < 0xFF; i++)
6428 iso_code_class[i] = ISO_graphic_plane_1;
6429 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
6430 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
6431 iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
6432 iso_code_class[ISO_CODE_SO] = ISO_shift_out;
6433 iso_code_class[ISO_CODE_SI] = ISO_shift_in;
6434 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
6435 iso_code_class[ISO_CODE_ESC] = ISO_escape;
6436 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
6437 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
6438 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
6439
e0e989f6
KH
6440 setup_coding_system (Qnil, &keyboard_coding);
6441 setup_coding_system (Qnil, &terminal_coding);
c4825358 6442 setup_coding_system (Qnil, &safe_terminal_coding);
6bc51348 6443 setup_coding_system (Qnil, &default_buffer_file_coding);
9ce27fde 6444
d46c5b12
KH
6445 bzero (coding_system_table, sizeof coding_system_table);
6446
66cfb530
KH
6447 bzero (ascii_skip_code, sizeof ascii_skip_code);
6448 for (i = 0; i < 128; i++)
6449 ascii_skip_code[i] = 1;
6450
9ce27fde
KH
6451#if defined (MSDOS) || defined (WINDOWSNT)
6452 system_eol_type = CODING_EOL_CRLF;
6453#else
6454 system_eol_type = CODING_EOL_LF;
6455#endif
b843d1ae
KH
6456
6457 inhibit_pre_post_conversion = 0;
e0e989f6
KH
6458}
6459
6460#ifdef emacs
6461
dfcf069d 6462void
e0e989f6
KH
6463syms_of_coding ()
6464{
6465 Qtarget_idx = intern ("target-idx");
6466 staticpro (&Qtarget_idx);
6467
bb0115a2
RS
6468 Qcoding_system_history = intern ("coding-system-history");
6469 staticpro (&Qcoding_system_history);
6470 Fset (Qcoding_system_history, Qnil);
6471
9ce27fde 6472 /* Target FILENAME is the first argument. */
e0e989f6 6473 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
9ce27fde 6474 /* Target FILENAME is the third argument. */
e0e989f6
KH
6475 Fput (Qwrite_region, Qtarget_idx, make_number (2));
6476
6477 Qcall_process = intern ("call-process");
6478 staticpro (&Qcall_process);
9ce27fde 6479 /* Target PROGRAM is the first argument. */
e0e989f6
KH
6480 Fput (Qcall_process, Qtarget_idx, make_number (0));
6481
6482 Qcall_process_region = intern ("call-process-region");
6483 staticpro (&Qcall_process_region);
9ce27fde 6484 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6485 Fput (Qcall_process_region, Qtarget_idx, make_number (2));
6486
6487 Qstart_process = intern ("start-process");
6488 staticpro (&Qstart_process);
9ce27fde 6489 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6490 Fput (Qstart_process, Qtarget_idx, make_number (2));
6491
6492 Qopen_network_stream = intern ("open-network-stream");
6493 staticpro (&Qopen_network_stream);
9ce27fde 6494 /* Target SERVICE is the fourth argument. */
e0e989f6
KH
6495 Fput (Qopen_network_stream, Qtarget_idx, make_number (3));
6496
4ed46869
KH
6497 Qcoding_system = intern ("coding-system");
6498 staticpro (&Qcoding_system);
6499
6500 Qeol_type = intern ("eol-type");
6501 staticpro (&Qeol_type);
6502
6503 Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
6504 staticpro (&Qbuffer_file_coding_system);
6505
6506 Qpost_read_conversion = intern ("post-read-conversion");
6507 staticpro (&Qpost_read_conversion);
6508
6509 Qpre_write_conversion = intern ("pre-write-conversion");
6510 staticpro (&Qpre_write_conversion);
6511
27901516
KH
6512 Qno_conversion = intern ("no-conversion");
6513 staticpro (&Qno_conversion);
6514
6515 Qundecided = intern ("undecided");
6516 staticpro (&Qundecided);
6517
4ed46869
KH
6518 Qcoding_system_p = intern ("coding-system-p");
6519 staticpro (&Qcoding_system_p);
6520
6521 Qcoding_system_error = intern ("coding-system-error");
6522 staticpro (&Qcoding_system_error);
6523
6524 Fput (Qcoding_system_error, Qerror_conditions,
6525 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
6526 Fput (Qcoding_system_error, Qerror_message,
9ce27fde 6527 build_string ("Invalid coding system"));
4ed46869 6528
d46c5b12
KH
6529 Qcoding_category = intern ("coding-category");
6530 staticpro (&Qcoding_category);
4ed46869
KH
6531 Qcoding_category_index = intern ("coding-category-index");
6532 staticpro (&Qcoding_category_index);
6533
d46c5b12
KH
6534 Vcoding_category_table
6535 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
6536 staticpro (&Vcoding_category_table);
4ed46869
KH
6537 {
6538 int i;
6539 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
6540 {
d46c5b12
KH
6541 XVECTOR (Vcoding_category_table)->contents[i]
6542 = intern (coding_category_name[i]);
6543 Fput (XVECTOR (Vcoding_category_table)->contents[i],
6544 Qcoding_category_index, make_number (i));
4ed46869
KH
6545 }
6546 }
6547
f967223b
KH
6548 Qtranslation_table = intern ("translation-table");
6549 staticpro (&Qtranslation_table);
1397dc18 6550 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (1));
bdd9fb48 6551
f967223b
KH
6552 Qtranslation_table_id = intern ("translation-table-id");
6553 staticpro (&Qtranslation_table_id);
84fbb8a0 6554
f967223b
KH
6555 Qtranslation_table_for_decode = intern ("translation-table-for-decode");
6556 staticpro (&Qtranslation_table_for_decode);
a5d301df 6557
f967223b
KH
6558 Qtranslation_table_for_encode = intern ("translation-table-for-encode");
6559 staticpro (&Qtranslation_table_for_encode);
a5d301df 6560
05e6f5dc
KH
6561 Qsafe_chars = intern ("safe-chars");
6562 staticpro (&Qsafe_chars);
6563
6564 Qchar_coding_system = intern ("char-coding-system");
6565 staticpro (&Qchar_coding_system);
6566
6567 /* Intern this now in case it isn't already done.
6568 Setting this variable twice is harmless.
6569 But don't staticpro it here--that is done in alloc.c. */
6570 Qchar_table_extra_slots = intern ("char-table-extra-slots");
6571 Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
6572 Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (1));
70c22245 6573
1397dc18
KH
6574 Qvalid_codes = intern ("valid-codes");
6575 staticpro (&Qvalid_codes);
6576
9ce27fde
KH
6577 Qemacs_mule = intern ("emacs-mule");
6578 staticpro (&Qemacs_mule);
6579
d46c5b12
KH
6580 Qraw_text = intern ("raw-text");
6581 staticpro (&Qraw_text);
6582
4ed46869
KH
6583 defsubr (&Scoding_system_p);
6584 defsubr (&Sread_coding_system);
6585 defsubr (&Sread_non_nil_coding_system);
6586 defsubr (&Scheck_coding_system);
6587 defsubr (&Sdetect_coding_region);
d46c5b12 6588 defsubr (&Sdetect_coding_string);
05e6f5dc 6589 defsubr (&Sfind_coding_systems_region_internal);
4ed46869
KH
6590 defsubr (&Sdecode_coding_region);
6591 defsubr (&Sencode_coding_region);
6592 defsubr (&Sdecode_coding_string);
6593 defsubr (&Sencode_coding_string);
6594 defsubr (&Sdecode_sjis_char);
6595 defsubr (&Sencode_sjis_char);
6596 defsubr (&Sdecode_big5_char);
6597 defsubr (&Sencode_big5_char);
1ba9e4ab 6598 defsubr (&Sset_terminal_coding_system_internal);
c4825358 6599 defsubr (&Sset_safe_terminal_coding_system_internal);
4ed46869 6600 defsubr (&Sterminal_coding_system);
1ba9e4ab 6601 defsubr (&Sset_keyboard_coding_system_internal);
4ed46869 6602 defsubr (&Skeyboard_coding_system);
a5d301df 6603 defsubr (&Sfind_operation_coding_system);
1397dc18 6604 defsubr (&Supdate_coding_systems_internal);
66cfb530 6605 defsubr (&Sset_coding_priority_internal);
4ed46869 6606
4608c386
KH
6607 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
6608 "List of coding systems.\n\
6609\n\
6610Do not alter the value of this variable manually. This variable should be\n\
6611updated by the functions `make-coding-system' and\n\
6612`define-coding-system-alias'.");
6613 Vcoding_system_list = Qnil;
6614
6615 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
6616 "Alist of coding system names.\n\
6617Each element is one element list of coding system name.\n\
6618This variable is given to `completing-read' as TABLE argument.\n\
6619\n\
6620Do not alter the value of this variable manually. This variable should be\n\
6621updated by the functions `make-coding-system' and\n\
6622`define-coding-system-alias'.");
6623 Vcoding_system_alist = Qnil;
6624
4ed46869
KH
6625 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
6626 "List of coding-categories (symbols) ordered by priority.");
6627 {
6628 int i;
6629
6630 Vcoding_category_list = Qnil;
6631 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
6632 Vcoding_category_list
d46c5b12
KH
6633 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
6634 Vcoding_category_list);
4ed46869
KH
6635 }
6636
6637 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
10bff6f1 6638 "Specify the coding system for read operations.\n\
2ebb362d 6639It is useful to bind this variable with `let', but do not set it globally.\n\
4ed46869 6640If the value is a coding system, it is used for decoding on read operation.\n\
a67a9c66 6641If not, an appropriate element is used from one of the coding system alists:\n\
10bff6f1 6642There are three such tables, `file-coding-system-alist',\n\
a67a9c66 6643`process-coding-system-alist', and `network-coding-system-alist'.");
4ed46869
KH
6644 Vcoding_system_for_read = Qnil;
6645
6646 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
10bff6f1 6647 "Specify the coding system for write operations.\n\
928aedd8
RS
6648Programs bind this variable with `let', but you should not set it globally.\n\
6649If the value is a coding system, it is used for encoding of output,\n\
6650when writing it to a file and when sending it to a file or subprocess.\n\
6651\n\
6652If this does not specify a coding system, an appropriate element\n\
6653is used from one of the coding system alists:\n\
10bff6f1 6654There are three such tables, `file-coding-system-alist',\n\
928aedd8
RS
6655`process-coding-system-alist', and `network-coding-system-alist'.\n\
6656For output to files, if the above procedure does not specify a coding system,\n\
6657the value of `buffer-file-coding-system' is used.");
4ed46869
KH
6658 Vcoding_system_for_write = Qnil;
6659
6660 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
a67a9c66 6661 "Coding system used in the latest file or process I/O.");
4ed46869
KH
6662 Vlast_coding_system_used = Qnil;
6663
9ce27fde 6664 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
f07f4a24 6665 "*Non-nil means always inhibit code conversion of end-of-line format.\n\
94c7a214
DL
6666See info node `Coding Systems' and info node `Text and Binary' concerning\n\
6667such conversion.");
9ce27fde
KH
6668 inhibit_eol_conversion = 0;
6669
ed29121d
EZ
6670 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
6671 "Non-nil means process buffer inherits coding system of process output.\n\
6672Bind it to t if the process output is to be treated as if it were a file\n\
6673read from some filesystem.");
6674 inherit_process_coding_system = 0;
6675
02ba4723
KH
6676 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
6677 "Alist to decide a coding system to use for a file I/O operation.\n\
6678The format is ((PATTERN . VAL) ...),\n\
6679where PATTERN is a regular expression matching a file name,\n\
6680VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6681If VAL is a coding system, it is used for both decoding and encoding\n\
6682the file contents.\n\
6683If VAL is a cons of coding systems, the car part is used for decoding,\n\
6684and the cdr part is used for encoding.\n\
6685If VAL is a function symbol, the function must return a coding system\n\
6686or a cons of coding systems which are used as above.\n\
e0e989f6 6687\n\
a85a871a 6688See also the function `find-operation-coding-system'\n\
eda284ac 6689and the variable `auto-coding-alist'.");
02ba4723
KH
6690 Vfile_coding_system_alist = Qnil;
6691
6692 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
6693 "Alist to decide a coding system to use for a process I/O operation.\n\
6694The format is ((PATTERN . VAL) ...),\n\
6695where PATTERN is a regular expression matching a program name,\n\
6696VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6697If VAL is a coding system, it is used for both decoding what received\n\
6698from the program and encoding what sent to the program.\n\
6699If VAL is a cons of coding systems, the car part is used for decoding,\n\
6700and the cdr part is used for encoding.\n\
6701If VAL is a function symbol, the function must return a coding system\n\
6702or a cons of coding systems which are used as above.\n\
4ed46869 6703\n\
9ce27fde 6704See also the function `find-operation-coding-system'.");
02ba4723
KH
6705 Vprocess_coding_system_alist = Qnil;
6706
6707 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
6708 "Alist to decide a coding system to use for a network I/O operation.\n\
6709The format is ((PATTERN . VAL) ...),\n\
6710where PATTERN is a regular expression matching a network service name\n\
6711or is a port number to connect to,\n\
6712VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6713If VAL is a coding system, it is used for both decoding what received\n\
6714from the network stream and encoding what sent to the network stream.\n\
6715If VAL is a cons of coding systems, the car part is used for decoding,\n\
6716and the cdr part is used for encoding.\n\
6717If VAL is a function symbol, the function must return a coding system\n\
6718or a cons of coding systems which are used as above.\n\
4ed46869 6719\n\
9ce27fde 6720See also the function `find-operation-coding-system'.");
02ba4723 6721 Vnetwork_coding_system_alist = Qnil;
4ed46869 6722
68c45bf0
PE
6723 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
6724 "Coding system to use with system messages.");
6725 Vlocale_coding_system = Qnil;
6726
005f0d35 6727 /* The eol mnemonics are reset in startup.el system-dependently. */
7722baf9
EZ
6728 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
6729 "*String displayed in mode line for UNIX-like (LF) end-of-line format.");
6730 eol_mnemonic_unix = build_string (":");
4ed46869 6731
7722baf9
EZ
6732 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
6733 "*String displayed in mode line for DOS-like (CRLF) end-of-line format.");
6734 eol_mnemonic_dos = build_string ("\\");
4ed46869 6735
7722baf9
EZ
6736 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
6737 "*String displayed in mode line for MAC-like (CR) end-of-line format.");
6738 eol_mnemonic_mac = build_string ("/");
4ed46869 6739
7722baf9
EZ
6740 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
6741 "*String displayed in mode line when end-of-line format is not yet determined.");
6742 eol_mnemonic_undecided = build_string (":");
4ed46869 6743
84fbb8a0 6744 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
f967223b 6745 "*Non-nil enables character translation while encoding and decoding.");
84fbb8a0 6746 Venable_character_translation = Qt;
bdd9fb48 6747
f967223b
KH
6748 DEFVAR_LISP ("standard-translation-table-for-decode",
6749 &Vstandard_translation_table_for_decode,
84fbb8a0 6750 "Table for translating characters while decoding.");
f967223b 6751 Vstandard_translation_table_for_decode = Qnil;
bdd9fb48 6752
f967223b
KH
6753 DEFVAR_LISP ("standard-translation-table-for-encode",
6754 &Vstandard_translation_table_for_encode,
84fbb8a0 6755 "Table for translationg characters while encoding.");
f967223b 6756 Vstandard_translation_table_for_encode = Qnil;
4ed46869
KH
6757
6758 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
6759 "Alist of charsets vs revision numbers.\n\
6760While encoding, if a charset (car part of an element) is found,\n\
6761designate it with the escape sequence identifing revision (cdr part of the element).");
6762 Vcharset_revision_alist = Qnil;
02ba4723
KH
6763
6764 DEFVAR_LISP ("default-process-coding-system",
6765 &Vdefault_process_coding_system,
6766 "Cons of coding systems used for process I/O by default.\n\
6767The car part is used for decoding a process output,\n\
6768the cdr part is used for encoding a text to be sent to a process.");
6769 Vdefault_process_coding_system = Qnil;
c4825358 6770
3f003981
KH
6771 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
6772 "Table of extra Latin codes in the range 128..159 (inclusive).\n\
c4825358
KH
6773This is a vector of length 256.\n\
6774If Nth element is non-nil, the existence of code N in a file\n\
bb0115a2 6775\(or output of subprocess) doesn't prevent it to be detected as\n\
3f003981
KH
6776a coding system of ISO 2022 variant which has a flag\n\
6777`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\
c4825358
KH
6778or reading output of a subprocess.\n\
6779Only 128th through 159th elements has a meaning.");
3f003981 6780 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
d46c5b12
KH
6781
6782 DEFVAR_LISP ("select-safe-coding-system-function",
6783 &Vselect_safe_coding_system_function,
6784 "Function to call to select safe coding system for encoding a text.\n\
6785\n\
6786If set, this function is called to force a user to select a proper\n\
6787coding system which can encode the text in the case that a default\n\
6788coding system used in each operation can't encode the text.\n\
6789\n\
a85a871a 6790The default value is `select-safe-coding-system' (which see).");
d46c5b12
KH
6791 Vselect_safe_coding_system_function = Qnil;
6792
05e6f5dc
KH
6793 DEFVAR_LISP ("char-coding-system-table", &Vchar_coding_system_table,
6794 "Char-table containing safe coding systems of each characters.\n\
6795Each element doesn't include such generic coding systems that can\n\
6796encode any characters. They are in the first extra slot.");
6797 Vchar_coding_system_table = Fmake_char_table (Qchar_coding_system, Qnil);
6798
22ab2303 6799 DEFVAR_BOOL ("inhibit-iso-escape-detection",
74383408
KH
6800 &inhibit_iso_escape_detection,
6801 "If non-nil, Emacs ignores ISO2022's escape sequence on code detection.\n\
6802\n\
6803By default, on reading a file, Emacs tries to detect how the text is\n\
6804encoded. This code detection is sensitive to escape sequences. If\n\
e215fa58
EZ
6805the sequence is valid as ISO2022, the code is determined as one of\n\
6806the ISO2022 encodings, and the file is decoded by the corresponding\n\
6807coding system (e.g. `iso-2022-7bit').\n\
74383408
KH
6808\n\
6809However, there may be a case that you want to read escape sequences in\n\
6810a file as is. In such a case, you can set this variable to non-nil.\n\
6811Then, as the code detection ignores any escape sequences, no file is\n\
e215fa58
EZ
6812detected as encoded in some ISO2022 encoding. The result is that all\n\
6813escape sequences become visible in a buffer.\n\
74383408
KH
6814\n\
6815The default value is nil, and it is strongly recommended not to change\n\
6816it. That is because many Emacs Lisp source files that contain\n\
6817non-ASCII characters are encoded by the coding system `iso-2022-7bit'\n\
6818in Emacs's distribution, and they won't be decoded correctly on\n\
e215fa58 6819reading if you suppress escape sequence detection.\n\
74383408
KH
6820\n\
6821The other way to read escape sequences in a file without decoding is\n\
e215fa58 6822to explicitly specify some coding system that doesn't use ISO2022's\n\
74383408
KH
6823escape sequence (e.g `latin-1') on reading by \\[universal-coding-system-argument].");
6824 inhibit_iso_escape_detection = 0;
4ed46869
KH
6825}
6826
68c45bf0
PE
6827char *
6828emacs_strerror (error_number)
6829 int error_number;
6830{
6831 char *str;
6832
ca9c0567 6833 synchronize_system_messages_locale ();
68c45bf0
PE
6834 str = strerror (error_number);
6835
6836 if (! NILP (Vlocale_coding_system))
6837 {
6838 Lisp_Object dec = code_convert_string_norecord (build_string (str),
6839 Vlocale_coding_system,
6840 0);
6841 str = (char *) XSTRING (dec)->data;
6842 }
6843
6844 return str;
6845}
6846
4ed46869 6847#endif /* emacs */
c2f94ebc 6848