Correct typo in a comment.
[bpt/emacs.git] / src / coding.c
Commit Line Data
4ed46869 1/* Coding system handler (conversion, detection, etc.).
4a2f9c6a 2 Copyright (C) 1995, 1997, 1998 Electrotechnical Laboratory, JAPAN.
203cb916 3 Licensed to the Free Software Foundation.
4ed46869 4
369314dc
KH
5This file is part of GNU Emacs.
6
7GNU Emacs is free software; you can redistribute it and/or modify
8it under the terms of the GNU General Public License as published by
9the Free Software Foundation; either version 2, or (at your option)
10any later version.
4ed46869 11
369314dc
KH
12GNU Emacs is distributed in the hope that it will be useful,
13but WITHOUT ANY WARRANTY; without even the implied warranty of
14MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15GNU General Public License for more details.
4ed46869 16
369314dc
KH
17You should have received a copy of the GNU General Public License
18along with GNU Emacs; see the file COPYING. If not, write to
19the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
20Boston, MA 02111-1307, USA. */
4ed46869
KH
21
22/*** TABLE OF CONTENTS ***
23
b73bfc1c 24 0. General comments
4ed46869 25 1. Preamble
0ef69138 26 2. Emacs' internal format (emacs-mule) handlers
4ed46869
KH
27 3. ISO2022 handlers
28 4. Shift-JIS and BIG5 handlers
1397dc18
KH
29 5. CCL handlers
30 6. End-of-line handlers
31 7. C library functions
32 8. Emacs Lisp library functions
33 9. Post-amble
4ed46869
KH
34
35*/
36
b73bfc1c
KH
37/*** 0. General comments ***/
38
39
4ed46869
KH
40/*** GENERAL NOTE on CODING SYSTEM ***
41
42 A coding system is an encoding mechanism for one or more character
43 sets. Here's a list of coding systems which Emacs can handle. When
44 we say "decode", it means converting some other coding system to
0ef69138
KH
45 Emacs' internal format (emacs-mule), and when we say "encode",
46 it means converting the coding system emacs-mule to some other
47 coding system.
4ed46869 48
0ef69138 49 0. Emacs' internal format (emacs-mule)
4ed46869
KH
50
51 Emacs itself holds a multi-lingual character in a buffer and a string
f4dee582 52 in a special format. Details are described in section 2.
4ed46869
KH
53
54 1. ISO2022
55
56 The most famous coding system for multiple character sets. X's
f4dee582
RS
57 Compound Text, various EUCs (Extended Unix Code), and coding
58 systems used in Internet communication such as ISO-2022-JP are
59 all variants of ISO2022. Details are described in section 3.
4ed46869
KH
60
61 2. SJIS (or Shift-JIS or MS-Kanji-Code)
62
63 A coding system to encode character sets: ASCII, JISX0201, and
64 JISX0208. Widely used for PCs in Japan. Details are described in
f4dee582 65 section 4.
4ed46869
KH
66
67 3. BIG5
68
69 A coding system to encode character sets: ASCII and Big5. Widely
70 used by Chinese (mainly in Taiwan and Hong Kong). Details are
f4dee582
RS
71 described in section 4. In this file, when we write "BIG5"
72 (all uppercase), we mean the coding system, and when we write
73 "Big5" (capitalized), we mean the character set.
4ed46869 74
27901516
KH
75 4. Raw text
76
4608c386
KH
77 A coding system for text containing arbitrary 8-bit codes. Emacs does
78 no code conversion on such text except for end-of-line format.
27901516
KH
79
80 5. Other
4ed46869 81
f4dee582 82 If a user wants to read/write a text encoded in a coding system not
4ed46869
KH
83 listed above, he can supply a decoder and an encoder for it in CCL
84 (Code Conversion Language) programs. Emacs executes the CCL program
85 while reading/writing.
86
d46c5b12
KH
87 Emacs represents a coding system by a Lisp symbol that has a property
88 `coding-system'. But, before actually using the coding system, the
4ed46869 89 information about it is set in a structure of type `struct
f4dee582 90 coding_system' for rapid processing. See section 6 for more details.
4ed46869
KH
91
92*/
93
94/*** GENERAL NOTES on END-OF-LINE FORMAT ***
95
96 How the end-of-line of a text is encoded depends on the system. For
97 instance, Unix's format is just one byte of `line-feed' code,
f4dee582 98 whereas DOS's format is a two-byte sequence of `carriage-return' and
d46c5b12
KH
99 `line-feed' codes. MacOS's format is usually one byte of
100 `carriage-return'.
4ed46869 101
f4dee582
RS
102 Since text character encoding and end-of-line encoding are
103 independent, any coding system described above can take
4ed46869 104 any end-of-line format. So, Emacs records the end-of-line format
f4dee582 105 in each coding-system. See section 6 for more details.
4ed46869
KH
106
107*/
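The three end-of-line formats described above can be told apart by looking at the first line ending in a text. The following is a minimal sketch of that idea, not the detection routine used in this file; the enum and function names are hypothetical, and a lone CR at the very end of the buffer is reported as the Mac format even though more input might turn it into CRLF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; coding.c itself refers to
   CODING_EOL_LF, CODING_EOL_CRLF and CODING_EOL_CR.  */
enum sketch_eol
{
  SKETCH_EOL_UNDECIDED,
  SKETCH_EOL_LF,
  SKETCH_EOL_CRLF,
  SKETCH_EOL_CR
};

/* Classify the first line ending found in SRC[0..LEN).  */
static enum sketch_eol
sketch_detect_eol (const unsigned char *src, size_t len)
{
  size_t i;

  assert (src != NULL || len == 0);
  for (i = 0; i < len; i++)
    {
      if (src[i] == '\n')
        return SKETCH_EOL_LF;
      if (src[i] == '\r')
        /* CR followed by LF is the DOS format; a lone CR is the
           Mac format.  */
        return (i + 1 < len && src[i + 1] == '\n')
               ? SKETCH_EOL_CRLF : SKETCH_EOL_CR;
    }
  return SKETCH_EOL_UNDECIDED;
}
```

Because the character encoding and the end-of-line format are independent, a check like this can run on the raw bytes before any character decoding is chosen.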
108
109/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
110
111 These functions check if a text between SRC and SRC_END is encoded
112 in the coding system category XXX. Each returns an integer value in
113 which appropriate flag bits for the category XXX are set. The flag
114 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
115 template of these functions. */
116#if 0
117int
0ef69138 118detect_coding_emacs_mule (src, src_end)
4ed46869
KH
119 unsigned char *src, *src_end;
120{
121 ...
122}
123#endif
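As an illustration of the template above, here is a sketch of a detector in the same shape: it scans the text between SRC and SRC_END and returns either a flag bit or 0. The mask name and the category it stands for are hypothetical and are not among the CODING_CATEGORY_MASK_XXX macros of this file.

```c
#include <assert.h>

/* Hypothetical flag bit standing in for one CODING_CATEGORY_MASK_XXX.  */
#define SKETCH_MASK_PLAIN_7BIT 0x01

/* Return SKETCH_MASK_PLAIN_7BIT if the text between SRC and SRC_END
   contains only 7-bit bytes and none of ESC (0x1B), SO (0x0E) or
   SI (0x0F); otherwise return 0.  */
static int
sketch_detect_plain_7bit (const unsigned char *src,
                          const unsigned char *src_end)
{
  assert (src <= src_end);
  while (src < src_end)
    {
      unsigned char c = *src++;

      if (c >= 0x80 || c == 0x1B || c == 0x0E || c == 0x0F)
        return 0;
    }
  return SKETCH_MASK_PLAIN_7BIT;
}
```

A caller could OR together the results of several such detectors and then pick the category with the highest priority among the bits that remain set.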
124
125/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
126
b73bfc1c
KH
127 These functions decode SRC_BYTES length of unibyte text at SOURCE
128 encoded in CODING to Emacs' internal format. The resulting
129 multibyte text goes to a place pointed to by DESTINATION, the length
130 of which should not exceed DST_BYTES.
d46c5b12 131
b73bfc1c
KH
132 These functions set the information of original and decoded texts in
133 the members produced, produced_char, consumed, and consumed_char of
134 the structure *CODING. They also set the member result to one of
135 CODING_FINISH_XXX indicating how the decoding finished.
d46c5b12
KH
136
137 DST_BYTES zero means that the source area and destination area
138 overlap, which means that we can produce decoded text only until it
139 reaches the head of the not-yet-decoded source text.
140
141 Below is a template of these functions. */
4ed46869 142#if 0
b73bfc1c 143static void
d46c5b12 144decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
145 struct coding_system *coding;
146 unsigned char *source, *destination;
147 int src_bytes, dst_bytes;
4ed46869
KH
148{
149 ...
150}
151#endif
152
153/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
154
0ef69138 155 These functions encode SRC_BYTES length text at SOURCE of Emacs'
b73bfc1c
KH
156 internal multibyte format to CODING. The resulting unibyte text
157 goes to a place pointed to by DESTINATION, the length of which
158 should not exceed DST_BYTES.
d46c5b12 159
b73bfc1c
KH
160 These functions set the information of original and encoded texts in
161 the members produced, produced_char, consumed, and consumed_char of
162 the structure *CODING. They also set the member result to one of
163 CODING_FINISH_XXX indicating how the encoding finished.
d46c5b12
KH
164
165 DST_BYTES zero means that the source area and destination area
b73bfc1c
KH
166 overlap, which means that we can produce encoded text only until it
167 reaches the head of the not-yet-encoded source text.
d46c5b12
KH
168
169 Below is a template of these functions. */
4ed46869 170#if 0
b73bfc1c 171static void
d46c5b12 172encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
173 struct coding_system *coding;
174 unsigned char *source, *destination;
175 int src_bytes, dst_bytes;
4ed46869
KH
176{
177 ...
178}
179#endif
180
181/*** COMMONLY USED MACROS ***/
182
b73bfc1c
KH
183/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
184 get one and two bytes from the source text respectively.
185 If there are not enough bytes in the source, they jump to
186 `label_end_of_loop'. The caller should set variables `coding',
187 `src' and `src_end' to appropriate pointers in advance. These
188 macros are called from decoding routines `decode_coding_XXX', thus
189 it is assumed that the source text is unibyte. */
4ed46869 190
b73bfc1c
KH
191#define ONE_MORE_BYTE(c1) \
192 do { \
193 if (src >= src_end) \
194 { \
195 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
196 goto label_end_of_loop; \
197 } \
198 c1 = *src++; \
4ed46869
KH
199 } while (0)
200
b73bfc1c
KH
201#define TWO_MORE_BYTES(c1, c2) \
202 do { \
203 if (src + 1 >= src_end) \
204 { \
205 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
206 goto label_end_of_loop; \
207 } \
208 c1 = *src++; \
209 c2 = *src++; \
4ed46869
KH
210 } while (0)
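The pattern used by these macros (check the bound, record why the loop stopped, jump to a label after the loop) can be tried in isolation. The sketch below is an assumption-laden stand-in: SKETCH_ONE_MORE_BYTE, the `status' out-parameter, and the function itself are hypothetical and replace `coding->result' with a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ONE_MORE_BYTE: read one byte into C1, or
   set *STATUS to 1 and jump to the local label when the source is
   exhausted.  Relies on `src', `src_end' and `status' being in scope.  */
#define SKETCH_ONE_MORE_BYTE(c1)        \
  do {                                  \
    if (src >= src_end)                 \
      {                                 \
        *status = 1;                    \
        goto label_end_of_loop;         \
      }                                 \
    c1 = *src++;                        \
  } while (0)

/* Count complete two-byte units in SRC[0..LEN).  *STATUS becomes 1 if
   the text ends in the middle of a unit, and is 0 otherwise.  */
static size_t
sketch_count_pairs (const unsigned char *src, size_t len, int *status)
{
  const unsigned char *src_end = src + len;
  size_t count = 0;
  unsigned char hi, lo;

  assert (status != NULL);
  *status = 0;
  while (src < src_end)
    {
      SKETCH_ONE_MORE_BYTE (hi);
      SKETCH_ONE_MORE_BYTE (lo);
      (void) hi;
      (void) lo;
      count++;
    }
 label_end_of_loop:
  return count;
}
```

The `do { ... } while (0)' wrapper lets the macro be used like a single statement, and the shared label keeps the clean-up code after the loop in one place.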
211
4ed46869 212
b73bfc1c
KH
213/* Set C to the next character at the source text pointed to by `src'.
214 If there are not enough characters in the source, jump to
215 `label_end_of_loop'. The caller should set variables `coding',
216 `src', `src_end', and `translation_table' to appropriate pointers
217 in advance. This macro is used in encoding routines
218 `encode_coding_XXX', thus it assumes that the source text is in
219 multibyte form except for 8-bit characters. 8-bit characters are
220 in multibyte form if coding->src_multibyte is nonzero, else they
221 are represented by a single byte. */
4ed46869 222
b73bfc1c
KH
223#define ONE_MORE_CHAR(c) \
224 do { \
225 int len = src_end - src; \
226 int bytes; \
227 if (len <= 0) \
228 { \
229 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
230 goto label_end_of_loop; \
231 } \
232 if (coding->src_multibyte \
233 || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
234 c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
235 else \
236 c = *src, bytes = 1; \
237 if (!NILP (translation_table)) \
238 c = translate_char (translation_table, c, 0, 0, 0); \
239 src += bytes; \
4ed46869
KH
240 } while (0)
241
4ed46869 242
b73bfc1c
KH
243/* Produce a multibyte form of character C to `dst'. Jump to
244 `label_end_of_loop' if there's not enough space at `dst'.
245
246 If we are now in the middle of a composition sequence, the decoded
247 character may be ALTCHAR (for the current composition). In that
248 case, the character goes to coding->cmp_data->data instead of
249 `dst'.
250
251 This macro is used in decoding routines. */
252
253#define EMIT_CHAR(c) \
4ed46869 254 do { \
b73bfc1c
KH
255 if (! COMPOSING_P (coding) \
256 || coding->composing == COMPOSITION_RELATIVE \
257 || coding->composing == COMPOSITION_WITH_RULE) \
258 { \
259 int bytes = CHAR_BYTES (c); \
260 if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
261 { \
262 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
263 goto label_end_of_loop; \
264 } \
265 dst += CHAR_STRING (c, dst); \
266 coding->produced_char++; \
267 } \
ec6d2bb8 268 \
b73bfc1c
KH
269 if (COMPOSING_P (coding) \
270 && coding->composing != COMPOSITION_RELATIVE) \
271 { \
272 CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
273 coding->composition_rule_follows \
274 = coding->composing != COMPOSITION_WITH_ALTCHARS; \
275 } \
4ed46869
KH
276 } while (0)
277
4ed46869 278
b73bfc1c
KH
279#define EMIT_ONE_BYTE(c) \
280 do { \
281 if (dst >= (dst_bytes ? dst_end : src)) \
282 { \
283 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
284 goto label_end_of_loop; \
285 } \
286 *dst++ = c; \
287 } while (0)
288
289#define EMIT_TWO_BYTES(c1, c2) \
290 do { \
291 if (dst + 2 > (dst_bytes ? dst_end : src)) \
292 { \
293 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
294 goto label_end_of_loop; \
295 } \
296 *dst++ = c1, *dst++ = c2; \
297 } while (0)
298
299#define EMIT_BYTES(from, to) \
300 do { \
301 if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
302 { \
303 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
304 goto label_end_of_loop; \
305 } \
306 while (from < to) \
307 *dst++ = *from++; \
4ed46869
KH
308 } while (0)
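The limit `dst_bytes ? dst_end : src' used by the EMIT macros covers both a separate destination buffer and an in-place conversion, where the write pointer must not pass the read pointer. The following sketch shows that limit on a simple CRLF-to-LF conversion; it is only an illustration under assumed names and an assumed signature, not one of the converters in this file.

```c
#include <assert.h>
#include <stddef.h>

/* Convert CRLF to LF.  If DST_BYTES is zero, DESTINATION is taken to
   be the same buffer as SOURCE and the write limit is the read
   pointer.  Returns the number of bytes produced and stores the number
   of bytes consumed in *CONSUMED.  All names here are hypothetical.  */
static size_t
sketch_crlf_to_lf (const unsigned char *source, unsigned char *destination,
                   size_t src_bytes, size_t dst_bytes, size_t *consumed)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;

  assert (consumed != NULL);
  while (src < src_end)
    {
      const unsigned char *src_base = src;
      unsigned char c = *src++;

      if (c == '\r' && src < src_end && *src == '\n')
        c = *src++;
      if (dst >= (dst_bytes ? dst_end : src))
        {
          /* Not enough room: forget the unit we just read.  */
          src = src_base;
          break;
        }
      *dst++ = c;
    }
  *consumed = (size_t) (src - source);
  return (size_t) (dst - destination);
}
```

Calling it with the same pointer for SOURCE and DESTINATION and DST_BYTES of 0 converts a buffer in place; calling it with a small separate buffer stops early and reports how much of the source was consumed.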
309
310\f
311/*** 1. Preamble ***/
312
68c45bf0
PE
313#ifdef emacs
314#include <config.h>
315#endif
316
4ed46869
KH
317#include <stdio.h>
318
319#ifdef emacs
320
4ed46869
KH
321#include "lisp.h"
322#include "buffer.h"
323#include "charset.h"
ec6d2bb8 324#include "composite.h"
4ed46869
KH
325#include "ccl.h"
326#include "coding.h"
327#include "window.h"
328
329#else /* not emacs */
330
331#include "mulelib.h"
332
333#endif /* not emacs */
334
335Lisp_Object Qcoding_system, Qeol_type;
336Lisp_Object Qbuffer_file_coding_system;
337Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
27901516 338Lisp_Object Qno_conversion, Qundecided;
bb0115a2 339Lisp_Object Qcoding_system_history;
70c22245 340Lisp_Object Qsafe_charsets;
1397dc18 341Lisp_Object Qvalid_codes;
4ed46869
KH
342
343extern Lisp_Object Qinsert_file_contents, Qwrite_region;
344Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
345Lisp_Object Qstart_process, Qopen_network_stream;
346Lisp_Object Qtarget_idx;
347
d46c5b12
KH
348Lisp_Object Vselect_safe_coding_system_function;
349
7722baf9
EZ
350/* Mnemonic string for each format of end-of-line. */
351Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
352/* Mnemonic string to indicate format of end-of-line is not yet
4ed46869 353 decided. */
7722baf9 354Lisp_Object eol_mnemonic_undecided;
4ed46869 355
9ce27fde
KH
356/* Format of end-of-line decided by system. This is CODING_EOL_LF on
357 Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
358int system_eol_type;
359
4ed46869
KH
360#ifdef emacs
361
4608c386
KH
362Lisp_Object Vcoding_system_list, Vcoding_system_alist;
363
364Lisp_Object Qcoding_system_p, Qcoding_system_error;
4ed46869 365
d46c5b12
KH
366/* Coding system emacs-mule and raw-text are for converting only
367 end-of-line format. */
368Lisp_Object Qemacs_mule, Qraw_text;
9ce27fde 369
4ed46869
KH
370/* Coding-systems are handed between Emacs Lisp programs and C internal
371 routines by the following three variables. */
372/* Coding-system for reading files and receiving data from process. */
373Lisp_Object Vcoding_system_for_read;
374/* Coding-system for writing files and sending data to process. */
375Lisp_Object Vcoding_system_for_write;
376/* Coding-system actually used in the latest I/O. */
377Lisp_Object Vlast_coding_system_used;
378
c4825358 379/* A vector of length 256 which contains information about special
94487c4e 380 Latin codes (especially for dealing with Microsoft codes). */
3f003981 381Lisp_Object Vlatin_extra_code_table;
c4825358 382
9ce27fde
KH
383/* Flag to inhibit code conversion of end-of-line format. */
384int inhibit_eol_conversion;
385
74383408
KH
386/* Flag to inhibit ISO2022 escape sequence detection. */
387int inhibit_iso_escape_detection;
388
ed29121d
EZ
389/* Flag to make buffer-file-coding-system inherit from process-coding. */
390int inherit_process_coding_system;
391
c4825358 392/* Coding system to be used to encode text for terminal display. */
4ed46869
KH
393struct coding_system terminal_coding;
394
c4825358
KH
395/* Coding system to be used to encode text for terminal display when
396 terminal coding system is nil. */
397struct coding_system safe_terminal_coding;
398
399/* Coding system of what is sent from terminal keyboard. */
4ed46869
KH
400struct coding_system keyboard_coding;
401
6bc51348
KH
402/* Default coding system to be used to write a file. */
403struct coding_system default_buffer_file_coding;
404
02ba4723
KH
405Lisp_Object Vfile_coding_system_alist;
406Lisp_Object Vprocess_coding_system_alist;
407Lisp_Object Vnetwork_coding_system_alist;
4ed46869 408
68c45bf0
PE
409Lisp_Object Vlocale_coding_system;
410
4ed46869
KH
411#endif /* emacs */
412
d46c5b12 413Lisp_Object Qcoding_category, Qcoding_category_index;
4ed46869
KH
414
415/* List of symbols `coding-category-xxx' ordered by priority. */
416Lisp_Object Vcoding_category_list;
417
d46c5b12
KH
418/* Table of coding categories (Lisp symbols). */
419Lisp_Object Vcoding_category_table;
4ed46869
KH
420
421/* Table of names of symbol for each coding-category. */
422char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
0ef69138 423 "coding-category-emacs-mule",
4ed46869
KH
424 "coding-category-sjis",
425 "coding-category-iso-7",
d46c5b12 426 "coding-category-iso-7-tight",
4ed46869
KH
427 "coding-category-iso-8-1",
428 "coding-category-iso-8-2",
7717c392
KH
429 "coding-category-iso-7-else",
430 "coding-category-iso-8-else",
89fa8b36 431 "coding-category-ccl",
4ed46869 432 "coding-category-big5",
fa42c37f
KH
433 "coding-category-utf-8",
434 "coding-category-utf-16-be",
435 "coding-category-utf-16-le",
27901516 436 "coding-category-raw-text",
89fa8b36 437 "coding-category-binary"
4ed46869
KH
438};
439
66cfb530 440/* Table of pointers to coding systems corresponding to each coding
d46c5b12
KH
441 category. */
442struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
443
66cfb530
KH
444/* Table of coding category masks. Nth element is a mask for a coding
445 category whose priority is Nth. */
446static
447int coding_priorities[CODING_CATEGORY_IDX_MAX];
448
f967223b
KH
449/* Flag to tell if we look up translation table on character code
450 conversion. */
84fbb8a0 451Lisp_Object Venable_character_translation;
f967223b
KH
452/* Standard translation table to look up on decoding (reading). */
453Lisp_Object Vstandard_translation_table_for_decode;
454/* Standard translation table to look up on encoding (writing). */
455Lisp_Object Vstandard_translation_table_for_encode;
84fbb8a0 456
f967223b
KH
457Lisp_Object Qtranslation_table;
458Lisp_Object Qtranslation_table_id;
459Lisp_Object Qtranslation_table_for_decode;
460Lisp_Object Qtranslation_table_for_encode;
4ed46869
KH
461
462/* Alist of charsets vs revision number. */
463Lisp_Object Vcharset_revision_alist;
464
02ba4723
KH
465/* Default coding systems used for process I/O. */
466Lisp_Object Vdefault_process_coding_system;
467
b843d1ae
KH
468/* Global flag to tell that we can't call post-read-conversion and
469 pre-write-conversion functions. Usually the value is zero, but it
470 is set to 1 temporarily while such functions are running. This is
471 to avoid infinite recursion. */
472static int inhibit_pre_post_conversion;
473
4ed46869 474\f
0ef69138 475/*** 2. Emacs' internal format (emacs-mule) handlers ***/
4ed46869
KH
476
477/* Emacs' internal format for encoding multiple character sets is a
f4dee582 478 kind of multi-byte encoding, i.e. characters are encoded by
b73bfc1c
KH
479 variable-length sequences of one-byte codes.
480
481 ASCII characters and control characters (e.g. `tab', `newline') are
482 represented by one-byte sequences which are their ASCII codes, in
483 the range 0x00 through 0x7F.
484
485 8-bit characters of the range 0x80..0x9F are represented by
486 two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
487 code + 0x20).
488
489 8-bit characters of the range 0xA0..0xFF are represented by
490 one-byte sequences which are their 8-bit code.
491
492 The other characters are represented by a sequence of `base
493 leading-code', optional `extended leading-code', and one or two
494 `position-code's. The length of the sequence is determined by the
495 base leading-code. Leading-code takes the range 0x80 through 0x9F,
496 whereas extended leading-code and position-code take the range 0xA0
497 through 0xFF. See `charset.h' for more details about leading-code
498 and position-code.
f4dee582 499
4ed46869 500 --- CODE RANGE of Emacs' internal format ---
b73bfc1c
KH
501 character set       range
502 -------------       -----
503 ascii               0x00..0x7F
504 eight-bit-control   LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
505 eight-bit-graphic   0xA0..0xFF
506 ELSE                0x81..0x9F + [0xA0..0xFF]+
4ed46869
KH
507 ---------------------------------------------
508
509 */
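The rules above for ASCII and raw 8-bit bytes can be written down directly. This is a sketch only: the function name is hypothetical, and the value 0x9E for the leading code is an assumption made for the example; the authoritative definition of LEADING_CODE_8_BIT_CONTROL is in `charset.h'. Multibyte characters of other charsets are outside its scope.

```c
#include <assert.h>

/* Assumed value for this sketch; see `charset.h' for the real one.  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Store the internal-format bytes for the raw byte C in BUF (room for
   two bytes) and return how many bytes were stored.  */
static int
sketch_raw_byte_to_internal (unsigned char c, unsigned char *buf)
{
  assert (buf != 0);
  if (c >= 0x80 && c <= 0x9F)
    {
      /* eight-bit-control: leading code, then code + 0x20.  */
      buf[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
      buf[1] = (unsigned char) (c + 0x20);
      return 2;
    }
  /* ASCII (0x00..0x7F) or eight-bit-graphic (0xA0..0xFF): one byte.  */
  buf[0] = c;
  return 1;
}
```

For example, the raw byte 0x85 becomes the two bytes 0x9E 0xA5 under the assumed leading code, while 'A' and 0xE9 are stored unchanged.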
510
511enum emacs_code_class_type emacs_code_class[256];
512
4ed46869
KH
513/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
514 Check if a text is encoded in Emacs' internal format. If it is,
d46c5b12 515 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
4ed46869
KH
516
517int
0ef69138 518detect_coding_emacs_mule (src, src_end)
b73bfc1c 519 unsigned char *src, *src_end;
4ed46869
KH
520{
521 unsigned char c;
522 int composing = 0;
b73bfc1c
KH
523 /* Dummy for ONE_MORE_BYTE. */
524 struct coding_system dummy_coding;
525 struct coding_system *coding = &dummy_coding;
4ed46869 526
b73bfc1c 527 while (1)
4ed46869 528 {
b73bfc1c 529 ONE_MORE_BYTE (c);
4ed46869
KH
530
531 if (composing)
532 {
533 if (c < 0xA0)
534 composing = 0;
b73bfc1c
KH
535 else if (c == 0xA0)
536 {
537 ONE_MORE_BYTE (c);
538 c &= 0x7F;
539 }
4ed46869
KH
540 else
541 c -= 0x20;
542 }
543
b73bfc1c 544 if (c < 0x20)
4ed46869 545 {
4ed46869
KH
546 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
547 return 0;
b73bfc1c
KH
548 }
549 else if (c >= 0x80 && c < 0xA0)
550 {
551 if (c == 0x80)
552 /* Old leading code for a composite character. */
553 composing = 1;
554 else
555 {
556 unsigned char *src_base = src - 1;
557 int bytes;
4ed46869 558
b73bfc1c
KH
559 if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
560 bytes))
561 return 0;
562 src = src_base + bytes;
563 }
564 }
565 }
566 label_end_of_loop:
567 return CODING_CATEGORY_MASK_EMACS_MULE;
568}
4ed46869 569
4ed46869 570
b73bfc1c 571/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 572
b73bfc1c
KH
573static void
574decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
575 struct coding_system *coding;
576 unsigned char *source, *destination;
577 int src_bytes, dst_bytes;
578{
579 unsigned char *src = source;
580 unsigned char *src_end = source + src_bytes;
581 unsigned char *dst = destination;
582 unsigned char *dst_end = destination + dst_bytes;
583 /* SRC_BASE remembers the start position in source in each loop.
584 The loop will be exited when there's not enough source text, or
585 when there's not enough destination area to produce a
586 character. */
587 unsigned char *src_base;
4ed46869 588
b73bfc1c 589 coding->produced_char = 0;
8a33cf7b 590 while ((src_base = src) < src_end)
b73bfc1c
KH
591 {
592 unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
593 int bytes;
ec6d2bb8 594
b73bfc1c
KH
595 if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
596 {
597 p = src;
598 src += bytes;
599 }
600 else
601 {
602 bytes = CHAR_STRING (*src, tmp);
603 p = tmp;
604 src++;
605 }
606 if (dst + bytes >= (dst_bytes ? dst_end : src))
607 {
608 coding->result = CODING_FINISH_INSUFFICIENT_DST;
4ed46869
KH
609 break;
610 }
b73bfc1c
KH
611 while (bytes--) *dst++ = *p++;
612 coding->produced_char++;
4ed46869 613 }
b73bfc1c
KH
614 coding->consumed = coding->consumed_char = src_base - source;
615 coding->produced = dst - destination;
4ed46869
KH
616}
617
b73bfc1c
KH
618#define encode_coding_emacs_mule(coding, source, destination, src_bytes, dst_bytes) \
619 encode_eol (coding, source, destination, src_bytes, dst_bytes)
620
621
4ed46869
KH
622\f
623/*** 3. ISO2022 handlers ***/
624
625/* The following note describes the coding system ISO2022 briefly.
39787efd
KH
626 Since the intention of this note is to help understand the
627 functions in this file, some parts are NOT ACCURATE or OVERLY
628 SIMPLIFIED. For thorough understanding, please refer to the
4ed46869
KH
629 original document of ISO2022.
630
631 ISO2022 provides many mechanisms to encode several character sets
39787efd
KH
632 in 7-bit and 8-bit environments. For 7-bit environments, all text
633 is encoded using bytes less than 128. This may make the encoded
634 text a little bit longer, but the text passes more easily through
635 several gateways, some of which strip off the MSB (Most Significant Bit).
b73bfc1c 636
39787efd 637 There are two kinds of character sets: control character set and
4ed46869
KH
638 graphic character set. The former contains control characters such
639 as `newline' and `escape' to provide control functions (control
39787efd
KH
640 functions are also provided by escape sequences). The latter
641 contains graphic characters such as 'A' and '-'. Emacs recognizes
4ed46869
KH
642 two control character sets and many graphic character sets.
643
644 Graphic character sets are classified into one of the following
39787efd
KH
645 four classes, according to the number of bytes (DIMENSION) and
646 number of characters in one dimension (CHARS) of the set:
647 - DIMENSION1_CHARS94
648 - DIMENSION1_CHARS96
649 - DIMENSION2_CHARS94
650 - DIMENSION2_CHARS96
651
652 In addition, each character set is assigned an identification tag,
653 unique for each set, called "final character" (denoted as <F>
654 hereafter). The <F> of each character set is decided by ECMA(*)
655 when it is registered in ISO. The code range of <F> is 0x30..0x7F
656 (0x30..0x3F are for private use only).
4ed46869
KH
657
658 Note (*): ECMA = European Computer Manufacturers Association
659
660 Here are examples of graphic character set [NAME(<F>)]:
661 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
662 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
663 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
664 o DIMENSION2_CHARS96 -- none for the moment
665
39787efd 666 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
4ed46869
KH
667 C0 [0x00..0x1F] -- control character plane 0
668 GL [0x20..0x7F] -- graphic character plane 0
669 C1 [0x80..0x9F] -- control character plane 1
670 GR [0xA0..0xFF] -- graphic character plane 1
671
672 A control character set is directly designated and invoked to C0 or
39787efd
KH
673 C1 by an escape sequence. The most common case is that:
674 - ISO646's control character set is designated/invoked to C0, and
675 - ISO6429's control character set is designated/invoked to C1,
676 and usually these designations/invocations are omitted in encoded
677 text. In a 7-bit environment, only C0 can be used, and a control
678 character for C1 is encoded by an appropriate escape sequence to
679 fit into the environment. All control characters for C1 are
680 defined to have corresponding escape sequences.
4ed46869
KH
681
682 A graphic character set is at first designated to one of four
683 graphic registers (G0 through G3), then these graphic registers are
684 invoked to GL or GR. These designations and invocations can be
685 done independently. The most common case is that G0 is invoked to
39787efd
KH
686 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
687 these invocations and designations are omitted in encoded text.
688 In a 7-bit environment, only GL can be used.
4ed46869 689
39787efd
KH
690 When a graphic character set of CHARS94 is invoked to GL, codes
691 0x20 and 0x7F of the GL area work as control characters SPACE and
692 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
693 be used.
4ed46869
KH
694
695 There are two ways of invocation: locking-shift and single-shift.
696 With locking-shift, the invocation lasts until the next different
39787efd
KH
697 invocation, whereas with single-shift, the invocation affects the
698 following character only and doesn't affect the locking-shift
699 state. Invocations are done by the following control characters or
700 escape sequences:
4ed46869
KH
701
702 ----------------------------------------------------------------------
39787efd 703 abbrev function cntrl escape seq description
4ed46869 704 ----------------------------------------------------------------------
39787efd
KH
705 SI/LS0 (shift-in) 0x0F none invoke G0 into GL
706 SO/LS1 (shift-out) 0x0E none invoke G1 into GL
707 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL
708 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL
709 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*)
710 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*)
711 LS3R (locking-shift 3 right) none ESC '|' invoke G3 into GR (*)
712 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char
713 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char
4ed46869 714 ----------------------------------------------------------------------
39787efd
KH
715 (*) These are not used by any known coding system.
716
717 Control characters for these functions are defined by macros
718 ISO_CODE_XXX in `coding.h'.
4ed46869 719
39787efd 720 Designations are done by the following escape sequences:
4ed46869
KH
721 ----------------------------------------------------------------------
722 escape sequence description
723 ----------------------------------------------------------------------
724 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0
725 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1
726 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2
727 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3
728 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*)
729 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1
730 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2
731 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3
732 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**)
733 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1
734 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2
735 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3
736 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*)
737 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1
738 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2
739 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3
740 ----------------------------------------------------------------------
741
742 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
39787efd 743 of dimension 1, chars 94, and final character <F>, etc...
4ed46869
KH
744
745 Note (*): Although these designations are not allowed in ISO2022,
746 Emacs accepts them on decoding, and produces them on encoding
39787efd 747 CHARS96 character sets in a coding system which is characterized as
4ed46869
KH
748 7-bit environment, non-locking-shift, and non-single-shift.
749
750 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
39787efd 751 '(' can be omitted. We refer to this as "short-form" hereafter.
4ed46869
KH
752
753 Now you may notice that there are a lot of ways for encoding the
39787efd
KH
754 same multilingual text in ISO2022. Actually, there exist many
755 coding systems such as Compound Text (used in X11's inter-client
756 communication), ISO-2022-JP (used in Japanese internet), ISO-2022-KR
757 (used in Korean internet), EUC (Extended UNIX Code, used in Asian
4ed46869
KH
758 localized platforms), and all of these are variants of ISO2022.
759
760 In addition to the above, Emacs handles two more kinds of escape
761 sequences: ISO6429's direction specification and Emacs' private
762 sequence for specifying character composition.
763
39787efd 764 ISO6429's direction specification takes the following form:
4ed46869
KH
765 o CSI ']' -- end of the current direction
766 o CSI '0' ']' -- end of the current direction
767 o CSI '1' ']' -- start of left-to-right text
768 o CSI '2' ']' -- start of right-to-left text
769 The control character CSI (0x9B: control sequence introducer) is
39787efd
KH
770 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
771
772 Character composition specification takes the following form:
ec6d2bb8
KH
773 o ESC '0' -- start relative composition
774 o ESC '1' -- end composition
775 o ESC '2' -- start rule-base composition (*)
776 o ESC '3' -- start relative composition with alternate chars (**)
777 o ESC '4' -- start rule-base composition with alternate chars (**)
b73bfc1c
KH
778 Since these are not standard escape sequences of any ISO standard,
779 their use with these meanings is restricted to Emacs only.
ec6d2bb8 780
b73bfc1c
KH
781 (*) This form is used only in Emacs 20.5 and older versions,
782 but newer versions can safely decode it.
783 (**) This form is used only in Emacs 21.1 and newer versions,
784 and older versions can't decode it.
ec6d2bb8 785
b73bfc1c
KH
786 Here's a list of example usages of these composition escape
787 sequences (categorized by `enum composition_method').
ec6d2bb8 788
b73bfc1c 789 COMPOSITION_RELATIVE:
ec6d2bb8 790 ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 791 COMPOSITION_WITH_RULE:
ec6d2bb8 792 ESC 2 CHAR [ RULE CHAR ] ESC 1
b73bfc1c 793 COMPOSITION_WITH_ALTCHARS:
ec6d2bb8 794 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 795 COMPOSITION_WITH_RULE_ALTCHARS:
ec6d2bb8 796 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
4ed46869
KH
797
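The direction specification described in the comment above can be sketched as a small standalone parser. This is only an illustration, not code from coding.c: the enum `direction_spec` and the function `parse_direction_spec` are hypothetical names, and the sketch follows the list in the comment literally (CSI '1' ']' starts left-to-right text), accepting both the 8-bit CSI and its 7-bit form ESC '['.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; these names do not come from coding.c.  */
enum direction_spec
{
  DIRECTION_INVALID = -1,
  DIRECTION_END = 0,
  DIRECTION_LEFT_TO_RIGHT = 1,
  DIRECTION_RIGHT_TO_LEFT = 2
};

/* Parse an ISO6429 direction specification starting at P, where LEN
   bytes are available.  Both the 8-bit CSI (0x9B) and the 7-bit form
   ESC '[' are accepted.  On success, store the number of bytes used
   in *CONSUMED; on failure, leave *CONSUMED untouched.  */
static enum direction_spec
parse_direction_spec (const unsigned char *p, size_t len, size_t *consumed)
{
  size_t i;

  assert (p != NULL && consumed != NULL);

  if (len >= 1 && p[0] == 0x9B)
    i = 1;			/* 8-bit CSI */
  else if (len >= 2 && p[0] == 0x1B && p[1] == '[')
    i = 2;			/* ESC '[' */
  else
    return DIRECTION_INVALID;

  /* CSI ']' ends the current direction.  */
  if (i < len && p[i] == ']')
    {
      *consumed = i + 1;
      return DIRECTION_END;
    }

  /* CSI '0' ']', CSI '1' ']', and CSI '2' ']'.  */
  if (i + 1 < len && p[i] >= '0' && p[i] <= '2' && p[i + 1] == ']')
    {
      *consumed = i + 2;
      return (enum direction_spec) (p[i] - '0');
    }

  return DIRECTION_INVALID;
}
```

A caller would advance its read pointer by `*consumed` bytes on success and treat `DIRECTION_INVALID` as ordinary text.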
798enum iso_code_class_type iso_code_class[256];
799
f024b6aa
RS
800#define CHARSET_OK(idx, charset) \
801 (coding_system_table[idx] \
802 && (coding_system_table[idx]->safe_charsets[charset] \
803 || (CODING_SPEC_ISO_REQUESTED_DESIGNATION \
804 (coding_system_table[idx], charset) \
805 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)))
d46c5b12
KH
806
807#define SHIFT_OUT_OK(idx) \
808 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
809
4ed46869
KH
810/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
811 Check if a text is encoded in ISO2022. If it is, return an
812 integer in which the appropriate flag bits, any of:
813 CODING_CATEGORY_MASK_ISO_7
d46c5b12 814 CODING_CATEGORY_MASK_ISO_7_TIGHT
4ed46869
KH
815 CODING_CATEGORY_MASK_ISO_8_1
816 CODING_CATEGORY_MASK_ISO_8_2
7717c392
KH
817 CODING_CATEGORY_MASK_ISO_7_ELSE
818 CODING_CATEGORY_MASK_ISO_8_ELSE
4ed46869
KH
819 are set. If a code which should never appear in ISO2022 is found,
820 return 0. */
821
822int
823detect_coding_iso2022 (src, src_end)
824 unsigned char *src, *src_end;
825{
d46c5b12
KH
826 int mask = CODING_CATEGORY_MASK_ISO;
827 int mask_found = 0;
f46869e4 828 int reg[4], shift_out = 0, single_shifting = 0;
d46c5b12 829 int c, c1, i, charset;
b73bfc1c
KH
830 /* Dummy for ONE_MORE_BYTE. */
831 struct coding_system dummy_coding;
832 struct coding_system *coding = &dummy_coding;
3f003981 833
d46c5b12 834 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
3f003981 835 while (mask && src < src_end)
4ed46869 836 {
b73bfc1c 837 ONE_MORE_BYTE (c);
4ed46869
KH
838 switch (c)
839 {
840 case ISO_CODE_ESC:
74383408
KH
841 if (inhibit_iso_escape_detection)
842 break;
f46869e4 843 single_shifting = 0;
b73bfc1c 844 ONE_MORE_BYTE (c);
d46c5b12 845 if (c >= '(' && c <= '/')
4ed46869 846 {
bf9cdd4e 847 /* Designation sequence for a charset of dimension 1. */
b73bfc1c 848 ONE_MORE_BYTE (c1);
d46c5b12
KH
849 if (c1 < ' ' || c1 >= 0x80
850 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
851 /* Invalid designation sequence. Just ignore. */
852 break;
853 reg[(c - '(') % 4] = charset;
bf9cdd4e
KH
854 }
855 else if (c == '$')
856 {
857 /* Designation sequence for a charset of dimension 2. */
b73bfc1c 858 ONE_MORE_BYTE (c);
bf9cdd4e
KH
859 if (c >= '@' && c <= 'B')
860 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
d46c5b12 861 reg[0] = charset = iso_charset_table[1][0][c];
bf9cdd4e 862 else if (c >= '(' && c <= '/')
bcf26d6a 863 {
b73bfc1c 864 ONE_MORE_BYTE (c1);
d46c5b12
KH
865 if (c1 < ' ' || c1 >= 0x80
866 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
867 /* Invalid designation sequence. Just ignore. */
868 break;
869 reg[(c - '(') % 4] = charset;
bcf26d6a 870 }
bf9cdd4e 871 else
d46c5b12
KH
872 /* Invalid designation sequence. Just ignore. */
873 break;
874 }
ae9ff118 875 else if (c == 'N' || c == 'O')
d46c5b12 876 {
ae9ff118
KH
877 /* ESC <Fe> for SS2 or SS3. */
878 mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
d46c5b12 879 break;
4ed46869 880 }
ec6d2bb8
KH
881 else if (c >= '0' && c <= '4')
882 {
883 /* ESC <Fp> for start/end composition. */
884 mask_found |= CODING_CATEGORY_MASK_ISO;
885 break;
886 }
bf9cdd4e 887 else
d46c5b12
KH
888 /* Invalid escape sequence. Just ignore. */
889 break;
890
891 /* We found a valid designation sequence for CHARSET. */
892 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
893 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset))
894 mask_found |= CODING_CATEGORY_MASK_ISO_7;
895 else
896 mask &= ~CODING_CATEGORY_MASK_ISO_7;
897 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset))
898 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
899 else
900 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
ae9ff118
KH
901 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset))
902 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
903 else
d46c5b12 904 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
ae9ff118
KH
905 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset))
906 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
907 else
d46c5b12 908 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
4ed46869
KH
909 break;
910
4ed46869 911 case ISO_CODE_SO:
74383408
KH
912 if (inhibit_iso_escape_detection)
913 break;
f46869e4 914 single_shifting = 0;
d46c5b12
KH
915 if (shift_out == 0
916 && (reg[1] >= 0
917 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
918 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
919 {
920 /* Locking shift out. */
921 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
922 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
923 }
e0e989f6
KH
924 break;
925
d46c5b12 926 case ISO_CODE_SI:
74383408
KH
927 if (inhibit_iso_escape_detection)
928 break;
f46869e4 929 single_shifting = 0;
d46c5b12
KH
930 if (shift_out == 1)
931 {
932 /* Locking shift in. */
933 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
934 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
935 }
936 break;
937
4ed46869 938 case ISO_CODE_CSI:
f46869e4 939 single_shifting = 0;
4ed46869
KH
940 case ISO_CODE_SS2:
941 case ISO_CODE_SS3:
3f003981
KH
942 {
943 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
944
74383408
KH
945 if (inhibit_iso_escape_detection)
946 break;
70c22245
KH
947 if (c != ISO_CODE_CSI)
948 {
d46c5b12
KH
949 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
950 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 951 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
952 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
953 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 954 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
f46869e4 955 single_shifting = 1;
70c22245 956 }
3f003981
KH
957 if (VECTORP (Vlatin_extra_code_table)
958 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
959 {
d46c5b12
KH
960 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
961 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 962 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
963 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
964 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
965 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
966 }
967 mask &= newmask;
d46c5b12 968 mask_found |= newmask;
3f003981
KH
969 }
970 break;
4ed46869
KH
971
972 default:
973 if (c < 0x80)
f46869e4
KH
974 {
975 single_shifting = 0;
976 break;
977 }
4ed46869 978 else if (c < 0xA0)
c4825358 979 {
f46869e4 980 single_shifting = 0;
3f003981
KH
981 if (VECTORP (Vlatin_extra_code_table)
982 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
c4825358 983 {
3f003981
KH
984 int newmask = 0;
985
d46c5b12
KH
986 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
987 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 988 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
989 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
990 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
991 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
992 mask &= newmask;
d46c5b12 993 mask_found |= newmask;
c4825358 994 }
3f003981
KH
995 else
996 return 0;
c4825358 997 }
4ed46869
KH
998 else
999 {
d46c5b12 1000 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
7717c392 1001 | CODING_CATEGORY_MASK_ISO_7_ELSE);
d46c5b12 1002 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
f46869e4
KH
1003 /* Check the length of succeeding codes of the range
1004 0xA0..0xFF. If the byte length is odd, we exclude
1005 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1006 when we are not single shifting. */
b73bfc1c
KH
1007 if (!single_shifting
1008 && mask & CODING_CATEGORY_MASK_ISO_8_2)
f46869e4 1009 {
e17de821 1010 int i = 1;
b73bfc1c
KH
1011 while (src < src_end)
1012 {
1013 ONE_MORE_BYTE (c);
1014 if (c < 0xA0)
1015 break;
1016 i++;
1017 }
1018
1019 if (i & 1 && src < src_end)
f46869e4
KH
1020 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1021 else
1022 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
1023 }
4ed46869
KH
1024 }
1025 break;
1026 }
1027 }
b73bfc1c 1028 label_end_of_loop:
d46c5b12 1029 return (mask & mask_found);
4ed46869
KH
1030}
1031
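The odd-length test near the end of detect_coding_iso2022 can be restated as a standalone helper. This is a simplified sketch under stated assumptions, not the detector itself: `high_run_may_be_two_byte` is a hypothetical name, it only looks at one run of bytes in 0xA0..0xFF, and its handling of the byte that terminates the run differs slightly from the loop above, which consumes that byte through ONE_MORE_BYTE before testing `src < src_end`.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the run of bytes in the range 0xA0..0xFF that starts at
   SRC may consist of whole two-byte characters, and 0 if the run has
   an odd length and is known to be complete.  */
static int
high_run_may_be_two_byte (const unsigned char *src,
			  const unsigned char *src_end)
{
  size_t run = 0;

  assert (src != NULL && src <= src_end);

  while (src < src_end && *src >= 0xA0)
    {
      run++;
      src++;
    }

  /* If the input ended inside the run, more bytes may follow in the
     next chunk, so an odd count is not conclusive.  */
  if ((run & 1) && src < src_end)
    return 0;
  return 1;
}
```

The early exit on truncated input mirrors the reason the original checks `src < src_end`: a detector that sees only a prefix of the text should not rule out a category on incomplete evidence.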
b73bfc1c
KH
1032/* Decode a character of which charset is CHARSET, the 1st position
1033 code is C1, the 2nd position code is C2, and return the decoded
1034 character code. If the variable `translation_table' is non-nil,
1035 return the translated code. */
ec6d2bb8 1036
b73bfc1c
KH
1037#define DECODE_ISO_CHARACTER(charset, c1, c2) \
1038 (NILP (translation_table) \
1039 ? MAKE_CHAR (charset, c1, c2) \
1040 : translate_char (translation_table, -1, charset, c1, c2))
4ed46869
KH
1041
1042/* Set designation state into CODING. */
d46c5b12
KH
1043#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1044 do { \
944bd420
KH
1045 int charset; \
1046 \
1047 if (final_char < '0' || final_char >= 128) \
1048 goto label_invalid_code; \
1049 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1050 make_number (chars), \
1051 make_number (final_char)); \
d46c5b12 1052 if (charset >= 0 \
704c5781
KH
1053 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
1054 || coding->safe_charsets[charset])) \
d46c5b12
KH
1055 { \
1056 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1057 && reg == 0 \
1058 && charset == CHARSET_ASCII) \
1059 { \
1060 /* We should insert this designation sequence as is so \
1061 that it is surely written back to a file. */ \
1062 coding->spec.iso2022.last_invalid_designation_register = -1; \
1063 goto label_invalid_code; \
1064 } \
1065 coding->spec.iso2022.last_invalid_designation_register = -1; \
1066 if ((coding->mode & CODING_MODE_DIRECTION) \
1067 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1068 charset = CHARSET_REVERSE_CHARSET (charset); \
1069 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1070 } \
1071 else \
1072 { \
1073 coding->spec.iso2022.last_invalid_designation_register = reg; \
1074 goto label_invalid_code; \
1075 } \
4ed46869
KH
1076 } while (0)
1077
ec6d2bb8
KH
1078/* Allocate a memory block for storing information about compositions.
1079 The block is chained to the already allocated blocks. */
d46c5b12 1080
33fb63eb 1081void
ec6d2bb8 1082coding_allocate_composition_data (coding, char_offset)
d46c5b12 1083 struct coding_system *coding;
ec6d2bb8 1084 int char_offset;
d46c5b12 1085{
ec6d2bb8
KH
1086 struct composition_data *cmp_data
1087 = (struct composition_data *) xmalloc (sizeof *cmp_data);
1088
1089 cmp_data->char_offset = char_offset;
1090 cmp_data->used = 0;
1091 cmp_data->prev = coding->cmp_data;
1092 cmp_data->next = NULL;
1093 if (coding->cmp_data)
1094 coding->cmp_data->next = cmp_data;
1095 coding->cmp_data = cmp_data;
1096 coding->cmp_data_start = 0;
1097}
d46c5b12 1098
ec6d2bb8
KH
1099/* Record the starting position START and METHOD of one composition. */
1100
1101#define CODING_ADD_COMPOSITION_START(coding, start, method) \
1102 do { \
1103 struct composition_data *cmp_data = coding->cmp_data; \
1104 int *data = cmp_data->data + cmp_data->used; \
1105 coding->cmp_data_start = cmp_data->used; \
1106 data[0] = -1; \
1107 data[1] = cmp_data->char_offset + start; \
1108 data[3] = (int) method; \
1109 cmp_data->used += 4; \
1110 } while (0)
1111
1112/* Record the ending position END of the current composition. */
1113
1114#define CODING_ADD_COMPOSITION_END(coding, end) \
1115 do { \
1116 struct composition_data *cmp_data = coding->cmp_data; \
1117 int *data = cmp_data->data + coding->cmp_data_start; \
1118 data[0] = cmp_data->used - coding->cmp_data_start; \
1119 data[2] = cmp_data->char_offset + end; \
1120 } while (0)
1121
1122/* Record one COMPONENT (alternate character or composition rule). */
1123
1124#define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
1125 (coding->cmp_data->data[coding->cmp_data->used++] = component)
1126
1127/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4. */
1128
33fb63eb
KH
1129#define DECODE_COMPOSITION_START(c1) \
1130 do { \
1131 if (coding->composing == COMPOSITION_DISABLED) \
1132 { \
1133 *dst++ = ISO_CODE_ESC; \
1134 *dst++ = c1 & 0x7f; \
1135 coding->produced_char += 2; \
1136 } \
1137 else if (!COMPOSING_P (coding)) \
1138 { \
1139 /* This is surely the start of a composition. We must be sure \
1140 that coding->cmp_data has enough space to store the \
1141 information about the composition. If not, terminate the \
1142 current decoding loop, allocate one more memory block for \
1143 coding->cmp_data in the caller, then start the decoding \
1144 loop again. We can't allocate memory here directly because \
1145 it may cause buffer/string relocation. */ \
1146 if (!coding->cmp_data \
1147 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1148 >= COMPOSITION_DATA_SIZE)) \
1149 { \
1150 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1151 goto label_end_of_loop; \
1152 } \
1153 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1154 : c1 == '2' ? COMPOSITION_WITH_RULE \
1155 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1156 : COMPOSITION_WITH_RULE_ALTCHARS); \
1157 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1158 coding->composing); \
1159 coding->composition_rule_follows = 0; \
1160 } \
1161 else \
1162 { \
1163 /* We are already handling a composition. If the method is \
1164 one of the following two, the codes following the current \
1165 escape sequence are actual characters stored in a buffer. */ \
1166 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1167 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1168 { \
1169 coding->composing = COMPOSITION_RELATIVE; \
1170 coding->composition_rule_follows = 0; \
1171 } \
1172 } \
ec6d2bb8
KH
1173 } while (0)
1174
1175/* Handle composition end sequence ESC 1. */
1176
1177#define DECODE_COMPOSITION_END(c1) \
1178 do { \
1179 if (coding->composing == COMPOSITION_DISABLED) \
1180 { \
1181 *dst++ = ISO_CODE_ESC; \
1182 *dst++ = c1; \
1183 coding->produced_char += 2; \
1184 } \
1185 else \
1186 { \
1187 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1188 coding->composing = COMPOSITION_NO; \
1189 } \
1190 } while (0)
1191
1192/* Decode a composition rule from the byte C1 (and maybe one more byte
1193 from SRC) and store one encoded composition rule in
1194 coding->cmp_data. */
1195
1196#define DECODE_COMPOSITION_RULE(c1) \
1197 do { \
1198 int rule = 0; \
1199 (c1) -= 32; \
1200 if (c1 < 81) /* old format (before ver.21) */ \
1201 { \
1202 int gref = (c1) / 9; \
1203 int nref = (c1) % 9; \
1204 if (gref == 4) gref = 10; \
1205 if (nref == 4) nref = 10; \
1206 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1207 } \
b73bfc1c 1208 else if (c1 < 93) /* new format (after ver.21) */ \
ec6d2bb8
KH
1209 { \
1210 ONE_MORE_BYTE (c2); \
1211 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1212 } \
1213 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1214 coding->composition_rule_follows = 0; \
1215 } while (0)
88993dfd 1216
d46c5b12 1217
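The arithmetic inside DECODE_COMPOSITION_RULE can be followed more easily outside the macro. The sketch below is an illustration only: `struct rule_refs` and `decode_rule_refs` are hypothetical names, the two reference points are returned separately instead of being packed with COMPOSITION_ENCODE_RULE (whose definition is not shown in this file), and a byte outside both formats is reported as invalid, whereas the macro records a rule of 0 in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for the two reference points of a rule.  */
struct rule_refs
{
  int gref;			/* global reference point */
  int nref;			/* new reference point */
};

/* Decode a composition rule from the byte C1 and, in the new format
   only, the following byte C2.  Return the number of bytes used
   (1 or 2), or 0 if C1 does not start a rule.  */
static int
decode_rule_refs (int c1, int c2, struct rule_refs *out)
{
  assert (out != NULL);

  c1 -= 32;
  if (c1 < 0)
    return 0;

  if (c1 < 81)
    {
      /* Old format: both reference points share one byte, nine
	 values each; the value 4 is remapped to 10.  */
      out->gref = c1 / 9;
      out->nref = c1 % 9;
      if (out->gref == 4)
	out->gref = 10;
      if (out->nref == 4)
	out->nref = 10;
      return 1;
    }

  if (c1 < 93)
    {
      /* New format: one byte per reference point.  */
      out->gref = c1 - 81;
      out->nref = c2 - 32;
      return 2;
    }

  return 0;
}
```

For example, the old-format byte 55 gives 55 - 32 = 23, hence gref 23 / 9 = 2 and nref 23 % 9 = 5.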
4ed46869
KH
1218/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1219
b73bfc1c 1220static void
d46c5b12 1221decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
1222 struct coding_system *coding;
1223 unsigned char *source, *destination;
1224 int src_bytes, dst_bytes;
4ed46869
KH
1225{
1226 unsigned char *src = source;
1227 unsigned char *src_end = source + src_bytes;
1228 unsigned char *dst = destination;
1229 unsigned char *dst_end = destination + dst_bytes;
4ed46869
KH
1230 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1231 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1232 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
b73bfc1c
KH
1233 /* SRC_BASE remembers the start position in source in each loop.
1234 The loop will be exited when there's not enough source bytes
1235 (within macro ONE_MORE_BYTE), or when there's not enough
1236 destination area to produce a character (within macro
1237 EMIT_CHAR). */
1238 unsigned char *src_base;
1239 int c, charset;
1240 Lisp_Object translation_table;
bdd9fb48 1241
b73bfc1c
KH
1242 if (NILP (Venable_character_translation))
1243 translation_table = Qnil;
1244 else
1245 {
1246 translation_table = coding->translation_table_for_decode;
1247 if (NILP (translation_table))
1248 translation_table = Vstandard_translation_table_for_decode;
1249 }
4ed46869 1250
b73bfc1c
KH
1251 coding->result = CODING_FINISH_NORMAL;
1252
1253 while (1)
4ed46869 1254 {
b73bfc1c
KH
1255 int c1, c2;
1256
1257 src_base = src;
1258 ONE_MORE_BYTE (c1);
4ed46869 1259
ec6d2bb8 1260 /* We produce no character or one character. */
4ed46869
KH
1261 switch (iso_code_class [c1])
1262 {
1263 case ISO_0x20_or_0x7F:
ec6d2bb8
KH
1264 if (COMPOSING_P (coding) && coding->composition_rule_follows)
1265 {
1266 DECODE_COMPOSITION_RULE (c1);
b73bfc1c 1267 continue;
ec6d2bb8
KH
1268 }
1269 if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
4ed46869
KH
1270 {
1271 /* This is SPACE or DEL. */
b73bfc1c 1272 charset = CHARSET_ASCII;
4ed46869
KH
1273 break;
1274 }
1275 /* This is a graphic character; we fall through ... */
1276
1277 case ISO_graphic_plane_0:
ec6d2bb8 1278 if (COMPOSING_P (coding) && coding->composition_rule_follows)
b73bfc1c
KH
1279 {
1280 DECODE_COMPOSITION_RULE (c1);
1281 continue;
1282 }
1283 charset = charset0;
4ed46869
KH
1284 break;
1285
1286 case ISO_0xA0_or_0xFF:
d46c5b12
KH
1287 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1288 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
fb88bf2d 1289 goto label_invalid_code;
4ed46869
KH
1290 /* This is a graphic character; we fall through ... */
1291
1292 case ISO_graphic_plane_1:
b73bfc1c 1293 if (charset1 < 0)
fb88bf2d 1294 goto label_invalid_code;
b73bfc1c 1295 charset = charset1;
4ed46869
KH
1296 break;
1297
b73bfc1c 1298 case ISO_control_0:
ec6d2bb8
KH
1299 if (COMPOSING_P (coding))
1300 DECODE_COMPOSITION_END ('1');
1301
4ed46869
KH
1302 /* All ISO2022 control characters in this class have the
1303 same representation in Emacs internal format. */
d46c5b12
KH
1304 if (c1 == '\n'
1305 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1306 && (coding->eol_type == CODING_EOL_CR
1307 || coding->eol_type == CODING_EOL_CRLF))
1308 {
b73bfc1c
KH
1309 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1310 goto label_end_of_loop;
d46c5b12 1311 }
b73bfc1c 1312 charset = CHARSET_ASCII;
4ed46869
KH
1313 break;
1314
b73bfc1c
KH
1315 case ISO_control_1:
1316 if (COMPOSING_P (coding))
1317 DECODE_COMPOSITION_END ('1');
1318 goto label_invalid_code;
1319
4ed46869 1320 case ISO_carriage_return:
ec6d2bb8
KH
1321 if (COMPOSING_P (coding))
1322 DECODE_COMPOSITION_END ('1');
1323
4ed46869 1324 if (coding->eol_type == CODING_EOL_CR)
b73bfc1c 1325 c1 = '\n';
4ed46869
KH
1326 else if (coding->eol_type == CODING_EOL_CRLF)
1327 {
1328 ONE_MORE_BYTE (c1);
b73bfc1c 1329 if (c1 != ISO_CODE_LF)
4ed46869 1330 {
d46c5b12
KH
1331 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1332 {
b73bfc1c
KH
1333 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1334 goto label_end_of_loop;
d46c5b12 1335 }
4ed46869 1336 src--;
b73bfc1c 1337 c1 = '\r';
4ed46869
KH
1338 }
1339 }
b73bfc1c 1340 charset = CHARSET_ASCII;
4ed46869
KH
1341 break;
1342
1343 case ISO_shift_out:
d46c5b12
KH
1344 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1345 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1346 goto label_invalid_code;
4ed46869
KH
1347 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
1348 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1349 continue;
4ed46869
KH
1350
1351 case ISO_shift_in:
d46c5b12
KH
1352 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1353 goto label_invalid_code;
4ed46869
KH
1354 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
1355 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1356 continue;
4ed46869
KH
1357
1358 case ISO_single_shift_2_7:
1359 case ISO_single_shift_2:
d46c5b12
KH
1360 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1361 goto label_invalid_code;
4ed46869
KH
1362 /* SS2 is handled as an escape sequence of ESC 'N' */
1363 c1 = 'N';
1364 goto label_escape_sequence;
1365
1366 case ISO_single_shift_3:
d46c5b12
KH
1367 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1368 goto label_invalid_code;
4ed46869
KH
1369 /* SS3 is handled as an escape sequence of ESC 'O' */
1370 c1 = 'O';
1371 goto label_escape_sequence;
1372
1373 case ISO_control_sequence_introducer:
1374 /* CSI is handled as an escape sequence of ESC '[' ... */
1375 c1 = '[';
1376 goto label_escape_sequence;
1377
1378 case ISO_escape:
1379 ONE_MORE_BYTE (c1);
1380 label_escape_sequence:
1381 /* Escape sequences handled by Emacs are invocation,
1382 designation, direction specification, and character
1383 composition specification. */
1384 switch (c1)
1385 {
1386 case '&': /* revision of following character set */
1387 ONE_MORE_BYTE (c1);
1388 if (!(c1 >= '@' && c1 <= '~'))
d46c5b12 1389 goto label_invalid_code;
4ed46869
KH
1390 ONE_MORE_BYTE (c1);
1391 if (c1 != ISO_CODE_ESC)
d46c5b12 1392 goto label_invalid_code;
4ed46869
KH
1393 ONE_MORE_BYTE (c1);
1394 goto label_escape_sequence;
1395
1396 case '$': /* designation of 2-byte character set */
d46c5b12
KH
1397 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1398 goto label_invalid_code;
4ed46869
KH
1399 ONE_MORE_BYTE (c1);
1400 if (c1 >= '@' && c1 <= 'B')
1401 { /* designation of JISX0208.1978, GB2312.1980,
88993dfd 1402 or JISX0208.1980 */
4ed46869
KH
1403 DECODE_DESIGNATION (0, 2, 94, c1);
1404 }
1405 else if (c1 >= 0x28 && c1 <= 0x2B)
1406 { /* designation of DIMENSION2_CHARS94 character set */
1407 ONE_MORE_BYTE (c2);
1408 DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
1409 }
1410 else if (c1 >= 0x2C && c1 <= 0x2F)
1411 { /* designation of DIMENSION2_CHARS96 character set */
1412 ONE_MORE_BYTE (c2);
1413 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
1414 }
1415 else
d46c5b12 1416 goto label_invalid_code;
b73bfc1c
KH
1417 /* We must update these variables now. */
1418 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1419 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1420 continue;
4ed46869
KH
1421
1422 case 'n': /* invocation of locking-shift-2 */
d46c5b12
KH
1423 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1424 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1425 goto label_invalid_code;
4ed46869 1426 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
e0e989f6 1427 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1428 continue;
4ed46869
KH
1429
1430 case 'o': /* invocation of locking-shift-3 */
d46c5b12
KH
1431 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1432 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1433 goto label_invalid_code;
4ed46869 1434 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
e0e989f6 1435 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1436 continue;
4ed46869
KH
1437
1438 case 'N': /* invocation of single-shift-2 */
d46c5b12
KH
1439 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1440 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1441 goto label_invalid_code;
4ed46869 1442 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
b73bfc1c 1443 ONE_MORE_BYTE (c1);
4ed46869
KH
1444 break;
1445
1446 case 'O': /* invocation of single-shift-3 */
d46c5b12
KH
1447 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1448 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1449 goto label_invalid_code;
4ed46869 1450 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
b73bfc1c 1451 ONE_MORE_BYTE (c1);
4ed46869
KH
1452 break;
1453
ec6d2bb8
KH
1454 case '0': case '2': case '3': case '4': /* start composition */
1455 DECODE_COMPOSITION_START (c1);
b73bfc1c 1456 continue;
4ed46869 1457
ec6d2bb8
KH
1458 case '1': /* end composition */
1459 DECODE_COMPOSITION_END (c1);
b73bfc1c 1460 continue;
4ed46869
KH
1461
1462 case '[': /* specification of direction */
d46c5b12
KH
1463 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
1464 goto label_invalid_code;
4ed46869 1465 /* For the moment, nested direction is not supported.
d46c5b12
KH
1466 So, `coding->mode & CODING_MODE_DIRECTION' zero means
1467 left-to-right, and nonzero means right-to-left. */
4ed46869
KH
1468 ONE_MORE_BYTE (c1);
1469 switch (c1)
1470 {
1471 case ']': /* end of the current direction */
d46c5b12 1472 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869
KH
1473 break; /* CSI ']' is complete; don't fall through. */
1474 case '0': /* end of the current direction */
1475 case '1': /* start of left-to-right direction */
1476 ONE_MORE_BYTE (c1);
1477 if (c1 == ']')
d46c5b12 1478 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869 1479 else
d46c5b12 1480 goto label_invalid_code;
4ed46869
KH
1481 break;
1482
1483 case '2': /* start of right-to-left direction */
1484 ONE_MORE_BYTE (c1);
1485 if (c1 == ']')
d46c5b12 1486 coding->mode |= CODING_MODE_DIRECTION;
4ed46869 1487 else
d46c5b12 1488 goto label_invalid_code;
4ed46869
KH
1489 break;
1490
1491 default:
d46c5b12 1492 goto label_invalid_code;
4ed46869 1493 }
b73bfc1c 1494 continue;
4ed46869
KH
1495
1496 default:
d46c5b12
KH
1497 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1498 goto label_invalid_code;
4ed46869
KH
1499 if (c1 >= 0x28 && c1 <= 0x2B)
1500 { /* designation of DIMENSION1_CHARS94 character set */
1501 ONE_MORE_BYTE (c2);
1502 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
1503 }
1504 else if (c1 >= 0x2C && c1 <= 0x2F)
1505 { /* designation of DIMENSION1_CHARS96 character set */
1506 ONE_MORE_BYTE (c2);
1507 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
1508 }
1509 else
b73bfc1c
KH
1510 goto label_invalid_code;
1511 /* We must update these variables now. */
1512 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1513 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1514 continue;
4ed46869 1515 }
b73bfc1c 1516 }
4ed46869 1517
b73bfc1c
KH
1518 /* Now we know CHARSET and 1st position code C1 of a character.
1519 Produce a multibyte sequence for that character while getting
1520 2nd position code C2 if necessary. */
1521 if (CHARSET_DIMENSION (charset) == 2)
1522 {
1523 ONE_MORE_BYTE (c2);
1524 if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
1525 /* C2 is not in a valid range. */
1526 goto label_invalid_code;
4ed46869 1527 }
b73bfc1c
KH
1528 c = DECODE_ISO_CHARACTER (charset, c1, c2);
1529 EMIT_CHAR (c);
4ed46869
KH
1530 continue;
1531
b73bfc1c
KH
1532 label_invalid_code:
1533 coding->errors++;
1534 if (COMPOSING_P (coding))
1535 DECODE_COMPOSITION_END ('1');
4ed46869 1536 src = src_base;
b73bfc1c
KH
1537 c = *src++;
1538 EMIT_CHAR (c);
4ed46869 1539 }
fb88bf2d 1540
b73bfc1c
KH
1541 label_end_of_loop:
1542 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 1543 coding->produced = dst - destination;
b73bfc1c 1544 return;
4ed46869
KH
1545}
1546
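The range test applied to the 2nd position code of a two-byte character in the decoder above is compact, so here it is spelled out as a standalone predicate. This is a sketch for illustration; `second_byte_is_valid` is a hypothetical name that does not appear in coding.c.

```c
#include <assert.h>

/* Return 1 if C2 is an acceptable 2nd position code after the 1st
   position code C1, 0 otherwise.  A 7-bit first byte requires a
   second byte in 0x20..0x7F; an 8-bit first byte requires a second
   byte of 0xA0 or above.  */
static int
second_byte_is_valid (int c1, int c2)
{
  assert (c1 >= 0 && c1 <= 0xFF);
  assert (c2 >= 0 && c2 <= 0xFF);

  if (c1 < 0x80)
    return c2 >= 0x20 && c2 < 0x80;
  return c2 >= 0xA0;
}
```

This is the negation of the condition `c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0` that sends the decoder to label_invalid_code.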
b73bfc1c 1547
f4dee582 1548/* ISO2022 encoding stuff. */
4ed46869
KH
1549
1550/*
f4dee582 1551 It is not enough to say just "ISO2022" when encoding; we have to
d46c5b12 1552 specify more details. In Emacs, each coding system of ISO2022
4ed46869
KH
1553 variant has the following specifications:
1554 1. Initial designation to G0 thru G3.
1555 2. Allows short-form designation?
1556 3. ASCII should be designated to G0 before control characters?
1557 4. ASCII should be designated to G0 at end of line?
1558 5. 7-bit environment or 8-bit environment?
1559 6. Use locking-shift?
1560 7. Use Single-shift?
1561 And the following two are only for Japanese:
1562 8. Use ASCII in place of JISX0201-1976-Roman?
1563 9. Use JISX0208-1983 in place of JISX0208-1978?
1564 These specifications are encoded in `coding->flags' as flag bits
1565 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
f4dee582 1566 details.
4ed46869
KH
1567*/
1568
1569/* Produce codes (escape sequence) for designating CHARSET to graphic
b73bfc1c
KH
1570 register REG at DST, and increment DST. If <final-char> of CHARSET is
1571 '@', 'A', or 'B' and the coding system CODING allows, produce
1572 designation sequence of short-form. */
4ed46869
KH
1573
1574#define ENCODE_DESIGNATION(charset, reg, coding) \
1575 do { \
1576 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
1577 char *intermediate_char_94 = "()*+"; \
1578 char *intermediate_char_96 = ",-./"; \
70c22245 1579 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
b73bfc1c 1580 \
70c22245
KH
1581 if (revision < 255) \
1582 { \
4ed46869
KH
1583 *dst++ = ISO_CODE_ESC; \
1584 *dst++ = '&'; \
70c22245 1585 *dst++ = '@' + revision; \
4ed46869 1586 } \
b73bfc1c 1587 *dst++ = ISO_CODE_ESC; \
4ed46869
KH
1588 if (CHARSET_DIMENSION (charset) == 1) \
1589 { \
1590 if (CHARSET_CHARS (charset) == 94) \
1591 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1592 else \
1593 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
1594 } \
1595 else \
1596 { \
1597 *dst++ = '$'; \
1598 if (CHARSET_CHARS (charset) == 94) \
1599 { \
b73bfc1c
KH
1600 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
1601 || reg != 0 \
1602 || final_char < '@' || final_char > 'B') \
4ed46869
KH
1603 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1604 } \
1605 else \
b73bfc1c 1606 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
4ed46869 1607 } \
b73bfc1c 1608 *dst++ = final_char; \
4ed46869
KH
1609 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1610 } while (0)
1611
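The byte layout that ENCODE_DESIGNATION produces can be shown with a standalone function that takes the charset properties as plain arguments. This is a sketch under assumptions: `build_designation` and its parameters are hypothetical, the optional revision prefix (ESC '&' ...) is left out, and the caller is assumed to provide a buffer of at least four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Write at DST the escape sequence that designates a charset of the
   given DIMENSION (1 or 2) and CHARS (94 or 96) with final character
   FINAL_CHAR to graphic register REG (0..3).  If SHORT_FORM is
   nonzero, use the short form ESC '$' F for two-byte 94-charsets
   whose final character is '@', 'A', or 'B' when REG is 0.
   Return the number of bytes written (3 or 4).  */
static size_t
build_designation (int dimension, int chars, int reg,
		   unsigned char final_char, int short_form,
		   unsigned char *dst)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  assert (dst != NULL);
  assert (reg >= 0 && reg <= 3);
  assert (dimension == 1 || dimension == 2);
  assert (chars == 94 || chars == 96);

  dst[n++] = 0x1B;		/* ESC */
  if (dimension == 2)
    dst[n++] = '$';

  if (chars == 96)
    dst[n++] = (unsigned char) intermediate_96[reg];
  else if (dimension == 1
	   || !short_form
	   || reg != 0
	   || final_char < '@' || final_char > 'B')
    dst[n++] = (unsigned char) intermediate_94[reg];
  /* Otherwise the short form applies and no intermediate character
     is written.  */

  dst[n++] = final_char;
  return n;
}
```

With these rules, designating JISX0208 (final character 'B') to G0 gives ESC '$' 'B' in short form and ESC '$' '(' 'B' otherwise, while designating ASCII to G0 gives ESC '(' 'B'.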
1612/* The following two macros produce codes (control character or escape
1613 sequence) for ISO2022 single-shift functions (single-shift-2 and
1614 single-shift-3). */
1615
1616#define ENCODE_SINGLE_SHIFT_2 \
1617 do { \
1618 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1619 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
1620 else \
b73bfc1c 1621 *dst++ = ISO_CODE_SS2; \
4ed46869
KH
1622 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1623 } while (0)
1624
fb88bf2d
KH
1625#define ENCODE_SINGLE_SHIFT_3 \
1626 do { \
4ed46869 1627 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
fb88bf2d
KH
1628 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
1629 else \
b73bfc1c 1630 *dst++ = ISO_CODE_SS3; \
4ed46869
KH
1631 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1632 } while (0)
1633
1634/* The following four macros produce codes (control character or
1635 escape sequence) for ISO2022 locking-shift functions (shift-in,
1636 shift-out, locking-shift-2, and locking-shift-3). */
1637
b73bfc1c
KH
1638#define ENCODE_SHIFT_IN \
1639 do { \
1640 *dst++ = ISO_CODE_SI; \
4ed46869
KH
1641 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
1642 } while (0)
1643
b73bfc1c
KH
1644#define ENCODE_SHIFT_OUT \
1645 do { \
1646 *dst++ = ISO_CODE_SO; \
4ed46869
KH
1647 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
1648 } while (0)
1649
1650#define ENCODE_LOCKING_SHIFT_2 \
1651 do { \
1652 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
1653 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
1654 } while (0)
1655
b73bfc1c
KH
1656#define ENCODE_LOCKING_SHIFT_3 \
1657 do { \
1658 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
4ed46869
KH
1659 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
1660 } while (0)
1661
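The six shift macros above differ only in the bytes they write and in the state they update. The byte side can be summarized in one standalone function; this is a sketch for illustration, with a hypothetical enum `shift_function` and function `emit_shift`, and it leaves the invocation and single-shifting state to the caller. The control codes used are the standard ones: SO is 0x0E, SI is 0x0F, SS2 is 0x8E, and SS3 is 0x8F.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the six shift functions.  */
enum shift_function
{
  SHIFT_IN,
  SHIFT_OUT,
  LOCKING_SHIFT_2,
  LOCKING_SHIFT_3,
  SINGLE_SHIFT_2,
  SINGLE_SHIFT_3
};

/* Write the code for shift function F at DST and return the number
   of bytes written (1 or 2).  SEVEN_BITS selects the escape-sequence
   form of the single shifts; it has no effect on the others.  */
static size_t
emit_shift (enum shift_function f, int seven_bits, unsigned char *dst)
{
  size_t n = 0;

  assert (dst != NULL);

  switch (f)
    {
    case SHIFT_IN:
      dst[n++] = 0x0F;		/* SI */
      break;
    case SHIFT_OUT:
      dst[n++] = 0x0E;		/* SO */
      break;
    case LOCKING_SHIFT_2:
      dst[n++] = 0x1B, dst[n++] = 'n';
      break;
    case LOCKING_SHIFT_3:
      dst[n++] = 0x1B, dst[n++] = 'o';
      break;
    case SINGLE_SHIFT_2:
      if (seven_bits)
	dst[n++] = 0x1B, dst[n++] = 'N';
      else
	dst[n++] = 0x8E;	/* SS2 */
      break;
    case SINGLE_SHIFT_3:
      if (seven_bits)
	dst[n++] = 0x1B, dst[n++] = 'O';
      else
	dst[n++] = 0x8F;	/* SS3 */
      break;
    }
  return n;
}
```

Keeping the state update out of this function is a deliberate simplification; in the macros, the locking shifts record the new G0..G3 invocation and the single shifts set a flag that the next character consumes.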
f4dee582
RS
1662/* Produce codes for a DIMENSION1 character whose character set is
1663 CHARSET and whose position-code is C1. Designation and invocation
4ed46869
KH
1664 sequences are also produced in advance if necessary. */
1665
6e85d753
KH
1666#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
1667 do { \
1668 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1669 { \
1670 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1671 *dst++ = c1 & 0x7F; \
1672 else \
1673 *dst++ = c1 | 0x80; \
1674 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1675 break; \
1676 } \
1677 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1678 { \
1679 *dst++ = c1 & 0x7F; \
1680 break; \
1681 } \
1682 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1683 { \
1684 *dst++ = c1 | 0x80; \
1685 break; \
1686 } \
1687 else if (coding->flags & CODING_FLAG_ISO_SAFE \
70c22245 1688 && !coding->safe_charsets[charset]) \
6e85d753
KH
1689 { \
1690 /* We should not encode this character, instead produce one or \
1691 two `?'s. */ \
1692 *dst++ = CODING_INHIBIT_CHARACTER_SUBSTITUTION; \
1693 if (CHARSET_WIDTH (charset) == 2) \
1694 *dst++ = CODING_INHIBIT_CHARACTER_SUBSTITUTION; \
1695 break; \
1696 } \
1697 else \
1698 /* Since CHARSET is not yet invoked to any graphic planes, we \
1699 must invoke it, or, at first, designate it to some graphic \
1700 register. Then repeat the loop to actually produce the \
1701 character. */ \
1702 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1703 } while (1)
1704
f4dee582
RS
1705/* Produce codes for a DIMENSION2 character whose character set is
1706 CHARSET and whose position-codes are C1 and C2. Designation and
4ed46869
KH
1707 invocation codes are also produced in advance if necessary. */
1708
6e85d753
KH
1709#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
1710 do { \
1711 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1712 { \
1713 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1714 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
1715 else \
1716 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
1717 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1718 break; \
1719 } \
1720 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1721 { \
1722 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
1723 break; \
1724 } \
1725 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1726 { \
1727 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
1728 break; \
1729 } \
1730 else if (coding->flags & CODING_FLAG_ISO_SAFE \
70c22245 1731 && !coding->safe_charsets[charset]) \
6e85d753
KH
1732 { \
1733 /* We should not encode this character, instead produce one or \
1734 two `?'s. */ \
1735 *dst++ = CODING_INHIBIT_CHARACTER_SUBSTITUTION; \
1736 if (CHARSET_WIDTH (charset) == 2) \
1737 *dst++ = CODING_INHIBIT_CHARACTER_SUBSTITUTION; \
1738 break; \
1739 } \
1740 else \
1741 /* Since CHARSET is not yet invoked to any graphic planes, we \
1742 must invoke it, or, at first, designate it to some graphic \
1743 register. Then repeat the loop to actually produce the \
1744 character. */ \
1745 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1746 } while (1)
1747
6f551029
KH
1748#define ENCODE_ISO_CHARACTER(charset, c1, c2) \
1749 do { \
b73bfc1c 1750 int alt_charset = charset; \
ec6d2bb8 1751 \
b73bfc1c 1752 if (CHARSET_DEFINED_P (charset)) \
6f551029 1753 { \
b73bfc1c 1754 if (CHARSET_DIMENSION (charset) == 1) \
6f551029
KH
1755 { \
1756 if (charset == CHARSET_ASCII \
1757 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
b73bfc1c
KH
1758 alt_charset = charset_latin_jisx0201; \
1759 ENCODE_ISO_CHARACTER_DIMENSION1 (alt_charset, c1); \
6f551029
KH
1760 } \
1761 else \
1762 { \
1763 if (charset == charset_jisx0208 \
1764 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
b73bfc1c
KH
1765 alt_charset = charset_jisx0208_1978; \
1766 ENCODE_ISO_CHARACTER_DIMENSION2 (alt_charset, c1, c2); \
6f551029
KH
1767 } \
1768 } \
1769 else \
1770 { \
b73bfc1c
KH
1771 *dst++ = c1; \
1772 if (c2 >= 0) \
1773 *dst++ = c2; \
6f551029 1774 } \
84fbb8a0 1775 } while (0)
bdd9fb48 1776
4ed46869
KH
1777/* Produce designation and invocation codes at a place pointed to by DST
1778 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
1779 Return new DST. */
1780
1781unsigned char *
1782encode_invocation_designation (charset, coding, dst)
1783 int charset;
1784 struct coding_system *coding;
1785 unsigned char *dst;
1786{
1787 int reg; /* graphic register number */
1788
1789 /* At first, check designations. */
1790 for (reg = 0; reg < 4; reg++)
1791 if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
1792 break;
1793
1794 if (reg >= 4)
1795 {
1796 /* CHARSET is not yet designated to any graphic registers. */
1797 /* At first check the requested designation. */
1798 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1ba9e4ab
KH
1799 if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
1800 /* Since CHARSET requests no special designation, designate it
1801 to graphic register 0. */
4ed46869
KH
1802 reg = 0;
1803
1804 ENCODE_DESIGNATION (charset, reg, coding);
1805 }
1806
1807 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
1808 && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
1809 {
1810 /* Since the graphic register REG is not invoked to any graphic
1811 planes, invoke it to graphic plane 0. */
1812 switch (reg)
1813 {
1814 case 0: /* graphic register 0 */
1815 ENCODE_SHIFT_IN;
1816 break;
1817
1818 case 1: /* graphic register 1 */
1819 ENCODE_SHIFT_OUT;
1820 break;
1821
1822 case 2: /* graphic register 2 */
1823 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1824 ENCODE_SINGLE_SHIFT_2;
1825 else
1826 ENCODE_LOCKING_SHIFT_2;
1827 break;
1828
1829 case 3: /* graphic register 3 */
1830 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1831 ENCODE_SINGLE_SHIFT_3;
1832 else
1833 ENCODE_LOCKING_SHIFT_3;
1834 break;
1835 }
1836 }
b73bfc1c 1837
4ed46869
KH
1838 return dst;
1839}
1840
ec6d2bb8
KH
1841/* Produce 2-byte codes for encoded composition rule RULE. */
1842
1843#define ENCODE_COMPOSITION_RULE(rule) \
1844 do { \
1845 int gref, nref; \
1846 COMPOSITION_DECODE_RULE (rule, gref, nref); \
1847 *dst++ = 32 + 81 + gref; \
1848 *dst++ = 32 + nref; \
1849 } while (0)
1850
1851/* Produce codes for indicating the start of a composition sequence
1852 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
1853 which specify information about the composition. See the comment
1854 in coding.h for the format of DATA. */
1855
1856#define ENCODE_COMPOSITION_START(coding, data) \
1857 do { \
1858 coding->composing = data[3]; \
1859 *dst++ = ISO_CODE_ESC; \
1860 if (coding->composing == COMPOSITION_RELATIVE) \
1861 *dst++ = '0'; \
1862 else \
1863 { \
1864 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
1865 ? '3' : '4'); \
1866 coding->cmp_data_index = coding->cmp_data_start + 4; \
1867 coding->composition_rule_follows = 0; \
1868 } \
1869 } while (0)
1870
1871/* Produce codes for indicating the end of the current composition. */
1872
1873#define ENCODE_COMPOSITION_END(coding, data) \
1874 do { \
1875 *dst++ = ISO_CODE_ESC; \
1876 *dst++ = '1'; \
1877 coding->cmp_data_start += data[0]; \
1878 coding->composing = COMPOSITION_NO; \
1879 if (coding->cmp_data_start == coding->cmp_data->used \
1880 && coding->cmp_data->next) \
1881 { \
1882 coding->cmp_data = coding->cmp_data->next; \
1883 coding->cmp_data_start = 0; \
1884 } \
1885 } while (0)
1886
1887/* Produce composition start sequence ESC 0. Here, this sequence
1888 doesn't mean the start of a new composition but means that we have
1889 just produced components (alternate chars and composition rules) of
1890 the composition and the actual text follows in SRC. */
1891
1892#define ENCODE_COMPOSITION_FAKE_START(coding) \
1893 do { \
1894 *dst++ = ISO_CODE_ESC; \
1895 *dst++ = '0'; \
1896 coding->composing = COMPOSITION_RELATIVE; \
1897 } while (0)
4ed46869
KH
1898
1899/* The following three macros produce codes for indicating direction
1900 of text. */
b73bfc1c
KH
1901#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
1902 do { \
4ed46869 1903 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
b73bfc1c
KH
1904 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
1905 else \
1906 *dst++ = ISO_CODE_CSI; \
4ed46869
KH
1907 } while (0)
1908
1909#define ENCODE_DIRECTION_R2L \
b73bfc1c 1910 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'
4ed46869
KH
1911
1912#define ENCODE_DIRECTION_L2R \
b73bfc1c 1913 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'
4ed46869
KH
1914
1915/* Produce codes for designation and invocation to reset the graphic
1916 planes and registers to initial state. */
e0e989f6
KH
1917#define ENCODE_RESET_PLANE_AND_REGISTER \
1918 do { \
1919 int reg; \
1920 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
1921 ENCODE_SHIFT_IN; \
1922 for (reg = 0; reg < 4; reg++) \
1923 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
1924 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
1925 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
1926 ENCODE_DESIGNATION \
1927 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
4ed46869
KH
1928 } while (0)
1929
bdd9fb48 1930/* Produce designation sequences of charsets in the line started from
b73bfc1c 1931 SRC to a place pointed to by DST, and return updated DST.
bdd9fb48
KH
1932
1933 If the current block ends before any end-of-line, we may fail to
d46c5b12
KH
1934 find all the necessary designations. */
1935
b73bfc1c
KH
1936static unsigned char *
1937encode_designation_at_bol (coding, translation_table, src, src_end, dst)
e0e989f6 1938 struct coding_system *coding;
b73bfc1c
KH
1939 Lisp_Object translation_table;
1940 unsigned char *src, *src_end, *dst;
e0e989f6 1941{
bdd9fb48
KH
1942 int charset, c, found = 0, reg;
1943 /* Table of charsets to be designated to each graphic register. */
1944 int r[4];
bdd9fb48
KH
1945
1946 for (reg = 0; reg < 4; reg++)
1947 r[reg] = -1;
1948
b73bfc1c 1949 while (found < 4)
e0e989f6 1950 {
b73bfc1c
KH
1951 ONE_MORE_CHAR (c);
1952 if (c == '\n')
1953 break;
bdd9fb48 1954
b73bfc1c 1955 charset = CHAR_CHARSET (c);
e0e989f6 1956 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
d46c5b12 1957 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
bdd9fb48
KH
1958 {
1959 found++;
1960 r[reg] = charset;
1961 }
bdd9fb48
KH
1962 }
1963
b73bfc1c 1964 label_end_of_loop:
bdd9fb48
KH
1965 if (found)
1966 {
1967 for (reg = 0; reg < 4; reg++)
1968 if (r[reg] >= 0
1969 && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
1970 ENCODE_DESIGNATION (r[reg], reg, coding);
e0e989f6 1971 }
b73bfc1c
KH
1972
1973 return dst;
e0e989f6
KH
1974}
1975
4ed46869
KH
1976/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
1977
b73bfc1c 1978static void
d46c5b12 1979encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
1980 struct coding_system *coding;
1981 unsigned char *source, *destination;
1982 int src_bytes, dst_bytes;
4ed46869
KH
1983{
1984 unsigned char *src = source;
1985 unsigned char *src_end = source + src_bytes;
1986 unsigned char *dst = destination;
1987 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c 1988 /* Since each loop iteration produces at most 20 bytes, we subtract 19
4ed46869
KH
1989 from DST_END so that overflow checking is necessary only at the
1990 head of the loop. */
b73bfc1c
KH
1991 unsigned char *adjusted_dst_end = dst_end - 19;
1992 /* SRC_BASE remembers the start position in source in each loop.
1993 The loop will be exited when there's not enough source text to
1994 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
1995 there's not enough destination area to produce encoded codes
1996 (within macro EMIT_BYTES). */
1997 unsigned char *src_base;
1998 int c;
1999 Lisp_Object translation_table;
bdd9fb48 2000
b73bfc1c
KH
2001 if (NILP (Venable_character_translation))
2002 translation_table = Qnil;
2003 else
2004 {
2005 translation_table = coding->translation_table_for_encode;
2006 if (NILP (translation_table))
2007 translation_table = Vstandard_translation_table_for_encode;
2008 }
4ed46869 2009
d46c5b12 2010 coding->consumed_char = 0;
b73bfc1c
KH
2011 coding->errors = 0;
2012 while (1)
4ed46869 2013 {
b73bfc1c
KH
2014 int charset, c1, c2;
2015
2016 src_base = src;
2017
2018 if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
2019 {
2020 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2021 break;
2022 }
4ed46869 2023
e0e989f6
KH
2024 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
2025 && CODING_SPEC_ISO_BOL (coding))
2026 {
bdd9fb48 2027 /* We have to produce designation sequences if any now. */
b73bfc1c
KH
2028 dst = encode_designation_at_bol (coding, translation_table,
2029 src, src_end, dst);
e0e989f6
KH
2030 CODING_SPEC_ISO_BOL (coding) = 0;
2031 }
2032
ec6d2bb8
KH
2033 /* Check composition start and end. */
2034 if (coding->composing != COMPOSITION_DISABLED
2035 && coding->cmp_data_start < coding->cmp_data->used)
4ed46869 2036 {
ec6d2bb8
KH
2037 struct composition_data *cmp_data = coding->cmp_data;
2038 int *data = cmp_data->data + coding->cmp_data_start;
2039 int this_pos = cmp_data->char_offset + coding->consumed_char;
2040
2041 if (coding->composing == COMPOSITION_RELATIVE)
4ed46869 2042 {
ec6d2bb8
KH
2043 if (this_pos == data[2])
2044 {
2045 ENCODE_COMPOSITION_END (coding, data);
2046 cmp_data = coding->cmp_data;
2047 data = cmp_data->data + coding->cmp_data_start;
2048 }
4ed46869 2049 }
ec6d2bb8 2050 else if (COMPOSING_P (coding))
4ed46869 2051 {
ec6d2bb8
KH
2052 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHARS */
2053 if (coding->cmp_data_index == coding->cmp_data_start + data[0])
2054 /* We have consumed components of the composition.
2055 What follows in SRC is the composition's base
2056 text. */
2057 ENCODE_COMPOSITION_FAKE_START (coding);
2058 else
4ed46869 2059 {
ec6d2bb8
KH
2060 int c = cmp_data->data[coding->cmp_data_index++];
2061 if (coding->composition_rule_follows)
2062 {
2063 ENCODE_COMPOSITION_RULE (c);
2064 coding->composition_rule_follows = 0;
2065 }
2066 else
2067 {
2068 SPLIT_CHAR (c, charset, c1, c2);
2069 ENCODE_ISO_CHARACTER (charset, c1, c2);
ec6d2bb8
KH
2070 if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
2071 coding->composition_rule_follows = 1;
2072 }
4ed46869
KH
2073 continue;
2074 }
ec6d2bb8
KH
2075 }
2076 if (!COMPOSING_P (coding))
2077 {
2078 if (this_pos == data[1])
4ed46869 2079 {
ec6d2bb8
KH
2080 ENCODE_COMPOSITION_START (coding, data);
2081 continue;
4ed46869 2082 }
4ed46869
KH
2083 }
2084 }
ec6d2bb8 2085
b73bfc1c 2086 ONE_MORE_CHAR (c);
4ed46869 2087
b73bfc1c
KH
2088 /* Now encode the character C. */
2089 if (c < 0x20 || c == 0x7F)
2090 {
2091 if (c == '\r')
19a8d9e0 2092 {
b73bfc1c
KH
2093 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2094 {
2095 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2096 ENCODE_RESET_PLANE_AND_REGISTER;
2097 *dst++ = c;
2098 continue;
2099 }
2100 /* fall through to treat '\r' as '\n' ... */
2101 c = '\n';
19a8d9e0 2102 }
b73bfc1c 2103 if (c == '\n')
19a8d9e0 2104 {
b73bfc1c
KH
2105 if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
2106 ENCODE_RESET_PLANE_AND_REGISTER;
2107 if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
2108 bcopy (coding->spec.iso2022.initial_designation,
2109 coding->spec.iso2022.current_designation,
2110 sizeof coding->spec.iso2022.initial_designation);
2111 if (coding->eol_type == CODING_EOL_LF
2112 || coding->eol_type == CODING_EOL_UNDECIDED)
2113 *dst++ = ISO_CODE_LF;
2114 else if (coding->eol_type == CODING_EOL_CRLF)
2115 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
2116 else
2117 *dst++ = ISO_CODE_CR;
2118 CODING_SPEC_ISO_BOL (coding) = 1;
19a8d9e0 2119 }
b73bfc1c 2120 else
19a8d9e0 2121 {
b73bfc1c
KH
2122 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2123 ENCODE_RESET_PLANE_AND_REGISTER;
2124 *dst++ = c;
19a8d9e0 2125 }
4ed46869 2126 }
b73bfc1c
KH
2127 else if (ASCII_BYTE_P (c))
2128 ENCODE_ISO_CHARACTER (CHARSET_ASCII, c, /* dummy */ c1);
2129 else if (SINGLE_BYTE_CHAR_P (c))
88993dfd 2130 {
b73bfc1c
KH
2131 *dst++ = c;
2132 coding->errors++;
88993dfd 2133 }
b73bfc1c
KH
2134 else
2135 {
2136 SPLIT_CHAR (c, charset, c1, c2);
2137 ENCODE_ISO_CHARACTER (charset, c1, c2);
2138 }
2139
2140 coding->consumed_char++;
84fbb8a0 2141 }
b73bfc1c
KH
2142
2143 label_end_of_loop:
2144 coding->consumed = src_base - source;
d46c5b12 2145 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2146}
2147
2148\f
2149/*** 4. SJIS and BIG5 handlers ***/
2150
f4dee582 2151/* Although SJIS and BIG5 are not ISO coding systems, they are used
4ed46869
KH
2152 quite widely. So, for the moment, Emacs supports them in the bare
2153 C code. But, in the future, they may be supported only by CCL. */
2154
2155/* SJIS is a coding system encoding three character sets: ASCII, right
2156 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2157 as is. A character of charset katakana-jisx0201 is encoded by
2158 "position-code + 0x80". A character of charset japanese-jisx0208
2159 is encoded in two bytes, but the two position-codes are divided and
2160 shifted so that they fit in the ranges below.
2161
2162 --- CODE RANGE of SJIS ---
2163 (character set) (range)
2164 ASCII 0x00 .. 0x7F
2165 KATAKANA-JISX0201 0xA0 .. 0xDF
c28a9453 2166 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF
d14d03ac 2167 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC
4ed46869
KH
2168 -------------------------------
2169
2170*/
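/* Illustration only, not part of coding.c: a minimal sketch of the
   "divide and shift" arithmetic described above, using the widely
   documented JIS X 0208 <-> SJIS conversion.  The function names are
   hypothetical; the DECODE_SJIS and ENCODE_SJIS macros used later in
   this file are defined elsewhere and may be written differently.  */

```c
#include <assert.h>

/* Convert JIS X 0208 position-codes J1 and J2 (each 0x21..0x7E) to
   the SJIS bytes *S1 and *S2.  Two JIS rows share one SJIS first
   byte: an odd row uses the lower second-byte range, an even row the
   upper one.  */
static void
demo_jis_to_sjis (int j1, int j2, int *s1, int *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  *s1 = ((j1 - 0x21) >> 1) + 0x81;
  if (*s1 > 0x9F)
    *s1 += 0x40;                /* skip the 0xA0..0xDF katakana range */
  if (j1 & 1)
    *s2 = j2 + 0x1F + (j2 >= 0x60);  /* 0x40..0x7E and 0x80..0x9E */
  else
    *s2 = j2 + 0x7E;            /* 0x9F..0xFC */
}

/* Inverse of demo_jis_to_sjis.  S1 and S2 must be a valid SJIS
   two-byte code as listed in the range table above.  */
static void
demo_sjis_to_jis (int s1, int s2, int *j1, int *j2)
{
  int row = (s1 >= 0xE0 ? s1 - 0xC1 : s1 - 0x81) * 2;

  if (s2 >= 0x9F)
    {
      *j1 = row + 0x22;
      *j2 = s2 - 0x7E;
    }
  else
    {
      *j1 = row + 0x21;
      *j2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);
    }
}
```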
2171
2172/* BIG5 is a coding system encoding two character sets: ASCII and
2173 Big5. An ASCII character is encoded as is. Big5 is a two-byte
2174 character set and is encoded in two bytes.
2175
2176 --- CODE RANGE of BIG5 ---
2177 (character set) (range)
2178 ASCII 0x00 .. 0x7F
2179 Big5 (1st byte) 0xA1 .. 0xFE
2180 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE
2181 --------------------------
2182
2183 Since the number of characters in Big5 is larger than maximum
2184 characters in Emacs' charset (96x96), it can't be handled as one
2185 charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
2186 and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
2187 contains frequently used characters and the latter contains less
2188 frequently used characters. */
2189
2190/* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
2191 are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
2192 C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
2193 format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
2194
2195/* Number of Big5 characters which have the same code in 1st byte. */
2196#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
2197
2198#define DECODE_BIG5(b1, b2, charset, c1, c2) \
2199 do { \
2200 unsigned int temp \
2201 = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
2202 if (b1 < 0xC9) \
2203 charset = charset_big5_1; \
2204 else \
2205 { \
2206 charset = charset_big5_2; \
2207 temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
2208 } \
2209 c1 = temp / (0xFF - 0xA1) + 0x21; \
2210 c2 = temp % (0xFF - 0xA1) + 0x21; \
2211 } while (0)
2212
2213#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
2214 do { \
2215 unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
2216 if (charset == charset_big5_2) \
2217 temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
2218 b1 = temp / BIG5_SAME_ROW + 0xA1; \
2219 b2 = temp % BIG5_SAME_ROW; \
2220 b2 += b2 < 0x3F ? 0x40 : 0x62; \
2221 } while (0)
2222
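/* Illustration only, not part of coding.c: the arithmetic of
   DECODE_BIG5 and ENCODE_BIG5 restated as functions so that the
   round trip can be checked in isolation.  DEMO_BIG5_1 and
   DEMO_BIG5_2 are hypothetical stand-ins for charset_big5_1 and
   charset_big5_2; the function names are likewise made up.  */

```c
#include <assert.h>

/* Hypothetical stand-ins for charset_big5_1 and charset_big5_2.  */
enum { DEMO_BIG5_1 = 1, DEMO_BIG5_2 = 2 };

/* Number of Big5 codes sharing one first byte: 94 + 63 = 157.  */
#define DEMO_BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)

/* Map the Big5 bytes B1 and B2 to a charset and a 94x94 position.  */
static void
demo_decode_big5 (int b1, int b2, int *charset, int *c1, int *c2)
{
  unsigned int temp;

  assert (b1 >= 0xA1 && b1 <= 0xFE);
  assert ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));
  /* Linear index of the code, closing the gap 0x7F..0xA0 in B2.  */
  temp = (b1 - 0xA1) * DEMO_BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);
  if (b1 < 0xC9)
    *charset = DEMO_BIG5_1;
  else
    {
      *charset = DEMO_BIG5_2;
      temp -= (0xC9 - 0xA1) * DEMO_BIG5_SAME_ROW;
    }
  *c1 = temp / (0xFF - 0xA1) + 0x21;
  *c2 = temp % (0xFF - 0xA1) + 0x21;
}

/* Inverse of demo_decode_big5.  */
static void
demo_encode_big5 (int charset, int c1, int c2, int *b1, int *b2)
{
  unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21);

  if (charset == DEMO_BIG5_2)
    temp += DEMO_BIG5_SAME_ROW * (0xC9 - 0xA1);
  *b1 = temp / DEMO_BIG5_SAME_ROW + 0xA1;
  *b2 = temp % DEMO_BIG5_SAME_ROW;
  *b2 += *b2 < 0x3F ? 0x40 : 0x62;   /* reopen the gap 0x7F..0xA0 */
}
```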
2223/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2224 Check if a text is encoded in SJIS. If it is, return
2225 CODING_CATEGORY_MASK_SJIS, else return 0. */
2226
2227int
2228detect_coding_sjis (src, src_end)
2229 unsigned char *src, *src_end;
2230{
b73bfc1c
KH
2231 int c;
2232 /* Dummy for ONE_MORE_BYTE. */
2233 struct coding_system dummy_coding;
2234 struct coding_system *coding = &dummy_coding;
4ed46869 2235
b73bfc1c 2236 while (1)
4ed46869 2237 {
b73bfc1c 2238 ONE_MORE_BYTE (c);
4ed46869
KH
2239 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0)
2240 {
b73bfc1c
KH
2241 ONE_MORE_BYTE (c);
2242 if (c < 0x40)
4ed46869
KH
2243 return 0;
2244 }
2245 }
b73bfc1c 2246 label_end_of_loop:
4ed46869
KH
2247 return CODING_CATEGORY_MASK_SJIS;
2248}
2249
2250/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2251 Check if a text is encoded in BIG5. If it is, return
2252 CODING_CATEGORY_MASK_BIG5, else return 0. */
2253
2254int
2255detect_coding_big5 (src, src_end)
2256 unsigned char *src, *src_end;
2257{
b73bfc1c
KH
2258 int c;
2259 /* Dummy for ONE_MORE_BYTE. */
2260 struct coding_system dummy_coding;
2261 struct coding_system *coding = &dummy_coding;
4ed46869 2262
b73bfc1c 2263 while (1)
4ed46869 2264 {
b73bfc1c 2265 ONE_MORE_BYTE (c);
4ed46869
KH
2266 if (c >= 0xA1)
2267 {
b73bfc1c 2268 ONE_MORE_BYTE (c);
4ed46869
KH
2269 if (c < 0x40 || (c >= 0x7F && c <= 0xA0))
2270 return 0;
2271 }
2272 }
b73bfc1c 2273 label_end_of_loop:
4ed46869
KH
2274 return CODING_CATEGORY_MASK_BIG5;
2275}
2276
fa42c37f
KH
2277/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2278 Check if a text is encoded in UTF-8. If it is, return
2279 CODING_CATEGORY_MASK_UTF_8, else return 0. */
2280
2281#define UTF_8_1_OCTET_P(c) ((c) < 0x80)
2282#define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
2283#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
2284#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
2285#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
2286#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
2287#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
2288
2289int
2290detect_coding_utf_8 (src, src_end)
2291 unsigned char *src, *src_end;
2292{
2293 unsigned char c;
2294 int seq_maybe_bytes;
b73bfc1c
KH
2295 /* Dummy for ONE_MORE_BYTE. */
2296 struct coding_system dummy_coding;
2297 struct coding_system *coding = &dummy_coding;
fa42c37f 2298
b73bfc1c 2299 while (1)
fa42c37f 2300 {
b73bfc1c 2301 ONE_MORE_BYTE (c);
fa42c37f
KH
2302 if (UTF_8_1_OCTET_P (c))
2303 continue;
2304 else if (UTF_8_2_OCTET_LEADING_P (c))
2305 seq_maybe_bytes = 1;
2306 else if (UTF_8_3_OCTET_LEADING_P (c))
2307 seq_maybe_bytes = 2;
2308 else if (UTF_8_4_OCTET_LEADING_P (c))
2309 seq_maybe_bytes = 3;
2310 else if (UTF_8_5_OCTET_LEADING_P (c))
2311 seq_maybe_bytes = 4;
2312 else if (UTF_8_6_OCTET_LEADING_P (c))
2313 seq_maybe_bytes = 5;
2314 else
2315 return 0;
2316
2317 do
2318 {
b73bfc1c 2319 ONE_MORE_BYTE (c);
fa42c37f
KH
2320 if (!UTF_8_EXTRA_OCTET_P (c))
2321 return 0;
2322 seq_maybe_bytes--;
2323 }
2324 while (seq_maybe_bytes > 0);
2325 }
2326
b73bfc1c 2327 label_end_of_loop:
fa42c37f
KH
2328 return CODING_CATEGORY_MASK_UTF_8;
2329}
2330
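/* Illustration only, not part of coding.c: a minimal standalone
   sketch of the check made by detect_coding_utf_8, written without
   the ONE_MORE_BYTE macro.  The name demo_looks_like_utf_8 is
   hypothetical.  Like the function above, it accepts the historical
   5- and 6-octet forms, and it is assumed here that input ending in
   the middle of a sequence still counts as plausible UTF-8.  */

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the LEN bytes at SRC are plausible UTF-8, else 0.  */
static int
demo_looks_like_utf_8 (const unsigned char *src, size_t len)
{
  size_t i = 0;

  assert (src != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = src[i++];
      int extra;                /* number of trailing octets expected */

      if (c < 0x80)
        continue;               /* 1-octet (ASCII) form */
      else if ((c & 0xE0) == 0xC0)
        extra = 1;
      else if ((c & 0xF0) == 0xE0)
        extra = 2;
      else if ((c & 0xF8) == 0xF0)
        extra = 3;
      else if ((c & 0xFC) == 0xF8)
        extra = 4;
      else if ((c & 0xFE) == 0xFC)
        extra = 5;
      else
        return 0;               /* stray trailing octet, 0xFE or 0xFF */

      while (extra-- > 0)
        {
          if (i >= len)
            return 1;           /* truncated sequence: still plausible */
          if ((src[i++] & 0xC0) != 0x80)
            return 0;           /* not of the form 10xxxxxx */
        }
    }
  return 1;
}
```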
2331/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2332 Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
2333 Little Endian (otherwise). If it is, return
2334 CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
2335 else return 0. */
2336
2337#define UTF_16_INVALID_P(val) \
2338 (((val) == 0xFFFE) \
2339 || ((val) == 0xFFFF))
2340
2341#define UTF_16_HIGH_SURROGATE_P(val) \
2342 (((val) & 0xFC00) == 0xD800)
2343
2344#define UTF_16_LOW_SURROGATE_P(val) \
2345 (((val) & 0xFC00) == 0xDC00)
2346
2347int
2348detect_coding_utf_16 (src, src_end)
2349 unsigned char *src, *src_end;
2350{
b73bfc1c
KH
2351 unsigned char c1, c2;
2352 /* Dummy for TWO_MORE_BYTES. */
2353 struct coding_system dummy_coding;
2354 struct coding_system *coding = &dummy_coding;
fa42c37f 2355
b73bfc1c
KH
2356 TWO_MORE_BYTES (c1, c2);
2357
2358 if ((c1 == 0xFF) && (c2 == 0xFE))
fa42c37f 2359 return CODING_CATEGORY_MASK_UTF_16_LE;
b73bfc1c 2360 else if ((c1 == 0xFE) && (c2 == 0xFF))
fa42c37f
KH
2361 return CODING_CATEGORY_MASK_UTF_16_BE;
2362
b73bfc1c 2363 label_end_of_loop:
fa42c37f
KH
2364 return 0;
2365}
2366
4ed46869
KH
2367/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
2368 If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */
2369
b73bfc1c 2370static void
4ed46869 2371decode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2372 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2373 struct coding_system *coding;
2374 unsigned char *source, *destination;
2375 int src_bytes, dst_bytes;
4ed46869
KH
2376 int sjis_p;
2377{
2378 unsigned char *src = source;
2379 unsigned char *src_end = source + src_bytes;
2380 unsigned char *dst = destination;
2381 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2382 /* SRC_BASE remembers the start position in source in each loop.
2383 The loop will be exited when there's not enough source code
2384 (within macro ONE_MORE_BYTE), or when there's not enough
2385 destination area to produce a character (within macro
2386 EMIT_CHAR). */
2387 unsigned char *src_base;
2388 Lisp_Object translation_table;
a5d301df 2389
b73bfc1c
KH
2390 if (NILP (Venable_character_translation))
2391 translation_table = Qnil;
2392 else
2393 {
2394 translation_table = coding->translation_table_for_decode;
2395 if (NILP (translation_table))
2396 translation_table = Vstandard_translation_table_for_decode;
2397 }
4ed46869 2398
d46c5b12 2399 coding->produced_char = 0;
b73bfc1c 2400 while (1)
4ed46869 2401 {
b73bfc1c
KH
2402 int c, charset, c1, c2;
2403
2404 src_base = src;
2405 ONE_MORE_BYTE (c1);
2406
2407 if (c1 < 0x80)
4ed46869 2408 {
b73bfc1c
KH
2409 charset = CHARSET_ASCII;
2410 if (c1 < 0x20)
4ed46869 2411 {
b73bfc1c 2412 if (c1 == '\r')
d46c5b12 2413 {
b73bfc1c 2414 if (coding->eol_type == CODING_EOL_CRLF)
d46c5b12 2415 {
b73bfc1c
KH
2416 ONE_MORE_BYTE (c2);
2417 if (c2 == '\n')
2418 c1 = c2;
2419 else if (coding->mode
2420 & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2421 {
2422 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2423 goto label_end_of_loop;
2424 }
2425 else
2426 /* To process C2 again, SRC is subtracted by 1. */
2427 src--;
d46c5b12 2428 }
b73bfc1c
KH
2429 else if (coding->eol_type == CODING_EOL_CR)
2430 c1 = '\n';
2431 }
2432 else if (c1 == '\n'
2433 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2434 && (coding->eol_type == CODING_EOL_CR
2435 || coding->eol_type == CODING_EOL_CRLF))
2436 {
2437 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2438 goto label_end_of_loop;
d46c5b12 2439 }
4ed46869 2440 }
4ed46869 2441 }
54f78171 2442 else
b73bfc1c 2443 {
4ed46869
KH
2444 if (sjis_p)
2445 {
b73bfc1c
KH
2446 if (c1 >= 0xF0)
2447 goto label_invalid_code;
2448 if (c1 < 0xA0 || c1 >= 0xE0)
fb88bf2d 2449 {
54f78171
KH
2450 /* SJIS -> JISX0208 */
2451 ONE_MORE_BYTE (c2);
b73bfc1c
KH
2452 if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
2453 goto label_invalid_code;
2454 DECODE_SJIS (c1, c2, c1, c2);
2455 charset = charset_jisx0208;
5e34de15 2456 }
fb88bf2d 2457 else
b73bfc1c
KH
2458 /* SJIS -> JISX0201-Kana */
2459 charset = charset_katakana_jisx0201;
4ed46869 2460 }
fb88bf2d 2461 else
fb88bf2d 2462 {
54f78171 2463 /* BIG5 -> Big5 */
b73bfc1c
KH
2464 if (c1 < 0xA1 || c1 > 0xFE)
2465 goto label_invalid_code;
2466 ONE_MORE_BYTE (c2);
2467 if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
2468 goto label_invalid_code;
2469 DECODE_BIG5 (c1, c2, charset, c1, c2);
4ed46869
KH
2470 }
2471 }
4ed46869 2472
b73bfc1c
KH
2473 c = DECODE_ISO_CHARACTER (charset, c1, c2);
2474 EMIT_CHAR (c);
fb88bf2d
KH
2475 continue;
2476
b73bfc1c
KH
2477 label_invalid_code:
2478 coding->errors++;
4ed46869 2479 src = src_base;
b73bfc1c
KH
2480 c = *src++;
2481 EMIT_CHAR (c);
fb88bf2d 2482 }
d46c5b12 2483
b73bfc1c
KH
2484 label_end_of_loop:
2485 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 2486 coding->produced = dst - destination;
b73bfc1c 2487 return;
4ed46869
KH
2488}
2489
2490/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
b73bfc1c
KH
2491 This function can encode charsets `ascii', `katakana-jisx0201',
2492 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
2493 are sure that all these charsets are registered as official charset
4ed46869
KH
2494 (i.e. do not have extended leading-codes). Characters of other
2495 charsets are produced without any encoding. If SJIS_P is 1, encode
2496 SJIS text, else encode BIG5 text. */
2497
b73bfc1c 2498static void
4ed46869 2499encode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2500 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2501 struct coding_system *coding;
2502 unsigned char *source, *destination;
2503 int src_bytes, dst_bytes;
4ed46869
KH
2504 int sjis_p;
2505{
2506 unsigned char *src = source;
2507 unsigned char *src_end = source + src_bytes;
2508 unsigned char *dst = destination;
2509 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2510 /* SRC_BASE remembers the start position in source in each loop.
2511 The loop will be exited when there's not enough source text to
2512 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2513 there's not enough destination area to produce encoded codes
2514 (within macro EMIT_BYTES). */
2515 unsigned char *src_base;
2516 Lisp_Object translation_table;
4ed46869 2517
b73bfc1c
KH
2518 if (NILP (Venable_character_translation))
2519 translation_table = Qnil;
2520 else
4ed46869 2521 {
b73bfc1c
KH
2522 translation_table = coding->translation_table_for_encode;
2523 if (NILP (translation_table))
2524 translation_table = Vstandard_translation_table_for_encode;
2525 }
a5d301df 2526
b73bfc1c
KH
2527 while (1)
2528 {
2529 int c, charset, c1, c2;
4ed46869 2530
b73bfc1c
KH
2531 src_base = src;
2532 ONE_MORE_CHAR (c);
2533
2534 /* Now encode the character C. */
2535 if (SINGLE_BYTE_CHAR_P (c))
2536 {
2537 switch (c)
4ed46869 2538 {
b73bfc1c
KH
2539 case '\r':
2540 if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2541 {
2542 EMIT_ONE_BYTE (c);
2543 break;
2544 }
2545 c = '\n';
2546 case '\n':
2547 if (coding->eol_type == CODING_EOL_CRLF)
2548 {
2549 EMIT_TWO_BYTES ('\r', c);
2550 break;
2551 }
2552 else if (coding->eol_type == CODING_EOL_CR)
2553 c = '\r';
2554 default:
2555 EMIT_ONE_BYTE (c);
2556 }
2557 }
2558 else
2559 {
2560 SPLIT_CHAR (c, charset, c1, c2);
2561 if (sjis_p)
2562 {
2563 if (charset == charset_jisx0208
2564 || charset == charset_jisx0208_1978)
2565 {
2566 ENCODE_SJIS (c1, c2, c1, c2);
2567 EMIT_TWO_BYTES (c1, c2);
2568 }
2569 else if (charset == charset_latin_jisx0201)
2570 EMIT_ONE_BYTE (c1);
2571 else
2572 /* There's no way other than producing the internal
2573 codes as is. */
2574 EMIT_BYTES (src_base, src);
4ed46869 2575 }
4ed46869 2576 else
b73bfc1c
KH
2577 {
2578 if (charset == charset_big5_1 || charset == charset_big5_2)
2579 {
2580 ENCODE_BIG5 (charset, c1, c2, c1, c2);
2581 EMIT_TWO_BYTES (c1, c2);
2582 }
2583 else
2584 /* There's no way other than producing the internal
2585 codes as is. */
2586 EMIT_BYTES (src_base, src);
2587 }
4ed46869 2588 }
b73bfc1c 2589 coding->consumed_char++;
4ed46869
KH
2590 }
2591
b73bfc1c
KH
2592 label_end_of_loop:
2593 coding->consumed = src_base - source;
d46c5b12 2594 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2595}
2596
2597\f
1397dc18
KH
2598/*** 5. CCL handlers ***/
2599
2600/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2601 Check if a text is encoded in a coding system of which
2602 encoder/decoder are written in CCL program. If it is, return
2603 CODING_CATEGORY_MASK_CCL, else return 0. */
2604
2605int
2606detect_coding_ccl (src, src_end)
2607 unsigned char *src, *src_end;
2608{
2609 unsigned char *valid;
b73bfc1c
KH
2610 int c;
2611 /* Dummy for ONE_MORE_BYTE. */
2612 struct coding_system dummy_coding;
2613 struct coding_system *coding = &dummy_coding;
1397dc18
KH
2614
2615 /* No coding system is assigned to coding-category-ccl. */
2616 if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
2617 return 0;
2618
2619 valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
b73bfc1c 2620 while (1)
1397dc18 2621 {
b73bfc1c
KH
2622 ONE_MORE_BYTE (c);
2623 if (! valid[c])
2624 return 0;
1397dc18 2625 }
b73bfc1c 2626 label_end_of_loop:
1397dc18
KH
2627 return CODING_CATEGORY_MASK_CCL;
2628}
2629
2630\f
2631/*** 6. End-of-line handlers ***/
4ed46869 2632
b73bfc1c 2633/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 2634
b73bfc1c 2635static void
d46c5b12 2636decode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2637 struct coding_system *coding;
2638 unsigned char *source, *destination;
2639 int src_bytes, dst_bytes;
4ed46869
KH
2640{
2641 unsigned char *src = source;
4ed46869 2642 unsigned char *dst = destination;
b73bfc1c
KH
2643 unsigned char *src_end = src + src_bytes;
2644 unsigned char *dst_end = dst + dst_bytes;
2645 Lisp_Object translation_table;
2646 /* SRC_BASE remembers the start position in source in each loop.
2647 The loop will be exited when there's not enough source code
2648 (within macro ONE_MORE_BYTE), or when there's not enough
2649 destination area to produce a character (within macro
2650 EMIT_CHAR). */
2651 unsigned char *src_base;
2652 int c;
2653
2654 translation_table = Qnil;
4ed46869
KH
2655 switch (coding->eol_type)
2656 {
2657 case CODING_EOL_CRLF:
b73bfc1c 2658 while (1)
d46c5b12 2659 {
b73bfc1c
KH
2660 src_base = src;
2661 ONE_MORE_BYTE (c);
2662 if (c == '\r')
fb88bf2d 2663 {
b73bfc1c
KH
2664 ONE_MORE_BYTE (c);
2665 if (c != '\n')
2666 {
2667 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2668 {
2669 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2670 goto label_end_of_loop;
2671 }
2672 src--;
2673 c = '\r';
2674 }
fb88bf2d 2675 }
b73bfc1c
KH
2676 else if (c == '\n'
2677 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
d46c5b12 2678 {
b73bfc1c
KH
2679 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2680 goto label_end_of_loop;
d46c5b12 2681 }
b73bfc1c 2682 EMIT_CHAR (c);
d46c5b12 2683 }
b73bfc1c
KH
2684 break;
2685
2686 case CODING_EOL_CR:
2687 while (1)
d46c5b12 2688 {
b73bfc1c
KH
2689 src_base = src;
2690 ONE_MORE_BYTE (c);
2691 if (c == '\n')
2692 {
2693 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2694 {
2695 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2696 goto label_end_of_loop;
2697 }
2698 }
2699 else if (c == '\r')
2700 c = '\n';
2701 EMIT_CHAR (c);
d46c5b12 2702 }
4ed46869
KH
2703 break;
2704
b73bfc1c
KH
2705 default: /* no need for EOL handling */
2706 while (1)
d46c5b12 2707 {
b73bfc1c
KH
2708 src_base = src;
2709 ONE_MORE_BYTE (c);
2710 EMIT_CHAR (c);
d46c5b12 2711 }
4ed46869
KH
2712 }
2713
b73bfc1c
KH
2714 label_end_of_loop:
2715 coding->consumed = coding->consumed_char = src_base - source;
2716 coding->produced = dst - destination;
2717 return;
4ed46869
KH
2718}
2719
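The CODING_EOL_CRLF branch of `decode_eol` can be hard to follow through the macros. The sketch below shows only the core rule, under simplifying assumptions: there is no CODING_MODE_INHIBIT_INCONSISTENT_EOL handling, the destination is assumed to be at least as large as the source, and the name `sketch_decode_crlf` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in coding.c): decode CRLF line ends to LF.
   DST must have room for SRC_LEN bytes.  Returns the number of bytes
   written.  *CONSUMED receives how many source bytes were processed;
   a trailing lone '\r' is left unconsumed because the next chunk may
   start with '\n'.  */
static size_t
sketch_decode_crlf (const unsigned char *src, size_t src_len,
                    unsigned char *dst, size_t *consumed)
{
  size_t i = 0, out = 0;

  assert (src != NULL && dst != NULL && consumed != NULL);
  while (i < src_len)
    {
      unsigned char c = src[i];

      if (c == '\r')
        {
          if (i + 1 >= src_len)
            break;              /* Need one more byte to decide.  */
          if (src[i + 1] == '\n')
            {
              c = '\n';
              i++;              /* Swallow the '\r'.  */
            }
          /* Otherwise a lone '\r' is passed through unchanged.  */
        }
      dst[out++] = c;
      i++;
    }
  *consumed = i;
  return out;
}
```

Reporting the consumed count separately mirrors the role of `src_base` in the original: a caller can resume from the first unprocessed byte when more input arrives.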
2720/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
b73bfc1c
KH
2721 format of end-of-line according to `coding->eol_type'. It also
2722 converts multibyte-form 8-bit characters to unibyte if
2723 CODING->src_multibyte is nonzero. If `coding->mode &
2724 CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
2725 also means end-of-line. */
4ed46869 2726
b73bfc1c 2727static void
d46c5b12 2728encode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2729 struct coding_system *coding;
2730 unsigned char *source, *destination;
2731 int src_bytes, dst_bytes;
4ed46869
KH
2732{
2733 unsigned char *src = source;
2734 unsigned char *dst = destination;
b73bfc1c
KH
2735 unsigned char *src_end = src + src_bytes;
2736 unsigned char *dst_end = dst + dst_bytes;
2737 Lisp_Object translation_table;
2738 /* SRC_BASE remembers the start position in source in each loop.
2739 The loop will be exited when there's not enough source text to
2740 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2741 there's not enough destination area to produce encoded codes
2742 (within macro EMIT_BYTES). */
2743 unsigned char *src_base;
2744 int c;
2745 int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
2746
2747 translation_table = Qnil;
2748 if (coding->src_multibyte
2749 && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
2750 {
2751 src_end--;
2752 src_bytes--;
2753 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
2754 }
fb88bf2d 2755
d46c5b12
KH
2756 if (coding->eol_type == CODING_EOL_CRLF)
2757 {
b73bfc1c 2758 while (src < src_end)
d46c5b12 2759 {
b73bfc1c 2760 src_base = src;
d46c5b12 2761 c = *src++;
b73bfc1c
KH
2762 if (c >= 0x20)
2763 EMIT_ONE_BYTE (c);
2764 else if (c == '\n' || (c == '\r' && selective_display))
2765 EMIT_TWO_BYTES ('\r', '\n');
d46c5b12 2766 else
b73bfc1c 2767 EMIT_ONE_BYTE (c);
d46c5b12 2768 }
ff2b1ea9 2769 src_base = src;
b73bfc1c 2770 label_end_of_loop:
005f0d35 2771 ;
d46c5b12
KH
2772 }
2773 else
4ed46869 2774 {
b73bfc1c 2775 if (src_bytes <= dst_bytes)
4ed46869 2776 {
b73bfc1c
KH
2777 safe_bcopy (src, dst, src_bytes);
2778 src_base = src_end;
2779 dst += src_bytes;
d46c5b12 2780 }
d46c5b12 2781 else
b73bfc1c
KH
2782 {
2783 if (coding->src_multibyte
2784 && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
2785 dst_bytes--;
2786 safe_bcopy (src, dst, dst_bytes);
2787 src_base = src + dst_bytes;
2788 dst = destination + dst_bytes;
2789 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2790 }
993824c9 2791 if (coding->eol_type == CODING_EOL_CR)
d46c5b12 2792 {
b73bfc1c
KH
2793 for (src = destination; src < dst; src++)
2794 if (*src == '\n') *src = '\r';
d46c5b12 2795 }
b73bfc1c 2796 else if (selective_display)
d46c5b12 2797 {
b73bfc1c
KH
2798 for (src = destination; src < dst; src++)
2799 if (*src == '\r') *src = '\n';
4ed46869 2800 }
4ed46869 2801 }
b73bfc1c
KH
2802 if (coding->src_multibyte)
2803 dst = destination + str_as_unibyte (destination, dst - destination);
4ed46869 2804
b73bfc1c
KH
2805 coding->consumed = src_base - source;
2806 coding->produced = dst - destination;
4ed46869
KH
2807}
2808
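For the CRLF case, `encode_eol` expands each '\n' to two bytes and has to stop cleanly when the destination fills up. A minimal sketch of that loop is given here, assuming no selective-display handling and no multibyte conversion; `sketch_encode_crlf` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in coding.c): encode LF line ends as CRLF.
   Writes at most DST_LEN bytes and returns the number written.
   *CONSUMED receives the number of source bytes fully encoded; it is
   smaller than SRC_LEN when the destination ran out of room.  */
static size_t
sketch_encode_crlf (const unsigned char *src, size_t src_len,
                    unsigned char *dst, size_t dst_len, size_t *consumed)
{
  size_t i, out = 0;

  assert (src != NULL && dst != NULL && consumed != NULL);
  for (i = 0; i < src_len; i++)
    {
      size_t need = (src[i] == '\n') ? 2 : 1;

      if (dst_len - out < need)
        break;                  /* Never emit half of a CRLF pair.  */
      if (src[i] == '\n')
        dst[out++] = '\r';
      dst[out++] = src[i];
    }
  *consumed = i;
  return out;
}
```

Checking the space for both bytes before writing either one keeps the output from ending in a bare '\r' when the buffer is short.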
2809\f
1397dc18 2810/*** 7. C library functions ***/
4ed46869
KH
2811
2812/* In Emacs Lisp, coding system is represented by a Lisp symbol which
2813 has a property `coding-system'. The value of this property is a
2814 vector of length 5 (called the coding-vector). Among the elements of
2815 this vector, the first (element[0]) and the fifth (element[4])
2816 carry important information for decoding/encoding. Before
2817 decoding/encoding, this information should be set in fields of a
2818 structure of type `coding_system'.
2819
2820 A value of property `coding-system' can be a symbol of another
2821 subsidiary coding-system. In that case, Emacs gets coding-vector
2822 from that symbol.
2823
2824 `element[0]' contains information to be set in `coding->type'. The
2825 values and their meanings are as follows:
2826
0ef69138
KH
2827 0 -- coding_type_emacs_mule
2828 1 -- coding_type_sjis
2829 2 -- coding_type_iso2022
2830 3 -- coding_type_big5
2831 4 -- coding_type_ccl encoder/decoder written in CCL
2832 nil -- coding_type_no_conversion
2833 t -- coding_type_undecided (automatic conversion on decoding,
2834 no-conversion on encoding)
4ed46869
KH
2835
2836 `element[4]' contains information to be set in `coding->flags' and
2837 `coding->spec'. The meaning varies by `coding->type'.
2838
2839 If `coding->type' is `coding_type_iso2022', element[4] is a vector
2840 of length 32 (of which the first 17 sub-elements are used now).
2841 Meanings of these sub-elements are:
2842
2843 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
2844 If the value is an integer of valid charset, the charset is
2845 assumed to be designated to graphic register N initially.
2846
2847 If the value is negative, it is the negated value of a charset which
2848 reserves graphic register N, which means that the charset is
2849 not designated initially but should be designated to graphic
2850 register N just before encoding a character in that charset.
2851
2852 If the value is nil, graphic register N is never used on
2853 encoding.
2854
2855 sub-element[N] where N is 4 through 16: to be set in `coding->flags'
2856 Each value takes t or nil. See the section ISO2022 of
2857 `coding.h' for more information.
2858
2859 If `coding->type' is `coding_type_big5', element[4] is t to denote
2860 BIG5-ETen or nil to denote BIG5-HKU.
2861
2862 If `coding->type' takes any other value, element[4] is ignored.
2863
2864 Emacs Lisp's coding system also carries information about format of
2865 end-of-line in a value of property `eol-type'. If the value is
2866 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
2867 means CODING_EOL_CR. If it is not integer, it should be a vector
2868 of subsidiary coding systems of which property `eol-type' has one
2869 of above values.
2870
2871*/
2872
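The integer encodings described in the comment above can be summarized as two small lookup functions. This is only an illustration of the mapping as the comment lists it: the `sketch_` enums and function names are hypothetical, their numeric values are not meant to match the real definitions in coding.h, and the non-integer cases (nil, t, and a vector `eol-type') are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the coding types named in the comment.  */
enum sketch_coding_type
{
  SKETCH_TYPE_EMACS_MULE,
  SKETCH_TYPE_SJIS,
  SKETCH_TYPE_ISO2022,
  SKETCH_TYPE_BIG5,
  SKETCH_TYPE_CCL,
  SKETCH_TYPE_INVALID
};

/* Hypothetical stand-ins for the end-of-line formats.  */
enum sketch_eol_type { SKETCH_EOL_LF, SKETCH_EOL_CRLF, SKETCH_EOL_CR };

/* Map an integer element[0] to a coding type, following the table in
   the comment.  Values not listed there yield SKETCH_TYPE_INVALID.  */
static enum sketch_coding_type
sketch_type_from_element0 (int element0)
{
  switch (element0)
    {
    case 0: return SKETCH_TYPE_EMACS_MULE;
    case 1: return SKETCH_TYPE_SJIS;
    case 2: return SKETCH_TYPE_ISO2022;
    case 3: return SKETCH_TYPE_BIG5;
    case 4: return SKETCH_TYPE_CCL;
    default: return SKETCH_TYPE_INVALID;
    }
}

/* Map an integer `eol-type' value (0, 1 or 2) to an EOL format.  */
static enum sketch_eol_type
sketch_eol_from_integer (int eol)
{
  assert (eol >= 0 && eol <= 2);
  return eol == 1 ? SKETCH_EOL_CRLF : eol == 2 ? SKETCH_EOL_CR : SKETCH_EOL_LF;
}
```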
2873/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
2874 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
2875 is set up so that no conversion is necessary and -1 is returned;
2876 otherwise 0 is returned. */
2877
2878int
e0e989f6
KH
2879setup_coding_system (coding_system, coding)
2880 Lisp_Object coding_system;
4ed46869
KH
2881 struct coding_system *coding;
2882{
d46c5b12 2883 Lisp_Object coding_spec, coding_type, eol_type, plist;
4608c386 2884 Lisp_Object val;
70c22245 2885 int i;
4ed46869 2886
d46c5b12 2887 /* Initialize some fields required for all kinds of coding systems. */
774324d6 2888 coding->symbol = coding_system;
d46c5b12
KH
2889 coding->common_flags = 0;
2890 coding->mode = 0;
2891 coding->heading_ascii = -1;
2892 coding->post_read_conversion = coding->pre_write_conversion = Qnil;
ec6d2bb8
KH
2893 coding->composing = COMPOSITION_DISABLED;
2894 coding->cmp_data = NULL;
1f5dbf34
KH
2895
2896 if (NILP (coding_system))
2897 goto label_invalid_coding_system;
2898
4608c386 2899 coding_spec = Fget (coding_system, Qcoding_system);
1f5dbf34 2900
4608c386
KH
2901 if (!VECTORP (coding_spec)
2902 || XVECTOR (coding_spec)->size != 5
2903 || !CONSP (XVECTOR (coding_spec)->contents[3]))
4ed46869 2904 goto label_invalid_coding_system;
4608c386 2905
d46c5b12
KH
2906 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
2907 if (VECTORP (eol_type))
2908 {
2909 coding->eol_type = CODING_EOL_UNDECIDED;
2910 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
2911 }
2912 else if (XFASTINT (eol_type) == 1)
2913 {
2914 coding->eol_type = CODING_EOL_CRLF;
2915 coding->common_flags
2916 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2917 }
2918 else if (XFASTINT (eol_type) == 2)
2919 {
2920 coding->eol_type = CODING_EOL_CR;
2921 coding->common_flags
2922 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2923 }
2924 else
2925 coding->eol_type = CODING_EOL_LF;
2926
2927 coding_type = XVECTOR (coding_spec)->contents[0];
2928 /* Try short cut. */
2929 if (SYMBOLP (coding_type))
2930 {
2931 if (EQ (coding_type, Qt))
2932 {
2933 coding->type = coding_type_undecided;
2934 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
2935 }
2936 else
2937 coding->type = coding_type_no_conversion;
2938 return 0;
2939 }
2940
d46c5b12
KH
2941 /* Get values of coding system properties:
2942 `post-read-conversion', `pre-write-conversion',
f967223b 2943 `translation-table-for-decode', `translation-table-for-encode'. */
4608c386 2944 plist = XVECTOR (coding_spec)->contents[3];
b843d1ae
KH
2945 /* Pre & post conversion functions should be disabled if
2946 inhibit_pre_post_conversion is nonzero. This is the case when a code
2947 conversion function is called while those functions are running. */
2948 if (! inhibit_pre_post_conversion)
2949 {
2950 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
2951 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
2952 }
f967223b 2953 val = Fplist_get (plist, Qtranslation_table_for_decode);
4608c386 2954 if (SYMBOLP (val))
f967223b
KH
2955 val = Fget (val, Qtranslation_table_for_decode);
2956 coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
2957 val = Fplist_get (plist, Qtranslation_table_for_encode);
4608c386 2958 if (SYMBOLP (val))
f967223b
KH
2959 val = Fget (val, Qtranslation_table_for_encode);
2960 coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
d46c5b12
KH
2961 val = Fplist_get (plist, Qcoding_category);
2962 if (!NILP (val))
2963 {
2964 val = Fget (val, Qcoding_category_index);
2965 if (INTEGERP (val))
2966 coding->category_idx = XINT (val);
2967 else
2968 goto label_invalid_coding_system;
2969 }
2970 else
2971 goto label_invalid_coding_system;
4608c386 2972
70c22245
KH
2973 val = Fplist_get (plist, Qsafe_charsets);
2974 if (EQ (val, Qt))
2975 {
2976 for (i = 0; i <= MAX_CHARSET; i++)
2977 coding->safe_charsets[i] = 1;
2978 }
2979 else
2980 {
2981 bzero (coding->safe_charsets, MAX_CHARSET + 1);
2982 while (CONSP (val))
2983 {
03699b14 2984 if ((i = get_charset_id (XCAR (val))) >= 0)
70c22245 2985 coding->safe_charsets[i] = 1;
03699b14 2986 val = XCDR (val);
70c22245
KH
2987 }
2988 }
2989
ec6d2bb8
KH
2990 /* If the coding system has non-nil `composition' property, enable
2991 composition handling. */
2992 val = Fplist_get (plist, Qcomposition);
2993 if (!NILP (val))
2994 coding->composing = COMPOSITION_NO;
2995
d46c5b12 2996 switch (XFASTINT (coding_type))
4ed46869
KH
2997 {
2998 case 0:
0ef69138 2999 coding->type = coding_type_emacs_mule;
c952af22
KH
3000 if (!NILP (coding->post_read_conversion))
3001 coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
3002 if (!NILP (coding->pre_write_conversion))
3003 coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3004 break;
3005
3006 case 1:
3007 coding->type = coding_type_sjis;
c952af22
KH
3008 coding->common_flags
3009 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3010 break;
3011
3012 case 2:
3013 coding->type = coding_type_iso2022;
c952af22
KH
3014 coding->common_flags
3015 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3016 {
70c22245 3017 Lisp_Object val, temp;
4ed46869 3018 Lisp_Object *flags;
d46c5b12 3019 int i, charset, reg_bits = 0;
4ed46869 3020
4608c386 3021 val = XVECTOR (coding_spec)->contents[4];
f44d27ce 3022
4ed46869
KH
3023 if (!VECTORP (val) || XVECTOR (val)->size != 32)
3024 goto label_invalid_coding_system;
3025
3026 flags = XVECTOR (val)->contents;
3027 coding->flags
3028 = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
3029 | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
3030 | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
3031 | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
3032 | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
3033 | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
3034 | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
3035 | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
e0e989f6
KH
3036 | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
3037 | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
c4825358
KH
3038 | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3039 | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
3f003981 3040 | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
c4825358 3041 );
4ed46869
KH
3042
3043 /* Invoke graphic register 0 to plane 0. */
3044 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
3045 /* Invoke graphic register 1 to plane 1 if we can use full 8-bit. */
3046 CODING_SPEC_ISO_INVOCATION (coding, 1)
3047 = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
3048 /* Not single shifting at first. */
6e85d753 3049 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
e0e989f6 3050 /* Beginning of buffer should also be regarded as bol. */
6e85d753 3051 CODING_SPEC_ISO_BOL (coding) = 1;
4ed46869 3052
70c22245
KH
3053 for (charset = 0; charset <= MAX_CHARSET; charset++)
3054 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
3055 val = Vcharset_revision_alist;
3056 while (CONSP (val))
3057 {
03699b14 3058 charset = get_charset_id (Fcar_safe (XCAR (val)));
70c22245 3059 if (charset >= 0
03699b14 3060 && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
70c22245
KH
3061 && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
3062 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
03699b14 3063 val = XCDR (val);
70c22245
KH
3064 }
3065
4ed46869
KH
3066 /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
3067 FLAGS[REG] can be one of the following:
3068 integer CHARSET: CHARSET occupies register REG,
3069 t: designate nothing to REG initially, but can be used
3070 by any charsets,
3071 list of integer, nil, or t: designate the first
3072 element (if integer) to REG initially, the remaining
3073 elements (if integer) are designated to REG on request,
d46c5b12 3074 if an element is t, REG can be used by any charsets,
4ed46869 3075 nil: REG is never used. */
467e7675 3076 for (charset = 0; charset <= MAX_CHARSET; charset++)
1ba9e4ab
KH
3077 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3078 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
4ed46869
KH
3079 for (i = 0; i < 4; i++)
3080 {
3081 if (INTEGERP (flags[i])
e0e989f6
KH
3082 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset))
3083 || (charset = get_charset_id (flags[i])) >= 0)
4ed46869
KH
3084 {
3085 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3086 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
3087 }
3088 else if (EQ (flags[i], Qt))
3089 {
3090 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
d46c5b12
KH
3091 reg_bits |= 1 << i;
3092 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
4ed46869
KH
3093 }
3094 else if (CONSP (flags[i]))
3095 {
84d60297
RS
3096 Lisp_Object tail;
3097 tail = flags[i];
4ed46869 3098
d46c5b12 3099 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
03699b14
KR
3100 if (INTEGERP (XCAR (tail))
3101 && (charset = XINT (XCAR (tail)),
e0e989f6 3102 CHARSET_VALID_P (charset))
03699b14 3103 || (charset = get_charset_id (XCAR (tail))) >= 0)
4ed46869
KH
3104 {
3105 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3106 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
3107 }
3108 else
3109 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
03699b14 3110 tail = XCDR (tail);
4ed46869
KH
3111 while (CONSP (tail))
3112 {
03699b14
KR
3113 if (INTEGERP (XCAR (tail))
3114 && (charset = XINT (XCAR (tail)),
e0e989f6 3115 CHARSET_VALID_P (charset))
03699b14 3116 || (charset = get_charset_id (XCAR (tail))) >= 0)
70c22245
KH
3117 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3118 = i;
03699b14 3119 else if (EQ (XCAR (tail), Qt))
d46c5b12 3120 reg_bits |= 1 << i;
03699b14 3121 tail = XCDR (tail);
4ed46869
KH
3122 }
3123 }
3124 else
3125 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
3126
3127 CODING_SPEC_ISO_DESIGNATION (coding, i)
3128 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
3129 }
3130
d46c5b12 3131 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
4ed46869
KH
3132 {
3133 /* REG 1 can be used only by locking shift in 7-bit env. */
3134 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
d46c5b12 3135 reg_bits &= ~2;
4ed46869
KH
3136 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
3137 /* Without any shifting, only REG 0 and 1 can be used. */
d46c5b12 3138 reg_bits &= 3;
4ed46869
KH
3139 }
3140
d46c5b12
KH
3141 if (reg_bits)
3142 for (charset = 0; charset <= MAX_CHARSET; charset++)
6e85d753 3143 {
96148065
KH
3144 if (CHARSET_VALID_P (charset)
3145 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3146 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
3147 {
3148 /* There exist some default graphic registers to be
96148065 3149 used by CHARSET. */
d46c5b12
KH
3150
3151 /* We had better avoid designating a charset of
3152 CHARS96 to REG 0 as far as possible. */
3153 if (CHARSET_CHARS (charset) == 96)
3154 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3155 = (reg_bits & 2
3156 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
3157 else
3158 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3159 = (reg_bits & 1
3160 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
3161 }
6e85d753 3162 }
4ed46869 3163 }
c952af22 3164 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
d46c5b12 3165 coding->spec.iso2022.last_invalid_designation_register = -1;
4ed46869
KH
3166 break;
3167
3168 case 3:
3169 coding->type = coding_type_big5;
c952af22
KH
3170 coding->common_flags
3171 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3172 coding->flags
4608c386 3173 = (NILP (XVECTOR (coding_spec)->contents[4])
4ed46869
KH
3174 ? CODING_FLAG_BIG5_HKU
3175 : CODING_FLAG_BIG5_ETEN);
3176 break;
3177
3178 case 4:
3179 coding->type = coding_type_ccl;
c952af22
KH
3180 coding->common_flags
3181 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3182 {
84d60297 3183 val = XVECTOR (coding_spec)->contents[4];
ef4ced28
KH
3184 if (! CONSP (val)
3185 || setup_ccl_program (&(coding->spec.ccl.decoder),
03699b14 3186 XCAR (val)) < 0
ef4ced28 3187 || setup_ccl_program (&(coding->spec.ccl.encoder),
03699b14 3188 XCDR (val)) < 0)
4ed46869 3189 goto label_invalid_coding_system;
1397dc18
KH
3190
3191 bzero (coding->spec.ccl.valid_codes, 256);
3192 val = Fplist_get (plist, Qvalid_codes);
3193 if (CONSP (val))
3194 {
3195 Lisp_Object this;
3196
03699b14 3197 for (; CONSP (val); val = XCDR (val))
1397dc18 3198 {
03699b14 3199 this = XCAR (val);
1397dc18
KH
3200 if (INTEGERP (this)
3201 && XINT (this) >= 0 && XINT (this) < 256)
3202 coding->spec.ccl.valid_codes[XINT (this)] = 1;
3203 else if (CONSP (this)
03699b14
KR
3204 && INTEGERP (XCAR (this))
3205 && INTEGERP (XCDR (this)))
1397dc18 3206 {
03699b14
KR
3207 int start = XINT (XCAR (this));
3208 int end = XINT (XCDR (this));
1397dc18
KH
3209
3210 if (start >= 0 && start <= end && end < 256)
e133c8fa 3211 while (start <= end)
1397dc18
KH
3212 coding->spec.ccl.valid_codes[start++] = 1;
3213 }
3214 }
3215 }
4ed46869 3216 }
c952af22 3217 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
aaaf0b1e 3218 coding->spec.ccl.cr_carryover = 0;
4ed46869
KH
3219 break;
3220
27901516
KH
3221 case 5:
3222 coding->type = coding_type_raw_text;
3223 break;
3224
4ed46869 3225 default:
d46c5b12 3226 goto label_invalid_coding_system;
4ed46869
KH
3227 }
3228 return 0;
3229
3230 label_invalid_coding_system:
3231 coding->type = coding_type_no_conversion;
d46c5b12 3232 coding->category_idx = CODING_CATEGORY_IDX_BINARY;
c952af22 3233 coding->common_flags = 0;
dec137e5 3234 coding->eol_type = CODING_EOL_LF;
d46c5b12 3235 coding->pre_write_conversion = coding->post_read_conversion = Qnil;
4ed46869
KH
3236 return -1;
3237}
3238
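The last loop of the ISO2022 case in `setup_coding_system` picks a default graphic register for every charset that has no explicit request, using the `reg_bits` mask of registers that accept any charset. The nested conditional expressions can be isolated as below; `sketch_default_register` is a hypothetical name, and the `is_chars96` flag stands in for the `CHARSET_CHARS (charset) == 96` test.

```c
#include <assert.h>

/* Hypothetical helper (not in coding.c): choose a default graphic
   register (0..3) from REG_BITS, where bit N set means register N may
   be used by any charset.  A 96-character charset avoids register 0
   when another register is available; other charsets take the lowest
   available register.  */
static int
sketch_default_register (unsigned reg_bits, int is_chars96)
{
  assert (reg_bits != 0 && reg_bits < 16);
  if (is_chars96)
    return (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : (reg_bits & 8) ? 3 : 0;
  return (reg_bits & 1) ? 0 : (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : 3;
}
```

For example, with registers 1 and 2 free (`reg_bits` of 6) both kinds of charset end up in register 1, while with only register 0 free a 96-character charset falls back to register 0.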
ec6d2bb8
KH
3239/* Free memory blocks allocated for storing composition information. */
3240
3241void
3242coding_free_composition_data (coding)
3243 struct coding_system *coding;
3244{
3245 struct composition_data *cmp_data = coding->cmp_data, *next;
3246
3247 if (!cmp_data)
3248 return;
3249 /* Memory blocks are chained. First rewind to the first block,
3250 then free the blocks one by one. */
3251 while (cmp_data->prev)
3252 cmp_data = cmp_data->prev;
3253 while (cmp_data)
3254 {
3255 next = cmp_data->next;
3256 xfree (cmp_data);
3257 cmp_data = next;
3258 }
3259 coding->cmp_data = NULL;
3260}
3261
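The rewind-then-free pattern used by `coding_free_composition_data` works for any doubly linked chain, whichever block the caller happens to hold. A small self-contained sketch follows, using a hypothetical `struct sketch_block` and the standard `malloc`/`free` in place of `struct composition_data` and `xfree`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chained memory block.  */
struct sketch_block
{
  struct sketch_block *prev, *next;
};

/* Allocate a block and link it after TAIL, which must be the last
   block of its chain or NULL.  Returns the new block.  */
static struct sketch_block *
sketch_append_block (struct sketch_block *tail)
{
  struct sketch_block *b = malloc (sizeof *b);

  assert (b != NULL);
  b->prev = tail;
  b->next = NULL;
  if (tail)
    tail->next = b;
  return b;
}

/* Free every block of the chain that BLOCK belongs to and return the
   number of blocks freed.  BLOCK may point anywhere in the chain.  */
static int
sketch_free_chain (struct sketch_block *block)
{
  int freed = 0;

  if (!block)
    return 0;
  while (block->prev)           /* Rewind to the first block.  */
    block = block->prev;
  while (block)
    {
      struct sketch_block *next = block->next;

      free (block);
      freed++;
      block = next;
    }
  return freed;
}
```

Saving `next` before calling `free` is the important detail: reading `block->next` after the block has been released would be undefined behaviour.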
3262 /* Set the `char_offset' member of all memory blocks pointed to by
3263 coding->cmp_data to POS. */
3264
3265void
3266coding_adjust_composition_offset (coding, pos)
3267 struct coding_system *coding;
3268 int pos;
3269{
3270 struct composition_data *cmp_data;
3271
3272 for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
3273 cmp_data->char_offset = pos;
3274}
3275
54f78171
KH
3276 /* Set up raw-text or one of its subsidiaries in the structure
3277 coding_system CODING according to the eol_type value already set
3278 in CODING. CODING should be set up for some coding system in
3279 advance. */
3280
3281void
3282setup_raw_text_coding_system (coding)
3283 struct coding_system *coding;
3284{
3285 if (coding->type != coding_type_raw_text)
3286 {
3287 coding->symbol = Qraw_text;
3288 coding->type = coding_type_raw_text;
3289 if (coding->eol_type != CODING_EOL_UNDECIDED)
3290 {
84d60297
RS
3291 Lisp_Object subsidiaries;
3292 subsidiaries = Fget (Qraw_text, Qeol_type);
54f78171
KH
3293
3294 if (VECTORP (subsidiaries)
3295 && XVECTOR (subsidiaries)->size == 3)
3296 coding->symbol
3297 = XVECTOR (subsidiaries)->contents[coding->eol_type];
3298 }
716e0b0a 3299 setup_coding_system (coding->symbol, coding);
54f78171
KH
3300 }
3301 return;
3302}
3303
4ed46869
KH
3304 /* Emacs has a mechanism to automatically detect a coding system if it
3305 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But
3306 it's impossible to distinguish some coding systems accurately
3307 because they use the same range of codes. So coding systems are
3308 first grouped into the following categories:
3309
0ef69138 3310 o coding-category-emacs-mule
4ed46869
KH
3311
3312 The category for a coding system which has the same code range
3313 as Emacs' internal format. Assigned the coding-system (Lisp
0ef69138 3314 symbol) `emacs-mule' by default.
4ed46869
KH
3315
3316 o coding-category-sjis
3317
3318 The category for a coding system which has the same code range
3319 as SJIS. Assigned the coding-system (Lisp
7717c392 3320 symbol) `japanese-shift-jis' by default.
4ed46869
KH
3321
3322 o coding-category-iso-7
3323
3324 The category for a coding system which has the same code range
7717c392 3325 as ISO2022 of 7-bit environment. This doesn't use any locking
d46c5b12
KH
3326 shift and single shift functions. This can encode/decode all
3327 charsets. Assigned the coding-system (Lisp symbol)
3328 `iso-2022-7bit' by default.
3329
3330 o coding-category-iso-7-tight
3331
3332 Same as coding-category-iso-7 except that this can
3333 encode/decode only the specified charsets.
4ed46869
KH
3334
3335 o coding-category-iso-8-1
3336
3337 The category for a coding system which has the same code range
3338 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3339 for DIMENSION1 charset. This doesn't use any locking shift
3340 and single shift functions. Assigned the coding-system (Lisp
3341 symbol) `iso-latin-1' by default.
4ed46869
KH
3342
3343 o coding-category-iso-8-2
3344
3345 The category for a coding system which has the same code range
3346 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3347 for DIMENSION2 charset. This doesn't use any locking shift
3348 and single shift functions. Assigned the coding-system (Lisp
3349 symbol) `japanese-iso-8bit' by default.
4ed46869 3350
7717c392 3351 o coding-category-iso-7-else
4ed46869
KH
3352
3353 The category for a coding system which has the same code range
7717c392
KH
3354 as ISO2022 of 7-bit environment but uses locking shift or
3355 single shift functions. Assigned the coding-system (Lisp
3356 symbol) `iso-2022-7bit-lock' by default.
3357
3358 o coding-category-iso-8-else
3359
3360 The category for a coding system which has the same code range
3361 as ISO2022 of 8-bit environment but uses locking shift or
3362 single shift functions. Assigned the coding-system (Lisp
3363 symbol) `iso-2022-8bit-ss2' by default.
4ed46869
KH
3364
3365 o coding-category-big5
3366
3367 The category for a coding system which has the same code range
3368 as BIG5. Assigned the coding-system (Lisp symbol)
e0e989f6 3369 `cn-big5' by default.
4ed46869 3370
fa42c37f
KH
3371 o coding-category-utf-8
3372
3373 The category for a coding system which has the same code range
3374 as UTF-8 (cf. RFC2279). Assigned the coding-system (Lisp
3375 symbol) `utf-8' by default.
3376
3377 o coding-category-utf-16-be
3378
3379 The category for a coding system in which a text has a
3380 Unicode signature (cf. Unicode Standard) in the order of BIG
3381 endian at the head. Assigned the coding-system (Lisp symbol)
3382 `utf-16-be' by default.
3383
3384 o coding-category-utf-16-le
3385
3386 The category for a coding system in which a text has a
3387 Unicode signature (cf. Unicode Standard) in the order of
3388 LITTLE endian at the head. Assigned the coding-system (Lisp
3389 symbol) `utf-16-le' by default.
3390
1397dc18
KH
3391 o coding-category-ccl
3392
3393 The category for a coding system of which encoder/decoder is
3394 written in CCL programs. The default value is nil, i.e., no
3395 coding system is assigned.
3396
4ed46869
KH
3397 o coding-category-binary
3398
3399 The category for a coding system not categorized in any of the
3400 above. Assigned the coding-system (Lisp symbol)
e0e989f6 3401 `no-conversion' by default.
4ed46869
KH
3402
3403 Each of them is a Lisp symbol whose value is an actual
3404 `coding-system' (also a Lisp symbol) assigned by a user.
3405 What Emacs actually does is detect a category of coding system.
3406 Then it uses the `coding-system' assigned to that category. If Emacs
3407 can't narrow the result down to one category, it selects the category
3408 of the highest priority. Priorities of categories are also specified by a
3409 user in a Lisp variable `coding-category-list'.
3410
3411*/
3412
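The priority rule described above (when several categories match, take the one that comes first in `coding-category-list') reduces to a scan over an ordered array of category masks. The sketch below assumes each priority entry is a single-bit mask and uses a caller-supplied fallback where the original returns the raw-text mask; `sketch_pick_category` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in coding.c): DETECTED is the OR of the
   masks of all categories that matched the text.  PRIORITIES lists
   category masks from highest to lowest priority.  Return the first
   priority entry that appears in DETECTED, or FALLBACK if none does.  */
static unsigned
sketch_pick_category (unsigned detected, const unsigned *priorities,
                      size_t count, unsigned fallback)
{
  size_t i;

  assert (priorities != NULL || count == 0);
  for (i = 0; i < count; i++)
    if (detected & priorities[i])
      return priorities[i];
  return fallback;
}
```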
66cfb530
KH
3413static
3414int ascii_skip_code[256];
3415
d46c5b12 3416/* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded.
4ed46869
KH
3417 If it detects possible coding systems, return an integer in which
3418 appropriate flag bits are set. Flag bits are defined by macros
fa42c37f
KH
3419 CODING_CATEGORY_MASK_XXX in `coding.h'. If PRIORITIES is non-NULL,
3420 it should point to the table `coding_priorities'. In that case, only
3421 the flag bit for a coding system of the highest priority is set in
3422 the returned value.
4ed46869 3423
d46c5b12
KH
3424 How many ASCII characters are at the head is returned as *SKIP. */
3425
3426static int
3427detect_coding_mask (source, src_bytes, priorities, skip)
3428 unsigned char *source;
3429 int src_bytes, *priorities, *skip;
4ed46869
KH
3430{
3431 register unsigned char c;
d46c5b12 3432 unsigned char *src = source, *src_end = source + src_bytes;
fa42c37f
KH
3433 unsigned int mask, utf16_examined_p, iso2022_examined_p;
3434 int i, idx;
4ed46869
KH
3435
3436 /* At first, skip all ASCII characters and control characters except
3437 for three ISO2022 specific control characters. */
66cfb530
KH
3438 ascii_skip_code[ISO_CODE_SO] = 0;
3439 ascii_skip_code[ISO_CODE_SI] = 0;
3440 ascii_skip_code[ISO_CODE_ESC] = 0;
3441
bcf26d6a 3442 label_loop_detect_coding:
66cfb530 3443 while (src < src_end && ascii_skip_code[*src]) src++;
d46c5b12 3444 *skip = src - source;
4ed46869
KH
3445
3446 if (src >= src_end)
3447 /* We found nothing other than ASCII. There's nothing to do. */
d46c5b12 3448 return 0;
4ed46869 3449
8a8147d6 3450 c = *src;
4ed46869
KH
3451 /* The text seems to be encoded in some multilingual coding system.
3452 Now, try to find in which coding system the text is encoded. */
3453 if (c < 0x80)
bcf26d6a
KH
3454 {
3455 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
3456 /* C is an ISO2022 specific control code of C0. */
3457 mask = detect_coding_iso2022 (src, src_end);
1b2af4b0 3458 if (mask == 0)
d46c5b12
KH
3459 {
3460 /* No valid ISO2022 code follows C. Try again. */
3461 src++;
66cfb530
KH
3462 if (c == ISO_CODE_ESC)
3463 ascii_skip_code[ISO_CODE_ESC] = 1;
3464 else
3465 ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
d46c5b12
KH
3466 goto label_loop_detect_coding;
3467 }
3468 if (priorities)
fa42c37f
KH
3469 {
3470 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3471 {
3472 if (mask & priorities[i])
3473 return priorities[i];
3474 }
3475 return CODING_CATEGORY_MASK_RAW_TEXT;
3476 }
bcf26d6a 3477 }
d46c5b12 3478 else
c4825358 3479 {
d46c5b12 3480 int try;
4ed46869 3481
d46c5b12
KH
3482 if (c < 0xA0)
3483 {
3484 /* C is the first byte of SJIS character code,
fa42c37f
KH
3485 or a leading-code of Emacs' internal format (emacs-mule),
3486 or the first byte of UTF-16. */
3487 try = (CODING_CATEGORY_MASK_SJIS
3488 | CODING_CATEGORY_MASK_EMACS_MULE
3489 | CODING_CATEGORY_MASK_UTF_16_BE
3490 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12
KH
3491
3492 /* Or, if C is a special latin extra code,
3493 or is an ISO2022 specific control code of C1 (SS2 or SS3),
3494 or is an ISO2022 control-sequence-introducer (CSI),
3495 we should also consider the possibility of ISO2022 codings. */
3496 if ((VECTORP (Vlatin_extra_code_table)
3497 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
3498 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
3499 || (c == ISO_CODE_CSI
3500 && (src < src_end
3501 && (*src == ']'
3502 || ((*src == '0' || *src == '1' || *src == '2')
3503 && src + 1 < src_end
3504 && src[1] == ']')))))
3505 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
3506 | CODING_CATEGORY_MASK_ISO_8BIT);
3507 }
c4825358 3508 else
d46c5b12
KH
3509 /* C is a character of ISO2022 in graphic plane right,
3510 or a SJIS's 1-byte character code (i.e. JISX0201),
fa42c37f
KH
3511 or the first byte of BIG5's 2-byte code,
3512 or the first byte of UTF-8/16. */
d46c5b12
KH
3513 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
3514 | CODING_CATEGORY_MASK_ISO_8BIT
3515 | CODING_CATEGORY_MASK_SJIS
fa42c37f
KH
3516 | CODING_CATEGORY_MASK_BIG5
3517 | CODING_CATEGORY_MASK_UTF_8
3518 | CODING_CATEGORY_MASK_UTF_16_BE
3519 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12 3520
1397dc18
KH
3521 /* Or, we may have to consider the possibility of CCL. */
3522 if (coding_system_table[CODING_CATEGORY_IDX_CCL]
3523 && (coding_system_table[CODING_CATEGORY_IDX_CCL]
3524 ->spec.ccl.valid_codes)[c])
3525 try |= CODING_CATEGORY_MASK_CCL;
3526
d46c5b12 3527 mask = 0;
fa42c37f 3528 utf16_examined_p = iso2022_examined_p = 0;
d46c5b12
KH
3529 if (priorities)
3530 {
3531 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3532 {
fa42c37f
KH
3533 if (!iso2022_examined_p
3534 && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
3535 {
3536 mask |= detect_coding_iso2022 (src, src_end);
3537 iso2022_examined_p = 1;
3538 }
5ab13dd0 3539 else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
fa42c37f
KH
3540 mask |= detect_coding_sjis (src, src_end);
3541 else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
3542 mask |= detect_coding_utf_8 (src, src_end);
3543 else if (!utf16_examined_p
3544 && (priorities[i] & try &
3545 CODING_CATEGORY_MASK_UTF_16_BE_LE))
3546 {
3547 mask |= detect_coding_utf_16 (src, src_end);
3548 utf16_examined_p = 1;
3549 }
5ab13dd0 3550 else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
fa42c37f 3551 mask |= detect_coding_big5 (src, src_end);
5ab13dd0 3552 else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
fa42c37f 3553 mask |= detect_coding_emacs_mule (src, src_end);
89fa8b36 3554 else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
fa42c37f 3555 mask |= detect_coding_ccl (src, src_end);
5ab13dd0 3556 else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
fa42c37f 3557 mask |= CODING_CATEGORY_MASK_RAW_TEXT;
5ab13dd0 3558 else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
fa42c37f
KH
3559 mask |= CODING_CATEGORY_MASK_BINARY;
3560 if (mask & priorities[i])
3561 return priorities[i];
d46c5b12
KH
3562 }
3563 return CODING_CATEGORY_MASK_RAW_TEXT;
3564 }
3565 if (try & CODING_CATEGORY_MASK_ISO)
3566 mask |= detect_coding_iso2022 (src, src_end);
3567 if (try & CODING_CATEGORY_MASK_SJIS)
3568 mask |= detect_coding_sjis (src, src_end);
3569 if (try & CODING_CATEGORY_MASK_BIG5)
3570 mask |= detect_coding_big5 (src, src_end);
fa42c37f
KH
3571 if (try & CODING_CATEGORY_MASK_UTF_8)
3572 mask |= detect_coding_utf_8 (src, src_end);
3573 if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
3574 mask |= detect_coding_utf_16 (src, src_end);
d46c5b12 3575 if (try & CODING_CATEGORY_MASK_EMACS_MULE)
1397dc18
KH
3576 mask |= detect_coding_emacs_mule (src, src_end);
3577 if (try & CODING_CATEGORY_MASK_CCL)
3578 mask |= detect_coding_ccl (src, src_end);
c4825358 3579 }
5ab13dd0 3580 return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
4ed46869
KH
3581}
3582
3583/* Detect how a text of length SRC_BYTES pointed to by SRC is encoded.
3584 The information of the detected coding system is set in CODING. */
3585
3586void
3587detect_coding (coding, src, src_bytes)
3588 struct coding_system *coding;
3589 unsigned char *src;
3590 int src_bytes;
3591{
d46c5b12
KH
3592 unsigned int idx;
3593 int skip, mask, i;
84d60297 3594 Lisp_Object val;
4ed46869 3595
84d60297 3596 val = Vcoding_category_list;
66cfb530 3597 mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip);
d46c5b12 3598 coding->heading_ascii = skip;
4ed46869 3599
d46c5b12
KH
3600 if (!mask) return;
3601
3602 /* We found a single coding system of the highest priority in MASK. */
3603 idx = 0;
3604 while (mask && ! (mask & 1)) mask >>= 1, idx++;
3605 if (! mask)
3606 idx = CODING_CATEGORY_IDX_RAW_TEXT;
4ed46869 3607
d46c5b12
KH
3608 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value;
3609
3610 if (coding->eol_type != CODING_EOL_UNDECIDED)
27901516 3611 {
84d60297 3612 Lisp_Object tmp;
d46c5b12 3613
84d60297 3614 tmp = Fget (val, Qeol_type);
d46c5b12
KH
3615 if (VECTORP (tmp))
3616 val = XVECTOR (tmp)->contents[coding->eol_type];
4ed46869 3617 }
b73bfc1c
KH
3618
3619 /* Setup this new coding system while preserving some slots. */
3620 {
3621 int src_multibyte = coding->src_multibyte;
3622 int dst_multibyte = coding->dst_multibyte;
3623
3624 setup_coding_system (val, coding);
3625 coding->src_multibyte = src_multibyte;
3626 coding->dst_multibyte = dst_multibyte;
3627 coding->heading_ascii = skip;
3628 }
4ed46869
KH
3629}
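The bit scan in detect_coding above (shift MASK right until bit 0 is set, counting the shifts) can be sketched in isolation. This is a minimal illustration, not the Emacs code: the name `lowest_set_bit_index` is hypothetical, and the caller is assumed to have rejected a zero mask first, as detect_coding does with its early return.

```c
#include <assert.h>

/* Hypothetical helper: index of the least significant set bit in MASK.
   The caller must pass a nonzero MASK (assumption mirroring the
   `if (!mask) return;' guard in detect_coding).  */
static int
lowest_set_bit_index (unsigned int mask)
{
  int idx = 0;

  assert (mask != 0);
  while (!(mask & 1))
    {
      mask >>= 1;
      idx++;
    }
  return idx;
}
```

With a nonzero mask the loop always stops on a set bit, so under that precondition no fallback index is needed.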
3630
d46c5b12
KH
3631/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
3632 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
3633 CODING_EOL_CR, and CODING_EOL_UNDECIDED.
3634
3635 How many non-eol characters are at the head is returned as *SKIP. */
4ed46869 3636
bc4bc72a
RS
3637#define MAX_EOL_CHECK_COUNT 3
3638
d46c5b12
KH
3639static int
3640detect_eol_type (source, src_bytes, skip)
3641 unsigned char *source;
3642 int src_bytes, *skip;
4ed46869 3643{
d46c5b12 3644 unsigned char *src = source, *src_end = src + src_bytes;
4ed46869 3645 unsigned char c;
bc4bc72a
RS
3646 int total = 0; /* How many end-of-lines are found so far. */
3647 int eol_type = CODING_EOL_UNDECIDED;
3648 int this_eol_type;
4ed46869 3649
d46c5b12
KH
3650 *skip = 0;
3651
bc4bc72a 3652 while (src < src_end && total < MAX_EOL_CHECK_COUNT)
4ed46869
KH
3653 {
3654 c = *src++;
bc4bc72a 3655 if (c == '\n' || c == '\r')
4ed46869 3656 {
d46c5b12
KH
3657 if (*skip == 0)
3658 *skip = src - 1 - source;
bc4bc72a
RS
3659 total++;
3660 if (c == '\n')
3661 this_eol_type = CODING_EOL_LF;
3662 else if (src >= src_end || *src != '\n')
3663 this_eol_type = CODING_EOL_CR;
4ed46869 3664 else
bc4bc72a
RS
3665 this_eol_type = CODING_EOL_CRLF, src++;
3666
3667 if (eol_type == CODING_EOL_UNDECIDED)
3668 /* This is the first end-of-line. */
3669 eol_type = this_eol_type;
3670 else if (eol_type != this_eol_type)
d46c5b12
KH
3671 {
3672 /* The found type is different from what found before. */
3673 eol_type = CODING_EOL_INCONSISTENT;
3674 break;
3675 }
4ed46869
KH
3676 }
3677 }
bc4bc72a 3678
d46c5b12
KH
3679 if (*skip == 0)
3680 *skip = src_end - source;
85a02ca4 3681 return eol_type;
4ed46869
KH
3682}
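The scan performed by detect_eol_type can be sketched as a standalone function. This is an illustration under stated assumptions, not the Emacs implementation: the enum `eol_kind`, the constant `MAX_EOL_CHECK`, and the name `sketch_detect_eol` are hypothetical stand-ins for the CODING_EOL_* values and MAX_EOL_CHECK_COUNT. Unlike the original, the sketch tracks "first EOL seen" with a separate flag instead of testing `*skip == 0`, so an EOL at offset 0 is reported as skip 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* constants.  */
enum eol_kind { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR, EOL_INCONSISTENT };

#define MAX_EOL_CHECK 3  /* examine at most this many line ends */

static enum eol_kind
sketch_detect_eol (const unsigned char *text, size_t len, size_t *skip)
{
  const unsigned char *p = text, *end = text + len;
  enum eol_kind seen = EOL_UNDECIDED;
  int total = 0;
  int found = 0;

  assert (text != NULL && skip != NULL);
  *skip = len;                  /* no EOL found yet: skip everything */
  while (p < end && total < MAX_EOL_CHECK)
    {
      unsigned char c = *p++;
      enum eol_kind cur;

      if (c != '\n' && c != '\r')
        continue;
      if (!found)
        {
          /* Offset of the first EOL byte.  */
          *skip = (size_t) (p - 1 - text);
          found = 1;
        }
      total++;
      if (c == '\n')
        cur = EOL_LF;
      else if (p < end && *p == '\n')
        {
          cur = EOL_CRLF;
          p++;                  /* consume the LF of the CRLF pair */
        }
      else
        cur = EOL_CR;

      if (seen == EOL_UNDECIDED)
        seen = cur;             /* first line end decides the type */
      else if (seen != cur)
        return EOL_INCONSISTENT;
    }
  return seen;
}
```

Capping the scan at a few line ends keeps detection cheap on large inputs, at the cost of missing inconsistencies that appear later in the text.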
3683
fa42c37f
KH
3684/* Like detect_eol_type, but detect EOL type in 2-octet
3685 big-endian/little-endian format for coding systems utf-16-be and
3686 utf-16-le. */
3687
3688static int
3689detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
3690 unsigned char *source;
3691 int src_bytes, *skip;
3692{
3693 unsigned char *src = source, *src_end = src + src_bytes;
3694 unsigned int c1, c2;
3695 int total = 0; /* How many end-of-lines are found so far. */
3696 int eol_type = CODING_EOL_UNDECIDED;
3697 int this_eol_type;
3698 int msb, lsb;
3699
3700 if (big_endian_p)
3701 msb = 0, lsb = 1;
3702 else
3703 msb = 1, lsb = 0;
3704
3705 *skip = 0;
3706
3707 while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
3708 {
3709 c1 = (src[msb] << 8) | (src[lsb]);
3710 src += 2;
3711
3712 if (c1 == '\n' || c1 == '\r')
3713 {
3714 if (*skip == 0)
3715 *skip = src - 2 - source;
3716 total++;
3717 if (c1 == '\n')
3718 {
3719 this_eol_type = CODING_EOL_LF;
3720 }
3721 else
3722 {
3723 if ((src + 1) >= src_end)
3724 {
3725 this_eol_type = CODING_EOL_CR;
3726 }
3727 else
3728 {
3729 c2 = (src[msb] << 8) | (src[lsb]);
3730 if (c2 == '\n')
3731 this_eol_type = CODING_EOL_CRLF, src += 2;
3732 else
3733 this_eol_type = CODING_EOL_CR;
3734 }
3735 }
3736
3737 if (eol_type == CODING_EOL_UNDECIDED)
3738 /* This is the first end-of-line. */
3739 eol_type = this_eol_type;
3740 else if (eol_type != this_eol_type)
3741 {
3742 /* The found type is different from what found before. */
3743 eol_type = CODING_EOL_INCONSISTENT;
3744 break;
3745 }
3746 }
3747 }
3748
3749 if (*skip == 0)
3750 *skip = src_end - source;
3751 return eol_type;
3752}
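The only difference from the one-byte scan is how each code unit is assembled from two octets. A minimal sketch of that step follows; the name `read_unit16` is hypothetical, and the function assumes at least two readable bytes at P.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine two octets at P into one 16-bit code
   unit.  For big-endian input the first octet is the high byte; for
   little-endian input it is the low byte.  */
static unsigned int
read_unit16 (const unsigned char *p, int big_endian_p)
{
  int msb = big_endian_p ? 0 : 1;
  int lsb = big_endian_p ? 1 : 0;

  assert (p != NULL);
  return ((unsigned int) p[msb] << 8) | p[lsb];
}
```

A line feed in big-endian form is the octet pair 00 0A; reading the same pair as little-endian yields 0x0A00, which is why the caller must know the byte order before comparing against '\n' or '\r'.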
3753
4ed46869
KH
3754/* Detect how end-of-line of a text of length SRC_BYTES pointed to by SRC
3755 is encoded. If it detects an appropriate format of end-of-line, it
3756 sets the information in *CODING. */
3757
3758void
3759detect_eol (coding, src, src_bytes)
3760 struct coding_system *coding;
3761 unsigned char *src;
3762 int src_bytes;
3763{
4608c386 3764 Lisp_Object val;
d46c5b12 3765 int skip;
fa42c37f
KH
3766 int eol_type;
3767
3768 switch (coding->category_idx)
3769 {
3770 case CODING_CATEGORY_IDX_UTF_16_BE:
3771 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
3772 break;
3773 case CODING_CATEGORY_IDX_UTF_16_LE:
3774 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
3775 break;
3776 default:
3777 eol_type = detect_eol_type (src, src_bytes, &skip);
3778 break;
3779 }
d46c5b12
KH
3780
3781 if (coding->heading_ascii > skip)
3782 coding->heading_ascii = skip;
3783 else
3784 skip = coding->heading_ascii;
4ed46869 3785
0ef69138 3786 if (eol_type == CODING_EOL_UNDECIDED)
4ed46869 3787 return;
27901516
KH
3788 if (eol_type == CODING_EOL_INCONSISTENT)
3789 {
3790#if 0
3791 /* This code is suppressed until we find a better way to
992f23f2 3792 distinguish raw text file and binary file. */
27901516
KH
3793
3794 /* If we have already detected that the coding is raw-text, the
3795 coding should actually be no-conversion. */
3796 if (coding->type == coding_type_raw_text)
3797 {
3798 setup_coding_system (Qno_conversion, coding);
3799 return;
3800 }
3801 /* Else, let's decode only text code anyway. */
3802#endif /* 0 */
1b2af4b0 3803 eol_type = CODING_EOL_LF;
27901516
KH
3804 }
3805
4608c386 3806 val = Fget (coding->symbol, Qeol_type);
4ed46869 3807 if (VECTORP (val) && XVECTOR (val)->size == 3)
d46c5b12 3808 {
b73bfc1c
KH
3809 int src_multibyte = coding->src_multibyte;
3810 int dst_multibyte = coding->dst_multibyte;
3811
d46c5b12 3812 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
b73bfc1c
KH
3813 coding->src_multibyte = src_multibyte;
3814 coding->dst_multibyte = dst_multibyte;
d46c5b12
KH
3815 coding->heading_ascii = skip;
3816 }
3817}
3818
3819#define CONVERSION_BUFFER_EXTRA_ROOM 256
3820
b73bfc1c
KH
3821#define DECODING_BUFFER_MAG(coding) \
3822 (coding->type == coding_type_iso2022 \
3823 ? 3 \
3824 : (coding->type == coding_type_ccl \
3825 ? coding->spec.ccl.decoder.buf_magnification \
3826 : 2))
d46c5b12
KH
3827
3828/* Return maximum size (bytes) of a buffer enough for decoding
3829 SRC_BYTES of text encoded in CODING. */
3830
3831int
3832decoding_buffer_size (coding, src_bytes)
3833 struct coding_system *coding;
3834 int src_bytes;
3835{
3836 return (src_bytes * DECODING_BUFFER_MAG (coding)
3837 + CONVERSION_BUFFER_EXTRA_ROOM);
3838}
3839
3840/* Return maximum size (bytes) of a buffer enough for encoding
3841 SRC_BYTES of text to CODING. */
3842
3843int
3844encoding_buffer_size (coding, src_bytes)
3845 struct coding_system *coding;
3846 int src_bytes;
3847{
3848 int magnification;
3849
3850 if (coding->type == coding_type_ccl)
3851 magnification = coding->spec.ccl.encoder.buf_magnification;
b73bfc1c 3852 else if (CODING_REQUIRE_ENCODING (coding))
d46c5b12 3853 magnification = 3;
b73bfc1c
KH
3854 else
3855 magnification = 1;
d46c5b12
KH
3856
3857 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
3858}
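Both size functions above reduce to the same arithmetic: source length times a per-coding magnification, plus a fixed slack. A small sketch of that formula, with the hypothetical names `sketch_buffer_size` and `EXTRA_ROOM` standing in for the real functions and CONVERSION_BUFFER_EXTRA_ROOM:

```c
#include <assert.h>

#define EXTRA_ROOM 256  /* same value as CONVERSION_BUFFER_EXTRA_ROOM */

/* Hypothetical helper: worst-case output size for SRC_BYTES of input
   when each input byte may expand to MAGNIFICATION output bytes.  */
static int
sketch_buffer_size (int src_bytes, int magnification)
{
  assert (src_bytes >= 0 && magnification >= 1);
  return src_bytes * magnification + EXTRA_ROOM;
}
```

For decoding, the source uses a magnification of 3 for ISO2022, the CCL program's own value for CCL, and 2 otherwise, so 1000 bytes of ISO2022 input would call for 3256 bytes.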
3859
3860#ifndef MINIMUM_CONVERSION_BUFFER_SIZE
3861#define MINIMUM_CONVERSION_BUFFER_SIZE 1024
3862#endif
3863
3864char *conversion_buffer;
3865int conversion_buffer_size;
3866
3867/* Return a pointer to a buffer of at least SIZE bytes to be used for encoding
3868 or decoding. Sufficient memory is allocated automatically. If we
3869 run out of memory, return NULL. */
3870
3871char *
3872get_conversion_buffer (size)
3873 int size;
3874{
3875 if (size > conversion_buffer_size)
3876 {
3877 char *buf;
3878 int real_size = conversion_buffer_size * 2;
3879
3880 while (real_size < size) real_size *= 2;
3881 buf = (char *) xmalloc (real_size);
3882 xfree (conversion_buffer);
3883 conversion_buffer = buf;
3884 conversion_buffer_size = real_size;
3885 }
3886 return conversion_buffer;
3887}
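The growth policy in get_conversion_buffer (double the current size until the request fits) can be sketched without the allocation. The name `grown_size` is hypothetical. The guard for a current size of 0 is an assumption added for the sketch: the loop as written above would never terminate from 0, so the real buffer size is presumably initialized to a nonzero value elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size to allocate when WANTED bytes are
   requested and the buffer currently holds CURRENT bytes.  Starts at
   twice the current size and keeps doubling.  */
static size_t
grown_size (size_t current, size_t wanted)
{
  /* Assumption: treat an empty buffer as size 1 so doubling makes
     progress; the original has no such guard.  */
  size_t real_size = current ? current * 2 : 1;

  assert (wanted > 0);
  while (real_size < wanted)
    real_size *= 2;
  return real_size;
}
```

Doubling keeps the number of reallocations logarithmic in the largest request, at the price of up to twice the needed memory.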
3888
3889int
3890ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
3891 struct coding_system *coding;
3892 unsigned char *source, *destination;
3893 int src_bytes, dst_bytes, encodep;
3894{
3895 struct ccl_program *ccl
3896 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
3897 int result;
3898
ae9ff118 3899 ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
aaaf0b1e
KH
3900 if (encodep)
3901 ccl->eol_type = coding->eol_type;
d46c5b12
KH
3902 coding->produced = ccl_driver (ccl, source, destination,
3903 src_bytes, dst_bytes, &(coding->consumed));
b73bfc1c
KH
3904 if (encodep)
3905 coding->produced_char = coding->produced;
3906 else
3907 {
3908 int bytes
3909 = dst_bytes ? dst_bytes : source + coding->consumed - destination;
3910 coding->produced = str_as_multibyte (destination, bytes,
3911 coding->produced,
3912 &(coding->produced_char));
3913 }
69f76525 3914
d46c5b12
KH
3915 switch (ccl->status)
3916 {
3917 case CCL_STAT_SUSPEND_BY_SRC:
3918 result = CODING_FINISH_INSUFFICIENT_SRC;
3919 break;
3920 case CCL_STAT_SUSPEND_BY_DST:
3921 result = CODING_FINISH_INSUFFICIENT_DST;
3922 break;
9864ebce
KH
3923 case CCL_STAT_QUIT:
3924 case CCL_STAT_INVALID_CMD:
3925 result = CODING_FINISH_INTERRUPT;
3926 break;
d46c5b12
KH
3927 default:
3928 result = CODING_FINISH_NORMAL;
3929 break;
3930 }
3931 return result;
4ed46869
KH
3932}
3933
aaaf0b1e
KH
3934/* Decode EOL format of the text at PTR of BYTES length destructively
3935 according to CODING->eol_type. This is called after the CCL
3936 program produced a decoded text at PTR. If we do CRLF->LF
3937 conversion, update CODING->produced and CODING->produced_char. */
3938
3939static void
3940decode_eol_post_ccl (coding, ptr, bytes)
3941 struct coding_system *coding;
3942 unsigned char *ptr;
3943 int bytes;
3944{
3945 Lisp_Object val, saved_coding_symbol;
3946 unsigned char *pend = ptr + bytes;
3947 int dummy;
3948
3949 /* Remember the current coding system symbol. We set it back when
3950 an inconsistent EOL is found so that `last-coding-system-used' is
3951 set to the coding system that doesn't specify EOL conversion. */
3952 saved_coding_symbol = coding->symbol;
3953
3954 coding->spec.ccl.cr_carryover = 0;
3955 if (coding->eol_type == CODING_EOL_UNDECIDED)
3956 {
3957 /* Here, to avoid the call of setup_coding_system, we directly
3958 call detect_eol_type. */
3959 coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
74b01b80
EZ
3960 if (coding->eol_type == CODING_EOL_INCONSISTENT)
3961 coding->eol_type = CODING_EOL_LF;
3962 if (coding->eol_type != CODING_EOL_UNDECIDED)
3963 {
3964 val = Fget (coding->symbol, Qeol_type);
3965 if (VECTORP (val) && XVECTOR (val)->size == 3)
3966 coding->symbol = XVECTOR (val)->contents[coding->eol_type];
3967 }
aaaf0b1e
KH
3968 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
3969 }
3970
74b01b80
EZ
3971 if (coding->eol_type == CODING_EOL_LF
3972 || coding->eol_type == CODING_EOL_UNDECIDED)
aaaf0b1e
KH
3973 {
3974 /* We have nothing to do. */
3975 ptr = pend;
3976 }
3977 else if (coding->eol_type == CODING_EOL_CRLF)
3978 {
3979 unsigned char *pstart = ptr, *p = ptr;
3980
3981 if (! (coding->mode & CODING_MODE_LAST_BLOCK)
3982 && *(pend - 1) == '\r')
3983 {
3984 /* If the last character is CR, we can't handle it here
3985 because LF will be in the not-yet-decoded source text.
3986 Record that the CR is not yet processed. */
3987 coding->spec.ccl.cr_carryover = 1;
3988 coding->produced--;
3989 coding->produced_char--;
3990 pend--;
3991 }
3992 while (ptr < pend)
3993 {
3994 if (*ptr == '\r')
3995 {
3996 if (ptr + 1 < pend && *(ptr + 1) == '\n')
3997 {
3998 *p++ = '\n';
3999 ptr += 2;
4000 }
4001 else
4002 {
4003 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4004 goto undo_eol_conversion;
4005 *p++ = *ptr++;
4006 }
4007 }
4008 else if (*ptr == '\n'
4009 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4010 goto undo_eol_conversion;
4011 else
4012 *p++ = *ptr++;
4013 continue;
4014
4015 undo_eol_conversion:
4016 /* We have encountered an inconsistent EOL format at PTR.
4017 Convert all LFs before PTR back to CRLFs. */
4018 for (p--, ptr--; p >= pstart; p--)
4019 {
4020 if (*p == '\n')
4021 *ptr-- = '\n', *ptr-- = '\r';
4022 else
4023 *ptr-- = *p;
4024 }
4025 /* If carryover is recorded, cancel it because we don't
4026 convert CRLF anymore. */
4027 if (coding->spec.ccl.cr_carryover)
4028 {
4029 coding->spec.ccl.cr_carryover = 0;
4030 coding->produced++;
4031 coding->produced_char++;
4032 pend++;
4033 }
4034 p = ptr = pend;
4035 coding->eol_type = CODING_EOL_LF;
4036 coding->symbol = saved_coding_symbol;
4037 }
4038 if (p < pend)
4039 {
4040 /* As each two-byte sequence CRLF was converted to LF, (PEND
4041 - P) is the number of deleted characters. */
4042 coding->produced -= pend - p;
4043 coding->produced_char -= pend - p;
4044 }
4045 }
4046 else /* i.e. coding->eol_type == CODING_EOL_CR */
4047 {
4048 unsigned char *p = ptr;
4049
4050 for (; ptr < pend; ptr++)
4051 {
4052 if (*ptr == '\r')
4053 *ptr = '\n';
4054 else if (*ptr == '\n'
4055 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4056 {
4057 for (; p < ptr; p++)
4058 {
4059 if (*p == '\n')
4060 *p = '\r';
4061 }
4062 ptr = pend;
4063 coding->eol_type = CODING_EOL_LF;
4064 coding->symbol = saved_coding_symbol;
4065 }
4066 }
4067 }
4068}
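The core of the CRLF branch above is an in-place compaction with separate read and write positions. The sketch below shows only that step; the name `crlf_to_lf` is hypothetical. It leaves a lone CR untouched and does not implement the carryover or the undo path that decode_eol_post_ccl uses for inconsistent input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite every CRLF pair in BUF[0..LEN) as a
   single LF, in place.  Returns the new length.  */
static size_t
crlf_to_lf (unsigned char *buf, size_t len)
{
  size_t r = 0, w = 0;          /* read and write positions */

  assert (buf != NULL);
  while (r < len)
    {
      if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
        {
          buf[w++] = '\n';
          r += 2;               /* two bytes consumed, one produced */
        }
      else
        buf[w++] = buf[r++];
    }
  return w;
}
```

Because the write position never passes the read position, the conversion needs no second buffer; the difference between the old and new length is the number of CR bytes removed, which is what the original subtracts from CODING->produced.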
4069
4ed46869
KH
4070/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4071 decoding, it may detect coding system and format of end-of-line if
b73bfc1c
KH
4072 those are not yet decided. The source should be unibyte, the
4073 result is multibyte if CODING->dst_multibyte is nonzero, else
4074 unibyte. */
4ed46869
KH
4075
4076int
d46c5b12 4077decode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4078 struct coding_system *coding;
4079 unsigned char *source, *destination;
4080 int src_bytes, dst_bytes;
4ed46869 4081{
0ef69138 4082 if (coding->type == coding_type_undecided)
4ed46869
KH
4083 detect_coding (coding, source, src_bytes);
4084
aaaf0b1e
KH
4085 if (coding->eol_type == CODING_EOL_UNDECIDED
4086 && coding->type != coding_type_ccl)
4ed46869
KH
4087 detect_eol (coding, source, src_bytes);
4088
b73bfc1c
KH
4089 coding->produced = coding->produced_char = 0;
4090 coding->consumed = coding->consumed_char = 0;
4091 coding->errors = 0;
4092 coding->result = CODING_FINISH_NORMAL;
4093
4ed46869
KH
4094 switch (coding->type)
4095 {
4ed46869 4096 case coding_type_sjis:
b73bfc1c
KH
4097 decode_coding_sjis_big5 (coding, source, destination,
4098 src_bytes, dst_bytes, 1);
4ed46869
KH
4099 break;
4100
4101 case coding_type_iso2022:
b73bfc1c
KH
4102 decode_coding_iso2022 (coding, source, destination,
4103 src_bytes, dst_bytes);
4ed46869
KH
4104 break;
4105
4106 case coding_type_big5:
b73bfc1c
KH
4107 decode_coding_sjis_big5 (coding, source, destination,
4108 src_bytes, dst_bytes, 0);
4109 break;
4110
4111 case coding_type_emacs_mule:
4112 decode_coding_emacs_mule (coding, source, destination,
4113 src_bytes, dst_bytes);
4ed46869
KH
4114 break;
4115
4116 case coding_type_ccl:
aaaf0b1e
KH
4117 if (coding->spec.ccl.cr_carryover)
4118 {
4119 /* Set the CR which is not processed by the previous call of
4120 decode_eol_post_ccl in DESTINATION. */
4121 *destination = '\r';
4122 coding->produced++;
4123 coding->produced_char++;
4124 dst_bytes--;
4125 }
4126 ccl_coding_driver (coding, source,
4127 destination + coding->spec.ccl.cr_carryover,
b73bfc1c 4128 src_bytes, dst_bytes, 0);
aaaf0b1e
KH
4129 if (coding->eol_type != CODING_EOL_LF)
4130 decode_eol_post_ccl (coding, destination, coding->produced);
d46c5b12
KH
4131 break;
4132
b73bfc1c
KH
4133 default:
4134 decode_eol (coding, source, destination, src_bytes, dst_bytes);
4135 }
4136
4137 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4138 && coding->consumed == src_bytes)
4139 coding->result = CODING_FINISH_NORMAL;
4140
4141 if (coding->mode & CODING_MODE_LAST_BLOCK
4142 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
4143 {
4144 unsigned char *src = source + coding->consumed;
4145 unsigned char *dst = destination + coding->produced;
4146
4147 src_bytes -= coding->consumed;
4148 coding->errors++;
4149 if (COMPOSING_P (coding))
4150 DECODE_COMPOSITION_END ('1');
4151 while (src_bytes--)
d46c5b12 4152 {
b73bfc1c
KH
4153 int c = *src++;
4154 dst += CHAR_STRING (c, dst);
4155 coding->produced_char++;
d46c5b12 4156 }
b73bfc1c
KH
4157 coding->consumed = coding->consumed_char = src - source;
4158 coding->produced = dst - destination;
4ed46869
KH
4159 }
4160
b73bfc1c
KH
4161 if (!coding->dst_multibyte)
4162 {
4163 coding->produced = str_as_unibyte (destination, coding->produced);
4164 coding->produced_char = coding->produced;
4165 }
4ed46869 4166
b73bfc1c
KH
4167 return coding->result;
4168}
52d41803 4169
b73bfc1c
KH
4170/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4171 multibyteness of the source is CODING->src_multibyte, the
4172 multibyteness of the result is always unibyte. */
4ed46869
KH
4173
4174int
d46c5b12 4175encode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4176 struct coding_system *coding;
4177 unsigned char *source, *destination;
4178 int src_bytes, dst_bytes;
4ed46869 4179{
b73bfc1c
KH
4180 coding->produced = coding->produced_char = 0;
4181 coding->consumed = coding->consumed_char = 0;
4182 coding->errors = 0;
4183 coding->result = CODING_FINISH_NORMAL;
4ed46869 4184
d46c5b12
KH
4185 switch (coding->type)
4186 {
4ed46869 4187 case coding_type_sjis:
b73bfc1c
KH
4188 encode_coding_sjis_big5 (coding, source, destination,
4189 src_bytes, dst_bytes, 1);
4ed46869
KH
4190 break;
4191
4192 case coding_type_iso2022:
b73bfc1c
KH
4193 encode_coding_iso2022 (coding, source, destination,
4194 src_bytes, dst_bytes);
4ed46869
KH
4195 break;
4196
4197 case coding_type_big5:
b73bfc1c
KH
4198 encode_coding_sjis_big5 (coding, source, destination,
4199 src_bytes, dst_bytes, 0);
4200 break;
4201
4202 case coding_type_emacs_mule:
4203 encode_coding_emacs_mule (coding, source, destination,
4204 src_bytes, dst_bytes);
4ed46869
KH
4205 break;
4206
4207 case coding_type_ccl:
b73bfc1c
KH
4208 ccl_coding_driver (coding, source, destination,
4209 src_bytes, dst_bytes, 1);
d46c5b12
KH
4210 break;
4211
b73bfc1c
KH
4212 default:
4213 encode_eol (coding, source, destination, src_bytes, dst_bytes);
4214 }
4215
4216 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4217 && coding->consumed == src_bytes)
4218 coding->result = CODING_FINISH_NORMAL;
4219
4220 if (coding->mode & CODING_MODE_LAST_BLOCK)
4221 {
4222 unsigned char *src = source + coding->consumed;
4223 unsigned char *src_end = src + src_bytes;
4224 unsigned char *dst = destination + coding->produced;
4225
4226 if (coding->type == coding_type_iso2022)
4227 ENCODE_RESET_PLANE_AND_REGISTER;
4228 if (COMPOSING_P (coding))
4229 *dst++ = ISO_CODE_ESC, *dst++ = '1';
4230 if (coding->consumed < src_bytes)
d46c5b12 4231 {
b73bfc1c
KH
4232 int len = src_bytes - coding->consumed;
4233
4234 BCOPY_SHORT (source + coding->consumed, dst, len);
4235 if (coding->src_multibyte)
4236 len = str_as_unibyte (dst, len);
4237 dst += len;
4238 coding->consumed = src_bytes;
d46c5b12 4239 }
b73bfc1c 4240 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
4241 }
4242
b73bfc1c 4243 return coding->result;
4ed46869
KH
4244}
4245
fb88bf2d
KH
4246/* Scan text in the region between *BEG and *END (byte positions),
4247 skip characters which we don't have to decode by coding system
4248 CODING at the head and tail, then set *BEG and *END to the region
4249 of the text we actually have to convert. The caller should move
b73bfc1c
KH
4250 the gap out of the region in advance if the region is from a
4251 buffer.
4ed46869 4252
d46c5b12
KH
4253 If STR is not NULL, *BEG and *END are indices into STR. */
4254
4255static void
4256shrink_decoding_region (beg, end, coding, str)
4257 int *beg, *end;
4258 struct coding_system *coding;
4259 unsigned char *str;
4260{
fb88bf2d 4261 unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
d46c5b12 4262 int eol_conversion;
88993dfd 4263 Lisp_Object translation_table;
d46c5b12
KH
4264
4265 if (coding->type == coding_type_ccl
4266 || coding->type == coding_type_undecided
b73bfc1c
KH
4267 || coding->eol_type != CODING_EOL_LF
4268 || !NILP (coding->post_read_conversion)
4269 || coding->composing != COMPOSITION_DISABLED)
d46c5b12
KH
4270 {
4271 /* We can't skip any data. */
4272 return;
4273 }
b73bfc1c
KH
4274 if (coding->type == coding_type_no_conversion
4275 || coding->type == coding_type_raw_text
4276 || coding->type == coding_type_emacs_mule)
d46c5b12 4277 {
fb88bf2d
KH
4278 /* We need no conversion, but don't have to skip any data here.
4279 Decoding routine handles them effectively anyway. */
d46c5b12
KH
4280 return;
4281 }
4282
88993dfd
KH
4283 translation_table = coding->translation_table_for_decode;
4284 if (NILP (translation_table) && !NILP (Venable_character_translation))
4285 translation_table = Vstandard_translation_table_for_decode;
4286 if (CHAR_TABLE_P (translation_table))
4287 {
4288 int i;
4289 for (i = 0; i < 128; i++)
4290 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4291 break;
4292 if (i < 128)
fa46990e 4293 /* Some ASCII character should be translated. We give up
88993dfd
KH
4294 shrinking. */
4295 return;
4296 }
4297
b73bfc1c 4298 if (coding->heading_ascii >= 0)
d46c5b12
KH
4299 /* Detection routine has already found how much we can skip at the
4300 head. */
4301 *beg += coding->heading_ascii;
4302
4303 if (str)
4304 {
4305 begp_orig = begp = str + *beg;
4306 endp_orig = endp = str + *end;
4307 }
4308 else
4309 {
fb88bf2d 4310 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4311 endp_orig = endp = begp + *end - *beg;
4312 }
4313
fa46990e
DL
4314 eol_conversion = (coding->eol_type == CODING_EOL_CR
4315 || coding->eol_type == CODING_EOL_CRLF);
4316
d46c5b12
KH
4317 switch (coding->type)
4318 {
d46c5b12
KH
4319 case coding_type_sjis:
4320 case coding_type_big5:
4321 /* We can skip all ASCII characters at the head. */
4322 if (coding->heading_ascii < 0)
4323 {
4324 if (eol_conversion)
de9d083c 4325 while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
d46c5b12
KH
4326 else
4327 while (begp < endp && *begp < 0x80) begp++;
4328 }
4329 /* We can skip all ASCII characters at the tail except for the
4330 second byte of SJIS or BIG5 code. */
4331 if (eol_conversion)
de9d083c 4332 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
d46c5b12
KH
4333 else
4334 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4335 /* Do not consider LF as ascii if preceded by CR, since that
4336 confuses eol decoding. */
4337 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4338 endp++;
d46c5b12
KH
4339 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
4340 endp++;
4341 break;
4342
b73bfc1c 4343 case coding_type_iso2022:
622fece5
KH
4344 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4345 /* We can't skip any data. */
4346 break;
d46c5b12
KH
4347 if (coding->heading_ascii < 0)
4348 {
d46c5b12
KH
4349 /* We can skip all ASCII characters at the head except for a
4350 few control codes. */
4351 while (begp < endp && (c = *begp) < 0x80
4352 && c != ISO_CODE_CR && c != ISO_CODE_SO
4353 && c != ISO_CODE_SI && c != ISO_CODE_ESC
4354 && (!eol_conversion || c != ISO_CODE_LF))
4355 begp++;
4356 }
4357 switch (coding->category_idx)
4358 {
4359 case CODING_CATEGORY_IDX_ISO_8_1:
4360 case CODING_CATEGORY_IDX_ISO_8_2:
4361 /* We can skip all ASCII characters at the tail. */
4362 if (eol_conversion)
de9d083c 4363 while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
d46c5b12
KH
4364 else
4365 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4366 /* Do not consider LF as ascii if preceded by CR, since that
4367 confuses eol decoding. */
4368 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4369 endp++;
d46c5b12
KH
4370 break;
4371
4372 case CODING_CATEGORY_IDX_ISO_7:
4373 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
de79a6a5
KH
4374 {
4375 /* We can skip all characters at the tail except for 8-bit
4376 codes and ESC and the 2 bytes following it. */
4377 unsigned char *eight_bit = NULL;
4378
4379 if (eol_conversion)
4380 while (begp < endp
4381 && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
4382 {
4383 if (!eight_bit && c & 0x80) eight_bit = endp;
4384 endp--;
4385 }
4386 else
4387 while (begp < endp
4388 && (c = endp[-1]) != ISO_CODE_ESC)
4389 {
4390 if (!eight_bit && c & 0x80) eight_bit = endp;
4391 endp--;
4392 }
4393 /* Do not consider LF as ascii if preceded by CR, since that
4394 confuses eol decoding. */
4395 if (begp < endp && endp < endp_orig
4396 && endp[-1] == '\r' && endp[0] == '\n')
4397 endp++;
4398 if (begp < endp && endp[-1] == ISO_CODE_ESC)
4399 {
4400 if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
4401 /* This is an ASCII designation sequence. We can
4402 surely skip the tail. But, if we have
4403 encountered an 8-bit code, skip only the codes
4404 after that. */
4405 endp = eight_bit ? eight_bit : endp + 2;
4406 else
4407 /* Hmmm, we can't skip the tail. */
4408 endp = endp_orig;
4409 }
4410 else if (eight_bit)
4411 endp = eight_bit;
4412 }
d46c5b12 4413 }
b73bfc1c
KH
4414 break;
4415
4416 default:
4417 abort ();
d46c5b12
KH
4418 }
4419 *beg += begp - begp_orig;
4420 *end += endp - endp_orig;
4421 return;
4422}
4423
4424/* Like shrink_decoding_region but for encoding. */
4425
4426static void
4427shrink_encoding_region (beg, end, coding, str)
4428 int *beg, *end;
4429 struct coding_system *coding;
4430 unsigned char *str;
4431{
4432 unsigned char *begp_orig, *begp, *endp_orig, *endp;
4433 int eol_conversion;
88993dfd 4434 Lisp_Object translation_table;
d46c5b12 4435
b73bfc1c
KH
4436 if (coding->type == coding_type_ccl
4437 || coding->eol_type == CODING_EOL_CRLF
4438 || coding->eol_type == CODING_EOL_CR
4439 || coding->cmp_data && coding->cmp_data->used > 0)
d46c5b12 4440 {
b73bfc1c
KH
4441 /* We can't skip any data. */
4442 return;
4443 }
4444 if (coding->type == coding_type_no_conversion
4445 || coding->type == coding_type_raw_text
4446 || coding->type == coding_type_emacs_mule
4447 || coding->type == coding_type_undecided)
4448 {
4449 /* We need no conversion, but don't have to skip any data here.
4450 Encoding routine handles them effectively anyway. */
d46c5b12
KH
4451 return;
4452 }
4453
88993dfd
KH
4454 translation_table = coding->translation_table_for_encode;
4455 if (NILP (translation_table) && !NILP (Venable_character_translation))
4456 translation_table = Vstandard_translation_table_for_encode;
4457 if (CHAR_TABLE_P (translation_table))
4458 {
4459 int i;
4460 for (i = 0; i < 128; i++)
4461 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4462 break;
4463 if (i < 128)
4464 /* Some ASCII character should be translated. We give up
4465 shrinking. */
4466 return;
4467 }
4468
d46c5b12
KH
4469 if (str)
4470 {
4471 begp_orig = begp = str + *beg;
4472 endp_orig = endp = str + *end;
4473 }
4474 else
4475 {
fb88bf2d 4476 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4477 endp_orig = endp = begp + *end - *beg;
4478 }
4479
4480 eol_conversion = (coding->eol_type == CODING_EOL_CR
4481 || coding->eol_type == CODING_EOL_CRLF);
4482
4483 /* Here, we don't have to check coding->pre_write_conversion because
4484 the caller is expected to have handled it already. */
4485 switch (coding->type)
4486 {
d46c5b12 4487 case coding_type_iso2022:
622fece5
KH
4488 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4489 /* We can't skip any data. */
4490 break;
d46c5b12
KH
4491 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
4492 {
4493 unsigned char *bol = begp;
4494 while (begp < endp && *begp < 0x80)
4495 {
4496 begp++;
4497 if (begp[-1] == '\n')
4498 bol = begp;
4499 }
4500 begp = bol;
4501 goto label_skip_tail;
4502 }
4503 /* fall through ... */
4504
b73bfc1c
KH
4505 case coding_type_sjis:
4506 case coding_type_big5:
d46c5b12
KH
4507 /* We can skip all ASCII characters at the head and tail. */
4508 if (eol_conversion)
4509 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
4510 else
4511 while (begp < endp && *begp < 0x80) begp++;
4512 label_skip_tail:
4513 if (eol_conversion)
4514 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
4515 else
4516 while (begp < endp && *(endp - 1) < 0x80) endp--;
4517 break;
b73bfc1c
KH
4518
4519 default:
4520 abort ();
d46c5b12
KH
4521 }
4522
4523 *beg += begp - begp_orig;
4524 *end += endp - endp_orig;
4525 return;
4526}
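For the SJIS/BIG5 encoding case with no EOL conversion, the shrinking above amounts to trimming ASCII bytes from both ends of the region. A minimal sketch of that case, using indices instead of pointers; the name `trim_ascii` is hypothetical, and the ISO2022 special cases and translation-table checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: narrow [*BEG, *END) in TEXT so that it no
   longer starts or ends with bytes below 0x80.  If the whole region
   is ASCII, the result is empty (*BEG == *END).  */
static void
trim_ascii (const unsigned char *text, size_t *beg, size_t *end)
{
  assert (text != NULL && beg != NULL && end != NULL && *beg <= *end);
  while (*beg < *end && text[*beg] < 0x80)
    (*beg)++;
  while (*beg < *end && text[*end - 1] < 0x80)
    (*end)--;
}
```

The decoding counterpart cannot trim the tail this simply, since the second byte of an SJIS or BIG5 character may itself be below 0x80; that is why shrink_decoding_region extends ENDP again after its tail scan.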
4527
88993dfd
KH
4528/* As shrinking conversion region requires some overhead, we don't try
4529 shrinking if the length of conversion region is less than this
4530 value. */
4531static int shrink_conversion_region_threshhold = 1024;
4532
4533#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
4534 do { \
4535 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
4536 { \
4537 if (encodep) shrink_encoding_region (beg, end, coding, str); \
4538 else shrink_decoding_region (beg, end, coding, str); \
4539 } \
4540 } while (0)
4541
b843d1ae
KH
4542static Lisp_Object
4543code_convert_region_unwind (dummy)
4544 Lisp_Object dummy;
4545{
4546 inhibit_pre_post_conversion = 0;
4547 return Qnil;
4548}
4549
ec6d2bb8
KH
4550/* Store information about all compositions in the range FROM to TO
4551 of OBJ in memory blocks pointed to by CODING->cmp_data. OBJ is a
4552 buffer or a string, and defaults to the current buffer. */
4553
4554void
4555coding_save_composition (coding, from, to, obj)
4556 struct coding_system *coding;
4557 int from, to;
4558 Lisp_Object obj;
4559{
4560 Lisp_Object prop;
4561 int start, end;
4562
91bee881
KH
4563 if (coding->composing == COMPOSITION_DISABLED)
4564 return;
4565 if (!coding->cmp_data)
4566 coding_allocate_composition_data (coding, from);
ec6d2bb8
KH
4567 if (!find_composition (from, to, &start, &end, &prop, obj)
4568 || end > to)
4569 return;
4570 if (start < from
4571 && (!find_composition (end, to, &start, &end, &prop, obj)
4572 || end > to))
4573 return;
4574 coding->composing = COMPOSITION_NO;
ec6d2bb8
KH
4575 do
4576 {
4577 if (COMPOSITION_VALID_P (start, end, prop))
4578 {
4579 enum composition_method method = COMPOSITION_METHOD (prop);
4580 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
4581 >= COMPOSITION_DATA_SIZE)
4582 coding_allocate_composition_data (coding, from);
4583 /* For relative composition, we remember start and end
4584 positions, for the other compositions, we also remember
4585 components. */
4586 CODING_ADD_COMPOSITION_START (coding, start - from, method);
4587 if (method != COMPOSITION_RELATIVE)
4588 {
4589 /* We must also store the components. */
4590 Lisp_Object val, ch;
4591
4592 val = COMPOSITION_COMPONENTS (prop);
4593 if (CONSP (val))
4594 while (CONSP (val))
4595 {
4596 ch = XCAR (val), val = XCDR (val);
4597 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4598 }
4599 else if (VECTORP (val) || STRINGP (val))
4600 {
4601 int len = (VECTORP (val)
4602 ? XVECTOR (val)->size : XSTRING (val)->size);
4603 int i;
4604 for (i = 0; i < len; i++)
4605 {
4606 ch = (STRINGP (val)
4607 ? Faref (val, make_number (i))
4608 : XVECTOR (val)->contents[i]);
4609 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4610 }
4611 }
4612 else /* INTEGERP (val) */
4613 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
4614 }
4615 CODING_ADD_COMPOSITION_END (coding, end - from);
4616 }
4617 start = end;
4618 }
4619 while (start < to
4620 && find_composition (start, to, &start, &end, &prop, obj)
4621 && end <= to);
4622
4623 /* Make coding->cmp_data point to the first memory block. */
4624 while (coding->cmp_data->prev)
4625 coding->cmp_data = coding->cmp_data->prev;
4626 coding->cmp_data_start = 0;
4627}
4628
4629/* Reflect the saved information about compositions to OBJ.
4630 CODING->cmp_data points to a memory block for the information. OBJ
4631 is a buffer or a string, defaults to the current buffer. */
4632
33fb63eb 4633void
ec6d2bb8
KH
4634coding_restore_composition (coding, obj)
4635 struct coding_system *coding;
4636 Lisp_Object obj;
4637{
4638 struct composition_data *cmp_data = coding->cmp_data;
4639
4640 if (!cmp_data)
4641 return;
4642
4643 while (cmp_data->prev)
4644 cmp_data = cmp_data->prev;
4645
4646 while (cmp_data)
4647 {
4648 int i;
4649
4650 for (i = 0; i < cmp_data->used; i += cmp_data->data[i])
4651 {
4652 int *data = cmp_data->data + i;
4653 enum composition_method method = (enum composition_method) data[3];
4654 Lisp_Object components;
4655
4656 if (method == COMPOSITION_RELATIVE)
4657 components = Qnil;
4658 else
4659 {
4660 int len = data[0] - 4, j;
4661 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];
4662
4663 for (j = 0; j < len; j++)
4664 args[j] = make_number (data[4 + j]);
4665 components = (method == COMPOSITION_WITH_ALTCHARS
4666 ? Fstring (len, args) : Fvector (len, args));
4667 }
4668 compose_text (data[1], data[2], components, Qnil, obj);
4669 }
4670 cmp_data = cmp_data->next;
4671 }
4672}
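/* Editorial sketch, not part of coding.c: the record layout that the
   loop in coding_restore_composition reads. Judging only from the
   indexing above, each record is a run of ints: data[0] is the record
   length (header included), data[1] and data[2] are the start and end
   positions, data[3] is the composition method, and data[4] onward
   are the components. The names REC_*, record_fn and
   walk_composition_records are hypothetical, and the bounds check is
   an addition that the original loop does not have. */

```c
#include <assert.h>

/* Field offsets inside one record, as read by the loop above.  */
enum { REC_LENGTH = 0, REC_START = 1, REC_END = 2, REC_METHOD = 3,
       REC_HEADER_SIZE = 4 };

/* Hypothetical callback type for this sketch.  */
typedef void (*record_fn) (int start, int end, int method,
                           const int *components, int n_components,
                           void *closure);

/* Walk the first USED ints of DATA, calling FN once per record.
   Return the number of records visited.  */
static int
walk_composition_records (const int *data, int used,
                          record_fn fn, void *closure)
{
  int i = 0, count = 0;

  assert (data != 0 && used >= 0);

  while (i < used)
    {
      const int *rec = data + i;
      int len = rec[REC_LENGTH];

      /* A record shorter than its header, or longer than the data
         that is left, would make the walk loop forever or overrun.  */
      if (len < REC_HEADER_SIZE || len > used - i)
        break;
      fn (rec[REC_START], rec[REC_END], rec[REC_METHOD],
          rec + REC_HEADER_SIZE, len - REC_HEADER_SIZE, closure);
      count++;
      i += len;   /* The length field doubles as the link to the next record.  */
    }
  return count;
}

/* Example callback: add up the component counts.  */
static void
count_components (int start, int end, int method,
                  const int *components, int n_components, void *closure)
{
  (void) start; (void) end; (void) method; (void) components;
  *(int *) closure += n_components;
}
```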
4673
d46c5b12 4674/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
fb88bf2d
KH
4675 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
4676 coding system CODING, and return the status code of code conversion
4677 (currently, this value has no meaning).
4678
4679 The numbers of characters (and bytes) consumed and produced by the
4680 conversion are recorded in members of the structure
4681 CODING.
d46c5b12 4682
6e44253b 4683 If REPLACE is nonzero, we do various things as if the original text
d46c5b12 4684 is deleted and a new text is inserted. See the comments in
b73bfc1c
KH
4685 replace_range (insdel.c) to know what we are doing.
4686
4687 If REPLACE is zero, it is assumed that the source text is unibyte.
4688 Otherwise, it is assumed that the source text is multibyte. */
4ed46869
KH
4689
4690int
6e44253b
KH
4691code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
4692 int from, from_byte, to, to_byte, encodep, replace;
4ed46869 4693 struct coding_system *coding;
4ed46869 4694{
fb88bf2d
KH
4695 int len = to - from, len_byte = to_byte - from_byte;
4696 int require, inserted, inserted_byte;
4b39528c 4697 int head_skip, tail_skip, total_skip = 0;
84d60297 4698 Lisp_Object saved_coding_symbol;
fb88bf2d 4699 int first = 1;
fb88bf2d 4700 unsigned char *src, *dst;
84d60297 4701 Lisp_Object deletion;
e133c8fa 4702 int orig_point = PT, orig_len = len;
6abb9bd9 4703 int prev_Z;
b73bfc1c
KH
4704 int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);
4705
4706 coding->src_multibyte = replace && multibyte_p;
4707 coding->dst_multibyte = multibyte_p;
84d60297
RS
4708
4709 deletion = Qnil;
4710 saved_coding_symbol = Qnil;
d46c5b12 4711
83fa074f 4712 if (from < PT && PT < to)
e133c8fa
KH
4713 {
4714 TEMP_SET_PT_BOTH (from, from_byte);
4715 orig_point = from;
4716 }
83fa074f 4717
6e44253b 4718 if (replace)
d46c5b12 4719 {
fb88bf2d 4720 int saved_from = from;
e077cc80 4721 int saved_inhibit_modification_hooks;
fb88bf2d 4722
d46c5b12 4723 prepare_to_modify_buffer (from, to, &from);
fb88bf2d
KH
4724 if (saved_from != from)
4725 {
4726 to = from + len;
b73bfc1c 4727 from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
fb88bf2d
KH
4728 len_byte = to_byte - from_byte;
4729 }
e077cc80
KH
4730
4731 /* The code conversion routine can not preserve text properties
4732 for now. So, we must remove all text properties in the
4733 region. Here, we must suppress all modification hooks. */
4734 saved_inhibit_modification_hooks = inhibit_modification_hooks;
4735 inhibit_modification_hooks = 1;
4736 Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
4737 inhibit_modification_hooks = saved_inhibit_modification_hooks;
d46c5b12 4738 }
d46c5b12
KH
4739
4740 if (! encodep && CODING_REQUIRE_DETECTION (coding))
4741 {
12410ef1 4742 /* We must detect encoding of text and eol format. */
d46c5b12
KH
4743
4744 if (from < GPT && to > GPT)
4745 move_gap_both (from, from_byte);
4746 if (coding->type == coding_type_undecided)
4747 {
fb88bf2d 4748 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
d46c5b12 4749 if (coding->type == coding_type_undecided)
12410ef1
KH
4750 /* It seems that the text contains only ASCII, but we
4751 should not leave it undecided because the deeper
4752 decoding routine (decode_coding) tries to detect the
4753 encodings again in vain. */
d46c5b12
KH
4754 coding->type = coding_type_emacs_mule;
4755 }
aaaf0b1e
KH
4756 if (coding->eol_type == CODING_EOL_UNDECIDED
4757 && coding->type != coding_type_ccl)
d46c5b12
KH
4758 {
4759 saved_coding_symbol = coding->symbol;
4760 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
4761 if (coding->eol_type == CODING_EOL_UNDECIDED)
4762 coding->eol_type = CODING_EOL_LF;
4763 /* We had better recover the original eol format if we
4764 encounter an inconsistent eol format while decoding. */
4765 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4766 }
4767 }
4768
d46c5b12
KH
4769 /* Now we convert the text. */
4770
4771 /* For encoding, we must process pre-write-conversion in advance. */
b73bfc1c
KH
4772 if (! inhibit_pre_post_conversion
4773 && encodep
d46c5b12
KH
4774 && SYMBOLP (coding->pre_write_conversion)
4775 && ! NILP (Ffboundp (coding->pre_write_conversion)))
4776 {
2b4f9037
KH
4777 /* The function in pre-write-conversion may put a new text in a
4778 new buffer. */
0007bdd0
KH
4779 struct buffer *prev = current_buffer;
4780 Lisp_Object new;
b843d1ae 4781 int count = specpdl_ptr - specpdl;
d46c5b12 4782
b843d1ae
KH
4783 record_unwind_protect (code_convert_region_unwind, Qnil);
4784 /* We should not call any more pre-write/post-read-conversion
4785 functions while this pre-write-conversion is running. */
4786 inhibit_pre_post_conversion = 1;
b39f748c
AS
4787 call2 (coding->pre_write_conversion,
4788 make_number (from), make_number (to));
b843d1ae
KH
4789 inhibit_pre_post_conversion = 0;
4790 /* Discard the unwind protect. */
4791 specpdl_ptr--;
4792
d46c5b12
KH
4793 if (current_buffer != prev)
4794 {
4795 len = ZV - BEGV;
0007bdd0 4796 new = Fcurrent_buffer ();
d46c5b12 4797 set_buffer_internal_1 (prev);
7dae4502 4798 del_range_2 (from, from_byte, to, to_byte, 0);
e133c8fa 4799 TEMP_SET_PT_BOTH (from, from_byte);
0007bdd0
KH
4800 insert_from_buffer (XBUFFER (new), 1, len, 0);
4801 Fkill_buffer (new);
e133c8fa
KH
4802 if (orig_point >= to)
4803 orig_point += len - orig_len;
4804 else if (orig_point > from)
4805 orig_point = from;
4806 orig_len = len;
d46c5b12 4807 to = from + len;
b73bfc1c
KH
4808 from_byte = CHAR_TO_BYTE (from);
4809 to_byte = CHAR_TO_BYTE (to);
d46c5b12 4810 len_byte = to_byte - from_byte;
e133c8fa 4811 TEMP_SET_PT_BOTH (from, from_byte);
d46c5b12
KH
4812 }
4813 }
4814
12410ef1
KH
4815 if (replace)
4816 deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
4817
ec6d2bb8
KH
4818 if (coding->composing != COMPOSITION_DISABLED)
4819 {
4820 if (encodep)
4821 coding_save_composition (coding, from, to, Fcurrent_buffer ());
4822 else
4823 coding_allocate_composition_data (coding, from);
4824 }
fb88bf2d 4825
b73bfc1c 4826 /* Try to skip the leading and trailing ASCII characters. */
4956c225
KH
4827 if (coding->type != coding_type_ccl)
4828 {
4829 int from_byte_orig = from_byte, to_byte_orig = to_byte;
ec6d2bb8 4830
4956c225
KH
4831 if (from < GPT && GPT < to)
4832 move_gap_both (from, from_byte);
4833 SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
4834 if (from_byte == to_byte
4835 && (encodep || NILP (coding->post_read_conversion))
4836 && ! CODING_REQUIRE_FLUSHING (coding))
4837 {
4838 coding->produced = len_byte;
4839 coding->produced_char = len;
4840 if (!replace)
4841 /* We must record and adjust for this new text now. */
4842 adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);
4843 return 0;
4844 }
4845
4846 head_skip = from_byte - from_byte_orig;
4847 tail_skip = to_byte_orig - to_byte;
4848 total_skip = head_skip + tail_skip;
4849 from += head_skip;
4850 to -= tail_skip;
4851 len -= total_skip; len_byte -= total_skip;
4852 }
d46c5b12 4853
fb88bf2d
KH
4854 /* For conversion, we must put the gap before the text in addition to
4855 making the gap larger for efficient decoding. The required gap
4856 size starts from 2000 which is the magic number used in make_gap.
4857 But, after one batch of conversion, it will be incremented if we
4858 find that it is not enough. */
d46c5b12
KH
4859 require = 2000;
4860
4861 if (GAP_SIZE < require)
4862 make_gap (require - GAP_SIZE);
4863 move_gap_both (from, from_byte);
4864
d46c5b12 4865 inserted = inserted_byte = 0;
fb88bf2d
KH
4866
4867 GAP_SIZE += len_byte;
4868 ZV -= len;
4869 Z -= len;
4870 ZV_BYTE -= len_byte;
4871 Z_BYTE -= len_byte;
4872
d9f9a1bc
GM
4873 if (GPT - BEG < BEG_UNCHANGED)
4874 BEG_UNCHANGED = GPT - BEG;
4875 if (Z - GPT < END_UNCHANGED)
4876 END_UNCHANGED = Z - GPT;
f2558efd 4877
b73bfc1c
KH
4878 if (!encodep && coding->src_multibyte)
4879 {
4880 /* Decoding routines expect that the source text is unibyte.
4881 We must convert 8-bit characters of multibyte form to
4882 unibyte. */
4883 int len_byte_orig = len_byte;
4884 len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
4885 if (len_byte < len_byte_orig)
4886 safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
4887 len_byte);
4888 coding->src_multibyte = 0;
4889 }
4890
d46c5b12
KH
4891 for (;;)
4892 {
fb88bf2d 4893 int result;
d46c5b12 4894
ec6d2bb8 4895 /* The buffer memory is now:
b73bfc1c
KH
4896 +--------+converted-text+---------+-------original-text-------+---+
4897 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
4898 |<---------------------- GAP ----------------------->| */
ec6d2bb8
KH
4899 src = GAP_END_ADDR - len_byte;
4900 dst = GPT_ADDR + inserted_byte;
4901
d46c5b12 4902 if (encodep)
fb88bf2d 4903 result = encode_coding (coding, src, dst, len_byte, 0);
d46c5b12 4904 else
fb88bf2d 4905 result = decode_coding (coding, src, dst, len_byte, 0);
ec6d2bb8
KH
4906
4907 /* The buffer memory is now:
b73bfc1c
KH
4908 +--------+-------converted-text----+--+------original-text----+---+
4909 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
4910 |<---------------------- GAP ----------------------->| */
ec6d2bb8 4911
d46c5b12
KH
4912 inserted += coding->produced_char;
4913 inserted_byte += coding->produced;
d46c5b12 4914 len_byte -= coding->consumed;
ec6d2bb8
KH
4915
4916 if (result == CODING_FINISH_INSUFFICIENT_CMP)
4917 {
4918 coding_allocate_composition_data (coding, from + inserted);
4919 continue;
4920 }
4921
fb88bf2d 4922 src += coding->consumed;
3636f7a3 4923 dst += coding->produced;
d46c5b12 4924
9864ebce
KH
4925 if (result == CODING_FINISH_NORMAL)
4926 {
4927 src += len_byte;
4928 break;
4929 }
d46c5b12
KH
4930 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
4931 {
fb88bf2d 4932 unsigned char *pend = dst, *p = pend - inserted_byte;
38edf7d4 4933 Lisp_Object eol_type;
d46c5b12
KH
4934
4935 /* Encode LFs back to the original eol format (CR or CRLF). */
4936 if (coding->eol_type == CODING_EOL_CR)
4937 {
4938 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
4939 }
4940 else
4941 {
d46c5b12
KH
4942 int count = 0;
4943
fb88bf2d
KH
4944 while (p < pend) if (*p++ == '\n') count++;
4945 if (src - dst < count)
d46c5b12 4946 {
38edf7d4 4947 /* We don't have sufficient room for encoding LFs
fb88bf2d
KH
4948 back to CRLF. We must record converted and
4949 not-yet-converted text back to the buffer
4950 content, enlarge the gap, then record them out of
4951 the buffer contents again. */
4952 int add = len_byte + inserted_byte;
4953
4954 GAP_SIZE -= add;
4955 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
4956 GPT += inserted_byte; GPT_BYTE += inserted_byte;
4957 make_gap (count - GAP_SIZE);
4958 GAP_SIZE += add;
4959 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
4960 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
4961 /* Don't forget to update SRC, DST, and PEND. */
4962 src = GAP_END_ADDR - len_byte;
4963 dst = GPT_ADDR + inserted_byte;
4964 pend = dst;
d46c5b12 4965 }
d46c5b12
KH
4966 inserted += count;
4967 inserted_byte += count;
fb88bf2d
KH
4968 coding->produced += count;
4969 p = dst = pend + count;
4970 while (count)
4971 {
4972 *--p = *--pend;
4973 if (*p == '\n') count--, *--p = '\r';
4974 }
d46c5b12
KH
4975 }
4976
4977 /* Suppress eol-format conversion in the further conversion. */
4978 coding->eol_type = CODING_EOL_LF;
4979
38edf7d4
KH
4980 /* Set the coding system symbol to that for Unix-like EOL. */
4981 eol_type = Fget (saved_coding_symbol, Qeol_type);
4982 if (VECTORP (eol_type)
4983 && XVECTOR (eol_type)->size == 3
4984 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
4985 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
4986 else
4987 coding->symbol = saved_coding_symbol;
fb88bf2d
KH
4988
4989 continue;
d46c5b12
KH
4990 }
4991 if (len_byte <= 0)
944bd420
KH
4992 {
4993 if (coding->type != coding_type_ccl
4994 || coding->mode & CODING_MODE_LAST_BLOCK)
4995 break;
4996 coding->mode |= CODING_MODE_LAST_BLOCK;
4997 continue;
4998 }
d46c5b12
KH
4999 if (result == CODING_FINISH_INSUFFICIENT_SRC)
5000 {
5001 /* The source text ends in invalid codes. Let's just
5002 make them valid buffer contents, and finish conversion. */
fb88bf2d 5003 inserted += len_byte;
d46c5b12 5004 inserted_byte += len_byte;
fb88bf2d 5005 while (len_byte--)
ee59c65f 5006 *dst++ = *src++;
d46c5b12
KH
5007 break;
5008 }
9864ebce
KH
5009 if (result == CODING_FINISH_INTERRUPT)
5010 {
5011 /* The conversion procedure was interrupted by a user. */
9864ebce
KH
5012 break;
5013 }
5014 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5015 if (coding->consumed < 1)
5016 {
5017 /* It's quite strange to require more memory without
5018 consuming any bytes. Perhaps a CCL program bug. */
9864ebce
KH
5019 break;
5020 }
fb88bf2d
KH
5021 if (first)
5022 {
5023 /* We have just done the first batch of conversion which was
5024 stopped because of insufficient gap. Let's reconsider the
5025 required gap size (i.e. SRC - DST) now.
5026
5027 We have converted ORIG bytes (== coding->consumed) into
5028 NEW bytes (coding->produced). To convert the remaining
5029 LEN_BYTE bytes, we may need REQUIRE bytes of gap, where:
5030 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5031 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5032 Here, we are sure that NEW >= ORIG. */
6e44253b
KH
5033 float ratio = coding->produced - coding->consumed;
5034 ratio /= coding->consumed;
5035 require = len_byte * ratio;
fb88bf2d
KH
5036 first = 0;
5037 }
5038 if ((src - dst) < (require + 2000))
5039 {
5040 /* See the comment above the previous call of make_gap. */
5041 int add = len_byte + inserted_byte;
5042
5043 GAP_SIZE -= add;
5044 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5045 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5046 make_gap (require + 2000);
5047 GAP_SIZE += add;
5048 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5049 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
fb88bf2d 5050 }
d46c5b12 5051 }
fb88bf2d
KH
5052 if (src - dst > 0) *dst = 0; /* Put an anchor. */
5053
b73bfc1c
KH
5054 if (encodep && coding->dst_multibyte)
5055 {
5056 /* The output is unibyte. We must convert 8-bit characters to
5057 multibyte form. */
5058 if (inserted_byte * 2 > GAP_SIZE)
5059 {
5060 GAP_SIZE -= inserted_byte;
5061 ZV += inserted_byte; Z += inserted_byte;
5062 ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
5063 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5064 make_gap (inserted_byte - GAP_SIZE);
5065 GAP_SIZE += inserted_byte;
5066 ZV -= inserted_byte; Z -= inserted_byte;
5067 ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
5068 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5069 }
5070 inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
5071 }
7553d0e1 5072
12410ef1
KH
5073 /* If we have shrunk the conversion area, adjust it now. */
5074 if (total_skip > 0)
5075 {
5076 if (tail_skip > 0)
5077 safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
5078 inserted += total_skip; inserted_byte += total_skip;
5079 GAP_SIZE += total_skip;
5080 GPT -= head_skip; GPT_BYTE -= head_skip;
5081 ZV -= total_skip; ZV_BYTE -= total_skip;
5082 Z -= total_skip; Z_BYTE -= total_skip;
5083 from -= head_skip; from_byte -= head_skip;
5084 to += tail_skip; to_byte += tail_skip;
5085 }
5086
6abb9bd9 5087 prev_Z = Z;
12410ef1 5088 adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
6abb9bd9 5089 inserted = Z - prev_Z;
4ed46869 5090
ec6d2bb8
KH
5091 if (!encodep && coding->cmp_data && coding->cmp_data->used)
5092 coding_restore_composition (coding, Fcurrent_buffer ());
5093 coding_free_composition_data (coding);
5094
b73bfc1c
KH
5095 if (! inhibit_pre_post_conversion
5096 && ! encodep && ! NILP (coding->post_read_conversion))
d46c5b12 5097 {
2b4f9037 5098 Lisp_Object val;
b843d1ae 5099 int count = specpdl_ptr - specpdl;
4ed46869 5100
e133c8fa
KH
5101 if (from != PT)
5102 TEMP_SET_PT_BOTH (from, from_byte);
6abb9bd9 5103 prev_Z = Z;
b843d1ae
KH
5104 record_unwind_protect (code_convert_region_unwind, Qnil);
5105 /* We should not call any more pre-write/post-read-conversion
5106 functions while this post-read-conversion is running. */
5107 inhibit_pre_post_conversion = 1;
2b4f9037 5108 val = call1 (coding->post_read_conversion, make_number (inserted));
b843d1ae
KH
5109 inhibit_pre_post_conversion = 0;
5110 /* Discard the unwind protect. */
5111 specpdl_ptr--;
6abb9bd9 5112 CHECK_NUMBER (val, 0);
944bd420 5113 inserted += Z - prev_Z;
e133c8fa
KH
5114 }
5115
5116 if (orig_point >= from)
5117 {
5118 if (orig_point >= from + orig_len)
5119 orig_point += inserted - orig_len;
5120 else
5121 orig_point = from;
5122 TEMP_SET_PT (orig_point);
d46c5b12 5123 }
4ed46869 5124
ec6d2bb8
KH
5125 if (replace)
5126 {
5127 signal_after_change (from, to - from, inserted);
e19539f1 5128 update_compositions (from, from + inserted, CHECK_BORDER);
ec6d2bb8 5129 }
2b4f9037 5130
fb88bf2d 5131 {
12410ef1
KH
5132 coding->consumed = to_byte - from_byte;
5133 coding->consumed_char = to - from;
5134 coding->produced = inserted_byte;
5135 coding->produced_char = inserted;
fb88bf2d 5136 }
7553d0e1 5137
fb88bf2d 5138 return 0;
d46c5b12
KH
5139}
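/* Editorial sketch, not part of coding.c: the in-place LF to CRLF
   expansion used in the CODING_FINISH_INCONSISTENT_EOL branch of
   code_convert_region, shown on a plain byte array. The function name
   expand_lf_to_crlf and the CAPACITY parameter are assumptions of
   this example. Where the real code enlarges the gap with make_gap
   when room is short, this sketch simply reports failure. The copy
   runs backwards from the new end, so no byte is overwritten before
   it has been moved. */

```c
#include <assert.h>
#include <stddef.h>

/* Expand every LF in BUF[0..*LEN) to CRLF in place.  CAPACITY is the
   total size of BUF.  Return 1 and update *LEN on success; return 0
   and leave BUF untouched if there is not enough room.  */
static int
expand_lf_to_crlf (unsigned char *buf, size_t *len, size_t capacity)
{
  size_t count = 0, i;
  unsigned char *pend, *p;

  assert (*len <= capacity);

  /* First pass: count the LFs so we know how far the text grows.  */
  for (i = 0; i < *len; i++)
    if (buf[i] == '\n')
      count++;
  if (capacity - *len < count)
    return 0;

  /* Second pass: copy from the old end (PEND) to the new end (P).
     Each LF copied gets a CR written just before it; once COUNT
     reaches zero, P and PEND coincide and the rest is in place.  */
  pend = buf + *len;
  p = pend + count;
  *len += count;
  while (count > 0)
    {
      *--p = *--pend;
      if (*p == '\n')
        {
          *--p = '\r';
          count--;
        }
    }
  return 1;
}
```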
5140
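/* Editorial sketch, not part of coding.c: the gap estimate from the
   comment inside code_convert_region, REQUIRE = LEN_BYTE * (NEW - ORIG)
   / ORIG, written as a standalone function. The name
   estimate_required_gap is hypothetical, and double is used here
   where the original uses float. The original only reaches this
   computation after checking that at least one byte was consumed, so
   the sketch asserts the same precondition. */

```c
#include <assert.h>

/* Estimate how many extra bytes are needed to convert REMAINING
   source bytes, given that the first batch turned CONSUMED bytes
   into PRODUCED bytes.  */
static int
estimate_required_gap (int remaining, int consumed, int produced)
{
  double ratio;

  assert (consumed > 0);
  /* Growth per source byte observed in the first batch.  */
  ratio = (double) (produced - consumed) / consumed;
  return (int) (remaining * ratio);
}
```

/* For example, if the first 100 bytes grew to 150, the remaining
   1000 bytes are expected to need about 500 bytes more than they
   occupy now. */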
5141Lisp_Object
b73bfc1c
KH
5142run_pre_post_conversion_on_str (str, coding, encodep)
5143 Lisp_Object str;
5144 struct coding_system *coding;
5145 int encodep;
5146{
5147 int count = specpdl_ptr - specpdl;
5148 struct gcpro gcpro1;
5149 struct buffer *prev = current_buffer;
5150 int multibyte = STRING_MULTIBYTE (str);
5151
5152 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
5153 record_unwind_protect (code_convert_region_unwind, Qnil);
5154 GCPRO1 (str);
5155 temp_output_buffer_setup (" *code-converting-work*");
5156 set_buffer_internal (XBUFFER (Vstandard_output));
5157 /* We must insert the contents of STR as is without
5158 unibyte<->multibyte conversion. For that, we adjust the
5159 multibyteness of the working buffer to that of STR. */
5160 Ferase_buffer ();
5161 current_buffer->enable_multibyte_characters = multibyte ? Qt : Qnil;
5162 insert_from_string (str, 0, 0,
5163 XSTRING (str)->size, STRING_BYTES (XSTRING (str)), 0);
5164 UNGCPRO;
5165 inhibit_pre_post_conversion = 1;
5166 if (encodep)
5167 call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
5168 else
6bac5b12
KH
5169 {
5170 TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
5171 call1 (coding->post_read_conversion, make_number (Z - BEG));
5172 }
b73bfc1c
KH
5173 inhibit_pre_post_conversion = 0;
5174 str = make_buffer_string (BEG, Z, 0);
5175 return unbind_to (count, str);
5176}
5177
5178Lisp_Object
5179decode_coding_string (str, coding, nocopy)
d46c5b12 5180 Lisp_Object str;
4ed46869 5181 struct coding_system *coding;
b73bfc1c 5182 int nocopy;
4ed46869 5183{
d46c5b12
KH
5184 int len;
5185 char *buf;
b73bfc1c 5186 int from, to, to_byte;
d46c5b12 5187 struct gcpro gcpro1;
84d60297 5188 Lisp_Object saved_coding_symbol;
d46c5b12 5189 int result;
4ed46869 5190
b73bfc1c
KH
5191 from = 0;
5192 to = XSTRING (str)->size;
5193 to_byte = STRING_BYTES (XSTRING (str));
4ed46869 5194
b73bfc1c
KH
5195 saved_coding_symbol = Qnil;
5196 if (CODING_REQUIRE_DETECTION (coding))
d46c5b12
KH
5197 {
5198 /* See the comments in code_convert_region. */
5199 if (coding->type == coding_type_undecided)
5200 {
5201 detect_coding (coding, XSTRING (str)->data, to_byte);
5202 if (coding->type == coding_type_undecided)
5203 coding->type = coding_type_emacs_mule;
5204 }
aaaf0b1e
KH
5205 if (coding->eol_type == CODING_EOL_UNDECIDED
5206 && coding->type != coding_type_ccl)
d46c5b12
KH
5207 {
5208 saved_coding_symbol = coding->symbol;
5209 detect_eol (coding, XSTRING (str)->data, to_byte);
5210 if (coding->eol_type == CODING_EOL_UNDECIDED)
5211 coding->eol_type = CODING_EOL_LF;
5212 /* We had better recover the original eol format if we
5213 encounter an inconsistent eol format while decoding. */
5214 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5215 }
5216 }
4ed46869 5217
b73bfc1c 5218 if (! CODING_REQUIRE_DECODING (coding))
ec6d2bb8 5219 {
b73bfc1c
KH
5220 if (!STRING_MULTIBYTE (str))
5221 {
5222 str = Fstring_as_multibyte (str);
5223 nocopy = 1;
5224 }
5225 return (nocopy ? str : Fcopy_sequence (str));
ec6d2bb8
KH
5226 }
5227
b73bfc1c 5228 if (STRING_MULTIBYTE (str))
d46c5b12 5229 {
b73bfc1c
KH
5230 /* Decoding routines expect the source text to be unibyte. */
5231 str = Fstring_as_unibyte (str);
86af83a9 5232 to_byte = STRING_BYTES (XSTRING (str));
b73bfc1c
KH
5233 nocopy = 1;
5234 coding->src_multibyte = 0;
5235 }
5236 coding->dst_multibyte = 1;
ec6d2bb8 5237
b73bfc1c
KH
5238 if (coding->composing != COMPOSITION_DISABLED)
5239 coding_allocate_composition_data (coding, from);
ec6d2bb8 5240
b73bfc1c 5241 /* Try to skip the leading and trailing ASCII characters. */
4956c225
KH
5242 if (coding->type != coding_type_ccl)
5243 {
5244 int from_orig = from;
4ed46869 5245
4956c225
KH
5246 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5247 0);
5248 if (from == to_byte)
5249 return (nocopy ? str : Fcopy_sequence (str));
5250 }
b73bfc1c
KH
5251
5252 len = decoding_buffer_size (coding, to_byte - from);
fc932ac6 5253 len += from + STRING_BYTES (XSTRING (str)) - to_byte;
d46c5b12
KH
5254 GCPRO1 (str);
5255 buf = get_conversion_buffer (len);
5256 UNGCPRO;
4ed46869 5257
d46c5b12
KH
5258 if (from > 0)
5259 bcopy (XSTRING (str)->data, buf, from);
b73bfc1c
KH
5260 result = decode_coding (coding, XSTRING (str)->data + from,
5261 buf + from, to_byte - from, len);
5262 if (result == CODING_FINISH_INCONSISTENT_EOL)
4ed46869 5263 {
ec6d2bb8 5264 /* We simply try to decode the whole string again but without
d46c5b12
KH
5265 eol-conversion this time. */
5266 coding->eol_type = CODING_EOL_LF;
5267 coding->symbol = saved_coding_symbol;
ec6d2bb8 5268 coding_free_composition_data (coding);
b73bfc1c 5269 return decode_coding_string (str, coding, nocopy);
4ed46869 5270 }
d46c5b12
KH
5271
5272 bcopy (XSTRING (str)->data + to_byte, buf + from + coding->produced,
fc932ac6 5273 STRING_BYTES (XSTRING (str)) - to_byte);
d46c5b12 5274
fc932ac6 5275 len = from + STRING_BYTES (XSTRING (str)) - to_byte;
b73bfc1c
KH
5276 str = make_multibyte_string (buf, len + coding->produced_char,
5277 len + coding->produced);
5278
5279 if (coding->cmp_data && coding->cmp_data->used)
5280 coding_restore_composition (coding, str);
5281 coding_free_composition_data (coding);
5282
5283 if (SYMBOLP (coding->post_read_conversion)
5284 && !NILP (Ffboundp (coding->post_read_conversion)))
6bac5b12 5285 str = run_pre_post_conversion_on_str (str, coding, 0);
b73bfc1c
KH
5286
5287 return str;
5288}
5289
5290Lisp_Object
5291encode_coding_string (str, coding, nocopy)
5292 Lisp_Object str;
5293 struct coding_system *coding;
5294 int nocopy;
5295{
5296 int len;
5297 char *buf;
5298 int from, to, to_byte;
5299 struct gcpro gcpro1;
5300 Lisp_Object saved_coding_symbol;
5301 int result;
5302
5303 if (SYMBOLP (coding->pre_write_conversion)
5304 && !NILP (Ffboundp (coding->pre_write_conversion)))
6bac5b12 5305 str = run_pre_post_conversion_on_str (str, coding, 1);
b73bfc1c
KH
5306
5307 from = 0;
5308 to = XSTRING (str)->size;
5309 to_byte = STRING_BYTES (XSTRING (str));
5310
5311 saved_coding_symbol = Qnil;
5312 if (! CODING_REQUIRE_ENCODING (coding))
826bfb8b 5313 {
b73bfc1c
KH
5314 if (STRING_MULTIBYTE (str))
5315 {
5316 str = Fstring_as_unibyte (str);
5317 nocopy = 1;
5318 }
5319 return (nocopy ? str : Fcopy_sequence (str));
826bfb8b
KH
5320 }
5321
b73bfc1c
KH
5322 /* Encoding routines determine the multibyteness of the source text
5323 by coding->src_multibyte. */
5324 coding->src_multibyte = STRING_MULTIBYTE (str);
5325 coding->dst_multibyte = 0;
5326
5327 if (coding->composing != COMPOSITION_DISABLED)
5328 coding_save_composition (coding, from, to, str);
ec6d2bb8 5329
b73bfc1c 5330 /* Try to skip the leading and trailing ASCII characters. */
4956c225
KH
5331 if (coding->type != coding_type_ccl)
5332 {
5333 int from_orig = from;
b73bfc1c 5334
4956c225
KH
5335 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5336 1);
5337 if (from == to_byte)
5338 return (nocopy ? str : Fcopy_sequence (str));
5339 }
b73bfc1c
KH
5340
5341 len = encoding_buffer_size (coding, to_byte - from);
5342 len += from + STRING_BYTES (XSTRING (str)) - to_byte;
5343 GCPRO1 (str);
5344 buf = get_conversion_buffer (len);
5345 UNGCPRO;
5346
5347 if (from > 0)
5348 bcopy (XSTRING (str)->data, buf, from);
5349 result = encode_coding (coding, XSTRING (str)->data + from,
5350 buf + from, to_byte - from, len);
5351 bcopy (XSTRING (str)->data + to_byte, buf + from + coding->produced,
5352 STRING_BYTES (XSTRING (str)) - to_byte);
5353
5354 len = from + STRING_BYTES (XSTRING (str)) - to_byte;
5355 str = make_unibyte_string (buf, len + coding->produced);
ec6d2bb8 5356 coding_free_composition_data (coding);
b73bfc1c 5357
d46c5b12 5358 return str;
4ed46869
KH
5359}
5360
5361\f
5362#ifdef emacs
1397dc18 5363/*** 8. Emacs Lisp library functions ***/
4ed46869 5364
4ed46869
KH
5365DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
5366 "Return t if OBJECT is nil or a coding-system.\n\
3a73fa5d
RS
5367See the documentation of `make-coding-system' for information\n\
5368about coding-system objects.")
4ed46869
KH
5369 (obj)
5370 Lisp_Object obj;
5371{
4608c386
KH
5372 if (NILP (obj))
5373 return Qt;
5374 if (!SYMBOLP (obj))
5375 return Qnil;
5376 /* Get coding-spec vector for OBJ. */
5377 obj = Fget (obj, Qcoding_system);
5378 return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
5379 ? Qt : Qnil);
4ed46869
KH
5380}
5381
9d991de8
RS
5382DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
5383 Sread_non_nil_coding_system, 1, 1, 0,
e0e989f6 5384 "Read a coding system from the minibuffer, prompting with string PROMPT.")
4ed46869
KH
5385 (prompt)
5386 Lisp_Object prompt;
5387{
e0e989f6 5388 Lisp_Object val;
9d991de8
RS
5389 do
5390 {
4608c386
KH
5391 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
5392 Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
9d991de8
RS
5393 }
5394 while (XSTRING (val)->size == 0);
e0e989f6 5395 return (Fintern (val, Qnil));
4ed46869
KH
5396}
5397
9b787f3e
RS
5398DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
5399 "Read a coding system from the minibuffer, prompting with string PROMPT.\n\
5400If the user enters null input, return second argument DEFAULT-CODING-SYSTEM.")
5401 (prompt, default_coding_system)
5402 Lisp_Object prompt, default_coding_system;
4ed46869 5403{
f44d27ce 5404 Lisp_Object val;
9b787f3e
RS
5405 if (SYMBOLP (default_coding_system))
5406 XSETSTRING (default_coding_system, XSYMBOL (default_coding_system)->name);
4608c386 5407 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
9b787f3e
RS
5408 Qt, Qnil, Qcoding_system_history,
5409 default_coding_system, Qnil);
e0e989f6 5410 return (XSTRING (val)->size == 0 ? Qnil : Fintern (val, Qnil));
4ed46869
KH
5411}
5412
5413DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
5414 1, 1, 0,
5415 "Check validity of CODING-SYSTEM.\n\
3a73fa5d
RS
5416If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.\n\
5417It is valid if it is a symbol with a non-nil `coding-system' property.\n\
4ed46869
KH
5418The value of property should be a vector of length 5.")
5419 (coding_system)
5420 Lisp_Object coding_system;
5421{
5422 CHECK_SYMBOL (coding_system, 0);
5423 if (!NILP (Fcoding_system_p (coding_system)))
5424 return coding_system;
5425 while (1)
02ba4723 5426 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
4ed46869 5427}
3a73fa5d 5428\f
d46c5b12
KH
5429Lisp_Object
5430detect_coding_system (src, src_bytes, highest)
5431 unsigned char *src;
5432 int src_bytes, highest;
4ed46869
KH
5433{
5434 int coding_mask, eol_type;
d46c5b12
KH
5435 Lisp_Object val, tmp;
5436 int dummy;
4ed46869 5437
d46c5b12
KH
5438 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy);
5439 eol_type = detect_eol_type (src, src_bytes, &dummy);
5440 if (eol_type == CODING_EOL_INCONSISTENT)
25b02698 5441 eol_type = CODING_EOL_UNDECIDED;
4ed46869 5442
d46c5b12 5443 if (!coding_mask)
4ed46869 5444 {
27901516 5445 val = Qundecided;
d46c5b12 5446 if (eol_type != CODING_EOL_UNDECIDED)
4ed46869 5447 {
f44d27ce
RS
5448 Lisp_Object val2;
5449 val2 = Fget (Qundecided, Qeol_type);
4ed46869
KH
5450 if (VECTORP (val2))
5451 val = XVECTOR (val2)->contents[eol_type];
5452 }
80e803b4 5453 return (highest ? val : Fcons (val, Qnil));
4ed46869 5454 }
4ed46869 5455
d46c5b12
KH
5456 /* First, gather possible coding systems in VAL. */
5457 val = Qnil;
fa42c37f 5458 for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5459 {
fa42c37f
KH
5460 Lisp_Object category_val, category_index;
5461
5462 category_index = Fget (XCAR (tmp), Qcoding_category_index);
5463 category_val = Fsymbol_value (XCAR (tmp));
5464 if (!NILP (category_val)
5465 && NATNUMP (category_index)
5466 && (coding_mask & (1 << XFASTINT (category_index))))
4ed46869 5467 {
fa42c37f 5468 val = Fcons (category_val, val);
d46c5b12
KH
5469 if (highest)
5470 break;
4ed46869
KH
5471 }
5472 }
d46c5b12
KH
5473 if (!highest)
5474 val = Fnreverse (val);
4ed46869 5475
65059037 5476 /* Then, replace the elements with subsidiary coding systems. */
fa42c37f 5477 for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5478 {
65059037
RS
5479 if (eol_type != CODING_EOL_UNDECIDED
5480 && eol_type != CODING_EOL_INCONSISTENT)
4ed46869 5481 {
d46c5b12 5482 Lisp_Object eol;
03699b14 5483 eol = Fget (XCAR (tmp), Qeol_type);
d46c5b12 5484 if (VECTORP (eol))
03699b14 5485 XCAR (tmp) = XVECTOR (eol)->contents[eol_type];
4ed46869
KH
5486 }
5487 }
03699b14 5488 return (highest ? XCAR (val) : val);
d46c5b12 5489}
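/* Illustration only, not part of coding.c: a minimal stand-alone sketch
   of the filtering step in detect_coding_system above.  It assumes the
   priority list is given as a plain array of category indices (highest
   priority first) and that the detected categories arrive as a bit
   mask, one bit per index.  The name sketch_collect_categories and its
   parameters are hypothetical.  */

```c
#include <stddef.h>

/* Walk PRIORITY (category indices, highest priority first) and copy
   into OUT every index whose bit is set in MASK.  If HIGHEST is
   nonzero, stop after the first match.  Return the number of indices
   written to OUT, which must have room for N_PRIORITY entries.  */
size_t
sketch_collect_categories (unsigned mask, const int *priority,
                           size_t n_priority, int highest, int *out)
{
  size_t i, n = 0;

  for (i = 0; i < n_priority; i++)
    if (mask & (1u << priority[i]))
      {
        out[n++] = priority[i];
        if (highest)
          break;
      }
  return n;
}
```

/* Appending to OUT keeps the matches in priority order directly, which
   is what the Lisp version obtains by consing onto the front of VAL and
   calling Fnreverse afterwards.  */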
4ed46869 5490
d46c5b12
KH
5491DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
5492 2, 3, 0,
5493 "Detect coding system of the text in the region between START and END.\n\
5494Return a list of possible coding systems ordered by priority.\n\
5495\n\
80e803b4
KH
5496If only ASCII characters are found, it returns a list of a single element\n\
5497`undecided' or its subsidiary coding system according to a detected\n\
5498end-of-line format.\n\
d46c5b12
KH
5499\n\
5500If optional argument HIGHEST is non-nil, return the coding system of\n\
5501highest priority.")
5502 (start, end, highest)
5503 Lisp_Object start, end, highest;
5504{
5505 int from, to;
5506 int from_byte, to_byte;
6289dd10 5507
d46c5b12
KH
5508 CHECK_NUMBER_COERCE_MARKER (start, 0);
5509 CHECK_NUMBER_COERCE_MARKER (end, 1);
4ed46869 5510
d46c5b12
KH
5511 validate_region (&start, &end);
5512 from = XINT (start), to = XINT (end);
5513 from_byte = CHAR_TO_BYTE (from);
5514 to_byte = CHAR_TO_BYTE (to);
6289dd10 5515
d46c5b12
KH
5516 if (from < GPT && to >= GPT)
5517 move_gap_both (to, to_byte);
4ed46869 5518
d46c5b12
KH
5519 return detect_coding_system (BYTE_POS_ADDR (from_byte),
5520 to_byte - from_byte,
5521 !NILP (highest));
5522}
6289dd10 5523
d46c5b12
KH
5524DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
5525 1, 2, 0,
5526 "Detect coding system of the text in STRING.\n\
5527Return a list of possible coding systems ordered by priority.\n\
5528\n\
80e803b4
KH
5529If only ASCII characters are found, it returns a list of a single element\n\
5530`undecided' or its subsidiary coding system according to a detected\n\
5531end-of-line format.\n\
d46c5b12
KH
5532\n\
5533If optional argument HIGHEST is non-nil, return the coding system of\n\
5534highest priority.")
5535 (string, highest)
5536 Lisp_Object string, highest;
5537{
5538 CHECK_STRING (string, 0);
4ed46869 5539
d46c5b12 5540 return detect_coding_system (XSTRING (string)->data,
fc932ac6 5541 STRING_BYTES (XSTRING (string)),
d46c5b12 5542 !NILP (highest));
4ed46869
KH
5543}
5544
4031e2bf
KH
5545Lisp_Object
5546code_convert_region1 (start, end, coding_system, encodep)
d46c5b12 5547 Lisp_Object start, end, coding_system;
4031e2bf 5548 int encodep;
3a73fa5d
RS
5549{
5550 struct coding_system coding;
4031e2bf 5551 int from, to, len;
3a73fa5d 5552
d46c5b12
KH
5553 CHECK_NUMBER_COERCE_MARKER (start, 0);
5554 CHECK_NUMBER_COERCE_MARKER (end, 1);
3a73fa5d
RS
5555 CHECK_SYMBOL (coding_system, 2);
5556
d46c5b12
KH
5557 validate_region (&start, &end);
5558 from = XFASTINT (start);
5559 to = XFASTINT (end);
5560
3a73fa5d 5561 if (NILP (coding_system))
d46c5b12
KH
5562 return make_number (to - from);
5563
3a73fa5d 5564 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d46c5b12 5565 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3a73fa5d 5566
d46c5b12 5567 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5568 coding.src_multibyte = coding.dst_multibyte
5569 = !NILP (current_buffer->enable_multibyte_characters);
fb88bf2d
KH
5570 code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
5571 &coding, encodep, 1);
f072a3e8 5572 Vlast_coding_system_used = coding.symbol;
fb88bf2d 5573 return make_number (coding.produced_char);
4031e2bf
KH
5574}
5575
5576DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
5577 3, 3, "r\nzCoding system: ",
5578 "Decode the current region from the specified coding system.\n\
5579When called from a program, takes three arguments:\n\
5580START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5581This function sets `last-coding-system-used' to the precise coding system\n\
5582used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5583not fully specified.)\n\
5584It returns the length of the decoded text.")
4031e2bf
KH
5585 (start, end, coding_system)
5586 Lisp_Object start, end, coding_system;
5587{
5588 return code_convert_region1 (start, end, coding_system, 0);
3a73fa5d
RS
5589}
5590
5591DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
5592 3, 3, "r\nzCoding system: ",
d46c5b12 5593 "Encode the current region into the specified coding system.\n\
3a73fa5d 5594When called from a program, takes three arguments:\n\
d46c5b12 5595START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5596This function sets `last-coding-system-used' to the precise coding system\n\
5597used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5598not fully specified.)\n\
5599It returns the length of the encoded text.")
d46c5b12
KH
5600 (start, end, coding_system)
5601 Lisp_Object start, end, coding_system;
3a73fa5d 5602{
4031e2bf
KH
5603 return code_convert_region1 (start, end, coding_system, 1);
5604}
3a73fa5d 5605
4031e2bf
KH
5606Lisp_Object
5607code_convert_string1 (string, coding_system, nocopy, encodep)
5608 Lisp_Object string, coding_system, nocopy;
5609 int encodep;
5610{
5611 struct coding_system coding;
3a73fa5d 5612
4031e2bf
KH
5613 CHECK_STRING (string, 0);
5614 CHECK_SYMBOL (coding_system, 1);
4ed46869 5615
d46c5b12 5616 if (NILP (coding_system))
4031e2bf 5617 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4ed46869 5618
d46c5b12
KH
5619 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
5620 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
5f1cd180 5621
d46c5b12 5622 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5623 string = (encodep
5624 ? encode_coding_string (string, &coding, !NILP (nocopy))
5625 : decode_coding_string (string, &coding, !NILP (nocopy)));
f072a3e8 5626 Vlast_coding_system_used = coding.symbol;
ec6d2bb8
KH
5627
5628 return string;
4ed46869
KH
5629}
5630
4ed46869 5631DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
e0e989f6
KH
5632 2, 3, 0,
5633 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\
fe487a71 5634Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
5635if the decoding operation is trivial.\n\
5636This function sets `last-coding-system-used' to the precise coding system\n\
5637used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5638not fully specified.)")
e0e989f6
KH
5639 (string, coding_system, nocopy)
5640 Lisp_Object string, coding_system, nocopy;
4ed46869 5641{
f072a3e8 5642 return code_convert_string1 (string, coding_system, nocopy, 0);
4ed46869
KH
5643}
5644
5645DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
e0e989f6
KH
5646 2, 3, 0,
5647 "Encode STRING to CODING-SYSTEM, and return the result.\n\
fe487a71 5648Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
5649if the encoding operation is trivial.\n\
5650This function sets `last-coding-system-used' to the precise coding system\n\
5651used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5652not fully specified.)")
e0e989f6
KH
5653 (string, coding_system, nocopy)
5654 Lisp_Object string, coding_system, nocopy;
4ed46869 5655{
f072a3e8 5656 return code_convert_string1 (string, coding_system, nocopy, 1);
4ed46869 5657}
4031e2bf 5658
ecec61c1 5659/* Encode or decode STRING according to CODING_SYSTEM.
ec6d2bb8
KH
5660 Do not set Vlast_coding_system_used.
5661
5662 This function is called only from macros DECODE_FILE and
5663 ENCODE_FILE, thus we ignore character composition. */
ecec61c1
KH
5664
5665Lisp_Object
5666code_convert_string_norecord (string, coding_system, encodep)
5667 Lisp_Object string, coding_system;
5668 int encodep;
5669{
5670 struct coding_system coding;
5671
5672 CHECK_STRING (string, 0);
5673 CHECK_SYMBOL (coding_system, 1);
5674
5675 if (NILP (coding_system))
5676 return string;
5677
5678 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
5679 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
5680
ec6d2bb8 5681 coding.composing = COMPOSITION_DISABLED;
ecec61c1 5682 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5683 return (encodep
5684 ? encode_coding_string (string, &coding, 1)
5685 : decode_coding_string (string, &coding, 1));
ecec61c1 5686}
3a73fa5d 5687\f
4ed46869 5688DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
55ab7be3 5689 "Decode a Japanese character which has CODE in shift_jis encoding.\n\
4ed46869
KH
5690Return the corresponding character.")
5691 (code)
5692 Lisp_Object code;
5693{
5694 unsigned char c1, c2, s1, s2;
5695 Lisp_Object val;
5696
5697 CHECK_NUMBER (code, 0);
5698 s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
55ab7be3
KH
5699 if (s1 == 0)
5700 {
c28a9453
KH
5701 if (s2 < 0x80)
5702 XSETFASTINT (val, s2);
5703 else if (s2 >= 0xA0 && s2 <= 0xDF)
b73bfc1c 5704 XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
c28a9453 5705 else
9da8350f 5706 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3
KH
5707 }
5708 else
5709 {
5710 if ((s1 < 0x80 || s1 > 0x9F && s1 < 0xE0 || s1 > 0xEF)
5711 || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
9da8350f 5712 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3 5713 DECODE_SJIS (s1, s2, c1, c2);
b73bfc1c 5714 XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
55ab7be3 5715 }
4ed46869
KH
5716 return val;
5717}
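/* Illustration only, not part of coding.c: a stand-alone sketch of the
   range checks performed by decode-sjis-char above.  The DECODE_SJIS
   macro is not shown in this part of the file, so the byte arithmetic
   below is an assumption: it is the commonly used Shift JIS to
   JIS X 0208 mapping.  The sketch treats a single byte as katakana only
   when it lies in 0xA0..0xDF.  The function name and its return codes
   are hypothetical.  */

```c
/* Classify the Shift JIS code CODE.  Return 0 for ASCII, 1 for a
   single-byte JIS X 0201 katakana code, 2 for a two-byte JIS X 0208
   character (storing its 7-bit JIS bytes in *J1 and *J2), and -1 for
   an invalid code.  */
int
sketch_decode_sjis (unsigned code, unsigned *j1, unsigned *j2)
{
  unsigned s1 = code >> 8, s2 = code & 0xFF;

  if (code > 0xFFFF)
    return -1;
  if (s1 == 0)
    {
      if (s2 < 0x80)
        return 0;
      if (s2 >= 0xA0 && s2 <= 0xDF)
        return 1;
      return -1;
    }
  /* Same lead-byte and trail-byte ranges as the function above.  */
  if (s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF
      || s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
    return -1;
  if (s2 >= 0x9F)
    {
      /* Trail bytes 0x9F..0xFC select the even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* Trail bytes 0x40..0x9E (skipping 0x7F) select the odd row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
  return 2;
}
```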
5718
5719DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
55ab7be3
KH
5720 "Encode a Japanese character CHAR to shift_jis encoding.\n\
5721Return the corresponding code in SJIS.")
4ed46869
KH
5722 (ch)
5723 Lisp_Object ch;
5724{
bcf26d6a 5725 int charset, c1, c2, s1, s2;
4ed46869
KH
5726 Lisp_Object val;
5727
5728 CHECK_NUMBER (ch, 0);
5729 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
5730 if (charset == CHARSET_ASCII)
5731 {
5732 val = ch;
5733 }
5734 else if (charset == charset_jisx0208
5735 && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
4ed46869
KH
5736 {
5737 ENCODE_SJIS (c1, c2, s1, s2);
bcf26d6a 5738 XSETFASTINT (val, (s1 << 8) | s2);
4ed46869 5739 }
55ab7be3
KH
5740 else if (charset == charset_katakana_jisx0201
5741 && c1 > 0x20 && c2 < 0xE0)
5742 {
5743 XSETFASTINT (val, c1 | 0x80);
5744 }
4ed46869 5745 else
55ab7be3 5746 error ("Can't encode to shift_jis: %d", XFASTINT (ch));
4ed46869
KH
5747 return val;
5748}
5749
5750DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
c28a9453 5751 "Decode a Big5 character which has CODE in BIG5 coding system.\n\
4ed46869
KH
5752Return the corresponding character.")
5753 (code)
5754 Lisp_Object code;
5755{
5756 int charset;
5757 unsigned char b1, b2, c1, c2;
5758 Lisp_Object val;
5759
5760 CHECK_NUMBER (code, 0);
5761 b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
c28a9453
KH
5762 if (b1 == 0)
5763 {
5764 if (b2 >= 0x80)
9da8350f 5765 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453
KH
5766 val = code;
5767 }
5768 else
5769 {
5770 if ((b1 < 0xA1 || b1 > 0xFE)
5771 || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
9da8350f 5772 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453 5773 DECODE_BIG5 (b1, b2, charset, c1, c2);
b73bfc1c 5774 XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
c28a9453 5775 }
4ed46869
KH
5776 return val;
5777}
5778
5779DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
d46c5b12 5780 "Encode the Big5 character CHAR to BIG5 coding system.\n\
4ed46869
KH
5781Return the corresponding character code in Big5.")
5782 (ch)
5783 Lisp_Object ch;
5784{
bcf26d6a 5785 int charset, c1, c2, b1, b2;
4ed46869
KH
5786 Lisp_Object val;
5787
5788 CHECK_NUMBER (ch, 0);
5789 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
5790 if (charset == CHARSET_ASCII)
5791 {
5792 val = ch;
5793 }
5794 else if ((charset == charset_big5_1
5795 && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
5796 || (charset == charset_big5_2
5797 && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
4ed46869
KH
5798 {
5799 ENCODE_BIG5 (charset, c1, c2, b1, b2);
bcf26d6a 5800 XSETFASTINT (val, (b1 << 8) | b2);
4ed46869
KH
5801 }
5802 else
c28a9453 5803 error ("Can't encode to Big5: %d", XFASTINT (ch));
4ed46869
KH
5804 return val;
5805}
3a73fa5d 5806\f
1ba9e4ab
KH
5807DEFUN ("set-terminal-coding-system-internal",
5808 Fset_terminal_coding_system_internal,
5809 Sset_terminal_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
5810 (coding_system)
5811 Lisp_Object coding_system;
5812{
5813 CHECK_SYMBOL (coding_system, 0);
5814 setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
70c22245 5815 /* We had better not send unsafe characters to terminal. */
6e85d753 5816 terminal_coding.flags |= CODING_FLAG_ISO_SAFE;
ec6d2bb8
KH
5817 /* Character composition should be disabled. */
5818 terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
5819 terminal_coding.src_multibyte = 1;
5820 terminal_coding.dst_multibyte = 0;
4ed46869
KH
5821 return Qnil;
5822}
5823
c4825358
KH
5824DEFUN ("set-safe-terminal-coding-system-internal",
5825 Fset_safe_terminal_coding_system_internal,
5826 Sset_safe_terminal_coding_system_internal, 1, 1, 0, "")
5827 (coding_system)
5828 Lisp_Object coding_system;
5829{
5830 CHECK_SYMBOL (coding_system, 0);
5831 setup_coding_system (Fcheck_coding_system (coding_system),
5832 &safe_terminal_coding);
ec6d2bb8
KH
5833 /* Character composition should be disabled. */
5834 safe_terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
5835 safe_terminal_coding.src_multibyte = 1;
5836 safe_terminal_coding.dst_multibyte = 0;
c4825358
KH
5837 return Qnil;
5838}
5839
4ed46869
KH
5840DEFUN ("terminal-coding-system",
5841 Fterminal_coding_system, Sterminal_coding_system, 0, 0, 0,
3a73fa5d 5842 "Return coding system specified for terminal output.")
4ed46869
KH
5843 ()
5844{
5845 return terminal_coding.symbol;
5846}
5847
1ba9e4ab
KH
5848DEFUN ("set-keyboard-coding-system-internal",
5849 Fset_keyboard_coding_system_internal,
5850 Sset_keyboard_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
5851 (coding_system)
5852 Lisp_Object coding_system;
5853{
5854 CHECK_SYMBOL (coding_system, 0);
5855 setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
ec6d2bb8
KH
5856 /* Character composition should be disabled. */
5857 keyboard_coding.composing = COMPOSITION_DISABLED;
4ed46869
KH
5858 return Qnil;
5859}
5860
5861DEFUN ("keyboard-coding-system",
5862 Fkeyboard_coding_system, Skeyboard_coding_system, 0, 0, 0,
3a73fa5d 5863 "Return coding system specified for decoding keyboard input.")
4ed46869
KH
5864 ()
5865{
5866 return keyboard_coding.symbol;
5867}
5868
5869\f
a5d301df
KH
5870DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
5871 Sfind_operation_coding_system, 1, MANY, 0,
5872 "Choose a coding system for an operation based on the target name.\n\
69f76525 5873The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).\n\
9ce27fde
KH
5874DECODING-SYSTEM is the coding system to use for decoding\n\
5875\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system\n\
5876for encoding (in case OPERATION does encoding).\n\
ccdb79f5
RS
5877\n\
5878The first argument OPERATION specifies an I/O primitive:\n\
5879 For file I/O, `insert-file-contents' or `write-region'.\n\
5880 For process I/O, `call-process', `call-process-region', or `start-process'.\n\
5881 For network I/O, `open-network-stream'.\n\
5882\n\
5883The remaining arguments should be the same arguments that were passed\n\
5884to the primitive. Depending on which primitive, one of those arguments\n\
5885is selected as the TARGET. For example, if OPERATION does file I/O,\n\
5886whichever argument specifies the file name is TARGET.\n\
5887\n\
5888TARGET has a meaning which depends on OPERATION:\n\
4ed46869
KH
5889 For file I/O, TARGET is a file name.\n\
5890 For process I/O, TARGET is a process name.\n\
5891 For network I/O, TARGET is a service name or a port number.\n\
5892\n\
02ba4723
KH
5893This function looks up what is specified for TARGET in\n\
5894`file-coding-system-alist', `process-coding-system-alist',\n\
5895or `network-coding-system-alist' depending on OPERATION.\n\
5896They may specify a coding system, a cons of coding systems,\n\
5897or a function symbol to call.\n\
5898In the last case, we call the function with one argument,\n\
9ce27fde 5899which is a list of all the arguments given to this function.")
4ed46869
KH
5900 (nargs, args)
5901 int nargs;
5902 Lisp_Object *args;
5903{
5904 Lisp_Object operation, target_idx, target, val;
5905 register Lisp_Object chain;
5906
5907 if (nargs < 2)
5908 error ("Too few arguments");
5909 operation = args[0];
5910 if (!SYMBOLP (operation)
5911 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
5912 error ("Invalid first argument");
5913 if (nargs <= 1 + XINT (target_idx))
5914 error ("Too few arguments for operation: %s",
5915 XSYMBOL (operation)->name->data);
5916 target = args[XINT (target_idx) + 1];
5917 if (!(STRINGP (target)
5918 || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
5919 error ("Invalid %dth argument", XINT (target_idx) + 1);
5920
2e34157c
RS
5921 chain = ((EQ (operation, Qinsert_file_contents)
5922 || EQ (operation, Qwrite_region))
02ba4723 5923 ? Vfile_coding_system_alist
2e34157c 5924 : (EQ (operation, Qopen_network_stream)
02ba4723
KH
5925 ? Vnetwork_coding_system_alist
5926 : Vprocess_coding_system_alist));
4ed46869
KH
5927 if (NILP (chain))
5928 return Qnil;
5929
03699b14 5930 for (; CONSP (chain); chain = XCDR (chain))
4ed46869 5931 {
f44d27ce 5932 Lisp_Object elt;
03699b14 5933 elt = XCAR (chain);
4ed46869
KH
5934
5935 if (CONSP (elt)
5936 && ((STRINGP (target)
03699b14
KR
5937 && STRINGP (XCAR (elt))
5938 && fast_string_match (XCAR (elt), target) >= 0)
5939 || (INTEGERP (target) && EQ (target, XCAR (elt)))))
02ba4723 5940 {
03699b14 5941 val = XCDR (elt);
b19fd4c5
KH
5942 /* Here, if VAL is both a valid coding system and a valid
5943 function symbol, we return VAL as a coding system. */
02ba4723
KH
5944 if (CONSP (val))
5945 return val;
5946 if (! SYMBOLP (val))
5947 return Qnil;
5948 if (! NILP (Fcoding_system_p (val)))
5949 return Fcons (val, val);
b19fd4c5
KH
5950 if (! NILP (Ffboundp (val)))
5951 {
5952 val = call1 (val, Flist (nargs, args));
5953 if (CONSP (val))
5954 return val;
5955 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
5956 return Fcons (val, val);
5957 }
02ba4723
KH
5958 return Qnil;
5959 }
4ed46869
KH
5960 }
5961 return Qnil;
5962}
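/* Illustration only, not part of coding.c: a stand-alone sketch of how
   find-operation-coding-system picks TARGET out of its arguments.  The
   index values are the `target-idx' properties installed in
   syms_of_coding; representing them as a C table keyed by operation
   name, and the names sketch_target_table and sketch_find_target, are
   assumptions made for the example.  */

```c
#include <string.h>

struct sketch_target_idx
{
  const char *operation;   /* name of the I/O primitive */
  int target_idx;          /* zero-based index among the primitive's args */
};

static const struct sketch_target_idx sketch_target_table[] = {
  {"insert-file-contents", 0},   /* FILENAME is the first argument */
  {"write-region", 2},           /* FILENAME is the third argument */
  {"call-process", 0},           /* PROGRAM is the first argument */
  {"call-process-region", 2},    /* PROGRAM is the third argument */
  {"start-process", 2},          /* PROGRAM is the third argument */
  {"open-network-stream", 3},    /* SERVICE is the fourth argument */
};

/* ARGS[0] names the operation and ARGS[1..NARGS-1] are the arguments
   that were passed to it.  Return the TARGET argument, or NULL if the
   operation is unknown or too few arguments were supplied.  */
const char *
sketch_find_target (int nargs, const char *const *args)
{
  size_t i;

  if (nargs < 2)
    return NULL;
  for (i = 0; i < sizeof sketch_target_table / sizeof sketch_target_table[0]; i++)
    if (strcmp (args[0], sketch_target_table[i].operation) == 0)
      {
        int idx = sketch_target_table[i].target_idx;

        /* Reading ARGS[idx + 1] needs at least idx + 2 elements.  */
        if (nargs < idx + 2)
          return NULL;
        return args[idx + 1];
      }
  return NULL;
}
```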
5963
1397dc18
KH
5964DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
5965 Supdate_coding_systems_internal, 0, 0, 0,
5966 "Update internal database for ISO2022 and CCL based coding systems.\n\
fa42c37f
KH
5967When values of any coding categories are changed, you must\n\
5968call this function.")
d46c5b12
KH
5969 ()
5970{
5971 int i;
5972
fa42c37f 5973 for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
d46c5b12 5974 {
1397dc18
KH
5975 Lisp_Object val;
5976
5977 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value;
5978 if (!NILP (val))
5979 {
5980 if (! coding_system_table[i])
5981 coding_system_table[i] = ((struct coding_system *)
5982 xmalloc (sizeof (struct coding_system)));
5983 setup_coding_system (val, coding_system_table[i]);
5984 }
5985 else if (coding_system_table[i])
5986 {
5987 xfree (coding_system_table[i]);
5988 coding_system_table[i] = NULL;
5989 }
d46c5b12 5990 }
1397dc18 5991
d46c5b12
KH
5992 return Qnil;
5993}
5994
66cfb530
KH
5995DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
5996 Sset_coding_priority_internal, 0, 0, 0,
5997 "Update internal database for the current value of `coding-category-list'.\n\
5998This function is for internal use only.")
5999 ()
6000{
6001 int i = 0, idx;
84d60297
RS
6002 Lisp_Object val;
6003
6004 val = Vcoding_category_list;
66cfb530
KH
6005
6006 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
6007 {
03699b14 6008 if (! SYMBOLP (XCAR (val)))
66cfb530 6009 break;
03699b14 6010 idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
66cfb530
KH
6011 if (idx >= CODING_CATEGORY_IDX_MAX)
6012 break;
6013 coding_priorities[i++] = (1 << idx);
03699b14 6014 val = XCDR (val);
66cfb530
KH
6015 }
6016 /* If coding-category-list is valid and contains all coding
6017 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
fa42c37f 6018 the following code saves Emacs from crashing. */
66cfb530
KH
6019 while (i < CODING_CATEGORY_IDX_MAX)
6020 coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
6021
6022 return Qnil;
6023}
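/* Illustration only, not part of coding.c: a stand-alone sketch of the
   fill-and-pad logic in set-coding-priority-internal above.  The
   constants SKETCH_IDX_MAX and SKETCH_FALLBACK_MASK merely stand in for
   CODING_CATEGORY_IDX_MAX and CODING_CATEGORY_MASK_RAW_TEXT; their
   values here are made up for the example.  */

```c
#define SKETCH_IDX_MAX 8            /* hypothetical number of categories */
#define SKETCH_FALLBACK_MASK 0x80u  /* hypothetical fallback category mask */

/* Fill PRIORITIES (SKETCH_IDX_MAX entries) with one bit mask per
   category index taken from INDICES, in order.  Stop at the first
   out-of-range index and pad the remaining slots with the fallback
   mask, so that every slot holds a usable value.  */
void
sketch_set_priorities (const int *indices, int n, unsigned *priorities)
{
  int i = 0;

  while (i < n && i < SKETCH_IDX_MAX)
    {
      if (indices[i] < 0 || indices[i] >= SKETCH_IDX_MAX)
        break;
      priorities[i] = 1u << indices[i];
      i++;
    }
  while (i < SKETCH_IDX_MAX)
    priorities[i++] = SKETCH_FALLBACK_MASK;
}
```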
6024
4ed46869
KH
6025#endif /* emacs */
6026
6027\f
1397dc18 6028/*** 9. Post-amble ***/
4ed46869 6029
6d74c3aa
KH
6030void
6031init_coding ()
6032{
6033 conversion_buffer = (char *) xmalloc (MINIMUM_CONVERSION_BUFFER_SIZE);
6034}
6035
dfcf069d 6036void
4ed46869
KH
6037init_coding_once ()
6038{
6039 int i;
6040
0ef69138 6041 /* Initialization specific to Emacs' internal format. */
4ed46869
KH
6042 for (i = 0; i <= 0x20; i++)
6043 emacs_code_class[i] = EMACS_control_code;
6044 emacs_code_class[0x0A] = EMACS_linefeed_code;
6045 emacs_code_class[0x0D] = EMACS_carriage_return_code;
6046 for (i = 0x21 ; i < 0x7F; i++)
6047 emacs_code_class[i] = EMACS_ascii_code;
6048 emacs_code_class[0x7F] = EMACS_control_code;
ec6d2bb8 6049 for (i = 0x80; i < 0xFF; i++)
4ed46869
KH
6050 emacs_code_class[i] = EMACS_invalid_code;
6051 emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
6052 emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
6053 emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
6054 emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;
6055
6056 /* Initialization specific to ISO2022. */
6057 for (i = 0; i < 0x20; i++)
b73bfc1c 6058 iso_code_class[i] = ISO_control_0;
4ed46869
KH
6059 for (i = 0x21; i < 0x7F; i++)
6060 iso_code_class[i] = ISO_graphic_plane_0;
6061 for (i = 0x80; i < 0xA0; i++)
b73bfc1c 6062 iso_code_class[i] = ISO_control_1;
4ed46869
KH
6063 for (i = 0xA1; i < 0xFF; i++)
6064 iso_code_class[i] = ISO_graphic_plane_1;
6065 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
6066 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
6067 iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
6068 iso_code_class[ISO_CODE_SO] = ISO_shift_out;
6069 iso_code_class[ISO_CODE_SI] = ISO_shift_in;
6070 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
6071 iso_code_class[ISO_CODE_ESC] = ISO_escape;
6072 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
6073 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
6074 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
6075
e0e989f6 6076 conversion_buffer_size = MINIMUM_CONVERSION_BUFFER_SIZE;
e0e989f6
KH
6077
6078 setup_coding_system (Qnil, &keyboard_coding);
6079 setup_coding_system (Qnil, &terminal_coding);
c4825358 6080 setup_coding_system (Qnil, &safe_terminal_coding);
6bc51348 6081 setup_coding_system (Qnil, &default_buffer_file_coding);
9ce27fde 6082
d46c5b12
KH
6083 bzero (coding_system_table, sizeof coding_system_table);
6084
66cfb530
KH
6085 bzero (ascii_skip_code, sizeof ascii_skip_code);
6086 for (i = 0; i < 128; i++)
6087 ascii_skip_code[i] = 1;
6088
9ce27fde
KH
6089#if defined (MSDOS) || defined (WINDOWSNT)
6090 system_eol_type = CODING_EOL_CRLF;
6091#else
6092 system_eol_type = CODING_EOL_LF;
6093#endif
b843d1ae
KH
6094
6095 inhibit_pre_post_conversion = 0;
e0e989f6
KH
6096}
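/* Illustration only, not part of coding.c: the ISO2022 part of
   init_coding_once above fills a 256-entry table by byte range.  The
   sketch below expresses the same ranges as a function.  It
   deliberately omits the per-byte overrides for CR, SO, SI, ESC, SS2,
   SS3 and CSI, and the enum and function names are hypothetical.  */

```c
enum sketch_iso_class
{
  SKETCH_CONTROL_0,        /* 0x00..0x1F */
  SKETCH_GRAPHIC_PLANE_0,  /* 0x21..0x7E */
  SKETCH_0x20_OR_0x7F,     /* 0x20 and 0x7F */
  SKETCH_CONTROL_1,        /* 0x80..0x9F */
  SKETCH_GRAPHIC_PLANE_1,  /* 0xA1..0xFE */
  SKETCH_0xA0_OR_0xFF      /* 0xA0 and 0xFF */
};

/* Return the coarse ISO2022 class of byte C.  */
enum sketch_iso_class
sketch_iso_classify (unsigned char c)
{
  if (c < 0x20)
    return SKETCH_CONTROL_0;
  if (c == 0x20 || c == 0x7F)
    return SKETCH_0x20_OR_0x7F;
  if (c < 0x7F)
    return SKETCH_GRAPHIC_PLANE_0;
  if (c < 0xA0)
    return SKETCH_CONTROL_1;
  if (c == 0xA0 || c == 0xFF)
    return SKETCH_0xA0_OR_0xFF;
  return SKETCH_GRAPHIC_PLANE_1;
}
```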
6097
6098#ifdef emacs
6099
dfcf069d 6100void
e0e989f6
KH
6101syms_of_coding ()
6102{
6103 Qtarget_idx = intern ("target-idx");
6104 staticpro (&Qtarget_idx);
6105
bb0115a2
RS
6106 Qcoding_system_history = intern ("coding-system-history");
6107 staticpro (&Qcoding_system_history);
6108 Fset (Qcoding_system_history, Qnil);
6109
9ce27fde 6110 /* Target FILENAME is the first argument. */
e0e989f6 6111 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
9ce27fde 6112 /* Target FILENAME is the third argument. */
e0e989f6
KH
6113 Fput (Qwrite_region, Qtarget_idx, make_number (2));
6114
6115 Qcall_process = intern ("call-process");
6116 staticpro (&Qcall_process);
9ce27fde 6117 /* Target PROGRAM is the first argument. */
e0e989f6
KH
6118 Fput (Qcall_process, Qtarget_idx, make_number (0));
6119
6120 Qcall_process_region = intern ("call-process-region");
6121 staticpro (&Qcall_process_region);
9ce27fde 6122 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6123 Fput (Qcall_process_region, Qtarget_idx, make_number (2));
6124
6125 Qstart_process = intern ("start-process");
6126 staticpro (&Qstart_process);
9ce27fde 6127 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6128 Fput (Qstart_process, Qtarget_idx, make_number (2));
6129
6130 Qopen_network_stream = intern ("open-network-stream");
6131 staticpro (&Qopen_network_stream);
9ce27fde 6132 /* Target SERVICE is the fourth argument. */
e0e989f6
KH
6133 Fput (Qopen_network_stream, Qtarget_idx, make_number (3));
6134
4ed46869
KH
6135 Qcoding_system = intern ("coding-system");
6136 staticpro (&Qcoding_system);
6137
6138 Qeol_type = intern ("eol-type");
6139 staticpro (&Qeol_type);
6140
6141 Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
6142 staticpro (&Qbuffer_file_coding_system);
6143
6144 Qpost_read_conversion = intern ("post-read-conversion");
6145 staticpro (&Qpost_read_conversion);
6146
6147 Qpre_write_conversion = intern ("pre-write-conversion");
6148 staticpro (&Qpre_write_conversion);
6149
27901516
KH
6150 Qno_conversion = intern ("no-conversion");
6151 staticpro (&Qno_conversion);
6152
6153 Qundecided = intern ("undecided");
6154 staticpro (&Qundecided);
6155
4ed46869
KH
6156 Qcoding_system_p = intern ("coding-system-p");
6157 staticpro (&Qcoding_system_p);
6158
6159 Qcoding_system_error = intern ("coding-system-error");
6160 staticpro (&Qcoding_system_error);
6161
6162 Fput (Qcoding_system_error, Qerror_conditions,
6163 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
6164 Fput (Qcoding_system_error, Qerror_message,
9ce27fde 6165 build_string ("Invalid coding system"));
4ed46869 6166
d46c5b12
KH
6167 Qcoding_category = intern ("coding-category");
6168 staticpro (&Qcoding_category);
4ed46869
KH
6169 Qcoding_category_index = intern ("coding-category-index");
6170 staticpro (&Qcoding_category_index);
6171
d46c5b12
KH
6172 Vcoding_category_table
6173 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
6174 staticpro (&Vcoding_category_table);
4ed46869
KH
6175 {
6176 int i;
6177 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
6178 {
d46c5b12
KH
6179 XVECTOR (Vcoding_category_table)->contents[i]
6180 = intern (coding_category_name[i]);
6181 Fput (XVECTOR (Vcoding_category_table)->contents[i],
6182 Qcoding_category_index, make_number (i));
4ed46869
KH
6183 }
6184 }
6185
f967223b
KH
6186 Qtranslation_table = intern ("translation-table");
6187 staticpro (&Qtranslation_table);
1397dc18 6188 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (1));
bdd9fb48 6189
f967223b
KH
6190 Qtranslation_table_id = intern ("translation-table-id");
6191 staticpro (&Qtranslation_table_id);
84fbb8a0 6192
f967223b
KH
6193 Qtranslation_table_for_decode = intern ("translation-table-for-decode");
6194 staticpro (&Qtranslation_table_for_decode);
a5d301df 6195
f967223b
KH
6196 Qtranslation_table_for_encode = intern ("translation-table-for-encode");
6197 staticpro (&Qtranslation_table_for_encode);
a5d301df 6198
70c22245
KH
6199 Qsafe_charsets = intern ("safe-charsets");
6200 staticpro (&Qsafe_charsets);
6201
1397dc18
KH
6202 Qvalid_codes = intern ("valid-codes");
6203 staticpro (&Qvalid_codes);
6204
9ce27fde
KH
6205 Qemacs_mule = intern ("emacs-mule");
6206 staticpro (&Qemacs_mule);
6207
d46c5b12
KH
6208 Qraw_text = intern ("raw-text");
6209 staticpro (&Qraw_text);
6210
4ed46869
KH
6211 defsubr (&Scoding_system_p);
6212 defsubr (&Sread_coding_system);
6213 defsubr (&Sread_non_nil_coding_system);
6214 defsubr (&Scheck_coding_system);
6215 defsubr (&Sdetect_coding_region);
d46c5b12 6216 defsubr (&Sdetect_coding_string);
4ed46869
KH
6217 defsubr (&Sdecode_coding_region);
6218 defsubr (&Sencode_coding_region);
6219 defsubr (&Sdecode_coding_string);
6220 defsubr (&Sencode_coding_string);
6221 defsubr (&Sdecode_sjis_char);
6222 defsubr (&Sencode_sjis_char);
6223 defsubr (&Sdecode_big5_char);
6224 defsubr (&Sencode_big5_char);
1ba9e4ab 6225 defsubr (&Sset_terminal_coding_system_internal);
c4825358 6226 defsubr (&Sset_safe_terminal_coding_system_internal);
4ed46869 6227 defsubr (&Sterminal_coding_system);
1ba9e4ab 6228 defsubr (&Sset_keyboard_coding_system_internal);
4ed46869 6229 defsubr (&Skeyboard_coding_system);
a5d301df 6230 defsubr (&Sfind_operation_coding_system);
1397dc18 6231 defsubr (&Supdate_coding_systems_internal);
66cfb530 6232 defsubr (&Sset_coding_priority_internal);
4ed46869 6233
4608c386
KH
6234 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
6235 "List of coding systems.\n\
6236\n\
6237Do not alter the value of this variable manually. This variable should be\n\
6238updated by the functions `make-coding-system' and\n\
6239`define-coding-system-alias'.");
6240 Vcoding_system_list = Qnil;
6241
6242 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
6243 "Alist of coding system names.\n\
6244Each element is one element list of coding system name.\n\
6245This variable is given to `completing-read' as TABLE argument.\n\
6246\n\
6247Do not alter the value of this variable manually. This variable should be\n\
6248updated by the functions `make-coding-system' and\n\
6249`define-coding-system-alias'.");
6250 Vcoding_system_alist = Qnil;
6251
4ed46869
KH
6252 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
6253 "List of coding-categories (symbols) ordered by priority.");
6254 {
6255 int i;
6256
6257 Vcoding_category_list = Qnil;
6258 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
6259 Vcoding_category_list
d46c5b12
KH
6260 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
6261 Vcoding_category_list);
4ed46869
KH
6262 }
6263
6264 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
10bff6f1 6265 "Specify the coding system for read operations.\n\
2ebb362d 6266It is useful to bind this variable with `let', but do not set it globally.\n\
4ed46869 6267If the value is a coding system, it is used for decoding on read operation.\n\
a67a9c66 6268If not, an appropriate element is used from one of the coding system alists:\n\
10bff6f1 6269There are three such tables, `file-coding-system-alist',\n\
a67a9c66 6270`process-coding-system-alist', and `network-coding-system-alist'.");
4ed46869
KH
6271 Vcoding_system_for_read = Qnil;
6272
6273 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
10bff6f1 6274 "Specify the coding system for write operations.\n\
928aedd8
RS
6275Programs bind this variable with `let', but you should not set it globally.\n\
6276If the value is a coding system, it is used for encoding of output,\n\
6277when writing it to a file and when sending it to a file or subprocess.\n\
6278\n\
6279If this does not specify a coding system, an appropriate element\n\
6280is used from one of the coding system alists:\n\
10bff6f1 6281There are three such tables, `file-coding-system-alist',\n\
928aedd8
RS
6282`process-coding-system-alist', and `network-coding-system-alist'.\n\
6283For output to files, if the above procedure does not specify a coding system,\n\
6284the value of `buffer-file-coding-system' is used.");
4ed46869
KH
6285 Vcoding_system_for_write = Qnil;
6286
6287 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
a67a9c66 6288 "Coding system used in the latest file or process I/O.");
4ed46869
KH
6289 Vlast_coding_system_used = Qnil;
6290
9ce27fde 6291 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
f07f4a24 6292 "*Non-nil means always inhibit code conversion of end-of-line format.\n\
94c7a214
DL
6293See info node `Coding Systems' and info node `Text and Binary' concerning\n\
6294such conversion.");
9ce27fde
KH
6295 inhibit_eol_conversion = 0;
6296
ed29121d
EZ
6297 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
6298 "Non-nil means process buffer inherits coding system of process output.\n\
6299Bind it to t if the process output is to be treated as if it were a file\n\
6300read from some filesystem.");
6301 inherit_process_coding_system = 0;
6302
02ba4723
KH
6303 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
6304 "Alist to decide a coding system to use for a file I/O operation.\n\
6305The format is ((PATTERN . VAL) ...),\n\
6306where PATTERN is a regular expression matching a file name,\n\
6307VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6308If VAL is a coding system, it is used for both decoding and encoding\n\
6309the file contents.\n\
6310If VAL is a cons of coding systems, the car part is used for decoding,\n\
6311and the cdr part is used for encoding.\n\
6312If VAL is a function symbol, the function must return a coding system\n\
6313or a cons of coding systems which are used as above.\n\
e0e989f6 6314\n\
a85a871a 6315See also the function `find-operation-coding-system'\n\
eda284ac 6316and the variable `auto-coding-alist'.");
02ba4723
KH
6317 Vfile_coding_system_alist = Qnil;
6318
6319 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
6320 "Alist to decide a coding system to use for a process I/O operation.\n\
6321The format is ((PATTERN . VAL) ...),\n\
6322where PATTERN is a regular expression matching a program name,\n\
6323VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6324If VAL is a coding system, it is used for both decoding what received\n\
6325from the program and encoding what sent to the program.\n\
6326If VAL is a cons of coding systems, the car part is used for decoding,\n\
6327and the cdr part is used for encoding.\n\
6328If VAL is a function symbol, the function must return a coding system\n\
6329or a cons of coding systems which are used as above.\n\
4ed46869 6330\n\
9ce27fde 6331See also the function `find-operation-coding-system'.");
02ba4723
KH
6332 Vprocess_coding_system_alist = Qnil;
6333
6334 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
6335 "Alist to decide a coding system to use for a network I/O operation.\n\
6336The format is ((PATTERN . VAL) ...),\n\
6337where PATTERN is a regular expression matching a network service name\n\
6338or is a port number to connect to,\n\
6339VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6340If VAL is a coding system, it is used for both decoding what is received\n\
6341from the network stream and encoding what is sent to the network stream.\n\
6342If VAL is a cons of coding systems, the car part is used for decoding,\n\
6343and the cdr part is used for encoding.\n\
6344If VAL is a function symbol, the function must return a coding system\n\
6345or a cons of coding systems which are used as above.\n\
4ed46869 6346\n\
9ce27fde 6347See also the function `find-operation-coding-system'.");
02ba4723 6348 Vnetwork_coding_system_alist = Qnil;
4ed46869 6349
68c45bf0
PE
6350 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
6351 "Coding system to use with system messages.");
6352 Vlocale_coding_system = Qnil;
6353
005f0d35 6354 /* The eol mnemonics are reset in startup.el system-dependently. */
7722baf9
EZ
6355 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
6356 "*String displayed in mode line for UNIX-like (LF) end-of-line format.");
6357 eol_mnemonic_unix = build_string (":");
4ed46869 6358
7722baf9
EZ
6359 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
6360 "*String displayed in mode line for DOS-like (CRLF) end-of-line format.");
6361 eol_mnemonic_dos = build_string ("\\");
4ed46869 6362
7722baf9
EZ
6363 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
6364 "*String displayed in mode line for MAC-like (CR) end-of-line format.");
6365 eol_mnemonic_mac = build_string ("/");
4ed46869 6366
7722baf9
EZ
6367 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
6368 "*String displayed in mode line when end-of-line format is not yet determined.");
6369 eol_mnemonic_undecided = build_string (":");
4ed46869 6370
84fbb8a0 6371 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
f967223b 6372 "*Non-nil enables character translation while encoding and decoding.");
84fbb8a0 6373 Venable_character_translation = Qt;
bdd9fb48 6374
f967223b
KH
6375 DEFVAR_LISP ("standard-translation-table-for-decode",
6376 &Vstandard_translation_table_for_decode,
84fbb8a0 6377 "Table for translating characters while decoding.");
f967223b 6378 Vstandard_translation_table_for_decode = Qnil;
bdd9fb48 6379
f967223b
KH
6380 DEFVAR_LISP ("standard-translation-table-for-encode",
6381 &Vstandard_translation_table_for_encode,
84fbb8a0 6382 "Table for translating characters while encoding.");
f967223b 6383 Vstandard_translation_table_for_encode = Qnil;
4ed46869
KH
6384
6385 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
6386 "Alist of charsets vs revision numbers.\n\
6387While encoding, if a charset (car part of an element) is found,\n\
6388designate it with the escape sequence identifying the revision (cdr part of the element).");
6389 Vcharset_revision_alist = Qnil;
02ba4723
KH
6390
6391 DEFVAR_LISP ("default-process-coding-system",
6392 &Vdefault_process_coding_system,
6393 "Cons of coding systems used for process I/O by default.\n\
6394The car part is used for decoding a process output,\n\
6395the cdr part is used for encoding a text to be sent to a process.");
6396 Vdefault_process_coding_system = Qnil;
c4825358 6397
3f003981
KH
6398 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
6399 "Table of extra Latin codes in the range 128..159 (inclusive).\n\
c4825358
KH
6400This is a vector of length 256.\n\
6401If Nth element is non-nil, the existence of code N in a file\n\
bb0115a2 6402\(or output of a subprocess) doesn't prevent it from being detected as\n\
3f003981
KH
6403a coding system of ISO 2022 variant which has a flag\n\
6404`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\
c4825358
KH
6405or reading output of a subprocess.\n\
6406Only the 128th through 159th elements have a meaning.");
3f003981 6407 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
d46c5b12
KH
6408
6409 DEFVAR_LISP ("select-safe-coding-system-function",
6410 &Vselect_safe_coding_system_function,
6411 "Function to call to select safe coding system for encoding a text.\n\
6412\n\
6413If set, this function is called to force a user to select a proper\n\
6414coding system which can encode the text in the case that a default\n\
6415coding system used in each operation can't encode the text.\n\
6416\n\
a85a871a 6417The default value is `select-safe-coding-system' (which see).");
d46c5b12
KH
6418 Vselect_safe_coding_system_function = Qnil;
6419
22ab2303 6420 DEFVAR_BOOL ("inhibit-iso-escape-detection",
74383408
KH
6421 &inhibit_iso_escape_detection,
6422 "If non-nil, Emacs ignores ISO2022's escape sequence on code detection.\n\
6423\n\
6424By default, on reading a file, Emacs tries to detect how the text is\n\
6425encoded. This code detection is sensitive to escape sequences. If\n\
6426the sequence is valid as ISO2022, the code is determined as one of the\n\
6427ISO2022 encodings, and the file is decoded by the corresponding coding\n\
6428system (e.g. `iso-2022-7bit').\n\
6429\n\
6430However, there may be cases where you want to read escape sequences in\n\
6431a file as is. In such a case, you can set this variable to non-nil.\n\
6432Then, as the code detection ignores any escape sequences, no file is\n\
6433detected as any ISO2022 encoding. The result is that all escape\n\
6434sequences become visible in a buffer.\n\
6435\n\
6436The default value is nil, and it is strongly recommended not to change\n\
6437it. That is because many Emacs Lisp source files that contain\n\
6438non-ASCII characters are encoded by the coding system `iso-2022-7bit'\n\
6439in Emacs's distribution, and they won't be decoded correctly on\n\
6440reading if you suppress escape sequence detection.\n\
6441\n\
6442The other way to read escape sequences in a file without decoding is\n\
6443to explicitly specify some coding system that doesn't use ISO2022's\n\
6444escape sequence (e.g. `latin-1') on reading by \\[universal-coding-system-argument].");
6445 inhibit_iso_escape_detection = 0;
4ed46869
KH
6446}
6447
68c45bf0
PE
6448char *
6449emacs_strerror (error_number)
6450 int error_number;
6451{
6452 char *str;
6453
ca9c0567 6454 synchronize_system_messages_locale ();
68c45bf0
PE
6455 str = strerror (error_number);
6456
6457 if (! NILP (Vlocale_coding_system))
6458 {
6459 Lisp_Object dec = code_convert_string_norecord (build_string (str),
6460 Vlocale_coding_system,
6461 0);
6462 str = (char *) XSTRING (dec)->data;
6463 }
6464
6465 return str;
6466}
6467
4ed46869 6468#endif /* emacs */
c2f94ebc 6469