(encode_coding_sjis_big5): Fix previous change.
[bpt/emacs.git] / src / coding.c
/* Coding system handler (conversion, detection, etc.).
   Copyright (C) 1995, 1997, 1998 Electrotechnical Laboratory, JAPAN.
   Licensed to the Free Software Foundation.

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */

/*** TABLE OF CONTENTS ***

  0. General comments
  1. Preamble
  2. Emacs' internal format (emacs-mule) handlers
  3. ISO2022 handlers
  4. Shift-JIS and BIG5 handlers
  5. CCL handlers
  6. End-of-line handlers
  7. C library functions
  8. Emacs Lisp library functions
  9. Post-amble

*/

/*** 0. General comments ***/


/*** GENERAL NOTE on CODING SYSTEM ***

  A coding system is an encoding mechanism for one or more character
  sets. Here's a list of coding systems which Emacs can handle. When
  we say "decode", it means converting text in some other coding
  system to Emacs' internal format (emacs-mule), and when we say
  "encode", it means converting text in the coding system emacs-mule
  to some other coding system.

  0. Emacs' internal format (emacs-mule)

  Emacs itself holds a multi-lingual character in a buffer and a string
  in a special format. Details are described in section 2.

  1. ISO2022

  The most famous coding system for multiple character sets. X's
  Compound Text, various EUCs (Extended Unix Code), and coding
  systems used in Internet communication such as ISO-2022-JP are
  all variants of ISO2022. Details are described in section 3.

  2. SJIS (or Shift-JIS or MS-Kanji-Code)

  A coding system to encode character sets: ASCII, JISX0201, and
  JISX0208. Widely used for PC's in Japan. Details are described in
  section 4.

  3. BIG5

  A coding system to encode character sets: ASCII and Big5. Widely
  used by Chinese (mainly in Taiwan and Hong Kong). Details are
  described in section 4. In this file, when we write "BIG5"
  (all uppercase), we mean the coding system, and when we write
  "Big5" (capitalized), we mean the character set.

  4. Raw text

  A coding system for a text containing random 8-bit code. Emacs does
  no code conversion on such a text except for end-of-line format.

  5. Other

  If a user wants to read/write a text encoded in a coding system not
  listed above, he can supply a decoder and an encoder for it in CCL
  (Code Conversion Language) programs. Emacs executes the CCL program
  while reading/writing.

  Emacs represents a coding system by a Lisp symbol that has a property
  `coding-system'. But, before actually using the coding system, the
  information about it is set in a structure of type `struct
  coding_system' for rapid processing. See section 7 for more details.

*/

/*** GENERAL NOTES on END-OF-LINE FORMAT ***

  How the end-of-line of a text is encoded depends on the system. For
  instance, Unix's format is just one byte of `line-feed' code,
  whereas DOS's format is a two-byte sequence of `carriage-return' and
  `line-feed' codes. MacOS's format is usually one byte of
  `carriage-return'.

  Since text character encoding and end-of-line encoding are
  independent, any coding system described above can take
  any format of end-of-line. So, Emacs keeps the end-of-line format
  in each coding-system. See section 6 for more details.

*/
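
/* Illustration only, not part of coding.c. A minimal sketch of how
   the three end-of-line formats described above could be told apart
   by looking at the first line terminator in a buffer. The names
   eol_kind and sketch_detect_eol are hypothetical and exist only for
   this example; the detection logic Emacs actually uses may differ.
   A lone `carriage-return' at the very end of the buffer is reported
   as the Mac format here, although it could also be a truncated DOS
   sequence. */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum eol_kind { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* Classify the first end-of-line found in SRC[0..LEN). */
enum eol_kind
sketch_detect_eol (const unsigned char *src, size_t len)
{
  size_t i;

  assert (src != NULL || len == 0);
  for (i = 0; i < len; i++)
    {
      if (src[i] == '\n')
        return EOL_LF;          /* Unix: single line-feed */
      if (src[i] == '\r')
        /* DOS if a line-feed follows, otherwise Mac. */
        return (i + 1 < len && src[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
    }
  return EOL_NONE;              /* no terminator seen */
}
```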

/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***

  These functions check if a text between SRC and SRC_END is encoded
  in the coding system category XXX. Each returns an integer value in
  which appropriate flag bits for the category XXX are set. The flag
  bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
  template of these functions. */
#if 0
int
detect_coding_XXX (src, src_end)
     unsigned char *src, *src_end;
{
  ...
}
#endif

/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***

  These functions decode SRC_BYTES length of unibyte text at SOURCE
  encoded in CODING to Emacs' internal format. The resulting
  multibyte text goes to a place pointed to by DESTINATION, the length
  of which should not exceed DST_BYTES.

  These functions set the information of original and decoded texts in
  the members produced, produced_char, consumed, and consumed_char of
  the structure *CODING. They also set the member result to one of
  CODING_FINISH_XXX indicating how the decoding finished.

  DST_BYTES zero means that the source area and the destination area
  overlap, which means that we can produce decoded text until it
  reaches the head of the not-yet-decoded source text.

  Below is a template of these functions. */
#if 0
static void
decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
{
  ...
}
#endif
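
/* Illustration only, not part of coding.c. A minimal sketch of the
   contract described above, assuming a cut-down stand-in for `struct
   coding_system': the function reports how much it read and wrote
   and why it stopped. The names sketch_coding, sketch_decode_ascii
   and SKETCH_FINISH_* are hypothetical, the "decoding" is a plain
   byte copy, and the overlapping-area case (DST_BYTES zero) is
   deliberately left out. */

```c
#include <assert.h>

/* Hypothetical stand-ins for the CODING_FINISH_XXX result codes. */
enum { SKETCH_FINISH_NORMAL, SKETCH_FINISH_INSUFFICIENT_DST };

/* Hypothetical, reduced version of the bookkeeping members. */
struct sketch_coding
{
  int produced, produced_char;  /* bytes / characters written */
  int consumed, consumed_char;  /* bytes / characters read */
  int result;                   /* one of SKETCH_FINISH_XXX */
};

/* Copy up to SRC_BYTES bytes from SOURCE to DESTINATION, stopping
   early when DST_BYTES bytes have been written. */
void
sketch_decode_ascii (struct sketch_coding *coding,
                     const unsigned char *source,
                     unsigned char *destination,
                     int src_bytes, int dst_bytes)
{
  int n = 0;

  assert (src_bytes >= 0 && dst_bytes > 0);
  coding->result = SKETCH_FINISH_NORMAL;
  while (n < src_bytes)
    {
      if (n >= dst_bytes)
        {
          /* Destination is full: report it and stop. */
          coding->result = SKETCH_FINISH_INSUFFICIENT_DST;
          break;
        }
      destination[n] = source[n];
      n++;
    }
  /* One byte is one character in this ASCII-only sketch. */
  coding->produced = coding->produced_char = n;
  coding->consumed = coding->consumed_char = n;
}
```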

/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***

  These functions encode SRC_BYTES length text at SOURCE of Emacs'
  internal multibyte format to CODING. The resulting unibyte text
  goes to a place pointed to by DESTINATION, the length of which
  should not exceed DST_BYTES.

  These functions set the information of original and encoded texts in
  the members produced, produced_char, consumed, and consumed_char of
  the structure *CODING. They also set the member result to one of
  CODING_FINISH_XXX indicating how the encoding finished.

  DST_BYTES zero means that the source area and the destination area
  overlap, which means that we can produce encoded text until it
  reaches the head of the not-yet-encoded source text.

  Below is a template of these functions. */
#if 0
static void
encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
{
  ...
}
#endif

/*** COMMONLY USED MACROS ***/

/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
   get one and two bytes from the source text respectively. If there
   are not enough bytes in the source, they jump to
   `label_end_of_loop'. The caller should set variables `coding',
   `src' and `src_end' to appropriate pointers in advance. These
   macros are called from decoding routines `decode_coding_XXX', thus
   it is assumed that the source text is unibyte. */

#define ONE_MORE_BYTE(c1) \
  do { \
    if (src >= src_end) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
        goto label_end_of_loop; \
      } \
    c1 = *src++; \
  } while (0)

#define TWO_MORE_BYTES(c1, c2) \
  do { \
    if (src + 1 >= src_end) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
        goto label_end_of_loop; \
      } \
    c1 = *src++; \
    c2 = *src++; \
  } while (0)
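
/* Illustration only, not part of coding.c. A minimal sketch of the
   same bounds check as TWO_MORE_BYTES, written as a function that
   returns a status instead of jumping to a label. The name
   sketch_two_more_bytes is hypothetical. The test `src_end - src < 2'
   is equivalent to the macro's `src + 1 >= src_end'. */

```c
#include <assert.h>

/* Fetch two bytes from *SRCP into *C1 and *C2 and advance *SRCP.
   Return 1 on success. Return 0 and leave *SRCP untouched when
   fewer than two bytes remain before SRC_END, which is the case in
   which the macro sets coding->result and jumps away. */
int
sketch_two_more_bytes (const unsigned char **srcp,
                       const unsigned char *src_end,
                       unsigned char *c1, unsigned char *c2)
{
  const unsigned char *src = *srcp;

  assert (src <= src_end);
  if (src_end - src < 2)
    return 0;
  *c1 = *src++;
  *c2 = *src++;
  *srcp = src;
  return 1;
}
```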


/* Set C to the next character at the source text pointed to by `src'.
   If there are not enough characters in the source, jump to
   `label_end_of_loop'. The caller should set variables `coding',
   `src', `src_end', and `translation_table' to appropriate pointers
   in advance. This macro is used in encoding routines
   `encode_coding_XXX', thus it assumes that the source text is in
   multibyte form except for 8-bit characters. 8-bit characters are
   in multibyte form if coding->src_multibyte is nonzero, else they
   are represented by a single byte. */

#define ONE_MORE_CHAR(c) \
  do { \
    int len = src_end - src; \
    int bytes; \
    if (len <= 0) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
        goto label_end_of_loop; \
      } \
    if (coding->src_multibyte \
        || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
      c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
    else \
      c = *src, bytes = 1; \
    if (!NILP (translation_table)) \
      c = translate_char (translation_table, c, -1, 0, 0); \
    src += bytes; \
  } while (0)


/* Produce a multibyte form of character C to `dst'. Jump to
   `label_end_of_loop' if there's not enough space at `dst'.

   If we are now in the middle of a composition sequence, the decoded
   character may be ALTCHAR (for the current composition). In that
   case, the character goes to coding->cmp_data->data instead of
   `dst'.

   This macro is used in decoding routines. */

#define EMIT_CHAR(c) \
  do { \
    if (! COMPOSING_P (coding) \
        || coding->composing == COMPOSITION_RELATIVE \
        || coding->composing == COMPOSITION_WITH_RULE) \
      { \
        int bytes = CHAR_BYTES (c); \
        if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
          { \
            coding->result = CODING_FINISH_INSUFFICIENT_DST; \
            goto label_end_of_loop; \
          } \
        dst += CHAR_STRING (c, dst); \
        coding->produced_char++; \
      } \
    \
    if (COMPOSING_P (coding) \
        && coding->composing != COMPOSITION_RELATIVE) \
      { \
        CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
        coding->composition_rule_follows \
          = coding->composing != COMPOSITION_WITH_ALTCHARS; \
      } \
  } while (0)


#define EMIT_ONE_BYTE(c) \
  do { \
    if (dst >= (dst_bytes ? dst_end : src)) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_DST; \
        goto label_end_of_loop; \
      } \
    *dst++ = c; \
  } while (0)

#define EMIT_TWO_BYTES(c1, c2) \
  do { \
    if (dst + 2 > (dst_bytes ? dst_end : src)) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_DST; \
        goto label_end_of_loop; \
      } \
    *dst++ = c1, *dst++ = c2; \
  } while (0)

#define EMIT_BYTES(from, to) \
  do { \
    if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
      { \
        coding->result = CODING_FINISH_INSUFFICIENT_DST; \
        goto label_end_of_loop; \
      } \
    while (from < to) \
      *dst++ = *from++; \
  } while (0)

\f
/*** 1. Preamble ***/

#ifdef emacs
#include <config.h>
#endif

#include <stdio.h>

#ifdef emacs

#include "lisp.h"
#include "buffer.h"
#include "charset.h"
#include "composite.h"
#include "ccl.h"
#include "coding.h"
#include "window.h"

#else /* not emacs */

#include "mulelib.h"

#endif /* not emacs */

Lisp_Object Qcoding_system, Qeol_type;
Lisp_Object Qbuffer_file_coding_system;
Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
Lisp_Object Qno_conversion, Qundecided;
Lisp_Object Qcoding_system_history;
Lisp_Object Qsafe_chars;
Lisp_Object Qvalid_codes;

extern Lisp_Object Qinsert_file_contents, Qwrite_region;
Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
Lisp_Object Qstart_process, Qopen_network_stream;
Lisp_Object Qtarget_idx;

Lisp_Object Vselect_safe_coding_system_function;

/* Mnemonic string for each format of end-of-line. */
Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
/* Mnemonic string to indicate format of end-of-line is not yet
   decided. */
Lisp_Object eol_mnemonic_undecided;

/* Format of end-of-line decided by system. This is CODING_EOL_LF on
   Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
int system_eol_type;

#ifdef emacs

Lisp_Object Vcoding_system_list, Vcoding_system_alist;

Lisp_Object Qcoding_system_p, Qcoding_system_error;

/* Coding system emacs-mule and raw-text are for converting only
   end-of-line format. */
Lisp_Object Qemacs_mule, Qraw_text;

/* Coding-systems are handed between Emacs Lisp programs and C internal
   routines by the following three variables. */
/* Coding-system for reading files and receiving data from process. */
Lisp_Object Vcoding_system_for_read;
/* Coding-system for writing files and sending data to process. */
Lisp_Object Vcoding_system_for_write;
/* Coding-system actually used in the latest I/O. */
Lisp_Object Vlast_coding_system_used;

/* A vector of length 256 which contains information about special
   Latin codes (especially for dealing with Microsoft codes). */
Lisp_Object Vlatin_extra_code_table;

/* Flag to inhibit code conversion of end-of-line format. */
int inhibit_eol_conversion;

/* Flag to inhibit ISO2022 escape sequence detection. */
int inhibit_iso_escape_detection;

/* Flag to make buffer-file-coding-system inherit from process-coding. */
int inherit_process_coding_system;

/* Coding system to be used to encode text for terminal display. */
struct coding_system terminal_coding;

/* Coding system to be used to encode text for terminal display when
   terminal coding system is nil. */
struct coding_system safe_terminal_coding;

/* Coding system of what is sent from terminal keyboard. */
struct coding_system keyboard_coding;

/* Default coding system to be used to write a file. */
struct coding_system default_buffer_file_coding;

Lisp_Object Vfile_coding_system_alist;
Lisp_Object Vprocess_coding_system_alist;
Lisp_Object Vnetwork_coding_system_alist;

Lisp_Object Vlocale_coding_system;

#endif /* emacs */

Lisp_Object Qcoding_category, Qcoding_category_index;

/* List of symbols `coding-category-xxx' ordered by priority. */
Lisp_Object Vcoding_category_list;

/* Table of coding categories (Lisp symbols). */
Lisp_Object Vcoding_category_table;

/* Table of symbol names for each coding-category. */
char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
  "coding-category-emacs-mule",
  "coding-category-sjis",
  "coding-category-iso-7",
  "coding-category-iso-7-tight",
  "coding-category-iso-8-1",
  "coding-category-iso-8-2",
  "coding-category-iso-7-else",
  "coding-category-iso-8-else",
  "coding-category-ccl",
  "coding-category-big5",
  "coding-category-utf-8",
  "coding-category-utf-16-be",
  "coding-category-utf-16-le",
  "coding-category-raw-text",
  "coding-category-binary"
};

/* Table of pointers to coding systems corresponding to each coding
   category. */
struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];

/* Table of coding category masks. Nth element is a mask for the
   coding category whose priority is Nth. */
static
int coding_priorities[CODING_CATEGORY_IDX_MAX];

/* Flag to tell if we look up translation table on character code
   conversion. */
Lisp_Object Venable_character_translation;
/* Standard translation table to look up on decoding (reading). */
Lisp_Object Vstandard_translation_table_for_decode;
/* Standard translation table to look up on encoding (writing). */
Lisp_Object Vstandard_translation_table_for_encode;

Lisp_Object Qtranslation_table;
Lisp_Object Qtranslation_table_id;
Lisp_Object Qtranslation_table_for_decode;
Lisp_Object Qtranslation_table_for_encode;

/* Alist of charsets vs revision number. */
Lisp_Object Vcharset_revision_alist;

/* Default coding systems used for process I/O. */
Lisp_Object Vdefault_process_coding_system;

/* Global flag to tell that we can't call post-read-conversion and
   pre-write-conversion functions. Usually the value is zero, but it
   is set to 1 temporarily while such functions are running. This is
   to avoid infinite recursive calls. */
static int inhibit_pre_post_conversion;

/* Char-table containing safe coding systems of each character. */
Lisp_Object Vchar_coding_system_table;
Lisp_Object Qchar_coding_system;

/* Return `safe-chars' property of coding system CODING. Don't check
   validity of CODING. */

Lisp_Object
coding_safe_chars (coding)
     struct coding_system *coding;
{
  Lisp_Object coding_spec, plist, safe_chars;

  coding_spec = Fget (coding->symbol, Qcoding_system);
  /* The property list is the element at index 3 of the spec vector. */
  plist = XVECTOR (coding_spec)->contents[3];
  safe_chars = Fplist_get (plist, Qsafe_chars);
  return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
}

#define CODING_SAFE_CHAR_P(safe_chars, c) \
  (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))

\f
/*** 2. Emacs' internal format (emacs-mule) handlers ***/

/* Emacs' internal format for encoding multiple character sets is a
   kind of multi-byte encoding, i.e. characters are encoded by
   variable-length sequences of one-byte codes.

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters of the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters of the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit code.

   The other characters are represented by a sequence of `base
   leading-code', optional `extended leading-code', and one or two
   `position-code's. The length of the sequence is determined by the
   base leading-code. Leading-code takes the range 0x80 through 0x9F,
   whereas extended leading-code and position-code take the range 0xA0
   through 0xFF. See `charset.h' for more details about leading-code
   and position-code.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9F + [0xA0..0xFF]+
   ---------------------------------------------

  */
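
/* Illustration only, not part of coding.c. A minimal sketch of the
   ASCII and 8-bit rules described above: how a single raw byte maps
   to its internal form. The function name and the SKETCH_ macro are
   hypothetical, and the value 0x9E is an assumption made for this
   example; the authoritative definition of
   LEADING_CODE_8_BIT_CONTROL is in `charset.h'. Characters of other
   charsets (leading-code plus position-codes) are out of scope. */

```c
#include <assert.h>

/* Assumed value, see the note above. */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Store the internal form of the raw byte C (0x00..0xFF) in BUF,
   which must have room for two bytes, and return the number of
   bytes stored. */
int
sketch_byte_to_internal (unsigned int c, unsigned char *buf)
{
  assert (c <= 0xFF);
  if (c >= 0x80 && c <= 0x9F)
    {
      /* eight-bit-control: leading code, then code + 0x20. */
      buf[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
      buf[1] = (unsigned char) (c + 0x20);
      return 2;
    }
  /* ASCII (0x00..0x7F) and eight-bit-graphic (0xA0..0xFF) are kept
     as a single byte. */
  buf[0] = (unsigned char) c;
  return 1;
}
```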

enum emacs_code_class_type emacs_code_class[256];

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in Emacs' internal format. If it is,
   return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */

int
detect_coding_emacs_mule (src, src_end)
     unsigned char *src, *src_end;
{
  unsigned char c;
  int composing = 0;
  /* Dummy for ONE_MORE_BYTE. */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  while (1)
    {
      ONE_MORE_BYTE (c);

      if (composing)
        {
          if (c < 0xA0)
            composing = 0;
          else if (c == 0xA0)
            {
              ONE_MORE_BYTE (c);
              c &= 0x7F;
            }
          else
            c -= 0x20;
        }

      if (c < 0x20)
        {
          if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
            return 0;
        }
      else if (c >= 0x80 && c < 0xA0)
        {
          if (c == 0x80)
            /* Old leading code for a composite character. */
            composing = 1;
          else
            {
              unsigned char *src_base = src - 1;
              int bytes;

              if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
                                               bytes))
                return 0;
              src = src_base + bytes;
            }
        }
    }
 label_end_of_loop:
  return CODING_CATEGORY_MASK_EMACS_MULE;
}


/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */

static void
decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
{
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code, or
     when there's not enough destination area to produce a
     character. */
  unsigned char *src_base;

  coding->produced_char = 0;
  while ((src_base = src) < src_end)
    {
      unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
      int bytes;

      if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
        {
          p = src;
          src += bytes;
        }
      else
        {
          bytes = CHAR_STRING (*src, tmp);
          p = tmp;
          src++;
        }
      if (dst + bytes >= (dst_bytes ? dst_end : src))
        {
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
          break;
        }
      while (bytes--) *dst++ = *p++;
      coding->produced_char++;
    }
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
}

#define encode_coding_emacs_mule(coding, source, destination, src_bytes, dst_bytes) \
  encode_eol (coding, source, destination, src_bytes, dst_bytes)


\f
/*** 3. ISO2022 handlers ***/

/* The following note describes the coding system ISO2022 briefly.
   Since the intention of this note is to help understand the
   functions in this file, some parts are NOT ACCURATE or OVERLY
   SIMPLIFIED. For thorough understanding, please refer to the
   original document of ISO2022.

   ISO2022 provides many mechanisms to encode several character sets
   in 7-bit and 8-bit environments. For 7-bit environments, all text
   is encoded using bytes less than 128. This may make the encoded
   text a little bit longer, but the text passes more easily through
   several gateways, some of which strip off the MSB (Most Significant
   Bit).

   There are two kinds of character sets: control character sets and
   graphic character sets. The former contain control characters such
   as `newline' and `escape' to provide control functions (control
   functions are also provided by escape sequences). The latter
   contain graphic characters such as 'A' and '-'. Emacs recognizes
   two control character sets and many graphic character sets.

   Graphic character sets are classified into one of the following
   four classes, according to the number of bytes (DIMENSION) and
   number of characters in one dimension (CHARS) of the set:
     - DIMENSION1_CHARS94
     - DIMENSION1_CHARS96
     - DIMENSION2_CHARS94
     - DIMENSION2_CHARS96

   In addition, each character set is assigned an identification tag,
   unique for each set, called "final character" (denoted as <F>
   hereafter). The <F> of each character set is decided by ECMA(*)
   when it is registered in ISO. The code range of <F> is 0x30..0x7F
   (0x30..0x3F are for private use only).

   Note (*): ECMA = European Computer Manufacturers Association

   Here are examples of graphic character set [NAME(<F>)]:
        o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
        o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
        o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
        o DIMENSION2_CHARS96 -- none for the moment

   A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
        C0 [0x00..0x1F] -- control character plane 0
        GL [0x20..0x7F] -- graphic character plane 0
        C1 [0x80..0x9F] -- control character plane 1
        GR [0xA0..0xFF] -- graphic character plane 1

   A control character set is directly designated and invoked to C0 or
   C1 by an escape sequence. The most common case is that:
   - ISO646's control character set is designated/invoked to C0, and
   - ISO6429's control character set is designated/invoked to C1,
   and usually these designations/invocations are omitted in encoded
   text. In a 7-bit environment, only C0 can be used, and a control
   character for C1 is encoded by an appropriate escape sequence to
   fit into the environment. All control characters for C1 are
   defined to have corresponding escape sequences.

   A graphic character set is at first designated to one of four
   graphic registers (G0 through G3), then these graphic registers are
   invoked to GL or GR. These designations and invocations can be
   done independently. The most common case is that G0 is invoked to
   GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
   these invocations and designations are omitted in encoded text.
   In a 7-bit environment, only GL can be used.

   When a graphic character set of CHARS94 is invoked to GL, codes
   0x20 and 0x7F of the GL area work as control characters SPACE and
   DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
   be used.

   There are two ways of invocation: locking-shift and single-shift.
   With locking-shift, the invocation lasts until the next different
   invocation, whereas with single-shift, the invocation affects the
   following character only and doesn't affect the locking-shift
   state. Invocations are done by the following control characters or
   escape sequences:

   ----------------------------------------------------------------------
   abbrev  function                 cntrl  escape seq  description
   ----------------------------------------------------------------------
   SI/LS0  (shift-in)               0x0F   none        invoke G0 into GL
   SO/LS1  (shift-out)              0x0E   none        invoke G1 into GL
   LS2     (locking-shift-2)        none   ESC 'n'     invoke G2 into GL
   LS3     (locking-shift-3)        none   ESC 'o'     invoke G3 into GL
   LS1R    (locking-shift-1 right)  none   ESC '~'     invoke G1 into GR (*)
   LS2R    (locking-shift-2 right)  none   ESC '}'     invoke G2 into GR (*)
   LS3R    (locking-shift-3 right)  none   ESC '|'     invoke G3 into GR (*)
   SS2     (single-shift-2)         0x8E   ESC 'N'     invoke G2 for one char
   SS3     (single-shift-3)         0x8F   ESC 'O'     invoke G3 for one char
   ----------------------------------------------------------------------
   (*) These are not used by any known coding system.

   Control characters for these functions are defined by macros
   ISO_CODE_XXX in `coding.h'.

   Designations are done by the following escape sequences:
   ----------------------------------------------------------------------
   escape sequence      description
   ----------------------------------------------------------------------
   ESC '(' <F>          designate DIMENSION1_CHARS94<F> to G0
   ESC ')' <F>          designate DIMENSION1_CHARS94<F> to G1
   ESC '*' <F>          designate DIMENSION1_CHARS94<F> to G2
   ESC '+' <F>          designate DIMENSION1_CHARS94<F> to G3
   ESC ',' <F>          designate DIMENSION1_CHARS96<F> to G0 (*)
   ESC '-' <F>          designate DIMENSION1_CHARS96<F> to G1
   ESC '.' <F>          designate DIMENSION1_CHARS96<F> to G2
   ESC '/' <F>          designate DIMENSION1_CHARS96<F> to G3
   ESC '$' '(' <F>      designate DIMENSION2_CHARS94<F> to G0 (**)
   ESC '$' ')' <F>      designate DIMENSION2_CHARS94<F> to G1
   ESC '$' '*' <F>      designate DIMENSION2_CHARS94<F> to G2
   ESC '$' '+' <F>      designate DIMENSION2_CHARS94<F> to G3
   ESC '$' ',' <F>      designate DIMENSION2_CHARS96<F> to G0 (*)
   ESC '$' '-' <F>      designate DIMENSION2_CHARS96<F> to G1
   ESC '$' '.' <F>      designate DIMENSION2_CHARS96<F> to G2
   ESC '$' '/' <F>      designate DIMENSION2_CHARS96<F> to G3
   ----------------------------------------------------------------------

   In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
   of dimension 1, chars 94, and final character <F>, etc...

   Note (*): Although these designations are not allowed in ISO2022,
   Emacs accepts them on decoding, and produces them on encoding
   CHARS96 character sets in a coding system which is characterized as
   7-bit environment, non-locking-shift, and non-single-shift.

   Note (**): If <F> is '@', 'A', or 'B', the intermediate character
   '(' can be omitted. We refer to this as "short-form" hereafter.

775 Now you may notice that there are a lot of ways for encoding the
39787efd
KH
776 same multilingual text in ISO2022. Actually, there exist many
777 coding systems such as Compound Text (used in X11's inter client
778 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR
779 (used in Korean internet), EUC (Extended UNIX Code, used in Asian
4ed46869
KH
780 localized platforms), and all of these are variants of ISO2022.
781
782 In addition to the above, Emacs handles two more kinds of escape
783 sequences: ISO6429's direction specification and Emacs' private
784 sequence for specifying character composition.
785
39787efd 786 ISO6429's direction specification takes the following form:
4ed46869
KH
787 o CSI ']' -- end of the current direction
788 o CSI '0' ']' -- end of the current direction
789 o CSI '1' ']' -- start of left-to-right text
790 o CSI '2' ']' -- start of right-to-left text
791 The control character CSI (0x9B: control sequence introducer) is
39787efd
KH
792 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
793
794 Character composition specification takes the following form:
ec6d2bb8
KH
795 o ESC '0' -- start relative composition
796 o ESC '1' -- end composition
797 o ESC '2' -- start rule-base composition (*)
798 o ESC '3' -- start relative composition with alternate chars (**)
799 o ESC '4' -- start rule-base composition with alternate chars (**)
b73bfc1c
KH
800 Since these are not standard escape sequences of any ISO standard,
801 their use with these meanings is restricted to Emacs.
ec6d2bb8 802
b73bfc1c
KH
803 (*) This form is used only in Emacs 20.5 and the older versions,
804 but the newer versions can safely decode it.
805 (**) This form is used only in Emacs 21.1 and the newer versions,
806 and the older versions can't decode it.
ec6d2bb8 807
b73bfc1c
KH
808 Here's a list of example usages of these composition escape
809 sequences (categorized by `enum composition_method').
ec6d2bb8 810
b73bfc1c 811 COMPOSITION_RELATIVE:
ec6d2bb8 812 ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 813 COMPOSITION_WITH_RULE:
ec6d2bb8 814 ESC 2 CHAR [ RULE CHAR ] ESC 1
b73bfc1c 815 COMPOSITION_WITH_ALTCHARS:
ec6d2bb8 816 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 817 COMPOSITION_WITH_RULE_ALTCHARS:
ec6d2bb8 818 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
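/* A minimal sketch of the COMPOSITION_RELATIVE form listed above
   (ESC 0 CHAR [ CHAR ] ESC 1).  The helper name
   `wrap_relative_composition' and its buffer interface are
   hypothetical and exist only for illustration; Emacs' own encoder is
   not organized this way.  */

```c
#include <stddef.h>

#define SKETCH_ESC 0x1B		/* the ISO 2022 escape byte */

/* Hypothetical helper: copy the N bytes at CHARS into DST, wrapped in
   ESC '0' ... ESC '1'.  Return the number of bytes written, or 0 if
   DST_SIZE is too small to hold the whole sequence.  */
size_t
wrap_relative_composition (const unsigned char *chars, size_t n,
			   unsigned char *dst, size_t dst_size)
{
  size_t i;

  if (dst_size < n + 4)
    return 0;
  dst[0] = SKETCH_ESC;
  dst[1] = '0';			/* start relative composition */
  for (i = 0; i < n; i++)
    dst[2 + i] = chars[i];
  dst[2 + n] = SKETCH_ESC;
  dst[3 + n] = '1';		/* end composition */
  return n + 4;
}
```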
4ed46869
KH
819
820enum iso_code_class_type iso_code_class[256];
821
05e6f5dc
KH
822#define CHARSET_OK(idx, charset, c) \
823 (coding_system_table[idx] \
824 && (charset == CHARSET_ASCII \
825 || (safe_chars = coding_safe_chars (coding_system_table[idx]), \
826 CODING_SAFE_CHAR_P (safe_chars, c))) \
827 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
828 charset) \
829 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
830
831#define SHIFT_OUT_OK(idx) \
832 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
833
4ed46869
KH
834/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
835 Check if a text is encoded in ISO2022. If it is, returns an
836 integer in which the appropriate flag bits, any of:
837 CODING_CATEGORY_MASK_ISO_7
d46c5b12 838 CODING_CATEGORY_MASK_ISO_7_TIGHT
4ed46869
KH
839 CODING_CATEGORY_MASK_ISO_8_1
840 CODING_CATEGORY_MASK_ISO_8_2
7717c392
KH
841 CODING_CATEGORY_MASK_ISO_7_ELSE
842 CODING_CATEGORY_MASK_ISO_8_ELSE
4ed46869
KH
843 are set. If a code which should never appear in ISO2022 is found,
844 returns 0. */
845
846int
847detect_coding_iso2022 (src, src_end)
848 unsigned char *src, *src_end;
849{
d46c5b12
KH
850 int mask = CODING_CATEGORY_MASK_ISO;
851 int mask_found = 0;
f46869e4 852 int reg[4], shift_out = 0, single_shifting = 0;
d46c5b12 853 int c, c1, i, charset;
b73bfc1c
KH
854 /* Dummy for ONE_MORE_BYTE. */
855 struct coding_system dummy_coding;
856 struct coding_system *coding = &dummy_coding;
05e6f5dc 857 Lisp_Object safe_chars;
3f003981 858
d46c5b12 859 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
3f003981 860 while (mask && src < src_end)
4ed46869 861 {
b73bfc1c 862 ONE_MORE_BYTE (c);
4ed46869
KH
863 switch (c)
864 {
865 case ISO_CODE_ESC:
74383408
KH
866 if (inhibit_iso_escape_detection)
867 break;
f46869e4 868 single_shifting = 0;
b73bfc1c 869 ONE_MORE_BYTE (c);
d46c5b12 870 if (c >= '(' && c <= '/')
4ed46869 871 {
bf9cdd4e 872 /* Designation sequence for a charset of dimension 1. */
b73bfc1c 873 ONE_MORE_BYTE (c1);
d46c5b12
KH
874 if (c1 < ' ' || c1 >= 0x80
875 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
876 /* Invalid designation sequence. Just ignore. */
877 break;
878 reg[(c - '(') % 4] = charset;
bf9cdd4e
KH
879 }
880 else if (c == '$')
881 {
882 /* Designation sequence for a charset of dimension 2. */
b73bfc1c 883 ONE_MORE_BYTE (c);
bf9cdd4e
KH
884 if (c >= '@' && c <= 'B')
885 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
d46c5b12 886 reg[0] = charset = iso_charset_table[1][0][c];
bf9cdd4e 887 else if (c >= '(' && c <= '/')
bcf26d6a 888 {
b73bfc1c 889 ONE_MORE_BYTE (c1);
d46c5b12
KH
890 if (c1 < ' ' || c1 >= 0x80
891 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
892 /* Invalid designation sequence. Just ignore. */
893 break;
894 reg[(c - '(') % 4] = charset;
bcf26d6a 895 }
bf9cdd4e 896 else
d46c5b12
KH
897 /* Invalid designation sequence. Just ignore. */
898 break;
899 }
ae9ff118 900 else if (c == 'N' || c == 'O')
d46c5b12 901 {
ae9ff118
KH
902 /* ESC <Fe> for SS2 or SS3. */
903 mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
d46c5b12 904 break;
4ed46869 905 }
ec6d2bb8
KH
906 else if (c >= '0' && c <= '4')
907 {
908 /* ESC <Fp> for start/end composition. */
909 mask_found |= CODING_CATEGORY_MASK_ISO;
910 break;
911 }
bf9cdd4e 912 else
d46c5b12
KH
913 /* Invalid escape sequence. Just ignore. */
914 break;
915
916 /* We found a valid designation sequence for CHARSET. */
917 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
05e6f5dc
KH
918 c = MAKE_CHAR (charset, 0, 0);
919 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
d46c5b12
KH
920 mask_found |= CODING_CATEGORY_MASK_ISO_7;
921 else
922 mask &= ~CODING_CATEGORY_MASK_ISO_7;
05e6f5dc 923 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
d46c5b12
KH
924 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
925 else
926 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
05e6f5dc 927 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
ae9ff118
KH
928 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
929 else
d46c5b12 930 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
05e6f5dc 931 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
ae9ff118
KH
932 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
933 else
d46c5b12 934 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
4ed46869
KH
935 break;
936
4ed46869 937 case ISO_CODE_SO:
74383408
KH
938 if (inhibit_iso_escape_detection)
939 break;
f46869e4 940 single_shifting = 0;
d46c5b12
KH
941 if (shift_out == 0
942 && (reg[1] >= 0
943 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
944 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
945 {
946 /* Locking shift out. */
947 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
948 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
949 }
e0e989f6
KH
950 break;
951
d46c5b12 952 case ISO_CODE_SI:
74383408
KH
953 if (inhibit_iso_escape_detection)
954 break;
f46869e4 955 single_shifting = 0;
d46c5b12
KH
956 if (shift_out == 1)
957 {
958 /* Locking shift in. */
959 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
960 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
961 }
962 break;
963
4ed46869 964 case ISO_CODE_CSI:
f46869e4 965 single_shifting = 0;
4ed46869
KH
966 case ISO_CODE_SS2:
967 case ISO_CODE_SS3:
3f003981
KH
968 {
969 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
970
74383408
KH
971 if (inhibit_iso_escape_detection)
972 break;
70c22245
KH
973 if (c != ISO_CODE_CSI)
974 {
d46c5b12
KH
975 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
976 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 977 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
978 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
979 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 980 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
f46869e4 981 single_shifting = 1;
70c22245 982 }
3f003981
KH
983 if (VECTORP (Vlatin_extra_code_table)
984 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
985 {
d46c5b12
KH
986 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
987 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 988 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
989 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
990 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
991 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
992 }
993 mask &= newmask;
d46c5b12 994 mask_found |= newmask;
3f003981
KH
995 }
996 break;
4ed46869
KH
997
998 default:
999 if (c < 0x80)
f46869e4
KH
1000 {
1001 single_shifting = 0;
1002 break;
1003 }
4ed46869 1004 else if (c < 0xA0)
c4825358 1005 {
f46869e4 1006 single_shifting = 0;
3f003981
KH
1007 if (VECTORP (Vlatin_extra_code_table)
1008 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
c4825358 1009 {
3f003981
KH
1010 int newmask = 0;
1011
d46c5b12
KH
1012 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1013 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1014 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1015 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1016 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1017 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1018 mask &= newmask;
d46c5b12 1019 mask_found |= newmask;
c4825358 1020 }
3f003981
KH
1021 else
1022 return 0;
c4825358 1023 }
4ed46869
KH
1024 else
1025 {
d46c5b12 1026 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
7717c392 1027 | CODING_CATEGORY_MASK_ISO_7_ELSE);
d46c5b12 1028 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
f46869e4
KH
1029 /* Check the length of succeeding codes of the range
1030 0xA0..0xFF. If the byte length is odd, we exclude
1031 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1032 when we are not single shifting. */
b73bfc1c
KH
1033 if (!single_shifting
1034 && mask & CODING_CATEGORY_MASK_ISO_8_2)
f46869e4 1035 {
e17de821 1036 int i = 1;
b73bfc1c
KH
1037 while (src < src_end)
1038 {
1039 ONE_MORE_BYTE (c);
1040 if (c < 0xA0)
1041 break;
1042 i++;
1043 }
1044
1045 if (i & 1 && src < src_end)
f46869e4
KH
1046 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1047 else
1048 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
1049 }
4ed46869
KH
1050 }
1051 break;
1052 }
1053 }
b73bfc1c 1054 label_end_of_loop:
d46c5b12 1055 return (mask & mask_found);
4ed46869
KH
1056}
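/* A simplified sketch of the parity check that detect_coding_iso2022
   applies to runs of bytes in 0xA0..0xFF.  The function name
   `odd_high_run_p' is hypothetical.  Unlike the loop above, this
   variant treats a run as conclusive whenever some byte below 0xA0
   terminates it, so it is an approximation of the original condition,
   not a copy of it.  */

```c
#include <stddef.h>

/* Return 1 if the run of bytes >= 0xA0 starting at SRC has odd length
   and is terminated by a lower byte before SRC_END.  A 2-byte
   (ISO_8_2 style) encoding cannot produce such a run.  */
int
odd_high_run_p (const unsigned char *src, const unsigned char *src_end)
{
  size_t len = 0;

  while (src < src_end && *src >= 0xA0)
    {
      len++;
      src++;
    }
  /* A run that reaches the end of input may simply be cut short, so
     it is not counted as evidence.  */
  return (len & 1) && src < src_end;
}
```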
1057
b73bfc1c
KH
1058/* Decode a character of which charset is CHARSET, the 1st position
1059 code is C1, the 2nd position code is C2, and return the decoded
1060 character code. If the variable `translation_table' is non-nil,
1061 return the translated code. */
ec6d2bb8 1062
b73bfc1c
KH
1063#define DECODE_ISO_CHARACTER(charset, c1, c2) \
1064 (NILP (translation_table) \
1065 ? MAKE_CHAR (charset, c1, c2) \
1066 : translate_char (translation_table, -1, charset, c1, c2))
4ed46869
KH
1067
1068/* Set designation state into CODING. */
d46c5b12
KH
1069#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1070 do { \
05e6f5dc 1071 int charset, c; \
944bd420
KH
1072 \
1073 if (final_char < '0' || final_char >= 128) \
1074 goto label_invalid_code; \
1075 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1076 make_number (chars), \
1077 make_number (final_char)); \
05e6f5dc 1078 c = MAKE_CHAR (charset, 0, 0); \
d46c5b12 1079 if (charset >= 0 \
704c5781 1080 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
05e6f5dc 1081 || CODING_SAFE_CHAR_P (safe_chars, c))) \
d46c5b12
KH
1082 { \
1083 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1084 && reg == 0 \
1085 && charset == CHARSET_ASCII) \
1086 { \
1087 /* We should insert this designation sequence as is so \
1088 that it is surely written back to a file. */ \
1089 coding->spec.iso2022.last_invalid_designation_register = -1; \
1090 goto label_invalid_code; \
1091 } \
1092 coding->spec.iso2022.last_invalid_designation_register = -1; \
1093 if ((coding->mode & CODING_MODE_DIRECTION) \
1094 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1095 charset = CHARSET_REVERSE_CHARSET (charset); \
1096 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1097 } \
1098 else \
1099 { \
1100 coding->spec.iso2022.last_invalid_designation_register = reg; \
1101 goto label_invalid_code; \
1102 } \
4ed46869
KH
1103 } while (0)
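/* A sketch of how an intermediate byte maps to the REG and CHARS
   arguments of DECODE_DESIGNATION, following the ranges that
   decode_coding_iso2022 tests (0x28..0x2B for 94-character sets,
   0x2C..0x2F for 96-character sets).  The struct and function names
   are hypothetical.  */

```c
/* Hypothetical result type: graphic register (0..3 for G0..G3) and
   the size of the character set (94 or 96).  */
struct designation_target
{
  int reg;
  int chars;
};

/* Fill *OUT from the intermediate byte C1 and return 1, or return 0
   if C1 is not a designation intermediate.  */
int
designation_target_from_intermediate (int c1, struct designation_target *out)
{
  if (c1 >= 0x28 && c1 <= 0x2B)	/* '(' ')' '*' '+' */
    {
      out->reg = c1 - 0x28;
      out->chars = 94;
      return 1;
    }
  if (c1 >= 0x2C && c1 <= 0x2F)	/* ',' '-' '.' '/' */
    {
      out->reg = c1 - 0x2C;
      out->chars = 96;
      return 1;
    }
  return 0;
}
```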
1104
ec6d2bb8
KH
1105/* Allocate a memory block for storing information about compositions.
1106 The block is chained to the already allocated blocks. */
d46c5b12 1107
33fb63eb 1108void
ec6d2bb8 1109coding_allocate_composition_data (coding, char_offset)
d46c5b12 1110 struct coding_system *coding;
ec6d2bb8 1111 int char_offset;
d46c5b12 1112{
ec6d2bb8
KH
1113 struct composition_data *cmp_data
1114 = (struct composition_data *) xmalloc (sizeof *cmp_data);
1115
1116 cmp_data->char_offset = char_offset;
1117 cmp_data->used = 0;
1118 cmp_data->prev = coding->cmp_data;
1119 cmp_data->next = NULL;
1120 if (coding->cmp_data)
1121 coding->cmp_data->next = cmp_data;
1122 coding->cmp_data = cmp_data;
1123 coding->cmp_data_start = 0;
1124}
d46c5b12 1125
ec6d2bb8
KH
1126/* Record the starting position START and METHOD of one composition. */
1127
1128#define CODING_ADD_COMPOSITION_START(coding, start, method) \
1129 do { \
1130 struct composition_data *cmp_data = coding->cmp_data; \
1131 int *data = cmp_data->data + cmp_data->used; \
1132 coding->cmp_data_start = cmp_data->used; \
1133 data[0] = -1; \
1134 data[1] = cmp_data->char_offset + start; \
1135 data[3] = (int) method; \
1136 cmp_data->used += 4; \
1137 } while (0)
1138
1139/* Record the ending position END of the current composition. */
1140
1141#define CODING_ADD_COMPOSITION_END(coding, end) \
1142 do { \
1143 struct composition_data *cmp_data = coding->cmp_data; \
1144 int *data = cmp_data->data + coding->cmp_data_start; \
1145 data[0] = cmp_data->used - coding->cmp_data_start; \
1146 data[2] = cmp_data->char_offset + end; \
1147 } while (0)
1148
1149/* Record one COMPONENT (alternate character or composition rule). */
1150
1151#define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
1152 (coding->cmp_data->data[coding->cmp_data->used++] = component)
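/* A sketch of the record layout that the three macros above build:
   data[0] is the record length (filled in at the end), data[1] the
   start position, data[2] the end position, data[3] the method, and
   any components follow.  `struct sketch_cmp_data' and the
   `sketch_*' functions are hypothetical stand-ins with a fixed-size
   array and no bounds checking; they are not the real
   `struct composition_data'.  */

```c
#define SKETCH_DATA_SIZE 64

struct sketch_cmp_data
{
  int char_offset;		/* added to every recorded position */
  int used;			/* number of ints used in DATA */
  int start;			/* index of the record being built */
  int data[SKETCH_DATA_SIZE];
};

void
sketch_add_start (struct sketch_cmp_data *d, int start, int method)
{
  int *data = d->data + d->used;

  d->start = d->used;
  data[0] = -1;			/* length is not known yet */
  data[1] = d->char_offset + start;
  data[3] = method;
  d->used += 4;
}

void
sketch_add_component (struct sketch_cmp_data *d, int component)
{
  d->data[d->used++] = component;
}

void
sketch_add_end (struct sketch_cmp_data *d, int end)
{
  int *data = d->data + d->start;

  data[0] = d->used - d->start;	/* header plus components */
  data[2] = d->char_offset + end;
}
```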
1153
1154/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4. */
1155
33fb63eb
KH
1156#define DECODE_COMPOSITION_START(c1) \
1157 do { \
1158 if (coding->composing == COMPOSITION_DISABLED) \
1159 { \
1160 *dst++ = ISO_CODE_ESC; \
1161 *dst++ = c1 & 0x7f; \
1162 coding->produced_char += 2; \
1163 } \
1164 else if (!COMPOSING_P (coding)) \
1165 { \
1166 /* This is surely the start of a composition. We must be sure \
1167 that coding->cmp_data has enough space to store the \
1168 information about the composition. If not, terminate the \
1169 current decoding loop, allocate one more memory block for \
1170 coding->cmp_data in the caller, then start the decoding \
1171 loop again. We can't allocate memory here directly because \
1172 it may cause buffer/string relocation. */ \
1173 if (!coding->cmp_data \
1174 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1175 >= COMPOSITION_DATA_SIZE)) \
1176 { \
1177 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1178 goto label_end_of_loop; \
1179 } \
1180 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1181 : c1 == '2' ? COMPOSITION_WITH_RULE \
1182 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1183 : COMPOSITION_WITH_RULE_ALTCHARS); \
1184 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1185 coding->composing); \
1186 coding->composition_rule_follows = 0; \
1187 } \
1188 else \
1189 { \
1190 /* We are already handling a composition. If the method is \
1191 the following two, the codes following the current escape \
1192 sequence are actual characters stored in a buffer. */ \
1193 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1194 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1195 { \
1196 coding->composing = COMPOSITION_RELATIVE; \
1197 coding->composition_rule_follows = 0; \
1198 } \
1199 } \
ec6d2bb8
KH
1200 } while (0)
1201
1202/* Handle composition end sequence ESC 1. */
1203
1204#define DECODE_COMPOSITION_END(c1) \
1205 do { \
1206 if (coding->composing == COMPOSITION_DISABLED) \
1207 { \
1208 *dst++ = ISO_CODE_ESC; \
1209 *dst++ = c1; \
1210 coding->produced_char += 2; \
1211 } \
1212 else \
1213 { \
1214 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1215 coding->composing = COMPOSITION_NO; \
1216 } \
1217 } while (0)
1218
1219/* Decode a composition rule from the byte C1 (and maybe one more byte
1220 from SRC) and store one encoded composition rule in
1221 coding->cmp_data. */
1222
1223#define DECODE_COMPOSITION_RULE(c1) \
1224 do { \
1225 int rule = 0; \
1226 (c1) -= 32; \
1227 if (c1 < 81) /* old format (before ver.21) */ \
1228 { \
1229 int gref = (c1) / 9; \
1230 int nref = (c1) % 9; \
1231 if (gref == 4) gref = 10; \
1232 if (nref == 4) nref = 10; \
1233 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1234 } \
b73bfc1c 1235 else if (c1 < 93) /* new format (after ver.21) */ \
ec6d2bb8
KH
1236 { \
1237 ONE_MORE_BYTE (c2); \
1238 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1239 } \
1240 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1241 coding->composition_rule_follows = 0; \
1242 } while (0)
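/* A sketch of the old-format branch of DECODE_COMPOSITION_RULE: the
   byte, less 32, is split into two base-9 digits, and the digit 4 is
   remapped to 10 in each.  The function name `decode_old_rule' is
   hypothetical, and the final packing done by COMPOSITION_ENCODE_RULE
   is deliberately left out because its definition is not shown
   here.  */

```c
/* Decode BYTE as an old-format composition rule.  Store the global
   and new reference points in *GREF and *NREF and return 1, or
   return 0 if BYTE is outside the old-format range.  */
int
decode_old_rule (int byte, int *gref, int *nref)
{
  int c = byte - 32;

  if (c < 0 || c >= 81)
    return 0;
  *gref = c / 9;
  *nref = c % 9;
  if (*gref == 4)
    *gref = 10;
  if (*nref == 4)
    *nref = 10;
  return 1;
}
```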
88993dfd 1243
d46c5b12 1244
4ed46869
KH
1245/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1246
b73bfc1c 1247static void
d46c5b12 1248decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
1249 struct coding_system *coding;
1250 unsigned char *source, *destination;
1251 int src_bytes, dst_bytes;
4ed46869
KH
1252{
1253 unsigned char *src = source;
1254 unsigned char *src_end = source + src_bytes;
1255 unsigned char *dst = destination;
1256 unsigned char *dst_end = destination + dst_bytes;
4ed46869
KH
1257 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1258 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1259 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
b73bfc1c
KH
1260 /* SRC_BASE remembers the start position in source in each loop.
1261 The loop will be exited when there's not enough source code
1262 (within macro ONE_MORE_BYTE), or when there's not enough
1263 destination area to produce a character (within macro
1264 EMIT_CHAR). */
1265 unsigned char *src_base;
1266 int c, charset;
1267 Lisp_Object translation_table;
05e6f5dc
KH
1268 Lisp_Object safe_chars;
1269
1270 safe_chars = coding_safe_chars (coding);
bdd9fb48 1271
b73bfc1c
KH
1272 if (NILP (Venable_character_translation))
1273 translation_table = Qnil;
1274 else
1275 {
1276 translation_table = coding->translation_table_for_decode;
1277 if (NILP (translation_table))
1278 translation_table = Vstandard_translation_table_for_decode;
1279 }
4ed46869 1280
b73bfc1c
KH
1281 coding->result = CODING_FINISH_NORMAL;
1282
1283 while (1)
4ed46869 1284 {
b73bfc1c
KH
1285 int c1, c2;
1286
1287 src_base = src;
1288 ONE_MORE_BYTE (c1);
4ed46869 1289
ec6d2bb8 1290 /* We produce no character or one character. */
4ed46869
KH
1291 switch (iso_code_class [c1])
1292 {
1293 case ISO_0x20_or_0x7F:
ec6d2bb8
KH
1294 if (COMPOSING_P (coding) && coding->composition_rule_follows)
1295 {
1296 DECODE_COMPOSITION_RULE (c1);
b73bfc1c 1297 continue;
ec6d2bb8
KH
1298 }
1299 if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
4ed46869
KH
1300 {
1301 /* This is SPACE or DEL. */
b73bfc1c 1302 charset = CHARSET_ASCII;
4ed46869
KH
1303 break;
1304 }
1305 /* This is a graphic character, so fall through ... */
1306
1307 case ISO_graphic_plane_0:
ec6d2bb8 1308 if (COMPOSING_P (coding) && coding->composition_rule_follows)
b73bfc1c
KH
1309 {
1310 DECODE_COMPOSITION_RULE (c1);
1311 continue;
1312 }
1313 charset = charset0;
4ed46869
KH
1314 break;
1315
1316 case ISO_0xA0_or_0xFF:
d46c5b12
KH
1317 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1318 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
fb88bf2d 1319 goto label_invalid_code;
4ed46869
KH
1320 /* This is a graphic character, we fall down ... */
1321
1322 case ISO_graphic_plane_1:
b73bfc1c 1323 if (charset1 < 0)
fb88bf2d 1324 goto label_invalid_code;
b73bfc1c 1325 charset = charset1;
4ed46869
KH
1326 break;
1327
b73bfc1c 1328 case ISO_control_0:
ec6d2bb8
KH
1329 if (COMPOSING_P (coding))
1330 DECODE_COMPOSITION_END ('1');
1331
4ed46869
KH
1332 /* All ISO2022 control characters in this class have the
1333 same representation in Emacs internal format. */
d46c5b12
KH
1334 if (c1 == '\n'
1335 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1336 && (coding->eol_type == CODING_EOL_CR
1337 || coding->eol_type == CODING_EOL_CRLF))
1338 {
b73bfc1c
KH
1339 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1340 goto label_end_of_loop;
d46c5b12 1341 }
b73bfc1c 1342 charset = CHARSET_ASCII;
4ed46869
KH
1343 break;
1344
b73bfc1c
KH
1345 case ISO_control_1:
1346 if (COMPOSING_P (coding))
1347 DECODE_COMPOSITION_END ('1');
1348 goto label_invalid_code;
1349
4ed46869 1350 case ISO_carriage_return:
ec6d2bb8
KH
1351 if (COMPOSING_P (coding))
1352 DECODE_COMPOSITION_END ('1');
1353
4ed46869 1354 if (coding->eol_type == CODING_EOL_CR)
b73bfc1c 1355 c1 = '\n';
4ed46869
KH
1356 else if (coding->eol_type == CODING_EOL_CRLF)
1357 {
1358 ONE_MORE_BYTE (c1);
b73bfc1c 1359 if (c1 != ISO_CODE_LF)
4ed46869 1360 {
d46c5b12
KH
1361 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1362 {
b73bfc1c
KH
1363 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1364 goto label_end_of_loop;
d46c5b12 1365 }
4ed46869 1366 src--;
b73bfc1c 1367 c1 = '\r';
4ed46869
KH
1368 }
1369 }
b73bfc1c 1370 charset = CHARSET_ASCII;
4ed46869
KH
1371 break;
1372
1373 case ISO_shift_out:
d46c5b12
KH
1374 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1375 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1376 goto label_invalid_code;
4ed46869
KH
1377 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
1378 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1379 continue;
4ed46869
KH
1380
1381 case ISO_shift_in:
d46c5b12
KH
1382 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1383 goto label_invalid_code;
4ed46869
KH
1384 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
1385 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1386 continue;
4ed46869
KH
1387
1388 case ISO_single_shift_2_7:
1389 case ISO_single_shift_2:
d46c5b12
KH
1390 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1391 goto label_invalid_code;
4ed46869
KH
1392 /* SS2 is handled as an escape sequence of ESC 'N' */
1393 c1 = 'N';
1394 goto label_escape_sequence;
1395
1396 case ISO_single_shift_3:
d46c5b12
KH
1397 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1398 goto label_invalid_code;
4ed46869
KH
1399 /* SS3 is handled as an escape sequence of ESC 'O' */
1400 c1 = 'O';
1401 goto label_escape_sequence;
1402
1403 case ISO_control_sequence_introducer:
1404 /* CSI is handled as an escape sequence of ESC '[' ... */
1405 c1 = '[';
1406 goto label_escape_sequence;
1407
1408 case ISO_escape:
1409 ONE_MORE_BYTE (c1);
1410 label_escape_sequence:
1411 /* Escape sequences handled by Emacs are invocation,
1412 designation, direction specification, and character
1413 composition specification. */
1414 switch (c1)
1415 {
1416 case '&': /* revision of following character set */
1417 ONE_MORE_BYTE (c1);
1418 if (!(c1 >= '@' && c1 <= '~'))
d46c5b12 1419 goto label_invalid_code;
4ed46869
KH
1420 ONE_MORE_BYTE (c1);
1421 if (c1 != ISO_CODE_ESC)
d46c5b12 1422 goto label_invalid_code;
4ed46869
KH
1423 ONE_MORE_BYTE (c1);
1424 goto label_escape_sequence;
1425
1426 case '$': /* designation of 2-byte character set */
d46c5b12
KH
1427 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1428 goto label_invalid_code;
4ed46869
KH
1429 ONE_MORE_BYTE (c1);
1430 if (c1 >= '@' && c1 <= 'B')
1431 { /* designation of JISX0208.1978, GB2312.1980,
88993dfd 1432 or JISX0208.1980 */
4ed46869
KH
1433 DECODE_DESIGNATION (0, 2, 94, c1);
1434 }
1435 else if (c1 >= 0x28 && c1 <= 0x2B)
1436 { /* designation of DIMENSION2_CHARS94 character set */
1437 ONE_MORE_BYTE (c2);
1438 DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
1439 }
1440 else if (c1 >= 0x2C && c1 <= 0x2F)
1441 { /* designation of DIMENSION2_CHARS96 character set */
1442 ONE_MORE_BYTE (c2);
1443 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
1444 }
1445 else
d46c5b12 1446 goto label_invalid_code;
b73bfc1c
KH
1447 /* We must update these variables now. */
1448 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1449 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1450 continue;
4ed46869
KH
1451
1452 case 'n': /* invocation of locking-shift-2 */
d46c5b12
KH
1453 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1454 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1455 goto label_invalid_code;
4ed46869 1456 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
e0e989f6 1457 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1458 continue;
4ed46869
KH
1459
1460 case 'o': /* invocation of locking-shift-3 */
d46c5b12
KH
1461 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1462 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1463 goto label_invalid_code;
4ed46869 1464 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
e0e989f6 1465 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1466 continue;
4ed46869
KH
1467
1468 case 'N': /* invocation of single-shift-2 */
d46c5b12
KH
1469 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1470 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1471 goto label_invalid_code;
4ed46869 1472 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
b73bfc1c 1473 ONE_MORE_BYTE (c1);
e7046a18
KH
1474 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1475 goto label_invalid_code;
4ed46869
KH
1476 break;
1477
1478 case 'O': /* invocation of single-shift-3 */
d46c5b12
KH
1479 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1480 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1481 goto label_invalid_code;
4ed46869 1482 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
b73bfc1c 1483 ONE_MORE_BYTE (c1);
e7046a18
KH
1484 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1485 goto label_invalid_code;
4ed46869
KH
1486 break;
1487
ec6d2bb8
KH
1488 case '0': case '2': case '3': case '4': /* start composition */
1489 DECODE_COMPOSITION_START (c1);
b73bfc1c 1490 continue;
4ed46869 1491
ec6d2bb8
KH
1492 case '1': /* end composition */
1493 DECODE_COMPOSITION_END (c1);
b73bfc1c 1494 continue;
4ed46869
KH
1495
1496 case '[': /* specification of direction */
d46c5b12
KH
1497 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
1498 goto label_invalid_code;
4ed46869 1499 /* For the moment, nested direction is not supported.
d46c5b12
KH
1500 So, `coding->mode & CODING_MODE_DIRECTION' zero means
1501 left-to-right, and nonzero means right-to-left. */
4ed46869
KH
1502 ONE_MORE_BYTE (c1);
1503 switch (c1)
1504 {
1505 case ']': /* end of the current direction */
d46c5b12 1506 coding->mode &= ~CODING_MODE_DIRECTION; break;
4ed46869
KH
1507
1508 case '0': /* end of the current direction */
1509 case '1': /* start of left-to-right direction */
1510 ONE_MORE_BYTE (c1);
1511 if (c1 == ']')
d46c5b12 1512 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869 1513 else
d46c5b12 1514 goto label_invalid_code;
4ed46869
KH
1515 break;
1516
1517 case '2': /* start of right-to-left direction */
1518 ONE_MORE_BYTE (c1);
1519 if (c1 == ']')
d46c5b12 1520 coding->mode |= CODING_MODE_DIRECTION;
4ed46869 1521 else
d46c5b12 1522 goto label_invalid_code;
4ed46869
KH
1523 break;
1524
1525 default:
d46c5b12 1526 goto label_invalid_code;
4ed46869 1527 }
b73bfc1c 1528 continue;
4ed46869
KH
1529
1530 default:
d46c5b12
KH
1531 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1532 goto label_invalid_code;
4ed46869
KH
1533 if (c1 >= 0x28 && c1 <= 0x2B)
1534 { /* designation of DIMENSION1_CHARS94 character set */
1535 ONE_MORE_BYTE (c2);
1536 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
1537 }
1538 else if (c1 >= 0x2C && c1 <= 0x2F)
1539 { /* designation of DIMENSION1_CHARS96 character set */
1540 ONE_MORE_BYTE (c2);
1541 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
1542 }
1543 else
b73bfc1c
KH
1544 goto label_invalid_code;
1545 /* We must update these variables now. */
1546 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1547 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1548 continue;
4ed46869 1549 }
b73bfc1c 1550 }
4ed46869 1551
b73bfc1c
KH
1552 /* Now we know CHARSET and 1st position code C1 of a character.
1553 Produce a multibyte sequence for that character while getting
1554 2nd position code C2 if necessary. */
1555 if (CHARSET_DIMENSION (charset) == 2)
1556 {
1557 ONE_MORE_BYTE (c2);
1558 if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
1559 /* C2 is not in a valid range. */
1560 goto label_invalid_code;
4ed46869 1561 }
b73bfc1c
KH
1562 c = DECODE_ISO_CHARACTER (charset, c1, c2);
1563 EMIT_CHAR (c);
4ed46869
KH
1564 continue;
1565
b73bfc1c
KH
1566 label_invalid_code:
1567 coding->errors++;
1568 if (COMPOSING_P (coding))
1569 DECODE_COMPOSITION_END ('1');
4ed46869 1570 src = src_base;
b73bfc1c
KH
1571 c = *src++;
1572 EMIT_CHAR (c);
4ed46869 1573 }
fb88bf2d 1574
b73bfc1c
KH
1575 label_end_of_loop:
1576 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 1577 coding->produced = dst - destination;
b73bfc1c 1578 return;
4ed46869
KH
1579}
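/* A sketch of the four ISO6429 direction forms documented earlier in
   this file (CSI ']', CSI '0' ']', CSI '1' ']', CSI '2' ']'), parsed
   from the bytes that follow CSI.  The enum and the function
   `parse_direction' are hypothetical; the decoder above stores only a
   single CODING_MODE_DIRECTION bit, not a three-way result.  */

```c
#include <stddef.h>

enum sketch_direction
{
  SKETCH_DIR_INVALID = -1,
  SKETCH_DIR_END,		/* end of the current direction */
  SKETCH_DIR_L2R,		/* start of left-to-right text */
  SKETCH_DIR_R2L		/* start of right-to-left text */
};

/* Classify the LEN bytes at P, which follow CSI (or ESC '[').  */
enum sketch_direction
parse_direction (const unsigned char *p, size_t len)
{
  if (len >= 1 && p[0] == ']')
    return SKETCH_DIR_END;
  if (len >= 2 && p[1] == ']')
    switch (p[0])
      {
      case '0':
	return SKETCH_DIR_END;
      case '1':
	return SKETCH_DIR_L2R;
      case '2':
	return SKETCH_DIR_R2L;
      }
  return SKETCH_DIR_INVALID;
}
```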
1580
b73bfc1c 1581
f4dee582 1582/* ISO2022 encoding stuff. */
4ed46869
KH
1583
1584/*
f4dee582 1585 It is not enough to say just "ISO2022" on encoding; we have to
d46c5b12 1586 specify more details. In Emacs, each coding system of ISO2022
4ed46869
KH
1587 variant has the following specifications:
1588 1. Initial designation to G0 thru G3.
1589 2. Allows short-form designation?
1590 3. ASCII should be designated to G0 before control characters?
1591 4. ASCII should be designated to G0 at end of line?
1592 5. 7-bit environment or 8-bit environment?
1593 6. Use locking-shift?
1594 7. Use single-shift?
1595 And the following two are only for Japanese:
1596 8. Use ASCII in place of JIS0201-1976-Roman?
1597 9. Use JISX0208-1983 in place of JISX0208-1978?
1598 These specifications are encoded in `coding->flags' as flag bits
1599 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
f4dee582 1600 details.
4ed46869
KH
1601*/
1602
1603/* Produce codes (escape sequence) for designating CHARSET to graphic
b73bfc1c
KH
1604 register REG at DST, and increment DST. If <final-char> of CHARSET is
1605 '@', 'A', or 'B' and the coding system CODING allows, produce
1606 designation sequence of short-form. */
4ed46869
KH
1607
1608#define ENCODE_DESIGNATION(charset, reg, coding) \
1609 do { \
1610 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
1611 char *intermediate_char_94 = "()*+"; \
1612 char *intermediate_char_96 = ",-./"; \
70c22245 1613 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
b73bfc1c 1614 \
70c22245
KH
1615 if (revision < 255) \
1616 { \
4ed46869
KH
1617 *dst++ = ISO_CODE_ESC; \
1618 *dst++ = '&'; \
70c22245 1619 *dst++ = '@' + revision; \
4ed46869 1620 } \
b73bfc1c 1621 *dst++ = ISO_CODE_ESC; \
4ed46869
KH
1622 if (CHARSET_DIMENSION (charset) == 1) \
1623 { \
1624 if (CHARSET_CHARS (charset) == 94) \
1625 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1626 else \
1627 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
1628 } \
1629 else \
1630 { \
1631 *dst++ = '$'; \
1632 if (CHARSET_CHARS (charset) == 94) \
1633 { \
b73bfc1c
KH
1634 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
1635 || reg != 0 \
1636 || final_char < '@' || final_char > 'B') \
4ed46869
KH
1637 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
1638 } \
1639 else \
b73bfc1c 1640 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
4ed46869 1641 } \
b73bfc1c 1642 *dst++ = final_char; \
4ed46869
KH
1643 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1644 } while (0)
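/* A sketch of the byte sequence that ENCODE_DESIGNATION produces,
   with the charset properties passed as plain integers instead of
   being looked up from a charset ID.  The function name
   `sketch_designation' and its parameters are hypothetical, and the
   optional ESC '&' revision prefix is omitted.  The caller is assumed
   to pass REG in 0..3 and a DST of at least 4 bytes.  */

```c
#include <stddef.h>

/* Write the designation of a charset with the given DIMENSION (1 or
   2), CHARS (94 or 96) and FINAL_CHAR to register REG at DST.  If
   SHORT_FORM is nonzero, omit the intermediate byte where the short
   form applies.  Return the number of bytes written.  */
size_t
sketch_designation (int dimension, int chars, int reg, int final_char,
		    int short_form, unsigned char *dst)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  dst[n++] = 0x1B;		/* ESC */
  if (dimension == 1)
    dst[n++] = (unsigned char) (chars == 94
				? intermediate_94[reg]
				: intermediate_96[reg]);
  else
    {
      dst[n++] = '$';
      if (chars == 94)
	{
	  /* Short form: G0 and final character '@', 'A', or 'B'.  */
	  if (!short_form || reg != 0
	      || final_char < '@' || final_char > 'B')
	    dst[n++] = (unsigned char) intermediate_94[reg];
	}
      else
	dst[n++] = (unsigned char) intermediate_96[reg];
    }
  dst[n++] = (unsigned char) final_char;
  return n;
}
```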
1645
1646/* The following two macros produce codes (control character or escape
1647 sequence) for ISO2022 single-shift functions (single-shift-2 and
1648 single-shift-3). */
1649
1650#define ENCODE_SINGLE_SHIFT_2 \
1651 do { \
1652 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1653 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
1654 else \
b73bfc1c 1655 *dst++ = ISO_CODE_SS2; \
4ed46869
KH
1656 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1657 } while (0)
1658
fb88bf2d
KH
1659#define ENCODE_SINGLE_SHIFT_3 \
1660 do { \
4ed46869 1661 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
fb88bf2d
KH
1662 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
1663 else \
b73bfc1c 1664 *dst++ = ISO_CODE_SS3; \
4ed46869
KH
1665 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
1666 } while (0)
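/* A sketch of the choice the two single-shift macros above make: an
   ESC 'N' or ESC 'O' pair in a 7-bit environment, otherwise the
   single 8-bit control SS2 (0x8E) or SS3 (0x8F).  The function name
   `sketch_single_shift' is hypothetical, and the update of the
   coding system's single-shifting state is left out.  */

```c
#include <stddef.h>

/* Write single-shift WHICH (2 or 3) at DST, which must have room for
   2 bytes.  SEVEN_BITS nonzero selects the escape-sequence form.
   Return the number of bytes written.  */
size_t
sketch_single_shift (int which, int seven_bits, unsigned char *dst)
{
  if (seven_bits)
    {
      dst[0] = 0x1B;		/* ESC */
      dst[1] = which == 2 ? 'N' : 'O';
      return 2;
    }
  dst[0] = which == 2 ? 0x8E : 0x8F;
  return 1;
}
```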
1667
1668/* The following four macros produce codes (control character or
1669 escape sequence) for ISO2022 locking-shift functions (shift-in,
1670 shift-out, locking-shift-2, and locking-shift-3). */
1671
b73bfc1c
KH
1672#define ENCODE_SHIFT_IN \
1673 do { \
1674 *dst++ = ISO_CODE_SI; \
4ed46869
KH
1675 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
1676 } while (0)
1677
b73bfc1c
KH
1678#define ENCODE_SHIFT_OUT \
1679 do { \
1680 *dst++ = ISO_CODE_SO; \
4ed46869
KH
1681 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
1682 } while (0)
1683
1684#define ENCODE_LOCKING_SHIFT_2 \
1685 do { \
1686 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
1687 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
1688 } while (0)
1689
b73bfc1c
KH
1690#define ENCODE_LOCKING_SHIFT_3 \
1691 do { \
1692 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
4ed46869
KH
1693 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
1694 } while (0)
1695
f4dee582
RS
1696/* Produce codes for a DIMENSION1 character whose character set is
1697 CHARSET and whose position-code is C1. Designation and invocation
4ed46869
KH
1698 sequences are also produced in advance if necessary. */
1699
6e85d753
KH
1700#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
1701 do { \
1702 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1703 { \
1704 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1705 *dst++ = c1 & 0x7F; \
1706 else \
1707 *dst++ = c1 | 0x80; \
1708 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1709 break; \
1710 } \
1711 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1712 { \
1713 *dst++ = c1 & 0x7F; \
1714 break; \
1715 } \
1716 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1717 { \
1718 *dst++ = c1 | 0x80; \
1719 break; \
1720 } \
6e85d753
KH
1721 else \
1722 /* Since CHARSET is not yet invoked to any graphic planes, we \
1723 must invoke it, or, at first, designate it to some graphic \
1724 register. Then repeat the loop to actually produce the \
1725 character. */ \
1726 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1727 } while (1)
1728
f4dee582
RS
1729/* Produce codes for a DIMENSION2 character whose character set is
1730 CHARSET and whose position-codes are C1 and C2. Designation and
4ed46869
KH
1731 invocation codes are also produced in advance if necessary. */
1732
6e85d753
KH
1733#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
1734 do { \
1735 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
1736 { \
1737 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
1738 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
1739 else \
1740 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
1741 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
1742 break; \
1743 } \
1744 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
1745 { \
1746 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
1747 break; \
1748 } \
1749 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
1750 { \
1751 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
1752 break; \
1753 } \
6e85d753
KH
1754 else \
1755 /* Since CHARSET is not yet invoked to any graphic planes, we \
1756 must invoke it, or, at first, designate it to some graphic \
1757 register. Then repeat the loop to actually produce the \
1758 character. */ \
1759 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
1760 } while (1)
1761
05e6f5dc
KH
1762#define ENCODE_ISO_CHARACTER(c) \
1763 do { \
1764 int charset, c1, c2; \
1765 \
1766 SPLIT_CHAR (c, charset, c1, c2); \
1767 if (CHARSET_DEFINED_P (charset)) \
1768 { \
1769 if (CHARSET_DIMENSION (charset) == 1) \
1770 { \
1771 if (charset == CHARSET_ASCII \
1772 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
1773 charset = charset_latin_jisx0201; \
1774 ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
1775 } \
1776 else \
1777 { \
1778 if (charset == charset_jisx0208 \
1779 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
1780 charset = charset_jisx0208_1978; \
1781 ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
1782 } \
1783 } \
1784 else \
1785 { \
1786 *dst++ = c1; \
1787 if (c2 >= 0) \
1788 *dst++ = c2; \
1789 } \
1790 } while (0)
1791
1792
1793/* Instead of encoding character C, produce one or two `?'s. */
1794
1795#define ENCODE_UNSAFE_CHARACTER(c) \
6f551029 1796 do { \
05e6f5dc
KH
1797 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
1798 if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
1799 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
84fbb8a0 1800 } while (0)
bdd9fb48 1801
05e6f5dc 1802
4ed46869
KH
1803/* Produce designation and invocation codes at a place pointed by DST
1804 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
1805 Return new DST. */
1806
1807unsigned char *
1808encode_invocation_designation (charset, coding, dst)
1809 int charset;
1810 struct coding_system *coding;
1811 unsigned char *dst;
1812{
1813 int reg; /* graphic register number */
1814
1815 /* At first, check designations. */
1816 for (reg = 0; reg < 4; reg++)
1817 if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
1818 break;
1819
1820 if (reg >= 4)
1821 {
1822 /* CHARSET is not yet designated to any graphic registers. */
1823 /* At first check the requested designation. */
1824 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1ba9e4ab
KH
1825 if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
1826 /* Since CHARSET requests no special designation, designate it
1827 to graphic register 0. */
4ed46869
KH
1828 reg = 0;
1829
1830 ENCODE_DESIGNATION (charset, reg, coding);
1831 }
1832
1833 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
1834 && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
1835 {
1836 /* Since the graphic register REG is not invoked to any graphic
1837 planes, invoke it to graphic plane 0. */
1838 switch (reg)
1839 {
1840 case 0: /* graphic register 0 */
1841 ENCODE_SHIFT_IN;
1842 break;
1843
1844 case 1: /* graphic register 1 */
1845 ENCODE_SHIFT_OUT;
1846 break;
1847
1848 case 2: /* graphic register 2 */
1849 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1850 ENCODE_SINGLE_SHIFT_2;
1851 else
1852 ENCODE_LOCKING_SHIFT_2;
1853 break;
1854
1855 case 3: /* graphic register 3 */
1856 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1857 ENCODE_SINGLE_SHIFT_3;
1858 else
1859 ENCODE_LOCKING_SHIFT_3;
1860 break;
1861 }
1862 }
b73bfc1c 1863
4ed46869
KH
1864 return dst;
1865}
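/* Illustration only (not part of coding.c): a minimal sketch of the
   byte sequences chosen by the switch in encode_invocation_designation
   above.  The function name, its parameters and the ISO_* constants
   are hypothetical stand-ins for the coding->flags tests and the
   ISO_CODE_* values; the byte values are the usual ISO 2022 ones.  It
   only reports which bytes would be emitted and keeps no invocation
   state.  */

```c
#include <stddef.h>

/* Conventional ISO 2022 control codes (assumed to match ISO_CODE_*).  */
enum { ISO_ESC = 0x1B, ISO_SO = 0x0E, ISO_SI = 0x0F,
       ISO_SS2 = 0x8E, ISO_SS3 = 0x8F };

/* Store in OUT (room for 2 bytes) the bytes that invoke graphic
   register REG (0..3) and return how many were stored.  SEVEN_BITS
   and SINGLE_SHIFT model the CODING_FLAG_ISO_SEVEN_BITS and
   CODING_FLAG_ISO_SINGLE_SHIFT flags.  */
static size_t
invocation_bytes (int reg, int seven_bits, int single_shift,
                  unsigned char *out)
{
  switch (reg)
    {
    case 0:                     /* shift-in */
      out[0] = ISO_SI;
      return 1;
    case 1:                     /* shift-out */
      out[0] = ISO_SO;
      return 1;
    case 2:
    case 3:
      if (!single_shift)
        {
          /* Locking shift: ESC n or ESC o.  */
          out[0] = ISO_ESC;
          out[1] = reg == 2 ? 'n' : 'o';
          return 2;
        }
      if (seven_bits)
        {
          /* Seven-bit single shift: ESC N or ESC O.  */
          out[0] = ISO_ESC;
          out[1] = reg == 2 ? 'N' : 'O';
          return 2;
        }
      /* Eight-bit single shift: one C1 control byte.  */
      out[0] = reg == 2 ? ISO_SS2 : ISO_SS3;
      return 1;
    default:
      return 0;
    }
}
```

/* A single shift affects only the next character, which is why the
   real macros set CODING_SPEC_ISO_SINGLE_SHIFTING instead of changing
   the recorded invocation.  */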
1866
ec6d2bb8
KH
1867/* Produce 2-byte codes for encoded composition rule RULE. */
1868
1869#define ENCODE_COMPOSITION_RULE(rule) \
1870 do { \
1871 int gref, nref; \
1872 COMPOSITION_DECODE_RULE (rule, gref, nref); \
1873 *dst++ = 32 + 81 + gref; \
1874 *dst++ = 32 + nref; \
1875 } while (0)
1876
1877/* Produce codes for indicating the start of a composition sequence
1878 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
1879 which specify information about the composition. See the comment
1880 in coding.h for the format of DATA. */
1881
1882#define ENCODE_COMPOSITION_START(coding, data) \
1883 do { \
1884 coding->composing = data[3]; \
1885 *dst++ = ISO_CODE_ESC; \
1886 if (coding->composing == COMPOSITION_RELATIVE) \
1887 *dst++ = '0'; \
1888 else \
1889 { \
1890 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
1891 ? '3' : '4'); \
1892 coding->cmp_data_index = coding->cmp_data_start + 4; \
1893 coding->composition_rule_follows = 0; \
1894 } \
1895 } while (0)
1896
1897/* Produce codes for indicating the end of the current composition. */
1898
1899#define ENCODE_COMPOSITION_END(coding, data) \
1900 do { \
1901 *dst++ = ISO_CODE_ESC; \
1902 *dst++ = '1'; \
1903 coding->cmp_data_start += data[0]; \
1904 coding->composing = COMPOSITION_NO; \
1905 if (coding->cmp_data_start == coding->cmp_data->used \
1906 && coding->cmp_data->next) \
1907 { \
1908 coding->cmp_data = coding->cmp_data->next; \
1909 coding->cmp_data_start = 0; \
1910 } \
1911 } while (0)
1912
1913/* Produce composition start sequence ESC 0. Here, this sequence
1914 doesn't mean the start of a new composition but means that we have
1915 just produced components (alternate chars and composition rules) of
1916 the composition and the actual text follows in SRC. */
1917
1918#define ENCODE_COMPOSITION_FAKE_START(coding) \
1919 do { \
1920 *dst++ = ISO_CODE_ESC; \
1921 *dst++ = '0'; \
1922 coding->composing = COMPOSITION_RELATIVE; \
1923 } while (0)
4ed46869
KH
1924
1925/* The following three macros produce codes for indicating direction
1926 of text. */
b73bfc1c
KH
1927#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
1928 do { \
4ed46869 1929 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
b73bfc1c
KH
1930 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
1931 else \
1932 *dst++ = ISO_CODE_CSI; \
4ed46869
KH
1933 } while (0)
1934
1935#define ENCODE_DIRECTION_R2L \
b73bfc1c 1936 do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '2'; *dst++ = ']'; } while (0)
4ed46869
KH
1937
1938#define ENCODE_DIRECTION_L2R \
b73bfc1c 1939 do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '0'; *dst++ = ']'; } while (0)
4ed46869
KH
1940
1941/* Produce codes for designation and invocation to reset the graphic
1942 planes and registers to initial state. */
e0e989f6
KH
1943#define ENCODE_RESET_PLANE_AND_REGISTER \
1944 do { \
1945 int reg; \
1946 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
1947 ENCODE_SHIFT_IN; \
1948 for (reg = 0; reg < 4; reg++) \
1949 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
1950 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
1951 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
1952 ENCODE_DESIGNATION \
1953 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
4ed46869
KH
1954 } while (0)
1955
bdd9fb48 1956/* Produce designation sequences for the charsets in the line starting
b73bfc1c 1957 at SRC at the place pointed to by DST, and return the updated DST.
bdd9fb48
KH
1958
1959 If the current block ends before any end-of-line, we may fail to
d46c5b12
KH
1960 find all the necessary designations. */
1961
b73bfc1c
KH
1962static unsigned char *
1963encode_designation_at_bol (coding, translation_table, src, src_end, dst)
e0e989f6 1964 struct coding_system *coding;
b73bfc1c
KH
1965 Lisp_Object translation_table;
1966 unsigned char *src, *src_end, *dst;
e0e989f6 1967{
bdd9fb48
KH
1968 int charset, c, found = 0, reg;
1969 /* Table of charsets to be designated to each graphic register. */
1970 int r[4];
bdd9fb48
KH
1971
1972 for (reg = 0; reg < 4; reg++)
1973 r[reg] = -1;
1974
b73bfc1c 1975 while (found < 4)
e0e989f6 1976 {
b73bfc1c
KH
1977 ONE_MORE_CHAR (c);
1978 if (c == '\n')
1979 break;
bdd9fb48 1980
b73bfc1c 1981 charset = CHAR_CHARSET (c);
e0e989f6 1982 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
d46c5b12 1983 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
bdd9fb48
KH
1984 {
1985 found++;
1986 r[reg] = charset;
1987 }
bdd9fb48
KH
1988 }
1989
b73bfc1c 1990 label_end_of_loop:
bdd9fb48
KH
1991 if (found)
1992 {
1993 for (reg = 0; reg < 4; reg++)
1994 if (r[reg] >= 0
1995 && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
1996 ENCODE_DESIGNATION (r[reg], reg, coding);
e0e989f6 1997 }
b73bfc1c
KH
1998
1999 return dst;
e0e989f6
KH
2000}
2001
4ed46869
KH
2002/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
2003
b73bfc1c 2004static void
d46c5b12 2005encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2006 struct coding_system *coding;
2007 unsigned char *source, *destination;
2008 int src_bytes, dst_bytes;
4ed46869
KH
2009{
2010 unsigned char *src = source;
2011 unsigned char *src_end = source + src_bytes;
2012 unsigned char *dst = destination;
2013 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c 2014 /* Since each loop iteration produces at most 20 bytes, we subtract 19
4ed46869
KH
2015 from DST_END to ensure that overflow checking is necessary only at
2016 the head of the loop. */
b73bfc1c
KH
2017 unsigned char *adjusted_dst_end = dst_end - 19;
2018 /* SRC_BASE remembers the start position in source in each loop.
2019 The loop will be exited when there's not enough source text to
2020 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2021 there's not enough destination area to produce encoded codes
2022 (within macro EMIT_BYTES). */
2023 unsigned char *src_base;
2024 int c;
2025 Lisp_Object translation_table;
05e6f5dc
KH
2026 Lisp_Object safe_chars;
2027
2028 safe_chars = coding_safe_chars (coding);
bdd9fb48 2029
b73bfc1c
KH
2030 if (NILP (Venable_character_translation))
2031 translation_table = Qnil;
2032 else
2033 {
2034 translation_table = coding->translation_table_for_encode;
2035 if (NILP (translation_table))
2036 translation_table = Vstandard_translation_table_for_encode;
2037 }
4ed46869 2038
d46c5b12 2039 coding->consumed_char = 0;
b73bfc1c
KH
2040 coding->errors = 0;
2041 while (1)
4ed46869 2042 {
b73bfc1c
KH
2043 src_base = src;
2044
2045 if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
2046 {
2047 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2048 break;
2049 }
4ed46869 2050
e0e989f6
KH
2051 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
2052 && CODING_SPEC_ISO_BOL (coding))
2053 {
bdd9fb48 2054 /* We have to produce designation sequences if any now. */
b73bfc1c
KH
2055 dst = encode_designation_at_bol (coding, translation_table,
2056 src, src_end, dst);
e0e989f6
KH
2057 CODING_SPEC_ISO_BOL (coding) = 0;
2058 }
2059
ec6d2bb8
KH
2060 /* Check composition start and end. */
2061 if (coding->composing != COMPOSITION_DISABLED
2062 && coding->cmp_data_start < coding->cmp_data->used)
4ed46869 2063 {
ec6d2bb8
KH
2064 struct composition_data *cmp_data = coding->cmp_data;
2065 int *data = cmp_data->data + coding->cmp_data_start;
2066 int this_pos = cmp_data->char_offset + coding->consumed_char;
2067
2068 if (coding->composing == COMPOSITION_RELATIVE)
4ed46869 2069 {
ec6d2bb8
KH
2070 if (this_pos == data[2])
2071 {
2072 ENCODE_COMPOSITION_END (coding, data);
2073 cmp_data = coding->cmp_data;
2074 data = cmp_data->data + coding->cmp_data_start;
2075 }
4ed46869 2076 }
ec6d2bb8 2077 else if (COMPOSING_P (coding))
4ed46869 2078 {
ec6d2bb8
KH
2079 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHARS */
2080 if (coding->cmp_data_index == coding->cmp_data_start + data[0])
2081 /* We have consumed components of the composition.
2082 What follows in SRC is the composition's base
2083 text. */
2084 ENCODE_COMPOSITION_FAKE_START (coding);
2085 else
4ed46869 2086 {
ec6d2bb8
KH
2087 int c = cmp_data->data[coding->cmp_data_index++];
2088 if (coding->composition_rule_follows)
2089 {
2090 ENCODE_COMPOSITION_RULE (c);
2091 coding->composition_rule_follows = 0;
2092 }
2093 else
2094 {
05e6f5dc
KH
2095 if (coding->flags & CODING_FLAG_ISO_SAFE
2096 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2097 ENCODE_UNSAFE_CHARACTER (c);
2098 else
2099 ENCODE_ISO_CHARACTER (c);
ec6d2bb8
KH
2100 if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
2101 coding->composition_rule_follows = 1;
2102 }
4ed46869
KH
2103 continue;
2104 }
ec6d2bb8
KH
2105 }
2106 if (!COMPOSING_P (coding))
2107 {
2108 if (this_pos == data[1])
4ed46869 2109 {
ec6d2bb8
KH
2110 ENCODE_COMPOSITION_START (coding, data);
2111 continue;
4ed46869 2112 }
4ed46869
KH
2113 }
2114 }
ec6d2bb8 2115
b73bfc1c 2116 ONE_MORE_CHAR (c);
4ed46869 2117
b73bfc1c
KH
2118 /* Now encode the character C. */
2119 if (c < 0x20 || c == 0x7F)
2120 {
2121 if (c == '\r')
19a8d9e0 2122 {
b73bfc1c
KH
2123 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2124 {
2125 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2126 ENCODE_RESET_PLANE_AND_REGISTER;
2127 *dst++ = c;
2128 continue;
2129 }
2130 /* fall through to treat '\r' as '\n' ... */
2131 c = '\n';
19a8d9e0 2132 }
b73bfc1c 2133 if (c == '\n')
19a8d9e0 2134 {
b73bfc1c
KH
2135 if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
2136 ENCODE_RESET_PLANE_AND_REGISTER;
2137 if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
2138 bcopy (coding->spec.iso2022.initial_designation,
2139 coding->spec.iso2022.current_designation,
2140 sizeof coding->spec.iso2022.initial_designation);
2141 if (coding->eol_type == CODING_EOL_LF
2142 || coding->eol_type == CODING_EOL_UNDECIDED)
2143 *dst++ = ISO_CODE_LF;
2144 else if (coding->eol_type == CODING_EOL_CRLF)
2145 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
2146 else
2147 *dst++ = ISO_CODE_CR;
2148 CODING_SPEC_ISO_BOL (coding) = 1;
19a8d9e0 2149 }
b73bfc1c 2150 else
19a8d9e0 2151 {
b73bfc1c
KH
2152 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2153 ENCODE_RESET_PLANE_AND_REGISTER;
2154 *dst++ = c;
19a8d9e0 2155 }
4ed46869 2156 }
b73bfc1c 2157 else if (ASCII_BYTE_P (c))
05e6f5dc 2158 ENCODE_ISO_CHARACTER (c);
b73bfc1c 2159 else if (SINGLE_BYTE_CHAR_P (c))
88993dfd 2160 {
b73bfc1c
KH
2161 *dst++ = c;
2162 coding->errors++;
88993dfd 2163 }
05e6f5dc
KH
2164 else if (coding->flags & CODING_FLAG_ISO_SAFE
2165 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2166 ENCODE_UNSAFE_CHARACTER (c);
b73bfc1c 2167 else
05e6f5dc 2168 ENCODE_ISO_CHARACTER (c);
b73bfc1c
KH
2169
2170 coding->consumed_char++;
84fbb8a0 2171 }
b73bfc1c
KH
2172
2173 label_end_of_loop:
2174 coding->consumed = src_base - source;
d46c5b12 2175 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2176}
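/* Illustration only (not part of coding.c): a minimal sketch of the
   headroom technique used by encode_coding_iso2022 above, where the
   free space in the destination is checked once at the head of each
   loop iteration against the maximum a single iteration can produce.
   The escaping task, the function name and MAX_BYTES_PER_STEP are
   hypothetical; the real encoder reserves 19 bytes because one
   iteration may produce up to 20.  */

```c
#include <stddef.h>

/* Upper bound on the bytes written by one loop iteration.  */
#define MAX_BYTES_PER_STEP 4

/* Copy SRC to DST, writing printable ASCII as is and every other byte
   (and the backslash) as a four-byte \xHH escape.  Stop early when DST
   may not hold one more step.  Return the number of bytes produced and
   store the number of source bytes consumed in *CONSUMED.  */
static size_t
escape_bytes (const unsigned char *src, size_t src_len,
              unsigned char *dst, size_t dst_len, size_t *consumed)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t in = 0, out = 0;

  while (in < src_len)
    {
      unsigned char c;

      /* The only overflow check: is there room for the worst case?  */
      if (dst_len - out < MAX_BYTES_PER_STEP)
        break;

      c = src[in++];
      if (c >= 0x20 && c < 0x7F && c != '\\')
        dst[out++] = c;
      else
        {
          dst[out++] = '\\';
          dst[out++] = 'x';
          dst[out++] = hex[c >> 4];
          dst[out++] = hex[c & 0x0F];
        }
    }

  *consumed = in;
  return out;
}
```

/* The cost of the single check is that the loop can stop while a few
   bytes of the destination are still free; the caller is expected to
   retry with a larger buffer, much as it does after
   CODING_FINISH_INSUFFICIENT_DST.  */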
2177
2178\f
2179/*** 4. SJIS and BIG5 handlers ***/
2180
f4dee582 2181/* Although SJIS and BIG5 are not ISO coding systems, they are used
4ed46869
KH
2182 quite widely. So, for the moment, Emacs supports them in the bare
2183 C code. But, in the future, they may be supported only by CCL. */
2184
2185/* SJIS is a coding system encoding three character sets: ASCII, right
2186 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2187 as is. A character of charset katakana-jisx0201 is encoded by
2188 "position-code + 0x80". A character of charset japanese-jisx0208
2189 is encoded in two bytes, but the two position-codes are divided and
2190 shifted so that they fit in the ranges below.
2191
2192 --- CODE RANGE of SJIS ---
2193 (character set) (range)
2194 ASCII 0x00 .. 0x7F
2195 KATAKANA-JISX0201 0xA0 .. 0xDF
c28a9453 2196 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF
d14d03ac 2197 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC
4ed46869
KH
2198 -------------------------------
2199
2200*/
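/* Illustration only (not part of coding.c): a minimal sketch of the
   "divide and shift" arithmetic described above for JISX0208.  The
   function names are hypothetical, and the formulas are the
   conventional Shift-JIS mapping written from scratch; they are not
   copied from the DECODE_SJIS and ENCODE_SJIS macros used later in
   this file, whose definitions are not shown here.  */

```c
/* Convert the Shift-JIS byte pair (S1, S2) to JIS X 0208
   position-codes.  S1 must be in 0x81..0x9F or 0xE0..0xEF and S2 in
   0x40..0x7E or 0x80..0xFC; the arguments are not validated.  */
static void
sjis_to_jis (unsigned s1, unsigned s2, unsigned *j1, unsigned *j2)
{
  if (s2 >= 0x9F)
    {
      /* Upper half of the second-byte range: even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* Lower half: odd JIS row; 0x7F is skipped in Shift-JIS.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
}

/* Convert JIS X 0208 position-codes (J1, J2), each in 0x21..0x7E, to
   a Shift-JIS byte pair.  */
static void
jis_to_sjis (unsigned j1, unsigned j2, unsigned *s1, unsigned *s2)
{
  if (j1 & 1)
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      *s2 = j2 + 0x7E;
    }
}
```

/* Two JIS rows share one Shift-JIS first byte, which is how 94 rows
   fit into the 47 first-byte values 0x81..0x9F and 0xE0..0xEF.  */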
2201
2202/* BIG5 is a coding system encoding two character sets: ASCII and
2203 Big5. An ASCII character is encoded as is. Big5 is a two-byte
2204 character set and is encoded in two bytes.
2205
2206 --- CODE RANGE of BIG5 ---
2207 (character set) (range)
2208 ASCII 0x00 .. 0x7F
2209 Big5 (1st byte) 0xA1 .. 0xFE
2210 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE
2211 --------------------------
2212
2213 Since the number of characters in Big5 is larger than the maximum
2214 number of characters in an Emacs charset (96x96), it can't be handled as one
2215 charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
2216 and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
2217 contains frequently used characters and the latter contains less
2218 frequently used characters. */
2219
2220/* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
2221 are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
2222 C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
2223 format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
2224
2225/* Number of Big5 characters which have the same code in 1st byte. */
2226#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
2227
2228#define DECODE_BIG5(b1, b2, charset, c1, c2) \
2229 do { \
2230 unsigned int temp \
2231 = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
2232 if (b1 < 0xC9) \
2233 charset = charset_big5_1; \
2234 else \
2235 { \
2236 charset = charset_big5_2; \
2237 temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
2238 } \
2239 c1 = temp / (0xFF - 0xA1) + 0x21; \
2240 c2 = temp % (0xFF - 0xA1) + 0x21; \
2241 } while (0)
2242
2243#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
2244 do { \
2245 unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
2246 if (charset == charset_big5_2) \
2247 temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
2248 b1 = temp / BIG5_SAME_ROW + 0xA1; \
2249 b2 = temp % BIG5_SAME_ROW; \
2250 b2 += b2 < 0x3F ? 0x40 : 0x62; \
2251 } while (0)
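/* Illustration only (not part of coding.c): a minimal sketch that
   restates DECODE_BIG5 and ENCODE_BIG5 above as functions so that the
   round trip can be checked in isolation.  The function names and the
   BIG5_SET_1 and BIG5_SET_2 constants are hypothetical stand-ins for
   charset_big5_1 and charset_big5_2; the arithmetic follows the two
   macros.  */

```c
/* Number of Big5 characters which share one first byte (157).  */
enum { SAME_ROW = 0xFF - 0xA1 + 0x7F - 0x40 };
/* Number of positions in one row of a 94x94 charset.  */
enum { ROW_94 = 0xFF - 0xA1 };
/* Stand-ins for charset_big5_1 and charset_big5_2.  */
enum { BIG5_SET_1 = 1, BIG5_SET_2 = 2 };

/* Map the Big5 byte pair (B1, B2) to a charset and two position-codes.
   B1 must be in 0xA1..0xFE and B2 in 0x40..0x7E or 0xA1..0xFE.  */
static void
big5_decode (unsigned b1, unsigned b2,
             int *set, unsigned *c1, unsigned *c2)
{
  /* Linear index of the character within the whole Big5 table.  */
  unsigned temp = (b1 - 0xA1) * SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);

  if (b1 < 0xC9)
    *set = BIG5_SET_1;
  else
    {
      *set = BIG5_SET_2;
      temp -= (0xC9 - 0xA1) * SAME_ROW;
    }
  *c1 = temp / ROW_94 + 0x21;
  *c2 = temp % ROW_94 + 0x21;
}

/* Inverse of big5_decode.  */
static void
big5_encode (int set, unsigned c1, unsigned c2,
             unsigned *b1, unsigned *b2)
{
  unsigned temp = (c1 - 0x21) * ROW_94 + (c2 - 0x21);

  if (set == BIG5_SET_2)
    temp += SAME_ROW * (0xC9 - 0xA1);
  *b1 = temp / SAME_ROW + 0xA1;
  *b2 = temp % SAME_ROW;
  *b2 += *b2 < 0x3F ? 0x40 : 0x62;
}
```

/* Both directions pass through the same linear index, so the mapping
   is a re-division of one sequence: by 157 on the Big5 side and by 94
   on the Emacs side.  */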
2252
2253/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2254 Check if a text is encoded in SJIS. If it is, return
2255 CODING_CATEGORY_MASK_SJIS, else return 0. */
2256
2257int
2258detect_coding_sjis (src, src_end)
2259 unsigned char *src, *src_end;
2260{
b73bfc1c
KH
2261 int c;
2262 /* Dummy for ONE_MORE_BYTE. */
2263 struct coding_system dummy_coding;
2264 struct coding_system *coding = &dummy_coding;
4ed46869 2265
b73bfc1c 2266 while (1)
4ed46869 2267 {
b73bfc1c 2268 ONE_MORE_BYTE (c);
4ed46869
KH
2269 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0)
2270 {
b73bfc1c
KH
2271 ONE_MORE_BYTE (c);
2272 if (c < 0x40)
4ed46869
KH
2273 return 0;
2274 }
2275 }
b73bfc1c 2276 label_end_of_loop:
4ed46869
KH
2277 return CODING_CATEGORY_MASK_SJIS;
2278}
2279
2280/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2281 Check if a text is encoded in BIG5. If it is, return
2282 CODING_CATEGORY_MASK_BIG5, else return 0. */
2283
2284int
2285detect_coding_big5 (src, src_end)
2286 unsigned char *src, *src_end;
2287{
b73bfc1c
KH
2288 int c;
2289 /* Dummy for ONE_MORE_BYTE. */
2290 struct coding_system dummy_coding;
2291 struct coding_system *coding = &dummy_coding;
4ed46869 2292
b73bfc1c 2293 while (1)
4ed46869 2294 {
b73bfc1c 2295 ONE_MORE_BYTE (c);
4ed46869
KH
2296 if (c >= 0xA1)
2297 {
b73bfc1c 2298 ONE_MORE_BYTE (c);
4ed46869
KH
2299 if (c < 0x40 || (c >= 0x7F && c <= 0xA0))
2300 return 0;
2301 }
2302 }
b73bfc1c 2303 label_end_of_loop:
4ed46869
KH
2304 return CODING_CATEGORY_MASK_BIG5;
2305}
2306
fa42c37f
KH
2307/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2308 Check if a text is encoded in UTF-8. If it is, return
2309 CODING_CATEGORY_MASK_UTF_8, else return 0. */
2310
2311#define UTF_8_1_OCTET_P(c) ((c) < 0x80)
2312#define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
2313#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
2314#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
2315#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
2316#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
2317#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
2318
2319int
2320detect_coding_utf_8 (src, src_end)
2321 unsigned char *src, *src_end;
2322{
2323 unsigned char c;
2324 int seq_maybe_bytes;
b73bfc1c
KH
2325 /* Dummy for ONE_MORE_BYTE. */
2326 struct coding_system dummy_coding;
2327 struct coding_system *coding = &dummy_coding;
fa42c37f 2328
b73bfc1c 2329 while (1)
fa42c37f 2330 {
b73bfc1c 2331 ONE_MORE_BYTE (c);
fa42c37f
KH
2332 if (UTF_8_1_OCTET_P (c))
2333 continue;
2334 else if (UTF_8_2_OCTET_LEADING_P (c))
2335 seq_maybe_bytes = 1;
2336 else if (UTF_8_3_OCTET_LEADING_P (c))
2337 seq_maybe_bytes = 2;
2338 else if (UTF_8_4_OCTET_LEADING_P (c))
2339 seq_maybe_bytes = 3;
2340 else if (UTF_8_5_OCTET_LEADING_P (c))
2341 seq_maybe_bytes = 4;
2342 else if (UTF_8_6_OCTET_LEADING_P (c))
2343 seq_maybe_bytes = 5;
2344 else
2345 return 0;
2346
2347 do
2348 {
b73bfc1c 2349 ONE_MORE_BYTE (c);
fa42c37f
KH
2350 if (!UTF_8_EXTRA_OCTET_P (c))
2351 return 0;
2352 seq_maybe_bytes--;
2353 }
2354 while (seq_maybe_bytes > 0);
2355 }
2356
b73bfc1c 2357 label_end_of_loop:
fa42c37f
KH
2358 return CODING_CATEGORY_MASK_UTF_8;
2359}
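/* Illustration only (not part of coding.c): a minimal standalone
   sketch of the structural check made by detect_coding_utf_8 above,
   using an explicit length instead of ONE_MORE_BYTE.  The function
   name is hypothetical.  It assumes that running out of input in the
   middle of a sequence counts as acceptance, which is how the jump to
   label_end_of_loop appears to behave.  Like the original, it checks
   only the lead-byte and continuation-byte bit patterns, so overlong
   forms and surrogates are not rejected.  */

```c
#include <stddef.h>

/* Return 1 if the LEN bytes at SRC have the shape of UTF-8 (in the
   original 1..6 octet scheme), else return 0.  */
static int
looks_like_utf_8 (const unsigned char *src, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = src[i++];
      int extra;                /* continuation octets still expected */

      if (c < 0x80)
        continue;
      else if ((c & 0xE0) == 0xC0)
        extra = 1;
      else if ((c & 0xF0) == 0xE0)
        extra = 2;
      else if ((c & 0xF8) == 0xF0)
        extra = 3;
      else if ((c & 0xFC) == 0xF8)
        extra = 4;
      else if ((c & 0xFE) == 0xFC)
        extra = 5;
      else
        return 0;               /* stray continuation octet, 0xFE or 0xFF */

      while (extra-- > 0)
        {
          if (i >= len)
            return 1;           /* truncated block: give benefit of doubt */
          if ((src[i++] & 0xC0) != 0x80)
            return 0;
        }
    }
  return 1;
}
```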
2360
2361/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2362 Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
2363 Little Endian (otherwise). If it is, return
2364 CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
2365 else return 0. */
2366
2367#define UTF_16_INVALID_P(val) \
2368 (((val) == 0xFFFE) \
2369 || ((val) == 0xFFFF))
2370
2371#define UTF_16_HIGH_SURROGATE_P(val) \
2372 (((val) & 0xFC00) == 0xD800)
2373
2374#define UTF_16_LOW_SURROGATE_P(val) \
2375 (((val) & 0xFC00) == 0xDC00)
2376
2377int
2378detect_coding_utf_16 (src, src_end)
2379 unsigned char *src, *src_end;
2380{
b73bfc1c
KH
2381 unsigned char c1, c2;
2382 /* Dummy for TWO_MORE_BYTES. */
2383 struct coding_system dummy_coding;
2384 struct coding_system *coding = &dummy_coding;
fa42c37f 2385
b73bfc1c
KH
2386 TWO_MORE_BYTES (c1, c2);
2387
2388 if ((c1 == 0xFF) && (c2 == 0xFE))
fa42c37f 2389 return CODING_CATEGORY_MASK_UTF_16_LE;
b73bfc1c 2390 else if ((c1 == 0xFE) && (c2 == 0xFF))
fa42c37f
KH
2391 return CODING_CATEGORY_MASK_UTF_16_BE;
2392
b73bfc1c 2393 label_end_of_loop:
fa42c37f
KH
2394 return 0;
2395}
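/* Illustration only (not part of coding.c): a minimal sketch of the
   byte-order-mark test made by detect_coding_utf_16 above, together
   with surrogate predicates written with the conventional 0xFC00 mask.
   All names are hypothetical.  As in the original, text without a
   leading byte order mark is not recognized as UTF-16 at all.  */

```c
#include <stddef.h>

enum utf_16_guess { UTF_16_UNKNOWN = 0, UTF_16_LE, UTF_16_BE };

/* Guess the byte order of the LEN bytes at SRC from a leading byte
   order mark (U+FEFF).  */
static enum utf_16_guess
guess_utf_16_by_bom (const unsigned char *src, size_t len)
{
  if (len < 2)
    return UTF_16_UNKNOWN;
  if (src[0] == 0xFF && src[1] == 0xFE)
    return UTF_16_LE;
  if (src[0] == 0xFE && src[1] == 0xFF)
    return UTF_16_BE;
  return UTF_16_UNKNOWN;
}

/* High surrogates are the code units 0xD800..0xDBFF.  */
static int
utf_16_high_surrogate_p (unsigned val)
{
  return (val & 0xFC00) == 0xD800;
}

/* Low surrogates are the code units 0xDC00..0xDFFF.  */
static int
utf_16_low_surrogate_p (unsigned val)
{
  return (val & 0xFC00) == 0xDC00;
}
```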
2396
4ed46869
KH
2397/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
2398 If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */
2399
b73bfc1c 2400static void
4ed46869 2401decode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2402 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2403 struct coding_system *coding;
2404 unsigned char *source, *destination;
2405 int src_bytes, dst_bytes;
4ed46869
KH
2406 int sjis_p;
2407{
2408 unsigned char *src = source;
2409 unsigned char *src_end = source + src_bytes;
2410 unsigned char *dst = destination;
2411 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2412 /* SRC_BASE remembers the start position in source in each loop.
2413 The loop will be exited when there's not enough source code
2414 (within macro ONE_MORE_BYTE), or when there's not enough
2415 destination area to produce a character (within macro
2416 EMIT_CHAR). */
2417 unsigned char *src_base;
2418 Lisp_Object translation_table;
a5d301df 2419
b73bfc1c
KH
2420 if (NILP (Venable_character_translation))
2421 translation_table = Qnil;
2422 else
2423 {
2424 translation_table = coding->translation_table_for_decode;
2425 if (NILP (translation_table))
2426 translation_table = Vstandard_translation_table_for_decode;
2427 }
4ed46869 2428
d46c5b12 2429 coding->produced_char = 0;
b73bfc1c 2430 while (1)
4ed46869 2431 {
b73bfc1c
KH
2432 int c, charset, c1, c2;
2433
2434 src_base = src;
2435 ONE_MORE_BYTE (c1);
2436
2437 if (c1 < 0x80)
4ed46869 2438 {
b73bfc1c
KH
2439 charset = CHARSET_ASCII;
2440 if (c1 < 0x20)
4ed46869 2441 {
b73bfc1c 2442 if (c1 == '\r')
d46c5b12 2443 {
b73bfc1c 2444 if (coding->eol_type == CODING_EOL_CRLF)
d46c5b12 2445 {
b73bfc1c
KH
2446 ONE_MORE_BYTE (c2);
2447 if (c2 == '\n')
2448 c1 = c2;
2449 else if (coding->mode
2450 & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2451 {
2452 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2453 goto label_end_of_loop;
2454 }
2455 else
2456 /* To process C2 again, SRC is subtracted by 1. */
2457 src--;
d46c5b12 2458 }
b73bfc1c
KH
2459 else if (coding->eol_type == CODING_EOL_CR)
2460 c1 = '\n';
2461 }
2462 else if (c1 == '\n'
2463 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2464 && (coding->eol_type == CODING_EOL_CR
2465 || coding->eol_type == CODING_EOL_CRLF))
2466 {
2467 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2468 goto label_end_of_loop;
d46c5b12 2469 }
4ed46869 2470 }
4ed46869 2471 }
54f78171 2472 else
b73bfc1c 2473 {
4ed46869
KH
2474 if (sjis_p)
2475 {
b73bfc1c
KH
2476 if (c1 >= 0xF0)
2477 goto label_invalid_code;
2478 if (c1 < 0xA0 || c1 >= 0xE0)
fb88bf2d 2479 {
54f78171
KH
2480 /* SJIS -> JISX0208 */
2481 ONE_MORE_BYTE (c2);
b73bfc1c
KH
2482 if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
2483 goto label_invalid_code;
2484 DECODE_SJIS (c1, c2, c1, c2);
2485 charset = charset_jisx0208;
5e34de15 2486 }
fb88bf2d 2487 else
b73bfc1c
KH
2488 /* SJIS -> JISX0201-Kana */
2489 charset = charset_katakana_jisx0201;
4ed46869 2490 }
fb88bf2d 2491 else
fb88bf2d 2492 {
54f78171 2493 /* BIG5 -> Big5 */
b73bfc1c
KH
2494 if (c1 < 0xA1 || c1 > 0xFE)
2495 goto label_invalid_code;
2496 ONE_MORE_BYTE (c2);
2497 if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
2498 goto label_invalid_code;
2499 DECODE_BIG5 (c1, c2, charset, c1, c2);
4ed46869
KH
2500 }
2501 }
4ed46869 2502
b73bfc1c
KH
2503 c = DECODE_ISO_CHARACTER (charset, c1, c2);
2504 EMIT_CHAR (c);
fb88bf2d
KH
2505 continue;
2506
b73bfc1c
KH
2507 label_invalid_code:
2508 coding->errors++;
4ed46869 2509 src = src_base;
b73bfc1c
KH
2510 c = *src++;
2511 EMIT_CHAR (c);
fb88bf2d 2512 }
d46c5b12 2513
b73bfc1c
KH
2514 label_end_of_loop:
2515 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 2516 coding->produced = dst - destination;
b73bfc1c 2517 return;
4ed46869
KH
2518}
2519
2520/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
b73bfc1c
KH
2521 This function can encode charsets `ascii', `katakana-jisx0201',
2522 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
2523 are sure that all these charsets are registered as official charsets
4ed46869
KH
2524 (i.e. do not have extended leading-codes). Characters of other
2525 charsets are produced without any encoding. If SJIS_P is 1, encode
2526 SJIS text, else encode BIG5 text. */
2527
b73bfc1c 2528static void
4ed46869 2529encode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2530 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2531 struct coding_system *coding;
2532 unsigned char *source, *destination;
2533 int src_bytes, dst_bytes;
4ed46869
KH
2534 int sjis_p;
2535{
2536 unsigned char *src = source;
2537 unsigned char *src_end = source + src_bytes;
2538 unsigned char *dst = destination;
2539 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2540 /* SRC_BASE remembers the start position in source in each loop.
2541 The loop will be exited when there's not enough source text to
2542 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2543 there's not enough destination area to produce encoded codes
2544 (within macro EMIT_BYTES). */
2545 unsigned char *src_base;
2546 Lisp_Object translation_table;
4ed46869 2547
b73bfc1c
KH
2548 if (NILP (Venable_character_translation))
2549 translation_table = Qnil;
2550 else
4ed46869 2551 {
39658efc 2552 translation_table = coding->translation_table_for_encode;
b73bfc1c 2553 if (NILP (translation_table))
39658efc 2554 translation_table = Vstandard_translation_table_for_encode;
b73bfc1c 2555 }
a5d301df 2556
b73bfc1c
KH
2557 while (1)
2558 {
2559 int c, charset, c1, c2;
4ed46869 2560
b73bfc1c
KH
2561 src_base = src;
2562 ONE_MORE_CHAR (c);
2563
2564 /* Now encode the character C. */
2565 if (SINGLE_BYTE_CHAR_P (c))
2566 {
2567 switch (c)
4ed46869 2568 {
b73bfc1c
KH
2569 case '\r':
2570 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2571 {
2572 EMIT_ONE_BYTE (c);
2573 break;
2574 }
2575 c = '\n';
2576 case '\n':
2577 if (coding->eol_type == CODING_EOL_CRLF)
2578 {
2579 EMIT_TWO_BYTES ('\r', c);
2580 break;
2581 }
2582 else if (coding->eol_type == CODING_EOL_CR)
2583 c = '\r';
2584 default:
2585 EMIT_ONE_BYTE (c);
2586 }
2587 }
2588 else
2589 {
2590 SPLIT_CHAR (c, charset, c1, c2);
2591 if (sjis_p)
2592 {
2593 if (charset == charset_jisx0208
2594 || charset == charset_jisx0208_1978)
2595 {
2596 ENCODE_SJIS (c1, c2, c1, c2);
2597 EMIT_TWO_BYTES (c1, c2);
2598 }
39658efc
KH
2599 else if (charset == charset_katakana_jisx0201)
2600 EMIT_ONE_BYTE (c1 | 0x80);
fc53a214
KH
2601 else if (charset == charset_latin_jisx0201)
2602 EMIT_ONE_BYTE (c1);
b73bfc1c
KH
2603 else
2604 /* There's no way other than producing the internal
2605 codes as is. */
2606 EMIT_BYTES (src_base, src);
4ed46869 2607 }
4ed46869 2608 else
b73bfc1c
KH
2609 {
2610 if (charset == charset_big5_1 || charset == charset_big5_2)
2611 {
2612 ENCODE_BIG5 (charset, c1, c2, c1, c2);
2613 EMIT_TWO_BYTES (c1, c2);
2614 }
2615 else
2616 /* There's no way other than producing the internal
2617 codes as is. */
2618 EMIT_BYTES (src_base, src);
2619 }
4ed46869 2620 }
b73bfc1c 2621 coding->consumed_char++;
4ed46869
KH
2622 }
2623
b73bfc1c
KH
2624 label_end_of_loop:
2625 coding->consumed = src_base - source;
d46c5b12 2626 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2627}
2628
2629\f
1397dc18
KH
2630/*** 5. CCL handlers ***/
2631
2632/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2633 Check if a text is encoded in a coding system of which
2634 encoder/decoder are written in CCL program. If it is, return
2635 CODING_CATEGORY_MASK_CCL, else return 0. */
2636
2637int
2638detect_coding_ccl (src, src_end)
2639 unsigned char *src, *src_end;
2640{
2641 unsigned char *valid;
b73bfc1c
KH
2642 int c;
2643 /* Dummy for ONE_MORE_BYTE. */
2644 struct coding_system dummy_coding;
2645 struct coding_system *coding = &dummy_coding;
1397dc18
KH
2646
2647 /* No coding system is assigned to coding-category-ccl. */
2648 if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
2649 return 0;
2650
2651 valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
b73bfc1c 2652 while (1)
1397dc18 2653 {
b73bfc1c
KH
2654 ONE_MORE_BYTE (c);
2655 if (! valid[c])
2656 return 0;
1397dc18 2657 }
b73bfc1c 2658 label_end_of_loop:
1397dc18
KH
2659 return CODING_CATEGORY_MASK_CCL;
2660}
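
/* Illustration only, not part of coding.c: a minimal standalone
   sketch of the table-driven check used above.  The names
   `valid_codes', `mark_valid_range' and `all_bytes_valid' are
   hypothetical stand-ins; the table plays the role of
   `coding->spec.ccl.valid_codes', and the range rule mirrors the
   (START . END) handling in `setup_coding_system'.  */

```c
#include <stddef.h>

/* Hypothetical stand-in for coding->spec.ccl.valid_codes. */
static unsigned char valid_codes[256];

/* Mark every byte value in [START, END] as valid.  Ranges that are
   reversed or fall outside 0..255 are ignored.  */
static void
mark_valid_range (int start, int end)
{
  if (start >= 0 && start <= end && end < 256)
    while (start <= end)
      valid_codes[start++] = 1;
}

/* Return 1 if every byte of SRC[0..LEN-1] is marked valid, else 0.  */
static int
all_bytes_valid (const unsigned char *src, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (! valid_codes[src[i]])
      return 0;
  return 1;
}
```

/* One invalid byte is enough to reject the whole text, which is why
   the loop can return early.  */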
2661
2662\f
2663/*** 6. End-of-line handlers ***/
4ed46869 2664
b73bfc1c 2665/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 2666
b73bfc1c 2667static void
d46c5b12 2668decode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2669 struct coding_system *coding;
2670 unsigned char *source, *destination;
2671 int src_bytes, dst_bytes;
4ed46869
KH
2672{
2673 unsigned char *src = source;
4ed46869 2674 unsigned char *dst = destination;
b73bfc1c
KH
2675 unsigned char *src_end = src + src_bytes;
2676 unsigned char *dst_end = dst + dst_bytes;
2677 Lisp_Object translation_table;
2678 /* SRC_BASE remembers the start position in source in each loop.
2679 The loop will be exited when there's not enough source text
2680 (within macro ONE_MORE_BYTE), or when there's not enough
2681 destination area to produce a character (within macro
2682 EMIT_CHAR). */
2683 unsigned char *src_base;
2684 int c;
2685
2686 translation_table = Qnil;
4ed46869
KH
2687 switch (coding->eol_type)
2688 {
2689 case CODING_EOL_CRLF:
b73bfc1c 2690 while (1)
d46c5b12 2691 {
b73bfc1c
KH
2692 src_base = src;
2693 ONE_MORE_BYTE (c);
2694 if (c == '\r')
fb88bf2d 2695 {
b73bfc1c
KH
2696 ONE_MORE_BYTE (c);
2697 if (c != '\n')
2698 {
2699 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2700 {
2701 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2702 goto label_end_of_loop;
2703 }
2704 src--;
2705 c = '\r';
2706 }
fb88bf2d 2707 }
b73bfc1c
KH
2708 else if (c == '\n'
2709 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
d46c5b12 2710 {
b73bfc1c
KH
2711 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2712 goto label_end_of_loop;
d46c5b12 2713 }
b73bfc1c 2714 EMIT_CHAR (c);
d46c5b12 2715 }
b73bfc1c
KH
2716 break;
2717
2718 case CODING_EOL_CR:
2719 while (1)
d46c5b12 2720 {
b73bfc1c
KH
2721 src_base = src;
2722 ONE_MORE_BYTE (c);
2723 if (c == '\n')
2724 {
2725 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2726 {
2727 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2728 goto label_end_of_loop;
2729 }
2730 }
2731 else if (c == '\r')
2732 c = '\n';
2733 EMIT_CHAR (c);
d46c5b12 2734 }
4ed46869
KH
2735 break;
2736
b73bfc1c
KH
2737 default: /* no need for EOL handling */
2738 while (1)
d46c5b12 2739 {
b73bfc1c
KH
2740 src_base = src;
2741 ONE_MORE_BYTE (c);
2742 EMIT_CHAR (c);
d46c5b12 2743 }
4ed46869
KH
2744 }
2745
b73bfc1c
KH
2746 label_end_of_loop:
2747 coding->consumed = coding->consumed_char = src_base - source;
2748 coding->produced = dst - destination;
2749 return;
4ed46869
KH
2750}
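
/* Illustration only, not part of coding.c: a simplified standalone
   sketch of the CRLF case of `decode_eol'.  The function name
   `crlf_to_lf' is hypothetical.  Unlike the real code, it does not
   track SRC_BASE, does not honor
   CODING_MODE_INHIBIT_INCONSISTENT_EOL, and treats a CR at the very
   end of the input as a literal byte instead of waiting for more
   source.  */

```c
#include <stddef.h>

/* Copy SRC[0..LEN-1] to DST, turning each CR LF pair into a single
   LF.  A CR that is not followed by LF is copied unchanged.  DST must
   have room for LEN bytes.  Return the number of bytes written.  */
static size_t
crlf_to_lf (const unsigned char *src, size_t len, unsigned char *dst)
{
  size_t i = 0, n = 0;

  while (i < len)
    {
      unsigned char c = src[i++];

      if (c == '\r' && i < len && src[i] == '\n')
        {
          /* Consume the LF and emit it in place of the pair.  */
          c = '\n';
          i++;
        }
      dst[n++] = c;
    }
  return n;
}
```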
2751
2752/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
b73bfc1c
KH
2753 format of end-of-line according to `coding->eol_type'. It also
2754 converts multibyte form 8-bit characters to unibyte if
2755 CODING->src_multibyte is nonzero. If `coding->mode &
2756 CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
2757 also means end-of-line. */
4ed46869 2758
b73bfc1c 2759static void
d46c5b12 2760encode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2761 struct coding_system *coding;
2762 unsigned char *source, *destination;
2763 int src_bytes, dst_bytes;
4ed46869
KH
2764{
2765 unsigned char *src = source;
2766 unsigned char *dst = destination;
b73bfc1c
KH
2767 unsigned char *src_end = src + src_bytes;
2768 unsigned char *dst_end = dst + dst_bytes;
2769 Lisp_Object translation_table;
2770 /* SRC_BASE remembers the start position in source in each loop.
2771 The loop will be exited when there's not enough source text to
2772 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2773 there's not enough destination area to produce encoded codes
2774 (within macro EMIT_BYTES). */
2775 unsigned char *src_base;
2776 int c;
2777 int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
2778
2779 translation_table = Qnil;
2780 if (coding->src_multibyte
2781 && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
2782 {
2783 src_end--;
2784 src_bytes--;
2785 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
2786 }
fb88bf2d 2787
d46c5b12
KH
2788 if (coding->eol_type == CODING_EOL_CRLF)
2789 {
b73bfc1c 2790 while (src < src_end)
d46c5b12 2791 {
b73bfc1c 2792 src_base = src;
d46c5b12 2793 c = *src++;
b73bfc1c
KH
2794 if (c >= 0x20)
2795 EMIT_ONE_BYTE (c);
2796 else if (c == '\n' || (c == '\r' && selective_display))
2797 EMIT_TWO_BYTES ('\r', '\n');
d46c5b12 2798 else
b73bfc1c 2799 EMIT_ONE_BYTE (c);
d46c5b12 2800 }
ff2b1ea9 2801 src_base = src;
b73bfc1c 2802 label_end_of_loop:
005f0d35 2803 ;
d46c5b12
KH
2804 }
2805 else
4ed46869 2806 {
78a629d2 2807 if (!dst_bytes || src_bytes <= dst_bytes)
4ed46869 2808 {
b73bfc1c
KH
2809 safe_bcopy (src, dst, src_bytes);
2810 src_base = src_end;
2811 dst += src_bytes;
d46c5b12 2812 }
d46c5b12 2813 else
b73bfc1c
KH
2814 {
2815 if (coding->src_multibyte
2816 && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
2817 dst_bytes--;
2818 safe_bcopy (src, dst, dst_bytes);
2819 src_base = src + dst_bytes;
2820 dst = destination + dst_bytes;
2821 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2822 }
993824c9 2823 if (coding->eol_type == CODING_EOL_CR)
d46c5b12 2824 {
b73bfc1c
KH
2825 for (src = destination; src < dst; src++)
2826 if (*src == '\n') *src = '\r';
d46c5b12 2827 }
b73bfc1c 2828 else if (selective_display)
d46c5b12 2829 {
b73bfc1c
KH
2830 for (src = destination; src < dst; src++)
2831 if (*src == '\r') *src = '\n';
4ed46869 2832 }
4ed46869 2833 }
b73bfc1c
KH
2834 if (coding->src_multibyte)
2835 dst = destination + str_as_unibyte (destination, dst - destination);
4ed46869 2836
b73bfc1c
KH
2837 coding->consumed = src_base - source;
2838 coding->produced = dst - destination;
78a629d2 2839 coding->produced_char = coding->produced;
4ed46869
KH
2840}
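
/* Illustration only, not part of coding.c: a simplified standalone
   sketch of the CRLF case of `encode_eol' with a bounded
   destination.  The name `lf_to_crlf' and its out-parameters are
   hypothetical; the sketch ignores selective display and multibyte
   handling, and reports progress the way `coding->consumed' and
   `coding->produced' do.  */

```c
#include <stddef.h>

/* Copy SRC to DST, writing CR LF for every LF.  Stop before any
   character that would not fit in the DST_LEN bytes of DST.  Store
   the number of source bytes used in *CONSUMED and the number of
   bytes written in *PRODUCED.  Return 1 if all of SRC was encoded,
   0 if DST ran out of room.  */
static int
lf_to_crlf (const unsigned char *src, size_t src_len,
            unsigned char *dst, size_t dst_len,
            size_t *consumed, size_t *produced)
{
  size_t i = 0, n = 0;
  int complete = 1;

  while (i < src_len)
    {
      size_t need = (src[i] == '\n') ? 2 : 1;

      if (dst_len - n < need)
        {
          complete = 0;
          break;
        }
      if (src[i] == '\n')
        dst[n++] = '\r';
      dst[n++] = src[i++];
    }
  *consumed = i;
  *produced = n;
  return complete;
}
```

/* Checking the space needed before writing keeps a CR LF pair from
   being split across two calls.  */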
2841
2842\f
1397dc18 2843/*** 7. C library functions ***/
4ed46869
KH
2844
2845/* In Emacs Lisp, coding system is represented by a Lisp symbol which
2846 has a property `coding-system'. The value of this property is a
2847 vector of length 5 (called the coding-vector). Among the elements of
2848 this vector, the first (element[0]) and the fifth (element[4])
2849 carry important information for decoding/encoding. Before
2850 decoding/encoding, this information should be set in fields of a
2851 structure of type `coding_system'.
2852
2853 A value of property `coding-system' can be a symbol of another
2854 subsidiary coding-system. In that case, Emacs gets coding-vector
2855 from that symbol.
2856
2857 `element[0]' contains information to be set in `coding->type'. The
2858 values and their meanings are as follows:
2859
0ef69138
KH
2860 0 -- coding_type_emacs_mule
2861 1 -- coding_type_sjis
2862 2 -- coding_type_iso2022
2863 3 -- coding_type_big5
2864 4 -- coding_type_ccl encoder/decoder written in CCL
2865 nil -- coding_type_no_conversion
2866 t -- coding_type_undecided (automatic conversion on decoding,
2867 no-conversion on encoding)
4ed46869
KH
2868
2869 `element[4]' contains information to be set in `coding->flags' and
2870 `coding->spec'. The meaning varies by `coding->type'.
2871
2872 If `coding->type' is `coding_type_iso2022', element[4] is a vector
2873 of length 32 (of which the first 17 sub-elements are used now).
2874 Meanings of these sub-elements are:
2875
2876 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
2877 If the value is an integer of valid charset, the charset is
2878 assumed to be designated to graphic register N initially.
2879
2880 If the value is negative, it is the negated value of a charset which
2881 reserves graphic register N, which means that the charset is
2882 not designated initially but should be designated to graphic
2883 register N just before encoding a character in that charset.
2884
2885 If the value is nil, graphic register N is never used on
2886 encoding.
2887
2888 sub-element[N] where N is 4 through 16: to be set in `coding->flags'
2889 Each value takes t or nil. See the section ISO2022 of
2890 `coding.h' for more information.
2891
2892 If `coding->type' is `coding_type_big5', element[4] is t to denote
2893 BIG5-ETen or nil to denote BIG5-HKU.
2894
2895 If `coding->type' takes any other value, element[4] is ignored.
2896
2897 Emacs Lisp's coding system also carries information about format of
2898 end-of-line in a value of property `eol-type'. If the value is an
2899 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
2900 means CODING_EOL_CR. If it is not an integer, it should be a vector
2901 of subsidiary coding systems whose `eol-type' property has one
2902 of the above values.
2903
2904*/
2905
2906/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
2907 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
2908 is set up so that no conversion is necessary and -1 is returned;
2909 otherwise 0 is returned. */
2910
2911int
e0e989f6
KH
2912setup_coding_system (coding_system, coding)
2913 Lisp_Object coding_system;
4ed46869
KH
2914 struct coding_system *coding;
2915{
d46c5b12 2916 Lisp_Object coding_spec, coding_type, eol_type, plist;
4608c386 2917 Lisp_Object val;
70c22245 2918 int i;
4ed46869 2919
d46c5b12 2920 /* Initialize some fields required for all kinds of coding systems. */
774324d6 2921 coding->symbol = coding_system;
d46c5b12
KH
2922 coding->common_flags = 0;
2923 coding->mode = 0;
2924 coding->heading_ascii = -1;
2925 coding->post_read_conversion = coding->pre_write_conversion = Qnil;
ec6d2bb8
KH
2926 coding->composing = COMPOSITION_DISABLED;
2927 coding->cmp_data = NULL;
1f5dbf34
KH
2928
2929 if (NILP (coding_system))
2930 goto label_invalid_coding_system;
2931
4608c386 2932 coding_spec = Fget (coding_system, Qcoding_system);
1f5dbf34 2933
4608c386
KH
2934 if (!VECTORP (coding_spec)
2935 || XVECTOR (coding_spec)->size != 5
2936 || !CONSP (XVECTOR (coding_spec)->contents[3]))
4ed46869 2937 goto label_invalid_coding_system;
4608c386 2938
d46c5b12
KH
2939 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
2940 if (VECTORP (eol_type))
2941 {
2942 coding->eol_type = CODING_EOL_UNDECIDED;
2943 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
2944 }
2945 else if (XFASTINT (eol_type) == 1)
2946 {
2947 coding->eol_type = CODING_EOL_CRLF;
2948 coding->common_flags
2949 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2950 }
2951 else if (XFASTINT (eol_type) == 2)
2952 {
2953 coding->eol_type = CODING_EOL_CR;
2954 coding->common_flags
2955 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2956 }
2957 else
2958 coding->eol_type = CODING_EOL_LF;
2959
2960 coding_type = XVECTOR (coding_spec)->contents[0];
2961 /* Try short cut. */
2962 if (SYMBOLP (coding_type))
2963 {
2964 if (EQ (coding_type, Qt))
2965 {
2966 coding->type = coding_type_undecided;
2967 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
2968 }
2969 else
2970 coding->type = coding_type_no_conversion;
2971 return 0;
2972 }
2973
d46c5b12
KH
2974 /* Get values of coding system properties:
2975 `post-read-conversion', `pre-write-conversion',
f967223b 2976 `translation-table-for-decode', `translation-table-for-encode'. */
4608c386 2977 plist = XVECTOR (coding_spec)->contents[3];
b843d1ae
KH
2978 /* Pre & post conversion functions should be disabled if
2979 inhibit_pre_post_conversion is nonzero. This is the case when a code
2980 conversion function is called while those functions are running. */
2981 if (! inhibit_pre_post_conversion)
2982 {
2983 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
2984 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
2985 }
f967223b 2986 val = Fplist_get (plist, Qtranslation_table_for_decode);
4608c386 2987 if (SYMBOLP (val))
f967223b
KH
2988 val = Fget (val, Qtranslation_table_for_decode);
2989 coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
2990 val = Fplist_get (plist, Qtranslation_table_for_encode);
4608c386 2991 if (SYMBOLP (val))
f967223b
KH
2992 val = Fget (val, Qtranslation_table_for_encode);
2993 coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
d46c5b12
KH
2994 val = Fplist_get (plist, Qcoding_category);
2995 if (!NILP (val))
2996 {
2997 val = Fget (val, Qcoding_category_index);
2998 if (INTEGERP (val))
2999 coding->category_idx = XINT (val);
3000 else
3001 goto label_invalid_coding_system;
3002 }
3003 else
3004 goto label_invalid_coding_system;
4608c386 3005
ec6d2bb8
KH
3006 /* If the coding system has non-nil `composition' property, enable
3007 composition handling. */
3008 val = Fplist_get (plist, Qcomposition);
3009 if (!NILP (val))
3010 coding->composing = COMPOSITION_NO;
3011
d46c5b12 3012 switch (XFASTINT (coding_type))
4ed46869
KH
3013 {
3014 case 0:
0ef69138 3015 coding->type = coding_type_emacs_mule;
c952af22
KH
3016 if (!NILP (coding->post_read_conversion))
3017 coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
3018 if (!NILP (coding->pre_write_conversion))
3019 coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3020 break;
3021
3022 case 1:
3023 coding->type = coding_type_sjis;
c952af22
KH
3024 coding->common_flags
3025 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3026 break;
3027
3028 case 2:
3029 coding->type = coding_type_iso2022;
c952af22
KH
3030 coding->common_flags
3031 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3032 {
70c22245 3033 Lisp_Object val, temp;
4ed46869 3034 Lisp_Object *flags;
d46c5b12 3035 int i, charset, reg_bits = 0;
4ed46869 3036
4608c386 3037 val = XVECTOR (coding_spec)->contents[4];
f44d27ce 3038
4ed46869
KH
3039 if (!VECTORP (val) || XVECTOR (val)->size != 32)
3040 goto label_invalid_coding_system;
3041
3042 flags = XVECTOR (val)->contents;
3043 coding->flags
3044 = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
3045 | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
3046 | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
3047 | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
3048 | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
3049 | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
3050 | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
3051 | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
e0e989f6
KH
3052 | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
3053 | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
c4825358
KH
3054 | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3055 | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
3f003981 3056 | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
c4825358 3057 );
4ed46869
KH
3058
3059 /* Invoke graphic register 0 to plane 0. */
3060 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
3061 /* Invoke graphic register 1 to plane 1 if we can use full 8-bit. */
3062 CODING_SPEC_ISO_INVOCATION (coding, 1)
3063 = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
3064 /* Not single shifting at first. */
6e85d753 3065 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
e0e989f6 3066 /* Beginning of buffer should also be regarded as bol. */
6e85d753 3067 CODING_SPEC_ISO_BOL (coding) = 1;
4ed46869 3068
70c22245
KH
3069 for (charset = 0; charset <= MAX_CHARSET; charset++)
3070 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
3071 val = Vcharset_revision_alist;
3072 while (CONSP (val))
3073 {
03699b14 3074 charset = get_charset_id (Fcar_safe (XCAR (val)));
70c22245 3075 if (charset >= 0
03699b14 3076 && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
70c22245
KH
3077 && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
3078 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
03699b14 3079 val = XCDR (val);
70c22245
KH
3080 }
3081
4ed46869
KH
3082 /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
3083 FLAGS[REG] can be one of the following:
3084 integer CHARSET: CHARSET occupies register REG,
3085 t: designate nothing to REG initially, but can be used
3086 by any charsets,
3087 list of integer, nil, or t: designate the first
3088 element (if integer) to REG initially, the remaining
3089 elements (if integer) are designated to REG on request,
d46c5b12 3090 if an element is t, REG can be used by any charsets,
4ed46869 3091 nil: REG is never used. */
467e7675 3092 for (charset = 0; charset <= MAX_CHARSET; charset++)
1ba9e4ab
KH
3093 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3094 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
4ed46869
KH
3095 for (i = 0; i < 4; i++)
3096 {
3097 if (INTEGERP (flags[i])
e0e989f6
KH
3098 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset))
3099 || (charset = get_charset_id (flags[i])) >= 0)
4ed46869
KH
3100 {
3101 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3102 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
3103 }
3104 else if (EQ (flags[i], Qt))
3105 {
3106 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
d46c5b12
KH
3107 reg_bits |= 1 << i;
3108 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
4ed46869
KH
3109 }
3110 else if (CONSP (flags[i]))
3111 {
84d60297
RS
3112 Lisp_Object tail;
3113 tail = flags[i];
4ed46869 3114
d46c5b12 3115 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
03699b14
KR
3116 if (INTEGERP (XCAR (tail))
3117 && (charset = XINT (XCAR (tail)),
e0e989f6 3118 CHARSET_VALID_P (charset))
03699b14 3119 || (charset = get_charset_id (XCAR (tail))) >= 0)
4ed46869
KH
3120 {
3121 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3122 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) =i;
3123 }
3124 else
3125 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
03699b14 3126 tail = XCDR (tail);
4ed46869
KH
3127 while (CONSP (tail))
3128 {
03699b14
KR
3129 if (INTEGERP (XCAR (tail))
3130 && (charset = XINT (XCAR (tail)),
e0e989f6 3131 CHARSET_VALID_P (charset))
03699b14 3132 || (charset = get_charset_id (XCAR (tail))) >= 0)
70c22245
KH
3133 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3134 = i;
03699b14 3135 else if (EQ (XCAR (tail), Qt))
d46c5b12 3136 reg_bits |= 1 << i;
03699b14 3137 tail = XCDR (tail);
4ed46869
KH
3138 }
3139 }
3140 else
3141 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
3142
3143 CODING_SPEC_ISO_DESIGNATION (coding, i)
3144 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
3145 }
3146
d46c5b12 3147 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
4ed46869
KH
3148 {
3149 /* REG 1 can be used only by locking shift in 7-bit env. */
3150 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
d46c5b12 3151 reg_bits &= ~2;
4ed46869
KH
3152 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
3153 /* Without any shifting, only REG 0 and 1 can be used. */
d46c5b12 3154 reg_bits &= 3;
4ed46869
KH
3155 }
3156
d46c5b12
KH
3157 if (reg_bits)
3158 for (charset = 0; charset <= MAX_CHARSET; charset++)
6e85d753 3159 {
96148065
KH
3160 if (CHARSET_VALID_P (charset)
3161 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3162 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
3163 {
3164 /* There exist some default graphic registers to be
96148065 3165 used by CHARSET. */
d46c5b12
KH
3166
3167 /* We had better avoid designating a charset of
3168 CHARS96 to REG 0 as far as possible. */
3169 if (CHARSET_CHARS (charset) == 96)
3170 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3171 = (reg_bits & 2
3172 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
3173 else
3174 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3175 = (reg_bits & 1
3176 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
3177 }
6e85d753 3178 }
4ed46869 3179 }
c952af22 3180 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
d46c5b12 3181 coding->spec.iso2022.last_invalid_designation_register = -1;
4ed46869
KH
3182 break;
3183
3184 case 3:
3185 coding->type = coding_type_big5;
c952af22
KH
3186 coding->common_flags
3187 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3188 coding->flags
4608c386 3189 = (NILP (XVECTOR (coding_spec)->contents[4])
4ed46869
KH
3190 ? CODING_FLAG_BIG5_HKU
3191 : CODING_FLAG_BIG5_ETEN);
3192 break;
3193
3194 case 4:
3195 coding->type = coding_type_ccl;
c952af22
KH
3196 coding->common_flags
3197 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3198 {
84d60297 3199 val = XVECTOR (coding_spec)->contents[4];
ef4ced28
KH
3200 if (! CONSP (val)
3201 || setup_ccl_program (&(coding->spec.ccl.decoder),
03699b14 3202 XCAR (val)) < 0
ef4ced28 3203 || setup_ccl_program (&(coding->spec.ccl.encoder),
03699b14 3204 XCDR (val)) < 0)
4ed46869 3205 goto label_invalid_coding_system;
1397dc18
KH
3206
3207 bzero (coding->spec.ccl.valid_codes, 256);
3208 val = Fplist_get (plist, Qvalid_codes);
3209 if (CONSP (val))
3210 {
3211 Lisp_Object this;
3212
03699b14 3213 for (; CONSP (val); val = XCDR (val))
1397dc18 3214 {
03699b14 3215 this = XCAR (val);
1397dc18
KH
3216 if (INTEGERP (this)
3217 && XINT (this) >= 0 && XINT (this) < 256)
3218 coding->spec.ccl.valid_codes[XINT (this)] = 1;
3219 else if (CONSP (this)
03699b14
KR
3220 && INTEGERP (XCAR (this))
3221 && INTEGERP (XCDR (this)))
1397dc18 3222 {
03699b14
KR
3223 int start = XINT (XCAR (this));
3224 int end = XINT (XCDR (this));
1397dc18
KH
3225
3226 if (start >= 0 && start <= end && end < 256)
e133c8fa 3227 while (start <= end)
1397dc18
KH
3228 coding->spec.ccl.valid_codes[start++] = 1;
3229 }
3230 }
3231 }
4ed46869 3232 }
c952af22 3233 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
aaaf0b1e 3234 coding->spec.ccl.cr_carryover = 0;
4ed46869
KH
3235 break;
3236
27901516
KH
3237 case 5:
3238 coding->type = coding_type_raw_text;
3239 break;
3240
4ed46869 3241 default:
d46c5b12 3242 goto label_invalid_coding_system;
4ed46869
KH
3243 }
3244 return 0;
3245
3246 label_invalid_coding_system:
3247 coding->type = coding_type_no_conversion;
d46c5b12 3248 coding->category_idx = CODING_CATEGORY_IDX_BINARY;
c952af22 3249 coding->common_flags = 0;
dec137e5 3250 coding->eol_type = CODING_EOL_LF;
d46c5b12 3251 coding->pre_write_conversion = coding->post_read_conversion = Qnil;
4ed46869
KH
3252 return -1;
3253}
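
/* Illustration only, not part of coding.c: a standalone sketch of how
   the ISO2022 branch above narrows REG_BITS and then picks a default
   graphic register for a charset with no requested designation.  The
   function names and the plain `int' flag parameters are
   hypothetical; in the real code the flags live in `coding->flags'
   and the charset size comes from CHARSET_CHARS.  */

```c
/* REG_BITS has bit N set when graphic register N may be used by any
   charset.  Drop the registers that cannot be reached with the
   available shift functions.  */
static unsigned
restrict_registers (unsigned reg_bits, int locking_shift,
                    int seven_bits, int single_shift)
{
  if (reg_bits && ! locking_shift)
    {
      /* REG 1 needs locking shift in a 7-bit environment.  */
      if (seven_bits)
        reg_bits &= ~2u;
      /* Without any shifting, only REG 0 and 1 can be used.  */
      if (! single_shift)
        reg_bits &= 3u;
    }
  return reg_bits;
}

/* Pick the default register.  A 96-character charset avoids REG 0
   whenever another register is available.  */
static int
default_register (unsigned reg_bits, int chars96)
{
  if (chars96)
    return (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : (reg_bits & 8) ? 3 : 0;
  return (reg_bits & 1) ? 0 : (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : 3;
}
```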
3254
ec6d2bb8
KH
3255/* Free memory blocks allocated for storing composition information. */
3256
3257void
3258coding_free_composition_data (coding)
3259 struct coding_system *coding;
3260{
3261 struct composition_data *cmp_data = coding->cmp_data, *next;
3262
3263 if (!cmp_data)
3264 return;
3265 /* Memory blocks are chained. At first, rewind to the first, then,
3266 free blocks one by one. */
3267 while (cmp_data->prev)
3268 cmp_data = cmp_data->prev;
3269 while (cmp_data)
3270 {
3271 next = cmp_data->next;
3272 xfree (cmp_data);
3273 cmp_data = next;
3274 }
3275 coding->cmp_data = NULL;
3276}
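
/* Illustration only, not part of coding.c: a standalone sketch of the
   rewind-then-free pattern used by `coding_free_composition_data'.
   `struct block' and `free_chain' are hypothetical, and the sketch
   uses the C library `free' where Emacs uses `xfree'.  */

```c
#include <stdlib.h>

/* A doubly linked block, standing in for struct composition_data.  */
struct block
{
  struct block *prev, *next;
};

/* Free every block in the chain containing B, whichever block B
   points at.  Return the number of blocks freed.  */
static int
free_chain (struct block *b)
{
  int count = 0;

  if (! b)
    return 0;
  /* Rewind to the first block of the chain.  */
  while (b->prev)
    b = b->prev;
  /* Free blocks one by one, saving NEXT before each free.  */
  while (b)
    {
      struct block *next = b->next;

      free (b);
      b = next;
      count++;
    }
  return count;
}
```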
3277
3278/* Set `char_offset' member of all memory blocks pointed to by
3279 coding->cmp_data to POS. */
3280
3281void
3282coding_adjust_composition_offset (coding, pos)
3283 struct coding_system *coding;
3284 int pos;
3285{
3286 struct composition_data *cmp_data;
3287
3288 for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
3289 cmp_data->char_offset = pos;
3290}
3291
54f78171
KH
3292/* Set up raw-text or one of its subsidiaries in the structure
3293 coding_system CODING according to the eol_type value already set
3294 in CODING. CODING should be set up for some coding system in
3295 advance. */
3296
3297void
3298setup_raw_text_coding_system (coding)
3299 struct coding_system *coding;
3300{
3301 if (coding->type != coding_type_raw_text)
3302 {
3303 coding->symbol = Qraw_text;
3304 coding->type = coding_type_raw_text;
3305 if (coding->eol_type != CODING_EOL_UNDECIDED)
3306 {
84d60297
RS
3307 Lisp_Object subsidiaries;
3308 subsidiaries = Fget (Qraw_text, Qeol_type);
54f78171
KH
3309
3310 if (VECTORP (subsidiaries)
3311 && XVECTOR (subsidiaries)->size == 3)
3312 coding->symbol
3313 = XVECTOR (subsidiaries)->contents[coding->eol_type];
3314 }
716e0b0a 3315 setup_coding_system (coding->symbol, coding);
54f78171
KH
3316 }
3317 return;
3318}
3319
4ed46869
KH
3320/* Emacs has a mechanism to automatically detect a coding system if it
3321 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But,
3322 it's impossible to distinguish some coding systems accurately
3323 because they use the same range of codes. So, at first, coding
3324 systems are grouped into the following categories:
3325
0ef69138 3326 o coding-category-emacs-mule
4ed46869
KH
3327
3328 The category for a coding system which has the same code range
3329 as Emacs' internal format. Assigned the coding-system (Lisp
0ef69138 3330 symbol) `emacs-mule' by default.
4ed46869
KH
3331
3332 o coding-category-sjis
3333
3334 The category for a coding system which has the same code range
3335 as SJIS. Assigned the coding-system (Lisp
7717c392 3336 symbol) `japanese-shift-jis' by default.
4ed46869
KH
3337
3338 o coding-category-iso-7
3339
3340 The category for a coding system which has the same code range
7717c392 3341 as ISO2022 of 7-bit environment. This doesn't use any locking
d46c5b12
KH
3342 shift and single shift functions. This can encode/decode all
3343 charsets. Assigned the coding-system (Lisp symbol)
3344 `iso-2022-7bit' by default.
3345
3346 o coding-category-iso-7-tight
3347
3348 Same as coding-category-iso-7 except that this can
3349 encode/decode only the specified charsets.
4ed46869
KH
3350
3351 o coding-category-iso-8-1
3352
3353 The category for a coding system which has the same code range
3354 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3355 for DIMENSION1 charset. This doesn't use any locking shift
3356 and single shift functions. Assigned the coding-system (Lisp
3357 symbol) `iso-latin-1' by default.
4ed46869
KH
3358
3359 o coding-category-iso-8-2
3360
3361 The category for a coding system which has the same code range
3362 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3363 for DIMENSION2 charset. This doesn't use any locking shift
3364 and single shift functions. Assigned the coding-system (Lisp
3365 symbol) `japanese-iso-8bit' by default.
4ed46869 3366
7717c392 3367 o coding-category-iso-7-else
4ed46869
KH
3368
3369 The category for a coding system which has the same code range
7717c392
KH
3370 as ISO2022 of 7-bit environment but uses locking shift or
3371 single shift functions. Assigned the coding-system (Lisp
3372 symbol) `iso-2022-7bit-lock' by default.
3373
3374 o coding-category-iso-8-else
3375
3376 The category for a coding system which has the same code range
3377 as ISO2022 of 8-bit environment but uses locking shift or
3378 single shift functions. Assigned the coding-system (Lisp
3379 symbol) `iso-2022-8bit-ss2' by default.
4ed46869
KH
3380
3381 o coding-category-big5
3382
3383 The category for a coding system which has the same code range
3384 as BIG5. Assigned the coding-system (Lisp symbol)
e0e989f6 3385 `cn-big5' by default.
4ed46869 3386
fa42c37f
KH
3387 o coding-category-utf-8
3388
3389 The category for a coding system which has the same code range
3390 as UTF-8 (cf. RFC2279). Assigned the coding-system (Lisp
3391 symbol) `utf-8' by default.
3392
3393 o coding-category-utf-16-be
3394
3395 The category for a coding system in which a text has a
3396 Unicode signature (cf. Unicode Standard) in the order of BIG
3397 endian at the head. Assigned the coding-system (Lisp symbol)
3398 `utf-16-be' by default.
3399
3400 o coding-category-utf-16-le
3401
3402 The category for a coding system in which a text has a
3403 Unicode signature (cf. Unicode Standard) in the order of
3404 LITTLE endian at the head. Assigned the coding-system (Lisp
3405 symbol) `utf-16-le' by default.
3406
1397dc18
KH
3407 o coding-category-ccl
3408
3409 The category for a coding system of which encoder/decoder is
3410 written in CCL programs. The default value is nil, i.e., no
3411 coding system is assigned.
3412
4ed46869
KH
3413 o coding-category-binary
3414
3415 The category for a coding system not categorized in any of the
3416 above. Assigned the coding-system (Lisp symbol)
e0e989f6 3417 `no-conversion' by default.
4ed46869
KH
3418
3419 Each of them is a Lisp symbol whose value is an actual
3420 `coding-system' (also a Lisp symbol) assigned by a user.
3421 What Emacs actually does is detect the category of a coding system,
3422 then use the `coding-system' assigned to that category. If Emacs
3423 cannot narrow the result to a single category, it selects the one of
3424 highest priority. Priorities of categories are also specified by a
3425 user in the Lisp variable `coding-category-list'.
3426
3427*/
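
/* Illustration only, not part of coding.c: a standalone sketch of
   choosing one category from a detection mask by priority order.  The
   CAT_* bits and `pick_by_priority' are hypothetical; the real code
   uses the CODING_CATEGORY_MASK_XXX macros and the table
   `coding_priorities'.  The raw-text fallback is modeled on the
   ISO2022 control-code branch of `detect_coding_mask'.  */

```c
/* Hypothetical category bits, one bit per category.  */
enum
{
  CAT_SJIS = 1,
  CAT_BIG5 = 2,
  CAT_UTF8 = 4,
  CAT_RAW = 8
};

/* PRIORITIES lists N category bits, highest priority first.  Return
   the first one that is also set in MASK, or CAT_RAW if none is.  */
static int
pick_by_priority (int mask, const int *priorities, int n)
{
  int i;

  for (i = 0; i < n; i++)
    if (mask & priorities[i])
      return priorities[i];
  return CAT_RAW;
}
```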
3428
66cfb530
KH
3429static
3430int ascii_skip_code[256];
3431
d46c5b12 3432/* Detect how a text of length SRC_BYTES pointed to by SOURCE is encoded.
4ed46869
KH
3433 If it detects possible coding systems, return an integer in which
3434 appropriate flag bits are set. Flag bits are defined by macros
fa42c37f
KH
3435 CODING_CATEGORY_MASK_XXX in `coding.h'. If PRIORITIES is non-NULL,
3436 it should point to the table `coding_priorities'. In that case, only
3437 the flag bit for a coding system of the highest priority is set in
3438 the returned value.
4ed46869 3439
d46c5b12
KH
3440 The number of ASCII characters at the head is returned in *SKIP. */
3441
3442static int
3443detect_coding_mask (source, src_bytes, priorities, skip)
3444 unsigned char *source;
3445 int src_bytes, *priorities, *skip;
4ed46869
KH
3446{
3447 register unsigned char c;
d46c5b12 3448 unsigned char *src = source, *src_end = source + src_bytes;
fa42c37f
KH
3449 unsigned int mask, utf16_examined_p, iso2022_examined_p;
3450 int i, idx;
4ed46869
KH
3451
3452 /* At first, skip all ASCII characters and control characters except
3453 for three ISO2022 specific control characters. */
66cfb530
KH
3454 ascii_skip_code[ISO_CODE_SO] = 0;
3455 ascii_skip_code[ISO_CODE_SI] = 0;
3456 ascii_skip_code[ISO_CODE_ESC] = 0;
3457
bcf26d6a 3458 label_loop_detect_coding:
66cfb530 3459 while (src < src_end && ascii_skip_code[*src]) src++;
d46c5b12 3460 *skip = src - source;
4ed46869
KH
3461
3462 if (src >= src_end)
3463 /* We found nothing other than ASCII. There's nothing to do. */
d46c5b12 3464 return 0;
4ed46869 3465
8a8147d6 3466 c = *src;
4ed46869
KH
3467 /* The text seems to be encoded in some multilingual coding system.
3468 Now, try to find in which coding system the text is encoded. */
3469 if (c < 0x80)
bcf26d6a
KH
3470 {
3471 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
3472 /* C is an ISO2022 specific control code of C0. */
3473 mask = detect_coding_iso2022 (src, src_end);
1b2af4b0 3474 if (mask == 0)
d46c5b12
KH
3475 {
3476 /* No valid ISO2022 code follows C. Try again. */
3477 src++;
66cfb530
KH
3478 if (c == ISO_CODE_ESC)
3479 ascii_skip_code[ISO_CODE_ESC] = 1;
3480 else
3481 ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
d46c5b12
KH
3482 goto label_loop_detect_coding;
3483 }
3484 if (priorities)
fa42c37f
KH
3485 {
3486 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3487 {
3488 if (mask & priorities[i])
3489 return priorities[i];
3490 }
3491 return CODING_CATEGORY_MASK_RAW_TEXT;
3492 }
bcf26d6a 3493 }
d46c5b12 3494 else
c4825358 3495 {
d46c5b12 3496 int try;
4ed46869 3497
d46c5b12
KH
3498 if (c < 0xA0)
3499 {
3500 /* C is the first byte of SJIS character code,
fa42c37f
KH
3501 or a leading-code of Emacs' internal format (emacs-mule),
3502 or the first byte of UTF-16. */
3503 try = (CODING_CATEGORY_MASK_SJIS
3504 | CODING_CATEGORY_MASK_EMACS_MULE
3505 | CODING_CATEGORY_MASK_UTF_16_BE
3506 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12
KH
3507
3508 /* Or, if C is a special latin extra code,
3509 or is an ISO2022 specific control code of C1 (SS2 or SS3),
3510 or is an ISO2022 control-sequence-introducer (CSI),
3511 we should also consider the possibility of ISO2022 codings. */
3512 if ((VECTORP (Vlatin_extra_code_table)
3513 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
3514 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
3515 || (c == ISO_CODE_CSI
3516 && (src < src_end
3517 && (*src == ']'
3518 || ((*src == '0' || *src == '1' || *src == '2')
3519 && src + 1 < src_end
3520 && src[1] == ']')))))
3521 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
3522 | CODING_CATEGORY_MASK_ISO_8BIT);
3523 }
c4825358 3524 else
d46c5b12
KH
3525 /* C is a character of ISO2022 in graphic plane right,
3526 or a SJIS's 1-byte character code (i.e. JISX0201),
fa42c37f
KH
3527 or the first byte of BIG5's 2-byte code,
3528 or the first byte of UTF-8/16. */
d46c5b12
KH
3529 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
3530 | CODING_CATEGORY_MASK_ISO_8BIT
3531 | CODING_CATEGORY_MASK_SJIS
fa42c37f
KH
3532 | CODING_CATEGORY_MASK_BIG5
3533 | CODING_CATEGORY_MASK_UTF_8
3534 | CODING_CATEGORY_MASK_UTF_16_BE
3535 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12 3536
1397dc18
KH
3537 /* Or, we may have to consider the possibility of CCL. */
3538 if (coding_system_table[CODING_CATEGORY_IDX_CCL]
3539 && (coding_system_table[CODING_CATEGORY_IDX_CCL]
3540 ->spec.ccl.valid_codes)[c])
3541 try |= CODING_CATEGORY_MASK_CCL;
3542
d46c5b12 3543 mask = 0;
fa42c37f 3544 utf16_examined_p = iso2022_examined_p = 0;
d46c5b12
KH
3545 if (priorities)
3546 {
3547 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3548 {
fa42c37f
KH
3549 if (!iso2022_examined_p
3550 && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
3551 {
3552 mask |= detect_coding_iso2022 (src, src_end);
3553 iso2022_examined_p = 1;
3554 }
5ab13dd0 3555 else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
fa42c37f
KH
3556 mask |= detect_coding_sjis (src, src_end);
3557 else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
3558 mask |= detect_coding_utf_8 (src, src_end);
3559 else if (!utf16_examined_p
3560 && (priorities[i] & try &
3561 CODING_CATEGORY_MASK_UTF_16_BE_LE))
3562 {
3563 mask |= detect_coding_utf_16 (src, src_end);
3564 utf16_examined_p = 1;
3565 }
5ab13dd0 3566 else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
fa42c37f 3567 mask |= detect_coding_big5 (src, src_end);
5ab13dd0 3568 else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
fa42c37f 3569 mask |= detect_coding_emacs_mule (src, src_end);
89fa8b36 3570 else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
fa42c37f 3571 mask |= detect_coding_ccl (src, src_end);
5ab13dd0 3572 else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
fa42c37f 3573 mask |= CODING_CATEGORY_MASK_RAW_TEXT;
5ab13dd0 3574 else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
fa42c37f
KH
3575 mask |= CODING_CATEGORY_MASK_BINARY;
3576 if (mask & priorities[i])
3577 return priorities[i];
d46c5b12
KH
3578 }
3579 return CODING_CATEGORY_MASK_RAW_TEXT;
3580 }
3581 if (try & CODING_CATEGORY_MASK_ISO)
3582 mask |= detect_coding_iso2022 (src, src_end);
3583 if (try & CODING_CATEGORY_MASK_SJIS)
3584 mask |= detect_coding_sjis (src, src_end);
3585 if (try & CODING_CATEGORY_MASK_BIG5)
3586 mask |= detect_coding_big5 (src, src_end);
fa42c37f
KH
3587 if (try & CODING_CATEGORY_MASK_UTF_8)
3588 mask |= detect_coding_utf_8 (src, src_end);
3589 if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
3590 mask |= detect_coding_utf_16 (src, src_end);
d46c5b12 3591 if (try & CODING_CATEGORY_MASK_EMACS_MULE)
1397dc18
KH
3592 mask |= detect_coding_emacs_mule (src, src_end);
3593 if (try & CODING_CATEGORY_MASK_CCL)
3594 mask |= detect_coding_ccl (src, src_end);
c4825358 3595 }
5ab13dd0 3596 return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
4ed46869
KH
3597}
3598
3599/* Detect how a text of length SRC_BYTES pointed to by SRC is encoded.
3600 Information about the detected coding system is stored in CODING. */
3601
3602void
3603detect_coding (coding, src, src_bytes)
3604 struct coding_system *coding;
3605 unsigned char *src;
3606 int src_bytes;
3607{
d46c5b12
KH
3608 unsigned int idx;
3609 int skip, mask, i;
84d60297 3610 Lisp_Object val;
4ed46869 3611
84d60297 3612 val = Vcoding_category_list;
66cfb530 3613 mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip);
d46c5b12 3614 coding->heading_ascii = skip;
4ed46869 3615
d46c5b12
KH
3616 if (!mask) return;
3617
3618 /* We found a single coding system of the highest priority in MASK. */
3619 idx = 0;
3620 while (mask && ! (mask & 1)) mask >>= 1, idx++;
3621 if (! mask)
3622 idx = CODING_CATEGORY_IDX_RAW_TEXT;
4ed46869 3623
d46c5b12
KH
3624 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value;
3625
3626 if (coding->eol_type != CODING_EOL_UNDECIDED)
27901516 3627 {
84d60297 3628 Lisp_Object tmp;
d46c5b12 3629
84d60297 3630 tmp = Fget (val, Qeol_type);
d46c5b12
KH
3631 if (VECTORP (tmp))
3632 val = XVECTOR (tmp)->contents[coding->eol_type];
4ed46869 3633 }
b73bfc1c
KH
3634
3635 /* Setup this new coding system while preserving some slots. */
3636 {
3637 int src_multibyte = coding->src_multibyte;
3638 int dst_multibyte = coding->dst_multibyte;
3639
3640 setup_coding_system (val, coding);
3641 coding->src_multibyte = src_multibyte;
3642 coding->dst_multibyte = dst_multibyte;
3643 coding->heading_ascii = skip;
3644 }
4ed46869
KH
3645}
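The index computation in detect_coding can be read in isolation as a scan for the lowest set bit of MASK. The following is a minimal sketch of that step only; the name `sketch_lowest_category` and the `fallback` parameter are hypothetical and stand in for the CODING_CATEGORY_IDX_RAW_TEXT fallback used above.

```c
/* Return the index of the lowest set bit of MASK, or FALLBACK when
   MASK is zero.  FALLBACK is a hypothetical stand-in for
   CODING_CATEGORY_IDX_RAW_TEXT. */
static int
sketch_lowest_category (unsigned int mask, int fallback)
{
  int idx = 0;

  if (!mask)
    return fallback;
  while (!(mask & 1))
    {
      mask >>= 1;
      idx++;
    }
  return idx;
}
```

When detect_coding_mask is called with a priority list it returns a single category bit, so the scan simply recovers that category's index.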
3646
d46c5b12
KH
3647/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
3648 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
3649 CODING_EOL_CR, CODING_EOL_UNDECIDED, and CODING_EOL_INCONSISTENT.
3650
3651 How many non-eol characters are at the head is returned as *SKIP. */
4ed46869 3652
bc4bc72a
RS
3653#define MAX_EOL_CHECK_COUNT 3
3654
d46c5b12
KH
3655static int
3656detect_eol_type (source, src_bytes, skip)
3657 unsigned char *source;
3658 int src_bytes, *skip;
4ed46869 3659{
d46c5b12 3660 unsigned char *src = source, *src_end = src + src_bytes;
4ed46869 3661 unsigned char c;
bc4bc72a
RS
3662 int total = 0; /* How many end-of-lines are found so far. */
3663 int eol_type = CODING_EOL_UNDECIDED;
3664 int this_eol_type;
4ed46869 3665
d46c5b12
KH
3666 *skip = 0;
3667
bc4bc72a 3668 while (src < src_end && total < MAX_EOL_CHECK_COUNT)
4ed46869
KH
3669 {
3670 c = *src++;
bc4bc72a 3671 if (c == '\n' || c == '\r')
4ed46869 3672 {
d46c5b12
KH
3673 if (*skip == 0)
3674 *skip = src - 1 - source;
bc4bc72a
RS
3675 total++;
3676 if (c == '\n')
3677 this_eol_type = CODING_EOL_LF;
3678 else if (src >= src_end || *src != '\n')
3679 this_eol_type = CODING_EOL_CR;
4ed46869 3680 else
bc4bc72a
RS
3681 this_eol_type = CODING_EOL_CRLF, src++;
3682
3683 if (eol_type == CODING_EOL_UNDECIDED)
3684 /* This is the first end-of-line. */
3685 eol_type = this_eol_type;
3686 else if (eol_type != this_eol_type)
d46c5b12
KH
3687 {
3688 /* This type differs from the one found before. */
3689 eol_type = CODING_EOL_INCONSISTENT;
3690 break;
3691 }
4ed46869
KH
3692 }
3693 }
bc4bc72a 3694
d46c5b12
KH
3695 if (*skip == 0)
3696 *skip = src_end - source;
85a02ca4 3697 return eol_type;
4ed46869
KH
3698}
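The sampling strategy of detect_eol_type (look at the first few line ends and report a mismatch as inconsistent) can be tried outside Emacs. The sketch below is a standalone approximation, not the function above: the enum values, `MAX_CHECK` and `sketch_detect_eol` are hypothetical names, and it tracks the first line end with an explicit flag rather than testing `*skip == 0`, which is an assumption made so that a line end at offset 0 is recorded correctly.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* constants. */
enum eol_kind { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR, EOL_INCONSISTENT };

#define MAX_CHECK 3   /* examine at most this many line ends */

/* Classify the line ends in BUF[0..LEN).  *SKIP receives the offset of
   the first line end, or LEN if there is none. */
static enum eol_kind
sketch_detect_eol (const unsigned char *buf, size_t len, size_t *skip)
{
  size_t i = 0;
  int total = 0, seen_first = 0;
  enum eol_kind kind = EOL_UNDECIDED;

  *skip = len;
  while (i < len && total < MAX_CHECK)
    {
      unsigned char c = buf[i++];
      enum eol_kind this_kind;

      if (c != '\n' && c != '\r')
        continue;
      if (!seen_first)
        {
          *skip = i - 1;
          seen_first = 1;
        }
      total++;
      if (c == '\n')
        this_kind = EOL_LF;
      else if (i < len && buf[i] == '\n')
        {
          /* CR immediately followed by LF counts as one line end. */
          this_kind = EOL_CRLF;
          i++;
        }
      else
        this_kind = EOL_CR;

      if (kind == EOL_UNDECIDED)
        kind = this_kind;            /* first line end seen */
      else if (kind != this_kind)
        return EOL_INCONSISTENT;     /* mixed conventions */
    }
  return kind;
}
```

Capping the scan at a few line ends keeps detection cheap on large inputs, at the cost of missing an inconsistency that appears later in the text.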
3699
fa42c37f
KH
3700/* Like detect_eol_type, but detect EOL type in 2-octet
3701 big-endian/little-endian format for coding systems utf-16-be and
3702 utf-16-le. */
3703
3704static int
3705detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
3706 unsigned char *source;
3707 int src_bytes, *skip, big_endian_p;
3708{
3709 unsigned char *src = source, *src_end = src + src_bytes;
3710 unsigned int c1, c2;
3711 int total = 0; /* How many end-of-lines are found so far. */
3712 int eol_type = CODING_EOL_UNDECIDED;
3713 int this_eol_type;
3714 int msb, lsb;
3715
3716 if (big_endian_p)
3717 msb = 0, lsb = 1;
3718 else
3719 msb = 1, lsb = 0;
3720
3721 *skip = 0;
3722
3723 while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
3724 {
3725 c1 = (src[msb] << 8) | (src[lsb]);
3726 src += 2;
3727
3728 if (c1 == '\n' || c1 == '\r')
3729 {
3730 if (*skip == 0)
3731 *skip = src - 2 - source;
3732 total++;
3733 if (c1 == '\n')
3734 {
3735 this_eol_type = CODING_EOL_LF;
3736 }
3737 else
3738 {
3739 if ((src + 1) >= src_end)
3740 {
3741 this_eol_type = CODING_EOL_CR;
3742 }
3743 else
3744 {
3745 c2 = (src[msb] << 8) | (src[lsb]);
3746 if (c2 == '\n')
3747 this_eol_type = CODING_EOL_CRLF, src += 2;
3748 else
3749 this_eol_type = CODING_EOL_CR;
3750 }
3751 }
3752
3753 if (eol_type == CODING_EOL_UNDECIDED)
3754 /* This is the first end-of-line. */
3755 eol_type = this_eol_type;
3756 else if (eol_type != this_eol_type)
3757 {
3758 /* This type differs from the one found before. */
3759 eol_type = CODING_EOL_INCONSISTENT;
3760 break;
3761 }
3762 }
3763 }
3764
3765 if (*skip == 0)
3766 *skip = src_end - source;
3767 return eol_type;
3768}
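The only real difference in the 2-octet variant is how a code unit is assembled from two bytes. The sketch below isolates that step and reports just the first line end it meets; `sketch_read_unit`, `sketch_first_eol16` and the `EOL16_*` values are hypothetical names, and the function is deliberately simpler than the one above (no sampling limit, no skip count).

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum eol16 { EOL16_NONE, EOL16_LF, EOL16_CRLF, EOL16_CR };

/* Assemble one 16-bit code unit from P in the given byte order. */
static unsigned int
sketch_read_unit (const unsigned char *p, int big_endian_p)
{
  return big_endian_p ? ((unsigned int) p[0] << 8) | p[1]
                      : ((unsigned int) p[1] << 8) | p[0];
}

/* Return the kind of the first line end in BUF[0..LEN), read as
   16-bit units.  A trailing odd byte is ignored. */
static enum eol16
sketch_first_eol16 (const unsigned char *buf, size_t len, int big_endian_p)
{
  size_t i;

  for (i = 0; i + 1 < len; i += 2)
    {
      unsigned int c = sketch_read_unit (buf + i, big_endian_p);

      if (c == '\n')
        return EOL16_LF;
      if (c == '\r')
        {
          /* CRLF needs one more complete unit after the CR. */
          if (i + 3 < len
              && sketch_read_unit (buf + i + 2, big_endian_p) == '\n')
            return EOL16_CRLF;
          return EOL16_CR;
        }
    }
  return EOL16_NONE;
}
```

Reading the same bytes with the wrong byte order yields units such as 0x0D00 instead of 0x000D, so no line end is recognized; this is why detect_eol dispatches on the UTF-16 category before scanning.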
3769
4ed46869
KH
3770/* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
3771 is encoded. If it detects an appropriate format of end-of-line, it
3772 sets the information in *CODING. */
3773
3774void
3775detect_eol (coding, src, src_bytes)
3776 struct coding_system *coding;
3777 unsigned char *src;
3778 int src_bytes;
3779{
4608c386 3780 Lisp_Object val;
d46c5b12 3781 int skip;
fa42c37f
KH
3782 int eol_type;
3783
3784 switch (coding->category_idx)
3785 {
3786 case CODING_CATEGORY_IDX_UTF_16_BE:
3787 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
3788 break;
3789 case CODING_CATEGORY_IDX_UTF_16_LE:
3790 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
3791 break;
3792 default:
3793 eol_type = detect_eol_type (src, src_bytes, &skip);
3794 break;
3795 }
d46c5b12
KH
3796
3797 if (coding->heading_ascii > skip)
3798 coding->heading_ascii = skip;
3799 else
3800 skip = coding->heading_ascii;
4ed46869 3801
0ef69138 3802 if (eol_type == CODING_EOL_UNDECIDED)
4ed46869 3803 return;
27901516
KH
3804 if (eol_type == CODING_EOL_INCONSISTENT)
3805 {
3806#if 0
3807 /* This code is suppressed until we find a better way to
992f23f2 3808 distinguish raw text file and binary file. */
27901516
KH
3809
3810 /* If we have already detected that the coding is raw-text, the
3811 coding should actually be no-conversion. */
3812 if (coding->type == coding_type_raw_text)
3813 {
3814 setup_coding_system (Qno_conversion, coding);
3815 return;
3816 }
3817 /* Else, let's decode only text code anyway. */
3818#endif /* 0 */
1b2af4b0 3819 eol_type = CODING_EOL_LF;
27901516
KH
3820 }
3821
4608c386 3822 val = Fget (coding->symbol, Qeol_type);
4ed46869 3823 if (VECTORP (val) && XVECTOR (val)->size == 3)
d46c5b12 3824 {
b73bfc1c
KH
3825 int src_multibyte = coding->src_multibyte;
3826 int dst_multibyte = coding->dst_multibyte;
3827
d46c5b12 3828 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
b73bfc1c
KH
3829 coding->src_multibyte = src_multibyte;
3830 coding->dst_multibyte = dst_multibyte;
d46c5b12
KH
3831 coding->heading_ascii = skip;
3832 }
3833}
3834
3835#define CONVERSION_BUFFER_EXTRA_ROOM 256
3836
b73bfc1c
KH
3837#define DECODING_BUFFER_MAG(coding) \
3838 (coding->type == coding_type_iso2022 \
3839 ? 3 \
3840 : (coding->type == coding_type_ccl \
3841 ? coding->spec.ccl.decoder.buf_magnification \
3842 : 2))
d46c5b12
KH
3843
3844/* Return maximum size (bytes) of a buffer enough for decoding
3845 SRC_BYTES of text encoded in CODING. */
3846
3847int
3848decoding_buffer_size (coding, src_bytes)
3849 struct coding_system *coding;
3850 int src_bytes;
3851{
3852 return (src_bytes * DECODING_BUFFER_MAG (coding)
3853 + CONVERSION_BUFFER_EXTRA_ROOM);
3854}
3855
3856/* Return maximum size (bytes) of a buffer enough for encoding
3857 SRC_BYTES of text to CODING. */
3858
3859int
3860encoding_buffer_size (coding, src_bytes)
3861 struct coding_system *coding;
3862 int src_bytes;
3863{
3864 int magnification;
3865
3866 if (coding->type == coding_type_ccl)
3867 magnification = coding->spec.ccl.encoder.buf_magnification;
b73bfc1c 3868 else if (CODING_REQUIRE_ENCODING (coding))
d46c5b12 3869 magnification = 3;
b73bfc1c
KH
3870 else
3871 magnification = 1;
d46c5b12
KH
3872
3873 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
3874}
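Both size functions are a per-type magnification factor times the source length, plus a fixed amount of slack. The sketch below restates that arithmetic with plain integers so the numbers can be checked by hand; `enum sketch_type`, `SKETCH_EXTRA_ROOM` and both function names are hypothetical, and `requires_encoding` stands in for the result of CODING_REQUIRE_ENCODING.

```c
/* Hypothetical stand-ins for the coding_type_* values that matter here. */
enum sketch_type { SKETCH_ISO2022, SKETCH_CCL, SKETCH_OTHER };

#define SKETCH_EXTRA_ROOM 256   /* mirrors CONVERSION_BUFFER_EXTRA_ROOM */

/* Upper bound on the bytes produced by decoding SRC_BYTES of input. */
static int
sketch_decoding_size (enum sketch_type type, int ccl_magnification,
                      int src_bytes)
{
  int mag = (type == SKETCH_ISO2022 ? 3
             : type == SKETCH_CCL ? ccl_magnification
             : 2);

  return src_bytes * mag + SKETCH_EXTRA_ROOM;
}

/* Upper bound on the bytes produced by encoding SRC_BYTES of input. */
static int
sketch_encoding_size (enum sketch_type type, int ccl_magnification,
                      int requires_encoding, int src_bytes)
{
  int mag = (type == SKETCH_CCL ? ccl_magnification
             : requires_encoding ? 3
             : 1);

  return src_bytes * mag + SKETCH_EXTRA_ROOM;
}
```

For 100 source bytes this gives 556 bytes for ISO 2022 decoding and 456 bytes for the other non-CCL types.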
3875
73be902c
KH
3876/* Working buffer for code conversion. */
3877struct conversion_buffer
3878{
3879 int size; /* size of data. */
3880 int on_stack; /* 1 if allocated by alloca. */
3881 unsigned char *data;
3882};
d46c5b12 3883
73be902c
KH
3884/* Don't use alloca for allocating memory space larger than this, lest
3885 we overflow the stack. */
3886#define MAX_ALLOCA (16*1024)
d46c5b12 3887
73be902c
KH
3888/* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
3889#define allocate_conversion_buffer(buf, len) \
3890 do { \
3891 if (len < MAX_ALLOCA) \
3892 { \
3893 buf.data = (unsigned char *) alloca (len); \
3894 buf.on_stack = 1; \
3895 } \
3896 else \
3897 { \
3898 buf.data = (unsigned char *) xmalloc (len); \
3899 buf.on_stack = 0; \
3900 } \
3901 buf.size = len; \
3902 } while (0)
d46c5b12 3903
73be902c
KH
3904/* Double the allocated memory for *BUF. */
3905static void
3906extend_conversion_buffer (buf)
3907 struct conversion_buffer *buf;
d46c5b12 3908{
73be902c 3909 if (buf->on_stack)
d46c5b12 3910 {
73be902c
KH
3911 unsigned char *save = buf->data;
3912 buf->data = (unsigned char *) xmalloc (buf->size * 2);
3913 bcopy (save, buf->data, buf->size);
3914 buf->on_stack = 0;
d46c5b12 3915 }
73be902c
KH
3916 else
3917 {
3918 buf->data = (unsigned char *) xrealloc (buf->data, buf->size * 2);
3919 }
3920 buf->size *= 2;
3921}
3922
3923/* Free the allocated memory for BUF if it is not on stack. */
3924static void
3925free_conversion_buffer (buf)
3926 struct conversion_buffer *buf;
3927{
3928 if (!buf->on_stack)
3929 xfree (buf->data);
d46c5b12
KH
3930}
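The conversion buffer starts on the stack and migrates to the heap the first time it has to grow. The sketch below shows the same life cycle in portable C under two stated assumptions: a caller-supplied array takes the place of alloca, and malloc/realloc/free take the place of xmalloc/xrealloc/xfree, so allocation failure is reported rather than signalled. All `sketch_` names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_buffer
{
  size_t size;           /* capacity in bytes */
  int on_stack;          /* nonzero while DATA points at caller storage */
  unsigned char *data;
};

/* Start with caller-provided storage (the alloca case above). */
static void
sketch_buffer_init (struct sketch_buffer *buf, unsigned char *storage,
                    size_t len)
{
  buf->data = storage;
  buf->size = len;
  buf->on_stack = 1;
}

/* Double the capacity.  Return 0 on success, -1 if allocation fails,
   in which case BUF is left unchanged. */
static int
sketch_buffer_extend (struct sketch_buffer *buf)
{
  unsigned char *p;

  if (buf->on_stack)
    {
      /* Caller storage cannot be resized: copy it to the heap. */
      p = malloc (buf->size * 2);
      if (!p)
        return -1;
      memcpy (p, buf->data, buf->size);
      buf->on_stack = 0;
    }
  else
    {
      p = realloc (buf->data, buf->size * 2);
      if (!p)
        return -1;
    }
  buf->data = p;
  buf->size *= 2;
  return 0;
}

/* Release heap storage; caller storage is left alone. */
static void
sketch_buffer_free (struct sketch_buffer *buf)
{
  if (!buf->on_stack)
    free (buf->data);
}
```

Keeping the `on_stack` flag in the structure is what lets the free routine stay a no-op for small conversions that never left the stack.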
3931
3932int
3933ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
3934 struct coding_system *coding;
3935 unsigned char *source, *destination;
3936 int src_bytes, dst_bytes, encodep;
3937{
3938 struct ccl_program *ccl
3939 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
3940 int result;
3941
ae9ff118 3942 ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
aaaf0b1e
KH
3943 if (encodep)
3944 ccl->eol_type = coding->eol_type;
7272d75c 3945 ccl->multibyte = coding->src_multibyte;
d46c5b12
KH
3946 coding->produced = ccl_driver (ccl, source, destination,
3947 src_bytes, dst_bytes, &(coding->consumed));
b73bfc1c
KH
3948 if (encodep)
3949 coding->produced_char = coding->produced;
3950 else
3951 {
3952 int bytes
3953 = dst_bytes ? dst_bytes : source + coding->consumed - destination;
3954 coding->produced = str_as_multibyte (destination, bytes,
3955 coding->produced,
3956 &(coding->produced_char));
3957 }
69f76525 3958
d46c5b12
KH
3959 switch (ccl->status)
3960 {
3961 case CCL_STAT_SUSPEND_BY_SRC:
73be902c 3962 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
d46c5b12
KH
3963 break;
3964 case CCL_STAT_SUSPEND_BY_DST:
73be902c 3965 coding->result = CODING_FINISH_INSUFFICIENT_DST;
d46c5b12 3966 break;
9864ebce
KH
3967 case CCL_STAT_QUIT:
3968 case CCL_STAT_INVALID_CMD:
73be902c 3969 coding->result = CODING_FINISH_INTERRUPT;
9864ebce 3970 break;
d46c5b12 3971 default:
73be902c 3972 coding->result = CODING_FINISH_NORMAL;
d46c5b12
KH
3973 break;
3974 }
73be902c 3975 return coding->result;
4ed46869
KH
3976}
3977
aaaf0b1e
KH
3978/* Decode EOL format of the text at PTR of BYTES length destructively
3979 according to CODING->eol_type. This is called after the CCL
3980 program produced a decoded text at PTR. If we do CRLF->LF
3981 conversion, update CODING->produced and CODING->produced_char. */
3982
3983static void
3984decode_eol_post_ccl (coding, ptr, bytes)
3985 struct coding_system *coding;
3986 unsigned char *ptr;
3987 int bytes;
3988{
3989 Lisp_Object val, saved_coding_symbol;
3990 unsigned char *pend = ptr + bytes;
3991 int dummy;
3992
3993 /* Remember the current coding system symbol. We set it back when
3994 an inconsistent EOL is found so that `last-coding-system-used' is
3995 set to the coding system that doesn't specify EOL conversion. */
3996 saved_coding_symbol = coding->symbol;
3997
3998 coding->spec.ccl.cr_carryover = 0;
3999 if (coding->eol_type == CODING_EOL_UNDECIDED)
4000 {
4001 /* Here, to avoid the call of setup_coding_system, we directly
4002 call detect_eol_type. */
4003 coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
74b01b80
EZ
4004 if (coding->eol_type == CODING_EOL_INCONSISTENT)
4005 coding->eol_type = CODING_EOL_LF;
4006 if (coding->eol_type != CODING_EOL_UNDECIDED)
4007 {
4008 val = Fget (coding->symbol, Qeol_type);
4009 if (VECTORP (val) && XVECTOR (val)->size == 3)
4010 coding->symbol = XVECTOR (val)->contents[coding->eol_type];
4011 }
aaaf0b1e
KH
4012 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4013 }
4014
74b01b80
EZ
4015 if (coding->eol_type == CODING_EOL_LF
4016 || coding->eol_type == CODING_EOL_UNDECIDED)
aaaf0b1e
KH
4017 {
4018 /* We have nothing to do. */
4019 ptr = pend;
4020 }
4021 else if (coding->eol_type == CODING_EOL_CRLF)
4022 {
4023 unsigned char *pstart = ptr, *p = ptr;
4024
4025 if (! (coding->mode & CODING_MODE_LAST_BLOCK)
4026 && *(pend - 1) == '\r')
4027 {
4028 /* If the last character is CR, we can't handle it here
4029 because LF will be in the not-yet-decoded source text.
4030 Record that the CR is not yet processed. */
4031 coding->spec.ccl.cr_carryover = 1;
4032 coding->produced--;
4033 coding->produced_char--;
4034 pend--;
4035 }
4036 while (ptr < pend)
4037 {
4038 if (*ptr == '\r')
4039 {
4040 if (ptr + 1 < pend && *(ptr + 1) == '\n')
4041 {
4042 *p++ = '\n';
4043 ptr += 2;
4044 }
4045 else
4046 {
4047 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4048 goto undo_eol_conversion;
4049 *p++ = *ptr++;
4050 }
4051 }
4052 else if (*ptr == '\n'
4053 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4054 goto undo_eol_conversion;
4055 else
4056 *p++ = *ptr++;
4057 continue;
4058
4059 undo_eol_conversion:
4060 /* We have encountered an inconsistent EOL format at PTR.
4061 Convert all LFs before PTR back to CRLFs. */
4062 for (p--, ptr--; p >= pstart; p--)
4063 {
4064 if (*p == '\n')
4065 *ptr-- = '\n', *ptr-- = '\r';
4066 else
4067 *ptr-- = *p;
4068 }
4069 /* If carryover is recorded, cancel it because we don't
4070 convert CRLF anymore. */
4071 if (coding->spec.ccl.cr_carryover)
4072 {
4073 coding->spec.ccl.cr_carryover = 0;
4074 coding->produced++;
4075 coding->produced_char++;
4076 pend++;
4077 }
4078 p = ptr = pend;
4079 coding->eol_type = CODING_EOL_LF;
4080 coding->symbol = saved_coding_symbol;
4081 }
4082 if (p < pend)
4083 {
4084 /* As each two-byte sequence CRLF was converted to LF, (PEND
4085 - P) is the number of deleted characters. */
4086 coding->produced -= pend - p;
4087 coding->produced_char -= pend - p;
4088 }
4089 }
4090 else /* i.e. coding->eol_type == CODING_EOL_CR */
4091 {
4092 unsigned char *p = ptr;
4093
4094 for (; ptr < pend; ptr++)
4095 {
4096 if (*ptr == '\r')
4097 *ptr = '\n';
4098 else if (*ptr == '\n'
4099 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4100 {
4101 for (; p < ptr; p++)
4102 {
4103 if (*p == '\n')
4104 *p = '\r';
4105 }
4106 ptr = pend;
4107 coding->eol_type = CODING_EOL_LF;
4108 coding->symbol = saved_coding_symbol;
4109 }
4110 }
4111 }
4112}
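The CRLF branch of decode_eol_post_ccl combines two ideas: compacting the text in place with separate read and write positions, and holding back a trailing CR when more input may follow. The sketch below covers only those two ideas; it omits the undo path taken for inconsistent line ends, and `sketch_crlf_to_lf`, `last_block` and `carry` are hypothetical names.

```c
#include <stddef.h>

/* Convert CRLF to LF in BUF[0..LEN) in place and return the new length.
   If LAST_BLOCK is zero and the text ends in CR, that CR is held back:
   *CARRY is set to 1 and the CR is excluded from the result, because
   the matching LF may arrive with the next block. */
static size_t
sketch_crlf_to_lf (unsigned char *buf, size_t len, int last_block,
                   int *carry)
{
  size_t r = 0, w = 0;   /* read and write positions */

  *carry = 0;
  if (!last_block && len > 0 && buf[len - 1] == '\r')
    {
      *carry = 1;
      len--;
    }
  while (r < len)
    {
      if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
        {
          buf[w++] = '\n';
          r += 2;
        }
      else
        buf[w++] = buf[r++];
    }
  return w;
}
```

A caller that sees `*carry` set would place the CR at the start of the next block's output before converting it, which is what decode_coding does with `cr_carryover`.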
4113
4ed46869
KH
4114/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4115 decoding, it may detect coding system and format of end-of-line if
b73bfc1c
KH
4116 those are not yet decided. The source should be unibyte, the
4117 result is multibyte if CODING->dst_multibyte is nonzero, else
4118 unibyte. */
4ed46869
KH
4119
4120int
d46c5b12 4121decode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4122 struct coding_system *coding;
4123 unsigned char *source, *destination;
4124 int src_bytes, dst_bytes;
4ed46869 4125{
0ef69138 4126 if (coding->type == coding_type_undecided)
4ed46869
KH
4127 detect_coding (coding, source, src_bytes);
4128
aaaf0b1e
KH
4129 if (coding->eol_type == CODING_EOL_UNDECIDED
4130 && coding->type != coding_type_ccl)
4ed46869
KH
4131 detect_eol (coding, source, src_bytes);
4132
b73bfc1c
KH
4133 coding->produced = coding->produced_char = 0;
4134 coding->consumed = coding->consumed_char = 0;
4135 coding->errors = 0;
4136 coding->result = CODING_FINISH_NORMAL;
4137
4ed46869
KH
4138 switch (coding->type)
4139 {
4ed46869 4140 case coding_type_sjis:
b73bfc1c
KH
4141 decode_coding_sjis_big5 (coding, source, destination,
4142 src_bytes, dst_bytes, 1);
4ed46869
KH
4143 break;
4144
4145 case coding_type_iso2022:
b73bfc1c
KH
4146 decode_coding_iso2022 (coding, source, destination,
4147 src_bytes, dst_bytes);
4ed46869
KH
4148 break;
4149
4150 case coding_type_big5:
b73bfc1c
KH
4151 decode_coding_sjis_big5 (coding, source, destination,
4152 src_bytes, dst_bytes, 0);
4153 break;
4154
4155 case coding_type_emacs_mule:
4156 decode_coding_emacs_mule (coding, source, destination,
4157 src_bytes, dst_bytes);
4ed46869
KH
4158 break;
4159
4160 case coding_type_ccl:
aaaf0b1e
KH
4161 if (coding->spec.ccl.cr_carryover)
4162 {
4163 /* Set the CR which is not processed by the previous call of
4164 decode_eol_post_ccl in DESTINATION. */
4165 *destination = '\r';
4166 coding->produced++;
4167 coding->produced_char++;
4168 dst_bytes--;
4169 }
4170 ccl_coding_driver (coding, source,
4171 destination + coding->spec.ccl.cr_carryover,
b73bfc1c 4172 src_bytes, dst_bytes, 0);
aaaf0b1e
KH
4173 if (coding->eol_type != CODING_EOL_LF)
4174 decode_eol_post_ccl (coding, destination, coding->produced);
d46c5b12
KH
4175 break;
4176
b73bfc1c
KH
4177 default:
4178 decode_eol (coding, source, destination, src_bytes, dst_bytes);
4179 }
4180
4181 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4182 && coding->consumed == src_bytes)
4183 coding->result = CODING_FINISH_NORMAL;
4184
4185 if (coding->mode & CODING_MODE_LAST_BLOCK
4186 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
4187 {
4188 unsigned char *src = source + coding->consumed;
4189 unsigned char *dst = destination + coding->produced;
4190
4191 src_bytes -= coding->consumed;
bb10be8b 4192 coding->errors++;
b73bfc1c
KH
4193 if (COMPOSING_P (coding))
4194 DECODE_COMPOSITION_END ('1');
4195 while (src_bytes--)
d46c5b12 4196 {
b73bfc1c
KH
4197 int c = *src++;
4198 dst += CHAR_STRING (c, dst);
4199 coding->produced_char++;
d46c5b12 4200 }
b73bfc1c
KH
4201 coding->consumed = coding->consumed_char = src - source;
4202 coding->produced = dst - destination;
73be902c 4203 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4204 }
4205
b73bfc1c
KH
4206 if (!coding->dst_multibyte)
4207 {
4208 coding->produced = str_as_unibyte (destination, coding->produced);
4209 coding->produced_char = coding->produced;
4210 }
4ed46869 4211
b73bfc1c
KH
4212 return coding->result;
4213}
52d41803 4214
b73bfc1c
KH
4215/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4216 multibyteness of the source is CODING->src_multibyte, the
4217 multibyteness of the result is always unibyte. */
4ed46869
KH
4218
4219int
d46c5b12 4220encode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
4221 struct coding_system *coding;
4222 unsigned char *source, *destination;
4223 int src_bytes, dst_bytes;
4ed46869 4224{
b73bfc1c
KH
4225 coding->produced = coding->produced_char = 0;
4226 coding->consumed = coding->consumed_char = 0;
4227 coding->errors = 0;
4228 coding->result = CODING_FINISH_NORMAL;
4ed46869 4229
d46c5b12
KH
4230 switch (coding->type)
4231 {
4ed46869 4232 case coding_type_sjis:
b73bfc1c
KH
4233 encode_coding_sjis_big5 (coding, source, destination,
4234 src_bytes, dst_bytes, 1);
4ed46869
KH
4235 break;
4236
4237 case coding_type_iso2022:
b73bfc1c
KH
4238 encode_coding_iso2022 (coding, source, destination,
4239 src_bytes, dst_bytes);
4ed46869
KH
4240 break;
4241
4242 case coding_type_big5:
b73bfc1c
KH
4243 encode_coding_sjis_big5 (coding, source, destination,
4244 src_bytes, dst_bytes, 0);
4245 break;
4246
4247 case coding_type_emacs_mule:
4248 encode_coding_emacs_mule (coding, source, destination,
4249 src_bytes, dst_bytes);
4ed46869
KH
4250 break;
4251
4252 case coding_type_ccl:
b73bfc1c
KH
4253 ccl_coding_driver (coding, source, destination,
4254 src_bytes, dst_bytes, 1);
d46c5b12
KH
4255 break;
4256
b73bfc1c
KH
4257 default:
4258 encode_eol (coding, source, destination, src_bytes, dst_bytes);
4259 }
4260
73be902c
KH
4261 if (coding->mode & CODING_MODE_LAST_BLOCK
4262 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
b73bfc1c
KH
4263 {
4264 unsigned char *src = source + coding->consumed;
4265 unsigned char *src_end = src + src_bytes;
4266 unsigned char *dst = destination + coding->produced;
4267
4268 if (coding->type == coding_type_iso2022)
4269 ENCODE_RESET_PLANE_AND_REGISTER;
4270 if (COMPOSING_P (coding))
4271 *dst++ = ISO_CODE_ESC, *dst++ = '1';
4272 if (coding->consumed < src_bytes)
d46c5b12 4273 {
b73bfc1c
KH
4274 int len = src_bytes - coding->consumed;
4275
4276 BCOPY_SHORT (source + coding->consumed, dst, len);
4277 if (coding->src_multibyte)
4278 len = str_as_unibyte (dst, len);
4279 dst += len;
4280 coding->consumed = src_bytes;
d46c5b12 4281 }
b73bfc1c 4282 coding->produced = coding->produced_char = dst - destination;
73be902c 4283 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4284 }
4285
bb10be8b
KH
4286 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4287 && coding->consumed == src_bytes)
4288 coding->result = CODING_FINISH_NORMAL;
4289
b73bfc1c 4290 return coding->result;
4ed46869
KH
4291}
4292
fb88bf2d
KH
4293/* Scan text in the region between *BEG and *END (byte positions),
4294 skip characters which we don't have to decode by coding system
4295 CODING at the head and tail, then set *BEG and *END to the region
4296 of the text we actually have to convert. The caller should move
b73bfc1c
KH
4297 the gap out of the region in advance if the region is from a
4298 buffer.
4ed46869 4299
d46c5b12
KH
4300 If STR is not NULL, *BEG and *END are indices into STR. */
4301
4302static void
4303shrink_decoding_region (beg, end, coding, str)
4304 int *beg, *end;
4305 struct coding_system *coding;
4306 unsigned char *str;
4307{
fb88bf2d 4308 unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
d46c5b12 4309 int eol_conversion;
88993dfd 4310 Lisp_Object translation_table;
d46c5b12
KH
4311
4312 if (coding->type == coding_type_ccl
4313 || coding->type == coding_type_undecided
b73bfc1c
KH
4314 || coding->eol_type != CODING_EOL_LF
4315 || !NILP (coding->post_read_conversion)
4316 || coding->composing != COMPOSITION_DISABLED)
d46c5b12
KH
4317 {
4318 /* We can't skip any data. */
4319 return;
4320 }
b73bfc1c
KH
4321 if (coding->type == coding_type_no_conversion
4322 || coding->type == coding_type_raw_text
4323 || coding->type == coding_type_emacs_mule)
d46c5b12 4324 {
fb88bf2d
KH
4325 /* We need no conversion, but don't have to skip any data here.
4326 Decoding routine handles them effectively anyway. */
d46c5b12
KH
4327 return;
4328 }
4329
88993dfd
KH
4330 translation_table = coding->translation_table_for_decode;
4331 if (NILP (translation_table) && !NILP (Venable_character_translation))
4332 translation_table = Vstandard_translation_table_for_decode;
4333 if (CHAR_TABLE_P (translation_table))
4334 {
4335 int i;
4336 for (i = 0; i < 128; i++)
4337 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4338 break;
4339 if (i < 128)
fa46990e 4340 /* Some ASCII character should be translated. We give up
88993dfd
KH
4341 shrinking. */
4342 return;
4343 }
4344
b73bfc1c 4345 if (coding->heading_ascii >= 0)
d46c5b12
KH
4346 /* Detection routine has already found how much we can skip at the
4347 head. */
4348 *beg += coding->heading_ascii;
4349
4350 if (str)
4351 {
4352 begp_orig = begp = str + *beg;
4353 endp_orig = endp = str + *end;
4354 }
4355 else
4356 {
fb88bf2d 4357 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4358 endp_orig = endp = begp + *end - *beg;
4359 }
4360
fa46990e
DL
4361 eol_conversion = (coding->eol_type == CODING_EOL_CR
4362 || coding->eol_type == CODING_EOL_CRLF);
4363
d46c5b12
KH
4364 switch (coding->type)
4365 {
d46c5b12
KH
4366 case coding_type_sjis:
4367 case coding_type_big5:
4368 /* We can skip all ASCII characters at the head. */
4369 if (coding->heading_ascii < 0)
4370 {
4371 if (eol_conversion)
de9d083c 4372 while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
d46c5b12
KH
4373 else
4374 while (begp < endp && *begp < 0x80) begp++;
4375 }
4376 /* We can skip all ASCII characters at the tail except for the
4377 second byte of SJIS or BIG5 code. */
4378 if (eol_conversion)
de9d083c 4379 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
d46c5b12
KH
4380 else
4381 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4382 /* Do not consider LF as ascii if preceded by CR, since that
4383 confuses eol decoding. */
4384 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4385 endp++;
d46c5b12
KH
4386 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
4387 endp++;
4388 break;
4389
b73bfc1c 4390 case coding_type_iso2022:
622fece5
KH
4391 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4392 /* We can't skip any data. */
4393 break;
d46c5b12
KH
4394 if (coding->heading_ascii < 0)
4395 {
d46c5b12
KH
4396 /* We can skip all ASCII characters at the head except for a
4397 few control codes. */
4398 while (begp < endp && (c = *begp) < 0x80
4399 && c != ISO_CODE_CR && c != ISO_CODE_SO
4400 && c != ISO_CODE_SI && c != ISO_CODE_ESC
4401 && (!eol_conversion || c != ISO_CODE_LF))
4402 begp++;
4403 }
4404 switch (coding->category_idx)
4405 {
4406 case CODING_CATEGORY_IDX_ISO_8_1:
4407 case CODING_CATEGORY_IDX_ISO_8_2:
4408 /* We can skip all ASCII characters at the tail. */
4409 if (eol_conversion)
de9d083c 4410 while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
d46c5b12
KH
4411 else
4412 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4413 /* Do not consider LF as ascii if preceded by CR, since that
4414 confuses eol decoding. */
4415 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4416 endp++;
d46c5b12
KH
4417 break;
4418
4419 case CODING_CATEGORY_IDX_ISO_7:
4420 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
de79a6a5
KH
4421 {
4422 /* We can skip all characters at the tail except for 8-bit
4423 codes, ESC, and the two bytes following it. */
4424 unsigned char *eight_bit = NULL;
4425
4426 if (eol_conversion)
4427 while (begp < endp
4428 && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
4429 {
4430 if (!eight_bit && c & 0x80) eight_bit = endp;
4431 endp--;
4432 }
4433 else
4434 while (begp < endp
4435 && (c = endp[-1]) != ISO_CODE_ESC)
4436 {
4437 if (!eight_bit && c & 0x80) eight_bit = endp;
4438 endp--;
4439 }
4440 /* Do not consider LF as ascii if preceded by CR, since that
4441 confuses eol decoding. */
4442 if (begp < endp && endp < endp_orig
4443 && endp[-1] == '\r' && endp[0] == '\n')
4444 endp++;
4445 if (begp < endp && endp[-1] == ISO_CODE_ESC)
4446 {
4447 if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
4448 /* This is an ASCII designation sequence. We can
4449 surely skip the tail. But, if we have
4450 encountered an 8-bit code, skip only the codes
4451 after that. */
4452 endp = eight_bit ? eight_bit : endp + 2;
4453 else
4454 /* Hmmm, we can't skip the tail. */
4455 endp = endp_orig;
4456 }
4457 else if (eight_bit)
4458 endp = eight_bit;
4459 }
d46c5b12 4460 }
b73bfc1c
KH
4461 break;
4462
4463 default:
4464 abort ();
d46c5b12
KH
4465 }
4466 *beg += begp - begp_orig;
4467 *end += endp - endp_orig;
4468 return;
4469}
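For the SJIS and BIG5 case, shrinking amounts to trimming ASCII from both ends while keeping one extra byte after the last 8-bit byte, since that byte may be the ASCII-range second byte of a two-byte code. The sketch below shows that trimming on a plain byte array; it ignores end-of-line conversion and the `heading_ascii` shortcut, and `sketch_shrink_ascii` is a hypothetical name.

```c
#include <stddef.h>

/* Narrow [*BEG, *END) within TEXT to the part that actually needs
   conversion, assuming a two-byte encoding whose second byte may be
   in the ASCII range. */
static void
sketch_shrink_ascii (const unsigned char *text, size_t *beg, size_t *end)
{
  size_t b = *beg, e = *end, orig_end = *end;

  /* Skip ASCII at the head. */
  while (b < e && text[b] < 0x80)
    b++;
  /* Skip ASCII at the tail. */
  while (b < e && text[e - 1] < 0x80)
    e--;
  /* The byte just past the last 8-bit byte may be the second byte of
     a two-byte code, so keep it in the region. */
  if (b < e && e < orig_end)
    e++;
  *beg = b;
  *end = e;
}
```

If the text is pure ASCII the region collapses to an empty range and nothing needs to be converted.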
4470
4471/* Like shrink_decoding_region but for encoding. */
4472
4473static void
4474shrink_encoding_region (beg, end, coding, str)
4475 int *beg, *end;
4476 struct coding_system *coding;
4477 unsigned char *str;
4478{
4479 unsigned char *begp_orig, *begp, *endp_orig, *endp;
4480 int eol_conversion;
88993dfd 4481 Lisp_Object translation_table;
d46c5b12 4482
b73bfc1c
KH
4483 if (coding->type == coding_type_ccl
4484 || coding->eol_type == CODING_EOL_CRLF
4485 || coding->eol_type == CODING_EOL_CR
4486 || (coding->cmp_data && coding->cmp_data->used > 0))
d46c5b12 4487 {
b73bfc1c
KH
4488 /* We can't skip any data. */
4489 return;
4490 }
4491 if (coding->type == coding_type_no_conversion
4492 || coding->type == coding_type_raw_text
4493 || coding->type == coding_type_emacs_mule
4494 || coding->type == coding_type_undecided)
4495 {
4496 /* We need no conversion, but don't have to skip any data here.
4497 Encoding routine handles them effectively anyway. */
d46c5b12
KH
4498 return;
4499 }
4500
88993dfd
KH
4501 translation_table = coding->translation_table_for_encode;
4502 if (NILP (translation_table) && !NILP (Venable_character_translation))
4503 translation_table = Vstandard_translation_table_for_encode;
4504 if (CHAR_TABLE_P (translation_table))
4505 {
4506 int i;
4507 for (i = 0; i < 128; i++)
4508 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4509 break;
4510 if (i < 128)
4511 /* Some ASCII character should be translated. We give up
4512 shrinking. */
4513 return;
4514 }
4515
d46c5b12
KH
4516 if (str)
4517 {
4518 begp_orig = begp = str + *beg;
4519 endp_orig = endp = str + *end;
4520 }
4521 else
4522 {
fb88bf2d 4523 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4524 endp_orig = endp = begp + *end - *beg;
4525 }
4526
4527 eol_conversion = (coding->eol_type == CODING_EOL_CR
4528 || coding->eol_type == CODING_EOL_CRLF);
4529
4530 /* Here, we don't have to check coding->pre_write_conversion because
4531 the caller is expected to have handled it already. */
4532 switch (coding->type)
4533 {
d46c5b12 4534 case coding_type_iso2022:
622fece5
KH
4535 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
4536 /* We can't skip any data. */
4537 break;
d46c5b12
KH
4538 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
4539 {
4540 unsigned char *bol = begp;
4541 while (begp < endp && *begp < 0x80)
4542 {
4543 begp++;
4544 if (begp[-1] == '\n')
4545 bol = begp;
4546 }
4547 begp = bol;
4548 goto label_skip_tail;
4549 }
4550 /* fall through ... */
4551
b73bfc1c
KH
4552 case coding_type_sjis:
4553 case coding_type_big5:
d46c5b12
KH
4554 /* We can skip all ASCII characters at the head and tail. */
4555 if (eol_conversion)
4556 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
4557 else
4558 while (begp < endp && *begp < 0x80) begp++;
4559 label_skip_tail:
4560 if (eol_conversion)
4561 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
4562 else
4563 while (begp < endp && *(endp - 1) < 0x80) endp--;
4564 break;
b73bfc1c
KH
4565
4566 default:
4567 abort ();
d46c5b12
KH
4568 }
4569
4570 *beg += begp - begp_orig;
4571 *end += endp - endp_orig;
4572 return;
4573}
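
/* Illustration only, not part of coding.c: a minimal, self-contained
   sketch of the head/tail skipping done above for the sjis/big5 case.
   The helper name skip_ascii_head_and_tail and its size_t interface are
   assumptions made for this example.  As in the code above, '\n' is
   treated as "not skippable" when end-of-line conversion is requested.  */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: narrow the byte range [*beg, *end) of STR so that
   it no longer covers leading and trailing bytes below 0x80.  If
   EOL_CONVERSION is nonzero, a '\n' stops the scan because it would
   still have to be rewritten by the converter.  */
static void
skip_ascii_head_and_tail (const unsigned char *str, size_t *beg, size_t *end,
                          int eol_conversion)
{
  const unsigned char *begp = str + *beg;
  const unsigned char *endp = str + *end;

  assert (*beg <= *end);

  /* Skip ASCII bytes at the head.  */
  while (begp < endp && *begp < 0x80
         && !(eol_conversion && *begp == '\n'))
    begp++;

  /* Skip ASCII bytes at the tail, never crossing BEGP.  */
  while (begp < endp && endp[-1] < 0x80
         && !(eol_conversion && endp[-1] == '\n'))
    endp--;

  *beg = (size_t) (begp - str);
  *end = (size_t) (endp - str);
}
```

/* If the range collapses (beg == end), the text was pure ASCII and the
   caller can skip the conversion entirely.  */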
4574
88993dfd
KH
4575/* As shrinking conversion region requires some overhead, we don't try
4576 shrinking if the length of conversion region is less than this
4577 value. */
4578static int shrink_conversion_region_threshhold = 1024;
4579
4580#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
4581 do { \
4582 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
4583 { \
4584 if (encodep) shrink_encoding_region (beg, end, coding, str); \
4585 else shrink_decoding_region (beg, end, coding, str); \
4586 } \
4587 } while (0)
4588
b843d1ae
KH
4589static Lisp_Object
4590code_convert_region_unwind (dummy)
4591 Lisp_Object dummy;
4592{
4593 inhibit_pre_post_conversion = 0;
4594 return Qnil;
4595}
4596
ec6d2bb8
KH
4597 /* Store information about all compositions in the range FROM to TO
4598 of OBJ in memory blocks pointed to by CODING->cmp_data. OBJ is a
4599 buffer or a string, defaults to the current buffer. */
4600
4601void
4602coding_save_composition (coding, from, to, obj)
4603 struct coding_system *coding;
4604 int from, to;
4605 Lisp_Object obj;
4606{
4607 Lisp_Object prop;
4608 int start, end;
4609
91bee881
KH
4610 if (coding->composing == COMPOSITION_DISABLED)
4611 return;
4612 if (!coding->cmp_data)
4613 coding_allocate_composition_data (coding, from);
ec6d2bb8
KH
4614 if (!find_composition (from, to, &start, &end, &prop, obj)
4615 || end > to)
4616 return;
4617 if (start < from
4618 && (!find_composition (end, to, &start, &end, &prop, obj)
4619 || end > to))
4620 return;
4621 coding->composing = COMPOSITION_NO;
ec6d2bb8
KH
4622 do
4623 {
4624 if (COMPOSITION_VALID_P (start, end, prop))
4625 {
4626 enum composition_method method = COMPOSITION_METHOD (prop);
4627 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
4628 >= COMPOSITION_DATA_SIZE)
4629 coding_allocate_composition_data (coding, from);
4630 /* For relative composition, we remember start and end
4631 positions, for the other compositions, we also remember
4632 components. */
4633 CODING_ADD_COMPOSITION_START (coding, start - from, method);
4634 if (method != COMPOSITION_RELATIVE)
4635 {
4636 /* We must store a*/
4637 Lisp_Object val, ch;
4638
4639 val = COMPOSITION_COMPONENTS (prop);
4640 if (CONSP (val))
4641 while (CONSP (val))
4642 {
4643 ch = XCAR (val), val = XCDR (val);
4644 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4645 }
4646 else if (VECTORP (val) || STRINGP (val))
4647 {
4648 int len = (VECTORP (val)
4649 ? XVECTOR (val)->size : XSTRING (val)->size);
4650 int i;
4651 for (i = 0; i < len; i++)
4652 {
4653 ch = (STRINGP (val)
4654 ? Faref (val, make_number (i))
4655 : XVECTOR (val)->contents[i]);
4656 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
4657 }
4658 }
4659 else /* INTEGERP (val) */
4660 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
4661 }
4662 CODING_ADD_COMPOSITION_END (coding, end - from);
4663 }
4664 start = end;
4665 }
4666 while (start < to
4667 && find_composition (start, to, &start, &end, &prop, obj)
4668 && end <= to);
4669
4670 /* Make coding->cmp_data point to the first memory block. */
4671 while (coding->cmp_data->prev)
4672 coding->cmp_data = coding->cmp_data->prev;
4673 coding->cmp_data_start = 0;
4674}
4675
4676/* Reflect the saved information about compositions to OBJ.
4677 CODING->cmp_data points to a memory block for the information. OBJ
4678 is a buffer or a string, defaults to the current buffer. */
4679
33fb63eb 4680void
ec6d2bb8
KH
4681coding_restore_composition (coding, obj)
4682 struct coding_system *coding;
4683 Lisp_Object obj;
4684{
4685 struct composition_data *cmp_data = coding->cmp_data;
4686
4687 if (!cmp_data)
4688 return;
4689
4690 while (cmp_data->prev)
4691 cmp_data = cmp_data->prev;
4692
4693 while (cmp_data)
4694 {
4695 int i;
4696
78108bcd
KH
4697 for (i = 0; i < cmp_data->used && cmp_data->data[i] > 0;
4698 i += cmp_data->data[i])
ec6d2bb8
KH
4699 {
4700 int *data = cmp_data->data + i;
4701 enum composition_method method = (enum composition_method) data[3];
4702 Lisp_Object components;
4703
4704 if (method == COMPOSITION_RELATIVE)
4705 components = Qnil;
4706 else
4707 {
4708 int len = data[0] - 4, j;
4709 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];
4710
4711 for (j = 0; j < len; j++)
4712 args[j] = make_number (data[4 + j]);
4713 components = (method == COMPOSITION_WITH_ALTCHARS
4714 ? Fstring (len, args) : Fvector (len, args));
4715 }
4716 compose_text (data[1], data[2], components, Qnil, obj);
4717 }
4718 cmp_data = cmp_data->next;
4719 }
4720}
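
/* Illustration only, not part of coding.c: a minimal sketch of walking
   the variable-length records that the loop above reads.  The layout is
   inferred from that loop: data[0] is the record length in ints (a
   4-int header plus components), data[1] and data[2] are the start and
   end positions, data[3] is the method, and data[4...] are components.
   The helper name count_composition_records is an assumption.  */

```c
#include <assert.h>

/* Hypothetical helper: count the records stored in DATA[0..USED-1] and
   add up their component counts in *TOTAL_COMPONENTS.  A record length
   of zero or less terminates the walk, as in the loop above.  */
static int
count_composition_records (const int *data, int used, int *total_components)
{
  int i, count = 0;

  *total_components = 0;
  for (i = 0; i < used && data[i] > 0; i += data[i])
    {
      const int *rec = data + i;

      /* Every record carries at least the 4-int header.  */
      assert (rec[0] >= 4);
      *total_components += rec[0] - 4;
      count++;
    }
  return count;
}
```

/* Storing the length first lets a reader step from record to record
   without knowing how many components each one carries.  */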
4721
d46c5b12 4722/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
fb88bf2d
KH
4723 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
4724 coding system CODING, and return the status code of code conversion
4725 (currently, this value has no meaning).
4726
4727 The numbers of characters (and bytes) consumed and produced by
4728 the conversion are recorded in members of the structure
4729 CODING.
d46c5b12 4730
6e44253b 4731 If REPLACE is nonzero, we do various things as if the original text
d46c5b12 4732 is deleted and a new text is inserted. See the comments in
b73bfc1c
KH
4733 replace_range (insdel.c) to know what we are doing.
4734
4735 If REPLACE is zero, it is assumed that the source text is unibyte.
4736 Otherwise, it is assumed that the source text is multibyte. */
4ed46869
KH
4737
4738int
6e44253b
KH
4739code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
4740 int from, from_byte, to, to_byte, encodep, replace;
4ed46869 4741 struct coding_system *coding;
4ed46869 4742{
fb88bf2d
KH
4743 int len = to - from, len_byte = to_byte - from_byte;
4744 int require, inserted, inserted_byte;
4b39528c 4745 int head_skip, tail_skip, total_skip = 0;
84d60297 4746 Lisp_Object saved_coding_symbol;
fb88bf2d 4747 int first = 1;
fb88bf2d 4748 unsigned char *src, *dst;
84d60297 4749 Lisp_Object deletion;
e133c8fa 4750 int orig_point = PT, orig_len = len;
6abb9bd9 4751 int prev_Z;
b73bfc1c
KH
4752 int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);
4753
4754 coding->src_multibyte = replace && multibyte_p;
4755 coding->dst_multibyte = multibyte_p;
84d60297
RS
4756
4757 deletion = Qnil;
4758 saved_coding_symbol = Qnil;
d46c5b12 4759
83fa074f 4760 if (from < PT && PT < to)
e133c8fa
KH
4761 {
4762 TEMP_SET_PT_BOTH (from, from_byte);
4763 orig_point = from;
4764 }
83fa074f 4765
6e44253b 4766 if (replace)
d46c5b12 4767 {
fb88bf2d 4768 int saved_from = from;
e077cc80 4769 int saved_inhibit_modification_hooks;
fb88bf2d 4770
d46c5b12 4771 prepare_to_modify_buffer (from, to, &from);
fb88bf2d
KH
4772 if (saved_from != from)
4773 {
4774 to = from + len;
b73bfc1c 4775 from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
fb88bf2d
KH
4776 len_byte = to_byte - from_byte;
4777 }
e077cc80
KH
4778
4779 /* The code conversion routine can not preserve text properties
4780 for now. So, we must remove all text properties in the
4781 region. Here, we must suppress all modification hooks. */
4782 saved_inhibit_modification_hooks = inhibit_modification_hooks;
4783 inhibit_modification_hooks = 1;
4784 Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
4785 inhibit_modification_hooks = saved_inhibit_modification_hooks;
d46c5b12 4786 }
d46c5b12
KH
4787
4788 if (! encodep && CODING_REQUIRE_DETECTION (coding))
4789 {
12410ef1 4790 /* We must detect encoding of text and eol format. */
d46c5b12
KH
4791
4792 if (from < GPT && to > GPT)
4793 move_gap_both (from, from_byte);
4794 if (coding->type == coding_type_undecided)
4795 {
fb88bf2d 4796 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
d46c5b12 4797 if (coding->type == coding_type_undecided)
12410ef1
KH
4798 /* It seems that the text contains only ASCII, but we
4799 should not left it undecided because the deeper
4800 decoding routine (decode_coding) tries to detect the
4801 encodings again in vain. */
d46c5b12
KH
4802 coding->type = coding_type_emacs_mule;
4803 }
aaaf0b1e
KH
4804 if (coding->eol_type == CODING_EOL_UNDECIDED
4805 && coding->type != coding_type_ccl)
d46c5b12
KH
4806 {
4807 saved_coding_symbol = coding->symbol;
4808 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
4809 if (coding->eol_type == CODING_EOL_UNDECIDED)
4810 coding->eol_type = CODING_EOL_LF;
4811 /* We had better recover the original eol format if we
4812 encounter an inconsistent eol format while decoding. */
4813 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4814 }
4815 }
4816
d46c5b12
KH
4817 /* Now we convert the text. */
4818
4819 /* For encoding, we must process pre-write-conversion in advance. */
b73bfc1c
KH
4820 if (! inhibit_pre_post_conversion
4821 && encodep
d46c5b12
KH
4822 && SYMBOLP (coding->pre_write_conversion)
4823 && ! NILP (Ffboundp (coding->pre_write_conversion)))
4824 {
2b4f9037
KH
4825 /* The function in pre-write-conversion may put a new text in a
4826 new buffer. */
0007bdd0
KH
4827 struct buffer *prev = current_buffer;
4828 Lisp_Object new;
b843d1ae 4829 int count = specpdl_ptr - specpdl;
d46c5b12 4830
b843d1ae
KH
4831 record_unwind_protect (code_convert_region_unwind, Qnil);
4832 /* We should not call any more pre-write/post-read-conversion
4833 functions while this pre-write-conversion is running. */
4834 inhibit_pre_post_conversion = 1;
b39f748c
AS
4835 call2 (coding->pre_write_conversion,
4836 make_number (from), make_number (to));
b843d1ae
KH
4837 inhibit_pre_post_conversion = 0;
4838 /* Discard the unwind protect. */
4839 specpdl_ptr--;
4840
d46c5b12
KH
4841 if (current_buffer != prev)
4842 {
4843 len = ZV - BEGV;
0007bdd0 4844 new = Fcurrent_buffer ();
d46c5b12 4845 set_buffer_internal_1 (prev);
7dae4502 4846 del_range_2 (from, from_byte, to, to_byte, 0);
e133c8fa 4847 TEMP_SET_PT_BOTH (from, from_byte);
0007bdd0
KH
4848 insert_from_buffer (XBUFFER (new), 1, len, 0);
4849 Fkill_buffer (new);
e133c8fa
KH
4850 if (orig_point >= to)
4851 orig_point += len - orig_len;
4852 else if (orig_point > from)
4853 orig_point = from;
4854 orig_len = len;
d46c5b12 4855 to = from + len;
b73bfc1c
KH
4856 from_byte = CHAR_TO_BYTE (from);
4857 to_byte = CHAR_TO_BYTE (to);
d46c5b12 4858 len_byte = to_byte - from_byte;
e133c8fa 4859 TEMP_SET_PT_BOTH (from, from_byte);
d46c5b12
KH
4860 }
4861 }
4862
12410ef1
KH
4863 if (replace)
4864 deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
4865
ec6d2bb8
KH
4866 if (coding->composing != COMPOSITION_DISABLED)
4867 {
4868 if (encodep)
4869 coding_save_composition (coding, from, to, Fcurrent_buffer ());
4870 else
4871 coding_allocate_composition_data (coding, from);
4872 }
fb88bf2d 4873
b73bfc1c 4874 /* Try to skip the leading and trailing ASCII characters. */
4956c225
KH
4875 if (coding->type != coding_type_ccl)
4876 {
4877 int from_byte_orig = from_byte, to_byte_orig = to_byte;
ec6d2bb8 4878
4956c225
KH
4879 if (from < GPT && GPT < to)
4880 move_gap_both (from, from_byte);
4881 SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
4882 if (from_byte == to_byte
4883 && (encodep || NILP (coding->post_read_conversion))
4884 && ! CODING_REQUIRE_FLUSHING (coding))
4885 {
4886 coding->produced = len_byte;
4887 coding->produced_char = len;
4888 if (!replace)
4889 /* We must record and adjust for this new text now. */
4890 adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);
4891 return 0;
4892 }
4893
4894 head_skip = from_byte - from_byte_orig;
4895 tail_skip = to_byte_orig - to_byte;
4896 total_skip = head_skip + tail_skip;
4897 from += head_skip;
4898 to -= tail_skip;
4899 len -= total_skip; len_byte -= total_skip;
4900 }
d46c5b12 4901
fb88bf2d
KH
4902 /* For conversion, we must put the gap before the text in addition to
4903 making the gap larger for efficient decoding. The required gap
4904 size starts from 2000 which is the magic number used in make_gap.
4905 But, after one batch of conversion, it will be incremented if we
4906 find that it is not enough. */
d46c5b12
KH
4907 require = 2000;
4908
4909 if (GAP_SIZE < require)
4910 make_gap (require - GAP_SIZE);
4911 move_gap_both (from, from_byte);
4912
d46c5b12 4913 inserted = inserted_byte = 0;
fb88bf2d
KH
4914
4915 GAP_SIZE += len_byte;
4916 ZV -= len;
4917 Z -= len;
4918 ZV_BYTE -= len_byte;
4919 Z_BYTE -= len_byte;
4920
d9f9a1bc
GM
4921 if (GPT - BEG < BEG_UNCHANGED)
4922 BEG_UNCHANGED = GPT - BEG;
4923 if (Z - GPT < END_UNCHANGED)
4924 END_UNCHANGED = Z - GPT;
f2558efd 4925
b73bfc1c
KH
4926 if (!encodep && coding->src_multibyte)
4927 {
4928 /* Decoding routines expect that the source text is unibyte.
4929 We must convert 8-bit characters of multibyte form to
4930 unibyte. */
4931 int len_byte_orig = len_byte;
4932 len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
4933 if (len_byte < len_byte_orig)
4934 safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
4935 len_byte);
4936 coding->src_multibyte = 0;
4937 }
4938
d46c5b12
KH
4939 for (;;)
4940 {
fb88bf2d 4941 int result;
d46c5b12 4942
ec6d2bb8 4943 /* The buffer memory is now:
b73bfc1c
KH
4944 +--------+converted-text+---------+-------original-text-------+---+
4945 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
4946 |<---------------------- GAP ----------------------->| */
ec6d2bb8
KH
4947 src = GAP_END_ADDR - len_byte;
4948 dst = GPT_ADDR + inserted_byte;
4949
d46c5b12 4950 if (encodep)
fb88bf2d 4951 result = encode_coding (coding, src, dst, len_byte, 0);
d46c5b12 4952 else
fb88bf2d 4953 result = decode_coding (coding, src, dst, len_byte, 0);
ec6d2bb8
KH
4954
4955 /* The buffer memory is now:
b73bfc1c
KH
4956 +--------+-------converted-text----+--+------original-text----+---+
4957 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
4958 |<---------------------- GAP ----------------------->| */
ec6d2bb8 4959
d46c5b12
KH
4960 inserted += coding->produced_char;
4961 inserted_byte += coding->produced;
d46c5b12 4962 len_byte -= coding->consumed;
ec6d2bb8
KH
4963
4964 if (result == CODING_FINISH_INSUFFICIENT_CMP)
4965 {
4966 coding_allocate_composition_data (coding, from + inserted);
4967 continue;
4968 }
4969
fb88bf2d 4970 src += coding->consumed;
3636f7a3 4971 dst += coding->produced;
d46c5b12 4972
9864ebce
KH
4973 if (result == CODING_FINISH_NORMAL)
4974 {
4975 src += len_byte;
4976 break;
4977 }
d46c5b12
KH
4978 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
4979 {
fb88bf2d 4980 unsigned char *pend = dst, *p = pend - inserted_byte;
38edf7d4 4981 Lisp_Object eol_type;
d46c5b12
KH
4982
4983 /* Encode LFs back to the original eol format (CR or CRLF). */
4984 if (coding->eol_type == CODING_EOL_CR)
4985 {
4986 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
4987 }
4988 else
4989 {
d46c5b12
KH
4990 int count = 0;
4991
fb88bf2d
KH
4992 while (p < pend) if (*p++ == '\n') count++;
4993 if (src - dst < count)
d46c5b12 4994 {
38edf7d4 4995 /* We don't have sufficient room for encoding LFs
fb88bf2d
KH
4996 back to CRLF. We must record converted and
4997 not-yet-converted text back to the buffer
4998 content, enlarge the gap, then record them out of
4999 the buffer contents again. */
5000 int add = len_byte + inserted_byte;
5001
5002 GAP_SIZE -= add;
5003 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5004 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5005 make_gap (count - GAP_SIZE);
5006 GAP_SIZE += add;
5007 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5008 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5009 /* Don't forget to update SRC, DST, and PEND. */
5010 src = GAP_END_ADDR - len_byte;
5011 dst = GPT_ADDR + inserted_byte;
5012 pend = dst;
d46c5b12 5013 }
d46c5b12
KH
5014 inserted += count;
5015 inserted_byte += count;
fb88bf2d
KH
5016 coding->produced += count;
5017 p = dst = pend + count;
5018 while (count)
5019 {
5020 *--p = *--pend;
5021 if (*p == '\n') count--, *--p = '\r';
5022 }
d46c5b12
KH
5023 }
5024
5025 /* Suppress eol-format conversion in the further conversion. */
5026 coding->eol_type = CODING_EOL_LF;
5027
38edf7d4
KH
5028 /* Set the coding system symbol to that for Unix-like EOL. */
5029 eol_type = Fget (saved_coding_symbol, Qeol_type);
5030 if (VECTORP (eol_type)
5031 && XVECTOR (eol_type)->size == 3
5032 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
5033 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
5034 else
5035 coding->symbol = saved_coding_symbol;
fb88bf2d
KH
5036
5037 continue;
d46c5b12
KH
5038 }
5039 if (len_byte <= 0)
944bd420
KH
5040 {
5041 if (coding->type != coding_type_ccl
5042 || coding->mode & CODING_MODE_LAST_BLOCK)
5043 break;
5044 coding->mode |= CODING_MODE_LAST_BLOCK;
5045 continue;
5046 }
d46c5b12
KH
5047 if (result == CODING_FINISH_INSUFFICIENT_SRC)
5048 {
5049 /* The source text ends in invalid codes. Let's just
5050 make them valid buffer contents, and finish conversion. */
fb88bf2d 5051 inserted += len_byte;
d46c5b12 5052 inserted_byte += len_byte;
fb88bf2d 5053 while (len_byte--)
ee59c65f 5054 *dst++ = *src++;
d46c5b12
KH
5055 break;
5056 }
9864ebce
KH
5057 if (result == CODING_FINISH_INTERRUPT)
5058 {
5059 /* The conversion procedure was interrupted by a user. */
9864ebce
KH
5060 break;
5061 }
5062 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5063 if (coding->consumed < 1)
5064 {
5065 /* It's quite strange to require more memory without
5066 consuming any bytes. Perhaps CCL program bug. */
9864ebce
KH
5067 break;
5068 }
fb88bf2d
KH
5069 if (first)
5070 {
5071 /* We have just done the first batch of conversion which was
5072 stopped because of insufficient gap. Let's reconsider the
5073 required gap size (i.e. SRC - DST) now.
5074
5075 We have converted ORIG bytes (== coding->consumed) into
5076 NEW bytes (coding->produced). To convert the remaining
5077 LEN_BYTE bytes, we may need REQUIRE bytes of gap, where:
5078 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5079 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5080 Here, we are sure that NEW >= ORIG. */
6e44253b
KH
5081 float ratio = coding->produced - coding->consumed;
5082 ratio /= coding->consumed;
5083 require = len_byte * ratio;
fb88bf2d
KH
5084 first = 0;
5085 }
5086 if ((src - dst) < (require + 2000))
5087 {
5088 /* See the comment above the previous call of make_gap. */
5089 int add = len_byte + inserted_byte;
5090
5091 GAP_SIZE -= add;
5092 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5093 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5094 make_gap (require + 2000);
5095 GAP_SIZE += add;
5096 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5097 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
fb88bf2d 5098 }
d46c5b12 5099 }
fb88bf2d
KH
5100 if (src - dst > 0) *dst = 0; /* Put an anchor. */
5101
b73bfc1c
KH
5102 if (encodep && coding->dst_multibyte)
5103 {
5104 /* The output is unibyte. We must convert 8-bit characters to
5105 multibyte form. */
5106 if (inserted_byte * 2 > GAP_SIZE)
5107 {
5108 GAP_SIZE -= inserted_byte;
5109 ZV += inserted_byte; Z += inserted_byte;
5110 ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
5111 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5112 make_gap (inserted_byte - GAP_SIZE);
5113 GAP_SIZE += inserted_byte;
5114 ZV -= inserted_byte; Z -= inserted_byte;
5115 ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
5116 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5117 }
5118 inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
5119 }
7553d0e1 5120
12410ef1
KH
5121 /* If we have shrunk the conversion area, adjust it now. */
5122 if (total_skip > 0)
5123 {
5124 if (tail_skip > 0)
5125 safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
5126 inserted += total_skip; inserted_byte += total_skip;
5127 GAP_SIZE += total_skip;
5128 GPT -= head_skip; GPT_BYTE -= head_skip;
5129 ZV -= total_skip; ZV_BYTE -= total_skip;
5130 Z -= total_skip; Z_BYTE -= total_skip;
5131 from -= head_skip; from_byte -= head_skip;
5132 to += tail_skip; to_byte += tail_skip;
5133 }
5134
6abb9bd9 5135 prev_Z = Z;
12410ef1 5136 adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
6abb9bd9 5137 inserted = Z - prev_Z;
4ed46869 5138
ec6d2bb8
KH
5139 if (!encodep && coding->cmp_data && coding->cmp_data->used)
5140 coding_restore_composition (coding, Fcurrent_buffer ());
5141 coding_free_composition_data (coding);
5142
b73bfc1c
KH
5143 if (! inhibit_pre_post_conversion
5144 && ! encodep && ! NILP (coding->post_read_conversion))
d46c5b12 5145 {
2b4f9037 5146 Lisp_Object val;
b843d1ae 5147 int count = specpdl_ptr - specpdl;
4ed46869 5148
e133c8fa
KH
5149 if (from != PT)
5150 TEMP_SET_PT_BOTH (from, from_byte);
6abb9bd9 5151 prev_Z = Z;
b843d1ae
KH
5152 record_unwind_protect (code_convert_region_unwind, Qnil);
5153 /* We should not call any more pre-write/post-read-conversion
5154 functions while this post-read-conversion is running. */
5155 inhibit_pre_post_conversion = 1;
2b4f9037 5156 val = call1 (coding->post_read_conversion, make_number (inserted));
b843d1ae
KH
5157 inhibit_pre_post_conversion = 0;
5158 /* Discard the unwind protect. */
5159 specpdl_ptr--;
6abb9bd9 5160 CHECK_NUMBER (val, 0);
944bd420 5161 inserted += Z - prev_Z;
e133c8fa
KH
5162 }
5163
5164 if (orig_point >= from)
5165 {
5166 if (orig_point >= from + orig_len)
5167 orig_point += inserted - orig_len;
5168 else
5169 orig_point = from;
5170 TEMP_SET_PT (orig_point);
d46c5b12 5171 }
4ed46869 5172
ec6d2bb8
KH
5173 if (replace)
5174 {
5175 signal_after_change (from, to - from, inserted);
e19539f1 5176 update_compositions (from, from + inserted, CHECK_BORDER);
ec6d2bb8 5177 }
2b4f9037 5178
fb88bf2d 5179 {
12410ef1
KH
5180 coding->consumed = to_byte - from_byte;
5181 coding->consumed_char = to - from;
5182 coding->produced = inserted_byte;
5183 coding->produced_char = inserted;
fb88bf2d 5184 }
7553d0e1 5185
fb88bf2d 5186 return 0;
d46c5b12
KH
5187}
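
/* Illustration only, not part of coding.c: a minimal sketch of the gap
   estimate that code_convert_region makes after its first batch of
   conversion.  It follows the arithmetic in the comment there,
   REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG.  The helper name
   estimate_extra_gap and its parameter names are assumptions.  */

```c
#include <assert.h>

/* Hypothetical helper: given that CONSUMED source bytes turned into
   PRODUCED output bytes, estimate how many bytes of growth the
   REMAINING source bytes will need.  */
static int
estimate_extra_gap (int remaining, int consumed, int produced)
{
  float ratio;

  /* The caller above bails out before this point if nothing was
     consumed, so division by zero is excluded here as well.  */
  assert (consumed > 0);

  ratio = produced - consumed;  /* growth in bytes...  */
  ratio /= consumed;            /* ...per source byte  */
  return (int) (remaining * ratio);
}
```

/* For example, if the first 100 bytes expanded to 150, the remaining
   1000 bytes are expected to grow by about 500.  The function above
   then adds a fixed margin of 2000 bytes on top of this estimate.  */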
5188
5189Lisp_Object
b73bfc1c
KH
5190run_pre_post_conversion_on_str (str, coding, encodep)
5191 Lisp_Object str;
5192 struct coding_system *coding;
5193 int encodep;
5194{
5195 int count = specpdl_ptr - specpdl;
5196 struct gcpro gcpro1;
5197 struct buffer *prev = current_buffer;
5198 int multibyte = STRING_MULTIBYTE (str);
5199
5200 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
5201 record_unwind_protect (code_convert_region_unwind, Qnil);
5202 GCPRO1 (str);
5203 temp_output_buffer_setup (" *code-converting-work*");
5204 set_buffer_internal (XBUFFER (Vstandard_output));
5205 /* We must insert the contents of STR as is without
5206 unibyte<->multibyte conversion. For that, we adjust the
5207 multibyteness of the working buffer to that of STR. */
5208 Ferase_buffer ();
5209 current_buffer->enable_multibyte_characters = multibyte ? Qt : Qnil;
5210 insert_from_string (str, 0, 0,
5211 XSTRING (str)->size, STRING_BYTES (XSTRING (str)), 0);
5212 UNGCPRO;
5213 inhibit_pre_post_conversion = 1;
5214 if (encodep)
5215 call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
5216 else
6bac5b12
KH
5217 {
5218 TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
5219 call1 (coding->post_read_conversion, make_number (Z - BEG));
5220 }
b73bfc1c 5221 inhibit_pre_post_conversion = 0;
78108bcd 5222 str = make_buffer_string (BEG, Z, 1);
b73bfc1c
KH
5223 return unbind_to (count, str);
5224}
5225
5226Lisp_Object
5227decode_coding_string (str, coding, nocopy)
d46c5b12 5228 Lisp_Object str;
4ed46869 5229 struct coding_system *coding;
b73bfc1c 5230 int nocopy;
4ed46869 5231{
d46c5b12 5232 int len;
73be902c 5233 struct conversion_buffer buf;
b73bfc1c 5234 int from, to, to_byte;
d46c5b12 5235 struct gcpro gcpro1;
84d60297 5236 Lisp_Object saved_coding_symbol;
d46c5b12 5237 int result;
78108bcd 5238 int require_decoding;
73be902c
KH
5239 int shrinked_bytes = 0;
5240 Lisp_Object newstr;
2391eaa4 5241 int consumed, consumed_char, produced, produced_char;
4ed46869 5242
b73bfc1c
KH
5243 from = 0;
5244 to = XSTRING (str)->size;
5245 to_byte = STRING_BYTES (XSTRING (str));
4ed46869 5246
b73bfc1c
KH
5247 saved_coding_symbol = Qnil;
5248 if (CODING_REQUIRE_DETECTION (coding))
d46c5b12
KH
5249 {
5250 /* See the comments in code_convert_region. */
5251 if (coding->type == coding_type_undecided)
5252 {
5253 detect_coding (coding, XSTRING (str)->data, to_byte);
5254 if (coding->type == coding_type_undecided)
5255 coding->type = coding_type_emacs_mule;
5256 }
aaaf0b1e
KH
5257 if (coding->eol_type == CODING_EOL_UNDECIDED
5258 && coding->type != coding_type_ccl)
d46c5b12
KH
5259 {
5260 saved_coding_symbol = coding->symbol;
5261 detect_eol (coding, XSTRING (str)->data, to_byte);
5262 if (coding->eol_type == CODING_EOL_UNDECIDED)
5263 coding->eol_type = CODING_EOL_LF;
5264 /* We had better recover the original eol format if we
5265 encounter an inconsistent eol format while decoding. */
5266 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5267 }
5268 }
4ed46869 5269
78108bcd 5270 require_decoding = CODING_REQUIRE_DECODING (coding);
ec6d2bb8 5271
b73bfc1c 5272 if (STRING_MULTIBYTE (str))
d46c5b12 5273 {
b73bfc1c
KH
5274 /* Decoding routines expect the source text to be unibyte. */
5275 str = Fstring_as_unibyte (str);
86af83a9 5276 to_byte = STRING_BYTES (XSTRING (str));
b73bfc1c 5277 nocopy = 1;
b73bfc1c 5278 }
78108bcd
KH
5279 coding->src_multibyte = 0;
5280 coding->dst_multibyte = (coding->type != coding_type_no_conversion
5281 && coding->type != coding_type_raw_text);
ec6d2bb8 5282
b73bfc1c 5283 /* Try to skip the leading and trailing ASCII characters. */
78108bcd 5284 if (require_decoding && coding->type != coding_type_ccl)
4956c225 5285 {
4956c225
KH
5286 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5287 0);
5288 if (from == to_byte)
78108bcd 5289 require_decoding = 0;
73be902c 5290 shrinked_bytes = from + (STRING_BYTES (XSTRING (str)) - to_byte);
4956c225 5291 }
b73bfc1c 5292
78108bcd
KH
5293 if (!require_decoding)
5294 {
5295 coding->consumed = STRING_BYTES (XSTRING (str));
5296 coding->consumed_char = XSTRING (str)->size;
5297 if (coding->dst_multibyte)
5298 {
5299 str = Fstring_as_multibyte (str);
5300 nocopy = 1;
5301 }
5302 coding->produced = STRING_BYTES (XSTRING (str));
5303 coding->produced_char = XSTRING (str)->size;
5304 return (nocopy ? str : Fcopy_sequence (str));
5305 }
5306
5307 if (coding->composing != COMPOSITION_DISABLED)
5308 coding_allocate_composition_data (coding, from);
b73bfc1c 5309 len = decoding_buffer_size (coding, to_byte - from);
73be902c 5310 allocate_conversion_buffer (buf, len);
4ed46869 5311
2391eaa4 5312 consumed = consumed_char = produced = produced_char = 0;
73be902c 5313 while (1)
4ed46869 5314 {
73be902c
KH
5315 result = decode_coding (coding, XSTRING (str)->data + from + consumed,
5316 buf.data + produced, to_byte - from - consumed,
5317 buf.size - produced);
5318 consumed += coding->consumed;
2391eaa4 5319 consumed_char += coding->consumed_char;
73be902c
KH
5320 produced += coding->produced;
5321 produced_char += coding->produced_char;
2391eaa4
KH
5322 if (result == CODING_FINISH_NORMAL
5323 || (result == CODING_FINISH_INSUFFICIENT_SRC
5324 && coding->consumed == 0))
73be902c
KH
5325 break;
5326 if (result == CODING_FINISH_INSUFFICIENT_CMP)
5327 coding_allocate_composition_data (coding, from + produced_char);
5328 else if (result == CODING_FINISH_INSUFFICIENT_DST)
5329 extend_conversion_buffer (&buf);
5330 else if (result == CODING_FINISH_INCONSISTENT_EOL)
5331 {
5332 /* Recover the original EOL format. */
5333 if (coding->eol_type == CODING_EOL_CR)
5334 {
5335 unsigned char *p;
5336 for (p = buf.data; p < buf.data + produced; p++)
5337 if (*p == '\n') *p = '\r';
5338 }
5339 else if (coding->eol_type == CODING_EOL_CRLF)
5340 {
5341 int num_eol = 0;
5342 unsigned char *p0, *p1;
5343 for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
5344 if (*p0 == '\n') num_eol++;
5345 if (produced + num_eol >= buf.size)
5346 extend_conversion_buffer (&buf);
5347 for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
5348 {
5349 *--p1 = *--p0;
5350 if (*p0 == '\n') *--p1 = '\r';
5351 }
5352 produced += num_eol;
5353 produced_char += num_eol;
5354 }
5355 coding->eol_type = CODING_EOL_LF;
5356 coding->symbol = saved_coding_symbol;
5357 }
4ed46869 5358 }
d46c5b12 5359
2391eaa4
KH
5360 coding->consumed = consumed;
5361 coding->consumed_char = consumed_char;
5362 coding->produced = produced;
5363 coding->produced_char = produced_char;
5364
78108bcd 5365 if (coding->dst_multibyte)
73be902c
KH
5366 newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
5367 produced + shrinked_bytes);
78108bcd 5368 else
73be902c
KH
5369 newstr = make_uninit_string (produced + shrinked_bytes);
5370 if (from > 0)
5371 bcopy (XSTRING (str)->data, XSTRING (newstr)->data, from);
5372 bcopy (buf.data, XSTRING (newstr)->data + from, produced);
5373 if (shrinked_bytes > from)
5374 bcopy (XSTRING (str)->data + to_byte,
5375 XSTRING (newstr)->data + from + produced,
5376 shrinked_bytes - from);
5377 free_conversion_buffer (&buf);
b73bfc1c
KH
5378
5379 if (coding->cmp_data && coding->cmp_data->used)
73be902c 5380 coding_restore_composition (coding, newstr);
b73bfc1c
KH
5381 coding_free_composition_data (coding);
5382
5383 if (SYMBOLP (coding->post_read_conversion)
5384 && !NILP (Ffboundp (coding->post_read_conversion)))
73be902c 5385 newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
b73bfc1c 5386
73be902c 5387 return newstr;
b73bfc1c
KH
5388}
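
/* Illustration only, not part of coding.c: a minimal sketch of the
   backward copy used above to turn LF back into CRLF inside one
   buffer.  Copying from the end means no byte is overwritten before it
   has been moved.  The helper name lf_to_crlf_in_place, its int-based
   interface and the -1 error return are assumptions for this example.  */

```c
#include <assert.h>

/* Hypothetical helper: expand every '\n' among the first LEN bytes of
   BUF to "\r\n" in place.  CAPACITY is the total size of BUF.  Return
   the new length, or -1 if BUF is too small to hold the result.  */
static int
lf_to_crlf_in_place (unsigned char *buf, int len, int capacity)
{
  int count = 0, new_len, i;
  unsigned char *src, *dst;

  assert (len >= 0 && len <= capacity);

  /* First pass: count the line feeds to learn the final length.  */
  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      count++;
  new_len = len + count;
  if (new_len > capacity)
    return -1;

  /* Second pass: copy backward, inserting '\r' before each '\n'.
     Once COUNT reaches zero, SRC and DST coincide and the remaining
     head of the buffer is already in place.  */
  src = buf + len;
  dst = buf + new_len;
  while (count > 0)
    {
      *--dst = *--src;
      if (*dst == '\n')
        {
          *--dst = '\r';
          count--;
        }
    }
  return new_len;
}
```

/* A caller that gets -1 back would enlarge the buffer and retry, which
   is the role extend_conversion_buffer plays in the function above.  */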
5389
5390Lisp_Object
5391encode_coding_string (str, coding, nocopy)
5392 Lisp_Object str;
5393 struct coding_system *coding;
5394 int nocopy;
5395{
5396 int len;
73be902c 5397 struct conversion_buffer buf;
b73bfc1c
KH
5398 int from, to, to_byte;
5399 struct gcpro gcpro1;
5400 Lisp_Object saved_coding_symbol;
5401 int result;
73be902c
KH
5402 int shrinked_bytes = 0;
5403 Lisp_Object newstr;
2391eaa4 5404 int consumed, consumed_char, produced, produced_char;
b73bfc1c
KH
5405
5406 if (SYMBOLP (coding->pre_write_conversion)
5407 && !NILP (Ffboundp (coding->pre_write_conversion)))
6bac5b12 5408 str = run_pre_post_conversion_on_str (str, coding, 1);
b73bfc1c
KH
5409
5410 from = 0;
5411 to = XSTRING (str)->size;
5412 to_byte = STRING_BYTES (XSTRING (str));
5413
5414 saved_coding_symbol = Qnil;
5415 if (! CODING_REQUIRE_ENCODING (coding))
826bfb8b 5416 {
2391eaa4
KH
5417 coding->consumed = STRING_BYTES (XSTRING (str));
5418 coding->consumed_char = XSTRING (str)->size;
b73bfc1c
KH
5419 if (STRING_MULTIBYTE (str))
5420 {
5421 str = Fstring_as_unibyte (str);
5422 nocopy = 1;
5423 }
2391eaa4
KH
5424 coding->produced = STRING_BYTES (XSTRING (str));
5425 coding->produced_char = XSTRING (str)->size;
b73bfc1c 5426 return (nocopy ? str : Fcopy_sequence (str));
826bfb8b
KH
5427 }
5428
b73bfc1c
KH
5429 /* Encoding routines determine the multibyteness of the source text
5430 by coding->src_multibyte. */
5431 coding->src_multibyte = STRING_MULTIBYTE (str);
5432 coding->dst_multibyte = 0;
5433
5434 if (coding->composing != COMPOSITION_DISABLED)
5435 coding_save_composition (coding, from, to, str);
ec6d2bb8 5436
b73bfc1c 5437 /* Try to skip the leading and trailing ASCII characters. */
4956c225
KH
5438 if (coding->type != coding_type_ccl)
5439 {
4956c225
KH
5440 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, XSTRING (str)->data,
5441 1);
5442 if (from == to_byte)
5443 return (nocopy ? str : Fcopy_sequence (str));
73be902c 5444 shrinked_bytes = from + (STRING_BYTES (XSTRING (str)) - to_byte);
4956c225 5445 }
b73bfc1c
KH
5446
5447 len = encoding_buffer_size (coding, to_byte - from);
73be902c
KH
5448 allocate_conversion_buffer (buf, len);
5449
2391eaa4 5450 consumed = consumed_char = produced = produced_char = 0;
73be902c
KH
5451 while (1)
5452 {
5453 result = encode_coding (coding, XSTRING (str)->data + from + consumed,
5454 buf.data + produced, to_byte - from - consumed,
5455 buf.size - produced);
5456 consumed += coding->consumed;
2391eaa4 5457 consumed_char += coding->consumed_char;
13004bef 5458 produced += coding->produced;
2391eaa4
KH
5459 produced_char += coding->produced_char;
5460 if (result == CODING_FINISH_NORMAL
5461 || (result == CODING_FINISH_INSUFFICIENT_SRC
5462 && coding->consumed == 0))
73be902c
KH
5463 break;
5464 /* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
5465 extend_conversion_buffer (&buf);
5466 }
5467
2391eaa4
KH
5468 coding->consumed = consumed;
5469 coding->consumed_char = consumed_char;
5470 coding->produced = produced;
5471 coding->produced_char = produced_char;
5472
73be902c 5473 newstr = make_uninit_string (produced + shrinked_bytes);
b73bfc1c 5474 if (from > 0)
73be902c
KH
5475 bcopy (XSTRING (str)->data, XSTRING (newstr)->data, from);
5476 bcopy (buf.data, XSTRING (newstr)->data + from, produced);
5477 if (shrinked_bytes > from)
5478 bcopy (XSTRING (str)->data + to_byte,
5479 XSTRING (newstr)->data + from + produced,
5480 shrinked_bytes - from);
5481
5482 free_conversion_buffer (&buf);
ec6d2bb8 5483 coding_free_composition_data (coding);
b73bfc1c 5484
73be902c 5485 return newstr;
4ed46869
KH
5486}
5487
5488\f
5489#ifdef emacs
1397dc18 5490/*** 8. Emacs Lisp library functions ***/
4ed46869 5491
4ed46869
KH
5492DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
5493 "Return t if OBJECT is nil or a coding-system.\n\
3a73fa5d
RS
5494See the documentation of `make-coding-system' for information\n\
5495about coding-system objects.")
4ed46869
KH
5496 (obj)
5497 Lisp_Object obj;
5498{
4608c386
KH
5499 if (NILP (obj))
5500 return Qt;
5501 if (!SYMBOLP (obj))
5502 return Qnil;
5503 /* Get coding-spec vector for OBJ. */
5504 obj = Fget (obj, Qcoding_system);
5505 return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
5506 ? Qt : Qnil);
4ed46869
KH
5507}
5508
9d991de8
RS
5509DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
5510 Sread_non_nil_coding_system, 1, 1, 0,
e0e989f6 5511 "Read a coding system from the minibuffer, prompting with string PROMPT.")
4ed46869
KH
5512 (prompt)
5513 Lisp_Object prompt;
5514{
e0e989f6 5515 Lisp_Object val;
9d991de8
RS
5516 do
5517 {
4608c386
KH
5518 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
5519 Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
9d991de8
RS
5520 }
5521 while (XSTRING (val)->size == 0);
e0e989f6 5522 return (Fintern (val, Qnil));
4ed46869
KH
5523}
5524
9b787f3e
RS
5525DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
5526 "Read a coding system from the minibuffer, prompting with string PROMPT.\n\
5527If the user enters null input, return second argument DEFAULT-CODING-SYSTEM.")
5528 (prompt, default_coding_system)
5529 Lisp_Object prompt, default_coding_system;
4ed46869 5530{
f44d27ce 5531 Lisp_Object val;
9b787f3e
RS
5532 if (SYMBOLP (default_coding_system))
5533 XSETSTRING (default_coding_system, XSYMBOL (default_coding_system)->name);
4608c386 5534 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
9b787f3e
RS
5535 Qt, Qnil, Qcoding_system_history,
5536 default_coding_system, Qnil);
e0e989f6 5537 return (XSTRING (val)->size == 0 ? Qnil : Fintern (val, Qnil));
4ed46869
KH
5538}
5539
5540DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
5541 1, 1, 0,
5542 "Check validity of CODING-SYSTEM.\n\
3a73fa5d
RS
5543If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.\n\
5544It is valid if it is a symbol with a non-nil `coding-system' property.\n\
4ed46869
KH
5545The value of the property should be a vector of length 5.")
5546 (coding_system)
5547 Lisp_Object coding_system;
5548{
5549 CHECK_SYMBOL (coding_system, 0);
5550 if (!NILP (Fcoding_system_p (coding_system)))
5551 return coding_system;
5552 while (1)
02ba4723 5553 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
4ed46869 5554}
3a73fa5d 5555\f
d46c5b12
KH
5556Lisp_Object
5557detect_coding_system (src, src_bytes, highest)
5558 unsigned char *src;
5559 int src_bytes, highest;
4ed46869
KH
5560{
5561 int coding_mask, eol_type;
d46c5b12
KH
5562 Lisp_Object val, tmp;
5563 int dummy;
4ed46869 5564
d46c5b12
KH
5565 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy);
5566 eol_type = detect_eol_type (src, src_bytes, &dummy);
5567 if (eol_type == CODING_EOL_INCONSISTENT)
25b02698 5568 eol_type = CODING_EOL_UNDECIDED;
4ed46869 5569
d46c5b12 5570 if (!coding_mask)
4ed46869 5571 {
27901516 5572 val = Qundecided;
d46c5b12 5573 if (eol_type != CODING_EOL_UNDECIDED)
4ed46869 5574 {
f44d27ce
RS
5575 Lisp_Object val2;
5576 val2 = Fget (Qundecided, Qeol_type);
4ed46869
KH
5577 if (VECTORP (val2))
5578 val = XVECTOR (val2)->contents[eol_type];
5579 }
80e803b4 5580 return (highest ? val : Fcons (val, Qnil));
4ed46869 5581 }
4ed46869 5582
d46c5b12
KH
5583 /* First, gather the possible coding systems in VAL. */
5584 val = Qnil;
fa42c37f 5585 for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5586 {
fa42c37f
KH
5587 Lisp_Object category_val, category_index;
5588
5589 category_index = Fget (XCAR (tmp), Qcoding_category_index);
5590 category_val = Fsymbol_value (XCAR (tmp));
5591 if (!NILP (category_val)
5592 && NATNUMP (category_index)
5593 && (coding_mask & (1 << XFASTINT (category_index))))
4ed46869 5594 {
fa42c37f 5595 val = Fcons (category_val, val);
d46c5b12
KH
5596 if (highest)
5597 break;
4ed46869
KH
5598 }
5599 }
d46c5b12
KH
5600 if (!highest)
5601 val = Fnreverse (val);
4ed46869 5602
65059037 5603 /* Then, replace the elements with subsidiary coding systems. */
fa42c37f 5604 for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 5605 {
65059037
RS
5606 if (eol_type != CODING_EOL_UNDECIDED
5607 && eol_type != CODING_EOL_INCONSISTENT)
4ed46869 5608 {
d46c5b12 5609 Lisp_Object eol;
03699b14 5610 eol = Fget (XCAR (tmp), Qeol_type);
d46c5b12 5611 if (VECTORP (eol))
03699b14 5612 XCAR (tmp) = XVECTOR (eol)->contents[eol_type];
4ed46869
KH
5613 }
5614 }
03699b14 5615 return (highest ? XCAR (val) : val);
d46c5b12 5616}
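/* Illustration only, not part of coding.c.  The candidate-gathering
   loop above can be sketched in isolation as a walk over a priority
   list that tests one bit of the detection mask per category.  The
   category names, the array representation, and collect_candidates
   itself are hypothetical; the real code conses Lisp values onto a
   list and reverses it afterwards.  */

```c
#include <stddef.h>

/* Hypothetical category indices.  The real ones are the
   CODING_CATEGORY_IDX_* constants, which are not shown here.  */
enum { CAT_EMACS_MULE, CAT_SJIS, CAT_ISO_7, CAT_BIG5, CAT_MAX };

/* Walk PRIORITY (N category indices, best first) and store in OUT
   every category whose bit is set in MASK.  If HIGHEST is nonzero,
   stop after the first hit.  Return the number of entries stored.  */
static size_t
collect_candidates (const int *priority, size_t n, unsigned mask,
                    int highest, int *out)
{
  size_t i, found = 0;

  for (i = 0; i < n; i++)
    {
      int idx = priority[i];

      if (idx < 0 || idx >= CAT_MAX)
        continue;               /* Skip malformed entries.  */
      if (mask & (1u << idx))
        {
          out[found++] = idx;
          if (highest)
            break;
        }
    }
  return found;
}
```

/* Because the list is walked in priority order, the HIGHEST case
   needs no sorting: the first category whose bit is set is the
   answer.  */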
4ed46869 5617
d46c5b12
KH
5618DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
5619 2, 3, 0,
5620 "Detect coding system of the text in the region between START and END.\n\
5621Return a list of possible coding systems ordered by priority.\n\
5622\n\
80e803b4
KH
5623If only ASCII characters are found, return a list whose single element is\n\
5624`undecided' or its subsidiary coding system according to a detected\n\
5625end-of-line format.\n\
d46c5b12
KH
5626\n\
5627If optional argument HIGHEST is non-nil, return the coding system of\n\
5628highest priority.")
5629 (start, end, highest)
5630 Lisp_Object start, end, highest;
5631{
5632 int from, to;
5633 int from_byte, to_byte;
6289dd10 5634
d46c5b12
KH
5635 CHECK_NUMBER_COERCE_MARKER (start, 0);
5636 CHECK_NUMBER_COERCE_MARKER (end, 1);
4ed46869 5637
d46c5b12
KH
5638 validate_region (&start, &end);
5639 from = XINT (start), to = XINT (end);
5640 from_byte = CHAR_TO_BYTE (from);
5641 to_byte = CHAR_TO_BYTE (to);
6289dd10 5642
d46c5b12
KH
5643 if (from < GPT && to >= GPT)
5644 move_gap_both (to, to_byte);
4ed46869 5645
d46c5b12
KH
5646 return detect_coding_system (BYTE_POS_ADDR (from_byte),
5647 to_byte - from_byte,
5648 !NILP (highest));
5649}
6289dd10 5650
d46c5b12
KH
5651DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
5652 1, 2, 0,
5653 "Detect coding system of the text in STRING.\n\
5654Return a list of possible coding systems ordered by priority.\n\
5655\n\
80e803b4
KH
5656If only ASCII characters are found, return a list whose single element is\n\
5657`undecided' or its subsidiary coding system according to a detected\n\
5658end-of-line format.\n\
d46c5b12
KH
5659\n\
5660If optional argument HIGHEST is non-nil, return the coding system of\n\
5661highest priority.")
5662 (string, highest)
5663 Lisp_Object string, highest;
5664{
5665 CHECK_STRING (string, 0);
4ed46869 5666
d46c5b12 5667 return detect_coding_system (XSTRING (string)->data,
fc932ac6 5668 STRING_BYTES (XSTRING (string)),
d46c5b12 5669 !NILP (highest));
4ed46869
KH
5670}
5671
05e6f5dc
KH
5672/* Return an intersection of lists L1 and L2. */
5673
5674static Lisp_Object
5675intersection (l1, l2)
5676 Lisp_Object l1, l2;
5677{
5678 Lisp_Object val;
5679
5680 for (val = Qnil; CONSP (l1); l1 = XCDR (l1))
5681 {
5682 if (!NILP (Fmemq (XCAR (l1), l2)))
5683 val = Fcons (XCAR (l1), val);
5684 }
5685 return val;
5686}
5687
5688
5689/* Subroutine for Ffind_coding_systems_region_internal.
5690
5691 Return a list of coding systems that safely encode the multibyte
5692 text between P and PEND. SAFE_CODINGS, if non-nil, is a list of
5693 possible coding systems. If it is nil, it means that we have not
5694 yet found any coding systems.
5695
5696 WORK_TABLE is a copy of the char-table Vchar_coding_system_table. An
5697 element of WORK_TABLE is set to t once the element is looked up.
5698
5699 If a non-ASCII single byte char is found, set
5700 *single_byte_char_found to 1. */
5701
5702static Lisp_Object
5703find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
5704 unsigned char *p, *pend;
5705 Lisp_Object safe_codings, work_table;
5706 int *single_byte_char_found;
5707{
5708 int c, len, idx;
5709 Lisp_Object val;
5710
5711 while (p < pend)
5712 {
5713 c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
5714 p += len;
5715 if (ASCII_BYTE_P (c))
5716 /* We can ignore ASCII characters here. */
5717 continue;
5718 if (SINGLE_BYTE_CHAR_P (c))
5719 *single_byte_char_found = 1;
5720 if (NILP (safe_codings))
5721 continue;
5722 /* Check the safe coding systems for C. */
5723 val = char_table_ref_and_index (work_table, c, &idx);
5724 if (EQ (val, Qt))
5725 /* This element was already checked. Ignore it. */
5726 continue;
5727 /* Remember that we checked this element. */
975f250a 5728 CHAR_TABLE_SET (work_table, make_number (idx), Qt);
05e6f5dc
KH
5729
5730 /* If there are some safe coding systems for C and we have
5731 already found a set of coding systems for the preceding
5732 characters, take the intersection of the two sets. */
5733 if (!EQ (safe_codings, Qt) && !NILP (val))
5734 val = intersection (safe_codings, val);
5735 safe_codings = val;
5736 }
5737 return safe_codings;
5738}
5739
5740
5741/* Return a list of coding systems that safely encode the text between
5742 START and END. If the text contains only ASCII or is unibyte,
5743 return t. */
5744
5745DEFUN ("find-coding-systems-region-internal",
5746 Ffind_coding_systems_region_internal,
5747 Sfind_coding_systems_region_internal, 2, 2, 0,
5748 "Internal use only.")
5749 (start, end)
5750 Lisp_Object start, end;
5751{
5752 Lisp_Object work_table, safe_codings;
5753 int non_ascii_p = 0;
5754 int single_byte_char_found = 0;
5755 unsigned char *p1, *p1end, *p2, *p2end, *p;
5756 Lisp_Object args[2];
5757
5758 if (STRINGP (start))
5759 {
5760 if (!STRING_MULTIBYTE (start))
5761 return Qt;
5762 p1 = XSTRING (start)->data, p1end = p1 + STRING_BYTES (XSTRING (start));
5763 p2 = p2end = p1end;
5764 if (XSTRING (start)->size != STRING_BYTES (XSTRING (start)))
5765 non_ascii_p = 1;
5766 }
5767 else
5768 {
5769 int from, to, stop;
5770
5771 CHECK_NUMBER_COERCE_MARKER (start, 0);
5772 CHECK_NUMBER_COERCE_MARKER (end, 1);
5773 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
5774 args_out_of_range (start, end);
5775 if (NILP (current_buffer->enable_multibyte_characters))
5776 return Qt;
5777 from = CHAR_TO_BYTE (XINT (start));
5778 to = CHAR_TO_BYTE (XINT (end));
5779 stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
5780 p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
5781 if (stop == to)
5782 p2 = p2end = p1end;
5783 else
5784 p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
5785 if (XINT (end) - XINT (start) != to - from)
5786 non_ascii_p = 1;
5787 }
5788
5789 if (!non_ascii_p)
5790 {
5791 /* We are sure that the text contains no multibyte character.
5792 Check if it contains eight-bit-graphic. */
5793 p = p1;
5794 for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
5795 if (p == p1end)
5796 {
5797 for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
5798 if (p == p2end)
5799 return Qt;
5800 }
5801 }
5802
5803 /* The text contains non-ASCII characters. */
5804 work_table = Fcopy_sequence (Vchar_coding_system_table);
5805 safe_codings = find_safe_codings (p1, p1end, Qt, work_table,
5806 &single_byte_char_found);
5807 if (p2 < p2end)
5808 safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
5809 &single_byte_char_found);
5810
5811 if (!single_byte_char_found)
5812 {
5813 /* Append generic coding systems. */
5814 Lisp_Object args[2];
5815 args[0] = safe_codings;
5816 args[1] = Fchar_table_extra_slot (Vchar_coding_system_table,
5817 make_number (0));
975f250a 5818 safe_codings = Fappend (2, args);
05e6f5dc
KH
5819 }
5820 else
5821 safe_codings = Fcons (Qraw_text, Fcons (Qemacs_mule, safe_codings));
5822 return safe_codings;
5823}
5824
5825
4031e2bf
KH
5826Lisp_Object
5827code_convert_region1 (start, end, coding_system, encodep)
d46c5b12 5828 Lisp_Object start, end, coding_system;
4031e2bf 5829 int encodep;
3a73fa5d
RS
5830{
5831 struct coding_system coding;
4031e2bf 5832 int from, to, len;
3a73fa5d 5833
d46c5b12
KH
5834 CHECK_NUMBER_COERCE_MARKER (start, 0);
5835 CHECK_NUMBER_COERCE_MARKER (end, 1);
3a73fa5d
RS
5836 CHECK_SYMBOL (coding_system, 2);
5837
d46c5b12
KH
5838 validate_region (&start, &end);
5839 from = XFASTINT (start);
5840 to = XFASTINT (end);
5841
3a73fa5d 5842 if (NILP (coding_system))
d46c5b12
KH
5843 return make_number (to - from);
5844
3a73fa5d 5845 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d46c5b12 5846 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3a73fa5d 5847
d46c5b12 5848 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5849 coding.src_multibyte = coding.dst_multibyte
5850 = !NILP (current_buffer->enable_multibyte_characters);
fb88bf2d
KH
5851 code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
5852 &coding, encodep, 1);
f072a3e8 5853 Vlast_coding_system_used = coding.symbol;
fb88bf2d 5854 return make_number (coding.produced_char);
4031e2bf
KH
5855}
5856
5857DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
5858 3, 3, "r\nzCoding system: ",
5859 "Decode the current region by specified coding system.\n\
5860When called from a program, takes three arguments:\n\
5861START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5862This function sets `last-coding-system-used' to the precise coding system\n\
5863used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5864not fully specified.)\n\
5865It returns the length of the decoded text.")
4031e2bf
KH
5866 (start, end, coding_system)
5867 Lisp_Object start, end, coding_system;
5868{
5869 return code_convert_region1 (start, end, coding_system, 0);
3a73fa5d
RS
5870}
5871
5872DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
5873 3, 3, "r\nzCoding system: ",
d46c5b12 5874 "Encode the current region by specified coding system.\n\
3a73fa5d 5875When called from a program, takes three arguments:\n\
d46c5b12 5876START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
f072a3e8
RS
5877This function sets `last-coding-system-used' to the precise coding system\n\
5878used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5879not fully specified.)\n\
5880It returns the length of the encoded text.")
d46c5b12
KH
5881 (start, end, coding_system)
5882 Lisp_Object start, end, coding_system;
3a73fa5d 5883{
4031e2bf
KH
5884 return code_convert_region1 (start, end, coding_system, 1);
5885}
3a73fa5d 5886
4031e2bf
KH
5887Lisp_Object
5888code_convert_string1 (string, coding_system, nocopy, encodep)
5889 Lisp_Object string, coding_system, nocopy;
5890 int encodep;
5891{
5892 struct coding_system coding;
3a73fa5d 5893
4031e2bf
KH
5894 CHECK_STRING (string, 0);
5895 CHECK_SYMBOL (coding_system, 1);
4ed46869 5896
d46c5b12 5897 if (NILP (coding_system))
4031e2bf 5898 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4ed46869 5899
d46c5b12
KH
5900 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
5901 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
5f1cd180 5902
d46c5b12 5903 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5904 string = (encodep
5905 ? encode_coding_string (string, &coding, !NILP (nocopy))
5906 : decode_coding_string (string, &coding, !NILP (nocopy)));
f072a3e8 5907 Vlast_coding_system_used = coding.symbol;
ec6d2bb8
KH
5908
5909 return string;
4ed46869
KH
5910}
5911
4ed46869 5912DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
e0e989f6
KH
5913 2, 3, 0,
5914 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\
fe487a71 5915Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
5916if the decoding operation is trivial.\n\
5917This function sets `last-coding-system-used' to the precise coding system\n\
5918used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5919not fully specified.)")
e0e989f6
KH
5920 (string, coding_system, nocopy)
5921 Lisp_Object string, coding_system, nocopy;
4ed46869 5922{
f072a3e8 5923 return code_convert_string1 (string, coding_system, nocopy, 0);
4ed46869
KH
5924}
5925
5926DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
e0e989f6
KH
5927 2, 3, 0,
5928 "Encode STRING to CODING-SYSTEM, and return the result.\n\
fe487a71 5929Optional arg NOCOPY non-nil means it is ok to return STRING itself\n\
f072a3e8
RS
5930if the encoding operation is trivial.\n\
5931This function sets `last-coding-system-used' to the precise coding system\n\
5932used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
5933not fully specified.)")
e0e989f6
KH
5934 (string, coding_system, nocopy)
5935 Lisp_Object string, coding_system, nocopy;
4ed46869 5936{
f072a3e8 5937 return code_convert_string1 (string, coding_system, nocopy, 1);
4ed46869 5938}
4031e2bf 5939
ecec61c1 5940/* Encode or decode STRING according to CODING_SYSTEM.
ec6d2bb8
KH
5941 Do not set Vlast_coding_system_used.
5942
5943 This function is called only from macros DECODE_FILE and
5944 ENCODE_FILE, thus we ignore character composition. */
ecec61c1
KH
5945
5946Lisp_Object
5947code_convert_string_norecord (string, coding_system, encodep)
5948 Lisp_Object string, coding_system;
5949 int encodep;
5950{
5951 struct coding_system coding;
5952
5953 CHECK_STRING (string, 0);
5954 CHECK_SYMBOL (coding_system, 1);
5955
5956 if (NILP (coding_system))
5957 return string;
5958
5959 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
5960 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
5961
ec6d2bb8 5962 coding.composing = COMPOSITION_DISABLED;
ecec61c1 5963 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
5964 return (encodep
5965 ? encode_coding_string (string, &coding, 1)
5966 : decode_coding_string (string, &coding, 1));
ecec61c1 5967}
3a73fa5d 5968\f
4ed46869 5969DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
55ab7be3 5970 "Decode a Japanese character which has CODE in shift_jis encoding.\n\
4ed46869
KH
5971Return the corresponding character.")
5972 (code)
5973 Lisp_Object code;
5974{
5975 unsigned char c1, c2, s1, s2;
5976 Lisp_Object val;
5977
5978 CHECK_NUMBER (code, 0);
5979 s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
55ab7be3
KH
5980 if (s1 == 0)
5981 {
c28a9453
KH
5982 if (s2 < 0x80)
5983 XSETFASTINT (val, s2);
5984 else if (s2 >= 0xA0 && s2 <= 0xDF)
b73bfc1c 5985 XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
c28a9453 5986 else
9da8350f 5987 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3
KH
5988 }
5989 else
5990 {
5991 if ((s1 < 0x80 || s1 > 0x9F && s1 < 0xE0 || s1 > 0xEF)
5992 || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
9da8350f 5993 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3 5994 DECODE_SJIS (s1, s2, c1, c2);
b73bfc1c 5995 XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
55ab7be3 5996 }
4ed46869
KH
5997 return val;
5998}
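/* Illustration only, not part of coding.c.  The two-byte branch above
   relies on the DECODE_SJIS macro, whose definition is not shown
   here.  The sketch below uses the conventional Shift-JIS to
   JIS X 0208 arithmetic and should be read as an assumption about
   what that macro computes.  The function name sjis_to_jis is
   hypothetical, and the lead-byte check (0x81..0x9F, 0xE0..0xEF) is
   slightly stricter than the one above, which also lets 0x80
   through.  */

```c
/* Convert the two-byte Shift-JIS code (S1, S2) to the two 7-bit
   JIS X 0208 bytes *J1 and *J2.  Return 0 on success, or -1 if
   either byte is out of range.  */
static int
sjis_to_jis (unsigned s1, unsigned s2, unsigned *j1, unsigned *j2)
{
  if (s1 < 0x81 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF
      || s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
    return -1;

  if (s2 >= 0x9F)
    {
      /* Second half of the lead byte: even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* First half of the lead byte: odd JIS row.  The trail byte
         skips 0x7F, hence the two offsets.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
  return 0;
}
```

/* For example, Shift-JIS 0x82A0 (HIRAGANA LETTER A) maps to JIS
   0x2422, and 0x8140 (IDEOGRAPHIC SPACE) maps to JIS 0x2121.  */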
5999
6000DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
55ab7be3
KH
6001 "Encode a Japanese character CHAR to shift_jis encoding.\n\
6002Return the corresponding code in SJIS.")
4ed46869
KH
6003 (ch)
6004 Lisp_Object ch;
6005{
bcf26d6a 6006 int charset, c1, c2, s1, s2;
4ed46869
KH
6007 Lisp_Object val;
6008
6009 CHECK_NUMBER (ch, 0);
6010 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6011 if (charset == CHARSET_ASCII)
6012 {
6013 val = ch;
6014 }
6015 else if (charset == charset_jisx0208
6016 && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
4ed46869
KH
6017 {
6018 ENCODE_SJIS (c1, c2, s1, s2);
bcf26d6a 6019 XSETFASTINT (val, (s1 << 8) | s2);
4ed46869 6020 }
55ab7be3
KH
6021 else if (charset == charset_katakana_jisx0201
6022 && c1 > 0x20 && c2 < 0xE0)
6023 {
6024 XSETFASTINT (val, c1 | 0x80);
6025 }
4ed46869 6026 else
55ab7be3 6027 error ("Can't encode to shift_jis: %d", XFASTINT (ch));
4ed46869
KH
6028 return val;
6029}
6030
6031DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
c28a9453 6032 "Decode a Big5 character which has CODE in BIG5 coding system.\n\
4ed46869
KH
6033Return the corresponding character.")
6034 (code)
6035 Lisp_Object code;
6036{
6037 int charset;
6038 unsigned char b1, b2, c1, c2;
6039 Lisp_Object val;
6040
6041 CHECK_NUMBER (code, 0);
6042 b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
c28a9453
KH
6043 if (b1 == 0)
6044 {
6045 if (b2 >= 0x80)
9da8350f 6046 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453
KH
6047 val = code;
6048 }
6049 else
6050 {
6051 if ((b1 < 0xA1 || b1 > 0xFE)
6052 || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
9da8350f 6053 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453 6054 DECODE_BIG5 (b1, b2, charset, c1, c2);
b73bfc1c 6055 XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
c28a9453 6056 }
4ed46869
KH
6057 return val;
6058}
6059
6060DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
d46c5b12 6061 "Encode the Big5 character CHAR to BIG5 coding system.\n\
4ed46869
KH
6062Return the corresponding character code in Big5.")
6063 (ch)
6064 Lisp_Object ch;
6065{
bcf26d6a 6066 int charset, c1, c2, b1, b2;
4ed46869
KH
6067 Lisp_Object val;
6068
6069 CHECK_NUMBER (ch, 0);
6070 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6071 if (charset == CHARSET_ASCII)
6072 {
6073 val = ch;
6074 }
6075 else if ((charset == charset_big5_1
6076 && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
6077 || (charset == charset_big5_2
6078 && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
4ed46869
KH
6079 {
6080 ENCODE_BIG5 (charset, c1, c2, b1, b2);
bcf26d6a 6081 XSETFASTINT (val, (b1 << 8) | b2);
4ed46869
KH
6082 }
6083 else
c28a9453 6084 error ("Can't encode to Big5: %d", XFASTINT (ch));
4ed46869
KH
6085 return val;
6086}
3a73fa5d 6087\f
1ba9e4ab
KH
6088DEFUN ("set-terminal-coding-system-internal",
6089 Fset_terminal_coding_system_internal,
6090 Sset_terminal_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
6091 (coding_system)
6092 Lisp_Object coding_system;
6093{
6094 CHECK_SYMBOL (coding_system, 0);
6095 setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
70c22245 6096 /* We had better not send unsafe characters to the terminal. */
6e85d753 6097 terminal_coding.flags |= CODING_FLAG_ISO_SAFE;
ec6d2bb8
KH
6098 /* Character composition should be disabled. */
6099 terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
6100 terminal_coding.src_multibyte = 1;
6101 terminal_coding.dst_multibyte = 0;
4ed46869
KH
6102 return Qnil;
6103}
6104
c4825358
KH
6105DEFUN ("set-safe-terminal-coding-system-internal",
6106 Fset_safe_terminal_coding_system_internal,
6107 Sset_safe_terminal_coding_system_internal, 1, 1, 0, "")
6108 (coding_system)
6109 Lisp_Object coding_system;
6110{
6111 CHECK_SYMBOL (coding_system, 0);
6112 setup_coding_system (Fcheck_coding_system (coding_system),
6113 &safe_terminal_coding);
ec6d2bb8
KH
6114 /* Character composition should be disabled. */
6115 safe_terminal_coding.composing = COMPOSITION_DISABLED;
b73bfc1c
KH
6116 safe_terminal_coding.src_multibyte = 1;
6117 safe_terminal_coding.dst_multibyte = 0;
c4825358
KH
6118 return Qnil;
6119}
6120
4ed46869
KH
6121DEFUN ("terminal-coding-system",
6122 Fterminal_coding_system, Sterminal_coding_system, 0, 0, 0,
3a73fa5d 6123 "Return coding system specified for terminal output.")
4ed46869
KH
6124 ()
6125{
6126 return terminal_coding.symbol;
6127}
6128
1ba9e4ab
KH
6129DEFUN ("set-keyboard-coding-system-internal",
6130 Fset_keyboard_coding_system_internal,
6131 Sset_keyboard_coding_system_internal, 1, 1, 0, "")
4ed46869
KH
6132 (coding_system)
6133 Lisp_Object coding_system;
6134{
6135 CHECK_SYMBOL (coding_system, 0);
6136 setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
ec6d2bb8
KH
6137 /* Character composition should be disabled. */
6138 keyboard_coding.composing = COMPOSITION_DISABLED;
4ed46869
KH
6139 return Qnil;
6140}
6141
6142DEFUN ("keyboard-coding-system",
6143 Fkeyboard_coding_system, Skeyboard_coding_system, 0, 0, 0,
3a73fa5d 6144 "Return coding system specified for decoding keyboard input.")
4ed46869
KH
6145 ()
6146{
6147 return keyboard_coding.symbol;
6148}
6149
6150\f
a5d301df
KH
6151DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
6152 Sfind_operation_coding_system, 1, MANY, 0,
6153 "Choose a coding system for an operation based on the target name.\n\
69f76525 6154The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).\n\
9ce27fde
KH
6155DECODING-SYSTEM is the coding system to use for decoding\n\
6156\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system\n\
6157for encoding (in case OPERATION does encoding).\n\
ccdb79f5
RS
6158\n\
6159The first argument OPERATION specifies an I/O primitive:\n\
6160 For file I/O, `insert-file-contents' or `write-region'.\n\
6161 For process I/O, `call-process', `call-process-region', or `start-process'.\n\
6162 For network I/O, `open-network-stream'.\n\
6163\n\
6164The remaining arguments should be the same arguments that were passed\n\
6165to the primitive. Depending on which primitive, one of those arguments\n\
6166is selected as the TARGET. For example, if OPERATION does file I/O,\n\
6167whichever argument specifies the file name is TARGET.\n\
6168\n\
6169TARGET has a meaning which depends on OPERATION:\n\
4ed46869
KH
6170 For file I/O, TARGET is a file name.\n\
6171 For process I/O, TARGET is a process name.\n\
6172 For network I/O, TARGET is a service name or a port number.\n\
6173\n\
02ba4723
KH
6174This function looks up what is specified for TARGET in\n\
6175`file-coding-system-alist', `process-coding-system-alist',\n\
6176or `network-coding-system-alist' depending on OPERATION.\n\
6177They may specify a coding system, a cons of coding systems,\n\
6178or a function symbol to call.\n\
6179In the last case, we call the function with one argument,\n\
9ce27fde 6180which is a list of all the arguments given to this function.")
4ed46869
KH
6181 (nargs, args)
6182 int nargs;
6183 Lisp_Object *args;
6184{
6185 Lisp_Object operation, target_idx, target, val;
6186 register Lisp_Object chain;
6187
6188 if (nargs < 2)
6189 error ("Too few arguments");
6190 operation = args[0];
6191 if (!SYMBOLP (operation)
6192 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
6193 error ("Invalid first arguement");
6194 if (nargs < 1 + XINT (target_idx))
6195 error ("Too few arguments for operation: %s",
6196 XSYMBOL (operation)->name->data);
6197 target = args[XINT (target_idx) + 1];
6198 if (!(STRINGP (target)
6199 || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
6200 error ("Invalid %dth argument", XINT (target_idx) + 1);
6201
2e34157c
RS
6202 chain = ((EQ (operation, Qinsert_file_contents)
6203 || EQ (operation, Qwrite_region))
02ba4723 6204 ? Vfile_coding_system_alist
2e34157c 6205 : (EQ (operation, Qopen_network_stream)
02ba4723
KH
6206 ? Vnetwork_coding_system_alist
6207 : Vprocess_coding_system_alist));
4ed46869
KH
6208 if (NILP (chain))
6209 return Qnil;
6210
03699b14 6211 for (; CONSP (chain); chain = XCDR (chain))
4ed46869 6212 {
f44d27ce 6213 Lisp_Object elt;
03699b14 6214 elt = XCAR (chain);
4ed46869
KH
6215
6216 if (CONSP (elt)
6217 && ((STRINGP (target)
03699b14
KR
6218 && STRINGP (XCAR (elt))
6219 && fast_string_match (XCAR (elt), target) >= 0)
6220 || (INTEGERP (target) && EQ (target, XCAR (elt)))))
02ba4723 6221 {
03699b14 6222 val = XCDR (elt);
b19fd4c5
KH
6223 /* Here, if VAL is both a valid coding system and a valid
6224 function symbol, we return VAL as a coding system. */
02ba4723
KH
6225 if (CONSP (val))
6226 return val;
6227 if (! SYMBOLP (val))
6228 return Qnil;
6229 if (! NILP (Fcoding_system_p (val)))
6230 return Fcons (val, val);
b19fd4c5
KH
6231 if (! NILP (Ffboundp (val)))
6232 {
6233 val = call1 (val, Flist (nargs, args));
6234 if (CONSP (val))
6235 return val;
6236 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
6237 return Fcons (val, val);
6238 }
02ba4723
KH
6239 return Qnil;
6240 }
4ed46869
KH
6241 }
6242 return Qnil;
6243}
6244
1397dc18
KH
6245DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
6246 Supdate_coding_systems_internal, 0, 0, 0,
6247 "Update internal database for ISO2022 and CCL based coding systems.\n\
fa42c37f
KH
6248When values of any coding categories are changed, you must\n\
6249call this function.")
d46c5b12
KH
6250 ()
6251{
6252 int i;
6253
fa42c37f 6254 for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
d46c5b12 6255 {
1397dc18
KH
6256 Lisp_Object val;
6257
6258 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value;
6259 if (!NILP (val))
6260 {
6261 if (! coding_system_table[i])
6262 coding_system_table[i] = ((struct coding_system *)
6263 xmalloc (sizeof (struct coding_system)));
6264 setup_coding_system (val, coding_system_table[i]);
6265 }
6266 else if (coding_system_table[i])
6267 {
6268 xfree (coding_system_table[i]);
6269 coding_system_table[i] = NULL;
6270 }
d46c5b12 6271 }
1397dc18 6272
d46c5b12
KH
6273 return Qnil;
6274}
6275
66cfb530
KH
6276DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
6277 Sset_coding_priority_internal, 0, 0, 0,
6278 "Update internal database for the current value of `coding-category-list'.\n\
6279This function is for internal use only.")
6280 ()
6281{
6282 int i = 0, idx;
84d60297
RS
6283 Lisp_Object val;
6284
6285 val = Vcoding_category_list;
66cfb530
KH
6286
6287 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
6288 {
03699b14 6289 if (! SYMBOLP (XCAR (val)))
66cfb530 6290 break;
03699b14 6291 idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
66cfb530
KH
6292 if (idx >= CODING_CATEGORY_IDX_MAX)
6293 break;
6294 coding_priorities[i++] = (1 << idx);
03699b14 6295 val = XCDR (val);
66cfb530
KH
6296 }
6297 /* If coding-category-list is valid and contains all coding
6298 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
fa42c37f 6299 the following code saves Emacs from crashing. */
66cfb530
KH
6300 while (i < CODING_CATEGORY_IDX_MAX)
6301 coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
6302
6303 return Qnil;
6304}
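
The function above turns an ordered list of category indices into an array of bit masks, then pads any unused slots with the raw-text mask so later code never reads an unset priority. The sketch below models that in isolation. It is an illustration under stated assumptions: `SKETCH_IDX_MAX` and `SKETCH_MASK_RAW_TEXT` are hypothetical stand-ins for `CODING_CATEGORY_IDX_MAX` and `CODING_CATEGORY_MASK_RAW_TEXT`, the input is a plain `int` array rather than a Lisp list, and the check for negative indices is an addition not present in the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real constants are defined elsewhere
   in the Emacs sources and have different values.  */
#define SKETCH_IDX_MAX 8
#define SKETCH_MASK_RAW_TEXT (1 << (SKETCH_IDX_MAX - 1))

/* Fill PRIORITIES, which must have SKETCH_IDX_MAX slots, from the
   category indices IDX[0..N-1].  Stop at the first out-of-range
   index, then pad the remaining slots with the raw-text mask.
   Return the number of slots that were taken from IDX.  */
size_t
sketch_set_priorities (const int *idx, size_t n, int *priorities)
{
  size_t i = 0, used;

  assert (priorities != NULL);
  while (i < n && i < SKETCH_IDX_MAX)
    {
      if (idx[i] < 0 || idx[i] >= SKETCH_IDX_MAX)
        break;
      priorities[i] = 1 << idx[i];
      i++;
    }
  used = i;

  /* A short or invalid list leaves slots unset; pad them.  */
  while (i < SKETCH_IDX_MAX)
    priorities[i++] = SKETCH_MASK_RAW_TEXT;

  return used;
}
```

Padding instead of signaling an error keeps the array fully initialized even when the list is malformed, which is the safeguard the comment in the original function describes.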
6305
4ed46869
KH
6306#endif /* emacs */
6307
6308\f
1397dc18 6309/*** 9. Post-amble ***/
4ed46869 6310
dfcf069d 6311void
4ed46869
KH
6312init_coding_once ()
6313{
6314 int i;
6315
0ef69138 6316 /* Initialization specific to Emacs' internal format. */
4ed46869
KH
6317 for (i = 0; i <= 0x20; i++)
6318 emacs_code_class[i] = EMACS_control_code;
6319 emacs_code_class[0x0A] = EMACS_linefeed_code;
6320 emacs_code_class[0x0D] = EMACS_carriage_return_code;
6321 for (i = 0x21 ; i < 0x7F; i++)
6322 emacs_code_class[i] = EMACS_ascii_code;
6323 emacs_code_class[0x7F] = EMACS_control_code;
ec6d2bb8 6324 for (i = 0x80; i < 0xFF; i++)
4ed46869
KH
6325 emacs_code_class[i] = EMACS_invalid_code;
6326 emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
6327 emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
6328 emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
6329 emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;
6330
6331 /* Initialization specific to ISO2022. */
6332 for (i = 0; i < 0x20; i++)
b73bfc1c 6333 iso_code_class[i] = ISO_control_0;
4ed46869
KH
6334 for (i = 0x21; i < 0x7F; i++)
6335 iso_code_class[i] = ISO_graphic_plane_0;
6336 for (i = 0x80; i < 0xA0; i++)
b73bfc1c 6337 iso_code_class[i] = ISO_control_1;
4ed46869
KH
6338 for (i = 0xA1; i < 0xFF; i++)
6339 iso_code_class[i] = ISO_graphic_plane_1;
6340 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
6341 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
6342 iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
6343 iso_code_class[ISO_CODE_SO] = ISO_shift_out;
6344 iso_code_class[ISO_CODE_SI] = ISO_shift_in;
6345 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
6346 iso_code_class[ISO_CODE_ESC] = ISO_escape;
6347 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
6348 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
6349 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
6350
e0e989f6
KH
6351 setup_coding_system (Qnil, &keyboard_coding);
6352 setup_coding_system (Qnil, &terminal_coding);
c4825358 6353 setup_coding_system (Qnil, &safe_terminal_coding);
6bc51348 6354 setup_coding_system (Qnil, &default_buffer_file_coding);
9ce27fde 6355
d46c5b12
KH
6356 bzero (coding_system_table, sizeof coding_system_table);
6357
66cfb530
KH
6358 bzero (ascii_skip_code, sizeof ascii_skip_code);
6359 for (i = 0; i < 128; i++)
6360 ascii_skip_code[i] = 1;
6361
9ce27fde
KH
6362#if defined (MSDOS) || defined (WINDOWSNT)
6363 system_eol_type = CODING_EOL_CRLF;
6364#else
6365 system_eol_type = CODING_EOL_LF;
6366#endif
b843d1ae
KH
6367
6368 inhibit_pre_post_conversion = 0;
e0e989f6
KH
6369}
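
The ISO2022 part of `init_coding_once` builds a 256-entry table so that classifying a byte is a single array lookup. The sketch below reproduces only the range assignments (C0, GL, C1, GR and the two boundary classes). It deliberately leaves out the per-code overrides for CR, SO, SI, ESC, SS2, SS3 and CSI, so in this sketch ESC is still classified as a plain C0 control. The `sketch_` and `SK_` names are hypothetical and are not the identifiers used in Emacs.

```c
#include <assert.h>

/* Hypothetical class names mirroring the ranges set up above.  */
enum sketch_iso_class
{
  SK_CONTROL_0,       /* 0x00..0x1F */
  SK_GRAPHIC_0,       /* 0x21..0x7E */
  SK_0X20_OR_0X7F,    /* 0x20 and 0x7F */
  SK_CONTROL_1,       /* 0x80..0x9F */
  SK_GRAPHIC_1,       /* 0xA1..0xFE */
  SK_0XA0_OR_0XFF     /* 0xA0 and 0xFF */
};

static unsigned char sketch_class[256];
static int sketch_ready;

/* Fill the lookup table once, range by range.  */
void
sketch_init_iso_classes (void)
{
  int i;

  for (i = 0; i < 0x20; i++)
    sketch_class[i] = SK_CONTROL_0;
  for (i = 0x21; i < 0x7F; i++)
    sketch_class[i] = SK_GRAPHIC_0;
  for (i = 0x80; i < 0xA0; i++)
    sketch_class[i] = SK_CONTROL_1;
  for (i = 0xA1; i < 0xFF; i++)
    sketch_class[i] = SK_GRAPHIC_1;
  sketch_class[0x20] = sketch_class[0x7F] = SK_0X20_OR_0X7F;
  sketch_class[0xA0] = sketch_class[0xFF] = SK_0XA0_OR_0XFF;
  sketch_ready = 1;
}

/* Classify one byte; the table must have been initialized.  */
enum sketch_iso_class
sketch_classify (unsigned char c)
{
  assert (sketch_ready);
  return (enum sketch_iso_class) sketch_class[c];
}
```

Assigning whole ranges first and then overwriting the exceptional codes is the same ordering the original routine relies on.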
6370
6371#ifdef emacs
6372
dfcf069d 6373void
e0e989f6
KH
6374syms_of_coding ()
6375{
6376 Qtarget_idx = intern ("target-idx");
6377 staticpro (&Qtarget_idx);
6378
bb0115a2
RS
6379 Qcoding_system_history = intern ("coding-system-history");
6380 staticpro (&Qcoding_system_history);
6381 Fset (Qcoding_system_history, Qnil);
6382
9ce27fde 6383 /* Target FILENAME is the first argument. */
e0e989f6 6384 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
9ce27fde 6385 /* Target FILENAME is the third argument. */
e0e989f6
KH
6386 Fput (Qwrite_region, Qtarget_idx, make_number (2));
6387
6388 Qcall_process = intern ("call-process");
6389 staticpro (&Qcall_process);
9ce27fde 6390 /* Target PROGRAM is the first argument. */
e0e989f6
KH
6391 Fput (Qcall_process, Qtarget_idx, make_number (0));
6392
6393 Qcall_process_region = intern ("call-process-region");
6394 staticpro (&Qcall_process_region);
9ce27fde 6395 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6396 Fput (Qcall_process_region, Qtarget_idx, make_number (2));
6397
6398 Qstart_process = intern ("start-process");
6399 staticpro (&Qstart_process);
9ce27fde 6400 /* Target PROGRAM is the third argument. */
e0e989f6
KH
6401 Fput (Qstart_process, Qtarget_idx, make_number (2));
6402
6403 Qopen_network_stream = intern ("open-network-stream");
6404 staticpro (&Qopen_network_stream);
9ce27fde 6405 /* Target SERVICE is the fourth argument. */
e0e989f6
KH
6406 Fput (Qopen_network_stream, Qtarget_idx, make_number (3));
6407
4ed46869
KH
6408 Qcoding_system = intern ("coding-system");
6409 staticpro (&Qcoding_system);
6410
6411 Qeol_type = intern ("eol-type");
6412 staticpro (&Qeol_type);
6413
6414 Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
6415 staticpro (&Qbuffer_file_coding_system);
6416
6417 Qpost_read_conversion = intern ("post-read-conversion");
6418 staticpro (&Qpost_read_conversion);
6419
6420 Qpre_write_conversion = intern ("pre-write-conversion");
6421 staticpro (&Qpre_write_conversion);
6422
27901516
KH
6423 Qno_conversion = intern ("no-conversion");
6424 staticpro (&Qno_conversion);
6425
6426 Qundecided = intern ("undecided");
6427 staticpro (&Qundecided);
6428
4ed46869
KH
6429 Qcoding_system_p = intern ("coding-system-p");
6430 staticpro (&Qcoding_system_p);
6431
6432 Qcoding_system_error = intern ("coding-system-error");
6433 staticpro (&Qcoding_system_error);
6434
6435 Fput (Qcoding_system_error, Qerror_conditions,
6436 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
6437 Fput (Qcoding_system_error, Qerror_message,
9ce27fde 6438 build_string ("Invalid coding system"));
4ed46869 6439
d46c5b12
KH
6440 Qcoding_category = intern ("coding-category");
6441 staticpro (&Qcoding_category);
4ed46869
KH
6442 Qcoding_category_index = intern ("coding-category-index");
6443 staticpro (&Qcoding_category_index);
6444
d46c5b12
KH
6445 Vcoding_category_table
6446 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
6447 staticpro (&Vcoding_category_table);
4ed46869
KH
6448 {
6449 int i;
6450 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
6451 {
d46c5b12
KH
6452 XVECTOR (Vcoding_category_table)->contents[i]
6453 = intern (coding_category_name[i]);
6454 Fput (XVECTOR (Vcoding_category_table)->contents[i],
6455 Qcoding_category_index, make_number (i));
4ed46869
KH
6456 }
6457 }
6458
f967223b
KH
6459 Qtranslation_table = intern ("translation-table");
6460 staticpro (&Qtranslation_table);
1397dc18 6461 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (1));
bdd9fb48 6462
f967223b
KH
6463 Qtranslation_table_id = intern ("translation-table-id");
6464 staticpro (&Qtranslation_table_id);
84fbb8a0 6465
f967223b
KH
6466 Qtranslation_table_for_decode = intern ("translation-table-for-decode");
6467 staticpro (&Qtranslation_table_for_decode);
a5d301df 6468
f967223b
KH
6469 Qtranslation_table_for_encode = intern ("translation-table-for-encode");
6470 staticpro (&Qtranslation_table_for_encode);
a5d301df 6471
05e6f5dc
KH
6472 Qsafe_chars = intern ("safe-chars");
6473 staticpro (&Qsafe_chars);
6474
6475 Qchar_coding_system = intern ("char-coding-system");
6476 staticpro (&Qchar_coding_system);
6477
6478 /* Intern this now in case it isn't already done.
6479 Setting this variable twice is harmless.
6480 But don't staticpro it here--that is done in alloc.c. */
6481 Qchar_table_extra_slots = intern ("char-table-extra-slots");
6482 Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
6483 Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (1));
70c22245 6484
1397dc18
KH
6485 Qvalid_codes = intern ("valid-codes");
6486 staticpro (&Qvalid_codes);
6487
9ce27fde
KH
6488 Qemacs_mule = intern ("emacs-mule");
6489 staticpro (&Qemacs_mule);
6490
d46c5b12
KH
6491 Qraw_text = intern ("raw-text");
6492 staticpro (&Qraw_text);
6493
4ed46869
KH
6494 defsubr (&Scoding_system_p);
6495 defsubr (&Sread_coding_system);
6496 defsubr (&Sread_non_nil_coding_system);
6497 defsubr (&Scheck_coding_system);
6498 defsubr (&Sdetect_coding_region);
d46c5b12 6499 defsubr (&Sdetect_coding_string);
05e6f5dc 6500 defsubr (&Sfind_coding_systems_region_internal);
4ed46869
KH
6501 defsubr (&Sdecode_coding_region);
6502 defsubr (&Sencode_coding_region);
6503 defsubr (&Sdecode_coding_string);
6504 defsubr (&Sencode_coding_string);
6505 defsubr (&Sdecode_sjis_char);
6506 defsubr (&Sencode_sjis_char);
6507 defsubr (&Sdecode_big5_char);
6508 defsubr (&Sencode_big5_char);
1ba9e4ab 6509 defsubr (&Sset_terminal_coding_system_internal);
c4825358 6510 defsubr (&Sset_safe_terminal_coding_system_internal);
4ed46869 6511 defsubr (&Sterminal_coding_system);
1ba9e4ab 6512 defsubr (&Sset_keyboard_coding_system_internal);
4ed46869 6513 defsubr (&Skeyboard_coding_system);
a5d301df 6514 defsubr (&Sfind_operation_coding_system);
1397dc18 6515 defsubr (&Supdate_coding_systems_internal);
66cfb530 6516 defsubr (&Sset_coding_priority_internal);
4ed46869 6517
4608c386
KH
6518 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
6519 "List of coding systems.\n\
6520\n\
6521Do not alter the value of this variable manually. This variable should be\n\
6522updated by the functions `make-coding-system' and\n\
6523`define-coding-system-alias'.");
6524 Vcoding_system_list = Qnil;
6525
6526 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
6527 "Alist of coding system names.\n\
6528Each element is a one-element list holding a coding system name.\n\
6529This variable is given to `completing-read' as the TABLE argument.\n\
6530\n\
6531Do not alter the value of this variable manually. This variable should be\n\
6532updated by the functions `make-coding-system' and\n\
6533`define-coding-system-alias'.");
6534 Vcoding_system_alist = Qnil;
6535
4ed46869
KH
6536 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
6537 "List of coding-categories (symbols) ordered by priority.");
6538 {
6539 int i;
6540
6541 Vcoding_category_list = Qnil;
6542 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
6543 Vcoding_category_list
d46c5b12
KH
6544 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
6545 Vcoding_category_list);
4ed46869
KH
6546 }
6547
6548 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
10bff6f1 6549 "Specify the coding system for read operations.\n\
2ebb362d 6550It is useful to bind this variable with `let', but do not set it globally.\n\
4ed46869 6551If the value is a coding system, it is used for decoding on read operation.\n\
a67a9c66 6552If not, an appropriate element is used from one of the coding system alists:\n\
10bff6f1 6553There are three such tables, `file-coding-system-alist',\n\
a67a9c66 6554`process-coding-system-alist', and `network-coding-system-alist'.");
4ed46869
KH
6555 Vcoding_system_for_read = Qnil;
6556
6557 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
10bff6f1 6558 "Specify the coding system for write operations.\n\
928aedd8
RS
6559Programs bind this variable with `let', but you should not set it globally.\n\
6560If the value is a coding system, it is used for encoding of output,\n\
6561when writing it to a file and when sending it to a file or subprocess.\n\
6562\n\
6563If this does not specify a coding system, an appropriate element\n\
6564is used from one of the coding system alists:\n\
10bff6f1 6565There are three such tables, `file-coding-system-alist',\n\
928aedd8
RS
6566`process-coding-system-alist', and `network-coding-system-alist'.\n\
6567For output to files, if the above procedure does not specify a coding system,\n\
6568the value of `buffer-file-coding-system' is used.");
4ed46869
KH
6569 Vcoding_system_for_write = Qnil;
6570
6571 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
a67a9c66 6572 "Coding system used in the latest file or process I/O.");
4ed46869
KH
6573 Vlast_coding_system_used = Qnil;
6574
9ce27fde 6575 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
f07f4a24 6576 "*Non-nil means always inhibit code conversion of end-of-line format.\n\
94c7a214
DL
6577See info node `Coding Systems' and info node `Text and Binary' concerning\n\
6578such conversion.");
9ce27fde
KH
6579 inhibit_eol_conversion = 0;
6580
ed29121d
EZ
6581 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
6582 "Non-nil means process buffer inherits coding system of process output.\n\
6583Bind it to t if the process output is to be treated as if it were a file\n\
6584read from some filesystem.");
6585 inherit_process_coding_system = 0;
6586
02ba4723
KH
6587 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
6588 "Alist to decide a coding system to use for a file I/O operation.\n\
6589The format is ((PATTERN . VAL) ...),\n\
6590where PATTERN is a regular expression matching a file name,\n\
6591VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6592If VAL is a coding system, it is used for both decoding and encoding\n\
6593the file contents.\n\
6594If VAL is a cons of coding systems, the car part is used for decoding,\n\
6595and the cdr part is used for encoding.\n\
6596If VAL is a function symbol, the function must return a coding system\n\
6597or a cons of coding systems which are used as above.\n\
e0e989f6 6598\n\
a85a871a 6599See also the function `find-operation-coding-system'\n\
eda284ac 6600and the variable `auto-coding-alist'.");
02ba4723
KH
6601 Vfile_coding_system_alist = Qnil;
6602
6603 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
6604 "Alist to decide a coding system to use for a process I/O operation.\n\
6605The format is ((PATTERN . VAL) ...),\n\
6606where PATTERN is a regular expression matching a program name,\n\
6607VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6608If VAL is a coding system, it is used for both decoding what is received\n\
6609from the program and encoding what is sent to the program.\n\
6610If VAL is a cons of coding systems, the car part is used for decoding,\n\
6611and the cdr part is used for encoding.\n\
6612If VAL is a function symbol, the function must return a coding system\n\
6613or a cons of coding systems which are used as above.\n\
4ed46869 6614\n\
9ce27fde 6615See also the function `find-operation-coding-system'.");
02ba4723
KH
6616 Vprocess_coding_system_alist = Qnil;
6617
6618 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
6619 "Alist to decide a coding system to use for a network I/O operation.\n\
6620The format is ((PATTERN . VAL) ...),\n\
6621where PATTERN is a regular expression matching a network service name\n\
6622or is a port number to connect to,\n\
6623VAL is a coding system, a cons of coding systems, or a function symbol.\n\
6624If VAL is a coding system, it is used for both decoding what is received\n\
6625from the network stream and encoding what is sent to the network stream.\n\
6626If VAL is a cons of coding systems, the car part is used for decoding,\n\
6627and the cdr part is used for encoding.\n\
6628If VAL is a function symbol, the function must return a coding system\n\
6629or a cons of coding systems which are used as above.\n\
4ed46869 6630\n\
9ce27fde 6631See also the function `find-operation-coding-system'.");
02ba4723 6632 Vnetwork_coding_system_alist = Qnil;
4ed46869 6633
68c45bf0
PE
6634 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
6635 "Coding system to use with system messages.");
6636 Vlocale_coding_system = Qnil;
6637
005f0d35 6638 /* The eol mnemonics are reset in startup.el system-dependently. */
7722baf9
EZ
6639 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
6640 "*String displayed in mode line for UNIX-like (LF) end-of-line format.");
6641 eol_mnemonic_unix = build_string (":");
4ed46869 6642
7722baf9
EZ
6643 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
6644 "*String displayed in mode line for DOS-like (CRLF) end-of-line format.");
6645 eol_mnemonic_dos = build_string ("\\");
4ed46869 6646
7722baf9
EZ
6647 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
6648 "*String displayed in mode line for MAC-like (CR) end-of-line format.");
6649 eol_mnemonic_mac = build_string ("/");
4ed46869 6650
7722baf9
EZ
6651 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
6652 "*String displayed in mode line when end-of-line format is not yet determined.");
6653 eol_mnemonic_undecided = build_string (":");
4ed46869 6654
84fbb8a0 6655 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
f967223b 6656 "*Non-nil enables character translation while encoding and decoding.");
84fbb8a0 6657 Venable_character_translation = Qt;
bdd9fb48 6658
f967223b
KH
6659 DEFVAR_LISP ("standard-translation-table-for-decode",
6660 &Vstandard_translation_table_for_decode,
84fbb8a0 6661 "Table for translating characters while decoding.");
f967223b 6662 Vstandard_translation_table_for_decode = Qnil;
bdd9fb48 6663
f967223b
KH
6664 DEFVAR_LISP ("standard-translation-table-for-encode",
6665 &Vstandard_translation_table_for_encode,
84fbb8a0 6666 "Table for translating characters while encoding.");
f967223b 6667 Vstandard_translation_table_for_encode = Qnil;
4ed46869
KH
6668
6669 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
6670 "Alist of charsets vs revision numbers.\n\
6671While encoding, if a charset (car part of an element) is found,\n\
6672designate it with the escape sequence identifying the revision (cdr part of the element).");
6673 Vcharset_revision_alist = Qnil;
02ba4723
KH
6674
6675 DEFVAR_LISP ("default-process-coding-system",
6676 &Vdefault_process_coding_system,
6677 "Cons of coding systems used for process I/O by default.\n\
6678The car part is used for decoding a process output,\n\
6679the cdr part is used for encoding a text to be sent to a process.");
6680 Vdefault_process_coding_system = Qnil;
c4825358 6681
3f003981
KH
6682 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
6683 "Table of extra Latin codes in the range 128..159 (inclusive).\n\
c4825358
KH
6684This is a vector of length 256.\n\
6685If Nth element is non-nil, the existence of code N in a file\n\
bb0115a2 6686\(or output of subprocess) doesn't prevent it from being detected as\n\
3f003981
KH
6687a coding system of ISO 2022 variant which has a flag\n\
6688`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\
c4825358
KH
6689or reading output of a subprocess.\n\
6690Only the 128th through 159th elements have a meaning.");
3f003981 6691 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
d46c5b12
KH
6692
6693 DEFVAR_LISP ("select-safe-coding-system-function",
6694 &Vselect_safe_coding_system_function,
6695 "Function to call to select safe coding system for encoding a text.\n\
6696\n\
6697If set, this function is called to force a user to select a proper\n\
6698coding system which can encode the text in the case that a default\n\
6699coding system used in each operation can't encode the text.\n\
6700\n\
a85a871a 6701The default value is `select-safe-coding-system' (which see).");
d46c5b12
KH
6702 Vselect_safe_coding_system_function = Qnil;
6703
05e6f5dc
KH
6704 DEFVAR_LISP ("char-coding-system-table", &Vchar_coding_system_table,
6705 "Char-table containing safe coding systems for each character.\n\
6706Each element omits the generic coding systems that can\n\
6707encode any character. They are in the first extra slot.");
6708 Vchar_coding_system_table = Fmake_char_table (Qchar_coding_system, Qnil);
6709
22ab2303 6710 DEFVAR_BOOL ("inhibit-iso-escape-detection",
74383408
KH
6711 &inhibit_iso_escape_detection,
6712 "If non-nil, Emacs ignores ISO2022's escape sequence on code detection.\n\
6713\n\
6714By default, on reading a file, Emacs tries to detect how the text is\n\
6715encoded. This code detection is sensitive to escape sequences. If\n\
e215fa58
EZ
6716the sequence is valid as ISO2022, the code is determined as one of\n\
6717the ISO2022 encodings, and the file is decoded by the corresponding\n\
6718coding system (e.g. `iso-2022-7bit').\n\
74383408
KH
6719\n\
6720However, there may be cases where you want to read escape sequences in\n\
6721a file as is. In such a case, you can set this variable to non-nil.\n\
6722Then, as the code detection ignores any escape sequences, no file is\n\
e215fa58
EZ
6723detected as encoded in some ISO2022 encoding. The result is that all\n\
6724escape sequences become visible in a buffer.\n\
74383408
KH
6725\n\
6726The default value is nil, and it is strongly recommended not to change\n\
6727it. That is because many Emacs Lisp source files that contain\n\
6728non-ASCII characters are encoded by the coding system `iso-2022-7bit'\n\
6729in Emacs's distribution, and they won't be decoded correctly on\n\
e215fa58 6730reading if you suppress escape sequence detection.\n\
74383408
KH
6731\n\
6732The other way to read escape sequences in a file without decoding is\n\
e215fa58 6733to explicitly specify some coding system that doesn't use ISO2022's\n\
74383408
KH
6734escape sequence (e.g. `latin-1') on reading by \\[universal-coding-system-argument].");
6735 inhibit_iso_escape_detection = 0;
4ed46869
KH
6736}
6737
68c45bf0
PE
6738char *
6739emacs_strerror (error_number)
6740 int error_number;
6741{
6742 char *str;
6743
ca9c0567 6744 synchronize_system_messages_locale ();
68c45bf0
PE
6745 str = strerror (error_number);
6746
6747 if (! NILP (Vlocale_coding_system))
6748 {
6749 Lisp_Object dec = code_convert_string_norecord (build_string (str),
6750 Vlocale_coding_system,
6751 0);
6752 str = (char *) XSTRING (dec)->data;
6753 }
6754
6755 return str;
6756}
6757
4ed46869 6758#endif /* emacs */
c2f94ebc 6759