(vc-default-workfile-unchanged-p): Pass nil
[bpt/emacs.git] / src / coding.c
CommitLineData
4ed46869 1/* Coding system handler (conversion, detection, and etc).
ff955d90 2 Copyright (C) 1995, 1997, 1998, 2002 Electrotechnical Laboratory, JAPAN.
203cb916 3 Licensed to the Free Software Foundation.
ea9d458b 4 Copyright (C) 2001,2002 Free Software Foundation, Inc.
4ed46869 5
369314dc
KH
6This file is part of GNU Emacs.
7
8GNU Emacs is free software; you can redistribute it and/or modify
9it under the terms of the GNU General Public License as published by
10the Free Software Foundation; either version 2, or (at your option)
11any later version.
4ed46869 12
369314dc
KH
13GNU Emacs is distributed in the hope that it will be useful,
14but WITHOUT ANY WARRANTY; without even the implied warranty of
15MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16GNU General Public License for more details.
4ed46869 17
369314dc
KH
18You should have received a copy of the GNU General Public License
19along with GNU Emacs; see the file COPYING. If not, write to
20the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
21Boston, MA 02111-1307, USA. */
4ed46869
KH
22
23/*** TABLE OF CONTENTS ***
24
b73bfc1c 25 0. General comments
4ed46869 26 1. Preamble
0ef69138 27 2. Emacs' internal format (emacs-mule) handlers
4ed46869
KH
28 3. ISO2022 handlers
29 4. Shift-JIS and BIG5 handlers
1397dc18
KH
30 5. CCL handlers
31 6. End-of-line handlers
32 7. C library functions
33 8. Emacs Lisp library functions
34 9. Post-amble
4ed46869
KH
35
36*/
37
b73bfc1c
KH
38/*** 0. General comments ***/
39
40
cfb43547 41/*** GENERAL NOTE on CODING SYSTEMS ***
4ed46869 42
cfb43547 43 A coding system is an encoding mechanism for one or more character
4ed46869
KH
44 sets. Here's a list of coding systems which Emacs can handle. When
45 we say "decode", it means converting some other coding system to
cfb43547 46 Emacs' internal format (emacs-mule), and when we say "encode",
0ef69138
KH
47 it means converting the coding system emacs-mule to some other
48 coding system.
4ed46869 49
0ef69138 50 0. Emacs' internal format (emacs-mule)
4ed46869 51
cfb43547 52 Emacs itself holds a multi-lingual character in buffers and strings
f4dee582 53 in a special format. Details are described in section 2.
4ed46869
KH
54
55 1. ISO2022
56
57 The most famous coding system for multiple character sets. X's
f4dee582
RS
58 Compound Text, various EUCs (Extended Unix Code), and coding
59 systems used in Internet communication such as ISO-2022-JP are
60 all variants of ISO2022. Details are described in section 3.
4ed46869
KH
61
62 2. SJIS (or Shift-JIS or MS-Kanji-Code)
93dec019 63
4ed46869
KH
64 A coding system to encode character sets: ASCII, JISX0201, and
65 JISX0208. Widely used for PC's in Japan. Details are described in
f4dee582 66 section 4.
4ed46869
KH
67
68 3. BIG5
69
cfb43547
DL
70 A coding system to encode the character sets ASCII and Big5. Widely
71 used for Chinese (mainly in Taiwan and Hong Kong). Details are
f4dee582
RS
72 described in section 4. In this file, when we write "BIG5"
73 (all uppercase), we mean the coding system, and when we write
74 "Big5" (capitalized), we mean the character set.
4ed46869 75
27901516
KH
76 4. Raw text
77
cfb43547
DL
78 A coding system for text containing random 8-bit code. Emacs does
79 no code conversion on such text except for end-of-line format.
27901516
KH
80
81 5. Other
4ed46869 82
cfb43547
DL
83 If a user wants to read/write text encoded in a coding system not
84 listed above, he can supply a decoder and an encoder for it as CCL
4ed46869
KH
85 (Code Conversion Language) programs. Emacs executes the CCL program
86 while reading/writing.
87
d46c5b12
KH
88 Emacs represents a coding system by a Lisp symbol that has a property
89 `coding-system'. But, before actually using the coding system, the
4ed46869 90 information about it is set in a structure of type `struct
f4dee582 91 coding_system' for rapid processing. See section 6 for more details.
4ed46869
KH
92
93*/
94
95/*** GENERAL NOTES on END-OF-LINE FORMAT ***
96
cfb43547
DL
97 How end-of-line of text is encoded depends on the operating system.
98 For instance, Unix's format is just one byte of `line-feed' code,
f4dee582 99 whereas DOS's format is two-byte sequence of `carriage-return' and
d46c5b12
KH
100 `line-feed' codes. MacOS's format is usually one byte of
101 `carriage-return'.
4ed46869 102
cfb43547
DL
103 Since text character encoding and end-of-line encoding are
104 independent, any coding system described above can have any
105 end-of-line format. So Emacs has information about end-of-line
106 format in each coding-system. See section 6 for more details.
4ed46869
KH
107
108*/
109
110/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
111
112 These functions check if a text between SRC and SRC_END is encoded
113 in the coding system category XXX. Each returns an integer value in
cfb43547 114 which appropriate flag bits for the category XXX are set. The flag
4ed46869 115 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
cfb43547 116 template for these functions. If MULTIBYTEP is nonzero, 8-bit codes
0a28aafb 117 of the range 0x80..0x9F are in multibyte form. */
4ed46869
KH
118#if 0
119int
0a28aafb 120detect_coding_emacs_mule (src, src_end, multibytep)
4ed46869 121 unsigned char *src, *src_end;
0a28aafb 122 int multibytep;
4ed46869
KH
123{
124 ...
125}
126#endif
127
128/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
129
b73bfc1c
KH
130 These functions decode SRC_BYTES length of unibyte text at SOURCE
131 encoded in CODING to Emacs' internal format. The resulting
132 multibyte text goes to a place pointed to by DESTINATION, the length
133 of which should not exceed DST_BYTES.
d46c5b12 134
cfb43547
DL
135 These functions set the information about original and decoded texts
136 in the members `produced', `produced_char', `consumed', and
137 `consumed_char' of the structure *CODING. They also set the member
138 `result' to one of CODING_FINISH_XXX indicating how the decoding
139 finished.
d46c5b12 140
cfb43547 141 DST_BYTES zero means that the source area and destination area are
d46c5b12 142 overlapped, which means that we can produce a decoded text until it
cfb43547 143 reaches the head of the not-yet-decoded source text.
d46c5b12 144
cfb43547 145 Below is a template for these functions. */
4ed46869 146#if 0
b73bfc1c 147static void
d46c5b12 148decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
149 struct coding_system *coding;
150 unsigned char *source, *destination;
151 int src_bytes, dst_bytes;
4ed46869
KH
152{
153 ...
154}
155#endif
156
157/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
158
cfb43547 159 These functions encode SRC_BYTES length text at SOURCE from Emacs'
b73bfc1c
KH
160 internal multibyte format to CODING. The resulting unibyte text
161 goes to a place pointed to by DESTINATION, the length of which
162 should not exceed DST_BYTES.
d46c5b12 163
cfb43547
DL
164 These functions set the information about original and encoded texts
165 in the members `produced', `produced_char', `consumed', and
166 `consumed_char' of the structure *CODING. They also set the member
167 `result' to one of CODING_FINISH_XXX indicating how the encoding
168 finished.
d46c5b12 169
cfb43547
DL
170 DST_BYTES zero means that the source area and destination area are
171 overlapped, which means that we can produce encoded text until it
172 reaches at the head of the not-yet-encoded source text.
d46c5b12 173
cfb43547 174 Below is a template for these functions. */
4ed46869 175#if 0
b73bfc1c 176static void
d46c5b12 177encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
178 struct coding_system *coding;
179 unsigned char *source, *destination;
180 int src_bytes, dst_bytes;
4ed46869
KH
181{
182 ...
183}
184#endif
185
186/*** COMMONLY USED MACROS ***/
187
b73bfc1c
KH
188/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
189 get one, two, and three bytes from the source text respectively.
190 If there are not enough bytes in the source, they jump to
191 `label_end_of_loop'. The caller should set variables `coding',
192 `src' and `src_end' to appropriate pointer in advance. These
193 macros are called from decoding routines `decode_coding_XXX', thus
194 it is assumed that the source text is unibyte. */
4ed46869 195
b73bfc1c
KH
196#define ONE_MORE_BYTE(c1) \
197 do { \
198 if (src >= src_end) \
199 { \
200 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
201 goto label_end_of_loop; \
202 } \
203 c1 = *src++; \
4ed46869
KH
204 } while (0)
205
b73bfc1c
KH
206#define TWO_MORE_BYTES(c1, c2) \
207 do { \
208 if (src + 1 >= src_end) \
209 { \
210 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
211 goto label_end_of_loop; \
212 } \
213 c1 = *src++; \
214 c2 = *src++; \
4ed46869
KH
215 } while (0)
216
4ed46869 217
0a28aafb
KH
218/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
219 form if MULTIBYTEP is nonzero. */
220
221#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep) \
222 do { \
223 if (src >= src_end) \
224 { \
225 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
226 goto label_end_of_loop; \
227 } \
228 c1 = *src++; \
229 if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL) \
230 c1 = *src++ - 0x20; \
231 } while (0)
232
b73bfc1c
KH
233/* Set C to the next character at the source text pointed by `src'.
234 If there are not enough characters in the source, jump to
235 `label_end_of_loop'. The caller should set variables `coding'
236 `src', `src_end', and `translation_table' to appropriate pointers
237 in advance. This macro is used in encoding routines
238 `encode_coding_XXX', thus it assumes that the source text is in
239 multibyte form except for 8-bit characters. 8-bit characters are
240 in multibyte form if coding->src_multibyte is nonzero, else they
241 are represented by a single byte. */
4ed46869 242
b73bfc1c
KH
243#define ONE_MORE_CHAR(c) \
244 do { \
245 int len = src_end - src; \
246 int bytes; \
247 if (len <= 0) \
248 { \
249 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
250 goto label_end_of_loop; \
251 } \
252 if (coding->src_multibyte \
253 || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
254 c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
255 else \
256 c = *src, bytes = 1; \
257 if (!NILP (translation_table)) \
39658efc 258 c = translate_char (translation_table, c, -1, 0, 0); \
b73bfc1c 259 src += bytes; \
4ed46869
KH
260 } while (0)
261
4ed46869 262
8ca3766a 263/* Produce a multibyte form of character C to `dst'. Jump to
b73bfc1c
KH
264 `label_end_of_loop' if there's not enough space at `dst'.
265
cfb43547 266 If we are now in the middle of a composition sequence, the decoded
b73bfc1c
KH
267 character may be ALTCHAR (for the current composition). In that
268 case, the character goes to coding->cmp_data->data instead of
269 `dst'.
270
271 This macro is used in decoding routines. */
272
273#define EMIT_CHAR(c) \
4ed46869 274 do { \
b73bfc1c
KH
275 if (! COMPOSING_P (coding) \
276 || coding->composing == COMPOSITION_RELATIVE \
277 || coding->composing == COMPOSITION_WITH_RULE) \
278 { \
279 int bytes = CHAR_BYTES (c); \
280 if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
281 { \
282 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
283 goto label_end_of_loop; \
284 } \
285 dst += CHAR_STRING (c, dst); \
286 coding->produced_char++; \
287 } \
ec6d2bb8 288 \
b73bfc1c
KH
289 if (COMPOSING_P (coding) \
290 && coding->composing != COMPOSITION_RELATIVE) \
291 { \
292 CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
293 coding->composition_rule_follows \
294 = coding->composing != COMPOSITION_WITH_ALTCHARS; \
295 } \
4ed46869
KH
296 } while (0)
297
4ed46869 298
b73bfc1c
KH
299#define EMIT_ONE_BYTE(c) \
300 do { \
301 if (dst >= (dst_bytes ? dst_end : src)) \
302 { \
303 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
304 goto label_end_of_loop; \
305 } \
306 *dst++ = c; \
307 } while (0)
308
309#define EMIT_TWO_BYTES(c1, c2) \
310 do { \
311 if (dst + 2 > (dst_bytes ? dst_end : src)) \
312 { \
313 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
314 goto label_end_of_loop; \
315 } \
316 *dst++ = c1, *dst++ = c2; \
317 } while (0)
318
319#define EMIT_BYTES(from, to) \
320 do { \
321 if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
322 { \
323 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
324 goto label_end_of_loop; \
325 } \
326 while (from < to) \
327 *dst++ = *from++; \
4ed46869
KH
328 } while (0)
329
330\f
331/*** 1. Preamble ***/
332
68c45bf0
PE
333#ifdef emacs
334#include <config.h>
335#endif
336
4ed46869
KH
337#include <stdio.h>
338
339#ifdef emacs
340
4ed46869
KH
341#include "lisp.h"
342#include "buffer.h"
343#include "charset.h"
ec6d2bb8 344#include "composite.h"
4ed46869
KH
345#include "ccl.h"
346#include "coding.h"
347#include "window.h"
348
349#else /* not emacs */
350
351#include "mulelib.h"
352
353#endif /* not emacs */
354
355Lisp_Object Qcoding_system, Qeol_type;
356Lisp_Object Qbuffer_file_coding_system;
357Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
27901516 358Lisp_Object Qno_conversion, Qundecided;
bb0115a2 359Lisp_Object Qcoding_system_history;
05e6f5dc 360Lisp_Object Qsafe_chars;
1397dc18 361Lisp_Object Qvalid_codes;
4ed46869
KH
362
363extern Lisp_Object Qinsert_file_contents, Qwrite_region;
364Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
365Lisp_Object Qstart_process, Qopen_network_stream;
366Lisp_Object Qtarget_idx;
367
d46c5b12
KH
368Lisp_Object Vselect_safe_coding_system_function;
369
5d5bf4d8
KH
370int coding_system_require_warning;
371
7722baf9
EZ
372/* Mnemonic string for each format of end-of-line. */
373Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
374/* Mnemonic string to indicate format of end-of-line is not yet
4ed46869 375 decided. */
7722baf9 376Lisp_Object eol_mnemonic_undecided;
4ed46869 377
9ce27fde
KH
378/* Format of end-of-line decided by system. This is CODING_EOL_LF on
379 Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
380int system_eol_type;
381
4ed46869
KH
382#ifdef emacs
383
6b89e3aa
KH
384/* Information about which coding system is safe for which chars.
385 The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).
386
387 GENERIC-LIST is a list of generic coding systems which can encode
388 any characters.
389
390 NON-GENERIC-ALIST is an alist of non generic coding systems vs the
391 corresponding char table that contains safe chars. */
392Lisp_Object Vcoding_system_safe_chars;
393
4608c386
KH
394Lisp_Object Vcoding_system_list, Vcoding_system_alist;
395
396Lisp_Object Qcoding_system_p, Qcoding_system_error;
4ed46869 397
d46c5b12
KH
398/* Coding system emacs-mule and raw-text are for converting only
399 end-of-line format. */
400Lisp_Object Qemacs_mule, Qraw_text;
9ce27fde 401
4ed46869
KH
402/* Coding-systems are handed between Emacs Lisp programs and C internal
403 routines by the following three variables. */
404/* Coding-system for reading files and receiving data from process. */
405Lisp_Object Vcoding_system_for_read;
406/* Coding-system for writing files and sending data to process. */
407Lisp_Object Vcoding_system_for_write;
408/* Coding-system actually used in the latest I/O. */
409Lisp_Object Vlast_coding_system_used;
410
c4825358 411/* A vector of length 256 which contains information about special
94487c4e 412 Latin codes (especially for dealing with Microsoft codes). */
3f003981 413Lisp_Object Vlatin_extra_code_table;
c4825358 414
9ce27fde
KH
415/* Flag to inhibit code conversion of end-of-line format. */
416int inhibit_eol_conversion;
417
74383408
KH
418/* Flag to inhibit ISO2022 escape sequence detection. */
419int inhibit_iso_escape_detection;
420
ed29121d
EZ
421/* Flag to make buffer-file-coding-system inherit from process-coding. */
422int inherit_process_coding_system;
423
c4825358 424/* Coding system to be used to encode text for terminal display. */
4ed46869
KH
425struct coding_system terminal_coding;
426
c4825358
KH
427/* Coding system to be used to encode text for terminal display when
428 terminal coding system is nil. */
429struct coding_system safe_terminal_coding;
430
431/* Coding system of what is sent from terminal keyboard. */
4ed46869
KH
432struct coding_system keyboard_coding;
433
6bc51348
KH
434/* Default coding system to be used to write a file. */
435struct coding_system default_buffer_file_coding;
436
02ba4723
KH
437Lisp_Object Vfile_coding_system_alist;
438Lisp_Object Vprocess_coding_system_alist;
439Lisp_Object Vnetwork_coding_system_alist;
4ed46869 440
68c45bf0
PE
441Lisp_Object Vlocale_coding_system;
442
4ed46869
KH
443#endif /* emacs */
444
d46c5b12 445Lisp_Object Qcoding_category, Qcoding_category_index;
4ed46869
KH
446
447/* List of symbols `coding-category-xxx' ordered by priority. */
448Lisp_Object Vcoding_category_list;
449
d46c5b12
KH
450/* Table of coding categories (Lisp symbols). */
451Lisp_Object Vcoding_category_table;
4ed46869
KH
452
453/* Table of names of symbol for each coding-category. */
454char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
0ef69138 455 "coding-category-emacs-mule",
4ed46869
KH
456 "coding-category-sjis",
457 "coding-category-iso-7",
d46c5b12 458 "coding-category-iso-7-tight",
4ed46869
KH
459 "coding-category-iso-8-1",
460 "coding-category-iso-8-2",
7717c392
KH
461 "coding-category-iso-7-else",
462 "coding-category-iso-8-else",
89fa8b36 463 "coding-category-ccl",
4ed46869 464 "coding-category-big5",
fa42c37f
KH
465 "coding-category-utf-8",
466 "coding-category-utf-16-be",
467 "coding-category-utf-16-le",
27901516 468 "coding-category-raw-text",
89fa8b36 469 "coding-category-binary"
4ed46869
KH
470};
471
66cfb530 472/* Table of pointers to coding systems corresponding to each coding
d46c5b12
KH
473 categories. */
474struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
475
66cfb530 476/* Table of coding category masks. Nth element is a mask for a coding
8ca3766a 477 category of which priority is Nth. */
66cfb530
KH
478static
479int coding_priorities[CODING_CATEGORY_IDX_MAX];
480
f967223b
KH
481/* Flag to tell if we look up translation table on character code
482 conversion. */
84fbb8a0 483Lisp_Object Venable_character_translation;
f967223b
KH
484/* Standard translation table to look up on decoding (reading). */
485Lisp_Object Vstandard_translation_table_for_decode;
486/* Standard translation table to look up on encoding (writing). */
487Lisp_Object Vstandard_translation_table_for_encode;
84fbb8a0 488
f967223b
KH
489Lisp_Object Qtranslation_table;
490Lisp_Object Qtranslation_table_id;
491Lisp_Object Qtranslation_table_for_decode;
492Lisp_Object Qtranslation_table_for_encode;
4ed46869
KH
493
494/* Alist of charsets vs revision number. */
495Lisp_Object Vcharset_revision_alist;
496
02ba4723
KH
497/* Default coding systems used for process I/O. */
498Lisp_Object Vdefault_process_coding_system;
499
002fdb44
DL
500/* Char table for translating Quail and self-inserting input. */
501Lisp_Object Vtranslation_table_for_input;
502
b843d1ae
KH
503/* Global flag to tell that we can't call post-read-conversion and
504 pre-write-conversion functions. Usually the value is zero, but it
505 is set to 1 temporarily while such functions are running. This is
506 to avoid infinite recursive call. */
507static int inhibit_pre_post_conversion;
508
05e6f5dc
KH
509Lisp_Object Qchar_coding_system;
510
6b89e3aa
KH
511/* Return `safe-chars' property of CODING_SYSTEM (symbol). Don't check
512 its validity. */
05e6f5dc
KH
513
514Lisp_Object
6b89e3aa
KH
515coding_safe_chars (coding_system)
516 Lisp_Object coding_system;
05e6f5dc
KH
517{
518 Lisp_Object coding_spec, plist, safe_chars;
93dec019 519
6b89e3aa 520 coding_spec = Fget (coding_system, Qcoding_system);
05e6f5dc
KH
521 plist = XVECTOR (coding_spec)->contents[3];
522 safe_chars = Fplist_get (XVECTOR (coding_spec)->contents[3], Qsafe_chars);
523 return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
524}
525
526#define CODING_SAFE_CHAR_P(safe_chars, c) \
527 (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
528
4ed46869 529\f
0ef69138 530/*** 2. Emacs internal format (emacs-mule) handlers ***/
4ed46869 531
aa72b389
KH
532/* Emacs' internal format for representation of multiple character
533 sets is a kind of multi-byte encoding, i.e. characters are
534 represented by variable-length sequences of one-byte codes.
b73bfc1c
KH
535
536 ASCII characters and control characters (e.g. `tab', `newline') are
537 represented by one-byte sequences which are their ASCII codes, in
538 the range 0x00 through 0x7F.
539
540 8-bit characters of the range 0x80..0x9F are represented by
541 two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
542 code + 0x20).
543
544 8-bit characters of the range 0xA0..0xFF are represented by
545 one-byte sequences which are their 8-bit code.
546
547 The other characters are represented by a sequence of `base
548 leading-code', optional `extended leading-code', and one or two
549 `position-code's. The length of the sequence is determined by the
aa72b389 550 base leading-code. Leading-code takes the range 0x81 through 0x9D,
b73bfc1c
KH
551 whereas extended leading-code and position-code take the range 0xA0
552 through 0xFF. See `charset.h' for more details about leading-code
553 and position-code.
f4dee582 554
4ed46869 555 --- CODE RANGE of Emacs' internal format ---
b73bfc1c
KH
556 character set range
557 ------------- -----
558 ascii 0x00..0x7F
559 eight-bit-control LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
560 eight-bit-graphic 0xA0..0xBF
aa72b389 561 ELSE 0x81..0x9D + [0xA0..0xFF]+
4ed46869
KH
562 ---------------------------------------------
563
aa72b389
KH
564 As this is the internal character representation, the format is
565 usually not used externally (i.e. in a file or in a data sent to a
566 process). But, it is possible to have a text externally in this
567 format (i.e. by encoding by the coding system `emacs-mule').
568
569 In that case, a sequence of one-byte codes has a slightly different
570 form.
571
ae5145c2 572 Firstly, all characters in eight-bit-control are represented by
aa72b389
KH
573 one-byte sequences which are their 8-bit code.
574
575 Next, character composition data are represented by the byte
576 sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
577 where,
578 METHOD is 0xF0 plus one of composition method (enum
579 composition_method),
580
ae5145c2 581 BYTES is 0xA0 plus the byte length of these composition data,
aa72b389 582
ae5145c2 583 CHARS is 0xA0 plus the number of characters composed by these
aa72b389
KH
584 data,
585
8ca3766a 586 COMPONENTs are characters of multibyte form or composition
aa72b389
KH
587 rules encoded by two-byte of ASCII codes.
588
589 In addition, for backward compatibility, the following formats are
590 also recognized as composition data on decoding.
591
592 0x80 MSEQ ...
593 0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ
594
595 Here,
596 MSEQ is a multibyte form but in these special format:
597 ASCII: 0xA0 ASCII_CODE+0x80,
598 other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
599 RULE is a one byte code of the range 0xA0..0xF0 that
600 represents a composition rule.
4ed46869
KH
601 */
602
603enum emacs_code_class_type emacs_code_class[256];
604
4ed46869
KH
605/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
606 Check if a text is encoded in Emacs' internal format. If it is,
d46c5b12 607 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
4ed46869 608
0a28aafb
KH
609static int
610detect_coding_emacs_mule (src, src_end, multibytep)
b73bfc1c 611 unsigned char *src, *src_end;
0a28aafb 612 int multibytep;
4ed46869
KH
613{
614 unsigned char c;
615 int composing = 0;
b73bfc1c
KH
616 /* Dummy for ONE_MORE_BYTE. */
617 struct coding_system dummy_coding;
618 struct coding_system *coding = &dummy_coding;
4ed46869 619
b73bfc1c 620 while (1)
4ed46869 621 {
0a28aafb 622 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
623
624 if (composing)
625 {
626 if (c < 0xA0)
627 composing = 0;
b73bfc1c
KH
628 else if (c == 0xA0)
629 {
0a28aafb 630 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
631 c &= 0x7F;
632 }
4ed46869
KH
633 else
634 c -= 0x20;
635 }
636
b73bfc1c 637 if (c < 0x20)
4ed46869 638 {
4ed46869
KH
639 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
640 return 0;
b73bfc1c
KH
641 }
642 else if (c >= 0x80 && c < 0xA0)
643 {
644 if (c == 0x80)
645 /* Old leading code for a composite character. */
646 composing = 1;
647 else
648 {
649 unsigned char *src_base = src - 1;
650 int bytes;
4ed46869 651
b73bfc1c
KH
652 if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
653 bytes))
654 return 0;
655 src = src_base + bytes;
656 }
657 }
658 }
659 label_end_of_loop:
660 return CODING_CATEGORY_MASK_EMACS_MULE;
661}
4ed46869 662
4ed46869 663
aa72b389
KH
664/* Record the starting position START and METHOD of one composition. */
665
666#define CODING_ADD_COMPOSITION_START(coding, start, method) \
667 do { \
668 struct composition_data *cmp_data = coding->cmp_data; \
669 int *data = cmp_data->data + cmp_data->used; \
670 coding->cmp_data_start = cmp_data->used; \
671 data[0] = -1; \
672 data[1] = cmp_data->char_offset + start; \
673 data[3] = (int) method; \
674 cmp_data->used += 4; \
675 } while (0)
676
677/* Record the ending position END of the current composition. */
678
679#define CODING_ADD_COMPOSITION_END(coding, end) \
680 do { \
681 struct composition_data *cmp_data = coding->cmp_data; \
682 int *data = cmp_data->data + coding->cmp_data_start; \
683 data[0] = cmp_data->used - coding->cmp_data_start; \
684 data[2] = cmp_data->char_offset + end; \
685 } while (0)
686
687/* Record one COMPONENT (alternate character or composition rule). */
688
b6871cc7
KH
689#define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
690 do { \
691 coding->cmp_data->data[coding->cmp_data->used++] = component; \
692 if (coding->cmp_data->used - coding->cmp_data_start \
693 == COMPOSITION_DATA_MAX_BUNCH_LENGTH) \
694 { \
695 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
696 coding->composing = COMPOSITION_NO; \
697 } \
698 } while (0)
aa72b389
KH
699
700
701/* Get one byte from a data pointed by SRC and increment SRC. If SRC
8ca3766a 702 is not less than SRC_END, return -1 without incrementing Src. */
aa72b389
KH
703
704#define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)
705
706
707/* Decode a character represented as a component of composition
708 sequence of Emacs 20 style at SRC. Set C to that character, store
709 its multibyte form sequence at P, and set P to the end of that
710 sequence. If no valid character is found, set C to -1. */
711
712#define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p) \
713 do { \
714 int bytes; \
715 \
716 c = SAFE_ONE_MORE_BYTE (); \
717 if (c < 0) \
718 break; \
719 if (CHAR_HEAD_P (c)) \
720 c = -1; \
721 else if (c == 0xA0) \
722 { \
723 c = SAFE_ONE_MORE_BYTE (); \
724 if (c < 0xA0) \
725 c = -1; \
726 else \
727 { \
728 c -= 0xA0; \
729 *p++ = c; \
730 } \
731 } \
732 else if (BASE_LEADING_CODE_P (c - 0x20)) \
733 { \
734 unsigned char *p0 = p; \
735 \
736 c -= 0x20; \
737 *p++ = c; \
738 bytes = BYTES_BY_CHAR_HEAD (c); \
739 while (--bytes) \
740 { \
741 c = SAFE_ONE_MORE_BYTE (); \
742 if (c < 0) \
743 break; \
744 *p++ = c; \
745 } \
746 if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes)) \
747 c = STRING_CHAR (p0, bytes); \
748 else \
749 c = -1; \
750 } \
751 else \
752 c = -1; \
753 } while (0)
754
755
756/* Decode a composition rule represented as a component of composition
757 sequence of Emacs 20 style at SRC. Set C to the rule. If not
758 valid rule is found, set C to -1. */
759
760#define DECODE_EMACS_MULE_COMPOSITION_RULE(c) \
761 do { \
762 c = SAFE_ONE_MORE_BYTE (); \
763 c -= 0xA0; \
764 if (c < 0 || c >= 81) \
765 c = -1; \
766 else \
767 { \
768 gref = c / 9, nref = c % 9; \
769 c = COMPOSITION_ENCODE_RULE (gref, nref); \
770 } \
771 } while (0)
772
773
774/* Decode composition sequence encoded by `emacs-mule' at the source
775 pointed by SRC. SRC_END is the end of source. Store information
776 of the composition in CODING->cmp_data.
777
778 For backward compatibility, decode also a composition sequence of
779 Emacs 20 style. In that case, the composition sequence contains
780 characters that should be extracted into a buffer or string. Store
781 those characters at *DESTINATION in multibyte form.
782
783 If we encounter an invalid byte sequence, return 0.
784 If we encounter an insufficient source or destination, or
785 insufficient space in CODING->cmp_data, return 1.
786 Otherwise, return consumed bytes in the source.
787
788*/
789static INLINE int
790decode_composition_emacs_mule (coding, src, src_end,
791 destination, dst_end, dst_bytes)
792 struct coding_system *coding;
793 unsigned char *src, *src_end, **destination, *dst_end;
794 int dst_bytes;
795{
796 unsigned char *dst = *destination;
797 int method, data_len, nchars;
798 unsigned char *src_base = src++;
8ca3766a 799 /* Store components of composition. */
aa72b389
KH
800 int component[COMPOSITION_DATA_MAX_BUNCH_LENGTH];
801 int ncomponent;
802 /* Store multibyte form of characters to be composed. This is for
803 Emacs 20 style composition sequence. */
804 unsigned char buf[MAX_COMPOSITION_COMPONENTS * MAX_MULTIBYTE_LENGTH];
805 unsigned char *bufp = buf;
806 int c, i, gref, nref;
807
808 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
809 >= COMPOSITION_DATA_SIZE)
810 {
811 coding->result = CODING_FINISH_INSUFFICIENT_CMP;
812 return -1;
813 }
814
815 ONE_MORE_BYTE (c);
816 if (c - 0xF0 >= COMPOSITION_RELATIVE
817 && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
818 {
819 int with_rule;
820
821 method = c - 0xF0;
822 with_rule = (method == COMPOSITION_WITH_RULE
823 || method == COMPOSITION_WITH_RULE_ALTCHARS);
824 ONE_MORE_BYTE (c);
825 data_len = c - 0xA0;
826 if (data_len < 4
827 || src_base + data_len > src_end)
828 return 0;
829 ONE_MORE_BYTE (c);
830 nchars = c - 0xA0;
831 if (c < 1)
832 return 0;
833 for (ncomponent = 0; src < src_base + data_len; ncomponent++)
834 {
b1887814
RS
835 /* If it is longer than this, it can't be valid. */
836 if (ncomponent >= COMPOSITION_DATA_MAX_BUNCH_LENGTH)
837 return 0;
838
aa72b389
KH
839 if (ncomponent % 2 && with_rule)
840 {
841 ONE_MORE_BYTE (gref);
842 gref -= 32;
843 ONE_MORE_BYTE (nref);
844 nref -= 32;
845 c = COMPOSITION_ENCODE_RULE (gref, nref);
846 }
847 else
848 {
849 int bytes;
850 if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
851 c = STRING_CHAR (src, bytes);
852 else
853 c = *src, bytes = 1;
854 src += bytes;
855 }
856 component[ncomponent] = c;
857 }
858 }
859 else
860 {
861 /* This may be an old Emacs 20 style format. See the comment at
862 the section 2 of this file. */
863 while (src < src_end && !CHAR_HEAD_P (*src)) src++;
864 if (src == src_end
865 && !(coding->mode & CODING_MODE_LAST_BLOCK))
866 goto label_end_of_loop;
867
868 src_end = src;
869 src = src_base + 1;
870 if (c < 0xC0)
871 {
872 method = COMPOSITION_RELATIVE;
873 for (ncomponent = 0; ncomponent < MAX_COMPOSITION_COMPONENTS;)
874 {
875 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
876 if (c < 0)
877 break;
878 component[ncomponent++] = c;
879 }
880 if (ncomponent < 2)
881 return 0;
882 nchars = ncomponent;
883 }
884 else if (c == 0xFF)
885 {
886 method = COMPOSITION_WITH_RULE;
887 src++;
888 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
889 if (c < 0)
890 return 0;
891 component[0] = c;
892 for (ncomponent = 1;
893 ncomponent < MAX_COMPOSITION_COMPONENTS * 2 - 1;)
894 {
895 DECODE_EMACS_MULE_COMPOSITION_RULE (c);
896 if (c < 0)
897 break;
898 component[ncomponent++] = c;
899 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
900 if (c < 0)
901 break;
902 component[ncomponent++] = c;
903 }
904 if (ncomponent < 3)
905 return 0;
906 nchars = (ncomponent + 1) / 2;
907 }
908 else
909 return 0;
910 }
911
912 if (buf == bufp || dst + (bufp - buf) <= (dst_bytes ? dst_end : src))
913 {
914 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, method);
915 for (i = 0; i < ncomponent; i++)
916 CODING_ADD_COMPOSITION_COMPONENT (coding, component[i]);
93dec019 917 CODING_ADD_COMPOSITION_END (coding, coding->produced_char + nchars);
aa72b389
KH
918 if (buf < bufp)
919 {
920 unsigned char *p = buf;
921 EMIT_BYTES (p, bufp);
922 *destination += bufp - buf;
923 coding->produced_char += nchars;
924 }
925 return (src - src_base);
926 }
927 label_end_of_loop:
928 return -1;
929}
930
b73bfc1c 931/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 932
b73bfc1c
KH
933static void
934decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
935 struct coding_system *coding;
936 unsigned char *source, *destination;
937 int src_bytes, dst_bytes;
938{
939 unsigned char *src = source;
940 unsigned char *src_end = source + src_bytes;
941 unsigned char *dst = destination;
942 unsigned char *dst_end = destination + dst_bytes;
943 /* SRC_BASE remembers the start position in source in each loop.
944 The loop will be exited when there's not enough source code, or
945 when there's not enough destination area to produce a
946 character. */
947 unsigned char *src_base;
4ed46869 948
b73bfc1c 949 coding->produced_char = 0;
8a33cf7b 950 while ((src_base = src) < src_end)
b73bfc1c
KH
951 {
952 unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
953 int bytes;
ec6d2bb8 954
4af310db
EZ
955 if (*src == '\r')
956 {
2bcdf662 957 int c = *src++;
4af310db 958
4af310db
EZ
959 if (coding->eol_type == CODING_EOL_CR)
960 c = '\n';
961 else if (coding->eol_type == CODING_EOL_CRLF)
962 {
963 ONE_MORE_BYTE (c);
964 if (c != '\n')
965 {
4af310db
EZ
966 src--;
967 c = '\r';
968 }
969 }
970 *dst++ = c;
971 coding->produced_char++;
972 continue;
973 }
974 else if (*src == '\n')
975 {
976 if ((coding->eol_type == CODING_EOL_CR
977 || coding->eol_type == CODING_EOL_CRLF)
978 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
979 {
980 coding->result = CODING_FINISH_INCONSISTENT_EOL;
981 goto label_end_of_loop;
982 }
983 *dst++ = *src++;
984 coding->produced_char++;
985 continue;
986 }
3089d25c 987 else if (*src == 0x80 && coding->cmp_data)
aa72b389
KH
988 {
989 /* Start of composition data. */
990 int consumed = decode_composition_emacs_mule (coding, src, src_end,
991 &dst, dst_end,
992 dst_bytes);
993 if (consumed < 0)
994 goto label_end_of_loop;
995 else if (consumed > 0)
996 {
997 src += consumed;
998 continue;
999 }
1000 bytes = CHAR_STRING (*src, tmp);
1001 p = tmp;
1002 src++;
1003 }
4af310db 1004 else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
b73bfc1c
KH
1005 {
1006 p = src;
1007 src += bytes;
1008 }
1009 else
1010 {
1011 bytes = CHAR_STRING (*src, tmp);
1012 p = tmp;
1013 src++;
1014 }
1015 if (dst + bytes >= (dst_bytes ? dst_end : src))
1016 {
1017 coding->result = CODING_FINISH_INSUFFICIENT_DST;
4ed46869
KH
1018 break;
1019 }
b73bfc1c
KH
1020 while (bytes--) *dst++ = *p++;
1021 coding->produced_char++;
4ed46869 1022 }
4af310db 1023 label_end_of_loop:
b73bfc1c
KH
1024 coding->consumed = coding->consumed_char = src_base - source;
1025 coding->produced = dst - destination;
4ed46869
KH
1026}
1027
b73bfc1c 1028
aa72b389
KH
1029/* Encode composition data stored at DATA into a special byte sequence
1030 starting by 0x80. Update CODING->cmp_data_start and maybe
1031 CODING->cmp_data for the next call. */
1032
1033#define ENCODE_COMPOSITION_EMACS_MULE(coding, data) \
1034 do { \
1035 unsigned char buf[1024], *p0 = buf, *p; \
1036 int len = data[0]; \
1037 int i; \
1038 \
1039 buf[0] = 0x80; \
1040 buf[1] = 0xF0 + data[3]; /* METHOD */ \
1041 buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */ \
1042 p = buf + 4; \
1043 if (data[3] == COMPOSITION_WITH_RULE \
1044 || data[3] == COMPOSITION_WITH_RULE_ALTCHARS) \
1045 { \
1046 p += CHAR_STRING (data[4], p); \
1047 for (i = 5; i < len; i += 2) \
1048 { \
1049 int gref, nref; \
1050 COMPOSITION_DECODE_RULE (data[i], gref, nref); \
1051 *p++ = 0x20 + gref; \
1052 *p++ = 0x20 + nref; \
1053 p += CHAR_STRING (data[i + 1], p); \
1054 } \
1055 } \
1056 else \
1057 { \
1058 for (i = 4; i < len; i++) \
1059 p += CHAR_STRING (data[i], p); \
1060 } \
1061 buf[2] = 0xA0 + (p - buf); /* COMPONENTS-BYTES */ \
1062 \
1063 if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src)) \
1064 { \
1065 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
1066 goto label_end_of_loop; \
1067 } \
1068 while (p0 < p) \
1069 *dst++ = *p0++; \
1070 coding->cmp_data_start += data[0]; \
1071 if (coding->cmp_data_start == coding->cmp_data->used \
1072 && coding->cmp_data->next) \
1073 { \
1074 coding->cmp_data = coding->cmp_data->next; \
1075 coding->cmp_data_start = 0; \
1076 } \
1077 } while (0)
93dec019 1078
aa72b389 1079
a4244313 1080static void encode_eol P_ ((struct coding_system *, const unsigned char *,
aa72b389
KH
1081 unsigned char *, int, int));
1082
1083static void
1084encode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
1085 struct coding_system *coding;
1086 unsigned char *source, *destination;
1087 int src_bytes, dst_bytes;
1088{
1089 unsigned char *src = source;
1090 unsigned char *src_end = source + src_bytes;
1091 unsigned char *dst = destination;
1092 unsigned char *dst_end = destination + dst_bytes;
1093 unsigned char *src_base;
1094 int c;
1095 int char_offset;
1096 int *data;
1097
1098 Lisp_Object translation_table;
1099
1100 translation_table = Qnil;
1101
1102 /* Optimization for the case that there's no composition. */
1103 if (!coding->cmp_data || coding->cmp_data->used == 0)
1104 {
1105 encode_eol (coding, source, destination, src_bytes, dst_bytes);
1106 return;
1107 }
1108
1109 char_offset = coding->cmp_data->char_offset;
1110 data = coding->cmp_data->data + coding->cmp_data_start;
1111 while (1)
1112 {
1113 src_base = src;
1114
1115 /* If SRC starts a composition, encode the information about the
1116 composition in advance. */
1117 if (coding->cmp_data_start < coding->cmp_data->used
1118 && char_offset + coding->consumed_char == data[1])
1119 {
1120 ENCODE_COMPOSITION_EMACS_MULE (coding, data);
1121 char_offset = coding->cmp_data->char_offset;
1122 data = coding->cmp_data->data + coding->cmp_data_start;
1123 }
1124
1125 ONE_MORE_CHAR (c);
1126 if (c == '\n' && (coding->eol_type == CODING_EOL_CRLF
1127 || coding->eol_type == CODING_EOL_CR))
1128 {
1129 if (coding->eol_type == CODING_EOL_CRLF)
1130 EMIT_TWO_BYTES ('\r', c);
1131 else
1132 EMIT_ONE_BYTE ('\r');
1133 }
1134 else if (SINGLE_BYTE_CHAR_P (c))
1135 EMIT_ONE_BYTE (c);
1136 else
1137 EMIT_BYTES (src_base, src);
1138 coding->consumed_char++;
1139 }
1140 label_end_of_loop:
1141 coding->consumed = src_base - source;
1142 coding->produced = coding->produced_char = dst - destination;
1143 return;
1144}
b73bfc1c 1145
4ed46869
KH
1146\f
1147/*** 3. ISO2022 handlers ***/
1148
1149/* The following note describes the coding system ISO2022 briefly.
39787efd 1150 Since the intention of this note is to help understand the
cfb43547 1151 functions in this file, some parts are NOT ACCURATE or are OVERLY
39787efd 1152 SIMPLIFIED. For thorough understanding, please refer to the
cfb43547
DL
1153 original document of ISO2022. This is equivalent to the standard
1154 ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
4ed46869
KH
1155
1156 ISO2022 provides many mechanisms to encode several character sets
cfb43547 1157 in 7-bit and 8-bit environments. For 7-bit environments, all text
39787efd
KH
1158 is encoded using bytes less than 128. This may make the encoded
1159 text a little bit longer, but the text passes more easily through
cfb43547 1160 several types of gateway, some of which strip off the MSB (Most
8ca3766a 1161 Significant Bit).
b73bfc1c 1162
cfb43547
DL
1163 There are two kinds of character sets: control character sets and
1164 graphic character sets. The former contain control characters such
4ed46869 1165 as `newline' and `escape' to provide control functions (control
39787efd 1166 functions are also provided by escape sequences). The latter
cfb43547 1167 contain graphic characters such as 'A' and '-'. Emacs recognizes
4ed46869
KH
1168 two control character sets and many graphic character sets.
1169
1170 Graphic character sets are classified into one of the following
39787efd
KH
1171 four classes, according to the number of bytes (DIMENSION) and
1172 number of characters in one dimension (CHARS) of the set:
1173 - DIMENSION1_CHARS94
1174 - DIMENSION1_CHARS96
1175 - DIMENSION2_CHARS94
1176 - DIMENSION2_CHARS96
1177
1178 In addition, each character set is assigned an identification tag,
cfb43547 1179 unique for each set, called the "final character" (denoted as <F>
39787efd
KH
1180 hereafter). The <F> of each character set is decided by ECMA(*)
1181 when it is registered in ISO. The code range of <F> is 0x30..0x7F
1182 (0x30..0x3F are for private use only).
4ed46869
KH
1183
1184 Note (*): ECMA = European Computer Manufacturers Association
1185
cfb43547 1186 Here are examples of graphic character sets [NAME(<F>)]:
4ed46869
KH
1187 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
1188 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
1189 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
1190 o DIMENSION2_CHARS96 -- none for the moment
1191
39787efd 1192 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
4ed46869
KH
1193 C0 [0x00..0x1F] -- control character plane 0
1194 GL [0x20..0x7F] -- graphic character plane 0
1195 C1 [0x80..0x9F] -- control character plane 1
1196 GR [0xA0..0xFF] -- graphic character plane 1
1197
1198 A control character set is directly designated and invoked to C0 or
39787efd
KH
1199 C1 by an escape sequence. The most common case is that:
1200 - ISO646's control character set is designated/invoked to C0, and
1201 - ISO6429's control character set is designated/invoked to C1,
1202 and usually these designations/invocations are omitted in encoded
1203 text. In a 7-bit environment, only C0 can be used, and a control
1204 character for C1 is encoded by an appropriate escape sequence to
1205 fit into the environment. All control characters for C1 are
1206 defined to have corresponding escape sequences.
4ed46869
KH
1207
1208 A graphic character set is at first designated to one of four
1209 graphic registers (G0 through G3), then these graphic registers are
1210 invoked to GL or GR. These designations and invocations can be
1211 done independently. The most common case is that G0 is invoked to
39787efd
KH
1212 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
1213 these invocations and designations are omitted in encoded text.
1214 In a 7-bit environment, only GL can be used.
4ed46869 1215
39787efd
KH
1216 When a graphic character set of CHARS94 is invoked to GL, codes
1217 0x20 and 0x7F of the GL area work as control characters SPACE and
1218 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
1219 be used.
4ed46869
KH
1220
1221 There are two ways of invocation: locking-shift and single-shift.
1222 With locking-shift, the invocation lasts until the next different
39787efd
KH
1223 invocation, whereas with single-shift, the invocation affects the
1224 following character only and doesn't affect the locking-shift
1225 state. Invocations are done by the following control characters or
1226 escape sequences:
4ed46869
KH
1227
1228 ----------------------------------------------------------------------
39787efd 1229 abbrev function cntrl escape seq description
4ed46869 1230 ----------------------------------------------------------------------
39787efd
KH
1231 SI/LS0 (shift-in) 0x0F none invoke G0 into GL
1232 SO/LS1 (shift-out) 0x0E none invoke G1 into GL
1233 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL
1234 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL
1235 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*)
1236 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*)
1237 LS3R (locking-shift 3 right) none ESC '|' invoke G3 into GR (*)
1238 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char
1239 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char
4ed46869 1240 ----------------------------------------------------------------------
39787efd
KH
1241 (*) These are not used by any known coding system.
1242
1243 Control characters for these functions are defined by macros
1244 ISO_CODE_XXX in `coding.h'.
4ed46869 1245
39787efd 1246 Designations are done by the following escape sequences:
4ed46869
KH
1247 ----------------------------------------------------------------------
1248 escape sequence description
1249 ----------------------------------------------------------------------
1250 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0
1251 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1
1252 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2
1253 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3
1254 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*)
1255 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1
1256 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2
1257 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3
1258 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**)
1259 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1
1260 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2
1261 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3
1262 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*)
1263 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1
1264 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2
1265 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3
1266 ----------------------------------------------------------------------
1267
1268 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
39787efd 1269 of dimension 1, chars 94, and final character <F>, etc...
4ed46869
KH
1270
1271 Note (*): Although these designations are not allowed in ISO2022,
1272 Emacs accepts them on decoding, and produces them on encoding
39787efd 1273 CHARS96 character sets in a coding system which is characterized as
4ed46869
KH
1274 7-bit environment, non-locking-shift, and non-single-shift.
1275
1276 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
39787efd 1277 '(' can be omitted. We refer to this as "short-form" hereafter.
4ed46869 1278
cfb43547 1279 Now you may notice that there are a lot of ways of encoding the
39787efd
KH
1280 same multilingual text in ISO2022. Actually, there exist many
1281 coding systems such as Compound Text (used in X11's inter client
8ca3766a
DL
1282 communication, ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
1283 (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
4ed46869
KH
1284 localized platforms), and all of these are variants of ISO2022.
1285
1286 In addition to the above, Emacs handles two more kinds of escape
1287 sequences: ISO6429's direction specification and Emacs' private
1288 sequence for specifying character composition.
1289
39787efd 1290 ISO6429's direction specification takes the following form:
4ed46869
KH
1291 o CSI ']' -- end of the current direction
1292 o CSI '0' ']' -- end of the current direction
1293 o CSI '1' ']' -- start of left-to-right text
1294 o CSI '2' ']' -- start of right-to-left text
1295 The control character CSI (0x9B: control sequence introducer) is
39787efd
KH
1296 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
1297
1298 Character composition specification takes the following form:
ec6d2bb8
KH
1299 o ESC '0' -- start relative composition
1300 o ESC '1' -- end composition
1301 o ESC '2' -- start rule-base composition (*)
1302 o ESC '3' -- start relative composition with alternate chars (**)
1303 o ESC '4' -- start rule-base composition with alternate chars (**)
b73bfc1c 1304 Since these are not standard escape sequences of any ISO standard,
cfb43547 1305 the use of them with these meanings is restricted to Emacs only.
ec6d2bb8 1306
cfb43547 1307 (*) This form is used only in Emacs 20.5 and older versions,
b73bfc1c 1308 but the newer versions can safely decode it.
cfb43547 1309 (**) This form is used only in Emacs 21.1 and newer versions,
b73bfc1c 1310 and the older versions can't decode it.
ec6d2bb8 1311
cfb43547 1312 Here's a list of example usages of these composition escape
b73bfc1c 1313 sequences (categorized by `enum composition_method').
ec6d2bb8 1314
b73bfc1c 1315 COMPOSITION_RELATIVE:
ec6d2bb8 1316 ESC 0 CHAR [ CHAR ] ESC 1
8ca3766a 1317 COMPOSITION_WITH_RULE:
ec6d2bb8 1318 ESC 2 CHAR [ RULE CHAR ] ESC 1
b73bfc1c 1319 COMPOSITION_WITH_ALTCHARS:
ec6d2bb8 1320 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 1321 COMPOSITION_WITH_RULE_ALTCHARS:
ec6d2bb8 1322 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
4ed46869
KH
1323
1324enum iso_code_class_type iso_code_class[256];
1325
05e6f5dc
KH
1326#define CHARSET_OK(idx, charset, c) \
1327 (coding_system_table[idx] \
1328 && (charset == CHARSET_ASCII \
6b89e3aa 1329 || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
05e6f5dc
KH
1330 CODING_SAFE_CHAR_P (safe_chars, c))) \
1331 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
1332 charset) \
1333 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
1334
1335#define SHIFT_OUT_OK(idx) \
1336 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
1337
b6871cc7
KH
1338#define COMPOSITION_OK(idx) \
1339 (coding_system_table[idx]->composing != COMPOSITION_DISABLED)
1340
4ed46869 1341/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
cfb43547 1342 Check if a text is encoded in ISO2022. If it is, return an
4ed46869
KH
1343 integer in which appropriate flag bits any of:
1344 CODING_CATEGORY_MASK_ISO_7
d46c5b12 1345 CODING_CATEGORY_MASK_ISO_7_TIGHT
4ed46869
KH
1346 CODING_CATEGORY_MASK_ISO_8_1
1347 CODING_CATEGORY_MASK_ISO_8_2
7717c392
KH
1348 CODING_CATEGORY_MASK_ISO_7_ELSE
1349 CODING_CATEGORY_MASK_ISO_8_ELSE
4ed46869
KH
1350 are set. If a code which should never appear in ISO2022 is found,
1351 returns 0. */
1352
0a28aafb
KH
1353static int
1354detect_coding_iso2022 (src, src_end, multibytep)
4ed46869 1355 unsigned char *src, *src_end;
0a28aafb 1356 int multibytep;
4ed46869 1357{
d46c5b12
KH
1358 int mask = CODING_CATEGORY_MASK_ISO;
1359 int mask_found = 0;
f46869e4 1360 int reg[4], shift_out = 0, single_shifting = 0;
da55a2b7 1361 int c, c1, charset;
b73bfc1c
KH
1362 /* Dummy for ONE_MORE_BYTE. */
1363 struct coding_system dummy_coding;
1364 struct coding_system *coding = &dummy_coding;
05e6f5dc 1365 Lisp_Object safe_chars;
3f003981 1366
d46c5b12 1367 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
3f003981 1368 while (mask && src < src_end)
4ed46869 1369 {
0a28aafb 1370 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
8d239c89 1371 retry:
4ed46869
KH
1372 switch (c)
1373 {
1374 case ISO_CODE_ESC:
74383408
KH
1375 if (inhibit_iso_escape_detection)
1376 break;
f46869e4 1377 single_shifting = 0;
0a28aafb 1378 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
d46c5b12 1379 if (c >= '(' && c <= '/')
4ed46869 1380 {
bf9cdd4e 1381 /* Designation sequence for a charset of dimension 1. */
0a28aafb 1382 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
1383 if (c1 < ' ' || c1 >= 0x80
1384 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
1385 /* Invalid designation sequence. Just ignore. */
1386 break;
1387 reg[(c - '(') % 4] = charset;
bf9cdd4e
KH
1388 }
1389 else if (c == '$')
1390 {
1391 /* Designation sequence for a charset of dimension 2. */
0a28aafb 1392 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
bf9cdd4e
KH
1393 if (c >= '@' && c <= 'B')
1394 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
d46c5b12 1395 reg[0] = charset = iso_charset_table[1][0][c];
bf9cdd4e 1396 else if (c >= '(' && c <= '/')
bcf26d6a 1397 {
0a28aafb 1398 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
1399 if (c1 < ' ' || c1 >= 0x80
1400 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
1401 /* Invalid designation sequence. Just ignore. */
1402 break;
1403 reg[(c - '(') % 4] = charset;
bcf26d6a 1404 }
bf9cdd4e 1405 else
d46c5b12
KH
1406 /* Invalid designation sequence. Just ignore. */
1407 break;
1408 }
ae9ff118 1409 else if (c == 'N' || c == 'O')
d46c5b12 1410 {
ae9ff118
KH
1411 /* ESC <Fe> for SS2 or SS3. */
1412 mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
d46c5b12 1413 break;
4ed46869 1414 }
ec6d2bb8
KH
1415 else if (c >= '0' && c <= '4')
1416 {
1417 /* ESC <Fp> for start/end composition. */
b6871cc7
KH
1418 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7))
1419 mask_found |= CODING_CATEGORY_MASK_ISO_7;
1420 else
1421 mask &= ~CODING_CATEGORY_MASK_ISO_7;
1422 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT))
1423 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
1424 else
1425 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
1426 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1))
1427 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
1428 else
1429 mask &= ~CODING_CATEGORY_MASK_ISO_8_1;
1430 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2))
1431 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
1432 else
1433 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1434 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE))
1435 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
1436 else
1437 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
1438 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))
1439 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
1440 else
1441 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
ec6d2bb8
KH
1442 break;
1443 }
bf9cdd4e 1444 else
d46c5b12
KH
1445 /* Invalid escape sequence. Just ignore. */
1446 break;
1447
1448 /* We found a valid designation sequence for CHARSET. */
1449 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
05e6f5dc
KH
1450 c = MAKE_CHAR (charset, 0, 0);
1451 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
d46c5b12
KH
1452 mask_found |= CODING_CATEGORY_MASK_ISO_7;
1453 else
1454 mask &= ~CODING_CATEGORY_MASK_ISO_7;
05e6f5dc 1455 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
d46c5b12
KH
1456 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
1457 else
1458 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
05e6f5dc 1459 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
ae9ff118
KH
1460 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
1461 else
d46c5b12 1462 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
05e6f5dc 1463 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
ae9ff118
KH
1464 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
1465 else
d46c5b12 1466 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
4ed46869
KH
1467 break;
1468
4ed46869 1469 case ISO_CODE_SO:
74383408
KH
1470 if (inhibit_iso_escape_detection)
1471 break;
f46869e4 1472 single_shifting = 0;
d46c5b12
KH
1473 if (shift_out == 0
1474 && (reg[1] >= 0
1475 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
1476 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
1477 {
1478 /* Locking shift out. */
1479 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1480 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1481 }
e0e989f6 1482 break;
93dec019 1483
d46c5b12 1484 case ISO_CODE_SI:
74383408
KH
1485 if (inhibit_iso_escape_detection)
1486 break;
f46869e4 1487 single_shifting = 0;
d46c5b12
KH
1488 if (shift_out == 1)
1489 {
1490 /* Locking shift in. */
1491 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1492 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1493 }
1494 break;
1495
4ed46869 1496 case ISO_CODE_CSI:
f46869e4 1497 single_shifting = 0;
4ed46869
KH
1498 case ISO_CODE_SS2:
1499 case ISO_CODE_SS3:
3f003981
KH
1500 {
1501 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
1502
74383408
KH
1503 if (inhibit_iso_escape_detection)
1504 break;
70c22245
KH
1505 if (c != ISO_CODE_CSI)
1506 {
d46c5b12
KH
1507 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1508 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1509 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1510 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1511 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1512 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
f46869e4 1513 single_shifting = 1;
70c22245 1514 }
3f003981
KH
1515 if (VECTORP (Vlatin_extra_code_table)
1516 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
1517 {
d46c5b12
KH
1518 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1519 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1520 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1521 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1522 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1523 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1524 }
1525 mask &= newmask;
d46c5b12 1526 mask_found |= newmask;
3f003981
KH
1527 }
1528 break;
4ed46869
KH
1529
1530 default:
1531 if (c < 0x80)
f46869e4
KH
1532 {
1533 single_shifting = 0;
1534 break;
1535 }
4ed46869 1536 else if (c < 0xA0)
c4825358 1537 {
f46869e4 1538 single_shifting = 0;
3f003981
KH
1539 if (VECTORP (Vlatin_extra_code_table)
1540 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
c4825358 1541 {
3f003981
KH
1542 int newmask = 0;
1543
d46c5b12
KH
1544 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1545 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1546 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1547 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1548 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1549 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1550 mask &= newmask;
d46c5b12 1551 mask_found |= newmask;
c4825358 1552 }
3f003981
KH
1553 else
1554 return 0;
c4825358 1555 }
4ed46869
KH
1556 else
1557 {
d46c5b12 1558 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
7717c392 1559 | CODING_CATEGORY_MASK_ISO_7_ELSE);
d46c5b12 1560 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
f46869e4
KH
1561 /* Check the length of succeeding codes of the range
1562 0xA0..0FF. If the byte length is odd, we exclude
1563 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1564 when we are not single shifting. */
b73bfc1c
KH
1565 if (!single_shifting
1566 && mask & CODING_CATEGORY_MASK_ISO_8_2)
f46869e4 1567 {
e17de821 1568 int i = 1;
8d239c89
KH
1569
1570 c = -1;
b73bfc1c
KH
1571 while (src < src_end)
1572 {
0a28aafb 1573 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
1574 if (c < 0xA0)
1575 break;
1576 i++;
1577 }
1578
1579 if (i & 1 && src < src_end)
f46869e4
KH
1580 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1581 else
1582 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
8d239c89
KH
1583 if (c >= 0)
1584 /* This means that we have read one extra byte. */
1585 goto retry;
f46869e4 1586 }
4ed46869
KH
1587 }
1588 break;
1589 }
1590 }
b73bfc1c 1591 label_end_of_loop:
d46c5b12 1592 return (mask & mask_found);
4ed46869
KH
1593}
1594
b73bfc1c
KH
1595/* Decode a character of which charset is CHARSET, the 1st position
1596 code is C1, the 2nd position code is C2, and return the decoded
1597 character code. If the variable `translation_table' is non-nil,
1598 returned the translated code. */
ec6d2bb8 1599
b73bfc1c
KH
1600#define DECODE_ISO_CHARACTER(charset, c1, c2) \
1601 (NILP (translation_table) \
1602 ? MAKE_CHAR (charset, c1, c2) \
1603 : translate_char (translation_table, -1, charset, c1, c2))
4ed46869
KH
1604
1605/* Set designation state into CODING. */
d46c5b12
KH
1606#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1607 do { \
05e6f5dc 1608 int charset, c; \
944bd420
KH
1609 \
1610 if (final_char < '0' || final_char >= 128) \
1611 goto label_invalid_code; \
1612 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1613 make_number (chars), \
1614 make_number (final_char)); \
05e6f5dc 1615 c = MAKE_CHAR (charset, 0, 0); \
d46c5b12 1616 if (charset >= 0 \
704c5781 1617 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
05e6f5dc 1618 || CODING_SAFE_CHAR_P (safe_chars, c))) \
d46c5b12
KH
1619 { \
1620 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1621 && reg == 0 \
1622 && charset == CHARSET_ASCII) \
1623 { \
1624 /* We should insert this designation sequence as is so \
1625 that it is surely written back to a file. */ \
1626 coding->spec.iso2022.last_invalid_designation_register = -1; \
1627 goto label_invalid_code; \
1628 } \
1629 coding->spec.iso2022.last_invalid_designation_register = -1; \
1630 if ((coding->mode & CODING_MODE_DIRECTION) \
1631 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1632 charset = CHARSET_REVERSE_CHARSET (charset); \
1633 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1634 } \
1635 else \
1636 { \
1637 coding->spec.iso2022.last_invalid_designation_register = reg; \
1638 goto label_invalid_code; \
1639 } \
4ed46869
KH
1640 } while (0)
1641
ec6d2bb8
KH
1642/* Allocate a memory block for storing information about compositions.
1643 The block is chained to the already allocated blocks. */
d46c5b12 1644
33fb63eb 1645void
ec6d2bb8 1646coding_allocate_composition_data (coding, char_offset)
d46c5b12 1647 struct coding_system *coding;
ec6d2bb8 1648 int char_offset;
d46c5b12 1649{
ec6d2bb8
KH
1650 struct composition_data *cmp_data
1651 = (struct composition_data *) xmalloc (sizeof *cmp_data);
1652
1653 cmp_data->char_offset = char_offset;
1654 cmp_data->used = 0;
1655 cmp_data->prev = coding->cmp_data;
1656 cmp_data->next = NULL;
1657 if (coding->cmp_data)
1658 coding->cmp_data->next = cmp_data;
1659 coding->cmp_data = cmp_data;
1660 coding->cmp_data_start = 0;
1661}
d46c5b12 1662
aa72b389
KH
1663/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
1664 ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
1665 ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
1666 ESC 3 : altchar composition : ESC 3 ALT ... ESC 0 CHAR ... ESC 1
1667 ESC 4 : alt&rule composition : ESC 4 ALT RULE .. ALT ESC 0 CHAR ... ESC 1
1668 */
ec6d2bb8 1669
33fb63eb
KH
1670#define DECODE_COMPOSITION_START(c1) \
1671 do { \
1672 if (coding->composing == COMPOSITION_DISABLED) \
1673 { \
1674 *dst++ = ISO_CODE_ESC; \
1675 *dst++ = c1 & 0x7f; \
1676 coding->produced_char += 2; \
1677 } \
1678 else if (!COMPOSING_P (coding)) \
1679 { \
1680 /* This is surely the start of a composition. We must be sure \
1681 that coding->cmp_data has enough space to store the \
1682 information about the composition. If not, terminate the \
1683 current decoding loop, allocate one more memory block for \
8ca3766a 1684 coding->cmp_data in the caller, then start the decoding \
33fb63eb
KH
1685 loop again. We can't allocate memory here directly because \
1686 it may cause buffer/string relocation. */ \
1687 if (!coding->cmp_data \
1688 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1689 >= COMPOSITION_DATA_SIZE)) \
1690 { \
1691 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1692 goto label_end_of_loop; \
1693 } \
1694 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1695 : c1 == '2' ? COMPOSITION_WITH_RULE \
1696 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1697 : COMPOSITION_WITH_RULE_ALTCHARS); \
1698 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1699 coding->composing); \
1700 coding->composition_rule_follows = 0; \
1701 } \
1702 else \
1703 { \
1704 /* We are already handling a composition. If the method is \
1705 the following two, the codes following the current escape \
1706 sequence are actual characters stored in a buffer. */ \
1707 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1708 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1709 { \
1710 coding->composing = COMPOSITION_RELATIVE; \
1711 coding->composition_rule_follows = 0; \
1712 } \
1713 } \
ec6d2bb8
KH
1714 } while (0)
1715
8ca3766a 1716/* Handle composition end sequence ESC 1. */
ec6d2bb8
KH
1717
1718#define DECODE_COMPOSITION_END(c1) \
1719 do { \
93dec019 1720 if (! COMPOSING_P (coding)) \
ec6d2bb8
KH
1721 { \
1722 *dst++ = ISO_CODE_ESC; \
1723 *dst++ = c1; \
1724 coding->produced_char += 2; \
1725 } \
1726 else \
1727 { \
1728 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1729 coding->composing = COMPOSITION_NO; \
1730 } \
1731 } while (0)
1732
1733/* Decode a composition rule from the byte C1 (and maybe one more byte
1734 from SRC) and store one encoded composition rule in
1735 coding->cmp_data. */
1736
1737#define DECODE_COMPOSITION_RULE(c1) \
1738 do { \
1739 int rule = 0; \
1740 (c1) -= 32; \
1741 if (c1 < 81) /* old format (before ver.21) */ \
1742 { \
1743 int gref = (c1) / 9; \
1744 int nref = (c1) % 9; \
1745 if (gref == 4) gref = 10; \
1746 if (nref == 4) nref = 10; \
1747 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1748 } \
b73bfc1c 1749 else if (c1 < 93) /* new format (after ver.21) */ \
ec6d2bb8
KH
1750 { \
1751 ONE_MORE_BYTE (c2); \
1752 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1753 } \
1754 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1755 coding->composition_rule_follows = 0; \
1756 } while (0)
88993dfd 1757
d46c5b12 1758
4ed46869
KH
1759/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1760
b73bfc1c 1761static void
d46c5b12 1762decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
1763 struct coding_system *coding;
1764 unsigned char *source, *destination;
1765 int src_bytes, dst_bytes;
4ed46869
KH
1766{
1767 unsigned char *src = source;
1768 unsigned char *src_end = source + src_bytes;
1769 unsigned char *dst = destination;
1770 unsigned char *dst_end = destination + dst_bytes;
4ed46869
KH
1771 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1772 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1773 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
b73bfc1c
KH
1774 /* SRC_BASE remembers the start position in source in each loop.
1775 The loop will be exited when there's not enough source code
1776 (within macro ONE_MORE_BYTE), or when there's not enough
1777 destination area to produce a character (within macro
1778 EMIT_CHAR). */
1779 unsigned char *src_base;
1780 int c, charset;
1781 Lisp_Object translation_table;
05e6f5dc
KH
1782 Lisp_Object safe_chars;
1783
6b89e3aa 1784 safe_chars = coding_safe_chars (coding->symbol);
bdd9fb48 1785
b73bfc1c
KH
1786 if (NILP (Venable_character_translation))
1787 translation_table = Qnil;
1788 else
1789 {
1790 translation_table = coding->translation_table_for_decode;
1791 if (NILP (translation_table))
1792 translation_table = Vstandard_translation_table_for_decode;
1793 }
4ed46869 1794
b73bfc1c
KH
1795 coding->result = CODING_FINISH_NORMAL;
1796
1797 while (1)
4ed46869 1798 {
b73bfc1c
KH
1799 int c1, c2;
1800
1801 src_base = src;
1802 ONE_MORE_BYTE (c1);
4ed46869 1803
ec6d2bb8 1804 /* We produce no character or one character. */
4ed46869
KH
1805 switch (iso_code_class [c1])
1806 {
1807 case ISO_0x20_or_0x7F:
ec6d2bb8
KH
1808 if (COMPOSING_P (coding) && coding->composition_rule_follows)
1809 {
1810 DECODE_COMPOSITION_RULE (c1);
b73bfc1c 1811 continue;
ec6d2bb8
KH
1812 }
1813 if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
4ed46869
KH
1814 {
1815 /* This is SPACE or DEL. */
b73bfc1c 1816 charset = CHARSET_ASCII;
4ed46869
KH
1817 break;
1818 }
1819 /* This is a graphic character, we fall down ... */
1820
1821 case ISO_graphic_plane_0:
ec6d2bb8 1822 if (COMPOSING_P (coding) && coding->composition_rule_follows)
b73bfc1c
KH
1823 {
1824 DECODE_COMPOSITION_RULE (c1);
1825 continue;
1826 }
1827 charset = charset0;
4ed46869
KH
1828 break;
1829
1830 case ISO_0xA0_or_0xFF:
d46c5b12
KH
1831 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1832 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
fb88bf2d 1833 goto label_invalid_code;
4ed46869
KH
1834 /* This is a graphic character, we fall down ... */
1835
1836 case ISO_graphic_plane_1:
b73bfc1c 1837 if (charset1 < 0)
fb88bf2d 1838 goto label_invalid_code;
b73bfc1c 1839 charset = charset1;
4ed46869
KH
1840 break;
1841
b73bfc1c 1842 case ISO_control_0:
ec6d2bb8
KH
1843 if (COMPOSING_P (coding))
1844 DECODE_COMPOSITION_END ('1');
1845
4ed46869
KH
1846 /* All ISO2022 control characters in this class have the
1847 same representation in Emacs internal format. */
d46c5b12
KH
1848 if (c1 == '\n'
1849 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1850 && (coding->eol_type == CODING_EOL_CR
1851 || coding->eol_type == CODING_EOL_CRLF))
1852 {
b73bfc1c
KH
1853 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1854 goto label_end_of_loop;
d46c5b12 1855 }
b73bfc1c 1856 charset = CHARSET_ASCII;
4ed46869
KH
1857 break;
1858
b73bfc1c
KH
1859 case ISO_control_1:
1860 if (COMPOSING_P (coding))
1861 DECODE_COMPOSITION_END ('1');
1862 goto label_invalid_code;
1863
4ed46869 1864 case ISO_carriage_return:
ec6d2bb8
KH
1865 if (COMPOSING_P (coding))
1866 DECODE_COMPOSITION_END ('1');
1867
4ed46869 1868 if (coding->eol_type == CODING_EOL_CR)
b73bfc1c 1869 c1 = '\n';
4ed46869
KH
1870 else if (coding->eol_type == CODING_EOL_CRLF)
1871 {
1872 ONE_MORE_BYTE (c1);
b73bfc1c 1873 if (c1 != ISO_CODE_LF)
4ed46869
KH
1874 {
1875 src--;
b73bfc1c 1876 c1 = '\r';
4ed46869
KH
1877 }
1878 }
b73bfc1c 1879 charset = CHARSET_ASCII;
4ed46869
KH
1880 break;
1881
1882 case ISO_shift_out:
d46c5b12
KH
1883 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1884 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1885 goto label_invalid_code;
4ed46869
KH
1886 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
1887 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1888 continue;
4ed46869
KH
1889
1890 case ISO_shift_in:
d46c5b12
KH
1891 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1892 goto label_invalid_code;
4ed46869
KH
1893 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
1894 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1895 continue;
4ed46869
KH
1896
1897 case ISO_single_shift_2_7:
1898 case ISO_single_shift_2:
d46c5b12
KH
1899 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1900 goto label_invalid_code;
4ed46869
KH
1901 /* SS2 is handled as an escape sequence of ESC 'N' */
1902 c1 = 'N';
1903 goto label_escape_sequence;
1904
1905 case ISO_single_shift_3:
d46c5b12
KH
1906 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1907 goto label_invalid_code;
4ed46869
KH
1908 /* SS2 is handled as an escape sequence of ESC 'O' */
1909 c1 = 'O';
1910 goto label_escape_sequence;
1911
1912 case ISO_control_sequence_introducer:
1913 /* CSI is handled as an escape sequence of ESC '[' ... */
1914 c1 = '[';
1915 goto label_escape_sequence;
1916
1917 case ISO_escape:
1918 ONE_MORE_BYTE (c1);
1919 label_escape_sequence:
1920 /* Escape sequences handled by Emacs are invocation,
1921 designation, direction specification, and character
1922 composition specification. */
1923 switch (c1)
1924 {
1925 case '&': /* revision of following character set */
1926 ONE_MORE_BYTE (c1);
1927 if (!(c1 >= '@' && c1 <= '~'))
d46c5b12 1928 goto label_invalid_code;
4ed46869
KH
1929 ONE_MORE_BYTE (c1);
1930 if (c1 != ISO_CODE_ESC)
d46c5b12 1931 goto label_invalid_code;
4ed46869
KH
1932 ONE_MORE_BYTE (c1);
1933 goto label_escape_sequence;
1934
1935 case '$': /* designation of 2-byte character set */
d46c5b12
KH
1936 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1937 goto label_invalid_code;
4ed46869
KH
1938 ONE_MORE_BYTE (c1);
1939 if (c1 >= '@' && c1 <= 'B')
1940 { /* designation of JISX0208.1978, GB2312.1980,
88993dfd 1941 or JISX0208.1980 */
4ed46869
KH
1942 DECODE_DESIGNATION (0, 2, 94, c1);
1943 }
1944 else if (c1 >= 0x28 && c1 <= 0x2B)
1945 { /* designation of DIMENSION2_CHARS94 character set */
1946 ONE_MORE_BYTE (c2);
1947 DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
1948 }
1949 else if (c1 >= 0x2C && c1 <= 0x2F)
1950 { /* designation of DIMENSION2_CHARS96 character set */
1951 ONE_MORE_BYTE (c2);
1952 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
1953 }
1954 else
d46c5b12 1955 goto label_invalid_code;
b73bfc1c
KH
1956 /* We must update these variables now. */
1957 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1958 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1959 continue;
4ed46869
KH
1960
1961 case 'n': /* invocation of locking-shift-2 */
d46c5b12
KH
1962 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1963 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1964 goto label_invalid_code;
4ed46869 1965 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
e0e989f6 1966 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1967 continue;
4ed46869
KH
1968
1969 case 'o': /* invocation of locking-shift-3 */
d46c5b12
KH
1970 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1971 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1972 goto label_invalid_code;
4ed46869 1973 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
e0e989f6 1974 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1975 continue;
4ed46869
KH
1976
1977 case 'N': /* invocation of single-shift-2 */
d46c5b12
KH
1978 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1979 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1980 goto label_invalid_code;
4ed46869 1981 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
b73bfc1c 1982 ONE_MORE_BYTE (c1);
e7046a18
KH
1983 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1984 goto label_invalid_code;
4ed46869
KH
1985 break;
1986
1987 case 'O': /* invocation of single-shift-3 */
d46c5b12
KH
1988 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1989 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1990 goto label_invalid_code;
4ed46869 1991 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
b73bfc1c 1992 ONE_MORE_BYTE (c1);
e7046a18
KH
1993 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
1994 goto label_invalid_code;
4ed46869
KH
1995 break;
1996
ec6d2bb8
KH
1997 case '0': case '2': case '3': case '4': /* start composition */
1998 DECODE_COMPOSITION_START (c1);
b73bfc1c 1999 continue;
4ed46869 2000
ec6d2bb8
KH
2001 case '1': /* end composition */
2002 DECODE_COMPOSITION_END (c1);
b73bfc1c 2003 continue;
4ed46869
KH
2004
2005 case '[': /* specification of direction */
d46c5b12
KH
2006 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
2007 goto label_invalid_code;
4ed46869 2008 /* For the moment, nested direction is not supported.
d46c5b12 2009 So, `coding->mode & CODING_MODE_DIRECTION' zero means
8ca3766a 2010 left-to-right, and nonzero means right-to-left. */
4ed46869
KH
2011 ONE_MORE_BYTE (c1);
2012 switch (c1)
2013 {
2014 case ']': /* end of the current direction */
d46c5b12 2015 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869
KH
2016
2017 case '0': /* end of the current direction */
2018 case '1': /* start of left-to-right direction */
2019 ONE_MORE_BYTE (c1);
2020 if (c1 == ']')
d46c5b12 2021 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869 2022 else
d46c5b12 2023 goto label_invalid_code;
4ed46869
KH
2024 break;
2025
2026 case '2': /* start of right-to-left direction */
2027 ONE_MORE_BYTE (c1);
2028 if (c1 == ']')
d46c5b12 2029 coding->mode |= CODING_MODE_DIRECTION;
4ed46869 2030 else
d46c5b12 2031 goto label_invalid_code;
4ed46869
KH
2032 break;
2033
2034 default:
d46c5b12 2035 goto label_invalid_code;
4ed46869 2036 }
b73bfc1c 2037 continue;
4ed46869
KH
2038
2039 default:
d46c5b12
KH
2040 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
2041 goto label_invalid_code;
4ed46869
KH
2042 if (c1 >= 0x28 && c1 <= 0x2B)
2043 { /* designation of DIMENSION1_CHARS94 character set */
2044 ONE_MORE_BYTE (c2);
2045 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
2046 }
2047 else if (c1 >= 0x2C && c1 <= 0x2F)
2048 { /* designation of DIMENSION1_CHARS96 character set */
2049 ONE_MORE_BYTE (c2);
2050 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
2051 }
2052 else
b73bfc1c
KH
2053 goto label_invalid_code;
2054 /* We must update these variables now. */
2055 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
2056 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
2057 continue;
4ed46869 2058 }
b73bfc1c 2059 }
4ed46869 2060
b73bfc1c
KH
2061 /* Now we know CHARSET and 1st position code C1 of a character.
2062 Produce a multibyte sequence for that character while getting
2063 2nd position code C2 if necessary. */
2064 if (CHARSET_DIMENSION (charset) == 2)
2065 {
2066 ONE_MORE_BYTE (c2);
2067 if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
2068 /* C2 is not in a valid range. */
2069 goto label_invalid_code;
4ed46869 2070 }
b73bfc1c
KH
2071 c = DECODE_ISO_CHARACTER (charset, c1, c2);
2072 EMIT_CHAR (c);
4ed46869
KH
2073 continue;
2074
b73bfc1c
KH
2075 label_invalid_code:
2076 coding->errors++;
2077 if (COMPOSING_P (coding))
2078 DECODE_COMPOSITION_END ('1');
4ed46869 2079 src = src_base;
b73bfc1c
KH
2080 c = *src++;
2081 EMIT_CHAR (c);
4ed46869 2082 }
fb88bf2d 2083
b73bfc1c
KH
2084 label_end_of_loop:
2085 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 2086 coding->produced = dst - destination;
b73bfc1c 2087 return;
4ed46869
KH
2088}
2089
b73bfc1c 2090
f4dee582 2091/* ISO2022 encoding stuff. */
4ed46869
KH
2092
2093/*
f4dee582 2094 It is not enough to say just "ISO2022" on encoding, we have to
cfb43547 2095 specify more details. In Emacs, each ISO2022 coding system
4ed46869 2096 variant has the following specifications:
8ca3766a 2097 1. Initial designation to G0 through G3.
4ed46869
KH
2098 2. Allows short-form designation?
2099 3. ASCII should be designated to G0 before control characters?
2100 4. ASCII should be designated to G0 at end of line?
2101 5. 7-bit environment or 8-bit environment?
2102 6. Use locking-shift?
2103 7. Use Single-shift?
2104 And the following two are only for Japanese:
2105 8. Use ASCII in place of JIS0201-1976-Roman?
2106 9. Use JISX0208-1983 in place of JISX0208-1978?
2107 These specifications are encoded in `coding->flags' as flag bits
2108 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
f4dee582 2109 details.
4ed46869
KH
2110*/
2111
2112/* Produce codes (escape sequence) for designating CHARSET to graphic
b73bfc1c
KH
2113 register REG at DST, and increment DST. If <final-char> of CHARSET is
2114 '@', 'A', or 'B' and the coding system CODING allows, produce
2115 designation sequence of short-form. */
4ed46869
KH
2116
2117#define ENCODE_DESIGNATION(charset, reg, coding) \
2118 do { \
2119 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
2120 char *intermediate_char_94 = "()*+"; \
2121 char *intermediate_char_96 = ",-./"; \
70c22245 2122 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
b73bfc1c 2123 \
70c22245
KH
2124 if (revision < 255) \
2125 { \
4ed46869
KH
2126 *dst++ = ISO_CODE_ESC; \
2127 *dst++ = '&'; \
70c22245 2128 *dst++ = '@' + revision; \
4ed46869 2129 } \
b73bfc1c 2130 *dst++ = ISO_CODE_ESC; \
4ed46869
KH
2131 if (CHARSET_DIMENSION (charset) == 1) \
2132 { \
2133 if (CHARSET_CHARS (charset) == 94) \
2134 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2135 else \
2136 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
2137 } \
2138 else \
2139 { \
2140 *dst++ = '$'; \
2141 if (CHARSET_CHARS (charset) == 94) \
2142 { \
b73bfc1c
KH
2143 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
2144 || reg != 0 \
2145 || final_char < '@' || final_char > 'B') \
4ed46869
KH
2146 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2147 } \
2148 else \
b73bfc1c 2149 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
4ed46869 2150 } \
b73bfc1c 2151 *dst++ = final_char; \
4ed46869
KH
2152 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
2153 } while (0)
2154
2155/* The following two macros produce codes (control character or escape
2156 sequence) for ISO2022 single-shift functions (single-shift-2 and
2157 single-shift-3). */
2158
2159#define ENCODE_SINGLE_SHIFT_2 \
2160 do { \
2161 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2162 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
2163 else \
b73bfc1c 2164 *dst++ = ISO_CODE_SS2; \
4ed46869
KH
2165 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2166 } while (0)
2167
fb88bf2d
KH
2168#define ENCODE_SINGLE_SHIFT_3 \
2169 do { \
4ed46869 2170 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
fb88bf2d
KH
2171 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
2172 else \
b73bfc1c 2173 *dst++ = ISO_CODE_SS3; \
4ed46869
KH
2174 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2175 } while (0)
2176
2177/* The following four macros produce codes (control character or
2178 escape sequence) for ISO2022 locking-shift functions (shift-in,
2179 shift-out, locking-shift-2, and locking-shift-3). */
2180
b73bfc1c
KH
2181#define ENCODE_SHIFT_IN \
2182 do { \
2183 *dst++ = ISO_CODE_SI; \
4ed46869
KH
2184 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
2185 } while (0)
2186
b73bfc1c
KH
2187#define ENCODE_SHIFT_OUT \
2188 do { \
2189 *dst++ = ISO_CODE_SO; \
4ed46869
KH
2190 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
2191 } while (0)
2192
2193#define ENCODE_LOCKING_SHIFT_2 \
2194 do { \
2195 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
2196 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
2197 } while (0)
2198
b73bfc1c
KH
2199#define ENCODE_LOCKING_SHIFT_3 \
2200 do { \
2201 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
4ed46869
KH
2202 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
2203 } while (0)
2204
f4dee582
RS
2205/* Produce codes for a DIMENSION1 character whose character set is
2206 CHARSET and whose position-code is C1. Designation and invocation
4ed46869
KH
2207 sequences are also produced in advance if necessary. */
2208
6e85d753
KH
2209#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
2210 do { \
2211 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2212 { \
2213 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2214 *dst++ = c1 & 0x7F; \
2215 else \
2216 *dst++ = c1 | 0x80; \
2217 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2218 break; \
2219 } \
2220 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2221 { \
2222 *dst++ = c1 & 0x7F; \
2223 break; \
2224 } \
2225 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2226 { \
2227 *dst++ = c1 | 0x80; \
2228 break; \
2229 } \
6e85d753
KH
2230 else \
2231 /* Since CHARSET is not yet invoked to any graphic planes, we \
2232 must invoke it, or, at first, designate it to some graphic \
2233 register. Then repeat the loop to actually produce the \
2234 character. */ \
2235 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
2236 } while (1)
2237
f4dee582
RS
2238/* Produce codes for a DIMENSION2 character whose character set is
2239 CHARSET and whose position-codes are C1 and C2. Designation and
4ed46869
KH
2240 invocation codes are also produced in advance if necessary. */
2241
6e85d753
KH
2242#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
2243 do { \
2244 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2245 { \
2246 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2247 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
2248 else \
2249 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
2250 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2251 break; \
2252 } \
2253 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2254 { \
2255 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
2256 break; \
2257 } \
2258 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2259 { \
2260 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
2261 break; \
2262 } \
6e85d753
KH
2263 else \
2264 /* Since CHARSET is not yet invoked to any graphic planes, we \
2265 must invoke it, or, at first, designate it to some graphic \
2266 register. Then repeat the loop to actually produce the \
2267 character. */ \
2268 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
2269 } while (1)
2270
05e6f5dc
KH
2271#define ENCODE_ISO_CHARACTER(c) \
2272 do { \
2273 int charset, c1, c2; \
2274 \
2275 SPLIT_CHAR (c, charset, c1, c2); \
2276 if (CHARSET_DEFINED_P (charset)) \
2277 { \
2278 if (CHARSET_DIMENSION (charset) == 1) \
2279 { \
2280 if (charset == CHARSET_ASCII \
2281 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
2282 charset = charset_latin_jisx0201; \
2283 ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
2284 } \
2285 else \
2286 { \
2287 if (charset == charset_jisx0208 \
2288 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
2289 charset = charset_jisx0208_1978; \
2290 ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
2291 } \
2292 } \
2293 else \
2294 { \
2295 *dst++ = c1; \
2296 if (c2 >= 0) \
2297 *dst++ = c2; \
2298 } \
2299 } while (0)
2300
2301
2302/* Instead of encoding character C, produce one or two `?'s. */
2303
2304#define ENCODE_UNSAFE_CHARACTER(c) \
6f551029 2305 do { \
05e6f5dc
KH
2306 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
2307 if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
2308 ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
84fbb8a0 2309 } while (0)
bdd9fb48 2310
05e6f5dc 2311
4ed46869
KH
2312/* Produce designation and invocation codes at a place pointed by DST
2313 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
2314 Return new DST. */
2315
2316unsigned char *
2317encode_invocation_designation (charset, coding, dst)
2318 int charset;
2319 struct coding_system *coding;
2320 unsigned char *dst;
2321{
2322 int reg; /* graphic register number */
2323
2324 /* At first, check designations. */
2325 for (reg = 0; reg < 4; reg++)
2326 if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
2327 break;
2328
2329 if (reg >= 4)
2330 {
2331 /* CHARSET is not yet designated to any graphic registers. */
2332 /* At first check the requested designation. */
2333 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1ba9e4ab
KH
2334 if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
2335 /* Since CHARSET requests no special designation, designate it
2336 to graphic register 0. */
4ed46869
KH
2337 reg = 0;
2338
2339 ENCODE_DESIGNATION (charset, reg, coding);
2340 }
2341
2342 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
2343 && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
2344 {
2345 /* Since the graphic register REG is not invoked to any graphic
2346 planes, invoke it to graphic plane 0. */
2347 switch (reg)
2348 {
2349 case 0: /* graphic register 0 */
2350 ENCODE_SHIFT_IN;
2351 break;
2352
2353 case 1: /* graphic register 1 */
2354 ENCODE_SHIFT_OUT;
2355 break;
2356
2357 case 2: /* graphic register 2 */
2358 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2359 ENCODE_SINGLE_SHIFT_2;
2360 else
2361 ENCODE_LOCKING_SHIFT_2;
2362 break;
2363
2364 case 3: /* graphic register 3 */
2365 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2366 ENCODE_SINGLE_SHIFT_3;
2367 else
2368 ENCODE_LOCKING_SHIFT_3;
2369 break;
2370 }
2371 }
b73bfc1c 2372
4ed46869
KH
2373 return dst;
2374}
2375
ec6d2bb8
KH
2376/* Produce 2-byte codes for encoded composition rule RULE. */
2377
2378#define ENCODE_COMPOSITION_RULE(rule) \
2379 do { \
2380 int gref, nref; \
2381 COMPOSITION_DECODE_RULE (rule, gref, nref); \
2382 *dst++ = 32 + 81 + gref; \
2383 *dst++ = 32 + nref; \
2384 } while (0)
2385
2386/* Produce codes for indicating the start of a composition sequence
2387 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
2388 which specify information about the composition. See the comment
2389 in coding.h for the format of DATA. */
2390
2391#define ENCODE_COMPOSITION_START(coding, data) \
2392 do { \
2393 coding->composing = data[3]; \
2394 *dst++ = ISO_CODE_ESC; \
2395 if (coding->composing == COMPOSITION_RELATIVE) \
2396 *dst++ = '0'; \
2397 else \
2398 { \
2399 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
2400 ? '3' : '4'); \
2401 coding->cmp_data_index = coding->cmp_data_start + 4; \
2402 coding->composition_rule_follows = 0; \
2403 } \
2404 } while (0)
2405
2406/* Produce codes for indicating the end of the current composition. */
2407
2408#define ENCODE_COMPOSITION_END(coding, data) \
2409 do { \
2410 *dst++ = ISO_CODE_ESC; \
2411 *dst++ = '1'; \
2412 coding->cmp_data_start += data[0]; \
2413 coding->composing = COMPOSITION_NO; \
2414 if (coding->cmp_data_start == coding->cmp_data->used \
2415 && coding->cmp_data->next) \
2416 { \
2417 coding->cmp_data = coding->cmp_data->next; \
2418 coding->cmp_data_start = 0; \
2419 } \
2420 } while (0)
2421
2422/* Produce composition start sequence ESC 0. Here, this sequence
2423 doesn't mean the start of a new composition but means that we have
2424 just produced components (alternate chars and composition rules) of
2425 the composition and the actual text follows in SRC. */
2426
2427#define ENCODE_COMPOSITION_FAKE_START(coding) \
2428 do { \
2429 *dst++ = ISO_CODE_ESC; \
2430 *dst++ = '0'; \
2431 coding->composing = COMPOSITION_RELATIVE; \
2432 } while (0)
4ed46869
KH
2433
2434/* The following three macros produce codes for indicating direction
2435 of text. */
b73bfc1c
KH
2436#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
2437 do { \
4ed46869 2438 if (coding->flags == CODING_FLAG_ISO_SEVEN_BITS) \
b73bfc1c
KH
2439 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
2440 else \
2441 *dst++ = ISO_CODE_CSI; \
4ed46869
KH
2442 } while (0)
2443
2444#define ENCODE_DIRECTION_R2L \
b73bfc1c 2445 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'
4ed46869
KH
2446
2447#define ENCODE_DIRECTION_L2R \
b73bfc1c 2448 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'
4ed46869
KH
2449
2450/* Produce codes for designation and invocation to reset the graphic
2451 planes and registers to initial state. */
e0e989f6
KH
2452#define ENCODE_RESET_PLANE_AND_REGISTER \
2453 do { \
2454 int reg; \
2455 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
2456 ENCODE_SHIFT_IN; \
2457 for (reg = 0; reg < 4; reg++) \
2458 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
2459 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
2460 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
2461 ENCODE_DESIGNATION \
2462 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
4ed46869
KH
2463 } while (0)
2464
bdd9fb48 2465/* Produce designation sequences of charsets in the line started from
b73bfc1c 2466 SRC to a place pointed by DST, and return updated DST.
bdd9fb48
KH
2467
2468 If the current block ends before any end-of-line, we may fail to
d46c5b12
KH
2469 find all the necessary designations. */
2470
b73bfc1c
KH
2471static unsigned char *
2472encode_designation_at_bol (coding, translation_table, src, src_end, dst)
e0e989f6 2473 struct coding_system *coding;
b73bfc1c
KH
2474 Lisp_Object translation_table;
2475 unsigned char *src, *src_end, *dst;
e0e989f6 2476{
bdd9fb48
KH
2477 int charset, c, found = 0, reg;
2478 /* Table of charsets to be designated to each graphic register. */
2479 int r[4];
bdd9fb48
KH
2480
2481 for (reg = 0; reg < 4; reg++)
2482 r[reg] = -1;
2483
b73bfc1c 2484 while (found < 4)
e0e989f6 2485 {
b73bfc1c
KH
2486 ONE_MORE_CHAR (c);
2487 if (c == '\n')
2488 break;
93dec019 2489
b73bfc1c 2490 charset = CHAR_CHARSET (c);
e0e989f6 2491 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
d46c5b12 2492 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
bdd9fb48
KH
2493 {
2494 found++;
2495 r[reg] = charset;
2496 }
bdd9fb48
KH
2497 }
2498
b73bfc1c 2499 label_end_of_loop:
bdd9fb48
KH
2500 if (found)
2501 {
2502 for (reg = 0; reg < 4; reg++)
2503 if (r[reg] >= 0
2504 && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
2505 ENCODE_DESIGNATION (r[reg], reg, coding);
e0e989f6 2506 }
b73bfc1c
KH
2507
2508 return dst;
e0e989f6
KH
2509}
2510
4ed46869
KH
2511/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
2512
b73bfc1c 2513static void
d46c5b12 2514encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
2515 struct coding_system *coding;
2516 unsigned char *source, *destination;
2517 int src_bytes, dst_bytes;
4ed46869
KH
2518{
2519 unsigned char *src = source;
2520 unsigned char *src_end = source + src_bytes;
2521 unsigned char *dst = destination;
2522 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c 2523 /* Since the maximum bytes produced by each loop is 20, we subtract 19
4ed46869
KH
2524 from DST_END to assure overflow checking is necessary only at the
2525 head of loop. */
b73bfc1c
KH
2526 unsigned char *adjusted_dst_end = dst_end - 19;
2527 /* SRC_BASE remembers the start position in source in each loop.
2528 The loop will be exited when there's not enough source text to
2529 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2530 there's not enough destination area to produce encoded codes
2531 (within macro EMIT_BYTES). */
2532 unsigned char *src_base;
2533 int c;
2534 Lisp_Object translation_table;
05e6f5dc
KH
2535 Lisp_Object safe_chars;
2536
6b89e3aa 2537 safe_chars = coding_safe_chars (coding->symbol);
bdd9fb48 2538
b73bfc1c
KH
2539 if (NILP (Venable_character_translation))
2540 translation_table = Qnil;
2541 else
2542 {
2543 translation_table = coding->translation_table_for_encode;
2544 if (NILP (translation_table))
2545 translation_table = Vstandard_translation_table_for_encode;
2546 }
4ed46869 2547
d46c5b12 2548 coding->consumed_char = 0;
b73bfc1c
KH
2549 coding->errors = 0;
2550 while (1)
4ed46869 2551 {
b73bfc1c
KH
2552 src_base = src;
2553
2554 if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
2555 {
2556 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2557 break;
2558 }
4ed46869 2559
e0e989f6
KH
2560 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
2561 && CODING_SPEC_ISO_BOL (coding))
2562 {
bdd9fb48 2563 /* We have to produce designation sequences if any now. */
b73bfc1c
KH
2564 dst = encode_designation_at_bol (coding, translation_table,
2565 src, src_end, dst);
e0e989f6
KH
2566 CODING_SPEC_ISO_BOL (coding) = 0;
2567 }
2568
ec6d2bb8
KH
2569 /* Check composition start and end. */
2570 if (coding->composing != COMPOSITION_DISABLED
2571 && coding->cmp_data_start < coding->cmp_data->used)
4ed46869 2572 {
ec6d2bb8
KH
2573 struct composition_data *cmp_data = coding->cmp_data;
2574 int *data = cmp_data->data + coding->cmp_data_start;
2575 int this_pos = cmp_data->char_offset + coding->consumed_char;
2576
2577 if (coding->composing == COMPOSITION_RELATIVE)
4ed46869 2578 {
ec6d2bb8
KH
2579 if (this_pos == data[2])
2580 {
2581 ENCODE_COMPOSITION_END (coding, data);
2582 cmp_data = coding->cmp_data;
2583 data = cmp_data->data + coding->cmp_data_start;
2584 }
4ed46869 2585 }
ec6d2bb8 2586 else if (COMPOSING_P (coding))
4ed46869 2587 {
ec6d2bb8
KH
2588 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHAR */
2589 if (coding->cmp_data_index == coding->cmp_data_start + data[0])
2590 /* We have consumed components of the composition.
8ca3766a 2591 What follows in SRC is the composition's base
ec6d2bb8
KH
2592 text. */
2593 ENCODE_COMPOSITION_FAKE_START (coding);
2594 else
4ed46869 2595 {
ec6d2bb8
KH
2596 int c = cmp_data->data[coding->cmp_data_index++];
2597 if (coding->composition_rule_follows)
2598 {
2599 ENCODE_COMPOSITION_RULE (c);
2600 coding->composition_rule_follows = 0;
2601 }
2602 else
2603 {
05e6f5dc
KH
2604 if (coding->flags & CODING_FLAG_ISO_SAFE
2605 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2606 ENCODE_UNSAFE_CHARACTER (c);
2607 else
2608 ENCODE_ISO_CHARACTER (c);
ec6d2bb8
KH
2609 if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
2610 coding->composition_rule_follows = 1;
2611 }
4ed46869
KH
2612 continue;
2613 }
ec6d2bb8
KH
2614 }
2615 if (!COMPOSING_P (coding))
2616 {
2617 if (this_pos == data[1])
4ed46869 2618 {
ec6d2bb8
KH
2619 ENCODE_COMPOSITION_START (coding, data);
2620 continue;
4ed46869 2621 }
4ed46869
KH
2622 }
2623 }
ec6d2bb8 2624
b73bfc1c 2625 ONE_MORE_CHAR (c);
4ed46869 2626
b73bfc1c
KH
2627 /* Now encode the character C. */
2628 if (c < 0x20 || c == 0x7F)
2629 {
2630 if (c == '\r')
19a8d9e0 2631 {
b73bfc1c
KH
2632 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2633 {
2634 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2635 ENCODE_RESET_PLANE_AND_REGISTER;
2636 *dst++ = c;
2637 continue;
2638 }
2639 /* fall down to treat '\r' as '\n' ... */
2640 c = '\n';
19a8d9e0 2641 }
b73bfc1c 2642 if (c == '\n')
19a8d9e0 2643 {
b73bfc1c
KH
2644 if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
2645 ENCODE_RESET_PLANE_AND_REGISTER;
2646 if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
2647 bcopy (coding->spec.iso2022.initial_designation,
2648 coding->spec.iso2022.current_designation,
2649 sizeof coding->spec.iso2022.initial_designation);
2650 if (coding->eol_type == CODING_EOL_LF
2651 || coding->eol_type == CODING_EOL_UNDECIDED)
2652 *dst++ = ISO_CODE_LF;
2653 else if (coding->eol_type == CODING_EOL_CRLF)
2654 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
2655 else
2656 *dst++ = ISO_CODE_CR;
2657 CODING_SPEC_ISO_BOL (coding) = 1;
19a8d9e0 2658 }
93dec019 2659 else
19a8d9e0 2660 {
b73bfc1c
KH
2661 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2662 ENCODE_RESET_PLANE_AND_REGISTER;
2663 *dst++ = c;
19a8d9e0 2664 }
4ed46869 2665 }
b73bfc1c 2666 else if (ASCII_BYTE_P (c))
05e6f5dc 2667 ENCODE_ISO_CHARACTER (c);
b73bfc1c 2668 else if (SINGLE_BYTE_CHAR_P (c))
88993dfd 2669 {
b73bfc1c
KH
2670 *dst++ = c;
2671 coding->errors++;
88993dfd 2672 }
05e6f5dc
KH
2673 else if (coding->flags & CODING_FLAG_ISO_SAFE
2674 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2675 ENCODE_UNSAFE_CHARACTER (c);
b73bfc1c 2676 else
05e6f5dc 2677 ENCODE_ISO_CHARACTER (c);
b73bfc1c
KH
2678
2679 coding->consumed_char++;
84fbb8a0 2680 }
b73bfc1c
KH
2681
2682 label_end_of_loop:
2683 coding->consumed = src_base - source;
d46c5b12 2684 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2685}
2686
2687\f
2688/*** 4. SJIS and BIG5 handlers ***/
2689
cfb43547 2690/* Although SJIS and BIG5 are not ISO coding systems, they are used
4ed46869
KH
2691 quite widely. So, for the moment, Emacs supports them in the bare
2692 C code. But, in the future, they may be supported only by CCL. */
2693
2694/* SJIS is a coding system encoding three character sets: ASCII, right
2695 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2696 as is. A character of charset katakana-jisx0201 is encoded by
2697 "position-code + 0x80". A character of charset japanese-jisx0208
2698 is encoded in 2-byte but two position-codes are divided and shifted
cfb43547 2699 so that it fits in the range below.
4ed46869
KH
2700
2701 --- CODE RANGE of SJIS ---
2702 (character set) (range)
2703 ASCII 0x00 .. 0x7F
682169fe 2704 KATAKANA-JISX0201 0xA1 .. 0xDF
c28a9453 2705 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF
d14d03ac 2706 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC
4ed46869
KH
2707 -------------------------------
2708
2709*/
2710
2711/* BIG5 is a coding system encoding two character sets: ASCII and
2712 Big5. An ASCII character is encoded as is. Big5 is a two-byte
cfb43547 2713 character set and is encoded in two bytes.
4ed46869
KH
2714
2715 --- CODE RANGE of BIG5 ---
2716 (character set) (range)
2717 ASCII 0x00 .. 0x7F
2718 Big5 (1st byte) 0xA1 .. 0xFE
2719 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE
2720 --------------------------
2721
2722 Since the number of characters in Big5 is larger than maximum
2723 characters in Emacs' charset (96x96), it can't be handled as one
2724 charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
2725 and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
2726 contains frequently used characters and the latter contains less
2727 frequently used characters. */
2728
2729/* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
2730 are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
f458a8e0 2731 C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
4ed46869
KH
2732 format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
2733
2734/* Number of Big5 characters which have the same code in 1st byte. */
2735#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
2736
2737#define DECODE_BIG5(b1, b2, charset, c1, c2) \
2738 do { \
2739 unsigned int temp \
2740 = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
2741 if (b1 < 0xC9) \
2742 charset = charset_big5_1; \
2743 else \
2744 { \
2745 charset = charset_big5_2; \
2746 temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
2747 } \
2748 c1 = temp / (0xFF - 0xA1) + 0x21; \
2749 c2 = temp % (0xFF - 0xA1) + 0x21; \
2750 } while (0)
2751
2752#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
2753 do { \
2754 unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
2755 if (charset == charset_big5_2) \
2756 temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
2757 b1 = temp / BIG5_SAME_ROW + 0xA1; \
2758 b2 = temp % BIG5_SAME_ROW; \
2759 b2 += b2 < 0x3F ? 0x40 : 0x62; \
2760 } while (0)
2761
2762/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2763 Check if a text is encoded in SJIS. If it is, return
2764 CODING_CATEGORY_MASK_SJIS, else return 0. */
2765
0a28aafb
KH
2766static int
2767detect_coding_sjis (src, src_end, multibytep)
4ed46869 2768 unsigned char *src, *src_end;
0a28aafb 2769 int multibytep;
4ed46869 2770{
b73bfc1c
KH
2771 int c;
2772 /* Dummy for ONE_MORE_BYTE. */
2773 struct coding_system dummy_coding;
2774 struct coding_system *coding = &dummy_coding;
4ed46869 2775
b73bfc1c 2776 while (1)
4ed46869 2777 {
0a28aafb 2778 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
682169fe
KH
2779 if (c < 0x80)
2780 continue;
2781 if (c == 0x80 || c == 0xA0 || c > 0xEF)
2782 return 0;
2783 if (c <= 0x9F || c >= 0xE0)
4ed46869 2784 {
682169fe
KH
2785 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
2786 if (c < 0x40 || c == 0x7F || c > 0xFC)
4ed46869
KH
2787 return 0;
2788 }
2789 }
b73bfc1c 2790 label_end_of_loop:
4ed46869
KH
2791 return CODING_CATEGORY_MASK_SJIS;
2792}
2793
2794/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2795 Check if a text is encoded in BIG5. If it is, return
2796 CODING_CATEGORY_MASK_BIG5, else return 0. */
2797
0a28aafb
KH
2798static int
2799detect_coding_big5 (src, src_end, multibytep)
4ed46869 2800 unsigned char *src, *src_end;
0a28aafb 2801 int multibytep;
4ed46869 2802{
b73bfc1c
KH
2803 int c;
2804 /* Dummy for ONE_MORE_BYTE. */
2805 struct coding_system dummy_coding;
2806 struct coding_system *coding = &dummy_coding;
4ed46869 2807
b73bfc1c 2808 while (1)
4ed46869 2809 {
0a28aafb 2810 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
682169fe
KH
2811 if (c < 0x80)
2812 continue;
2813 if (c < 0xA1 || c > 0xFE)
2814 return 0;
2815 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
2816 if (c < 0x40 || (c > 0x7F && c < 0xA1) || c > 0xFE)
2817 return 0;
4ed46869 2818 }
b73bfc1c 2819 label_end_of_loop:
4ed46869
KH
2820 return CODING_CATEGORY_MASK_BIG5;
2821}
2822
fa42c37f
KH
2823/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2824 Check if a text is encoded in UTF-8. If it is, return
2825 CODING_CATEGORY_MASK_UTF_8, else return 0. */
2826
2827#define UTF_8_1_OCTET_P(c) ((c) < 0x80)
2828#define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
2829#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
2830#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
2831#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
2832#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
2833#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
2834
0a28aafb
KH
2835static int
2836detect_coding_utf_8 (src, src_end, multibytep)
fa42c37f 2837 unsigned char *src, *src_end;
0a28aafb 2838 int multibytep;
fa42c37f
KH
2839{
2840 unsigned char c;
2841 int seq_maybe_bytes;
b73bfc1c
KH
2842 /* Dummy for ONE_MORE_BYTE. */
2843 struct coding_system dummy_coding;
2844 struct coding_system *coding = &dummy_coding;
fa42c37f 2845
b73bfc1c 2846 while (1)
fa42c37f 2847 {
0a28aafb 2848 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fa42c37f
KH
2849 if (UTF_8_1_OCTET_P (c))
2850 continue;
2851 else if (UTF_8_2_OCTET_LEADING_P (c))
2852 seq_maybe_bytes = 1;
2853 else if (UTF_8_3_OCTET_LEADING_P (c))
2854 seq_maybe_bytes = 2;
2855 else if (UTF_8_4_OCTET_LEADING_P (c))
2856 seq_maybe_bytes = 3;
2857 else if (UTF_8_5_OCTET_LEADING_P (c))
2858 seq_maybe_bytes = 4;
2859 else if (UTF_8_6_OCTET_LEADING_P (c))
2860 seq_maybe_bytes = 5;
2861 else
2862 return 0;
2863
2864 do
2865 {
0a28aafb 2866 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
fa42c37f
KH
2867 if (!UTF_8_EXTRA_OCTET_P (c))
2868 return 0;
2869 seq_maybe_bytes--;
2870 }
2871 while (seq_maybe_bytes > 0);
2872 }
2873
b73bfc1c 2874 label_end_of_loop:
fa42c37f
KH
2875 return CODING_CATEGORY_MASK_UTF_8;
2876}
2877
2878/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2879 Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
2880 Little Endian (otherwise). If it is, return
2881 CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
2882 else return 0. */
2883
2884#define UTF_16_INVALID_P(val) \
2885 (((val) == 0xFFFE) \
2886 || ((val) == 0xFFFF))
2887
2888#define UTF_16_HIGH_SURROGATE_P(val) \
2889 (((val) & 0xD800) == 0xD800)
2890
2891#define UTF_16_LOW_SURROGATE_P(val) \
2892 (((val) & 0xDC00) == 0xDC00)
2893
0a28aafb
KH
2894static int
2895detect_coding_utf_16 (src, src_end, multibytep)
fa42c37f 2896 unsigned char *src, *src_end;
0a28aafb 2897 int multibytep;
fa42c37f 2898{
b73bfc1c 2899 unsigned char c1, c2;
1c7457e2 2900 /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE. */
b73bfc1c
KH
2901 struct coding_system dummy_coding;
2902 struct coding_system *coding = &dummy_coding;
fa42c37f 2903
0a28aafb
KH
2904 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
2905 ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep);
b73bfc1c
KH
2906
2907 if ((c1 == 0xFF) && (c2 == 0xFE))
fa42c37f 2908 return CODING_CATEGORY_MASK_UTF_16_LE;
b73bfc1c 2909 else if ((c1 == 0xFE) && (c2 == 0xFF))
fa42c37f
KH
2910 return CODING_CATEGORY_MASK_UTF_16_BE;
2911
b73bfc1c 2912 label_end_of_loop:
fa42c37f
KH
2913 return 0;
2914}
2915
4ed46869
KH
2916/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
2917 If SJIS_P is 1, decode SJIS text, else decode BIG5 test. */
2918
b73bfc1c 2919static void
4ed46869 2920decode_coding_sjis_big5 (coding, source, destination,
d46c5b12 2921 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
2922 struct coding_system *coding;
2923 unsigned char *source, *destination;
2924 int src_bytes, dst_bytes;
4ed46869
KH
2925 int sjis_p;
2926{
2927 unsigned char *src = source;
2928 unsigned char *src_end = source + src_bytes;
2929 unsigned char *dst = destination;
2930 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
2931 /* SRC_BASE remembers the start position in source in each loop.
2932 The loop will be exited when there's not enough source code
2933 (within macro ONE_MORE_BYTE), or when there's not enough
2934 destination area to produce a character (within macro
2935 EMIT_CHAR). */
2936 unsigned char *src_base;
2937 Lisp_Object translation_table;
a5d301df 2938
b73bfc1c
KH
2939 if (NILP (Venable_character_translation))
2940 translation_table = Qnil;
2941 else
2942 {
2943 translation_table = coding->translation_table_for_decode;
2944 if (NILP (translation_table))
2945 translation_table = Vstandard_translation_table_for_decode;
2946 }
4ed46869 2947
d46c5b12 2948 coding->produced_char = 0;
b73bfc1c 2949 while (1)
4ed46869 2950 {
b73bfc1c
KH
2951 int c, charset, c1, c2;
2952
2953 src_base = src;
2954 ONE_MORE_BYTE (c1);
2955
2956 if (c1 < 0x80)
4ed46869 2957 {
b73bfc1c
KH
2958 charset = CHARSET_ASCII;
2959 if (c1 < 0x20)
4ed46869 2960 {
b73bfc1c 2961 if (c1 == '\r')
d46c5b12 2962 {
b73bfc1c 2963 if (coding->eol_type == CODING_EOL_CRLF)
d46c5b12 2964 {
b73bfc1c
KH
2965 ONE_MORE_BYTE (c2);
2966 if (c2 == '\n')
2967 c1 = c2;
b73bfc1c
KH
2968 else
2969 /* To process C2 again, SRC is subtracted by 1. */
2970 src--;
d46c5b12 2971 }
b73bfc1c
KH
2972 else if (coding->eol_type == CODING_EOL_CR)
2973 c1 = '\n';
2974 }
2975 else if (c1 == '\n'
2976 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2977 && (coding->eol_type == CODING_EOL_CR
2978 || coding->eol_type == CODING_EOL_CRLF))
2979 {
2980 coding->result = CODING_FINISH_INCONSISTENT_EOL;
2981 goto label_end_of_loop;
d46c5b12 2982 }
4ed46869 2983 }
4ed46869 2984 }
54f78171 2985 else
b73bfc1c 2986 {
4ed46869
KH
2987 if (sjis_p)
2988 {
682169fe 2989 if (c1 == 0x80 || c1 == 0xA0 || c1 > 0xEF)
b73bfc1c 2990 goto label_invalid_code;
682169fe 2991 if (c1 <= 0x9F || c1 >= 0xE0)
fb88bf2d 2992 {
54f78171
KH
2993 /* SJIS -> JISX0208 */
2994 ONE_MORE_BYTE (c2);
b73bfc1c
KH
2995 if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
2996 goto label_invalid_code;
2997 DECODE_SJIS (c1, c2, c1, c2);
2998 charset = charset_jisx0208;
5e34de15 2999 }
fb88bf2d 3000 else
b73bfc1c
KH
3001 /* SJIS -> JISX0201-Kana */
3002 charset = charset_katakana_jisx0201;
4ed46869 3003 }
fb88bf2d 3004 else
fb88bf2d 3005 {
54f78171 3006 /* BIG5 -> Big5 */
682169fe 3007 if (c1 < 0xA0 || c1 > 0xFE)
b73bfc1c
KH
3008 goto label_invalid_code;
3009 ONE_MORE_BYTE (c2);
3010 if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
3011 goto label_invalid_code;
3012 DECODE_BIG5 (c1, c2, charset, c1, c2);
4ed46869
KH
3013 }
3014 }
4ed46869 3015
b73bfc1c
KH
3016 c = DECODE_ISO_CHARACTER (charset, c1, c2);
3017 EMIT_CHAR (c);
fb88bf2d
KH
3018 continue;
3019
b73bfc1c
KH
3020 label_invalid_code:
3021 coding->errors++;
4ed46869 3022 src = src_base;
b73bfc1c
KH
3023 c = *src++;
3024 EMIT_CHAR (c);
fb88bf2d 3025 }
d46c5b12 3026
b73bfc1c
KH
3027 label_end_of_loop:
3028 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 3029 coding->produced = dst - destination;
b73bfc1c 3030 return;
4ed46869
KH
3031}
3032
3033/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
b73bfc1c
KH
3034 This function can encode charsets `ascii', `katakana-jisx0201',
3035 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
3036 are sure that all these charsets are registered as official charset
4ed46869
KH
3037 (i.e. do not have extended leading-codes). Characters of other
3038 charsets are produced without any encoding. If SJIS_P is 1, encode
3039 SJIS text, else encode BIG5 text. */
3040
b73bfc1c 3041static void
4ed46869 3042encode_coding_sjis_big5 (coding, source, destination,
d46c5b12 3043 src_bytes, dst_bytes, sjis_p)
4ed46869
KH
3044 struct coding_system *coding;
3045 unsigned char *source, *destination;
3046 int src_bytes, dst_bytes;
4ed46869
KH
3047 int sjis_p;
3048{
3049 unsigned char *src = source;
3050 unsigned char *src_end = source + src_bytes;
3051 unsigned char *dst = destination;
3052 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c
KH
3053 /* SRC_BASE remembers the start position in source in each loop.
3054 The loop will be exited when there's not enough source text to
3055 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
3056 there's not enough destination area to produce encoded codes
3057 (within macro EMIT_BYTES). */
3058 unsigned char *src_base;
3059 Lisp_Object translation_table;
4ed46869 3060
b73bfc1c
KH
3061 if (NILP (Venable_character_translation))
3062 translation_table = Qnil;
3063 else
4ed46869 3064 {
39658efc 3065 translation_table = coding->translation_table_for_encode;
b73bfc1c 3066 if (NILP (translation_table))
39658efc 3067 translation_table = Vstandard_translation_table_for_encode;
b73bfc1c 3068 }
a5d301df 3069
b73bfc1c
KH
3070 while (1)
3071 {
3072 int c, charset, c1, c2;
4ed46869 3073
b73bfc1c
KH
3074 src_base = src;
3075 ONE_MORE_CHAR (c);
93dec019 3076
b73bfc1c
KH
3077 /* Now encode the character C. */
3078 if (SINGLE_BYTE_CHAR_P (c))
3079 {
3080 switch (c)
4ed46869 3081 {
b73bfc1c 3082 case '\r':
7371fe0a 3083 if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
b73bfc1c
KH
3084 {
3085 EMIT_ONE_BYTE (c);
3086 break;
3087 }
3088 c = '\n';
3089 case '\n':
3090 if (coding->eol_type == CODING_EOL_CRLF)
3091 {
3092 EMIT_TWO_BYTES ('\r', c);
3093 break;
3094 }
3095 else if (coding->eol_type == CODING_EOL_CR)
3096 c = '\r';
3097 default:
3098 EMIT_ONE_BYTE (c);
3099 }
3100 }
3101 else
3102 {
3103 SPLIT_CHAR (c, charset, c1, c2);
3104 if (sjis_p)
3105 {
3106 if (charset == charset_jisx0208
3107 || charset == charset_jisx0208_1978)
3108 {
3109 ENCODE_SJIS (c1, c2, c1, c2);
3110 EMIT_TWO_BYTES (c1, c2);
3111 }
39658efc
KH
3112 else if (charset == charset_katakana_jisx0201)
3113 EMIT_ONE_BYTE (c1 | 0x80);
fc53a214
KH
3114 else if (charset == charset_latin_jisx0201)
3115 EMIT_ONE_BYTE (c1);
b73bfc1c
KH
3116 else
3117 /* There's no way other than producing the internal
3118 codes as is. */
3119 EMIT_BYTES (src_base, src);
4ed46869 3120 }
4ed46869 3121 else
b73bfc1c
KH
3122 {
3123 if (charset == charset_big5_1 || charset == charset_big5_2)
3124 {
3125 ENCODE_BIG5 (charset, c1, c2, c1, c2);
3126 EMIT_TWO_BYTES (c1, c2);
3127 }
3128 else
3129 /* There's no way other than producing the internal
3130 codes as is. */
3131 EMIT_BYTES (src_base, src);
3132 }
4ed46869 3133 }
b73bfc1c 3134 coding->consumed_char++;
4ed46869
KH
3135 }
3136
b73bfc1c
KH
3137 label_end_of_loop:
3138 coding->consumed = src_base - source;
d46c5b12 3139 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
3140}
3141
3142\f
1397dc18
KH
3143/*** 5. CCL handlers ***/
3144
3145/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
3146 Check if a text is encoded in a coding system of which
3147 encoder/decoder are written in CCL program. If it is, return
3148 CODING_CATEGORY_MASK_CCL, else return 0. */
3149
0a28aafb
KH
3150static int
3151detect_coding_ccl (src, src_end, multibytep)
1397dc18 3152 unsigned char *src, *src_end;
0a28aafb 3153 int multibytep;
1397dc18
KH
3154{
3155 unsigned char *valid;
b73bfc1c
KH
3156 int c;
3157 /* Dummy for ONE_MORE_BYTE. */
3158 struct coding_system dummy_coding;
3159 struct coding_system *coding = &dummy_coding;
1397dc18
KH
3160
3161 /* No coding system is assigned to coding-category-ccl. */
3162 if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
3163 return 0;
3164
3165 valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
b73bfc1c 3166 while (1)
1397dc18 3167 {
0a28aafb 3168 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
3169 if (! valid[c])
3170 return 0;
1397dc18 3171 }
b73bfc1c 3172 label_end_of_loop:
1397dc18
KH
3173 return CODING_CATEGORY_MASK_CCL;
3174}
3175
3176\f
3177/*** 6. End-of-line handlers ***/
4ed46869 3178
b73bfc1c 3179/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 3180
b73bfc1c 3181static void
d46c5b12 3182decode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
3183 struct coding_system *coding;
3184 unsigned char *source, *destination;
3185 int src_bytes, dst_bytes;
4ed46869
KH
3186{
3187 unsigned char *src = source;
4ed46869 3188 unsigned char *dst = destination;
b73bfc1c
KH
3189 unsigned char *src_end = src + src_bytes;
3190 unsigned char *dst_end = dst + dst_bytes;
3191 Lisp_Object translation_table;
3192 /* SRC_BASE remembers the start position in source in each loop.
3193 The loop will be exited when there's not enough source code
3194 (within macro ONE_MORE_BYTE), or when there's not enough
3195 destination area to produce a character (within macro
3196 EMIT_CHAR). */
3197 unsigned char *src_base;
3198 int c;
3199
3200 translation_table = Qnil;
4ed46869
KH
3201 switch (coding->eol_type)
3202 {
3203 case CODING_EOL_CRLF:
b73bfc1c 3204 while (1)
d46c5b12 3205 {
b73bfc1c
KH
3206 src_base = src;
3207 ONE_MORE_BYTE (c);
3208 if (c == '\r')
fb88bf2d 3209 {
b73bfc1c
KH
3210 ONE_MORE_BYTE (c);
3211 if (c != '\n')
3212 {
b73bfc1c
KH
3213 src--;
3214 c = '\r';
3215 }
fb88bf2d 3216 }
b73bfc1c
KH
3217 else if (c == '\n'
3218 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
d46c5b12 3219 {
b73bfc1c
KH
3220 coding->result = CODING_FINISH_INCONSISTENT_EOL;
3221 goto label_end_of_loop;
d46c5b12 3222 }
b73bfc1c 3223 EMIT_CHAR (c);
d46c5b12 3224 }
b73bfc1c
KH
3225 break;
3226
3227 case CODING_EOL_CR:
3228 while (1)
d46c5b12 3229 {
b73bfc1c
KH
3230 src_base = src;
3231 ONE_MORE_BYTE (c);
3232 if (c == '\n')
3233 {
3234 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
3235 {
3236 coding->result = CODING_FINISH_INCONSISTENT_EOL;
3237 goto label_end_of_loop;
3238 }
3239 }
3240 else if (c == '\r')
3241 c = '\n';
3242 EMIT_CHAR (c);
d46c5b12 3243 }
4ed46869
KH
3244 break;
3245
b73bfc1c
KH
3246 default: /* no need for EOL handling */
3247 while (1)
d46c5b12 3248 {
b73bfc1c
KH
3249 src_base = src;
3250 ONE_MORE_BYTE (c);
3251 EMIT_CHAR (c);
d46c5b12 3252 }
4ed46869
KH
3253 }
3254
b73bfc1c
KH
3255 label_end_of_loop:
3256 coding->consumed = coding->consumed_char = src_base - source;
3257 coding->produced = dst - destination;
3258 return;
4ed46869
KH
3259}
3260
3261/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
b73bfc1c 3262 format of end-of-line according to `coding->eol_type'. It also
8ca3766a 3263 convert multibyte form 8-bit characters to unibyte if
b73bfc1c
KH
3264 CODING->src_multibyte is nonzero. If `coding->mode &
3265 CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
3266 also means end-of-line. */
4ed46869 3267
b73bfc1c 3268static void
d46c5b12 3269encode_eol (coding, source, destination, src_bytes, dst_bytes)
4ed46869 3270 struct coding_system *coding;
a4244313
KR
3271 const unsigned char *source;
3272 unsigned char *destination;
4ed46869 3273 int src_bytes, dst_bytes;
4ed46869 3274{
a4244313 3275 const unsigned char *src = source;
4ed46869 3276 unsigned char *dst = destination;
a4244313 3277 const unsigned char *src_end = src + src_bytes;
b73bfc1c
KH
3278 unsigned char *dst_end = dst + dst_bytes;
3279 Lisp_Object translation_table;
3280 /* SRC_BASE remembers the start position in source in each loop.
3281 The loop will be exited when there's not enough source text to
3282 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
3283 there's not enough destination area to produce encoded codes
3284 (within macro EMIT_BYTES). */
a4244313
KR
3285 const unsigned char *src_base;
3286 unsigned char *tmp;
b73bfc1c
KH
3287 int c;
3288 int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
3289
3290 translation_table = Qnil;
3291 if (coding->src_multibyte
3292 && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
3293 {
3294 src_end--;
3295 src_bytes--;
3296 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
3297 }
fb88bf2d 3298
d46c5b12
KH
3299 if (coding->eol_type == CODING_EOL_CRLF)
3300 {
b73bfc1c 3301 while (src < src_end)
d46c5b12 3302 {
b73bfc1c 3303 src_base = src;
d46c5b12 3304 c = *src++;
b73bfc1c
KH
3305 if (c >= 0x20)
3306 EMIT_ONE_BYTE (c);
3307 else if (c == '\n' || (c == '\r' && selective_display))
3308 EMIT_TWO_BYTES ('\r', '\n');
d46c5b12 3309 else
b73bfc1c 3310 EMIT_ONE_BYTE (c);
d46c5b12 3311 }
ff2b1ea9 3312 src_base = src;
b73bfc1c 3313 label_end_of_loop:
005f0d35 3314 ;
d46c5b12
KH
3315 }
3316 else
4ed46869 3317 {
78a629d2 3318 if (!dst_bytes || src_bytes <= dst_bytes)
4ed46869 3319 {
b73bfc1c
KH
3320 safe_bcopy (src, dst, src_bytes);
3321 src_base = src_end;
3322 dst += src_bytes;
d46c5b12 3323 }
d46c5b12 3324 else
b73bfc1c
KH
3325 {
3326 if (coding->src_multibyte
3327 && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
3328 dst_bytes--;
3329 safe_bcopy (src, dst, dst_bytes);
3330 src_base = src + dst_bytes;
3331 dst = destination + dst_bytes;
3332 coding->result = CODING_FINISH_INSUFFICIENT_DST;
3333 }
993824c9 3334 if (coding->eol_type == CODING_EOL_CR)
d46c5b12 3335 {
a4244313
KR
3336 for (tmp = destination; tmp < dst; tmp++)
3337 if (*tmp == '\n') *tmp = '\r';
d46c5b12 3338 }
b73bfc1c 3339 else if (selective_display)
d46c5b12 3340 {
a4244313
KR
3341 for (tmp = destination; tmp < dst; tmp++)
3342 if (*tmp == '\r') *tmp = '\n';
4ed46869 3343 }
4ed46869 3344 }
b73bfc1c
KH
3345 if (coding->src_multibyte)
3346 dst = destination + str_as_unibyte (destination, dst - destination);
4ed46869 3347
b73bfc1c
KH
3348 coding->consumed = src_base - source;
3349 coding->produced = dst - destination;
78a629d2 3350 coding->produced_char = coding->produced;
4ed46869
KH
3351}
3352
3353\f
1397dc18 3354/*** 7. C library functions ***/
4ed46869 3355
cfb43547 3356/* In Emacs Lisp, a coding system is represented by a Lisp symbol which
4ed46869 3357 has a property `coding-system'. The value of this property is a
cfb43547 3358 vector of length 5 (called the coding-vector). Among elements of
4ed46869
KH
3359 this vector, the first (element[0]) and the fifth (element[4])
3360 carry important information for decoding/encoding. Before
3361 decoding/encoding, this information should be set in fields of a
3362 structure of type `coding_system'.
3363
cfb43547 3364 The value of the property `coding-system' can be a symbol of another
4ed46869
KH
3365 subsidiary coding-system. In that case, Emacs gets coding-vector
3366 from that symbol.
3367
3368 `element[0]' contains information to be set in `coding->type'. The
3369 value and its meaning is as follows:
3370
0ef69138
KH
3371 0 -- coding_type_emacs_mule
3372 1 -- coding_type_sjis
3373 2 -- coding_type_iso2022
3374 3 -- coding_type_big5
3375 4 -- coding_type_ccl encoder/decoder written in CCL
3376 nil -- coding_type_no_conversion
3377 t -- coding_type_undecided (automatic conversion on decoding,
3378 no-conversion on encoding)
4ed46869
KH
3379
3380 `element[4]' contains information to be set in `coding->flags' and
3381 `coding->spec'. The meaning varies by `coding->type'.
3382
3383 If `coding->type' is `coding_type_iso2022', element[4] is a vector
3384 of length 32 (of which the first 13 sub-elements are used now).
3385 Meanings of these sub-elements are:
3386
3387 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
3388 If the value is an integer of valid charset, the charset is
3389 assumed to be designated to graphic register N initially.
3390
3391 If the value is minus, it is a minus value of charset which
3392 reserves graphic register N, which means that the charset is
3393 not designated initially but should be designated to graphic
3394 register N just before encoding a character in that charset.
3395
3396 If the value is nil, graphic register N is never used on
3397 encoding.
93dec019 3398
4ed46869
KH
3399 sub-element[N] where N is 4 through 11: to be set in `coding->flags'
3400 Each value takes t or nil. See the section ISO2022 of
3401 `coding.h' for more information.
3402
3403 If `coding->type' is `coding_type_big5', element[4] is t to denote
3404 BIG5-ETen or nil to denote BIG5-HKU.
3405
3406 If `coding->type' takes the other value, element[4] is ignored.
3407
cfb43547 3408 Emacs Lisp's coding systems also carry information about format of
4ed46869
KH
3409 end-of-line in a value of property `eol-type'. If the value is
3410 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
3411 means CODING_EOL_CR. If it is not integer, it should be a vector
3412 of subsidiary coding systems of which property `eol-type' has one
cfb43547 3413 of the above values.
4ed46869
KH
3414
3415*/
3416
3417/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
3418 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
3419 is setup so that no conversion is necessary and return -1, else
3420 return 0. */
3421
3422int
e0e989f6
KH
3423setup_coding_system (coding_system, coding)
3424 Lisp_Object coding_system;
4ed46869
KH
3425 struct coding_system *coding;
3426{
d46c5b12 3427 Lisp_Object coding_spec, coding_type, eol_type, plist;
4608c386 3428 Lisp_Object val;
4ed46869 3429
c07c8e12
KH
3430 /* At first, zero clear all members. */
3431 bzero (coding, sizeof (struct coding_system));
3432
d46c5b12 3433 /* Initialize some fields required for all kinds of coding systems. */
774324d6 3434 coding->symbol = coding_system;
d46c5b12
KH
3435 coding->heading_ascii = -1;
3436 coding->post_read_conversion = coding->pre_write_conversion = Qnil;
ec6d2bb8
KH
3437 coding->composing = COMPOSITION_DISABLED;
3438 coding->cmp_data = NULL;
1f5dbf34
KH
3439
3440 if (NILP (coding_system))
3441 goto label_invalid_coding_system;
3442
4608c386 3443 coding_spec = Fget (coding_system, Qcoding_system);
1f5dbf34 3444
4608c386
KH
3445 if (!VECTORP (coding_spec)
3446 || XVECTOR (coding_spec)->size != 5
3447 || !CONSP (XVECTOR (coding_spec)->contents[3]))
4ed46869 3448 goto label_invalid_coding_system;
4608c386 3449
d46c5b12
KH
3450 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
3451 if (VECTORP (eol_type))
3452 {
3453 coding->eol_type = CODING_EOL_UNDECIDED;
3454 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
3455 }
3456 else if (XFASTINT (eol_type) == 1)
3457 {
3458 coding->eol_type = CODING_EOL_CRLF;
3459 coding->common_flags
3460 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
3461 }
3462 else if (XFASTINT (eol_type) == 2)
3463 {
3464 coding->eol_type = CODING_EOL_CR;
3465 coding->common_flags
3466 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
3467 }
3468 else
3469 coding->eol_type = CODING_EOL_LF;
3470
3471 coding_type = XVECTOR (coding_spec)->contents[0];
3472 /* Try short cut. */
3473 if (SYMBOLP (coding_type))
3474 {
3475 if (EQ (coding_type, Qt))
3476 {
3477 coding->type = coding_type_undecided;
3478 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
3479 }
3480 else
3481 coding->type = coding_type_no_conversion;
9b96232f
KH
3482 /* Initialize this member. Any thing other than
3483 CODING_CATEGORY_IDX_UTF_16_BE and
3484 CODING_CATEGORY_IDX_UTF_16_LE are ok because they have
3485 special treatment in detect_eol. */
3486 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
3487
d46c5b12
KH
3488 return 0;
3489 }
3490
d46c5b12
KH
3491 /* Get values of coding system properties:
3492 `post-read-conversion', `pre-write-conversion',
f967223b 3493 `translation-table-for-decode', `translation-table-for-encode'. */
4608c386 3494 plist = XVECTOR (coding_spec)->contents[3];
b843d1ae 3495 /* Pre & post conversion functions should be disabled if
8ca3766a 3496 inhibit_eol_conversion is nonzero. This is the case that a code
b843d1ae
KH
3497 conversion function is called while those functions are running. */
3498 if (! inhibit_pre_post_conversion)
3499 {
3500 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
3501 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
3502 }
f967223b 3503 val = Fplist_get (plist, Qtranslation_table_for_decode);
4608c386 3504 if (SYMBOLP (val))
f967223b
KH
3505 val = Fget (val, Qtranslation_table_for_decode);
3506 coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
3507 val = Fplist_get (plist, Qtranslation_table_for_encode);
4608c386 3508 if (SYMBOLP (val))
f967223b
KH
3509 val = Fget (val, Qtranslation_table_for_encode);
3510 coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
d46c5b12
KH
3511 val = Fplist_get (plist, Qcoding_category);
3512 if (!NILP (val))
3513 {
3514 val = Fget (val, Qcoding_category_index);
3515 if (INTEGERP (val))
3516 coding->category_idx = XINT (val);
3517 else
3518 goto label_invalid_coding_system;
3519 }
3520 else
3521 goto label_invalid_coding_system;
93dec019 3522
ec6d2bb8
KH
3523 /* If the coding system has non-nil `composition' property, enable
3524 composition handling. */
3525 val = Fplist_get (plist, Qcomposition);
3526 if (!NILP (val))
3527 coding->composing = COMPOSITION_NO;
3528
d46c5b12 3529 switch (XFASTINT (coding_type))
4ed46869
KH
3530 {
3531 case 0:
0ef69138 3532 coding->type = coding_type_emacs_mule;
aa72b389
KH
3533 coding->common_flags
3534 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
c952af22
KH
3535 if (!NILP (coding->post_read_conversion))
3536 coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
3537 if (!NILP (coding->pre_write_conversion))
3538 coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3539 break;
3540
3541 case 1:
3542 coding->type = coding_type_sjis;
c952af22
KH
3543 coding->common_flags
3544 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869
KH
3545 break;
3546
3547 case 2:
3548 coding->type = coding_type_iso2022;
c952af22
KH
3549 coding->common_flags
3550 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3551 {
70c22245 3552 Lisp_Object val, temp;
4ed46869 3553 Lisp_Object *flags;
d46c5b12 3554 int i, charset, reg_bits = 0;
4ed46869 3555
4608c386 3556 val = XVECTOR (coding_spec)->contents[4];
f44d27ce 3557
4ed46869
KH
3558 if (!VECTORP (val) || XVECTOR (val)->size != 32)
3559 goto label_invalid_coding_system;
3560
3561 flags = XVECTOR (val)->contents;
3562 coding->flags
3563 = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
3564 | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
3565 | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
3566 | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
3567 | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
3568 | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
3569 | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
3570 | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
e0e989f6
KH
3571 | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
3572 | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
c4825358
KH
3573 | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3574 | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
3f003981 3575 | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
c4825358 3576 );
4ed46869
KH
3577
3578 /* Invoke graphic register 0 to plane 0. */
3579 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
3580 /* Invoke graphic register 1 to plane 1 if we can use full 8-bit. */
3581 CODING_SPEC_ISO_INVOCATION (coding, 1)
3582 = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
3583 /* Not single shifting at first. */
6e85d753 3584 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
e0e989f6 3585 /* Beginning of buffer should also be regarded as bol. */
6e85d753 3586 CODING_SPEC_ISO_BOL (coding) = 1;
4ed46869 3587
70c22245
KH
3588 for (charset = 0; charset <= MAX_CHARSET; charset++)
3589 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
3590 val = Vcharset_revision_alist;
3591 while (CONSP (val))
3592 {
03699b14 3593 charset = get_charset_id (Fcar_safe (XCAR (val)));
70c22245 3594 if (charset >= 0
03699b14 3595 && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
70c22245
KH
3596 && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
3597 CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
03699b14 3598 val = XCDR (val);
70c22245
KH
3599 }
3600
4ed46869
KH
3601 /* Checks FLAGS[REG] (REG = 0, 1, 2 3) and decide designations.
3602 FLAGS[REG] can be one of below:
3603 integer CHARSET: CHARSET occupies register I,
3604 t: designate nothing to REG initially, but can be used
3605 by any charsets,
3606 list of integer, nil, or t: designate the first
3607 element (if integer) to REG initially, the remaining
3608 elements (if integer) is designated to REG on request,
d46c5b12 3609 if an element is t, REG can be used by any charsets,
4ed46869 3610 nil: REG is never used. */
467e7675 3611 for (charset = 0; charset <= MAX_CHARSET; charset++)
1ba9e4ab
KH
3612 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3613 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
4ed46869
KH
3614 for (i = 0; i < 4; i++)
3615 {
87323294
PJ
3616 if ((INTEGERP (flags[i])
3617 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset)))
e0e989f6 3618 || (charset = get_charset_id (flags[i])) >= 0)
4ed46869
KH
3619 {
3620 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3621 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
3622 }
3623 else if (EQ (flags[i], Qt))
3624 {
3625 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
d46c5b12
KH
3626 reg_bits |= 1 << i;
3627 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
4ed46869
KH
3628 }
3629 else if (CONSP (flags[i]))
3630 {
84d60297
RS
3631 Lisp_Object tail;
3632 tail = flags[i];
4ed46869 3633
d46c5b12 3634 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
87323294
PJ
3635 if ((INTEGERP (XCAR (tail))
3636 && (charset = XINT (XCAR (tail)),
3637 CHARSET_VALID_P (charset)))
03699b14 3638 || (charset = get_charset_id (XCAR (tail))) >= 0)
4ed46869
KH
3639 {
3640 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
3641 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) =i;
3642 }
3643 else
3644 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
03699b14 3645 tail = XCDR (tail);
4ed46869
KH
3646 while (CONSP (tail))
3647 {
87323294
PJ
3648 if ((INTEGERP (XCAR (tail))
3649 && (charset = XINT (XCAR (tail)),
3650 CHARSET_VALID_P (charset)))
03699b14 3651 || (charset = get_charset_id (XCAR (tail))) >= 0)
70c22245
KH
3652 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3653 = i;
03699b14 3654 else if (EQ (XCAR (tail), Qt))
d46c5b12 3655 reg_bits |= 1 << i;
03699b14 3656 tail = XCDR (tail);
4ed46869
KH
3657 }
3658 }
3659 else
3660 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
93dec019 3661
4ed46869
KH
3662 CODING_SPEC_ISO_DESIGNATION (coding, i)
3663 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
3664 }
3665
d46c5b12 3666 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
4ed46869
KH
3667 {
3668 /* REG 1 can be used only by locking shift in 7-bit env. */
3669 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
d46c5b12 3670 reg_bits &= ~2;
4ed46869
KH
3671 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
3672 /* Without any shifting, only REG 0 and 1 can be used. */
d46c5b12 3673 reg_bits &= 3;
4ed46869
KH
3674 }
3675
d46c5b12
KH
3676 if (reg_bits)
3677 for (charset = 0; charset <= MAX_CHARSET; charset++)
6e85d753 3678 {
928a85c1 3679 if (CHARSET_DEFINED_P (charset)
96148065
KH
3680 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3681 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
3682 {
3683 /* There exist some default graphic registers to be
96148065 3684 used by CHARSET. */
d46c5b12
KH
3685
3686 /* We had better avoid designating a charset of
3687 CHARS96 to REG 0 as far as possible. */
3688 if (CHARSET_CHARS (charset) == 96)
3689 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3690 = (reg_bits & 2
3691 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
3692 else
3693 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
3694 = (reg_bits & 1
3695 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
3696 }
6e85d753 3697 }
4ed46869 3698 }
c952af22 3699 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
d46c5b12 3700 coding->spec.iso2022.last_invalid_designation_register = -1;
4ed46869
KH
3701 break;
3702
3703 case 3:
3704 coding->type = coding_type_big5;
c952af22
KH
3705 coding->common_flags
3706 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3707 coding->flags
4608c386 3708 = (NILP (XVECTOR (coding_spec)->contents[4])
4ed46869
KH
3709 ? CODING_FLAG_BIG5_HKU
3710 : CODING_FLAG_BIG5_ETEN);
3711 break;
3712
3713 case 4:
3714 coding->type = coding_type_ccl;
c952af22
KH
3715 coding->common_flags
3716 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
4ed46869 3717 {
84d60297 3718 val = XVECTOR (coding_spec)->contents[4];
ef4ced28
KH
3719 if (! CONSP (val)
3720 || setup_ccl_program (&(coding->spec.ccl.decoder),
03699b14 3721 XCAR (val)) < 0
ef4ced28 3722 || setup_ccl_program (&(coding->spec.ccl.encoder),
03699b14 3723 XCDR (val)) < 0)
4ed46869 3724 goto label_invalid_coding_system;
1397dc18
KH
3725
3726 bzero (coding->spec.ccl.valid_codes, 256);
3727 val = Fplist_get (plist, Qvalid_codes);
3728 if (CONSP (val))
3729 {
3730 Lisp_Object this;
3731
03699b14 3732 for (; CONSP (val); val = XCDR (val))
1397dc18 3733 {
03699b14 3734 this = XCAR (val);
1397dc18
KH
3735 if (INTEGERP (this)
3736 && XINT (this) >= 0 && XINT (this) < 256)
3737 coding->spec.ccl.valid_codes[XINT (this)] = 1;
3738 else if (CONSP (this)
03699b14
KR
3739 && INTEGERP (XCAR (this))
3740 && INTEGERP (XCDR (this)))
1397dc18 3741 {
03699b14
KR
3742 int start = XINT (XCAR (this));
3743 int end = XINT (XCDR (this));
1397dc18
KH
3744
3745 if (start >= 0 && start <= end && end < 256)
e133c8fa 3746 while (start <= end)
1397dc18
KH
3747 coding->spec.ccl.valid_codes[start++] = 1;
3748 }
3749 }
3750 }
4ed46869 3751 }
c952af22 3752 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
aaaf0b1e 3753 coding->spec.ccl.cr_carryover = 0;
1c3478b0 3754 coding->spec.ccl.eight_bit_carryover[0] = 0;
4ed46869
KH
3755 break;
3756
27901516
KH
3757 case 5:
3758 coding->type = coding_type_raw_text;
3759 break;
3760
4ed46869 3761 default:
d46c5b12 3762 goto label_invalid_coding_system;
4ed46869
KH
3763 }
3764 return 0;
3765
3766 label_invalid_coding_system:
3767 coding->type = coding_type_no_conversion;
d46c5b12 3768 coding->category_idx = CODING_CATEGORY_IDX_BINARY;
c952af22 3769 coding->common_flags = 0;
dec137e5 3770 coding->eol_type = CODING_EOL_LF;
d46c5b12 3771 coding->pre_write_conversion = coding->post_read_conversion = Qnil;
4ed46869
KH
3772 return -1;
3773}
3774
ec6d2bb8
KH
3775/* Free memory blocks allocated for storing composition information. */
3776
3777void
3778coding_free_composition_data (coding)
3779 struct coding_system *coding;
3780{
3781 struct composition_data *cmp_data = coding->cmp_data, *next;
3782
3783 if (!cmp_data)
3784 return;
3785 /* Memory blocks are chained. At first, rewind to the first, then,
3786 free blocks one by one. */
3787 while (cmp_data->prev)
3788 cmp_data = cmp_data->prev;
3789 while (cmp_data)
3790 {
3791 next = cmp_data->next;
3792 xfree (cmp_data);
3793 cmp_data = next;
3794 }
3795 coding->cmp_data = NULL;
3796}
3797
3798/* Set `char_offset' member of all memory blocks pointed by
3799 coding->cmp_data to POS. */
3800
3801void
3802coding_adjust_composition_offset (coding, pos)
3803 struct coding_system *coding;
3804 int pos;
3805{
3806 struct composition_data *cmp_data;
3807
3808 for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
3809 cmp_data->char_offset = pos;
3810}
3811
54f78171
KH
3812/* Setup raw-text or one of its subsidiaries in the structure
3813 coding_system CODING according to the already setup value eol_type
3814 in CODING. CODING should be setup for some coding system in
3815 advance. */
3816
3817void
3818setup_raw_text_coding_system (coding)
3819 struct coding_system *coding;
3820{
3821 if (coding->type != coding_type_raw_text)
3822 {
3823 coding->symbol = Qraw_text;
3824 coding->type = coding_type_raw_text;
3825 if (coding->eol_type != CODING_EOL_UNDECIDED)
3826 {
84d60297
RS
3827 Lisp_Object subsidiaries;
3828 subsidiaries = Fget (Qraw_text, Qeol_type);
54f78171
KH
3829
3830 if (VECTORP (subsidiaries)
3831 && XVECTOR (subsidiaries)->size == 3)
3832 coding->symbol
3833 = XVECTOR (subsidiaries)->contents[coding->eol_type];
3834 }
716e0b0a 3835 setup_coding_system (coding->symbol, coding);
54f78171
KH
3836 }
3837 return;
3838}
3839
4ed46869
KH
3840/* Emacs has a mechanism to automatically detect a coding system if it
3841 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But,
3842 it's impossible to distinguish some coding systems accurately
3843 because they use the same range of codes. So, at first, coding
3844 systems are categorized into 7, those are:
3845
0ef69138 3846 o coding-category-emacs-mule
4ed46869
KH
3847
3848 The category for a coding system which has the same code range
3849 as Emacs' internal format. Assigned the coding-system (Lisp
0ef69138 3850 symbol) `emacs-mule' by default.
4ed46869
KH
3851
3852 o coding-category-sjis
3853
3854 The category for a coding system which has the same code range
3855 as SJIS. Assigned the coding-system (Lisp
7717c392 3856 symbol) `japanese-shift-jis' by default.
4ed46869
KH
3857
3858 o coding-category-iso-7
3859
3860 The category for a coding system which has the same code range
7717c392 3861 as ISO2022 of 7-bit environment. This doesn't use any locking
d46c5b12
KH
3862 shift and single shift functions. This can encode/decode all
3863 charsets. Assigned the coding-system (Lisp symbol)
3864 `iso-2022-7bit' by default.
3865
3866 o coding-category-iso-7-tight
3867
3868 Same as coding-category-iso-7 except that this can
3869 encode/decode only the specified charsets.
4ed46869
KH
3870
3871 o coding-category-iso-8-1
3872
3873 The category for a coding system which has the same code range
3874 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3875 for DIMENSION1 charset. This doesn't use any locking shift
3876 and single shift functions. Assigned the coding-system (Lisp
3877 symbol) `iso-latin-1' by default.
4ed46869
KH
3878
3879 o coding-category-iso-8-2
3880
3881 The category for a coding system which has the same code range
3882 as ISO2022 of 8-bit environment and graphic plane 1 used only
7717c392
KH
3883 for DIMENSION2 charset. This doesn't use any locking shift
3884 and single shift functions. Assigned the coding-system (Lisp
3885 symbol) `japanese-iso-8bit' by default.
4ed46869 3886
7717c392 3887 o coding-category-iso-7-else
4ed46869
KH
3888
3889 The category for a coding system which has the same code range
8ca3766a 3890 as ISO2022 of 7-bit environment but uses locking shift or
7717c392
KH
3891 single shift functions. Assigned the coding-system (Lisp
3892 symbol) `iso-2022-7bit-lock' by default.
3893
3894 o coding-category-iso-8-else
3895
3896 The category for a coding system which has the same code range
8ca3766a 3897 as ISO2022 of 8-bit environment but uses locking shift or
7717c392
KH
3898 single shift functions. Assigned the coding-system (Lisp
3899 symbol) `iso-2022-8bit-ss2' by default.
4ed46869
KH
3900
3901 o coding-category-big5
3902
3903 The category for a coding system which has the same code range
3904 as BIG5. Assigned the coding-system (Lisp symbol)
e0e989f6 3905 `cn-big5' by default.
4ed46869 3906
fa42c37f
KH
3907 o coding-category-utf-8
3908
3909 The category for a coding system which has the same code range
3910 as UTF-8 (cf. RFC2279). Assigned the coding-system (Lisp
3911 symbol) `utf-8' by default.
3912
3913 o coding-category-utf-16-be
3914
3915 The category for a coding system in which a text has an
3916 Unicode signature (cf. Unicode Standard) in the order of BIG
3917 endian at the head. Assigned the coding-system (Lisp symbol)
3918 `utf-16-be' by default.
3919
3920 o coding-category-utf-16-le
3921
3922 The category for a coding system in which a text has an
3923 Unicode signature (cf. Unicode Standard) in the order of
3924 LITTLE endian at the head. Assigned the coding-system (Lisp
3925 symbol) `utf-16-le' by default.
3926
1397dc18
KH
3927 o coding-category-ccl
3928
3929 The category for a coding system of which encoder/decoder is
3930 written in CCL programs. The default value is nil, i.e., no
3931 coding system is assigned.
3932
4ed46869
KH
3933 o coding-category-binary
3934
3935 The category for a coding system not categorized in any of the
3936 above. Assigned the coding-system (Lisp symbol)
e0e989f6 3937 `no-conversion' by default.
4ed46869
KH
3938
3939 Each of them is a Lisp symbol and the value is an actual
cfb43547 3940 `coding-system' (this is also a Lisp symbol) assigned by a user.
4ed46869
KH
3941 What Emacs does actually is to detect a category of coding system.
3942 Then, it uses a `coding-system' assigned to it. If Emacs can't
cfb43547 3943 decide a single possible category, it selects a category of the
4ed46869
KH
3944 highest priority. Priorities of categories are also specified by a
3945 user in a Lisp variable `coding-category-list'.
3946
3947*/
3948
66cfb530
KH
3949static
3950int ascii_skip_code[256];
3951
d46c5b12 3952/* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded.
4ed46869
KH
3953 If it detects possible coding systems, return an integer in which
3954 appropriate flag bits are set. Flag bits are defined by macros
fa42c37f
KH
3955 CODING_CATEGORY_MASK_XXX in `coding.h'. If PRIORITIES is non-NULL,
3956 it should point the table `coding_priorities'. In that case, only
3957 the flag bit for a coding system of the highest priority is set in
0a28aafb
KH
3958 the returned value. If MULTIBYTEP is nonzero, 8-bit codes of the
3959 range 0x80..0x9F are in multibyte form.
4ed46869 3960
d46c5b12
KH
3961 How many ASCII characters are at the head is returned as *SKIP. */
3962
3963static int
0a28aafb 3964detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
d46c5b12
KH
3965 unsigned char *source;
3966 int src_bytes, *priorities, *skip;
0a28aafb 3967 int multibytep;
4ed46869
KH
3968{
3969 register unsigned char c;
d46c5b12 3970 unsigned char *src = source, *src_end = source + src_bytes;
fa42c37f 3971 unsigned int mask, utf16_examined_p, iso2022_examined_p;
da55a2b7 3972 int i;
4ed46869
KH
3973
3974 /* At first, skip all ASCII characters and control characters except
3975 for three ISO2022 specific control characters. */
66cfb530
KH
3976 ascii_skip_code[ISO_CODE_SO] = 0;
3977 ascii_skip_code[ISO_CODE_SI] = 0;
3978 ascii_skip_code[ISO_CODE_ESC] = 0;
3979
bcf26d6a 3980 label_loop_detect_coding:
66cfb530 3981 while (src < src_end && ascii_skip_code[*src]) src++;
d46c5b12 3982 *skip = src - source;
4ed46869
KH
3983
3984 if (src >= src_end)
3985 /* We found nothing other than ASCII. There's nothing to do. */
d46c5b12 3986 return 0;
4ed46869 3987
8a8147d6 3988 c = *src;
4ed46869
KH
3989 /* The text seems to be encoded in some multilingual coding system.
3990 Now, try to find in which coding system the text is encoded. */
3991 if (c < 0x80)
bcf26d6a
KH
3992 {
3993 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
3994 /* C is an ISO2022 specific control code of C0. */
0a28aafb 3995 mask = detect_coding_iso2022 (src, src_end, multibytep);
1b2af4b0 3996 if (mask == 0)
d46c5b12
KH
3997 {
3998 /* No valid ISO2022 code follows C. Try again. */
3999 src++;
66cfb530
KH
4000 if (c == ISO_CODE_ESC)
4001 ascii_skip_code[ISO_CODE_ESC] = 1;
4002 else
4003 ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
d46c5b12
KH
4004 goto label_loop_detect_coding;
4005 }
4006 if (priorities)
fa42c37f
KH
4007 {
4008 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
4009 {
4010 if (mask & priorities[i])
4011 return priorities[i];
4012 }
4013 return CODING_CATEGORY_MASK_RAW_TEXT;
4014 }
bcf26d6a 4015 }
d46c5b12 4016 else
c4825358 4017 {
d46c5b12 4018 int try;
4ed46869 4019
0a28aafb 4020 if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)
67091e59 4021 c = src[1] - 0x20;
0a28aafb 4022
d46c5b12
KH
4023 if (c < 0xA0)
4024 {
4025 /* C is the first byte of SJIS character code,
fa42c37f
KH
4026 or a leading-code of Emacs' internal format (emacs-mule),
4027 or the first byte of UTF-16. */
4028 try = (CODING_CATEGORY_MASK_SJIS
4029 | CODING_CATEGORY_MASK_EMACS_MULE
4030 | CODING_CATEGORY_MASK_UTF_16_BE
4031 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12
KH
4032
4033 /* Or, if C is a special latin extra code,
93dec019 4034 or is an ISO2022 specific control code of C1 (SS2 or SS3),
d46c5b12
KH
4035 or is an ISO2022 control-sequence-introducer (CSI),
4036 we should also consider the possibility of ISO2022 codings. */
4037 if ((VECTORP (Vlatin_extra_code_table)
4038 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
4039 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
4040 || (c == ISO_CODE_CSI
4041 && (src < src_end
4042 && (*src == ']'
4043 || ((*src == '0' || *src == '1' || *src == '2')
4044 && src + 1 < src_end
4045 && src[1] == ']')))))
4046 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
4047 | CODING_CATEGORY_MASK_ISO_8BIT);
4048 }
c4825358 4049 else
d46c5b12
KH
4050 /* C is a character of ISO2022 in graphic plane right,
4051 or a SJIS's 1-byte character code (i.e. JISX0201),
fa42c37f
KH
4052 or the first byte of BIG5's 2-byte code,
4053 or the first byte of UTF-8/16. */
d46c5b12
KH
4054 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
4055 | CODING_CATEGORY_MASK_ISO_8BIT
4056 | CODING_CATEGORY_MASK_SJIS
fa42c37f
KH
4057 | CODING_CATEGORY_MASK_BIG5
4058 | CODING_CATEGORY_MASK_UTF_8
4059 | CODING_CATEGORY_MASK_UTF_16_BE
4060 | CODING_CATEGORY_MASK_UTF_16_LE);
d46c5b12 4061
1397dc18
KH
4062 /* Or, we may have to consider the possibility of CCL. */
4063 if (coding_system_table[CODING_CATEGORY_IDX_CCL]
4064 && (coding_system_table[CODING_CATEGORY_IDX_CCL]
4065 ->spec.ccl.valid_codes)[c])
4066 try |= CODING_CATEGORY_MASK_CCL;
4067
d46c5b12 4068 mask = 0;
fa42c37f 4069 utf16_examined_p = iso2022_examined_p = 0;
d46c5b12
KH
4070 if (priorities)
4071 {
4072 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
4073 {
fa42c37f
KH
4074 if (!iso2022_examined_p
4075 && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
4076 {
0192762c 4077 mask |= detect_coding_iso2022 (src, src_end, multibytep);
fa42c37f
KH
4078 iso2022_examined_p = 1;
4079 }
5ab13dd0 4080 else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
0a28aafb 4081 mask |= detect_coding_sjis (src, src_end, multibytep);
fa42c37f 4082 else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
0a28aafb 4083 mask |= detect_coding_utf_8 (src, src_end, multibytep);
fa42c37f
KH
4084 else if (!utf16_examined_p
4085 && (priorities[i] & try &
4086 CODING_CATEGORY_MASK_UTF_16_BE_LE))
4087 {
0a28aafb 4088 mask |= detect_coding_utf_16 (src, src_end, multibytep);
fa42c37f
KH
4089 utf16_examined_p = 1;
4090 }
5ab13dd0 4091 else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
0a28aafb 4092 mask |= detect_coding_big5 (src, src_end, multibytep);
5ab13dd0 4093 else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
0a28aafb 4094 mask |= detect_coding_emacs_mule (src, src_end, multibytep);
89fa8b36 4095 else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
0a28aafb 4096 mask |= detect_coding_ccl (src, src_end, multibytep);
5ab13dd0 4097 else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
fa42c37f 4098 mask |= CODING_CATEGORY_MASK_RAW_TEXT;
5ab13dd0 4099 else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
fa42c37f
KH
4100 mask |= CODING_CATEGORY_MASK_BINARY;
4101 if (mask & priorities[i])
4102 return priorities[i];
d46c5b12
KH
4103 }
4104 return CODING_CATEGORY_MASK_RAW_TEXT;
4105 }
4106 if (try & CODING_CATEGORY_MASK_ISO)
0a28aafb 4107 mask |= detect_coding_iso2022 (src, src_end, multibytep);
d46c5b12 4108 if (try & CODING_CATEGORY_MASK_SJIS)
0a28aafb 4109 mask |= detect_coding_sjis (src, src_end, multibytep);
d46c5b12 4110 if (try & CODING_CATEGORY_MASK_BIG5)
0a28aafb 4111 mask |= detect_coding_big5 (src, src_end, multibytep);
fa42c37f 4112 if (try & CODING_CATEGORY_MASK_UTF_8)
0a28aafb 4113 mask |= detect_coding_utf_8 (src, src_end, multibytep);
fa42c37f 4114 if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
0a28aafb 4115 mask |= detect_coding_utf_16 (src, src_end, multibytep);
d46c5b12 4116 if (try & CODING_CATEGORY_MASK_EMACS_MULE)
0a28aafb 4117 mask |= detect_coding_emacs_mule (src, src_end, multibytep);
1397dc18 4118 if (try & CODING_CATEGORY_MASK_CCL)
0a28aafb 4119 mask |= detect_coding_ccl (src, src_end, multibytep);
c4825358 4120 }
5ab13dd0 4121 return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
4ed46869
KH
4122}
4123
4124/* Detect how a text of length SRC_BYTES pointed by SRC is encoded.
4125 The information of the detected coding system is set in CODING. */
4126
4127void
4128detect_coding (coding, src, src_bytes)
4129 struct coding_system *coding;
a4244313 4130 const unsigned char *src;
4ed46869
KH
4131 int src_bytes;
4132{
d46c5b12 4133 unsigned int idx;
da55a2b7 4134 int skip, mask;
84d60297 4135 Lisp_Object val;
4ed46869 4136
84d60297 4137 val = Vcoding_category_list;
64c1e55f
KH
4138 mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip,
4139 coding->src_multibyte);
d46c5b12 4140 coding->heading_ascii = skip;
4ed46869 4141
d46c5b12
KH
4142 if (!mask) return;
4143
4144 /* We found a single coding system of the highest priority in MASK. */
4145 idx = 0;
4146 while (mask && ! (mask & 1)) mask >>= 1, idx++;
4147 if (! mask)
4148 idx = CODING_CATEGORY_IDX_RAW_TEXT;
4ed46869 4149
f5c1dd0d 4150 val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[idx]);
d46c5b12
KH
4151
4152 if (coding->eol_type != CODING_EOL_UNDECIDED)
27901516 4153 {
84d60297 4154 Lisp_Object tmp;
d46c5b12 4155
84d60297 4156 tmp = Fget (val, Qeol_type);
d46c5b12
KH
4157 if (VECTORP (tmp))
4158 val = XVECTOR (tmp)->contents[coding->eol_type];
4ed46869 4159 }
b73bfc1c
KH
4160
4161 /* Setup this new coding system while preserving some slots. */
4162 {
4163 int src_multibyte = coding->src_multibyte;
4164 int dst_multibyte = coding->dst_multibyte;
4165
4166 setup_coding_system (val, coding);
4167 coding->src_multibyte = src_multibyte;
4168 coding->dst_multibyte = dst_multibyte;
4169 coding->heading_ascii = skip;
4170 }
4ed46869
KH
4171}
4172
d46c5b12
KH
4173/* Detect how end-of-line of a text of length SRC_BYTES pointed by
4174 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
4175 CODING_EOL_CR, and CODING_EOL_UNDECIDED.
4176
4177 How many non-eol characters are at the head is returned as *SKIP. */
4ed46869 4178
bc4bc72a
RS
4179#define MAX_EOL_CHECK_COUNT 3
4180
d46c5b12
KH
4181static int
4182detect_eol_type (source, src_bytes, skip)
4183 unsigned char *source;
4184 int src_bytes, *skip;
4ed46869 4185{
d46c5b12 4186 unsigned char *src = source, *src_end = src + src_bytes;
4ed46869 4187 unsigned char c;
bc4bc72a
RS
4188 int total = 0; /* How many end-of-lines are found so far. */
4189 int eol_type = CODING_EOL_UNDECIDED;
4190 int this_eol_type;
4ed46869 4191
d46c5b12
KH
4192 *skip = 0;
4193
bc4bc72a 4194 while (src < src_end && total < MAX_EOL_CHECK_COUNT)
4ed46869
KH
4195 {
4196 c = *src++;
bc4bc72a 4197 if (c == '\n' || c == '\r')
4ed46869 4198 {
d46c5b12
KH
4199 if (*skip == 0)
4200 *skip = src - 1 - source;
bc4bc72a
RS
4201 total++;
4202 if (c == '\n')
4203 this_eol_type = CODING_EOL_LF;
4204 else if (src >= src_end || *src != '\n')
4205 this_eol_type = CODING_EOL_CR;
4ed46869 4206 else
bc4bc72a
RS
4207 this_eol_type = CODING_EOL_CRLF, src++;
4208
4209 if (eol_type == CODING_EOL_UNDECIDED)
4210 /* This is the first end-of-line. */
4211 eol_type = this_eol_type;
4212 else if (eol_type != this_eol_type)
d46c5b12
KH
4213 {
4214 /* The found type is different from what found before. */
4215 eol_type = CODING_EOL_INCONSISTENT;
4216 break;
4217 }
4ed46869
KH
4218 }
4219 }
bc4bc72a 4220
d46c5b12
KH
4221 if (*skip == 0)
4222 *skip = src_end - source;
85a02ca4 4223 return eol_type;
4ed46869
KH
4224}
4225
fa42c37f
KH
4226/* Like detect_eol_type, but detect EOL type in 2-octet
4227 big-endian/little-endian format for coding systems utf-16-be and
4228 utf-16-le. */
4229
4230static int
4231detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
4232 unsigned char *source;
cfb43547 4233 int src_bytes, *skip, big_endian_p;
fa42c37f
KH
4234{
4235 unsigned char *src = source, *src_end = src + src_bytes;
4236 unsigned int c1, c2;
4237 int total = 0; /* How many end-of-lines are found so far. */
4238 int eol_type = CODING_EOL_UNDECIDED;
4239 int this_eol_type;
4240 int msb, lsb;
4241
4242 if (big_endian_p)
4243 msb = 0, lsb = 1;
4244 else
4245 msb = 1, lsb = 0;
4246
4247 *skip = 0;
4248
4249 while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
4250 {
4251 c1 = (src[msb] << 8) | (src[lsb]);
4252 src += 2;
4253
4254 if (c1 == '\n' || c1 == '\r')
4255 {
4256 if (*skip == 0)
4257 *skip = src - 2 - source;
4258 total++;
4259 if (c1 == '\n')
4260 {
4261 this_eol_type = CODING_EOL_LF;
4262 }
4263 else
4264 {
4265 if ((src + 1) >= src_end)
4266 {
4267 this_eol_type = CODING_EOL_CR;
4268 }
4269 else
4270 {
4271 c2 = (src[msb] << 8) | (src[lsb]);
4272 if (c2 == '\n')
4273 this_eol_type = CODING_EOL_CRLF, src += 2;
4274 else
4275 this_eol_type = CODING_EOL_CR;
4276 }
4277 }
4278
4279 if (eol_type == CODING_EOL_UNDECIDED)
4280 /* This is the first end-of-line. */
4281 eol_type = this_eol_type;
4282 else if (eol_type != this_eol_type)
4283 {
4284 /* The found type is different from what found before. */
4285 eol_type = CODING_EOL_INCONSISTENT;
4286 break;
4287 }
4288 }
4289 }
4290
4291 if (*skip == 0)
4292 *skip = src_end - source;
4293 return eol_type;
4294}
4295
4ed46869
KH
4296/* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
4297 is encoded. If it detects an appropriate format of end-of-line, it
4298 sets the information in *CODING. */
4299
4300void
4301detect_eol (coding, src, src_bytes)
4302 struct coding_system *coding;
a4244313 4303 const unsigned char *src;
4ed46869
KH
4304 int src_bytes;
4305{
4608c386 4306 Lisp_Object val;
d46c5b12 4307 int skip;
fa42c37f
KH
4308 int eol_type;
4309
4310 switch (coding->category_idx)
4311 {
4312 case CODING_CATEGORY_IDX_UTF_16_BE:
4313 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
4314 break;
4315 case CODING_CATEGORY_IDX_UTF_16_LE:
4316 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
4317 break;
4318 default:
4319 eol_type = detect_eol_type (src, src_bytes, &skip);
4320 break;
4321 }
d46c5b12
KH
4322
4323 if (coding->heading_ascii > skip)
4324 coding->heading_ascii = skip;
4325 else
4326 skip = coding->heading_ascii;
4ed46869 4327
0ef69138 4328 if (eol_type == CODING_EOL_UNDECIDED)
4ed46869 4329 return;
27901516
KH
4330 if (eol_type == CODING_EOL_INCONSISTENT)
4331 {
4332#if 0
4333 /* This code is suppressed until we find a better way to
992f23f2 4334 distinguish raw text file and binary file. */
27901516
KH
4335
4336 /* If we have already detected that the coding is raw-text, the
4337 coding should actually be no-conversion. */
4338 if (coding->type == coding_type_raw_text)
4339 {
4340 setup_coding_system (Qno_conversion, coding);
4341 return;
4342 }
4343 /* Else, let's decode only text code anyway. */
4344#endif /* 0 */
1b2af4b0 4345 eol_type = CODING_EOL_LF;
27901516
KH
4346 }
4347
4608c386 4348 val = Fget (coding->symbol, Qeol_type);
4ed46869 4349 if (VECTORP (val) && XVECTOR (val)->size == 3)
d46c5b12 4350 {
b73bfc1c
KH
4351 int src_multibyte = coding->src_multibyte;
4352 int dst_multibyte = coding->dst_multibyte;
1cd6b64c 4353 struct composition_data *cmp_data = coding->cmp_data;
b73bfc1c 4354
d46c5b12 4355 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
b73bfc1c
KH
4356 coding->src_multibyte = src_multibyte;
4357 coding->dst_multibyte = dst_multibyte;
d46c5b12 4358 coding->heading_ascii = skip;
1cd6b64c 4359 coding->cmp_data = cmp_data;
d46c5b12
KH
4360 }
4361}
4362
4363#define CONVERSION_BUFFER_EXTRA_ROOM 256
4364
b73bfc1c
KH
4365#define DECODING_BUFFER_MAG(coding) \
4366 (coding->type == coding_type_iso2022 \
4367 ? 3 \
4368 : (coding->type == coding_type_ccl \
4369 ? coding->spec.ccl.decoder.buf_magnification \
4370 : 2))
d46c5b12
KH
4371
4372/* Return maximum size (bytes) of a buffer enough for decoding
4373 SRC_BYTES of text encoded in CODING. */
4374
4375int
4376decoding_buffer_size (coding, src_bytes)
4377 struct coding_system *coding;
4378 int src_bytes;
4379{
4380 return (src_bytes * DECODING_BUFFER_MAG (coding)
4381 + CONVERSION_BUFFER_EXTRA_ROOM);
4382}
4383
4384/* Return maximum size (bytes) of a buffer enough for encoding
4385 SRC_BYTES of text to CODING. */
4386
4387int
4388encoding_buffer_size (coding, src_bytes)
4389 struct coding_system *coding;
4390 int src_bytes;
4391{
4392 int magnification;
4393
4394 if (coding->type == coding_type_ccl)
4395 magnification = coding->spec.ccl.encoder.buf_magnification;
b73bfc1c 4396 else if (CODING_REQUIRE_ENCODING (coding))
d46c5b12 4397 magnification = 3;
b73bfc1c
KH
4398 else
4399 magnification = 1;
d46c5b12
KH
4400
4401 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
4402}
4403
73be902c
KH
4404/* Working buffer for code conversion. */
4405struct conversion_buffer
4406{
4407 int size; /* size of data. */
4408 int on_stack; /* 1 if allocated by alloca. */
4409 unsigned char *data;
4410};
d46c5b12 4411
73be902c
KH
4412/* Don't use alloca for allocating memory space larger than this, lest
4413 we overflow their stack. */
4414#define MAX_ALLOCA 16*1024
d46c5b12 4415
73be902c
KH
4416/* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
4417#define allocate_conversion_buffer(buf, len) \
4418 do { \
4419 if (len < MAX_ALLOCA) \
4420 { \
4421 buf.data = (unsigned char *) alloca (len); \
4422 buf.on_stack = 1; \
4423 } \
4424 else \
4425 { \
4426 buf.data = (unsigned char *) xmalloc (len); \
4427 buf.on_stack = 0; \
4428 } \
4429 buf.size = len; \
4430 } while (0)
d46c5b12 4431
73be902c
KH
4432/* Double the allocated memory for *BUF. */
4433static void
4434extend_conversion_buffer (buf)
4435 struct conversion_buffer *buf;
d46c5b12 4436{
73be902c 4437 if (buf->on_stack)
d46c5b12 4438 {
73be902c
KH
4439 unsigned char *save = buf->data;
4440 buf->data = (unsigned char *) xmalloc (buf->size * 2);
4441 bcopy (save, buf->data, buf->size);
4442 buf->on_stack = 0;
d46c5b12 4443 }
73be902c
KH
4444 else
4445 {
4446 buf->data = (unsigned char *) xrealloc (buf->data, buf->size * 2);
4447 }
4448 buf->size *= 2;
4449}
4450
4451/* Free the allocated memory for BUF if it is not on stack. */
4452static void
4453free_conversion_buffer (buf)
4454 struct conversion_buffer *buf;
4455{
4456 if (!buf->on_stack)
4457 xfree (buf->data);
d46c5b12
KH
4458}
4459
4460int
4461ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
4462 struct coding_system *coding;
4463 unsigned char *source, *destination;
4464 int src_bytes, dst_bytes, encodep;
4465{
4466 struct ccl_program *ccl
4467 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
1c3478b0 4468 unsigned char *dst = destination;
d46c5b12 4469
bd64290d 4470 ccl->suppress_error = coding->suppress_error;
ae9ff118 4471 ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
aaaf0b1e 4472 if (encodep)
80e0ca99
KH
4473 {
4474 /* On encoding, EOL format is converted within ccl_driver. For
4475 that, setup proper information in the structure CCL. */
4476 ccl->eol_type = coding->eol_type;
4477 if (ccl->eol_type ==CODING_EOL_UNDECIDED)
4478 ccl->eol_type = CODING_EOL_LF;
4479 ccl->cr_consumed = coding->spec.ccl.cr_carryover;
4480 }
7272d75c 4481 ccl->multibyte = coding->src_multibyte;
1c3478b0
KH
4482 if (coding->spec.ccl.eight_bit_carryover[0] != 0)
4483 {
4484 /* Move carryover bytes to DESTINATION. */
4485 unsigned char *p = coding->spec.ccl.eight_bit_carryover;
4486 while (*p)
4487 *dst++ = *p++;
4488 coding->spec.ccl.eight_bit_carryover[0] = 0;
4489 if (dst_bytes)
4490 dst_bytes -= dst - destination;
4491 }
4492
4493 coding->produced = (ccl_driver (ccl, source, dst, src_bytes, dst_bytes,
4494 &(coding->consumed))
4495 + dst - destination);
4496
b73bfc1c 4497 if (encodep)
80e0ca99
KH
4498 {
4499 coding->produced_char = coding->produced;
4500 coding->spec.ccl.cr_carryover = ccl->cr_consumed;
4501 }
ade8d05e
KH
4502 else if (!ccl->eight_bit_control)
4503 {
4504 /* The produced bytes forms a valid multibyte sequence. */
4505 coding->produced_char
4506 = multibyte_chars_in_text (destination, coding->produced);
4507 coding->spec.ccl.eight_bit_carryover[0] = 0;
4508 }
b73bfc1c
KH
4509 else
4510 {
1c3478b0
KH
4511 /* On decoding, the destination should always multibyte. But,
4512 CCL program might have been generated an invalid multibyte
4513 sequence. Here we make such a sequence valid as
4514 multibyte. */
b73bfc1c
KH
4515 int bytes
4516 = dst_bytes ? dst_bytes : source + coding->consumed - destination;
1c3478b0
KH
4517
4518 if ((coding->consumed < src_bytes
4519 || !ccl->last_block)
4520 && coding->produced >= 1
4521 && destination[coding->produced - 1] >= 0x80)
4522 {
4523 /* We should not convert the tailing 8-bit codes to
4524 multibyte form even if they doesn't form a valid
4525 multibyte sequence. They may form a valid sequence in
4526 the next call. */
4527 int carryover = 0;
4528
4529 if (destination[coding->produced - 1] < 0xA0)
4530 carryover = 1;
4531 else if (coding->produced >= 2)
4532 {
4533 if (destination[coding->produced - 2] >= 0x80)
4534 {
4535 if (destination[coding->produced - 2] < 0xA0)
4536 carryover = 2;
4537 else if (coding->produced >= 3
4538 && destination[coding->produced - 3] >= 0x80
4539 && destination[coding->produced - 3] < 0xA0)
4540 carryover = 3;
4541 }
4542 }
4543 if (carryover > 0)
4544 {
4545 BCOPY_SHORT (destination + coding->produced - carryover,
4546 coding->spec.ccl.eight_bit_carryover,
4547 carryover);
4548 coding->spec.ccl.eight_bit_carryover[carryover] = 0;
4549 coding->produced -= carryover;
4550 }
4551 }
b73bfc1c
KH
4552 coding->produced = str_as_multibyte (destination, bytes,
4553 coding->produced,
4554 &(coding->produced_char));
4555 }
69f76525 4556
d46c5b12
KH
4557 switch (ccl->status)
4558 {
4559 case CCL_STAT_SUSPEND_BY_SRC:
73be902c 4560 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
d46c5b12
KH
4561 break;
4562 case CCL_STAT_SUSPEND_BY_DST:
73be902c 4563 coding->result = CODING_FINISH_INSUFFICIENT_DST;
d46c5b12 4564 break;
9864ebce
KH
4565 case CCL_STAT_QUIT:
4566 case CCL_STAT_INVALID_CMD:
73be902c 4567 coding->result = CODING_FINISH_INTERRUPT;
9864ebce 4568 break;
d46c5b12 4569 default:
73be902c 4570 coding->result = CODING_FINISH_NORMAL;
d46c5b12
KH
4571 break;
4572 }
73be902c 4573 return coding->result;
4ed46869
KH
4574}
4575
aaaf0b1e
KH
4576/* Decode EOL format of the text at PTR of BYTES length destructively
4577 according to CODING->eol_type. This is called after the CCL
4578 program produced a decoded text at PTR. If we do CRLF->LF
4579 conversion, update CODING->produced and CODING->produced_char. */
4580
4581static void
4582decode_eol_post_ccl (coding, ptr, bytes)
4583 struct coding_system *coding;
4584 unsigned char *ptr;
4585 int bytes;
4586{
4587 Lisp_Object val, saved_coding_symbol;
4588 unsigned char *pend = ptr + bytes;
4589 int dummy;
4590
4591 /* Remember the current coding system symbol. We set it back when
4592 an inconsistent EOL is found so that `last-coding-system-used' is
4593 set to the coding system that doesn't specify EOL conversion. */
4594 saved_coding_symbol = coding->symbol;
4595
4596 coding->spec.ccl.cr_carryover = 0;
4597 if (coding->eol_type == CODING_EOL_UNDECIDED)
4598 {
4599 /* Here, to avoid the call of setup_coding_system, we directly
4600 call detect_eol_type. */
4601 coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
74b01b80
EZ
4602 if (coding->eol_type == CODING_EOL_INCONSISTENT)
4603 coding->eol_type = CODING_EOL_LF;
4604 if (coding->eol_type != CODING_EOL_UNDECIDED)
4605 {
4606 val = Fget (coding->symbol, Qeol_type);
4607 if (VECTORP (val) && XVECTOR (val)->size == 3)
4608 coding->symbol = XVECTOR (val)->contents[coding->eol_type];
4609 }
aaaf0b1e
KH
4610 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4611 }
4612
74b01b80
EZ
4613 if (coding->eol_type == CODING_EOL_LF
4614 || coding->eol_type == CODING_EOL_UNDECIDED)
aaaf0b1e
KH
4615 {
4616 /* We have nothing to do. */
4617 ptr = pend;
4618 }
4619 else if (coding->eol_type == CODING_EOL_CRLF)
4620 {
4621 unsigned char *pstart = ptr, *p = ptr;
4622
4623 if (! (coding->mode & CODING_MODE_LAST_BLOCK)
4624 && *(pend - 1) == '\r')
4625 {
4626 /* If the last character is CR, we can't handle it here
4627 because LF will be in the not-yet-decoded source text.
9861e777 4628 Record that the CR is not yet processed. */
aaaf0b1e
KH
4629 coding->spec.ccl.cr_carryover = 1;
4630 coding->produced--;
4631 coding->produced_char--;
4632 pend--;
4633 }
4634 while (ptr < pend)
4635 {
4636 if (*ptr == '\r')
4637 {
4638 if (ptr + 1 < pend && *(ptr + 1) == '\n')
4639 {
4640 *p++ = '\n';
4641 ptr += 2;
4642 }
4643 else
4644 {
4645 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4646 goto undo_eol_conversion;
4647 *p++ = *ptr++;
4648 }
4649 }
4650 else if (*ptr == '\n'
4651 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4652 goto undo_eol_conversion;
4653 else
4654 *p++ = *ptr++;
4655 continue;
4656
4657 undo_eol_conversion:
4658 /* We have faced with inconsistent EOL format at PTR.
4659 Convert all LFs before PTR back to CRLFs. */
4660 for (p--, ptr--; p >= pstart; p--)
4661 {
4662 if (*p == '\n')
4663 *ptr-- = '\n', *ptr-- = '\r';
4664 else
4665 *ptr-- = *p;
4666 }
4667 /* If carryover is recorded, cancel it because we don't
4668 convert CRLF anymore. */
4669 if (coding->spec.ccl.cr_carryover)
4670 {
4671 coding->spec.ccl.cr_carryover = 0;
4672 coding->produced++;
4673 coding->produced_char++;
4674 pend++;
4675 }
4676 p = ptr = pend;
4677 coding->eol_type = CODING_EOL_LF;
4678 coding->symbol = saved_coding_symbol;
4679 }
4680 if (p < pend)
4681 {
4682 /* As each two-byte sequence CRLF was converted to LF, (PEND
4683 - P) is the number of deleted characters. */
4684 coding->produced -= pend - p;
4685 coding->produced_char -= pend - p;
4686 }
4687 }
4688 else /* i.e. coding->eol_type == CODING_EOL_CR */
4689 {
4690 unsigned char *p = ptr;
4691
4692 for (; ptr < pend; ptr++)
4693 {
4694 if (*ptr == '\r')
4695 *ptr = '\n';
4696 else if (*ptr == '\n'
4697 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4698 {
4699 for (; p < ptr; p++)
4700 {
4701 if (*p == '\n')
4702 *p = '\r';
4703 }
4704 ptr = pend;
4705 coding->eol_type = CODING_EOL_LF;
4706 coding->symbol = saved_coding_symbol;
4707 }
4708 }
4709 }
4710}
4711
4ed46869
KH
4712/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4713 decoding, it may detect coding system and format of end-of-line if
b73bfc1c
KH
4714 those are not yet decided. The source should be unibyte, the
4715 result is multibyte if CODING->dst_multibyte is nonzero, else
4716 unibyte. */
4ed46869
KH
4717
4718int
d46c5b12 4719decode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869 4720 struct coding_system *coding;
a4244313
KR
4721 const unsigned char *source;
4722 unsigned char *destination;
4ed46869 4723 int src_bytes, dst_bytes;
4ed46869 4724{
9861e777
EZ
4725 int extra = 0;
4726
0ef69138 4727 if (coding->type == coding_type_undecided)
4ed46869
KH
4728 detect_coding (coding, source, src_bytes);
4729
aaaf0b1e
KH
4730 if (coding->eol_type == CODING_EOL_UNDECIDED
4731 && coding->type != coding_type_ccl)
8844fa83
KH
4732 {
4733 detect_eol (coding, source, src_bytes);
4734 /* We had better recover the original eol format if we
8ca3766a 4735 encounter an inconsistent eol format while decoding. */
8844fa83
KH
4736 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4737 }
4ed46869 4738
b73bfc1c
KH
4739 coding->produced = coding->produced_char = 0;
4740 coding->consumed = coding->consumed_char = 0;
4741 coding->errors = 0;
4742 coding->result = CODING_FINISH_NORMAL;
4743
4ed46869
KH
4744 switch (coding->type)
4745 {
4ed46869 4746 case coding_type_sjis:
b73bfc1c
KH
4747 decode_coding_sjis_big5 (coding, source, destination,
4748 src_bytes, dst_bytes, 1);
4ed46869
KH
4749 break;
4750
4751 case coding_type_iso2022:
b73bfc1c
KH
4752 decode_coding_iso2022 (coding, source, destination,
4753 src_bytes, dst_bytes);
4ed46869
KH
4754 break;
4755
4756 case coding_type_big5:
b73bfc1c
KH
4757 decode_coding_sjis_big5 (coding, source, destination,
4758 src_bytes, dst_bytes, 0);
4759 break;
4760
4761 case coding_type_emacs_mule:
4762 decode_coding_emacs_mule (coding, source, destination,
4763 src_bytes, dst_bytes);
4ed46869
KH
4764 break;
4765
4766 case coding_type_ccl:
aaaf0b1e
KH
4767 if (coding->spec.ccl.cr_carryover)
4768 {
9861e777
EZ
4769 /* Put the CR which was not processed by the previous call
4770 of decode_eol_post_ccl in DESTINATION. It will be
4771 decoded together with the following LF by the call to
4772 decode_eol_post_ccl below. */
aaaf0b1e
KH
4773 *destination = '\r';
4774 coding->produced++;
4775 coding->produced_char++;
4776 dst_bytes--;
9861e777 4777 extra = coding->spec.ccl.cr_carryover;
aaaf0b1e 4778 }
9861e777 4779 ccl_coding_driver (coding, source, destination + extra,
b73bfc1c 4780 src_bytes, dst_bytes, 0);
aaaf0b1e 4781 if (coding->eol_type != CODING_EOL_LF)
9861e777
EZ
4782 {
4783 coding->produced += extra;
4784 coding->produced_char += extra;
4785 decode_eol_post_ccl (coding, destination, coding->produced);
4786 }
d46c5b12
KH
4787 break;
4788
b73bfc1c
KH
4789 default:
4790 decode_eol (coding, source, destination, src_bytes, dst_bytes);
4791 }
4792
4793 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
e7c9eef9 4794 && coding->mode & CODING_MODE_LAST_BLOCK
b73bfc1c
KH
4795 && coding->consumed == src_bytes)
4796 coding->result = CODING_FINISH_NORMAL;
4797
4798 if (coding->mode & CODING_MODE_LAST_BLOCK
4799 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
4800 {
a4244313 4801 const unsigned char *src = source + coding->consumed;
b73bfc1c
KH
4802 unsigned char *dst = destination + coding->produced;
4803
4804 src_bytes -= coding->consumed;
bb10be8b 4805 coding->errors++;
b73bfc1c
KH
4806 if (COMPOSING_P (coding))
4807 DECODE_COMPOSITION_END ('1');
4808 while (src_bytes--)
d46c5b12 4809 {
b73bfc1c
KH
4810 int c = *src++;
4811 dst += CHAR_STRING (c, dst);
4812 coding->produced_char++;
d46c5b12 4813 }
b73bfc1c
KH
4814 coding->consumed = coding->consumed_char = src - source;
4815 coding->produced = dst - destination;
73be902c 4816 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4817 }
4818
b73bfc1c
KH
4819 if (!coding->dst_multibyte)
4820 {
4821 coding->produced = str_as_unibyte (destination, coding->produced);
4822 coding->produced_char = coding->produced;
4823 }
4ed46869 4824
b73bfc1c
KH
4825 return coding->result;
4826}
52d41803 4827
b73bfc1c
KH
4828/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4829 multibyteness of the source is CODING->src_multibyte, the
4830 multibyteness of the result is always unibyte. */
4ed46869
KH
4831
4832int
d46c5b12 4833encode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869 4834 struct coding_system *coding;
a4244313
KR
4835 const unsigned char *source;
4836 unsigned char *destination;
4ed46869 4837 int src_bytes, dst_bytes;
4ed46869 4838{
b73bfc1c
KH
4839 coding->produced = coding->produced_char = 0;
4840 coding->consumed = coding->consumed_char = 0;
4841 coding->errors = 0;
4842 coding->result = CODING_FINISH_NORMAL;
4ed46869 4843
d46c5b12
KH
4844 switch (coding->type)
4845 {
4ed46869 4846 case coding_type_sjis:
b73bfc1c
KH
4847 encode_coding_sjis_big5 (coding, source, destination,
4848 src_bytes, dst_bytes, 1);
4ed46869
KH
4849 break;
4850
4851 case coding_type_iso2022:
b73bfc1c
KH
4852 encode_coding_iso2022 (coding, source, destination,
4853 src_bytes, dst_bytes);
4ed46869
KH
4854 break;
4855
4856 case coding_type_big5:
b73bfc1c
KH
4857 encode_coding_sjis_big5 (coding, source, destination,
4858 src_bytes, dst_bytes, 0);
4859 break;
4860
4861 case coding_type_emacs_mule:
4862 encode_coding_emacs_mule (coding, source, destination,
4863 src_bytes, dst_bytes);
4ed46869
KH
4864 break;
4865
4866 case coding_type_ccl:
b73bfc1c
KH
4867 ccl_coding_driver (coding, source, destination,
4868 src_bytes, dst_bytes, 1);
d46c5b12
KH
4869 break;
4870
b73bfc1c
KH
4871 default:
4872 encode_eol (coding, source, destination, src_bytes, dst_bytes);
4873 }
4874
73be902c
KH
4875 if (coding->mode & CODING_MODE_LAST_BLOCK
4876 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
b73bfc1c 4877 {
a4244313 4878 const unsigned char *src = source + coding->consumed;
b73bfc1c
KH
4879 unsigned char *dst = destination + coding->produced;
4880
4881 if (coding->type == coding_type_iso2022)
4882 ENCODE_RESET_PLANE_AND_REGISTER;
4883 if (COMPOSING_P (coding))
4884 *dst++ = ISO_CODE_ESC, *dst++ = '1';
4885 if (coding->consumed < src_bytes)
d46c5b12 4886 {
b73bfc1c
KH
4887 int len = src_bytes - coding->consumed;
4888
fabf4a91 4889 BCOPY_SHORT (src, dst, len);
b73bfc1c
KH
4890 if (coding->src_multibyte)
4891 len = str_as_unibyte (dst, len);
4892 dst += len;
4893 coding->consumed = src_bytes;
d46c5b12 4894 }
b73bfc1c 4895 coding->produced = coding->produced_char = dst - destination;
73be902c 4896 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4897 }
4898
bb10be8b
KH
4899 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
4900 && coding->consumed == src_bytes)
4901 coding->result = CODING_FINISH_NORMAL;
4902
b73bfc1c 4903 return coding->result;
4ed46869
KH
4904}
4905
fb88bf2d
KH
4906/* Scan text in the region between *BEG and *END (byte positions),
4907 skip characters which we don't have to decode by coding system
4908 CODING at the head and tail, then set *BEG and *END to the region
4909 of the text we actually have to convert. The caller should move
b73bfc1c
KH
4910 the gap out of the region in advance if the region is from a
4911 buffer.
4ed46869 4912
d46c5b12
KH
4913 If STR is not NULL, *BEG and *END are indices into STR. */
4914
4915static void
4916shrink_decoding_region (beg, end, coding, str)
4917 int *beg, *end;
4918 struct coding_system *coding;
4919 unsigned char *str;
4920{
fb88bf2d 4921 unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
d46c5b12 4922 int eol_conversion;
88993dfd 4923 Lisp_Object translation_table;
d46c5b12
KH
4924
4925 if (coding->type == coding_type_ccl
4926 || coding->type == coding_type_undecided
b73bfc1c
KH
4927 || coding->eol_type != CODING_EOL_LF
4928 || !NILP (coding->post_read_conversion)
4929 || coding->composing != COMPOSITION_DISABLED)
d46c5b12
KH
4930 {
4931 /* We can't skip any data. */
4932 return;
4933 }
b73bfc1c
KH
4934 if (coding->type == coding_type_no_conversion
4935 || coding->type == coding_type_raw_text
4936 || coding->type == coding_type_emacs_mule)
d46c5b12 4937 {
fb88bf2d
KH
4938 /* We need no conversion, but don't have to skip any data here.
4939 Decoding routine handles them effectively anyway. */
d46c5b12
KH
4940 return;
4941 }
4942
88993dfd
KH
4943 translation_table = coding->translation_table_for_decode;
4944 if (NILP (translation_table) && !NILP (Venable_character_translation))
4945 translation_table = Vstandard_translation_table_for_decode;
4946 if (CHAR_TABLE_P (translation_table))
4947 {
4948 int i;
4949 for (i = 0; i < 128; i++)
4950 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
4951 break;
4952 if (i < 128)
fa46990e 4953 /* Some ASCII character should be translated. We give up
88993dfd
KH
4954 shrinking. */
4955 return;
4956 }
4957
b73bfc1c 4958 if (coding->heading_ascii >= 0)
d46c5b12
KH
4959 /* Detection routine has already found how much we can skip at the
4960 head. */
4961 *beg += coding->heading_ascii;
4962
4963 if (str)
4964 {
4965 begp_orig = begp = str + *beg;
4966 endp_orig = endp = str + *end;
4967 }
4968 else
4969 {
fb88bf2d 4970 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
4971 endp_orig = endp = begp + *end - *beg;
4972 }
4973
fa46990e
DL
4974 eol_conversion = (coding->eol_type == CODING_EOL_CR
4975 || coding->eol_type == CODING_EOL_CRLF);
4976
d46c5b12
KH
4977 switch (coding->type)
4978 {
d46c5b12
KH
4979 case coding_type_sjis:
4980 case coding_type_big5:
4981 /* We can skip all ASCII characters at the head. */
4982 if (coding->heading_ascii < 0)
4983 {
4984 if (eol_conversion)
de9d083c 4985 while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
d46c5b12
KH
4986 else
4987 while (begp < endp && *begp < 0x80) begp++;
4988 }
4989 /* We can skip all ASCII characters at the tail except for the
4990 second byte of SJIS or BIG5 code. */
4991 if (eol_conversion)
de9d083c 4992 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
d46c5b12
KH
4993 else
4994 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
4995 /* Do not consider LF as ascii if preceded by CR, since that
4996 confuses eol decoding. */
4997 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
4998 endp++;
d46c5b12
KH
4999 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
5000 endp++;
5001 break;
5002
b73bfc1c 5003 case coding_type_iso2022:
622fece5
KH
5004 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
5005 /* We can't skip any data. */
5006 break;
d46c5b12
KH
5007 if (coding->heading_ascii < 0)
5008 {
d46c5b12
KH
5009 /* We can skip all ASCII characters at the head except for a
5010 few control codes. */
5011 while (begp < endp && (c = *begp) < 0x80
5012 && c != ISO_CODE_CR && c != ISO_CODE_SO
5013 && c != ISO_CODE_SI && c != ISO_CODE_ESC
5014 && (!eol_conversion || c != ISO_CODE_LF))
5015 begp++;
5016 }
5017 switch (coding->category_idx)
5018 {
5019 case CODING_CATEGORY_IDX_ISO_8_1:
5020 case CODING_CATEGORY_IDX_ISO_8_2:
5021 /* We can skip all ASCII characters at the tail. */
5022 if (eol_conversion)
de9d083c 5023 while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
d46c5b12
KH
5024 else
5025 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
5026 /* Do not consider LF as ascii if preceded by CR, since that
5027 confuses eol decoding. */
5028 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
5029 endp++;
d46c5b12
KH
5030 break;
5031
5032 case CODING_CATEGORY_IDX_ISO_7:
5033 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
de79a6a5 5034 {
8ca3766a 5035 /* We can skip all characters at the tail except for 8-bit
de79a6a5
KH
5036 codes and ESC and the following 2-byte at the tail. */
5037 unsigned char *eight_bit = NULL;
5038
5039 if (eol_conversion)
5040 while (begp < endp
5041 && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
5042 {
5043 if (!eight_bit && c & 0x80) eight_bit = endp;
5044 endp--;
5045 }
5046 else
5047 while (begp < endp
5048 && (c = endp[-1]) != ISO_CODE_ESC)
5049 {
5050 if (!eight_bit && c & 0x80) eight_bit = endp;
5051 endp--;
5052 }
5053 /* Do not consider LF as ascii if preceded by CR, since that
5054 confuses eol decoding. */
5055 if (begp < endp && endp < endp_orig
5056 && endp[-1] == '\r' && endp[0] == '\n')
5057 endp++;
5058 if (begp < endp && endp[-1] == ISO_CODE_ESC)
5059 {
5060 if (endp + 1 < endp_orig && end[0] == '(' && end[1] == 'B')
5061 /* This is an ASCII designation sequence. We can
5062 surely skip the tail. But, if we have
5063 encountered an 8-bit code, skip only the codes
5064 after that. */
5065 endp = eight_bit ? eight_bit : endp + 2;
5066 else
5067 /* Hmmm, we can't skip the tail. */
5068 endp = endp_orig;
5069 }
5070 else if (eight_bit)
5071 endp = eight_bit;
5072 }
d46c5b12 5073 }
b73bfc1c
KH
5074 break;
5075
5076 default:
5077 abort ();
d46c5b12
KH
5078 }
5079 *beg += begp - begp_orig;
5080 *end += endp - endp_orig;
5081 return;
5082}
5083
5084/* Like shrink_decoding_region but for encoding. */
5085
5086static void
5087shrink_encoding_region (beg, end, coding, str)
5088 int *beg, *end;
5089 struct coding_system *coding;
5090 unsigned char *str;
5091{
5092 unsigned char *begp_orig, *begp, *endp_orig, *endp;
5093 int eol_conversion;
88993dfd 5094 Lisp_Object translation_table;
d46c5b12 5095
b73bfc1c
KH
5096 if (coding->type == coding_type_ccl
5097 || coding->eol_type == CODING_EOL_CRLF
5098 || coding->eol_type == CODING_EOL_CR
87323294 5099 || (coding->cmp_data && coding->cmp_data->used > 0))
d46c5b12 5100 {
b73bfc1c
KH
5101 /* We can't skip any data. */
5102 return;
5103 }
5104 if (coding->type == coding_type_no_conversion
5105 || coding->type == coding_type_raw_text
5106 || coding->type == coding_type_emacs_mule
5107 || coding->type == coding_type_undecided)
5108 {
5109 /* We need no conversion, but don't have to skip any data here.
5110 Encoding routine handles them effectively anyway. */
d46c5b12
KH
5111 return;
5112 }
5113
88993dfd
KH
5114 translation_table = coding->translation_table_for_encode;
5115 if (NILP (translation_table) && !NILP (Venable_character_translation))
5116 translation_table = Vstandard_translation_table_for_encode;
5117 if (CHAR_TABLE_P (translation_table))
5118 {
5119 int i;
5120 for (i = 0; i < 128; i++)
5121 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
5122 break;
5123 if (i < 128)
8ca3766a 5124 /* Some ASCII character should be translated. We give up
88993dfd
KH
5125 shrinking. */
5126 return;
5127 }
5128
d46c5b12
KH
5129 if (str)
5130 {
5131 begp_orig = begp = str + *beg;
5132 endp_orig = endp = str + *end;
5133 }
5134 else
5135 {
fb88bf2d 5136 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
5137 endp_orig = endp = begp + *end - *beg;
5138 }
5139
5140 eol_conversion = (coding->eol_type == CODING_EOL_CR
5141 || coding->eol_type == CODING_EOL_CRLF);
5142
5143 /* Here, we don't have to check coding->pre_write_conversion because
5144 the caller is expected to have handled it already. */
5145 switch (coding->type)
5146 {
d46c5b12 5147 case coding_type_iso2022:
622fece5
KH
5148 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
5149 /* We can't skip any data. */
5150 break;
d46c5b12
KH
5151 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
5152 {
93dec019 5153 unsigned char *bol = begp;
d46c5b12
KH
5154 while (begp < endp && *begp < 0x80)
5155 {
5156 begp++;
5157 if (begp[-1] == '\n')
5158 bol = begp;
5159 }
5160 begp = bol;
5161 goto label_skip_tail;
5162 }
5163 /* fall down ... */
5164
b73bfc1c
KH
5165 case coding_type_sjis:
5166 case coding_type_big5:
d46c5b12
KH
5167 /* We can skip all ASCII characters at the head and tail. */
5168 if (eol_conversion)
5169 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
5170 else
5171 while (begp < endp && *begp < 0x80) begp++;
5172 label_skip_tail:
5173 if (eol_conversion)
5174 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
5175 else
5176 while (begp < endp && *(endp - 1) < 0x80) endp--;
5177 break;
b73bfc1c
KH
5178
5179 default:
5180 abort ();
d46c5b12
KH
5181 }
5182
5183 *beg += begp - begp_orig;
5184 *end += endp - endp_orig;
5185 return;
5186}
5187
88993dfd
KH
5188/* As shrinking conversion region requires some overhead, we don't try
5189 shrinking if the length of conversion region is less than this
5190 value. */
5191static int shrink_conversion_region_threshhold = 1024;
5192
5193#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
5194 do { \
5195 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
5196 { \
5197 if (encodep) shrink_encoding_region (beg, end, coding, str); \
5198 else shrink_decoding_region (beg, end, coding, str); \
5199 } \
5200 } while (0)
5201
b843d1ae 5202static Lisp_Object
1c7457e2
KH
5203code_convert_region_unwind (arg)
5204 Lisp_Object arg;
b843d1ae
KH
5205{
5206 inhibit_pre_post_conversion = 0;
1c7457e2 5207 Vlast_coding_system_used = arg;
b843d1ae
KH
5208 return Qnil;
5209}
5210
ec6d2bb8
KH
5211/* Store information about all compositions in the range FROM and TO
5212 of OBJ in memory blocks pointed by CODING->cmp_data. OBJ is a
5213 buffer or a string, defaults to the current buffer. */
5214
5215void
5216coding_save_composition (coding, from, to, obj)
5217 struct coding_system *coding;
5218 int from, to;
5219 Lisp_Object obj;
5220{
5221 Lisp_Object prop;
5222 int start, end;
5223
91bee881
KH
5224 if (coding->composing == COMPOSITION_DISABLED)
5225 return;
5226 if (!coding->cmp_data)
5227 coding_allocate_composition_data (coding, from);
ec6d2bb8
KH
5228 if (!find_composition (from, to, &start, &end, &prop, obj)
5229 || end > to)
5230 return;
5231 if (start < from
5232 && (!find_composition (end, to, &start, &end, &prop, obj)
5233 || end > to))
5234 return;
5235 coding->composing = COMPOSITION_NO;
ec6d2bb8
KH
5236 do
5237 {
5238 if (COMPOSITION_VALID_P (start, end, prop))
5239 {
5240 enum composition_method method = COMPOSITION_METHOD (prop);
5241 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
5242 >= COMPOSITION_DATA_SIZE)
5243 coding_allocate_composition_data (coding, from);
5244 /* For relative composition, we remember start and end
5245 positions, for the other compositions, we also remember
5246 components. */
5247 CODING_ADD_COMPOSITION_START (coding, start - from, method);
5248 if (method != COMPOSITION_RELATIVE)
5249 {
5250 /* We must store a*/
5251 Lisp_Object val, ch;
5252
5253 val = COMPOSITION_COMPONENTS (prop);
5254 if (CONSP (val))
5255 while (CONSP (val))
5256 {
5257 ch = XCAR (val), val = XCDR (val);
5258 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
5259 }
5260 else if (VECTORP (val) || STRINGP (val))
5261 {
5262 int len = (VECTORP (val)
d5db4077 5263 ? XVECTOR (val)->size : SCHARS (val));
ec6d2bb8
KH
5264 int i;
5265 for (i = 0; i < len; i++)
5266 {
5267 ch = (STRINGP (val)
5268 ? Faref (val, make_number (i))
5269 : XVECTOR (val)->contents[i]);
5270 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
5271 }
5272 }
5273 else /* INTEGERP (val) */
5274 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
5275 }
5276 CODING_ADD_COMPOSITION_END (coding, end - from);
5277 }
5278 start = end;
5279 }
5280 while (start < to
5281 && find_composition (start, to, &start, &end, &prop, obj)
5282 && end <= to);
5283
5284 /* Make coding->cmp_data point to the first memory block. */
5285 while (coding->cmp_data->prev)
5286 coding->cmp_data = coding->cmp_data->prev;
5287 coding->cmp_data_start = 0;
5288}
5289
5290/* Reflect the saved information about compositions to OBJ.
8ca3766a 5291 CODING->cmp_data points to a memory block for the information. OBJ
ec6d2bb8
KH
5292 is a buffer or a string, defaults to the current buffer. */
5293
33fb63eb 5294void
ec6d2bb8
KH
5295coding_restore_composition (coding, obj)
5296 struct coding_system *coding;
5297 Lisp_Object obj;
5298{
5299 struct composition_data *cmp_data = coding->cmp_data;
5300
5301 if (!cmp_data)
5302 return;
5303
5304 while (cmp_data->prev)
5305 cmp_data = cmp_data->prev;
5306
5307 while (cmp_data)
5308 {
5309 int i;
5310
78108bcd
KH
5311 for (i = 0; i < cmp_data->used && cmp_data->data[i] > 0;
5312 i += cmp_data->data[i])
ec6d2bb8
KH
5313 {
5314 int *data = cmp_data->data + i;
5315 enum composition_method method = (enum composition_method) data[3];
5316 Lisp_Object components;
5317
5318 if (method == COMPOSITION_RELATIVE)
5319 components = Qnil;
5320 else
5321 {
5322 int len = data[0] - 4, j;
5323 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];
5324
b6871cc7
KH
5325 if (method == COMPOSITION_WITH_RULE_ALTCHARS
5326 && len % 2 == 0)
5327 len --;
ec6d2bb8
KH
5328 for (j = 0; j < len; j++)
5329 args[j] = make_number (data[4 + j]);
5330 components = (method == COMPOSITION_WITH_ALTCHARS
5331 ? Fstring (len, args) : Fvector (len, args));
5332 }
5333 compose_text (data[1], data[2], components, Qnil, obj);
5334 }
5335 cmp_data = cmp_data->next;
5336 }
5337}
5338
d46c5b12 5339/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
fb88bf2d
KH
5340 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
5341 coding system CODING, and return the status code of code conversion
5342 (currently, this value has no meaning).
5343
5344 How many characters (and bytes) are converted to how many
5345 characters (and bytes) are recorded in members of the structure
5346 CODING.
d46c5b12 5347
6e44253b 5348 If REPLACE is nonzero, we do various things as if the original text
d46c5b12 5349 is deleted and a new text is inserted. See the comments in
b73bfc1c
KH
5350 replace_range (insdel.c) to know what we are doing.
5351
5352 If REPLACE is zero, it is assumed that the source text is unibyte.
8ca3766a 5353 Otherwise, it is assumed that the source text is multibyte. */
4ed46869
KH
5354
5355int
6e44253b
KH
5356code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
5357 int from, from_byte, to, to_byte, encodep, replace;
4ed46869 5358 struct coding_system *coding;
4ed46869 5359{
fb88bf2d 5360 int len = to - from, len_byte = to_byte - from_byte;
72d1a715 5361 int nchars_del = 0, nbytes_del = 0;
fb88bf2d 5362 int require, inserted, inserted_byte;
4b39528c 5363 int head_skip, tail_skip, total_skip = 0;
84d60297 5364 Lisp_Object saved_coding_symbol;
fb88bf2d 5365 int first = 1;
fb88bf2d 5366 unsigned char *src, *dst;
84d60297 5367 Lisp_Object deletion;
e133c8fa 5368 int orig_point = PT, orig_len = len;
6abb9bd9 5369 int prev_Z;
b73bfc1c
KH
5370 int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);
5371
84d60297 5372 deletion = Qnil;
8844fa83 5373 saved_coding_symbol = coding->symbol;
d46c5b12 5374
83fa074f 5375 if (from < PT && PT < to)
e133c8fa
KH
5376 {
5377 TEMP_SET_PT_BOTH (from, from_byte);
5378 orig_point = from;
5379 }
83fa074f 5380
6e44253b 5381 if (replace)
d46c5b12 5382 {
fb88bf2d 5383 int saved_from = from;
e077cc80 5384 int saved_inhibit_modification_hooks;
fb88bf2d 5385
d46c5b12 5386 prepare_to_modify_buffer (from, to, &from);
fb88bf2d
KH
5387 if (saved_from != from)
5388 {
5389 to = from + len;
b73bfc1c 5390 from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
fb88bf2d
KH
5391 len_byte = to_byte - from_byte;
5392 }
e077cc80
KH
5393
5394 /* The code conversion routine can not preserve text properties
5395 for now. So, we must remove all text properties in the
5396 region. Here, we must suppress all modification hooks. */
5397 saved_inhibit_modification_hooks = inhibit_modification_hooks;
5398 inhibit_modification_hooks = 1;
5399 Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
5400 inhibit_modification_hooks = saved_inhibit_modification_hooks;
d46c5b12 5401 }
d46c5b12
KH
5402
5403 if (! encodep && CODING_REQUIRE_DETECTION (coding))
5404 {
12410ef1 5405 /* We must detect encoding of text and eol format. */
d46c5b12
KH
5406
5407 if (from < GPT && to > GPT)
5408 move_gap_both (from, from_byte);
5409 if (coding->type == coding_type_undecided)
5410 {
fb88bf2d 5411 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
d46c5b12 5412 if (coding->type == coding_type_undecided)
62b3ef1d
KH
5413 {
5414 /* It seems that the text contains only ASCII, but we
d9aef30f 5415 should not leave it undecided because the deeper
62b3ef1d
KH
5416 decoding routine (decode_coding) tries to detect the
5417 encodings again in vain. */
5418 coding->type = coding_type_emacs_mule;
5419 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
d280ccb6
KH
5420 /* As emacs-mule decoder will handle composition, we
5421 need this setting to allocate coding->cmp_data
5422 later. */
5423 coding->composing = COMPOSITION_NO;
62b3ef1d 5424 }
d46c5b12 5425 }
aaaf0b1e
KH
5426 if (coding->eol_type == CODING_EOL_UNDECIDED
5427 && coding->type != coding_type_ccl)
d46c5b12 5428 {
d46c5b12
KH
5429 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
5430 if (coding->eol_type == CODING_EOL_UNDECIDED)
5431 coding->eol_type = CODING_EOL_LF;
5432 /* We had better recover the original eol format if we
8ca3766a 5433 encounter an inconsistent eol format while decoding. */
d46c5b12
KH
5434 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5435 }
5436 }
5437
d46c5b12
KH
5438 /* Now we convert the text. */
5439
5440 /* For encoding, we must process pre-write-conversion in advance. */
b73bfc1c
KH
5441 if (! inhibit_pre_post_conversion
5442 && encodep
d46c5b12
KH
5443 && SYMBOLP (coding->pre_write_conversion)
5444 && ! NILP (Ffboundp (coding->pre_write_conversion)))
5445 {
2b4f9037
KH
5446 /* The function in pre-write-conversion may put a new text in a
5447 new buffer. */
0007bdd0
KH
5448 struct buffer *prev = current_buffer;
5449 Lisp_Object new;
d46c5b12 5450
1c7457e2 5451 record_unwind_protect (code_convert_region_unwind,
24a948a7 5452 Vlast_coding_system_used);
b843d1ae
KH
5453 /* We should not call any more pre-write/post-read-conversion
5454 functions while this pre-write-conversion is running. */
5455 inhibit_pre_post_conversion = 1;
b39f748c
AS
5456 call2 (coding->pre_write_conversion,
5457 make_number (from), make_number (to));
b843d1ae
KH
5458 inhibit_pre_post_conversion = 0;
5459 /* Discard the unwind protect. */
5460 specpdl_ptr--;
5461
d46c5b12
KH
5462 if (current_buffer != prev)
5463 {
5464 len = ZV - BEGV;
0007bdd0 5465 new = Fcurrent_buffer ();
d46c5b12 5466 set_buffer_internal_1 (prev);
7dae4502 5467 del_range_2 (from, from_byte, to, to_byte, 0);
e133c8fa 5468 TEMP_SET_PT_BOTH (from, from_byte);
0007bdd0
KH
5469 insert_from_buffer (XBUFFER (new), 1, len, 0);
5470 Fkill_buffer (new);
e133c8fa
KH
5471 if (orig_point >= to)
5472 orig_point += len - orig_len;
5473 else if (orig_point > from)
5474 orig_point = from;
5475 orig_len = len;
d46c5b12 5476 to = from + len;
b73bfc1c
KH
5477 from_byte = CHAR_TO_BYTE (from);
5478 to_byte = CHAR_TO_BYTE (to);
d46c5b12 5479 len_byte = to_byte - from_byte;
e133c8fa 5480 TEMP_SET_PT_BOTH (from, from_byte);
d46c5b12
KH
5481 }
5482 }
5483
12410ef1 5484 if (replace)
72d1a715
RS
5485 {
5486 if (! EQ (current_buffer->undo_list, Qt))
5487 deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
5488 else
5489 {
5490 nchars_del = to - from;
5491 nbytes_del = to_byte - from_byte;
5492 }
5493 }
12410ef1 5494
ec6d2bb8
KH
5495 if (coding->composing != COMPOSITION_DISABLED)
5496 {
5497 if (encodep)
5498 coding_save_composition (coding, from, to, Fcurrent_buffer ());
5499 else
5500 coding_allocate_composition_data (coding, from);
5501 }
fb88bf2d 5502
b73bfc1c 5503 /* Try to skip the heading and tailing ASCIIs. */
4956c225
KH
5504 if (coding->type != coding_type_ccl)
5505 {
5506 int from_byte_orig = from_byte, to_byte_orig = to_byte;
ec6d2bb8 5507
4956c225
KH
5508 if (from < GPT && GPT < to)
5509 move_gap_both (from, from_byte);
5510 SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
5511 if (from_byte == to_byte
5512 && (encodep || NILP (coding->post_read_conversion))
5513 && ! CODING_REQUIRE_FLUSHING (coding))
5514 {
5515 coding->produced = len_byte;
5516 coding->produced_char = len;
5517 if (!replace)
5518 /* We must record and adjust for this new text now. */
5519 adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);
5520 return 0;
5521 }
5522
5523 head_skip = from_byte - from_byte_orig;
5524 tail_skip = to_byte_orig - to_byte;
5525 total_skip = head_skip + tail_skip;
5526 from += head_skip;
5527 to -= tail_skip;
5528 len -= total_skip; len_byte -= total_skip;
5529 }
d46c5b12 5530
8ca3766a 5531 /* For conversion, we must put the gap before the text in addition to
fb88bf2d
KH
5532 making the gap larger for efficient decoding. The required gap
5533 size starts from 2000 which is the magic number used in make_gap.
5534 But, after one batch of conversion, it will be incremented if we
5535 find that it is not enough . */
d46c5b12
KH
5536 require = 2000;
5537
5538 if (GAP_SIZE < require)
5539 make_gap (require - GAP_SIZE);
5540 move_gap_both (from, from_byte);
5541
d46c5b12 5542 inserted = inserted_byte = 0;
fb88bf2d
KH
5543
5544 GAP_SIZE += len_byte;
5545 ZV -= len;
5546 Z -= len;
5547 ZV_BYTE -= len_byte;
5548 Z_BYTE -= len_byte;
5549
d9f9a1bc
GM
5550 if (GPT - BEG < BEG_UNCHANGED)
5551 BEG_UNCHANGED = GPT - BEG;
5552 if (Z - GPT < END_UNCHANGED)
5553 END_UNCHANGED = Z - GPT;
f2558efd 5554
b73bfc1c
KH
5555 if (!encodep && coding->src_multibyte)
5556 {
5557 /* Decoding routines expects that the source text is unibyte.
5558 We must convert 8-bit characters of multibyte form to
5559 unibyte. */
5560 int len_byte_orig = len_byte;
5561 len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
5562 if (len_byte < len_byte_orig)
5563 safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
5564 len_byte);
5565 coding->src_multibyte = 0;
5566 }
5567
d46c5b12
KH
5568 for (;;)
5569 {
fb88bf2d 5570 int result;
d46c5b12 5571
ec6d2bb8 5572 /* The buffer memory is now:
b73bfc1c
KH
5573 +--------+converted-text+---------+-------original-text-------+---+
5574 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
5575 |<---------------------- GAP ----------------------->| */
ec6d2bb8
KH
5576 src = GAP_END_ADDR - len_byte;
5577 dst = GPT_ADDR + inserted_byte;
5578
d46c5b12 5579 if (encodep)
fb88bf2d 5580 result = encode_coding (coding, src, dst, len_byte, 0);
d46c5b12 5581 else
0e79d667
RS
5582 {
5583 if (coding->composing != COMPOSITION_DISABLED)
5584 coding->cmp_data->char_offset = from + inserted;
5585 result = decode_coding (coding, src, dst, len_byte, 0);
5586 }
ec6d2bb8
KH
5587
5588 /* The buffer memory is now:
b73bfc1c
KH
5589 +--------+-------converted-text----+--+------original-text----+---+
5590 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
5591 |<---------------------- GAP ----------------------->| */
ec6d2bb8 5592
d46c5b12
KH
5593 inserted += coding->produced_char;
5594 inserted_byte += coding->produced;
d46c5b12 5595 len_byte -= coding->consumed;
ec6d2bb8
KH
5596
5597 if (result == CODING_FINISH_INSUFFICIENT_CMP)
5598 {
5599 coding_allocate_composition_data (coding, from + inserted);
5600 continue;
5601 }
5602
fb88bf2d 5603 src += coding->consumed;
3636f7a3 5604 dst += coding->produced;
d46c5b12 5605
9864ebce
KH
5606 if (result == CODING_FINISH_NORMAL)
5607 {
5608 src += len_byte;
5609 break;
5610 }
d46c5b12
KH
5611 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
5612 {
fb88bf2d 5613 unsigned char *pend = dst, *p = pend - inserted_byte;
38edf7d4 5614 Lisp_Object eol_type;
d46c5b12
KH
5615
5616 /* Encode LFs back to the original eol format (CR or CRLF). */
5617 if (coding->eol_type == CODING_EOL_CR)
5618 {
5619 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
5620 }
5621 else
5622 {
d46c5b12
KH
5623 int count = 0;
5624
fb88bf2d
KH
5625 while (p < pend) if (*p++ == '\n') count++;
5626 if (src - dst < count)
d46c5b12 5627 {
38edf7d4 5628 /* We don't have sufficient room for encoding LFs
fb88bf2d
KH
5629 back to CRLF. We must record converted and
5630 not-yet-converted text back to the buffer
5631 content, enlarge the gap, then record them out of
5632 the buffer contents again. */
5633 int add = len_byte + inserted_byte;
5634
5635 GAP_SIZE -= add;
5636 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5637 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5638 make_gap (count - GAP_SIZE);
5639 GAP_SIZE += add;
5640 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5641 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5642 /* Don't forget to update SRC, DST, and PEND. */
5643 src = GAP_END_ADDR - len_byte;
5644 dst = GPT_ADDR + inserted_byte;
5645 pend = dst;
d46c5b12 5646 }
d46c5b12
KH
5647 inserted += count;
5648 inserted_byte += count;
fb88bf2d
KH
5649 coding->produced += count;
5650 p = dst = pend + count;
5651 while (count)
5652 {
5653 *--p = *--pend;
5654 if (*p == '\n') count--, *--p = '\r';
5655 }
d46c5b12
KH
5656 }
5657
5658 /* Suppress eol-format conversion in the further conversion. */
5659 coding->eol_type = CODING_EOL_LF;
5660
38edf7d4
KH
5661 /* Set the coding system symbol to that for Unix-like EOL. */
5662 eol_type = Fget (saved_coding_symbol, Qeol_type);
5663 if (VECTORP (eol_type)
5664 && XVECTOR (eol_type)->size == 3
5665 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
5666 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
5667 else
5668 coding->symbol = saved_coding_symbol;
93dec019 5669
fb88bf2d 5670 continue;
d46c5b12
KH
5671 }
5672 if (len_byte <= 0)
944bd420
KH
5673 {
5674 if (coding->type != coding_type_ccl
5675 || coding->mode & CODING_MODE_LAST_BLOCK)
5676 break;
5677 coding->mode |= CODING_MODE_LAST_BLOCK;
5678 continue;
5679 }
d46c5b12
KH
5680 if (result == CODING_FINISH_INSUFFICIENT_SRC)
5681 {
5682 /* The source text ends in invalid codes. Let's just
5683 make them valid buffer contents, and finish conversion. */
70ad9fc4
GM
5684 if (multibyte_p)
5685 {
5686 unsigned char *start = dst;
93dec019 5687
70ad9fc4
GM
5688 inserted += len_byte;
5689 while (len_byte--)
5690 {
5691 int c = *src++;
5692 dst += CHAR_STRING (c, dst);
5693 }
5694
5695 inserted_byte += dst - start;
5696 }
5697 else
5698 {
5699 inserted += len_byte;
5700 inserted_byte += len_byte;
5701 while (len_byte--)
5702 *dst++ = *src++;
5703 }
d46c5b12
KH
5704 break;
5705 }
9864ebce
KH
5706 if (result == CODING_FINISH_INTERRUPT)
5707 {
5708 /* The conversion procedure was interrupted by a user. */
9864ebce
KH
5709 break;
5710 }
5711 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5712 if (coding->consumed < 1)
5713 {
5714 /* It's quite strange to require more memory without
5715 consuming any bytes. Perhaps CCL program bug. */
9864ebce
KH
5716 break;
5717 }
fb88bf2d
KH
5718 if (first)
5719 {
5720 /* We have just done the first batch of conversion which was
8ca3766a 5721 stopped because of insufficient gap. Let's reconsider the
fb88bf2d
KH
5722 required gap size (i.e. SRT - DST) now.
5723
5724 We have converted ORIG bytes (== coding->consumed) into
5725 NEW bytes (coding->produced). To convert the remaining
5726 LEN bytes, we may need REQUIRE bytes of gap, where:
5727 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5728 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5729 Here, we are sure that NEW >= ORIG. */
b3385c28
KH
5730 float ratio;
5731
5732 if (coding->produced <= coding->consumed)
5733 {
5734 /* This happens because of CCL-based coding system with
5735 eol-type CRLF. */
5736 require = 0;
5737 }
5738 else
5739 {
5740 ratio = (coding->produced - coding->consumed) / coding->consumed;
5741 require = len_byte * ratio;
5742 }
fb88bf2d
KH
5743 first = 0;
5744 }
5745 if ((src - dst) < (require + 2000))
5746 {
5747 /* See the comment above the previous call of make_gap. */
5748 int add = len_byte + inserted_byte;
5749
5750 GAP_SIZE -= add;
5751 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5752 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5753 make_gap (require + 2000);
5754 GAP_SIZE += add;
5755 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5756 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
fb88bf2d 5757 }
d46c5b12 5758 }
fb88bf2d
KH
5759 if (src - dst > 0) *dst = 0; /* Put an anchor. */
5760
b73bfc1c
KH
5761 if (encodep && coding->dst_multibyte)
5762 {
5763 /* The output is unibyte. We must convert 8-bit characters to
5764 multibyte form. */
5765 if (inserted_byte * 2 > GAP_SIZE)
5766 {
5767 GAP_SIZE -= inserted_byte;
5768 ZV += inserted_byte; Z += inserted_byte;
5769 ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
5770 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5771 make_gap (inserted_byte - GAP_SIZE);
5772 GAP_SIZE += inserted_byte;
5773 ZV -= inserted_byte; Z -= inserted_byte;
5774 ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
5775 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5776 }
5777 inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
5778 }
7553d0e1 5779
93dec019 5780 /* If we shrank the conversion area, adjust it now. */
12410ef1
KH
5781 if (total_skip > 0)
5782 {
5783 if (tail_skip > 0)
5784 safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
5785 inserted += total_skip; inserted_byte += total_skip;
5786 GAP_SIZE += total_skip;
5787 GPT -= head_skip; GPT_BYTE -= head_skip;
5788 ZV -= total_skip; ZV_BYTE -= total_skip;
5789 Z -= total_skip; Z_BYTE -= total_skip;
5790 from -= head_skip; from_byte -= head_skip;
5791 to += tail_skip; to_byte += tail_skip;
5792 }
5793
6abb9bd9 5794 prev_Z = Z;
72d1a715
RS
5795 if (! EQ (current_buffer->undo_list, Qt))
5796 adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
5797 else
5798 adjust_after_replace_noundo (from, from_byte, nchars_del, nbytes_del,
5799 inserted, inserted_byte);
6abb9bd9 5800 inserted = Z - prev_Z;
4ed46869 5801
ec6d2bb8
KH
5802 if (!encodep && coding->cmp_data && coding->cmp_data->used)
5803 coding_restore_composition (coding, Fcurrent_buffer ());
5804 coding_free_composition_data (coding);
5805
b73bfc1c
KH
5806 if (! inhibit_pre_post_conversion
5807 && ! encodep && ! NILP (coding->post_read_conversion))
d46c5b12 5808 {
2b4f9037 5809 Lisp_Object val;
1c7457e2 5810 Lisp_Object saved_coding_system;
4ed46869 5811
e133c8fa
KH
5812 if (from != PT)
5813 TEMP_SET_PT_BOTH (from, from_byte);
6abb9bd9 5814 prev_Z = Z;
1c7457e2
KH
5815 record_unwind_protect (code_convert_region_unwind,
5816 Vlast_coding_system_used);
5817 saved_coding_system = Vlast_coding_system_used;
5818 Vlast_coding_system_used = coding->symbol;
b843d1ae
KH
5819 /* We should not call any more pre-write/post-read-conversion
5820 functions while this post-read-conversion is running. */
5821 inhibit_pre_post_conversion = 1;
2b4f9037 5822 val = call1 (coding->post_read_conversion, make_number (inserted));
b843d1ae 5823 inhibit_pre_post_conversion = 0;
1c7457e2
KH
5824 coding->symbol = Vlast_coding_system_used;
5825 Vlast_coding_system_used = saved_coding_system;
b843d1ae
KH
5826 /* Discard the unwind protect. */
5827 specpdl_ptr--;
b7826503 5828 CHECK_NUMBER (val);
944bd420 5829 inserted += Z - prev_Z;
e133c8fa
KH
5830 }
5831
5832 if (orig_point >= from)
5833 {
5834 if (orig_point >= from + orig_len)
5835 orig_point += inserted - orig_len;
5836 else
5837 orig_point = from;
5838 TEMP_SET_PT (orig_point);
d46c5b12 5839 }
4ed46869 5840
ec6d2bb8
KH
5841 if (replace)
5842 {
5843 signal_after_change (from, to - from, inserted);
e19539f1 5844 update_compositions (from, from + inserted, CHECK_BORDER);
ec6d2bb8 5845 }
2b4f9037 5846
fb88bf2d 5847 {
12410ef1
KH
5848 coding->consumed = to_byte - from_byte;
5849 coding->consumed_char = to - from;
5850 coding->produced = inserted_byte;
5851 coding->produced_char = inserted;
fb88bf2d 5852 }
7553d0e1 5853
fb88bf2d 5854 return 0;
d46c5b12
KH
5855}
5856
5857Lisp_Object
b73bfc1c
KH
5858run_pre_post_conversion_on_str (str, coding, encodep)
5859 Lisp_Object str;
5860 struct coding_system *coding;
5861 int encodep;
5862{
aed13378 5863 int count = SPECPDL_INDEX ();
cf3b32fc 5864 struct gcpro gcpro1, gcpro2;
b73bfc1c 5865 int multibyte = STRING_MULTIBYTE (str);
3fd9494b
RS
5866 Lisp_Object buffer;
5867 struct buffer *buf;
cf3b32fc 5868 Lisp_Object old_deactivate_mark;
b73bfc1c
KH
5869
5870 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
1c7457e2
KH
5871 record_unwind_protect (code_convert_region_unwind,
5872 Vlast_coding_system_used);
cf3b32fc
RS
5873 /* It is not crucial to specbind this. */
5874 old_deactivate_mark = Vdeactivate_mark;
5875 GCPRO2 (str, old_deactivate_mark);
3fd9494b
RS
5876
5877 buffer = Fget_buffer_create (build_string (" *code-converting-work*"));
5878 buf = XBUFFER (buffer);
5879
5880 buf->directory = current_buffer->directory;
5881 buf->read_only = Qnil;
5882 buf->filename = Qnil;
5883 buf->undo_list = Qt;
5884 buf->overlays_before = Qnil;
5885 buf->overlays_after = Qnil;
5886
5887 set_buffer_internal (buf);
b73bfc1c
KH
5888 /* We must insert the contents of STR as is without
5889 unibyte<->multibyte conversion. For that, we adjust the
5890 multibyteness of the working buffer to that of STR. */
5891 Ferase_buffer ();
3fd9494b
RS
5892 buf->enable_multibyte_characters = multibyte ? Qt : Qnil;
5893
b73bfc1c 5894 insert_from_string (str, 0, 0,
d5db4077 5895 SCHARS (str), SBYTES (str), 0);
b73bfc1c
KH
5896 UNGCPRO;
5897 inhibit_pre_post_conversion = 1;
5898 if (encodep)
5899 call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
5900 else
6bac5b12 5901 {
1c7457e2 5902 Vlast_coding_system_used = coding->symbol;
6bac5b12
KH
5903 TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
5904 call1 (coding->post_read_conversion, make_number (Z - BEG));
1c7457e2 5905 coding->symbol = Vlast_coding_system_used;
6bac5b12 5906 }
b73bfc1c 5907 inhibit_pre_post_conversion = 0;
cf3b32fc 5908 Vdeactivate_mark = old_deactivate_mark;
78108bcd 5909 str = make_buffer_string (BEG, Z, 1);
b73bfc1c
KH
5910 return unbind_to (count, str);
5911}
5912
5913Lisp_Object
5914decode_coding_string (str, coding, nocopy)
d46c5b12 5915 Lisp_Object str;
4ed46869 5916 struct coding_system *coding;
b73bfc1c 5917 int nocopy;
4ed46869 5918{
d46c5b12 5919 int len;
73be902c 5920 struct conversion_buffer buf;
da55a2b7 5921 int from, to_byte;
84d60297 5922 Lisp_Object saved_coding_symbol;
d46c5b12 5923 int result;
78108bcd 5924 int require_decoding;
73be902c
KH
5925 int shrinked_bytes = 0;
5926 Lisp_Object newstr;
2391eaa4 5927 int consumed, consumed_char, produced, produced_char;
4ed46869 5928
b73bfc1c 5929 from = 0;
d5db4077 5930 to_byte = SBYTES (str);
4ed46869 5931
8844fa83 5932 saved_coding_symbol = coding->symbol;
764ca8da
KH
5933 coding->src_multibyte = STRING_MULTIBYTE (str);
5934 coding->dst_multibyte = 1;
b73bfc1c 5935 if (CODING_REQUIRE_DETECTION (coding))
d46c5b12
KH
5936 {
5937 /* See the comments in code_convert_region. */
5938 if (coding->type == coding_type_undecided)
5939 {
d5db4077 5940 detect_coding (coding, SDATA (str), to_byte);
d46c5b12 5941 if (coding->type == coding_type_undecided)
d280ccb6
KH
5942 {
5943 coding->type = coding_type_emacs_mule;
5944 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
5945 /* As emacs-mule decoder will handle composition, we
5946 need this setting to allocate coding->cmp_data
5947 later. */
5948 coding->composing = COMPOSITION_NO;
5949 }
d46c5b12 5950 }
aaaf0b1e
KH
5951 if (coding->eol_type == CODING_EOL_UNDECIDED
5952 && coding->type != coding_type_ccl)
d46c5b12
KH
5953 {
5954 saved_coding_symbol = coding->symbol;
d5db4077 5955 detect_eol (coding, SDATA (str), to_byte);
d46c5b12
KH
5956 if (coding->eol_type == CODING_EOL_UNDECIDED)
5957 coding->eol_type = CODING_EOL_LF;
5958 /* We had better recover the original eol format if we
8ca3766a 5959 encounter an inconsistent eol format while decoding. */
d46c5b12
KH
5960 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5961 }
5962 }
4ed46869 5963
764ca8da
KH
5964 if (coding->type == coding_type_no_conversion
5965 || coding->type == coding_type_raw_text)
5966 coding->dst_multibyte = 0;
5967
78108bcd 5968 require_decoding = CODING_REQUIRE_DECODING (coding);
ec6d2bb8 5969
b73bfc1c 5970 if (STRING_MULTIBYTE (str))
d46c5b12 5971 {
b73bfc1c
KH
5972 /* Decoding routines expect the source text to be unibyte. */
5973 str = Fstring_as_unibyte (str);
d5db4077 5974 to_byte = SBYTES (str);
b73bfc1c 5975 nocopy = 1;
764ca8da 5976 coding->src_multibyte = 0;
b73bfc1c 5977 }
ec6d2bb8 5978
b73bfc1c 5979 /* Try to skip the heading and tailing ASCIIs. */
78108bcd 5980 if (require_decoding && coding->type != coding_type_ccl)
4956c225 5981 {
d5db4077 5982 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
4956c225
KH
5983 0);
5984 if (from == to_byte)
78108bcd 5985 require_decoding = 0;
d5db4077 5986 shrinked_bytes = from + (SBYTES (str) - to_byte);
4956c225 5987 }
b73bfc1c 5988
78108bcd
KH
5989 if (!require_decoding)
5990 {
d5db4077
KR
5991 coding->consumed = SBYTES (str);
5992 coding->consumed_char = SCHARS (str);
78108bcd
KH
5993 if (coding->dst_multibyte)
5994 {
5995 str = Fstring_as_multibyte (str);
5996 nocopy = 1;
5997 }
d5db4077
KR
5998 coding->produced = SBYTES (str);
5999 coding->produced_char = SCHARS (str);
78108bcd
KH
6000 return (nocopy ? str : Fcopy_sequence (str));
6001 }
6002
6003 if (coding->composing != COMPOSITION_DISABLED)
6004 coding_allocate_composition_data (coding, from);
b73bfc1c 6005 len = decoding_buffer_size (coding, to_byte - from);
73be902c 6006 allocate_conversion_buffer (buf, len);
4ed46869 6007
2391eaa4 6008 consumed = consumed_char = produced = produced_char = 0;
73be902c 6009 while (1)
4ed46869 6010 {
d5db4077 6011 result = decode_coding (coding, SDATA (str) + from + consumed,
73be902c
KH
6012 buf.data + produced, to_byte - from - consumed,
6013 buf.size - produced);
6014 consumed += coding->consumed;
2391eaa4 6015 consumed_char += coding->consumed_char;
73be902c
KH
6016 produced += coding->produced;
6017 produced_char += coding->produced_char;
2391eaa4
KH
6018 if (result == CODING_FINISH_NORMAL
6019 || (result == CODING_FINISH_INSUFFICIENT_SRC
6020 && coding->consumed == 0))
73be902c
KH
6021 break;
6022 if (result == CODING_FINISH_INSUFFICIENT_CMP)
6023 coding_allocate_composition_data (coding, from + produced_char);
6024 else if (result == CODING_FINISH_INSUFFICIENT_DST)
6025 extend_conversion_buffer (&buf);
6026 else if (result == CODING_FINISH_INCONSISTENT_EOL)
6027 {
8844fa83
KH
6028 Lisp_Object eol_type;
6029
73be902c
KH
6030 /* Recover the original EOL format. */
6031 if (coding->eol_type == CODING_EOL_CR)
6032 {
6033 unsigned char *p;
6034 for (p = buf.data; p < buf.data + produced; p++)
6035 if (*p == '\n') *p = '\r';
6036 }
6037 else if (coding->eol_type == CODING_EOL_CRLF)
6038 {
6039 int num_eol = 0;
6040 unsigned char *p0, *p1;
6041 for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
6042 if (*p0 == '\n') num_eol++;
6043 if (produced + num_eol >= buf.size)
6044 extend_conversion_buffer (&buf);
6045 for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
6046 {
6047 *--p1 = *--p0;
6048 if (*p0 == '\n') *--p1 = '\r';
6049 }
6050 produced += num_eol;
6051 produced_char += num_eol;
93dec019 6052 }
8844fa83 6053 /* Suppress eol-format conversion in the further conversion. */
73be902c 6054 coding->eol_type = CODING_EOL_LF;
8844fa83
KH
6055
6056 /* Set the coding system symbol to that for Unix-like EOL. */
6057 eol_type = Fget (saved_coding_symbol, Qeol_type);
6058 if (VECTORP (eol_type)
6059 && XVECTOR (eol_type)->size == 3
6060 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
6061 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
6062 else
6063 coding->symbol = saved_coding_symbol;
6064
6065
73be902c 6066 }
4ed46869 6067 }
d46c5b12 6068
2391eaa4
KH
6069 coding->consumed = consumed;
6070 coding->consumed_char = consumed_char;
6071 coding->produced = produced;
6072 coding->produced_char = produced_char;
6073
78108bcd 6074 if (coding->dst_multibyte)
73be902c
KH
6075 newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
6076 produced + shrinked_bytes);
78108bcd 6077 else
73be902c
KH
6078 newstr = make_uninit_string (produced + shrinked_bytes);
6079 if (from > 0)
a4244313
KR
6080 STRING_COPYIN (newstr, 0, SDATA (str), from);
6081 STRING_COPYIN (newstr, from, buf.data, produced);
73be902c 6082 if (shrinked_bytes > from)
a4244313
KR
6083 STRING_COPYIN (newstr, from + produced,
6084 SDATA (str) + to_byte,
6085 shrinked_bytes - from);
73be902c 6086 free_conversion_buffer (&buf);
b73bfc1c
KH
6087
6088 if (coding->cmp_data && coding->cmp_data->used)
73be902c 6089 coding_restore_composition (coding, newstr);
b73bfc1c
KH
6090 coding_free_composition_data (coding);
6091
6092 if (SYMBOLP (coding->post_read_conversion)
6093 && !NILP (Ffboundp (coding->post_read_conversion)))
73be902c 6094 newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
b73bfc1c 6095
73be902c 6096 return newstr;
b73bfc1c
KH
6097}
6098
6099Lisp_Object
6100encode_coding_string (str, coding, nocopy)
6101 Lisp_Object str;
6102 struct coding_system *coding;
6103 int nocopy;
6104{
6105 int len;
73be902c 6106 struct conversion_buffer buf;
b73bfc1c 6107 int from, to, to_byte;
b73bfc1c 6108 int result;
73be902c
KH
6109 int shrinked_bytes = 0;
6110 Lisp_Object newstr;
2391eaa4 6111 int consumed, consumed_char, produced, produced_char;
b73bfc1c
KH
6112
6113 if (SYMBOLP (coding->pre_write_conversion)
6114 && !NILP (Ffboundp (coding->pre_write_conversion)))
6bac5b12 6115 str = run_pre_post_conversion_on_str (str, coding, 1);
b73bfc1c
KH
6116
6117 from = 0;
d5db4077
KR
6118 to = SCHARS (str);
6119 to_byte = SBYTES (str);
b73bfc1c 6120
e2c06b17
KH
6121 /* Encoding routines determine the multibyteness of the source text
6122 by coding->src_multibyte. */
6123 coding->src_multibyte = STRING_MULTIBYTE (str);
6124 coding->dst_multibyte = 0;
b73bfc1c 6125 if (! CODING_REQUIRE_ENCODING (coding))
826bfb8b 6126 {
d5db4077
KR
6127 coding->consumed = SBYTES (str);
6128 coding->consumed_char = SCHARS (str);
b73bfc1c
KH
6129 if (STRING_MULTIBYTE (str))
6130 {
6131 str = Fstring_as_unibyte (str);
6132 nocopy = 1;
6133 }
d5db4077
KR
6134 coding->produced = SBYTES (str);
6135 coding->produced_char = SCHARS (str);
b73bfc1c 6136 return (nocopy ? str : Fcopy_sequence (str));
826bfb8b
KH
6137 }
6138
b73bfc1c
KH
6139 if (coding->composing != COMPOSITION_DISABLED)
6140 coding_save_composition (coding, from, to, str);
ec6d2bb8 6141
b73bfc1c 6142 /* Try to skip the heading and tailing ASCIIs. */
4956c225
KH
6143 if (coding->type != coding_type_ccl)
6144 {
d5db4077 6145 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
4956c225
KH
6146 1);
6147 if (from == to_byte)
6148 return (nocopy ? str : Fcopy_sequence (str));
d5db4077 6149 shrinked_bytes = from + (SBYTES (str) - to_byte);
4956c225 6150 }
b73bfc1c
KH
6151
6152 len = encoding_buffer_size (coding, to_byte - from);
73be902c
KH
6153 allocate_conversion_buffer (buf, len);
6154
2391eaa4 6155 consumed = consumed_char = produced = produced_char = 0;
73be902c
KH
6156 while (1)
6157 {
d5db4077 6158 result = encode_coding (coding, SDATA (str) + from + consumed,
73be902c
KH
6159 buf.data + produced, to_byte - from - consumed,
6160 buf.size - produced);
6161 consumed += coding->consumed;
2391eaa4 6162 consumed_char += coding->consumed_char;
13004bef 6163 produced += coding->produced;
2391eaa4
KH
6164 produced_char += coding->produced_char;
6165 if (result == CODING_FINISH_NORMAL
6166 || (result == CODING_FINISH_INSUFFICIENT_SRC
6167 && coding->consumed == 0))
73be902c
KH
6168 break;
6169 /* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
6170 extend_conversion_buffer (&buf);
6171 }
6172
2391eaa4
KH
6173 coding->consumed = consumed;
6174 coding->consumed_char = consumed_char;
6175 coding->produced = produced;
6176 coding->produced_char = produced_char;
6177
73be902c 6178 newstr = make_uninit_string (produced + shrinked_bytes);
b73bfc1c 6179 if (from > 0)
a4244313
KR
6180 STRING_COPYIN (newstr, 0, SDATA (str), from);
6181 STRING_COPYIN (newstr, from, buf.data, produced);
73be902c 6182 if (shrinked_bytes > from)
a4244313
KR
6183 STRING_COPYIN (newstr, from + produced,
6184 SDATA (str) + to_byte,
6185 shrinked_bytes - from);
73be902c
KH
6186
6187 free_conversion_buffer (&buf);
ec6d2bb8 6188 coding_free_composition_data (coding);
b73bfc1c 6189
73be902c 6190 return newstr;
4ed46869
KH
6191}
6192
6193\f
6194#ifdef emacs
1397dc18 6195/*** 8. Emacs Lisp library functions ***/
4ed46869 6196
4ed46869 6197DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
48b0f3ae
PJ
6198 doc: /* Return t if OBJECT is nil or a coding-system.
6199See the documentation of `make-coding-system' for information
6200about coding-system objects. */)
6201 (obj)
4ed46869
KH
6202 Lisp_Object obj;
6203{
4608c386
KH
6204 if (NILP (obj))
6205 return Qt;
6206 if (!SYMBOLP (obj))
6207 return Qnil;
6208 /* Get coding-spec vector for OBJ. */
6209 obj = Fget (obj, Qcoding_system);
6210 return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
6211 ? Qt : Qnil);
4ed46869
KH
6212}
6213
9d991de8
RS
6214DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
6215 Sread_non_nil_coding_system, 1, 1, 0,
48b0f3ae
PJ
6216 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT. */)
6217 (prompt)
4ed46869
KH
6218 Lisp_Object prompt;
6219{
e0e989f6 6220 Lisp_Object val;
9d991de8
RS
6221 do
6222 {
4608c386
KH
6223 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
6224 Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
9d991de8 6225 }
d5db4077 6226 while (SCHARS (val) == 0);
e0e989f6 6227 return (Fintern (val, Qnil));
4ed46869
KH
6228}
6229
9b787f3e 6230DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
48b0f3ae
PJ
6231 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.
6232If the user enters null input, return second argument DEFAULT-CODING-SYSTEM. */)
6233 (prompt, default_coding_system)
9b787f3e 6234 Lisp_Object prompt, default_coding_system;
4ed46869 6235{
f44d27ce 6236 Lisp_Object val;
9b787f3e 6237 if (SYMBOLP (default_coding_system))
57d25e6f 6238 default_coding_system = SYMBOL_NAME (default_coding_system);
4608c386 6239 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
9b787f3e
RS
6240 Qt, Qnil, Qcoding_system_history,
6241 default_coding_system, Qnil);
d5db4077 6242 return (SCHARS (val) == 0 ? Qnil : Fintern (val, Qnil));
4ed46869
KH
6243}
6244
6245DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
6246 1, 1, 0,
48b0f3ae
PJ
6247 doc: /* Check validity of CODING-SYSTEM.
6248If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
6249It is valid if it is a symbol with a non-nil `coding-system' property.
6250The value of property should be a vector of length 5. */)
6251 (coding_system)
4ed46869
KH
6252 Lisp_Object coding_system;
6253{
b7826503 6254 CHECK_SYMBOL (coding_system);
4ed46869
KH
6255 if (!NILP (Fcoding_system_p (coding_system)))
6256 return coding_system;
6257 while (1)
02ba4723 6258 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
4ed46869 6259}
3a73fa5d 6260\f
d46c5b12 6261Lisp_Object
0a28aafb 6262detect_coding_system (src, src_bytes, highest, multibytep)
a4244313 6263 const unsigned char *src;
d46c5b12 6264 int src_bytes, highest;
0a28aafb 6265 int multibytep;
4ed46869
KH
6266{
6267 int coding_mask, eol_type;
d46c5b12
KH
6268 Lisp_Object val, tmp;
6269 int dummy;
4ed46869 6270
0a28aafb 6271 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
d46c5b12
KH
6272 eol_type = detect_eol_type (src, src_bytes, &dummy);
6273 if (eol_type == CODING_EOL_INCONSISTENT)
25b02698 6274 eol_type = CODING_EOL_UNDECIDED;
4ed46869 6275
d46c5b12 6276 if (!coding_mask)
4ed46869 6277 {
27901516 6278 val = Qundecided;
d46c5b12 6279 if (eol_type != CODING_EOL_UNDECIDED)
4ed46869 6280 {
f44d27ce
RS
6281 Lisp_Object val2;
6282 val2 = Fget (Qundecided, Qeol_type);
4ed46869
KH
6283 if (VECTORP (val2))
6284 val = XVECTOR (val2)->contents[eol_type];
6285 }
80e803b4 6286 return (highest ? val : Fcons (val, Qnil));
4ed46869 6287 }
4ed46869 6288
d46c5b12
KH
6289 /* At first, gather possible coding systems in VAL. */
6290 val = Qnil;
fa42c37f 6291 for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 6292 {
fa42c37f
KH
6293 Lisp_Object category_val, category_index;
6294
6295 category_index = Fget (XCAR (tmp), Qcoding_category_index);
6296 category_val = Fsymbol_value (XCAR (tmp));
6297 if (!NILP (category_val)
6298 && NATNUMP (category_index)
6299 && (coding_mask & (1 << XFASTINT (category_index))))
4ed46869 6300 {
fa42c37f 6301 val = Fcons (category_val, val);
d46c5b12
KH
6302 if (highest)
6303 break;
4ed46869
KH
6304 }
6305 }
d46c5b12
KH
6306 if (!highest)
6307 val = Fnreverse (val);
4ed46869 6308
65059037 6309 /* Then, replace the elements with subsidiary coding systems. */
fa42c37f 6310 for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 6311 {
65059037
RS
6312 if (eol_type != CODING_EOL_UNDECIDED
6313 && eol_type != CODING_EOL_INCONSISTENT)
4ed46869 6314 {
d46c5b12 6315 Lisp_Object eol;
03699b14 6316 eol = Fget (XCAR (tmp), Qeol_type);
d46c5b12 6317 if (VECTORP (eol))
f3fbd155 6318 XSETCAR (tmp, XVECTOR (eol)->contents[eol_type]);
4ed46869
KH
6319 }
6320 }
03699b14 6321 return (highest ? XCAR (val) : val);
93dec019 6322}
4ed46869 6323
d46c5b12
KH
6324DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
6325 2, 3, 0,
40fd536c
KH
6326 doc: /* Detect how the byte sequence in the region is encoded.
6327Return a list of possible coding systems used on decoding a byte
6328sequence containing the bytes in the region between START and END when
6329the coding system `undecided' is specified. The list is ordered by
6330priority decided in the current language environment.
48b0f3ae
PJ
6331
6332If only ASCII characters are found, it returns a list of single element
6333`undecided' or its subsidiary coding system according to a detected
6334end-of-line format.
6335
6336If optional argument HIGHEST is non-nil, return the coding system of
6337highest priority. */)
6338 (start, end, highest)
d46c5b12
KH
6339 Lisp_Object start, end, highest;
6340{
6341 int from, to;
6342 int from_byte, to_byte;
682169fe 6343 int include_anchor_byte = 0;
6289dd10 6344
b7826503
PJ
6345 CHECK_NUMBER_COERCE_MARKER (start);
6346 CHECK_NUMBER_COERCE_MARKER (end);
4ed46869 6347
d46c5b12
KH
6348 validate_region (&start, &end);
6349 from = XINT (start), to = XINT (end);
6350 from_byte = CHAR_TO_BYTE (from);
6351 to_byte = CHAR_TO_BYTE (to);
6289dd10 6352
d46c5b12
KH
6353 if (from < GPT && to >= GPT)
6354 move_gap_both (to, to_byte);
c210f766
KH
6355 /* If we an anchor byte `\0' follows the region, we include it in
6356 the detecting source. Then code detectors can handle the tailing
6357 byte sequence more accurately.
6358
7d0393cf 6359 Fix me: This is not a perfect solution. It is better that we
c210f766
KH
6360 add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
6361 */
682169fe
KH
6362 if (to == Z || (to == GPT && GAP_SIZE > 0))
6363 include_anchor_byte = 1;
d46c5b12 6364 return detect_coding_system (BYTE_POS_ADDR (from_byte),
682169fe 6365 to_byte - from_byte + include_anchor_byte,
0a28aafb
KH
6366 !NILP (highest),
6367 !NILP (current_buffer
6368 ->enable_multibyte_characters));
d46c5b12 6369}
6289dd10 6370
d46c5b12
KH
6371DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
6372 1, 2, 0,
eec1f3c7
KH
6373 doc: /* Detect how the byte sequence in STRING is encoded.
6374Return a list of possible coding systems used on decoding a byte
6375sequence containing the bytes in STRING when the coding system
6376`undecided' is specified. The list is ordered by priority decided in
6377the current language environment.
48b0f3ae
PJ
6378
6379If only ASCII characters are found, it returns a list of single element
6380`undecided' or its subsidiary coding system according to a detected
6381end-of-line format.
6382
6383If optional argument HIGHEST is non-nil, return the coding system of
6384highest priority. */)
6385 (string, highest)
d46c5b12
KH
6386 Lisp_Object string, highest;
6387{
b7826503 6388 CHECK_STRING (string);
4ed46869 6389
d5db4077 6390 return detect_coding_system (SDATA (string),
682169fe
KH
6391 /* "+ 1" is to include the anchor byte
6392 `\0'. With this, code detectors can
c210f766
KH
6393 handle the tailing bytes more
6394 accurately. */
d5db4077 6395 SBYTES (string) + 1,
0a28aafb
KH
6396 !NILP (highest),
6397 STRING_MULTIBYTE (string));
4ed46869
KH
6398}
6399
05e6f5dc
KH
6400/* Subroutine for Fsafe_coding_systems_region_internal.
6401
6402 Return a list of coding systems that safely encode the multibyte
b666620c 6403 text between P and PEND. SAFE_CODINGS, if non-nil, is an alist of
05e6f5dc
KH
6404 possible coding systems. If it is nil, it means that we have not
6405 yet found any coding systems.
6406
6407 WORK_TABLE is a copy of the char-table Vchar_coding_system_table. An
6408 element of WORK_TABLE is set to t once the element is looked up.
6409
6410 If a non-ASCII single byte char is found, set
6411 *single_byte_char_found to 1. */
6412
6413static Lisp_Object
6414find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
6415 unsigned char *p, *pend;
6416 Lisp_Object safe_codings, work_table;
6417 int *single_byte_char_found;
6b89e3aa
KH
6418{
6419 int c, len, i;
6420 Lisp_Object val, ch;
6421 Lisp_Object prev, tail;
177c0ea7 6422
6b89e3aa
KH
6423 while (p < pend)
6424 {
6425 c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
6426 p += len;
6427 if (ASCII_BYTE_P (c))
6428 /* We can ignore ASCII characters here. */
6429 continue;
6430 if (SINGLE_BYTE_CHAR_P (c))
6431 *single_byte_char_found = 1;
6432 if (NILP (safe_codings))
b666620c
KH
6433 /* Already all coding systems are excluded. But, we can't
6434 terminate the loop here because non-ASCII single-byte char
6435 must be found. */
6b89e3aa
KH
6436 continue;
6437 /* Check the safe coding systems for C. */
6438 ch = make_number (c);
6439 val = Faref (work_table, ch);
6440 if (EQ (val, Qt))
6441 /* This element was already checked. Ignore it. */
6442 continue;
6443 /* Remember that we checked this element. */
6444 Faset (work_table, ch, Qt);
6445
6446 for (prev = tail = safe_codings; CONSP (tail); tail = XCDR (tail))
6447 {
b666620c
KH
6448 Lisp_Object elt, translation_table, hash_table, accept_latin_extra;
6449 int encodable;
6450
6451 elt = XCAR (tail);
6452 if (CONSP (XCDR (elt)))
6453 {
6454 /* This entry has this format now:
6455 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6456 ACCEPT-LATIN-EXTRA ) */
6457 val = XCDR (elt);
6458 encodable = ! NILP (Faref (XCAR (val), ch));
6459 if (! encodable)
6460 {
6461 val = XCDR (val);
6462 translation_table = XCAR (val);
6463 hash_table = XCAR (XCDR (val));
6464 accept_latin_extra = XCAR (XCDR (XCDR (val)));
6465 }
6466 }
6467 else
6468 {
6469 /* This entry has this format now: ( CODING . SAFE-CHARS) */
6470 encodable = ! NILP (Faref (XCDR (elt), ch));
6471 if (! encodable)
6472 {
6473 /* Transform the format to:
6474 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6475 ACCEPT-LATIN-EXTRA ) */
6476 val = Fget (XCAR (elt), Qcoding_system);
6477 translation_table
6478 = Fplist_get (AREF (val, 3),
6479 Qtranslation_table_for_encode);
6480 if (SYMBOLP (translation_table))
6481 translation_table = Fget (translation_table,
6482 Qtranslation_table);
6483 hash_table
6484 = (CHAR_TABLE_P (translation_table)
6485 ? XCHAR_TABLE (translation_table)->extras[1]
6486 : Qnil);
6487 accept_latin_extra
6488 = ((EQ (AREF (val, 0), make_number (2))
6489 && VECTORP (AREF (val, 4)))
6490 ? AREF (AREF (val, 4), CODING_FLAG_ISO_LATIN_EXTRA)
6491 : Qnil);
6492 XSETCAR (tail, list5 (XCAR (elt), XCDR (elt),
6493 translation_table, hash_table,
6494 accept_latin_extra));
6495 }
6496 }
6497
6498 if (! encodable
6499 && ((CHAR_TABLE_P (translation_table)
6500 && ! NILP (Faref (translation_table, ch)))
6501 || (HASH_TABLE_P (hash_table)
6502 && ! NILP (Fgethash (ch, hash_table, Qnil)))
6503 || (SINGLE_BYTE_CHAR_P (c)
6504 && ! NILP (accept_latin_extra)
6505 && VECTORP (Vlatin_extra_code_table)
6506 && ! NILP (AREF (Vlatin_extra_code_table, c)))))
6507 encodable = 1;
6508 if (encodable)
6509 prev = tail;
6510 else
6b89e3aa
KH
6511 {
6512 /* Exclued this coding system from SAFE_CODINGS. */
6513 if (EQ (tail, safe_codings))
6514 safe_codings = XCDR (safe_codings);
6515 else
6516 XSETCDR (prev, XCDR (tail));
6517 }
6b89e3aa
KH
6518 }
6519 }
6520 return safe_codings;
6521}
6522
067a6a66
KH
6523DEFUN ("find-coding-systems-region-internal",
6524 Ffind_coding_systems_region_internal,
6525 Sfind_coding_systems_region_internal, 2, 2, 0,
6b89e3aa
KH
6526 doc: /* Internal use only. */)
6527 (start, end)
6528 Lisp_Object start, end;
6529{
6530 Lisp_Object work_table, safe_codings;
6531 int non_ascii_p = 0;
6532 int single_byte_char_found = 0;
6533 const unsigned char *p1, *p1end, *p2, *p2end, *p;
6534
6535 if (STRINGP (start))
6536 {
6537 if (!STRING_MULTIBYTE (start))
6538 return Qt;
6539 p1 = SDATA (start), p1end = p1 + SBYTES (start);
6540 p2 = p2end = p1end;
6541 if (SCHARS (start) != SBYTES (start))
6542 non_ascii_p = 1;
6543 }
6544 else
6545 {
6546 int from, to, stop;
6547
6548 CHECK_NUMBER_COERCE_MARKER (start);
6549 CHECK_NUMBER_COERCE_MARKER (end);
6550 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
6551 args_out_of_range (start, end);
6552 if (NILP (current_buffer->enable_multibyte_characters))
6553 return Qt;
6554 from = CHAR_TO_BYTE (XINT (start));
6555 to = CHAR_TO_BYTE (XINT (end));
6556 stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
6557 p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
6558 if (stop == to)
6559 p2 = p2end = p1end;
6560 else
6561 p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
6562 if (XINT (end) - XINT (start) != to - from)
6563 non_ascii_p = 1;
6564 }
6565
6566 if (!non_ascii_p)
6567 {
6568 /* We are sure that the text contains no multibyte character.
6569 Check if it contains eight-bit-graphic. */
6570 p = p1;
6571 for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
6572 if (p == p1end)
6573 {
6574 for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
6575 if (p == p2end)
6576 return Qt;
6577 }
6578 }
6579
6580 /* The text contains non-ASCII characters. */
6581
6582 work_table = Fmake_char_table (Qchar_coding_system, Qnil);
6583 safe_codings = Fcopy_sequence (XCDR (Vcoding_system_safe_chars));
6584
067a6a66
KH
6585 safe_codings = find_safe_codings (p1, p1end, safe_codings, work_table,
6586 &single_byte_char_found);
6b89e3aa 6587 if (p2 < p2end)
067a6a66
KH
6588 safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
6589 &single_byte_char_found);
6b89e3aa
KH
6590 if (EQ (safe_codings, XCDR (Vcoding_system_safe_chars)))
6591 safe_codings = Qt;
6592 else
6593 {
6594 /* Turn safe_codings to a list of coding systems... */
6595 Lisp_Object val;
6596
6597 if (single_byte_char_found)
6598 /* ... and append these for eight-bit chars. */
6599 val = Fcons (Qraw_text,
6600 Fcons (Qemacs_mule, Fcons (Qno_conversion, Qnil)));
6601 else
6602 /* ... and append generic coding systems. */
6603 val = Fcopy_sequence (XCAR (Vcoding_system_safe_chars));
177c0ea7 6604
6b89e3aa
KH
6605 for (; CONSP (safe_codings); safe_codings = XCDR (safe_codings))
6606 val = Fcons (XCAR (XCAR (safe_codings)), val);
6607 safe_codings = val;
6608 }
6609
6610 return safe_codings;
6611}
6612
6613
068a9dbd
KH
6614/* Search from position POS for such characters that are unencodable
6615 accoding to SAFE_CHARS, and return a list of their positions. P
6616 points where in the memory the character at POS exists. Limit the
6617 search at PEND or when Nth unencodable characters are found.
6618
6619 If SAFE_CHARS is a char table, an element for an unencodable
6620 character is nil.
6621
6622 If SAFE_CHARS is nil, all non-ASCII characters are unencodable.
6623
6624 Otherwise, SAFE_CHARS is t, and only eight-bit-contrl and
6625 eight-bit-graphic characters are unencodable. */
6626
6627static Lisp_Object
6628unencodable_char_position (safe_chars, pos, p, pend, n)
6629 Lisp_Object safe_chars;
6630 int pos;
6631 unsigned char *p, *pend;
6632 int n;
6633{
6634 Lisp_Object pos_list;
6635
6636 pos_list = Qnil;
6637 while (p < pend)
6638 {
6639 int len;
6640 int c = STRING_CHAR_AND_LENGTH (p, MAX_MULTIBYTE_LENGTH, len);
7d0393cf 6641
068a9dbd
KH
6642 if (c >= 128
6643 && (CHAR_TABLE_P (safe_chars)
6644 ? NILP (CHAR_TABLE_REF (safe_chars, c))
6645 : (NILP (safe_chars) || c < 256)))
6646 {
6647 pos_list = Fcons (make_number (pos), pos_list);
6648 if (--n <= 0)
6649 break;
6650 }
6651 pos++;
6652 p += len;
6653 }
6654 return Fnreverse (pos_list);
6655}
6656
6657
6658DEFUN ("unencodable-char-position", Funencodable_char_position,
6659 Sunencodable_char_position, 3, 5, 0,
6660 doc: /*
6661Return position of first un-encodable character in a region.
6662START and END specfiy the region and CODING-SYSTEM specifies the
6663encoding to check. Return nil if CODING-SYSTEM does encode the region.
6664
6665If optional 4th argument COUNT is non-nil, it specifies at most how
6666many un-encodable characters to search. In this case, the value is a
6667list of positions.
6668
6669If optional 5th argument STRING is non-nil, it is a string to search
6670for un-encodable characters. In that case, START and END are indexes
6671to the string. */)
6672 (start, end, coding_system, count, string)
6673 Lisp_Object start, end, coding_system, count, string;
6674{
6675 int n;
6676 Lisp_Object safe_chars;
6677 struct coding_system coding;
6678 Lisp_Object positions;
6679 int from, to;
6680 unsigned char *p, *pend;
6681
6682 if (NILP (string))
6683 {
6684 validate_region (&start, &end);
6685 from = XINT (start);
6686 to = XINT (end);
6687 if (NILP (current_buffer->enable_multibyte_characters))
6688 return Qnil;
6689 p = CHAR_POS_ADDR (from);
200c93e2
KH
6690 if (to == GPT)
6691 pend = GPT_ADDR;
6692 else
6693 pend = CHAR_POS_ADDR (to);
068a9dbd
KH
6694 }
6695 else
6696 {
6697 CHECK_STRING (string);
6698 CHECK_NATNUM (start);
6699 CHECK_NATNUM (end);
6700 from = XINT (start);
6701 to = XINT (end);
6702 if (from > to
6703 || to > SCHARS (string))
6704 args_out_of_range_3 (string, start, end);
6705 if (! STRING_MULTIBYTE (string))
6706 return Qnil;
6707 p = SDATA (string) + string_char_to_byte (string, from);
6708 pend = SDATA (string) + string_char_to_byte (string, to);
6709 }
6710
6711 setup_coding_system (Fcheck_coding_system (coding_system), &coding);
6712
6713 if (NILP (count))
6714 n = 1;
6715 else
6716 {
6717 CHECK_NATNUM (count);
6718 n = XINT (count);
6719 }
6720
6721 if (coding.type == coding_type_no_conversion
6722 || coding.type == coding_type_raw_text)
6723 return Qnil;
6724
6725 if (coding.type == coding_type_undecided)
6726 safe_chars = Qnil;
6727 else
6b89e3aa 6728 safe_chars = coding_safe_chars (coding_system);
068a9dbd
KH
6729
6730 if (STRINGP (string)
6731 || from >= GPT || to <= GPT)
6732 positions = unencodable_char_position (safe_chars, from, p, pend, n);
6733 else
6734 {
6735 Lisp_Object args[2];
6736
6737 args[0] = unencodable_char_position (safe_chars, from, p, GPT_ADDR, n);
96d2e64d 6738 n -= XINT (Flength (args[0]));
068a9dbd
KH
6739 if (n <= 0)
6740 positions = args[0];
6741 else
6742 {
6743 args[1] = unencodable_char_position (safe_chars, GPT, GAP_END_ADDR,
6744 pend, n);
6745 positions = Fappend (2, args);
6746 }
6747 }
6748
6749 return (NILP (count) ? Fcar (positions) : positions);
6750}
6751
6752
4031e2bf
KH
6753Lisp_Object
6754code_convert_region1 (start, end, coding_system, encodep)
d46c5b12 6755 Lisp_Object start, end, coding_system;
4031e2bf 6756 int encodep;
3a73fa5d
RS
6757{
6758 struct coding_system coding;
da55a2b7 6759 int from, to;
3a73fa5d 6760
b7826503
PJ
6761 CHECK_NUMBER_COERCE_MARKER (start);
6762 CHECK_NUMBER_COERCE_MARKER (end);
6763 CHECK_SYMBOL (coding_system);
3a73fa5d 6764
d46c5b12
KH
6765 validate_region (&start, &end);
6766 from = XFASTINT (start);
6767 to = XFASTINT (end);
6768
3a73fa5d 6769 if (NILP (coding_system))
d46c5b12
KH
6770 return make_number (to - from);
6771
3a73fa5d 6772 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 6773 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
3a73fa5d 6774
d46c5b12 6775 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
6776 coding.src_multibyte = coding.dst_multibyte
6777 = !NILP (current_buffer->enable_multibyte_characters);
fb88bf2d
KH
6778 code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
6779 &coding, encodep, 1);
f072a3e8 6780 Vlast_coding_system_used = coding.symbol;
fb88bf2d 6781 return make_number (coding.produced_char);
4031e2bf
KH
6782}
6783
6784DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
6785 3, 3, "r\nzCoding system: ",
48b0f3ae
PJ
6786 doc: /* Decode the current region from the specified coding system.
6787When called from a program, takes three arguments:
6788START, END, and CODING-SYSTEM. START and END are buffer positions.
6789This function sets `last-coding-system-used' to the precise coding system
6790used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6791not fully specified.)
6792It returns the length of the decoded text. */)
6793 (start, end, coding_system)
4031e2bf
KH
6794 Lisp_Object start, end, coding_system;
6795{
6796 return code_convert_region1 (start, end, coding_system, 0);
3a73fa5d
RS
6797}
6798
6799DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
6800 3, 3, "r\nzCoding system: ",
48b0f3ae
PJ
6801 doc: /* Encode the current region into the specified coding system.
6802When called from a program, takes three arguments:
6803START, END, and CODING-SYSTEM. START and END are buffer positions.
6804This function sets `last-coding-system-used' to the precise coding system
6805used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6806not fully specified.)
6807It returns the length of the encoded text. */)
6808 (start, end, coding_system)
d46c5b12 6809 Lisp_Object start, end, coding_system;
3a73fa5d 6810{
4031e2bf
KH
6811 return code_convert_region1 (start, end, coding_system, 1);
6812}
3a73fa5d 6813
4031e2bf
KH
6814Lisp_Object
6815code_convert_string1 (string, coding_system, nocopy, encodep)
6816 Lisp_Object string, coding_system, nocopy;
6817 int encodep;
6818{
6819 struct coding_system coding;
3a73fa5d 6820
b7826503
PJ
6821 CHECK_STRING (string);
6822 CHECK_SYMBOL (coding_system);
4ed46869 6823
d46c5b12 6824 if (NILP (coding_system))
4031e2bf 6825 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4ed46869 6826
d46c5b12 6827 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 6828 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
5f1cd180 6829
d46c5b12 6830 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
6831 string = (encodep
6832 ? encode_coding_string (string, &coding, !NILP (nocopy))
6833 : decode_coding_string (string, &coding, !NILP (nocopy)));
f072a3e8 6834 Vlast_coding_system_used = coding.symbol;
ec6d2bb8
KH
6835
6836 return string;
4ed46869
KH
6837}
6838
4ed46869 6839DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
e0e989f6 6840 2, 3, 0,
48b0f3ae
PJ
6841 doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
6842Optional arg NOCOPY non-nil means it is OK to return STRING itself
6843if the decoding operation is trivial.
6844This function sets `last-coding-system-used' to the precise coding system
6845used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6846not fully specified.) */)
6847 (string, coding_system, nocopy)
e0e989f6 6848 Lisp_Object string, coding_system, nocopy;
4ed46869 6849{
f072a3e8 6850 return code_convert_string1 (string, coding_system, nocopy, 0);
4ed46869
KH
6851}
6852
6853DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
e0e989f6 6854 2, 3, 0,
48b0f3ae
PJ
6855 doc: /* Encode STRING to CODING-SYSTEM, and return the result.
6856Optional arg NOCOPY non-nil means it is OK to return STRING itself
6857if the encoding operation is trivial.
6858This function sets `last-coding-system-used' to the precise coding system
6859used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6860not fully specified.) */)
6861 (string, coding_system, nocopy)
e0e989f6 6862 Lisp_Object string, coding_system, nocopy;
4ed46869 6863{
f072a3e8 6864 return code_convert_string1 (string, coding_system, nocopy, 1);
4ed46869 6865}
4031e2bf 6866
ecec61c1 6867/* Encode or decode STRING according to CODING_SYSTEM.
ec6d2bb8
KH
6868 Do not set Vlast_coding_system_used.
6869
6870 This function is called only from macros DECODE_FILE and
6871 ENCODE_FILE, thus we ignore character composition. */
ecec61c1
KH
6872
6873Lisp_Object
6874code_convert_string_norecord (string, coding_system, encodep)
6875 Lisp_Object string, coding_system;
6876 int encodep;
6877{
6878 struct coding_system coding;
6879
b7826503
PJ
6880 CHECK_STRING (string);
6881 CHECK_SYMBOL (coding_system);
ecec61c1
KH
6882
6883 if (NILP (coding_system))
6884 return string;
6885
6886 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 6887 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
ecec61c1 6888
ec6d2bb8 6889 coding.composing = COMPOSITION_DISABLED;
ecec61c1 6890 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
6891 return (encodep
6892 ? encode_coding_string (string, &coding, 1)
6893 : decode_coding_string (string, &coding, 1));
ecec61c1 6894}
3a73fa5d 6895\f
4ed46869 6896DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
48b0f3ae
PJ
6897 doc: /* Decode a Japanese character which has CODE in shift_jis encoding.
6898Return the corresponding character. */)
6899 (code)
4ed46869
KH
6900 Lisp_Object code;
6901{
6902 unsigned char c1, c2, s1, s2;
6903 Lisp_Object val;
6904
b7826503 6905 CHECK_NUMBER (code);
4ed46869 6906 s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
55ab7be3
KH
6907 if (s1 == 0)
6908 {
c28a9453
KH
6909 if (s2 < 0x80)
6910 XSETFASTINT (val, s2);
6911 else if (s2 >= 0xA0 || s2 <= 0xDF)
b73bfc1c 6912 XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
c28a9453 6913 else
9da8350f 6914 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3
KH
6915 }
6916 else
6917 {
87323294 6918 if ((s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
55ab7be3 6919 || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
9da8350f 6920 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3 6921 DECODE_SJIS (s1, s2, c1, c2);
b73bfc1c 6922 XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
55ab7be3 6923 }
4ed46869
KH
6924 return val;
6925}
6926
6927DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
48b0f3ae
PJ
6928 doc: /* Encode a Japanese character CHAR to shift_jis encoding.
6929Return the corresponding code in SJIS. */)
6930 (ch)
4ed46869
KH
6931 Lisp_Object ch;
6932{
bcf26d6a 6933 int charset, c1, c2, s1, s2;
4ed46869
KH
6934 Lisp_Object val;
6935
b7826503 6936 CHECK_NUMBER (ch);
4ed46869 6937 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6938 if (charset == CHARSET_ASCII)
6939 {
6940 val = ch;
6941 }
6942 else if (charset == charset_jisx0208
6943 && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
4ed46869
KH
6944 {
6945 ENCODE_SJIS (c1, c2, s1, s2);
bcf26d6a 6946 XSETFASTINT (val, (s1 << 8) | s2);
4ed46869 6947 }
55ab7be3
KH
6948 else if (charset == charset_katakana_jisx0201
6949 && c1 > 0x20 && c2 < 0xE0)
6950 {
6951 XSETFASTINT (val, c1 | 0x80);
6952 }
4ed46869 6953 else
55ab7be3 6954 error ("Can't encode to shift_jis: %d", XFASTINT (ch));
4ed46869
KH
6955 return val;
6956}
6957
6958DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
48b0f3ae
PJ
6959 doc: /* Decode a Big5 character which has CODE in BIG5 coding system.
6960Return the corresponding character. */)
6961 (code)
4ed46869
KH
6962 Lisp_Object code;
6963{
6964 int charset;
6965 unsigned char b1, b2, c1, c2;
6966 Lisp_Object val;
6967
b7826503 6968 CHECK_NUMBER (code);
4ed46869 6969 b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
c28a9453
KH
6970 if (b1 == 0)
6971 {
6972 if (b2 >= 0x80)
9da8350f 6973 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453
KH
6974 val = code;
6975 }
6976 else
6977 {
6978 if ((b1 < 0xA1 || b1 > 0xFE)
6979 || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
9da8350f 6980 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453 6981 DECODE_BIG5 (b1, b2, charset, c1, c2);
b73bfc1c 6982 XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
c28a9453 6983 }
4ed46869
KH
6984 return val;
6985}
6986
6987DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
48b0f3ae
PJ
6988 doc: /* Encode the Big5 character CHAR to BIG5 coding system.
6989Return the corresponding character code in Big5. */)
6990 (ch)
4ed46869
KH
6991 Lisp_Object ch;
6992{
bcf26d6a 6993 int charset, c1, c2, b1, b2;
4ed46869
KH
6994 Lisp_Object val;
6995
b7826503 6996 CHECK_NUMBER (ch);
4ed46869 6997 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
6998 if (charset == CHARSET_ASCII)
6999 {
7000 val = ch;
7001 }
7002 else if ((charset == charset_big5_1
7003 && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
7004 || (charset == charset_big5_2
7005 && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
4ed46869
KH
7006 {
7007 ENCODE_BIG5 (charset, c1, c2, b1, b2);
bcf26d6a 7008 XSETFASTINT (val, (b1 << 8) | b2);
4ed46869
KH
7009 }
7010 else
c28a9453 7011 error ("Can't encode to Big5: %d", XFASTINT (ch));
4ed46869
KH
7012 return val;
7013}
3a73fa5d 7014\f
002fdb44 7015DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal,
48b0f3ae
PJ
7016 Sset_terminal_coding_system_internal, 1, 1, 0,
7017 doc: /* Internal use only. */)
7018 (coding_system)
4ed46869
KH
7019 Lisp_Object coding_system;
7020{
b7826503 7021 CHECK_SYMBOL (coding_system);
4ed46869 7022 setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
70c22245 7023 /* We had better not send unsafe characters to terminal. */
6e85d753 7024 terminal_coding.flags |= CODING_FLAG_ISO_SAFE;
8ca3766a 7025 /* Character composition should be disabled. */
ec6d2bb8 7026 terminal_coding.composing = COMPOSITION_DISABLED;
bd64290d
KH
7027 /* Error notification should be suppressed. */
7028 terminal_coding.suppress_error = 1;
b73bfc1c
KH
7029 terminal_coding.src_multibyte = 1;
7030 terminal_coding.dst_multibyte = 0;
4ed46869
KH
7031 return Qnil;
7032}
7033
002fdb44 7034DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal,
48b0f3ae 7035 Sset_safe_terminal_coding_system_internal, 1, 1, 0,
ddb67bdc 7036 doc: /* Internal use only. */)
48b0f3ae 7037 (coding_system)
c4825358
KH
7038 Lisp_Object coding_system;
7039{
b7826503 7040 CHECK_SYMBOL (coding_system);
c4825358
KH
7041 setup_coding_system (Fcheck_coding_system (coding_system),
7042 &safe_terminal_coding);
8ca3766a 7043 /* Character composition should be disabled. */
ec6d2bb8 7044 safe_terminal_coding.composing = COMPOSITION_DISABLED;
bd64290d
KH
7045 /* Error notification should be suppressed. */
7046 terminal_coding.suppress_error = 1;
b73bfc1c
KH
7047 safe_terminal_coding.src_multibyte = 1;
7048 safe_terminal_coding.dst_multibyte = 0;
c4825358
KH
7049 return Qnil;
7050}
7051
002fdb44
DL
7052DEFUN ("terminal-coding-system", Fterminal_coding_system,
7053 Sterminal_coding_system, 0, 0, 0,
48b0f3ae
PJ
7054 doc: /* Return coding system specified for terminal output. */)
7055 ()
4ed46869
KH
7056{
7057 return terminal_coding.symbol;
7058}
7059
002fdb44 7060DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal,
48b0f3ae
PJ
7061 Sset_keyboard_coding_system_internal, 1, 1, 0,
7062 doc: /* Internal use only. */)
7063 (coding_system)
4ed46869
KH
7064 Lisp_Object coding_system;
7065{
b7826503 7066 CHECK_SYMBOL (coding_system);
4ed46869 7067 setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
8ca3766a 7068 /* Character composition should be disabled. */
ec6d2bb8 7069 keyboard_coding.composing = COMPOSITION_DISABLED;
4ed46869
KH
7070 return Qnil;
7071}
7072
002fdb44
DL
7073DEFUN ("keyboard-coding-system", Fkeyboard_coding_system,
7074 Skeyboard_coding_system, 0, 0, 0,
48b0f3ae
PJ
7075 doc: /* Return coding system specified for decoding keyboard input. */)
7076 ()
4ed46869
KH
7077{
7078 return keyboard_coding.symbol;
7079}
7080
7081\f
a5d301df
KH
7082DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
7083 Sfind_operation_coding_system, 1, MANY, 0,
48b0f3ae
PJ
7084 doc: /* Choose a coding system for an operation based on the target name.
7085The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
7086DECODING-SYSTEM is the coding system to use for decoding
7087\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
7088for encoding (in case OPERATION does encoding).
7089
7090The first argument OPERATION specifies an I/O primitive:
7091 For file I/O, `insert-file-contents' or `write-region'.
7092 For process I/O, `call-process', `call-process-region', or `start-process'.
7093 For network I/O, `open-network-stream'.
7094
7095The remaining arguments should be the same arguments that were passed
7096to the primitive. Depending on which primitive, one of those arguments
7097is selected as the TARGET. For example, if OPERATION does file I/O,
7098whichever argument specifies the file name is TARGET.
7099
7100TARGET has a meaning which depends on OPERATION:
7101 For file I/O, TARGET is a file name.
7102 For process I/O, TARGET is a process name.
7103 For network I/O, TARGET is a service name or a port number
7104
7105This function looks up what specified for TARGET in,
7106`file-coding-system-alist', `process-coding-system-alist',
7107or `network-coding-system-alist' depending on OPERATION.
7108They may specify a coding system, a cons of coding systems,
7109or a function symbol to call.
7110In the last case, we call the function with one argument,
7111which is a list of all the arguments given to this function.
7112
7113usage: (find-operation-coding-system OPERATION ARGUMENTS ...) */)
7114 (nargs, args)
4ed46869
KH
7115 int nargs;
7116 Lisp_Object *args;
7117{
7118 Lisp_Object operation, target_idx, target, val;
7119 register Lisp_Object chain;
7120
7121 if (nargs < 2)
7122 error ("Too few arguments");
7123 operation = args[0];
7124 if (!SYMBOLP (operation)
7125 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
8ca3766a 7126 error ("Invalid first argument");
4ed46869
KH
7127 if (nargs < 1 + XINT (target_idx))
7128 error ("Too few arguments for operation: %s",
d5db4077 7129 SDATA (SYMBOL_NAME (operation)));
7f787cfd
KH
7130 /* For write-region, if the 6th argument (i.e. VISIT, the 5th
7131 argument to write-region) is string, it must be treated as a
7132 target file name. */
7133 if (EQ (operation, Qwrite_region)
7134 && nargs > 5
7135 && STRINGP (args[5]))
d90ed3b4 7136 target_idx = make_number (4);
4ed46869
KH
7137 target = args[XINT (target_idx) + 1];
7138 if (!(STRINGP (target)
7139 || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
8ca3766a 7140 error ("Invalid argument %d", XINT (target_idx) + 1);
4ed46869 7141
2e34157c
RS
7142 chain = ((EQ (operation, Qinsert_file_contents)
7143 || EQ (operation, Qwrite_region))
02ba4723 7144 ? Vfile_coding_system_alist
2e34157c 7145 : (EQ (operation, Qopen_network_stream)
02ba4723
KH
7146 ? Vnetwork_coding_system_alist
7147 : Vprocess_coding_system_alist));
4ed46869
KH
7148 if (NILP (chain))
7149 return Qnil;
7150
03699b14 7151 for (; CONSP (chain); chain = XCDR (chain))
4ed46869 7152 {
f44d27ce 7153 Lisp_Object elt;
03699b14 7154 elt = XCAR (chain);
4ed46869
KH
7155
7156 if (CONSP (elt)
7157 && ((STRINGP (target)
03699b14
KR
7158 && STRINGP (XCAR (elt))
7159 && fast_string_match (XCAR (elt), target) >= 0)
7160 || (INTEGERP (target) && EQ (target, XCAR (elt)))))
02ba4723 7161 {
03699b14 7162 val = XCDR (elt);
b19fd4c5
KH
7163 /* Here, if VAL is both a valid coding system and a valid
7164 function symbol, we return VAL as a coding system. */
02ba4723
KH
7165 if (CONSP (val))
7166 return val;
7167 if (! SYMBOLP (val))
7168 return Qnil;
7169 if (! NILP (Fcoding_system_p (val)))
7170 return Fcons (val, val);
b19fd4c5
KH
7171 if (! NILP (Ffboundp (val)))
7172 {
7173 val = call1 (val, Flist (nargs, args));
7174 if (CONSP (val))
7175 return val;
7176 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
7177 return Fcons (val, val);
7178 }
02ba4723
KH
7179 return Qnil;
7180 }
4ed46869
KH
7181 }
7182 return Qnil;
7183}
7184
1397dc18
KH
7185DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
7186 Supdate_coding_systems_internal, 0, 0, 0,
48b0f3ae
PJ
7187 doc: /* Update internal database for ISO2022 and CCL based coding systems.
7188When values of any coding categories are changed, you must
7189call this function. */)
7190 ()
d46c5b12
KH
7191{
7192 int i;
7193
fa42c37f 7194 for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
d46c5b12 7195 {
1397dc18
KH
7196 Lisp_Object val;
7197
f5c1dd0d 7198 val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[i]);
1397dc18
KH
7199 if (!NILP (val))
7200 {
7201 if (! coding_system_table[i])
7202 coding_system_table[i] = ((struct coding_system *)
7203 xmalloc (sizeof (struct coding_system)));
7204 setup_coding_system (val, coding_system_table[i]);
7205 }
7206 else if (coding_system_table[i])
7207 {
7208 xfree (coding_system_table[i]);
7209 coding_system_table[i] = NULL;
7210 }
d46c5b12 7211 }
1397dc18 7212
d46c5b12
KH
7213 return Qnil;
7214}
7215
66cfb530
KH
7216DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
7217 Sset_coding_priority_internal, 0, 0, 0,
48b0f3ae
PJ
7218 doc: /* Update internal database for the current value of `coding-category-list'.
7219This function is internal use only. */)
7220 ()
66cfb530
KH
7221{
7222 int i = 0, idx;
84d60297
RS
7223 Lisp_Object val;
7224
7225 val = Vcoding_category_list;
66cfb530
KH
7226
7227 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
7228 {
03699b14 7229 if (! SYMBOLP (XCAR (val)))
66cfb530 7230 break;
03699b14 7231 idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
66cfb530
KH
7232 if (idx >= CODING_CATEGORY_IDX_MAX)
7233 break;
7234 coding_priorities[i++] = (1 << idx);
03699b14 7235 val = XCDR (val);
66cfb530
KH
7236 }
7237 /* If coding-category-list is valid and contains all coding
7238 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
fa42c37f 7239 the following code saves Emacs from crashing. */
66cfb530
KH
7240 while (i < CODING_CATEGORY_IDX_MAX)
7241 coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
7242
7243 return Qnil;
7244}
7245
6b89e3aa
KH
7246DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
7247 Sdefine_coding_system_internal, 1, 1, 0,
7248 doc: /* Register CODING-SYSTEM as a base coding system.
7249This function is internal use only. */)
7250 (coding_system)
7251 Lisp_Object coding_system;
7252{
7253 Lisp_Object safe_chars, slot;
7254
7255 if (NILP (Fcheck_coding_system (coding_system)))
7256 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
7257 safe_chars = coding_safe_chars (coding_system);
7258 if (! EQ (safe_chars, Qt) && ! CHAR_TABLE_P (safe_chars))
7259 error ("No valid safe-chars property for %s",
7260 SDATA (SYMBOL_NAME (coding_system)));
7261 if (EQ (safe_chars, Qt))
7262 {
7263 if (NILP (Fmemq (coding_system, XCAR (Vcoding_system_safe_chars))))
7264 XSETCAR (Vcoding_system_safe_chars,
7265 Fcons (coding_system, XCAR (Vcoding_system_safe_chars)));
7266 }
7267 else
7268 {
7269 slot = Fassq (coding_system, XCDR (Vcoding_system_safe_chars));
7270 if (NILP (slot))
7271 XSETCDR (Vcoding_system_safe_chars,
7272 nconc2 (XCDR (Vcoding_system_safe_chars),
7273 Fcons (Fcons (coding_system, safe_chars), Qnil)));
7274 else
7275 XSETCDR (slot, safe_chars);
7276 }
7277 return Qnil;
7278}
7279
4ed46869
KH
7280#endif /* emacs */
7281
7282\f
1397dc18 7283/*** 9. Post-amble ***/
4ed46869 7284
dfcf069d 7285void
4ed46869
KH
7286init_coding_once ()
7287{
7288 int i;
7289
93dec019 7290 /* Emacs' internal format specific initialize routine. */
4ed46869
KH
7291 for (i = 0; i <= 0x20; i++)
7292 emacs_code_class[i] = EMACS_control_code;
7293 emacs_code_class[0x0A] = EMACS_linefeed_code;
7294 emacs_code_class[0x0D] = EMACS_carriage_return_code;
7295 for (i = 0x21 ; i < 0x7F; i++)
7296 emacs_code_class[i] = EMACS_ascii_code;
7297 emacs_code_class[0x7F] = EMACS_control_code;
ec6d2bb8 7298 for (i = 0x80; i < 0xFF; i++)
4ed46869
KH
7299 emacs_code_class[i] = EMACS_invalid_code;
7300 emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
7301 emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
7302 emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
7303 emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;
7304
7305 /* ISO2022 specific initialize routine. */
7306 for (i = 0; i < 0x20; i++)
b73bfc1c 7307 iso_code_class[i] = ISO_control_0;
4ed46869
KH
7308 for (i = 0x21; i < 0x7F; i++)
7309 iso_code_class[i] = ISO_graphic_plane_0;
7310 for (i = 0x80; i < 0xA0; i++)
b73bfc1c 7311 iso_code_class[i] = ISO_control_1;
4ed46869
KH
7312 for (i = 0xA1; i < 0xFF; i++)
7313 iso_code_class[i] = ISO_graphic_plane_1;
7314 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
7315 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
7316 iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
7317 iso_code_class[ISO_CODE_SO] = ISO_shift_out;
7318 iso_code_class[ISO_CODE_SI] = ISO_shift_in;
7319 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
7320 iso_code_class[ISO_CODE_ESC] = ISO_escape;
7321 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
7322 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
7323 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
7324
e0e989f6
KH
7325 setup_coding_system (Qnil, &keyboard_coding);
7326 setup_coding_system (Qnil, &terminal_coding);
c4825358 7327 setup_coding_system (Qnil, &safe_terminal_coding);
6bc51348 7328 setup_coding_system (Qnil, &default_buffer_file_coding);
9ce27fde 7329
d46c5b12
KH
7330 bzero (coding_system_table, sizeof coding_system_table);
7331
66cfb530
KH
7332 bzero (ascii_skip_code, sizeof ascii_skip_code);
7333 for (i = 0; i < 128; i++)
7334 ascii_skip_code[i] = 1;
7335
9ce27fde
KH
7336#if defined (MSDOS) || defined (WINDOWSNT)
7337 system_eol_type = CODING_EOL_CRLF;
7338#else
7339 system_eol_type = CODING_EOL_LF;
7340#endif
b843d1ae
KH
7341
7342 inhibit_pre_post_conversion = 0;
e0e989f6
KH
7343}
7344
7345#ifdef emacs
7346
dfcf069d 7347void
e0e989f6
KH
7348syms_of_coding ()
7349{
7350 Qtarget_idx = intern ("target-idx");
7351 staticpro (&Qtarget_idx);
7352
bb0115a2
RS
7353 Qcoding_system_history = intern ("coding-system-history");
7354 staticpro (&Qcoding_system_history);
7355 Fset (Qcoding_system_history, Qnil);
7356
9ce27fde 7357 /* Target FILENAME is the first argument. */
e0e989f6 7358 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
9ce27fde 7359 /* Target FILENAME is the third argument. */
e0e989f6
KH
7360 Fput (Qwrite_region, Qtarget_idx, make_number (2));
7361
7362 Qcall_process = intern ("call-process");
7363 staticpro (&Qcall_process);
9ce27fde 7364 /* Target PROGRAM is the first argument. */
e0e989f6
KH
7365 Fput (Qcall_process, Qtarget_idx, make_number (0));
7366
7367 Qcall_process_region = intern ("call-process-region");
7368 staticpro (&Qcall_process_region);
9ce27fde 7369 /* Target PROGRAM is the third argument. */
e0e989f6
KH
7370 Fput (Qcall_process_region, Qtarget_idx, make_number (2));
7371
7372 Qstart_process = intern ("start-process");
7373 staticpro (&Qstart_process);
9ce27fde 7374 /* Target PROGRAM is the third argument. */
e0e989f6
KH
7375 Fput (Qstart_process, Qtarget_idx, make_number (2));
7376
7377 Qopen_network_stream = intern ("open-network-stream");
7378 staticpro (&Qopen_network_stream);
9ce27fde 7379 /* Target SERVICE is the fourth argument. */
e0e989f6
KH
7380 Fput (Qopen_network_stream, Qtarget_idx, make_number (3));
7381
4ed46869
KH
7382 Qcoding_system = intern ("coding-system");
7383 staticpro (&Qcoding_system);
7384
7385 Qeol_type = intern ("eol-type");
7386 staticpro (&Qeol_type);
7387
7388 Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
7389 staticpro (&Qbuffer_file_coding_system);
7390
7391 Qpost_read_conversion = intern ("post-read-conversion");
7392 staticpro (&Qpost_read_conversion);
7393
7394 Qpre_write_conversion = intern ("pre-write-conversion");
7395 staticpro (&Qpre_write_conversion);
7396
27901516
KH
7397 Qno_conversion = intern ("no-conversion");
7398 staticpro (&Qno_conversion);
7399
7400 Qundecided = intern ("undecided");
7401 staticpro (&Qundecided);
7402
4ed46869
KH
7403 Qcoding_system_p = intern ("coding-system-p");
7404 staticpro (&Qcoding_system_p);
7405
7406 Qcoding_system_error = intern ("coding-system-error");
7407 staticpro (&Qcoding_system_error);
7408
7409 Fput (Qcoding_system_error, Qerror_conditions,
7410 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
7411 Fput (Qcoding_system_error, Qerror_message,
9ce27fde 7412 build_string ("Invalid coding system"));
4ed46869 7413
d46c5b12
KH
7414 Qcoding_category = intern ("coding-category");
7415 staticpro (&Qcoding_category);
4ed46869
KH
7416 Qcoding_category_index = intern ("coding-category-index");
7417 staticpro (&Qcoding_category_index);
7418
d46c5b12
KH
7419 Vcoding_category_table
7420 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
7421 staticpro (&Vcoding_category_table);
4ed46869
KH
7422 {
7423 int i;
7424 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
7425 {
d46c5b12
KH
7426 XVECTOR (Vcoding_category_table)->contents[i]
7427 = intern (coding_category_name[i]);
7428 Fput (XVECTOR (Vcoding_category_table)->contents[i],
7429 Qcoding_category_index, make_number (i));
4ed46869
KH
7430 }
7431 }
7432
6b89e3aa
KH
7433 Vcoding_system_safe_chars = Fcons (Qnil, Qnil);
7434 staticpro (&Vcoding_system_safe_chars);
7435
f967223b
KH
7436 Qtranslation_table = intern ("translation-table");
7437 staticpro (&Qtranslation_table);
b666620c 7438 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (2));
bdd9fb48 7439
f967223b
KH
7440 Qtranslation_table_id = intern ("translation-table-id");
7441 staticpro (&Qtranslation_table_id);
84fbb8a0 7442
f967223b
KH
7443 Qtranslation_table_for_decode = intern ("translation-table-for-decode");
7444 staticpro (&Qtranslation_table_for_decode);
a5d301df 7445
f967223b
KH
7446 Qtranslation_table_for_encode = intern ("translation-table-for-encode");
7447 staticpro (&Qtranslation_table_for_encode);
a5d301df 7448
05e6f5dc
KH
7449 Qsafe_chars = intern ("safe-chars");
7450 staticpro (&Qsafe_chars);
7451
7452 Qchar_coding_system = intern ("char-coding-system");
7453 staticpro (&Qchar_coding_system);
7454
7455 /* Intern this now in case it isn't already done.
7456 Setting this variable twice is harmless.
7457 But don't staticpro it here--that is done in alloc.c. */
7458 Qchar_table_extra_slots = intern ("char-table-extra-slots");
7459 Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
067a6a66 7460 Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (0));
70c22245 7461
1397dc18
KH
7462 Qvalid_codes = intern ("valid-codes");
7463 staticpro (&Qvalid_codes);
7464
9ce27fde
KH
7465 Qemacs_mule = intern ("emacs-mule");
7466 staticpro (&Qemacs_mule);
7467
d46c5b12
KH
7468 Qraw_text = intern ("raw-text");
7469 staticpro (&Qraw_text);
7470
4ed46869
KH
7471 defsubr (&Scoding_system_p);
7472 defsubr (&Sread_coding_system);
7473 defsubr (&Sread_non_nil_coding_system);
7474 defsubr (&Scheck_coding_system);
7475 defsubr (&Sdetect_coding_region);
d46c5b12 7476 defsubr (&Sdetect_coding_string);
05e6f5dc 7477 defsubr (&Sfind_coding_systems_region_internal);
068a9dbd 7478 defsubr (&Sunencodable_char_position);
4ed46869
KH
7479 defsubr (&Sdecode_coding_region);
7480 defsubr (&Sencode_coding_region);
7481 defsubr (&Sdecode_coding_string);
7482 defsubr (&Sencode_coding_string);
7483 defsubr (&Sdecode_sjis_char);
7484 defsubr (&Sencode_sjis_char);
7485 defsubr (&Sdecode_big5_char);
7486 defsubr (&Sencode_big5_char);
1ba9e4ab 7487 defsubr (&Sset_terminal_coding_system_internal);
c4825358 7488 defsubr (&Sset_safe_terminal_coding_system_internal);
4ed46869 7489 defsubr (&Sterminal_coding_system);
1ba9e4ab 7490 defsubr (&Sset_keyboard_coding_system_internal);
4ed46869 7491 defsubr (&Skeyboard_coding_system);
a5d301df 7492 defsubr (&Sfind_operation_coding_system);
1397dc18 7493 defsubr (&Supdate_coding_systems_internal);
66cfb530 7494 defsubr (&Sset_coding_priority_internal);
6b89e3aa 7495 defsubr (&Sdefine_coding_system_internal);
4ed46869 7496
4608c386 7497 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
48b0f3ae
PJ
7498 doc: /* List of coding systems.
7499
7500Do not alter the value of this variable manually. This variable should be
7501updated by the functions `make-coding-system' and
7502`define-coding-system-alias'. */);
4608c386
KH
7503 Vcoding_system_list = Qnil;
7504
7505 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
48b0f3ae
PJ
7506 doc: /* Alist of coding system names.
7507Each element is one element list of coding system name.
7508This variable is given to `completing-read' as TABLE argument.
7509
7510Do not alter the value of this variable manually. This variable should be
7511updated by the functions `make-coding-system' and
7512`define-coding-system-alias'. */);
4608c386
KH
7513 Vcoding_system_alist = Qnil;
7514
4ed46869 7515 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
48b0f3ae
PJ
7516 doc: /* List of coding-categories (symbols) ordered by priority.
7517
7518On detecting a coding system, Emacs tries code detection algorithms
7519associated with each coding-category one by one in this order. When
7520one algorithm agrees with a byte sequence of source text, the coding
7521system bound to the corresponding coding-category is selected. */);
4ed46869
KH
7522 {
7523 int i;
7524
7525 Vcoding_category_list = Qnil;
7526 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
7527 Vcoding_category_list
d46c5b12
KH
7528 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
7529 Vcoding_category_list);
4ed46869
KH
7530 }
7531
7532 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
48b0f3ae
PJ
7533 doc: /* Specify the coding system for read operations.
7534It is useful to bind this variable with `let', but do not set it globally.
7535If the value is a coding system, it is used for decoding on read operation.
7536If not, an appropriate element is used from one of the coding system alists:
7537There are three such tables, `file-coding-system-alist',
7538`process-coding-system-alist', and `network-coding-system-alist'. */);
4ed46869
KH
7539 Vcoding_system_for_read = Qnil;
7540
7541 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
48b0f3ae
PJ
7542 doc: /* Specify the coding system for write operations.
7543Programs bind this variable with `let', but you should not set it globally.
7544If the value is a coding system, it is used for encoding of output,
7545when writing it to a file and when sending it to a file or subprocess.
7546
7547If this does not specify a coding system, an appropriate element
7548is used from one of the coding system alists:
7549There are three such tables, `file-coding-system-alist',
7550`process-coding-system-alist', and `network-coding-system-alist'.
7551For output to files, if the above procedure does not specify a coding system,
7552the value of `buffer-file-coding-system' is used. */);
4ed46869
KH
7553 Vcoding_system_for_write = Qnil;
7554
7555 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
48b0f3ae 7556 doc: /* Coding system used in the latest file or process I/O. */);
4ed46869
KH
7557 Vlast_coding_system_used = Qnil;
7558
9ce27fde 7559 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
48b0f3ae
PJ
7560 doc: /* *Non-nil means always inhibit code conversion of end-of-line format.
7561See info node `Coding Systems' and info node `Text and Binary' concerning
7562such conversion. */);
9ce27fde
KH
7563 inhibit_eol_conversion = 0;
7564
ed29121d 7565 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
48b0f3ae
PJ
7566 doc: /* Non-nil means process buffer inherits coding system of process output.
7567Bind it to t if the process output is to be treated as if it were a file
7568read from some filesystem. */);
ed29121d
EZ
7569 inherit_process_coding_system = 0;
7570
02ba4723 7571 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
48b0f3ae
PJ
7572 doc: /* Alist to decide a coding system to use for a file I/O operation.
7573The format is ((PATTERN . VAL) ...),
7574where PATTERN is a regular expression matching a file name,
7575VAL is a coding system, a cons of coding systems, or a function symbol.
7576If VAL is a coding system, it is used for both decoding and encoding
7577the file contents.
7578If VAL is a cons of coding systems, the car part is used for decoding,
7579and the cdr part is used for encoding.
7580If VAL is a function symbol, the function must return a coding system
0192762c 7581or a cons of coding systems which are used as above. The function gets
ff955d90 7582the arguments with which `find-operation-coding-system' was called.
48b0f3ae
PJ
7583
7584See also the function `find-operation-coding-system'
7585and the variable `auto-coding-alist'. */);
02ba4723
KH
7586 Vfile_coding_system_alist = Qnil;
7587
7588 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
48b0f3ae
PJ
7589 doc: /* Alist to decide a coding system to use for a process I/O operation.
7590The format is ((PATTERN . VAL) ...),
7591where PATTERN is a regular expression matching a program name,
7592VAL is a coding system, a cons of coding systems, or a function symbol.
7593If VAL is a coding system, it is used for both decoding what received
7594from the program and encoding what sent to the program.
7595If VAL is a cons of coding systems, the car part is used for decoding,
7596and the cdr part is used for encoding.
7597If VAL is a function symbol, the function must return a coding system
7598or a cons of coding systems which are used as above.
7599
7600See also the function `find-operation-coding-system'. */);
02ba4723
KH
7601 Vprocess_coding_system_alist = Qnil;
7602
7603 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
48b0f3ae
PJ
7604 doc: /* Alist to decide a coding system to use for a network I/O operation.
7605The format is ((PATTERN . VAL) ...),
7606where PATTERN is a regular expression matching a network service name
7607or is a port number to connect to,
7608VAL is a coding system, a cons of coding systems, or a function symbol.
7609If VAL is a coding system, it is used for both decoding what received
7610from the network stream and encoding what sent to the network stream.
7611If VAL is a cons of coding systems, the car part is used for decoding,
7612and the cdr part is used for encoding.
7613If VAL is a function symbol, the function must return a coding system
7614or a cons of coding systems which are used as above.
7615
7616See also the function `find-operation-coding-system'. */);
02ba4723 7617 Vnetwork_coding_system_alist = Qnil;
4ed46869 7618
68c45bf0 7619 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
75205970
RS
7620 doc: /* Coding system to use with system messages.
7621Also used for decoding keyboard input on X Window system. */);
68c45bf0
PE
7622 Vlocale_coding_system = Qnil;
7623
005f0d35 7624 /* The eol mnemonics are reset in startup.el system-dependently. */
7722baf9 7625 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
48b0f3ae 7626 doc: /* *String displayed in mode line for UNIX-like (LF) end-of-line format. */);
7722baf9 7627 eol_mnemonic_unix = build_string (":");
4ed46869 7628
7722baf9 7629 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
48b0f3ae 7630 doc: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format. */);
7722baf9 7631 eol_mnemonic_dos = build_string ("\\");
4ed46869 7632
7722baf9 7633 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
48b0f3ae 7634 doc: /* *String displayed in mode line for MAC-like (CR) end-of-line format. */);
7722baf9 7635 eol_mnemonic_mac = build_string ("/");
4ed46869 7636
7722baf9 7637 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
48b0f3ae 7638 doc: /* *String displayed in mode line when end-of-line format is not yet determined. */);
7722baf9 7639 eol_mnemonic_undecided = build_string (":");
4ed46869 7640
84fbb8a0 7641 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
48b0f3ae 7642 doc: /* *Non-nil enables character translation while encoding and decoding. */);
84fbb8a0 7643 Venable_character_translation = Qt;
bdd9fb48 7644
f967223b 7645 DEFVAR_LISP ("standard-translation-table-for-decode",
48b0f3ae
PJ
7646 &Vstandard_translation_table_for_decode,
7647 doc: /* Table for translating characters while decoding. */);
f967223b 7648 Vstandard_translation_table_for_decode = Qnil;
bdd9fb48 7649
f967223b 7650 DEFVAR_LISP ("standard-translation-table-for-encode",
48b0f3ae
PJ
7651 &Vstandard_translation_table_for_encode,
7652 doc: /* Table for translating characters while encoding. */);
f967223b 7653 Vstandard_translation_table_for_encode = Qnil;
4ed46869
KH
7654
7655 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
48b0f3ae
PJ
7656 doc: /* Alist of charsets vs revision numbers.
7657While encoding, if a charset (car part of an element) is found,
7658designate it with the escape sequence identifying revision (cdr part of the element). */);
4ed46869 7659 Vcharset_revision_alist = Qnil;
02ba4723
KH
7660
7661 DEFVAR_LISP ("default-process-coding-system",
7662 &Vdefault_process_coding_system,
48b0f3ae
PJ
7663 doc: /* Cons of coding systems used for process I/O by default.
7664The car part is used for decoding a process output,
7665the cdr part is used for encoding a text to be sent to a process. */);
02ba4723 7666 Vdefault_process_coding_system = Qnil;
c4825358 7667
3f003981 7668 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
48b0f3ae
PJ
7669 doc: /* Table of extra Latin codes in the range 128..159 (inclusive).
7670This is a vector of length 256.
7671If Nth element is non-nil, the existence of code N in a file
7672\(or output of subprocess) doesn't prevent it to be detected as
7673a coding system of ISO 2022 variant which has a flag
7674`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file
7675or reading output of a subprocess.
7676Only 128th through 159th elements has a meaning. */);
3f003981 7677 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
d46c5b12
KH
7678
7679 DEFVAR_LISP ("select-safe-coding-system-function",
7680 &Vselect_safe_coding_system_function,
48b0f3ae
PJ
7681 doc: /* Function to call to select safe coding system for encoding a text.
7682
7683If set, this function is called to force a user to select a proper
7684coding system which can encode the text in the case that a default
7685coding system used in each operation can't encode the text.
7686
7687The default value is `select-safe-coding-system' (which see). */);
d46c5b12
KH
7688 Vselect_safe_coding_system_function = Qnil;
7689
5d5bf4d8
KH
7690 DEFVAR_BOOL ("coding-system-require-warning",
7691 &coding_system_require_warning,
7692 doc: /* Internal use only.
6b89e3aa
KH
7693If non-nil, on writing a file, `select-safe-coding-system-function' is
7694called even if `coding-system-for-write' is non-nil. The command
7695`universal-coding-system-argument' binds this variable to t temporarily. */);
5d5bf4d8
KH
7696 coding_system_require_warning = 0;
7697
7698
22ab2303 7699 DEFVAR_BOOL ("inhibit-iso-escape-detection",
74383408 7700 &inhibit_iso_escape_detection,
48b0f3ae
PJ
7701 doc: /* If non-nil, Emacs ignores ISO2022's escape sequence on code detection.
7702
7703By default, on reading a file, Emacs tries to detect how the text is
7704encoded. This code detection is sensitive to escape sequences. If
7705the sequence is valid as ISO2022, the code is determined as one of
7706the ISO2022 encodings, and the file is decoded by the corresponding
7707coding system (e.g. `iso-2022-7bit').
7708
7709However, there may be a case that you want to read escape sequences in
7710a file as is. In such a case, you can set this variable to non-nil.
7711Then, as the code detection ignores any escape sequences, no file is
7712detected as encoded in some ISO2022 encoding. The result is that all
7713escape sequences become visible in a buffer.
7714
7715The default value is nil, and it is strongly recommended not to change
7716it. That is because many Emacs Lisp source files that contain
7717non-ASCII characters are encoded by the coding system `iso-2022-7bit'
7718in Emacs's distribution, and they won't be decoded correctly on
7719reading if you suppress escape sequence detection.
7720
7721The other way to read escape sequences in a file without decoding is
7722to explicitly specify some coding system that doesn't use ISO2022's
7723escape sequence (e.g `latin-1') on reading by \\[universal-coding-system-argument]. */);
74383408 7724 inhibit_iso_escape_detection = 0;
002fdb44
DL
7725
7726 DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input,
15c8f9d1
DL
7727 doc: /* Char table for translating self-inserting characters.
7728This is applied to the result of input methods, not their input. See also
7729`keyboard-translate-table'. */);
002fdb44 7730 Vtranslation_table_for_input = Qnil;
4ed46869
KH
7731}
7732
68c45bf0
PE
7733char *
7734emacs_strerror (error_number)
7735 int error_number;
7736{
7737 char *str;
7738
ca9c0567 7739 synchronize_system_messages_locale ();
68c45bf0
PE
7740 str = strerror (error_number);
7741
7742 if (! NILP (Vlocale_coding_system))
7743 {
7744 Lisp_Object dec = code_convert_string_norecord (build_string (str),
7745 Vlocale_coding_system,
7746 0);
d5db4077 7747 str = (char *) SDATA (dec);
68c45bf0
PE
7748 }
7749
7750 return str;
7751}
7752
4ed46869 7753#endif /* emacs */
c2f94ebc 7754