Rename `struct device' to `struct terminal'. Rename some terminal-related functions...
[bpt/emacs.git] / src / coding.c
4ed46869 1/* Coding system handler (conversion, detection, etc.).
0b5538bd 2 Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
ce03bf76
KH
3 Copyright (C) 1995, 1997, 1998, 2002, 2003, 2004, 2005
4 National Institute of Advanced Industrial Science and Technology (AIST)
5 Registration Number H14PRO021
4ed46869 6
369314dc
KH
7This file is part of GNU Emacs.
8
9GNU Emacs is free software; you can redistribute it and/or modify
10it under the terms of the GNU General Public License as published by
11the Free Software Foundation; either version 2, or (at your option)
12any later version.
4ed46869 13
369314dc
KH
14GNU Emacs is distributed in the hope that it will be useful,
15but WITHOUT ANY WARRANTY; without even the implied warranty of
16MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17GNU General Public License for more details.
4ed46869 18
369314dc
KH
19You should have received a copy of the GNU General Public License
20along with GNU Emacs; see the file COPYING. If not, write to
4fc5845f
LK
21the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
22Boston, MA 02110-1301, USA. */
4ed46869
KH
23
24/*** TABLE OF CONTENTS ***
25
b73bfc1c 26 0. General comments
4ed46869 27 1. Preamble
0ef69138 28 2. Emacs' internal format (emacs-mule) handlers
4ed46869
KH
29 3. ISO2022 handlers
30 4. Shift-JIS and BIG5 handlers
1397dc18
KH
31 5. CCL handlers
32 6. End-of-line handlers
33 7. C library functions
34 8. Emacs Lisp library functions
35 9. Post-amble
4ed46869
KH
36
37*/
38
b73bfc1c
KH
39/*** 0. General comments ***/
40
41
cfb43547 42/*** GENERAL NOTE on CODING SYSTEMS ***
4ed46869 43
cfb43547 44 A coding system is an encoding mechanism for one or more character
4ed46869
KH
45 sets. Here's a list of coding systems which Emacs can handle. When
46 we say "decode", it means converting some other coding system to
cfb43547 47 Emacs' internal format (emacs-mule), and when we say "encode",
0ef69138
KH
48 it means converting the coding system emacs-mule to some other
49 coding system.
4ed46869 50
0ef69138 51 0. Emacs' internal format (emacs-mule)
4ed46869 52
cfb43547 53 Emacs itself holds a multi-lingual character in buffers and strings
f4dee582 54 in a special format. Details are described in section 2.
4ed46869
KH
55
56 1. ISO2022
57
58 The most famous coding system for multiple character sets. X's
f4dee582
RS
59 Compound Text, various EUCs (Extended Unix Code), and coding
60 systems used in Internet communication such as ISO-2022-JP are
61 all variants of ISO2022. Details are described in section 3.
4ed46869
KH
62
63 2. SJIS (or Shift-JIS or MS-Kanji-Code)
93dec019 64
4ed46869
KH
65 A coding system to encode character sets: ASCII, JISX0201, and
66 JISX0208. Widely used for PCs in Japan. Details are described in
f4dee582 67 section 4.
4ed46869
KH
68
69 3. BIG5
70
cfb43547
DL
71 A coding system to encode the character sets ASCII and Big5. Widely
72 used for Chinese (mainly in Taiwan and Hong Kong). Details are
f4dee582
RS
73 described in section 4. In this file, when we write "BIG5"
74 (all uppercase), we mean the coding system, and when we write
75 "Big5" (capitalized), we mean the character set.
4ed46869 76
27901516
KH
77 4. Raw text
78
cfb43547
DL
79 A coding system for text containing arbitrary 8-bit codes. Emacs does
80 no code conversion on such text except for end-of-line format.
27901516
KH
81
82 5. Other
4ed46869 83
cfb43547
DL
84 If a user wants to read/write text encoded in a coding system not
85 listed above, he can supply a decoder and an encoder for it as CCL
4ed46869
KH
86 (Code Conversion Language) programs. Emacs executes the CCL program
87 while reading/writing.
88
d46c5b12
KH
89 Emacs represents a coding system by a Lisp symbol that has a property
90 `coding-system'. But, before actually using the coding system, the
4ed46869 91 information about it is set in a structure of type `struct
f4dee582 92 coding_system' for rapid processing. See section 6 for more details.
4ed46869
KH
93
94*/
95
96/*** GENERAL NOTES on END-OF-LINE FORMAT ***
97
cfb43547
DL
98 How end-of-line of text is encoded depends on the operating system.
99 For instance, Unix's format is just one byte of `line-feed' code,
f4dee582 100 whereas DOS's format is a two-byte sequence of `carriage-return' and
d46c5b12
KH
101 `line-feed' codes. MacOS's format is usually one byte of
102 `carriage-return'.
4ed46869 103
cfb43547
DL
104 Since text character encoding and end-of-line encoding are
105 independent, any coding system described above can have any
106 end-of-line format. So Emacs has information about end-of-line
107 format in each coding-system. See section 6 for more details.
4ed46869
KH
108
109*/
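The note above can be illustrated with a small, standalone sketch that classifies the first line ending in a byte buffer. The names `eol_kind` and `detect_eol` are hypothetical and are not part of coding.c; the sketch only assumes the three formats named in the comment (LF, CRLF, CR).

```c
#include <stddef.h>

/* Hypothetical names for illustration only; coding.c uses its own
   CODING_EOL_XXX constants.  */
enum eol_kind { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

/* Classify the first line ending found in BUF[0..LEN).  A CR that is
   the very last byte is reported as EOL_CR, which is a simplification:
   a real detector might wait for more input.  */
enum eol_kind
detect_eol (const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (buf[i] == '\n')
        return EOL_LF;
      if (buf[i] == '\r')
        {
          /* CR followed by LF is the DOS form; a lone CR is the Mac form.  */
          if (i + 1 < len && buf[i + 1] == '\n')
            return EOL_CRLF;
          return EOL_CR;
        }
    }
  return EOL_UNDECIDED;
}
```

Because the check looks only at line-ending bytes, it is independent of how the remaining characters are encoded, which is the point the note makes.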
110
111/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
112
113 These functions check whether the text between SRC and SRC_END is encoded
114 in the coding system category XXX. Each returns an integer value in
cfb43547 115 which appropriate flag bits for the category XXX are set. The flag
4ed46869 116 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
cfb43547 117 template for these functions. If MULTIBYTEP is nonzero, 8-bit codes
0a28aafb 118 of the range 0x80..0x9F are in multibyte form. */
4ed46869
KH
119#if 0
120int
0a28aafb 121detect_coding_emacs_mule (src, src_end, multibytep)
4ed46869 122 unsigned char *src, *src_end;
0a28aafb 123 int multibytep;
4ed46869
KH
124{
125 ...
126}
127#endif
128
129/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
130
b73bfc1c
KH
131 These functions decode SRC_BYTES length of unibyte text at SOURCE
132 encoded in CODING to Emacs' internal format. The resulting
133 multibyte text goes to a place pointed to by DESTINATION, the length
134 of which should not exceed DST_BYTES.
d46c5b12 135
cfb43547
DL
136 These functions set the information about original and decoded texts
137 in the members `produced', `produced_char', `consumed', and
138 `consumed_char' of the structure *CODING. They also set the member
139 `result' to one of CODING_FINISH_XXX indicating how the decoding
140 finished.
d46c5b12 141
cfb43547 142 DST_BYTES zero means that the source area and destination area are
d46c5b12 143 overlapped, which means that we can produce a decoded text until it
cfb43547 144 reaches the head of the not-yet-decoded source text.
d46c5b12 145
cfb43547 146 Below is a template for these functions. */
4ed46869 147#if 0
b73bfc1c 148static void
d46c5b12 149decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869 150 struct coding_system *coding;
5bdca8af
DN
151 const unsigned char *source;
152 unsigned char *destination;
4ed46869 153 int src_bytes, dst_bytes;
4ed46869
KH
154{
155 ...
156}
157#endif
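The DST_BYTES convention described above (zero means source and destination overlap, so output may only grow up to the not-yet-read source) can be sketched with a toy converter. The function `crlf_to_lf` is hypothetical and far simpler than any decode_coding_XXX routine; it only shows the shape of the limit check.

```c
#include <stddef.h>

/* Hypothetical sketch, not a function from coding.c.  Convert CRLF to
   LF.  If DST_BYTES is zero, DESTINATION is assumed to overlap SOURCE
   (typically the same buffer), and output is limited by the read
   position instead of by DESTINATION + DST_BYTES.  Return the number
   of bytes produced.  A real decoder would also record how many source
   bytes were consumed.  */
size_t
crlf_to_lf (const unsigned char *source, unsigned char *destination,
            size_t src_bytes, size_t dst_bytes)
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;

  while (src < src_end)
    {
      unsigned char c = *src++;

      if (c == '\r' && src < src_end && *src == '\n')
        {
          c = '\n';
          src++;
        }
      /* Same test shape as the EMIT_XXX macros: stop before writing
         past the destination limit or into unread source bytes.  */
      if (dst >= (dst_bytes ? (const unsigned char *) dst_end : src))
        break;
      *dst++ = c;
    }
  return (size_t) (dst - destination);
}
```

Calling it as `crlf_to_lf (buf, buf, n, 0)` converts in place; this is safe here only because the output never grows faster than the input is read.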
158
159/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
160
cfb43547 161 These functions encode SRC_BYTES length text at SOURCE from Emacs'
b73bfc1c
KH
162 internal multibyte format to CODING. The resulting unibyte text
163 goes to a place pointed to by DESTINATION, the length of which
164 should not exceed DST_BYTES.
d46c5b12 165
cfb43547
DL
166 These functions set the information about original and encoded texts
167 in the members `produced', `produced_char', `consumed', and
168 `consumed_char' of the structure *CODING. They also set the member
169 `result' to one of CODING_FINISH_XXX indicating how the encoding
170 finished.
d46c5b12 171
cfb43547
DL
172 DST_BYTES zero means that the source area and destination area are
173 overlapped, which means that we can produce encoded text until it
174 reaches the head of the not-yet-encoded source text.
d46c5b12 175
cfb43547 176 Below is a template for these functions. */
4ed46869 177#if 0
b73bfc1c 178static void
d46c5b12 179encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
4ed46869
KH
180 struct coding_system *coding;
181 unsigned char *source, *destination;
182 int src_bytes, dst_bytes;
4ed46869
KH
183{
184 ...
185}
186#endif
187
188/*** COMMONLY USED MACROS ***/
189
b73bfc1c
KH
190/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
191 get one and two bytes from the source text respectively.
192 If there are not enough bytes in the source, they jump to
193 `label_end_of_loop'. The caller should set variables `coding',
194 `src' and `src_end' to appropriate pointers in advance. These
195 macros are called from decoding routines `decode_coding_XXX', thus
196 it is assumed that the source text is unibyte. */
4ed46869 197
b73bfc1c
KH
198#define ONE_MORE_BYTE(c1) \
199 do { \
200 if (src >= src_end) \
201 { \
202 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
203 goto label_end_of_loop; \
204 } \
205 c1 = *src++; \
4ed46869
KH
206 } while (0)
207
b73bfc1c
KH
208#define TWO_MORE_BYTES(c1, c2) \
209 do { \
210 if (src + 1 >= src_end) \
211 { \
212 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
213 goto label_end_of_loop; \
214 } \
215 c1 = *src++; \
216 c2 = *src++; \
4ed46869
KH
217 } while (0)
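How these fetch macros cooperate with a caller's loop can be sketched in isolation. Everything below is a hypothetical stand-in (`mini_coding`, `MINI_TWO_MORE_BYTES`, `sum_pairs`, and the two FINISH constants are not from coding.c); the sketch assumes only the pattern documented above: check the bounds, record the reason in `coding->result`, and jump to `label_end_of_loop`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CODING_FINISH_XXX and struct coding_system.  */
enum { FINISH_NORMAL, FINISH_INSUFFICIENT_SRC };

struct mini_coding
{
  int result;
  size_t consumed;
};

/* Same shape as TWO_MORE_BYTES: fetch two bytes or leave the loop.  */
#define MINI_TWO_MORE_BYTES(c1, c2)                     \
  do {                                                  \
    if (src + 1 >= src_end)                             \
      {                                                 \
        coding->result = FINISH_INSUFFICIENT_SRC;       \
        goto label_end_of_loop;                         \
      }                                                 \
    c1 = *src++;                                        \
    c2 = *src++;                                        \
  } while (0)

/* Sum big-endian 16-bit units.  A trailing partial unit is left
   unconsumed and reported through CODING->result.  */
unsigned long
sum_pairs (struct mini_coding *coding,
           const unsigned char *source, size_t src_bytes)
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  const unsigned char *src_base = src;
  unsigned long total = 0;
  int c1, c2;

  coding->result = FINISH_NORMAL;
  while (src < src_end)
    {
      /* Remember where this unit starts so that a partial unit can
         be handed back to the caller untouched.  */
      src_base = src;
      MINI_TWO_MORE_BYTES (c1, c2);
      total += ((unsigned long) c1 << 8) | (unsigned long) c2;
    }
  src_base = src;
 label_end_of_loop:
  coding->consumed = (size_t) (src_base - source);
  return total;
}
```

Wrapping the macro body in `do { ... } while (0)` lets it be used as a single statement, and `src_base` is what allows the caller to retry later with the unconsumed tail.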
218
4ed46869 219
0a28aafb
KH
220/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
221 form if MULTIBYTEP is nonzero. */
222
223#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep) \
224 do { \
225 if (src >= src_end) \
226 { \
227 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
228 goto label_end_of_loop; \
229 } \
230 c1 = *src++; \
231 if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL) \
232 c1 = *src++ - 0x20; \
233 } while (0)
234
b73bfc1c
KH
235/* Set C to the next character at the source text pointed to by `src'.
236 If there are not enough characters in the source, jump to
237 `label_end_of_loop'. The caller should set variables `coding',
238 `src', `src_end', and `translation_table' to appropriate values
239 in advance. This macro is used in encoding routines
240 `encode_coding_XXX', thus it assumes that the source text is in
241 multibyte form except for 8-bit characters. 8-bit characters are
242 in multibyte form if coding->src_multibyte is nonzero, else they
243 are represented by a single byte. */
4ed46869 244
b73bfc1c
KH
245#define ONE_MORE_CHAR(c) \
246 do { \
247 int len = src_end - src; \
248 int bytes; \
249 if (len <= 0) \
250 { \
251 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
252 goto label_end_of_loop; \
253 } \
254 if (coding->src_multibyte \
255 || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
256 c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
257 else \
258 c = *src, bytes = 1; \
259 if (!NILP (translation_table)) \
39658efc 260 c = translate_char (translation_table, c, -1, 0, 0); \
b73bfc1c 261 src += bytes; \
4ed46869
KH
262 } while (0)
263
4ed46869 264
8ca3766a 265/* Produce a multibyte form of character C to `dst'. Jump to
b73bfc1c
KH
266 `label_end_of_loop' if there's not enough space at `dst'.
267
cfb43547 268 If we are now in the middle of a composition sequence, the decoded
b73bfc1c
KH
269 character may be ALTCHAR (for the current composition). In that
270 case, the character goes to coding->cmp_data->data instead of
271 `dst'.
272
273 This macro is used in decoding routines. */
274
275#define EMIT_CHAR(c) \
4ed46869 276 do { \
b73bfc1c
KH
277 if (! COMPOSING_P (coding) \
278 || coding->composing == COMPOSITION_RELATIVE \
279 || coding->composing == COMPOSITION_WITH_RULE) \
280 { \
281 int bytes = CHAR_BYTES (c); \
282 if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
283 { \
284 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
285 goto label_end_of_loop; \
286 } \
287 dst += CHAR_STRING (c, dst); \
288 coding->produced_char++; \
289 } \
ec6d2bb8 290 \
b73bfc1c
KH
291 if (COMPOSING_P (coding) \
292 && coding->composing != COMPOSITION_RELATIVE) \
293 { \
294 CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
295 coding->composition_rule_follows \
296 = coding->composing != COMPOSITION_WITH_ALTCHARS; \
297 } \
4ed46869
KH
298 } while (0)
299
4ed46869 300
b73bfc1c
KH
301#define EMIT_ONE_BYTE(c) \
302 do { \
303 if (dst >= (dst_bytes ? dst_end : src)) \
304 { \
305 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
306 goto label_end_of_loop; \
307 } \
308 *dst++ = c; \
309 } while (0)
310
311#define EMIT_TWO_BYTES(c1, c2) \
312 do { \
313 if (dst + 2 > (dst_bytes ? dst_end : src)) \
314 { \
315 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
316 goto label_end_of_loop; \
317 } \
318 *dst++ = c1, *dst++ = c2; \
319 } while (0)
320
321#define EMIT_BYTES(from, to) \
322 do { \
323 if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
324 { \
325 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
326 goto label_end_of_loop; \
327 } \
328 while (from < to) \
329 *dst++ = *from++; \
4ed46869
KH
330 } while (0)
331
332\f
333/*** 1. Preamble ***/
334
68c45bf0
PE
335#ifdef emacs
336#include <config.h>
337#endif
338
4ed46869
KH
339#include <stdio.h>
340
341#ifdef emacs
342
4ed46869
KH
343#include "lisp.h"
344#include "buffer.h"
345#include "charset.h"
ec6d2bb8 346#include "composite.h"
4ed46869
KH
347#include "ccl.h"
348#include "coding.h"
349#include "window.h"
66638433 350#include "intervals.h"
b8299c66
KL
351#include "frame.h"
352#include "termhooks.h"
4ed46869
KH
353
354#else /* not emacs */
355
356#include "mulelib.h"
357
358#endif /* not emacs */
359
360Lisp_Object Qcoding_system, Qeol_type;
361Lisp_Object Qbuffer_file_coding_system;
362Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
27901516 363Lisp_Object Qno_conversion, Qundecided;
bb0115a2 364Lisp_Object Qcoding_system_history;
05e6f5dc 365Lisp_Object Qsafe_chars;
1397dc18 366Lisp_Object Qvalid_codes;
4ed46869
KH
367
368extern Lisp_Object Qinsert_file_contents, Qwrite_region;
387f6ba5 369Lisp_Object Qcall_process, Qcall_process_region;
4ed46869
KH
370Lisp_Object Qstart_process, Qopen_network_stream;
371Lisp_Object Qtarget_idx;
372
a362520d
KH
373/* If a symbol has this property, evaluate the value to define the
374 symbol as a coding system. */
375Lisp_Object Qcoding_system_define_form;
376
d46c5b12
KH
377Lisp_Object Vselect_safe_coding_system_function;
378
5d5bf4d8
KH
379int coding_system_require_warning;
380
7722baf9
EZ
381/* Mnemonic string for each format of end-of-line. */
382Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
383/* Mnemonic string to indicate format of end-of-line is not yet
4ed46869 384 decided. */
7722baf9 385Lisp_Object eol_mnemonic_undecided;
4ed46869 386
9ce27fde
KH
387/* Format of end-of-line decided by system. This is CODING_EOL_LF on
388 Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
389int system_eol_type;
390
4ed46869
KH
391#ifdef emacs
392
6b89e3aa
KH
393/* Information about which coding system is safe for which chars.
394 The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).
395
396 GENERIC-LIST is a list of generic coding systems which can encode
397 any characters.
398
399 NON-GENERIC-ALIST is an alist of non generic coding systems vs the
400 corresponding char table that contains safe chars. */
401Lisp_Object Vcoding_system_safe_chars;
402
4608c386
KH
403Lisp_Object Vcoding_system_list, Vcoding_system_alist;
404
405Lisp_Object Qcoding_system_p, Qcoding_system_error;
4ed46869 406
d46c5b12
KH
407/* Coding system emacs-mule and raw-text are for converting only
408 end-of-line format. */
409Lisp_Object Qemacs_mule, Qraw_text;
9ce27fde 410
ecf488bc
DL
411Lisp_Object Qutf_8;
412
4ed46869
KH
413/* Coding-systems are handed between Emacs Lisp programs and C internal
414 routines by the following three variables. */
415/* Coding-system for reading files and receiving data from process. */
416Lisp_Object Vcoding_system_for_read;
417/* Coding-system for writing files and sending data to process. */
418Lisp_Object Vcoding_system_for_write;
419/* Coding-system actually used in the latest I/O. */
420Lisp_Object Vlast_coding_system_used;
421
c4825358 422/* A vector of length 256 which contains information about special
94487c4e 423 Latin codes (especially for dealing with Microsoft codes). */
3f003981 424Lisp_Object Vlatin_extra_code_table;
c4825358 425
9ce27fde
KH
426/* Flag to inhibit code conversion of end-of-line format. */
427int inhibit_eol_conversion;
428
74383408
KH
429/* Flag to inhibit ISO2022 escape sequence detection. */
430int inhibit_iso_escape_detection;
431
ed29121d
EZ
432/* Flag to make buffer-file-coding-system inherit from process-coding. */
433int inherit_process_coding_system;
434
c4825358
KH
435/* Coding system to be used to encode text for terminal display when
436 terminal coding system is nil. */
437struct coding_system safe_terminal_coding;
438
6bc51348
KH
439/* Default coding system to be used to write a file. */
440struct coding_system default_buffer_file_coding;
441
02ba4723
KH
442Lisp_Object Vfile_coding_system_alist;
443Lisp_Object Vprocess_coding_system_alist;
444Lisp_Object Vnetwork_coding_system_alist;
4ed46869 445
68c45bf0
PE
446Lisp_Object Vlocale_coding_system;
447
4ed46869
KH
448#endif /* emacs */
449
d46c5b12 450Lisp_Object Qcoding_category, Qcoding_category_index;
4ed46869
KH
451
452/* List of symbols `coding-category-xxx' ordered by priority. */
453Lisp_Object Vcoding_category_list;
454
d46c5b12
KH
455/* Table of coding categories (Lisp symbols). */
456Lisp_Object Vcoding_category_table;
4ed46869
KH
457
458/* Table of names of symbol for each coding-category. */
459char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
0ef69138 460 "coding-category-emacs-mule",
4ed46869
KH
461 "coding-category-sjis",
462 "coding-category-iso-7",
d46c5b12 463 "coding-category-iso-7-tight",
4ed46869
KH
464 "coding-category-iso-8-1",
465 "coding-category-iso-8-2",
7717c392
KH
466 "coding-category-iso-7-else",
467 "coding-category-iso-8-else",
89fa8b36 468 "coding-category-ccl",
4ed46869 469 "coding-category-big5",
fa42c37f
KH
470 "coding-category-utf-8",
471 "coding-category-utf-16-be",
472 "coding-category-utf-16-le",
27901516 473 "coding-category-raw-text",
89fa8b36 474 "coding-category-binary"
4ed46869
KH
475};
476
66cfb530 477/* Table of pointers to coding systems corresponding to each coding
d46c5b12
KH
478 category. */
479struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
480
66cfb530 481/* Table of coding category masks. Nth element is a mask for a coding
8ca3766a 482 category whose priority is Nth. */
66cfb530
KH
483static
484int coding_priorities[CODING_CATEGORY_IDX_MAX];
485
f967223b
KH
486/* Flag to tell if we look up translation table on character code
487 conversion. */
84fbb8a0 488Lisp_Object Venable_character_translation;
f967223b
KH
489/* Standard translation table to look up on decoding (reading). */
490Lisp_Object Vstandard_translation_table_for_decode;
491/* Standard translation table to look up on encoding (writing). */
492Lisp_Object Vstandard_translation_table_for_encode;
84fbb8a0 493
f967223b
KH
494Lisp_Object Qtranslation_table;
495Lisp_Object Qtranslation_table_id;
496Lisp_Object Qtranslation_table_for_decode;
497Lisp_Object Qtranslation_table_for_encode;
4ed46869
KH
498
499/* Alist of charsets vs revision number. */
500Lisp_Object Vcharset_revision_alist;
501
02ba4723
KH
502/* Default coding systems used for process I/O. */
503Lisp_Object Vdefault_process_coding_system;
504
002fdb44
DL
505/* Char table for translating Quail and self-inserting input. */
506Lisp_Object Vtranslation_table_for_input;
507
b843d1ae
KH
508/* Global flag to tell that we can't call post-read-conversion and
509 pre-write-conversion functions. Usually the value is zero, but it
510 is set to 1 temporarily while such functions are running. This is
511 to avoid infinite recursion. */
512static int inhibit_pre_post_conversion;
513
05e6f5dc
KH
514Lisp_Object Qchar_coding_system;
515
6b89e3aa
KH
516/* Return `safe-chars' property of CODING_SYSTEM (symbol). Don't check
517 its validity. */
05e6f5dc
KH
518
519Lisp_Object
6b89e3aa
KH
520coding_safe_chars (coding_system)
521 Lisp_Object coding_system;
05e6f5dc
KH
522{
523 Lisp_Object coding_spec, plist, safe_chars;
93dec019 524
6b89e3aa 525 coding_spec = Fget (coding_system, Qcoding_system);
05e6f5dc
KH
526 plist = XVECTOR (coding_spec)->contents[3];
527 safe_chars = Fplist_get (plist, Qsafe_chars);
528 return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
529}
530
531#define CODING_SAFE_CHAR_P(safe_chars, c) \
532 (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
533
4ed46869 534\f
0ef69138 535/*** 2. Emacs' internal format (emacs-mule) handlers ***/
4ed46869 536
aa72b389
KH
537/* Emacs' internal format for representation of multiple character
538 sets is a kind of multi-byte encoding, i.e. characters are
539 represented by variable-length sequences of one-byte codes.
b73bfc1c
KH
540
541 ASCII characters and control characters (e.g. `tab', `newline') are
542 represented by one-byte sequences which are their ASCII codes, in
543 the range 0x00 through 0x7F.
544
545 8-bit characters of the range 0x80..0x9F are represented by
546 two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
547 code + 0x20).
548
549 8-bit characters of the range 0xA0..0xFF are represented by
550 one-byte sequences which are their 8-bit code.
551
552 The other characters are represented by a sequence of `base
553 leading-code', optional `extended leading-code', and one or two
554 `position-code's. The length of the sequence is determined by the
aa72b389 555 base leading-code. Leading-code takes the range 0x81 through 0x9D,
b73bfc1c
KH
556 whereas extended leading-code and position-code take the range 0xA0
557 through 0xFF. See `charset.h' for more details about leading-code
558 and position-code.
f4dee582 559
4ed46869 560 --- CODE RANGE of Emacs' internal format ---
b73bfc1c
KH
561 character set range
562 ------------- -----
563 ascii 0x00..0x7F
564 eight-bit-control LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
565 eight-bit-graphic 0xA0..0xFF
aa72b389 566 ELSE 0x81..0x9D + [0xA0..0xFF]+
4ed46869
KH
567 ---------------------------------------------
568
aa72b389
KH
569 As this is the internal character representation, the format is
570 usually not used externally (i.e. in a file or in a data sent to a
571 process). But, it is possible to have a text externally in this
572 format (i.e. by encoding by the coding system `emacs-mule').
573
574 In that case, a sequence of one-byte codes has a slightly different
575 form.
576
ae5145c2 577 Firstly, all characters in eight-bit-control are represented by
aa72b389
KH
578 one-byte sequences which are their 8-bit code.
579
580 Next, character composition data are represented by the byte
581 sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
582 where,
583 METHOD is 0xF0 plus one of the composition methods (enum
584 composition_method),
585
ae5145c2 586 BYTES is 0xA0 plus the byte length of these composition data,
aa72b389 587
ae5145c2 588 CHARS is 0xA0 plus the number of characters composed by these
aa72b389
KH
589 data,
590
8ca3766a 591 COMPONENTs are characters of multibyte form or composition
aa72b389
KH
592 rules encoded by two bytes of ASCII codes.
593
594 In addition, for backward compatibility, the following formats are
595 also recognized as composition data on decoding.
596
597 0x80 MSEQ ...
598 0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ
599
600 Here,
601 MSEQ is a multibyte form but in one of these special formats:
602 ASCII: 0xA0 ASCII_CODE+0x80,
603 other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
604 RULE is a one-byte code of the range 0xA0..0xF0 that
605 represents a composition rule.
4ed46869
KH
606 */
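The one-byte and two-byte cases described in the comment above can be sketched as a pair of helpers. The function names are hypothetical, and the value 0x9E for the leading code is an assumption made for this sketch; the real definition of LEADING_CODE_8_BIT_CONTROL lives in charset.h. Characters that start with a base leading-code (0x81..0x9D) are deliberately not handled.

```c
#include <stddef.h>

/* Assumed value for illustration; see charset.h for the real
   definition of LEADING_CODE_8_BIT_CONTROL.  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Store the internal form of the raw byte C at DST, which must have
   room for two bytes.  Return the number of bytes written.  */
size_t
raw_byte_to_internal (unsigned char c, unsigned char *dst)
{
  if (c >= 0x80 && c <= 0x9F)
    {
      /* eight-bit-control: leading code, then the code plus 0x20.  */
      dst[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
      dst[1] = (unsigned char) (c + 0x20);
      return 2;
    }
  /* ASCII and eight-bit-graphic are stored as themselves.  */
  dst[0] = c;
  return 1;
}

/* Inverse of the above.  SRC must point at a character boundary.
   Store the raw byte in *C and return the number of bytes read, or 0
   if SRC does not start with one of the three forms handled here.  */
size_t
internal_to_raw_byte (const unsigned char *src, size_t len, unsigned char *c)
{
  if (len == 0)
    return 0;
  if (src[0] == SKETCH_LEADING_CODE_8_BIT_CONTROL)
    {
      if (len < 2 || src[1] < 0xA0 || src[1] > 0xBF)
        return 0;
      *c = (unsigned char) (src[1] - 0x20);
      return 2;
    }
  if (src[0] < 0x80 || src[0] >= 0xA0)
    {
      *c = src[0];
      return 1;
    }
  return 0;
}
```

The subtraction of 0x20 in `internal_to_raw_byte` is the same adjustment that ONE_MORE_BYTE_CHECK_MULTIBYTE applies when MULTIBYTEP is nonzero.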
607
608enum emacs_code_class_type emacs_code_class[256];
609
4ed46869
KH
610/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
611 Check if a text is encoded in Emacs' internal format. If it is,
d46c5b12 612 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
4ed46869 613
0a28aafb
KH
614static int
615detect_coding_emacs_mule (src, src_end, multibytep)
b73bfc1c 616 unsigned char *src, *src_end;
0a28aafb 617 int multibytep;
4ed46869
KH
618{
619 unsigned char c;
620 int composing = 0;
b73bfc1c
KH
621 /* Dummy for ONE_MORE_BYTE. */
622 struct coding_system dummy_coding;
623 struct coding_system *coding = &dummy_coding;
4ed46869 624
b73bfc1c 625 while (1)
4ed46869 626 {
0a28aafb 627 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
4ed46869
KH
628
629 if (composing)
630 {
631 if (c < 0xA0)
632 composing = 0;
b73bfc1c
KH
633 else if (c == 0xA0)
634 {
0a28aafb 635 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
636 c &= 0x7F;
637 }
4ed46869
KH
638 else
639 c -= 0x20;
640 }
641
b73bfc1c 642 if (c < 0x20)
4ed46869 643 {
4ed46869
KH
644 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
645 return 0;
b73bfc1c
KH
646 }
647 else if (c >= 0x80 && c < 0xA0)
648 {
649 if (c == 0x80)
650 /* Old leading code for a composite character. */
651 composing = 1;
652 else
653 {
654 unsigned char *src_base = src - 1;
655 int bytes;
4ed46869 656
b73bfc1c
KH
657 if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
658 bytes))
659 return 0;
660 src = src_base + bytes;
661 }
662 }
663 }
664 label_end_of_loop:
665 return CODING_CATEGORY_MASK_EMACS_MULE;
666}
4ed46869 667
4ed46869 668
aa72b389
KH
669/* Record the starting position START and METHOD of one composition. */
670
671#define CODING_ADD_COMPOSITION_START(coding, start, method) \
672 do { \
673 struct composition_data *cmp_data = coding->cmp_data; \
674 int *data = cmp_data->data + cmp_data->used; \
675 coding->cmp_data_start = cmp_data->used; \
676 data[0] = -1; \
677 data[1] = cmp_data->char_offset + start; \
678 data[3] = (int) method; \
679 cmp_data->used += 4; \
680 } while (0)
681
682/* Record the ending position END of the current composition. */
683
684#define CODING_ADD_COMPOSITION_END(coding, end) \
685 do { \
686 struct composition_data *cmp_data = coding->cmp_data; \
687 int *data = cmp_data->data + coding->cmp_data_start; \
688 data[0] = cmp_data->used - coding->cmp_data_start; \
689 data[2] = cmp_data->char_offset + end; \
690 } while (0)
691
692/* Record one COMPONENT (alternate character or composition rule). */
693
b6871cc7
KH
694#define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
695 do { \
696 coding->cmp_data->data[coding->cmp_data->used++] = component; \
697 if (coding->cmp_data->used - coding->cmp_data_start \
698 == COMPOSITION_DATA_MAX_BUNCH_LENGTH) \
699 { \
700 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
701 coding->composing = COMPOSITION_NO; \
702 } \
703 } while (0)
aa72b389
KH
704
705
706/* Get one byte from the data pointed to by SRC and increment SRC. If SRC
8ca3766a 707 is not less than SRC_END, return -1 without incrementing SRC. */
aa72b389
KH
708
709#define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)
710
711
712/* Decode a character represented as a component of composition
713 sequence of Emacs 20 style at SRC. Set C to that character, store
714 its multibyte form sequence at P, and set P to the end of that
715 sequence. If no valid character is found, set C to -1. */
716
717#define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p) \
718 do { \
719 int bytes; \
fd3ae0b9 720 \
aa72b389
KH
721 c = SAFE_ONE_MORE_BYTE (); \
722 if (c < 0) \
723 break; \
724 if (CHAR_HEAD_P (c)) \
725 c = -1; \
726 else if (c == 0xA0) \
727 { \
728 c = SAFE_ONE_MORE_BYTE (); \
729 if (c < 0xA0) \
730 c = -1; \
731 else \
732 { \
733 c -= 0xA0; \
734 *p++ = c; \
735 } \
736 } \
737 else if (BASE_LEADING_CODE_P (c - 0x20)) \
738 { \
739 unsigned char *p0 = p; \
740 \
741 c -= 0x20; \
742 *p++ = c; \
743 bytes = BYTES_BY_CHAR_HEAD (c); \
744 while (--bytes) \
745 { \
746 c = SAFE_ONE_MORE_BYTE (); \
747 if (c < 0) \
748 break; \
749 *p++ = c; \
750 } \
fd3ae0b9
KH
751 if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes) \
752 || (coding->flags /* We are recovering a file. */ \
753 && p0[0] == LEADING_CODE_8_BIT_CONTROL \
754 && ! CHAR_HEAD_P (p0[1]))) \
aa72b389
KH
755 c = STRING_CHAR (p0, bytes); \
756 else \
757 c = -1; \
758 } \
759 else \
760 c = -1; \
761 } while (0)
762
763
764/* Decode a composition rule represented as a component of composition
765 sequence of Emacs 20 style at SRC. Set C to the rule. If no
766 valid rule is found, set C to -1. */
767
768#define DECODE_EMACS_MULE_COMPOSITION_RULE(c) \
769 do { \
770 c = SAFE_ONE_MORE_BYTE (); \
771 c -= 0xA0; \
772 if (c < 0 || c >= 81) \
773 c = -1; \
774 else \
775 { \
776 gref = c / 9, nref = c % 9; \
777 c = COMPOSITION_ENCODE_RULE (gref, nref); \
778 } \
779 } while (0)
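The arithmetic in DECODE_EMACS_MULE_COMPOSITION_RULE can be shown as a plain function. `decode_rule_byte` is a hypothetical name; the sketch stops before COMPOSITION_ENCODE_RULE, whose definition is in another header and is not reproduced here.

```c
/* Hypothetical helper, not from coding.c.  B is a byte value as
   returned by SAFE_ONE_MORE_BYTE (so -1 means end of input).  On
   success store the two reference points, each in the range 0..8,
   in *GREF and *NREF and return 1.  Return 0 if B is not a valid
   rule byte.  */
int
decode_rule_byte (int b, int *gref, int *nref)
{
  int c = b - 0xA0;

  /* Valid rule bytes are 0xA0..0xF0, i.e. 81 values (9 * 9).  */
  if (c < 0 || c >= 81)
    return 0;
  *gref = c / 9;
  *nref = c % 9;
  return 1;
}
```

The bound of 81 follows from packing two values of nine possibilities each into one byte offset by 0xA0, which is why the documented range for RULE ends at 0xF0.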
780
781
782/* Decode composition sequence encoded by `emacs-mule' at the source
783 pointed by SRC. SRC_END is the end of source. Store information
784 of the composition in CODING->cmp_data.
785
786 For backward compatibility, decode also a composition sequence of
787 Emacs 20 style. In that case, the composition sequence contains
788 characters that should be extracted into a buffer or string. Store
789 those characters at *DESTINATION in multibyte form.
790
791 If we encounter an invalid byte sequence, return 0.
792 If we encounter an insufficient source or destination, or
793 insufficient space in CODING->cmp_data, return -1.
794 Otherwise, return the number of bytes consumed in the source.
795
796*/
797static INLINE int
798decode_composition_emacs_mule (coding, src, src_end,
799 destination, dst_end, dst_bytes)
800 struct coding_system *coding;
5bdca8af
DN
801 const unsigned char *src, *src_end;
802 unsigned char **destination, *dst_end;
aa72b389
KH
803 int dst_bytes;
804{
805 unsigned char *dst = *destination;
806 int method, data_len, nchars;
5bdca8af 807 const unsigned char *src_base = src++;
8ca3766a 808 /* Store components of composition. */
aa72b389
KH
809 int component[COMPOSITION_DATA_MAX_BUNCH_LENGTH];
810 int ncomponent;
811 /* Store multibyte form of characters to be composed. This is for
812 Emacs 20 style composition sequence. */
813 unsigned char buf[MAX_COMPOSITION_COMPONENTS * MAX_MULTIBYTE_LENGTH];
814 unsigned char *bufp = buf;
815 int c, i, gref, nref;
816
817 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
818 >= COMPOSITION_DATA_SIZE)
819 {
820 coding->result = CODING_FINISH_INSUFFICIENT_CMP;
821 return -1;
822 }
823
824 ONE_MORE_BYTE (c);
825 if (c - 0xF0 >= COMPOSITION_RELATIVE
826 && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
827 {
828 int with_rule;
829
830 method = c - 0xF0;
831 with_rule = (method == COMPOSITION_WITH_RULE
832 || method == COMPOSITION_WITH_RULE_ALTCHARS);
833 ONE_MORE_BYTE (c);
834 data_len = c - 0xA0;
835 if (data_len < 4
836 || src_base + data_len > src_end)
837 return 0;
838 ONE_MORE_BYTE (c);
839 nchars = c - 0xA0;
840 if (nchars < 1)
841 return 0;
842 for (ncomponent = 0; src < src_base + data_len; ncomponent++)
843 {
b1887814
RS
844 /* If it is longer than this, it can't be valid. */
845 if (ncomponent >= COMPOSITION_DATA_MAX_BUNCH_LENGTH)
846 return 0;
847
aa72b389
KH
848 if (ncomponent % 2 && with_rule)
849 {
850 ONE_MORE_BYTE (gref);
851 gref -= 32;
852 ONE_MORE_BYTE (nref);
853 nref -= 32;
854 c = COMPOSITION_ENCODE_RULE (gref, nref);
855 }
856 else
857 {
858 int bytes;
fd3ae0b9
KH
859 if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
860 || (coding->flags /* We are recovering a file. */
861 && src[0] == LEADING_CODE_8_BIT_CONTROL
862 && ! CHAR_HEAD_P (src[1])))
aa72b389
KH
863 c = STRING_CHAR (src, bytes);
864 else
865 c = *src, bytes = 1;
866 src += bytes;
867 }
868 component[ncomponent] = c;
869 }
870 }
871 else
872 {
873 /* This may be an old Emacs 20 style format. See the comment at
874 section 2 of this file. */
875 while (src < src_end && !CHAR_HEAD_P (*src)) src++;
876 if (src == src_end
877 && !(coding->mode & CODING_MODE_LAST_BLOCK))
878 goto label_end_of_loop;
879
880 src_end = src;
881 src = src_base + 1;
882 if (c < 0xC0)
883 {
884 method = COMPOSITION_RELATIVE;
885 for (ncomponent = 0; ncomponent < MAX_COMPOSITION_COMPONENTS;)
886 {
887 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
888 if (c < 0)
889 break;
890 component[ncomponent++] = c;
891 }
892 if (ncomponent < 2)
893 return 0;
894 nchars = ncomponent;
895 }
896 else if (c == 0xFF)
897 {
898 method = COMPOSITION_WITH_RULE;
899 src++;
900 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
901 if (c < 0)
902 return 0;
903 component[0] = c;
904 for (ncomponent = 1;
905 ncomponent < MAX_COMPOSITION_COMPONENTS * 2 - 1;)
906 {
907 DECODE_EMACS_MULE_COMPOSITION_RULE (c);
908 if (c < 0)
909 break;
910 component[ncomponent++] = c;
911 DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
912 if (c < 0)
913 break;
914 component[ncomponent++] = c;
915 }
916 if (ncomponent < 3)
917 return 0;
918 nchars = (ncomponent + 1) / 2;
919 }
920 else
921 return 0;
922 }
923
924 if (buf == bufp || dst + (bufp - buf) <= (dst_bytes ? dst_end : src))
925 {
926 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, method);
927 for (i = 0; i < ncomponent; i++)
928 CODING_ADD_COMPOSITION_COMPONENT (coding, component[i]);
93dec019 929 CODING_ADD_COMPOSITION_END (coding, coding->produced_char + nchars);
aa72b389
KH
930 if (buf < bufp)
931 {
932 unsigned char *p = buf;
933 EMIT_BYTES (p, bufp);
934 *destination += bufp - buf;
935 coding->produced_char += nchars;
936 }
937 return (src - src_base);
938 }
939 label_end_of_loop:
940 return -1;
941}
942
b73bfc1c 943/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
4ed46869 944
b73bfc1c
KH
945static void
946decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
947 struct coding_system *coding;
5bdca8af
DN
948 const unsigned char *source;
949 unsigned char *destination;
b73bfc1c
KH
950 int src_bytes, dst_bytes;
951{
5bdca8af
DN
952 const unsigned char *src = source;
953 const unsigned char *src_end = source + src_bytes;
b73bfc1c
KH
954 unsigned char *dst = destination;
955 unsigned char *dst_end = destination + dst_bytes;
956 /* SRC_BASE remembers the start position in source in each loop.
957 The loop is exited when there are not enough source bytes, or
958 when there's not enough destination area to produce a
959 character. */
5bdca8af 960 const unsigned char *src_base;
4ed46869 961
b73bfc1c 962 coding->produced_char = 0;
8a33cf7b 963 while ((src_base = src) < src_end)
b73bfc1c 964 {
5bdca8af
DN
965 unsigned char tmp[MAX_MULTIBYTE_LENGTH];
966 const unsigned char *p;
b73bfc1c 967 int bytes;
ec6d2bb8 968
4af310db
EZ
969 if (*src == '\r')
970 {
2bcdf662 971 int c = *src++;
4af310db 972
4af310db
EZ
973 if (coding->eol_type == CODING_EOL_CR)
974 c = '\n';
975 else if (coding->eol_type == CODING_EOL_CRLF)
976 {
977 ONE_MORE_BYTE (c);
978 if (c != '\n')
979 {
4af310db
EZ
980 src--;
981 c = '\r';
982 }
983 }
984 *dst++ = c;
985 coding->produced_char++;
986 continue;
987 }
988 else if (*src == '\n')
989 {
990 if ((coding->eol_type == CODING_EOL_CR
991 || coding->eol_type == CODING_EOL_CRLF)
992 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
993 {
994 coding->result = CODING_FINISH_INCONSISTENT_EOL;
995 goto label_end_of_loop;
996 }
997 *dst++ = *src++;
998 coding->produced_char++;
999 continue;
1000 }
3089d25c 1001 else if (*src == 0x80 && coding->cmp_data)
aa72b389
KH
1002 {
1003 /* Start of composition data. */
1004 int consumed = decode_composition_emacs_mule (coding, src, src_end,
1005 &dst, dst_end,
1006 dst_bytes);
1007 if (consumed < 0)
1008 goto label_end_of_loop;
1009 else if (consumed > 0)
1010 {
1011 src += consumed;
1012 continue;
1013 }
1014 bytes = CHAR_STRING (*src, tmp);
1015 p = tmp;
1016 src++;
1017 }
fd3ae0b9
KH
1018 else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
1019 || (coding->flags /* We are recovering a file. */
1020 && src[0] == LEADING_CODE_8_BIT_CONTROL
1021 && ! CHAR_HEAD_P (src[1])))
b73bfc1c
KH
1022 {
1023 p = src;
1024 src += bytes;
1025 }
1026 else
1027 {
6eced09c
KH
1028 int i, c;
1029
1030 bytes = BYTES_BY_CHAR_HEAD (*src);
b73bfc1c 1031 src++;
6eced09c
KH
1032 for (i = 1; i < bytes; i++)
1033 {
1034 ONE_MORE_BYTE (c);
1035 if (CHAR_HEAD_P (c))
1036 break;
1037 }
1038 if (i < bytes)
1039 {
1040 bytes = CHAR_STRING (*src_base, tmp);
1041 p = tmp;
1042 src = src_base + 1;
1043 }
1044 else
1045 {
1046 p = src_base;
1047 }
b73bfc1c
KH
1048 }
1049 if (dst + bytes >= (dst_bytes ? dst_end : src))
1050 {
1051 coding->result = CODING_FINISH_INSUFFICIENT_DST;
4ed46869
KH
1052 break;
1053 }
b73bfc1c
KH
1054 while (bytes--) *dst++ = *p++;
1055 coding->produced_char++;
4ed46869 1056 }
4af310db 1057 label_end_of_loop:
b73bfc1c
KH
1058 coding->consumed = coding->consumed_char = src_base - source;
1059 coding->produced = dst - destination;
4ed46869
KH
1060}
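
The end-of-line handling at the top of the decoding loop above can be
read in isolation.  The following is a minimal sketch of that step
only, not the code Emacs uses: `enum eol_kind' and
`decode_eol_sketch' are hypothetical names standing in for the
CODING_EOL_XXX values and the inline code.  Unlike the real loop,
which stops and waits for more input when a CR is the last byte of a
block, the sketch copies a trailing CR unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CODING_EOL_LF, CODING_EOL_CRLF and
   CODING_EOL_CR.  */
enum eol_kind { EOL_LF, EOL_CRLF, EOL_CR };

/* Copy LEN bytes from SRC to DST, converting line ends of kind EOL
   to '\n'.  DST must have room for LEN bytes.  Return the number of
   bytes written.  Under EOL_CRLF, a CR that is not followed by LF is
   kept as is, as in the loop above.  */
size_t
decode_eol_sketch (const unsigned char *src, size_t len,
                   unsigned char *dst, enum eol_kind eol)
{
  size_t i = 0, n = 0;

  assert (src != NULL && dst != NULL);
  while (i < len)
    {
      unsigned char c = src[i++];

      if (c == '\r')
        {
          if (eol == EOL_CR)
            c = '\n';
          else if (eol == EOL_CRLF && i < len && src[i] == '\n')
            {
              /* Swallow the LF and emit a single newline.  */
              c = '\n';
              i++;
            }
        }
      dst[n++] = c;
    }
  return n;
}
```

Called on the six bytes "a", CR, LF, "b", CR, "c" with EOL_CRLF, the
sketch writes five bytes: the CR LF pair collapses to one newline and
the lone CR survives.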
1061
b73bfc1c 1062
aa72b389
KH
1063/* Encode composition data stored at DATA into a special byte sequence
1064 starting with 0x80. Update CODING->cmp_data_start and maybe
1065 CODING->cmp_data for the next call. */
1066
1067#define ENCODE_COMPOSITION_EMACS_MULE(coding, data) \
1068 do { \
1069 unsigned char buf[1024], *p0 = buf, *p; \
1070 int len = data[0]; \
1071 int i; \
1072 \
1073 buf[0] = 0x80; \
1074 buf[1] = 0xF0 + data[3]; /* METHOD */ \
1075 buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */ \
1076 p = buf + 4; \
1077 if (data[3] == COMPOSITION_WITH_RULE \
1078 || data[3] == COMPOSITION_WITH_RULE_ALTCHARS) \
1079 { \
1080 p += CHAR_STRING (data[4], p); \
1081 for (i = 5; i < len; i += 2) \
1082 { \
1083 int gref, nref; \
1084 COMPOSITION_DECODE_RULE (data[i], gref, nref); \
1085 *p++ = 0x20 + gref; \
1086 *p++ = 0x20 + nref; \
1087 p += CHAR_STRING (data[i + 1], p); \
1088 } \
1089 } \
1090 else \
1091 { \
1092 for (i = 4; i < len; i++) \
1093 p += CHAR_STRING (data[i], p); \
1094 } \
1095 buf[2] = 0xA0 + (p - buf); /* COMPONENTS-BYTES */ \
1096 \
1097 if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src)) \
1098 { \
1099 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
1100 goto label_end_of_loop; \
1101 } \
1102 while (p0 < p) \
1103 *dst++ = *p0++; \
1104 coding->cmp_data_start += data[0]; \
1105 if (coding->cmp_data_start == coding->cmp_data->used \
1106 && coding->cmp_data->next) \
1107 { \
1108 coding->cmp_data = coding->cmp_data->next; \
1109 coding->cmp_data_start = 0; \
1110 } \
1111 } while (0)
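
The four header bytes written by the macro above are 0x80,
0xF0 + METHOD, 0xA0 + BYTES and 0xA0 + CHARS, where BYTES counts the
whole sequence including the header.  Here is a small sketch of
packing and unpacking that header.  The function names are
hypothetical, and the assumption that METHOD is a value from 0 to 3
(the four `enum composition_method' values accepted by the decoder)
is not stated in this file.

```c
#include <assert.h>

/* Fill HDR with the 4-byte composition header.  BYTES is the length
   of the whole sequence, header included, so it is at least 4.
   Return 0 on success, -1 if a field does not fit in its byte.  */
int
cmp_header_pack (unsigned char hdr[4], int method, int bytes, int chars)
{
  assert (hdr != 0);
  if (method < 0 || method > 3            /* assumed method range */
      || bytes < 4 || bytes > 0xFF - 0xA0
      || chars < 1 || chars > 0xFF - 0xA0)
    return -1;
  hdr[0] = 0x80;
  hdr[1] = (unsigned char) (0xF0 + method);
  hdr[2] = (unsigned char) (0xA0 + bytes);
  hdr[3] = (unsigned char) (0xA0 + chars);
  return 0;
}

/* Inverse of cmp_header_pack.  Return 0 and store the fields if HDR
   looks like a valid header, else return -1.  */
int
cmp_header_unpack (const unsigned char hdr[4],
                   int *method, int *bytes, int *chars)
{
  assert (hdr != 0 && method != 0 && bytes != 0 && chars != 0);
  if (hdr[0] != 0x80
      || hdr[1] < 0xF0 || hdr[1] > 0xF3
      || hdr[2] < 0xA0 + 4
      || hdr[3] < 0xA0 + 1)
    return -1;
  *method = hdr[1] - 0xF0;
  *bytes = hdr[2] - 0xA0;
  *chars = hdr[3] - 0xA0;
  return 0;
}
```

Because each length is stored as an offset from 0xA0 in a single
byte, neither BYTES nor CHARS can exceed 0x5F in this layout.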
93dec019 1112
aa72b389 1113
a4244313 1114static void encode_eol P_ ((struct coding_system *, const unsigned char *,
aa72b389
KH
1115 unsigned char *, int, int));
1116
1117static void
1118encode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
1119 struct coding_system *coding;
5bdca8af
DN
1120 const unsigned char *source;
1121 unsigned char *destination;
aa72b389
KH
1122 int src_bytes, dst_bytes;
1123{
5bdca8af
DN
1124 const unsigned char *src = source;
1125 const unsigned char *src_end = source + src_bytes;
aa72b389
KH
1126 unsigned char *dst = destination;
1127 unsigned char *dst_end = destination + dst_bytes;
5bdca8af 1128 const unsigned char *src_base;
aa72b389
KH
1129 int c;
1130 int char_offset;
1131 int *data;
1132
1133 Lisp_Object translation_table;
1134
1135 translation_table = Qnil;
1136
1137 /* Optimization for the case that there's no composition. */
1138 if (!coding->cmp_data || coding->cmp_data->used == 0)
1139 {
1140 encode_eol (coding, source, destination, src_bytes, dst_bytes);
1141 return;
1142 }
1143
1144 char_offset = coding->cmp_data->char_offset;
1145 data = coding->cmp_data->data + coding->cmp_data_start;
1146 while (1)
1147 {
1148 src_base = src;
1149
1150 /* If SRC starts a composition, encode the information about the
1151 composition in advance. */
1152 if (coding->cmp_data_start < coding->cmp_data->used
1153 && char_offset + coding->consumed_char == data[1])
1154 {
1155 ENCODE_COMPOSITION_EMACS_MULE (coding, data);
1156 char_offset = coding->cmp_data->char_offset;
1157 data = coding->cmp_data->data + coding->cmp_data_start;
1158 }
1159
1160 ONE_MORE_CHAR (c);
1161 if (c == '\n' && (coding->eol_type == CODING_EOL_CRLF
1162 || coding->eol_type == CODING_EOL_CR))
1163 {
1164 if (coding->eol_type == CODING_EOL_CRLF)
1165 EMIT_TWO_BYTES ('\r', c);
1166 else
1167 EMIT_ONE_BYTE ('\r');
1168 }
1169 else if (SINGLE_BYTE_CHAR_P (c))
fd3ae0b9
KH
1170 {
1171 if (coding->flags && ! ASCII_BYTE_P (c))
1172 {
1173 /* As we are auto saving, retain the multibyte form for
1174 8-bit chars. */
1175 unsigned char buf[MAX_MULTIBYTE_LENGTH];
1176 int bytes = CHAR_STRING (c, buf);
1177
1178 if (bytes == 1)
1179 EMIT_ONE_BYTE (buf[0]);
1180 else
1181 EMIT_TWO_BYTES (buf[0], buf[1]);
1182 }
1183 else
1184 EMIT_ONE_BYTE (c);
1185 }
aa72b389
KH
1186 else
1187 EMIT_BYTES (src_base, src);
1188 coding->consumed_char++;
1189 }
1190 label_end_of_loop:
1191 coding->consumed = src_base - source;
1192 coding->produced = coding->produced_char = dst - destination;
1193 return;
1194}
b73bfc1c 1195
4ed46869
KH
1196\f
1197/*** 3. ISO2022 handlers ***/
1198
1199/* The following note describes the coding system ISO2022 briefly.
39787efd 1200 Since the intention of this note is to help understand the
cfb43547 1201 functions in this file, some parts are NOT ACCURATE or are OVERLY
39787efd 1202 SIMPLIFIED. For thorough understanding, please refer to the
cfb43547
DL
1203 original document of ISO2022. This is equivalent to the standard
1204 ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
4ed46869
KH
1205
1206 ISO2022 provides many mechanisms to encode several character sets
cfb43547 1207 in 7-bit and 8-bit environments. For 7-bit environments, all text
39787efd
KH
1208 is encoded using bytes less than 128. This may make the encoded
1209 text a little bit longer, but the text passes more easily through
cfb43547 1210 several types of gateway, some of which strip off the MSB (Most
8ca3766a 1211 Significant Bit).
b73bfc1c 1212
cfb43547
DL
1213 There are two kinds of character sets: control character sets and
1214 graphic character sets. The former contain control characters such
4ed46869 1215 as `newline' and `escape' to provide control functions (control
39787efd 1216 functions are also provided by escape sequences). The latter
cfb43547 1217 contain graphic characters such as 'A' and '-'. Emacs recognizes
4ed46869
KH
1218 two control character sets and many graphic character sets.
1219
1220 Graphic character sets are classified into one of the following
39787efd
KH
1221 four classes, according to the number of bytes (DIMENSION) and
1222 number of characters in one dimension (CHARS) of the set:
1223 - DIMENSION1_CHARS94
1224 - DIMENSION1_CHARS96
1225 - DIMENSION2_CHARS94
1226 - DIMENSION2_CHARS96
1227
1228 In addition, each character set is assigned an identification tag,
cfb43547 1229 unique for each set, called the "final character" (denoted as <F>
39787efd
KH
1230 hereafter). The <F> of each character set is decided by ECMA(*)
1231 when it is registered in ISO. The code range of <F> is 0x30..0x7F
1232 (0x30..0x3F are for private use only).
4ed46869
KH
1233
1234 Note (*): ECMA = European Computer Manufacturers Association
1235
cfb43547 1236 Here are examples of graphic character sets [NAME(<F>)]:
4ed46869
KH
1237 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
1238 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
1239 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
1240 o DIMENSION2_CHARS96 -- none for the moment
1241
39787efd 1242 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
4ed46869
KH
1243 C0 [0x00..0x1F] -- control character plane 0
1244 GL [0x20..0x7F] -- graphic character plane 0
1245 C1 [0x80..0x9F] -- control character plane 1
1246 GR [0xA0..0xFF] -- graphic character plane 1
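
 As an illustration of the four areas listed above, a byte can be
 classified with three comparisons.  This is only a sketch; the
 names `enum code_area' and `classify_code_area' are hypothetical
 and do not appear in Emacs.

```c
#include <assert.h>

// The four areas of an 8-bit code, in ascending order of byte value.
enum code_area { AREA_C0, AREA_GL, AREA_C1, AREA_GR };

// Return the area that BYTE (0..0xFF) belongs to.
enum code_area
classify_code_area (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  if (byte < 0x20)
    return AREA_C0;             // 0x00..0x1F
  if (byte < 0x80)
    return AREA_GL;             // 0x20..0x7F
  if (byte < 0xA0)
    return AREA_C1;             // 0x80..0x9F
  return AREA_GR;               // 0xA0..0xFF
}
```

 For example, ESC (0x1B) falls in C0 and CSI (0x9B) falls in C1.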
1247
1248 A control character set is directly designated and invoked to C0 or
39787efd
KH
1249 C1 by an escape sequence. The most common case is that:
1250 - ISO646's control character set is designated/invoked to C0, and
1251 - ISO6429's control character set is designated/invoked to C1,
1252 and usually these designations/invocations are omitted in encoded
1253 text. In a 7-bit environment, only C0 can be used, and a control
1254 character for C1 is encoded by an appropriate escape sequence to
1255 fit into the environment. All control characters for C1 are
1256 defined to have corresponding escape sequences.
4ed46869
KH
1257
1258 A graphic character set is at first designated to one of four
1259 graphic registers (G0 through G3), then these graphic registers are
1260 invoked to GL or GR. These designations and invocations can be
1261 done independently. The most common case is that G0 is invoked to
39787efd
KH
1262 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
1263 these invocations and designations are omitted in encoded text.
1264 In a 7-bit environment, only GL can be used.
4ed46869 1265
39787efd
KH
1266 When a graphic character set of CHARS94 is invoked to GL, codes
1267 0x20 and 0x7F of the GL area work as control characters SPACE and
1268 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
1269 be used.
4ed46869
KH
1270
1271 There are two ways of invocation: locking-shift and single-shift.
1272 With locking-shift, the invocation lasts until the next different
39787efd
KH
1273 invocation, whereas with single-shift, the invocation affects the
1274 following character only and doesn't affect the locking-shift
1275 state. Invocations are done by the following control characters or
1276 escape sequences:
4ed46869
KH
1277
1278 ----------------------------------------------------------------------
39787efd 1279 abbrev function cntrl escape seq description
4ed46869 1280 ----------------------------------------------------------------------
39787efd
KH
1281 SI/LS0 (shift-in) 0x0F none invoke G0 into GL
1282 SO/LS1 (shift-out) 0x0E none invoke G1 into GL
1283 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL
1284 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL
1285 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*)
1286 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*)
1287 LS3R (locking-shift 3 right) none ESC '|' invoke G3 into GR (*)
1288 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char
1289 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char
4ed46869 1290 ----------------------------------------------------------------------
39787efd
KH
1291 (*) These are not used by any known coding system.
1292
1293 Control characters for these functions are defined by macros
1294 ISO_CODE_XXX in `coding.h'.
4ed46869 1295
39787efd 1296 Designations are done by the following escape sequences:
4ed46869
KH
1297 ----------------------------------------------------------------------
1298 escape sequence description
1299 ----------------------------------------------------------------------
1300 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0
1301 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1
1302 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2
1303 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3
1304 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*)
1305 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1
1306 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2
1307 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3
1308 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**)
1309 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1
1310 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2
1311 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3
1312 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*)
1313 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1
1314 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2
1315 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3
1316 ----------------------------------------------------------------------
1317
1318 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
39787efd 1319 of dimension 1, chars 94, and final character <F>, etc...
4ed46869
KH
1320
1321 Note (*): Although these designations are not allowed in ISO2022,
1322 Emacs accepts them on decoding, and produces them on encoding
39787efd 1323 CHARS96 character sets in a coding system which is characterized as
4ed46869
KH
1324 7-bit environment, non-locking-shift, and non-single-shift.
1325
1326 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
39787efd 1327 '(' can be omitted. We refer to this as "short-form" hereafter.
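
 The designation table and the short form can be summarized in
 code.  The sketch below parses the bytes that follow ESC and is
 an illustration only: `struct designation' and
 `parse_designation' are hypothetical names, and unlike the real
 decoder the sketch does not look the charset up in any table.

```c
#include <assert.h>
#include <stddef.h>

// Result of parsing one designation sequence.
struct designation
{
  int dimension;                // 1 or 2
  int chars;                    // 94 or 96
  int reg;                      // graphic register, 0..3 for G0..G3
  int final_char;               // the final character <F>
};

// Parse the LEN bytes at P, which follow an ESC byte.  On success
// fill *D and return the number of bytes consumed.  Return 0 if the
// bytes are not a complete designation sequence.
size_t
parse_designation (const unsigned char *p, size_t len,
                   struct designation *d)
{
  size_t i = 0;
  int dimension = 1;
  int inter;

  assert (p != NULL && d != NULL);
  if (len > 0 && p[0] == '$')
    {
      dimension = 2;
      i = 1;
      // Short form: ESC '$' <F> with <F> in '@'..'B' designates a
      // DIMENSION2_CHARS94 set to G0.
      if (i < len && p[i] >= '@' && p[i] <= 'B')
        {
          d->dimension = 2;
          d->chars = 94;
          d->reg = 0;
          d->final_char = p[i];
          return i + 1;
        }
    }
  if (i + 1 >= len)
    return 0;
  inter = p[i];
  if (inter < '(' || inter > '/')
    return 0;
  if (p[i + 1] < 0x30 || p[i + 1] > 0x7F)
    return 0;
  d->dimension = dimension;
  d->chars = inter >= ',' ? 96 : 94;    // '(' ')' '*' '+' mean 94
  d->reg = (inter - '(') % 4;
  d->final_char = p[i + 1];
  return i + 2;
}
```

 The register computation relies on the intermediate characters
 '(' through '/' being eight consecutive codes, four for CHARS94
 followed by four for CHARS96.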
4ed46869 1328
cfb43547 1329 Now you may notice that there are a lot of ways of encoding the
39787efd
KH
1330 same multilingual text in ISO2022. Actually, there exist many
1331 coding systems such as Compound Text (used in X11's inter client
8ca3766a
DL
1332 communication), ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
1333 (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
4ed46869
KH
1334 localized platforms), and all of these are variants of ISO2022.
1335
1336 In addition to the above, Emacs handles two more kinds of escape
1337 sequences: ISO6429's direction specification and Emacs' private
1338 sequence for specifying character composition.
1339
39787efd 1340 ISO6429's direction specification takes the following form:
4ed46869
KH
1341 o CSI ']' -- end of the current direction
1342 o CSI '0' ']' -- end of the current direction
1343 o CSI '1' ']' -- start of left-to-right text
1344 o CSI '2' ']' -- start of right-to-left text
1345 The control character CSI (0x9B: control sequence introducer) is
39787efd
KH
1346 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
1347
1348 Character composition specification takes the following form:
ec6d2bb8
KH
1349 o ESC '0' -- start relative composition
1350 o ESC '1' -- end composition
1351 o ESC '2' -- start rule-base composition (*)
1352 o ESC '3' -- start relative composition with alternate chars (**)
1353 o ESC '4' -- start rule-base composition with alternate chars (**)
b73bfc1c 1354 Since these are not standard escape sequences of any ISO standard,
cfb43547 1355 the use of them with these meanings is restricted to Emacs only.
ec6d2bb8 1356
cfb43547 1357 (*) This form is used only in Emacs 20.5 and older versions,
b73bfc1c 1358 but the newer versions can safely decode it.
cfb43547 1359 (**) This form is used only in Emacs 21.1 and newer versions,
b73bfc1c 1360 and the older versions can't decode it.
ec6d2bb8 1361
cfb43547 1362 Here's a list of example usages of these composition escape
b73bfc1c 1363 sequences (categorized by `enum composition_method').
ec6d2bb8 1364
b73bfc1c 1365 COMPOSITION_RELATIVE:
ec6d2bb8 1366 ESC 0 CHAR [ CHAR ] ESC 1
8ca3766a 1367 COMPOSITION_WITH_RULE:
ec6d2bb8 1368 ESC 2 CHAR [ RULE CHAR ] ESC 1
b73bfc1c 1369 COMPOSITION_WITH_ALTCHARS:
ec6d2bb8 1370 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
b73bfc1c 1371 COMPOSITION_WITH_RULE_ALTCHARS:
ec6d2bb8 1372 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
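
The five composition escape sequences described in the note above
differ only in the byte that follows ESC.  A sketch of that mapping,
assuming nothing beyond the list in the note, could look as follows.
The names `enum cmp_escape' and `classify_composition_escape' are
hypothetical and are not the identifiers Emacs uses.

```c
#include <assert.h>

/* What the byte after ESC means for composition handling.  */
enum cmp_escape
{
  CMP_ESC_NONE,                 /* not a composition sequence */
  CMP_ESC_START_RELATIVE,       /* ESC '0' */
  CMP_ESC_END,                  /* ESC '1' */
  CMP_ESC_START_RULE,           /* ESC '2' */
  CMP_ESC_START_ALTCHARS,       /* ESC '3' */
  CMP_ESC_START_RULE_ALTCHARS   /* ESC '4' */
};

/* Classify FINAL, the byte that follows ESC.  */
enum cmp_escape
classify_composition_escape (int final)
{
  assert (final >= 0 && final <= 0xFF);
  switch (final)
    {
    case '0': return CMP_ESC_START_RELATIVE;
    case '1': return CMP_ESC_END;
    case '2': return CMP_ESC_START_RULE;
    case '3': return CMP_ESC_START_ALTCHARS;
    case '4': return CMP_ESC_START_RULE_ALTCHARS;
    default:  return CMP_ESC_NONE;
    }
}
```

Note that in the ALTCHARS forms ESC '0' appears a second time inside
the composition, so a decoder has to interpret it according to
whether a composition is already in progress.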
4ed46869
KH
1373
1374enum iso_code_class_type iso_code_class[256];
1375
05e6f5dc
KH
1376#define CHARSET_OK(idx, charset, c) \
1377 (coding_system_table[idx] \
1378 && (charset == CHARSET_ASCII \
6b89e3aa 1379 || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
05e6f5dc
KH
1380 CODING_SAFE_CHAR_P (safe_chars, c))) \
1381 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
1382 charset) \
1383 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
d46c5b12
KH
1384
1385#define SHIFT_OUT_OK(idx) \
1386 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
1387
b6871cc7
KH
1388#define COMPOSITION_OK(idx) \
1389 (coding_system_table[idx]->composing != COMPOSITION_DISABLED)
1390
4ed46869 1391/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
cfb43547 1392 Check if a text is encoded in ISO2022. If it is, return an
4ed46869
KH
1393 integer in which the appropriate flag bits, any of:
1394 CODING_CATEGORY_MASK_ISO_7
d46c5b12 1395 CODING_CATEGORY_MASK_ISO_7_TIGHT
4ed46869
KH
1396 CODING_CATEGORY_MASK_ISO_8_1
1397 CODING_CATEGORY_MASK_ISO_8_2
7717c392
KH
1398 CODING_CATEGORY_MASK_ISO_7_ELSE
1399 CODING_CATEGORY_MASK_ISO_8_ELSE
4ed46869
KH
1400 are set. If a code which should never appear in ISO2022 is found,
1401 return 0. */
1402
0a28aafb
KH
1403static int
1404detect_coding_iso2022 (src, src_end, multibytep)
4ed46869 1405 unsigned char *src, *src_end;
0a28aafb 1406 int multibytep;
4ed46869 1407{
d46c5b12
KH
1408 int mask = CODING_CATEGORY_MASK_ISO;
1409 int mask_found = 0;
f46869e4 1410 int reg[4], shift_out = 0, single_shifting = 0;
da55a2b7 1411 int c, c1, charset;
b73bfc1c
KH
1412 /* Dummy for ONE_MORE_BYTE. */
1413 struct coding_system dummy_coding;
1414 struct coding_system *coding = &dummy_coding;
05e6f5dc 1415 Lisp_Object safe_chars;
3f003981 1416
d46c5b12 1417 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
3f003981 1418 while (mask && src < src_end)
4ed46869 1419 {
0a28aafb 1420 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
8d239c89 1421 retry:
4ed46869
KH
1422 switch (c)
1423 {
1424 case ISO_CODE_ESC:
74383408
KH
1425 if (inhibit_iso_escape_detection)
1426 break;
f46869e4 1427 single_shifting = 0;
0a28aafb 1428 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
d46c5b12 1429 if (c >= '(' && c <= '/')
4ed46869 1430 {
bf9cdd4e 1431 /* Designation sequence for a charset of dimension 1. */
0a28aafb 1432 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
1433 if (c1 < ' ' || c1 >= 0x80
1434 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
1435 /* Invalid designation sequence. Just ignore. */
1436 break;
1437 reg[(c - '(') % 4] = charset;
bf9cdd4e
KH
1438 }
1439 else if (c == '$')
1440 {
1441 /* Designation sequence for a charset of dimension 2. */
0a28aafb 1442 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
bf9cdd4e
KH
1443 if (c >= '@' && c <= 'B')
1444 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
d46c5b12 1445 reg[0] = charset = iso_charset_table[1][0][c];
bf9cdd4e 1446 else if (c >= '(' && c <= '/')
bcf26d6a 1447 {
0a28aafb 1448 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
d46c5b12
KH
1449 if (c1 < ' ' || c1 >= 0x80
1450 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
1451 /* Invalid designation sequence. Just ignore. */
1452 break;
1453 reg[(c - '(') % 4] = charset;
bcf26d6a 1454 }
bf9cdd4e 1455 else
d46c5b12
KH
1456 /* Invalid designation sequence. Just ignore. */
1457 break;
1458 }
ae9ff118 1459 else if (c == 'N' || c == 'O')
d46c5b12 1460 {
ae9ff118
KH
1461 /* ESC <Fe> for SS2 or SS3. */
1462 mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
d46c5b12 1463 break;
4ed46869 1464 }
ec6d2bb8
KH
1465 else if (c >= '0' && c <= '4')
1466 {
1467 /* ESC <Fp> for start/end composition. */
b6871cc7
KH
1468 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7))
1469 mask_found |= CODING_CATEGORY_MASK_ISO_7;
1470 else
1471 mask &= ~CODING_CATEGORY_MASK_ISO_7;
1472 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT))
1473 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
1474 else
1475 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
1476 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1))
1477 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
1478 else
1479 mask &= ~CODING_CATEGORY_MASK_ISO_8_1;
1480 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2))
1481 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
1482 else
1483 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1484 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE))
1485 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
1486 else
1487 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
1488 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))
1489 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
1490 else
1491 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
ec6d2bb8
KH
1492 break;
1493 }
bf9cdd4e 1494 else
d46c5b12
KH
1495 /* Invalid escape sequence. Just ignore. */
1496 break;
1497
1498 /* We found a valid designation sequence for CHARSET. */
1499 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
05e6f5dc
KH
1500 c = MAKE_CHAR (charset, 0, 0);
1501 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
d46c5b12
KH
1502 mask_found |= CODING_CATEGORY_MASK_ISO_7;
1503 else
1504 mask &= ~CODING_CATEGORY_MASK_ISO_7;
05e6f5dc 1505 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
d46c5b12
KH
1506 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
1507 else
1508 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
05e6f5dc 1509 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
ae9ff118
KH
1510 mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
1511 else
d46c5b12 1512 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
05e6f5dc 1513 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
ae9ff118
KH
1514 mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
1515 else
d46c5b12 1516 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
4ed46869
KH
1517 break;
1518
4ed46869 1519 case ISO_CODE_SO:
74383408
KH
1520 if (inhibit_iso_escape_detection)
1521 break;
f46869e4 1522 single_shifting = 0;
d46c5b12
KH
1523 if (shift_out == 0
1524 && (reg[1] >= 0
1525 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
1526 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
1527 {
1528 /* Locking shift out. */
1529 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1530 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1531 }
e0e989f6 1532 break;
93dec019 1533
d46c5b12 1534 case ISO_CODE_SI:
74383408
KH
1535 if (inhibit_iso_escape_detection)
1536 break;
f46869e4 1537 single_shifting = 0;
d46c5b12
KH
1538 if (shift_out == 1)
1539 {
1540 /* Locking shift in. */
1541 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
1542 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
1543 }
1544 break;
1545
4ed46869 1546 case ISO_CODE_CSI:
f46869e4 1547 single_shifting = 0;
4ed46869
KH
1548 case ISO_CODE_SS2:
1549 case ISO_CODE_SS3:
3f003981
KH
1550 {
1551 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
1552
74383408
KH
1553 if (inhibit_iso_escape_detection)
1554 break;
70c22245
KH
1555 if (c != ISO_CODE_CSI)
1556 {
d46c5b12
KH
1557 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1558 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1559 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1560 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1561 & CODING_FLAG_ISO_SINGLE_SHIFT)
70c22245 1562 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
f46869e4 1563 single_shifting = 1;
70c22245 1564 }
3f003981
KH
1565 if (VECTORP (Vlatin_extra_code_table)
1566 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
1567 {
d46c5b12
KH
1568 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1569 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1570 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1571 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1572 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1573 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1574 }
1575 mask &= newmask;
d46c5b12 1576 mask_found |= newmask;
3f003981
KH
1577 }
1578 break;
4ed46869
KH
1579
1580 default:
1581 if (c < 0x80)
f46869e4
KH
1582 {
1583 single_shifting = 0;
1584 break;
1585 }
4ed46869 1586 else if (c < 0xA0)
c4825358 1587 {
f46869e4 1588 single_shifting = 0;
3f003981
KH
1589 if (VECTORP (Vlatin_extra_code_table)
1590 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
c4825358 1591 {
3f003981
KH
1592 int newmask = 0;
1593
d46c5b12
KH
1594 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
1595 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981 1596 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
d46c5b12
KH
1597 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
1598 & CODING_FLAG_ISO_LATIN_EXTRA)
3f003981
KH
1599 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
1600 mask &= newmask;
d46c5b12 1601 mask_found |= newmask;
c4825358 1602 }
3f003981
KH
1603 else
1604 return 0;
c4825358 1605 }
4ed46869
KH
1606 else
1607 {
d46c5b12 1608 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
7717c392 1609 | CODING_CATEGORY_MASK_ISO_7_ELSE);
d46c5b12 1610 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
f46869e4
KH
1611 /* Check the length of succeeding codes of the range
1612 0xA0..0xFF. If the byte length is odd, we exclude
1613 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1614 when we are not single shifting. */
b73bfc1c
KH
1615 if (!single_shifting
1616 && mask & CODING_CATEGORY_MASK_ISO_8_2)
f46869e4 1617 {
e17de821 1618 int i = 1;
8d239c89
KH
1619
1620 c = -1;
b73bfc1c
KH
1621 while (src < src_end)
1622 {
0a28aafb 1623 ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
b73bfc1c
KH
1624 if (c < 0xA0)
1625 break;
1626 i++;
1627 }
1628
1629 if (i & 1 && src < src_end)
f46869e4
KH
1630 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
1631 else
1632 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
8d239c89
KH
1633 if (c >= 0)
1634 /* This means that we have read one extra byte. */
1635 goto retry;
f46869e4 1636 }
4ed46869
KH
1637 }
1638 break;
1639 }
1640 }
b73bfc1c 1641 label_end_of_loop:
d46c5b12 1642 return (mask & mask_found);
4ed46869
KH
1643}
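
The last branch of the detector above rules out the two-byte 8-bit
category when a run of bytes in 0xA0..0xFF has odd length.  The idea
can be sketched on its own as below.  This is a simplification under
stated assumptions: the function names are hypothetical, and the
sketch ignores both single-shift state and the end-of-buffer test
(`src < src_end') that the real code applies before excluding the
category.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of consecutive bytes >= 0xA0 at the start of
   the LEN bytes at P.  */
size_t
high_byte_run_length (const unsigned char *p, size_t len)
{
  size_t n = 0;

  assert (p != NULL);
  while (n < len && p[n] >= 0xA0)
    n++;
  return n;
}

/* Return nonzero if the run of high bytes at P could consist
   entirely of two-byte characters, i.e. its length is even.  */
int
run_may_be_two_byte (const unsigned char *p, size_t len)
{
  return high_byte_run_length (p, len) % 2 == 0;
}
```

A run such as 0xA4 0xA2 followed by ASCII is compatible with a
two-byte charset, while a single 0xA4 followed by ASCII is not.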
1644
b73bfc1c
KH
1645/* Decode a character of which charset is CHARSET, the 1st position
1646 code is C1, the 2nd position code is C2, and return the decoded
1647 character code. If the variable `translation_table' is non-nil,
1648 return the translated code. */
ec6d2bb8 1649
b73bfc1c
KH
1650#define DECODE_ISO_CHARACTER(charset, c1, c2) \
1651 (NILP (translation_table) \
1652 ? MAKE_CHAR (charset, c1, c2) \
1653 : translate_char (translation_table, -1, charset, c1, c2))
4ed46869
KH
1654
1655/* Set designation state into CODING. */
d46c5b12
KH
1656#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1657 do { \
05e6f5dc 1658 int charset, c; \
944bd420
KH
1659 \
1660 if (final_char < '0' || final_char >= 128) \
1661 goto label_invalid_code; \
1662 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1663 make_number (chars), \
1664 make_number (final_char)); \
05e6f5dc 1665 c = MAKE_CHAR (charset, 0, 0); \
d46c5b12 1666 if (charset >= 0 \
704c5781 1667 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
05e6f5dc 1668 || CODING_SAFE_CHAR_P (safe_chars, c))) \
d46c5b12
KH
1669 { \
1670 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1671 && reg == 0 \
1672 && charset == CHARSET_ASCII) \
1673 { \
1674 /* We should insert this designation sequence as is so \
1675 that it is surely written back to a file. */ \
1676 coding->spec.iso2022.last_invalid_designation_register = -1; \
1677 goto label_invalid_code; \
1678 } \
1679 coding->spec.iso2022.last_invalid_designation_register = -1; \
1680 if ((coding->mode & CODING_MODE_DIRECTION) \
1681 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1682 charset = CHARSET_REVERSE_CHARSET (charset); \
1683 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1684 } \
1685 else \
1686 { \
1687 coding->spec.iso2022.last_invalid_designation_register = reg; \
1688 goto label_invalid_code; \
1689 } \
4ed46869
KH
1690 } while (0)
1691
ec6d2bb8
KH
1692/* Allocate a memory block for storing information about compositions.
1693 The block is chained to the already allocated blocks. */
d46c5b12 1694
33fb63eb 1695void
ec6d2bb8 1696coding_allocate_composition_data (coding, char_offset)
d46c5b12 1697 struct coding_system *coding;
ec6d2bb8 1698 int char_offset;
d46c5b12 1699{
ec6d2bb8
KH
1700 struct composition_data *cmp_data
1701 = (struct composition_data *) xmalloc (sizeof *cmp_data);
1702
1703 cmp_data->char_offset = char_offset;
1704 cmp_data->used = 0;
1705 cmp_data->prev = coding->cmp_data;
1706 cmp_data->next = NULL;
1707 if (coding->cmp_data)
1708 coding->cmp_data->next = cmp_data;
1709 coding->cmp_data = cmp_data;
1710 coding->cmp_data_start = 0;
4307d534 1711 coding->composing = COMPOSITION_NO;
ec6d2bb8 1712}
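
The function above appends a new block to a doubly linked chain and
makes it the current one.  The same chaining can be shown with a
stripped-down sketch.  `struct block', `chain_block' and `free_chain'
are hypothetical names, plain malloc stands in for xmalloc, and the
fields of the real `struct composition_data' other than those set
above are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* A reduced stand-in for struct composition_data.  */
struct block
{
  int char_offset;
  int used;
  struct block *prev, *next;
};

/* Allocate a block, link it after TAIL (which may be NULL), and
   return it as the new tail.  Return NULL if allocation fails.  */
struct block *
chain_block (struct block *tail, int char_offset)
{
  struct block *b;

  assert (tail == NULL || tail->next == NULL);
  b = malloc (sizeof *b);
  if (b == NULL)
    return NULL;
  b->char_offset = char_offset;
  b->used = 0;
  b->prev = tail;
  b->next = NULL;
  if (tail)
    tail->next = b;
  return b;
}

/* Free the whole chain that ends at TAIL.  */
void
free_chain (struct block *tail)
{
  while (tail)
    {
      struct block *prev = tail->prev;

      free (tail);
      tail = prev;
    }
}
```

Keeping both links lets a caller walk back from the newest block to
the first one, and forward again as each block is used up.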
d46c5b12 1713
aa72b389
KH
1714/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
1715 ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
1716 ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
1717 ESC 3 : altchar composition : ESC 3 ALT ... ESC 0 CHAR ... ESC 1
1718 ESC 4 : alt&rule composition : ESC 4 ALT RULE .. ALT ESC 0 CHAR ... ESC 1
1719 */
ec6d2bb8 1720
33fb63eb
KH
1721#define DECODE_COMPOSITION_START(c1) \
1722 do { \
1723 if (coding->composing == COMPOSITION_DISABLED) \
1724 { \
1725 *dst++ = ISO_CODE_ESC; \
1726 *dst++ = c1 & 0x7f; \
1727 coding->produced_char += 2; \
1728 } \
1729 else if (!COMPOSING_P (coding)) \
1730 { \
1731 /* This is surely the start of a composition. We must be sure \
1732 that coding->cmp_data has enough space to store the \
1733 information about the composition. If not, terminate the \
1734 current decoding loop, allocate one more memory block for \
8ca3766a 1735 coding->cmp_data in the caller, then start the decoding \
33fb63eb
KH
1736 loop again. We can't allocate memory here directly because \
1737 it may cause buffer/string relocation. */ \
1738 if (!coding->cmp_data \
1739 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1740 >= COMPOSITION_DATA_SIZE)) \
1741 { \
1742 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1743 goto label_end_of_loop; \
1744 } \
1745 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1746 : c1 == '2' ? COMPOSITION_WITH_RULE \
1747 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1748 : COMPOSITION_WITH_RULE_ALTCHARS); \
1749 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1750 coding->composing); \
1751 coding->composition_rule_follows = 0; \
1752 } \
1753 else \
1754 { \
1755 /* We are already handling a composition. If the method is one \
1756 of the following two, the codes following the current escape \
1757 sequence are actual characters stored in a buffer. */ \
1758 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1759 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1760 { \
1761 coding->composing = COMPOSITION_RELATIVE; \
1762 coding->composition_rule_follows = 0; \
1763 } \
1764 } \
ec6d2bb8
KH
1765 } while (0)
1766
8ca3766a 1767/* Handle composition end sequence ESC 1. */
ec6d2bb8
KH
1768
1769#define DECODE_COMPOSITION_END(c1) \
1770 do { \
93dec019 1771 if (! COMPOSING_P (coding)) \
ec6d2bb8
KH
1772 { \
1773 *dst++ = ISO_CODE_ESC; \
1774 *dst++ = c1; \
1775 coding->produced_char += 2; \
1776 } \
1777 else \
1778 { \
1779 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1780 coding->composing = COMPOSITION_NO; \
1781 } \
1782 } while (0)
1783
1784/* Decode a composition rule from the byte C1 (and maybe one more byte
1785 from SRC) and store one encoded composition rule in
1786 coding->cmp_data. */
1787
1788#define DECODE_COMPOSITION_RULE(c1) \
1789 do { \
1790 int rule = 0; \
1791 (c1) -= 32; \
1792 if (c1 < 81) /* old format (before ver.21) */ \
1793 { \
1794 int gref = (c1) / 9; \
1795 int nref = (c1) % 9; \
1796 if (gref == 4) gref = 10; \
1797 if (nref == 4) nref = 10; \
1798 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1799 } \
b73bfc1c 1800 else if (c1 < 93) /* new format (after ver.21) */ \
ec6d2bb8
KH
1801 { \
1802 ONE_MORE_BYTE (c2); \
1803 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1804 } \
1805 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1806 coding->composition_rule_follows = 0; \
1807 } while (0)
88993dfd 1808
d46c5b12 1809
4ed46869
KH
1810/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1811
b73bfc1c 1812static void
d46c5b12 1813decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869 1814 struct coding_system *coding;
5bdca8af
DN
1815 const unsigned char *source;
1816 unsigned char *destination;
4ed46869 1817 int src_bytes, dst_bytes;
4ed46869 1818{
5bdca8af
DN
1819 const unsigned char *src = source;
1820 const unsigned char *src_end = source + src_bytes;
4ed46869
KH
1821 unsigned char *dst = destination;
1822 unsigned char *dst_end = destination + dst_bytes;
4ed46869
KH
1823 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1824 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1825 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
b73bfc1c
KH
1826 /* SRC_BASE remembers the start position in source in each loop.
1827 The loop will be exited when there's not enough source bytes
1828 (within macro ONE_MORE_BYTE), or when there's not enough
1829 destination area to produce a character (within macro
1830 EMIT_CHAR). */
5bdca8af 1831 const unsigned char *src_base;
b73bfc1c
KH
1832 int c, charset;
1833 Lisp_Object translation_table;
05e6f5dc
KH
1834 Lisp_Object safe_chars;
1835
6b89e3aa 1836 safe_chars = coding_safe_chars (coding->symbol);
bdd9fb48 1837
b73bfc1c
KH
1838 if (NILP (Venable_character_translation))
1839 translation_table = Qnil;
1840 else
1841 {
1842 translation_table = coding->translation_table_for_decode;
1843 if (NILP (translation_table))
1844 translation_table = Vstandard_translation_table_for_decode;
1845 }
4ed46869 1846
b73bfc1c
KH
1847 coding->result = CODING_FINISH_NORMAL;
1848
1849 while (1)
4ed46869 1850 {
85478bc6 1851 int c1, c2 = 0;
b73bfc1c
KH
1852
1853 src_base = src;
1854 ONE_MORE_BYTE (c1);
4ed46869 1855
ec6d2bb8 1856 /* We produce no character or one character. */
4ed46869
KH
1857 switch (iso_code_class [c1])
1858 {
1859 case ISO_0x20_or_0x7F:
ec6d2bb8
KH
1860 if (COMPOSING_P (coding) && coding->composition_rule_follows)
1861 {
1862 DECODE_COMPOSITION_RULE (c1);
b73bfc1c 1863 continue;
ec6d2bb8
KH
1864 }
1865 if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
4ed46869
KH
1866 {
1867 /* This is SPACE or DEL. */
b73bfc1c 1868 charset = CHARSET_ASCII;
4ed46869
KH
1869 break;
1870 }
1871 /* This is a graphic character; fall through ... */
1872
1873 case ISO_graphic_plane_0:
ec6d2bb8 1874 if (COMPOSING_P (coding) && coding->composition_rule_follows)
b73bfc1c
KH
1875 {
1876 DECODE_COMPOSITION_RULE (c1);
1877 continue;
1878 }
1879 charset = charset0;
4ed46869
KH
1880 break;
1881
1882 case ISO_0xA0_or_0xFF:
d46c5b12
KH
1883 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1884 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
fb88bf2d 1885 goto label_invalid_code;
4ed46869
KH
1886 /* This is a graphic character; fall through ... */
1887
1888 case ISO_graphic_plane_1:
b73bfc1c 1889 if (charset1 < 0)
fb88bf2d 1890 goto label_invalid_code;
b73bfc1c 1891 charset = charset1;
4ed46869
KH
1892 break;
1893
b73bfc1c 1894 case ISO_control_0:
ec6d2bb8
KH
1895 if (COMPOSING_P (coding))
1896 DECODE_COMPOSITION_END ('1');
1897
4ed46869
KH
1898 /* All ISO2022 control characters in this class have the
1899 same representation in Emacs internal format. */
d46c5b12
KH
1900 if (c1 == '\n'
1901 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1902 && (coding->eol_type == CODING_EOL_CR
1903 || coding->eol_type == CODING_EOL_CRLF))
1904 {
b73bfc1c
KH
1905 coding->result = CODING_FINISH_INCONSISTENT_EOL;
1906 goto label_end_of_loop;
d46c5b12 1907 }
b73bfc1c 1908 charset = CHARSET_ASCII;
4ed46869
KH
1909 break;
1910
b73bfc1c
KH
1911 case ISO_control_1:
1912 if (COMPOSING_P (coding))
1913 DECODE_COMPOSITION_END ('1');
1914 goto label_invalid_code;
1915
4ed46869 1916 case ISO_carriage_return:
ec6d2bb8
KH
1917 if (COMPOSING_P (coding))
1918 DECODE_COMPOSITION_END ('1');
1919
4ed46869 1920 if (coding->eol_type == CODING_EOL_CR)
b73bfc1c 1921 c1 = '\n';
4ed46869
KH
1922 else if (coding->eol_type == CODING_EOL_CRLF)
1923 {
1924 ONE_MORE_BYTE (c1);
b73bfc1c 1925 if (c1 != ISO_CODE_LF)
4ed46869
KH
1926 {
1927 src--;
b73bfc1c 1928 c1 = '\r';
4ed46869
KH
1929 }
1930 }
b73bfc1c 1931 charset = CHARSET_ASCII;
4ed46869
KH
1932 break;
1933
1934 case ISO_shift_out:
d46c5b12
KH
1935 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1936 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1937 goto label_invalid_code;
4ed46869
KH
1938 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
1939 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1940 continue;
4ed46869
KH
1941
1942 case ISO_shift_in:
d46c5b12
KH
1943 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1944 goto label_invalid_code;
4ed46869
KH
1945 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
1946 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 1947 continue;
4ed46869
KH
1948
1949 case ISO_single_shift_2_7:
1950 case ISO_single_shift_2:
d46c5b12
KH
1951 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1952 goto label_invalid_code;
4ed46869
KH
1953 /* SS2 is handled as an escape sequence of ESC 'N' */
1954 c1 = 'N';
1955 goto label_escape_sequence;
1956
1957 case ISO_single_shift_3:
d46c5b12
KH
1958 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1959 goto label_invalid_code;
4ed46869
KH
1960 /* SS3 is handled as an escape sequence of ESC 'O' */
1961 c1 = 'O';
1962 goto label_escape_sequence;
1963
1964 case ISO_control_sequence_introducer:
1965 /* CSI is handled as an escape sequence of ESC '[' ... */
1966 c1 = '[';
1967 goto label_escape_sequence;
1968
1969 case ISO_escape:
1970 ONE_MORE_BYTE (c1);
1971 label_escape_sequence:
1972 /* Escape sequences handled by Emacs are invocation,
1973 designation, direction specification, and character
1974 composition specification. */
1975 switch (c1)
1976 {
1977 case '&': /* revision of following character set */
1978 ONE_MORE_BYTE (c1);
1979 if (!(c1 >= '@' && c1 <= '~'))
d46c5b12 1980 goto label_invalid_code;
4ed46869
KH
1981 ONE_MORE_BYTE (c1);
1982 if (c1 != ISO_CODE_ESC)
d46c5b12 1983 goto label_invalid_code;
4ed46869
KH
1984 ONE_MORE_BYTE (c1);
1985 goto label_escape_sequence;
1986
1987 case '$': /* designation of 2-byte character set */
d46c5b12
KH
1988 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1989 goto label_invalid_code;
4ed46869
KH
1990 ONE_MORE_BYTE (c1);
1991 if (c1 >= '@' && c1 <= 'B')
1992 { /* designation of JISX0208.1978, GB2312.1980,
88993dfd 1993 or JISX0208.1980 */
4ed46869
KH
1994 DECODE_DESIGNATION (0, 2, 94, c1);
1995 }
1996 else if (c1 >= 0x28 && c1 <= 0x2B)
1997 { /* designation of DIMENSION2_CHARS94 character set */
1998 ONE_MORE_BYTE (c2);
1999 DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
2000 }
2001 else if (c1 >= 0x2C && c1 <= 0x2F)
2002 { /* designation of DIMENSION2_CHARS96 character set */
2003 ONE_MORE_BYTE (c2);
2004 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
2005 }
2006 else
d46c5b12 2007 goto label_invalid_code;
b73bfc1c
KH
2008 /* We must update these variables now. */
2009 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
2010 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
2011 continue;
4ed46869
KH
2012
2013 case 'n': /* invocation of locking-shift-2 */
d46c5b12
KH
2014 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
2015 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
2016 goto label_invalid_code;
4ed46869 2017 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
e0e989f6 2018 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 2019 continue;
4ed46869
KH
2020
2021 case 'o': /* invocation of locking-shift-3 */
d46c5b12
KH
2022 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
2023 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
2024 goto label_invalid_code;
4ed46869 2025 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
e0e989f6 2026 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
b73bfc1c 2027 continue;
4ed46869
KH
2028
2029 case 'N': /* invocation of single-shift-2 */
d46c5b12
KH
2030 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2031 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
2032 goto label_invalid_code;
4ed46869 2033 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
b73bfc1c 2034 ONE_MORE_BYTE (c1);
e7046a18
KH
2035 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
2036 goto label_invalid_code;
4ed46869
KH
2037 break;
2038
2039 case 'O': /* invocation of single-shift-3 */
d46c5b12
KH
2040 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2041 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
2042 goto label_invalid_code;
4ed46869 2043 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
b73bfc1c 2044 ONE_MORE_BYTE (c1);
e7046a18
KH
2045 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
2046 goto label_invalid_code;
4ed46869
KH
2047 break;
2048
ec6d2bb8
KH
2049 case '0': case '2': case '3': case '4': /* start composition */
2050 DECODE_COMPOSITION_START (c1);
b73bfc1c 2051 continue;
4ed46869 2052
ec6d2bb8
KH
2053 case '1': /* end composition */
2054 DECODE_COMPOSITION_END (c1);
b73bfc1c 2055 continue;
4ed46869
KH
2056
2057 case '[': /* specification of direction */
d46c5b12
KH
2058 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
2059 goto label_invalid_code;
4ed46869 2060 /* For the moment, nested direction is not supported.
d46c5b12 2061 So, `coding->mode & CODING_MODE_DIRECTION' zero means
8ca3766a 2062 left-to-right, and nonzero means right-to-left. */
4ed46869
KH
2063 ONE_MORE_BYTE (c1);
2064 switch (c1)
2065 {
2066 case ']': /* end of the current direction */
d46c5b12 2067 coding->mode &= ~CODING_MODE_DIRECTION; break;
4ed46869
KH
2068
2069 case '0': /* end of the current direction */
2070 case '1': /* start of left-to-right direction */
2071 ONE_MORE_BYTE (c1);
2072 if (c1 == ']')
d46c5b12 2073 coding->mode &= ~CODING_MODE_DIRECTION;
4ed46869 2074 else
d46c5b12 2075 goto label_invalid_code;
4ed46869
KH
2076 break;
2077
2078 case '2': /* start of right-to-left direction */
2079 ONE_MORE_BYTE (c1);
2080 if (c1 == ']')
d46c5b12 2081 coding->mode |= CODING_MODE_DIRECTION;
4ed46869 2082 else
d46c5b12 2083 goto label_invalid_code;
4ed46869
KH
2084 break;
2085
2086 default:
d46c5b12 2087 goto label_invalid_code;
4ed46869 2088 }
b73bfc1c 2089 continue;
4ed46869 2090
103e0180
KH
2091 case '%':
2092 if (COMPOSING_P (coding))
2093 DECODE_COMPOSITION_END ('1');
2094 ONE_MORE_BYTE (c1);
2095 if (c1 == '/')
2096 {
2097 /* CTEXT extended segment:
2098 ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
2099 We keep these bytes as is for the moment.
2100 They may be decoded by post-read-conversion. */
2101 int dim, M, L;
2102 int size, required;
2103 int produced_chars;
43e4a82f 2104
103e0180
KH
2105 ONE_MORE_BYTE (dim);
2106 ONE_MORE_BYTE (M);
2107 ONE_MORE_BYTE (L);
2108 size = ((M - 128) * 128) + (L - 128);
2109 required = 8 + size * 2;
2110 if (dst + required > (dst_bytes ? dst_end : src))
2111 goto label_end_of_loop;
2112 *dst++ = ISO_CODE_ESC;
2113 *dst++ = '%';
2114 *dst++ = '/';
2115 *dst++ = dim;
2116 produced_chars = 4;
2117 dst += CHAR_STRING (M, dst), produced_chars++;
2118 dst += CHAR_STRING (L, dst), produced_chars++;
2119 while (size-- > 0)
2120 {
2121 ONE_MORE_BYTE (c1);
2122 dst += CHAR_STRING (c1, dst), produced_chars++;
2123 }
2124 coding->produced_char += produced_chars;
2125 }
2126 else if (c1 == 'G')
2127 {
2128 unsigned char *d = dst;
2129 int produced_chars;
2130
2131 /* XFree86 extension for embedding UTF-8 in CTEXT:
2132 ESC % G --UTF-8-BYTES-- ESC % @
2133 We keep these bytes as is for the moment.
2134 They may be decoded by post-read-conversion. */
2135 if (d + 6 > (dst_bytes ? dst_end : src))
2136 goto label_end_of_loop;
2137 *d++ = ISO_CODE_ESC;
2138 *d++ = '%';
2139 *d++ = 'G';
2140 produced_chars = 3;
2141 while (d + 1 < (dst_bytes ? dst_end : src))
2142 {
2143 ONE_MORE_BYTE (c1);
2144 if (c1 == ISO_CODE_ESC
2145 && src + 1 < src_end
2146 && src[0] == '%'
2147 && src[1] == '@')
47dc91ad
KH
2148 {
2149 src += 2;
2150 break;
2151 }
103e0180
KH
2152 d += CHAR_STRING (c1, d), produced_chars++;
2153 }
2154 if (d + 3 > (dst_bytes ? dst_end : src))
2155 goto label_end_of_loop;
2156 *d++ = ISO_CODE_ESC;
2157 *d++ = '%';
2158 *d++ = '@';
2159 dst = d;
2160 coding->produced_char += produced_chars + 3;
2161 }
2162 else
2163 goto label_invalid_code;
2164 continue;
2165
4ed46869 2166 default:
d46c5b12
KH
2167 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
2168 goto label_invalid_code;
4ed46869
KH
2169 if (c1 >= 0x28 && c1 <= 0x2B)
2170 { /* designation of DIMENSION1_CHARS94 character set */
2171 ONE_MORE_BYTE (c2);
2172 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
2173 }
2174 else if (c1 >= 0x2C && c1 <= 0x2F)
2175 { /* designation of DIMENSION1_CHARS96 character set */
2176 ONE_MORE_BYTE (c2);
2177 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
2178 }
2179 else
b73bfc1c
KH
2180 goto label_invalid_code;
2181 /* We must update these variables now. */
2182 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
2183 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
2184 continue;
4ed46869 2185 }
b73bfc1c 2186 }
4ed46869 2187
b73bfc1c
KH
2188 /* Now we know CHARSET and 1st position code C1 of a character.
2189 Produce a multibyte sequence for that character while getting
2190 2nd position code C2 if necessary. */
2191 if (CHARSET_DIMENSION (charset) == 2)
2192 {
2193 ONE_MORE_BYTE (c2);
2194 if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
2195 /* C2 is not in a valid range. */
2196 goto label_invalid_code;
4ed46869 2197 }
b73bfc1c
KH
2198 c = DECODE_ISO_CHARACTER (charset, c1, c2);
2199 EMIT_CHAR (c);
4ed46869
KH
2200 continue;
2201
b73bfc1c
KH
2202 label_invalid_code:
2203 coding->errors++;
2204 if (COMPOSING_P (coding))
2205 DECODE_COMPOSITION_END ('1');
4ed46869 2206 src = src_base;
b73bfc1c 2207 c = *src++;
2d4430a8
KH
2208 if (! NILP (translation_table))
2209 c = translate_char (translation_table, c, 0, 0, 0);
b73bfc1c 2210 EMIT_CHAR (c);
4ed46869 2211 }
fb88bf2d 2212
b73bfc1c
KH
2213 label_end_of_loop:
2214 coding->consumed = coding->consumed_char = src_base - source;
d46c5b12 2215 coding->produced = dst - destination;
b73bfc1c 2216 return;
4ed46869
KH
2217}
2218
b73bfc1c 2219
f4dee582 2220/* ISO2022 encoding stuff. */
4ed46869
KH
2221
2222/*
f4dee582 2223 It is not enough to say just "ISO2022" when encoding; we have to
cfb43547 2224 specify more details. In Emacs, each ISO2022 coding system
4ed46869 2225 variant has the following specifications:
8ca3766a 2226 1. Initial designation to G0 through G3.
4ed46869
KH
2227 2. Allows short-form designation?
2228 3. ASCII should be designated to G0 before control characters?
2229 4. ASCII should be designated to G0 at end of line?
2230 5. 7-bit environment or 8-bit environment?
2231 6. Use locking-shift?
2232 7. Use single-shift?
2233 And the following two are only for Japanese:
2234 8. Use ASCII in place of JISX0201-1976-Roman?
2235 9. Use JISX0208-1983 in place of JISX0208-1978?
2236 These specifications are encoded in `coding->flags' as flag bits
2237 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
f4dee582 2238 details.
4ed46869
KH
2239*/
2240
2241/* Produce codes (escape sequence) for designating CHARSET to graphic
b73bfc1c
KH
2242 register REG at DST, and increment DST. If <final-char> of CHARSET is
2243 '@', 'A', or 'B' and the coding system CODING allows, produce
2244 designation sequence of short-form. */
4ed46869
KH
2245
2246#define ENCODE_DESIGNATION(charset, reg, coding) \
2247 do { \
2248 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
2249 char *intermediate_char_94 = "()*+"; \
2250 char *intermediate_char_96 = ",-./"; \
70c22245 2251 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
b73bfc1c 2252 \
70c22245
KH
2253 if (revision < 255) \
2254 { \
4ed46869
KH
2255 *dst++ = ISO_CODE_ESC; \
2256 *dst++ = '&'; \
70c22245 2257 *dst++ = '@' + revision; \
4ed46869 2258 } \
b73bfc1c 2259 *dst++ = ISO_CODE_ESC; \
4ed46869
KH
2260 if (CHARSET_DIMENSION (charset) == 1) \
2261 { \
2262 if (CHARSET_CHARS (charset) == 94) \
2263 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2264 else \
2265 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
2266 } \
2267 else \
2268 { \
2269 *dst++ = '$'; \
2270 if (CHARSET_CHARS (charset) == 94) \
2271 { \
b73bfc1c
KH
2272 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
2273 || reg != 0 \
2274 || final_char < '@' || final_char > 'B') \
4ed46869
KH
2275 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2276 } \
2277 else \
b73bfc1c 2278 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
4ed46869 2279 } \
b73bfc1c 2280 *dst++ = final_char; \
4ed46869
KH
2281 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
2282 } while (0)
2283
2284/* The following two macros produce codes (control character or escape
2285 sequence) for ISO2022 single-shift functions (single-shift-2 and
2286 single-shift-3). */
2287
2288#define ENCODE_SINGLE_SHIFT_2 \
2289 do { \
2290 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2291 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
2292 else \
b73bfc1c 2293 *dst++ = ISO_CODE_SS2; \
4ed46869
KH
2294 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2295 } while (0)
2296
fb88bf2d
KH
2297#define ENCODE_SINGLE_SHIFT_3 \
2298 do { \
4ed46869 2299 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
fb88bf2d
KH
2300 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
2301 else \
b73bfc1c 2302 *dst++ = ISO_CODE_SS3; \
4ed46869
KH
2303 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2304 } while (0)
2305
2306/* The following four macros produce codes (control character or
2307 escape sequence) for ISO2022 locking-shift functions (shift-in,
2308 shift-out, locking-shift-2, and locking-shift-3). */
2309
b73bfc1c
KH
2310#define ENCODE_SHIFT_IN \
2311 do { \
2312 *dst++ = ISO_CODE_SI; \
4ed46869
KH
2313 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
2314 } while (0)
2315
b73bfc1c
KH
2316#define ENCODE_SHIFT_OUT \
2317 do { \
2318 *dst++ = ISO_CODE_SO; \
4ed46869
KH
2319 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
2320 } while (0)
2321
2322#define ENCODE_LOCKING_SHIFT_2 \
2323 do { \
2324 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
2325 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
2326 } while (0)
2327
b73bfc1c
KH
2328#define ENCODE_LOCKING_SHIFT_3 \
2329 do { \
2330 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
4ed46869
KH
2331 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
2332 } while (0)
2333
f4dee582
RS
2334/* Produce codes for a DIMENSION1 character whose character set is
2335 CHARSET and whose position-code is C1. Designation and invocation
4ed46869
KH
2336 sequences are also produced in advance if necessary. */
2337
6e85d753
KH
2338#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
2339 do { \
2340 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2341 { \
2342 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2343 *dst++ = c1 & 0x7F; \
2344 else \
2345 *dst++ = c1 | 0x80; \
2346 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2347 break; \
2348 } \
2349 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2350 { \
2351 *dst++ = c1 & 0x7F; \
2352 break; \
2353 } \
2354 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2355 { \
2356 *dst++ = c1 | 0x80; \
2357 break; \
2358 } \
6e85d753
KH
2359 else \
2360 /* Since CHARSET is not yet invoked to any graphic planes, we \
2361 must invoke it, or, at first, designate it to some graphic \
2362 register. Then repeat the loop to actually produce the \
2363 character. */ \
2364 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
2365 } while (1)
2366
f4dee582
RS
2367/* Produce codes for a DIMENSION2 character whose character set is
2368 CHARSET and whose position-codes are C1 and C2. Designation and
4ed46869
KH
2369 invocation codes are also produced in advance if necessary. */
2370
6e85d753
KH
2371#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
2372 do { \
2373 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2374 { \
2375 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2376 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
2377 else \
2378 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
2379 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2380 break; \
2381 } \
2382 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2383 { \
2384 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
2385 break; \
2386 } \
2387 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2388 { \
2389 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
2390 break; \
2391 } \
6e85d753
KH
2392 else \
2393 /* Since CHARSET is not yet invoked to any graphic planes, we \
2394 must invoke it, or, at first, designate it to some graphic \
2395 register. Then repeat the loop to actually produce the \
2396 character. */ \
2397 dst = encode_invocation_designation (charset, coding, dst); \
4ed46869
KH
2398 } while (1)
2399
05e6f5dc
KH
2400#define ENCODE_ISO_CHARACTER(c) \
2401 do { \
2402 int charset, c1, c2; \
2403 \
2404 SPLIT_CHAR (c, charset, c1, c2); \
2405 if (CHARSET_DEFINED_P (charset)) \
2406 { \
2407 if (CHARSET_DIMENSION (charset) == 1) \
2408 { \
2409 if (charset == CHARSET_ASCII \
2410 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
2411 charset = charset_latin_jisx0201; \
2412 ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
2413 } \
2414 else \
2415 { \
2416 if (charset == charset_jisx0208 \
2417 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
2418 charset = charset_jisx0208_1978; \
2419 ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
2420 } \
2421 } \
2422 else \
2423 { \
2424 *dst++ = c1; \
2425 if (c2 >= 0) \
2426 *dst++ = c2; \
2427 } \
2428 } while (0)
2429
2430
2431/* Instead of encoding character C, produce one or two `?'s. */
2432
0eecad43
KH
2433#define ENCODE_UNSAFE_CHARACTER(c) \
2434 do { \
2435 ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
2436 if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
2437 ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
84fbb8a0 2438 } while (0)
bdd9fb48 2439
05e6f5dc 2440
4ed46869
KH
2441/* Produce designation and invocation codes at the place pointed to by DST
2442 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
2443 Return new DST. */
2444
2445unsigned char *
2446encode_invocation_designation (charset, coding, dst)
2447 int charset;
2448 struct coding_system *coding;
2449 unsigned char *dst;
2450{
2451 int reg; /* graphic register number */
2452
2453 /* First, check designations. */
2454 for (reg = 0; reg < 4; reg++)
2455 if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
2456 break;
2457
2458 if (reg >= 4)
2459 {
2460 /* CHARSET is not yet designated to any graphic registers. */
2461 /* First, check the requested designation. */
2462 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1ba9e4ab
KH
2463 if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
2464 /* Since CHARSET requests no special designation, designate it
2465 to graphic register 0. */
4ed46869
KH
2466 reg = 0;
2467
2468 ENCODE_DESIGNATION (charset, reg, coding);
2469 }
2470
2471 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
2472 && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
2473 {
2474 /* Since the graphic register REG is not invoked to any graphic
2475 planes, invoke it to graphic plane 0. */
2476 switch (reg)
2477 {
2478 case 0: /* graphic register 0 */
2479 ENCODE_SHIFT_IN;
2480 break;
2481
2482 case 1: /* graphic register 1 */
2483 ENCODE_SHIFT_OUT;
2484 break;
2485
2486 case 2: /* graphic register 2 */
2487 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2488 ENCODE_SINGLE_SHIFT_2;
2489 else
2490 ENCODE_LOCKING_SHIFT_2;
2491 break;
2492
2493 case 3: /* graphic register 3 */
2494 if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
2495 ENCODE_SINGLE_SHIFT_3;
2496 else
2497 ENCODE_LOCKING_SHIFT_3;
2498 break;
2499 }
2500 }
b73bfc1c 2501
4ed46869
KH
2502 return dst;
2503}
2504
ec6d2bb8
KH
2505/* Produce 2-byte codes for encoded composition rule RULE. */
2506
2507#define ENCODE_COMPOSITION_RULE(rule) \
2508 do { \
2509 int gref, nref; \
2510 COMPOSITION_DECODE_RULE (rule, gref, nref); \
2511 *dst++ = 32 + 81 + gref; \
2512 *dst++ = 32 + nref; \
2513 } while (0)
2514
2515/* Produce codes for indicating the start of a composition sequence
2516 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
2517 which specify information about the composition. See the comment
2518 in coding.h for the format of DATA. */
2519
2520#define ENCODE_COMPOSITION_START(coding, data) \
2521 do { \
2522 coding->composing = data[3]; \
2523 *dst++ = ISO_CODE_ESC; \
2524 if (coding->composing == COMPOSITION_RELATIVE) \
2525 *dst++ = '0'; \
2526 else \
2527 { \
2528 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
2529 ? '3' : '4'); \
2530 coding->cmp_data_index = coding->cmp_data_start + 4; \
2531 coding->composition_rule_follows = 0; \
2532 } \
2533 } while (0)
2534
2535/* Produce codes for indicating the end of the current composition. */
2536
2537#define ENCODE_COMPOSITION_END(coding, data) \
2538 do { \
2539 *dst++ = ISO_CODE_ESC; \
2540 *dst++ = '1'; \
2541 coding->cmp_data_start += data[0]; \
2542 coding->composing = COMPOSITION_NO; \
2543 if (coding->cmp_data_start == coding->cmp_data->used \
2544 && coding->cmp_data->next) \
2545 { \
2546 coding->cmp_data = coding->cmp_data->next; \
2547 coding->cmp_data_start = 0; \
2548 } \
2549 } while (0)
2550
2551/* Produce composition start sequence ESC 0. Here, this sequence
2552 doesn't mean the start of a new composition but means that we have
2553 just produced components (alternate chars and composition rules) of
2554 the composition and the actual text follows in SRC. */
2555
2556#define ENCODE_COMPOSITION_FAKE_START(coding) \
2557 do { \
2558 *dst++ = ISO_CODE_ESC; \
2559 *dst++ = '0'; \
2560 coding->composing = COMPOSITION_RELATIVE; \
2561 } while (0)
4ed46869
KH
2562
2563/* The following three macros produce codes for indicating direction
2564 of text. */
b73bfc1c
KH
2565#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
2566 do { \
4ed46869 2567 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
b73bfc1c
KH
2568 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
2569 else \
2570 *dst++ = ISO_CODE_CSI; \
4ed46869
KH
2571 } while (0)
2572
2573#define ENCODE_DIRECTION_R2L \
b73bfc1c 2574 do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '2'; *dst++ = ']'; } while (0)
4ed46869
KH
2575
2576#define ENCODE_DIRECTION_L2R \
b73bfc1c 2577 do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '0'; *dst++ = ']'; } while (0)
4ed46869
KH
2578
2579/* Produce codes for designation and invocation to reset the graphic
2580 planes and registers to initial state. */
e0e989f6
KH
2581#define ENCODE_RESET_PLANE_AND_REGISTER \
2582 do { \
2583 int reg; \
2584 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
2585 ENCODE_SHIFT_IN; \
2586 for (reg = 0; reg < 4; reg++) \
2587 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
2588 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
2589 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
2590 ENCODE_DESIGNATION \
2591 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
4ed46869
KH
2592 } while (0)
2593
bdd9fb48 2594/* Produce designation sequences for the charsets in the line starting at
b73bfc1c 2595 SRC at the place pointed to by DST, and return the updated DST.
bdd9fb48
KH
2596
2597 If the current block ends before any end-of-line, we may fail to
d46c5b12
KH
2598 find all the necessary designations. */
2599
b73bfc1c
KH
2600static unsigned char *
2601encode_designation_at_bol (coding, translation_table, src, src_end, dst)
e0e989f6 2602 struct coding_system *coding;
b73bfc1c 2603 Lisp_Object translation_table;
5bdca8af
DN
2604 const unsigned char *src, *src_end;
2605 unsigned char *dst;
e0e989f6 2606{
bdd9fb48
KH
2607 int charset, c, found = 0, reg;
2608 /* Table of charsets to be designated to each graphic register. */
2609 int r[4];
bdd9fb48
KH
2610
2611 for (reg = 0; reg < 4; reg++)
2612 r[reg] = -1;
2613
b73bfc1c 2614 while (found < 4)
e0e989f6 2615 {
b73bfc1c
KH
2616 ONE_MORE_CHAR (c);
2617 if (c == '\n')
2618 break;
93dec019 2619
b73bfc1c 2620 charset = CHAR_CHARSET (c);
e0e989f6 2621 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
d46c5b12 2622 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
bdd9fb48
KH
2623 {
2624 found++;
2625 r[reg] = charset;
2626 }
bdd9fb48
KH
2627 }
2628
b73bfc1c 2629 label_end_of_loop:
bdd9fb48
KH
2630 if (found)
2631 {
2632 for (reg = 0; reg < 4; reg++)
2633 if (r[reg] >= 0
2634 && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
2635 ENCODE_DESIGNATION (r[reg], reg, coding);
e0e989f6 2636 }
b73bfc1c
KH
2637
2638 return dst;
e0e989f6
KH
2639}
2640
4ed46869
KH
2641/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
2642
b73bfc1c 2643static void
d46c5b12 2644encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
4ed46869 2645 struct coding_system *coding;
5bdca8af
DN
2646 const unsigned char *source;
2647 unsigned char *destination;
4ed46869 2648 int src_bytes, dst_bytes;
4ed46869 2649{
5bdca8af
DN
2650 const unsigned char *src = source;
2651 const unsigned char *src_end = source + src_bytes;
4ed46869
KH
2652 unsigned char *dst = destination;
2653 unsigned char *dst_end = destination + dst_bytes;
b73bfc1c 2654 /* Since at most 20 bytes are produced in each loop iteration, we subtract 19
4ed46869
KH
2655 from DST_END so that overflow checking is necessary only at the
2656 head of the loop. */
b73bfc1c
KH
2657 unsigned char *adjusted_dst_end = dst_end - 19;
2658 /* SRC_BASE remembers the start position in source in each loop.
2659 The loop will be exited when there's not enough source text to
2660 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2661 there's not enough destination area to produce encoded codes
2662 (within macro EMIT_BYTES). */
5bdca8af 2663 const unsigned char *src_base;
b73bfc1c
KH
2664 int c;
2665 Lisp_Object translation_table;
05e6f5dc
KH
2666 Lisp_Object safe_chars;
2667
0eecad43
KH
2668 if (coding->flags & CODING_FLAG_ISO_SAFE)
2669 coding->mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
2670
6b89e3aa 2671 safe_chars = coding_safe_chars (coding->symbol);
bdd9fb48 2672
b73bfc1c
KH
2673 if (NILP (Venable_character_translation))
2674 translation_table = Qnil;
2675 else
2676 {
2677 translation_table = coding->translation_table_for_encode;
2678 if (NILP (translation_table))
2679 translation_table = Vstandard_translation_table_for_encode;
2680 }
4ed46869 2681
d46c5b12 2682 coding->consumed_char = 0;
b73bfc1c
KH
2683 coding->errors = 0;
2684 while (1)
4ed46869 2685 {
b73bfc1c
KH
2686 src_base = src;
2687
2688 if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
2689 {
2690 coding->result = CODING_FINISH_INSUFFICIENT_DST;
2691 break;
2692 }
4ed46869 2693
e0e989f6
KH
2694 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
2695 && CODING_SPEC_ISO_BOL (coding))
2696 {
bdd9fb48 2697 /* We have to produce designation sequences if any now. */
b73bfc1c
KH
2698 dst = encode_designation_at_bol (coding, translation_table,
2699 src, src_end, dst);
e0e989f6
KH
2700 CODING_SPEC_ISO_BOL (coding) = 0;
2701 }
2702
ec6d2bb8
KH
2703 /* Check composition start and end. */
2704 if (coding->composing != COMPOSITION_DISABLED
2705 && coding->cmp_data_start < coding->cmp_data->used)
4ed46869 2706 {
ec6d2bb8
KH
2707 struct composition_data *cmp_data = coding->cmp_data;
2708 int *data = cmp_data->data + coding->cmp_data_start;
2709 int this_pos = cmp_data->char_offset + coding->consumed_char;
2710
2711 if (coding->composing == COMPOSITION_RELATIVE)
4ed46869 2712 {
ec6d2bb8
KH
2713 if (this_pos == data[2])
2714 {
2715 ENCODE_COMPOSITION_END (coding, data);
2716 cmp_data = coding->cmp_data;
2717 data = cmp_data->data + coding->cmp_data_start;
2718 }
4ed46869 2719 }
ec6d2bb8 2720 else if (COMPOSING_P (coding))
4ed46869 2721 {
ec6d2bb8
KH
2722 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHAR */
2723 if (coding->cmp_data_index == coding->cmp_data_start + data[0])
2724 /* We have consumed components of the composition.
8ca3766a 2725 What follows in SRC is the composition's base
ec6d2bb8
KH
2726 text. */
2727 ENCODE_COMPOSITION_FAKE_START (coding);
2728 else
4ed46869 2729 {
ec6d2bb8
KH
2730 int c = cmp_data->data[coding->cmp_data_index++];
2731 if (coding->composition_rule_follows)
2732 {
2733 ENCODE_COMPOSITION_RULE (c);
2734 coding->composition_rule_follows = 0;
2735 }
2736 else
2737 {
0eecad43 2738 if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
05e6f5dc
KH
2739 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2740 ENCODE_UNSAFE_CHARACTER (c);
2741 else
2742 ENCODE_ISO_CHARACTER (c);
ec6d2bb8
KH
2743 if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
2744 coding->composition_rule_follows = 1;
2745 }
4ed46869
KH
2746 continue;
2747 }
ec6d2bb8
KH
2748 }
2749 if (!COMPOSING_P (coding))
2750 {
2751 if (this_pos == data[1])
4ed46869 2752 {
ec6d2bb8
KH
2753 ENCODE_COMPOSITION_START (coding, data);
2754 continue;
4ed46869 2755 }
4ed46869
KH
2756 }
2757 }
ec6d2bb8 2758
b73bfc1c 2759 ONE_MORE_CHAR (c);
4ed46869 2760
b73bfc1c
KH
2761 /* Now encode the character C. */
2762 if (c < 0x20 || c == 0x7F)
2763 {
2764 if (c == '\r')
19a8d9e0 2765 {
b73bfc1c
KH
2766 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2767 {
2768 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2769 ENCODE_RESET_PLANE_AND_REGISTER;
2770 *dst++ = c;
2771 continue;
2772 }
2773 /* fall down to treat '\r' as '\n' ... */
2774 c = '\n';
19a8d9e0 2775 }
b73bfc1c 2776 if (c == '\n')
19a8d9e0 2777 {
b73bfc1c
KH
2778 if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
2779 ENCODE_RESET_PLANE_AND_REGISTER;
2780 if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
2781 bcopy (coding->spec.iso2022.initial_designation,
2782 coding->spec.iso2022.current_designation,
2783 sizeof coding->spec.iso2022.initial_designation);
2784 if (coding->eol_type == CODING_EOL_LF
2785 || coding->eol_type == CODING_EOL_UNDECIDED)
2786 *dst++ = ISO_CODE_LF;
2787 else if (coding->eol_type == CODING_EOL_CRLF)
2788 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
2789 else
2790 *dst++ = ISO_CODE_CR;
2791 CODING_SPEC_ISO_BOL (coding) = 1;
19a8d9e0 2792 }
93dec019 2793 else
19a8d9e0 2794 {
b73bfc1c
KH
2795 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
2796 ENCODE_RESET_PLANE_AND_REGISTER;
2797 *dst++ = c;
19a8d9e0 2798 }
4ed46869 2799 }
b73bfc1c 2800 else if (ASCII_BYTE_P (c))
05e6f5dc 2801 ENCODE_ISO_CHARACTER (c);
b73bfc1c 2802 else if (SINGLE_BYTE_CHAR_P (c))
88993dfd 2803 {
b73bfc1c
KH
2804 *dst++ = c;
2805 coding->errors++;
88993dfd 2806 }
0eecad43 2807 else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
05e6f5dc
KH
2808 && ! CODING_SAFE_CHAR_P (safe_chars, c))
2809 ENCODE_UNSAFE_CHARACTER (c);
b73bfc1c 2810 else
05e6f5dc 2811 ENCODE_ISO_CHARACTER (c);
b73bfc1c
KH
2812
2813 coding->consumed_char++;
84fbb8a0 2814 }
b73bfc1c
KH
2815
2816 label_end_of_loop:
2817 coding->consumed = src_base - source;
d46c5b12 2818 coding->produced = coding->produced_char = dst - destination;
4ed46869
KH
2819}
2820
2821\f
/*** 4. SJIS and BIG5 handlers ***/

/* Although SJIS and BIG5 are not ISO coding systems, they are used
   quite widely.  So, for the moment, Emacs supports them directly in
   C code.  In the future, they may be supported only by CCL.  */

/* SJIS is a coding system encoding three character sets: ASCII, the
   right half of JISX0201-Kana, and JISX0208.  An ASCII character is
   encoded as is.  A character of charset katakana-jisx0201 is encoded
   as "position-code + 0x80".  A character of charset
   japanese-jisx0208 is encoded in two bytes, with its two
   position-codes divided and shifted so that they fit in the ranges
   below.

   --- CODE RANGE of SJIS ---
   (character set)      (range)
   ASCII                0x00 .. 0x7F
   KATAKANA-JISX0201    0xA1 .. 0xDF
   JISX0208 (1st byte)  0x81 .. 0x9F and 0xE0 .. 0xEF
            (2nd byte)  0x40 .. 0x7E and 0x80 .. 0xFC
   -------------------------------

*/
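
/* Illustration only, not part of coding.c: a minimal standalone
   sketch of the SJIS code-range table above.  The names
   `sjis_lead_kind', `sjis_classify_lead' and `sjis_trail_byte_p' are
   hypothetical and exist only for this example.  */

```c
#include <assert.h>

/* Hypothetical classification of the first byte of an SJIS code,
   following the code-range table.  */
enum sjis_lead_kind
{
  SJIS_LEAD_ASCII,     /* 0x00 .. 0x7F, encoded as is */
  SJIS_LEAD_KATAKANA,  /* 0xA1 .. 0xDF, one-byte JISX0201-Kana */
  SJIS_LEAD_JISX0208,  /* 0x81 .. 0x9F and 0xE0 .. 0xEF, two bytes */
  SJIS_LEAD_INVALID    /* 0x80, 0xA0 and 0xF0 .. 0xFF */
};

static enum sjis_lead_kind
sjis_classify_lead (unsigned int b)
{
  assert (b <= 0xFF);
  if (b < 0x80)
    return SJIS_LEAD_ASCII;
  if (b == 0x80 || b == 0xA0 || b > 0xEF)
    return SJIS_LEAD_INVALID;
  if (b <= 0x9F || b >= 0xE0)
    return SJIS_LEAD_JISX0208;
  return SJIS_LEAD_KATAKANA;
}

/* Return nonzero if B may be the second byte of a JISX0208
   character: 0x40 .. 0x7E or 0x80 .. 0xFC.  */
static int
sjis_trail_byte_p (unsigned int b)
{
  return b >= 0x40 && b != 0x7F && b <= 0xFC;
}
```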

/* BIG5 is a coding system encoding two character sets: ASCII and
   Big5.  An ASCII character is encoded as is.  Big5 is a two-byte
   character set and is encoded in two bytes.

   --- CODE RANGE of BIG5 ---
   (character set)      (range)
   ASCII                0x00 .. 0x7F
   Big5 (1st byte)      0xA1 .. 0xFE
        (2nd byte)      0x40 .. 0x7E and 0xA1 .. 0xFE
   --------------------------

   Since Big5 contains more characters than a single Emacs charset
   can hold (at most 96x96), it can't be handled as one charset.  So,
   in Emacs, Big5 is divided into two: `charset-big5-1' and
   `charset-big5-2'.  Both are DIMENSION2 and CHARS94.  The former
   contains frequently used characters and the latter contains less
   frequently used characters.  */

/* Macros to decode or encode a character of Big5 in BIG5.  B1 and B2
   are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
   C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
   format.  CHARSET is `charset_big5_1' or `charset_big5_2'.  */

/* Number of Big5 characters which have the same code in 1st byte.  */
#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)

#define DECODE_BIG5(b1, b2, charset, c1, c2)                            \
  do {                                                                  \
    unsigned int temp                                                   \
      = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);   \
    if (b1 < 0xC9)                                                      \
      charset = charset_big5_1;                                         \
    else                                                                \
      {                                                                 \
        charset = charset_big5_2;                                       \
        temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW;                          \
      }                                                                 \
    c1 = temp / (0xFF - 0xA1) + 0x21;                                   \
    c2 = temp % (0xFF - 0xA1) + 0x21;                                   \
  } while (0)

#define ENCODE_BIG5(charset, c1, c2, b1, b2)                            \
  do {                                                                  \
    unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21);      \
    if (charset == charset_big5_2)                                      \
      temp += BIG5_SAME_ROW * (0xC9 - 0xA1);                            \
    b1 = temp / BIG5_SAME_ROW + 0xA1;                                   \
    b2 = temp % BIG5_SAME_ROW;                                          \
    b2 += b2 < 0x3F ? 0x40 : 0x62;                                      \
  } while (0)
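
/* Illustration only, not part of coding.c: the arithmetic of
   DECODE_BIG5 and ENCODE_BIG5 written as plain functions so that a
   round trip can be checked in isolation.  `struct big5_pos',
   `big5_decode' and `big5_encode' are hypothetical names; LEVEL 1
   and 2 stand in for `charset_big5_1' and `charset_big5_2'.  */

```c
#include <assert.h>

/* 63 second bytes in 0x40 .. 0x7E plus 94 in 0xA1 .. 0xFE.  */
enum { BIG5_ROW = 157 };

/* LEVEL is 1 or 2; C1 and C2 are position-codes in 0x21 .. 0x7E.  */
struct big5_pos
{
  int level;
  int c1, c2;
};

static struct big5_pos
big5_decode (unsigned int b1, unsigned int b2)
{
  struct big5_pos p;
  unsigned int temp;

  assert (b1 >= 0xA1 && b1 <= 0xFE);
  assert ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));

  /* Linear index of the character within the whole Big5 table.  */
  temp = (b1 - 0xA1) * BIG5_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);
  if (b1 < 0xC9)
    p.level = 1;
  else
    {
      p.level = 2;
      temp -= (0xC9 - 0xA1) * BIG5_ROW;
    }
  /* Re-split the index into a 94x94 grid.  */
  p.c1 = temp / 94 + 0x21;
  p.c2 = temp % 94 + 0x21;
  return p;
}

static void
big5_encode (struct big5_pos p, unsigned int *b1, unsigned int *b2)
{
  unsigned int temp = (p.c1 - 0x21) * 94 + (p.c2 - 0x21);

  if (p.level == 2)
    temp += BIG5_ROW * (0xC9 - 0xA1);
  *b1 = temp / BIG5_ROW + 0xA1;
  *b2 = temp % BIG5_ROW;
  *b2 += *b2 < 0x3F ? 0x40 : 0x62;
}
```

/* For instance, the Big5 code 0xA4 0x40 has linear index 3 * 157 = 471,
   which splits into C1 = 0x26 and C2 = 0x22 at level 1; encoding that
   position yields 0xA4 0x40 again.  */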

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in SJIS.  If it is, return
   CODING_CATEGORY_MASK_SJIS, else return 0.  */

static int
detect_coding_sjis (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  int c;
  /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  while (1)
    {
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (c < 0x80)
        continue;
      if (c == 0x80 || c == 0xA0 || c > 0xEF)
        return 0;
      if (c <= 0x9F || c >= 0xE0)
        {
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c < 0x40 || c == 0x7F || c > 0xFC)
            return 0;
        }
    }
 label_end_of_loop:
  return CODING_CATEGORY_MASK_SJIS;
}

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in BIG5.  If it is, return
   CODING_CATEGORY_MASK_BIG5, else return 0.  */

static int
detect_coding_big5 (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  int c;
  /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  while (1)
    {
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (c < 0x80)
        continue;
      if (c < 0xA1 || c > 0xFE)
        return 0;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      /* The 2nd byte must be in 0x40 .. 0x7E or 0xA1 .. 0xFE.  */
      if (c < 0x40 || (c > 0x7E && c < 0xA1) || c > 0xFE)
        return 0;
    }
 label_end_of_loop:
  return CODING_CATEGORY_MASK_BIG5;
}

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-8.  If it is, return
   CODING_CATEGORY_MASK_UTF_8, else return 0.  */

#define UTF_8_1_OCTET_P(c)         ((c) < 0x80)
#define UTF_8_EXTRA_OCTET_P(c)     (((c) & 0xC0) == 0x80)
#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)

static int
detect_coding_utf_8 (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  unsigned char c;
  int seq_maybe_bytes;
  /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  while (1)
    {
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (UTF_8_1_OCTET_P (c))
        continue;
      else if (UTF_8_2_OCTET_LEADING_P (c))
        seq_maybe_bytes = 1;
      else if (UTF_8_3_OCTET_LEADING_P (c))
        seq_maybe_bytes = 2;
      else if (UTF_8_4_OCTET_LEADING_P (c))
        seq_maybe_bytes = 3;
      else if (UTF_8_5_OCTET_LEADING_P (c))
        seq_maybe_bytes = 4;
      else if (UTF_8_6_OCTET_LEADING_P (c))
        seq_maybe_bytes = 5;
      else
        return 0;

      do
        {
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (!UTF_8_EXTRA_OCTET_P (c))
            return 0;
          seq_maybe_bytes--;
        }
      while (seq_maybe_bytes > 0);
    }

 label_end_of_loop:
  return CODING_CATEGORY_MASK_UTF_8;
}
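
/* Illustration only, not part of coding.c: a standalone sketch of the
   same structural check without the Emacs macros.  The function name
   `utf8_structure_ok' is hypothetical.  Like the detector above, it
   accepts the historical 5- and 6-octet forms, accepts a sequence cut
   short by the end of the buffer, and does not reject overlong forms
   or surrogates, so it is a plausibility test rather than a strict
   validator.  */

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every lead octet in S[0 .. LEN) is followed by the
   expected number of 10xxxxxx octets (or by the end of the buffer),
   else return 0.  */
static int
utf8_structure_ok (const unsigned char *s, size_t len)
{
  size_t i = 0;

  assert (s != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = s[i++];
      int extra;

      if (c < 0x80)
        continue;
      else if ((c & 0xE0) == 0xC0)
        extra = 1;
      else if ((c & 0xF0) == 0xE0)
        extra = 2;
      else if ((c & 0xF8) == 0xF0)
        extra = 3;
      else if ((c & 0xFC) == 0xF8)
        extra = 4;
      else if ((c & 0xFE) == 0xFC)
        extra = 5;
      else
        return 0;               /* stray 10xxxxxx, 0xFE or 0xFF */

      while (extra > 0 && i < len)
        {
          if ((s[i++] & 0xC0) != 0x80)
            return 0;
          extra--;
        }
    }
  return 1;
}
```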

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-16 by looking for a byte order
   mark in its first two bytes.  If the mark indicates Big Endian,
   return CODING_CATEGORY_MASK_UTF_16_BE; if it indicates Little
   Endian, return CODING_CATEGORY_MASK_UTF_16_LE; else return 0.  */

#define UTF_16_INVALID_P(val)   \
  (((val) == 0xFFFE)            \
   || ((val) == 0xFFFF))

/* High surrogates are 0xD800 .. 0xDBFF and low surrogates are
   0xDC00 .. 0xDFFF, so the top six bits identify each range.  */
#define UTF_16_HIGH_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xD800)

#define UTF_16_LOW_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xDC00)

static int
detect_coding_utf_16 (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  unsigned char c1, c2;
  /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
  ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep);

  if ((c1 == 0xFF) && (c2 == 0xFE))
    return CODING_CATEGORY_MASK_UTF_16_LE;
  else if ((c1 == 0xFE) && (c2 == 0xFF))
    return CODING_CATEGORY_MASK_UTF_16_BE;

 label_end_of_loop:
  return 0;
}
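
/* Illustration only, not part of coding.c: a standalone sketch of
   byte-order-mark detection together with surrogate predicates.  All
   names (`utf16_bom', `utf16_detect_bom', `utf16_high_surrogate_p',
   `utf16_low_surrogate_p') are hypothetical.  The predicates use the
   mask 0xFC00, which isolates the top six bits of a 16-bit code
   unit.  */

```c
#include <assert.h>
#include <stddef.h>

enum utf16_bom { UTF16_BOM_NONE, UTF16_BOM_LE, UTF16_BOM_BE };

/* Inspect the first two bytes of S for U+FEFF in either byte order.  */
static enum utf16_bom
utf16_detect_bom (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  if (len < 2)
    return UTF16_BOM_NONE;
  if (s[0] == 0xFF && s[1] == 0xFE)
    return UTF16_BOM_LE;
  if (s[0] == 0xFE && s[1] == 0xFF)
    return UTF16_BOM_BE;
  return UTF16_BOM_NONE;
}

/* Nonzero if VAL is in 0xD800 .. 0xDBFF.  */
static int
utf16_high_surrogate_p (unsigned int val)
{
  return (val & 0xFC00) == 0xD800;
}

/* Nonzero if VAL is in 0xDC00 .. 0xDFFF.  */
static int
utf16_low_surrogate_p (unsigned int val)
{
  return (val & 0xFC00) == 0xDC00;
}
```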

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
   If SJIS_P is 1, decode SJIS text, else decode BIG5 text.  */

static void
decode_coding_sjis_big5 (coding, source, destination,
                         src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
     int sjis_p;
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
     EMIT_CHAR).  */
  const unsigned char *src_base;
  Lisp_Object translation_table;

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
  else
    {
      translation_table = coding->translation_table_for_decode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_decode;
    }

  coding->produced_char = 0;
  while (1)
    {
      int c, charset, c1, c2 = 0;

      src_base = src;
      ONE_MORE_BYTE (c1);

      if (c1 < 0x80)
        {
          charset = CHARSET_ASCII;
          if (c1 < 0x20)
            {
              if (c1 == '\r')
                {
                  if (coding->eol_type == CODING_EOL_CRLF)
                    {
                      ONE_MORE_BYTE (c2);
                      if (c2 == '\n')
                        c1 = c2;
                      else
                        /* To process C2 again, SRC is decremented by 1.  */
                        src--;
                    }
                  else if (coding->eol_type == CODING_EOL_CR)
                    c1 = '\n';
                }
              else if (c1 == '\n'
                       && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                       && (coding->eol_type == CODING_EOL_CR
                           || coding->eol_type == CODING_EOL_CRLF))
                {
                  coding->result = CODING_FINISH_INCONSISTENT_EOL;
                  goto label_end_of_loop;
                }
            }
        }
      else
        {
          if (sjis_p)
            {
              if (c1 == 0x80 || c1 == 0xA0 || c1 > 0xEF)
                goto label_invalid_code;
              if (c1 <= 0x9F || c1 >= 0xE0)
                {
                  /* SJIS -> JISX0208 */
                  ONE_MORE_BYTE (c2);
                  if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
                    goto label_invalid_code;
                  DECODE_SJIS (c1, c2, c1, c2);
                  charset = charset_jisx0208;
                }
              else
                /* SJIS -> JISX0201-Kana */
                charset = charset_katakana_jisx0201;
            }
          else
            {
              /* BIG5 -> Big5.  The 1st byte must be in 0xA1 .. 0xFE;
                 DECODE_BIG5 subtracts 0xA1 from it.  */
              if (c1 < 0xA1 || c1 > 0xFE)
                goto label_invalid_code;
              ONE_MORE_BYTE (c2);
              if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
                goto label_invalid_code;
              DECODE_BIG5 (c1, c2, charset, c1, c2);
            }
        }

      c = DECODE_ISO_CHARACTER (charset, c1, c2);
      EMIT_CHAR (c);
      continue;

    label_invalid_code:
      coding->errors++;
      src = src_base;
      c = *src++;
      EMIT_CHAR (c);
    }

 label_end_of_loop:
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
  return;
}

/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
   This function can encode charsets `ascii', `katakana-jisx0201',
   `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'.  We
   are sure that all these charsets are registered as official
   charsets (i.e. do not have extended leading-codes).  Characters of
   other charsets are produced without any encoding, unless
   CODING_MODE_INHIBIT_UNENCODABLE_CHAR is set, in which case they are
   replaced by CODING_REPLACEMENT_CHARACTER.  If SJIS_P is 1, encode
   SJIS text, else encode BIG5 text.  */

static void
encode_coding_sjis_big5 (coding, source, destination,
                         src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
     int sjis_p;
{
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source text to
     analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
     there's not enough destination area to produce encoded codes
     (within macro EMIT_BYTES).  */
  unsigned char *src_base;
  Lisp_Object translation_table;

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
  else
    {
      translation_table = coding->translation_table_for_encode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_encode;
    }

  while (1)
    {
      int c, charset, c1, c2;

      src_base = src;
      ONE_MORE_CHAR (c);

      /* Now encode the character C.  */
      if (SINGLE_BYTE_CHAR_P (c))
        {
          switch (c)
            {
            case '\r':
              if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
                {
                  EMIT_ONE_BYTE (c);
                  break;
                }
              c = '\n';
              /* Fall through.  */
            case '\n':
              if (coding->eol_type == CODING_EOL_CRLF)
                {
                  EMIT_TWO_BYTES ('\r', c);
                  break;
                }
              else if (coding->eol_type == CODING_EOL_CR)
                c = '\r';
              /* Fall through.  */
            default:
              EMIT_ONE_BYTE (c);
            }
        }
      else
        {
          SPLIT_CHAR (c, charset, c1, c2);
          if (sjis_p)
            {
              if (charset == charset_jisx0208
                  || charset == charset_jisx0208_1978)
                {
                  ENCODE_SJIS (c1, c2, c1, c2);
                  EMIT_TWO_BYTES (c1, c2);
                }
              else if (charset == charset_katakana_jisx0201)
                EMIT_ONE_BYTE (c1 | 0x80);
              else if (charset == charset_latin_jisx0201)
                EMIT_ONE_BYTE (c1);
              else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
                {
                  EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
                  if (CHARSET_WIDTH (charset) > 1)
                    EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
                }
              else
                /* There's no way other than producing the internal
                   codes as is.  */
                EMIT_BYTES (src_base, src);
            }
          else
            {
              if (charset == charset_big5_1 || charset == charset_big5_2)
                {
                  ENCODE_BIG5 (charset, c1, c2, c1, c2);
                  EMIT_TWO_BYTES (c1, c2);
                }
              else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
                {
                  EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
                  if (CHARSET_WIDTH (charset) > 1)
                    EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
                }
              else
                /* There's no way other than producing the internal
                   codes as is.  */
                EMIT_BYTES (src_base, src);
            }
        }
      coding->consumed_char++;
    }

 label_end_of_loop:
  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;
}

3289\f
/*** 5. CCL handlers ***/

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in a coding system whose encoder and
   decoder are written as CCL programs.  If it is, return
   CODING_CATEGORY_MASK_CCL, else return 0.  */

static int
detect_coding_ccl (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  unsigned char *valid;
  int c;
  /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  /* No coding system is assigned to coding-category-ccl.  */
  if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
    return 0;

  valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
  while (1)
    {
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (! valid[c])
        return 0;
    }
 label_end_of_loop:
  return CODING_CATEGORY_MASK_CCL;
}

3323\f
/*** 6. End-of-line handlers ***/

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

static void
decode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
{
  const unsigned char *src = source;
  unsigned char *dst = destination;
  const unsigned char *src_end = src + src_bytes;
  unsigned char *dst_end = dst + dst_bytes;
  Lisp_Object translation_table;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
     EMIT_CHAR).  */
  const unsigned char *src_base;
  int c;

  translation_table = Qnil;
  switch (coding->eol_type)
    {
    case CODING_EOL_CRLF:
      while (1)
        {
          src_base = src;
          ONE_MORE_BYTE (c);
          if (c == '\r')
            {
              ONE_MORE_BYTE (c);
              if (c != '\n')
                {
                  src--;
                  c = '\r';
                }
            }
          else if (c == '\n'
                   && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
            {
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
            }
          EMIT_CHAR (c);
        }
      break;

    case CODING_EOL_CR:
      while (1)
        {
          src_base = src;
          ONE_MORE_BYTE (c);
          if (c == '\n')
            {
              if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                {
                  coding->result = CODING_FINISH_INCONSISTENT_EOL;
                  goto label_end_of_loop;
                }
            }
          else if (c == '\r')
            c = '\n';
          EMIT_CHAR (c);
        }
      break;

    default:                    /* no need for EOL handling */
      while (1)
        {
          src_base = src;
          ONE_MORE_BYTE (c);
          EMIT_CHAR (c);
        }
    }

 label_end_of_loop:
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
  return;
}
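
/* Illustration only, not part of coding.c: a standalone sketch of the
   CODING_EOL_CRLF branch above.  The name `crlf_to_lf' is
   hypothetical.  Unlike decode_eol, which leaves a trailing '\r'
   unconsumed until more source arrives, this sketch treats its input
   as complete and copies a lone '\r' through unchanged; it also has
   no counterpart of CODING_MODE_INHIBIT_INCONSISTENT_EOL.  */

```c
#include <assert.h>
#include <stddef.h>

/* Copy SRC[0 .. LEN) to DST, replacing each "\r\n" pair by "\n".
   DST must have room for LEN bytes.  Return the number of bytes
   written.  */
static size_t
crlf_to_lf (const unsigned char *src, size_t len, unsigned char *dst)
{
  size_t i = 0, n = 0;

  assert (src != NULL && dst != NULL);
  while (i < len)
    {
      unsigned char c = src[i++];

      if (c == '\r' && i < len && src[i] == '\n')
        {
          c = '\n';
          i++;
        }
      dst[n++] = c;
    }
  return n;
}
```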

/* See "GENERAL NOTES about `encode_coding_XXX ()' functions".  Encode
   format of end-of-line according to `coding->eol_type'.  It also
   converts multibyte form 8-bit characters to unibyte if
   CODING->src_multibyte is nonzero.  If `coding->mode &
   CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
   also means end-of-line.  */

static void
encode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
{
  const unsigned char *src = source;
  unsigned char *dst = destination;
  const unsigned char *src_end = src + src_bytes;
  unsigned char *dst_end = dst + dst_bytes;
  Lisp_Object translation_table;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source text to
     analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
     there's not enough destination area to produce encoded codes
     (within macro EMIT_BYTES).  */
  const unsigned char *src_base;
  unsigned char *tmp;
  int c;
  int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;

  translation_table = Qnil;
  if (coding->src_multibyte
      && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
    {
      src_end--;
      src_bytes--;
      coding->result = CODING_FINISH_INSUFFICIENT_SRC;
    }

  if (coding->eol_type == CODING_EOL_CRLF)
    {
      while (src < src_end)
        {
          src_base = src;
          c = *src++;
          if (c >= 0x20)
            EMIT_ONE_BYTE (c);
          else if (c == '\n' || (c == '\r' && selective_display))
            EMIT_TWO_BYTES ('\r', '\n');
          else
            EMIT_ONE_BYTE (c);
        }
      src_base = src;
    label_end_of_loop:
      ;
    }
  else
    {
      if (!dst_bytes || src_bytes <= dst_bytes)
        {
          safe_bcopy (src, dst, src_bytes);
          src_base = src_end;
          dst += src_bytes;
        }
      else
        {
          if (coding->src_multibyte
              && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
            dst_bytes--;
          safe_bcopy (src, dst, dst_bytes);
          src_base = src + dst_bytes;
          dst = destination + dst_bytes;
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
        }
      if (coding->eol_type == CODING_EOL_CR)
        {
          for (tmp = destination; tmp < dst; tmp++)
            if (*tmp == '\n') *tmp = '\r';
        }
      else if (selective_display)
        {
          for (tmp = destination; tmp < dst; tmp++)
            if (*tmp == '\r') *tmp = '\n';
        }
    }
  if (coding->src_multibyte)
    dst = destination + str_as_unibyte (destination, dst - destination);

  coding->consumed = src_base - source;
  coding->produced = dst - destination;
  coding->produced_char = coding->produced;
}

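/* Illustration only, not part of coding.c: a standalone sketch of the
   CODING_EOL_CRLF branch of encode_eol.  The name `lf_to_crlf' and
   its return convention are hypothetical.  Where encode_eol records
   CODING_FINISH_INSUFFICIENT_DST and reports how much was consumed,
   this sketch simply fails with (size_t) -1 when DST is too small.  */

```c
#include <assert.h>
#include <stddef.h>

/* Copy SRC[0 .. LEN) to DST, writing "\r\n" for each '\n' and, if
   SELECTIVE_DISPLAY is nonzero, for each '\r' as well.  Return the
   number of bytes written, or (size_t) -1 if DST_SIZE is too small.  */
static size_t
lf_to_crlf (const unsigned char *src, size_t len,
            unsigned char *dst, size_t dst_size, int selective_display)
{
  size_t i, n = 0;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      int is_eol = c == '\n' || (c == '\r' && selective_display);
      size_t need = is_eol ? 2 : 1;

      if (dst_size - n < need)
        return (size_t) -1;
      if (is_eol)
        {
          dst[n++] = '\r';
          dst[n++] = '\n';
        }
      else
        dst[n++] = c;
    }
  return n;
}
```
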
3501\f
/*** 7. C library functions ***/

/* In Emacs Lisp, a coding system is represented by a Lisp symbol which
   has a property `coding-system'.  The value of this property is a
   vector of length 5 (called the coding-vector).  Among elements of
   this vector, the first (element[0]) and the fifth (element[4])
   carry important information for decoding/encoding.  Before
   decoding/encoding, this information should be set in fields of a
   structure of type `coding_system'.

   The value of the property `coding-system' can be a symbol of another
   subsidiary coding-system.  In that case, Emacs gets the
   coding-vector from that symbol.

   `element[0]' contains information to be set in `coding->type'.  The
   values and their meanings are as follows:

   0 -- coding_type_emacs_mule
   1 -- coding_type_sjis
   2 -- coding_type_iso2022
   3 -- coding_type_big5
   4 -- coding_type_ccl encoder/decoder written in CCL
   nil -- coding_type_no_conversion
   t -- coding_type_undecided (automatic conversion on decoding,
                               no-conversion on encoding)

   `element[4]' contains information to be set in `coding->flags' and
   `coding->spec'.  The meaning varies by `coding->type'.

   If `coding->type' is `coding_type_iso2022', element[4] is a vector
   of length 32 (of which the first 13 sub-elements are used now).
   Meanings of these sub-elements are:

   sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
        If the value is an integer of valid charset, the charset is
        assumed to be designated to graphic register N initially.

        If the value is negative, it is the negated value of a charset
        which reserves graphic register N, which means that the charset
        is not designated initially but should be designated to graphic
        register N just before encoding a character in that charset.

        If the value is nil, graphic register N is never used on
        encoding.

   sub-element[N] where N is 4 through 11: to be set in `coding->flags'
        Each value takes t or nil.  See the section ISO2022 of
        `coding.h' for more information.

   If `coding->type' is `coding_type_big5', element[4] is t to denote
   BIG5-ETen or nil to denote BIG5-HKU.

   If `coding->type' takes any other value, element[4] is ignored.

   Emacs Lisp's coding systems also carry information about format of
   end-of-line in a value of property `eol-type'.  If the value is an
   integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
   means CODING_EOL_CR.  If it is not an integer, it should be a
   vector of subsidiary coding systems whose `eol-type' properties
   each have one of the above values.

*/

3565/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
3566 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
3567 is setup so that no conversion is necessary and return -1, else
3568 return 0. */
3569
int
setup_coding_system (coding_system, coding)
     Lisp_Object coding_system;
     struct coding_system *coding;
{
  Lisp_Object coding_spec, coding_type, eol_type, plist;
  Lisp_Object val;

  /* First, clear all members to zero.  */
  bzero (coding, sizeof (struct coding_system));

  /* Initialize some fields required for all kinds of coding systems.  */
  coding->symbol = coding_system;
  coding->heading_ascii = -1;
  coding->post_read_conversion = coding->pre_write_conversion = Qnil;
  coding->composing = COMPOSITION_DISABLED;
  coding->cmp_data = NULL;

  if (NILP (coding_system))
    goto label_invalid_coding_system;

  coding_spec = Fget (coding_system, Qcoding_system);

  if (!VECTORP (coding_spec)
      || XVECTOR (coding_spec)->size != 5
      || !CONSP (XVECTOR (coding_spec)->contents[3]))
    goto label_invalid_coding_system;

  eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
  if (VECTORP (eol_type))
    {
      coding->eol_type = CODING_EOL_UNDECIDED;
      coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
    }
  else if (XFASTINT (eol_type) == 1)
    {
      coding->eol_type = CODING_EOL_CRLF;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
    }
  else if (XFASTINT (eol_type) == 2)
    {
      coding->eol_type = CODING_EOL_CR;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
    }
  else
    coding->eol_type = CODING_EOL_LF;

  coding_type = XVECTOR (coding_spec)->contents[0];
  /* Try the shortcut.  */
  if (SYMBOLP (coding_type))
    {
      if (EQ (coding_type, Qt))
        {
          coding->type = coding_type_undecided;
          coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
        }
      else
        coding->type = coding_type_no_conversion;
      /* Initialize this member.  Anything other than
         CODING_CATEGORY_IDX_UTF_16_BE and
         CODING_CATEGORY_IDX_UTF_16_LE is OK because those two have
         special treatment in detect_eol.  */
      coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;

      return 0;
    }

  /* Get values of coding system properties:
     `post-read-conversion', `pre-write-conversion',
     `translation-table-for-decode', `translation-table-for-encode'.  */
  plist = XVECTOR (coding_spec)->contents[3];
  /* Pre & post conversion functions should be disabled if
     inhibit_pre_post_conversion is nonzero.  This is the case when a
     code conversion function is called while those functions are
     running.  */
  if (! inhibit_pre_post_conversion)
    {
      coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
      coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
    }
  val = Fplist_get (plist, Qtranslation_table_for_decode);
  if (SYMBOLP (val))
    val = Fget (val, Qtranslation_table_for_decode);
  coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qtranslation_table_for_encode);
  if (SYMBOLP (val))
    val = Fget (val, Qtranslation_table_for_encode);
  coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qcoding_category);
  if (!NILP (val))
    {
      val = Fget (val, Qcoding_category_index);
      if (INTEGERP (val))
        coding->category_idx = XINT (val);
      else
        goto label_invalid_coding_system;
    }
  else
    goto label_invalid_coding_system;

  /* If the coding system has a non-nil `composition' property, enable
     composition handling.  */
  val = Fplist_get (plist, Qcomposition);
  if (!NILP (val))
    coding->composing = COMPOSITION_NO;

  switch (XFASTINT (coding_type))
    {
    case 0:
      coding->type = coding_type_emacs_mule;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      if (!NILP (coding->post_read_conversion))
        coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
      if (!NILP (coding->pre_write_conversion))
        coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
      break;

    case 1:
      coding->type = coding_type_sjis;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      break;

    case 2:
      coding->type = coding_type_iso2022;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      {
        Lisp_Object val, temp;
        Lisp_Object *flags;
        int i, charset, reg_bits = 0;

        val = XVECTOR (coding_spec)->contents[4];

        if (!VECTORP (val) || XVECTOR (val)->size != 32)
          goto label_invalid_coding_system;

        flags = XVECTOR (val)->contents;
        coding->flags
          = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
             | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
             | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
             | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
             | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
             | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
             | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
             | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
             | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
             | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
             | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
             | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
             | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
             );

        /* Invoke graphic register 0 to plane 0.  */
        CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
        /* Invoke graphic register 1 to plane 1 if we can use full 8-bit.  */
        CODING_SPEC_ISO_INVOCATION (coding, 1)
          = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
        /* Not single shifting at first.  */
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
        /* Beginning of buffer should also be regarded as bol.  */
        CODING_SPEC_ISO_BOL (coding) = 1;

        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
        val = Vcharset_revision_alist;
        while (CONSP (val))
          {
            charset = get_charset_id (Fcar_safe (XCAR (val)));
            if (charset >= 0
                && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
                && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
              CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
            val = XCDR (val);
          }

        /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
           FLAGS[REG] can be one of the following:
                integer CHARSET: CHARSET occupies register REG,
                t: designate nothing to REG initially, but REG can be
                  used by any charset,
                list of integer, nil, or t: designate the first
                  element (if integer) to REG initially; the remaining
                  elements (if integer) are designated to REG on request;
                  if an element is t, REG can be used by any charset,
                nil: REG is never used.  */
        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
            = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
        for (i = 0; i < 4; i++)
          {
            if ((INTEGERP (flags[i])
                 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset)))
                || (charset = get_charset_id (flags[i])) >= 0)
              {
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
              }
            else if (EQ (flags[i], Qt))
              {
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                reg_bits |= 1 << i;
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
              }
            else if (CONSP (flags[i]))
              {
                Lisp_Object tail;
                tail = flags[i];

                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
                if ((INTEGERP (XCAR (tail))
                     && (charset = XINT (XCAR (tail)),
                         CHARSET_VALID_P (charset)))
                    || (charset = get_charset_id (XCAR (tail))) >= 0)
                  {
                    CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
                  }
                else
                  CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                tail = XCDR (tail);
                while (CONSP (tail))
                  {
                    if ((INTEGERP (XCAR (tail))
                         && (charset = XINT (XCAR (tail)),
                             CHARSET_VALID_P (charset)))
                        || (charset = get_charset_id (XCAR (tail))) >= 0)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                        = i;
                    else if (EQ (XCAR (tail), Qt))
                      reg_bits |= 1 << i;
                    tail = XCDR (tail);
                  }
              }
            else
              CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;

            CODING_SPEC_ISO_DESIGNATION (coding, i)
              = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
          }

        if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
          {
            /* REG 1 can be used only by locking shift in 7-bit env.  */
            if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
              reg_bits &= ~2;
            if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
              /* Without any shifting, only REG 0 and 1 can be used.  */
              reg_bits &= 3;
          }

        if (reg_bits)
          for (charset = 0; charset <= MAX_CHARSET; charset++)
            {
              if (CHARSET_DEFINED_P (charset)
                  && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                      == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
                {
                  /* There exist some default graphic registers to be
                     used by CHARSET.  */

                  /* We had better avoid designating a charset of
                     CHARS96 to REG 0 as far as possible.  */
                  if (CHARSET_CHARS (charset) == 96)
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                      = (reg_bits & 2
                         ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
                  else
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                      = (reg_bits & 1
                         ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
                }
            }
      }
      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.iso2022.last_invalid_designation_register = -1;
      break;

    case 3:
      coding->type = coding_type_big5;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      coding->flags
        = (NILP (XVECTOR (coding_spec)->contents[4])
           ? CODING_FLAG_BIG5_HKU
           : CODING_FLAG_BIG5_ETEN);
      break;

    case 4:
      coding->type = coding_type_ccl;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      {
        val = XVECTOR (coding_spec)->contents[4];
        if (! CONSP (val)
            || setup_ccl_program (&(coding->spec.ccl.decoder),
                                  XCAR (val)) < 0
            || setup_ccl_program (&(coding->spec.ccl.encoder),
                                  XCDR (val)) < 0)
          goto label_invalid_coding_system;

        bzero (coding->spec.ccl.valid_codes, 256);
        val = Fplist_get (plist, Qvalid_codes);
        if (CONSP (val))
          {
            Lisp_Object this;

            for (; CONSP (val); val = XCDR (val))
              {
                this = XCAR (val);
                if (INTEGERP (this)
                    && XINT (this) >= 0 && XINT (this) < 256)
                  coding->spec.ccl.valid_codes[XINT (this)] = 1;
                else if (CONSP (this)
                         && INTEGERP (XCAR (this))
                         && INTEGERP (XCDR (this)))
                  {
                    int start = XINT (XCAR (this));
                    int end = XINT (XCDR (this));

                    if (start >= 0 && start <= end && end < 256)
                      while (start <= end)
                        coding->spec.ccl.valid_codes[start++] = 1;
                  }
              }
          }
      }
      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.ccl.cr_carryover = 0;
      coding->spec.ccl.eight_bit_carryover[0] = 0;
      break;

    case 5:
      coding->type = coding_type_raw_text;
      break;

    default:
      goto label_invalid_coding_system;
    }
  return 0;

 label_invalid_coding_system:
  coding->type = coding_type_no_conversion;
  coding->category_idx = CODING_CATEGORY_IDX_BINARY;
  coding->common_flags = 0;
  coding->eol_type = CODING_EOL_LF;
  coding->pre_write_conversion = coding->post_read_conversion = Qnil;
  return -1;
}
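
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of how the ISO2022 branch of setup_coding_system narrows the set of
   freely usable graphic registers and then picks a default register
   for a charset that has no requested designation.  The sketch_* names
   and the plain int/unsigned parameters are hypothetical stand-ins for
   the CODING_FLAG_ISO_* bits and the CHARSET_CHARS macro.  */

```c
#include <assert.h>

/* REG_BITS has bit N set when register N may be used by any charset.
   Mirror the narrowing step: without locking shift, a 7-bit
   environment cannot reach G1, and without single shift only G0 and
   G1 remain reachable.  */
static unsigned
sketch_usable_registers (unsigned reg_bits, int seven_bits,
                         int locking_shift, int single_shift)
{
  if (reg_bits && !locking_shift)
    {
      if (seven_bits)
        reg_bits &= ~2u;
      if (!single_shift)
        reg_bits &= 3u;
    }
  return reg_bits;
}

/* Pick the default register for a charset of CHARS (94 or 96)
   characters per dimension.  96-character sets avoid G0 when any
   other register is free; 94-character sets take the lowest free
   register and otherwise end up in G3.  */
static int
sketch_default_register (unsigned reg_bits, int chars)
{
  assert (chars == 94 || chars == 96);
  if (chars == 96)
    return (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : (reg_bits & 8) ? 3 : 0;
  return (reg_bits & 1) ? 0 : (reg_bits & 2) ? 1 : (reg_bits & 4) ? 2 : 3;
}
```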

/* Free memory blocks allocated for storing composition information.  */

void
coding_free_composition_data (coding)
     struct coding_system *coding;
{
  struct composition_data *cmp_data = coding->cmp_data, *next;

  if (!cmp_data)
    return;
  /* Memory blocks are chained.  First rewind to the first block, then
     free the blocks one by one.  */
  while (cmp_data->prev)
    cmp_data = cmp_data->prev;
  while (cmp_data)
    {
      next = cmp_data->next;
      xfree (cmp_data);
      cmp_data = next;
    }
  coding->cmp_data = NULL;
}
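
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of the rewind-then-free pattern used above, written against the C
   library's free instead of xfree.  struct sketch_block and the
   sketch_* functions are hypothetical; the real struct
   composition_data carries more members.  */

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_block
{
  struct sketch_block *prev, *next;
};

/* Free the whole chain that BLOCK belongs to and return the number of
   blocks freed.  BLOCK may point at any element of the chain, or be
   NULL.  */
static int
sketch_free_chain (struct sketch_block *block)
{
  int freed = 0;

  if (!block)
    return 0;
  /* Rewind to the head first, so blocks before BLOCK are not leaked.  */
  while (block->prev)
    block = block->prev;
  while (block)
    {
      /* Save the link before the block that holds it is released.  */
      struct sketch_block *next = block->next;
      free (block);
      block = next;
      freed++;
    }
  return freed;
}

/* Build a chain of N blocks and return a pointer to its last block.  */
static struct sketch_block *
sketch_make_chain (int n)
{
  struct sketch_block *last = NULL;

  while (n-- > 0)
    {
      struct sketch_block *b = calloc (1, sizeof *b);
      assert (b != NULL);
      b->prev = last;
      if (last)
        last->next = b;
      last = b;
    }
  return last;
}
```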

/* Set the `char_offset' member of all memory blocks pointed to by
   coding->cmp_data to POS.  */

void
coding_adjust_composition_offset (coding, pos)
     struct coding_system *coding;
     int pos;
{
  struct composition_data *cmp_data;

  for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
    cmp_data->char_offset = pos;
}

/* Set up raw-text or one of its subsidiaries in the structure
   coding_system CODING according to the value of eol_type already
   set in CODING.  CODING should be set up for some coding system in
   advance.  */

void
setup_raw_text_coding_system (coding)
     struct coding_system *coding;
{
  if (coding->type != coding_type_raw_text)
    {
      coding->symbol = Qraw_text;
      coding->type = coding_type_raw_text;
      if (coding->eol_type != CODING_EOL_UNDECIDED)
        {
          Lisp_Object subsidiaries;
          subsidiaries = Fget (Qraw_text, Qeol_type);

          if (VECTORP (subsidiaries)
              && XVECTOR (subsidiaries)->size == 3)
            coding->symbol
              = XVECTOR (subsidiaries)->contents[coding->eol_type];
        }
      setup_coding_system (coding->symbol, coding);
    }
  return;
}

/* Emacs has a mechanism to automatically detect a coding system if it
   is one of Emacs' internal format, ISO2022, SJIS, BIG5, UTF-8, or
   UTF-16.  But it is impossible to distinguish some coding systems
   accurately because they use the same range of codes.  So coding
   systems are first grouped into the following categories:

   o coding-category-emacs-mule

        The category for a coding system which has the same code range
        as Emacs' internal format.  Assigned the coding-system (Lisp
        symbol) `emacs-mule' by default.

   o coding-category-sjis

        The category for a coding system which has the same code range
        as SJIS.  Assigned the coding-system (Lisp
        symbol) `japanese-shift-jis' by default.

   o coding-category-iso-7

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment.  This doesn't use any locking
        shift or single shift functions.  This can encode/decode all
        charsets.  Assigned the coding-system (Lisp symbol)
        `iso-2022-7bit' by default.

   o coding-category-iso-7-tight

        Same as coding-category-iso-7 except that this can
        encode/decode only the specified charsets.

   o coding-category-iso-8-1

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION1 charset.  This doesn't use any locking shift
        or single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-latin-1' by default.

   o coding-category-iso-8-2

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION2 charset.  This doesn't use any locking shift
        or single shift functions.  Assigned the coding-system (Lisp
        symbol) `japanese-iso-8bit' by default.

   o coding-category-iso-7-else

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-7bit-lock' by default.

   o coding-category-iso-8-else

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-8bit-ss2' by default.

   o coding-category-big5

        The category for a coding system which has the same code range
        as BIG5.  Assigned the coding-system (Lisp symbol)
        `cn-big5' by default.

   o coding-category-utf-8

        The category for a coding system which has the same code range
        as UTF-8 (cf. RFC3629).  Assigned the coding-system (Lisp
        symbol) `utf-8' by default.

   o coding-category-utf-16-be

        The category for a coding system in which the text has a
        Unicode signature (cf. Unicode Standard) in big-endian order
        at the head.  Assigned the coding-system (Lisp symbol)
        `utf-16-be' by default.

   o coding-category-utf-16-le

        The category for a coding system in which the text has a
        Unicode signature (cf. Unicode Standard) in little-endian
        order at the head.  Assigned the coding-system (Lisp
        symbol) `utf-16-le' by default.

   o coding-category-ccl

        The category for a coding system whose encoder/decoder is
        written as CCL programs.  The default value is nil, i.e., no
        coding system is assigned.

   o coding-category-binary

        The category for a coding system not categorized in any of the
        above.  Assigned the coding-system (Lisp symbol)
        `no-conversion' by default.

   Each of them is a Lisp symbol whose value is an actual
   `coding-system' (also a Lisp symbol) assigned by the user.
   What Emacs actually does is detect a category of coding system and
   then use the `coding-system' assigned to that category.  If Emacs
   cannot narrow the text down to a single category, it selects the
   candidate category of the highest priority.  The priorities of
   categories are also specified by the user, in the Lisp variable
   `coding-category-list'.

*/
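
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of the priority rule described above.  Each category is modelled as
   a single-bit mask, the detector reports a set of candidate bits, and
   the first candidate found in a priority-ordered table wins.  The
   name sketch_pick_category, its parameters and the fallback argument
   are hypothetical.  */

```c
#include <assert.h>

/* Return the first entry of PRIORITIES (highest priority first, N
   entries, one bit each) that is also set in DETECTED.  Return
   FALLBACK when no listed category matches.  */
static unsigned
sketch_pick_category (unsigned detected, const unsigned *priorities,
                      int n, unsigned fallback)
{
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    if (detected & priorities[i])
      return priorities[i];
  return fallback;
}
```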

static
int ascii_skip_code[256];

/* Detect how a text of length SRC_BYTES pointed to by SOURCE is encoded.
   If it detects possible coding systems, return an integer in which
   appropriate flag bits are set.  Flag bits are defined by macros
   CODING_CATEGORY_MASK_XXX in `coding.h'.  If PRIORITIES is non-NULL,
   it should point to the table `coding_priorities'.  In that case, only
   the flag bit for a coding system of the highest priority is set in
   the returned value.  If MULTIBYTEP is nonzero, 8-bit codes of the
   range 0x80..0x9F are in multibyte form.

   The number of ASCII characters at the head is returned as *SKIP.  */

static int
detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
     unsigned char *source;
     int src_bytes, *priorities, *skip;
     int multibytep;
{
  register unsigned char c;
  unsigned char *src = source, *src_end = source + src_bytes;
  unsigned int mask, utf16_examined_p, iso2022_examined_p;
  int i;

  /* At first, skip all ASCII characters and control characters except
     for three ISO2022 specific control characters.  */
  ascii_skip_code[ISO_CODE_SO] = 0;
  ascii_skip_code[ISO_CODE_SI] = 0;
  ascii_skip_code[ISO_CODE_ESC] = 0;

 label_loop_detect_coding:
  while (src < src_end && ascii_skip_code[*src]) src++;
  *skip = src - source;

  if (src >= src_end)
    /* We found nothing other than ASCII.  There's nothing to do.  */
    return 0;

  c = *src;
  /* The text seems to be encoded in some multilingual coding system.
     Now, try to find in which coding system the text is encoded.  */
  if (c < 0x80)
    {
      /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
      /* C is an ISO2022 specific control code of C0.  */
      mask = detect_coding_iso2022 (src, src_end, multibytep);
      if (mask == 0)
        {
          /* No valid ISO2022 code follows C.  Try again.  */
          src++;
          if (c == ISO_CODE_ESC)
            ascii_skip_code[ISO_CODE_ESC] = 1;
          else
            ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
          goto label_loop_detect_coding;
        }
      if (priorities)
        {
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
            {
              if (mask & priorities[i])
                return priorities[i];
            }
          return CODING_CATEGORY_MASK_RAW_TEXT;
        }
    }
  else
    {
      int try;

      if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)
        c = src[1] - 0x20;

      if (c < 0xA0)
        {
          /* C is the first byte of SJIS character code,
             or a leading-code of Emacs' internal format (emacs-mule),
             or the first byte of UTF-16.  */
          try = (CODING_CATEGORY_MASK_SJIS
                  | CODING_CATEGORY_MASK_EMACS_MULE
                  | CODING_CATEGORY_MASK_UTF_16_BE
                  | CODING_CATEGORY_MASK_UTF_16_LE);

          /* Or, if C is a special latin extra code,
             or is an ISO2022 specific control code of C1 (SS2 or SS3),
             or is an ISO2022 control-sequence-introducer (CSI),
             we should also consider the possibility of ISO2022 codings.  */
          if ((VECTORP (Vlatin_extra_code_table)
               && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
              || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
              || (c == ISO_CODE_CSI
                  && (src < src_end
                      && (*src == ']'
                          || ((*src == '0' || *src == '1' || *src == '2')
                              && src + 1 < src_end
                              && src[1] == ']')))))
            try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
                    | CODING_CATEGORY_MASK_ISO_8BIT);
        }
      else
        /* C is a character of ISO2022 in graphic plane right,
           or a SJIS's 1-byte character code (i.e. JISX0201),
           or the first byte of BIG5's 2-byte code,
           or the first byte of UTF-8/16.  */
        try = (CODING_CATEGORY_MASK_ISO_8_ELSE
               | CODING_CATEGORY_MASK_ISO_8BIT
               | CODING_CATEGORY_MASK_SJIS
               | CODING_CATEGORY_MASK_BIG5
               | CODING_CATEGORY_MASK_UTF_8
               | CODING_CATEGORY_MASK_UTF_16_BE
               | CODING_CATEGORY_MASK_UTF_16_LE);

      /* Or, we may have to consider the possibility of CCL.  */
      if (coding_system_table[CODING_CATEGORY_IDX_CCL]
          && (coding_system_table[CODING_CATEGORY_IDX_CCL]
              ->spec.ccl.valid_codes)[c])
        try |= CODING_CATEGORY_MASK_CCL;

      mask = 0;
      utf16_examined_p = iso2022_examined_p = 0;
      if (priorities)
        {
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
            {
              if (!iso2022_examined_p
                  && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
                {
                  mask |= detect_coding_iso2022 (src, src_end, multibytep);
                  iso2022_examined_p = 1;
                }
              else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
                mask |= detect_coding_sjis (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
                mask |= detect_coding_utf_8 (src, src_end, multibytep);
              else if (!utf16_examined_p
                       && (priorities[i] & try &
                           CODING_CATEGORY_MASK_UTF_16_BE_LE))
                {
                  mask |= detect_coding_utf_16 (src, src_end, multibytep);
                  utf16_examined_p = 1;
                }
              else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
                mask |= detect_coding_big5 (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
                mask |= detect_coding_emacs_mule (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
                mask |= detect_coding_ccl (src, src_end, multibytep);
              else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
                mask |= CODING_CATEGORY_MASK_RAW_TEXT;
              else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
                mask |= CODING_CATEGORY_MASK_BINARY;
              if (mask & priorities[i])
                return priorities[i];
            }
          return CODING_CATEGORY_MASK_RAW_TEXT;
        }
      if (try & CODING_CATEGORY_MASK_ISO)
        mask |= detect_coding_iso2022 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_SJIS)
        mask |= detect_coding_sjis (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_BIG5)
        mask |= detect_coding_big5 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_8)
        mask |= detect_coding_utf_8 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
        mask |= detect_coding_utf_16 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_EMACS_MULE)
        mask |= detect_coding_emacs_mule (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_CCL)
        mask |= detect_coding_ccl (src, src_end, multibytep);
    }
  return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
}
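
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of the table-driven ASCII skip that opens detect_coding_mask.  It
   assumes the skip table marks every byte below 0x80 as skippable
   (the real table is initialized elsewhere in the file) and then
   unmarks ESC, SO and SI so that ISO2022 control codes stop the scan.
   The sketch_* names are hypothetical.  */

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_SO = 0x0E, SKETCH_SI = 0x0F, SKETCH_ESC = 0x1B };

/* Return the offset of the first byte of S (LEN bytes) that is either
   non-ASCII or one of ESC, SO, SI.  Return LEN when there is none.  */
static size_t
sketch_skip_ascii (const unsigned char *s, size_t len)
{
  unsigned char skip[256];
  size_t i;

  assert (s != NULL || len == 0);
  for (i = 0; i < 256; i++)
    skip[i] = i < 0x80;
  skip[SKETCH_SO] = skip[SKETCH_SI] = skip[SKETCH_ESC] = 0;

  for (i = 0; i < len && skip[s[i]]; i++)
    ;
  return i;
}
```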

/* Detect how a text of length SRC_BYTES pointed to by SRC is encoded.
   The information of the detected coding system is set in CODING.  */

void
detect_coding (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
     int src_bytes;
{
  unsigned int idx;
  int skip, mask;
  Lisp_Object val;

  val = Vcoding_category_list;
  mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip,
                             coding->src_multibyte);
  coding->heading_ascii = skip;

  if (!mask) return;

  /* We found a single coding system of the highest priority in MASK.  */
  idx = 0;
  while (mask && ! (mask & 1)) mask >>= 1, idx++;
  if (! mask)
    idx = CODING_CATEGORY_IDX_RAW_TEXT;

  val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[idx]);

  if (coding->eol_type != CODING_EOL_UNDECIDED)
    {
      Lisp_Object tmp;

      tmp = Fget (val, Qeol_type);
      if (VECTORP (tmp))
        val = XVECTOR (tmp)->contents[coding->eol_type];
    }

  /* Set up this new coding system while preserving some slots.  */
  {
    int src_multibyte = coding->src_multibyte;
    int dst_multibyte = coding->dst_multibyte;

    setup_coding_system (val, coding);
    coding->src_multibyte = src_multibyte;
    coding->dst_multibyte = dst_multibyte;
    coding->heading_ascii = skip;
  }
}
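
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of the loop in detect_coding that turns a category mask into a table
   index by locating its lowest set bit.  The name
   sketch_mask_to_index and the FALLBACK parameter are hypothetical;
   FALLBACK plays the role of CODING_CATEGORY_IDX_RAW_TEXT.  */

```c
#include <assert.h>

/* Return the position of the lowest set bit of MASK, or FALLBACK when
   MASK is zero.  */
static int
sketch_mask_to_index (unsigned mask, int fallback)
{
  int idx = 0;

  while (mask && !(mask & 1))
    mask >>= 1, idx++;
  /* The loop can shift at most once per bit of MASK.  */
  assert (idx < (int) (sizeof mask * 8));
  return mask ? idx : fallback;
}
```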

/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
   SOURCE is encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF,
   CODING_EOL_CR, CODING_EOL_UNDECIDED, and CODING_EOL_INCONSISTENT.

   The number of non-eol characters at the head is returned as *SKIP.  */

#define MAX_EOL_CHECK_COUNT 3

static int
detect_eol_type (source, src_bytes, skip)
     unsigned char *source;
     int src_bytes, *skip;
{
  unsigned char *src = source, *src_end = src + src_bytes;
  unsigned char c;
  int total = 0;                /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;
  int this_eol_type;

  *skip = 0;

  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
    {
      c = *src++;
      if (c == '\n' || c == '\r')
        {
          if (*skip == 0)
            *skip = src - 1 - source;
          total++;
          if (c == '\n')
            this_eol_type = CODING_EOL_LF;
          else if (src >= src_end || *src != '\n')
            this_eol_type = CODING_EOL_CR;
          else
            this_eol_type = CODING_EOL_CRLF, src++;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
            {
              /* The found type is different from what was found before.  */
              eol_type = CODING_EOL_INCONSISTENT;
              break;
            }
        }
    }

  if (*skip == 0)
    *skip = src_end - source;
  return eol_type;
}
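
/* Illustration only, not part of coding.c: a minimal standalone sketch
   of the end-of-line classification performed by detect_eol_type,
   without the *SKIP bookkeeping.  The enum sketch_eol values and the
   function sketch_detect_eol are hypothetical stand-ins for the
   CODING_EOL_* constants; MAX_CHECKS plays the role of
   MAX_EOL_CHECK_COUNT.  */

```c
#include <assert.h>
#include <stddef.h>

enum sketch_eol
{
  SKETCH_EOL_UNDECIDED,
  SKETCH_EOL_LF,
  SKETCH_EOL_CRLF,
  SKETCH_EOL_CR,
  SKETCH_EOL_INCONSISTENT
};

/* Classify the first MAX_CHECKS line endings found in S (LEN bytes).  */
static enum sketch_eol
sketch_detect_eol (const char *s, size_t len, int max_checks)
{
  enum sketch_eol found = SKETCH_EOL_UNDECIDED;
  size_t i = 0;
  int total = 0;

  assert (s != NULL || len == 0);
  while (i < len && total < max_checks)
    {
      char c = s[i++];
      enum sketch_eol this_eol;

      if (c != '\n' && c != '\r')
        continue;
      total++;
      if (c == '\n')
        this_eol = SKETCH_EOL_LF;
      else if (i < len && s[i] == '\n')
        this_eol = SKETCH_EOL_CRLF, i++;  /* Consume the LF of CR LF.  */
      else
        this_eol = SKETCH_EOL_CR;

      if (found == SKETCH_EOL_UNDECIDED)
        found = this_eol;
      else if (found != this_eol)
        return SKETCH_EOL_INCONSISTENT;
    }
  return found;
}
```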
4373
fa42c37f
KH
4374/* Like detect_eol_type, but detect EOL type in 2-octet
4375 big-endian/little-endian format for coding systems utf-16-be and
4376 utf-16-le. */
4377
4378static int
4379detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
4380 unsigned char *source;
cfb43547 4381 int src_bytes, *skip, big_endian_p;
fa42c37f
KH
4382{
4383 unsigned char *src = source, *src_end = src + src_bytes;
4384 unsigned int c1, c2;
4385 int total = 0; /* How many end-of-lines are found so far. */
4386 int eol_type = CODING_EOL_UNDECIDED;
4387 int this_eol_type;
4388 int msb, lsb;
4389
4390 if (big_endian_p)
4391 msb = 0, lsb = 1;
4392 else
4393 msb = 1, lsb = 0;
4394
4395 *skip = 0;
4396
4397 while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
4398 {
4399 c1 = (src[msb] << 8) | (src[lsb]);
4400 src += 2;
4401
4402 if (c1 == '\n' || c1 == '\r')
4403 {
4404 if (*skip == 0)
4405 *skip = src - 2 - source;
4406 total++;
4407 if (c1 == '\n')
4408 {
4409 this_eol_type = CODING_EOL_LF;
4410 }
4411 else
4412 {
4413 if ((src + 1) >= src_end)
4414 {
4415 this_eol_type = CODING_EOL_CR;
4416 }
4417 else
4418 {
4419 c2 = (src[msb] << 8) | (src[lsb]);
4420 if (c2 == '\n')
4421 this_eol_type = CODING_EOL_CRLF, src += 2;
4422 else
4423 this_eol_type = CODING_EOL_CR;
4424 }
4425 }
4426
4427 if (eol_type == CODING_EOL_UNDECIDED)
4428 /* This is the first end-of-line. */
4429 eol_type = this_eol_type;
4430 else if (eol_type != this_eol_type)
4431 {
4432 /* The type found differs from the one found before. */
4433 eol_type = CODING_EOL_INCONSISTENT;
4434 break;
4435 }
4436 }
4437 }
4438
4439 if (*skip == 0)
4440 *skip = src_end - source;
4441 return eol_type;
4442}
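
The two-octet detection above can be sketched in isolation. This is a minimal illustration, not the Emacs implementation: the names `eol_kind`, `read_unit` and `sketch_detect_eol_utf16` are hypothetical, and the sketch leaves out the `MAX_EOL_CHECK_COUNT` limit and the `*skip` bookkeeping of the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* constants.  */
enum eol_kind { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR, EOL_INCONSISTENT };

/* Read one 16-bit code unit at P in the given byte order.  */
static unsigned int
read_unit (const unsigned char *p, int big_endian)
{
  return big_endian ? (((unsigned int) p[0] << 8) | p[1])
                    : (((unsigned int) p[1] << 8) | p[0]);
}

/* Classify the line endings found in the LEN bytes at SRC.  */
enum eol_kind
sketch_detect_eol_utf16 (const unsigned char *src, size_t len, int big_endian)
{
  enum eol_kind seen = EOL_UNDECIDED;
  size_t i = 0;

  assert (src != NULL || len == 0);
  while (i + 1 < len)
    {
      unsigned int c = read_unit (src + i, big_endian);
      enum eol_kind this_kind;

      i += 2;
      if (c != '\n' && c != '\r')
        continue;
      if (c == '\n')
        this_kind = EOL_LF;
      else if (i + 1 < len && read_unit (src + i, big_endian) == '\n')
        {
          /* CR immediately followed by LF: consume both units.  */
          this_kind = EOL_CRLF;
          i += 2;
        }
      else
        this_kind = EOL_CR;

      if (seen == EOL_UNDECIDED)
        seen = this_kind;          /* first end-of-line found */
      else if (seen != this_kind)
        return EOL_INCONSISTENT;   /* mixed conventions */
    }
  return seen;
}
```

As in the original, a CR that sits in the last complete code unit of the block is classified as a bare CR, because the unit that might hold the LF is not available yet.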
4443
4ed46869
KH
4444/* Detect how end-of-line is encoded in the text of length SRC_BYTES
4445 pointed to by SRC. If an appropriate end-of-line format is detected,
4446 record it in *CODING. */
4447
4448void
4449detect_eol (coding, src, src_bytes)
4450 struct coding_system *coding;
a4244313 4451 const unsigned char *src;
4ed46869
KH
4452 int src_bytes;
4453{
4608c386 4454 Lisp_Object val;
d46c5b12 4455 int skip;
fa42c37f
KH
4456 int eol_type;
4457
4458 switch (coding->category_idx)
4459 {
4460 case CODING_CATEGORY_IDX_UTF_16_BE:
4461 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
4462 break;
4463 case CODING_CATEGORY_IDX_UTF_16_LE:
4464 eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
4465 break;
4466 default:
4467 eol_type = detect_eol_type (src, src_bytes, &skip);
4468 break;
4469 }
d46c5b12
KH
4470
4471 if (coding->heading_ascii > skip)
4472 coding->heading_ascii = skip;
4473 else
4474 skip = coding->heading_ascii;
4ed46869 4475
0ef69138 4476 if (eol_type == CODING_EOL_UNDECIDED)
4ed46869 4477 return;
27901516
KH
4478 if (eol_type == CODING_EOL_INCONSISTENT)
4479 {
4480#if 0
4481 /* This code is suppressed until we find a better way to
992f23f2 4482 distinguish a raw-text file from a binary file. */
27901516
KH
4483
4484 /* If we have already detected that the coding is raw-text, the
4485 coding should actually be no-conversion. */
4486 if (coding->type == coding_type_raw_text)
4487 {
4488 setup_coding_system (Qno_conversion, coding);
4489 return;
4490 }
4491 /* Else, let's decode only text code anyway. */
4492#endif /* 0 */
1b2af4b0 4493 eol_type = CODING_EOL_LF;
27901516
KH
4494 }
4495
4608c386 4496 val = Fget (coding->symbol, Qeol_type);
4ed46869 4497 if (VECTORP (val) && XVECTOR (val)->size == 3)
d46c5b12 4498 {
b73bfc1c
KH
4499 int src_multibyte = coding->src_multibyte;
4500 int dst_multibyte = coding->dst_multibyte;
1cd6b64c 4501 struct composition_data *cmp_data = coding->cmp_data;
b73bfc1c 4502
d46c5b12 4503 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
b73bfc1c
KH
4504 coding->src_multibyte = src_multibyte;
4505 coding->dst_multibyte = dst_multibyte;
d46c5b12 4506 coding->heading_ascii = skip;
1cd6b64c 4507 coding->cmp_data = cmp_data;
d46c5b12
KH
4508 }
4509}
4510
4511#define CONVERSION_BUFFER_EXTRA_ROOM 256
4512
b73bfc1c
KH
4513#define DECODING_BUFFER_MAG(coding) \
4514 (coding->type == coding_type_iso2022 \
4515 ? 3 \
4516 : (coding->type == coding_type_ccl \
4517 ? coding->spec.ccl.decoder.buf_magnification \
4518 : 2))
d46c5b12
KH
4519
4520/* Return the maximum size (in bytes) of a buffer large enough for
4521 decoding SRC_BYTES of text encoded in CODING. */
4522
4523int
4524decoding_buffer_size (coding, src_bytes)
4525 struct coding_system *coding;
4526 int src_bytes;
4527{
4528 return (src_bytes * DECODING_BUFFER_MAG (coding)
4529 + CONVERSION_BUFFER_EXTRA_ROOM);
4530}
4531
4532/* Return the maximum size (in bytes) of a buffer large enough for
4533 encoding SRC_BYTES of text to CODING. */
4534
4535int
4536encoding_buffer_size (coding, src_bytes)
4537 struct coding_system *coding;
4538 int src_bytes;
4539{
4540 int magnification;
4541
4542 if (coding->type == coding_type_ccl)
a84f1519
KH
4543 {
4544 magnification = coding->spec.ccl.encoder.buf_magnification;
4545 if (coding->eol_type == CODING_EOL_CRLF)
4546 magnification *= 2;
4547 }
b73bfc1c 4548 else if (CODING_REQUIRE_ENCODING (coding))
d46c5b12 4549 magnification = 3;
b73bfc1c
KH
4550 else
4551 magnification = 1;
d46c5b12
KH
4552
4553 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
4554}
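
The sizing rules of the two functions above reduce to one multiplication and one addition. The sketch below restates them with plain parameters so the arithmetic can be checked by hand; `sketch_type` and both function names are hypothetical, and the flags `eol_is_crlf` and `requires_encoding` stand in for `coding->eol_type == CODING_EOL_CRLF` and `CODING_REQUIRE_ENCODING (coding)`.

```c
#include <assert.h>

#define SKETCH_EXTRA_ROOM 256   /* mirrors CONVERSION_BUFFER_EXTRA_ROOM */

/* Hypothetical stand-in for the coding types that affect sizing.  */
enum sketch_type { SKETCH_ISO2022, SKETCH_CCL, SKETCH_OTHER };

/* Decoding: ISO 2022 triples the size, CCL uses its own
   magnification, everything else doubles it.  */
int
sketch_decoding_buffer_size (enum sketch_type type, int ccl_magnification,
                             int src_bytes)
{
  int mag;

  assert (src_bytes >= 0);
  mag = (type == SKETCH_ISO2022 ? 3
         : type == SKETCH_CCL ? ccl_magnification
         : 2);
  return src_bytes * mag + SKETCH_EXTRA_ROOM;
}

/* Encoding: CCL uses its own magnification, doubled when LF becomes
   CRLF; other coding systems that need encoding triple the size;
   the rest copy byte for byte.  */
int
sketch_encoding_buffer_size (enum sketch_type type, int ccl_magnification,
                             int eol_is_crlf, int requires_encoding,
                             int src_bytes)
{
  int mag;

  assert (src_bytes >= 0);
  if (type == SKETCH_CCL)
    mag = eol_is_crlf ? ccl_magnification * 2 : ccl_magnification;
  else if (requires_encoding)
    mag = 3;
  else
    mag = 1;
  return src_bytes * mag + SKETCH_EXTRA_ROOM;
}
```

For 1000 source bytes, ISO 2022 decoding reserves 1000 * 3 + 256 = 3256 bytes, and a CCL encoder with magnification 2 and CRLF line endings reserves 1000 * 4 + 256 = 4256 bytes.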
4555
73be902c
KH
4556/* Working buffer for code conversion. */
4557struct conversion_buffer
4558{
4559 int size; /* size of data. */
4560 int on_stack; /* 1 if allocated by alloca. */
4561 unsigned char *data;
4562};
d46c5b12 4563
73be902c
KH
4564/* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
4565#define allocate_conversion_buffer(buf, len) \
4566 do { \
4567 if (len < MAX_ALLOCA) \
4568 { \
4569 buf.data = (unsigned char *) alloca (len); \
4570 buf.on_stack = 1; \
4571 } \
4572 else \
4573 { \
4574 buf.data = (unsigned char *) xmalloc (len); \
4575 buf.on_stack = 0; \
4576 } \
4577 buf.size = len; \
4578 } while (0)
d46c5b12 4579
73be902c
KH
4580/* Double the allocated memory for *BUF. */
4581static void
4582extend_conversion_buffer (buf)
4583 struct conversion_buffer *buf;
d46c5b12 4584{
73be902c 4585 if (buf->on_stack)
d46c5b12 4586 {
73be902c
KH
4587 unsigned char *save = buf->data;
4588 buf->data = (unsigned char *) xmalloc (buf->size * 2);
4589 bcopy (save, buf->data, buf->size);
4590 buf->on_stack = 0;
d46c5b12 4591 }
73be902c
KH
4592 else
4593 {
4594 buf->data = (unsigned char *) xrealloc (buf->data, buf->size * 2);
4595 }
4596 buf->size *= 2;
4597}
4598
4599/* Free the allocated memory for BUF if it is not on the stack. */
4600static void
4601free_conversion_buffer (buf)
4602 struct conversion_buffer *buf;
4603{
4604 if (!buf->on_stack)
4605 xfree (buf->data);
d46c5b12
KH
4606}
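
The buffer helpers above keep small buffers on the stack and move to the heap only when the buffer has to grow. A portable sketch of the same pattern follows. It assumes the caller supplies the stack storage (instead of `alloca`), uses plain `malloc`/`realloc` where the original uses `xmalloc`/`xrealloc`, and reports allocation failure through a return value; all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buffer
{
  size_t size;           /* capacity in bytes */
  int on_stack;          /* nonzero if DATA is caller-owned stack storage */
  unsigned char *data;
};

/* Start with caller-provided storage of LEN bytes.  */
void
sketch_buffer_init (struct sketch_buffer *buf, unsigned char *storage,
                    size_t len)
{
  assert (buf != NULL && storage != NULL && len > 0);
  buf->data = storage;
  buf->size = len;
  buf->on_stack = 1;
}

/* Double the capacity.  Return 0 on success, -1 if allocation fails
   (in which case BUF is left unchanged).  */
int
sketch_buffer_extend (struct sketch_buffer *buf)
{
  size_t new_size = buf->size * 2;
  unsigned char *p;

  if (buf->on_stack)
    {
      /* Stack storage cannot be resized: copy it to the heap.  */
      p = malloc (new_size);
      if (p == NULL)
        return -1;
      memcpy (p, buf->data, buf->size);
      buf->on_stack = 0;
    }
  else
    {
      p = realloc (buf->data, new_size);
      if (p == NULL)
        return -1;
    }
  buf->data = p;
  buf->size = new_size;
  return 0;
}

/* Release heap storage; stack storage belongs to the caller.  */
void
sketch_buffer_free (struct sketch_buffer *buf)
{
  if (!buf->on_stack)
    free (buf->data);
}
```

The `on_stack` flag is what makes the free routine safe to call unconditionally: it is the only record of who owns the memory once the first extension has happened.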
4607
4608int
4609ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
4610 struct coding_system *coding;
4611 unsigned char *source, *destination;
4612 int src_bytes, dst_bytes, encodep;
4613{
4614 struct ccl_program *ccl
4615 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
1c3478b0 4616 unsigned char *dst = destination;
d46c5b12 4617
bd64290d 4618 ccl->suppress_error = coding->suppress_error;
ae9ff118 4619 ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
aaaf0b1e 4620 if (encodep)
80e0ca99
KH
4621 {
4622 /* On encoding, the EOL format is converted within ccl_driver. For
4623 that, set up the proper information in the structure CCL. */
4624 ccl->eol_type = coding->eol_type;
4625 if (ccl->eol_type == CODING_EOL_UNDECIDED)
4626 ccl->eol_type = CODING_EOL_LF;
4627 ccl->cr_consumed = coding->spec.ccl.cr_carryover;
b671ed5e 4628 ccl->eight_bit_control = coding->dst_multibyte;
80e0ca99 4629 }
b671ed5e
KH
4630 else
4631 ccl->eight_bit_control = 1;
7272d75c 4632 ccl->multibyte = coding->src_multibyte;
1c3478b0
KH
4633 if (coding->spec.ccl.eight_bit_carryover[0] != 0)
4634 {
4635 /* Move carryover bytes to DESTINATION. */
4636 unsigned char *p = coding->spec.ccl.eight_bit_carryover;
4637 while (*p)
4638 *dst++ = *p++;
4639 coding->spec.ccl.eight_bit_carryover[0] = 0;
4640 if (dst_bytes)
4641 dst_bytes -= dst - destination;
4642 }
4643
4644 coding->produced = (ccl_driver (ccl, source, dst, src_bytes, dst_bytes,
4645 &(coding->consumed))
4646 + dst - destination);
4647
b73bfc1c 4648 if (encodep)
80e0ca99
KH
4649 {
4650 coding->produced_char = coding->produced;
4651 coding->spec.ccl.cr_carryover = ccl->cr_consumed;
4652 }
ade8d05e
KH
4653 else if (!ccl->eight_bit_control)
4654 {
4655 /* The produced bytes form a valid multibyte sequence. */
4656 coding->produced_char
4657 = multibyte_chars_in_text (destination, coding->produced);
4658 coding->spec.ccl.eight_bit_carryover[0] = 0;
4659 }
b73bfc1c
KH
4660 else
4661 {
1c3478b0
KH
4662 /* On decoding, the destination should always be multibyte. But
4663 the CCL program might have generated an invalid multibyte
4664 sequence. Here we make such a sequence valid as
4665 multibyte. */
b73bfc1c
KH
4666 int bytes
4667 = dst_bytes ? dst_bytes : source + coding->consumed - destination;
1c3478b0
KH
4668
4669 if ((coding->consumed < src_bytes
4670 || !ccl->last_block)
4671 && coding->produced >= 1
4672 && destination[coding->produced - 1] >= 0x80)
4673 {
4674 /* We should not convert the trailing 8-bit codes to
4675 multibyte form even if they don't form a valid
4676 multibyte sequence. They may form a valid sequence in
4677 the next call. */
4678 int carryover = 0;
4679
4680 if (destination[coding->produced - 1] < 0xA0)
4681 carryover = 1;
4682 else if (coding->produced >= 2)
4683 {
4684 if (destination[coding->produced - 2] >= 0x80)
4685 {
4686 if (destination[coding->produced - 2] < 0xA0)
4687 carryover = 2;
4688 else if (coding->produced >= 3
4689 && destination[coding->produced - 3] >= 0x80
4690 && destination[coding->produced - 3] < 0xA0)
4691 carryover = 3;
4692 }
4693 }
4694 if (carryover > 0)
4695 {
4696 BCOPY_SHORT (destination + coding->produced - carryover,
4697 coding->spec.ccl.eight_bit_carryover,
4698 carryover);
4699 coding->spec.ccl.eight_bit_carryover[carryover] = 0;
4700 coding->produced -= carryover;
4701 }
4702 }
b73bfc1c
KH
4703 coding->produced = str_as_multibyte (destination, bytes,
4704 coding->produced,
4705 &(coding->produced_char));
4706 }
69f76525 4707
d46c5b12
KH
4708 switch (ccl->status)
4709 {
4710 case CCL_STAT_SUSPEND_BY_SRC:
73be902c 4711 coding->result = CODING_FINISH_INSUFFICIENT_SRC;
d46c5b12
KH
4712 break;
4713 case CCL_STAT_SUSPEND_BY_DST:
73be902c 4714 coding->result = CODING_FINISH_INSUFFICIENT_DST;
d46c5b12 4715 break;
9864ebce
KH
4716 case CCL_STAT_QUIT:
4717 case CCL_STAT_INVALID_CMD:
73be902c 4718 coding->result = CODING_FINISH_INTERRUPT;
9864ebce 4719 break;
d46c5b12 4720 default:
73be902c 4721 coding->result = CODING_FINISH_NORMAL;
d46c5b12
KH
4722 break;
4723 }
73be902c 4724 return coding->result;
4ed46869
KH
4725}
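
The carryover test in the decoding branch of `ccl_coding_driver` can be read as a small pure function over the tail of the produced bytes. The sketch below copies the 0x80 and 0xA0 thresholds from the code above and returns how many trailing bytes (0 to 3) would be held back for the next call. The function name is hypothetical, and the sketch omits the outer condition on `coding->consumed` and `ccl->last_block`.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of trailing bytes of the LEN bytes at P that
   would be carried over: the tail is held back when a byte in
   0x80..0x9F is the last byte, or is followed only by one or two
   bytes that are 0xA0 or above.  */
int
sketch_eight_bit_carryover (const unsigned char *p, size_t len)
{
  assert (p != NULL || len == 0);
  if (len < 1 || p[len - 1] < 0x80)
    return 0;                      /* the text ends in ASCII */
  if (p[len - 1] < 0xA0)
    return 1;
  if (len >= 2 && p[len - 2] >= 0x80)
    {
      if (p[len - 2] < 0xA0)
        return 2;
      if (len >= 3 && p[len - 3] >= 0x80 && p[len - 3] < 0xA0)
        return 3;
    }
  return 0;
}
```

A result of 0 for a tail made only of bytes at or above 0xA0 matches the original, which then hands the whole output to `str_as_multibyte`.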
4726
aaaf0b1e
KH
4727/* Decode EOL format of the text at PTR of BYTES length destructively
4728 according to CODING->eol_type. This is called after the CCL
4729 program produced a decoded text at PTR. If we do CRLF->LF
4730 conversion, update CODING->produced and CODING->produced_char. */
4731
4732static void
4733decode_eol_post_ccl (coding, ptr, bytes)
4734 struct coding_system *coding;
4735 unsigned char *ptr;
4736 int bytes;
4737{
4738 Lisp_Object val, saved_coding_symbol;
4739 unsigned char *pend = ptr + bytes;
4740 int dummy;
4741
4742 /* Remember the current coding system symbol. We set it back when
4743 an inconsistent EOL is found so that `last-coding-system-used' is
4744 set to the coding system that doesn't specify EOL conversion. */
4745 saved_coding_symbol = coding->symbol;
4746
4747 coding->spec.ccl.cr_carryover = 0;
4748 if (coding->eol_type == CODING_EOL_UNDECIDED)
4749 {
4750 /* Here, to avoid the call of setup_coding_system, we directly
4751 call detect_eol_type. */
4752 coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
74b01b80
EZ
4753 if (coding->eol_type == CODING_EOL_INCONSISTENT)
4754 coding->eol_type = CODING_EOL_LF;
4755 if (coding->eol_type != CODING_EOL_UNDECIDED)
4756 {
4757 val = Fget (coding->symbol, Qeol_type);
4758 if (VECTORP (val) && XVECTOR (val)->size == 3)
4759 coding->symbol = XVECTOR (val)->contents[coding->eol_type];
4760 }
aaaf0b1e
KH
4761 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4762 }
4763
74b01b80
EZ
4764 if (coding->eol_type == CODING_EOL_LF
4765 || coding->eol_type == CODING_EOL_UNDECIDED)
aaaf0b1e
KH
4766 {
4767 /* We have nothing to do. */
4768 ptr = pend;
4769 }
4770 else if (coding->eol_type == CODING_EOL_CRLF)
4771 {
4772 unsigned char *pstart = ptr, *p = ptr;
4773
4774 if (! (coding->mode & CODING_MODE_LAST_BLOCK)
4775 && *(pend - 1) == '\r')
4776 {
4777 /* If the last character is CR, we can't handle it here
4778 because LF will be in the not-yet-decoded source text.
9861e777 4779 Record that the CR is not yet processed. */
aaaf0b1e
KH
4780 coding->spec.ccl.cr_carryover = 1;
4781 coding->produced--;
4782 coding->produced_char--;
4783 pend--;
4784 }
4785 while (ptr < pend)
4786 {
4787 if (*ptr == '\r')
4788 {
4789 if (ptr + 1 < pend && *(ptr + 1) == '\n')
4790 {
4791 *p++ = '\n';
4792 ptr += 2;
4793 }
4794 else
4795 {
4796 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4797 goto undo_eol_conversion;
4798 *p++ = *ptr++;
4799 }
4800 }
4801 else if (*ptr == '\n'
4802 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4803 goto undo_eol_conversion;
4804 else
4805 *p++ = *ptr++;
4806 continue;
4807
4808 undo_eol_conversion:
4809 /* We have encountered an inconsistent EOL format at PTR.
4810 Convert all LFs before PTR back to CRLFs. */
4811 for (p--, ptr--; p >= pstart; p--)
4812 {
4813 if (*p == '\n')
4814 *ptr-- = '\n', *ptr-- = '\r';
4815 else
4816 *ptr-- = *p;
4817 }
4818 /* If carryover is recorded, cancel it because we don't
4819 convert CRLF anymore. */
4820 if (coding->spec.ccl.cr_carryover)
4821 {
4822 coding->spec.ccl.cr_carryover = 0;
4823 coding->produced++;
4824 coding->produced_char++;
4825 pend++;
4826 }
4827 p = ptr = pend;
4828 coding->eol_type = CODING_EOL_LF;
4829 coding->symbol = saved_coding_symbol;
4830 }
4831 if (p < pend)
4832 {
4833 /* As each two-byte sequence CRLF was converted to LF, (PEND
4834 - P) is the number of deleted characters. */
4835 coding->produced -= pend - p;
4836 coding->produced_char -= pend - p;
4837 }
4838 }
4839 else /* i.e. coding->eol_type == CODING_EOL_CR */
4840 {
4841 unsigned char *p = ptr;
4842
4843 for (; ptr < pend; ptr++)
4844 {
4845 if (*ptr == '\r')
4846 *ptr = '\n';
4847 else if (*ptr == '\n'
4848 && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
4849 {
4850 for (; p < ptr; p++)
4851 {
4852 if (*p == '\n')
4853 *p = '\r';
4854 }
4855 ptr = pend;
4856 coding->eol_type = CODING_EOL_LF;
4857 coding->symbol = saved_coding_symbol;
4858 }
4859 }
4860 }
4861}
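
The CRLF branch of `decode_eol_post_ccl` does two things: it compacts the text in place, and it holds back a trailing CR when more input may follow. A minimal sketch of just those two steps is below. It assumes a plain byte buffer, uses the hypothetical name `sketch_crlf_to_lf`, and leaves out the `undo_eol_conversion` path that restores the text when the line endings turn out to be inconsistent.

```c
#include <assert.h>
#include <stddef.h>

/* Convert each CRLF in the LEN bytes at BUF to LF, in place, and
   return the new length.  If LAST_BLOCK is zero and the block ends
   in CR, that CR is excluded from the result and *CARRY is set to 1,
   since the matching LF may arrive with the next block.  */
size_t
sketch_crlf_to_lf (unsigned char *buf, size_t len, int last_block, int *carry)
{
  size_t in = 0, out = 0;

  assert (buf != NULL && carry != NULL);
  *carry = 0;
  if (!last_block && len > 0 && buf[len - 1] == '\r')
    {
      *carry = 1;
      len--;
    }
  while (in < len)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        {
          buf[out++] = '\n';       /* two bytes in, one byte out */
          in += 2;
        }
      else
        buf[out++] = buf[in++];
    }
  return out;
}
```

The write index never passes the read index, so the conversion needs no second buffer. The difference between the old and new length is the number of deleted CR bytes, which is what the original subtracts from `coding->produced`.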
4862
4ed46869
KH
4863/* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4864 decoding, it may detect the coding system and end-of-line format if
b73bfc1c
KH
4865 those are not yet decided. The source should be unibyte, the
4866 result is multibyte if CODING->dst_multibyte is nonzero, else
4867 unibyte. */
4ed46869
KH
4868
4869int
d46c5b12 4870decode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869 4871 struct coding_system *coding;
a4244313
KR
4872 const unsigned char *source;
4873 unsigned char *destination;
4ed46869 4874 int src_bytes, dst_bytes;
4ed46869 4875{
9861e777
EZ
4876 int extra = 0;
4877
0ef69138 4878 if (coding->type == coding_type_undecided)
4ed46869
KH
4879 detect_coding (coding, source, src_bytes);
4880
aaaf0b1e
KH
4881 if (coding->eol_type == CODING_EOL_UNDECIDED
4882 && coding->type != coding_type_ccl)
8844fa83
KH
4883 {
4884 detect_eol (coding, source, src_bytes);
4885 /* We had better recover the original eol format if we
8ca3766a 4886 encounter an inconsistent eol format while decoding. */
8844fa83
KH
4887 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4888 }
4ed46869 4889
b73bfc1c
KH
4890 coding->produced = coding->produced_char = 0;
4891 coding->consumed = coding->consumed_char = 0;
4892 coding->errors = 0;
4893 coding->result = CODING_FINISH_NORMAL;
4894
4ed46869
KH
4895 switch (coding->type)
4896 {
4ed46869 4897 case coding_type_sjis:
b73bfc1c
KH
4898 decode_coding_sjis_big5 (coding, source, destination,
4899 src_bytes, dst_bytes, 1);
4ed46869
KH
4900 break;
4901
4902 case coding_type_iso2022:
b73bfc1c
KH
4903 decode_coding_iso2022 (coding, source, destination,
4904 src_bytes, dst_bytes);
4ed46869
KH
4905 break;
4906
4907 case coding_type_big5:
b73bfc1c
KH
4908 decode_coding_sjis_big5 (coding, source, destination,
4909 src_bytes, dst_bytes, 0);
4910 break;
4911
4912 case coding_type_emacs_mule:
4913 decode_coding_emacs_mule (coding, source, destination,
4914 src_bytes, dst_bytes);
4ed46869
KH
4915 break;
4916
4917 case coding_type_ccl:
aaaf0b1e
KH
4918 if (coding->spec.ccl.cr_carryover)
4919 {
9861e777
EZ
4920 /* Put the CR which was not processed by the previous call
4921 of decode_eol_post_ccl in DESTINATION. It will be
4922 decoded together with the following LF by the call to
4923 decode_eol_post_ccl below. */
aaaf0b1e
KH
4924 *destination = '\r';
4925 coding->produced++;
4926 coding->produced_char++;
4927 dst_bytes--;
9861e777 4928 extra = coding->spec.ccl.cr_carryover;
aaaf0b1e 4929 }
9861e777 4930 ccl_coding_driver (coding, source, destination + extra,
b73bfc1c 4931 src_bytes, dst_bytes, 0);
aaaf0b1e 4932 if (coding->eol_type != CODING_EOL_LF)
9861e777
EZ
4933 {
4934 coding->produced += extra;
4935 coding->produced_char += extra;
4936 decode_eol_post_ccl (coding, destination, coding->produced);
4937 }
d46c5b12
KH
4938 break;
4939
b73bfc1c
KH
4940 default:
4941 decode_eol (coding, source, destination, src_bytes, dst_bytes);
4942 }
4943
4944 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
e7c9eef9 4945 && coding->mode & CODING_MODE_LAST_BLOCK
b73bfc1c
KH
4946 && coding->consumed == src_bytes)
4947 coding->result = CODING_FINISH_NORMAL;
4948
4949 if (coding->mode & CODING_MODE_LAST_BLOCK
4950 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
4951 {
a4244313 4952 const unsigned char *src = source + coding->consumed;
b73bfc1c
KH
4953 unsigned char *dst = destination + coding->produced;
4954
4955 src_bytes -= coding->consumed;
bb10be8b 4956 coding->errors++;
b73bfc1c
KH
4957 if (COMPOSING_P (coding))
4958 DECODE_COMPOSITION_END ('1');
4959 while (src_bytes--)
d46c5b12 4960 {
b73bfc1c
KH
4961 int c = *src++;
4962 dst += CHAR_STRING (c, dst);
4963 coding->produced_char++;
d46c5b12 4964 }
b73bfc1c
KH
4965 coding->consumed = coding->consumed_char = src - source;
4966 coding->produced = dst - destination;
73be902c 4967 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
4968 }
4969
b73bfc1c
KH
4970 if (!coding->dst_multibyte)
4971 {
4972 coding->produced = str_as_unibyte (destination, coding->produced);
4973 coding->produced_char = coding->produced;
4974 }
4ed46869 4975
b73bfc1c
KH
4976 return coding->result;
4977}
52d41803 4978
b73bfc1c
KH
4979/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4980 multibyteness of the source is CODING->src_multibyte; the
4981 result is always unibyte. */
4ed46869
KH
4982
4983int
d46c5b12 4984encode_coding (coding, source, destination, src_bytes, dst_bytes)
4ed46869 4985 struct coding_system *coding;
a4244313
KR
4986 const unsigned char *source;
4987 unsigned char *destination;
4ed46869 4988 int src_bytes, dst_bytes;
4ed46869 4989{
b73bfc1c
KH
4990 coding->produced = coding->produced_char = 0;
4991 coding->consumed = coding->consumed_char = 0;
4992 coding->errors = 0;
4993 coding->result = CODING_FINISH_NORMAL;
4ed46869 4994
d46c5b12
KH
4995 switch (coding->type)
4996 {
4ed46869 4997 case coding_type_sjis:
b73bfc1c
KH
4998 encode_coding_sjis_big5 (coding, source, destination,
4999 src_bytes, dst_bytes, 1);
4ed46869
KH
5000 break;
5001
5002 case coding_type_iso2022:
b73bfc1c
KH
5003 encode_coding_iso2022 (coding, source, destination,
5004 src_bytes, dst_bytes);
4ed46869
KH
5005 break;
5006
5007 case coding_type_big5:
b73bfc1c
KH
5008 encode_coding_sjis_big5 (coding, source, destination,
5009 src_bytes, dst_bytes, 0);
5010 break;
5011
5012 case coding_type_emacs_mule:
5013 encode_coding_emacs_mule (coding, source, destination,
5014 src_bytes, dst_bytes);
4ed46869
KH
5015 break;
5016
5017 case coding_type_ccl:
b73bfc1c
KH
5018 ccl_coding_driver (coding, source, destination,
5019 src_bytes, dst_bytes, 1);
d46c5b12
KH
5020 break;
5021
b73bfc1c
KH
5022 default:
5023 encode_eol (coding, source, destination, src_bytes, dst_bytes);
5024 }
5025
73be902c
KH
5026 if (coding->mode & CODING_MODE_LAST_BLOCK
5027 && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
b73bfc1c 5028 {
a4244313 5029 const unsigned char *src = source + coding->consumed;
b73bfc1c
KH
5030 unsigned char *dst = destination + coding->produced;
5031
5032 if (coding->type == coding_type_iso2022)
5033 ENCODE_RESET_PLANE_AND_REGISTER;
5034 if (COMPOSING_P (coding))
5035 *dst++ = ISO_CODE_ESC, *dst++ = '1';
5036 if (coding->consumed < src_bytes)
d46c5b12 5037 {
b73bfc1c
KH
5038 int len = src_bytes - coding->consumed;
5039
fabf4a91 5040 BCOPY_SHORT (src, dst, len);
b73bfc1c
KH
5041 if (coding->src_multibyte)
5042 len = str_as_unibyte (dst, len);
5043 dst += len;
5044 coding->consumed = src_bytes;
d46c5b12 5045 }
b73bfc1c 5046 coding->produced = coding->produced_char = dst - destination;
73be902c 5047 coding->result = CODING_FINISH_NORMAL;
4ed46869
KH
5048 }
5049
bb10be8b
KH
5050 if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
5051 && coding->consumed == src_bytes)
5052 coding->result = CODING_FINISH_NORMAL;
5053
b73bfc1c 5054 return coding->result;
4ed46869
KH
5055}
5056
fb88bf2d
KH
5057/* Scan text in the region between *BEG and *END (byte positions),
5058 skip characters which we don't have to decode by coding system
5059 CODING at the head and tail, then set *BEG and *END to the region
5060 of the text we actually have to convert. The caller should move
b73bfc1c
KH
5061 the gap out of the region in advance if the region is from a
5062 buffer.
4ed46869 5063
d46c5b12
KH
5064 If STR is not NULL, *BEG and *END are indices into STR. */
5065
5066static void
5067shrink_decoding_region (beg, end, coding, str)
5068 int *beg, *end;
5069 struct coding_system *coding;
5070 unsigned char *str;
5071{
fb88bf2d 5072 unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
d46c5b12 5073 int eol_conversion;
88993dfd 5074 Lisp_Object translation_table;
d46c5b12
KH
5075
5076 if (coding->type == coding_type_ccl
5077 || coding->type == coding_type_undecided
b73bfc1c
KH
5078 || coding->eol_type != CODING_EOL_LF
5079 || !NILP (coding->post_read_conversion)
5080 || coding->composing != COMPOSITION_DISABLED)
d46c5b12
KH
5081 {
5082 /* We can't skip any data. */
5083 return;
5084 }
b73bfc1c
KH
5085 if (coding->type == coding_type_no_conversion
5086 || coding->type == coding_type_raw_text
5087 || coding->type == coding_type_emacs_mule)
d46c5b12 5088 {
fb88bf2d
KH
5089 /* We need no conversion, but don't have to skip any data here.
5090 The decoding routine handles it efficiently anyway. */
d46c5b12
KH
5091 return;
5092 }
5093
88993dfd
KH
5094 translation_table = coding->translation_table_for_decode;
5095 if (NILP (translation_table) && !NILP (Venable_character_translation))
5096 translation_table = Vstandard_translation_table_for_decode;
5097 if (CHAR_TABLE_P (translation_table))
5098 {
5099 int i;
5100 for (i = 0; i < 128; i++)
5101 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
5102 break;
5103 if (i < 128)
fa46990e 5104 /* Some ASCII character should be translated. We give up
88993dfd
KH
5105 shrinking. */
5106 return;
5107 }
5108
b73bfc1c 5109 if (coding->heading_ascii >= 0)
d46c5b12
KH
5110 /* Detection routine has already found how much we can skip at the
5111 head. */
5112 *beg += coding->heading_ascii;
5113
5114 if (str)
5115 {
5116 begp_orig = begp = str + *beg;
5117 endp_orig = endp = str + *end;
5118 }
5119 else
5120 {
fb88bf2d 5121 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
5122 endp_orig = endp = begp + *end - *beg;
5123 }
5124
fa46990e
DL
5125 eol_conversion = (coding->eol_type == CODING_EOL_CR
5126 || coding->eol_type == CODING_EOL_CRLF);
5127
d46c5b12
KH
5128 switch (coding->type)
5129 {
d46c5b12
KH
5130 case coding_type_sjis:
5131 case coding_type_big5:
5132 /* We can skip all ASCII characters at the head. */
5133 if (coding->heading_ascii < 0)
5134 {
5135 if (eol_conversion)
de9d083c 5136 while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
d46c5b12
KH
5137 else
5138 while (begp < endp && *begp < 0x80) begp++;
5139 }
5140 /* We can skip all ASCII characters at the tail except for the
5141 second byte of SJIS or BIG5 code. */
5142 if (eol_conversion)
de9d083c 5143 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
d46c5b12
KH
5144 else
5145 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
5146 /* Do not consider LF as ASCII if preceded by CR, since that
5147 confuses eol decoding. */
5148 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
5149 endp++;
d46c5b12
KH
5150 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
5151 endp++;
5152 break;
5153
b73bfc1c 5154 case coding_type_iso2022:
622fece5
KH
5155 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
5156 /* We can't skip any data. */
5157 break;
d46c5b12
KH
5158 if (coding->heading_ascii < 0)
5159 {
d46c5b12
KH
5160 /* We can skip all ASCII characters at the head except for a
5161 few control codes. */
5162 while (begp < endp && (c = *begp) < 0x80
5163 && c != ISO_CODE_CR && c != ISO_CODE_SO
5164 && c != ISO_CODE_SI && c != ISO_CODE_ESC
5165 && (!eol_conversion || c != ISO_CODE_LF))
5166 begp++;
5167 }
5168 switch (coding->category_idx)
5169 {
5170 case CODING_CATEGORY_IDX_ISO_8_1:
5171 case CODING_CATEGORY_IDX_ISO_8_2:
5172 /* We can skip all ASCII characters at the tail. */
5173 if (eol_conversion)
de9d083c 5174 while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
d46c5b12
KH
5175 else
5176 while (begp < endp && endp[-1] < 0x80) endp--;
ee59c65f
RS
5177 /* Do not consider LF as ASCII if preceded by CR, since that
5178 confuses eol decoding. */
5179 if (begp < endp && endp < endp_orig && endp[-1] == '\r' && endp[0] == '\n')
5180 endp++;
d46c5b12
KH
5181 break;
5182
5183 case CODING_CATEGORY_IDX_ISO_7:
5184 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
de79a6a5 5185 {
8ca3766a 5186 /* We can skip all characters at the tail except for 8-bit
de79a6a5
KH
5187 codes and ESC and the 2 bytes following it. */
5188 unsigned char *eight_bit = NULL;
5189
5190 if (eol_conversion)
5191 while (begp < endp
5192 && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
5193 {
5194 if (!eight_bit && c & 0x80) eight_bit = endp;
5195 endp--;
5196 }
5197 else
5198 while (begp < endp
5199 && (c = endp[-1]) != ISO_CODE_ESC)
5200 {
5201 if (!eight_bit && c & 0x80) eight_bit = endp;
5202 endp--;
5203 }
5204 /* Do not consider LF as ASCII if preceded by CR, since that
5205 confuses eol decoding. */
5206 if (begp < endp && endp < endp_orig
5207 && endp[-1] == '\r' && endp[0] == '\n')
5208 endp++;
5209 if (begp < endp && endp[-1] == ISO_CODE_ESC)
5210 {
5211 if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
5212 /* This is an ASCII designation sequence. We can
5213 surely skip the tail. But, if we have
5214 encountered an 8-bit code, skip only the codes
5215 after that. */
5216 endp = eight_bit ? eight_bit : endp + 2;
5217 else
5218 /* Hmmm, we can't skip the tail. */
5219 endp = endp_orig;
5220 }
5221 else if (eight_bit)
5222 endp = eight_bit;
5223 }
d46c5b12 5224 }
b73bfc1c
KH
5225 break;
5226
5227 default:
5228 abort ();
d46c5b12
KH
5229 }
5230 *beg += begp - begp_orig;
5231 *end += endp - endp_orig;
5232 return;
5233}
5234
5235/* Like shrink_decoding_region but for encoding. */
5236
5237static void
5238shrink_encoding_region (beg, end, coding, str)
5239 int *beg, *end;
5240 struct coding_system *coding;
5241 unsigned char *str;
5242{
5243 unsigned char *begp_orig, *begp, *endp_orig, *endp;
5244 int eol_conversion;
88993dfd 5245 Lisp_Object translation_table;
d46c5b12 5246
b73bfc1c
KH
5247 if (coding->type == coding_type_ccl
5248 || coding->eol_type == CODING_EOL_CRLF
5249 || coding->eol_type == CODING_EOL_CR
87323294 5250 || (coding->cmp_data && coding->cmp_data->used > 0))
d46c5b12 5251 {
b73bfc1c
KH
5252 /* We can't skip any data. */
5253 return;
5254 }
5255 if (coding->type == coding_type_no_conversion
5256 || coding->type == coding_type_raw_text
5257 || coding->type == coding_type_emacs_mule
5258 || coding->type == coding_type_undecided)
5259 {
5260 /* We need no conversion, but don't have to skip any data here.
5261 The encoding routine handles it efficiently anyway. */
d46c5b12
KH
5262 return;
5263 }
5264
88993dfd
KH
5265 translation_table = coding->translation_table_for_encode;
5266 if (NILP (translation_table) && !NILP (Venable_character_translation))
5267 translation_table = Vstandard_translation_table_for_encode;
5268 if (CHAR_TABLE_P (translation_table))
5269 {
5270 int i;
5271 for (i = 0; i < 128; i++)
5272 if (!NILP (CHAR_TABLE_REF (translation_table, i)))
5273 break;
5274 if (i < 128)
8ca3766a 5275 /* Some ASCII character should be translated. We give up
88993dfd
KH
5276 shrinking. */
5277 return;
5278 }
5279
d46c5b12
KH
5280 if (str)
5281 {
5282 begp_orig = begp = str + *beg;
5283 endp_orig = endp = str + *end;
5284 }
5285 else
5286 {
fb88bf2d 5287 begp_orig = begp = BYTE_POS_ADDR (*beg);
d46c5b12
KH
5288 endp_orig = endp = begp + *end - *beg;
5289 }
5290
5291 eol_conversion = (coding->eol_type == CODING_EOL_CR
5292 || coding->eol_type == CODING_EOL_CRLF);
5293
5294 /* Here, we don't have to check coding->pre_write_conversion because
5295 the caller is expected to have handled it already. */
5296 switch (coding->type)
5297 {
d46c5b12 5298 case coding_type_iso2022:
622fece5
KH
5299 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
5300 /* We can't skip any data. */
5301 break;
d46c5b12
KH
5302 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
5303 {
93dec019 5304 unsigned char *bol = begp;
d46c5b12
KH
5305 while (begp < endp && *begp < 0x80)
5306 {
5307 begp++;
5308 if (begp[-1] == '\n')
5309 bol = begp;
5310 }
5311 begp = bol;
5312 goto label_skip_tail;
5313 }
5314 /* fall through ... */
5315
b73bfc1c
KH
5316 case coding_type_sjis:
5317 case coding_type_big5:
d46c5b12
KH
5318 /* We can skip all ASCII characters at the head and tail. */
5319 if (eol_conversion)
5320 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
5321 else
5322 while (begp < endp && *begp < 0x80) begp++;
5323 label_skip_tail:
5324 if (eol_conversion)
5325 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
5326 else
5327 while (begp < endp && *(endp - 1) < 0x80) endp--;
5328 break;
b73bfc1c
KH
5329
5330 default:
5331 abort ();
d46c5b12
KH
5332 }
5333
5334 *beg += begp - begp_orig;
5335 *end += endp - endp_orig;
5336 return;
5337}
5338
88993dfd
KH
5339/* As shrinking conversion region requires some overhead, we don't try
5340 shrinking if the length of conversion region is less than this
5341 value. */
5342static int shrink_conversion_region_threshhold = 1024;
5343
5344#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
5345 do { \
5346 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
5347 { \
5348 if (encodep) shrink_encoding_region (beg, end, coding, str); \
5349 else shrink_decoding_region (beg, end, coding, str); \
5350 } \
5351 } while (0)
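
The core idea behind both shrink functions is that bytes which convert to themselves need not pass through the converter. The simplest case, with no end-of-line conversion and no special control codes, is sketched below. The function name is hypothetical, offsets are `size_t` rather than `int`, and the sketch ignores the per-coding-system exceptions handled above (CR and LF under EOL conversion, ESC, SO and SI for ISO 2022, and the ASCII-range second byte of SJIS or BIG5 codes).

```c
#include <assert.h>
#include <stddef.h>

/* Narrow the half-open range [*BEG, *END) of STR so that it excludes
   leading and trailing runs of ASCII bytes (below 0x80).  If the
   whole range is ASCII, the result is an empty range.  */
void
sketch_shrink_ascii (const unsigned char *str, size_t *beg, size_t *end)
{
  assert (str != NULL && beg != NULL && end != NULL && *beg <= *end);
  while (*beg < *end && str[*beg] < 0x80)
    ++*beg;
  while (*beg < *end && str[*end - 1] < 0x80)
    --*end;
}
```

The original applies this only to regions longer than the threshold (1024 bytes), since for short regions the scan costs about as much as the conversion it avoids.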
5352
24a2b282
KH
5353/* ARG is (CODING BUFFER ...) where CODING is the value to set in
5354 Vlast_coding_system_used and the remaining elements are buffers to
16ef9c56 5355 kill. */
b843d1ae 5356static Lisp_Object
1c7457e2
KH
5357code_convert_region_unwind (arg)
5358 Lisp_Object arg;
b843d1ae 5359{
89aa725a
KH
5360 struct gcpro gcpro1;
5361 GCPRO1 (arg);
5362
b843d1ae 5363 inhibit_pre_post_conversion = 0;
16ef9c56 5364 Vlast_coding_system_used = XCAR (arg);
24a2b282
KH
5365 for (arg = XCDR (arg); ! NILP (arg); arg = XCDR (arg))
5366 Fkill_buffer (XCAR (arg));
89aa725a
KH
5367
5368 UNGCPRO;
b843d1ae
KH
5369 return Qnil;
5370}
5371
ec6d2bb8
KH
5372/* Store information about all compositions in the range FROM to TO
5373 of OBJ in memory blocks pointed to by CODING->cmp_data. OBJ is a
5374 buffer or a string, and defaults to the current buffer. */
5375
5376void
5377coding_save_composition (coding, from, to, obj)
5378 struct coding_system *coding;
5379 int from, to;
5380 Lisp_Object obj;
5381{
5382 Lisp_Object prop;
5383 int start, end;
5384
91bee881
KH
5385 if (coding->composing == COMPOSITION_DISABLED)
5386 return;
5387 if (!coding->cmp_data)
5388 coding_allocate_composition_data (coding, from);
ec6d2bb8
KH
5389 if (!find_composition (from, to, &start, &end, &prop, obj)
5390 || end > to)
5391 return;
5392 if (start < from
5393 && (!find_composition (end, to, &start, &end, &prop, obj)
5394 || end > to))
5395 return;
5396 coding->composing = COMPOSITION_NO;
ec6d2bb8
KH
5397 do
5398 {
5399 if (COMPOSITION_VALID_P (start, end, prop))
5400 {
5401 enum composition_method method = COMPOSITION_METHOD (prop);
5402 if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
5403 >= COMPOSITION_DATA_SIZE)
5404 coding_allocate_composition_data (coding, from);
5405 /* For relative composition, we remember start and end
5406 positions, for the other compositions, we also remember
5407 components. */
5408 CODING_ADD_COMPOSITION_START (coding, start - from, method);
5409 if (method != COMPOSITION_RELATIVE)
5410 {
5411 /* We must store the components. */
5412 Lisp_Object val, ch;
5413
5414 val = COMPOSITION_COMPONENTS (prop);
5415 if (CONSP (val))
5416 while (CONSP (val))
5417 {
5418 ch = XCAR (val), val = XCDR (val);
5419 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
5420 }
5421 else if (VECTORP (val) || STRINGP (val))
5422 {
5423 int len = (VECTORP (val)
d5db4077 5424 ? XVECTOR (val)->size : SCHARS (val));
ec6d2bb8
KH
5425 int i;
5426 for (i = 0; i < len; i++)
5427 {
5428 ch = (STRINGP (val)
5429 ? Faref (val, make_number (i))
5430 : XVECTOR (val)->contents[i]);
5431 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
5432 }
5433 }
5434 else /* INTEGERP (val) */
5435 CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
5436 }
5437 CODING_ADD_COMPOSITION_END (coding, end - from);
5438 }
5439 start = end;
5440 }
5441 while (start < to
5442 && find_composition (start, to, &start, &end, &prop, obj)
5443 && end <= to);
5444
5445 /* Make coding->cmp_data point to the first memory block. */
5446 while (coding->cmp_data->prev)
5447 coding->cmp_data = coding->cmp_data->prev;
5448 coding->cmp_data_start = 0;
5449}
5450
5451/* Reflect the saved information about compositions to OBJ.
8ca3766a 5452 CODING->cmp_data points to a memory block for the information. OBJ
ec6d2bb8
KH
5453 is a buffer or a string, defaults to the current buffer. */
5454
33fb63eb 5455void
ec6d2bb8
KH
5456coding_restore_composition (coding, obj)
5457 struct coding_system *coding;
5458 Lisp_Object obj;
5459{
5460 struct composition_data *cmp_data = coding->cmp_data;
5461
5462 if (!cmp_data)
5463 return;
5464
5465 while (cmp_data->prev)
5466 cmp_data = cmp_data->prev;
5467
5468 while (cmp_data)
5469 {
5470 int i;
5471
78108bcd
KH
5472 for (i = 0; i < cmp_data->used && cmp_data->data[i] > 0;
5473 i += cmp_data->data[i])
ec6d2bb8
KH
5474 {
5475 int *data = cmp_data->data + i;
5476 enum composition_method method = (enum composition_method) data[3];
5477 Lisp_Object components;
5478
4307d534
KH
5479 if (data[0] < 0 || i + data[0] > cmp_data->used)
5480 /* Invalid composition data. */
5481 break;
5482
ec6d2bb8
KH
5483 if (method == COMPOSITION_RELATIVE)
5484 components = Qnil;
5485 else
5486 {
5487 int len = data[0] - 4, j;
5488 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];
5489
b6871cc7
KH
5490 if (method == COMPOSITION_WITH_RULE_ALTCHARS
5491 && len % 2 == 0)
5492 len --;
09721b31
KH
5493 if (len < 1)
5494 /* Invalid composition data. */
5495 break;
ec6d2bb8
KH
5496 for (j = 0; j < len; j++)
5497 args[j] = make_number (data[4 + j]);
5498 components = (method == COMPOSITION_WITH_ALTCHARS
316d4bf9
SM
5499 ? Fstring (len, args)
5500 : Fvector (len, args));
ec6d2bb8
KH
5501 }
5502 compose_text (data[1], data[2], components, Qnil, obj);
5503 }
5504 cmp_data = cmp_data->next;
5505 }
5506}
5507
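The record layout that coding_restore_composition walks can be illustrated in isolation. The following is a minimal sketch, not the Emacs implementation: `struct demo_block` and `demo_walk_records` are hypothetical names, and the layout [LENGTH, START, END, METHOD, COMPONENT...] is inferred from how the loop above indexes data[0] through data[4 + j]. The sketch also rejects records shorter than the four-int header, which is a slightly stricter check than the one in the function above.

```c
#include <assert.h>

/* Hypothetical stand-in for one block of composition data: a flat
   array of ints holding variable-length records.  Each record is
   [LENGTH, START, END, METHOD, COMPONENT...], where LENGTH counts
   every int in the record, itself included.  */
struct demo_block
{
  int used;			/* number of ints in use in DATA */
  int data[32];
};

/* Walk the records in BLOCK and return how many were well formed.
   Stop at the first record whose length is shorter than the header
   or runs past the used part of the array.  */
static int
demo_walk_records (const struct demo_block *block)
{
  int i, count = 0;

  for (i = 0; i < block->used && block->data[i] > 0; i += block->data[i])
    {
      const int *rec = block->data + i;

      if (rec[0] < 4 || i + rec[0] > block->used)
	/* Invalid record; give up as the original loop does.  */
	break;
      /* rec[1] and rec[2] are the character range, rec[3] the
	 method, and rec[4] .. rec[rec[0] - 1] the components.  */
      count++;
    }
  return count;
}
```

Because each record carries its own length, the walker needs no separate index of records, and a corrupted length stops the walk instead of reading past the used part of the block.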
d46c5b12 5508/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
fb88bf2d
KH
5509 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
5510 coding system CODING, and return the status code of code conversion
5511 (currently, this value has no meaning).
5512
5513 The numbers of characters (and bytes) consumed and produced by
5514 the conversion are recorded in members of the structure
5515 CODING.
d46c5b12 5516
6e44253b 5517 If REPLACE is nonzero, we do various things as if the original text
d46c5b12 5518 is deleted and a new text is inserted. See the comments in
b73bfc1c
KH
5519 replace_range (insdel.c) to know what we are doing.
5520
5521 If REPLACE is zero, it is assumed that the source text is unibyte.
8ca3766a 5522 Otherwise, it is assumed that the source text is multibyte. */
4ed46869
KH
5523
5524int
6e44253b
KH
5525code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
5526 int from, from_byte, to, to_byte, encodep, replace;
4ed46869 5527 struct coding_system *coding;
4ed46869 5528{
fb88bf2d 5529 int len = to - from, len_byte = to_byte - from_byte;
72d1a715 5530 int nchars_del = 0, nbytes_del = 0;
fb88bf2d 5531 int require, inserted, inserted_byte;
4b39528c 5532 int head_skip, tail_skip, total_skip = 0;
84d60297 5533 Lisp_Object saved_coding_symbol;
fb88bf2d 5534 int first = 1;
fb88bf2d 5535 unsigned char *src, *dst;
84d60297 5536 Lisp_Object deletion;
e133c8fa 5537 int orig_point = PT, orig_len = len;
6abb9bd9 5538 int prev_Z;
b73bfc1c
KH
5539 int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);
5540
84d60297 5541 deletion = Qnil;
8844fa83 5542 saved_coding_symbol = coding->symbol;
d46c5b12 5543
83fa074f 5544 if (from < PT && PT < to)
e133c8fa
KH
5545 {
5546 TEMP_SET_PT_BOTH (from, from_byte);
5547 orig_point = from;
5548 }
83fa074f 5549
6e44253b 5550 if (replace)
d46c5b12 5551 {
fb88bf2d 5552 int saved_from = from;
e077cc80 5553 int saved_inhibit_modification_hooks;
fb88bf2d 5554
d46c5b12 5555 prepare_to_modify_buffer (from, to, &from);
fb88bf2d
KH
5556 if (saved_from != from)
5557 {
5558 to = from + len;
b73bfc1c 5559 from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
fb88bf2d
KH
5560 len_byte = to_byte - from_byte;
5561 }
e077cc80
KH
5562
5563 /* The code conversion routine cannot preserve text properties
5564 for now. So, we must remove all text properties in the
5565 region. Here, we must suppress all modification hooks. */
5566 saved_inhibit_modification_hooks = inhibit_modification_hooks;
5567 inhibit_modification_hooks = 1;
5568 Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
5569 inhibit_modification_hooks = saved_inhibit_modification_hooks;
d46c5b12 5570 }
d46c5b12
KH
5571
5572 if (! encodep && CODING_REQUIRE_DETECTION (coding))
5573 {
12410ef1 5574 /* We must detect encoding of text and eol format. */
d46c5b12
KH
5575
5576 if (from < GPT && to > GPT)
5577 move_gap_both (from, from_byte);
5578 if (coding->type == coding_type_undecided)
5579 {
fb88bf2d 5580 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
d46c5b12 5581 if (coding->type == coding_type_undecided)
62b3ef1d
KH
5582 {
5583 /* It seems that the text contains only ASCII, but we
d9aef30f 5584 should not leave it undecided because the deeper
62b3ef1d
KH
5585 decoding routine (decode_coding) tries to detect the
5586 encodings again in vain. */
5587 coding->type = coding_type_emacs_mule;
5588 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
d280ccb6
KH
5589 /* As emacs-mule decoder will handle composition, we
5590 need this setting to allocate coding->cmp_data
5591 later. */
5592 coding->composing = COMPOSITION_NO;
62b3ef1d 5593 }
d46c5b12 5594 }
aaaf0b1e
KH
5595 if (coding->eol_type == CODING_EOL_UNDECIDED
5596 && coding->type != coding_type_ccl)
d46c5b12 5597 {
d46c5b12
KH
5598 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
5599 if (coding->eol_type == CODING_EOL_UNDECIDED)
5600 coding->eol_type = CODING_EOL_LF;
5601 /* We had better recover the original eol format if we
8ca3766a 5602 encounter an inconsistent eol format while decoding. */
d46c5b12
KH
5603 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
5604 }
5605 }
5606
d46c5b12
KH
5607 /* Now we convert the text. */
5608
5609 /* For encoding, we must process pre-write-conversion in advance. */
b73bfc1c
KH
5610 if (! inhibit_pre_post_conversion
5611 && encodep
d46c5b12
KH
5612 && SYMBOLP (coding->pre_write_conversion)
5613 && ! NILP (Ffboundp (coding->pre_write_conversion)))
5614 {
2b4f9037
KH
5615 /* The function in pre-write-conversion may put a new text in a
5616 new buffer. */
0007bdd0
KH
5617 struct buffer *prev = current_buffer;
5618 Lisp_Object new;
d46c5b12 5619
1c7457e2 5620 record_unwind_protect (code_convert_region_unwind,
16ef9c56 5621 Fcons (Vlast_coding_system_used, Qnil));
b843d1ae
KH
5622 /* We should not call any more pre-write/post-read-conversion
5623 functions while this pre-write-conversion is running. */
5624 inhibit_pre_post_conversion = 1;
b39f748c
AS
5625 call2 (coding->pre_write_conversion,
5626 make_number (from), make_number (to));
b843d1ae
KH
5627 inhibit_pre_post_conversion = 0;
5628 /* Discard the unwind protect. */
5629 specpdl_ptr--;
5630
d46c5b12
KH
5631 if (current_buffer != prev)
5632 {
5633 len = ZV - BEGV;
0007bdd0 5634 new = Fcurrent_buffer ();
d46c5b12 5635 set_buffer_internal_1 (prev);
7dae4502 5636 del_range_2 (from, from_byte, to, to_byte, 0);
e133c8fa 5637 TEMP_SET_PT_BOTH (from, from_byte);
0007bdd0
KH
5638 insert_from_buffer (XBUFFER (new), 1, len, 0);
5639 Fkill_buffer (new);
e133c8fa
KH
5640 if (orig_point >= to)
5641 orig_point += len - orig_len;
5642 else if (orig_point > from)
5643 orig_point = from;
5644 orig_len = len;
d46c5b12 5645 to = from + len;
b73bfc1c
KH
5646 from_byte = CHAR_TO_BYTE (from);
5647 to_byte = CHAR_TO_BYTE (to);
d46c5b12 5648 len_byte = to_byte - from_byte;
e133c8fa 5649 TEMP_SET_PT_BOTH (from, from_byte);
d46c5b12
KH
5650 }
5651 }
5652
12410ef1 5653 if (replace)
72d1a715
RS
5654 {
5655 if (! EQ (current_buffer->undo_list, Qt))
5656 deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
5657 else
5658 {
5659 nchars_del = to - from;
5660 nbytes_del = to_byte - from_byte;
5661 }
5662 }
12410ef1 5663
ec6d2bb8
KH
5664 if (coding->composing != COMPOSITION_DISABLED)
5665 {
5666 if (encodep)
5667 coding_save_composition (coding, from, to, Fcurrent_buffer ());
5668 else
5669 coding_allocate_composition_data (coding, from);
5670 }
fb88bf2d 5671
ce559e6f
KH
5672 /* Try to skip the leading and trailing ASCII characters. We can't skip them
5673 if we must run a CCL program or there are compositions to
5674 encode. */
5675 if (coding->type != coding_type_ccl
5676 && (! coding->cmp_data || coding->cmp_data->used == 0))
4956c225
KH
5677 {
5678 int from_byte_orig = from_byte, to_byte_orig = to_byte;
ec6d2bb8 5679
4956c225
KH
5680 if (from < GPT && GPT < to)
5681 move_gap_both (from, from_byte);
5682 SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
5683 if (from_byte == to_byte
5684 && (encodep || NILP (coding->post_read_conversion))
5685 && ! CODING_REQUIRE_FLUSHING (coding))
5686 {
5687 coding->produced = len_byte;
5688 coding->produced_char = len;
5689 if (!replace)
5690 /* We must record and adjust for this new text now. */
5691 adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);
ce559e6f 5692 coding_free_composition_data (coding);
4956c225
KH
5693 return 0;
5694 }
5695
5696 head_skip = from_byte - from_byte_orig;
5697 tail_skip = to_byte_orig - to_byte;
5698 total_skip = head_skip + tail_skip;
5699 from += head_skip;
5700 to -= tail_skip;
5701 len -= total_skip; len_byte -= total_skip;
5702 }
d46c5b12 5703
8ca3766a 5704 /* For conversion, we must put the gap before the text in addition to
fb88bf2d
KH
5705 making the gap larger for efficient decoding. The required gap
5706 size starts from 2000 which is the magic number used in make_gap.
5707 But, after one batch of conversion, it will be incremented if we
5708 find that it is not enough. */
d46c5b12
KH
5709 require = 2000;
5710
5711 if (GAP_SIZE < require)
5712 make_gap (require - GAP_SIZE);
5713 move_gap_both (from, from_byte);
5714
d46c5b12 5715 inserted = inserted_byte = 0;
fb88bf2d
KH
5716
5717 GAP_SIZE += len_byte;
5718 ZV -= len;
5719 Z -= len;
5720 ZV_BYTE -= len_byte;
5721 Z_BYTE -= len_byte;
5722
d9f9a1bc
GM
5723 if (GPT - BEG < BEG_UNCHANGED)
5724 BEG_UNCHANGED = GPT - BEG;
5725 if (Z - GPT < END_UNCHANGED)
5726 END_UNCHANGED = Z - GPT;
f2558efd 5727
b73bfc1c
KH
5728 if (!encodep && coding->src_multibyte)
5729 {
5730 /* Decoding routines expect the source text to be unibyte.
5731 We must convert 8-bit characters of multibyte form to
5732 unibyte. */
5733 int len_byte_orig = len_byte;
5734 len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
5735 if (len_byte < len_byte_orig)
5736 safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
5737 len_byte);
5738 coding->src_multibyte = 0;
5739 }
5740
d46c5b12
KH
5741 for (;;)
5742 {
fb88bf2d 5743 int result;
d46c5b12 5744
ec6d2bb8 5745 /* The buffer memory is now:
b73bfc1c
KH
5746 +--------+converted-text+---------+-------original-text-------+---+
5747 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
5748 |<---------------------- GAP ----------------------->| */
ec6d2bb8
KH
5749 src = GAP_END_ADDR - len_byte;
5750 dst = GPT_ADDR + inserted_byte;
5751
d46c5b12 5752 if (encodep)
fb88bf2d 5753 result = encode_coding (coding, src, dst, len_byte, 0);
d46c5b12 5754 else
0e79d667
RS
5755 {
5756 if (coding->composing != COMPOSITION_DISABLED)
5757 coding->cmp_data->char_offset = from + inserted;
5758 result = decode_coding (coding, src, dst, len_byte, 0);
5759 }
ec6d2bb8
KH
5760
5761 /* The buffer memory is now:
b73bfc1c
KH
5762 +--------+-------converted-text----+--+------original-text----+---+
5763 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
5764 |<---------------------- GAP ----------------------->| */
ec6d2bb8 5765
d46c5b12
KH
5766 inserted += coding->produced_char;
5767 inserted_byte += coding->produced;
d46c5b12 5768 len_byte -= coding->consumed;
ec6d2bb8
KH
5769
5770 if (result == CODING_FINISH_INSUFFICIENT_CMP)
5771 {
5772 coding_allocate_composition_data (coding, from + inserted);
5773 continue;
5774 }
5775
fb88bf2d 5776 src += coding->consumed;
3636f7a3 5777 dst += coding->produced;
d46c5b12 5778
9864ebce
KH
5779 if (result == CODING_FINISH_NORMAL)
5780 {
5781 src += len_byte;
5782 break;
5783 }
d46c5b12
KH
5784 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
5785 {
fb88bf2d 5786 unsigned char *pend = dst, *p = pend - inserted_byte;
38edf7d4 5787 Lisp_Object eol_type;
d46c5b12
KH
5788
5789 /* Encode LFs back to the original eol format (CR or CRLF). */
5790 if (coding->eol_type == CODING_EOL_CR)
5791 {
5792 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
5793 }
5794 else
5795 {
d46c5b12
KH
5796 int count = 0;
5797
fb88bf2d
KH
5798 while (p < pend) if (*p++ == '\n') count++;
5799 if (src - dst < count)
d46c5b12 5800 {
38edf7d4 5801 /* We don't have sufficient room for encoding LFs
fb88bf2d
KH
5802 back to CRLF. We must record converted and
5803 not-yet-converted text back to the buffer
5804 content, enlarge the gap, then record them out of
5805 the buffer contents again. */
5806 int add = len_byte + inserted_byte;
5807
5808 GAP_SIZE -= add;
5809 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5810 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5811 make_gap (count - GAP_SIZE);
5812 GAP_SIZE += add;
5813 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5814 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5815 /* Don't forget to update SRC, DST, and PEND. */
5816 src = GAP_END_ADDR - len_byte;
5817 dst = GPT_ADDR + inserted_byte;
5818 pend = dst;
d46c5b12 5819 }
d46c5b12
KH
5820 inserted += count;
5821 inserted_byte += count;
fb88bf2d
KH
5822 coding->produced += count;
5823 p = dst = pend + count;
5824 while (count)
5825 {
5826 *--p = *--pend;
5827 if (*p == '\n') count--, *--p = '\r';
5828 }
d46c5b12
KH
5829 }
5830
5831 /* Suppress eol-format conversion in the further conversion. */
5832 coding->eol_type = CODING_EOL_LF;
5833
38edf7d4
KH
5834 /* Set the coding system symbol to that for Unix-like EOL. */
5835 eol_type = Fget (saved_coding_symbol, Qeol_type);
5836 if (VECTORP (eol_type)
5837 && XVECTOR (eol_type)->size == 3
5838 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
5839 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
5840 else
5841 coding->symbol = saved_coding_symbol;
93dec019 5842
fb88bf2d 5843 continue;
d46c5b12
KH
5844 }
5845 if (len_byte <= 0)
944bd420
KH
5846 {
5847 if (coding->type != coding_type_ccl
5848 || coding->mode & CODING_MODE_LAST_BLOCK)
5849 break;
5850 coding->mode |= CODING_MODE_LAST_BLOCK;
5851 continue;
5852 }
d46c5b12
KH
5853 if (result == CODING_FINISH_INSUFFICIENT_SRC)
5854 {
5855 /* The source text ends in invalid codes. Let's just
5856 make them valid buffer contents, and finish conversion. */
70ad9fc4
GM
5857 if (multibyte_p)
5858 {
5859 unsigned char *start = dst;
93dec019 5860
70ad9fc4
GM
5861 inserted += len_byte;
5862 while (len_byte--)
5863 {
5864 int c = *src++;
5865 dst += CHAR_STRING (c, dst);
5866 }
5867
5868 inserted_byte += dst - start;
5869 }
5870 else
5871 {
5872 inserted += len_byte;
5873 inserted_byte += len_byte;
5874 while (len_byte--)
5875 *dst++ = *src++;
5876 }
d46c5b12
KH
5877 break;
5878 }
9864ebce
KH
5879 if (result == CODING_FINISH_INTERRUPT)
5880 {
5881 /* The conversion procedure was interrupted by a user. */
9864ebce
KH
5882 break;
5883 }
5884 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5885 if (coding->consumed < 1)
5886 {
5887 /* It's quite strange to require more memory without
5888 consuming any bytes. Perhaps CCL program bug. */
9864ebce
KH
5889 break;
5890 }
fb88bf2d
KH
5891 if (first)
5892 {
5893 /* We have just done the first batch of conversion which was
8ca3766a 5894 stopped because of insufficient gap. Let's reconsider the
fb88bf2d
KH
5895 required gap size (i.e. SRC - DST) now.
5896
5897 We have converted ORIG bytes (== coding->consumed) into
5898 NEW bytes (coding->produced). To convert the remaining
5899 LEN_BYTE bytes, we may need REQUIRE bytes of gap, where:
5900 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5901 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5902 Here, we are sure that NEW >= ORIG. */
b3385c28
KH
5903
5904 if (coding->produced <= coding->consumed)
5905 {
5906 /* This happens because of a CCL-based coding system with
5907 eol-type CRLF. */
5908 require = 0;
5909 }
5910 else
5911 {
b3ebb2d4
KH
5912 float ratio = coding->produced - coding->consumed;
5913 ratio /= coding->consumed;
b3385c28
KH
5914 require = len_byte * ratio;
5915 }
fb88bf2d
KH
5916 first = 0;
5917 }
5918 if ((src - dst) < (require + 2000))
5919 {
5920 /* See the comment above the previous call of make_gap. */
5921 int add = len_byte + inserted_byte;
5922
5923 GAP_SIZE -= add;
5924 ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
5925 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5926 make_gap (require + 2000);
5927 GAP_SIZE += add;
5928 ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
5929 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
fb88bf2d 5930 }
d46c5b12 5931 }
fb88bf2d
KH
5932 if (src - dst > 0) *dst = 0; /* Put an anchor. */
5933
b73bfc1c
KH
5934 if (encodep && coding->dst_multibyte)
5935 {
5936 /* The output is unibyte. We must convert 8-bit characters to
5937 multibyte form. */
5938 if (inserted_byte * 2 > GAP_SIZE)
5939 {
5940 GAP_SIZE -= inserted_byte;
5941 ZV += inserted_byte; Z += inserted_byte;
5942 ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
5943 GPT += inserted_byte; GPT_BYTE += inserted_byte;
5944 make_gap (inserted_byte - GAP_SIZE);
5945 GAP_SIZE += inserted_byte;
5946 ZV -= inserted_byte; Z -= inserted_byte;
5947 ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
5948 GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
5949 }
5950 inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
5951 }
7553d0e1 5952
93dec019 5953 /* If we shrank the conversion area, adjust it now. */
12410ef1
KH
5954 if (total_skip > 0)
5955 {
5956 if (tail_skip > 0)
5957 safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
5958 inserted += total_skip; inserted_byte += total_skip;
5959 GAP_SIZE += total_skip;
5960 GPT -= head_skip; GPT_BYTE -= head_skip;
5961 ZV -= total_skip; ZV_BYTE -= total_skip;
5962 Z -= total_skip; Z_BYTE -= total_skip;
5963 from -= head_skip; from_byte -= head_skip;
5964 to += tail_skip; to_byte += tail_skip;
5965 }
5966
6abb9bd9 5967 prev_Z = Z;
72d1a715
RS
5968 if (! EQ (current_buffer->undo_list, Qt))
5969 adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
5970 else
5971 adjust_after_replace_noundo (from, from_byte, nchars_del, nbytes_del,
5972 inserted, inserted_byte);
6abb9bd9 5973 inserted = Z - prev_Z;
4ed46869 5974
ec6d2bb8
KH
5975 if (!encodep && coding->cmp_data && coding->cmp_data->used)
5976 coding_restore_composition (coding, Fcurrent_buffer ());
5977 coding_free_composition_data (coding);
5978
b73bfc1c
KH
5979 if (! inhibit_pre_post_conversion
5980 && ! encodep && ! NILP (coding->post_read_conversion))
d46c5b12 5981 {
2b4f9037 5982 Lisp_Object val;
1c7457e2 5983 Lisp_Object saved_coding_system;
4ed46869 5984
e133c8fa
KH
5985 if (from != PT)
5986 TEMP_SET_PT_BOTH (from, from_byte);
6abb9bd9 5987 prev_Z = Z;
1c7457e2 5988 record_unwind_protect (code_convert_region_unwind,
16ef9c56 5989 Fcons (Vlast_coding_system_used, Qnil));
1c7457e2
KH
5990 saved_coding_system = Vlast_coding_system_used;
5991 Vlast_coding_system_used = coding->symbol;
b843d1ae
KH
5992 /* We should not call any more pre-write/post-read-conversion
5993 functions while this post-read-conversion is running. */
5994 inhibit_pre_post_conversion = 1;
2b4f9037 5995 val = call1 (coding->post_read_conversion, make_number (inserted));
b843d1ae 5996 inhibit_pre_post_conversion = 0;
1c7457e2
KH
5997 coding->symbol = Vlast_coding_system_used;
5998 Vlast_coding_system_used = saved_coding_system;
b843d1ae
KH
5999 /* Discard the unwind protect. */
6000 specpdl_ptr--;
b7826503 6001 CHECK_NUMBER (val);
944bd420 6002 inserted += Z - prev_Z;
e133c8fa
KH
6003 }
6004
6005 if (orig_point >= from)
6006 {
6007 if (orig_point >= from + orig_len)
6008 orig_point += inserted - orig_len;
6009 else
6010 orig_point = from;
6011 TEMP_SET_PT (orig_point);
d46c5b12 6012 }
4ed46869 6013
ec6d2bb8
KH
6014 if (replace)
6015 {
6016 signal_after_change (from, to - from, inserted);
e19539f1 6017 update_compositions (from, from + inserted, CHECK_BORDER);
ec6d2bb8 6018 }
2b4f9037 6019
fb88bf2d 6020 {
12410ef1
KH
6021 coding->consumed = to_byte - from_byte;
6022 coding->consumed_char = to - from;
6023 coding->produced = inserted_byte;
6024 coding->produced_char = inserted;
fb88bf2d 6025 }
7553d0e1 6026
fb88bf2d 6027 return 0;
d46c5b12
KH
6028}
6029
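The inconsistent-EOL branch of code_convert_region turns LFs back into CRLFs in place by copying from the end of the converted text toward its start. The following is a minimal sketch of that backward-copy technique on a plain byte array. `demo_lf_to_crlf` and its CAPACITY parameter are hypothetical; the real code works inside the buffer gap and enlarges the gap with make_gap rather than failing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand every LF in BUF[0..LEN) to CRLF in place.  CAPACITY is the
   total number of bytes available in BUF.  Return the new length, or
   (size_t) -1 if the expanded text would not fit.  */
static size_t
demo_lf_to_crlf (unsigned char *buf, size_t len, size_t capacity)
{
  size_t count = 0, newlen, i;
  unsigned char *src, *dst;

  /* First pass: count the LFs to learn how far the tail must move.  */
  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      count++;
  if (len + count > capacity)
    return (size_t) -1;
  newlen = len + count;

  /* Second pass: copy backward so that no unread byte is overwritten.
     DST stays exactly COUNT bytes ahead of SRC; once COUNT drops to
     zero the remaining prefix is already in its final place.  */
  src = buf + len;
  dst = src + count;
  while (count > 0)
    {
      *--dst = *--src;
      if (*dst == '\n')
	{
	  *--dst = '\r';
	  count--;
	}
    }
  return newlen;
}
```

Copying from the end is what makes the expansion safe without a second buffer: each write lands at or beyond the position of the byte just read.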
2a47931b
KH
6030/* Name (or base name) of work buffer for code conversion. */
6031static Lisp_Object Vcode_conversion_workbuf_name;
6032
6033/* Set the current buffer to the working buffer prepared for
6034 code-conversion. MULTIBYTE specifies the multibyteness of the
16ef9c56
KH
6035 buffer. Return the buffer we set if it must be killed after use.
6036 Otherwise return Qnil. */
2a47931b 6037
16ef9c56 6038static Lisp_Object
2a47931b
KH
6039set_conversion_work_buffer (multibyte)
6040 int multibyte;
6041{
16ef9c56 6042 Lisp_Object buffer, buffer_to_kill;
2a47931b
KH
6043 struct buffer *buf;
6044
6045 buffer = Fget_buffer_create (Vcode_conversion_workbuf_name);
6046 buf = XBUFFER (buffer);
16ef9c56
KH
6047 if (buf == current_buffer)
6048 {
6049 /* As we are already in the work buffer, we must generate a new
6050 buffer for the work. */
6051 Lisp_Object name;
6052
6053 name = Fgenerate_new_buffer_name (Vcode_conversion_workbuf_name, Qnil);
6054 buffer = buffer_to_kill = Fget_buffer_create (name);
6055 buf = XBUFFER (buffer);
6056 }
6057 else
6058 buffer_to_kill = Qnil;
6059
2a47931b
KH
6060 delete_all_overlays (buf);
6061 buf->directory = current_buffer->directory;
6062 buf->read_only = Qnil;
6063 buf->filename = Qnil;
6064 buf->undo_list = Qt;
6065 eassert (buf->overlays_before == NULL);
6066 eassert (buf->overlays_after == NULL);
6067 set_buffer_internal (buf);
6068 if (BEG != BEGV || Z != ZV)
6069 Fwiden ();
6070 del_range_2 (BEG, BEG_BYTE, Z, Z_BYTE, 0);
6071 buf->enable_multibyte_characters = multibyte ? Qt : Qnil;
16ef9c56 6072 return buffer_to_kill;
2a47931b
KH
6073}
6074
d46c5b12 6075Lisp_Object
b73bfc1c
KH
6076run_pre_post_conversion_on_str (str, coding, encodep)
6077 Lisp_Object str;
6078 struct coding_system *coding;
6079 int encodep;
6080{
aed13378 6081 int count = SPECPDL_INDEX ();
cf3b32fc 6082 struct gcpro gcpro1, gcpro2;
b73bfc1c 6083 int multibyte = STRING_MULTIBYTE (str);
cf3b32fc 6084 Lisp_Object old_deactivate_mark;
16ef9c56 6085 Lisp_Object buffer_to_kill;
24a2b282 6086 Lisp_Object unwind_arg;
b73bfc1c
KH
6087
6088 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
cf3b32fc
RS
6089 /* It is not crucial to specbind this. */
6090 old_deactivate_mark = Vdeactivate_mark;
6091 GCPRO2 (str, old_deactivate_mark);
3fd9494b 6092
b73bfc1c
KH
6093 /* We must insert the contents of STR as is without
6094 unibyte<->multibyte conversion. For that, we adjust the
6095 multibyteness of the working buffer to that of STR. */
16ef9c56 6096 buffer_to_kill = set_conversion_work_buffer (multibyte);
24a2b282
KH
6097 if (NILP (buffer_to_kill))
6098 unwind_arg = Fcons (Vlast_coding_system_used, Qnil);
6099 else
6100 unwind_arg = list2 (Vlast_coding_system_used, buffer_to_kill);
6101 record_unwind_protect (code_convert_region_unwind, unwind_arg);
3fd9494b 6102
b73bfc1c 6103 insert_from_string (str, 0, 0,
d5db4077 6104 SCHARS (str), SBYTES (str), 0);
b73bfc1c
KH
6105 UNGCPRO;
6106 inhibit_pre_post_conversion = 1;
6107 if (encodep)
24a2b282
KH
6108 {
6109 struct buffer *prev = current_buffer;
6110
6111 call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
6112 if (prev != current_buffer)
6113 /* We must kill the current buffer too. */
6114 Fsetcdr (unwind_arg, Fcons (Fcurrent_buffer (), XCDR (unwind_arg)));
6115 }
b73bfc1c 6116 else
6bac5b12 6117 {
1c7457e2 6118 Vlast_coding_system_used = coding->symbol;
6bac5b12
KH
6119 TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
6120 call1 (coding->post_read_conversion, make_number (Z - BEG));
1c7457e2 6121 coding->symbol = Vlast_coding_system_used;
6bac5b12 6122 }
b73bfc1c 6123 inhibit_pre_post_conversion = 0;
cf3b32fc 6124 Vdeactivate_mark = old_deactivate_mark;
78108bcd 6125 str = make_buffer_string (BEG, Z, 1);
b73bfc1c
KH
6126 return unbind_to (count, str);
6127}
6128
2a47931b
KH
6129
6130/* Run pre-write-conversion function of CODING on NCHARS/NBYTES
6131 text in *STR. *SIZE is the number of bytes allocated for *STR. As it
6132 is intended that this function is called from encode_terminal_code,
6133 the pre-write-conversion function is run by safe_call and thus
6134 "Error during redisplay: ..." is logged when an error occurs.
6135
6136 Store the resulting text in *STR and set CODING->produced_char and
6137 CODING->produced to the number of characters and bytes
6138 respectively. If the size of *STR is too small, enlarge it by
6139 xrealloc and update *STR and *SIZE. */
6140
6141void
6142run_pre_write_conversin_on_c_str (str, size, nchars, nbytes, coding)
6143 unsigned char **str;
6144 int *size, nchars, nbytes;
6145 struct coding_system *coding;
6146{
6147 struct gcpro gcpro1, gcpro2;
6148 struct buffer *cur = current_buffer;
24a2b282 6149 struct buffer *prev;
2a47931b
KH
6150 Lisp_Object old_deactivate_mark, old_last_coding_system_used;
6151 Lisp_Object args[3];
16ef9c56 6152 Lisp_Object buffer_to_kill;
2a47931b
KH
6153
6154 /* It is not crucial to specbind this. */
6155 old_deactivate_mark = Vdeactivate_mark;
6156 old_last_coding_system_used = Vlast_coding_system_used;
6157 GCPRO2 (old_deactivate_mark, old_last_coding_system_used);
6158
6159 /* We must insert the contents of STR as is without
6160 unibyte<->multibyte conversion. For that, we adjust the
6161 multibyteness of the working buffer to that of STR. */
16ef9c56 6162 buffer_to_kill = set_conversion_work_buffer (coding->src_multibyte);
2a47931b
KH
6163 insert_1_both (*str, nchars, nbytes, 0, 0, 0);
6164 UNGCPRO;
6165 inhibit_pre_post_conversion = 1;
24a2b282 6166 prev = current_buffer;
2a47931b
KH
6167 args[0] = coding->pre_write_conversion;
6168 args[1] = make_number (BEG);
6169 args[2] = make_number (Z);
6170 safe_call (3, args);
6171 inhibit_pre_post_conversion = 0;
6172 Vdeactivate_mark = old_deactivate_mark;
6173 Vlast_coding_system_used = old_last_coding_system_used;
6174 coding->produced_char = Z - BEG;
6175 coding->produced = Z_BYTE - BEG_BYTE;
6176 if (coding->produced > *size)
6177 {
6178 *size = coding->produced;
6179 *str = xrealloc (*str, *size);
6180 }
6181 if (BEG < GPT && GPT < Z)
6182 move_gap (BEG);
6183 bcopy (BEG_ADDR, *str, coding->produced);
6184 coding->src_multibyte
6185 = ! NILP (current_buffer->enable_multibyte_characters);
24a2b282
KH
6186 if (prev != current_buffer)
6187 Fkill_buffer (Fcurrent_buffer ());
2a47931b 6188 set_buffer_internal (cur);
16ef9c56
KH
6189 if (! NILP (buffer_to_kill))
6190 Fkill_buffer (buffer_to_kill);
2a47931b
KH
6191}
6192
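The last step of run_pre_write_conversin_on_c_str, enlarging *STR when the converted text no longer fits and then copying it out, can be sketched on its own. This is an illustration under stated assumptions: `demo_store_result` is a hypothetical name, and plain realloc with an error return stands in for xrealloc, which in Emacs does not return on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy NBYTES bytes from SRC into *STR.  If the *SIZE bytes currently
   allocated for *STR are too few, enlarge the block and update both
   *STR and *SIZE.  Return 0 on success, -1 if the allocation fails
   (in which case *STR and *SIZE are left unchanged).  */
static int
demo_store_result (unsigned char **str, int *size,
		   const unsigned char *src, int nbytes)
{
  if (nbytes > *size)
    {
      unsigned char *grown = realloc (*str, (size_t) nbytes);

      if (grown == NULL)
	return -1;
      *str = grown;
      *size = nbytes;
    }
  memcpy (*str, src, (size_t) nbytes);
  return 0;
}
```

Passing both the pointer and its size by address lets the caller reuse the same block across calls, so it grows only when a conversion produces more bytes than any earlier one.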
6193
b73bfc1c
KH
6194Lisp_Object
6195decode_coding_string (str, coding, nocopy)
d46c5b12 6196 Lisp_Object str;
4ed46869 6197 struct coding_system *coding;
b73bfc1c 6198 int nocopy;
4ed46869 6199{
d46c5b12 6200 int len;
73be902c 6201 struct conversion_buffer buf;
da55a2b7 6202 int from, to_byte;
84d60297 6203 Lisp_Object saved_coding_symbol;
d46c5b12 6204 int result;
78108bcd 6205 int require_decoding;
73be902c
KH
6206 int shrinked_bytes = 0;
6207 Lisp_Object newstr;
2391eaa4 6208 int consumed, consumed_char, produced, produced_char;
4ed46869 6209
b73bfc1c 6210 from = 0;
d5db4077 6211 to_byte = SBYTES (str);
4ed46869 6212
8844fa83 6213 saved_coding_symbol = coding->symbol;
764ca8da
KH
6214 coding->src_multibyte = STRING_MULTIBYTE (str);
6215 coding->dst_multibyte = 1;
b73bfc1c 6216 if (CODING_REQUIRE_DETECTION (coding))
d46c5b12
KH
6217 {
6218 /* See the comments in code_convert_region. */
6219 if (coding->type == coding_type_undecided)
6220 {
d5db4077 6221 detect_coding (coding, SDATA (str), to_byte);
d46c5b12 6222 if (coding->type == coding_type_undecided)
d280ccb6
KH
6223 {
6224 coding->type = coding_type_emacs_mule;
6225 coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
6226 /* As emacs-mule decoder will handle composition, we
6227 need this setting to allocate coding->cmp_data
6228 later. */
6229 coding->composing = COMPOSITION_NO;
6230 }
d46c5b12 6231 }
aaaf0b1e
KH
6232 if (coding->eol_type == CODING_EOL_UNDECIDED
6233 && coding->type != coding_type_ccl)
d46c5b12
KH
6234 {
6235 saved_coding_symbol = coding->symbol;
d5db4077 6236 detect_eol (coding, SDATA (str), to_byte);
d46c5b12
KH
6237 if (coding->eol_type == CODING_EOL_UNDECIDED)
6238 coding->eol_type = CODING_EOL_LF;
6239 /* We had better recover the original eol format if we
8ca3766a 6240 encounter an inconsistent eol format while decoding. */
d46c5b12
KH
6241 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
6242 }
6243 }
4ed46869 6244
764ca8da
KH
6245 if (coding->type == coding_type_no_conversion
6246 || coding->type == coding_type_raw_text)
6247 coding->dst_multibyte = 0;
6248
78108bcd 6249 require_decoding = CODING_REQUIRE_DECODING (coding);
ec6d2bb8 6250
b73bfc1c 6251 if (STRING_MULTIBYTE (str))
d46c5b12 6252 {
b73bfc1c
KH
6253 /* Decoding routines expect the source text to be unibyte. */
6254 str = Fstring_as_unibyte (str);
d5db4077 6255 to_byte = SBYTES (str);
b73bfc1c 6256 nocopy = 1;
764ca8da 6257 coding->src_multibyte = 0;
b73bfc1c 6258 }
ec6d2bb8 6259
b73bfc1c 6260 /* Try to skip the leading and trailing ASCII characters. */
78108bcd 6261 if (require_decoding && coding->type != coding_type_ccl)
4956c225 6262 {
d5db4077 6263 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
4956c225
KH
6264 0);
6265 if (from == to_byte)
78108bcd 6266 require_decoding = 0;
d5db4077 6267 shrinked_bytes = from + (SBYTES (str) - to_byte);
4956c225 6268 }
b73bfc1c 6269
439ad9ea
KH
6270 if (!require_decoding
6271 && !(SYMBOLP (coding->post_read_conversion)
6272 && !NILP (Ffboundp (coding->post_read_conversion))))
78108bcd 6273 {
d5db4077
KR
6274 coding->consumed = SBYTES (str);
6275 coding->consumed_char = SCHARS (str);
78108bcd
KH
6276 if (coding->dst_multibyte)
6277 {
6278 str = Fstring_as_multibyte (str);
6279 nocopy = 1;
6280 }
d5db4077
KR
6281 coding->produced = SBYTES (str);
6282 coding->produced_char = SCHARS (str);
78108bcd
KH
6283 return (nocopy ? str : Fcopy_sequence (str));
6284 }
6285
6286 if (coding->composing != COMPOSITION_DISABLED)
6287 coding_allocate_composition_data (coding, from);
b73bfc1c 6288 len = decoding_buffer_size (coding, to_byte - from);
73be902c 6289 allocate_conversion_buffer (buf, len);
4ed46869 6290
2391eaa4 6291 consumed = consumed_char = produced = produced_char = 0;
73be902c 6292 while (1)
4ed46869 6293 {
d5db4077 6294 result = decode_coding (coding, SDATA (str) + from + consumed,
73be902c
KH
6295 buf.data + produced, to_byte - from - consumed,
6296 buf.size - produced);
6297 consumed += coding->consumed;
2391eaa4 6298 consumed_char += coding->consumed_char;
73be902c
KH
6299 produced += coding->produced;
6300 produced_char += coding->produced_char;
2391eaa4 6301 if (result == CODING_FINISH_NORMAL
c3912f23 6302 || result == CODING_FINISH_INTERRUPT
2391eaa4
KH
6303 || (result == CODING_FINISH_INSUFFICIENT_SRC
6304 && coding->consumed == 0))
73be902c
KH
6305 break;
6306 if (result == CODING_FINISH_INSUFFICIENT_CMP)
6307 coding_allocate_composition_data (coding, from + produced_char);
6308 else if (result == CODING_FINISH_INSUFFICIENT_DST)
6309 extend_conversion_buffer (&buf);
6310 else if (result == CODING_FINISH_INCONSISTENT_EOL)
6311 {
8844fa83
KH
6312 Lisp_Object eol_type;
6313
73be902c
KH
6314 /* Recover the original EOL format. */
6315 if (coding->eol_type == CODING_EOL_CR)
6316 {
6317 unsigned char *p;
6318 for (p = buf.data; p < buf.data + produced; p++)
6319 if (*p == '\n') *p = '\r';
6320 }
6321 else if (coding->eol_type == CODING_EOL_CRLF)
6322 {
6323 int num_eol = 0;
6324 unsigned char *p0, *p1;
6325 for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
6326 if (*p0 == '\n') num_eol++;
6327 if (produced + num_eol >= buf.size)
6328 extend_conversion_buffer (&buf);
6329 for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
6330 {
6331 *--p1 = *--p0;
6332 if (*p0 == '\n') *--p1 = '\r';
6333 }
6334 produced += num_eol;
6335 produced_char += num_eol;
93dec019 6336 }
8844fa83 6337 /* Suppress EOL-format conversion for the rest of the conversion. */
73be902c 6338 coding->eol_type = CODING_EOL_LF;
8844fa83
KH
6339
6340 /* Set the coding system symbol to that for Unix-like EOL. */
6341 eol_type = Fget (saved_coding_symbol, Qeol_type);
6342 if (VECTORP (eol_type)
6343 && XVECTOR (eol_type)->size == 3
6344 && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
6345 coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
6346 else
6347 coding->symbol = saved_coding_symbol;
6348
6349
73be902c 6350 }
4ed46869 6351 }
d46c5b12 6352
2391eaa4
KH
6353 coding->consumed = consumed;
6354 coding->consumed_char = consumed_char;
6355 coding->produced = produced;
6356 coding->produced_char = produced_char;
6357
78108bcd 6358 if (coding->dst_multibyte)
73be902c
KH
6359 newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
6360 produced + shrinked_bytes);
78108bcd 6361 else
73be902c
KH
6362 newstr = make_uninit_string (produced + shrinked_bytes);
6363 if (from > 0)
a4244313
KR
6364 STRING_COPYIN (newstr, 0, SDATA (str), from);
6365 STRING_COPYIN (newstr, from, buf.data, produced);
73be902c 6366 if (shrinked_bytes > from)
a4244313
KR
6367 STRING_COPYIN (newstr, from + produced,
6368 SDATA (str) + to_byte,
6369 shrinked_bytes - from);
73be902c 6370 free_conversion_buffer (&buf);
b73bfc1c 6371
160a708c
KH
6372 coding->consumed += shrinked_bytes;
6373 coding->consumed_char += shrinked_bytes;
6374 coding->produced += shrinked_bytes;
6375 coding->produced_char += shrinked_bytes;
6376
b73bfc1c 6377 if (coding->cmp_data && coding->cmp_data->used)
73be902c 6378 coding_restore_composition (coding, newstr);
b73bfc1c
KH
6379 coding_free_composition_data (coding);
6380
6381 if (SYMBOLP (coding->post_read_conversion)
6382 && !NILP (Ffboundp (coding->post_read_conversion)))
73be902c 6383 newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
b73bfc1c 6384
73be902c 6385 return newstr;
b73bfc1c
KH
6386}
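The CRLF branch of the EOL recovery above rewrites the buffer in place, copying backward so that no unread byte is overwritten. A minimal standalone sketch of that technique follows. The function name `expand_lf_to_crlf`, the explicit capacity argument, and the `(size_t) -1` failure value are assumptions made for illustration and are not part of coding.c.

```c
#include <assert.h>
#include <stddef.h>

/* Expand every '\n' in BUF[0..LEN) to "\r\n" in place.
   CAP is the capacity of BUF.  Return the new length, or
   (size_t) -1 if the expanded text would not fit (a hypothetical
   error convention; the original grows the buffer instead).  */
size_t
expand_lf_to_crlf (unsigned char *buf, size_t len, size_t cap)
{
  size_t num_eol = 0;
  size_t i;
  unsigned char *src, *dst;

  /* First pass: count the newlines to learn the final length.  */
  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      num_eol++;
  if (len + num_eol > cap)
    return (size_t) -1;

  /* Second pass: copy from the end toward the start.  DST never
     falls behind SRC, so unread input is never clobbered.  */
  src = buf + len;
  dst = src + num_eol;
  while (src > buf)
    {
      *--dst = *--src;
      if (*src == '\n')
        *--dst = '\r';
    }
  assert (dst == buf);
  return len + num_eol;
}
```

Counting first and then copying from the end is what makes a single buffer sufficient; a forward copy would need a second buffer or repeated shifting.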
6387
6388Lisp_Object
6389encode_coding_string (str, coding, nocopy)
6390 Lisp_Object str;
6391 struct coding_system *coding;
6392 int nocopy;
6393{
6394 int len;
73be902c 6395 struct conversion_buffer buf;
b73bfc1c 6396 int from, to, to_byte;
b73bfc1c 6397 int result;
73be902c
KH
6398 int shrinked_bytes = 0;
6399 Lisp_Object newstr;
2391eaa4 6400 int consumed, consumed_char, produced, produced_char;
b73bfc1c
KH
6401
6402 if (SYMBOLP (coding->pre_write_conversion)
6403 && !NILP (Ffboundp (coding->pre_write_conversion)))
3bb917bf
KH
6404 {
6405 str = run_pre_post_conversion_on_str (str, coding, 1);
6406 /* As STR is just newly generated, we don't have to copy it
6407 anymore. */
6408 nocopy = 1;
6409 }
b73bfc1c
KH
6410
6411 from = 0;
d5db4077
KR
6412 to = SCHARS (str);
6413 to_byte = SBYTES (str);
b73bfc1c 6414
e2c06b17
KH
6415 /* Encoding routines determine the multibyteness of the source text
6416 by coding->src_multibyte. */
3bb917bf 6417 coding->src_multibyte = SCHARS (str) < SBYTES (str);
e2c06b17 6418 coding->dst_multibyte = 0;
b73bfc1c 6419 if (! CODING_REQUIRE_ENCODING (coding))
3bb917bf 6420 goto no_need_of_encoding;
826bfb8b 6421
b73bfc1c
KH
6422 if (coding->composing != COMPOSITION_DISABLED)
6423 coding_save_composition (coding, from, to, str);
ec6d2bb8 6424
ce559e6f
KH
6425 /* Try to skip the leading and trailing ASCII characters. We can't skip them
6426 if we must run a CCL program or there are compositions to
6427 encode. */
6428 if (coding->type != coding_type_ccl
6429 && (! coding->cmp_data || coding->cmp_data->used == 0))
4956c225 6430 {
d5db4077 6431 SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
4956c225
KH
6432 1);
6433 if (from == to_byte)
ce559e6f
KH
6434 {
6435 coding_free_composition_data (coding);
3bb917bf 6436 goto no_need_of_encoding;
ce559e6f 6437 }
d5db4077 6438 shrinked_bytes = from + (SBYTES (str) - to_byte);
4956c225 6439 }
b73bfc1c
KH
6440
6441 len = encoding_buffer_size (coding, to_byte - from);
73be902c
KH
6442 allocate_conversion_buffer (buf, len);
6443
2391eaa4 6444 consumed = consumed_char = produced = produced_char = 0;
73be902c
KH
6445 while (1)
6446 {
d5db4077 6447 result = encode_coding (coding, SDATA (str) + from + consumed,
73be902c
KH
6448 buf.data + produced, to_byte - from - consumed,
6449 buf.size - produced);
6450 consumed += coding->consumed;
2391eaa4 6451 consumed_char += coding->consumed_char;
13004bef 6452 produced += coding->produced;
2391eaa4
KH
6453 produced_char += coding->produced_char;
6454 if (result == CODING_FINISH_NORMAL
230779b9 6455 || result == CODING_FINISH_INTERRUPT
2391eaa4
KH
6456 || (result == CODING_FINISH_INSUFFICIENT_SRC
6457 && coding->consumed == 0))
73be902c
KH
6458 break;
6459 /* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
6460 extend_conversion_buffer (&buf);
6461 }
6462
2391eaa4
KH
6463 coding->consumed = consumed;
6464 coding->consumed_char = consumed_char;
6465 coding->produced = produced;
6466 coding->produced_char = produced_char;
6467
73be902c 6468 newstr = make_uninit_string (produced + shrinked_bytes);
b73bfc1c 6469 if (from > 0)
a4244313
KR
6470 STRING_COPYIN (newstr, 0, SDATA (str), from);
6471 STRING_COPYIN (newstr, from, buf.data, produced);
73be902c 6472 if (shrinked_bytes > from)
a4244313
KR
6473 STRING_COPYIN (newstr, from + produced,
6474 SDATA (str) + to_byte,
6475 shrinked_bytes - from);
73be902c
KH
6476
6477 free_conversion_buffer (&buf);
ec6d2bb8 6478 coding_free_composition_data (coding);
b73bfc1c 6479
73be902c 6480 return newstr;
3bb917bf
KH
6481
6482 no_need_of_encoding:
6483 coding->consumed = SBYTES (str);
6484 coding->consumed_char = SCHARS (str);
6485 if (STRING_MULTIBYTE (str))
6486 {
6487 if (nocopy)
6488 /* We are sure that STR doesn't contain a multibyte
6489 character. */
6490 STRING_SET_UNIBYTE (str);
6491 else
6492 {
6493 str = Fstring_as_unibyte (str);
6494 nocopy = 1;
6495 }
6496 }
6497 coding->produced = SBYTES (str);
6498 coding->produced_char = SCHARS (str);
6499 return (nocopy ? str : Fcopy_sequence (str));
4ed46869
KH
6500}
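Both string converters narrow the region before converting and later copy the skipped prefix and suffix back unchanged (the `shrinked_bytes` bookkeeping). The sketch below shows only the narrowing step, under the simplifying assumption that every byte below 0x80 may be skipped. The real `SHRINK_CONVERSION_REGION` macro takes the coding system as an argument and may well be more conservative. The name `skip_ascii_ends` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Narrow the byte range [*FROM, *TO) of SRC so that it excludes
   leading and trailing ASCII bytes.  Return the total number of
   bytes skipped, which corresponds to `shrinked_bytes' in the
   functions above when *FROM starts at 0.  */
size_t
skip_ascii_ends (const unsigned char *src, size_t *from, size_t *to)
{
  size_t orig_len;

  assert (*from <= *to);
  orig_len = *to - *from;

  /* Skip the leading ASCII run.  */
  while (*from < *to && src[*from] < 0x80)
    ++*from;
  /* Skip the trailing ASCII run.  */
  while (*to > *from && src[*to - 1] < 0x80)
    --*to;

  return orig_len - (*to - *from);
}
```

When the two ends meet (`*from == *to`), the text was pure ASCII and the caller can return without converting anything, which is the `from == to_byte` test in the code above.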
6501
6502\f
6503#ifdef emacs
1397dc18 6504/*** 8. Emacs Lisp library functions ***/
4ed46869 6505
4ed46869 6506DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
48b0f3ae
PJ
6507 doc: /* Return t if OBJECT is nil or a coding-system.
6508See the documentation of `make-coding-system' for information
6509about coding-system objects. */)
6510 (obj)
4ed46869
KH
6511 Lisp_Object obj;
6512{
4608c386
KH
6513 if (NILP (obj))
6514 return Qt;
6515 if (!SYMBOLP (obj))
6516 return Qnil;
c2164d91
KH
6517 if (! NILP (Fget (obj, Qcoding_system_define_form)))
6518 return Qt;
4608c386
KH
6519 /* Get coding-spec vector for OBJ. */
6520 obj = Fget (obj, Qcoding_system);
6521 return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
6522 ? Qt : Qnil);
4ed46869
KH
6523}
6524
9d991de8
RS
6525DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
6526 Sread_non_nil_coding_system, 1, 1, 0,
48b0f3ae
PJ
6527 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT. */)
6528 (prompt)
4ed46869
KH
6529 Lisp_Object prompt;
6530{
e0e989f6 6531 Lisp_Object val;
9d991de8
RS
6532 do
6533 {
4608c386
KH
6534 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
6535 Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
9d991de8 6536 }
d5db4077 6537 while (SCHARS (val) == 0);
e0e989f6 6538 return (Fintern (val, Qnil));
4ed46869
KH
6539}
6540
9b787f3e 6541DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
48b0f3ae
PJ
6542 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.
6543If the user enters null input, return second argument DEFAULT-CODING-SYSTEM. */)
6544 (prompt, default_coding_system)
9b787f3e 6545 Lisp_Object prompt, default_coding_system;
4ed46869 6546{
f44d27ce 6547 Lisp_Object val;
9b787f3e 6548 if (SYMBOLP (default_coding_system))
57d25e6f 6549 default_coding_system = SYMBOL_NAME (default_coding_system);
4608c386 6550 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
9b787f3e
RS
6551 Qt, Qnil, Qcoding_system_history,
6552 default_coding_system, Qnil);
d5db4077 6553 return (SCHARS (val) == 0 ? Qnil : Fintern (val, Qnil));
4ed46869
KH
6554}
6555
6556DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
6557 1, 1, 0,
48b0f3ae
PJ
6558 doc: /* Check validity of CODING-SYSTEM.
6559If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
303cdc2d 6560It is valid if it is nil or a symbol with a non-nil `coding-system' property.
de1d1a40 6561The value of this property should be a vector of length 5. */)
48b0f3ae 6562 (coding_system)
4ed46869
KH
6563 Lisp_Object coding_system;
6564{
a362520d
KH
6565 Lisp_Object define_form;
6566
6567 define_form = Fget (coding_system, Qcoding_system_define_form);
6568 if (! NILP (define_form))
6569 {
6570 Fput (coding_system, Qcoding_system_define_form, Qnil);
6571 safe_eval (define_form);
6572 }
4ed46869
KH
6573 if (!NILP (Fcoding_system_p (coding_system)))
6574 return coding_system;
6575 while (1)
02ba4723 6576 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
4ed46869 6577}
3a73fa5d 6578\f
d46c5b12 6579Lisp_Object
0a28aafb 6580detect_coding_system (src, src_bytes, highest, multibytep)
a4244313 6581 const unsigned char *src;
d46c5b12 6582 int src_bytes, highest;
0a28aafb 6583 int multibytep;
4ed46869
KH
6584{
6585 int coding_mask, eol_type;
d46c5b12
KH
6586 Lisp_Object val, tmp;
6587 int dummy;
4ed46869 6588
0a28aafb 6589 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
d46c5b12
KH
6590 eol_type = detect_eol_type (src, src_bytes, &dummy);
6591 if (eol_type == CODING_EOL_INCONSISTENT)
25b02698 6592 eol_type = CODING_EOL_UNDECIDED;
4ed46869 6593
d46c5b12 6594 if (!coding_mask)
4ed46869 6595 {
27901516 6596 val = Qundecided;
d46c5b12 6597 if (eol_type != CODING_EOL_UNDECIDED)
4ed46869 6598 {
f44d27ce
RS
6599 Lisp_Object val2;
6600 val2 = Fget (Qundecided, Qeol_type);
4ed46869
KH
6601 if (VECTORP (val2))
6602 val = XVECTOR (val2)->contents[eol_type];
6603 }
80e803b4 6604 return (highest ? val : Fcons (val, Qnil));
4ed46869 6605 }
4ed46869 6606
d46c5b12
KH
6607 /* First, gather possible coding systems in VAL. */
6608 val = Qnil;
fa42c37f 6609 for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 6610 {
fa42c37f
KH
6611 Lisp_Object category_val, category_index;
6612
6613 category_index = Fget (XCAR (tmp), Qcoding_category_index);
6614 category_val = Fsymbol_value (XCAR (tmp));
6615 if (!NILP (category_val)
6616 && NATNUMP (category_index)
6617 && (coding_mask & (1 << XFASTINT (category_index))))
4ed46869 6618 {
fa42c37f 6619 val = Fcons (category_val, val);
d46c5b12
KH
6620 if (highest)
6621 break;
4ed46869
KH
6622 }
6623 }
d46c5b12
KH
6624 if (!highest)
6625 val = Fnreverse (val);
4ed46869 6626
65059037 6627 /* Then, replace the elements with subsidiary coding systems. */
fa42c37f 6628 for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
4ed46869 6629 {
65059037
RS
6630 if (eol_type != CODING_EOL_UNDECIDED
6631 && eol_type != CODING_EOL_INCONSISTENT)
4ed46869 6632 {
d46c5b12 6633 Lisp_Object eol;
03699b14 6634 eol = Fget (XCAR (tmp), Qeol_type);
d46c5b12 6635 if (VECTORP (eol))
f3fbd155 6636 XSETCAR (tmp, XVECTOR (eol)->contents[eol_type]);
4ed46869
KH
6637 }
6638 }
03699b14 6639 return (highest ? XCAR (val) : val);
93dec019 6640}
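The gathering loop in detect_coding_system walks the category list in priority order and keeps each category whose bit is set in the detected mask, stopping at the first hit when only the highest-priority answer is wanted. A minimal sketch of that selection, using plain integer indices instead of Lisp symbols, could look like this. The name `gather_categories` and the array-based interface are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store in OUT the entries of PRIORITY (highest priority first)
   whose bit is set in MASK.  If HIGHEST is nonzero, stop after
   the first match.  Return the number of entries stored.  OUT
   must have room for N entries.  */
size_t
gather_categories (unsigned mask, const int *priority, size_t n,
                   int highest, int *out)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      int idx = priority[i];

      assert (idx >= 0 && idx < (int) (sizeof mask * CHAR_BIT));
      if (mask & (1u << idx))
        {
          out[count++] = idx;
          if (highest)
            break;
        }
    }
  return count;
}
```

The original conses each match onto the front of a list and reverses it afterwards; appending to an array gives the same priority order without the reversal.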
4ed46869 6641
d46c5b12
KH
6642DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
6643 2, 3, 0,
40fd536c
KH
6644 doc: /* Detect how the byte sequence in the region is encoded.
6645Return a list of possible coding systems used on decoding a byte
6646sequence containing the bytes in the region between START and END when
6647the coding system `undecided' is specified. The list is ordered by
6648priority decided in the current language environment.
48b0f3ae
PJ
6649
6650If only ASCII characters are found, it returns a list of single element
6651`undecided' or its subsidiary coding system according to a detected
6652end-of-line format.
6653
6654If optional argument HIGHEST is non-nil, return the coding system of
6655highest priority. */)
6656 (start, end, highest)
d46c5b12
KH
6657 Lisp_Object start, end, highest;
6658{
6659 int from, to;
6660 int from_byte, to_byte;
682169fe 6661 int include_anchor_byte = 0;
6289dd10 6662
b7826503
PJ
6663 CHECK_NUMBER_COERCE_MARKER (start);
6664 CHECK_NUMBER_COERCE_MARKER (end);
4ed46869 6665
d46c5b12
KH
6666 validate_region (&start, &end);
6667 from = XINT (start), to = XINT (end);
6668 from_byte = CHAR_TO_BYTE (from);
6669 to_byte = CHAR_TO_BYTE (to);
6289dd10 6670
d46c5b12
KH
6671 if (from < GPT && to >= GPT)
6672 move_gap_both (to, to_byte);
c210f766
KH
6673 /* If an anchor byte `\0' follows the region, we include it in
6674 the detecting source. Then code detectors can handle the trailing
6675 byte sequence more accurately.
6676
7d0393cf 6677 Fix me: This is not a perfect solution. It is better that we
c210f766
KH
6678 add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
6679 */
682169fe
KH
6680 if (to == Z || (to == GPT && GAP_SIZE > 0))
6681 include_anchor_byte = 1;
d46c5b12 6682 return detect_coding_system (BYTE_POS_ADDR (from_byte),
682169fe 6683 to_byte - from_byte + include_anchor_byte,
0a28aafb
KH
6684 !NILP (highest),
6685 !NILP (current_buffer
6686 ->enable_multibyte_characters));
d46c5b12 6687}
6289dd10 6688
d46c5b12
KH
6689DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
6690 1, 2, 0,
eec1f3c7
KH
6691 doc: /* Detect how the byte sequence in STRING is encoded.
6692Return a list of possible coding systems used on decoding a byte
6693sequence containing the bytes in STRING when the coding system
6694`undecided' is specified. The list is ordered by priority decided in
6695the current language environment.
48b0f3ae
PJ
6696
6697If only ASCII characters are found, it returns a list of single element
6698`undecided' or its subsidiary coding system according to a detected
6699end-of-line format.
6700
6701If optional argument HIGHEST is non-nil, return the coding system of
6702highest priority. */)
6703 (string, highest)
d46c5b12
KH
6704 Lisp_Object string, highest;
6705{
b7826503 6706 CHECK_STRING (string);
4ed46869 6707
d5db4077 6708 return detect_coding_system (SDATA (string),
682169fe
KH
6709 /* "+ 1" is to include the anchor byte
6710 `\0'. With this, code detectors can
c210f766
KH
6711 handle the trailing bytes more
6712 accurately. */
d5db4077 6713 SBYTES (string) + 1,
0a28aafb
KH
6714 !NILP (highest),
6715 STRING_MULTIBYTE (string));
4ed46869
KH
6716}
6717
d12168d6 6718/* Subroutine for Ffind_coding_systems_region_internal.
05e6f5dc
KH
6719
6720 Return a list of coding systems that safely encode the multibyte
b666620c 6721 text between P and PEND. SAFE_CODINGS, if non-nil, is an alist of
05e6f5dc
KH
6722 possible coding systems. If it is nil, it means that we have not
6723 yet found any coding systems.
6724
12d5b185
KH
6725 WORK_TABLE is a char-table whose elements are set to t once they
6726 have been looked up.
05e6f5dc
KH
6727
6728 If a non-ASCII single byte char is found, set
6729 *single_byte_char_found to 1. */
6730
6731static Lisp_Object
6732find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
6733 unsigned char *p, *pend;
6734 Lisp_Object safe_codings, work_table;
6735 int *single_byte_char_found;
6b89e3aa 6736{
f1ce3dcf 6737 int c, len;
6b89e3aa
KH
6738 Lisp_Object val, ch;
6739 Lisp_Object prev, tail;
177c0ea7 6740
12d5b185
KH
6741 if (NILP (safe_codings))
6742 goto done_safe_codings;
6b89e3aa
KH
6743 while (p < pend)
6744 {
6745 c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
6746 p += len;
6747 if (ASCII_BYTE_P (c))
6748 /* We can ignore ASCII characters here. */
6749 continue;
6750 if (SINGLE_BYTE_CHAR_P (c))
6751 *single_byte_char_found = 1;
6b89e3aa
KH
6752 /* Check the safe coding systems for C. */
6753 ch = make_number (c);
6754 val = Faref (work_table, ch);
6755 if (EQ (val, Qt))
6756 /* This element was already checked. Ignore it. */
6757 continue;
6758 /* Remember that we checked this element. */
6759 Faset (work_table, ch, Qt);
6760
6761 for (prev = tail = safe_codings; CONSP (tail); tail = XCDR (tail))
6762 {
b666620c
KH
6763 Lisp_Object elt, translation_table, hash_table, accept_latin_extra;
6764 int encodable;
6765
6766 elt = XCAR (tail);
6767 if (CONSP (XCDR (elt)))
6768 {
6769 /* This entry has this format now:
6770 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6771 ACCEPT-LATIN-EXTRA ) */
6772 val = XCDR (elt);
6773 encodable = ! NILP (Faref (XCAR (val), ch));
6774 if (! encodable)
6775 {
6776 val = XCDR (val);
6777 translation_table = XCAR (val);
6778 hash_table = XCAR (XCDR (val));
6779 accept_latin_extra = XCAR (XCDR (XCDR (val)));
6780 }
6781 }
6782 else
6783 {
6784 /* This entry has this format now: ( CODING . SAFE-CHARS) */
6785 encodable = ! NILP (Faref (XCDR (elt), ch));
6786 if (! encodable)
6787 {
6788 /* Transform the format to:
6789 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6790 ACCEPT-LATIN-EXTRA ) */
6791 val = Fget (XCAR (elt), Qcoding_system);
6792 translation_table
6793 = Fplist_get (AREF (val, 3),
6794 Qtranslation_table_for_encode);
6795 if (SYMBOLP (translation_table))
6796 translation_table = Fget (translation_table,
6797 Qtranslation_table);
6798 hash_table
6799 = (CHAR_TABLE_P (translation_table)
6800 ? XCHAR_TABLE (translation_table)->extras[1]
6801 : Qnil);
6802 accept_latin_extra
6803 = ((EQ (AREF (val, 0), make_number (2))
6804 && VECTORP (AREF (val, 4)))
58f99379 6805 ? AREF (AREF (val, 4), 16)
b666620c
KH
6806 : Qnil);
6807 XSETCAR (tail, list5 (XCAR (elt), XCDR (elt),
6808 translation_table, hash_table,
6809 accept_latin_extra));
6810 }
6811 }
43e4a82f 6812
b666620c
KH
6813 if (! encodable
6814 && ((CHAR_TABLE_P (translation_table)
6815 && ! NILP (Faref (translation_table, ch)))
6816 || (HASH_TABLE_P (hash_table)
6817 && ! NILP (Fgethash (ch, hash_table, Qnil)))
6818 || (SINGLE_BYTE_CHAR_P (c)
6819 && ! NILP (accept_latin_extra)
6820 && VECTORP (Vlatin_extra_code_table)
6821 && ! NILP (AREF (Vlatin_extra_code_table, c)))))
6822 encodable = 1;
6823 if (encodable)
6824 prev = tail;
6825 else
6b89e3aa 6826 {
7c695ab9 6827 /* Exclude this coding system from SAFE_CODINGS. */
6b89e3aa 6828 if (EQ (tail, safe_codings))
12d5b185
KH
6829 {
6830 safe_codings = XCDR (safe_codings);
6831 if (NILP (safe_codings))
6832 goto done_safe_codings;
6833 }
6b89e3aa
KH
6834 else
6835 XSETCDR (prev, XCDR (tail));
6836 }
6b89e3aa
KH
6837 }
6838 }
12d5b185
KH
6839
6840 done_safe_codings:
6841 /* If the above loop was terminated before P reaches PEND, it means
6842 SAFE_CODINGS was set to nil. If we have not yet found a
6843 non-ASCII single-byte char, check it now. */
6844 if (! *single_byte_char_found)
6845 while (p < pend)
6846 {
6847 c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
6848 p += len;
6849 if (! ASCII_BYTE_P (c)
6850 && SINGLE_BYTE_CHAR_P (c))
6851 {
6852 *single_byte_char_found = 1;
6853 break;
6854 }
6855 }
6b89e3aa
KH
6856 return safe_codings;
6857}
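The inner loop of find_safe_codings removes failing entries from a singly linked list while walking it, tracking the last kept cell in `prev` and treating removal of the head as a special case. The same pattern on a plain C list might be sketched as follows. `struct node`, `prune_list`, and the example predicate `is_even` are hypothetical names, and the sketch does not free the unlinked nodes.

```c
#include <assert.h>
#include <stddef.h>

struct node
{
  int value;
  struct node *next;
};

/* Example predicate: keep even values only.  */
int
is_even (int v)
{
  return v % 2 == 0;
}

/* Unlink every node whose value KEEP rejects and return the new
   head.  PREV is the most recent node that was kept; while it is
   NULL, the node being examined is still the head of the list.  */
struct node *
prune_list (struct node *head, int (*keep) (int))
{
  struct node *prev = NULL;
  struct node *tail;

  for (tail = head; tail != NULL; tail = tail->next)
    {
      if (keep (tail->value))
        prev = tail;
      else if (prev == NULL)
        head = tail->next;        /* dropping the current head */
      else
        prev->next = tail->next;  /* splice around TAIL */
    }
  return head;
}
```

Reading `tail->next` after unlinking is safe here because the removed node keeps its own link, just as the removed cons cell does in the original.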
6858
067a6a66
KH
6859DEFUN ("find-coding-systems-region-internal",
6860 Ffind_coding_systems_region_internal,
6861 Sfind_coding_systems_region_internal, 2, 2, 0,
6b89e3aa
KH
6862 doc: /* Internal use only. */)
6863 (start, end)
6864 Lisp_Object start, end;
6865{
6866 Lisp_Object work_table, safe_codings;
6867 int non_ascii_p = 0;
6868 int single_byte_char_found = 0;
6869 const unsigned char *p1, *p1end, *p2, *p2end, *p;
6870
6871 if (STRINGP (start))
6872 {
6873 if (!STRING_MULTIBYTE (start))
6874 return Qt;
6875 p1 = SDATA (start), p1end = p1 + SBYTES (start);
6876 p2 = p2end = p1end;
6877 if (SCHARS (start) != SBYTES (start))
6878 non_ascii_p = 1;
6879 }
6880 else
6881 {
6882 int from, to, stop;
6883
6884 CHECK_NUMBER_COERCE_MARKER (start);
6885 CHECK_NUMBER_COERCE_MARKER (end);
6886 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
6887 args_out_of_range (start, end);
6888 if (NILP (current_buffer->enable_multibyte_characters))
6889 return Qt;
6890 from = CHAR_TO_BYTE (XINT (start));
6891 to = CHAR_TO_BYTE (XINT (end));
6892 stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
6893 p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
6894 if (stop == to)
6895 p2 = p2end = p1end;
6896 else
6897 p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
6898 if (XINT (end) - XINT (start) != to - from)
6899 non_ascii_p = 1;
6900 }
6901
6902 if (!non_ascii_p)
6903 {
6904 /* We are sure that the text contains no multibyte character.
6905 Check if it contains eight-bit-graphic. */
6906 p = p1;
6907 for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
6908 if (p == p1end)
6909 {
6910 for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
6911 if (p == p2end)
6912 return Qt;
6913 }
6914 }
6915
6916 /* The text contains non-ASCII characters. */
6917
6918 work_table = Fmake_char_table (Qchar_coding_system, Qnil);
6919 safe_codings = Fcopy_sequence (XCDR (Vcoding_system_safe_chars));
6920
067a6a66
KH
6921 safe_codings = find_safe_codings (p1, p1end, safe_codings, work_table,
6922 &single_byte_char_found);
6b89e3aa 6923 if (p2 < p2end)
067a6a66
KH
6924 safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
6925 &single_byte_char_found);
6b89e3aa
KH
6926 if (EQ (safe_codings, XCDR (Vcoding_system_safe_chars)))
6927 safe_codings = Qt;
6928 else
6929 {
6930 /* Turn safe_codings to a list of coding systems... */
6931 Lisp_Object val;
6932
6933 if (single_byte_char_found)
6934 /* ... and append these for eight-bit chars. */
6935 val = Fcons (Qraw_text,
6936 Fcons (Qemacs_mule, Fcons (Qno_conversion, Qnil)));
6937 else
6938 /* ... and append generic coding systems. */
6939 val = Fcopy_sequence (XCAR (Vcoding_system_safe_chars));
177c0ea7 6940
6b89e3aa
KH
6941 for (; CONSP (safe_codings); safe_codings = XCDR (safe_codings))
6942 val = Fcons (XCAR (XCAR (safe_codings)), val);
6943 safe_codings = val;
6944 }
6945
6946 return safe_codings;
6947}
6948
6949
068a9dbd
KH
6950/* Search from position POS for such characters that are unencodable
6951 according to SAFE_CHARS, and return a list of their positions. P
6952 points to where the character at POS is stored in memory. Stop the
6953 search at PEND or once N unencodable characters have been found.
6954
6955 If SAFE_CHARS is a char table, an element for an unencodable
6956 character is nil.
6957
6958 If SAFE_CHARS is nil, all non-ASCII characters are unencodable.
6959
6960 Otherwise, SAFE_CHARS is t, and only eight-bit-control and
6961 eight-bit-graphic characters are unencodable. */
6962
6963static Lisp_Object
6964unencodable_char_position (safe_chars, pos, p, pend, n)
6965 Lisp_Object safe_chars;
6966 int pos;
6967 unsigned char *p, *pend;
6968 int n;
6969{
6970 Lisp_Object pos_list;
6971
6972 pos_list = Qnil;
6973 while (p < pend)
6974 {
6975 int len;
6976 int c = STRING_CHAR_AND_LENGTH (p, MAX_MULTIBYTE_LENGTH, len);
7d0393cf 6977
068a9dbd
KH
6978 if (c >= 128
6979 && (CHAR_TABLE_P (safe_chars)
6980 ? NILP (CHAR_TABLE_REF (safe_chars, c))
6981 : (NILP (safe_chars) || c < 256)))
6982 {
6983 pos_list = Fcons (make_number (pos), pos_list);
6984 if (--n <= 0)
6985 break;
6986 }
6987 pos++;
6988 p += len;
6989 }
6990 return Fnreverse (pos_list);
6991}
6992
6993
6994DEFUN ("unencodable-char-position", Funencodable_char_position,
6995 Sunencodable_char_position, 3, 5, 0,
6996 doc: /*
6997Return position of first un-encodable character in a region.
6998START and END specify the region and CODING-SYSTEM specifies the
6999encoding to check. Return nil if CODING-SYSTEM does encode the region.
7000
7001If optional 4th argument COUNT is non-nil, it specifies the maximum number
7002of un-encodable characters to search for. In this case, the value is a
7003list of positions.
7004
7005If optional 5th argument STRING is non-nil, it is a string to search
7006for un-encodable characters. In that case, START and END are indexes
7007to the string. */)
7008 (start, end, coding_system, count, string)
7009 Lisp_Object start, end, coding_system, count, string;
7010{
7011 int n;
7012 Lisp_Object safe_chars;
7013 struct coding_system coding;
7014 Lisp_Object positions;
7015 int from, to;
7016 unsigned char *p, *pend;
7017
7018 if (NILP (string))
7019 {
7020 validate_region (&start, &end);
7021 from = XINT (start);
7022 to = XINT (end);
7023 if (NILP (current_buffer->enable_multibyte_characters))
7024 return Qnil;
7025 p = CHAR_POS_ADDR (from);
200c93e2
KH
7026 if (to == GPT)
7027 pend = GPT_ADDR;
7028 else
7029 pend = CHAR_POS_ADDR (to);
068a9dbd
KH
7030 }
7031 else
7032 {
7033 CHECK_STRING (string);
7034 CHECK_NATNUM (start);
7035 CHECK_NATNUM (end);
7036 from = XINT (start);
7037 to = XINT (end);
7038 if (from > to
7039 || to > SCHARS (string))
7040 args_out_of_range_3 (string, start, end);
7041 if (! STRING_MULTIBYTE (string))
7042 return Qnil;
7043 p = SDATA (string) + string_char_to_byte (string, from);
7044 pend = SDATA (string) + string_char_to_byte (string, to);
7045 }
7046
7047 setup_coding_system (Fcheck_coding_system (coding_system), &coding);
7048
7049 if (NILP (count))
7050 n = 1;
7051 else
7052 {
7053 CHECK_NATNUM (count);
7054 n = XINT (count);
7055 }
7056
7057 if (coding.type == coding_type_no_conversion
7058 || coding.type == coding_type_raw_text)
7059 return Qnil;
7060
7061 if (coding.type == coding_type_undecided)
7062 safe_chars = Qnil;
7063 else
6b89e3aa 7064 safe_chars = coding_safe_chars (coding_system);
068a9dbd
KH
7065
7066 if (STRINGP (string)
7067 || from >= GPT || to <= GPT)
7068 positions = unencodable_char_position (safe_chars, from, p, pend, n);
7069 else
7070 {
7071 Lisp_Object args[2];
7072
7073 args[0] = unencodable_char_position (safe_chars, from, p, GPT_ADDR, n);
96d2e64d 7074 n -= XINT (Flength (args[0]));
068a9dbd
KH
7075 if (n <= 0)
7076 positions = args[0];
7077 else
7078 {
7079 args[1] = unencodable_char_position (safe_chars, GPT, GAP_END_ADDR,
7080 pend, n);
7081 positions = Fappend (2, args);
7082 }
7083 }
7084
7085 return (NILP (count) ? Fcar (positions) : positions);
7086}
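When the region straddles the buffer gap, the function above searches the part before the gap first, subtracts what it found from the remaining budget, and only then searches the part after the gap. A minimal sketch of that two-segment search with a shared budget is shown below. It works on arrays of already-decoded character codes rather than multibyte text, and treats any code of 128 or above as unencodable. Both are simplifications, and the names `scan_segment` and `scan_across_gap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Record in OUT the positions of at most N elements of
   SEG[0..LEN) that are >= 128, where SEG[0] is at position POS.
   Return the number of positions recorded.  */
size_t
scan_segment (const int *seg, size_t len, int pos, size_t n, int *out)
{
  size_t found = 0;
  size_t i;

  for (i = 0; i < len && found < n; i++, pos++)
    if (seg[i] >= 128)
      out[found++] = pos;
  return found;
}

/* Search BEFORE and then AFTER (the text on either side of a
   gap) for at most N hits in total.  POS is the position of
   BEFORE[0].  */
size_t
scan_across_gap (const int *before, size_t before_len,
                 const int *after, size_t after_len,
                 int pos, size_t n, int *out)
{
  size_t found = scan_segment (before, before_len, pos, n, out);

  /* Search the second segment only with whatever budget is left.  */
  if (found < n)
    found += scan_segment (after, after_len, pos + (int) before_len,
                           n - found, out + found);
  return found;
}
```

Character positions stay continuous across the gap, which is why the second call starts at `pos + before_len` even though the two segments are not adjacent in memory.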
7087
7088
4031e2bf
KH
7089Lisp_Object
7090code_convert_region1 (start, end, coding_system, encodep)
d46c5b12 7091 Lisp_Object start, end, coding_system;
4031e2bf 7092 int encodep;
3a73fa5d
RS
7093{
7094 struct coding_system coding;
da55a2b7 7095 int from, to;
3a73fa5d 7096
b7826503
PJ
7097 CHECK_NUMBER_COERCE_MARKER (start);
7098 CHECK_NUMBER_COERCE_MARKER (end);
7099 CHECK_SYMBOL (coding_system);
3a73fa5d 7100
d46c5b12
KH
7101 validate_region (&start, &end);
7102 from = XFASTINT (start);
7103 to = XFASTINT (end);
7104
3a73fa5d 7105 if (NILP (coding_system))
d46c5b12
KH
7106 return make_number (to - from);
7107
3a73fa5d 7108 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 7109 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
3a73fa5d 7110
d46c5b12 7111 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
7112 coding.src_multibyte = coding.dst_multibyte
7113 = !NILP (current_buffer->enable_multibyte_characters);
fb88bf2d
KH
7114 code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
7115 &coding, encodep, 1);
f072a3e8 7116 Vlast_coding_system_used = coding.symbol;
fb88bf2d 7117 return make_number (coding.produced_char);
4031e2bf
KH
7118}
7119
7120DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
7121 3, 3, "r\nzCoding system: ",
48b0f3ae
PJ
7122 doc: /* Decode the current region from the specified coding system.
7123When called from a program, takes three arguments:
7124START, END, and CODING-SYSTEM. START and END are buffer positions.
7125This function sets `last-coding-system-used' to the precise coding system
7126used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7127not fully specified.)
7128It returns the length of the decoded text. */)
7129 (start, end, coding_system)
4031e2bf
KH
7130 Lisp_Object start, end, coding_system;
7131{
7132 return code_convert_region1 (start, end, coding_system, 0);
3a73fa5d
RS
7133}
7134
7135DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
7136 3, 3, "r\nzCoding system: ",
48b0f3ae
PJ
7137 doc: /* Encode the current region into the specified coding system.
7138When called from a program, takes three arguments:
7139START, END, and CODING-SYSTEM. START and END are buffer positions.
7140This function sets `last-coding-system-used' to the precise coding system
7141used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7142not fully specified.)
7143It returns the length of the encoded text. */)
7144 (start, end, coding_system)
d46c5b12 7145 Lisp_Object start, end, coding_system;
3a73fa5d 7146{
4031e2bf
KH
7147 return code_convert_region1 (start, end, coding_system, 1);
7148}
3a73fa5d 7149
4031e2bf
KH
7150Lisp_Object
7151code_convert_string1 (string, coding_system, nocopy, encodep)
7152 Lisp_Object string, coding_system, nocopy;
7153 int encodep;
7154{
7155 struct coding_system coding;
3a73fa5d 7156
b7826503
PJ
7157 CHECK_STRING (string);
7158 CHECK_SYMBOL (coding_system);
4ed46869 7159
d46c5b12 7160 if (NILP (coding_system))
4031e2bf 7161 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4ed46869 7162
d46c5b12 7163 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 7164 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
5f1cd180 7165
d46c5b12 7166 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
7167 string = (encodep
7168 ? encode_coding_string (string, &coding, !NILP (nocopy))
7169 : decode_coding_string (string, &coding, !NILP (nocopy)));
f072a3e8 7170 Vlast_coding_system_used = coding.symbol;
ec6d2bb8
KH
7171
7172 return string;
4ed46869
KH
7173}
7174
4ed46869 7175DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
e0e989f6 7176 2, 3, 0,
48b0f3ae
PJ
7177 doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
7178Optional arg NOCOPY non-nil means it is OK to return STRING itself
7179if the decoding operation is trivial.
7180This function sets `last-coding-system-used' to the precise coding system
7181used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7182not fully specified.) */)
7183 (string, coding_system, nocopy)
e0e989f6 7184 Lisp_Object string, coding_system, nocopy;
4ed46869 7185{
f072a3e8 7186 return code_convert_string1 (string, coding_system, nocopy, 0);
4ed46869
KH
7187}
7188
7189DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
e0e989f6 7190 2, 3, 0,
48b0f3ae
PJ
7191 doc: /* Encode STRING to CODING-SYSTEM, and return the result.
7192Optional arg NOCOPY non-nil means it is OK to return STRING itself
7193if the encoding operation is trivial.
7194This function sets `last-coding-system-used' to the precise coding system
7195used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7196not fully specified.) */)
7197 (string, coding_system, nocopy)
e0e989f6 7198 Lisp_Object string, coding_system, nocopy;
4ed46869 7199{
f072a3e8 7200 return code_convert_string1 (string, coding_system, nocopy, 1);
4ed46869 7201}
4031e2bf 7202
ecec61c1 7203/* Encode or decode STRING according to CODING_SYSTEM.
ec6d2bb8
KH
7204 Do not set Vlast_coding_system_used.
7205
7206 This function is called only from macros DECODE_FILE and
7207 ENCODE_FILE, thus we ignore character composition. */
ecec61c1
KH
7208
7209Lisp_Object
7210code_convert_string_norecord (string, coding_system, encodep)
7211 Lisp_Object string, coding_system;
7212 int encodep;
7213{
7214 struct coding_system coding;
7215
b7826503
PJ
7216 CHECK_STRING (string);
7217 CHECK_SYMBOL (coding_system);
ecec61c1
KH
7218
7219 if (NILP (coding_system))
7220 return string;
7221
7222 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
d5db4077 7223 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
ecec61c1 7224
ec6d2bb8 7225 coding.composing = COMPOSITION_DISABLED;
ecec61c1 7226 coding.mode |= CODING_MODE_LAST_BLOCK;
b73bfc1c
KH
7227 return (encodep
7228 ? encode_coding_string (string, &coding, 1)
7229 : decode_coding_string (string, &coding, 1));
ecec61c1 7230}
3a73fa5d 7231\f
4ed46869 7232DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
48b0f3ae
PJ
7233 doc: /* Decode a Japanese character which has CODE in shift_jis encoding.
7234Return the corresponding character. */)
7235 (code)
4ed46869
KH
7236 Lisp_Object code;
7237{
7238 unsigned char c1, c2, s1, s2;
7239 Lisp_Object val;
7240
b7826503 7241 CHECK_NUMBER (code);
4ed46869 7242 s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
55ab7be3
KH
7243 if (s1 == 0)
7244 {
c28a9453
KH
7245 if (s2 < 0x80)
7246 XSETFASTINT (val, s2);
7247 else if (s2 >= 0xA0 && s2 <= 0xDF)
b73bfc1c 7248 XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
c28a9453 7249 else
9da8350f 7250 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3
KH
7251 }
7252 else
7253 {
87323294 7254 if ((s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
55ab7be3 7255 || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
9da8350f 7256 error ("Invalid Shift JIS code: %x", XFASTINT (code));
55ab7be3 7257 DECODE_SJIS (s1, s2, c1, c2);
b73bfc1c 7258 XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
55ab7be3 7259 }
4ed46869
KH
7260 return val;
7261}
7262
7263DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
48b0f3ae
PJ
7264 doc: /* Encode a Japanese character CHAR to shift_jis encoding.
7265Return the corresponding code in SJIS. */)
7266 (ch)
4ed46869
KH
7267 Lisp_Object ch;
7268{
bcf26d6a 7269 int charset, c1, c2, s1, s2;
4ed46869
KH
7270 Lisp_Object val;
7271
b7826503 7272 CHECK_NUMBER (ch);
4ed46869 7273 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
7274 if (charset == CHARSET_ASCII)
7275 {
7276 val = ch;
7277 }
7278 else if (charset == charset_jisx0208
7279 && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
4ed46869
KH
7280 {
7281 ENCODE_SJIS (c1, c2, s1, s2);
bcf26d6a 7282 XSETFASTINT (val, (s1 << 8) | s2);
4ed46869 7283 }
55ab7be3
KH
7284 else if (charset == charset_katakana_jisx0201
7285 && c1 > 0x20 && c2 < 0xE0)
7286 {
7287 XSETFASTINT (val, c1 | 0x80);
7288 }
4ed46869 7289 else
55ab7be3 7290 error ("Can't encode to shift_jis: %d", XFASTINT (ch));
4ed46869
KH
7291 return val;
7292}
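The two functions above rely on the DECODE_SJIS and ENCODE_SJIS macros, which are defined elsewhere in the file. As a rough standalone sketch of the commonly used Shift JIS <-> JIS X 0208 byte arithmetic (not a copy of the Emacs macros; the function names `sjis_pair_valid`, `sjis_to_jis` and `jis_to_sjis` are hypothetical), the conversion might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if (S1, S2) passes the same range checks that
   decode-sjis-char applies to a two-byte code, else 0.  */
int
sjis_pair_valid (unsigned s1, unsigned s2)
{
  if (s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
    return 0;
  if (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
    return 0;
  return 1;
}

/* Convert a Shift JIS byte pair to a JIS X 0208 row/column pair.
   Hypothetical helper; assumes the pair has already been validated.  */
void
sjis_to_jis (unsigned s1, unsigned s2, unsigned *j1, unsigned *j2)
{
  assert (j1 != NULL && j2 != NULL);
  if (s2 >= 0x9F)
    {
      /* Upper half of the trail-byte range: even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* Lower half: odd JIS row; trail bytes skip the 0x7F gap.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
}

/* Inverse of sjis_to_jis for J1 and J2 in the range 0x21..0x7E.  */
void
jis_to_sjis (unsigned j1, unsigned j2, unsigned *s1, unsigned *s2)
{
  assert (s1 != NULL && s2 != NULL);
  if (j1 & 1)
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      *s2 = j2 + 0x7E;
    }
}
```

For instance, the Shift JIS pair 0x82 0xA0 maps to JIS row 0x24, column 0x22, and converting that pair back yields the original bytes.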
7293
7294DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
48b0f3ae
PJ
7295 doc: /* Decode a Big5 character which has CODE in BIG5 coding system.
7296Return the corresponding character. */)
7297 (code)
4ed46869
KH
7298 Lisp_Object code;
7299{
7300 int charset;
7301 unsigned char b1, b2, c1, c2;
7302 Lisp_Object val;
7303
b7826503 7304 CHECK_NUMBER (code);
4ed46869 7305 b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
c28a9453
KH
7306 if (b1 == 0)
7307 {
7308 if (b2 >= 0x80)
9da8350f 7309 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453
KH
7310 val = code;
7311 }
7312 else
7313 {
7314 if ((b1 < 0xA1 || b1 > 0xFE)
7315 || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
9da8350f 7316 error ("Invalid BIG5 code: %x", XFASTINT (code));
c28a9453 7317 DECODE_BIG5 (b1, b2, charset, c1, c2);
b73bfc1c 7318 XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
c28a9453 7319 }
4ed46869
KH
7320 return val;
7321}
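The range checks in decode-big5-char can be read in isolation from the DECODE_BIG5 macro. The following is a minimal sketch of just those checks as a standalone predicate; the names `big5_split` and `big5_code_valid` are hypothetical, and rejecting codes above 0xFFFF is an added assumption that the original function does not make.

```c
#include <assert.h>
#include <stddef.h>

/* Split CODE into its high byte B1 and low byte B2.  */
void
big5_split (unsigned code, unsigned *b1, unsigned *b2)
{
  assert (b1 != NULL && b2 != NULL);
  *b1 = (code >> 8) & 0xFF;
  *b2 = code & 0xFF;
}

/* Return 1 if CODE passes the range checks used by decode-big5-char,
   else 0.  Codes above 0xFFFF are rejected here (an assumption of
   this sketch, stricter than the original).  */
int
big5_code_valid (unsigned code)
{
  unsigned b1, b2;

  if (code > 0xFFFF)
    return 0;
  big5_split (code, &b1, &b2);
  if (b1 == 0)
    /* Single-byte codes must be ASCII.  */
    return b2 < 0x80;
  if (b1 < 0xA1 || b1 > 0xFE)
    return 0;
  /* The trail byte lies in 0x40..0x7E or 0xA1..0xFE.  */
  if (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE)
    return 0;
  return 1;
}
```

With this predicate, 0xA440 and 0xA4A1 are accepted, while 0xA47F (trail byte in the gap) and 0x80 (single byte outside ASCII) are rejected.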
7322
7323DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
48b0f3ae
PJ
7324 doc: /* Encode the Big5 character CHAR to BIG5 coding system.
7325Return the corresponding character code in Big5. */)
7326 (ch)
4ed46869
KH
7327 Lisp_Object ch;
7328{
bcf26d6a 7329 int charset, c1, c2, b1, b2;
4ed46869
KH
7330 Lisp_Object val;
7331
b7826503 7332 CHECK_NUMBER (ch);
4ed46869 7333 SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
c28a9453
KH
7334 if (charset == CHARSET_ASCII)
7335 {
7336 val = ch;
7337 }
7338 else if ((charset == charset_big5_1
7339 && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
7340 || (charset == charset_big5_2
7341 && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
4ed46869
KH
7342 {
7343 ENCODE_BIG5 (charset, c1, c2, b1, b2);
bcf26d6a 7344 XSETFASTINT (val, (b1 << 8) | b2);
4ed46869
KH
7345 }
7346 else
c28a9453 7347 error ("Can't encode to Big5: %d", XFASTINT (ch));
4ed46869
KH
7348 return val;
7349}
3a73fa5d 7350\f
002fdb44 7351DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal,
68bba4e4 7352 Sset_terminal_coding_system_internal, 1, 2, 0,
48b0f3ae 7353 doc: /* Internal use only. */)
6ed8eeff 7354 (coding_system, terminal)
4ed46869 7355 Lisp_Object coding_system;
6ed8eeff 7356 Lisp_Object terminal;
4ed46869 7357{
6ed8eeff 7358 struct coding_system *terminal_coding = TERMINAL_TERMINAL_CODING (get_terminal (terminal, 1));
b7826503 7359 CHECK_SYMBOL (coding_system);
b8299c66 7360 setup_coding_system (Fcheck_coding_system (coding_system), terminal_coding);
70c22245 7361 /* We had better not send unsafe characters to terminal. */
b8299c66 7362 terminal_coding->mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
8ca3766a 7363 /* Character composition should be disabled. */
b8299c66 7364 terminal_coding->composing = COMPOSITION_DISABLED;
bd64290d 7365 /* Error notification should be suppressed. */
b8299c66
KL
7366 terminal_coding->suppress_error = 1;
7367 terminal_coding->src_multibyte = 1;
7368 terminal_coding->dst_multibyte = 0;
4ed46869
KH
7369 return Qnil;
7370}
7371
002fdb44 7372DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal,
48b0f3ae 7373 Sset_safe_terminal_coding_system_internal, 1, 1, 0,
ddb67bdc 7374 doc: /* Internal use only. */)
48b0f3ae 7375 (coding_system)
c4825358
KH
7376 Lisp_Object coding_system;
7377{
b7826503 7378 CHECK_SYMBOL (coding_system);
c4825358
KH
7379 setup_coding_system (Fcheck_coding_system (coding_system),
7380 &safe_terminal_coding);
8ca3766a 7381 /* Character composition should be disabled. */
ec6d2bb8 7382 safe_terminal_coding.composing = COMPOSITION_DISABLED;
bd64290d 7383 /* Error notification should be suppressed. */
b8299c66 7384 safe_terminal_coding.suppress_error = 1;
b73bfc1c
KH
7385 safe_terminal_coding.src_multibyte = 1;
7386 safe_terminal_coding.dst_multibyte = 0;
c4825358
KH
7387 return Qnil;
7388}
7389
002fdb44 7390DEFUN ("terminal-coding-system", Fterminal_coding_system,
68bba4e4 7391 Sterminal_coding_system, 0, 1, 0,
6ed8eeff
KL
7392 doc: /* Return coding system specified for terminal output on the given terminal.
7393TERMINAL may be a terminal id, a frame, or nil for the selected
7394frame's terminal device. */)
7395 (terminal)
7396 Lisp_Object terminal;
4ed46869 7397{
6ed8eeff 7398 return TERMINAL_TERMINAL_CODING (get_terminal (terminal, 1))->symbol;
4ed46869
KH
7399}
7400
002fdb44 7401DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal,
68bba4e4 7402 Sset_keyboard_coding_system_internal, 1, 2, 0,
48b0f3ae 7403 doc: /* Internal use only. */)
6ed8eeff 7404 (coding_system, terminal)
4ed46869 7405 Lisp_Object coding_system;
6ed8eeff 7406 Lisp_Object terminal;
4ed46869 7407{
6ed8eeff 7408 struct terminal *t = get_terminal (terminal, 1);
b7826503 7409 CHECK_SYMBOL (coding_system);
68bba4e4 7410
b8299c66 7411 setup_coding_system (Fcheck_coding_system (coding_system),
6ed8eeff 7412 TERMINAL_KEYBOARD_CODING (t));
8ca3766a 7413 /* Character composition should be disabled. */
6ed8eeff 7414 TERMINAL_KEYBOARD_CODING (t)->composing = COMPOSITION_DISABLED;
4ed46869
KH
7415 return Qnil;
7416}
7417
002fdb44 7418DEFUN ("keyboard-coding-system", Fkeyboard_coding_system,
68bba4e4 7419 Skeyboard_coding_system, 0, 1, 0,
6ed8eeff
KL
7420 doc: /* Return coding system for decoding keyboard input on TERMINAL.
7421TERMINAL may be a terminal id, a frame, or nil for the selected
7422frame's terminal device. */)
7423 (terminal)
7424 Lisp_Object terminal;
4ed46869 7425{
6ed8eeff 7426 return TERMINAL_KEYBOARD_CODING (get_terminal (terminal, 1))->symbol;
4ed46869
KH
7427}
7428
7429\f
a5d301df
KH
7430DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
7431 Sfind_operation_coding_system, 1, MANY, 0,
48b0f3ae
PJ
7432 doc: /* Choose a coding system for an operation based on the target name.
7433The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
7434DECODING-SYSTEM is the coding system to use for decoding
7435\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
7436for encoding (in case OPERATION does encoding).
7437
7438The first argument OPERATION specifies an I/O primitive:
7439 For file I/O, `insert-file-contents' or `write-region'.
7440 For process I/O, `call-process', `call-process-region', or `start-process'.
7441 For network I/O, `open-network-stream'.
7442
7443The remaining arguments should be the same arguments that were passed
7444to the primitive. Depending on which primitive, one of those arguments
7445is selected as the TARGET. For example, if OPERATION does file I/O,
7446whichever argument specifies the file name is TARGET.
7447
7448TARGET has a meaning which depends on OPERATION:
7449 For file I/O, TARGET is a file name.
7450 For process I/O, TARGET is a process name.
7451 For network I/O, TARGET is a service name or a port number.
7452
7453This function looks up what is specified for TARGET in
7454`file-coding-system-alist', `process-coding-system-alist',
7455or `network-coding-system-alist' depending on OPERATION.
7456They may specify a coding system, a cons of coding systems,
7457or a function symbol to call.
7458In the last case, we call the function with one argument,
7459which is a list of all the arguments given to this function.
7460
7461usage: (find-operation-coding-system OPERATION ARGUMENTS ...) */)
7462 (nargs, args)
4ed46869
KH
7463 int nargs;
7464 Lisp_Object *args;
7465{
7466 Lisp_Object operation, target_idx, target, val;
7467 register Lisp_Object chain;
7468
7469 if (nargs < 2)
7470 error ("Too few arguments");
7471 operation = args[0];
7472 if (!SYMBOLP (operation)
7473 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
8ca3766a 7474 error ("Invalid first argument");
4ed46869
KH
7475 if (nargs < 1 + XINT (target_idx))
7476 error ("Too few arguments for operation: %s",
d5db4077 7477 SDATA (SYMBOL_NAME (operation)));
7f787cfd
KH
7478 /* For write-region, if the 6th argument (i.e. VISIT, the 5th
7479 argument to write-region) is a string, it must be treated as a
7480 target file name. */
7481 if (EQ (operation, Qwrite_region)
7482 && nargs > 5
7483 && STRINGP (args[5]))
d90ed3b4 7484 target_idx = make_number (4);
4ed46869
KH
7485 target = args[XINT (target_idx) + 1];
7486 if (!(STRINGP (target)
7487 || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
8ca3766a 7488 error ("Invalid argument %d", XINT (target_idx) + 1);
4ed46869 7489
2e34157c
RS
7490 chain = ((EQ (operation, Qinsert_file_contents)
7491 || EQ (operation, Qwrite_region))
02ba4723 7492 ? Vfile_coding_system_alist
2e34157c 7493 : (EQ (operation, Qopen_network_stream)
02ba4723
KH
7494 ? Vnetwork_coding_system_alist
7495 : Vprocess_coding_system_alist));
4ed46869
KH
7496 if (NILP (chain))
7497 return Qnil;
7498
03699b14 7499 for (; CONSP (chain); chain = XCDR (chain))
4ed46869 7500 {
f44d27ce 7501 Lisp_Object elt;
03699b14 7502 elt = XCAR (chain);
4ed46869
KH
7503
7504 if (CONSP (elt)
7505 && ((STRINGP (target)
03699b14
KR
7506 && STRINGP (XCAR (elt))
7507 && fast_string_match (XCAR (elt), target) >= 0)
7508 || (INTEGERP (target) && EQ (target, XCAR (elt)))))
02ba4723 7509 {
03699b14 7510 val = XCDR (elt);
b19fd4c5
KH
7511 /* Here, if VAL is both a valid coding system and a valid
7512 function symbol, we return VAL as a coding system. */
02ba4723
KH
7513 if (CONSP (val))
7514 return val;
7515 if (! SYMBOLP (val))
7516 return Qnil;
7517 if (! NILP (Fcoding_system_p (val)))
7518 return Fcons (val, val);
b19fd4c5
KH
7519 if (! NILP (Ffboundp (val)))
7520 {
7521 val = call1 (val, Flist (nargs, args));
7522 if (CONSP (val))
7523 return val;
7524 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
7525 return Fcons (val, val);
7526 }
02ba4723
KH
7527 return Qnil;
7528 }
4ed46869
KH
7529 }
7530 return Qnil;
7531}
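The lookup loop above returns the value of the first alist element whose pattern matches TARGET. As an illustration of that first-match-wins idea outside of Lisp objects, here is a small sketch using POSIX regular expressions. Everything in it is hypothetical: `struct coding_rule`, `find_rule` and `sample_rules` are made up for this example, and POSIX extended regex syntax is assumed, which differs from Emacs regexp syntax.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* One (PATTERN . (DECODING . ENCODING)) entry; hypothetical layout.  */
struct coding_rule
{
  const char *pattern;   /* POSIX extended regular expression */
  const char *decoding;  /* coding system name used for decoding */
  const char *encoding;  /* coding system name used for encoding */
};

/* Sample table, for illustration only.  */
const struct coding_rule sample_rules[] = {
  { "\\.el$",  "emacs-mule", "emacs-mule" },
  { "\\.txt$", "utf-8",      "utf-8" },
};

/* Return the first of the N rules whose pattern matches TARGET,
   or NULL if none does.  Patterns that fail to compile are skipped.  */
const struct coding_rule *
find_rule (const struct coding_rule *rules, size_t n, const char *target)
{
  size_t i;

  assert (target != NULL);
  for (i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;
      matched = (regexec (&re, target, 0, NULL, 0) == 0);
      regfree (&re);
      if (matched)
        return &rules[i];
    }
  return NULL;
}
```

Because the scan stops at the first match, more specific patterns need to appear before more general ones in the table.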
7532
1397dc18
KH
7533DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
7534 Supdate_coding_systems_internal, 0, 0, 0,
48b0f3ae
PJ
7535 doc: /* Update internal database for ISO2022 and CCL based coding systems.
7536When values of any coding categories are changed, you must
7537call this function. */)
7538 ()
d46c5b12
KH
7539{
7540 int i;
7541
fa42c37f 7542 for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
d46c5b12 7543 {
1397dc18
KH
7544 Lisp_Object val;
7545
f5c1dd0d 7546 val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[i]);
1397dc18
KH
7547 if (!NILP (val))
7548 {
7549 if (! coding_system_table[i])
7550 coding_system_table[i] = ((struct coding_system *)
7551 xmalloc (sizeof (struct coding_system)));
7552 setup_coding_system (val, coding_system_table[i]);
7553 }
7554 else if (coding_system_table[i])
7555 {
7556 xfree (coding_system_table[i]);
7557 coding_system_table[i] = NULL;
7558 }
d46c5b12 7559 }
1397dc18 7560
d46c5b12
KH
7561 return Qnil;
7562}
7563
66cfb530
KH
7564DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
7565 Sset_coding_priority_internal, 0, 0, 0,
48b0f3ae
PJ
7566 doc: /* Update internal database for the current value of `coding-category-list'.
7567This function is internal use only. */)
7568 ()
66cfb530
KH
7569{
7570 int i = 0, idx;
84d60297
RS
7571 Lisp_Object val;
7572
7573 val = Vcoding_category_list;
66cfb530
KH
7574
7575 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
7576 {
03699b14 7577 if (! SYMBOLP (XCAR (val)))
66cfb530 7578 break;
03699b14 7579 idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
66cfb530
KH
7580 if (idx >= CODING_CATEGORY_IDX_MAX)
7581 break;
7582 coding_priorities[i++] = (1 << idx);
03699b14 7583 val = XCDR (val);
66cfb530
KH
7584 }
7585 /* If coding-category-list is valid and contains all coding
7586 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
fa42c37f 7587 the following code saves Emacs from crashing. */
66cfb530
KH
7588 while (i < CODING_CATEGORY_IDX_MAX)
7589 coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
7590
7591 return Qnil;
7592}
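The function above copies priorities until it meets an invalid entry and then pads the remaining slots with a fallback mask, so the array is always fully initialized. A minimal standalone sketch of that copy-then-pad pattern follows; `CATEGORY_MAX`, `FALLBACK_MASK` and `fill_priorities` are hypothetical stand-ins for illustration and not the values used by Emacs.

```c
#include <assert.h>
#include <stddef.h>

#define CATEGORY_MAX 4            /* hypothetical number of categories */
#define FALLBACK_MASK (1u << 3)   /* hypothetical fallback category mask */

/* Fill OUT[0..CATEGORY_MAX-1] with one bit mask per priority slot.
   Copying stops at the first out-of-range index in INDICES; all
   remaining slots receive FALLBACK_MASK.  Return the number of slots
   that were taken from INDICES.  */
size_t
fill_priorities (const int *indices, size_t n, unsigned *out)
{
  size_t i = 0, used;

  assert (out != NULL);
  while (i < n && i < CATEGORY_MAX)
    {
      int idx = indices[i];

      if (idx < 0 || idx >= CATEGORY_MAX)
        break;
      out[i++] = 1u << idx;
    }
  used = i;
  /* Pad so that every slot holds a usable mask.  */
  while (i < CATEGORY_MAX)
    out[i++] = FALLBACK_MASK;
  return used;
}
```

The padding step is what keeps later consumers of the array safe when the input list is short or malformed.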
7593
6b89e3aa
KH
7594DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
7595 Sdefine_coding_system_internal, 1, 1, 0,
7596 doc: /* Register CODING-SYSTEM as a base coding system.
7597This function is internal use only. */)
7598 (coding_system)
7599 Lisp_Object coding_system;
7600{
7601 Lisp_Object safe_chars, slot;
7602
7603 if (NILP (Fcheck_coding_system (coding_system)))
7604 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
7605 safe_chars = coding_safe_chars (coding_system);
7606 if (! EQ (safe_chars, Qt) && ! CHAR_TABLE_P (safe_chars))
7607 error ("No valid safe-chars property for %s",
7608 SDATA (SYMBOL_NAME (coding_system)));
7609 if (EQ (safe_chars, Qt))
7610 {
7611 if (NILP (Fmemq (coding_system, XCAR (Vcoding_system_safe_chars))))
7612 XSETCAR (Vcoding_system_safe_chars,
7613 Fcons (coding_system, XCAR (Vcoding_system_safe_chars)));
7614 }
7615 else
7616 {
7617 slot = Fassq (coding_system, XCDR (Vcoding_system_safe_chars));
7618 if (NILP (slot))
7619 XSETCDR (Vcoding_system_safe_chars,
7620 nconc2 (XCDR (Vcoding_system_safe_chars),
7621 Fcons (Fcons (coding_system, safe_chars), Qnil)));
7622 else
7623 XSETCDR (slot, safe_chars);
7624 }
7625 return Qnil;
7626}
7627
4ed46869
KH
7628#endif /* emacs */
7629
7630\f
1397dc18 7631/*** 9. Post-amble ***/
4ed46869 7632
dfcf069d 7633void
4ed46869
KH
7634init_coding_once ()
7635{
7636 int i;
7637
93dec019 7638 /* Initialization specific to Emacs' internal format. */
4ed46869
KH
7639 for (i = 0; i <= 0x20; i++)
7640 emacs_code_class[i] = EMACS_control_code;
7641 emacs_code_class[0x0A] = EMACS_linefeed_code;
7642 emacs_code_class[0x0D] = EMACS_carriage_return_code;
7643 for (i = 0x21 ; i < 0x7F; i++)
7644 emacs_code_class[i] = EMACS_ascii_code;
7645 emacs_code_class[0x7F] = EMACS_control_code;
ec6d2bb8 7646 for (i = 0x80; i < 0xFF; i++)
4ed46869
KH
7647 emacs_code_class[i] = EMACS_invalid_code;
7648 emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
7649 emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
7650 emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
7651 emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;
7652
7653 /* Initialization specific to ISO2022. */
7654 for (i = 0; i < 0x20; i++)
b73bfc1c 7655 iso_code_class[i] = ISO_control_0;
4ed46869
KH
7656 for (i = 0x21; i < 0x7F; i++)
7657 iso_code_class[i] = ISO_graphic_plane_0;
7658 for (i = 0x80; i < 0xA0; i++)
b73bfc1c 7659 iso_code_class[i] = ISO_control_1;
4ed46869
KH
7660 for (i = 0xA1; i < 0xFF; i++)
7661 iso_code_class[i] = ISO_graphic_plane_1;
7662 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
7663 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
7664 iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
7665 iso_code_class[ISO_CODE_SO] = ISO_shift_out;
7666 iso_code_class[ISO_CODE_SI] = ISO_shift_in;
7667 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
7668 iso_code_class[ISO_CODE_ESC] = ISO_escape;
7669 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
7670 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
7671 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;
7672
c4825358 7673 setup_coding_system (Qnil, &safe_terminal_coding);
6bc51348 7674 setup_coding_system (Qnil, &default_buffer_file_coding);
9ce27fde 7675
d46c5b12
KH
7676 bzero (coding_system_table, sizeof coding_system_table);
7677
66cfb530
KH
7678 bzero (ascii_skip_code, sizeof ascii_skip_code);
7679 for (i = 0; i < 128; i++)
7680 ascii_skip_code[i] = 1;
7681
9ce27fde
KH
7682#if defined (MSDOS) || defined (WINDOWSNT)
7683 system_eol_type = CODING_EOL_CRLF;
7684#else
7685 system_eol_type = CODING_EOL_LF;
7686#endif
b843d1ae
KH
7687
7688 inhibit_pre_post_conversion = 0;
e0e989f6
KH
7689}
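init_coding_once builds its byte classification tables by filling whole ranges first and then overriding individual bytes that need special treatment. The sketch below shows that pattern on a simplified ISO 2022 style table. The enum, the table and the function names are hypothetical, and only a subset of the special bytes is modeled.

```c
#include <assert.h>

enum byte_class
{
  CLASS_CONTROL_0,       /* 0x00..0x1F */
  CLASS_GRAPHIC_0,       /* 0x21..0x7E */
  CLASS_CONTROL_1,       /* 0x80..0x9F */
  CLASS_GRAPHIC_1,       /* 0xA1..0xFE */
  CLASS_0X20_OR_0X7F,
  CLASS_0XA0_OR_0XFF,
  CLASS_ESCAPE           /* 0x1B */
};

unsigned char byte_class_table[256];

/* Fill the table by range, then override the special bytes.  */
void
init_byte_classes (void)
{
  int i;

  for (i = 0x00; i < 0x20; i++)
    byte_class_table[i] = CLASS_CONTROL_0;
  for (i = 0x21; i < 0x7F; i++)
    byte_class_table[i] = CLASS_GRAPHIC_0;
  for (i = 0x80; i < 0xA0; i++)
    byte_class_table[i] = CLASS_CONTROL_1;
  for (i = 0xA1; i < 0xFF; i++)
    byte_class_table[i] = CLASS_GRAPHIC_1;
  byte_class_table[0x20] = byte_class_table[0x7F] = CLASS_0X20_OR_0X7F;
  byte_class_table[0xA0] = byte_class_table[0xFF] = CLASS_0XA0_OR_0XFF;
  /* Overrides come last so they win over the range fill.  */
  byte_class_table[0x1B] = CLASS_ESCAPE;
}

/* Look up the class of BYTE, which must be in the range 0..255.  */
enum byte_class
classify_byte (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  return (enum byte_class) byte_class_table[byte];
}
```

A table lookup of this kind keeps the per-byte work in the decoding loops down to a single array access.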
7690
7691#ifdef emacs
7692
dfcf069d 7693void
e0e989f6
KH
7694syms_of_coding ()
7695{
2a47931b
KH
7696 staticpro (&Vcode_conversion_workbuf_name);
7697 Vcode_conversion_workbuf_name = build_string (" *code-conversion-work*");
7698
e0e989f6
KH
7699 Qtarget_idx = intern ("target-idx");
7700 staticpro (&Qtarget_idx);
7701
bb0115a2
RS
7702 Qcoding_system_history = intern ("coding-system-history");
7703 staticpro (&Qcoding_system_history);
7704 Fset (Qcoding_system_history, Qnil);
7705
9ce27fde 7706 /* Target FILENAME is the first argument. */
e0e989f6 7707 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
9ce27fde 7708 /* Target FILENAME is the third argument. */
e0e989f6
KH
7709 Fput (Qwrite_region, Qtarget_idx, make_number (2));
7710
7711 Qcall_process = intern ("call-process");
7712 staticpro (&Qcall_process);
9ce27fde 7713 /* Target PROGRAM is the first argument. */
e0e989f6
KH
7714 Fput (Qcall_process, Qtarget_idx, make_number (0));
7715
7716 Qcall_process_region = intern ("call-process-region");
7717 staticpro (&Qcall_process_region);
9ce27fde 7718 /* Target PROGRAM is the third argument. */
e0e989f6
KH
7719 Fput (Qcall_process_region, Qtarget_idx, make_number (2));
7720
7721 Qstart_process = intern ("start-process");
7722 staticpro (&Qstart_process);
9ce27fde 7723 /* Target PROGRAM is the third argument. */
e0e989f6
KH
7724 Fput (Qstart_process, Qtarget_idx, make_number (2));
7725
7726 Qopen_network_stream = intern ("open-network-stream");
7727 staticpro (&Qopen_network_stream);
9ce27fde 7728 /* Target SERVICE is the fourth argument. */
e0e989f6
KH
7729 Fput (Qopen_network_stream, Qtarget_idx, make_number (3));
7730
4ed46869
KH
7731 Qcoding_system = intern ("coding-system");
7732 staticpro (&Qcoding_system);
7733
7734 Qeol_type = intern ("eol-type");
7735 staticpro (&Qeol_type);
7736
7737 Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
7738 staticpro (&Qbuffer_file_coding_system);
7739
7740 Qpost_read_conversion = intern ("post-read-conversion");
7741 staticpro (&Qpost_read_conversion);
7742
7743 Qpre_write_conversion = intern ("pre-write-conversion");
7744 staticpro (&Qpre_write_conversion);
7745
27901516
KH
7746 Qno_conversion = intern ("no-conversion");
7747 staticpro (&Qno_conversion);
7748
7749 Qundecided = intern ("undecided");
7750 staticpro (&Qundecided);
7751
4ed46869
KH
7752 Qcoding_system_p = intern ("coding-system-p");
7753 staticpro (&Qcoding_system_p);
7754
7755 Qcoding_system_error = intern ("coding-system-error");
7756 staticpro (&Qcoding_system_error);
7757
7758 Fput (Qcoding_system_error, Qerror_conditions,
7759 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
7760 Fput (Qcoding_system_error, Qerror_message,
9ce27fde 7761 build_string ("Invalid coding system"));
4ed46869 7762
d46c5b12
KH
7763 Qcoding_category = intern ("coding-category");
7764 staticpro (&Qcoding_category);
4ed46869
KH
7765 Qcoding_category_index = intern ("coding-category-index");
7766 staticpro (&Qcoding_category_index);
7767
d46c5b12
KH
7768 Vcoding_category_table
7769 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
7770 staticpro (&Vcoding_category_table);
4ed46869
KH
7771 {
7772 int i;
7773 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
7774 {
d46c5b12
KH
7775 XVECTOR (Vcoding_category_table)->contents[i]
7776 = intern (coding_category_name[i]);
7777 Fput (XVECTOR (Vcoding_category_table)->contents[i],
7778 Qcoding_category_index, make_number (i));
4ed46869
KH
7779 }
7780 }
7781
6b89e3aa
KH
7782 Vcoding_system_safe_chars = Fcons (Qnil, Qnil);
7783 staticpro (&Vcoding_system_safe_chars);
7784
f967223b
KH
7785 Qtranslation_table = intern ("translation-table");
7786 staticpro (&Qtranslation_table);
b666620c 7787 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (2));
bdd9fb48 7788
f967223b
KH
7789 Qtranslation_table_id = intern ("translation-table-id");
7790 staticpro (&Qtranslation_table_id);
84fbb8a0 7791
f967223b
KH
7792 Qtranslation_table_for_decode = intern ("translation-table-for-decode");
7793 staticpro (&Qtranslation_table_for_decode);
a5d301df 7794
f967223b
KH
7795 Qtranslation_table_for_encode = intern ("translation-table-for-encode");
7796 staticpro (&Qtranslation_table_for_encode);
a5d301df 7797
05e6f5dc
KH
7798 Qsafe_chars = intern ("safe-chars");
7799 staticpro (&Qsafe_chars);
7800
7801 Qchar_coding_system = intern ("char-coding-system");
7802 staticpro (&Qchar_coding_system);
7803
7804 /* Intern this now in case it isn't already done.
7805 Setting this variable twice is harmless.
7806 But don't staticpro it here--that is done in alloc.c. */
7807 Qchar_table_extra_slots = intern ("char-table-extra-slots");
7808 Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
067a6a66 7809 Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (0));
70c22245 7810
1397dc18
KH
7811 Qvalid_codes = intern ("valid-codes");
7812 staticpro (&Qvalid_codes);
7813
9ce27fde
KH
7814 Qemacs_mule = intern ("emacs-mule");
7815 staticpro (&Qemacs_mule);
7816
d46c5b12
KH
7817 Qraw_text = intern ("raw-text");
7818 staticpro (&Qraw_text);
7819
ecf488bc
DL
7820 Qutf_8 = intern ("utf-8");
7821 staticpro (&Qutf_8);
7822
a362520d
KH
7823 Qcoding_system_define_form = intern ("coding-system-define-form");
7824 staticpro (&Qcoding_system_define_form);
7825
4ed46869
KH
7826 defsubr (&Scoding_system_p);
7827 defsubr (&Sread_coding_system);
7828 defsubr (&Sread_non_nil_coding_system);
7829 defsubr (&Scheck_coding_system);
7830 defsubr (&Sdetect_coding_region);
d46c5b12 7831 defsubr (&Sdetect_coding_string);
05e6f5dc 7832 defsubr (&Sfind_coding_systems_region_internal);
068a9dbd 7833 defsubr (&Sunencodable_char_position);
4ed46869
KH
7834 defsubr (&Sdecode_coding_region);
7835 defsubr (&Sencode_coding_region);
7836 defsubr (&Sdecode_coding_string);
7837 defsubr (&Sencode_coding_string);
7838 defsubr (&Sdecode_sjis_char);
7839 defsubr (&Sencode_sjis_char);
7840 defsubr (&Sdecode_big5_char);
7841 defsubr (&Sencode_big5_char);
1ba9e4ab 7842 defsubr (&Sset_terminal_coding_system_internal);
c4825358 7843 defsubr (&Sset_safe_terminal_coding_system_internal);
4ed46869 7844 defsubr (&Sterminal_coding_system);
1ba9e4ab 7845 defsubr (&Sset_keyboard_coding_system_internal);
4ed46869 7846 defsubr (&Skeyboard_coding_system);
a5d301df 7847 defsubr (&Sfind_operation_coding_system);
1397dc18 7848 defsubr (&Supdate_coding_systems_internal);
66cfb530 7849 defsubr (&Sset_coding_priority_internal);
6b89e3aa 7850 defsubr (&Sdefine_coding_system_internal);
4ed46869 7851
4608c386 7852 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
48b0f3ae
PJ
7853 doc: /* List of coding systems.
7854
7855Do not alter the value of this variable manually. This variable should be
7856updated by the functions `make-coding-system' and
7857`define-coding-system-alias'. */);
4608c386
KH
7858 Vcoding_system_list = Qnil;
7859
7860 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
48b0f3ae
PJ
7861 doc: /* Alist of coding system names.
7862Each element is one element list of coding system name.
7863This variable is given to `completing-read' as TABLE argument.
7864
7865Do not alter the value of this variable manually. This variable should be
7866updated by the functions `make-coding-system' and
7867`define-coding-system-alias'. */);
4608c386
KH
7868 Vcoding_system_alist = Qnil;
7869
4ed46869 7870 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
48b0f3ae
PJ
7871 doc: /* List of coding-categories (symbols) ordered by priority.
7872
7873On detecting a coding system, Emacs tries code detection algorithms
7874associated with each coding-category one by one in this order. When
7875one algorithm agrees with a byte sequence of source text, the coding
0ec31faf
KH
7876system bound to the corresponding coding-category is selected.
7877
42205607 7878Don't modify this variable directly, but use `set-coding-priority'. */);
4ed46869
KH
7879 {
7880 int i;
7881
7882 Vcoding_category_list = Qnil;
7883 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
7884 Vcoding_category_list
d46c5b12
KH
7885 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
7886 Vcoding_category_list);
4ed46869
KH
7887 }
7888
7889 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
48b0f3ae
PJ
7890 doc: /* Specify the coding system for read operations.
7891It is useful to bind this variable with `let', but do not set it globally.
7892If the value is a coding system, it is used for decoding on read operation.
7893If not, an appropriate element is used from one of the coding system alists:
7894There are three such tables, `file-coding-system-alist',
7895`process-coding-system-alist', and `network-coding-system-alist'. */);
4ed46869
KH
7896 Vcoding_system_for_read = Qnil;
7897
7898 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
48b0f3ae
PJ
7899 doc: /* Specify the coding system for write operations.
7900Programs bind this variable with `let', but you should not set it globally.
7901If the value is a coding system, it is used for encoding of output,
7902when writing it to a file and when sending it to a file or subprocess.
7903
7904If this does not specify a coding system, an appropriate element
7905is used from one of the coding system alists:
7906There are three such tables, `file-coding-system-alist',
7907`process-coding-system-alist', and `network-coding-system-alist'.
7908For output to files, if the above procedure does not specify a coding system,
7909the value of `buffer-file-coding-system' is used. */);
4ed46869
KH
7910 Vcoding_system_for_write = Qnil;
7911
7912 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
7c695ab9
DL
7913 doc: /* Coding system used in the latest file or process I/O.
7914Also set by `encode-coding-region', `decode-coding-region',
7915`encode-coding-string' and `decode-coding-string'. */);
4ed46869
KH
7916 Vlast_coding_system_used = Qnil;
7917
9ce27fde 7918 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
48b0f3ae
PJ
7919 doc: /* *Non-nil means always inhibit code conversion of end-of-line format.
7920See info node `Coding Systems' and info node `Text and Binary' concerning
7921such conversion. */);
9ce27fde
KH
7922 inhibit_eol_conversion = 0;
7923
ed29121d 7924 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
48b0f3ae
PJ
7925 doc: /* Non-nil means process buffer inherits coding system of process output.
7926Bind it to t if the process output is to be treated as if it were a file
7927read from some filesystem. */);
ed29121d
EZ
7928 inherit_process_coding_system = 0;
7929
02ba4723 7930 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
48b0f3ae
PJ
7931 doc: /* Alist to decide a coding system to use for a file I/O operation.
7932The format is ((PATTERN . VAL) ...),
7933where PATTERN is a regular expression matching a file name,
7934VAL is a coding system, a cons of coding systems, or a function symbol.
7935If VAL is a coding system, it is used for both decoding and encoding
7936the file contents.
7937If VAL is a cons of coding systems, the car part is used for decoding,
7938and the cdr part is used for encoding.
7939If VAL is a function symbol, the function must return a coding system
0192762c 7940or a cons of coding systems which are used as above. The function gets
ff955d90 7941the arguments with which `find-operation-coding-system' was called.
48b0f3ae
PJ
7942
7943See also the function `find-operation-coding-system'
7944and the variable `auto-coding-alist'. */);
02ba4723
KH
7945 Vfile_coding_system_alist = Qnil;
7946
7947 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
48b0f3ae
PJ
7948 doc: /* Alist to decide a coding system to use for a process I/O operation.
7949The format is ((PATTERN . VAL) ...),
7950where PATTERN is a regular expression matching a program name,
7951VAL is a coding system, a cons of coding systems, or a function symbol.
7952If VAL is a coding system, it is used for both decoding what is received
7953from the program and encoding what is sent to the program.
7954If VAL is a cons of coding systems, the car part is used for decoding,
7955and the cdr part is used for encoding.
7956If VAL is a function symbol, the function must return a coding system
7957or a cons of coding systems which are used as above.
7958
7959See also the function `find-operation-coding-system'. */);
02ba4723
KH
7960 Vprocess_coding_system_alist = Qnil;
7961
7962 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
48b0f3ae
PJ
7963 doc: /* Alist to decide a coding system to use for a network I/O operation.
7964The format is ((PATTERN . VAL) ...),
7965where PATTERN is a regular expression matching a network service name
7966or is a port number to connect to,
7967VAL is a coding system, a cons of coding systems, or a function symbol.
7968If VAL is a coding system, it is used for both decoding what is received
7969from the network stream and encoding what is sent to the network stream.
7970If VAL is a cons of coding systems, the car part is used for decoding,
7971and the cdr part is used for encoding.
7972If VAL is a function symbol, the function must return a coding system
7973or a cons of coding systems which are used as above.
7974
7975See also the function `find-operation-coding-system'. */);
02ba4723 7976 Vnetwork_coding_system_alist = Qnil;
4ed46869 7977
68c45bf0 7978 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
75205970
RS
7979 doc: /* Coding system to use with system messages.
7980Also used for decoding keyboard input on the X Window System. */);
68c45bf0
PE
7981 Vlocale_coding_system = Qnil;
7982
005f0d35 7983 /* The eol mnemonics are reset in startup.el system-dependently. */
7722baf9 7984 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
48b0f3ae 7985 doc: /* *String displayed in mode line for UNIX-like (LF) end-of-line format. */);
7722baf9 7986 eol_mnemonic_unix = build_string (":");
4ed46869 7987
7722baf9 7988 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
48b0f3ae 7989 doc: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format. */);
7722baf9 7990 eol_mnemonic_dos = build_string ("\\");
4ed46869 7991
7722baf9 7992 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
48b0f3ae 7993 doc: /* *String displayed in mode line for MAC-like (CR) end-of-line format. */);
7722baf9 7994 eol_mnemonic_mac = build_string ("/");
4ed46869 7995
7722baf9 7996 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
48b0f3ae 7997 doc: /* *String displayed in mode line when end-of-line format is not yet determined. */);
7722baf9 7998 eol_mnemonic_undecided = build_string (":");
4ed46869 7999
84fbb8a0 8000 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
48b0f3ae 8001 doc: /* *Non-nil enables character translation while encoding and decoding. */);
84fbb8a0 8002 Venable_character_translation = Qt;
bdd9fb48 8003
f967223b 8004 DEFVAR_LISP ("standard-translation-table-for-decode",
48b0f3ae
PJ
8005 &Vstandard_translation_table_for_decode,
8006 doc: /* Table for translating characters while decoding. */);
f967223b 8007 Vstandard_translation_table_for_decode = Qnil;
bdd9fb48 8008
f967223b 8009 DEFVAR_LISP ("standard-translation-table-for-encode",
48b0f3ae
PJ
8010 &Vstandard_translation_table_for_encode,
8011 doc: /* Table for translating characters while encoding. */);
f967223b 8012 Vstandard_translation_table_for_encode = Qnil;
4ed46869
KH
8013
8014 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
48b0f3ae
PJ
8015 doc: /* Alist of charsets vs revision numbers.
8016While encoding, if a charset (car part of an element) is found,
8017designate it with the escape sequence identifying revision (cdr part of the element). */);
4ed46869 8018 Vcharset_revision_alist = Qnil;
02ba4723
KH
8019
8020 DEFVAR_LISP ("default-process-coding-system",
8021 &Vdefault_process_coding_system,
48b0f3ae
PJ
8022 doc: /* Cons of coding systems used for process I/O by default.
8023The car part is used for decoding process output,
8024the cdr part is used for encoding text to be sent to a process. */);
02ba4723 8025 Vdefault_process_coding_system = Qnil;
c4825358 8026
3f003981 8027 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
48b0f3ae
PJ
8028 doc: /* Table of extra Latin codes in the range 128..159 (inclusive).
8029This is a vector of length 256.
8030If the Nth element is non-nil, the existence of code N in a file
8031\(or output of subprocess) doesn't prevent it from being detected as
8032a coding system of ISO 2022 variant which has a flag
8033`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file
8034or reading output of a subprocess.
8035Only the 128th through 159th elements have a meaning. */);
3f003981 8036 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
d46c5b12
KH
8037
8038 DEFVAR_LISP ("select-safe-coding-system-function",
8039 &Vselect_safe_coding_system_function,
48b0f3ae
PJ
8040 doc: /* Function to call to select a safe coding system for encoding text.
8041
8042If set, this function is called to force a user to select a proper
8043coding system which can encode the text in the case that a default
8044coding system used in each operation can't encode the text.
8045
8046The default value is `select-safe-coding-system' (which see). */);
d46c5b12
KH
8047 Vselect_safe_coding_system_function = Qnil;
8048
5d5bf4d8
KH
8049 DEFVAR_BOOL ("coding-system-require-warning",
8050 &coding_system_require_warning,
8051 doc: /* Internal use only.
6b89e3aa
KH
8052If non-nil, on writing a file, `select-safe-coding-system-function' is
8053called even if `coding-system-for-write' is non-nil. The command
8054`universal-coding-system-argument' binds this variable to t temporarily. */);
5d5bf4d8
KH
8055 coding_system_require_warning = 0;
8056
8057
22ab2303 8058 DEFVAR_BOOL ("inhibit-iso-escape-detection",
74383408 8059 &inhibit_iso_escape_detection,
48b0f3ae
PJ
8060 doc: /* If non-nil, Emacs ignores ISO2022 escape sequences during code detection.
8061
8062By default, on reading a file, Emacs tries to detect how the text is
8063encoded. This code detection is sensitive to escape sequences. If
8064the sequence is valid as ISO2022, the code is determined as one of
8065the ISO2022 encodings, and the file is decoded by the corresponding
8066coding system (e.g. `iso-2022-7bit').
8067
8068However, there may be cases where you want to read escape sequences in
8069a file as is. In such a case, you can set this variable to non-nil.
8070Then, as the code detection ignores any escape sequences, no file is
8071detected as encoded in some ISO2022 encoding. The result is that all
8072escape sequences become visible in a buffer.
8073
8074The default value is nil, and it is strongly recommended not to change
8075it. That is because many Emacs Lisp source files that contain
8076non-ASCII characters are encoded by the coding system `iso-2022-7bit'
8077in Emacs's distribution, and they won't be decoded correctly on
8078reading if you suppress escape sequence detection.
8079
8080The other way to read escape sequences in a file without decoding is
8081to explicitly specify some coding system that doesn't use ISO2022's
8082escape sequence (e.g. `latin-1') on reading by \\[universal-coding-system-argument]. */);
74383408 8083 inhibit_iso_escape_detection = 0;
002fdb44
DL
8084
8085 DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input,
15c8f9d1
DL
8086 doc: /* Char table for translating self-inserting characters.
8087This is applied to the result of input methods, not their input. See also
8088`keyboard-translate-table'. */);
002fdb44 8089 Vtranslation_table_for_input = Qnil;
4ed46869
KH
8090}
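
/* Illustration only, not part of coding.c.  The (PATTERN . VAL) alists
   documented above (`file-coding-system-alist' and its process and
   network counterparts) can be pictured as an ordered table that is
   searched for a pattern matching the target name.  The following is a
   minimal standalone sketch under two assumptions: the first matching
   entry wins, and POSIX extended regular expressions stand in for Emacs
   regexps.  `struct coding_entry', `lookup_coding_entry' and any table
   contents are hypothetical names that do not exist in Emacs.  */

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for one (PATTERN . (DECODING . ENCODING))
   element.  A VAL that is a single coding system would simply use
   the same name in both fields.  */
struct coding_entry
{
  const char *pattern;   /* POSIX extended regexp (assumption) */
  const char *decoding;  /* plays the role of the car part */
  const char *encoding;  /* plays the role of the cdr part */
};

/* Return the first entry of TABLE (N elements) whose pattern matches
   TARGET, or NULL if none matches.  Entries whose pattern fails to
   compile are skipped.  */
static const struct coding_entry *
lookup_coding_entry (const struct coding_entry *table, size_t n,
                     const char *target)
{
  assert (table != NULL && target != NULL);

  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, table[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;
      matched = regexec (&re, target, 0, NULL, 0) == 0;
      regfree (&re);

      if (matched)
        return &table[i];
    }
  return NULL;
}
```

/* A caller would pass a file name, program name, or network service
   name as TARGET and then use the `decoding' field when reading and
   the `encoding' field when writing.  The function-symbol form of VAL
   is not modelled here.  */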
8091
68c45bf0
PE
8092char *
8093emacs_strerror (error_number)
8094 int error_number;
8095{
8096 char *str;
8097
ca9c0567 8098 synchronize_system_messages_locale ();
68c45bf0
PE
8099 str = strerror (error_number);
8100
8101 if (! NILP (Vlocale_coding_system))
8102 {
8103 Lisp_Object dec = code_convert_string_norecord (build_string (str),
8104 Vlocale_coding_system,
8105 0);
d5db4077 8106 str = (char *) SDATA (dec);
68c45bf0
PE
8107 }
8108
8109 return str;
8110}
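
/* Illustration only, not part of coding.c.  When
   `locale-coding-system' is non-nil, the pointer returned by
   emacs_strerror appears to point into storage owned by the decoded
   Lisp string, and otherwise into whatever strerror returned, so a
   cautious caller may prefer to copy the message before doing anything
   else.  The following is a minimal standalone sketch of that copying
   pattern using only the C library; `copy_error_message' is a
   hypothetical helper that does not exist in Emacs.  */

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for ERROR_NUMBER into the
   caller-owned buffer BUF of SIZE bytes, truncating if necessary,
   and return BUF.  The result is always NUL-terminated.  */
static char *
copy_error_message (int error_number, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);

  /* strerror may return a pointer to storage that a later call
     overwrites, so the text is copied out immediately.  */
  snprintf (buf, size, "%s", strerror (error_number));
  return buf;
}
```

/* The caller decides the buffer size and lifetime, which avoids
   depending on how long the returned storage stays valid.  */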
8111
4ed46869 8112#endif /* emacs */
c2f94ebc 8113
ab5796a9
MB
8114/* arch-tag: 3a3a2b01-5ff6-4071-9afe-f5b808d9229d
8115 (do not change this comment) */