/* vi:set ts=8 sts=4 sw=4 noet:
 *
 * VIM - Vi IMproved    by Bram Moolenaar
 * Multibyte extensions partly by Sung-Hoon Baek
 *
 * Do ":help uganda"  in Vim to read copying and usage conditions.
 * Do ":help credits" in Vim to see a list of people who contributed.
 * See README.txt for an overview of the Vim source code.
 */

/*
 * mbyte.c: Code specifically for handling multi-byte characters.
 *
 * The encoding used in the core is set with 'encoding'.  When 'encoding' is
 * changed, the following four variables are set (for speed).
 * Currently these types of character encodings are supported:
 *
 * "enc_dbcs"       When non-zero, it tells the type of double-byte character
 *                  encoding (Chinese, Korean, Japanese, etc.).
 *                  The cell width on the display is equal to the number of
 *                  bytes (exception: DBCS_JPNU with first byte 0x8e).
 *                  Recognizing the first or second byte is difficult; it
 *                  requires checking a byte sequence from the start.
 * "enc_utf8"       When TRUE, use Unicode characters in UTF-8 encoding.
 *                  The cell width on the display needs to be determined from
 *                  the character value.
 *                  Recognizing bytes is easy: 0xxx.xxxx is a single-byte
 *                  char, 10xx.xxxx is a trailing byte, 11xx.xxxx is a leading
 *                  byte of a multi-byte character.
 *                  To make things complicated, up to six composing characters
 *                  are allowed.  These are drawn on top of the first char.
 *                  For most editing the sequence of bytes with composing
 *                  characters included is considered to be one character.
 * "enc_unicode"    When 2, use 16-bit Unicode characters (or UTF-16).
 *                  When 4, use 32-bit Unicode characters.
 *                  Internally characters are stored in UTF-8 encoding to
 *                  avoid NUL bytes.  Conversion happens when doing I/O.
 *                  "enc_utf8" will also be TRUE.
 *
 * "has_mbyte" is set when "enc_dbcs" or "enc_utf8" is non-zero.
 *
 * If none of these is TRUE, 8-bit bytes are used for a character.  The
 * encoding isn't currently specified (TODO).
 *
 * 'encoding' specifies the encoding used in the core.  This is used in
 * registers, text manipulation, buffers, etc.  Conversion has to be done when
 * characters in another encoding are received or sent:
 *
 *                     clipboard
 *                         ^
 *                         | (2)
 *                         V
 *                 +---------------+
 *            (1)  |               | (3)
 *  keyboard ----->|     core      |-----> display
 *                 |               |
 *                 +---------------+
 *                         ^
 *                         | (4)
 *                         V
 *                        file
 *
 * (1) Typed characters arrive in the current locale.  Conversion is to be
 *     done when 'encoding' is different from 'termencoding'.
 * (2) Text will be made available with the encoding specified with
 *     'encoding'.  If this is not sufficient, system-specific conversion
 *     might be required.
 * (3) For the GUI the correct font must be selected; no conversion is done.
 *     Otherwise, conversion is to be done when 'encoding' differs from
 *     'termencoding'.  (Different in the GTK+ 2 port -- 'termencoding'
 *     is always used for both input and output and must always be set to
 *     "utf-8".  gui_mch_init() does this automatically.)
 * (4) The encoding of the file is specified with 'fileencoding'.  Conversion
 *     is to be done when it's different from 'encoding'.
 *
 * The viminfo file is a special case: Only text is converted, not file names.
 * Vim scripts may contain an ":encoding" command.  This has an effect for
 * some commands, like ":menutrans".
 */
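
/*
 * Illustrative sketch, not called anywhere in Vim.  It contrasts the two
 * byte-recognition cases described above.
 *
 * For UTF-8, the start of the character that contains byte "p" is found by
 * stepping back over 10xx.xxxx trailing bytes.
 *
 * For a DBCS encoding this cannot be decided locally.  A byte such as 0x81
 * may be either a lead byte or a trail byte, so the scan has to start at a
 * known character boundary ("base") and walk forward.
 *
 * Assumptions: the DBCS lead-byte test below (any byte >= 0x81 starts a
 * two-byte character) is a simplification, since real code pages define
 * their own lead-byte ranges.  All "example_" names are hypothetical.
 */

// Return how many bytes "p" is past the start of its UTF-8 character.
static int
example_utf8_head_off(const unsigned char *base, const unsigned char *p)
{
    const unsigned char *q = p;

    // Step back while on a trailing byte (10xx.xxxx).
    while (q > base && (*q & 0xc0) == 0x80)
        --q;
    return (int)(p - q);
}

// Assumed (simplified) lead-byte test for a generic DBCS encoding.
static int
example_dbcs_is_lead(unsigned char c)
{
    return c >= 0x81;
}

// Return how many bytes "p" is past the start of its DBCS character.
// "base" must be at a character boundary and "p" inside the NUL-terminated
// string that starts at "base".
static int
example_dbcs_head_off(const unsigned char *base, const unsigned char *p)
{
    const unsigned char *q = base;

    for (;;)
    {
        // A lead byte followed by a non-NUL byte forms a two-byte
        // character; anything else is a single byte.
        int len = (example_dbcs_is_lead(*q) && q[1] != 0) ? 2 : 1;

        if (q + len > p)
            return (int)(p - q);
        q += len;
    }
}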

#include "vim.h"

#ifdef WIN32UNIX
# ifndef WIN32_LEAN_AND_MEAN
#  define WIN32_LEAN_AND_MEAN
# endif
# if defined(FEAT_GUI) || defined(FEAT_XCLIPBOARD)
#  ifdef __CYGWIN__
    // ControlMask from <X11/X.h> (included in "vim.h") is conflicting with
    // <w32api/windows.h> (included in <X11/Xwindows.h>).
#   undef ControlMask
#  endif
#  include <X11/Xwindows.h>
#  define WINBYTE wBYTE
# else
#  include <windows.h>
#  define WINBYTE BYTE
# endif
# ifdef WIN32
#  undef WIN32      // Some windows.h define WIN32, we don't want that here.
# endif
#else
# define WINBYTE BYTE
#endif

#if (defined(MSWIN) || defined(WIN32UNIX)) && !defined(__MINGW32__)
# include <winnls.h>
#endif

#ifdef FEAT_GUI_X11
# include <X11/Intrinsic.h>
#endif
#ifdef X_LOCALE
# include <X11/Xlocale.h>
# if !defined(HAVE_MBLEN) && !defined(mblen)
#  define mblen _Xmblen
# endif
#endif

#ifdef HAVE_WCHAR_H
# include <wchar.h>
#endif

#if 0
// This has been disabled, because several people reported problems with the
// wcwidth() and iswprint() library functions, esp. for Hebrew.
# ifdef __STDC_ISO_10646__
#  define USE_WCHAR_FUNCTIONS
# endif
#endif

static int dbcs_char2len(int c);
static int dbcs_char2bytes(int c, char_u *buf);
static int dbcs_ptr2len(char_u *p);
static int dbcs_ptr2len_len(char_u *p, int size);
static int utf_ptr2cells_len(char_u *p, int size);
static int dbcs_char2cells(int c);
static int dbcs_ptr2cells_len(char_u *p, int size);
static int dbcs_ptr2char(char_u *p);
static int dbcs_head_off(char_u *base, char_u *p);
#ifdef FEAT_EVAL
static int cw_value(int c);
#endif

/*
 * Lookup table to quickly get the length in bytes of a UTF-8 character from
 * the first byte of a UTF-8 string.
 * Bytes which are illegal when used as the first byte have a 1.
 * The NUL byte has length 1.
 */
static char utf8len_tab[256] =
{
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
    3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,1,1,
};
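
/*
 * Illustrative sketch, not called anywhere in Vim.  It shows where the
 * values in utf8len_tab[] come from: the number of leading one bits in the
 * first byte.
 *   - 0xxx.xxxx gives 1.
 *   - 110x.xxxx gives 2, and so on up to 1111.110x, which gives 6.
 *   - A trailing byte (10xx.xxxx) and the bytes 0xfe and 0xff give 1.
 * Comparing this function with utf8len_tab[i] for every i from 0 to 255
 * should find no difference.  The function name is hypothetical.
 */
static int
example_utf8_lead_len(unsigned char c)
{
    int ones = 0;

    // Count the leading one bits of the byte.
    while (ones < 8 && (c & (0x80 >> ones)) != 0)
        ++ones;
    if (ones == 0)
        return 1;           // ASCII, including NUL
    if (ones == 1 || ones > 6)
        return 1;           // trailing byte, 0xfe or 0xff: illegal lead byte
    return ones;
}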

/*
 * Like utf8len_tab above, but using a zero for illegal lead bytes.
 */
static char utf8len_tab_zero[256] =
{
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
    3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,0,0,
};
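
/*
 * Illustrative sketch, not called anywhere in Vim.  Because
 * utf8len_tab_zero[] has a zero for illegal lead bytes, a validator can
 * reject a bad first byte with a single check.
 *
 * This sketch computes the same value without the table.  It then checks
 * that the expected number of 10xx.xxxx trailing bytes follows.  It does
 * not reject overlong forms or surrogates.
 *
 * The function names are hypothetical.
 */
static int
example_utf8_len_zero(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if (c < 0xc0)
        return 0;           // trailing byte used as a lead byte
    if (c < 0xe0)
        return 2;
    if (c < 0xf0)
        return 3;
    if (c < 0xf8)
        return 4;
    if (c < 0xfc)
        return 5;
    if (c < 0xfe)
        return 6;
    return 0;               // 0xfe and 0xff never start a character
}

// Return the byte length of the sequence at "p", reading at most "size"
// bytes, or 0 when the lead byte is illegal, a trailing byte is wrong or
// the sequence is truncated.
static int
example_utf8_valid_len(const unsigned char *p, int size)
{
    int len;
    int i;

    if (size < 1)
        return 0;
    len = example_utf8_len_zero(p[0]);
    if (len == 0 || len > size)
        return 0;
    for (i = 1; i < len; ++i)
        if ((p[i] & 0xc0) != 0x80)
            return 0;
    return len;
}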

/*
 * Canonical encoding names and their properties.
 * "iso-8859-n" is handled by enc_canonize() directly.
 */
static struct
{   char *name; int prop; int codepage; }
enc_canon_table[] =
{
#define IDX_LATIN_1 0
    {"latin1", ENC_8BIT + ENC_LATIN1, 1252},
#define IDX_ISO_2 1
    {"iso-8859-2", ENC_8BIT, 0},
#define IDX_ISO_3 2
    {"iso-8859-3", ENC_8BIT, 0},
#define IDX_ISO_4 3
    {"iso-8859-4", ENC_8BIT, 0},
#define IDX_ISO_5 4
    {"iso-8859-5", ENC_8BIT, 0},
#define IDX_ISO_6 5
    {"iso-8859-6", ENC_8BIT, 0},
#define IDX_ISO_7 6
    {"iso-8859-7", ENC_8BIT, 0},
#define IDX_ISO_8 7
    {"iso-8859-8", ENC_8BIT, 0},
#define IDX_ISO_9 8
    {"iso-8859-9", ENC_8BIT, 0},
#define IDX_ISO_10 9
    {"iso-8859-10", ENC_8BIT, 0},
#define IDX_ISO_11 10
    {"iso-8859-11", ENC_8BIT, 0},
#define IDX_ISO_13 11
    {"iso-8859-13", ENC_8BIT, 0},
#define IDX_ISO_14 12
    {"iso-8859-14", ENC_8BIT, 0},
#define IDX_ISO_15 13
    {"iso-8859-15", ENC_8BIT + ENC_LATIN9, 0},
#define IDX_KOI8_R 14
    {"koi8-r", ENC_8BIT, 0},
#define IDX_KOI8_U 15
    {"koi8-u", ENC_8BIT, 0},
#define IDX_UTF8 16
    {"utf-8", ENC_UNICODE, 0},
#define IDX_UCS2 17
    {"ucs-2", ENC_UNICODE + ENC_ENDIAN_B + ENC_2BYTE, 0},
#define IDX_UCS2LE 18
    {"ucs-2le", ENC_UNICODE + ENC_ENDIAN_L + ENC_2BYTE, 0},
#define IDX_UTF16 19
    {"utf-16", ENC_UNICODE + ENC_ENDIAN_B + ENC_2WORD, 0},
#define IDX_UTF16LE 20
    {"utf-16le", ENC_UNICODE + ENC_ENDIAN_L + ENC_2WORD, 0},
#define IDX_UCS4 21
    {"ucs-4", ENC_UNICODE + ENC_ENDIAN_B + ENC_4BYTE, 0},
#define IDX_UCS4LE 22
    {"ucs-4le", ENC_UNICODE + ENC_ENDIAN_L + ENC_4BYTE, 0},

    // For debugging DBCS encoding on Unix.
#define IDX_DEBUG 23
    {"debug", ENC_DBCS, DBCS_DEBUG},
#define IDX_EUC_JP 24
    {"euc-jp", ENC_DBCS, DBCS_JPNU},
#define IDX_SJIS 25
    {"sjis", ENC_DBCS, DBCS_JPN},
#define IDX_EUC_KR 26
    {"euc-kr", ENC_DBCS, DBCS_KORU},
#define IDX_EUC_CN 27
    {"euc-cn", ENC_DBCS, DBCS_CHSU},
#define IDX_EUC_TW 28
    {"euc-tw", ENC_DBCS, DBCS_CHTU},
#define IDX_BIG5 29
    {"big5", ENC_DBCS, DBCS_CHT},

    // MS-DOS and MS-Windows codepages are included here, so that they can be
    // used on Unix too.  Most of them are similar to ISO-8859 encodings, but
    // not exactly the same.
#define IDX_CP437 30
    {"cp437", ENC_8BIT, 437}, // like iso-8859-1
#define IDX_CP737 31
    {"cp737", ENC_8BIT, 737}, // like iso-8859-7
#define IDX_CP775 32
    {"cp775", ENC_8BIT, 775}, // Baltic
#define IDX_CP850 33
    {"cp850", ENC_8BIT, 850}, // like iso-8859-1
#define IDX_CP852 34
    {"cp852", ENC_8BIT, 852}, // like iso-8859-2
#define IDX_CP855 35
    {"cp855", ENC_8BIT, 855}, // like iso-8859-5
#define IDX_CP857 36
    {"cp857", ENC_8BIT, 857}, // like iso-8859-9
#define IDX_CP860 37
    {"cp860", ENC_8BIT, 860}, // like iso-8859-1
#define IDX_CP861 38
    {"cp861", ENC_8BIT, 861}, // like iso-8859-1
#define IDX_CP862 39
    {"cp862", ENC_8BIT, 862}, // like iso-8859-8
#define IDX_CP863 40
    {"cp863", ENC_8BIT, 863}, // like iso-8859-1
#define IDX_CP865 41
    {"cp865", ENC_8BIT, 865}, // like iso-8859-1
#define IDX_CP866 42
    {"cp866", ENC_8BIT, 866}, // like iso-8859-5
#define IDX_CP869 43
    {"cp869", ENC_8BIT, 869}, // like iso-8859-7
#define IDX_CP874 44
    {"cp874", ENC_8BIT, 874}, // Thai
#define IDX_CP932 45
    {"cp932", ENC_DBCS, DBCS_JPN},
#define IDX_CP936 46
    {"cp936", ENC_DBCS, DBCS_CHS},
#define IDX_CP949 47
    {"cp949", ENC_DBCS, DBCS_KOR},
#define IDX_CP950 48
    {"cp950", ENC_DBCS, DBCS_CHT},
#define IDX_CP1250 49
    {"cp1250", ENC_8BIT, 1250}, // Czech, Polish, etc.
#define IDX_CP1251 50
    {"cp1251", ENC_8BIT, 1251}, // Cyrillic
    // cp1252 is considered to be equal to latin1
#define IDX_CP1253 51
    {"cp1253", ENC_8BIT, 1253}, // Greek
#define IDX_CP1254 52
    {"cp1254", ENC_8BIT, 1254}, // Turkish
#define IDX_CP1255 53
    {"cp1255", ENC_8BIT, 1255}, // Hebrew
#define IDX_CP1256 54
    {"cp1256", ENC_8BIT, 1256}, // Arabic
#define IDX_CP1257 55
    {"cp1257", ENC_8BIT, 1257}, // Baltic
#define IDX_CP1258 56
    {"cp1258", ENC_8BIT, 1258}, // Vietnamese

#define IDX_MACROMAN 57
    {"macroman", ENC_8BIT + ENC_MACROMAN, 0}, // Mac OS
#define IDX_DECMCS 58
    {"dec-mcs", ENC_8BIT, 0}, // DEC MCS
#define IDX_HPROMAN8 59
    {"hp-roman8", ENC_8BIT, 0}, // HP Roman8
#define IDX_COUNT 60
};
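
/*
 * Illustrative sketch, not called anywhere in Vim.  The "prop" field
 * combines ENC_ flags, and mb_init() below derives "enc_unicode" from them:
 *   - 2 for 16-bit units (ENC_2BYTE or ENC_2WORD),
 *   - 4 for ENC_4BYTE,
 *   - 0 otherwise.
 *
 * Assumptions: the EX_ENC_ names and values are hypothetical stand-ins.
 * The real ENC_ values are defined elsewhere in the Vim sources and may
 * differ.
 *
 * Because each flag is a distinct bit, combining them with "+" as the table
 * does gives the same result as "|".
 */
#define EX_ENC_8BIT     0x01
#define EX_ENC_DBCS     0x02
#define EX_ENC_UNICODE  0x04
#define EX_ENC_ENDIAN_B 0x10
#define EX_ENC_ENDIAN_L 0x20
#define EX_ENC_2BYTE    0x40
#define EX_ENC_4BYTE    0x80
#define EX_ENC_2WORD    0x100

// Return the value "enc_unicode" would get for properties "prop".
static int
example_unicode_width(int prop)
{
    if (!(prop & EX_ENC_UNICODE))
        return 0;           // not Unicode at all
    if (prop & (EX_ENC_2BYTE | EX_ENC_2WORD))
        return 2;           // UCS-2 or UTF-16
    if (prop & EX_ENC_4BYTE)
        return 4;           // UCS-4
    return 0;               // UTF-8
}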

/*
 * Aliases for encoding names.
 */
static struct
{   char *name; int canon; }
enc_alias_table[] =
{
    {"ansi", IDX_LATIN_1},
    {"iso-8859-1", IDX_LATIN_1},
    {"iso-8859", IDX_LATIN_1},
    {"latin2", IDX_ISO_2},
    {"latin3", IDX_ISO_3},
    {"latin4", IDX_ISO_4},
    {"cyrillic", IDX_ISO_5},
    {"arabic", IDX_ISO_6},
    {"greek", IDX_ISO_7},
#ifdef MSWIN
    {"hebrew", IDX_CP1255},
#else
    {"hebrew", IDX_ISO_8},
#endif
    {"latin5", IDX_ISO_9},
    {"turkish", IDX_ISO_9}, // ?
    {"latin6", IDX_ISO_10},
    {"nordic", IDX_ISO_10}, // ?
    {"thai", IDX_ISO_11}, // ?
    {"latin7", IDX_ISO_13},
    {"latin8", IDX_ISO_14},
    {"latin9", IDX_ISO_15},
    {"utf8", IDX_UTF8},
    {"unicode", IDX_UCS2},
    {"ucs2", IDX_UCS2},
    {"ucs2be", IDX_UCS2},
    {"ucs-2be", IDX_UCS2},
    {"ucs2le", IDX_UCS2LE},
    {"utf16", IDX_UTF16},
    {"utf16be", IDX_UTF16},
    {"utf-16be", IDX_UTF16},
    {"utf16le", IDX_UTF16LE},
    {"ucs4", IDX_UCS4},
    {"ucs4be", IDX_UCS4},
    {"ucs-4be", IDX_UCS4},
    {"ucs4le", IDX_UCS4LE},
    {"utf32", IDX_UCS4},
    {"utf-32", IDX_UCS4},
    {"utf32be", IDX_UCS4},
    {"utf-32be", IDX_UCS4},
    {"utf32le", IDX_UCS4LE},
    {"utf-32le", IDX_UCS4LE},
    {"932", IDX_CP932},
    {"949", IDX_CP949},
    {"936", IDX_CP936},
    {"gbk", IDX_CP936},
    {"950", IDX_CP950},
    {"eucjp", IDX_EUC_JP},
    {"unix-jis", IDX_EUC_JP},
    {"ujis", IDX_EUC_JP},
    {"shift-jis", IDX_SJIS},
    {"pck", IDX_SJIS}, // Sun: PCK
    {"euckr", IDX_EUC_KR},
    {"5601", IDX_EUC_KR}, // Sun: KS C 5601
    {"euccn", IDX_EUC_CN},
    {"gb2312", IDX_EUC_CN},
    {"euctw", IDX_EUC_TW},
#if defined(MSWIN) || defined(WIN32UNIX) || defined(MACOS_X)
    {"japan", IDX_CP932},
    {"korea", IDX_CP949},
    {"prc", IDX_CP936},
    {"chinese", IDX_CP936},
    {"taiwan", IDX_CP950},
    {"big5", IDX_CP950},
#else
    {"japan", IDX_EUC_JP},
    {"korea", IDX_EUC_KR},
    {"prc", IDX_EUC_CN},
    {"chinese", IDX_EUC_CN},
    {"taiwan", IDX_EUC_TW},
    {"cp950", IDX_BIG5},
    {"950", IDX_BIG5},
#endif
    {"mac", IDX_MACROMAN},
    {"mac-roman", IDX_MACROMAN},
    {NULL, 0}
};
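
/*
 * Illustrative sketch, not called anywhere in Vim.  It shows a two-level
 * lookup in the style of enc_canon_table[] and enc_alias_table[]:
 *   1. try the name as a canonical name;
 *   2. otherwise try it as an alias, and map the alias through its index
 *      into the canonical list.
 *
 * Assumptions: both lists are tiny hypothetical excerpts, and all
 * "example_" names are hypothetical.  This sketch only does exact,
 * case-sensitive matches.
 */
#include <string.h>

static const char *example_canon[] = {"latin1", "utf-8", "ucs-2le", NULL};

static const struct
{
    const char *name;
    int         canon;      // index into example_canon[]
} example_alias[] = {
    {"ansi", 0},
    {"utf8", 1},
    {"ucs2le", 2},
    {NULL, 0}
};

// Return the canonical name for "name", or NULL when it is unknown.
static const char *
example_resolve_encoding(const char *name)
{
    int i;

    for (i = 0; example_canon[i] != NULL; ++i)
        if (strcmp(name, example_canon[i]) == 0)
            return example_canon[i];
    for (i = 0; example_alias[i].name != NULL; ++i)
        if (strcmp(name, example_alias[i].name) == 0)
            return example_canon[example_alias[i].canon];
    return NULL;
}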

#ifndef CP_UTF8
# define CP_UTF8 65001      // magic number from winnls.h
#endif

/*
 * Find encoding "name" in the list of canonical encoding names.
 * Returns -1 if not found.
 */
    static int
enc_canon_search(char_u *name)
{
    int         i;

    for (i = 0; i < IDX_COUNT; ++i)
        if (STRCMP(name, enc_canon_table[i].name) == 0)
            return i;
    return -1;
}

/*
 * Find canonical encoding "name" in the list and return its properties.
 * Returns 0 if not found.
 */
    int
enc_canon_props(char_u *name)
{
    int         i;

    i = enc_canon_search(name);
    if (i >= 0)
        return enc_canon_table[i].prop;
#ifdef MSWIN
    if (name[0] == 'c' && name[1] == 'p' && VIM_ISDIGIT(name[2]))
    {
        CPINFO  cpinfo;

        // Get info on this codepage to find out what it is.
        if (GetCPInfo(atoi((char *)name + 2), &cpinfo) != 0)
        {
            if (cpinfo.MaxCharSize == 1) // some single-byte encoding
                return ENC_8BIT;
            if (cpinfo.MaxCharSize == 2
                    && (cpinfo.LeadByte[0] != 0 || cpinfo.LeadByte[1] != 0))
                // must be a DBCS encoding
                return ENC_DBCS;
        }
        return 0;
    }
#endif
    if (STRNCMP(name, "2byte-", 6) == 0)
        return ENC_DBCS;
    if (STRNCMP(name, "8bit-", 5) == 0 || STRNCMP(name, "iso-8859-", 9) == 0)
        return ENC_8BIT;
    return 0;
}
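
/*
 * Illustrative sketch, not called anywhere in Vim.  It isolates the prefix
 * rules at the end of enc_canon_props():
 *   - a name starting with "2byte-" is treated as a double-byte encoding;
 *   - a name starting with "8bit-" or "iso-8859-" is treated as a
 *     single-byte encoding.
 *
 * Assumptions: the return codes are hypothetical (2 for DBCS, 1 for 8-bit,
 * 0 for unknown), as is the function name.
 */
#include <string.h>

static int
example_prefix_props(const char *name)
{
    if (strncmp(name, "2byte-", 6) == 0)
        return 2;
    if (strncmp(name, "8bit-", 5) == 0 || strncmp(name, "iso-8859-", 9) == 0)
        return 1;
    return 0;
}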

/*
 * Set up for using multi-byte characters.
 * Called in three cases:
 * - by main() to initialize (p_enc == NULL)
 * - by set_init_1() after 'encoding' was set to its default.
 * - by do_set() when 'encoding' has been set.
 * p_enc must have been passed through enc_canonize() already.
 * Sets the "enc_unicode", "enc_utf8", "enc_dbcs" and "has_mbyte" flags.
 * Fills mb_bytelen_tab[] and returns NULL when there are no problems.
 * When there is something wrong: Returns an error message and doesn't change
 * anything.
 */
    char *
mb_init(void)
{
    int         i;
    int         idx;
    int         n;
    int         enc_dbcs_new = 0;
#if defined(USE_ICONV) && !defined(MSWIN) && !defined(WIN32UNIX) \
        && !defined(MACOS_CONVERT)
# define LEN_FROM_CONV
    vimconv_T   vimconv;
    char_u      *p;
#endif

    if (p_enc == NULL)
    {
        // Just starting up: set the whole table to ones.
        for (i = 0; i < 256; ++i)
            mb_bytelen_tab[i] = 1;
        input_conv.vc_type = CONV_NONE;
        input_conv.vc_factor = 1;
        output_conv.vc_type = CONV_NONE;
        return NULL;
    }

#ifdef MSWIN
    if (p_enc[0] == 'c' && p_enc[1] == 'p' && VIM_ISDIGIT(p_enc[2]))
    {
        CPINFO  cpinfo;

        // Get info on this codepage to find out what it is.
        if (GetCPInfo(atoi((char *)p_enc + 2), &cpinfo) != 0)
        {
            if (cpinfo.MaxCharSize == 1)
            {
                // some single-byte encoding
                enc_unicode = 0;
                enc_utf8 = FALSE;
            }
            else if (cpinfo.MaxCharSize == 2
                    && (cpinfo.LeadByte[0] != 0 || cpinfo.LeadByte[1] != 0))
            {
                // must be a DBCS encoding, check below
                enc_dbcs_new = atoi((char *)p_enc + 2);
            }
            else
                goto codepage_invalid;
        }
        else if (GetLastError() == ERROR_INVALID_PARAMETER)
        {
codepage_invalid:
            return N_(e_not_valid_codepage);
        }
    }
#endif
    else if (STRNCMP(p_enc, "8bit-", 5) == 0
            || STRNCMP(p_enc, "iso-8859-", 9) == 0)
    {
        // Accept any "8bit-" or "iso-8859-" name.
        enc_unicode = 0;
        enc_utf8 = FALSE;
    }
    else if (STRNCMP(p_enc, "2byte-", 6) == 0)
    {
#ifdef MSWIN
        // Windows: accept only valid codepage numbers, check below.
        if (p_enc[6] != 'c' || p_enc[7] != 'p'
                || (enc_dbcs_new = atoi((char *)p_enc + 8)) == 0)
            return e_invalid_argument;
#else
        // Unix: accept any "2byte-" name, assume current locale.
        enc_dbcs_new = DBCS_2BYTE;
#endif
    }
    else if ((idx = enc_canon_search(p_enc)) >= 0)
    {
        i = enc_canon_table[idx].prop;
        if (i & ENC_UNICODE)
        {
            // Unicode
            enc_utf8 = TRUE;
            if (i & (ENC_2BYTE | ENC_2WORD))
                enc_unicode = 2;
            else if (i & ENC_4BYTE)
                enc_unicode = 4;
            else
                enc_unicode = 0;
        }
        else if (i & ENC_DBCS)
        {
            // 2byte, handle below
            enc_dbcs_new = enc_canon_table[idx].codepage;
        }
        else
        {
            // Must be 8-bit.
            enc_unicode = 0;
            enc_utf8 = FALSE;
        }
    }
    else    // Don't know what encoding this is, reject it.
        return e_invalid_argument;

    if (enc_dbcs_new != 0)
    {
#ifdef MSWIN
        // Check if the DBCS code page is OK.
        if (!IsValidCodePage(enc_dbcs_new))
            goto codepage_invalid;
#endif
        enc_unicode = 0;
        enc_utf8 = FALSE;
    }
    enc_dbcs = enc_dbcs_new;
    has_mbyte = (enc_dbcs != 0 || enc_utf8);

#if defined(MSWIN) || defined(FEAT_CYGWIN_WIN32_CLIPBOARD)
    enc_codepage = encname2codepage(p_enc);
    enc_latin9 = (STRCMP(p_enc, "iso-8859-15") == 0);
#endif

    // Detect an encoding that uses latin1 characters.
    enc_latin1like = (enc_utf8 || STRCMP(p_enc, "latin1") == 0
                                    || STRCMP(p_enc, "iso-8859-15") == 0);

    /*
     * Set the function pointers.
     */
    if (enc_utf8)
    {
        mb_ptr2len = utfc_ptr2len;
        mb_ptr2len_len = utfc_ptr2len_len;
        mb_char2len = utf_char2len;
        mb_char2bytes = utf_char2bytes;
        mb_ptr2cells = utf_ptr2cells;
        mb_ptr2cells_len = utf_ptr2cells_len;
        mb_char2cells = utf_char2cells;
        mb_off2cells = utf_off2cells;
        mb_ptr2char = utf_ptr2char;
        mb_head_off = utf_head_off;
    }
    else if (enc_dbcs != 0)
    {
        mb_ptr2len = dbcs_ptr2len;
        mb_ptr2len_len = dbcs_ptr2len_len;
        mb_char2len = dbcs_char2len;
        mb_char2bytes = dbcs_char2bytes;
        mb_ptr2cells = dbcs_ptr2cells;
        mb_ptr2cells_len = dbcs_ptr2cells_len;
        mb_char2cells = dbcs_char2cells;
        mb_off2cells = dbcs_off2cells;
        mb_ptr2char = dbcs_ptr2char;
        mb_head_off = dbcs_head_off;
    }
    else
    {
        mb_ptr2len = latin_ptr2len;
        mb_ptr2len_len = latin_ptr2len_len;
        mb_char2len = latin_char2len;
        mb_char2bytes = latin_char2bytes;
        mb_ptr2cells = latin_ptr2cells;
        mb_ptr2cells_len = latin_ptr2cells_len;
        mb_char2cells = latin_char2cells;
        mb_off2cells = latin_off2cells;
        mb_ptr2char = latin_ptr2char;
        mb_head_off = latin_head_off;
    }

    /*
     * Fill the mb_bytelen_tab[] for MB_BYTE2LEN().
     */
#ifdef LEN_FROM_CONV
    // When 'encoding' is different from the current locale, mblen() won't
    // work.  Use conversion to "utf-8" instead.
    vimconv.vc_type = CONV_NONE;
    if (enc_dbcs)
    {
        p = enc_locale();
        if (p == NULL || STRCMP(p, p_enc) != 0)
        {
            convert_setup(&vimconv, p_enc, (char_u *)"utf-8");
            vimconv.vc_fail = TRUE;
        }
        vim_free(p);
    }
#endif

    for (i = 0; i < 256; ++i)
    {
        // Our own function to reliably check the length of UTF-8 characters,
        // independent of mblen().
        if (enc_utf8)
            n = utf8len_tab[i];
        else if (enc_dbcs == 0)
            n = 1;
        else
        {
#if defined(MSWIN) || defined(WIN32UNIX)
            // enc_dbcs is set by setting 'encoding'.  It becomes a Windows
            // CodePage identifier, which we can pass directly to the Windows
            // API.
            n = IsDBCSLeadByteEx(enc_dbcs, (WINBYTE)i) ? 2 : 1;
#else
|
2017-10-28 21:11:06 +02:00
|
|
|
|
# if defined(__amigaos4__) || defined(__ANDROID__) || \
|
|
|
|
|
!(defined(HAVE_MBLEN) || defined(X_LOCALE))
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* if mblen() is not available, character which MSB is turned on
|
|
|
|
|
* are treated as leading byte character. (note : This assumption
|
|
|
|
|
* is not always true.)
|
|
|
|
|
*/
|
|
|
|
|
n = (i & 0x80) ? 2 : 1;
|
|
|
|
|
# else
|
2012-06-01 15:21:02 +02:00
|
|
|
|
char buf[MB_MAXBYTES + 1];
|
2017-10-28 21:11:06 +02:00
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
if (i == NUL) // just in case mblen() can't handle ""
|
2004-06-13 20:20:40 +00:00
|
|
|
|
n = 1;
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
buf[0] = i;
|
|
|
|
|
buf[1] = 0;
|
2017-10-28 21:11:06 +02:00
|
|
|
|
# ifdef LEN_FROM_CONV
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (vimconv.vc_type != CONV_NONE)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* string_convert() should fail when converting the first
|
|
|
|
|
* byte of a double-byte character.
|
|
|
|
|
*/
|
|
|
|
|
p = string_convert(&vimconv, (char_u *)buf, NULL);
|
|
|
|
|
if (p != NULL)
|
|
|
|
|
{
|
|
|
|
|
vim_free(p);
|
|
|
|
|
n = 1;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
n = 2;
|
|
|
|
|
}
|
|
|
|
|
else
|
2017-10-28 21:11:06 +02:00
|
|
|
|
# endif
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* mblen() should return -1 for invalid (means the leading
|
|
|
|
|
* multibyte) character. However there are some platforms
|
|
|
|
|
* where mblen() returns 0 for invalid character.
|
|
|
|
|
* Therefore, following condition includes 0.
|
|
|
|
|
*/
|
2018-09-13 15:33:43 +02:00
|
|
|
|
vim_ignored = mblen(NULL, 0); // First reset the state.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (mblen(buf, (size_t)1) <= 0)
|
|
|
|
|
n = 2;
|
|
|
|
|
else
|
|
|
|
|
n = 1;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
# endif
|
|
|
|
|
#endif
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
mb_bytelen_tab[i] = n;
|
|
|
|
|
}

#ifdef LEN_FROM_CONV
    convert_setup(&vimconv, NULL, NULL);
#endif

    // The cell width depends on the type of multi-byte characters.
    (void)init_chartab();

    // When enc_utf8 is set or reset, (de)allocate ScreenLinesUC[]
    screenalloc(FALSE);

    // When using Unicode, set default for 'fileencodings'.
    if (enc_utf8 && !option_was_set((char_u *)"fencs"))
        set_fencs_unicode();

#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(FEAT_GETTEXT)
    // GNU gettext 0.10.37 supports this feature: set the codeset used for
    // translated messages independently from the current locale.
    (void)bind_textdomain_codeset(VIMPACKAGE,
                                          enc_utf8 ? "utf-8" : (char *)p_enc);
#endif

#ifdef MSWIN
    // When changing 'encoding' while starting up, convert the command line
    // arguments from the active codepage to 'encoding'.
    if (starting != 0)
        fix_arg_enc();
#endif

    // Fire an autocommand to let people do custom font setup. This must be
    // after Vim has been set up for the new encoding.
    apply_autocmds(EVENT_ENCODINGCHANGED, NULL, (char_u *)"", FALSE, curbuf);

#ifdef FEAT_SPELL
    // Need to reload spell dictionaries
    spell_reload();
#endif

    return NULL;
}
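The loop in mb_init() precomputes one length per possible first byte, so that MB_BYTE2LEN() is a single table lookup. Below is a minimal stand-alone sketch of the UTF-8 case, using the lead-byte rules from the file header (0xxx.xxxx single byte, 10xx.xxxx trailing byte, 11xx.xxxx lead byte). `utf8_len_from_lead()` and `fill_len_table()` are hypothetical names, and the sketch assumes RFC 3629 (at most 4 bytes), so Vim's own `utf8len_tab` may assign different lengths to bytes 0xF8 and above:

```c
/* Hypothetical helper: byte length of a UTF-8 sequence judged from its
 * first byte only (RFC 3629 rules, at most 4 bytes).  Trailing bytes and
 * invalid lead bytes count as length 1 so a scanner always advances. */
static int utf8_len_from_lead(unsigned char b)
{
    if (b < 0x80)
        return 1;           /* 0xxx.xxxx: ASCII */
    if (b < 0xC0)
        return 1;           /* 10xx.xxxx: trailing byte, treat as 1 */
    if (b < 0xE0)
        return 2;           /* 110x.xxxx */
    if (b < 0xF0)
        return 3;           /* 1110.xxxx */
    if (b < 0xF8)
        return 4;           /* 1111.0xxx */
    return 1;               /* 0xF8..0xFF: not valid under RFC 3629 */
}

/* Fill a 256-entry table once so later lookups are a single index,
 * the same idea as mb_bytelen_tab[] and MB_BYTE2LEN(). */
static void fill_len_table(unsigned char tab[256])
{
    for (int i = 0; i < 256; ++i)
        tab[i] = (unsigned char)utf8_len_from_lead((unsigned char)i);
}
```

After the table is filled, finding the length of the character under a pointer costs one array index, which is why mb_init() pays for the 256 iterations up front.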

/*
 * Return the size of the BOM for the current buffer:
 * 0 - no BOM
 * 2 - UCS-2 or UTF-16 BOM
 * 3 - UTF-8 BOM
 * 4 - UCS-4 BOM
 */
    int
bomb_size(void)
{
    int n = 0;

    if (curbuf->b_p_bomb && !curbuf->b_p_bin)
    {
        if (*curbuf->b_p_fenc == NUL)
        {
            if (enc_utf8)
            {
                if (enc_unicode != 0)
                    n = enc_unicode;
                else
                    n = 3;
            }
        }
        else if (STRCMP(curbuf->b_p_fenc, "utf-8") == 0)
            n = 3;
        else if (STRNCMP(curbuf->b_p_fenc, "ucs-2", 5) == 0
                || STRNCMP(curbuf->b_p_fenc, "utf-16", 6) == 0)
            n = 2;
        else if (STRNCMP(curbuf->b_p_fenc, "ucs-4", 5) == 0)
            n = 4;
    }
    return n;
}
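bomb_size() derives the BOM size from 'fileencoding'. The same sizes can also be recognised from the bytes at the start of a file. This is a hedged sketch with a hypothetical `detect_bom_size()`, not a Vim function; it tests the 4-byte marks first because the UCS-4 little-endian mark FF FE 00 00 begins with the UTF-16 little-endian mark FF FE:

```c
#include <stddef.h>

/* Hypothetical helper: size of the byte-order mark at the start of "p"
 * ("len" bytes available), using the sizes bomb_size() reports:
 * 4 for UCS-4, 3 for UTF-8, 2 for UCS-2/UTF-16, 0 for none. */
static int detect_bom_size(const unsigned char *p, size_t len)
{
    if (len >= 4 && p[0] == 0x00 && p[1] == 0x00
                 && p[2] == 0xFE && p[3] == 0xFF)
        return 4;                       /* UCS-4 big-endian */
    if (len >= 4 && p[0] == 0xFF && p[1] == 0xFE
                 && p[2] == 0x00 && p[3] == 0x00)
        return 4;                       /* UCS-4 little-endian */
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return 3;                       /* UTF-8 */
    if (len >= 2 && ((p[0] == 0xFE && p[1] == 0xFF)
                  || (p[0] == 0xFF && p[1] == 0xFE)))
        return 2;                       /* UCS-2 / UTF-16, either order */
    return 0;
}
```

FF FE 00 00 is reported as a UCS-4 mark here, although it could also be a UTF-16 little-endian mark followed by a NUL character; a real detector would need more context to decide.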

#if defined(FEAT_QUICKFIX) || defined(PROTO)
/*
 * Remove all BOMs from "s" by moving the remaining text.
 */
    void
remove_bom(char_u *s)
{
    if (!enc_utf8)
        return;

    char_u *p = s;

    while ((p = vim_strbyte(p, 0xef)) != NULL)
    {
        if (p[1] == 0xbb && p[2] == 0xbf)
            STRMOVE(p, p + 3);
        else
            ++p;
    }
}
#endif
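The loop in remove_bom() can be tried outside Vim by using the standard library in place of vim_strbyte() and STRMOVE(). `remove_utf8_boms()` is a hypothetical name, and the sketch assumes a writable NUL-terminated string; memmove() is needed because source and destination overlap:

```c
#include <string.h>

/* Hypothetical stand-alone version of remove_bom(): delete every UTF-8
 * BOM (EF BB BF) from the NUL-terminated string "s", in place. */
static void remove_utf8_boms(char *s)
{
    char *p = s;

    while ((p = strchr(p, 0xEF)) != NULL)
    {
        /* Short-circuit: p[2] is only read when p[1] is not the NUL. */
        if ((unsigned char)p[1] == 0xBB && (unsigned char)p[2] == 0xBF)
            memmove(p, p + 3, strlen(p + 3) + 1);   /* +1 keeps the NUL */
        else
            ++p;
    }
}
```

As in the original, "p" is not advanced after a removal, so back-to-back BOMs are all removed, while a lone 0xEF byte is left alone.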

/*
 * Get the class of the character at "p":
 * 0 for blank or NUL
 * 1 for punctuation
 * 2 for an (ASCII) word character
 * >2 for other word characters
 */
    int
mb_get_class(char_u *p)
{
    return mb_get_class_buf(p, curbuf);
}

    int
mb_get_class_buf(char_u *p, buf_T *buf)
{
    if (MB_BYTE2LEN(p[0]) == 1)
    {
        if (p[0] == NUL || VIM_ISWHITE(p[0]))
            return 0;
        if (vim_iswordc_buf(p[0], buf))
            return 2;
        return 1;
    }
    if (enc_dbcs != 0 && p[0] != NUL && p[1] != NUL)
        return dbcs_class(p[0], p[1]);
    if (enc_utf8)
        return utf_class_buf(utf_ptr2char(p), buf);
    return 0;
}
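The class values matter because consecutive characters with the same non-zero class form one word. A simplified, ASCII-only sketch of that idea, assuming a word character is alphanumeric or '_' instead of consulting 'iskeyword' through vim_iswordc_buf(); `ascii_class()` and `count_words()` are hypothetical helpers, not Vim functions:

```c
#include <ctype.h>

/* 0 for NUL or blank, 2 for a word character, 1 for anything else. */
static int ascii_class(unsigned char c)
{
    if (c == '\0' || c == ' ' || c == '\t')
        return 0;
    if (isalnum(c) || c == '_')
        return 2;
    return 1;
}

/* Count "words" as maximal runs of equal, non-zero class, which is
 * roughly how a word motion groups characters by class. */
static int count_words(const char *s)
{
    int count = 0;
    int prev = 0;

    for (; *s != '\0'; ++s)
    {
        int cls = ascii_class((unsigned char)*s);

        if (cls != 0 && cls != prev)
            ++count;
        prev = cls;
    }
    return count;
}
```

With these classes "foo.bar  baz" splits into four words: "foo", ".", "bar" and "baz".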

/*
 * Get the class of a double-byte character. Usually returns 3 or bigger;
 * for Japanese, a full-width space returns 0 and a few punctuation marks
 * return 1.
 * TODO: Should return 1 for punctuation in the other encodings too.
 */
    int
dbcs_class(unsigned lead, unsigned trail)
{
    switch (enc_dbcs)
    {
        // Please add a classification routine for your language here.

        case DBCS_JPNU: // ?
        case DBCS_JPN:
            {
                // JIS code classification
                unsigned char lb = lead;
                unsigned char tb = trail;

                // convert process code to JIS
# if defined(MSWIN) || defined(WIN32UNIX) || defined(MACOS_X)
                // process code is SJIS
                if (lb <= 0x9f)
                    lb = (lb - 0x81) * 2 + 0x21;
                else
                    lb = (lb - 0xc1) * 2 + 0x21;
                if (tb <= 0x7e)
                    tb -= 0x1f;
                else if (tb <= 0x9e)
                    tb -= 0x20;
                else
                {
                    tb -= 0x7e;
                    lb += 1;
                }
# else
                /*
                 * XXX: Code page identification cannot be used on every
                 * system, so some other encoding information will be needed.
                 * In Japanese: SJIS, EUC, UNICODE, (JIS)
                 * Note that most systems do not use JIS as the process code,
                 * because it uses escape sequences (JIS is a context-dependent
                 * encoding).
                 */
                // assume process code is JAPANESE-EUC
                lb &= 0x7f;
                tb &= 0x7f;
# endif
                // exceptions
                switch (lb << 8 | tb)
                {
                    case 0x2121: // ZENKAKU space
                        return 0;
                    case 0x2122: // TOU-TEN (Japanese comma)
                    case 0x2123: // KU-TEN (Japanese period)
                    case 0x2124: // ZENKAKU comma
                    case 0x2125: // ZENKAKU period
                        return 1;
                    case 0x213c: // prolonged sound mark, handled as KATAKANA
                        return 13;
                }
                // sieved by KU code
                switch (lb)
                {
                    case 0x21:
                    case 0x22:
                        // special symbols
                        return 10;
                    case 0x23:
                        // alphanumeric
                        return 11;
                    case 0x24:
                        // hiragana
                        return 12;
                    case 0x25:
                        // katakana
                        return 13;
                    case 0x26:
                        // greek
                        return 14;
                    case 0x27:
                        // russian
                        return 15;
                    case 0x28:
                        // lines
                        return 16;
                    default:
                        // kanji
                        return 17;
                }
            }

        case DBCS_KORU: // ?
        case DBCS_KOR:
            {
                // KS code classification
                unsigned char c1 = lead;
                unsigned char c2 = trail;

                /*
                 * 20 : Hangul
                 * 21 : Hanja
                 * 22 : Symbols
                 * 23 : Alphanumeric/Roman Letter (Full width)
                 * 24 : Hangul Letter(Alphabet)
                 * 25 : Roman Numeral/Greek Letter
                 * 26 : Box Drawings
                 * 27 : Unit Symbols
                 * 28 : Circled/Parenthesized Letter
                 * 29 : Hiragana/Katakana
                 * 30 : Cyrillic Letter
                 */

                if (c1 >= 0xB0 && c1 <= 0xC8)
                    // Hangul
                    return 20;
#if defined(MSWIN) || defined(WIN32UNIX)
                else if (c1 <= 0xA0 || c2 <= 0xA0)
                    // Extended Hangul Region : MS UHC(Unified Hangul Code)
                    // c1: 0x81-0xA0 with c2: 0x41-0x5A, 0x61-0x7A, 0x81-0xFE
                    // c1: 0xA1-0xC6 with c2: 0x41-0x5A, 0x61-0x7A, 0x81-0xA0
                    return 20;
#endif

                else if (c1 >= 0xCA && c1 <= 0xFD)
                    // Hanja
                    return 21;
                else switch (c1)
                {
                    case 0xA1:
                    case 0xA2:
                        // Symbols
                        return 22;
                    case 0xA3:
                        // Alphanumeric
                        return 23;
                    case 0xA4:
                        // Hangul Letter(Alphabet)
                        return 24;
                    case 0xA5:
                        // Roman Numeral/Greek Letter
                        return 25;
                    case 0xA6:
                        // Box Drawings
                        return 26;
                    case 0xA7:
                        // Unit Symbols
                        return 27;
                    case 0xA8:
                    case 0xA9:
                        if (c2 <= 0xAF)
                            return 25;  // Roman Letter
                        else if (c2 >= 0xF6)
                            return 22;  // Symbols
                        else
                            // Circled/Parenthesized Letter
                            return 28;
                    case 0xAA:
                    case 0xAB:
                        // Hiragana/Katakana
                        return 29;
                    case 0xAC:
                        // Cyrillic Letter
                        return 30;
                }
            }
        default:
            break;
    }
    return 3;
}
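The Shift_JIS branch of dbcs_class() converts a code to its JIS row/cell pair and then classifies by row ("ku"). Here is a stand-alone sketch of those two steps with the hypothetical names `sjis_to_jis()` and `jis_row_class()`. It assumes valid Shift_JIS input (lead 0x81-0x9F or 0xE0-0xEF) and skips the special cases (0x2121 to 0x2125 and 0x213c) that dbcs_class() checks before the row switch:

```c
/* Map a two-byte Shift_JIS code to JIS X 0208, packed as (row << 8) | cell,
 * using the same arithmetic as dbcs_class(). */
static unsigned sjis_to_jis(unsigned lead, unsigned trail)
{
    unsigned lb = lead;
    unsigned tb = trail;

    if (lb <= 0x9f)
        lb = (lb - 0x81) * 2 + 0x21;
    else
        lb = (lb - 0xc1) * 2 + 0x21;
    if (tb <= 0x7e)
        tb -= 0x1f;
    else if (tb <= 0x9e)
        tb -= 0x20;
    else
    {
        tb -= 0x7e;         /* trail in the upper half: next JIS row */
        lb += 1;
    }
    return (lb << 8) | tb;
}

/* Class by JIS row, with the same numbers dbcs_class() returns. */
static int jis_row_class(unsigned jis)
{
    switch (jis >> 8)
    {
        case 0x21: case 0x22: return 10;   /* special symbols */
        case 0x23: return 11;              /* alphanumeric */
        case 0x24: return 12;              /* hiragana */
        case 0x25: return 13;              /* katakana */
        case 0x26: return 14;              /* Greek */
        case 0x27: return 15;              /* Cyrillic */
        case 0x28: return 16;              /* line drawing */
        default:   return 17;              /* kanji */
    }
}
```

For example, 0x81 0x40 (the ideographic space) maps to 0x2121, and 0x82 0xA0 (hiragana "a") maps to 0x2422, which lands in the hiragana row.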

/*
 * mb_char2len() function pointer.
 * Return length in bytes of character "c".
 * Returns 1 for a single-byte character.
 */
    int
latin_char2len(int c UNUSED)
{
    return 1;
}

    static int
dbcs_char2len(
    int c)
{
    if (c >= 0x100)
        return 2;
    return 1;
}

/*
 * mb_char2bytes() function pointer.
 * Convert a character to its bytes.
 * Returns the length in bytes.
 */
    int
latin_char2bytes(int c, char_u *buf)
{
    buf[0] = c;
    return 1;
}

    static int
dbcs_char2bytes(int c, char_u *buf)
{
    if (c >= 0x100)
    {
        buf[0] = (unsigned)c >> 8;
        buf[1] = c;
        // Never use a NUL byte, it causes lots of trouble. It's an invalid
        // character anyway.
        if (buf[1] == NUL)
            buf[1] = '\n';
        return 2;
    }
    buf[0] = c;
    return 1;
}
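dbcs_char2bytes() stores a code >= 0x100 as the lead byte (high 8 bits) followed by the trail byte (low 8 bits). A sketch of that encoding together with a hypothetical inverse, `dbcs_decode()`, shows the round trip and where it is lossy; both names are illustrative and not part of Vim:

```c
/* Copy of the dbcs_char2bytes() logic: returns the number of bytes. */
static int dbcs_encode(int c, unsigned char *buf)
{
    if (c >= 0x100)
    {
        buf[0] = (unsigned char)((unsigned)c >> 8);
        buf[1] = (unsigned char)c;
        if (buf[1] == 0)
            buf[1] = '\n';      /* never emit an embedded NUL */
        return 2;
    }
    buf[0] = (unsigned char)c;
    return 1;
}

/* Hypothetical inverse for a sequence of "len" (1 or 2) bytes. */
static int dbcs_decode(const unsigned char *buf, int len)
{
    return len == 2 ? (buf[0] << 8) | buf[1] : buf[0];
}
```

The round trip only fails for a zero trail byte, which comes back as '\n'; as the comment in dbcs_char2bytes() notes, such a character is invalid anyway.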

/*
 * Get byte length of character at "*p". Returns zero when "*p" is NUL.
 * Used for mb_ptr2len() when 'encoding' is latin.
 */
    int
latin_ptr2len(char_u *p)
{
    return *p == NUL ? 0 : 1;
}

/*
 * Get byte length of character at "*p". Returns zero when "*p" is NUL.
 * Used for mb_ptr2len() when 'encoding' is DBCS.
 */
    static int
dbcs_ptr2len(char_u *p)
{
    int len;

    if (*p == NUL)
        return 0;

    // if the second byte is missing the length is 1
    len = MB_BYTE2LEN(*p);
    if (len == 2 && p[1] == NUL)
        len = 1;
    return len;
}

/*
 * mb_ptr2len_len() function pointer.
 * Like mb_ptr2len(), but read at most "size" bytes.
 * Returns 0 for an empty string.
 * Returns 1 for an illegal char or an incomplete byte sequence.
 */
    int
latin_ptr2len_len(char_u *p, int size)
{
    if (size < 1 || *p == NUL)
        return 0;
    return 1;
}

    static int
dbcs_ptr2len_len(char_u *p, int size)
{
    int len;

    if (size < 1 || *p == NUL)
        return 0;
    if (size == 1)
        return 1;
    // Check that second byte is not missing.
    len = MB_BYTE2LEN(*p);
    if (len == 2 && p[1] == NUL)
        len = 1;
    return len;
}
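dbcs_ptr2len_len() never reads past "size" bytes and treats a lead byte whose trail byte is missing as a single byte. A stand-alone sketch of the same steps, assuming the MSB heuristic from the mblen()-less branch of mb_init() in place of MB_BYTE2LEN(); `msb_byte2len()` and `sketch_ptr2len_len()` are hypothetical names:

```c
/* Fallback lead-byte test: any byte with the high bit set is a lead byte.
 * The comment in mb_init() warns this is not always true. */
static int msb_byte2len(unsigned char b)
{
    return (b & 0x80) ? 2 : 1;
}

/* Same steps as dbcs_ptr2len_len(): 0 for an empty string, at most "size"
 * bytes examined, and a missing trail byte makes the length 1. */
static int sketch_ptr2len_len(const unsigned char *p, int size)
{
    int len;

    if (size < 1 || *p == '\0')
        return 0;
    if (size == 1)
        return 1;
    len = msb_byte2len(*p);
    if (len == 2 && p[1] == '\0')
        len = 1;
    return len;
}
```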

struct interval
{
    long first;
    long last;
};

/*
 * Return TRUE if "c" is in "table[size / sizeof(struct interval)]".
 */
    static int
intable(struct interval *table, size_t size, int c)
{
    int mid, bot, top;

    // first quick check for Latin1 etc. characters
    if (c < table[0].first)
        return FALSE;

    // binary search in table
    bot = 0;
    top = (int)(size / sizeof(struct interval) - 1);
    while (top >= bot)
    {
        mid = (bot + top) / 2;
        if (table[mid].last < c)
            bot = mid + 1;
        else if (table[mid].first > c)
            top = mid - 1;
        else
            return TRUE;
    }
    return FALSE;
}
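intable() is a binary search over sorted, non-overlapping intervals, with an early exit for characters below the first interval. A sketch of the same search with hypothetical names (`sketch_interval`, `in_intervals()`); unlike intable() it takes an element count rather than a byte size and uses a half-open range so that an unsigned `size_t` index cannot underflow:

```c
#include <stddef.h>

struct sketch_interval
{
    long first;
    long last;
};

/* Return 1 if "c" lies in one of the "n" sorted intervals, else 0. */
static int in_intervals(const struct sketch_interval *table, size_t n, long c)
{
    size_t bot = 0;
    size_t top = n;             /* search the half-open range [bot, top) */

    if (n == 0 || c < table[0].first)
        return 0;               /* quick exit, as in intable() */
    while (bot < top)
    {
        size_t mid = bot + (top - bot) / 2;

        if (table[mid].last < c)
            bot = mid + 1;
        else if (table[mid].first > c)
            top = mid;
        else
            return 1;
    }
    return 0;
}
```

The half-open form is only a design choice for unsigned indices; intable() gets the same effect with signed `int` bounds and `top = mid - 1`.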

// Sorted list of non-overlapping intervals of East Asian Ambiguous
// characters, generated with ../runtime/tools/unicode.vim.
static struct interval ambiguous[] =
{
    {0x00a1, 0x00a1},
    {0x00a4, 0x00a4},
    {0x00a7, 0x00a8},
    {0x00aa, 0x00aa},
    {0x00ad, 0x00ae},
    {0x00b0, 0x00b4},
    {0x00b6, 0x00ba},
    {0x00bc, 0x00bf},
    {0x00c6, 0x00c6},
    {0x00d0, 0x00d0},
    {0x00d7, 0x00d8},
    {0x00de, 0x00e1},
    {0x00e6, 0x00e6},
    {0x00e8, 0x00ea},
    {0x00ec, 0x00ed},
    {0x00f0, 0x00f0},
    {0x00f2, 0x00f3},
    {0x00f7, 0x00fa},
    {0x00fc, 0x00fc},
    {0x00fe, 0x00fe},
    {0x0101, 0x0101},
    {0x0111, 0x0111},
    {0x0113, 0x0113},
    {0x011b, 0x011b},
    {0x0126, 0x0127},
    {0x012b, 0x012b},
    {0x0131, 0x0133},
    {0x0138, 0x0138},
    {0x013f, 0x0142},
    {0x0144, 0x0144},
    {0x0148, 0x014b},
    {0x014d, 0x014d},
    {0x0152, 0x0153},
    {0x0166, 0x0167},
    {0x016b, 0x016b},
    {0x01ce, 0x01ce},
    {0x01d0, 0x01d0},
    {0x01d2, 0x01d2},
    {0x01d4, 0x01d4},
    {0x01d6, 0x01d6},
    {0x01d8, 0x01d8},
    {0x01da, 0x01da},
    {0x01dc, 0x01dc},
    {0x0251, 0x0251},
    {0x0261, 0x0261},
    {0x02c4, 0x02c4},
    {0x02c7, 0x02c7},
    {0x02c9, 0x02cb},
    {0x02cd, 0x02cd},
    {0x02d0, 0x02d0},
    {0x02d8, 0x02db},
    {0x02dd, 0x02dd},
    {0x02df, 0x02df},
    {0x0300, 0x036f},
    {0x0391, 0x03a1},
    {0x03a3, 0x03a9},
    {0x03b1, 0x03c1},
    {0x03c3, 0x03c9},
    {0x0401, 0x0401},
    {0x0410, 0x044f},
    {0x0451, 0x0451},
    {0x2010, 0x2010},
    {0x2013, 0x2016},
    {0x2018, 0x2019},
    {0x201c, 0x201d},
    {0x2020, 0x2022},
    {0x2024, 0x2027},
    {0x2030, 0x2030},
    {0x2032, 0x2033},
    {0x2035, 0x2035},
    {0x203b, 0x203b},
    {0x203e, 0x203e},
    {0x2074, 0x2074},
    {0x207f, 0x207f},
    {0x2081, 0x2084},
    {0x20ac, 0x20ac},
    {0x2103, 0x2103},
    {0x2105, 0x2105},
    {0x2109, 0x2109},
    {0x2113, 0x2113},
    {0x2116, 0x2116},
    {0x2121, 0x2122},
    {0x2126, 0x2126},
    {0x212b, 0x212b},
    {0x2153, 0x2154},
    {0x215b, 0x215e},
    {0x2160, 0x216b},
    {0x2170, 0x2179},
    {0x2189, 0x2189},
    {0x2190, 0x2199},
    {0x21b8, 0x21b9},
    {0x21d2, 0x21d2},
    {0x21d4, 0x21d4},
    {0x21e7, 0x21e7},
    {0x2200, 0x2200},
    {0x2202, 0x2203},
    {0x2207, 0x2208},
    {0x220b, 0x220b},
    {0x220f, 0x220f},
    {0x2211, 0x2211},
    {0x2215, 0x2215},
    {0x221a, 0x221a},
    {0x221d, 0x2220},
    {0x2223, 0x2223},
    {0x2225, 0x2225},
    {0x2227, 0x222c},
    {0x222e, 0x222e},
    {0x2234, 0x2237},
    {0x223c, 0x223d},
    {0x2248, 0x2248},
    {0x224c, 0x224c},
    {0x2252, 0x2252},
    {0x2260, 0x2261},
    {0x2264, 0x2267},
    {0x226a, 0x226b},
    {0x226e, 0x226f},
    {0x2282, 0x2283},
    {0x2286, 0x2287},
    {0x2295, 0x2295},
    {0x2299, 0x2299},
    {0x22a5, 0x22a5},
    {0x22bf, 0x22bf},
    {0x2312, 0x2312},
    {0x2460, 0x24e9},
    {0x24eb, 0x254b},
    {0x2550, 0x2573},
    {0x2580, 0x258f},
    {0x2592, 0x2595},
    {0x25a0, 0x25a1},
    {0x25a3, 0x25a9},
    {0x25b2, 0x25b3},
    {0x25b6, 0x25b7},
    {0x25bc, 0x25bd},
    {0x25c0, 0x25c1},
    {0x25c6, 0x25c8},
    {0x25cb, 0x25cb},
    {0x25ce, 0x25d1},
    {0x25e2, 0x25e5},
    {0x25ef, 0x25ef},
    {0x2605, 0x2606},
    {0x2609, 0x2609},
    {0x260e, 0x260f},
    {0x261c, 0x261c},
    {0x261e, 0x261e},
    {0x2640, 0x2640},
    {0x2642, 0x2642},
    {0x2660, 0x2661},
    {0x2663, 0x2665},
    {0x2667, 0x266a},
    {0x266c, 0x266d},
    {0x266f, 0x266f},
    {0x269e, 0x269f},
    {0x26bf, 0x26bf},
    {0x26c6, 0x26cd},
    {0x26cf, 0x26d3},
    {0x26d5, 0x26e1},
    {0x26e3, 0x26e3},
    {0x26e8, 0x26e9},
    {0x26eb, 0x26f1},
    {0x26f4, 0x26f4},
    {0x26f6, 0x26f9},
    {0x26fb, 0x26fc},
    {0x26fe, 0x26ff},
    {0x273d, 0x273d},
    {0x2776, 0x277f},
    {0x2b56, 0x2b59},
    {0x3248, 0x324f},
    {0xe000, 0xf8ff},
    {0xfe00, 0xfe0f},
    {0xfffd, 0xfffd},
    {0x1f100, 0x1f10a},
    {0x1f110, 0x1f12d},
    {0x1f130, 0x1f169},
    {0x1f170, 0x1f18d},
    {0x1f18f, 0x1f190},
    {0x1f19b, 0x1f1ac},
    {0xe0100, 0xe01ef},
    {0xf0000, 0xffffd},
    {0x100000, 0x10fffd}
};
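A table such as ambiguous[] only changes the result when 'ambiwidth' is "double". A hedged sketch of how a width lookup can combine it with a double-width table: the two tables are tiny excerpts chosen for illustration, `sketch_char2cells()` is hypothetical, a linear scan stands in for intable(), and the real utf_char2cells() also consults setcellwidths(), wcwidth() and the emoji table:

```c
#include <stddef.h>

struct width_interval
{
    long first;
    long last;
};

/* Small excerpts of ambiguous[] and doublewidth[]; illustration only. */
static const struct width_interval amb_excerpt[] = {
    {0x00a1, 0x00a1}, {0x00a4, 0x00a4}, {0x2010, 0x2010}
};
static const struct width_interval dbl_excerpt[] = {
    {0x1100, 0x115f}, {0x3041, 0x3096}, {0x4e00, 0xa48c}
};

static int in_list(const struct width_interval *t, size_t n, long c)
{
    for (size_t i = 0; i < n; ++i)      /* linear scan: fine for a sketch */
        if (c >= t[i].first && c <= t[i].last)
            return 1;
    return 0;
}

/* Cells for a printable character >= 0x80: 2 for East Asian wide, 2 for
 * ambiguous when "ambw_double" is set, otherwise 1. */
static int sketch_char2cells(long c, int ambw_double)
{
    if (in_list(dbl_excerpt, sizeof dbl_excerpt / sizeof dbl_excerpt[0], c))
        return 2;
    if (ambw_double
            && in_list(amb_excerpt, sizeof amb_excerpt / sizeof amb_excerpt[0], c))
        return 2;
    return 1;
}
```

The inverted exclamation mark U+00A1 thus takes one cell by default and two cells when the ambiguous-width flag is set, while a CJK ideograph such as U+4E00 always takes two.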

#if defined(FEAT_TERMINAL) || defined(PROTO)
/*
 * utf_char2cells() with different argument type for libvterm.
 */
    int
utf_uint2cells(UINT32_T c)
{
    if (c >= 0x100 && utf_iscomposing((int)c))
        return 0;
    return utf_char2cells((int)c);
}
#endif

/*
 * For UTF-8 character "c" return 2 for a double-width character, 1 for others.
 * Returns 4 or 6 for an unprintable character.
 * Only correct for characters >= 0x80.
 * When p_ambw is "double", return 2 for a character with East Asian Width
 * class 'A'(mbiguous).
 */
    int
utf_char2cells(int c)
{
    // Sorted list of non-overlapping intervals of East Asian double width
    // characters, generated with ../runtime/tools/unicode.vim.
    static struct interval doublewidth[] =
    {
        {0x1100, 0x115f},
        {0x231a, 0x231b},
        {0x2329, 0x232a},
        {0x23e9, 0x23ec},
        {0x23f0, 0x23f0},
        {0x23f3, 0x23f3},
        {0x25fd, 0x25fe},
        {0x2614, 0x2615},
        {0x2648, 0x2653},
        {0x267f, 0x267f},
        {0x2693, 0x2693},
        {0x26a1, 0x26a1},
        {0x26aa, 0x26ab},
        {0x26bd, 0x26be},
        {0x26c4, 0x26c5},
        {0x26ce, 0x26ce},
        {0x26d4, 0x26d4},
        {0x26ea, 0x26ea},
        {0x26f2, 0x26f3},
        {0x26f5, 0x26f5},
        {0x26fa, 0x26fa},
        {0x26fd, 0x26fd},
        {0x2705, 0x2705},
        {0x270a, 0x270b},
        {0x2728, 0x2728},
        {0x274c, 0x274c},
        {0x274e, 0x274e},
        {0x2753, 0x2755},
        {0x2757, 0x2757},
        {0x2795, 0x2797},
        {0x27b0, 0x27b0},
        {0x27bf, 0x27bf},
        {0x2b1b, 0x2b1c},
        {0x2b50, 0x2b50},
        {0x2b55, 0x2b55},
        {0x2e80, 0x2e99},
        {0x2e9b, 0x2ef3},
        {0x2f00, 0x2fd5},
        {0x2ff0, 0x303e},
        {0x3041, 0x3096},
        {0x3099, 0x30ff},
        {0x3105, 0x312f},
        {0x3131, 0x318e},
        {0x3190, 0x31e3},
        {0x31ef, 0x321e},
        {0x3220, 0x3247},
        {0x3250, 0x4dbf},
        {0x4e00, 0xa48c},
        {0xa490, 0xa4c6},
        {0xa960, 0xa97c},
        {0xac00, 0xd7a3},
        {0xf900, 0xfaff},
        {0xfe10, 0xfe19},
        {0xfe30, 0xfe52},
        {0xfe54, 0xfe66},
        {0xfe68, 0xfe6b},
        {0xff01, 0xff60},
        {0xffe0, 0xffe6},
        {0x16fe0, 0x16fe3},
        {0x16ff0, 0x16ff1},
        {0x17000, 0x187f7},
        {0x18800, 0x18cd5},
        {0x18d00, 0x18d08},
        {0x1aff0, 0x1aff3},
        {0x1aff5, 0x1affb},
        {0x1affd, 0x1affe},
        {0x1b000, 0x1b122},
        {0x1b132, 0x1b132},
        {0x1b150, 0x1b152},
        {0x1b155, 0x1b155},
        {0x1b164, 0x1b167},
        {0x1b170, 0x1b2fb},
        {0x1f004, 0x1f004},
        {0x1f0cf, 0x1f0cf},
        {0x1f18e, 0x1f18e},
        {0x1f191, 0x1f19a},
        {0x1f200, 0x1f202},
        {0x1f210, 0x1f23b},
        {0x1f240, 0x1f248},
        {0x1f250, 0x1f251},
        {0x1f260, 0x1f265},
        {0x1f300, 0x1f320},
        {0x1f32d, 0x1f335},
        {0x1f337, 0x1f37c},
        {0x1f37e, 0x1f393},
        {0x1f3a0, 0x1f3ca},
        {0x1f3cf, 0x1f3d3},
        {0x1f3e0, 0x1f3f0},
        {0x1f3f4, 0x1f3f4},
        {0x1f3f8, 0x1f43e},
        {0x1f440, 0x1f440},
        {0x1f442, 0x1f4fc},
        {0x1f4ff, 0x1f53d},
        {0x1f54b, 0x1f54e},
        {0x1f550, 0x1f567},
        {0x1f57a, 0x1f57a},
        {0x1f595, 0x1f596},
        {0x1f5a4, 0x1f5a4},
        {0x1f5fb, 0x1f64f},
        {0x1f680, 0x1f6c5},
        {0x1f6cc, 0x1f6cc},
        {0x1f6d0, 0x1f6d2},
        {0x1f6d5, 0x1f6d7},
        {0x1f6dc, 0x1f6df},
        {0x1f6eb, 0x1f6ec},
        {0x1f6f4, 0x1f6fc},
        {0x1f7e0, 0x1f7eb},
        {0x1f7f0, 0x1f7f0},
        {0x1f90c, 0x1f93a},
        {0x1f93c, 0x1f945},
        {0x1f947, 0x1f9ff},
        {0x1fa70, 0x1fa7c},
        {0x1fa80, 0x1fa88},
        {0x1fa90, 0x1fabd},
        {0x1fabf, 0x1fac5},
        {0x1face, 0x1fadb},
        {0x1fae0, 0x1fae8},
        {0x1faf0, 0x1faf8},
        {0x20000, 0x2fffd},
        {0x30000, 0x3fffd}
    };

    // Sorted list of non-overlapping intervals of Emoji characters that don't
    // have ambiguous or double width, based on
    // http://unicode.org/emoji/charts/emoji-list.html
    static struct interval emoji_wide[] =
    {
        {0x23ed, 0x23ef},
        {0x23f1, 0x23f2},
        {0x23f8, 0x23fa},
        {0x24c2, 0x24c2},
        {0x261d, 0x261d},
        {0x26c8, 0x26c8},
        {0x26cf, 0x26cf},
        {0x26d1, 0x26d1},
        {0x26d3, 0x26d3},
        {0x26e9, 0x26e9},
        {0x26f0, 0x26f1},
        {0x26f7, 0x26f9},
        {0x270c, 0x270d},
        {0x2934, 0x2935},
        {0x1f170, 0x1f189},
        {0x1f1e6, 0x1f1ff},
        {0x1f321, 0x1f321},
        {0x1f324, 0x1f32c},
        {0x1f336, 0x1f336},
        {0x1f37d, 0x1f37d},
        {0x1f396, 0x1f397},
        {0x1f399, 0x1f39b},
        {0x1f39e, 0x1f39f},
        {0x1f3cb, 0x1f3ce},
        {0x1f3d4, 0x1f3df},
        {0x1f3f3, 0x1f3f5},
        {0x1f3f7, 0x1f3f7},
        {0x1f43f, 0x1f43f},
        {0x1f441, 0x1f441},
        {0x1f4fd, 0x1f4fd},
        {0x1f549, 0x1f54a},
        {0x1f56f, 0x1f570},
        {0x1f573, 0x1f579},
        {0x1f587, 0x1f587},
        {0x1f58a, 0x1f58d},
        {0x1f590, 0x1f590},
        {0x1f5a5, 0x1f5a5},
        {0x1f5a8, 0x1f5a8},
        {0x1f5b1, 0x1f5b2},
        {0x1f5bc, 0x1f5bc},
        {0x1f5c2, 0x1f5c4},
        {0x1f5d1, 0x1f5d3},
        {0x1f5dc, 0x1f5de},
        {0x1f5e1, 0x1f5e1},
        {0x1f5e3, 0x1f5e3},
        {0x1f5e8, 0x1f5e8},
        {0x1f5ef, 0x1f5ef},
        {0x1f5f3, 0x1f5f3},
        {0x1f5fa, 0x1f5fa},
        {0x1f6cb, 0x1f6cf},
        {0x1f6e0, 0x1f6e5},
        {0x1f6e9, 0x1f6e9},
        {0x1f6f0, 0x1f6f0},
        {0x1f6f3, 0x1f6f3}

#ifdef MACOS_X
        // Include SF Symbols 4 characters, which should be rendered as
        // double-width. SF Symbols is an Apple-specific set of symbols and
        // icons for use in Apple operating systems. They are included as
        // glyphs in the default San Francisco fonts shipped with macOS.
        // The current version is SF Symbols 4.
        //
        // These Apple-specific glyphs are not part of standard Unicode, and
        // all of them are in the Supplementary Private Use Area-B range. The
        // exact range was determined by downloading the 'SF Symbols 4' app
        // from Apple (https://developer.apple.com/sf-symbols/), selecting
        // all symbols, copying them out, and inspecting their Unicode
        // values.
        //
        // Note that these symbols are of varying widths, as they represent
        // different things ranging from a simple gear icon to an airplane.
        // Some of them are in fact wider than double-width, but Vim doesn't
        // support non-fixed-width fonts, so tagging them as double-width is
        // the best way to handle them.
        //
        // Also see https://en.wikipedia.org/wiki/San_Francisco_(sans-serif_typeface)#SF_Symbols
        , {0x100000, 0x1018c7}
#endif
    };

#ifdef FEAT_EVAL
    // Use the value from setcellwidths() at 0x80 and higher, unless the
    // character is not printable.
    if (c >= 0x80 &&
# ifdef USE_WCHAR_FUNCTIONS
            wcwidth(c) >= 1 &&
# endif
            vim_isprintc(c))
    {
        int n = cw_value(c);
        if (n != 0)
            return n;
    }
#endif

    if (c >= 0x100)
    {
#ifdef USE_WCHAR_FUNCTIONS
        int n;

        /*
         * Assume the library function wcwidth() works better than our own
         * stuff. It should return 1 for ambiguous width chars!
         */
        n = wcwidth(c);

        if (n < 0)
            return 6;           // unprintable, displays <xxxx>
        if (n > 1)
            return n;
#else
        if (!utf_printable(c))
            return 6;           // unprintable, displays <xxxx>
        if (intable(doublewidth, sizeof(doublewidth), c))
            return 2;
#endif
        if (p_emoji && intable(emoji_wide, sizeof(emoji_wide), c))
            return 2;
    }

    // Characters below 0x100 are influenced by 'isprint' option
    else if (c >= 0x80 && !vim_isprintc(c))
        return 4;               // unprintable, displays <xx>

    if (c >= 0x80 && *p_ambw == 'd' && intable(ambiguous, sizeof(ambiguous), c))
        return 2;

    return 1;
}
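The width checks above depend on `intable()`, which is defined elsewhere in this file and not shown here. As an illustration only, here is a minimal binary search over a sorted table of non-overlapping intervals, assuming that is roughly how such a lookup works. `struct sketch_interval` and `sketch_in_table()` are hypothetical names, and unlike `intable()` this version takes an element count instead of a byte size.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "struct interval" used by the tables. */
struct sketch_interval
{
    long first;
    long last;
};

/*
 * Return 1 when "c" lies inside one of the "n" sorted, non-overlapping
 * intervals in "table", 0 otherwise (binary search, O(log n)).
 */
static int
sketch_in_table(const struct sketch_interval *table, size_t n, long c)
{
    size_t lo = 0;
    size_t hi = n;

    // Cheap rejection for values below the first interval.
    if (n == 0 || c < table[0].first)
        return 0;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (c > table[mid].last)
            lo = mid + 1;       // continue in the upper half
        else if (c < table[mid].first)
            hi = mid;           // continue in the lower half
        else
            return 1;           // first <= c <= last
    }
    return 0;
}

/* A few entries taken from the emoji interval table above. */
static const struct sketch_interval sketch_emoji_wide[] =
{
    {0x1f321, 0x1f321},
    {0x1f324, 0x1f32c},
    {0x1f336, 0x1f336},
    {0x1f6f3, 0x1f6f3}
};
```

Passing the count as `sizeof tbl / sizeof tbl[0]` keeps bytes and elements from being mixed up; the early rejection keeps the common low-codepoint case cheap.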

/*
 * mb_ptr2cells() function pointer.
 * Return the number of display cells the character at "*p" occupies.
 * This doesn't take care of unprintable characters; use ptr2cells() for that.
 */
    int
latin_ptr2cells(char_u *p UNUSED)
{
    return 1;
}

    int
utf_ptr2cells(
    char_u *p)
{
    int c;

    // Need to convert to a character number.
    if (*p >= 0x80)
    {
        c = utf_ptr2char(p);
        // An illegal byte is displayed as <xx>.
        if (utf_ptr2len(p) == 1 || c == NUL)
            return 4;
        // If the char is ASCII it must be an overlong sequence.
        if (c < 0x80)
            return char2cells(c);
        return utf_char2cells(c);
    }
    return 1;
}

    int
dbcs_ptr2cells(char_u *p)
{
    // Number of cells is equal to number of bytes, except for euc-jp when
    // the first byte is 0x8e.
    if (enc_dbcs == DBCS_JPNU && *p == 0x8e)
        return 1;
    return MB_BYTE2LEN(*p);
}
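`dbcs_ptr2cells()` and its relatives treat the cell count as the byte count, with the single exception of EUC-JP (`DBCS_JPNU`) characters whose first byte is 0x8e. The sketch below shows that rule in isolation. The lead-byte test is an assumption standing in for `MB_BYTE2LEN()`, it ignores the three-byte 0x8f form, and both function names are hypothetical.

```c
/*
 * Hypothetical EUC-JP byte-length rule: 0x8e (half-width katakana prefix)
 * and 0xa1..0xfe start a two-byte character; everything else is one byte.
 * The three-byte 0x8f form is ignored in this sketch.
 */
static int
sketch_eucjp_byte2len(unsigned char b)
{
    return (b == 0x8e || (b >= 0xa1 && b <= 0xfe)) ? 2 : 1;
}

/* Display cells for the character starting at "p". */
static int
sketch_eucjp_ptr2cells(const unsigned char *p)
{
    // 0x8e plus a trail byte: two bytes, but a single half-width cell.
    if (*p == 0x8e)
        return 1;
    // Otherwise the number of cells equals the number of bytes.
    return sketch_eucjp_byte2len(*p);
}
```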

/*
 * mb_ptr2cells_len() function pointer.
 * Like mb_ptr2cells(), but limit string length to "size".
 * Returns 1 for an empty string or a truncated character.
 */
    int
latin_ptr2cells_len(char_u *p UNUSED, int size UNUSED)
{
    return 1;
}

    static int
utf_ptr2cells_len(char_u *p, int size)
{
    int c;

    // Need to convert to a wide character.
    if (size > 0 && *p >= 0x80)
    {
        if (utf_ptr2len_len(p, size) < utf8len_tab[*p])
            return 1;  // truncated
        c = utf_ptr2char(p);
        // An illegal byte is displayed as <xx>.
        if (utf_ptr2len(p) == 1 || c == NUL)
            return 4;
        // If the char is ASCII it must be an overlong sequence.
        if (c < 0x80)
            return char2cells(c);
        return utf_char2cells(c);
    }
    return 1;
}

    static int
dbcs_ptr2cells_len(char_u *p, int size)
{
    // Number of cells is equal to number of bytes, except for euc-jp when
    // the first byte is 0x8e.
    if (size <= 1 || (enc_dbcs == DBCS_JPNU && *p == 0x8e))
        return 1;
    return MB_BYTE2LEN(*p);
}

/*
 * mb_char2cells() function pointer.
 * Return the number of display cells the character "c" occupies.
 * Only takes care of multi-byte chars, not "^C" and such.
 */
    int
latin_char2cells(int c UNUSED)
{
    return 1;
}

    static int
dbcs_char2cells(int c)
{
    // Number of cells is equal to number of bytes, except for euc-jp when
    // the first byte is 0x8e.
    if (enc_dbcs == DBCS_JPNU && ((unsigned)c >> 8) == 0x8e)
        return 1;
    // use the first byte
    return MB_BYTE2LEN((unsigned)c >> 8);
}

/*
 * Return the number of cells occupied by string "p".
 * Stop at a NUL character. When "len" >= 0 stop at character "p[len]".
 */
    int
mb_string2cells(char_u *p, int len)
{
    int i;
    int clen = 0;

    for (i = 0; (len < 0 || i < len) && p[i] != NUL; i += (*mb_ptr2len)(p + i))
        clen += (*mb_ptr2cells)(p + i);
    return clen;
}
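`mb_string2cells()` steps through the string with `mb_ptr2len` and sums `mb_ptr2cells` for each character. Below is a self-contained sketch of the same loop for UTF-8. It assumes a toy width rule (only CJK Unified Ideographs, U+4E00 to U+9FFF, are double width) in place of `utf_char2cells()`, and a decoder that handles at most three-byte sequences, so four-byte characters such as emoji are counted as illegal bytes. As in `utf_ptr2cells()`, an undecodable byte counts as four cells for its `<xx>` display. All names are hypothetical.

```c
/* Toy width rule (assumption): only CJK Unified Ideographs are wide. */
static int
sketch_char2cells(long c)
{
    return (c >= 0x4e00 && c <= 0x9fff) ? 2 : 1;
}

/*
 * Decode a UTF-8 character of at most three bytes at "p" and store its
 * byte length in "*len".  Returns -1 for a byte that does not start a
 * valid sequence (including one cut short by a NUL).
 */
static long
sketch_decode(const unsigned char *p, int *len)
{
    *len = 1;
    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xe0) == 0xc0 && (p[1] & 0xc0) == 0x80)
    {
        *len = 2;
        return ((long)(p[0] & 0x1f) << 6) | (p[1] & 0x3f);
    }
    if ((p[0] & 0xf0) == 0xe0 && (p[1] & 0xc0) == 0x80
                              && (p[2] & 0xc0) == 0x80)
    {
        *len = 3;
        return ((long)(p[0] & 0x0f) << 12) | ((p[1] & 0x3f) << 6)
                                            | (p[2] & 0x3f);
    }
    return -1;
}

/*
 * Cells occupied by "p": stop at NUL, or before byte "len" when
 * "len" >= 0, mirroring the loop in mb_string2cells().
 */
static int
sketch_string2cells(const unsigned char *p, int len)
{
    int i = 0;
    int clen = 0;

    while ((len < 0 || i < len) && p[i] != 0)
    {
        int  n;
        long c = sketch_decode(p + i, &n);

        clen += c < 0 ? 4 : sketch_char2cells(c);   // illegal byte: <xx>
        i += n;
    }
    return clen;
}
```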

/*
 * mb_off2cells() function pointer.
 * Return the number of display cells for the character at ScreenLines[off].
 * We make sure that the offset used is less than "max_off".
 */
    int
latin_off2cells(unsigned off UNUSED, unsigned max_off UNUSED)
{
    return 1;
}

    int
dbcs_off2cells(unsigned off, unsigned max_off)
{
    // never check beyond end of the line
    if (off >= max_off)
        return 1;

    // Number of cells is equal to number of bytes, except for euc-jp when
    // the first byte is 0x8e.
    if (enc_dbcs == DBCS_JPNU && ScreenLines[off] == 0x8e)
        return 1;
    return MB_BYTE2LEN(ScreenLines[off]);
}

    int
utf_off2cells(unsigned off, unsigned max_off)
{
    return (off + 1 < max_off && ScreenLines[off + 1] == 0) ? 2 : 1;
}
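`utf_off2cells()` does not decode anything: it relies on the screen-buffer convention that the cell following the left half of a double-width character holds 0. The sketch below reproduces that check and adds a hypothetical helper that walks a row with it. The row is a plain byte array here, whereas Vim uses `ScreenLines` (with `ScreenLinesUC` holding the actual characters), and `'X'` in the usage merely marks the left half of a wide character.

```c
/*
 * Hypothetical screen row: after the left half of a double-width
 * character the next cell holds 0, which is what utf_off2cells() checks.
 */
static int
sketch_off2cells(const unsigned char *line, unsigned off, unsigned max_off)
{
    // Never look at or beyond "max_off".
    return (off + 1 < max_off && line[off + 1] == 0) ? 2 : 1;
}

/* Count the characters displayed in the first "max_off" cells of "line". */
static unsigned
sketch_count_chars(const unsigned char *line, unsigned max_off)
{
    unsigned off = 0;
    unsigned count = 0;

    while (off < max_off)
    {
        off += sketch_off2cells(line, off, max_off);
        ++count;
    }
    return count;
}
```

For the row `{'a', 'X', 0, 'b'}` the walk visits offsets 0, 1 and 3, so four cells hold three characters.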

/*
 * mb_ptr2char() function pointer.
 * Convert a byte sequence into a character.
 */
    int
latin_ptr2char(char_u *p)
{
    return *p;
}

    static int
dbcs_ptr2char(char_u *p)
{
    if (MB_BYTE2LEN(*p) > 1 && p[1] != NUL)
        return (p[0] << 8) + p[1];
    return *p;
}
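`dbcs_ptr2char()` packs a lead byte and its trail byte into one int as `(lead << 8) + trail`, and `dbcs_char2cells()` recovers the lead byte with `>> 8`. The sketch below shows that round trip, assuming cp932/Shift_JIS lead-byte ranges (0x81 to 0x9f and 0xe0 to 0xfc) in place of `MB_BYTE2LEN()`; the function names are hypothetical.

```c
/*
 * Assumed Shift_JIS/cp932 lead-byte ranges: 0x81..0x9f and 0xe0..0xfc.
 * Stands in for MB_BYTE2LEN() in this sketch.
 */
static int
sketch_sjis_byte2len(unsigned b)
{
    return ((b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc)) ? 2 : 1;
}

/* Pack a lead and trail byte into one int, as dbcs_ptr2char() does. */
static int
sketch_dbcs_ptr2char(const unsigned char *p)
{
    // A lead byte followed by NUL is returned as a single byte.
    if (sketch_sjis_byte2len(p[0]) > 1 && p[1] != 0)
        return (p[0] << 8) + p[1];
    return p[0];
}

/* Recover the cell count from the packed value via its first byte. */
static int
sketch_dbcs_char2cells(int c)
{
    return sketch_sjis_byte2len((unsigned)c >> 8);
}
```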

/*
 * Convert a UTF-8 byte sequence to a character number.
 * If the sequence is illegal or truncated by a NUL, the first byte is
 * returned.
 * For an overlong sequence this may return zero.
 * Does not include composing characters, of course.
 */
    int
utf_ptr2char(char_u *p)
{
    int len;

    if (p[0] < 0x80)    // be quick for ASCII
        return p[0];

    len = utf8len_tab_zero[p[0]];
    if (len > 1 && (p[1] & 0xc0) == 0x80)
    {
        if (len == 2)
            return ((p[0] & 0x1f) << 6) + (p[1] & 0x3f);
        if ((p[2] & 0xc0) == 0x80)
        {
            if (len == 3)
                return ((p[0] & 0x0f) << 12) + ((p[1] & 0x3f) << 6)
                    + (p[2] & 0x3f);
            if ((p[3] & 0xc0) == 0x80)
            {
                if (len == 4)
                    return ((p[0] & 0x07) << 18) + ((p[1] & 0x3f) << 12)
                        + ((p[2] & 0x3f) << 6) + (p[3] & 0x3f);
                if ((p[4] & 0xc0) == 0x80)
                {
                    if (len == 5)
                        return ((p[0] & 0x03) << 24) + ((p[1] & 0x3f) << 18)
                            + ((p[2] & 0x3f) << 12) + ((p[3] & 0x3f) << 6)
                            + (p[4] & 0x3f);
                    if ((p[5] & 0xc0) == 0x80 && len == 6)
                        return ((p[0] & 0x01) << 30) + ((p[1] & 0x3f) << 24)
                            + ((p[2] & 0x3f) << 18) + ((p[3] & 0x3f) << 12)
                            + ((p[4] & 0x3f) << 6) + (p[5] & 0x3f);
                }
            }
        }
    }
    // Illegal value, just return the first byte
    return p[0];
}
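A condensed sketch of the decoding in `utf_ptr2char()`, written as a loop instead of nested conditions and, as a simplification, limited to sequences of up to four bytes (Vim also accepts the legacy five- and six-byte forms). Like the original, it returns the first byte for an illegal or NUL-truncated sequence and returns 0 for the overlong sequence 0xc0 0x80. The name `sketch_utf_ptr2char()` is hypothetical.

```c
/* Decode the UTF-8 character at "p" (1 to 4 bytes); see lead-in. */
static int
sketch_utf_ptr2char(const unsigned char *p)
{
    int len;
    int c;
    int i;

    if (p[0] < 0x80)                    // ASCII (or NUL)
        return p[0];
    if ((p[0] & 0xe0) == 0xc0)
    {
        len = 2;
        c = p[0] & 0x1f;
    }
    else if ((p[0] & 0xf0) == 0xe0)
    {
        len = 3;
        c = p[0] & 0x0f;
    }
    else if ((p[0] & 0xf8) == 0xf0)
    {
        len = 4;
        c = p[0] & 0x07;
    }
    else
        return p[0];    // continuation byte or unsupported lead byte

    for (i = 1; i < len; ++i)
    {
        // A NUL or any non-continuation byte makes the sequence illegal;
        // stopping here also avoids reading past the terminating NUL.
        if ((p[i] & 0xc0) != 0x80)
            return p[0];
        c = (c << 6) + (p[i] & 0x3f);
    }
    return c;
}
```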

/*
 * Convert a UTF-8 byte sequence to a wide character.
 * String is assumed to be terminated by NUL or after "n" bytes, whichever
 * comes first.
 * The function is safe in the sense that it never accesses memory beyond the
 * first "n" bytes of "s".
 *
 * On success, returns the decoded code point, advances "s" to the beginning
 * of the next character and decreases "n" accordingly.
 *
 * If the end of the string was reached, returns 0 and, if "n" > 0, advances
 * "s" past the NUL byte.
 *
 * If the byte sequence is illegal or incomplete, returns -1 and does not
 * advance "s".
 */
    static int
utf_safe_read_char_adv(char_u **s, size_t *n)
{
    int c, k;

    if (*n == 0) // end of buffer
        return 0;

    k = utf8len_tab_zero[**s];

    if (k == 1)
    {
        // ASCII character or NUL
        (*n)--;
        return *(*s)++;
    }

    if ((size_t)k <= *n)
    {
        // We have a multibyte sequence and it isn't truncated by buffer
        // limits so utf_ptr2char() is safe to use. Or the first byte is
        // illegal (k=0), and it's also safe to use utf_ptr2char().
        c = utf_ptr2char(*s);

        // On failure, utf_ptr2char() returns the first byte, so here we
        // check equality with the first byte. The only non-ASCII character
        // which equals the first byte of its own UTF-8 representation is
        // U+00C3 (UTF-8: 0xC3 0x83), so we need to check that special case
        // too. It's safe even if n=1, else we would have k=2 > n.
        if (c != (int)(**s) || (c == 0xC3 && (*s)[1] == 0x83))
        {
            // byte sequence was successfully decoded
            *s += k;
            *n -= k;
            return c;
        }
    }

    // byte sequence is incomplete or illegal
    return -1;
}
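The contract of `utf_safe_read_char_adv()` (code point on success, 0 at the end, -1 without advancing on an illegal or incomplete sequence) can be illustrated with a bounds-checked decoder. This sketch compares the sequence length against `*n` before touching any continuation byte and limits itself to four-byte sequences. It uses a hypothetical name and is not Vim's implementation, which delegates the decoding to `utf_ptr2char()`.

```c
#include <stddef.h>

/*
 * Hypothetical bounded decoder: never reads past "*n" bytes of "*s".
 * Returns the code point and advances; returns 0 at the end of the buffer
 * (and for a NUL byte, which is consumed); returns -1 without advancing
 * for an illegal or incomplete sequence.
 */
static int
sketch_safe_read_adv(const unsigned char **s, size_t *n)
{
    const unsigned char *p = *s;
    size_t len;
    size_t i;
    int c;

    if (*n == 0)                // end of buffer
        return 0;
    if (p[0] < 0x80)            // ASCII or NUL: always one byte
    {
        --*n;
        ++*s;
        return p[0];
    }
    if ((p[0] & 0xe0) == 0xc0)
    {
        len = 2;
        c = p[0] & 0x1f;
    }
    else if ((p[0] & 0xf0) == 0xe0)
    {
        len = 3;
        c = p[0] & 0x0f;
    }
    else if ((p[0] & 0xf8) == 0xf0)
    {
        len = 4;
        c = p[0] & 0x07;
    }
    else
        return -1;              // continuation byte or unsupported lead byte

    if (len > *n)
        return -1;              // truncated by the buffer limit
    for (i = 1; i < len; ++i)
    {
        if ((p[i] & 0xc0) != 0x80)
            return -1;          // not a continuation byte
        c = (c << 6) | (p[i] & 0x3f);
    }
    *s += len;
    *n -= len;
    return c;
}
```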

/*
 * Get character at **pp and advance *pp to the next character.
 * Note: composing characters are skipped!
 */
    int
mb_ptr2char_adv(char_u **pp)
{
    int c;

    c = (*mb_ptr2char)(*pp);
    *pp += (*mb_ptr2len)(*pp);
    return c;
}

/*
 * Get character at **pp and advance *pp to the next character.
 * Note: composing characters are returned as separate characters.
 */
    int
mb_cptr2char_adv(char_u **pp)
{
    int c;

    c = (*mb_ptr2char)(*pp);
    if (enc_utf8)
        *pp += utf_ptr2len(*pp);
    else
        *pp += (*mb_ptr2len)(*pp);
    return c;
}

#if defined(FEAT_ARABIC) || defined(PROTO)
/*
 * Check if the character pointed to by "p2" is a composing character when it
 * comes after "p1". For Arabic sometimes "ab" is replaced with "c", which
 * behaves like a composing character.
 */
    int
utf_composinglike(char_u *p1, char_u *p2)
{
    int c2;

    c2 = utf_ptr2char(p2);
    if (utf_iscomposing(c2))
        return TRUE;
    if (!arabic_maycombine(c2))
        return FALSE;
    return arabic_combine(utf_ptr2char(p1), c2);
}
#endif

/*
 * Convert a UTF-8 byte string to a wide character. Also get up to MAX_MCO
 * composing characters.
 */
    int
utfc_ptr2char(
    char_u *p,
    int *pcc)       // return: composing chars, last one is 0
{
    int len;
    int c;
    int cc;
    int i = 0;

    c = utf_ptr2char(p);
    len = utf_ptr2len(p);

    // Only accept a composing char when the first char isn't illegal.
    if ((len > 1 || *p < 0x80)
            && p[len] >= 0x80
            && UTF_COMPOSINGLIKE(p, p + len))
    {
        cc = utf_ptr2char(p + len);
        for (;;)
        {
            pcc[i++] = cc;
            if (i == MAX_MCO)
                break;
            len += utf_ptr2len(p + len);
            if (p[len] < 0x80 || !utf_iscomposing(cc = utf_ptr2char(p + len)))
                break;
        }
    }

    if (i < MAX_MCO)    // last composing char must be 0
        pcc[i] = 0;

    return c;
}
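`utfc_ptr2char()` returns the base character and fills `pcc` with up to `MAX_MCO` following composing characters, zero-terminating the list when there is room. The sketch below follows the same shape under two assumptions: only U+0300 to U+036F (the first entry of the `combining` table in `utf_iscomposing()`) counts as composing, and `MAX_MCO` is taken to be 6. The Arabic `UTF_COMPOSINGLIKE` special case is left out, and all names are hypothetical.

```c
#define SKETCH_MAX_MCO 6    /* assumed value of Vim's MAX_MCO */

/* Decode up to three bytes; "*len" is 1 for ASCII, NUL or an illegal byte. */
static int
sketch_decode(const unsigned char *p, int *len)
{
    if ((p[0] & 0xe0) == 0xc0 && (p[1] & 0xc0) == 0x80)
    {
        *len = 2;
        return ((p[0] & 0x1f) << 6) | (p[1] & 0x3f);
    }
    if ((p[0] & 0xf0) == 0xe0 && (p[1] & 0xc0) == 0x80
                              && (p[2] & 0xc0) == 0x80)
    {
        *len = 3;
        return ((p[0] & 0x0f) << 12) | ((p[1] & 0x3f) << 6) | (p[2] & 0x3f);
    }
    *len = 1;
    return p[0];
}

/* Assumption: only Combining Diacritical Marks count as composing. */
static int
sketch_is_composing(int c)
{
    return c >= 0x0300 && c <= 0x036f;
}

/* Base character at "p"; composing characters go to "pcc". */
static int
sketch_utfc_ptr2char(const unsigned char *p, int *pcc)
{
    int len;
    int c = sketch_decode(p, &len);
    int i = 0;

    // Only accept composing chars after a legal, non-NUL base character.
    if (p[0] != 0 && (len > 1 || p[0] < 0x80))
    {
        while (i < SKETCH_MAX_MCO && p[len] >= 0x80)
        {
            int cclen;
            int cc = sketch_decode(p + len, &cclen);

            if (!sketch_is_composing(cc))
                break;
            pcc[i++] = cc;
            len += cclen;
        }
    }
    if (i < SKETCH_MAX_MCO)
        pcc[i] = 0;         // terminate the list when there is room
    return c;
}
```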

/*
 * Convert a UTF-8 byte string to a wide character. Also get up to MAX_MCO
 * composing characters. Use no more than p[maxlen].
 */
    int
utfc_ptr2char_len(
    char_u *p,
    int *pcc,       // return: composing chars, last one is 0
    int maxlen)
{
    int len;
    int c;
    int cc;
    int i = 0;

    c = utf_ptr2char(p);
    len = utf_ptr2len_len(p, maxlen);
    // Only accept a composing char when the first char isn't illegal.
    if ((len > 1 || *p < 0x80)
            && len < maxlen
            && p[len] >= 0x80
            && UTF_COMPOSINGLIKE(p, p + len))
    {
        cc = utf_ptr2char(p + len);
        for (;;)
        {
            pcc[i++] = cc;
            if (i == MAX_MCO)
                break;
            len += utf_ptr2len_len(p + len, maxlen - len);
            if (len >= maxlen
                    || p[len] < 0x80
                    || !utf_iscomposing(cc = utf_ptr2char(p + len)))
                break;
        }
    }

    if (i < MAX_MCO)    // last composing char must be 0
        pcc[i] = 0;

    return c;
}

/*
 * Convert the character at screen position "off" to a sequence of bytes.
 * Includes the composing characters.
 * "buf" must have a length of at least MB_MAXBYTES + 1.
 * Only to be used when ScreenLinesUC[off] != 0.
 * Returns the produced number of bytes.
 */
    int
utfc_char2bytes(int off, char_u *buf)
{
    int len;
    int i;

    len = utf_char2bytes(ScreenLinesUC[off], buf);
    for (i = 0; i < Screen_mco; ++i)
    {
        if (ScreenLinesC[i][off] == 0)
            break;
        len += utf_char2bytes(ScreenLinesC[i][off], buf + len);
    }
    return len;
}

/*
 * Get the length of a UTF-8 byte sequence, not including any following
 * composing characters.
 * Returns 0 for "".
 * Returns 1 for an illegal byte sequence.
 */
    int
utf_ptr2len(char_u *p)
{
    int len;
    int i;

    if (*p == NUL)
        return 0;
    len = utf8len_tab[*p];
    for (i = 1; i < len; ++i)
        if ((p[i] & 0xc0) != 0x80)
            return 1;
    return len;
}
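`utf_ptr2len()` takes the expected length from the lead byte via `utf8len_tab` (not shown in this section) and falls back to 1 as soon as a following byte is not a continuation byte, which also stops it at a terminating NUL. The sketch assumes the usual lead-byte ranges for that table (0xc0 to 0xdf: 2, 0xe0 to 0xef: 3, 0xf0 to 0xf7: 4, 0xf8 to 0xfb: 5, 0xfc to 0xfd: 6, anything else: 1); both function names are hypothetical.

```c
/* Assumed lead-byte ranges of utf8len_tab; other bytes map to 1. */
static int
sketch_byte2len(unsigned char b)
{
    if (b < 0xc0)
        return 1;       // ASCII or a continuation byte
    if (b < 0xe0)
        return 2;
    if (b < 0xf0)
        return 3;
    if (b < 0xf8)
        return 4;
    if (b < 0xfc)
        return 5;
    if (b < 0xfe)
        return 6;
    return 1;           // 0xfe and 0xff never start a sequence
}

/* Length of the character at "p" without composing chars; 0 for "". */
static int
sketch_utf_ptr2len(const unsigned char *p)
{
    int len;
    int i;

    if (*p == 0)
        return 0;
    len = sketch_byte2len(*p);
    for (i = 1; i < len; ++i)
        if ((p[i] & 0xc0) != 0x80)  // also true for a terminating NUL
            return 1;
    return len;
}
```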

/*
 * Return length of UTF-8 character, obtained from the first byte.
 * "b" must be between 0 and 255!
 * Returns 1 for an invalid first byte value.
 */
    int
utf_byte2len(int b)
{
    return utf8len_tab[b];
}

/*
 * Get the length of the UTF-8 byte sequence "p[size]". Does not include any
 * following composing characters.
 * Returns 1 for "".
 * Returns 1 for an illegal byte sequence (also when the illegal byte is
 * inside an incomplete sequence).
 * Returns a number > "size" for an incomplete byte sequence.
 * Never returns zero.
 */
    int
utf_ptr2len_len(char_u *p, int size)
{
    int len;
    int i;
    int m;

    len = utf8len_tab[*p];
    if (len == 1)
        return 1;   // NUL, ascii or illegal lead byte
    if (len > size)
        m = size;   // incomplete byte sequence.
    else
        m = len;
    for (i = 1; i < m; ++i)
        if ((p[i] & 0xc0) != 0x80)
            return 1;
    return len;
}
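`utf_ptr2len_len()` differs from `utf_ptr2len()` in that it never inspects more than `size` bytes and reports a sequence that is merely cut off by returning its full length, which is larger than `size`. That makes it usable on input that arrives in pieces. The sketch below restates the lead-byte ranges assumed for `utf8len_tab` and adds a hypothetical helper, `sketch_needs_more_bytes()`, that asks whether a buffer ends in the middle of a character; like the original, it expects `size` to be at least 1.

```c
/* Assumed lead-byte ranges of utf8len_tab; other bytes map to 1. */
static int
sketch_byte2len(unsigned char b)
{
    if (b < 0xc0)
        return 1;
    if (b < 0xe0)
        return 2;
    if (b < 0xf0)
        return 3;
    if (b < 0xf8)
        return 4;
    if (b < 0xfc)
        return 5;
    return b < 0xfe ? 6 : 1;
}

/*
 * Length of the UTF-8 sequence at "p", looking at no more than "size"
 * bytes.  Returns 1 for ASCII, NUL or an illegal sequence, and the full
 * (larger than "size") length for a sequence that is merely cut off.
 */
static int
sketch_ptr2len_len(const unsigned char *p, int size)
{
    int len = sketch_byte2len(p[0]);
    int m = len > size ? size : len;    // only inspect available bytes
    int i;

    if (len == 1)
        return 1;
    for (i = 1; i < m; ++i)
        if ((p[i] & 0xc0) != 0x80)
            return 1;
    return len;
}

/* Hypothetical helper: does the buffer end inside a character? */
static int
sketch_needs_more_bytes(const unsigned char *p, int size)
{
    return sketch_ptr2len_len(p, size) > size;
}
```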

/*
 * Return the number of bytes the UTF-8 encoding of the character at "p" takes.
 * This includes following composing characters.
 * Returns zero for NUL.
 */
    int
utfc_ptr2len(char_u *p)
{
    int len;
    int b0 = *p;
#ifdef FEAT_ARABIC
    int prevlen;
#endif

    if (b0 == NUL)
        return 0;
    if (b0 < 0x80 && p[1] < 0x80)   // be quick for ASCII
        return 1;

    // Skip over first UTF-8 char, stopping at a NUL byte.
    len = utf_ptr2len(p);

    // Check for illegal byte.
    if (len == 1 && b0 >= 0x80)
        return 1;

    /*
     * Check for composing characters. We can handle only the first six, but
     * skip all of them (otherwise the cursor would get stuck).
     */
#ifdef FEAT_ARABIC
    prevlen = 0;
#endif
    for (;;)
    {
        if (p[len] < 0x80 || !UTF_COMPOSINGLIKE(p + prevlen, p + len))
            return len;

        // Skip over composing char
#ifdef FEAT_ARABIC
        prevlen = len;
#endif
        len += utf_ptr2len(p + len);
    }
}

/*
 * Return the number of bytes the UTF-8 encoding of the character at "p[size]"
 * takes. This includes following composing characters.
 * Returns 0 for an empty string.
 * Returns 1 for an illegal char or an incomplete byte sequence.
 */
    int
utfc_ptr2len_len(char_u *p, int size)
{
    int len;
#ifdef FEAT_ARABIC
    int prevlen;
#endif

    if (size < 1 || *p == NUL)
        return 0;
    if (p[0] < 0x80 && (size == 1 || p[1] < 0x80)) // be quick for ASCII
        return 1;

    // Skip over first UTF-8 char, stopping at a NUL byte.
    len = utf_ptr2len_len(p, size);

    // Check for illegal byte and incomplete byte sequence.
    if ((len == 1 && p[0] >= 0x80) || len > size)
        return 1;

    /*
     * Check for composing characters. We can handle only the first six, but
     * skip all of them (otherwise the cursor would get stuck).
     */
#ifdef FEAT_ARABIC
    prevlen = 0;
#endif
    while (len < size)
    {
        int len_next_char;

        if (p[len] < 0x80)
            break;

        /*
         * Next character length should not go beyond size to ensure that
         * UTF_COMPOSINGLIKE(...) does not read beyond size.
         */
        len_next_char = utf_ptr2len_len(p + len, size - len);
        if (len_next_char > size - len)
            break;

        if (!UTF_COMPOSINGLIKE(p + prevlen, p + len))
            break;

        // Skip over composing char
#ifdef FEAT_ARABIC
        prevlen = len;
#endif
        len += len_next_char;
    }
    return len;
}

/*
 * Return the number of bytes the UTF-8 encoding of character "c" takes.
 * This does not include composing characters.
 */
    int
utf_char2len(int c)
{
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    if (c < 0x10000)
        return 3;
    if (c < 0x200000)
        return 4;
    if (c < 0x4000000)
        return 5;
    return 6;
}
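`utf_char2len()` only compares `c` against the largest value each encoded length can hold (7, 11, 16, 21 and 26 bits). A typical use is sizing a buffer before encoding; the sketch below copies those thresholds and adds a hypothetical helper, `sketch_utf8_bufsize()`, that sums them over an array of code points plus one byte for a terminating NUL.

```c
#include <stddef.h>

/* Bytes needed to encode "c", using the same thresholds as utf_char2len(). */
static int
sketch_char2len(long c)
{
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    if (c < 0x10000)
        return 3;
    if (c < 0x200000)
        return 4;
    if (c < 0x4000000)
        return 5;
    return 6;
}

/* Hypothetical helper: buffer size for "n" code points plus a NUL. */
static size_t
sketch_utf8_bufsize(const long *chars, size_t n)
{
    size_t total = 1;   // terminating NUL
    size_t i;

    for (i = 0; i < n; ++i)
        total += (size_t)sketch_char2len(chars[i]);
    return total;
}
```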

/*
 * Convert Unicode character "c" to UTF-8 string in "buf[]".
 * Returns the number of bytes.
 */
    int
utf_char2bytes(int c, char_u *buf)
{
    if (c < 0x80)               // 7 bits
    {
        buf[0] = c;
        return 1;
    }
    if (c < 0x800)              // 11 bits
    {
        buf[0] = 0xc0 + ((unsigned)c >> 6);
        buf[1] = 0x80 + (c & 0x3f);
        return 2;
    }
    if (c < 0x10000)            // 16 bits
    {
        buf[0] = 0xe0 + ((unsigned)c >> 12);
        buf[1] = 0x80 + (((unsigned)c >> 6) & 0x3f);
        buf[2] = 0x80 + (c & 0x3f);
        return 3;
    }
    if (c < 0x200000)           // 21 bits
    {
        buf[0] = 0xf0 + ((unsigned)c >> 18);
        buf[1] = 0x80 + (((unsigned)c >> 12) & 0x3f);
        buf[2] = 0x80 + (((unsigned)c >> 6) & 0x3f);
        buf[3] = 0x80 + (c & 0x3f);
        return 4;
    }
    if (c < 0x4000000)          // 26 bits
    {
        buf[0] = 0xf8 + ((unsigned)c >> 24);
        buf[1] = 0x80 + (((unsigned)c >> 18) & 0x3f);
        buf[2] = 0x80 + (((unsigned)c >> 12) & 0x3f);
        buf[3] = 0x80 + (((unsigned)c >> 6) & 0x3f);
        buf[4] = 0x80 + (c & 0x3f);
        return 5;
    }
                                // 31 bits
    buf[0] = 0xfc + ((unsigned)c >> 30);
    buf[1] = 0x80 + (((unsigned)c >> 24) & 0x3f);
    buf[2] = 0x80 + (((unsigned)c >> 18) & 0x3f);
    buf[3] = 0x80 + (((unsigned)c >> 12) & 0x3f);
    buf[4] = 0x80 + (((unsigned)c >> 6) & 0x3f);
    buf[5] = 0x80 + (c & 0x3f);
    return 6;
}
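The encoder in `utf_char2bytes()` stores the high bits of `c` in the lead byte (after a 0xc0, 0xe0 or 0xf0 length marker) and six bits in each continuation byte (after 0x80). A trimmed sketch covering code points up to 21 bits, which includes all of Unicode, under the hypothetical name `sketch_char2bytes()`; values of 0x200000 and above, which Vim still encodes in five or six bytes, are outside its stated precondition.

```c
/*
 * Encode "c" (at most 0x1fffff, i.e. 21 bits) as UTF-8 into "buf", which
 * must hold 4 bytes.  Returns the number of bytes written.
 */
static int
sketch_char2bytes(unsigned long c, unsigned char *buf)
{
    if (c < 0x80)               // 7 bits
    {
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800)              // 11 bits
    {
        buf[0] = (unsigned char)(0xc0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3f));
        return 2;
    }
    if (c < 0x10000)            // 16 bits
    {
        buf[0] = (unsigned char)(0xe0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        buf[2] = (unsigned char)(0x80 | (c & 0x3f));
        return 3;
    }
    // 21 bits; larger values would need the 5- and 6-byte forms
    buf[0] = (unsigned char)(0xf0 | (c >> 18));
    buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3f));
    buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
    buf[3] = (unsigned char)(0x80 | (c & 0x3f));
    return 4;
}
```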

#if defined(FEAT_TERMINAL) || defined(PROTO)
/*
 * utf_iscomposing() with different argument type for libvterm.
 */
    int
utf_iscomposing_uint(UINT32_T c)
{
    return utf_iscomposing((int)c);
}
#endif

/*
 * Return TRUE if "c" is a composing UTF-8 character. This means it will be
 * drawn on top of the preceding character.
 * Based on code from Markus Kuhn.
 */
    int
utf_iscomposing(int c)
{
    // Sorted list of non-overlapping intervals.
    // Generated by ../runtime/tools/unicode.vim.
    static struct interval combining[] =
    {
        {0x0300, 0x036f},
        {0x0483, 0x0489},
        {0x0591, 0x05bd},
        {0x05bf, 0x05bf},
        {0x05c1, 0x05c2},
        {0x05c4, 0x05c5},
        {0x05c7, 0x05c7},
        {0x0610, 0x061a},
        {0x064b, 0x065f},
        {0x0670, 0x0670},
        {0x06d6, 0x06dc},
        {0x06df, 0x06e4},
        {0x06e7, 0x06e8},
        {0x06ea, 0x06ed},
        {0x0711, 0x0711},
        {0x0730, 0x074a},
        {0x07a6, 0x07b0},
        {0x07eb, 0x07f3},
        {0x07fd, 0x07fd},
        {0x0816, 0x0819},
        {0x081b, 0x0823},
        {0x0825, 0x0827},
        {0x0829, 0x082d},
        {0x0859, 0x085b},
        {0x0898, 0x089f},
        {0x08ca, 0x08e1},
        {0x08e3, 0x0902},
        {0x093a, 0x093a},
        {0x093c, 0x093c},
        {0x0941, 0x0948},
        {0x094d, 0x094d},
        {0x0951, 0x0957},
        {0x0962, 0x0963},
        {0x0981, 0x0981},
        {0x09bc, 0x09bc},
        {0x09c1, 0x09c4},
        {0x09cd, 0x09cd},
        {0x09e2, 0x09e3},
        {0x09fe, 0x09fe},
        {0x0a01, 0x0a02},
        {0x0a3c, 0x0a3c},
        {0x0a41, 0x0a42},
        {0x0a47, 0x0a48},
        {0x0a4b, 0x0a4d},
        {0x0a51, 0x0a51},
        {0x0a70, 0x0a71},
        {0x0a75, 0x0a75},
        {0x0a81, 0x0a82},
        {0x0abc, 0x0abc},
        {0x0ac1, 0x0ac5},
        {0x0ac7, 0x0ac8},
        {0x0acd, 0x0acd},
        {0x0ae2, 0x0ae3},
        {0x0afa, 0x0aff},
        {0x0b01, 0x0b01},
        {0x0b3c, 0x0b3c},
        {0x0b3f, 0x0b3f},
        {0x0b41, 0x0b44},
        {0x0b4d, 0x0b4d},
        {0x0b55, 0x0b56},
        {0x0b62, 0x0b63},
        {0x0b82, 0x0b82},
        {0x0bc0, 0x0bc0},
        {0x0bcd, 0x0bcd},
        {0x0c00, 0x0c00},
        {0x0c04, 0x0c04},
        {0x0c3c, 0x0c3c},
        {0x0c3e, 0x0c40},
        {0x0c46, 0x0c48},
        {0x0c4a, 0x0c4d},
        {0x0c55, 0x0c56},
        {0x0c62, 0x0c63},
        {0x0c81, 0x0c81},
        {0x0cbc, 0x0cbc},
        {0x0cbf, 0x0cbf},
        {0x0cc6, 0x0cc6},
        {0x0ccc, 0x0ccd},
        {0x0ce2, 0x0ce3},
        {0x0d00, 0x0d01},
        {0x0d3b, 0x0d3c},
        {0x0d41, 0x0d44},
        {0x0d4d, 0x0d4d},
        {0x0d62, 0x0d63},
        {0x0d81, 0x0d81},
        {0x0dca, 0x0dca},
        {0x0dd2, 0x0dd4},
        {0x0dd6, 0x0dd6},
        {0x0e31, 0x0e31},
        {0x0e34, 0x0e3a},
        {0x0e47, 0x0e4e},
        {0x0eb1, 0x0eb1},
|
|
|
|
{0x0eb4, 0x0ebc},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x0ec8, 0x0ece},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x0f18, 0x0f19},
|
|
|
|
|
{0x0f35, 0x0f35},
|
|
|
|
|
{0x0f37, 0x0f37},
|
|
|
|
|
{0x0f39, 0x0f39},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x0f71, 0x0f7e},
|
|
|
|
|
{0x0f80, 0x0f84},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x0f86, 0x0f87},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x0f8d, 0x0f97},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x0f99, 0x0fbc},
|
|
|
|
|
{0x0fc6, 0x0fc6},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x102d, 0x1030},
|
|
|
|
|
{0x1032, 0x1037},
|
|
|
|
|
{0x1039, 0x103a},
|
|
|
|
|
{0x103d, 0x103e},
|
|
|
|
|
{0x1058, 0x1059},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x105e, 0x1060},
|
|
|
|
|
{0x1071, 0x1074},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1082, 0x1082},
|
|
|
|
|
{0x1085, 0x1086},
|
|
|
|
|
{0x108d, 0x108d},
|
|
|
|
|
{0x109d, 0x109d},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x135d, 0x135f},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1712, 0x1714},
|
|
|
|
|
{0x1732, 0x1733},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1752, 0x1753},
|
|
|
|
|
{0x1772, 0x1773},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x17b4, 0x17b5},
|
|
|
|
|
{0x17b7, 0x17bd},
|
|
|
|
|
{0x17c6, 0x17c6},
|
|
|
|
|
{0x17c9, 0x17d3},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x17dd, 0x17dd},
|
|
|
|
|
{0x180b, 0x180d},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x180f, 0x180f},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1885, 0x1886},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x18a9, 0x18a9},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1920, 0x1922},
|
|
|
|
|
{0x1927, 0x1928},
|
|
|
|
|
{0x1932, 0x1932},
|
|
|
|
|
{0x1939, 0x193b},
|
|
|
|
|
{0x1a17, 0x1a18},
|
|
|
|
|
{0x1a1b, 0x1a1b},
|
|
|
|
|
{0x1a56, 0x1a56},
|
|
|
|
|
{0x1a58, 0x1a5e},
|
|
|
|
|
{0x1a60, 0x1a60},
|
|
|
|
|
{0x1a62, 0x1a62},
|
|
|
|
|
{0x1a65, 0x1a6c},
|
|
|
|
|
{0x1a73, 0x1a7c},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1a7f, 0x1a7f},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1ab0, 0x1ace},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1b00, 0x1b03},
|
|
|
|
|
{0x1b34, 0x1b34},
|
|
|
|
|
{0x1b36, 0x1b3a},
|
|
|
|
|
{0x1b3c, 0x1b3c},
|
|
|
|
|
{0x1b42, 0x1b42},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1b6b, 0x1b73},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1b80, 0x1b81},
|
|
|
|
|
{0x1ba2, 0x1ba5},
|
|
|
|
|
{0x1ba8, 0x1ba9},
|
|
|
|
|
{0x1bab, 0x1bad},
|
|
|
|
|
{0x1be6, 0x1be6},
|
|
|
|
|
{0x1be8, 0x1be9},
|
|
|
|
|
{0x1bed, 0x1bed},
|
|
|
|
|
{0x1bef, 0x1bf1},
|
|
|
|
|
{0x1c2c, 0x1c33},
|
|
|
|
|
{0x1c36, 0x1c37},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1cd0, 0x1cd2},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1cd4, 0x1ce0},
|
|
|
|
|
{0x1ce2, 0x1ce8},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1ced, 0x1ced},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x1cf4, 0x1cf4},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1cf8, 0x1cf9},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1dc0, 0x1dff},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x20d0, 0x20f0},
|
|
|
|
|
{0x2cef, 0x2cf1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x2d7f, 0x2d7f},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x2de0, 0x2dff},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x302a, 0x302d},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x3099, 0x309a},
|
|
|
|
|
{0xa66f, 0xa672},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa674, 0xa67d},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xa69e, 0xa69f},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa6f0, 0xa6f1},
|
|
|
|
|
{0xa802, 0xa802},
|
|
|
|
|
{0xa806, 0xa806},
|
|
|
|
|
{0xa80b, 0xa80b},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xa825, 0xa826},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0xa82c, 0xa82c},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xa8c4, 0xa8c5},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa8e0, 0xa8f1},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0xa8ff, 0xa8ff},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa926, 0xa92d},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xa947, 0xa951},
|
|
|
|
|
{0xa980, 0xa982},
|
|
|
|
|
{0xa9b3, 0xa9b3},
|
|
|
|
|
{0xa9b6, 0xa9b9},
|
|
|
|
|
{0xa9bc, 0xa9bd},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa9e5, 0xa9e5},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xaa29, 0xaa2e},
|
|
|
|
|
{0xaa31, 0xaa32},
|
|
|
|
|
{0xaa35, 0xaa36},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xaa43, 0xaa43},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xaa4c, 0xaa4c},
|
|
|
|
|
{0xaa7c, 0xaa7c},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xaab0, 0xaab0},
|
|
|
|
|
{0xaab2, 0xaab4},
|
|
|
|
|
{0xaab7, 0xaab8},
|
|
|
|
|
{0xaabe, 0xaabf},
|
|
|
|
|
{0xaac1, 0xaac1},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0xaaec, 0xaaed},
|
|
|
|
|
{0xaaf6, 0xaaf6},
|
|
|
|
|
{0xabe5, 0xabe5},
|
|
|
|
|
{0xabe8, 0xabe8},
|
|
|
|
|
{0xabed, 0xabed},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xfb1e, 0xfb1e},
|
|
|
|
|
{0xfe00, 0xfe0f},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xfe20, 0xfe2f},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x101fd, 0x101fd},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x102e0, 0x102e0},
|
|
|
|
|
{0x10376, 0x1037a},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x10a01, 0x10a03},
|
|
|
|
|
{0x10a05, 0x10a06},
|
|
|
|
|
{0x10a0c, 0x10a0f},
|
|
|
|
|
{0x10a38, 0x10a3a},
|
|
|
|
|
{0x10a3f, 0x10a3f},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10ae5, 0x10ae6},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x10d24, 0x10d27},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x10eab, 0x10eac},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x10efd, 0x10eff},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x10f46, 0x10f50},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x10f82, 0x10f85},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11001, 0x11001},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x11038, 0x11046},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x11070, 0x11070},
|
|
|
|
|
{0x11073, 0x11074},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1107f, 0x11081},
|
|
|
|
|
{0x110b3, 0x110b6},
|
|
|
|
|
{0x110b9, 0x110ba},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x110c2, 0x110c2},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x11100, 0x11102},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11127, 0x1112b},
|
|
|
|
|
{0x1112d, 0x11134},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x11173, 0x11173},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11180, 0x11181},
|
|
|
|
|
{0x111b6, 0x111be},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x111c9, 0x111cc},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x111cf, 0x111cf},
|
|
|
|
|
{0x1122f, 0x11231},
|
|
|
|
|
{0x11234, 0x11234},
|
|
|
|
|
{0x11236, 0x11237},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1123e, 0x1123e},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x11241, 0x11241},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x112df, 0x112df},
|
|
|
|
|
{0x112e3, 0x112ea},
|
|
|
|
|
{0x11300, 0x11301},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x1133b, 0x1133c},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11340, 0x11340},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x11366, 0x1136c},
|
|
|
|
|
{0x11370, 0x11374},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11438, 0x1143f},
|
|
|
|
|
{0x11442, 0x11444},
|
|
|
|
|
{0x11446, 0x11446},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x1145e, 0x1145e},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x114b3, 0x114b8},
|
|
|
|
|
{0x114ba, 0x114ba},
|
|
|
|
|
{0x114bf, 0x114c0},
|
|
|
|
|
{0x114c2, 0x114c3},
|
|
|
|
|
{0x115b2, 0x115b5},
|
|
|
|
|
{0x115bc, 0x115bd},
|
|
|
|
|
{0x115bf, 0x115c0},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x115dc, 0x115dd},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11633, 0x1163a},
|
|
|
|
|
{0x1163d, 0x1163d},
|
|
|
|
|
{0x1163f, 0x11640},
|
|
|
|
|
{0x116ab, 0x116ab},
|
|
|
|
|
{0x116ad, 0x116ad},
|
|
|
|
|
{0x116b0, 0x116b5},
|
|
|
|
|
{0x116b7, 0x116b7},
|
|
|
|
|
{0x1171d, 0x1171f},
|
|
|
|
|
{0x11722, 0x11725},
|
|
|
|
|
{0x11727, 0x1172b},
|
|
|
|
|
{0x1182f, 0x11837},
|
|
|
|
|
{0x11839, 0x1183a},
|
|
|
|
|
{0x1193b, 0x1193c},
|
|
|
|
|
{0x1193e, 0x1193e},
|
|
|
|
|
{0x11943, 0x11943},
|
|
|
|
|
{0x119d4, 0x119d7},
|
|
|
|
|
{0x119da, 0x119db},
|
|
|
|
|
{0x119e0, 0x119e0},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x11a01, 0x11a0a},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11a33, 0x11a38},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x11a3b, 0x11a3e},
|
|
|
|
|
{0x11a47, 0x11a47},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11a51, 0x11a56},
|
|
|
|
|
{0x11a59, 0x11a5b},
|
|
|
|
|
{0x11a8a, 0x11a96},
|
|
|
|
|
{0x11a98, 0x11a99},
|
|
|
|
|
{0x11c30, 0x11c36},
|
|
|
|
|
{0x11c38, 0x11c3d},
|
|
|
|
|
{0x11c3f, 0x11c3f},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x11c92, 0x11ca7},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11caa, 0x11cb0},
|
|
|
|
|
{0x11cb2, 0x11cb3},
|
|
|
|
|
{0x11cb5, 0x11cb6},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x11d31, 0x11d36},
|
|
|
|
|
{0x11d3a, 0x11d3a},
|
|
|
|
|
{0x11d3c, 0x11d3d},
|
|
|
|
|
{0x11d3f, 0x11d45},
|
|
|
|
|
{0x11d47, 0x11d47},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x11d90, 0x11d91},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11d95, 0x11d95},
|
|
|
|
|
{0x11d97, 0x11d97},
|
|
|
|
|
{0x11ef3, 0x11ef4},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x11f00, 0x11f01},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x11f36, 0x11f3a},
|
|
|
|
|
{0x11f40, 0x11f40},
|
|
|
|
|
{0x11f42, 0x11f42},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x13440, 0x13440},
|
|
|
|
|
{0x13447, 0x13455},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x16af0, 0x16af4},
|
|
|
|
|
{0x16b30, 0x16b36},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x16f4f, 0x16f4f},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x16f8f, 0x16f92},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x16fe4, 0x16fe4},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x1bc9d, 0x1bc9e},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1cf00, 0x1cf2d},
|
|
|
|
|
{0x1cf30, 0x1cf46},
|
2022-10-05 18:03:00 +01:00
|
|
|
|
{0x1d167, 0x1d169},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1d17b, 0x1d182},
|
|
|
|
|
{0x1d185, 0x1d18b},
|
|
|
|
|
{0x1d1aa, 0x1d1ad},
|
|
|
|
|
{0x1d242, 0x1d244},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x1da00, 0x1da36},
|
|
|
|
|
{0x1da3b, 0x1da6c},
|
|
|
|
|
{0x1da75, 0x1da75},
|
|
|
|
|
{0x1da84, 0x1da84},
|
|
|
|
|
{0x1da9b, 0x1da9f},
|
|
|
|
|
{0x1daa1, 0x1daaf},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1e000, 0x1e006},
|
|
|
|
|
{0x1e008, 0x1e018},
|
|
|
|
|
{0x1e01b, 0x1e021},
|
|
|
|
|
{0x1e023, 0x1e024},
|
|
|
|
|
{0x1e026, 0x1e02a},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1e08f, 0x1e08f},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x1e130, 0x1e136},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1e2ae, 0x1e2ae},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x1e2ec, 0x1e2ef},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1e4ec, 0x1e4ef},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x1e8d0, 0x1e8d6},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1e944, 0x1e94a},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xe0100, 0xe01ef}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
return intable(combining, sizeof(combining), c);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return TRUE for characters that can be displayed in a normal way.
|
|
|
|
|
* Only for characters of 0x100 and above!
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_printable(int c)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
#ifdef USE_WCHAR_FUNCTIONS
|
|
|
|
|
/*
|
|
|
|
|
* Assume the library function iswprint() is more accurate than our own table.
|
|
|
|
|
*/
|
|
|
|
|
return iswprint(c);
|
|
|
|
|
#else
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Sorted list of non-overlapping intervals.
|
|
|
|
|
// 0xd800-0xdfff are UTF-16 surrogates; they are not valid characters on their own.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
static struct interval nonprint[] =
|
|
|
|
|
{
|
|
|
|
|
{0x070f, 0x070f}, {0x180b, 0x180e}, {0x200b, 0x200f}, {0x202a, 0x202e},
|
2021-11-02 20:24:38 +00:00
|
|
|
|
{0x2060, 0x206f}, {0xd800, 0xdfff}, {0xfeff, 0xfeff}, {0xfff9, 0xfffb},
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0xfffe, 0xffff}
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
return !intable(nonprint, sizeof(nonprint), c);
|
|
|
|
|
#endif
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Sorted list of non-overlapping intervals of all Emoji characters,
|
|
|
|
|
// based on http://unicode.org/emoji/charts/emoji-list.html
|
2020-08-28 21:04:24 +02:00
|
|
|
|
// Generated by ../runtime/tools/unicode.vim.
|
|
|
|
|
// Excludes 0x00a9 and 0x00ae because they are considered latin1.
|
2016-04-02 22:14:51 +02:00
|
|
|
|
static struct interval emoji_all[] =
|
|
|
|
|
{
|
|
|
|
|
{0x203c, 0x203c},
|
|
|
|
|
{0x2049, 0x2049},
|
|
|
|
|
{0x2122, 0x2122},
|
|
|
|
|
{0x2139, 0x2139},
|
|
|
|
|
{0x2194, 0x2199},
|
|
|
|
|
{0x21a9, 0x21aa},
|
|
|
|
|
{0x231a, 0x231b},
|
|
|
|
|
{0x2328, 0x2328},
|
|
|
|
|
{0x23cf, 0x23cf},
|
|
|
|
|
{0x23e9, 0x23f3},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x23f8, 0x23fa},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x24c2, 0x24c2},
|
|
|
|
|
{0x25aa, 0x25ab},
|
|
|
|
|
{0x25b6, 0x25b6},
|
|
|
|
|
{0x25c0, 0x25c0},
|
|
|
|
|
{0x25fb, 0x25fe},
|
|
|
|
|
{0x2600, 0x2604},
|
|
|
|
|
{0x260e, 0x260e},
|
|
|
|
|
{0x2611, 0x2611},
|
|
|
|
|
{0x2614, 0x2615},
|
|
|
|
|
{0x2618, 0x2618},
|
|
|
|
|
{0x261d, 0x261d},
|
|
|
|
|
{0x2620, 0x2620},
|
|
|
|
|
{0x2622, 0x2623},
|
|
|
|
|
{0x2626, 0x2626},
|
|
|
|
|
{0x262a, 0x262a},
|
|
|
|
|
{0x262e, 0x262f},
|
|
|
|
|
{0x2638, 0x263a},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x2640, 0x2640},
|
|
|
|
|
{0x2642, 0x2642},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x2648, 0x2653},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x265f, 0x2660},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x2663, 0x2663},
|
|
|
|
|
{0x2665, 0x2666},
|
|
|
|
|
{0x2668, 0x2668},
|
|
|
|
|
{0x267b, 0x267b},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x267e, 0x267f},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x2692, 0x2697},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x2699, 0x2699},
|
|
|
|
|
{0x269b, 0x269c},
|
|
|
|
|
{0x26a0, 0x26a1},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x26a7, 0x26a7},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x26aa, 0x26ab},
|
|
|
|
|
{0x26b0, 0x26b1},
|
|
|
|
|
{0x26bd, 0x26be},
|
|
|
|
|
{0x26c4, 0x26c5},
|
|
|
|
|
{0x26c8, 0x26c8},
|
|
|
|
|
{0x26ce, 0x26cf},
|
|
|
|
|
{0x26d1, 0x26d1},
|
|
|
|
|
{0x26d3, 0x26d4},
|
|
|
|
|
{0x26e9, 0x26ea},
|
|
|
|
|
{0x26f0, 0x26f5},
|
|
|
|
|
{0x26f7, 0x26fa},
|
|
|
|
|
{0x26fd, 0x26fd},
|
|
|
|
|
{0x2702, 0x2702},
|
|
|
|
|
{0x2705, 0x2705},
|
|
|
|
|
{0x2708, 0x270d},
|
|
|
|
|
{0x270f, 0x270f},
|
|
|
|
|
{0x2712, 0x2712},
|
|
|
|
|
{0x2714, 0x2714},
|
|
|
|
|
{0x2716, 0x2716},
|
|
|
|
|
{0x271d, 0x271d},
|
|
|
|
|
{0x2721, 0x2721},
|
|
|
|
|
{0x2728, 0x2728},
|
|
|
|
|
{0x2733, 0x2734},
|
|
|
|
|
{0x2744, 0x2744},
|
|
|
|
|
{0x2747, 0x2747},
|
|
|
|
|
{0x274c, 0x274c},
|
|
|
|
|
{0x274e, 0x274e},
|
|
|
|
|
{0x2753, 0x2755},
|
|
|
|
|
{0x2757, 0x2757},
|
|
|
|
|
{0x2763, 0x2764},
|
|
|
|
|
{0x2795, 0x2797},
|
|
|
|
|
{0x27a1, 0x27a1},
|
|
|
|
|
{0x27b0, 0x27b0},
|
|
|
|
|
{0x27bf, 0x27bf},
|
|
|
|
|
{0x2934, 0x2935},
|
|
|
|
|
{0x2b05, 0x2b07},
|
|
|
|
|
{0x2b1b, 0x2b1c},
|
|
|
|
|
{0x2b50, 0x2b50},
|
|
|
|
|
{0x2b55, 0x2b55},
|
|
|
|
|
{0x3030, 0x3030},
|
|
|
|
|
{0x303d, 0x303d},
|
|
|
|
|
{0x3297, 0x3297},
|
|
|
|
|
{0x3299, 0x3299},
|
|
|
|
|
{0x1f004, 0x1f004},
|
|
|
|
|
{0x1f0cf, 0x1f0cf},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x1f170, 0x1f171},
|
|
|
|
|
{0x1f17e, 0x1f17f},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x1f18e, 0x1f18e},
|
|
|
|
|
{0x1f191, 0x1f19a},
|
|
|
|
|
{0x1f1e6, 0x1f1ff},
|
|
|
|
|
{0x1f201, 0x1f202},
|
|
|
|
|
{0x1f21a, 0x1f21a},
|
|
|
|
|
{0x1f22f, 0x1f22f},
|
|
|
|
|
{0x1f232, 0x1f23a},
|
|
|
|
|
{0x1f250, 0x1f251},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x1f300, 0x1f321},
|
|
|
|
|
{0x1f324, 0x1f393},
|
|
|
|
|
{0x1f396, 0x1f397},
|
|
|
|
|
{0x1f399, 0x1f39b},
|
|
|
|
|
{0x1f39e, 0x1f3f0},
|
|
|
|
|
{0x1f3f3, 0x1f3f5},
|
|
|
|
|
{0x1f3f7, 0x1f4fd},
|
|
|
|
|
{0x1f4ff, 0x1f53d},
|
|
|
|
|
{0x1f549, 0x1f54e},
|
2016-04-02 22:14:51 +02:00
|
|
|
|
{0x1f550, 0x1f567},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x1f56f, 0x1f570},
|
|
|
|
|
{0x1f573, 0x1f57a},
|
|
|
|
|
{0x1f587, 0x1f587},
|
|
|
|
|
{0x1f58a, 0x1f58d},
|
|
|
|
|
{0x1f590, 0x1f590},
|
|
|
|
|
{0x1f595, 0x1f596},
|
|
|
|
|
{0x1f5a4, 0x1f5a5},
|
|
|
|
|
{0x1f5a8, 0x1f5a8},
|
|
|
|
|
{0x1f5b1, 0x1f5b2},
|
|
|
|
|
{0x1f5bc, 0x1f5bc},
|
|
|
|
|
{0x1f5c2, 0x1f5c4},
|
|
|
|
|
{0x1f5d1, 0x1f5d3},
|
|
|
|
|
{0x1f5dc, 0x1f5de},
|
|
|
|
|
{0x1f5e1, 0x1f5e1},
|
|
|
|
|
{0x1f5e3, 0x1f5e3},
|
|
|
|
|
{0x1f5e8, 0x1f5e8},
|
|
|
|
|
{0x1f5ef, 0x1f5ef},
|
|
|
|
|
{0x1f5f3, 0x1f5f3},
|
|
|
|
|
{0x1f5fa, 0x1f64f},
|
|
|
|
|
{0x1f680, 0x1f6c5},
|
|
|
|
|
{0x1f6cb, 0x1f6d2},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x1f6d5, 0x1f6d7},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1f6dc, 0x1f6e5},
|
2017-06-22 15:27:37 +02:00
|
|
|
|
{0x1f6e9, 0x1f6e9},
|
|
|
|
|
{0x1f6eb, 0x1f6ec},
|
|
|
|
|
{0x1f6f0, 0x1f6f0},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x1f6f3, 0x1f6fc},
|
|
|
|
|
{0x1f7e0, 0x1f7eb},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1f7f0, 0x1f7f0},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0x1f90c, 0x1f93a},
|
|
|
|
|
{0x1f93c, 0x1f945},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x1f947, 0x1f9ff},
|
|
|
|
|
{0x1fa70, 0x1fa7c},
|
|
|
|
|
{0x1fa80, 0x1fa88},
|
|
|
|
|
{0x1fa90, 0x1fabd},
|
|
|
|
|
{0x1fabf, 0x1fac5},
|
|
|
|
|
{0x1face, 0x1fadb},
|
|
|
|
|
{0x1fae0, 0x1fae8},
|
|
|
|
|
{0x1faf0, 0x1faf8}
|
2016-04-02 22:14:51 +02:00
|
|
|
|
};
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Get class of a Unicode character.
|
|
|
|
|
* 0: white space
|
|
|
|
|
* 1: punctuation
|
|
|
|
|
* 2 or bigger: some class of word character.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_class(int c)
|
2017-01-28 16:39:34 +01:00
|
|
|
|
{
|
|
|
|
|
return utf_class_buf(c, curbuf);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
int
|
|
|
|
|
utf_class_buf(int c, buf_T *buf)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// sorted list of non-overlapping intervals
|
2004-06-13 20:20:40 +00:00
|
|
|
|
static struct clinterval
|
|
|
|
|
{
|
2013-11-12 04:44:01 +01:00
|
|
|
|
unsigned int first;
|
|
|
|
|
unsigned int last;
|
|
|
|
|
unsigned int class;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
} classes[] =
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x037e, 0x037e, 1}, // Greek question mark
|
|
|
|
|
{0x0387, 0x0387, 1}, // Greek ano teleia
|
|
|
|
|
{0x055a, 0x055f, 1}, // Armenian punctuation
|
|
|
|
|
{0x0589, 0x0589, 1}, // Armenian full stop
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x05be, 0x05be, 1},
|
|
|
|
|
{0x05c0, 0x05c0, 1},
|
|
|
|
|
{0x05c3, 0x05c3, 1},
|
|
|
|
|
{0x05f3, 0x05f4, 1},
|
|
|
|
|
{0x060c, 0x060c, 1},
|
|
|
|
|
{0x061b, 0x061b, 1},
|
|
|
|
|
{0x061f, 0x061f, 1},
|
|
|
|
|
{0x066a, 0x066d, 1},
|
|
|
|
|
{0x06d4, 0x06d4, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x0700, 0x070d, 1}, // Syriac punctuation
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x0964, 0x0965, 1},
|
|
|
|
|
{0x0970, 0x0970, 1},
|
|
|
|
|
{0x0df4, 0x0df4, 1},
|
|
|
|
|
{0x0e4f, 0x0e4f, 1},
|
|
|
|
|
{0x0e5a, 0x0e5b, 1},
|
|
|
|
|
{0x0f04, 0x0f12, 1},
|
|
|
|
|
{0x0f3a, 0x0f3d, 1},
|
|
|
|
|
{0x0f85, 0x0f85, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x104a, 0x104f, 1}, // Myanmar punctuation
|
|
|
|
|
{0x10fb, 0x10fb, 1}, // Georgian punctuation
|
|
|
|
|
{0x1361, 0x1368, 1}, // Ethiopic punctuation
|
|
|
|
|
{0x166d, 0x166e, 1}, // Canadian Syl. punctuation
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x1680, 0x1680, 0},
|
|
|
|
|
{0x169b, 0x169c, 1},
|
|
|
|
|
{0x16eb, 0x16ed, 1},
|
|
|
|
|
{0x1735, 0x1736, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x17d4, 0x17dc, 1}, // Khmer punctuation
|
|
|
|
|
{0x1800, 0x180a, 1}, // Mongolian punctuation
|
|
|
|
|
{0x2000, 0x200b, 0}, // spaces
|
|
|
|
|
{0x200c, 0x2027, 1}, // punctuation and symbols
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x2028, 0x2029, 0},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x202a, 0x202e, 1}, // punctuation and symbols
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x202f, 0x202f, 0},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x2030, 0x205e, 1}, // punctuation and symbols
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x205f, 0x205f, 0},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x2060, 0x206f, 1}, // punctuation and symbols
|
|
|
|
|
{0x2070, 0x207f, 0x2070}, // superscript
|
|
|
|
|
{0x2080, 0x2094, 0x2080}, // subscript
|
|
|
|
|
{0x20a0, 0x27ff, 1}, // all kinds of symbols
|
|
|
|
|
{0x2800, 0x28ff, 0x2800}, // braille
|
|
|
|
|
{0x2900, 0x2998, 1}, // arrows, brackets, etc.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x29d8, 0x29db, 1},
|
|
|
|
|
{0x29fc, 0x29fd, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x2e00, 0x2e7f, 1}, // supplemental punctuation
|
|
|
|
|
{0x3000, 0x3000, 0}, // ideographic space
|
|
|
|
|
{0x3001, 0x3020, 1}, // ideographic punctuation
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0x3030, 0x3030, 1},
|
|
|
|
|
{0x303d, 0x303d, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0x3040, 0x309f, 0x3040}, // Hiragana
|
|
|
|
|
{0x30a0, 0x30ff, 0x30a0}, // Katakana
|
|
|
|
|
{0x3300, 0x9fff, 0x4e00}, // CJK Ideographs
|
|
|
|
|
{0xac00, 0xd7a3, 0xac00}, // Hangul Syllables
|
|
|
|
|
{0xf900, 0xfaff, 0x4e00}, // CJK Ideographs
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{0xfd3e, 0xfd3f, 1},
|
2019-12-04 21:57:43 +01:00
|
|
|
|
{0xfe30, 0xfe6b, 1}, // punctuation forms
|
|
|
|
|
{0xff00, 0xff0f, 1}, // half/fullwidth ASCII
|
|
|
|
|
{0xff1a, 0xff20, 1}, // half/fullwidth ASCII
|
|
|
|
|
{0xff3b, 0xff40, 1}, // half/fullwidth ASCII
|
|
|
|
|
{0xff5b, 0xff65, 1}, // half/fullwidth ASCII
|
|
|
|
|
{0x1d000, 0x1d24f, 1}, // Musical notation
|
|
|
|
|
{0x1d400, 0x1d7ff, 1}, // Mathematical Alphanumeric Symbols
|
|
|
|
|
{0x1f000, 0x1f2ff, 1}, // Game pieces; enclosed characters
|
|
|
|
|
{0x1f300, 0x1f9ff, 1}, // Many symbol blocks
|
|
|
|
|
{0x20000, 0x2a6df, 0x4e00}, // CJK Ideographs
|
|
|
|
|
{0x2a700, 0x2b73f, 0x4e00}, // CJK Ideographs
|
|
|
|
|
{0x2b740, 0x2b81f, 0x4e00}, // CJK Ideographs
|
|
|
|
|
{0x2f800, 0x2fa1f, 0x4e00}, // CJK Ideographs
|
2004-06-13 20:20:40 +00:00
|
|
|
|
};
|
2016-03-21 22:09:44 +01:00
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
int bot = 0;
|
2021-06-02 13:28:16 +02:00
|
|
|
|
int top = ARRAY_LENGTH(classes) - 1;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
int mid;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// First quick check for Latin1 characters, use 'iskeyword'.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (c < 0x100)
|
|
|
|
|
{
|
2005-06-17 21:55:00 +00:00
|
|
|
|
if (c == ' ' || c == '\t' || c == NUL || c == 0xa0)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return 0; // blank
|
2017-01-28 16:39:34 +01:00
|
|
|
|
if (vim_iswordc_buf(c, buf))
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return 2; // word character
|
|
|
|
|
return 1; // punctuation
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
2020-08-28 22:24:57 +02:00
|
|
|
|
// emoji
|
|
|
|
|
if (intable(emoji_all, sizeof(emoji_all), c))
|
|
|
|
|
return 3;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// binary search in table
|
2004-06-13 20:20:40 +00:00
|
|
|
|
while (top >= bot)
|
|
|
|
|
{
|
|
|
|
|
mid = (bot + top) / 2;
|
2013-11-12 04:44:01 +01:00
|
|
|
|
if (classes[mid].last < (unsigned int)c)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
bot = mid + 1;
|
2013-11-12 04:44:01 +01:00
|
|
|
|
else if (classes[mid].first > (unsigned int)c)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
top = mid - 1;
|
|
|
|
|
else
|
|
|
|
|
return (int)classes[mid].class;
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// most other characters are "word" characters
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 2;
|
|
|
|
|
}
|
|
|
|
|
|
2016-04-02 22:14:51 +02:00
|
|
|
|
int
|
|
|
|
|
utf_ambiguous_width(int c)
|
|
|
|
|
{
|
|
|
|
|
return c >= 0x80 && (intable(ambiguous, sizeof(ambiguous), c)
|
|
|
|
|
|| intable(emoji_all, sizeof(emoji_all), c));
|
|
|
|
|
}
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Code for Unicode case-dependent operations. Based on notes in
|
|
|
|
|
* http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
|
|
|
|
|
* This code uses simple case folding, not full case folding.
|
2010-01-12 19:52:03 +01:00
|
|
|
|
* This comment dates from Unicode 5.2; the tables below have since been
* regenerated for later Unicode versions.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
/*
|
2010-01-12 19:52:03 +01:00
|
|
|
|
* The following tables are built by ../runtime/tools/unicode.vim.
|
|
|
|
|
* They must be in numeric order, because we use binary search.
|
|
|
|
|
* An entry such as {0x41,0x5a,1,32} means that Unicode characters in the
|
|
|
|
|
* range from 0x41 to 0x5a inclusive, stepping by 1, are changed to
|
|
|
|
|
* folded/upper/lower by adding 32. A step of -1 marks an entry that
* covers a single character (rangeStart equals rangeEnd).
|
2004-06-13 20:20:40 +00:00
|
|
|
|
*/
|
|
|
|
|
typedef struct
|
|
|
|
|
{
|
|
|
|
|
int rangeStart;
|
|
|
|
|
int rangeEnd;
|
|
|
|
|
int step;
|
|
|
|
|
int offset;
|
|
|
|
|
} convertStruct;
|
|
|
|
|
|
2005-06-01 21:51:55 +00:00
|
|
|
|
static convertStruct foldCase[] =
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x41,0x5a,1,32},
|
|
|
|
|
{0xb5,0xb5,-1,775},
|
|
|
|
|
{0xc0,0xd6,1,32},
|
|
|
|
|
{0xd8,0xde,1,32},
|
|
|
|
|
{0x100,0x12e,2,1},
|
|
|
|
|
{0x132,0x136,2,1},
|
|
|
|
|
{0x139,0x147,2,1},
|
|
|
|
|
{0x14a,0x176,2,1},
|
|
|
|
|
{0x178,0x178,-1,-121},
|
|
|
|
|
{0x179,0x17d,2,1},
|
|
|
|
|
{0x17f,0x17f,-1,-268},
|
|
|
|
|
{0x181,0x181,-1,210},
|
|
|
|
|
{0x182,0x184,2,1},
|
|
|
|
|
{0x186,0x186,-1,206},
|
|
|
|
|
{0x187,0x187,-1,1},
|
|
|
|
|
{0x189,0x18a,1,205},
|
|
|
|
|
{0x18b,0x18b,-1,1},
|
|
|
|
|
{0x18e,0x18e,-1,79},
|
|
|
|
|
{0x18f,0x18f,-1,202},
|
|
|
|
|
{0x190,0x190,-1,203},
|
|
|
|
|
{0x191,0x191,-1,1},
|
|
|
|
|
{0x193,0x193,-1,205},
|
|
|
|
|
{0x194,0x194,-1,207},
|
|
|
|
|
{0x196,0x196,-1,211},
|
|
|
|
|
{0x197,0x197,-1,209},
|
|
|
|
|
{0x198,0x198,-1,1},
|
|
|
|
|
{0x19c,0x19c,-1,211},
|
|
|
|
|
{0x19d,0x19d,-1,213},
|
|
|
|
|
{0x19f,0x19f,-1,214},
|
|
|
|
|
{0x1a0,0x1a4,2,1},
|
|
|
|
|
{0x1a6,0x1a6,-1,218},
|
|
|
|
|
{0x1a7,0x1a7,-1,1},
|
|
|
|
|
{0x1a9,0x1a9,-1,218},
|
|
|
|
|
{0x1ac,0x1ac,-1,1},
|
|
|
|
|
{0x1ae,0x1ae,-1,218},
|
|
|
|
|
{0x1af,0x1af,-1,1},
|
|
|
|
|
{0x1b1,0x1b2,1,217},
|
|
|
|
|
{0x1b3,0x1b5,2,1},
|
|
|
|
|
{0x1b7,0x1b7,-1,219},
|
|
|
|
|
{0x1b8,0x1bc,4,1},
|
|
|
|
|
{0x1c4,0x1c4,-1,2},
|
|
|
|
|
{0x1c5,0x1c5,-1,1},
|
|
|
|
|
{0x1c7,0x1c7,-1,2},
|
|
|
|
|
{0x1c8,0x1c8,-1,1},
|
|
|
|
|
{0x1ca,0x1ca,-1,2},
|
|
|
|
|
{0x1cb,0x1db,2,1},
|
|
|
|
|
{0x1de,0x1ee,2,1},
|
|
|
|
|
{0x1f1,0x1f1,-1,2},
|
|
|
|
|
{0x1f2,0x1f4,2,1},
|
|
|
|
|
{0x1f6,0x1f6,-1,-97},
|
|
|
|
|
{0x1f7,0x1f7,-1,-56},
|
|
|
|
|
{0x1f8,0x21e,2,1},
|
|
|
|
|
{0x220,0x220,-1,-130},
|
|
|
|
|
{0x222,0x232,2,1},
|
|
|
|
|
{0x23a,0x23a,-1,10795},
|
|
|
|
|
{0x23b,0x23b,-1,1},
|
|
|
|
|
{0x23d,0x23d,-1,-163},
|
|
|
|
|
{0x23e,0x23e,-1,10792},
|
|
|
|
|
{0x241,0x241,-1,1},
|
|
|
|
|
{0x243,0x243,-1,-195},
|
|
|
|
|
{0x244,0x244,-1,69},
|
|
|
|
|
{0x245,0x245,-1,71},
|
|
|
|
|
{0x246,0x24e,2,1},
|
|
|
|
|
{0x345,0x345,-1,116},
|
|
|
|
|
{0x370,0x372,2,1},
|
|
|
|
|
{0x376,0x376,-1,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x37f,0x37f,-1,116},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x386,0x386,-1,38},
|
|
|
|
|
{0x388,0x38a,1,37},
|
|
|
|
|
{0x38c,0x38c,-1,64},
|
|
|
|
|
{0x38e,0x38f,1,63},
|
|
|
|
|
{0x391,0x3a1,1,32},
|
|
|
|
|
{0x3a3,0x3ab,1,32},
|
|
|
|
|
{0x3c2,0x3c2,-1,1},
|
|
|
|
|
{0x3cf,0x3cf,-1,8},
|
|
|
|
|
{0x3d0,0x3d0,-1,-30},
|
|
|
|
|
{0x3d1,0x3d1,-1,-25},
|
|
|
|
|
{0x3d5,0x3d5,-1,-15},
|
|
|
|
|
{0x3d6,0x3d6,-1,-22},
|
|
|
|
|
{0x3d8,0x3ee,2,1},
|
|
|
|
|
{0x3f0,0x3f0,-1,-54},
|
|
|
|
|
{0x3f1,0x3f1,-1,-48},
|
|
|
|
|
{0x3f4,0x3f4,-1,-60},
|
|
|
|
|
{0x3f5,0x3f5,-1,-64},
|
|
|
|
|
{0x3f7,0x3f7,-1,1},
|
|
|
|
|
{0x3f9,0x3f9,-1,-7},
|
|
|
|
|
{0x3fa,0x3fa,-1,1},
|
|
|
|
|
{0x3fd,0x3ff,1,-130},
|
|
|
|
|
{0x400,0x40f,1,80},
|
|
|
|
|
{0x410,0x42f,1,32},
|
|
|
|
|
{0x460,0x480,2,1},
|
|
|
|
|
{0x48a,0x4be,2,1},
|
|
|
|
|
{0x4c0,0x4c0,-1,15},
|
|
|
|
|
{0x4c1,0x4cd,2,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x4d0,0x52e,2,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x531,0x556,1,48},
|
|
|
|
|
{0x10a0,0x10c5,1,7264},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10c7,0x10cd,6,7264},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x13f8,0x13fd,1,-8},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1c80,0x1c80,-1,-6222},
|
|
|
|
|
{0x1c81,0x1c81,-1,-6221},
|
|
|
|
|
{0x1c82,0x1c82,-1,-6212},
|
|
|
|
|
{0x1c83,0x1c84,1,-6210},
|
|
|
|
|
{0x1c85,0x1c85,-1,-6211},
|
|
|
|
|
{0x1c86,0x1c86,-1,-6204},
|
|
|
|
|
{0x1c87,0x1c87,-1,-6180},
|
|
|
|
|
{0x1c88,0x1c88,-1,35267},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x1c90,0x1cba,1,-3008},
|
|
|
|
|
{0x1cbd,0x1cbf,1,-3008},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1e00,0x1e94,2,1},
|
|
|
|
|
{0x1e9b,0x1e9b,-1,-58},
|
|
|
|
|
{0x1e9e,0x1e9e,-1,-7615},
|
|
|
|
|
{0x1ea0,0x1efe,2,1},
|
|
|
|
|
{0x1f08,0x1f0f,1,-8},
|
|
|
|
|
{0x1f18,0x1f1d,1,-8},
|
|
|
|
|
{0x1f28,0x1f2f,1,-8},
|
|
|
|
|
{0x1f38,0x1f3f,1,-8},
|
|
|
|
|
{0x1f48,0x1f4d,1,-8},
|
|
|
|
|
{0x1f59,0x1f5f,2,-8},
|
|
|
|
|
{0x1f68,0x1f6f,1,-8},
|
|
|
|
|
{0x1f88,0x1f8f,1,-8},
|
|
|
|
|
{0x1f98,0x1f9f,1,-8},
|
|
|
|
|
{0x1fa8,0x1faf,1,-8},
|
|
|
|
|
{0x1fb8,0x1fb9,1,-8},
|
|
|
|
|
{0x1fba,0x1fbb,1,-74},
|
|
|
|
|
{0x1fbc,0x1fbc,-1,-9},
|
|
|
|
|
{0x1fbe,0x1fbe,-1,-7173},
|
|
|
|
|
{0x1fc8,0x1fcb,1,-86},
|
|
|
|
|
{0x1fcc,0x1fcc,-1,-9},
|
2023-10-11 21:24:49 +02:00
|
|
|
|
{0x1fd3,0x1fd3,-1,-7235},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1fd8,0x1fd9,1,-8},
|
|
|
|
|
{0x1fda,0x1fdb,1,-100},
|
2023-10-11 21:24:49 +02:00
|
|
|
|
{0x1fe3,0x1fe3,-1,-7219},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1fe8,0x1fe9,1,-8},
|
|
|
|
|
{0x1fea,0x1feb,1,-112},
|
|
|
|
|
{0x1fec,0x1fec,-1,-7},
|
|
|
|
|
{0x1ff8,0x1ff9,1,-128},
|
|
|
|
|
{0x1ffa,0x1ffb,1,-126},
|
|
|
|
|
{0x1ffc,0x1ffc,-1,-9},
|
|
|
|
|
{0x2126,0x2126,-1,-7517},
|
|
|
|
|
{0x212a,0x212a,-1,-8383},
|
|
|
|
|
{0x212b,0x212b,-1,-8262},
|
|
|
|
|
{0x2132,0x2132,-1,28},
|
|
|
|
|
{0x2160,0x216f,1,16},
|
|
|
|
|
{0x2183,0x2183,-1,1},
|
|
|
|
|
{0x24b6,0x24cf,1,26},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x2c00,0x2c2f,1,48},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x2c60,0x2c60,-1,1},
|
|
|
|
|
{0x2c62,0x2c62,-1,-10743},
|
|
|
|
|
{0x2c63,0x2c63,-1,-3814},
|
|
|
|
|
{0x2c64,0x2c64,-1,-10727},
|
|
|
|
|
{0x2c67,0x2c6b,2,1},
|
|
|
|
|
{0x2c6d,0x2c6d,-1,-10780},
|
|
|
|
|
{0x2c6e,0x2c6e,-1,-10749},
|
|
|
|
|
{0x2c6f,0x2c6f,-1,-10783},
|
|
|
|
|
{0x2c70,0x2c70,-1,-10782},
|
|
|
|
|
{0x2c72,0x2c75,3,1},
|
|
|
|
|
{0x2c7e,0x2c7f,1,-10815},
|
|
|
|
|
{0x2c80,0x2ce2,2,1},
|
|
|
|
|
{0x2ceb,0x2ced,2,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x2cf2,0xa640,31054,1},
|
|
|
|
|
{0xa642,0xa66c,2,1},
|
|
|
|
|
{0xa680,0xa69a,2,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa722,0xa72e,2,1},
|
|
|
|
|
{0xa732,0xa76e,2,1},
|
|
|
|
|
{0xa779,0xa77b,2,1},
|
|
|
|
|
{0xa77d,0xa77d,-1,-35332},
|
|
|
|
|
{0xa77e,0xa786,2,1},
|
|
|
|
|
{0xa78b,0xa78b,-1,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa78d,0xa78d,-1,-42280},
|
|
|
|
|
{0xa790,0xa792,2,1},
|
|
|
|
|
{0xa796,0xa7a8,2,1},
|
|
|
|
|
{0xa7aa,0xa7aa,-1,-42308},
|
|
|
|
|
{0xa7ab,0xa7ab,-1,-42319},
|
|
|
|
|
{0xa7ac,0xa7ac,-1,-42315},
|
|
|
|
|
{0xa7ad,0xa7ad,-1,-42305},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0xa7ae,0xa7ae,-1,-42308},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa7b0,0xa7b0,-1,-42258},
|
|
|
|
|
{0xa7b1,0xa7b1,-1,-42282},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xa7b2,0xa7b2,-1,-42261},
|
|
|
|
|
{0xa7b3,0xa7b3,-1,928},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0xa7b4,0xa7c2,2,1},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0xa7c4,0xa7c4,-1,-48},
|
|
|
|
|
{0xa7c5,0xa7c5,-1,-42307},
|
|
|
|
|
{0xa7c6,0xa7c6,-1,-35384},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0xa7c7,0xa7c9,2,1},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0xa7d0,0xa7d6,6,1},
|
|
|
|
|
{0xa7d8,0xa7f5,29,1},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xab70,0xabbf,1,-38864},
|
2023-10-11 21:24:49 +02:00
|
|
|
|
{0xfb05,0xfb05,-1,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xff21,0xff3a,1,32},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10400,0x10427,1,40},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x104b0,0x104d3,1,40},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x10570,0x1057a,1,39},
|
|
|
|
|
{0x1057c,0x1058a,1,39},
|
|
|
|
|
{0x1058c,0x10592,1,39},
|
|
|
|
|
{0x10594,0x10595,1,39},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x10c80,0x10cb2,1,64},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x118a0,0x118bf,1,32},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x16e40,0x16e5f,1,32},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1e900,0x1e921,1,34}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Generic conversion function for case operations.
|
|
|
|
|
* Return the converted equivalent of "a", which is a UCS-4 character. Use
|
|
|
|
|
* binary search on the given conversion "table" of "tableSize" bytes.
|
|
|
|
|
*/
|
|
|
|
|
static int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_convert(
|
|
|
|
|
int a,
|
|
|
|
|
convertStruct table[],
|
|
|
|
|
int tableSize)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
int start, mid, end; // indices into table
|
2011-12-08 15:09:52 +01:00
|
|
|
|
int entries = tableSize / sizeof(convertStruct);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
start = 0;
|
2011-12-08 15:09:52 +01:00
|
|
|
|
end = entries;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
while (start < end)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// need to search further
|
2011-12-08 15:09:52 +01:00
|
|
|
|
mid = (end + start) / 2;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (table[mid].rangeEnd < a)
|
|
|
|
|
start = mid + 1;
|
|
|
|
|
else
|
|
|
|
|
end = mid;
|
|
|
|
|
}
|
2011-12-08 15:09:52 +01:00
|
|
|
|
if (start < entries
|
|
|
|
|
&& table[start].rangeStart <= a
|
|
|
|
|
&& a <= table[start].rangeEnd
|
2004-06-13 20:20:40 +00:00
|
|
|
|
&& (a - table[start].rangeStart) % table[start].step == 0)
|
|
|
|
|
return (a + table[start].offset);
|
|
|
|
|
else
|
|
|
|
|
return a;
|
|
|
|
|
}
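Each table entry above is {rangeStart, rangeEnd, step, offset}. A code point inside the range is converted when its distance from rangeStart is a multiple of step (step -1 marks a single-code-point range), and the result is the code point plus offset. The sketch below re-implements that lookup outside Vim on a few foldCase entries copied from the table above. The names `range_map`, `fold_sample` and `convert_cp` are hypothetical, and unlike utf_convert() the sketch takes an entry count rather than a size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Vim's convertStruct. */
typedef struct
{
    int range_start;
    int range_end;
    int step;      /* -1 for a single code point */
    int offset;    /* added to the code point when it matches */
} range_map;

/* A few entries copied from the foldCase table, sorted by range_end. */
static const range_map fold_sample[] =
{
    {0x1e00, 0x1e94, 2, 1},        /* Latin Extended Additional pairs */
    {0x1f08, 0x1f0f, 1, -8},       /* Greek capitals with breathings */
    {0x2126, 0x2126, -1, -7517},   /* OHM SIGN -> small omega */
    {0x212a, 0x212a, -1, -8383},   /* KELVIN SIGN -> 'k' */
    {0x212b, 0x212b, -1, -8262},   /* ANGSTROM SIGN -> a with ring */
};

/*
 * Binary search for the first entry whose range_end is >= "c", then apply
 * that entry only if "c" lies inside the range and on the step grid.
 */
static int
convert_cp(int c, const range_map *table, size_t entries)
{
    size_t start = 0;
    size_t end = entries;

    assert(table != NULL || entries == 0);
    while (start < end)
    {
        size_t mid = start + (end - start) / 2;

        if (table[mid].range_end < c)
            start = mid + 1;
        else
            end = mid;
    }
    if (start < entries
            && table[start].range_start <= c
            && c <= table[start].range_end
            && (c - table[start].range_start) % table[start].step == 0)
        return c + table[start].offset;
    return c;    /* no mapping: unchanged */
}
```

With this sample, U+212A KELVIN SIGN folds to 0x212a - 8383 = 0x6b ('k'), while U+1E01 is left alone because it is an odd distance from 0x1e00 in a step-2 range.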
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return the folded-case equivalent of "a", which is a UCS-4 character. Uses
|
|
|
|
|
* simple case folding.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_fold(int a)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2016-07-10 18:24:27 +02:00
|
|
|
|
if (a < 0x80)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// be fast for ASCII
|
2016-07-10 18:24:27 +02:00
|
|
|
|
return a >= 0x41 && a <= 0x5a ? a + 32 : a;
|
2011-12-08 15:09:52 +01:00
|
|
|
|
return utf_convert(a, foldCase, (int)sizeof(foldCase));
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
2005-06-01 21:51:55 +00:00
|
|
|
|
static convertStruct toLower[] =
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x41,0x5a,1,32},
|
|
|
|
|
{0xc0,0xd6,1,32},
|
|
|
|
|
{0xd8,0xde,1,32},
|
|
|
|
|
{0x100,0x12e,2,1},
|
|
|
|
|
{0x130,0x130,-1,-199},
|
|
|
|
|
{0x132,0x136,2,1},
|
|
|
|
|
{0x139,0x147,2,1},
|
|
|
|
|
{0x14a,0x176,2,1},
|
|
|
|
|
{0x178,0x178,-1,-121},
|
|
|
|
|
{0x179,0x17d,2,1},
|
|
|
|
|
{0x181,0x181,-1,210},
|
|
|
|
|
{0x182,0x184,2,1},
|
|
|
|
|
{0x186,0x186,-1,206},
|
|
|
|
|
{0x187,0x187,-1,1},
|
|
|
|
|
{0x189,0x18a,1,205},
|
|
|
|
|
{0x18b,0x18b,-1,1},
|
|
|
|
|
{0x18e,0x18e,-1,79},
|
|
|
|
|
{0x18f,0x18f,-1,202},
|
|
|
|
|
{0x190,0x190,-1,203},
|
|
|
|
|
{0x191,0x191,-1,1},
|
|
|
|
|
{0x193,0x193,-1,205},
|
|
|
|
|
{0x194,0x194,-1,207},
|
|
|
|
|
{0x196,0x196,-1,211},
|
|
|
|
|
{0x197,0x197,-1,209},
|
|
|
|
|
{0x198,0x198,-1,1},
|
|
|
|
|
{0x19c,0x19c,-1,211},
|
|
|
|
|
{0x19d,0x19d,-1,213},
|
|
|
|
|
{0x19f,0x19f,-1,214},
|
|
|
|
|
{0x1a0,0x1a4,2,1},
|
|
|
|
|
{0x1a6,0x1a6,-1,218},
|
|
|
|
|
{0x1a7,0x1a7,-1,1},
|
|
|
|
|
{0x1a9,0x1a9,-1,218},
|
|
|
|
|
{0x1ac,0x1ac,-1,1},
|
|
|
|
|
{0x1ae,0x1ae,-1,218},
|
|
|
|
|
{0x1af,0x1af,-1,1},
|
|
|
|
|
{0x1b1,0x1b2,1,217},
|
|
|
|
|
{0x1b3,0x1b5,2,1},
|
|
|
|
|
{0x1b7,0x1b7,-1,219},
|
|
|
|
|
{0x1b8,0x1bc,4,1},
|
|
|
|
|
{0x1c4,0x1c4,-1,2},
|
|
|
|
|
{0x1c5,0x1c5,-1,1},
|
|
|
|
|
{0x1c7,0x1c7,-1,2},
|
|
|
|
|
{0x1c8,0x1c8,-1,1},
|
|
|
|
|
{0x1ca,0x1ca,-1,2},
|
|
|
|
|
{0x1cb,0x1db,2,1},
|
|
|
|
|
{0x1de,0x1ee,2,1},
|
|
|
|
|
{0x1f1,0x1f1,-1,2},
|
|
|
|
|
{0x1f2,0x1f4,2,1},
|
|
|
|
|
{0x1f6,0x1f6,-1,-97},
|
|
|
|
|
{0x1f7,0x1f7,-1,-56},
|
|
|
|
|
{0x1f8,0x21e,2,1},
|
|
|
|
|
{0x220,0x220,-1,-130},
|
|
|
|
|
{0x222,0x232,2,1},
|
|
|
|
|
{0x23a,0x23a,-1,10795},
|
|
|
|
|
{0x23b,0x23b,-1,1},
|
|
|
|
|
{0x23d,0x23d,-1,-163},
|
|
|
|
|
{0x23e,0x23e,-1,10792},
|
|
|
|
|
{0x241,0x241,-1,1},
|
|
|
|
|
{0x243,0x243,-1,-195},
|
|
|
|
|
{0x244,0x244,-1,69},
|
|
|
|
|
{0x245,0x245,-1,71},
|
|
|
|
|
{0x246,0x24e,2,1},
|
|
|
|
|
{0x370,0x372,2,1},
|
|
|
|
|
{0x376,0x376,-1,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x37f,0x37f,-1,116},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x386,0x386,-1,38},
|
|
|
|
|
{0x388,0x38a,1,37},
|
|
|
|
|
{0x38c,0x38c,-1,64},
|
|
|
|
|
{0x38e,0x38f,1,63},
|
|
|
|
|
{0x391,0x3a1,1,32},
|
|
|
|
|
{0x3a3,0x3ab,1,32},
|
|
|
|
|
{0x3cf,0x3cf,-1,8},
|
|
|
|
|
{0x3d8,0x3ee,2,1},
|
|
|
|
|
{0x3f4,0x3f4,-1,-60},
|
|
|
|
|
{0x3f7,0x3f7,-1,1},
|
|
|
|
|
{0x3f9,0x3f9,-1,-7},
|
|
|
|
|
{0x3fa,0x3fa,-1,1},
|
|
|
|
|
{0x3fd,0x3ff,1,-130},
|
|
|
|
|
{0x400,0x40f,1,80},
|
|
|
|
|
{0x410,0x42f,1,32},
|
|
|
|
|
{0x460,0x480,2,1},
|
|
|
|
|
{0x48a,0x4be,2,1},
|
|
|
|
|
{0x4c0,0x4c0,-1,15},
|
|
|
|
|
{0x4c1,0x4cd,2,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x4d0,0x52e,2,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x531,0x556,1,48},
|
|
|
|
|
{0x10a0,0x10c5,1,7264},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10c7,0x10cd,6,7264},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x13a0,0x13ef,1,38864},
|
|
|
|
|
{0x13f0,0x13f5,1,8},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x1c90,0x1cba,1,-3008},
|
|
|
|
|
{0x1cbd,0x1cbf,1,-3008},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1e00,0x1e94,2,1},
|
|
|
|
|
{0x1e9e,0x1e9e,-1,-7615},
|
|
|
|
|
{0x1ea0,0x1efe,2,1},
|
|
|
|
|
{0x1f08,0x1f0f,1,-8},
|
|
|
|
|
{0x1f18,0x1f1d,1,-8},
|
|
|
|
|
{0x1f28,0x1f2f,1,-8},
|
|
|
|
|
{0x1f38,0x1f3f,1,-8},
|
|
|
|
|
{0x1f48,0x1f4d,1,-8},
|
|
|
|
|
{0x1f59,0x1f5f,2,-8},
|
|
|
|
|
{0x1f68,0x1f6f,1,-8},
|
|
|
|
|
{0x1f88,0x1f8f,1,-8},
|
|
|
|
|
{0x1f98,0x1f9f,1,-8},
|
|
|
|
|
{0x1fa8,0x1faf,1,-8},
|
|
|
|
|
{0x1fb8,0x1fb9,1,-8},
|
|
|
|
|
{0x1fba,0x1fbb,1,-74},
|
|
|
|
|
{0x1fbc,0x1fbc,-1,-9},
|
|
|
|
|
{0x1fc8,0x1fcb,1,-86},
|
|
|
|
|
{0x1fcc,0x1fcc,-1,-9},
|
|
|
|
|
{0x1fd8,0x1fd9,1,-8},
|
|
|
|
|
{0x1fda,0x1fdb,1,-100},
|
|
|
|
|
{0x1fe8,0x1fe9,1,-8},
|
|
|
|
|
{0x1fea,0x1feb,1,-112},
|
|
|
|
|
{0x1fec,0x1fec,-1,-7},
|
|
|
|
|
{0x1ff8,0x1ff9,1,-128},
|
|
|
|
|
{0x1ffa,0x1ffb,1,-126},
|
|
|
|
|
{0x1ffc,0x1ffc,-1,-9},
|
|
|
|
|
{0x2126,0x2126,-1,-7517},
|
|
|
|
|
{0x212a,0x212a,-1,-8383},
|
|
|
|
|
{0x212b,0x212b,-1,-8262},
|
|
|
|
|
{0x2132,0x2132,-1,28},
|
|
|
|
|
{0x2160,0x216f,1,16},
|
|
|
|
|
{0x2183,0x2183,-1,1},
|
|
|
|
|
{0x24b6,0x24cf,1,26},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x2c00,0x2c2f,1,48},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x2c60,0x2c60,-1,1},
|
|
|
|
|
{0x2c62,0x2c62,-1,-10743},
|
|
|
|
|
{0x2c63,0x2c63,-1,-3814},
|
|
|
|
|
{0x2c64,0x2c64,-1,-10727},
|
|
|
|
|
{0x2c67,0x2c6b,2,1},
|
|
|
|
|
{0x2c6d,0x2c6d,-1,-10780},
|
|
|
|
|
{0x2c6e,0x2c6e,-1,-10749},
|
|
|
|
|
{0x2c6f,0x2c6f,-1,-10783},
|
|
|
|
|
{0x2c70,0x2c70,-1,-10782},
|
|
|
|
|
{0x2c72,0x2c75,3,1},
|
|
|
|
|
{0x2c7e,0x2c7f,1,-10815},
|
|
|
|
|
{0x2c80,0x2ce2,2,1},
|
|
|
|
|
{0x2ceb,0x2ced,2,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x2cf2,0xa640,31054,1},
|
|
|
|
|
{0xa642,0xa66c,2,1},
|
|
|
|
|
{0xa680,0xa69a,2,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa722,0xa72e,2,1},
|
|
|
|
|
{0xa732,0xa76e,2,1},
|
|
|
|
|
{0xa779,0xa77b,2,1},
|
|
|
|
|
{0xa77d,0xa77d,-1,-35332},
|
|
|
|
|
{0xa77e,0xa786,2,1},
|
|
|
|
|
{0xa78b,0xa78b,-1,1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa78d,0xa78d,-1,-42280},
|
|
|
|
|
{0xa790,0xa792,2,1},
|
|
|
|
|
{0xa796,0xa7a8,2,1},
|
|
|
|
|
{0xa7aa,0xa7aa,-1,-42308},
|
|
|
|
|
{0xa7ab,0xa7ab,-1,-42319},
|
|
|
|
|
{0xa7ac,0xa7ac,-1,-42315},
|
|
|
|
|
{0xa7ad,0xa7ad,-1,-42305},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0xa7ae,0xa7ae,-1,-42308},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa7b0,0xa7b0,-1,-42258},
|
|
|
|
|
{0xa7b1,0xa7b1,-1,-42282},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xa7b2,0xa7b2,-1,-42261},
|
|
|
|
|
{0xa7b3,0xa7b3,-1,928},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0xa7b4,0xa7c2,2,1},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0xa7c4,0xa7c4,-1,-48},
|
|
|
|
|
{0xa7c5,0xa7c5,-1,-42307},
|
|
|
|
|
{0xa7c6,0xa7c6,-1,-35384},
|
2021-06-27 21:30:14 +02:00
|
|
|
|
{0xa7c7,0xa7c9,2,1},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0xa7d0,0xa7d6,6,1},
|
|
|
|
|
{0xa7d8,0xa7f5,29,1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xff21,0xff3a,1,32},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10400,0x10427,1,40},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x104b0,0x104d3,1,40},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x10570,0x1057a,1,39},
|
|
|
|
|
{0x1057c,0x1058a,1,39},
|
|
|
|
|
{0x1058c,0x10592,1,39},
|
|
|
|
|
{0x10594,0x10595,1,39},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x10c80,0x10cb2,1,64},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x118a0,0x118bf,1,32},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x16e40,0x16e5f,1,32},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1e900,0x1e921,1,34}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
};
|
|
|
|
|
|
2024-02-12 22:14:53 +01:00
|
|
|
|
// Note: UnicodeData.txt does not define U+1E9E as being the corresponding upper
|
|
|
|
|
// case letter for U+00DF (ß); however, it is part of the toLower table.
|
2005-06-01 21:51:55 +00:00
|
|
|
|
static convertStruct toUpper[] =
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x61,0x7a,1,-32},
|
|
|
|
|
{0xb5,0xb5,-1,743},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xe0,0xf6,1,-32},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xf8,0xfe,1,-32},
|
|
|
|
|
{0xff,0xff,-1,121},
|
|
|
|
|
{0x101,0x12f,2,-1},
|
|
|
|
|
{0x131,0x131,-1,-232},
|
|
|
|
|
{0x133,0x137,2,-1},
|
|
|
|
|
{0x13a,0x148,2,-1},
|
|
|
|
|
{0x14b,0x177,2,-1},
|
|
|
|
|
{0x17a,0x17e,2,-1},
|
|
|
|
|
{0x17f,0x17f,-1,-300},
|
|
|
|
|
{0x180,0x180,-1,195},
|
|
|
|
|
{0x183,0x185,2,-1},
|
|
|
|
|
{0x188,0x18c,4,-1},
|
|
|
|
|
{0x192,0x192,-1,-1},
|
|
|
|
|
{0x195,0x195,-1,97},
|
|
|
|
|
{0x199,0x199,-1,-1},
|
|
|
|
|
{0x19a,0x19a,-1,163},
|
|
|
|
|
{0x19e,0x19e,-1,130},
|
|
|
|
|
{0x1a1,0x1a5,2,-1},
|
|
|
|
|
{0x1a8,0x1ad,5,-1},
|
|
|
|
|
{0x1b0,0x1b4,4,-1},
|
|
|
|
|
{0x1b6,0x1b9,3,-1},
|
|
|
|
|
{0x1bd,0x1bd,-1,-1},
|
|
|
|
|
{0x1bf,0x1bf,-1,56},
|
|
|
|
|
{0x1c5,0x1c5,-1,-1},
|
|
|
|
|
{0x1c6,0x1c6,-1,-2},
|
|
|
|
|
{0x1c8,0x1c8,-1,-1},
|
|
|
|
|
{0x1c9,0x1c9,-1,-2},
|
|
|
|
|
{0x1cb,0x1cb,-1,-1},
|
|
|
|
|
{0x1cc,0x1cc,-1,-2},
|
|
|
|
|
{0x1ce,0x1dc,2,-1},
|
|
|
|
|
{0x1dd,0x1dd,-1,-79},
|
|
|
|
|
{0x1df,0x1ef,2,-1},
|
|
|
|
|
{0x1f2,0x1f2,-1,-1},
|
|
|
|
|
{0x1f3,0x1f3,-1,-2},
|
|
|
|
|
{0x1f5,0x1f9,4,-1},
|
|
|
|
|
{0x1fb,0x21f,2,-1},
|
|
|
|
|
{0x223,0x233,2,-1},
|
|
|
|
|
{0x23c,0x23c,-1,-1},
|
|
|
|
|
{0x23f,0x240,1,10815},
|
|
|
|
|
{0x242,0x247,5,-1},
|
|
|
|
|
{0x249,0x24f,2,-1},
|
|
|
|
|
{0x250,0x250,-1,10783},
|
|
|
|
|
{0x251,0x251,-1,10780},
|
|
|
|
|
{0x252,0x252,-1,10782},
|
|
|
|
|
{0x253,0x253,-1,-210},
|
|
|
|
|
{0x254,0x254,-1,-206},
|
|
|
|
|
{0x256,0x257,1,-205},
|
|
|
|
|
{0x259,0x259,-1,-202},
|
|
|
|
|
{0x25b,0x25b,-1,-203},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x25c,0x25c,-1,42319},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x260,0x260,-1,-205},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x261,0x261,-1,42315},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x263,0x263,-1,-207},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x265,0x265,-1,42280},
|
|
|
|
|
{0x266,0x266,-1,42308},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x268,0x268,-1,-209},
|
|
|
|
|
{0x269,0x269,-1,-211},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x26a,0x26a,-1,42308},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x26b,0x26b,-1,10743},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x26c,0x26c,-1,42305},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x26f,0x26f,-1,-211},
|
|
|
|
|
{0x271,0x271,-1,10749},
|
|
|
|
|
{0x272,0x272,-1,-213},
|
|
|
|
|
{0x275,0x275,-1,-214},
|
|
|
|
|
{0x27d,0x27d,-1,10727},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x280,0x280,-1,-218},
|
|
|
|
|
{0x282,0x282,-1,42307},
|
|
|
|
|
{0x283,0x283,-1,-218},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x287,0x287,-1,42282},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x288,0x288,-1,-218},
|
|
|
|
|
{0x289,0x289,-1,-69},
|
|
|
|
|
{0x28a,0x28b,1,-217},
|
|
|
|
|
{0x28c,0x28c,-1,-71},
|
|
|
|
|
{0x292,0x292,-1,-219},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x29d,0x29d,-1,42261},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x29e,0x29e,-1,42258},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x345,0x345,-1,84},
|
|
|
|
|
{0x371,0x373,2,-1},
|
|
|
|
|
{0x377,0x377,-1,-1},
|
|
|
|
|
{0x37b,0x37d,1,130},
|
|
|
|
|
{0x3ac,0x3ac,-1,-38},
|
|
|
|
|
{0x3ad,0x3af,1,-37},
|
|
|
|
|
{0x3b1,0x3c1,1,-32},
|
|
|
|
|
{0x3c2,0x3c2,-1,-31},
|
|
|
|
|
{0x3c3,0x3cb,1,-32},
|
|
|
|
|
{0x3cc,0x3cc,-1,-64},
|
|
|
|
|
{0x3cd,0x3ce,1,-63},
|
|
|
|
|
{0x3d0,0x3d0,-1,-62},
|
|
|
|
|
{0x3d1,0x3d1,-1,-57},
|
|
|
|
|
{0x3d5,0x3d5,-1,-47},
|
|
|
|
|
{0x3d6,0x3d6,-1,-54},
|
|
|
|
|
{0x3d7,0x3d7,-1,-8},
|
|
|
|
|
{0x3d9,0x3ef,2,-1},
|
|
|
|
|
{0x3f0,0x3f0,-1,-86},
|
|
|
|
|
{0x3f1,0x3f1,-1,-80},
|
|
|
|
|
{0x3f2,0x3f2,-1,7},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x3f3,0x3f3,-1,-116},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x3f5,0x3f5,-1,-96},
|
|
|
|
|
{0x3f8,0x3fb,3,-1},
|
|
|
|
|
{0x430,0x44f,1,-32},
|
|
|
|
|
{0x450,0x45f,1,-80},
|
|
|
|
|
{0x461,0x481,2,-1},
|
|
|
|
|
{0x48b,0x4bf,2,-1},
|
|
|
|
|
{0x4c2,0x4ce,2,-1},
|
|
|
|
|
{0x4cf,0x4cf,-1,-15},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x4d1,0x52f,2,-1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x561,0x586,1,-48},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x10d0,0x10fa,1,3008},
|
|
|
|
|
{0x10fd,0x10ff,1,3008},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x13f8,0x13fd,1,-8},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1c80,0x1c80,-1,-6254},
|
|
|
|
|
{0x1c81,0x1c81,-1,-6253},
|
|
|
|
|
{0x1c82,0x1c82,-1,-6244},
|
|
|
|
|
{0x1c83,0x1c84,1,-6242},
|
|
|
|
|
{0x1c85,0x1c85,-1,-6243},
|
|
|
|
|
{0x1c86,0x1c86,-1,-6236},
|
|
|
|
|
{0x1c87,0x1c87,-1,-6181},
|
|
|
|
|
{0x1c88,0x1c88,-1,35266},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1d79,0x1d79,-1,35332},
|
|
|
|
|
{0x1d7d,0x1d7d,-1,3814},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0x1d8e,0x1d8e,-1,35384},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x1e01,0x1e95,2,-1},
|
|
|
|
|
{0x1e9b,0x1e9b,-1,-59},
|
|
|
|
|
{0x1ea1,0x1eff,2,-1},
|
|
|
|
|
{0x1f00,0x1f07,1,8},
|
|
|
|
|
{0x1f10,0x1f15,1,8},
|
|
|
|
|
{0x1f20,0x1f27,1,8},
|
|
|
|
|
{0x1f30,0x1f37,1,8},
|
|
|
|
|
{0x1f40,0x1f45,1,8},
|
|
|
|
|
{0x1f51,0x1f57,2,8},
|
|
|
|
|
{0x1f60,0x1f67,1,8},
|
|
|
|
|
{0x1f70,0x1f71,1,74},
|
|
|
|
|
{0x1f72,0x1f75,1,86},
|
|
|
|
|
{0x1f76,0x1f77,1,100},
|
|
|
|
|
{0x1f78,0x1f79,1,128},
|
|
|
|
|
{0x1f7a,0x1f7b,1,112},
|
|
|
|
|
{0x1f7c,0x1f7d,1,126},
|
|
|
|
|
{0x1f80,0x1f87,1,8},
|
|
|
|
|
{0x1f90,0x1f97,1,8},
|
|
|
|
|
{0x1fa0,0x1fa7,1,8},
|
|
|
|
|
{0x1fb0,0x1fb1,1,8},
|
|
|
|
|
{0x1fb3,0x1fb3,-1,9},
|
|
|
|
|
{0x1fbe,0x1fbe,-1,-7205},
|
|
|
|
|
{0x1fc3,0x1fc3,-1,9},
|
|
|
|
|
{0x1fd0,0x1fd1,1,8},
|
|
|
|
|
{0x1fe0,0x1fe1,1,8},
|
|
|
|
|
{0x1fe5,0x1fe5,-1,7},
|
|
|
|
|
{0x1ff3,0x1ff3,-1,9},
|
|
|
|
|
{0x214e,0x214e,-1,-28},
|
|
|
|
|
{0x2170,0x217f,1,-16},
|
|
|
|
|
{0x2184,0x2184,-1,-1},
|
|
|
|
|
{0x24d0,0x24e9,1,-26},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x2c30,0x2c5f,1,-48},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x2c61,0x2c61,-1,-1},
|
|
|
|
|
{0x2c65,0x2c65,-1,-10795},
|
|
|
|
|
{0x2c66,0x2c66,-1,-10792},
|
|
|
|
|
{0x2c68,0x2c6c,2,-1},
|
|
|
|
|
{0x2c73,0x2c76,3,-1},
|
|
|
|
|
{0x2c81,0x2ce3,2,-1},
|
|
|
|
|
{0x2cec,0x2cee,2,-1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x2cf3,0x2cf3,-1,-1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0x2d00,0x2d25,1,-7264},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x2d27,0x2d2d,6,-7264},
|
|
|
|
|
{0xa641,0xa66d,2,-1},
|
|
|
|
|
{0xa681,0xa69b,2,-1},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xa723,0xa72f,2,-1},
|
|
|
|
|
{0xa733,0xa76f,2,-1},
|
|
|
|
|
{0xa77a,0xa77c,2,-1},
|
|
|
|
|
{0xa77f,0xa787,2,-1},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0xa78c,0xa791,5,-1},
|
2019-04-12 20:08:55 +02:00
|
|
|
|
{0xa793,0xa793,-1,-1},
|
|
|
|
|
{0xa794,0xa794,-1,48},
|
|
|
|
|
{0xa797,0xa7a9,2,-1},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0xa7b5,0xa7c3,2,-1},
|
|
|
|
|
{0xa7c8,0xa7ca,2,-1},
|
|
|
|
|
{0xa7d1,0xa7d7,6,-1},
|
|
|
|
|
{0xa7d9,0xa7f6,29,-1},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0xab53,0xab53,-1,-928},
|
|
|
|
|
{0xab70,0xabbf,1,-38864},
|
2010-01-12 19:52:03 +01:00
|
|
|
|
{0xff41,0xff5a,1,-32},
|
2015-01-14 17:40:09 +01:00
|
|
|
|
{0x10428,0x1044f,1,-40},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x104d8,0x104fb,1,-40},
|
2022-09-25 19:25:51 +01:00
|
|
|
|
{0x10597,0x105a1,1,-39},
|
|
|
|
|
{0x105a3,0x105b1,1,-39},
|
|
|
|
|
{0x105b3,0x105b9,1,-39},
|
|
|
|
|
{0x105bb,0x105bc,1,-39},
|
2015-06-21 14:22:00 +02:00
|
|
|
|
{0x10cc0,0x10cf2,1,-64},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x118c0,0x118df,1,-32},
|
2018-07-14 19:30:36 +02:00
|
|
|
|
{0x16e60,0x16e7f,1,-32},
|
2016-06-26 17:53:07 +02:00
|
|
|
|
{0x1e922,0x1e943,1,-34}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
};
|
2016-03-21 22:15:30 +01:00
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Return the upper-case equivalent of "a", which is a UCS-4 character. Use
|
|
|
|
|
* simple case mapping.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_toupper(int a)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// If 'casemap' contains "keepascii" use ASCII style toupper().
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (a < 128 && (cmp_flags & CMP_KEEPASCII))
|
|
|
|
|
return TOUPPER_ASC(a);
|
|
|
|
|
|
2006-04-30 18:54:39 +00:00
|
|
|
|
#if defined(HAVE_TOWUPPER) && defined(__STDC_ISO_10646__)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// If towupper() is available and handles Unicode, use it.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (!(cmp_flags & CMP_INTERNAL))
|
|
|
|
|
return towupper(a);
|
|
|
|
|
#endif
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// For characters below 128 use locale sensitive toupper().
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (a < 128)
|
|
|
|
|
return TOUPPER_LOC(a);
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// For any other characters use the above mapping table.
|
2011-12-08 15:09:52 +01:00
|
|
|
|
return utf_convert(a, toUpper, (int)sizeof(toUpper));
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
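utf_toupper() tries four sources in a fixed order: an ASCII-only rule when 'casemap' contains "keepascii", the C library's towupper() when it is Unicode-aware and 'casemap' does not contain "internal", the locale's toupper() for the remaining ASCII characters, and finally the toUpper table. A minimal sketch of that precedence follows. The flag names `MAP_INTERNAL`/`MAP_KEEPASCII` and the `table_upper` helper are hypothetical; the helper only knows the Greek alpha..rho range (matching the {0x3b1,0x3c1,1,-32} entry of toUpper), and the sketch assumes towupper() is always available instead of checking HAVE_TOWUPPER and __STDC_ISO_10646__.

```c
#include <assert.h>
#include <ctype.h>
#include <wctype.h>

/* Hypothetical flags mirroring 'casemap' "internal" and "keepascii". */
enum { MAP_INTERNAL = 1, MAP_KEEPASCII = 2 };

/* Hypothetical table lookup: Greek small alpha..rho only. */
static int
table_upper(int c)
{
    if (c >= 0x3b1 && c <= 0x3c1)
        return c - 32;
    return c;
}

static int
sketch_toupper(int c, int flags)
{
    assert(c >= 0);

    /* 1. "keepascii": plain ASCII rule, independent of the locale. */
    if (c < 128 && (flags & MAP_KEEPASCII))
        return (c >= 'a' && c <= 'z') ? c - 32 : c;

    /* 2. Library mapping, unless 'casemap' asks for the internal table. */
    if (!(flags & MAP_INTERNAL))
        return (int)towupper((wint_t)c);

    /* 3. Remaining ASCII characters use the locale-sensitive toupper(). */
    if (c < 128)
        return toupper(c);

    /* 4. Everything else goes through the mapping table. */
    return table_upper(c);
}
```

The same order applies to utf_tolower() with towlower(), tolower() and the toLower table.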
|
|
|
|
|
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_islower(int a)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// German sharp s is lower case but has no simple upper case mapping.
|
2012-06-01 17:46:59 +02:00
|
|
|
|
return (utf_toupper(a) != a) || a == 0xdf;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return the lower-case equivalent of "a", which is a UCS-4 character. Use
|
|
|
|
|
* simple case mapping.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_tolower(int a)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// If 'casemap' contains "keepascii" use ASCII style tolower().
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (a < 128 && (cmp_flags & CMP_KEEPASCII))
|
|
|
|
|
return TOLOWER_ASC(a);
|
|
|
|
|
|
2006-04-30 18:54:39 +00:00
|
|
|
|
#if defined(HAVE_TOWLOWER) && defined(__STDC_ISO_10646__)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// If towlower() is available and handles Unicode, use it.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (!(cmp_flags & CMP_INTERNAL))
|
|
|
|
|
return towlower(a);
|
|
|
|
|
#endif
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// For characters below 128 use locale sensitive tolower().
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (a < 128)
|
|
|
|
|
return TOLOWER_LOC(a);
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// For any other characters use the above mapping table.
|
2011-12-08 15:09:52 +01:00
|
|
|
|
return utf_convert(a, toLower, (int)sizeof(toLower));
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_isupper(int a)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
return (utf_tolower(a) != a);
|
|
|
|
|
}
|
|
|
|
|
|
2011-07-15 21:16:59 +02:00
|
|
|
|
static int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_strnicmp(
|
|
|
|
|
char_u *s1,
|
|
|
|
|
char_u *s2,
|
|
|
|
|
size_t n1,
|
|
|
|
|
size_t n2)
|
2011-07-15 21:16:59 +02:00
|
|
|
|
{
|
|
|
|
|
int c1, c2, cdiff;
|
|
|
|
|
char_u buffer[6];
|
|
|
|
|
|
|
|
|
|
for (;;)
|
|
|
|
|
{
|
|
|
|
|
c1 = utf_safe_read_char_adv(&s1, &n1);
|
|
|
|
|
c2 = utf_safe_read_char_adv(&s2, &n2);
|
|
|
|
|
|
|
|
|
|
if (c1 <= 0 || c2 <= 0)
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
if (c1 == c2)
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
cdiff = utf_fold(c1) - utf_fold(c2);
|
|
|
|
|
if (cdiff != 0)
|
|
|
|
|
return cdiff;
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// some string ended or has an incomplete/illegal character sequence
|
2011-07-15 21:16:59 +02:00
|
|
|
|
|
|
|
|
|
if (c1 == 0 || c2 == 0)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Some string ended: the shorter string is smaller.
|
2011-07-15 21:16:59 +02:00
|
|
|
|
if (c1 == 0 && c2 == 0)
|
|
|
|
|
return 0;
|
|
|
|
|
return c1 == 0 ? -1 : 1;
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Continue with bytewise comparison to produce some result that
|
|
|
|
|
// would make comparison operations involving this function transitive.
|
|
|
|
|
//
|
|
|
|
|
// If only one string had an error, comparison should be made with
|
|
|
|
|
// folded version of the other string. In this case it is enough
|
|
|
|
|
// to fold just one character to determine the result of comparison.
|
2011-07-15 21:16:59 +02:00
|
|
|
|
|
|
|
|
|
if (c1 != -1 && c2 == -1)
|
|
|
|
|
{
|
|
|
|
|
n1 = utf_char2bytes(utf_fold(c1), buffer);
|
|
|
|
|
s1 = buffer;
|
|
|
|
|
}
|
|
|
|
|
else if (c2 != -1 && c1 == -1)
|
|
|
|
|
{
|
|
|
|
|
n2 = utf_char2bytes(utf_fold(c2), buffer);
|
|
|
|
|
s2 = buffer;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
while (n1 > 0 && n2 > 0 && *s1 != NUL && *s2 != NUL)
|
|
|
|
|
{
|
|
|
|
|
cdiff = (int)(*s1) - (int)(*s2);
|
|
|
|
|
if (cdiff != 0)
|
|
|
|
|
return cdiff;
|
|
|
|
|
|
|
|
|
|
s1++;
|
|
|
|
|
s2++;
|
|
|
|
|
n1--;
|
|
|
|
|
n2--;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (n1 > 0 && *s1 == NUL)
|
|
|
|
|
n1 = 0;
|
|
|
|
|
if (n2 > 0 && *s2 == NUL)
|
|
|
|
|
n2 = 0;
|
|
|
|
|
|
|
|
|
|
if (n1 == 0 && n2 == 0)
|
|
|
|
|
return 0;
|
|
|
|
|
return n1 == 0 ? -1 : 1;
|
|
|
|
|
}
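When both strings decode cleanly, utf_strnicmp() returns the difference between the simple case folds (utf_fold()) of the first pair of characters that differ, and otherwise reports the shorter string as smaller. The sketch below models only that valid-input path, on arrays of already decoded code points terminated by 0. The helper `fold_demo` is a hypothetical stand-in for utf_fold() that knows ASCII plus the KELVIN SIGN, and the byte-wise fallback for illegal sequences is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for utf_fold(): ASCII plus KELVIN SIGN. */
static int
fold_demo(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 32;
    if (c == 0x212a)        /* KELVIN SIGN folds to 'k' */
        return 'k';
    return c;
}

/*
 * Compare two 0-terminated code point arrays ignoring case.
 * Returns 0 when equal, the folded difference at the first mismatch,
 * or -1 / 1 when one string is a prefix of the other.
 */
static int
cp_stricmp(const int *s1, const int *s2)
{
    assert(s1 != NULL && s2 != NULL);
    for (;;)
    {
        int c1 = *s1++;
        int c2 = *s2++;

        if (c1 == 0 || c2 == 0)
        {
            if (c1 == 0 && c2 == 0)
                return 0;
            return c1 == 0 ? -1 : 1;
        }
        if (c1 != c2)
        {
            int cdiff = fold_demo(c1) - fold_demo(c2);

            if (cdiff != 0)
                return cdiff;
        }
    }
}
```

Folding both sides makes, for example, the KELVIN SIGN compare equal to both 'K' and 'k', which a plain toupper()/tolower() comparison in one direction would not guarantee.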
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Version of strnicmp() that handles multi-byte characters.
|
2013-01-17 14:39:47 +01:00
|
|
|
|
* Needed for Big5, Shift-JIS and UTF-8 encodings. Other DBCS encodings can
|
2004-06-13 20:20:40 +00:00
|
|
|
|
* probably use strnicmp(), because there are no ASCII characters in the
|
|
|
|
|
* second byte.
|
|
|
|
|
* Returns zero if s1 and s2 are equal (ignoring case), the difference between
|
|
|
|
|
* two characters otherwise.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_strnicmp(char_u *s1, char_u *s2, size_t nn)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2011-07-15 21:16:59 +02:00
|
|
|
|
int i, l;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
int cdiff;
|
2006-04-17 22:14:47 +00:00
|
|
|
|
int n = (int)nn;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2011-07-15 21:16:59 +02:00
|
|
|
|
if (enc_utf8)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2011-07-15 21:16:59 +02:00
|
|
|
|
return utf_strnicmp(s1, s2, nn, nn);
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
for (i = 0; i < n; i += l)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
if (s1[i] == NUL && s2[i] == NUL) // both strings end
|
2011-07-15 21:16:59 +02:00
|
|
|
|
return 0;
|
|
|
|
|
|
2005-08-10 21:07:57 +00:00
|
|
|
|
l = (*mb_ptr2len)(s1 + i);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (l <= 1)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Single byte: first check normally, then with ignore case.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (s1[i] != s2[i])
|
|
|
|
|
{
|
2007-08-11 11:58:23 +00:00
|
|
|
|
cdiff = MB_TOLOWER(s1[i]) - MB_TOLOWER(s2[i]);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (cdiff != 0)
|
|
|
|
|
return cdiff;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// For non-Unicode multi-byte don't ignore case.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (l > n - i)
|
|
|
|
|
l = n - i;
|
|
|
|
|
cdiff = STRNCMP(s1 + i, s2 + i, l);
|
|
|
|
|
if (cdiff != 0)
|
|
|
|
|
return cdiff;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return 0;
|
|
|
|
|
}
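For encodings other than UTF-8, mb_strnicmp() steps through both strings by the byte length of the character in s1: single-byte characters are compared case-insensitively, while double-byte characters must match exactly. The sketch below mirrors that loop under two stated assumptions: a hypothetical Shift-JIS/CP932-style lead-byte test (0x81-0x9F and 0xE0-0xFC) replaces Vim's mb_ptr2len, and the C library's tolower() replaces MB_TOLOWER(). The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lead-byte test in the style of Shift-JIS / CP932. */
static size_t
sketch_dbcs_len(const unsigned char *p)
{
    if ((*p >= 0x81 && *p <= 0x9f) || (*p >= 0xe0 && *p <= 0xfc))
        return p[1] != '\0' ? 2 : 1;   /* truncated pair counts as 1 */
    return 1;
}

static int
sketch_dbcs_strnicmp(const unsigned char *s1, const unsigned char *s2,
                     size_t n)
{
    size_t i, l;

    assert(s1 != NULL && s2 != NULL);
    for (i = 0; i < n; i += l)
    {
        if (s1[i] == '\0' && s2[i] == '\0')   /* both strings end */
            return 0;
        l = sketch_dbcs_len(s1 + i);
        if (l == 1)
        {
            /* Single byte: exact match first, then ignoring case. */
            if (s1[i] != s2[i])
            {
                int cdiff = tolower(s1[i]) - tolower(s2[i]);

                if (cdiff != 0)
                    return cdiff;
            }
        }
        else
        {
            /* Double byte: compare exactly, never ignoring case. */
            int cdiff;

            if (l > n - i)
                l = n - i;
            cdiff = strncmp((const char *)s1 + i, (const char *)s2 + i, l);
            if (cdiff != 0)
                return cdiff;
        }
    }
    return 0;
}
```

Taking the length from s1 only is safe here because a mismatch in the lead byte already makes the exact comparison fail.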
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* "g8": show bytes of the UTF-8 char under the cursor. Doesn't matter what
|
|
|
|
|
* 'encoding' has been set to.
|
|
|
|
|
*/
|
|
|
|
|
void
|
2016-01-30 18:51:09 +01:00
|
|
|
|
show_utf8(void)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
int len;
|
2005-05-19 21:00:46 +00:00
|
|
|
|
int rlen = 0;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
char_u *line;
|
|
|
|
|
int clen;
|
|
|
|
|
int i;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Get the byte length of the char under the cursor, including composing
|
|
|
|
|
// characters.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
line = ml_get_cursor();
|
2005-08-10 21:07:57 +00:00
|
|
|
|
len = utfc_ptr2len(line);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (len == 0)
|
|
|
|
|
{
|
2019-01-19 17:43:09 +01:00
|
|
|
|
msg("NUL");
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
clen = 0;
|
|
|
|
|
for (i = 0; i < len; ++i)
|
|
|
|
|
{
|
|
|
|
|
if (clen == 0)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// start of (composing) character, get its length
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (i > 0)
|
2005-05-19 21:00:46 +00:00
|
|
|
|
{
|
|
|
|
|
STRCPY(IObuff + rlen, "+ ");
|
|
|
|
|
rlen += 2;
|
|
|
|
|
}
|
2005-08-10 21:07:57 +00:00
|
|
|
|
clen = utf_ptr2len(line + i);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
2010-05-21 15:46:35 +02:00
|
|
|
|
sprintf((char *)IObuff + rlen, "%02x ",
|
2019-12-04 21:57:43 +01:00
|
|
|
|
(line[i] == NL) ? NUL : line[i]); // NUL is stored as NL
|
2004-06-13 20:20:40 +00:00
|
|
|
|
--clen;
|
2006-04-17 22:14:47 +00:00
|
|
|
|
rlen += (int)STRLEN(IObuff + rlen);
|
2005-05-19 21:00:46 +00:00
|
|
|
|
if (rlen > IOSIZE - 20)
|
|
|
|
|
break;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
2019-01-19 17:43:09 +01:00
|
|
|
|
msg((char *)IObuff);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
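The message built by show_utf8() lists each byte of the character under the cursor as two hex digits followed by a space, and puts "+ " before every composing character. Following the code above, "e" followed by U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes 0xCC 0x81) would be shown as "65 + cc 81". The sketch below reproduces only that formatting for a buffer holding one base character and its composing characters. `utf8_len()` and `format_g8()` are hypothetical helpers, and the NL-for-NUL translation and the IOSIZE limit are replaced by a plain output-size check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of a UTF-8 sequence from its lead byte. */
static int
utf8_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xe0) == 0xc0) return 2;
    if ((b & 0xf0) == 0xe0) return 3;
    if ((b & 0xf8) == 0xf0) return 4;
    return 1;       /* stray trailing byte: treat as one byte */
}

/* Format "len" bytes of "p" the way "g8" does, into "out". */
static void
format_g8(const unsigned char *p, int len, char *out, size_t outsize)
{
    size_t rlen = 0;
    int clen = 0;
    int i;

    assert(outsize > 0);
    out[0] = '\0';
    /* Each step may add "+ " + "xx " + NUL, i.e. 6 bytes. */
    for (i = 0; i < len && rlen + 6 <= outsize; ++i)
    {
        if (clen == 0)
        {
            /* Start of a (composing) character. */
            if (i > 0)
            {
                strcpy(out + rlen, "+ ");
                rlen += 2;
            }
            clen = utf8_len(p[i]);
        }
        sprintf(out + rlen, "%02x ", (unsigned int)p[i]);
        rlen += 3;
        --clen;
    }
}
```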
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* mb_head_off() function pointer.
|
|
|
|
|
* Return offset from "p" to the first byte of the character it points into.
|
2009-12-02 14:02:39 +00:00
|
|
|
|
* If "p" points to the NUL at the end of the string, return 0.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
* Returns 0 when already at the first byte of a character.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
latin_head_off(char_u *base UNUSED, char_u *p UNUSED)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
return 0;
|
|
|
|
|
}
|
|
|
|
|
|
2019-09-10 21:27:18 +02:00
|
|
|
|
static int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
dbcs_head_off(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *q;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// It can't be a trailing byte when not using DBCS, at the start of the
|
|
|
|
|
// string, when the previous byte can't start a double-byte, or at a NUL.
|
2009-12-02 14:02:39 +00:00
|
|
|
|
if (p <= base || MB_BYTE2LEN(p[-1]) == 1 || *p == NUL)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 0;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// This is slow: need to start at the base and go forward until the
|
|
|
|
|
// byte we are looking for. Return 1 when we went past it, 0 otherwise.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
q = base;
|
|
|
|
|
while (q < p)
|
2005-08-10 21:07:57 +00:00
|
|
|
|
q += dbcs_ptr2len(q);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return (q == p) ? 0 : 1;
|
|
|
|
|
}
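In many DBCS encodings a trailing byte can have the same value as a lead byte, so dbcs_head_off() cannot decide locally whether "p" is in the middle of a character. Instead it rescans from "base" one character at a time and checks whether the scan lands exactly on "p". The sketch below shows that forward scan, assuming a hypothetical GBK-style rule where any byte in 0x81-0xFE starts a two-byte character; `is_lead()` and `char_len()` stand in for MB_BYTE2LEN() and dbcs_ptr2len().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a byte in 0x81-0xfe starts a two-byte character. */
static int
is_lead(unsigned char b)
{
    return b >= 0x81 && b <= 0xfe;
}

/* Byte length of the character at "q", never stepping over the NUL. */
static size_t
char_len(const unsigned char *q)
{
    return (is_lead(q[0]) && q[1] != '\0') ? 2 : 1;
}

/* Return 1 when "p" points at the second byte of a character, else 0. */
static int
sketch_dbcs_head_off(const unsigned char *base, const unsigned char *p)
{
    const unsigned char *q;

    assert(base <= p);
    /* Quick exits: start of string, previous byte can't lead, or NUL. */
    if (p == base || !is_lead(p[-1]) || *p == '\0')
        return 0;

    /* Slow path: walk forward from "base" until reaching or passing "p". */
    q = base;
    while (q < p)
        q += char_len(q);
    return q == p ? 0 : 1;
}
```

In "\xb0\xa1\xb0\xa1" the byte 0xa1 at offset 1 could itself be a lead byte, so only the scan from "base" reveals that offset 2 starts a character while offsets 1 and 3 do not.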
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Special version of dbcs_head_off() that works for ScreenLines[], where
|
|
|
|
|
* single-width DBCS_JPNU characters are stored separately.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
dbcs_screen_head_off(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *q;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// It can't be a trailing byte when not using DBCS, at the start of the
|
|
|
|
|
// string, when the previous byte can't start a double-byte, or at a NUL.
|
|
|
|
|
// For euc-jp an 0x8e byte in the previous cell always means we have a
|
|
|
|
|
// lead byte in the current cell.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (p <= base
|
|
|
|
|
|| (enc_dbcs == DBCS_JPNU && p[-1] == 0x8e)
|
2009-12-02 14:02:39 +00:00
|
|
|
|
|| MB_BYTE2LEN(p[-1]) == 1
|
|
|
|
|
|| *p == NUL)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 0;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// This is slow: need to start at the base and go forward until the
|
|
|
|
|
// byte we are looking for. Return 1 when we went past it, 0 otherwise.
|
|
|
|
|
// For DBCS_JPNU look out for 0x8e, which means the second byte is not
|
|
|
|
|
// stored as the next byte.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
q = base;
|
|
|
|
|
while (q < p)
|
|
|
|
|
{
|
|
|
|
|
if (enc_dbcs == DBCS_JPNU && *q == 0x8e)
|
|
|
|
|
++q;
|
|
|
|
|
else
|
2005-08-10 21:07:57 +00:00
|
|
|
|
q += dbcs_ptr2len(q);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
return (q == p) ? 0 : 1;
|
|
|
|
|
}
|
|
|
|
|
|
2021-12-16 14:45:13 +00:00
|
|
|
|
/*
|
|
|
|
|
* Return offset from "p" to the start of a character, including composing
|
|
|
|
|
* characters. "base" must be the start of the string, which must be NUL
|
|
|
|
|
* terminated.
|
|
|
|
|
*/
|
2004-06-13 20:20:40 +00:00
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_head_off(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *q;
|
|
|
|
|
char_u *s;
|
|
|
|
|
int c;
|
2009-12-02 14:02:39 +00:00
|
|
|
|
int len;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
#ifdef FEAT_ARABIC
|
|
|
|
|
char_u *j;
|
|
|
|
|
#endif
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
if (*p < 0x80) // be quick for ASCII
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 0;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Skip backwards over trailing bytes: 10xx.xxxx
|
|
|
|
|
// Skip backwards again if on a composing char.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
for (q = p; ; --q)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Move s to the last byte of this char.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
for (s = q; (s[1] & 0xc0) == 0x80; ++s)
|
|
|
|
|
;
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Move q to the first byte of this char.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
while (q > base && (*q & 0xc0) == 0x80)
|
|
|
|
|
--q;
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Check for illegal sequence. Do allow an illegal byte after where we
|
|
|
|
|
// started.
|
2009-12-02 14:02:39 +00:00
|
|
|
|
len = utf8len_tab[*q];
|
|
|
|
|
if (len != (int)(s - q + 1) && len != (int)(p - q + 1))
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
|
|
if (q <= base)
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
c = utf_ptr2char(q);
|
|
|
|
|
if (utf_iscomposing(c))
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
#ifdef FEAT_ARABIC
|
|
|
|
|
if (arabic_maycombine(c))
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Step back to get a sneak peek at the previous char
|
2004-06-13 20:20:40 +00:00
|
|
|
|
j = q;
|
|
|
|
|
--j;
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Move j to the first byte of this char.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
while (j > base && (*j & 0xc0) == 0x80)
|
|
|
|
|
--j;
|
|
|
|
|
if (arabic_combine(utf_ptr2char(j), c))
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
#endif
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return (int)(p - q);
|
|
|
|
|
}
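
/*
 * Sketch only (hypothetical helper, not part of Vim): the core idea of
 * utf_head_off() without composing characters, Arabic shaping or the
 * sequence-length check. Every UTF-8 trail byte matches 10xx.xxxx, so the
 * start of a character is found by stepping back over such bytes, never
 * going before "base". For "a\xc3\xa9" a pointer to the 0xa9 byte gives 1.
 */
    static inline int
sketch_utf8_head_off(const unsigned char *base, const unsigned char *p)
{
    const unsigned char *q = p;

    // Skip backwards over trail bytes, stopping at the start of the string.
    while (q > base && (*q & 0xc0) == 0x80)
        --q;
    return (int)(p - q);
}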
|
|
|
|
|
|
2020-06-04 18:22:13 +02:00
|
|
|
|
/*
|
|
|
|
|
* Whether a space is NOT allowed before/after "cc".
|
|
|
|
|
*/
|
|
|
|
|
int
|
|
|
|
|
utf_eat_space(int cc)
|
|
|
|
|
{
|
|
|
|
|
return ((cc >= 0x2000 && cc <= 0x206F) // General Punctuation
|
|
|
|
|
|| (cc >= 0x2e00 && cc <= 0x2e7f) // Supplemental Punctuation
|
|
|
|
|
|| (cc >= 0x3000 && cc <= 0x303f) // CJK Symbols and Punctuation
|
|
|
|
|
|| (cc >= 0xff01 && cc <= 0xff0f) // Fullwidth ASCII punctuation
|
|
|
|
|
|| (cc >= 0xff1a && cc <= 0xff20) // ..
|
|
|
|
|
|| (cc >= 0xff3b && cc <= 0xff40) // ..
|
|
|
|
|
|| (cc >= 0xff5b && cc <= 0xff65)); // ..
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Whether line break is allowed before "cc".
|
|
|
|
|
*/
|
|
|
|
|
int
|
|
|
|
|
utf_allow_break_before(int cc)
|
|
|
|
|
{
|
|
|
|
|
static const int BOL_prohibition_punct[] =
|
|
|
|
|
{
|
|
|
|
|
'!',
|
|
|
|
|
'%',
|
|
|
|
|
')',
|
|
|
|
|
',',
|
|
|
|
|
':',
|
|
|
|
|
';',
|
|
|
|
|
'>',
|
|
|
|
|
'?',
|
|
|
|
|
']',
|
|
|
|
|
'}',
|
|
|
|
|
0x2019, // ’ right single quotation mark
|
|
|
|
|
0x201d, // ” right double quotation mark
|
|
|
|
|
0x2020, // † dagger
|
|
|
|
|
0x2021, // ‡ double dagger
|
|
|
|
|
0x2026, // … horizontal ellipsis
|
|
|
|
|
0x2030, // ‰ per mille sign
|
2023-09-09 11:23:50 +02:00
|
|
|
|
0x2031, // ‱ per ten thousand sign
|
2020-06-04 18:22:13 +02:00
|
|
|
|
0x203c, // ‼ double exclamation mark
|
|
|
|
|
0x2047, // ⁇ double question mark
|
|
|
|
|
0x2048, // ⁈ question exclamation mark
|
|
|
|
|
0x2049, // ⁉ exclamation question mark
|
|
|
|
|
0x2103, // ℃ degree celsius
|
|
|
|
|
0x2109, // ℉ degree fahrenheit
|
|
|
|
|
0x3001, // 、 ideographic comma
|
|
|
|
|
0x3002, // 。 ideographic full stop
|
|
|
|
|
0x3009, // 〉 right angle bracket
|
|
|
|
|
0x300b, // 》 right double angle bracket
|
|
|
|
|
0x300d, // 」 right corner bracket
|
|
|
|
|
0x300f, // 』 right white corner bracket
|
|
|
|
|
0x3011, // 】 right black lenticular bracket
|
|
|
|
|
0x3015, // 〕 right tortoise shell bracket
|
|
|
|
|
0x3017, // 〗 right white lenticular bracket
|
|
|
|
|
0x3019, // 〙 right white tortoise shell bracket
|
|
|
|
|
0x301b, // 〛 right white square bracket
|
|
|
|
|
0xff01, // ! fullwidth exclamation mark
|
|
|
|
|
0xff09, // ) fullwidth right parenthesis
|
|
|
|
|
0xff0c, // , fullwidth comma
|
|
|
|
|
0xff0e, // . fullwidth full stop
|
|
|
|
|
0xff1a, // : fullwidth colon
|
|
|
|
|
0xff1b, // ; fullwidth semicolon
|
|
|
|
|
0xff1f, // ? fullwidth question mark
|
|
|
|
|
0xff3d, // ] fullwidth right square bracket
|
|
|
|
|
0xff5d, // } fullwidth right curly bracket
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
int first = 0;
|
2021-06-02 13:28:16 +02:00
|
|
|
|
int last = ARRAY_LENGTH(BOL_prohibition_punct) - 1;
|
2020-06-04 18:22:13 +02:00
|
|
|
|
int mid = 0;
|
|
|
|
|
|
|
|
|
|
while (first < last)
|
|
|
|
|
{
|
|
|
|
|
mid = (first + last)/2;
|
|
|
|
|
|
|
|
|
|
if (cc == BOL_prohibition_punct[mid])
|
|
|
|
|
return FALSE;
|
|
|
|
|
else if (cc > BOL_prohibition_punct[mid])
|
|
|
|
|
first = mid + 1;
|
|
|
|
|
else
|
|
|
|
|
last = mid - 1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return cc != BOL_prohibition_punct[first];
|
|
|
|
|
}
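
/*
 * Sketch only (hypothetical helpers, not part of Vim): the hand-written
 * binary search above can also be expressed with the standard C library's
 * bsearch(). Both versions rely on the table being sorted in ascending
 * order, so a new entry added out of order would silently be missed.
 * utf_allow_break_before(cc) corresponds to
 * !sketch_in_sorted_table(cc, table, count) for the BOL table.
 */
#include <stdlib.h>

    static inline int
sketch_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);   // avoids the overflow of "x - y"
}

    static inline int
sketch_in_sorted_table(int cc, const int *table, size_t n)
{
    return bsearch(&cc, table, n, sizeof(int), sketch_cmp_int) != NULL;
}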
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Whether line break is allowed after "cc".
|
|
|
|
|
*/
|
|
|
|
|
static int
|
|
|
|
|
utf_allow_break_after(int cc)
|
|
|
|
|
{
|
|
|
|
|
static const int EOL_prohibition_punct[] =
|
|
|
|
|
{
|
|
|
|
|
'(',
|
|
|
|
|
'<',
|
|
|
|
|
'[',
|
|
|
|
|
'`',
|
|
|
|
|
'{',
|
|
|
|
|
//0x2014, // — em dash
|
|
|
|
|
0x2018, // ‘ left single quotation mark
|
|
|
|
|
0x201c, // “ left double quotation mark
|
|
|
|
|
//0x2053, // ⁓ swung dash
|
|
|
|
|
0x3008, // 〈 left angle bracket
|
|
|
|
|
0x300a, // 《 left double angle bracket
|
|
|
|
|
0x300c, // 「 left corner bracket
|
|
|
|
|
0x300e, // 『 left white corner bracket
|
|
|
|
|
0x3010, // 【 left black lenticular bracket
|
|
|
|
|
0x3014, // 〔 left tortoise shell bracket
|
|
|
|
|
0x3016, // 〖 left white lenticular bracket
|
|
|
|
|
0x3018, // 〘 left white tortoise shell bracket
|
|
|
|
|
0x301a, // 〚 left white square bracket
|
|
|
|
|
0xff08, // ( fullwidth left parenthesis
|
|
|
|
|
0xff3b, // [ fullwidth left square bracket
|
|
|
|
|
0xff5b, // { fullwidth left curly bracket
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
int first = 0;
|
2021-06-02 13:28:16 +02:00
|
|
|
|
int last = ARRAY_LENGTH(EOL_prohibition_punct) - 1;
|
2020-06-04 18:22:13 +02:00
|
|
|
|
int mid = 0;
|
|
|
|
|
|
|
|
|
|
while (first < last)
|
|
|
|
|
{
|
|
|
|
|
mid = (first + last)/2;
|
|
|
|
|
|
|
|
|
|
if (cc == EOL_prohibition_punct[mid])
|
|
|
|
|
return FALSE;
|
|
|
|
|
else if (cc > EOL_prohibition_punct[mid])
|
|
|
|
|
first = mid + 1;
|
|
|
|
|
else
|
|
|
|
|
last = mid - 1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return cc != EOL_prohibition_punct[first];
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Whether line break is allowed between "cc" and "ncc".
|
|
|
|
|
*/
|
|
|
|
|
int
|
|
|
|
|
utf_allow_break(int cc, int ncc)
|
|
|
|
|
{
|
|
|
|
|
// don't break inside a doubled em dash or ellipsis (two-character punctuation)
|
|
|
|
|
if (cc == ncc
|
|
|
|
|
&& (cc == 0x2014 // em dash
|
|
|
|
|
|| cc == 0x2026)) // horizontal ellipsis
|
|
|
|
|
return FALSE;
|
|
|
|
|
|
|
|
|
|
return utf_allow_break_after(cc) && utf_allow_break_before(ncc);
|
|
|
|
|
}
|
|
|
|
|
|
2005-01-14 21:48:43 +00:00
|
|
|
|
/*
|
|
|
|
|
* Copy a character from "*fp" to "*tp" and advance the pointers.
|
|
|
|
|
*/
|
|
|
|
|
void
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_copy_char(char_u **fp, char_u **tp)
|
2005-01-14 21:48:43 +00:00
|
|
|
|
{
|
2005-08-10 21:07:57 +00:00
|
|
|
|
int l = (*mb_ptr2len)(*fp);
|
2005-01-14 21:48:43 +00:00
|
|
|
|
|
|
|
|
|
mch_memmove(*tp, *fp, (size_t)l);
|
|
|
|
|
*tp += l;
|
|
|
|
|
*fp += l;
|
|
|
|
|
}
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Return the offset from "p" to the first byte of the next character. When "p" is
|
|
|
|
|
* at the start of a character 0 is returned, otherwise the offset to the next
|
|
|
|
|
* character. Can start anywhere in a stream of bytes.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_off_next(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2024-02-26 20:38:36 +01:00
|
|
|
|
int head_off = (*mb_head_off)(base, p);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2024-02-26 20:38:36 +01:00
|
|
|
|
if (head_off == 0)
|
|
|
|
|
return 0;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2024-02-26 20:38:36 +01:00
|
|
|
|
return (*mb_ptr2len)(p - head_off) - head_off;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
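
/*
 * Sketch only (hypothetical helper, not part of Vim): what mb_off_next()
 * computes when 'encoding' is UTF-8, minus Vim's validity checks. The lead
 * byte gives the sequence length (110x.xxxx: 2, 1110.xxxx: 3, 1111.xxxx:
 * assumed 4 here); subtracting how far "p" is into the character gives the
 * distance to the next one. Inside a truncated sequence the result can
 * overshoot; Vim avoids that because utf_head_off() returns 0 for an
 * illegal sequence.
 */
    static inline int
sketch_utf8_off_next(const unsigned char *base, const unsigned char *p)
{
    const unsigned char *q = p;
    int head_off;
    int len;

    // Find the lead byte of the character "p" points into.
    while (q > base && (*q & 0xc0) == 0x80)
        --q;
    head_off = (int)(p - q);
    if (head_off == 0)
        return 0;                       // already at the start of a char

    len = *q >= 0xf0 ? 4 : *q >= 0xe0 ? 3 : *q >= 0xc0 ? 2 : 1;
    // A stray trail byte after a single-byte char is treated as a char.
    return len > head_off ? len - head_off : 0;
}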
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return the offset from "p" to the last byte of the character it points
|
|
|
|
|
* into. Can start anywhere in a stream of bytes.
|
2021-12-16 14:45:13 +00:00
|
|
|
|
* Composing characters are not included.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_tail_off(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
int i;
|
|
|
|
|
int j;
|
|
|
|
|
|
|
|
|
|
if (*p == NUL)
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
|
|
if (enc_utf8)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Count the trail bytes (10xx.xxxx) following "p".
|
2004-06-13 20:20:40 +00:00
|
|
|
|
for (i = 0; (p[i + 1] & 0xc0) == 0x80; ++i)
|
|
|
|
|
;
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Check for illegal sequence.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
for (j = 0; p - j > base; ++j)
|
|
|
|
|
if ((p[-j] & 0xc0) != 0x80)
|
|
|
|
|
break;
|
|
|
|
|
if (utf8len_tab[p[-j]] != i + j + 1)
|
|
|
|
|
return 0;
|
|
|
|
|
return i;
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// It can't be the first byte of a double-byte char when not using DBCS, at the
|
|
|
|
|
// end of the string or the byte can't start a double-byte.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (enc_dbcs == 0 || p[1] == NUL || MB_BYTE2LEN(*p) == 1)
|
|
|
|
|
return 0;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Return 1 when on the lead byte, 0 when on the tail byte.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 1 - dbcs_head_off(base, p);
|
|
|
|
|
}
|
|
|
|
|
|
2006-03-17 23:12:21 +00:00
|
|
|
|
/*
|
|
|
|
|
* Find the next illegal byte sequence.
|
|
|
|
|
*/
|
|
|
|
|
void
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_find_illegal(void)
|
2006-03-17 23:12:21 +00:00
|
|
|
|
{
|
|
|
|
|
pos_T pos = curwin->w_cursor;
|
|
|
|
|
char_u *p;
|
|
|
|
|
int len;
|
|
|
|
|
vimconv_T vimconv;
|
|
|
|
|
char_u *tofree = NULL;
|
|
|
|
|
|
|
|
|
|
vimconv.vc_type = CONV_NONE;
|
|
|
|
|
if (enc_utf8 && (enc_canon_props(curbuf->b_p_fenc) & ENC_8BIT))
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// 'encoding' is "utf-8" but we are editing an 8-bit encoded file,
|
|
|
|
|
// possibly a utf-8 file with illegal bytes. Setup for conversion
|
|
|
|
|
// from utf-8 to 'fileencoding'.
|
2006-03-17 23:12:21 +00:00
|
|
|
|
convert_setup(&vimconv, p_enc, curbuf->b_p_fenc);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
curwin->w_cursor.coladd = 0;
|
|
|
|
|
for (;;)
|
|
|
|
|
{
|
|
|
|
|
p = ml_get_cursor();
|
|
|
|
|
if (vimconv.vc_type != CONV_NONE)
|
|
|
|
|
{
|
|
|
|
|
vim_free(tofree);
|
|
|
|
|
tofree = string_convert(&vimconv, p, NULL);
|
|
|
|
|
if (tofree == NULL)
|
|
|
|
|
break;
|
|
|
|
|
p = tofree;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
while (*p != NUL)
|
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Illegal means that there are not enough trail bytes (checked by
|
|
|
|
|
// utf_ptr2len()) or too many of them (overlong sequence).
|
2006-03-17 23:12:21 +00:00
|
|
|
|
len = utf_ptr2len(p);
|
|
|
|
|
if (*p >= 0x80 && (len == 1
|
|
|
|
|
|| utf_char2len(utf_ptr2char(p)) != len))
|
|
|
|
|
{
|
|
|
|
|
if (vimconv.vc_type == CONV_NONE)
|
2006-04-17 22:14:47 +00:00
|
|
|
|
curwin->w_cursor.col += (colnr_T)(p - ml_get_cursor());
|
2006-03-17 23:12:21 +00:00
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
int l;
|
|
|
|
|
|
2006-04-17 22:14:47 +00:00
|
|
|
|
len = (int)(p - tofree);
|
2006-03-17 23:12:21 +00:00
|
|
|
|
for (p = ml_get_cursor(); *p != NUL && len-- > 0; p += l)
|
|
|
|
|
{
|
|
|
|
|
l = utf_ptr2len(p);
|
|
|
|
|
curwin->w_cursor.col += l;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
goto theend;
|
|
|
|
|
}
|
|
|
|
|
p += len;
|
|
|
|
|
}
|
|
|
|
|
if (curwin->w_cursor.lnum == curbuf->b_ml.ml_line_count)
|
|
|
|
|
break;
|
|
|
|
|
++curwin->w_cursor.lnum;
|
|
|
|
|
curwin->w_cursor.col = 0;
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// didn't find it: don't move and beep
|
2006-03-17 23:12:21 +00:00
|
|
|
|
curwin->w_cursor = pos;
|
|
|
|
|
beep_flush();
|
|
|
|
|
|
|
|
|
|
theend:
|
|
|
|
|
vim_free(tofree);
|
|
|
|
|
convert_setup(&vimconv, NULL, NULL);
|
|
|
|
|
}
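
/*
 * Sketch only (hypothetical helper, not part of Vim): the per-line test of
 * utf_find_illegal() applied to a plain byte buffer. A sequence is illegal
 * when a lead byte lacks its trail bytes, when a trail byte appears without
 * a lead byte, or when it is overlong (the value would fit in fewer bytes,
 * which the code above detects as utf_char2len(utf_ptr2char(p)) != len).
 * Vim's utf8len_tab also knows five- and six-byte lead bytes; this sketch
 * stops at four bytes and does not reject surrogates or values above
 * 0x10FFFF either. Returns the byte index of the first illegal sequence,
 * or -1 when there is none.
 */
#include <stddef.h>

    static inline long
sketch_utf8_find_illegal(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n)
    {
        unsigned char c = s[i];
        size_t len;
        size_t k;
        unsigned long cp;

        if (c < 0x80)
        {
            ++i;                        // ASCII is always valid
            continue;
        }
        if (c >= 0xc0 && c < 0xe0)
        {
            len = 2;
            cp = c & 0x1f;
        }
        else if (c >= 0xe0 && c < 0xf0)
        {
            len = 3;
            cp = c & 0x0f;
        }
        else if (c >= 0xf0 && c < 0xf8)
        {
            len = 4;
            cp = c & 0x07;
        }
        else
            return (long)i;             // stray trail byte or bad lead byte
        if (len > n - i)
            return (long)i;             // sequence cut off by the end
        for (k = 1; k < len; ++k)
        {
            if ((s[i + k] & 0xc0) != 0x80)
                return (long)i;         // not enough trail bytes
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        // Overlong: the value would also fit in a shorter sequence.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800)
                || (len == 4 && cp < 0x10000))
            return (long)i;
        i += len;
    }
    return -1;
}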
|
|
|
|
|
|
2022-05-08 22:32:58 +01:00
|
|
|
|
#if defined(FEAT_GUI_GTK) || defined(FEAT_SPELL) || defined(PROTO)
|
2004-10-07 21:02:47 +00:00
|
|
|
|
/*
|
|
|
|
|
* Return TRUE if string "s" is a valid utf-8 string.
|
2022-07-04 17:34:33 +01:00
|
|
|
|
* When "end" is NULL stop at the first NUL. Otherwise stop at "end".
|
2004-10-07 21:02:47 +00:00
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
utf_valid_string(char_u *s, char_u *end)
|
2004-10-07 21:02:47 +00:00
|
|
|
|
{
|
|
|
|
|
int l;
|
|
|
|
|
char_u *p = s;
|
|
|
|
|
|
|
|
|
|
while (end == NULL ? *p != NUL : p < end)
|
|
|
|
|
{
|
2009-12-02 14:02:39 +00:00
|
|
|
|
l = utf8len_tab_zero[*p];
|
|
|
|
|
if (l == 0)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return FALSE; // invalid lead byte
|
2004-10-07 21:02:47 +00:00
|
|
|
|
if (end != NULL && p + l > end)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return FALSE; // incomplete byte sequence
|
2004-10-07 21:02:47 +00:00
|
|
|
|
++p;
|
|
|
|
|
while (--l > 0)
|
|
|
|
|
if ((*p++ & 0xc0) != 0x80)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return FALSE; // invalid trail byte
|
2004-10-07 21:02:47 +00:00
|
|
|
|
}
|
|
|
|
|
return TRUE;
|
|
|
|
|
}
|
|
|
|
|
#endif
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
#if defined(FEAT_GUI) || defined(PROTO)
|
|
|
|
|
/*
|
|
|
|
|
* Special version of mb_tail_off() for use in ScreenLines[].
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
dbcs_screen_tail_off(char_u *base, char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// It can't be the first byte of a double-byte char when not using DBCS, at the
|
|
|
|
|
// end of the string or the byte can't start a double-byte.
|
|
|
|
|
// For euc-jp an 0x8e byte always means we have a lead byte in the current
|
|
|
|
|
// cell.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (*p == NUL || p[1] == NUL
|
|
|
|
|
|| (enc_dbcs == DBCS_JPNU && *p == 0x8e)
|
|
|
|
|
|| MB_BYTE2LEN(*p) == 1)
|
|
|
|
|
return 0;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Return 1 when on the lead byte, 0 when on the tail byte.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 1 - dbcs_screen_head_off(base, p);
|
|
|
|
|
}
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If the cursor is on a trail byte, move it to the lead byte.
* Thus it moves left if necessary.
* The cursor position is adjusted in place; nothing is returned.
|
|
|
|
|
*/
|
|
|
|
|
void
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_adjust_cursor(void)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2011-07-07 15:08:58 +02:00
|
|
|
|
mb_adjustpos(curbuf, &curwin->w_cursor);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Adjust position "*lp" to point to the first byte of a multi-byte character.
|
|
|
|
|
* If it points to a tail byte it's moved backwards to the head byte.
|
|
|
|
|
*/
|
|
|
|
|
void
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_adjustpos(buf_T *buf, pos_T *lp)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *p;
|
|
|
|
|
|
2019-01-26 17:28:26 +01:00
|
|
|
|
if (lp->col > 0 || lp->coladd > 1)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2011-07-07 15:08:58 +02:00
|
|
|
|
p = ml_get_buf(buf, lp->lnum, FALSE);
|
2024-03-12 21:50:32 +01:00
|
|
|
|
if (*p == NUL || ml_get_buf_len(buf, lp->lnum) < lp->col)
|
2018-01-27 21:01:34 +01:00
|
|
|
|
lp->col = 0;
|
|
|
|
|
else
|
|
|
|
|
lp->col -= (*mb_head_off)(p, p + lp->col);
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Reset "coladd" when the cursor would be on the right half of a
|
|
|
|
|
// double-wide character.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (lp->coladd == 1
|
|
|
|
|
&& p[lp->col] != TAB
|
|
|
|
|
&& vim_isprintc((*mb_ptr2char)(p + lp->col))
|
|
|
|
|
&& ptr2cells(p + lp->col) > 1)
|
|
|
|
|
lp->coladd = 0;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return a pointer to the character before "p", if there is one.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_prevptr(
|
2019-12-04 21:57:43 +01:00
|
|
|
|
char_u *line, // start of the string
|
2016-01-30 18:51:09 +01:00
|
|
|
|
char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
if (p > line)
|
2017-03-12 19:22:36 +01:00
|
|
|
|
MB_PTR_BACK(line, p);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return p;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
2005-08-10 21:07:57 +00:00
|
|
|
|
* Return the character length of "str". Each multi-byte character (with
|
|
|
|
|
* following composing characters) counts as one.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_charlen(char_u *str)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2005-08-29 22:25:38 +00:00
|
|
|
|
char_u *p = str;
|
|
|
|
|
int count;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2005-08-29 22:25:38 +00:00
|
|
|
|
if (p == NULL)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return 0;
|
|
|
|
|
|
2005-08-29 22:25:38 +00:00
|
|
|
|
for (count = 0; *p != NUL; count++)
|
|
|
|
|
p += (*mb_ptr2len)(p);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
return count;
|
|
|
|
|
}
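
/*
 * Sketch only (hypothetical helper, not part of Vim): counting characters in
 * a UTF-8 string by counting every byte that is not a trail byte. Unlike
 * mb_charlen(), which steps with (*mb_ptr2len)() and so keeps a base
 * character together with its composing characters, this counts a
 * composing character separately: "e" followed by U+0301 gives 2.
 */
#include <stddef.h>

    static inline int
sketch_utf8_charlen(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    int count = 0;

    if (p == NULL)
        return 0;
    for (; *p != 0; ++p)
        if ((*p & 0xc0) != 0x80)        // lead byte or ASCII starts a char
            ++count;
    return count;
}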
|
|
|
|
|
|
2005-08-29 22:25:38 +00:00
|
|
|
|
/*
|
|
|
|
|
* Like mb_charlen() but for a string with specified length.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_charlen_len(char_u *str, int len)
|
2005-08-29 22:25:38 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *p = str;
|
|
|
|
|
int count;
|
|
|
|
|
|
|
|
|
|
for (count = 0; *p != NUL && p < str + len; count++)
|
|
|
|
|
p += (*mb_ptr2len)(p);
|
|
|
|
|
|
|
|
|
|
return count;
|
|
|
|
|
}
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Try to un-escape a multi-byte character.
|
|
|
|
|
* Used for the "to" and "from" part of a mapping.
|
|
|
|
|
* Return the un-escaped string if it is a multi-byte character, and advance
|
|
|
|
|
* "pp" to just after the bytes that formed it.
|
|
|
|
|
* Return NULL if no multi-byte char was found.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_unescape(char_u **pp)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2012-09-18 18:03:37 +02:00
|
|
|
|
static char_u buf[6];
|
|
|
|
|
int n;
|
|
|
|
|
int m = 0;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
char_u *str = *pp;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Must translate K_SPECIAL KS_SPECIAL KE_FILLER to K_SPECIAL and CSI
|
|
|
|
|
// KS_EXTRA KE_CSI to CSI.
|
|
|
|
|
// Maximum length of a utf-8 character is 4 bytes.
|
2012-09-18 18:03:37 +02:00
|
|
|
|
for (n = 0; str[n] != NUL && m < 4; ++n)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
if (str[n] == K_SPECIAL
|
|
|
|
|
&& str[n + 1] == KS_SPECIAL
|
|
|
|
|
&& str[n + 2] == KE_FILLER)
|
|
|
|
|
{
|
|
|
|
|
buf[m++] = K_SPECIAL;
|
|
|
|
|
n += 2;
|
|
|
|
|
}
|
2008-01-06 16:18:56 +00:00
|
|
|
|
else if ((str[n] == K_SPECIAL
|
2004-06-13 20:20:40 +00:00
|
|
|
|
# ifdef FEAT_GUI
|
2008-01-06 16:18:56 +00:00
|
|
|
|
|| str[n] == CSI
|
|
|
|
|
# endif
|
|
|
|
|
)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
&& str[n + 1] == KS_EXTRA
|
|
|
|
|
&& str[n + 2] == (int)KE_CSI)
|
|
|
|
|
{
|
|
|
|
|
buf[m++] = CSI;
|
|
|
|
|
n += 2;
|
|
|
|
|
}
|
|
|
|
|
else if (str[n] == K_SPECIAL
|
|
|
|
|
# ifdef FEAT_GUI
|
|
|
|
|
|| str[n] == CSI
|
|
|
|
|
# endif
|
|
|
|
|
)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
break; // a special key can't be a multibyte char
|
2004-06-13 20:20:40 +00:00
|
|
|
|
else
|
|
|
|
|
buf[m++] = str[n];
|
|
|
|
|
buf[m] = NUL;
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Return a multi-byte character if it's found. An illegal sequence
|
|
|
|
|
// will result in a 1 here.
|
2005-08-10 21:07:57 +00:00
|
|
|
|
if ((*mb_ptr2len)(buf) > 1)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
*pp = str + n + 1;
|
|
|
|
|
return buf;
|
|
|
|
|
}
|
2012-09-18 18:03:37 +02:00
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Bail out quickly for ASCII.
|
2012-09-18 18:03:37 +02:00
|
|
|
|
if (buf[0] < 128)
|
|
|
|
|
break;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
return NULL;
|
|
|
|
|
}
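
/*
 * Sketch only (hypothetical helper, not part of Vim): the main escape that
 * mb_unescape() undoes. Vim stores a literal K_SPECIAL byte inside key
 * sequences as K_SPECIAL KS_SPECIAL KE_FILLER; the byte values used here
 * (0x80, 0xfe and 'X') are assumed to match keymap.h. The CSI/KE_CSI form
 * used by the GUI and the four-byte limit are left out. "dst" must have
 * room for strlen(src) + 1 bytes; returns the number of bytes written
 * before the terminating NUL.
 */
#include <stddef.h>

    static inline size_t
sketch_unescape_k_special(const unsigned char *src, unsigned char *dst)
{
    size_t n = 0;

    while (*src != 0)
    {
        // Assumed values: K_SPECIAL 0x80, KS_SPECIAL 0xfe, KE_FILLER 'X'.
        if (src[0] == 0x80 && src[1] == 0xfe && src[2] == 'X')
        {
            dst[n++] = 0x80;            // the escaped literal byte
            src += 3;
        }
        else
            dst[n++] = *src++;
    }
    dst[n] = 0;
    return n;
}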
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Return TRUE if the character at "row"/"col" on the screen is the left side
|
|
|
|
|
* of a double-width character.
|
|
|
|
|
* Caller must make sure "row" and "col" are not invalid!
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_lefthalve(int row, int col)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2007-08-30 11:53:22 +00:00
|
|
|
|
return (*mb_off2cells)(LineOffset[row] + col,
|
|
|
|
|
LineOffset[row] + screen_Columns) > 1;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
2008-08-06 17:06:04 +00:00
|
|
|
|
* Correct a position on the screen: if it is the right half of a double-wide
* char, move it to the left half. Returns the corrected column.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
mb_fix_col(int col, int row)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-07-08 23:30:22 +02:00
|
|
|
|
int off;
|
|
|
|
|
|
2004-06-13 20:20:40 +00:00
|
|
|
|
col = check_col(col);
|
|
|
|
|
row = check_row(row);
|
2019-07-08 23:30:22 +02:00
|
|
|
|
off = LineOffset[row] + col;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (has_mbyte && ScreenLines != NULL && col > 0
|
|
|
|
|
&& ((enc_dbcs
|
2019-07-08 23:30:22 +02:00
|
|
|
|
&& ScreenLines[off] != NUL
|
2004-06-13 20:20:40 +00:00
|
|
|
|
&& dbcs_screen_head_off(ScreenLines + LineOffset[row],
|
2019-07-08 23:30:22 +02:00
|
|
|
|
ScreenLines + off))
|
|
|
|
|
|| (enc_utf8 && ScreenLines[off] == 0
|
|
|
|
|
&& ScreenLinesUC[off] == 0)))
|
2008-06-24 21:16:56 +00:00
|
|
|
|
return col - 1;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
return col;
|
|
|
|
|
}
|
|
|
|
|
|
2016-01-29 22:36:45 +01:00
|
|
|
|
static int enc_alias_search(char_u *name);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Skip the Vim-specific prefix ("2byte-" or "8bit-") of an 'encoding' name.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
2016-01-30 18:51:09 +01:00
|
|
|
|
enc_skip(char_u *p)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
if (STRNCMP(p, "2byte-", 6) == 0)
|
|
|
|
|
return p + 6;
|
|
|
|
|
if (STRNCMP(p, "8bit-", 5) == 0)
|
|
|
|
|
return p + 5;
|
|
|
|
|
return p;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Find the canonical name for encoding "enc".
|
|
|
|
|
* When the name isn't recognized, returns "enc" itself, but with all lower
|
|
|
|
|
* case characters and '_' replaced with '-'.
|
|
|
|
|
* Returns an allocated string. NULL for out-of-memory.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
2016-01-30 18:51:09 +01:00
|
|
|
|
enc_canonize(char_u *enc)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
char_u *r;
|
|
|
|
|
char_u *p, *s;
|
|
|
|
|
int i;
|
|
|
|
|
|
2004-12-19 22:46:22 +00:00
|
|
|
|
if (STRCMP(enc, "default") == 0)
|
|
|
|
|
{
|
2021-05-31 18:40:49 +02:00
|
|
|
|
#ifdef MSWIN
|
|
|
|
|
// Use the system encoding, the default is always utf-8.
|
|
|
|
|
r = enc_locale();
|
|
|
|
|
#else
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Use the default encoding as it's found by set_init_1().
|
2004-12-19 22:46:22 +00:00
|
|
|
|
r = get_encoding_default();
|
2021-05-31 18:40:49 +02:00
|
|
|
|
#endif
|
2004-12-19 22:46:22 +00:00
|
|
|
|
if (r == NULL)
|
2021-05-30 18:04:19 +02:00
|
|
|
|
r = (char_u *)ENC_DFLT;
|
2004-12-19 22:46:22 +00:00
|
|
|
|
return vim_strsave(r);
|
|
|
|
|
}
|
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// copy "enc" to allocated memory, with room for two '-'
|
2019-05-24 18:54:09 +02:00
|
|
|
|
r = alloc(STRLEN(enc) + 3);
|
2023-01-14 12:32:28 +00:00
|
|
|
|
if (r == NULL)
|
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
|
|
// Make it all lower case and replace '_' with '-'.
|
|
|
|
|
p = r;
|
|
|
|
|
for (s = enc; *s != NUL; ++s)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2023-01-14 12:32:28 +00:00
|
|
|
|
if (*s == '_')
|
|
|
|
|
*p++ = '-';
|
|
|
|
|
else
|
|
|
|
|
*p++ = TOLOWER_ASC(*s);
|
|
|
|
|
}
|
|
|
|
|
*p = NUL;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
// Skip "2byte-" and "8bit-".
|
|
|
|
|
p = enc_skip(r);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
// Change "microsoft-cp" to "cp". Used in some spell files.
|
|
|
|
|
if (STRNCMP(p, "microsoft-cp", 12) == 0)
|
|
|
|
|
STRMOVE(p, p + 10);
|
2005-08-15 21:41:48 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
// "iso8859" -> "iso-8859"
|
|
|
|
|
if (STRNCMP(p, "iso8859", 7) == 0)
|
|
|
|
|
{
|
|
|
|
|
STRMOVE(p + 4, p + 3);
|
|
|
|
|
p[3] = '-';
|
|
|
|
|
}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
// "iso-8859n" -> "iso-8859-n"
|
2024-01-04 21:19:04 +01:00
|
|
|
|
if (STRNCMP(p, "iso-8859", 8) == 0 && SAFE_isdigit(p[8]))
|
2023-01-14 12:32:28 +00:00
|
|
|
|
{
|
|
|
|
|
STRMOVE(p + 9, p + 8);
|
|
|
|
|
p[8] = '-';
|
|
|
|
|
}
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
// "latin-N" -> "latinN"
|
|
|
|
|
if (STRNCMP(p, "latin-", 6) == 0)
|
|
|
|
|
STRMOVE(p + 5, p + 6);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2023-01-14 12:32:28 +00:00
|
|
|
|
if (enc_canon_search(p) >= 0)
|
|
|
|
|
{
|
|
|
|
|
// canonical name can be used unmodified
|
|
|
|
|
if (p != r)
|
|
|
|
|
STRMOVE(r, p);
|
|
|
|
|
}
|
|
|
|
|
else if ((i = enc_alias_search(p)) >= 0)
|
|
|
|
|
{
|
|
|
|
|
// alias recognized, get canonical name
|
|
|
|
|
vim_free(r);
|
|
|
|
|
r = vim_strsave((char_u *)enc_canon_table[i].name);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
}
|
|
|
|
|
return r;
|
|
|
|
|
}
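
/*
 * Sketch only (hypothetical helper, not part of Vim): the spelling
 * normalization steps of enc_canonize() on a caller-supplied buffer. As in
 * enc_canonize(), which allocates STRLEN(enc) + 3 bytes, "buf" needs room
 * for two extra '-' characters. Skipping the "2byte-" and "8bit-" prefixes,
 * the "microsoft-cp" rewrite and the alias-table lookup are left out, so
 * "ISO_8859_1" becomes "iso-8859-1" and "Latin_1" becomes "latin1".
 */
#include <string.h>

    static inline void
sketch_normalize_enc_name(char *buf)
{
    char *p;

    // Lower-case ASCII letters and turn '_' into '-'.
    for (p = buf; *p != '\0'; ++p)
    {
        if (*p == '_')
            *p = '-';
        else if (*p >= 'A' && *p <= 'Z')
            *p = (char)(*p - 'A' + 'a');
    }
    // "iso8859" -> "iso-8859"
    if (strncmp(buf, "iso8859", 7) == 0)
    {
        memmove(buf + 4, buf + 3, strlen(buf + 3) + 1);
        buf[3] = '-';
    }
    // "iso-8859N" -> "iso-8859-N"
    if (strncmp(buf, "iso-8859", 8) == 0 && buf[8] >= '0' && buf[8] <= '9')
    {
        memmove(buf + 9, buf + 8, strlen(buf + 8) + 1);
        buf[8] = '-';
    }
    // "latin-N" -> "latinN"
    if (strncmp(buf, "latin-", 6) == 0)
        memmove(buf + 5, buf + 6, strlen(buf + 6) + 1);
}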
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Search for an encoding alias of "name".
|
|
|
|
|
* Returns -1 when not found.
|
|
|
|
|
*/
|
|
|
|
|
static int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
enc_alias_search(char_u *name)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
for (i = 0; enc_alias_table[i].name != NULL; ++i)
|
|
|
|
|
if (STRCMP(name, enc_alias_table[i].name) == 0)
|
|
|
|
|
return enc_alias_table[i].canon;
|
|
|
|
|
return -1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
2019-01-24 15:54:21 +01:00
|
|
|
|
#ifdef HAVE_LANGINFO_H
|
|
|
|
|
# include <langinfo.h>
|
|
|
|
|
#endif
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2019-04-28 19:46:49 +02:00
|
|
|
|
#if !defined(FEAT_GUI_MSWIN) || defined(VIMDLL)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
2017-09-26 19:10:37 +02:00
|
|
|
|
* Get the canonicalized encoding from the specified locale string "locale"
|
|
|
|
|
* or from the environment variables LC_ALL, LC_CTYPE and LANG.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
* Returns an allocated string when successful, NULL when not.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
2017-09-26 19:10:37 +02:00
|
|
|
|
enc_locale_env(char *locale)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2017-09-26 19:10:37 +02:00
|
|
|
|
char *s = locale;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
char *p;
|
|
|
|
|
int i;
|
|
|
|
|
char buf[50];
|
|
|
|
|
|
2017-09-26 19:10:37 +02:00
|
|
|
|
if (s == NULL || *s == NUL)
|
|
|
|
|
if ((s = getenv("LC_ALL")) == NULL || *s == NUL)
|
|
|
|
|
if ((s = getenv("LC_CTYPE")) == NULL || *s == NUL)
|
|
|
|
|
s = getenv("LANG");
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
if (s == NULL || *s == NUL)
|
2017-09-26 19:10:37 +02:00
|
|
|
|
return NULL;
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// The most generic locale format is:
|
|
|
|
|
// language[_territory][.codeset][@modifier][+special][,[sponsor][_revision]]
|
|
|
|
|
// If there is a '.' remove the part before it.
|
|
|
|
|
// If there is something after the codeset, remove it.
|
|
|
|
|
// Make the name lowercase and replace '_' with '-'.
|
|
|
|
|
// Exception: "ja_JP.EUC" == "euc-jp", "zh_CN.EUC" == "euc-cn",
|
|
|
|
|
// "ko_KR.EUC" == "euc-kr"
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if ((p = (char *)vim_strchr((char_u *)s, '.')) != NULL)
|
|
|
|
|
{
|
|
|
|
|
if (p > s + 2 && STRNICMP(p + 1, "EUC", 3) == 0
|
2024-01-04 21:19:04 +01:00
|
|
|
|
&& !SAFE_isalnum((int)p[4]) && p[4] != '-' && p[-3] == '_')
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// copy "XY.EUC" as "euc-XY" into buf[10]
|
2004-06-13 20:20:40 +00:00
|
|
|
|
STRCPY(buf + 10, "euc-");
|
|
|
|
|
buf[14] = p[-2];
|
|
|
|
|
buf[15] = p[-1];
|
|
|
|
|
buf[16] = 0;
|
|
|
|
|
s = buf + 10;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
s = p + 1;
|
|
|
|
|
}
|
2016-07-11 23:19:05 +02:00
|
|
|
|
for (i = 0; i < (int)sizeof(buf) - 1 && s[i] != NUL; ++i)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
if (s[i] == '_' || s[i] == '-')
|
|
|
|
|
buf[i] = '-';
|
2024-01-04 21:19:04 +01:00
|
|
|
|
else if (SAFE_isalnum(s[i]))
|
2004-06-13 20:20:40 +00:00
|
|
|
|
buf[i] = TOLOWER_ASC(s[i]);
|
|
|
|
|
else
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
buf[i] = NUL;
|
|
|
|
|
|
|
|
|
|
return enc_canonize((char_u *)buf);
|
|
|
|
|
}
|
2019-01-24 15:54:21 +01:00
|
|
|
|
#endif
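
/*
 * Sketch only (hypothetical helpers, not part of Vim): the locale parsing
 * done by enc_locale_env(), without the environment lookup. The codeset
 * after '.' is kept, "XY.EUC" becomes "euc-xy", letters are lower-cased,
 * '_' becomes '-' and copying stops at the first other character (such as
 * the '@' of "@euro"). Vim then passes the result to enc_canonize(), which
 * for example maps "iso8859-1" to the canonical "latin1".
 */
#include <stddef.h>
#include <string.h>

    static inline int
sketch_is_alnum(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
                                  || (c >= 'A' && c <= 'Z');
}

    static inline void
sketch_locale_codeset(const char *locale, char *out, size_t outlen)
{
    char tmp[8];
    const char *s = locale;
    const char *dot = strchr(locale, '.');
    size_t i;

    if (dot != NULL)
    {
        if (dot - locale > 2 && dot[-3] == '_'
                && (dot[1] == 'E' || dot[1] == 'e')
                && (dot[2] == 'U' || dot[2] == 'u')
                && (dot[3] == 'C' || dot[3] == 'c')
                && !sketch_is_alnum(dot[4]) && dot[4] != '-')
        {
            // "XY.EUC" -> "euc-XY" (lower-cased by the loop below)
            memcpy(tmp, "euc-", 4);
            tmp[4] = dot[-2];
            tmp[5] = dot[-1];
            tmp[6] = '\0';
            s = tmp;
        }
        else
            s = dot + 1;                // keep only the codeset part
    }

    // Copy letters and digits in lower case, map '_' and '-' to '-', and
    // stop at the first other character.
    for (i = 0; i + 1 < outlen && s[i] != '\0'; ++i)
    {
        char c = s[i];

        if (c == '_' || c == '-')
            out[i] = '-';
        else if (sketch_is_alnum(c))
            out[i] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
        else
            break;
    }
    if (outlen > 0)
        out[i] = '\0';
}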
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
2017-09-26 19:10:37 +02:00
|
|
|
|
/*
|
|
|
|
|
* Get the canonicalized encoding of the current locale.
|
|
|
|
|
* Returns an allocated string when successful, NULL when not.
|
|
|
|
|
*/
|
|
|
|
|
char_u *
|
|
|
|
|
enc_locale(void)
|
|
|
|
|
{
|
2019-02-17 17:44:42 +01:00
|
|
|
|
#ifdef MSWIN
|
2017-09-26 19:10:37 +02:00
|
|
|
|
char buf[50];
|
|
|
|
|
long acp = GetACP();
|
|
|
|
|
|
|
|
|
|
if (acp == 1200)
|
|
|
|
|
STRCPY(buf, "ucs-2le");
|
2019-09-07 16:07:47 +02:00
|
|
|
|
else if (acp == 1252) // cp1252 is used as latin1
|
2017-09-26 19:10:37 +02:00
|
|
|
|
STRCPY(buf, "latin1");
|
2019-09-07 16:07:47 +02:00
|
|
|
|
else if (acp == 65001)
|
|
|
|
|
STRCPY(buf, "utf-8");
|
2017-09-26 19:10:37 +02:00
|
|
|
|
else
|
|
|
|
|
sprintf(buf, "cp%ld", acp);
|
|
|
|
|
|
|
|
|
|
return enc_canonize((char_u *)buf);
|
2019-01-24 15:54:21 +01:00
|
|
|
|
#else
|
2017-09-26 19:10:37 +02:00
|
|
|
|
char *s;
|
|
|
|
|
|
2019-01-24 15:54:21 +01:00
|
|
|
|
# ifdef HAVE_NL_LANGINFO_CODESET
|
2017-09-26 19:10:37 +02:00
|
|
|
|
if ((s = nl_langinfo(CODESET)) == NULL || *s == NUL)
|
2019-01-24 15:54:21 +01:00
|
|
|
|
# endif
|
|
|
|
|
# if defined(HAVE_LOCALE_H) || defined(X_LOCALE)
|
2017-09-26 19:10:37 +02:00
|
|
|
|
if ((s = setlocale(LC_CTYPE, NULL)) == NULL || *s == NUL)
|
2019-01-24 15:54:21 +01:00
|
|
|
|
# endif
|
2017-09-26 19:10:37 +02:00
|
|
|
|
s = NULL;
|
|
|
|
|
|
|
|
|
|
return enc_locale_env(s);
|
2019-01-24 15:54:21 +01:00
|
|
|
|
#endif
|
2017-09-26 19:10:37 +02:00
|
|
|
|
}
|
|
|
|
|
|
2019-02-17 17:44:42 +01:00
|
|
|
|
# if defined(MSWIN) || defined(PROTO) || defined(FEAT_CYGWIN_WIN32_CLIPBOARD)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
/*
|
|
|
|
|
* Convert an encoding name to an MS-Windows codepage.
|
|
|
|
|
* Returns zero if no codepage can be figured out.
|
|
|
|
|
*/
|
|
|
|
|
int
|
2016-01-30 18:51:09 +01:00
|
|
|
|
encname2codepage(char_u *name)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
int cp;
|
|
|
|
|
char_u *p = name;
|
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
|
|
if (STRNCMP(p, "8bit-", 5) == 0)
|
|
|
|
|
p += 5;
|
|
|
|
|
else if (STRNCMP(p, "2byte-", 6) == 0)
|
|
|
|
|
p += 6;
|
|
|
|
|
|
|
|
|
|
if (p[0] == 'c' && p[1] == 'p')
|
2013-07-05 20:09:16 +02:00
|
|
|
|
cp = atoi((char *)p + 2);
|
2004-06-13 20:20:40 +00:00
|
|
|
|
else if ((idx = enc_canon_search(p)) >= 0)
|
|
|
|
|
cp = enc_canon_table[idx].codepage;
|
|
|
|
|
else
|
|
|
|
|
return 0;
|
|
|
|
|
if (IsValidCodePage(cp))
|
|
|
|
|
return cp;
|
|
|
|
|
return 0;
|
|
|
|
|
}
|
2017-09-26 19:10:37 +02:00
|
|
|
|
# endif
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
# if defined(USE_ICONV) || defined(PROTO)
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Call iconv_open() with a check if iconv() works properly (there are broken
|
|
|
|
|
* versions).
|
|
|
|
|
* Returns (void *)-1 if failed.
|
|
|
|
|
* (should return iconv_t, but that causes problems with prototypes).
|
|
|
|
|
*/
|
|
|
|
|
void *
|
2016-01-30 18:51:09 +01:00
|
|
|
|
my_iconv_open(char_u *to, char_u *from)
|
2004-06-13 20:20:40 +00:00
|
|
|
|
{
|
|
|
|
|
iconv_t fd;
|
|
|
|
|
#define ICONV_TESTLEN 400
|
|
|
|
|
char_u tobuf[ICONV_TESTLEN];
|
|
|
|
|
char *p;
|
|
|
|
|
size_t tolen;
|
|
|
|
|
static int iconv_ok = -1;
|
|
|
|
|
|
|
|
|
|
if (iconv_ok == FALSE)
|
2019-12-04 21:57:43 +01:00
|
|
|
|
return (void *)-1; // detected a broken iconv() previously
|
2004-06-13 20:20:40 +00:00
|
|
|
|
|
|
|
|
|
#ifdef DYNAMIC_ICONV
|
2019-12-04 21:57:43 +01:00
|
|
|
|
// Check if the iconv.dll can be found.
|
2004-06-13 20:20:40 +00:00
|
|
|
|
if (!iconv_enabled(TRUE))
|
|
|
|
|
return (void *)-1;
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
fd = iconv_open((char *)enc_skip(to), (char *)enc_skip(from));
|
|
|
|
|
|
|
|
|
|
if (fd != (iconv_t)-1 && iconv_ok == -1)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* Do a dummy iconv() call to check if it actually works. There is a
|
|
|
|
|
* version of iconv() on Linux that is broken. We can't ignore it,
|
|
|
|
|
* because it's widespread. The symptoms are that after outputting
|
|
|
|
|
* the initial shift state the "to" pointer is NULL and conversion
|
|
|
|
|
* stops for no apparent reason after about 8160 characters.
|
|
|
|
|
*/
|
|
|
|
|
p = (char *)tobuf;
|
|
|
|
|
tolen = ICONV_TESTLEN;
|
|
|
|
|
(void)iconv(fd, NULL, NULL, &p, &tolen);
|
|
|
|
|
if (p == NULL)
|
|
|
|
|
{
|
|
|
|
|
iconv_ok = FALSE;
|
|
|
|
|
iconv_close(fd);
|
|
|
|
|
fd = (iconv_t)-1;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
iconv_ok = TRUE;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return (void *)fd;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
 * Convert the string "str[slen]" with iconv().
 * If "unconvlenp" is not NULL handle the string ending in an incomplete
 * sequence and set "*unconvlenp" to the length of it.
 * Returns the converted string in allocated memory. NULL for an error.
 * If resultlenp is not NULL, sets it to the result length in bytes.
 */
    static char_u *
iconv_string(
    vimconv_T *vcp,
    char_u *str,
    int slen,
    int *unconvlenp,
    int *resultlenp)
{
    const char *from;
    size_t fromlen;
    char *to;
    size_t tolen;
    size_t len = 0;
    size_t done = 0;
    char_u *result = NULL;
    char_u *p;
    int l;

    from = (char *)str;
    fromlen = slen;
    for (;;)
    {
        if (len == 0 || ICONV_ERRNO == ICONV_E2BIG)
        {
            // Allocate enough room for most conversions. When re-allocating
            // increase the buffer size.
            len = len + fromlen * 2 + 40;
            p = alloc(len);
            if (p != NULL && done > 0)
                mch_memmove(p, result, done);
            vim_free(result);
            result = p;
            if (result == NULL) // out of memory
                break;
        }

        to = (char *)result + done;
        tolen = len - done - 2;
        // Avoid a warning for systems with a wrong iconv() prototype by
        // casting the second argument to void *.
        if (iconv(vcp->vc_fd, (void *)&from, &fromlen, &to, &tolen)
                != (size_t)-1)
        {
            // Finished, append a NUL.
            *to = NUL;
            break;
        }

        // Check both ICONV_EINVAL and EINVAL, because the dynamically loaded
        // iconv library may use one of them.
        if (!vcp->vc_fail && unconvlenp != NULL
                && (ICONV_ERRNO == ICONV_EINVAL || ICONV_ERRNO == EINVAL))
        {
            // Handle an incomplete sequence at the end.
            *to = NUL;
            *unconvlenp = (int)fromlen;
            break;
        }

        // Check both ICONV_EILSEQ and EILSEQ, because the dynamically loaded
        // iconv library may use one of them.
        else if (!vcp->vc_fail
                && (ICONV_ERRNO == ICONV_EILSEQ || ICONV_ERRNO == EILSEQ
                    || ICONV_ERRNO == ICONV_EINVAL || ICONV_ERRNO == EINVAL))
        {
            // Can't convert: insert a '?' and skip a character. This assumes
            // conversion from 'encoding' to something else. In other
            // situations we don't know what to skip anyway.
            *to++ = '?';
            if ((*mb_ptr2cells)((char_u *)from) > 1)
                *to++ = '?';
            if (enc_utf8)
                l = utfc_ptr2len_len((char_u *)from, (int)fromlen);
            else
            {
                l = (*mb_ptr2len)((char_u *)from);
                if (l > (int)fromlen)
                    l = (int)fromlen;
            }
            from += l;
            fromlen -= l;
        }
        else if (ICONV_ERRNO != ICONV_E2BIG)
        {
            // conversion failed
            VIM_CLEAR(result);
            break;
        }
        // Not enough room or skipping illegal sequence.
        done = to - (char *)result;
    }

    if (resultlenp != NULL && result != NULL)
        *resultlenp = (int)(to - (char *)result);
    return result;
}

# if defined(DYNAMIC_ICONV) || defined(PROTO)
/*
 * Dynamically load the "iconv.dll" on Win32.
 */

# ifndef DYNAMIC_ICONV // must be generating prototypes
# define HINSTANCE int
# endif
static HINSTANCE hIconvDLL = 0;
static HINSTANCE hMsvcrtDLL = 0;

# ifndef DYNAMIC_ICONV_DLL
# define DYNAMIC_ICONV_DLL "iconv.dll"
# define DYNAMIC_ICONV_DLL_ALT1 "libiconv.dll"
# define DYNAMIC_ICONV_DLL_ALT2 "libiconv2.dll"
# define DYNAMIC_ICONV_DLL_ALT3 "libiconv-2.dll"
# endif
# ifndef DYNAMIC_MSVCRT_DLL
# define DYNAMIC_MSVCRT_DLL "msvcrt.dll"
# endif

/*
 * Try opening the iconv.dll and return TRUE if iconv() can be used.
 */
    int
iconv_enabled(int verbose)
{
    if (hIconvDLL != 0 && hMsvcrtDLL != 0)
        return TRUE;

    // The iconv DLL file goes under different names, try them all.
    // Do the "2" version first, it's newer.
#ifdef DYNAMIC_ICONV_DLL_ALT2
    if (hIconvDLL == 0)
        hIconvDLL = vimLoadLib(DYNAMIC_ICONV_DLL_ALT2);
#endif
#ifdef DYNAMIC_ICONV_DLL_ALT3
    if (hIconvDLL == 0)
        hIconvDLL = vimLoadLib(DYNAMIC_ICONV_DLL_ALT3);
#endif
    if (hIconvDLL == 0)
        hIconvDLL = vimLoadLib(DYNAMIC_ICONV_DLL);
#ifdef DYNAMIC_ICONV_DLL_ALT1
    if (hIconvDLL == 0)
        hIconvDLL = vimLoadLib(DYNAMIC_ICONV_DLL_ALT1);
#endif

    if (hIconvDLL != 0)
        hMsvcrtDLL = vimLoadLib(DYNAMIC_MSVCRT_DLL);
    if (hIconvDLL == 0 || hMsvcrtDLL == 0)
    {
        // Only give the message when 'verbose' is set, otherwise it might be
        // done whenever a conversion is attempted.
        if (verbose && p_verbose > 0)
        {
            verbose_enter();
            semsg(_(e_could_not_load_library_str_str),
                    hIconvDLL == 0 ? DYNAMIC_ICONV_DLL : DYNAMIC_MSVCRT_DLL,
                    GetWin32Error());
            verbose_leave();
        }
        iconv_end();
        return FALSE;
    }

    iconv = (size_t (*)(iconv_t, const char **,
                size_t *, char **, size_t *))
        GetProcAddress(hIconvDLL, "libiconv");
    iconv_open = (iconv_t (*)(const char *, const char *))
        GetProcAddress(hIconvDLL, "libiconv_open");
    iconv_close = (int (*)(iconv_t))
        GetProcAddress(hIconvDLL, "libiconv_close");
    iconvctl = (int (*)(iconv_t, int, void *))
        GetProcAddress(hIconvDLL, "libiconvctl");
    iconv_errno = (int *(*)(void))get_dll_import_func(hIconvDLL, "_errno");
    if (iconv_errno == NULL)
        iconv_errno = (int *(*)(void))GetProcAddress(hMsvcrtDLL, "_errno");
    if (iconv == NULL || iconv_open == NULL || iconv_close == NULL
            || iconvctl == NULL || iconv_errno == NULL)
    {
        iconv_end();
        if (verbose && p_verbose > 0)
        {
            verbose_enter();
            semsg(_(e_could_not_load_library_function_str), "for libiconv");
            verbose_leave();
        }
        return FALSE;
    }
    return TRUE;
}

    void
iconv_end(void)
{
    // Don't use iconv() when inputting or outputting characters.
    if (input_conv.vc_type == CONV_ICONV)
        convert_setup(&input_conv, NULL, NULL);
    if (output_conv.vc_type == CONV_ICONV)
        convert_setup(&output_conv, NULL, NULL);

    if (hIconvDLL != 0)
        FreeLibrary(hIconvDLL);
    if (hMsvcrtDLL != 0)
        FreeLibrary(hMsvcrtDLL);
    hIconvDLL = 0;
    hMsvcrtDLL = 0;
}
# endif // DYNAMIC_ICONV
# endif // USE_ICONV

#if defined(FEAT_EVAL) || defined(PROTO)
/*
 * "getimstatus()" function
 */
    void
f_getimstatus(typval_T *argvars UNUSED, typval_T *rettv)
{
# if defined(HAVE_INPUT_METHOD)
    rettv->vval.v_number = im_get_status();
# endif
}

/*
 * "iconv()" function
 */
    void
f_iconv(typval_T *argvars, typval_T *rettv)
{
    char_u buf1[NUMBUFLEN];
    char_u buf2[NUMBUFLEN];
    char_u *from, *to, *str;
    vimconv_T vimconv;

    rettv->v_type = VAR_STRING;
    rettv->vval.v_string = NULL;

    if (in_vim9script()
            && (check_for_string_arg(argvars, 0) == FAIL
                || check_for_string_arg(argvars, 1) == FAIL
                || check_for_string_arg(argvars, 2) == FAIL))
        return;

    str = tv_get_string(&argvars[0]);
    from = enc_canonize(enc_skip(tv_get_string_buf(&argvars[1], buf1)));
    to = enc_canonize(enc_skip(tv_get_string_buf(&argvars[2], buf2)));
    vimconv.vc_type = CONV_NONE;
    convert_setup(&vimconv, from, to);

    // If the encodings are equal, or this conversion is not supported,
    // return an unmodified copy of the string.
    if (vimconv.vc_type == CONV_NONE)
        rettv->vval.v_string = vim_strsave(str);
    else
        rettv->vval.v_string = string_convert(&vimconv, str, NULL);

    convert_setup(&vimconv, NULL, NULL);
    vim_free(from);
    vim_free(to);
}
#endif

/*
 * Set up "vcp" for conversion from "from" to "to".
 * The names must have been made canonical with enc_canonize().
 * vcp->vc_type must have been initialized to CONV_NONE.
 * Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8
 * instead).
 * Afterwards invoke with "from" and "to" equal to NULL to clean up.
 * Return FAIL when conversion is not supported, OK otherwise.
 */
    int
convert_setup(vimconv_T *vcp, char_u *from, char_u *to)
{
    return convert_setup_ext(vcp, from, TRUE, to, TRUE);
}

/*
 * As convert_setup(), but only when from_unicode_is_utf8 is TRUE will all
 * "from" unicode charsets be considered utf-8. Same for "to".
 */
    int
convert_setup_ext(
    vimconv_T *vcp,
    char_u *from,
    int from_unicode_is_utf8,
    char_u *to,
    int to_unicode_is_utf8)
{
    int from_prop;
    int to_prop;
    int from_is_utf8;
    int to_is_utf8;

    // Reset to no conversion.
#ifdef USE_ICONV
    if (vcp->vc_type == CONV_ICONV && vcp->vc_fd != (iconv_t)-1)
        iconv_close(vcp->vc_fd);
#endif
    vcp->vc_type = CONV_NONE;
    vcp->vc_factor = 1;
    vcp->vc_fail = FALSE;

    // No conversion when one of the names is empty or they are equal.
    if (from == NULL || *from == NUL || to == NULL || *to == NUL
            || STRCMP(from, to) == 0)
        return OK;

    from_prop = enc_canon_props(from);
    to_prop = enc_canon_props(to);
    if (from_unicode_is_utf8)
        from_is_utf8 = from_prop & ENC_UNICODE;
    else
        from_is_utf8 = from_prop == ENC_UNICODE;
    if (to_unicode_is_utf8)
        to_is_utf8 = to_prop & ENC_UNICODE;
    else
        to_is_utf8 = to_prop == ENC_UNICODE;

    if ((from_prop & ENC_LATIN1) && to_is_utf8)
    {
        // Internal latin1 -> utf-8 conversion.
        vcp->vc_type = CONV_TO_UTF8;
        vcp->vc_factor = 2; // up to twice as long
    }
    else if ((from_prop & ENC_LATIN9) && to_is_utf8)
    {
        // Internal latin9 -> utf-8 conversion.
        vcp->vc_type = CONV_9_TO_UTF8;
        vcp->vc_factor = 3; // up to three times as long (euro sign)
    }
    else if (from_is_utf8 && (to_prop & ENC_LATIN1))
    {
        // Internal utf-8 -> latin1 conversion.
        vcp->vc_type = CONV_TO_LATIN1;
    }
    else if (from_is_utf8 && (to_prop & ENC_LATIN9))
    {
        // Internal utf-8 -> latin9 conversion.
        vcp->vc_type = CONV_TO_LATIN9;
    }
#ifdef MSWIN
    // Win32-specific codepage <-> codepage conversion without iconv.
    else if ((from_is_utf8 || encname2codepage(from) > 0)
            && (to_is_utf8 || encname2codepage(to) > 0))
    {
        vcp->vc_type = CONV_CODEPAGE;
        vcp->vc_factor = 2; // up to twice as long
        vcp->vc_cpfrom = from_is_utf8 ? 0 : encname2codepage(from);
        vcp->vc_cpto = to_is_utf8 ? 0 : encname2codepage(to);
    }
#endif
#ifdef MACOS_CONVERT
    else if ((from_prop & ENC_MACROMAN) && (to_prop & ENC_LATIN1))
    {
        vcp->vc_type = CONV_MAC_LATIN1;
    }
    else if ((from_prop & ENC_MACROMAN) && to_is_utf8)
    {
        vcp->vc_type = CONV_MAC_UTF8;
        vcp->vc_factor = 2; // up to twice as long
    }
    else if ((from_prop & ENC_LATIN1) && (to_prop & ENC_MACROMAN))
    {
        vcp->vc_type = CONV_LATIN1_MAC;
    }
    else if (from_is_utf8 && (to_prop & ENC_MACROMAN))
    {
        vcp->vc_type = CONV_UTF8_MAC;
    }
#endif
#ifdef USE_ICONV
    else
    {
        // Use iconv() for conversion.
        vcp->vc_fd = (iconv_t)my_iconv_open(
                to_is_utf8 ? (char_u *)"utf-8" : to,
                from_is_utf8 ? (char_u *)"utf-8" : from);
        if (vcp->vc_fd != (iconv_t)-1)
        {
            vcp->vc_type = CONV_ICONV;
            vcp->vc_factor = 4; // could be longer too...
        }
    }
#endif
    if (vcp->vc_type == CONV_NONE)
        return FAIL;

    return OK;
}

#if defined(FEAT_GUI) || defined(AMIGA) || defined(MSWIN) \
        || defined(PROTO)
/*
 * Do conversion on typed input characters in-place.
 * The input and output are not NUL terminated!
 * Returns the length after conversion.
 */
    int
convert_input(char_u *ptr, int len, int maxlen)
{
    return convert_input_safe(ptr, len, maxlen, NULL, NULL);
}
#endif

/*
 * Like convert_input(), but when there is an incomplete byte sequence at the
 * end, return that as an allocated string in "restp" and set "*restlenp" to
 * the length. If "restp" is NULL it is not used.
 */
    int
convert_input_safe(
    char_u *ptr,
    int len,
    int maxlen,
    char_u **restp,
    int *restlenp)
{
    char_u *d;
    int dlen = len;
    int unconvertlen = 0;

    d = string_convert_ext(&input_conv, ptr, &dlen,
            restp == NULL ? NULL : &unconvertlen);
    if (d == NULL)
        return dlen;

    if (dlen <= maxlen)
    {
        if (unconvertlen > 0)
        {
            // Move the unconverted characters to allocated memory.
            *restp = alloc(unconvertlen);
            if (*restp != NULL)
                mch_memmove(*restp, ptr + len - unconvertlen, unconvertlen);
            *restlenp = unconvertlen;
        }
        mch_memmove(ptr, d, dlen);
    }
    else
        // result is too long, keep the unconverted text (the caller must
        // have done something wrong!)
        dlen = len;
    vim_free(d);
    return dlen;
}

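/*
 * Illustrative sketch only (not part of Vim, excluded with "#if 0").  It shows
 * how to find the length of an incomplete UTF-8 sequence at the end of a
 * buffer.  convert_input_safe() above hands those bytes back through "restp"
 * so they can be prepended to the next chunk of input.  Vim itself gets this
 * length from string_convert_ext() via "unconvlenp"; the helper name
 * sketch_utf8_incomplete_tail() is hypothetical.  It returns 0 when the
 * buffer ends in a complete character or in bytes that can never become
 * valid UTF-8.
 */
#if 0
#include <stddef.h>

    static size_t
sketch_utf8_incomplete_tail(const unsigned char *s, size_t len)
{
    size_t      back = 0;       // continuation bytes at the end
    size_t      need;           // sequence length announced by the lead byte
    unsigned    lead;

    // A lead byte is followed by at most three 10xx.xxxx continuation bytes.
    while (back < len && back < 3 && (s[len - 1 - back] & 0xc0) == 0x80)
        ++back;
    if (back == len)
        return 0;               // no lead byte at all
    lead = s[len - 1 - back];
    if (lead < 0x80)
        return 0;               // ASCII: complete, or stray continuation bytes
    else if ((lead & 0xe0) == 0xc0)
        need = 2;
    else if ((lead & 0xf0) == 0xe0)
        need = 3;
    else if ((lead & 0xf8) == 0xf0)
        need = 4;
    else
        return 0;               // continuation byte or invalid lead byte
    return back + 1 < need ? back + 1 : 0;
}
#endif
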
/*
 * Convert text "ptr[*lenp]" according to "vcp".
 * Returns the result in allocated memory and sets "*lenp".
 * When "lenp" is NULL, use NUL terminated strings.
 * Illegal chars are often changed to "?", unless vcp->vc_fail is set.
 * When something goes wrong, NULL is returned and "*lenp" is unchanged.
 */
    char_u *
string_convert(
    vimconv_T *vcp,
    char_u *ptr,
    int *lenp)
{
    return string_convert_ext(vcp, ptr, lenp, NULL);
}

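/*
 * Illustrative sketch only (not part of Vim, excluded with "#if 0").  It
 * shows the core of the CONV_TO_LATIN1 case in string_convert_ext() below,
 * written without Vim's helpers; sketch_utf8_to_latin1() is a hypothetical
 * name.
 *
 * Code points below 0x100 map to the same Latin-1 byte.  Anything else
 * becomes 0xbf (the inverted question mark), as in Vim when vc_fail is not
 * set.  Malformed input makes it return (size_t)-1; Vim returns NULL for an
 * illegal byte.
 *
 * Unlike Vim, it does not skip composing characters, add a second '?' for
 * double-width characters, report an incomplete tail, or reject overlong
 * encodings.  "dst" must have room for "len + 1" bytes, since the output is
 * never longer than the input.
 */
#if 0
#include <stddef.h>

    static size_t
sketch_utf8_to_latin1(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t  i = 0;
    size_t  out = 0;

    while (i < len)
    {
        unsigned char   b = src[i];
        unsigned long   c;
        size_t          n;
        size_t          k;

        // The lead byte gives the sequence length and its payload bits.
        if (b < 0x80)
        {
            n = 1;
            c = b;
        }
        else if ((b & 0xe0) == 0xc0)
        {
            n = 2;
            c = b & 0x1f;
        }
        else if ((b & 0xf0) == 0xe0)
        {
            n = 3;
            c = b & 0x0f;
        }
        else if ((b & 0xf8) == 0xf0)
        {
            n = 4;
            c = b & 0x07;
        }
        else
            return (size_t)-1;      // illegal lead byte
        if (n > len - i)
            return (size_t)-1;      // sequence cut off at the end
        for (k = 1; k < n; ++k)
        {
            if ((src[i + k] & 0xc0) != 0x80)
                return (size_t)-1;  // missing continuation byte
            c = (c << 6) | (src[i + k] & 0x3f);
        }
        // Latin-1 is exactly the first 256 code points.
        dst[out++] = c < 0x100 ? (unsigned char)c : 0xbf;
        i += n;
    }
    dst[out] = '\0';
    return out;
}
#endif
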
/*
 * Like string_convert(), but when "unconvlenp" is not NULL and there is an
 * incomplete sequence at the end, it is not converted and "*unconvlenp" is
 * set to the number of remaining bytes.
 */
    char_u *
string_convert_ext(
    vimconv_T *vcp,
    char_u *ptr,
    int *lenp,
    int *unconvlenp)
{
    char_u *retval = NULL;
    char_u *d;
    int len;
    int i;
    int l;
    int c;

    if (lenp == NULL)
        len = (int)STRLEN(ptr);
    else
        len = *lenp;
    if (len == 0)
        return vim_strsave((char_u *)"");

    switch (vcp->vc_type)
    {
        case CONV_TO_UTF8: // latin1 to utf-8 conversion
            retval = alloc(len * 2 + 1);
            if (retval == NULL)
                break;
            d = retval;
            for (i = 0; i < len; ++i)
            {
                c = ptr[i];
                if (c < 0x80)
                    *d++ = c;
                else
                {
                    *d++ = 0xc0 + ((unsigned)c >> 6);
                    *d++ = 0x80 + (c & 0x3f);
                }
            }
            *d = NUL;
            if (lenp != NULL)
                *lenp = (int)(d - retval);
            break;

        case CONV_9_TO_UTF8: // latin9 to utf-8 conversion
            retval = alloc(len * 3 + 1);
            if (retval == NULL)
                break;
            d = retval;
            for (i = 0; i < len; ++i)
            {
                c = ptr[i];
                switch (c)
                {
                    case 0xa4: c = 0x20ac; break; // euro sign
                    case 0xa6: c = 0x0160; break; // S with caron
                    case 0xa8: c = 0x0161; break; // s with caron
                    case 0xb4: c = 0x017d; break; // Z with caron
                    case 0xb8: c = 0x017e; break; // z with caron
                    case 0xbc: c = 0x0152; break; // OE ligature
                    case 0xbd: c = 0x0153; break; // oe ligature
                    case 0xbe: c = 0x0178; break; // Y with diaeresis
                }
                d += utf_char2bytes(c, d);
            }
            *d = NUL;
            if (lenp != NULL)
                *lenp = (int)(d - retval);
            break;

        case CONV_TO_LATIN1: // utf-8 to latin1 conversion
        case CONV_TO_LATIN9: // utf-8 to latin9 conversion
            retval = alloc(len + 1);
            if (retval == NULL)
                break;
            d = retval;
            for (i = 0; i < len; ++i)
            {
                l = utf_ptr2len_len(ptr + i, len - i);
                if (l == 0)
                    *d++ = NUL;
                else if (l == 1)
                {
                    int l_w = utf8len_tab_zero[ptr[i]];

                    if (l_w == 0)
                    {
                        // Illegal utf-8 byte cannot be converted
                        vim_free(retval);
                        return NULL;
                    }
                    if (unconvlenp != NULL && l_w > len - i)
                    {
                        // Incomplete sequence at the end.
                        *unconvlenp = len - i;
                        break;
                    }
                    *d++ = ptr[i];
                }
                else
                {
                    c = utf_ptr2char(ptr + i);
                    if (vcp->vc_type == CONV_TO_LATIN9)
                        switch (c)
                        {
                            case 0x20ac: c = 0xa4; break; // euro sign
                            case 0x0160: c = 0xa6; break; // S with caron
                            case 0x0161: c = 0xa8; break; // s with caron
                            case 0x017d: c = 0xb4; break; // Z with caron
                            case 0x017e: c = 0xb8; break; // z with caron
                            case 0x0152: c = 0xbc; break; // OE ligature
                            case 0x0153: c = 0xbd; break; // oe ligature
                            case 0x0178: c = 0xbe; break; // Y with diaeresis
                            case 0xa4:
                            case 0xa6:
                            case 0xa8:
                            case 0xb4:
                            case 0xb8:
                            case 0xbc:
                            case 0xbd:
                            case 0xbe: c = 0x100; break; // not in latin9,
                                                         // unconvertible
                        }
                    if (!utf_iscomposing(c)) // skip composing chars
                    {
                        if (c < 0x100)
                            *d++ = c;
                        else if (vcp->vc_fail)
                        {
                            vim_free(retval);
                            return NULL;
                        }
                        else
                        {
                            *d++ = 0xbf;
                            if (utf_char2cells(c) > 1)
                                *d++ = '?';
                        }
                    }
                    i += l - 1;
                }
            }
            *d = NUL;
            if (lenp != NULL)
                *lenp = (int)(d - retval);
            break;

# ifdef MACOS_CONVERT
        case CONV_MAC_LATIN1:
            retval = mac_string_convert(ptr, len, lenp, vcp->vc_fail,
                    'm', 'l', unconvlenp);
            break;

        case CONV_LATIN1_MAC:
            retval = mac_string_convert(ptr, len, lenp, vcp->vc_fail,
                    'l', 'm', unconvlenp);
            break;

        case CONV_MAC_UTF8:
            retval = mac_string_convert(ptr, len, lenp, vcp->vc_fail,
                    'm', 'u', unconvlenp);
            break;

        case CONV_UTF8_MAC:
            retval = mac_string_convert(ptr, len, lenp, vcp->vc_fail,
                    'u', 'm', unconvlenp);
            break;
# endif

# ifdef USE_ICONV
        case CONV_ICONV: // conversion with vcp->vc_fd
            retval = iconv_string(vcp, ptr, len, unconvlenp, lenp);
            break;
# endif
# ifdef MSWIN
        case CONV_CODEPAGE: // codepage -> codepage
        {
            int retlen;
            int tmp_len;
            short_u *tmp;

            // 1. codepage/UTF-8 -> ucs-2.
            if (vcp->vc_cpfrom == 0)
                tmp_len = utf8_to_utf16(ptr, len, NULL, NULL);
            else
            {
                tmp_len = MultiByteToWideChar(vcp->vc_cpfrom,
                        unconvlenp ? MB_ERR_INVALID_CHARS : 0,
                        (char *)ptr, len, 0, 0);
                if (tmp_len == 0
                        && GetLastError() == ERROR_NO_UNICODE_TRANSLATION)
                {
                    if (lenp != NULL)
                        *lenp = 0;
                    if (unconvlenp != NULL)
                        *unconvlenp = len;
                    retval = alloc(1);
                    if (retval)
                        retval[0] = NUL;
                    return retval;
                }
            }
            tmp = ALLOC_MULT(short_u, tmp_len);
            if (tmp == NULL)
                break;
            if (vcp->vc_cpfrom == 0)
                utf8_to_utf16(ptr, len, tmp, unconvlenp);
            else
                MultiByteToWideChar(vcp->vc_cpfrom, 0,
                        (char *)ptr, len, tmp, tmp_len);

            // 2. ucs-2 -> codepage/UTF-8.
            if (vcp->vc_cpto == 0)
                retlen = utf16_to_utf8(tmp, tmp_len, NULL);
            else
                retlen = WideCharToMultiByte(vcp->vc_cpto, 0,
                        tmp, tmp_len, 0, 0, 0, 0);
            retval = alloc(retlen + 1);
            if (retval != NULL)
            {
                if (vcp->vc_cpto == 0)
                    utf16_to_utf8(tmp, tmp_len, retval);
                else
                    WideCharToMultiByte(vcp->vc_cpto, 0,
                            tmp, tmp_len,
                            (char *)retval, retlen, 0, 0);
                retval[retlen] = NUL;
                if (lenp != NULL)
                    *lenp = retlen;
            }
            vim_free(tmp);
            break;
        }
# endif
    }

    return retval;
}

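/*
 * Illustrative sketch only (not part of Vim, excluded with "#if 0").  It
 * shows the CONV_9_TO_UTF8 case of string_convert_ext() above without Vim's
 * helpers; sketch_latin9_to_ucs() and sketch_latin9_to_utf8() are
 * hypothetical names.
 *
 * Latin-9 equals Latin-1 except for eight bytes.  The largest code point it
 * maps to is the euro sign, U+20AC, which needs three UTF-8 bytes.  That is
 * why convert_setup_ext() uses a vc_factor of 3 for this conversion.  "dst"
 * must have room for "3 * len + 1" bytes.
 */
#if 0
#include <stddef.h>

// Map one Latin-9 byte to its Unicode code point.
    static unsigned
sketch_latin9_to_ucs(unsigned char b)
{
    switch (b)
    {
        case 0xa4: return 0x20ac;   // euro sign
        case 0xa6: return 0x0160;   // S with caron
        case 0xa8: return 0x0161;   // s with caron
        case 0xb4: return 0x017d;   // Z with caron
        case 0xb8: return 0x017e;   // z with caron
        case 0xbc: return 0x0152;   // OE ligature
        case 0xbd: return 0x0153;   // oe ligature
        case 0xbe: return 0x0178;   // Y with diaeresis
        default:   return b;        // identical to Latin-1
    }
}

    static size_t
sketch_latin9_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    unsigned char   *d = dst;
    size_t          i;

    for (i = 0; i < len; ++i)
    {
        unsigned c = sketch_latin9_to_ucs(src[i]);

        if (c < 0x80)               // one byte: 0xxx.xxxx
            *d++ = (unsigned char)c;
        else if (c < 0x800)         // two bytes: 110x.xxxx 10xx.xxxx
        {
            *d++ = (unsigned char)(0xc0 | (c >> 6));
            *d++ = (unsigned char)(0x80 | (c & 0x3f));
        }
        else                        // three bytes, enough for U+20AC
        {
            *d++ = (unsigned char)(0xe0 | (c >> 12));
            *d++ = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
            *d++ = (unsigned char)(0x80 | (c & 0x3f));
        }
    }
    *d = '\0';
    return (size_t)(d - dst);
}
#endif
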
#if defined(FEAT_EVAL) || defined(PROTO)

/*
 * Table set by setcellwidths().
 */
typedef struct
{
    long first;
    long last;
    char width;
} cw_interval_T;

static cw_interval_T *cw_table = NULL;
static size_t cw_table_size = 0;

/*
 * Return 1 or 2 when "c" is in the cellwidth table.
 * Return 0 if not.
 */
    static int
cw_value(int c)
{
    int mid, bot, top;

    if (cw_table == NULL)
        return 0;

    // first quick check for Latin1 etc. characters
    if (c < cw_table[0].first)
        return 0;

    // binary search in table
    bot = 0;
    top = (int)cw_table_size - 1;
    while (top >= bot)
    {
        mid = (bot + top) / 2;
        if (cw_table[mid].last < c)
            bot = mid + 1;
        else if (cw_table[mid].first > c)
            top = mid - 1;
        else
            return cw_table[mid].width;
    }
    return 0;
}

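/*
 * Illustrative sketch only (not part of Vim, excluded with "#if 0").  It
 * shows the interval lookup done by cw_value() above, using a half-open
 * range [bot, top) of size_t indexes so an empty table needs no signed
 * "size - 1" arithmetic.  sketch_interval_T and sketch_interval_width() are
 * hypothetical names.  Like cw_value(), it assumes the table is sorted by
 * "first" and that the intervals do not overlap, so a lookup takes
 * O(log n) steps.
 */
#if 0
#include <stddef.h>

typedef struct
{
    long    first;      // first code point of the interval
    long    last;       // last code point, inclusive
    int     width;      // cell width: 1 or 2
} sketch_interval_T;

    static int
sketch_interval_width(const sketch_interval_T *table, size_t size, long c)
{
    size_t  bot = 0;
    size_t  top = size;     // search the half-open range [bot, top)

    if (size == 0 || c < table[0].first)
        return 0;           // quick exit for characters below the table
    while (bot < top)
    {
        size_t mid = bot + (top - bot) / 2;

        if (table[mid].last < c)
            bot = mid + 1;
        else if (table[mid].first > c)
            top = mid;
        else
            return table[mid].width;
    }
    return 0;               // not in any interval
}
#endif
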
    static int
tv_nr_compare(const void *a1, const void *a2)
{
    listitem_T *li1 = *(listitem_T **)a1;
    listitem_T *li2 = *(listitem_T **)a2;

    return li1->li_tv.vval.v_number == li2->li_tv.vval.v_number ? 0 :
        li1->li_tv.vval.v_number > li2->li_tv.vval.v_number ? 1 : -1;
}

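/*
 * Illustrative sketch only (not part of Vim, excluded with "#if 0").  It
 * shows a qsort() comparator in the style of tv_nr_compare() above.  Vim
 * sorts an array of listitem_T pointers, hence the "*(listitem_T **)a1"
 * double indirection; this sketch sorts the structs themselves.
 *
 * Returning "x - y" would be wrong for long values (overflow, and
 * truncation to int), so the result is computed as (x > y) - (x < y).
 *
 * After sorting, overlapping ranges are adjacent, so one linear pass can
 * detect them.  That overlap check is an assumption of this sketch, needed
 * for a binary search such as cw_value() to be unambiguous.
 * sketch_range_T, sketch_range_compare() and sketch_sort_ranges() are
 * hypothetical names.
 */
#if 0
#include <stdlib.h>

typedef struct
{
    long    first;
    long    last;
} sketch_range_T;

    static int
sketch_range_compare(const void *a1, const void *a2)
{
    long    x = ((const sketch_range_T *)a1)->first;
    long    y = ((const sketch_range_T *)a2)->first;

    // -1, 0 or 1 without the overflow risk of "x - y".
    return (x > y) - (x < y);
}

// Sort "r[n]" by start; return 1 when no two ranges overlap, 0 otherwise.
    static int
sketch_sort_ranges(sketch_range_T *r, size_t n)
{
    size_t  i;

    qsort(r, n, sizeof(*r), sketch_range_compare);
    for (i = 1; i < n; ++i)
        if (r[i].first <= r[i - 1].last)
            return 0;       // starts inside the previous range
    return 1;
}
#endif
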
    void
f_setcellwidths(typval_T *argvars, typval_T *rettv UNUSED)
{
    list_T *l;
    listitem_T *li;
    int item;
    int i;
    listitem_T **ptrs;
    cw_interval_T *table;
    cw_interval_T *cw_table_save;
    size_t cw_table_size_save;
    char *error = NULL;

    if (check_for_nonnull_list_arg(argvars, 0) == FAIL)
        return;

    l = argvars[0].vval.v_list;
    if (l->lv_len == 0)
    {
        // Clearing the table.
        VIM_CLEAR(cw_table);
        cw_table_size = 0;
        return;
    }

    ptrs = ALLOC_MULT(listitem_T *, l->lv_len);
    if (ptrs == NULL)
        return;

    // Check that all entries are a list with three numbers, the range is
    // valid and the cell width is valid.
    item = 0;
    FOR_ALL_LIST_ITEMS(l, li)
    {
        listitem_T *lili;
        varnumber_T n1;

        if (li->li_tv.v_type != VAR_LIST || li->li_tv.vval.v_list == NULL)
        {
            semsg(_(e_list_item_nr_is_not_list), item);
            vim_free(ptrs);
            return;
        }

        lili = li->li_tv.vval.v_list->lv_first;
        ptrs[item] = lili;
        for (i = 0; lili != NULL; lili = lili->li_next, ++i)
        {
            if (lili->li_tv.v_type != VAR_NUMBER)
                break;
            if (i == 0)
            {
                n1 = lili->li_tv.vval.v_number;
                if (n1 < 0x80)
                {
                    emsg(_(e_only_values_of_0x80_and_higher_supported));
                    vim_free(ptrs);
                    return;
                }
            }
            else if (i == 1 && lili->li_tv.vval.v_number < n1)
            {
                semsg(_(e_list_item_nr_range_invalid), item);
                vim_free(ptrs);
                return;
            }
            else if (i == 2 && (lili->li_tv.vval.v_number < 1
                        || lili->li_tv.vval.v_number > 2))
            {
                semsg(_(e_list_item_nr_cell_width_invalid), item);
                vim_free(ptrs);
                return;
            }
        }
        if (i != 3)
        {
            semsg(_(e_list_item_nr_does_not_contain_3_numbers), item);
|
|
|
|
|
vim_free(ptrs);
|
|
|
|
|
return;
|
|
|
|
|
}
|
2020-08-28 23:27:20 +02:00
|
|
|
|
++item;
|
2020-08-28 21:04:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sort the list on the first number.
|
|
|
|
|
qsort((void *)ptrs, (size_t)l->lv_len, sizeof(listitem_T *), tv_nr_compare);
|
|
|
|
|
|
|
|
|
|
table = ALLOC_MULT(cw_interval_T, l->lv_len);
|
|
|
|
|
if (table == NULL)
|
|
|
|
|
{
|
|
|
|
|
vim_free(ptrs);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Store the items in the new table.
|
2020-08-28 23:27:20 +02:00
|
|
|
|
for (item = 0; item < l->lv_len; ++item)
|
2020-08-28 21:04:24 +02:00
|
|
|
|
{
|
2020-08-28 23:27:20 +02:00
|
|
|
|
listitem_T *lili = ptrs[item];
|
2020-08-28 21:04:24 +02:00
|
|
|
|
varnumber_T n1;
|
|
|
|
|
|
|
|
|
|
n1 = lili->li_tv.vval.v_number;
|
|
|
|
|
if (item > 0 && n1 <= table[item - 1].last)
|
|
|
|
|
{
|
|
|
|
|
semsg(_(e_overlapping_ranges_for_nr), (long)n1);
|
|
|
|
|
vim_free(ptrs);
|
|
|
|
|
vim_free(table);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
table[item].first = n1;
|
|
|
|
|
lili = lili->li_next;
|
|
|
|
|
table[item].last = lili->li_tv.vval.v_number;
|
|
|
|
|
lili = lili->li_next;
|
|
|
|
|
table[item].width = lili->li_tv.vval.v_number;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
vim_free(ptrs);
|
2021-10-20 11:01:15 +01:00
|
|
|
|
|
|
|
|
|
cw_table_save = cw_table;
|
|
|
|
|
cw_table_size_save = cw_table_size;
|
2020-08-28 21:04:24 +02:00
|
|
|
|
cw_table = table;
|
|
|
|
|
cw_table_size = l->lv_len;
|
2021-10-20 11:01:15 +01:00
|
|
|
|
|
2022-08-09 12:53:14 +01:00
|
|
|
|
// Check that the new value does not conflict with 'listchars' or
|
|
|
|
|
// 'fillchars'.
|
|
|
|
|
error = check_chars_options();
|
2022-07-04 17:34:33 +01:00
|
|
|
|
if (error != NULL)
|
|
|
|
|
{
|
|
|
|
|
emsg(_(error));
|
|
|
|
|
cw_table = cw_table_save;
|
|
|
|
|
cw_table_size = cw_table_size_save;
|
|
|
|
|
vim_free(table);
|
|
|
|
|
return;
|
|
|
|
|
}
|
2021-10-20 11:01:15 +01:00
|
|
|
|
|
|
|
|
|
vim_free(cw_table_save);
|
2023-01-10 16:03:08 +00:00
|
|
|
|
redraw_all_later(UPD_CLEAR);
|
2020-08-28 21:04:24 +02:00
|
|
|
|
}
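f_setcellwidths() validates its argument in two passes. First, each [first, last, width] entry is checked on its own: first must be at least 0x80, last must not be below first, and width must be 1 or 2. Then, after sorting on the first number, each range must start after the previous one ends. The sketch below restates those rules over a plain C array. `triple_T`, `validate_cellwidths()` and the `CW_*` result codes are hypothetical. The sketch sorts the entries themselves rather than an array of pointers, and it leaves out the list-type checks and the later check_chars_options() conflict check with its rollback. In Vim script terms, a call such as `call setcellwidths([[0x2500, 0x257f, 2]])` passes these validation checks.

```c
#include <stdlib.h>

/* Hypothetical plain-C version of one [first, last, width] entry. */
typedef struct {
    long first;
    long last;
    long width;
} triple_T;

/* Hypothetical result codes, one per validation rule. */
enum {
    CW_OK = 0,
    CW_BELOW_0X80,      /* first value below 0x80 */
    CW_BAD_RANGE,       /* last < first */
    CW_BAD_WIDTH,       /* width is not 1 or 2 */
    CW_OVERLAP          /* two ranges share a code point */
};

/* qsort() comparator: order entries on their first code point. */
static int
triple_compare(const void *a1, const void *a2)
{
    const triple_T *t1 = a1;
    const triple_T *t2 = a2;

    return t1->first == t2->first ? 0 : t1->first > t2->first ? 1 : -1;
}

/*
 * Check "entries" with the same rules f_setcellwidths() applies and sort
 * them on "first".  Returns CW_OK or the first problem found.
 */
static int
validate_cellwidths(triple_T *entries, size_t count)
{
    size_t i;

    /* Pass 1: each entry on its own. */
    for (i = 0; i < count; ++i)
    {
        if (entries[i].first < 0x80)
            return CW_BELOW_0X80;
        if (entries[i].last < entries[i].first)
            return CW_BAD_RANGE;
        if (entries[i].width < 1 || entries[i].width > 2)
            return CW_BAD_WIDTH;
    }

    /* Pass 2: once sorted, a range must start after the previous one ends. */
    qsort(entries, count, sizeof(triple_T), triple_compare);
    for (i = 1; i < count; ++i)
        if (entries[i].first <= entries[i - 1].last)
            return CW_OVERLAP;

    return CW_OK;
}
```

Vim's own loop also reports the index of the offending entry in its error messages. The sketch only returns which rule failed.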

/*
 * "getcellwidths()" function
 */
    void
f_getcellwidths(typval_T *argvars UNUSED, typval_T *rettv)
{
    if (rettv_list_alloc(rettv) == FAIL)
        return;

    for (size_t i = 0; i < cw_table_size; i++)
    {
        list_T *entry = list_alloc();
        if (entry == NULL)
            break;
        if (list_append_number(entry, (varnumber_T)cw_table[i].first) == FAIL
            || list_append_number(entry, (varnumber_T)cw_table[i].last) == FAIL
            || list_append_number(entry, (varnumber_T)cw_table[i].width) == FAIL
            || list_append_list(rettv->vval.v_list, entry) == FAIL)
        {
            list_free(entry);
            break;
        }
    }
}
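The table is kept sorted, so getcellwidths() returns the entries in ascending order of their first code point. Each entry has the same [first, last, width] shape that setcellwidths() accepts, so the result can normally be passed straight back. As an illustration, the hypothetical `format_cellwidths()` below writes a table as Vim script list text using hex literals. `interval_T` again stands in for cw_interval_T. The function returns -1 instead of truncating when the buffer is too small.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for Vim's cw_interval_T. */
typedef struct {
    long first;
    long last;
    int  width;
} interval_T;

/*
 * Write "table" into "buf" as a Vim list literal such as
 * "[[0x2500, 0x257f, 2]]".  Returns the length written, or -1 when "buf"
 * is too small to hold the whole text.
 */
static int
format_cellwidths(const interval_T *table, size_t size,
                  char *buf, size_t buflen)
{
    size_t used;
    size_t i;
    int    n;

    n = snprintf(buf, buflen, "[");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    used = (size_t)n;

    for (i = 0; i < size; ++i)
    {
        /* Separate entries with ", " and print the bounds in hex. */
        n = snprintf(buf + used, buflen - used, "%s[0x%lx, 0x%lx, %d]",
                     i > 0 ? ", " : "",
                     (unsigned long)table[i].first,
                     (unsigned long)table[i].last, table[i].width);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, buflen - used, "]");
    if (n < 0 || (size_t)n >= buflen - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

For the two ranges 0x2500-0x257f and 0xe000-0xf8ff, both with width 2, this produces `[[0x2500, 0x257f, 2], [0xe000, 0xf8ff, 2]]`.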

/*
 * "charclass()" function
 */
    void
f_charclass(typval_T *argvars, typval_T *rettv)
{
    if (check_for_string_arg(argvars, 0) == FAIL
            || argvars[0].vval.v_string == NULL)
        return;
    rettv->vval.v_number = mb_get_class(argvars[0].vval.v_string);
}
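charclass() returns whatever mb_get_class() reports for the first character of the string. For a single-byte character, Vim's mb_get_class() returns 0 for NUL, space or tab, 2 for a keyword character according to 'iskeyword', and 1 for anything else. Multibyte characters get their class from the Unicode tables instead, where `:help charclass()` also lists 3 for emoji. The ASCII-only sketch below mirrors the single-byte branch. It assumes a default-like 'iskeyword' in which letters, digits and '_' are keyword characters, and `ascii_char_class()` is hypothetical.

```c
#include <ctype.h>

/*
 * Hypothetical ASCII-only version of the class charclass() reports:
 * 0 for NUL, space or tab, 2 for a keyword character, 1 otherwise.
 * Keyword characters are assumed to be letters, digits and '_'.
 */
static int
ascii_char_class(const char *s)
{
    unsigned char c = (unsigned char)s[0];

    if (c == '\0' || c == ' ' || c == '\t')
        return 0;
    if (isalnum(c) || c == '_')
        return 2;
    return 1;
}
```

In Vim, changing 'iskeyword' changes which characters get class 2. The sketch hard-codes the assumed set instead.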

#endif

patch 9.0.1958: cannot complete option values
Problem: cannot complete option values
Solution: Add completion functions for several options
Add cmdline tab-completion for setting string options
Add tab-completion for setting string options on the cmdline using
`:set=` (along with `:set+=` and `:set-=`).
The existing tab completion for setting options currently only works
when nothing is typed yet, and it only fills in with the existing value,
e.g. when the user does `:set diffopt=<Tab>` it will be completed to
`set diffopt=internal,filler,closeoff` and nothing else. This isn't too
useful as a user usually wants auto-complete to suggest all the possible
values, such as 'iblank', or 'algorithm:patience'.
For set= and set+=, this adds a new optional callback function for each
option that can be invoked when doing completion. This allows for each
option to have control over how completion works. For example, in
'diffopt', it will suggest the default enumeration, but if `algorithm:`
is selected, it will further suggest different algorithm types like
'meyers' and 'patience'. When using set=, the existing option value will
be filled in as the first choice to preserve the existing behavior. When
using set+= this won't happen as it doesn't make sense.
For flag list options (e.g. 'mouse' and 'guioptions'), completion will
take into account existing typed values (and in the case of set+=, the
existing option value) to make sure it doesn't suggest duplicates.
For set-=, there is a new `ExpandSettingSubtract` function which will
handle flag list and comma-separated options smartly, by only suggesting
values that currently exist in the option.
Note that Vim has some existing code that adds special handling for
'filetype', 'syntax', and misc dir options like 'backupdir'. This change
preserves them as they already work, instead of converting to the new
callback API for each option.
closes: #13182
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>
2023-09-29 20:42:32 +02:00
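The commit message above describes two filtering rules. With `:set-=`, only values already present in the option are offered. For flag-list options such as 'mouse', flags that are already set are not suggested again. The sketch below shows two primitives such completion needs: enumerating the items of a comma-separated value by index, and testing whether a flag is present. `nth_comma_item()` and `has_flag()` are hypothetical helpers, not the ExpandSettingSubtract() implementation. They also ignore the backslash-escaped commas that real option values may contain.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the "idx"-th comma-separated item of "value"
 * into "buf".  Returns 1 on success, 0 when there is no such item or it
 * does not fit.  A set-= completion could enumerate idx = 0, 1, 2, ...
 */
static int
nth_comma_item(const char *value, int idx, char *buf, size_t buflen)
{
    const char *p = value;

    if (*value == '\0')
        return 0;           /* an empty option value offers nothing */

    while (idx > 0)
    {
        p = strchr(p, ',');
        if (p == NULL)
            return 0;       /* fewer than idx + 1 items */
        ++p;
        --idx;
    }

    size_t len = strcspn(p, ",");
    if (len + 1 > buflen)
        return 0;
    memcpy(buf, p, len);
    buf[len] = '\0';
    return 1;
}

/* Hypothetical helper: is flag "c" already in the flag list "value"? */
static int
has_flag(const char *value, char c)
{
    return c != '\0' && strchr(value, c) != NULL;
}
```

For `diffopt=internal,filler,closeoff`, enumerating indexes 0 to 2 yields exactly the three values that `:set diffopt-=` could remove. With `mouse=nvi`, `has_flag()` lets a `:set mouse+=` completion skip 'n', 'v' and 'i'.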

/*
 * Function given to ExpandGeneric() to obtain the possible arguments of the
 * encoding options.
 */
    char_u *
get_encoding_name(expand_T *xp UNUSED, int idx)
{
    if (idx >= (int)(sizeof(enc_canon_table) / sizeof(enc_canon_table[0])))
        return NULL;

    return (char_u *)enc_canon_table[idx].name;
}
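get_encoding_name() follows the ExpandGeneric() callback contract. It is called with idx 0, 1, 2, ... and returns NULL once idx runs past the end of the table, which ends the enumeration. Below is a minimal sketch of both sides of that contract. The hypothetical `name_table`, `get_name()` and `count_names()` stand in for enc_canon_table, get_encoding_name() and the caller's loop, and the sample names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-in for enc_canon_table: only the names matter here. */
static const struct {
    const char *name;
} name_table[] = {
    {"latin1"},
    {"utf-8"},
    {"cp1252"},
};

/* Callback side: return NULL once "idx" is outside the table. */
static const char *
get_name(int idx)
{
    if (idx < 0 || idx >= (int)(sizeof(name_table) / sizeof(name_table[0])))
        return NULL;
    return name_table[idx].name;
}

/* Caller side: ask for index 0, 1, 2, ... until NULL, counting the entries. */
static int
count_names(const char *(*func)(int))
{
    int idx = 0;

    while (func(idx) != NULL)
        ++idx;
    return idx;
}
```

The sketch also rejects a negative index. The original only checks the upper bound because its caller never passes a negative value.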