Problem: Resetting cell widths can make 'listchars' or 'fillchars'
invalid.
Solution: Check for conflicts when resetting cell widths (zeertzjq).
closes: #15629
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: regex: wrong match when searching multi-byte char
case-insensitive (diffsetter)
Solution: Apply proper case-folding for characters and search-string
This patch does the following 4 things:
1) When the regexp engine compares two utf-8 codepoints case
insensitive it may match an adjacent character, because it assumes
it can step over as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some
single-byte value.
Let's consider the pattern 'ſ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte value
's' by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
* we try to match the correct length for the pattern and the text
* in case of a match, we step over it correctly
There is one tricky thing for the backtracing engine. We also need to
calculate correctly the number of bytes to compare the 2 different
utf-8 strings s1 and s2. So we will count the number of characters in
s1 that the byte len specified. Then we count the number of bytes to
step over the same number of characters in string s2 and then we can
correctly compare the 2 utf-8 strings.
2) A similar thing can happen for the NFA engine, when skipping to the
next character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over.
So this needs to be adjusted in find_match_text() as well.
3) A related issue turned out, when prog->match_text is actually empty.
In that case we should try to find the next match and skip this
condition.
4) When comparing characters using collections, we must also apply case
folding to each character in the collection and not just to the
current character from the search string. This doesn't apply to the
NFA engine, because internally it converts collections to branches
[abc] -> a\|b\|c
fixes: #14294closes: #14756
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: MS-Windows: too much legacy code
Solution: Clean up old code
(Ken Takata)
* Remove very old codes for Cygwin version of GCC.
Nowadays Cygwin GCC cannot be used for building Win32 Vim.
(The `-mno-cygwin` option was removed in Cygwin GCC4.)
* Remove old codes for old versions of MinGW.
Remove `__MINGW32__` as much as possible.
* Adjust makefile.
closes: #15044
Signed-off-by: K.Takata <kentkt@csc.jp>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Cursor wrong after using setcellwidth() in terminal
(mikoto2000)
Solution: output additional spaces, so the behaviour matches the GUI
(mikoto2000)
fixes: #14539closes: #14540
Signed-off-by: mikoto2000 <mikoto2000@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Fix CUI `setcellwidths` characters draw behavior to same GUI behavior.
Problem: Wrong cursor position after using setcellwidths().
Solution: Invalidate cursor position in addition to redrawing.
(zeertzjq)
closes: #14545
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Patch 9.1.0296 causes too many issues
(Tony Mechelynck, @chdiza, CI)
Solution: Back out the change for now
Revert "patch 9.1.0296: regexp: engines do not handle case-folding well"
This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes
issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It
needs more work.
fixes: #14487
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Regex engines do not handle case-folding well
Solution: Correctly calculate byte length of characters to skip
When the regexp engine compares two utf-8 codepoints case insensitively
it may match an adjacent character, because it assumes it can step over
as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some single-byte
value.
Let's consider the pattern 'ſ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte
value 's' so by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
- we try to match the correct length for the pattern and the text
- in case of a match, we step over it correctly
The same thing can happen for the NFA engine, when skipping to the next
character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over. So
this needs to be adjusted in find_match_text() as well.
A related issue turned out, when prog->match_text is actually empty. In
that case we should try to find the next match and skip this condition.
fixes: #14294closes: #14433
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: More code can use ml_get_buf_len() instead of STRLEN().
Solution: Change more STRLEN() calls to ml_get_buf_len(). Also do not
set ml_line_textlen in ml_replace_len() if "has_props" is set,
because "len_arg" also includes the size of text properties in
that case. (zeertzjq)
closes: #14183
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: upper-case of ß should be U+1E9E (CAPITAL LETTER SHARP S)
(fenuks)
Solution: Make gU, ~ and g~ convert the U+00DF LATIN SMALL LETTER SHARP S (ß)
to U+1E9E LATIN CAPITAL LETTER SHARP S (ẞ), update tests
(glepnir)
This is part of Unicode 5.1.0 from April 2008, so should be fairly safe
to use now and since 2017 is part of the German standard orthography,
according to Wikipedia:
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E#cite_note-auto-12
There is however one exception: UnicodeData.txt for U+00DF
LATIN SMALL LETTER SHARP S does NOT define U+1E9E LATIN CAPITAL LETTER
SHARP S as its upper case version. Therefore, toupper() won't be able
to convert from lower sharp s to upper case sharp s (the other way
around however works, since U+00DF is considered the lower case
character of U+1E9E and therefore tolower() works correctly for the
upper case version).
fixes: #5573closes: #14018
Signed-off-by: glepnir <glephunter@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: qsort() comparison functions should be transitive
Solution: Do not subtract values, but rather use explicit comparisons
Improve qsort() comparison functions
There has been a recent report on qsort() causing out-of-bounds read &
write in glibc for non transitive comparison functions
https://www.qualys.com/2024/01/30/qsort.txt
Even so the bug is in glibc's implementation of the qsort() algorithm,
it's bad style to just use substraction for the comparison functions,
which may cause overflow issues and as hinted at in OpenBSD's manual
page for qsort(): "It is almost always an error to use subtraction to
compute the return value of the comparison function."
So check the qsort() comparison functions and change them to be safe.
closes: #13980
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: is*() and to*() function may be unsafe
Solution: Add SAFE_* macros and start using those instead
(Keith Thompson)
Use SAFE_() macros for is*() and to*() functions
The standard is*() and to*() functions declared in <ctype.h> have
undefined behavior for negative arguments other than EOF. If plain char
is signed, passing an unchecked value from argv for from user input
to one of these functions has undefined behavior.
Solution: Add SAFE_*() macros that cast the argument to unsigned char.
Most implementations behave sanely for negative arguments, and most
character values in practice are non-negative, but it's still best
to avoid undefined behavior.
The change from #13347 has been omitted, as this has already been
separately fixed in commit ac709e2fc0db6d31abb7da96f743c40956b60c3a
(v9.0.2054)
fixes: #13332closes: #13347
Signed-off-by: Keith Thompson <Keith.S.Thompson@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: cannot complete option values
Solution: Add completion functions for several options
Add cmdline tab-completion for setting string options
Add tab-completion for setting string options on the cmdline using
`:set=` (along with `:set+=` and `:set-=`).
The existing tab completion for setting options currently only works
when nothing is typed yet, and it only fills in with the existing value,
e.g. when the user does `:set diffopt=<Tab>` it will be completed to
`set diffopt=internal,filler,closeoff` and nothing else. This isn't too
useful as a user usually wants auto-complete to suggest all the possible
values, such as 'iblank', or 'algorithm:patience'.
For set= and set+=, this adds a new optional callback function for each
option that can be invoked when doing completion. This allows for each
option to have control over how completion works. For example, in
'diffopt', it will suggest the default enumeration, but if `algorithm:`
is selected, it will further suggest different algorithm types like
'meyers' and 'patience'. When using set=, the existing option value will
be filled in as the first choice to preserve the existing behavior. When
using set+= this won't happen as it doesn't make sense.
For flag list options (e.g. 'mouse' and 'guioptions'), completion will
take into account existing typed values (and in the case of set+=, the
existing option value) to make sure it doesn't suggest duplicates.
For set-=, there is a new `ExpandSettingSubtract` function which will
handle flag list and comma-separated options smartly, by only suggesting
values that currently exist in the option.
Note that Vim has some existing code that adds special handling for
'filetype', 'syntax', and misc dir options like 'backupdir'. This change
preserves them as they already work, instead of converting to the new
callback API for each option.
closes: #13182
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>
Problem: "clear" macros are not always used.
Solution: Use ALLOC_ONE, VIM_CLEAR, CLEAR_POINTER and CLEAR_FIELD in more
places. (Yegappan Lakshmanan, closes#12104)
Problem: FOR_ALL_ macros are defined in an unexpected file.
Solution: Move FOR_ALL_ macros to macros.h. Add FOR_ALL_HASHTAB_ITEMS.
(Yegappan Lakshmanan, closes#12109)
Problem: Screen is not redrawn after using setcellwidths().
Solution: Redraw the screen when the cell widths have changed. (Yasuhiro
Matsumoto, closes#11800)
Problem: Spacing-combining characters handled as composing, causing text to
take more space than expected.
Solution: Handle characters marked with "Mc" not as composing.
(closes#11282
Problem: Vim9: runtime and compile time type checks are not the same.
Solution: Add more runtime type checks for builtin functions. (Yegappan
Lakshmanan, closes#8646)
Problem: Functions for string manipulation are spread out.
Solution: Move string related functions to a new source file. (Yegappan
Lakshmanan, closes#8470)
Problem: 'fileencodings' default value should depend on 'encoding'. (Gary
Johnson)
Solution: When 'encoding' is "utf-8" use a different default value for
'fileencodings'.
Problem: When 'clipboard' is "unnamed" zp and zP do not work correctly.
Solution: Pass -1 to str_to_reg() and fix computing the character width
instead of using the byte length. (Christian Brabandt,
closes#8301, closes#8317)