Problem: Patch 9.1.0296 causes too many issues
(Tony Mechelynck, @chdiza, CI)
Solution: Back out the change for now
Revert "patch 9.1.0296: regexp: engines do not handle case-folding well"
This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes
issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It
needs more work.
fixes: #14487
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Regex engines do not handle case-folding well
Solution: Correctly calculate byte length of characters to skip
When the regexp engine compares two utf-8 codepoints case insensitively
it may match an adjacent character, because it assumes it can step over
as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some single-byte
value.
Let's consider the pattern 'ſ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte
value 's' so by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
- we try to match the correct length for the pattern and the text
- in case of a match, we step over it correctly
The same thing can happen for the NFA engine, when skipping to the next
character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over. So
this needs to be adjusted in find_match_text() as well.
A related issue turned out, when prog->match_text is actually empty. In
that case we should try to find the next match and skip this condition.
fixes: #14294closes: #14433
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Error E877 is not translated (RestorerZ)
Solution: Declare the error with N_ to mark it as translatable, add _()
around the error message in regexp_nfa.c
fixes: #14333
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: regexp: verymagic cannot match before/after a mark
Solution: Correctly check for the very magic check (Julio B)
Fix regexp parser for \v%>'m and \v%<'m
Currently \v%'m works fine, but it is unable to match before or after
the position of mark m.
closes: #14309
Signed-off-by: Julio B <julio.bacel@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: regexp cannot match combining chars in collection
Solution: Check for combining characters in regex collections for the
NFA and BT Regex Engine
Also, while at it, make debug mode work again.
fixes#10286closes: #12871
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: regex: combining chars in collections not handled
Solution: Check for following combining characters for NFA and BT engine
closes: #10459closes: #10286
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: It is not easy to see what client-server commands are doing.
Solution: Add channel log messages if ch_log() is available. Move the
channel logging and make it available with the +eval feature.
Problem: matchstr() still does not match column offset when done after a
text search.
Solution: Only use the line number for a multi-line search. Fix the test.
(closes#10938)
Problem: Search timeout is overrun with some patterns.
Solution: Check for timeout in more places. Make the flag volatile and
atomic. Use assert_inrange() to see what happened.
Problem: Syntax regexp matching can be slow.
Solution: Adjust the counters for checking the timeout to check about once
per msec. (closes#10487, closes#2712)
Problem: Debugging NFA regexp my crash, cached indent may be wrong.
Solution: Fix some debug warnings in the NFA regexp code. Make sure log_fd
is set when used. Fix breakindent and indent caching. (Christian
Brabandt, closes#9482)