thac0/vim - vim - SDF GIT Society

Author	SHA1	Message	Date
Christian Brabandt	f2b16986a1	patch 9.1.1258: regexp: max \U and \%U value is limited by INT_MAX Problem: regexp: max \U and \%U value is limited by INT_MAX but gives a confusing error message (related: v8.1.0985). Solution: give a better error message when the value reaches INT_MAX When searching, Vim allows up to 8 hex characters with the /\U and /\%U regex atoms. However, when using "/\UFFFFFFFF" the code point is already above what an integer variable can hold, which is 2,147,483,647. Since patch v8.1.0985, Vim already limited the max codepoint to INT_MAX (otherwise it caused a crash in the nfa regex engine), but instead of erroring out it silently fell back to parsing the number as a backslash value and not as a codepoint value, and as such "/[\UFFFFFFFF]" will happily find a "\" or a literal "F". And "/[\d127-\UFFFFFFFF]" will error out with "reverse range in character class". Interestingly, the max Unicode codepoint value is U+10FFFF, which still fits into an ordinary integer value, which means that we don't even need to parse 8 hex characters; 6 should have been enough. However, let's not limit Vim to search for only max 6 hex characters (which would be a backward incompatible change), but instead allow all 8 characters and only if the codepoint reaches INT_MAX, give a more precise error message (about what the max Unicode codepoint value is). This allows searching for "[\U7FFFFFFE]" (will likely return "E486 Pattern not found"), while "[\U7FFFFFFF]" now errors with "E1517: Value too large, max Unicode codepoint is U+10FFFF". While this change is straightforward on architectures where long is 8 bytes, it is not so simple on Windows or 32-bit architectures where long is 4 bytes (and therefore the test fails there). To account for that, let's make use of the vimlong_T number type, make a few corresponding changes in the regex engine code and cast the value to the expected data type. This however may not work correctly on systems that don't have the long long datatype (e.g. OpenVMS) and probably the test will fail there. fixes: #16949 closes: #16994 Signed-off-by: Christian Brabandt <cb@256bit.org>	2025-03-29 09:08:58 +01:00
Christian Brabandt	c3a02d78bd	patch 9.1.0701: crash with NFA regex engine when searching for composing chars Problem: crash with NFA regex engine when searching for composing chars (SuyueGuo) Solution: When there is no composing character, break out of the loop and check that out1 state is not null fixes: #15583 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-08-28 23:17:52 +02:00
Christian Brabandt	22e8e12d9f	patch 9.1.0645: regex: wrong match when searching multi-byte char case-insensitive Problem: regex: wrong match when searching multi-byte char case-insensitive (diffsetter) Solution: Apply proper case-folding for characters and search-string This patch does the following 4 things: 1) When the regexp engine compares two utf-8 codepoints case-insensitively it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This however is not necessarily true: because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same. However in that case, it should only step over the single byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: * we try to match the correct length for the pattern and the text * in case of a match, we step over it correctly There is one tricky thing for the backtracking engine. We also need to calculate correctly the number of bytes to compare the 2 different utf-8 strings s1 and s2. So we will count the number of characters in s1 that the byte len specified. Then we count the number of bytes to step over the same number of characters in string s2 and then we can correctly compare the 2 utf-8 strings. 2) A similar thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. 3) A related issue turned up when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. 4) When comparing characters using collections, we must also apply case folding to each character in the collection and not just to the current character from the search string. This doesn't apply to the NFA engine, because internally it converts collections to branches [abc] -> a\\|b\\|c fixes: #14294 closes: #14756 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-07-30 20:39:18 +02:00
John Marriott	82792db631	patch 9.1.0409: too many strlen() calls in the regexp engine Problem: too many strlen() calls in the regexp engine Solution: refactor code to retrieve strlen differently, make use of bsearch() for getting the character class (John Marriott) closes: #14648 Signed-off-by: John Marriott <basilisk@internode.on.net> Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-05-12 00:07:17 +02:00
Christian Brabandt	c97f4d61cd	patch 9.1.0297: Patch 9.1.0296 causes too many issues Problem: Patch 9.1.0296 causes too many issues (Tony Mechelynck, @chdiza, CI) Solution: Back out the change for now Revert "patch 9.1.0296: regexp: engines do not handle case-folding well" This reverts commit `7a27c108e0`; it causes issues with syntax highlighting and breaks the FreeBSD and macOS CI. It needs more work. fixes: #14487 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-10 16:22:17 +02:00
Christian Brabandt	7a27c108e0	patch 9.1.0296: regexp: engines do not handle case-folding well Problem: Regex engines do not handle case-folding well Solution: Correctly calculate byte length of characters to skip When the regexp engine compares two utf-8 codepoints case-insensitively it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This however is not necessarily true: because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same. However in that case, it should only step over the single byte value 's', that is by 1 byte, and try to start matching after it again. So for the backtracking engine we need to ensure: - we try to match the correct length for the pattern and the text - in case of a match, we step over it correctly The same thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. A related issue turned up when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. fixes: #14294 closes: #14433 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-09 22:53:19 +02:00
Christian Brabandt	b64cec217f	patch 9.1.0229: Error E877 is not translated Problem: Error E877 is not translated (RestorerZ) Solution: Declare the error with N_ to mark it as translatable, add _() around the error message in regexp_nfa.c fixes: #14333 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-03-31 17:56:17 +02:00
Julio B	46fa3c7e27	patch 9.1.0217: regexp: verymagic cannot match before/after a mark Problem: regexp: verymagic cannot match before/after a mark Solution: Correctly check for the very magic check (Julio B) Fix regexp parser for \v%>'m and \v%<'m Currently \v%'m works fine, but it is unable to match before or after the position of mark m. closes: #14309 Signed-off-by: Julio B <julio.bacel@gmail.com> Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-03-28 10:23:37 +01:00
Christian Brabandt	d2cc51f9a1	patch 9.1.0011: regexp cannot match combining chars in collection Problem: regexp cannot match combining chars in collection Solution: Check for combining characters in regex collections for the NFA and BT Regex Engine Also, while at it, make debug mode work again. fixes: #10286 closes: #12871 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-01-04 22:54:08 +01:00
Christian Brabandt	be07caa071	patch 9.0.1777: patch 9.0.1771 causes problems Problem: patch 9.0.1771 causes problems Solution: revert it Revert "patch 9.0.1771: regex: combining chars in collections not handled" This reverts commit `ca22fc36a4`. Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 22:28:28 +02:00
Christian Brabandt	ca22fc36a4	patch 9.0.1771: regex: combining chars in collections not handled Problem: regex: combining chars in collections not handled Solution: Check for following combining characters for NFA and BT engine closes: #10459 closes: #10286 Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 20:38:56 +02:00
RestorerZ	68ebcee023	patch 9.0.1594: some internal error messages are translated Problem: Some internal error messages are translated. Solution: Consistently do not translate internal error messages. (closes #12459)	2023-05-31 17:12:14 +01:00
Bram Moolenaar	097c5370ea	patch 9.0.1576: users may not know what to do with an internal error Problem: Users may not know what to do with an internal error. Solution: Add a translated message with instructions.	2023-05-24 21:02:24 +01:00
Bram Moolenaar	c9471b1872	patch 9.0.1529: code style test doesn't check for space after "if" Problem: Code style test doesn't check for space after "if". Solution: Add a test for space.	2023-05-09 15:00:00 +01:00
Bram Moolenaar	1f76138ff1	patch 9.0.1427: warning for uninitialized variable Problem: Warning for uninitialized variable. (Tony Mechelynck) Solution: Add #ifdef.	2023-03-25 11:31:32 +00:00
zeertzjq	1b438a8228	patch 9.0.1271: using sizeof() and subtract array size is tricky Problem: Using sizeof() and subtract array size is tricky. Solution: Use offsetof() instead. (closes #11926)	2023-02-01 13:11:15 +00:00
Bram Moolenaar	ebfec1c531	patch 9.0.1234: the code style has to be checked manually Problem: The code style has to be checked manually. Solution: Add basic code style checks in a test. Fix or avoid uncovered problems.	2023-01-22 21:14:53 +00:00
Yegappan Lakshmanan	f97a295cca	patch 9.0.1221: code is indented more than necessary Problem: Code is indented more than necessary. Solution: Use an early return where it makes sense. (Yegappan Lakshmanan, closes #11833)	2023-01-18 18:17:48 +00:00
Bram Moolenaar	79336e19cb	patch 9.0.1047: matchparen is slow Problem: Matchparen is slow. Solution: Actually use the position where the match started, not the position where the search started. (closes #11644)	2022-12-11 14:18:31 +00:00
Bram Moolenaar	4c5678ff0c	patch 9.0.0977: it is not easy to see what client-server commands are doing Problem: It is not easy to see what client-server commands are doing. Solution: Add channel log messages if ch_log() is available. Move the channel logging and make it available with the +eval feature.	2022-11-30 18:12:19 +00:00
Bram Moolenaar	01105b37a1	patch 9.0.0951: trying every character position for a match is inefficient Problem: Trying every character position for a match is inefficient. Solution: Use the start position of the match ignoring "\zs".	2022-11-26 11:47:10 +00:00
Bram Moolenaar	c96311b5be	patch 9.0.0950: the pattern "\_s\zs" matches at EOL Problem: The pattern "\_s\zs" matches at EOL. Solution: Make the pattern "\_s\zs" match at the start of the next line. (closes #11617)	2022-11-25 21:13:47 +00:00
Bram Moolenaar	88456cd3c4	patch 9.0.0904: various comment and indent flaws Problem: Various comment and indent flaws. Solution: Improve comments and indenting.	2022-11-18 22:14:09 +00:00
Bram Moolenaar	753aead960	patch 9.0.0414: matchstr() still does not match column offset Problem: matchstr() still does not match column offset when done after a text search. Solution: Only use the line number for a multi-line search. Fix the test. (closes #10938)	2022-09-08 12:17:06 +01:00
Bram Moolenaar	75a115e8d6	patch 9.0.0407: matchstr() does not match column offset Problem: matchstr() does not match column offset. (Yasuhiro Matsumoto) Solution: Accept line number zero. (closes #10938)	2022-09-07 18:21:24 +01:00
Bram Moolenaar	13ed494bb5	patch 9.0.0228: crash when pattern looks below the last line Problem: Crash when pattern looks below the last line. Solution: Consider invalid lines to be empty. (closes #10938)	2022-08-19 13:59:25 +01:00
Bram Moolenaar	7f9969c559	patch 9.0.0067: cannot show virtual text Problem: Cannot show virtual text. Solution: Initial changes for virtual text support, using text properties.	2022-07-25 18:13:54 +01:00
Bram Moolenaar	509ce03831	patch 8.2.5137: cannot build without the +channel feature Problem: Cannot build without the +channel feature. (Dominique Pellé) Solution: Add #ifdef around ch_log() calls. (closes #10598)	2022-06-20 11:23:01 +01:00
Bram Moolenaar	616592e081	patch 8.2.5115: search timeout is overrun with some patterns Problem: Search timeout is overrun with some patterns. Solution: Check for timeout in more places. Make the flag volatile and atomic. Use assert_inrange() to see what happened.	2022-06-17 15:17:10 +01:00
Paul Ollis	6574577cac	patch 8.2.5057: using gettimeofday() for timeout is very inefficient Problem: Using gettimeofday() for timeout is very inefficient. Solution: Set a platform dependent timer. (Paul Ollis, closes #10505)	2022-06-05 16:55:54 +01:00
Bram Moolenaar	305abc6123	patch 8.2.5036: using two counters for timeout check in NFA engine Problem: Using two counters for timeout check in NFA engine. Solution: Use only one counter. Tune the counts based on guessing.	2022-05-28 11:08:40 +01:00
Bram Moolenaar	02e8d4e4ff	patch 8.2.5028: syntax regexp matching can be slow Problem: Syntax regexp matching can be slow. Solution: Adjust the counters for checking the timeout to check about once per msec. (closes #10487, closes #2712)	2022-05-27 15:35:28 +01:00
Christian Brabandt	360da40b47	patch 8.2.4978: no error if engine selection atom is not at the start Problem: No error if engine selection atom is not at the start. Solution: Give an error. (Christian Brabandt, closes #10439)	2022-05-18 15:04:02 +01:00
Bram Moolenaar	72bb10df1f	patch 8.2.4693: new regexp does not accept pattern "\%>0v" Problem: new regexp does not accept pattern "\%>0v". Solution: Do accept digit zero.	2022-04-05 14:00:32 +01:00
Bram Moolenaar	91ff3d4f52	patch 8.2.4688: new regexp engine does not give an error for "\%v" Problem: New regexp engine does not give an error for "\%v". Solution: Check for a value argument. (issue #10079)	2022-04-04 18:32:32 +01:00
Bram Moolenaar	b4ad3b0dea	patch 8.2.4649: various formatting problems Problem: Various formatting problems. Solution: Improve the code formatting.	2022-03-30 10:57:45 +01:00
Bram Moolenaar	b10ff5c1b3	patch 8.2.4592: search continues after giving E1204 Problem: Search continues after giving E1204. Solution: Return failure after giving E1204. (closes #9972)	2022-03-19 11:31:38 +00:00
zeertzjq	0a4e098f32	patch 8.2.4546: duplicate #undef Problem: Duplicate #undef. Solution: Remove one #undef. (closes #9932)	2022-03-11 15:33:53 +00:00
Bram Moolenaar	424bcae1fb	patch 8.2.4273: the EBCDIC support is outdated Problem: The EBCDIC support is outdated. Solution: Remove the EBCDIC support.	2022-01-31 14:59:41 +00:00
Bram Moolenaar	b2d85e3784	patch 8.2.4029: debugging NFA regexp may crash, cached indent may be wrong Problem: Debugging NFA regexp may crash, cached indent may be wrong. Solution: Fix some debug warnings in the NFA regexp code. Make sure log_fd is set when used. Fix breakindent and indent caching. (Christian Brabandt, closes #9482)	2022-01-07 16:55:32 +00:00
Bram Moolenaar	d82a47dd04	patch 8.2.4012: error messages are spread out Problem: Error messages are spread out. Solution: Move the last error messages to errors.h.	2022-01-05 20:24:39 +00:00
Bram Moolenaar	9d00e4a814	patch 8.2.4010: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-05 17:49:15 +00:00
Bram Moolenaar	677658ae49	patch 8.2.4008: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-05 16:09:06 +00:00
Bram Moolenaar	a6f7929e62	patch 8.2.4005: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-04 21:30:47 +00:00
Bram Moolenaar	74409f6279	patch 8.2.3970: error messages are spread out Problem: Error messages are spread out. Solution: Move more errors to errors.h.	2022-01-01 15:58:22 +00:00
Dominique Pelle	af4a61a85d	patch 8.2.3914: various spelling mistakes in comments Problem: Various spelling mistakes in comments. Solution: Fix the mistakes. (Dominique Pellé, closes #9416)	2021-12-27 17:21:41 +00:00
Yegappan Lakshmanan	bc404bfb32	patch 8.2.3855: illegal memory access when displaying a blob Problem: Illegal memory access when displaying a blob. Solution: Append a NUL at the end. (Yegappan Lakshmanan, closes #9372)	2021-12-19 19:19:31 +00:00
Bram Moolenaar	52797bae17	patch 8.2.3825: various comments could be improved Problem: Various comments could be improved. Solution: Improve the comments.	2021-12-16 14:45:13 +00:00
Bram Moolenaar	12f3c1b77f	patch 8.2.3749: error messages are everywhere Problem: Error messages are everywhere. Solution: Move more error messages to errors.h and adjust the names.	2021-12-05 21:46:34 +00:00
Bram Moolenaar	64066b9acd	patch 8.2.3612: using freed memory with regexp using a mark Problem: Using freed memory with regexp using a mark. Solution: Get the line again after getting the mark position.	2021-11-17 18:22:56 +00:00
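
The range check described in patch 9.1.1258 above can be sketched roughly as follows. This is a minimal illustration and not Vim's actual code: the names `parse_codepoint`, `hex_value` and the `cp_status` codes are hypothetical, `int64_t` stands in for the wider number type the commit message mentions, and `int` is assumed to be 32 bits so that INT_MAX is 2,147,483,647. The idea is to accumulate up to 8 hex digits in a 64-bit value, so that the comparison against INT_MAX cannot overflow, and to report a distinct "too large" status instead of silently falling back.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not Vim identifiers. */
enum cp_status { CP_OK, CP_NO_DIGITS, CP_TOO_LARGE };

/* Return the numeric value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse at most max_digits hex digits from s (max_digits <= 8 is assumed).
 * The value is accumulated in 64 bits, so 8 digits (up to 0xFFFFFFFF)
 * never overflow, even where long is only 4 bytes.
 * A value that reaches INT_MAX is rejected with CP_TOO_LARGE.
 */
static enum cp_status parse_codepoint(const char *s, int max_digits, int64_t *out)
{
    int64_t value = 0;
    int n = 0;

    assert(s != NULL && out != NULL);

    while (n < max_digits) {
        int d = hex_value(s[n]);
        if (d < 0)
            break;              /* stops at the first non-hex char, including NUL */
        value = value * 16 + d;
        n++;
    }

    if (n == 0)
        return CP_NO_DIGITS;
    if (value >= INT_MAX)
        return CP_TOO_LARGE;    /* caller would report the "value too large" error */

    *out = value;
    return CP_OK;
}
```

With these assumptions, `parse_codepoint("7FFFFFFE", 8, &v)` succeeds, while "7FFFFFFF" and "FFFFFFFF" both yield `CP_TOO_LARGE`, mirroring the behaviour the commit message describes for "[\U7FFFFFFE]" and "/\UFFFFFFFF".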

296 Commits