author: Kaz Kylheku <kaz@kylheku.com> 2024-07-03 18:21:17 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2024-07-03 18:21:17 -0700
commit: 0ccf4483d8ed2ad5805116df5dba4a858e5c373a
tree: f8e2a595c4a52afb55f3025c767ae790de231357 /struct.c
parent: 77deceded0e5c9143e01d07a19eb219b3273151b
regex: don't consume input past final match.
The read-until-match functions and the two others in the same family
always read a character beyond the characters matched by the regex.
This causes blocking when a TTY or network socket has already provided
a matching record delimiter, even when the delimiter is a trivial,
fixed-length regex. Similar behavior is seen in GNU Awk also, with its
RS (record separator); let's fix it in our world.

We introduce a REGM_MATCH_DONE result code, which, like REGM_MATCH,
indicates that the state machine is in an acceptance state. Unlike
REGM_MATCH, it also indicates that no more transitions are possible.
For instance, for a regex like #/ab|c/, the REGM_MATCH_DONE code will
be indicated when the input "ab" is seen, or the input "c" is seen.
Any additional characters will cause a mismatch. This indication makes
it possible for the caller to avoid reading more characters from an
input source.

* regex.c (enum regm_result, regm_result_t): New REGM_MATCH_DONE
enum member.
(nfa_has_transitions): New macro.
(nfa_closure, nfa_move_closure): New pointer-to-int parameter more.
This is set to true only if one or more states in the output state
set have transitions.
(nfa_run): Initialize new local variable more and pass it to
nfa_closure and nfa_move_closure. Break out of the character feeding
loop if more is zero.
(regex_machine_reset): Pass more parameter to nfa_closure.
(regex_machine_feed): Pass more parameter to nfa_move_closure. When
returning REGM_MATCH, if more is false, return REGM_MATCH_DONE instead.
In the derivatives implementation, we report REGM_MATCH_DONE when the
derivative we have calculated is null.
(search_regex, match_regex): Break loop on REGM_MATCH_DONE, and avoid
feeding the null character in that case.
(match_regex_right): Likewise, and also handle the REGM_MATCH_DONE
case specially at the end. We need to check whether the match reached
the end of the string (is anchored to the right). If not, we continue
the search.
(regex_prefix_match): Break loop on REGM_MATCH_DONE.
(scan_until_common): If we hit REGM_MATCH_DONE, break out of the loop
and proceed straight to the out_match block, indicating that no
characters need to be pushed back from the stack.
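
The effect on a caller can be illustrated with a small standalone sketch. This is not the code from regex.c: it is a toy table-driven DFA, and every name in it (demo_result_t, demo_feed, demo_scan and so on) is hypothetical. It only mirrors the idea described above: an accepting state with no outgoing transitions yields a "match done" code, so the feeding loop can stop without reading another character.

```c
#include <stddef.h>

/* Hypothetical result codes, modeled on the commit's description. */
typedef enum {
    DEMO_FAIL,        /* no transition for this character */
    DEMO_INCOMPLETE,  /* not accepting; more input is needed */
    DEMO_MATCH,       /* accepting, and further transitions exist */
    DEMO_MATCH_DONE   /* accepting, and no transitions remain */
} demo_result_t;

typedef struct { char on; int from, to; } demo_edge_t;

typedef struct {
    const demo_edge_t *edges;
    size_t nedges;
    const int *accepting;   /* accepting[s] is nonzero if state s accepts */
    int state;
} demo_machine_t;

/* Nonzero if any edge leaves the given state. */
static int demo_has_transitions(const demo_machine_t *m, int state)
{
    for (size_t i = 0; i < m->nedges; i++)
        if (m->edges[i].from == state)
            return 1;
    return 0;
}

/* Advance the machine by one character and classify the new state. */
static demo_result_t demo_feed(demo_machine_t *m, int ch)
{
    for (size_t i = 0; i < m->nedges; i++) {
        const demo_edge_t *e = &m->edges[i];
        if (e->from == m->state && e->on == ch) {
            m->state = e->to;
            if (!m->accepting[m->state])
                return DEMO_INCOMPLETE;
            return demo_has_transitions(m, m->state)
                   ? DEMO_MATCH : DEMO_MATCH_DONE;
        }
    }
    return DEMO_FAIL;
}

typedef struct {
    size_t consumed;  /* characters taken from the source */
    size_t matched;   /* length of the longest match seen */
} demo_scan_t;

/* Feed characters from src, stopping as soon as the outcome is decided. */
static demo_scan_t demo_scan(demo_machine_t *m, const char *src)
{
    demo_scan_t r = { 0, 0 };
    m->state = 0;
    while (src[r.consumed] != '\0') {
        demo_result_t res = demo_feed(m, src[r.consumed++]);
        if (res == DEMO_MATCH || res == DEMO_MATCH_DONE)
            r.matched = r.consumed;
        if (res == DEMO_MATCH_DONE || res == DEMO_FAIL)
            break;  /* nothing further can change the result */
    }
    return r;
}

/* DFA for ab|c: state 2 accepts and has no outgoing edges. */
static demo_machine_t demo_machine_ab_or_c(void)
{
    static const demo_edge_t edges[] = {
        { 'a', 0, 1 }, { 'b', 1, 2 }, { 'c', 0, 2 }
    };
    static const int accepting[] = { 0, 0, 1 };
    demo_machine_t m = { edges, 3, accepting, 0 };
    return m;
}

/* DFA for ab*: state 1 accepts but loops on 'b'. */
static demo_machine_t demo_machine_a_bstar(void)
{
    static const demo_edge_t edges[] = { { 'a', 0, 1 }, { 'b', 1, 1 } };
    static const int accepting[] = { 0, 1 };
    demo_machine_t m = { edges, 2, accepting, 0 };
    return m;
}
```

With the ab|c machine, scanning "abX" consumes only "ab", since the accepting state is a dead end. With the ab* machine, scanning "abbX" still has to consume the "X" to learn that the match has ended, which is the case where pushing a character back remains necessary.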
Diffstat (limited to 'struct.c')
0 files changed, 0 insertions, 0 deletions