utf8: fix backtracking bugs in buffer decoder.

* utf8.c (utf8_from_buffer): Fix incorrect backtracking logic for handling bad UTF-8 bytes. Firstly, we are not backtracking to the correct byte. Because src is incremented at the top of the loop, the backtrack pointer must be set to src - 1 to point to the possibly bad byte. Secondly, when we backtrack, we are neglecting to rewind nbytes! Thus after backtracking, we will not scan the entire input. Let's avoid using nbytes, and guard the loop based on whether we hit the end of the buffer; then we don't have any nbytes state to backtrack.

* tests/017/ffi-misc.tl: New test case converting a three-byte UTF-8 encoding of U+DC01: an invalid character in the surrogate range. We test that the buffer decoder turns this into three characters, exactly like the stream decoder. Another test case for invalid bytes following a valid sequence start.
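The backtracking scheme described above can be sketched in C. This is a hypothetical, self-contained decoder, not the actual body of utf8_from_buffer: the names utf8_decode_buf and bad_byte, the explicit-length interface and the overlong/range checks are assumptions made for illustration. It maps each undecodable byte b to U+DC00 + b (the convention the new tests expect), points the backtrack pointer at src - 1 because src is advanced at the top of the loop, and guards the loop on reaching the end of the buffer rather than on a byte count, so a backtrack only has to reset src.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: an undecodable byte b becomes U+DC00 + b,
   matching the expected results in the tests (e.g. 0xED -> U+DCED). */
static uint32_t bad_byte(unsigned char b)
{
  return 0xDC00u | b;
}

/* Hypothetical decoder: decode n bytes at buf into out (room for n code
   points needed); return the number of code points written.  The loop is
   guarded only by reaching the end of the buffer, so backtracking just
   resets src: there is no separate byte counter to rewind. */
size_t utf8_decode_buf(const unsigned char *buf, size_t n, uint32_t *out)
{
  const unsigned char *src = buf, *end = buf + n, *backtrack = buf;
  uint32_t wch = 0, wch_min = 0;
  int need = 0;                 /* continuation bytes still expected */
  size_t count = 0;

  assert(buf != NULL || n == 0);

  for (;;) {
    if (src == end) {
      if (need == 0)
        break;
      /* Sequence cut short by the end of the buffer: the lead byte is
         bad; emit it and resume scanning right after it. */
      out[count++] = bad_byte(*backtrack);
      src = backtrack + 1;
      need = 0;
      continue;
    }

    unsigned char ch = *src++;  /* src is incremented at the top ... */

    if (need == 0) {
      if (ch < 0x80) {
        out[count++] = ch;
        continue;
      }
      backtrack = src - 1;      /* ... so the lead byte is at src - 1 */
      if (ch >= 0xC0 && ch < 0xE0) {
        need = 1; wch = ch & 0x1F; wch_min = 0x80;
      } else if (ch >= 0xE0 && ch < 0xF0) {
        need = 2; wch = ch & 0x0F; wch_min = 0x800;
      } else if (ch >= 0xF0 && ch < 0xF8) {
        need = 3; wch = ch & 0x07; wch_min = 0x10000;
      } else {
        out[count++] = bad_byte(ch);  /* stray continuation or 0xF8-0xFF */
      }
      continue;
    }

    if ((ch & 0xC0) != 0x80) {
      /* Expected a continuation byte: reject the lead byte, rescan. */
      out[count++] = bad_byte(*backtrack);
      src = backtrack + 1;
      need = 0;
      continue;
    }

    wch = (wch << 6) | (ch & 0x3F);

    if (--need == 0) {
      if (wch < wch_min || wch > 0x10FFFF ||
          (wch >= 0xD800 && wch <= 0xDFFF)) {
        /* Overlong, out of range or surrogate: reject the lead byte. */
        out[count++] = bad_byte(*backtrack);
        src = backtrack + 1;
      } else {
        out[count++] = wch;
      }
    }
  }

  return count;
}
```

Fed the bytes ED B0 81 (the first new test's input without its terminating zero byte), this sketch yields U+DCED U+DCB0 U+DC81; ED 7F 7F ED FF yields U+DCED U+007F U+007F U+DCED U+DCFF. Both match the expected strings in the tests. Each emitted code point starts at a byte that src never revisits, so the output never exceeds n code points and the loop always terminates.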
author: Kaz Kylheku <kaz@kylheku.com> 2021-04-07 07:09:15 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2021-04-07 07:09:15 -0700
commit: e2fe814938e7e8aa9286e9ba3e1c0fd7eff37766 (patch)
tree: 237189e4bef17a403f81e38703566e274846a8c6 /tests/017
parent: 38b710b262d96a004092b618f508b2e97f5b2977 (diff)
download: txr-e2fe814938e7e8aa9286e9ba3e1c0fd7eff37766.tar.gz
txr-e2fe814938e7e8aa9286e9ba3e1c0fd7eff37766.tar.bz2
txr-e2fe814938e7e8aa9286e9ba3e1c0fd7eff37766.zip
1 files changed, 7 insertions, 0 deletions
diff --git a/tests/017/ffi-misc.tl b/tests/017/ffi-misc.tl
index 1578cd2c..db510737 100644
--- a/tests/017/ffi-misc.tl
+++ b/tests/017/ffi-misc.tl
@@ -9,3 +9,10 @@
 (test (ffi-put "\x1234@@@" zar) #b'e188b440404000')
 
 (test (ffi-get (ffi-put "\x1234@@@" zar) zar) "\x1234@@@")
+
+(unless (meq (os-symbol) :cygwin :cygnal)
+  (test (ffi-get #b'EDB08100' (ffi (zarray char)))
+       "\xDCED\xDCB0\xDC81")
+
+  (test (ffi-get #b'ED7F7FEDFF00' (ffi (zarray char)))
+       "\xDCED\x7F\x7F\xDCED\xDCFF"))