UTF-8 API overhaul: security, and other concerns.

The main aim here is to pave the way for conversion between arbitrary buffers of bytes (which may include embedded NUL characters) and a wide string.

A potential security hole is also closed. When we convert a TXR string to UTF-8 for use with some C library API, any embedded pnul characters (U+DC00) turn into NUL bytes, which silently cut the UTF-8 string short. The C library function receives a shortened string. This could be exploitable in some situations.

* lib.c (int_str): Use utf8_dup_to_buf instead of utf8_dup_to_uc. Pass 1 to have the buffer null-terminated, since mp_read_radix depends on it.

* stream.c (make_string_byte_input_stream): Use utf8_dup_to_buf. This gives us the size, so we don't have to call strlen. The buffer is no longer null-terminated, but the byte input stream implementation never relied on this.

* utf8.c (utf8_from_buf): Replacement for utf8_from_uc which doesn't assume that the buffer of bytes is null-terminated. It can produce a wide string containing U+DC00 characters corresponding to embedded nulls in the original buffer.
(utf8_from): Calculate length of null-terminated string and use utf8_from_buf.
(utf8_to_buf): Replacement for utf8_to_uc. Can produce a buffer which is or is not null-terminated, based on new argument.
(utf8_to): Use utf8_to_buf, and ask it to null-terminate, thus preserving behavior.
(utf8_dup_from_uc): This function was not used anywhere and is removed.
(utf8_dup_to_buf): Replacement for utf8_dup_to_uc which takes an extra argument, whether to null-terminate or not.
(utf8_dup_to): Apply security check here: is the resulting string as long as utf8_to says it should be? If not, it contains embedded nulls. Throw an exception.

* utf.h (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): Declarations removed.
(utf8_from_buf, utf8_to_buf, utf8_dup_to_buf): Declared.
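The length comparison described for utf8_dup_to can be illustrated with a minimal, self-contained sketch. It is not TXR's code: sketch_encode and sketch_dup_to_checked are hypothetical names, the encoder only special-cases U+DC00 (mapping it to a single NUL byte, as the message describes), and it reports the problem by returning NULL where the real function throws an exception.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a size-reporting encoder. Encodes a wide
   string as UTF-8, mapping U+DC00 to one NUL byte. The result is always
   null-terminated; *size receives the byte count without the terminator. */
unsigned char *sketch_encode(const wchar_t *w, size_t *size)
{
  size_t len, n = 0, i;
  unsigned char *buf;

  assert(w != NULL && size != NULL);
  len = wcslen(w);
  buf = malloc(4 * len + 1);          /* worst case: 4 bytes per character */
  if (buf == NULL)
    return NULL;

  for (i = 0; i < len; i++) {
    unsigned long c = (unsigned long) w[i];
    if (c == 0xDC00) {                /* pnul becomes an embedded NUL byte */
      buf[n++] = 0;
    } else if (c < 0x80) {
      buf[n++] = (unsigned char) c;
    } else if (c < 0x800) {
      buf[n++] = (unsigned char) (0xC0 | (c >> 6));
      buf[n++] = (unsigned char) (0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      buf[n++] = (unsigned char) (0xE0 | (c >> 12));
      buf[n++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[n++] = (unsigned char) (0x80 | (c & 0x3F));
    } else {
      buf[n++] = (unsigned char) (0xF0 | ((c >> 18) & 0x07));
      buf[n++] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      buf[n++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[n++] = (unsigned char) (0x80 | (c & 0x3F));
    }
  }
  buf[n] = 0;
  *size = n;
  return buf;
}

/* Hypothetical checked conversion: if strlen sees fewer bytes than the
   encoder produced, the string contains an embedded NUL and is rejected. */
char *sketch_dup_to_checked(const wchar_t *w)
{
  size_t size = 0;
  unsigned char *buf = sketch_encode(w, &size);

  if (buf != NULL && strlen((const char *) buf) != size) {
    free(buf);
    return NULL;                      /* would be truncated by C APIs */
  }
  return (char *) buf;
}
```

Under these assumptions, an ordinary string such as L"abc" converts normally, while a wide string containing U+DC00 is rejected instead of being handed to a C library function in shortened form.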
author: Kaz Kylheku <kaz@kylheku.com> 2016-03-31 20:53:03 -0700
committer: Kaz Kylheku <kaz@kylheku.com> 2016-03-31 20:53:03 -0700
commit: c27f83bdae5eb00206a478f7764df4fdaa48fc76 (patch)
tree: 3fdcd29807e120c1836a7ba59de6098a0460b636 /stream.c
parent: 98b26ff13eeb8a9f730801720c4cba30eba9e61d (diff)
download: txr-c27f83bdae5eb00206a478f7764df4fdaa48fc76.tar.gz
txr-c27f83bdae5eb00206a478f7764df4fdaa48fc76.tar.bz2
txr-c27f83bdae5eb00206a478f7764df4fdaa48fc76.zip
1 files changed, 1 insertions, 3 deletions
diff --git a/stream.c b/stream.c
index 1f6a6bde..415991a8 100644
--- a/stream.c
+++ b/stream.c
@@ -1690,10 +1690,8 @@ val make_string_byte_input_stream(val string)
 
   {
     struct byte_input *bi = coerce(struct byte_input *, chk_malloc(sizeof *bi));
-    unsigned char *utf8 = utf8_dup_to_uc(c_str(string));
     strm_base_init(&bi->a);
-    bi->buf = utf8;
-    bi->size = strlen(coerce(char *, utf8));
+    bi->buf = utf8_dup_to_buf(c_str(string), &bi->size, 0);
     bi->index = 0;
     return cobj(coerce(mem_t *, bi), stream_s, &byte_in_ops.cobj_ops);
   }
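Why the byte input stream does not need a null terminator can be seen in a small sketch of a size-bounded reader. The field names buf, size and index are taken from the diff above; the struct and function names (sketch_byte_input, sketch_init, sketch_get_byte) are hypothetical and this is not the actual implementation in stream.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte source that tracks an explicit size, so NUL bytes
   inside the buffer are ordinary data rather than an end marker. */
struct sketch_byte_input {
  const unsigned char *buf;
  size_t size;
  size_t index;
};

void sketch_init(struct sketch_byte_input *bi,
                 const unsigned char *buf, size_t size)
{
  assert(bi != NULL && (buf != NULL || size == 0));
  bi->buf = buf;
  bi->size = size;
  bi->index = 0;
}

/* Returns the next byte as a non-negative int, or -1 once index
   reaches size. The buffer contents never decide where input ends. */
int sketch_get_byte(struct sketch_byte_input *bi)
{
  if (bi->index < bi->size)
    return bi->buf[bi->index++];
  return -1;
}
```

With a three-byte buffer holding 'a', NUL, 'b', such a reader yields all three bytes before signalling end of input, whereas a size computed by strlen would stop after the first byte.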