summaryrefslogtreecommitdiffstats
path: root/utf8.h
Commit message (Collapse)AuthorAgeFilesLines
* ffi: conv between strings and char/wchar_t arrays.Kaz Kylheku2017-04-291-0/+1
| | | | | | | | | | | | | | * ffi.c (struct txr_ffi_type): New bitfield members char_conv, wchar_conv. (ffi_array_put): Check char_conv and wchar_conv flags and perform conversion to string, watching out for the presence or absence of null termination. (ffi_type_compile): When compiling array, check for the str and wstr element type, and set the flags. * utf8.c (ut8f_dup_from_buf): New function. * utf8.c (ut8f_dup_from_buf): Declared.
* Bump copyright year to 2017.Kaz Kylheku2017-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl: Add 2017 to all copyright headers and strings.
* Synchronize license comments with LICENSE.Kaz Kylheku2016-10-011-16/+17
| | | | | | | | | | | | | | | | | | | | * Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
* UTF-8 API overhaul: security, and other concerns.Kaz Kylheku2016-03-311-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main aim here is to pave the way for conversion between arbitrary buffers of bytes (that may include embedded NUL characters) and a wide string. Also, a potential security hole is closed. When we convert a TXR string to UTF-8 for use with some C library API, any embedded pnul characters (U+DC00) turn into NUL bytes which effectively cut the UTF-8 string short, and silently so. The C library function receives a shortened string. This could be exploitable in some situations. * lib.c (int_str): Use utf8_dup_to_buf instead of utf8_dup_to_uc. Pass 1 to have the buffer null-terminated, since mp_read_radix depends on it. * stream.c (make_string_byte_input_stream): Use utf8_dup_to_buf. This gives us the size, soo we don't have to call strlen. The buffer is no longer null terminated, but the byte input stream implementation never relied on this. * utf8.c (utf8_from_buf): Replacement fors utf8_from_uc which doesn't assume that the buffer of bytes is null-terminated. It can produce a wide string containing U+DC00 characters corresponding to embedded nulls in the original buffer. (utf8_from): Calculate length of null-terminated string and use utf8_from_buf. (utf8_to_buf): Replacement for utf8_to_uc. Can produce a buffer which is or is not null-terminated, based on new argument. (utf8_to): Use utf8_to_buf, and ask it to null-terminate, thus preserving behavior. (utf8_dup_from_uc): This function was not used anywhere and is removed. (utf8_dup_to_buf): Replacement for utf8_dup_to_uc which takes an extra agrgument, whether to null-terminate or not. (utf8_dup_to): Apply security check here: is the resulting string as long as utf8_to says it should be? If not, it contains embedded nulls. Throw an exception. * utf.h (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): Declarations removed. (utf8_from_buf, utf8_to_buf, utf8_dup_to_buf): Declared.
* Copyright year bump.Kaz Kylheku2015-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | * LICENSE, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/cadr.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Add 2016 copyright. * linenoise/LICENSE, linenoise/linenoise.c, linenoise/linenoise.h: Bump one principal author's copyright from 2014 to 2015. The code is based on a snapshot of 2015 upstream work.
* Functions open-fileno and fileno.Kaz Kylheku2015-04-101-0/+1
| | | | | | | | | | | | | | | | * stream.c (fd_k): New keyword variable. (stdio_get_prop): Handle the :fd property by returning the file descriptor. (open_fileno): New function. (stream_init): Initialize fd_k, and register fileno and open-fileno. * stream.h (open_fileno): Declared. * txr.1: Documented open-fileno and fileno. * utf8.c (w_fdopen): New function. * utf8.h (w_fdopen): Declared.
* Update copyright notices from 2014 to 2015.Kaz Kylheku2015-02-011-1/+1
| | | | | | | | | | | * arith.c, arith.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Update. * LICENSE, METALICENSE: Likewise.
* * Makefile, arith.c, arith.h, combi.c, combi.h, configure, debug.c,Kaz Kylheku2014-07-231-16/+16
| | | | | | | | debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Synchronize license header with LICENSE.
* A trivial change in the UTF-8 decoder allows TXR to handle null bytesKaz Kylheku2014-02-151-0/+3
| | | | | | | | | | | | | | in text. * utf8.h (UTF8_ADMIT_NUL): New preprocessor symbol. (utf8_decoder): New member, flags. * utf8.c (utf8_decoder_init): Initialize flags to 0. (utf8_decode): If a null byte is encountered in the input, then convert it to 0xDC00, rather than keeping it as zero, unless flags contains UTF8_ADMIT_NUL. * txr.1: Document handling of null bytes.
* * stream.c (remove_path, rename_path): New functions.Kaz Kylheku2014-01-281-0/+2
| | | | | | | | | | | * stream.h (remove_path, rename_path): Declared. * utf8.c (w_remove, w_rename): New functions. * utf8.h (w_remove, w_rename): Declared. * eval.c (eval_init): Registered remove_path and rename_path as intrinsics.
* Bumping copyrights to 2014 and expressing them as year ranges.Kaz Kylheku2013-12-101-1/+1
| | | | Fixing some errors in copyright comments.
* * stream.c (struct stdio_handle): New member, mode.Kaz Kylheku2013-11-281-0/+1
| | | | | | | | | | | | | (stdio_stream_mark): Mark the new member during gc. (stdio_seek): When we seek, we should reset the utf8 machine. (tail_strategy): New function. (tail_get_line, tail_get_char, tail_get_byte): Use tail_strategy for polling the file at EOF. (open_tail): Store the mode in the file handle. * utf8.c (w_freopen): New function. * utf8.h (w_freopen): Declared.
* * arith.c: Updated copyright year.Kaz Kylheku2012-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * arith.h: Likewise. * debug.c: Added copyright header. * debug.h: Updated copyright year. * eval.c: Likewise. * eval.h: Likewise. * filter.c: Likewise. * filter.h: Likewise. * gc.c: Likewise. * gc.h: Likewise. * hash.c: Likewise. * hash.h: Likewise. * lib.c: Likewise. * lib.h: Likewise. * match.c: Likewise. * match.h: Likewise. * parser.h: Likewise. * regex.c: Likewise. * regex.h: Likewise. * stream.c: Likewise. * stream.h: Likewise. * txr.c: Likewise, and e-mail address. * txr.h: Updated copyright year. * unwind.c: Likewise. * unwind.h: Likewise.
* * utf8.c (utf8_from_uc, utf8_decode): Impose a minium value on theKaz Kylheku2012-02-021-1/+1
| | | | | | | decoded character based on which UTF-8 case it is from. This rejects overlong forms. * utf8.h (struct utf8_decoder): New member, wch_min.
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,Kaz Kylheku2011-10-041-1/+1
| | | | | | hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* More void * to mem_t * conversion.Kaz Kylheku2009-12-051-2/+2
|
* Provide both char * and unsigned char * interfaces in UTF-8 module.Kaz Kylheku2009-11-141-4/+8
| | | | Fix unsigned and plan char * mixing.
* Fixed broken utf8_from.Kaz Kylheku2009-11-121-0/+14
| | | | Added utf8_encode, utf8_decoder_init, utf8_decode.
* Big conversion to wide characters and UTF-8 support.Kaz Kylheku2009-11-111-0/+32
This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.