summaryrefslogtreecommitdiffstats
path: root/winsup/cygwin/DevNotes
diff options
context:
space:
mode:
Diffstat (limited to 'winsup/cygwin/DevNotes')
-rw-r--r--winsup/cygwin/DevNotes415
1 files changed, 0 insertions, 415 deletions
diff --git a/winsup/cygwin/DevNotes b/winsup/cygwin/DevNotes
deleted file mode 100644
index aeca33076..000000000
--- a/winsup/cygwin/DevNotes
+++ /dev/null
@@ -1,415 +0,0 @@
-2012-08-17 cgf-000016
-
-While debugging another problem I finally noticed that
-sigpacket::process was unconditionally calling tls->set_siginfo prior to
-calling setup_handler even though setup_handler could fail. In the
-event of two successive signals, that would cause the second signal's
-info to overwrite the first even though the signal handler for the first
-would eventually be called. Doh.
-
-Fixing this required passing the sigpacket si field into setup_handler.
-Making setup_handler part of the sigpacket class seemed to make a lot of
-sense so that's what I did. Then I passed the si element into
-interrupt_setup so that the infodata structure could be filled out prior
-to arming the signal.
-
-The other changes checked in here eliminate the ResetEvent for
-signal_arrived since previous changes to cygwait should handle the
-case of spurious signal_arrived detection. Since signal_arrived is
-not a manual-reset event, we really should just let the appropriate
-WFMO handle it. Otherwise, there is a race where a signal comes in
-a "split second" after WFMO responds to some other event. Resetting
-the signal_arrived would cause any subsequent WFMO to never be
-triggered. My current theory is that this is what is causing:
-
-http://cygwin.com/ml/cygwin/2012-08/msg00310.html
-
-2012-08-15 cgf-000015
-
-RIP cancelable_wait. Yay.
-
-2012-08-09 cgf-000014
-
-So, apparently I got it somewhat right before wrt signal handling.
-Checking on linux, it appears that signals will be sent to a thread
-which can accept the signal. So resurrecting and extending the
-"find_tls" function is in order. This function will return the tls
-of any thread which 1) is waiting for a signal with sigwait*() or
-2) has the signal unmasked.
-
-In redoing this it became obvious that I had the class designation wrong
-for the threadlist handling so I moved the manipulation of the global
-threadlist into the cygheap where it logically belongs.
-
-2012-07-21 cgf-000013
-
-These changes reflect a revamp of the "wait for signal" functionality
-which has existed in Cygwin through several signal massages.
-
-We now create a signal event only when a thread is waiting for a signal
-and arm it only for that thread. The "set_signal_arrived" function is
-used to establish the event and set it in a location referencable by
-the caller.
-
-I still do not handle all of the race conditions. What happens when
-a signal comes in just after a WF?O succeeds for some other purpose? I
-suspect that it will arm the next WF?O call and the subsequent call to
-call_signal_handler could cause a function to get an EINTR when possibly
-it shouldn't have.
-
-I haven't yet checked all of the test cases for the URL listed in the
-previous entry.
-
-Baby steps.
-
-2012-06-12 cgf-000012
-
-These changes are the preliminary for redoing the way threads wait for
-signals. The problems are shown by the test case mentioned here:
-
-http://cygwin.com/ml/cygwin/2012-05/msg00434.html
-
-I've known that the signal handling in threads wasn't quite right for
-some time. I lost all of my thread signal tests in the great "rm -r"
-debacle of a few years ago and have been less than enthusiastic about
-redoing everything (I had PCTS tests and everything). But it really is
-time to redo this signal handling to make it more like it is supposed to
-be.
-
-This change should not introduce any new behavior. Things should
-continue to behave as before. The major differences are a change in the
-arguments to cancelable_wait and cygwait now uses cancelable_wait and,
-so, the returns from cygwait now mirror cancelable_wait.
-
-The next change will consolidate cygwait and cancelable_wait into one
-cygwait function.
-
-2012-06-02 cgf-000011
-
-The refcnt handling was tricky to get right but I had convinced myself
-that the refcnt's were always incremented/decremented under a lock.
-Corinna's 2012-05-23 change to refcnt exposed a potential problem with
-dup handling where the fdtab could be updated while not locked.
-
-That should be fixed by this change but, on closer examination, it seems
-like there are many places where it is possible for the refcnt to be
-updated while the fdtab is not locked since the default for
-cygheap_fdget is to not lock the fdtab (and that should be the default -
-you can't have read holding a lock).
-
-Since refcnt was only ever called with 1 or -1, I broke it up into two
-functions but kept the Interlocked* operation. Incrementing a variable
-should not be as racy as adding an arbitrary number to it but we have
-InterlockedIncrement/InterlockedDecrement for a reason so I kept the
-Interlocked operation here.
-
-In the meantime, I'll be mulling over whether the refcnt operations are
-actually safe as they are. Maybe just ensuring that they are atomically
-updated is enough since they control the destruction of an fh. If I got
-the ordering right with incrementing and decrementing then that should
-be adequate.
-
-2012-06-02 cgf-000010
-
-<1.7.16>
-- Fix emacs problem which exposed an issue with Cygwin's select() function.
- If a signal arrives while select is blocking and the program longjmps
- out of the signal handler then threads and memory may be left hanging.
- Fixes: http://cygwin.com/ml/cygwin/2012-05/threads.html#00275
-</1.7.16>
-
-This was try #4 or #5 to get select() signal handling working right.
-It's still not there but it should now at least not leak memory or
-threads.
-
-I mucked with the interface between cygwin_select and select_stuff::wait
-so that the "new" loop in select_stuff::wait() was essentially moved
-into the caller. cygwin_select now uses various enum states to decide
-what to do. It builds the select linked list at the beginning of the
-loop, allowing wait() to tear everything down and restart. This is
-necessary before calling a signal handler because the signal handler may
-longjmp away.
-
-I initially had this all coded up to use a special signal_cleanup
-callback which could be called when a longjmp is called in a signal
-handler. And cygwin_select() set up and tore down this callback. Once
-I got everything compiling it, of course, dawned on me that just because
-you call a longjmp in a signal handler it doesn't mean that you are
-jumping *out* of the signal handler. So, if the signal handler invokes
-the callback and returns it will be very bad for select(). Hence, this
-slower, but hopefully more correct implementation.
-
-(I still wonder if some sort of signal cleanup callback might still
-be useful in the future)
-
-TODO: I need to do an audit of other places where this problem could be
-occurring.
-
-As alluded to above, select's signal handling is still not right. It
-still acts as if it could call a signal handler from something other
-than the main thread but, AFAICT, from my STC, this doesn't seem to be
-the case. It might be worthwhile to extend cygwait to just magically
-figure this out and not even bother using w4[0] for scenarios like this.
-
-2012-05-16 cgf-000009
-
-<1.7.16>
-- Fix broken console mouse handling. Reported here:
- http://cygwin.com/ml/cygwin/2012-05/msg00360.html
-</1.7.16>
-
-I did a cvs annotate on smallprint.cc and see that the code to translate
-%characters > 127 to 0x notation was in the 1.1 revision. Then I
-checked the smallprint.c predecessor. It was in the 1.1 version of that
-program too, which means that this odd change has probably been around
-since <= 2000.
-
-Since __small_sprintf is supposed to emulate sprintf, I got rid of the
-special case handling. This may affect fhandler_socket::bind. If so, we
-should work around this problem there rather than keeping this strange
-hack in __small_printf.
-
-2012-05-14 cgf-000008
-
-<1.7.16>
-- Fix hang when zero bytes are written to a pty using
- Windows WriteFile or equivalent. Fixes:
- http://cygwin.com/ml/cygwin/2012-05/msg00323.html
-</1.7.16>
-
-cgf-000002, as usual, fixed one thing while breaking another. See
-Larry's predicament in: http://goo.gl/oGEr2 .
-
-The problem is that zero byte writes to the pty pipe caused the dread
-end-of-the-world-as-we-know-it problem reported on the mailing list
-where ReadFile reads zero bytes even though there is still more to read
-on the pipe. This is because that change caused a 'record' to be read
-and a record can be zero bytes.
-
-I was never really keen about using a throwaway buffer just to get a
-count of the number of characters available to be read in the pty pipe.
-On closer reading of the documentation for PeekNamedPipe it seemed like
-the sixth argument to PeekNamedPipe should return what I needed without
-using a buffer. And, amazingly, it did, except that the problem still
-remained - a zero byte message still screwed things up.
-
-So, we now detect the case where there is zero bytes available as a
-message but there are bytes available in the pipe. In that scenario,
-return the bytes available in the pipe rather than the message length of
-zero. This could conceivably cause problems with pty pipe handling in
-this scenario but since the only way this scenario could possibly happen
-is when someone is writing zero bytes using WriteFile to a pty pipe, I'm
-ok with that.
-
-2012-05-14 cgf-000007
-
-<1.7.16>
-- Fix invocation of strace from a cygwin process. Fixes:
- http://cygwin.com/ml/cygwin/2012-05/msg00292.html
-</1.7.16>
-
-The change in cgf-000004 introduced a problem for processes which load
-cygwin1.dll dynamically. strace.exe is the most prominent example of
-this.
-
-Since the parent handle is now closed for "non-Cygwin" processes, when
-strace.exe tried to dynamically load cygwin1.dll, the handle was invalid
-and child_info_spawn::handle_spawn couldn't use retrieve information
-from the parent. This eventually led to a strace_printf error due to an
-attempt to dereference an unavailable cygheap. Probably have to fix
-this someday. You shouldn't use the cygheap while attempting to print
-an error about the inavailability of said cygheap.
-
-This was fixed by saving the parent pid in child_info_spawn and calling
-OpenProcess for the parent pid and using that handle iff a process is
-dynamically loaded.
-
-2012-05-12 cgf-000006
-
-<1.7.16>
-- Fix hang when calling pthread_testcancel in a canceled thread.
- Fixes some of: http://cygwin.com/ml/cygwin/2012-05/msg00186.html
-</1.7.16>
-
-This should fix the first part of the reported problem in the above
-message. The cancel seemed to actually be working but, the fprintf
-eventually ended up calling pthread_testcancel. Since we'd gotten here
-via a cancel, it tried to recursively call the cancel handler causing a
-recursive loop.
-
-2012-05-12 cgf-000005
-
-<1.7.16>
-- Fix pipe creation problem which manifested as a problem creating a
-fifo. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00253.html
-</1.7.16>
-
-My change on 2012-04-28 introduced a problem with fifos. The passed
-in name was overwritten. This was because I wasn't properly keeping
-track of the length of the generated pipe name when there was a
-name passed in to fhandler_pipe::create.
-
-There was also another problem in fhandler_pipe::create. Since fifos
-use PIPE_ACCESS_DUPLEX and PIPE_ACCESS_DUPLEX is an or'ing of
-PIPE_ACCESS_INBOUND and PIPE_ACCESS_OUTBOUND, using PIPE_ACCESS_OUTBOUND
-as a "never-used" option for PIPE_ADD_PID in fhandler.h was wrong. So,
-fifo creation attempted to add the pid of a pipe to the name which is
-wrong for fifos.
-
-2012-05-08 cgf-000004
-
-The change for cgf-000003 introduced a new problem:
-http://cygwin.com/ml/cygwin/2012-05/msg00154.html
-http://cygwin.com/ml/cygwin/2012-05/msg00157.html
-
-Since a handle associated with the parent is no longer being duplicated
-into a non-cygwin "execed child", Windows is free to reuse the pid of
-the parent when the parent exits. However, since we *did* duplicate a
-handle pointing to the pid's shared memory area into the "execed child",
-the shared memory for the pid was still active.
-
-Since the shared memory was still available, if a new process reuses the
-previous pid, Cygwin would detect that the shared memory was not created
-and had a "PID_REAPED" flag. That was considered an error, and, so, it
-would set procinfo to NULL and pinfo::thisproc would die since this
-situation is not supposed to occur.
-
-I fixed this in two ways:
-
-1) If a shared memory region has a PID_REAPED flag then zero it and
-reuse it. This should be safe since you are not really supposed to be
-querying the shared memory region for anything after PID_REAPED has been
-set.
-
-2) Forego duping a copy of myself_pinfo if we're starting a non-cygwin
-child for exec.
-
-It seems like 2) is a common theme and an audit of all of the handles
-that are being passed to non-cygwin children is in order for 1.7.16.
-
-The other minor modification that was made in this change was to add the
-pid of the failing process to fork error output. This helps slightly
-when looking at strace output, even though in this case it was easy to
-find what was failing by looking for '^---' when running the "stv"
-strace dumper. That found the offending exception quickly.
-
-2012-05-07 cgf-000003
-
-<1.7.15>
-Don't make Cygwin wait for all children of a non-cygwin child program.
-Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00063.html,
- http://cygwin.com/ml/cygwin/2012-05/msg00075.html
-</1.7.15>
-
-This problem is due to a recent change which added some robustness and
-speed to Cygwin's exec/spawn handling by not trying to force inheritance
-every time a process is started. See ChangeLog entries starting on
-2012-03-20, and multiple on 2012-03-21.
-
-Making the handle inheritable meant that, as usual, there were problems
-with non-Cygwin processes. When Cygwin "execs" a non-Cygwin process N,
-all of its N + 1, N + 2, ... children will also inherit the handle.
-That means that Cygwin will wait until all subprocesses have exited
-before it returns.
-
-I was willing to make this a restriction of starting non-Cygwin
-processes but the problem with allowing that is that it can cause the
-creation of a "limbo" pid when N exits and N + 1 and friends are still
-around. In this scenario, Cygwin dutifully notices that process N has
-died and sets the exit code to indicate that but N's parent will wait on
-rd_proc_pipe and will only return when every N + ... windows process
-has exited.
-
-The removal of cygheap::pid_handle was not related to the initial
-problem that I set out to fix. The change came from the realization
-that we were duping the current process handle into the child twice and
-only needed to do it once. The current process handle is used by exec
-to keep the Windows pid "alive" so that it will not be reused. So, now
-we just close parent in child_info_spawn::handle_spawn iff we're not
-execing.
-
-In debugging this it bothered me that 'ps' identified a nonactive pid as
-active. Part of the reason for this was the 'parent' handle in
-child_info was opened in non-Cygwin processes, keeping the pid alive.
-That has been kluged around (more changes after 1.7.15) but that didn't
-fix the problem. On further investigation, this seems to be caused by
-the fact that the shared memory region pid handles were still being
-passed to non-cygwin children, keeping the pid alive in a limbo-like
-fashion. This was easily fixed by having pinfo::init() consider a
-memory region with PID_REAPED as not available. A more robust fix
-should be considered for 1.7.15+ where these handles are not passed
-to non-cygwin processes.
-
-This fixed the problem where a pid showed up in the list after a user
-does something like: "bash$ cmd /c start notepad" but, for some reason,
-it does not fix the problem where "bash$ setsid cmd /c start notepad".
-That bears investigation after 1.7.15 is released but it is not a
-regression and so is not a blocker for the release.
-
-2012-05-03 cgf-000002
-
-<1.7.15>
-Fix problem where too much input was attempted to be read from a
-pty slave. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00049.html
-</1.7.15>
-
-My change on 2012/04/05 reintroduced the problem first described by:
-http://cygwin.com/ml/cygwin/2011-10/threads.html#00445
-
-The problem then was, IIRC, due to the fact that bytes sent to the pty
-pipe were not written as records. Changing pipe to PIPE_TYPE_MESSAGE in
-pipe.cc fixed the problem since writing lines to one side of the pipe
-caused exactly that the number of characters to be read on the other
-even if there were more characters in the pipe.
-
-To debug this, I first replaced fhandler_tty.cc with the 1.258,
-2012/04/05 version. The test case started working when I did that.
-
-So, then, I replaced individual functions, one at a time, in
-fhandler_tty.cc with their previous versions. I'd expected this to be a
-problem with fhandler_pty_master::process_slave_output since that had
-seen the most changes but was surprised to see that the culprit was
-fhandler_pty_slave::read().
-
-The reason was that I really needed the bytes_available() function to
-return the number of bytes which would be read in the next operation
-rather than the number of bytes available in the pipe. That's because
-there may be a number of lines available to be read but the number of
-bytes which will be read by ReadFile should reflect the mode of the pty
-and, if there is a line to read, only the number of bytes in the line
-should be seen as available for the next read.
-
-Having bytes_available() return the number of bytes which would be read
-seemed to fix the problem but it could subtly change the behavior of
-other callers of this function. However, I actually think this is
-probably a good thing since they probably should have been seeing the
-line behavior.
-
-2012-05-02 cgf-000001
-
-<1.7.15>
-Fix problem setting parent pid to 1 when process with children execs
-itself. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00009.html
-</1.7.15>
-
-Investigating this problem with strace showed that ssh-agent was
-checking the parent pid and getting a 1 when it shouldn't have. Other
-stuff looked ok so I chose to consider this a smoking gun.
-
-Going back to the version that the OP said did not have the problem, I
-worked forward until I found where the problem first occurred -
-somewhere around 2012-03-19. And, indeed, the getppid call returned the
-correct value in the working version. That means that this stopped
-working when I redid the way the process pipe was inherited around
-this time period.
-
-It isn't clear why (and I suspect I may have to debug this further at
-some point) this hasn't always been a problem but I made the obvious fix.
-We shouldn't have been setting ppid = 1 when we're about to pass off to
-an execed process.
-
-As I was writing this, I realized that it was necessary to add some
-additional checks. Just checking for "have_execed" isn't enough. If
-we've execed a non-cygwin process then it won't know how to deal with
-any inherited children. So, always set ppid = 1 if we've execed a
-non-cygwin process.