{"id":1249,"date":"2010-05-26T13:57:45","date_gmt":"2010-05-26T03:57:45","guid":{"rendered":"http:\/\/www.somethinkodd.com\/oddthinking\/?p=1249"},"modified":"2010-05-26T13:57:45","modified_gmt":"2010-05-26T03:57:45","slug":"python-consumer-thread-shutdown-design-notes","status":"publish","type":"post","link":"https:\/\/www.somethinkodd.com\/oddthinking\/2010\/05\/26\/python-consumer-thread-shutdown-design-notes\/","title":{"rendered":"Python Consumer Thread Shutdown Design Notes"},"content":{"rendered":"<div class=\"aside\">Oh well, it worked well <a href=\"http:\/\/www.somethinkodd.com\/oddthinking\/2010\/05\/25\/python-message-buffering-logging-handler-design-notes\/\">yesterday<\/a>. Let&#8217;s try another session of rubber-ducking.<\/div>\n<p>The design so far is that there is a proxy handler object that runs in the client thread(s) and a consumer thread in the background. The consumer thread monitors a Queue.Queue for work items, which is the standard Python idiom.<\/p>\n<p>Ideally, the consumer thread should remain until the clients no longer exist, then it should complete any outstanding I\/O (if any) and then immediately terminate.<\/p>\n<p>What&#8217;s the best way to ensure that happens?<\/p>\n<p>If the project goes well, the proxy handler will be instantiated by other Python developers, who will use an existing interface, so I cannot make any assumptions that they will clean up nicely after themselves.<\/p>\n<p>Let&#8217;s look at some of the options:<\/p>\n<h3>Daemon Threads<\/h3>\n<p>One option is to declare the consumer thread to be a daemon thread. If the interpreter finds that all the real threads are completed, and only daemon threads remain, it immediately terminates the daemon threads, and hence the application.<\/p>\n<div class=\"aside\">This termination appears to be with prejudice. 
It isn&#8217;t a graceful shutdown; the thread just stops executing.<\/div>\n<p>This is unsatisfactory, as it will terminate the consumer thread before it completes any outstanding I\/O. Given logging is often the last action of a dying application, it is important to hang around for long enough to record its final words.<\/p>\n<p>Unfortunately, it isn&#8217;t possible to change the daemon status on a running thread; I can&#8217;t declare it a daemon thread while it is idle, but a non-daemon thread while it has work to do.<\/p>\n<h3>Call to join()<\/h3>\n<p>If the client is shutting down and no longer cares about real-time behaviour, the handler object in the client thread could call queue.join(), which causes it to block until all the tasks in the queue have been completed.<\/p>\n<p>After that, the client thread could safely terminate. The consumer thread (declared as a daemon thread) would then shut down when all the other threads do.<\/p>\n<p>If the client calls close(), the join() could be done then. This sounds perfect.<\/p>\n<p>However, if the client <em>neglects<\/em> to call close() before terminating, it causes the consumer thread to quit suddenly before the I\/O is completed.<\/p>\n<h3>Sentinel Values<\/h3>\n<p>If the client can declare that it no longer requires the handler (by calling close()), then the client thread can push a sentinel value onto the queue. The client thread may then terminate immediately.<\/p>\n<p>When the consumer thread (declared NOT to be a daemon thread, in this case) catches up with its backlog, and sees the sentinel value, it can shut the thread down.<\/p>\n<p>Again, this is perfect when the client calls close(). When the client neglects to call close(), it causes the application to hang when the main thread terminates.<\/p>\n<h3>Destructors<\/h3>\n<p>Python has garbage collection. When an object is no longer required, its destructor, if any, is normally called. 
The destructor could call the close() method, to ensure the object is properly shut down.<\/p>\n<p>However, Python destructors are rather fragile. If the garbage collector detects a cycle, it <em>won&#8217;t<\/em> clean it up if there is an object with a destructor involved. Also, destructors are not consistently called when a thread terminates.<\/p>\n<p>This was the design I had originally envisioned. My attempts to use the destructor have been only intermittently successful.<\/p>\n<h3>Polling<\/h3>\n<p>Rather than have the consumer thread block forever waiting for input, it could timeout and do some check to see if the client thread is still around.<\/p>\n<p>Firstly, this has issues with how it would detect the &#8220;client thread&#8221;, which could really be many threads.<\/p>\n<p>Secondly, this polling is either done frequently enough to make it inefficient background noise for the CPU (especially for an infrequently required service), or it is done infrequently enough that the program takes a long time to terminate, just because of the logging. Neither seems satisfactory.<\/p>\n<h3>Thread per Work Item<\/h3>\n<p>One thread could be created per transaction (e.g. emitting a message). This thread would run until the transaction was completed, and then terminate.<\/p>\n<p>This strikes me as very inefficient (although it may be worth testing). In the case where the system is asked to send a flood of emails, it would take some time to allocate all of the new threads. It would also be memory and CPU intensive, meaning it may well cause other problems for the application.<\/p>\n<h3>Using atexit<\/h3>\n<p>A function may be registered with the atexit module. When the interpreter is shutting down, those functions are called.<\/p>\n<p>The logging module registers a shut-down method which cleans up all handlers.<\/p>\n<p>However, the atexit functions are only called when the program is terminating. 
The problem here is that the consumer thread is causing the program to not terminate properly, so the atexit functions are not being called.<\/p>\n<h3>Daemon Thread \/ atexit Combination<\/h3>\n<p>Oooh, here is an idea that came from rubber-ducking.<\/p>\n<p>What if the consumer thread was a daemon thread, so it terminated suddenly, but the client thread, during its atexit-driven shutdown, went to check on the queue and processed any remaining items? That is, once the producer had finished and the consumer thread had died, the producer would wake up for a last-ditch effort to act like a consumer and finish off any remaining tasks.<\/p>\n<p>The item that was currently being processed by the consumer thread when it died would be lost. I can think of ways to reduce the size of the window of potential lossage, but not to eradicate it and not without introducing the counter-risk that an item would be processed twice.<\/p>\n<p>[Stop Press: I thought of a way to guarantee at-least-once semantics. If every item was posted to the queue <em>twice<\/em> (and was guaranteed to be in immediate succession, which could be done with a semaphore), then the single consumer thread could process the first one and then discard the second one. The producer thread could process each unique one it finds. That would ensure no log message would be lost, but one might be emitted twice.]<\/p>\n<p>[Stop, stop press: No, that assumes that there is only one producer-turned-consumer, but there may be many client threads shutting down.]<\/p>\n<h3>Multiprocessing<\/h3>\n<p>The multiprocessing module in Python offers a thread-like interface to processes. It might provide an interface that allowed atexit to be used.<\/p>\n<p>However, as well as imposing yet more constraints on the client (e.g. the main module must be importable), it also isn&#8217;t supported by the (default) logging locking system. 
The provided (protected) handler is probably not process-safe, and runs the risk of corrupting the log messages.<\/p>\n<h3>What else?<\/h3>\n<p>I am out of ideas here&#8230;<\/p>\n<p>&#8230; I may have to assume that the client will jump through hoops to ensure that the handler object is properly closed when the program ends, which will make this project unsuitable for a broader audience.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In which Julian fails to persuade Python to perform a simple task: stopping.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","footnotes":""},"categories":[34],"tags":[],"class_list":["post-1249","post","type-post","status-publish","format-standard","hentry","category-software-development"],"_links":{"self":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/1249","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/comments?post=1249"}],"version-history":[{"count":5,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/1249\/revisions"}],"predecessor-version":[{"id":1254,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/1249\/revisions\/1254"}],"wp:attachment":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/media?parent=1249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/categories?post=1249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.som
ethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/tags?post=1249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}