OddThinking

A blog for odd things and odd thoughts.

Self-Scaling Log Files

I present another half-formed idea for your consideration and hopeful implementation….

So, I am debugging some software (mine? 3rd party? whatever) and I need more information, so I lower the logging threshold and more messages get written to the log file.

Eventually, I turn that off again, because the problem finally seems to be fixed, and hasn’t recurred for several days or weeks (or 6 runs), so there is no point wasting time and disk space on pointless information.

Who am I kidding? I never turn it off. Why would I go back and touch a working piece of software weeks later? Especially if it involves taking down a service that people are depending on!

So my half-formed idea, for someone else to run with, is an option for log files which scales back the logging over time.

It could use the time since the software was built, or the source files were edited, or the config files were changed, whatever. If no one is tinkering with the system, save the effort of logging.

That’s as far as I have got. I’ve got to go now. There are a whole lot of pieces of old software I need to update to cut back on their logging.
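For the record, the gist could be sketched in a few lines of Python – the helper name, the file list and the two-week cutoff below are all my own invention, not anything standard:

```python
import logging
import os
import sys
import time

def self_scaling_level(paths, fresh_level=logging.DEBUG,
                       stale_level=logging.WARNING, max_age_days=14):
    """Pick a log level from the age of the newest file in `paths`.

    If anything was touched recently, someone is tinkering: log verbosely.
    If nothing has changed in a while, fall back to a quiet threshold.
    """
    mtimes = [os.path.getmtime(p) for p in paths if os.path.exists(p)]
    if not mtimes:
        return fresh_level  # nothing to measure; stay verbose
    age_days = (time.time() - max(mtimes)) / 86400
    return fresh_level if age_days < max_age_days else stale_level

# e.g. key the level off the script itself and its (hypothetical) config file
logging.basicConfig(level=self_scaling_level([sys.argv[0], "app.conf"]))
```

The same trick could be keyed off a build timestamp or a VCS checkout date instead of file mtimes.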


Comments

  1. Concurrency issues aside (because I assume you can find a trivial way around them), could you not have the logging threshold in a config file, and have the program read the config every X minutes?

    Generally my logging thresholds are command line options, but I don’t really know why I do this. Just force of habit I guess.
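    Something like this can be done with Python’s stdlib alone – `watch_level`, `apply_level_from` and the one-word config format here are hypothetical, not any standard convention:

    ```python
    import logging
    import threading

    def apply_level_from(logger, path):
        """Read a level name (e.g. 'DEBUG') from `path` and apply it."""
        try:
            with open(path) as f:
                name = f.read().strip().upper()
        except OSError:
            return  # no config file: leave the current level alone
        logger.setLevel(getattr(logging, name, logger.level))

    def watch_level(logger, path, interval=300):
        """Re-apply the configured level every `interval` seconds."""
        apply_level_from(logger, path)
        t = threading.Timer(interval, watch_level, (logger, path, interval))
        t.daemon = True  # don't keep the process alive just to poll
        t.start()

    # watch_level(logging.getLogger("app"), "loglevel.conf")  # poll every 5 min
    ```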

  2. Since you’ve said you don’t want to have to restart the app – can you split different types of logging to differently named log files?

    This makes it annoying for analysis, but means you can specify the rules for deleting it differently – so when you’re happy the problem is fixed, just delete the debug level data more often than the other data.

    On linux – see logrotate.d
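    A logrotate.d fragment along those lines might look like this (paths, ages and sizes are purely illustrative):

    ```
    # /etc/logrotate.d/myapp  (illustrative)
    /var/log/myapp/debug.log {
        daily
        rotate 3        # keep only a few days of verbose data
        maxsize 100M
        compress
        missingok
        notifempty
    }

    /var/log/myapp/error.log {
        weekly
        rotate 12       # keep errors around much longer
        compress
        missingok
        notifempty
    }
    ```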

  3. The quick and dirty solution is to split the logs into two sets of files (e.g. trace and error logs), rotate them every X MB and/or each day/week/month, and to delete the oldest of each set when they get too large.

    As a developer, I’d much rather you left the logging on max the whole time, and that the logger kept a detailed log for me and a (much shorter) error log for you. If there’s an error, the app (automatically?) tells me, or gets you to tell me about it, and I get the detailed log at the same time.

    If only there was a good way of determining how much context you or I are going to need for this. If errors are rare, and you want them fixed every time they happen, then the error log will never need to be truncated, or it can always be truncated to just the most recent error, but most of that debug log is going to be irrelevant to us both…

    Log rotation and deleting old files is only a partial answer here, since for a debug log to be of the most use, I need everything relevant you did leading up to the error, just to know how you got the system into the state it’s in, plus any other symptoms of the error that you didn’t mention. Determining what’s relevant just based on how long ago you did it is too naive an implementation, and rarely helps in debugging the more difficult errors that escaped functional testing.

    Some kind of runtime data- and control-dependency tracking that could filter the debug messages down to just the relevant ones (plus a few version-related messages) would be ideal. Keeping that much info would be… let’s say there would be a minor performance hit 😉

  4. I like Richard’s solution, because it’s quick. And dirty.

    Here’s another idea. Sophisticated logging frameworks often allow runtime adjustment of logging thresholds. In the Java world there are several external interfaces (webpages, GUI, JMX beans, SNMP) that allow logging levels to be adjusted on-the-fly by an administrator. Something like that sounds like just the ticket for your app.
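    In Python, one cheap on-the-fly interface is a Unix signal handler (SIGUSR1 is not available on Windows; the logger name and the choice of signal are illustrative):

    ```python
    import logging
    import signal

    log = logging.getLogger("app")

    def toggle_debug(signum, frame):
        """Flip between DEBUG and INFO when the admin sends SIGUSR1."""
        new = logging.INFO if log.level == logging.DEBUG else logging.DEBUG
        log.setLevel(new)
        log.warning("log level now %s", logging.getLevelName(new))

    signal.signal(signal.SIGUSR1, toggle_debug)
    # `kill -USR1 <pid>` now flips the verbosity without a restart
    ```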

  5. I’m thinking somewhere between Richard’s solution and Rohan’s solution might work. I am using Python’s logging system which is very flexible, to the point of being ineffable.

    I *believe* it should be possible to configure the logging system to create two separate log files – one that only contains, say, warnings and errors. The other could contain info and debug messages, plus the warnings and errors too, so analysis should be simple.

    Both can rotate, with their own quotas. I appreciate that Richard finds the complete history useful, but it suffers in the context of a long-running server: it is a memory leak, but to disk.

    That still leaves a performance issue though. I continue to be startled at how often the output of my software’s progress is the biggest performance hit to my software’s progress!

    Alan/Tom, adjusting logging levels on the fly is an improvement over requiring the software to be bounced (restarted), but it still suffers from the fact of life that I will rarely actually get around to doing it!

    p.s. Alan, enjoy your holiday.
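    For anyone wanting to try it: the two-file setup described above does work with Python’s stock handlers. Filenames, quotas and the format string below are just illustrative:

    ```python
    import logging
    from logging.handlers import RotatingFileHandler

    def build_logger(name="app"):
        """One logger, two rotating files: everything goes to the debug
        log, only WARNING and above reach the error log."""
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)

        debug_h = RotatingFileHandler("app-debug.log",
                                      maxBytes=10_000_000, backupCount=3)
        debug_h.setLevel(logging.DEBUG)

        error_h = RotatingFileHandler("app-error.log",
                                      maxBytes=1_000_000, backupCount=10)
        error_h.setLevel(logging.WARNING)

        for h in (debug_h, error_h):
            h.setFormatter(logging.Formatter(
                "%(asctime)s %(levelname)s %(message)s"))
            logger.addHandler(h)
        return logger
    ```

    Both files rotate independently, so the verbose log can be given a much tighter quota than the error log.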

  6. Looks to me like what you really want is the power of the standard unix syslog facility. I.e., the app sends all information to it, marked by severity as well as the application name (“facility”), and the logger can then save or discard as it sees fit, at whatever threshold has been set. Logging thresholds can be changed without restarting your running service, you get consistent logging, you save code… etc.

    … so, syslog then. In userspace?

    BUT!

    What is wrong with syslog as it exists right now?
    http://tools.ietf.org/html/rfc3164 allows for 8 ‘local’ facilities…

    Note that http://tools.ietf.org/html/rfc5424 is the latest syslog RFC. What your local syslog can do may vary, but it should more or less handle the basics described above.
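    Python’s stdlib speaks syslog out of the box, for what it’s worth. A minimal sketch (UDP to localhost here; a local daemon would more typically be reached via the `/dev/log` socket, and the logger name is illustrative):

    ```python
    import logging
    from logging.handlers import SysLogHandler

    # Hand everything to syslog on one of the RFC 3164 'local' facilities;
    # let the syslog daemon's own config decide what to keep or discard.
    handler = SysLogHandler(address=("localhost", 514),
                            facility=SysLogHandler.LOG_LOCAL0)
    logger = logging.getLogger("myapp")
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)  # send it all; syslog filters

    logger.info("hello from userspace")
    ```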

  7. Like Alan says, a lot of logging systems will allow you to adjust the log level on the fly. When a user looks at an error, they will often say “I have no idea what happened here, I’m going to increase the log level”. Later, either they’ll be vigilant and turn it back down, or they’ll run out of disk space (or find that their logs are cycling too quickly) and reduce the log level then, or the logs will just keep growing without bound.

    I think what you might want is to increase the log level whenever an error (warning?) is logged, then decrease it again later after a fixed time (like a week, on a timer thread somewhere).

    One of the advantages of the Richard / Rohan solution is that there are already “log readers” out there which can thread multiple log files by time stamps, and colour them and what-not. Some of them even have calendars and things. One of the downsides is that if you find a problem too late, you might lose the debug logs, even though the warnings and errors are still there.
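    That escalate-then-relax idea can be sketched as a custom Python handler – the class name and the one-week holdoff are made up, and this is an illustration rather than production code:

    ```python
    import logging
    import threading

    class EscalatingHandler(logging.Handler):
        """On an ERROR, drop the logger's threshold to DEBUG, then
        restore the old level after `holdoff` seconds of quiet."""

        def __init__(self, logger, holdoff=7 * 24 * 3600):
            super().__init__(level=logging.ERROR)  # only sees errors
            self.logger = logger
            self.holdoff = holdoff
            self.quiet_level = logger.level
            self._timer = None

        def emit(self, record):
            if self.logger.level > logging.DEBUG:
                self.quiet_level = self.logger.level  # remember normal level
                self.logger.setLevel(logging.DEBUG)
            if self._timer:
                self._timer.cancel()  # each new error resets the countdown
            self._timer = threading.Timer(self.holdoff, self.calm_down)
            self._timer.daemon = True
            self._timer.start()

        def calm_down(self):
            self.logger.setLevel(self.quiet_level)
    ```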
