With respect to the application log generated in production, I want to:
1. Enable disk I/O for logging only if the application runs into a fatal or irrecoverable error.
2. Reduce file system footprint of the generated log.
3. Ability to provide enough information from the application log file to help developers troubleshoot production issues without having to :
a. Re-run the failing scenario or
b. Dump almost everything to the log.
Due to concurrent nature of the application and extensive use of log4J logging framework, our customers as users and we as developers are facing several issues related to application logging in production. Some of these issues are frequent disk I/O, large file system application log footprint and inconvenient application re-run in production for troubleshooting issues. Additionally, depending on the chosen logging level (and it’s side effects), we are sometime unable to troubleshoot issues quickly due to missing information or a natural need to re-run the failing scenario. Such a situation is clearly undesirable.
To address these issues, we need to look at the application up close and apply certain steps to improve its usage of the log4J logging framework. During analysis of this problem, the following options were thought about:
1. In memory logging framework: An ability within the application to log less severe logging messages during a user session in memory up to a point when a fatal or irrecoverable error occurs. At this time, the in-memory log can be dumped to the application log “as is” or “filtered” and the generated log can be used to troubleshoot the issue. If the application behaves normally during the user session without any issues, the in memory log could be wiped out (removed from memory) or written to a file (less periodically though) to record application behavior during the session. A custom MemoryAppender could easily be added for the log4J logging framework in case no exists (There wasn’t any until 2009). Log4NET does have one in-built.
2. Optimize or reduce logging: An analysis could be easily carried out against the application that determines most frequently called methods (MVF). We could choose the top ‘N’ methods and evaluate their logger calls. If we could optimize the calls to the logger from these methods, we may end up saving significant disk I/O and physical disk space otherwise consumed by log files. Determining MVFs would require filtering real production logs collected over a significant time period (few days?) and then analyzed for capturing the most called methods. The rule of thumb should be – “Log right. Log just enough, no more and no less”.
3. Verify and/or Implement log4J best practices: The log4J logging framework itself provides certain recommendations about best practices for effective and efficient logging behavior in your application. The target application could be evaluated against the usage of such best practices. For example, usage of %F:%L in the ConversionPattern attribute of log4J configuration has serious performance impact on logging. Similary using AsyncAppender is usually considered good for logger performance as log messages are written to the disk only when required. Log4J is known to exhibit slow performance for high load and concurrency. Hence, even if you must use DEBUG levels, you should avoid using %l, %M, and %C format specifiers which are extremely slow as documented. Additionally, it is highly recommended to use efficient toString () methods for application’s custom classes to avoid distressing the performance of logger messages. Another popular recommendation is to use non-functional feature specific logger class for each file and then control what you log in the production application logs. For example, you could have separate logger classes in a file for performance log messages vs. error log messages vs. security. E.g. In case of some typical error messages, you might choose to ignore or filter Security related messages if they are separately managed.
4. Miscellaneous: There are few other steps that could be taken to improve the logging performance in your application or help the developer troubleshoot issues better through log messages. One could employ logging façade framework like SLF4J which is known to provide not only efficient logging on top of a base framework like log4J but also it makes the underlying logging framework like log4J irrelevant to the application. You could replace log4J with any other framework and it would work. Also, a replay trace or execution tool for Java could be used to play the specific scenario that caused the application to error out again. One could also choose to start a separate thread for logging application messages that could additionally use asynchronous logging mechanism to relieve the main thread from such expensive operations.