Analysing Issues Properly in AEM
We have all seen it: an Adobe Experience Manager (AEM) instance under heavy load, unusable and unresponsive, possibly even stuck in a deadlock or a deadlock-like state.
The question is where to start analysing the data and how to look at it from a proper debugging perspective.
By default, AEM delivers quite a lot of information that in many cases goes unused. To investigate and debug properly, it is crucial to understand the data AEM provides and to use tools that can interface properly with the Java Virtual Machine (JVM).
A good start when dealing with an unresponsive AEM instance is having a look at its log files, such as:
- Access log
- Request log
- Error log
- Garbage Collection log (if configured)
- Other logs you may have configured through the OSGi logging configuration
In this article I would like to give a little more information on a few of the above-mentioned log files and on other tools that have helped me solve issues in the past and sped up my debugging process immensely.
Looking at the request log, for example, can provide very valuable information about the general behaviour of the instance before and after an issue occurred.
Jörg Hoh from Adobe has written a simple, but great, tool which helps you visualise this information.
The graph-request-log.pl script leverages the out-of-the-box rlog.jar, delivered with AEM, which can filter your request.log and quickly produce readable output from it.
Stagnation in the responses can point you in the right direction: if AEM was not able to handle the requests at a given time, that calls for a deeper analysis, for example using thread dumps.
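To illustrate the kind of information the request log holds, here is a minimal sketch that pairs each request with its response and reports slow and unanswered requests. It is not the rlog.jar or graph-request-log.pl implementation. It assumes the default request.log layout shown in the comments (your format may differ if it was customised), and the function name and the 1000 ms threshold are made up for this example.

```python
import re

# Assumed default request.log layout (adjust the patterns if yours differs):
#   09/Nov/2015:10:00:01 +0100 [42] -> GET /content/site/en.html HTTP/1.1
#   09/Nov/2015:10:00:02 +0100 [42] <- 200 text/html 1250ms
REQUEST = re.compile(r"^(\S+) \S+ \[(\d+)\] -> (\S+) (\S+)")
RESPONSE = re.compile(r"^(\S+) \S+ \[(\d+)\] <- (\d{3}) .*?(\d+)ms\s*$")


def summarise(lines, slow_ms=1000):
    """Pair requests and responses by request id.

    Returns (slow, unanswered): requests that took at least slow_ms,
    and requests for which no response line was found.
    """
    pending = {}  # request id -> (start time, method, path)
    slow = []
    for line in lines:
        match = REQUEST.match(line)
        if match:
            pending[match.group(2)] = (match.group(1), match.group(3), match.group(4))
            continue
        match = RESPONSE.match(line)
        if match and match.group(2) in pending:
            started, method, path = pending.pop(match.group(2))
            duration = int(match.group(4))
            if duration >= slow_ms:
                slow.append((started, method, path, duration))
    # Whatever is still pending never received a response in this log excerpt.
    unanswered = sorted(pending.values())
    return slow, unanswered
```

A growing list of unanswered requests around the time of an incident is the sort of stagnation worth following up with thread dumps. Note that request ids start over after a restart, so feed the function one run of the instance at a time.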
Many times when looking at the error log of production instances I see logs flooded with stack traces. Some are helpful, but an awful lot of the time they are just leftovers from development.
Reading stack traces properly (not from top to bottom but from bottom to top, since the last "Caused by:" entry names the original exception) can greatly increase your chances of finding the root issue, or will at least help point the development team in the right direction.
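Reading from the bottom up can be sketched in a few lines: find the last "Caused by:" header of a Java stack trace and the first frame below it. This is a minimal illustration, not an AEM tool; the function name is invented here, and the class names in the usage example are hypothetical.

```python
def root_cause(trace):
    """Return (exception header, first frame) of the innermost cause.

    Falls back to the top-level exception when the trace has no
    "Caused by:" section. Returns None for empty input.
    """
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    if not lines:
        return None
    # Index of the last "Caused by:" header, or 0 (top-level exception).
    start = 0
    for index, line in enumerate(lines):
        if line.startswith("Caused by:"):
            start = index
    header = lines[start]
    if header.startswith("Caused by:"):
        header = header[len("Caused by:"):].strip()
    # The first "at ..." frame after the header is where it was thrown.
    frame = next(
        (line[3:] for line in lines[start + 1:] if line.startswith("at ")),
        None,
    )
    return header, frame


if __name__ == "__main__":
    # Hypothetical trace; com.example.* classes are placeholders.
    example = """org.apache.sling.api.SlingException: Cannot render page
        at com.example.PageServlet.doGet(PageServlet.java:42)
    Caused by: java.lang.NullPointerException
        at com.example.Helper.resolve(Helper.java:9)
        ... 95 more"""
    print(root_cause(example))
```

The frame returned alongside the exception is usually the most useful single line to hand to the development team.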
The error log also provides some information about the bundle cache and its cache hit/miss ratios. The bundle cache will improve performance if configured correctly. Adobe provides in-depth information about the CRX bundle cache on their website.
Nmon was originally developed by IBM and was released to the public a few years ago. It is a very simple but very powerful monitoring tool that might be your lifesaver when analysing issues on the system.
It provides you with all the standard OS level data and even goes one step further, serving detailed information like IO statistics.
Setting this information in relation to the AEM logs, which you may have visualised in the form of graphs, or even just looking at both in raw format, can make your debugging life a lot easier.
Configuring nmon is simple and won’t put additional strain on existing resources. It can run in daemon mode, which saves data to disk for later analysis. Merging nmon files with the OOTB tool nmonmerge and using the nmon visualizer will quickly provide you with an overview of the system behaviour and even the historic trends you will heavily depend on when analysing an issue.
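If you want to look at the raw capture rather than a visualiser, the saved file is comma-separated text. The sketch below pulls a CPU series out of it. It assumes the layout commonly produced by nmon's file mode: a "CPU_ALL" header row naming an "Idle%" column, followed by one "CPU_ALL" row per snapshot, with "ZZZZ" rows mapping snapshot tags such as T0001 to a time and date. Treat that layout as an assumption and check it against your own files; the function name is made up for this example.

```python
import csv


def cpu_non_idle_series(lines):
    """Return [(timestamp, non_idle_percent)] from CPU_ALL rows.

    Non-idle is computed as 100 - Idle%, so it includes I/O wait.
    """
    header = None
    times = {}    # snapshot tag (e.g. "T0001") -> "date time"
    samples = []  # (snapshot tag, non-idle percent)
    for row in csv.reader(lines):
        if not row:
            continue
        if row[0] == "ZZZZ" and len(row) >= 4:
            times[row[1]] = f"{row[3]} {row[2]}"
        elif row[0] == "CPU_ALL":
            if header is None:
                header = row  # first CPU_ALL row names the columns
                continue
            values = dict(zip(header[2:], row[2:]))
            samples.append((row[1], round(100.0 - float(values["Idle%"]), 1)))
    # Resolve tags last so the order of ZZZZ and data rows does not matter.
    return [(times.get(tag, tag), value) for tag, value in samples]
```

Lining this series up with the request log timestamps shows quickly whether slow responses coincide with the machine itself being saturated.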
Java Development Kit (JDK) tools
Java provides many great tools for investigating issues with your AEM instance. JMX should be enabled to really harness their full potential. To avoid security risks, however, JMX access must be properly protected with authentication and other precautions. Otherwise it might expose operations that can disrupt the operation of your website.
Further tools such as VisualVM exist to help identify root causes. A VisualVM plugin everyone should have in their arsenal for proper debugging and analysis is the TDA (Thread Dump Analyzer). There are many other plugins vital to investigations, so have a stroll through the plugins page. A few to look out for are:
- MBeans Browser
- Thread Inspector
- OSGi Plugin
Running VisualVM locally on your notebook or on a management server allows you to remotely attach to the AEM instance and start monitoring system behaviour on the fly.
JConsole provides many features similar to VisualVM but is not as powerful. A short comparison and further details can be found on the JDK tools page.
There are also many application monitoring solutions on the market that will support you in your investigations. Having a toolbox and understanding how to use those tools to their full potential will lead you to a successful analysis.
Debugging in AEM - final words
- Use the tools AEM provides you with for analysis.
- Have a solid understanding of stack trace reading.
- Create and interpret thread dumps; in many cases they will lead you to the root cause.
- Use other tools, for example nmon or application monitoring.
- It is always advisable to have your QA team check the error log as well, to ensure that it gets cleaned up properly before going into production.
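As a first step towards interpreting a thread dump by hand, the following minimal sketch counts thread states and groups blocked threads by the lock they are waiting for. It assumes the text format printed by the JDK's jstack (a quoted thread name, a "java.lang.Thread.State:" line, and "- waiting to lock <address>" lines). It is not a replacement for TDA, and the function name is invented for this example.

```python
import re
from collections import Counter, defaultdict

THREAD = re.compile(r'^"(?P<name>.+?)"')
STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
WAITING_TO_LOCK = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def summarise_dump(text):
    """Return (state counts, {lock address: [waiting thread names]})."""
    states = Counter()
    blocked_on = defaultdict(list)
    current = None  # name of the thread whose section is being read
    for line in text.splitlines():
        match = THREAD.match(line)
        if match:
            current = match.group("name")
            continue
        if current is None:
            continue  # preamble before the first thread entry
        match = STATE.search(line)
        if match:
            states[match.group(1)] += 1
            continue
        match = WAITING_TO_LOCK.search(line)
        if match:
            blocked_on[match.group(1)].append(current)
    return states, dict(blocked_on)
```

Many threads in BLOCKED state queuing on the same lock address, across several dumps taken a few seconds apart, is a strong hint about where the instance is stuck; searching the dump for "- locked <address>" then shows which thread holds that lock.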