Tools and approaches to finding the root cause of performance issues analytically and debugging AEM Web Stacks properly.
With web stacks getting larger and more complex, the need for debugging is greater than ever. Performance issues are especially painful to debug and demand a lot of time and attention before a root cause analysis is complete. When working on AEM Web Stacks, it is sometimes difficult to know how to debug a problem or where to start. This blog article builds on some of the tools and approaches already mentioned in Analysing Issues Properly in AEM.
A common topic in AEM setup is that there are performance issues or bugs, and someone needs to find the root cause. In many cases the DevOps team or systems engineer tends to have the broadest understanding of the whole web stack, and
Issues can usually be sorted into two categories:
- Performance: something is running slowly, a request isn't answered properly, or similar
- Bugs: something isn't working as expected, for example a header is present that should not be
In an AEM stack there are multiple layers involved:
mod_dispatcher / varnish
When debugging such a large stack, you can start at either end. In most cases with AEM, it is best to start at the AEM level and work your way outwards: a bug, for example with a request, may already be visible at the AEM level, and if it is not, you know right away that the cause lies in another layer.
Below is a
For performance issues, one can use various tools to first get an overview of where exactly the performance issue is coming from:
- Leverage the rlog.jar to identify slow running requests.
- Use the graphing tool from Jörg Hoh to visualize them.
- Your browser's developer tools are also a good starting point. In some cases, they quickly point you in the right direction.
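To illustrate what "identifying slow running requests" means in practice, here is a minimal Python sketch that pairs the start and end lines of a request.log and lists the slowest requests. It is not a replacement for rlog.jar. It assumes the default Sling request.log layout shown in the comments; the function name `slow_requests` and the 1000 ms threshold are illustrative choices, so adjust the patterns if your log format differs.

```python
import re

# Assumed default request.log layout (two lines per request, sample values are made up):
#   09/Mar/2020:10:15:01 +0100 [1234] -> GET /content/foo.html HTTP/1.1
#   09/Mar/2020:10:15:03 +0100 [1234] <- 200 text/html 2012ms
START = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<id>\d+)\] -> (?P<method>\S+) (?P<url>\S+)")
END = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<id>\d+)\] <- (?P<status>\d{3}) .*?(?P<ms>\d+)ms\s*$")


def slow_requests(lines, threshold_ms=1000):
    """Return (duration_ms, start_timestamp, method, url, status) tuples, slowest first."""
    pending = {}  # request id -> (timestamp, method, url) of requests still in flight
    slow = []
    for line in lines:
        m = START.match(line)
        if m:
            pending[m["id"]] = (m["ts"], m["method"], m["url"])
            continue
        m = END.match(line)
        if m and m["id"] in pending:
            ts, method, url = pending.pop(m["id"])
            ms = int(m["ms"])
            if ms >= threshold_ms:
                slow.append((ms, ts, method, url, int(m["status"])))
    slow.sort(reverse=True)
    return slow
```

A typical call would be `slow_requests(open("request.log", encoding="utf-8"))`. The start timestamp of each slow request is kept so that it can later be compared with the error.log and the nmon data for the same point in time.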
Use nmon logs + nmon visualizer to
- Gather all the logs (error, request, nmon etc.)
- Before you restart AEM, create thread dumps: 10 dumps, one every 5 seconds
- Create the rlog jar output
- Visualize the logs
- Pinpoint a specific time when the issue occurred.
- Try to correlate slow running requests to some potential issues seen in nmon
- If you can't, chances are high that the issue is in AEM itself. Either way, this correlation gives you a good indication of whether it is an AEM issue or some other problem impacting AEM
- As a first step, find the request and check the error.log for any stack trace related to the long-running request
- If it is an AEM issue, thread dumps are a must-have.
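The thread dump step from the list above (10 dumps, 5 seconds apart) can be scripted. The following is a minimal sketch, assuming the JDK's `jstack` tool is on the PATH and the script runs as the same OS user as the AEM process. The function names, the file naming scheme, and the output directory are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def jstack_dump(pid):
    """Return one thread dump as text, using the JDK's jstack tool (assumed to be on PATH)."""
    result = subprocess.run(
        ["jstack", "-l", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def collect_thread_dumps(pid, out_dir, count=10, interval_s=5,
                         dump=jstack_dump, sleep=time.sleep):
    """Write `count` thread dumps, `interval_s` seconds apart, and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"threaddump-{pid}-{i + 1:02d}.txt"
        path.write_text(dump(pid))
        paths.append(path)
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval_s)
    return paths
```

For example, `collect_thread_dumps(12345, "/tmp/aem-dumps")` would write ten files for a (made-up) AEM process ID 12345. The `dump` and `sleep` parameters exist only so that the loop can be tried out without a running JVM.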
At this point, when looking at thread dumps, it is good to involve the solutions architect or a senior developer to analyze them further if you lack the detailed knowledge.
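Before handing thread dumps over, a rough summary can already hint at where threads are stuck. The sketch below counts thread states and the topmost stack frame per thread in a single dump. It assumes the usual jstack text layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames); `summarize_dump` is a hypothetical helper and no substitute for a proper analysis.

```python
import re
from collections import Counter

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE = re.compile(r"^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")
FRAME = re.compile(r"^\s+at (?P<frame>[^(\s]+)")


def summarize_dump(text):
    """Return (state counts, top-frame counts) for one jstack-style thread dump."""
    states = Counter()
    top_frames = Counter()
    need_frame = False  # True until the first "at ..." line of the current thread is seen
    for line in text.splitlines():
        if THREAD_HEADER.match(line):
            need_frame = True
            continue
        m = STATE.match(line)
        if m:
            states[m["state"]] += 1
            continue
        m = FRAME.match(line)
        if m and need_frame:
            top_frames[m["frame"]] += 1
            need_frame = False
    return states, top_frames
```

Running this over each of the collected dumps and comparing the results shows whether the same frame stays at the top of many threads from one dump to the next, which is usually the part worth discussing with a developer.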
Additionally, many projects have a profiler at hand, for example AppDynamics. Use it! Look at the general metrics, look for correlations, and look at garbage collection and other JVM metrics to try to identify a possible request and/or job.