Performance disruption on heavily used authoring systems is a fairly common scenario. It is frustrating for users and, ultimately, for clients, and it is also confusing, since the root of these performance issues often isn’t clear. Most of these issues are the result of unmaintained author instances or unexpectedly heavy usage. Fortunately, there are many ways to speed up the AEM author instance, leading to happier users and therefore happier clients.
At Netcentric we believe that the best way to solve any problem is to first investigate it analytically. That analysis starts with understanding the problem itself.
It is crucial to identify the root cause of the performance issue. The best approach is to drill down into the areas where issues are likely, without forgetting the other components in the web stack. When working on the AEM author side, the user passes through a number of components, often unknowingly: browsers, CDNs, caches, and even the ISP. This initial analysis helps to form an early idea of where improvement is possible and of a mitigation plan. Then it’s time to investigate the performance problem in more depth.
A good first step is to gather some data. The first place to look for data on authoring systems is the logs. Request logs in particular are useful for highlighting performance on authoring systems and for quickly pinpointing where slow requests are coming from or what is responding slowly. If a developer or operations engineer has monitoring tools such as AppDynamics or New Relic in place, the process becomes easier, as these tools gather many different metrics and present them well. A developer can then open the dashboard and, with luck, spot a potential bottleneck immediately.
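To illustrate how request logs can point to slow requests, here is a minimal Python sketch that pairs request and response entries and lists the slowest ones. It assumes the default Sling `request.log` layout, in which a `->` line carries the method and path, a `<-` line carries the status and duration in milliseconds, and both share a request id in square brackets. The sample lines and the function name `slowest_requests` are made up for illustration; a custom log format would need different patterns.

```python
import re

# Assumed request.log layout (adjust the patterns if your format differs):
#   09/Apr/2024:10:00:00 +0200 [17] -> GET /content/site/en.html HTTP/1.1
#   09/Apr/2024:10:00:02 +0200 [17] <- 200 text/html 2150ms
REQUEST = re.compile(r"\[(\d+)\] -> (\S+) (\S+)")
RESPONSE = re.compile(r"\[(\d+)\] <- (\d{3}) .*?(\d+)ms\s*$")


def slowest_requests(lines, top=5):
    """Return (duration_ms, method, path, status) tuples, slowest first."""
    pending = {}   # request id -> (method, path), waiting for its response
    finished = []
    for line in lines:
        req = REQUEST.search(line)
        if req:
            pending[req.group(1)] = (req.group(2), req.group(3))
            continue
        resp = RESPONSE.search(line)
        if resp and resp.group(1) in pending:
            method, path = pending.pop(resp.group(1))
            finished.append((int(resp.group(3)), method, path, int(resp.group(2))))
    return sorted(finished, reverse=True)[:top]


# Hypothetical sample entries, for illustration only.
SAMPLE_LOG = [
    "09/Apr/2024:10:00:00 +0200 [17] -> GET /content/site/en.html HTTP/1.1",
    "09/Apr/2024:10:00:00 +0200 [18] -> GET /libs/granite/core/content/login.html HTTP/1.1",
    "09/Apr/2024:10:00:00 +0200 [18] <- 200 text/html 35ms",
    "09/Apr/2024:10:00:02 +0200 [17] <- 200 text/html 2150ms",
]

if __name__ == "__main__":
    for duration_ms, method, path, status in slowest_requests(SAMPLE_LOG):
        print(f"{duration_ms:>6} ms  {status}  {method} {path}")
```

Sorting by duration is only a starting point. Grouping the same data by path or by time of day would show whether the slowness is tied to particular content or to peak usage.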
The solution to the performance issue depends on what the initial analysis uncovered, but one remedy for a wide range of issues is simply to increase the power of the author instances. Particularly when running cloud-based setups in Azure or AWS, AEM instances can be scaled up easily. Generally speaking, the more resources thrown at an instance, the faster it will run.
The power that affects overall performance essentially comes down to three things: CPU, RAM, and IO and storage.
Another factor that can affect performance is “memory thrashing”. This occurs when the TAR files are memory-mapped but system memory isn’t large enough to hold them: the mapping becomes inefficient and the system starts paging between disk and memory.
The solution here is to tune the instance size. With a larger instance, memory mapping works properly: the Segment store is loaded into memory and performance improves significantly. The idea is to have enough RAM left over for the Segment store so that this process can work efficiently. A smaller instance results in more paging and a decrease in overall performance. When an instance is scaled up, disk IO drops instantly. In the end, improved performance comes down to having a well-oiled machine. It isn’t just about RAM, but about everything functioning well together.
Data store and Segment store weren’t split on AEM 6.2, but with 6.3 they are split again out of the box. This brings a big opportunity for large performance improvements because, otherwise, the data store sits within the TAR files, which can result in worse performance. Loading even bigger TAR files into storage also becomes a bigger problem. Another factor worth noting is that, typically, the heap size should not exceed 50% or 60% of total system memory.
Additionally, copy-on-read and copy-on-write are now enabled by default for the index configurations in the latest AEM versions, further improving overall performance.
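The memory guidance above (a heap of no more than roughly 50% to 60% of system memory, with enough RAM left over to map the Segment store) can be turned into a quick sanity check. The following Python sketch is a back-of-the-envelope calculation, not an official sizing formula. The function name, the 60% threshold chosen from the quoted range, and the `os_reserve_gb` allowance for the operating system are assumptions made for illustration.

```python
def memory_budget(total_ram_gb, heap_gb, segmentstore_gb, os_reserve_gb=2.0):
    """Check an instance sizing against the two rules of thumb in the text.

    os_reserve_gb is a hypothetical allowance for the OS and other
    processes; pick a value that matches your own environment.
    """
    heap_share = heap_gb / total_ram_gb
    # RAM that remains for memory-mapping the TAR files once the JVM heap
    # and the assumed OS reserve are subtracted.
    left_for_mapping_gb = total_ram_gb - heap_gb - os_reserve_gb
    return {
        "heap_share": heap_share,
        "heap_ok": heap_share <= 0.6,  # upper end of the 50-60% guideline
        "left_for_mapping_gb": left_for_mapping_gb,
        "segmentstore_fits": segmentstore_gb <= left_for_mapping_gb,
    }


if __name__ == "__main__":
    # Made-up example sizings: a small instance that is likely to page,
    # and a larger one with room for the Segment store.
    print("small:", memory_budget(total_ram_gb=16, heap_gb=12, segmentstore_gb=10))
    print("large:", memory_budget(total_ram_gb=64, heap_gb=32, segmentstore_gb=20))
```

In the first made-up sizing the heap takes 75% of RAM and leaves too little to map a 10 GB Segment store, which is the situation in which paging would be expected. The second sizing passes both checks.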
Caching is a solution that is rarely applied effectively on the authoring side. On a recent project with a multinational automotive company, Netcentric used time-stamped assets, which allowed for caching of everything except dynamic data. This helps improve performance by offloading what isn’t necessary.
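The general idea behind time-stamped assets can be sketched as follows: the asset URL embeds the last-modified time, so a cached copy can be kept for a long time, and any change to the asset produces a new URL. This Python sketch illustrates the technique only and is not the implementation used on the project. The URL scheme, the function names, and the specific `Cache-Control` values are assumptions.

```python
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def timestamped_url(path, last_modified_epoch):
    """Insert the last-modified timestamp before the file extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{int(last_modified_epoch)}{p.suffix}"))


def cache_headers(is_timestamped):
    """Long-lived caching for time-stamped assets, revalidation for the rest."""
    if is_timestamped:
        # Safe to cache for a long time: a changed asset gets a new URL.
        return {"Cache-Control": f"max-age={ONE_YEAR_SECONDS}, immutable"}
    # Dynamic data must be revalidated with the server before reuse.
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    # Hypothetical asset path and timestamps, for illustration only.
    print(timestamped_url("/etc/designs/site/logo.png", 1712650000))
    print(timestamped_url("/etc/designs/site/logo.png", 1712736400))
    print(cache_headers(is_timestamped=True))
```

The server, dispatcher, or CDN would also need a rule that maps the time-stamped URL back to the real asset path. That part depends on the setup and is left out here.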
A similar caching strategy is used in an AEM setup that hosts 2,600 retailer websites worldwide. There, the editors were spread across the globe. To work around this, Netcentric incorporated a CDN, which greatly improved performance. It is worth bearing in mind that, because a CDN involves different servers, there might be privacy concerns that must be evaluated in such cases. However, it does offer a huge benefit for a project such as this one.
When tuning asset loading behavior on the authoring side, browser performance is very important. Netcentric has run several tests to improve browser performance, such as terminating HTTP/2 on the CDN or load balancer and enabling compression with Gzip or Brotli, and there has been a noticeable improvement in performance. Separate client library categories can also increase performance, as the page being rendered then loads only the files it needs at the time.
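To give a feel for what compression saves on text assets, this small Python sketch measures Gzip on a repetitive stylesheet-like payload using the standard library. The payload is synthetic and compresses far better than real files would, so the numbers are not a benchmark. Brotli is not part of Python's standard library and is left out of the sketch.

```python
import gzip


def gzip_savings(payload: bytes, level: int = 6):
    """Return (original_size, compressed_size, fraction_saved)."""
    compressed = gzip.compress(payload, compresslevel=level)
    saved = 1 - len(compressed) / len(payload) if payload else 0.0
    return len(payload), len(compressed), saved


# Synthetic, highly repetitive CSS; real client libraries compress less well.
SAMPLE_CSS = (".cq-editor .panel { margin: 0; padding: 4px 8px; }\n" * 500).encode("utf-8")

if __name__ == "__main__":
    original, compressed, saved = gzip_savings(SAMPLE_CSS)
    print(f"{original} bytes -> {compressed} bytes ({saved:.0%} saved)")
```

In practice the compression would be configured on the web server, load balancer, or CDN rather than in application code. The sketch only shows why text-heavy assets such as CSS and JavaScript benefit from it.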