There is one "but", though: our CRM, SalesMax, is written in Java, so pauses caused by the garbage collector occur periodically. Until recently, this was a necessary evil that we simply had to put up with.
Then Oracle announced a new garbage collector, ZGC. According to the preliminary announcements, it was supposed to solve the problem of Java application freezes: the declared pauses should not exceed 100 ms even on multi-gigabyte heaps. With our maximum memory usage of 6 GB, everything should have been fine.
So let's get started.
Add the following line to the standalone.conf of the WildFly application server:
JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseZGC"
We started the system and ran load tests.
At first glance, everything worked as advertised: the garbage collection pauses really did decrease.
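When switching collectors through JAVA_OPTS, it can be worth confirming from inside the JVM that the requested collector is actually the one running. The sketch below is not part of our setup; it is a minimal illustration that uses the standard java.lang.management API, and the class name GcCheck is made up for the example. The bean names depend on the JVM build; with the options above they would be expected to mention ZGC (or Shenandoah), but treat that as an assumption to verify on your own JDK. Note also that for concurrent collectors the reported time may cover whole GC cycles rather than stop-the-world pauses only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of SalesMax or WildFly.
public class GcCheck {

    /** Returns the names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters are cumulative since JVM start.
            System.out.println(gc.getName()
                    + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
    }
}
```

Running the class with the same JAVA_OPTS as the application server shows which collector those options select on a given JDK.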
Without much hesitation, we decided to try the new garbage collector on one of the production servers. We chose the least loaded one, configured it, launched it, and began to observe.
At first everything worked well, and on the whole we decided that the experiment was a success.
And then, Saturday night. We were calmly playing billiards, some time after midnight. A call from the manager: the CRM is not working for a client.
I checked: the client was on that very server. I grabbed my phone, opened Termius, and tried to connect to the server via ssh. Silence... After about 20 seconds, which at that moment seemed like an eternity, I finally managed to log in. And what did we see? Despite the -Xmx6144M limit set in the startup parameters, the java process had used up all available memory. Some time later, the operating system killed the process outright.
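A note on why the -Xmx limit did not help here: -Xmx caps only the Java heap, while the memory the operating system accounts to the process also includes metaspace, thread stacks, and the collector's own native structures. The sketch below shows one way to put the two numbers side by side. It is an illustration only, not the tool we used that night: the class name MemoryCheck is hypothetical, and reading VmRSS from /proc/self/status is an assumption that holds only on Linux.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class, shown only to illustrate heap limit vs. resident memory.
public class MemoryCheck {

    /** Extracts the VmRSS value (in kB) from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Line format: "VmRSS:    123456 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Maximum heap the JVM will attempt to use (reflects -Xmx).
        long heapMaxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Heap limit: " + heapMaxMb + " MB");

        Path status = Paths.get("/proc/self/status"); // Linux-only assumption
        if (Files.exists(status)) {
            long rssKb = parseVmRssKb(Files.readAllLines(status));
            System.out.println("Resident memory (VmRSS): " + (rssKb / 1024) + " MB");
        } else {
            System.out.println("/proc/self/status not found; VmRSS is unavailable here");
        }
    }
}
```

If the resident figure keeps climbing well past the heap limit, the growth is happening outside the heap, which is exactly the situation -Xmx cannot prevent.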
So ZGC had to be disabled. The CRM system immediately returned to normal operation. It seemed there was nothing left to do but wait until Oracle finished its work.
But some time later I came across an article in which the author shared a positive experience with another garbage collector, Shenandoah, whose developers had exactly the same goal: reducing the time taken by the stop-the-world phase of garbage collection.
We decided: why not?
Having found a page where pre-compiled JDK builds can be downloaded, https://builds.shipilev.net/ , we started testing. We added new options to standalone.conf:
JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
This time, testing showed that everything was, on the whole, OK. Garbage collection pauses decreased and, best of all, the unpredictable growth in memory consumption stopped. Everything works just fine in production.
What conclusions can be drawn? I understand that Oracle keeps working on ZGC too, and the difficulties we encountered in October 2019 may already have been fixed, so ZGC will soon be given a second chance. But for now, we have chosen Shenandoah GC and have not regretted it.