Compilation Types in the JVM: Exposing Black Magic Session

Hello!



Today we bring you a translation of an article that walks through examples of compilation in the JVM. It pays particular attention to AOT compilation, which is supported in Java 9 and later.



Enjoy reading!



I suppose anyone who has ever programmed in Java has heard of just-in-time (JIT) compilation, and possibly of ahead-of-time (AOT) compilation. There is also no need to explain what “interpreted” languages are. This article explains how all of these are implemented in the Java virtual machine, the JVM.



You probably know that when programming in Java you run a compiler (the “javac” program) that compiles Java source code (.java files) into Java bytecode (.class files). Java bytecode is an intermediate language. It is called “intermediate” because a real computing device (CPU) does not understand it and cannot execute it directly; it is a transitional form between the source code and the “native” machine code that the processor executes.



There are three ways to get Java bytecode to do actual work:



  1. Execute the intermediate code directly. It is more accurate to say that the code is “interpreted”. The JVM contains a bytecode interpreter. As you know, you start the JVM by running the “java” program.
  2. Immediately before executing the intermediate code, compile it into native code and have the CPU execute this freshly baked native code. The compilation takes place just before execution (Just in Time) and is called “dynamic”.
  3. Before the program is even launched, translate the intermediate code into native code and let the CPU run it from beginning to end. This compilation is done before execution and is called AOT (Ahead of Time).


So, (1) is the work of the interpreter, (2) is the result of JIT compilation, and (3) is the result of AOT compilation.
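To make option (1) more concrete, here is a minimal sketch of what “interpreting” means: a loop that reads one bytecode instruction at a time and acts on it. The opcode values are the real ones from the JVM specification, but the class TinyInterpreter is purely hypothetical, supports only five instructions, and is not how the HotSpot interpreter is implemented. It assumes that javac turns a method body such as `int a = 5; return a;` into the sequence iconst_5, istore_1, iload_1, ireturn, which you can check with `javap -c`.

```java
public class TinyInterpreter {
    // Real JVM opcode values (from the JVM specification).
    static final int ICONST_5 = 0x08;
    static final int BIPUSH   = 0x10;
    static final int ILOAD_1  = 0x1b;
    static final int ISTORE_1 = 0x3c;
    static final int IRETURN  = 0xac;

    /** Interprets a method body that uses only the five opcodes above. */
    public static int run(byte[] code) {
        int[] stack = new int[4];   // operand stack
        int[] locals = new int[2];  // local variables; slot 0 would hold "this"
        int sp = 0;                 // next free stack slot
        int pc = 0;                 // index of the next instruction
        while (true) {
            int opcode = code[pc++] & 0xff;
            switch (opcode) {
                case ICONST_5: stack[sp++] = 5; break;
                case BIPUSH:   stack[sp++] = code[pc++]; break; // one signed byte operand
                case ISTORE_1: locals[1] = stack[--sp]; break;
                case ILOAD_1:  stack[sp++] = locals[1]; break;
                case IRETURN:  return stack[--sp];
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed bytecode of a method body "int a = 5; return a;" (4 bytes).
        byte[] f = { ICONST_5, ISTORE_1, ILOAD_1, (byte) IRETURN };
        System.out.println(run(f)); // prints 5
    }
}
```

A JIT or AOT compiler removes this decode-and-dispatch loop by translating the same instructions into machine code once, instead of decoding them on every call.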



For the sake of completeness, I’ll mention a fourth approach: interpreting the source code directly. Java does not work this way. It is done, for example, in Python.

Now let's see how “java” acts as (1) an interpreter, (2) a JIT compiler, and/or (3) an AOT compiler, and when.



In short: as a rule, “java” does both (1) and (2). Starting with Java 9, the third option is also possible.



Here is our Test class, which will be used in the examples that follow.



    public class Test {
        public int f() throws Exception {
            int a = 5;
            return a;
        }

        public static void main(String[] args) throws Exception {
            for (int i = 1; i <= 10; i++) {
                System.out.println("call " + Integer.valueOf(i));
                long a = System.nanoTime();
                new Test().f();
                long b = System.nanoTime();
                // time taken by one constructor call plus one call of f, in nanoseconds
                System.out.println("elapsed= " + (b - a));
            }
        }
    }
      
      





As you can see, the main method loops 10 times, each time instantiating a Test object and calling its f function. The f function does almost nothing.



So, if you compile and run the code above, the output is much as expected (the elapsed times will of course be different for you):



    call 1
    elapsed= 5373
    call 2
    elapsed= 913
    call 3
    elapsed= 654
    call 4
    elapsed= 623
    call 5
    elapsed= 680
    call 6
    elapsed= 710
    call 7
    elapsed= 728
    call 8
    elapsed= 699
    call 9
    elapsed= 853
    call 10
    elapsed= 645
      
      





And now the question: is this output the result of “java” working as an interpreter, that is, option (1); of “java” working as a JIT compiler, that is, option (2); or is it somehow related to AOT compilation, that is, option (3)? In this article I am going to find the right answers to all these questions.



The first answer I want to give is that, most likely, only (1) takes place here. I say “most likely” because I don’t know whether some environment variable is set that changes the default JVM options. If nothing extra is set, and “java” runs with its defaults, then we are observing option (1) and nothing else: the code is fully interpreted. I am sure of this because, according to the documentation, a method is JIT-compiled only after it has been invoked a threshold number of times (the figure of 1,500 invocations comes up again below), and here f is called only 10 times.





Please note: the JVM can work in client or server mode, and the default options differ between the two. As a rule, the startup mode is chosen automatically, depending on the environment or the computer where the JVM is launched. From here on I pass the -client option on every run, so that there is no doubt that the program runs in client mode. This option does not affect the aspects I want to demonstrate in this post.
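If you are unsure whether your own environment changes the JVM defaults, a small program can report what the running JVM actually received. The sketch below uses only standard APIs (RuntimeMXBean, System.getenv, System.getProperty); the class name JvmOptionsReport is hypothetical, and the two environment variables checked are simply the ones I know the JDK tools read, not an exhaustive list.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmOptionsReport {
    /** Collects the places from which JVM options may have come. */
    public static List<String> report() {
        List<String> lines = new ArrayList<>();
        // Arguments passed to the JVM itself (not the arguments of main).
        lines.add("input arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // Environment variables that can add options without appearing on the command line.
        for (String name : new String[] {"JAVA_TOOL_OPTIONS", "JDK_JAVA_OPTIONS"}) {
            String value = System.getenv(name);
            lines.add(name + ": " + (value == null ? "(not set)" : value));
        }
        // On HotSpot the VM name usually mentions "Client VM" or "Server VM".
        lines.add("vm name: " + System.getProperty("java.vm.name"));
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

Running it once with and once without -client shows whether the flag changes which VM is reported on your machine.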



If you run “java” with the -XX:+PrintCompilation option, the JVM prints a line each time a function is dynamically compiled. Do not forget that JIT compilation is performed separately for each function: some functions in a class may remain in bytecode (that is, not compiled), while others have already been JIT-compiled and are ready for direct execution on the processor.
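Besides the log option, the JVM also exposes a little information about its JIT compiler through the standard management API. The following sketch, with the hypothetical class name JitInfo, asks the running JVM for the name of its compiler and for the total time spent compiling so far. It reports only aggregate figures, not the per-method detail that the log gives.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitInfo {
    /** Describes the JIT compiler of the running JVM, if it reports one. */
    public static String describeJit() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The API returns null when the JVM has no compilation system.
            return "no JIT compiler reported";
        }
        String text = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", total compilation time so far: "
                    + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
    }
}
```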



Below I also add the -Xbatch option. It is needed only to make the output more presentable; otherwise JIT compilation runs concurrently with interpretation, and the lines printed by -XX:+PrintCompilation can end up oddly interleaved with the program's own output. The -Xbatch option disables background compilation, so the program is paused while a method is being JIT-compiled.



(For the sake of readability, I write each option on a new line.)



    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         Test
      
      





I will not paste the output of this command here, because by default the JVM compiles a lot of internal functions (belonging, for example, to the java, sun, and jdk packages), so the output is very long: on my screen there are 274 lines about internal functions, plus a few more for the program's own output. To make this research easier, I will turn off JIT compilation for the internal classes, or, put differently, leave it enabled only for my method (Test.f). To do this, there is one more option, -XX:CompileCommand. You can specify many compile commands, so it is easier to put them in a separate file. Fortunately, there is the -XX:CompileCommandFile option for that. So let's create the file. I will call it hotspot_compiler, for a reason that I will explain shortly, and write the following in it:



    quiet
    exclude java/* *
    exclude jdk/* *
    exclude sun/* *
      
      





It should be clear that this excludes all functions (the last *) in all classes from all packages that start with java, jdk, and sun (package names are separated by /, and you can use *). The quiet command tells the JVM not to print anything about the excluded classes, so only the methods that are still compiled show up on the console. So, I run:



    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         Test
      
      





Before I get to the output of this command, here is the promised explanation of the file name: I called the file hotspot_compiler because it seems (I did not check) that in Oracle JDK the name .hotspot_compiler is the default for the file with compiler commands.



So the output is:



    many lines like this
        111    1     n 0       java.lang.invoke.MethodHandle::linkToStatic(LLLLLL)L (native)   (static)
    call 1
    some more lines like this
        161   48     n 0       java.lang.invoke.MethodHandle::linkToStatic(ILIJL)I (native)   (static)
    elapsed= 7558
    call 2
    elapsed= 1532
    call 3
    elapsed= 920
    call 4
    elapsed= 732
    call 5
    elapsed= 774
    call 6
    elapsed= 815
    call 7
    elapsed= 767
    call 8
    elapsed= 765
    call 9
    elapsed= 757
    call 10
    elapsed= 868
      
      





First, I don’t know why some java.lang.invoke.MethodHandle methods are still being compiled. Probably some things simply cannot be turned off. Once I understand what is going on, I’ll update this post. However, as you can see, all the other compilation lines (previously there were 274) have now disappeared. In the examples that follow, I will also remove the java.lang.invoke.MethodHandle lines from the compilation log output.



Let's see where we are. We have simple code that runs our function 10 times. I said earlier that, as the documentation indicates, this function is interpreted rather than compiled, and the logs now confirm it: the function does not appear in the compilation log, which means it is not JIT-compiled. You have just seen the “java” tool interpreting, and only interpreting, our function in 100% of cases. So we can tick off option (1) and move on to (2), dynamic compilation.



According to the documentation, you can run the function 1,500 times and see that JIT compilation really happens. However, you can also use the -XX:CompileThreshold=invocations option and set the value you want instead of 1,500. Let's use 5 here. We then expect the following: after 5 “interpretations” of our function f, the JVM compiles the method and from then on runs the compiled version.
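The idea of a compile threshold can be illustrated with a toy model: count the invocations of a method, and once the counter reaches the threshold, swap the slow path for a fast one. The class HotCounterSketch below is hypothetical and deliberately simplistic. In particular, treating “the call that reaches the threshold” as the moment of compilation is an assumption of this sketch; HotSpot's real counters and policies are more involved.

```java
import java.util.function.IntSupplier;

public class HotCounterSketch {
    private final int threshold;          // plays the role of -XX:CompileThreshold
    private int invocations = 0;
    private IntSupplier compiled = null;  // stands in for generated native code
    private int interpretedRuns = 0;
    private int compiledRuns = 0;

    public HotCounterSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Calls the toy method f, switching to the "compiled" form once it is hot. */
    public int call() {
        invocations++;
        if (compiled == null && invocations >= threshold) {
            compiled = () -> 5;           // "JIT-compile" f: int a = 5; return a;
        }
        if (compiled != null) {
            compiledRuns++;
            return compiled.getAsInt();
        }
        interpretedRuns++;
        return interpret();
    }

    private int interpret() {
        int a = 5;   // stands in for walking the bytecode of f instruction by instruction
        return a;
    }

    public int interpretedRuns() { return interpretedRuns; }
    public int compiledRuns() { return compiledRuns; }

    public static void main(String[] args) {
        HotCounterSketch f = new HotCounterSketch(5);
        for (int i = 1; i <= 10; i++) {
            f.call();
        }
        // prints "interpreted=4 compiled=6"
        System.out.println("interpreted=" + f.interpretedRuns()
                + " compiled=" + f.compiledRuns());
    }
}
```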

    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         -XX:CompileThreshold=5 \
         Test
      
      





If you ran this command, you may have noticed that nothing changed compared to the previous example: compilation still does not occur. It turns out that, according to the documentation, -XX:CompileThreshold only works when TieredCompilation is disabled, and it is enabled by default. It is disabled like this: -XX:-TieredCompilation. Tiered compilation is a feature introduced in Java 7 to improve both the startup speed and the steady-state speed of the JVM. It is not important in the context of this post, so feel free to disable it. Let's now run the command again:



    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         -XX:CompileThreshold=5 \
         -XX:-TieredCompilation \
         Test
      
      





Here is the output (recall that I have omitted the lines about java.lang.invoke.MethodHandle):



    call 1
    elapsed= 9411
    call 2
    elapsed= 1291
    call 3
    elapsed= 862
    call 4
    elapsed= 1023
    call 5
        227   56    b        Test::<init> (5 bytes)
        228   57    b        Test::f (4 bytes)
    elapsed= 1051739
    call 6
    elapsed= 18516
    call 7
    elapsed= 940
    call 8
    elapsed= 769
    call 9
    elapsed= 855
    call 10
    elapsed= 838
      
      





Say hello to the dynamically compiled function Test.f (Test::f in the log), which shows up at call number 5 because I set CompileThreshold to 5. The JVM interprets the function until its invocation count reaches the threshold, then compiles it, and from then on runs the compiled version. Since the function is compiled, it should run faster, but we can’t verify this here, since the function does nothing. I think this is a good topic for a separate post.



As you probably already guessed, another function is compiled here, namely Test::<init>, which is the constructor of the Test class. Since the code calls the constructor (new Test()) every time it calls f, the constructor is compiled at the same time as f, at the same call.
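If you want to repeat this experiment with other threshold values, it can be convenient to build the command line in code. The sketch below assembles exactly the flags used in the run above and starts a child JVM with ProcessBuilder. The class name ExperimentLauncher and its command method are hypothetical helpers, and the program assumes that Test.class and the hotspot_compiler file are in the current directory.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ExperimentLauncher {
    /** Builds the command line from the article, with the threshold as a parameter. */
    public static List<String> command(String mainClass, int threshold) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-client");
        cmd.add("-Xbatch");
        cmd.add("-XX:+PrintCompilation");
        cmd.add("-XX:CompileCommandFile=hotspot_compiler");
        cmd.add("-XX:CompileThreshold=" + threshold);
        cmd.add("-XX:-TieredCompilation");
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("Test", 5);
        System.out.println(String.join(" ", cmd));
        // Launch only when the compiled class from the article is in the current directory.
        if (Files.exists(Paths.get("Test.class"))) {
            Process child = new ProcessBuilder(cmd)
                    .inheritIO()   // show the child JVM's output in this console
                    .start();
            System.out.println("exit code: " + child.waitFor());
        }
    }
}
```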



In principle, this could end the discussion of option (2), JIT compilation. As you can see, the function is first interpreted by the JVM and then dynamically compiled once the threshold of five invocations is reached. One last detail about JIT compilation is the -XX:+PrintAssembly option. As the name implies, it prints the compiled version of the function to the console (compiled version = native machine code = assembler code). However, this only works if a disassembler is available in the library path. I guess the disassembler may differ between JVMs, but in this case we are dealing with hsdis, a disassembler for OpenJDK. The source code of the hsdis library, or a prebuilt binary, can be obtained from various places. In my case, I compiled it and put hsdis-amd64.so in JAVA_HOME/lib/server.



Now we can run the command. But first I have to add that -XX:+PrintAssembly also requires the -XX:+UnlockDiagnosticVMOptions option, which must come before the PrintAssembly option. Otherwise the JVM will warn you about incorrect use of the PrintAssembly option. Let's run it:



    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         -XX:CompileThreshold=5 \
         -XX:-TieredCompilation \
         -XX:+UnlockDiagnosticVMOptions \
         -XX:+PrintAssembly \
         Test
      
      





The output will be long, and there will be lines like:



    0x00007f4b7cab1120: mov    0x8(%rsi),%r10d
    0x00007f4b7cab1124: shl    $0x3,%r10
    0x00007f4b7cab1128: cmp    %r10,%rax
      
      





As you can see, the corresponding functions are compiled into native machine code.



Finally, let's discuss option (3), AOT. Ahead-of-time compilation was not available in Java prior to version 9.



A new tool appeared in JDK 9, jaotc; as the name implies, it is an AOT compiler for Java. The idea is this: run the Java compiler “javac”, then the Java AOT compiler “jaotc”, and then run the JVM “java” as usual. The JVM normally performs interpretation and JIT compilation. However, if a function has AOT-compiled code, the JVM uses that code directly and does not resort to interpretation or JIT compilation. To be clear: you are not required to run the AOT compiler. It is optional, and if you do use it, you can precompile only the classes you choose.



Let's build a library containing an AOT-compiled version of Test::f. Do not forget: to do this yourself, you need JDK 9 build 150 or later.



 jaotc --output=libTest.so Test.class
      
      





The result is libTest.so, a library containing the AOT-compiled native code of the functions in the Test class. You can view the symbols defined in this library:



 nm libTest.so
      
      





Our output will contain, among other things:



    0000000000002120 t Test.f()I
    00000000000021a0 t Test.<init>()V
    00000000000020a0 t Test.main([Ljava/lang/String;)V
      
      





So all our functions (the constructor, f, and the static method main) are present in the libTest.so library.
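The suffixes in these symbol names, such as ()I and ([Ljava/lang/String;)V, are JVM method descriptors: the parameter types in parentheses followed by the return type. As a small illustration, the standard MethodType API can produce the same strings from Java types. The class name Descriptors is hypothetical; the API calls are part of java.lang.invoke.

```java
import java.lang.invoke.MethodType;

public class Descriptors {
    /** Returns the JVM descriptor for a method with the given return and parameter types. */
    public static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println("f      " + descriptor(int.class));                  // ()I
        System.out.println("<init> " + descriptor(void.class));                 // ()V
        System.out.println("main   " + descriptor(void.class, String[].class)); // ([Ljava/lang/String;)V
    }
}
```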



As with the corresponding “java” option, the compile commands can be supplied in a file; jaotc has the --compile-commands option for this. JEP 295 provides relevant examples, which I will not show here.



Let's now run “java” and see whether the AOT-compiled methods are used. If you run “java” as before, the AOT library is not used, which is not surprising. To use this new feature, you must specify the -XX:AOTLibrary option:



 java -XX:AOTLibrary=./libTest.so Test
      
      





You can specify multiple AOT libraries, separated by commas.



The output of this command is exactly the same as when running “java” without AOTLibrary, since the behavior of the Test program has not changed at all. To check whether the AOT-compiled functions are used, you can add another new option, -XX:+PrintAOT.



 java -XX:AOTLibrary=./libTest.so -XX:+PrintAOT Test
      
      





Before the output of the Test program, this command shows the following:



      9    1     loaded    ./libTest.so  aot library
     99    1     aot[ 1]   Test.main([Ljava/lang/String;)V
     99    2     aot[ 1]   Test.f()I
     99    3     aot[ 1]   Test.<init>()V
      
      





As planned, the AOT library is loaded, and AOT-compiled functions are used.



If you're interested, you can run the following command and check if JIT compilation is happening.



    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         -XX:CompileThreshold=5 \
         -XX:-TieredCompilation \
         -XX:AOTLibrary=./libTest.so \
         -XX:+PrintAOT \
         Test
      
      





As expected, JIT compilation does not occur, since the methods in the Test class are compiled before execution and provided as a library.



A possible question is: if we supply native code for a function, how does the JVM determine whether that native code is obsolete (stale)? As a final example, let's modify the function f and set a to 6.



    public int f() throws Exception {
        int a = 6;
        return a;
    }
      
      





I did this just to change the class file. Now let's recompile with javac and run the same command as above.



    javac Test.java
    java -client \
         -Xbatch \
         -XX:+PrintCompilation \
         -XX:CompileCommandFile=hotspot_compiler \
         -XX:CompileThreshold=5 \
         -XX:-TieredCompilation \
         -XX:AOTLibrary=./libTest.so \
         -XX:+PrintAOT \
         Test
      
      





As you can see, I did not run “jaotc” after “javac”, so the code in the AOT library is now old and incorrect: its version of the function f still has a = 5.



The output of the “java” command above shows:



        228   56    b        Test::<init> (5 bytes)
        229   57    b        Test::f (5 bytes)
      
      





This means that the functions were dynamically compiled in this case, so the AOT-compiled code was not used. In other words, the change in the class file was detected. When javac compiles a class, a fingerprint of the class is recorded in it, and the class fingerprint is also stored in the AOT library. Since the new fingerprint of the class differs from the one stored in the AOT library, the native code compiled in advance (AOT) was not used. That is all I wanted to say about the last kind of compilation, ahead of time.
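The staleness check can be pictured as comparing two fingerprints. The sketch below is only a model of that idea: it uses a SHA-256 digest of the class bytes as a stand-in fingerprint, which is an assumption of mine and not the algorithm the JVM or jaotc actually uses. The class FingerprintSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class FingerprintSketch {
    /** Hypothetical fingerprint: a SHA-256 digest of the class file bytes. */
    public static byte[] fingerprint(byte[] classBytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(classBytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Precompiled code is usable only if the stored and current fingerprints agree. */
    public static boolean canUseAot(byte[] storedFingerprint, byte[] currentClassBytes) {
        return Arrays.equals(storedFingerprint, fingerprint(currentClassBytes));
    }

    public static void main(String[] args) {
        // Short strings stand in for the bytes of the old and new Test.class.
        byte[] oldClass = "int a = 5".getBytes(StandardCharsets.UTF_8);
        byte[] newClass = "int a = 6".getBytes(StandardCharsets.UTF_8);
        byte[] stored = fingerprint(oldClass);   // recorded when the AOT library was built
        System.out.println(canUseAot(stored, oldClass)); // true
        System.out.println(canUseAot(stored, newClass)); // false
    }
}
```

When the check fails, falling back to interpretation and JIT compilation is the safe choice, and that is what the log above shows.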



In this article I tried to explain, and to illustrate with simple, realistic examples, how the JVM executes Java code: by interpreting it, by compiling it dynamically (JIT), or by compiling it in advance (AOT), the last of which became available only in JDK 9. I hope you learned something new.


