How polymorphism is implemented inside the JVM

A translation of this article has been prepared specifically for students in the Java Developer course.










In my previous article, Everything About Method Overloading vs Method Overriding, we looked at the rules for method overloading and overriding and the differences between them. In this article, we will see how the JVM handles overloading and overriding internally.



For example, take the classes from the previous article: the parent class Mammal and the child class Human.



    public class OverridingInternalExample {

        private static class Mammal {
            public void speak() { System.out.println("ohlllalalalalalaoaoaoa"); }
        }

        private static class Human extends Mammal {

            @Override
            public void speak() { System.out.println("Hello"); }

            // Overloaded version of speak()
            public void speak(String language) {
                if (language.equals("Hindi")) System.out.println("Namaste");
                else System.out.println("Hello");
            }

            @Override
            public String toString() { return "Human Class"; }
        }

        // The comments below show the output and the bytecode of each method call
        public static void main(String[] args) {
            Mammal anyMammal = new Mammal();
            anyMammal.speak();  // Output - ohlllalalalalalaoaoaoa
            // 10: invokevirtual #4 // Method org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V

            Mammal humanMammal = new Human();
            humanMammal.speak(); // Output - Hello
            // 23: invokevirtual #4 // Method org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V

            Human human = new Human();
            human.speak(); // Output - Hello
            // 36: invokevirtual #7 // Method org/programming/mitra/exercises/OverridingInternalExample$Human.speak:()V

            human.speak("Hindi"); // Output - Namaste
            // 42: invokevirtual #9 // Method org/programming/mitra/exercises/OverridingInternalExample$Human.speak:(Ljava/lang/String;)V
        }
    }





We can look at polymorphism from two sides: the “logical” and the “physical”. Let's start with the logical side.



Logical point of view



From a logical point of view, at compile time the called method is treated as belonging to the declared type of the reference. At run time, however, the method of the object that the reference actually points to is called.



For example, for the line humanMammal.speak(); the compiler assumes that Mammal.speak() will be called, since humanMammal is declared as Mammal. But at run time the JVM knows that humanMammal refers to a Human object and actually invokes the Human.speak() method.



This is all fairly simple as long as we stay at the conceptual level. But how does the JVM handle it internally? How does it work out which method should be called?



We also know that overloaded methods are not considered polymorphic and are resolved at compile time, although method overloading is sometimes called compile-time polymorphism or early/static binding.



Overridden methods are resolved at run time because the compiler does not know whether the object assigned to the reference overrides the method.
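The difference between the two kinds of resolution can be seen in a small sketch. It is not taken from the original article: the class name BindingDemo and the describe overloads are hypothetical, and the speak() methods return their text instead of printing it so the result is easy to check.

```java
public class BindingDemo {
    static class Mammal {
        String speak() { return "ohlllalalalalalaoaoaoa"; }
    }

    static class Human extends Mammal {
        @Override
        String speak() { return "Hello"; }
    }

    // Two overloads (hypothetical helpers): the compiler picks one
    // from the declared type of the argument, not from the runtime object.
    static String describe(Mammal m) { return "describe(Mammal)"; }
    static String describe(Human h) { return "describe(Human)"; }

    public static void main(String[] args) {
        Mammal humanMammal = new Human();

        // Overloading: chosen at compile time from the declared type Mammal.
        System.out.println(describe(humanMammal));         // describe(Mammal)
        System.out.println(describe((Human) humanMammal)); // describe(Human)

        // Overriding: chosen at run time from the actual object, a Human.
        System.out.println(humanMammal.speak());           // Hello
    }
}
```

The same object is passed in both describe calls; only the declared type of the expression changes, and that alone decides which overload is used.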



Physical point of view



In this section, we will try to find “physical” evidence for the statements above. To do this, we look at the bytecode printed by running javap -verbose OverridingInternalExample. The -verbose option prints additional detail, including the constant pool, which makes it easier to relate the bytecode to our Java program.



The command above will show two sections of bytecode.

1. The constant pool. It contains almost everything that is needed to run the program, for example method references (#Methodref), classes (#Class), and string literals (#String).

2. The bytecode of the program: the executable bytecode instructions.







Why method overloading is called static binding



In the example above, the compiler assumes that humanMammal.speak() will be called on the Mammal class, although at run time it is called on the object that humanMammal refers to, which is an instance of the Human class.



Looking at our code and the javap output, we see that different bytecode is generated for the calls humanMammal.speak(), human.speak(), and human.speak("Hindi"), since the compiler can tell them apart by the declared type of the reference and the method signature.



Thus, for overloaded methods, the compiler can determine at compile time exactly which method reference (class, name, and descriptor) each call uses. That is why this is called static binding or compile-time polymorphism.
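The descriptors that appear after the method name in the javap comments, ()V and (Ljava/lang/String;)V, are what distinguish the two speak overloads in the bytecode. As a small illustration, the standard java.lang.invoke.MethodType class can print the descriptor for a given return type and parameter list. The class name DescriptorDemo and its helper methods are hypothetical.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Descriptor of a method that takes no arguments and returns void.
    static String noArgSpeak() {
        return MethodType.methodType(void.class).toMethodDescriptorString();
    }

    // Descriptor of a method that takes one String and returns void.
    static String stringArgSpeak() {
        return MethodType.methodType(void.class, String.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println("speak:" + noArgSpeak());     // speak:()V
        System.out.println("speak:" + stringArgSpeak()); // speak:(Ljava/lang/String;)V
    }
}
```

Because the descriptor is part of the method reference, the two overloads are different methods as far as the bytecode is concerned, even though they share a name.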



Why method overriding is called dynamic binding



The bytecode for the calls anyMammal.speak() and humanMammal.speak() is the same, since from the compiler's point of view both methods are called on the Mammal class:



 invokevirtual #4 // Method org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V
      
      





So now the question is, if both calls have the same bytecode, how does the JVM know which method to call?



The answer is hidden in the bytecode itself, in the invokevirtual instruction. According to the JVM specification (translator's note: reference to JVM spec 2.11.8):

“invokevirtual invokes an instance method of an object, dispatching on the (virtual) type of the object. This is the normal method dispatch in the Java programming language.”
The JVM uses invokevirtual to call Java methods that are equivalent to C++ virtual methods. In C++, a method must be declared virtual before another class can override it. In Java, methods are virtual by default (except private, final, and static ones), so a child class can override any of them.
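One way to observe this dispatch from plain Java is the java.lang.invoke API. The sketch below looks a method up symbolically as Mammal.speak, much as the bytecode names it, and then calls it on different receivers; a handle obtained with findVirtual dispatches on the receiver's actual class. The class name VirtualLookupDemo and the helper callThroughMammal are hypothetical, and speak() returns its text rather than printing it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class VirtualLookupDemo {
    static class Mammal {
        String speak() { return "ohlllalalalalalaoaoaoa"; }
    }

    static class Human extends Mammal {
        @Override
        String speak() { return "Hello"; }
    }

    // Looks up "speak" on Mammal (the declared type) and invokes it on the
    // given receiver. The lookup never mentions Human.
    static String callThroughMammal(Mammal receiver) {
        try {
            MethodHandle speak = MethodHandles.lookup()
                    .findVirtual(Mammal.class, "speak", MethodType.methodType(String.class));
            return (String) speak.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughMammal(new Mammal())); // ohlllalalalalalaoaoaoa
        System.out.println(callThroughMammal(new Human()));  // Hello
    }
}
```

Both calls use the same symbolic reference, yet the second one runs Human's implementation, which mirrors the two identical invokevirtual #4 instructions in the example.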



The invokevirtual instruction takes a symbolic reference to the method to call (#4 is an index into the constant pool).



 invokevirtual #4 // Method org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V
      
      





But entry #4 in turn refers to other constant pool entries: a class and a name-and-type pair.



    #4  = Methodref    #2.#27   // org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V
    #2  = Class        #25      // org/programming/mitra/exercises/OverridingInternalExample$Mammal
    #25 = Utf8         org/programming/mitra/exercises/OverridingInternalExample$Mammal
    #27 = NameAndType  #35:#17  // speak:()V
    #35 = Utf8         speak
    #17 = Utf8         ()V
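The chain of indexes in this listing can be followed by hand, and a toy model makes the steps explicit. The sketch below is only an illustration and not how any JVM stores its constant pool: the class ConstantPoolSketch, the Entry interface, and the factory methods are all hypothetical, and only the six entries from the listing are modelled.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantPoolSketch {
    /** A toy constant pool entry that renders itself by following indexes into the pool. */
    interface Entry {
        String resolve(Map<Integer, Entry> pool);
    }

    static Entry utf8(String text) {
        return pool -> text;
    }

    static Entry classRef(int nameIndex) {
        return pool -> pool.get(nameIndex).resolve(pool);
    }

    static Entry nameAndType(int nameIndex, int descriptorIndex) {
        return pool -> pool.get(nameIndex).resolve(pool) + ":" + pool.get(descriptorIndex).resolve(pool);
    }

    static Entry methodRef(int classIndex, int nameAndTypeIndex) {
        return pool -> pool.get(classIndex).resolve(pool) + "." + pool.get(nameAndTypeIndex).resolve(pool);
    }

    /** The six entries from the javap listing, keyed by their index. */
    static Map<Integer, Entry> samplePool() {
        Map<Integer, Entry> pool = new HashMap<>();
        pool.put(4, methodRef(2, 27));
        pool.put(2, classRef(25));
        pool.put(25, utf8("org/programming/mitra/exercises/OverridingInternalExample$Mammal"));
        pool.put(27, nameAndType(35, 17));
        pool.put(35, utf8("speak"));
        pool.put(17, utf8("()V"));
        return pool;
    }

    static String resolve(int index) {
        Map<Integer, Entry> pool = samplePool();
        return pool.get(index).resolve(pool);
    }

    public static void main(String[] args) {
        // Prints org/programming/mitra/exercises/OverridingInternalExample$Mammal.speak:()V
        System.out.println(resolve(4));
    }
}
```

Resolving #4 yields the same text that javap prints in the comment next to the instruction: a class name, a method name, and a descriptor, but no address.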
      
      





All these entries are used together to obtain a reference to the method and to the class that declares it. The JVM specification also touches on how objects may be represented (translator's note: reference to JVM spec 2.7):

“The Java Virtual Machine does not mandate any particular internal structure for objects.
In some of Oracle's implementations of the Java Virtual Machine, a reference to a class instance is a pointer to a handle that is itself a pair of pointers: one to a table containing the methods of the object and a pointer to the Class object that represents the type of the object, and the other to the memory allocated from the heap for the object data.”


This means that, in such an implementation, each reference leads to two hidden pointers:

  1. A pointer to a table that holds the object's methods together with a pointer to the Class object, for example [speak(), speak(String), Class object]

  2. A pointer to the heap memory allocated for the object's data, such as its field values.


But again the question arises: how does invokevirtual work with this? Unfortunately, there is no single answer, because it depends on the JVM implementation and varies from JVM to JVM.



From the reasoning above, we can conclude that an object reference indirectly holds a pointer to a table containing references to all the methods of that object. Java borrowed this concept from C++. The table is known by various names: virtual method table (VMT), virtual function table (vftable), virtual table (vtable), or dispatch table.



We cannot be sure how the vtable is implemented in Java, because it depends on the specific JVM. But we can expect the strategy to be roughly the same as in C++, where the vtable is an array-like structure that holds method names and references to their implementations. Whenever the JVM executes a virtual method, it looks up the method's address in the vtable.



For each class, there is only one vtable, which means that the table is unique and shared by all objects of the class, just like the Class object. Class objects are discussed in more detail in the articles Why an outer Java class can't be static and Why Java is Purely Object-Oriented Language Or Why Not.



Thus, there is only one vtable for the Object class, which contains all 11 of its methods (not counting registerNatives) and references to their implementations.







When the JVM loads the Mammal class into memory, it creates a Class object for it and a vtable that contains all the entries from the vtable of the Object class with the same references (since Mammal does not override any methods of Object), and adds a new entry for the speak() method.







Then comes the turn of the Human class: the JVM copies all the entries from the vtable of the Mammal class into the vtable of the Human class and adds a new entry for the overloaded version, speak(String).



The JVM knows that the Human class overrides two methods: toString() from Object and speak() from Mammal. For these methods, instead of creating new entries with updated references, the JVM changes the references of the existing entries at the same indexes they already occupied and keeps the same method names.
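The construction described above (copy the parent's table, overwrite overridden entries in place, append new methods) can be sketched as follows. This is a simplified model under stated assumptions, not the layout of any real JVM: the names VtableSketch, Slot, build, and indexOf are hypothetical, and only two of Object's methods are listed to keep the tables short.

```java
import java.util.ArrayList;
import java.util.List;

public class VtableSketch {
    /** One table entry: a method signature and the class whose implementation it points to. */
    static final class Slot {
        final String signature;
        final String implementedBy;

        Slot(String signature, String implementedBy) {
            this.signature = signature;
            this.implementedBy = implementedBy;
        }

        @Override
        public String toString() { return signature + " -> " + implementedBy; }
    }

    /** Copies the parent's table, overwrites overridden slots in place, appends new methods. */
    static List<Slot> build(List<Slot> parent, String className, String... declared) {
        List<Slot> table = new ArrayList<>(parent);
        for (String signature : declared) {
            int index = indexOf(table, signature);
            Slot slot = new Slot(signature, className);
            if (index >= 0) {
                table.set(index, slot); // override: same index, new target
            } else {
                table.add(slot);        // new method: new index at the end
            }
        }
        return table;
    }

    static int indexOf(List<Slot> table, String signature) {
        for (int i = 0; i < table.size(); i++) {
            if (table.get(i).signature.equals(signature)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Slot> object = build(new ArrayList<Slot>(), "Object", "toString()", "hashCode()");
        List<Slot> mammal = build(object, "Mammal", "speak()");
        List<Slot> human = build(mammal, "Human", "speak()", "speak(String)", "toString()");

        // Prints:
        // 0: toString() -> Human
        // 1: hashCode() -> Object
        // 2: speak() -> Human
        // 3: speak(String) -> Human
        for (int i = 0; i < human.size(); i++) {
            System.out.println(i + ": " + human.get(i));
        }
    }
}
```

In this model speak() sits at index 2 in both the Mammal and the Human table, so a call site that knows only the declared type Mammal can use the same index whichever class the object really belongs to.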







The invokevirtual instruction makes the JVM treat the value of method reference #4 not as an address but as the name of a method to look up in the vtable of the current object.
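The lookup step can be modelled in a few lines. Again, this is a sketch under assumptions and not a description of a real JVM: Klass, Obj, and invokeVirtual are hypothetical stand-ins for a loaded class, an object, and the instruction, and the table entries return text instead of printing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class DispatchSketch {
    /** Stand-in for a loaded class: one method table shared by all of its instances. */
    static final class Klass {
        final Map<String, Supplier<String>> vtable = new HashMap<>();

        Klass(Klass parent) {
            if (parent != null) vtable.putAll(parent.vtable); // inherit the parent's entries
        }
    }

    /** Stand-in for an object: it only knows which class it belongs to. */
    static final class Obj {
        final Klass klass;

        Obj(Klass klass) { this.klass = klass; }
    }

    /** Mimics invokevirtual: the name comes from the call site, the table from the receiver. */
    static String invokeVirtual(Obj receiver, String nameAndType) {
        return receiver.klass.vtable.get(nameAndType).get();
    }

    static final Klass MAMMAL = new Klass(null);
    static final Klass HUMAN;

    static {
        MAMMAL.vtable.put("speak:()V", () -> "ohlllalalalalalaoaoaoa");
        HUMAN = new Klass(MAMMAL);
        HUMAN.vtable.put("speak:()V", () -> "Hello"); // the override replaces the inherited entry
    }

    public static void main(String[] args) {
        System.out.println(invokeVirtual(new Obj(MAMMAL), "speak:()V")); // ohlllalalalalalaoaoaoa
        System.out.println(invokeVirtual(new Obj(HUMAN), "speak:()V"));  // Hello
    }
}
```

Both calls pass the same name, speak:()V, just as both invokevirtual instructions in the example refer to #4; the result differs only because each receiver leads to a different table.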

I hope it is now clearer how the JVM uses the constant pool and the virtual method table to determine which method to call.

You can find the sample code in the GitHub repository.


